“Tree-thinking”, using phylogenies to understand evolutionary relationships, name clades, and understand evolutionary transformations and biogeography, is now ubiquitous in systematics and evolutionary biology and is making its way quickly into the educational and public realms (e.g.,

In the Willi Hennig Memorial Symposium, held in 1977 and published in Systematic Zoology in 1979, David Hull expressed the concern that “uncertainty over what it is that cladograms are supposed to depict and how they are supposed to depict it has been one of the chief sources of confusion in the controversy over cladism” (

The vertices of a node-based tree represent taxa (sampled or inferred), while its edges model ancestry relationships. For example, if the tree represents the results of a phylogenetic analysis, then the sampled taxa are the tips of the tree and the internal nodes represent inferred common ancestors. By contrast, in a stem-based tree, both sampled and inferred ancestral taxa are modeled by edges, while vertices correspond to speciation events. These two models are isomorphic (as that term is used in mathematics) but not equal: they carry exactly the same information about ancestry, but encode it in two different ways. To make this explicit, we give a simple algorithm that constructs a unique node-based tree for every stem-based tree and vice versa. While some might regard as frivolous the demonstration that these two tree models are equivalent, the relationship between the two representations has important repercussions for evaluating the biological meaning of trees. We therefore give an explicit example of the need to distinguish between these representations by discussing how the phylogenetic concept of monophyly is represented in each graphical model.

Mathematically speaking, all of the diagrams we shall consider are

We will primarily be concerned with graphs that are

The

A

Trees are well suited for modeling phylogenetic relationships between species or taxa, in which each species or taxon has a unique parent. Uniqueness is vital; a tree in the sense that we use it here cannot model reticulations, such as tokogenetic relationships in a sexually reproducing species or hybridization events between two different species.
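The requirement of a unique parent can be made concrete by storing a rooted tree as a map from each taxon to its single parent. The sketch below is a minimal illustration under that assumption; the helper name `parent_map` and its (parent, child) input format are hypothetical and do not come from the text. A second parent for the same taxon, as in a hybridization event, simply has no place in such a map and is rejected.

```python
from typing import Dict, Iterable, Optional, Tuple

ParentMap = Dict[str, Optional[str]]  # taxon -> its unique parent (None for the root)


def parent_map(edges: Iterable[Tuple[str, str]]) -> ParentMap:
    """Build a rooted tree from (parent, child) pairs (hypothetical helper)."""
    parent: ParentMap = {}
    for p, c in edges:
        # A child that already has a parent would need two parents: a reticulation.
        if parent.get(c) is not None:
            raise ValueError(f"{c} already has parent {parent[c]}; {p} would be a second parent")
        parent[c] = p
        parent.setdefault(p, None)  # p counts as a root until its own parent appears
    roots = [v for v, p in parent.items() if p is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    # Walking up from any taxon must reach the root; otherwise there is a cycle.
    for v in parent:
        cur = v
        for _ in range(len(parent) + 1):
            if cur is None:
                break
            cur = parent[cur]
        else:
            raise ValueError(f"cycle detected above {v}")
    return parent
```

For a hypothetical tree in which y is the parent of A and B and z is the parent of y and C, `parent_map([("z", "y"), ("z", "C"), ("y", "A"), ("y", "B")])` returns the map in which z is the only taxon without a parent. Modeling reticulation would require a different structure, such as a map from each taxon to a set of parents.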

(A) An example of a stem-based tree, indicating the evolutionary relationship among the sampled taxa A, B, C and their unsampled, but inferred, ancestral species y and z. (B) The same tree with character data shown (the names of the internal edges have been omitted for clarity). In each case, taxon names are displaced from the leaf position to emphasize that the edge is the taxon.

By the term

We frequently refer to the internal edges as “hypothetical” ancestors. However, under the paradigm of evolution, there is nothing more hypothetical about these edges than there is about the named taxa represented by specimens. If the inferred tree is correct, then the ancestral taxa represented by these edges must have existed. Under the evolutionary paradigm, the extent to which we treat named taxa (A, B, C) as real entities of descent with modification is the extent to which we treat internal lines as symbolizing real ancestors. They are not “hypothetical”; they are simply unsampled and inferred (or, conceivably, and especially in the systematics of fossil organisms, unrecognized or misidentified as descendant species).

In Fig. 1B, we have added more information to the tree. Each numbered black rectangle represents an evolutionary character hypothesized to be fixed (sensu

Hennig (

Left: a stem-based tree. Letters are symbols for species, and the numbers appended to the letters are labels for samples of each species considered at a particular time period. Right: a node-based tree, with single-headed arrows symbolizing relationship statements and circles representing species. Note the correspondence between the lineages on the left and the circles on the right, as shown by the brackets and double-headed arrows for selected lineages and vertices.

Fig. 2 is redrawn from Hennig (

Below, we prove mathematically that node-based trees and stem-based trees carry the same information, albeit encoded in different ways. We start by setting up some notation.

Let

It is a standard fact that for every set

Finally, we call

We now describe an equivalence between two different kinds of labeled trees. Let

Create a new root vertex, labeled 0, and create a new edge 0→

Label each edge

Erase the labels of the vertices.
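The steps of Algorithm A can be sketched in Python under two hypothetical data layouts: a node-based tree stored as a map from each vertex label to its parent's label (None for the root), and a planted stem-based tree stored as a map from each edge label to its (tail, head) pair of anonymous integer vertices. The sketch assumes that the new edge runs from vertex 0 to the original root and that each edge takes the label of the vertex it points to. The function name `node_to_stem` is illustrative only.

```python
from typing import Dict, Optional, Tuple

NodeTree = Dict[str, Optional[str]]    # vertex label -> parent label (None for the root)
StemTree = Dict[str, Tuple[int, int]]  # edge label -> (tail vertex, head vertex)


def node_to_stem(tree: NodeTree) -> StemTree:
    """Sketch of Algorithm A: node-based rooted tree -> planted stem-based tree."""
    # Give every original vertex an anonymous integer id; 0 is reserved for the new root.
    ids = {label: i for i, label in enumerate(sorted(tree), start=1)}
    stem: StemTree = {}
    for label, parent in tree.items():
        # Step 1: the original root hangs from the new root 0 via a new edge.
        tail = 0 if parent is None else ids[parent]
        # Step 2: each edge takes the label of the vertex it points to.
        stem[label] = (tail, ids[label])
    # Step 3: vertex labels are erased; only the integer ids survive, as bare identities.
    return stem
```

Applied to a hypothetical tree `{"z": None, "y": "z", "C": "z", "A": "y", "B": "y"}`, the result has one edge per original vertex and one extra vertex (0), whose single outgoing edge carries the label z of the original root.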

An example of the construction of

Reading D to A illustrates Algorithm B.

We can reconstruct

Label each non-root vertex of

Erase all labels on the edges.

Delete vertex
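Algorithm B can be sketched with the same hypothetical layouts: a planted stem-based tree as a map from edge labels to (tail, head) integer pairs with root vertex 0, and a node-based tree as a map from each vertex label to its parent's label. The sketch assumes that each non-root vertex takes the label of the unique edge entering it; the function name `stem_to_node` is illustrative only.

```python
from typing import Dict, Optional, Tuple

NodeTree = Dict[str, Optional[str]]    # vertex label -> parent label (None for the root)
StemTree = Dict[str, Tuple[int, int]]  # edge label -> (tail vertex, head vertex); root is 0


def stem_to_node(stem: StemTree) -> NodeTree:
    """Sketch of Algorithm B: planted stem-based tree -> node-based rooted tree."""
    # Step 1: label each non-root vertex with the label of the edge entering it.
    vertex_label = {head: label for label, (_, head) in stem.items()}
    # Step 2: edge labels are not carried over; the result is keyed by vertex labels.
    tree: NodeTree = {}
    for label, (tail, _) in stem.items():
        # Step 3: deleting vertex 0 and its single edge leaves that edge's head as the root;
        # every other vertex's parent is the label copied onto its tail vertex.
        tree[label] = None if tail == 0 else vertex_label[tail]
    return tree
```

The sketch relies on vertex 0 having exactly one outgoing edge, which is what makes the input tree planted. A vertex with three or more outgoing edges is handled by the same lookup, so polytomies need no special case.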

These steps are exactly the reverse of those of Algorithm A; for an illustration, see Fig. 3. It is worth mentioning that the algorithms work the same way whether or not the input tree has polytomies (vertices with more than two children). The algorithms establish the following mathematical fact.

There is a one-to-one correspondence between the following two sets:

The set of all rooted trees

The set of all planted trees

Because the correspondence is one-to-one, the rooted tree

In the node-based tree

Indeed, it follows from Algorithms A and B that there is a one-to-one correspondence between proper subtrees of

Additional biological information associated with a stem-based or node-based tree can be translated via these algorithms. For instance, the character data represented by edge labels in a stem-based tree (Fig. 1B) can be represented by vertex labels in the corresponding node-based tree.
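The correspondence between subtrees noted above can be checked on a small hypothetical example. The sketch assumes that the subtree of a node-based tree rooted at a vertex v corresponds to the planted subtree of the stem-based tree that begins with the edge labeled v, and it uses hypothetical layouts (a parent map for the node-based tree, an edge label to (tail, head) map for the stem-based tree). The helper names `node_subtree` and `planted_subtree` are illustrative only.

```python
from typing import Dict, Optional, Set, Tuple

NodeTree = Dict[str, Optional[str]]    # vertex label -> parent label (None for the root)
StemTree = Dict[str, Tuple[int, int]]  # edge label -> (tail vertex, head vertex); root is 0


def node_subtree(tree: NodeTree, v: str) -> Set[str]:
    """Labels of v and all of its descendants in a node-based tree."""
    members: Set[str] = set()
    for label in tree:
        cur: Optional[str] = label
        # Walk toward the root; label belongs to the subtree if v is on the way.
        while cur is not None:
            if cur == v:
                members.add(label)
                break
            cur = tree[cur]
    return members


def planted_subtree(stem: StemTree, edge: str) -> Set[str]:
    """Labels of an edge and of every edge below its head vertex in a stem-based tree."""
    members = {edge}
    frontier = [stem[edge][1]]  # start from the head vertex of the given edge
    while frontier:
        vertex = frontier.pop()
        for label, (tail, head) in stem.items():
            if tail == vertex:
                members.add(label)
                frontier.append(head)
    return members
```

For the hypothetical pair `{"z": None, "y": "z", "C": "z", "A": "y", "B": "y"}` and `{"z": (0, 5), "y": (5, 4), "C": (5, 3), "A": (4, 1), "B": (4, 2)}`, both helpers return the same label set for every v; for v = "y" that set is {y, A, B}.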

While node-based and stem-based trees carry the same basic information about taxa and ancestry, they represent this information in different ways. Therefore, it should not be surprising that biological concepts are modeled by different mathematical substructures in the two kinds of trees. We provide an example of this through a discussion of how the phylogenetic concept of monophyly is represented in each tree model. Hennig’s (

Definition 1. “A node-based clade is a clade originating with a particular node on a phylogenetic tree, where the node represents a lineage at the instant of a splitting event.” (The PhyloCode version 4c, January, 2010, Article 2.2,

Definition 2. “A branch-based clade is a clade originating with a particular branch (internode) on a phylogenetic tree, where the branch represents a lineage between two splitting events.” (
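To see mechanically how the two definitions can select different sets of lineages when both are read on a stem-based tree, consider a vertex (a splitting event) and the edge (a lineage) entering it. Under a hypothetical edge label to (tail, head) layout, the edges strictly below a vertex form a proper but non-planted subtree, whereas adding the entering edge gives a planted subtree. This is one possible reading of the definitions, not a rendering of the PhyloCode's own procedure; the helper names are illustrative only.

```python
from typing import Dict, Set, Tuple

StemTree = Dict[str, Tuple[int, int]]  # edge label -> (tail vertex, head vertex); root is 0


def edges_below_vertex(stem: StemTree, vertex: int) -> Set[str]:
    """Edges descending from a vertex, not counting the edge that enters it."""
    found: Set[str] = set()
    frontier = [vertex]
    while frontier:
        v = frontier.pop()
        for label, (tail, head) in stem.items():
            if tail == v:
                found.add(label)
                frontier.append(head)
    return found


def edges_from_edge(stem: StemTree, edge: str) -> Set[str]:
    """The planted subtree: the given edge plus everything below its head vertex."""
    return {edge} | edges_below_vertex(stem, stem[edge][1])
```

In the hypothetical tree `{"z": (0, 5), "y": (5, 4), "C": (5, 3), "A": (4, 1), "B": (4, 2)}`, the vertex at the end of edge y yields {A, B}, which omits the ancestral lineage y itself, while the planted subtree starting at edge y yields {y, A, B}.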

We argue that this distinction between node-based and branch-based (= stem-based) concepts of monophyly arises from confusion between the two types of trees we have discussed. This is not intended as a critique of the entirety of the PhyloCode, but rather is provided as an example of how being explicit regarding graphical models can provide clarity to discussions of biological concepts. Indeed, given the discussion of these tree models above and adopting Hennig's (

It is worth examining what happens if we apply Definitions 1 and 2 to the wrong kinds of trees. First, a “node-based clade” of a stem-based tree—speaking mathematically, a proper but non-planted subtree of a stem-based tree—does

Practicing “tree-thinkers” may easily make the mental conversion between node-based and stem-based trees. By explicitly showing that these tree models are mathematically equivalent, we aim to bring clarity to discussions of the biological meaning of phylogenies. It is important to be specific about which of these two distinct representations of trees is in use. During the latter half of the twentieth century, phylogenies changed from being essentially cartoons to being graphical representations of the results of an analysis of data (typically represented in a matrix). We argue that biological concepts relating to a phylogeny inferred from an analysis of data should be discussed in a context consistent with the graphical model used to display the results of that analysis. To our knowledge, most evolutionary biologists do not construct estimates of phylogenetic relationships based on mathematical models in which transformations of characters occur at

The authors have declared that no competing interests exist.

EOW thanks the late David Hull (Northwestern University) for sending a copy of a manuscript that he never published entitled “Hierarchies and Hierarchies” that touched upon the problems associated with process/pattern and tree/cladogram controversies, and for what must have seemed to him hours of discussion on things phylogenetic and philosophical regarding the subject. We also thank Shannon DeVaney (Los Angeles County Museum) and Mark Holder (University of Kansas) for reading the manuscript and providing a critical review.