Despite the prominence of “tree-thinking” among contemporary systematists and evolutionary biologists, the biological meaning of different mathematical representations of phylogenies may still be muddled. We compare two basic kinds of discrete mathematical models used to portray phylogenetic relationships among species and higher taxa: stem-based trees and node-based trees. Each model is a tree in the sense commonly used in mathematics; the difference between them lies in the biological interpretation of their vertices and edges. Stem-based and node-based trees carry exactly the same information, and the biological interpretation of each is similar. Translation between the two kinds of trees can be accomplished by a simple algorithm, which we provide. With the mathematical representation of stem-based and node-based trees clarified, we argue for a distinction between types of trees and types of names. Node-based and stem-based trees contain exactly the same information for naming clades. However, evolutionary concepts such as monophyly are represented by different mathematical substructures in the two models. For a given stem-based tree one should employ stem-based names, and for a given node-based tree one should use node-based names. Applying a node-based name to a stem-based tree is not logical, because node-based names cannot exist on a stem-based tree, and vice versa. Authors may apply both node-based and stem-based concepts of monophyly to the same representation of a phylogeny, but if they do, they must recognize that such a representation differs from the graphical models used for computation in phylogenetic systematics.
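The translation between the two models can be illustrated with a minimal sketch. This is not the authors' algorithm, which is not reproduced here; it assumes a node-based tree in which every vertex is a taxon (internal vertices standing for ancestral taxa) and a stem-based tree in which every edge is a taxon, with one extra, hypothetical "origin" vertex at the base of the root stem. Under those assumptions the translation simply swaps vertices for edges. The sketch also shows the point about monophyly: a node-based clade is a vertex plus its descendant vertices, whereas a stem-based clade is an edge plus every edge below it, yet both substructures pick out the same set of taxa. The dictionary encodings, the `end:<taxon>` vertex labels, and all function names are illustrative choices.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical encodings chosen for this sketch.
NodeTree = Dict[str, Optional[str]]    # taxon (vertex) -> ancestral taxon, None for the root
StemTree = Dict[str, Tuple[str, str]]  # taxon (edge) -> (start vertex, end vertex)

ORIGIN = "origin"  # assumed extra vertex at the base of the root stem


def node_to_stem(tree: NodeTree) -> StemTree:
    """Turn every vertex into an edge; each stem ends at a new vertex 'end:<taxon>'."""
    return {
        taxon: (ORIGIN if parent is None else f"end:{parent}", f"end:{taxon}")
        for taxon, parent in tree.items()
    }


def stem_to_node(tree: StemTree) -> NodeTree:
    """Turn every edge into a vertex; a stem's parent is the stem ending where it starts."""
    ending_at = {end: taxon for taxon, (_, end) in tree.items()}
    return {taxon: ending_at.get(start) for taxon, (start, _) in tree.items()}


def node_clade(tree: NodeTree, taxon: str) -> FrozenSet[str]:
    """Node-based clade: the vertex `taxon` together with all of its descendant vertices."""
    children: Dict[Optional[str], List[str]] = {}
    for t, parent in tree.items():
        children.setdefault(parent, []).append(t)
    clade, stack = set(), [taxon]
    while stack:
        v = stack.pop()
        clade.add(v)
        stack.extend(children.get(v, []))
    return frozenset(clade)


def stem_clade(tree: StemTree, taxon: str) -> FrozenSet[str]:
    """Stem-based clade: the edge `taxon` together with every edge below its end vertex."""
    leaving: Dict[str, List[str]] = {}
    for t, (start, _) in tree.items():
        leaving.setdefault(start, []).append(t)
    clade, stack = set(), [taxon]
    while stack:
        e = stack.pop()
        clade.add(e)
        stack.extend(leaving.get(tree[e][1], []))
    return frozenset(clade)


# Example: anc1 and anc2 are hypothetical ancestral taxa; A, B, C are terminal taxa.
node_tree: NodeTree = {"anc1": None, "anc2": "anc1", "A": "anc2", "B": "anc2", "C": "anc1"}
stem_tree = node_to_stem(node_tree)
assert stem_to_node(stem_tree) == node_tree
for t in node_tree:
    assert node_clade(node_tree, t) == stem_clade(stem_tree, t)
print(sorted(stem_clade(stem_tree, "anc2")))  # ['A', 'B', 'anc2']
```

The round trip recovers the original node-based tree, and every clade has the same taxon content in both models even though it is a different substructure (a rooted subtree of vertices versus a set of edges) in each.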
We have assembled a collection of web pages that contain benchmark datasets and software tools for evaluating the accuracy and scalability of computational methods for estimating evolutionary relationships. They provide the scientific community with a resource for developing new alignment and tree inference methods and testing them on very difficult datasets. The datasets are intended to help address three problems: multiple sequence alignment, phylogeny estimation from aligned sequences, and supertree estimation. Datasets from our work include empirical datasets with carefully curated alignments, suitable for testing alignment and phylogenetic methods for large-scale systematics studies. Links to other empirical datasets, which lack curated alignments, are also provided. We also include simulated datasets with properties typical of large-scale systematics studies, including high rates of substitution and indels, together with the true alignment and tree for each simulated dataset. Finally, we provide links to software tools for generating simulated datasets and for evaluating the accuracy of alignments and trees estimated on these datasets. We welcome contributions to the benchmark datasets from other researchers.
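Because each simulated dataset comes with its true tree, an estimated tree can be scored by comparing bipartitions (internal edges). The sketch below computes the false negative rate (true bipartitions missing from the estimate) and the false positive rate (estimated bipartitions absent from the true tree), two commonly reported criteria. It is an illustration only, not one of the linked evaluation tools; it assumes both trees are on the same taxon set and uses a hypothetical nested-tuple encoding rather than Newick input. Bipartitions are treated as unrooted, so each is stored as the side that excludes a fixed reference taxon.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical tree format: a leaf is a taxon name, an internal node is a tuple of subtrees.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions, each stored as the side without a reference taxon."""
    taxa = leaf_set(tree)
    ref = min(taxa)
    splits: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = taxa - below if ref in below else below
        if 1 < len(side) < len(taxa) - 1:  # skip trivial splits and the empty root split
            splits.add(side)
        return below

    visit(tree)
    return splits


def error_rates(true_tree: Tree, est_tree: Tree) -> Tuple[float, float]:
    """Return (false negative rate, false positive rate) of est_tree against true_tree."""
    true_splits, est_splits = bipartitions(true_tree), bipartitions(est_tree)
    fn = len(true_splits - est_splits) / len(true_splits) if true_splits else 0.0
    fp = len(est_splits - true_splits) / len(est_splits) if est_splits else 0.0
    return fn, fp


true_tree = ((("A", "B"), "C"), ("D", "E"))
est_tree = ((("A", "C"), "B"), ("D", "E"))
print(error_rates(true_tree, est_tree))  # (0.5, 0.5)
```

For fully resolved trees on the same taxa the two rates coincide, and their sum of raw counts is the Robinson-Foulds distance; they differ when the estimated tree contains polytomies.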