Red algae comprise an anciently diverged, species-rich phylum with morphologies that span unicells to large seaweeds. Here, leveraging a rich red algal genome and transcriptome dataset, we used 298 single-copy orthologous nuclear genes from 15 red algal species to erect a robust multi-gene phylogeny of Rhodophyta. This tree places red seaweeds (Bangiophyceae and Florideophyceae) at the base of the mesophilic red algae with the remaining non-seaweed mesophilic lineages forming a well-supported sister group. The early divergence of seaweeds contrasts with the evolution of multicellular land plants and brown algae that are nested among multiple, unicellular or filamentous sister lineages. Using this novel perspective on red algal evolution, we studied the evolution of the pathways for isoprenoid biosynthesis. This analysis revealed losses of the mevalonate pathway on at least three separate occasions in lineages that contain Cyanidioschyzon, Porphyridium, and Chondrus. Our results establish a framework for in-depth studies of the origin and evolution of genes and metabolic pathways in Rhodophyta.
The ancestor of Paulinella chromatophora established a symbiotic relationship with cyanobacteria related to the Prochlorococcus/Synechococcus clade. This event has been described as a second primary endosymbiosis leading to a plastid in the making. Based on the rate of pseudogene disintegration in the endosymbiotic bacterium Buchnera aphidicola, it was suggested that the chromatophore in P. chromatophora has a minimum age of ~60 Myr. Here we revisit this estimate by using a lognormal relaxed molecular clock on the 18S rRNA of P. chromatophora. Our time estimates show that, depending on the assumptions made to calibrate the molecular clock, P. chromatophora diverged from heterotrophic Paulinella spp. ~90 to 140 Myr ago, thus establishing a maximum date for the origin of the chromatophore.
Recently developed molecular methods enable geneticists to target and sequence thousands of orthologous loci and infer evolutionary relationships across the tree of life. Large numbers of genetic markers benefit species tree inference but visual inspection of alignment quality, as traditionally conducted, is challenging with thousands of loci. Furthermore, due to the impracticality of repeated visual inspection with alternative filtering criteria, the potential consequences of using datasets with different degrees of missing data remain nominally explored in most empirical phylogenomic studies. In this short communication, I describe a flexible high-throughput pipeline designed to assess alignment quality and filter exonic sequence data for subsequent inference. The stringency criteria for alignment quality and missing data can be adapted based on the expected level of sequence divergence. Each alignment is automatically evaluated based on the stringency criteria specified, significantly reducing the number of alignments that require visual inspection. By developing a rapid method for alignment filtering and quality assessment, the consistency of phylogenetic estimation based on exonic sequence alignments can be further explored across distinct inference methods, while accounting for different degrees of missing data.
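The kind of automated filtering described above can be illustrated with a minimal sketch. It is not the pipeline reported in the abstract: the function names, the set of characters counted as missing, and the default thresholds (`max_missing_per_seq`, `min_taxa`) are all assumptions chosen for illustration.

```python
def missing_fraction(seq, missing="-?Nn"):
    """Fraction of characters in one aligned sequence that are gaps or ambiguous."""
    if not seq:
        return 1.0
    return sum(ch in missing for ch in seq) / len(seq)


def filter_alignment(alignment, max_missing_per_seq=0.5, min_taxa=4):
    """Drop sequences with too much missing data, then judge the locus.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Returns (kept, passed): the retained sequences and whether enough
    taxa remain for the locus to be used without visual inspection.
    Both thresholds are hypothetical stringency settings.
    """
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("sequences in an alignment must have equal length")
    kept = {
        taxon: seq
        for taxon, seq in alignment.items()
        if missing_fraction(seq) <= max_missing_per_seq
    }
    return kept, len(kept) >= min_taxa


# Example: taxon C is entirely missing and is removed.
example = {"A": "ATGC", "B": "AT--", "C": "----", "D": "ATGN"}
kept, passed = filter_alignment(example, max_missing_per_seq=0.5, min_taxa=3)
```

Rerunning the same function with stricter or looser thresholds gives alternative datasets with different degrees of missing data, which is the comparison the abstract argues is impractical by eye.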
Incomplete lineage sorting (ILS), modelled by the multi-species coalescent, is a process that results in a gene tree being different from the species tree. Because ILS is expected to occur for at least some loci within genome-scale analyses, the evaluation of species tree estimation methods in the presence of ILS is of great interest. Performance on simulated and biological data has suggested that concatenation analyses can result in the wrong tree with high support under some conditions, and a recent theoretical result by Roch and Steel proved that concatenation using unpartitioned maximum likelihood analysis can be statistically inconsistent in the presence of ILS. In this study, we survey the major species tree estimation methods, including the newly proposed “statistical binning” methods, and discuss their theoretical properties. We also note that there are two interpretations of the term “statistical consistency”, and discuss the theoretical results proven under both interpretations.
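As a small worked illustration of how ILS produces gene tree discordance, consider the standard three-taxon case of the multi-species coalescent. For a rooted species tree ((A,B),C) whose internal branch has length T in coalescent units, the gene tree matches the species tree with probability 1 - (2/3)e^{-T}, and each of the two discordant topologies has probability (1/3)e^{-T}. The sketch below only evaluates these formulas; the function name is hypothetical and nothing here is taken from the methods surveyed in the abstract.

```python
import math


def gene_tree_probs(t):
    """Three-taxon gene tree probabilities under the multi-species coalescent.

    t: length of the internal species-tree branch in coalescent units.
    Returns the probability of the concordant topology and the probability
    of each of the two discordant topologies.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant_each = math.exp(-t) / 3.0
    return {
        "concordant": 1.0 - 2.0 * discordant_each,
        "discordant_each": discordant_each,
    }


# Short internal branches make all three topologies nearly equally likely.
for t in (0.0, 0.5, 2.0):
    print(t, gene_tree_probs(t))
```

With T = 0 the three topologies are equally probable, so a short internal branch leaves individual loci with little information about the species tree.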
With the ever-increasing availability of phylogenetically informative data, the last decade has seen an upsurge of ecological studies incorporating information on evolutionary relationships among species. However, detailed species-level phylogenies, which are necessary for comprehensive large-scale eco-phylogenetic analyses, are still lacking for many large groups and regions. Here, we provide a dataset of 100 dated phylogenetic trees for all European tetrapods based on a mixture of supermatrix and supertree approaches. Phylogenetic inference was performed separately for each of the main Tetrapoda groups of Europe except mammals (i.e. amphibians, birds, squamates and turtles) by means of maximum likelihood (ML) analyses of a supermatrix, applying a tree constraint at the family (amphibians and squamates) or order (birds and turtles) level based on consensus knowledge. For each group, we inferred 100 ML trees to provide a phylogenetic dataset that accounts for phylogenetic uncertainty, and assessed node support with bootstrap analyses. Each tree was dated using penalized likelihood and fossil calibration. The trees obtained were well supported by existing knowledge and previous phylogenetic studies. For mammals, we modified the most complete supertree dataset available in the literature to include a recent update of the Carnivora clade. As a final step, we merged the phylogenetic trees of all groups to obtain a set of 100 phylogenetic trees for all European Tetrapoda species for which data were available (91%). We provide this phylogenetic dataset (100 chronograms) for comparative analyses and for macro-ecological or community ecology studies that aim to incorporate phylogenetic information while accounting for phylogenetic uncertainty.
More than 2,500 species of copepods (Class Maxillopoda; Subclass Copepoda) occur in the marine planktonic environment. The exceptional morphological conservation of the group, with numerous sibling species groups, makes the identification of species challenging, even for expert taxonomists. Molecular approaches to species identification have allowed rapid detection, discrimination, and identification of species based on DNA sequencing of single specimens and environmental samples. Despite the recent development of diverse genetic and genomic markers, the barcode region of the mitochondrial cytochrome c oxidase subunit I (COI) gene remains a useful and – in some cases – unequaled diagnostic character for species-level identification of copepods. This study reports 800 new barcode sequences for 63 copepod species not included in any previous study and examines the reliability and resolution of diverse statistical approaches to species identification based upon a dataset of 1,381 barcode sequences for 195 copepod species. We explore the impact of missing data (i.e., species not represented in the barcode database) on the accuracy and reliability of species identifications. Among the tested approaches, the best close match analysis resulted in accurate identification of all individuals to species, with no errors (false positives), and out-performed automated tree-based or BLAST-based analyses. This comparative analysis yields new understanding of the strengths and weaknesses of DNA barcoding and confirms the value of DNA barcodes for species identification of copepods, including both individual specimens and bulk samples. Continued integrative morphological-molecular taxonomic analysis is needed to produce a taxonomically comprehensive database of barcode sequences for all species of marine copepods.
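The best close match criterion mentioned above can be sketched as follows. This is an illustration of the general idea as it is commonly described, not the software or settings used in the study. The uncorrected p-distance, the 3% threshold, and the function names are assumptions. A query is assigned to the species of its nearest reference barcode only if that distance falls within the threshold and all equally near references belong to one species.

```python
def p_distance(a, b):
    """Uncorrected pairwise distance over sites where both sequences are A/C/G/T."""
    valid = "ACGT"
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in valid and y in valid]
    if not pairs:
        return 1.0  # no comparable sites: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def best_close_match(query, reference, threshold=0.03):
    """Identify a query barcode against a reference library.

    reference: list of (species_name, aligned_sequence) tuples.
    Returns a species name, "ambiguous" (nearest matches span several
    species), or "no match" (nearest match is beyond the threshold).
    """
    if not reference:
        return "no match"
    distances = [(p_distance(query, seq), species) for species, seq in reference]
    best = min(d for d, _ in distances)
    if best > threshold:
        return "no match"
    nearest_species = {species for d, species in distances if d == best}
    if len(nearest_species) == 1:
        return nearest_species.pop()
    return "ambiguous"
```

The "no match" outcome is what lets this criterion handle species missing from the barcode database: a query from an unrepresented species is rejected rather than forced onto its nearest neighbour.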
Our knowledge of the avian tree of life remains uncertain, particularly at deeper levels, owing to the rapid diversification of birds early in their evolutionary history. Birds are the most abundant land vertebrates on the planet and have been of great historical interest to systematists. They are also economically and ecologically important and, as a result, are intensively studied; yet despite their importance and interest to humans, around 13% of taxa are currently on the endangered species list, perhaps as a result of human activity. Despite all this, no comprehensive phylogeny that includes both extinct and extant species currently exists. Here we present a species-level supertree of Aves, constructed using the Matrix Representation with Parsimony method, containing approximately two thirds of all species from nearly 1000 source phylogenies with a broad taxonomic coverage. The source data for the tree were collected and processed according to a strict protocol to ensure robust and accurate data handling. The resulting tree topology is largely consistent with molecular hypotheses of avian phylogeny. We identify areas that are in broad agreement with current views on avian systematics and also those that require further work. We also highlight the need for leaf-based support measures to enable the identification of rogue taxa in supertrees. This is a first attempt at a supertree of both extinct and extant birds. It is not intended to be utilised in an overhaul of avian systematics or as a basis for taxonomic re-classification, but it provides a strong basis for further studies on macroevolution, conservation, biodiversity, comparative biology and character evolution. In particular, the inclusion of fossils will allow the study of bird evolution and diversification throughout deep time.
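The matrix coding step of Matrix Representation with Parsimony can be sketched in a few lines, assuming the standard Baum-Ragan coding. Each clade of each source tree becomes one binary character: taxa inside the clade are scored 1, other taxa in that source tree are scored 0, and taxa absent from that source tree are scored as missing (?). The nested-tuple tree representation and the function names are assumptions for illustration. The parsimony search on the resulting matrix is left out, and the sketch says nothing about the protocol actually used for the avian supertree.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Yield the leaf set of every internal node below the root."""
    if isinstance(tree, str):
        return
    for child in tree:
        if not isinstance(child, str):
            yield leaves(child)
            yield from clades(child)


def mrp_matrix(source_trees):
    """Build a Baum-Ragan matrix: dict of taxon -> string over '0', '1', '?'."""
    all_taxa = sorted(set().union(*(leaves(t) for t in source_trees)))
    matrix = {taxon: "" for taxon in all_taxa}
    for tree in source_trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    matrix[taxon] += "?"  # taxon not sampled in this source tree
                elif taxon in clade:
                    matrix[taxon] += "1"
                else:
                    matrix[taxon] += "0"
    return matrix


# Two small rooted source trees with partially overlapping taxa.
trees = [(("A", "B"), "C"), (("B", "D"), "A")]
matrix = mrp_matrix(trees)
```

The missing-data symbol is what allows source trees with different taxon sets to be combined in a single matrix.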
The tree of life of fishes is in a state of flux because we still lack a comprehensive phylogeny that includes all major groups. The situation is most critical for a large clade of spiny-finned fishes, traditionally referred to as percomorphs, whose uncertain relationships have plagued ichthyologists for over a century. Most of what we know about the higher-level relationships among fish lineages has been based on morphology, but a rapid influx of molecular studies is changing many established systematic concepts. We report a comprehensive molecular phylogeny for bony fishes that includes representatives of all major lineages. DNA sequence data for 21 molecular markers (one mitochondrial and 20 nuclear genes) were collected for 1410 bony fish taxa, plus four tetrapod species and two chondrichthyan outgroups (total 1416 terminals). Bony fish diversity is represented by 1093 genera, 369 families, and all traditionally recognized orders. The maximum likelihood tree provides unprecedented resolution and high bootstrap support for most backbone nodes, defining for the first time a global phylogeny of fishes. The general structure of the tree is in agreement with expectations from previous morphological and molecular studies, but significant new clades arise. Most interestingly, the high degree of uncertainty among percomorphs is now resolved into nine well-supported supraordinal groups. The order Perciformes, considered by many a polyphyletic taxonomic waste basket, is defined for the first time as a monophyletic group in the global phylogeny. A new classification that reflects our phylogenetic hypothesis is proposed to facilitate communication about the newly found structure of the tree of life of fishes. Finally, the molecular phylogeny is calibrated using 60 fossil constraints to produce a comprehensive time tree.
The new time-calibrated phylogeny will provide the basis for and stimulate new comparative studies to better understand the evolution of the amazing diversity of fishes.
Over half of all vertebrates are “fishes”, which exhibit enormous diversity in morphology, physiology, behavior, reproductive biology, and ecology. Investigation of fundamental areas of vertebrate biology depends critically on a robust phylogeny of fishes, yet evolutionary relationships among the major actinopterygian and sarcopterygian lineages have not been conclusively resolved. Although a consensus phylogeny of teleosts has been emerging recently, it has been based on analyses of various subsets of actinopterygian taxa, but not on a full sample of all bony fishes. Here we conducted a comprehensive phylogenetic study on a broad taxonomic sample of 61 actinopterygian and sarcopterygian lineages (with a chondrichthyan outgroup) using a molecular data set of 21 independent loci. These data yielded a resolved phylogenetic hypothesis for extant Osteichthyes, including 1) reciprocally monophyletic Sarcopterygii and Actinopterygii, as currently understood, with polypteriforms as the first diverging lineage within Actinopterygii; 2) a monophyletic group containing gars and bowfin (= Holostei) as sister group to teleosts; and 3) the earliest diverging lineage among teleosts being Elopomorpha, rather than Osteoglossomorpha. Relaxed-clock dating analysis employing a set of 24 newly applied fossil calibrations reveals divergence times that are more consistent with paleontological estimates than previous studies. Establishing a new phylogenetic pattern with accurate divergence dates for bony fishes illustrates several areas where the fossil record is incomplete and provides critical new insights on diversification of this important vertebrate group.
Most statistical methods for phylogenetic estimation in use today treat a gap (generally representing an insertion or deletion, i.e., indel) within the input sequence alignment as missing data. However, the statistical properties of this treatment of indels have not been fully investigated.
We prove that maximum likelihood phylogeny estimation, treating indels as missing data, can be statistically inconsistent for a general (and rather simple) model of sequence evolution, even when given the true alignment. Therefore, accurate phylogeny estimation cannot be guaranteed for maximum likelihood analyses, even given arbitrarily long sequences, when indels are present and treated as missing data.
Our result shows that the standard statistical techniques used to estimate phylogenies from sequence alignments may have unfavorable statistical properties, even when the sequence alignment is accurate and the assumed substitution model matches the generation model. This suggests that the recent research focus on developing statistical methods that treat indel events properly is an important direction for phylogeny estimation.
Comparative oncology aims at speeding up developments for both human and companion animal cancer patients. Following this line, carcinoembryonic antigen (CEA, CEACAM5) could be a therapeutic target not only for human but also for canine (Canis lupus familiaris; dog) patients. CEACAM5 interacts with CEA-receptor (CEAR) in the cytoplasm of human cancer cells. Our aim was, therefore, to phylogenetically verify the antigenic relationship of CEACAM molecules and CEAR in human and canine cancer.
Anti-human CEACAM5 antibody Col-1, previously applied for cancer diagnosis in dogs, reacted immunohistochemically with 23 out of 30 canine mammary cancer samples. In immunoblot analyses, Col-1 specifically detected human CEACAM5 at 180 kDa in human colon cancer cells HT29, and the canine antigen at 60, 120, or 180 kDa in CF33 and CF41 mammary carcinoma cells as well as in spontaneous mammary tumors. While according to phylogeny canine CEACAM1 molecules should be most closely related to human CEACAM5, Col-1 did not react with canine CEACAM1, -23, -24, -25, -28 or -30 transfected into canine TLM-1 cells. By flow cytometry, the Col-1 target molecule was localized intracellularly in canine CF33 and CF41 cells, in contrast to the membranous and cytoplasmic expression of human CEACAM5 in HT29. Col-1 incubation had no effect on either canine or human cancer cell proliferation. Yet, Col-1 treatment decreased AKT-phosphorylation in canine CF33 cells, possibly suggestive of anti-apoptotic function, whereas Col-1 increased AKT-phosphorylation in human HT29 cells. We further report a 99% amino acid similarity of human and canine CEA receptor (CEAR) within the phylogenetic tree. CEAR could be detected in four canine cancer cell lines by immunoblot and intracellularly in 10 out of 10 canine mammary cancer specimens by immunohistochemistry. Whether the specific canine Col-1 target molecule may, as a functional analogue of human CEACAM5, act as a ligand of canine CEAR remains to be defined. This study demonstrates the limitations of comparative oncology due to the complex functional evolution of the different CEACAM molecules in humans versus dogs. In contrast, CEAR may be a comprehensive interspecies target for novel cancer therapeutics.
Neotropical Vaccinioideae (Ericaceae) are evolutionarily rather young and presumably of Northern Hemisphere origin. Vaccinioideae are highly dependent on their mycorrhizal symbionts and Sebacinales (basidiomycetes) were previously found to be the dominant mycobionts of Andean Clade Vaccinioideae (Neotropical Vaccinieae). We were interested to see whether the North American Vaccinioideae reached the Neotropics with their mycobionts or whether they acquired new, local Sebacinales.
We investigated Sebacinales of 58 individuals of Vaccinioideae from Ecuador, Panama and North America to examine whether mycobionts of each region are distantly or closely related.
We isolated the ITS of the ribosomal nuclear DNA in order to infer a molecular phylogeny of Sebacinales and to determine Molecular Operational Taxonomic Units (MOTUs). MOTU delimitation was based on a 3% threshold of ITS variability and conducted with complete linkage clustering. The analyses revealed that most Sebacinales from Ecuador, Panama and North America are closely related and that two MOTUs out of 33 have a distribution ranging from the Neotropics to the Pacific Northwest of North America. The data suggest that Neotropical and temperate Vaccinioideae of North America share their Sebacinales communities and that plants and fungi migrated together.
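The MOTU delimitation step (complete linkage clustering at a 3% threshold) can be illustrated with a minimal pure-Python sketch. It assumes a precomputed symmetric matrix of pairwise ITS distances and treats the threshold as inclusive. The function name and the toy distances are hypothetical, and this is not the software used in the study.

```python
def complete_linkage_motus(dist, threshold=0.03):
    """Group sequences into MOTUs by agglomerative complete linkage.

    dist: symmetric list-of-lists of pairwise distances (proportions).
    Two clusters are merged only while the largest pairwise distance
    between their members stays at or below the threshold.
    Returns a sorted list of clusters, each a sorted list of indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: the farthest pair defines the cluster distance.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break  # no remaining pair of clusters is within the threshold
        _, x, y = best
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return sorted(clusters)


# Toy distances: sequences 0 and 1 differ by 1%, but 1 and 2 differ by 4%.
toy = [
    [0.00, 0.01, 0.02, 0.10],
    [0.01, 0.00, 0.04, 0.10],
    [0.02, 0.04, 0.00, 0.10],
    [0.10, 0.10, 0.10, 0.00],
]
motus = complete_linkage_motus(toy)
```

In the toy matrix, sequence 2 is within 3% of sequence 0 but not of sequence 1, so complete linkage keeps it in its own MOTU. Single linkage would have merged all three.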
Phylogenies of multi-domain proteins have to incorporate macro-evolutionary events, which dramatically increases the complexity of their construction.
We present an application to infer ancestral multi-domain proteins given a species tree and domain phylogenies. As the individual domain phylogenies are often incongruent, we provide diagnostics for the identification and reconciliation of implausible topologies. We implement and extend a suggested algorithmic approach by Behzadi and Vingron (2006).
The phytopathogenic genus Xanthomonas comprises numerous species and pathovars described primarily on their host and tissue specificities. Stenotrophomonas maltophilia, which is non-phytopathogenic and taxonomically closely related to Xanthomonas, has undergone several reclassifications, from Pseudomonas to Xanthomonas and finally to Stenotrophomonas. In this study, we have investigated the phylogenetic and taxonomic status of these members using the complete RNA polymerase beta-subunit (rpoB) gene sequences available from their sequenced genomes. Not only did we obtain a phylogenetic tree for xanthomonads, but rpoB gene sequence information has also resolved the taxonomic relationship of X. axonopodis pathovars, X. albilineans and other Xanthomonas strains, with the most marked evidence being that Stenotrophomonas is synonymous with Xanthomonas. This study has revealed the power and potential of the complete rpoB gene sequence in taxonomic, phylogenetic and evolutionary studies on the Xanthomonas and Stenotrophomonas generic complex.
Several naturally occurring hybrids in Potentilla (Rosaceae) have been reported, but no molecular evidence has so far been available to test these hypotheses of hybridization. We have compared a nuclear and a chloroplast gene tree to identify topological incongruences that may indicate hybridization events in the genus. Furthermore, the monophyly and phylogenetic position of the proposed segregated genera Argentina, Ivesia and Horkelia have been tested. The systematic signal from two morphological characters, style shape and anther shape, has also been investigated by ancestral state reconstruction, to elucidate how well these characters concur with the results of the molecular phylogenies.
Six major clades, Anserina, Alba, Fragarioides, Reptans, ivesioid and Argentea, have been identified within genus Potentilla. Horkelia, Ivesia and Horkeliella (the ivesioid clade), form a monophyletic group nested within Potentilla. Furthermore, the origin of the proposed segregated genus Argentina (the Anserina clade) is uncertain but not in conflict with a new generic status of the group. We also found style morphology to be an informative character that reflects the phylogenetic relationships within Potentilla. Five well-supported incongruences were found between the nuclear and the chloroplast phylogenies, and three of these involved polyploid taxa. However, further investigations, using low copy molecular markers, are required to infer the phylogeny of these species and to test the hypothesis of hybrid origin.
Bombyliidae (~5000 sp.), or bee flies, are a lower brachyceran family of flower-visiting flies that, as larvae, act as parasitoids of other insects. The evolutionary relationships are known from a morphological analysis that yielded minimal support for higher-level groupings. We use the protein-coding gene CAD and 28S rDNA to determine phylogeny and to test the monophyly of existing subfamilies, the division Tomophthalmae, and ‘the sand chamber subfamilies’. Additionally, we demonstrate that consensus networks can be used to identify rogue taxa in a Bayesian framework. Pruning rogue taxa post-analysis from the final tree distribution results in increased posterior probabilities. We find 8 subfamilies to be monophyletic and the subfamilies Heterotropinae and Mythicomyiinae to be the earliest diverging lineages. The large subfamily Bombyliinae is found to be polyphyletic and our data do not provide evidence for the monophyly of Tomophthalmae or the ‘sand chamber subfamilies’.
The NCBI Taxonomy underpins many bioinformatics and phyloinformatics databases, but by itself provides limited information on the taxa it contains. One readily available source of information on many taxa is Wikipedia. This paper describes iPhylo Linkout, a Semantic wiki that maps taxa in NCBI’s taxonomy database onto corresponding pages in Wikipedia. Storing the mapping in a wiki makes it easy to edit, correct, or otherwise annotate the links between NCBI and Wikipedia. The mapping currently comprises some 53,000 taxa, and is available at http://iphylo.org/linkout. The links between NCBI and Wikipedia are also made available to NCBI users through the NCBI LinkOut service.
Fruit bats of the genus Pteropus occur throughout the Austral-Asian region west to islands off the eastern coast of Africa. Recent phylogenetic analyses of Pteropus from the western Indian Ocean found low sequence divergence and poor phylogenetic resolution among several morphologically defined species. We reexamine the phylogenetic relationships of these taxa by using multiple individuals per species. In addition, we estimate population genetic structure in two well-sampled taxa occurring on Madagascar and the Comoro Islands (P. rufus and P. seychellensis comorensis). Despite finding a similar pattern of low sequence divergence among species, increased sampling provides insight into the phylogeographic history of western Indian Ocean Pteropus, uncovering high levels of gene flow within species.
The rapid increase in genomic and genome-scale data is resulting in unprecedented levels of discrete sequence data available for phylogenetic analyses. Major analytical impasses exist, however, prior to analyzing these data with existing phylogenetic software. Obstacles include the management of large data sets without standardized naming conventions, identification and filtering of orthologous clusters of proteins or genes, and the assembly of alignments of orthologous sequence data into individual and concatenated super alignments. Here we report the production of an automated pipeline, Hal, which produces multiple alignments and trees from genomic data. These alignments can be produced by a choice of four alignment programs and analyzed by a variety of phylogenetic programs. In short, the Hal pipeline connects the programs BLASTP, MCL, user-specified alignment programs, GBlocks, ProtTest and user-specified phylogenetic programs to produce species trees. The script is available at SourceForge (http://sourceforge.net/projects/bio-hal/). The results from an example analysis of Kingdom Fungi are briefly discussed.
Despite their obvious utility, detailed species-level phylogenies are lacking for many groups, including several major mammalian lineages such as bats. Here we provide a cytochrome b genealogy of over 50% of bat species (648 terminal taxa). Based on prior analyses of related mammal groups, cytb emerges as a particularly reliable phylogenetic marker, and given that our results are broadly congruent with prior knowledge, the phylogeny should be a useful tool for comparative analyses. Nevertheless, we stress that a single-gene analysis of such a large and old group cannot be interpreted as more than a crude estimate of the bat species tree. Analysis of the full dataset supports the traditional division of bats into macro- and microchiroptera, but not the recently proposed division into Yinpterochiroptera and Yangochiroptera. However, our results only weakly reject the former and strongly support the latter group, and furthermore, a time-calibrated analysis of a pruned dataset where most included taxa have the entire 1140 bp cytb sequence finds monophyletic Yinpterochiroptera. Most bat families and many higher-level groups are supported; however, relationships among families are in general weakly supported, as are many of the deeper nodes of the tree. The exceptions are in most cases apparently due to the misplacement of species with little available data, while in a few cases the results suggest putative problems with current classification, such as the non-monophyly of Mormoopidae. We provide this phylogenetic hypothesis, and an analysis of divergence times, as tools for evolutionary and ecological studies that will be useful until more inclusive studies using multiple loci become available.
Over the last decade, dramatic advances have been made in developing methods for large-scale phylogeny estimation, so that it is now feasible for investigators with moderate computational resources to obtain reasonable solutions to maximum likelihood and maximum parsimony, even for datasets with a few thousand sequences. There has also been progress on developing methods for multiple sequence alignment, so that greater alignment accuracy (and subsequent improvement in phylogenetic accuracy) is now possible through automated methods. However, these methods have not been tested under conditions that reflect properties of datasets confronted by large-scale phylogenetic estimation projects. In this paper we report on a study that compares several alignment methods on a benchmark collection of nucleotide sequence datasets of up to 78,132 sequences. We show that as the number of sequences increases, the number of alignment methods that can analyze the datasets decreases. Furthermore, the most accurate alignment methods are unable to analyze the very largest datasets we studied, so that only moderately accurate alignment methods can be used on the largest datasets. As a result, alignments computed for large datasets have relatively large error rates, and maximum likelihood phylogenies computed on these alignments also have high error rates. Therefore, the estimation of highly accurate multiple sequence alignments is a major challenge for Tree of Life projects, and more generally for large-scale systematics studies.
Grammitid ferns are a well-supported clade of ~900 primarily tropical epiphytic species. Recent phylogenetic studies have found support for a distinctive, geographically diverse group of 24 species referred to as the Lellingeria myosuroides clade and have provided evidence for a variety of phylogenetic relationships within the group, as well as hypotheses of historical processes that have produced current biogeographical patterns. We present new data and analyses that support the following primary conclusions: 1) the L. myosuroides clade is monophyletic and pantropical; 2) that clade is sister to a more species-rich clade of entirely Neotropical species (Lellingeria s.s.); 3) there were two independent dispersal events from the Neotropics to Pacific islands, five independent dispersal events from the Neotropics to the Paleotropics, and two separate dispersal events from mainland tropical America to the West Indies.
Despite the prominence of “tree-thinking” among contemporary systematists and evolutionary biologists, the biological meaning of different mathematical representations of phylogenies may still be muddled. We compare two basic kinds of discrete mathematical models used to portray phylogenetic relationships among species and higher taxa: stem-based trees and node-based trees. Each model is a tree in the sense that is commonly used in mathematics; the difference between them lies in the biological interpretation of their vertices and edges. Stem-based and node-based trees carry exactly the same information and the biological interpretation of each is similar. Translation between these two kinds of trees can be accomplished by a simple algorithm, which we provide. With the mathematical representation of stem-based and node-based trees clarified, we argue for a distinction between types of trees and types of names. Node-based and stem-based trees contain exactly the same information for naming clades. However, evolutionary concepts, such as monophyly, are represented as different mathematical substructures in the two models. For a given stem-based tree, one should employ stem-based names, whereas for a given node-based tree, one should use node-based names; applying a node-based name to a stem-based tree is not logical because node-based names cannot exist on a stem-based tree, and vice versa. Authors might use node-based and stem-based concepts of monophyly for the same representation of a phylogeny, yet, if so, they must recognize that such a representation differs from the graphical models used for computing in phylogenetic systematics.
We have assembled a collection of web pages that contain benchmark datasets and software tools to enable the evaluation of the accuracy and scalability of computational methods for estimating evolutionary relationships. They provide a resource to the scientific community for development of new alignment and tree inference methods on very difficult datasets. The datasets are intended to help address three problems: multiple sequence alignment, phylogeny estimation given aligned sequences, and supertree estimation. Datasets from our work include empirical datasets with carefully curated alignments suitable for testing alignment and phylogenetic methods for large-scale systematics studies. Links to other empirical datasets, lacking curated alignments, are also provided. We also include simulated datasets with properties typical of large-scale systematics studies, including high rates of substitutions and indels, and we include the true alignment and tree for each simulated dataset. Finally, we provide links to software tools for generating simulated datasets, and for evaluating the accuracy of alignments and trees estimated on these datasets. We welcome contributions to the benchmark datasets from other researchers.