Phylogenies of multi-domain proteins have to incorporate macro-evolutionary events, which dramatically increases the complexity of their construction.
We present an application to infer ancestral multi-domain proteins given a species tree and domain phylogenies. As the individual domain phylogenies are often incongruent, we provide diagnostics for the identification and reconciliation of implausible topologies. We implement and extend an algorithmic approach suggested by Behzadi and Vingron (2006).
The phytopathogenic genus Xanthomonas comprises numerous species and pathovars described primarily by their host and tissue specificities. Stenotrophomonas maltophilia, which is non-phytopathogenic and taxonomically closely related to Xanthomonas, has been reclassified several times, from Pseudomonas to Xanthomonas and finally to Stenotrophomonas. In this study, we have investigated the phylogenetic and taxonomic status of these members using the complete RNA polymerase beta-subunit (rpoB) gene sequences available from their sequenced genomes. Not only did we obtain a phylogenetic tree for xanthomonads, but rpoB gene sequence information has also resolved the taxonomic relationship of X. axonopodis pathovars, X. albilineans and other Xanthomonas strains, with the most marked evidence being that Stenotrophomonas is synonymous with Xanthomonas. This study has revealed the power and potential of the complete rpoB gene sequence in taxonomic, phylogenetic and evolutionary studies of the Xanthomonas and Stenotrophomonas generic complex.
Several naturally occurring hybrids in Potentilla (Rosaceae) have been reported, but no molecular evidence has so far been available to test these hypotheses of hybridization. We have compared a nuclear and a chloroplast gene tree to identify topological incongruences that may indicate hybridization events in the genus. Furthermore, the monophyly and phylogenetic position of the proposed segregated genera Argentina, Ivesia and Horkelia have been tested. The systematic signal from two morphological characters, style shape and anther shape, has also been investigated by ancestral state reconstruction, to elucidate how well these characters concur with the results of the molecular phylogenies.
Six major clades, Anserina, Alba, Fragarioides, Reptans, ivesioid and Argentea, have been identified within the genus Potentilla. Horkelia, Ivesia and Horkeliella (the ivesioid clade) form a monophyletic group nested within Potentilla. Furthermore, the origin of the proposed segregated genus Argentina (the Anserina clade) is uncertain but not in conflict with a new generic status of the group. We also found style morphology to be an informative character that reflects the phylogenetic relationships within Potentilla. Five well-supported incongruences were found between the nuclear and the chloroplast phylogenies, and three of these involved polyploid taxa. However, further investigations, using low-copy molecular markers, are required to infer the phylogeny of these species and to test the hypothesis of hybrid origin.
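The comparison of two gene trees for topological incongruence can be illustrated with a minimal sketch. It assumes rooted trees written as nested tuples and flags a clade in one tree as incongruent when it overlaps a clade of the other tree without either containing the other. The taxon names and both toy trees are hypothetical, and the sketch ignores branch support, which the study above uses to decide whether an incongruence is well supported.

```python
def clades(tree):
    """Return all clades (frozensets of leaf names) of a rooted tree
    given as nested tuples, e.g. (("a", "b"), "c")."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def conflicts(clade, other_clades):
    """True if the clade overlaps some other clade without nesting."""
    return any(
        clade & other and not (clade <= other or other <= clade)
        for other in other_clades
    )


def incongruent_clades(tree_a, tree_b):
    """Clades of tree_a that cannot coexist with the clades of tree_b."""
    clades_b = clades(tree_b)
    return {c for c in clades(tree_a) if conflicts(c, clades_b)}


# Hypothetical toy trees for four taxa (not data from the study).
nuclear = ((("sp1", "sp2"), "sp3"), "sp4")
chloroplast = ((("sp1", "sp3"), "sp2"), "sp4")

for clade in incongruent_clades(nuclear, chloroplast):
    print(sorted(clade))
```

In this toy case only the nuclear grouping of sp1 with sp2 is flagged, because the chloroplast tree groups sp1 with sp3 instead.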
Bombyliidae (~5000 sp.), or bee flies, are a lower brachyceran family of flower-visiting flies that, as larvae, act as parasitoids of other insects. The evolutionary relationships are known from a morphological analysis that yielded minimal support for higher-level groupings. We use the protein-coding gene CAD and 28S rDNA to determine phylogeny and to test the monophyly of existing subfamilies, the division Tomophthalmae, and ‘the sand chamber subfamilies’. Additionally, we demonstrate that consensus networks can be used to identify rogue taxa in a Bayesian framework. Pruning rogue taxa post-analysis from the final tree distribution results in increased posterior probabilities. We find 8 subfamilies to be monophyletic and the subfamilies Heterotropinae and Mythicomyiinae to be the earliest diverging lineages. The large subfamily Bombyliinae is found to be polyphyletic and our data do not provide evidence for the monophyly of Tomophthalmae or the ‘sand chamber subfamilies’.
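Why pruning a rogue taxon from a posterior tree sample can raise clade posterior probabilities is easy to see in a small sketch. It assumes rooted trees as nested tuples and estimates a clade's posterior probability as the fraction of sampled trees that contain it. The four-tree sample and the taxon "r" are hypothetical, and the sketch does not implement the consensus-network method used to identify rogue taxa.

```python
from collections import Counter


def clade_sets(tree, drop=frozenset()):
    """Clades of a rooted nested-tuple tree, ignoring taxa listed in `drop`."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset() if node in drop else frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        if len(members) > 1:  # single leaves are not informative clades
            found.add(members)
        return members

    leaves(tree)
    return found


def clade_frequencies(trees, drop=frozenset()):
    """Fraction of trees in the sample that contain each clade."""
    counts = Counter()
    for tree in trees:
        counts.update(clade_sets(tree, drop))
    return {clade: n / len(trees) for clade, n in counts.items()}


# Hypothetical posterior sample: taxon "r" jumps around the tree.
sample = [
    ((("a", "b"), "r"), ("c", "d")),
    (("a", "b"), (("c", "d"), "r")),
    ((("a", "r"), "b"), ("c", "d")),
    (("a", "b"), (("c", "r"), "d")),
]

ab = frozenset({"a", "b"})
before = clade_frequencies(sample)[ab]
after = clade_frequencies(sample, drop=frozenset({"r"}))[ab]
print(before, after)
```

Here the clade of a and b is present in three of the four trees while "r" is included, and in all four once "r" is ignored.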
The NCBI Taxonomy underpins many bioinformatics and phyloinformatics databases, but by itself provides limited information on the taxa it contains. One readily available source of information on many taxa is Wikipedia. This paper describes iPhylo Linkout, a Semantic wiki that maps taxa in NCBI’s taxonomy database onto corresponding pages in Wikipedia. Storing the mapping in a wiki makes it easy to edit, correct, or otherwise annotate the links between NCBI and Wikipedia. The mapping currently comprises some 53,000 taxa, and is available at http://iphylo.org/linkout. The links between NCBI and Wikipedia are also made available to NCBI users through the NCBI LinkOut service.
Fruit bats of the genus Pteropus occur throughout the Austral-Asian region west to islands off the eastern coast of Africa. Recent phylogenetic analyses of Pteropus from the western Indian Ocean found low sequence divergence and poor phylogenetic resolution among several morphologically defined species. We reexamine the phylogenetic relationships of these taxa by using multiple individuals per species. In addition, we estimate population genetic structure in two well-sampled taxa occurring on Madagascar and the Comoro Islands (P. rufus and P. seychellensis comorensis). Despite finding a similar pattern of low sequence divergence among species, increased sampling provides insight into the phylogeographic history of western Indian Ocean Pteropus, uncovering high levels of gene flow within species.
The rapid increase in genomic and genome-scale data is resulting in unprecedented levels of discrete sequence data available for phylogenetic analyses. Major analytical impasses exist, however, prior to analyzing these data with existing phylogenetic software. Obstacles include the management of large data sets without standardized naming conventions, identification and filtering of orthologous clusters of proteins or genes, and the assembly of alignments of orthologous sequence data into individual and concatenated super alignments. Here we report the production of an automated pipeline, Hal, that produces multiple alignments and trees from genomic data. These alignments can be produced by a choice of four alignment programs and analyzed by a variety of phylogenetic programs. In short, the Hal pipeline connects the programs BLASTP, MCL, user-specified alignment programs, GBlocks, ProtTest and user-specified phylogenetic programs to produce species trees. The script is available at SourceForge (http://sourceforge.net/projects/bio-hal/). The results from an example analysis of Kingdom Fungi are briefly discussed.
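The step of assembling per-cluster alignments into a concatenated super alignment can be sketched as follows. This is an illustration only, not code from Hal: it assumes each alignment is a dictionary from taxon name to aligned sequence, and it pads taxa that are missing from a cluster with gap characters. The function name, the `missing` parameter and the toy sequences are all hypothetical.

```python
def concatenate_alignments(alignments, missing="-"):
    """Join per-cluster alignments (dicts of taxon -> aligned sequence)
    into one super alignment, padding absent taxa with `missing`."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within an alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy clusters; taxon C lacks cluster 1, taxon B lacks cluster 2.
cluster1 = {"A": "MK-L", "B": "MKAL"}
cluster2 = {"A": "GG", "C": "GA"}

supermatrix = concatenate_alignments([cluster1, cluster2])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

Padding with gaps keeps every row the same length, which most phylogenetic programs require of their input; whether missing data should be coded as gaps or as an explicit unknown symbol depends on the downstream program.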
Despite their obvious utility, detailed species-level phylogenies are lacking for many groups, including several major mammalian lineages such as bats. Here we provide a cytochrome b genealogy of over 50% of bat species (648 terminal taxa). Based on prior analyses of related mammal groups, cytb emerges as a particularly reliable phylogenetic marker, and given that our results are broadly congruent with prior knowledge, the phylogeny should be a useful tool for comparative analyses. Nevertheless, we stress that a single-gene analysis of such a large and old group cannot be interpreted as more than a crude estimate of the bat species tree. Analysis of the full dataset supports the traditional division of bats into macro- and microchiroptera, but not the recently proposed division into Yinpterochiroptera and Yangochiroptera. However, our results only weakly reject the former and strongly support the latter group, and furthermore, a time-calibrated analysis of a pruned dataset where most included taxa have the entire 1140 bp cytb sequence finds monophyletic Yinpterochiroptera. Most bat families and many higher-level groups are supported; however, relationships among families are in general weakly supported, as are many of the deeper nodes of the tree. The exceptions are in most cases apparently due to the misplacement of species with little available data, while in a few cases the results suggest putative problems with current classification, such as the non-monophyly of Mormoopidae. We provide this phylogenetic hypothesis, and an analysis of divergence times, as tools for evolutionary and ecological studies that will be useful until more inclusive studies using multiple loci become available.
Over the last decade, dramatic advances have been made in developing methods for large-scale phylogeny estimation, so that it is now feasible for investigators with moderate computational resources to obtain reasonable solutions to maximum likelihood and maximum parsimony, even for datasets with a few thousand sequences. There has also been progress on developing methods for multiple sequence alignment, so that greater alignment accuracy (and subsequent improvement in phylogenetic accuracy) is now possible through automated methods. However, these methods have not been tested under conditions that reflect properties of datasets confronted by large-scale phylogenetic estimation projects. In this paper we report on a study that compares several alignment methods on a benchmark collection of nucleotide sequence datasets of up to 78,132 sequences. We show that as the number of sequences increases, the number of alignment methods that can analyze the datasets decreases. Furthermore, the most accurate alignment methods are unable to analyze the very largest datasets we studied, so that only moderately accurate alignment methods can be used on the largest datasets. As a result, alignments computed for large datasets have relatively large error rates, and maximum likelihood phylogenies computed on these alignments also have high error rates. Therefore, the estimation of highly accurate multiple sequence alignments is a major challenge for Tree of Life projects, and more generally for large-scale systematics studies.
Grammitid ferns are a well-supported clade of ~900 primarily tropical epiphytic species. Recent phylogenetic studies have found support for a distinctive, geographically diverse group of 24 species referred to as the Lellingeria myosuroides clade and have provided evidence for a variety of phylogenetic relationships within the group, as well as hypotheses of historical processes that have produced current biogeographical patterns. We present new data and analyses that support the following primary conclusions: 1) the L. myosuroides clade is monophyletic and pantropical; 2) that clade is sister to a more species-rich clade of entirely Neotropical species (Lellingeria s.s.); 3) the inferred history includes two independent dispersal events from the Neotropics to Pacific islands, five independent dispersal events from the Neotropics to the Paleotropics, and two separate dispersal events from mainland tropical America to the West Indies.