Epidemiology of highly pathogenic microorganisms

Research Group Bioinformatics (NG4)


Phylogenetic analyses are a popular and powerful way to extract information from sequences. The end product is a tree-like graph intended to summarize the course of evolution. Sequence relatedness can be discussed on the basis of this graph, but only if the graph is assigned a direction in time. This is achieved by a process known as rooting, which turns a tree-like graph into a proper phylogenetic tree.

Dudas and Rambaut (2014) recently demonstrated that improper rooting can lend support to strikingly erroneous evolutionary scenarios, which can in turn mislead the formulation of important epidemiological hypotheses.

The initial misplacement of Guinea 2014 EBOV was due to unnoticed long-branch attraction to the outgroup.

This approach is very sensible. Here, however, we introduce an alternative method that rests on the same hypothesis but additionally allows a quantitative assessment of the support for any root location. As an illustration, we apply this method to localize the root within the Zaïre clade. Using the same tool, we also investigate the position of the root within the genus Ebolavirus.

Over the last decade, one of the most fundamental developments in the field of phylogenetics was the introduction of models employing relaxed molecular clocks.

Assessing branch root posterior probabilities (RPP) only requires parsing the posterior sample of trees and recording which branches bear the root and how often. Because of the large number of trees, this cannot be done manually. As, to the best of our knowledge, no software is available for this purpose, we developed RootAnnotator, a user-friendly, portable program that collects information on root positions in posterior samples of trees and annotates a target tree with the corresponding RPP (available from
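The counting step described above can be sketched in a few lines of Python. This is a minimal illustration and not the RootAnnotator implementation: the function names are hypothetical, the trees are assumed to be plain rooted Newick strings with a bifurcating root and no BEAST-style `[&...]` annotations, and each root branch is identified by the smaller of the two taxon sets it separates. The RPP of a branch is then simply the fraction of sampled trees whose root lies on that branch.

```python
import re
from collections import Counter


def root_children(newick):
    """Split a rooted Newick string into the subtrees directly below the root."""
    text = newick.strip().rstrip(";")
    inner = text[text.index("(") + 1 : text.rindex(")")]
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(inner):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # A comma at depth 0 separates two children of the root.
            parts.append(inner[start:i])
            start = i + 1
    parts.append(inner[start:])
    return parts


def leaf_names(subtree):
    """Leaf labels are the tokens that directly follow '(' or ','."""
    return frozenset(re.findall(r"[(,]\s*([^(),:;\s]+)", "," + subtree))


def root_split(newick):
    """Identify the root branch by the smaller taxon set it separates."""
    sides = [leaf_names(child) for child in root_children(newick)]
    if len(sides) != 2:
        raise ValueError("expected a bifurcating root")
    return min(sides, key=lambda s: (len(s), sorted(s)))


def root_posterior_probabilities(trees):
    """Fraction of sampled trees whose root falls on each branch."""
    counts = Counter(root_split(tree) for tree in trees)
    total = sum(counts.values())
    return {split: n / total for split, n in counts.items()}


# Toy posterior sample (hypothetical taxa A to D, not real data).
sample = [
    "((A:1,B:1):1,(C:1,D:1):1);",
    "(A:1,(B:1,(C:1,D:1):1):1);",
    "(A:2,(B:1,(C:1,D:1):1):1);",
    "((C:1,D:1):1,(A:1,B:1):1);",
]
for split, rpp in root_posterior_probabilities(sample).items():
    print(sorted(split), rpp)
```

Keying each root location by a taxon set rather than by a node index makes the counts comparable across trees whose topologies differ elsewhere.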

To investigate the position of the root within the Zaïre clade, we used the alignment of concatenated coding sequences published by Dudas and Rambaut (2014) (available at

The first alignment was analyzed in BEAST using a GTR+Γ model of nucleotide substitution and assuming an uncorrelated relaxed clock (lognormal) which was tip-calibrated. We performed these analyses under the same three distinct demographic priors used by Dudas and Rambaut (2014) (constant population size, exponential growth and Bayesian skyride)^{2}: 0.43). The second alignment was therefore analyzed in BEAST using a GTR+Γ model of nucleotide substitution and assuming an uncorrelated relaxed clock (lognormal) which was not tip-calibrated. We performed these analyses under two speciation priors (Yule and Birth-Death process).

Branch RPP were determined using RootAnnotator and plotted on the maximum clade credibility (MCC) tree.

Figure 1. MCC tree and branch root posterior probabilities (RPP) derived from the analysis run under a constant population size model (the two other models generated very similar results) and an uncorrelated relaxed clock (lognormal). The top left corner shows the complete list of root-bearing branches that appeared at least once in the posterior tree sample, together with the corresponding RPP. Note that two possible root locations (6 and 7) do not appear in the tree because the MCC tree does not contain the corresponding branches. All internal branches linking coloured clades/groups received very good support (posterior probability: 1.00). The only exception was the branch defining the clade comprising Guinea 2014 EBOV and DRC 2007/2008 EBOV, which was only moderately supported (posterior probability between 0.56 and 0.68).
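A root location can only be drawn on a target tree if that tree contains the corresponding branch. The following sketch shows one way such a check could be done; it is an assumption-laden illustration rather than a description of RootAnnotator. The function names are hypothetical, the target tree is assumed to be a plain Newick string without annotations, and every branch is named by the smaller side of the bipartition it induces.

```python
import re


def clades(newick):
    """Return the set of leaf labels below every node of a plain Newick tree."""
    tokens = re.findall(r"[(),]|:[^(),;]+|[^(),:;\s]+", newick)
    stack, found, previous = [], [], "("
    for token in tokens:
        if token == "(":
            stack.append(set())
        elif token == ")":
            clade = frozenset(stack.pop())
            found.append(clade)
            if stack:
                stack[-1] |= clade
        elif (token != "," and not token.startswith(":")
              and previous in ("(", ",")):
            # A label right after '(' or ',' is a leaf; labels after ')'
            # are internal node labels and are ignored.
            found.append(frozenset([token]))
            if stack:
                stack[-1].add(token)
        previous = token
    return found


def canonical(side, taxa):
    """Name a branch by the smaller side of the bipartition it induces."""
    return min(side, taxa - side, key=lambda s: (len(s), sorted(s)))


def placeable(root_sides, target_newick):
    """Separate root locations the target tree can display from those it cannot."""
    found = clades(target_newick)
    taxa = max(found, key=len)
    available = {canonical(c, taxa) for c in found if c != taxa}
    shown, hidden = [], []
    for side in root_sides:
        key = canonical(frozenset(side), taxa)
        (shown if key in available else hidden).append(key)
    return shown, hidden


# Toy target tree and root locations (hypothetical taxa, not real data).
target = "((A:1,B:1):1,(C:1,D:1):1);"
shown, hidden = placeable([{"A"}, {"C", "D"}, {"A", "C"}], target)
print("shown:", [sorted(s) for s in shown])
print("hidden:", [sorted(s) for s in hidden])
```

Root locations that end up in `hidden` would have to be reported in a separate list, as is done for the locations missing from the MCC tree.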

For the Zaïre lineage, the posterior tree samples that we analyzed (one sample per demographic model) did not contain a single tree whose root was located on the branch leading to Guinea 2014 EBOV (Figure 1). Hence, under the assumption of a relaxed molecular clock, it seems extremely unlikely that this virus falls outside the genetic diversity of the Zaïre lineage. The clock rooting approach implemented here therefore provides strong statistical support for the conclusion reached by Dudas and Rambaut (2014).

Depending on the demographic model, eight or nine root locations were identified within the Zaïre clade. Irrespective of the demographic model, the same two branches always received the two highest RPP. The external branch leading to the DRC 1976 ZEBOV strain (Mayinga) received RPP between 0.62 and 0.69, whereas the branch defining the bipartition [DRC 1976/1977 ZEBOV strains|other ZEBOV strains] received RPP between 0.21 and 0.28. These results mostly raise the question of the reciprocal monophyly of early DRC ZEBOV and all other ZEBOV strains (only supported by the second-best root location).

Figure 2. MCC tree and branch root posterior probabilities (RPP) derived from the analysis run under a Yule process (a Birth-Death process generated very similar results) and an uncorrelated relaxed clock (lognormal). The clock was not calibrated and the scale axis is therefore in substitutions per site. RPP are reported in the list to the left of the tree. All internal branches received very good support (posterior probability: 1.00).

We also applied the clock rooting approach to the genus Ebolavirus.

Under both speciation priors tested, five root locations were identified, three of which together accounted for >0.99 of the RPP (Figure 2). The branch defining the bipartition [Sudan, Reston|Bundibugyo, Taï Forest, Zaïre] received RPP of 0.69 and 0.68 (Yule and Birth-Death process, respectively), the external branch leading to Sudan ebolavirus 0.19 and 0.18, and the external branch leading to Reston ebolavirus 0.12 and 0.13. Therefore, while the hypothesis put forward by Carroll et al. (2013) gets more probabilistic support

In our view, these examples highlight the unique ability of clock rooting to capture uncertainty as to root location. With RootAnnotator, it is now easy to establish short lists of plausible roots warranting further examination.
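One simple way to turn a table of RPP into such a short list is to rank the root locations and keep adding them until a chosen cumulative probability is reached. The sketch below assumes this credible-set style rule, which is our illustration and not a feature documented for RootAnnotator; the function name and the 0.95 default are hypothetical. The example values are the three RPP reported above for the genus-level analysis under the Yule prior; the two remaining root locations, whose individual RPP are not reported, are left out.

```python
def plausible_roots(rpp, level=0.95):
    """Smallest set of top-ranked root locations whose cumulative RPP reaches `level`."""
    ranked = sorted(rpp.items(), key=lambda item: item[1], reverse=True)
    shortlist, cumulative = [], 0.0
    for branch, probability in ranked:
        if cumulative >= level:
            break
        shortlist.append((branch, probability))
        cumulative += probability
    return shortlist, cumulative


# RPP reported in the text for the Yule prior (rounded values; the two
# unreported minor root locations are omitted).
yule_rpp = {
    "[Sudan, Reston|Bundibugyo, Taï Forest, Zaïre]": 0.69,
    "external branch to Sudan ebolavirus": 0.19,
    "external branch to Reston ebolavirus": 0.12,
}

shortlist, covered = plausible_roots(yule_rpp, level=0.95)
for branch, probability in shortlist:
    print(f"{probability:.2f}  {branch}")
```

With these values, no single branch reaches the 0.95 level on its own, so all three reported locations stay on the short list.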

The authors have declared that no competing interests exist.

The authors thank Paul Schäpe for his help with the implementation of BEAST on a local server. SCS thanks the entire group of epidemiology of highly pathogenic microorganisms at the Robert Koch-Institute for stimulating discussions.