The 2014 Ebola virus epidemic in Western Africa is the largest Ebola epidemic in history, and the number of infections continues to grow exponentially. The unprecedented rate of growth has degraded the quality of epidemiological surveillance and has made it difficult to map and characterize the spread of the epidemic^{13}. As local health and surveillance systems are overwhelmed, an unknown proportion of cases are unreported, are not isolated, and do not receive adequate treatment. Predictive modelling and evaluation of intervention efforts are also hampered by the rapid rate of increase and imperfect case reporting.
In the absence of complete surveillance data and contact tracing, mathematical models^{20} have provided valuable insights into the rate of epidemic growth and the reproduction number (R_{0}). The reproduction number is a useful parameter for characterizing the difficulty of eradication. Early analyses based on case reporting by the World Health Organization (WHO) indicated that R_{0} differed substantially between countries^{19}^{,}^{17}. In some instances, estimates of R_{0} based on different models are not in agreement, implying that they are sensitive to the assumptions of the mathematical framework and to the exact data sets used for parameter estimation. Althaus^{19} estimated R_{0} (2.41-2.67) for Sierra Leone based on WHO case reports through late August 2014, whereas Fisman et al.^{17} obtained a different estimate of R_{0} for Sierra Leone from a similar data set. More recently, Towers et al.^{18} estimated R_{0} in the range (1.0, 1.5) for Sierra Leone using a longer time series of case reports and a model with time-dependent reproduction numbers. The WHO Ebola Response Team^{1} presents an estimate of R_{0} (1.789-2.26) for Sierra Leone, which additionally makes use of new information about the incubation period and serial interval for the current epidemic. The analysis by Althaus modeled the natural history of infection by including a mean 5.3-day latent period before cases become infectious, whereas the analysis by Fisman et al. used a much longer latent period but did not explicitly consider the lack of infectiousness during the latent period.
We conduct a phylodynamic^{21} analysis of 78 Ebola virus genetic sequences described by Gire et al.^{16}. These data provide an independent source of information about epidemic growth rates and R_{0}, and may corroborate previous estimates based on case reporting data. To examine the sensitivity of estimates to the unknown latent period, we repeat our analysis with two values (5.3 and 12.7 days), which have been estimated from previous Ebola outbreaks.
Recently, Stadler et al.^{34} conducted a similar phylodynamic analysis of the same data. We conclude our analysis with a discussion of the primary differences in the analytic approach and findings of these two studies.
An advantage of phylodynamic analysis is that estimates are robust to incomplete sampling of cases: the proportion of cases that are unreported does not enter directly into our model^{24}. Sequence data may also be informative about epidemiological parameters where standard surveillance data are unhelpful. Previous phylodynamic analyses have shown how sequence data can be highly informative about who infected whom^{26} and about risk factors for transmission^{25}. In addition to R_{0}, we estimate parameters that describe heterogeneity in transmission rates between infected individuals. Fitting these models allows us to characterize superspreading and to estimate the proportion of cases that do not yield secondary infections.
We also examine the virus genomes for evidence of natural selection, which can bias phylodynamic analyses by violating assumptions of neutral evolution. Because the primary analysis of EBOV isolates^{16} found a large number of nonsynonymous mutations in whole-length genomes, and because strong selective pressures can bias molecular clocks^{10}^{,}^{11} and violate the assumptions of the standard coalescent process^{9}, we performed an exhaustive analysis of all genes in the EBOV genome for evidence of episodic diversifying natural selection using sensitive codon-substitution evolutionary models.
Data. We conduct a secondary analysis of EBOV phylogenies presented by Gire et al.^{16}. Samples were collected for whole-genome deep sequencing from 78 patients in Sierra Leone between 25 May and 20 June. Where multiple samples were available for a single patient, only the first sample was used in the phylogenetic analysis. Dates of common ancestry for all pairs of samples were estimated using Bayesian relaxed clock methods^{28}. Further details of the sequencing protocol and models used for phylogenetic analysis can be found in Gire et al.^{16}. This procedure yields a sample from the posterior distribution of dated phylogenies, from which we sampled 1,000 trees to make computation tractable.
For molecular selection analyses, we augmented the 2014 outbreak sequences with 17-37 (depending on the gene) additional isolates from previous EBOV outbreaks (1976-2007), which were also included in the original Bayesian relaxed clock analysis^{16}.
Cumulative numbers of cases reported by WHO were acquired from https://github.com/cmrivers/ebola on September 13, 2014.
Models. The starting point for the analysis is the SEIR model, which has previously been applied to EBOV outbreaks^{29}^{,}^{30} and has recently been applied to the 2014 epidemic^{19}.
The parameter $\beta$ will be the transmission rate per infectious individual, $\gamma_0$ will be the rate at which infected individuals progress from the latent period to the infectious period, and $\gamma_1$ will be the rate at which infectious cases are removed due to death and burial or by effective isolation and treatment. The ordinary differential equations for the deterministic SEIR model are:

$$\frac{dE}{dt} = \beta I - \gamma_0 E, \qquad \frac{dI}{dt} = \gamma_0 E - \gamma_1 I.$$

These equations describe the dynamics of the number of exposed, non-infectious individuals ($E$) and the number of infectious individuals ($I$). For this and subsequent models, we make the approximation that the majority of the population is susceptible ($S \approx N$), such that an equation for the dynamics of $S$ is not needed.

In order to estimate heterogeneity in transmission rates, we extend the SEIR model to include two infectious categories, $I_1$ and $I_2$. When the latent period ends, a case progresses to category $I_2$ with probability $p$ and to category $I_1$ with probability $1-p$. According to this model, transmissions only occur from category $I_2$. By changing the parameter $p$ (for a fixed R_{0}), the variance in transmissions per infectious case can be made arbitrarily large. The deterministic equations are:

$$\frac{dE}{dt} = \beta I_2 - \gamma_0 E, \qquad \frac{dI_1}{dt} = (1-p)\,\gamma_0 E - \gamma_1 I_1, \qquad \frac{dI_2}{dt} = p\,\gamma_0 E - \gamma_1 I_2.$$

We will refer to this as the ODE SEIIR model.
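The ODE SEIIR model can be explored numerically with a minimal sketch like the one below. It is not the rcolgem code used for fitting; it integrates the model with forward Euler (our choice) and compares the simulated early growth rate with the dominant eigenvalue of the linear system. The variable names `beta`, `gamma0`, `gamma1` and `p` (transmission rate, rate of leaving the latent period, removal rate, probability of entering the transmitting category) are our own, and the 5.6-day infectious period, R_{0} = 2.4 and p = 0.1 are illustrative assumptions, not fitted values. When the epidemic is seeded with a single exposed case, the sketch also shows that prevalence depends on the transmission rate and p only through their product.

```python
import numpy as np


def seiir_rhs(state, beta, gamma0, gamma1, p):
    """Right-hand side of the ODE SEIIR model; only category I2 transmits."""
    E, I1, I2 = state
    dE = beta * I2 - gamma0 * E
    dI1 = (1.0 - p) * gamma0 * E - gamma1 * I1
    dI2 = p * gamma0 * E - gamma1 * I2
    return np.array([dE, dI1, dI2])


def integrate_euler(state0, params, t_end, dt=0.01):
    """Forward-Euler integration; row k of the result is the state at time k*dt."""
    n_steps = int(round(t_end / dt))
    states = np.empty((n_steps + 1, 3))
    states[0] = state0
    for k in range(n_steps):
        states[k + 1] = states[k] + dt * seiir_rhs(states[k], *params)
    return states


def growth_rate(beta, gamma0, gamma1, p):
    """Early exponential growth rate: dominant eigenvalue of the linear system."""
    A = np.array([[-gamma0, 0.0, beta],
                  [(1.0 - p) * gamma0, -gamma1, 0.0],
                  [p * gamma0, 0.0, -gamma1]])
    return float(np.max(np.linalg.eigvals(A).real))


gamma0 = 1.0 / 5.3        # mean latent period of 5.3 days (one of the values in the text)
gamma1 = 1.0 / 5.6        # ASSUMED mean infectious period, for illustration only
R0, p = 2.4, 0.1          # illustrative values
beta = R0 * gamma1 / p    # chosen so that p * beta / gamma1 == R0
params = (beta, gamma0, gamma1, p)

dt = 0.01
states = integrate_euler(np.array([1.0, 0.0, 0.0]), params, t_end=100.0, dt=dt)
prevalence = states[:, 1] + states[:, 2]
i60, i100 = int(round(60 / dt)), int(round(100 / dt))
r_simulated = np.log(prevalence[i100] / prevalence[i60]) / 40.0
r_theory = growth_rate(*params)
print(f"simulated r = {r_simulated:.4f}/day, eigenvalue r = {r_theory:.4f}/day")

# Same product p * beta, different split between categories: same prevalence curve.
p_alt = 0.5
params_alt = (R0 * gamma1 / p_alt, gamma0, gamma1, p_alt)
prevalence_alt = integrate_euler(np.array([1.0, 0.0, 0.0]), params_alt, 100.0, dt)[:, 1:].sum(axis=1)
print(np.allclose(prevalence, prevalence_alt))
```

Because counts of cases alone cannot separate $p$ from the transmission rate in this deterministic sketch, information about heterogeneity has to come from elsewhere, here the shape of the phylogeny.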
A stochastic version of the SEIIR model was also fitted in order to account for any bias due to noisy dynamics during the early exponential growth phase of the epidemic. This model is specified by a system of stochastic differential equations (SDEs) driven by independent Wiener (standard Brownian motion) processes. The system accounts for noise in incidence and deaths, but does not account for noise in the partitioning of newly infectious cases into the two infectious categories, because of the difficulty of fitting such a system to data; consequently, the stochastic terms in the equations for the two infectious categories are the same, multiplied by the respective probabilities of entering each category. We will refer to this model as the SDE SEIIR model. The Euler method with a time step of one day was used to simulate trajectories from the SDEs.

Statistical analysis. The epidemiological models were fitted to EBOV phylogenies using the rcolgem package in R^{22}^{,}^{23}, which computes the likelihood of epidemiological parameters given a phylogeny. When fitting the ODE and SDE SEIIR models, likelihoods were calculated for each phylogeny with the boundary condition that each sample is a superspreader with a probability that is estimated along with the other parameters. Models were fitted using a Bayesian Monte Carlo within Metropolis (MCWM) algorithm^{15}, which integrates over the distribution of phylogenies previously estimated in Gire et al.^{16}. This algorithm was implemented by customizing the mcmc package in R. At each step of the MCWM algorithm, the likelihood of the set of trees given parameters $\theta$ (through the corresponding solution of the epidemiological model) is approximated by

$$\hat{L}(\theta) = \frac{1}{n} \sum_{i=1}^{n} L(\theta \mid \mathcal{G}_i),$$

where the genealogies $\mathcal{G}_1, \dots, \mathcal{G}_n$ used in the approximation are drawn uniformly at random from the distribution estimated by Gire et al.^{16}. If fitting a stochastic model, a double marginalization is required over genealogies and simulations of the stochastic model. Such a Markov chain will sample the posterior distribution regardless of the choice of sample size $n$; however, the value used will influence the efficiency of the algorithm. We chose separate values of $n$ for ODE and SDE models to match the architecture of our high performance computing cluster.
In the SEIR model, two parameters were estimated: the transmission parameter and the time when the epidemic was initiated in Sierra Leone. In the ODE SEIIR and SDE SEIIR models, the parameter p (the probability that a case enters the high-transmission category), which controls the proportion of cases in that category, was also estimated. A diffuse lognormal prior (mean 3.2, standard deviation 2.5) was used for the transmission parameter, a uniform(0,1) prior was used for p, and a normal prior with mean April 23 and standard deviation 6 days (based on the results of Althaus^{19}) was used for the initiation time.
We compared models using the approximate AICM method^{32}. AICM is a summary statistic for the goodness of model fit in Bayesian analyses and is analogous to the Akaike information criterion (AIC^{8}) used for model selection in the maximum likelihood framework. AICM was calculated using Tracer 1.6^{14}. At least two MCWM chains were sampled for each model and combined after removing the first 20% of samples as burn-in, and effective sample sizes were computed to confirm adequate sampling.
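The chain-combination and AICM steps can be sketched as below. We assume the commonly used moment form AICM = 2s² - 2·mean(l), computed from posterior samples of the log-likelihood l with sample variance s² (lower is better); Tracer's implementation may differ in detail, for example in its variance estimator. The log-likelihood traces are hypothetical random numbers, not output from the analysis.

```python
import numpy as np


def combine_chains(chains, burnin_frac=0.2):
    """Discard the first `burnin_frac` of each chain, then pool the remainder."""
    kept = [np.asarray(c, dtype=float)[int(len(c) * burnin_frac):] for c in chains]
    return np.concatenate(kept)


def aicm(logliks):
    """AICM = 2 * s^2 - 2 * mean(l) over posterior log-likelihood samples l
    (s^2 = sample variance). Lower values indicate better fit."""
    l = np.asarray(logliks, dtype=float)
    return float(2.0 * l.var(ddof=1) - 2.0 * l.mean())


# HYPOTHETICAL log-likelihood traces: two chains per model
rng = np.random.default_rng(0)
model_a = combine_chains([rng.normal(-120.0, 1.5, 500), rng.normal(-120.0, 1.5, 500)])
model_b = combine_chains([rng.normal(-100.0, 1.8, 500), rng.normal(-100.0, 1.8, 500)])
delta = aicm(model_a) - aicm(model_b)   # positive values favour model_b
print(f"AICM difference: {delta:.1f}")
```

Only differences in AICM between models fitted to the same data are meaningful, which is how the comparison between the SEIR and ODE SEIIR models is reported in the results.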
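The reproduction number was calculated as $R_0 = \beta/\gamma_1$ for the SEIR model and $R_0 = p\beta/\gamma_1$ for the SEIIR models, where $\beta$ is the transmission rate, $\gamma_1$ the removal rate, and $p$ the probability that a case enters the transmitting (high-transmission) category. The coefficient of variation (CV) of the number of transmissions per case was computed under the assumption that, if an individual is infectious, the number of transmissions has a geometric distribution with parameter $\gamma_1/(\beta + \gamma_1)$, the probability of removal before the next transmission. For this geometric distribution under the SEIR model, $\mathrm{CV} = \sqrt{1 + 1/R_0}$. According to the SEIIR model, an individual is infectious with probability $p$, which yields $\mathrm{CV} = \sqrt{1/R_0 + 2/p - 1}$.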
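The R_{0} and CV calculations described above can be checked by simulation. In this sketch (our own variable names; the removal rate, R_{0} = 2.4 and p = 0.1 are illustrative, not fitted), a case transmits at all with probability p, and a transmitting case produces a geometric number of transmissions: the number of transmission events (rate β) that occur before removal (rate γ_1). The closed form used in `cv_seiir` follows from these two assumptions and reduces to the SEIR value when p = 1.

```python
import numpy as np


def r0_seiir(beta, gamma1, p):
    """R0 = p * beta / gamma1 (p = 1 recovers the SEIR value beta / gamma1)."""
    return p * beta / gamma1


def cv_seiir(r0, p):
    """CV of transmissions per case: zero with probability 1 - p, otherwise
    geometric with mean r0 / p. For p = 1 this reduces to sqrt(1 + 1/r0)."""
    return np.sqrt(1.0 / r0 + 2.0 / p - 1.0)


def simulate_transmissions(beta, gamma1, p, size, rng):
    """Monte Carlo draws of the number of transmissions per case."""
    q = gamma1 / (beta + gamma1)               # P(removal before the next transmission)
    n_geom = rng.geometric(q, size=size) - 1   # numpy counts trials; subtract 1 for failures
    transmits = rng.uniform(size=size) < p     # case enters the transmitting category
    return np.where(transmits, n_geom, 0)


gamma1 = 1.0 / 5.6     # ASSUMED removal rate, illustration only
p, R0 = 0.1, 2.4       # illustrative values
beta = R0 * gamma1 / p
rng = np.random.default_rng(42)
x = simulate_transmissions(beta, gamma1, p, 1_000_000, rng)
print(x.mean(), r0_seiir(beta, gamma1, p))       # both close to 2.4
print(x.std() / x.mean(), cv_seiir(R0, p))       # both close to 4.4
```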
Selection analyses. We extracted sequences spanning the complete lengths of the seven annotated genes of EBOV^{16} and fitted several evolutionary models to multiple sequence alignments (MSAs) using fixed maximum clade credibility (MCC) trees computed from previous BEAST runs. MSAs were easily obtained because of the lack of indel variation in EBOV. We used sensitive methods for detecting episodic diversifying selection at the level of individual sites (mixed effects model of evolution, MEME^{5}), individual branches (branch-site random effects likelihood, BS-REL^{6}), and a modification of BS-REL to test for gene-wide selection operating on the (monophyletic) clade of 2014 EBOV sequences. Let ω denote the ratio of nonsynonymous to synonymous substitution rates. Briefly, whereas the original BS-REL method describes the evolutionary process by fitting a mixture of three separate ω values for each tree branch, we partitioned the tree into foreground (the 2014 Western Africa EBOV clade) and background (all other branches) segments, and fitted a 3-bin ω distribution jointly to all the branches within each partition (two distributions in total). A likelihood ratio test of the unconstrained model versus the null model in which all ω ≤ 1 (i.e. negative selection or neutral evolution) was used to establish significance for evidence of diversifying positive selection affecting a proportion of sites along a proportion of 2014 EBOV lineages. All analyses were implemented and run using HyPhy v2.12^{7}.
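The final step of the likelihood ratio test above turns two nested log-likelihoods into a p-value. A minimal sketch, assuming one extra degree of freedom: the reference distribution used in the original HyPhy analysis is not stated here, so the function offers a χ²₁ reference (conservative) and, as an option, a 50:50 mixture of a point mass at zero and χ²₁, a common choice when the null hypothesis places a parameter on a boundary such as ω ≤ 1. The log-likelihood values in the usage lines are hypothetical.

```python
import math


def lrt_pvalue(loglik_null, loglik_alt, boundary_mixture=False):
    """p-value for the likelihood ratio statistic LR = 2 * (l_alt - l_null).

    Reference distribution (an assumption): chi-square with 1 df, or a 50:50
    mixture of a point mass at 0 and chi-square(1) if boundary_mixture=True.
    """
    lr = max(0.0, 2.0 * (loglik_alt - loglik_null))
    p_chi2_1 = math.erfc(math.sqrt(lr / 2.0))   # P(chi-square_1 > lr)
    if boundary_mixture and lr > 0.0:
        return 0.5 * p_chi2_1
    return p_chi2_1


# HYPOTHETICAL log-likelihoods of the constrained (all omega <= 1) and unconstrained fits
print(lrt_pvalue(-5012.4, -5008.1))
print(lrt_pvalue(-5012.4, -5008.1, boundary_mixture=True))
```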
Table 1 shows parameter estimates based on four epidemiological models. Estimates of R_{0} based on the simple SEIR model are similar to those based on the ODE SEIIR model (compare the posterior medians in Table 1); however, the SEIR model does not provide an estimate of heterogeneity in transmission rates. The stochastic SEIIR model gives estimates similar to those of the deterministic SEIIR model, but with wider credible intervals and a slightly larger R_{0} of 2.40 (95% HPD: 1.54-3.87). These estimates are broadly consistent with the previously published estimates by Althaus^{19} and by the WHO Ebola Response Team^{1}, which were based on WHO case reporting in Sierra Leone.
Estimates of R_{0} are sensitive to the latent period, which could not be estimated from the genetic data alone. Published estimates of the duration of the latent period based on earlier Ebola outbreaks are highly variable^{31}, but we present results based on two values that have been used in recent modelling studies of the current epidemic in Western Africa. If the mean latent period is 12.7 days, the fitted ODE SEIIR model provides an estimate of R_{0} with 95% HPD 2.47-6.31.
Figure 4 shows the estimated cumulative number of infections through time as predicted by the ODE SEIIR model. These estimates, while imprecise, are consistent with cases reported by WHO, which were not used for model fitting. The slightly larger number of cases predicted by the model may in part reflect underreporting of cases in the WHO data.
We find strong evidence for superspreading. The fitted SEIIR models, which account for superspreading, have higher median posterior log-likelihoods than the SEIR model (-346.9 versus -380.1). The ODE SEIIR model is also superior to the SEIR model by the AICM criterion: for two distinct Markov chain samples, we find AICM differences of 38.7 and 25.6 in favor of the ODE SEIIR model.
In all fits of the SEIIR model, the estimated proportion of cases in the high-transmission-rate category is less than 63%, and posterior median estimates are approximately 10%. Figure 5 shows the EBOV phylogeny with maximum posterior probability. Branches are colored by the probability that the virus lineage inhabits a superspreading host; these probabilities are based on the median posterior parameter estimates of the ODE SEIIR model. When a lineage occupies a superspreading host (shaded red), it is much more likely to undergo a coalescent event, that is, to share a common ancestor with other sampled lineages. This process yields phylogenies with very imbalanced topologies. It also introduces correlation between the lengths of neighboring ancestral and daughter branches, as a lineage in a superspreading host is likely to undergo several coalescent events in short succession.
Selection analyses. Table S1 provides an overview of codon-based selection analyses of the seven EBOV protein-coding regions. There is notable variation in nucleotide-level diversity across genes (total tree length), with the glycoprotein (GP) gene showing the highest diversity. About 0.5% of branch-site combinations in the long RNA polymerase (L) gene appear to be under strong diversifying selection in the 2014 clade, whereas the entirety of sequence evolution in the GP gene consists of nonsynonymous changes (at 100% of branch-site combinations). The remaining genes do not contain a significant positive selection signal.
When we asked which individual sites showed evidence of episodic diversification (using the MEME^{5} method with p-value < 0.95), sites 388 and 389 in the heavily glycosylated mucin-like domain of GP, and sites 1396, 1492 and 1722 in L, were identified.
Phylodynamic analysis of EBOV sequences provides a new perspective on R_{0} and epidemic growth rates that is independent of previous analyses based on WHO case reports. Previous analyses have reached divergent conclusions about R_{0} in Sierra Leone; our estimates are consistent with those of Althaus^{19} and of the WHO Ebola Response Team^{1} if we assume a short latent period, but slightly larger if we assume a longer latent period. Our results are sensitive to the early evolutionary history of EBOV in Sierra Leone, much of which occurred before the first WHO case reports. Thus, the discrepancy between our results and those of Towers et al.^{18}, who reported smaller values of R_{0}, may be due to the decrease in epidemic growth rates observed in July and August. Estimates of R_{0} are very sensitive to the unknown duration of the latent period, which cannot be estimated from genetic data alone. Recently, the WHO Ebola Response Team published its first estimates of the incubation period and serial interval for the 2014 epidemic^{1} and found a mean incubation period of 11.4 days. This suggests that R_{0} is closer to the upper bound of our range of estimates (2.10-3.85). Stadler et al.^{34} recently estimated R_{0} = 2.18 (95% HPD 1.24-3.55) using the same genetic data used in our analysis. These results are close to our findings when comparing similar models (SEIR and BDEI) and similar incubation periods. The credible intervals in the analysis by Stadler et al.^{34} are much wider because a diffuse prior was used for the latent period, whereas we tested the sensitivity to this parameter by repeating the analysis with the latent period fixed at different values.
While we find that it is not possible to estimate the latent period from genetic data alone, Stadler et al.^{34} have conducted a phylodynamic analysis of the same EBOV data and estimated a mean incubation period (assumed equivalent to the latent period) of 4.92 days (95% HPD 2.11-23.20). Stadler et al.'s inference of incubation periods was made possible by using additional information, namely the times of genetic sequence sample collection, which were assumed to be collected at a constant per-capita rate. By calibrating the exponential growth rate of the epidemic to match the rate of sample collection, other parameters are rendered identifiable. Incorporating a model of the sampling process into phylodynamic inference can greatly increase statistical power^{24}; however, it can also bias estimates if the sampling process is misspecified. We do not find evidence that the sampling rate was constant, as required by the analysis in Stadler et al.^{34}; rather, it increased steadily over the sample collection period. By comparing the cumulative number of infections reported by WHO to the cumulative number of samples collected, we find that the sampling rate varied from 20% early in the epidemic to 70% near the end of the sample collection period. An alternative to using times of sample collection to calibrate growth rates would be to use the WHO case reports. Unfortunately, there is very little overlap between the time-stamped EBOV phylogenies and WHO case reports, because all samples were collected during the early portion of the epidemic.
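The comparison of cumulative sample counts with cumulative WHO case counts amounts to a running ratio. A minimal sketch with HYPOTHETICAL counts per reporting interval (not the WHO or sequencing data): under a constant per-capita sampling rate the cumulative ratio would stay roughly flat, so a ratio that rises over time, as in this toy example, is the pattern that argues against that assumption.

```python
import numpy as np


def cumulative_sampling_fraction(new_cases, new_samples):
    """Cumulative samples divided by cumulative reported cases at each time point."""
    cases = np.cumsum(np.asarray(new_cases, dtype=float))
    samples = np.cumsum(np.asarray(new_samples, dtype=float))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(cases > 0, samples / cases, np.nan)


# HYPOTHETICAL counts per reporting interval, for illustration only
frac = cumulative_sampling_fraction([10, 40, 50], [2, 18, 50])
print(frac)   # [0.2 0.4 0.7]
```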
In contrast to the analysis by Stadler et al.^{34}, we find statistically significant support for a model that features superspreading (heterogeneous transmission rates). These divergent findings may be due to differences in the population genetic models used (coalescent and birth-death-sampling). The discrepancy may also be due to a different parameterisation of superspreading: the analysis by Stadler et al. required two additional parameters to describe superspreading, whereas we chose a model of superspreading that required only one additional parameter, thereby increasing discriminatory power at the expense of some realism. The quantitative estimates of the CV of the reproduction number may be biased upwards by unrealistic distributional assumptions in our model; it is unlikely that transmission heterogeneity is well described by a mixture of only two transmission rates.
It is possible to characterize superspreading patterns from virus phylogenies because the variance in transmissions per case alters the genetic relatedness of a random sample of EBOV sequences^{27}. The EBOV phylogenies are highly imbalanced, and neighbouring branch lengths are highly correlated. We hypothesize that these features are a consequence of high variance in transmission rates, and we have proposed an epidemiological population genetic model that reproduces these features. Our epidemiological model of superspreading lacks some realism; however, our parameterisation of the transmission process allows us to estimate the variance in transmission rates easily.
High variance in transmission rates may hamper contact tracing efforts, since a single missed contact may trigger a sizeable outbreak. Epidemics that feature a highly skewed distribution of transmissions per infected individual differ substantially from epidemics where the number of transmissions clusters around R_{0} ^{3}^{,}^{2}. In epidemics with many superspreading events, the probability of epidemic extinction is greater, and the probability that a single introduction into a susceptible population will trigger an epidemic is correspondingly lower. However, when outbreaks do occur, they are more explosive and contact tracing may be more difficult. Furthermore, intervention strategies targeted towards individuals with higher transmission risk are likely to be more effective in epidemics with superspreading events. We estimate that a small proportion of infected cases are responsible for a majority of transmissions, and that a large proportion of infections yield no transmissions.
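The link between superspreading and a higher chance of stochastic extinction can be illustrated with a simple branching process. In this sketch (our own construction, not a model fitted in the text), homogeneous spread uses a geometric offspring distribution with mean R_{0}, and superspreading uses the two-category offspring distribution of the SEIIR model: no transmissions with probability 1 - p, otherwise geometric with mean R_{0}/p. The extinction probability is the smallest root of q = G(q) on [0, 1], where G is the offspring probability generating function. With the same mean R_{0} = 2 and an illustrative p = 0.1, the extinction probability rises from 0.5 to 0.95.

```python
def extinction_probability(pgf, tol=1e-13, max_iter=1_000_000):
    """Smallest root of q = G(q) in [0, 1], found by iterating q <- G(q) from q = 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def geometric_pgf(mean):
    """PGF of a geometric distribution on {0, 1, 2, ...} with the given mean."""
    return lambda s: 1.0 / (1.0 + mean * (1.0 - s))


def superspreading_pgf(r0, p):
    """Zero offspring with probability 1 - p, otherwise geometric with mean r0 / p."""
    g = geometric_pgf(r0 / p)
    return lambda s: (1.0 - p) + p * g(s)


q_homogeneous = extinction_probability(geometric_pgf(2.0))
q_superspreading = extinction_probability(superspreading_pgf(2.0, 0.1))
print(round(q_homogeneous, 6), round(q_superspreading, 6))   # 0.5 0.95
```

The probability that a single introduction takes off is 1 - q, so in this toy comparison it drops from 50% to 5% even though the mean number of transmissions per case is unchanged.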
Our molecular selection analyses suggest that episodic diversifying selection may be operating on the L and GP genes. When analysing recent viral isolates, much of the selection signal could be driven by overall maladaptive substitutions along terminal branches due to intra-host evolution^{4}. When additional isolates become available, some of the techniques for filtering out such substitutions (e.g. analysing only internal branches^{4}) may prove fruitful. The functional importance of sites subject to such forces remains to be elucidated.
Many factors contribute to the uncertainty of our findings, including uncertainty in the EBOV phylogenies, dates of common ancestry, and inherent noisiness of the epidemic process during the early period of exponential growth. Our estimates are based on a relatively small sample of EBOV sequences, and much greater precision could be achieved if a larger proportion of cases are sequenced over a longer period of time. Genetic sequence data are only available for the very early portion of the epidemic in Sierra Leone. Estimates may differ in other countries and settings, as well as through time as intervention efforts are scaled up and the population adapts to the growing epidemic. Phylodynamic methods are robust to variable and incomplete sampling of cases, so that virus sequences may be a useful supplement to epidemic surveillance if a growing proportion of cases are not reported to health systems.
The authors have declared that no competing interests exist.
The 2014 West African Ebola virus (EBOV) epidemic is the largest Ebola virus outbreak to date with 7492 cases (4108 confirmed) and 3439 deaths (2078 confirmed) as of 3 October 2014^{1}. While previous EBOV outbreaks remained localized, the current epidemic has spread across Guinea, Sierra Leone and Liberia with a localized outbreak in Nigeria. (Both Senegal and the USA have reported one imported case with no local transmission, as of 3 October 2014). Relief efforts have so far been ineffective at containing the disease, due largely to porous borders, a lack of education about the disease and degraded public health infrastructure^{2}^{,}^{3}^{,}^{4}. Moreover, the epidemic has spread to major urban areas, further facilitating its continued spread and complicating containment efforts.
Patients exposed to EBOV first undergo an incubation period of 2-21 days before becoming infectious^{3}^{,}^{5}^{,}^{6}^{,}^{7}. Once infectious, patients either die between days 6 and 16 or may begin to recover between days 6 and 11^{3}^{,}^{8}^{,}^{9}. Although patients who recover are generally noninfectious after convalescence, EBOV has been isolated 33 days after the onset of symptoms from mucosal membranes and 61 days after the onset of symptoms from semen^{10}^{,}^{11}. There is currently no known effective treatment or vaccine for Ebola virus disease, and relief efforts focus on bringing down the case fatality rate through supportive care and disease containment^{3}.
In Gire et al. (2014)^{12}, 99 Ebola genomes from 78 patients from the Sierra Leone outbreak are provided, representing about 70% of confirmed cases during late May to mid June. Based on the phylogenies in Gire et al. (2014)^{12}, it is likely that the Sierra Leone outbreak was started by the simultaneous introduction of two genetically distinct viruses. The initial 14 confirmed cases in Sierra Leone have all been epidemiologically linked to the funeral of a traditional healer in Guinea, consistent with both lineages entering Sierra Leone in a single introduction event. The first split of the Sierra Leone sequences, separating the two introduced lineages, is supported in all posterior trees presented in Gire et al. (2014)^{12} as well as in our preliminary analyses. We focused on the introduction causing the larger outbreak (72 sampled patients) and ignored the smaller outbreak (6 sampled patients).
We use these genomic data to estimate epidemiological parameters. We employed the Bayesian MCMC framework BEAST2^{13}, applying a range of epidemiological tree priors to the sequencing data. The tree priors are based on both birth-death^{14} and coalescent^{15} models. Furthermore, we estimated epidemiological parameters based on the trees from Gire et al. (2014)^{12} using a maximum likelihood framework implemented in R^{16}.
The larger outbreak, consisting of 72 Ebola sequences, is analysed in BEAST2^{13} to estimate the epidemiological parameters relevant to the epidemic. We employ birth-death and coalescent approaches as models for epidemic spread.
Birth-death models assume a transmission rate at which infected individuals transmit, a becoming-noninfectious rate at which infected individuals recover or die, and a sampling probability, which is the probability that an infectious person is sampled and sequenced. Such a model naturally accounts for incomplete sampling, and since the sampling probability is a parameter in our model, this quantity may also be estimated. In particular, we run birth-death analyses using the models depicted in Figure 1, whose assumptions we explain in the following.
The birth-death (BD) model^{17} allows the three parameters (transmission rate, becoming-noninfectious rate and sampling probability) to change in a piecewise constant fashion.
To model the spread of EBOV more realistically, we further extend the birth-death model to allow for an exposed class of infected people. The exposed class is entered upon infection, and an exposed individual moves from the exposed to the infectious class with a constant incubation rate. This model is referred to as the birth-death exposed-infected (BDEI) model^{18}^{,}^{19}. In the BDEI model we assume that only infectious people are sampled, since exposed patients are asymptomatic.
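In the BDEI model, R_{0} together with the expected incubation and infectious times determines the early exponential growth rate of the epidemic. The sketch below linearises the exposed and infectious classes (ignoring depletion of susceptibles) and returns the larger root of the characteristic equation; the function names are ours. Plugging in posterior medians from the table below gives only a rough plug-in value, not a posterior summary of the growth rate.

```python
import math


def bdei_growth_rate(r0, incubation_days, infectious_days):
    """Early exponential growth rate of the linearised exposed/infectious system.

    eps = incubation rate, delta = becoming-noninfectious rate, lam = r0 * delta;
    r is the larger root of r^2 + (eps + delta) r + eps * delta * (1 - r0) = 0.
    """
    eps = 1.0 / incubation_days
    delta = 1.0 / infectious_days
    lam = r0 * delta
    return 0.5 * (-(eps + delta) + math.sqrt((eps - delta) ** 2 + 4.0 * eps * lam))


def bd_growth_rate(r0, infectious_days):
    """Growth rate without an exposed class: r = lam - delta = (r0 - 1) * delta."""
    return (r0 - 1.0) / infectious_days


# Plug-in values from the BDEI 1* row with the incubation time fixed at 5.6 days
r = bdei_growth_rate(2.18, 5.6, 2.29)
print(f"r = {r:.3f} per day, doubling time = {math.log(2) / r:.1f} days")
```

As the incubation time shrinks to zero, the BDEI growth rate approaches the BD value, which is one way to see why adding an exposed class changes the R_{0} inferred from the same growth rate.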
BD and BDEI assume that individuals become noninfectious upon sampling. As Ebola may also be transmitted after sampling (transmission at funerals constitutes a major source of infection^{2}^{,}^{3}^{,}^{20}), we further run the birth-death sampled-ancestors (BDsa) model^{21}, which extends BD by assuming that sampled individuals become noninfectious upon sampling with probability r and remain infectious with probability 1-r. When r<1, the phylogeny may contain sampled ancestors: a sample need not be a tip of the tree, because a sampled individual may have sampled descendants.
The BDSIR model^{22} is a variant of the BD model in which we explicitly account for susceptible hosts, meaning the epidemic slows down once the number of susceptible hosts declines. This model includes an explicit susceptible class and the number of initial susceptible hosts as a parameter, which was estimated using a LogNormal(8,4) prior distribution.
We also fit a deterministic coalescent model to the EBOV sequence data. We use the structured coalescent framework of Volz (2012)^{15}, assuming an exposed and infectious class (as in the BDEI model), to probabilistically take into account whether lineages reside in exposed or infectious individuals. This coalescent SEIR model (coalSEIR) was implemented in BEAST2 and epidemiological parameters were estimated along with the genealogy from the sequence data, with the initial number of susceptible hosts set to 1 million, following Althaus (2014)^{23}.
In all analyses we first assumed a constant basic reproductive number R_{0}, which is the ratio of the transmission rate to the becoming-noninfectious rate. Second, we allowed the reproductive number to change twice: at the time of the oldest sample (May 26) and midway between the oldest and youngest samples (June 6). The becoming-noninfectious rate and the sampling probability were assumed to remain constant throughout the epidemic outbreak.
We assumed the following Bayesian prior distributions for our analyses. The prior for R_{0} is LogNormal(0,1.25). The time of origin, i.e. the time of infection of the first person in the Sierra Leone outbreak, was assumed to be uniformly distributed over the 6 months (or, for computational reasons in some analyses, 3 months) prior to the most recent sample on 18 June 2014; thus any start time between 18 December 2013 (or 18 March 2014) and 18 June 2014 was equally likely. For the incubation rate and the becoming-noninfectious rate we assumed a Gamma prior with shape 0.5 and scale 1/6 days^{-1}, truncated such that the periods of being exposed and infectious lie between 1 and 26 days, and such that all times in this interval have considerable support. The median of these priors is 0.11 days^{-1}, meaning that the expected time of being exposed and of being infectious is 9 days each. As no samples were sequenced prior to the oldest sample, collected on 25 May 2014, we assume that the sampling probability is 0 prior to that date and constant afterwards. After that date, we assume a uniform prior on [0,1] for the sampling probability in the analyses without an exposed class. To improve computational performance in the more complex BDEI model, we assume a Beta(70,30) prior distribution, supporting a sampling proportion around 70%, based on our own results as well as Gire et al. (2014)^{12}, and also fix the mean clock rate to 1.984 × 10^{-3}/site/year^{12}. The priors on all epidemiological parameters as well as the mean clock rate were identical between the coalSEIR and BDEI models.
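The stated prior median of about 0.11 days^{-1} can be checked numerically. A minimal sketch, assuming that "scale 1/6" is the scale parameter of a Gamma distribution on the rate (in days^{-1}) and that truncation restricts the rate to [1/26, 1], i.e. periods of 1 to 26 days; rejection sampling is our choice here and not necessarily how BEAST2 handles the truncation.

```python
import numpy as np


def sample_truncated_gamma(shape, scale, lower, upper, size, rng):
    """Rejection sampling from Gamma(shape, scale) restricted to [lower, upper]."""
    kept = np.empty(0)
    while kept.size < size:
        draws = rng.gamma(shape, scale, size=2 * size)
        kept = np.concatenate([kept, draws[(draws >= lower) & (draws <= upper)]])
    return kept[:size]


rng = np.random.default_rng(2014)
rates = sample_truncated_gamma(0.5, 1.0 / 6.0, 1.0 / 26.0, 1.0, 200_000, rng)
median_rate = np.median(rates)
print(f"prior median rate = {median_rate:.3f} per day, i.e. about {1.0 / median_rate:.1f} days")
```

Without the lower truncation the median rate would be much smaller (about 0.04 days^{-1}), so the truncation is what places the prior median near 9 days.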
Instead of reporting the becomingnoninfectious rate and the incubation rate, we report their inverse values, which are the expected times of being exposed (incubation time) and being infectious. We report the median posterior value for each parameter together with the shortest interval containing 95% of the posterior samples.
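The summary described above (median plus the shortest interval containing 95% of the samples) can be computed as in this sketch; the function names are ours and the posterior draws are hypothetical. Because the shortest interval is not invariant under nonlinear transformations, it should be computed after converting rates to periods, as in the usage lines, rather than by inverting the endpoints of an interval computed on the rate scale.

```python
import numpy as np


def shortest_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n - 1e-9))      # number of samples the interval must contain
    widths = x[k - 1:] - x[: n - k + 1]    # widths of all windows of k consecutive samples
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])


def summarize(samples, mass=0.95):
    """Median and shortest interval, as reported in the text."""
    lo, hi = shortest_interval(samples, mass)
    return float(np.median(samples)), lo, hi


# HYPOTHETICAL posterior draws of a becoming-noninfectious rate (per day)
rng = np.random.default_rng(7)
rates = rng.gamma(20.0, 1.0 / 120.0, size=10_000)
med, lo, hi = summarize(1.0 / rates)   # summary of the infectious period in days
print(med, lo, hi)
```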
Maximum likelihood analysis using birth-death models
As a comparison, we performed maximum likelihood (ML) parameter estimation using the posterior trees from Gire et al. (2014)^{12}. Again, we first eliminated the Guinea samples and the 6 samples from the second Sierra Leone outbreak, so that all trees analyzed consist of 72 tips. From the 10001 posterior trees provided by the authors of Gire et al. (2014)^{12}, we eliminated the first 1001 trees as burn-in and then chose every 100th tree from the remaining 9000 trees, yielding a set of 90 trees. For these 90 trees, we performed an analysis under the BD model with constant and time-varying reproductive number, and under BDEI with constant R_{0}, using the R package TreePar v3.1^{16}. Additionally, we applied to the trees a birth-death model that quantifies the amount of superspreading in the population, BDss^{18}. This model extends the constant-rate BD model by assuming that individuals belong to one of two classes, each with its own R_{0}; individuals transmit to both classes. We report the median of the ML estimates over all 90 trees, together with the shortest interval containing 95% of these ML estimates.
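The tree-selection arithmetic above (10001 trees, 1001 discarded as burn-in, every 100th of the remaining 9000 kept) can be made explicit. Which tree within each block of 100 is retained (here the first) is our assumption; either choice yields 90 trees.

```python
def thinned_tree_indices(n_trees=10001, burnin=1001, every=100):
    """0-based indices of the posterior trees kept after burn-in and thinning."""
    return list(range(burnin, n_trees, every))


kept = thinned_tree_indices()
print(len(kept), kept[:3], kept[-1])   # 90 [1001, 1101, 1201] 9901
```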
Figure 2 displays the estimated R_{0} values for the different phylodynamic methods. Overall, the Bayesian methods that simultaneously infer trees and parameters yield median estimates between 1.65 and 2.18. The maximum likelihood methods, which infer parameters from fixed trees, yield lower estimates. In the following we discuss the results in detail.
Bayesian birth-death analysis
| Model | R_{0}/R_{e} initial | R_{e} middle | R_{e} recent | Incubation time (days) | Infectious time (days) | Sampling probability | Epidemic origin | Tree MRCA |
|---|---|---|---|---|---|---|---|---|
| BD 1 | 1.65 (1.02–2.70) | – | – | – | 6.09 (2.84–18.84) | 0.65 (0.20–1.00) | May 7 (7/4–22/5) | May 15 (3/5–22/5) |
| BD 3 | 0.95 (0.22–2.56) | 1.57 (0.73–2.91) | 1.81 (1.07–3.03) | – | 6.15 (3.22–17.94) | 0.70 (0.27–1.00) | April 8 (30/12–21/5) | May 12 (24/4–23/5) |
| BDsa 1 | 1.75 (1.04–2.95) | – | – | – | 6.75 (3.14–24.10) | 0.60 (0.17–1.00) | May 8 (10/4–22/5) | May 15 (3/5–23/5) |
| BDsa 3 | 0.96 (0.20–2.65) | 1.61 (0.74–3.00) | 1.88 (1.09–3.23) | – | 6.54 (3.24–22.10) | 0.65 (0.19–1.00) | April 9 (31/12–20/5) | May 12 (24/4–23/5) |
| BDSIR | 1.81 (1.12–2.84) | – | – | – | 6.64 (3.61–18.78) | 0.70 (0.24–1.00) | May 4 (11/4–19/5) | May 15 (3/5–22/5) |
| BDEI 1^{*} | 2.18 (1.46–3.22) | – | – | 5.6 (fixed) | 2.29 (1.23–5.62) | 0.72 (0.63–0.80) | May 10 (13/4–23/5) | May 14 (3/5–22/5) |
| BDEI 3^{*} | 1.77 (0.59–4.35) | 1.92 (0.80–3.64) | 2.86 (1.58–4.78) | 5.6 (fixed) | 2.75 (1.41–7.07) | 0.71 (0.62–0.79) | May 8 (14/3–22/5) | May 13 (28/4–22/5) |
| BDEI 1^{*} | 1.85 (1.17–2.76) | – | – | 2.3 (fixed) | 3.92 (2.15–9.47) | 0.71 (0.62–0.79) | May 9 (15/4–21/5) | May 14 (4/5–22/5) |
| BDEI 3^{*} | 1.63 (0.54–4.09) | 1.66 (0.71–3.13) | 2.45 (1.28–4.17) | 2.3 (fixed) | 4.72 (2.46–10.74) | 0.71 (0.62–0.79) | May 5 (12/3–23/5) | May 13 (29/4–22/5) |
| BDEI 1^{*} | 2.18 (1.24–3.55) | – | – | 4.92 (2.11–23.20) | 2.58 (1.24–6.98) | 0.71 (0.62–0.80) | May 8 (10/4–21/5) | May 14 (3/5–22/5) |
| BDEI 3^{*} | 2.00 (0.66–5.46) | 1.85 (0.57–3.71) | 3.15 (1.43–6.09) | 5.92 (2.49–24.92) | 2.71 (1.28–9.22) | 0.71 (0.63–0.80) | May 5 (3/4–21/5) | May 13 (30/4–22/5) |
Table 1 shows the results of the Bayesian birth-death analyses, including the times of origin and of the most recent common ancestor (MRCA). Under the constant-rate birth-death-sampling model (BD 1), we estimate an R_{0} of 1.65 (1.02–2.70), a sampling proportion of 65% (20–100%) and an infectious period of 6.09 days (2.84–18.84). There is no indication of a change in the reproductive number before mid-June: under BD 3, the 95% HPD intervals of the three R_{e} values overlap broadly.
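To relate the BD 1 estimates to rates, the sketch below assumes the usual birth-death-sampling parameterization, which this section does not restate: transmission rate λ, becoming-noninfectious rate δ = μ + ψ (the inverse of the infectious period), sampling rate ψ, R_{0} = λ/δ, sampling proportion ψ/δ, and exponential growth rate r = λ - δ. Plugging in posterior medians gives point values only; the median of a derived quantity need not equal the quantity computed from the medians. The `BDRates` container and `bd_rates` helper are hypothetical names for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class BDRates:
    transmission: float     # lambda, per day
    becoming_noninf: float  # delta = mu + psi, per day
    sampling: float         # psi, per day
    removal: float          # mu, per day (becoming noninfectious without sampling)
    growth: float           # r = lambda - delta, per day
    doubling_days: float    # ln(2) / r, infinite if the epidemic is not growing


def bd_rates(r0: float, infectious_days: float, sampling_prop: float) -> BDRates:
    """Convert (R0, infectious period, sampling proportion) into per-day rates."""
    delta = 1.0 / infectious_days
    lam = r0 * delta
    psi = sampling_prop * delta
    growth = lam - delta
    doubling = math.log(2.0) / growth if growth > 0 else math.inf
    return BDRates(lam, delta, psi, delta - psi, growth, doubling)


# BD 1 posterior medians from Table 1: R0 = 1.65, infectious time 6.09 days, p = 0.65.
rates = bd_rates(r0=1.65, infectious_days=6.09, sampling_prop=0.65)
print(f"growth {rates.growth:.3f}/day, doubling time {rates.doubling_days:.1f} days")
# about 0.107/day and 6.5 days
```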
Since the BD model does not account for an incubation period, we also performed a simulation study in which we simulated an outbreak with incubation periods and analysed it under the BD model. This simulation shows that we can robustly estimate R_{0} under the BD model even without an explicit incubation period, and that the estimated infectious period roughly equals the sum of the incubation and infectious periods used in the simulations (Supplementary Table 1).
Allowing individuals to remain infectious upon sampling, using the sampled ancestors model (BDsa), leads to very similar estimates of the epidemiological parameters. In fact, we estimate only two sampled ancestors in our dataset, and the probability of becoming noninfectious upon sampling is large, 0.93 (0.71–1.00).
The epidemiological parameters are also estimated similarly under the BDSIR model, in which incidence can decline over time due to depletion of susceptible hosts. The initial number of susceptible individuals is estimated at 46000 (median) with large uncertainty (95% HPD, 380534000). Obtaining a similar R_{0} under a model that explicitly allows for the depletion of susceptible hosts over time suggests that the epidemic had not yet left the exponential growth phase by mid-June.
Using the BDEI model, which takes the incubation period into account, leads to slightly larger estimates of the basic reproductive number, 2.18 (1.24–3.55). Our estimate of the incubation period, 4.92 days (2.11–23.20 days), is highly uncertain, and Figure 3B shows that the posterior deviates only little from the prior. The infectious period is estimated to be rather short, 2.58 days (1.24–6.98); here, the posterior deviates substantially from the prior (Figure 3C). When we fix the incubation time to a shorter (2.3 days) or longer (5.6 days, as in Althaus (2014)^{23}) period, we see a slight decrease or increase in the basic reproductive number, respectively. The times of origin (median May 8) and of the MRCA (median May 14) show little variation.
Bayesian analysis in a coalescent framework
| Model | R_{0} | Incubation time (days) | Infectious time (days) | Epidemic origin | Tree MRCA |
|---|---|---|---|---|---|
| coalSEIR | 1.90 (1.00–4.50) | 6.23 (1.53–26.05) | 8.66 (1.076–26.07) | May 5 (24/3–20/5) | May 14 (29/4–22/5) |
Epidemiological estimates obtained under the coalSEIR model were generally very similar to those obtained under the BDEI model, as expected given that both approaches include an incubation period and account for uncertainty in the genealogy. Table 2 shows the estimated medians and 95% HPD intervals for the coalSEIR model parameters. While the credible intervals for R_{0} were wider under the coalSEIR than under the BDEI model, the median R_{0} of 1.90 was only slightly lower than under the BDEI model (2.18). Likewise, both methods returned a median epidemic origin time in the first weeks of May. We are not able to precisely estimate the duration of the exposed or infectious periods under the coalescent model, and these estimates appear to be largely informed by the prior (Figure 3B and C).
Maximum likelihood birth-death analyses based on fixed trees
| Model | R_{0}/R_{e} initial | R_{e} middle | R_{e} recent | Incubation time (days) | Infectious time (days) | Sampling probability |
|---|---|---|---|---|---|---|
| BD 1 | 1.34 (1.12–1.55) | – | – | – | 4.45 (2.85–6.29) | 0.7 (fixed) |
| BD 3 | 1.18 (0.54–1.72) | 1.17 (0.87–1.59) | 1.62 (1.37–1.90) | – | 4.74 (3.26–6.99) | 0.7 (fixed) |
| BDEI 1 | 1.45 (1.25–1.70) | – | – | 2.29 (0.08–3.24) | 2.07 (0.94–4.80) | 0.7 (fixed) |
| BD 1 | 1.24 (1.08–1.37) | – | – | – | 3.04 (2.16–4.32) | 0.35 (fixed) |
| BD 3 | 1.02 (0.63–1.50) | 1.10 (0.87–1.47) | 1.44 (1.29–1.69) | – | 3.28 (2.26–4.60) | 0.35 (fixed) |
| BDEI 1 | 1.31 (1.19–1.45) | – | – | 1.81 (1.22–2.58) | 1.22 (0.61–2.22) | 0.35 (fixed) |
| Model | R_{0} (overall) | R_{0} (class 1) | R_{0} (class 2) | Fraction class 1 | Infectious time (days) | Sampling probability |
|---|---|---|---|---|---|---|
| BDss | 1.57 (1.28–1.91) | 2.63 (1.42–8.31) | 0.84 (0.00–1.40) | 0.45 (0.07–0.87) | 5.16 (3.50–7.35) | 0.7 (fixed) |
Finally, we performed maximum likelihood parameter inference on fixed trees from Gire et al. (2014)^{12}. Because not all four parameters are jointly identifiable^{24}, and because our Bayesian analysis confirmed previous estimates of the sampling probability, we fixed this parameter to 0.7 for times more recent than the oldest sample. As before, the sampling probability was set to 0 prior to the oldest sample. To assess the sensitivity of our estimates to this choice, we performed a second analysis fixing the sampling probability to 0.35.
For each of the 90 posterior trees, we obtained the maximum likelihood parameter estimates (Table 3). Overall, the choice of fixed sampling probability did not substantially affect the estimates. R_{0} was estimated to be slightly lower than in the full Bayesian analyses above (medians between 1.24 and 1.45). Again, we did not find support for the reproductive number changing through time: a likelihood ratio test comparing a model with three intervals for the effective reproductive number against a model with a constant R_{0} does not support the three-interval model (at the 95% level, only 9 of the 90 trees supported three intervals for R_{e} with a sampling probability of 0.7, and 11 trees with a sampling probability of 0.35).
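The per-tree likelihood ratio test can be sketched as follows. We assume the three-interval model has two more free parameters than the constant model (three R_{e} values instead of one R_{0}, with interval boundaries fixed), so the statistic is compared against a chi-square distribution with 2 degrees of freedom, whose survival function is exp(-x/2); the 95% critical value is -2 ln 0.05 ≈ 5.99. The log-likelihood values in the usage lines are invented for illustration and are not results from the 90 trees.

```python
import math
from typing import List, Tuple

ALPHA = 0.05


def lrt_p_value_df2(loglik_null: float, loglik_alt: float) -> Tuple[float, float]:
    """LR statistic and p-value, assuming the alternative has 2 extra parameters."""
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    return stat, math.exp(-stat / 2.0)


def count_supporting(pairs: List[Tuple[float, float]], alpha: float = ALPHA) -> int:
    """Number of trees on which the three-interval model is preferred at level alpha."""
    return sum(1 for null, alt in pairs if lrt_p_value_df2(null, alt)[1] < alpha)


# Hypothetical per-tree maximized log-likelihoods: (constant R0, three intervals).
example = [(-120.0, -118.5), (-130.2, -126.0), (-110.4, -110.1)]
print(count_supporting(example), "of", len(example), "trees support three intervals")
```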
Across the maximum likelihood analyses in Table 3, the upper bound of the 95% intervals for the time spent in any single infected class is 6.99 days. Thus, both full Bayesian and maximum likelihood methods suggest shorter times in the exposed and infectious classes than previous estimates.
As in the Bayesian analyses, applying the BDEI method to the 90 Sierra Leone Ebola trees yields a slightly higher R_{0} when the incubation period is included in the model.
When applying a birth-death model that assumes two population groups with distinct transmission rates (BDss), we observe that roughly half of the population (fraction 0.45) appears to have a large R_{0} (median 2.63, 95% HPD 1.42–8.31), while the other half does not appear to spread the disease effectively (R_{0} median 0.84, 95% HPD 0.00–1.40). However, likelihood ratio tests do not strongly support the structured model over the unstructured model.
We used phylodynamic methods to estimate key epidemiological parameters of the current West African EBOV outbreak in Sierra Leone from sequencing data. Although we used a wide range of different models, we consistently recovered very similar estimates. In particular, we estimated the basic reproductive number of EBOV in Sierra Leone up to the time of the most recent sample (18 June 2014). The medians across the Bayesian methods ranged from 1.65 to 2.18, with the most plausible model (BDEI) yielding a median estimate of 2.18 (95% HPD 1.24–3.55). We did not find any support for a reduction of the reproductive number prior to the most recent sample. Thus our results indicate that public health interventions during May and June were likely ineffective at reducing transmission in Sierra Leone. Furthermore, our analyses suggest that there might be superspreaders among the infected population; however, the significance of the population structure results should be re-evaluated once larger datasets are available. Under the BDEI model, we estimate expected incubation and infectious periods of 4.92 (2.11–23.20) and 2.58 (1.24–6.98) days, respectively. Using our birth-death methods, we confirm the previously estimated sampling proportion of 70%.
Our R_{0} estimates are within the range of estimates for previous outbreaks and of other estimates for the current epidemic. For the 1995 EBOV Kikwit outbreak in the Democratic Republic of the Congo, R_{0} was estimated as 1.359±0.128^{25}, 1.83±0.06^{26} or 2.7 (1.9–2.8)^{20}. Towers et al. (2014)^{27} estimate an R_{0} of about 1.5 for the current West African EBOV epidemic, but only R_{0}=1.2 (1.0, 1.5) for the Sierra Leone epidemic, assuming incubation and infectious periods of at most 7 days. Gomes et al. (2014)^{28} estimate an R_{0} of 1.8 (1.5–2.0) for the current West African EBOV outbreak, while Althaus (2014)^{23} estimates an R_{0} of 2.53 (2.41–2.67) for the epidemic in Sierra Leone. Althaus (2014)^{23} further provides estimates of R_{0} for Guinea, 1.51 (1.50–1.52), and Liberia, 1.59 (1.57–1.60). Moreover, he estimates that R_{e} in Sierra Leone has been declining since the onset of control measures and dropped below 1 during July. During the period covered by our samples, his estimates of R_{e} vary between 1.47 and 2.7. Nishiura and Chowell (2014)^{29} estimate R_{e} in Sierra Leone and Liberia to lie between 1.4 and 1.7 during June and July, with R_{e} in Guinea fluctuating erratically around 1 during the same period. Fisman et al. (2014)^{30} estimate values of R_{0} between 1.66 and 2.19 for the West African epidemic; however, they estimate an R_{0} of 8.33 for the epidemic in Sierra Leone alone, which is clearly outside our HPD intervals. The WHO Ebola Response Team estimated an R_{0} of 2.02 (1.79–2.26) for Sierra Leone from empirical data^{32}. They also provide estimates for Guinea, 1.71 (1.44–2.01), and Liberia, 1.83 (1.72–1.94). These estimates are consistent with our estimates of R_{0}.
We estimate short incubation and infectious periods with all of our birth-death methods. Estimates of the exposed and infectious periods for the 1995 EBOV Kikwit outbreak range from 5.3±0.23 and 5.61±0.19 days, respectively^{26}, to 10.11±0.713 and 6.52±0.56 days^{25}. For the 2000 Sudan Ebola virus (SUDV) outbreak in Uganda, Chowell et al. (2004)^{26} estimated the exposed and infectious periods to be 3.35±0.19 and 3.5±0.67 days. To the best of our knowledge, the only estimates of the exposed and infectious periods for the current West African EBOV epidemic are by the WHO Ebola Response Team^{32}, based on observational data. They estimate an incubation period of 9.0±8.1 days (median 8 days) for 201 patients in Sierra Leone with single exposures. No overall infectious period is estimated; instead, the authors provide separate estimates of the infectious period based on disease outcome. From the onset of symptoms in patients sampled in Sierra Leone, they estimate a period of 8.6±6.9 days (128 patients) until death, 17.2±6.2 days (70 patients) until hospital discharge and 4.6±5.1 days (395 patients) until hospitalisation.
Our HPD intervals for the incubation period are in line with other estimates; however, our estimates of the infectious period are substantially shorter than estimates from the current epidemic^{32}. Given the amount of variation in estimates from both observational and genetic data, we conclude that the incubation and infectious periods are highly variable and difficult to estimate accurately. However, we recover consistent estimates of the total time of infection (incubation plus infectious period), indicating that the present dataset contains considerable information on the length of infection. Sequencing data from more patients might help to narrow the credible intervals.
Methods accounting for an incubation period yield higher R_{0} estimates than methods assuming that all infected individuals are immediately infectious. An exposed class delays onward transmission and therefore slows epidemic growth for a given R_{0}. Conversely, to reproduce the same observed growth rate, a model with an incubation period requires a higher R_{0} than a model without one, whereas a model without an incubation period compensates with a lower R_{0}. It therefore makes sense that R_{0} was estimated to be higher when the incubation period is included.
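The argument can be made concrete with a standard result for deterministic epidemic models with exponentially distributed periods: for an exponential growth rate r, an SIR-type model gives R_{0} = 1 + r D_{I}, whereas an SEIR-type model gives R_{0} = (1 + r D_{E})(1 + r D_{I}). This is an approximation used for illustration, not the BDEI or coalSEIR likelihood. The sketch holds the total infection time fixed and uses a hypothetical growth rate of 0.1 per day together with the BDEI medians for the incubation (4.92 days) and infectious (2.58 days) periods.

```python
def r0_sir(growth: float, infectious_days: float) -> float:
    """R0 for an exponentially distributed infectious period: 1 + r * D_I."""
    return 1.0 + growth * infectious_days


def r0_seir(growth: float, latent_days: float, infectious_days: float) -> float:
    """R0 with exponential latent and infectious periods: (1 + r D_E)(1 + r D_I)."""
    return (1.0 + growth * latent_days) * (1.0 + growth * infectious_days)


r = 0.1  # hypothetical growth rate per day, chosen only for illustration
d_e, d_i = 4.92, 2.58  # BDEI posterior medians for incubation and infectious time
sir = r0_sir(r, d_e + d_i)   # same total infection time, but no latent period
seir = r0_seir(r, d_e, d_i)
print(f"SIR-type R0: {sir:.2f}, SEIR-type R0: {seir:.2f}")  # 1.75 and 1.88
```

Because the SEIR expression exceeds the SIR expression by r^{2} D_{E} D_{I}, a model with a latent period needs a larger R_{0} to match the same positive growth rate.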
It is also noteworthy that our BDEI and coalSEIR analyses converged on similar estimates of R_{0} and the epidemic origin, so our epidemiological estimates appear robust to the specific assumptions of these two models. Nonetheless, our credible intervals for R_{0} and for the exposed and infectious periods are considerably wider under the coalescent than under the birth-death model. This may seem counterintuitive, as the deterministic coalescent models used here ignore demographic stochasticity and should therefore underestimate the true level of uncertainty about the parameters. The wider credible intervals under the coalescent may reflect the fact that the birth-death models use information contained in the sampling times (whereas the coalescent conditions on sampling) to obtain more precise estimates of the epidemiological parameters.
Overall, we show that our inferences of the epidemiological dynamics of the current West African EBOV outbreak are robust to the model used and also consistent with estimates from previous outbreaks as well as other estimates of the current epidemic. Our hope is that more sequencing data from the epidemic will be made available in the immediate future. New data will allow us to estimate how the effective reproductive number has changed since June and allow us to estimate the incubation and infectious periods more reliably. Such estimates would be invaluable not only for evaluating the success of containment efforts, but also for planning future interventions.
All Bayesian methods will become available within the BEAST2^{13} addon “phylodynamics” (https://github.com/BEAST2Dev/phylodynamics) and are available directly from us prior to the official release. The maximum likelihood methods are available within our R packages TreeSim v2.1^{31} and TreePar v3.1^{16}. We provide an R script specific to the Ebola analyses on our website (www.bsse.ethz.ch/cevo).
The authors have declared that no competing interests exist.