The recent Ebola virus disease (EVD) epidemic in West Africa was the largest EVD outbreak in history, spreading across Guinea, Liberia, and Sierra Leone, infecting an estimated 28,600 individuals, and claiming over 11,000 lives.1 Numerous factors contributed to the human-to-human spread of EVD, including individual-level factors such as caring for infected individuals and involvement in funeral preparations of infected corpses, as well as systemic and upstream factors such as inadequate healthcare infrastructure.2 Mobile populations coupled with porous borders3 and commercial air travel4 resulted in EVD transmission outside of the epicenter of the outbreak to neighbouring countries including Nigeria, Mali, Senegal, and to other continents including North America and Europe.5

EVD has caused numerous outbreaks, the majority in equatorial Africa, since the first human outbreaks were detected in 1976 in the Democratic Republic of Congo and South Sudan.6 Five species of Ebolavirus have been isolated: Bundibugyo, Côte d’Ivoire (Taï Forest), Reston, Sudan, and Zaire.7 Bundibugyo, Sudan, and Zaire are responsible for the majority of human outbreaks, with the strain in the West African epidemic belonging to the Zaire species. Fruit bats are believed to be a key reservoir of Ebolavirus, which can also cause illness and death in non-human primates such as apes and monkeys.8 Human outbreaks typically start when an individual comes into contact with the blood of an infected mammal or with bushmeat,6 although the West African outbreak is thought to have begun because of fruit bat exposure.9 Human-to-human transmission is then propagated through direct contact with infected individuals and cadavers.10 Illness in humans appears two to 21 days after infection, and the initial symptoms are fever, headache, and myalgia. These symptoms are followed by vomiting, diarrhea, rash, and impaired liver and kidney function, and can also be followed by internal and external bleeding.8 The case fatality rate for the most recent outbreak was estimated at 71% during the first nine months of the epidemic, which is similar to other outbreaks of Zaire ebolavirus species (60%-90%) and higher than outbreaks from Sudan ebolavirus species (40%-60%).11

Several studies have identified predictors of Ebola ecological niches (occurrence of environmental conditions that support its presence in a particular location),12 spillover events (introduction into human populations), or the onset of EVD outbreaks. The identified predictors have included extensive EVD-related deaths in primates;13 deforestation and human forest activities;14 population density;15 elevated levels of precipitation,16 humidity,17 and elevation;12 the transition from rainy to dry season;14,18 moderate-to-high temperatures16 and lower temperatures in equatorial Africa;12,17 as well as increased vegetation density12 and evergreen broadleaf forest coverage16.

Previous work has investigated the probability of Ebola outbreaks, and mathematical modelling studies have estimated the size, speed, and spatio-temporal patterns of EVD using simulated data.9,10,12,13,16,17,19,20,21,22 However, further evidence is needed to guide public health control efforts during an outbreak, and empirical analysis of real data from the West African epidemic may yield important insight into the successful control of future outbreaks. Few studies have assessed the spatial distribution of human EVD cases using empirical data: Stanturf et al.23 found that social vulnerability was qualitatively and positively associated with the spatial transmission of EVD in Liberia, and Rainisch et al.24 found that the spatial risk of EVD infection was related to population, cases, and distance between affected and unaffected areas. In this study, we sought to identify environmental and population-level demographic spatial predictors of human EVD risk from the recent West African epidemic.


Study area

The study included the three West African countries with widespread EVD transmission: Guinea, Liberia, and Sierra Leone. Guinea has an estimated population of 11,780,00025 and covers an area of 245,860 km2, with terrain consisting of coastal mangrove plains, forested highlands, and savannah plains.26 Guinea has a tropical climate with a rainy season (April to October) and a dry season (November to March).25 Liberia has an area of 111,370 km2 with an estimated population of 4,196,000. Its terrain ranges from sandy coastal plains to rolling hills and rolling plateau, with low mountains in the northeast.26 Liberia has a warm, humid climate with a rainy season from May to October and a dry season from November to April. Sierra Leone’s estimated population of 5,879,000 is contained within an area of 71,740 km2.25 It has coastal mangrove swamps, wooded hills, an upland plateau, and mountains in the east. Sierra Leone experiences a tropical climate with a rainy season from May to October and a dry season from November to April.26


Publicly available data from the World Health Organization (WHO) included weekly counts of confirmed EVD cases by prefecture (Guinea, n=34), county (Liberia, n=15), and district (Sierra Leone, n=14) as of May 13, 2015.27 From these data we calculated the cumulative number of confirmed cases, which was our outcome of interest. We then explored correlations between various environmental and socio-demographic variables as a first step in identifying predictors to include in our final model.

Data for potential predictors were obtained from multiple sources including satellite sensor-derived environmental data and national Demographic and Health Surveys (DHS). Rainfall, temperature, and land cover measures were obtained from the Tropical Rainfall Measuring Mission (TRMM) and moderate resolution imaging spectroradiometer (MODIS) instruments onboard the Terra satellite. The TRMM product (TRMM3B42RT) provided weekly accumulated rainfall estimates with a spatial resolution of 0.25° × 0.25°, which were then weighted by surface area. Daytime and nighttime land surface temperature (LST) estimates were obtained from MODIS (MOD11A2) using eight-day composite images at a 1 km × 1 km resolution. TRMM and LST estimates were obtained for the period between April 28, 2014 and May 3, 2015. Seventeen land cover classifications were provided by MODIS (MCD12Q1) at a 500 m × 500 m resolution for the most recent year available (2012). Eight of the land cover classifications were considered in the analysis, as the remaining nine categories had negligible presence in Guinea, Liberia, and Sierra Leone. The densities of waterways and roadways (km per km2), the average elevation, and the land area were estimated from shapefiles obtained from DIVA-GIS. Shapefiles providing projected population estimates were acquired from the WorldPop project. All estimates were obtained at the district level, and the Universal Transverse Mercator zone 28 projection was used.
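As an illustration of weighting gridded rainfall by surface area, the sketch below computes an area-weighted district mean from the grid cells that overlap a district. It assumes the weights are the areas of overlap between each cell and the district; the function name and the numbers are hypothetical and are not taken from the original analysis.

```python
def area_weighted_mean(values, areas):
    """Mean of per-cell values, each weighted by its area of overlap with the district."""
    if len(values) != len(areas):
        raise ValueError("values and areas must have the same length")
    total_area = sum(areas)
    if total_area <= 0:
        raise ValueError("total overlap area must be positive")
    return sum(v * a for v, a in zip(values, areas)) / total_area

# Hypothetical district overlapping three 0.25-degree rainfall cells.
rain_cm = [3.0, 5.0, 4.0]              # weekly rainfall per cell (made-up values)
overlap_km2 = [100.0, 300.0, 100.0]    # area of each cell inside the district (made-up)
district_rain = area_weighted_mean(rain_cm, overlap_km2)
print(round(district_rain, 2))
```

Cells that cover more of the district contribute proportionally more to the district-level estimate than cells that only clip its edge.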

National DHS from Guinea (2012), Liberia (2013), and Sierra Leone (2013) were used for subnational estimates on household education, wealth, occupation, household structure, and possessions and amenities. For Liberia and Sierra Leone, these estimates were obtained for the county and district levels respectively. For Guinea, DHS estimates were only available at the regional level (n=8), therefore prefectures within each region were given the same DHS-derived values.

Covariate selection

Linearity between covariates and the outcome was first assessed. Simple transformations (e.g., exponential, log, quadratic) were insufficient to produce a linear relationship between the covariates and the outcome; therefore, for ease of interpretation and to avoid modelling complex non-linear terms, all continuous variables were reclassified into terciles. Multicollinearity between the resulting ternary covariates was then examined using Cramér’s V,28 and if two or more variables had correlations above 0.4, we fit one bivariate Poisson regression model for each variable with the outcome and retained the covariate whose model had the lowest Akaike information criterion (AIC).
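The two screening steps described above can be sketched as follows. This is a minimal illustration, not the original R workflow: the rank-based tercile cut points and the uncorrected form of Cramér's V are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt

def tercile_labels(values):
    """Assign each value to tercile 1, 2, or 3 using rank-based cut points."""
    ordered = sorted(values)
    n = len(ordered)
    cut1 = ordered[n // 3]         # smallest value placed in the second tercile
    cut2 = ordered[(2 * n) // 3]   # smallest value placed in the third tercile
    return [1 if v < cut1 else 2 if v < cut2 else 3 for v in values]

def cramers_v(a, b):
    """Cramér's V between two categorical sequences (no bias correction)."""
    n = len(a)
    joint = Counter(zip(a, b))
    row, col = Counter(a), Counter(b)
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Made-up covariate values for nine districts.
rain_t = tercile_labels([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
road_t = tercile_labels([10, 20, 30, 40, 50, 60, 70, 80, 90])
v = cramers_v(rain_t, road_t)
flagged = v > 0.4   # pairs above the threshold would go to the AIC comparison
```

In this toy example the two covariates rank the districts identically, so the pair is flagged and only one of them would be carried forward.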

We regressed the cumulative total of EVD cases (outcome) on the selected covariates using a Poisson model with the total population per district as an offset, given the population differences between the districts. We used the glmulti29 R package to exhaustively explore the covariate subsets and selected the subset whose model had the lowest AIC.30
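Written out, a Poisson model with a population offset takes the form below. The symbols y_i (cumulative confirmed cases in district i), P_i (district population), x_{ik} (tercile indicators), and p (number of estimated coefficients) are notation assumed here for illustration rather than taken from the original analysis.

```latex
\begin{align}
  y_i &\sim \mathrm{Poisson}(\mu_i), \\
  \log \mu_i &= \log P_i + \beta_0 + \sum_{k} \beta_k x_{ik}, \\
  \mathrm{AIC} &= 2p - 2 \ln \hat{L}.
\end{align}
```

Because log P_i enters with a fixed coefficient of 1, the model describes cases per person, and exp(β_k) is read as a rate ratio relative to the reference tercile. An exhaustive search over K candidate terms compares up to 2^K such models by AIC.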

Assessment of spatial variance

Having identified a covariate subset, we quantified the amount of variation in EVD cases that this covariate set explained by using a spatial autoregressive modelling approach.31 We expanded our multivariable Poisson model to include two random effects: one explaining uncorrelated residual variation (u), and one explaining spatially correlated residual variation (ν).32 This approach to modelling disease variability across space has been widely used in the spatial epidemiologic literature;33,34,35 here, we used it to qualitatively assess the spatial and non-spatial variation explained by the covariate subset. The model is as follows:
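In the standard convolution (Besag-York-Mollié) form, and assuming that is the specification intended here, a model with these two random effects can be written as below. The symbols y_i, P_i, β, σ_u² and σ_ν² are notation assumed for this sketch.

```latex
\begin{align}
  y_i &\sim \mathrm{Poisson}(\mu_i), \\
  \log \mu_i &= \log P_i + \beta_0 + \sum_{k=1}^{12} \beta_k x_{ik} + u_i + \nu_i, \\
  u_i &\sim \mathrm{N}\!\left(0, \sigma_u^2\right), \\
  \nu_i \mid \nu_{j \neq i} &\sim \mathrm{N}\!\left(\frac{1}{n_i} \sum_{j \sim i} \nu_j,\; \frac{\sigma_\nu^2}{n_i}\right),
\end{align}
```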

where i and j refer to two distinct regions in the study area, i ~ j refers to two neighboring regions, and ni refers to the number of neighboring regions for region i. The covariates from x1 to x12 are defined in Table 1. T2 and T3 refer to the second and third terciles. Non-informative priors were used for the variance and the regression coefficients, and sensitivity analyses suggested that the priors did not significantly affect the marginal posteriors of the parameters of interest.
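The neighborhood notation (i ~ j and ni) can be made concrete with a small sketch. It assumes the spatially correlated effect follows an intrinsic conditional autoregressive prior, under which the conditional mean for a region is the average of its neighbors' effects and the conditional variance shrinks with the number of neighbors. The adjacency map, values, and function name are hypothetical.

```python
def icar_conditional(i, values, neighbors, sigma2):
    """Conditional mean and variance of the spatial effect for region i
    under an intrinsic CAR prior: neighbor average, and sigma2 / n_i."""
    nbrs = neighbors[i]
    n_i = len(nbrs)
    if n_i == 0:
        raise ValueError("region has no neighbors")
    mean = sum(values[j] for j in nbrs) / n_i
    return mean, sigma2 / n_i

# Hypothetical adjacency for four regions A-D (i ~ j means j is listed under i).
neighbors = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"], "D": ["B"]}
nu = {"A": 0.2, "B": -0.1, "C": 0.4, "D": 0.0}   # made-up spatial effects
mean_b, var_b = icar_conditional("B", nu, neighbors, sigma2=0.9)
```

Region B has three neighbors, so its conditional mean is the average of the effects in A, C, and D, and its conditional variance is one third of sigma2.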

Table 1 Covariates

T2=second tercile, T3=third tercile; *Kilometers of roadway per 100 km2 of land area; †Number of people per km2 of land area; ‡Proportion (%) of total land surface area; §Households without radio possession; ¶Mean years of education of head of household.

Parameter Parameter value
x1 Rainfall T2
x2 Rainfall T3
x3 Roadway T2*
x4 Roadway T3
x5 Population T2†
x6 Population T3
x7 Urban T2‡
x8 Urban T3
x9 Radio T2§
x10 Radio T3
x11 Education T2¶
x12 Education T3

Spatial hierarchical models were fit using Bayesian estimation via the R-INLA package.36 R-INLA uses integrated nested Laplace approximations to estimate marginal posterior distributions for each parameter, rather than a simulation-based approach like Markov chain Monte Carlo (MCMC) methods.37 Approximations using INLA have been shown to be extremely accurate based on simulation studies and comparisons to well-performing MCMC, and are more computationally efficient than standard MCMC software implementation options such as WinBUGS.38 Two spatial models were fit using R-INLA; the first was a null model including only the outcome (EVD cases) with a population offset, an intercept, and the two random effects (uncorrelated variation and spatially correlated variation). For the second (full) model all of the selected covariates were added to the null model. This allowed for the visual assessment of areal-level spatial clustering of EVD cases that was explained by the selected covariates.

The analysis was conducted in R version 3.2.1, and STATcompiler was used to obtain country-level values for the DHS indicators included in Table 1.


Table 2 summarizes selected covariates at the country-level for Guinea, Liberia, and Sierra Leone. Sierra Leone had the largest number of confirmed cases as well as the highest road and waterway density. Liberia and Guinea were similar in case burden, with Liberia having the largest portion of households headed by females and households without toilets. Guinea had the highest elevation as well as the highest proportion of households with electricity.

Table 2 Country-level summaries for selected covariates using mean values

*Total confirmed EVD cases as of May 13, 2015; †Kilometer of roadway or waterway per 100 km2 of land area; ‡Proportion (%) of total land surface area; §Proportion (%) of households headed by males who have completed secondary education; ¶Proportion (%) of households that drink surface water source such as river, canal, dam, irrigation channel, lake, pond, and stream.

Covariate Guinea Liberia Sierra Leone
Total confirmed EVD cases* 3,144 3,339 9,394
Average rainfall accumulation (cm) 3.2 3.6 4.8
Average elevation (m) 3.2 172.1 166.4
Roadway density (km)† 10.0 11.2 16.8
Waterway density (km)† 11.2 9.0 15.7
Cropland (%)‡ 9.5% 5.1% 13.3%
Female headed households (%) 17.3% 35.2% 28.0%
Secondary education (%)§ 1.5% 10.5% 4.8%
Households (%) without toilets 19.5% 45.2% 21.4%
Households (%) with drinking water 10.2% 15.4% 18.2%
Households (%) with electricity 26.2% 9.8% 13.5%
Households (%) with radios 61.5% 58.9% 58.8%

Figure 1 displays the correlations between the covariates that were included in the final model, all of which were less than 0.4.

Figure 1: Circle size and color vary with correlation; larger circles and more intense blue represent stronger correlation.

Table 3 presents the median posterior rate ratios for the covariates that were included in the final model. We report medians of the marginal posterior distributions for each parameter as our point estimates, as is commonly done in Bayesian analyses. The spatial analysis indicates that districts were more at risk of EVD with increasing rainfall (RRRainfall2 2.18; 95% credible interval 0.66-7.20; RRRainfall3 5.34, 1.20-23.90), urban land cover (RRUrban2 4.87, 1.56-15.40; RRUrban3 5.74, 1.68-19.67), households not possessing a radio (RRRadio2 2.79, 0.90-8.78; RRRadio3 4.23, 1.16-15.93), and years of education (RREducation3 1.58, 0.40-6.25). Districts with low population density were at higher risk than those with medium population density (RRPopulation2 0.64, 0.18-2.32) and, similarly, districts with low or high roadway density (RRRoadway3 1.22, 0.35-4.26) were at higher risk than those with medium roadway density (RRRoadway2 0.61, 0.19-1.96).

Table 3 Rate ratio posterior median estimates from full multivariable spatial model

*Kilometers of roadway per 100 km2 of land area; †Number of people per km2 of land area; ‡Proportion (%) of total land surface area. §Mean years of education of head of household.

Covariate Rate ratio (95% credible interval)
Weekly rainfall accumulation (cm)
<3.2 1.00
3.2-4.2 2.18 (0.66, 7.20)
>4.2 5.34 (1.20, 23.90)
Roadway density*
<0.09 1.00
0.09-0.11 0.61 (0.19, 1.96)
>0.11 1.22 (0.35, 4.26)
Population density†
<33.6 1.00
33.6-68.0 0.64 (0.18, 2.23)
>68.0 0.98 (0.22, 4.35)
Urban land cover (%)‡
<0.02 1.00
0.02-0.09 4.87 (1.56, 15.40)
>0.09 5.74 (1.68, 19.67)
Household not possessing radios (%)
<38.1 1.00
38.1-47.6 2.79 (0.90, 8.78)
>47.6 4.23 (1.16, 15.93)
Years of education§
<2.1 1.00
2.1-3.2 0.98 (0.25, 3.79)
>3.2 1.58 (0.40, 6.25)
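To help read Table 3, the sketch below flags which tercile contrasts have a 95% credible interval that lies entirely on one side of 1. The values are copied from the table; the helper name is hypothetical, and the check is a reading aid rather than part of the original analysis.

```python
def excludes_one(lower, upper):
    """True when a rate-ratio interval lies entirely above or below 1."""
    return lower > 1.0 or upper < 1.0

# (posterior median, lower bound, upper bound) as listed in Table 3.
table3 = {
    "Rainfall T2": (2.18, 0.66, 7.20),
    "Rainfall T3": (5.34, 1.20, 23.90),
    "Roadway T2": (0.61, 0.19, 1.96),
    "Roadway T3": (1.22, 0.35, 4.26),
    "Population T2": (0.64, 0.18, 2.23),
    "Population T3": (0.98, 0.22, 4.35),
    "Urban T2": (4.87, 1.56, 15.40),
    "Urban T3": (5.74, 1.68, 19.67),
    "Radio T2": (2.79, 0.90, 8.78),
    "Radio T3": (4.23, 1.16, 15.93),
    "Education T2": (0.98, 0.25, 3.79),
    "Education T3": (1.58, 0.40, 6.25),
}

# Contrasts whose interval does not include a rate ratio of 1.
clear = [name for name, (_, lo, hi) in table3.items() if excludes_one(lo, hi)]
print(clear)
```

Each rate ratio compares a tercile with the lowest (reference) tercile, which is fixed at 1.00 in the table.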

In the full model, the spatial residuals are very small in magnitude and appear to be spatially random whereas the uncorrelated non-spatial residuals suggest that unmeasured non-clustered variation remains. In other words, the covariate set explains a meaningful amount of spatial variation but there are additional unmeasured or unmeasurable factors that explain the different rates of EVD per district.


In this study, we identified several environmental and demographic spatial predictors of EVD risk at the district level for Guinea, Liberia, and Sierra Leone, which require further study to determine causality. We found that lack of radio ownership was a strong predictor of EVD risk (RRRadio2 2.79, 0.90-8.78; RRRadio3 4.23, 1.16-15.93) at the district level. Radio campaigns in all three countries used serial dramas and popular music to disseminate risk communication, prevention, and social mobilization messages, which may have reduced EVD transmission risk.39,40,41 Future work should further examine this association and also consider cost-effectiveness analysis of alternative methods of health message dissemination for places where household radio ownership is low.

The correlation between rainfall and EVD transmission risk is supported by previous work which found associations between increased rainfall or humidity and EVD outbreaks.16,17 Roads can become impassable with higher levels of rainfall, which may make it more difficult to seek healthcare treatment,23 and to implement infection control measures, increasing EVD transmission risk. A time series approach to examining this relationship may provide further insight into the association between rainfall and EVD, coupled with improved road accessibility and health facility data.

We included roadway and waterway densities as proxies for population mobility, which is thought to have been an important influence on the explosive nature of the West African EVD epidemic.12,42,43 Roadway density had a U-shaped association with EVD risk, with the second tercile being protective against EVD risk and the third tercile having a slightly increased risk of EVD. Higher roadway density could decrease risk by improving accessibility to treatment centers, but this association could also be an artifact of measurement error, as the roadway data were from 2007.

Surprisingly, population density had no association with EVD risk with the exception of the second tercile having a weak protective effect (RRPopulation2 0.64, 0.18-2.32). This suggests that lower population densities were at increased risk for EVD transmission, which could be a consequence of clinical and public health service provision issues in remote areas. Conversely, our finding of increased EVD risk in more urban areas (RRUrban2 4.87, 1.56-15.40; RRUrban3 5.74, 1.68-19.67), while controlling for population density, may reflect the population mobility and the increased mixing between susceptible and infected individuals in urban areas.

Previous EVD outbreaks had far fewer cases, and differences have been noted in historical outbreaks between those in which EVD was introduced into the general population and those in which it was introduced into a healthcare setting.44 General population outbreaks were small and appeared to end spontaneously with limited generations of cases, whereas healthcare settings with low standards of hygiene and sanitation can amplify transmission and result in a high number of cases and deaths of healthcare workers.44,45 The West African EVD epidemic was a mixture of nosocomial and general population settings, which sustained human-to-human transmission for various reasons including burial practices, inadequate infection control, population density and mobility, cultural beliefs and practices, and fear.2,23,42 We were not able to measure several of the risk factors for human-to-human transmission given a lack of sufficient data at the district level, and examination of the residuals supports the need to include other factors in the model.

There are different ways in which measurement error could have influenced our findings. Remote sensing data were used in lieu of ground observations due to data availability, and deriving measures of environmental characteristics from remotely sensed data requires assumptions about the values, which are also subject to measurement error.46,47,48 We did not have DHS estimates for each prefecture in Guinea; therefore, coarser region-level estimates were used, which may have biased our effect estimates toward the null. Additionally, the DHS data were captured in different time periods than the remote sensing and WHO data, and consequently the district-level values for the indicators may have changed between the date of data collection and the EVD epidemic. We used ecological (aggregate) data to determine district-level predictors of EVD risk, which can lead to bias if the results are used to infer individual-level associations between these predictors and EVD infection within districts.49,50

Our work has shed new light on population-level spatial factors for EVD risk and future research should examine the etiologic relationships of these risk factors and EVD transmission. The potentially significant role of radio having reduced the EVD risk requires further study and is an important and modifiable risk factor for future outbreaks. Future research should incorporate higher spatial resolution (e.g., sub-prefectures, districts, chiefdoms) and a temporal dimension, as it would provide further understanding into aspects of population mobility and healthcare accessibility, which are critical features of epidemic propagation and control. In addition, these findings should be compared to other diseases that are transmitted from human-to-human in Guinea, Liberia, and Sierra Leone. This would provide further information into disease transmission patterns in Guinea, Liberia, and Sierra Leone and common risk factors among different diseases that could be used for integrated outbreak management.

Competing Interests

The authors have declared that no competing interests exist.

Data Availability

The World Health Organization’s Ebola data are publicly available from the Ebola data and statistics page. The Demographic and Health Survey (DHS) data are publicly available for registered users from the DHS Program. Satellite images were provided by NASA for rainfall estimates and by USGS for temperature and land cover.

Corresponding Author

Kate Zinszer