Introduction

On average, twenty-six million people per year were displaced by natural disasters between 2008 and 2014 1. In many disaster settings, displaced people are the most vulnerable group and frequently the most in need of assistance 2. However, traditional methods of quantifying large-scale population movements after disasters are slow and unreliable 3,4,5. This is complicated by the fact that displaced people frequently move into urban informal settlements or seek shelter with host families, where they are effectively lost to follow-up from humanitarian agencies 6.

De-identified mobile operator data have the potential to shed light on population movement and displacement patterns. Operationally, they were used in the aftermath of the Haiti 2010 earthquake to quantify population mobility and displacement patterns, in the Haiti 2010 cholera outbreak to inform the early response activities7, and during the 2014 Ebola outbreak to model pre-outbreak mobility patterns in affected countries8. In Haiti, the relative geographic distribution of mobile phone movements correlated closely with data from a large-scale retrospective household survey, performed seven months after the earthquake7. However, the first analyses for responders to the Haiti earthquake were distributed four months after the earthquake, too late to support the early humanitarian response6. Similar analyses after a natural disaster, based on mobile operator data, were performed after the 2011 earthquake in Christchurch, New Zealand but were first released 12 months after the earthquake9.

De-identified mobile phone call detail records (CDRs) contain the time and associated cell tower of text messages and calls and thus can be used to study human behaviour and mobility patterns8,10,11. CDRs are electronic records kept by telecommunication providers that contain individual level data on mobile phone usage. The data typically consists of a record for each ‘event’ – such as making or receiving a call, or sending or receiving a text message – with attributes, such as a timestamp, duration, and – crucially – the cell tower that the user is connected to at the time12. CDRs thus provide rich information for measuring the collective mobility patterns of populations over time11. Although there are inherent biases within the data due to differences in phone ownership across geographical and socioeconomic groups, evidence so far suggests that CDRs currently provide by far the best – and most current – data available to describe population movement patterns in low and middle-income countries7,8,13. CDR-based mobility data, therefore, have been used across a variety of research areas; for example, prediction and modelling of the spread of infectious diseases14,15,16,17,18,19, traffic monitoring20 and the analysis of commuting patterns21.

Here, we report on ongoing work following the 2015 Nepal earthquake, where estimates of national level population movements based on de-identified data from 12 million mobile phones were released to all parties within nine days of the earthquake. The work was made possible by the establishment of a collaboration with the operator before the earthquake. Details of the analysis technical framework, population movement patterns after the earthquake and areas for future research will be discussed.

The 2015 Gorkha earthquake

The Gorkha earthquake, named after the district in which it originated, struck on the 25th April 2015 at 11:56 am local time. The quake had a magnitude of 7.8Mw with an epicentre at 28°14’24”N, 84°45’0”E22. It triggered massive avalanches on Mount Everest and in the Langtang Valley. Aftershocks occurred at 15-20 minute intervals during the following days23. A 6.9ML secondary earthquake (epicentre 27°50’24”N, 86°2’60”E), occurred in the Dolakha District a day later, at 12:54 pm local time. A series of two more major earthquakes and four aftershocks hit Dolakha District once again on 12th May 2015, with the two earthquakes registering 6.8ML and 6.2ML22. By the 23rd June 2015, the Nepali Government stated that Dolakha District had the highest number of casualties outside the Kathmandu Valley24.

As of 1st June 2015, the Nepali Government estimated the earthquake to have caused 8,673 deaths and 22,309 injuries24,25. The UN Resident Coordinator’s Office in Nepal estimated that 2.8m people were in need of immediate assistance26. Significant landscape shifts, a severe degradation of Nepal’s transport network, numerous landslides and avalanches along the Kathmandu Valley resulted in large parts of the country becoming isolated.

The destruction of villages and towns in the most severely-affected regions further contributed to population displacement, compounded by severe damage to transport infrastructure27. The Nepali government estimates that 2,600 government buildings were completely destroyed in the earthquakes, with some 3,700 more partially destroyed24.

Methods

Mobile phone use in Nepal

Nepal has 26 million mobile phone subscriptions28 and a population of 27 million people29. The two largest operators, Ncell and Nepal Telecom, have 12.9 and 12.2 million subscribers respectively30. Mobile phone penetration is increasing rapidly: in 2011, 75% of households (92% in urban areas, 72% in rural areas) reported having at least one mobile phone31.

Setup

The Flowminder Foundation develops methods for estimating mobility patterns and population displacement. It has established collaborations with mobile phone operators in countries where analyses of de-identified mobile operator data can support preparedness and response to humanitarian disasters. An agreement between Flowminder and Ncell in Nepal was signed 6 months before the earthquake. The first technical planning meeting between the parties took place in Kathmandu one week prior to the earthquake. Although the technical set-up had not been completed at the time of the earthquake, Ncell were able to provide access to de-identified data for Flowminder analyses within six days of the earthquake.

CDR analysis was undertaken in compliance with the GSMA privacy guidelines developed in the context of the Ebola outbreak32. These guidelines state that analyses should be performed on de-identified data and that individual-level data should not be transferred from the mobile phone operator’s servers. Therefore, we collaborated with Ncell to set up a high-specification Linux analysis server (with 128GB of RAM and over 20TB of disk space) within their data centre. All analyses were performed by connecting to this server remotely, analysing the data, and then only transferring aggregated data outside the operator.

The analysis framework was developed in the Python programming language, and consisted of a series of preprocessing steps automatically run for each new day’s CDR data, followed by separate analyses designed to investigate particular aspects of mobility after the earthquake. Analysis was performed using custom-written code, which used the standard scientific Python stack33,34,35.

Preprocessing

Each day’s CDRs were provided by 1am on the following day and historical data back to January 2015 were also provided. The preprocessing steps were designed to reduce this raw CDR data (which was provided as approximately 12GB of CSV files per day) to a more manageable size, which consequently lowers the time and memory usage of the subsequent analyses significantly. The preprocessing involved:

  1. Shrinking the raw data by removing unused information, and assigning each separate tower location a location ID, thus reducing the data to around 2.5GB per day.
  2. Calculating a ‘daily location’ for each user (where a user is taken to be a unique phone number, i.e. a SIM card). This was designed to be a single tower location which represents the location of the user for that day. As the aim was to investigate displacement, the overnight location of a user determined their daily location. Investigations were made into how best to estimate this location from the location of all calls a user made that day, including subsetting by time, or spatially grouping calls. However, in Nepal, the most appropriate definition was judged to be simply the location of the last call the user made that day.
  3. Assigning daily locations at administrative boundary level based on the tower level location. In this work administrative boundaries at level 3 (District) and level 4 (Village Development Committee, or VDC) were used, digitised from Nepali governmental maps by the UN Organisation for the Coordination of Humanitarian Affairs36,37,38. Each user was assigned a daily location at District or VDC level, based on the administrative area that the daily location (at tower level) was situated in.
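
The second and third preprocessing steps can be illustrated with a minimal sketch. It is not the code used in the analysis: the record layout, the user and tower identifiers and the tower-to-District lookup are all hypothetical, and it assumes only that the daily location is the tower of the user's last record on each day, as described above.

```python
from datetime import datetime

# Hypothetical de-identified records: (user_id, ISO timestamp, tower_id).
records = [
    ("u1", "2015-04-20T08:15:00", "T1"),
    ("u1", "2015-04-20T21:40:00", "T2"),
    ("u2", "2015-04-20T13:05:00", "T3"),
    ("u1", "2015-04-21T19:30:00", "T1"),
]

# Hypothetical lookup from each tower to the District that contains it.
tower_to_district = {"T1": "Kathmandu", "T2": "Nuwakot", "T3": "Gorkha"}


def daily_locations(records):
    """Return {(user, date): tower}, using the tower of the user's last record that day."""
    latest = {}  # (user, date) -> (timestamp, tower)
    for user, stamp, tower in records:
        when = datetime.fromisoformat(stamp)
        key = (user, when.date())
        if key not in latest or when > latest[key][0]:
            latest[key] = (when, tower)
    return {key: tower for key, (_, tower) in latest.items()}


def to_admin_level(daily, tower_to_area):
    """Map tower-level daily locations to the administrative area containing the tower."""
    return {key: tower_to_area[tower] for key, tower in daily.items()}


daily = daily_locations(records)
district_daily = to_admin_level(daily, tower_to_district)
```

The same `to_admin_level` function could be reused with a tower-to-VDC lookup to obtain level 4 locations.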

Estimating population flows

Conceptually, flows can be estimated by simply recording a location for each user at two different times, and then counting the number of users who moved from one location to another. This produces a transition matrix containing the flow of users between each possible pair of locations7,11. The origin and arrival locations can be defined at the tower-level or aggregated by administrative boundary level (District or VDC) as described above. To describe displacements, the ideal transition matrix includes moves from people’s homes to a new temporary or permanent location and ignores short term movements such as a daily commute. Therefore, choosing the right spatial and temporal scale to capture movements indicating moves or displacements is crucial. To reduce the influence of noise introduced by short term trips or commuting patterns a ‘home location’ was calculated for each individual by calculating the modal daily location over a certain period. These home locations were then used to calculate transition matrices describing the countrywide mobility between two points in time.
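
As a rough illustration of this idea, the sketch below computes a modal home location per user for two periods and counts users moving between each pair of locations. The input data and function names are hypothetical, and the tie-breaking rule for the mode (the area seen first wins) is an assumption, since the text does not specify one.

```python
from collections import Counter


def home_location(daily_areas):
    """Modal daily location over a period; ties go to the area seen first (an assumption)."""
    return Counter(daily_areas).most_common(1)[0][0]


def transition_matrix(home_start, home_end):
    """Count users for each (origin, destination) pair of home locations.

    Only users with a home location in both periods are counted.
    """
    flows = Counter()
    for user, origin in home_start.items():
        if user in home_end:
            flows[(origin, home_end[user])] += 1
    return flows


# Hypothetical District-level daily locations per user in two periods.
period_1 = {
    "u1": ["Kathmandu", "Kathmandu", "Nuwakot"],
    "u2": ["Gorkha", "Gorkha"],
    "u3": ["Kathmandu"],
}
period_2 = {
    "u1": ["Nuwakot", "Nuwakot"],
    "u2": ["Gorkha"],
    "u3": ["Kathmandu", "Kathmandu"],
}

home_1 = {user: home_location(days) for user, days in period_1.items()}
home_2 = {user: home_location(days) for user, days in period_2.items()}
flows = transition_matrix(home_1, home_2)
```

Storing the matrix as a sparse mapping keyed by (origin, destination) avoids allocating a cell for every possible pair of locations, most of which would be empty at VDC level.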

In Nepal, mobility is fairly high and thus large flows of people are observed between areas of Nepal under normal conditions. To account for this high baseline mobility, flows following the earthquake (termed post-earthquake flows) were normalised using pre-earthquake mobility estimates (normal flows). Normal flows were calculated as the changes in location from a benchmark period (consisting of historical data from the 1st January until 7th April 2015) to a comparison period just before the earthquake (20th-24th April, chosen to avoid the large population flows that may be experienced around the Nepali New Year festival). Post-earthquake flows were calculated between the benchmark period and the focal period, which is the period of interest, the most recent week of data (Figure 1).

The difference between post-earthquake flows and normal flows provides a measure of anomalous flows or ‘flows above/below normal’. An important added value of this approach is that most of the noise in the estimates of home locations due to high levels of daily mobility under normal conditions is cancelled out. This set-up allows examination of how the earthquake has affected mobility without erroneously attributing high but nevertheless normal flows, generated by overall high mobility, to the earthquake. Flows are only calculated for users who make calls in all three of these periods, thus excluding SIM cards lost during the earthquake, and incoming relief workers.

The flows calculation produced a transition matrix giving the anomalous flow (number of users, above and below normal) moving between each pair of locations. This can be summarised to produce two metrics for each region: the total above-normal inflows (the sum of all of the flows from any region into that region), and the total above-normal outflows (the sum of all the flows from that region into any other region).
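
The calculation described in this section can be sketched as follows. This is an illustrative outline under stated assumptions, not the analysis code: the inputs are hypothetical mappings from user to home location in the benchmark, comparison and focal periods, and excluding users who stayed in the same region from the inflow and outflow totals is an assumption made here.

```python
from collections import Counter


def flows_between(home_a, home_b, users):
    """Count the given users by (origin, destination) between two periods."""
    flows = Counter()
    for user in users:
        flows[(home_a[user], home_b[user])] += 1
    return flows


def anomalous_flows(benchmark, comparison, focal):
    """Post-earthquake flows minus normal flows, for users seen in all three periods."""
    users = set(benchmark) & set(comparison) & set(focal)
    normal = flows_between(benchmark, comparison, users)
    post = flows_between(benchmark, focal, users)
    pairs = set(normal) | set(post)
    return {pair: post[pair] - normal[pair] for pair in pairs}


def regional_totals(anomalous):
    """Sum anomalous flows into and out of each region.

    Pairs with identical origin and destination are skipped (an assumption).
    """
    inflows, outflows = {}, {}
    for (origin, destination), value in anomalous.items():
        if origin != destination:
            inflows[destination] = inflows.get(destination, 0) + value
            outflows[origin] = outflows.get(origin, 0) + value
    return inflows, outflows


# Hypothetical home locations; u5 is missing from the comparison period and is dropped.
benchmark = {"u1": "Kathmandu", "u2": "Kathmandu", "u3": "Kathmandu", "u4": "Gorkha", "u5": "Kathmandu"}
comparison = {"u1": "Kathmandu", "u2": "Nuwakot", "u3": "Kathmandu", "u4": "Gorkha"}
focal = {"u1": "Nuwakot", "u2": "Nuwakot", "u3": "Kathmandu", "u4": "Gorkha", "u5": "Nuwakot"}

anomalous = anomalous_flows(benchmark, comparison, focal)
inflows, outflows = regional_totals(anomalous)
```

In this toy data one user moves from Kathmandu to Nuwakot under normal conditions and two do so after the earthquake, so the anomalous flow for that pair is one user above normal.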

Fig. 1: Diagram showing the periods used for calculating anomalous flows

Scaling population flows

Humanitarian agencies require information about total population flows, rather than the subset represented by subscribers. Assuming SIM card movements to be representative of population movements, absolute flows were estimated by scaling SIM card counts based on local Ncell user penetration rates. The number of active SIM cards in an area, for example an administrative unit, was calculated from CDRs. Pre-earthquake census-based population data is however not available per tower area or for small administrative areas. Data from WorldPop, which provides gridded population estimates per 100×100 meter grid square for the entire country, was used to estimate finer level population counts39 (http://www.worldpop.org.uk/).

For each District (administrative level 3), WorldPop population counts for 2015, adjusted to match UN estimates at the national level, were summed to produce administrative-level population estimates. Scaling factors, used to estimate absolute population flows, were then calculated at District level. It was assumed that the ratio of the flow of SIM cards to the combined number of SIM cards is representative of the ratio of the actual population flow to the combined population. This can be formalised as:

Fig. 2: Mathematical formalisation of scaling assumption

resulting in the following scaling factor used in the analyses:

Fig. 3: Scaling factor derivation
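
Written out, one reading of the scaling assumption is the sketch below. The notation is introduced here for illustration only and is not taken from the figures; in particular, reading "combined" as the sum over the origin region A and the destination region B of a flow is an assumption.

```latex
% F = flow from region A to region B, N = count in a region;
% superscripts mark SIM cards versus total population (illustrative notation).
\frac{F^{\mathrm{SIM}}_{A \to B}}{N^{\mathrm{SIM}}_{A} + N^{\mathrm{SIM}}_{B}}
  \approx
\frac{F^{\mathrm{pop}}_{A \to B}}{N^{\mathrm{pop}}_{A} + N^{\mathrm{pop}}_{B}}
\quad\Longrightarrow\quad
F^{\mathrm{pop}}_{A \to B}
  \approx
\frac{N^{\mathrm{pop}}_{A} + N^{\mathrm{pop}}_{B}}{N^{\mathrm{SIM}}_{A} + N^{\mathrm{SIM}}_{B}}
\, F^{\mathrm{SIM}}_{A \to B}
```

Under this reading, the scaling factor for a pair of Districts would be the ratio of their combined population estimate to their combined number of active SIM cards.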

This was validated for the context of Nepal by comparing the scaled estimates of total inflows (including static users) for each area during the pre-earthquake baseline period with the census population data for that area, which showed a close match.

To understand how flows were changing in the weeks following the earthquake, anomalous flows were calculated for each week of interest, with the focal period gradually moving forward in time.

Analyses of returning residents

An additional question of importance to relief work is identifying regions to which people have not returned, as this may indicate regions in which recovery has not yet reached sufficient levels. One way to assess this is to identify users who have left their home location due to the earthquake, and continue to reside in another location.

This can be done by determining the home location of users over a long benchmark period prior to the earthquake (1st January until 7th April 2015 in this case), using the same method as for the flows calculation above. A user was counted as displaced if they had spent at least seven consecutive days away from their pre-earthquake home location in a two week period after the earthquake. Iterating through the remaining data the percentage of displaced users who remained away (i.e. at a location different to the pre-earthquake home location) was calculated.
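
A minimal sketch of this definition is given below, assuming one District-level location per user per day. The data, names and handling of days without a location are hypothetical: here a day with no record is treated as breaking a run of days away, and users with no location on the day being evaluated are left out of the percentage.

```python
def is_displaced(home, daily_areas, min_consecutive=7):
    """True if the user spent at least `min_consecutive` consecutive days away from home.

    `daily_areas` holds one entry per day of the post-earthquake window; None marks
    a day without a location, which resets the run (an assumption).
    """
    run = 0
    for area in daily_areas:
        if area is not None and area != home:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0
    return False


def percent_still_away(homes, displaced_users, locations_on_day):
    """Percentage of displaced users observed on a given day who are not at home."""
    observed = [user for user in displaced_users if user in locations_on_day]
    if not observed:
        return None  # no displaced user was observed on this day
    away = sum(1 for user in observed if locations_on_day[user] != homes[user])
    return 100.0 * away / len(observed)


# Hypothetical pre-earthquake home locations and a 14-day post-earthquake window.
homes = {"u1": "Kathmandu", "u2": "Kathmandu", "u3": "Gorkha"}
window = {
    "u1": ["Nuwakot"] * 8 + ["Kathmandu"] * 6,
    "u2": ["Nuwakot"] * 3 + ["Kathmandu"] + ["Nuwakot"] * 3 + ["Kathmandu"] * 7,
    "u3": ["Chitwan"] * 14,
}

displaced = sorted(user for user in homes if is_displaced(homes[user], window[user]))
later_day = {"u1": "Kathmandu", "u3": "Chitwan"}
still_away = percent_still_away(homes, displaced, later_day)
```

Calling `percent_still_away` once per subsequent day, with that day's locations, would give the kind of time series of users who have not yet returned that this section describes.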

Plotting the percentage of users who had not returned over time gives an indication of the rate at which users are returning to a given location. As this is a percentage it does not suffer from the uncertainties that scaling may introduce in the analyses above. Over time, a portion of users disappear from the data set. This may be because they have left Nepal or their SIM card has become inactive. We assumed that the same percentage of missing users remained away from home as those who were present in the data set.

Using this data, trends in the ‘return rate’ for each region can be derived as well as snapshots of the most recent data. In the latter case regions are coloured based on the mean and standard deviation of the dataset, marking regions with a high, medium and low proportion of people still away from home.

Results

The first preliminary results were available nine days after the earthquake, with the first full report released thirteen days after the earthquake. In this time the server was set up, the preprocessing and analysis code designed and written, all data processed, and the outputs checked for accuracy.

The first few weeks after the earthquake saw large flows out of the Kathmandu Valley area to surrounding areas (particularly to Nuwakot and Kavrepalanchok) as well as to the highly-populated areas in the central southern region of Nepal (Figure 4). Overall, an estimated 390,000 people above normal levels had left the valley. Flows from Kathmandu to the areas in the North were still higher than normal, but significantly lower than those to the South – likely due in part to the higher level of earthquake damage in the northern regions.

Looking at the flows into each region over time, from just after the earthquake until mid-July (Figure 5) shows sharply decreased flows into Kathmandu Valley immediately after the earthquake. This reduction gradually normalised. By early June the flows were very close to normal conditions, and by late June the estimated number of people in Kathmandu Valley had increased to above the pre-earthquake level (nearly 50,000 additional people had come into the Kathmandu Valley compared to pre-earthquake levels). This increase may be influenced by normal seasonal movements but may also be caused by the ongoing reconstruction work in the Kathmandu Valley.

The other regions can be categorised into three groups: those which experienced little changes in flows due to the earthquake (Okhaldhunga and Rasuwa, with inflows around 5,000 people above normal), those which experienced very large inflows immediately after the earthquake (Nuwakot and Gorkha, with inflows around 30,000 people above normal) and the remaining regions, which experienced a moderate increase in flows (with inflows around 10,000-25,000 people above normal) immediately after the earthquake. Flows in all regions (excluding the Kathmandu Valley) seem to have stabilised since late June.

Fig. 4: Anomalous flows from the Kathmandu valley, comparing the 10th-14th May with the 20th-24th April

Fig. 5: Anomalous inflows (above normal) for the ten focus Districts (note the two y axes to deal with the significantly higher values for Kathmandu)

The trends in return rates over time (Figure 6) show a significant decline over time for all regions, with almost 50% of the people displaced still away in early May (not shown in graph, for clarity) and a maximum of 15% still away in late July. Some regions are doing notably better (Dhading and Gorkha) and some notably worse (Bhaktapur and Kathmandu) than average. Some regions have recovered more quickly than others, and some regions have relatively sudden changes (for example, Dolakha in early July), which may coincide with recovery work in these regions.

Fig. 6: Percentage of people who left their home district who remain away, over time from early May after the earthquake until the end of July

Examining this spatially, at VDC-level (Figure 7), shows which regions are performing particularly well, or particularly poorly. Clusters of regions with low return rates can be seen in the outskirts of Kathmandu (potentially suggesting a focus of recovery on the city centre), regions to the south-west of Kathmandu and a number of the regions in the mountainous northern regions. While mobile coverage of the population was good, many areas are mountainous and sparsely populated. Areas with no coverage or insufficient data are marked in grey.

Fig. 7: Percentage of people displaced by the earthquake who remain away as of the 19th August, shown spatially for the focus Districts, at VDC-level.

Discussion

We showed that an estimated 390,000 people above normal left the Kathmandu Valley soon after the earthquake. Many of these moved to the surrounding areas, and the highly-populated areas in the central southern area of Nepal. People who left their home area after the earthquake have gradually returned, with the return rate varying between regions. By late July, all Districts had less than 15% of people still away from their original home location, with some as low as 5%.

The analyses presented above have provided an unprecedented level of information about human displacement after the Nepal earthquake. This data has never been available before in such a short time after a natural disaster has occurred, which was made possible by having a data access agreement in place as well as the operator providing rapid access to the data very soon after the earthquake. The analyses reveal national level population mobility patterns and return rates which are extremely difficult or impossible to acquire using other methods. These are of great relevance to humanitarian agencies, as mobility patterns can help identify where aid should be directed, and low return rates can identify areas where recovery and reconstruction work may not be progressing well.

Reports containing these results, along with interpretation, were distributed to relevant humanitarian agencies working on the ground in Nepal, and are available online at the WorldPop website (www.worldpop.org.uk/nepal/). Key results of analyses were included in reports by the UN Resident Coordinator40.

The analyses have several limitations, some of which were due to the speed with which the analyses needed to be delivered and several of which are possible to address in future studies. Phone ownership is skewed towards certain population groups: potential biases include higher ownership of phones among males than females as well as among higher income groups and certain age groups41,42,43. Similarly, phone usage within households can vary, and phones can be shared between multiple members of a household. While the scaling method described above is likely to account for some of these biases, estimates could be further improved by incorporating information on phone ownership from surveys in cases where those are available. It is also currently unknown to what extent family members without phones moved together with family members with phones.

Furthermore, analyses estimate the number of people moving after the earthquake but not the reasons for doing so. While estimation of population flows above and below normal levels aims to address this issue, a higher precision in displacement estimates would have been possible if analyses had been combined with population surveys. Such surveys are now being planned as well as comparison between CDR-derived displacement estimates and survey-based data collected by third parties.

CDRs only provide location updates for individuals when a call is made. In this dataset calling frequencies were relatively low, with around 50% of people calling at least every other day. Detailed movements among users with infrequently updated locations are therefore missing from the data44. However, the effects on the analyses are reduced when home locations are calculated using the modal value of multiple consecutive daily locations.

A small proportion of SIM cards become inactive over time. New SIMs tend to enter the dataset at roughly the same rate but there is no way of linking the old and new SIMs to a particular user. Most analyses can be corrected to account for this effect, but this may cause bias if the data is analysed over very long time frames.

In these analyses we assumed that towers belong to the administrative area in which they are placed, as changes in tower functionality and tower replacements took place repeatedly during the period. Coverage areas do however often extend over more than one administrative area, which may have contributed to bias. This can potentially be improved by dynamically accounting for changing tower coverage areas in the analyses. For many VDC areas little or no data is available. The principal reason for this is that the mobile network did not have sufficient coverage in these regions to make statistically significant inferences. However, while these areas geographically make up a large proportion of the area of Nepal, they are mostly concentrated in the sparsely populated mountainous areas. We were not able to take into account seasonal drivers of population movements. Ideally, one or several years of historical data would be used to adjust for seasonality in these mobility patterns.

Avouac et al. (2015)45 conclude that the Gorkha earthquake in Nepal has not released all of the stress from the tectonic plates in the region. Another earthquake is therefore likely to occur within the next few years, and Nepal routinely experiences large floods and landslides. Our ongoing work is focusing on further automating the analysis processes and improving estimation methods to allow us to rapidly provide high-quality estimates of population displacement.

Conclusions

The value of CDRs when integrated with more traditional data sources within modelling frameworks has been shown across multiple disease, development and disaster application examples. The field of CDR analytics in the humanitarian space is therefore moving beyond pilot studies and towards a more mature and operationally valuable platform. Here we have shown how this can be achieved in a disaster response situation through the partnership of scientists, mobile network operators and response agencies. The work described is not a single study, but the initiation of an ongoing dynamic and near-real time monitoring system, providing data support to a country with high levels of poverty, and populations that are highly vulnerable to the effects of natural disasters and disease outbreaks.

Competing Interests

The authors have declared that no competing interests exist.