Introduction

On April 15 at 2:49pm EDT, two bombs improvised from pressure cookers exploded on the sidewalk near the finish line of 117th Boston Marathon. 264 patients were transported to 19 emergency departments throughout Boston1.

Public health authorities provided alerts to regional emergency departments approximately nine minutes after the explosions, just as ambulances left the scene2 via the Massachusetts Emergency Preparedness Bureau (MA EPB) Health and Homeland Alert Network (HHAN). The Massachusetts Central Medical Emergency Direction (CMED) Center also began communicating to individual hospitals by radio in the minutes prior. We sought to measure the timing of social media reports in relation to those issued through these official emergency response channels.

Social media and other mobile platforms enable individuals to post messages along with specific geographic information. These messages can help track infectious disease outbreaks,3 aid in natural disaster response4 and provide insight into conflicts5, where data collected through official reporting structures can take weeks to collect and analyze. Twitter streams are routinely used by high frequency trading applications to rapidly assess external factors that may affect market conditions6 and used in marketing to assess consumer response in real-time to advertisements.7 Data from informal media are typically available in near real-time and can be combined with geographically encoded news media reports and traditional data sources to improve surveillance.8 Here, we characterize the early social media response to a geographically-constrained, rapidly evolving critical situation.

Geo-localized social media trend analysis

Our analysis was based on the set of Twitter postings with geolocation data (latitude and longitude) freely available via the Public Twitter API.9 To increase specificity, we narrowed the radius of the tweets to 35 miles from the Boston Marathon finish line. We observed messages containing the word stems: ‘explos*’ or ‘explod*’, just 3 minutes after the explosions (Figure 1). When adding words beginning with ‘bomb’, a picture quickly emerges of an incident warranting further exploration.

While an increase in messages indicating an emergency from a particular location may not make it possible to fully ascertain the circumstances of an incident without computational or human review, analysis of such data could help public safety officers better understand the location or specifics of explosions or other emergencies. Figure 1 illustrates the timeliness of the social media data, along with initial Twitter postings about the event from pertinent national and local news sources (CNN, the Associated Press, Boston WCVB) and with electronic messages from the MA EPB HHAN. Social media messages directly from individuals on the ground were timely, followed closely by validated public health alerts; messages from news sources followed both of these.

Fig. 1: Cumulative time series of tweets from within a 35 mile radius of the Boston Marathon finish line selected using the stems “explod*”, “explos*” and “bomb*” after the bombings at 2:49.

Public health officials alerted regional emergency departments via the HHAN at 2:59. Reports from news stations such as WCVB, the Associated Press, and CNN followed shortly after.

The baseline level of messages with these included keywords is very low in this area. For the stems ‘explos*’ and ‘explod*’, there was only one other message the day prior to the explosions in the same geographic radius.

Geolocation and characterization of the event

Within the first 10 minutes of the bombings, many of the observed messages were from the immediate vicinity of the finish line (Figure 2).

Fig. 2: Public Twitter messages selected using the stems “explod*” and “explos*” in the immediate vicinity of the Boston Marathon finish line from the first 20 minutes after the bombings.

These messages were selected for inclusion because of their proximity to the event and content of these postings. Source OpenStreetMap.

Because of their proximity to the event and content of their postings, these individuals might be witnesses to the bombings or be of close enough proximity to provide helpful information. These finely detailed geographic data can be used to localize and characterize events assisting emergency response in decision-making. In the Supplementary Materials, we include the text of the Twitter messages from individuals (SM Table 1) and (SM Table 2), and Twitter postings from select news sources (SM Table 3). We have redacted expletives and personal identifiers.

Discussion

Each year, the Boston Athletic Association provides a medical tent near the finish line of the Boston Marathon as well as a strong security and media presence. Hence, first responders, law enforcement and reporters were already present near the explosions, and were able to respond to the injuries, activate the emergency response system, and begin the investigation. In other situations, crowd-sourced information may uniquely provide extremely timely initial recognition of an event and specific clues as to what events may be unfolding–e.g. “area of 671 Boylston St.”, “hundreds hurt…bloody”—that could be used to tailor and refine the response.2 Here we described how data from Twitter provided localization and characterization of the Boston Marathon explosions.

Caution in the use of social media reports is warranted, however. While social media data can provide timely insight into events as they unfold, they may also produce false positive reports with negative effects, as illustrated by a powerful example from the financial sector. In the recent “flash crash,” a spurious Twitter report of a White House attack by the Associated Press was promulgated over 4,000 times.6 This led automated financial systems to take rapid – but inappropriate – action, which was quickly reversed.10

Classification strategies and filtering approaches that have been developed for disease surveillance,8,11,12,13 geospatial cluster identification,14 and crime tracking15 may help refine the sensitivity and specificity as well as classify postings into relevant categories such as personal, informative and other.16 Additionally, by comparing newly observed data against temporally adjusted keyword frequencies, it is possible to identify aberrant spikes in keyword use. The inclusion of geographical data allows these spikes to be geographically adjusted, as well. Prospective data collection could also harness larger and other streams of crowdsourced data, and use more comprehensive emergency-related keywords and language processing to increase the sensitivity of this data source. The analysis of multiple keywords could further improve these prior probabilities by reducing the impact of single false positive keywords derived from benign events.

In the wake of the explosions, Twitter became a news source for many individuals, which allowed unvetted information to enter the public sphere. For example, in our keyword analysis, we see two tweets that blamed individuals from Korea for the explosions (SM Table 1 and SM Table 2). This misinformation — which may be posted in a practically anonymous fashion — is difficult to correct or expunge once it has been cited by the media or shared extensively on social media platforms.

Given this risk, the sensitivity and specificity of notifications must be optimized using the perceived cost of intervention (e.g. unnecessary investigation) and the opportunity cost of not reacting when appropriate (e.g. delay in care). In events that unfold over a longer period of time, or require more in-depth investigation, it is possible that the dangerous impact of false positives may be reduced, or that the data may be weighted to mitigate unwanted consequences.

There is a real opportunity to make use of Twitter streams and other social media data to expedite public health, safety, or medical response in crises. Approaches to actively survey social media to complement traditional approaches to situation awareness after emergency events should be developed which integrate with existing analysis and alerting infrastructure.