Introduction

The 2014-2016 West African Ebola outbreak was the most pervasive in history, reaching epidemic proportions 1. The outbreak was marked with 11,325 fatalities, 28,652 reported cases and uncontrolled spread 2. It was further characterized as an epidemic of fear and anxiety, when case identification reached the US and Europe 3. In spite of reports describing modes of transmission, signs and symptoms and the low likelihood of a widespread epidemic, people in developed nations remained overly anxious. In fact, Americans listed Ebola as the third top health concern in October of 2014, clearly indicating fear and highlighting the need for effective health education 3,4. As described by the Conceptual Framework of Public Health Surveillance and Action (PHSA), public health surveillance and action to control disease are connected through data information messages. 5. To support public health action, health information messages must be widely disseminated to the public. As fear drives epidemics, evidenced by the Ebola spread in West Africa, it is essential to understand public opinion to identify health information needs 23. When these needs are recognized, tailored literacy appropriate health information can be disseminated to diminish widespread fear 6. Social media can support dissemination efforts. It allows tremendous opportunities to provide literacy appropriate health information through mass dissemination 8. Evidence by health information tweets disseminated regularly through organizations including the CDC, World Health Organization and local health departments 17, 21. Social Networking Sites like Twitter encourages users to provide and share information. The analysis of social media data allows for an assessment of the public’s knowledge, personal experiences, and health information needs 7.

Mining of social media data provides a snap shot of health knowledge in addition to the longitudinal tracking of changes in such knowledge and emerging health information needs over time. Micro-blogging is a very powerful and popular communication tool. Millions of messages are sent daily on micro-blogging sites such as Twitter 8. Research has demonstrated that Twitter is a reliable source for tracking knowledge and opinions for a variety of events and issues 7. Such tools allow the public to play a role in knowledge translation including information generation, filtering and dissemination 7,9 . It then becomes critical during epidemics and other emergencies to monitor online public perceptions and responses. Online monitoring allows for the examination of knowledge translation and for the modification and tailoring of health information for effective health educational campaigns 5,7.

The current study sought to assess health information needs about Ebola through longitudinal tracking. We provide a snapshot of beliefs, opinions and responses at three time points beginning in August 2014, when the outbreak raged out of control and through March 2015, when the outbreak showed evidence of containment. Through natural language processing (NLP) and content analysis, we explore how Twitter users communicate about Ebola-related events over time and identify health information needs that emerge at each time point.

Methods

Tweet Corpus. Tweets mentioning Ebola were collected longitudinally from Twitter (https://twitter.com/) via Google Chrome-based version of NCaptureTM a crawler that captures internet-based text including social media for the analysis of data in NVivo. The direct access of tweets (tweet text) through https://twitter.com provides the streaming Application Programing Interface (API) 11. Streaming API allows its users to capture only limited amount of all tweets (e.g., 18000 Tweets per 15 minutes), the domain experts (SY, MO) captured tweets daily for optimal access to available tweets from July 2014 to March 2015.

To assess the change in Ebola information needs, we analyzed tweets one week after the official CDC announcement of the Ebola outbreak in West Africa. We additionally analyzed tweets three months later after increased efforts to fight the epidemic and increase public awareness of the virus. We also analyzed tweets six months after the CDC announcement, when new cases were on a steady decline. At this time point, Liberia was reporting no new cases for two consecutive weeks. In Conakry, Guinea and Freetown Sierra Leone, the outbreak formed geographic contagious arcs enabling response efforts focused on smaller areas 10. Specifically, the three time points categorized by the domain experts 1) baseline: one week after the official July 28th, 2014 CDC announcement of Ebola (August 4th to 8th, 2014); 2) 3-month post: three months after the official CDC announcement of Ebola (October 23rd to 26th, 2014) and 3) 6-month post: six months after the official CDC announcement of Ebola (March 3rd to 7th, 2015).

Using Ebola as the keyword, we retrieved all available English and Spanish language tweets. All Ebola-related hashtags were daily monitored for its mutation by the domain experts and any mutated keywords were included in the data analysis. Examples of included hashtags are #Ebola, #EbolaOutbreak, #EbolaVirus and #EbolaFacts. All Ebola-related hashtags (both simple and complex) were considered. Tweets were included in the analysis if they referred to Ebola. Tweets were collected once to every 15 minutes for 4-5 days for the three identified time points based on the total number of the tweets. This allowed for the collection of a representative sample to overcome the challenges of time and amount limit (e.g., 18000 tweets per every 15 minutes). For example, we did not need to extract tweets 10 times per day when Ebola was rarely mentioned. On the other hand, when Ebola tweets spiked in frequency, 15 minute data collection was required by the domain experts. Data elements collected for each tweet included content, time stamps, geographical locations and self-identified locations, the user names, the message type (unique or retweet), the number of Twitter followers. Dissemination is defined as the number of tweets and subsequent retweets spread to Twitter followers.

Natural language processing was conducted to depict the topics discussed about Ebola. To identify topics of collected tweets, we cleaned symbols and web addresses, and transformed text to a vector form and N-gram. This followed by reducing the dimensionality of the volume using Notepad++ and Weka 3.7 12. Tweet cleaning included the removal of nonstandard or special characters that may hinder the use of content mining tools, including the removal of hashtags. Transferred text is represented as a vector of features. N-grams, an example of features, is a subsequence of N items in a given sequence from characters to words. A tweet term-frequency dictionary was computed with the N-gram method. To reduce the dimensionality to algorithmically process the data, stop words (e.g., and, to, for) were removed using the stop-word removal function in Weka 12. Stemming, the process of identifying a word root and removing suffixes and prefixes was also performed to remove dimensionality. The stemming algorithm Porter’s algorithm, an affix-removal approach, was applied through “snowball stemmers” within Weka10. Preprocessed data was then analyzed to discover patterns including hot topics and sentiment. Topics were detected and summarized through descriptive statistics (e.g., frequency count of specific tweets), visualization, classification and clustering. The N-gram forms (unigram, bigram and trigram) of tweet messages were clustered based on content similarities for topic detection. K-means algorithm was then applied using Weka 12. Clusters were visualized to summarize the detected topics using infographics. Tweets were also thematically categorized based on their content and user information. Users were further coded based on world regions (i.e., Africa, Asia, Europe, Middle East, North America and South America) based on the geocode of the tweets .

Results

Information Spread. Time point 1: After the first official CDC announcement telebriefing, 45,079 tweets were posted and disseminated to 11,698,549,157 Twitter followers from August 4th to 8th, 2014. Time point 2: From October 23th to 26th, 2014, 3 months after the first announcement a total of 66,552 tweets were posted and disseminated to 31,265,701,152 Twitter followers. Time point 3: From March 3rd to 7th, 2015, 6 months after the first announcement a total of 44,016 tweets were posted and disseminated to 11,141,339,478 Twitter followers (Figure 1) (10.6084/m9.figshare.5466358).

Figure 1 larger

Information Seeking and Needs. Tweet topics at Time point 1 were: 1) Readiness of U.S. hospitals; 2) History and symptoms to report (e.g., travel to West Africa with high fever); 3) Ebola Statistics (i.e., number of death, number of cases, affected countries) and 4) Transmission mode (i.e., information about how Ebola spreads). Topics at Time point 2 were: 1) Response of Spanish government to Ebola (e.g., public frustration with its corrupted process of Ebola treatment); 2) Non-scientific information-seeking behavior of Ebola cure (e.g., magic healing of crystal stone for Ebola); 3) U.S. government policy (i.e., public response to the 21-day Ebola quarantine for volunteers from West Africa in New York and New Jersey); and 4) Fear of volunteering in West Africa (i.e., discouragement of volunteering for Ebola in West Africa such as Doctors Without Border). Topics at Time point 3 were: 1) Non-governmental organization (NGO) focusing on Save the Children and Mexico (e.g., protecting children in vulnerable Mexico); 2) Priorities and public frustration (e.g., 30 countries being depicted as vulnerable for Ebola and consequent public frustration towards slow response); 3) Lawsuit towards a hospital (i.e., Nina Pham, who was cured of Ebola, is suing the Texas Health Presbyterian Hospital in Dallas for lack of training and proper equipment); and 4) North Korean lifting of 4-month travel ban (i.e., public response to this), Table 1 (10.6084/m9.figshare.5466370).

Table 1 larger

Main topics of concern discussed in tweets mentioning Ebola are illustrated with infographics in Figure 2 (10.6084/m9.figshare.5466361). The bubble chart in Figure 2 illustrates the topics of Ebola tweets at each time point. The sizes of the bubbles were normalized by clusters representing their relative frequencies. The frequency of N-gram of the main topic for each time point is reported in Figure 2.

Figure 2 larger

In depth six-month analysis on News Announcements (March 3-7, 2015). During March of 2015, as the epidemic came under control, information needs tweets declined. Frequently discussed themes emerged differently by continents. While Africa and North America were focusing on medical practice and care (e.g., hospital, nurse), main focus of Australia and AISA was north Korea and Europe focused on global leadership (e.g., responsible) and duty. Results indicating Tweets by world region are displayed in Figure 3 (10.6084/m9.figshare.5466367).

Figure 3 larger

During this period, tweets were regarding potential vulnerabilities to the virus, the Dallas Nurse lawsuit and the lifting of the North Korean travel ban. The one major information need still tweeted was Ebola-related vulnerabilities, all regions except Australia still felt vulnerable to Ebola exposure. In North America, Africa, Europe, Asia and the Middle East, top tweets focused around the Dallas nurse suing the hospital for Ebola exposure, Table 2 (10.6084/m9.figshare.5466373).

Table 2 larger

Top Tweets were then ordered based on frequency and stratified by world region grouped in our corpus. Most frequent and consistent tweets across regions were Nurses (North America, Europe, Asia, Middle East and Africa); Hospitals (North America, Asia, Middle East and Africa); Sues (North America, Europe, Australia, Middle East and Africa) and Fight (North America, Europe, Australia and Africa), Table 3 (10.6084/m9.figshare.5466379).

Table 3 larger

Discussion

Strategies for public health practice during outbreaks of emerging or reemerging infectious disease require a balance between protecting the health of the population and proper health education to minimize fear, discrimination and stigma 3,12. These difficulties mark the 2014-2016 Ebola outbreak. As the global community provided support to affected countries for containment and treatment of disease, fear spread unchecked throughout communities around the globe. Fear was driven by anxiety from a poor understanding of Ebola’s cause, infectivity and mortality 13. The press further drove anxiety and fear as they reported worst-case scenarios and the possibility of a pandemic23. It is evidenced that during outbreaks of epidemic or pandemic proportions, the public requires immediate health information 13,14. The Conceptual Framework of PHSA demonstrates how health systems can link outbreak surveillance to action through data information messages 5. However, the current study revealed through our content analysis of tweets at three time points during the 2014-2016 Ebola outbreak, information needs that persisted throughout the global community.

The foundation of outbreak surveillance is the delivery of useful and effective information whether to health officials to support disease containment or to develop tailored messages in health education campaigns 6,15. In fact, during the most frequent Tweets send from US local health departments during September – November 2014 were information dissemination including risks/symptoms, prevention, dispelling myths in addition to tweets regarding preparedness including governments and hospitals 21. Myths include speculative treatments saltwater for bathing or drinking and the ingestion of Nano Silver 22. According to the Centers for Disease Control and Prevention (CDC), public health education efforts during outbreaks should include the surveying of attitudes and knowledge to support effective mass media messages 9. Twitter and other social networking sites have the ability to support these education goals. These sites can be used as tools to both assess information needs and to disseminate tailored health information to address such needs 24. Our study, through content analysis, utilized Twitter to assess health information needs. In fact, tweets about the Ebola outbreak were posted in the tens of thousands with dissemination in the billions for all three-time points.

The week of August 4-8, 2014, reflects the early stage of health alerts when multiple countries confirmed Ebola cases and the fear of a global spread was at its peak. During this time point, Twitter users sought and disseminated Ebola information regarding concerns, transmission, infection and prevention. Top tweets were understandably regarding the readiness of US hospitals for potential cases. Concerns were followed by information needs, with the most frequent being the history of Ebola, symptoms, statistics on the epidemic and mode of transmission. This was also the time the first healthcare worker returned to the US (July 31, 2014) 13,16. Although the health worker was contained, Americans feared transmission and spread, indicating the tremendous need for health education campaigns. During this time period organizations such as Harvard and Columbia tweeted Ebola education messages, but dissemination was limited due to limited following, with no retweets from media sources 17.

In the week of October 23-26, 2014, non-scientific information-seeking behavior of Ebola prevention and cure (e.g., magic healing of crystal stone for Ebola) was the second major topic tweeted and retweeted. During this period, Tweets discussed “God’s magic healing crystal.” These findings clearly indicate health education campaigns were suboptimal in meeting the health information needs of the public. The end of October marked the first 3 months of extensive news coverage on the outbreak, travel bans and returning health professionals. Understandably, the second major topic tweeted was regarding the U.S. government policy surrounding containment, control and fear of those volunteering in West Africa. This was the ideal time to reinforce the modes of transmission, understanding signs and symptoms and plans for outbreak containment in the event of returning cases. Widespread reports from health entities including the CDC and the World Health Organization (WHO) also occurred early in the epidemic. In August of 2014, the WHO utilized Twitter to dispel social media rumors about such products claiming to cure and prevent Ebola 18. However, three months later we can see that tweets about such remedies were persistent and concerns of outbreak containment remained apparent. These results revealed that information needs were not effectively met.

In the week of March 3-7, 2015, priorities and public frustration was the third major topic tweeted. This topic consisted of reports from non-governmental organizations (NGOs) about the list of 30 vulnerable Ebola countries. During this March time point, no further knowledge distribution regarding the spread and treatment of Ebola was observed as we saw in the beginning of August and the end of October. Tweets did continue around government policy for containment and control, specific to the North Korean lifting of the travel ban and the consequences of ineffective US hospital readiness with the their top tweets regarding the Dallas hospital lawsuit. Tweets during the March time point included ‘Ebola is not a disease that kills the rich in the West’. Such comments were retweeted by thousands. These tweets further indicate severe knowledge deficit. In fact, the 2014-2016 outbreak is the first in an urban setting with overcrowding driving the epidemic 3. We cannot safely conclude that exposure to an infected person during rush hour on public transportation in a western city could not cause a problem. It is evident the need still existed for health information and effective education on outbreak containment and control.

An in-depth analysis of tweets the week of March 3-7 indicated that one major information need was vulnerabilities. All regions except Australia still felt vulnerable to Ebola exposure. In fact, South America, Africa and North America topped the list for frequent tweets about being vulnerable to Ebola. This is an indication of the continual and ongoing need for international health organizations to: 1) reiterate Ebola containment and control procedures and 2) to report the success of containment efforts and the effectiveness of airport and immigration screenings and quarantine. Such communication would have supported further reassurance of the public and remind them of their decreased vulnerability to Ebola exposure. Furthermore, this was an excellent time to reinforce the modes of Ebola transmission to diminish ongoing perceived vulnerabilities and fears. South America’s vulnerability concern stemmed from its close proximity to North America and Africa, tourism to their countries and the potential difficulty in managing or containing an Ebola outbreak 19. This was an optimal time to reinforce educational messages, outbreak control and containment, to pledge support from developed nations and international organizations. Additionally, this was the ideal time to reemphasize appropriate screening protocols and to communicate successful containment efforts and approaches, particularly for developing countries such as Nigeria 17.

In North America, Africa, Europe, Asia and the Middle East, top tweets focused around the Dallas nurse suing the hospital for Ebola exposure. Based on our analysis of on tweeted messages, top tweets provide direct insight into public sentiment about the actions taken by the nurse to sue for reckless endangerment. However, these messages are two–fold. They reinforced fear of deficiencies in the screening process, which made it possible for Mr. Duncan to leave the airport after arriving in Dallas. It also reinforced fears of hospital negligence. First, sending Mr. Duncan home and in turn exposing nurse Pham to Mr. Duncan’s most severe symptoms 20. This speaks to public sentiment regarding feelings of being unsafe. This again was a teachable moment allowing for the dispelling of myths, explaining what went wrong at the airport screening and at the Dallas hospital and discussing lessons learned. Information dissemination based on these tweets should have included new measures implemented to ensure improved screening procedures and protection for healthcare workers 17 .

Limitations. The generalizability of this study is limited due to the analysis of data in only two languages (English and Spanish) and a single social medium (Twitter). Our search strategies utilized all Ebola-related hashtag variations in an attempt to include the maximum number of Ebola-related tweets, with no analysis of their independent contributions to the results. Streaming API allows us to only capture approximately 18000 Ebola related Tweets per 15 minutes, therefore tweets were captured daily for optimal access to available tweets. Our tweet analysis focused on tweets developed or retweeted and not tweets that were only composed. Future studies can further support the understating of Ebola health information needs by exploring different social media in addition to blogs or community websites. Our study assessed information needs and not health related information provided to the public. Future studies can further explore the quality of health information dissemination. A major limitation of this study is that Twitter users may not reflect the demographic characteristics of general population, rather slightly younger and more racial minority groups25, 26, 2725. Unlike other social media, Twitter is more popular among Black (27%) and Latino (25%) compared to non-Hispanic white (21%) internet users 28. In terms of age, only 31% of Twitter users are over age 50, 26 a category into which majority of them are higher risks in any kinds of infectious diseases than the young adults. We recognize the biases that result from differences in demographic characteristics of social media and in this study we were not able to characterize them because demographic information is not available in publically available Twitter data through API26, 28.26 Nevertheless, the organic big data use such as Twitter has potential to supplement traditional survey methods as we learned the tremendous limitation of conventional survey methods from the U.S. presidential election29, 30, 29, 31.

Conclusion

Social media content can support outbreak surveillance efforts. Such sites allows for the development of networks to share and seek a variety of information regarding events including outbreaks such as Ebola. At the same time, social media can be an impediment to health security when it propagates (social contagion of simplistic ideas) information based on collective unawareness (e.g., crystals protect against Ebola). This phenomenon of social media fills a collective thought space that could be replaced with a wealth of scientific knowledge. Twitter and other social networking sites are also important for assessing health information needs and serving as platforms for information dissemination. Our study demonstrates how Twitter can support public health outbreak surveillance to identify health information needs. Although our study analyzed tweets on a global level, our methods can be replicated in more targeted efforts by public health practitioners on national and sub-national level. In our assessment, we have provided a window into public interests at three time points during the 2014-2016 Ebola epidemic. Ebola-related tweets were a rich source of experiences and opinions. Although a variety of health education efforts occurred during the outbreak, our results indicate health information needs were ongoing and transforming. Twitter can used as a knowledge transfer tool to assess the ongoing needs and support relevant data information messages during outbreak control and surveillance efforts. Content mining of large big volume of tweets provides a convenient and fast assessment of response and can in turn, support literacy appropriate health information to improve behavioral response and diminish fear.

Corresponding Author

Michelle Odlum mlo12@columbia.edu

Competing Interests

The authors have declared that no competing interests exist.

Data Availability Statement

A minimal dataset used to produce the findings of this paper is freely available to other researchers http://figshare.com/. All relevant DIOs are listed withn the text for each Table and Figure.