Influenza – PLOS Currents Influenza
Wed, 10 Oct 2018 16:13:49 +0000

Critical paths in a metapopulation model of H1N1: Efficiently delaying influenza spreading through flight cancellation
Mon, 23 Apr 2012 21:51:48 +0000


Complex networks are pervasive and underlie almost all aspects of life. They appear at different scales and in different paradigms, from metabolic networks, the structural correlates of brain function and the threads of our social fabric to the larger scale at which cultures and businesses come together through global travel and communication [1] [2] [3] [4] [5] [6].

Recently, these systems have been modelled and studied using network science tools, giving us new insight into fields such as sociology, epidemics, systems biology and neuroscience. Typically, components such as persons, cities, proteins or brain regions are represented as nodes and connections between components as edges [6] [7].

Many of these networks can be categorised by their common properties. Two properties relevant to spreading phenomena are the modular and scale-free organization of real-world networks. Modular networks consist of several modules with relatively many connections within modules but few connections between modules. Scale-free networks contain highly connected nodes (hubs), and the probability of a node having k edges follows a power law $k^{-\gamma}$ [8] [9]. A network can show both scale-free and modular properties; however, the two features may also appear independently. The worldwide airline network observed in this study was found to be both scale-free and modular [10].

Spreading in networks is a general topic ranging from communication over the Internet [11] [12] and phenomena in biological networks [13] to the spreading of diseases within populations [14]. Scale-free properties of airline networks are of interest in relation to the error and attack tolerance of these networks [5] [15]. For scale-free networks, the selective removal of hubs produced a much greater impact on structural network integrity, as measured through increases in shortest-path lengths, than the removal of randomly selected nodes [15]. Structural network integrity can also be influenced by partially inactivating specific connections (edges) between nodes [16] [17] [18]. Dynamical processes such as disease spreading over heterogeneous networks were also shown to be impeded by targeting the hubs [19] [20], with similar findings for the highest-traffic airports in the case of SARS epidemic spreading [5].

In contrast to predictions of scale-free models, recent studies of the airline network [21] demonstrated that its structural cohesiveness did not arise from the high-degree nodes but from its particular community structure, which meant that some of the less connected airports had a more central role (indicated by a higher betweenness centrality, the fraction of all-pairs shortest paths crossing each node). Here we expand on this finding by considering a range of centrality measures for individual connections between cities, show that their targeted removal can improve on existing strategies [5] for controlling influenza spreading, and finally discuss the effect of the community structure on this control.

To demonstrate the impact on influenza spreading caused by topological changes to the airline network, we run simulations using a stochastic metapopulation model of influenza [22] [23] where the worldwide network of commercial flights is used as the path for infected individuals traveling between cities (see Fig. 1A with Mexico City as starting node of an outbreak). For this, we observe individuals within cities that contain one of the 500 most frequently used airports worldwide (based on annual total passenger number). Individuals within the model can be susceptible (S), infected (I), or removed (R). The number of infected individuals depends on the population of each city and the volume of traffic over airline connections between cities. Note that the time course of disease spreading will also be influenced by seasonality [23]; however, only spreading in one season was tested here.

Fig. 1: (A) Spreading over the airline network with Mexico City as starting node (red)

Nodes in yellow are directly connected whereas nodes in green are airports not directly linked to the starting point. (B) Connectivity of the airline network showing four clusters. A dot in the adjacency matrix indicates the presence of a connection between two cities.

The simulated epidemic starts on 1 July 2007 from a single city, Mexico City in our case, and its evolution over the following year is recorded. We then consider the number of days necessary for the epidemic to reach its peak as well as the maximum number of infected individuals (Fig. 2A). This procedure is then repeated following the removal of a percentage of connections ranked by a range of distinct measures such as edge betweenness centrality, Jaccard coefficient, or difference and product of node degrees. Finally, we also test the effect of shutting down the most highly connected airports (hubs) up to the same level of cancelled connections.

Comparing single edge removal strategies against the previously proposed shutdown of whole nodes (airports), we find that removing selected edges has a greater impact on the spreading of influenza with a significantly smaller loss of connectivity between cities. For the global airline network, only a smaller set of flight routes between cities would need to be stopped, instead of cancelling all the flights from a set of airports, to get the same reduction in spreading.

In addition, as demonstrated in [21] for structural cohesiveness and in [24] regarding dynamical epidemic spreading, it is the community structure and not the degree distribution that plays a critical role in facilitating spreading. Our method of slowing down spreading by removing critical connections is efficient as it targets links between such communities.

Concerning the computational complexity, whereas some strategies are computationally costly for large or rapidly evolving networks, several edge removal strategies are as fast as hub removal while still offering much better spreading control.

Note that whereas we observed similar strategies in an earlier study [25], the current work includes the following changes: First, simulations run at the level of individuals rather than simulating whether the disease has reached (‘infected’) airports. Second, the spreading between cities, over the airline network, now depends on the number of seats in airline connections between cities. This gives a much more realistic estimate of the actual spreading pattern as not only the existence of a flight connection but the specific number of passengers that flow over that link is taken into account. Third, the previous study used an SI model that is suitable for early stages of epidemic spreading. However, in this study we use an SIR model that allows us to observe the time course of influenza spreading up to one year after the initial outbreak.


For the network used in the study, the top 500 cities worldwide with the highest-traffic airports became the nodes, and an edge connects two such nodes if there is at least one scheduled passenger flight between them. Edges are then weighted by the daily average passenger capacity for that route. Spreading in this network can then show how a disease outbreak, e.g. H1N1 influenza or SARS, can spread around the world [5] [23].

As in previous studies [5] [26], we have used a similar methodology [22] where one city is the starting point for the epidemic and air travel between such cities offers the only transmission path for an infectious disease to spread between them. Due to the relevance of the recent H1N1 (Influenza A) epidemic, we have used Mexico City as the epidemic starting point of our simulations.

Spreading simulations starting in Mexico City with 100 exposed individuals were summarised by N_infectious, the greatest number of infected individuals that were infectious at any time during the epidemic. Spreading control strategies were evaluated by removing up to 25% of the flight routes and measuring the resulting decrease in N_infectious (see Fig. 2A and Methods). Measures based on edge betweenness and Jaccard coefficient were the two best predictors of critical edges (Fig. 1A). Among the top intercontinental connections identified by betweenness centrality are flights from Sao Paulo (Brazil) to Beijing (China), Sapporo (Japan) to New York (USA) and Montevideo (Uruguay) to Paris (France). After removing a quarter of all edges, the infected population decreased by 37% for edge betweenness centrality and 23% for the Jaccard coefficient, compared to only 18% for the hub removal strategy.

Fig. 2: Influenza spreading for Mexico City as starting node

(A) Influenza spreading for Mexico City as starting node, measured by the number of infected individuals over time on the intact network (blue) and after removing 25% of edges by hub removal (red) or edge betweenness (green). (B) Maximum infected population following sequential edge elimination by betweenness centrality, Jaccard coefficient, difference and product of degrees and hub removal (see Methods).

Whereas in [23] a control strategy based on travel restrictions found that travel would need to be cut by 95% to significantly reduce the infected population, we observed that by removing connections ranked by edge betweenness this reduction appeared after 18% of flight routes were cancelled (see Fig. 2B).

To understand the underlying mechanism of these results we produced two rewired versions of the original network: one version preserved the degree distribution alone while another preserved both the latter and also the original community structure.

Applying the same spreading simulations to these rewired versions of the network showed that only networks that preserved the original community structure displayed a significant reduction in infections when removing edges ranked by Jaccard coefficient (see Fig. 3). For the 25% restriction level considered, betweenness centrality was the best measure even when no communities were present, offering a 41% reduction in infected cases in both types of network.

Fig. 3: Worldwide infections over time following edge removal as selected by edge betweenness centrality and Jaccard coefficient.

The two plots show results for two rewired versions of the original airline network: one preserving only the original degree distribution and another preserving both the degree distribution plus original communities. Full lines show the averaged results over 25 rewired versions of the airline network. Intact results are for the complete networks, while Jaccard and betweenness lines show the averaged results following the removal of 25% of edges as selected by each respective measure. Dotted lines show each corresponding standard deviation.

This apparent advantage of betweenness even in networks without communities is due to its use of the capacity of each connection (edge weight): at 25% edge removal it will have removed most major high-capacity connections from the network. Jaccard is a purely structural measure without knowledge of capacity, so the presence of communities is critical for its performance. At lower levels of damage we see that Jaccard is better than edge betweenness centrality at reducing infected cases in networks with community structure.


Selecting specific edges for removal efficiently controls spreading in the airline network. Although this was not tested directly, cancelling fewer flights might also mean that fewer passengers are affected by these policies compared to the approach of cancelling mostly flights from highly connected nodes (hubs). With the same number of removed connections, edge removal strategies resulted in both a larger slowdown of spreading and a much smaller number of infected individuals compared to hub removal strategies.

Edge betweenness was best at predicting critical edges that carried the greater traffic, weighted by number of passengers traveling, resulting in a large reduction in infectious population; however, we also observed that removing edges ranked using the purely structural Jaccard coefficient (see Fig. 2A) led to the greatest delay in reaching the peak of the epidemic. Among the best predictor edge measures, the Jaccard coefficient, with a computational complexity of $O(n^2)$, is the fastest to calculate, making it particularly suitable for large networks or networks where the topology frequently changes. Edge betweenness was the computationally most costly measure with $O(n \cdot e)$, for a network with n nodes and e edges.

Whereas hub removal was the worst strategy in this study, node centrality might lead to better results. Indeed, previous findings [10] show that the most highly connected cities in the airline system do not necessarily have the highest node centrality. However, node centrality would be computationally as costly as edge betweenness.

Highly ranked connections predicted by edge measures were critical for the transmission of infections or activity and can be targeted individually with fewer disruptions for the overall network. In the transportation network studied, this means higher ranked individual connections could be cancelled instead of isolating whole cities from the rest of the world.

Results obtained from simulating the same spreading strategy over differently rewired versions of the airline network demonstrated the mechanism behind the performance of the Jaccard predictor in slowing down spreading in networks that display a community structure, as is the case for spatially distributed real-world networks [27] [28] [29]. This is a good measure for these types of networks, given its computational efficiency and the little information it requires to compute the critical links: it needs nothing more than the connections between nodes.

The current study tested different strategies and different percentages of removed edges, leading to a large number of scenarios. Therefore, several simplifications had to be made, whose role could be investigated in future studies. First, only one starting point for epidemics, Mexico City, was tested. While this is in line with earlier studies using 1-3 starting points [5] [23], it would be interesting to test whether there are exceptions to the outcomes presented here. Second, spreading was observed only in one season, summer. Previous work [23] has pointed out that the actual spreading pattern differs for different seasons. Third, only the 500 airports with the largest traffic volume rather than all 3,968 airports were included in the simulation. While this was done in order to be comparable with the earlier study of Hufnagel et al. [5], tests on the larger dataset would be interesting. Including airports with lower traffic volumes might preferentially add national and regional airports within network modules. This could lead to a faster infection of regions; however, connections between communities would still remain crucial for the global spreading pattern.

Compared to our earlier study where the spreading of infection between airports rather than individuals was modelled [25], edge betweenness could reduce the maximally infected population number more than targeting network hubs. The Jaccard coefficient that showed very good performance in the earlier study [25], however, did not perform better than the hub strategy. The difference and product of node degrees were poor strategies for both spreading models. This indicates that metapopulation models can lead to a different evaluation of flight cancellation strategies for slowing down influenza spreading.

In conclusion, our results point to edge-based component removal for efficiently slowing spreading in airline and potentially other real-world networks.

Materials and Methods

The network of connections between the top 500 airports is available under the resources link on our website. Note that distribution of the complete data set, including all airports and traffic volumes, is not allowed due to copyright restrictions. However, the complete dataset can be purchased directly from OAG Worldwide Limited.

Airline Connections Network

As in other work [5] [10], we obtained scheduled flight data for one year provided by OAG Aviation Solutions (Luton, UK). This listed 1,341,615 records of worldwide flights operating from July 1, 2007 to July 30, 2008, which is estimated by OAG to cover 99% of commercial flights. The records include the cities of origin and destination, days of operation, and the type of aircraft in service for that route. Airports were uniquely identified by their IATA code together with their corresponding cities. These cities became the nodes in the network. Short-distance links corresponding to rail, boat, bus or limousine connections were removed from our data set. An edge connecting a pair of cities is present if at least one scheduled flight connected both airports. As in previous studies [5], we used a sub-graph containing the 500 top airports, obtained by selecting the airports with the greatest seat traffic combining incoming and outgoing routes. This subset of airports still represents at least 95% of the global traffic, and as demonstrated in [30] it includes sufficient information to describe the global spread of influenza. We are allowed to make the restricted data set of 500 airports available and you can download it under the resources link at

Spreading Model

Our analysis is based on the stochastic equation-based (SEB) epidemic spreading model as used in [31], simulating the spreading of influenza both within cities and at a global level through flights connecting the cities’ local airports. Within cities, a stochastically variable portion of the susceptible population establishes contact with infected individuals. This type of meta-population model accounts for 5 different states of individuals within cities: non-susceptible, susceptible, exposed, infectious, and removed (deceased). As we have not considered vaccination in this model, we did not use the non-susceptible class in our study.

Movement of individuals between cities is computed deterministically from the daily average passenger seats on flights between cities. Once infectious, individuals will not travel. We have assumed a moderate level of transmissibility between individuals, where R0 = 1.7, as also used in other influenza studies [31] [32]. Note, however, that future epidemics of H5N1 and other viruses might have different levels of transmissibility. In [5] a similar model including stochastic local dynamics was used; however, it was focused on a specific outbreak of SARS (Severe Acute Respiratory Syndrome), and Hong Kong was considered its starting point.
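The structure described above (stochastic local dynamics, deterministic seat-based travel, no travel while infectious) can be illustrated with a minimal sketch. It is not the SEB simulator used in the study: the binomial daily transitions, the latent and infectious periods, and the names `step` and `simulate` are assumptions made for illustration only; only R0 = 1.7 is taken from the text.

```python
import random


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def step(cities, seats, beta, sigma, gamma, rng):
    """Advance all cities by one day: local S->E->I->R transitions, then travel."""
    for c in cities.values():
        n = c["S"] + c["E"] + c["I"] + c["R"]
        p_inf = min(1.0, beta * c["I"] / n) if n else 0.0
        new_e = binomial(c["S"], p_inf, rng)   # stochastic local infection
        new_i = binomial(c["E"], sigma, rng)   # exposed become infectious
        new_r = binomial(c["I"], gamma, rng)   # infectious are removed
        c["S"] -= new_e
        c["E"] += new_e - new_i
        c["I"] += new_i - new_r
        c["R"] += new_r
    # Deterministic travel: seats are shared among S, E and R in proportion
    # to their numbers; infectious individuals stay where they are.
    for (origin, dest), daily_seats in seats.items():
        src, dst = cities[origin], cities[dest]
        mobile = src["S"] + src["E"] + src["R"]
        if mobile == 0:
            continue
        for state in ("S", "E", "R"):
            moved = min(src[state], daily_seats * src[state] // mobile)
            src[state] -= moved
            dst[state] += moved


def simulate(cities, seats, days, r0=1.7, latent_days=2.0,
             infectious_days=3.0, seed=0):
    """Run the sketch in place and return the peak number of infectious people.

    latent_days and infectious_days are illustrative assumptions.
    """
    rng = random.Random(seed)
    gamma = 1.0 / infectious_days
    sigma = 1.0 / latent_days
    beta = r0 * gamma  # so that beta / gamma equals the assumed R0
    peak = 0
    for _ in range(days):
        step(cities, seats, beta, sigma, gamma, rng)
        peak = max(peak, sum(c["I"] for c in cities.values()))
    return peak
```

Here `cities` maps a city name to its compartment counts and `seats` maps a directed pair of cities to daily seats; removing a flight route corresponds to deleting its entry from `seats` before calling `simulate`.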

Edge Removal Strategies

Five candidate measures for predicting critical edges in networks were tested. The measures are based on a range of different parameters including node similarity, degree and all-pairs shortest paths. Measures are taken only once from the intact network and are not recomputed after each removal step.

Edge betweenness centrality [33] [34] represents how many times that particular edge is part of the all-pairs shortest paths in the network. Edge betweenness can show the impact of a particular edge on the overall characteristic path length of the network; a high value reveals an edge whose removal will quite likely increase the average number of steps needed for spreading.
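As a rough illustration of this measure, the following sketch counts shortest paths through each edge of a directed graph using a breadth-first variant of Brandes-style accumulation. It treats all edges as unweighted, which is a simplifying assumption: the study's own MATLAB implementation is not reproduced here, and the function name `edge_betweenness` is hypothetical.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Edge betweenness for an unweighted directed graph.

    adj maps each node to a list of successor nodes. The score of edge (u, v)
    is the number of shortest paths using it, with ties shared fractionally.
    """
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    score = defaultdict(float)
    for s in nodes:
        dist = {s: 0}
        sigma = defaultdict(float)  # number of shortest paths from s
        sigma[s] = 1.0
        preds = defaultdict(list)   # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Back-propagate path counts from the farthest nodes towards s.
        delta = defaultdict(float)
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                score[(u, v)] += share
                delta[u] += share
    return dict(score)
```

On a toy graph made of two triangles joined by a single link, the joining link receives the highest score, which matches the intuition that high-betweenness edges connect otherwise separate regions.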

The Jaccard similarity coefficient (or matching index [35] [36]) shows how similar the neighbourhood connectivity structure of two nodes is; for example, two nodes that share the exact same set of neighbours would have the maximum similarity coefficient of 1. A low coefficient reveals a connection between two different network structures that might represent a "shortcut" between remote regions, making such low Jaccard coefficient edges a good target for removal.
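A minimal sketch of ranking edges by this coefficient is given below. It assumes an undirected graph stored as neighbour sets and computes the plain ratio of shared to combined neighbours; whether the two end nodes themselves are counted in their neighbourhoods differs between definitions, so that detail, like the function names, is an assumption.

```python
def jaccard_edge_scores(adj):
    """Jaccard coefficient of the neighbour sets of each edge's end nodes.

    adj maps each node to the set of its neighbours (undirected graph).
    Keys of the result are sorted node pairs.
    """
    scores = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            key = tuple(sorted((u, v)))
            if key in scores:
                continue
            union = adj[u] | adj[v]
            shared = adj[u] & adj[v]
            scores[key] = len(shared) / len(union) if union else 0.0
    return scores


def removal_order(scores):
    """Edges sorted from lowest to highest coefficient (likely shortcuts first)."""
    return sorted(scores, key=lambda edge: (scores[edge], edge))
```

Because only neighbour sets are needed, the ranking requires no information on traffic volumes.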

The absolute difference of degrees for the adjacent nodes is another measure of similarity of two nodes. A large value here indicates a connection between a network hub and a more sparsely connected region of the network.

The product of the degrees of the nodes connected by the edge is high when both nodes are highly connected (hubs).
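Both degree-based measures follow directly from the node degrees. The short sketch below computes them for an undirected edge list; the function name and the edge-list representation are illustrative assumptions.

```python
from collections import Counter


def degree_measures(edges):
    """Return {edge: (absolute degree difference, degree product)}."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {
        (u, v): (abs(degree[u] - degree[v]), degree[u] * degree[v])
        for u, v in edges
    }
```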

When testing the absolute difference and product of degrees, we also considered the opposite removal strategy (starting with the lowest values), but it consistently under-performed compared to all other measures (not shown).

Finally, highly connected nodes are detected and each such node, and therefore all the edges of that node, is removed from the network. Note that this is referred to as the 'hub removal strategy', whereas the impact is shown in relation to the number of edges which are removed after each node removal.
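To make the comparison with edge-based strategies concrete, the sketch below removes nodes in order of decreasing degree, ranked once on the intact network, until a given number of edges has been cancelled. The stopping rule and the name `hub_removal` are assumptions about how "the same level of cancelled connections" could be matched.

```python
from collections import Counter


def hub_removal(edges, budget):
    """Remove highest-degree nodes until at least `budget` edges are gone.

    Degrees are ranked once on the intact network. Returns the removed
    nodes and the set of remaining edges.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranking = sorted(degree, key=lambda node: (-degree[node], str(node)))
    remaining = set(edges)
    removed_nodes = []
    for node in ranking:
        if len(set(edges)) - len(remaining) >= budget:
            break
        # Dropping a node cancels every route that starts or ends there.
        remaining = {edge for edge in remaining if node not in edge}
        removed_nodes.append(node)
    return removed_nodes, remaining
```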

Simulation Algorithm

Original simulation code, as used in [23], was obtained from the MIDAS Project (Research Computing Division, RTI International). The simulator was developed in the Java (Sun Microsystems, USA) programming language using the AnyLogic™ (version 5.5, XJ Technologies, USA) simulation framework to implement the dynamical model. Network measures were implemented in custom MATLAB (R2008b, MathWorks, Inc., Natick, USA) code. Results were further processed in MATLAB. Simulations were run in parallel on a 16-core HP ProLiant server, using the Sun Java 6 virtual machine.

Edge betweenness centrality was implemented using the algorithm by Brandes [34]. Links between cities in the network were considered to be directed; the network used included a total of 24,009 edges.

Mexico City was used as a starting node, as observed in the recent 2009 H1N1 pandemic. The starting date of the epidemic was assumed to be 1 July, and the pandemic evolution is simulated over the following 365 days, covering all the effects of seasonality as seen in both the Southern and Northern hemispheres.

Following the removal of each group of edges ranked by each control strategy, the spreading simulations were repeated.

Testing Rewired Networks

To test whether the mechanism of control arose from the particular community structure or degree distribution, we observed two different rewired versions of the original network. In one version only each individual node degree was maintained and the whole network was randomly rewired, destroying the original community structure. For the second, the original community structure was preserved but the sub-network within each community was rewired, so connections within the community were rearranged but the original inter-community links were preserved. Both rewiring strategies preserved the original degree structure by the commonly used algorithm [37], so that the number of passengers departing from each city stayed the same and passengers were only shuffled to different destinations. In this way only the connectivity structure was modified.
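One common way to rewire a directed network while keeping every node's in- and out-degree is to repeatedly swap the destinations of two randomly chosen edges. The sketch below uses this scheme; that it corresponds to the algorithm cited as [37] is an assumption, edge weights (passenger numbers) are ignored, and the optional `community` argument is a hypothetical way of restricting swaps to edges lying inside one community.

```python
import random


def rewire(edges, n_swaps, community=None, seed=0):
    """Degree-preserving rewiring of a directed edge list by target swaps.

    Two edges (a, b) and (c, d) become (a, d) and (c, b). If `community`
    (node -> label) is given, a swap is only made when all four nodes share
    one label, so links between communities are never touched.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    done = 0
    attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if community is not None:
            if len({community[x] for x in (a, b, c, d)}) != 1:
                continue
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list
```

Calling `rewire` without `community` gives the first type of rewired network described above; passing the community labels gives the second.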

The original community structure was identified using a heuristic modularity optimization algorithm [38], which identified four distinct clusters. These are predominantly geographic: one for North and Central America, including Canada and Hawaii, another for South America, a third including the greater part of China (except Hong Kong, Macau and Beijing) and finally a fourth including all other airports (Fig. 1B).
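Modularity optimization searches for the partition that maximises a quality score Q. The sketch below only evaluates the standard Newman modularity of a given partition for an undirected, unweighted edge list; it does not perform the heuristic optimization of [38], and the function name is an assumption.

```python
from collections import Counter


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities of (intra edges / m) - (degree sum / 2m)^2,
    where m is the total number of edges.
    """
    m = len(edges)
    intra = Counter()       # edges with both ends in the community
    degree_sum = Counter()  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(
        intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )
```

A partition that places every node in a single community gives Q = 0, while well-separated clusters give positive values.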

Twenty rewired networks were generated for each version of the rewiring algorithm and the daily average evolution of influenza, using the same spreading algorithm as above, was taken across these 20 networks. This was repeated after the removal of each group of edges. Therefore each measure on each of the rewired lots combines 182,500 individual results.

Competing Interests

The authors have declared that no competing interests exist.

Recrudescent wave of pandemic A/H1N1 influenza in Mexico, winter 2011-2012: Age shift and severity
Fri, 24 Feb 2012 09:33:03 +0000

BACKGROUND:

A substantial recrudescent wave of pandemic influenza A/H1N1 that began in December 2011 is ongoing and has not yet peaked in Mexico, following a 2-year period of sporadic transmission. Mexico previously experienced three pandemic waves of A/H1N1 in 2009, associated with higher excess mortality rates than those reported in other countries, and prompting a large influenza vaccination campaign. Here we describe changes in the epidemiological patterns of the ongoing 4th pandemic wave in 2011-12, relative to the earlier waves in 2009. The analysis is intended to guide public health intervention strategies in near real time.


We analyzed demographic and geographic data on all hospitalizations with acute respiratory infection (ARI) and laboratory-confirmed A/H1N1 influenza, and inpatient deaths, from a large prospective surveillance system maintained by the Mexican Social Security medical system during 01-April 2009 to 10-Feb 2012. We characterized the age and regional patterns of A/H1N1-positive hospitalizations and inpatient-deaths relative to the 2009 A/H1N1 influenza pandemic. We also estimated the reproduction number (R) based on the growth rate of the daily case incidence by date of symptoms onset.


A total of 5,795 ARI hospitalizations and 186 inpatient-deaths (3.2%) were reported between 01-December 2011 and 10-February 2012 (685 A/H1N1-positive inpatients and 75 A/H1N1-positive deaths). The nationwide peak of daily ARI hospitalizations in early 2012 has already exceeded the peak of ARI hospitalizations observed during the major fall pandemic wave in 2009. The mean age was 34.3 y (SD=21.3) among A/H1N1 inpatients and 43.5 y (SD=21) among A/H1N1 deaths in 2011-12. The proportion of laboratory-confirmed A/H1N1 hospitalizations and deaths was higher among seniors >=60 years of age (Chi-square test P<0.001) and lower among younger age groups (Chi-square test, P<0.03) for the 2011-2012 pandemic wave, compared to the earlier waves in 2009. The reproduction number of the winter 2011-12 wave in central Mexico was estimated at 1.2-1.3, similar to that reported for the fall 2009 wave, but lower than that of spring 2009.


We have documented a substantial and ongoing increase in the number of ARI hospitalizations during the period December 2011-February 2012 and an older age distribution of laboratory-confirmed A/H1N1 influenza hospitalizations and deaths, relative to 2009 A/H1N1 pandemic patterns. The gradual change in the age distribution of A/H1N1 infections in the post-pandemic period is reminiscent of historical pandemics and indicates either a gradual drift in the A/H1N1 virus, and/or a build-up of immunity among younger populations.



The resurgence of swine-origin pandemic A/H1N1 influenza virus in winter 2011-12 is causing a sizable epidemic in Mexico, following a 2-year period of sporadic transmission. Mexico experienced a series of three A/H1N1 pandemic waves in the spring, summer, and fall of 2009 [1] [2] [3], followed by a large pandemic vaccination campaign towards the end of 2009. These 3 waves were together associated with high excess mortality burden relative to that seen in other countries [4] [5] [6]. Because a significant fraction of the population is now protected from A/H1N1 influenza through natural exposure or vaccination [7], there is potential for the emergence of drift A/H1N1 influenza variants, and/or changing age patterns, as typically seen in post-pandemic periods [8] [9].

Here we report on the epidemiology of a recrudescent (4th) wave of pandemic A/H1N1 influenza activity in Mexico from 01-December 2011 to 10-February 2012. Because past pandemic experiences have indicated that substantial post-pandemic morbidity and mortality burden may occur months to years after the initial pandemic waves [9] [10] [11] [12] [13], we must remain vigilant and continue to monitor the epidemiology and health burden of A/H1N1 influenza. We compared the epidemiological characteristics of laboratory-confirmed A/H1N1 hospitalizations and deaths in winter 2011-12 with those previously reported for the 2009 pandemic waves and show a significant change in the age distribution of cases and deaths.


Epidemiological Data

Individual level hospitalization data were available from a prospective epidemiological surveillance system that was put in place especially for the 2009 influenza pandemic by the Mexican Institute for Social Security (IMSS) [4] [14] [15]. IMSS is a tripartite Mexican health system covering approximately 40% of the Mexican population comprising workers in the private sector and their families, relying on a network of 1,099 primary health-care units and 259 hospitals nationwide. The age and gender distributions of persons affiliated to the IMSS medical system are representative of the general Mexican population [4].

We analyzed information from all hospitalizations and inpatient-deaths among patients admitted with acute respiratory infection (ARI) during 01-December 2011 to 10-February 2012. ARI was defined as any person with respiratory difficulty presenting fever 38°C and cough together with one or more of the following clinical symptoms: confinement to bed, thoracic pain, polypnea, or acute respiratory distress syndrome. Children <5 years with pneumonia or severe pneumonia that required hospitalization were also considered as ARI cases. Respiratory swabs were obtained for about 26% of ARI hospitalizations (ARI) in winter 2011-12 and were tested for the influenza virus by rRT-PCR [16].

For all ARI hospitalizations, we retrieved demographic information (age in yrs, and gender), influenza laboratory test result (if tested), reporting state (including 31 states plus the Federal District), and dates of onset of symptoms (self-reported). We also obtained population data by state and age group for all persons affiliated with IMSS in 2009 to calculate incidence rates.

Age distribution and severity of A/H1N1 influenza in 2009 and 2011-12

We examined the age distribution of hospitalizations and deaths based on all ARIs and laboratory-confirmed A/H1N1 influenza patients reported from 01-December 2011 to 10-February 2012. We compared the age distribution of hospitalizations and deaths in winter 2011-12 with those described for the three waves of the 2009-10 A/H1N1 pandemic in Mexico, 01-April 2009 to 31-March 2010, using the same IMSS reporting system.

We also calculated preliminary estimates of the in-hospital case fatality rate by dividing inpatient deaths by hospitalizations, separately for ARI and laboratory-confirmed A/H1N1. These estimates are preliminary as we likely underestimate the true fatality ratio due to a delay from symptoms onset to death.
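The calculation is a simple ratio. As a worked example using the counts reported in the abstract (5,795 ARI hospitalizations with 186 inpatient deaths; 685 A/H1N1-positive inpatients with 75 deaths), the following sketch reproduces the in-hospital case fatality rates; the function name is illustrative.

```python
def in_hospital_cfr(deaths, hospitalizations):
    """In-hospital case fatality rate in percent."""
    return 100.0 * deaths / hospitalizations


# Counts taken from the abstract of this report.
cfr_ari = in_hospital_cfr(186, 5795)   # about 3.2 %
cfr_h1n1 = in_hospital_cfr(75, 685)    # about 10.9 %
print(f"ARI: {cfr_ari:.1f}%  A/H1N1: {cfr_h1n1:.1f}%")
```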

Spatial distribution of A/H1N1 influenza in winter 2011-12 and reproduction number estimate

We analyzed state- and age-specific time series of laboratory-confirmed A/H1N1 influenza hospitalizations by day of symptom onset to analyze the geographic dissemination patterns of ongoing sustained A/H1N1 influenza transmission in Mexico during the early weeks of the wave, 01-December 2011 to 10-February 2012.

Further, we estimated the reproduction number, R, in Central Mexico where the great majority of cases have been reported, based on a simple method previously used in the context of the 2009 A/H1N1 pandemic waves in Mexico [4]. Specifically we estimate the initial epidemic growth rate by fitting an exponential function to the early ascending phase of daily ARI or A/H1N1 hospitalizations by date of symptoms onset [17]. The early ascending phase was determined as the period between the day of pandemic onset and the midpoint between the onset and peak days. We assumed a mean generation interval of three and four days, which are within the range of mean estimates for the 2009 influenza pandemic [2] [18] [19] [20]. As a sensitivity analysis we also assessed small variations in the length of the ascending epidemic phase used to estimate the exponential growth rate (+/- 4 days).
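The procedure can be sketched as follows: fit a straight line to the logarithm of daily counts in the early ascending phase to obtain the growth rate r, then convert r into R using the mean generation interval T. The conversion actually used in [4] [17] is not spelled out in the text, so the two standard relations shown here, R = 1 + rT (exponentially distributed generation interval) and R = exp(rT) (fixed generation interval), are assumptions, as are the function names and the synthetic data in the usage note.

```python
import math


def growth_rate(daily_cases):
    """Least-squares slope of log(cases) against day index (per day)."""
    points = [(t, math.log(c)) for t, c in enumerate(daily_cases) if c > 0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den


def reproduction_numbers(r, generation_interval):
    """Return (R assuming exponential interval, R assuming fixed interval)."""
    return 1.0 + r * generation_interval, math.exp(r * generation_interval)
```

For example, synthetic counts growing at 7% per day, `[10 * math.exp(0.07 * t) for t in range(20)]`, give r close to 0.07, and with a 3-day generation interval the first relation gives R close to 1.21. These numbers are made up for illustration and are not the study's data.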

This study did not need approval from a scientific committee; all individual data were kept de-identified.

Statistical analyses were performed using SPSS 20.0 and Matlab (The Mathworks, Inc).


Overall epidemiological patterns

The characteristics of all ARI and A/H1N1-positive hospitalizations reported to the IMSS medical system between 01-Dec 2011 and 10-February 2012 are given in Table 1. The time series of daily ARI hospitalizations and deaths and laboratory-confirmed influenza hospitalizations are shown in Figures 1 and 2, respectively. An A/H1N1 influenza outbreak began around 01-December 2011 and is ongoing at the time of writing of this report (Figure 3), particularly in central Mexico (Figure 1). The daily number of ARI hospitalizations in winter 2011-12 is exceeding the levels that were observed during the major fall wave of the 2009 A/H1N1 influenza pandemic (Figure 4). In Mexico City the cumulative number of ARI hospitalizations during 01-Dec 2011 to 10-February 2012 represents 37% of all ARI hospitalizations that were reported in Mexico City during the first year of A/H1N1 virus circulation (April 2009 to Mar 2010).


Table 1. Characteristics of all ARI hospitalizations and laboratory-confirmed A/H1N1 influenza hospitalizations, Mexico, 01 December 2011 through 10 February, 2012.


Fig. 1: Daily epidemic curves of all ARI hospitalizations (top) and deaths (bottom) by dates of symptoms onset in northern, central, and southeastern states of Mexico, 01-April 2009 to 10-February 2012.


Fig. 2: Daily number of influenza tests among ARI hospitalizations and laboratory-confirmed influenza hospitalizations by dates of symptoms onset spanning 01-April 2009 to 10-February 2012 in the 32 Mexican states according to influenza subtype.


Fig. 3: Daily influenza positivity rates (no. influenza positive ARI hospitalizations/no. tests among ARI hospitalizations) and percentage of influenza specific subtypes among influenza positive tests from 01-December 2011 to 01-February 2012.


Fig. 4: Daily epidemic curves of all ARI hospitalizations by dates of symptoms onset across Mexico spanning 01-December to 01-February compared to the fall pandemic wave (Aug-Dec 2009).

Severity of disease

A total of 5,795 ARI hospitalizations and 186 inpatient-ARI deaths (preliminary case fatality rate, 3.2% (95% CI: 2.8, 3.7)) were reported to the IMSS system between 01-December 2011 and 10-February 2012. The preliminary estimate of case fatality rate for laboratory-confirmed A/H1N1 inpatients was 10.9% (95% CI: 8.6, 13.3) over the same period (685 inpatients and 75 deaths).
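As a worked check of the arithmetic, the sketch below computes the in-hospital case fatality rate as deaths divided by hospitalizations, with a normal-approximation (Wald) 95% confidence interval. The text does not say which interval method was used; the Wald interval is an assumption that happens to reproduce the two intervals quoted above. The function name is hypothetical.

```python
import math

def proportion_with_wald_ci(events, total, z=1.96):
    """Return (proportion, lower, upper) using the Wald normal approximation."""
    p = events / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Counts taken from the paragraph above: (label, inpatient deaths, hospitalizations)
for label, deaths, hospitalizations in [("ARI", 186, 5795), ("A/H1N1", 75, 685)]:
    p, lo, hi = proportion_with_wald_ci(deaths, hospitalizations)
    print(f"{label}: {100 * p:.1f}% (95% CI: {100 * lo:.1f}, {100 * hi:.1f})")
```

The Wald interval is adequate here because both samples are large; for small groups an exact or Wilson interval would usually be preferred.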

This preliminary estimate of the case fatality rate for hospitalized A/H1N1 patients in 2011-12 is significantly lower than the CFR measured in 2009 (16.1% (95% CI: 15.0, 17.2)).


Overall the majority of laboratory-confirmed influenza inpatients during 01-Dec 2011 to 10-Feb 2012 were among persons aged 15-59 years (66.4%) followed by seniors >=60 years (12.4%) and the 0-4 year age group (11.4%) (Table 1). Severity increased with older age, with an inpatient fatality rate of 18.8% (95% CI: 10.3, 27.3) for persons >=60 years.

The cumulative hospitalization and inpatient death rates for the 3 waves of the 2009-10 A/H1N1 pandemic are on average 6.5 and 9.5 times greater than the corresponding rates for the ongoing 2011-12 A/H1N1 wave in Mexico. Comparison of the age-specific A/H1N1 hospitalization and death rates reveals an increasing burden among older populations in 2011-12, relative to the 2009-10 waves (Figure 5). An analysis of the proportionate distribution of A/H1N1 hospitalizations and inpatient deaths reveals a shift in the age distribution of recent cases towards older ages as well. Specifically, we note a significantly higher proportion of individuals aged 60 years or older hospitalized with laboratory-confirmed A/H1N1 in 2011-12, relative to the 2009-10 pandemic period (12.4% vs. 6.1%, Chi-square test P<0.0001, Table 2, Figure 6). We also found a reduction in the proportion of A/H1N1-positive hospitalizations among persons 5-14 years of age compared to the 2009 pandemic (9.8% vs. 14.9%, Chi-square test, P=0.0003).

We found a similar change in the age distribution of A/H1N1 inpatient deaths in 2011-12 compared to the 2009 A/H1N1 influenza pandemic (Table 3, Figure 6). Specifically, 21.3% of deaths occurred among persons >=60 years of age in the ongoing 2011-12 epidemic period, compared with only 8.9% in the 2009-10 period (Chi-square test, P=0.0006). Similar to the age shift in hospitalization data, the proportion of A/H1N1 inpatient deaths among individuals aged 15-29 declined, relative to 2009-10 (9.3% vs. 21%, Chi-square test, P=0.02).


Fig. 5: Relative distribution of age-specific A/H1N1 influenza hospitalization rates (left) and A/H1N1 inpatient death rates (right) for the ongoing A/H1N1 influenza epidemic (01-Dec 2011 to 10-Feb 2012) compared to those of the entire 2009 A/H1N1 pandemic period (01-Apr 2009 to 31-Mar 2010) and to the first 70 days of the 2009 fall pandemic wave (01-Aug 2009 to 10-Oct 2009).

Age group (years) | 01-Apr 2009 to 31-Mar 2010: Total | 01-Apr 2009 to 31-Mar 2010: Proportion of hospitalizations (%) | 01-Dec 2011 to 10-Feb 2012: Total | 01-Dec 2011 to 10-Feb 2012: Proportion of hospitalizations (%) | P value *
Total | 4420 | 100% | 685 | 100% |
0-4 | 446 | 10.1% | 78 | 11.4% | 0.30
5-14 | 660 | 14.9% | 67 | 9.8% | 0.0003
15-29 | 1237 | 28% | 169 | 24.7% | 0.071
30-44 | 1010 | 22.9% | 144 | 21.0% | 0.29
45-59 | 798 | 18.1% | 142 | 20.7% | 0.09
>=60 | 269 | 6.09% | 85 | 12.4% | <0.0001

Table 2. Age-specific proportions of total laboratory-confirmed A/H1N1 hospitalizations for the 2009 A/H1N1 influenza pandemic compared to ongoing A/H1N1 outbreaks in Mexico spanning 01-December 2011 to 10-February 2012. We note a significantly different age distribution of A/H1N1 hospitalizations during 01-December 2011 to 10-February 2012 compared to that of the 2009 A/H1N1 influenza pandemic spanning 01-April 2009 to 31-March 2010 (Wilcoxon test, P<0.0001). *Computed using the Chi-square test statistic for differences in time periods.
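The per-age-group P values in Table 2 compare one age group's share of hospitalizations between the two periods, which amounts to a chi-square test on a 2x2 table. The sketch below is an illustration under stated assumptions: it uses the Pearson statistic without continuity correction (the text does not say whether a correction was applied), and the function name is hypothetical. For 1 degree of freedom the P value has the closed form erfc(sqrt(chi2/2)), so no statistics library is needed.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value); the p-value uses the exact 1-df survival function
    P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Age 5-14 row of Table 2: 660 of 4420 hospitalizations in 2009-10 vs 67 of 685 in 2011-12.
# Rows are periods; columns are (in age group, not in age group).
chi2, p = chi_square_2x2(660, 4420 - 660, 67, 685 - 67)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```

Run on the 5-14 row, this should land close to the P value reported in the table; small differences from the published values would be expected if the authors applied a continuity correction.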

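Age group (years) | 01-Apr 2009 to 31-Mar 2010: Total | 01-Apr 2009 to 31-Mar 2010: Proportion of deaths (%) | 01-Dec 2011 to 10-Feb 2012: Total | 01-Dec 2011 to 10-Feb 2012: Proportion of deaths (%) | P value *
Total | 711 | 100% | 75 | 100% |
0-4 | 37 | 5.2% | 7 | 9.3% | 0.14
5-14 | 42 | 5.91% | 3 | 4% | 0.50
15-29 | 149 | 21% | 7 | 9.3% | 0.02
30-44 | 227 | 31.9% | 19 | 25.3% | 0.24
45-59 | 193 | 27.1% | 23 | 30.7% | 0.52
>=60 | 63 | 8.86% | 16 | 21.3% | 0.0006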

Table 3. Age-specific proportions of total laboratory-confirmed A/H1N1 inpatient deaths for the 2009 A/H1N1 influenza pandemic and ongoing A/H1N1 outbreaks in Mexico spanning 01-December 2011 to 10-February 2012. We note a significantly different age distribution of A/H1N1 inpatient deaths in the 4th wave, compared to that of the previous 3 waves during 2009 (Wilcoxon test, P=0.001). The age shift is characterized by a doubled proportion of elderly deaths, offset by a halving in deaths in young adults. *Computed using the Chi-square test statistic for differences in time periods.

Fig. 6: Age-specific proportions of A/H1N1 influenza hospitalizations (left) and A/H1N1 inpatient deaths (right) for the ongoing A/H1N1 influenza epidemic (01-Dec 2011 to 10-Feb 2012) compared to those of the entire 2009 A/H1N1 pandemic period (01-Apr 2009 to 31-Mar 2010) and to the first 70 days of the 2009 fall pandemic wave (01-Aug 2009 to 10-Oct 2009).


The majority of A/H1N1 inpatients during the 4th wave have been reported in central Mexican states (66.3%) followed by southeastern states (19.4%) (Table 1), and a higher proportion of A/H1N1 deaths have occurred in central states compared to other regions (Chi-square test, P=0.04).

Estimates of the reproduction number

Assuming a mean generation interval of 3 (and 4) days, the mean R was estimated to be 1.2 (1.3) during the period 17-Dec 2011 to 9-Jan 2012, based on daily A/H1N1-positive hospitalizations. As a sensitivity analysis we also estimated R using daily ARI hospitalizations; our estimate of R was somewhat lower at 1.1 (1.2). When the length of the epidemic ascending phase was varied (+/- 4 days), our R estimates changed by 0.1 or less.


We have characterized the epidemiology of a recrudescent 4th wave of A/H1N1 influenza transmission in Mexico spanning 01-December 2011 to 10-February 2012, based on hospitalizations for acute respiratory infections and laboratory-confirmed A/H1N1 infections. We compared the impact, severity, age patterns, and reproduction number of this 4th wave with those of earlier pandemic waves in spring, summer and fall 2009 in Mexico [1] [4]. We used individual-level patient information collected through a prospective influenza surveillance system put in place especially for the 2009 pandemic by the largest Mexican Social Security medical system and providing daily data during 2009-2012 [14]. Our data show that the nationwide peak level of daily ARI hospitalizations reached so far in early 2012 (it may not yet have peaked) has already exceeded the peak of ARI hospitalizations observed during the major fall pandemic wave in 2009. We have also documented a significant increase in the proportion of A/H1N1 hospitalizations and deaths among persons >=60 y, relative to the 2009 pandemic, and a significant reduction in the proportion of A/H1N1 hospitalizations and deaths among school age children.

The observed change in age distribution of hospitalization and deaths in the post 2009 pandemic period is reminiscent of the influenza seasons following the 1918 influenza pandemic [8] [9] [21] and the 1968 pandemic [22]. A quantitative analysis of excess mortality prior to and after the 1918 influenza pandemic found that the age distribution of influenza-related mortality returned to pre-pandemic mortality levels a few years after the initial pandemic waves as a result of emerging drift variants [9] [23]. Hence the age shift seen in the 2011-12 winter season could signal either a gradual emergence of drift A/H1N1 variants, and/or a build up of immunity among younger populations. Both have implications for influenza prevention and mitigation strategies, which we discuss below.

During the first year of circulation of the 2009 A/H1N1 influenza pandemic virus, protection from influenza-related morbidity and mortality was reported in people over 60 years. This phenomenon of “senior sparing” in age cohorts born prior to the 1957 pandemic is consistent with first exposure to antigenically-related A/H1N1 viruses in childhood, a pattern consistent with the antigen recycling and original antigenic sin hypotheses [1] [24] [25] [26] [27]. A high fraction of the Mexican population is now protected against the 2009 A/H1N1 influenza virus through natural exposure in 2009 (children and young adults) or prior immunity (seniors) [7] and by pandemic vaccines. Over 7 million doses of seasonal influenza vaccine (featuring a good match for the H1N1-pdm vaccine component) were administered in 2011-12 winter (35% vaccination coverage among IMSS-affiliated seniors >=60 years; 70% among <5 years; 40% among 50-59 years; and 24% among 5-9 years).

Although we saw evidence of a shift in the age distribution of 2011-12 cases towards seniors, the absolute risk of getting hospitalized was still relatively low in this age group, relative to that in younger adults. The declining rates of severe cases in younger age groups are most consistent with build-up of immunity. Overall the age distribution of recent A/H1N1 influenza hospitalizations and deaths in Mexico is relatively flat and not quite back to the normal “J-shaped” age risk profile that characterizes seasonal influenza. In the long run, we expect the pandemic A/H1N1 virus to drift genetically to escape mounting population immunity – perhaps with the result that seniors are no longer protected [1]. Hence the epidemiological evidence is consistent with the genetic and antigenic information published on circulating influenza virus, suggesting a lack of antigenic drift in A/H1N1 viruses in Mexico or elsewhere in winter 2011-12, a season associated with relatively low A/H1N1 activity globally [28].

Since transmission of the A/H1N1 influenza virus was sporadic in the winter of 2010-2011 in Mexico, we cannot rule out the possibility of some loss of population immunity since 2009. We estimated the reproduction number for the ongoing A/H1N1 epidemic to be significantly lower than that of the spring (R~1.8-2.1) and summer (R~1.6-1.9) pandemic waves in 2009 in Mexico, but in close agreement with estimates for the fall (3rd) 2009 wave (R~1.2-1.3) [4].

Perhaps the most surprising finding of this analysis is the occurrence of a substantial 4th wave of pandemic A/H1N1 activity in Mexico, a country which has already experienced severe excess mortality impact during 3 waves of transmission in 2009 [4] [6]. Although we are just beginning to assess the global mortality burden of the 2009 A/H1N1 virus in the pandemic and post-pandemic period, important geographical variations in the number, timing, transmissibility and impact of sequential pandemic waves are obvious. For instance, the UK experienced 2 waves in spring and fall 2009, to be followed by a relatively severe recrudescent wave in 2010-11, not seen in other European countries [29]. The US experienced the brunt of the pandemic burden in the first year of A/H1N1 circulation. To our knowledge, the 4-wave pattern seen in Mexico in 2009-12 has not been reported in other countries. Whether these differences can be explained by geographical variation in prior immunity, seasonal drivers, control strategies, connectivity, health and healthcare, is unclear and remains a key area for future research.

In summary our findings indicate a changing age distribution of laboratory-confirmed A/H1N1 influenza hospitalizations and deaths in winter 2011-12, relative to 2009-10 A/H1N1 pandemic patterns. The proportion of hospitalizations and deaths is increasing in seniors >=60 years, an age group that was largely protected during the early pandemic waves in 2009. In contrast, rates of A/H1N1 hospitalizations and deaths are declining among younger population groups, consistent with a gradual build up of immunity. This gradual change in the age distribution of A/H1N1 influenza in 2011-12 in Mexico is reminiscent of post-pandemic patterns in past influenza pandemics. As the 4th wave is still ongoing, it is too early to determine whether it is more severe than the previous waves in terms of mortality – something that occurred in the 1889 pandemic in which a 3rd wave occurring in the winter of 1891-92 was far more deadly than previous waves [13] [30].

Whether other countries will eventually experience similar severe recrudescent waves of A/H1N1 activity remains to be seen. A multinational comparison of the epidemiology of pandemic and post-pandemic waves would be useful to shed light on the long-term transmission dynamics and build up of immunity to pandemic viruses, and inform control strategies.


The authors declare no relevant competing interests.


Dr. Víctor H. Borja-Aburto Coordinación de Vigilancia Epidemiológica y Apoyo en Contingencias, Instituto Mexicano del Seguro Social, Mier y Pesado 120, México, DF 03100 México Email:

Applying a New Model for Sharing Population Health Data to National Syndromic Influenza Surveillance: DiSTRIBuTE Project Proof of Concept, 2006 to 2009 Mon, 12 Sep 2011 12:07:16 +0000


The Distributed Surveillance Taskforce for Real-time Influenza Burden Tracking and Evaluation (DiSTRIBuTE) project is a case example for a new paradigm in the collection and sharing of public health data [1]. By connecting state and local jurisdictions that conducted electronic, emergency department (ED) syndromic surveillance, the DiSTRIBuTE project aimed to demonstrate the feasibility and utility of a fast, inexpensive, low burden model for population level respiratory, febrile and influenza-like morbidity surveillance. This effort emerged out of the unique collaborative environment of the International Society for Disease Surveillance (ISDS), where federal, state, and local public health agencies, academia, businesses, non-profit organizations, and other stakeholders leverage resources and technology to work together to advance disease surveillance practice and research.

The origin and evolution of DiSTRIBuTE was influenced by and responsive to the needs of public health departments and their use of syndromic surveillance systems for influenza monitoring in their own jurisdictions. The project was consistent with emerging models of population health data sharing and with changing federal perspectives on syndromic and biosurveillance architecture. Founded in this context, DiSTRIBuTE was based on a set of core principles: 1) share aggregate level data to minimize risk of exposure of personally identifiable information; 2) maintain jurisdictional control of surveillance data and information; 3) minimize barriers for health department participation; and 4) follow a collaborative approach to build on the flexibility of local systems and create a dynamic network. The resulting network was built upon the expertise and infrastructure of participating public health departments.

In this paper, we set the context and describe the development of DiSTRIBuTE, presenting the goals and underlying principles behind the project and describing its evolution from the autumn of 2006 to the summer of 2009. Finally, we consider the lessons learned relevant to the use of syndromic surveillance for national influenza monitoring, and more generally to the sharing of population health data in the rapidly changing health information technology landscape.


Over the last decade, many public health agencies have implemented syndromic surveillance systems to provide early warning and detailed situational awareness of disease outbreaks, bioterrorist threats, and other ongoing health crises or events [2][3][4][5]. These systems represent a potential innovation over other surveillance approaches due to their rapid collection of high volume, pre-diagnostic, electronic data, coupled with routine and automated application of detection algorithms and other analytic methods. The original rationale for implementing syndromic surveillance systems was to provide the earliest possible warning of unusual health events or emerging threats. Early detection can translate into rapid implementation of control strategies, and ultimately, mitigation of morbidity, mortality, economic loss, and threats to national security [2][3][4][5]. These systems typically use existing electronic data sources and thus, compared to systems that require manual data collection, syndromic surveillance systems offer the potential for cost savings and more rapid collection of larger volumes of data.

Despite the potential advantages of these syndromic systems, they have not been embraced universally. This reticence has been due to concerns about the programs’ utility for initial outbreak and specific disease detection, limited public health funding and workforce resource constraints, particularly at the state and local level, and tensions between public health and national security priorities [2][3][4]. The need to invest in early detection of unusual events such as bioterrorism, pandemics, or other emerging health threats through syndromic surveillance systems has also been questioned in the context of the pressing need for public health departments to invest in existing surveillance systems aimed at monitoring notifiable diseases and outbreaks typically encountered in their jurisdictions. Public health departments that have adopted electronic syndromic surveillance systems, however, have reported that they increasingly use these systems to complement routine surveillance, most notably for influenza-like illness (ILI) syndromes, and for general large area morbidity trend monitoring [6].

As experience increased with the development and application of syndromic surveillance, evidence from health departments began to emerge that these data could provide important information at the local and regional levels to improve monitoring of seasonal and epidemic influenza [7][8][9][10][11]. This evidence led to public health interest in having more timely information about neighboring, regional, and national influenza trends. Users from health departments participating in the ISDS community felt that their state and local syndromic surveillance systems could provide rapid, representative, and accurate trends in their own jurisdictions. They also found it valuable to have information about surveillance trends in neighboring and “peer” health departments, and that sharing their information with other jurisdictions made those health departments in turn more willing to share their own surveillance information.

In the US, ILI surveillance has traditionally been conducted through a volunteer sentinel physician network by the Centers for Disease Control and Prevention (CDC) and coordinated through state and several large metropolitan health departments [12]. The CDC system, ILINet, monitors reported cases of ILI, defined clinically as individuals presenting with influenza, or fever with cough and/or sore throat in the absence of another known cause. The ILINet cases are aggregated by week ending Saturday, typically reported to CDC manually via a web form during the following week, and presented on the CDC website as the percentage of clinic visits due to ILI by region, typically by the end of that week. The system collects data nationwide, and includes a viral testing component whereby participating sentinel physicians submit clinical specimens to CDC periodically throughout the influenza season for laboratory testing.
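The weekly aggregation described above (weeks ending on Saturday, ILI visits as a percentage of all clinic visits) can be illustrated with a short sketch. The record layout and function names are hypothetical and are not taken from ILINet; the code only shows the arithmetic of the aggregation.

```python
from collections import defaultdict
from datetime import date, timedelta

def week_ending_saturday(day):
    """Return the Saturday that closes the reporting week containing `day`."""
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6
    return day + timedelta(days=(5 - day.weekday()) % 7)

def weekly_percent_ili(daily_records):
    """Aggregate (date, ili_visits, total_visits) tuples into percent ILI per week."""
    totals = defaultdict(lambda: [0, 0])
    for day, ili_visits, total_visits in daily_records:
        week = week_ending_saturday(day)
        totals[week][0] += ili_visits
        totals[week][1] += total_visits
    # Weeks with no recorded visits are omitted to avoid dividing by zero.
    return {
        week: 100.0 * ili / visits
        for week, (ili, visits) in sorted(totals.items())
        if visits > 0
    }
```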

The CDC ILI network is a highly valued public health surveillance system and serves the critical roles of monitoring ILI trends and collecting viral samples in all 50 states. Potential benefits of syndromic surveillance, however, over the traditional sentinel physician network include: faster provision of data due to electronic, automated data submission; lower burden on healthcare providers who would otherwise have to report manually; better stability of data, since providers might otherwise drop out or have delays if reporting manually; year-round reporting, unlike the sentinel system which only receives reports from the majority of participants during influenza season; availability of age-specific denominator data; flexible case definitions; and in many jurisdictions, potential for greater population coverage than with the sentinel physician network.

The DiSTRIBuTE Project

Models of Data Sharing

The DiSTRIBuTE effort was based on a practical philosophy of data sharing and on two key observations. First, for legal and organizational reasons, public health departments can share non-specific surveillance information and aggregate level data more easily than patient level records from health facilities within their jurisdictions. Second, public health departments are more willing to share information and data on disease patterns among trusted collaborators, particularly when only the minimum data needed to answer the public health questions are provided, and where jurisdictional control and authority over data are maintained. Previous national syndromic surveillance efforts, such as the original BioSense system, were top-down models which relied on centralized aggregation of detailed personally identifiable information, were not developed collaboratively and, consequently, were disconnected from the practical needs and resources of public health departments [13].

In 2006 and 2007, new federal level biosurveillance and pandemic preparedness recommendations, notably Homeland Security Presidential Directive, HSPD-21 [14] and the Pandemic and All-Hazards Preparedness Act, PAHPA [15], required that federal efforts be based on existing state and local biosurveillance and influenza surveillance systems. However, there were no clear guidelines regarding the coordination of these efforts, systems, and practices. During the same period, the Markle Connecting for Health Collaborative — a public-private collaborative of over 100 health, policy, and technology leaders brought together by the Markle Foundation — identified common challenges to data sharing in a wide range of population health efforts [16]. Many of these challenges were attributed to the current paradigm for analyzing population health data, which is typified by central collection of personally identifiable records, followed by data processing, cleaning and analysis. Notable problems with this model were a tendency to create data silos, lack of feedback to the original data holders, legal and practical restrictions to sharing individual-level data, delays in accessing or disseminating collected data, considerable cost to acquire data and concern over jurisdictional autonomy regarding use of health data and information.

Drawing on experiences from multiple stakeholders, Markle Connecting for Health proposed principles intended to facilitate the sharing of population health data in support of effective decision-making [1]. These principles include collecting only summarized data with personally identifiable data being held at the source; cleaning and analyzing data at the source before sharing it in a standardized format; making aggregate data available across the network for analysis without requiring access to the original data; and building trust among entities in the network, enabled by having a set of policies and practices for jurisdictional control and data protection.

The DiSTRIBuTE project emerged out of the unique collaborative environment of ISDS, which fosters the creation of interdisciplinary, cross-agency collaborations that bridge research, practice and policy. The development of the project was also influenced by current thinking around models for sharing population health data, developing experience with syndromic surveillance for influenza, and the evolving national biosurveillance policy landscape in the United States.

Project Goals

At the outset of the DiSTRIBuTE project, limitations in data sharing were evident in national influenza surveillance and biosurveillance practices, as reflected by position statements from within the public health community [13][17] and Federal legislation and directives [14][15]. Specific to ILI surveillance, limitations included: delays in reporting, high provider drop-out rates, burden on clinical and public health practice, limited flexibility with case definitions, lack of age-specific denominator, and lack of year-round reporting. Specific to biosurveillance practice, limitations included: creation of multiple separate data silos, lack of feedback to original data holders, legal and practical restrictions to sharing personally identifiable information, delays in accessing or disseminating collected data, considerable cost to acquire data, and concern over state and local jurisdictional autonomy. In an effort to address these limitations and needs, the ISDS DiSTRIBuTE project sought to develop a simple, low cost network for sharing aggregate data from ED syndromic surveillance systems that would protect privacy and allow jurisdictional data control.

The primary goals of DiSTRIBuTE were to establish the feasibility of sharing aggregate population health data; and to assess the utility of regional and national sharing of ED syndromic surveillance data for influenza surveillance. These goals were also in line with the Markle Connecting for Health principles [16].

Project Principles

In order to achieve the original goals of the project, a core set of principles was followed. These are summarized in Box 1, and discussed below.

Use Aggregate Data. Collection of aggregated data from health jurisdictions has advantages over individual-level raw data because it can sufficiently represent populations while protecting personally identifiable information. DiSTRIBuTE employed a simple data format that included an agreed upon minimum level of data detail for the epidemiological question and public health action of monitoring febrile, respiratory and ILI syndromes at the population level.

Maintain Jurisdictional Control. Public health departments are more likely to share surveillance data among trusted collaborators in an environment that protects jurisdictional control and where policies are defined in a participatory manner. The intent of this principle was to ensure that both the data framework and the exchange of surveillance findings and interpretation were suitably controlled by the participating jurisdictions.

Minimize Barriers to Participation. The use of a simple aggregate data format minimized one barrier to participation in the project. Participants were initially asked to submit counts of cases measured according to the definitions and standards used in their existing systems. This flexibility lowered the barrier to entry into the project, deferred to state and local authority, and leveraged existing practices with local syndrome definitions and standards. This approach acknowledged the role of local context in extracting public health information from local clinical data. In other words, we assumed that local syndromes were defined based on regional variations in data collection standards, idiom, language, syndrome coding, hospital information systems, as well as other factors, and we wanted to build on this experience and expertise.

Create a Collaborative Network. Data exchange and information sharing in the DiSTRIBuTE project were community-based, and participating jurisdictions were collaboratively involved in the development, specification, and implementation of the system, and in the data analysis and interpretation. The intent of this principle was to ensure that the data exchange framework was based on jurisdictional needs and priorities, and to foster data sharing among trusted collaborators in an environment where they defined the policies and controls in a participatory manner.

The DiSTRIBuTE System

DiSTRIBuTE began with the comparison of syndromic surveillance trends between jurisdictions and with the sharing among syndromic surveillance practitioners of system coding and specifications. The public health utility of comparing influenza surveillance trends and the relative ease of sharing aggregate data, programming code, and system specifications between a small number of participant sites suggested that a framework and system could be created to scale the effort to a larger group of local, state, and international participating health departments.

Beginning in 2006, the DiSTRIBuTE project sought to enroll health departments that conducted ED-based syndromic influenza surveillance in their jurisdictions to voluntarily share their data [6][7][8][9][10][11]. Many had expressed an interest in sharing data and collaborating in an effort to build a participatory, grassroots surveillance network since federal efforts at the time largely bypassed state and local systems, context, expertise and authority.

Data characteristics and specifications in DiSTRIBuTE were originally aimed at capturing aggregate daily total and influenza-related syndrome counts of ED visits by predefined age-group and three-digit ZIP-code (ZIP-3). Participant sites reported aggregate counts by groups covering infants and toddlers (age <2 yrs), preschool-age (2-4 yrs), school-age (5-17 yrs), working-age adults in younger (18-44 yrs) and older (45-64 yrs) groups, and senior citizens (age 65+ yrs). Geographic information was originally requested to capture patients’ reporting ZIP-3, however, some participating sites chose instead to submit data based on ED facility ZIP-3, or to aggregate to larger regional areas than ZIP-3 (e.g., reporting data aggregated to the city or county level). The febrile, respiratory and influenza-like syndromes initially used in the project were requested to be comparable to a commonly used “fever/flu” syndrome that many participant sites were familiar with, and which many were currently using for surveillance in their jurisdictions [7].
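To make the aggregate format concrete, the sketch below shows how a participating site might reduce visit-level ED records to daily counts by ZIP-3 and age group before sharing. It is an illustration only: the input field names (`date`, `age`, `zip`, `is_flu_syndrome`) and the output layout are hypothetical, and only the age-group boundaries and the use of three-digit ZIP codes come from the text above.

```python
from collections import Counter

# Upper bounds (exclusive, in years) and labels for the age groups named in the text.
AGE_GROUPS = [(2, "<2"), (5, "2-4"), (18, "5-17"), (45, "18-44"), (65, "45-64")]

def age_group(age_years):
    """Map an age in years to one of the six reporting groups."""
    for upper, label in AGE_GROUPS:
        if age_years < upper:
            return label
    return "65+"

def aggregate_visits(visits):
    """Collapse visit-level records into daily counts per (date, ZIP-3, age group).

    `visits` is an iterable of dicts with hypothetical keys: date, age, zip,
    is_flu_syndrome. No patient-level fields appear in the output.
    """
    total = Counter()
    syndrome = Counter()
    for visit in visits:
        key = (visit["date"], visit["zip"][:3], age_group(visit["age"]))
        total[key] += 1
        if visit["is_flu_syndrome"]:
            syndrome[key] += 1
    return [
        {
            "date": day,
            "zip3": zip3,
            "age_group": group,
            "total_visits": count,
            "syndrome_visits": syndrome[(day, zip3, group)],
        }
        for (day, zip3, group), count in sorted(total.items())
    ]
```

Because the total visit count is kept alongside the syndrome count, recipients can compute syndrome-to-total ratios per stratum without ever seeing individual records.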

Syndrome Definitions

While participating sites generally did not use identical syndrome groupings to monitor influenza-related ED visits in their jurisdictions, the data were requested to be submitted as the preferred local syndrome grouping used by the health departments, with the intent of representing the patterns and trends that the jurisdiction wished to have shown for their region. During the proof of concept period, as sites shared data and observed each other’s trends, there was interest in comparing syndromes based on concepts rather than strictly on defined coding standards [18]. There also was interest in applying more broad and sensitive febrile and respiratory influenza-related groups, and more narrow and specific influenza-like syndromes that were more closely analogous to the traditional clinical surveillance definition of ILI (a presentation with influenza, or fever with cough and/or sore throat) [11][12].

In a pilot comparison of two DiSTRIBuTE jurisdictions, each used the other’s syndrome coding and applied it to their own data to compare the resulting surveillance trends with the syndromes they used locally. The results suggested that using the locally defined syndromes created surveillance time-series that better matched the viral isolate data of confirmed influenza cases locally [19][20]. As the project progressed, participating sites began to reevaluate their own syndrome definitions, and in many cases began to move toward common syndrome concepts through stepwise adaptations of local syndrome definitions, coding, and preferences. As sites looked at each other’s data more, they better understood the heterogeneity of the syndrome groups. And when they assessed their own syndrome characteristics, many reported making minor coding changes that were consistent with making the syndrome concepts and definitions closer and more comparable. This followed a similar process to standardization in other information technology efforts, where standards were aligned with the measurable and practical needs and interests of the users – and standards became “standard” because they became the normal case in the field [21].

During the proof of concept period, work focused on comparison of influenza surveillance systems, evaluation of the model, and harmonization of common ILI syndromes [19][20][22][23]. Similarly, acute gastroenteritis syndrome indicators were implemented in the project, with the goal of being able to monitor winter-seasonal increases believed due primarily to norovirus and rotavirus epidemics across the DiSTRIBuTE network. Implementing and sharing acute gastroenteritis trends also served to assess to what degree the DiSTRIBuTE model could be generalized from monitoring population level ILI morbidity trends to monitoring trends in another constellation of syndrome groups believed closely tied to two recurring winter-seasonal epidemic viral diseases. The implementation of these new groups required participants to adopt or modify syndromic output to include an additional column of data and report historical baselines of the new syndrome groups. Additionally, during the proof of concept period, evaluation efforts and informal ad hoc comparisons were done within the network, notably with respect to an assessment of DiSTRIBuTE data with Google Flu Trends [23] and with the emergence of novel A/H1N1 influenza in the spring of 2009.

Technical Infrastructure

The technical infrastructure that initially supported DiSTRIBuTE took an incremental approach to automating and securing the manual processes used first in the pilot and then by early participants. Data were originally exchanged as email attachments using comma separated value (CSV) files in similar formats. An automated system was then implemented to permit secure file transmission using the secure shell file transfer protocol (SFTP), utilizing existing commercial software at the health jurisdiction or equivalent free software identified by the project. Users sent their data files to a central server, where automated software services received the files, performed basic error detection (including data type errors, simple range checking, and improperly coded values), and transformed the data into a common format (allowing for site-specific variation in the composition and layout of the CSV file). The resulting files shared a common syntax and were free of both format errors and simple content errors. Examples of the latter included violations of simple data type, results outside basic range checks, and values outside constrained enumerations (such as age ranges for age stratification). The standardized files were parsed and their content imported into a simple database, which was implemented using the open source MySQL database package. Open source software was employed for the entire centralized server, using the “Linux, Apache, MySQL, PHP, Perl” development framework.
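The three kinds of content checks named above (data type, range, and enumerated values) can be sketched as follows. The project's own services ran on a Linux, Apache, MySQL, PHP and Perl stack, so this Python version is only an illustration, and the column names 'age_group', 'ili_count' and 'total_count' are assumptions rather than the project's actual file layout.

```python
import csv
import io

# Hypothetical enumerated codes for the age stratification column.
AGE_GROUPS = {"<2", "2-4", "5-17", "18-44", "45-64", "65+"}


def check_row(row):
    """Return a list of error messages for one parsed CSV row (empty if clean)."""
    errors = []
    # Enumerated-value check: the age group must be one of the agreed codes.
    if row.get("age_group") not in AGE_GROUPS:
        errors.append(f"bad age_group: {row.get('age_group')!r}")
    # Data-type check: both counts must parse as integers.
    try:
        ili = int(row["ili_count"])
        total = int(row["total_count"])
    except (KeyError, TypeError, ValueError):
        errors.append("counts must be integers")
        return errors
    # Range check: counts are non-negative and the syndrome count
    # cannot exceed the total visit count.
    if ili < 0 or total < 0 or ili > total:
        errors.append(f"counts out of range: ili={ili}, total={total}")
    return errors


def validate_csv(text):
    """Split CSV text into clean rows and (line number, errors) pairs."""
    clean, rejected = [], []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        errors = check_row(row)
        if errors:
            rejected.append((line_no, errors))
        else:
            clean.append(row)
    return clean, rejected
```

Reporting rejected rows by line number, instead of discarding the whole file, is one possible design choice; the text does not say how the project handled files that failed a check.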

Visualization of Data

The display of DiSTRIBuTE trends was created as the primary means for sharing and presenting data from participating health jurisdictions. Initially, weekly aggregate ratios of febrile respiratory or influenza-like syndromes, based on each participating site’s routine syndromic criteria for monitoring seasonal influenza, to total ED visits, were visualized as regional weekly time-series (Figure 1), and as age-specific temporal epidemic response surface (TERS) plots (Figure 2). The regional time-series plots showed variation in relative baseline levels during non-influenza periods in the proof of concept period, with lower quartile weekly ratios ranging from 1.4% to over 6% of total ED visits. Peak seasonal epidemic influenza levels reported by participating health departments ranged from 6% to over 16%. The variation in non-influenza period baseline levels and peak seasonal epidemic levels indicated that the overall magnitude of the data was not directly comparable. However, the relative patterns of the time-series reported by participating systems were noted as being consistent with regional, state and local influenza surveillance systems and measures [6][7][8][9][10][11].

While participating DiSTRIBuTE sites were not representative of the whole nation on a population basis, taken as an aggregate 13% convenience sample of the U.S., the data for the 2006-2007 and 2007-2008 influenza seasons were highly correlated with national ILI surveillance data [22][23] (Figure 1).

Fig. 1: DiSTRIBuTE time-series visualizations by jurisdiction, February 16, 2009.

ILI syndrome time-series are plotted as ratios for DiSTRIBuTE and national CDC sentinel ILI, and as counts by subtype for viral influenza isolate data [12]. DiSTRIBuTE data were typically reported one or more weeks ahead of sentinel reporting data. Pearson correlation between all DiSTRIBuTE sites and CDC ILI for the 2006-2007 and 2007-2008 influenza seasons was 0.96 (p<0.01).

Fig. 2: DiSTRIBuTE age-specific visualization, February 16, 2009.

ILI syndrome time-series as age-specific temporal epidemic response surface (TERS) plots for 2006-2009 are shown as relative increase, calculated as observed over the lower-quartile baseline ratio by age-group and jurisdiction [11]. Age-groups are stratified into ranges representing infants and toddlers (age <2 yrs), preschool-age (2-4 yrs), school-age (5-17 yrs), younger adults (18-44 yrs), older adults (45-64 yrs), and senior citizens (age 65+ yrs).

The age-specific visualizations presented participant jurisdiction data as an interpolated surface gradient of the relative magnitude of visits, calculated as observed ratios over lower-quartile baseline, by age-group through time [11]. The plots presented a snapshot of age-specific trends and intensity by jurisdiction, with notable characteristics such as the age-specific timing and relative magnitude of the predominant circulating epidemic viruses in a particular jurisdiction (Figure 2).
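The quantity plotted in these surfaces, the observed ratio divided by a lower-quartile baseline, can be sketched as below. The exact quantile method and the baseline window used in the project are not stated, so the 'inclusive' quartile computed over the whole series is an assumption, and the function names are illustrative.

```python
import statistics


def weekly_ratios(syndrome_counts, total_counts):
    """Ratio of syndrome visits to total ED visits for each week."""
    return [s / t for s, t in zip(syndrome_counts, total_counts)]


def relative_increase(ratios):
    """Divide each weekly ratio by the lower-quartile ratio of the same series."""
    # statistics.quantiles with n=4 returns the three quartile cut points;
    # the first is the lower quartile. The 'inclusive' method is an assumption.
    baseline = statistics.quantiles(ratios, n=4, method="inclusive")[0]
    return [r / baseline for r in ratios]
```

Applied separately to each age group and jurisdiction, a value of 1 corresponds to the baseline level, and a value of 4 to a week with four times the baseline ratio. Scaling each series by its own baseline is what allows sites with different absolute ratios to be shown on a common relative scale.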

National surveillance trends in ILINet from 2008 and later were highly correlated with DiSTRIBuTE data; however, ILINet included three DiSTRIBuTE participating health department ED systems during this period. These three state and local syndromic systems represented a large portion (roughly 27%) of the total national ILINet visit volume, and the resulting overlap prevents direct comparison of the systems without disaggregating the regional data. A suitable gold standard for evaluation of population level influenza trends, whether from clinical sentinels or syndromic ED data, is a time-series of laboratory confirmed influenza infections. The national viral isolate data, as a proportion of weekly viral tests positive for influenza, were highly correlated with the combined DiSTRIBuTE data (Figure 3).

Fig. 3: DiSTRIBuTE and CDC ILINet and viral surveillance data, 2007-2009.

DiSTRIBuTE ILI syndrome time-series are shown as ratios, with national CDC sentinel ILI and viral influenza isolate data [12]. Pearson correlations between all DiSTRIBuTE sites and CDC viral surveillance were significant for the 2007-2008 (0.90, p<0.01) and 2008-2009 (0.86, p<0.01) influenza seasons.
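For readers less familiar with the statistic reported here, the Pearson correlation between two weekly series can be computed as in the following sketch. It is a plain implementation of the textbook formula, not the project's own code, which the text says was written in R; the significance test is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # A constant series has zero variance; this raises ZeroDivisionError.
    return cov / math.sqrt(var_x * var_y)
```

Because the coefficient is unchanged by rescaling either series, it compares the shape of two trends even when their absolute levels differ, which is the situation described for sites using heterogeneous syndrome definitions.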

Originally, visualizations and findings were summarized and disseminated via email, on the ISDS website, and through meetings, webinars and presentations. Later visualizations were presented on a user website hosted at the University of Washington, where a series of automated queries and statistical and graphing routines were implemented as scripts in the R statistical programming language and run on a daily basis to produce images for both public and restricted-access web applications. The public site showed time series of daily and weekly syndrome ratios. The restricted site was accessible by only those jurisdictions contributing data to DiSTRIBuTE (Figure 4). It included both daily and weekly ratios for several ILI and gastroenteritis syndromes, as well as their constituent data, syndrome counts and visit totals.

As the number of participant sites grew, features were added to enhance the ability to create custom visualizations, as well as to enter descriptive data, or metadata, specific to each site. These data included information needed to manage the creation of combined or composite time series, control the flow of data based on specific terms of participation for particular sites, specify order of appearance for menu items and visualizations, provide denominator information such as estimates of the population covered or number of hospitals in a particular jurisdiction, and contact information for epidemiologists and IT staff with each jurisdiction and site.

Fig. 4: National and HHS Regional DiSTRIBuTE time-series, 2007-2011.

Daily trend data are shown for ILI counts, ILI ratios and total ED visits as presented on the ISDS restricted website (Accessed April 6, 2011). The 42 DiSTRIBuTE participating state, county and city jurisdictions are aggregated by federal surveillance region: Region 1, Boston, Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont; Region 2, New Jersey, New York City, New York State; Region 3, Maryland, Pennsylvania, Virginia; Region 4, Florida, Georgia, North Carolina, South Carolina, Tennessee; Region 5, Cook County Illinois, Indiana, Michigan, Minnesota, Ohio, St. Louis Illinois, Wisconsin; Region 6, Arkansas, North Central Texas, Oklahoma City and Tulsa Counties Oklahoma; Region 7, Missouri, Nebraska; Region 8, Denver Colorado, North Dakota, Utah; Region 9, Arizona, Los Angeles, Napa and San Diego Counties California; Region 10, Portland Oregon, Seattle and King County and Washington State.

The Pandemic Phase

The development and evolution of the DiSTRIBuTE project changed dramatically with the emergence of novel A/H1N1 influenza in April of 2009. With the first influenza pandemic in over 40 years, the demands on local, state, and national public health practice increased. There were notable lessons learned with the DiSTRIBuTE effort during the emergence and progression of the pandemic in the spring of 2009 and with the public health concern over its anticipated recrudescence the following summer and fall; these lessons are presented in the sections below. The DiSTRIBuTE findings and the federal surveillance needs in response to the pandemic [24] led to the implementation and expansion of DiSTRIBuTE nationwide through a collaborative effort with the Centers for Disease Control and Prevention (CDC), as part of an existing cooperative agreement with the Public Health Informatics Institute (PHII).

Feasibility of the Data Sharing Model

The DiSTRIBuTE project demonstrated that it was feasible to share population health ILI data nationally in a manner that addresses the needs of local jurisdictions [13][17] and is consistent with the Markle Connecting for Health principles [16]. Although it was clear that submitting aggregate data, as opposed to individual records, encouraged data sharing among participants, it did not altogether eliminate concerns about control and use, particularly with regard to sharing provisional and near real time data.

The principle of keeping barriers to participation low facilitated the growth of the DiSTRIBuTE network. One of the other apparent effects of this principle was that data submitted by different regions often reflected different syndromic definitions, with varying sensitivity and specificity for monitoring conditions of interest. Some epidemiologists and other observers perceived this difference as a limitation to overall data quality, particularly with regard to comparing data across regions. However, others viewed the heterogeneity as something that was recognized and accounted for by the jurisdictions, where definitions were derived empirically or in response to a local surveillance need. Recognition of these differing perspectives led almost immediately to ongoing collaborative assessment of syndrome definitions and other surveillance practices.

Applying the Model

The success of the DiSTRIBuTE project in demonstrating the feasibility of implementing an innovative low cost model for national syndromic influenza surveillance suggests that it is worth considering expansion of this model to other surveillance activities and to other fields in public health and health care. In DiSTRIBuTE, the data collection process was generalized to acute gastroenteritis syndromes, and potential expansion, for instance, to the monitoring of quality of care would be more ambitious and rewarding. In fact, elements of the DiSTRIBuTE model could conceivably be applied to any system that relies on collaborative networks, rigorous data collection, detailed and comprehensive data, and ongoing technical support.

The lessons regarding standardization of influenza surveillance are also instructive. The issue of standardization remains a major challenge for syndromic surveillance. While relative disease patterns and trends can be compared between systems with heterogeneous syndrome definitions, the ability to compare more direct epidemiological measures is limited. ISDS and collaborators continue to work on strategies to find common ground for syndrome definitions and coding that can inform national agendas for syndromic surveillance.

Providing an emphasis on establishing data transmission while accepting variations in data standards and then allowing the community to move to collaborative harmonization appears to be a viable approach and one that offers an alternative to top-down standards-setting approaches, which can be slow to complete and result in high technical barriers to participation. It also needs to be acknowledged that population surveillance of aggregate data has inherent limitations for queries and analysis at deeper levels of detail.

The question of cost is also important to consider. The DiSTRIBuTE model demonstrated that a national network can be organized to support existing state and local systems, expertise and infrastructure for both public health surveillance and emergency preparedness and response with modest funding (i.e., less than $200,000 per year during the proof of concept period, 2006 to 2009). For the contributing systems, the necessary costs incurred by state and local jurisdictions to support syndromic surveillance were offset in part or in total through federal funding by the CDC through Public Health Emergency Preparedness (PHEP) cooperative agreements, and Public Health Emergency Response (PHER) grants, and through foundation support or direct spending by the jurisdictions themselves. While the total costs required for developing and operating the existing infrastructure of state and local syndromic surveillance through federal PHEP and PHER funding is unknown, federal level efforts to create a national centralized network, such as BioSense, cost roughly $30 million per year during the period 2003 to 2009 [25][26].


During the period from 2008 to 2009, the participating DiSTRIBuTE jurisdictions represented over 16% of the US population, and captured roughly 13% of all ED visits nationwide. By early 2011, the network had grown to over 43 reporting sites and captured over 40% of all ED visits. The DiSTRIBuTE project has changed the practice of syndromic surveillance in the US. It promoted an assessment of practice patterns and helped to identify variations among different public health jurisdictions, while offering national perspective and peer comparisons. DiSTRIBuTE has been useful for understanding the current landscape of syndromic surveillance, enhancing data quality, and creating a framework that can be applied to syndromes beyond influenza.

In 2009, DiSTRIBuTE was identified as a case example in a White House recommendation by the President’s Council of Advisors on Science and Technology for implementation of a nationwide ED surveillance network as part of the preparedness and response effort to the 2009 influenza A/H1N1 pandemic [24]. The project was highlighted in Senate testimony by the White House Chief Technology Officer as an example of the federal ‘Open Government Directive’ in moving public health surveillance research into development and deployment, most notably for its grass-roots participation, low cost to acquire data and unprecedented public transparency [27]. DiSTRIBuTE has also impacted the way national health information technology initiatives view syndromic surveillance [1]. In 2010, the DiSTRIBuTE community initiated and participated in a process led by ISDS to define business standards for syndromic surveillance and create messaging standards in support of “Meaningful Use” certification (Recommendation: Core Processes and EHR Requirements for Public Health Syndromic Surveillance, available at

The significant progress in population level syndromic surveillance that the DiSTRIBuTE project made during the proof of concept period from 2006 to 2009 demonstrated that a national network can be created with modest funding and without unnecessary exposure of record level information or unnecessary burden on health practitioners, all while supporting existing state and local public health systems and capacity and providing rapid, high volume, summary electronic surveillance. Electronic data processes and approaches like this cannot replace clinician reporting or laboratory testing of clinical samples collected by sentinel physicians, but they can augment these systems in a way that confirms and expands our understanding of population level disease trends. Tremendous potential remains for DiSTRIBuTE to continue to improve the nation’s ability to prepare for, monitor and respond to disease outbreaks.

Expansion of Collaborator List

ISDS Distribute Working Group

  • Marc Paladini, New York City Department of Health & Mental Hygiene
  • Richard T. Heffernan, New York City Department of Health & Mental Hygiene
  • Atar Baer, Public Health – Seattle & King County, WA
  • Michael A. Coletta, Virginia Department of Health
  • Karl Soetebier, Georgia Division of Public Health
  • Erin L. Murray, Georgia Division of Public Health
  • Lana Deyneka, North Carolina Division of Public Health
  • Amy Ising, NC-Detect, University of North Carolina, Chapel Hill, NC
  • Ryan Gentry, Indiana Department of Health
  • Felicia Alvarez, Utah Department of Health
  • Melissa Dimond, Utah Department of Health
  • Bryant Thomas Karras, Washington State Department of Health
  • Kieran Moore, Queen’s University, Kingston, Ontario, Canada
  • Ian Painter, University of Washington, Seattle, WA
  • William B. Lober, University of Washington, Seattle, WA
  • David L. Buckeridge, McGill University, Montreal, Canada
  • Donald R. Olson, International Society for Disease Surveillance

Author Contributions

ICMJE criteria for authorship read and met, DRO, MP, WBL, DLB; agree with the manuscript’s results and conclusions, DRO, MP, WBL, DLB; wrote the first draft of the paper, DRO and MP; revised the paper for substantive content and interpretation, DRO, MP, WBL, DLB. Contributed to the roundtable meetings and discussions that contributed to the development of the project, led to the paper, and were provided opportunity to review and revise paper, ISDS DiSTRIBuTE Working Group members.

Competing Interests

The authors have declared that no competing interests exist.

Structure-based drug design of a new chemical class of small molecules active against influenza A nucleoprotein in vitro and in vivo Thu, 01 Sep 2011 23:58:50 +0000


The drugs used against influenza can be classified into two broad groups depending on whether the compound acts on a host factor of the human cell or on virus target proteins (e.g., see [1] for a review). The host factor drugs are less prone to the development of drug resistance and usually have a wide range of antiviral activities [2][3][4][5]. Although this approach is quite popular, attacking human cell targets may influence essential functions of the organism and may therefore require sophisticated studies regarding the mechanism of action (MoA) and adverse effects.

Commonly used virus-targeting compounds such as amantadine and rimantadine [6] inhibit the ion channel M2 [7][8] and disrupt virus entry into target cells. Modern treatments such as zanamivir and oseltamivir (Tamiflu) inhibit the action of the membrane protein neuraminidase [9]. Unfortunately, all of these treatments lead to the emergence of drug resistance and side effects [10][11][12][13][14].

The development of novel antiviral medications is complicated by an extreme degree of genetic variability of the pathogen and its ability to cross interspecies barriers [15][16]. A plausible solution involves the development of a targeted antiviral compound that is highly active against a specific virus target. The selected target should concomitantly be highly conserved within the virus population with little similarity to any human protein.

An example of the approach involves the selective use of nucleoprotein (NP), which is one of the most conserved proteins within the influenza virus genome and has recently been suggested to be a promising drug design target [1][17]. We applied a sequence of in silico screening tools [18] using virtual libraries of small molecules and identified the derivatives of 3-mercapto-1,2,4-triazoles as potential NP inhibitors. The top predicted binders were confirmed to be effective against various strains of influenza A in vitro. The most active compound demonstrated sufficient efficacy in terms of animal protection in the influenza challenge model in mice.


The 3D structure of the monomeric NP was taken from a previous study [19] and corresponds to an H5N1 virus (PDB code 2Q06). Two binding cavities of sufficient volume (exceeding 400 Å³) for the subsequent docking were found using the flood-fill algorithm of PocketPicker [20]. The first site (referred to as Site 1) is located near the epitope sequence I265-S274 from [21]. The other site, Site 2, is situated next to the epitope sequence R174-K184, also from [21].
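The flood-fill step can be illustrated with a minimal sketch that groups empty grid points into connected cavities and reports their volumes. This is not PocketPicker's implementation, which also scores how buried each grid point is; the input format (a set of integer grid coordinates not occupied by protein atoms) and the grid spacing are assumptions made for illustration.

```python
from collections import deque

# Six face-sharing neighbours of a grid point.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def find_cavities(empty, spacing=1.0):
    """Group empty grid points into connected cavities and return their volumes.

    `empty` is a set of (i, j, k) integer grid points not occupied by protein
    atoms; `spacing` is the grid step in angstroms. Both are hypothetical inputs.
    Volumes are returned in cubic angstroms, largest first.
    """
    unvisited = set(empty)
    volumes = []
    while unvisited:
        # Start a new cavity from any point that has not been reached yet.
        queue = deque([unvisited.pop()])
        size = 1
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBOURS:
                nxt = (i + di, j + dj, k + dk)
                if nxt in unvisited:
                    unvisited.remove(nxt)
                    queue.append(nxt)
                    size += 1
        volumes.append(size * spacing ** 3)
    return sorted(volumes, reverse=True)


def large_cavities(empty, spacing=1.0, min_volume=400.0):
    """Keep only cavities above the volume threshold quoted in the text."""
    return [v for v in find_cavities(empty, spacing) if v > min_volume]
```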

Fig. 1: The compound F66 docked to influenza NP

(see Results and Discussion sections for the details).

NP inhibitors were identified via molecular docking of a specially prepared small-molecule library (see Materials and Methods section) to Sites 1 and 2. To confirm the computed activity of the top ten predicted binders, we measured the cytopathic effect (CPE) of the influenza A/Wisconsin/67/2005(H3N2) virus in the presence of the studied compounds at a concentration of 5 µM in a plaque formation assay. Oseltamivir phosphate was used as a positive control. At least four different 3-mercapto-1,2,4-triazole derivatives, all docked to Site 2 (see Figure 1), were determined to be active in vitro (see Table 1 for a summary of the results).

Table 1. Plaque forming assay of the top 3-mercapto-1,2,4-triazole derivatives against A/Wisconsin/67/2005(H3N2). The numbers in brackets represent the results of the two subsequent single concentration tests.

The most active compound in the series, (3-bromo-4-methoxyphenyl)methylidenehydrazide of 5-(4-chlorophenyl)-4-(methylphenyl)-1,2,4-triazole-3-ylthioglycolic acid, or F66, demonstrated concentration-dependent inhibition of the plaque formation, resulting in a cell protection effect estimate of EC50 ~1 µM (see Figure 2).

Fig. 2: CPE of A/Wisconsin/67/2005(H3N2) virus in the presence of different concentrations of F66.

The smooth line represents a Michaelis-Menten estimate with a Michaelis constant (EC50) of 250 nM.
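A Michaelis-Menten-type curve of this kind has the form effect = c / (c + EC50), so the effect reaches half of its maximum when the concentration equals EC50. The sketch below evaluates that curve for the 250 nM constant quoted in the caption; whether the plotted quantity is the fraction of plaques inhibited or the fraction remaining is not stated, so treating it as a fractional effect between 0 and 1 is an assumption.

```python
def fractional_effect(concentration_nm, ec50_nm=250.0):
    """Michaelis-Menten-type saturation curve: effect = c / (c + EC50)."""
    if concentration_nm < 0:
        raise ValueError("concentration must be non-negative")
    return concentration_nm / (concentration_nm + ec50_nm)
```

Under this form, a concentration of 250 nM gives a fractional effect of 0.5, and a concentration of 5 µM (5000 nM) gives about 0.95.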

The cytotoxicity measurement in HeLa cell culture demonstrated no signs of cytotoxicity at concentrations up to 50 µM. In fact, we were not able to reach higher concentrations because of solubility issues, which indicated that the selectivity index (SI) well exceeded 50.
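The bound on the selectivity index follows from its usual definition as the ratio of the cytotoxic concentration to the effective concentration, SI = CC50 / EC50. Since no toxicity was seen up to 50 µM, CC50 is above 50 µM, and dividing by the EC50 gives a lower bound on SI. The short sketch below makes this arithmetic explicit; the function name is illustrative, and the two EC50 values used are the roughly 1 µM estimate quoted in the text and the 250 nM constant from the Figure 2 caption.

```python
def selectivity_index_lower_bound(max_nontoxic_um, ec50_um):
    """Lower bound on SI = CC50 / EC50 when no toxicity is seen up to max_nontoxic_um."""
    if ec50_um <= 0:
        raise ValueError("EC50 must be positive")
    return max_nontoxic_um / ec50_um


# With EC50 of about 1 uM the bound is 50; with the 250 nM fit it is 200.
bound_from_text = selectivity_index_lower_bound(50.0, 1.0)
bound_from_fit = selectivity_index_lower_bound(50.0, 0.25)
```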

In another experiment against A/Aichi/2/68(H3N2), F66 also exhibited activity in the micromolar range. The hemagglutination activity (HA) inhibition efficacy of F66 exceeded the efficacy of rimantadine, which was used as a positive control in the same experiment (see Table 2).

Compound Dosage HA inhibition
F66 15 µM 75%
F66 10 µM 50%
F66 5 µM 50%
Rimantadine 50 mg/l 75%

Table 2. Efficacy of the F66 compound in vitro against A/Aichi/2/68 (H3N2).

The cell-protection effect of F66 has been demonstrated in single-concentration measurements against a virus panel comprising the virus strains targeted by the influenza vaccine for the season 2010-2011: B/Brisbane60/2008, A/New Caledonia/20/99(H1N1), A/California/07/2009(H1N1) and A/Perth/16/09(H3N2). The results of the experiments are summarized in Table 3. F66 demonstrated a fair degree of cell protection. The EC50 can be estimated at approximately 1 µM against A/California/07/2009 (H1N1), approximately 5 µM against A/New Caledonia/20/99 (H1N1), and slightly higher than 5 µM against A/Perth/16/09(H3N2). There was no effect against the B/Brisbane60/2008 virus. All of the strains used in the experiment were sensitive to oseltamivir. A/California/07/2009 is a rimantadine-resistant strain.

Virus B/Brisbane60/2008 A/California/07/2009
A/New Caledonia/20/99
Drug Numbers of plaques in duplicate wells
F66, 5 µM 91%
71% (72%;70%)
Oseltamivir 0%
0% (0%;0%) 0%
0% (0%;0%)
No treatment 100% 100% 100% 100%

Table 3. Plaque formation by various influenza virus strains measured in the presence of F66 (5 µM) relative to an untreated control. The pair of values in brackets represents the results of the two successive measurements.

In a separate study, we measured the efficacy of the compounds from the F66 series at 5 µM against A/Puerto Rico/8/34(H1N1). This strain is rimantadine-resistant, and thus we were required to use a compound with a different MoA, such as ribavirin, as a positive control. We used an immunostaining assay of fixed cells to detect the virus NP and evaluate the efficacy of the compounds. All four hits in the series, including F66 itself, were found to be more active than ribavirin under the same experimental conditions and at the same concentration.

In a similar experiment, we tested the efficacy of F66 against the A/Chicken/Kurgan/2005(H5N1) virus in a hemagglutination assay. The virus protein activity inhibition reached 50% at an F66 concentration range of 1 to 5 µM. A/Chicken/Kurgan/2005(H5N1) is the most virulent influenza strain that was obtainable for the experiments.

Given the promising in vitro results and apparent lack of toxicity, we tested F66 in influenza challenge experiments in mice. The animals were infected with the A/Aichi/02/68 (H3N2) virus at a dose corresponding to 1-3 LD100 (10-30 LD50). The substance was administered perorally (p.o.) either a few hours after the infection and subsequently once a day for the duration of the experiment (the emergency prophylaxis protocol) or once a day beginning 24 hours after the infection (the treatment protocol). We tested two daily doses of F66, 7 mg/kg and 22.5 mg/kg. Two well-known antiviral drugs, rimantadine and oseltamivir, were used as positive controls. In parallel, we evaluated one animal group treated with a placebo (no drug) as a negative control and one uninfected group treated with F66 at 150 mg/kg daily as the toxicity control. The results of the study are presented below in Table 4.

Drug | Treatment protocol: surviving animals, % | Treatment protocol: delta, days | Emergency prophylaxis protocol: surviving animals, % | Emergency prophylaxis protocol: delta, days
F66, 7 mg/kg | 21.1% | 1.1 | 40% | 2.7
F66, 22 mg/kg | 31.6% | 1.8 | 40% | 2.7
Tamiflu, 25 mg/kg | 94.7% | 6.5 | 100% | 5.8
Rimantadine, 30 mg/kg | 45% | 2.9 | 60% | 3.9

Table 4. Efficacy of F66 against A/Aichi/02/68(H3N2) challenge in vivo. The delta values indicate the average life expectancy increase in the group relative to the infected control.

The treatment protocol study of F66 demonstrated the animal protection effect of F66 in a dose-dependent manner. The maximum level of protection of 31.6% was achieved when the animals were receiving 22.5 mg/kg of the substance daily. The emergency prophylaxis protocol study demonstrated that F66 has similar and moderate protective efficacies at each of the doses of 7.5 and 22.5 mg/kg. The animals in the toxicity control group tolerated the compound well.


Both docking cavities identified by the flood-fill algorithm lie close to the NP epitope sequences suggested in a previous study [21] (see Figure 1). Site 2 occupies an arginine-rich region of the RNA-binding groove and forms extensive contacts with the conserved NP epitope sequence R174-K184, which makes the site very attractive for targeting NP. Meanwhile, Site 1 has only minor contact with the other epitope sequence, I265-S274. Surprisingly, both Site 1 and Site 2 are distant from the proposed binding site of the recent NP inhibitor nucleozin, which is marked by the residue Y289 [17] in Figure 1.

The demonstrated level of animal protection in treatments with F66 reached 40%, which is comparable to the protection provided by treatment with rimantadine in a similar experiment (60%). In the emergency prophylaxis study, the compound application increased the lifetime of the mice by 2.7 days. The results are inferior to the protection level achieved with Tamiflu, which is not surprising given that Tamiflu exhibited an efficacy that was more than two orders of magnitude higher in vitro.

The activity of F66 was first predicted in silico as a result of a docking model against influenza NP. Currently, our knowledge of the MoA is limited to what can be inferred from the measurements of the compound series against two different amantadine-resistant strains, A/Puerto Rico/8/34(H1N1) and A/California/07/2009(H1N1). In both of these cases, the activity of F66 was even better than that observed in the measurements against other influenza strains. This lets us conclude that the F66 MoA most likely does not involve an interaction with the target of amantadine, M2.

F66 was effective against a number of influenza A strains but was considerably less effective against influenza B. This result could be explained if the target of the compound is actually NP. The active site that we selected for the docking is only 75% conserved between the influenza A and B viruses, and the difference in the experimental efficiency may thus be compatible with the suspected MoA of the compounds.

Among the influenza A strains, the least in vitro activity was demonstrated against H3N2 variations of the virus. Moreover, among the H3N2 strains we utilized in vitro, the efficacy difference between A/Wisconsin/67/2005 and A/Perth/16/09 was the largest. Unfortunately, the NP of A/Perth/16/09 has not yet been sequenced, and our activity data thus cannot be used to elucidate the MoA. To confirm NP as the target of the compounds, we need to perform a mutant escape assay and/or a direct binding assay using surface plasmon resonance (SPR) with a recombinant version of the protein.

The calculations show that F66 binds to Site 2, which is responsible for RNA binding. The binding site is fairly conserved, and, if proven, our approach may provide an alternate strategy for influenza NP inhibition. Despite its clear medicinal chemistry deficiencies, the compound was well tolerated and active in vivo. Although the results reported here are very preliminary and many further lead optimization steps are still required, we believe that the derivatives of F66 may eventually become effective oral agents for the treatment of influenza virus infections.

Materials and methods

Virtual compounds library design

The molecules for the virtual screening were taken from the libraries provided by Alinda, Asinex, Chemdiv, Enamine and IBS. Together, they contain more than 3 million readily synthesized molecules. To reduce the volume of calculations, the compound library was clustered according to a previous study [22]. The measure of dissimilarity (the “distance”) between the molecules was based on the Tanimoto similarity calculated with Daylight fingerprints [23]. The clustering parameters were chosen so that a typical cluster consisted mostly of chemically similar compounds. We chose one representative molecule from each of the ~73,000 clusters with ten or more molecules. The inclusion of smaller clusters does not lead to more coverage of the libraries by the clustered molecules, and at the same time, it dramatically increases the total number of clusters. As a result, our cluster centroids library covered approximately 53% of the total number of molecules.
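
The Tanimoto-based distance mentioned above can be illustrated with a minimal sketch. It assumes that each fingerprint is represented as the set of its "on" bit positions; the function names and the toy fingerprints are hypothetical and are not taken from the Daylight toolkit or from the clustering procedure of [22].

```python
def tanimoto_similarity(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(union)


def tanimoto_distance(bits_a, bits_b):
    """Dissimilarity ("distance") used for clustering: 1 - similarity."""
    return 1.0 - tanimoto_similarity(bits_a, bits_b)


# Toy fingerprints (hypothetical bit positions, not real Daylight output).
fp_x = {1, 4, 7, 9}
fp_y = {1, 4, 8, 9, 12}
similarity_xy = tanimoto_similarity(fp_x, fp_y)  # 3 shared bits / 6 bits in the union
distance_xy = tanimoto_distance(fp_x, fp_y)
```

A distance of 0 means identical bit sets and a distance of 1 means no shared bits, so a clustering threshold on this distance controls how chemically similar the members of a cluster are.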

Virtual screening

Computational techniques generally involve a balance between speed and accuracy [24]. To overcome the bottleneck posed by the computationally demanding molecular dynamics (MD)-based methods, we used a previously described combination [18][25] of the two approaches: a fast molecular docking for the generation of binding poses using a simplified potential of mean force (PMF, see e.g. [26]), followed by MD simulations to rank the ligands according to their binding affinities calculated using the linear interaction energies (LIE) approximation [27]. The MD simulations were performed in an AMBER/GAFF [28][29] force field combined with a continuous model of the aqueous environment [30][31]. The compounds with the top predicted binding affinities of greater than 10 µM were selected for subsequent in vitro tests.

Influenza virus strains

The experiments with the virus strains A/Wisconsin/67/2005(H3N2), A/New Caledonia/20/1999(H1N1), A/Perth/16/2009(H3N2), A/California/07/2009 and B/Brisbane/60/2008 were performed at Virapur, US. The virus samples were taken from the Virapur collection. The strain A/Puerto Rico/8/34 (H1N1) was received from the collection of the Research Institute of Influenza, Russian Academy of Medical Sciences, St. Petersburg, Russia. The virus strain A/Aichi/2/68 (H3N2) was taken from a collection of Branch of FGU “48 The Central Scientific Research Institute of the Russian Defense Ministry”, Russia. The virus strain A/Chicken/Kurgan/2005 (H5N1) was received from the Institute of Poliomyelitis and Viral Encephalitis, Russia.

Plaque assay

Incubation, infection and staining: In total, 96-well dishes were seeded 20 hours previously with a known concentration of MDCK cells. A 100 µl volume of a drug dilution was carefully placed on each monolayer in quadruplicate and incubated on the monolayers for one hour. All of the virus stocks had been previously aliquoted and tested for the number of plaque-forming units per volume. Drug solutions were incubated on the cells for one hour, and approximately 50-75 plaque-forming units of each influenza virus were subsequently added to each well. The virus was allowed to adsorb to the monolayer for a defined time. Following the incubation of the virus, the virus inoculum was suctioned off the monolayer, and the appropriate dilute drug solution, to which 0.4% agarose had been added, was carefully overlaid on the appropriate monolayer.

Cultures infected with viruses A/Wisconsin/67/2005, A/New Caledonia/20/1999 and A/Perth/16/2009 were incubated at 34 °C for 48 hours and monitored for plaque size. The monolayers were observed under the inverted microscope and graded for the presence of the influenza cytopathic effect (CPE). Metabolically active cells were stained with 3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium bromide (MTT).

Cultures infected with the viruses B/Brisbane/60/2008 and A/California/07/2009 were incubated at 34 °C for 68 hours and monitored for plaque size. Cultures were stained to easily visualize plaques after 68-72 hours of virus incubation. Metabolically active cells were stained with MTT.

Counting plaques: Plaques that consist of infected cells originating from one infectious virus are very metabolically active in the second and third day of infection. Stained plaques were counted manually, and the number of plaques observed in the duplicates of each drug was recorded. In some instances, plaque size was reduced with drug treatment, but the number of plaques was not reduced.

Virus protein detection by immunostaining of fixed cells

MDCK cells (ATCC CCL 34) were used. The compounds under investigation were dissolved in minimum essential medium (MEM) to a concentration of 5 µM. The solutions were applied to MDCK cells and incubated for 1 hour at 37 °C followed by infection with the influenza virus in two doses (1 and 10 EID50/0.1 ml) in the medium. Control wells did not contain the compounds. Ribavirin was used as a positive control. Viruses were cultivated at 37 °C for 24 hours followed by an ELISA assay.

For this purpose, cells were washed three times with phosphate-buffered saline (PBS) of pH 7.6, fixed in 80% cold acetone for 10 min at -10 °C, washed for 5 min with distilled water and air dried. Monoclonal antibodies against influenza virus NP protein were diluted to 5 µg/ml in 5% fat-free milk in PBS and incubated with fixed cells for 1 hour at 37 °C. After washing three times with PBS, cells were incubated with diluted 1:10,000 goat anti-mouse horseradish peroxidase-conjugated antibodies (Sigma, St. Louis MO) for 1 hour at 37 °C. Unbound antibodies were removed by washing three times in PBS, and a color reaction was developed by adding 3,3′,5,5′-tetramethylbenzidine and 0.03% H2O2 in a 0.1 M acetate buffer of pH 5.0. The reaction was stopped with 2N H2SO4, and the optical density of the wells was measured at a wavelength of 450 nm. The ELISA reaction was considered positive if the optical density (OD450) exceeded twice the value of the corresponding uninfected control cells. The compounds were considered to be active if OD450 was two or more times lower than that in the control.

Hemagglutination assay

MDCK cells were used. Dilutions of the compounds under investigation were prepared in MEM. The solutions were applied to MDCK cells followed by infection with influenza virus A/Aichi/2/68 (H3N2) at 0.1-0.01 TCID50 and incubated for 24-72 h at 37 °C until the development of a cytopathic effect in the control. The level of hemagglutinin activity in the lysate of the infected cells was determined using standard procedures [32].

Anti-influenza activity in animal models

Albino mice weighing 10-12 g were obtained from the 48 MD Russia vivarium research center. The drug was administered in two doses, 7 and 22.5 mg/kg bw. Each group of mice included 20 animals. The virus A/Aichi/2/68 (H3N2) was administered intranasally at 10-30 50% lethal influenza virus doses (LD50) under slight ether anesthesia. The following treatment scheme was used: 0.2 mL of a drug water solution was administered p.o. every 24 hours after infection for 6 days. The following emergency prophylaxis scheme was used: a 0.2 mL drug water solution was administered i.g. 1-2 hours after infection, and every 24 hours afterwards for 6 days. The animals were observed for 14 days, and deaths in the control and experimental groups were reported every day. Based on these data, the degree of animal protection was calculated.


The authors wish to thank Dr. A. Kadushkin (Quantum Pharmaceuticals) for assistance in resolving numerous chemical issues, Dr. S. Borisevich (FGU “Virology Center”, Institute 48 MD, Russia) for the in vivo work and the Virapur (US) team for the in vitro efficacy studies.

Funding information

The work is funded by Quantum Pharmaceuticals (Russian Federation).

Competing interests

The authors are employed by Quantum Pharmaceuticals. A Russian Patent Office application covering the reported activity of 3-mercapto-1,2,4-triazole derivatives against influenza has been filed.

Prevalence and risk factors for swine influenza virus infection in the English pig population Fri, 11 Feb 2011 00:53:22 +0000


Influenza A viruses are the cause of considerable morbidity and mortality in humans and animals worldwide. Pigs have an integral role in the ecology of these viruses, as they are susceptible to infection with both human and avian adapted strains, as well as swine adapted strains. Pigs have therefore been considered both as a hypothetical ‘mixing vessel’ for the production of new strains of virus and as an intermediary by which virus transmission between species can occur [1]. Influenza virus infection is very common in pigs, and influenza viruses have been identified as one of the most commonly isolated pathogens from outbreaks of acute swine respiratory disease [2].

Disease associated with swine influenza virus (SIV) infection of pigs has been recognised as an important cause of economic loss to pig farmers [3] [4]. There is also concern over the potential human health risks associated with porcine infection. Humans working in close contact with pigs have been reported to have increased likelihood of seropositivity to SIVs [5] [6] [7], although clinical disease and continued transmission of swine adapted strains amongst humans is rarely investigated. Despite this, pigs are proposed to have been involved in the production of the pandemic strain of H1N1 influenza virus in 2009 [8] [9] [10], with all of the progenitor viral genes identified as circulating in pigs for over ten years prior to transmission to humans [11].

Since the first identification of influenza virus infection of British pigs in 1940 [12], the recorded levels of different influenza virus strains in pigs in the country have been highly variable. Over the last 25 years, new pig-adapted strains of H1N1, H1N2 and H3N2 viruses have circulated in the British pig population [13], with some differences in epidemiology from those viruses observed in continental Europe. Of the three subtypes of influenza virus mentioned above (which comprise numerous strains), two strains (‘avian-like H1N1’ and a human-avian reassortant H1N2) are known to have been circulating in the UK pig population in recent years, and the new pandemic H1N1 2009 (pH1N109) strain has been isolated from a number of English pig herds since November 2009 [14], although the total level of exposure or infection of pigs with this strain throughout the country is currently unknown.

Since 1991, the Veterinary Laboratories Agency (VLA) has conducted passive surveillance for SIVs in the UK through virological testing of pigs with respiratory disease [13]. Although these data are useful in the identification of general trends, they cannot accurately estimate the prevalence of infection, due to the lack of denominator data and under-reporting. The most recent structured survey of SIV infection in British pigs was performed in the early 1990s [15]. Haemagglutination inhibition was used to detect antibodies to a panel of influenza virus strains in a sample of 2,000 sows randomly selected at slaughter in England and Wales. No viruses of the H1N2 subtype were included in this study as this subtype had not been identified in Britain by this time. Approximately 60% of pigs were found to be seropositive to at least one strain of influenza A virus, of which approximately 40% were seropositive to SIVs only, 15% to human influenza viruses only, and 40% to both swine and human viruses.

Despite concerns regarding transmission of SIVs to humans, few studies have investigated risk factors in pigs for infection with these viruses [16] [17] [18] [19] [20]. Farm size and the density of pigs in the area around the farm have repeatedly been identified as risk factors for influenza seropositivity of farms. It is also well recognised that the principal risk factor for initial introduction of SIV is movement of infected pigs onto a farm [21] and continued virus presence on farms has been associated with certain herd management systems [21] [22]. Risk factors for swine respiratory disease in general have been reviewed [23], and a scarcity of analytic investigations into these diseases has been identified.

Here, results are presented from a study conducted to quantify the level of recent exposure to SIVs of pigs on English farms and to investigate risk factors for farm infection.

Materials and methods

A cross-sectional study was conducted between April 2008 and April 2009, as part of a project investigating postweaning multisystemic wasting disease (PMWS) in English farrow-to-finish pig herds. Details of data collection and blood samples are described elsewhere [24] [25]. In total, 146 farms agreed to participate. Blood samples were taken from 20 pigs per farm and tested for the presence of antibodies directed against avian-like H1N1, H1N2 and human-like H3N2 strains of SIV using haemagglutination inhibition (HI) tests. The number of samples per farm for which results were available ranged from one to 24. Management data were collected through a pretested questionnaire, and data relating to the total number of pigs within a 10 km radius of the farm postcode were taken from the 2004 UK agricultural census (

For the purposes of this study a reciprocal antibody titre of greater than or equal to 40 (i.e. a titre of 1/40 from serial dilution) in at least one growing or finishing pig (i.e. not from sows or weaners) was selected as indicating recent exposure of the herd to the virus. Animals for which age was not recorded (n=41) were excluded from the study, and the three farms for which samples were available for fewer than five animals were excluded from all farm-level analyses. Questionnaire and farm inspection data were entered into a relational database (Microsoft Access, Microsoft Corporation), before being transferred to Stata 10.1 (StataCorp, Texas) for further data analysis. Due to the large number of variables in the dataset, knowledge of pig production systems and plausible risk factors for virus presence was used to remove variables considered highly unlikely to be associated with swine influenza status. To reduce the number of variables to be included in the formal statistical analysis stages, a three-stage process was adopted. In the first stage of this process, univariable chi-squared, Fisher’s exact tests and Mann-Whitney U tests were used to identify possible associations between variables and farm status, using a p-value of 0.25 or less to identify variables to include in the multivariable analysis.
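
The herd-level case definition described above can be written down as a short sketch. The data layout (a list of age group and titre pairs) and all names are hypothetical. The sketch also assumes that the minimum of five animals applies to the growers and finishers sampled on a farm, which the text does not state explicitly.

```python
RECENT_EXPOSURE_AGE_GROUPS = {"grower", "finisher"}  # sows and weaners are ignored
TITRE_CUTOFF = 40   # reciprocal HI titre (i.e. 1/40)
MIN_ANIMALS = 5     # assumed to refer to growers/finishers with results


def classify_herd(samples):
    """Classify one farm from its serology results.

    samples: list of (age_group, {subtype: reciprocal_titre}) tuples.
    Returns True (recent exposure), False (no evidence), or None when the
    farm has too few eligible animals for the farm-level analysis.
    """
    eligible = [titres for age, titres in samples
                if age in RECENT_EXPOSURE_AGE_GROUPS]
    if len(eligible) < MIN_ANIMALS:
        return None
    # One seropositive grower or finisher, to any subtype, is enough.
    return any(t >= TITRE_CUTOFF
               for titres in eligible
               for t in titres.values())


# Hypothetical farm: one finisher with an H1N2 titre of 80.
negative = {"H1N1": 10, "H1N2": 10, "H3N2": 10}
farm_a = [("finisher", {"H1N1": 10, "H1N2": 80, "H3N2": 10})] + [("grower", negative)] * 4
farm_a_status = classify_herd(farm_a)
```

Under this definition a farm whose only seropositive animals are sows or weaners is classified as negative, which is the intended behaviour when the aim is to capture recent rather than historic infection.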

The second stage of analysis involved grouping the resultant variables into one of ten groups relating to farm management, spatial or temporal characteristics. All variables within each group were then entered together into a multivariable logistic regression model. Ordinal and categorical variables were included as indicator variables, and a backward selection process was used to identify any associations with farm swine influenza status, using a likelihood ratio test p-value of 0.1 or less to suggest a statistical association. Ordinal variables were also tested for a linear trend in the change in the log odds of positivity across categories.

Variable groupings were then disregarded for the final stage of analysis, which first involved the placement of eight selected variables of interest (deemed a priori to be potential confounders or risk factors of particular interest, as detailed in Table 1) into a multivariable logistic regression model. In cases where a linear trend was considered plausible and shown to fit the data (as identified using a likelihood ratio test), the variable was entered into the final model as such. Other variables identified in the previous stage were then entered into the model using a stepwise forward selection process, starting with variables found to have the lowest likelihood ratio test p-value in the previous stage of analysis. Variables with a likelihood ratio p-value of more than 0.05, and with no evidence of a confounding effect on other associations (suggested by there being no biologically plausible confounding effect, or a change in the regression coefficient upon removal of the variable of less than 50%) were excluded from the final model. A stepwise backward selection process was then used to remove variables of a priori interest, according to the same criteria. Finally, an assessment was made of any plausible interaction between the remaining variables by adding interaction terms to the model and using the likelihood ratio test, with a p-value of 0.1 or less to suggest possible interaction.
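
The likelihood ratio test used as the selection criterion in the second and third stages can be sketched as follows. This is an illustration only: the log-likelihood values are hypothetical, the function names are not Stata commands, and the chi-square tail probability is computed with closed-form expressions for integer degrees of freedom rather than with a statistics package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite correction sum.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_p(loglik_reduced, loglik_full, df):
    """p-value for adding (or dropping) terms with df extra parameters."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return chi2_sf(statistic, df)


# Hypothetical log-likelihoods of two nested logistic models.
p_value = likelihood_ratio_p(loglik_reduced=-95.0, loglik_full=-92.0, df=1)
keep_variable = p_value <= 0.05  # threshold used in the final stage
```

A categorical variable entered as several indicator variables contributes one degree of freedom per indicator, which is why the test is written for a general df rather than for a single coefficient.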

Table 1. Variables of a priori interest which were forced into the model at the start of the third stage of analysis.

Name of variable | Reason for inclusion in model
Total number of pigs in the proximity of the farm* | Identified as a risk factor in previous studies [16] [17] [18] [19].
Herd size | Identified as a risk factor in previous studies [16] [18] [19]. Also considered to be a likely confounder.
Maximum number of different age groups kept on same site | Proposed risk factor in a recent study [2], and also considered to be a proxy for farm type (e.g. farrow to finish, weaner to grower etc.).
Whether herd reported moving gilts and/or boars on in the last year | Movement of animals into the herd has been identified as a risk factor in a number of reports and publications [21] [23] [26].
Accommodation types in place in herd (indoor / outdoor / straw yards) | Considered to have an effect on the transmission routes of influenza viruses between humans [27]; a similar effect in pigs would be plausible.
Stocking density of finishers | Close contact between pigs and number of pigs per pen have been proposed as a risk factor in other reports [17] [26]. Finishers selected as another study has found that seroconversion to SIVs often occurs during the finishing period [2].

* Although this may not necessarily relate to the location of the pig sheds themselves, it was selected as a broad indicator of the density of pig production in the area in question.


In total, 2,780 sera derived from 146 farms were tested using HI (of which, 143 farms were eligible for the farm-level analysis). Using a titre of 1/40 or higher as a cut-off, Table 2 summarises the numbers and percentages of seropositive animals according to age group, aggregated across farms. It should be noted here that these do not represent the individual animal seroprevalence due to the non-random sampling procedure used within farms. As such, no account has been taken of clustering of seropositivity within farms, and confidence intervals for the age specific estimates are not given.

Table 2. Numbers of seropositive animals amongst those sampled, according to age group.

Age group | H1N1 | H1N2 | H3N2 | At least one subtype | Total samples
Weaners | 14 (2%) | 55 (8%) | 2 (0%) | 66 (9%) | 711
Growers | 22 (2%) | 62 (7%) | 0 (0%) | 79 (9%) | 917
Finishers | 34 (4%) | 67 (8%) | 0 (0%) | 92 (11%) | 864
Sows | 49 (19%) | 74 (29%) | 2 (1%) | 97 (38%) | 253

The total number of pigs present on each farm ranged from 47 to 18,000, with a median of 2,501. The number of growers or finishers tested per farm included in the study ranged from five to 23 (median = 12). The percentage of sampled growers or finishers which tested positive amongst those farms containing positive pigs ranged from 4 to 100%, as shown in Table 3.

Table 3 also shows the numbers and proportions of all 143 farms which were classified as seropositive for the three swine influenza A virus subtypes, based on results for pigs of these age groups. A total of 19 farms (13%) tested positive for both H1N1 and H1N2, and no farms were classified as seropositive for the H3N2 subtype of SIV.

Table 3. Farm-level seroprevalence estimates for different subtypes (excluding weaners and sows).

Subtype | Maximum within herd ‘prevalence’* | Median within herd ‘prevalence’* | Farms positive (%) | 95% confidence interval
H1N1 | 0.50 | 0.10 | 30 (21%) | 14 – 28%
H1N2 | 1.00 | 0.11 | 64 (45%) | 37 – 53%
H3N2 | 0.00 | 0.00 | 0 (0%) |
At least one subtype | 1.00 | 0.15 | 75 (52%) | 44 – 61%

* This measure does not relate to the prevalence as such, but rather the proportion of sampled growers and finishers testing positive amongst those farms containing positive pigs
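
The farm-level proportions and confidence intervals in Table 3 can be approximated with a normal-approximation (Wald) interval, as sketched below. The text does not state which interval method was used, so the choice of method here is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist


def wald_interval(positives, total, confidence=0.95):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = positives / total
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * (p * (1.0 - p) / total) ** 0.5
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Farm counts from Table 3, out of 143 eligible farms.
table3_counts = {"H1N1": 30, "H1N2": 64, "At least one subtype": 75}
table3_intervals = {
    name: tuple(round(100 * value) for value in wald_interval(count, 143))
    for name, count in table3_counts.items()
}
```

With these inputs the rounded percentages agree with the intervals reported in Table 3, although an exact binomial interval would give slightly different limits before rounding.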

Univariable analysis identified a total of 39 variables potentially associated with farm swine influenza status (Table 4). The second stage, which involved multivariable analysis of these variables within each of their defined groups, identified a total of 21 variables associated with farm swine influenza status. Of these, the practice of separating boars upon entry to the farm was excluded from further analysis due to the large number of missing values, leaving a total of 18 variables, as shown in Table 4.

Table 4. Variables associated with farm status upon univariable (all variables; p<0.25) and multivariable (variables in bold italic; p<0.10) analysis.

Variable grouping Variables associated in the first two stages of analysis
Pig farm characteristics Total number of pigs on farm
Number of farm sites
Sick pen management Number of sick pens
Sick pens in separate building to healthy pigs
Sick pens continually occupied
Mixing and contact between pigs on farm Use of an all in, all out (AIAO) system
Pigs stay in the same building
Number of litters mixed together
Cross fostering
Growers and finishers mixed together
Stocking density (for weaners and for finishers)
Use of indoor accommodation for pigs
Use of straw yards for pigs
Use of outdoor accommodation for pigs
Ventilation quality
Introduction of new pigs Gilts separated upon entry to the farm
Boars separated upon entry to the farm
Feeding/water By-product fed to pigs
Water location
Pigs per feed space (for weaners and for growers)
Pigs per water space (for growers and for finishers)*
Stress Age of weaning
Number of moves between four and 14 weeks
Duration of rest from light
Tail docking of piglets performed
Castration of piglets performed
Contact with people Number of new farm workers in recent years
Number of other farmers visiting farm per year
Number of official visitors to farm per year
Use of protective clothing by visitors
Requirement for visitors to be pig clean
Contact with other animals Presence of poultry on farm
Presence of cattle, sheep or horses on farm
Farmer knowledge Years of stockman experience with pigs
Stockman participation in pig events
Date of sampling Date of farm visit

* A cut-off of 18 pigs per water space was the only category of this variable significantly associated with farm status

The final logistic regression model is shown in Table 5. This identified an increased likelihood of farm seropositivity for farms sampled in autumn, winter or spring months, for farms with more than 18 pigs per water space, and for farms rearing pigs indoors. Decreased likelihood of positivity was found for farms using straw yards (Table 5). The model fit, as identified using the Pearson chi-square test with 10 degrees of freedom, was 0.94. No evidence was found of any interaction between variables in the final model.

Table 5. Associations with farm seropositivity, as identified in the final multivariable model.

Variable | Odds ratio | 95% confidence interval | p-value
Pig access to water
18 finishers or less per water space | 1.00 | |
More than 18 finishers per water space | 5.22 | 1.57 – 17.43 | 0.01
Season of sampling
Pigs sampled in the summer months (July-September) | 1.00 | |
Pigs sampled at other times of the year | 2.54 | 1.09 – 5.95 | 0.03
Housing type
No pigs kept indoors | 1.00 | |
At least some pigs kept indoors | 3.59 | 1.11 – 11.57 | 0.03
No pigs kept in straw yards | 1.00 | |
At least some pigs kept in straw yards | 0.30 | 0.11 – 0.82 | 0.02
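
As a consistency check on Table 5, the relationship between an odds ratio, its confidence interval and a Wald p-value can be sketched as follows. This assumes that the intervals are symmetric on the log-odds scale, which is the usual output of a logistic regression; the text does not say whether the reported p-values are Wald or likelihood ratio values, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_p_from_ci(odds_ratio, lower, upper, confidence=0.95):
    """Two-sided Wald p-value recovered from an odds ratio and its CI."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Standard error of the log odds ratio, from the width of the interval.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Rows of Table 5: (odds ratio, lower limit, upper limit).
table5_rows = {
    "More than 18 finishers per water space": (5.22, 1.57, 17.43),
    "Sampled outside the summer months": (2.54, 1.09, 5.95),
    "At least some pigs kept indoors": (3.59, 1.11, 11.57),
    "At least some pigs kept in straw yards": (0.30, 0.11, 0.82),
}
table5_p = {name: round(wald_p_from_ci(*row), 2) for name, row in table5_rows.items()}
```

An odds ratio below 1, as for straw yards, gives a negative log odds ratio, so the absolute value of z is used for the two-sided p-value.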


This study has, for the first time in recent years, provided insight into the seroepidemiology of swine influenza in the UK using a large sample set. Over half (52%) of the farms had evidence of ongoing virus circulation or recent virus introduction, with seropositivity in growing pigs to the H1N2 subtype most commonly identified (45% of all farms). No farms showed evidence of antibody to H3N2 in young pigs, which is consistent with passive surveillance, which has not identified this strain in the UK since 1998. There was strong evidence that farms visited in the summer months had a lower likelihood of seropositivity than those visited at other times. Regarding pig management characteristics, farms with large numbers of finishers per water space had a higher likelihood of positivity than those with fewer pigs per water space; farms containing pigs kept indoors had a higher likelihood of seropositivity than those which did not; and farms containing pigs in straw yards had a lower likelihood of seropositivity than those which did not.

A major limitation of most observational studies is selection bias. In the current study, only farrow-to-finish farms were included, which makes extrapolation to other farm types problematic, particularly as farrow-to-finish farms have features more likely to allow persistence of influenza virus than farms with single age groups or removal of growing pigs at weaning. It should be noted however that there was considerable variation in the management practices employed on different farms, and even between different sites on individual farms, meaning that other management practices were represented. As participating farms were predominantly self-selected through a vaccination scheme for PMWS, an overrepresentation of farms with a tendency towards overall poor health cannot be excluded, which may have led to an overestimation of the true country-wide seroprevalence. Efforts were made to include high-health farms without PMWS problems (approximately 10% of farms).

An additional potential issue is that of misclassification of farm swine influenza status. The current case definition only related to infection of growers and finishers, in an attempt to identify farms with recent influenza virus circulation without resorting to more expensive virological methods. It was considered that sow and weaner serology results could not distinguish recent from historic infection, as antibodies against SIVs have been reported to persist up to 28 months in sows after removal of virus [28] and maternal derived immunity in piglets can persist up to 4 months of age [29], although an average half life of 12 days has been estimated [30]. This method would not identify farms with recent infections amongst sows (and/or weaners) only, and could result in underestimation of the true proportion of farms with virus circulation. This would be expected to be counteracted to some degree through the aggregate testing approach adopted (requiring only one seropositive animal in order to classify a herd as positive). Using the formula given in Dohoo, Martin et al. [31] (and assuming a population of infinite size), with 95% confidence, a sample of 12 pigs per farm would be expected to detect at least one infected animal assuming a prevalence of 22% or more, and a sample of five pigs per farm (the lowest number of animals tested amongst those farms included in the study) would be expected to detect at least one infected animal if the prevalence was 50% or more. Due to the generally high transmissibility of influenza viruses, it is expected that a large proportion of animals would have evidence of exposure following virus entry to a herd, provided that antibodies persist for a suitable length of time. Despite this, further investigation found weak evidence that farms testing more animals were more likely to test positive (Mann Whitney U test p=0.08).
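
The detection calculation referred to above can be sketched under the stated assumption of an infinite population, in which the probability of finding at least one positive animal among n sampled is 1 - (1 - p)^n. The function names are hypothetical, and the sketch further assumes a perfect test.

```python
def detection_probability(n, prevalence):
    """Probability that at least one of n sampled animals is positive."""
    return 1.0 - (1.0 - prevalence) ** n


def min_detectable_prevalence(n, confidence=0.95):
    """Smallest within-herd prevalence detected with the given confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


prevalence_for_12 = min_detectable_prevalence(12)  # median sample size
prevalence_for_5 = min_detectable_prevalence(5)    # smallest sample size
```

For 12 pigs this gives a minimum detectable prevalence of about 22%, as quoted in the text. For five pigs the same expression gives about 45%, so the figure of 50% quoted in the text is slightly conservative; at a prevalence of 50%, five pigs would detect infection with a probability of about 97%.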

The strains of virus used for serology in the current investigation are known to be representative of contemporary SIVs in England, thus the HI tests used are expected to detect recent infection with any of these strains. However, the possibility of infection with other, less reactive, virus strains cannot be excluded.

Antibodies to the H1N2 strain were most commonly identified in the current study, in contrast to the findings of the passive surveillance programme in the UK, which has regularly detected avian-like H1N1 virus more frequently than H1N2 until 2009 (Ian Brown, personal communication). It is difficult to compare these figures due to differences in the respective study populations. However, differences in the severity of clinical signs between the strains may result in selection bias in passive surveillance. In a field setting, variation in pathogenicity is likely to occur due to a number of factors, including virus strain. Both avian-like swine H1N1 and H1N2 have been described as being more clinically severe than classical H1N1 or human-like swine H3N2 [21] [32] [33] [34], however there are no published scientific reports directly comparing the pathogenicity of the two subtypes.

The current study is an investigation of risk factors for recent exposure of rearing pigs in pig herds to influenza viruses. This could result from either continued circulation of the virus within the herd or from recent virus introduction into the herd. Risk factors for these two events may differ, and therefore by combining both in the outcome, information regarding specific associations will be lost. However, the lack of earlier sera from the farms did not allow an evaluation of the historical swine influenza status. In the case of farms with longstanding swine influenza problems, information bias regarding farm management practices may be present, resulting from adoption of disease management processes since initial virus incursion in an attempt to control disease. No distinction was made in this study between seropositivity to H1N1 or H1N2 virus strains, as has been performed in a number of previous studies [2] [17] [19]. Although it is possible that differences exist regarding the epidemiology of the different viral subtypes, given the complex interactions between different respiratory pathogens within farms, the possible cross protective effects of different strains of influenza viruses [34] [35], and the fact that control measures for SI do not differ according to the virus strain, it was considered reasonable to assess status based on serology to both strains combined.

Manual techniques were adopted in order to reduce the number of variables prior to formal data analysis. Although this approach was based on an understanding of English pig herds and the dynamics of disease within these, it is subjective in nature, and therefore could result in bias. Other statistical techniques which could have been used for the identification of different pig herd types include scoring systems [36], multiple correspondence analysis [37], multiple factor analysis and hierarchical cluster analysis [38] [39], and multiblock redundancy analysis. These approaches were not used here as the presence of missing data for some of the variables would have led to the exclusion of a substantial number of farms.

The finding that farms visited during the summer months had a lower likelihood of positivity is consistent with the findings of others. Seasonality has been recognised as a feature of clinical swine influenza, with an increase in reported outbreaks of disease during the colder and less sunny months of the year [40] [41] and increased transmission of influenza A viruses in cold conditions [42]. Additionally, natural ventilation may be reduced in indoor herds in colder conditions, in an attempt to maintain an ambient temperature for the pigs, which may encourage virus circulation. Although it has also been shown that the virus itself can persist within herds throughout the year following introduction [43], due to the constant availability of susceptible pigs [26] [44], increased virus survival and transmission during colder months may result in a greater within-herd prevalence, and therefore a greater likelihood of detection during this time.

Pig access to water was strongly associated with swine influenza status, with farms with large numbers of finisher pigs in relation to water sources having a higher likelihood of positivity. This finding may relate to both pig-pig contact and resultant social stress factors. A large number of pigs per water source would be expected to increase both direct and indirect contact between pigs, which may aid transmission of influenza viruses. Reduced access to water may also increase stress amongst the animals, which has recognised immunosuppressive effects and may therefore also encourage persistence of virus within the herd. The finding of this association amongst finishers in particular may be a result of the criteria used for classifying a farm as positive for virus circulation. The possibility that this association results from confounding by other variables cannot be excluded. However, plausible confounding variables for this association such as stocking density were included in the final model and not found to be associated with herd status, which points away from a confounding effect.

The finding that farms rearing pigs in indoor pen-based systems have an increased likelihood of positivity may relate to higher stocking densities and lower levels of ventilation, which are recognised risk factors for transmission of respiratory pathogens between pigs [23] [45], although a variety of pen-based systems were included in the definition here (combining those with and without individual kennels). Confined animal feeding operations (‘factory farms’) have been proposed to be involved in the amplification of influenza viruses in a recent report using mathematical modelling techniques [46], although no comparison was made in this report with other farming systems. Increased stocking density was found to be strongly associated with keeping pigs indoors (chi-square p<0.001) in the current study, with 100% (n=22) of herds with a ‘high’ stocking density amongst finisher pigs containing animals kept indoors (27% of these herds also contained pigs in straw yards and 9% contained pigs kept outdoors). Although stocking density itself was not found to be associated with the circulation of influenza viruses in the final model, a study in Belgium has found that the number of pigs per pen was positively associated with swine influenza H3N2 seropositivity [17]. Finally, it should also be noted that there was a strong inverse association between the use of indoor pen-based systems and outdoor systems, with only 14% (n=16) of farms rearing pigs indoors also keeping pigs outdoors (chi-square p<0.001). As such, it is plausible that there is collinearity between these variables, and the possibility of an additional protective effect of keeping pigs outdoors cannot be excluded.

The protective effect of keeping pigs in straw yards has not been found in other studies. Further investigation of this variable found that only 24% (n=10) of these farms used straw yards for all three age groups, which demonstrates the variability in management practices amongst those farms classified as ‘straw yards’ (conversely, 62% (n=68) of farms keeping pigs indoors kept all three age groups indoors). This ‘mixing’ of exposure makes it difficult to attribute a direct effect of straw yards. Pigs kept in straw yards would be expected to receive some degree of shelter from inclement climatic conditions, whilst still being kept at relatively low stocking densities. This finding warrants further investigation.

Introduction of virus onto a farm is most commonly associated with the introduction of an infected animal [19] [21] [23] [26]. However, no evidence was found of an association between recent movement of animals into the herd and farm status. As well as limitations in the study regarding misclassification, as described above, this lack of association may be due to many farms experiencing persistent infection rather than recent introduction of infection through infected animals, thus masking a potential association. This would also explain the lack of observed association with the frequency of movement of non-farm workers onto farms and into pig buildings, which was postulated to be a risk factor for virus entry a priori, and has been observed to be a risk factor for influenza virus presence in Malaysian pig farms [19].

No association was observed between herd size and/or pig density in the region and farm status in the final model, which were identified as risk factors for infection in a number of other studies [16] [17] [18] [19]. Plausible reasons for an association between herd size and pig disease have been detailed in a recent review [47], and include biological, managemental and diagnostic mechanisms, whereas little is currently known regarding the spatial pattern of influenza virus spread between herds in a geographical area.


This study is one of the first in recent years to estimate the seroprevalence of SIVs amongst English pig herds, and is the first to identify risk factors for the presence of SIVs in these herds. Further studies are required to confirm the findings. The potential human health risks associated with SIVs are well recognised [11], as evidenced by the recent emergence of pandemic H1N1 (2009) virus in humans, which was found to contain a genetic component from swine adapted viruses. Although serological data is useful for the investigation of SIVs in circulation in pig populations, virological surveillance is required in order to monitor virus evolution in this species. The passive virological surveillance system for SIVs currently in place in the UK is likely to be subject to considerable selection bias as it depends on sampling diseased pigs. Consideration should therefore be given to the implementation of structured serological surveys to complement this surveillance stream, possibly targeting surveillance to high-risk farms based on risk factors identified using analytic studies such as the current one. In order to facilitate these investigations, further work is required in order to accurately classify English pig herds with respect to farm practices.

Competing interests

The authors have declared that no competing interests exist.

Projection of seasonal influenza severity from sequence and serological data Mon, 06 Dec 2010 08:13:43 +0000


Kilbourne has described influenza as “…an unvarying disease (three-day fever) – caused by a varying virus” 1. In 1934, Torrens noted an apparent paradox: although influenza survivors of the epidemic of 1890 largely seemed to escape the 1918 epidemic, annual influenza attacks were common. He proposed that influenza was caused by a “pleomorphic virus” that “… may exhibit mutation from one type to another…” to explain this puzzling pattern of post-influenzal immunity 2. The challenge this would represent for vaccination became clear by 1947 when a vaccine that was effective in the 1943-1944 and 1945-1946 seasons had no effect for the 1946-1947 influenza season 3. Salk and Suriano, in a study of the 1946-1947 strain, raised the possibility that “… this strain is a ‘mutation’ and that this might mean that we will always be immunizing against a disease that occurred the year before” although they remained hopeful that there would be a limited number of antigenic varieties 4.

The variation of influenza by mutation, i.e., antigenic drift, was soon recognized as the basis for the difficulty of predicting epidemics 5, and better understanding the quantitative impact of drift on seasonal epidemics has become an important goal in influenza research. Given that varying levels of cross-protection by earlier viruses are likely to be an important factor in epidemic severity, one experimental strategy has been to immunize subjects with known strains and determine susceptibility to a challenge with another known strain. McLaren et al. 6 vaccinated ferrets with A/Hong Kong/1/68 or A/England/42/72, later challenged them with an A/England/42/72-like virus, and indeed found greater protection by the matching strain. A similar, more extensive study has been performed on horses using equine influenza virus in order to estimate the parameters of an epidemiological model of outbreak risk as a function of immune escape 7. In a study with human volunteers immunized with one of four influenza strains isolated in 1968, 1972, 1973, or 1974, and then challenged with a 1974 strain, Potter et al. 8 found that susceptibility increased linearly with the number of years separating the challenge and vaccine strains. Consistent results were obtained in a similar study by Larson et al. 9.

Field studies of vaccine efficacy have confirmed the results of these experimental approaches (cf. 10). As expected, efficacy is substantially higher when the vaccine strain matches the dominant seasonal strain than when there is a measurable mismatch between the vaccine strain and the dominant strain 11. Gupta et al. 12 compiled an extensive set of vaccine efficacy data and found a strong correlation (R2 = 0.81) between vaccine efficacy in a given season and the fraction of amino acid replacements in the antigenic regions of the hemagglutinin protein between the vaccine strain and the dominant circulating strain. Evidence of a more direct quantitative connection between the antigenic novelty of the viruses from a given season and susceptibility to influenza can be found in a prospective study of the 1976 seasonal epidemic by Gill and Murphy 13. The researchers determined the most recent year of laboratory confirmed influenza for a set of volunteers and found a strong linear relationship between the probability of infection in the 1976 season and the number of years since the previous infection. The above studies suggest that the number of susceptible individuals for a given season, and hence the number of influenza cases, will be strongly correlated with measures of the antigenic distance between prevailing viruses for that season and viruses that circulated in previous seasons.

In light of the compelling evidence for a quantitative relationship between antigenic drift and seasonal severity, it is perhaps surprising that until recently no study demonstrated a positive correlation between drift and severity using data derived directly from influenza surveillance. A recent study by Wu et al. 14 focuses on influenza H1N1 and considers major “antigenic strains” based on hemagglutination inhibition (HI) assays — a method traditionally used by the Centers for Disease Control (CDC) to characterize seasonal viruses for vaccine selection and surveillance purposes. Wu et al. then estimate the excess mortality attributable to each antigenic strain using the fraction of isolates positive for that strain among all those tested in seasons in which the strain was detected. This analysis has shown a strong correlation between estimates of the multi-season excess mortality due to an antigenic strain and its antigenic distance from previous strains for H1N1, and a weaker correlation for H3N2. Given that in recent years, most of the inter-pandemic influenza morbidity was caused by H3N2 viruses 15, we were interested in investigating the relationships between sequence and antigenic variation in the H3N2 HA and the seasonal severity (health burden) of influenza caused by this subtype. In this analysis, we aimed at reaching beyond correlations and developing a practical model for projecting the health burden of influenza H3N2.

Extending previous work on the relationship of antigenic drift and seasonal severity, we here use sequences of hemagglutinin proteins from H3N2 clinical isolates and the corresponding HI data from two countries to demonstrate that one can reconstruct with surprising accuracy the severity of a given season. That is, given only data on the antigenic novelty of the viruses from a season we can predict its epidemiological severity. Furthermore, our statistical model provides a framework for making projections of the severity of the upcoming season using assumptions based on viral isolates collected in the current season. These results are expected to be helpful in the planning process for the control of seasonal influenza and have implications for influenza surveillance.


Epidemiological indices

The seasonal impact of influenza on a population is notably difficult to measure precisely due to the non-specificity of symptoms and the lack of routinely conducted diagnostic tests, and morbidity data are especially scarce. To overcome this problem, we used several epidemiological indicators and validated our predictive algorithm by considering two separate countries: the US and Hong Kong. In the US analysis, we used seven independently-collected epidemiological indicators, each representing a slightly different aspect of influenza A/H3 seasonal disease burden. These indicators were available for 16 consecutive seasons from 1993-1994 to 2008-2009 and comprised:

  1. a seasonal indicator of the dominance of influenza A/H3 over the A/H1 subtype for each season (F-H3);
  2. the seasonal fraction of positive influenza specimens over all respiratory samples tested (F-Pos);
  3. a proxy for the seasonal number of H3 cases, H3 severity (S-H3);
  4. influenza-related seasonal excess mortality rate (R-Mo);
  5. influenza-related seasonal excess hospitalization rate (R-Ho);
  6. seasonal percent of Influenza-Like Illnesses (F-ILI);
  7. index of speed of spread (I-Sp).

Percent of Influenza-Like Illnesses (F-ILI) was calculated as the percentage of consultations for influenza-like illness out of the total number of consultations reported to the surveillance system for a given season. Fraction of positive specimens (F-Pos) was calculated as the percentage of specimens positive for influenza virus among all specimens collected during the season (data source: CDC 16). An indicator of the dominance of influenza A/H3 over A/H1 (F-H3) was calculated as the logarithm of the ratio of H1 to H3 isolates (data source: CDC 16).

H3 epidemic severity (S-H3) was calculated as the product of total influenza epidemic severity and the fraction of H3 isolates among all influenza isolates 16. Total epidemic severity was calculated using two sources of data because no available data source covers the entire period of 1993-2008. For the period of 1997/98-2008/09, total epidemic severity was calculated as F-ILI × F-Pos. For the period of 1993/94-1996/97, we first calculated seasonal excess mortality impact (R-Mo, excess death rate from pneumonia and influenza per 100,000) as in 17. Excess mortality is a traditional indicator of influenza disease burden and is estimated from national vital statistics as mortality in excess of an expected seasonal baseline representing the level of mortality in the absence of influenza 17. Then, the total severity was estimated as a normalized mortality impact, based on the correlation between mortality impact and the ILI-based severity measure described above, over the years where both were available (1997/98-2005/06), using the geometric mean of the ratio of ILI-based severity to mortality impact for 1997/98-2005/06.
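The severity calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names `h3_severity` and `severity_from_mortality` are hypothetical, and any input values are placeholders rather than surveillance data.

```python
from statistics import geometric_mean


def h3_severity(f_ili, f_pos, h3_isolates, all_isolates):
    """S-H3: total severity (F-ILI x F-Pos) prorated by the H3 isolate fraction."""
    if all_isolates <= 0:
        raise ValueError("all_isolates must be positive")
    return f_ili * f_pos * (h3_isolates / all_isolates)


def severity_from_mortality(r_mo, ili_severity_overlap, r_mo_overlap):
    """Rescale an excess mortality rate to the ILI-based severity scale.

    The scale factor is the geometric mean of the ratio
    (ILI-based severity / mortality impact) over the seasons
    for which both measures are available.
    """
    ratios = [s / m for s, m in zip(ili_severity_overlap, r_mo_overlap)]
    return geometric_mean(ratios) * r_mo
```

For seasons before 1997/98, `severity_from_mortality` would supply the total severity that `h3_severity` otherwise obtains from F-ILI and F-Pos.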

The index of speed of spread (I-Sp) was calculated as the standard deviation of the dates of peak pneumonia and influenza mortality rates across 49 continental states (48 states + DC) 18. This index has been shown to vary with the intensity of influenza epidemics, with larger epidemics spreading faster across the continental US and resulting in a lower index of spread (lower standard deviation).
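As a rough illustration of the index, the following Python sketch computes the standard deviation of peak dates. The source does not state the time unit or whether a sample or population standard deviation was used, so days and the sample standard deviation are assumptions here; `speed_of_spread_index` is a hypothetical name.

```python
from datetime import date
from statistics import stdev


def speed_of_spread_index(peak_dates):
    """Sample standard deviation, in days, of peak dates across states.

    A lower value means the peaks were more synchronous, i.e. faster spread.
    """
    return stdev([d.toordinal() for d in peak_dates])
```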

Hospitalization excess rate (R-Ho) was derived from the State Inpatient databases maintained by the Agency for Healthcare Research and Quality. We chose 9 states which contributed data for 1989-2008 and represented 30% of the US population, and compiled the weekly number of admissions with any mention of pneumonia and influenza in the list of diagnoses. We applied the same method used to estimate excess mortality to hospitalization data and derived estimates of seasonal excess hospitalization rates attributable to influenza in the US.

Analysis of epidemiological data

Epidemiological data were analyzed using the following protocol. First, visual inspection of scatterplots was performed to ensure the absence of gross deviations from linear relationships between pairs of variables. All data vectors were standardized to an average of 0 and a standard deviation of 1; missing data points were assigned the value of 0. Principal Component Analysis (PCA) was performed without further scaling, so variables with more missing data contributed less to the principal components due to reduced variance.

Most of the seven indicators are highly correlated with each other (e.g. Spearman rank correlation rs[S-H3, F-H3] = -0.91 and rs[R-Mo, R-Ho] = 0.71). We used PCA to transform these correlated variables to a space of uncorrelated variables. The first principal component (PC1) accounted for 66% of the total variance; all original indicators are correlated with PC1, with |rs| ranging from 0.57 to 0.94. This variable was used as a single combined epidemiological index (Figure 1) for subsequent analyses with genetic and antigenic information.
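The preprocessing and PCA steps can be illustrated with a short pure-Python sketch. It assumes missing values are encoded as `None` and that the sample standard deviation is used for scaling (the source does not say which); the first principal component is found by power iteration on the covariance matrix. The function names are hypothetical and this is not the authors' implementation.

```python
import math


def standardize(column):
    """Scale observed values to mean 0 and SD 1; missing (None) becomes 0."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    sd = math.sqrt(sum((v - mean) ** 2 for v in observed) / (len(observed) - 1))
    return [0.0 if v is None else (v - mean) / sd for v in column]


def first_principal_component(columns, iterations=500):
    """Return (loadings, scores, explained fraction) for PC1."""
    z = [standardize(c) for c in columns]
    p, n = len(z), len(z[0])
    # Zero-filled columns still have mean 0, so no re-centring is needed;
    # their variance is below 1, which reduces their weight in the PCA.
    cov = [[sum(z[i][k] * z[j][k] for k in range(n)) / (n - 1)
            for j in range(p)] for i in range(p)]
    w = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    eigenvalue = sum(w[i] * sum(cov[i][j] * w[j] for j in range(p))
                     for i in range(p))
    total = sum(cov[i][i] for i in range(p))
    scores = [sum(w[i] * z[i][k] for i in range(p)) for k in range(n)]
    return w, scores, eigenvalue / total
```

The sign of the loadings and scores is arbitrary, as with any PCA, so only the relative ordering of seasons along PC1 is meaningful.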

Fig. 1: First Principal Component (epi.PC1) of seven measures of epidemiological severity

Genetic distances

HA1 sequences of the H3N2 influenza A virus available from the NCBI Influenza Virus Resource 19 as of June 2009 were used for analysis (see supplementary files 20, 21, 22). All coding region (CDS) sequences of human H3N2 isolates (excluding short fragments) were downloaded and aligned using the MUSCLE multiple alignment program 23 with subsequent manual editing of the alignment. Aligned HA1 sequences of H3N2 influenza isolates from the US were sorted by seasons (from 1992-1993 to 2008-2009). The Northern Hemisphere influenza season was considered to start in August and end in July of the next calendar year. The distance between two seasons was taken to be the mean number of differences between sequences from one season and those of the other season. Such distances were calculated for all pairs of seasons separated by three years or less. This calculation included the comparison of every season to itself, which yields a measure of the within-season diversity. Overall amino acid distances (sa), amino acid distances in (se) and outside (sn) of the commonly used set of presumed antigenic epitope sites 24, 25, and nucleotide synonymous (ss) distances were computed. For each given season (the n-th), seven intra- and inter-season distances were calculated (Figure 2).

Fig. 2: Computation of inter-season genetic and serological distances

In combination with the above four types of distances, 28 relative season distance variables were available for each season (e.g. for the 2005-2006 influenza season se.n1.n1 refers to the average epitope sequence differences in the season of 2004-2005 and ss.n.n2 is the average synonymous distance between all pairs of isolates from 2005-2006 and 2003-2004 seasons).
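A minimal Python sketch of the inter-season distance follows. It covers only the amino acid distances (sa, se, sn), not the synonymous nucleotide distance, and it assumes aligned sequences of equal length, gaps treated as ordinary characters, and every pair counted, including a sequence paired with itself in the within-season case. None of these details are specified in the text, and the function names and the `sites` parameter are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def season_distance(seqs_a, seqs_b, sites=None):
    """Mean number of differences over all pairs, one sequence per season.

    `sites` optionally restricts the comparison to a set of alignment
    positions, e.g. epitope sites (se) or their complement (sn).
    """
    def restrict(seq):
        return seq if sites is None else "".join(seq[i] for i in sites)

    pairs = list(product(seqs_a, seqs_b))
    return sum(hamming(restrict(a), restrict(b)) for a, b in pairs) / len(pairs)
```

Passing the same list for both arguments gives a within-season diversity such as se.n1.n1; passing the isolates of two different seasons gives an inter-season distance such as sa.n.n2.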

Serological distances

We compiled hemagglutination inhibition data from a variety of literature sources, including published articles, MMWR summaries, and FDA reports (see supplementary file for references 26). A two-dimensional antigenic map was calculated from hemagglutination inhibition data according to 27. From this, antigenic distances were calculated for all pairs of isolates. Mean intra- and inter-season pairwise distances were then calculated for North America (aa) and Northern Hemisphere (an) isolates, corresponding to all of the types of distances calculated for sequence data (Figure 2) except that the n.n3 distances were omitted. In combination with the above two sources of the isolates, 12 relative season distance variables were available for each season.
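Once each isolate has coordinates on the two-dimensional antigenic map, the mean pairwise distance between seasons can be sketched as below. Euclidean distance on the map is assumed; the construction of the map itself (reference 27) is not reproduced, and `mean_antigenic_distance` is a hypothetical name.

```python
import math
from itertools import product


def mean_antigenic_distance(coords_a, coords_b):
    """Mean Euclidean map distance over all pairs of isolates.

    Each argument is a list of (x, y) map coordinates for one season.
    """
    pairs = list(product(coords_a, coords_b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)
```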

Model selection

Overall, we had 40 explanatory variables, comprising 7*4=28 genetic variables and 6*2=12 serological distance variables, and only 16 epidemiological observations. Due to the potential for severe overfitting, the standard approach of stepwise reduction of a model containing all variables was not feasible. Instead we adopted a two-stage approach to find the optimal balance between the goodness of fit and the number of degrees of freedom in the model. At the first stage we used a transformation of the original distance variables to determine which of them contributed most to the prediction of the epidemiological variable; at the second stage we applied this information to select the most statistically robust predictors using the original variables.

Transformation of distance variables

For each relative season combination, the data from different variables (sa, se, sn and ss for genetic and aa and an for serological data) are strongly correlated (e.g. rs[sa.n.n1, ss.n.n1] = 0.74). Thus the 28 genetic and 12 serological distance variables were subjected to PCA (Figure 3). The first 2 principal components were retained for each data series, reducing the number of potential explanatory variables from 40 to 26 (7×2 genetic and 6×2 serological components).

Fig. 3: Principal Component Analysis (PCA) of genetic and serological distance variables

Retrospective model: first stage

We tested linear models containing all combinations of 3 (2600 models), 4 (14950 models), 5 (65780 models) and 6 (230230 models) variables. Model fitting was performed on the R platform using the lm() function. Within each class the models were ranked by decreasing residual sum of squares. Additionally, 19 variables most frequently present among the top 50 6-variable models were used to generate all combinations of 7 (50388 models) and 8 (75582 models) variables; 7- and 8-variable models were ranked in the same manner.
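The exhaustive search over variable subsets can be sketched in pure Python as follows. The authors used R's lm(); this stand-in solves the normal equations for a no-intercept model by Gaussian elimination and will fail on exactly collinear predictors. The names `solve`, `fit_rss` and `rank_subsets` are hypothetical, and the ranking here places the smallest residual sum of squares first.

```python
import math
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_rss(columns, y):
    """Least-squares fit without intercept; return (coefficients, RSS)."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * columns[i][t] for i in range(k)) for t in range(len(y))]
    return beta, sum((o - f) ** 2 for o, f in zip(y, fitted))


def rank_subsets(predictors, y, size):
    """Fit every `size`-variable model; return (RSS, names), best fit first."""
    results = []
    for names in combinations(sorted(predictors), size):
        _, rss = fit_rss([predictors[v] for v in names], y)
        results.append((rss, names))
    return sorted(results)


# The model counts quoted in the text are binomial coefficients of 26 variables.
model_counts = {k: math.comb(26, k) for k in (3, 4, 5, 6)}
```

Here `predictors` maps a variable name to its list of per-season values, and `model_counts` reproduces the 2600, 14950, 65780 and 230230 models mentioned above.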

To test the models statistically, we used two variants of a jackknife resampling procedure 28. The first variant tests the statistical robustness of the relationships between the explanatory variables and the target variable in general. The second tests the statistical robustness of a model using a particular set of explanatory variables.

In the first variant we removed data for each single season from our dataset one-by-one and used the remaining 15 seasons to find the best-fitting model (both the set of the explanatory variables and coefficients). Then we used this model to predict the epidemiology for the remaining season (leave-one-out jackknife scheme). The sum of squares of the deviations of these predictions from the observations was accumulated over all 16 seasons to serve as the robustness indicator.

In the second variant we generated all possible bipartitions of the 16 epidemiological observations into two classes of equal size. With each bipartition, one half of the seasons data was used to train the model using the fixed set of the explanatory variables; then these model coefficients were used to predict the epidemiology for the remaining 8 seasons (leave-half-out jackknife scheme). The residual sums of squares for the control half of the data were accumulated over all bipartitions to serve as the indicator of the model robustness. We tested the top 10 models from each class and ranked them by decreasing total residual sum of squares.
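Both jackknife variants can be sketched as follows. This is a simplified illustration, not the authors' R code: models have no intercept, the solver assumes non-collinear predictors, and all function names are hypothetical. In the leave-half-out loop each unordered bipartition is visited twice, so each half serves once as the training set.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(rows, names, predictors, y):
    """No-intercept least squares using only the seasons listed in `rows`."""
    cols = [[predictors[v][i] for i in rows] for v in names]
    target = [y[i] for i in rows]
    k = len(cols)
    xtx = [[sum(p * q for p, q in zip(cols[r], cols[c])) for c in range(k)]
           for r in range(k)]
    xty = [sum(p * q for p, q in zip(cols[r], target)) for r in range(k)]
    return solve(xtx, xty)


def squared_error(rows, names, beta, predictors, y):
    """Sum of squared prediction errors over the seasons in `rows`."""
    return sum((y[i] - sum(b * predictors[v][i] for b, v in zip(beta, names))) ** 2
               for i in rows)


def leave_one_out(predictors, y, size):
    """Variant 1: re-select variables and coefficients without each season."""
    n = len(y)
    total = 0.0
    for held_out in range(n):
        train = [i for i in range(n) if i != held_out]
        best = None
        for names in combinations(sorted(predictors), size):
            beta = fit(train, names, predictors, y)
            rss = squared_error(train, names, beta, predictors, y)
            if best is None or rss < best[0]:
                best = (rss, names, beta)
        total += squared_error([held_out], best[1], best[2], predictors, y)
    return total


def leave_half_out(predictors, y, names):
    """Variant 2: fixed variables; train on one half, predict the other."""
    n = len(y)
    total, splits = 0.0, 0
    for train in combinations(range(n), n // 2):
        test = [i for i in range(n) if i not in train]
        beta = fit(train, names, predictors, y)
        total += squared_error(test, names, beta, predictors, y)
        splits += 1
    return total, splits
```

With 16 seasons, `leave_half_out` iterates over 12870 training halves, which is why the procedure is applied only to a short list of top-ranked models.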

In the leave-one-out jackknife test the 5-variable model family demonstrates highly robust behavior, accounting for 0.96 of the variance in epi.PC1. The following 5-variable model demonstrated the best performance among all tested models:

epi.PC1’ ~ n.n2.PC1 + n.n2.PC2 + n1.n1.PC1 + a.n1.n1.PC1 + a.n1.n1.PC2 + 0

Hence this model includes genetic distance between the current season and 2 seasons before, and intra-season genetic and serological distances between isolates circulating in the previous season in the US (genetic) or North America and Northern Hemisphere (serological). When trained on the full complement of 16 seasons (Figure 4), this model explains 0.98 of the original variance (F-statistic p-value of 3×10^-9), the coefficients being -1.36, -1.54, -0.32, -1.28 and -1.69 respectively (PCA gives arbitrary signs to the principal components, so the signs of coefficients do not indicate the kind of relationship by themselves). In the leave-half-out jackknife test it explains, on average, 0.89 of the variance in epi.PC1.

Fig. 4: Comparison of first stage retrospective model (retro.1) with observed epidemiological severity (epi.PC1)

To further ascertain the statistical validity of the relationship between the genetic and serological data and epidemiological severity we implemented an additional test procedure. The epidemiological data vector (epi.PC1) was permuted and all combinations of 5 out of 26 explanatory variables were tested as potential reconstruction models for the scrambled epi.PC1 data. Only 36 out of 100,000 permutations yielded a fit as good as that obtained with the unpermuted data.
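The logic of this permutation test can be illustrated with a deliberately reduced Python sketch. To stay short, it searches over single-variable no-intercept models, for which the residual sum of squares has a closed form, whereas the study searched all 5-variable combinations of 26 predictors. The function names and the fixed random seed are assumptions made for the illustration.

```python
import random


def rss_through_origin(x, y):
    """RSS of the one-variable no-intercept fit y ~ x + 0 (closed form)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sum(b * b for b in y) - sxy * sxy / sxx


def best_rss(predictors, y):
    """Smallest RSS over all candidate models (here: single predictors)."""
    return min(rss_through_origin(x, y) for x in predictors.values())


def permutation_fraction(predictors, y, n_perm=1000, seed=0):
    """Fraction of shuffled responses fitted at least as well as the real one."""
    rng = random.Random(seed)
    observed = best_rss(predictors, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # The model search is repeated for every permutation, so the test
        # accounts for the freedom to pick the best-fitting variables.
        if best_rss(predictors, shuffled) <= observed:
            hits += 1
    return hits / n_perm
```

In the study the corresponding fraction was 36 out of 100,000, i.e. 0.00036.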

The above shows that the sequence and antigenic data explain part of the variance in morbidity among seasons. We can reject the hypothesis that the true R2 equals zero with high statistical confidence, and we estimate that the true R2 is 0.96. In order to obtain a statistical lower bound on the true R2, we generated epidemiological severity values that mimicked various levels of unexplained variance, and determined the lowest R2 value that could not be rejected at the 5% level by the R2 observed for the real data (i.e., the lowest value for which the R2 of the fit was as good as the observed 98% in at least 2.5% of the replicates). To the epidemiological values predicted by the fitted model, which can be fit with R2 = 1, we added independent pseudo-random numbers. Two distributions for these were used: a normal distribution and a Laplace (double exponential) distribution. The Laplace distribution has an excess kurtosis of 3.0, and is used because the observed residuals have excess kurtosis (approximately 2.5). For each trial quantity of added noise, many replicates were generated, and each was independently fit to the sequence and antigenic data, with no constraint on the choice of combinations of explanatory variables.

The 95%-confidence lower bounds for R2 thus obtained were 0.88 with normally-distributed deviations and 0.84 with Laplace-distributed deviations. Upper bounds, obtained similarly, were both close to 0.99. We conclude that the fitted model likely accounts for the vast majority of the season-to-season variance in epidemic severity.

Retrospective model: second stage

In the first stage of the analysis, we found that genetic distances between the current season and a season two years ago (n.n2) and the genetic and antigenic diversity one season ago (n1.n1) yield the most statistically robust retrospective reconstruction of H3N2 influenza epidemiology. In the second stage, we apply a similar approach to prediction of influenza epidemiology in the next season.

In the second stage, we apply the standard stepwise model reduction approach to find the optimal set of original variables. We started with a model containing 8 variables based on the best-fit model identified in the first stage (se.n.n2, sn.n.n2, ss.n.n2, se.n1.n1, sn.n1.n1, ss.n1.n1, aa.n1.n1 and an.n1.n1) and removed one by one the variables that contribute the least to the model (i.e. with the lowest absolute t-value of the coefficient). Each derived model was compared to its parent using the anova() function of R; if the reduced model was not rejected at a significance threshold of 0.05, the process continued. The final model that cannot be reduced any further without a significant drop of the goodness of fit contains the following 5 variables:

epi.PC1’ ~ se.n.n2 + sn.n.n2 + ss.n.n2 + se.n1.n1 + aa.n1.n1 + 0

When trained on all 16 seasons (Figure 5), this model explains 0.98 of the original variance (F-statistic p-value of 3×10^-9), the coefficients being 1.26, 1.78, 0.81, -0.41 and 2.08, respectively. In the leave-half-out jackknife test the model explains, on average, 0.91 of the variance in the remaining 8 seasons.

Fig. 5: Comparison of second stage retrospective model (retro.2) with observed epidemiological severity (epi.PC1)

Independent test: Hong Kong data

To test whether these results pertain only to the US or are more generally applicable to H3N2 influenza epidemiology, we collected a similar data set from Hong Kong and mainland China. Three epidemiological indicators (S-H3, F-H3 and F-Pos) were obtained for 13 seasons (from 1997 to 2009) using data from the Hong Kong Centre for Health Protection 29. Influenza epidemics in Hong Kong usually show two peaks (February-March and May-June). Both peaks were combined into the same season, and therefore a season corresponds to a calendar year in this analysis.

Because Hong Kong Centre for Health Protection data lack raw ILI numbers, only %ILI for each month is available. Therefore, we calculated the total epidemic severity (%ILI*%Positive Specimens) for each month, and then totaled these numbers for a year to obtain the severity for that year. As a test to show that this method gives results similar to the regular method (using cumulative raw numbers for each season to calculate %ILI and %Positive Specimens), we checked the correlation between the epidemic severity numbers calculated using the two methods for the USA data. The correlation between the two methods is very high (r=0.97). The H3 epidemic severity (S-H3) was calculated as the total influenza epidemic severity prorated by the fraction of H3 isolates among all isolates. These data were converted to a single epidemiological index using PCA (Figure 6). Interestingly, despite the fact that Hong Kong, like the USA, is a non-tropical Northern Hemisphere country, the epidemic severity measures between the two are relatively weakly correlated (rs[epi.PC1.USA, epi.PC1.HK] = 0.36 with a p-value of 0.21). Thus the Hong Kong dataset provides a largely independent test of the model applicability.
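The monthly aggregation used for the Hong Kong severity measure amounts to a sum of monthly products, as in this small Python sketch. The function names are hypothetical and the inputs are placeholders, not Centre for Health Protection data.

```python
def annual_severity(monthly_pct_ili, monthly_pct_positive):
    """Sum over months of %ILI x %positive specimens."""
    if len(monthly_pct_ili) != len(monthly_pct_positive):
        raise ValueError("both series must cover the same months")
    return sum(i * p for i, p in zip(monthly_pct_ili, monthly_pct_positive))


def annual_h3_severity(monthly_pct_ili, monthly_pct_positive, h3_fraction):
    """Annual severity prorated by the fraction of H3 isolates."""
    return annual_severity(monthly_pct_ili, monthly_pct_positive) * h3_fraction
```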

Sequence and antigenic distances were computed in a manner similar to that for the US data (the list of Hong Kong isolates used for genetic distance calculation is in the supplementary file 21). Again, in this analysis, a season corresponds to a calendar year. Due to a low number of serologically tested isolates from Hong Kong proper, antigenic data were computed for a combined set of Hong Kong and mainland China isolates. The seasonal epidemiology seems to be the same in Hong Kong and Southern mainland China, and only Northern mainland China follows the standard Northern Hemisphere pattern 30. Moreover, the Northern China epidemics follow the Southern China epidemics and not vice versa 30. Based on this observation, two possible ways of merging these data sets were implemented, producing two versions of the serological distances (ac and as, in contrast to the aa and an serological distances for the US). The first one (ac) corresponds to the scenario where for all isolates (Hong Kong and China) the season was considered to correspond to a calendar year, following the Hong Kong pattern. The second one (as) considers Hong Kong isolates following the Hong Kong pattern and China isolates following the Northern Hemisphere pattern with seasons starting in the fall.

Fig. 6: First Principal Component (epi.PC1) of three measures of epidemiological severity for Hong Kong

There are several possible ways to apply the model derived from the US data to the Hong Kong data. One would be to directly compute the model prediction using the formula:

epi.PC1′ = 1.26·se.n.n2 + 1.78·sn.n.n2 + 0.81·ss.n.n2 − 0.41·se.n1.n1 + 2.08·aa.n1.n1

and replacing aa.n1.n1 (US data) with either ac.n1.n1 or as.n1.n1. Using as.n1.n1 for the serological distance variable, this formula produces a prediction that is reasonably well correlated with the real epidemiology data (rs[epi.PC 1,epi.PC1’] = 0.60, with the p-value of 0.012 in a permutation test; Figure 7, “” plot). This model allows rough prediction of the ups and downs of the H3N2 influenza epidemics in 1997-2009, but gives a relatively poor quantitative estimate.
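As a rough illustration of this direct application, the formula can be evaluated as a plain weighted sum. The sketch below copies the coefficients from the text; the dictionary key "serological.n1.n1" and the function name are hypothetical, and the inputs are assumed to be normalized in the same way as the US training data.

```python
# Coefficients copied from the US-trained formula in the text.
US_COEFFICIENTS = {
    "se.n.n2": 1.26,
    "sn.n.n2": 1.78,
    "ss.n.n2": 0.81,
    "se.n1.n1": -0.41,
    # Hypothetical key: holds aa.n1.n1 for US data, or ac.n1.n1 / as.n1.n1
    # when the model is applied to the Hong Kong data.
    "serological.n1.n1": 2.08,
}


def predict_epi_pc1(distances, coefficients=US_COEFFICIENTS):
    """Return the model prediction epi.PC1' as a weighted sum of distances."""
    missing = set(coefficients) - set(distances)
    if missing:
        raise KeyError(f"missing distance variables: {sorted(missing)}")
    return sum(coefficients[name] * distances[name] for name in coefficients)


# Made-up inputs (not data from the study): every distance set to 1.0.
example = {name: 1.0 for name in US_COEFFICIENTS}
prediction = predict_epi_pc1(example)
```

Substituting as.n1.n1 for aa.n1.n1 then amounts to passing the Hong Kong value under the serological key.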

When the model is allowed to use the actual epidemiological data from Hong Kong to adjust its coefficients (yielding 1.50, 1.46, -0.82, 0.34 and 0.64 respectively), the prediction is improved (Figure 7, “retro.adjust” plot). The adjusted model explains 0.75 of the original variance with an F-statistic p-value of 2×10⁻².

Finally, the stepwise reduction of the full model containing the genetic and serological distances from the corresponding seasons (se.n.n2, sn.n.n2, ss.n.n2, se.n1.n1, sn.n1.n1, ss.n1.n1, ac.n1.n1 and as.n1.n1) leads to the following 4-parameter model (Figure 7, “retro.stepwise” plot):

epi.PC 1’ ~ se.n.n2 + sn.n.n2 + ac.n1.n1 + as.n1.n1 + 0

The model coefficients are 1.25, 0.75, 0.76 and 1.10 respectively; it explains 0.78 of the original variance with an F-statistic p-value of 5×10⁻³. When trained on 8 out of the 13 seasons (leave-5-out jackknife test scheme), this model explains 0.42 of the original variance on average.
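A leave-5-out scheme of this kind can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: it uses a single predictor fitted through the origin (mirroring the "+ 0" in the model formula) instead of the 4-parameter model, it scores each fit on all 13 seasons (the text does not say which seasons were scored), and the data are synthetic.

```python
from itertools import combinations
from statistics import mean


def fit_through_origin(x, y):
    """Least-squares slope for y ~ x + 0 (one predictor, no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def explained_variance(y, y_hat):
    """Return 1 - SS_res / SS_tot, with SS_tot taken about the mean of y."""
    y_bar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def leave_k_out_scores(x, y, k):
    """Fit on every subset that omits k seasons; score each fit on all seasons."""
    n = len(y)
    scores = []
    for held_out in combinations(range(n), k):
        train = [i for i in range(n) if i not in held_out]
        slope = fit_through_origin([x[i] for i in train], [y[i] for i in train])
        scores.append(explained_variance(y, [slope * v for v in x]))
    return scores


# Synthetic, noise-free data for 13 "seasons" (not data from the study).
x = [float(i) for i in range(1, 14)]
y = [2.0 * v for v in x]
scores = leave_k_out_scores(x, y, k=5)
average_score = mean(scores)
```

With 13 seasons and 5 held out, the loop visits every one of the possible 8-season training sets, so the average is taken over all of them rather than over a random sample.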

Fig. 7: Comparison of different retrospective models with observed Hong Kong epidemiological severity (epi.PC1)

Projection model: Southern Hemisphere isolates

The retrospective model allows us to accurately reconstruct the epidemiological severity of a season using sequence and serological data from that season’s viral isolates. This provides a framework to project severity for the upcoming season based on assumptions about the population of viruses that might circulate in that season. An obvious assumption to assess would be that the population of viruses in an upcoming Northern Hemisphere season is identical to the population of viruses seen in the preceding Southern Hemisphere influenza season. To this end we use the sequence and serological data for the isolates observed in the Southern Hemisphere in the season preceding the flu season in the Northern Hemisphere. By September-October the Southern Hemisphere isolates are, at least in principle, available for analysis before the start of the Northern Hemisphere flu season. In any given season these Southern Hemisphere isolates are used as a substitute for the current Northern Hemisphere isolates as shown in Figure 8.

Fig. 8: Computation of inter-season genetic and serological distances using the isolates from the Southern Hemisphere. Distances computed along the red arrows replace those computed along the dashed arrows.

For the purpose of this analysis, we considered the Southern Hemisphere as comprising only non-tropical Southern countries because of a different influenza epidemic pattern in the Tropics 31. Southern Hemisphere seasons were considered to be equivalent to calendar years, and correspond to the subsequent Northern Hemisphere seasons (e.g., the 2003 Southern Hemisphere season corresponds to the 2003/04 Northern Hemisphere season). There were no Southern Hemisphere sequences from 2008, and we used the only January 2009 sequence as a substitute for 2008 because January is borderline between the two Southern seasons (2008 and 2009). The list of Southern Hemisphere isolates used for genetic distance calculation is in supplementary file 22. Genetic and antigenic distances were computed as before.

We apply the second stage retrospective model (list of explanatory variables together with the coefficients) trained on the original dataset to this modified dataset to obtain the model projections to the current season using the Southern Hemisphere data from the preceding season (Figure 9).

Fig. 9: Comparison of second stage projection model (proj.2) with observed epidemiological severity (epi.PC1).

Error bars show the root mean square deviation of the projections from the real observations

The projections explain 0.66 of the original epi.PC1 variance and show an excellent correlation with the epidemiological observations (rs[epi.PC1, epi.PC1’] = 0.95, with a p-value <0.0001 in a permutation test).

Projection using previous seasons

Although projections based on Southern Hemisphere data correlate quite well with the actual epidemiological observations in the Northern Hemisphere, it would be useful to be able to make these projections earlier if possible, i.e. by the time when the decisions concerning the next season vaccines are made. A straightforward application of our approach (i.e. using only the data from preceding years to predict the morbidity for the upcoming year) proved to be insufficiently robust statistically (not shown). Often, however, multiple lineages co-circulate in a given season, and the epidemiological picture may be different depending on which lineage dominates the following season. Which of the co-circulating clades (if any) will become dominant in the next season is not known in advance; however one can construct explicit hypotheses about the upcoming season by assuming each of these clades taking over. Thus, it is possible to employ our model to explore the spectrum of possible predictions based on such assumptions.

As a demonstration of this approach, we divided the USA isolates from the 2002-2003 season into two subgroups – 9 Panama-like (members of the larger Sydney-like class) (supplementary file 32) and 7 Fujian-like viruses (supplementary file 33), based on the reconstructed phylogenetic tree for HA1 (A/H3N2 influenza) (supplementary file 34) as described in 35. Each of the subgroups was used to substitute for the real 2003-2004 season isolates in our data. The raw sequence distances computed for these subgroups were normalized using means and standard deviations previously determined for the full dataset. Then we used the list of variables and the coefficients determined for the second stage retrospective model to predict the severity of the 2003-2004 epidemic. The two subsets correspond to two epidemiological hypotheses: one assumes that the 2003-2004 flu season will be dominated by Panama-like H3N2 virus, whereas the other assumes the prevalence of Fujian-like virus. The results of these projections are shown in Figure 10.
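The normalization step mentioned above, in which subgroup distances are scaled with statistics taken from the full dataset rather than from the subgroup itself, can be sketched as follows. The function name and the numbers are hypothetical; only the use of previously determined means and standard deviations comes from the text.

```python
def normalize_with_reference(values, reference_mean, reference_sd):
    """Scale raw distances using a mean and SD fixed in advance on the full dataset."""
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    return [(v - reference_mean) / reference_sd for v in values]


# Made-up raw distances for one subgroup (not data from the study).
normalized = normalize_with_reference([3.0, 5.0], reference_mean=4.0, reference_sd=2.0)
```

Reusing the reference statistics keeps the subgroup values on the same scale as the variables on which the model coefficients were determined.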

Fig. 10: Comparison of the previous-season projection models (proj.Panama and proj.Fujian) with observed epidemiological severity (epi.PC1).

The Panama projection predicts a very mild flu season for 2003-2004; in contrast, the Fujian projection predicts a severe epidemic. In reality the 2003-2004 flu season was dominated by the Fujian-like isolates worldwide and was the strongest on record for our range of dates in the USA.

All data series used in this work may be found in supplementary file 36.


As reviewed above, an extensive body of research supports the notion that seasonal severity of Influenza A is strongly correlated with the degree of antigenic drift. Nevertheless, it is surprising that even applying a highly conservative approach (i.e. leave-half-out jackknife test), over 90% of the variance in epidemic severity can be explained by the antigenic and genetic novelty of the hemagglutinin protein. This implies that factors such as climate, school cycles, other co-circulating pathogens, and changes in influenza genes other than the hemagglutinin gene all have at best minor effects on seasonal morbidity, although they may affect epidemic timing. A number of studies have shown that temperature and humidity play important roles in the spread of influenza (cf. 37383940), however, for the scale of our analysis (e.g. geographically across all of North America and temporally for entire seasons), it might be difficult to detect a significant impact. Other co-circulating pathogens might reduce the number of seasonal influenza cases, e.g. owing to a direct impact on the host immune system or because of indirect effects on the host as the “vector” for spread. This “interference” effect has been reported for respiratory syncytial virus and rhinoviruses 41424344 but, given our results, this interference does not appear to have been significant during the seasons covered in this study.

Competition between Influenza A virus subtypes has been noted in several focused studies 454647 as well as in a broader analysis of surveillance data 48. However, the impact of competition from H1N1 on variation of H3N2 seasonal severity is unlikely to have been significant during the time interval studied here because H3N2 was dominant in most seasons, and furthermore, there is little evidence of positive selection for H1N1 antigenic novelty 3549. In addition, Wolf et al. 35 have shown that during this period, H1N1 tended to dominate only after mild H3N2 seasons, so the competition appeared to be one-sided.

Although our results show that the degree of hemagglutinin novelty explains most of seasonal morbidity, other viral proteins have been shown to play important roles. For example, consider the evolutionary history of the Fujian H3N2 strain: initially an antigenically novel minor variant, succeeded by a reassortant containing the novel Fujian HA gene, which in turn is succeeded by a later “pure” Fujian variant in which all genes are derived from the original Fujian strain. An analysis of these events led to a proposal that deleterious mutations in genes other than HA prevented the earliest members of this clade from dominating despite their antigenic novelty but subsequent compensatory mutations ultimately led to its dominance over the reassortant strain 35. Recent experiments on the later Fujian variant support the idea that mutations in the polymerase PA gene were responsible for the decreased fitness of the original Fujian strain 50.

Antigenic diversity in the preceding season is an important component of our statistical model which is consistent with the above observation that an antigenically novel strain may circulate for a season as a minor variant before compensating mutations allow it to dominate the following season. Mutations in the hemagglutinin gene itself might have effects on fitness beyond that of changing antigenicity. For example, mutations have been identified in the HA gene that might allow the virus to evade the host’s immune response but also modify binding to the host receptor leading to decreased replication fitness (cf. 515253). Our results suggest that this could be a common feature of early drift variants.

Antigenic diversity in the preceding season is not the only warning of a rise in morbidity but it is difficult to extract other clear signals or rules from our analysis. In addition to the underlying complexity of the system, the sequence and HI data we have for clinical isolates are not sufficiently comprehensive or representative. For example, in the 2002-2003 season, according to the CDC’s antigenic characterization of the 2002 – 03 U.S. Influenza season 16, 85% of the H3N2 clinical isolates were similar to A/Panama/2007/99 and 15% were similar to A/Fujian/411/2002. However, our sequence data were about half Panama and half Fujian whereas virtually all of the HI data we were able to obtain were similar to Fujian.

While more representative input data could improve the performance of our statistical model, refinements in other aspects of the method may also be useful. For example, other sequence-derived measures of novelty might work better than our simple amino acid replacements-based approach. The method of Gupta et al. 12 yielded equivalent results but there are additional sequence-based methods that should be evaluated as well (cf. 54).

Because our method provides an accurate reconstruction of epidemiological severity given sequence and serological data, one can evaluate the epidemiological consequences of various assumptions about the population of viruses for the upcoming season. As seen in Figure 9, projections using data from Southern Hemisphere isolates provide realistic estimates of severity for the Northern Hemisphere. When there is significant diversity among co-circulating clades in a given season, one can assess different scenarios for the upcoming season, as was done using data from 2002-2003 seasonal isolates to project severity for the 2003-2004 season. The panel advising the US Food and Drug Administration on vaccines had difficulties that season because the Fujian strain – a minor strain in 2002-2003 – was antigenically novel but was difficult to grow in chicken eggs. Scientists at the CDC influenza branch had been able to passage the Fujian strain through dog kidney cells and obtain egg-adapted virus, but concern about contamination steered the FDA advisory committee towards a decision to use the Panama strain (cf. 55). The ability to make accurate projections of epidemiological severity might have helped to inform this decision.

Although, in some cases, projections based on co-circulating clades could be helpful prior to the availability of Southern Hemisphere isolates, thus assisting in vaccine strain decisions, in other situations, such as the 1997-1998 Sydney season, the rapid emergence of a new dominant strain makes earlier projections effectively useless. Deeper sampling of representative viral isolates might provide an earlier warning of such novel clades, and more sophisticated models using additional data might also eventually prove helpful.

Finally, we note that our predictions of influenza morbidity based on sequence and serological data are limited to the inter-pandemic period, and that our model cannot be used to project the severity of an emerging pandemic influenza virus, such as 2009 H1N1pdm. In addition, it remains unclear whether this model rooted in empirical data collected before the 2009 pandemic will be suitable to predict the severity of influenza A/H3N2 season in the post-pandemic period. Both H3N2 and H1N1 have begun to co-circulate in 2010 56, and further studies will need to determine whether the dynamics of these two subtypes has changed and whether the algorithms proposed here need to be revised to accommodate these changes.

The results presented here should be useful in the planning process for seasonal influenza. This work, along with the extensive earlier research that revealed the correlation between antigenic drift and seasonal severity, provides an objective function for refining measures to characterize antigenicity and seasonal morbidity. Given that we can now use surveillance data to make useful projections of influenza morbidity, we should begin to consider those changes to surveillance that maximize the effectiveness of this approach.

Competing interests

David Lipman, corresponding author, is one of the Editors of PLoS Currents: Influenza .

Lymphocyte to monocyte ratio as a screening tool for influenza Tue, 30 Mar 2010 09:44:15 +0000


In the 20th century alone, there were three overwhelming pandemics: the Spanish flu in 1918, leading to 50 million deaths, the Asian flu in 1957 (1 million deaths) and the Hong Kong flu in 1968 (700,000 deaths), caused by H1N1, H2N2 and H3N2, respectively [1]. Avian flu (H5N1) was the first to alert the medical world and governments to the possibility of a new influenza pandemic [2], which is now a reality with the pandemic influenza A (H1N1) virus (swine flu), having already affected millions of people.

Recent and previous studies have revealed the clinical features of H1N1 virus infection [3][4]. The most affected group is between 15 and 30 years old, the average incubation period is two days, and the majority of patients present with fever, cough, rhinorrhea, sore throat, myalgia and headache. Nausea, vomiting and diarrhoea are less frequent in adults than in children [5].

For the uncomplicated cases, the disease is mild and no further invasive diagnostic testing is indicated [6], but preventive measures for the health care workers and guidelines for the patients are important, as the complications could be very serious, including acute respiratory distress syndrome and possible death even without comorbidities [7].

With regard to laboratory findings, the majority of patients show transient lymphopenia from the very first day [8]. Usually, a week after the symptoms subside, the virus can no longer be detected [4].

The most sensitive but time-consuming evidence of infection with the new influenza A (H1N1) virus is provided by the reverse transcriptase polymerase chain reaction (RT-PCR) [9] as compared to the commercially available rapid tests for seasonal influenza, which show low sensitivity and specificity for H1N1 virus detection [10].

Therefore, confirming a diagnosis of H1N1 infection, even in hospitalized patients, can prove difficult and protracted [6]. On the other hand, starting antiviral treatment early is recommended [11].

The aim of the present study was to examine whether there are more sensitive laboratory parameters that could play a major role in identifying and treating those likely to have H1N1 virus infection.

Material and methods

Data from the emergency department of a private clinic in Patras, Western Greece, were collected during the period from September to December 2009. Data included complete blood count (CBC), rapid test for influenza A/B (RIT), C-reactive protein (CRP), and demographic and clinical characteristics.

The study population included all patients visiting the emergency department during that period with at least two of the following symptoms: fever, cough, rhinorrhea or sore throat and swollen lymph nodes. All patients whose diagnostic investigation revealed a disease other than a possible viral infection were excluded from the study. Thus, patients with lower respiratory tract infections and laboratory-confirmed elevated total white blood cells (WBC) and neutrophils, or patients with sore throats and a positive strep test, were excluded from the study as having in all probability bacterial infections. Furthermore, patients with detected IgM antibodies to Epstein-Barr virus (EBV) were excluded as well.

Samples for the rapid test were taken by pharyngeal swab. SD BIOLINE one step influenza virus type A and B antigen tests were used. In the beginning all positive RITs were sent to specified laboratories for verification through RT-PCR, but later with the pandemic spread in November, the official guidelines required confirmation of hospitalized cases only. CBC was conducted by hematology analyzer CELL DYN 3700. Descriptive analysis was conducted using SPSS v 16.00.


The study population included 58 patients; the mean age was 28.6 years (range, 2 to 75), and 43% were male. More than 60% were aged 15 to 30 years, following the worldwide trend. The most common symptoms were cough (in 93% of the patients), fever (89%), rhinorrhea (54%) and swollen lymph nodes (20%). Incidence peaked at the end of November and early December with declining trends later on, in accordance with national data.

Table 1 presents the main laboratory parameters of the studied subjects. Normal white blood cell count (4100-10900/mm³) was observed in 90% of the patients. Lymphopenia (<1500/mm³ or <20 percent of total white blood cells) was present in 64% and monocytosis (>800/mm³ or >10 percent of total white blood cells) in 72% of the probable influenza cases. The majority of the patients with influenza-like illness (90%) revealed a decreased ratio of lymphocytes to monocytes (<2) with normal or low total white blood cells. Nineteen cases identified by RIT as positive (33%) showed the latter laboratory characteristics (Table 1) and had a ratio of lymphocytes to monocytes in CBC near 2 or lower (only 2 cases ranged between 2.1 and 2.8). Thrombocytopenia was not observed, and CRP was slightly elevated.

There was only one complicated case (lower respiratory tract infection), which needed to be hospitalised. Laboratory parameters revealed normal WBC and a decreased ratio of lymphocytes to monocytes at the emergency department, while RT-PCR confirmed H1N1 virus infection.

Table 1: Laboratory parameters of all patients with influenza-like illness

Patients with influenza-like illness (ILI), N=58
Parameters | Total (N=58) | Positive Rapid Influenza Test (RIT) (N=19) | Lymphocytes/monocytes ratio <2 (N=39)
Leukocyte count: mean count (SD); <4000/mm³; >10900/mm³
Lymphocyte count: mean count (SD); <20% [lymphocyte/leukocyte]; <1500/mm³
Monocyte count: mean count (SD); >800/mm³
CRP (mg/dl), mean (SD) | 3 (2) | 1.8 (1) | 3 (2.3)
Rapid test for influenza A/B: Influenza A; Influenza B
Ratio <2 (Lymphocytes/Monocytes) | 67% | 85% | 100%
Lymphopenia and/or Monocytosis | 89% | 100% | 100%


According to recent studies lymphopenia is a common feature for seasonal influenza and new influenza A (H1N1) infection [4][8]. It has been blamed for prolonged viral excretion and lower respiratory tract infection [12]. However, the presence of monocytosis observed in this study has not been evaluated up to now. In our study population, it is significantly related to positive RIT, especially when combined with relative or absolute lymphopenia, with a ratio of lymphocytes to monocytes below 2 (p<0.01).

During the current pandemic wave, the majority of influenza infections in Greece are reported to be caused by the new H1N1 virus [13]; this CBC pattern could therefore most likely be attributed to that virus.

Based on these observations we believe that monocytosis combined with lymphopenia, or a decreased ratio of lymphocytes to monocytes (<2), with normal or low total white blood cells, could trace more cases than the rapid influenza test in patients with influenza-like illness and be more useful if demand for virology testing exceeds capacity.
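The proposed CBC pattern can be written down as a small rule, purely to make the thresholds explicit. The sketch below uses the cut-offs quoted in the Results (lymphopenia <1500/mm³ or <20%, monocytosis >800/mm³ or >10%, upper normal WBC 10900/mm³, ratio <2); the function names are hypothetical, counts are assumed to be absolute values per mm³, and the ratio cut-off is the provisional one from this study, not a validated threshold.

```python
def cbc_flags(wbc, lymphocytes, monocytes):
    """Derive the CBC features discussed in the text from counts per mm^3."""
    if wbc <= 0:
        raise ValueError("wbc must be positive")
    ratio = lymphocytes / monocytes if monocytes > 0 else float("inf")
    return {
        "lymphopenia": lymphocytes < 1500 or lymphocytes / wbc < 0.20,
        "monocytosis": monocytes > 800 or monocytes / wbc > 0.10,
        "ratio": ratio,
        "low_ratio": ratio < 2,
        "wbc_normal_or_low": wbc <= 10900,
    }


def matches_proposed_pattern(wbc, lymphocytes, monocytes):
    """True when the lymphocyte/monocyte ratio is <2 with normal or low WBC."""
    flags = cbc_flags(wbc, lymphocytes, monocytes)
    return flags["low_ratio"] and flags["wbc_normal_or_low"]


# Made-up counts (not patient data from the study).
example_flags = cbc_flags(wbc=6000, lymphocytes=1200, monocytes=900)
example_match = matches_proposed_pattern(wbc=6000, lymphocytes=1200, monocytes=900)
```

Keeping the individual flags separate from the final rule makes it straightforward to try a different ratio cut-off once RT-PCR-confirmed data are available.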

Taking into account the proposed ratio, 67% of all ILI cases in this study would be considered to be infected with H1N1 virus. The national data from the Hellenic Center for Disease Control reported for the same period that 60% of all samples with influenza-like illness tested positive for H1N1 virus by RT-PCR [13].

Given the fact that the hematology analyzer cannot discriminate between monocytes and large lymphocytes, this study cannot confirm the presence of actual relative monocytosis; however, the use of the relevant ratio in the emergency department is independent of that fact. A point score system for the probability of H1N1 infection, involving elements of CBC and clinical criteria, has already been proposed for hospitalised patients [14]. At present, a decrease in the number of cases in Greece is observed, probably due to milder weather conditions. However, as lower temperatures in Greece are expected in March and April, a new pandemic wave could stress the health system again, considering that seasonal influenza usually peaks at the end of the winter months in Greece [13].

The latest update from WHO [15] indicates that, up to February 2010, more than 212 countries and overseas territories or communities worldwide have reported laboratory-confirmed cases of pandemic influenza H1N1, including more than 15000 deaths (4000 in Europe). In Europe, although pandemic influenza virus continues to circulate widely, particularly across central, southern, and eastern Europe, the overall intensity of pandemic influenza activity has declined substantially from peaks of activity seen earlier during the winter transmission period. There are more active areas of transmission, such as Northern Africa, South Asia, and East Asia [15].

However, even if a new pandemic wave does not occur, one should still be aware of the new influenza virus because of its tendency to affect the lower respiratory tract [16]. Thus, a simple diagnostic tool such as the one proposed can prove valuable in a low-activity period, mainly in patients with influenza-like symptoms.

Finally, a ratio of lymphocytes to monocytes below 2 is considered indicative of the ‘turn’ in the parameters of CBC. However, the optimal cut-off point should be determined using RT-PCR as the gold-standard diagnostic test.

We suggest that this observation be investigated in larger study populations, including smaller age groups, with RT-PCR and microscopic analysis of CBC, in order to verify whether CBC, and especially relative monocytosis, could be applied as a time-saving and cost-effective screening test for H1N1 virus infection, leading to early antiviral treatment and hence to a decrease in the incidence of complicated cases. Such a tool would be very helpful in areas where laboratory confirmation is limited due to financial restrictions or excess demand.

Competing interests

The authors declare having no competing interests.

Structure and Receptor binding properties of a pandemic H1N1 virus hemagglutinin Mon, 22 Mar 2010 18:39:52 +0000


The first influenza pandemic of the new century emerged in April 2009, when a new H1N1 influenza virus (H1N1pdm), found in patients in Mexico and the United States, spread rapidly across the world by human-to-human transmission, resulting in the World Health Organization declaring a global pandemic on June 11th, 2009 [1]. The pandemic H1N1 virus (2009 H1N1) was unique in that it had a gene constellation from both North American and Eurasian swine lineages that had not been isolated previously in either swine or human populations [2]. Phylogenetic and antigenic analysis of the hemagglutinin (HA) gene revealed it to be distinct from seasonal human H1N1 viruses but more similar to the classical North American swine lineage.

Ten months after the first viruses were isolated, the virus is still antigenically homogeneous [3]. However, as the HA continues to circulate in the human population, its HA antigenic sites will continue to be targeted by antibody-mediated selection pressure. Therefore it is important from a public health perspective to structurally characterize the hemagglutinin so that the research community has a template with which to visualize any changes affecting antigenicity or virulence that may emerge as this virus evolves. To this end, we have cloned, expressed and solved the structure of a pandemic H1 hemagglutinin by x-ray crystallography. The structure was used to analyze amino acid substitutions in the HA that have raised some concern during the last 11 months of global surveillance activities. The same protein was analyzed by glycan microarray and compared to seasonal and other pandemic variants. Results reveal a strict human-like receptor specificity.

Materials and Methods

Recombinant HA cloning and expression: Utilizing a cloning strategy similar to that of previous studies [4][5][6], the HA ectodomain of the 2009 H1N1 pandemic influenza virus, A/Texas/05/2009 (Accession: FJ966959) was codon optimized, synthesized and cloned into the baculovirus transfer vector, pAcGP67-A (BD Biosciences, San Jose, CA) by Geneart AG, Germany. Constructs containing A/Ohio/7/2009 (Accession: FJ969535), A/Utah/20/2009 (Gisaid Accession: EPI217204) and A/Darwin/2001/2009 (Accession: GQ243757) were generated by mutagenesis of the A/Texas/05/2009 clone (A/Ohio/7/2009:Ser203Thr/Val411Ile; A/Darwin/2001/2009:Ser203Thr/Arg205Lys/Val411Ile; A/Utah/20/2009:Asn156Asp/Gln293His) using the QuikChange Multi Site-Directed Mutagenesis Kit (Stratagene, CA). Seasonal H1N1 HA constructs were cloned into the baculovirus transfer vector, pAcGP67-A (BD Biosciences, San Jose, CA). Transfection and virus amplification were carried out as described previously [4][5][6]. Protein expressed from Trichoplusia ni (Hi5) cells (Invitrogen, Carlsbad, CA) in 10-stack CellSTACK™ culture chambers (Corning Inc., Corning, NY) was recovered from the culture supernatant and purified by metal affinity chromatography, subjected to thrombin cleavage and gel filtration chromatography [7]. Purified monomeric protein was buffer exchanged into 10 mM Tris-HCl, 50 mM NaCl, pH 8.0 and concentrated to 7.8 mg/ml for crystallization trials. At this stage, the protein sample still contained the additional plasmid-encoded residues at both the N (ADPG) and C terminus (SGRLVPR).

Crystallization and data collection: Initial crystallization trials were set up using a Topaz™ Free Interface Diffusion (FID) Crystallizer system (Fluidigm Corporation, San Francisco, CA). Crystals were observed in conditions containing various molecular weights of PEG polymer. Following optimization, diffraction quality crystals for Darwin09 were obtained at 20 °C using a modified method for ‘microbath under oil’ [8], by mixing the protein with reservoir solution containing 22% PEG2000MME, 0.1M HEPES at pH 7.5. Crystals were flash-cooled at 100K; data were collected at the Advanced Photon Source (APS) beamline 22-BM at 100K and processed with the DENZO-SCALEPACK suite [9]. The data were indexed in spacegroup P1 with unit cell dimensions a=73.98Å, b=109.71Å, c=129.90Å; α=86.25°, β=74.68°, γ=75.10°. Statistics for data collection are presented in Table 1.

Structure determination and refinement: The structure of Darwin09 was determined by molecular replacement with Phaser [10] using the HA structure from A/Japan/305/1957, pdb:3KU3 [11] (HA1, 55% identity; HA2, 82% identity) as the search model. Six hemagglutinin monomers making one non-crystallographic trimer, related by a non-crystallographic 3-fold and three monomers that form one-third and two-thirds of two crystallographic trimers, occupy the asymmetric unit with an estimated solvent content of 55% based on a Matthews’ coefficient (Vm) of 2.75 Å³/Da. Rigid body refinement of the trimer led to an overall R/Rfree of 48.1%/48.6%. The model was then “mutated” to the correct sequence and rebuilt by Coot [12], then the protein structures were refined with REFMAC [13] using TLS refinement [14]. The final models were assessed using MolProbity [15]. Statistics for data processing and refinement are presented in Table 1.
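The link between the quoted Matthews’ coefficient and the 55% solvent estimate can be checked with a short calculation. The sketch below assumes the commonly used approximation solvent fraction ≈ 1 − 1.23/Vm (which presumes a typical protein partial specific volume) and the standard triclinic cell volume formula; the function names are hypothetical, and the molecular mass of the construct is not given in the text, so it is left as a parameter.

```python
import math


def triclinic_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Unit cell volume in cubic angstroms for a general (triclinic) cell."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def matthews_coefficient(cell_volume, n_molecules, molecular_mass_da):
    """Vm in A^3/Da: cell volume divided by the total protein mass in the cell."""
    return cell_volume / (n_molecules * molecular_mass_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from Vm, assuming the usual 1.23 constant."""
    return 1.0 - 1.23 / vm


# Cell dimensions and Vm as reported in the text.
volume = triclinic_cell_volume(73.98, 109.71, 129.90, 86.25, 74.68, 75.10)
solvent = solvent_fraction(2.75)  # about 0.55, matching the stated 55%
```

In space group P1 the asymmetric unit is the whole cell, so `n_molecules` would be the six monomers described above once a monomer mass is supplied.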

Table 1: Data collection and refinement statistics.

Data collection Darwin09
Space group P1
Cell dimensions a=73.98Å, b=109.71Å, c=129.90Å; α=86.25°, β=74.68°, γ=75.10°
Resolution (Å) 50-2.8 (2.90-2.80)*
R sym 7.6 (44.2)
<I/σ> 9.7 (1.6)
Completeness (%) 98.1 (95.7)
Redundancy 2.0 (1.9)
Resolution (Å) 50-2.8 (2.87-2.80)
No. of reflections (total) 87344
No. of reflections (test) 4608
R work / R free 23.1/25.6
No. of atoms 23552
r.m.s.d.- bond length (Å) 0.016

r.m.s.d.- bond angle (°)

MolProbity # scores
Favored (%) 95.3
Allowed (%) 99.7
Outliers (%) (No. of residues) 0.3 (9/2934)

* Numbers in parentheses refer to the highest resolution shell.

# Reference [15]

Glycan microarray analysis: Microarray printing and recombinant HA analyses have been described previously [6][16]. Imprinted slides produced specifically for influenza research for the CDC using the CFG glycan library (CDC version 1 slides; see Table 2 for glycans used in these experiments) were used.

Table 2: Glycans covalently attached on the glycan microarray. Different categories of glycans on the array are color-coded in column 1 as follows: No color, sialic acid; blue, α2-3 sialosides; red, α2-6 sialosides; violet, mixed α2-3/α2-6 biantennaries; green, N-glycolylneuraminic acid-containing glycans; brown, α2-8 linked sialosides; pink, β2-6 linked as well as 9-O-acetylated sialic acids; grey, asialo glycans.

Chart # Structure Description
1 α-Neu5Ac α-Neu5Ac
2 α-Neu5Ac α-Neu5Ac
3 β-Neu5Ac β-Neu5Ac
4 Neu5Acα2-3(6-O-Su)Galβ1-4(Fucα1-3)GlcNAcβ α2-3 so4
5 Neu5Acα2-3Galβ1-3[6OSO3]GalNAcα α2-3 so4
6 Neu5Acα2-3Galβ1-4[6OSO3]GlcNAcβ α2-3 so4
7 Neu5Acα2-3Galβ1-4(Fucα1-3)(6OSO3)GlcNAcβ α2-3 so4
8 Neu5Acα2-3Galβ1-3(6OSO3)GlcNAc α2-3 so4
9 Neu5Acα2-3Galβ1-3(Neu5Acα2-3Galβ1-4)GlcNAcβ di-sialoside
10 Neu5Acα2-3Galβ1-3(Neu5Acα2-3Galβ1-4GlcNAcβ1-6)GalNAcβ di-sialoside
11 Neu5Acα2-3Galβ1-4GlcNAcβ1-2Manα1-3(Neu5Acα2-3Galβ1-4GlcNAcβ1-2Manα1-6)Manβ1-4GlcNAcβ1-4GlcNAcβ α2-3 biantennary
12 Neu5Acα2-3Galβ α2-3
13 Neu5Acα2-3GalNAcα α2-3
14 Neu5Acα2-3Galβ1-3GalNAcα α2-3
15 Neu5Acα2-3Galβ1-3GlcNAcβ α2-3
16 Neu5Acα2-3Galβ1-3GlcNAcβ α2-3
17 Neu5Acα2-3Galβ1-4Glcβ α2-3
18 Neu5Acα2-3Galβ1-4Glcβ α2-3
19 Neu5Acα2-3Galβ1-4GlcNAcβ α2-3
20 Neu5Acα2-3Galβ1-4GlcNAcβ α2-3
21 Neu5Acα2-3GalNAcβ1-4GlcNAcβ α2-3
22 Neu5Acα2-3Galβ1-4GlcNAcβ1-3Galβ1-4GlcNAcβ α2-3
23 Neu5Acα2-3Galβ1-3GlcNAcβ1-3Galβ1-3GlcNAcβ α2-3
24 Neu5Acα2-3Galβ1-4GlcNAcβ1-3Galβ1-4GlcNAcβ1-3Galβ1-4GlcNAcβ α2-3
25 Neu5Acα2-3Galβ1-4GlcNAcβ1-3Galβ1-3GlcNAcβ α2-3
26 Neu5Acα2-3Galβ1-3GalNAcα α2-3
27 Galβ1-3(Neu5Acα2-3Galβ1-4(Fucα1-3)GlcNAcβ1-6)GalNAcβ α2-3 fucosylated
28 Neu5Acα2-3Galβ1-3(Fucα1-4)GlcNAcβ α2-3 fucosylated
29 Neu5Acα2-3Galβ1-4(Fucα1-3)GlcNAcβ α2-3 fucosylated
30 Neu5Acα2-3Galβ1-4(Fucα1-3)GlcNAcβ α2-3 fucosylated
31 Neu5Acα2-3Galβ1-4(Fucα1-3)GlcNAcβ1-3Galβ α2-3 fucosylated
32 Neu5Acα2-3Galβ1-4(Fucα1-3)GlcNAcβ1-3Galβ1-4GlcNAcβ α2-3 fucosylated
33 Neu5Acα2-3Galβ1-4(Fucα1-3)GlcNAcβ1-3Galβ1-4(Fucα1-3)GlcNAcβ1-3Galβ1-4(Fucα1-3)GlcNAcβ α2-3 fucosylated
34 Neu5Acα2-3Galβ1-4GlcNAcβ1-3Galβ1-4(Fucα1-3)GlcNAc α2-3 fucosylated
35 Neu5Acα2-3(GalNAcβ1-4)Galβ1-4GlcNAcβ α2-3 internal
36 Neu5Acα2-3(GalNAcβ1-4)Galβ1-4GlcNAcβ α2-3 internal
37 Neu5Acα2-3(GalNAcβ1-4)Galβ1-4Glcβ α2-3 internal
38 Galβ1-3GalNAcβ1-4(Neu5Acα2-3)Galβ1-4Glcβ α2-3 internal
39 Fucα1-2Galβ1-3GalNAcβ1-4(Neu5Acα2-3)Galβ1-4Glcβ α2-3 internal
40 Fucα1-2Galβ1-3GalNAcβ1-4(Neu5Acα2-3)Galβ1-4Glcβ α2-3 internal
41 Neu5Acα2-6Galβ1-4[6OSO3]GlcNAcβ α2-6 so4
42 Galβ1-4GlcNAcβ1-2Manα1-3(Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1­6)Manβ1-4GlcNAcβ1-4GlcNAcβ α2-6 branched
43 GlcNAcβ1-2Manα1-3(Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-6)Manβ1­4GlcNAcβ1-4GlcNAcβ α2-6 branched
44 Galβ1-4GlcNAcβ1-2Manα1-3(Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1­6)Manβ1-4GlcNAcβ1-4GlcNAcβ α2-6 branched
45 Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-3(GlcNAcβ1-2Manα1-6)Manβ1­4GlcNAcβ1-4GlcNAcβ α2-6 branched
46 Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-3(Neu5Acα2-6Galβ1-4GlcNAcβ1­2Manα1-6)Manβ1-4GlcNAcβ1-4GlcNAcβ α2-6 biantenary
47 Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-3(Neu5Acα2-6Galβ1-4GlcNAcβ1­2Manα1-6)Manβ1-4GlcNAcβ1-4GlcNAcβ α2-6 biantenary
48 Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-3(Neu5Acα2-6Galβ1-4GlcNAcβ1­2Manα1-6)Manβ1-4GlcNAcβ1-4GlcNAcβ α2-6 biantenary
49 Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-3(Galβ1-4GlcNAcβ1-2Manα1­6)Manβ1-4GlcNAcβ1-4GlcNAcβ α2-6 biantenary
50 Neu5Acα2-6GalNAcα α2-6
51 Neu5Acα2-6Galβ α2-6
52 Neu5Acα2-6Galβ1-4Glcβ α2-6
53 Neu5Acα2-6Galβ1-4GlcNAcβ α2-6
54 Neu5Acα2-6Galβ1-4GlcNAcβ α2-6
55 Neu5Acα2-6GalNAcβ1-4GlcNAcβ α2-6
56 Neu5Acα2-6Galβ1-4GlcNAcβ1-3Galβ1-4GlcNAcβ α2-6
57 Neu5Acα2-6Galβ1-4GlcNAcβ1-3Galβ1-4(Fucα1-3)GlcNAcβ1-3Galβ1-4(Fucα1-3)GlcNAcβ α2-6 + fucosylation
58 Galβ1-3(Neu5Acα2-6)GlcNAcβ1-3Galβ1-4Glcβ α2-6 internal
59 Galβ1-3(Neu5Acα2-6)GalNAcα α2-6 internal
60 Neu5Acα2-3Galβ1-4GlcNAcβ1-2Manα1-3(Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-6)Manβ1-4GlcNAcβ1-4GlcNAcβ α2-3/6 biantennary
61 Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-3(Neu5Acα2-3Galβ1-4GlcNAcβ1-2Manα1-6)Manβ1-4GlcNAcβ1-4GlcNAcβ α2-3/6 biantennary
62 Neu5Acα2-3Galβ1-3(Neu5Acα2-6)GalNAc α2-3/6 disialoside
63 Neu5Acα2-3Galβ1-3(Neu5Acα2-6)GalNAcα α2-3/6 disialoside
64 Neu5Acα2-3(Neu5Acα2-6)GalNAcα α2-3/6 disialoside
65 Neu5Gcα Neu5Gc α
66 Neu5Gcα2-3Galβ1-3(Fucα1-4)GlcNAcβ Neu5Gc α2-3
67 Neu5Gcα2-3Galβ1-3GlcNAcβ Neu5Gc α2-3
68 Neu5Gcα2-3Galβ1-4(Fucα1-3)GlcNAcβ Neu5Gc α2-3
69 Neu5Gcα2-3Galβ1-4GlcNAcβ Neu5Gc α2-3
70 Neu5Gcα2-3Galβ1-4Glcβ Neu5Gc α2-3
71 Neu5Gcα2-6GalNAcα Neu5Gc α2-6
72 Neu5Gcα2-6Galβ1-4GlcNAcβ Neu5Gc α2-6
73 Neu5Acα2-8Neu5Acα Neu5Ac α2-8
74 Neu5Acα2-8Neu5Acα2-8Neu5Acα Neu5Ac α2-8
75 Neu5Acα2-8Neu5Acα2-3(GalNAcβ1-4)Galβ1-4Glcβ Neu5Ac α2-8 α2-3
76 Neu5Acα2-8Neu5Acα2-3Galβ1-4Glcβ Neu5Ac α2-8 α2-3
77 Neu5Acα2-8Neu5Acα2-8Neu5Acα2-3(GalNAcβ1-4)Galβ1-4Glcβ Neu5Ac α2-8 α2-8 α2-3
78 Neu5Acα2-8Neu5Acα2-8Neu5Acα2-3Galβ1-4Glcβ Neu5Ac α2-8 α2-8 α2-3
79 Neu5Acα2-8Neu5Acα Neu5Ac α2-8
80 Neu5Acα2-8Neu5Acβ Neu5Ac α2-8
81 Neu5Acα2-8Neu5Acα2-8Neu5Acβ Neu5Ac α2-8 α2-8
82 Neu5Acβ2-6GalNAcα β2-6
83 Neu5Acβ2-6Galβ1-4GlcNAcβ β2-6
84 Neu5Gcβ2-6Galβ1-4GlcNAc β2-6
85 Galβ1-3(Neu5Acβ2-6)GalNAcα β2-6
86 9NAcNeu5Acα 9NAcNeu5
87 9NAcNeu5Acα2-6Galβ1-4GlcNAcβ 9NAcNeu5
88 Galβ1-4GlcNAcβ1-3Galβ1-4GlcNAcβ1-3Galβ1-4GlcNAcβ asialo
89 Galβ1-3GlcNAcβ1-3Galβ1-3GlcNAcβ asialo
90 Fucα1-2Galβ1-3GlcNAcβ1-3Galβ1-4Glcβ asialo
91 Fucα1-2Galβ1-4(Fucα1-3)GlcNAcβ1-3Galβ1-4(Fucα1-3)GlcNAcβ asialo
92 GalNAcα1-3(Fucα1-2)Galβ1-3GlcNAcβ asialo
93 GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ asialo
94 Galα1-3(Fucα1-2)Galβ1-3GlcNAcβ asialo
95 Galα1-3(Fucα1-2)Galβ1-4(Fucα1-3)GlcNAcβ asialo
96 Galβ1-3GalNAc asialo


Neu5Ac = Sialic acid

Neu5Gc = N-glycolylneuraminic acid

OSO3 = sulfate

Gal = galactose

Fuc = fucose

GlcNAc = N-Acetyl-D-glucosamine

GalNAc = N-acetyl-D-galactosamine

Glc = D-glucose

Man = D-mannose

9NAc = 9-O-acetyl

Results and Discussion

Expression and purification Recombinant HA protein from A/Darwin/2001/2009 (H1N1) (Darwin09) virus was expressed in a baculovirus expression system utilizing a thrombin site at the C-terminus of Darwin09 followed by a trimerizing sequence (foldon) from the bacteriophage T4 fibritin for generating functional trimers [17], and a His-Tag to aid purification. Although the protein was expressed as a trimer, only monomers were purified by gel filtration after foldon removal by the thrombin cleavage step. These monomers were stable: the protein stock maintained its monomeric state even after 8 weeks of storage at 4 °C (confirmed by dynamic light scattering analysis). Nevertheless, the monomers were still able to re-form trimers in the crystal, as evidenced by the structure reported here.

Fig. 1: Structural overview of the Darwin09 HA monomer.

(A) One monomer is shown with the location of the receptor-binding site (RBS) and the HA1/HA2 cleavage site circled. The positions of residues discussed in the text are highlighted in red. (B) H1 HA antigenic sites, Ca, Cb, Sa and Sb, are mapped onto a surface representation of the HA1 domain of the Darwin trimer with positions of nearby potential glycosylation sites colored orange. (C) For comparison, a model of the last H1N1 seasonal vaccine component, A/Brisbane/59/2007, was generated by homology modeling [18]. (D) RBS of Darwin09 with the three structural elements comprising this binding site, the 130-loop, 220-loop and the 190-helix, colored light blue, light green and olive, respectively. All the figures were generated and rendered with the use of MacPyMOL [19].

Overall Structure By using x-ray crystallography, the structure of pandemic H1N1 HA from the Darwin09 virus was determined to 2.8 Å resolution (Table 1). The overall structure of Darwin09 is similar to other reported HA structures, with a globular head containing the RBS and vestigial esterase domain, and a membrane proximal domain with its distinctive, central helical stalk and HA1/HA2 cleavage site (Figure 1A). We selected representative HAs from human pandemic subtypes for structural analysis. In these comparisons, Darwin09 HA was found to be structurally very similar to the 1918-pandemic HA and the pandemic potential H5N1 HA. For the swine H1, H2 and H3 HAs in the analysis, the HA2 domains were closely related to that of Darwin09, but the HA1 domains were more divergent (Table 3).

Table 3: Comparison of r.m.s.d. (Å) for HA1 and HA2 domains. For analyzing differences in the overall structure, r.m.s.d. values were calculated between monomers or domains of different pandemic and pandemic potential HAs, after the Cα atoms of the HA2 domains were superposed by sequence and structural alignment onto the equivalent domains of Darwin09.

Subtype PDB entry HA1 Domain HA2 Domain
1918-Hu-H1N1 South Carolina/1/18 1RD8 0.57 1.10
1930-Swine-H1N1 A/swine/Iowa/30 1RUY 2.38 1.33
1957-Hu-H2N2 A/Japan/305/57 3KU5 3.23 1.76
1968-Hu-H3N2 A/Hong Kong/1/68 2HMG 7.08 1.86
2004-Hu-H5N1 A/Vietnam/1203/04 2FK0 1.52 0.88
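
Values of the kind listed in Table 3 are root-mean-square deviations over equivalent Cα atoms. The sketch below shows only that final arithmetic step, assuming the two structures have already been superposed and their Cα atoms paired one-to-one; the coordinates are made-up toy values and the function name is illustrative, not the software used by the authors.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired coordinate lists.

    Each list holds (x, y, z) tuples in angstroms. The lists must already
    be superposed and ordered so that index i refers to equivalent atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom is displaced by exactly 1 angstrom along z
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, shifted))
```

The superposition itself (fitting on the HA2 Cα atoms, as described in the table legend) is a separate step that this sketch does not attempt.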

Although six asparagine-linked glycosylation sequons are present in the Darwin09 HA monomer, interpretable electron density was observed at only 3 sites in HA1, Asn23, Asn87 and Asn276. At these sites, only one or two N-acetyl glucosamines could be interpreted. Compared to recent seasonal HAs, potential glycosylation sites in the HA1 of the pandemic HA are in comparable positions (Figure 1B and C). Position 87 in the pandemic HA is also a glycosylation site in seasonal HAs and has been a conserved feature since 1918 [7]. On recent H1 HAs, a second site, at Asn54, is in very close proximity to Asn87 and it is not known whether both sites are occupied. Similarly, the pandemic HA also has two potential glycosylation sites at positions 276 and 286, at the bottom of the HA1 that are close together. However, no conclusions can be made from this structure with respect to double occupancy at these positions since density was only observed at position 276 in two of the six chains in the asymmetric unit.

The receptor binding domain The receptor-binding site (RBS) is at the membrane distal end of each HA monomer and its specificity for sialic acid and the nature of its linkage to a vicinal galactose residue determines host range-restriction. As for other HA structures, the Darwin09 RBS is composed of three structural elements: a 190-helix (residues 184-191), a 220-loop (residues 218-225), and a 130-loop (residues 131-135), while other highly conserved residues (Tyr91, Trp150, His180, and Tyr192) form the base of the pocket (Figure 1D).

Interestingly, previously published research highlighted dual receptor specificity for the early pandemic viruses [20]. Using carbohydrate microarray analysis, the authors observed mixed α2-3/α2-6 receptor specificity with two pandemic viruses (California/4/2009 and Hamburg/5/2009), while a seasonal H1N1 virus bound exclusively to α2-6-linked sialosides. Using recombinant HA we can also probe these microarray platforms [4][5][6]. By pre-complexing trimers using primary and secondary antibodies one can overcome the low affinity of HA for its glycan ligand [21] by increasing the valency. Results using recombinant HA revealed a strict preference for five human-type sialyl-glycans, with no significant binding to avian α2-3 receptor analogs. All pandemic recombinant HAs bound to an α2-6 sialylated tri-N-acetyllactosamine glycan in which the two proximal (reducing end) lactosamines are α1-3 fucosylated (glycan #57 in Table 2) as well as to a structurally related long linear α2-6 sialylated di-N-acetyllactosamine (Figure 2, glycan #56). These glycans were detected in N-glycans of cultured human bronchial epithelial cells [22]. Two other structurally diverse glycans, an α2-6 sialylated-sulfated N-acetyllactosamine structure (glycan #41) and the α2-6 sialylated LacNAc (glycans #53 & 54), were also recognized by these HAs. In addition, the proteins in this study bound weakly to α2-6 sialylated bi-antennary glycans (glycans #46-48), which are typically found on membrane glycoproteins [23]. These results were comparable to those for the two seasonal HAs used in the analysis (A/Solomon Islands/3/2006 and A/Brisbane/59/2007 are the two H1N1 components of the 2007-2008, 2008-2009 and 2009-2010 trivalent vaccine), although good binding to the α2-6 sialylated bi-antennary glycans (glycans #46-48) was observed for the Solomon Islands/3/2006 recombinant HA. Thus, these pandemic viruses bind to human type receptors as shown and postulated by previous reports [24][25].
This strict specificity is in contrast to the Childs et al. report [20]. However, these differences can be attributed to the different platforms used as well as the increased valency of the virus, which might enhance binding to weak ligands.

Fig. 2: Glycan microarray analysis of pandemic H1 recombinant HAs.

Recombinant HA proteins of A/Texas/5/2009, A/Darwin/2001/2009, A/Ohio/7/2009 and A/Utah/20/2009 were analyzed and compared to the recent vaccine candidates from seasonal H1 HAs, A/Solomon Islands/3/2006 and A/Brisbane/59/2007. Colored bars highlight glycans that contain α2-3 SA (blue), α2-6 SA (red), mixed α2-6/α2-3 SA (purple), N-glycolyl SA (green), α2-8 SA (brown), β2-6 and 9-O-acetyl SA (pink), and non-SA (grey). Error bars reflect the standard deviation in the signal for six independent replicates on the array. Structures of each of the numbered glycans are found in Table 2.

Genetic and antigenic changes Four antigenic sites for H1N1 virus HAs have been identified (Ca, Cb, Sa, and Sb) [26][27]. In Darwin09, with the exception of Ca, all are exposed for antibody recognition. The Ca site is proximal to the oligosaccharide at HA1 Asn87, which may interfere with antibody recognition of this region. In recent seasonal H1 HAs the Sa site (and possibly Sb) looks to be affected by the presence of two glycosylation sites at positions 125 and 159 (Figure 1D). Lack of these sites in the pandemic HA exposes the entire top of the HA1 for targeting by the immune system and this feature may explain why the antibody recall response to the pandemic vaccine in adults was so effective [28].

Since the pandemic virus first emerged, the majority of viruses have shared a Ser203Thr amino acid change in the HA. This position is near the monomer-monomer interface and the small change in side chain appears not to have had a dramatic effect on the HA structure. Introduction of the extra methylene group in the side chain may help to stabilize the loop region in its surrounding environment (Figure 1A). Currently, two circulating subsets of viruses have amino acid changes, Asp222Glu or Glu374Lys, in the HA. The Asp222Glu mutation is in the receptor-binding site and may modulate which glycans bind to the receptor (Figure 1A). The latter mutation at position 374 is in the HA2 (residue 47) and points into the cavity where the fusion peptide resides in the mature fusion ready form of the HA molecule. Although this mutation may affect stability in this region (Figure 1A), it is also close to a region identified by two recent HA/neutralizing antibody structures which target the stem region of the HA [29][30]. Little is known about the immune response to this region and whether this mutation is able to modulate antibody binding.

Other HA mutations have also been observed that affect antigenicity, but most have been sporadic throughout the year, geographically separated or results of egg growth [31]. In particular, changes at positions 153-157 in the HA have been associated with reduced HI titers with ferret antisera to the A/California/7/09 vaccine virus. In most (if not all) cases, these changes have emerged after virus propagation in cell cultures. The structure highlights this region to be a prominent loop on the top left of the receptor binding site and a component of the Sa (H1) or Site B (H3) antigenic site (Figure 1A and 1B) [27]. In the pandemic H1 HA, this region is exposed to the host immune system and not masked by vicinal glycosylation sites. Although this position is known to affect antigenicity, it does not appear to change receptor binding, as shown by the glycan microarray results for A/Utah/20/2009, which has an Asn156Asp change compared to the other pandemic virus HAs analyzed (Figure 2). Its ability to change easily also highlights this region as a potential ‘hot spot’ for future mutation as the human population gains immunity and the virus experiences increased pressure to evade the immune response.

More recently, there has been a focus on the possible role of a mutation at position 222 in severe clinical outcome [32][33]. The Asp222Gly and Asp222Asn single and mixed variants have been found in pandemic viruses as well as by direct sequencing of clinical specimens collected throughout the 2009 pandemic from approximately 20 countries, including Norway, Mexico, Ukraine and the USA. As already described above, position 222 resides in the receptor binding site of the HA protein and may possibly influence binding specificity. Indeed, the HA from the previous H1N1 pandemic in 1918 switched from avian to human receptor specificity through mutation at two positions (Glu187Asp and Gly222Asp) [5]. (The pandemic virus HA also has an Asp at position 187.) In addition, the A/New York/1/18 strain of the 1918 pandemic possessed a Gly at position 222 and this markedly affected receptor binding, reducing α2-6 preference and increasing weak α2-3 binding [5].

Fig. 3: Glycan microarray analysis of A/Texas/5/2009 mutants.

The effect of position 222 mutations was assessed on the A/Texas/5/2009 framework by mutating the Asp to: A) a Gly and B) an Asn. Graphs are formatted as for Figure 2. C) The receptor binding site of Darwin09 with a 6’-sialyllactosamine (6’-SLN) modeled into the pocket highlights the residues that could contribute to the hydrogen bonding network between the receptor and the HA. Putative hydrogen bond interactions between the glycan and the HA RBS are shown as green broken lines.

To address this question for the 2009 pandemic H1N1 virus, we mutated position 222 on the A/Texas/5/2009 HA to produce variants with either an Asp222Gly or an Asp222Asn mutation. Interestingly, glycan microarray analysis of these mutants revealed an α2-6 binding profile (Figure 3A and 3B) similar to the wild-type A/Texas/5/2009 recombinant HA (Figure 2). However, these mutants also bound weakly to sulfated α2-3 sialylglycans (glycans #4-8) as well as α2-3 and α2-3/α2-6 di-sialoside structures (glycans #9 & 10). Currently, it is unknown if the same profile will be reflected with viruses carrying the same mutations on the glycan microarray or if the increased valency of the virus due to the increased number of HAs on the virus surface will enhance this weak binding. Thus, on the current pandemic HA framework, the effect of these mutations at position 222 on receptor binding appears less dramatic when compared to the 1918 framework since the binding preference for α2-6 sialylglycans is still maintained. Analysis of the RBS of Darwin09 offers a possible reason. The galactose of α2-6-linked receptors can interact via its 3- and 2-hydroxyls through a hydrogen bond network using residues Lys219, Asp222 and Glu224. A loss of Asp222 through mutation might not compromise this network to the same extent as was seen in the 1918 HA framework when the Asp225Gly mutation was introduced [5].


Although a number of mutations have been reported in circulating pandemic H1N1 viruses, they have not affected virus antigenicity and pathogenicity. The use of the Darwin09 structure to analyze the interactions of these HAs with virus receptors highlights the importance of having structural information to aid such analysis. The expression system used here also provides an important route for the safe production of these pandemic proteins on a large scale. Availability of recombinant protein enables its use for downstream applications such as glycan microarray analysis, as described here, reagents for diagnostic kit development or as antigens for antibody production. If this methodology were not available, HA production from the virus would have been difficult at the start of the pandemic, due to stringent biosafety requirements. Rapid determination and dissemination of the pandemic H1N1 hemagglutinin 3-D structure and characterization of its receptor specificity should enable the medical and public health research community to develop improved intervention approaches to control and prevent influenza morbidity and mortality as this virus becomes endemic in human populations.

Competing Interests

The authors have declared that no competing interests exist.

Characterizing the initial diffusion pattern of pandemic (H1N1) 2009 using surveillance data Fri, 12 Mar 2010 04:10:42 +0000


First identified in Mexico and originating from a swine host, the novel human influenza (H1N1) 2009, hereafter referred to as swine flu, swept rapidly across the world after its discovery in human populations in April 2009. In many countries, stringent case-based epidemiologic investigations were introduced, allowing public health authorities to design appropriate interventions for prevention and control. However, the high transmission rates and relatively moderate course of influenza disease did not justify prolonged implementation of such measures. In the US, for example, case reporting was discontinued in July 2009, a change that was later followed in other countries. In retrospect, the accumulation of data in the initial months is contributing to a pool of useful resources for supporting epidemiologic research, which has continued to generate outputs in terms of risk factor analyses, clinical guideline development, and the establishment of public health policy.

In Hong Kong, the first imported swine flu case was reported on 1 May 2009. The first wave peaked in September 2009 with more than 4500 cases recorded per week. By the first week of November 2009, over 30,000 laboratory confirmed cases had been reported to the Department of Health of the Hong Kong Special Administrative Region Government [1]. In the first two months of May and June 2009, investigation of each laboratory confirmed case was performed actively by the Department’s Centre for Health Protection (CHP). The daily statistics, alongside the residence location of each case down to building level, were uploaded on the CHP website. Active case investigation was continued through the end of September, after which mandatory testing of suspected cases stopped. In this study, we set out to explore the diffusion pattern of swine flu by examining all reported cases in the first three months, which therefore represent the very initial spread of the infection in the territory of Hong Kong, home to a population of 7 million and gateway to mainland China. We reckon that an exploration of the routinely collected georeferenced data would allow epidemiologic patterns to be delineated, which would be a useful reference should another epidemic occur in the future.


A database was established to include all laboratory confirmed cases of swine flu reported between 1 May and 31 July 2009 in Hong Kong. Provided by the CHP, the anonymised dataset included the age, gender and building location of each individual. Institutional approval for access to the data was sought from the Department of Health, in compliance with the Personal Data (Privacy) Ordinance. Each textual residential address was transformed to x and y coordinates in Hong Kong Grid 1980. The geocoding led to the creation of a georeferenced dataset with point data for each swine flu patient during the initial spread.

Digital maps were acquired from the Lands Department of the Hong Kong Special Administrative Region Government. Hong Kong is divided into 3 main regions (Hong Kong Island, Kowloon Peninsula and the New Territories), under which there are 18 administrative districts and 400 District Council Constituency Areas (DCCA), the latter being boundaries of subdistricts that have been created for electoral purposes, each with a population of about 17,000. The geographic units at these three levels were used in the study. Statistics of the districts and DCCA at the most recent by-census in 2006 were obtained from the Census and Statistics Department. Microsoft Excel was used for data input. Statistical analysis was performed using Statistical Package for the Social Sciences version 13.0 (SPSS Inc 2004). ArcGIS 9.2 was used for mapping and exploratory spatial analysis. SaTScan™ was used for the detection and characterization of space-time clusters.


General characteristics and distribution of swine flu cases

In the first three months since the diagnosis of the first case, a total of 3675 laboratory confirmed swine flu cases were recorded, of which 3460 (94.1%) could be geocoded. The characteristics of these cases are described in Table 1. Overall, the male-to-female ratio was 1.04:1. Over half of the reported cases (61.2%) were between the ages of 11 and 30. As we did not have the student status of the study population, we defined “students” as those who were likely to be attending kindergartens, primary or secondary schools on a daily basis, that is, those aged between 3 and 20, who amounted to 1925 (55.60%) of the geocoded population. On a spatial scale, the distribution of the study population was heterogeneous, with a reported rate ranging from 30.70 to 91.58 per 100,000 population across the 18 districts (Table 2). Hong Kong Island accounted for 30.43% of all reported cases against a population of 1.55 million, compared with 26.23% (1.74 million population) and 43.32% (3.57 million population) for the 2 other main regions.
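
As a rough illustration of how a regional reporting rate follows from the figures above, the sketch below converts the Hong Kong Island share of cases into a rate per 100,000 population. It assumes the 30.43% share applies to the 3460 geocoded cases, which the text does not state explicitly; the function and variable names are illustrative.

```python
def rate_per_100k(cases: float, population: float) -> float:
    """Reported cases per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


GEOCODED_CASES = 3460             # geocoded cases, from the text
HK_ISLAND_SHARE = 0.3043          # share of reported cases, from the text
HK_ISLAND_POPULATION = 1_550_000  # population figure quoted in the text

# Assumption: the share is taken over the geocoded cases
hk_island_cases = HK_ISLAND_SHARE * GEOCODED_CASES
hk_island_rate = rate_per_100k(hk_island_cases, HK_ISLAND_POPULATION)
print(f"Hong Kong Island: about {hk_island_rate:.1f} cases per 100,000")
```

Under that assumption the result falls inside the district-level range of 30.70 to 91.58 per 100,000 reported in the text.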

Table 1. Characteristics of geocoded swine flu cases reported to the Department of Health of Hong Kong, between 1 May and 31 July 2009

Almost all DCCAs (394, 98.4%) reported at least one case over the three-month period. However, the total number of cases reported in each DCCA varied considerably. Cumulatively, only 19 DCCAs (4.75%) reported over 20 cases. The weekly number of cases reported also varied geographically, as shown in Table 2. The increase in reports was more notable after the 5th week, at around the same time that local (within Hong Kong) spread was documented. The distribution of all reported cases is shown in Figure 1, against the background of the by-census population for 2006. The number of case reports did not appear to correlate with population density.

Fig. 1: Map showing the overall distribution of reported swine flu cases (student and non-student) in the first three months of the epidemic

Table 2. Weekly reports of swine flu cases from District Council Constituency Areas (DCCA), 1 May to 31 July 2009

Spatio-temporal diffusion patterns

Figure 2(a) shows the diffusion of swine flu cases using an inverse distance weighting (IDW) model for interpolation. Uninhabitable areas including the harbour and mountains were excluded in the implementation of the model. The 3-month duration was divided into 9 time periods: the first covering 1 May through the end of the 5th week, during which local transmission had not yet been documented, followed by 8 weekly periods. These are shown as different shade intensities on the map. While the initial cases were more localized to Hong Kong Island, separate foci could be seen in the two other regions. SaTScan™ was applied to further characterize the spatio-temporal patterns. Using the Space-time Permutation Model, and with the implementation of a 50% spatial and temporal window, 6 clusters could be identified (p<0.05). The results are shown in Figure 2(b) and Table 3. On a temporal scale, only the primary cluster on Hong Kong Island extended from the very early phase through week five and beyond, an indication that initial local spread had evidently occurred there. The Hong Kong Island primary cluster was temporally linked to the primary clusters in Kowloon and the New Territories. The 2 Kowloon secondary clusters also followed the Hong Kong Island primary cluster in time, while the secondary cluster in the New Territories could have been one of the initial source foci, alongside the Hong Kong Island primary cluster.
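
Inverse distance weighting estimates the value at an unsampled location as a weighted mean of the observed points, with weights proportional to 1/d^p. The sketch below is a generic pure-Python version with power p = 2. The study used ArcGIS, and its exact settings (power, search neighbourhood, exclusion mask) are not given here, so the parameter choice and the sample points are assumptions for illustration only.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples is an iterable of (sx, sy, value) tuples. A query point that
    coincides with a sample returns that sample's value directly.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("at least one sample is required")
    return numerator / denominator


# Hypothetical case counts at two grid locations (not study data)
toy_samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_estimate = idw(1.0, 0.0, toy_samples)
print(midpoint_estimate)
```

A larger power makes the surface follow the nearest observations more closely, while a smaller power produces a smoother map.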

Fig. 2: Display of the temporo-spatial patterns of swine flu spread in Hong Kong: (a) Diffusion of all cases by interpolation using inverse distance weighting, after exclusion of non-inhabitable areas, demonstrating the situation through 9 time intervals, beginning 1 May 2009.

(b) Demonstration of temporo-spatial clusters of all cases, student cases and non-student cases using SaTScan™, with analyses made separately in the three regions of Hong Kong Island, Kowloon Peninsula and the New Territories.

Temporo-spatial distribution was explored in the three main geographic regions after separating all cases into students and non-students (Figure 2(b)). Four otherwise unidentified clusters were revealed. Two of these were student clusters: one in the New Territories (District R) and the other on Hong Kong Island (District C), the latter partially overlapping with and temporally following the primary all-cases cluster. The other two were small non-student clusters, one overlapping spatially with a student cluster in District J in Kowloon but preceding it by one month, the other temporally following and spatially overlapping with an all-cases cluster in the same region (District G). Exploration of non-student cases did not reveal the presence of any additional unique clusters. Interpolation by IDW of non-student cases resulted in scattered foci throughout the territory, while that for students looked remarkably similar to the all-cases map.
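
The clusters described above come from SaTScan™'s space-time permutation model, which compares the observed count in a space-time window with the count expected if the spatial and temporal distributions of cases were independent. The sketch below shows only that expected-count step for individual area-period cells, using toy counts. The area and period labels are hypothetical, and the scanning over cylinders and the Monte Carlo significance test performed by SaTScan™ are omitted.

```python
from collections import defaultdict


def expected_counts(counts):
    """Expected cases per (area, period) cell under space-time independence.

    counts maps (area, period) -> observed cases. The expected value for a
    cell is (cases in that area) * (cases in that period) / (all cases).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cases supplied")
    by_area = defaultdict(int)
    by_period = defaultdict(int)
    for (area, period), n in counts.items():
        by_area[area] += n
        by_period[period] += n
    return {
        (area, period): by_area[area] * by_period[period] / total
        for area in by_area
        for period in by_period
    }


# Toy data: area "A" is over-represented in period 1, area "B" in period 2
observed = {("A", 1): 8, ("A", 2): 2, ("B", 1): 2, ("B", 2): 8}
expected = expected_counts(observed)
ratio_a1 = observed[("A", 1)] / expected[("A", 1)]
print(expected[("A", 1)], ratio_a1)
```

A cell or window whose observed count is well above its expected count is a candidate cluster; whether it is statistically significant has to be judged by permutation, which this sketch leaves out.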

Table 3. Spatio-temporal clusters identified in three geographic regions in Hong Kong using SaTScan™

Using results from SaTScan™ and IDW, the paths of diffusion of swine flu cases were reconstructed, as shown in Figure 3. It can be seen that there were multiple foci of spread over this initial period of three months. The earlier foci on Hong Kong Island and Kowloon Peninsula could be related to one another, and were followed by 3 other foci in the New Territories, including Lantau Island in the west. At the end of the three-month period, the distribution of swine flu cases remained scattered.

Fig. 3: Diffusion of swine flu cases reconstructed from data obtained from temporo-spatial exploration depicted in figure 2.


The swine flu pandemic is now a well-characterized condition, in biologic, clinical and epidemiologic terms. [2] From the public health perspective, spatial diffusion of swine flu has, however, remained a less commonly appreciated phenomenon. Diffusion can be described as “the dynamics by which a phenomenon originally located at one point becomes transferred to another.” [3] The concept of diffusion is not restricted to public health but has also been examined in social contexts, like urbanization and social movements. [4][5] In infectious disease epidemiology, diffusion is an important concept as it depicts the dynamics of the spread of the microorganism in time and space. Surprisingly, diffusion has been systematically examined only occasionally, in a limited number of infectious disease outbreaks, for example, measles, [3] cholera, [6] pertussis, [7] poliomyelitis, [8] typhoid [9] and H5N1 avian influenza. [10] The emergence of swine flu has provided a unique opportunity to study the diffusion of influenza virus, as has been reported on a global scale, [11] but not at national levels. Very often, the epidemiology of swine flu in a country is described as a snapshot overview of the condition without further analysis of its dissemination within national boundaries. In the development of national strategy, it is crucial to understand the dynamics of spread within a country so that interventions can be prioritized and rolled out strategically, at the right time and place. It is against this background that the study was conceptualized when swine flu hit Hong Kong about a year ago.

Our study showed that swine flu did not spread swiftly across the territory in the initial phase after the first case was discovered. At the end of the first three-month period, one fifth of the DCCAs, each with a similar population size, had reported no more than 5 cases. A combination of SaTScan™ and IDW methodologies has enabled us to highlight six initial foci with spatial diffusion on Hong Kong Island and Kowloon Peninsula, the highly urbanized regions in Hong Kong. There were smaller temporo-spatial clusters of infections beyond these densely populated areas. The relatively “slow” pace of spread supports the observations reported by other researchers that airborne transmission of swine flu was lacking. [12] The virus has presumably permeated through the population via close person-to-person contacts. The mapping of residence locations of all reported cases is therefore a valid investigative approach. Interactions within households and with neighbours in close spatial proximity are likely to have underlain swine flu diffusion in the population. The slow diffusing pattern is in line with the relatively low basic reproductive number of 1 to 3, [13][14][15] compared to that of measles, an airborne virus. Interestingly, household transmission, though important, may in fact be less efficient than for seasonal influenza, [16][17] though this remains to be confirmed. By separating the geocoded cases into students and non-students, we further determined that student infections tended to be more clustered. If all student cases were excluded, the connectivity among swine flu cases in the community became very loose and might not have led to the subsequent epidemic. Students were therefore likely to be the main virus disseminators across Hong Kong, as has been reported in other countries for swine flu [18][19] and seasonal influenza. [20]

Our study carries certain limitations. Firstly, the data were drawn from a notification system, and therefore under-reporting was unavoidable. Specifically, the milder cases could have been missed, while diagnoses of other patients might have been omitted if sampling was not performed in time. The date of diagnosis of each person likewise varied considerably from one health service setting to another, an artifact which might affect the precision of any time-space analyses. However, the high volume of reported cases and the meticulous case-based investigation process introduced by the Government have served to minimize the problems arising therein. Secondly, there was no perfect or comprehensive tool for characterizing infection diffusion. In our study, IDW turned out to be a useful model for assessing the locations of initial foci as well as the directions of diffusion, whereas SaTScan™ offered a mechanism for defining critical masses of cases in time and space. The final model, albeit a crude one, can be adapted to enable rapid assessment of infection diffusion when a virus or other microorganism is introduced into a population. The assessment has relied on the use of regularly collected data, their processing, and the application of standard GIS tools for depicting the pattern of temporo-spatial diffusion. The results can be used for prioritizing the implementation of vaccination programmes, as vaccines are often in short supply in the initial phase. Our results lend support to the high heterogeneity of the pattern of swine flu diffusion, which is closely associated with population structure and mobility. [21] In practice, the diffusion pattern of an infection should be routinely delineated in a locality as an integral component of public health responses to any infectious disease threat.

Competing interests

The authors have declared that no competing interests exist.

Evolutionary pattern of pandemic influenza (H1N1) 2009 virus in the late phases of the 2009 pandemic. Thu, 04 Mar 2010 04:47:25 +0000


The 2009 influenza A(H1N1)v pandemic virus emerged following a recent reassortment event between swine strains [1][2]. Its jump into the human population has been tentatively dated back to the beginning of the year [3][4], and very early in its history the new virus could be differentiated into clades or, as later defined, clusters [5]. The significance of these findings is not clear, both in terms of a possible evolutionary pathway of the pandemic virus and in terms of pathogenicity. The early data showed that clade 7 (as in ref. 4, or cluster 2 in ref. 5) appeared in New York a few weeks after clades 1 and 2 were isolated in Mexico and California, and originated late in March, but all clades were reported to co-circulate in all continents thereafter. After September, a second, more intense peak involved most temperate countries in the Northern Hemisphere. However, viral sequence information on this second outbreak is relatively scant, and no clear trend in viral evolution has been outlined yet. In Italy most clades were circulating in the first months of the pandemic, when the great majority of the infections were imported by travellers (mostly from North and South America) who had become infected abroad. As in most European countries, a second, more intense wave of infections occurred in Italy during the period October-November 2009. Unlike the first epidemic peak of imported infections, this peak was powered by the rapid local spread of the virus (which had been circulating at low intensity during the whole summer) in children and adolescents (and their contacts) due to the opening of schools, kindergartens and other communities after the summer holidays.


We examined the nucleotide sequences (amplified from nasal swabs) of the hemagglutinin (HA, bases 440-828 of the coding sequence) and neuraminidase (NA) genes (variable length) from 19 and 23 influenza A(H1N1)v strains, respectively, isolated in the city of Rome in the period May-August 2009. At position 658 (from the start codon) of HA the frequency of T was 63% (12/19), while the frequency of A (signature of clade 7/cluster 2 virus) was 37%. These percentages were similar to those deduced from 589 HA sequences isolated globally (mostly in Mexico and in the United States) before August [5]. Viral strains isolated after September can be considered genuinely representative of the local evolution of the pandemic. The complete or partial HA and NA nucleotide sequences of an additional 43 influenza A(H1N1)v isolates were obtained. The frequency of the signature HA nucleotide 658 variants in those later isolates was 0% and 100% for T and A, respectively, documenting the disappearance of other clades in favour of clade 7. Among these sequences, no mutations considered to be biologically significant were detected, including D222G/N (position 239 from the start codon in the H1N1 2009 pandemic virus) in HA and oseltamivir resistance mutations in NA, although most patients had been treated. Fig. 1a and 1b show Neighbour-Joining phylogenetic trees (with bootstrap test) of HA and NA sequences, respectively, isolated at INMI from April to December 2009 (with the month of collection indicated) and compared to representative sequences of the initial pandemic from North America in April.
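The signature-nucleotide count described above can be illustrated with a minimal sketch. It assumes aligned, gap-free HA fragments that all start at coding-sequence position 440; the function names and the placeholder fragments are hypothetical and are not the authors' pipeline or data.

```python
from collections import Counter

def signature_frequencies(fragments, position, fragment_start):
    """Return {base: frequency} at a 1-based coding-sequence position.

    Assumes every fragment is aligned, gap-free, and begins at
    coding-sequence position `fragment_start` (1-based).
    """
    index = position - fragment_start  # 0-based index inside the fragment
    counts = Counter(seq[index].upper() for seq in fragments)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}

def toy_fragment(base, position=658, start=440, end=828):
    """Hypothetical placeholder fragment carrying `base` at `position`."""
    return "N" * (position - start) + base + "N" * (end - position)

# Toy data mirroring the reported counts: 12 isolates with T, 7 with A.
fragments = [toy_fragment("T")] * 12 + [toy_fragment("A")] * 7
freqs = signature_frequencies(fragments, position=658, fragment_start=440)
print({base: round(f, 2) for base, f in freqs.items()})  # {'T': 0.63, 'A': 0.37}
```

With real data, the fragments would first have to be aligned to the HA coding sequence so that the index arithmetic holds.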

Fig. 1: 1a. Phylogenetic tree of partial HA sequences from this study (in blue), indicating the month of collection, in comparison to sequences from North America collected in April.

Clade 7 and the HA D222E subclade are indicated. Four additional HA D222E sequences (collected after June) from the USA and Sweden, clustering with sequences from Rome, are indicated by green and orange arrows, respectively. The sequences from this study are indicated by the names deposited in GenBank, abbreviated (Italy/xxx or Rome/xxx).

1b. Phylogenetic tree of complete (with a few exceptions) NA sequences from this study (in blue), indicating the month of collection, in comparison to sequences from North America collected in April. Clade 7 and the HA D222E subclade are indicated. NA sequences from the same HA D222E isolates as in Fig. 1a from the USA and Sweden are indicated by green and orange arrows, respectively. Blue arrows indicate early clade 7 sequences from Rome clustering with non-clade 7 sequences. The sequences from this study are indicated by the names deposited in GenBank, abbreviated (Italy/xxx or Rome/xxx).

The HA tree confirms the clade shift during the summer. Notably, a separate subclade of clade 7 consists of 12 sequences with the D222E substitution, which has been found at higher frequency in Italy, Turkey and Sweden, and whose biological meaning is still unknown. In contrast, a phylogenetic tree comparing NA sequences (full-length with a few exceptions) shows co-clustering of most clade 7 sequences with those from other clades until June, followed by progressive divergence of the late clade 7 Italian sequences from early New York sequences. By the end of June, clade 7 sequences from Rome isolates were clustering in three distinct subclades of clade 7. One of these was apparently the most successful, as it was associated with the large autumn wave of infections. NA sequences from viruses bearing HA D222E cluster together (including those from the USA and Sweden), indicating that this mutation did not appear by convergent evolution of different strains but rather may represent a signature of an authentic subclade within clade 7 sequences.

To establish whether the significant clade shift was due to the local epidemic rather than a global phenomenon, and to identify the time course and the local trends of this evolution in different parts of the world, the total set of HA sequences downloadable from the GISAID database (December 2009) was divided by country (from countries with a reasonable number of sequences available over a period of at least three months) or geographical area and by month, and analyzed at the signature nucleotide 658 in the HA sequence. There are a few limitations to such a database analysis: 1) the geographical origin of the isolates does not necessarily indicate the actual origin of the infection, but merely the origin of the infected person, especially for early European and Asian isolates; 2) the collection times have been submitted with a variable degree of precision and some might be unreliable; 3) the majority of isolates were sequenced in the first months of the epidemic, while very few sequences of later isolates have been published. Despite these limitations, the pattern of cluster substitution appears quite clearly everywhere. Fig. 2 shows the increasing proportion of clade 7 sequences in different countries from April to December. These patterns are completely superimposable if other signature nucleotides [4][5] of clade 7/cluster 2 virus are analyzed (not shown): NS position 367, MA positions 492 and 600, PB2 position 2163, confirming the stability of the clade and the absence of late reassortment events for these segments.
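The stratified analysis described above amounts to grouping sequences by area and month and computing, within each stratum, the share that carries the clade 7 signature base. The following is a rough sketch under stated assumptions: the record layout, the month labels, the short toy sequences and the function name are all hypothetical, and no GISAID data or download interface is involved.

```python
from collections import defaultdict

def clade7_proportions(records, index, signature="A"):
    """Percentage of sequences with the signature base, per (area, month).

    records: iterable of (area, month, sequence) tuples, where each
    sequence is aligned so that `index` (0-based) is the signature site.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for area, month, seq in records:
        key = (area, month)
        totals[key] += 1
        if seq[index].upper() == signature:
            hits[key] += 1
    return {key: 100.0 * hits[key] / totals[key] for key in totals}

# Made-up toy records; the signature site is index 2 of each short string.
records = [
    ("Italy", "2009-05", "GCTAG"),
    ("Italy", "2009-05", "GCAAG"),
    ("Italy", "2009-10", "GCAAG"),
    ("Italy", "2009-10", "GCAAG"),
    ("USA", "2009-04", "GCTAG"),
]
for key, percent in sorted(clade7_proportions(records, index=2).items()):
    print(key, f"{percent:.0f}%")
```

Strata with very few sequences give unstable percentages, which is consistent with the authors' decision to restrict the analysis to countries with a reasonable number of sequences or to aggregate by geographical area.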

Fig. 2: The increasing proportion of clade 7 sequences in different countries from April to December.

Proportion (%) of clade 7 sequences (HA) for each month in different countries or geographical areas.

The time frame of this phenomenon differed by country. In the Southern Hemisphere (Oceania and South America) clade 7 was already predominant by the end of April, reflecting perhaps the faster spread of the epidemic in the Southern (winter) part of the globe. In Singapore, China and the Scandinavian countries the shift occurred a few weeks later, while in the majority of Northern Hemisphere countries it occurred mostly between June and July. The apparent fall of clade 7 sequences from April to May 2009 in the USA, UK and Canada might be considered an artefact due to the overrepresentation of sequences from the New York clade 7 outbreak in April, which ignited the boom of scientific and public interest. The somewhat erratic behaviour of South American sequences (mostly from Brazil, Chile and Argentina) can be attributed to the fact that the aggregation of data from different countries with such a huge North-South extension and with different timing in the epidemic peaks (necessary because of the low number of available sequences) might not be reliably representative of the epidemic in the whole area. The selection of clade 7 virus in Italian isolates (data from our lab, n=66, aggregated with those published by other labs, n=133) appears to have occurred slightly earlier than in other Northern Hemisphere countries such as Mexico, the USA, Japan and Spain. In the USA and in Japan a few sequences other than clade 7 were apparently collected as late as October/November.

While the dynamics of the epidemic suggest a possible selective advantage of clade 7 virus over other early clades, the evolutionary trends within clade 7 need further investigation. One interesting analysis consists in the quantification of the selective pressure acting on the single genetic segments of the virus. For this purpose, 2 groups of clade 7 sequences (for each genomic segment) were selected, collected respectively in May (mostly from the initial New York outbreak) and in November (from all parts of the world, after the October peak). May sequences were randomly selected to match the much smaller number of November sequences. The selective pressure acting on each segment was computed as the ratio between the rate of nonsynonymous substitutions per nonsynonymous site and the rate of synonymous substitutions per synonymous site (Ka/Ks) using the Nei and Gojobori substitution model ([6] with the Jukes-Cantor correction) implemented in the MEGA 4.0 package [7]. The Ka/Ks values suggest that a strong purifying selection was active on all segments, in agreement with previous findings for this and other influenza viruses [1][8]. In particular, purifying selection was extreme (<0.1) on NP, MP, PA and PB1, and moderate (>0.2) on NS and HA. To determine whether this evolutionary pattern changed during the course of the pandemic, and to identify the period of the maximum positive selective pressure on the virus, the ratio of average pairwise Ka and Ks was computed separately within the May and the November groups of sequences (again for each segment), while the evolutionary step during the period May-November (which encompassed the greatest number of infections for most countries) was analyzed as the ratio of average pairwise Ka and Ks between each May and each November sequence (Fig. 3a).
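The within-group and between-group comparison described above can be sketched as follows. This is only an illustration of the averaging scheme: the study used MEGA 4.0, and the `pairwise` callable here is a hypothetical stand-in for whatever routine supplies the Ka and Ks of one sequence pair. The sequence labels and numeric values are made up.

```python
from itertools import combinations, product

def ratio_of_averages(pairs, pairwise):
    """Average Ka divided by average Ks over the given sequence pairs.

    pairwise(a, b) is assumed to return a (Ka, Ks) tuple for one pair.
    Raises ZeroDivisionError if no pairs are given or all Ks are zero.
    """
    ka_sum = ks_sum = 0.0
    n = 0
    for a, b in pairs:
        ka, ks = pairwise(a, b)
        ka_sum += ka
        ks_sum += ks
        n += 1
    return (ka_sum / n) / (ks_sum / n)

def within_group(group, pairwise):
    """All unordered pairs drawn from a single group."""
    return ratio_of_averages(combinations(group, 2), pairwise)

def between_groups(group_a, group_b, pairwise):
    """Every sequence of one group paired with every sequence of the other."""
    return ratio_of_averages(product(group_a, group_b), pairwise)

# Hypothetical labels and made-up (Ka, Ks) values, for illustration only.
may = ["may_1", "may_2", "may_3"]
november = ["nov_1", "nov_2", "nov_3"]

def toy_pairwise(a, b):
    same_month = a[:3] == b[:3]
    return (0.001, 0.004) if same_month else (0.002, 0.020)

print(within_group(may, toy_pairwise))
print(between_groups(may, november, toy_pairwise))
```

Taking the ratio of the averages, rather than averaging per-pair ratios, avoids division by zero for closely related pairs in which no synonymous substitution has occurred.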

Fig. 3: Average Ka/Ks values and nucleotide distances within and between the same set of sequences.

A) Average Ka/Ks values within May sequences, between May and November sequences and within November sequences, for each genomic segment. B) Nucleotide distances within and between the same set of sequences. For each segment, the number of sequences analyzed in the May group, which matches those in November, is indicated.

Two different patterns can be observed: for the PB1, PA and NA segments, Ka/Ks remained very low and constant for the whole period; by contrast, for NS, HA, PB2, NP and MP, Ka/Ks, although highly variable among segments, showed a common decreasing pattern. The dramatic reduction from May to November suggests that the strongest “positive” selective pressure acted on these segments before May. By contrast, the genetic distance values (Fig. 3b) for the same sets of sequences indicate a progressive increase over time. Taken together, these findings suggest that the October-November peak of infections, which occurred in the Northern Hemisphere, did not impose any significant positive selective pressure on clade 7 virus, but only a random genetic drift under strict purifying selection conditions.


Our study demonstrates that pandemic (H1N1) 2009 virus has evolved worldwide, shifting from an initial mixed clade pattern to the predominance of one clade (clade 7) during the course of the pandemic. The virus constituting this clade was therefore responsible for most of the pandemic burden worldwide. After its origin, which remains obscure, clade 7 virus has been subjected to strong purifying selection, with the exception of the earliest phases of its evolution, behaving later as a well-fit virus, similar to viruses circulating in swine or seasonal influenza in humans. Interestingly, the highest Ka/Ks values were associated with HA and NS, key proteins for virus-host interactions, suggesting adaptation to the new host species. As yet, no pathogenetic correlate of this evolution has emerged, since no clear trend in the clinical aspects could be observed between the early and the late peaks of the epidemic [9]. Neither was a clear clinical impact demonstrated for the HA variants which occurred on clade 7: D222G/N or D222E (WHO report, 28th December 2009). The hypothesis that clade 7 virus enjoyed a marked advantage, in terms of transmissibility, over other early clades is intriguing, but has yet to be demonstrated.

Materials and Methods

Patients and samples

Patients with febrile respiratory illness from the southern half of Rome were referred to the National Institute for Infectious Diseases (INMI) “L. Spallanzani” for diagnosis and treatment. All patients underwent nasal and pharyngeal swab sampling, and both swabs were stirred in the same tube containing RPMI 1640 tissue culture medium with antibiotics. Approximately half of the sequences in this study were obtained from patients with severe respiratory syndromes; the other half were from randomly chosen patients with mild symptoms.

RNA extraction, amplification and sequencing

Nucleic acids were extracted from the swab fluid by an automated procedure (Biorobot MDx, Qiagen, Hilden, Germany) and amplified by in-house methods using the One-Step qRT-PCR system (Invitrogen, Carlsbad CA, USA) to yield partial or full-length sequences of HA and NA. Sequencing was performed on an automated ABI Prism 3130 instrument (Applied Biosystems, Foster City CA, USA) using BigDye 3.1 cycle sequencing kits provided by the same manufacturer. All sequences have been deposited in GenBank with the following accession numbers: from CY052070 to CY052092 and from CY055309 to CY055414.

Phylogenetic and selective pressure analysis

The sequences were aligned by the Clustal algorithm. Phylogenetic analysis was performed using the MEGA 4 package [7]. The Neighbor-Joining phylogenetic trees (300 bootstrap replicates) were generated using the Tamura 3-parameter distance option. The selective pressure acting on each segment was computed using the Nei and Gojobori substitution model ([6] with the Jukes-Cantor correction): Ka/Ks values were calculated as the ratio between the average rate of nonsynonymous substitutions per nonsynonymous site of all pairwise comparisons (average Ka), and the average rate of synonymous substitutions per synonymous site of all pairwise comparisons (average Ks). The genetic distances between sequences were calculated by the Tamura 3-parameter method.
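For orientation, the quantities named in this section can be written out in their standard textbook forms. The notation is an assumption introduced here and does not come from the original text: $S_d$ and $N_d$ are the estimated numbers of synonymous and nonsynonymous differences between two sequences, $S$ and $N$ are the corresponding numbers of sites, $m$ is the number of pairwise comparisons, $P$ and $Q$ are the proportions of transitional and transversional differences, and $\theta$ is the G+C content. The exact options applied inside MEGA 4 may differ in detail.

```latex
\begin{align*}
% Nei-Gojobori proportions for one pair of sequences
p_S &= \frac{S_d}{S}, & p_N &= \frac{N_d}{N} \\
% Jukes-Cantor correction for multiple substitutions
K_s &= -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_S\right), &
K_a &= -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_N\right) \\
% Ratio of averages over all m pairwise comparisons
\frac{\overline{K_a}}{\overline{K_s}} &=
\frac{\frac{1}{m}\sum_{i=1}^{m} K_a^{(i)}}{\frac{1}{m}\sum_{i=1}^{m} K_s^{(i)}} \\
% Tamura 3-parameter distance
d &= -h\,\ln\!\left(1 - \frac{P}{h} - Q\right)
     - \tfrac{1}{2}\,(1 - h)\,\ln\!\left(1 - 2Q\right), &
h &= 2\theta(1 - \theta)
\end{align*}
```

The Jukes-Cantor correction is defined only for proportions below 3/4, a condition easily met by the closely related sequences compared here.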

Competing interests

The authors have declared that no competing interests exist.
