Previous human influenza pandemics resulted from viruses emerging from non-human reservoirs, with at least two caused by strains of mixed human and avian origin. In addition, swine influenza viruses have reportedly infected humans in many cases, including the recent human H1N1 strain isolated in Mexico and the United States. Pigs are documented to be infected by human, avian, and swine viruses and to allow their productive replication; it has therefore been conjectured that pigs are the “mixing vessel” that creates the reassortant strains responsible for human pandemics. In this paper, we apply several statistical techniques to an ensemble of publicly available swine virus sequences to study the reassortment phenomenon. The reassortment patterns in swine viruses confirm previous results found in human viruses that the glycoprotein-coding segments reassort most often. Moreover, one of the polymerase segments (PB1), which reassorted in the strains responsible for the last two human pandemics of 1957 and 1968, also reassorts frequently.
Pandemics are epidemics that spread rapidly on a worldwide scale. They are caused by pathogens against which humans have no immunity, which infect a large part of the population and lead to serious illness. Human influenza pandemics are caused by influenza viruses emerging from non-human reservoirs. Of the three influenza pandemics of the twentieth century, the 1918 pandemic was possibly caused by an influenza virus with an avian origin
There also have been many cases of swine influenza viruses infecting humans
Influenza A virus can be found in humans and a variety of animals, with aquatic birds considered its main reservoir. Influenza viruses do not usually transmit between different hosts. However, pigs are documented to be infected with avian and human viruses, in addition to swine viruses. Furthermore, multiple reassortment events are found to happen under natural conditions
The influenza A virus genome consists of eight single-stranded RNA segments that code for eleven known proteins. The PB2, PB1, and PA segments encode the RNA polymerase, and HA, NP, NA, and M encode hemagglutinin, nucleoprotein, neuraminidase, and the matrix proteins, respectively. Two distinct non-structural proteins are also encoded by the NS segment. The subtypes of influenza A viruses are determined by their antigenic surface glycoproteins, hemagglutinin and neuraminidase. Hemagglutinin binds to α2,3-galactose- and α2,6-galactose-linked sialic acids. The former is preferred by avian viruses and the latter by human viruses. However, both are present on the surface of the tracheal epithelium in pigs, making pigs susceptible to both avian and human viruses.
In addition to the genomic drift of influenza A virus that is caused by the high error rate in the process of replication of its genome, and the antigenic pressure on the HA and NA segments, the evolution of the virus is shaped by the reassortment process. When two different strains of influenza virus co-infect the same cell, new virions can be created that contain a mix of segments from both original strains. This phenomenon was responsible for the 1957 pandemic when the human H1N1 strain that had been circulating since 1918 reassorted to become a human H2N2 strain with new PB1, HA, and NA segments of avian origin
Swine classical H1N1 strains have been circulating in pigs since the human influenza pandemic in 1918 and were the dominant strains in the United States until 1998, when two new swine H3N2 strains were identified. These new strains were the results of a double reassortment of swine classical H1N1 with the PB1, HA, and NA segments from a human H3N2 strain, and a triple reassortment of swine classical H1N1, with the PB1, HA, and NA segments of a human H3N2 strain and the PB2 and PA segments of avian lineage
In this paper, we employ the temporally and geographically diverse information deposited in the Influenza Virus Resource of the National Center for Biotechnology Information
To compare the diversity within the segments of swine influenza A virus, we use strains deposited in the Influenza Virus Resource of the NCBI that have all eight segments completely sequenced. We include 150 strains, comprising 99 H1N1, 25 H1N2, 23 H3N2, and 3 H3N1 strains (see Appendix). For each segment, we align the sequences of the coding regions using the Smith-Waterman algorithm and calculate the normalized Hamming distances at the third codon positions only, to eliminate the effects of evolutionary pressure due to positive selection. For the M and NS segments, we consider only the coding regions of the M1 and NS1 genes, as they are the longest and the most frequently sequenced sections of the M and NS segments. Because homologous recombination is very rare or absent in influenza viruses
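The distance used throughout can be illustrated with a minimal sketch. It assumes the two coding sequences have already been aligned in frame and have equal length; the function names and the choice to skip gap (`-`) and ambiguous (`N`) columns are assumptions of this illustration, not details given in the text.

```python
def third_codon_positions(coding_seq: str) -> str:
    """Return every third nucleotide of an in-frame coding sequence."""
    return coding_seq[2::3]


def normalized_hamming(seq_a: str, seq_b: str) -> float:
    """Fraction of compared third-codon positions that differ.

    Both inputs are assumed to be aligned, in-frame coding sequences of
    equal length. Columns with a gap ('-') or an ambiguous base ('N') in
    either sequence are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    pairs = [
        (x, y)
        for x, y in zip(third_codon_positions(seq_a.upper()),
                        third_codon_positions(seq_b.upper()))
        if x not in skip and y not in skip
    ]
    if not pairs:
        raise ValueError("no comparable third-codon positions")
    # Count mismatches and normalize by the number of compared positions.
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy example: three codons, two of which differ at the third position.
print(normalized_hamming("ATGGCATTT", "ATGGCGTTC"))
```

Restricting the comparison to third codon positions keeps mostly synonymous changes, which is the motivation stated above for reducing the influence of positive selection.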
To measure the diversity of a segment i, we calculate D_i, Rao’s quadratic entropy:
\[ D_i = \frac{1}{N^2} \sum_{a=1}^{N} \sum_{b=1}^{N} d^i_{ab} \]
where N is the total number of strains in the dataset and d^i_{ab} is the Hamming distance between strains a and b at the third codon positions of their corresponding segment i. We estimate the confidence intervals for the diversity measurements via 1000 bootstrap re-samplings of the dataset.
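As a rough sketch of this diversity measurement, the code below computes Rao’s quadratic entropy for one segment from a strain-by-strain distance matrix and estimates a percentile bootstrap interval by resampling strains with replacement. It assumes the equal-weight form D_i = (1/N^2) Σ_{a,b} d^i_{ab}; the function names, the nested-list matrix layout, and the fixed random seed are illustrative assumptions.

```python
import random


def rao_quadratic_entropy(dist, indices=None):
    """Equal-weight Rao's quadratic entropy: (1/N^2) * sum over a, b of d_ab.

    `dist` is a symmetric N x N matrix (list of lists) of distances for one
    segment; `indices` optionally selects (possibly repeated) strains.
    """
    idx = list(range(len(dist)) if indices is None else indices)
    n = len(idx)
    total = sum(dist[a][b] for a in idx for b in idx)
    return total / (n * n)


def bootstrap_interval(dist, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling strains with replacement."""
    rng = random.Random(seed)
    n = len(dist)
    estimates = sorted(
        rao_quadratic_entropy(dist, [rng.randrange(n) for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 + level) / 2 * n_boot))]
    return lower, upper


# Toy distance matrix for three strains.
toy = [[0.0, 0.1, 0.2],
       [0.1, 0.0, 0.3],
       [0.2, 0.3, 0.0]]
print(rao_quadratic_entropy(toy))
print(bootstrap_interval(toy, n_boot=1000))
```

Resampling whole strains, rather than individual distances, keeps the dependence between all pairs that share a strain.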
To find the possible reassortant strains, we primarily follow the method introduced by Rabadan et al. (2008), which was initially applied to complete sequences of human influenza A strains
The colors show the logarithm of the cumulative probability for each point; points with a cumulative probability of less than 10^-7 indicate possible reassortment events. Left: The results from the 150 strains in the dataset, where there are candidates for reassortment events in both PB2 and PB1. Right: The results when the dataset is limited to the classical H1N1 strains isolated in the 1970s, 1980s, and 1990s, where there are distinctly more candidates for reassortment events in PB1.
Given two strains a and b and two segments i and j, the probability of obtaining Hamming distances equal to d^i_{ab} and d^j_{ab} by random chance alone is given by the hypergeometric distribution:
\[ P(d^i_{ab}, d^j_{ab}) = \frac{\binom{L_i}{d^i_{ab}} \binom{L_j}{d^j_{ab}}}{\binom{L_i + L_j}{d^i_{ab} + d^j_{ab}}} \]
where L_i and L_j are the respective lengths of the segments divided by three. Hence, fixing the total distance between segments i and j of the two strains, the probability of observing a distance of at most d^i_{ab} in segment i is the cumulative of the hypergeometric distribution. Under the assumption of similar average substitution rates at the third codon positions in all segments, the lower the cumulative probability in this model, the more likely it is that the two segments do not have a common ancestor. To correct for multiple hypothesis testing, for every two segments of each strain we generate 100 pairs of segments by randomly permuting their third codon positions. We observe that the cumulative probabilities for distances of pairs from the generated data are at least 10^-7. Thus, a cumulative probability of at most 10^-7 for two given segments of two strains indicates a reassortment event.
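The cumulative probability described here can be sketched as a lower tail of the hypergeometric distribution, with the total number of differences across the two segments held fixed. This is a minimal illustration using exact integer arithmetic; it takes the distances as counts of differing third-codon positions (not normalized fractions), and the function names and toy numbers are assumptions of this sketch.

```python
from fractions import Fraction
from math import comb


def cumulative_probability(d_i: int, d_j: int, len_i: int, len_j: int) -> float:
    """P(at most d_i of the d_i + d_j differences fall in segment i).

    len_i and len_j are the numbers of third-codon positions in segments
    i and j; d_i and d_j are counts of differing positions.
    """
    if not (0 <= d_i <= len_i and 0 <= d_j <= len_j):
        raise ValueError("distances must lie between 0 and the segment length")
    total = d_i + d_j
    denominator = comb(len_i + len_j, total)
    # math.comb returns 0 when more positions are requested than exist,
    # so impossible splits contribute nothing to the sum.
    numerator = sum(comb(len_i, k) * comb(len_j, total - k)
                    for k in range(d_i + 1))
    return float(Fraction(numerator, denominator))


def is_reassortment_candidate(p: float, threshold: float = 1e-7) -> bool:
    """Apply the 10^-7 cut-off quoted in the text."""
    return p <= threshold


# Toy example: two segments with two third-codon positions each, and both
# observed differences located in segment j.
p = cumulative_probability(0, 2, 2, 2)
print(p, is_reassortment_candidate(p))
```

Working with exact integers and converting to a float only at the end avoids underflow in the binomial coefficients, which become very large for realistic segment lengths.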
Finally, for each of the 150 strains, we generate a list of the strains with which it has a low probability of sharing a common ancestor, hinting at reassortment events. For further investigation of the origin of the segments, we compile a large target database of more than 10,800 strains of influenza A virus that includes all completely sequenced human and avian isolates, in addition to all swine isolates deposited in the Influenza Virus Resource of the NCBI. We use this database to compare the histories of two segments of a given swine strain. First, we align with NCBI BLAST
For a demonstration of the analyses described above, consider the strain A/swine/Tennessee/23/1976. When it is compared to A/swine/Iowa/1/1976, the Hamming distance in the PB1 segments is 11% and the Hamming distance in the NP segments is 3%. The cumulative hypergeometric probability of this event is less than 10^-7, which indicates a reassortment event in at least one of those strains at either segment PB1 or NP. The history overlaps for those two segments and the rest of the segments of A/swine/Tennessee/23/1976 are shown in Figure 2. The figure shows that NP and all the other segments except PB1 share a common recent history, whereas the recent history of PB1 is different from that of the other seven segments. This allows us to assert that the PB1 segment of A/swine/Tennessee/23/1976 is the result of a reassortment. An interesting feature apparent in Figure 2 is that at lower identities M1 shares fewer strains with PB1 and NP. This observation can be attributed to a possibly slower evolutionary rate of M1 and a fork in its lineage to a line of human viruses. Similar considerations show that the PB1 segment of A/swine/Iowa/1/1976 is also the result of a reassortment; however, the PB1 segments of A/swine/Iowa/1/1976 and A/swine/Tennessee/23/1976 are from different lineages, and the target database contains isolates close to the former but not the latter.
NP and all the other segments except PB1 share a common recent history, whereas the recent history of PB1 is different from that of the other seven segments, indicating a reassortment event at PB1. The small history overlap of M1 with PB1 and NP at lower identities can be attributed to a possibly slower evolutionary rate of M1 and a fork in its lineage to a line of human viruses. The fluctuations in the history overlap of NP at 99% identity are due to the small number of sample points at that level of identity.
Viruses present an enormous diversity due to their high mutation rate, short replication time, and high number of replicates. There are several ways of measuring the diversity of a viral population: richness, evenness, Rao’s entropy
Although the third codon evolutionary rates in influenza A viruses are thought to be similar in all segments, the analysis of the genomic diversity of the strains in our dataset reveals a very inhomogeneous pattern. Figure 3 (left) shows Rao’s quadratic entropy, measured at the third codon positions, for the 150 swine influenza A viruses that have all eight segments fully sequenced and deposited in NCBI, along with the 95% bootstrap percentile confidence intervals. This figure indicates a statistically significant difference between NA, HA, and PB1 on the one hand and PB2, PA, NP, M, and NS on the other. Moreover, within a particular subtype, where the variations in HA and NA are fixed, PB1 appears as the most diverse segment. Figure 3 (right) shows the diversity in the swine classical H1N1 strains that were isolated in the 1970s, 1980s, and 1990s. This analysis shows that the eight segments do not have a common history, with PB1, HA, and NA presenting a higher level of diversity.
Left: Considering the 150 strains in the dataset, NA, HA, and PB1 present a higher diversity than the rest. Right: When the dataset is limited to the classical H1N1 strains isolated in the 1970s, 1980s, and 1990s, which fixes the HA and NA variations in the population, PB1 shows a higher diversity than the rest of the segments.
We further investigate the sources of variation in diversity via the pair-wise Pearson correlation of the distances at the third codon positions of the viral segments. Correlations, linear or non-linear, and any other measure of dependence, such as mutual information, encounter the same problems as the measures of diversity (sampling bias, bottleneck structures, population stratification, etc.). Nonetheless, they are revealing indicators of the origins of diversity in a population. When all 150 strains in the dataset are considered, the correlations are lower between the surface glycoprotein-coding segments and the other segments. More interestingly, the PB1 segment also has a low correlation with all segments that are not polymerase-coding (Figure 4, left). In particular, when the strains from a single subtype are considered, so that the variations in the HA and NA segments are fixed in the dataset, PB1 presents the lowest correlation with the other segments. This is evident among the classical swine H1N1 strains isolated in the 1970s, 1980s, and 1990s (Figure 4, right).
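The pair-wise correlation analysis can be sketched as follows: for each segment, the strain-by-strain distances are flattened into one vector over all strain pairs, and the Pearson coefficient is computed between the vectors of every two segments. The function names, the dictionary layout, and the toy matrices are illustrative assumptions rather than the authors’ implementation.

```python
from itertools import combinations
from math import sqrt


def pairwise_distances(dist):
    """Flatten the upper triangle of a symmetric strain-by-strain matrix."""
    return [dist[a][b] for a, b in combinations(range(len(dist)), 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def segment_correlations(dist_by_segment):
    """Correlate the pairwise distances of every two segments.

    `dist_by_segment` maps a segment name to its distance matrix; all
    matrices must list the strains in the same order.
    """
    flat = {name: pairwise_distances(m) for name, m in dist_by_segment.items()}
    return {(s, t): pearson(flat[s], flat[t]) for s, t in combinations(flat, 2)}


# Toy data for three strains: PA distances track PB2, PB1 distances do not.
toy = {
    "PB2": [[0, .1, .2], [.1, 0, .3], [.2, .3, 0]],
    "PB1": [[0, .3, .2], [.3, 0, .1], [.2, .1, 0]],
    "PA":  [[0, .2, .4], [.2, 0, .6], [.4, .6, 0]],
}
for pair, r in segment_correlations(toy).items():
    print(pair, round(r, 3))
```

A segment whose distances correlate poorly with those of the other segments has a history that differs from theirs, which is the pattern the text reports for PB1, HA, and NA.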
Left: The HA, NA, and PB1 segments have the lowest correlation with the rest of the segments. Right: When the HA and NA variations are fixed in the population by limiting the dataset to the classical H1N1 strains isolated in the 1970s, 1980s, and 1990s, PB1 presents a distinctly lower correlation with the other segments.
The above observations from the diversity and pair-wise correlation measures point to distinct evolutionary patterns in the HA, NA, and PB1 segments. To elucidate the role of reassortment in these patterns, we have enumerated the independent reassortment events in swine viruses that we identify through the hypergeometric distribution analysis of Rabadan et al. (2008)
To summarize, our analyses show that not every segment of the swine influenza virus reassorts in equal fashion. In accordance with the previous results from human influenza A viruses, both in vitro
The mechanisms behind the preferential reassortments are not clear; however, several hypotheses can be advanced. There is substantial evidence for biases in the mechanism that packages the viral RNA into the virion in influenza A viruses, which can impose a selective pressure on the segments that can be exchanged between strains
The authors have no support or funding to report.
The authors have declared that no competing interests exist.
Year | Strain | Subtype | PB2 | PB1 | PA | HA | NP | NA | MP | NS | Ref. |
1976 | A/swine/Iowa/1/1976 | H1N1 | S1 | S2 | S1 | S1 | S1 | S1 | S1 | S1 | |
1976 | A/swine/Tennessee/15/1976 | H1N1 | S1 | S2 | S1 | S1 | S1 | S1 | S1 | S1 | |
1976 | A/swine/Tennessee/19/1976 | H1N1 | S1 | S1 | S1 | S1 | S1 | S1 | S2 | S1 | |
1976 | A/swine/Tennessee/23/1976 | H1N1 | S1 | S2 | S1 | S1 | S1 | S1 | S1 | S1 | |
1977 | A/swine/Tennessee/48/1977 | H1N1 | S1 | S1 | S2 | S1 | S1 | S1 | S2 | S1 | |
1977 | A/swine/Tennessee/61/1977 | H1N1 | S1 | S1 | S1 | S1 | S1 | S1 | S2 | S1 | |
1977 | A/swine/Tennessee/62/1977 | H1N1 | S1 | S1 | S1 | S1 | S1 | S2 | S2 | S1 | |
1977 | A/swine/Tennessee/64/1977 | H1N1 | S1 | S2 | S1 | S1 | S1 | S1 | S1 | S1 | |
1977 | A/swine/Tennessee/82/1977 | H1N1 | S1 | S1 | S1 | S2 | S1 | S2 | S1 | S1 | |
1977 | A/swine/Tennessee/96/1977 | H1N1 | S1 | S1 | S1 | S1 | S1 | S1 | S1 | S2 | |
1979 | A/swine/Minnesota/5892-7/1979 | H1N1 | S1 | S1 | S1 | S1 | S2 | S1 | S1 | S1 | |
1981 | A/swine/Ontario/6/1981 | H1N1 | S1 | S1 | S1 | S1 | S2 | S1 | S1 | S1 | |
1986 | A/swine/Iowa/1/1986 | H1N1 | S1 | S1 | S2 | S2 | S1 | S1 | S1 | S1 | |
1988 | A/swine/Wisconsin/1915/1988 | H1N1 | S1 | S1 | S1 | S1 | S2 | S1 | S1 | S1 | |
2004 | A/swine/Korea/CAN01/2004 | H1N1 | S1 | S1 | S1 | S1 | S1 | S2 | S1 | S1 | |
2004 | A/swine/Spain/53207/2004 | H1N1 | S1 | S1 | S1 | S2 | S1 | S2 | S1 | S3 | |
2007 | A/swine/Ohio/24366/07 | H1N1 | S1 | S1 | S1 | S2 | S1 | S2 | S1 | S1 | |
1998 | A/swine/Italy/1521/98 | H1N2 | S1 | S1 | S1 | S2 | S1 | S3 | S1 | S1 | |
1999 | A/Swine/Indiana/9K035/99 | H1N2 | S1 | S1 | S1 | S2 | S1 | S1 | S1 | S1 | |
2000 | A/Swine/Minnesota/55551/00 | H1N2 | S1 | S1 | S1 | S2 | S1 | S1 | S1 | S1 | |
2004 | A/swine/Zhejiang/1/2004 | H1N2 | S1 | S1 | S1 | S1 | S1 | H | S1 | S1 | |
2005 | A/swine/Cloppenburg/IDT4777/2005 | H1N2 | S1 | S1 | S1 | S1 | S1 | S2 | S1 | S1 | |
2006 | A/swine/Miyazaki/1/2006 | H1N2 | S1 | S1 | S1 | S1 | S1 | S3 | S1 | S1 | |
2007 | A/swine/Shanghai/1/2007 | H1N2 | S1 | S1 | S1 | S2 | S1 | S1 | S1 | S1 | |
1998 | A/Swine/Nebraska/209/98 | H3N2 | A | H | A | H | S | H | S | S | |
2001 | A/swine/Spain/33601/2001 | H3N2 | S1 | S1 | S1 | S2 | S1 | S2 | S1 | S1 | |
2003 | A/swine/North Carolina/2003 | H3N2 | S | S | S | H1 | S | H2 | S | S | |
2007 | A/swine/Korea/CY04/2007 | H3N2 | S1 | S1 | S1 | S1 | S1 | S1 | S1 | S2 | |
2007 | A/swine/Korea/CY07/2007 | H3N2 | S1 | S1 | S2 | S2 | S1 | S1 | S1 | S1 | |
Other published (incompletely sequenced) strains | |||||||||||
1998 | A/swine/North Carolina/35922/98 | H3N2 | S | H | S | H | S | H | S | S | |
2004 | A/swine/MI/PU243/04 | H3N1 | S1 | S1 | S1 | S1 | S1 | S2 | S1 | S1 | |
2006 | A/swine/Missouri/2124514/2006 | H2N3 | S1 | S2 | A | A | S1 | A | S1 | S1 | |
Strains “frozen in time” or with evidence of homologous recombination [31] | |||||||||||
2003 | A/swine/Alberta/56626/03 | H1N1 | S1 | S1 | S2 | S1 | S1 | S3 | S1 | S1 | |
2003 | A/swine/Ontario/53518/03 | H1N1 | S3 | S3 | S2 | S1 | S3 | S1 | S1 | S1 | |
2003 | A/swine/Ontario/57561/03 | H1N1 | S1 | S1 | S2 | S1 | S3 | S1 | S2 | S1 | |
2004 | A/swine/Ontario/48235/04 | H1N2 | S1 | H1 | S1 | H2 | S2 | H3 | S3 | S3 | |
2004 | A/swine/Ontario/11112/04 | H1N1 | S1 | H | S1 | S1 | S2 | S1 | S1 | S1 | |
2005 | A/swine/Alberta/14722/2005 | H3N2 | S | S | S | S | S | H | S | S |