J. Biomedical Science and Engineering, 2010, 3, 584-601 doi:10.4236/jbise.2010.36081 Published Online June 2010 (http://www.SciRP.org/journal/jbise/ JBiSE ). Published Online June 2010 in SciRes.http://www.scirp.org/journal/jbise Novel host markers in the 2009 pandemic H1N1 influenza a virus Wei Hu Department of Computer Science Houghton College, Houghton, USA. Email: wei.hu@houghton.edu Received 21 March 2010; revised 28 April 2010; accepted 8 May 2010. ABSTRACT The winter of 2009 witnessed the concurrent spread of 2009 pandemic H1N1 with 2009 seasonal H1N1. It is clinically important to develop knowledge of the key features of these two different viruses that make them unique. A robust pattern recognition technique, Random Forests, was employed to uncover essential amino acid markers to differentiate the two viruses. Some of these markers were also part of the previ- ously discovered genomic signature that separate avian or swine from human viruses. Much research to date in search of host markers in 2009 pandemic H1N1 has been primarily limited in the context of traditional markers of avian-human or swine-human host shifts. However, many of the molecular markers for adaptation to human hosts or to the emergence of a pandemic virus do not exist in 2009 pandemic H1N1, implying that other previously unrecognized molecular determinants are accountable for its capa- bility to infect humans. The current study aimed to explore novel host markers in the proteins of 2009 pandemic H1N1 that were not present in those clas- sical markers, thus providing fresh and unique in- sight into the adaptive genetic modifications that could lead to the generation of this new virus. Ran- dom Forests were used to find 18 such markers in HA, 15 in NA, 9 in PB2, 11 in PB1, 13 in PA, 10 in NS1, 1 in NS2, 11 in NP, 3 in M1, and 1 in M2. The amino acids at many of these novel sites in 2009 pan- demic H1N1 were distinct from those in avian, hu- man, and swine viruses that were identical at these positions, reflecting the uniqueness of these novel sites. Keywords: 2009 Pandemic H1N1; Host Switch; Influ- enza; Mutation; Random Forests 1. INTRODUCTION In addition to the common seasonal H1N1 influenza virus, an antigenically novel swine-origin pandemic H1N1 influenza virus marked the flu season in 2009. It is likely that both 2009 pandemic H1N1 and seasonal influenza will coexist for some time. Elucidation of the characteristics of this new virus has become an impor- tant part of the current flu research. The identification of molecular markers for drug resistance, virulence, viral transmission and replication, human adaptation, and evolution can shed new light into the nature of this virus. There are eight single-stranded RNA segments of the influenza A virus genome. They code 11 proteins: hema- gglutinin (HA), neuraminidase (NA), matrix 1 (M1), matrix 2 (M2), nucleoprotein (NP), non-structural pro- tein 1 (NS1), non-structural protein 2 (NS2; also termed nuclear export protein, NEP), polymerase acidic protein (PA), polymerase basic protein 1 (PB1), polymerase basic protein 2 (PB2), and polymerase basic protein 1 – F2 (PB1-F2). Segments 1, 3, 4, 5, and 6 each encode a single protein, i.e., PB2, PA, HA, NP, and NA, respec- tively, whereas segments 2, 7, and 8 each encode two proteins, i.e. , PB1 and PB1-F2, M1 and M2, NS1 and NS2, respectively. The life cycle of influenza virus has the following steps with several proteins involved in each: entry into the host cell (HA, M1 and M2), entry of viral ribonucleoproteins (vRNP) into the nucleus (NP, PA, PB1 and PB2), transcription and replication of the viral genome (PA, PB1, PB2, NS1, and NP), export of the vRNPs from the nucleus (NP, NS2 and M1), and assembly and budding at the host cell plasma membrane (HA, NA, M 1 a nd M2) [1]. Besides mutations, viruses with segmented genomes can generate genetic diversity by exchanging gene seg- ments between different viruses to produce a new virus. Comprehensive phylogenetic analysis suggested that the genes of 2009 pandemic H1N1 were derived from avian (PB2 and PA), human H3N2 (PB1), classical swine (HA, NP and NS), and Eurasian avian-like swine H1N1 (NA and M) lineages [2]. The symptoms of the 2009 pandemic H1N1 flu are
W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 585 Copyright © 2010 SciRes. JBiSE similar to the 2009 seasonal flu with a possibility of ad- ditional symptoms such as vomiting and diarrhea [3]. The Center for Disease Control and Prevention (CDC) and Mayo Clinic have developed several molecular tests to detect and discriminate the novel 2009 pandemic H1N1 virus and the 2009 seasonal virus [4]. The matrix (M) gene is highly conserved compared to other gene segments, which makes it an ideal target for RT-PCR assays used to detect the presence of influenza. A muta- tion in the M gene of the 2009 pandemic virus could invalidate these tests [5]. Sequence survey suggested that there were two dis- tinct evolutionary trends in antig enic drift of H1N1 HAs at two residues 190 and 225. The epidemic H1N1 HAs favor position 190 while the 1918 pandemic and swine HAs favor position 225 [6]. In contrast to these two trends, the 2009 pandemic H1N1 strains are highly con- served at both HA 190 an d 225 and possess th e signature markers Asp190 and Asp225 that are known to confer specificity to the human α2-6 sialylated glycan receptors [7]. Further analysis indicated the 2009 pandemic H1N1 HAs possess residues that can be positioned to bind to avian α2-3 sialylated glycan receptors as well [7,8]. By homology modeling of the HA structure, the antigenic similarity between the 1918 H1N1 and the pandemic 2009 H1N1 viruses was confirmed, and the future amino acid substitutions on the antigenic sites of 2009 pan- demic H1N1 HA were also predicted [9], raising the concerns that these two pandemic H1N1 viruses may share a similar evolutionary path. With informational spectrum method [10], a bioinformatics technique, h ighly conserved domains and mutations in the 2009 pandemic H1N1 HAs were identified and the contributions of these mutations to the changes of binding specificity of the 2009 pandem i c H1N1 HAs w e re quantified [11- 13] . Many 2009 seasonal H1N1 strains carry a NA muta- tion H275Y that confers high-level resistance to osel- tamivir. Although most 2009 p andemic H1N1 strains are susceptible to oseltamivir, the co-circulation of pandemic and seasonal H1N1 viruses might provide opportunities for 2009 pandemic H1N1 to develop oseltamivir resis- tance through mutations and reassortments between pandemic and seasonal H1N1 viruses. Positive natural selection was detected in the NA proteins of 2009 pan- demic H1N1 at codons 275 and 248 and seasonal H1N1 at codon 275, with statistically significant bias of non- synonymous mutations relative to synonymous muta- tions [14]. Besides positio n 275, mutations at other posi- tions in NA such as 116, 117, 119, 136, 150, 151, 199, 223, 275, and 295 could also alter NA inhibitor suscep- tibility [15,16]. Two recent reports [17,18] revealed three NA variant groups in 2009 pandemic H1N1. The first group had V106 and N248, the second included I106 and N248, and the third contained I106 and D248, highlighting the rapid genetic variation of this surface antigen under host immune pressure and the need for close monitoring. The NA protein of the avian viruses has, in addition to the catalytic site, a separate sialic acid binding site that is not present in human viruses, which could enhance the catalytic efficiency of NA [19]. Although the second binding site was not conserved in swine NA strains, a recent report found the 2009 pandemic H1N1 strain of swine origin appeared to have retained some of the key features of the second binding site. Their data showed possible lowered HA activity for this second site, which might be an important event in the emergence of the 2009 pandemi c strain [20] . The interaction of NP and the influenza polymerase, containing the PA, PB1 and PB2 proteins, catalyses viral RNA replication (vRNA→cRNA→vRNA) and transcrip- tion (vRNA→mRNA) in the nucleus of infected cells. The PB2 protein of human viruses tend to possess a ly- sine at position 627 (K627), whereas avian viruses gen- erally have glutamic acid at this position (E627). The mutation E627K allows avian virus to efficiently grow in humans and was identified experimentally as a crucial host range and pathogenicity determinant [21,22]. The 2009 pandemic H1N1 strains could transmit in humans efficiently, but exclusively possess the avian signature E627. Therefore, there might be alternative strategies employed by the novel 2009 H1N1 polymerase to main- tain the efficient replication rate. A recent study discov- ered that serine at position 590 (S590) and arginine at position 591 (R591) might serve as a regulator of poly- merase activity that contributes to the increased replica- tion efficiency of 2009 pandemic H1N1. The paired mu- tations S590 and R591, termed the SR polymorphism, were present in only three of the 284 9 PB2 sequences of human viruses before 2009 [23]. Other sites might affect the polymerase activity as well. The mutations at posi- tion 504 in PB2 (I504V) and position 550 in PA (I550L) could result in enhanced virulence [24]. A special region (residues 360-374) in the NP protein was found to play a vital role in overcoming species barrier for 2009 pan- demic H1N1 [25]. Compared to other proteins in the influenza viruses, PB1-F2 is a newly discovered protein, which is unique in that this protein is coded by a subset of the nucle- otides that code for PB1 due to the use of a different reading frame. The PB1-F1 protein has been implicated in pathogenicity and the induction of cell death [26-28]. The 2009 pandemic H1N1 viru s has a truncated PB1-F1 protein, because its genome contains three stop codons preventing PB1-F2 expression. Recently, studies found that its function is not universal, but cell type and virus
586 W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 Copyright © 2010 SciRes. JBiSE strain dependent [29], and it plays a critical role in the pathogenicity and tranmissibility of 2009 pandemic H1N1 [30,31]. NS1 is a multifunctional protein that contributes to viral pathogenesis by neutralizing the interferon (IFN) -based defense system of the host cell [32], and serves as a strong inducer of apoptosis in infected human respira- tory epithelial cells [33]. The NS1 protein of 2009 pan- demic H1N1 is truncated and therefore missing a domain responsible for increased pathogenicity of avian virus [34]. There are two molecular markers that account for the virulence of the highly pathogenic avian H5N1 vi- ruses are not present in 2009 pandemic H1N1. They are a lysine (K) at position 627 of PB2, and glutamic (E) acid in position 92 of NS1 that might increase the repli- cation efficiency and block host inhibition of viral repli- cation, respectively [35,36]. A recent study on M gene identified sites of high se- lective pressure between human and avian influenza, which were 115, 121, 137 in M1, and 11, 16, 20, 54, 57, 78, 86, and 93 in M2 [37]. The 2009 pandemic H1N1 virus contains the adamantine-resistant mutation S31N in its M2 protein, thus making the NA Inhibitors osel- tamivir and zanamivir the only options available to treat the infections caused by the pandemic virus [38]. Several studies focused on determining which amino acid changes best disti ngui sh an avi an or swine influenza virus from a human virus. An entropy analysis revealed the human-avian host shift genomic signature of 52 markers in ten proteins of the influenza virus [39]. This signature extended in [40] provided th e basis for finding the amino acids of 2009 pandemic H1N1 at the host spe- cies-specific positions to illustrate the adaptive muta- tions of this virus. By comparison of the protein se- quences of 2009 pandemic H1N1 with those in the pre- vious pandemics and human, swine, and avian influenza viruses, the mutation trend of the residues at the signa- ture positions was discovered, and the potential roles of the mutated residues in human adaptation and virulence was probed in [41]. With mutual information analysis, the characteristic sites for human-to -human transmission in PB2 of influenza viruses were uncovered [42], and subsequently a catalogue of 68 such sites in eight inter- nal proteins were found to derive adaptation signatures of viral proteomes [43], which included many of the 32 and 34 markers identified in [44,45], respectively. Many of the molecular determinants associated with adaptation to human hosts or to the emergence of a pan- demic virus are not present in 2009 pandemic H1N1, suggesting that other previously u nrecognized molecular markers are responsible for its ability to infect humans [17]. Therefore, uncovering new molecular features of 2009 pandemic H1N1 is of prime significance. In this study, we collected all the protein sequences of the 2009 pandemic H1N1, 2009 seasonal H1N1, avian, human, and swine influenza viruses available from the National Center for Biotechnology Information (NCBI). Our ob- jective was to explore the novel host markers in 2009 pandemic H1N1 that were not present in the classical avian-human or sw ine-human ho st shift mark ers, and the top markers that could differentiate 2009 pandemic H1N1 from 2009 seasonal H1N1. 2. MATERIALS AND METHODS 2.1. Sequence Data All influenza virus protein sequences were retrieved from the Influenza Virus Resource (http://www.ncbi/ nlm.nih.giv/genomes/FLU/FLU.html) of the National Center for Biotechnology Information (NCBI). Detailed information about these sequences is in Table 1. All the sequences used in the study were aligned with MAFFT [46]. 2.2. Random Forests Random Forest, proposed by Leo Breiman in 1999 [47], is an ensemble classifier based on many decision trees. Each tree is built on a bootstrap sample from the origin al training set and is unprun ed to obtain low-bias trees. The variables used for splitting the tree nodes are a random subset of the whole variable set. The classification deci- sion of a new instance is made by majority voting over all trees. About one-third of the instances are left of the bootstrap sample and not used in the construction of the tree. These instances in the training set are called “out- of-bag” instances and are used to evaluate the perform- ance of the classifier, which can achieve both low bias and low variance with bagging and randomization. 2.3. Feature Selecti on Using Random For es ts Random Forest calculates several measures of variable importance. The mean decrease in accuracy measure was employed in [48] to rank the importance of the features in prediction. This measure is based on the decrease of classification accuracy when values of a variable in a node of a tree are permuted randomly. In this study, two packages of R, randomForest and varSelRF [48], were utilized to compute th e importance of the amino acids in a given protein sequence dataset. The effectiveness and robustness of this technique as a feature selection method has been demonstrated in various studies [49- 54]. 2.4. Procedure to Find Novel Host Sites in 2009 Pandemic H1N1 Four steps were created to locate the novel sites associated
W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 587 Copyright © 2010 SciRes. JBiSE Table 1. Counts of the influenza protein sequences used in the current study. Host Subtype Protein Nu m b e r o f S eq u e n c e sYears Host SubtypeProteinNu m b e r o f S eq u e n c e sYears Human Pandemic H1N1 HA 710 2009 HumanAll t y p e sN S1 509 Al l y ea r s Human Pandemic H1N1 NA 643 2009 HumanAll t y p e sN S2 383 Al l y ea r s Human Pandemic H1N1 NP 394 2009 HumanAl l t yp e sPA 279 Al l y e ar s Human Pandemic H1N1 M1 490 2009 HumanAl l t yp e sPB1 289 All y e a r s Human Pandemic H1N1 M2 482 2009 HumanAl l t yp e sPB2 269 All y e a r s Human Pandemic H1N1 NS1 366 2009 Avian H1 HA 120 All years Human Pandemic H1N1 NS2 358 2009 AvianN1 NA 1821 All years Human Pandemic H1N1 PA 295 2009 AvianAl l t yp e sNP 2888 Al l y ea r s Human Pandemic H1N1 PB1 31 1 2009 AvianA l l t y p e sM1 4232 All y e a rs Human Pandemic H1N1 PB2 31 1 2009 AvianA l l t y p e sM2 3182 All y e a rs Human Seasonal H1N1 HA 128 2009 AvianAll t y p e sN S1 4610 A ll y ea r s Human Seasonal H1N1 NA 125 2009 AvianAll t y p e sN S2 3422 A ll y ea r s Human Seasonal H1N1 NP 25 2009 Av ianAll t yp e sPA 3 106 Al l ye a r s Human Seasonal H1N1 M1 129 2009 AvianAl l t y p e sPB1 2979 Al l y e a r s Human Seasonal H1N1 M2 129 2009 AvianAl l t y p e sPB2 2643 Al l y e a r s Human Seasonal H1N1 NS1 25 2009 SwineH1 HA 379 All years Human Seasonal H1N1 NS2 25 2009 SwineN1 NA 278 All years Human Seasonal H1N1 PA 23 2009 SwineAl l t y p esNP 4 20 A l l y e a rs Human Seasonal H1N1 P B1 25 2009 S wineAl l ty p esM1 516 Al l y e a rs Human Seasonal H1N1 P B2 25 2009 S wineAl l ty p esM2 406 Al l y e a rs Human H1 HA 640 All yearsSwineAl l t y p e sN S1 506 Al l y e a rs Human N1 NA 1 127 All yearsSwineAl l t y p esN S2 351 Al l y e a r s Human All types NP 393 All yearsSwineAll typesPA 343 All years Human All types M1 1512 All yearsSwineAll typesPB1 368 All years Human All types M2 1415 All yearsSwineAll typesPB2 327 All years with host adaptation in 2009 pandemic H1N1. Step 1: For each protein, the consensus sequence of avian, 2009 pandemic H1N1, human, and swine viruses were calculated separately, and the positions with dif- ferent amino acids of the four consensus sequences were identified, since the different amino acids at these posi- tions have the poten tial to contribute to host switches. Step 2: For each protein, Random Forests were used to identify the top 20 positions that have highest impor- tance in separating avian from human viruses, and swine from human viruses, respectively. Step 3: Finding the intersection of the top positions with importance larger than 0.005 for separating 2009 pandemic H1N1 from human viruses and the positions with different co nsensus ami no acids found in step one. Step 4: The positions discovered in step three minus the positions found in step two will be the novel posi- tions important for separatig 2009 pandemic H1N1 n
588 W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 Copyright © 2010 SciRes. JBiSE Table 2. This table contains the Hamming distances of consensus protein sequences of avian, human, 2009 pandemic H1N1, and swine viruses. Proteins HA NANPM1M2NS1NS2PA PB1 PB2 Dist(Avian,2009_pandemic) 90 35 28 8 6 35 11 17 23 17 Dist(Human,2009_pandemic) 10961 42 19 15 42 13 30 21 31 Dist(Swine,2009_pandemic) 43 52 10 11 6 17 7 15 20 17 Dist(Avian,Human) 90 50 31 11 13 23 5 20 4 19 Dist(Avian,Swine) 61 37 21 3 2 22 7 4 6 2 Dist(Human,Swine) 97 52 35 8 11 36 7 19 6 21 from human viruses. The purpose of steps 2 and 3 was to calculate the ad- aptation signatures for various virus groups, which were then used in step 4. The amino acids at most of these novel sites in 2009 pandemic H1N1 turned out to be different from those in avian, human, and swine viruses that were the same at these positions. Random Forests produce non-deterministic outcomes. To compensate this bias, the Random Forests algorithm was run multiple times and then the average of the results was taken. The importance of each residue in the protein sequences was based on the averaged calculations by using the function randomVarImpsRF in varSelRF re- peated 5 times. 3. RESULTS 3.1. Com p ar i s o n o f Cons e n s u s Prote i n Sequences of Influenza Viruses In considering the relationship among the proteins of influenza viruses, the Hamming distance, defined as the number of positions at which the corresponding amino acids of two sequences are different, of any two consen- sus protein sequences of avian, human, 2009 pandemic H1N1, and swine viruses was calculated. The distance information in Table 2 provided insight into the se- quence similarity between the proteins of 2009 pan- demic H1N1 and those of other virus groups. In particu- lar, the distances between 2009 pandemic H1N1 and avian, human, and swine viruses reflected the origin of 2009 pandemic H1N1 [2]. 3.2. Novel Host Sites in the Proteins of 2009 Pandemic H1N1 Our analysis discovered a catalogue of novel host mark- ers in the proteins of 2009 pandemic H1N1 that included 18 markers in HA, 15 in NA, 9 in PB2, 11 in PB1, 13 in PA, 10 in NS1, 1 in NS2, 11 in NP, 3 in M1, and 1 in M2. In the following sections, each of the ten proteins of 2009 pandemic H1N1 was compared to that of avian, human, and swine viruses. Random Forests were em- ployed to identify the top impo rtant positions in the pro- teins of influenza that could separate 2009 pandemic H1N1 from avian, human, and swine viruses, and the top positions that could discriminate 2009 pandemic H1N1 and 2009 seasonal H1N1. The novel host markers in 200 9 pandemic H1N1 were uncovered with the procedure outlined in Section 2.3. Some of the markers that could classify 2009 pandemic H1N1 and 2009 seasonal H1N1 were also part of the previously discovered genomic signature that separate avian or swine from human viruses. Because the se- quences of 2009 seasonal H1N1 were a subset of those of human viruses, there were common important sites in each protein between the sites in 2009 pandemic versus 2009 seasonal and the sites in 2009 pandemic versus human viruses. To render a complete picture of host shift markers of different types, the novel sites in each of the ten proteins of 2009 pandemic H1N1 were exhibited along with the avian-human and swine-human sites in a single table. The conservation of residues comprising these sites in each protein as represented by their frequency at these positions was also displayed in the table. The top impor- tant sites in each protein for differentiating 2009 pan- demic H1N1 from avian, human, and swine viruses were displayed in a single figure, which were used in the pro- cedure to find the novel sites. Due to high genetic variation of the HA and NA pro- teins, only the HA protein sequences of H1 subtype and the NA protein sequences of N1 subtype of avian, hu- man, and swine viruses were used to compare those of 2009 pandemic H1N1 in the current analysis. Therefore, the novel markers in HA and NA of 2009 pandemic H1N1 found in this study were subtype-specific. Be- cause all the PB1-F2 proteins of 2009 pandemic H1N1 were truncated and nonfunctional, they were excluded in this study. 3.2.1. HA Prot ein As the primary target of host immune responses, the surface protein HA is under high selection pressure, as evidenced by the large number of amino acid substitu- tions in this protein. There was a clear distinction of amino acids at position 127, where the human HA had a
W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 589 Copyright © 2010 SciRes. JBiSE Table 3. This table contains the consensus amino acids and their frequency at positions in HA that have high importance in separat- ing 2009 pandemic H1N1 from human H1 viruses. The single letter ‘a’ (for avian) or ‘s’ (for swine) in parenthesis after a position number indicates whether the same position is also important for separating 2009 pandemic H1N1 from avian or swine viruses or both. The novel host sites in this protein are the positions without an ‘a’ or a ‘s’ or both. Position 71 84 120(a) 127(a,s) 128(s) 129(s) 130(a,s) 142 168 216 239 250 Avian L(93.3%) N(99.2%) A(96.7%) E(97.5%) T(100%) T(93.3%) K(94.2%)S(92.5%)N(99.2%) A(94.2%) T(97.5%) A(94.2%) Human I(92.5%) N(98.4%) E(95.0%) -(100%) T(95.8%)V(95.0%) T(97.8%)S(88.4%)N(98.4%) K(96.1%) T(99.2%) A(100%) 2009 H1N1 S(100%) S(100%) T(100%) D(99.7%) S(97.0%)N(100%) K(100%) K(100%) D(100%)I(99.9%) K(100%) V(99.9%) Swine F(58.6%) N(91.8%) A(44.9%) E(57.3%) T(81.0%)N(64.12%)R(62.0%)N(66.2%)N(85.5%) A(46.7%) T(80.0%) V(62.5%) Position 257 258 260 261 298 302 314 365 374 493 527 Avian L(85.8%) N(95.0%) G(94.2%) S(98.3%) I(93.3%) E(98.33%)M(99.2%)Q(94.2%)G(100%)S(96.7%) L(99.2%) Human L(96.7%) S(96.7%) G(98.0%) F(97.3%) V(99.5%)E(100%) M(99.5%)Q(98.9%)G(99.5%) S(99.2%) L(99.1%) 2009 H1N1 M(100%) E(99.9%) N(99.0%) A(99.6%) I(100%) K(100%) L(99.9%)L(100%) E(99.4%)A(100%) V(99.9%) Swine M(54.1%) N(41.7%) G(78.1%) S(72.0%) V(73.1%)E(94.2%) M(95.3%)Q(61.7%)G(90.8%) S(67.0%) L(78.1%) Table 4. This table contains the consensus amino acids and their frequency at positions in NA that have high importance in separating 2009 pandemic H1N1 from human N1 viruses. The novel host sites in this protein are the positions without an ‘a’ (for avian) or a ‘s’ (for swine) or both. Position 84 126 149 163(s) 166 189 257 269 285(s) 321 Avian T(76.4%) H(98.9%) V(95.2%) V(94.1%) A(99.6%) S(93.5%) K(96.8%) L(99.7%) A(97.5%) V(94.6%) Human T(92.8%) H(100%) V(95.6%) L(83.8%) A(99.8%) G(86.4%) K(99.7%) L(100%) T(85.5%) V(99.9%) 2009 H1N1 K(100%) P(100%) I(100%) I(100%) V(99.7%) N(99.8%) R(99.8%) M(99.7%) S(100%) I(100%) Swine I(55.8%) H(89.2%) V(73.0%) I(84.5%) A(64.0%) G(58.3%) K(90.3%) L(90.7%) T(55.04%) V(71.6%) Position 331 365(a,s) 369(a,s) 385 389 395 397 398 436 Avian G(99.7%) T(91.4%) S(99.4%) S(88.3%) V(90.4%) A(99.2%) T(99.2%) D(99.5%) T(99.6%) Human G(98.8%) N(84.7%) K(84.5%) S(99.7%) V(94.5%) A(99.7%) T(99.7%) D(99.7%) T(99.5%) 2009 H1N1 K(100%) I(99.8%) N(100%) N(100%) I(100%) G(100%) N(99.7%) E(100%) -(100%) Swine G(68.7%) I(63.0%) S(82.0%) S(69.4%) V(38.1%) A(63.3%) T(87.8%) D(92.1%) T(99.6%) deletion whereas the other three virus groups had not (Table 3). However, as will be seen in the NA protein section below (Table 4), the NA protein of 2009 pan- demic H1N1 had a deletion at position 436 though the other three virus groups had not. The positions in Table 3 including 71, 84, 130, 257, 258, and 314 had signifi- cant effects on the receptor binding specificity of HA of 2009 pandemic H1N1[13]. HA has two functional do- mains HA1 (residues 1-327) and HA2 (residues 328- 549). Evidently, most of the sites in Table 3 were in HA1, illustrating a much higher selection pressure of HA1 relative to HA2. The HA active site located in a cleft is composed of the residues 91, 150, 152, 180, 187, 191, and 192. The active site cleft of HA is formed by its right edge (131_GVTAA) and left edge (221_RGQAGR) [55]. Four sites 127, 128, 129, and 130 in Table 3 were near the right edge of the active site (Table 3). 3.2.2. NA Protein In addition to the surface protein HA, the influenza A virus also has NA as another surface protein, and the balanced interplay between them is essential for the life cycle of this virus. Because of its critical role in viral replication and its highly conserved active sites, NA is the main target for drug design against influenza virus. The NA Inhibitors oseltamivir and zanamivir were the only drugs available to treat the infections caused by 2009 pandemic H1N1, because the novel virus had an adamantine-resistant mutation S31N in its M2 protein [38]. As a result, the surveillance of any potential drug- resistant mutations in the NA protein of 2009 pandemic H1N1 received high priority. The mutation H275Y (N1 numbering, H274Y in N2 numbering) in NA is well known for its resistance to NA Inhibitors. There were 123 Ys and 2 Hs in 125 NA sequences of 2009 seasonal H1N1 and 12 Ys and 631 Hs in 643 NA sequences of 2009 pandemic H1N1 used in the current study. Both NAs in 2009 pan demic and 2009 season al H1N1 did not have the novel NA mutation Q136K [41] that confers zanamivir resistance. NA is also is constantly evolving under host immune pressure, and the mutations in Table 4 illustrated its ge- netic variation. As mentioned in the HA protein section above, the NA of 2009 pandemic H1N1 had a deletion at
590 W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 Copyright © 2010 SciRes. Figure 1. Top important HA positions in distinguishing avian H1, human H1, 2009 pandemic H1N1, 2009 seasonal H1N1, and swine H1 viruses. Figure 2. Top important NA positions in distinguishing avian N1, human N1, 2009 pandemic H1N1, 2009 seasonal H1N1, and swine N1 viruses. position 436 while the other three virus groups had not. However, the HA of human virus had a deletion at posi- tion 127 but the other three virus groups had not. The NA active site is a shallow pocket constructed from conserved residues, some of which contact the sub- strate directly and participate in catalysis, while others provide a structural framework [56]. According to the numbering in [57], these residues of N1 are 118, 119, JBiSE
W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 591 Copyright © 2010 SciRes. JBiSE 151, 152, 156, 179, 180, 223, 225, 228, 247, 277, 278, 293, 295, 368, and 402. The antigenic sites of N1 are residues 83-143, 156-190, 252-303, 330, 332, 340-345, 368, 370,38 7- 3 95 , 43 1-435, 448-468. Novel host sit es 84, 126, 166, 189, 257, 269, 389, and 395 were at the anti- genic sites of N1 (Table 4). Because of a common deletion in the stalk region of the NA proteins of avian viruses, only the residues after position 82 were includ ed in the Random Forest analysis on avian, human, 2009 pandemic H1N1, and swine vi- ruses. However, the whole NA sequences were used in the analysis of 2009 pandemic and 2009 seasonal H1N1. 3.2.3. M1 Protein M1 protein forms a shell inside the viral envelope to offer strength and rigidity to the viral structure. M1 in- teracts with HA, NA, M2, and lipid membranes during budding of new virions from the cell surface, and func- tions in the formation of vRNP complexe and the disso- ciation of vRNP from the nuclear matrix, and in assem- bly by recruiting the viral components to the site of as- sembly. The dissociation of M1 from vRNP is triggered by transport of hydrogen ions across the viral membrane by M2, an early step preceding entry of vRNPs into the cytoplasm of the host cells. M1 also binds to NS2 to facilitate nuclear export of the vRNP [37]. There was a mutation R101K in the M1 protein of 2009 pandemic H1N1 (Table 5). It would be of interest to exam the im- pact of this mutation on viral replication. The basic amino acids 101RKLKR105 of M1 were involved in vRNP binding and nuclear localization. In [58], the func- tions of 101RKLKR105 were studied by introducing mutations into the M gene of influenza virus A/WSN/33. Individual substitution, R101S or R105S, had a minimal effect on viral replication, but the double mutation R101S-R105S reduced viral replication at a restrictive temperature. The M1 is a highly conserved protein. Therefore, the changes of M1 may reflect host-specific adaptation. Po- sitions 115, 121, and 137 were identified as avian-human host shift markers in [43]. Our investigation indicated position 218 was as important as position 121. Position 137 was a swine-human marker in [40], but our study also revealed positions 115 and 218 were as important as positions 137 as swine-human markers (Figure 3). The novel site 30 was in the membrane binding domain [43], and sites 207 and 209 were in the C-terminal part of M1 (residues 165-252) that binds to vRNP [59]. 3.2.4. M2 Protein This 97 amino acid-long integral membrane protein has three domains, one N-terminal extracellular domain (24 residues) recognized by host immune system, one 19- residue transmembrane domain responsible for ion channel activity, and one 54-residue cytoplasmic tail interacting with M1 and required for genome packing and formation of virus particles [37]. Two M2 inhibitors (adamantine and rimantadine) affect two steps in the replication cycle, viral uncoating and viral maturation. There are five known adamantine-resistanant mutations in M2 (L26F, V27A, A30V, A30T, S31N, and G34E). The 2009 pandemic H1N1 virus contains a mutation S31N. They also contain a mutation L43T in M2 (Table 6 and Figure 4), which is not present in seasonal, tri- ple-reassortant swine or H5N1 influenza viruses [15]. The replacement of the non-polar residue L43 by the polar residue T43 in M2 may influence a nearby func- tional residue W44, the channel lock and the binding site of rimantadine [60]. Positions 11, 14, 20, 28, 54, 55, 57, 78, and 86 were avian-human host shift sites found in [43]. However, the positions 18, 50, 86, and 93 were as important as these sites in our examination. Positions 57 , 86, and 93 were swine-human shift markers in [40], but our analysis also included positions 28, 54, 77, 78, 79, and 89 as swine-human markers with high importance (Figure 4). The only novel site in this protein was 13 which was in the extracellular domain (Table 6). 3.2.5. NP Protein The NP protein of the influenza virus binds the RNA genome and functions as an adaptor between the virus and the host cell. The interaction of the NP protein with the viral polymerase is required for viral RNA replica- tion, but not for the synthesis of viral messenger RNAs(transcription). Previous experiments implicated Table 5. This table contains the consensus amino acids and their frequency at positions in M1 that have high importance in separat- ing 2009 pandemic H1N1 from human viruses. The novel host sites in this protein are the positions without an ‘a’ (for avian) or a ‘s’ (for swine) or both. Position 15(a) 30 101(a,s) 115(a,s) 116(s) 121(a,s) 137(a,s) 142(a,s) 166(a,s) 207 209 214(s) 218(a,s) Avian I(52.4%) D(99.9%) R(52.7%) V(99.5%) A(97.5%) T(96.2%) T(99.4%) V(91.6%)V(53.7%)S(68.4%) A(98.9%) Q(99.2%)T(99.8%) Human V(67.6%) D(99.7%) R(93.1%) I(92.3%) A(98.7%) A(92.9%) A(93.1%) V(68.2%)V(93.2%)S(92.7%) A(99.9%) Q(99.8%)A(84.5%) 2009 H1N1 I(100%) S (99.8%) K(100%) V(99.4%) S(100%) T(100%) T(100%) A(99.6%)A(100%) N(99.8%) T(100%) H(100%) T(100%) Swine V(65.3%) D(77.3%) R(64.7%) V(90.9%) A(68.4%) A(59.7%) T(93.2%) V(77.3%)V(66.1%)S(91.9%) A(78.5%) Q(67.1%)T(92.4%)
592 W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 Copyright © 2010 SciRes. 0.03 0.025 0.02 0.015 0.01 0.005 0 Importance Figur e 3 . Top important M1 positions in dis tinguis hing avian , human, 2009 pandemic H1N1, 2009 seasonal H1N1, and swine viruses. Table 6. This table contains the consensus amino acids and their frequency at positions in M2 that have high importance in separating 2009 pandemic H1N1 from human viruses. The novel host sites in this protein are the positions without an ‘a’ (for avian) or a ‘s’ (for swine) or both. Position 11(a,s) 13 16(a,s) 20(a,s) 28(a,s) 31(s) 43(a,s) 77(s) 78(a,s) 86(a,s) Avian T(92.0%) N(91.70%) E(92.4%) S(96.2%) I(54.5%) S(88.6%) L(97.0%) R(98.1%) Q(99.4%) V(99.6%) Human I(91.6%) N(99.29%) G(91.2%) N(92.2%) V(96.8%) S(67.6%) L(67.7%) R(99.6%) K(60.9%) A(91.7%) 2009 H1N1 T(100%) S(99.79%) E(100%) S(100%) I(100%) N(100% ) T(100%) Q(100%) Q(100%) V(100%) Swine T(52.7%) N(72.66%) E(57.9%) N(52.7%) I(38.9%) S(60.8%) L(93.8%) R(63.5%) Q(95.6%) V(94.6%) Figure 4. Top important M2 positions in distinguishing avian, human, 2009 pandemic H1N1, 2009 seasonal H1N1, and swine viruses. JBiSE
W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 593 Copyright © 2010 SciRes. three NP r egions (residues 1-160, 2 56-340 and 340-49 8) in binding to PB1 and PB2 [61]. Novel sites 21, 53, 119, 316, 353, 371, 377, 433, 444, and 498 were scattered in these three regions (Table 7). One region, residues 360- 374, in NP of 2009 pandemic H1N1 was deemed ex- tremely important for host range restriction, and is a common feature of pandemic viruses [25]. Two posi- tions 371 and 373 in Table 7 were in this region. Residue 100 was involved in the NP-PB2 interaction [62], and ranked second in separating 2009 pandemic H1N1 from avian viruses. The consensus amino acids at 100 of PB2 proteins of avian, human, 2009 pandemic H1N1, and swine viruses were R, V, I and V respectively. The mu- tation V100I might contribute to the increased transmis- sibility or infection of 2009 pandemic H1N1 [41]. Positions 16, 33, 61, 100, 136, 214, 283, 305, 313, 357, 375, and 423 were avian-human host shift markers in [43]. Furthermore, we found positions 31, 217, 373, and 455 significant for discriminating avian and human viruses (Figure 5). 3.2.6. NS1 Protein All of the proteins in influenza virus are structural ex- cept for NS1 and PB1-F2. This protein is designated as non-structural because it is synthesized in infected cells, but is not incorporated into virions. NS1 is a multifunc- tional protein involved in both protein-protein and pro- tein-RNA interactions. Its N-terminal region has an RNA-binding domain (residues 1-73) and its C-terminal region (residues 74-237) contains the effector domain that inhibits the maturation and exportation of the host cellular antiviral mRNAs [63]. Because of a truncation in the NS1 proteins of 2009 pandemic H1N1, only the first 219 residues of the NS1 proteins were included in our analysis. Positions 22, 60, 81, 84, 215, and 227 were avian-human host shift sites in [43], whereas our Random Forests analysis implied po- sitions 79, 81, 114, 171, and 215 were as significant as Table 7. This table contains the consensus amino acids and their frequency at positions in NP that have high importance in separating 2009 pandemic H1N1 from human viruses. The novel host sites in this protein are the positions without an ‘a’ (for avian) or a ‘s’ (for swine) or both. Position 21 31(a,s) 53 119 189(s) 190 217(a) 289(s) 313(a,s) 316 Avian N(99.20%) R(99.8%) E(99.90%) I(97.65%) M(99.1%)V(98.61%)I(94.8%) Y(99.2%) F(99.0%) I(99.58%) Human N(97.5%) K(65.4%) E(100%) I(97.46%) M(97.5%)V(97.20%)S(48.1%) Y(97.5%) Y(78.1%) I(99.75%) 2009 H1N1 D(100%) R(100% ) D(100%) V(100%) I(99.75%) A(100%) V(98.2%) H(99.7%) V(100%) M(100%) Swine D(61.19%) R(82.6%) E(98.10%) V(57.14%)I(60.00%) A(56.42%)I(76.90%) H(64.29%) F(86.66%) I(95.24%) Position 350(s) 353 371 373(a) 377 430(s) 433 444 456(s) 498 Avian T(94.39%) V(90.30%)M(94.77%) T(69.18%)S(69.67%)T(94.8%) T(95.36%)I(99.00%) V(98.4%) N(96.09%) Human T(97.20%) S(52.42%)M(91.35%) A(34.35%)S(80.66%)T(83.72%)T(88.04%)I(98.22%) V(82.95%) N(96.18%) 2009 H1N1 K(100%) I(99.75%) V(100%) T(76.40%)N(100%) S(100%) N(100%) V(100%) L(100%) S(99.24%) Swine K(64.52%) V(53.81%)V(59.29%) A(57.38%)S(51.90%)S(38.57%)N(60.95%)I(64.76%) L(61.19%) N(69.29%) Figure 5. Top important NP positions in distinguishing avian, human, 2009 pandemic H1N1, 2009 seasonal H1N1, and swine viruses. JBiSE
594 W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 Copyright © 2010 SciRes. JBiSE Table 8. This table contains the consensus amino acids and their frequency at positions in NS1 that have high importance in separat- ing 2009 pandemic H1N1 from human viruses. The novel host sites in this protein are the positions without an ‘a’ (for avian) or a ‘s’ (for swine) or both. Position 6 25(s) 59 67(s) 74 76 91 111 112(a) Avian V(77.85%) Q(79.35%) R(73.41%) R(78.85%)D(97.87%)A(77.79%)T(97.33%) V(79.76%) A(59.11%) Human V(95.48%) Q(95.09%) H(45.19%) R(54.62%)D(97.64%)A(97.84%)T(96.66%) V(96.07%) E(55.00%) 2009 H1N1 M(99.73%) N(99.73%) L(100%) W(100%) S(99.73%)T(1 00 % ) S(99.73%) I(100%) I(99.45%) Swine V(95.26%) N(58.70%) L(59.49%) W(59.68%)D(37.49%)T(58.70%)A(60.28%) V(50.59%) A(36.96%) Position 119 129(s) 171(a,s) 198 205 206(s) 207 213(s) 217(s) Avian M(99.39%) I(75.55%) D(48.87%) L(52.52%)S(69.50%)S(67.66%)D(59.20%) P(92.56%) K(68.00%) Human M(85.85%) M(51.28%) I(55.20%) L(84.28%)S(92.93%)S(91.55%)N(78.00%) P(97.05%) K(69.16%) 2009 H1N1 L(100%) V(100%) Y(100%) I(100%) N(100%) C(100%) D(100%) S(100%) E(100%) Swine M(90.12%) I(64.23%) D(59.29%) L(97.04%)S(64.62%)R(56.13%)N(92.09%) P(51.38%) E(57.51%) Figure 6. Top important NS1 positions in distinguishing avian, human, 2009 pandemic H1N1, 2009 seasonal H1N1, and swine viruses. these sites (Figure 6). There were two novel sites 6 and 59 in the RNA-binding domain and the other novel sites in the effector domain (Table 8). 3.2.7. NS2 Protein Influenza virus replicates its RNA genome in the nucleus of infected cells. The NS2 protein mediates the nuclear export of virion RNAs, with help from M1 and NP. A recent report indicated that it also has a role in the regu- lation of viral transcription and replication [64]. NS2 contains a highly conserved nuclear export signal motif in its amino-terminal region (residues 12-21) [65], and site 14 in Table 9 was in this region. Positions 60, 70, and 107 were avian-human ho st shift markers in [43]. We found position 14 important as a host marker as well. Position 107 was a swine-human host switch marker in [40], but our analysis also pointed to positions 14, 32, 49, and 57 as such sites of high sig- nificance (Figure 7). The NS2 protein of 2009 pandemic H1N1 contained so many important avian-human or swine- human sites, resulting in only one site as a novel site (Table 9). 3.2.8. PA Protein Compared to the well-defined functions of PB1 and PB1, PA is involved in a diverse range of functions of the po- lymerase complex, including protein stability, endonu- clease activity, and cap binding and promoter binding
W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 595 Copyright © 2010 SciRes. JBiSE [66]. Positions 28, 55, 57, 65, 66, 100, 225, 268, 321, 337, 356, 382, 400, 404, 409, 421, and 552 were avian-human host shift markers in [43]. Additionally, we found positions 241 and 383 equally important as these positions as avian-hu man markers. Positions 268 and 552 were swine-human markers uncovered in [40]. Our analysis suggested the positions 28, 225, 337, and 400 were equally crucial as these two sites as swine-human markers (Figure 8). The N-terminal domain of PA (residues 1-256) har- bors several functional domains, including an endonu- clease active site with a putative active site motif, two putative nuclear transport motifs (residues 124-139 (NLS1) and residues 186-247 (NLS2)), and a proteolytic domain that can induce generalized proteolysis of both viral and host proteins. The C-terminal domain of PA (residues 257-716) binds to PB1 for complex formation and nuclear transport [66]. There were three novel sites 186, 204, and 213 within the second putative nuclear localization signals (NLS2), and one novel site 626 within the PB1 binding domain (Table 10). 3.2.9. P B1 Protein The influenza virus polymerase is responsible for repli- cation and transcription of the eight gene segments of the viral RNA genome in the infected host cell. PB1 can interact with PB2, PA, and NP and binds to viral pro- moter, and is accountable for viral RNA elongation and cap RNA cleavage activities [66,67]. Position 336 was the only avian-human host shift markers in [43]. We found positions 212, 327, 361, 375, 384, 401, 473, and 584 equally significant as position 336 (Figure 9). There were one novel site 12 within the PB1-PA binding domain (residues 1-25) and two novel sites 618 and 728 in the PB1-PB2 binding domain (resi- dues 600- 7 57 ) (Table 11) [68]. PB1-PA binding domain (residues 1-25) and two novel Table 9. This table contains the consensus amino acids and their frequency at positions in NS2 that have high importance in separat- ing 2009 pandemic H1N1 from human viruses. The novel host sites in this protein are the positions without an ‘a’ (for avian) or a ‘s’ (for swine) or both. Position 6 14(a,s) 32(s) 34(s) 40(a,s) 48(a) 57(a,s) Avian V(79.1%) M(56.2%) I(99.0%) Q(95.6%) L(71.54%) A(73.2%) S(97.6%) Human V(98.2%) L(55.6%) I(99.0%) Q(98.7%) L(61.9%) A(96.6%) S(60.3%) 2009 H1N1 M(99.7%) M(100%) V(99.7%) R(100%) I(100%) T(100%) Y(100%) Swine V(95.2%) M(83.2%) V(68.1%) Q(53.3%) I(67.8%) A(70.1%) Y(65.8%) Position 60(a,s) 63(a,s) 83(a) 89(a,s) 107(a,s) 115(a) Avian S(55.7%) G(75.8%) V(71.7%) I(69.8%) L(99.9%) T(84.4%) Human N(68.7%) G(96.3%) V(98.2%) T(63.2%) F(74.9%) T(89.8%) 2009 H1N1 S(100%) E(95.8%) M(99.7%) A(97.8%) L(100%) A(99.4%) Swine N(61.0%) E(62.1%) V(97.4%) M(32.5%) L(90.9%) T(98.3%) Figure 7. Top important NS2 positions in distinguishing avian, human, 2009 pandemic H1N1, 2009 seasonal H1N1, and swine viruses.
596 W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 Copyright © 2010 SciRes. JBiSE Table 10. This table contains the consensus amino acids and their frequency at positions in PA that have high importance in separat- ing 2009 pandemic H1N1 from human viruses. The novel host sites in this protein are the positions without an ‘a’ (for avian) or a ‘s’ (for swine) or both. Position 28(a,s) 55(a) 85 100(a,s) 186 204 213 256 262 275 277 Avian P(99.5%) D(98.9%)T(96.5%) V(94.5%)G(98.0%)R(80.6%)R(97.8%)R(98.6%)K(96.9%) P(97.5%)S(97.1%) Human L(70.3%) N(71.3%)T(91.0%) A(70.6%)G(100%) R(53.0%)R(97.5%)R(54.5%)K(95.0%) P(98.9%)S(35.8%) 2009 H1N1 P(100%) D(100%) I(100%) V( 100%) S(100%) K(100%)K(100%) K(100%) R(100%) L(98.6%)H(100%) Swine P(79.9%) D(53.4%)T(60.9%) V(86.6%)G(96.8%)R(90.7%)R(92.1%)R(63.8%)K(77.8%) P(91.5%)S(53.4%) Position 336 337(a,s) 356(a) 362 388 400(a,s) 404(a,s) 407 552(a,s) 626 Avian L(99.5%) A(88.7%)K(98.9%) K(99.5%)S(80.1%)S(40.9%)A(93.2%)I(95.9%) T(99.7%) K(83.7%) Human L(97.1%) S(35.8%) R(69.9%) K(98.6%)S(84.2%)L(79.2%)S(72.0%)I(98.6%) S(71.7%) K(98.2%) 2009 H1N1 M(100%) A(99.0%)R(99.7%) R(100%) G(99.0%)P(100%) A(100%) V(99.3%)T(99.7%) R(99.3%) Swine L(95.6%) A(86.9%)K(53.9%) K(71.1%)S(52.8%)P(30.6%)A(86.9%)I(71.1%) T(91.5%) K(96.2%) Figure 8 . Top important PA positio ns in distinguishing avian, human , 2009 pandemic H1N1, 2009 seasonal H1N1, and swine viruses. sites 618 and 728 in the PB1-PB2 binding domain (resi- dues 600- 7 57 ) (Table 11) [68]. 3.2.10. PB2 Protein PB2 interacts with PB1 and NP, but not PA. Its primary function is binding to cap structures on host cell pre- mRNAs before they are cleaved to provide primers for viral mRNA synthesis [66]. Positions 9, 44, 64, 81, 105, 199, 271, 292, 368, 475, 567, 588, 613, 627, 661, 674, and 702 were avian-human host shift markers in [43]. Positions 108, 197, and 684 were as significant as these sites in our finding. Position 44 was a swine- human marker in [40], but our analysis i mplied positions 64, 65, 81, 105, 199, 292, 567, 627, 649, 661, and 674 were equally important as position 44 (Figure 10). Position 702, an avian-human marker selected in [40,43,69], ranked 21th in our Random Forests analysis, and there- fore it was not included in our plot in Figure 10. In addi- tion to the SR polymorphism, S590 and R591, found in [23], novel sites in PB2 discovered here provided addi- tional polymorphism that might convey enhanced poly- merase activity in human cells. Position 627 in PB2 was considered critical for host shifts in our analysis, a well-known host marker discussed in [21,22], and was located in the PB2-PB1 and PB2-NP b inding domains [43]. The PB2-NP binding domain contains residues 1-269 and 580-683, and the PB2-PB1 binding domain contains residues 51-259 and 580-759. There w ere novel sites 54, 590, 645, and 667 in the PB2-PB1 and PB2-NP binding
W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 597 Copyright © 2010 SciRes. JBiSE Table 1 1 . This table contains the consensus amino acids and their frequency at positions in PB1 that have high importance in sepa- rating 2009 pandemic H1N1 from human viruses. The novel host sites in this protein are the positions without an ‘a’ (for avian) or a ‘s’ (for swine) or both. Position 12 175 179 216 298 327(a,s) 339(s) 361(a,s) 364 386(s) Avian V(99.0%) D(95.9%) M(95.9%) S(95.7%) L(98.6%) R(98.7%) I(98.7%) S(99.0%) L(99.5%) R(57.0%) Human V(99.7%) D(97.2%) M(69.6%) S(65.7%) L(79.2%) K(53.6%) I(96.9%) S(61.9%) L(99.3%) R(65.4%) 2009 H1N1 I(99.7%) N(100%) I(100%) G(100%) I(100%) R(100%) M(100%) R(100%) I(100%) K(100%) Swine V(95.1%) D(96.7%) M(66.0%) S(60.1%) L(96.7%) R(89.7%) I(44.6%) N(32.3%) L(96.5%) R(95.4%) Position 435 486 517(s) 584(a,s) 587 618 638(s) 728 741(a,s) Avian T(99.0%) R(98.6%) I(99.1%) R(97.1%) A(98.4%) E(97.7%) E(98.8%) I(99.1%) A(96.2%) Human T(99.3%) R(64.7%) I(81.3%) R(63.7%) A(98.6%) E(99.7%) E(98.6%) I(100%) A(59.2%) 2009 H1N1 I(99.4%) K(100%) V(100%) Q(100%) V(97.4%) D(100%) D(100%) V(10 0%) S(100%) Swine T(66.8%) R(64.1%) I(75.5%) R(40.5%) A(86.7%) E(61.4%) E(69.0%) I(98.4%) A(59.0%) Figur e 9 . Top important PB1 pos itions in d istinguis hing avian , human, 2 009 pandemic H1N1, 2009 s easonal H1N 1, and swin e viruses . domains and sites 147 and 225 in the PB2-NP binding domain [43] (Table 12). 4. DISCUSSIONS Extensive research to date provided highly informative knowledge about th e origin and genetic lin eages of 2009 pandemic H1N1, but the host markers of this new virus remained elusive. Recent studies indicated that human host adaptation is complex and multigenic, and the well-known host shift markers are lacking in this new virus. The hypothesis in the current study was that these markers of 2009 pandemic H1N1 might exist outside of the space of traditional host switch markers. To test this hypothesis in this study, Rando m Forests were applied to uncover novel important markers in each of the ten pro- teins of influenza that could differentiate 2009 p andemic H1N1 from human viruses, but were not present in the previous avian-human or swine-human host switch mark - ers. Our approach naturally led to a systematic discovery of new host markers like the SQ polymorphis m found in [23] that could enrich our current knowledge of 2009 pandemic H1N1 and complement the repertoire of ex- isting host shift signatures. Among others, this study revealed th e n ovel ho st sites 54, 14 7, 22 5, 315 , 45 3, 55 9, 590, 645, and 667 in PB2 of 2009 pandemic H1N1. They provided ample potential sites to investigate ex- perimentally whether they also compensate the lack of
598 W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 Copyright © 2010 SciRes. JBiSE amino acid lysine at residue 627, as the SR polymor- phism. Their prospective broader roles in enhancing this new virus’s replication and transmission in humans are worthy of further research. In this regard, the three posi- tions 54, 315, and 559 in PB2 were particularly of inter- est because they had much higher importance than the two positions 590 and 591 associated with the SR poly- morphism. Four proteins are involved and required in the synthe- sis of influenza virus RNA, which are PB2 and PA of avian lineage, PB1 of human origin, and NP derived from classical swine viruses in 2009 pandemic H1N1. To gain insight into the adaptive strategies employed by these four proteins of different origins to evade restric- tion in human cells will be a challenge. The novel sites identified in this study provided a starting point for fu- ture integrative examination of the interactions of these proteins. It was expected that 2009 pandemic H1N1 would co- circulate with seasonal H1N1 for some time. Our cata- logue of amino acid markers that could effectively sepa- rate 2009 pandemic H1N1 from 2009 seasonal H1N1 presented a valuable view of these two viruses that share similar clinical courses but are unique gen etically. Table 12. This table contains the consensus amino acids and their frequency at positions in PB2 that have high importance in sepa- rating 2009 pandemic H1N1 from human viruses. The novel host sites in this protein are the positions without an ‘a’ (for avian) or a ‘s’ (for swine) or both. Position 9(a) 54 64(a,s) 65(s) 81(a,s) 105(a,s) 147 184(s) 199(a,s) Avian D(97.5%) K(99.7%) M(74.9%) E(97.8%) T(97.3%) T(90.9%) I(82.1%) T(96.3%) A(99.2%) Human N(71.0%) K(100%) T(68.0%) E(98.5%) M(52.0%) V(52.4%) I(87.7%) T(99.6%) S(72.9%) 2009 H1N1 D(10 0%) R(100%) M(100%) D(99.7%) T(100%) T(100%) T(100%) A(99.0%) A(100%) Swine D(63.9%) K(98.5%) M(53.2%) E(69.4%) T(84.7%) T(87.8%) I(68.2%) T(57.2%) A(55.0%) Position 225 292(a,s) 315 340(s) 453 475(a) 559 567(a,s) 588(a,s) Avian S(99.4%) I(88.6%) M(95.2%) R(52.2%) P(94.7%) L(99.2%) T(91.0%) D(98.0%) A(95.8%) Human S(98.9%) T(73.6%) M(99.6%) R(60.2%) H(52.0%) M(70.3%) T(71.7%) N(70.3%) I(68.4%) 2009 H1N1 G(100%) V(99.4%) I(100%) K(97.4%) S(99.7%) L(100%) I(100%) D(100%) T(98.4%) Swine S(72.2%) I(56.0%) M(97.2%) R(59.9%) P(57.8%) L(54.1%) T(70.9%) D(90.2%) A(55.7%) Position 590 591(s) 613(a,s) 627(a,s) 645 661(a,s) 667 674(a,s) 684(a) Avian G(87.6%) Q(97.9%) V(98.4%) E(91.7%) M(99.4%) A(96.2%) V(92.4%) A(97.1%) A(96.9%) Human G(69.9%) Q(98.9%) T(64.3%) K(80.3%) M(98.9%) T(78.4%) I(62.1%) T(69.1%) A(51.3%) 2009 H1N1 S(99.7%) R(100%) V(100%) E(100%) L(100%) A(100%) V(100%) A(99.7%) S(100%) Swine G(71.3%) Q(67.3%) V(74.9%) E(53.8%) M(74.9%) A(48.0%) V(69.4%) A(86.5%) A(74.3%) Figure 10. Top important PB2 positions in distinguishing avian, human, 2009 pandemic H1N1, 2009 seasonal H1N1, and swine viruses.
W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 599 Copyright © 2010 SciRes. JBiSE Various computational techniques including entropy [39,40], mutual information [42,43], statistical tests [44 ], and support vector machines [45] were utilized to dis- cover molecular markers in influenza viruses. To dem- onstrate the validity of using Random Forests as a fea- ture selection technique in identifying novel host markers in 2009 pandemic H1 N1, the top markers found by Random Forests to distinguish the human virus from avian or swine viruses were also included in this report, which contained many known host adoption markers from previous studies. There were fewer novel sites in M1, M2, and NS2 than in the other proteins under this study resulting from many avian-human or swine-human sites among these proteins. 5. CONCLUSIONS Our findings confirmed that there are novel host sites in the proteins of 2009 pandemic H1N1 that could separate this new virus from human viruses with high confidence. These markers could not be found in the search space of traditional avian-human or swine-human host shift mark- ers, thus offering new potential sites for further experi- mental verificat ion to el ucidate t heir bi ologi cal functi ons. 6. ACKNOWLEDGEMENTS We thank Houghton College for its financial support. REFERENCES [1] Samji, T. (2009) Influenza A: Understanding the viral life cycle . Yale Journal of Biology Medicine, 82(4), 153- 159. [2] Gavin, J.D., Smith, D.V., Bahl, J., Lycett, S.J., et al. (2009) Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature, 459, 1122-1125. [3] Chang, Y.S., van Hal, S.J., Spencer, P.M, Gosbell, I.B. and Collett, P.W. (2010) Comparison of adult patients hospitalised with pandemic (H1N1) 2009 influenza and seasonal influenza during the PROTECT phase of the pandemic response. The Medical Journal of Australia, 192(2), 90-93. [4] Dhiman, N., Mark, J.E., Irish, C., Wright, P., Smith, T.F. and Pritt, B.S. (2010) Mutability in the matrix gene of novel influenza A H1N1 virus detected using a fret probe-based real-time reverse transcriptase PCR assay. Journal of Clinical Microbiology, 48(2), 677-679. [5] Zheng, X., Todd, K.M., Yen-Lieberman, B., Kaul, K., Mangold, K. and Shulman, S.T. (2009) Unique finding of a 2009 H1N1 influenza virus-positive clinical sample suggests matrix gene sequence variation. Journal of Clinical Microbiology, 48(2), 665-666. [6] Shen, J., Ma, J. and Wang, Q. (2009) Evolutionary trends of A (H1N1) influenza virus hemagglutinin since 1918. PLoS One, 4(11), e7789. [7] Soundararajan, V., Tharakaraman, K., Raman, R., Ragu- ram, S., Shriver, Z., Sasisekharan, V. and Sasisekharan, R. (2009) Extrapolating from sequence—the 2009 H1N1 ‘swine’ influenza virus. Nature Biotechnology, 27, 510- 513. [8] Childs, R.A., Palma, A.S. , Wharton, S., Matrosovich, T., Liu, Y., Chai, W.G., Campanero-Rhodes, M.A., et al. (2009) Receptor-binding specificity of pandemic influ- enza A (H1N1) 2009 virus determined by carbohydrate microarray. Nature Biotechnology, 27, 797-799. [9] Igarashi, M., Ito, K., Yoshida, R., Tomabechi, D., Kida, H. and Takada, A. (2009) Predicting the antigenic struc- ture of the pandemic (H1N1) 2009 influenza virus he- magglutinin. PLoS One, 5(1), e8553. [10] Cosic, I. (1997) The resonant recognition model of mac- romolecular bioreactivity, theory and application. Birk- hauser Verlag, Berlin. [11] Veljkovic, V., Niman, H.L., Glisic, S., Veljkovic, N., Perovic, V. and Muller, C.P. (2009) Identification of he- magglutinin structural domain and polymorphisms which may modulate swine H1N1 interactions with human re- ceptor. BMC Structural Biology, 9, 62. [12] Hu, W. (2010) Identification of highly conserved do- mains in hemagglutinin associated with the receptor binding specificity of influenza viruses: 2009 H1N1, avian H5N1 and swine. Journal of Biomedical Science and Engineering, 3, 114-123. [13] Hu, W. (2010) Quantifying the effects of mutations on receptor binding specificity of influenza viruses. Journal of Biomedical Science and Engineering, 3, 227-240. [14] Janies, D.A., Voronkin, I.O., Studer, J., Hardman, J., Alexandrov, B.B., Treseder, T.W. and Valson, C. (2010) Selection for resistance to oseltamivir in seasonal and pandemic H1N1 influenza and widespread co-circulation of the lineages. International Journal of Health Geogra- phics, 9(1), 13. [15] Deyde, V.M., Sheu, T.G., Trujillo, A.A., Okomo- Adhiambo, M., Garten, R., Klimov, A.I. and Gubareva, L.V. (2010) Detection of molecular markers of drug re- sistance in 2009 pandemic influenza A (H1N1) viruses by pyrosequencing. Antimicrob Agents Chemother, 54(3), 1102-1110. [16] Hurt, A.C., Holien, J.K., Parker, M., Kelso, A. and Barr, I.G. (2009) Zanamivir-resistant influenza viruses with a novel neuraminidase mutation. The Journal of Virology, 83(20), 10366-10373. [17] Garten, R.J., Davis, C.T., Russell, C.A., Shu, B., Lind- strom, S., Balish, A., Sessions, W.M., Xu, X., et al. (2009) Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans. Science, 325(5937), 197-201. [18] Itoh, Y., Shinya, K., Kiso, M. , Watanabe, T., Sakoda, Y., Hatta, M., Muramoto, Y., et al. (2009) In vitro and in vivo characterization of new swine-origin H1N1 influ- enza viruses. Nature, 460, 1021-1025. [19] Uhlendorff, J., Matrosovich, T., Klenk, H.D. and Ma- trosovich, M. (2009) Functional significance of the he- madsorption activity of influenza virus neuraminidase and its alteration in pandemic viruses. Archives of Virol- ogy, 154(6), 945-957. [20] Sung, J.C., van Wynsberghe A.W., Amaro, R.E., Li, W.W. and McCammon, J.A. (2010) Role of secondary
600 W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 Copyright © 2010 SciRes. JBiSE sialic acid binding sites in influenza N1 neuraminidase. Journal of the American Chemistry Society, 132(9), 2883-2885. [21] Steel, J., Lowen, A., Mubareka, S., Palese, P. and Baric, R. (2009) Transmission of influenza virus in a mammal- ian host is increased by PB2 amino acids 627K or 627E/701N. PLoS Pathog, 5, e1000252. [22] Subbarao, E.K., London, W. and Murphy, B.R. (1993) A single amino-acid in the Pb2-gene of influenza-A virus is a determinant of host range. Journal of Virology, 67, 1761-1764. [23] Mehle, A. and Doudna, J.A. (2009) Adaptive strategies of the influenza virus polymerase for replication in hu- mans. Proceedings of the National Academy of Sciences of the United States of America, 106(50), 21312-21316. [24] Rolling, T., Koerner, I., Zimmermann, P., Holz, K., Hal- ler, O., Staeheli, P. and Kochs, G. (2009) Adaptive muta- tions resulting in enhanced polymerase activity contrib- ute to high virulence of influenza A virus in mice. Jour- nal of Virology, 83 (13), 6673-6680. [25] Liu, X. and Zhao, Y.P. (2010) Switch region for patho- genic structural change in conformational disease and its prediction. PLoS One, 5(1), e8441. [26] Chen, W., Calvo, P.A., Malide, D., Gibbs, J., Schubert, U., Bacik, I., Basta, S., O'Neill, R., Schickli, J., Palese, P., Henklein, P., Bennink, J.R. and Yewdell, J.W. (2001) A novel influenza A virus mitochondrial protein that in- duces cell death. Nature Medicine, 7, 1306-1312. [27] Lamb, R.A. and Takeda, M. (2001) Death by influenza virus protein. Nature Medici ne, 7, 1286-1288. [28] Zell, R., Krumbholz, A., Eitner, A., Krieg, R., Halbhuber, K.J. and Wutzler, P. (2007) Prevalence of PB1-F2 of in- fluenza A viruses. Journal of General Virology, 88, 536- 546. [29] McAuley, J.L., Zhang, K. and McCullers, J.A. (2010) The effects of influenza A virus PB1-F2 protein on po- lymerase activity are strain specific and do not impact pathogenesis. Journal of Virology, 84(1), 558-564. [30] Ramakrishnan, M.A., Gramer, M.R., Goyal, S.M. and Sreevatsan, S. (2009) A Serine12Stop mutation in PB1- F2 of the 2009 pandemic (H1N1) influenza A: A possible reason for its enhanced transmission and pathogenicity to humans. Journal of Veterinary Science, 10(4), 349-351. [31] Trifonov, V. and Rabadan, R. (2009) The contribution of the pb1-f2 protein to the fitness of influenza a viruses and its recent evolution in the 2009 influenza A (H1N1) pandemic virus. PLoS Current: Influenza, 21, RRN1006. [32] Hale, B.G., Randall, R.E., Ortín, J. and Jackson, D. (2008) The multifunctional NS1 protein of influenza A viruses, Journal of General Vi rology, 89, 2359-2376. [33] Zhang, C.F., Yang, Y.T., Zhou, X.W., Liu, X.L., Song, H.B., He, Y.X. and Huang, P.T. (2010). Highly patho- genic avian influenza A virus H5N1 NS1 protein induces caspase-dependent apoptosis in human alveolar basal epithelial cells. Virology Journal, 7, 51. [34] Jackson, D., Hossain, M.J., Hickman, D., Perez, D.R. and Lamb, R.A. (2008) A new infl uenza virus virulence de- terminant: The NS1 protein four C-terminal residues modulate pathogenicity. Proceedings of the National Academy of Sciences of the United States of America, 105, 4381-4386. [35] Seo, S.H., Hoffmann, E. and Webster, R.G. (2002) Le- thal H5N1 influenza viruses escape host anti-viral cyto- kine responses. Nature Medicine, 8, 950-954. [36] Salomon, R., Franks, J., Govorkova, E.A., Ilyushina, N.A., Yen, H.L., Hulse-Post, D.J., Humberd, J., Trichet, M., Rehg, J.E., Webby, R.J., Webster, R.G. and Hoff- mann, E. (2006) The polymerase complex genes contrib- ute to the high virulence of the human H5N1 influenza virus isolate A/Vietnam/1203/04, Journal of Experimen- tal Medicine, 203(3), 689-697. [37] Furuse, Y., Suzuki, A., Kamigaki, T. and Oshitani, H. (2009) Evolution of the M gene of the influenza A virus in different host species: Large-scale sequence analysis. Virology Journal, 6, 67. [38] Furuse, Y., Suzuki A. and Oshitani, H. (2009) Large- scale sequence analysis of M gene of influenz a A viruse s from different species: Mechanisms for emergence and spread of amantadine resistance. Antimicrobial Agents and Chemotherapy, 53(10), 4457-4463. [39] Chen, G.W., Chang, S.C., Mok, C.K., Lo, Y.L., Kung, Y.N., et al. (2006) Genomic signatures of human versus avian influenza A viruses. Emerging Infectious Diseases, 12, 1353-1360. [40] Chen, G.W. and Shih, S.R. (2009) Genomic signatures of influenza A pandemic (H1N1) 2009, Virus. Emerging Infectious Diseases, 15, 1897-1903. [41] Pan, C., Cheung, B., Tan, S., Li, C., Li, L., et al. (2010) Genomic signature and mutation trend analysis of pan- demic (H1N1) 2009, Influenza A virus. PLoS One, 5(3), e9549. [42] Miotto, O., Heiny, A., Tan, T.W., August, J.T., Brusic, V. (2008) Identification of human-to-human transmissibility factors in PB2 proteins of influenza A by large-scale mutual information analysis. BMC Bioinformatics, 9, S18. [43] Miotto, O., Heiny, A.T., Albrecht, R., García-Sastre, A., Tan, T.W., August, J.T. and Brusic, V. (2010) Com- plete-proteome mapping of human influenza A adaptive mutations: implications for human transmissibility of zoonotic strains. PLoS One, 5(2), e9025. [44] Finkelstein, D.B., Mukatira, S., Mehta, P.K., Obenauer, J.C., Su, X., Webster, R.G. and Naeve, C.W. (2007) Per- sistent host markers in pandemic and H5N1 influenza viruses. Journal of Virology, 81(19), 10292-10299. [45] Allen, J.E., Gardner, S.N., Vitalis, E.A., Slezak, T.R. (2009) Conserved amino acid markers from past influ- enza pandemic strains. BMC Microbioloy, 9, 77. [46] Katoh, K., Kuma, K., Toh, H. and Miyata, T. (2005) MAFFT version 5: Improvement in accuracy of multiple sequence alignment. Nucleic Acids Research, 33, 511- 518. [47] Breiman, L. (2001) Random Forests. Machine Learning, 45(1), 5-32. [48] Díaz-Uriarte, R. and Alvarez de Andrés, S. (2006) Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3. [49] Archer, K.J. and Kimes, R.V. (2008) Empirical charac- terization of random forest variable importance measures. Computational Statistics and Data Analysis, 52, 2249- 2260. [50] Reif, D.M. Motsinger, A.A., McKinney, B.A., Crowe, J.E. and Moore, J.H. (2006) Feature selection using a random forests classifier for the integrated analysis of
W. Hu / J. Biomedical Science and Engineering 3 (2010) 584-601 601 Copyright © 2010 SciRes. JBiSE multiple data types. Proceedings of 2006 IEEE Sympo- sium on Computational Intelligence and Bioinformatics and Computational Biology, Toronto. [51] Granittoa, P.M., Furlanellob, C., Biasiolia, F. and Gasperia, F. (2006) Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemometrics and Intelligent Laboratory Sys- tems, 83, 83-90. [52] Menze1, B.H., Kelm, B.M., Masuch, R., Himmelreich, U., Bachert, P., Petrich, W. and Hamprecht, F.A. (2009) A comparison of random forest and its Gini importance with standard chemometric methods for the feature se- lection and classification of spectral data. BMC Bioin- formatics, 10, 213. [53] Gao, D., Zhang, Y.X. and Zhao, Y.H. (2009) Random forest algorithm for classification of multi-wa velen gth data. Research in Astronomy and Astrophysics, 9(2), 220-226. [54] Hu, W. (2009) Identifying predictive markers of chemo- sensitivity of breast cancer with random forests. Journal of Biomedical Science and Engineering, 3(1), 59-64. [55] KováccaronOVá, A., Ruttkay-Nedecký, G., Karol Haver- líK1, I. and Janecccaronek, S. (2002) Sequence similari- ties and evolutionary relationships of influenza virus A hemagglutinins. Virus Genes, 24, 57-63. [56] Colman, P.M., Hoyne, P.A. and Lawrence, M.C. (1993) Sequence and structure alignment of paramyxovirus he- magglutinin-neuraminidase with influenza virus neura- minidase. Journal of Virology, 67, 2972-2980. [57] Maurer-Stroh, S. Ma, J.M., Lee, R.T.C., Sirota, F.L. and Eisenhaber, F. (2009) Mapping the sequence mutations of the 2009 H1N1 influenza A virus neuraminidase rela- tive to drug and antibody binding sites. Biology Direct, 4, 18. [58] Liu, T. and Ye, Z.P. (2005) Attenuating mutations of the matrix gene of influenza A/WSN/33 Virus. Journal of Virology, 79(3), 1918-1923. [59] Baudin, F., Petit, I., Weissenhorn, W. and Ruigrok, R.W.H. (2001) In vitro dissection of the membrane binding and RNP binding activities of influenza virus M1 protein. Virology, 281, 102-108. [60] Dua, Q.S., Wang, S.Q., Huang, R.B. and Chou, K.C. (2010) Computational 3D structures of drug-targeting proteins in the 2009-H1N1 influenza A virus. Chemical Physics Letters, 485, 191-195. [61] Ye Q., Krug R.M. and Tao Y.J. (2006) The mechanism by which influenza A virus nucleoprotein forms oli- gomers and binds RNA. Nature, 444, 1078-1082. [62] Biswas, S.K., Boutz, P.L. and Nayak, D.P. (1998) Influ- enza virus nucleoprotein interacts with influenza virus polymerase proteins. Journal of Virology, 72, 5493-5501. [63] Lin, D., Lan, J. and Zhang, Z. (2007) Structure and func- tion of the NS1 protein of influenza A virus. Acta Bio- chim Biophys Sin (Shanghai), 39(3), 155-162. [64] Robb, N.C., Smith, M., Vreede, F.T. and Fodor, E. (2009) NS2/NEP protein regulates transcription and replication of the influenza virus RNA genome. Journal of General Virology, 90, 1398-1407. [65] Iwatsuki-Horimoto, K., Horimoto, T., Fujii, Y. and Kawa- oka, Y. (2004) Generation of influenza A virus NS2 (NEP) mutants with an altered nuclear export signal se- quence. Journal of Virology, 78(18), 10149-10155. [66] Yuan, P.W., Bartlam, M., Lou, Z.Y., Chen, S.D., Zhou, J., He, X.J., Lv, Z.Y., Ge, R.W., Li, X.M., Deng, T., Fo- dor, E., Rao, Z.H. and Liu, Y.F. (2009) Crystal structure of an avian influenza polymerase PAN reveals an en- donuclease active site. Nature, 458, 909-913. [67] Biswas, S.K. and Nayak, D.P. (1994) Mutational analysis of the conserved motifs of influenza A virus polymerase basic protein 1. Journal of Virology, 68, 1819-1826. [68] Ohtsu, Y., Honda, Y., Sakata, Y., Kato, H. and Toyoda, T. (2002) Fine mapping of the subunit binding sites of influenza virus RNA polymerase. Microbiology and Im- munology, 46, 167-175. [69] Taubenberger, J.K., Reid, A.H., Lourens, R.M., Wang, R., Jin, G. and Fanning, T.G. (2005) Characterization of the 1918 influenza virus polymerase genes. Nature, 437(7060), 889-893.
|