American Journal of Molecular Biology, 2011, 1, 52-61
doi:10.4236/ajmb.2011.12007 Published Online July 2011 in SciRes.
Characterization of Asian and North American avian H5N1
Wei Hu
Department of Computer Science, Houghton College, New York, USA.
Received 15 March 2011; revised 13 May 2011; accepted 30 May 2011.
Since the emergence of the highly pathogenic avian
H5N1 virus in Asia in 1996, the possibility for this
virus to cross species barriers to infect humans and
its ability to cause large outbreaks in birds have been
a public health concern. This virus has been spread-
ing from Asia to Europe and Africa by migratory
birds with North America as its next possible stop. In
this study, an ensemble of computational techniques
including Random Forests, Informational Spectrum
Method, Entropy, and Mutual Information were em-
ployed to unravel the distinct characteristics of Asian
and North American avian H5N1 in comparison with
human and swine H5N1. Critical differences were
identified in the HA cleavage and binding sites, the
HA receptor selection, the interaction patterns of HA
and NA, and NP, PA, PB1, and PB2, and the impor-
tant sites in the influenza proteins including HA, NA,
M1, M2, NS1, NS2, NP, PA, PB1, PB1-F2, and PB2.
Keywords: H5N1; Hemagglutinin; Influenza;
Informational Spectrum Method; Mutation; Random
Forests; Receptor Binding Specificity
Wild birds are a natural reservoir of all known influenza
A subtypes, and many of these viruses cause only mild
symptoms in birds. The highly pathogenic avian H5N1
virus was first detected in China in 1996 [1] and was
also the cause of two subsequent outbreaks in migratory
birds at Qinghai Lake in China in 2005 and 2006. The
first group of H5N1 human infections was reported in
Hong Kong in 1997 [2]. However, there is no evidence
of efficient human-to-human transmission of the highly
pathogenic avian H5N1 virus. This virus has spread to
other Asian countries including Indonesia, Japan, Korea,
Thailand, Vietnam, and Malaysia, and most recently to
Europe and Africa. Therefore, there is a growing concern
over the potential for migratory birds to introduce the
highly pathogenic Asian H5N1 strain into North Amer-
ica. A strategic plan was developed for the early detec-
tion of highly pathogenic avian H5N1 in the United
States [3].
In addition to the possible spread of highly pathogenic
avian H5N1 by migratory birds, viral mutations and re-
assortment of avian, human and swine viruses could
generate a new strain capable of transmission among
humans. In particular, swine can serve as a “mixing ves-
sel” in the generation of a novel virus, because they are
susceptible to infection with both avian and human vi-
ruses. A recent report [4] showed that swine H1N1,
H1N2, and H3N2 viruses are currently co-circulating in
China, and the highly pathogenic avian H5N1 virus
might be able to contribute genes to swine H3N2 virus,
demonstrating the continued risk for further reassortment
of swine virus and continued spread of pandemic 2009
H1N1 virus worldwide. Another fear is that if swine can
carry both H5N1 and 2009 H1N1, the viruses can com-
bine and mutate into a novel strain of high virulence that
can transmit efficiently among humans.
The influenza A virus genome encodes 11 proteins.
Four proteins, HA, NS1, PB1-F2, and PB2, are recog-
nized as major determinants for pathogenicity of influ-
enza, with PB1-F2 as the most recently discovered pro-
tein. Although the host barrier for avian viruses to spread
in humans is multigenic, the receptor binding specificity
of HA is a major obstacle for direct transmission of
avian viruses to humans. In general, human influenza
viruses tend to bind to SA α2,6Gal receptors, whereas
avian viruses favor SA α2,3Gal receptors. Adaptation of
avian virus to humans likely requires a shift in receptor
binding specificity of the virus from avian-type to human-type.
In humans, the SA α2,6Gal receptor is expressed
mainly in the upper airway, but the SA α2,3Gal
receptor is expressed in alveoli and the terminal bronchiole
[5]. Clinical data illustrated that an influenza virus
that could bind to both SA α2,3Gal and SA α2,6Gal
receptors is highly pathogenic.
The receptor binding site of HA comprises three secondary
structure elements: the 190 helix (residues 190 -
198), the 130 loop (residues 135 - 138) and the 220 loop
(residues 221 - 228) that form the sides of the site with
the base made up of the conserved residues Tyr 98, Trp
153, His 183 and Tyr 195 (H3 numbering) [6]. The re-
ceptor binding affinity of HA could be altered by several
key residues, and a wide variety of such mutations have
been identified. A single mutation D225G changed the
binding of one strain of 1918 H1N1 from pure human-
type binding to human and avian types (dual binding) [7].
Amino acids at positions 226 and 228 (H3 numbering)
could affect binding preference of several subtypes in-
cluding H2, H3, H4 and H9, and on the other hand the
changes at positions 190 and 225 could influence H1
subtype. Computational modeling also indicated that HA
amino acid residues, Tyr 98, Val 135, Ser 136, Ser 137,
Trp 153, Ile 155, His 183, Glu 190, Leu 194, and Gln
226, are important for the avian-type binding of H5N1
HA, with Gln 226 as a very critical residue in this regard.
Further, amino acid residues, Leu 133, Val 135, Trp 153,
Ile 155, Glu 190, and Lys 193, are important for the hu-
man-type binding of H5N1 HA [8]. Interestingly, a bio-
informatics approach, termed informational spectrum
method (ISM), was applied to the HA1 domain of HA in
the study of its receptor binding specificity [9-13].
Besides the role HA plays in host range restriction,
the cooperative contributions to human adaptation from
other proteins of avian and swine influenza were also
investigated. Support vector machines, entropy, and mu-
tual information were utilized to uncover the unique mo-
lecular features of these viruses [14-22]. Most recently,
Random Forests [23] were applied successfully to tackle
the same problem, where novel host markers in the pro-
teins and genes of pandemic 2009 H1N1 were identified
[24,25]. These findings highlighted that host adaptation
of influenza viruses is a complex and polygenic trait.
While there have been extensive studies on host markers
in general influenza species including avian and swine
viruses, the purpose of this study was to narrow the fo-
cus by identifying the distinct characteristics of the
highly pathogenic avian H5N1 from Asia and the low
pathogenic avian H5N1 from North America, in com-
parison with human and swine H5N1.
2.1. Sequence Data
The sequences of influenza were retrieved from the In-
fluenza Virus Resource of the National Center for Bio-
technology Information (NCBI). Only the full length and
unique sequences were selected. All sequences used in
this study were aligned with MAFFT [26].
2.2. Informational Spectrum Method
Table 1. Characteristic IS frequencies of HA proteins in 2009 H1N1, swine H1N1/H1N2, avian H1N1, and A/South Carolina/1/18 (H1N1).
Subtype     2009 H1N1   swine H1N1/H1N2   avian H1N1   seasonal human H1N1   A/South Carolina/1/18
Frequency   F(0.295)    F(0.055)          F(0.076)     F(0.236)              F(0.258)

The informational spectrum method (ISM) is a bioinformatics
technique that can be used to analyze protein
sequences [27]. The idea is to translate the protein se-
quences into numerical sequences based on electron-ion
interaction potential (EIIP) of each amino acid. Then the
Discrete Fourier Transform (DFT) can be applied to
these numerical sequences, and the resulting DFT coef-
ficients are used to produce the energy density spectrum.
The informational spectrum (IS) comprises the frequen-
cies and the amplitudes of this energy density spectrum.
According to the ISM theory, the peak frequencies of IS
of a protein sequence reflect its biological or biochemi-
cal functions. The ISM was successfully applied to
quantify the effects of HA mutations on the receptor
binding preference in [10] and reveal the change of re-
ceptor binding selection caused by the mutations identi-
fied in [28] and [12]. Table 1 shows several common IS
frequencies identified in [12,13].
It was observed in [11] that some of the influenza
strains displayed dual HA receptor binding preference.
Consequently, in this study we used the top two IS frequencies,
one primary and one secondary, to describe the HA
receptor selection.
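
The ISM steps described above (EIIP encoding, DFT, energy density spectrum, selection of the dominant frequencies) can be illustrated with a minimal Python sketch. It is not the implementation used in this study. The EIIP table holds the values commonly quoted in the ISM literature and should be checked against [27]; the function names `informational_spectrum` and `top_frequencies` are hypothetical; the spectrum is computed for a single sequence only; and the "top" frequencies are taken simply as the largest spectral values, which is an assumption about how peaks are ranked.

```python
import cmath

# EIIP values as commonly tabulated in the ISM literature (assumption:
# verify against [27] before any real use).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def informational_spectrum(seq):
    """Return [(frequency, energy)] for DFT indices n = 1..N//2.

    The frequency is n/N, so it lies in (0, 0.5]; the energy is |X(n)|^2.
    Residues missing from the EIIP table (gaps, X) raise a KeyError.
    """
    signal = [EIIP[residue] for residue in seq.upper()]
    length = len(signal)
    spectrum = []
    for n in range(1, length // 2 + 1):
        coeff = sum(
            value * cmath.exp(-2j * cmath.pi * n * k / length)
            for k, value in enumerate(signal)
        )
        spectrum.append((n / length, abs(coeff) ** 2))
    return spectrum


def top_frequencies(seq, k=2):
    """Return the k (frequency, energy) pairs with the largest energy."""
    ranked = sorted(informational_spectrum(seq), key=lambda fe: fe[1], reverse=True)
    return ranked[:k]
```

For a real HA1 sequence, `top_frequencies(ha1_sequence, k=2)` would return one candidate primary and one candidate secondary frequency in the sense used in this section; `ha1_sequence` is a placeholder for an ungapped amino acid string.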
2.3. Entropy and Mutual Information
In information theory [29,30], entropy is a measure of
the uncertainty associated with a random variable. Let x
be a discrete random variable that has a set of possible
values {a1, a2, a3, …, an} with probabilities {p1, p2, p3, …,
pn} where P(x = ai) = pi. The entropy H of x is
$$H(x) = -\sum_{i=1}^{n} p_i \log p_i.$$
The mutual information of two random variables x and y is a
quantity that measures the mutual dependence of the two
variables or the average amount of information that x
conveys about y, which can be defined as
$$I(x, y) = H(x) + H(y) - H(x, y),$$
where H(x) is the entropy of x, and H(x, y) is the joint
entropy of x and y. I(x, y) = 0 if and only if x and y are
independent random variables.
In the current study, each of the N columns in a multiple sequence alignment of a set of influenza protein sequences of length N is considered as a discrete random variable xi (1 ≤ i ≤ N) that takes on one of the 20 (n = 20) amino acid types with some probability. H(xi) has its minimum value 0 if all the amino acids at position i are the same, and achieves its maximum if all the 20 amino acid types appear with equal probability at position i, which can be verified by the Lagrange multiplier technique. A position of high entropy means that the amino acids often vary at this position. While H(xi) measures the genetic diversity at position i in our current study, I(xi, xj) measures the correlation between amino acid substitutions at positions i and j.

Table 2. Amino acids near and at the cleavage site in HA consensus sequences of different origins.
Virus                       Cleavage site          Number of basic amino acids
Human H1N1                  NIPS – – – – IQSRGLF   1
2009 H1N1                   NVPS – – – – IQSRGLF   1
North American avian H5N1   NVPQ – – – – RETRGLF   2
2010: Asian avian H5N1      NSPQREGRRRKRGLF        6
2009: Asian avian H5N1      NSPQRERRRRKRGLF        7
2008: Asian avian H5N1      NSPQRERRRKKRGLF        7
2007: Asian avian H5N1      NSPQRERRRKKRGLF        7
2006: Asian avian H5N1      NSPQRERRRKKRGLF        7
2005: Asian avian H5N1      NSPQRERRRKKRGLF        7
2004: Asian avian H5N1      NSPQRERRRKKRGLF        7
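
The column-wise entropy and pairwise mutual information used in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in the study: the logarithm is taken to base 2 (the text does not fix the base), probabilities are estimated as plain relative frequencies, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (base 2, an assumption) of a sequence of symbols."""
    total = len(symbols)
    counts = Counter(symbols)
    return sum(-(c / total) * log2(c / total) for c in counts.values())


def mutual_information(col_x, col_y):
    """I(x, y) = H(x) + H(y) - H(x, y) for two alignment columns."""
    joint = list(zip(col_x, col_y))
    return entropy(col_x) + entropy(col_y) - entropy(joint)


def alignment_columns(aligned_seqs):
    """Turn equal-length aligned sequences into a list of columns."""
    return list(zip(*aligned_seqs))
```

With a toy alignment such as `["AC", "AD", "GC", "GD"]`, each column has entropy 1 bit, and the two columns are independent, so their mutual information is 0.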
3.1. Cleavage Site in HA
Avian H5N1 can be divided into two groups, highly and
low pathogenic viruses based on their difference in viru-
lence. The HA protein playing a key role in pathogenic-
ity has two domains, HA1 and HA2, which are cleaved
from their precursor HA0 by cellular proteases. Nor-
mally, mammalian and low pathogenic avian viruses
carry an HA cleavage site with a monobasic motif,
whereas highly pathogenic avian viruses possess a K/R
polybasic HA cleavage site, which is hydrolyzed by a
broad range of proteases in the host cells. Therefore, the
polybasic HA cleavage site is a salient virulence feature.
Removal of the polybasic HA cleavage site results in a
drastic decrease in pathogenicity. However, introduction
of a polybasic motif into the HA cleavage site of a low
pathogenic avian strain might or might not transform it
into a highly pathogenic strain, indicating the existence
of additional virulence determinants in HA or other proteins
[31]. The amino acid at position 346 (323 in H3 numbering),
adjacent to the cleavage site, also played a central role
in virulence: it was demonstrated experimentally in [32]
that the presence of V346 reduced virulence whereas S346
produced the opposite result. North American avian H5N1
had V346 and its Asian counterpart had S346 (Table 2),
in addition to the difference in their cleavage site.
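
The "number of basic amino acids" column of Table 2 can be reproduced by counting lysine (K) and arginine (R) residues in each listed motif. The short Python sketch below assumes that the count is taken over the whole window shown in the table and that gap symbols are ignored; the function name `count_basic` is hypothetical.

```python
BASIC_RESIDUES = frozenset("KR")


def count_basic(motif):
    """Count basic residues (K or R) in a cleavage-site motif string.

    Gap symbols and spaces are simply not counted.
    """
    return sum(residue in BASIC_RESIDUES for residue in motif.upper())
```

Applied to the Table 2 entries, `count_basic("NVPQRETRGLF")` gives the monobasic-type count for North American avian H5N1, while the Asian avian motifs give six or seven basic residues.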
3.2. Receptor Binding Specificity
3.2.1. Mutations in HA Capable of Changing Binding
It is well established that avian viruses will have to ac-
quire human-type receptor preference for sustained rep-
lication and transmission in humans. The receptor bind-
ing affinity is primarily determined by the amino acids at
the receptor binding domain (RBD) along with other
critical sites [9-11]. Following [33], the amino acids at
the sites implicated in receptor specificity were listed in
Table 3 (H3 numbering), which showed that the amino
acid differences between avian, human, and swine H5N1
occurred only at sites 193 and 216, and that their key sites
190E and 225G retained the avian-type binding preference.
This finding supported the view that viruses with avian-specific
receptor binding properties could replicate and
cause infection in humans and swine, but could not do so
efficiently. To offer a comparison with human viruses,
the corresponding amino acids in the human H1N1 HA consensus
sequence from 1918 to 2008 were added to the table.
The H5 HA mutation K193R increased human-type binding,
whereas viruses with mutation R216K or S227N
did not bind to the human-type receptor [34]. The frequent
occurrences of mutations K193R and R216K observed
in Table 3 implied an inclination of H5N1 to increase
its human-type binding, which could be a concern. North
American avian H5N1 had E216 and P221, but Asian
avian, human, and swine H5N1 all had K(R)216 and
S221. Considering the corresponding amino acids at the
same sites in human H1N1 HA, the two amino acids
E216 and P221 in North American avian H5N1 appeared
to favor human-type binding.
In addition to the critical mutations discussed above,
avian H5N1 HA mutations L133V, A137V [35], N186K,
Q196R [36], E190D, G225D [37], G228S [38] were
shown to enhance its human-type binding, and mutations
Q226L and G228S to reduce its avian-type binding af-
finity [35], and mutation S227N to reduce its avian-type
binding and increase its human-type binding affinity.
Even though pandemic 2009 H1N1 was known for its efficient transmission among humans, its HA was of classical swine lineage [40]. Its HA carried D190, E216, P221, D225, and E227 (Table 3); the first two were a feature for avian-type binding whereas the second two were for human-type binding. A new mutation E391K in HA of 2009 H1N1 was found recently [41], which was at a known antigenic site and could influence the HA membrane fusion. It is expected that this novel virus will continue to mutate [28,41,42].

Table 3. Amino acids at critical sites (H3 numbering) for receptor selection in the HA consensus sequences of various origins. The distances in the table represent the Hamming distances between human H1N1 and others based on the amino acids at the 17 sites in the table.
Position                   98  136 153 183 186 190 193 194 195 196 216 221 222 225 226 227 228 Dist
human H1N1                 Y   S   W   H   P   D   A   L   Y   H   E   P   K   D   Q   E   G   0
2009 H1N1                  Y   T   W   H   S   D   S   L   Y   Q   E   P   K   D   Q   E   G   4
Swine H5N1                 Y   S   W   H   N   E   K   L   Y   Q   K   S   K   G   Q   S   G   8
North American avian H5N1  Y   S   W   H   N   E   K   L   Y   Q   E   P   K   G   Q   S   G   6
2010: Asian avian H5N1     Y   S   W   H   N   E   R   L   Y   Q   K   S   K   G   Q   S   G   8
2009: Asian avian H5N1     Y   S   W   H   N   E   R   L   Y   Q   K   S   K   G   Q   S   G   8
2008: Asian avian H5N1     Y   S   W   H   N   E   R   L   Y   Q   K   S   K   G   Q   S   G   8
2007: Asian avian H5N1     Y   S   W   H   N   E   K   L   Y   Q   K   S   K   G   Q   S   G   8
2006: Asian avian H5N1     Y   S   W   H   N   E   K   L   Y   Q   K   S   K   G   Q   S   G   8
2005: Asian avian H5N1     Y   S   W   H   N   E   K   L   Y   Q   K   S   K   G   Q   S   G   8
2004: Asian avian H5N1     Y   S   W   H   N   E   K   L   Y   Q   R   S   K   G   Q   S   G   8
2010: human H5N1           Y   S   W   H   N   E   R   L   Y   Q   K   S   K   G   Q   S   G   8
2009: human H5N1           Y   S   W   H   N   E   R   L   Y   Q   K   S   K   G   Q   S   G   8
2008: human H5N1           Y   S   W   H   N   E   K   L   Y   Q   K   S   K   G   Q   S   G   8
2007: human H5N1           Y   S   W   H   N   E   R   L   Y   Q   K   S   K   G   Q   S   G   8
2006: human H5N1           Y   S   W   H   N   E   R   L   Y   Q   K   S   K   G   Q   S   G   8
2005: human H5N1           Y   S   W   H   N   E   K   L   Y   Q   K   S   K   G   Q   S   G   8
2004: human H5N1           Y   S   W   H   N   E   K   L   Y   Q   R   S   K   G   Q   S   G   8

Figure 1. IS of consensus H5N1 HA1 of different origins.
3.2.2. Receptor Binding Preferences
Some influenza viruses tend to display dual binding preference at two different frequencies, one as their primary frequency and one as secondary [11]. With the ISM, Asian avian, human, and swine H5N1 had F(0.0765) (avian-type binding) as their primary binding frequency and F(0.236) (seasonal human H1N1 binding) as secondary (Figure 1). Our bioinformatics finding was in agreement with the experimental results in [43], which showed that avian H5N1 had strong avian-type binding and weak human-type binding. In contrast, North American avian H5N1 had F(0.0765) and F(0.137) as its primary and secondary frequencies respectively. The HA protein of North American avian H5N1 did not carry the highly pathogenic avian signature amino acids RERR at its cleavage site. Adding these four amino acids to the cleavage site of North American avian viruses did not change their primary and secondary frequencies, implying that these amino acids were not relevant for HA binding. As a comparison, the primary F(0.295) (human-type) and the secondary F(0.055) (swine-type) frequencies of 2009 H1N1 were included in Figure 1.

This kind of subtle difference in receptor binding preference, as reflected in the secondary frequency, was also observed in the two clusters of 2009 H1N1 found in [44]: both shared the same primary binding frequency but differed in their secondary frequency. One had the swine secondary frequency and the other had the 1918 Spanish flu frequency [45]. As a whole group, the 2009 H1N1 HA sequences showed swine-type receptor selection in the early months of their circulation in 2009, but this selection gradually disappeared in the later months [28]. As an extension of the work in [28], a recent report using affinity propagation identified six clusters of 2009 H1N1 HA sequences collected from May 2010 to February 2011 and found the HA receptor preference of each cluster [46].

Table 4. Hamming distances between the concatenated protein sequences of H5N1 of different origins. Due to the limited number of sequences for swine (“S”) and North American H5N1 (“N”), we formed one consensus from different years. Asian H5N1 (“A”) had seven consensuses by year (2004-2010), while human H5N1 (“H”) had five consensuses by year (2004-2008).
Year\Dist   (N, A)   (S, A)   (N, H)   (S, H)   (A, H)
2010        240      112
2009        197      110
2008        186      44       200      65       40
2007        186      28       206      56       50
2006        163      61       214      35       88
2005        162      48       196      32       51
2004        171      68       195      47       28
3.3. Hamming Distances between H5N1
Sequences of Different Origins
One easy way to compare two sequences is to calculate
their Hamming distance. To present a holistic view for
the similarities of influenza sequences under the current
study, we concatenated the consensus sequences of dif-
ferent proteins from each species including HA, NA,
M1, M2, NS1, NS2, NP, PA, PB1, PB1-F2, and PB2.
Because there were many Asian avian H5N1 sequences
available, the consensus was formed by year. Table 4
presents Hamming distances of consensus sequences of
different origins. When no sequences were available in a
particular year, we left the table entry empty in that year.
The distances between Asian avian, human, and swine
H5N1 were close to each other. However, there was a
clear increase in the distances of swine and Asian avian
H5N1 in 2009 and 2010. Unfortunately, there were no
data for the distances between human H5N1 and the other
groups in 2009 and 2010. Finally, the
distances from North American avian H5N1 to other
species were the largest in Table 4. In summary, human
and swine H5N1 sequences were closer to Asian avian
H5N1 than to North American avian H5N1.
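
The consensus-then-Hamming-distance comparison described in this section can be sketched as follows. The sketch assumes that the input sequences are already aligned to equal length, that the consensus is a simple per-column majority vote, and that gaps are treated like any other symbol; the function names are hypothetical and the tie-breaking rule (first residue encountered among equally frequent ones) is an artifact of this sketch, not a statement about the study.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Per-column majority consensus of equal-length aligned sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_seqs)
    )


def hamming(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))
```

As a check against Table 3, the 17 listed residues of human H1N1 (`"YSWHPDALYHEPKDQEG"`) and of North American avian H5N1 (`"YSWHNEKLYQEPKGQSG"`) differ at six positions, matching the distance given in the table. For Table 4, the same `hamming` function would be applied to the concatenation of the per-protein consensus sequences.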
3.4. Characteristic Sites between Asian and
North American Avian H5N1
Here we first summarized some amino acid differences
in HA, NA, NS1, and PB2 between Asian and North
American avian H5N1 because of their crucial functions
in the pathogenicity of influenza. As discussed in section
3.1, Asian avian H5N1 HA possessed a series of basic
amino acids (RRKKR) at the cleavage site, a determi-
nant of high pathogenicity for avian virus and a marker
its North American counterpart lacked. In addition, all
five consensuses of human H5N1 NS1 (2004-2008) car-
ried a deletion of 5 amino acids at positions 80 - 84 and
all swine H5N1 strains but one had this deletion, while
Asian avian H5N1 contained this deletion in 2007, 2008,
and 2010. In contrast, North American avian H5N1 had
full length NS1. This deletion in NS1 seems to increase
pathogenicity in chickens and mice [47]. Large-scale
sequence analysis of avian viruses identified a PDZ li-
gand domain of the X-S/T-X-V type at the C-terminus of
NS1. Typically, highly pathogenic H5N1 NS1 has ESEV
or EPEV while most low pathogenic human viruses
contain a different motif RSKV or RSEV, which cannot
bind PDZ-containing proteins. Most of the Asian and
North American avian, human, and swine H5N1 strains
in our dataset had an avian-type motif ESEV.
All North American avian and swine H5N1 strains
had E at PB2 627, but both human and Asian avian
H5N1 had a mixture of E and K at 627. In general, 627K
is a marker for human influenza and 627E is for avian
viruses. PB2-627K confers to avian H5N1 the advantage
of efficient growth in the upper and lower respiratory
tracts of mammals [48].
Many Asian avian, human, and swine H5N1 strains
carried a deletion of 20 amino acids in the NA stalk (residues
49 - 68), which is also an indicator of high virulence.
In contrast, only two of the North American avian strains
contained a deletion of 21 amino acids in the NA stalk
(residues 54 - 74). The active site of NA is lined by sev-
eral conserved residues (117 - 119, 133 - 138, 146 - 152,
156, 179, 180, 196 - 200, 223 - 228, 243 - 247, 277, 278,
293, 295, 344 - 347, 368, 401, 402, and 426 - 441) that
participate in recognition of its substrate [40]. The role of the second active site in NA was assessed in [49], which found that in avian NA the interaction between this second site and the primary site is essential for NA function, and that the NA of 2009 H1N1 has retained some of the important features of the second site. Following [49], we listed the amino acids at the positions that line the second site (N2 numbering) (Table 5). It appeared that the main amino acid difference occurred at position 369 for H5N1 and A/California/04/2009(H1N1) (Cal_04_09).

Table 5. Amino acid comparison at second active site residues in NA of H5N1 and Cal_04_09. The conserved residues are highlighted (N2 numbering).
366 - 373   399 - 403   430 - 433

Figure 3. Top 10 important positions in each protein of H5N1 that could distinguish North American and Asian avian H5N1.

Table 6. Consensus amino acids at top 10 important positions in each protein of H5N1 distinguishing Asian and North American avian viruses. There were several positions that had two frequent amino acids. A “-” stands for a deletion.
Position/HA      36    45    96    105   108   163   169   275   294   504
North American   E     K     S     M     T     T     V     D     V     D
Asian            T     N,D   N     L     I     G,S   Q     N     I     S
Position/NA      23    42    45    48    53    67    75    82    234   241
North American   V     S     Y     T     V     I     I     P     I     I
Asian            I,M   N     Q     P     -,I   -     L     S     V     V
Position/M1      15    27    36    101   166   207   224   230   232   240
North American   V     R     N     R     V     S     S     K     D     Y
Asian            I     K     N     K     A     N     N     R     N     Y
Position/M2      11    14    18    21    27    28    66    82    90    95
North American   T     G     R     D     V     I     E     S     H     E
Asian            T     E     R     D     V     V     E,A   S     H     E
Position/NS1     80    81    82    83    84    91    118   127   171   217
North American   T     I     A     S     V     T     R     N     D     K
Asian            -,A,T -,I   -,A   -,S   -,S,V T     K     T     S     N
Position/NS2     14    22    23    34    55    60    71    89    113   115
North American   M     G     S     Q     L     S     Q     I     I     T
Asian            V     A     S     Q     F     I     Q     I     I     A
Position/NP      34    77    105   146   373   377   430   450   482   497
North American   G     K     M     A     T     S     T     N     S     D
Asian            S     R     V     A     A     N     T     N     N     D
Position/PA      58    101   129   204   261   269   272   348   400   631
North American   G     E     I     R     L     K     D     L     P     G
Asian            G,S   D,E   I,T   R,K   L,M   R     D     I     S,P   G,S
Position/PB1     14    113   149   191   215   384   386   511   642   744
North American   A     V     V     V     R     S     R     S     N     M
Asian            V     I     I     V     R     L     K     S     N     M
Position/PB1-F2  28    33    35    48    50    55    57    58    66    70
North American   L     H     L     Q     D     T     S     L     S     E
Asian            Q     P     S     P     G     I     Y     W     N     G
Position/PB2     64    108   147   295   339   340   368   390   478   649
North American   M     T     I     V     K     R     R     D     I     V
Asian            I     T,A   T,I   V     T,K   K     Q,R   N,D   V     V

The GC content of PB1-F2 of 2009 H1N1 is more similar to that of swine influenza than to that of human influenza [50], although its PB1 gene segment was derived from human H3N2 [40]. This discovery prompted us to look at the GC content of PB1-F2 of the viruses under the current study. It turned out that the three PB1-F2s of Asian avian, human, and swine H5N1 were similar to each other in their GC content, but none of them was close to the GC content of general avian, human, or swine viruses determined in [50] (Figure 2). However, the GC content of PB1-F2 of North American avian H5N1 was similar to the general avian pattern found in [50]. The high pathogenicity of 1918 H1N1 prompted us to analyze the GC content of its PB1-F2 (A/Brevig Mission/1/1918), which turned out to be different from all the other viruses in Figure 2.

Figure 2. AT/CG density curves of PB1-F2 of different origins with sliding window size = sequence length/20.
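
A sliding-window GC density curve of the kind summarized in Figure 2 can be sketched as follows. The window size of sequence length/20 follows the figure caption; the step of one nucleotide, the integer rounding of the window size, and the function name `gc_density` are assumptions made for this illustration only.

```python
def gc_density(nucleotides, n_windows=20):
    """GC fraction in each sliding window along a nucleotide sequence.

    The window size is len(sequence) // n_windows (at least 1) and the
    window advances one base at a time (both are assumptions).
    """
    seq = nucleotides.upper()
    window = max(1, len(seq) // n_windows)
    densities = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        densities.append(sum(base in "GC" for base in chunk) / window)
    return densities
```

For a sequence containing only A, C, G, and T, the AT density in each window is one minus the GC density, so a single curve carries the information for both.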
Finally, Random Forests [23] were applied to identify the top 10 positions in each protein that could differentiate Asian and North American avian H5N1 with high confidence (Figure 3). The two surface proteins HA and NA had their top 10 positions with similar importance, while the internal proteins had more varied importance, and some of these positions displayed very high importance values. To further elucidate the differences observed in Figure 3, we presented the consensus amino acids at the top 10 positions in Table 6, which could complement the information for the amino acids at the HA binding site in section 3.2. The differences in amino acids at these positions and those in Table 3 contributed to the different HA receptor selections revealed in Figure 1.

Figure 4. Correlated pair counts within and between H5N1 HA and NA of different origins.
3.5. Correlations within and between HA and
NA, and NP, PA, PB1, and PB2 Respectively
A right balance between HA receptor binding and the release of progeny virions by NA requires the close cooperation of these two surface proteins. To reveal the correlation patterns for the two proteins of H5N1, their mutual information was calculated. We only recorded the positions in these proteins that had a positive MI value. The correlated position pairs with a positive MI value were counted according to their locations in the proteins, and the counts were then averaged by the number of sequences in each species. It was evident that, across different species, the correlation between HA and NA was higher than that within each protein. The overall interaction patterns for HA and NA of Asian avian and human H5N1 were alike, whereas the North American avian and swine H5N1 displayed distinct patterns (Figure 4).
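
The counting procedure described above can be sketched in Python. The sketch assumes that the two alignments list the same strains in the same order, uses base-2 logarithms, and treats a pair as correlated when its mutual information exceeds a small tolerance `eps` (introduced here only to absorb floating-point noise); the function and key names are hypothetical and the sketch is not the study's own code.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(symbols):
    """Shannon entropy (base 2) of a sequence of hashable symbols."""
    total = len(symbols)
    return sum(-(c / total) * log2(c / total) for c in Counter(symbols).values())


def mutual_information(col_x, col_y):
    """I(x, y) = H(x) + H(y) - H(x, y) for two alignment columns."""
    return entropy(col_x) + entropy(col_y) - entropy(list(zip(col_x, col_y)))


def correlated_pair_counts(aln_a, aln_b, eps=1e-9):
    """Count position pairs with positive MI within and between two proteins.

    aln_a and aln_b are alignments of two proteins for the same strains,
    in the same order. Counts are divided by the number of sequences.
    """
    cols_a, cols_b = list(zip(*aln_a)), list(zip(*aln_b))
    n_seq = len(aln_a)
    within_a = sum(mutual_information(x, y) > eps for x, y in combinations(cols_a, 2))
    within_b = sum(mutual_information(x, y) > eps for x, y in combinations(cols_b, 2))
    between = sum(mutual_information(x, y) > eps for x, y in product(cols_a, cols_b))
    return {
        "within_a": within_a / n_seq,
        "within_b": within_b / n_seq,
        "between": between / n_seq,
    }
```

Calling `correlated_pair_counts(ha_alignment, na_alignment)` for each origin would yield the three normalized counts compared across species in this section; `ha_alignment` and `na_alignment` are placeholders for lists of aligned sequences.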
The interactions among NP and the three proteins PA, PB1, and PB2 of the viral polymerase are essential for virus replication and host adaptation. The mutual information was also computed for these four proteins of H5N1. As in the HA and NA situation, in general the inter-protein correlations of the four proteins were stronger than the intra-protein correlations across different species (Figure 5). Although the sequence identities of Asian avian, human, and swine H5N1 were close to each other (Table 4), only the NP, PA, PB1, and PB2 of Asian avian and swine were similar in their interaction patterns (Figure 5). It is interesting to note that the two most similar sequences do not always exhibit the most similar correlation patterns [51].

Figure 5. Correlated pair counts within and between H5N1 NP, PA, PB1, and PB2 of different origins.
It is of prime importance to learn the molecular dis-
tinctions between the highly and low pathogenic avian
H5N1 viruses as we develop the knowledge for avian
influenza. This study took Asian and North American
avian H5N1 as examples of highly and low pathogenic
avian viruses respectively. We sought to investigate
several crucial aspects of these viruses including HA
receptor preference and cleavage site, NA second ac-
tive site, interaction patterns of HA and NA, and NP,
PA, PB1, and PB2, and important sites in the proteins
of these viruses.
It is believed that a switch from SA α2,3Gal to SA
α2,6Gal receptor specificity is a critical step in the
adaptation of avian viruses to a human host. The SA α2,3Gal
specificity of avian influenza viruses makes it
difficult for these viruses to be easily transmitted from
human to human after avian to human infection. The
bioinformatics technique ISM provided an efficient
way of revealing the HA receptor selections of avian
H5N1. The IS analysis on the consensus HA1 sequen-
ces of Asian avian, human, and swine H5N1 demon-
strated two dominant frequencies: F(0.076) as primary
frequency and F(0.236) as secondary, while North
American avian H5N1 had F(0.076) and F(0.137). Sequence
examination also showed that the amino acids
at two critical sites (H3 numbering) for receptor
selection were 193K and 216E for North American
avian H5N1 and 193R and 216K for Asian avian H5N1
in recent years. The frequent occurrences of mutations
K193R and R216K observed in Asian avian H5N1
highlighted the selection pressure on this virus to in-
crease its human-type binding. The different amino
acids at sites 193 and 216 plus the top 10 important
HA sites identified by Random Forests accounted for
the difference in HA receptor specificity of Asian and
North American avian H5N1 revealed in this study.
We thank Houghton College for its financial support.
[1] Xu, X., Subbarao, K., Cox, N.J. and Guo, Y. (1999)
Genetic characterization of the pathogenic influenza A/
Goose/Guangdong/1/96 (H5N1) virus: similarity of its
hemagglutinin gene to those of H5N1 viruses from the
1997 outbreaks in Hong Kong. Virology, 261, 15-19.
[2] Claas, E.C.J., Osterhaus, A.D.M.E., van Beek, R., de
Jong, J.C., Rimmelzwaan, G.F., et al. (1998) Human
influenza A H5N1 virus related to a highly pathogenic
avian influenza virus. The Lancet, 351, 472-477.
[4] Bi, Y., Fu, G., Chen, J., Peng, J., Sun, Y., Wang, J., et al.
(2010) Novel swine influenza virus reassortants in pigs,
China. Emerging Infectious Diseases.
http://www.cdc.gov/EID/content/16/7/1162.htm
[5] Shinya, K., Ebina, M., Yamada, S., Ono, M., Kasai, N.
and Kawaoka, Y. (2006) Avian flu: Influenza virus re-
ceptors in the human airway. Nature, 440, 435-436.
[6] Skehel, J.J. and Wiley, D.C. (2000) Receptor binding and
membrane fusion in virus entry: The influenza hema-
gglutinin. Annual Review of Biochemistry, 69, 531-569.
[7] Glaser, L., Stevens, J., Zamarin, D., Wilson, I.A., García-
Sastre, A., Tumpey, T.M., Basler, C.F., Taubenberger,
J.K. and Palese, P. (2005) A single amino acid substi-
tution in 1918 influenza virus hemagglutinin changes
receptor binding specificity. Journal of Virology, 79,
[8] Li, M. and Wang, B. (2006) Computational studies of
H5N1 hemagglutinin binding with SA-alpha-2,3-Gal and
SA-alpha-2,6-Gal. Biochemical and Biophysical Rese-
arch Communications, 347, 662-668.
[9] Hu, W. (2010) Identification of highly conserved domains
in hemagglutinin associated with the receptor
binding specificity of influenza viruses: 2009 H1N1,
avian H5N1, and swine H1N2. Journal of Biomedical
Science and Engineering, 3, 114-123.
[10] Hu, W. (2010) Quantifying the effects of mutations on
receptor binding specificity of influenza viruses. Jour-
nal of Biomedical Science and Engineering, 3, 227-240.
[11] Hu, W. (2010) Highly conserved domains in hemagg-
lutinin of influenza viruses characterizing dual receptor
binding. Natural Science, 2, 1005-1014.
[12] Veljkovic, V., Niman, H.L., Glisic, S., Veljkovic, N.,
Perovic, V. and Muller, C.P. (2009) Identification of hema-
gglutinin structural domain and polymorphisms which
may modulate swine H1N1 interactions with human
receptor. BMC Structural Biology, 9, 62.
[13] Veljkovic, V., Veljkovic, N., Muller, C.P., Müller, S.,
Glisic, S., Perovic, V. and Köhler, H. (2009) Charac-
terization of conserved properties of hemagglutinin of
H5N1 and human influenza viruses: Possible conse-
quences for therapy and infection control. BMC Struc-
tural Biology, 9, 21.
[14] Hu, W. (2009) Analysis of correlated mutations, stalk
motifs, and phylogenetic relationship of the 2009 influ-
enza A virus neuraminidase sequences. Journal of Bio-
medical Science and Engineering, 2, 550-558.
[15] Hu, W. (2010) The interaction between the 2009 H1N1
influenza A hemagglutinin and neuraminidase: Muta-
tions, co-mutations, and the NA stalk motifs. Journal of
Biomedical Science and Engineering, 3, 1-12.
[16] Chen, G.-W., Chang, S.-C., Mok, C.-K., Lo, Y.-L., Kung,
Y.-N., et al. (2006) Genomic signatures of human versus
avian influenza A viruses. Emerging Infectious Diseases,
12, 1353-1360.
[17] Chen, G.-W. and Shih, S.-R. (2009) Genomic signatures
of influenza A pandemic (H1N1) 2009 virus. Emerging
Infectious Diseases, 15, 1897-1903.
[18] Pan, C., Cheung, B., Tan, S., Li, C., Li, L., et al. (2010)
Genomic signature and mutation trend analysis of
pandemic (H1N1) 2009 influenza A virus. PLoS ONE, 5,
Article ID e9549. doi:10.1371/journal.pone.0009549
[19] Miotto, O., Heiny, A., Tan, T.W., August, J.T. and
Brusic, V. (2008) Identification of human-to-human trans-
missibility factors in PB2 proteins of influenza A by
large-scale mutual information analysis. BMC Bioinfor-
matics, 9, S18. doi:10.1186/1471-2105-9-S1-S18
[20] Miotto, O., Heiny, A.T., Albrecht, R., García-Sastre, A.,
Tan, T.W., August, J.T. and Brusic, V. (2010) Complete-
proteome mapping of human influenza A adaptive
mutations: Implications for human transmissibility of
zoonotic strains. PLoS ONE, 5, Article ID e9025.
[21] Finkelstein, D.B., Mukatira, S., Mehta, P.K., Obenauer,
J.C., Su, X., Webster, R.G. and Naeve, C.W. (2007)
Persistent host markers in pandemic and H5N1 influenza
viruses. Journal of Virology, 81, 10292-10299.
[22] Allen, J.E., Gardner, S.N., Vitalis, E.A. and Slezak, T.R.
(2009) Conserved amino acid markers from past
influenza pandemic strains. BMC Microbiology, 9, 77.
[23] Breiman, L. (2001) Random forests. Machine Learning,
45, 5-32. doi:10.1023/A:1010933404324
[24] Hu, W. (2010) Novel host markers in the 2009 pande-
mic H1N1 influenza A virus. Journal of Biomedical
Science and Engineering, 3, 584-601.
[25] Hu, W. (2010) Nucleotide host markers in the influenza
A viruses. Journal of Biomedical Science and Engineer-
ing, 3, 684-699.
[26] Katoh, K., Kuma, K., Toh, H. and Miyata, T. (2005)
MAFFT version 5: Improvement in accuracy of multiple
sequence alignment. Nucleic Acids Research, 33, 511-
518. doi:10.1093/nar/gki198
[27] Cosic, I. (1997) The Resonant Recognition Model of
Macromolecular Bioreactivity, Theory and Application.
Birkhauser Verlag, Berlin.
[28] Hu, W. (2011) Receptor binding specificity and origin of
2009 H1N1 pandemic influenza virus. Natural Science, 3,
[29] Cover, T.M. and Thomas, J.A. (1991) Elements of
Information Theory. John Wiley and Sons, New York.
[30] MacKay, D. (2003) Information Theory, Inference, and
Learning Algorithms. Cambridge University Press, Cambridge.
[31] Bogs, J., Veits, J., Gohrbandt, S., Hundt, J., Stech, O., et
al. (2010) Highly pathogenic H5N1 influenza viruses
carry virulence determinants beyond the polybasic he-
magglutinin cleavage site. PLoS ONE, 5, Article ID
e11826. doi:10.1371/journal.pone.0011826
[32] Gohrbandt, S., Veits, J., Hundt, J., Bogs, J., Breithaupt,
A., Teifke, J.P., Weber, S., Mettenleiter, T.C. and Stech,
J. (2011) Amino acids adjacent to the haemagglutinin
cleavage site are relevant for virulence of avian influenza
viruses of subtype H5. Journal of General Virology, 92,
51-59. doi:10.1099/vir.0.023887-0
[33] Stevens, J., Blixt, O., Tumpey, T.M., Taubenberger, J.K.,
Paulson, J.C. and Wilson, I.A. (2006) Structure and
receptor specificity of the hemagglutinin from an H5N1
influenza virus. Science, 312, 404-410.
[34] Wang, W., Lu, B., Zhou, H., Suguitan, A.L. Jr., Cheng,
X., Subbarao, K., Kemble, G. and Jin, H. (2010)
Glycosylation at 158N of the hemagglutinin protein and
receptor binding specificity synergistically affect the
antigenicity and immunogenicity of a live attenuated
H5N1 A/Vietnam/1203/2004 vaccine virus in ferrets.
Journal of Virology, 84, 6570-6577.
[35] Auewarakul, P., Suptawiwat, O., Kongchanagul, A., et al.
(2007) An avian influenza H5N1 virus that binds to a
human-type receptor. Journal of Virology, 81, 9950-9955.
[36] Yamada, S., Suzuki, Y., Suzuki, T., Le, M.Q., Nidom,
C.A., et al. (2006) Haemagglutinin mutations responsible
for the binding of H5N1 influenza A viruses to human-
type receptors. Nature, 444, 378-382.
[37] Neumann, G., Chen, H., Gao, G.F., Shu, Y.-L. and
Kawaoka, Y. (2010) H5N1 influenza viruses: Outbreaks
and biological properties. Cell Research, 20, 51-61.
[38] Ayora-Talavera, G., Shelton, H., Scull, M.A., Ren, J.,
Jones, I.M., et al. (2009) Mutations in H5N1 influenza
virus hemagglutinin that confer binding to human tra-
cheal airway epithelium. PLoS ONE, 4, Article ID e7836.
[39] Gambaryan, A., Tuzikov, A., Pazynina, G., Bovin, N.,
Balish, A. and Klimov, A. (2006) Evolution of the rece-
ptor binding phenotype of influenza A (H5) viruses.
Virology, 344, 432-438. doi:10.1016/j.virol.2005.08.035
[40] Scalera, N.M. and Mossad, S.B. (2009) The first pande-
mic of the 21st century: A review of the 2009 pandemic
variant influenza A (H1N1) virus. Postgraduate Medicine,
121, 43-47. doi:10.3810/pgm.2009.09.2051
[41] Maurer-Stroh, S., Lee, R.T., Eisenhaber, F., Cui, L.,
Phuah, S.P. and Lin, R.T. (2010) A new common mu-
tation in the hemagglutinin of the 2009 (H1N1) influ-
enza A virus. PLoS Currents Influenza, 1, 162.
[42] Barr, I.G., Cui, L., Komadina, N., Lee, R.T., Lin, R.T.,
Deng, Y., Caldwell, N., Shaw, R. and Maurer-Stroh, S.
(2010) A new pandemic influenza A(H1N1) genetic vari-
ant predominated in the winter 2010 influenza season in
Australia, New Zealand and Singapore. Euro Surveill, 15,
[43] Stevens, J., Blixt, O., Chen, L.M., Donis, R.O., Paulson,
J.C. and Wilson, I.A. (2008) Recent avian H5N1 viruses
exhibit increased propensity for acquiring human recep-
tor specificity. Journal of Molecular Biology, 381, 1382-
1394. doi:10.1016/j.jmb.2008.04.016
[44] Fereidouni, S.R., Beer, M., Vahlenkamp, T. and Starick,
E. (2009) Differentiation of two distinct clusters among
currently circulating influenza A(H1N1) viruses. Euro
Surveill, 14, 19409.
[45] Hu, W. (2010) Subtle differences in receptor binding
specificity and gene sequences of the 2009 pandemic
H1N1 influenza virus. Advances in Bioscience and
Biotechnology, 1, 305-314.
[46] Hu, W. (2011) New mutational trends in the HA protein
of 2009 H1N1 pandemic influenza virus from May 2010
to February 2011. Natural Science, 3, 379-387.
[47] Long, J.X., Peng, D.X., Liu, Y.L., Wu, Y.T. and Liu, X.F.
(2008) Virulence of H5N1 avian influenza virus enhan-
ced by a 15-nucleotide deletion in the viral nonstructural
gene. Virus Genes, 36, 471-478.
[48] Hatta, M., Hatta, Y., Kim, J.H., Watanabe, S., Shinya, K.,
et al. (2007) Growth of H5N1 influenza A viruses in the
upper respiratory tracts of mice. PLoS Pathog, 3, Article
ID e133. doi:10.1371/journal.ppat.0030133
[49] Sung, J.C., van Wynsberghe, A.W., Amaro, R.E., Li,
W.W. and McCammon, J.A. (2010) Role of secondary
sialic acid binding sites in influenza N1 neuraminidase.
Journal of the American Chemical Society, 132, 2883-
2885. doi:10.1021/ja9073672
[50] Hu, W. (2010) Host markers and correlated mutations in
the overlapping genes of influenza viruses: M1, M2; NS1,
NS2; and PB1, PB1-F2. Natural Science, 2, 1225-1246.
[51] Hu, W. (2010) Correlated mutations in the four influen-
za proteins essential for viral RNA synthesis, host
adaptation, and virulence: NP, PA, PB1, and PB2.
Natural Science, 2, 1138-1147.