J. Biomedical Science and Engineering, 2011, 4, 562-568
doi:10.4236/jbise.2011.48072 Published Online August 2011 in SciRes (http://www.SciRP.org/journal/jbise/)
Prediction of hydrophobic regions effectively in
transmembrane proteins using digital filter
Jayakishan Meher1, Mukesh Kumar Raval2, Gananath Dash3, Pramod Kumar Meher4
1Department of Computer Science and Engineering, Vikash College of Engineering for Women, Bargarh, Odisha, India;
2Department of Chemistry, G. M. College, Sambalpur, Odisha, India;
3School of Physics, Sambalpur University, Jyoti Vihar, Odisha, India;
4Department of Embedded System, Institute for Infocomm Research, Singapore City, Singapore.
Email: jk_meher@yahoo.co.in; mraval@yahoo.com; gndash@ieee.org; pkmeher@i2r.astar.edu.sg
Received 18 June 2011; revised 8 July 2011; accepted 20 July 2011.
The hydrophobic effect is the major factor that drives a
protein molecule towards folding, and it largely determines
the stability of protein structures. Knowledge and prediction
of hydrophobic regions is therefore of great help in
understanding the structure and function of a protein.
Determination of membrane buried regions is a computationally
intensive task in bioinformatics. Several prediction methods
have been reported, but they have deficiencies in prediction
accuracy and adaptability. Proteins that are found embedded
in cellular membranes, called membrane proteins, are of
particular importance because they form targets for over 60%
of drugs on the market, and 20% - 30% of all the proteins in
any organism are membrane proteins. Transmembrane proteins
thus play an important role in the life activity of cells,
and prediction of membrane buried segments in transmembrane
proteins is of particular importance. In this paper we
propose signal processing algorithms based on a digital
filter for prediction of hydrophobic regions in transmembrane
proteins and find improved prediction efficiency over the
existing methods. Hydrophobic regions are extracted by
assigning a physico-chemical parameter, such as
hydrophobicity or hydration energy index, to each amino acid
residue; the resulting numerical representation of the
protein is then passed through a digital low pass filter. The
proposed method is validated on transmembrane proteins from
the Orientation of Proteins in Membranes (OPM) dataset with
various prediction measures and shows better prediction
accuracy than the existing methods.
Keywords: Hydrophobic Region; Transmembrane
Protein; Wavelet Transform; Physico-chemical
Parameter; Digital Filter
Proteins are important functional molecules in living
organisms. Every protein assumes a specific shape and
performs a specific function. A key characteristic of a
protein is the three-dimensional structure into which its
linear chain folds, referred to as the tertiary structure.
This structure gives rise to the electrochemical interaction
domains of the protein and gives it the ability to interact
with other proteins and ligands to carry out specific
functions [1]. Proteins that are found embedded in cellular
membranes, called membrane proteins, are of particular
importance because they form targets for over 60% of drugs on
the market, and 20% - 30% of all the proteins in any organism
are membrane proteins [2]. Knowledge of the segments of
transmembrane proteins, the bends in helices and the membrane
buried regions helps in the study of tertiary structure.
Understanding the structure of a protein helps in
understanding the role played by that protein.
It is widely known that the amino acid sequences of proteins
carry all the information needed to form their
three-dimensional structures [3]. Thus, protein structure can
in theory be predicted from the amino acid sequence alone.
Structural or biological information such as secondary
structure [4,5], kinks [6,7] and hydrophobic regions [8] is
derived by assigning a physicochemical index to an amino acid
sequence. Knowledge and prediction of hydrophobic regions is
of great help in understanding the structure and function of
the protein, and several approaches have been taken to
predict these regions. The transmembrane structure and the
membrane buried region are shown in Figure 1.
Figure 1. Definition of transmembrane structures. [A;D] is a
transmembrane helix (TMH) and [B;C] represents the
transmembrane segment (TMS), which is the embedded part of
the TMH.
In the sliding window averaging technique, the
physico-chemical values of the residues inside the window are
summed up and assigned to the middle position of the window.
The window then slides across the sequence [9]. However, the
segments found in this way do not always correspond to the
structure. As extracting structural information from amino
acid sequences alone is difficult, various prediction methods
have been developed using evolutionary information and neural
networks [10].
The choice of window size is investigated in [11], where the
profiles reflected the interior and exterior portions of
proteins. Hydrophobicity profiles obtained with the shortest
windows were noisy, and windows of fewer than seven residues
produced unsatisfactory results. On the other hand, long
spans tended to lose structural segments. The best
correspondence between hydrophobic regions and the positions
of amino acid residues was obtained with a window of nine
residues for globular proteins. The problem with this
technique is the noise that remains in the smoothed profiles,
which makes it particularly difficult to find segments in the
case of globular proteins.
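The sliding window averaging described above can be sketched
as follows. This is a minimal illustration, assuming an odd
window and leaving positions without a full window undefined;
the function name and the sample values are hypothetical and
are not taken from [9] or [11]. The mean is used here; summing
instead only rescales the profile by the window size.

```python
def sliding_window_average(values, window=9):
    """Return the mean of each full window, assigned to the window's centre.

    Positions too close to either end to hold a full window are left as None.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    profile = [None] * len(values)
    for centre in range(half, len(values) - half):
        frame = values[centre - half:centre + half + 1]
        profile[centre] = sum(frame) / window
    return profile


# Hypothetical hydrophobicity values, used only to exercise the function.
signal = [1.8, 2.5, -3.5, -3.5, 2.8, -0.4, -3.2, 4.5, -3.9, 3.8, 1.9]
smoothed = sliding_window_average(signal, window=3)
```

A short window such as 3 keeps most of the residue-to-residue
fluctuation, while a long one flattens short segments, which
is the trade-off discussed in the text.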
Recently, signal processing methods have come to play a major
role in predicting hydrophobic regions. Fourier analysis has
been applied to predict secondary structures from the
sequence of hydrophobicity indices [12]. The utility of the
Fourier transform lies in its ability to analyze a
time-domain signal for its frequency content. The transform
works by translating a function in the time domain into a
function in the frequency domain. The signal can then be
analyzed for its frequency content because the Fourier
coefficients of the transformed function represent the
contribution of each sine and cosine function at each
frequency. Although Fourier analysis is useful for acquiring
structural information, it tends to cause positional errors.
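As a small illustration of the frequency-content idea, the
sketch below computes the magnitudes of the discrete Fourier
coefficients of a short real signal directly from the
definition. The function name and the two test signals are
hypothetical. Note that the magnitudes say how strongly each
frequency is present but not where along the sequence it
occurs, which is one way to see the positional limitation
mentioned above.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitude of each discrete Fourier coefficient of a real signal."""
    n = len(signal)
    magnitudes = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff))
    return magnitudes


# Hypothetical signals: a constant profile and one that alternates
# in sign at every residue.
flat = dft_magnitudes([1.0] * 8)
alternating = dft_magnitudes([1.0, -1.0] * 4)
```

The constant signal has all its content in coefficient 0 (the
lowest frequency), whereas the alternating signal has all its
content in coefficient 4 of 8 (the highest representable
frequency).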
Wavelets are mathematical functions that divide data into
different frequency components. This approach has advantages
over traditional Fourier methods in analyzing data where the
signal contains discontinuities or high frequency noise. The
use of the wavelet transform, both continuous and discrete,
has recently shown promise in bioinformatics [13]. The
continuous wavelet transform (CWT) allows a one-dimensional
signal to be viewed in a more discriminative two-dimensional
time-scale representation. The CWT is calculated by
continuously shifting a continuously scalable wavelet over
the signal. In the discrete wavelet transform (DWT) a subset
of scales and positions is chosen, at which the correlation
between the signal and the shifted and dilated waveforms is
calculated. Consequently, the signal is decomposed into
several groups of coefficients, each containing signal
features corresponding to a group of frequencies. Small
scales refer to compressed wavelets, characterized by rapid
variations and appropriate for extracting high frequency
features of the signal. An important attribute of wavelet
methods is that, due to the limited duration of every
wavelet, local variations of the signal are better extracted
and information on the location of these local features is
retained in the constituent waveforms. The DWT has been
applied to hydrophobicity signals in order to predict
hydrophobic cores in proteins [14,15]. Protein sequence
similarity has also been studied using the DWT of a signal
associated with the average energy states of all valence
electrons of each amino acid [16]. The wavelet transform has
been applied to transmembrane structure prediction [17]. The
DWT has been used to decompose the amino acid sequences of TM
proteins into a series of structures in different layers, and
the location of TMHs is then predicted from the information
of the amino acid sequence at different scales [18]. A method
based on the discrete wavelet transform has been developed to
predict the number and location of TMHs in membrane proteins
[19].
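To make the decomposition into coefficient groups concrete,
the following sketch performs one level of a DWT with the
Haar wavelet. The choice of the Haar wavelet is an assumption
made for brevity; the cited works may use other wavelets and
more levels. Function names and sample values are
hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one Haar level, recovering the original samples."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / s, (a - d) / s])
    return signal


# Hypothetical hydrophobicity values, used only to exercise the functions.
values = [1.8, 1.8, -3.9, -3.5, 3.8, -0.8, 4.5, 4.2]
approx, detail = haar_dwt(values)
restored = haar_idwt(approx, detail)
```

The approximation coefficients carry the slowly varying (low
frequency) part of the signal and the detail coefficients the
rapid variations; each coefficient refers to a specific pair
of positions, so location information is retained.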
The existing methods have their limitations in terms of
accuracy. As mentioned above, numerous attempts have been
made to define the relation between interior and exterior
regions directly from the amino acid sequence. However, it is
difficult to separate interior and exterior positions of
amino acid residues by assigning a hydrophobicity threshold,
because of the unacceptable noise level. Hence there is a
need to develop advanced algorithms for faster and more
accurate prediction of hydrophobic regions. This motivates us
to develop a novel approach based on digital filtering to
effectively predict these regions in transmembrane α-helices.
The rest of the paper is organised as follows. Section 2
describes the proposed method for prediction of hydrophobic
regions in transmembrane α-helices, focusing on the
development of signal processing algorithms based on a
digital filter. Section 3 discusses the simulation results of
the proposed methods on a standard data set in terms of
prediction measures. Section 4 presents the conclusions of
this paper.
Copyright © 2011 SciRes. JBiSE
The signal processing approach plays a major role in
predicting membrane buried regions in an amino acid sequence,
since it separates variations in the signal from background
noise. In this paper the membrane buried regions in
transmembrane proteins are determined using a digital filter.
Previously, methods for hydrophobic region prediction using
only amino acid sequences have been reported [20,21], and it
was shown that hydrophobicity tends to be low in loop
regions. These studies suggested that hydrophobic residues
are buried in the core of the protein, which can be predicted
using a hydrophobicity index. Minima of the hydrophobicity
profile corresponded to loop regions, so turn regions could
be predicted effectively. Highly hydrophobic regions tended
to form an α-helix. Thus a hydrophobicity plot based on a
hydrophobicity index is useful for predicting hydrophobic
regions in amino acid sequences. We make use of a digital
filter to extract the low frequencies and thereby detect the
hydrophobic regions more effectively.
In this section a technique for identification of hydrophobic
regions in transmembrane helices using a digital filter is
described. Since hydrophobic regions are observed at low
frequencies, a suitable digital low pass filter can pass the
low frequency components, and thus the hydrophobic regions
can be extracted. A widely used family of low pass filters is
the set of Butterworth filters [22,23]. Butterworth filters
are characterized by a magnitude response that is maximally
flat in the passband and monotonic overall. A low pass
Butterworth filter of order n has the following magnitude
response:
$$|H(F)| = \frac{1}{\sqrt{1 + (F/F_c)^{2n}}}, \quad n \geq 1$$
where Fc is called the 3-dB cutoff frequency of the filter.
The magnitude responses of Butterworth filters of different
orders are shown in Figure 3. As the order increases, the
magnitude response comes closer and closer to the ideal low
pass characteristic. The transfer function of the low pass
Butterworth filter of order n with radian cutoff frequency
ωc = 2πFc can be expressed as follows:
$$H(s) = \frac{a_n}{s^n + a_1 s^{n-1} + a_2 s^{n-2} + \cdots + a_n}$$
where ωc = 1 rad/sec or Fc = 1/2π Hertz.
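The Butterworth magnitude response of order n,
|H(F)| = 1 / sqrt(1 + (F/Fc)^(2n)), can be checked
numerically. The sketch below is only an illustration; the
function name is hypothetical, and Fc = 1/2π Hz is taken from
the normalization ωc = 1 rad/sec stated in the text.

```python
import math


def butterworth_magnitude(f, fc, order):
    """|H(F)| = 1 / sqrt(1 + (F / Fc)^(2n)) for an order-n low pass filter."""
    return 1.0 / math.sqrt(1.0 + (f / fc) ** (2 * order))


fc = 1.0 / (2.0 * math.pi)   # cutoff in Hz for wc = 1 rad/sec
for order in (1, 2, 3, 4):
    at_cutoff = butterworth_magnitude(fc, fc, order)        # same for every order
    in_stopband = butterworth_magnitude(2 * fc, fc, order)  # shrinks as order grows
    print(order, round(at_cutoff, 4), round(in_stopband, 4))
```

At F = Fc the magnitude equals 1/√2 (the 3-dB point)
regardless of the order, while at F = 2Fc it falls faster for
higher orders, which is the sense in which the response
approaches the ideal low pass characteristic.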
The cutoff frequency ωn is the frequency at which the
magnitude response of the filter is $1/\sqrt{2}$. For butter,
the normalized cutoff frequency ωn must be a number between 0
and 1, where 1 corresponds to the Nyquist frequency, π
radians per sample. A Butterworth low pass filter of order 2
with cutoff frequency ωn = 0.4 can select the hydrophobic
regions of the protein sequences. The pole-zero plot and
frequency response of the second order low pass Butterworth
filter are shown in Figures 2 and 3 respectively. The filter
passes the low frequency components of a signal and
attenuates the high frequency components.
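A second order digital Butterworth low pass filter with
normalized cutoff 0.4 can be sketched in plain Python as
follows. The design route shown here (bilinear transform of
the analog prototype 1/(s² + √2 s + 1) with pre-warping) is a
standard one but is an assumption on our part; the text does
not state how the coefficients were obtained. The function
names are hypothetical, and the input is the sequence x[n]
quoted in the text.

```python
import math


def butter2_lowpass(wn):
    """Coefficients (b, a) of a 2nd-order digital Butterworth low pass filter.

    wn is the normalized cutoff in (0, 1), with 1 the Nyquist frequency.
    The analog prototype 1 / (s^2 + sqrt(2) s + 1) is mapped with the
    bilinear transform, pre-warped so the -3 dB point falls at wn.
    """
    if not 0.0 < wn < 1.0:
        raise ValueError("wn must lie strictly between 0 and 1")
    k = math.tan(math.pi * wn / 2.0)
    norm = 1.0 + math.sqrt(2.0) * k + k * k
    b0 = k * k / norm
    b = [b0, 2.0 * b0, b0]
    a = [1.0, 2.0 * (k * k - 1.0) / norm, (1.0 - math.sqrt(2.0) * k + k * k) / norm]
    return b, a


def lfilter2(b, a, x):
    """Apply y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    y = []
    x1 = x2 = y1 = y2 = 0.0   # zero initial conditions
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


b, a = butter2_lowpass(0.4)
x = [-3.9, -1.3, 1.8, 1.8, -3.9, -3.5, -0.4, -3.5,
     -1.6, -3.5, -3.5, 3.8, -0.8, -3.9, -3.5, 3.8]   # x[n] from the text
y = lfilter2(b, a, x)
```

For ωn = 0.4 this gives, to four decimals,
b = [0.2066, 0.4131, 0.2066] and a = [1, -0.3695, 0.1958]. A
causal filter of this kind delays the output slightly
relative to the input, which may matter when peak positions
are read off the filtered profile.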
The Butterworth low pass filter gives rise to patterns that
are distinct between the interior and exterior locations. To
analyze a protein sequence for prediction of hydrophobic
regions, it is first transformed into a numerical signal
based on the hydrophobicity indices of the residues along the
sequence. The numerical hydrophobicity indices of the 20
amino acid residues, obtained from Hyperchempro 8.0 software
of HyperCube Inc., USA (Table 1), are assigned to the protein
sequence. The resulting numerical sequence is passed through
the proposed low pass filter and plotted. The peaks observed
in the plot indicate the hydrophobic regions.
Figure 2. Pole-zero plot of Butterworth low pass filter of
order n = 2.
Figure 3. Magnitude response plot of Butterworth low pass
filter of order n = 1, 2, 3 and 4.
For a given protein sequence x[i], the numerical sequence
x[n] obtained using the hydrophobicity index is expressed as
x[n] = [–3.9 –1.3 1.8 1.8 –3.9 –3.5 –0.4 –3.5 –1.6 –3.5
–3.5 3.8 –0.8 –3.9 –3.5 3.8].
The filter output y[n] is plotted to observe the peaks in the
low frequency regions.
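The conversion from residues to numbers can be sketched as a
simple table lookup. The dictionary below copies the
hydrophobicity column of Table 1; the function name and the
five-residue fragment "KYAAK" are hypothetical and serve only
to exercise the lookup.

```python
# Hydrophobicity column of Table 1, keyed by one-letter amino acid code.
HYDROPHOBICITY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def to_numeric(sequence, index=HYDROPHOBICITY):
    """Map each residue of a one-letter sequence to its index value."""
    try:
        return [index[residue] for residue in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"unknown residue code: {err.args[0]}") from None


# "KYAAK" is a made-up fragment, not a sequence from the paper.
signal = to_numeric("KYAAK")
```

Passing a different dictionary as `index`, for instance one
built from the hydration energy column of Table 1, gives the
alternative numerical representation without changing the
function.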
A step-by-step procedure of the proposed method for
prediction of hydrophobic regions is as follows:
1) Convert the protein sequence into a numerical sequence
using the hydrophobicity index of each amino acid;
2) Pass the resulting numerical sequence through the proposed
low pass filter, which selects the low frequencies;
3) Plot the magnitude of the filtered signal and determine
the threshold above which peaks appear where the low
frequency components are dominant;
4) Locate the hydrophobic regions by locating the energy
peaks in the filtered signal.
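Steps 3 and 4 above amount to finding the stretches of the
filtered signal that lie above a threshold. A minimal sketch
follows; the function name, the sample filter output and the
threshold value of 1.0 are all hypothetical, since the text
does not state how the threshold is chosen.

```python
def regions_above(filtered, threshold):
    """Return (start, end) index pairs, end inclusive, where filtered > threshold."""
    regions = []
    start = None
    for i, value in enumerate(filtered):
        if value > threshold and start is None:
            start = i                       # a run begins
        elif value <= threshold and start is not None:
            regions.append((start, i - 1))  # the run ended at the previous index
            start = None
    if start is not None:                   # a run reaches the end of the signal
        regions.append((start, len(filtered) - 1))
    return regions


# Hypothetical filter output y[n]; the threshold of 1.0 is also an assumption.
y = [-2.0, 0.5, 1.6, 2.1, 1.2, -0.3, -1.8, 1.4, 1.9, 0.2]
predicted = regions_above(y, threshold=1.0)
```

Each returned pair marks a candidate membrane buried segment
in residue index coordinates (zero-based here), which can
then be compared with the annotated segments.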
The sliding window averaging technique, frequency domain
analysis and wavelet analysis are then compared with the
proposed method on the basis of the corresponding results.
Table 1. Physico-chemical properties of amino acids.
Code  Amino acid      Hydrophobicity  Hydration energy
A     Alanine         1.8             2.06
C     Cysteine        2.5             –0.61
D     Aspartic acid   –3.5            0.82
E     Glutamic acid   –3.5            1.04
F     Phenylalanine   2.8             –0.33
G     Glycine         –0.4            1.48
H     Histidine       –3.2            –7.6
I     Isoleucine      4.5             3.07
K     Lysine          –3.9            –2.8
L     Leucine         3.8             3.01
M     Methionine      1.9             1.43
N     Asparagine      –3.5            –4.43
P     Proline         –1.6            –1.33
Q     Glutamine       –3.5            –3.94
R     Arginine        –4.5            –9.91
S     Serine          –0.8            –5.79
T     Threonine       –0.7            –4.18
V     Valine          4.2             2.79
W     Tryptophan      –0.9            –3.49
Y     Tyrosine        –1.3            –7.13
Hydration Energy Index
In this section, we discuss a novel numerical representation
of the protein sequence, generated from a physico-chemical
property of amino acid residues called hydration energy, to
detect the membrane buried regions in transmembrane proteins.
Various physico-chemical properties, namely hydration energy,
dipole moment, electron ion interaction pseudopotential
(EIIP), polarizability, refractivity, molar surface area and
molar volume, are frequently used for quantitative structure
activity relationship (QSAR) studies of molecules. Of these
properties, the numerical sequence based on the hydration
energy is found to produce sharp peaks at membrane buried
regions when used in digital filtering. Hydration energy
reflects the hydrophilicity (or hydrophobicity) of molecules
and is therefore relevant to the problem at hand. The
hydration energy indices of amino acids are obtained from
Hyperchempro 8.0 software of HyperCube Inc., USA (Table 1).
The transmembrane protein sequence is first transformed into
a numerical signal by assigning the hydration energy indices
of the residues along the sequence. The resulting numerical
sequence is subjected to the proposed low pass filter and
plotted. The peaks observed in the plot indicate the membrane
buried regions.
We have used the proposed digital low pass filter to detect
the hydrophobic regions using numerical representations based
on the hydrophobicity indices and hydration energy of amino
acid residues. The list of transmembrane proteins and their
coordinate files were obtained from the Orientation of
Proteins in Membranes (OPM) database at the College of
Pharmacy, University of Michigan (http://www.phar.umich.edu).
Transmembrane proteins from the OPM data sets are used as the
benchmark for this purpose. In a good number of cases the
proposed method performed well.
The performance of the various methods can be analyzed with
prediction measures such as accuracy (A), precision (P) and
recall (R), which are defined in terms of four parameters:
true positive (tp), false positive (fp), true negative (tn)
and false negative (fn) (Table 2). tp denotes the number of
actually buried regions that are also predicted as buried, fp
the number of actually exposed residues that are predicted to
be buried, tn the number that are actually exposed and also
predicted to be exposed, and fn the number that are actually
buried but predicted to be exposed.
3.1. Accuracy
The accuracy of prediction of hydrophobic regions in an amino
acid sequence is defined as the percentage of buried regions
correctly predicted out of the total buried and exposed
regions present. It is computed as follows:
$$A = \frac{\text{Number of correct buried predictions}}{\text{Total number of buried and exposed}}$$
3.2. Precision
Precision is defined as the percentage of buried regions
correctly predicted out of the total number predicted to be
buried. It is computed as:
$$P = \frac{\text{Number of correctly predicted buried}}{\text{Total number of buried predicted}}$$
3.3. Recall
Recall is defined as the percentage of the actual buried
regions that are predicted to be buried. It is computed as:
$$R = \frac{\text{Number of correctly predicted buried}}{\text{Total number of actual buried}}$$
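The three measures can be computed from the contingency
counts tp, fp, tn and fn as sketched below. Precision and
recall follow directly from the definitions in Sections 3.2
and 3.3. For accuracy, the code follows the literal wording
of Section 3.1 (correct buried predictions over all buried
and exposed items); this reading is an assumption, and the
more common definition would also count tn as correct. The
function name and the sample counts are hypothetical.

```python
def prediction_measures(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from the four contingency counts."""
    total = tp + fp + tn + fn
    # Literal reading of Section 3.1: correct buried predictions over everything.
    # The more common definition would put (tp + tn) in the numerator instead.
    accuracy = tp / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, not taken from Table 3.
a, p, r = prediction_measures(tp=8, fp=2, tn=0, fn=0)
```

With no missed buried regions (fn = 0) the recall is 1
whatever the number of false positives, so recall alone does
not distinguish between methods that over-predict.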
We attempted to predict hydrophobic regions in transmembrane
proteins using a digital low pass filter. The sliding window
averaging technique and wavelet analysis were then compared
with the proposed method on the basis of the corresponding
results.
The models as well as the sequences of the transmembrane
proteins are obtained from the PDBTM database. When a
sequence file in FASTA format is submitted to the TMHMM
prediction server, the transmembrane regions (TM), buried
residues (rbu) and exposed residues (rex) are identified. In
the transmembrane protein Bovine Cytochrome BC1 Complex with
Stigmatellin bound (PDB Id: 2a06), the proposed method
detected all membrane buried regions, as shown in Figure 4.
The figure shows the hydrophobicity plot of the original
sequence, the first order filter response and the second
order filter response. The hydrophobicity profile smoothed by
extracting the low frequencies with the second order digital
low pass filter shows the buried residues as sharp peaks that
remain at low frequency. The parts of the profile above the
threshold indicate the membrane buried regions, which are the
embedded parts of the transmembrane helix (TMH). In the
profile of the sliding window averaging technique, on the
other hand, it was hard to find segments corresponding to
buried residues. Table 3 summarizes the prediction accuracy
of hydrophobic regions by various methods, namely the sliding
window averaging technique, wavelet analysis and the proposed
filter method. Table 4 shows the average prediction accuracy
of hydrophobic regions over 88 datasets for the various
methods. The prediction accuracy of the proposed method is
the highest in all tested cases. Thus the proposed method
shows improved accuracy in predicting hydrophobic regions.
The profile obtained by assigning a hydrophobicity index to
an amino acid sequence is inherently noisy. Various methods
have been proposed to eliminate the noise from the raw
profile. Although the sliding window averaging technique is
in wide use, the precise location of a hydrophobic region is
difficult to determine because of the averaging calculation.
We extracted the low frequencies from the raw data using a
digital filter and investigated the relationship between the
low frequencies and the hydrophobic regions of proteins. The
filtering analysis for interior/exterior prediction detected
segments related to a hydrophobic region at each peak, thus
facilitating the accurate prediction of buried residues
directly from amino acid sequences.
Table 2. Contingency table for evaluation metrics.
Predicted \ Actual       Residue buried (rbu)   Residue exposed (rex)
Residue buried (rbu)     tp                     fp
Residue exposed (rex)    fn                     tn
Figure 4. Hydrophobicity plot of Bovine Cytochrome BC1
Stigmatellin bound using (a) raw sequence; (b) low pass filter
of 1st order; (c) low pass filter of 2nd order. rbu and rex
are residues buried within TM and residues exposed, separated
by ...
Table 3. Prediction accuracy of various methods using OPM datasets.
PDB Id / Methods                   Prediction measures
                                   A      P      R
Bovine Cytochrome BC1 Stigmatellin bound
  Sliding window averaging         0.66   0.66   1
  Wavelet transform                0.8    0.66   1
  Digital filter                   1      1      1
DXR from Thermotoga maritima complex with fosmidomycin and NADPH
  Sliding window averaging         0.76   0.76   1
  Wavelet transform                0.83   0.83   1
  Digital filter                   0.91   0.91   1
Sodium-potassium pump with bound potassium and ouabain
  Sliding window averaging         0.66   0.5    1
  Wavelet transform                0.75   0.66   1
  Digital filter                   1      1      1
Subunit C of F1F0 ATP synthase of E. coli NMR, 10 structures
  Sliding window averaging         0.6    0.5    1
  Wavelet transform                0.7    0.75   1
  Digital filter                   0.88   0.87   1
Bovine heart cytochrome C oxidase at the NO-bound fully reduced state
  Sliding window averaging         0.8    0.9    1
  Wavelet transform                0.8    0.9    1
  Digital filter                   1      1      1
Table 4. Average prediction accuracy of 88 transmembrane
protein datasets.
Methods                            A      P      R
Sliding window averaging           0.69   0.66   1
Wavelet transform                  0.84   0.79   1
Digital filter                     0.94   0.92   1
The digital low pass filter plays a vital role in the
prediction of hydrophobic regions in transmembrane proteins.
The proposed method is not only fast but also has improved
accuracy compared to the existing methods. However,
prediction of membrane buried regions in a protein depends on
the features of the amino acid sequence. A feature vector
with the hydrophobicity and hydration energy indices of amino
acid residues is used for the numerical representation. With
it, the prediction accuracy of the proposed method is the
highest in all tested cases.
[1] Ramachandran, G., Ramakrishnan, C. and Sasisekharan,
V. (1963) Stereochemistry of polypeptide chain configu-
ration. Journal of Molecular Biology, 7, 95-99.
[2] Cordes, F., Bright, J. and Sansom, M. (2002) Proline
induced distortions of transmembrane helices. Journal of
Molecular Biology, 323, 951-960.
[3] Anfinsen, C.B. (1973) Principles that govern the folding
of protein chains. Science, 181, 223-230.
[4] Rose, G.D. (1978) Prediction of chain turns in globular
proteins on a hydrophobic basis. Nature, 272, 586-590.
[5] Qian, H. (1996) Prediction of α-helices in proteins based
on thermodynamic parameters from solution chemistry.
Journal of Molecular Biology, 256, 663-666.
[6] Mohapatra, P., Khamari, A. and Raval, M. (2004) A
method for structural analysis of α-helices of membrane
proteins. Journal of Molecular Biology, 10, 393-398.
[7] Heijne, G.V. (1991) Proline kinks in transmembrane
α-helices. Journal of Molecular Biology, 218, 499-503.
[8] Swindells, M.B. (1995) A procedure for the automatic
determination of hydrophobic cores in protein structures.
Protein Science, 4, 93-102. doi:10.1002/pro.5560040112
[9] Desjarlais, J.R. and Handel, T.M. (1995) De novo design
of the hydrophobic cores of proteins. Protein Science, 4,
2006-2018. doi:10.1002/pro.5560041006
[10] Rost, B. and Sander, C. (1994) Combining evolutionary
information and neural networks to predict secondary
structure. PROTEINS: Structure, Function, and Genetics,
19, 55-72. doi:10.1002/prot.340190108
[11] Kyte, J. and Doolittle, R.F. (1982) A simple method for
displaying the hydropathic character of a protein. Journal
of Molecular Biology, 157, 105-132.
[12] Cornette, J.L., Cease, K.B., Margalit, H., Spouge, J.L.,
Berzofsky, J.A. and DeLisi, C. (1987) Hydrophobicity
scales and computational techniques for detecting am-
phipathic structures in proteins. Journal of Molecular Bi-
ology, 195, 659-685. doi:10.1016/0022-2836(87)90189-6
[13] Liò, P. (2003) Wavelets in bioinformatics and computa-
tional biology: State of art and perspectives. Bioinfor-
matics, 19, 2-9.
[14] Hirakawa, H. and Kuhara, S. (1997) Prediction of hy-
drophobic cores of proteins using wavelet analysis. Ge-
nome Inform Ser Workshop Genome Inform, 8, 61-70.
[15] Hirakawa, H., Muta, S. and Kuhara, S. (1999) The hy-
drophobic cores of proteins predicted by wavelet analysis.
Bioinformatics, 15, 141-148.
[16] De Trad, C., Fang, Q. and Cosic, I. (2002) Protein
sequence comparison based on the wavelet transform
approach. Protein Engineering, Design and Selection, 15,
193-203. doi:10.1093/protein/15.3.193
[17] Murray, K.B., Gorse, D. and Thornton, J. (2002) Wavelet
transforms for the characterization and detection of re-
peating motifs. Journal of Molecular Biology, 316,
341-363. doi:10.1006/jmbi.2001.5332
[18] Yu, B., Meng, X.H. and Liu, H.J. (2006) Prediction of
transmembrane helical segments in transmembrane pro-
teins based on wavelet transform. Journal of Shanghai
University (English Edition), 10, 308-318.
[19] Bin, Y. and Yan, Z. (2010) On the Prediction of trans-
membrane helical segments in membrane protein. Inter-
national Journal of Mathematical and Computer Sci-
ences, 6, 192-195.
[20] Kuntz, I.D. (1972) Protein folding. Journal of the
American Chemical Society, 94, 4009-4012.
[21] Qian, H. (1996) Prediction of α-helices in proteins based
on thermodynamic parameters from solution chemistry.
Journal of Molecular Biology, 256, 663-666.
[22] Mitra, S.K. (2006) Digital signal processing. Tata
McGraw-Hill, Noida.
[23] Oppenheim, A.V. and Schafer, R.W. (1999) Discrete-time
signal processing. Prentice-Hall, Inc., Upper Saddle River.