B. LIU, X. L. WANG
Copyright © 2013 SciRes. ENG
the families. All the 531 amino acid indices were used
for predicting each family.
The predictive results of different sequence-based me-
thods are listed in Table 1. SVM-Ngram, SVM-Pattern,
and SVM-Motif are based on three different building
blocks of proteins. Mismatch method allows a given
number of mismatches between the substrings of the
proteins. SVM-LA is based on the pairwise similarity
scores. The performance of PseAACIndex is highly
comparable with SVM-LA and outperforms other me-
thods in terms of both ROC and ROC50 scores, indicat-
ing that the proposed PseAACIndex approach is an effi-
cient method for protein remote homology detection.
4. Conclusion
In this study, inspired by the success of PseAAC, we
combined the PseAAC with various amino acid indices
extracted from the AAIndex database for protein remote
homology detection. It took both the sequence-order in-
formation and the amino acid physicochemical proper-
ties extracted from the AAIndex database into considera-
tion. Experimental results demonstrated that this ap-
proach was useful for protein remote homology detection
and showed better predictive results than the compared
methods.
Table 1. Results of different methods for protein remote
homology detection.
Average ROC and ROC50 scores
Methods ROC ROC50 Source
PseAACIndex (λ = 5) 0.880 0.620 This study
SVM-Ngram 0.791 0.584 [24]
SVM-Pattern 0.835 0.589 [24]
SVM-LA(ß = 0.5) 0.925 0.649 [5]
Mismatch 0.872 0.400 [25]
SVM-Motif 0.814 0.616 [24]
5. Acknowledgements
We would like to thank Professor Kuo-Chen Chou at
Gordon Life Science Institute for his helpful suggestions
on this manuscript. This work was supported by the Na-
tional Natural Science Foundation of China (No.
61173075, 61003090 and 60973076), the Project HIT.
NSRIF.2013103 supported by Natural Scientific Re-
search Innovation Foundation in Harbin Institute of
Technology, Natural Science Foundation of Guangdong
province (No.S2012040007390), and Shanghai Key La-
boratory of Intelligent Information Processing, China
(Grant No.IIPL-2012-002).
REFERENCES
[1] L. Liao and W. S. Noble, “Combining Pairwise Sequence
Similarity and Support Vector Machines for Detecting
Remote Protein Evolutionary and Structural Relation-
ships,” Journal of Computational Biology, Vol. 10, No. 6,
2003, pp. 857-868.
http://dx.doi.org/10.1089/106652703322756113
[2] T. F. Smith and M. S. Waterman, “Identification of Com-
mon Molecular Subsequences,” Journal of Molecular Bi-
ology, Vol. 147, No. 1, 1981, pp. 195-197.
http://dx.doi.org/10.1016/0022-2836(81)90087-5
[3] B. Qian and R. A. Goldstein, “Performance of an Iterated
T-Hmm for Homology Detection,” Bioinformatics, Vol.
20, No. 14, 2004, pp. 2175-2180.
http://dx.doi.org/10.1093/bioinformatics/bth181
[4] V. N. Vapnik, “Statistical Learning Theory,” 1998.
[5] H. Saigo, et al., “Protein Homology Detection Using Str-
ing Alignment Kernels,” Bioinformatics, Vol. 20, No. 11,
2004, pp. 1682-1689.
http://dx.doi.org/10.1093/bioinformatics/bth141
[6] B. Liu, et al., “Using Amino Acid Physicochemical Dis-
tance Transformation for Fast Protein Remote Homology
Detection,” PLoS ONE, Vol. 7, No. 9, 2012, p. e46633.
http://dx.doi.org/10.1371/journal.pone.0046633
[7] S. Kawashima, et al., “AAindex: Amino Acid Index Da-
tabase, Progress Report 2008,” Nucleic Acids Research,
Vol. 36, No. Database, 2008, pp. D202-D205.
[8] B. Liu, et al., “A Discriminative Method for Protein Re-
mote Homology Detection and Fold Recognition Com-
bining Top-n-Grams and Latent Semantic Analysis,” BMC
Bioinformatics, Vol. 9, 2008, p. 510.
http://dx.doi.org/10.1186/1471-2105-9-510
[9] T. Lingner and P. Meinicke, “Remote Homology Detec-
tion Based on Oligomer Distances,” Bioinformatics, Vol.
22, No. 18, 2006, pp. 2224-2231.
http://dx.doi.org/10.1093/bioinformatics/btl376
[10] K. C. Chou, “Prediction of Protein Cellular Attributes Us-
ing Pseudo Amino Acid Composition,” Proteins: Struc-
ture, Function, and Bioinformatics, Vol. 43, 2001, pp.
246-255. http://dx.doi.org/10.1002/prot.1035
[11] Q. W. Dong, et al., “Application of Latent Semantic Ana-
lysis to Protein Remote Homology Detection,” Bioinfor-
matics, Vol. 22, No. 3, 2006, pp. 285-290.
http://dx.doi.org/10.1093/bioinformatics/bti801
[12] S. E. Brenner, et al., “The ASTRAL Compendium for
Sequence and Struc ture Analysis,” Nucleic Acids Research,
Vol. 28, No. 1, 2000, pp. 254-256.
http://dx.doi.org/10.1093/nar/28.1.254
[13] Y. D. Cai and K. C. Chou, “Predicting Enzyme Subclass
by Functional Domain Composition and Pseudo Amino
Acid Composition,” Journal of Proteome Research, Vol.
4, 2005, pp. 967-971.
http://dx.doi.org/10.1021/pr0500399
[14] Y. D. Cai and K. C. Chou, “Nearest Neighbour Algorithm
for Predicting Protein Subcellular Location by Combining
Functional Domain Composition and Pseudoamino Acid
Composition,” Biochemical and Biophysical Research Com-
munications, Vol. 305, 2003, pp. 407-411.
http://dx.doi.org/10.1016/S0006-291X(03)00775-7
[15] H. B. Shen and K. C. Chou, “Predicting Protein Subnuc-