S. C. JIA ET AL.
Copyright © 2013 SciRes. ENG
to the RF algorithm not overfitting when the dimension of the
characteristic parameters is higher, better results are still
obtained, whereas overfitting does appear in this case for the
SVM; 3) In previous independent tests, the large sample is
usually used to test the small one. However, we still obtain
better predictive results by using the small sample to test the
large one when the RF algorithm is used. This implies that the
RF algorithm is steady and effective.