J. Biomedical Science and Engineering, 2008, 1, 141-146
Published Online August 2008 in SciRes. http://www.srpublishing.org/journal/jbise JBiSE
Prediction of human microRNA hairpins using
only positive sample learning
Dang Hung Tran*, 1, Tho Hoan Pham2, Kenji Satou1, 3 & Tu Bao Ho1
1Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292 J apan. 2Han oi National Unive rs i ty of Education, 136 Xuan
Thuy, Hanoi, Viet Nam. 3Kanazawa University, Kakuma, Kanazawa 920-1192, Japan. *Corresponden ce should be addressed to Dang Hung T r an
(hungtd@jaist.ac.jp).
ABSTRACT
MicroRNAs (miRNAs) are small molecular
non-coding RNAs that have important roles in
the post-transcriptional mechanism of animals
and plants. They are commonly 21-25 nucleo-
tides (nt) long and derived from 60-90 nt RNA
hairpin structures, called miRNA hairpins. A lar-
ger number of sequence segments in the human
genome have been computationally identified
with such 60-90 nt hairpins, however the major-
ity of them are not miRNA hairpins. Most exist-
ing computational methods for predicting
miRNA hairpins are based on a two-class classi-
fier to distinguish between miRNA hairpins and
other sequence segments with hairpin struc-
tures. The difficulty of these methods is how to
select hairpins as negative examples of miRNA
hairpins in the training dataset, since only a few
miRNA hairpins are available. Therefore, these
classifiers may be mis-trained due to some false
negative examples of the training dataset. In this
paper, we introduce a one-class support vector
machine (SVM) method to predict miRNA hair-
pins among the hairpin structures. Different from
existing methods for predicting miRNA hairpins,
the one-class SVM classifier is trained only on
the information of the miRNA class. We also il-
lustrate some examples of predicting miRNA
hairpins in human chromosomes 10, 15, and 21,
where our method overcomes the above disad-
vantages of existing two-class methods.
Keywords: MicroRNA; Hairpin; One-class SVM
1. INTRODUCTION
MicroRNAs (miRNAs) are small, non-coding RNAs (21-
25 nucleotides in length) that regulate the expression of
protein-encoding genes at the post-transcriptional level [1,
2, 21]. Each miRNA derives from a larger precursor,
which folds into an imperfect stem-loop structure.
In human, the processing and maturation of miRNAs
are divided into several steps before silencing their targets.
First, the long primary transcripts (pri-miRNAs), which
can be up to several kilobases, are processed by Drosha-
complex in nucleus to yield precursor miRNAs (pre-
miRNAs) [10, 12]. The pre-miRNA is a double-stranded
sequence of about 60-90 nt with a 2-nt 3' overhang and
forms a hairpin structure (also called miRNA hairpin).
Second, pre-miRNAs are transported from the nucleus
into the cytoplasm by another complex, which consists of
Exportin 5 and RanGTP [6, 29]. Subsequently, the pre-
miRNA is cleaved into an imperfect double-stranded
RNA duplex by endonuclease RNase III enzyme called
Dicer [25, 29, 42]. This duplex is composed of the mature
miRNA strand and its complementary strand. Finally,
mature miRNAs are incorporated into RICS (RNA-
induced silencing complex) before they bind to their tar-
gets to regulate gene expression.
Until now, several computational approaches have
been proposed for predicting miRNAs. Most of them are
based on the common structural characteristic of secon-
dary structures of their pre-miRNAs [15, 35, 40]. Since
pre-miRNAs are often short (60-90 nt), there can be too
many subsequences in a genome having hairpin structures.
However, only a minority of them are miRNA hairpins.
Using only information of their structures therefore may
not allow us to distinguish miRNA hairpins from other
hairpin structures. Other methods that consider informa-
tion of both sequences and structures are needed.
Most methods so far used a two-class classifier to sepa-
rate the miRNA hairpins from the ones assumed to be
negative. The main difference between these methods is
how negative examples are selected for the two-class
classifier training dataset. For example, Szafranski et al.
[35] and Xue et al. [40] selected examples that overlap
with one of the last exon of known mRNAs; or Helvik et
al. [15] tried to get them randomly from DNA sequences
with hairpin structures as ``negative'' miRNA hairpins.
The negative examples collected in such ways would con-
tain false negatives, since no study so far has mentioned
the information regarding true negative miRNA hairpins.
In other words, only the information of miRNA hairpins
is available. Therefore, the classifier of existing methods
may be incorrect, due to some false negative miRNA
hairpins contained in the training dataset.
In this paper, we present a new method for predicting
miRNA hairpins that employs support vector machines
SciRes Copyright © 2008
142 D. H. Tran et al. / J. Biomedical Science and Engineering 1 (2008) 141-146
SciRes Copyright © 2008 JBiSE
for one-class classification (one-class SVMs). One-class
SVMs recently have been successfully applied in several
areas, especially domains with imbalanced data such as
document classification [30], gene prediction [19] and
image retrieval [9]. Different from previous methods for
predicting miRNA hairpins, our method uses only avail-
able miRNA hairpins for training the model, while other
methods train their classifiers by using an additional data-
set of negative examples, which may contain some false
negatives as explained above. Moreover, more features of
hairpin sequences and structures are used to represent
hairpins, with expectation that they would be useful for
the model. Our one-class SVM classifier gave good re-
sults in predicting miRNA hairpins. We also illustrated
some examples of predicting miRNA hairpins in human
chromosomes 10, 15, and 21 where our method can avoid
the problem of false negative examples of the existing
two-class methods.
2. MATERIALS AND METHODS
2.1. Datasets for training and testing
As mentioned in Section 1, our method uses one-class
SVMs to recognize miRNA hairpins from potential ones
produced by ScorePin [15]. To do this, the one-class
SVM model should capture the characteristics of known
miRNA hairpins. In our work, the positive class we used
consists of 474 known human miRNA hairpins from
miRBase (version 8.1) [13, 14]. (http://microrna.
sanger.ac.uk/sequences/) that have been verified by ex-
periments or predicted by computational methods with
high confidence. To ensure that all miRNA hairpins were
folded as hairpins, we removed a few of those containing
none or more than one RNAfold-predicted hairpin-loop.
The positive class used in this work is therefore of 451
miRNA hairpins.
To evaluate our one-class SVM models for miRNA
hairpins, we conducted two kinds of experiments.
Cross-validation : like some previous researches [15, 35,
40], we first prepared the dataset for the cross-validation
procedure to compare our method with the other methods.
The dataset contained 451 positive examples as described
above, and 727 negative examples of miRNAs hairpins.
These 727 negative miRNA hairpins were ScorePin-
hairpins that overlap with the last exon of known coding-
protein genes. We randomly partitioned the dataset into
three subsets, such that the numbers of both positive and
negative examples in each of the three subsets were
equivalent or nearly equivalent. Of them, one subset was
retrained as the validation data for test prediction methods,
and was trained on the two remaining subsets (note that
with our method, one-class SVM, only positive examples
are used for the training). The cross-validation procedure
was repeated three times. Results from three trials were
then averaged to produce a single estimation.
Test on chromosomes 10, 15, and 21: we use all known
miRNA hairpins, excluding ones on chromosomes 10, 15
and 21, to train the one-class SVM model. This model is
then used to recognize miRNA hairpins from ScorePin-
hairpins (see Section 3.1). Table 1 presents a summary of
all data sets used in two kinds of experiments.
Table 1. The data of human miRNA hairpins.
Experiment #Examples
Testing on chromosomes 437 training examples
41039 hairpin candidates
Cross-validation 2/3 x 451 training positives
1/3 x 451 testing positives
1/3 x 727 testing negatives
2.2. One-class support vector machines
Support vector machine (SVM) is a learning technique
based on statistical learning theory [38]. It has been ap-
plied to a wide range of real-world tasks. The formulation
of SVMs can be considered as a simple linear classifica-
tion, normally using both negative and positive examples
for training. SVMs can perform nonlinear separation by
using a kernel technique, which realizes a nonlinear map-
ping to a feature space. Scholkopf et al. [33] have ex-
tended standard SVMs to one-class classification prob-
lems. Their approach is to construct a hyperplane that is
maximally distant from the origin [33].
In this section, we give details of the algorithm for
training one-class SVMs proposed by Scholkopf et al.
[33]. The training algorithm is as follows: let the training
data N
lRxxx ,...,, 21 belong to one class, where i
xis a
feature vector and l is the number of examples. The one-
class SVM estimates a function that will take the valu e +1
in a region where the majority of the data points are con-
centrated, and the value -1 everywhere else [30, 33]. For-
mally, the function can be written as follows:
∈−
∈+
=Sxif
Sxif
xf 1
1
)(
where S is a simple subset of input space and S is the
complement of.SLet HX
Φ
: be a kernel map which
converts the training examples from the origin space to a
feature space. The strategy is to map the data into the fea-
ture space corresponding to the kernel, and to separate
them from the origin by the maximum margin. In order to
separate the data set from the origin, we need to solve the
following quadratic programming problem [9, 30, 33]:
≥=−≥Φ
−+
=
.0;,...,1,))(.(
1
2
1
min
1
2
iii
l
i
i
lixw
vl
w
ξξρ
ρξ
where )1,0(
v is a parameter that represents an upper
bound on the fraction of outliers in the data,
ρ
is the mar-
gin of the hyperplane with respect to the data, and i
x are
non-zero slack variables allowing a soft margin. We ob-
tain w and
ρ
by solving this problem. When we give a
new data point
x
to be classified, a label is assigned ac-
D. H. Tran et al. / J. Biomedical Science and Engineering 1 (2008) 141-146 143
SciRes Copyright © 2008 JBiSE
cording to the decision function, which can be expressed
as: )))(.sgn(()(
ρ
Φ
=i
xwxf
Instead of solving the primal optimization problem di-
rectly, one can consider the following dual program:
=≤≤
.1,
1
0
),(
2
1
max
iii
i
jiji
vl
xxK
αα
αα
here, )),((),( jiji xxxxKΦ= are kernels, which allow
many more general decision functions when the data are
not linearly separable, and the hyperplane can be repre-
sented in a feature space. The parameters i
α
are La-
grange multipliers.
In our research, we used the LIBSVM (version 2.84)
with three types of kernel functions (linear, polynomial,
and radial basis (RBF)). This library is an integrated tool
for support vector classification and regression which can
handle one-class SVM using the algorithm proposed by
Scholkopf et al. [33]. The LIBSVM is available at [43].
2.3. Structural and sequential features of miRNA
hairpins
There are many miRNA prediction methods which used
structural features as key features. However, recent re-
ports have shown that the sequence features are important
in predicting miRNA hairpins [39, 40]. Xue et al. [40]
indicated that the short contiguous subsequences of
miRNA hairpin sequences are significantly distinct from
other RNA hairpin sequences. For this reason, we pro-
pose a set of features that uses both the sequential fea-
tures and structural features to characterize the RNA
hairpin structure sequences.
For sequential features, we extracted features from
RNA hairpin sequences using a 5-nucleotide sliding win-
dow along an RNA hairpin sequence, and computed the
number of occurrences of each 5-gram. As a result, each
sequence is represented by a 1,024-dimensional vector of
the number of occurrences of all possible 5-grams. In
addition, several other features based on the sequences
are considered, such as the number of occurrences of each
nucleotide (A, C, G, U) in the 5' and 3' arms and GC-
content defined as in [35].
For structural features, we extracted them from the sec-
ondary structure of each hairpin. The secondary structures
are predicted using RNAfold [16]. The structural features
used in our method, were introduced in other previous
miRNA prediction methods [15, 35]. The features consist
of:
1. miRNA hairpin length as the number of nucleotides.
2. Loop size as the number of unpaired bases in the
hairpin loop of the predicted secondary structure.
3. Minimum free energy (MFE) as the total free energy
of hairpin structure predicted by using RNAfold tool.
4. Paired bases as the number of nucleotides predicted
to be in a hydrogen-bonded state.
5. The numbers of nucleotides from 5' site to the loop
start.
6. The number of 2-nt overhangs from 5' site and 3' site
to loop start.
In total, the feature vector, which is input to our one-
class SVMs, consists of 1,036 variables. It captures the
characteristics of both the sequence and the structure of
the RNA hairpin sequences.
3. RESULTS AND DISCUSSIONS
3.1. One-class SVM performance
We experimentally evaluated our method by using the
three-fold cross-validation procedure as described in Sec-
tion 2.1. In order to avoid miRNA hairpins in the same
group (defined in [44]) being divided into different folds,
we placed all similar miRNA hairpins in the same fold.
Three criteria of precision, recall, and F1-measure were
used to evaluate the results. We carried out experiments
with three types of kernels (linear, polynomial, and radial
basic function (RBF)). For each cross-validation run, we
used default parameters ,
δ
,dand various values of pa-
rameter v in the range of [0.07, 0.11]. The prediction
results are shown in Table 2. It can be seen that one-class
SVMs worked well with v= 0.10 and RBF kernels (
γ
=
0.0001); the highest F1-measure =95.27%, precision =
94.63%, and recall = 95.92%. The work in this paper is
an extension of our conference paper [37]. Basically,
there is one improvement here: we tried to check the con-
tribution of each kind of features to the prediction results.
Table 2. The prediction results of one-class SVMs on the testing dataset. Pre., Rec., and F1. are precision recall and F1-measure, re-
spectively.
Linear kernel Polynomial kernel RBF kernel
v Pre. Rec. F1. Pre. Rec. F1. Pre. Rec. F1.
0.07 88.02 98.66 93.04 88.02 98.66 93.04 91.20 97.32 94.16
0.08 89.09 98.66 93.63 89.09 98.66 93.63 94.67 95.30 94.98
0.09 91.03 95.30 93.11 91.03 95.30 93.11 95.27 94.63 94.95
0.10 93.75 90.60 92.15 93.71 89.93 91.78 95.92 94.63 95.27
0.11 94.37 89.93 92.10 93.71 89.93 91.78 95.24 93.96 94.59
144 D. H. Tran et al. / J. Biomedical Science and Engineering 1 (2008) 141-146
SciRes Copyright © 2008 JBiSE
Table 3. The prediction results of one-class SVMs using different
feature sets. FS1 is the feature set using only structural features;
FS2 is the feature set using both sequential features and struc-
tural features; Pre., Rec., and F1. are prediction, recall and F1-
measure, respectively.
Kernel Feature
set Pre. Rec. F1.
FS1 95.42 83.33 88.97
Linear FS2 93.75 90.60 92.15
FS1 95.42 83.33 88.97
Polynomial FS2 93.71 89.93 91.78
FS1 92.81 86.00 89.27
RBF FS2 95.92 94.63 95.27
To determine the importance of the sequential features
introduced for the first time for this research, we re-
moved the sequential features, and then conducted train-
ing and testing of the model again. The vector represen-
tation of examples using only structural features, denoted
as FS1, using two kinds of sequential and structural fea-
tures, denoted as FS2. Table 3 shows the results of one-
class SVM with the two kinds of vector representations
FS1 and FS2 (with the same value for parameter v =
0.10). It can be seen that the classifier performance of
FS1 is much lower than that of FS2. Therefore, the se-
quential features are relevant for modeling miRNA hair-
pins.
We also tried to compare the one-class SVM method
with the two-class SVM method, which has been intro-
duced in [35] for the same problem, predicting miRNAs.
Different from our one-class SVM method, the two-class
SVMs have to be trained on both positive and negative
classes of miRNA hairpins. As we mentioned in Section
1, only positive examples of miRNAs are available, and
it is difficult to select some potential miRNA hairpins as
``negatives''. Similar to some previous researches, we are
indisposed to establish a class of 727 “negative” miRNA
hairpins as described in Section 2.1, and thus the test
results here would be respect for the assumption that
these 727 negative examples would be true. Table 4 pre-
sents the performance of one-class SVMs and two-class
SVMs. It can be seen that although one-class SVMs
trained on fewer examples (only positive ones), they
performed well when compared with two-class SVM
methods.
3.2. Test on chromosomes 10, 15, and 21
To emphasize that the one-class SVM is more suitable
than a two-class classifier in the problem of recognizing
miRNA hairpins, we tested the one-class SVM method
on three human chromosomes 10, 15, and 21 and com-
pared the predicted results with the results from the two-
class SVM method described in [35].
In this work, the training dataset is all real miRNA
hairpins after excluding ones on the testing chromo-
somes (Table 1). Through various cross-validation ex-
periments as mentioned in the preceding section, we
found that one-class SVM models have a good perform-
ance with RBF kernel (
γ
= 0.0001). We fixed these
values to build th e one-class SVM model for the training
dataset of miRNA hairpins in this kind of experiments.
We then used ScorePin to scan along both genomic
strands of the three chromosomes, 10, 15, and 21, to find
good hairpin candidates. There were 62,508 hairpin can-
didates with a ScorePin-score 105. Among them,
10,035 were confirmed to have an RNAfold-predicted
hairpin with a minimum free energy -25 kcal/mol.
Each candidate is represented by a vector of structural
and sequential features as described in Section 2.3, and
then input to the one-class SVM model. Table 5 shows
some predicted miRNA hairpins which have previously
been confirmed by labor experiments or other prediction
methods. It can be seen, our method recognized all 4
existing miRNA hairpins on chromosome 10, and four of
five existing miRNAs on both chromosomes 15 and 21.
Other miRNA hairpins found by our method are pro-
vided in the supplementary files
(http://www.jaist.ac.jp/~tran/miRNAs/). We also used a
two-class SVM method as described in [35] to predict
miRNA hairpins on the same chromo somes 10, 15, and
21. In addition to all known miRNA hairpins in the train-
ing set of the one-class SVM method, the training data
for this two-class SVM model needed negative examples
of miRNA hairpins. We got all 727 negative examples of
hairpins as described in Section 2.1, together with 437
existing miRNA hairpins in the human genome exclud-
ing ones on chromosomes 10, 15, and 21, to train
Table 4. Comparisons of prediction results between one-class SVMs and two-class SVMs on the testing dataset. FS1 is the feature set
using only structural features; FS2 is the feature set using both sequential features and structural features; Pre., Rec., and F1. are pre-
diction, recall, and F1-measure, respectively.
One-class SVMs Two-class SVMs
Feature set Kernel Pre. Rec. F1. Pre. Rec. F1.
Linear 95.42 83.33 88.97 94.00 94.00 94.00
Polynomial 95.42 83.33 88.97 98.43 83.33 90.25
FS1
RBF 92.81 86.00 89.27 97.76 87.33 92.25
Linear 89.09 98.66 93.63 97.96 96.64 97.30
Polynomial 89.09 98.66 93.63 98.63 96.64 97.63
FS2
RBF 95.92 94.63 95.27 97.97 97.32 97.64
D. H. Tran et al. / J. Biomedical Science and Engineering 1 (2008) 141-146 145
SciRes Copyright © 2008 JBiSE
Table 5. The known miRNA hairpins predicted by one-class
SVMs on chromosomes 10, 15, and 21. Location consists of the
start point and end point of the miRNA hairpin on the chromo-
some. MFE is a minimum free energy of the miRNA hairpin struc-
ture.
Chr
# Location miRNA_ID MFE
17927110:17927200 hsa-mir-511-1 -34.6
17927110:17927200 hsa-mir-511-2 -34.6
52729335:52729425 hsa-mir-605 -54.8
10
104186251:104186341 hsa-mir-146b -41.2
60903206:60903296 hsa-mir-190 -32.5
86956075:86956165 hsa-mir-7-2 -43.1
77289181:77289271 hsa-mir-184 -37.9
15
87712251:87712341 hsa-mir-9-3 -41.1
16833274:16833364 hsa-mir-99a -47.0
25868151:25868241 hsa-mir-155 -39.5
36014883:36014973 hsa-mir-802 -35.0
21
16834016:16834106 hsa-let-7c -43.2
Table 6. The known miRNA hairpins predicted by two-class SVMs
on chromosomes 10, 15, and 21. Location consists of the start
point and end point of the miRNA hairpin on the chromosome.
MFE is a minimum value of the miRNA hairpin structure.
Ch
#
Location miRNA_ID MFE
17927110:17927200 hsa-mir-511-1 -34.6
17927110:17927200 hsa-mir-511-2 -34.6
52729335:52729425 hsa-mir-605 -54.8
10
104186251:104186341 hsa-mir-146b -41.2
60903206:60903296 hsa-mir-190 -32.5
15 86956075:86956165 hsa-mir-7-2 -43.1
16833274:16833364 hsa-mir-99a -47.0
25868151:25868241 hsa-mir-155 -39.5
21
36014883:36014973 hsa-mir-802 -35.0
the discriminative two-class SVM model. Table 6 shows
some miRNA hairpins predicted by the two-class SVM
model. Among them, all four miRNA hairpins on chro-
mosome 10 were identified as same as using the one-class
SVM. Consistent with the results reported in [35], the
two-class SVM also recognized three of five existing
miRNA hairpins on chromosome 21, and two of four on
chromosome 15. Especially, while one-class SVM recog-
nized correctly an additional miRNA hairpin on chromo-
some 21, the two-class SVM predicted them as negatives.
The reasons why two-class SVM method incorrectly rec-
ognized some known miRNA hairpins might be that the
two-class SVM training is based on some negative exam-
ples of miRNA hairpins, which might not be true due to
the way to select "negative" ones.
4. CONCLUSIONS
We have introduced a one-class learning method to pre-
dict pre-miRNAs in the human genome. Our one-class
support vector machine method has an advantage over
other two-class discriminative models: it uses only avail-
able positive examples of miRNA hairpins for building
the model, while all existing methods for the same prob-
lem must use additional negative ones, which are not
available, since it is hard to find true negatives for the
training of a two-class classifier. Our method showed
good performance, and we have illustrated the case of
testing on chromosomes 10, 15 and 21, in which our
method gave the prediction results more precise than
those from an existing two-class support vector machine
method.
ACKNOWLEDGMENTS
The research described in this paper was partially supported by the Insti-
tute for Bioinformatics Research and Development of the Japan Science
and Technology Agency, and by COE project JCP KS1 of the Japan
Advanced Institute of Science and Technology. The first author has been
supported by Japanese government scholarship (Monbukagakusho) to
study in Japan. The authors also would like to thank Prof. Ivo Hofacker
from University of Vienna for providing the ViennaRNA package and
Dr. Chih-Jen Lin from National Taiwan University for providing the
LIBSVM tool.
REFERENCES
[1] V. Ambros. (2004) The functions of animal microRNAs. Nature, 431,
350–355.
[2] D. P. Bartel. (2004) MicroRNAs: genomics, biogenesis, mechanism,
and function. Cell, 116, 281–297.
[3] I. Bentwich. (2005) Prediction and validation of microRNAs and
their targets. FEBS Lett, 579, 5904–5910.
[4] E. Berezikov, E. Cuppen, and R. H. Plas terk. (2006) Approaches to
microRNA discovery. Nat. Genet., 38, S2-S7.
[5] C. J. Burges. (1998) A tutorial on support vector machines for pat-
tern recognition. J. Data Mining and Knowledge Discovery, 2, 121-167.
[6] M. T. Bohnsack, K. Czaplinski and D. Grlich. (2004) Exportin 5 is
a RanGTP-dependent dsRNA-binding protein that mediates nuclear
export of pre-miRNAs. RNA, 10, 185–191.
[7] J. Brown, P. Sanseau. (2005) A computational view of microRNAs
and their targets. Drug discovery today: biosilico, 10(8), 595–601.
[8] C. -C. Chang, and C. -J. Lin. (2001) LIBSVM: a library for support
vector machines.
[9] Y. Chen, X. Zhou, and T. S. Huang. (2001) One-class SVM for
learning in image retrieval. Proc. IEEE Int’l Conf. on Image Processin g,
Thessaloniki, Greece.
[10] M. A. Denli, B. J. Tops, H. A. Plasterk, R. F. Ketting, and G.
J.Hannon. (2004) Processing of primary microRNAs by the Microproc-
essor complex. Nature, 432, 231–235.
[11] Y. Grad, J. Aach, G. D. Hayes, B. J. Reinhart, G. M. Church, G.
Ruvkun, and J. Kim. (2003) Computational and experimental identifica-
tion of C. elegans microRNAs. Mol Cell, 11,1253–1263.
[12] R. I. Gregory, K. P. Yan, G. Amuthan, T. Chendrimada, B. Dorato-
taj, N. Cooch, and R. Shiekhattar. (2004) The microprocessor complex
mediates the genesis of microRNAs. Nature, 423, 235–240.
[13] S. Griffiths-Jones, R. J. Grocock, S. Dongen, A. Bateman, A. J.
Enright. (2006) miRBase: microRNA sequences, targets and gene no-
menclature. Nucleic Acids Res., 34, D140–D144.
[14] S. Griffiths-Jones. (2004) The microRNA Registry, Nucleic Acids
Res., 32, D109–D111.
[15] S. A. Helvik, O. S. Jr, and P. Strom. (2007) Reliable prediction of
Drosha processing sites improves microRNA gene prediction. Bioinfor-
146 D. H. Tran et al. / J. Biomedical Science and Engineering 1 (2008) 141-146
SciRes Copyright © 2008 JBiSE
matics, 23(2), 142-149.
[16] I. L. Hofacker, S. Fontana, W. Stadler, S. Bonhoeffer, M. Tacker,
and P. Schuster. (1994) Fast folding and comparison of RNA secondary
structures, Monatshefte f. Chem ie, 125, 167-188.
[17] I. L. Hofacker. (2003) Vienna RNA secondary structure server.
Nucleic Acids Res, 31, 3429–3431.
[18] M. Kiriakidou, P. T. Nelson, A. Kouranov, P. Fitziev, C. Bouyiou-
kos, Z. Mourelatos, and A. Hatzigeorgiou. (2004) A combined computa-
tional experimental approach predicts human microRNA targets. Genes
Dev, 18, 1165–1178.
[19] A. Kowalczyk, and B. Raskutti. (2002) One-class svm for yeast
regulation pr ediction. Proc. SI GKDD Explorations Workshop, 99–100.
[20] R. Kohavi. (1995) A study of cross-validation and bootstrap for
accuracy estimation and model selection. Proc. 14th IJCAI, San Fran-
cisco, CA, Morgan Kaufmann Pubshers, 113 7–1143,.
[21] Y. Kong, J.-H. Han. (2005) MicroRNA: Biological and computa-
tional perspective. Geno. Prot. Bioinfo., 3(2), 62–72.
[22] J. Krol, K. Sobczak, U. Wilcztnska, M. Drath, A. Jasinska, D.
Kaczynska, and W. J. Krzyzosiak. (2004) Structural features of mi-
croRNA (miRNA) precursors and their relevance to miRNA biogenesis
and small interfering RNA/short hairpin RNA design. J Biol Chem, 279,
42230–42239.
[23] M. Lagos-Quintana, R. Rauhut, W. L endeckel, and T. Tuschl. (2001)
Identification of novel gene coding for small expressed RNAs. Science,
294, 853–858.
[24] E. C. Lai, P. Tomancak, R. W. Williams, and G. M. Rubin. (2003)
Computational identification of Drosophila microRNA genes. Genome
Biol, 4, R42.
[25] Y. Lee, C. Ahn, J. Han, H. Choi, J. Yim, P. Provost, O. Radmark, S.
Kim, and V. N. Kim. (2003) The nuclear RNase III Drosha initiates
microRNA processing. Nature, 424, 415–419.
[26] Y. Lee, M. Kim, J. Han, K. Yeom, S. H. Lee, S. H. Baek, and V. N.
Kim. (2004) MicroRNA genes are transcribed by RNA polymerase II.
EmboJ, 23,4051–4060.
[27] L. P. Lim, M. E. Glasner, S. Yekta, C. B. Burge, and D. P. Bartel.
(2003) Vertebrate microRNA genes. Science, 299, 1540.
[28] L. P. Lim, N. C. Lau, E. G. Weinstein, A. Abdelhakim, S. Yekta, M.
W. Rhoades, C. B. Burge, and D. P. Bartel. (2003) The microRNAs of
Caenorhabditis elegans. Genes Dev, 17, 991-1008.
[29] E. Lund, S. Guttinger, A. Calado, J. E. Dahlberg, and U. Kutay.
(2004) Nuclear export of microRNA precursors. Science, 303, 95–98.
[30] L. M. Manevitz, and M. Yousef. (2001) One-class SVMs for docu-
ment classification. Journal of Machine Learning, 2, 139–154.
[31] J. W. Nam, K. R. Shin, Y. V. Lee, N. Kim, and B. T. Zhang. (2005)
Human microRNA prediction through a probabilistic co-learning model
of sequence and structure. Nucleic Acids Res., 33, 3570–3581.
[32] U. Ohler, S. Yekta, L. P. Lim, D. P. Bartel, and C. B. Burge. (2004)
Patterns of flanking sequence conservation and a characteristic upstream
motif for microRNA gene identification. RNA, 10, 1309–1322.
[33] B. Scholkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C.
Williamson. (2001) Estimating the support of a high-dimensional distri-
bution. Neural Comput, 13, 1443–1471.
[34] P. Strom, O. S. Jr, M. Nedland, T. B. Grnfeld, Y. Lin, M. B.
Bass, J. Canon. (2006) Conserved microRNA characteristics in mam-
mals. Oligonucleotides, 16, 115–144.
[35] K. Szafranski, M. Megraw, M. Reczko, G. H. Hatzigeorgiou. (2006)
Support vector machine for predicting microRNA hairpins. Proc. The
2006 International Conference on Bioinformatics and Computational
Biology, 270–276.
[36] A. Tsirigos, and I. Rigoutsos. (2005) A sensitive, support-vector-
machine method for the detection of horizontal gene transfers in viral,
archaeal and bacterial genomes. Nucleic Acids Research, 33(12):3699–
3707.
[37] D. H. Tran, T. H. Pham, K. Satou, and T. B. Ho. (2008) Prediction
of microRNA hairpins using one-class support vector machine. Proc.
The 2nd international conference on bioinformatics and biomedical
engineering (iCBBE), Sanghai, China, May 16-18.
[38] V. Vapnik. Statistical learning theory, Wiley, Chichester, United
Kingdom, 19 98.
[39] X. Xie, J. Lu, E. J. Kulbokas, T. R. Golub, V. Mootha, K. Lindblad-
Toh, E. S. Lander, and M. Kellis. (2005) Systematic discovery of regula-
tory motifs in human promoters and 3’ UTRs by comparison of several
mammals. Nature, 434, 338–346.
[40] C. Xue, F. Li, T. He, G. P. Liu, Y. Li, and X. Zhang. (2005) Classi-
fication of real and pseudo microRNA precursors using local structure-
sequence features and support vector machine. BMC Bioinformatics, 6,
310.
[41] L. H. Yang, W. Hsu, M. L. Lee, and L. Wong. (2006) Identification
of microRNA precursors via SVM, Proc. The 4th Asia-
PacificBioinformatics Conference, 267–276.
[42] Y. Zeng, R. Yi, and B. R. Cullen. (2005) Recognition and cleavage
of primary microRNA precursors by the nuclear processing enzyme
Drosha. Embo J, 24,138–148.
[43] http://www.csie.ntu.edu.tw/ cjlin/libsvm/
[44] http://microrna.sanger.ac.uk/sequences/index.shtml
[45] http://www.tbi.univie.ac.at/RNA/