Prediction of human microRNA hairpins using only positive sample learning

doi:10.4236/jbise.2008.12023


J. Biomedical Science and Engineering, 2008, 1, 141-146

Published Online August 2008 in SciRes. http://www.srpublishing.org/journal/jbise JBiSE


Dang Hung Tran*,1, Tho Hoan Pham2, Kenji Satou1,3 & Tu Bao Ho1

1Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan. 2Hanoi National University of Education, 136 Xuan Thuy, Hanoi, Viet Nam. 3Kanazawa University, Kakuma, Kanazawa 920-1192, Japan. *Correspondence should be addressed to Dang Hung Tran (hungtd@jaist.ac.jp).

ABSTRACT

MicroRNAs (miRNAs) are small non-coding RNA molecules that play important roles in post-transcriptional regulation in animals and plants. They are commonly 21-25 nucleotides (nt) long and are derived from 60-90 nt RNA hairpin structures, called miRNA hairpins. A large number of sequence segments in the human genome have been computationally identified as forming such 60-90 nt hairpins; however, the majority of them are not miRNA hairpins. Most existing computational methods for predicting miRNA hairpins are based on a two-class classifier that distinguishes miRNA hairpins from other sequence segments with hairpin structures. The difficulty with these methods is how to select hairpins as negative examples of miRNA hairpins for the training dataset, since only a few miRNA hairpins are known. Therefore, these classifiers may be mis-trained because of false negative examples in the training dataset. In this paper, we introduce a one-class support vector machine (SVM) method to predict miRNA hairpins among the hairpin structures. Unlike existing methods for predicting miRNA hairpins, the one-class SVM classifier is trained only on information from the miRNA class. We also illustrate some examples of predicting miRNA hairpins in human chromosomes 10, 15, and 21, where our method overcomes the above disadvantages of existing two-class methods.

Keywords: MicroRNA; Hairpin; One-class SVM

1. INTRODUCTION

MicroRNAs (miRNAs) are small, non-coding RNAs (21-

25 nucleotides in length) that regulate the expression of

protein-encoding genes at the post-transcriptional level [1,

2, 21]. Each miRNA derives from a larger precursor,

which folds into an imperfect stem-loop structure.

In humans, the processing and maturation of miRNAs proceed in several steps before the miRNAs silence their targets. First, the long primary transcripts (pri-miRNAs), which can be up to several kilobases long, are processed by the Drosha complex in the nucleus to yield precursor miRNAs (pre-miRNAs) [10, 12]. The pre-miRNA is a double-stranded sequence of about 60-90 nt with a 2-nt 3' overhang that forms a hairpin structure (also called a miRNA hairpin). Second, pre-miRNAs are transported from the nucleus into the cytoplasm by another complex, which consists of Exportin 5 and RanGTP [6, 29]. Subsequently, the pre-miRNA is cleaved into an imperfect double-stranded RNA duplex by an RNase III endonuclease called Dicer [25, 29, 42]. This duplex is composed of the mature miRNA strand and its complementary strand. Finally, mature miRNAs are incorporated into RISC (RNA-induced silencing complex) before they bind to their targets to regulate gene expression.

Until now, several computational approaches have

been proposed for predicting miRNAs. Most of them are

based on the common structural characteristic of secon-

dary structures of their pre-miRNAs [15, 35, 40]. Since

pre-miRNAs are often short (60-90 nt), there can be too

many subsequences in a genome having hairpin structures.

However, only a minority of them are miRNA hairpins.

Using only information of their structures therefore may

not allow us to distinguish miRNA hairpins from other

hairpin structures. Other methods that consider informa-

tion of both sequences and structures are needed.

Most methods so far used a two-class classifier to sepa-

rate the miRNA hairpins from the ones assumed to be

negative. The main difference between these methods is

how negative examples are selected for the two-class

classifier training dataset. For example, Szafranski et al. [35] and Xue et al. [40] selected hairpins that overlap with the last exon of known mRNAs, whereas Helvik et al. [15] drew them randomly from DNA sequences with hairpin structures and treated them as "negative" miRNA hairpins. Negative examples collected in such ways may contain false negatives, since no study so far has provided information about true negative miRNA hairpins. In other words, only information about miRNA hairpins (the positive class) is available. Therefore, the classifiers of existing methods may be inaccurate because of false negative miRNA hairpins in the training dataset.

In this paper, we present a new method for predicting

miRNA hairpins that employs support vector machines


for one-class classification (one-class SVMs). One-class

SVMs recently have been successfully applied in several

areas, especially domains with imbalanced data such as

document classification [30], gene prediction [19] and

image retrieval [9]. Different from previous methods for

predicting miRNA hairpins, our method uses only avail-

able miRNA hairpins for training the model, while other

methods train their classifiers by using an additional data-

set of negative examples, which may contain some false

negatives as explained above. Moreover, we use more features of hairpin sequences and structures to represent hairpins, with the expectation that they will be useful to the model. Our one-class SVM classifier gave good results in predicting miRNA hairpins. We also illustrate some examples of predicting miRNA hairpins in human chromosomes 10, 15, and 21, where our method avoids the problem of false negative examples that affects the existing two-class methods.

2. MATERIALS AND METHODS

2.1. Datasets for training and testing

As mentioned in Section 1, our method uses one-class

SVMs to recognize miRNA hairpins from potential ones

produced by ScorePin [15]. To do this, the one-class

SVM model should capture the characteristics of known

miRNA hairpins. In our work, the positive class was drawn from 474 known human miRNA hairpins in miRBase (version 8.1) [13, 14] (http://microrna.sanger.ac.uk/sequences/) that have been verified by experiments or predicted by computational methods with high confidence. To ensure that all miRNA hairpins were folded as hairpins, we removed those containing no RNAfold-predicted hairpin loop or more than one. The positive class used in this work therefore consists of 451 miRNA hairpins.
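The single-hairpin filter described above can be sketched as follows. This is a minimal illustration, assuming the predicted secondary structure is available as a dot-bracket string (the notation RNAfold prints); the function names and the toy structures are hypothetical and are not taken from the paper.

```python
import re

# In dot-bracket notation a hairpin loop is an opening bracket, a run of
# unpaired bases (dots), and a closing bracket with nothing else in between.
HAIRPIN_LOOP = re.compile(r"\(\.*\)")


def count_hairpin_loops(structure: str) -> int:
    """Count hairpin loops in a dot-bracket secondary structure."""
    return len(HAIRPIN_LOOP.findall(structure))


def is_single_hairpin(structure: str) -> bool:
    """Keep only structures with exactly one hairpin loop."""
    return count_hairpin_loops(structure) == 1


# Toy structures (hypothetical, for illustration only).
candidates = {
    "single": "((((....))))",
    "two_loops": "((..))..((...))",
    "unfolded": "........",
}
kept = [name for name, s in candidates.items() if is_single_hairpin(s)]
```

Only the entry named `single` survives the filter; structures with no hairpin loop or with several are discarded, which mirrors the reduction from 474 to 451 hairpins in spirit but not in its exact implementation.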

To evaluate our one-class SVM models for miRNA

hairpins, we conducted two kinds of experiments.

Cross-validation: as in previous studies [15, 35, 40], we first prepared a dataset for a cross-validation procedure to compare our method with other methods. The dataset contained the 451 positive examples described above and 727 negative examples of miRNA hairpins. These 727 negative examples were ScorePin hairpins that overlap with the last exon of known protein-coding genes. We randomly partitioned the dataset into three subsets such that the numbers of positive and negative examples in each of the three subsets were equal or nearly equal. One subset was retained as the validation data for testing the prediction methods, and the methods were trained on the two remaining subsets (note that our one-class SVM method uses only the positive examples for training). The cross-validation procedure was repeated three times, so that each subset served once as the validation data. Results from the three trials were then averaged to produce a single estimate.
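The partitioning for the one-class case can be sketched as follows. This is only an illustration under stated assumptions: examples are referred to by integer index, the shuffle seed is arbitrary, and the grouping of similar miRNA hairpins into the same fold is not modelled. The function name `three_fold_splits` is hypothetical.

```python
import random


def three_fold_splits(n_pos, n_neg, seed=0):
    """Return three splits; each trains on positives only.

    Positives and negatives are shuffled separately and dealt into three
    folds, so every fold holds a (nearly) equal share of each class.
    """
    rng = random.Random(seed)
    pos = list(range(n_pos))
    neg = list(range(n_neg))
    rng.shuffle(pos)
    rng.shuffle(neg)
    pos_folds = [pos[k::3] for k in range(3)]
    neg_folds = [neg[k::3] for k in range(3)]

    splits = []
    for k in range(3):
        # One-class training: positives from the two other folds, no negatives.
        train_pos = [i for j in range(3) if j != k for i in pos_folds[j]]
        splits.append(
            {
                "train_pos": train_pos,
                "test_pos": pos_folds[k],
                "test_neg": neg_folds[k],
            }
        )
    return splits


splits = three_fold_splits(n_pos=451, n_neg=727)
fold_sizes = [(len(s["train_pos"]), len(s["test_pos"]), len(s["test_neg"]))
              for s in splits]
```

With 451 positives and 727 negatives, each split trains on roughly two thirds of the positives and tests on the remaining third plus one third of the negatives, matching the proportions listed in Table 1.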

Test on chromosomes 10, 15, and 21: we used all known miRNA hairpins, excluding those on chromosomes 10, 15, and 21, to train the one-class SVM model. This model was then used to recognize miRNA hairpins among ScorePin hairpins (see Section 3.2). Table 1 presents a summary of all datasets used in the two kinds of experiments.

Table 1. The data of human miRNA hairpins.

Experiment                #Examples
Testing on chromosomes    437 training examples
                          41039 hairpin candidates
Cross-validation          2/3 x 451 training positives
                          1/3 x 451 testing positives
                          1/3 x 727 testing negatives

2.2. One-class support vector machines

Support vector machine (SVM) is a learning technique

based on statistical learning theory [38]. It has been ap-

plied to a wide range of real-world tasks. The formulation

of SVMs can be considered as a simple linear classifica-

tion, normally using both negative and positive examples

for training. SVMs can perform nonlinear separation by

using a kernel technique, which realizes a nonlinear map-

ping to a feature space. Scholkopf et al. [33] have ex-

tended standard SVMs to one-class classification prob-

lems. Their approach is to construct a hyperplane that is

maximally distant from the origin [33].

In this section, we give details of the algorithm for

training one-class SVMs proposed by Scholkopf et al.

[33]. The training algorithm is as follows: let the training data $x_1, x_2, \ldots, x_l \in R^N$ belong to one class, where $x_i$ is a feature vector and $l$ is the number of examples. The one-class SVM estimates a function that takes the value +1 in a region where the majority of the data points are concentrated, and the value -1 everywhere else [30, 33]. Formally, the function can be written as follows:

$$f(x) = \begin{cases} +1 & \text{if } x \in S \\ -1 & \text{if } x \in \bar{S} \end{cases}$$

where $S$ is a simple subset of the input space and $\bar{S}$ is the complement of $S$. Let $\Phi: X \to H$ be a kernel map which converts the training examples from the original space to a feature space. The strategy is to map the data into the feature space corresponding to the kernel, and to separate them from the origin by the maximum margin. In order to separate the data set from the origin, we need to solve the following quadratic programming problem [9, 30, 33]:

$$\min_{w,\,\xi,\,\rho} \ \frac{1}{2}\|w\|^2 + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i - \rho \quad \text{subject to} \quad (w \cdot \Phi(x_i)) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l$$

where $\nu \in (0, 1)$ is a parameter that represents an upper bound on the fraction of outliers in the data, $\rho$ is the margin of the hyperplane with respect to the data, and $\xi_i$ are non-negative slack variables allowing a soft margin. We obtain $w$ and $\rho$ by solving this problem. When a new data point $x$ is to be classified, a label is assigned according to the decision function, which can be expressed as:

$$f(x) = \mathrm{sgn}((w \cdot \Phi(x)) - \rho)$$

Instead of solving the primal optimization problem directly, one can consider the following dual program:

$$\max_{\alpha} \ -\frac{1}{2}\sum_{i,j=1}^{l}\alpha_i \alpha_j K(x_i, x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le \frac{1}{\nu l}, \quad \sum_{i=1}^{l}\alpha_i = 1$$

Here, $K(x_i, x_j) = (\Phi(x_i) \cdot \Phi(x_j))$ are kernels, which allow much more general decision functions when the data are not linearly separable, and the hyperplane can be represented in a feature space. The parameters $\alpha_i$ are Lagrange multipliers.
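To make the role of the Lagrange multipliers concrete, the sketch below evaluates the decision function through the kernel expansion $(w \cdot \Phi(x)) = \sum_i \alpha_i K(x_i, x)$, which follows from the formulation in [33] but is not written out in the text above. It does not solve the dual program: the support vectors, multipliers, offset, and kernel width are made-up toy values chosen only so that the multipliers are non-negative and sum to 1, and all function names are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decision_value(x, support_vectors, alphas, rho, gamma):
    """Compute sum_i alpha_i * K(x_i, x) - rho for a new point x."""
    total = sum(a * rbf_kernel(sv, x, gamma)
                for sv, a in zip(support_vectors, alphas))
    return total - rho


def predict(x, support_vectors, alphas, rho, gamma):
    """Return +1 inside the estimated region, -1 outside (0 maps to +1)."""
    return 1 if decision_value(x, support_vectors, alphas, rho, gamma) >= 0 else -1


# Toy model (assumed values, not obtained from any training run).
support_vectors = [(0.0, 0.0), (1.0, 0.0)]
alphas = [0.5, 0.5]   # non-negative and summing to 1, as the dual requires
rho = 0.5
gamma = 1.0

near = predict((0.5, 0.0), support_vectors, alphas, rho, gamma)
far = predict((3.0, 3.0), support_vectors, alphas, rho, gamma)
```

A point close to the support vectors receives the label +1, while a distant point receives -1, which is the behaviour the function $f$ is meant to have.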

In our research, we used LIBSVM (version 2.84) with three types of kernel functions: linear, polynomial, and radial basis function (RBF). This library is an integrated tool for support vector classification and regression that can handle one-class SVMs using the algorithm proposed by Scholkopf et al. [33]. LIBSVM is available at [43].

2.3. Structural and sequential features of miRNA

hairpins

There are many miRNA prediction methods which used

structural features as key features. However, recent re-

ports have shown that the sequence features are important

in predicting miRNA hairpins [39, 40]. Xue et al. [40]

indicated that the short contiguous subsequences of

miRNA hairpin sequences are significantly distinct from

other RNA hairpin sequences. For this reason, we pro-

pose a set of features that uses both the sequential fea-

tures and structural features to characterize the RNA

hairpin structure sequences.

For sequential features, we extracted features from

RNA hairpin sequences using a 5-nucleotide sliding win-

dow along an RNA hairpin sequence, and computed the

number of occurrences of each 5-gram. As a result, each

sequence is represented by a 1,024-dimensional vector of

the number of occurrences of all possible 5-grams. In

addition, several other features based on the sequences

are considered, such as the number of occurrences of each

nucleotide (A, C, G, U) in the 5' and 3' arms and GC-

content defined as in [35].
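The 5-gram counting step can be sketched as follows. With four nucleotides there are 4^5 = 1,024 possible 5-grams, which gives the stated dimensionality. The ordering of the 5-grams in the vector, the conversion of T to U, and the skipping of windows that contain other symbols are assumptions of this sketch, not details given in the paper.

```python
from itertools import product

ALPHABET = "ACGU"
K = 5

# All 4**5 = 1024 possible 5-grams in a fixed (lexicographic) order.
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_counts(sequence: str) -> list:
    """Slide a 5-nt window along the sequence and count each 5-gram."""
    seq = sequence.upper().replace("T", "U")
    counts = [0] * len(KMERS)
    for start in range(len(seq) - K + 1):
        j = INDEX.get(seq[start:start + K])
        if j is not None:  # skip windows with symbols outside A, C, G, U
            counts[j] += 1
    return counts


vector = kmer_counts("ACGUACGUAC")
```

A sequence of length n contributes n - 4 windows, so the toy 10-nt sequence above yields six counts spread over the 1,024 positions.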

For structural features, we extracted them from the secondary structure of each hairpin. The secondary structures were predicted using RNAfold [16]. The structural features used in our method were introduced in previous miRNA prediction methods [15, 35]. The features consist of:

1. miRNA hairpin length as the number of nucleotides.

2. Loop size as the number of unpaired bases in the

hairpin loop of the predicted secondary structure.

3. Minimum free energy (MFE) as the total free energy

of hairpin structure predicted by using RNAfold tool.

4. Paired bases as the number of nucleotides predicted

to be in a hydrogen-bonded state.

5. The numbers of nucleotides from 5' site to the loop

start.

6. The number of 2-nt overhangs from 5' site and 3' site

to loop start.
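Several of the structural features listed above can be read directly from a dot-bracket structure string, as in the following sketch. It is an illustration under assumptions: the structure is taken to be a dot-bracket string with exactly one hairpin loop, the "5' site to loop start" feature is interpreted as the number of nucleotides preceding the first unpaired loop base, and the MFE and overhang features are left out because they require the folding tool's output. The function name and the example structure are hypothetical.

```python
import re

HAIRPIN_LOOP = re.compile(r"\(\.*\)")


def structural_features(structure: str) -> dict:
    """Derive simple structural features from a dot-bracket string."""
    loops = list(HAIRPIN_LOOP.finditer(structure))
    if len(loops) != 1:
        raise ValueError("expected exactly one hairpin loop")
    loop = loops[0]
    return {
        # Feature 1: hairpin length in nucleotides.
        "length": len(structure),
        # Feature 2: unpaired bases inside the hairpin loop
        # (the match includes the two enclosing brackets).
        "loop_size": loop.end() - loop.start() - 2,
        # Feature 4: nucleotides predicted to be base-paired.
        "paired_bases": structure.count("(") + structure.count(")"),
        # Feature 5 (interpretation): nucleotides from the 5' end up to,
        # but not including, the first unpaired base of the loop.
        "five_prime_to_loop": loop.start() + 1,
    }


features = structural_features("..((((.(((....))).)))).")
```

For the toy structure, the sketch reports a 23-nt hairpin with a 4-nt loop, 14 paired bases, and 10 nucleotides before the loop.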

In total, the feature vector, which is input to our one-

class SVMs, consists of 1,036 variables. It captures the

characteristics of both the sequence and the structure of

the RNA hairpin sequences.

3. RESULTS AND DISCUSSIONS

3.1. One-class SVM performance

We experimentally evaluated our method using the three-fold cross-validation procedure described in Section 2.1. To avoid miRNA hairpins in the same group (defined in [44]) being divided into different folds, we placed all similar miRNA hairpins in the same fold. Three criteria, precision, recall, and F1-measure, were used to evaluate the results. We carried out experiments with three types of kernels: linear, polynomial, and radial basis function (RBF). For each cross-validation run, we used the default kernel parameters and various values of the parameter ν in the range [0.07, 0.11]. The prediction results are shown in Table 2. It can be seen that one-class SVMs worked well with ν = 0.10 and RBF kernels (γ = 0.0001), giving the highest F1-measure = 95.27%, with precision = 95.92% and recall = 94.63%. The work in this paper is an extension of our conference paper [37]. The main improvement here is that we examined the contribution of each kind of feature to the prediction results.

Table 2. The prediction results of one-class SVMs on the testing dataset. Pre., Rec., and F1. are precision, recall, and F1-measure, respectively.

        Linear kernel            Polynomial kernel        RBF kernel
ν       Pre.    Rec.    F1.      Pre.    Rec.    F1.      Pre.    Rec.    F1.
0.07    88.02   98.66   93.04    88.02   98.66   93.04    91.20   97.32   94.16
0.08    89.09   98.66   93.63    89.09   98.66   93.63    94.67   95.30   94.98
0.09    91.03   95.30   93.11    91.03   95.30   93.11    95.27   94.63   94.95
0.10    93.75   90.60   92.15    93.71   89.93   91.78    95.92   94.63   95.27
0.11    94.37   89.93   92.10    93.71   89.93   91.78    95.24   93.96   94.59
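The three criteria reported in Table 2 are related by F1 = 2PR / (P + R). The sketch below checks this on the RBF row for ν = 0.10 and also shows the computation from raw counts. The counts (141 true positives, 6 false positives, 8 false negatives) are an assumption chosen because they reproduce the tabulated percentages; the paper reports only the percentages.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 (as percentages) from counts."""
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    return precision, recall, f1_from_precision_recall(precision, recall)


# RBF kernel, nu = 0.10, as tabulated: Pre. = 95.92, Rec. = 94.63.
f1_rbf = f1_from_precision_recall(95.92, 94.63)

# Hypothetical counts that are consistent with the same row.
p, r, f1 = precision_recall_f1(tp=141, fp=6, fn=8)
```

Rounded to two decimals, both routes give an F1-measure of 95.27, in agreement with the table.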


Table 3. The prediction results of one-class SVMs using different feature sets. FS1 is the feature set using only structural features; FS2 is the feature set using both sequential features and structural features; Pre., Rec., and F1. are precision, recall, and F1-measure, respectively.

Kernel        Feature set    Pre.     Rec.     F1.
Linear        FS1            95.42    83.33    88.97
Linear        FS2            93.75    90.60    92.15
Polynomial    FS1            95.42    83.33    88.97
Polynomial    FS2            93.71    89.93    91.78
RBF           FS1            92.81    86.00    89.27
RBF           FS2            95.92    94.63    95.27

To determine the importance of the sequential features, which are introduced for the first time in this research, we removed the sequential features and then trained and tested the model again. The vector representation of examples using only structural features is denoted as FS1, and the representation using both sequential and structural features is denoted as FS2. Table 3 shows the results of the one-class SVM with the two vector representations FS1 and FS2 (with the same value of the parameter ν = 0.10). It can be seen that the F1-measure with FS1 is lower than with FS2 for all three kernels (for example, 89.27 versus 95.27 with the RBF kernel). Therefore, the sequential features are relevant for modeling miRNA hairpins.

We also compared the one-class SVM method with the two-class SVM method introduced in [35] for the same problem of predicting miRNAs. Unlike our one-class SVM method, two-class SVMs have to be trained on both positive and negative classes of miRNA hairpins. As mentioned in Section 1, only positive examples of miRNAs are available, and it is difficult to select potential miRNA hairpins as "negatives". As in previous studies, we had to establish a class of 727 "negative" miRNA hairpins as described in Section 2.1, and thus the test results here hold only under the assumption that these 727 negative examples are true negatives. Table 4 presents the performance of one-class SVMs and two-class SVMs. It can be seen that although the one-class SVMs were trained on fewer examples (only positive ones), they remained competitive with the two-class SVMs (for example, an F1-measure of 95.27 versus 97.64 with FS2 and the RBF kernel).

3.2. Test on chromosomes 10, 15, and 21

To emphasize that the one-class SVM is more suitable

than a two-class classifier in the problem of recognizing

miRNA hairpins, we tested the one-class SVM method

on three human chromosomes 10, 15, and 21 and com-

pared the predicted results with the results from the two-

class SVM method described in [35].

In this work, the training dataset consists of all real miRNA hairpins after excluding those on the testing chromosomes (Table 1). Through the cross-validation experiments described in the preceding section, we found that one-class SVM models perform well with the RBF kernel (γ = 0.0001). We fixed these values to build the one-class SVM model on the training dataset of miRNA hairpins in this kind of experiment.

We then used ScorePin to scan along both genomic strands of the three chromosomes, 10, 15, and 21, to find good hairpin candidates. There were 62,508 hairpin candidates with a ScorePin-score ≤ 105. Among them, 10,035 were confirmed to have an RNAfold-predicted hairpin with a minimum free energy ≤ -25 kcal/mol. Each candidate is represented by a vector of structural and sequential features as described in Section 2.3, and then input to the one-class SVM model. Table 5 shows some predicted miRNA hairpins that have previously been confirmed by laboratory experiments or other prediction methods. As can be seen, our method recognized all four existing miRNA hairpins on chromosome 10, and four of five existing miRNAs on each of chromosomes 15 and 21. Other miRNA hairpins found by our method are provided in the supplementary files (http://www.jaist.ac.jp/~tran/miRNAs/). We also used a two-class SVM method as described in [35] to predict miRNA hairpins on the same chromosomes 10, 15, and 21. In addition to all known miRNA hairpins in the training set of the one-class SVM method, the training data for this two-class SVM model needed negative examples of miRNA hairpins. We used all 727 negative examples of hairpins as described in Section 2.1, together with the 437 existing miRNA hairpins in the human genome, excluding those on chromosomes 10, 15, and 21, to train

Table 4. Comparisons of prediction results between one-class SVMs and two-class SVMs on the testing dataset. FS1 is the feature set using only structural features; FS2 is the feature set using both sequential features and structural features; Pre., Rec., and F1. are precision, recall, and F1-measure, respectively.

                             One-class SVMs           Two-class SVMs
Feature set   Kernel         Pre.    Rec.    F1.      Pre.    Rec.    F1.
FS1           Linear         95.42   83.33   88.97    94.00   94.00   94.00
FS1           Polynomial     95.42   83.33   88.97    98.43   83.33   90.25
FS1           RBF            92.81   86.00   89.27    97.76   87.33   92.25
FS2           Linear         89.09   98.66   93.63    97.96   96.64   97.30
FS2           Polynomial     89.09   98.66   93.63    98.63   96.64   97.63
FS2           RBF            95.92   94.63   95.27    97.97   97.32   97.64


Table 5. The known miRNA hairpins predicted by one-class SVMs on chromosomes 10, 15, and 21. Location consists of the start point and end point of the miRNA hairpin on the chromosome. MFE is the minimum free energy of the miRNA hairpin structure.

Chr #   Location               miRNA_ID         MFE
10      17927110:17927200      hsa-mir-511-1    -34.6
10      17927110:17927200      hsa-mir-511-2    -34.6
10      52729335:52729425      hsa-mir-605      -54.8
10      104186251:104186341    hsa-mir-146b     -41.2
15      60903206:60903296      hsa-mir-190      -32.5
15      86956075:86956165      hsa-mir-7-2      -43.1
15      77289181:77289271      hsa-mir-184      -37.9
15      87712251:87712341      hsa-mir-9-3      -41.1
21      16833274:16833364      hsa-mir-99a      -47.0
21      25868151:25868241      hsa-mir-155      -39.5
21      36014883:36014973      hsa-mir-802      -35.0
21      16834016:16834106      hsa-let-7c       -43.2

Table 6. The known miRNA hairpins predicted by two-class SVMs on chromosomes 10, 15, and 21. Location consists of the start point and end point of the miRNA hairpin on the chromosome. MFE is the minimum free energy of the miRNA hairpin structure.

Chr #   Location               miRNA_ID         MFE
10      17927110:17927200      hsa-mir-511-1    -34.6
10      17927110:17927200      hsa-mir-511-2    -34.6
10      52729335:52729425      hsa-mir-605      -54.8
10      104186251:104186341    hsa-mir-146b     -41.2
15      60903206:60903296      hsa-mir-190      -32.5
15      86956075:86956165      hsa-mir-7-2      -43.1
21      16833274:16833364      hsa-mir-99a      -47.0
21      25868151:25868241      hsa-mir-155      -39.5
21      36014883:36014973      hsa-mir-802      -35.0

the discriminative two-class SVM model. Table 6 shows some miRNA hairpins predicted by the two-class SVM model. Among them, all four miRNA hairpins on chromosome 10 were identified, the same as with the one-class SVM. Consistent with the results reported in [35], the two-class SVM recognized three of five existing miRNA hairpins on chromosome 21, and two of four on chromosome 15. Notably, the one-class SVM correctly recognized an additional miRNA hairpin on chromosome 21 that the two-class SVM predicted as negative. A possible reason why the two-class SVM method failed to recognize some known miRNA hairpins is that its training relies on negative examples of miRNA hairpins, which might not be true negatives because of the way the "negative" examples were selected.

4. CONCLUSIONS

We have introduced a one-class learning method to predict pre-miRNAs in the human genome. Our one-class support vector machine method has an advantage over two-class discriminative models: it uses only the available positive examples of miRNA hairpins to build the model, whereas existing methods for the same problem must use additional negative examples, and true negatives are hard to find for training a two-class classifier. Our method showed good performance, and we have illustrated the case of testing on chromosomes 10, 15, and 21, in which our method recognized more of the known miRNA hairpins than an existing two-class support vector machine method.

ACKNOWLEDGMENTS

The research described in this paper was partially supported by the Insti-

tute for Bioinformatics Research and Development of the Japan Science

and Technology Agency, and by COE project JCP KS1 of the Japan

Advanced Institute of Science and Technology. The first author has been

supported by Japanese government scholarship (Monbukagakusho) to

study in Japan. The authors also would like to thank Prof. Ivo Hofacker

from University of Vienna for providing the ViennaRNA package and

Dr. Chih-Jen Lin from National Taiwan University for providing the

LIBSVM tool.

REFERENCES

[1] V. Ambros. (2004) The functions of animal microRNAs. Nature, 431,

350–355.

[2] D. P. Bartel. (2004) MicroRNAs: genomics, biogenesis, mechanism,

and function. Cell, 116, 281–297.

[3] I. Bentwich. (2005) Prediction and validation of microRNAs and

their targets. FEBS Lett, 579, 5904–5910.

[4] E. Berezikov, E. Cuppen, and R. H. Plasterk. (2006) Approaches to microRNA discovery. Nat. Genet., 38, S2-S7.

[5] C. J. Burges. (1998) A tutorial on support vector machines for pat-

tern recognition. J. Data Mining and Knowledge Discovery, 2, 121-167.

[6] M. T. Bohnsack, K. Czaplinski and D. Görlich. (2004) Exportin 5 is a RanGTP-dependent dsRNA-binding protein that mediates nuclear export of pre-miRNAs. RNA, 10, 185–191.

[7] J. Brown, P. Sanseau. (2005) A computational view of microRNAs

and their targets. Drug discovery today: biosilico, 10(8), 595–601.

[8] C. -C. Chang, and C. -J. Lin. (2001) LIBSVM: a library for support

vector machines.

[9] Y. Chen, X. Zhou, and T. S. Huang. (2001) One-class SVM for learning in image retrieval. Proc. IEEE Int’l Conf. on Image Processing, Thessaloniki, Greece.

[10] M. A. Denli, B. J. Tops, H. A. Plasterk, R. F. Ketting, and G. J. Hannon. (2004) Processing of primary microRNAs by the Microprocessor complex. Nature, 432, 231–235.

[11] Y. Grad, J. Aach, G. D. Hayes, B. J. Reinhart, G. M. Church, G. Ruvkun, and J. Kim. (2003) Computational and experimental identification of C. elegans microRNAs. Mol Cell, 11, 1253–1263.

[12] R. I. Gregory, K. P. Yan, G. Amuthan, T. Chendrimada, B. Doratotaj, N. Cooch, and R. Shiekhattar. (2004) The microprocessor complex mediates the genesis of microRNAs. Nature, 432, 235–240.

[13] S. Griffiths-Jones, R. J. Grocock, S. Dongen, A. Bateman, A. J.

Enright. (2006) miRBase: microRNA sequences, targets and gene no-

menclature. Nucleic Acids Res., 34, D140–D144.

[14] S. Griffiths-Jones. (2004) The microRNA Registry, Nucleic Acids

Res., 32, D109–D111.

[15] S. A. Helvik, O. Snøve Jr, and P. Sætrom. (2007) Reliable prediction of Drosha processing sites improves microRNA gene prediction. Bioinformatics, 23(2), 142-149.

[16] I. L. Hofacker, S. Fontana, W. Stadler, S. Bonhoeffer, M. Tacker, and P. Schuster. (1994) Fast folding and comparison of RNA secondary structures. Monatshefte f. Chemie, 125, 167-188.

[17] I. L. Hofacker. (2003) Vienna RNA secondary structure server.

Nucleic Acids Res, 31, 3429–3431.

[18] M. Kiriakidou, P. T. Nelson, A. Kouranov, P. Fitziev, C. Bouyiou-

kos, Z. Mourelatos, and A. Hatzigeorgiou. (2004) A combined computa-

tional experimental approach predicts human microRNA targets. Genes

Dev, 18, 1165–1178.

[19] A. Kowalczyk, and B. Raskutti. (2002) One-class svm for yeast regulation prediction. Proc. SIGKDD Explorations Workshop, 99–100.

[20] R. Kohavi. (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. Proc. 14th IJCAI, San Francisco, CA, Morgan Kaufmann Publishers, 1137–1143.

[21] Y. Kong, J.-H. Han. (2005) MicroRNA: Biological and computa-

tional perspective. Geno. Prot. Bioinfo., 3(2), 62–72.

[22] J. Krol, K. Sobczak, U. Wilcztnska, M. Drath, A. Jasinska, D.

Kaczynska, and W. J. Krzyzosiak. (2004) Structural features of mi-

croRNA (miRNA) precursors and their relevance to miRNA biogenesis

and small interfering RNA/short hairpin RNA design. J Biol Chem, 279,

42230–42239.

[23] M. Lagos-Quintana, R. Rauhut, W. Lendeckel, and T. Tuschl. (2001) Identification of novel gene coding for small expressed RNAs. Science, 294, 853–858.

[24] E. C. Lai, P. Tomancak, R. W. Williams, and G. M. Rubin. (2003)

Computational identification of Drosophila microRNA genes. Genome

Biol, 4, R42.

[25] Y. Lee, C. Ahn, J. Han, H. Choi, J. Yim, P. Provost, O. Radmark, S.

Kim, and V. N. Kim. (2003) The nuclear RNase III Drosha initiates

microRNA processing. Nature, 424, 415–419.

[26] Y. Lee, M. Kim, J. Han, K. Yeom, S. H. Lee, S. H. Baek, and V. N. Kim. (2004) MicroRNA genes are transcribed by RNA polymerase II. Embo J, 23, 4051–4060.

[27] L. P. Lim, M. E. Glasner, S. Yekta, C. B. Burge, and D. P. Bartel.

(2003) Vertebrate microRNA genes. Science, 299, 1540.

[28] L. P. Lim, N. C. Lau, E. G. Weinstein, A. Abdelhakim, S. Yekta, M.

W. Rhoades, C. B. Burge, and D. P. Bartel. (2003) The microRNAs of

Caenorhabditis elegans. Genes Dev, 17, 991-1008.

[29] E. Lund, S. Guttinger, A. Calado, J. E. Dahlberg, and U. Kutay.

(2004) Nuclear export of microRNA precursors. Science, 303, 95–98.

[30] L. M. Manevitz, and M. Yousef. (2001) One-class SVMs for docu-

ment classification. Journal of Machine Learning, 2, 139–154.

[31] J. W. Nam, K. R. Shin, Y. V. Lee, N. Kim, and B. T. Zhang. (2005)

Human microRNA prediction through a probabilistic co-learning model

of sequence and structure. Nucleic Acids Res., 33, 3570–3581.

[32] U. Ohler, S. Yekta, L. P. Lim, D. P. Bartel, and C. B. Burge. (2004)

Patterns of flanking sequence conservation and a characteristic upstream

motif for microRNA gene identification. RNA, 10, 1309–1322.

[33] B. Scholkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C.

Williamson. (2001) Estimating the support of a high-dimensional distri-

bution. Neural Comput, 13, 1443–1471.

[34] P. Sætrom, O. Snøve Jr, M. Nedland, T. B. Grünfeld, Y. Lin, M. B. Bass, J. Canon. (2006) Conserved microRNA characteristics in mammals. Oligonucleotides, 16, 115–144.

[35] K. Szafranski, M. Megraw, M. Reczko, G. H. Hatzigeorgiou. (2006)

Support vector machine for predicting microRNA hairpins. Proc. The

2006 International Conference on Bioinformatics and Computational

Biology, 270–276.

[36] A. Tsirigos, and I. Rigoutsos. (2005) A sensitive, support-vector-

machine method for the detection of horizontal gene transfers in viral,

archaeal and bacterial genomes. Nucleic Acids Research, 33(12):3699–

3707.

[37] D. H. Tran, T. H. Pham, K. Satou, and T. B. Ho. (2008) Prediction of microRNA hairpins using one-class support vector machine. Proc. The 2nd international conference on bioinformatics and biomedical engineering (iCBBE), Shanghai, China, May 16-18.

[38] V. Vapnik. (1998) Statistical learning theory. Wiley, Chichester, United Kingdom.

[39] X. Xie, J. Lu, E. J. Kulbokas, T. R. Golub, V. Mootha, K. Lindblad-

Toh, E. S. Lander, and M. Kellis. (2005) Systematic discovery of regula-

tory motifs in human promoters and 3’ UTRs by comparison of several

mammals. Nature, 434, 338–346.

[40] C. Xue, F. Li, T. He, G. P. Liu, Y. Li, and X. Zhang. (2005) Classi-

fication of real and pseudo microRNA precursors using local structure-

sequence features and support vector machine. BMC Bioinformatics, 6,

310.

[41] L. H. Yang, W. Hsu, M. L. Lee, and L. Wong. (2006) Identification of microRNA precursors via SVM. Proc. The 4th Asia-Pacific Bioinformatics Conference, 267–276.

[42] Y. Zeng, R. Yi, and B. R. Cullen. (2005) Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. Embo J, 24, 138–148.

[43] http://www.csie.ntu.edu.tw/~cjlin/libsvm/

[44] http://microrna.sanger.ac.uk/sequences/index.shtml

[45] http://www.tbi.univie.ac.at/RNA/