Protein-Protein Interaction Extraction Based on Convex Combination Kernel Function

Copyright © 2013 SciRes. JCC

Figure 2. An instance of the dependent information.

is convex, the first thing you need is to get the PPI per-

formance of each single kernel function. Under the above

feature selection strategy, then we can convert all in-

stances to feature matrixes. In order to obtain different

PPI extraction performance of the single kernel function,

the high dimension matrixes are obtained by different

feature matrixes mapping from single kernel function,

the high dimensional matrix after training can obtain the

different classification models. Finally using the test cor-

pus and we can achieve the PPI extraction of each single

kernel function.

For convex combination kernel function, the determi-

nation of convex combination ratio parameter is a very

important problem. Suppose there are n kinds of kernel

function, such as k1, …, kn. The convex combination

kernel (CCK, Convex Combination Kernel) function

consists by the n kinds of single kernel functions is

shown in type (1).

is called the convex combination

ratio parameter,

.

(1)

If the convex combination parameter selection may

have m, the computational complexity is O (mn). If the

parameter selection of the range is 0 to 1, step length is

0.1, and there ar e four kinds of kernel functions, that is to

say, m is 11, n is 4, and there are 14641 times you will

need to experiment to determine the optimal parameters

of kernel function, Obviously, O (mn), such a huge com-

putational complexity is unacceptable in practical appli-

cation. Aiming at this problem, this paper presents some

principles constantly looking for two kinds of kernel

function is the optimal convex combination kernel func-

tion instead of these two kinds of kernel function strategy,

for a variety of kernel function is the optimal kernel

function is convex combination. This strategy can be de-

creased from computational complexity O (m * (n − 1)),

under the condition of just to find the optimal convex

combination kernel function only needs 33 experiments.

As for kernel function with the worst performance for the

principle, to obtain the optimal algorithm of convex com-

bination kernel function is shown in Figure 3.

The optimal convex combinati on kernel functi on generation algor it hm

Algorithm input: N kinds of kernel function is a collection of PPI ex-

traction performance F, F = {Fi, i = 1,2….,n.}

Algorithm output: The optimal kernel function is convex combina-

tion(Optimal Convex Combination Kernel, OCCk)

Algorithm steps:

While(number(F)>1)

candidate1←mi n(F )

F=F-min (F )

candidate2←min(F)

F=F-min (F )

F=F+Optima l (cand idate1,candidat e2)

if number(F)=1

then Return Optima l (candidate1,candidate2)

△Number (F) is the numbe r o f F i in F

△min (F) is the minimum value in F

△Optimal (k1, k2) is the Optimal kernel function from k1,k2

Figure 3. The optimal convex combination kernel function

generation algorithm.

3. Experiment and Results

This Paper uses AIMed corpus as the experimental data.

AIMed is the corpus of PPI extraction used most fre-

quently. It contains 225 MEDLINE abstracts, there are

177 abstracts contain PPI instance, and 48 abstracts does

not contain the PPI instance, referring to 4084 proteins

entities. After preprocessing, this paper removes 59 au-

tocorrelation PPI instances, retains 154 nested instances.

Finally, there are 1000 positive instances and 4084 nega-

tive instances. In [11,12], they verified that the SVM

classification model have better effect in PPI extraction,

therefore, this paper also uses the SVM as a machine

learning method. In addition, in order to facilitate com-

parison with other experiments, the paper uses 10 times

the cross validation method. This paper uses precision,

recall and F1 Value to evaluate the r e sult.

3.1. Experimental Method

The paper sets up three groups of experiments:

• Experiment 1 will test different kernel functions for

the same characteristics in terms of PPI extraction

performances are different. Experiment will get all

the PPI extraction performance of the single kernel

function, and sort them, then search for the optimal

preparation convex combination kernel function. This

paper uses three kinds of single kernel function in-

cluding: Radial Basis Function, Polynomial Kernel

Function, Linear kernel. Respectively, such as type

(2), (3), (4), where x and y are two arbitrary dimen-

sion vector.

( )

( )

2

,exp ||

RBF

Kxyx y= −−

(2)

( )

( )

3

,

T

polynomial

Kxyxy= ⋅

(3)

(4)

• Experiment 2 will compare three optimal convex