Protein-Protein Interaction Extraction Based on Convex Combination Kernel Function
Copyright © 2013 SciRes. JCC
Figure 2. An instance of the dependent information.
is convex, the first thing you need is to get the PPI per-
formance of each single kernel function. Under the above
feature selection strategy, then we can convert all in-
stances to feature matrixes. In order to obtain different
PPI extraction performance of the single kernel function,
the high dimension matrixes are obtained by different
feature matrixes mapping from single kernel function,
the high dimensional matrix after training can obtain the
different classification models. Finally using the test cor-
pus and we can achieve the PPI extraction of each single
kernel function.
For convex combination kernel function, the determi-
nation of convex combination ratio parameter is a very
important problem. Suppose there are n kinds of kernel
function, such as k1, …, kn. The convex combination
kernel (CCK, Convex Combination Kernel) function
consists by the n kinds of single kernel functions is
shown in type (1).
is called the convex combination
ratio parameter,
.
(1)
If the convex combination parameter selection may
have m, the computational complexity is O (mn). If the
parameter selection of the range is 0 to 1, step length is
0.1, and there ar e four kinds of kernel functions, that is to
say, m is 11, n is 4, and there are 14641 times you will
need to experiment to determine the optimal parameters
of kernel function, Obviously, O (mn), such a huge com-
putational complexity is unacceptable in practical appli-
cation. Aiming at this problem, this paper presents some
principles constantly looking for two kinds of kernel
function is the optimal convex combination kernel func-
tion instead of these two kinds of kernel function strategy,
for a variety of kernel function is the optimal kernel
function is convex combination. This strategy can be de-
creased from computational complexity O (m * (n − 1)),
under the condition of just to find the optimal convex
combination kernel function only needs 33 experiments.
As for kernel function with the worst performance for the
principle, to obtain the optimal algorithm of convex com-
bination kernel function is shown in Figure 3.
The optimal convex combinati on kernel functi on generation algor it hm
Algorithm input: N kinds of kernel function is a collection of PPI ex-
traction performance F, F = {Fi, i = 1,2….,n.}
Algorithm output: The optimal kernel function is convex combina-
tion(Optimal Convex Combination Kernel, OCCk)
Algorithm steps:
While(number(F)>1)
candidate1←mi n(F )
F=F-min (F )
candidate2←min(F)
F=F-min (F )
F=F+Optima l (cand idate1,candidat e2)
if number(F)=1
then Return Optima l (candidate1,candidate2)
△Number (F) is the numbe r o f F i in F
△min (F) is the minimum value in F
△Optimal (k1, k2) is the Optimal kernel function from k1,k2
Figure 3. The optimal convex combination kernel function
generation algorithm.
3. Experiment and Results
This Paper uses AIMed corpus as the experimental data.
AIMed is the corpus of PPI extraction used most fre-
quently. It contains 225 MEDLINE abstracts, there are
177 abstracts contain PPI instance, and 48 abstracts does
not contain the PPI instance, referring to 4084 proteins
entities. After preprocessing, this paper removes 59 au-
tocorrelation PPI instances, retains 154 nested instances.
Finally, there are 1000 positive instances and 4084 nega-
tive instances. In [11,12], they verified that the SVM
classification model have better effect in PPI extraction,
therefore, this paper also uses the SVM as a machine
learning method. In addition, in order to facilitate com-
parison with other experiments, the paper uses 10 times
the cross validation method. This paper uses precision,
recall and F1 Value to evaluate the r e sult.
3.1. Experimental Method
The paper sets up three groups of experiments:
• Experiment 1 will test different kernel functions for
the same characteristics in terms of PPI extraction
performances are different. Experiment will get all
the PPI extraction performance of the single kernel
function, and sort them, then search for the optimal
preparation convex combination kernel function. This
paper uses three kinds of single kernel function in-
cluding: Radial Basis Function, Polynomial Kernel
Function, Linear kernel. Respectively, such as type
(2), (3), (4), where x and y are two arbitrary dimen-
sion vector.
( )
( )
2
,exp ||
RBF
Kxyx y= −−
(2)
( )
( )
3
,
T
polynomial
Kxyxy= ⋅
(3)
(4)
• Experiment 2 will compare three optimal convex