Journal of Software Engineering and Applications, 2012, 5, 55-58
doi:10.4236/jsea.2012.512b012 Published Online December 2012 (http://www.scirp.org/journal/jsea)
Copyright © 2012 SciRes. JSEA

Text Classification Using Support Vector Machine with Mixture of Kernel

Liwei Wei1, Bo Wei2, Bin Wang1
1China National Institute of Standardization, Beijing 100088, China; 2Department of Ideological and Political Theory Course Teaching Research, Wuhan University of Technology, Wuhan, China.
Email: weilw@cnis.gov.cn, willweixiaobo_1001@163.com, wangbin@cnis.gov.cn

Received 2012.
ABSTRACT
Recent studies have revealed that emerging modern machine learning techniques, such as the SVM, are advantageous over statistical models for text classification. In this study, we discuss the application of the support vector machine with mixture of kernel (SVM-MK) to the design of a text classification system. Differing from the standard SVM, the SVM-MK uses a 1-norm based objective function and adopts a convex combination of single-feature basic kernels. Only a linear programming problem needs to be solved, which greatly reduces the computational costs. More importantly, it is a transparent model, and the optimal feature subset can be obtained automatically. A real Chinese corpus from Fudan University is used to demonstrate the good performance of the SVM-MK.

Keywords: Text Classification; SVM-MK; Feature Selection; Classification Model; SVM
1. Introduction
With the arrival of the "information explosion" era, the boundless growth of information resources poses a great challenge to information processing. On the one hand, digital information resources increase at high speed; on the other hand, people's demand for valuable information also grows continuously. How to search out effective information from this vast and complex mass has therefore become the goal of the information processing field.
Text classification is a research focus of the information processing area. The text file is the object of study in this method: large sets of files are mapped to predefined text attribute classes, and the task is to divide (hyper)text files into several categories according to predefined contents. Almost all areas involve this kind of problem, for example email filtering, web search, office automation, subject indexing, and the classification of news stories.
In a text classification system, the classifier is a key part; its quality directly determines the effectiveness and efficiency of the classification. At present, most classifiers draw on methods from information retrieval and machine learning, and almost all the important machine learning algorithms have been introduced into text classification, such as the support vector machine (SVM) and the least squares support vector machine (LS-SVM).
The support vector machine (SVM) was first proposed by Vapnik [1]. It has since been proved to be a powerful and promising tool for data classification and function estimation. Joachims first introduced the SVM method into text classification [2]. He compared traditional methods to the SVM method in text classification and showed that the SVM has better properties for text categorization than other algorithms. References [3] and [4] applied the SVM to text classification and obtained some valuable results. But the SVM is sensitive to outliers and noise in the training sample and has limited interpretability due to its kernel theory. Another problem is that the SVM has high computational complexity, because a large-scale quadratic program must be solved in the parameter iterative learning procedure. Moreover, as we know, feature selection is a very important method for the high-dimensional vector space of text, and the last problem is that the SVM cannot perform feature selection.
Motivated by the above questions and ideas, we propose a new method named support vector machine with mixture of kernel (SVM-MK) to classify text. In this method the kernel is a convex combination of finitely many basic kernels; each basic kernel has a kernel coefficient and is associated with a single feature. The 1-norm is utilized in SVM-MK. As a result, its objective function turns into a linear programming parameter iterative learning procedure, which greatly reduces the computational complexity. Furthermore, we can select the optimal feature subset automatically and obtain an interpretable model.
The rest of this paper is organized as follows: Section 2 gives a brief outline of SVM-MK. To evaluate the performance of SVM-MK for text classification, we use a real-life Chinese corpus from Fudan University in the test in Section 3. Finally, Section 4 draws the conclusion and gives an outlook on possible future research areas.
2. Support Vector Machine with Mixture of Kernel
Consider a training data set $G = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in R^m$ is the $i$-th input pattern and $y_i$ is its corresponding observed result, $y_i \in \{+1, -1\}$. In the text classification model, $x_i$ denotes the attribute vector of a text and $y_i$ is its class label.
The optimal separating hyper-plane is found by solving the following regularized optimization problem [1]:

$$\min J(\omega, \xi) = \frac{1}{2}\|\omega\|^2 + c\sum_{i=1}^{n}\xi_i \qquad (1)$$

$$\text{s.t.} \quad y_i\left(\omega^T \phi(x_i) + b\right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n \qquad (2)$$

where $c$ is a constant denoting the trade-off between the margin and the sum of total errors, and $\phi(x)$ is a nonlinear function that maps the input space into a higher-dimensional feature space. The margin between the two classes is $2/\|\omega\|$.
The quadratic optimization problem can be solved by transforming (1) and (2) into the saddle point of the Lagrange dual function:

$$\max \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j y_i y_j k(x_i, x_j) \qquad (3)$$

$$\text{s.t.} \quad \sum_{i=1}^{n}\alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq c, \quad i = 1, \ldots, n \qquad (4)$$

where $k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ is called the kernel function and the $\alpha_i$ are the Lagrange multipliers.
Recently, the question of how to learn the kernel from data has drawn many researchers' attention [5], and some formulations [6-7] have been proposed that carry out the optimization over convex combinations of basic kernels. In practice, a simple and efficient choice is to express the kernel function as a convex combination of basic kernels:

$$k(x_i, x_j) = \sum_{d=1}^{m}\beta_d \, k(x_{i,d}, x_{j,d}) \qquad (5)$$

where $x_{i,d}$ denotes the $d$-th component of the input vector $x_i$.
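As an illustration, the convex combination in Equation (5) can be sketched as follows, using a Gaussian basic kernel on each single feature. The function names and the Gaussian choice of basic kernel are ours for illustration (the paper's experiments also use a Gaussian kernel):

```python
import numpy as np

def basic_kernel(xi_d, xj_d, sigma2=1.0):
    # Gaussian basic kernel evaluated on a single (scalar) feature.
    return np.exp(-(xi_d - xj_d) ** 2 / (2.0 * sigma2))

def mixture_kernel(xi, xj, beta, sigma2=1.0):
    # Equation (5): k(xi, xj) = sum_d beta_d * k(xi[d], xj[d]),
    # a convex combination of single-feature basic kernels.
    return sum(b * basic_kernel(xi[d], xj[d], sigma2)
               for d, b in enumerate(beta))

# With beta non-negative and summing to 1 (a convex combination),
# k(x, x) = sum_d beta_d * 1 = 1 for any x, up to floating point.
x = np.array([0.3, 1.5, -0.7])
beta = [0.2, 0.5, 0.3]
print(mixture_kernel(x, x, beta))  # approximately 1.0
```

Note that each basic kernel sees only one attribute, which is what later lets the sparse mixture coefficients act as a feature selector.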
Substituting Equation (5) into Equation (3), multiplying Equations (3) and (4) by $\beta_d$, and setting $\gamma_{i,d} = \alpha_i \cdot \beta_d$, the Lagrange dual problem changes into:

$$\max \sum_{i,d}\gamma_{i,d} - \frac{1}{2}\sum_{i,j=1}^{n}\sum_{d=1}^{m}\gamma_{i,d}\gamma_{j,d}\, y_i y_j\, k(x_{i,d}, x_{j,d}) \qquad (6)$$

$$\text{s.t.} \quad \sum_{i,d}\gamma_{i,d}\, y_i = 0, \quad 0 \leq \sum_{d=1}^{m}\gamma_{i,d} \leq c, \; i = 1, \ldots, n, \quad \gamma_{i,d} \geq 0, \; d = 1, \ldots, m \qquad (7)$$
The new coefficient $\gamma_{i,d}$ replaces the Lagrange coefficient $\alpha_i$. The number of coefficients that need to be optimized thus increases from $n$ to $n \times m$, which raises the computational cost, especially when the number of attributes in the dataset is large. The linear programming implementation of the SVM is a promising approach to reducing this cost and has attracted some scholars' attention. Based on this idea, a 1-norm based linear program is proposed:

$$\min J(\gamma, \xi) = \sum_{i,d}\gamma_{i,d} + \lambda\sum_{i=1}^{n}\xi_i \qquad (8)$$

$$\text{s.t.} \quad y_i\left(\sum_{j,d}\gamma_{j,d}\, y_j\, k(x_{i,d}, x_{j,d}) + b\right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \; i = 1, \ldots, n, \quad \gamma_{i,d} \geq 0, \; d = 1, \ldots, m \qquad (9)$$

In Equation (8), the regularized parameter $\lambda$ controls the sparseness of the coefficients $\gamma_{i,d}$.
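A minimal sketch of solving the linear program (8)-(9) with `scipy.optimize.linprog` follows. The variable layout (flattened $\gamma$, slacks $\xi$, and the free bias split as $b = b^+ - b^-$), the toy one-feature data set, and all function names are our own illustrative choices, not the paper's experimental setup:

```python
import numpy as np
from scipy.optimize import linprog

def gauss(u, v, sigma2=1.0):
    # Gaussian basic kernel on a single feature.
    return np.exp(-(u - v) ** 2 / (2.0 * sigma2))

def train_svm_mk(X, y, lam=10.0, sigma2=1.0):
    # Solve the 1-norm LP (8)-(9): minimize sum(gamma) + lam * sum(xi)
    # s.t. y_i (sum_{j,d} gamma_{j,d} y_j k(x_{i,d}, x_{j,d}) + b) >= 1 - xi_i.
    n, m = X.shape
    # Variables: gamma (n*m, >=0), xi (n, >=0), b+ (>=0), b- (>=0).
    num_vars = n * m + n + 2
    c = np.concatenate([np.ones(n * m), lam * np.ones(n), [0.0, 0.0]])
    A = np.zeros((n, num_vars))
    for i in range(n):
        for j in range(n):
            for d in range(m):
                # Coefficient of gamma_{j,d} after rewriting (9) as A x <= b.
                A[i, j * m + d] = -y[i] * y[j] * gauss(X[i, d], X[j, d], sigma2)
        A[i, n * m + i] = -1.0       # slack xi_i
        A[i, n * m + n] = -y[i]      # bias part b+
        A[i, n * m + n + 1] = y[i]   # bias part b-
    res = linprog(c, A_ub=A, b_ub=-np.ones(n), bounds=[(0, None)] * num_vars)
    z = res.x
    gamma = z[:n * m].reshape(n, m)
    b = z[n * m + n] - z[n * m + n + 1]
    return gamma, b

def predict(X_train, y_train, gamma, b, x, sigma2=1.0):
    # Decision function of Equation (12).
    n, m = X_train.shape
    s = sum(gamma[j, d] * y_train[j] * gauss(X_train[j, d], x[d], sigma2)
            for j in range(n) for d in range(m))
    return np.sign(s + b)

# Toy separable one-feature example (illustrative only).
X = np.array([[0.0], [0.2], [1.0], [1.2]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
gamma, b = train_svm_mk(X, y)
preds = [predict(X, y, gamma, b, xi) for xi in X]
```

On this separable toy set the LP drives the slacks to zero, so the learned decision function reproduces the training labels.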
The dual of this linear program is:

$$\max \sum_{i=1}^{n}u_i \qquad (10)$$

$$\text{s.t.} \quad \sum_{i=1}^{n}u_i\, y_i\, y_j\, k(x_{i,d}, x_{j,d}) \leq 1, \; j = 1, \ldots, n, \; d = 1, \ldots, m, \quad \sum_{i=1}^{n}u_i\, y_i = 0, \quad 0 \leq u_i \leq \lambda, \; i = 1, \ldots, n \qquad (11)$$
The kernel function can be chosen as a linear kernel, a polynomial kernel, or an RBF kernel. Thus, the SVM-MK classifier can be represented as:

$$f(x) = \text{sign}\left(\sum_{j,d}\gamma_{j,d}\, y_j\, k(x_{j,d}, x_d) + b\right) \qquad (12)$$

It can be found that the above linear programming formulation and its dual are equivalent to those of the approach called "mixture of kernel" [7-8]. So the new coefficient $\gamma_{i,d}$ is called the mixture coefficient, and this approach is named "support vector machine with mixture of kernel" (SVM-MK). Using the 1-norm, we obtain a sparse solution for the coefficients $\gamma_{i,d}$, so the SVM-MK model greatly reduces the computational complexity. More importantly, the sparse coefficients $\gamma_{i,d}$ give us more freedom to extract a satisfactory feature subset from the whole space spanned by all the attributes.
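Because the 1-norm drives many mixture coefficients to zero, the selected feature subset can be read directly off the solution. A small sketch with a hypothetical coefficient matrix (the values and threshold are ours for illustration):

```python
import numpy as np

# Hypothetical sparse mixture coefficients gamma[i, d]
# (n = 3 samples, m = 3 features); only feature 1 carries weight.
gamma = np.array([[0.0, 2.1, 0.0],
                  [0.0, 0.7, 0.0],
                  [0.0, 0.0, 0.0]])

# A feature d is selected when some gamma[:, d] is numerically nonzero.
selected = np.flatnonzero(np.abs(gamma).sum(axis=0) > 1e-8)
print(selected.tolist())  # -> [1]
```

This is what makes the model transparent: the retained columns of $\gamma$ name the features the classifier actually uses.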
3. Experiment Analysis
In this section, we use a typical experimental data set: a Chinese corpus collected by Dr. Li Ronglu of Fudan University. The corpus consists of a training set and a test set, divided into 10 classes. There are 1882 documents in the training set; the test set contains 934 documents whose classification labels are not given. The ratio of training texts to test texts is about two to one.
In an automatic text classification system, the experimental data are usually divided into two parts: the training set and the test set. The so-called training set is composed of texts that have already been classified (i.e., that carry a given category label); it is used to summarize the characteristics of each category and to construct the classifier. According to the settings of the classification system, each class should contain a certain amount of training text.

The test set is the collection of documents used to test the effect of the classification. Each of these texts is passed through the classifier, and the classification results are then compared with the correct decisions, so that we can evaluate the effect of the classifier. The test set does not participate in constructing the classifier.
Table 1. Experimental data.

Text Class       Training set   Test set
Art              166            82
Computer         134            66
Economy          217            108
Education        147            73
Environment      134            67
Medicine         136            68
Policy           338            167
Sports           301            149
Transportation   143            71
Military         166            83
In addition, three evaluation criteria measure the efficiency of text classification:

$$recall_i = \frac{a}{a + c} \qquad (13)$$

$$precision_i = \frac{a}{a + b} \qquad (14)$$

$$F_1 = \frac{2 \times precision \times recall}{precision + recall} \qquad (15)$$

where $a$ is the number of positive example test documents correctly classified as belonging to this class, $b$ is the number of negative example test documents erroneously classified as belonging to this class, and $c$ is the number of positive example test documents erroneously classified as not belonging to this class.
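The three criteria can be computed per class as sketched below; the counts a, b, c follow the definitions above, and the function name and sample labels are ours for illustration:

```python
def class_metrics(true_labels, pred_labels, cls):
    # a: positives correctly assigned to cls (true positives)
    # b: negatives erroneously assigned to cls (false positives)
    # c: positives erroneously assigned elsewhere (false negatives)
    a = sum(1 for t, p in zip(true_labels, pred_labels) if t == cls and p == cls)
    b = sum(1 for t, p in zip(true_labels, pred_labels) if t != cls and p == cls)
    c = sum(1 for t, p in zip(true_labels, pred_labels) if t == cls and p != cls)
    recall = a / (a + c) if a + c else 0.0           # Equation (13)
    precision = a / (a + b) if a + b else 0.0        # Equation (14)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # Equation (15)
    return precision, recall, f1

true = ["Art", "Art", "Sports", "Sports"]
pred = ["Art", "Sports", "Sports", "Sports"]
print(class_metrics(true, pred, "Art"))  # -> (1.0, 0.5, 0.666...)
```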
Firstly, the data are pre-processed by ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) [9]. In this method the Gaussian kernel is used, so the kernel parameter needs to be chosen. Thus the method has two parameters to be preset: the kernel parameter $\sigma^2$ and the regularized parameter $\lambda$. The recall, precision, and F1 for each category of text using the SVM-MK approach are shown in Table 2.
From Table 2 and Figure 1, we can conclude that the SVM-MK model has better text classification capability in terms of recall, precision, and F1 in comparison with the traditional improved SVM models [10]. The method achieves better text classification performance on the art, computer, policy, environment, sports, and transportation classes. Compared to the traditional improved SVM, SVM-MK is not very good in
Table 2. Experimental results using SVM-MK.

Text Class       Precision   Recall   F1
Art              100.00      97.02    98.49
Computer         98.62       98.79    98.70
Economy          95.10       94.65    94.87
Education        98.64       94.67    96.61
Environment      100.00      93.17    96.46
Medicine         97.73       95.48    96.59
Policy           93.23       98.15    95.63
Sports           96.17       99.06    97.59
Transportation   98.57       95.06    96.78
Military         90.21       88.72    89.46
Figure 1. Comparison of results of traditional SVM and
MK-SVM.
classifying the economy and military classes. This may be because relevant features were removed during testing and some information was lost, so that the recall rate is affected; this still needs further improvement. Consequently, the proposed SVM-MK model can provide an efficient alternative for conducting text classification tasks.
4. Conclusions

This paper presents a novel SVM-MK text classification model. By using the 1-norm and a convex combination of basic kernels, the objective function, which is a quadratic programming problem in the standard SVM, becomes a linear programming parameter iterative learning problem, greatly reducing the computational costs. In practice, it is not difficult to adjust the kernel parameter and the regularized parameter to obtain a satisfactory classification result. Through experiments on practical data, we have obtained good classification results and demonstrated that the SVM-MK model performs well in a text classification system. The SVM-MK is thus a transparent model, and it provides an efficient alternative for conducting text classification tasks. Future studies will aim at finding the laws governing the setting of the parameters. Generalizing rules from the features that have been selected is another direction for further work.
5. Acknowledgements

This research has been supported by a public benefit special fund from the quality inspection industry of China (#201210011).
REFERENCES
[1] V. Vapnik, "The Nature of Statistical Learning Theory," Springer, New York, 1995.
[2] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," in European Conference on Machine Learning (ECML), Chemnitz, Germany, 1998, pp. 137-142.
[3] T. Gartner and P. A. Flach, "WBCSVM: Weighted Bayesian Classification Based on Support Vector Machines," in 18th Int. Conf. on Machine Learning, Williamstown, C. E. Brodley and A. P. Danyluk (eds.), 2001, pp. 207-209.
[4] ChengHua Li, JuCheng Yang and S. C. Park, "Text Categorization Algorithms Using Semantic Approaches, Corpus-Based Thesaurus and WordNet," Expert Systems with Applications, 39(1), 2012, pp. 765-772.
[5] C. A. Micchelli and M. Pontil, "Learning the Kernel Function via Regularization," Journal of Machine Learning Research, 6, 2005, pp. 1099-1125.
[6] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui and M. I. Jordan, "Learning the Kernel Matrix with Semidefinite Programming," Journal of Machine Learning Research, 5, 2004, pp. 27-72.
[7] F. R. Bach, G. R. G. Lanckriet and M. I. Jordan, "Multiple Kernel Learning, Conic Duality, and the SMO Algorithm," in Twenty-First International Conference on Machine Learning, 2004, pp. 41-48.
[8] L. W. Wei, J. P. Li and Z. Y. Chen, "Credit Risk Evaluation Using Support Vector Machine with Mixture of Kernel," in The 7th International Conference on Computational Science 2007, Lecture Notes in Computer Science 4488, 2007, pp. 431-438.
[9] Institute of Computing Technology, Chinese Lexical Analysis System: http://www.nlp.org.cn/project/project.php?proj_id=6.
[10] F. Jiang, "Research on Chinese Text Categorization Based on Support Vector Machine," Master's thesis, Chongqing University, 2009.