Journal of Software Engineering and Applications, 2012, 5, 55-58
doi:10.4236/jsea.2012.512b012 Published Online December 2012 (http://www.scirp.org/journal/jsea)
Copyright © 2012 SciRes. JSEA

Text Classification Using Support Vector Machine with Mixture of Kernel

Liwei Wei1, Bo Wei2, Bin Wang1
1China National Institute of Standardization, Beijing 100088, China; 2Department of Ideological and Political Theory Course Teaching Research, Wuhan University of Technology, Wuhan, China.
Email: weilw@cnis.gov.cn, willweixiaobo_1001@163.com, wangbin@cnis.gov.cn

Received 2012.
ABSTRACT
Recent studies have revealed that emerging modern machine learning techniques, such as the SVM, are advantageous over statistical models for text classification. In this study, we discuss the application of the support vector machine with mixture of kernel (SVM-MK) to the design of a text classification system. Differing from the standard SVM, the SVM-MK uses a 1-norm based objective function and adopts a convex combination of single-feature basic kernels. Only a linear programming problem needs to be solved, which greatly reduces the computational costs. More importantly, it is a transparent model, and the optimal feature subset can be obtained automatically. A real Chinese corpus from Fudan University is used to demonstrate the good performance of the SVM-MK.

Keywords: Text Classification; SVM-MK; Feature Selection; Classification Model; SVM
1. Introduction
With the arrival of the "information explosion" era, the boundless growth of information resources poses a great challenge to information processing. On the one hand, digital information resources increase at high speed; on the other hand, people's demand for valuable information also grows continuously. How to search out effective information from this vast and complex mass has therefore become the goal of the information processing field.
Text classification is a research focus of the information processing area. The text file is the object of study in this method: large sets of files are mapped to predefined text attribute classes, and the task is to divide (hyper)text files into several categories according to predefined contents. Almost all areas involve this kind of problem, for example email filtering, web search, office automation, subject indexing, and the classification of news stories.
In a text classification system, the classifier is a key part; its quality directly determines the effectiveness and efficiency of the classification. At present, most classifiers draw on methods from information retrieval and machine learning, and almost all the important machine learning algorithms have been introduced into text classification, such as the support vector machine (SVM) and the least squares support vector machine (LS-SVM).
The support vector machine (SVM) was first proposed by Vapnik [1]. It has since been proved to be a powerful and promising tool for data classification and function estimation. Joachims first introduced the SVM method into text classification [2]. He compared traditional methods to the SVM method in text classification and showed that the SVM has better properties for text categorization than other algorithms. References [3] and [4] applied the SVM to text classification and obtained some valuable results. But the SVM is sensitive to outliers and noise in the training sample and has limited interpretability due to its kernel theory. Another problem is that the SVM has high computational complexity, because a large-scale quadratic program must be solved in the parameter iterative learning procedure. Moreover, as we know, feature selection is a very important method for the high-dimensional vector space of text, and the last problem is that the SVM cannot perform feature selection.
Motivated by the above questions and ideas, we propose a new method named support vector machine with mixture of kernel (SVM-MK) to classify text. In this method the kernel is a convex combination of finitely many basic kernels; each basic kernel has a kernel coefficient and is associated with a single feature. The 1-norm is utilized in SVM-MK. As a result, its objective function turns into a linear programming parameter iterative learning procedure, which greatly reduces the computational complexity. Furthermore, we can select the optimal feature subset automatically and obtain an interpretable model.
The rest of this paper is organized as follows: Section 2 gives a brief outline of SVM-MK. To evaluate the performance of SVM-MK for text classification, we use a real-life Chinese corpus from Fudan University in the test in Section 3. Finally, Section 4 draws the conclusion and gives an outlook on possible future research areas.
2. Support Vector Machine with Mixture of Kernel
Consider a training data set $G = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in R^m$ is the $i$-th input pattern and $y_i$ is its corresponding observed result, $y_i \in \{+1, -1\}$. In the text classification model, $x_i$ denotes the attribute vector of a text and $y_i$ is its class label.
The optimal separating hyper-plane is found by solving the following regularized optimization problem [1]:

$$\min J(\omega, \xi) = \frac{1}{2}\|\omega\|^2 + c\sum_{i=1}^{n}\xi_i \qquad (1)$$

$$\text{s.t.} \quad y_i\left(\omega^T \phi(x_i) + b\right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n \qquad (2)$$

where $c$ is a constant denoting the trade-off between the margin and the sum of total errors, and $\phi(x)$ is a nonlinear function that maps the input space into a higher-dimensional feature space. The margin between the two classes is $2/\|\omega\|$.
The quadratic optimization problem can be solved by transforming (1) and (2) into the saddle point of the Lagrange dual function:

$$\max \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j y_i y_j k(x_i, x_j) \qquad (3)$$

$$\text{s.t.} \quad \sum_{i=1}^{n}\alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq c, \quad i = 1, \ldots, n \qquad (4)$$

where $k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ is called the kernel function and the $\alpha_i$ are the Lagrange multipliers.
Recently, the question of how to learn the kernel from data has drawn many researchers' attention [5], and some formulations [6-7] have been proposed that carry out the optimization over convex combinations of basic kernels. In practice, a simple and efficient choice is to express the kernel function as a convex combination of basic kernels:

$$k(x_i, x_j) = \sum_{d=1}^{m}\beta_d \, k(x_{i,d}, x_{j,d}) \qquad (5)$$

where $x_{i,d}$ denotes the $d$-th component of the input vector $x_i$.
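As an illustration, the convex combination in Equation (5) can be sketched as follows, using a Gaussian basic kernel on each single feature. The function names and the Gaussian choice of basic kernel are ours for illustration (the paper's experiments also use a Gaussian kernel):

```python
import numpy as np

def basic_kernel(xi_d, xj_d, sigma2=1.0):
    # Gaussian basic kernel evaluated on a single (scalar) feature.
    return np.exp(-(xi_d - xj_d) ** 2 / (2.0 * sigma2))

def mixture_kernel(xi, xj, beta, sigma2=1.0):
    # Equation (5): k(xi, xj) = sum_d beta_d * k(xi[d], xj[d]),
    # a convex combination of single-feature basic kernels.
    return sum(b * basic_kernel(xi[d], xj[d], sigma2)
               for d, b in enumerate(beta))

# With beta non-negative and summing to 1 (a convex combination),
# k(x, x) = sum_d beta_d * 1 = 1 for any x, up to floating point.
x = np.array([0.3, 1.5, -0.7])
beta = [0.2, 0.5, 0.3]
print(mixture_kernel(x, x, beta))  # approximately 1.0
```

Note that each basic kernel sees only one attribute, which is what later lets the sparse mixture coefficients act as a feature selector.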
Substituting Equation (5) into Equation (3), multiplying Equations (3) and (4) by $\beta_d$, and setting $\gamma_{i,d} = \alpha_i \cdot \beta_d$, the Lagrange dual problem changes into:

$$\max \sum_{i,d}\gamma_{i,d} - \frac{1}{2}\sum_{i,j=1}^{n}\sum_{d=1}^{m}\gamma_{i,d}\gamma_{j,d}\, y_i y_j\, k(x_{i,d}, x_{j,d}) \qquad (6)$$

$$\text{s.t.} \quad \sum_{i,d}\gamma_{i,d}\, y_i = 0, \quad 0 \leq \sum_{d=1}^{m}\gamma_{i,d} \leq c, \; i = 1, \ldots, n, \quad \gamma_{i,d} \geq 0, \; d = 1, \ldots, m \qquad (7)$$
The new coefficient $\gamma_{i,d}$ replaces the Lagrange coefficient $\alpha_i$. The number of coefficients that need to be optimized thus increases from $n$ to $n \times m$, which raises the computational cost, especially when the number of attributes in the dataset is large. The linear programming implementation of the SVM is a promising approach to reducing this cost and has attracted some scholars' attention. Based on this idea, a 1-norm based linear program is proposed:

$$\min J(\gamma, \xi) = \sum_{i,d}\gamma_{i,d} + \lambda\sum_{i=1}^{n}\xi_i \qquad (8)$$

$$\text{s.t.} \quad y_i\left(\sum_{j,d}\gamma_{j,d}\, y_j\, k(x_{i,d}, x_{j,d}) + b\right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \; i = 1, \ldots, n, \quad \gamma_{i,d} \geq 0, \; d = 1, \ldots, m \qquad (9)$$

In Equation (8), the regularized parameter $\lambda$ controls the sparseness of the coefficients $\gamma_{i,d}$.
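A minimal sketch of solving the linear program (8)-(9) with `scipy.optimize.linprog` follows. The variable layout (flattened $\gamma$, slacks $\xi$, and the free bias split as $b = b^+ - b^-$), the toy one-feature data set, and all function names are our own illustrative choices, not the paper's experimental setup:

```python
import numpy as np
from scipy.optimize import linprog

def gauss(u, v, sigma2=1.0):
    # Gaussian basic kernel on a single feature.
    return np.exp(-(u - v) ** 2 / (2.0 * sigma2))

def train_svm_mk(X, y, lam=10.0, sigma2=1.0):
    # Solve the 1-norm LP (8)-(9): minimize sum(gamma) + lam * sum(xi)
    # s.t. y_i (sum_{j,d} gamma_{j,d} y_j k(x_{i,d}, x_{j,d}) + b) >= 1 - xi_i.
    n, m = X.shape
    # Variables: gamma (n*m, >=0), xi (n, >=0), b+ (>=0), b- (>=0).
    num_vars = n * m + n + 2
    c = np.concatenate([np.ones(n * m), lam * np.ones(n), [0.0, 0.0]])
    A = np.zeros((n, num_vars))
    for i in range(n):
        for j in range(n):
            for d in range(m):
                # Coefficient of gamma_{j,d} after rewriting (9) as A x <= b.
                A[i, j * m + d] = -y[i] * y[j] * gauss(X[i, d], X[j, d], sigma2)
        A[i, n * m + i] = -1.0       # slack xi_i
        A[i, n * m + n] = -y[i]      # bias part b+
        A[i, n * m + n + 1] = y[i]   # bias part b-
    res = linprog(c, A_ub=A, b_ub=-np.ones(n), bounds=[(0, None)] * num_vars)
    z = res.x
    gamma = z[:n * m].reshape(n, m)
    b = z[n * m + n] - z[n * m + n + 1]
    return gamma, b

def predict(X_train, y_train, gamma, b, x, sigma2=1.0):
    # Decision function of Equation (12).
    n, m = X_train.shape
    s = sum(gamma[j, d] * y_train[j] * gauss(X_train[j, d], x[d], sigma2)
            for j in range(n) for d in range(m))
    return np.sign(s + b)

# Toy separable one-feature example (illustrative only).
X = np.array([[0.0], [0.2], [1.0], [1.2]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
gamma, b = train_svm_mk(X, y)
preds = [predict(X, y, gamma, b, xi) for xi in X]
```

On this separable toy set the LP drives the slacks to zero, so the learned decision function reproduces the training labels.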
The dual of this linear program is:

$$\max \sum_{i=1}^{n}u_i \qquad (10)$$

$$\text{s.t.} \quad \sum_{i=1}^{n}u_i\, y_i\, y_j\, k(x_{i,d}, x_{j,d}) \leq 1, \; j = 1, \ldots, n, \; d = 1, \ldots, m, \quad \sum_{i=1}^{n}u_i\, y_i = 0, \quad 0 \leq u_i \leq \lambda, \; i = 1, \ldots, n \qquad (11)$$
The kernel function can be chosen as a linear kernel, a polynomial kernel, or an RBF kernel. Thus, the SVM-MK classifier can be represented as:

$$f(x) = \text{sign}\left(\sum_{j,d}\gamma_{j,d}\, y_j\, k(x_{j,d}, x_d) + b\right) \qquad (12)$$

It can be found that the above linear programming formulation and its dual are equivalent to those of the approach called "mixture of kernel" [7-8]. So the new coefficient $\gamma_{i,d}$ is called the mixture coefficient, and this approach is named "support vector machine with mixture of kernel" (SVM-MK). Using the 1-norm, we obtain a sparse solution for the coefficients $\gamma_{i,d}$, so the SVM-MK model greatly reduces the computational complexity. More importantly, the sparse coefficients $\gamma_{i,d}$ give us more freedom to extract a satisfactory feature subset from the whole space spanned by all the attributes.
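Because the 1-norm drives many mixture coefficients to zero, the selected feature subset can be read directly off the solution. A small sketch with a hypothetical coefficient matrix (the values and threshold are ours for illustration):

```python
import numpy as np

# Hypothetical sparse mixture coefficients gamma[i, d]
# (n = 3 samples, m = 3 features); only feature 1 carries weight.
gamma = np.array([[0.0, 2.1, 0.0],
                  [0.0, 0.7, 0.0],
                  [0.0, 0.0, 0.0]])

# A feature d is selected when some gamma[:, d] is numerically nonzero.
selected = np.flatnonzero(np.abs(gamma).sum(axis=0) > 1e-8)
print(selected.tolist())  # -> [1]
```

This is what makes the model transparent: the retained columns of $\gamma$ name the features the classifier actually uses.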
3. Experiment Analysis
In this section, we use a typical experimental data set: a Chinese corpus collected by Dr. Li Ronglu of Fudan University. The corpus consists of a training set and a test set, divided into 10 classes. There are 1882 documents in the training set; the test set contains 934 documents whose classification labels are not given. The ratio of training texts to test texts is about two to one.
In an automatic text classification system, the experimental data are usually divided into two parts: the training set and the test set. The so-called training set is composed of texts that have already been classified (i.e., that carry a given category label); it is used to summarize the characteristics of each category and to construct the classifier. According to the settings of the classification system, each class should contain a certain amount of training text.

The test set is the collection of documents used to test the effect of the classification. Each of these texts is passed through the classifier, and the classification results are then compared with the correct decisions, so that we can evaluate the effect of the classifier. The test set does not participate in constructing the classifier.
Table 1. Experimental data.

Text Class       Training set   Test set
Art              166            82
Computer         134            66
Economy          217            108
Education        147            73
Environment      134            67
Medicine         136            68
Policy           338            167
Sports           301            149
Transportation   143            71
Military         166            83
In addition, three evaluation criteria measure the efficiency of text classification:

$$recall_i = \frac{a}{a + c} \qquad (13)$$

$$precision_i = \frac{a}{a + b} \qquad (14)$$

$$F_1 = \frac{2 \times precision \times recall}{precision + recall} \qquad (15)$$

where $a$ is the number of positive example test documents correctly classified as belonging to this class, $b$ is the number of negative example test documents erroneously classified as belonging to this class, and $c$ is the number of positive example test documents erroneously classified as not belonging to this class.
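The three criteria can be computed per class as sketched below; the counts a, b, c follow the definitions above, and the function name and sample labels are ours for illustration:

```python
def class_metrics(true_labels, pred_labels, cls):
    # a: positives correctly assigned to cls (true positives)
    # b: negatives erroneously assigned to cls (false positives)
    # c: positives erroneously assigned elsewhere (false negatives)
    a = sum(1 for t, p in zip(true_labels, pred_labels) if t == cls and p == cls)
    b = sum(1 for t, p in zip(true_labels, pred_labels) if t != cls and p == cls)
    c = sum(1 for t, p in zip(true_labels, pred_labels) if t == cls and p != cls)
    recall = a / (a + c) if a + c else 0.0           # Equation (13)
    precision = a / (a + b) if a + b else 0.0        # Equation (14)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # Equation (15)
    return precision, recall, f1

true = ["Art", "Art", "Sports", "Sports"]
pred = ["Art", "Sports", "Sports", "Sports"]
print(class_metrics(true, pred, "Art"))  # -> (1.0, 0.5, 0.666...)
```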
Firstly, the data are pre-processed by ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) [9]. In this method the Gaussian kernel is used, so the kernel parameter needs to be chosen. Thus the method has two parameters to be preset: the kernel parameter $\sigma^2$ and the regularized parameter $\lambda$. The recall, precision, and F1 for each category of text using the SVM-MK approach are shown in Table 2.
From Table 2 and Figure 1, we can conclude that the SVM-MK model has better text classification capability in terms of recall, precision, and F1 in comparison with the traditional improved SVM models [10]. The method achieves better text classification performance on the art, computer, policy, environment, sports, and transportation classes. Compared to the traditional improved SVM, SVM-MK is not very good in
Table 2. Experimental results using SVM-MK.

Text Class       Precision   Recall   F1
Art              100.00      97.02    98.49
Computer         98.62       98.79    98.70
Economy          95.10       94.65    94.87
Education        98.64       94.67    96.61
Environment      100.00      93.17    96.46
Medicine         97.73       95.48    96.59
Policy           93.23       98.15    95.63
Sports           96.17       99.06    97.59
Transportation   98.57       95.06    96.78
Military         90.21       88.72    89.46
Figure 1. Comparison of results of traditional SVM and
MK-SVM.
classifying the economy and military classes. This may be because relevant features were removed during testing and some information was lost, so that the recall rate is affected; this still needs further improvement. Consequently, the proposed SVM-MK model can provide an efficient alternative for conducting text classification tasks.
4. Conclusions

This paper presents a novel SVM-MK text classification model. By using the 1-norm and a convex combination of basic kernels, the objective function, which is a quadratic programming problem in the standard SVM, becomes a linear programming parameter iterative learning problem, greatly reducing the computational costs. In practice, it is not difficult to adjust the kernel parameter and the regularized parameter to obtain a satisfactory classification result. Through experiments on practical data, we have obtained good classification results and demonstrated that the SVM-MK model performs well in a text classification system. The SVM-MK is thus a transparent model, and it provides an efficient alternative for conducting text classification tasks. Future studies will aim at finding the laws governing the setting of the parameters. Generalizing rules from the features that have been selected is another direction for further work.
5. Acknowledgements

This research has been supported by a public benefit special fund from the quality inspection industry of China (#201210011).
REFERENCES
[1] V. Vapnik, "The Nature of Statistical Learning Theory," Springer, New York, 1995.
[2] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," in European Conference on Machine Learning (ECML), Chemnitz, Germany, 1998, pp. 137-142.
[3] T. Gartner and P. A. Flach, "WBCSVM: Weighted Bayesian Classification Based on Support Vector Machines," in 18th Int. Conf. on Machine Learning, Williamstown, C. E. Brodley and A. P. Danyluk (eds.), 2001, pp. 207-209.
[4] ChengHua Li, JuCheng Yang and S. C. Park, "Text Categorization Algorithms Using Semantic Approaches, Corpus-Based Thesaurus and WordNet," Expert Systems with Applications, 39(1), 2012, pp. 765-772.
[5] C. A. Micchelli and M. Pontil, "Learning the Kernel Function via Regularization," Journal of Machine Learning Research, 6, 2005, pp. 1099-1125.
[6] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui and M. I. Jordan, "Learning the Kernel Matrix with Semidefinite Programming," Journal of Machine Learning Research, 5, 2004, pp. 27-72.
[7] F. R. Bach, G. R. G. Lanckriet and M. I. Jordan, "Multiple Kernel Learning, Conic Duality, and the SMO Algorithm," in Twenty-First International Conference on Machine Learning, 2004, pp. 41-48.
[8] L. W. Wei, J. P. Li and Z. Y. Chen, "Credit Risk Evaluation Using Support Vector Machine with Mixture of Kernel," in The 7th International Conference on Computational Science 2007, Lecture Notes in Computer Science 4488, 2007, pp. 431-438.
[9] Institute of Computing Technology, Chinese Lexical Analysis System: http://www.nlp.org.cn/project/project.php?proj_id=6.
[10] F. Jiang, "Research on Chinese Text Categorization Based on Support Vector Machine," Master's thesis, Chongqing University, 2009.