Journal of Intelligent Learning Systems and Applications, 2011, 3, 131-138
doi:10.4236/jilsa.2011.33015 Published Online August 2011 (http://www.scirp.org/journal/jilsa)
Insertion of Ontological Knowledge to Improve
Automatic Summarization Extraction Methods
Jésus Antonio Motta, Laurence Capus, Nicole Tourigny
Département D’Informatique et de Génie Logiciel, Université Laval, Québec, Canada.
Email: jesus-a.motta.1@ulaval.ca, {laurence.capus, nicole.tourigny}@ift.ulaval.ca
Received May 30th, 2011; revised July 20th, 2011; accepted July 27th, 2011.
ABSTRACT
The vast availability of information sources has created a need for research on automatic summarization. Current methods perform either by extraction or by abstraction. Extraction methods are interesting because they are robust and independent of the language used. An extractive summary is obtained by selecting sentences from the original source based on information content. This selection can be automated using a classification function induced by a machine learning algorithm. This function classifies sentences into two groups: important or non-important. The important sentences then form the summary. However, the efficiency of this function depends directly on the training set used to induce it. This paper proposes an original way of optimizing this training set by inserting lexemes obtained from ontological knowledge bases; the optimized training set is thus reinforced by ontological knowledge. An experiment with four machine learning algorithms was conducted to validate this proposition. The improvement achieved is clearly significant for each of these algorithms.
Keywords: Automatic Summarization, Ontology, Machine Learning, Extraction Method
1. Introduction
Research on automatic summarization has greatly increased in recent years, as digital sources of information have become increasingly available. When a user runs a query on the Internet, she/he must choose among the retrieved documents those containing relevant information for her/him. The task becomes more difficult as the number of documents increases. An automated system able to ‘discover’ the essential information is one of the challenges of artificial intelligence, especially in natural language processing. In some cases, methods of machine learning based on symbols are used to tackle this problem.
Automatic summarization can be seen as the problem of transforming one or more documents into a shorter version while preserving information content [1]. The methods used are divided into two main approaches, extraction and abstraction, called respectively surface methods and deep methods from a more linguistic viewpoint. A summary obtained by extraction is composed of a set of sentences selected from the source document(s) by using statistical or heuristic methods based on the information entropy of sentences. The summarization process by extraction is a relevant alternative, robust and independent of language, compared with the summarization process by abstraction [2]. An abstractive summary is obtained by semantic analysis in order to interpret the source text and find new concepts, so as to generate a new text that will be the summary. This method requires linguistic processing at a certain level [3]. In addition, a summary can be produced in a generic way to give a general idea of the contents of the documents to be summarized. It can also be based on keywords supplied by the user; in this case, it will contain the most relevant information related to these keywords [4]. The automatic summarization process by abstraction is usually decomposed into three steps: interpretation of the source document(s) to obtain a representation, transformation of this representation, and production of a textual synthesis [5]. Both approaches have their advantages and drawbacks. For this research, we are only interested in the automatic summarization process by extraction and how to improve it.
The main problem of this kind of automatic summarization by extraction lies in identifying the most important information of the source documents [2]. Different methods have been used until now, with more or less successful results according to measurements based on recall (the number of correct sentences selected over the total number of correct sentences) and precision (the number of correct sentences over the total number of selected sentences) [6]. Some methods use an ontology or ontological knowledge to analyze terms and relations [7]. More recently, other methods have been reported that use machine learning algorithms to determine descriptions of concepts. These methods build a training set divided into two subsets: important sentences and non-important sentences [1]. This training set is then used to induce a classification function from the concept descriptions. This function will serve to classify future sentences to produce new summaries. Generally in classification problems, the set of attributes is very large and entropic, with much noise and many irrelevant attributes. The well-known underlying problem is named the curse of dimensionality [8]. Indeed, data that are too scattered neither facilitate a good estimate nor yield good classification models. This problem is currently tackled by using heuristic methods based on linear approximations, which optimize the training set by reducing it or by constructing a new, smaller set from another series of attributes [9]. The results obtained so far, even if they have progressed, could be further improved.
In this paper, we propose to optimize the training set in an original way. We insert lexemes from ontological knowledge bases into the training set to form a conceptual space, which will be used by the learning algorithm. Our hypothesis is that it is possible to obtain a reinforced set by using ontological knowledge to select or transform the characteristics of the set. We validated our hypothesis with four machine learning algorithms and compared their performance by using various evaluation indicators. The results obtained showed that our solution improves performance; it is therefore promising for the continuation of this research. In Section 2, we describe the proposed solution. In Section 3, we present the experiment conducted to validate it. In Section 4, we conclude our paper and give future work.
2. Inserting Ontological Knowledge into the Summary Extraction Process
Automatic summarization by extraction is a broad topic that uses different approaches, methods and techniques. It seems important at first to give our research framework, i.e. the process that we have considered and decided to improve. Then, we explain what we mean by the insertion of ontological knowledge and how this insertion fits into the summarization process. Finally, we give the evaluation methods that have allowed us to validate our hypothesis.
2.1. Summarization Process Considered
The different methods used for automatic summarization by extraction can be grouped into three approaches: statistical, enriched statistical and machine learning [5]. In this research work, we are especially interested in machine learning approaches because the results obtained are relevant and promising. The key item of these approaches lies in the choice of the training set and its optimization; this set will be used to induce the classification function for summarizing future documents according to their information content. More precisely, the sentences of the documents are represented by vectors, which constitute an initial matrix [10-12]. This matrix corresponds to the training set. The induced classification function classifies sentences into two classes: class 1 for important sentences and class –1 for non-important sentences. The summary will then be composed of the sentences of class 1. The crucial problem of this process is the fact that the sentences of this matrix are very entropic. It is necessary to optimize the matrix so that it becomes an efficient training set.
Although many efforts have been made to improve the quality of the summaries obtained, thus approaching those achieved by humans, there are still gaps in terms of accuracy and precision of results. Moreover, most of the summaries obtained are built from a single document. The most evident explanation is that the problems of redundancy increase along with the number of documents to be summarized.
The idea of our research work is thus to propose a solution to better optimize the training set, i.e. the set of selected sentences forming the initial matrix needed for inducing the classification function. We wanted to find a more efficient solution, which does not initially need already-written summaries to constitute the training set and can be applied to several documents to be summarized.
2.2. Insertion of Ontological Knowledge
Before inserting ontological knowledge, we identify the sentences of the document(s) to be summarized. Next, we apply a syntactic analysis and delete stop words. We then create a matrix E of words by sentences. Each item of the matrix contains the tf × idf value of word i in sentence j. This discriminatory value is based on the formula of Salton et al. [13], which evaluates the value of a term relative to a corpus of documents. We insert ontological knowledge into this matrix in order to obtain the new matrix E0. Our hypothesis lies on the fact that the training set is reinforced by new information, i.e. terms or items with more semantic content that are potentially discriminatory. Such an insertion also partially solves the problem of synonymy [1], one of the main open problems of information retrieval. Briefly, the initial matrix is improved by adding, for each word, a set of sub-trees of hypernyms and hyponyms.
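As an illustration, a minimal sketch of how such a tf × idf matrix can be built, assuming scikit-learn's TfidfVectorizer; the sentence list and vectorizer settings are illustrative, not the authors' exact implementation (note the sketch stores sentences as rows, the transpose of the word-by-sentence layout described above):

```python
# Hedged sketch: building a tf-idf matrix from the sentences to summarize.
# Assumes scikit-learn; stop-word handling and corpus are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "The economy grew faster than expected this quarter.",
    "Central banks are watching inflation indicators closely.",
    "The weather was pleasant over the weekend.",
]

vectorizer = TfidfVectorizer(stop_words="english")
E = vectorizer.fit_transform(sentences)   # rows = sentences, columns = words
print(E.shape, vectorizer.get_feature_names_out()[:5])
```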
We elaborated an algorithm to insert ontological knowledge; it finds conceptual structures in an ontology, computes their importance from an information viewpoint and introduces them for each term of the matrix. This algorithm finds lexemes according to the words of the sentences and inserts their semantic values into the matrix. The optimum lexemes carry a factor that adds a semantic value to the items of the matrix, improving its performance for classification. More precisely, the algorithm begins by searching the ontology by subject and verb. Next, it identifies the various concepts of each sentence by analyzing different sub-trees of parts of the sentence. The sub-trees are built according to the semantic relations of hypernymy and hyponymy. The algorithm evaluates the various sub-trees and chooses the best one. Finally, it inserts the selected sub-tree into the training set. When all new components of the training set have been inserted, we obtain a new, enriched conceptual space.
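As a rough illustration of the lexeme-gathering step, the following sketch collects hypernym and hyponym lemmas for one word using NLTK's WordNet interface; the function name, the one-level exploration and the absence of the sub-tree scoring step are simplifications of the algorithm described above:

```python
# Hedged sketch: gathering hypernym/hyponym lexemes for one word.
# Uses NLTK's WordNet corpus reader (nltk.download('wordnet') required).
from nltk.corpus import wordnet as wn

def ontological_lexemes(word):
    """Collect the lemmas of the hypernym and hyponym sub-trees of a word."""
    lexemes = set()
    for synset in wn.synsets(word):
        for related in synset.hypernyms() + synset.hyponyms():
            for lemma in related.lemmas():
                lexemes.add(lemma.name().replace("_", " "))
    return lexemes

print(sorted(ontological_lexemes("bank"))[:10])
```

In the full algorithm, the candidate sub-trees would then be scored and only the best one inserted into the matrix.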
After inserting ontological knowledge, we perform different steps to obtain the final training set and then induce the classification function. First, we filter entropic attributes with algebraic methods. To obtain our new set, we used a similarity transformation matrix, which makes it possible to find smaller and less redundant subsets of attributes. By applying a transformation matrix to the matrices E and E0, we identify principal components [8] and singular values [14] in order to reduce the entropy of the matrix and sort sentences according to their information content. The principal components of a matrix identify groups of variables/words (one group per principal component) that are highly connected within the group, but without correlation between groups. The determining factor of this grouping is the variability, which represents the information or importance. We can then choose the first sentences, those with the greatest variability, as important sentences and the last ones as non-important.
In detail, we represent the matrix E (for singular values, for instance) by its singular value decomposition:

E = U · diag(s1, s2, …, sn) · V^T, with U = [u1 u2 … un] and V = [v1 v2 … vn].
Each value si corresponds to the variability of each sentence among all sentences. This variability is correlated with the information content of each sentence. We associate the values si, from the greatest to the smallest, with the corresponding words of the matrix. Next, we label the sentences that contain high values si as important (class 1) and those that contain low values si as non-important (class –1). Finally, we use a machine learning algorithm, among those proposed in the literature, to induce the classification function.
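A minimal numerical sketch of this labeling step, assuming NumPy and a toy matrix; the mapping from singular values to sentence scores used here (projection magnitudes weighted by the si) is one plausible reading of the step described above, not the authors' exact procedure:

```python
# Hedged sketch: SVD-based labeling of sentences as important (+1)
# or non-important (-1). E is a toy sentences-by-terms matrix.
import numpy as np

rng = np.random.default_rng(0)
E = rng.random((10, 30))                  # 10 sentences, 30 terms

U, s, Vt = np.linalg.svd(E, full_matrices=False)
k = 3                                     # leading components retained
scores = np.abs(U[:, :k]) @ s[:k]         # variability carried by each sentence

labels = np.full(E.shape[0], -1)
top = np.argsort(scores)[::-1][:E.shape[0] // 2]
labels[top] = 1                           # highest-variability sentences -> class 1
print(labels)
```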
The induced classification function is able to differentiate sentences with much information (important) from sentences with insufficient information (non-important). So, by applying the induced function to new documents, only the sentences containing the most important information are chosen to produce summaries.
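For illustration, producing a summary from a new document then reduces to a simple filtering step; `clf` and `vectorizer` below stand for the induced classification function and the fitted matrix-building pipeline (e.g., from the sketches above), and are assumptions rather than the authors' code:

```python
# Hedged sketch: applying the induced function to a new document.
# clf and vectorizer are assumed to have been trained/fitted beforehand.
new_sentences = ["Markets rallied after the announcement.",
                 "The cafeteria menu changed on Monday."]
X_new = vectorizer.transform(new_sentences)
summary = [s for s, c in zip(new_sentences, clf.predict(X_new)) if c == 1]
```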
2.3. Evaluation Method
To evaluate our solution and verify its efficacy, we created a contingency table, called a confusion matrix. The entries of this table correspond to the considered classes, knowing that these values are obtained after applying the classification function to the training set. As the training set is binary, we obtain Table 1, elaborated according to the two classes to be determined: important and non-important.
When the classification process is finished, we identify four categories of sentences among all those analyzed:
TP: the function correctly predicts a sentence labeled as important;
TN: the function correctly predicts a sentence labeled as non-important;
FP: the function incorrectly predicts a sentence as important;
FN: the function incorrectly predicts a sentence as non-important.
We used the information given by Table 1 to obtain the values of three evaluation indicators well known in automatic summarization, namely recall, precision and F-score, as well as ROC (Receiver Operating Characteristic) curves [15].
Recall (R) is the number of TP predictions divided by the number of truly positive instances [6]. It indicates the capability of the classification function to identify a sentence as important when it really is important. The following formula computes recall:

R = TP / (TP + FN)
Table 1. Confusion matrix used to evaluate efficacy.

Actual class     Predicted important       Predicted non-important
Important        True Positive (TP)        False Negative (FN)
Non-important    False Positive (FP)       True Negative (TN)
Precision (P) corresponds to the number of TP predictions divided by the total number of instances classified as positive [6]. This indicator shows the capability of the function to correctly classify a sentence relative to all the sentences added to this category. It is computed with the following formula:

P = TP / (TP + FP)
The F-score corresponds to the harmonic mean of recall and precision [6]; it is defined by:

F_score = 2RP / (R + P)
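These three indicators can be read directly off the confusion matrix, as in this small sketch (an illustration using the Naïve Bayes counts of Table 2, not the tooling used in the paper):

```python
# Hedged sketch: recall, precision and F-score from confusion counts.
def evaluation_indicators(tp, fn, fp, tn):
    recall = tp / (tp + fn)        # R = TP / (TP + FN)
    precision = tp / (tp + fp)     # P = TP / (TP + FP)
    f_score = 2 * recall * precision / (recall + precision)
    return recall, precision, f_score

print(evaluation_indicators(tp=488, fn=0, fp=7, tn=36))
```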
We also represented our results by using ROC curves. These graphs represent all the pairs of values TPR (True Positive Rate, or sensitivity) and FPR (False Positive Rate, or 1 – specificity) resulting from a continuous variation of the observation points over the whole range of observed results [15]. By simple observation of these graphs, we obtain a qualitative comparison: when each model to be evaluated is applied to the training set, the curve placed closest to the top-left corner has the greatest accuracy. Likewise, the area under the curve indicates the probability that the model succeeds in identifying a sentence as important. The ROC curves thus give indications on the accuracy of the classification model, as well as a unified criterion for the evaluation process. The mentioned values are obtained by the following formulas:
sensitivity (TPR) = TP / (TP + FN)
specificity (TNR) = TN / (TN + FP) = 1 – FPR
FPR = FP / (FP + TN) = 1 – specificity
Recall, precision and F-score, as well as ROC curves, allowed us to evaluate the improvement rates obtained by our solution.
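For reference, a sketch of how such a curve and its AUC can be computed with scikit-learn, given true labels and classifier scores (the data here are synthetic placeholders, not the experimental results):

```python
# Hedged sketch: ROC curve and AUC for a binary (+1/-1) labeling.
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(1)
y_true = rng.choice([-1, 1], size=200)          # placeholder labels
y_score = y_true * 0.5 + rng.normal(size=200)   # placeholder decision scores

fpr, tpr, _ = roc_curve(y_true, y_score)
print("AUC:", auc(fpr, tpr))
```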
3. Experiment and Results
The experiment is conducted on a set of documents from
Reuters Corpus [16], a news database that contains ap-
proximately 11000 documents, classified into 90 currents
events subjects and grouped into two sets, respectively
named training and test. Each document contains on av-
erage 120 words and 15 sentences. We chose a total
number of 2000 documents for our exp eriment.
For the extraction of ontological knowledge, we used the WordNet database [17], developed by Princeton University. This is a semantically oriented database with a very rich structure, widely used in computational linguistics. It is composed of nouns, verbs, adjectives and adverbs. Words are organized into sets of synonyms named synsets, related by the semantic relations of hypernymy, hyponymy, meronymy and holonymy. The WordNet database contains 155,287 words and 117,659 synsets.
We also chose four widely used machine learning algorithms. The first algorithm is the Support Vector Machine [18]. It builds a hyperplane in an n-dimensional space for classification, regression or other tasks. Intuitively, a good separation between classes is obtained when the hyperplane has the greatest distance to the nearest points of the training set. The second algorithm is a probabilistic classifier based on Bayes' theorem, the Bayesian classifier or Naïve Bayes [9], which makes a strong independence hypothesis: it assumes that the presence or absence of a characteristic is not related to the presence or absence of another characteristic. The third algorithm, Random Tree [19], performs its classification by building a tree in which the number of randomly selected attributes at each node is equal to:

log2(number of attributes) + 1

Finally, the fourth algorithm is the Multilayer Perceptron [20]. This is an artificial neural network with several layers, in which the activation function of each neuron is non-linear. This neural network can be used to identify linearly inseparable classes. The function is learned through multiple layers that are fully connected to each other.
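The experiments themselves were run with Tanagra, Weka and Orange (see Section 3.1); purely as an illustration, the same four algorithm families can be instantiated in scikit-learn as follows (the hyperparameters are illustrative defaults, not those of the paper):

```python
# Hedged sketch: scikit-learn counterparts of the four algorithms.
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

classifiers = {
    "SVM": SVC(kernel="linear"),
    "Naive Bayes": GaussianNB(),
    # splitter="random" with max_features="log2" approximates the
    # random node selection described above.
    "Random Tree": DecisionTreeClassifier(splitter="random",
                                          max_features="log2"),
    "ML Perceptron": MLPClassifier(hidden_layer_sizes=(50,),
                                   max_iter=1000),
}
```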
The experiment conducted is composed of two parts. In the first part, we produced summaries by extraction from the chosen corpus without inserting ontological knowledge into the machine learning algorithm, and we evaluated the results obtained for each algorithm according to the evaluation methods given above. In the second part, we produced summaries from the same corpus, this time inserting ontological knowledge, and we again evaluated the results obtained. In the following, we present these results and discuss them.
3.1. Results for Recall, Precision and F-score
We used the methods of random subsampling (1/3 for the test set and 2/3 for the training set) and cross-validation (10 folds). We relied on the Tanagra software of Lyon University [21], the Weka software of Waikato University [22] and the Orange software of Ljubljana University [23]. The two validation methods gave similar results.
Tables 2 and 3 present the values of recall, precision and F-score, as well as the confusion matrix, for each of the four algorithms evaluated. In Table 2, the algorithms were applied to the training set obtained from the principal components of the word matrix, while in Table 3 the
Table 2. Predictions and confusion matrix with principal components.

Without ontological knowledge:

Algorithm      F-score  Class          Recall  Precision  Predicted important  Predicted non-important
Naïve Bayes    0.993    Important      1.00    0.986      488                  0
                        Non-important  0.837   1.00       7                    36
SVM            1.00     Important      1.00    1.00       488                  0
                        Non-important  1.00    1.00       0                    43
Random Tree    0.982    Important      1.00    0.972      488                  0
                        Non-important  0.674   1.00       14                   29
ML Perceptron  0.993    Important      0.998   0.988      487                  1
                        Non-important  0.861   0.974      6                    37

With ontological knowledge:

Algorithm      F-score  Class          Recall  Precision  Predicted important  Predicted non-important
Naïve Bayes    0.978    Important      0.994   0.964      480                  18
                        Non-important  0.818   0.964      18                   81
SVM            0.999    Important      1.00    0.998      483                  0
                        Non-important  0.999   1.00       1                    98
Random Tree    0.9521   Important      0.988   0.919      472                  6
                        Non-important  0.579   0.905      42                   57
ML Perceptron  0.983    Important      0.992   0.974      479                  4
                        Non-important  0.869   0.956      13                   86
Table 3. Predictions and confusion matrix with singular values.

Without ontological knowledge:

Algorithm      F-score  Class          Recall  Precision  Predicted important  Predicted non-important
Naïve Bayes    0.997    Important      1.00    0.994      486                  0
                        Non-important  0.858   1.00       3                    18
SVM            1.00     Important      1.00    1.00       486                  0
                        Non-important  1.00    1.00       0                    21
Random Tree    0.992    Important      1.00    0.984      486                  0
                        Non-important  0.61    1.00       8                    13
ML Perceptron  0.992    Important      1.00    0.984      486                  0
                        Non-important  0.619   1.00       8                    13

With ontological knowledge:

Algorithm      F-score  Class          Recall  Precision  Predicted important  Predicted non-important
Naïve Bayes    0.958    Important      0.998   0.956      483                  1
                        Non-important  0.776   0.987      22                   76
SVM            0.999    Important      1.00    0.998      484                  0
                        Non-important  1.00    1.00       1                    97
Random Tree    0.957    Important      1.00    0.917      484                  0
                        Non-important  0.551   1.00       44                   54
ML Perceptron  0.980    Important      0.971   0.999      470                  14
                        Non-important  0.949   0.869      5                    93
training set was obtained from singular values. The values are given for each part of the experiment, i.e. before inserting ontological knowledge and after inserting it.
By observing Tables 2 and 3, we note that the performances in terms of recall and precision are high. The Naïve Bayes algorithm presents the greatest performance in the case of principal components, followed by the Support Vector Machine and MultiLayer Perceptron algorithms. When ontological knowledge is inserted, the greatest performance is that of the Support Vector Machine algorithm, followed by that of the MultiLayer Perceptron algorithm. In the case where the space was based on singular values, it is the Support Vector Machine algorithm that obtains the greatest performance, followed by the Naïve Bayes algorithm. When the ontological knowledge is inserted,
the Support Vector Machine algorithm continues to occupy the first place, and the performance of the MultiLayer Perceptron algorithm improves at the expense of that of the Naïve Bayes algorithm.
3.2. Results for ROC Curves
Table 4 shows the AUC (Area Under the Curve) values of the ROC curve for each algorithm applied to the training set obtained from principal components. The results are given before and after inserting ontological knowledge. The last column indicates the relative improvement obtained.
Table 5 gives the same values, but this time considering that the training set is obtained from singular values.
From the joint observation of Tables 4 and 5, we can say that the introduction of ontological knowledge into the training set, whether obtained using principal components or singular values, increases the quality of all algorithms. For instance, when using principal components, we note an improvement of 58.8% and 19.1% for the MultiLayer Perceptron and Random Tree algorithms respectively, while the Support Vector Machine and Naïve Bayes algorithms are improved by 14.8% and 10.33% respectively. In that case, the algorithm with the best performance is Support Vector Machine, followed by Naïve Bayes. In the case where singular values are used, we also observe a very great improvement in algorithm quality when the ontological knowledge is inserted. The MultiLayer Perceptron algorithm again obtains the highest value, with 27%.
Table 4. Improvement when using principal components.

Algorithm      AUC without ontological knowledge  AUC with ontological knowledge  Improvement (%)
Naïve Bayes    0.668                              0.737                           10.33
SVM            0.682                              0.783                           14.8
Random Tree    0.555                              0.661                           19.1
ML Perceptron  0.449                              0.713                           58.8
Table 5. Improvement when using singular values.

Algorithm      AUC without ontological knowledge  AUC with ontological knowledge  Improvement (%)
Naïve Bayes    0.683                              0.820                           20.06
SVM            0.673                              0.811                           20.51
Random Tree    0.697                              0.740                           6.17
ML Perceptron  0.589                              0.748                           26.99
The Support Vector Machine and Naïve Bayes algorithms occupy respectively the second and third places, with 20.5% and 20.1%. The best-quality algorithm is Naïve Bayes, followed by Support Vector Machine.
Table 6 shows a comparison of the AUC values of each algorithm after inserting ontological knowledge, depending on whether the training sets are obtained from principal components or from singular values.
Table 6 makes it possible to estimate the difference of improvement between a space based on principal components and another based on singular values. First, we observe that all the algorithms studied improve their qualitative performance when moving from a space of principal components to one of singular values. The greatest improvement is obtained by the Random Tree algorithm with 12%, followed by Naïve Bayes with 11.3%. From this table, we also conclude that, among the four algorithms studied, the two best and most promising algorithms for extracting summaries with ontologically reinforced spaces are Naïve Bayes and Support Vector Machine.
Figures 1 and 2 correspond to the ROC curves obtained for each algorithm with training sets produced from principal components and singular values respectively. In both cases, the results are given without inserting ontological knowledge (a) and with inserting ontological knowledge (b).
As we already mentioned, ROC curves offer a way of evaluating the quality of a classification algorithm according to its capability to give good predictions. In our case, a good prediction corresponds to sentences classified as important, which should take part in the summary, and discriminated from non-important sentences.
In addition to the information given by Tables 4-6, careful observation of the figures makes it possible to identify the optimum points of each algorithm, simply by locating the point at the highest position to the left. We can also compare the accuracy between algorithms. For instance, if we compare the Support Vector Machine algorithm, when it reaches a little more than 90% of TP cases, against the Random Tree algorithm, we see in Figure 2(b) that the Support Vector Machine algorithm has an approximate FP rate of 37% whereas the Random Tree algorithm has a rate of 70%.
Table 6. Difference of improvement between using principal components and using singular values.

Algorithm      AUC with principal components  AUC with singular values  Improvement (%)
Naïve Bayes    0.737                          0.820                     11.30
SVM            0.783                          0.811                     3.60
Random Tree    0.661                          0.740                     12.00
ML Perceptron  0.713                          0.748                     4.90
Figure 1. ROC curves for training sets based on principal components: (a) before inserting ontological knowledge; (b) after inserting ontological knowledge.
The results obtained show a significant improvement of the classification function after inserting ontological knowledge, whatever the machine learning algorithm used. Moreover, these results identify the two best algorithms: the Naïve Bayes and Support Vector Machine algorithms should therefore be applied to automatic summarization with a training set produced from singular values and reinforced by ontological knowledge.
4. Conclusions
There are still great opportunities for deepening and developing research to find suitable methods for summarizing. In this paper, we studied the behaviour of four machine learning algorithms that induce a classification function from training sets.
Figure 2. ROC curves for training sets based on singular values: (a) before inserting ontological knowledge; (b) after inserting ontological knowledge.
These training sets were reinforced by inserting ontological knowledge and used to discriminate the important sentences, from one or several documents, from those which are not. The algorithms used are Naïve Bayes, Support Vector Machine, Random Tree and Multilayer Perceptron. By analyzing the results of the experimentation, we concluded that all the considered algorithms may be used to produce summaries. We also note that using principal components or singular values to select the training set may be successfully retained to induce the learning functions of the four studied algorithms. The insertion of ontological knowledge gives remarkable qualitative improvements in performance. This insertion makes it possible to propose good classification functions, which are able to discriminate between important and non-important sentences.
The sentences discriminated as important constitute the future summary.
Likewise, we observe that ontological knowledge produces greater effects on classifier quality if the training set is obtained from singular values rather than from principal components. From this final analysis, we conclude that the two best algorithms to apply to automatic summarization by extraction are Naïve Bayes and Support Vector Machine, on a set of singular values reinforced by ontological knowledge.
As future work, we think it would be interesting to evaluate the performance of the classification algorithms on more reduced spaces, i.e. spaces optimized by means of techniques different from those used in this experimentation. It would also be interesting to explore their behaviour on sets reinforced by ontological knowledge.
5. Acknowledgements
The authors would like to thank the Natural Sciences and Engineering Research Council of Canada (NSERC) for its financial support.
6. References
[1] A. Sharan and H. Imran, “Machine Learning Approach
for Automatic Document Summarization,” Proceedings
of World Academy of Science, Engineering and Techno-
logy, 2009, pp. 103-109.
[2] R. A. García-Hernandez, R. Montiel, Y. Ledeneva, E.
Rendón, A. Gelbukh and R. Cruz, “Text Summarization
by Sentence Extraction Using Unsupervised Learning,”
Proceedings of the 7th Mexican International Conference
on Artificial Intelligence: Advances in Artificial Intelli-
gence, 2008, pp. 133-143.
[3] I. Mani and E. Bloedorn, “Machine Learning of Generic
and User-Focused-Summarization,” Proceedings of the
Tenth Conference on Innovative Applications of Artificial
Intelligence, Menlo Park, 1998, pp. 821-826.
[4] J. Goldstein, “Evaluating and Generating Summaries Using Normalized Probabilities,” Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, 1999, pp. 121-128. doi:10.1145/312624.312665
[5] K. S. Jones, “Automatic Summarising: The State of the Art,” Information Processing and Management, Vol. 43, No. 6, 2007, pp. 1449-1481. doi:10.1016/j.ipm.2007.03.009
[6] R. R. Korfhage, “Information Storage and Retrieval,”
Wiley, New York, 1997.
[7] L. Hennig, W. Umbrath and R. Wetzker, “An Ontology-
Based Approach to Text Summarization,” IEEE/WIC
/ACM Proceedings of the International Conference on
Web Intelligence and Intelligent Agent Technology Work-
shops, 2008, pp. 291-294.
[8] R. Bellman, “Introduction to Matrix Analysis,” McGraw-
Hill, New York, 1997.
[9] M. Steinbach, “Introduction to Data Mining”, Pearson
Education, Boston, 2006.
[10] M. Ikonomakis, S. Kotsiantis and V. Tampakas, “Text
Classification Using Machine Learning Techniques,”
Proceedings of the 9th WSEAS International Conference
on Computers, Stevens Point, 2005, pp. 966-974.
[11] I. Mani, “Recent Development in Text Summarization,”
Proceedings of the Tenth International Conference on
Information and Knowledge Management, McLean, 2001,
pp. 529-531.
[12] H. Xuexian, “Accuracy Improvement of Automatic Text
Classification Based in Feature Transformation and Mul-
ti-classifier Combination,” Proceedings of AWCC’2004,
Zhenjiang, 2004, pp. 463-464.
[13] G. Salton and C. Buckley, “Term-Weighting Approaches
in Automatic Text Retrieval,” Information Processing
and Management, Vol. 24, No. 5, 1988, pp. 513-523.
doi:10.1016/0306-4573(88)90021-0
[14] G. H. Golub, “Calculating the Singular Values and Pseudo-Inverse of a Matrix,” Journal of the Society for Industrial and Applied Mathematics, Vol. 2, No. 2, 1965, pp. 205-224. doi:10.1137/0702016
[15] T. A. Lasko, J. G. Bhagwat, K. H. Zou and L. Ohno-
Machado, “The Use of Receiver Operating Characteristic
Curves in Biomedical Informatics,” Journal of Biomedi-
cal Informatics, Vol. 38, No. 5, 2005, pp. 404-415.
doi:10.1016/j.jbi.2005.02.008
[16] A. Saleh, “Reuters Corpus (Offered by Reuters News
Agency),” 2004.
http://about.reuters.com/researchandstandards/corpus/
[17] C. D. Fellbaum, “WordNet (A Lexical Database for Eng-
lish),” Princeton University, 1985.
http://wordnet.princeton.edu//
[18] V. Vapnik, “The Nature of Statistical Learning Theory,”
Springer-Verlag, New York, 1995.
[19] R. Bellman, “Algorithms, Graphs and Computers”, Aca-
demic Press, New York, 1970.
[20] F. Rosenblatt, “Principles of Neurodynamics: Perceptrons
and the Theory of Brain Mechanisms,” Spartan Books,
Washington DC, 1961.
[21] R. Rakotomalala, “Tanagra: Un Logiciel Gratuit Pour
L’enseignement et la Recherche,” Proceedings of the
EGC’2005 Conference, Amsterdam, 2005, pp. 697-702.
[22] G. Holmes, A. Donkin and I. H. Witten, “Weka (A Software Developed by Machine Learning Group),” University of Waikato, 1994. http://www.cs.waikato.ac.nz/ml/weka/
[23] J. Demšar and B. Zupan, “Orange (A Software Developed at Laboratory of Artificial Intelligence),” Faculty of Computer and Information Science, University of Ljubljana, 2010. http://orange.biolab.si/