Chunk Parsing and Entity Relation Extracting to Chinese Text by Using Conditional Random Fields Model

J. Intelligent Learning Systems & Applications, 2010, 2, 139-146

doi:10.4236/jilsa.2010.23017 Published Online August 2010 (http://www.SciRP.org/journal/jilsa)

Junhua Wu, Longxia Liu

College of Electronics and Information Engineering, Nanjing University of Technology, Nanjing, China.

Email: wujh@njut.edu.cn

Received March 19th, 2010; revised July 20th, 2010; accepted July 30th, 2010.

ABSTRACT

Large amounts of information now exist on Web sites and in various digital media, most of it in natural language. Such text is easy for people to browse but difficult for computers to understand. Chunk parsing and entity relation extraction are important steps toward understanding the semantics of natural-language text. Chunk analysis is a shallow parsing method, and entity relation extraction establishes the relationships between entities. Because full syntactic parsing of Chinese text is complex, many researchers are more interested in chunk analysis and relation extraction. The conditional random fields (CRFs) model is an effective probabilistic model for segmenting and labeling sequence data. This paper models the chunk and entity relation problems in Chinese text as labeling problems, so that CRFs can be used to perform chunk analysis and entity relation extraction.

Keywords: Information Extraction, Chunk Parsing, Entity Relation Extraction

1. Introduction

At present, information is presented in various digital media. Much of it is organized in natural language, such as the information in Web pages and the text documents in digital libraries. Such text is unstructured or semi-structured and difficult for computers to understand, which blocks further processing and leaves large amounts of information unused. Research on the semantic Web and natural language understanding has therefore developed in order to structure and retrieve information from Web pages and other natural-language documents, and information extraction is an important task in this work.

Information extraction is the process of retrieving information from a large text collection. It may involve identifying named entities, extracting relationships, labeling the properties of sentences and so on. It is a subfield of natural language understanding. Methods for information extraction include rule-based methods [1,2] and methods based on statistical models [3-6].

Chunk analysis and relation extraction play important roles in information extraction. Chunk analysis is a simplified syntactic parsing technique that defines and labels chunks on the basis of syntax and semantics [7]. Compared with full parsing, it identifies only partial structures in a sentence, such as noun phrases or verb phrases. In this way a simple form of syntactic parsing can be implemented, and information extraction becomes simpler and more effective.

The objective of entity relation extraction is to identify the relationships between entities in text. Miller et al. considered the problem of relation extraction in the context of natural language parsing and augmented syntactic parses with semantic relation-specific attributes [8]. Relation extraction is critical for detecting and describing events in information extraction research. Entity relations may be explicit or implicit. Several problems make entity relation research hard, such as the scarcity of datasets, the difficulty of extracting implicit relations, and the immaturity of Chinese parsing.

The conditional random fields model is an effective probabilistic model for segmenting and labeling sequence data [9]. In Chinese language understanding, some research has applied CRFs to part-of-speech tagging and word segmentation [10,11], but seldom to chunk parsing and entity relation extraction.

Compared with other statistical models, CRFs can represent long-range dependencies and multiple interacting features. Our contribution is to analyze the characteristics of Chinese and then model the chunk and entity relation problems as labeling problems, so that CRFs can be used to perform chunk analysis and relation extraction.

2. Related Work

A number of approaches have been used for natural language tasks such as part-of-speech tagging and entity extraction. They are usually based on rules or statistical models.

Text chunking divides a text into syntactically correlated groups of words. Abney first introduced chunks [12]. Many machine learning approaches, such as Memory-based Learning (MBL) [13], Transformation-based Learning (TBL) [14], and Hidden Markov Models (HMMs), have been applied to text chunking [15].

The named entity is an important linguistic unit, so there is much work on named entity recognition, disambiguation, and relationship extraction [16-20]. The problem of relation extraction is starting to be addressed within the natural language processing and machine learning communities, and many methods have been suggested since it was proposed. Methods based on knowledge bases were used first, but a knowledge base is difficult to construct. Machine learning methods therefore emerged, such as feature-based [16] and kernel-based [17] methods. The kernel-based approach is effective for relation extraction, but its training and testing times are long for large amounts of data.

2.1 Model for Information Extraction

Much research on information extraction is based on machine learning methods that use statistical models, because such models can segment and label sentences. A statistical language model is a probabilistic model that estimates the probability of a text sequence. These models include the Hidden Markov Model (HMM), the Maximum Entropy Model (ME), the Maximum Entropy Markov Model (MEMM) and the conditional random fields model (CRFs). Our method is also based on a statistical model.

Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many text-related tasks, such as part-of-speech tagging, text segmentation and information extraction [6]. An HMM can be considered a finite state machine that represents the states and transition chains of an application. The model is built either manually or by training. Extracting text information usually involves training and labeling. Maximum likelihood estimation is used to learn from labeled sample data, and the Baum-Welch algorithm from unlabeled data. The Viterbi algorithm is then used to find the state sequence with maximum probability for the text to be processed.
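The decoding step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the cited work: the two states `N` and `V`, the three-word vocabulary and every probability below are made-up values, chosen to show how the Viterbi recursion selects the most probable state sequence.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    def log(p):
        # Log space avoids underflow; log(0) is treated as -infinity.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + log(trans_p[r][s]))
            best[t][s] = (best[t - 1][prev] + log(trans_p[prev][s])
                          + log(emit_p[s].get(observations[t], 0.0)))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model (all numbers are illustrative assumptions).
states = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {
    "N": {"robot": 0.6, "wheels": 0.3, "round": 0.1},
    "V": {"robot": 0.1, "wheels": 0.5, "round": 0.4},
}
decoded = viterbi(["robot", "wheels", "round"], states, start_p, trans_p, emit_p)
print(decoded)
```

The same dynamic program applies to any model whose score decomposes over adjacent label pairs, which is why it reappears later for CRFs.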

An HMM is easy to build. It is flexible and does not need a large dictionary or rule set. There are many improved HMMs and applications of them to information extraction. Freitag and McCallum [3] use stochastic optimization to search for the fittest HMM structure. Soumya Ray and Mark Craven [4] use HMMs to represent sentence structure. Scheffer, Decomain and Wrobel [5] propose a method that uses active learning to minimize the labeled data needed for HMM training. However, an HMM is a generative model that requires independence assumptions, so it ignores contextual information and may produce unexpected results.

The Maximum Entropy (ME) method [21] converts sequence labeling into data classification. Its principle can be stated as follows [22]:

1) Reformulate the different information sources as constraints to be satisfied by the target (combined) estimate.

2) Among all probability distributions that satisfy these constraints, choose the one that has the highest entropy.

The advantages of ME are [21]: it makes the fewest assumptions about the distribution being modeled other than those imposed by the constraints and given by the prior information. The framework is completely general in that almost any consistent piece of probabilistic information can be formulated as a constraint. Moreover, if the constraints are consistent, that is, there exists a probability function that satisfies them, then among all probability functions that satisfy the constraints there is a unique maximum entropy one.

The ME method loses the sequence properties of the data. A model combining ME with a Markov model (MM) therefore emerged, namely the MEMM [6].

In MEMM, the HMM transition and observation func-

tions are replaced by a single function P(s|s’,o) that pro-

vides the probability of the current state s given the pre-

vious state s’ and the current observation o. In this model,

as in most applications of HMMs, the observations are

given—reflecting the fact that we don’t actually care

about their probability, only the probability of the state

sequence (and hence label sequence) they induce.

The MEMM introduces conditional probabilities for the transitions between states, which makes an arbitrary choice of features possible. However, the MEMM is a locally normalized model: it normalizes at each node, so only a locally optimal value is obtained. It may also suffer from the length bias and label bias problems [9], in which the model effectively ignores cases that do not appear in the training dataset.

2.2 Label Bias

Classical discriminative Markov models, maximum en-

tropy taggers (Ratnaparkhi, 1996), and MEMMs, as well

as non-probabilistic sequence tagging and segmentation

models with independently trained next-state classifiers

are all potential victims of the label bias problem [9].

Consider the MEMM shown in Figure 1, which is a finite-state acceptor for shallow parsing of two sentences:

The robot wheels Fred round.

The robot wheels are round.

Here [B-NP] etc. are the labels for the sentences. NP, VP, ADJP and PP stand for noun phrase, verb phrase, adjective phrase and prepositional phrase. B and I indicate the position of a word: at the beginning of a phrase or inside it.

The transition probabilities from a state i to its adjacent states obviously sum to 1. Because states 3 and 7 each have only one outgoing transition, when the current state and the observed value Fred are specified, the conditional probability of the next state is:

$$p(4 \mid 3, \text{Fred}) = p(8 \mid 7, \text{Fred}) = 1$$

This equation runs into a problem if the training dataset contains no transition from state 7 to state 8 with the observed value Fred. Generally, an event that is unseen in the training dataset is assigned a low probability. But for a state with a single outgoing transition, the following equation must still hold:

$$\sum_{s \,\in\, \text{states reachable from state 7}} p(s \mid 7, \text{Fred}) = 1$$

This means that the observed value Fred is ignored, so the label sequence becomes unrelated to the observed sequence. This is the label bias problem.
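The effect can be reproduced with a small sketch of per-state normalization. Only the single transition from state 7 to state 8 is taken from the example above; the scoring table, the observation `are` at state 7 and the state 0 with two successors are hypothetical values added for contrast. Whatever score the model assigns to Fred, the locally normalized probability of the only successor is 1.

```python
import math


def next_state_probs(successors, scores, state, observation):
    """Locally normalized distribution over the successors of `state`."""
    raw = {s: math.exp(scores.get((state, observation, s), 0.0))
           for s in successors[state]}
    total = sum(raw.values())
    return {s: value / total for s, value in raw.items()}


# State 7 has one successor (from the text); state 0 is a hypothetical branching state.
successors = {7: [8], 0: [1, 5]}
# Hypothetical scores: the model strongly "dislikes" Fred after state 7.
scores = {
    (7, "Fred", 8): -10.0,
    (7, "are", 8): 5.0,
    (0, "The", 1): 1.0,
    (0, "The", 5): 0.0,
}

p_fred = next_state_probs(successors, scores, 7, "Fred")
p_are = next_state_probs(successors, scores, 7, "are")
p_branch = next_state_probs(successors, scores, 0, "The")
print(p_fred, p_are, p_branch)
```

At the branching state the observation does change the distribution, whereas at state 7 both observations give the same result; a globally normalized model avoids this by scoring whole label sequences.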

Proper solutions require models that account for whole

state sequences at once by letting some transitions “vote”

more strongly than others depending on the correspond-

ing observations [9].

Lafferty et al. propose CRFs, a global model that can solve the problems discussed above. Instead of normalizing locally, CRFs normalize over the whole sequence, so a globally optimal value is produced. The CRF is a probabilistic graphical model that can represent long-range dependencies and multiple interacting features, and it represents domain knowledge conveniently. McCallum and Li use this model for named entity recognition [23]; their experiments show an F value of 84.04% for English and 68.11% for German. Hong et al. use CRFs for Chinese part-of-speech tagging [11]. But information extraction from Chinese is still a difficult task in many subfields, such as chunk analysis and entity relation extraction. This paper therefore explores CRF-based methods for chunk analysis and entity relation extraction on Chinese text.

Figure 1. Finite-state acceptor for shallow parsing of two sentences

3. Conditional Random Fields (CRFs) Model

The conditional random fields model is a statistical probabilistic model for segmenting and labeling sequence data. It is an undirected graphical model that computes the conditional probability of an output sequence given the input sequence.

Definition 1 [9]. Let $G = (V, E)$ be a graph such that $Y = (Y_v)_{v \in V}$, so that $Y$ is indexed by the vertices of $G$. Then $(X, Y)$ is a conditional random field in case, when conditioned on $X$, the random variables $Y_v$ obey the Markov property with respect to the graph: $P(Y_v \mid X, Y_w, w \neq v) = P(Y_v \mid X, Y_w, w \sim v)$, where $w \sim v$ means that $w$ and $v$ are neighbors in $G$.

A CRF is a random field globally conditioned on the observation X. If X = {x1, x2, …, xn} is the data sequence to be labeled, then Y = {y1, y2, …, yn} is the resulting sequence of segments or labels produced by the model. The model computes the joint distribution over the whole label sequence Y given X, instead of only defining the next state in terms of the current state.

The conditional probability of the label sequence Y depends on global, interacting features with different weights.

Assume $\Lambda = \{\lambda_1, \ldots, \lambda_k\}$ is a vector of feature weights. For a given X, the conditional probability $P_\Lambda(Y \mid X)$ is defined as follows:

$$P_\Lambda(Y \mid X) = \frac{1}{Z_X} \exp\left( \sum_{t} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, X, t) \right) \quad (1)$$

$$Z_X = \sum_{Y} \exp\left( \sum_{t} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, X, t) \right) \quad (2)$$

$Z_X$ is a normalization factor that makes the probabilities of all state sequences sum to 1 for a given X. $f_k(y_{t-1}, y_t, X, t)$ is a feature function that marks a feature at positions t and t-1 for the observed X; its value is between 0 and 1. $\Lambda = \{\lambda_1, \ldots, \lambda_k\}$ corresponds to the context of the data sequence and is the set of weights of the feature functions $f_k(y_{t-1}, y_t, X, t)$.

To obtain the expected result from the CRFs model, the critical task is training the model. A trained model yields the label sequence that maximizes P(Y|X), that is, $Y^* = \arg\max_Y P_\Lambda(Y \mid X)$. Training also means that $\Lambda = \{\lambda_1, \ldots, \lambda_k\}$ is determined. Training may use a log-likelihood algorithm that is independent of the application.

In this paper, chunk analysis and entity relation extraction are converted into labeling problems. The data sequence X is made up of words. Each word W0 has some words before and after it, represented as W = {W-n, …, W-1, W0, W+1, …, W+n}, where W-n stands for the nth word before W0 and W+n for the nth word after W0. In the model, $\Lambda = \{\lambda_1, \ldots, \lambda_k\}$ can be thought of as the feature weights related to this context window W. Each $\lambda$ is fixed in the model after training, and the label sequence can then be produced by running the Viterbi algorithm on the model.

4. Chunk Analysis Based on CRFs

The chunk was first proposed by Abney [12], who regards a chunk as a non-recursive syntactic element between the word and the sentence.

Chunk analysis is partial parsing, also called shallow parsing: a simplified strategy relative to complete parsing [15]. It is a relatively new technique in natural language processing. Full parsing produces a complete parse tree through a series of analysis steps on the sentence, which is costly. Chunk analysis only needs to identify some structures of the sentence, such as non-recursive noun phrases and verb phrases, called chunks. By dividing a sentence into syntactic or semantic chunks and labeling them, we can improve the efficiency of information extraction. It is a strategy between lexical analysis and syntactic analysis. In natural language processing, chunk parsing performs both the partitioning and the identification of chunks.

4.1 The Definition and Label of Chunks

Definition 2. A chunk is a non-recursive phrase that conforms to the syntax. Each chunk has a head word and begins or ends at this word.

Non-recursive means that no nested structure exists; that is, all chunks are at the same level.

The Conference on Computational Natural Language Learning (CoNLL-2000) developed an English chunking dataset, which provides a platform for evaluating and testing chunk analysis algorithms. It defines 11 chunk types: NP, VP, ADVP, ADJP, PP, SBAR, CONJP, PRT, INTJ, LST and UCP [24].

Most Chinese chunks have the same properties as English ones, but there are some differences. By analyzing the properties of Chinese we defined the following chunk types: noun chunk (NP), verb chunk (VP), adjective chunk (AP), adverb chunk (DP), preposition chunk (SP), time chunk (TP), quantifier chunk (MP), conjunction chunk (CONJP) and other chunk (UCP).

Chunk analysis based on CRFs thus becomes a process of labeling chunks, much like part-of-speech tagging. There are two standard labeling methods: Inside/Outside and Start/End. The Inside/Outside method, named IOB1, uses the tag set {I, O, B} [25] to label the words inside a chunk, the words outside any chunk, and the first word of a chunk. Combining a tag with a chunk type gives the chunk label. For example, B-VP marks the first word of a verb chunk, and O means that the word does not belong to any chunk. The Start/End method, named IOBES, uses the tag set {I, O, B, E, S}. The S tag is used when a chunk consists of only one word, and E labels the last word of a chunk; the other tags are the same as in Inside/Outside. For example, S-NP marks a noun chunk made up of a single word. Table 1 presents the chunk labels of a sentence. The first column of the table contains the Chinese words and the second column the corresponding English words for the reader's convenience. The next two columns give the IOB1 and IOBES notations.

In the table, the rows for 可能, 会 and 面临 represent a verb chunk consisting of three Chinese words, labeled B-VP, I-VP and E-VP under the IOBES method. With these labels, chunk analysis is treated as a chunk labeling task that can be implemented by training a CRFs model.
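The two labeling schemes can be illustrated with a short Python sketch that turns a chunked sentence into per-word tags. It follows the convention stated in the text (B on the first word of every chunk); the function name `tag_chunks`, the `scheme` argument and the input representation as (chunk type, words) pairs are assumptions made for this example.

```python
def tag_chunks(chunks, scheme="IOB"):
    """Turn (chunk_type, words) pairs into (word, tag) pairs.

    A chunk_type of None marks words that lie outside any chunk.
    """
    tagged = []
    for chunk_type, words in chunks:
        for i, word in enumerate(words):
            if chunk_type is None:
                tag = "O"
            elif scheme == "IOBES" and len(words) == 1:
                tag = "S-" + chunk_type  # single-word chunk
            elif i == 0:
                tag = "B-" + chunk_type  # first word of a chunk
            elif scheme == "IOBES" and i == len(words) - 1:
                tag = "E-" + chunk_type  # last word of a chunk
            else:
                tag = "I-" + chunk_type  # word inside a chunk
            tagged.append((word, tag))
    return tagged


# Part of the example sentence, already divided into chunks.
sentence = [
    ("NP", ["我们"]),
    ("VP", ["可能", "会", "面临"]),
    ("MP", ["一个"]),
    ("NP", ["不", "稳定", "时期"]),
    (None, ["。"]),
]
iob_tags = [tag for _, tag in tag_chunks(sentence, "IOB")]
iobes_tags = [tag for _, tag in tag_chunks(sentence, "IOBES")]
print(iob_tags)
print(iobes_tags)
```

Once every word carries exactly one tag, chunking reduces to predicting a tag sequence, which is the form of problem a linear-chain CRF handles.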

4.2 Model Training

The CRFs model must be trained on a labeled dataset to determine the model parameters. The trained model can then be used to process text that is to be segmented and labeled. If X is the set of labeled sentences and Y is the corresponding chunk label sequence, training the CRFs model makes the label sequence $Y^*$ optimal:

$$Y^* = \arg\max_Y p(Y \mid X)$$

Here we use CRF++ 0.50 as the training and testing tool. CRF++ 0.50 is a sequence labeling tool based on the CRFs principle. The training process needs a training sample file and a feature template file. Training produces a CRFs model, which is then used to label chunks in Chinese text.

Table 1. An example of label chunks

Chinese    English    IOB1       IOBES
因而       So         I-CONJP    S-CONJP
我们       we         B-NP       S-NP
可能       may        B-VP       B-VP
会         be         I-VP       I-VP
面临       face       I-VP       E-VP
一个       a          B-MP       S-MP
不         un         B-NP       B-NP
稳定       stable     I-NP       I-NP
时期       period     I-NP       E-NP
。         .          O          O

The training sample file is made up of blocks, each of which represents a sentence. The block format of the training sample file is shown in Table 2.

Blocks are separated by a blank row. Each block consists of tokens, with one labeled word per row. The first column is the Chinese word and the next column is the English word, given to aid understanding. The third column lists the properties of the word (there may be more than one such column). The last column is the tag notation.
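The block layout described above can be produced with a few lines of Python. This is a sketch under assumptions: the helper name `format_training_blocks` is made up, columns are separated by tabs, and the English gloss column is left out because it only serves the reader. The word, property and tag values are taken from the sample sentence.

```python
def format_training_blocks(sentences):
    """Render sentences as blocks: one token per row, a blank row between blocks."""
    blocks = []
    for sentence in sentences:
        rows = ["\t".join(token) for token in sentence]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks) + "\n"


# Each token is (word, property, tag); the tag is the last column.
sentences = [
    [("他", "PRP", "B-NP"), ("认为", "VBZ", "B-VP")],
    [("将", "MD", "B-VP"), ("缩小", "VB", "I-VP")],
]
training_text = format_training_blocks(sentences)
print(training_text)
```

Writing `training_text` to a file gives one sentence per block, with the label to be learned always in the final column.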

Section 3 introduced the feature weights that represent the context of a word W0, so selecting the feature set is important. Generally, the context of a word and the properties of the context words are very useful for choosing features. That is, we can use words that precede or follow the word W0 as features. Features may be N-grams. The features {…, W-2, W-1, W0, W1, W2, …} are called uni-gram basic features and {…, W-2W-1, W-1W0, W0W1, W1W2, …} are called bi-gram basic features, where Wn stands for a word. In addition, advanced features {…, WP-2, WP-1, WP0, WP1, WP2, …}, which combine a word with its property, are also used to improve the analysis results; P stands for the property of a word. Table 3 shows various features of the word “赤字” (deficit) in Table 2. The last column of Table 3 gives the English word corresponding to the Chinese word.

An observation window around the token W0 therefore needs to be given for training. The window includes W0 and some words before and after it, that is, W = {W-n, W-(n-1), …, W0, …, Wn-1, Wn}. Table 4 gives an example of an observation window. It is used as the feature source for training the vector $\Lambda = \{\lambda_1, \ldots, \lambda_k\}$. A larger window provides more context features but increases the cost of processing, while a window that is too small may lose important features. We therefore define the window size as 5, that is, W = {W-2, W-1, W0, W1, W2}.

Table 2. Form of experiment dataset sample

Chinese token    English token    Property    Notation
他               He               PRP         B-NP
认为             reckons          VBZ         B-VP
当前的           current          JJ          I-NP
赤字             deficit          NN          I-NP
将               will             MD          B-VP
缩小             narrow           VB          I-VP
到               to               TO          B-PP
仅               only             RB          B-NP
1800             18000            CD          I-NP
万               thousands        CD          I-NP
9月              september        NNP         B-NP
.                .                            O

Table 3. Feature instance

Feature                       Feature item    Feature value    Value in English
Uni-gram basic features       W-2             认为             reckons
                              W-1             当前的           current
                              W0              赤字             deficit
                              W1              将               will
                              W2              缩小             narrow
Bi-gram basic features        W-1W0           当前的/赤字      current/deficit
                              W0W1            赤字/将          deficit/will
Uni-gram advanced features    WP-1            当前的/JJ        current/JJ
                              WP1             将/MD            will/MD

Table 4. Observing window of features

Feature position    Description                         Chinese example
W = W0              Token                               当前的
W = W-1             Last word of token                  认为
W = W+1             Next word of token                  赤字
W = W0W+1           Token and next word                 当前的赤字
W = W-1W0W+1        Last word, token and next word      认为当前的赤字
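The window of size 5 and the three kinds of features can be sketched as a small extraction function. This is an illustration, not the tool's internal procedure: the function name `window_features`, the key format such as `W-1` or `W+0`, and the `<PAD>` placeholder for positions beyond the sentence boundary are assumptions made for the example.

```python
def window_features(tokens, i, size=2):
    """Features for tokens[i] from a window of `size` words on each side.

    tokens is a list of (word, property) pairs; positions outside the
    sentence are filled with a "<PAD>" placeholder (an assumption).
    """
    def word(j):
        return tokens[j][0] if 0 <= j < len(tokens) else "<PAD>"

    def prop(j):
        return tokens[j][1] if 0 <= j < len(tokens) else "<PAD>"

    feats = {}
    for off in range(-size, size + 1):
        # Uni-gram basic feature: the word at this offset.
        feats[f"W{off:+d}"] = word(i + off)
        # Advanced feature: the word combined with its property.
        feats[f"WP{off:+d}"] = f"{word(i + off)}/{prop(i + off)}"
    for off in range(-size, size):
        # Bi-gram basic feature: two adjacent words in the window.
        feats[f"W{off:+d}W{off + 1:+d}"] = f"{word(i + off)}/{word(i + off + 1)}"
    return feats


tokens = [("他", "PRP"), ("认为", "VBZ"), ("当前的", "JJ"),
          ("赤字", "NN"), ("将", "MD"), ("缩小", "VB")]
deficit_features = window_features(tokens, 3)  # features of the word 赤字
print(deficit_features["W-1W+0"], deficit_features["WP+1"])
```

With `size=2` the function reads exactly the five words W-2 to W2, so enlarging the window only requires changing one argument, at the cost of more features per token.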

The feature template file defines the feature items used for training. After training, $\Lambda = \{\lambda_1, \ldots, \lambda_k\}$ is produced, that is, the CRFs model is available.
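The paper does not list its template file, so the following is only a guess at what a template for the window of size 5 might look like. The sketch generates lines in the CRF++ template notation, where `%x[row,col]` refers to the token at a relative row and an absolute column, lines starting with `U` define uni-gram templates, and a single `B` line adds a feature over the previous and current output tags. The function name `build_template` and the assumption that column 0 holds the word and column 1 its property are made up for the example.

```python
def build_template(size=2, word_col=0, prop_col=1):
    """Generate CRF++-style template lines for a window of `size` words per side."""
    lines = []

    def add(expr):
        # Each template gets a unique identifier U00, U01, ...
        lines.append(f"U{len(lines):02d}:{expr}")

    for off in range(-size, size + 1):
        add(f"%x[{off},{word_col}]")  # word at this offset
    for off in range(-size, size):
        add(f"%x[{off},{word_col}]/%x[{off + 1},{word_col}]")  # adjacent word pair
    for off in range(-size, size + 1):
        add(f"%x[{off},{word_col}]/%x[{off},{prop_col}]")  # word with its property
    lines.append("B")  # previous tag combined with current tag
    return "\n".join(lines) + "\n"


template_text = build_template()
print(template_text)
```

The three loops correspond to the uni-gram basic, bi-gram basic and advanced features; whether the experiments used exactly this set is not stated in the paper.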

5. Entity Relation Extraction

Entities are the basic elements of natural-language text, such as places, roles, organizations and things, and they play important roles in it. There are generally relationships between entities, such as located in, belongs to, adjacent to and so on. These relationships may be explicit or implicit; implicit relationships must be inferred from knowledge. Entity relation extraction is the process of identifying the relationships between entities in text and labeling them. It is not only important work in information extraction but also useful in automatic question answering and semantic networks.

Tests from MUC show that many systems are able to process named entities in large numbers of English documents [26]. Entity relation extraction for Chinese, however, may be difficult. Machine learning is known to be an effective method for extraction, but it needs a labeled dataset. Currently the “People's Daily” corpus labeled by Peking University is perhaps a better choice. This dataset has been labeled with parts of speech, word properties and named entities. Extracting implicit relationships, however, needs more external knowledge.

5.1 The Definition and Label of Entity Relation

ACE 2006 defines six classes and 18 subclasses of relationships between two entities, shown in Table 5. The dataset provided by ACE covers English, Chinese and Arabic.

Table 5. Named entity relation types and instances

Class                  Subclass
Physical               Located, Near
Part-Whole             Artifact, Geographical, Subsidiary
Personal-Social        Business, Family, Lasting-Personal
ORG-Affiliation        Employment, Founder, Ownership, Student-Alum, Sports-Affiliation, Investor-Shareholder, Membership
Agent-Artifact         User-Owner-Inventor-Manufacturer
General-Affiliation    Citizen-Resident-Religion-Ethnicity, Org-Location

This paper focuses on Chinese entity relations. Here we illustrate only two kinds of relationship of the Physical class, and implement extraction based only on the definitions of these two relationships.

Definition 3: Relation 1(M, N) is defined as the located relationship, where the entities M, N ∈ geographical entity and N ⊂ M.

Definition 4: Relation 2(M, N) is defined as the near relationship, where M, N ∈ geographical entity and Relation 2(M, N) = Relation 2(N, M).

In this paper, extraction follows the same principle as chunk analysis: a CRFs model is trained on a labeled sample dataset and then used to perform the extraction. We propose nine kinds of notation to label the position relationships in the dataset, presented in Table 6.

In the table each row presents one or two entities and their relationship. In the notation 1-B, 1 stands for Relation 1(M, N) and B stands for the first entity M. The third column lists a sentence containing the entities. The last column shows the instance corresponding to the notation.

Table 6. Nine kinds of labeling and instances

Notation    Description                                     Example sentence                                                               Instance
1-B         Entity M in Relation 1(M,N)                     中国首都北京 (Beijing, capital of China)                                       中国 (China)
1-E         Entity N in Relation 1(M,N)                     中国首都北京 (Beijing, capital of China)                                       北京 (Beijing)
2           Entities M, N in Relation 2(M,N)                城市北京和天津 (the cities Beijing and Tianjin)                                北京, 天津 (Beijing, Tianjin)
1-E-1-B     Entity N in Relation 1(M,N), 1(N,S)             位于中国首都北京的西单 (Xidan, located in Beijing, the capital of China)       北京 (Beijing)
1-E-1-E     Entity S in Relation 1(M,N), 1(N,S)             位于中国首都北京的西单 (Xidan, located in Beijing, the capital of China)       西单 (Xidan)
1-B-2       Entities M, S in Relation 1(M,N), 2(M,S)        美国总统访问中国 (The president of America visits China)                       美国, 中国 (America, China)
1-E-2       Entities N, S in Relation 1(M,N), 2(N,S)        中国城市北京和天津 (the Chinese cities Beijing and Tianjin)                    北京, 天津 (Beijing, Tianjin)
2-1-B       Entity N in Relation 2(M,N), 1(N,S)             美国总统访问中国北京 (The president of America visits Beijing, China)          中国 (China)
2-1-E       Entity S in Relation 2(M,N), 1(N,S)             美国总统访问中国北京 (The president of America visits Beijing, China)          北京 (Beijing)

5.2 Experiment of Entity Relation Extraction

The experiment is designed on the basis of the definitions of Relation 1(M, N) and Relation 2(M, N). The paper uses the labeled “People's Daily” dataset of January 1998 as the sample dataset, named M, with a size of 3.2 MB. It is divided into 10 subsets, M1 to M10. M1 contains the text of January 1st and M2 covers the 1st to the 2nd; likewise, M10 covers the 1st to the 10th. That is, Mn contains the newspaper contents of n days.

M1-M9 are used as training datasets and M10 as the test set. Table 7 shows the datasets and the results of the experiment. In the table, recall R, precision P and F-score are the metrics for information extraction. Let r1 be the number of relationships extracted correctly, r2 the number of relationships actually extracted, and r3 the number of relationships originally in the text. Then:

$$P = \frac{r_1}{r_2} \times 100\%$$

$$R = \frac{r_1}{r_3} \times 100\%$$

$$F = \frac{2PR}{P + R}$$

Table 7. Experiment result of entity relation extraction

Dataset    CRFs trained    Time (s)    Precision (%)    Recall (%)    F-Score (%)
M1         Model 1         48          39.8             45.3          42.4
M2         Model 2         112         38.2             57.2          45.8
M3         Model 3         203         45.3             59.6          51.5
M4         Model 4         293         50.2             65.5          56.8
M5         Model 5         547         63.1             69.8          66.3
M6         Model 6         923         73.1             74.9          74.0
M7         Model 7         1982        76.9             84.7          80.8
M8         Model 8         2439        87.9             89.4          88.6
M9         Model 9         3010        90.5             95.8          93.1

The data in the table are the results of using M10 to test each model trained on M1-M9. The experiment shows that P, R and F increase as more data are used to train the model. When M1 is used to train the model, the F-score is 42.4%. This is a disappointing value, but it uses only a small sample dataset (one day of newspaper text) for training. With nine days of data, Model 9 obtains P, R and F values of 90.5%, 95.8% and 93.1%. This illustrates that the method is effective. With a large enough sample dataset we may obtain better results.
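The three metrics can be computed directly from the counts r1, r2 and r3. In the sketch below the function names are assumptions and the counts 9, 10 and 12 are made-up values; the precision and recall pairs passed to `f_score` are the reported values for Model 1 and Model 9, used to check that the reported F-scores follow from them.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(r1, r2, r3):
    """r1: correctly extracted, r2: actually extracted, r3: present in the text."""
    precision = 100.0 * r1 / r2
    recall = 100.0 * r1 / r3
    return precision, recall, f_score(precision, recall)


# Hypothetical counts: 9 correct out of 10 extracted, 12 relations in the text.
p, r, f = precision_recall_f(9, 10, 12)
print(p, r, round(f, 1))

# Reported precision and recall for Model 1 and Model 9.
f_model_1 = round(f_score(39.8, 45.3), 1)
f_model_9 = round(f_score(90.5, 95.8), 1)
print(f_model_1, f_model_9)
```

Because F is a harmonic mean, it stays close to the lower of the two values, which is why the low precision of the early models keeps their F-scores low despite higher recall.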

6. Conclusions

This paper discusses CRF-based information extraction from Chinese text, aiming at chunk parsing and relation extraction. Processing Chinese text is a complex task. This work is exploratory, because we do not yet have enough sample data for training the CRFs model. However, the experiments suggest that the method is effective, since the CRFs model can work with global features.

We are currently developing a prototype information extraction system, so much work remains. The lack of training data is a common problem for many languages, and manual labeling is costly; there is therefore research on constructing datasets automatically or semi-automatically. In addition, selecting a suitable feature set and improving precision are also future work.

REFERENCES

[1] E. C. Mary and J. M. Raymond, “Relational Learning of

Pattern-match Rules for Information Extraction,” Ph.D.

Thesis, University of Texas, Austin, 1998.

[2] S. Soderland, “Learning Information Extraction Rules for Semi-Structured and Free Text,” Machine Learning, Vol. 34, No. 1-3, 1999, pp. 233-272.

[3] D. Freitag and A. McCallum, “Information Extraction

with HMM Structures Learned by Stochastic Optimiza-

tion,” Proceedings of 18th Conference on Artificial Intel-

ligence, AAAI Press, Edmonton, 2002, pp. 584-589.

[4] S. Ray and M. Craven, “Representing Sentence Structure in Hidden Markov Models for Information Extraction,” Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, Morgan Kaufmann, Washington, 2001, pp. 1273-1279.

[5] T. Scheffer, C. Decomain and S. Wrobel, “Active Hidden

Markov Models for Information Extraction,” Proceedings

of the Fourth International Symposium on Intelligent

Data Analysis, Springer, Lisbon, 2001, pp. 301-109.

[6] D. Freitag, A. McCallum and F. Pereira, “Maximum En-

tropy Markov Models for Information Extraction and

Segmentation,” Proceedings of the Seventeenth Interna-

tional Conference on Machine Learning, Morgan Kauf-

mann, San Francisco, 2000, pp. 591-598.

[7] H. L. Sun and S. W. Yu, “Shallow Parsing: An Over-

view,” Contemporary Linguistics, 2000.

[8] S. Miller, M. Crystal, H. Fox, L. Ramshaw, R. Schwartz, R. Stone and R. Weischedel, “Algorithms that Learn to Extract Information-BBN: Description of the SIFT System as Used for MUC-7,” Proceedings of MUC-7, Fairfax, 1998.

[9] J. Lafferty, A. McCallum and F. Pereira, “Conditional

Random Fields: Probabilistic Models for Segmenting and

Labeling Sequence Data,” Proceedings of the Interna-

tional Conference on Machine Learning (ICML), 2001,

pp. 282-289.

[10] Y. Y. Luo and D. G. Huang, “Chinese Word Segmenta-

tion Based on the Marginal Probabilities Generated by

CRFs,” Journal of Chinese Information Processing, Vol.

23, No. 5, 2009, pp. 3-8.

[11] M.-C. Hong, K. Zhang, J. Tang and J.-Z. Li “A Chinese

Part-of-Speech Tagging Approach Using Conditional

Random Fields,” Computer Science, Vol. 33, No. 10,

2006, pp. 148-152.

[12] S. P. Abney, “Parsing by Chunks,” In: R. Berwick, S. Abney and C. Tenny, Eds., Principle-Based Parsing: Computation and Psycholinguistics, Kluwer Academic Publishers, Dordrecht, 1991, pp. 257-278.

[13] E. F. Tjong Kim Sang and S. Buchholz, “Introduction to the CoNLL-2000 Shared Task: Chunking,” Proceedings of CoNLL-2000 and LLL-2000, Lisbon, 2000, pp. 127-132.

[14] L. Ramshaw and M. Marcus, “Text Chunking Using

Transformation-Based Learning,” In: D. Yarovsky and K.

Church, Eds., Proceedings of the Third Workshop on

Very Large Corpora, Association for Computational Lin-

guistics, Somerset, 1995, pp. 82-94.

[15] J. Hammerton, M. Osborne, S. Armstrong and W.

Daelemans, “Introduction to Special Issue on Machine

Learning Approaches to Shallow Parsing,” Journal of

Machine Learning Research, Vol. 2, No. 3, 2002, pp.

551-558.

[16] N. Kambhatla, “Combining Lexical, Syntactic and Semantic Features with Maximum Entropy Models for Extracting Relations,” Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Barcelona, 2004, pp. 22-25.

[17] D. Zelenko, C. Aone and A. Richardella, “Kernel Meth-

ods for Relation Extraction,” Journal of Machine Learn-

ing Research, Vol. 3, 2003, pp. 1083-1106.

[18] C. Whitelaw, A. Kehlenbeck, N. Petrovic, et al., “Web-

Scale Named Entity Recognition,” Proceeding of ACM

17th Conference on Information and Knowledge Man-

agement, Napa Valley, 2008, pp. 123-132.

[19] Z. Q. Chen, D. V. Kalashnikov and S. Mehrotra, “Adap-

tive Graphical Approach to Entity Resolution,” Proceed-

ings of ACM IEEE Joint Conference on Digital Libraries,

Vancouver, 2007, pp. 204-213.

[20] X. P. Han and J. Zhao, “Person Name Disambiguation

Based on Web-Based Person Mining and Categorization,”

2nd Web People Search Evaluation Workshop in con-

junction with WWW2009, Madrid, 2009.

[21] S. D. Pietra, R. L. Mercer and S. Roukos, “Adaptive Language Modeling Using Minimum Discriminant Estimation,” Proceedings of the Speech and Natural Language DARPA Workshop, San Francisco, 1992, pp. 103-106.

[22] R. Rosenfeld, “Adaptive Statistical Language Modeling:

A Maximum Entropy Approach,” Ph.D. Thesis, School of

Computer Science, Carnegie Mellon University, Pitts-

burgh, 1994.

[23] A. McCallum and W. Li, “Early Results for Named En-

tity Recognition with Conditional Random Fields Feature

Induction and Web-Enhanced Lexicons,” Proceedings of

CoNLL-2003 Association for Computational Linguistics,

Daelemans, 2003, pp. 188-191.

[24] E. F. Tjong Kim Sang and S. Buchholz, “Introduction to the CoNLL-2000 Shared Task: Chunking,” Proceedings of CoNLL-2000 and LLL-2000, Association for Computational Linguistics, Lisbon, 2000, pp. 127-132.

[25] E. F. Tjong Kim Sang and J. Veenstra, “Representing Text Chunks,” Proceedings of EACL’99, Association for Computational Linguistics, Bergen, 1999, pp. 173-179.

[26] J. Zhao, “A Survey on Named Entity Recognition, Disambiguation and Cross-Lingual Coreference Resolution,” Journal of Chinese Information Processing, Vol. 23, No. 2, March 2009, pp. 3-17.