Tagging Accuracy Analysis on Part-of-Speech Taggers

doi:10.4236/jcc.2014.24021

Paper Menu >>

Journal Menu >>

Journal of Computer and Communications, 2014, 2,157-162

Published Online March 2014 in SciRes. http://www.scirp.org/journal/jcc

http://dx.doi.org/10.4236/jcc.2014.24021

How to cite this paper: Yumusak, S., Dogdu, E. and Kodaz, H. (2014) Tagging Accuracy Analysis on Part-of-Speech Taggers.

Journal of Computer and Communications, 2, 157-162. http://dx.doi.org/10.4236/jcc.2014.24021

Tagging Accuracy Analysis on Part-of-Speech

Taggers

Semih Yumusak1, Erdogan Dogdu2, Halife Kodaz3

1Computer Engineering Department, KTO Karatay University, Konya , Turkey

2Computer Engineering Department, TOBB University of Economics and Technology, Ankara, T urke y

3Computer Engineering Department, Selcuk University, Konya, Turkey

Email: semih.yumusak @karatay.edu.tr, edogdu@etu.edu.tr, hkodaz@selcuk.edu.tr

Received Novemb er 2013

Abstract

Part of Speech (POS) Tagging can be applied by several tools and several programming languages.

This work focuses on the Natural Language Toolkit (NLTK) library in the Python environment and

the gold standard corpora installable. The corpora and tagging methods are analyzed and com-

pared by using the Python language. Different taggers are analyzed according to their tagging ac-

curacies with data from three different corpora. In this study, we have analyzed Brown, Penn

Treebank and NPS Chat corpuses. The taggers we have used for the analysis are; default tagger,

regex tagger, n-gram taggers. We have applied all taggers to these three corpuses, resultantly we

have shown that whereas Unigram tagger does the best tagging in all corpora, the combination of

taggers does better if it is correctly ordered. Additionally, we have seen that NPS Chat Corpus gives

different accuracy results than the other two corpuses.

Keywords

POS Tagger; Brown Corpus; Penn Treebank Corpus; NPS Chat Corpus

1. Introduction

Part-of-speech (POS) tagging is a process of classifying the words in a sentence according to their types [1-3].

POS taggers are used for different purposes. POS tagging can be used for Linguistic-text Pre-processing before

semantic analysis [4,5]. POS tagging was also used for multi-lingual text analysis [6]. The tag sets and the ter-

minology of tagging are defined by different projects such as Brown Corpus Tag-Set, Penn Treebank Tag-Set

[7]. In Python programming language there is an infrastructure about natural language processing, which is

called Natural Language Toolkit (NLTK). NLTK 3.0 is a code library that is used to build Natural Language

Programming (NLP) in Python [8,9]. In the Python development environment, there are several tagged corpora

available for installation. The available corpus names are listed by using nltk. download () method. However,

not all of the corpora listed by this method are tagged. In this paper, we will be dealing with tagged corpora, i.e.

which includes part-of-speech annotations. The tagged corpora installed for this work in the Python NLTK li-

brary are as follows; Brown Corpus [10] Penn Treebank Corpus [7] NPS-Chat Corpus [11]. In this study, we

S. Yumusak et al.

158

have more deeply examined the different corpuses and taggers available in NLTK and mentioned in [8]. We

have first listed the methods and data used for analyzing the taggers. The Python source code for analyzing the

Penn Treebank corpus is written in method and analysis section to show the exact way of how we made the

analysis. While writing the code for analysis, we have inspired from the code samples used in [8]. Finally, all

corpora are analyzed by using the same tagging methods and the accuracies are listed in Table 1.

The rest of the paper is organized as follows. Section 2 presents explanation of the materials and the method

we used. Section 3 describes details of our application and reports experimental results. Finally section 4 con-

cludes the paper.

2. Materials and Methods

NLTK 3.0 (NLTK) is a code library that is used to build NLP programs in Python. The features in the NLTK li-

brary are; sentence and word tokenization, part-of speech tagging, chunking and named entity recognition, text

classification [8,9]. There are also several corpora available in NLTK library to use for data analysis. We ex-

plored and imported these corpora by using the following method, by replacing * with the name of the corpus;

>>> from nltk.corpus import*

We have found that not all of the corpora installable are tagged. Then we filtered the tagged corpora as if the

tagged words() method is reachable in a corpus, we assumed it as a tagged corpus. The assumption is made ac-

cording to the NLTK web site, whi ch is written as tagged words() method is only supported by corpora that in-

clude part-of-speech annotations [12]. As a result of this filtering, we have found that Brown Corpus, Penn

Treebank Corpus and NPS Chat Corpus are available with their tagged sentences. After all, we have applied

Default Tagger, Regex Tagger, Bigram Tagger, Unigram Tagger, Trigram Tagger to these three corpuses. The

source code used for corpus exploration and the POS tagger applications are explained by using the Penn Tree-

bank corpus in the following sections.

2.1. Corpus Exploration

The Python command used for checking part-of speech annotations is as follows:

>>> tagged = treebank.tagged_words()

>>> tagged [(‘Pierre’, ‘NNP’), (‘Vin k en’, ‘NNP’), (‘,’, ‘,’), . ..]

In order to make evaluations by using different taggers, we needed to get the tagged sentences and non-tagged

sentences for tagging and evaluation. We have used tagged sents() method to get the tagged sentences for evalu-

ation, and sents() method to get the original sentences for tagging. The sample code we used for getting the

tagged and nontagged sentences are as follows;

>>> treebank_tagged_sents = treebank.tagged_sents()

>>> treebank_tagged_sents [[(‘Pierre’, ‘NNP’), (‘Vink en’, ‘NNP’), (‘,’, ‘,’), (‘61’, ‘CD’) , (‘year s’, ‘NNS’),

(‘old’, ‘JJ’), (‘,’, ‘,’), ...]

>>> treebank_sents = treebank.sents()

>>> treebank_sents [[‘Pierre’, ‘Vinken ’, ‘,’, ‘61’, ‘years’, ‘old’, ‘,’, ‘will’, ‘join’, ‘the’, ‘board’, ‘as’, ‘a’,

‘nonexecutive’, ‘director’, ... ]

2.2. Application of POS Taggers

Since we have the tagged and non-tagged versions of the same text, we are able to evaluate the correctness of a

tagger. There are several taggers in Phyton NLTK library, which we used to tag and evaluate the tagger algo-

rithm. The taggers available in the NLTK library can be explained as follows;

Default Tagger: Applied with assigning NN (Noun) tag to each token. It basically gives the percentage of

nouns, which is most likely to occur in a sentence.

Regex Tagger: The regular expression tagger is initialized with a pattern array, and tags according to that pat-

tern array.

Unigram Tagger: It assigns a tag to a word which is most likely to occur. More specifically, it trains with a

training data and calculates the occurrences of the words, then tags the test data according to the occurrence sta-

tistics in the training data.

Bigram Tagger: It applies statistically the same procedure as the unigram tagger with one specific difference.

S. Yumusak et al.

159

It checks for the occurrences of the words together with one word before.

Trigram Tagger: It applies statistically the same procedure as the unigram tagger with one speciﬁc difference.

It checks for the occurrences of the words together with two words before. As it is shown in Figure 1, the

n-gram tagging is explained by using tri-gram representation as; the tag to be chosen, tn, is circled, and the con-

text is shaded in grey. We have n = 3; that is, we consider the tags of the two preceding words in addition to the

current word. An n-gram tagger picks the tag that is most likely in the given context.

2.3. Unsupervised Taggers

In order to evaluate the taggers on the treebank corpus, we first started with un-supervised taggers. In the pro-

gram code below, we applied the most frequent tag, i.e. the NN tag to the default tagger. It shows that 13 percent

of the tags are nouns for the treebank corpus.

>>> defaultTagger = nltk.DefaultTagger(‘NN’)

>>> evalResult = defaultTagger.evaluate (treebank.tagged_sents())

>>> print ‘Accuracy is: %4.2f %%’ % (100.0 * evalResult)

Accuracy is: 13.08%

Secondly, we improved the Regular expression template given in [8] and formed a new pattern as follows:

patterns = [

(r‘.*ing$’, ‘VBG’),

(r‘.*ed$’, ‘VBD ’),

(r‘.*es$’, ‘VBZ’),

(r‘.*ould$’, ‘MD’),

(r‘.*\’s$’, ‘NN$’),

(r‘.*s$’, ‘NNS’) ,

(r‘(The|th e|A|a |An |an)$’, ‘AT’),

(r‘.*able$’, ‘JJ’),

(r‘.*ly$’, ‘RB’),

(r‘.*s$’, ‘NNS’) ,

(r‘.*’, ‘NN’)]

The regular expression tagger is applied as follows:

>>> regexTagger = nltk.RegexpTagger(patterns)

>>> evalResult = regexTagger.evaluate(treebank.tagged_sents())

>>> print ‘Accuracy is: %4.2f %%’ % (100.0 * evalResult)

Accuracy is: 20.52%

Regular expression tagger by itself is limited to very common language properties; therefore it is able to tag

only the 20.52% of the whole corpus correctly.

2.4. Supervised Taggers

There are also some supervised taggers available in the standard installation of Python environment. We applied

3 step n-gram taggers in order to analyze their tagger performance.

First of all, unigram tagger is analyzed. Unigram tagger is designed as a simple statistical algorithm. It assigns

the tags with the most probable tag by calculating the frequencies of each token [8]. In the first step it trains the

data, and then tests it with a different data. Unigram Tagger is applied to the treebank corpus with the Python

code below:

>>> corpusSize = int(len(treebank.tagged_sents()) * 0.8)

Figure 1. N-gram tagging [8].

S. Yumusak et al.

160

>>> trainingSents = treebank.tagged_sents()[:corpusSize]

>>> testSents = treebank.tagged_sents()[corpusSize:]

>>> unigram_tagger = nltk.UnigramTagger(trainingSents)

>>> evalResult = unigram_tagger.evaluate (testSents)

>>> print ‘Accuracy is: %4.2f %%’ % (100.0 * evalResult)

Accuracy is: 86.25%

In this code, we selected 80% of the treebank corpus as the training data, 20% of the corpus as the test data.

The accuracy of the tagger is 86, 25% for the treebank corpus.

Secondly, we looked for the multiple n-gram taggers. The difference between unigram and n-gram is ex-

plained in [8] Natural Language Processing with Python as follows; when we perform a language processing

task based on unigrams, we are using one item of context. In the case of tagging, we consider only the current

token, in isolation from any larger context. Given such a model, the best we can do is tag each word with its a

priori most likely tag. This means we would tag a word such as wind with the same tag, regardless of whether it

appears in the context the wind or to wind. An n-gram tagger is a generalization of a unigram tagger whose con-

text is the current word together with the part-of-speech tags of the n-1 preceding tokens.

The bigram tagger is applied similar to the unigram tagger, the same training and test sets are used.

>>> bigramTagger = nltk.BigramTagger(trainingSents)

>>> evalResult = bigramTagger.evaluate(testSents)

>>> print ‘Accuracy is: %4.2f %%’ % (100.0 * evalResult)

Accuracy is: 13.55%

The accuracy of the tagger decreases dramatically according to the unigram tagger. This decrease is explained

in [8] as follows; Notice that the bigram tagger manages to tag every word in a sentence it saw during training,

but does badly on an unseen sentence. As soon as it encounters a new word (i.e., 13.5), it is unable to assign a tag.

It cannot tag the following word (i.e., million), even if it was seen during training, simply because it never saw it

during training with none tag on the previous word. Consequently, the tagger fails to tag the rest of the sentence.

As a third step, we applied the trigram tagger to treebank corpus. As it is seen in the following code, the same

training and test sets are applied.

>>> TrigramTagger = nltk.TrigramTagger(trainingSents)

>>> evalResult = TrigramTagger.evaluate(testSents)

>>> print ‘Accuracy is: %4.2f %%’ % (100.0 * evalResult)

Accuracy is: 67.06%

The accuracy of the trigram tagger is better than the bigram tagger, however it is low for an application on a

gold standard corpus. Thus, we applied all the taggers by combining them in an order as follows:

>>> defaultTagger = nltk.DefaultTagger(‘NN’)

>>> regexpTagger = nltk.RegexpTagger(patterns, backoff = defaultTagge r)

>>> unigramTagger = nltk.UnigramTagger(trainingSents, backoff = regexpTagger)

>>> bigramTagger = nltk.BigramTagger(trainingSents, backoff = unigramTagger)

>>> trigramTagger = nltk.TrigramTagger(trainingSents, backoff = bigramTagger)

>>> evalResult = trigramTagger.evaluate(testSents)

>>> print ‘Accuracy is: %4.2f %%’ % (100.0 * evalResult)

Accuracy is: 89.56%

This code does the tagging with this order; firstly look for the trigram tagger, then bigram, unigram, reg ex,

lastly the default tagger is used for the non-tagged ones. Consequently, the result becomes better than all of the

above taggers used.

3. Experimental Results

In the evaluation section, all types of taggers used in this work are applied to three corpora, which are the Brown

Corpus, Penn Treebank Corpus and the NPS Chat Corpus. The combination of taggers is selected by combining

taggers by every possible order starting with default and regex taggers. In Figure 2 horizontal axis represents

the combination of taggers. For instance, drubt means; default, reg ex, unigram, bigram, trigram taggers applied

respectively. The vertical axis is the accuracy percentage. As it is seen in Figure 2, the best accuracy result

comes with drubt ordering with a small difference from the drutb combination. Thus, we used drubt ordering for

combination of taggers accuracy calculation. The results are listed in Table 1.

S. Yumusak et al.

161

Figure 2. Combination of taggers accuracy.

Table 1. NLTK tagging accurac ies.

Corpus Name Default (%) Regex (%) Unigram (%) Bigram (%) Trigram (%) Comb. of Taggers (%)

Brown Corpus 10.99 25.78 87.70 33.88 19.20 91.33

Treebank Corpus 14.47 21.75 86.25 11.33 6.7 89.56

NPS Chat Corpus 9.11 12.69 80.85 43.61 35.91 86.45

As seen in Table 1, Unigram Tagger performs better than the other taggers when applied alone on all three

corpus. But the combinations of taggers’ results are even better. Whereas trigram and bigram taggers give lower

results in accuracy, its recall is most probably higher than other taggers. Although we are not able to calculate

the recall value, we can infer the probability of getting two(bigram) words together in different tags from one

sentence to another is logically lower than one(unigram) word. Thus, if we use the most precise (trigram) tagger

first, then continue with more generalized taggers as bigram, unigram, regex, default taggers respectively; the

accuracy becomes higher than any other individual tagger. When we look at the results vertically for the cor-

puses, the NPS chat corpus is tagged better than the other ones with the bigram and trigram taggers. Since a

person use a limited vocabulary in a chat conversation and use usually similar patterns, the accuracies in bigram

and trigram taggers seems reasonable. However, in the treebank and brown corpora, the trigram tagger does the

worst, even worse from the default tagger in treebank corpus. This shows us that the variety of English words

and phrases are higher than the NPS Chat corpus. Whereas, n-gram taggers in general give more accurate results

than the others applied, bigram and trigram taggers need more data to train in order to give better results for the

Penn Treebank and Brown Corpora.

4. Conclusion

In this work, part of speech tagging is analyzed by improving the usage of different techniques and different

corpora. The differences between each corpus are analyzed according to their tagging accuracies. Whereas

Brown and Treebank corpora are similar according to their accuracy values, NPS Chat corpus differs signifi-

cantly. Consequently, NPS Chat corpus seems as a good reference for non standard internet data and the results

might be generalized for the free text. For the future work, more advanced taggers will be analyzed such as; Brill

Tagger, Hunpos Tagger and Stanford Tagger. Whereas Brill and Hunpos taggers are available in the python en-

vironment, Stanford tagger needs a reference for the tagger java file for tagging. Moreover, some more im-

provements will be made for the regular expression tagger by using more complex patterns. Finally, some Tur-

kish Corpus will be applied for these taggers and be analyzed according to their accuracies.

S. Yumusak et al.

162

Acknowledgements

This study has been supported by Scientific Research Project of Selcuk University.

References

[1] Brants, T. (2006) Part-of-Speech Tagging. Encyclopedia of Language & Linguistics (Second Edition), Elsevier, Ox-

ford, 221-230.

[2] Søgaard, A. (2010) Simple Semi-Supervised Training of Part-of-Speech Taggers. Proceedings of the ACL 2010 Con-

ference Short Papers, Uppsala, 11-16 July 2006, 205-208.

[3] Das, D. and Petrov, S. (2011) Unsupervised Pa rt-of-Speech Tagging with Bilingual Graph-Based Projections. H LT’11

Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technol-

ogies, Association for Computational Linguistics, Portland, 19-24 June 2011, 600-609.

[4] Ittoo, A. and Bouma, G. (2013) Term Extraction from Sparse, Ungrammatical Domain-Specific Documents. Expert

Systems with Applications, 40, 2530-2540.

[5] Demner-Fushman, D., Chapman, W.W. and McDonald, C.J. (2009) What Can Natural Language Processing Do for

Clinical Decision Support? Journal of Biomedical Informatics, 42, 760-772.

[6] Nothman, J., Ringland, N., Radford, W., Mur phy, T. and Curran, J.R. (2013) Learning Multilingual Named Entity

Recognition from Wikipedia. Artiﬁcial Intelligence, 194, 151-175.

[7] Marcus, M.P. , Santorini, B. and Marcinkiewicz, M.A. (1993) Building a Large Annotated Corpus of English: The Penn

Treebank. Computational Linguistics, 19, pp. 313-330.

[8] Bird, S., Klein, E. and Loper, E. (2009) Natural Language Processing with Python. OReilly Media, USA.

[9] NLTK 3.0 Documentation. http://www.nltk.org/

[10] Brown Corpus Manual. http://icame.uib.no/brown/bcm.html

[11] The NPS Chat Corpus. http://faculty.nps.edu/cmartell/NPSChat.htm

[12] Coprus Readers-Tagged Corpora. http://nltk.googlecode.com/svn/trunk/doc/howto/corpus.html#tagged-corpora