A Preliminary Version of an Internet-Based Picture Naming Test

doi:10.4236/ojml.2012.21005

Open Journal of Modern Linguistics

2012. Vol.2, No.1, 34-41

Published Online March 2012 in SciRes (http://www.SciRP.org/journal/ojml) http://dx.doi.org/10.4236/ojml.2012.21005

Anatoliy V. Kharkhurin

Department of International Studies, American University of Sharjah, Sharjah, UAE

Email: akharkhurin@aus.edu

Received December 16th, 2011; revised February 15th, 2012; accepted February 22nd, 2012

The study presents a web-based productive vocabulary assessment tool, the internet Picture Naming Test

(iPNT). The iPNT is administered online and takes eight minutes to complete. The iPNT assesses vo-

cabulary knowledge by rating participants’ responses to 120 colored drawings of simple objects. Partici-

pants type the names of the objects and the names are saved as a computer file that can be uploaded into

statistical software for further processing. The test is rated by comparing participants’ responses against a

list of correct labels. High test-retest reliability suggests that iPNT can be considered a reliable measure.

The study evaluates convergent validity of the iPNT by comparing its scores with those on paper-based and oral versions of the same test, and concurrent validity by comparing its scores with those on the receptive Peabody Picture Vocabulary Test, a language aptitude Cloze test, the standard admission Test of English as a Foreign Language, and the DIALANG diagnostic tool. Highly significant correlations between the scores on these

tests and iPNT scores suggest that the latter is a suitable assessment tool for language proficiency. How-

ever, the moderate correlation values ranging from .52 to .68 indicate that the use of this test should be

limited to psychometric research assessing an individual’s productive vocabulary knowledge.

Keywords: Online; Language Proficiency; Picture Naming; Test

Introduction

The realities of the contemporary world, with its vast migrations and massive international cultural, political, and economic interactions, have encouraged representatives of different national, ethnic, religious, and social groups to select a common language that facilitates mutual communication. The contemporary lingua franca, English, has become the vehicular language in various areas of human endeavor, including academic communities.

With the increasing number of international academic communities requiring English as a second language, there is a growing demand for testing of language skills. Most educational institutions have an English exam as an admission criterion for non-native speakers (e.g., Test of English as a Foreign Language,

TOEFL; Cambridge English for Speakers of Other Languages

certificates). These tests present a comprehensive assessment of

a variety of linguistic skills (e.g., internet-based TOEFL as-

sesses listening, writing, reading, and speaking skills). The

European Community sponsored an on-line diagnostic lan-

guage assessment system DIALANG1, which is based on the

Common European Framework of Reference (Council of

Europe, 2001). It provides learners with information about their

language proficiency in 14 European languages and informs

them of their Common European Framework level (Chapelle,

2006). This tool assesses the learner’s skills in listening, writing,

reading, structure, and vocabulary.

In addition to educational purposes, testing of linguistic skills

becomes an important issue for empirical research in first and

second language acquisition and bilingualism. A large portion

of studies in these fields either lack control over participants’

language proficiency (see Kharkhurin, 2005, for an overview)

or employ admission tests developed for educational purposes.

However, one apparent problem with using admission tests is their duration, which may be unsuitable for experimental conditions. For example, the internet-based TOEFL may take around four hours (ETS, 2008b) and DIALANG may last for up to two hours. Although this time frame is feasible in a classroom setting, it may become a challenge for time-constrained experimental settings with adults. Therefore, contemporary language-related empirical research focuses on increasing the number of tests that can be completed within a relatively short time interval. These tests use different techniques

and claim to assess different linguistic skills.

Several techniques have gained a reputation as reliable language assessment tools and have thus been widely used

in empirical research. The Cloze procedure (Taylor, 1953) asks

participants to complete written text with various gaps. Exten-

sive empirical investigation presents such tests as assessments

of overall native language proficiency (e.g., Dupuis, 1980; Pe-

terson, Peters, & Paradis, 1972). For example, Dupuis found a

Cloze test to be a good predictor of reading comprehension in

monolingual 10th graders. This test was also shown as a reliable

indicator of foreign language proficiency in second language

learners (e.g., Baldauf & Propst, 1979; Oller, 1972, 1973; Oller

& Conrad, 1971). It was found to strongly correlate with the

TOEFL (Darnell, 1968). Another study demonstrated the ability

of a Cloze test to discriminate between English native speakers

who learned German, Japanese, Russian, or Spanish for the first,

second, or third semester, i.e. the scores on the Cloze test cor-

related with the length of foreign language study (Briere,

Clausing, Senko, & Purcell, 1978). Jochems and Montens

(1987) tested second language Dutch learners on a Cloze test

and four tests of language proficiency: listening, speaking,

reading, and writing, conducted by the national workgroup

Centrale Toets Nederlands. They found that the scores on their

Cloze test highly correlated with all four tests of language pro-

ficiency and appeared to form a solid basis for prediction of the

1http://www.lancs.ac.uk/researchenterprise/dialang/about.htm

total scores for all these tests taken together.

A second widely used test, the Peabody Picture Vocabulary

Test (PPVT) of receptive vocabulary, asks participants to indi-

cate which of the four shown pictures corresponds to a name

spoken by the experimenter. Clinicians and researchers rely on

the test to accurately assess children’s and adults’ single word

lexical knowledge. According to the user’s manual of the fourth

edition of the test (Dunn & Dunn, 2007), PPVT scores corre-

lated with the scores of the second edition of the Expressive

Vocabulary Test (Williams, 2007; mean r = .82 across age

groups), the Comprehensive Assessment of Spoken Language

(Carrow-Woolfolk, 1999; adjusted r ranging from .41 to .79 for

various test activities and age groups), the fourth edition of a

more comprehensive Clinical Evaluation of Language Funda-

mentals (Semel, Wiig, & Secord, 2003; adjusted r ranging

from .67 to .75 for various test activities and age groups), and

the Group Reading Assessment and Diagnostic Evaluation

(Williams, 2001; mean r = .63 across various test levels).

A third commonly used test of productive vocabulary, the

Boston Naming Test (BNT, Kaplan, Goodglass, & Weintraub,

1983) is generally used by clinicians to assess word retrieval

performance of brain-damaged patients. The test consists of 60

outline drawings of objects and animals presented in the order

of word frequency and grade of difficulty. Participants are

asked to name the pictures arranged in a booklet. This test was

recognized as a reliable tool to identify naming deficits and

impaired word-retrieval capacities in a variety of cerebral pa-

thologies in an adult and a childhood population (see Mariën,

Mampaey, Vervaet, Saerens, & De Deyn, 1998, for a summary).

It has been translated into several languages and administered

to healthy populations from a variety of age and gender groups

with different educational backgrounds in various geographic

regions (see Patricacou, Psallida, Pring, & Dipper, 2007, for a

summary).

In contrast to TOEFL and DIALANG, the Cloze test, PPVT,

and BNT require considerably less administration time. However, they still need to be administered and rated by an experimenter. The purpose of the present study is to present a new test that provides a fully automatic, rater-independent, and reliable measure of language proficiency and that can be administered in any location in a relatively short time interval. The test

employs a technique similar to the BNT; that is, participants are

presented with drawings of objects and asked to name these

objects. There are four major differences between the proposed picture naming test (PNT) and the original BNT. First, it is administered on the web and therefore can be accessed worldwide.

Second, the responses are to be provided in a written form (not

orally as in the BNT), which eliminates the need for the ex-

perimenter to be present. Third, it is timed and therefore it en-

sures equal testing time for all participants. This condition ap-

pears to be crucial when the test is administered in an uncon-

trolled manner. Fourth, the written responses are recorded in a

computer file that can be automatically uploaded to statistical

software for further analysis.

An obvious limitation of this test, as well as many other lan-

guage proficiency tests used in psycholinguistic research, is its

inability to assess all four major language skills: speaking,

writing, listening, and reading (cf. Padilla & Ruiz, 1973).

However, the limited testing scope is compensated for by a

short testing time. The high predictive power of these tests may also reconcile researchers to this limitation. Indeed,

Kharkhurin (2005) found in a pilot study that paper-based writ-

ten PNTs in English and Russian highly correlated with the

Cloze procedure in these respective languages (r = .77, p < .01

for English; and r = .83, p < .01 for Russian). Another study

found this test to strongly correlate with participants’ self-rating

of language skills in English and Russian and their self-as-

sessment of the degree of Russian-English bilingualism (Khark-

hurin, 2008). To ensure the concurrent validity of the PNT in

the present study, participants’ performance on this test was

compared with a battery of other measures including an as-

sessment tool (TOEFL), a diagnostic tool (DIALANG), a com-

mon test of language proficiency (a Cloze procedure), and a

widely used test in psycholinguistic research (the PPVT). To

ensure convergent validity of this test, in addition to the inter-

net-based version (iPNT), the PNT was presented to the same

individuals in paper-based written and oral forms.

The objective of the present study was to present the meth-

odology of the web-based productive vocabulary test with three

crucial characteristics: 1) iPNT is internet-based; 2) iPNT is

timed; 3) iPNT automatically produces a statistical software

compatible output file with an individual’s responses. Experi-

ment 1 of the study aimed to provide evidence for convergent

validity of the iPNT by comparing performance on this test

with that on other versions of the PNT, and concurrent validity

of the iPNT by correlating performance on this test with that on

TOEFL, DIALANG, a Cloze test, and the PPVT. Experiment 2

evaluated test-retest reliability of the iPNT by administering the

test to the same group of participants twice.

Experiment 1

Method

Participants

The participants were 87 American University of Sharjah

(United Arab Emirates) students (29 male and 58 female; aged

between 17 and 34, M = 20.10, SD = 2.19) who were recruited

from the General Psychology subject pool. Although they varied in their countries of origin (representing the Middle East, Asia, Africa, North America, and Europe) and therefore in the distribution of their native languages, all of them were fluent in English because English is the language of instruction at

the University.

Instruments and Procedure

After completing an online biographical questionnaire and submitting a copy of their TOEFL certificate, participants were given a battery of language proficiency tests that were distributed between two sessions. One session included the Cloze procedure, DIALANG online testing, and the internet- and paper-based versions of the PNT. One of the PNT versions was presented at the

beginning and the other at the end of the session to minimize

the priming effect. The other session included the PPVT and an

oral version of the PNT. The presentation order of the tests in

both sessions was counterbalanced across participants.

Biographical Questionnaire

An online multilingual and multicultural experience questionnaire2 was administered to determine participants’ linguistic and cultural background. Among other issues, the questionnaire obtained data on each participant’s place of

origin, languages they speak, their assessment of linguistic

2http://surveys.aus.edu/index.php?sid=87644

skills in each of these languages, and age of acquisition of these

languages.

Picture Naming Test

This test was initially designed by Kharkhurin (2005) as a

test of productive vocabulary, which assesses language profi-

ciency as the accuracy of participants’ responses to pictures of

simple objects, a technique similar to the BNT and the one used

by Lemmon and Goggin (1989). The test stimuli are 120 pic-

tures of simple objects (Appendix A) randomly selected from

those scaled by Rossion and Pourtois (2004), an improved ver-

sion of Snodgrass and Vanderwart (1980). The procedure re-

quires participants to produce a name of the object presented in

the picture, which they would normally use in everyday life.

Three versions of the PNT were used in the present study. In

the oral version, each picture was presented separately on the

computer screen using Microsoft PowerPoint’s full screen

mode. Participants were asked to label an object in the picture

by saying its name out loud. There was no time restriction for

this test, but participants were encouraged to respond as fast as

possible. Responses were recorded using Microsoft Sound Re-

corder version 5.1 software and played back during rating. In

the paper-based version, the pictures were arranged on four

pages. Participants wrote down their responses in a booklet

with numbered lines corresponding to the pictures. Each par-

ticipant was given two minutes to label as many as possible of

the 30 pictures on each page. In the internet-based version3, the

pictures are presented using LimeSurvey version 1.72 environ-

ment. They are arranged in four groups each of which appears

on a separate webpage; each picture is accompanied by a 50

character space provided for an answer. The presentation order

of the pictures within each group is randomized. Participants

are given two minutes to label as many as possible of the 30

pictures on each page. The timer in the top left corner of the

page indicates the elapsed time. After the time has elapsed, an “out of time” message appears on the screen and the next page is loaded automatically.

The scoring procedure was the same for all three versions.

Each response was scored either 1 or 0, so that the maximum

number of points for picture naming was 120. A list of appro-

priate labels was generated for each picture. A list of primary

labels was adopted from Snodgrass and Vanderwart (1980).

The average name agreement coefficient4 for these labels

was .50 with 94.79% of participants giving the primary label,

which according to Snodgrass and Vanderwart, suggests high

name agreement for these labels. The word frequency for the

primary list ranged from 1 to 431 (M = 35.01, SD = 70.40) per

million according to Kučera and Francis (1967) and from .12 to

483.06 (M = 33.26, SD = 68.44) per million according to Brys-

baert and New (2009). A list of secondary labels was formed

based on the synonyms for the primary labels obtained by Snodgrass and Vanderwart. The word frequency for both primary and secondary lists ranged from .12 to 509.37 (M = 35.75, SD = 76.60) per million according to Brysbaert and New. If the

participants’ response matched the corresponding item on the

list, they scored 1 point; otherwise, 0 points. Two sets of rating

strategies were used: the primary rating gave a point only if the

label from a primary list was used; the secondary rating gave a

point if the label from either primary or secondary list was used;

the strict rating gave a point if the produced label was spelled

correctly; the lenient rating disregarded the spelling errors.

Therefore, the paper- and internet-based PNTs received four

scores: primary strict, primary lenient, secondary strict, and

secondary lenient, and oral PNT received two scores: primary

and secondary.
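The four rating strategies described above can be sketched in code. This is only a minimal illustration, not the scoring program used in the study: the function names, the case and whitespace normalization, and in particular the use of an edit-distance threshold (`max_typos`) to approximate the lenient rating are assumptions, since the text does not specify how spelling errors were judged. The labels "couch" and "sofa" are hypothetical examples.

```python
from typing import Iterable


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace (an assumed preprocessing step)."""
    return " ".join(text.strip().lower().split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def score_response(response: str, primary: str, secondary: Iterable[str],
                   *, use_secondary: bool = False, lenient: bool = False,
                   max_typos: int = 1) -> int:
    """Return 1 or 0 for one picture under the chosen rating strategy.

    use_secondary=False, lenient=False -> primary strict
    use_secondary=True,  lenient=False -> secondary strict
    use_secondary=False, lenient=True  -> primary lenient
    use_secondary=True,  lenient=True  -> secondary lenient
    """
    answer = normalize(response)
    if not answer:
        return 0
    accepted = {normalize(primary)}
    if use_secondary:
        accepted |= {normalize(label) for label in secondary}
    if answer in accepted:
        return 1
    # Assumption: "lenient" is approximated by tolerating a few typos.
    if lenient and any(edit_distance(answer, label) <= max_typos
                       for label in accepted):
        return 1
    return 0
```

Summing `score_response` over the 120 pictures would give a total between 0 and 120 for each strategy. The primary strict setting needs only exact string matching, which is what makes fully automatic rating straightforward.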

Cloze Procedure

The materials for the Cloze procedure were adopted from the

practice tests for Cambridge English for Speakers of Other

Languages certificates, an examination for people who use

everyday written and spoken English at an upper-intermediate

level for work or study purposes. Participants were asked to

complete text with various linguistic gaps. In the rational selec-

tion procedure (Jongsma, 1980), the words of different lexical

categories were deleted from the text fragments and substituted

by blank spaces; participants had to insert the missing words.

Two versions of the Cloze procedure were employed: a multiple-choice task and an open-end acceptable response task. In the

multiple-choice Cloze task, participants were asked to select

one out of four words that best fits the blank space. In the open-

end acceptable response Cloze task, they were asked to provide

a word that best fits the blank space. Two texts with 15 blank

spaces were supplied for each version of the test. Each partici-

pant received two texts, one of each version, preceded by a

written instruction that explained the procedure and provided an

example. The texts were presented to participants in a counter-

balanced order to prevent any task version or fatigue effect.

Participants had 10 minutes to complete both texts. They were

given 1 point if their answer matched the one in the Cambridge

English for Speakers of Other Languages certificates’ answer

key and 0 otherwise, which resulted in a maximum score of 30 for the Cloze procedure.

Peabody Picture Vocabulary Test IV, Form A

This is a standardized test of receptive vocabulary (Dunn &

Dunn, 2007). The task is to indicate one out of four pictures

shown on the test plate, which corresponds to the name given

by the experimenter. The plates are arranged in order of in-

creasing difficulty and grouped in 19 sets of 12 trials each. A

raw score is calculated using basal and ceiling sets determined

by the scoring procedure. The basal set is the easiest one in

which a participant makes one error or less; the ceiling set is the

one in which a participant makes eight errors or more. Testing

continues until the ceiling set is determined. The raw score is

calculated as a difference between a number of all possible

correct responses (computed as a ceiling set number multiplied

by 12) and a number of errors made by a participant during

testing. The raw score is converted to a standard score by the

recommended procedure, which takes age-related norms into

account.
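The raw-score rule described above amounts to a single subtraction, sketched here for illustration. The function name and the input checks are assumptions. The conversion to a standard score is not shown because it depends on the age-related norm tables of the test manual.

```python
ITEMS_PER_SET = 12   # each PPVT set contains 12 trials
NUMBER_OF_SETS = 19  # the plates are grouped in 19 sets


def ppvt_raw_score(ceiling_set: int, total_errors: int) -> int:
    """Raw score = (ceiling set number x 12) - number of errors."""
    if not 1 <= ceiling_set <= NUMBER_OF_SETS:
        raise ValueError("ceiling_set must be between 1 and 19")
    ceiling_item = ceiling_set * ITEMS_PER_SET
    if not 0 <= total_errors <= ceiling_item:
        raise ValueError("total_errors is out of range")
    return ceiling_item - total_errors
```

For example, a participant whose ceiling set is set 15 and who made 21 errors during testing would obtain a raw score of 15 × 12 − 21 = 159.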

3http://surveys.aus.edu/index.php?sid=31316

4Snodgrass and Vanderwart (1980) defined a name agreement coefficient as

a distribution of names given to a picture across participants. A picture that

obtained the same name from every participant had a name agreement coef-

ficient equal to .00 (perfect name agreement). A picture that obtained ex-

actly two different names with equal frequency had a name agreement coef-

ficient equal to 1.00. Increasing name agreement value indicated decreasing

name agreement.
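The two reference values given in footnote 4 (.00 when every participant gives the same name, 1.00 when two names are given equally often) are consistent with an information statistic of the form H = Σ p_i log2(1/p_i), where p_i is the proportion of participants giving the i-th name. The sketch below assumes that reading; the function name and the example names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def name_agreement_h(names: Sequence[str]) -> float:
    """Entropy-style name agreement: 0.0 means perfect agreement."""
    if not names:
        raise ValueError("at least one name is required")
    counts = Counter(names)
    total = sum(counts.values())
    # Each distinct name contributes p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

With ten participants who all answer "dog" the function returns 0.0, and with five "sofa" and five "couch" answers it returns 1.0, matching the two cases in the footnote.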

Test of English as a Foreign Language

The standard TOEFL certificate comes in three versions:

internet-based, computer-based, and paper-based. The reliabil-

ity estimates for the test range from .74 to .85 for different

skills and .94 for the total score (ETS, 2008a). Zhang (2008)

compared the test scores of 12,385 examinees who have taken

two internet-based TOEFLs within a period of one month. The

correlations of their scores on the two test forms were .77 for

listening and writing sections, .78 for reading, .84 for speaking,

and .91 for the total test score.

The American University of Sharjah admission procedure

requires all students to obtain a minimum of 71 for internet-

based, 197 for computer-based, or 530 for paper-based TOEFL.

This requirement ensured that all participants had their TOEFL

certificate, and therefore they were asked to submit it before the

beginning of the testing. The obtained scores from different

certificate versions were converted into computer-based scores

using TOEFL score comparison conversion tables (ETS, 2005).

The computer-based scores for listening, reading, writing, and

total were used in the further analyses. It is important to note

however that different participants have taken the TOEFL ex-

amination at different times before the current testing (ranging

from 1 to 7 years, M = 3.07, SD = 1.09), and therefore their

scores cannot be considered an accurate measure of their language proficiency at the time of testing.

DIALANG

The English version of DIALANG was used in the present

study. This testing system consists of a number of activities

assessing the linguistic skills in five domains on three levels of

difficulty (see Alderson & Huhta, 2005, for detailed descrip-

tion). In the beginning, participants are asked to do a Vocabu-

lary Size Placement Test (VSPT), which is used to estimate the

vocabulary size and to determine the level of subsequent testing.

In the VSPT, participants have to decide whether the letter

string presented is a word or a non-word (e.g., “to study” is a

word, “to futt” is a non-word). The test uses 75 verbs (50 words

and 25 non-words) presented in a random order, and the VSPT

score range is 1 - 1000. After the placement test, participants

are presented with five modules assessing linguistic skills in

listening, writing, reading, structure, and vocabulary, which

they can take in the order of their preference. The first three

modules are preceded by a self-assessment questionnaire, in

which participants are asked to make judgments about their

abilities in the selected language skill by validating 18 state-

ments per skill (e.g., for listening: “I can catch the main points

in broadcasts on familiar topics and topics of personal interest

when the language is relatively slow and clear.”). The self-

assessment is also used to determine the level of subsequent

testing. After completion of both the VSPT and the self-as-

sessment, the system combines the two results to decide which

level of linguistic skill testing to administer. In the listening

module, participants hear a short vocal presentation and receive

questions based on this presentation. In the writing module,

participants are asked to fill in the gaps in the text. In the read-

ing module, they are asked to read a short text and answer

questions based on this material. In the structure module, par-

ticipants’ knowledge of grammar is probed. Finally, the vo-

cabulary module assesses participants’ understanding of the

words. The test items come in four different formats: multiple

choice, drop-down menus, text entry and short-answer ques-

tions. All self-assessment and testing modules are scored ac-

cording to six levels of the Common European Framework of

Reference scale (Council of Europe, 2001) listed in the order of

increased proficiency: A1, A2, B1, B2, C1, C2.

Results

The purpose of this experiment was to investigate the valid-

ity of the internet-based PNT. Therefore, participants’ iPNT

scores were compared with the paper-based and oral versions of

the PNT as well as with the scores on the standardized meas-

ures of language proficiency: TOEFL, PPVT, Cloze, and DIA-

LANG.

Picture Naming Test

The mean scores for all three versions of the PNT are pre-

sented in Table 1 and the correlations between these scores are

shown in Table 2. Within each testing mode, the scores produced by the different rating strategies were nearly perfectly correlated (r = .95 to 1.00). The correlations between different PNT versions were also high and significant (r = .58 to .77). These results suggest that the various testing modes and rating strategies assess vocabulary knowledge similarly. This finding provides a justification for employing the internet-based PNT version that can be rated using

the primary list of labels and strict spelling. The primary strict

iPNT scores are used in the further analyses.

Note however that all three versions differ in the magnitude

of obtained scores. Oral PNT rated with the primary list of la-

bels obtained significantly higher scores than its internet- and

paper-based counterparts rated with the lenient condition5 (t =

4.73, p < .001 and t = 2.81, p < .01, respectively); the latter two

scores were not significantly different. When the secondary list

of labels was allowed, the oral PNT obtained the highest scores

followed by paper-based (t = –2.68, p < .01) and internet-based

PNTs (t = –7.39 and t = –3.77, respectively, both ps < .001).

Participants were also found to obtain significantly higher

scores on the paper-based PNT rated with the primary list of

labels using the strict condition than on the iPNT rated with the

same strategy (ΔM = 6.00, t = 4.66, p < .001).

TOEFL

Table 3 presents correlations for the computer-based TOEFL

scores for listening, reading, writing, and total. The multiple

regression analysis revealed that listening and reading scores

Table 1.
Mean scores and standard deviations (SD) for internet-based, paper-based, and oral PNTs obtained by applying four rating strategies: primary strict, secondary strict, primary lenient, and secondary lenient.

PNT version/Rating strategy          Mean      SD
Internet-based/Primary strict       72.05   17.33
Internet-based/Secondary strict     83.39   14.39
Internet-based/Primary lenient      74.82   17.82
Internet-based/Secondary lenient    86.06   14.89
Paper-based/Primary strict          78.05   17.29
Paper-based/Secondary strict        85.54   16.07
Paper-based/Primary lenient         83.30   18.40
Paper-based/Secondary lenient       91.72   17.71
Oral/Primary                        89.18    9.18
Oral/Secondary                      95.66    8.66

5Strict condition in rating of the internet- and paper-based PNTs cannot be

compared with the oral PNT, because the latter involves no spelling.

Table 2.
Pearson correlations between internet-based, paper-based, and oral PNT scores obtained by applying four rating strategies: primary strict, secondary strict, primary lenient, and secondary lenient. Column numbers correspond to the numbered rows (2 to 4: internet-based; 5 to 8: paper-based; 9 and 10: oral).

                                         2     3     4     5     6     7     8     9    10
1. Internet-based primary strict       .96  1.00   .96   .76   .71   .76   .69   .62   .60
2. Internet-based secondary strict           .95   .99   .68   .66   .67   .63   .61   .58
3. Internet-based primary lenient                  .96   .76   .72   .77   .70   .63   .61
4. Internet-based secondary lenient                      .67   .66   .67   .64   .61   .58
5. Paper-based primary strict                                  .97   .99   .96   .68   .68
6. Paper-based secondary strict                                      .96   .99   .66   .67
7. Paper-based primary lenient                                             .97   .67   .68
8. Paper-based secondary lenient                                                 .65   .66
9. Oral primary                                                                        .98
10. Oral secondary

All ps < .001.

were significant predictors of the total score (F(3, 83) = 57.44, p < .001, adjusted R² = .67; b = 6.39, SE = .63, β = .69, t = 10.18, p < .001 for listening; and b = 1.26, SE = .343, β = .25, t = 3.66, p < .001 for reading).

DIALANG

The DIALANG six proficiency levels were ranked 1 through

6 with a greater rank representing higher proficiency level. First,

it was found that the correlations between the self-assessment

and testing scores in all three modules for which the self-assessment was administered were significant (ρ = .32, p

< .01, for listening; ρ = .43, p < .001, for writing; and ρ = .27, p

< .05, for reading). The correlations between the VSPT and all

testing scores were also highly significant (see Table 4).

Proficiency Tests Comparison

The iPNT, PPVT, Cloze test, total TOEFL, and DIALANG

placement scores significantly correlated with each other (see

Table 5). The iPNT was also found to significantly correlate

with DIALANG scores for listening (ρ = .53, p < .001), writing

Table 3.
Pearson correlations between TOEFL scores.

                     2       3       4
1. Listening     .36**     .18  .79***
2. Reading                .27*  .51***
3. Writing                        .23*
4. Total

*p < .05, **p < .01, ***p < .001.

Table 4.
Spearman correlations between DIALANG VSPT and testing scores.

                     2     3     4     5     6
1. VSPT            .47   .45   .42   .44   .55
2. Listening             .57   .63   .52   .48
3. Writing                     .61   .60   .53
4. Reading                           .55   .49
5. Structure                               .62
6. Vocabulary

All ps < .001.

(ρ = .55, p < .001), reading (ρ = .57, p < .001), structure (ρ

= .48, p < .001), and vocabulary (ρ = .57, p < .001) modules. In

addition, it correlated significantly with TOEFL scores for lis-

tening (r = .54, p < .001) and reading (r = .31, p < .01) modules.

The respective TOEFL and DIALANG scores for listening (ρ

= .57, p < .001), writing (ρ = .54, p < .001), and reading (ρ

= .41, p < .001) also significantly correlated with each other.

Experiment 2

A different group of participants from the same subject pool

was recruited for this experiment. The participants were 130

students (45 male and 85 female; aged between 17 and 26, M =

19.94, SD = 1.82). They were administered the iPNT twice with

a 35-day lag between the sessions. The responses were rated

using the primary list of labels and strict spelling (see above). A

highly significant correlation (r = .83, p < .001) between the

iPNT scores on the first and the second sessions suggests a high

test-retest reliability of the assessment tool.
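A test-retest coefficient of this kind is a Pearson correlation between the scores from the two sessions. The sketch below shows the computation on a small set of made-up scores. The data, variable names, and function name are purely illustrative and are not taken from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical primary strict iPNT scores for five participants.
session_1 = [60, 72, 85, 90, 55]
session_2 = [64, 70, 88, 93, 58]
retest_r = pearson_r(session_1, session_2)
```

A value of `retest_r` close to 1 would indicate that participants keep roughly the same rank order and spacing across the two sessions.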

Discussion

The study presents a new psychometric tool assessing lan-

guage proficiency with respect to an individual’s productive

vocabulary. The iPNT is an internet-based test that assesses

vocabulary knowledge by rating participants’ responses to 120

colored drawings of simple objects. Participants are given eight

minutes to type the names of the objects, which are subsequently compared against a list of correct labels.

To employ an automatic rating of the iPNT, only those re-

sponses that perfectly match corresponding items from a list of

correct responses were scored a point. To ensure convergent

Table 5.
Pearson correlations between various language proficiency tests.

                       2     3     4     5
1. iPNT              .60   .53   .68   .52
2. PPVT                    .44   .57   .43
3. Cloze                         .60   .47
4. TOEFL total                         .45
5. DIALANG VSPT

All ps < .001.

validity of this scoring, four rating strategies were applied to

participants’ responses: primary strict, primary lenient, secon-

dary strict and secondary lenient. The findings that all four

rating strategies provided highly correlated results justify the

iPNT rating based on the primary list of labels and strict spell-

ing. According to this rating schema, only those responses that

perfectly match correct labels should score a point. Therefore, a

simple algorithm implemented in a computer can process the

responses and automatically provide a language proficiency

score.

Three versions of the PNT (internet-based, paper-based, and oral) were administered to participants. Although all three

versions obtained significantly different scores, they were

found to correlate with each other highly. These findings elimi-

nate potential bias that may have occurred due to change in

media used to administer the test. The iPNT can be safely used

to control for language proficiency in a sample assessed with

the same version of the test. However, it is not recommended to

use different PNT versions in the same sample.

To test the concurrent validity of the iPNT as a measure of

language proficiency, its scores were compared with those of

tests assessing different linguistic skills: the receptive vocabulary

PPVT, the overall language proficiency Cloze test, the admission

TOEFL, and the DIALANG diagnostic system. Although correlations

between the iPNT and the other tests were highly significant,

the correlation values ranging from .52 to .68 indicate that this

test provides a partial assessment of the linguistic abilities

measured by other tests in this study. This limitation is common

to most abbreviated language proficiency tests and is compensated

by their efficient administration. Note that the purpose

of the iPNT is to assess an individual’s productive vocabulary

knowledge. Therefore, it can only be used in research that taps

into this specific linguistic ability. For example, Kharkhurin

(2011) used the paper-based PNT to assess bilinguals’ vocabulary

knowledge and relate it to their performance on the tests of

selective attention, creativity, and fluid intelligence. Greater

vocabulary knowledge was hypothesized to facilitate certain

cognitive mechanisms underlying performance on these tests.

In this framework, a productive vocabulary test was an appropriate

measure of bilinguals’ language proficiency.
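One conventional way to read the "partial assessment" noted above is to square each correlation, which gives the proportion of variance the two measures share. The sketch below applies this to the iPNT correlations reported in Table 5. This reading is a standard statistical interpretation offered as an illustration, not an analysis reported by the study, and the helper name `shared_variance` is hypothetical.

```python
# Correlations between the iPNT and each criterion test, as reported in Table 5.
ipnt_correlations = {
    "PPVT": 0.60,
    "Cloze": 0.53,
    "TOEFL total": 0.68,
    "DIALANG VSPT": 0.52,
}


def shared_variance(r):
    """Coefficient of determination: proportion of variance two measures share."""
    return r ** 2


for test, r in ipnt_correlations.items():
    print(f"{test}: r = {r:.2f}, r^2 = {shared_variance(r):.2f}")
```

The squared values fall between roughly .27 and .46, so on this reading the iPNT shares less than half of its variance with any of the other tests.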

Another potential limitation of this test stems from its pro-

cedure. The participants have to type their responses within a

limited time interval, which presents a potential disadvantage

for poor typists. The sample of the present study consisted of

college students with presumably extensive typing experience,

but even in this sample the scores on the paper version were

higher than on the internet version when participants’ responses

were rated using the primary list of labels with strict spelling.

Future studies should look into this issue by establishing

age- and education-related norms that reflect participants’ typing

abilities.

In conclusion, the current preliminary version of the iPNT

presents a new reliable tool that offers a number of advantages

to empirical investigations that involve language proficiency

assessment. This test can be administered online in a

relatively short period of time (eight minutes) without involve-

ment of any additional resources. The data file with partici-

pants’ responses can be uploaded into statistical software for

further processing. In a new version of this test (see footnote 6), the iPNT score

is calculated within the testing environment and test users are

provided with an outcome immediately upon completion of the

test. Another important advantage of this test is its suitability

for any language. The iPNT can be used in any language, provided

that a list of correct labels in that language is supplied. The

current version of the test includes an interface option to upload

a list of labels in a given language. The administrative convenience

and online accessibility of the test encourage future studies to

provide norms for this test by collecting data in a broad range

of linguistic and cultural groups.

REFERENCES

Alderson, J. C., & Huhta, A. (2005). The development of a suite of

computer-based diagnostic tests based on the Common European

Framework. Language Testing, 22, 301-320.

doi:10.1191/0265532205lt310oa

Baldauf, R. B., & Propst, I. K. (1979). Matching and multiple-choice

cloze tests. Journal of Educational Research, 72, 321-326.

Briere, E. J., Clausing, G., Senko, D., & Purcell, E. (1978). A look at

cloze testing across languages and levels. Modern Language Journal,

62, 23-26.

Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis:

A critical evaluation of current word frequency norms and the

introduction of a new and improved word frequency measure for

American English. Behavior Research Methods, 41, 977-990.

doi:10.3758/BRM.41.4.977

Carrow-Woolfolk, E. (1999). Comprehensive assessment of spoken lan-

guage. Circle Pines, MN: American Guidance Service.

Chapelle, C. A. (2006). DIALANG: A diagnostic language test in 14

European languages. Language Testing, 23, 544-550.

doi:10.1191/0265532206lt341xx

Council of Europe. (2001). Common European Framework of Refer-

ence for languages: Learning, teaching, assessment. Cambridge, MA:

Cambridge University Press.

Darnell, D. K. (1968). The development of an English language profi-

ciency test of foreign students, using a clozentropy procedure (No.

Bureau No. BR-7-H-OlO). Boulder, CO: Colorado University.

Dunn, L. M., & Dunn, D. M. (2007). Peabody picture vocabulary Test-

IV. Circle Pines, MN: American Guidance Service.

Dupuis, M. M. (1980). The cloze procedure as a predictor of compre-

hension in literature. Journal of Educational Research, 74, 27-33.

ETS. (2005). TOEFL® internet-based test: Score comparison tables.

Princeton, NJ: Educational Testing Service.

ETS. (2008a). Reliability and comparability of TOEFL® iBT scores.

Princeton, NJ: Educational Testing Service.

ETS. (2008b). TOEFL iBT and PBT: A comparison. Princeton, NJ:

Educational Testing Service.

Jochems, W., & Montens, F. (1987). De multiple-choice cloze-toets als

algemene taalvaardigheidstoets. Tijdschrift voor Onderwijsresearch,

12, 133-143.

Jongsma, E. A. (1980). Cloze instruction research: A second look. Ne-

wark, DE: International Reading Association.

Kaplan, E., Goodglass, H., & Weintraub, S. (1983). The Boston naming

test. Philadelphia, PA: Lea & Febiger.

Kharkhurin, A. V. (2005). On the possible relationships between bilin-

gualism, biculturalism and creativity: A cognitive perspective. Un-

published Dissertation, New York: City University of New York.

Kharkhurin, A. V. (2008). The effect of linguistic proficiency, age of

second language acquisition, and length of exposure to a new cultural

environment on bilinguals’ divergent thinking. Bilingualism: Lan-

guage and Cognition, 11, 225-243.

Kharkhurin, A. V. (2011). The role of selective attention in bilingual

creativity. Creativity Research Journal, 23, 239-254.

doi:10.1080/10400419.2011.595979

Kučera, H., & Francis, W. N. (1967). Computational analysis of

present-day American English. Providence, RI: Brown University Press.

Lemmon, C. R., & Goggin, J. P. (1989). The measurement of bilingua-

lism and its relationship to cognitive ability. Applied Psycholinguis-

tics, 10, 133-155. doi:10.1017/S0142716400008493

6. http://www.harhur.com/research/ipnt.html

Mariën, P., Mampaey, E., Vervaet, A., Saerens, J., & De Deyn, P. P.

(1998). Normative data for the Boston naming test in native

Dutch-speaking Belgian elderly. Brain and Language, 65, 447-467.

doi:10.1006/brln.1998.2000

Oller, J. W. Jr. (1972). Scoring methods and difficulty levels for cloze

tests of proficiency in English as a second language. Modern Lan-

guage Journal, 56, 151-158. doi:10.2307/324037

Oller, J. W. Jr. (1973). Cloze tests of second language proficiency and

what they measure. Language Learning, 23, 105-118.

doi:10.1111/j.1467-1770.1973.tb00100.x

Oller, J. W. Jr., & Conrad, C. A. (1971). The cloze technique and ESL

proficiency. Language Learning, 21, 183-195.

doi:10.1111/j.1467-1770.1971.tb00057.x

Padilla, A. M., & Ruiz, R. A. (1973). Latino mental health: A review of

literature. Washington DC: US Government Printing Office.

Patricacou, A., Psallida, E., Pring, T., & Dipper, L. (2007). The Boston

naming test in Greek: Normative data and the effects of age and

education on naming. Aphasiology, 21, 1157-1170.

doi:10.1080/02687030600670643

Peterson, J., Peters, N., & Paradis, E. (1972). Validation of the cloze

procedure as a measure of readability with high school, trade school,

and college populations. In F. B. Greene (Ed.), Investigations relat-

ing to mature readers, twenty-first yearbook of the National Reading

Conference (pp. 45-50). Milwaukee: The National Reading Confer-

ence, Inc.

Rossion, B., & Pourtois, G. (2004). Revisiting Snodgrass and Vander-

wart’s object pictorial set: The role of surface detail in basic-level

object recognition. Perception, 33, 217-236. doi:10.1068/p5117

Semel, E. M., Wiig, E. H., & Secord, W. (2003). Clinical evaluation of

language fundamentals (3rd ed.). San Antonio, TX: Psychological

Corporation.

Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260

pictures: Norms for name agreement, image agreement, familiarity,

and visual complexity. Journal of Experimental Psychology: Human

Learning & Memory, 6, 174-215. doi:10.1037/0278-7393.6.2.174

Taylor, W. L. (1953). Cloze procedure: A new tool for measuring

readability. Journalism Quarterly, 30, 415-433.

Williams, K. T. (2001). Group reading assessment and diagnostic

evaluation. Circle Pines, MN: American Guidance Service.

Williams, K. T. (2007). Expressive vocabulary test (2nd ed.). Circle

Pines, MN: American Guidance Service.

Zhang, Y. (2008). Repeater analyses for TOEFL® iBT (ETS Research

Memorandum No. RM.08-05). Princeton, NJ: Educational Testing

Service.

Appendix A. Picture Naming Test stimuli

1. [I] rolling pin
2. [I] pen
3. [I] umbrella
4. [I] nose
5. [I] doorknob
6. [I] box
7. [I] bicycle
8. [I] rabbit
9. [I] refrigerator
10. [I] duck
11. [I] leaf
12. [I] coat
13. [I] frog
14. [I] doll; [II] baby; [II] little girl
15. [I] screwdriver
16. [I] kettle; [II] tea kettle; [II] teapot
17. [I] cap
18. [I] pants
19. [I] brush
20. [I] sweater
21. [I] pineapple
22. [I] dresser; [II] bureau; [II] chest; [II] chest of drawers
23. [I] zebra
24. [I] basket
25. [I] cake
26. [I] truck
27. [I] blouse; [II] shirt; [II] jacket
28. [I] dress
29. [I] key
30. [I] nail
31. [I] butterfly
32. [I] mouse
33. [I] kangaroo
34. [I] mountain
35. [I] mushroom
36. [I] hanger
37. [I] lamp
38. [I] cigar
39. [I] balloon
40. [I] baby carriage; [II] carriage
41. [I] chair
42. [I] eye
43. [I] snake
44. [I] pear
45. [I] bell
46. [I] hat
47. [I] grapes
48. [I] fork
49. [I] helicopter
50. [I] light bulb
51. [I] ruler
52. [I] seal
53. [I] car; [I] Lincoln
54. [I] wrench
55. [I] rhinoceros
56. [I] donkey
57. [I] hammer
58. [I] horse
59. [I] whistle
60. [I] sandwich
61. [I] sock
62. [I] rocking chair
63. [I] hand
64. [I] strawberry
65. [I] clothespin
66. [I] paintbrush; [II] brush
67. [I] flag
68. [I] clown
69. [I] watermelon
70. [I] anchor
71. [I] rooster; [II] chicken
72. [I] wine glass; [II] glass; [II] goblet
73. [I] chicken; [II] hen
74. [I] pipe
75. [I] frying pan; [II] pan
76. [I] windmill
77. [I] corn
78. [I] moon; [II] quarter moon; [II] crescent moon; [II] half moon
79. [I] saltshaker
80. [I] arrow
81. [I] turtle
82. [I] harp
83. [I] stool
84. [I] church
85. [I] nut
86. [I] motorcycle
87. [I] flower
88. [I] traffic light; [II] stop light
89. [I] goat
90. [I] cup
91. [I] camel
92. [I] train
93. [I] ant
94. [I] dog
95. [I] toothbrush
96. [I] swan
97. [I] saw
98. [I] violin
99. [I] spool of thread; [II] thread; [II] spool
100. [I] baseball bat; [II] bat
101. [I] star
102. [I] cigarette
103. [I] pitcher
104. [I] envelope
105. [I] drum
106. [I] spoon
107. [I] television; [II] tv; [II] television set
108. [I] pencil
109. [I] wheel
110. [I] iron
111. [I] apple
112. [I] scissors
113. [I] canon
114. [I] shirt
115. [I] caterpillar
116. [I] owl
117. [I] bus
118. [I] beetle
119. [I] axe
120. [I] light switch

[I] indicates the words from the primary list and [II] the words from the secondary

list.