Open Journal of Modern Linguistics
2013. Vol.3, No.3, 182-189
Published Online September 2013 in SciRes
Copyright © 2013 SciRes.
The Discrimination of English Vowels by Cantonese ESL
Learners in Hong Kong: A Test of the Perceptual
Assimilation Model
Alice Y. W. Chan
City University of Hong Kong, Hong Kong, China
Received January 6th, 2013; revised March 1st, 2013; accepted March 9th, 2013
Copyright © 2013 Alice Y. W. Chan. This is an open access article distributed under the Creative Commons At-
tribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the
original work is properly cited.
This article discusses the results of a study which investigated Cantonese ESL learners’ perception of
English vowels and their perceived similarity between similar L1 and L2 vowels in an attempt to test the
prediction of the Perceptual Assimilation Model (PAM). Forty university English majors participated in
three L2 perception tasks, which aimed at discerning their perception of English vowels spoken in different
contexts, and one L1-L2 speech perception task, which aimed at discerning their classification of L2
vowels into native vowel categories and their perceived similarity between similar L1 and L2 vowels. It
was found that their classifications of English vowels into Cantonese vowels and their perception of the
corresponding English vowels did not provide strong support for the prediction of the model. The effects
and extent of native language phonological influence are yet to be determined.
Keywords: Second Language Acquisition; Speech Perception; Phonetics and Phonology
A lot of research into second language phonology acquisition
is centered around speech production, and mother tongue in-
fluence has often been argued as one major contributor to
learner difficulties, in the sense that L2 sounds which are dif-
ferent from the L1 sounds are often difficult to produce. Mother
tongue influence is, however, prevalent not just in the speech
production arena, as research also shows that it has tremendous
effects on the perception of L2 speech sounds, albeit in a dif-
ferent fashion. Flege (1995), for example, argues in his Speech
Learning Model (SLM) that the more similar an L2 sound is to
an L1 sound, the more problems an L2 learner will have in
perceiving the L2 sound, because L2 learners are likely to judge
L2 sounds as realizations of an L1 category. If L2 learners can
detect the phonetic differences between an L2 sound and the
nearest L1 sound, then they can perceive the L2 sound more
easily. If not, problems will arise. Similarities, rather than dif-
ferences, between the native and target languages are thus seen
as the main contributor to learner difficulties. Another well-
known model which attributes L2 learners’ discrimination prob-
lems to the phonetic similarity between L1 and L2 sounds is the
Perceptual Assimilation Model (PAM), to which the focus of
the present article will turn.
Perceptual Assimilation Model
The Perceptual Assimilation Model (PAM), developed by
Best (1994), proposes that non-native contrasts are perceived in
terms of their phonetic similarity to the phonological categories
present in a listener’s native language (Harnsberger, 2001). It
posits that “non-native speech perception is strongly affected by
listeners’ knowledge (whether implicit or explicit) of native
phonological equivalence classes, and that listeners perceptu-
ally assimilate non-native phones to native phonemes whenever
possible, based on detection of commonalities in the articula-
tors, constriction locations and/or constriction degrees used”
(Best, 1993; cited in Best, McRoberts, & Goodell, 2001: p. 777).
The similarity between the native and target languages is seen
as a vital factor determining L2 speech perception, as the de-
gree of gestural similarity determines the matching between na-
tive phoneme categories and non-native phones. A listener will
not be able to detect discrepancies between native and nonna-
tive phonemes if he or she perceives the nonnative phones to be
very similar to a native phoneme category in their articulatory-
gestural properties.
Best, McRoberts and Sithole (1988) (cited in Best, 1994) have
listed four patterns of assimilation, which can be used to predict
how well listeners will discriminate different foreign sounds
from one another.
1) TC (Two Categories): The members of a non-native con-
trast may be gesturally similar to two different native phonemes,
thereby assimilated to two categories;
2) SC (Single Category): The non-native phones may assimi-
late equally well, or poorly, to a single native category;
3) CG (Category Goodness): The non-native contrasts may
both be assimilated to a single native category, with one more
similar than the other to the native phoneme; and
4) NA (Non-assimilable): The non-native sounds may be too
discrepant from the gestural properties of any native categories
to be assimilated into any categories of the native phonology.
These should be perceived as non-speech sounds.
According to the PAM, only some non-native contrasts are
difficult for mature listeners (phonologically sophisticated lis-
teners) to discriminate, whereas others should be easy to discri-
minate even without prior training or exposure. The discrimina-
tion performance pattern for adults from highest performance to
lowest should be: TC > (NA < = > CG) > SC (Best, 1994).
Such a prediction assumes strong phonological influence from
the L1, and the perceptual variations depend on the differences
in the gestural similarities and discrepancies between the non-
native contrasts and the native phonemes. For NA contrasts,
discrimination performance depends on how similarly the two
sounds are perceived to be non-speech sounds. It was claimed,
in Best (1994), that the pattern of performance they obtained
with adult listeners across several experiments with non-native
speech contrasts had been consistent with this prediction. Other
research studies carried out by Best and her collaborators also
support this prediction (e.g. Best, McRoberts, & Goodell, 2001).
Current Research into the PAM
Since the introduction of the PAM, a number of research
studies have been carried out to test its proposals and/or to
investigate L2 or foreign language learners’ speech perception
abilities. Aoyama (2003), for example, investigated Korean and
Japanese speakers’ perception of English nasals to examine
how learners’ L1 influenced the perception of L2 segments. It
was found that the speakers’ performance was consistent with
the prediction of the PAM: The final /n/-/ŋ/ contrast was
particularly difficult, because neither sound was consistently
classified with one L1 category and the same L1 categories were
used for both. On the other hand, Kingston (2003) obtained
data incompatible with the claims of the PAM in his study of
the ability of American English learners to categorize German
non-low vowels: He found that pairs of vowels contrasting
minimally for the same feature in German often would not as-
similate in the same way to English vowels, so some instances
of the same contrast between German vowels were more easily
discriminated than others. The ease with which a learner could
tell one non-native phoneme from another, thus, did not vary
directly with the extent to which these sounds assimilated to
different native phonemes. In his investigation of the produc-
tion and perception of Australian English vowels by Vietnam-
ese and Japanese ESL speakers, Proctor (2004) also argued that
although the PAM was useful at explaining some aspects of L2
phonology, there was a need for a more unified approach which
could account for other issues such as temporal transfer (the
transfer of skills in the perception of duration). Other research
studies which claimed to have found supporting evidence for
the assertions or basic premises of the PAM include Imsri
(2003), who found that inexperienced learners perceived non-
native sounds according to their L1 inventory; and Pilus (2002),
whose data pointed to learners’ better perception abilities than
production abilities. Those which raised problems for the PAM
or implicated factors other than perceptual assimilation include
Harnsberger (2001), who argued that discriminability of non-
native contrasts was a function of the similarity of non-native
sounds to each other in a multidimensional, phonologized per-
ceptual space; and Strange, Akahane-Yamada, Kubo, Trent, Ni-
shi and Jenkins (1998) and Strange, Akahane-Yamada, Kubo,
Trent and Nishi (2001), who argued that identification and dis-
crimination of L2 vowels varied significantly as a function of
the contexts in which they were produced and presented.
Phonology Acquisition by Cantonese ESL
Learners in Hong Kong
Many research studies have been carried out to investigate
Cantonese ESL learners’ second language phonology acquisi-
tion, most of which focus on learners’ difficulties in the produc-
tion of English speech sounds (e.g. Bolton and Kwok, 1990; A.
Y. W. Chan, 2006a, 2006b, 2007; C. Y. H. Chan, 2005, 2007;
Chan & Li, 2000; Hung, 2000, 2005; Lo, 2007; Stibbard, 2004).
Both segmental problems (including problems in vowels, in
consonants and in consonant clusters) and suprasegmental pro-
blems (such as word stress and rhythm) have been documented.
With regard to the production of English vowels, substitution
by a near sound in the native language has been reported as a
most common strategy used to cope with problematic English
sounds non-existent in the L1. For example, English /æ/, a
short vowel not found in Cantonese, is often replaced by a
similar Cantonese vowel in production, namely /e/, as in words
such as leng3 /leŋ/ (The number at the end of each Cantonese
word is a tone mark indicating one of the nine distinctive tones
in Cantonese). English tense and lax vowel pairs, such as /i:/
and /ɪ/, /u:/ and /ʊ/, and /ɔ:/ and /ɒ/, have often become
indistinguishable in length in the speech of Cantonese ESL
learners. “Depending on individual learners, some may use a
short vowel for a long one, others a long vowel for a short one;
still others may produce a vowel sound which is somewhere in
between the long and short vowels when pronouncing either
one” (Chan & Li, 2000: pp. 80-81; see also Stibbard, 2004).
Other widespread mispronunciation features include the
unnecessary lip-rounding in the production of the central vowel
/ɜ:/ (e.g. in words such as bird) and the substitution of pure
vowels for diphthongs (e.g. /a:/ for /aɪ/ in words such as time).
These problems in producing English vowels are often explained
in terms of the inventory gaps between the two languages: L2
sounds non-existent in the native language are more difficult
than those shared by both the native and target phoneme
inventories, and the substitution sounds often bear some
articulatory and acoustic resemblance to the closest L1 sounds.
Research into the perception of English speech sounds by
Cantonese ESL learners in Hong Kong has, to the author’s
knowledge, been very scarce. Chan (2001) is one notable
exception. She studied Cantonese ESL learners’ perception of
English word-initial consonants and found a positive correlation
between perception problems and production problems: Learners
who consistently demonstrated perceptual confusion for the
contrast pairs (/v, w/, /θ, f/, /ð, d/, /z, s/ and /r, w/) also
demonstrated confusion in production, and the target items
/v, θ, ð, z, r/ were often misperceived as the same as their
mispronounced versions /w, f, d, s, w/ respectively. Chan (2001)
explained the results in terms of the suggestion of Bradlow et
al. (1997) that there
might be a common mental representation determining both
speech perception and speech production. Her data also sup-
ported Flege’s (1991, 1992) model of L2 speech learning, that
“L2 learners tend to perceive L2 sounds categorically within
the sound classes of their L1” (Chan, 2001: p. 39). Another
study which incorporated speech perception is Hung (2000). In
this study of the phonology of Hong Kong English, Hung con-
ducted a perception test of English vowels and found that his
subjects could not distinguish pairs of vowels such as /i:/ and
/ɪ/, and /æ/ and /e/. The focus of the study, however, was on
production, and the perception tests were just meant to provide
further support for the production data obtained in the study and
the acoustic analyses given rather than to investigate learners’
perceptual abilities. No systematic research, to the author’s
knowledge, has been carried out to investigate the perception of
English vowels by Cantonese ESL learners in Hong Kong, nor
has there been any attempt to attribute learners’ perception diffi-
culties or abilities to established models such as the PAM. The
present research, which is a sub-study of a large-scale project
on the perception and production of L2 speech sounds by Can-
tonese ESL learners in Hong Kong, serves to bridge this re-
search gap.
The Study
The present study examined ESL learners’ perception of L2
vowels and their perceived relations between L1 and L2 vowels
with an aim to investigate the extent to which the prediction of
the PAM regarding different pairs of non-native contrasts is
valid for explaining second language phonology acquisition by
Cantonese ESL learners in Hong Kong.
A group of forty Hong Kong ESL learners (all native speak-
ers of Cantonese) participated in the study. Twenty-nine of them
were females and eleven males, with ages ranging from nine-
teen to forty-two at the time of the study. They all studied Eng-
lish as their majors at three local universities, including eight
year 1 students, twenty-two year 2 students, and ten year 3 or
postgraduate students. All of them started to learn English for-
mally at the age of six or earlier when they entered primary
schools. Twenty-six claimed to have received some form of pho-
netics training (such as taking a phonetics and phonology or
pronunciation course), and the accent they learnt was Received
Pronunciation (RP) English. Fourteen had not received any pho-
netics training before.
Perceptual Targets and Procedures
Three L2 perception tasks and one L1-L2 perception task
were conducted to investigate the participants’ perception of L2
vowels and their perceived similarity between “similar” L1 and
L2 vowels. A total of eight English vowels, including three
long and short vowel pairs, namely, /i:, ɪ/, /u:, ʊ/, /ɔ:, ɒ/,
and the vowel pair /æ, e/, were included in all the L2
perception and L1-L2 perception tasks. Cantonese vowels which
have “similar” acoustic and articulatory features to a target
English vowel (e.g. Cantonese /i/ with English /i:/ and /ɪ/)
were included for contrast in the L1-L2 perception task. The
English stimuli were spoken in RP English and the Chinese
stimuli were spoken in Cantonese. The stimuli were presented to
the participants individually at a comfortable volume over
earphones in a quiet room during the implementation, and a
research assistant was responsible for administering the
experiment.
L2 Categorial Discrimination Task (Task 1)
A categorial AXB discrimination test based on Best, McRoberts
and Goodell (2001) was conducted to investigate the partici-
pants’ discrimination of phones in isolation. In this task, series
of three isolated phones (i.e. AAB (e.g. u:, u:, ʊ), ABB (e.g. u:,
ʊ, ʊ), BBA (e.g. ʊ, ʊ, u:) or BAA (e.g. ʊ, u:, u:)) were
presented. The participants were given a response sheet with a list
of AXB sequences and asked to listen to the recorded stimuli
and determine for each series whether the middle item (X) was
the same as the first or the third item.
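The logic of the AXB task can be illustrated with a short sketch. The paper does not describe any scoring software, so the function names `axb_trials` and `score_axb` and the response labels "first" and "third" are hypothetical and serve only to make the four triad types and their correct answers explicit.

```python
def axb_trials(a, b):
    """Return the four AXB triads for a contrast (a, b).

    Each entry pairs a triad with the correct response, i.e. whether the
    middle item (X) matches the first or the third item.
    Illustrative helper only; not taken from the study.
    """
    return [
        ((a, a, b), "first"),  # AAB: X is the same as the first item
        ((a, b, b), "third"),  # ABB: X is the same as the third item
        ((b, b, a), "first"),  # BBA: X is the same as the first item
        ((b, a, a), "third"),  # BAA: X is the same as the third item
    ]


def score_axb(trials, responses):
    """Proportion of responses that match the correct answer of each trial."""
    correct = sum(1 for (_, key), resp in zip(trials, responses) if resp == key)
    return correct / len(trials)


trials = axb_trials("u:", "ʊ")
# A hypothetical listener who answers the last triad incorrectly.
accuracy = score_axb(trials, ["first", "third", "first", "first"])
```

Under these assumptions, a listener who cannot hear the contrast at all would be expected to score near 0.5 by guessing, since each triad has two response options.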
Word Discrimination Task (Task 2)
The purpose of the second task was to test the participants’
ability to differentiate English minimal pairs. English words
(e.g. fool) were spoken in isolation. A response sheet with the
recorded word (e.g. fool) and a word differing in only one pho-
neme (e.g. full) was given to the participants. They had to listen
to the recording and indicate the word they had heard from the
corresponding pair on the response sheet (see Appendix 1 for
some sample pairs of words).
Picture Discrimination Task (Task 3)
The picture discrimination task tested the participants’ ability
to differentiate English minimal pairs spoken in carrier sen-
tences (e.g. Now I say ______). A response sheet with a picture
showing the recorded word (e.g. pool) and another picture
showing a word in a minimal pair relationship with the re-
corded word (e.g. pull) was given to the participants, who had
to indicate the picture which showed the word they had heard
(see Appendix 1 for some sample pairs of words).
Classification of English Vowels into Cantonese
Vowels and Rating of Similarity (Task 4)
The task was divided into two parts. In the first part, a set of
English words spoken in RP English and corresponding Cantonese
words with “similar” vowels were presented to the participants.
They had to classify the target English vowel (e.g. /ʊ/) as a
Cantonese vowel when hearing an English word and its
corresponding Chinese list. For instance, when hearing an
English stimulus [kʊk] cook, the participants had to classify the
English vowel /ʊ/ as one of the Cantonese vowels in a given
list of Chinese words spoken in Cantonese (e.g. [kɔk] kok8,
[kuk] kuk7, [kek] kek9, [kœk] koek8). The target English word
(e.g. cook) was then presented to the participants for a second
time, who had to rate the English vowel (e.g. /ʊ/) for the degree
of similarity to the Cantonese vowel just selected (e.g. /u/ in
[kuk] kuk7) using a scale ranging from 1 (very different) to 5
(very similar). These two parts of the task required the partici-
pants to give both a classification response and a good-
ness-of-fit rating before proceeding to the next set of words. No
previous training was provided for either the classification task
or the rating task, but the participants were given a written list
of all the words spoken (see Appendix 2 for some sample sets
of words).
Data Analysis
For Tasks 1 to 3, the proportion of correct judgments by the
participants on each English sound and/or sound pair was
computed to reveal the frequency with which a particular
English sound or sound pair was correctly perceived in each task
and in all the tasks. Proportion Z Tests¹ were conducted to
determine the significance of the differences between the
participants’ performance on different sound pairs or on
individual members of a pair.
¹ A Proportion Z Test is a test of differences between two proportions from
independent samples. Assuming that the samples are normally distributed, if
Z (Z-value) > 1.96, then there is a significant difference between the two
proportions at the .05 significance level. Otherwise, the difference can be
attributed to sam[pling error].
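As an illustration of the test described above, the following is a minimal sketch of a pooled two-proportion Z statistic. The paper does not state which variant of the test was used, so the pooled standard error is an assumption, and the function name `two_proportion_z` is hypothetical. The inputs are the rounded accuracy rates and token counts reported in Table 1 for /i:/ (73%, N = 280) and its lax counterpart (90%, N = 240).

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Absolute Z statistic for the difference between two independent proportions.

    Uses the pooled estimate of the common proportion for the standard error
    (an assumption; the source does not specify the exact formula).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return abs(p1 - p2) / standard_error


z = two_proportion_z(0.73, 280, 0.90, 240)
is_significant = z > 1.96  # two-tailed criterion at the .05 level
print(round(z, 2), is_significant)  # 4.91 True
```

With these rounded inputs the pooled form gives a value close to the Z-statistic reported for this pair in Table 1, which suggests, but does not confirm, that a comparable formula was used.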
For the first part of Task 4, the percentage of times that a
particular English phone (e.g. //) was classified as instances of
a Cantonese sound category (e.g. /u:/, /e/, /i/, etc.) was com-
puted. Classification overlap scores (Flege and Mackay, 2004)
were also calculated for each pair of English contrasts. For
example, if the participants had classified English /u:/ and /ʊ/
as Cantonese /u/ for p% and q% of instances respectively, then
the classification overlap was q% if p > q, but p% if p < q.
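The overlap rule stated above amounts to taking the smaller of the two classification percentages. A minimal sketch follows; the function name `classification_overlap` is hypothetical, and the example values are the percentages with which the members of the high back tense and lax pair were classified as Cantonese /u/ in Task 4 (96% and 93%).

```python
def classification_overlap(p, q):
    """Overlap score for a pair of English vowels.

    p and q are the percentages of instances in which each member of the pair
    was classified as the same Cantonese vowel. Following the rule in the text,
    the overlap is q if p > q and p if p < q, i.e. the smaller value.
    """
    return min(p, q)


overlap = classification_overlap(96, 93)
print(overlap)  # 93
```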
For the second part of Task 4, the perceived similarity be-
tween a pair of L1 and L2 vowels (e.g. Cantonese // and Eng-
lish /u:/; Cantonese /i/ and English /i:/) was found by comput-
ing the mean goodness-of-fit rating that the participants as-
signed to the pair: For each degree of similarity ranging from 1
to 5, the product of the degree of similarity and the number of
participants who chose that degree was first computed, then all
the products were added up, and the sum was divided by the
total number of participants.
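The computation described above is a weighted mean of the ratings. The sketch below assumes a dictionary that maps each degree of similarity to the number of participants who chose it; the function name `mean_goodness_of_fit` and the counts in the example are hypothetical and are not data from the study.

```python
def mean_goodness_of_fit(counts):
    """Mean rating from a mapping of rating (1-5) to number of participants.

    Multiplies each degree of similarity by the number of participants who
    chose it, sums the products, and divides by the total number of participants.
    """
    total_participants = sum(counts.values())
    weighted_sum = sum(rating * n for rating, n in counts.items())
    return weighted_sum / total_participants


# Hypothetical distribution of ratings from forty participants.
example_counts = {1: 0, 2: 2, 3: 10, 4: 20, 5: 8}
mean_rating = mean_goodness_of_fit(example_counts)  # 154 / 40
```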
Statistical analyses were conducted by SPSS 14.0. T-Tests
were run to compare the mean goodness-of-fit ratings between
different pairs of English and Cantonese contrasts (e.g. English
/u:/ and Cantonese /u/ with English /ʊ/ and Cantonese /u/).
L2 Perception Tasks (Tasks 1-3)
Table 1 shows the participants’ perception of different English
vowels in different tasks. It can be seen that their perception
was generally good. About 76% of all the target vowels were
accurately perceived. Their perception of the vowel pair
/ɔ:, ɒ/ was the poorest, with an overall accuracy rate of only
69% (76% for /ɔ:/ and 62% for /ɒ/²). More instances of /ɒ/ were
misperceived as /ɔ:/ than vice versa. The /æ, e/ pair also
presented a number of perceptual problems to the participants,
with an overall accuracy rate of 77% and a similar number of
both sounds accurately perceived (75% for /æ/ and 79% for /e/).
The accuracy rates for /u:/ and for /ʊ/ were similar (76% and
79% respectively). /i:, ɪ/ was the best pair of vowels for
perception: 81% of these vowels were accurately perceived, but
the accuracy rate for /i:/ was only 73% whereas that for /ɪ/ was
90%. When individual members of the tense and lax vowel pairs
were compared, lax vowels were on the whole more accurately
perceived than the corresponding tense ones. The only exception
was /ɔ:, ɒ/.
Table 1.
Perception of different vowel pairs by the participants.
Percentages of sounds correctly perceived (N = number of tokens)

Vowels    Task 1           Task 2           Task 3           All Tasks         Z-statistics between first
                                                                               member and second member
i:, ɪ     92% (N = 160)    89% (N = 160)    65% (N = 200)    81% (N = 520)
i:        98% (N = 80)     89% (N = 80)     46% (N = 120)    73% (N = 280)
ɪ         86% (N = 80)     90% (N = 80)     93% (N = 80)     90% (N = 240)     Z = 4.91*
u:, ʊ     98% (N = 160)    75% (N = 160)    59% (N = 160)    77% (N = 480)
u:        98% (N = 80)     68% (N = 80)     48% (N = 40)     76% (N = 200)
ʊ         99% (N = 80)     83% (N = 80)     63% (N = 120)    79% (N = 280)     Z = .78
ɔ:, ɒ     99% (N = 160)    69% (N = 160)    44% (N = 200)    69% (N = 520)
ɔ:        100% (N = 80)    89% (N = 80)     40% (N = 80)     76% (N = 240)
ɒ         99% (N = 80)     50% (N = 80)     46% (N = 120)    62% (N = 280)     Z = 3.43*
æ, e      100% (N = 160)   75% (N = 200)    61% (N = 200)    77% (N = 560)
æ         100% (N = 80)    68% (N = 80)     56% (N = 80)     75% (N = 240)
e         100% (N = 80)    80% (N = 120)    63% (N = 120)    79% (N = 320)     Z = 1.12
Average   97% (N = 640)    77% (N = 680)    57% (N = 760)    76% (N = 2080)

*Difference is significant at the .05 level.
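The "All Tasks" figures in Table 1 appear to be consistent with a token-weighted mean of the per-task percentages, although the paper does not say so explicitly. The sketch below makes that assumption; the function name `pooled_accuracy` is hypothetical, and the inputs are the rounded per-task values reported in Table 1 for /i:/.

```python
def pooled_accuracy(task_results):
    """Token-weighted mean accuracy across tasks.

    task_results is a list of (percent_correct, n_tokens) tuples, one per task.
    Weighting by token count is an assumption about how the overall figures
    were obtained, not a procedure stated in the source.
    """
    total_tokens = sum(n for _, n in task_results)
    return sum(pct * n for pct, n in task_results) / total_tokens


# Rounded per-task results for /i:/ in Tasks 1 to 3.
overall = pooled_accuracy([(98, 80), (89, 80), (46, 120)])
print(round(overall))  # 73
```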
Proportion Z tests showed that the difference between /i:/ and
/ɪ/ and that between /ɔ:/ and /ɒ/ were significant at the .05
significance level, whereas the difference between /u:/ and /ʊ/
and that between /æ/ and /e/ were non-significant (see
Z-statistics in Table 1). Proportion Z tests also showed that the
differences in overall accuracy rates between the /ɔ:, ɒ/ pair
and all other pairs were significant, whereas the differences in
overall accuracy rates between other pairs of vowels were not
statistically significant at all (not shown in Table 1 to avoid
Classification of English Vowels as Cantonese Vowels
(Task 4a)
Table 2 shows the participants’ classification of English
vowels as Cantonese vowels. English /i:/ and /ɪ/ were
predominantly classified as Cantonese /i/ (91% and 89%
respectively), English /u:/ and /ʊ/ as Cantonese /u/ (96% and
93% respectively), and English /ɔ:/ and /ɒ/ as Cantonese /ɔ/
(88% and 95% respectively). This shows that all English tense
and lax vowel pairs were predominantly classified as the
“nearest” Cantonese lax vowels, which presumably have the
closest articulatory features, i.e. Cantonese /i/, like English
/i:/ and /ɪ/, is a high front vowel; Cantonese /u/, like English
/u:/ and /ʊ/, is a high back vowel; and Cantonese /ɔ/, like
English /ɔ:/, is in the mid back region³. Despite such
predominant classifications, all the target English tense and
lax vowels were also classified as other Cantonese vowels with
rather different articulatory features. For example, 5% and 6%
of English /i:/ and /ɪ/ respectively were classified as
Cantonese /a/, whereas 8% of English /ɔ:/ were classified as
Cantonese /u/. As for the /æ, e/ pair, both were predominantly
classified as Cantonese /e/ (90% for /æ/ and 50% for /e/),
though the latter showed more diverse classifications, with
13%, 19% and 18% being classified as Cantonese /i/, /a:/ and
/a/ respectively.
Classification overlap was highest for /u:, ʊ/, with a score as
high as 93%, and lowest for the low and mid vowel pair (/æ, e/)
(overlap = 50%). Overlap scores for other tense and lax vowel
pairs were also high, with 89% for /i:, ɪ/ and 88% for /ɔ:, ɒ/.
Perceived Degrees of Similarity between English and
Cantonese Vowels (Task 4b)
Table 3 shows the participants’ perceived degrees of
similarity between the target English vowels and the Cantonese
vowels which they had selected as most similar, and Table 4
shows the t-test results. It can be seen that the mean
goodness-of-fit ratings were mostly in the range between 3 and
4. For the English low and mid vowel pair, English /æ/ was
regarded as more similar to Cantonese /e/ than English /e/ was.
The mean goodness-of-fit rating assigned for the former was 3.95
and that for the latter was 3.48. This difference was
significant at the .05 significance level. The mean
goodness-of-fit ratings assigned for English /i:/ and English
/ɪ/ to Cantonese /i/ were 3.70 and 3.59 and those for English
/u:/ and English /ʊ/ to Cantonese /u/ were 3.70 and 3.81
respectively. Neither of these differences between the
corresponding English tense and lax vowels was statistically
significant. The mean goodness-of-fit ratings for English /ɔ:/
and English /ɒ/ to Cantonese /ɔ/ were 3.59 and 4.01
respectively, and the difference between these two was
statistically significant at the .05 significance level (see
Tables 3 and 4).
Table 2.
Participants’ classification of English vowels as Cantonese vowels.
Percentages of English vowels classified as Cantonese vowels
Can. vowels
Eng. vowels i a: a œ u ɔ e
i: 91% 4% 5% 0% 0%
ɪ 89% 3% 6% 3% 0%
u: 0% 0% 0% 4% 96% 0%
ʊ 0% 3% 1% 93% 4% 0%
ɔ: 0% 4% 1% 8% 88%
ɒ 1% 1% 1% 1% 95% 0%
æ 5% 0% 4% 1% 0% 90%
e 13% 19% 18% 1% 50%
Table 3.
Participants’ perception of degrees of similarity between English and Cantonese vowels.
Mean goodness-of-fit ratings
Can. vowels
Eng. vowels i a: a œ u ɔ e
i: 3.70 3.00 3.00
ɪ 3.59 4.00 3.20 2.50
u: 2.67 3.70
ʊ 1.50 4.00 3.81 3.00
ɔ: 3.00 2.00 2.50 3.59
ɒ 2.00 4.00 1.00 2.00 4.01
æ 2.75 2.67 3.00 3.95
e 2.70 2.40 2.36 3.00 3.48
² In Table 1, the data are presented as results on a pair and results on individual items in the pair. If the accuracy rate of an individual item (e.g. /ɔ:/) is lower
than 100%, then the misperceived tokens were perceived as the other item (e.g. /ɒ/) in the corresponding pair (e.g. /ɔ:, ɒ/).
Table 4.
Comparison of mean goodness-of-fit ratings for similar English and Cantonese vowels.
English and Cantonese Vowels N Mean Mean Difference Sig.
/i:/ and /i/
/ɪ/ and /i/
/u:/ and /u/
/ʊ/ and /u/
/ɔ:/ and /ɔ/
/ɒ/ and /ɔ/
/æ/ and /e/
/e/ and /e/
*Difference is significant at .05 level.
Interestingly, all the English lax vowels were classified by a
minority of the participants as very similar (goodness-of-fit rat-
ings = 4) to a “non-equivalent” Cantonese vowel: English //
when compared with Cantonese /œ/, English // with Canton-
ese /a:/, and English // with Cantonese /a/ all received a good-
ness-of-fit rating of about 4.
Prediction of the PAM
From the results of the study, it can be seen that some L2
vowel pairs were assimilated by the participants to a single native
category with one more similar than the other to the native
phoneme (CG: Category Goodness), and some should be seen
as equally similar to a single native category (SC: Single
Category). The /æ, e/ pair was an example of the former. Both of
these two sounds were classified as most similar to Cantonese
/e/ with a low classification overlap of 50%, showing that this
pair of non-native contrasts may have assimilated to Cantonese
/e/ but with English /æ/ more similar than English /e/ to the
native phoneme (CG). The statistically significant
goodness-of-fit rating difference between the two (when compared
to Cantonese /e/) also confirms that they should be regarded as
CG. English /i:, ɪ/ and /u:, ʊ/ were good exemplars of the SC
pattern, as they assimilated equally well to a single native
category with classification overlaps as high as 89% or above.
The statistically nonsignificant goodness-of-fit rating
differences between the tense and lax vowels (when compared with
the corresponding Cantonese vowel) also confirm this grouping.
The English /ɔ:, ɒ/ pair, on the other hand, invites some
indeterminacy in patterning. The classification overlap between
the two was high (88%), suggesting that they should have been
assimilated to a single category, but there was a statistically
significant goodness-of-fit rating difference between the former
and the latter when compared with the same Cantonese vowel /ɔ/,
showing that they should be regarded as CG instead.
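The grouping reasoning in this section can be summarized as a simple decision rule. The sketch below is an illustrative simplification and not a procedure given in the paper: the function name `assimilation_pattern` and its arguments are hypothetical, the NA pattern is left out, and the role of the classification overlap score is reduced to asking whether both members of a pair share the same predominant Cantonese category.

```python
def assimilation_pattern(category_a, category_b, rating_difference_significant):
    """Assign a PAM pattern to a pair of L2 vowels (simplified sketch).

    category_a, category_b: the L1 vowel to which each member of the pair was
    predominantly classified.
    rating_difference_significant: whether the mean goodness-of-fit ratings of
    the two members to that L1 vowel differ significantly.
    """
    if category_a != category_b:
        return "TC"  # members assimilate to two different native categories
    if rating_difference_significant:
        return "CG"  # same category, but one member is a better fit
    return "SC"  # same category, equally good (or poor) fit


# Both members of the low and mid vowel pair were mostly classified as
# Cantonese /e/, with a significant rating difference.
print(assimilation_pattern("e", "e", True))   # CG
# Both members of the high front pair were mostly classified as Cantonese /i/,
# with no significant rating difference.
print(assimilation_pattern("i", "i", False))  # SC
```

On this rule the mid back pair would also be labelled CG, which matches the conclusion reached above despite its high classification overlap.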
None of the English vowel pairs could be regarded as similar
to two different native phonemes (TC: Two Categories):
Although all the vowels were perceived by some participants as
similar to some other L1 phonemes rather than the “nearest”
one and the goodness-of-fit ratings were very high, the
percentages of such classifications were too low to be of
significance for comparison. As such, there was no TC pattern in
the study. NA (Non-assimilable) was not applicable in the study.
As mentioned before, the PAM predicts that the discrimination
performance pattern for adults from highest performance to
lowest performance is TC > CG > SC. The participants’
classifications of English vowels into Cantonese vowels and their
perception of the corresponding English vowels did not provide
supporting evidence for this prediction. Their perception of the
CG pair /æ, e/ was largely the same as that of the SC pairs
/u:, ʊ/ and /i:, ɪ/. Their perception of another CG pair
/ɔ:, ɒ/ was actually the worst, statistically significantly
poorer than their perception of the two SC pairs. With a
performance pattern largely different from the pattern of the
prediction, it is doubtful whether the prediction is
substantiated and valid for explaining second language vowel
acquisition by Cantonese ESL learners. The ease with which a
Cantonese ESL learner can distinguish a pair of non-native
contrasts from another pair is, thus, not necessarily a function
of the extent to which the L2 contrasts assimilate to the L1
phonemes.
Native Phonological Influence
Although the results of the present study do not give sup-
porting evidence to the PAM’s predicted discrimination per-
formance pattern, the model’s claims regarding native phono-
logical influence and learners’ perception of non-native phones
in terms of their L1 phonological categories are not to be falsi-
fied: Cantonese ESL learners do regard different English vow-
els as similar, albeit to different extents, to one or some of their
native vowels, and the L1 phoneme prevalently perceived as
similar to a certain L2 vowel is the one which shares the closest
articulatory properties with the L2 sound, showing that learners
do assimilate non-native phones to native phonemes based on
detection of commonalities that exist between them in the ar-
ticulations. The high goodness-of-fit ratings assigned to the L1
and L2 vowels also confirm learners’ perceived similarities be-
tween the L1 and the L2.
A notable pattern also seems to emerge from the results: The
perception of an individual L2 sound is closely related to
the perceived distance between the L2 sound itself and the
closest L1 phoneme, supporting the basic premise of the PAM
that if a learner perceives a non-native phone as very similar to
a native phoneme, he or she will not be able to detect
discrepancies between the two. English /ɒ/, for example, which was
considered significantly more similar than English /ɔ:/ to
Cantonese /ɔ/, was perceived significantly less accurately by the
participants than English /ɔ:/ (accuracy rate for the former =
62% and that for the latter = 76%). The significantly smaller
perceived similarity of English /e/ to Cantonese /e/ than of
English /æ/ to Cantonese /e/ also resulted in better perception of
the former than the latter, although the difference was not
statistically significant. Because vowels contrasting minimally in
English do not assimilate in the same way to the same Cantonese
vowel, one member (e.g. English /ɔ:/) of a vowel pair (e.g.
English /ɔ:, ɒ/) may be perceived more easily or with more
difficulty than the other (e.g. English /ɒ/). Rather than making
reference to pairs of non-native contrasts and predicting learners’
relative difficulty in perceiving one pair (e.g. CG) and another
(e.g. TC) as the PAM does, it seems more appropriate to rely on the
perceived distance between an individual L2 vowel and a native
phoneme. The more similar an L2 vowel is to an L1 vowel, the
more difficult the L2 vowel is to perceive (Flege, 1995; also
see Chan, 2012). With the limited pool of data obtained from
the present study, however, no reliable conclusions can be drawn
regarding perceived distance and perception. More research is
needed to ascertain this relation as well as the extent of native
phonological influence.
3 Unlike English /ɔ:/ and Cantonese /ɔ/, English /ɒ/ is regarded as a low vowel rather than a mid vowel.
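The per-vowel alternative described above (greater perceived similarity going with lower perception accuracy) could be checked along the following lines. This is only a sketch: the function name and data layout are assumptions, and the similarity ratings in the example are placeholders that preserve the reported ordering, not figures from the study. Only the two accuracy rates (62% and 76%) are quoted from the text.

```python
from itertools import combinations


def count_inverse_pairs(vowels):
    """Count vowel pairs in which higher similarity goes with lower accuracy.

    vowels maps an L2 vowel label to (similarity_rating, accuracy).
    Returns (consistent_pairs, comparable_pairs); pairs tied on either
    measure are skipped because they cannot be ordered.
    """
    consistent = comparable = 0
    for (_, (sim_a, acc_a)), (_, (sim_b, acc_b)) in combinations(vowels.items(), 2):
        if sim_a == sim_b or acc_a == acc_b:
            continue
        comparable += 1
        # Consistent when the more similar vowel is the less accurately perceived one.
        if (sim_a > sim_b) == (acc_a < acc_b):
            consistent += 1
    return consistent, comparable


# Accuracy rates are quoted from the text; similarity ratings are PLACEHOLDERS.
demo = {
    "English vowel rated closer to Cantonese /ɔ/": (4.0, 62),
    "English /ɔ:/": (3.0, 76),
}
print(count_inverse_pairs(demo))
```

With more vowels and real ratings, the share of consistent pairs gives a rough indication of how well perceived distance tracks accuracy; it is not a substitute for a proper significance test.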
Conclusion
In this article, I have reported on the results of a research
study which investigated the perception of English pure vowels
by Cantonese ESL learners in Hong Kong in an attempt to test
the PAM’s prediction on discrimination performance. The re-
sults of the study do not provide strong support for the predic-
tion, suggesting that a pair of non-native contrasts with one
member classified as closer to a native phoneme than the other
member may not necessarily be more accurately perceived than
another pair which is classified as equally similar to a native
phoneme category. Native language phonological influence is,
however, not nullified in the area of speech perception. Rather
than predicting perception performance with reference to pairs
of non-native contrasts and learners’ assimilation of these non-
native contrasts to L1 phonemes, it seems more appropriate to
predict L2 perception based on the perceived distance between
a certain non-native sound and the closest native phoneme(s).
The results of the study have theoretical significance, providing
a platform for future research into the Perceptual Assimilation
Model, the relationship between perceived similarity and L2
perception, as well as the extent of native phonological
influence. As only a homogeneous group of tertiary-level learners
participated in the study, the results cannot be generalized
to all Cantonese ESL learners in Hong Kong, especially
elementary learners. Further research is needed to include learners
at different proficiency levels and, preferably, speakers
of other languages. It would also be illuminating to include other
non-native sound categories, such as consonants, as well as vowels
in different phonological environments, if a full picture of
native phonological influence is to be attained.
Acknowledgements
The work described in this article was fully supported by a
competitive earmarked research grant from the Research Grants
Council of the Hong Kong Special Administrative Region, China
[Project Number: CityU 1455/05H]. The support of the Council
is acknowledged. I would also like to thank the participants of
the study for their contribution and my research assistant for her
administrative assistance.
References
Aoyama, K. (2003). Perception of syllable-initial and syllable-final
nasals in English by Korean and Japanese speakers. Second Lan-
guage Research, 19.3, 251-265. doi:10.1191/0267658303sr222oa
Best, C. T. (1993). Emergence of language-specific constraints in per-
ception of non-native speech: A window on early phonological deve-
lopment. In B. de Boysson-Bardies, S. de Schonen, P. Jusczyk, P.
MacNeilage, & J. Morton (Eds.), Developmental neurocognition:
Speech and face processing in the first year (pp. 289-304). Dordrecht:
Kluwer Academic.
Best, C. T. (1994). The emergence of native-language phonological in-
fluences in infants: A perceptual assimilation model. In J. C. Good-
man, & H. C. Nusbaum (Eds.), The development of speech percep-
tion: The transition from speech sounds to spoken words (pp. 167-
224). Cambridge: MIT Press.
Best, C. T., McRoberts, G. W., & Goodell, E. (2001). Discrimination of
non-native consonant contrasts varying in perceptual assimilation to
the listener’s native phonological system. Journal of the Acoustical
Society of America, 109, 775-794. doi:10.1121/1.1332378
Best, C. T., McRoberts, G. W., & Sithole, N. N. (1988). The phonolo-
gical basis of perceptual loss for nonnative contrasts: Maintenance of
discrimination among Zulu clicks by English-speaking adults and
infants. Journal of Experimental Psychology: Human Perception and
Performance, 14, 345-360. doi:10.1037/0096-1523.14.3.345
Bolton, K., & Kwok, H. (1990). The dynamics of the Hong Kong ac-
cent: Social identity and sociolinguistic description. Journal of Asian
Pacific Communication, 1.1, 147-172.
Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y.
(1997). Training Japanese listeners to identify English /r/ and /l/: IV.
Some effects of perceptual learning on speech production. Journal of
the Acoustical Society of America, 101, 2299-2310.
Chan, A. Y. W. (2006a). Cantonese ESL learners’ pronunciation of Eng-
lish final consonants. Language, Culture and Curriculum, 19.3, 296-
313. doi:10.1080/07908310608668769
Chan, A. Y. W. (2006b). Strategies used by Cantonese speakers in pro-
nouncing English initial consonant clusters: Insights into the interlan-
guage phonology of Cantonese ESL learners in Hong Kong. Interna-
tional Review of Applied Linguistics in Language Teaching, 44, 331-
355. doi:10.1515/IRAL.2006.015
Chan, A. Y. W. (2007). The acquisition of English word-final conso-
nants by Cantonese ESL learners in Hong Kong. Canadian Journal
of Linguistics, 52.3, 231-253. doi:10.1353/cjl.2008.0023
Chan, A. Y. W. (2012). Cantonese English as a second language learn-
ers’ perceived relations between “similar” L1 and L2 speech sounds:
A test of the speech learning model. The Modern Language Journal,
96.1, 1-19. doi:10.1111/j.1540-4781.2012.01291.x
Chan, A. Y. W., & Li, D. C. S. (2000). English and Cantonese phonolo-
gy in contrast: Explaining Cantonese ESL learners’ English pronun-
ciation problems. Language, Culture and Curriculum, 13.1, 67-85.
Chan, C. P. H. (2001). The perception (and production) of English
word-initial consonants by native speakers of Cantonese. Hong Kong
Journal of Applied Linguistics, 6.1, 26-44.
Chan, C. Y. H. (2005). L1 and L2 phonological variation: The merging
of the syllable-initial /n-/ with /l-/ in Cantonese and English by Hong
Kong students. Paper presented at IACL 13, Leiden: Leiden University.
Chan, C. Y. H. (2007). Factors affecting L2 pronunciation: The merg-
ing of the syllable-initial /n-/ with /l-/ by Cantonese speakers learning
English. The 32nd Annual Congress of Applied Linguistics Association
of Australia. Wollongong: University of Wollongong.
Flege, J. (1991). Perception and production: The relevance of phonetic
input to L2 phonological learning. In T. Huebner, & C. Ferguson
(Eds.), Crosscurrents in second language acquisition and linguistic
theories (pp. 249-289). Philadelphia: John Benjamins Publishing
Flege, J. (1992). Speech learning in a second language. In C. Ferguson,
L. Menn, & C. Stoel-Gammon (Eds.), Phonological development: Mo-
dels, research, implications (pp. 565-604). Timonium: York Press.
Flege, J. E. (1995). Second language speech learning: Theory, findings
and problems. In W. Strange (Ed.), Speech perception and linguistic
experience: Issues in cross-language research (pp. 233-277). Balti-
more: York Press.
Flege, J. E., & Mackay, I. R. A. (2004). Perceiving vowels in a second
language. Studies in Second Language Acquisition, 26, 1-34.
Harnsberger, J. D. (2001). On the relationship between identification
and discrimination of non-native nasal consonants. Journal of the
Acoustical Society of America, 110.1, 489-503.
Hung, T. T. N. (2000). Towards a phonology of Hong Kong English.
World Englishes, 19.3, 337-356. doi:10.1111/1467-971X.00183
Hung, T. T. N. (2005). Word stress in Hong Kong English: A prelimi-
nary study. Applied Language Studies, 9, 29-40.
Imsri, P. (2003). The perception of English stop consonants by Thai
children and adults. Doctoral Thesis, Newark, DE: University of Delaware.
Kingston, J. (2003). Learning foreign vowels. Language and Speech,
46.2-3, 295-349. doi:10.1177/00238309030460020201
Lo, S. K. (2007). The markedness differential hypothesis and the acqui-
sition of English final consonants by Cantonese ESL learners in
Hong Kong. M. Phil Thesis, Hong Kong: City University of Hong Kong.
Pilus, Z. (2002). Second language speech: Production and perception of
voicing contrasts in word-final obstruents by Malay speakers of English.
Doctoral Thesis, Madison, WI: University of Wisconsin-Madison.
Proctor, M. (2004). Production and perception of AusE vowels by Vi-
etnamese and Japanese ESL learners. 2004 Australian Linguistic So-
ciety Annual Conference. Sydney: University of Sydney.
Stibbard, R. (2004). The spoken English of Hong Kong: A study of
co-occurring segmental errors. Language, Culture and Curriculum, 17.2,
127-142. doi:10.1080/07908310408666688
Strange, W., Akahane-Yamada, R., Kubo, R., Trent, S. A., & Nishi, K.
(2001). Effects of consonantal context on perceptual assimilation of
American English vowels by Japanese listeners. Journal of the Acou-
stical Society of America, 109.4, 1691-1704.
Strange, W., Akahane-Yamada, R., Kubo, R., Trent, S. A., Nishi, K., &
Jenkins, J. J. (1998). Perceptual assimilation of American English
vowels by Japanese listeners. Journal of Phonetics, 26, 311-344.
Appendix 1
List of Word Pairs Used in Task 2 and Task 3
Task 2.
Minimal pair discrimination.
1) eat it 2) fool full 3) look Luke
4) wok walk 5) beg bag 6) bin bean
7) beach bitch 8) pick peak 9) suit soot
10) hood who’d 11) pod pawed 12) don dawn
13) cot caught 14) bed bad 15) sat set
16) man men
Task 3.
Picture discrimination.
1) bean bin 2) hit heat 3) tin teen
4) sit seat 5) ship sheep 6) look Luke
7) full fool 8) pool pull 9) hood who’d
10) caller collar 11) wok walk 12) chalk choc
13) stock stalk 14) not nought 15) send sand
16) bend band 17) men man 18) said sad
19) pen pan
Appendix 2
Response Sheet for Task 4
Task 4.
Sample words only.
Below is a list of English words. You will hear each English word
twice and each list of Chinese words once.
Task a) After hearing the English word and the list of Chinese words
for the first time, classify the English vowel in the word as a
Cantonese vowel. A Cantonese word for each given vowel has been
supplied as a hint.
Task b) After hearing the English word for the second time, rate the
English vowel in the word for the degree of similarity to the Canton-
ese vowel you have just chosen, using the given scale ranging from 1
(very different) to 5 (very similar).
Set 1
1) teen a:(taan1), a(tan1), i(tin1), œ(teon5)
2) sit a:(saat8), a(sat7), œ(seot7), i(sit8)
3) lip u(luk9), a(lap9), a:(laap9), i(lip9)
4) beak i(bik7), u(buk7), a(bak7), a:(baak7)
Set 4
1) ket e(kek9), u(kut8), i(kit8), a(kat7)
2) pack a:(paak8), u(puk7), e(pek8), i(pik7)
3) men a(man6), a:(maan6), i(min6), u(mun4)
4) bang e(beng3), a(bang1), ɔ(bong1), i(bing1)
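Responses collected on a sheet of this kind could be summarised as sketched below: for each English vowel, the Cantonese vowel chosen most often, the share of listeners choosing it, and the mean similarity rating among those listeners. The function name, the triple layout and the demo responses are all hypothetical; the demo values are invented for illustration and are not data from the study.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarise_task4(responses):
    """Summarise (english_vowel, cantonese_vowel, rating) response triples.

    Returns {english_vowel: (most frequent Cantonese vowel,
                             share of responses choosing it,
                             mean rating among those responses)}.
    Ties between equally frequent Cantonese vowels are not handled specially.
    """
    choices = defaultdict(Counter)
    ratings = defaultdict(list)
    for l2, l1, rating in responses:
        # The response sheet uses a scale from 1 (very different) to 5 (very similar).
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
        choices[l2][l1] += 1
        ratings[(l2, l1)].append(rating)

    summary = {}
    for l2, counter in choices.items():
        modal_l1, count = counter.most_common(1)[0]
        share = count / sum(counter.values())
        summary[l2] = (modal_l1, share, mean(ratings[(l2, modal_l1)]))
    return summary


# Invented responses for illustration only (NOT data from the study).
demo = [
    ("i:", "i", 5), ("i:", "i", 4), ("i:", "i", 4), ("i:", "a", 2),
]
print(summarise_task4(demo))
```

The share and the mean rating correspond to the two parts of the task (classification in Task a, similarity rating in Task b), so they are kept as separate values rather than combined into a single score.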