Open Journal of Modern Linguistics
2013. Vol.3, No.2, 141-148
Published Online June 2013 in SciRes (
Copyright © 2013 SciRes. 141
The Phonetics of Multiple Vowel Lengthening in Japanese
Shigeto Kawahara1, Aaron Braver2
1The Institute of Cultural and Linguistic Studies, Keio University, Tokyo, Japan
2Department of Linguistics, Rutgers University, New Brunswick, USA
Received November 2nd, 2012; revised April 21st, 2013; accepted April 29th, 2013
Copyright © 2013 Shigeto Kawahara, Aaron Braver. This is an open access article distributed under the Creative
Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium,
provided the original work is properly cited.
Many languages exploit a short vs. long lexical contrast in vowels. In most, if not all of these languages,
the contrast is binary. In Japanese, however, speakers can lengthen vowels to express emphasis, and mul-
tiple degrees of lengthening can be used to express different degrees of emphasis. This paper offers the
first experimental documentation of this emphatic vowel lengthening phenomenon. The current results
demonstrate that, among the seven speakers recorded, at least a few speakers show six-levels of distinc-
tion in duration, and all but one speaker showed a steady linear correlation between duration and level of
emphasis. We conclude that Japanese speakers have articulatory control that allows them to make very
fine-grained durational distinctions, which go beyond mere binary short vs. long distinctions.
Keywords: Vowel Length; Duration; Emphasis; Japanese; (Non-)Binarity
Many languages distinguish short vowels from long vowels
to make lexical contrasts, but these duration-based length con-
trasts are usually binary; e.g. [hato] “dove” vs. [haato] “heart”
and [obasaN] “aunt” vs. [obaasaN] “grandmother” in Japanese.
While there is the rare typological exception such as Estonian,
in which this contrast can be ternary (Prince, 1980), the distri-
bution of superlong vowels is constrained by various prosodic
and morphological factors (see Ladefoged & Maddieson, 1996;
Lehiste, 1970; Prince, 1980 for discussion). Ladefoged and Ma-
ddieson (1996: p. 320) state that Mixe (Hoogshagen, 1959) is
the only language that they know of that has a purely lexical
duration-based three-way contrast (cf. Jany, 2006, 2007), al-
though they also mention Yavapai (Thomas & Shaterian, 1990)
as another possible candidate. At any rate, three-way vowel
length contrasts are rare at best cross-linguistically, and in the
languages where they do exist, the ternary contrast is prosodi-
cally and/or morphologically restricted. As far as we know,
there are no convincing cases of languages that make use of a
purely lexical four-way (or greater) duration-based length con-
trast in vowels1.
In Japanese, however, speakers can use vowel lengthening to
express emphasis. This process is commonly found in collo-
quial Japanese; a quick Google search (
with examples like [suɡoo-i] (すごーい) “great” and [çidoo-i]
(ひどーい) “awful” with lengthened stem-final vowels yields
many hits. In addition, this pattern can manifest as multiple
levels of emphasis (and therefore lengthening), extending be-
yond the familiar short/long binary distinction2.
This study offers the first experimental documentation of the
vowel lengthening pattern3. One theoretical contribution of this
paper is to investigate exactly how many levels of durational
distinction Japanese speakers can make in expressing different
degrees of emphasis—especially given that lexical vowel length
contrasts are usually limited to a binary distinction in many
languages, including Japanese.
Durational properties of Japanese short vowels and long
vowels have been studied rather extensively in the previous
literature both in terms of their production and perception (Be-
hne et al., 1999; Braver & Kawahara, 2012; Han, 1962; Hirata,
2004; Hirata & Lambacher, 2004; Hirata & Tsukuda, 2009;
Hoequist, 1982; Kinoshita et al., 2002; Moreton & Amano,
1999; Port et al., 1987). These studies have shown that duration
is the major acoustic and perceptual correlate of short vs. long
contrasts in Japanese, although there may be slight differences
in formant characteristics as well, in such a way that long vow-
els are more dispersed in F1 and F2 dimensions than short
2Japanese speakers can also lengthen consonants to express emphasis (Ai-
zawa, 1985; Kawahara, 2001, to appear; Nasu, 1999). For a phonetic study
testing different degrees of lengthening of Japanese consonants, see Kawa-
hara (2012b). For a previous
honetic study investigating various acoustic
properties of “paralinguistic focus”, which may be similar to what the cur-
rent project examines, see Maekawa (1998).
We also note, as we will discuss in the General Discussion section, that
English has a similar process, as in Thank you sooooo much and she
cuuuuuuute. See a post on Language Log by Mark Liberman
( nll/?p=2006) for related observations. It
is beyond the scope of the current study to conduct a cross-linguistic com-
parison, but a cross-linguistic study of this sort of lengthening phenomena
is certainly hoped for.
3The current study focuses on the durational properties of the vowel leng-
thening pattern. See Section 5 for discussion of other acoustic correlates
that may possibly accompany the lengthening pattern.
1When two phonological contrasts interact, it is possible to have a four-way
durational difference. For example, vowels are usually longer before voiced
stops than before voiceless stops (e.g. Chen, 1970; Kawahara, 2006; King-
ston & Diehl, 1994; Kluender et al., 1988; Lisker, 1986). This lengthening
effect may interact with a phonemic vowel length contrast to yield a
four-way durational difference; e.g. VT < VD < VVT < VVD. What we do
not observe, however, is one lexical contrast that is realized as a four-way
durational diffe
vowels (Hirata & Tsukuda, 2009).
Although the phonetics of Japanese short and long vowels
has been well studied in the past, to the best of our knowledge,
there has not been experimental documentation of the emphatic
lengthening pattern, which makes use of multiple levels of dur-
ational distinctions. One relevant study is Kakehi and Hirose
(1997) which tested the production of (heteromorphemic) se-
quences of the same vowels across morphemes in Japanese (e.g.
Matsue e ejiten-wo okutta “(I) sent a picture dictionary to Mat-
sue”), and showed that Japanese speakers do make a distinction
among 2 consecutive [e]s, 3 consecutive [e]s, 4 consecutive [e]s,
and 6 consecutive [e]s in their production. Drawing on this
study, our study below investigates vowel lengthening patterns
with multiple levels, and shows that Japanese speakers can
make similar fine-detailed durational distinctions even within
single morphemes, and that this fine distinction can hold across
a wider range of vowels in Japanese.
This study used emphasis of stem-final vowels in adjectives
which are commonly observed in Japanese casual speech. The
stimuli were grouped according to their final vowels, [a, o, u],
which commonly appear stem-finally in Japanese adjectives4.
For each vowel, two adjectives were chosen. The adjectives
used in this experiment are listed in Table 1, where [-i] is an
adjectival ending (present/non-past tense). All the stimuli were
disyllabic and had a lexical pitch accent on the second syllable
(i.e. the second syllable had an HL falling pitch contour). A
subject noun was added to each adjective to make a complete
sentence: e.g. [çiza-ɡa ita-i] “(I have) a knee pain”5.
In Japanese orthography, vowel length can be expressed us-
ing “” following the target vowel6. In this experiment, in
addition to the non-lengthened rendition, five different degrees
of emphasis were included as stimuli, as illustrated in Table 2.
There were a total of 36 stimuli (3 vowels * 2 adjectives * 6
emphasis levels). A random number was assigned to each
stimulus item so that transcribers could later track which item
had been produced.
The participants were seven native speakers of Japanese
(anonymously coded as Speakers TF, TN, TX, TW, TT, SX,
TV). They were all undergraduate students at International
Table 1.
The list of stimuli.
[a] [o] [u]
[kata-i] “hard” [suɡo-i] “great” [nemu-i] “sleepy”
[ita-i] “aching” [çido-i] “awful” [samu-i] “cold”
Table 2.
An illustration of one stimulus set in Japanese orthography.
Japanese orthography Transcription Condition
a. いたい [itai] No emphasis
b. いたーい [itaai] Level 1 emphasis
c. いたーーい [itaaai] Level 2 emphasis
d. いたーーーい [itaaaai] Level 3 emphasis
e. いたーーーーい [itaaaaai] Level 4 emphasis
f. いたーーーーーい [itaaaaaai] Level 5 emphasis
Christian University (Tokyo, Japan). They were paid 500 Japa-
nese yen for their time. They all signed a consent form before
participating in the experiment.
The recording sessions took place in a sound-attenuated
room at International Christian University. The stimuli and all
instructions were presented in Japanese orthography using Su-
perlab ver. 4.0 (Cedrus Corporation, 2010). In the instructions,
speakers were told that the experiment was about multiple lev-
els of emphasis in Japanese, and that they were going to read
sentences with vowels of differing length. They were instructed
to read the whole frame sentence, not just the target words, for
each stimulus.
Each block contained one token of every stimulus item. The
speakers were allowed to take a short break after each block.
The order of the stimuli within each block was randomized by
Superlab. The speakers went through ten blocks, which resulted
in a total of 360 tokens (36 stimuli * 10 repetitions). Each spea-
ker was assigned 30 minutes for the experiment.
Before the main session, as practice, each speaker read all the
stimuli once to familiarize themselves with the stimuli and the
task. After the practice phase, the experimenter (the first author)
clarified any questions that they had. Speakers were recorded
directly via a portable recorder (TASCAM DR-40) with a 44.1
kHz sampling rate and a 16 bit quantization level. The first au-
thor sat with each speaker throughout the experiment to moni-
tor the progress of the recording.
4Most stem-final vowels in Japanese adjectives are back, although there are
some exceptions (e.g. [samiɕi-i] “lonely”).
5The target words were placed sentence-finally, and as a result some o
them showed some creakiness and/or weakening (see e.g. Kawahara &
Shinya, 2008; Myers and Hansen 2007). Although this property of the
stimuli did not cause a particular problem for the present acoustic analysis,
a follow-up study which places the target stimuli in sentence internal posi-
tions may be worthwhile.
6A long vowel is written as a sequence of two letters in the case of the
hiragana orthography. For example, [kaasaN] “mother” is written in hira-
gana as ‘かあさん’ (ka + a + sa + N). Long [e:] and [o:] can also be ortho-
graphically expressed as ei and oi in some contexts. For example, ou [oo]
“king” would be written in hiragana as “おう” (o + u) and eiga [eega]
“movie” as “えいが” (e + i + ga). In loanwords as well as in this expressive
emphasis pattern, however, the length mark () is used to express vowel
length. See Labrune (2012) for a recent explanation (in English) of the
Japanese orthographic system.
Acoustic Analysis
The duration of each stem-final vowel plus the adjectival
suffix [i] was measured. We did not attempt to put a boundary
between the stem-final vowels and the suffixal [i], because the
transitions from the stem vowels into the suffixal [i] were
blurry (a vowel-to-vowel transition is generally blurry and hard
to unambiguously locate in an acoustic analysis: Turk et al.,
2006). However, since only the stem-final vowels were empha-
Copyright © 2013 SciRes.
sized, and not the suffixal vowel (see Table 2), the duration of
[i] should be more or less constant across all conditions. Vowel
onset and offset were determined by inspecting both waveforms
and spectrograms, and the boundaries were placed where F2
and F3 (dis-)appear. Sample spectrograms are shown in Figure
After the segmental boundaries were placed, the durations of
the target intervals were automatically extracted. Acoustic mea-
surements were done using Praat (Boersma & Weenink, 1999-
Since there are many comparisons (6 levels of emphasis * 3
types of vowels * 7 speakers), no pair-wise comparisons at each
emphasis level were conducted, in order to avoid Type I error
(i.e. to avoid finding some significant effects by chance). How-
ever, error bars, which represent 95% confidence intervals, are
provided in the result figures. They were calculated over 20
repetitions of each vowel (2 adjectives * 10 repetitions), except
when speakers mispronounced some relevant token. A post-hoc
inspection of the data showed that a linear regression analysis
would be useful, so they are reported in the results section. All
statistical analyses were performed using R (R Development
Core Team, 1993-2013). R was also used to generate result
Since different speakers showed different patterns, we report
the results of individual speakers separately, and present a
summary in the next section after reporting the results of indi-
vidual speakers. We start first by discussing those speakers who
showed the clearest distinctions among the different emphasis
levels. First, as shown in Figure 2, Speaker TF seems to make
a perfect six-way distinction; i.e., the vowel durations for each
level of emphasis are different from those of every other level
of emphasis for this speaker, and error bars do not overlap.
There are large jumps in duration from the non-emphatic
level to the first level of emphasis; with each additional degree
of emphasis, there is a shorter, but steady, increase in duration.
To assess the correlation between emphasis level and dura-
tion, a linear regression analysis was run with vowel duration as
the dependent variable, and emphasis level as the independent
variable. Since the increase from non-emphatic vowels to the
first level of emphasis is non-linear, they were excluded from
this regression analysis. The coefficient estimate of the regres-
sion analysis is 120 ms (t(247) = 30.8, p < .001). This correla-
tion estimate represents an average durational increase per em-
phasis level for this speaker. In other words, it estimates that for
each level of emphasis, vowel duration should increase by 120
ms. The correlation between duration and emphasis level is
very high (r = .89), showing that the linear relationship between
durational increase and emphasis level is very strong.
As shown in Figure 3, like Speaker TF, Speaker TX shows a
six-level distinction among emphatic vowels. The average du-
ration for each condition differs, and error bars barely overlap.
In the regression analysis, the coefficient estimate is 105 ms
(t(245) = 20.2, p < .001), and the correlation estimate r is .79.
As with Speaker TF, there are large durational jumps from
non-emphatic to emphatic vowels. The emphatic vowels show
steady, linear increases in duration, except for exceptionally
Figure 1.
Sample spectrograms: no emphasis, emphasis level 1, emphasis level 2.
ll time scales are 1000 ms. A
Copyright © 2013 SciRes. 143
Copyright © 2013 SciRes.
Figure 2.
The average durations of each emphasis level with 95% confidence intervals: Speaker TF.
Figure 3.
The average durations of each emphasis level with 95% confidence intervals: Speaker TX.
large differences between emphasis level 4 and emphasis level
5. These large, non-linear jumps may be responsible for the
lower r-value compared to that of Speaker TF. Presumably, for
this speaker, the most emphatic vowel has a special status, so it
receives extra lengthening.
Speaker TN, as shown in Figure 4, showed the next clearest
increase in duration as the emphasis levels go up. Although the
speaker generally does not show a clear difference between
level 1 and level 2 emphasis for any of the three vowels, the
speaker nevertheless seems to make a difference between the
other emphasis levels. This speaker also makes an exception-
ally large increase from level 4 to level 5 for [a]. In the regres-
sion analysis, the coefficient estimate is 78 ms (t(230) = 20.7, p
< .001), and the correlation coefficient r is .81.
Speaker TW did not show differences between level 1 and
level 2 (or level 3 for [u]), as illustrated in Figure 5. It is as
though this speaker was treating these levels of emphasis as one
category of emphasis. However, the speaker did make a distinc-
tion between other levels of emphasis. The correlation between
emphasis level and duration is therefore still high (r = .76). In
the regression analysis, the coefficient estimate is 51 ms (t(240)
= 17.9, p < .001). The smaller estimate is also reflected in this
speaker’s duration range; in Figure 5, the duration range is
about 600 ms, whereas for the previous speakers, the duration
ranges are between approximately 800 ms and 1000 ms (Fig-
ures 2-4).
As shown in Figure 6, Speaker TT shows the next highest
correlation between emphasis level and duration (r = .66). This
speaker shows large variability in several conditions (as repre-
sented in the size of the error bars for these conditions); e.g.
emphasis level 5 for [a], and at all emphasis levels for [u]. This
speaker also does not show a difference between level 1 and
level 2 for [u]. These behaviors may be responsible for the
lower r-value of this speaker compared to those discussed
above. In the regression analysis, the coefficient estimate is 75
ms (t(242) = 13.8, p < .001).
As shown in Figure 7, Speaker SX does not show differ-
ences between several of the conditions: between level 2 and
level 3 for [a], between level 4 and level 5 for [o] and between
level 3 and level 4 for [u]. The lack of differences in these con-
ditions resulted in an r-value that is lower than previous speak-
ers (r = .61); however, this linear correlation is still high. In the
regression analysis, the coefficient estimate is 27 ms (t(248) =
12.1, p < .001).
Finally, as shown in Figure 8, Speaker TV shows a more or
less binary distinction—i.e. non-emphatic vs. emphatic—al-
though we do observe a slight increase in duration as emphasis
levels go higher (r = .41). Indeed the regression analysis reveals
that the coefficient estimate is as low as 12 ms, although it did
reach statistical significance (t(245) = 7.2, p < .001).
Table 3 provides a summary of each speaker’s data. It pro-
vides a regression function, an r value as a measure of the
strength of the linear correlation between emphasis levels and
duration, and maximum duration (token-wise) as a measure of
their duration range—the range each speaker is willing to use
for the emphatic vowels.
In spite of some inter-speaker variability, all speakers
showed a positive, steady correlation between level of emphasis
and vowel duration. Speakers TF and TX showed a perfect six-
way durational distinction, without much overlap in error bars.
While other speakers did not show all these distinctions quite as
clearly, they showed a (mostly) steady linear increase in dura-
tion as emphasis levels increased. Furthermore, all speakers ex-
cept Speaker TV made an at least 5-way distinction: they either
had all levels distinguished, o did not show a difference r
Figure 4.
The average durations of each emphasis level with 95% confidence intervals: Speaker TN.
Figure 5.
The average durations of each emphasis level with 95% confidence intervals: Speaker TW.
Figure 6.
The average durations of each emphasis level with 95% confidence intervals: Speaker TT.
Figure 7.
The average durations of each emphasis level with 95% confidence intervals: Speaker SX.
between two (but not more than two) adjacent levels (with the
potential exception of [u] for Speaker TW). On the other hand,
Speaker TV appeared to make an (almost) binary distinction
between emphasized and non-emphasized vowels. Overall,
there were no evident significant reversals, where higher em-
phasis levels would have shown shorter durations (perhaps
except for Speaker TV’s [a], level 4 and level 5).
In Table 3, we can observe that there is some association
between the strength of correlation (r) and the maximum dura-
tion a speaker used; for example, Speaker TF, who showed the
Copyright © 2013 SciRes. 145
Figure 8.
The average durations of each emphasis level with 95% confidence intervals: Speaker TV.
Table 3.
Summary of each speaker’s behavior.
Speaker Regression function r Max dur (ms)
Speaker TF y = 152 + 120x .89 975
Speaker TN y = 217 + 78x .81 782
Speaker TX y = 320 + 105x .79 1301
Speaker TW y = 209 + 51x .76 670
Speaker TT y = 299 + 75x .66 1055
Speaker SX y = 291 + 27x .61 603
Speaker TV y = 395 + 12x .41 533
highest correlation, used a large duration range, whereas Spea-
ker TV, who showed the lowest correlation, used the smallest
duration range. The correlation is not perfect, however, since
Speaker TT showed the second-largest duration range, yet this
speaker has the third-lowest r-value.
General Discussion
The current study, to the best of our knowledge, has provided
the first experimental description of the emphatic vowel len-
gthening pattern in Japanese. Although there is some inter-
speaker variability, several speakers were able to make dur-
ational distinctions as fine-grained as six-ways. Other speakers
showed a positive correlation between durations and emphasis
levels, all to a statistically significant degree. These patterns are
in line with the conclusions drawn from a companion study on
emphatic consonant lengthening in Japanese (Kawhara, 2012b),
which used a method that is similar to the current experiment to
measure the duration of Japanese consonants with multiple
degrees of emphasis (The current speakers and those who par-
ticipated in Kawahara (2012b) do not overlap).
Taken together, one general implication of our current study,
beyond providing an experimental description of the Japanese
vowel lengthening pattern, is that the current results show that
Japanese speakers have articulatory controls which enable them
to potentially make six-way durational distinctions.
Further Questions
One question that arises, given that speakers can make such
fine-grained durational distinctions, is why natural languages
generally deploy only a two-way distinction for lexical con-
trasts (as discussed in the introduction). One possible answer to
this question is that a three way durational contrast may be dif-
ficult to unambiguously perceive in real communicative situa-
tions—in other words, perceptual distinctiveness restricts a
range of possible contrasts that the grammar can deploy (see e.g.
Boersma, 1998; Diehl et al., 2004; Flemming, 1995, 2004; Lil-
jencrants & Lindblom, 1972; Lindblom, 1986; Padgett, 2002;
Schwartz et al., 1997a, 1997b; see especially Engstrand & Krull,
1994; Podesva, 2000; and Kawahara, 2012a for the grammati-
cal imperatives on perceptual dispersion in durational contrasts).
For a discussion of an alternative, more formally-based expla-
nation, see the companion paper (Kawahara, 2012b).
The current study also raises many questions which should
be addressed in future studies. For example, would Japanese
listeners be able to track these different degrees of emphasis?
The current experiment used only up to 5 levels of emphasis,
but given how well some speakers performed, what would the
real limit for Japanese speakers be? Although the current paper
focused on vowels only, it is possible to lengthen consonants
(Kawahara, 2012b), and it is also possible to lengthen both
vowels and consonants: e.g. [suɡɡoo-i] (すっごーい). How
vowel lengthening and consonant lengthening interact is an
interesting question. Also, it is possible to lengthen stem-initial
vowels [suuɡo-i] (すーごい) instead of stem-final vowels
[suɡoo-i] (すごーい). Whether position of emphasis affects
durational manifestations of vowels is question worth pursuing.
Additionally, the differences, if any, between lengthened vow-
els and sequences of (heteromorphemic) vowel sequences (Ka-
kehi & Hirose, 1997), merits investigation. Toshio Matsuura
(p.c.) offers an example paradigm in Table 4 to address this last
question (although this paradigm does not control for accent).
Comparing a paradigm like the one in Table 4, which con-
tains heteromorphemic strings of up to six [o]s, with our current
results, which show six-way contrasts within a single adjective,
may reveal interesting effects of morphological boundaries on
phonetics (see e.g. Bird, 2004; Cho, 2001; Frazier, 2006; Pluy-
maekers et al., 2010).
Moving beyond Japanese, would we expect speakers of other
languages to be able to produce similar durational differences
(and would they make as many levels of distinction)? Would
other languages draw the boundaries between each durational
level at the same place? Would there be a difference between
languages that exploit duration-based contrasts (as in Japanese)
and those that do not? In English, for example, we observe
examples like: Thank you sooooooo much, I loooooooove you
Copyright © 2013 SciRes.
Table 4.
An illustration of one stimulus set in Japanese.
a. 甥と言った [oi-to-itta] “I said ‘nephew’”
b. 遠いと言った [too-i-to-itta] “I said ‘far’”
c. 子を置いた [ko-o-oita] “I placed a child”
d. 甲を置いた [koo-o-oita] “I placed the back of my hand”
e. 憎悪を置いた [zoo-o-o-oita] “I set aside my anger”
f. 法王を置いた [hoo-oo-o-oita] “I set aside the Pope”
and Shes so cuuuuuuute. Given these stimuli, would English
speakers make distinctions similar to those of the Japanese
speakers tested in this experiment?
Further, as an anonymous reviewer points out, semantic fo-
cus can be realized in acoustic dimensions other than duration;
e.g. stronger intensity and pitch range expansion (see Ishihara,
2003; Liu & Xu, 2005; Taheri Ardali & Xu, 2012; Xu, 2005
among many others). It remains to be investigated how Japa-
nese speakers (and speakers of other languages, for that matter)
make use of these acoustic dimensions to express the sort of
emphasis investigated in this paper.
Finally, Hirata and Tsukuda (2009) show that long vowels
are more dispersed in their F1 and F2 dimensions than short
vowels in Japanese. Thus, the effects of emphatic vowel len-
gthening on formant displacement should be explored in future
studies. All of these are interesting questions, which are, how-
ever, beyond the scope of the current study.
A Final Remark
We would like to close with a remark about the distinction
between non-emphatic vowels and emphatic vowels. Recall that
all the speakers produced the emphatic vowels as longer than
the non-emphatic vowels, despite the fact that not all speakers
realized differences among all different levels of emphasis.
Moreover, as observed in all the figures, all speakers showed a
very large increase in duration from non-emphatic vowels to
emphatic vowels, and this increase is larger than the observed
differences between the various levels of emphatic vowels. We
therefore suggest that Japanese speakers overall make a binary
distinction between emphatic and non-emphatic durations, and
within the emphatic durations, speakers differ in how to acous-
tically realize the degrees of emphasis. This conclusion may
imply that, semantically speaking, the difference between non-
emphatic and emphatic is more important than different degrees
of emphasis. Further, Japanese speakers attempt to reflect this
difference in semantic importance in their production of em-
phatic and non-emphatic vowels. Again, we find the same pat-
terning in the companion study on consonant lengthening (Ka-
wahara, 2012b), which reinforces this conclusion.
The recording for this experiment was conducted while the
first author was a visiting scholar at International Christian
University, which was made possible by a fellowship offered
by the Japan ICU Foundation (JICUF). We also thank Shin-
ichiroo Sano and Tomo Yoshida for their support in arranging
the recording sessions. Research assistants at the Rutgers Pho-
netics Laboratory—Nat Dresher, Chris Kish, Sarah Korostoff,
Michelle Marron, Jessica Trombetta, and especially, Melanie
Pangilinan—offered indispensable help with the acoustic ana-
lysis. We thank John Kingston for comments on this project, as
well as Kyle Gorman, Manami Hirayama, Chris Kish, Toshio
Matsuura, Hope McManus, Atsushi Oho, Akiko Takemura, and
anonymous reviewers for comments on earlier versions of this
paper. The publication of this project was supported by the
startup funds provided to the first author by Rutgers University.
Remaining errors are ours.
Aizawa, Y. (1985). Intensification by so-called “choked sounds”—long
consonants—in Japanese. The Study of Sounds, 21, 313-324.
Behne, D., Arai, T., Czigler, P., & Sullivan, K. (1999). Vowel duration
and spectra as perceptual cues to vowel quantity: A comparison of
Japanese and Swedish. Proceedings of ICPhS 1999, 857-860.
Bird, S. (2004). Lheidli intervocalic consonants: Phonetic and morpho-
logical effects. Journal of the International Phonetic Association, 34,
69-91. doi:10.1017/S0025100304001616
Boersma, P. (1998). Functional phonology: Formalizing the interaction
between articulato ry and perceptual drives. The Hague: Holland Aca-
demic Graphics.
Boersma, P., & Weenink, D. (1999-2013). Praat: Doing phonetics by
computer. Software.
Braver, A., & Kawahara, S. (2012). Complete and incomplete neutrali-
zation in Japanese monomoraic lengthening. Newark, NJ: Rutgers
Cedrus Corporation (2010). Superlab v. 4.0. Software.
Chen, M. (1970). Vowel length variation as a function of the voicing of
the consonant environment. P honetica, 22, 129-159.
Cho, T. (2001). Effects of morpheme boundaries on gestural timing:
Evidence from Korean. Phonetica, 5 8, 129-162.
Diehl, R., Lindblom, B., & Creeger, C. (2004). Increasing realism of
auditory representations yields further insights into vowel phonetics.
Proceedings of ICPhS, XV, 1381-1384.
Engstrand, O., & Krull, D. (1994). Durational correlates of quantity in
Swedish, Finnish and Estonian: Cross-language evidence for a theory
of adaptive dispersion. Phonetica, 51, 80-91.
Flemming, E. (1995). Auditory representations in phonology. Doctoral
Dissertation, Los Angeles, CA: University of California Los Ange-
Flemming, E. (2004). Contrast and perceptual distinctiveness. In B.
Hayes, R. Kirchner, & D. Steriade (Eds.), Phonetically based phono-
logy (pp. 232-276). Cambridge: Cambridge University Press.
Frazier, M. (2006). Output-output faithfulness to moraic structure: Evi-
dence from American English. In C. Davis, A. R. Deal, & Y. Zabbal
(Eds.), Proceedings of north east linguistic society (pp. 1-14). Am-
herst: GLSA Publications.
Han, M. (1962). The feature of duration in Japanese. Onsei no Kenkyuu
[Studies in Phonetics ], 10, 65-80.
Hirata, Y. (2004). Effects of speaking rate on the vowel length distinc-
tion in Japanese. Journal of Phonetics, 32, 565-589.
Hirata, Y., & Lambacher, S. G. (2004). Role of word-external contexts
in native speakers’ identification of vowel length in Japanese. Pho-
netica, 61, 177-200. doi:10.1159/000084157
Hirata, Y., & Tsukada, K. (2009). Effects of speaking rate and vowel
length on formant frequency displacement in Japanese. Phonetica, 66,
129-149. doi:10.1159/000235657
Hoequist, C. E. (1982). Duraional correlates of linguistic rhythm cate-
gories. Phonetica, 40, 19-31. doi:10.1159/000261679
Hoogshagen, S. (1959). Three contrastive vowel lengths in Mixe. Zeit-
Copyright © 2013 SciRes. 147
Copyright © 2013 SciRes.
schrift für Phonetik und allgemeine Sprachwissenschafe, 12, 111-
Ishihara, S. (2003). Intonation and interface conditions. Doctoral Dis-
sertation, Cambridge, MA: MIT.
Jany, C. (2006). Vowel length and phonation contrasts in Chuxnabán
Mixe. Santa Barbara papers in linguistics 18: Proceedings from the
9th Annual Workshop on Native American Language.
Jany, C. (2007). Phonemic versus phonetic correlates of vowel length in
n Mixe. Proceedings of Berkeley linguistics society 33S:
Languages of Mexico and Central America. Berkeley: Berkeley Lin-
guistics Society.
Kakehi, K., & Hirose, Y. (1997). Taishoo-kenkyuu-o riyoosh-ita moora
suu chikaku-no kentou [Constructive study on mora identification].
Onsei Kenkyuu [Journal of the Phonetic Society of Japan], 1, 23-28.
Kawahara, S. (2001). Similarity among variants: Output-variant corre-
spondence. BA Thesis, Tokyo: International Christian University.
Kawahara, S. (2006). A faithfulness ranking projected from a percepti-
bility scale: The case of [+voice] in Japanese. Language, 82, 536-574.
Kawahara, S. (2012a). Amplitude changes facilitate categorization and
discrimination of length contrasts. IEICE Technical Report. The In-
stitute of Electronics, Information, and Communication Engineers,
112, 67-72.
Kawahara, S. (2012b). Durational properties of emphatic consonants in
Japanese. Newark, NJ: Rutgers University.
Kawahara, S. (2013). Emphatic gemination in Japanese mimetic words:
A wug-test with auditory stimuli. Language sciences.
Kawahara, S., & Shinya, T. (2008). The intonation of gapping and co-
ordination in Japanese: Evidence for international phrase and utter-
ance. Phonetica, 65, 62-105. doi:10.1159/000130016
Kingston, J., & Diehl, R. (1994). Phonetic knowledge. Language, 70,
419-454. doi:10.2307/416481
Kinoshita, K., Behne, D., & Arai, T. (2002). Duration and F0 as per-
ceptual cues to Japanese vowel quantity. Proceedings of ICSLP,
Denver, 757-760.
Kluender, K., Diehl, R., & Wright, B. (1988). Vowel-length differences
before voiced and voiceless consonants: An auditory explanation.
Journal of Phonetics, 16, 153-169.
Labrune, L. (2012). The phonology of Japanese. Oxford: Oxford Uni-
versity Press.
Ladefoged, P., & Maddieson, I. (1996). The sounds of the world’s
languages (2nd ed.). Oxford: Blackwell Publishers.
Lehiste, I. (1970). Suprasegmentals. Cambridge: MIT Press.
Liljencrants, J., & Lindblom, B. (1972). Numerical simulation of vowel
quality systems: The role of perceptual contrast. Language, 48,
839-862. doi:10.2307/411991
Lindblom, B. (1986). Phonetic universals in vowel systems. In J. Ohala,
& J. Jaeger (Eds.), Experimental Phonology (pp. 13-44). Orlando:
Academic Press.
Lisker, L. (1986). “Voicing” in English: A catalog of acoustic features
signaling /b/ versus /p/ in trochees. Language and Speech, 2 9, 3-11.
Liu, F., & Xu, Y. (2005). Parallel encoding of focus and interrogative
meaning in Mandarin intonation. Phonetica, 72, 70-87.
Maekawa, K. (1998). Phonetic and phonological characteristics of
paralinguistic information in spoken Japanese. Proceedings of ICSLP
1998, Australian Speech Science and Technology Association, Incor-
porated (ASSTA).
Moreton, E., & Amano, S. (1999). Phonotactics in the perception of
Japanese vowel length: Evidence for long distance dependencies.
Proceedings of the 6th European Conference on Speech Communi-
cation and Technology, Budapest.
Myers, S., & Hansen, B. (2007). The origin of vowel length neutraliza-
tion in final position: Evidence from Finnish speakers. Natural Lan-
guage and Linguistic Theory, 25, 157-193.
Nasu, A. (1999). Chouhukukei onomatope no kyouchou keitai to yuu-
hyousei (Emphatic forms of reduplicative mimetics and markedness).
Nihongo/Nihon Bunka Kenkyuu, Japan/Japanese Culture, 9, 13-25.
Padgett, J. (2002). Contrast and postvelar fronting in Russian. Natural
Language and Linguistic Theory, 21, 39-87.
Pluymaekers, M., Ernestus, M., Baayen, H., & Booij, G. (2010). Mor-
phological effects on fine phonetic detail: The case of Dutch igheid.
In C. Fougeron, B. Kühnert, M. D’Imperio, & N. Vallée (Eds.), La-
boratory Phonolog y 10 (pp. 511-531). Berlin: Mouton De Gruyter.
Podesva, R. (2000). Constraints on geminates in Burmese and Sela-
yarese. In R. Bilerey-Mosier, & B. D. Lillehaugen (Eds.), Proceed-
ings of West Coast Conference on Formal Linguistics 19 (pp. 343-
356). Somerville: Cascadilla Press.
Port, R., Dalby, J., & O’Dell, M. (1987). Evidence for mora timing in
Japanese. Journal of the Acoustical Society of America, 81, 1574-
1585. doi:10.1121/1.394510
Prince, A. (1980). A metrical theory for Estonian quantity. Linguistic
Inquiry, 11, 511-562.
R Development Core Team (1993-2013). R: A language and environ-
ment for statistical computing. R Foundation for Statistical Comput-
ing, Vienna, Austria.
Schwartz, J.-L., Boë, L.-J., Valleé, N., & Abry, C. (1997a). The disper-
sion focalization theory of vowel systems. Journal of Phonetics, 25,
255-286. doi:10.1006/jpho.1997.0043
Schwartz, J.-L., Boë, L.-J., Valleé, N., & Abry, C. (1997b). Major
trends in vowel system inventtories. Journal of Phonetics, 25, 233-
253. doi:10.1006/jpho.1997.0044
Taheri, A. M., & Xu, Y. (2012). Phonetic realization of prosodic focus
in Persian. Proceedings of Speech and Prosody 2012, Shanghai.
Thomas, K. D., & Shaterian, A. (1990). Vowel length and pitch in Ya-
vapai. In M. Langdon (Ed.), Papers from the 1990 Hokan-Penutian
languages workshop (pp. 144-153). University of Southern Illinois.
Turk, A., Nakai, S., & Sugahara, M. (2006). Acoustic segment dura-
tions in prosodic research: A practical guide. In S. Sudhoff, D. Len-
ertova, R. Meyer, S. Pappert, P. Augurzky, I. Mleinek, N. Richter, &
J. Schliesser (Eds.), Methods in Empirical Prosody Research (pp.
1-27). Berlin: Walter de Gruyter. doi:10.1515/9783110914641.1
Xu, Y., & Xu, C. X. (2005). Phonetic realization of focus in English
declarative intonation. Journal of Phonetics, 33, 159-197.