The Phonetics of Multiple Vowel Lengthening in Japanese

doi:10.4236/ojml.2013.32019

Paper Menu >>

Journal Menu >>

Open Journal of Modern Linguistics

2013. Vol.3, No.2, 141-148

Published Online June 2013 in SciRes (http://www.scirp.org/journal/ojml) http://dx.doi.org/10.4236/ojml.2013.32019

The Phonetics of Multiple Vowel Lengthening in Japanese

Shigeto Kawahara1, Aaron Braver2

1The Institute of Cultural and Linguistic Studies, Keio University, Tokyo, Japan

2Department of Linguistics, Rutgers University, New Brunswick, USA

Email: kawahara@icl.keio.ac.jp

Received November 2nd, 2012; revised April 21st, 2013; accepted April 29th, 2013

Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium,

provided the original work is properly cited.

Many languages exploit a short vs. long lexical contrast in vowels. In most, if not all of these languages,

the contrast is binary. In Japanese, however, speakers can lengthen vowels to express emphasis, and mul-

tiple degrees of lengthening can be used to express different degrees of emphasis. This paper offers the

first experimental documentation of this emphatic vowel lengthening phenomenon. The current results

demonstrate that, among the seven speakers recorded, at least a few speakers show six-levels of distinc-

tion in duration, and all but one speaker showed a steady linear correlation between duration and level of

emphasis. We conclude that Japanese speakers have articulatory control that allows them to make very

fine-grained durational distinctions, which go beyond mere binary short vs. long distinctions.

Keywords: Vowel Length; Duration; Emphasis; Japanese; (Non-)Binarity

Introduction

Many languages distinguish short vowels from long vowels

to make lexical contrasts, but these duration-based length con-

trasts are usually binary; e.g. [hato] “dove” vs. [haato] “heart”

and [obasaN] “aunt” vs. [obaasaN] “grandmother” in Japanese.

While there is the rare typological exception such as Estonian,

in which this contrast can be ternary (Prince, 1980), the distri-

bution of superlong vowels is constrained by various prosodic

and morphological factors (see Ladefoged & Maddieson, 1996;

Lehiste, 1970; Prince, 1980 for discussion). Ladefoged and Ma-

ddieson (1996: p. 320) state that Mixe (Hoogshagen, 1959) is

the only language that they know of that has a purely lexical

duration-based three-way contrast (cf. Jany, 2006, 2007), al-

though they also mention Yavapai (Thomas & Shaterian, 1990)

as another possible candidate. At any rate, three-way vowel

length contrasts are rare at best cross-linguistically, and in the

languages where they do exist, the ternary contrast is prosodi-

cally and/or morphologically restricted. As far as we know,

there are no convincing cases of languages that make use of a

purely lexical four-way (or greater) duration-based length con-

trast in vowels1.

In Japanese, however, speakers can use vowel lengthening to

express emphasis. This process is commonly found in collo-

quial Japanese; a quick Google search (http://www.google.co.jp)

with examples like [suɡoo-i] (すごーい) “great” and [çidoo-i]

(ひどーい) “awful” with lengthened stem-final vowels yields

many hits. In addition, this pattern can manifest as multiple

levels of emphasis (and therefore lengthening), extending be-

yond the familiar short/long binary distinction2.

This study offers the first experimental documentation of the

vowel lengthening pattern3. One theoretical contribution of this

paper is to investigate exactly how many levels of durational

distinction Japanese speakers can make in expressing different

degrees of emphasis—especially given that lexical vowel length

contrasts are usually limited to a binary distinction in many

languages, including Japanese.

Durational properties of Japanese short vowels and long

vowels have been studied rather extensively in the previous

literature both in terms of their production and perception (Be-

hne et al., 1999; Braver & Kawahara, 2012; Han, 1962; Hirata,

2004; Hirata & Lambacher, 2004; Hirata & Tsukuda, 2009;

Hoequist, 1982; Kinoshita et al., 2002; Moreton & Amano,

1999; Port et al., 1987). These studies have shown that duration

is the major acoustic and perceptual correlate of short vs. long

contrasts in Japanese, although there may be slight differences

in formant characteristics as well, in such a way that long vow-

els are more dispersed in F1 and F2 dimensions than short

2Japanese speakers can also lengthen consonants to express emphasis (Ai-

zawa, 1985; Kawahara, 2001, to appear; Nasu, 1999). For a phonetic study

testing different degrees of lengthening of Japanese consonants, see Kawa-

hara (2012b). For a previous

honetic study investigating various acoustic

properties of “paralinguistic focus”, which may be similar to what the cur-

rent project examines, see Maekawa (1998).

We also note, as we will discuss in the General Discussion section, that

English has a similar process, as in Thank you sooooo much and she’

cuuuuuuute. See a post on Language Log by Mark Liberman

(http://languagelog.ldc.upenn.edu/ nll/?p=2006) for related observations. It

is beyond the scope of the current study to conduct a cross-linguistic com-

parison, but a cross-linguistic study of this sort of lengthening phenomena

is certainly hoped for.

3The current study focuses on the durational properties of the vowel leng-

thening pattern. See Section 5 for discussion of other acoustic correlates

that may possibly accompany the lengthening pattern.

1When two phonological contrasts interact, it is possible to have a four-way

durational difference. For example, vowels are usually longer before voiced

stops than before voiceless stops (e.g. Chen, 1970; Kawahara, 2006; King-

ston & Diehl, 1994; Kluender et al., 1988; Lisker, 1986). This lengthening

effect may interact with a phonemic vowel length contrast to yield a

four-way durational difference; e.g. VT < VD < VVT < VVD. What we do

not observe, however, is one lexical contrast that is realized as a four-way

durational diffe

ence.

S. KAWAHARA, A. BRAVER

vowels (Hirata & Tsukuda, 2009).

Although the phonetics of Japanese short and long vowels

has been well studied in the past, to the best of our knowledge,

there has not been experimental documentation of the emphatic

lengthening pattern, which makes use of multiple levels of dur-

ational distinctions. One relevant study is Kakehi and Hirose

(1997) which tested the production of (heteromorphemic) se-

quences of the same vowels across morphemes in Japanese (e.g.

Matsue e ejiten-wo okutta “(I) sent a picture dictionary to Mat-

sue”), and showed that Japanese speakers do make a distinction

among 2 consecutive [e]s, 3 consecutive [e]s, 4 consecutive [e]s,

and 6 consecutive [e]s in their production. Drawing on this

study, our study below investigates vowel lengthening patterns

with multiple levels, and shows that Japanese speakers can

make similar fine-detailed durational distinctions even within

single morphemes, and that this fine distinction can hold across

a wider range of vowels in Japanese.

Method

Stimuli

This study used emphasis of stem-final vowels in adjectives

which are commonly observed in Japanese casual speech. The

stimuli were grouped according to their final vowels, [a, o, u],

which commonly appear stem-finally in Japanese adjectives4.

For each vowel, two adjectives were chosen. The adjectives

used in this experiment are listed in Table 1, where [-i] is an

adjectival ending (present/non-past tense). All the stimuli were

disyllabic and had a lexical pitch accent on the second syllable

(i.e. the second syllable had an HL falling pitch contour). A

subject noun was added to each adjective to make a complete

sentence: e.g. [çiza-ɡa ita-i] “(I have) a knee pain”5.

In Japanese orthography, vowel length can be expressed us-

ing “ー” following the target vowel6. In this experiment, in

addition to the non-lengthened rendition, five different degrees

of emphasis were included as stimuli, as illustrated in Table 2.

There were a total of 36 stimuli (3 vowels * 2 adjectives * 6

emphasis levels). A random number was assigned to each

stimulus item so that transcribers could later track which item

had been produced.

Participants

The participants were seven native speakers of Japanese

(anonymously coded as Speakers TF, TN, TX, TW, TT, SX,

TV). They were all undergraduate students at International

Table 1.

The list of stimuli.

[a] [o] [u]

[kata-i] “hard” [suɡo-i] “great” [nemu-i] “sleepy”

[ita-i] “aching” [çido-i] “awful” [samu-i] “cold”

Table 2.

An illustration of one stimulus set in Japanese orthography.

Japanese orthography Transcription Condition

a. いたい [itai] No emphasis

b. いたーい [itaai] Level 1 emphasis

c. いたーーい [itaaai] Level 2 emphasis

d. いたーーーい [itaaaai] Level 3 emphasis

e. いたーーーーい [itaaaaai] Level 4 emphasis

f. いたーーーーーい [itaaaaaai] Level 5 emphasis

Christian University (Tokyo, Japan). They were paid 500 Japa-

nese yen for their time. They all signed a consent form before

participating in the experiment.

Procedure

The recording sessions took place in a sound-attenuated

room at International Christian University. The stimuli and all

instructions were presented in Japanese orthography using Su-

perlab ver. 4.0 (Cedrus Corporation, 2010). In the instructions,

speakers were told that the experiment was about multiple lev-

els of emphasis in Japanese, and that they were going to read

sentences with vowels of differing length. They were instructed

to read the whole frame sentence, not just the target words, for

each stimulus.

Each block contained one token of every stimulus item. The

speakers were allowed to take a short break after each block.

The order of the stimuli within each block was randomized by

Superlab. The speakers went through ten blocks, which resulted

in a total of 360 tokens (36 stimuli * 10 repetitions). Each spea-

ker was assigned 30 minutes for the experiment.

Before the main session, as practice, each speaker read all the

stimuli once to familiarize themselves with the stimuli and the

task. After the practice phase, the experimenter (the first author)

clarified any questions that they had. Speakers were recorded

directly via a portable recorder (TASCAM DR-40) with a 44.1

kHz sampling rate and a 16 bit quantization level. The first au-

thor sat with each speaker throughout the experiment to moni-

tor the progress of the recording.

4Most stem-final vowels in Japanese adjectives are back, although there are

some exceptions (e.g. [samiɕi-i] “lonely”).

5The target words were placed sentence-finally, and as a result some o

them showed some creakiness and/or weakening (see e.g. Kawahara &

Shinya, 2008; Myers and Hansen 2007). Although this property of the

stimuli did not cause a particular problem for the present acoustic analysis,

a follow-up study which places the target stimuli in sentence internal posi-

tions may be worthwhile.

6A long vowel is written as a sequence of two letters in the case of the

hiragana orthography. For example, [kaasaN] “mother” is written in hira-

gana as ‘かあさん’ (ka + a + sa + N). Long [e:] and [o:] can also be ortho-

graphically expressed as ei and oi in some contexts. For example, ou [oo]

“king” would be written in hiragana as “おう” (o + u) and eiga [eega]

“movie” as “えいが” (e + i + ga). In loanwords as well as in this expressive

emphasis pattern, however, the length mark (ー) is used to express vowel

length. See Labrune (2012) for a recent explanation (in English) of the

Japanese orthographic system.

Acoustic Analysis

The duration of each stem-final vowel plus the adjectival

suffix [i] was measured. We did not attempt to put a boundary

between the stem-final vowels and the suffixal [i], because the

transitions from the stem vowels into the suffixal [i] were

blurry (a vowel-to-vowel transition is generally blurry and hard

to unambiguously locate in an acoustic analysis: Turk et al.,

2006). However, since only the stem-final vowels were empha-

142

S. KAWAHARA, A. BRAVER

sized, and not the suffixal vowel (see Table 2), the duration of

[i] should be more or less constant across all conditions. Vowel

onset and offset were determined by inspecting both waveforms

and spectrograms, and the boundaries were placed where F2

and F3 (dis-)appear. Sample spectrograms are shown in Figure

After the segmental boundaries were placed, the durations of

the target intervals were automatically extracted. Acoustic mea-

surements were done using Praat (Boersma & Weenink, 1999-

2013).

Statistics

Since there are many comparisons (6 levels of emphasis * 3

types of vowels * 7 speakers), no pair-wise comparisons at each

emphasis level were conducted, in order to avoid Type I error

(i.e. to avoid finding some significant effects by chance). How-

ever, error bars, which represent 95% confidence intervals, are

provided in the result figures. They were calculated over 20

repetitions of each vowel (2 adjectives * 10 repetitions), except

when speakers mispronounced some relevant token. A post-hoc

inspection of the data showed that a linear regression analysis

would be useful, so they are reported in the results section. All

statistical analyses were performed using R (R Development

Core Team, 1993-2013). R was also used to generate result

figures.

Results

Since different speakers showed different patterns, we report

the results of individual speakers separately, and present a

summary in the next section after reporting the results of indi-

vidual speakers. We start first by discussing those speakers who

showed the clearest distinctions among the different emphasis

levels. First, as shown in Figure 2, Speaker TF seems to make

a perfect six-way distinction; i.e., the vowel durations for each

level of emphasis are different from those of every other level

of emphasis for this speaker, and error bars do not overlap.

There are large jumps in duration from the non-emphatic

level to the first level of emphasis; with each additional degree

of emphasis, there is a shorter, but steady, increase in duration.

To assess the correlation between emphasis level and dura-

tion, a linear regression analysis was run with vowel duration as

the dependent variable, and emphasis level as the independent

variable. Since the increase from non-emphatic vowels to the

first level of emphasis is non-linear, they were excluded from

this regression analysis. The coefficient estimate of the regres-

sion analysis is 120 ms (t(247) = 30.8, p < .001). This correla-

tion estimate represents an average durational increase per em-

phasis level for this speaker. In other words, it estimates that for

each level of emphasis, vowel duration should increase by 120

ms. The correlation between duration and emphasis level is

very high (r = .89), showing that the linear relationship between

durational increase and emphasis level is very strong.

As shown in Figure 3, like Speaker TF, Speaker TX shows a

six-level distinction among emphatic vowels. The average du-

ration for each condition differs, and error bars barely overlap.

In the regression analysis, the coefficient estimate is 105 ms

(t(245) = 20.2, p < .001), and the correlation estimate r is .79.

As with Speaker TF, there are large durational jumps from

non-emphatic to emphatic vowels. The emphatic vowels show

steady, linear increases in duration, except for exceptionally

Figure 1.

Sample spectrograms: no emphasis, emphasis level 1, emphasis level 2.

ll time scales are 1000 ms. A

S. KAWAHARA, A. BRAVER

144

Figure 2.

The average durations of each emphasis level with 95% confidence intervals: Speaker TF.

Figure 3.

The average durations of each emphasis level with 95% confidence intervals: Speaker TX.

large differences between emphasis level 4 and emphasis level

5. These large, non-linear jumps may be responsible for the

lower r-value compared to that of Speaker TF. Presumably, for

this speaker, the most emphatic vowel has a special status, so it

receives extra lengthening.

Speaker TN, as shown in Figure 4, showed the next clearest

increase in duration as the emphasis levels go up. Although the

speaker generally does not show a clear difference between

level 1 and level 2 emphasis for any of the three vowels, the

speaker nevertheless seems to make a difference between the

other emphasis levels. This speaker also makes an exception-

ally large increase from level 4 to level 5 for [a]. In the regres-

sion analysis, the coefficient estimate is 78 ms (t(230) = 20.7, p

< .001), and the correlation coefficient r is .81.

Speaker TW did not show differences between level 1 and

level 2 (or level 3 for [u]), as illustrated in Figure 5. It is as

though this speaker was treating these levels of emphasis as one

category of emphasis. However, the speaker did make a distinc-

tion between other levels of emphasis. The correlation between

emphasis level and duration is therefore still high (r = .76). In

the regression analysis, the coefficient estimate is 51 ms (t(240)

= 17.9, p < .001). The smaller estimate is also reflected in this

speaker’s duration range; in Figure 5, the duration range is

about 600 ms, whereas for the previous speakers, the duration

ranges are between approximately 800 ms and 1000 ms (Fig-

ures 2-4).

As shown in Figure 6, Speaker TT shows the next highest

correlation between emphasis level and duration (r = .66). This

speaker shows large variability in several conditions (as repre-

sented in the size of the error bars for these conditions); e.g.

emphasis level 5 for [a], and at all emphasis levels for [u]. This

speaker also does not show a difference between level 1 and

level 2 for [u]. These behaviors may be responsible for the

lower r-value of this speaker compared to those discussed

above. In the regression analysis, the coefficient estimate is 75

ms (t(242) = 13.8, p < .001).

As shown in Figure 7, Speaker SX does not show differ-

ences between several of the conditions: between level 2 and

level 3 for [a], between level 4 and level 5 for [o] and between

level 3 and level 4 for [u]. The lack of differences in these con-

ditions resulted in an r-value that is lower than previous speak-

ers (r = .61); however, this linear correlation is still high. In the

regression analysis, the coefficient estimate is 27 ms (t(248) =

12.1, p < .001).

Finally, as shown in Figure 8, Speaker TV shows a more or

less binary distinction—i.e. non-emphatic vs. emphatic—al-

though we do observe a slight increase in duration as emphasis

levels go higher (r = .41). Indeed the regression analysis reveals

that the coefficient estimate is as low as 12 ms, although it did

reach statistical significance (t(245) = 7.2, p < .001).

Summary

Table 3 provides a summary of each speaker’s data. It pro-

vides a regression function, an r value as a measure of the

strength of the linear correlation between emphasis levels and

duration, and maximum duration (token-wise) as a measure of

their duration range—the range each speaker is willing to use

for the emphatic vowels.

In spite of some inter-speaker variability, all speakers

showed a positive, steady correlation between level of emphasis

and vowel duration. Speakers TF and TX showed a perfect six-

way durational distinction, without much overlap in error bars.

While other speakers did not show all these distinctions quite as

clearly, they showed a (mostly) steady linear increase in dura-

tion as emphasis levels increased. Furthermore, all speakers ex-

cept Speaker TV made an at least 5-way distinction: they either

had all levels distinguished, o did not show a difference r

S. KAWAHARA, A. BRAVER

Figure 4.

The average durations of each emphasis level with 95% confidence intervals: Speaker TN.

Figure 5.

The average durations of each emphasis level with 95% confidence intervals: Speaker TW.

Figure 6.

The average durations of each emphasis level with 95% confidence intervals: Speaker TT.

Figure 7.

The average durations of each emphasis level with 95% confidence intervals: Speaker SX.

between two (but not more than two) adjacent levels (with the

potential exception of [u] for Speaker TW). On the other hand,

Speaker TV appeared to make an (almost) binary distinction

between emphasized and non-emphasized vowels. Overall,

there were no evident significant reversals, where higher em-

phasis levels would have shown shorter durations (perhaps

except for Speaker TV’s [a], level 4 and level 5).

In Table 3, we can observe that there is some association

between the strength of correlation (r) and the maximum dura-

tion a speaker used; for example, Speaker TF, who showed the

S. KAWAHARA, A. BRAVER

Figure 8.

The average durations of each emphasis level with 95% confidence intervals: Speaker TV.

Table 3.

Summary of each speaker’s behavior.

Speaker Regression function r Max dur (ms)

Speaker TF y = 152 + 120x .89 975

Speaker TN y = 217 + 78x .81 782

Speaker TX y = 320 + 105x .79 1301

Speaker TW y = 209 + 51x .76 670

Speaker TT y = 299 + 75x .66 1055

Speaker SX y = 291 + 27x .61 603

Speaker TV y = 395 + 12x .41 533

highest correlation, used a large duration range, whereas Spea-

ker TV, who showed the lowest correlation, used the smallest

duration range. The correlation is not perfect, however, since

Speaker TT showed the second-largest duration range, yet this

speaker has the third-lowest r-value.

General Discussion

Summary

The current study, to the best of our knowledge, has provided

the first experimental description of the emphatic vowel len-

gthening pattern in Japanese. Although there is some inter-

speaker variability, several speakers were able to make dur-

ational distinctions as fine-grained as six-ways. Other speakers

showed a positive correlation between durations and emphasis

levels, all to a statistically significant degree. These patterns are

in line with the conclusions drawn from a companion study on

emphatic consonant lengthening in Japanese (Kawhara, 2012b),

which used a method that is similar to the current experiment to

measure the duration of Japanese consonants with multiple

degrees of emphasis (The current speakers and those who par-

ticipated in Kawahara (2012b) do not overlap).

Taken together, one general implication of our current study,

beyond providing an experimental description of the Japanese

vowel lengthening pattern, is that the current results show that

Japanese speakers have articulatory controls which enable them

to potentially make six-way durational distinctions.

Further Questions

One question that arises, given that speakers can make such

fine-grained durational distinctions, is why natural languages

generally deploy only a two-way distinction for lexical con-

trasts (as discussed in the introduction). One possible answer to

this question is that a three way durational contrast may be dif-

ficult to unambiguously perceive in real communicative situa-

tions—in other words, perceptual distinctiveness restricts a

range of possible contrasts that the grammar can deploy (see e.g.

Boersma, 1998; Diehl et al., 2004; Flemming, 1995, 2004; Lil-

jencrants & Lindblom, 1972; Lindblom, 1986; Padgett, 2002;

Schwartz et al., 1997a, 1997b; see especially Engstrand & Krull,

1994; Podesva, 2000; and Kawahara, 2012a for the grammati-

cal imperatives on perceptual dispersion in durational contrasts).

For a discussion of an alternative, more formally-based expla-

nation, see the companion paper (Kawahara, 2012b).

The current study also raises many questions which should

be addressed in future studies. For example, would Japanese

listeners be able to track these different degrees of emphasis?

The current experiment used only up to 5 levels of emphasis,

but given how well some speakers performed, what would the

real limit for Japanese speakers be? Although the current paper

focused on vowels only, it is possible to lengthen consonants

(Kawahara, 2012b), and it is also possible to lengthen both

vowels and consonants: e.g. [suɡɡoo-i] (すっごーい). How

vowel lengthening and consonant lengthening interact is an

interesting question. Also, it is possible to lengthen stem-initial

vowels [suuɡo-i] (すーごい) instead of stem-final vowels

[suɡoo-i] (すごーい). Whether position of emphasis affects

durational manifestations of vowels is question worth pursuing.

Additionally, the differences, if any, between lengthened vow-

els and sequences of (heteromorphemic) vowel sequences (Ka-

kehi & Hirose, 1997), merits investigation. Toshio Matsuura

(p.c.) offers an example paradigm in Table 4 to address this last

question (although this paradigm does not control for accent).

Comparing a paradigm like the one in Table 4, which con-

tains heteromorphemic strings of up to six [o]s, with our current

results, which show six-way contrasts within a single adjective,

may reveal interesting effects of morphological boundaries on

phonetics (see e.g. Bird, 2004; Cho, 2001; Frazier, 2006; Pluy-

maekers et al., 2010).

Moving beyond Japanese, would we expect speakers of other

languages to be able to produce similar durational differences

(and would they make as many levels of distinction)? Would

other languages draw the boundaries between each durational

level at the same place? Would there be a difference between

languages that exploit duration-based contrasts (as in Japanese)

and those that do not? In English, for example, we observe

examples like: Thank you sooooooo much, I loooooooove you

146

S. KAWAHARA, A. BRAVER

Table 4.

An illustration of one stimulus set in Japanese.

a. 甥と言った [oi-to-itta] “I said ‘nephew’”

b. 遠いと言った [too-i-to-itta] “I said ‘far’”

c. 子を置いた [ko-o-oita] “I placed a child”

d. 甲を置いた [koo-o-oita] “I placed the back of my hand”

e. 憎悪を置いた [zoo-o-o-oita] “I set aside my anger”

f. 法王を置いた [hoo-oo-o-oita] “I set aside the Pope”

and She’s so cuuuuuuute. Given these stimuli, would English

speakers make distinctions similar to those of the Japanese

speakers tested in this experiment?

Further, as an anonymous reviewer points out, semantic fo-

cus can be realized in acoustic dimensions other than duration;

e.g. stronger intensity and pitch range expansion (see Ishihara,

2003; Liu & Xu, 2005; Taheri Ardali & Xu, 2012; Xu, 2005

among many others). It remains to be investigated how Japa-

nese speakers (and speakers of other languages, for that matter)

make use of these acoustic dimensions to express the sort of

emphasis investigated in this paper.

Finally, Hirata and Tsukuda (2009) show that long vowels

are more dispersed in their F1 and F2 dimensions than short

vowels in Japanese. Thus, the effects of emphatic vowel len-

gthening on formant displacement should be explored in future

studies. All of these are interesting questions, which are, how-

ever, beyond the scope of the current study.

A Final Remark

We would like to close with a remark about the distinction

between non-emphatic vowels and emphatic vowels. Recall that

all the speakers produced the emphatic vowels as longer than

the non-emphatic vowels, despite the fact that not all speakers

realized differences among all different levels of emphasis.

Moreover, as observed in all the figures, all speakers showed a

very large increase in duration from non-emphatic vowels to

emphatic vowels, and this increase is larger than the observed

differences between the various levels of emphatic vowels. We

therefore suggest that Japanese speakers overall make a binary

distinction between emphatic and non-emphatic durations, and

within the emphatic durations, speakers differ in how to acous-

tically realize the degrees of emphasis. This conclusion may

imply that, semantically speaking, the difference between non-

emphatic and emphatic is more important than different degrees

of emphasis. Further, Japanese speakers attempt to reflect this

difference in semantic importance in their production of em-

phatic and non-emphatic vowels. Again, we find the same pat-

terning in the companion study on consonant lengthening (Ka-

wahara, 2012b), which reinforces this conclusion.

Acknowledgements

The recording for this experiment was conducted while the

first author was a visiting scholar at International Christian

University, which was made possible by a fellowship offered

by the Japan ICU Foundation (JICUF). We also thank Shin-

ichiroo Sano and Tomo Yoshida for their support in arranging

the recording sessions. Research assistants at the Rutgers Pho-

netics Laboratory—Nat Dresher, Chris Kish, Sarah Korostoff,

Michelle Marron, Jessica Trombetta, and especially, Melanie

Pangilinan—offered indispensable help with the acoustic ana-

lysis. We thank John Kingston for comments on this project, as

well as Kyle Gorman, Manami Hirayama, Chris Kish, Toshio

Matsuura, Hope McManus, Atsushi Oho, Akiko Takemura, and

anonymous reviewers for comments on earlier versions of this

paper. The publication of this project was supported by the

startup funds provided to the first author by Rutgers University.

Remaining errors are ours.

REFERENCES

Aizawa, Y. (1985). Intensification by so-called “choked sounds”—long

consonants—in Japanese. The Study of Sounds, 21, 313-324.

Behne, D., Arai, T., Czigler, P., & Sullivan, K. (1999). Vowel duration

and spectra as perceptual cues to vowel quantity: A comparison of

Japanese and Swedish. Proceedings of ICPhS 1999, 857-860.

Bird, S. (2004). Lheidli intervocalic consonants: Phonetic and morpho-

logical effects. Journal of the International Phonetic Association, 34,

69-91. doi:10.1017/S0025100304001616

Boersma, P. (1998). Functional phonology: Formalizing the interaction

between articulato ry and perceptual drives. The Hague: Holland Aca-

demic Graphics.

Boersma, P., & Weenink, D. (1999-2013). Praat: Doing phonetics by

computer. Software.

Braver, A., & Kawahara, S. (2012). Complete and incomplete neutrali-

zation in Japanese monomoraic lengthening. Newark, NJ: Rutgers

University.

Cedrus Corporation (2010). Superlab v. 4.0. Software.

Chen, M. (1970). Vowel length variation as a function of the voicing of

the consonant environment. P honetica, 22, 129-159.

doi:10.1159/000259312

Cho, T. (2001). Effects of morpheme boundaries on gestural timing:

Evidence from Korean. Phonetica, 5 8, 129-162.

doi:10.1159/000056196

Diehl, R., Lindblom, B., & Creeger, C. (2004). Increasing realism of

auditory representations yields further insights into vowel phonetics.

Proceedings of ICPhS, XV, 1381-1384.

Engstrand, O., & Krull, D. (1994). Durational correlates of quantity in

Swedish, Finnish and Estonian: Cross-language evidence for a theory

of adaptive dispersion. Phonetica, 51, 80-91.

doi:10.1159/000261960

Flemming, E. (1995). Auditory representations in phonology. Doctoral

Dissertation, Los Angeles, CA: University of California Los Ange-

les.

Flemming, E. (2004). Contrast and perceptual distinctiveness. In B.

Hayes, R. Kirchner, & D. Steriade (Eds.), Phonetically based phono-

logy (pp. 232-276). Cambridge: Cambridge University Press.

doi:10.1017/CBO9780511486401.008

Frazier, M. (2006). Output-output faithfulness to moraic structure: Evi-

dence from American English. In C. Davis, A. R. Deal, & Y. Zabbal

(Eds.), Proceedings of north east linguistic society (pp. 1-14). Am-

herst: GLSA Publications.

Han, M. (1962). The feature of duration in Japanese. Onsei no Kenkyuu

[Studies in Phonetics ], 10, 65-80.

Hirata, Y. (2004). Effects of speaking rate on the vowel length distinc-

tion in Japanese. Journal of Phonetics, 32, 565-589. 

doi:10.1016/j.wocn.2004.02.004

Hirata, Y., & Lambacher, S. G. (2004). Role of word-external contexts

in native speakers’ identification of vowel length in Japanese. Pho-

netica, 61, 177-200. doi:10.1159/000084157

Hirata, Y., & Tsukada, K. (2009). Effects of speaking rate and vowel

length on formant frequency displacement in Japanese. Phonetica, 66,

129-149. doi:10.1159/000235657

Hoequist, C. E. (1982). Duraional correlates of linguistic rhythm cate-

gories. Phonetica, 40, 19-31. doi:10.1159/000261679

Hoogshagen, S. (1959). Three contrastive vowel lengths in Mixe. Zeit-

S. KAWAHARA, A. BRAVER

148

schrift für Phonetik und allgemeine Sprachwissenschafe, 12, 111-

115.

Ishihara, S. (2003). Intonation and interface conditions. Doctoral Dis-

sertation, Cambridge, MA: MIT.

Jany, C. (2006). Vowel length and phonation contrasts in Chuxnabán

Mixe. Santa Barbara papers in linguistics 18: Proceedings from the

9th Annual Workshop on Native American Language.

Jany, C. (2007). Phonemic versus phonetic correlates of vowel length in

Chuxnab

n Mixe. Proceedings of Berkeley linguistics society 33S:

Languages of Mexico and Central America. Berkeley: Berkeley Lin-

guistics Society.

Kakehi, K., & Hirose, Y. (1997). Taishoo-kenkyuu-o riyoosh-ita moora

suu chikaku-no kentou [Constructive study on mora identification].

Onsei Kenkyuu [Journal of the Phonetic Society of Japan], 1, 23-28.

Kawahara, S. (2001). Similarity among variants: Output-variant corre-

spondence. BA Thesis, Tokyo: International Christian University.

Kawahara, S. (2006). A faithfulness ranking projected from a percepti-

bility scale: The case of [+voice] in Japanese. Language, 82, 536-574.

doi:10.1353/lan.2006.0146

Kawahara, S. (2012a). Amplitude changes facilitate categorization and

discrimination of length contrasts. IEICE Technical Report. The In-

stitute of Electronics, Information, and Communication Engineers,

112, 67-72.

Kawahara, S. (2012b). Durational properties of emphatic consonants in

Japanese. Newark, NJ: Rutgers University.

Kawahara, S. (2013). Emphatic gemination in Japanese mimetic words:

A wug-test with auditory stimuli. Language sciences.

Kawahara, S., & Shinya, T. (2008). The intonation of gapping and co-

ordination in Japanese: Evidence for international phrase and utter-

ance. Phonetica, 65, 62-105. doi:10.1159/000130016

Kingston, J., & Diehl, R. (1994). Phonetic knowledge. Language, 70,

419-454. doi:10.2307/416481

Kinoshita, K., Behne, D., & Arai, T. (2002). Duration and F0 as per-

ceptual cues to Japanese vowel quantity. Proceedings of ICSLP,

Denver, 757-760.

Kluender, K., Diehl, R., & Wright, B. (1988). Vowel-length differences

before voiced and voiceless consonants: An auditory explanation.

Journal of Phonetics, 16, 153-169.

Labrune, L. (2012). The phonology of Japanese. Oxford: Oxford Uni-

versity Press.

Ladefoged, P., & Maddieson, I. (1996). The sounds of the world’s

languages (2nd ed.). Oxford: Blackwell Publishers.

Lehiste, I. (1970). Suprasegmentals. Cambridge: MIT Press.

Liljencrants, J., & Lindblom, B. (1972). Numerical simulation of vowel

quality systems: The role of perceptual contrast. Language, 48,

839-862. doi:10.2307/411991

Lindblom, B. (1986). Phonetic universals in vowel systems. In J. Ohala,

& J. Jaeger (Eds.), Experimental Phonology (pp. 13-44). Orlando:

Academic Press.

Lisker, L. (1986). “Voicing” in English: A catalog of acoustic features

signaling /b/ versus /p/ in trochees. Language and Speech, 2 9, 3-11.

Liu, F., & Xu, Y. (2005). Parallel encoding of focus and interrogative

meaning in Mandarin intonation. Phonetica, 72, 70-87.

doi:10.1159/000090090

Maekawa, K. (1998). Phonetic and phonological characteristics of

paralinguistic information in spoken Japanese. Proceedings of ICSLP

1998, Australian Speech Science and Technology Association, Incor-

porated (ASSTA).

Moreton, E., & Amano, S. (1999). Phonotactics in the perception of

Japanese vowel length: Evidence for long distance dependencies.

Proceedings of the 6th European Conference on Speech Communi-

cation and Technology, Budapest.

Myers, S., & Hansen, B. (2007). The origin of vowel length neutraliza-

tion in final position: Evidence from Finnish speakers. Natural Lan-

guage and Linguistic Theory, 25, 157-193.

doi:10.1007/s11049-006-0001-7

Nasu, A. (1999). Chouhukukei onomatope no kyouchou keitai to yuu-

hyousei (Emphatic forms of reduplicative mimetics and markedness).

Nihongo/Nihon Bunka Kenkyuu, Japan/Japanese Culture, 9, 13-25.

Padgett, J. (2002). Contrast and postvelar fronting in Russian. Natural

Language and Linguistic Theory, 21, 39-87.

doi:10.1023/A:1021879906505

Pluymaekers, M., Ernestus, M., Baayen, H., & Booij, G. (2010). Mor-

phological effects on fine phonetic detail: The case of Dutch igheid.

In C. Fougeron, B. Kühnert, M. D’Imperio, & N. Vallée (Eds.), La-

boratory Phonolog y 10 (pp. 511-531). Berlin: Mouton De Gruyter.

Podesva, R. (2000). Constraints on geminates in Burmese and Sela-

yarese. In R. Bilerey-Mosier, & B. D. Lillehaugen (Eds.), Proceed-

ings of West Coast Conference on Formal Linguistics 19 (pp. 343-

356). Somerville: Cascadilla Press.

Port, R., Dalby, J., & O’Dell, M. (1987). Evidence for mora timing in

Japanese. Journal of the Acoustical Society of America, 81, 1574-

1585. doi:10.1121/1.394510

Prince, A. (1980). A metrical theory for Estonian quantity. Linguistic

Inquiry, 11, 511-562.

R Development Core Team (1993-2013). R: A language and environ-

ment for statistical computing. R Foundation for Statistical Comput-

ing, Vienna, Austria.

Schwartz, J.-L., Boë, L.-J., Valleé, N., & Abry, C. (1997a). The disper-

sion focalization theory of vowel systems. Journal of Phonetics, 25,

255-286. doi:10.1006/jpho.1997.0043

Schwartz, J.-L., Boë, L.-J., Valleé, N., & Abry, C. (1997b). Major

trends in vowel system inventtories. Journal of Phonetics, 25, 233-

253. doi:10.1006/jpho.1997.0044

Taheri, A. M., & Xu, Y. (2012). Phonetic realization of prosodic focus

in Persian. Proceedings of Speech and Prosody 2012, Shanghai.

Thomas, K. D., & Shaterian, A. (1990). Vowel length and pitch in Ya-

vapai. In M. Langdon (Ed.), Papers from the 1990 Hokan-Penutian

languages workshop (pp. 144-153). University of Southern Illinois.

Turk, A., Nakai, S., & Sugahara, M. (2006). Acoustic segment dura-

tions in prosodic research: A practical guide. In S. Sudhoff, D. Len-

ertova, R. Meyer, S. Pappert, P. Augurzky, I. Mleinek, N. Richter, &

J. Schliesser (Eds.), Methods in Empirical Prosody Research (pp.

1-27). Berlin: Walter de Gruyter. doi:10.1515/9783110914641.1

Xu, Y., & Xu, C. X. (2005). Phonetic realization of focus in English

declarative intonation. Journal of Phonetics, 33, 159-197.

doi:10.1016/j.wocn.2004.11.001