Acoustic correlates of Mongolian stress are better explained by default-to-same than default-to-opposite: a preliminary investigation

Wang, Y., Chon, R., Garcia, J., Zhang, Y. Y., Lindsey, K.


In this study, we report results from our recent investigation into the phonetic correlates of stress in Khalkha Mongolian, using novel speech data obtained from seven native-speaker consultants. We conclude that duration and intensity support a left-aligned default-to-same-side (DSS) stress pattern rather than the right-aligned default-to-opposite-side (DOS) pattern most recently attributed to Mongolian stress (e.g., Walker 1997). Pitch does not significantly signal stress under either the DSS or the DOS pattern. These findings support Gordon’s (2000) conjecture that the DOS pattern does not exist and that all DOS stress systems would be recategorized on closer inspection of the acoustics.

In the early theoretical literature, Khalkha Mongolian was categorized as having a DSS stress system, in which the leftmost heavy syllable (a syllable with a long vowel or diphthong) is stressed; if a word has no heavy syllable, the leftmost syllable receives stress (e.g., Prince 1983, Hammond 1986, Halle & Vergnaud 1987, Idsardi 1992, Hayes 1995). However, an alternative DOS stress pattern has also been attributed to Khalkha Mongolian, in which stress falls on the rightmost non-final heavy syllable of a word, else on the final heavy syllable if it is the only one, else on the leftmost light syllable (Bosson 1964, Poppe 1970, Walker 1997, Özçelik 2016).
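The two stress rules described above can be sketched as small functions. This is a minimal illustration, assuming each word is given as a sequence of syllable weights ("H" for heavy, "L" for light); the function names and the weight encoding are hypothetical and are not taken from the study's coding scheme.

```python
def dss_stress(weights):
    """Default-to-same-side: stress the leftmost heavy syllable,
    otherwise the leftmost syllable. Returns a 0-based index."""
    if not weights:
        raise ValueError("word must have at least one syllable")
    for i, w in enumerate(weights):
        if w == "H":
            return i
    return 0


def dos_stress(weights):
    """Default-to-opposite-side, as summarized in the text: stress the
    rightmost non-final heavy syllable, else a final heavy syllable,
    else the leftmost syllable. Returns a 0-based index."""
    if not weights:
        raise ValueError("word must have at least one syllable")
    nonfinal_heavy = [i for i, w in enumerate(weights[:-1]) if w == "H"]
    if nonfinal_heavy:
        return nonfinal_heavy[-1]
    if weights[-1] == "H":
        return len(weights) - 1
    return 0
```

The two rules agree on words with no heavy syllable or with a single heavy syllable, and diverge only when a word contains more than one heavy syllable; for a hypothetical weight pattern "HLHL", dss_stress returns 0 and dos_stress returns 2.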

None of these studies supported their claims with evidence that meets the best-practice guidelines for disentangling the acoustic properties of phrase-level intonation from those of word-level stress (Roettger & Gordon 2017). These studies often examined words only in isolation, drew on only one speaker, or did not analyze common acoustic correlates (such as pitch, intensity, and duration) across vowels in a substantial dataset. Thus, in our pilot study, we elicited words of many different lengths, in both isolation and sentence contexts, from multiple speakers.

We analyzed the vowels (N=2848) of seven native speakers of Khalkha Mongolian (two male, five female, aged 16-39) from an elicitation in which speakers read 52 words in three contexts: isolation (repeated twice), sentence-initial, and sentence-final. The words were selected from Wiktionary so that words with a variety of light syllables (short vowels) and heavy syllables (long vowels or diphthongs) were represented. The sentence frames were designed so that the word in question lacked phrasal prominence. Each vowel was coded for vowel type (short, long, diphthong), position in the word, DOS stress (whether or not the vowel would be stressed according to Walker’s [1997] definition), and DSS stress (whether or not the vowel would be stressed according to Hayes’ [1995] definition).

For each vowel, we used a Praat script to extract acoustic measurements that correlate with stress, including pitch (mean F0), intensity (mean intensity), and duration.

After gathering measurements, we compared stressed and unstressed vowels in heavy syllables. Stressed heavy vowels under a DOS analysis are significantly longer than unstressed heavy vowels (p<.001), but are neither higher in pitch nor higher in intensity (see Figure 1, upper). Under a DSS analysis, stressed heavy vowels are both significantly longer (p<.001) and higher in intensity (p=0.002) than unstressed heavy vowels, but are not higher in pitch (see Figure 1, lower).

Next, we compared the acoustics of stressed and unstressed vowels in words with only light syllables, in which stress is predicted to fall on the leftmost syllable regardless of analysis (DOS or DSS). We found that stressed vowels in light syllables have significantly longer duration (p<.001) and higher intensity (p<.001) than unstressed vowels, but the difference in pitch is not significant (p=0.298; see Figure 2).

Finally, we compared two logistic regression models, one with DOS-defined stress as the dependent variable and the other with DSS-defined stress as the dependent variable. Both models included duration and intensity as independent variables and speaker as a random effect. We found that longer duration and higher intensity are significantly correlated with both DOS and DSS stress. However, the model summaries show that the deviance of the DSS model is lower than that of the DOS model (2922.8 < 3240.6), indicating that the acoustic correlates made better predictions in the DSS model.
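To illustrate what a lower deviance means, the sketch below computes the deviance of a binary (logistic) model as minus twice its log-likelihood. It is a minimal illustration under stated assumptions: the function name binomial_deviance, the toy labels, and the predicted probabilities are all hypothetical, and the sketch ignores the speaker random effect used in the study's models.

```python
import math


def binomial_deviance(y, p):
    """Deviance of a binary model: -2 * log-likelihood.

    y: observed labels (1 = stressed, 0 = unstressed)
    p: predicted probabilities of stress, one per observation
    """
    eps = 1e-12  # guard against log(0)
    log_lik = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        log_lik += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -2.0 * log_lik


# Toy data (hypothetical, not from the study): the same four labels
# scored by a sharper model and by a less certain model.
labels = [1, 0, 1, 0]
sharp_model = [0.9, 0.1, 0.8, 0.2]
vague_model = [0.6, 0.4, 0.6, 0.4]

print(binomial_deviance(labels, sharp_model) < binomial_deviance(labels, vague_model))
```

A model whose predicted probabilities sit closer to the observed labels has a higher log-likelihood and therefore a lower deviance, which is the sense in which the lower value is read as a better fit.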