How We Think About the Sounds of Chinese: Any Implications for Psycholinguistics?*
Bii Ming
(bii.ming@BuhTswenTzay.edu.cn)
Keynote presented at the International Conference on Chinese Linguistics
February 30, 2008
Nanjing, Sinitic States of China
*This work was supported by grant #LKJHG985432 from the Parliament of the Sinitic States of China.
I would like to share some observations about how linguists and Chinese native speakers think about the sounds of Chinese, and whether this may have influenced the development of the study of Chinese or even the development of the language itself.
Of course, as we all know, when people refer to “the Chinese language”, they are usually referring to Wu, and particularly to the standard variety, based on the Shanghai dialect of Wu, which is the most widely spoken Chinese variety and the official language of the Sinitic States of China. This came about for historical and sociopolitical reasons rather than linguistic ones: when Shanghai became the capital of the Sinitic States of China, the language there was declared the official language, and it became a major world language when the SSC became one of the founding members of the United Nations formed during the 1930s international crisis that nearly led to global war. From a linguistic perspective, there is no reason to suppose that the features of standard Chinese based on the Shanghai dialect are representative of all Chinese varieties. There are many other Chinese languages, some of which in fact have phonological systems very different from that of standard Chinese. Unfortunately, much research (especially psycholinguistic research) has focused on standard Chinese and has ignored the other Chinese languages, like Mandarin. But examining minority languages like Mandarin is crucial for developing a better understanding of how language is represented and processed. Here I wish to reflect on the phonetic cueing of important contrasts in different Chinese varieties, how we think about which cues are the main ones, and whether or not these have any implications for the cognitive representation and processing of these languages.
Phonological contrasts are usually cued via multiple phonetic correlates. Consider, for instance, the Chinese syllables thaq (塔; [tʰaʔ] in the International Phonetic Alphabet) and daq (達; [daʔ]). The difference between these is in the voicing and aspiration of the stop. But there are also differences between these syllables in terms of the fundamental frequency of the sonorant part of the syllable (the vowel). thaq is pronounced with a high, level pitch; daq is pronounced with a low, slightly rising pitch. Of course, the lowering of fundamental frequency after voiced segments is a common phenomenon across languages, as phoneticians have known for decades. This is just one example of a case where a contrast cued primarily by one feature (such as voicing of an obstruent) may have additional secondary cues on the same segment or neighbouring segments (such as fundamental frequency of the following vowel). Readers with a background in phonetics will no doubt be familiar with countless similar examples.
We can see a similar pattern in another Chinese language, Mandarin, which is spoken in several minor cities in northern SSC, such as Peking. Consider the following syllables from Mandarin:
Chinese Character | Romanization | IPA |
拔 | bar | [pa] |
把 | baa | [pa̰ː] |
罢 | bah | [''pa] |
The character 把, Romanized as baa in Mandarin, is pronounced with a long [a] vowel with creaky phonation.1 拔 (bar), by comparison, has a shorter vowel and more modal phonation. 罢 (bah) has a shorter vowel than both of them, and higher intensity (indeed, there is ongoing debate over whether these sounds are evidence for a typologically rare three-way vowel length contrast, or whether phonation type and intensity are the primary cues for these differences and duration is merely a secondary cue). But pitch also forms a secondary cue for these, as in standard Chinese. 把 (baa) is typically produced with a low pitch, which falls even lower as the syllable unfolds (and sometimes rises again at the end of the syllable). The pitch of 拔 (bar) begins around the middle level and then rises over the course of the syllable. And the pitch of 罢 (bah) falls sharply. These secondary pitch patterns are not always limited to the vowel, but can spread over the entire sonorant part of a syllable (and even have coarticulatory influence on the secondary pitch cues of the following syllable). These facts are well known to scholars of Chinese dialects, but are often neglected by psycholinguists who treat the contrasts between these syllables as being segmental. For example, the Romanization system used for Mandarin (Gwoyeu Romatzyh, developed by linguist Jaw Yuenrenn) does not directly represent the pitch differences in these syllables. This is, of course, a desirable design feature for a phonemic script, as these differences are merely secondary cues to the identity of the syllable; an orthography that indicates the pitch of Mandarin syllables seems just as outlandish as one that indicates the pitch of English syllables (imagine if pop and bop were instead written póp and pòp).
(In fact, such orthographies have been created before. For example, the linguist Jou Yeouguang developed a system called “Hànyǔ Pīnyīn” which explicitly represented the pitch contours on syllables. In that system, 把 (baa) would be written bǎ. Rather than suggesting that this syllable has a longer or creakier vowel, this writing system emphasizes that the vowel happens to come with a low-falling-rising pitch contour (note the shape of the diacritic above the vowel). 罢 (bah) would be written bà, with the diacritic representing the falling pitch contour, and no representation of the short vowel length or high intensity. This orthographic system was difficult to input on modern keyboards and eventually fell out of use, but can still be seen in old place-names; for example, contemporary foreign writings sometimes referred to Peking as Beijing, and these sorts of archaic spellings are still often used stylistically in, e.g., restaurant names, as in the ubiquitous chain Běijīng Express.)
Thanks to modern experimental methods using computers to independently manipulate certain acoustic cues while holding others constant, we know that pitch differences in languages like Mandarin and standard Chinese are just secondary cues, and that listeners’ lexical access, and their perception and judgment of the phones they hear, rely more on the primary cues like voicing and vowel quality. Certainly Mandarin and standard Chinese are not “tone languages” like Vietnamese, Yorùbá, etc. Accordingly, modern psycholinguistics of Mandarin and of standard Chinese has generally focused on the important segmental differences between sounds, and has mostly paid little attention to the processing of secondary pitch differences (with a few notable exceptions in the literature on processing sub-phonemic acoustic differences). However, some linguists once entertained the possibility that those pitch patterns are the primary cue to phonologically important differences, and treated these languages as tone languages (as in, e.g., “Hànyǔ Pīnyīn”, discussed above).2 Under such a view, the important difference between 把 (baa) and 罢 (bah), for example, is not the vowel duration and quality, but the fundamental frequency pattern over the course of the syllable; the vowel duration and quality are just unimportant secondary cues. It is interesting to imagine what a psycholinguistics of Chinese may look like if this historical view of pitch as a special primary cue had remained dominant.
Consider the case of Mandarin 把 (baa, or [pa̰ː] in IPA) and 罢 (bah, or [''pa] in IPA). If Mandarin were treated as a tone language, people might consider these words to differ in tone only (rather than in segmental properties of the vowel), and might wonder whether that kind of difference is fundamentally different from a segmental distinction, such as that between 不 (buh [''pu]) and 罢 (bah [''pa]). In reality, of course, both pairs differ by one segment. But under the Mandarin-as-a-tone-language view, it looks as if the first pair differs in one “tone” and the second pair differs in one segment. Would that difference be perceived as cognitively important? We can almost imagine (far-fetched as it may sound) a whole psycholinguistic research enterprise emerging to test whether the difference between baa and bah is represented or processed differently than the difference between buh and bah!
A special treatment of pitch in Mandarin would also have substantial consequences for, e.g., the measurement of lexical neighbourhood statistics. A word’s neighbours are typically defined as other words that differ from that word just by the replacement, insertion, or deletion of one phoneme; thus, it is easy to see that ba and baa, for example, are neighbours. On the other hand, ba and daa (打) are not neighbours, since they differ by two phonemes. Under the Mandarin-as-a-tone-language view, however, these only differ by one segment, since they would be considered to have the same vowel (just with different pitch contours). Such a conceptualization would raise challenging questions about what constitutes a phonological neighbour, and whether segments and pitches have the same importance in determining neighbourhoods (or similar lexical statistics, like transitional phoneme probabilities).
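To make the contrast concrete, here is a minimal Python sketch of the one-phoneme neighbour definition applied under both views. It is an illustration under stated assumptions, not an established tool: the function name `is_neighbour`, the two toy lexicons, and the placeholder vowel labels (`V_ba`, `V_baa`) are all hypothetical, and the labels merely stand in for whichever vowel phonemes one's analysis assigns to each syllable.

```python
def is_neighbour(a, b):
    """Return True if phoneme tuples a and b differ by exactly one
    replacement, insertion, or deletion."""
    if a == b:
        return False  # identical forms are not counted as neighbours here
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (replacement).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: deleting one phoneme from the longer form
    # must yield the shorter form (insertion or deletion).
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


# Segmental view: the vowels of ba and baa are different phonemes.
# "V_ba" and "V_baa" are placeholder labels (an assumption of this sketch).
SEGMENTAL = {
    "ba":  ("p", "V_ba"),
    "baa": ("p", "V_baa"),
    "daa": ("t", "V_baa"),
}

# Hypothetical tone-language view: all three share one vowel segment,
# and the pitch contour is set aside as a separate "tone".
TONAL_SEGMENTS = {
    "ba":  ("p", "a"),
    "baa": ("p", "a"),
    "daa": ("t", "a"),
}

for pair in [("ba", "baa"), ("ba", "daa")]:
    x, y = pair
    print(pair,
          "segmental view:", is_neighbour(SEGMENTAL[x], SEGMENTAL[y]),
          "| tonal view, segments only:",
          is_neighbour(TONAL_SEGMENTS[x], TONAL_SEGMENTS[y]))
```

Under the segmental lexicon, ba and baa come out as neighbours and ba and daa do not. Under the tone-language lexicon, ba and daa differ by a single segment, while ba and baa become segmentally identical, so whether they count as neighbours at all depends on how one decides to treat the "tone". That decision is exactly the open question raised above.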
It certainly seems fantastical to consider what kinds of far-fetched research questions could be entertained if we had developed an entirely different conceptualization of the important features of Chinese sounds. But in the hypothetical situation we are imagining here, such research questions might not even be implausible. The way we are trained to think about and categorize the sounds of our language can have an important influence on how these sounds are cognitively represented. Here I am not just talking about how linguists think the phonology of their language is organized; I am talking about what happens to everyday, naive speakers of a language. For example, research has already shown that people who are experienced users of an alphabetic script process phonemes of their language differently (in psycholinguistic experimental tasks such as phoneme monitoring) than people who are not (e.g., illiterate or dyslexic speakers, very young children, and speakers of Chinese varieties that do not have any alphabetic script).3 This suggests that explicit, metalinguistic phonological awareness changes the way phonemes are processed and represented in the mind. Therefore, if linguists and naive speakers alike had spent the past several generations considering syllables like baa and ba to differ in tone rather than in vowel, naive speakers’ unconscious intuitions about the structure of their language may have gradually changed to fit that conceptualization. Secondary acoustic cues like pitch may have become the primary cues, reified in the native grammar. Maybe speakers would even have adopted an orthography that explicitly represents the pitch contours of syllables (see Footnote 2). What would Mandarin grammar look like if history had unfolded this way? Would speakers’ cognitive representation and processing of pitch cues be different because of it? Would the psycholinguistic evidence suggest that pitch, rather than vowel length and quality, is in fact the primary cue?
What ideas would linguists and psycholinguists in such a world have about the Chinese language? And what would they think of our ideas? Through their eyes, would our assumptions about the nature of Chinese sounds look as implausible as their assumptions might look to us?
1 This is an oversimplification; the Mandarin long creaky vowels undergo several phonological alternations in non-phrase-final contexts (e.g., they change to short modal vowels when followed by another long creaky vowel). A detailed review of this so-called “vowel sandhi” is beyond the scope of the present piece.
2 This view may not be as far-fetched as it may seem at first blush. While pitch is a secondary cue to syllable identity in languages like Mandarin and standard Chinese, other Chinese languages such as Hokkien (the official, and most widely spoken, language of The Republic of Formosa) are now agreed to be tone languages. For example, Hokkien syllables undergo pervasive phonological changes which seem to be best explained as paradigmatic changes in tone. The Wuxi dialect, another Wu variety closely related to standard Chinese, also shows some similar patterns. In this context, it is not surprising that even languages like Mandarin were once treated as tone languages.
3 For comparison, consider nearby Hong Kong, where the major language is another Chinese language, Hakka, which has no widely used Romanization. After Hong Kong, a former British colony, became independent, no standard Romanization was adopted, and to this day Hong Kong Hakka is rarely written with anything other than Chinese characters. Hong Kong Hakka speakers often have difficulty describing the difference between many minimal pairs in their language, likely because of the lack of any phonemic orthography to reify those phonemes in their minds.