Linguistics 001, Lecture 2: The sounds and sound patterns of language
Phonetics vs. Phonology
Today we'll be talking about the sound structure of human language, and the two fields that are dedicated to its study: phonetics and phonology.
The sound structure of language encompasses quite a lot of topics, including the following.
Instead of giving a whirlwind tour of the whole of phonetics and phonology, this lecture has two more limited goals. The first goal is to put language sound structure in context.
The second goal is to give you a concrete sense of what the sound systems of languages are like. In order to do this, we will go over examples of sound alternations in various languages. Along the way, a certain amount of the terminology and theory of phonetics and phonology will emerge.
Phonetics: the sounds of language
While our discussion will range back and forth somewhat between the two subdisciplines, we will essentially progress from the nuts-and-bolts mechanics of speech sounds, through their classification and representation, to their systematic organization within a given language. Thus we can divide the lecture into a more or less phonetic half and a more or less phonological half.
Vocal tract anatomy
The vocal tract is what we use to articulate sounds. It includes the oral cavity (essentially the mouth), the nasal cavity (inside the nose), and the pharyngeal cavity (in the throat, behind the tongue). For most speech sounds, the airstream that passes through this tract is generated by the lungs.
A number of anatomical features of humans that originated for quite different functions have been recruited to serve the purposes of language. Many of these same recruitments have been made by other animals for vocalization.
In some cases the anatomy seems to have evolved specifically to serve language, independently of (and even contrary to) its original function.
Strikingly, the lowering of the larynx, which permits a greater variety of articulations with the tongue, has the consequence of making it much easier for humans to choke. These X-rays and diagrams show the vocal tracts of the gorilla, chimp, and human, highlighting the tongue, larynx, and air sacs (the last for the apes only).
Adapted from W.T. Fitch, The Evolution of Speech
The longer vocal tract (seen behind the tongue in the human) separates the soft palate and epiglottis, so that air flowing between the larynx and the nose cannot avoid passing through the oral cavity. This is why humans choke more easily than other primates. Obviously the selective advantage of increased articulatory ability must have been quite strong to justify the increased likelihood of choking. (We'll talk more about this in the last part of the course.)
The following illustration is called a midsagittal section: it's what the head would look like if you cut it in half along the front-back dimension. Don't worry, this is just a plastic model....
From the Ultimate Visual Dictionary, p. 245
This diagram includes many detailed anatomical features that you certainly don't need to learn, but it should give you an idea of the complex context in which speech sounds are articulated.
Here is a less detailed diagram showing the most important parts of the vocal tract.
From Language Files (7th ed.), p. 40
We'll be referring to these places in the vocal tract when describing the way various sounds are produced.
Basic sounds: buzz, hiss, and pop
There are three basic modes of sound production in the human vocal tract that play a role in speech: the buzz of vibrating vocal cords, the hiss of air pushed past a constriction, and the pop of a closure released.
The larynx is a rather complex little structure of cartilage, muscle and connective tissue, sitting on top of the trachea (windpipe). It is what lies behind the "Adam's apple," the protrusion in the front of the throat (usually more prominent in males). The original role of the larynx is to seal off the airway, in order to keep food, liquid and other unwanted things out of the lungs, and also to permit the torso to be pressurized (by holding in air) to provide a more rigid framework for heavy lifting and pushing.
Part of the airway-sealing system in the larynx is a pair of muscular flaps, the vocal folds (also called "vocal cords"), which can be brought together to form a seal, or moved apart to permit free motion of air in and out of the lungs.
Here are the vocal cords seen when they are open to allow free passage of air. The front of the body is toward the top of the photo; we're looking down into the dark trachea.
From the Ultimate Visual Dictionary, p. 245
Now for a little aerodynamics. When any elastic seal is not quite strong enough to resist the pressurized air it restricts, the result is an erratic release of the pressure through the seal, creating a sound.
The mechanism of this sound production is very simple and general:
In many such sounds, the pattern of opening and closing is irregular, producing a belch-like sound without a clear pitch -- think of the air being released from a balloon.
However, if the circumstances are right, a regular oscillation can be set up, giving a periodic sound that we perceive as having a pitch. Many animals have developed their larynxes so as to be able to produce particularly loud sounds, often with a clear pitch that they are able to vary for expressive purposes.
When the vocal cords are vibrating regularly in this manner, we say that the sound is voiced. Without the vibration, the sound is voiceless (or equivalently, unvoiced). This is exactly the property that distinguishes many sounds in English and other languages. A few examples:
If you hold your hand to your throat, you will feel vibration for sounds like [z] but not for [s]. You will also feel it for nasals like [m, n] and for vowels like [a]; they are all voiced.
(There is another difference between sounds like [p] and sounds like [b]. The former are accompanied by the puff of breath called aspiration, while the latter are not.)
The hiss of turbulent flow
Another source of sound in the vocal tract -- for humans and for other animals -- is the hiss generated when a volume of air is forced through a passage that is too small to permit it to flow smoothly. The result is turbulence, a complex pattern of swirls and eddies at a wide range of spatial and temporal scales. We hear this turbulent flow as some sort of hiss.
In the vocal tract, this turbulent flow can be created at many points of constriction. For instance, the upper teeth can be pressed against the lower lip -- if air is forced past this constriction, it makes the sound associated with the letter "f", [f].
When this kind of turbulent flow is used in speech, phoneticians call it frication, and sounds that involve frication are called fricatives. Some English examples are the sounds written "f, v, s, z, sh, th."
The pop of closure and release
When a constriction somewhere in the vocal tract is complete, so that air can't get past it as the speaker continues to breathe out, pressure builds up behind the constriction. If the constriction is abruptly released, the sudden release of pressure creates a sort of pop. When this kind of closure and release is used as a speech sound, phoneticians call it a stop (focusing on the closure) or a plosive (focusing on the release).
As with frication, a plosive constriction can be made anywhere along the vocal tract, from the lips to the larynx. Three common examples:
It is difficult to make a firm enough seal in the pharyngeal region to produce a stop, although a narrow fricative constriction in the pharynx is possible.
The phonetic alphabet
The human vocal apparatus can produce a great variety of sounds. As we look at words in other languages -- and study the sounds of English in more detail -- we need a way to write these sounds down. That's what phonetic alphabets are for.
In the mid-19th century, Melville Bell invented a writing system that he called "Visible Speech." Bell was a teacher of the deaf, and he intended his writing system to be a teaching and learning tool for helping deaf students learn spoken language. However, Visible Speech was more than a pedagogical tool for deaf education -- it was the first system for notating the sounds of speech independent of the choice of particular language or dialect. This was an extremely important step -- without this step, it is nearly impossible to study the sound systems of human languages in any sort of general way.
In the 1860s, Melville Bell's three sons -- Melville, Edward and Alexander -- went on a lecture tour of Scotland, demonstrating the Visible Speech system to appreciative audiences. In their show, one of the brothers would leave the auditorium, while the others brought volunteers from the audience to perform interesting bits of speech -- words or phrases in a foreign language, or in some non-standard dialect of English. These performances would be notated in Visible Speech on a blackboard on stage.
When the absent brother returned, he would imitate the sounds produced by the volunteers from the audience, solely by reading the Visible Speech notations on the blackboard. In those days before the phonograph, radio or television, this was interesting enough that the Scots were apparently happy to pay money to see it!
There are some interesting connections between the "visible speech" alphabet and the later career of one of the three performers, Alexander Graham Bell, who began following in his father's footsteps as a teacher of the deaf, but then went on to invent the telephone. Look especially at the discussion of Bell's "Ear Phonautograph" and artificial vocal tract.
After Melville Bell's invention, notations like Visible Speech were widely used in teaching students (from the provinces or from foreign countries) how to speak with a standard accent. This was one of the key goals of early phoneticians like Henry Sweet (said to have been the model for Henry Higgins, who teaches Eliza Doolittle to speak "properly" in Shaw's Pygmalion and its musical adaptation My Fair Lady).
The International Phonetic Association (IPA) was founded in 1886 in Paris, and has ever since been the official keeper of the International Phonetic Alphabet (also IPA), the modern equivalent of Bell's Visible Speech. Although the IPA's emphasis has shifted in a more descriptive direction, there remains a lively tradition in Great Britain of teaching "received pronunciation" using explicit training in the IPA.
For this course, we'll be using portions of the IPA that describe the sounds of English.
We'll begin with a chart of the English consonants. Many of these symbols have their familiar value, but don't confuse spelling with pronunciation. When we write a phonetic transcription, i.e. how a sound or word is pronounced, we'll enclose it in [square brackets] to signal that the symbols are to be read as phonetic-alphabet symbols rather than as ordinary spelling.
Notice that the chart (like the main IPA chart) is organized along two main dimensions: place of articulation and manner of articulation. Only the terms needed for English are listed here.
In addition, the obstruent sounds (stops, affricates, fricatives) come in voiced and voiceless varieties. The sonorant sounds (nasals, liquids, glides) are normally voiced.
The glottal stop, which is written as [ʔ], has a limited role in English. It is the catch in the throat between the two vowels in uh-oh.
The patterning of sounds in languages generally depends on the "natural classes" of sounds defined by these articulatory labels. For example, in English, the plural suffix spelled "(e)s" is realized in three different ways, depending on the preceding sound.
So the rule determining how you pronounce the plural suffix makes reference to the classes voiced, voiceless and sibilant, not to specific sounds like [b], [p] and [s].
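The plural rule can be sketched in Python as a choice among three endings keyed to natural classes of the final sound. This is a minimal sketch, not a full analysis: it assumes stems are IPA strings whose last character identifies the final sound (so [tʃ] and [dʒ] are caught by their final ʃ and ʒ), and the names `SIBILANTS`, `VOICELESS` and `plural` are hypothetical. The voiceless set also contains German [x] and [ç], which are not English sounds.

```python
# Natural classes, stated as sets of final characters (an illustrative assumption).
SIBILANTS = set("szʃʒ")        # [tʃ] and [dʒ] end in ʃ / ʒ, so they are covered too
VOICELESS = set("ptkfθsʃxç")   # includes German [x] and [ç]

def plural(stem: str) -> str:
    """Pick the plural ending from the class of the stem's final sound."""
    last = stem[-1]
    if last in SIBILANTS:
        return stem + "əz"     # vowel inserted to separate similar sounds
    if last in VOICELESS:
        return stem + "s"
    return stem + "z"          # everything else (vowels, sonorants, voiced obstruents)

for stem in ["kæt", "dɔg", "bʌs", "dʒʌdʒ", "bi", "bax"]:
    print(stem, "->", plural(stem))
```

Because the rule tests membership in a class rather than listing words, a stem like Bach [bax] gets [s] without any special provision.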
Similarly, the past-tense suffix spelled "ed" is realized in three different ways, again depending on the preceding sound.
For both suffixes, the inserted vowel serves to separate similar sounds (i.e. it occurs when the stem ends in a consonant similar to the suffixal consonant).
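The past-tense rule has the same shape, with the alveolar stops playing the role the sibilants played for the plural. Again a minimal sketch under the same assumptions (IPA strings, final character identifies the final sound, hypothetical names), with German [x] and [ç] included in the voiceless set:

```python
ALVEOLAR_STOPS = set("td")     # similar to the suffix consonant /d/
VOICELESS = set("ptkfθsʃxç")   # [tʃ] ends in ʃ and so counts as voiceless

def past(stem: str) -> str:
    """Pick the past-tense ending from the class of the stem's final sound."""
    last = stem[-1]
    if last in ALVEOLAR_STOPS:
        return stem + "əd"     # vowel inserted to separate similar sounds
    if last in VOICELESS:
        return stem + "t"
    return stem + "d"

for stem in ["wɔk", "bɛg", "nid", "wɔnt", "ple", "bax"]:
    print(stem, "->", past(stem))
```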
As Pinker discusses, these generalizations can extend to new sounds borrowed from other languages. These German words, which end in voiceless fricatives not found in English (velar and palatal), follow the patterns just discussed when the final consonant is pronounced in the German way.
The extension of patterns in this way confirms that what speakers understand about these processes is not an arbitrary list of sounds that cause a pattern to arise, but rather the class of sounds -- which could contain members not yet heard in the language.
For vowels, a different set of terms is used.
The terms refer, loosely speaking, to the location of the main tongue constriction within the mouth.
Most of these symbols are relatively standard, at least to the degree permitted by web-friendly characters; as usual, ə (an upside-down "e") is used for schwa.
Here are English words containing the vowel sounds referred to by each of these symbols. These words also exemplify the consonant symbols.
Many Americans do not distinguish the vowels [a] and [ɔ], pronouncing cot and caught the same way. That's just one of many variations in pronunciation among regional dialects.
In addition to these simple vowels, English has several diphthongs (i.e. vowel sounds that essentially combine a vowel with a glide or semi-vowel in a single unit). These are written, therefore, with two phonetic symbols, even if they can (in the case of "long i") be written with one symbol in English spelling.
(It should be noted here that, in most dialects of English, all of the tense vowels are actually diphthongs. For example, say, which we have represented above as [se] is actually pronounced [sey] by most speakers. However, there is a great deal of variation from dialect to dialect in the specifics of this, and within any given dialect, there is no [e] distinct from [ey], so for our purposes in this course we can stick with the simpler representations like [se].)
Vowel symbols are especially tricky for English speakers. Changes in the history of the language have left English spelling with very irregular mappings between vowel letters and vowel sounds, while the phonetic alphabet uses its vowel symbols with roughly the values they have in most other languages written in the Latin alphabet. If you know some Spanish, German or Italian, for example, you'll be better off thinking of the way vowel sounds are spelled in those languages when you're learning and using the phonetic alphabet.
There are lots of things to be careful about when doing phonetic transcription. Most important is to pay attention to the sounds, and don't be distracted by the spelling. English spelling is not designed to faithfully represent the sounds of words and is frequently quite misleading in this respect, so it's best to try to ignore it.
For example, the letter combination "ng" in English spelling can represent two different pronunciations.
These have to be distinguished in a correct transcription, even though the spellings are the same -- that's a defect of English orthography.
Similarly, "th" is ambiguous.
And vowels especially are spelled chaotically -- but in phonetic transcription a particular vowel sound is always written the same way. Some examples:
The influence of orthography is powerful, even for an International Man of Mystery.
Of course, there is no "grrr" in swinger: it's like singer, without a velar stop [g].
The written "g" is part of the digraph "ng" for the velar nasal, which is more properly transcribed [ŋ] in the phonetic alphabet. (One reason to have special symbols like this is to avoid the confusion of things like the "g" in "ng".)
Your pronunciation will differ in some ways from that of your friends or the instructor. This is generally due to differences in regional dialect, or sometimes to age.
For example, for a dwindling number of English speakers, the two words in the name of this board game are distinct -- "wh" is voiceless, while plain "w" is voiced. That's a distinction that goes back to Old English and earlier. But for most speakers today (including me), they're homophonous and should both be transcribed with voiced [w].
In homework and exams, as long as you give an accurate transcription of how you pronounce something, you'll get full credit. If you think I've taken off credit unfairly for something like this, tell me! I'll ask you to pronounce a word for me, and decide on that basis whether your transcription is in fact a plausible transcription of the way you speak.
Phonology: the structure of sound
Recall the basic distinction mentioned earlier.
Thus phonetics refers to the physiological and acoustic parts of the following diagram, while phonology resides in the brain.
From The Speech Chain
Now we'll focus on the more abstract side of things, and how sounds are organized in a particular language.
The phonological elements of a language are the basic, distinctive sounds, also called phonemes. In English, these are the following (for a dialect of Standard American English).
These sounds are said to be "distinctive" because they can be used to make contrasts between different words. This can be illustrated for the stops, using minimal pairs (words that differ in exactly one sound).
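The definition of a minimal pair is easy to state precisely. A minimal sketch, assuming each word is given as a list of phonetic segments (so a diphthong like [ay] or an affricate like [tʃ] can count as a single sound); `is_minimal_pair` is a hypothetical helper:

```python
def is_minimal_pair(a: list[str], b: list[str]) -> bool:
    """True if the two words have the same length and differ in exactly one sound."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1

print(is_minimal_pair(["p", "æ", "t"], ["b", "æ", "t"]))  # pat / bat
print(is_minimal_pair(["h", "i", "d"], ["h", "ɪ", "d"]))  # heed / hid
print(is_minimal_pair(["p", "æ", "t"], ["b", "æ", "d"]))  # pat / bad: two differences
```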
And for the vowels (we can't get an exact minimal set for the entire range of vowels in the context [h_d], so in some cases the initial consonant also differs; for each individual pair of vowels, however, we could come up with a minimal pair):
And for the nasals:
In English, the velar nasal [ŋ] can't occur at the beginning of a word -- cf. map, nap, *ngap -- which will lead us to the next issue, the way these elements are organized into words.
But first, note that a basic way in which languages differ is their inventory of sounds, or phonemes. For example:
When you learn a new language, one of the things you have to do is learn the "list" or inventory of sounds. That's what children have to do also, when learning their native language.
The phonological structure of a language -- the way these elements are organized -- includes the notion of syllable and its subparts. This structure is crucially involved in describing the possible words of a language.
Here's a general schema of how syllables are constructed.
The category rhyme simply brings together the nucleus and the coda, so the rhyme of the syllable blend is the nucleus [ɛ] plus the coda [nd]. The reason for this name should be obvious: in order for syllables to rhyme, what has to match is just this part of the syllable -- trend, end, spend, etc. (In longer words, rhyme is defined as matching this part of the stressed syllable and all the way to the end of the word: flower, power, shower, tower, hour, scour, etc.)
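The onset/nucleus/coda structure, with the rhyme as nucleus plus coda, maps naturally onto a small record type. A minimal sketch for one-syllable words, assuming each part is given as an IPA string; the `Syllable` class and `rhymes` function are hypothetical names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Syllable:
    onset: str    # may be empty, as in "end"
    nucleus: str
    coda: str

    @property
    def rhyme(self) -> str:
        """The rhyme groups the nucleus with the coda."""
        return self.nucleus + self.coda

def rhymes(a: Syllable, b: Syllable) -> bool:
    """One-syllable words rhyme when their rhymes match; onsets are ignored."""
    return a.rhyme == b.rhyme

blend = Syllable("bl", "ɛ", "nd")
trend = Syllable("tr", "ɛ", "nd")
end = Syllable("", "ɛ", "nd")
bland = Syllable("bl", "æ", "nd")
print(blend.rhyme, rhymes(blend, trend), rhymes(blend, end), rhymes(blend, bland))
```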
Human speech, like many animal vocalizations, tends to involve repetitive cycles of opening and closing the vocal tract. In human speech, we call these cycles syllables.
A syllable typically begins with the vocal tract in a relatively closed position -- the syllable onset -- and proceeds through a relatively open nucleus, then closes again while approaching the coda or the next syllable's onset. The degree of vocal tract openness correlates with the loudness of the sound that can be made.
Speech sounds differ on a scale of sonority, with vowels at one end (the most sonorous end) and obstruents (stops, affricates, fricatives) at the other end. In between are the liquids [l] and [r], and nasal consonants like [m] and [n].
Languages tend to arrange their syllables so that the least sonorous sounds are restricted to the margins of the syllable -- the onset in the simplest case -- and the most sonorous sounds occur in the center of the syllable -- most often a vowel.
Here are some typical English syllables that illustrate this pattern.
And in "pretending" each syllable corresponds to a peak in sonority.
As a consequence of this sonority requirement, an English word such as film is one syllable:
But if we try to reverse the last two consonants, the hypothetical word fiml comes out as two syllables, since [l] is a new peak, higher in sonority than the preceding nasal. (This new word would end just like pummel.)
Similarly, if we change the [l] in film to an obstruent such as [z] in hypothetical fizm, once again we end up with a new syllable. (It would rhyme with prism.)
These syllabifications aren't something we need to learn for each word: they're a general property of the language. That's why we know how these hypothetical words would be pronounced.
In these last two words, the consonant serves as the sonority peak at the end of the word. The consonant is syllabic, serving as the nucleus in the absence of a vowel. English permits nasals and liquids to serve in this way, at least in unstressed syllables.
For [r], the consonant can function as a vowel even in a stressed syllable.
In some dialects, such as Standard British, Boston, and Coastal Southern US, any [r] in the rhyme of a syllable (whether nucleus or coda) loses its r-ness and becomes a schwa-like vowel. These are called "r-less" dialects.
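The link between sonority peaks and syllable count can be checked on film, fiml, fizm and pretending. This sketch assumes a five-step scale (obstruents 1, nasals 2, liquids 3, glides 4, vowels 5); the lecture names only four classes, so the glide step and the exact numbers are assumptions. A segment counts as a peak when it is more sonorous than both neighbours, with word edges treated as 0; two adjacent vowels are not handled.

```python
# Assumed sonority values (higher = more sonorous).
SONORITY = {
    **{c: 1 for c in "pbtdkgfvszʃʒθðh"},   # obstruents
    **{c: 2 for c in "mnŋ"},               # nasals
    **{c: 3 for c in "lr"},                # liquids
    **{c: 4 for c in "wy"},                # glides
    **{c: 5 for c in "iɪeɛæaɔoʊuʌə"},      # vowels
}

def count_syllables(word: str) -> int:
    """Count sonority peaks: segments more sonorous than both neighbours."""
    values = [0] + [SONORITY[seg] for seg in word] + [0]  # word edges count as 0
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    )

for word in ["fɪlm", "fɪml", "fɪzm", "prɪtɛndɪŋ"]:
    print(word, count_syllables(word))
```

In fiml and fizm the final nasal or liquid outranks its only neighbour, so it forms a second peak, which is the syllabic consonant described above.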
Another general property of English is that there are restrictions on what consonants can serve as an onset cluster -- i.e. the string of (two) consonants at the beginning of a syllable. It's not enough for the sonority to increase from the first consonant to the second: it has to increase by two steps.
This too is part of our general knowledge of the language: we can distinguish blick and *bnick as "possible" and "impossible" even if we've never heard either word before.
But what about words like snow, with an obstruent + nasal onset cluster? You can take any ordinary English onset, and (subject to some restrictions) tack an [s] on the front of it, completely ignoring sonority. This includes clusters of two consonants that obey the general rule; if the first of these is a voiceless stop, [s] can be added to make three consonants.
This is a special property of [s] and no other obstruent in English. Essentially, it's because [s] is a perceptually salient sound with loud fricative noise: it doesn't depend in the normal way on syllable structure. Many other languages give similar special treatment to [s] and related sounds; in German (and Yiddish), for example, it's the (alveo)palatal fricative, as in Schmutz "dirt."
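The onset restrictions, including the special status of [s], can be written as a small checker. A minimal sketch, assuming the same five-step sonority scale as a hypothetical stand-in for the lecture's scale, with `is_possible_onset` as an illustrative name. The two-step requirement is treated only as a necessary condition (it would still accept *tl, for example), and the "some restrictions" on [s] clusters are modeled only by the requirement that a three-consonant cluster have a voiceless stop after [s].

```python
# Assumed sonority values for consonants (higher = more sonorous).
SONORITY = {
    **{c: 1 for c in "pbtdkgfvszʃʒθðh"},   # obstruents
    **{c: 2 for c in "mnŋ"},               # nasals
    **{c: 3 for c in "lr"},                # liquids
    **{c: 4 for c in "wy"},                # glides
}
VOICELESS_STOPS = set("ptk")

def is_basic_onset(cluster: str) -> bool:
    """One consonant other than [ŋ], or two consonants rising by at least two steps."""
    if len(cluster) == 1:
        return cluster in SONORITY and cluster != "ŋ"
    if len(cluster) == 2:
        first, second = cluster
        return (first in SONORITY and second in SONORITY
                and SONORITY[second] - SONORITY[first] >= 2)
    return False

def is_possible_onset(cluster: str) -> bool:
    """Basic onsets, plus [s] added in front of a basic onset, ignoring sonority."""
    if cluster == "" or is_basic_onset(cluster):
        return True
    if cluster[0] == "s" and len(cluster) >= 2 and is_basic_onset(cluster[1:]):
        # [s] + two consonants only when the first of those is a voiceless stop
        return len(cluster) == 2 or cluster[1] in VOICELESS_STOPS
    return False

for onset in ["bl", "bn", "sn", "zn", "str", "spl", "sfr"]:
    print(onset, is_possible_onset(onset))
```

Note that [zn] is rejected while [sn] is accepted: the exception belongs to [s] alone, not to the class of obstruents.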
Once again, syllable structure is a way in which languages differ.
A language learner, when exposed to lots of examples of words and syllables in a new language, comes to understand what structures are possible in that language by observing the attested patterns.
There are often differences in the way a phoneme is pronounced in a specific context. The variant pronunciations are called allophones ("other sounds").
When it's important to make this difference:
A classic example of sound alternation in English, which I mentioned in the first lecture, relates to the [s] found at the beginning of a syllable before a voiceless stop.
Although a word like spin is basically pin with [s] added, the /p/ in each case is pronounced differently.
The same is true for pairs like pit~spit, pot~spot, pair~spare, etc.
A simple statement of this alternation is as follows:
But the same generalization holds not just for /p/ but for the other voiceless stops, /t/ and /k/. Compare these word pairs:
So more accurately, there's a single general statement that covers all these cases, stated in terms of natural classes.
The aspirated and unaspirated versions of the voiceless stops are in complementary distribution: each occurs in its own context, which does not overlap with the contexts of the other.
The rule stated here assumes words of one syllable only. The full statement of where aspiration occurs in English is more complex: voiceless stops are aspirated when they occur syllable-initially and are followed by a stressed vowel (rápid vs. rapʰídity), as well as word-initially regardless of stress (pʰotʰáto). At the beginning of a word, a preceding /s/ prevents the stop from being syllable- or word-initial.
If related words (containing the same morpheme, or meaningful element) have different stresses, then we often find alternations in whether the same underlying sound /t/ is pronounced phonetically as plain [t] etc. or aspirated [tʰ] etc.
This process is completely unconscious for most speakers, and often quite hard to unlearn.
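The fuller aspiration rule can be sketched over syllabified words. This assumes the input is already divided into syllables, each tagged as stressed or not; the function `aspirate` and the "." syllable separator in its output are illustrative choices. Only the two conditions stated above are implemented; flapping and other processes affecting /t/ are ignored.

```python
VOICELESS_STOPS = {"p", "t", "k"}

def aspirate(syllables: list[tuple[str, bool]]) -> str:
    """Aspirate a syllable-initial voiceless stop if its syllable is stressed
    or is the first syllable of the word."""
    out = []
    for i, (syl, stressed) in enumerate(syllables):
        if syl and syl[0] in VOICELESS_STOPS and (stressed or i == 0):
            syl = syl[0] + "ʰ" + syl[1:]
        out.append(syl)
    return ".".join(out)

print(aspirate([("pɪn", True)]))                            # pin
print(aspirate([("spɪn", True)]))                           # spin: /s/ comes first
print(aspirate([("ræ", True), ("pɪd", False)]))             # rápid
print(aspirate([("rə", False), ("pɪ", True), ("dɪ", False), ("ti", False)]))  # rapídity
print(aspirate([("pə", False), ("te", True), ("to", False)]))                # potáto
```

The same underlying /p/ comes out plain in rápid and aspirated in rapídity, which is exactly the stress-driven alternation described above.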
Aspiration in English is a small example of what phonological knowledge consists of:
The study of phonology is largely the investigation of alternations like this -- what changes occur, what sounds undergo them, and in what contexts.
A prominent feature of American English affects /t/ and /d/, and is called flapping. A flap is a quick motion with the tongue, in this case against the alveolar ridge. It's similar to the /r/ of Spanish in a word like para, although it's a separate phoneme in that language.
All these English words have flaps where "t" or "d" is written in the spelling (in the relevant dialects).
The proper phonetic symbol for a flap is [ɾ]; it's an "r" missing the top left serif.
For most speakers, in the right context a sound that is phonologically /t/ will end up sounding phonetically just like one that is phonologically /d/, since both become a flap [ɾ]. Though /t/ and /d/ are distinguished by voicing, the flap [ɾ] is voiced.
Thus these words are all homophonous in flapping dialects, i.e. they're pronounced the same.
And the answer in this exchange is therefore ambiguous:
It's possible for a more emphatic pronunciation such as [lætʰr] to avoid the ambiguity; but that's not the usual pronunciation.
Of course, /t/ and /d/ don't always end up as flaps. In many contexts they're distinct, as these minimal pairs illustrate.
The question, then, is what context causes flapping to occur.
There are two conditions:
If you compare the list of homophones (with flapping) vs. minimal pairs (without flapping), you'll see that only the homophones satisfy both these conditions -- and so flapping occurs, which is what makes them homophones.
The same basic word (or word root, or morpheme) will sometimes undergo flapping, sometimes not, as the context changes. This includes adding a vowel:
As well as moving the stress (primary ´ or secondary `):
(If the /t/ is syllable- or word-initial, then it's also aspirated, as we should expect.)
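Flapping can be sketched as a single pattern substitution. This sketch assumes the two conditions are (1) the /t/ or /d/ follows a vowel and directly precedes another vowel, and (2) that following vowel is unstressed, which is the usual textbook statement and fits the effects of adding a vowel and moving the stress described above. Stress is marked IPA-style with ˈ or ˌ at the start of a stressed syllable, so a stop that begins a stressed syllable is preceded by a mark rather than a vowel and is left alone. The words, the treatment of [y]/[w] as diphthong offglides, and writing the final syllabic r as [ər] are all illustrative assumptions.

```python
import re

VOWELS = "iɪeɛæaɔoʊuʌə"
# /t/ or /d/ preceded by a vowel (or the y/w offglide of a diphthong)
# and followed directly by a vowel, i.e. not by a stress mark.
FLAP_CONTEXT = re.compile(f"(?<=[{VOWELS}yw])[td](?=[{VOWELS}])")

def flap(word: str) -> str:
    """Replace /t/ and /d/ with the flap [ɾ] in the flapping context."""
    return FLAP_CONTEXT.sub("ɾ", word)

print(flap("sɪt"), flap("ˈsɪtɪŋ"))       # adding a vowel creates the context
print(flap("ˈætəm"), flap("əˈtamɪk"))    # moving the stress destroys it
print(flap("ˈlætər"), flap("ˈlædər"))    # latter and ladder become homophones
```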
And when English borrows a new word, the new word is subject to these patterns too, regardless of the exact situation in the original language.
These facts show that aspiration and flapping are active parts of our knowledge of English; it's not just something we learn about individual words (such as flapped latter), but rather something that we know about the language.
One of the most interesting things about flapping is how it interacts with a process affecting the diphthong /ay/ in many dialects of English. When followed by a voiceless consonant, the diphthong is "raised" so the first part is more like the first vowel of mother than that of father.
What is interesting is that this distinction between [ay] and raised [əy] is maintained even when the voicing distinction is eliminated by flapping. Thus if a speaker has raising in write, that pronunciation is maintained in writer, while rider will have [ay] just like ride.
Thus even though flapping eliminates the distinction between the consonants in these two words, they still do not rhyme.
Most speakers are aware that the two words are pronounced differently, but they think that the difference lies in the consonants. In fact, as far as the actual sounds produced are concerned, the difference lies entirely in the vowels.
The reason why speakers "hear" the difference in the consonants is because, on an abstract level in their minds, the words are represented as /rayter/ and /rayder/, with the difference localized in the consonant. The raising of the /ay/ in the former, and the flapping of the consonants in both are subsequent unconscious processes.
What we must ask ourselves, then, is how raising keeps working properly to distinguish these two words when the conditioning factor on raising -- voicing on the following consonant -- has been obliterated.
We might imagine that speakers raise the /ay/ in writer on analogy to write, where the conditioning factor is still intact. However, this leaves it a mystery why they treat the two words differently for the purposes of flapping. If phonological rules were simply steered by the properties of the basic root, then we would expect flapping in writer to fail because it fails in write.
Instead, it seems like what is happening is that speakers have an abstract representation of a word in their minds, and they apply phonological rules to these representations in some order, so that the output of one rule can be the input to another:
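Ordered rule application can be sketched as function composition over an abstract form. A minimal sketch, assuming the abstract forms /ˈraytər/ and /ˈraydər/ (final syllabic r written [ər]), raising before a small hypothetical set of voiceless consonants, and the same simplified flapping context as elsewhere in these notes; `raise_ay`, `flap` and `derive` are illustrative names.

```python
import re

VOWELS = "iɪeɛæaɔoʊuʌə"

def raise_ay(form: str) -> str:
    """/ay/ -> [əy] before a voiceless consonant (only a few are listed here)."""
    return re.sub("ay(?=[ptkfsθʃ])", "əy", form)

def flap(form: str) -> str:
    """/t/, /d/ -> [ɾ] after a vowel or offglide and before an unstressed vowel."""
    return re.sub(f"(?<=[{VOWELS}yw])[td](?=[{VOWELS}])", "ɾ", form)

def derive(underlying: str, rules) -> str:
    """Apply the rules in order, feeding each rule's output to the next."""
    form = underlying
    for rule in rules:
        form = rule(form)
    return form

print(derive("ˈraytər", [raise_ay, flap]))   # writer
print(derive("ˈraydər", [raise_ay, flap]))   # rider
print(derive("ˈraytər", [flap, raise_ay]))   # writer, with the order reversed
```

With raising applied first, writer and rider stay distinct in their vowels even though both end up with [ɾ]; reversing the order removes the voiceless trigger before raising can see it, and the two words would merge.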
What does phonology do for us?
So we've gotten an idea now of how the sounds of language are produced, how they are classified, and how they fit together into systems. An interesting question to ask is why it should be as it is. In this last section we will consider this issue and see that the phonology of human language is an ingenious solution to a serious problem.
Apparent design features of human spoken language
We can start by listing a few characteristics of human spoken languages:
Experiments on vocabulary sizes at different ages suggest that children must learn an average of more than 10 items per day, day in and day out, over long periods of time.
A sample calculation:
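For instance, assuming purely for illustration a vocabulary of about 60,000 words acquired between ages 1 and 17, the arithmetic runs:

```latex
(17 - 1)\ \text{years} \times 365\ \tfrac{\text{days}}{\text{year}} = 5840\ \text{days},
\qquad
\frac{60\,000\ \text{words}}{5840\ \text{days}} \approx 10.3\ \tfrac{\text{words}}{\text{day}}
```

Under these assumed figures the rate comes out just over 10 words a day, every day, for sixteen years.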
Most of this learning is without explicit instruction, just from hearing the words used in meaningful contexts. Usually, a word is learned after hearing only a handful of examples. Experiments have shown that young children can learn a word (and retain it for at least a year) from hearing just one casual use.
Let's put aside the question of how to figure out the meaning of a new word, and focus on how to learn its sound.
You only get to hear the word a few times -- maybe only once. You have to cope with many sources of variation in pronunciation: individual, social and geographical, attitudinal and emotional. Any particular performance of a word simultaneously expresses the word, the identity of the speaker, the speaker's attitude and emotional state, the influence of the performance of adjacent words, and the structure of the message containing the word. Yet you have to tease these factors apart so as to register the sound of the word in a way that will let you produce it yourself, and understand it as spoken by anyone else, in any style or state of mind or context of use.
In subsequent use, you (and those who listen to you speak) need to distinguish this one word accurately from tens of thousands of others.
Note that the perceptual error rate for spoken word identification is less than one percent, where words are chosen at random and spoken by arbitrary and previously-unknown speakers. In more normal and natural contexts, performance is much better.
Let's call this the pronunciation learning problem. If every word were an arbitrary pattern of sound, this problem would probably be impossible to solve.
So what makes it work?
The Phonological Principle
In human spoken languages, the sound of a word is not defined directly (in terms of mouth gestures or acoustic wave patterns). Instead, it is mediated by encoding in terms of a phonological system:
How does the phonological principle help solve the pronunciation learning problem? Basically, by splitting it into two problems, each one easier to solve.
Additional Online resources
These are not a required part of the course materials, but are presented for those students who are interested in further information.