

slide-1
SLIDE 1

From Speech Perception to Language

Andrew Nevins (Harvard University)
Lectures at Universidade Federal do Rio de Janeiro, May 2006

slide-2
SLIDE 2

Your background?

Syllable? Heavy syllable? Stress? Secondary stress? Vowel reduction? Graphs/quantitative data? Top-down vs. bottom-up processing?

slide-3
SLIDE 3

Stress is part of all speakers’ knowledge

Abso-bloomin’-lutely

slide-4
SLIDE 4

perVERT vs. PERvert
Stress can indicate lexical contrasts.
Its acoustic correlates involve greater duration, greater amplitude, and a pitch contour on the stressed syllable (the vowel carries most of this).

slide-5
SLIDE 5

Contrastive vs. Fixed Stress

In languages like English and Russian, stress is not always fixed in the same position, so it can be used to contrast different words (e.g. trústy vs. trustée; or pi.sál vs. pí.sal, a mistake to be careful of!).
In other languages (Czech, French, Turkish, Polish, Finnish,...) stress is always in a fixed location (e.g. always on the 1st syllable in Czech, always on the last syllable in French, etc.).

slide-6
SLIDE 6

Perceptual use of Fixed Stress

Canyoureadthiseasilywithoutpunctuationorspaces?
Vroomen 1998: Learners can use stress as a cue for word boundaries in an artificial word monitoring task.
Infants at 7.5 months can already segment words from fluent speech.
The bnick strategy is one way (“stab nick”).
Metrical segmentation strategy...

slide-7
SLIDE 7

The same effect in free stress languages?

English speakers sometimes (accidentally) take the stressed syllable to be evidence for a word boundary.
Thus in a must to a.vóid, a common “slip of the ear” is something like a muscular void.
9-month-old infants prefer to listen to strong-weak words (róbin) rather than weak-strong words (giráffe) [Jusczyk, Cutler & Redanz 93]

slide-8
SLIDE 8

Infants finding words

How do infants learn new words? How do they separate the target word from the surrounding context?
“Fast mapping” and Carey’s chromium study
Brent & Siskind: one-word utterances occur only 9% of the time
Boundaries between words are not marked by acoustic events

slide-9
SLIDE 9

Jusczyk & Aslin

Two groups of infants: one heard cup and dog during the familiarization phase, the other group heard feet and bike
“The cup was bright and shiny” “Meg put her cup back on the table”
During the test phase, both groups heard sentences with all 4 words
A later experiment showed they had no preference for tup, bog, zeet, gike: they are not storing words “coarsely”

slide-10
SLIDE 10

How do they do it?

Allophony: aspiration vs. flapping vs. glottalization (notate, notable, note)
Transitional probabilities? TP(A,B) = P(AB)/P(A): in “prettybaby”, TP(pre,tty) > TP(tty,ba)
“Local minima” of TPs might be used to find word boundaries
Based on 2 minutes(!) of exposure to a stream like pulikiberagafodaru, infants have a preference for words with high internal TP (Saffran et al.)
Note that statistics are not a panacea...
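The transitional-probability idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy stream, the three "words", and the function names are invented here, and the boundary rule (cut wherever a TP is lower than both of its neighbours) is one simple reading of "local minima", not the procedure of any particular study.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate TP(A,B) = P(AB) / P(A) from counts in a syllable stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # how often A starts a pair
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment_at_local_minima(syllables, tps):
    """Insert a boundary wherever a TP is lower than both neighbouring TPs."""
    seq = [tps[(a, b)] for a, b in zip(syllables, syllables[1:])]
    words, current = [], [syllables[0]]
    for i, tp in enumerate(seq):
        left = seq[i - 1] if i > 0 else float("inf")
        right = seq[i + 1] if i + 1 < len(seq) else float("inf")
        if tp < left and tp < right:
            words.append("".join(current))
            current = []
        current.append(syllables[i + 1])
    words.append("".join(current))
    return words

# Hypothetical toy input: three two-syllable words in varying order.
lexicon = {"pretty": ["pre", "tty"], "baby": ["ba", "by"], "doggy": ["do", "ggy"]}
order = ["pretty", "baby", "doggy", "pretty", "doggy", "baby", "pretty", "baby"]
stream = [syl for word in order for syl in lexicon[word]]

tps = transitional_probabilities(stream)
segmented = segment_at_local_minima(stream, tps)
```

In this toy stream every word-internal TP is 1.0 and every TP across a word boundary is lower, so the dips line up with the boundaries. Real speech is far noisier, which is one reason statistics alone are not a panacea.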

slide-11
SLIDE 11

Unique Stress Constraint

The Unique Stress Constraint: chewbácca vs. dárthváder
Take the sequence WSSSW: the USC will automatically segment this as [WS][S][SW].
Take SWWWS: already you know there are 2 words, and probabilities can work on the medial W’s.

Yang & Gambell 2005
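The two worked patterns can be reproduced with a small sketch. It assumes only that each word carries exactly one strong syllable and that the utterance contains whole words; the function names and the return format are hypothetical and are not taken from Yang & Gambell.

```python
def usc_analyse(pattern):
    """Analyse a string of 'S' (strong) and 'W' (weak) syllables.

    Returns (number_of_words, forced_boundaries, ambiguous_spans).
    A forced boundary j means a word break right before syllable j.
    An ambiguous span (lo, hi) means one break lies somewhere in lo..hi,
    and stress alone cannot say where.
    """
    strong = [i for i, s in enumerate(pattern) if s == "S"]
    forced, ambiguous = [], []
    for i, j in zip(strong, strong[1:]):
        if j == i + 1:
            forced.append(j)              # two adjacent stresses: break between them
        else:
            ambiguous.append((i + 1, j))  # medial W's: left to other cues such as TPs
    return len(strong), forced, ambiguous

def bracket(pattern, boundaries):
    """Render a pattern with the given break positions as [..][..] groups."""
    cuts = [0] + list(boundaries) + [len(pattern)]
    return "".join(f"[{pattern[a:b]}]" for a, b in zip(cuts, cuts[1:]))

n_words, forced, ambiguous = usc_analyse("WSSSW")
segmented = bracket("WSSSW", forced)
two_words = usc_analyse("SWWWS")
```

For WSSSW the constraint fixes every boundary. For SWWWS it fixes the word count at 2 but leaves the break position among the medial W's open.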

slide-12
SLIDE 12

Algebraic “subtraction”

If you already know “big” then extracting “snake” is easy in bigsnake.
Kids seem to do this, saying “two dults”, perhaps after doing subtraction on adult, and “I was hajve” after behave.

Yang & Gambell 2005

slide-13
SLIDE 13

Language-specific processing

How much of perception is guided by “training”: what language you already speak? “Top-down” influences on processing

slide-14
SLIDE 14

Day 2

The effects of contrastive status (a linguistic property about the way the lexicon is built up) on the way that raw acoustic properties are perceived
slide-15
SLIDE 15

Stress “Deafness” in French

It is well known that one’s native phonology affects one’s ability to perceive segmental contrasts; e.g. the difficulty of [l]/[r] perception by Japanese speakers Dupoux & Peperkamp suggest that it may also affect one’s ability to perceive suprasegmental contrasts

slide-16
SLIDE 16

Stress-deafness Test

Subjects are required to learn 2 CVCV nonwords that differ only in (a) place of articulation of C2 or (b) stress, and to transcribe auditorily presented sequences, e.g. kúpi-kúti vs. mípa-mipá

Longer duration for stressed σ
Higher F0 for stressed σ
Dupoux & Peperkamp

slide-17
SLIDE 17

Speakers of fixed-stress languages are comparatively bad at perceiving contrastive stress

Note that Finnish has initial fixed stress and French has final fixed stress (Spanish, by contrast, has contrastive stress)

slide-18
SLIDE 18

Rhythm and Prosody

“Those guys talk fast!” “I can’t find the word boundaries!”

slide-19
SLIDE 19

Rhythmic differences across languages

Syllable-timed rhythm (Sp., It.) vs. stress-timed rhythm (Eng., Du.)
Lloyd James/Kenneth Pike: “Machine-gun languages versus Morse-code languages”

slide-20
SLIDE 20

Acoustic Correlates of Rhythm?

Durational isochrony (“even spacing”) is not experimentally upheld
Phonological characteristics (Dauer 1983): (a) more syllable types in stress-timed languages (e.g. CCVC, VCC, etc.) (b) reduction of unstressed syllables
Yet: Catalan has the same syllable structure as Spanish, but has vowel reduction; Polish allows complex syllable types, but has no reduction

slide-21
SLIDE 21

Ratios and variance

Take a look at the spectrogram... which is more salient?
Ramus et al. measured vowel/consonant intervals
“Next Tuesday on”: [n][e][kst][u][sd][eio][n]
%V and Variance(C)
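As a rough illustration of the two measures, the sketch below computes %V and the variance of the consonantal intervals for the “Next Tuesday on” segmentation. The durations (in milliseconds) are invented for the example and are not measurements; the function name is also hypothetical.

```python
from statistics import pvariance

# Intervals for [n][e][kst][u][sd][eio][n]; durations in ms are made up.
intervals = [
    ("C", 60), ("V", 80), ("C", 210), ("V", 70),
    ("C", 130), ("V", 240), ("C", 70),
]

def rhythm_measures(intervals):
    """Return (%V, variance of consonantal interval durations)."""
    vocalic = [d for kind, d in intervals if kind == "V"]
    consonantal = [d for kind, d in intervals if kind == "C"]
    percent_v = 100 * sum(vocalic) / (sum(vocalic) + sum(consonantal))
    return percent_v, pvariance(consonantal)

percent_v, var_c = rhythm_measures(intervals)
```

The slide names the variance of the consonantal intervals; taking its square root gives a standard deviation if that form is preferred. A language with complex clusters next to single consonants, as in [kst] vs. [n], will tend to have a larger spread of consonantal durations and a lower %V.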

slide-22
SLIDE 22

Babies can tell!

slide-23
SLIDE 23

Rhythmic differences

The next local elections will take place during the winter
Le prossime elezioni locali avranno luogo in inverno
Tsugi no chiho senkyo wa haruni okonawareru daru
Infants hear speech filtered at 400 Hz...

slide-24
SLIDE 24

Homework assignment distribution: three parts
Feel free to ask questions! nevins@fas.harvard.edu
Individual appointments possible
Requests for next week’s discussion are encouraged

slide-25
SLIDE 25

Day 3: Categories and Speech-Specificity

What makes something a category? How does “speech mode” influence perception?

slide-26
SLIDE 26

The effects of contrastive status

A, B: a pair of sounds that are used contrastively in a language and that differ along only a single acoustic dimension
Tokens of sounds produced in between the extremes of “A”-ness and “B”-ness may be perceived differently depending on whether they are used contrastively in the language

slide-27
SLIDE 27

The only acoustic difference: [la] has falling F3 and [ra] has rising F3
Liberman et al. presented a continuum of linguistic stimuli and non-linguistic stimuli.

slide-28
SLIDE 28

Idealized Categorization

items 5-8 are categorized as “A” 100% items 5-8 are categorized as “B” 0%

items 1-4 are categorized as “A” 100% items 1-4 are categorized as “B” 0%

Idealized Categorization: 8 stimuli vary along an acoustic dimension in even steps. Nonetheless, they are perceived as belonging to 2 distinct groups

slide-29
SLIDE 29

Idealized Discrimination

There is a point between each adjacent stimulus on the continuum which indicates subjects’ ability to correctly guess “identical” or “not identical”

Within “Category”, subjects cannot reliably discriminate two acoustically different stimuli. They can only guess. (50%)

But across “category”, they are perfect, even though the acoustic difference here is the same as other pairs
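The idealized pattern can be derived from identification data under one simplifying assumption: listeners tell two stimuli apart only by the category labels they assign. A common way to write that down is P(correct) = 0.5 + 0.5 (p1 - p2)^2, where p1 and p2 are the probabilities of labelling each stimulus “A”. The formula is brought in here as an assumption for illustration, and the identification values are the idealized ones, not data.

```python
# Idealized identification: probability of labelling each of 8 stimuli "A".
ident_a = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]

def predicted_discrimination(p_a):
    """Label-only prediction for each adjacent pair of stimuli.

    0.5 means pure guessing; 1.0 means perfect discrimination.
    """
    return [0.5 + 0.5 * (p - q) ** 2 for p, q in zip(p_a, p_a[1:])]

predicted = predicted_discrimination(ident_a)
```

Every within-category pair comes out at chance (0.5) and the single pair that straddles the boundary comes out perfect (1.0), even though all adjacent pairs are the same acoustic distance apart.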

slide-30
SLIDE 30

Visual Light Wavelength (Nanometers)

slide-31
SLIDE 31

English speakers:
Stimuli 1-6 categorized as [ra] around 100%
Stimuli 7-8 not reliably categorized
Stimuli 9-13 categorized as [ra] around 0%
Discrimination of stimuli 3 steps apart varied near the category boundary for English speakers; the discrimination function shows no pattern for Japanese speakers

slide-32
SLIDE 32

MMN (mismatch negativity) only for Hindi speakers when a -50ms stimulus is presented after a sequence of -10ms stimuli

slide-33
SLIDE 33

On the stimuli that were F3 transitions alone, both populations had non-categorical perception

slide-34
SLIDE 34

Contrastiveness and Distributional Patterns

If a 2-way distinction is contrastive in the language, will it show the unimodal pattern, which has most actual utterances centered around the middle of the continuum, or will it have more utterances that are near the extremes?

Hint: think about humans’ identification function when there are two contrastive categories along such a continuum

These “bell-curved” distribution functions, with the highest frequency centered symmetrically around a mean, are called Gaussian distributions.

slide-35
SLIDE 35

Maye, Werker & Gerken

Infants heard: 16 tokens on 8-point ta-da continuum, 4 ma, 4 la; 2.3 minutes total
Then, they were presented tokens 3 & 6, and tokens 1 & 8
Infants in the bimodal condition looked longer in general
They also looked longer when 3/6 were presented in sequence than when 1 or 8 was presented alone.

slide-36
SLIDE 36

What about allophones?

These are also not in a unimodal distribution (though we don’t really have evidence that they are “as bimodal” as contrastive pairs)
Learning that two categories are allophonic requires noticing that they are found in completely distinct environments
(Notice Maye et al.’s kids heard the stimuli in identical environments: word-initial and followed by the same vowel)

slide-37
SLIDE 37

Is speech processed differently from sound?

Going back to the l/r study, it was interesting that Japanese speakers could distinguish F3 transitions when presented alone

da vs. ga are also distinguished by F3
Duplex Perception (Liberman et al.): third formant of da/ga continuum played to one ear, and the rest of the sound played to the other ear

slide-38
SLIDE 38

Subjects report hearing both a whistle/tone (F3 transition alone) and a da or ga
What does this suggest about which “modules” are passed, and process, which information?

When played in isolation, the whistles were not perceived categorically

Duplex Perception: Two Modes at Once

slide-39
SLIDE 39

Is Categorical Perception Linguistic or just Auditory?

Location {head, chin, nose, chest}
Movement {circle, arc, wiggle, fingers}
Handshape {5,A,G}
Palm orientation {out, in, side}
Baker, Idsardi, Michnick-Golinkoff, and Petitto (2005)...

slide-40
SLIDE 40

Handshape continua

15 English-speaking, non-ASL adults
15 ASL-speaking adults
Continuum between two handshapes created by measuring finger distances and dividing into even steps
11 points along continuum, with 5/6 as category boundary (as determined in separate identification task)
All of the points along the continua were meaningless in ASL (as “ba” and “pa” are in English)

slide-41
SLIDE 41

Signal Detection Theory

On the discrimination task: if they answer “different” 100% of the time, they have perfect accuracy on different pairs, but 0 accuracy on same pairs
So, instead, measure using d’: a number which measures the ability to accurately & reliably tell when two stimuli are different
Hit: Stimuli are diff., subject says they’re diff.
Correct rejection: Stimuli are same, subject says same.
False alarm: Stimuli are same, subject says they’re diff.
Miss: Stimuli are diff., subject says they’re same
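A minimal sketch of a d’ computation, assuming the simple form d’ = z(hit rate) - z(false-alarm rate). Same/different designs are sometimes analysed with more elaborate models, so treat this as the textbook yes/no version rather than the analysis used in the handshape study. The function name, the counts, and the 0.5 correction are choices made for this example.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate).

    Adding 0.5 to each count keeps the rates away from 0 and 1,
    where the z-transform would be infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# A subject who answers "different" on every trial (hypothetical counts).
always_different = d_prime(hits=20, misses=0, false_alarms=20, correct_rejections=0)

# A subject who is mostly right on both kinds of trial (hypothetical counts).
good_listener = d_prime(hits=18, misses=2, false_alarms=2, correct_rejections=18)
```

The always-“different” subject gets a d’ of 0 despite perfect accuracy on different pairs, because the hit rate and false-alarm rate are identical. The second subject gets a clearly positive d’.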

slide-42
SLIDE 42

Handshape Continuum from “5” to “Flat 0”

slide-43
SLIDE 43

ASL speakers show radically different signal detection rates when within vs. across a contrastive category

English speakers show no such trend

slide-44
SLIDE 44

Handshape Continuum from “B-bar” to “A-bar”

slide-45
SLIDE 45

ASL speakers show radically different signal detection rates when within vs. across a contrastive category

English speakers show no such trend

slide-46
SLIDE 46

Day 4: Syllables; Consonants vs. Vowels

slide-47
SLIDE 47

Perceived Epenthesis

A linguistic “illusion” based on the Japanese tendency for vowel epenthesis: icecream: isu-kurim, christmas: kurisumasu
Percept of [u] judgements along a durational continuum for nonce words like [abuno], [ebuzo]
Again, this reflects top-down influence of a native language: it can induce perception of a vowel that isn’t there!

slide-48
SLIDE 48

Sneaking consonants into the vowel transitions

[hjumanz kan ditekt up to tweni faiv segments evri sekond]
But for non-linguistic stimuli, sounds can be identified no faster than 7-9 items per second
How can linguistic segments be perceived so much faster?
Perhaps if they are not perceived at the level of individual segments, but rather as larger units...

These two instances of “d” have little in common acoustically. The consonant is “carried” as part of the formant transitions of the vowel

slide-49
SLIDE 49

Speech Monitoring

Mehler 1981: subjects told to detect pa were faster to do so in pa.lace than in pal.mier; subjects told to detect pal were faster to do so in pal.mier than in pa.lace
Ferrand 1994 replicated this result with a naming task

Syllable structure plays a role in lexical access above & beyond that of linear sequence
slide-50
SLIDE 50

Babbling at 7 months

(a) Uses a reduced set of possible sounds found in spoken language
(b) Organized around CV sequences
(c) Used without apparent meaning or reference
Is it a fundamentally motoric behavior, akin to crawling, a “motor flexing” of the mouth and jaw muscles?
Or is it a linguistic activity which rehearses the syllabary?

slide-51
SLIDE 51

Manual Babbling

In response to Petitto’s claim that manual babbling exists, other researchers proposed it also just reflects motoric development
Petitto (2004): compared two groups of hearing babies: one group had deaf parents
If babbling is linguistic, then hearing babies of deaf parents should exhibit (a) a distinction between linguistic and non-linguistic hand movements at 7 months and (b) non-linguistic hand movements similar to those of the other group

slide-52
SLIDE 52

Petitto’s methods

Infrared emitting diodes with 0.1mm sensitivity placed on babies’ hands while they were in play sessions; videotaped too.
Any movement segment (e.g. open-close, waving, etc.) counted; any time objects were in their hands did not count
Finally, ASL syllables are only produced within a limited space in front of the signer’s body, basically from above shoulders to below sternum.
Their finding: both groups produced sets of 2.5-3 Hz movements. Only the babies exposed to ASL produced a consistent set of 1 Hz movements.

slide-53
SLIDE 53

Notice the sign-exposed group had less 3 Hz activity than the speech-exposed group

Babbling correlates with the ambient language

slide-54
SLIDE 54

The McGurk effect

Auditory ba + visual ga = da

The effect works on perceivers with all language backgrounds (e.g., Massaro, Cohen, Gesi, Heredia, & Tsuzaki, 1993; Sekiyama & Tohkura, 1993).
The effect works on young infants (Rosenblum, Schmuckler, & Johnson, 1997).
The effect works when the visual and auditory components are from speakers of different genders (Green, Kuhl, Meltzoff, & Stevens, 1991).
The effect works with highly reduced face images (Rosenblum & Saldaña, 1996).
The effect works when observers are unaware that they are looking at a face (Rosenblum & Saldaña, 1996).
The effect works when observers touch, rather than look at, the face (Fowler & Dekle, 1991).

slide-55
SLIDE 55

A further demo

The McGurk effect is an additional example of top-down influence on perception

It has also been taken to support the motor theory of speech perception, in which all percepts of speech involve perceiving the motoric gestures that were required to make them, too

slide-56
SLIDE 56

Day 5: Speculations on Functional components

And more on vowels!

slide-57
SLIDE 57

Acoustic vs. Articulatory Space

slide-58
SLIDE 58

And now, a confession

Categorical perception is more robustly found for consonants than vowels, which may be perceived based on “prototypes” (more on how vowel perception works in a bit)
Do consonants “matter more” for lexical contrast?

slide-59
SLIDE 59

Cultural effects on gender differences in vowels

slide-60
SLIDE 60

Speaker normalization

slide-61
SLIDE 61

Vowel triangle / dispersion

slide-62
SLIDE 62

A brief tour of Semitic

A discontinuous three-consonant root embodies core encyclopedic “concepts”

Roots are put into patterns which give them functional and argument structure. The overarching pattern is that consonants are thus used lexically while vowels are used functionally

No languages are known to be like this with the difference that they have vowels only acting as the lexical skeleton
slide-63
SLIDE 63

Vowels vs. Consonants

Most languages have many more Cs than Vs (though cf. Swedish)
Harmony is more common for vowels
Disharmony/dissimilation is more common for consonants (Lyman’s Law in Japanese, Grassmann’s Law, Semitic root constraints)
Vowel reduction is a widespread phenomenon
Is consonant neutralization as common? Unknown.

slide-64
SLIDE 64

Rltv Pssblt f rcvry ndr dltn

eai oiiy i eoe ue eeio

www.uebersetzung.at.

O rato roeu a rolha da garrafa do rei da Rússia.

Appilan pappilan apupapin papupata pankolla kiehuu ja kuohuu. Pappilan paksuposki piski pisti paksun papukeiton poskeensa.

slide-65
SLIDE 65

Caramazza 2000 double dissociation: two Italian aphasics. AS made 3x more errors on Vs than Cs; IFA made 5x more errors on Cs than Vs
pastore minatore

slide-66
SLIDE 66

Cs and Vs in word-similarity

Word Reconstruction Task
Is kebra more like cobra or zebra?
Listeners hear nonce words like kebra and are told to name the first real word they find
Cutler & van Ooijen: Dutch (16 V, 19 C) vs. Spanish (5 V, 20 C)

[Figure: results for Dutch and Spanish listeners]

slide-67
SLIDE 67

Why do languages have consonants and vowels?

For the CELEX English database, words from 2 to 15 phonemes in length: there are 2.2 times as many neighbors resulting from a consonant replacement (e.g., pat as a neighbor for cat) as from a vowel replacement (e.g., kit as a neighbor for cat). The same calculation for Dutch in CELEX produced 1.72 neighbors from consonant replacement for every neighbor from vowel replacement, whereas for a Spanish lexical database of over 75,000 words (Sebastián-Gallés et al., 1996), there were 2.07 neighbors from consonant replacement for every neighbor from a vowel replacement. These ratios are comparable and reflect the fact that across vowel/consonant inventories, the “paths” to lexical neighbors are largely paved by consonants
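The neighbor calculation can be sketched on a toy lexicon. Everything here is an assumption for illustration: the seven pseudo-phonemic spellings stand in for real transcriptions, one letter is treated as one phoneme, and the function name is invented. The counts say nothing about CELEX itself.

```python
from itertools import combinations

VOWELS = set("aeiou")

def neighbor_counts(lexicon):
    """Count word pairs that differ by exactly one segment.

    Returns (consonant_replacement_pairs, vowel_replacement_pairs).
    """
    c_pairs = v_pairs = 0
    for w1, w2 in combinations(sorted(set(lexicon)), 2):
        if len(w1) != len(w2):
            continue
        diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
        if len(diffs) != 1:
            continue
        a, b = diffs[0]
        if a in VOWELS and b in VOWELS:
            v_pairs += 1
        elif a not in VOWELS and b not in VOWELS:
            c_pairs += 1
    return c_pairs, v_pairs

# Hypothetical toy lexicon in a one-letter-per-phoneme spelling.
toy_lexicon = ["kat", "pat", "bat", "kit", "kot", "kap", "pit"]
c_pairs, v_pairs = neighbor_counts(toy_lexicon)
```

Dividing the first count by the second gives the kind of consonant-to-vowel neighbor ratio quoted above; with a full phonemic lexicon the same loop would need a proper segment inventory in place of the five-letter vowel set.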

Functional roles of each:

Consonants: provide lexical contrast
Vowels: provide rhythmic scaffold, allow for speaker identification, emotive content

slide-68
SLIDE 68

The end of this chapter

http://www.people.fas.harvard.edu/~nevins/speechpercep I will post many of the relevant papers there