From Speech Perception to Language Andrew Nevins (Harvard - PowerPoint PPT Presentation

From Speech Perception to Language Andrew Nevins (Harvard University) Lectures at Universidade Federal do Rio de Janeiro, May 2006

Your background? Syllable? Heavy syllable? Stress? Secondary stress? Vowel reduction? Graphs/quantitative data? Top-down vs. bottom-up processing?

Stress is part of all speakers’ knowledge Abso-bloomin’-lutely

Stress can indicate lexical contrasts: perVERT vs. PERvert. Its acoustic correlates involve greater duration, greater amplitude, and pitch contour on the stressed syllable (the vowel carries most of this).

Contrastive vs. Fixed Stress In languages like English and Russian, stress is not always fixed in the same position, so it can be used to contrast different words (e.g. trústy vs. trustée; or pi.sál vs. pí.sal, a mistake to be careful of!). In other languages (Czech, French, Turkish, Polish, Finnish,...) stress is always in a fixed location (e.g. always on the 1st syllable in Czech, always on the last syllable in French, etc.).

Perceptual use of Fixed Stress Canyoureadthiseasilywithoutpunctuationorspaces? Vroomen 1998: Learners can use stress as a cue for word boundaries in an artificial word monitoring task. Infants at 7.5 months can already segment words from fluent speech. The “bnick” strategy is one way (“stab nick”). Metrical segmentation strategy...

The same effect in free stress languages? English speakers sometimes (accidentally) take the stressed syllable to be evidence for a word boundary. Thus in a must to a.vóid, a common “slip of the ear” is something like a muscular void. 9-month-old infants prefer to listen to strong-weak words (róbin) over weak-strong words (giráffe) [Jusczyk, Cutler & Redanz 93].

Infants finding words How do infants learn new words? How do they separate the target word from the surrounding context? “Fast mapping” and Carey’s chromium study. Brent & Siskind: one-word utterances occur only 9% of the time. Boundaries between words are not marked by acoustic events.

Jusczyk & Aslin Two groups of infants: one heard cup and dog during the familiarization phase, the other group heard feet and bike. “The cup was bright and shiny.” “Meg put her cup back on the table.” During the test phase, both groups heard sentences with all 4. A later experiment showed they had no preference for tup, bog, zeet, gike: they are not storing words “coarsely”.

How do they do it? Allophony: aspiration vs. flapping vs. glottalization (notate, notable, note). Transitional probabilities? TP(A,B) = P(AB)/P(A): in “prettybaby”, TP(pre,tty) > TP(tty,ba). “Local minima” of TPs might be used to find word boundaries. Based on 2 minutes(!) of exposure to a stream like pulikiberagafodaru, infants have a preference for words with high internal TP (Saffran et al.). Note that statistics are not a panacea...
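The TP idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three trisyllabic "words" (pu-li-ki, be-ra-ga, fo-da-ru) are cut from the slide's example string purely for convenience and are not claimed to be the actual experimental items, the word order in the stream is made up, and `transitional_probs` and `segment_at_tp_minima` are hypothetical helper names.

```python
from collections import Counter

def transitional_probs(syllables):
    """TP(A,B) = count(A followed by B) / count(A), estimated from the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # A must have a following syllable
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment_at_tp_minima(syllables, tp):
    """Posit a word boundary wherever the TP is lower than both neighbouring TPs."""
    tps = [tp[(a, b)] for a, b in zip(syllables, syllables[1:])]
    words, current = [], [syllables[0]]
    for i, value in enumerate(tps):
        left = tps[i - 1] if i > 0 else float("inf")
        right = tps[i + 1] if i + 1 < len(tps) else float("inf")
        if value < left and value < right:  # strict local minimum
            words.append("".join(current))
            current = []
        current.append(syllables[i + 1])
    words.append("".join(current))
    return words

# Assumed toy lexicon and word order (no word repeats immediately).
lexicon = [["pu", "li", "ki"], ["be", "ra", "ga"], ["fo", "da", "ru"]]
order = [0, 1, 2, 0, 2, 1, 0, 1, 2, 1, 0, 2]
stream = [syll for idx in order for syll in lexicon[idx]]

tp = transitional_probs(stream)
segmented = segment_at_tp_minima(stream, tp)
print(tp[("pu", "li")], tp[("ki", "be")])
print(segmented)
```

Within a word the next syllable is fully predictable in this toy stream, while across a word boundary it is not, so every boundary shows up as a dip in TP. Real speech is far noisier, which is one reason statistics alone are not a panacea.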

Unique Stress Constraint (Yang & Gambell 2005) The Unique Stress Constraint (USC): a word bears at most one primary stress, e.g. chewbácca (one word) vs. dárthváder (two words). Take the sequence WSSSW (W = weak, S = strong syllable): the USC will automatically segment this as [WS][S][SW]. Take SWWWS. Already you know there are 2 words, and probabilities can work on the medial W’s.
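A minimal sketch of what the USC gives you for free, assuming syllables are already labelled W or S. The function names `usc_forced_chunks` and `min_word_count` are hypothetical, and this is not Yang & Gambell's actual model: it only places the boundaries that are forced between two adjacent strong syllables and leaves runs of medial W's undecided.

```python
def usc_forced_chunks(pattern):
    """Split a W/S string at every point where two S syllables are adjacent.

    Two adjacent stressed syllables cannot share a word under the USC,
    so a boundary between them is certain.
    """
    chunks, start = [], 0
    for i in range(1, len(pattern)):
        if pattern[i - 1] == "S" and pattern[i] == "S":
            chunks.append(pattern[start:i])
            start = i
    chunks.append(pattern[start:])
    return chunks

def min_word_count(pattern):
    """Each S needs its own word, so the number of S's is a lower bound."""
    return pattern.count("S")

print(usc_forced_chunks("WSSSW"))  # ['WS', 'S', 'SW']
print(usc_forced_chunks("SWWWS"))  # ['SWWWS'] : no boundary is forced
print(min_word_count("SWWWS"))     # 2 : but there must be two words
```

For SWWWS the constraint says there are at least two words without saying where the boundary falls among the W's; that leftover ambiguity is where transitional probabilities can do their work.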

Algebraic “subtraction” (Yang & Gambell 2005) If you already know “big” then extracting “snake” is easy in bigsnake. Kids seem to do this, saying “two dults”, perhaps after doing subtraction on adult, and “I was hajve” after behave.
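The subtraction step can be illustrated with a toy function over orthographic strings. This is a sketch under simplifying assumptions (spelling stands in for sound, the longest known word wins at each position, and `subtract_known_words` is a hypothetical name), not the procedure from the paper.

```python
def subtract_known_words(utterance, lexicon):
    """Peel known words off an unsegmented string; whatever is left over
    between them is returned as a candidate new word."""
    segments, residue, i = [], "", 0
    while i < len(utterance):
        # Longest known word starting at position i, if any.
        match = max((w for w in lexicon if utterance.startswith(w, i)),
                    key=len, default=None)
        if match:
            if residue:
                segments.append(residue)
                residue = ""
            segments.append(match)
            i += len(match)
        else:
            residue += utterance[i]
            i += 1
    if residue:
        segments.append(residue)
    return segments

print(subtract_known_words("bigsnake", {"big"}))  # ['big', 'snake']
print(subtract_known_words("adult", {"a"}))       # ['a', 'dult']
```

The second call shows how the same mechanism can overapply: a child who knows the article "a" could subtract it from "adult" and arrive at "dult".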

Language-specific processing How much of perception is guided by “training”: what language you already speak? “Top-down” influences on processing

Day 2 The effects of contrastive status (a linguistic property about the way the lexicon is built up) on the way that raw acoustic properties are perceived

Stress “Deafness” in French It is well known that one’s native phonology affects one’s ability to perceive segmental contrasts; e.g. the difficulty of [l]/[r] perception by Japanese speakers Dupoux & Peperkamp suggest that it may also affect one’s ability to perceive suprasegmental contrasts

Stress-deafness Test (Dupoux & Peperkamp) Subjects are required to learn 2 CVCV nonwords that differ only in (a) place of articulation of C2 or (b) stress, and to transcribe auditorily presented sequences, i.e. kúpi-kúti vs. mípa-mipá. Longer duration for stressed σ. Higher F0 for stressed σ.

Speakers of fixed-stress languages are comparatively bad at perceiving contrastive stress. Note that Finnish has initial fixed stress, whereas Spanish has contrastive (not fixed) stress.

Rhythm and Prosody “Those guys talk fast!” “I can’t find the word boundaries!”

Rhythmic differences across languages Syllable-timed rhythm (Sp., It.) vs. stress-timed rhythm (Eng., Du.) Lloyd James/Kenneth Pike: “Machine-gun languages versus Morse-code languages”

Acoustic Correlates of Rhythm? Durational isochrony (“even spacing”) not experimentally upheld. Phonological characteristics (Dauer 1983): (a) more syllable types in stress-timed languages (e.g. CCVC, VCC, etc.); (b) reduction of unstressed syllables. Yet: Catalan has the same syllable structure as Spanish, but has vowel reduction; Polish allows complex syllable types, but has no reduction.

Ratios and variance Take a look at the spectrogram... which is more salient? Ramus et al. measured vowel/consonant intervals. “Next Tuesday on”: [n][e][kst][u][sd][eio][n]. Measures: %V and Variance(C).
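A rough sketch of how the two measures could be computed from a list of labelled intervals. The C/V sequence follows the slide's "Next Tuesday on" example, but every duration below is invented purely for illustration. The consonantal measure is computed here as a standard deviation, which is how Ramus et al. define their ΔC; using the population form (`statistics.pstdev`) rather than the sample form is an assumption of this sketch.

```python
from statistics import pstdev

def rhythm_metrics(intervals):
    """Return (%V, deltaC) for a list of (kind, duration_ms) intervals.

    %V     = share of total duration taken up by vocalic intervals
    deltaC = standard deviation of consonantal interval durations
    """
    vowels = [d for kind, d in intervals if kind == "V"]
    consonants = [d for kind, d in intervals if kind == "C"]
    percent_v = 100.0 * sum(vowels) / (sum(vowels) + sum(consonants))
    delta_c = pstdev(consonants)
    return percent_v, delta_c

# [n][e][kst][u][sd][eio][n] with made-up durations in milliseconds.
next_tuesday_on = [("C", 60), ("V", 80), ("C", 180), ("V", 70),
                   ("C", 120), ("V", 150), ("C", 60)]

percent_v, delta_c = rhythm_metrics(next_tuesday_on)
print(f"%V = {percent_v:.1f}, deltaC = {delta_c:.1f} ms")
```

Languages with complex consonant clusters and reduced vowels would be expected to show a lower %V and a larger consonantal spread than languages with simpler syllables.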

Babies can tell!

Rhythmic differences The next local elections will take place during the winter Le prossime elezioni locali avranno luogo in inverno Tsugi no chiho senkyo wa haruni okonawareru daru Infants hear speech filtered at 400 Hz...

Homework assignment distribution: three parts Feel free to ask questions! nevins@fas.harvard.edu Individual appointments possible Requests for next week’s discussion are encouraged

Day 3: Categories and Speech-Specificity What makes something a category? How does “speech mode” influence perception?

The effects of contrastive status Suppose A, B, a pair of sounds, are used contrastively in a language, and A, B only differ along a single acoustic dimension. Tokens of sounds produced in between the extremes of “A”-ness and “B”-ness may be perceived differently depending on whether they are used contrastively in the language.

Liberman et al. presented a continuum of linguistic stimuli and non-linguistic stimuli. The only acoustic difference: [la] has falling F3 and [ra] has rising F3.

Idealized Categorization: 8 stimuli vary along an acoustic dimension in even steps. Nonetheless, they are perceived as belonging to 2 distinct groups, items 1-4 and items 5-8: each group is categorized as “A” or as “B” either 100% or 0% of the time.

Idealized Discrimination: There is a point between each adjacent stimulus on the continuum which indicates subjects’ ability to correctly guess “identical” or “not identical”. Within “category”, subjects cannot reliably discriminate two acoustically different stimuli. They can only guess (50%). But across “category”, they are perfect, even though the acoustic difference here is the same as other pairs.
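The idealized pattern can be written down directly. In this sketch the assignment of items 1-4 to "A" and items 5-8 to "B" is an arbitrary labelling assumption, and both function names are hypothetical.

```python
def idealized_labels(n_items=8, boundary=4):
    """Items 1..boundary are always heard as 'A', the rest always as 'B'."""
    return ["A" if i <= boundary else "B" for i in range(1, n_items + 1)]

def idealized_discrimination(labels):
    """Proportion correct for each adjacent pair on the continuum:
    chance (0.5) when both items get the same label, perfect (1.0)
    when the pair straddles the category boundary."""
    return [1.0 if a != b else 0.5 for a, b in zip(labels, labels[1:])]

labels = idealized_labels()
scores = idealized_discrimination(labels)
print(labels)  # ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
print(scores)  # [0.5, 0.5, 0.5, 1.0, 0.5, 0.5, 0.5]
```

The single peak at the 4/5 pair is the signature of categorical perception: every adjacent pair is the same acoustic distance apart, yet only the pair that crosses the boundary is discriminated.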

Visual Light Wavelength (Nanometers)

English speakers: Stimuli 1-6 categorized as [ra] around 100%. Stimuli 7-8 not reliably categorized. Stimuli 9-13 categorized as [ra] around 0%. Discrimination of stimuli 3 steps apart varied near the category boundary for English speakers; the discrimination function shows no pattern for Japanese speakers.

MMN only for Hindi speakers when -50ms stimulus presented after sequence of -10ms stimuli

On the stimuli that were F3 transitions alone, both populations had non-categorical perception.

Contrastiveness and Distributional Patterns These “bell curved” distribution functions, with highest frequency centered symmetrically around a mean, are called Gaussian distributions. If a 2-way distinction is contrastive in the language, will it show the unimodal pattern, with most actual utterances centered around the middle of the continuum, or will it have more utterances near the extremes? Hint: think about humans’ identification function when there are two contrastive categories along such a continuum.

Maye, Werker & Gerken Infants heard: 16 tokens on 8-point ta-da continuum, 4 ma, 4 la 2.3 minutes total Then, they were presented tokens 3 & 6, and tokens 1 & 8 Infants in the bimodal condition looked longer in general They also looked longer when there were 3/6 presented in sequence than 1 or 8 presented alone.

What about allophones? These are also not in a unimodal distribution (though we don’t really have evidence that they are “as bimodal” as contrastive pairs). Learning that two categories are allophonic requires noticing that they are found in completely distinct environments. (Notice Maye et al.’s kids heard the stimuli in identical environments: word-initial and followed by the same vowel.)

Is Speech processed differently than sound? Going back to the l/r study, it was interesting that Japanese speakers could distinguish F3 transitions when presented alone. da vs. ga are also distinguished by F3. Duplex Perception (Liberman et al.): the third formant of a da/ga continuum is played to one ear, and the rest of the sound is played to the other ear.
