Stefanie Shattuck-Hufnagel Speech Communication Group Research - PowerPoint PPT Presentation

  1. Cue-based analysis of speech: Implications for prosodic transcription Stefanie Shattuck-Hufnagel Speech Communication Group Research Laboratory of Electronics MIT

  2. A stark view: Some unanswered questions • What are the contrastive categories of spoken prosody? • How does their phonetic implementation vary systematically with context? • How do they relate to meaning and to interaction?

  3. Prosodic parallels to a feature-cue-based approach to speech processing? 1) Segmental phonology: growing evidence that language users systematically control: • individual acoustic cues to contrastive phonemic segments • contextually appropriate parameter values of these cues 2) Models: representation and processing of surface phonetic information at this level of detail • feature-cue-based processing (Halle, Stevens) 3) Parallels in prosodic phonology? • if so, what are the implications for prosodic transcription?

  4. Instruction giver’s map Instruction follower’s map

  5. Reduction of surface word forms It’s probably the same thing.

  6. probably the

  7. Strengthening/clarification of surface word forms Are you going to have to do that all over again? ProbabLY.

  8. Extremes of variation in word forms

  9. Surface phonetic segments often not appropriate for transcription • Cues not aligned in time – Cues to a feature can be distributed over time • nasality in V preceding a nasal coda C in I can go • duration of V preceding a voiceless coda C in I can’t go – Cues to features of two segments can overlap in time • /n + dh/ of win those → interdental nasal • Cues selected individually – Individual cues to features survive ‘deletion’ of segment • Duration of V preceding a ‘deleted’ voiced coda C in cat – Individual cues to features are sometimes added • Glottalized word-final /t/ sometimes also has closure and release burst

  10. Feature-cue-based transcription provides a better fit • Stevens 2002 (extending Halle 1972): Two types of features, two types of cues – Landmarks: abrupt spectral changes as cues to articulator-free features • Consonant, Vowel, Glide, Continuant, Sonorant, Strident – Landmark-related cues: spectral patterns near Landmarks, as cues to articulator-bound features • Labial, Coronal, Velar, Voiced, Nasal etc. – Additional acoustic events

  11. Landmark cues Rapid spectral changes across several energy bands which provide information about articulator-free features Boyce et al. 2013
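The idea of a Landmark as a rapid spectral change across several energy bands can be sketched in code. This is a minimal illustration, not the detector used in the cited work: it assumes band energies in dB have already been computed per frame, and the function name `detect_landmarks`, the 9 dB threshold, the two-frame lag, and the two-band minimum are all hypothetical choices made for the example.

```python
from typing import List, Tuple


def detect_landmarks(
    band_db: List[List[float]],
    lag: int = 2,                # frames between the two compared energy values (assumed)
    threshold_db: float = 9.0,   # size of change counted as "abrupt" (assumed)
    min_bands: int = 2,          # bands that must change together (assumed)
) -> List[Tuple[int, str]]:
    """Return (frame_index, polarity) pairs for candidate landmarks.

    band_db[t][b] is the energy in dB of band b at frame t.
    Polarity is "+" for an abrupt rise in energy and "-" for an abrupt fall.
    """
    landmarks = []
    previous_hit = False
    for t in range(lag, len(band_db)):
        deltas = [now - before for now, before in zip(band_db[t], band_db[t - lag])]
        rises = sum(d >= threshold_db for d in deltas)
        falls = sum(d <= -threshold_db for d in deltas)
        hit = max(rises, falls) >= min_bands
        # Report only the first frame of each run of abrupt change.
        if hit and not previous_hit:
            landmarks.append((t, "+" if rises >= falls else "-"))
        previous_hit = hit
    return landmarks


# Toy input: three bands, quiet (40 dB), then loud (60 dB), then quiet again.
frames = [[40.0] * 3] * 4 + [[60.0] * 3] * 4 + [[40.0] * 3] * 4
candidates = detect_landmarks(frames)
```

Requiring agreement across bands is what separates a Landmark-like event from a change confined to one frequency region; a real detector would also need smoothing and band-specific thresholds.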

  12. Landmark labelling captures individual cue patterns

  13. Advantages of Landmark Cues in Speech Perception • Reliably produced – 80% of predicted LMs in AEMT Corpus (Shattuck-Hufnagel & Veilleux 2007) • Robustly detectable (‘auditory edges’) • Highly informative – Articulator-free features (~manner) provide estimate of CV structure of the utterance – Identification of regions rich in cues to other features (place, voicing) – Inter-Landmark times provide estimate of durational markers of prosodic structure
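The last point, that inter-Landmark times can serve as an estimate of durational markers, can be illustrated with a short sketch. It assumes Landmark times are already available in milliseconds; the function names and the 1.5 ratio used to flag an unusually long interval are hypothetical, chosen only for illustration.

```python
from statistics import median
from typing import List


def inter_landmark_intervals(times_ms: List[float]) -> List[float]:
    """Durations between consecutive landmarks, after sorting by time."""
    ordered = sorted(times_ms)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def long_intervals(times_ms: List[float], ratio: float = 1.5) -> List[int]:
    """Indices of intervals longer than `ratio` times the median interval.

    A long interval is only a candidate for lengthening; the ratio is an
    assumed value, not one taken from the presentation.
    """
    intervals = inter_landmark_intervals(times_ms)
    if not intervals:
        return []
    typical = median(intervals)
    return [i for i, d in enumerate(intervals) if d > ratio * typical]


landmark_times = [0, 100, 200, 300, 600]   # invented example values, in ms
flagged = long_intervals(landmark_times)
```

Comparing each interval with the median of the same utterance is one simple way to normalize for speaking rate; other normalizations are equally plausible.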

  14. Extension to Production A sketch of an extrinsic timing model Stage 1: a phonological planning stage – symbolic segmental representations are sequenced and slotted into an appropriate prosodic structure – appropriate acoustic cues are selected for each segment’s features in its context Stage 2: a phonetic planning stage – cues are mapped onto sets of articulators – appropriate values for spatial and temporal parameters of movement are computed Stage 3: a motor-sensory implementation stage – articulator movements are generated and tracked. Turk and Shattuck-Hufnagel 2014

  16. Evidence for a Feature-Cue-Based production planning model • Evidence that speakers can choose among individual cues – Feature cues left behind in phonetic reduction – New cues in challenging speaking circumstances – Inventory constraints on LM modification • Evidence that speakers compute cue parameter values – Conversational convergence: partial, governed by social values – Covert contrast in development – Inventory constraints on final lengthening

  18. Conversational convergence/divergence Nielsen 2011

  20. Covert contrast in child speech Scobbie 1998; see also Gibbon 1990

  21. Covert contrast for stop voicing Macken & Barton 1980 JCL
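A covert contrast is one in which a speaker's productions of two target categories differ measurably while both fall inside a single adult perceptual category. The sketch below shows one way to test for that pattern in voice onset time (VOT) data. Everything specific in it is an assumption made for illustration: the function name, the 30 ms adult boundary, the effect-size criterion, and the VOT values, which are invented and not taken from the cited studies.

```python
from statistics import mean, stdev
from typing import List


def is_covert_contrast(
    vot_voiced_ms: List[float],
    vot_voiceless_ms: List[float],
    adult_boundary_ms: float = 30.0,   # assumed voiced/voiceless boundary
    min_effect: float = 1.0,           # assumed minimum standardized difference
) -> bool:
    """True if the two target categories differ but would all be heard as voiced."""
    # Condition 1: every token lies below the adult boundary.
    all_heard_as_voiced = max(vot_voiced_ms + vot_voiceless_ms) < adult_boundary_ms
    # Condition 2: the two sets of tokens are nonetheless separated.
    difference = mean(vot_voiceless_ms) - mean(vot_voiced_ms)
    pooled_sd = ((stdev(vot_voiced_ms) ** 2 + stdev(vot_voiceless_ms) ** 2) / 2) ** 0.5
    if pooled_sd == 0:
        separated = difference > 0
    else:
        separated = difference / pooled_sd >= min_effect
    return all_heard_as_voiced and separated


# Invented measurements for illustration only.
covert = is_covert_contrast([5, 8, 10, 7], [18, 20, 22, 19])
overt = is_covert_contrast([5, 8, 10, 7], [60, 70, 65, 72])
absent = is_covert_contrast([5, 8, 10, 7], [6, 8, 9, 7])
```

The point of the two conditions is that a segment-level transcription would record the covert case as a merger, while a cue-level record of the VOT values preserves the contrast.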

  22. Characteristics of the FCBP approach • More complex planning by the speaker – Not ‘choose a surface allophone’ – But instead, ‘choose context-appropriate feature cues and cue parameter values’ • Extensive interpretation by the listener – Which linguistic constituents and structures does the signal contain cues for? – What information about the interaction and the situation does the signal contain cues for?

  23. Parallels in Prosodic Processing? • Individual variation in cue patterns – Irregular pitch periods at prosodic boundaries and prominences (Pierrehumbert & Talkin 1992, Dilley et al. 1996) • New cues in challenging speaking situations – Dysarthric speakers use duration instead of F0 to signal question vs statement (Patel 2003) – Whispered speech in Mandarin shows amplitude variation analogous to F0 shape for tones (Gao 2003) • Interpretation of ambiguous cues in context – Early prominence patterns influence interpretation of ambiguous later prominence (Dilley & Shattuck-Hufnagel 1998) – Early speaking rate influences interpretation of ambiguous cues to function words (Dilley & Pitt 2008)

  24. Parallels in Prosodic Processing? • Individual variation in cue patterns – Irregular pitch periods at prosodic boundaries and prominences (Pierrehumbert & Talkin 1992, Dilley et al. 1996) • New cues in challenging speaking situations – Dysarthric speakers use duration instead of F0 to signal question vs statement (Patel 2003) – Whispered speech in Mandarin shows amplitude variation analogous to F0 shape for tones (Gao 2003) • Interpretation of cues in context – Early prominence patterns influence interpretation of ambiguous later prominence (Dilley & Shattuck-Hufnagel 1998) – Early speaking rate influences interpretation of ambiguous cues to function words (Dilley & Pitt 2008)

  25. New cues in challenging speaking situations: Dysarthric Speech Patel 2003

  26. New cues in challenging speaking situations: Whispered Speech https://lingos.co/blog/mandarin-tones/ Gao 1999

  27. New cues in challenging speaking situations: Whispered Speech Gao 1999

  28. New Cues in challenging speaking situations: Whispered Speech Gao 2003

  30. Implications for Prosodic Transcription? • Determine the contrastive categories • Determine the range of appropriate cues and cue parameter values for each category, across contexts • Determine the relationship of the categories (and cue parameter values) to meaning and to interaction • Can cue-based transcription move us toward these goals?

  31. Some useful steps • Consider prosodic elements in terms of distributed cues to contrastive elements and parameter values for those cues – Rather than as a sequence of surface elements • Develop displays of parameters as compelling as F0 contours – Duration and amplitude as % of typical – Autodetection of irregular pitch periods • Create inventories of contrastive use of prosodic phrasing and prominence across languages • Investigate ‘phonological equivalence’ in prosody
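Two of the displays suggested above, duration as a percentage of typical and automatic detection of irregular pitch periods, can be sketched as follows. This is a rough illustration under stated assumptions: the "typical" value is supplied by the caller, irregularity is approximated as a large period-to-period change, and the function names and the 20% threshold are hypothetical.

```python
from typing import List


def percent_of_typical(observed: float, typical: float) -> float:
    """Express a duration or amplitude as a percentage of its typical value."""
    if typical <= 0:
        raise ValueError("typical value must be positive")
    return 100.0 * observed / typical


def irregular_periods(periods_ms: List[float], max_rel_change: float = 0.2) -> List[int]:
    """Indices of pitch periods that differ sharply from the preceding period.

    periods_ms holds consecutive glottal period durations. The 20% threshold
    is an assumed value for illustration, not an established criterion.
    """
    flagged = []
    for i in range(1, len(periods_ms)):
        previous, current = periods_ms[i - 1], periods_ms[i]
        if abs(current - previous) / previous > max_rel_change:
            flagged.append(i)
    return flagged


# Invented values: a vowel 150 ms long where 100 ms is typical,
# and a run of periods that becomes irregular partway through.
lengthening = percent_of_typical(150, 100)
irregular = irregular_periods([8, 8, 8, 14, 6, 8])
```

Plotting `percent_of_typical` over an utterance would give a contour comparable in form to an F0 track, which is the kind of display the slide asks for.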

  32. Phonological equivalence

  33. Which differences distinguish contrasts?
