How Abstract Phonemic Categories Are Necessary for Coping With Speaker-Related Variation
Anne Cutler, Frank Eisner, James M. McQueen and Dennis Norris
Marius Volz Exemplar Theory 24 June 2020
Introduction: Variability in Speech Sounds
Listeners can understand speech sounds despite considerable variability
Talker's vocal tract shape, dialect, position of words in the utterance, ambient noise, etc.
Two utterances of the same word are never acoustically identical
Abstractionist view:
Relevant information is extracted from the signal
Abstract representations can be mapped onto representations of words in the lexicon
Perception of words and of voices are independent processes
Evidence: whispered and synthesised speech; aphasia after right- vs. left-hemisphere damage
Episodic view:
Lexical entries of words include information about the talker's voice
Words leave complex and detailed memory traces
Normalisation procedures would then be redundant
Nygaard, Sommers and Pisoni (1994)
Trained some listeners to identify voices
Trained listeners recognised more new words in noise than untrained listeners
Exposure to talkers' voices facilitated later recognition of new words
Talker-specific information must have been encoded
Adjusting to various voices increases processing demands
Perceptual knowledge is retained in procedural memory
This enhances processing efficiency for later utterances by the same talker
Talker-specific information plays a role in speech perception
An extreme abstractionist view is untenable
Cutler et al. (this paper)
Neither view is tenable in its extreme version
Talker-specific knowledge could be stored prelexically
This would allow generalisation of a talker's idiosyncrasies across the whole vocabulary
Perceptual system adjusts rapidly to articulatory idiosyncrasies
Norris, McQueen, and Cutler (2003)
Two groups of listeners
Training: words that ended in [f] or [s]
For each group, one of the fricatives was replaced by an ambiguous fricative [?]
Lexical decision task: 90% of the [?]-final words were accepted as real words
Test: categorising sounds from an [ɛf]-[ɛs] continuum
Participants were more likely to categorise a sound as their respective training sound
→ Prelexical adjustment in how the acoustic signal is mapped onto a phonemic category
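One way to picture this prelexical adjustment is as a shift of the category boundary on the [ɛf]-[ɛs] continuum. A minimal Python sketch (the continuum coding and the shifted boundary value 0.65 are hypothetical illustrations, not the study's parameters):

```python
# Toy model (not the authors'): prelexical categorisation as a boundary on
# an [ɛf]-[ɛs] continuum, where 0 = clear [f] and 1 = clear [s].

def categorise(x, boundary=0.5):
    """Map a continuum value to a phoneme label via a category boundary."""
    return "s" if x > boundary else "f"

# After [f]-training the ambiguous sound was lexically interpreted as /f/,
# so the boundary shifts up: more of the continuum now counts as [f].
boundary_after_f_training = 0.65   # hypothetical shifted boundary

ambiguous = 0.55
print(categorise(ambiguous))                             # -> s (before training)
print(categorise(ambiguous, boundary_after_f_training))  # -> f (after training)
```

The point of the sketch: the same acoustic value is relabelled after learning because the prelexical mapping itself changed, not any single word's representation.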
Eisner and McQueen (2005)
Similar training conditions
Learning was talker-specific: the effect applied only to fricative test sounds uttered by the training talker
Kraljic and Samuel (2006)
Generalised learning (across talkers) was found for [d]-[t] and [b]-[p] contrasts
Stops contain less information about the talker than fricatives do
Eisner and McQueen (2006)
Is learning stable over time?
Training consisted of listening to a short story with either [f] or [s] replaced by an ambiguous sound
One group was trained in the morning and tested 12 hours later; another was trained in the evening and tested 12 hours later
The learning effect did not decrease in either group
→ Lexically guided perceptual learning appears to be automatic and stable
In episodic models, phoneme categorisation is postlexical, based on lexical episodic traces
But if a listener learns about a talker's unusual speech sound and recognition of all words containing that sound is affected, this indicates prelexical phoneme categorisation
So if learning generalises to words not heard during training, that is evidence for abstract prelexical phonemic representations
McQueen, Cutler, and Norris (2006)
Training: auditory lexical decision task; final [f] or [s] replaced with an ambiguous fricative
Test: cross-modal identity priming task
Auditory prime followed by a visual lexical decision; speed and accuracy of the decision were measured and compared
Critical words: DOOF and DOOS
Prime: [do:?] or a phonologically unrelated word
Listeners were faster and more accurate with their training fricatives
More wrong answers (negative values) when the training fricative and the target word's fricative differed
Perceptual adjustments are applied to other words in the lexicon
Abstract, flexible prelexical representations help in dealing with phonetic variability
Episodic models contain detailed traces but lack this abstraction and flexibility
Episodic models should therefore not be able to explain lexical generalisation
MINERVA-2 (Hintzman 1986)
Simulation model of human memory
Each episode lays down a trace in long-term memory
New inputs activate all traces in proportion to how well their contents match
An aggregate echo of all activated traces is returned to working memory
Each vector consists of name fields (category identity) and form fields (phonetic patterns)
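The core computation can be sketched in a few lines of Python (after Hintzman, 1986; the 4-feature vectors and the split into a 2-feature name field plus 2-feature form field are toy choices, and "relevant features" is read here as those nonzero in probe or trace):

```python
# Minimal sketch of MINERVA-2's core computation. Feature values are
# ternary (-1, 0, +1); 0 marks an absent / irrelevant feature.

def similarity(probe, trace):
    # S = (1/N_r) * sum(probe_j * trace_j), where N_r counts the
    # 'relevant' features (nonzero in probe or trace).
    n_relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    dot = sum(p * t for p, t in zip(probe, trace))
    return dot / n_relevant if n_relevant else 0.0

def echo(probe, traces):
    # Each trace is activated by S**3 (cubing preserves sign and sharpens
    # the match); the echo is the activation-weighted sum of all traces.
    acts = [similarity(probe, t) ** 3 for t in traces]
    return [sum(a * t[j] for a, t in zip(acts, traces))
            for j in range(len(probe))]

# Two stored episodes, each laid out as [name field | form field].
traces = [[ 1,  1,  1, -1],
          [-1, -1, -1,  1]]

# Probe with the form of the first trace and an empty name field:
# the echo's name portion comes back pointing toward trace 1's name.
e = echo([0, 0, 1, -1], traces)
print(e)  # -> [0.25, 0.25, 0.25, -0.25]
```

This pattern-completion behaviour (a partial probe retrieves the missing field from resonating traces) is what the simulations below rely on.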
New training items are similar to existing traces, except for their final portion, the ambiguous fricative
A test item's ambiguous sound corresponds with such a training stimulus and thus activates its entire trace
Training episodes resonate with test inputs but do not help in interpreting them
Because there are more unambiguous than ambiguous training sounds, the unambiguous sound receives proportionally stronger activation → the opposite of the studies' results
Training phase
40 words ending in [f] and 40 ending in [s]
20 additional ambiguous items that originally ended with the same final phoneme
20 additional episodes of unambiguous items ending with the other final phoneme
Test phase
The content of the echo was compared to the two possible interpretations, to determine whether it was more similar to the trained than to the untrained fricative
The score for form retrieval was slightly below chance → the opposite of the effect found in human data
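The failure to generalise can be reproduced in a toy run of the model (our own simplified encoding, not the paper's actual materials or parameters: the one-hot word bodies and the mutually orthogonal patterns for [f], [s] and the ambiguous sound are illustrative):

```python
# Toy MINERVA-2 run showing why a pure episodic model fails to generalise:
# an ambiguous probe mainly resonates with the stored ambiguous-final
# traces, so the echo's final segment stays ambiguous instead of shifting
# toward the trained fricative.

F = [1, 1, -1, -1]     # clear [f] (illustrative pattern)
S = [-1, -1, 1, 1]     # clear [s]
A = [1, -1, 1, -1]     # ambiguous fricative, orthogonal to both F and S

def word(i, final, n=6):
    body = [0] * n
    body[i] = 1                     # one-hot 'name/body' field per word
    return body + final

# [f]-training group: clear [f]-words, clear [s]-words, and training items
# whose final [f] was replaced by the ambiguous sound.
traces = [word(0, F), word(1, F), word(2, S),
          word(3, S), word(4, A), word(5, A)]

def similarity(probe, trace):
    n_rel = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    return sum(p * t for p, t in zip(probe, trace)) / n_rel

def echo(probe):
    acts = [similarity(probe, t) ** 3 for t in traces]
    return [sum(a * t[j] for a, t in zip(acts, traces))
            for j in range(len(probe))]

# Generalisation test: a NEW word (body matches no stored trace) ending
# in the ambiguous sound.
e_final = echo([0] * 6 + A)[6:]

# The echo's final segment is no closer to the trained [f] than to [s]:
to_f = sum(a * b for a, b in zip(e_final, F))
to_s = sum(a * b for a, b in zip(e_final, S))
print(to_f, to_s)  # equal: no pull toward the trained fricative
```

The echo's final portion is a copy of the ambiguous pattern, because only the ambiguous-final traces resonate with the probe; and since those traces carry *other* words' name fields, nothing links them to the new test word. This is the model-level counterpart of the below-chance retrieval score.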
A pure episodic model is unable to simulate the results of the human experiments
There is no generalisation that would pull test inputs in the direction of the trained sound
Episodic models can abstract a prototype, and inputs will activate that prototype
But echoes of ambiguous input sounds will also be ambiguous
And there is no relationship between the name fields of different words (olijf vs. doof)
Abstract prelexical representations help in dealing with variation in the speech signal
Efficient: idiosyncrasies are stored once instead of once for each word
They benefit comprehension of unheard signals containing such idiosyncrasies
Inflexible with respect to acquiring new phonemic categories (second-language acquisition)
But flexible with respect to adjustments of existing categories (new words with the critical sound)
This flexibility is incompatible with episodic models
Talker-specific information also helps in identifying phonemes and words and influences speech recognition
→ A hybrid model with both episodic traces and prelexical abstractions