Acquiring and adapting phonetic categories in a computational model of speech perception


Acquiring and adapting phonetic categories in a computational model of speech perception Joe Toscano Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign Acknowledgements Cheyenne Munson Toscano


slide-1
SLIDE 1

Joe Toscano

Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign

Acquiring and adapting phonetic categories in a computational model of speech perception

slide-2
SLIDE 2
  • Acknowledgements

  • Cheyenne Munson Toscano, University of Illinois
  • Florian Jaeger, University of Rochester
  • Dave Kleinschmidt, University of Rochester
  • Funding: Beckman Institute

slide-3
SLIDE 3
  • Overview
  • Two types of learning:
  • Adaptation of phonetic categories by adult listeners
  • Acquisition of phonetic categories by infants during development
  • Question: Can a single learning mechanism account for both?
  • Not necessarily the same:
  • Typically viewed as distinct processes
  • Very different time scales: acquisition is slow; adaptation is rapid
  • May require separate representations of phonetic categories
slide-4
SLIDE 4

  • Speech development
  • Toscano, McMurray, Dennhardt, & Luck (2010), Psych Sci

[Figure: Speech perception. Labels: acoustic information, lexical/semantic information. Example words: dart, peach, beach, cat, bus, tart.]
slide-5
SLIDE 5

  • Speech development
  • Learning the mapping between cues and categories
  • Toscano, McMurray, Dennhardt, & Luck (2010), Psych Sci

[Figure: Labels: acoustic information, phonetic cues, phonological categories, lexical/semantic information. Example words: dart, peach, beach, cat, bus, tart.]
slide-6
SLIDE 6
  • A model system: VOT and voicing
  • Toscano, McMurray, Dennhardt, & Luck (2010), Psych Sci

[Figure: proportion of /p/ responses by VOT (ms), 5 to 40 ms; categories /b/ and /p/]

slide-7
SLIDE 7

[Figure: number of tokens by VOT (ms)]

  • How do listeners learn the mapping between cues and categories?
  • One possibility: Track distributional statistics of acoustic cues
  • Clusters corresponding to phonological categories
  • e.g., English VOT and voicing
  • Maye, Werker, and Gerken (2002), Cognition; Allen & Miller (1999), JASA
  • A model system: VOT and voicing
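
The idea of clusters in the distribution of a cue can be illustrated with a small simulation. This is only a sketch: the two category means, the standard deviations, and the names `ENGLISH_LIKE`, `sample_vot_tokens`, and `histogram` are hypothetical values and helpers chosen for illustration, not measurements or code from the talk.

```python
import math
import random
from collections import Counter


def sample_vot_tokens(n, categories, rng):
    """Draw n unlabeled VOT tokens (ms) from a mixture of Gaussian categories.

    categories is a list of (mean_ms, sd_ms, proportion) tuples.
    """
    weights = [proportion for _, _, proportion in categories]
    tokens = []
    for _ in range(n):
        mean, sd, _ = rng.choices(categories, weights=weights, k=1)[0]
        tokens.append(rng.gauss(mean, sd))
    return tokens


def histogram(tokens, bin_width=10):
    """Count tokens per bin; keys are the lower edge of each bin in ms."""
    counts = Counter(bin_width * math.floor(t / bin_width) for t in tokens)
    return dict(sorted(counts.items()))


# Hypothetical two-category language: a short-lag and a long-lag cluster.
ENGLISH_LIKE = [(0.0, 10.0, 0.5), (50.0, 10.0, 0.5)]

TOKENS = sample_vot_tokens(1000, ENGLISH_LIKE, random.Random(0))
HIST = histogram(TOKENS)

if __name__ == "__main__":
    for lower_edge, count in HIST.items():
        print(f"{lower_edge:>5} ms  {'#' * (count // 10)}")
```

A learner that sees only `TOKENS`, without labels, has to recover the two clusters from the shape of this histogram.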
slide-8
SLIDE 8
  • Allen & Miller (1999); Beckman et al. (2012); Lisker & Abramson (1964); Image credit: Roke / Wikimedia Commons
  • Cross-linguistic differences
  • English
  • Swedish
  • Dutch
  • Thai
slide-9
SLIDE 9
  • Speech development
  • Learning the distributional statistics of acoustic cues
  • Provides a way of learning the mapping between cues and categories

  • Is this similar to unsupervised perceptual adaptation experiments?
  • Can adults track changes in the distributional statistics of acoustic cues?

slide-10
SLIDE 10
  • Perceptual adaptation
  • Listeners rapidly adapt to novel distributions of cues (~1 hr experiments)
  • Clayards, Tanenhaus, Aslin, & Jacobs (2008): Category variance
  • Clayards et al. (2008), Cognition
slide-11
SLIDE 11

[Figure: number of tokens by VOT (ms) for the left and right distributions; proportion of /p/ responses by VOT (ms) in the first and second half of Day 1 and Day 2, for the left and right distributions]

  • Perceptual adaptation
  • Listeners rapidly adapt to novel distributions of cues (~1 hr experiments)
  • Clayards, Tanenhaus, Aslin, & Jacobs (2008): Category variance
  • Munson (2011): Category means
  • Munson (2011), dissertation
slide-12
SLIDE 12
  • Two phenomena
  • Acquisition of speech sounds during development (slow process)
  • Adaptation of speech sounds in adulthood (fast process)
  • Can a single model account for both?
  • Are changes in plasticity needed?
  • Are separate representations of long- and short-term categories needed?
  • Approach:
  • Simulations with a computational model of speech categorization
  • Examine parameter space of model to see if there are common learning rates for both acquisition and adaptation

  • Language acquisition and perceptual adaptation
slide-13
SLIDE 13
  • Modeling approach
  • Gaussian mixture model
  • Statistical learning and competition
  • Acquisition during development
  • Simulation 1: Determining the number of categories and their properties
  • Adaptation in the same model
  • Simulation 2: Perceptual learning of shifted VOT distributions
  • Other aspects of perceptual learning in the model
  • Simulation 3: Speaking rate adaptation
  • Simulation 4: Learning new phonetic categories
  • Simulation 5: Learning the categories of a second language
  • Overview
slide-14
SLIDE 14
  • VOT example
  • Clusters corresponding to phonological categories
  • Different patterns across languages (Lisker & Abramson, 1964)
  • Gaussian mixture model (GMM)
  • Categories defined by Gaussian distributions
  • Mean (μ)
  • Standard deviation (σ)
  • Likelihood (Φ)

[Figure: posterior probability by cue value; example category with μ = 35, σ = 10, Φ = 0.03]

  • Model of speech perception
  • McMurray, Aslin, & Toscano (2009); Toscano & McMurray (2010)
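
A minimal sketch of how a mixture of Gaussians along a cue dimension might assign a cue value to categories. It treats Φ as the weight of each Gaussian in the mixture, which is an assumption about the notation. The two categories and their parameter values (`CATEGORIES`) are hypothetical and not taken from the talk.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def category_posteriors(x, categories):
    """Posterior probability of each category given cue value x.

    categories is a list of (mu, sigma, phi) tuples; phi is treated as the
    mixture weight of the category (an assumption about the notation).
    """
    weighted = [phi * gaussian_pdf(x, mu, sigma) for mu, sigma, phi in categories]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical two-category VOT example (values in ms, for illustration only).
CATEGORIES = [(0.0, 10.0, 0.5), (50.0, 10.0, 0.5)]

if __name__ == "__main__":
    for vot in (0, 25, 50):
        print(vot, category_posteriors(vot, CATEGORIES))
```

With equal weights and equal standard deviations, the category boundary falls midway between the two means.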
slide-15
SLIDE 15
  • Model of speech perception
  • McMurray, Aslin, & Toscano (2009); Toscano & McMurray (2010)
  • VOT example
  • Clusters corresponding to phonological categories
  • Different patterns across languages (Lisker & Abramson, 1964)
  • Gaussian mixture model (GMM)
  • Categories defined by Gaussian distributions
  • Model consists of a mixture of Gaussians along a cue dimension

[Figure: number of tokens by VOT (ms)]

slide-16
SLIDE 16
  • Allen & Miller (1999); Beckman et al. (2012); Lisker & Abramson (1964); Image credit: Roke / Wikimedia Commons
  • Speech sounds across the world’s languages
  • English
  • Swedish
  • Dutch
  • Thai
slide-17
SLIDE 17
  • Modeling approach
  • Gaussian mixture model
  • Statistical learning and competition
  • Acquisition during development
  • Simulation 1: Determining the number of categories and their properties
  • Adaptation in the same model
  • Simulation 2: Perceptual learning of shifted VOT distributions
  • Other aspects of perceptual learning in the model
  • Simulation 3: Speaking rate adaptation
  • Simulation 4: Learning new phonetic categories
  • Simulation 5: Learning the categories of a second language
  • Overview
slide-18
SLIDE 18
  • Acquiring phonetic categories
  • Learning the distributional statistics of acoustic cues
  • Why is this a hard problem?
  • Can’t specify number of categories a priori
  • Speech sounds are unlabeled
  • Learning is incremental
  • McMurray, Aslin, & Toscano (2009); Toscano & McMurray (2010)
slide-19
SLIDE 19
  • Learning in the model
  • Statistical learning (Saffran, Aslin, & Newport, 1996; Maye, Werker, & Gerken, 2002)
  • Track the distributional statistics of acoustic cues

[Figure: frequency by VOT (ms), with /b/ and /p/ categories]

  • Acquiring phonetic categories
  • McMurray, Aslin, & Toscano (2009); Toscano & McMurray (2010)
slide-20
SLIDE 20
  • Learning in the model
  • Statistical learning (Saffran, Aslin, & Newport, 1996; Maye, Werker, & Gerken, 2002)
  • Track the distributional statistics of acoustic cues

  • Competition
  • Allows the model to determine the correct number of categories
  • Acquiring phonetic categories
  • McMurray, Aslin, & Toscano (2009); Toscano & McMurray (2010)
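
One way to picture incremental, unsupervised learning with competition is the sketch below. It is an illustration under stated assumptions, not the published model: μ and σ follow the gradient of the log-likelihood of each unlabeled token, and competition is implemented as a winner-take-all boost to the weight Φ of the most probable category followed by renormalization. The function name `update_categories`, the `min_sigma` floor, and all numeric values are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_categories(categories, x, eta_mu, eta_sigma, eta_phi, min_sigma=1.0):
    """Update all categories in place from one unlabeled cue value x.

    categories is a list of dicts with keys "mu", "sigma", "phi".
    """
    weighted = [c["phi"] * gaussian_pdf(x, c["mu"], c["sigma"]) for c in categories]
    total = sum(weighted)
    if total == 0.0:
        return  # x is too far from every category to inform an update
    posterior = [w / total for w in weighted]
    winner = max(range(len(categories)), key=posterior.__getitem__)

    for k, c in enumerate(categories):
        z = (x - c["mu"]) / c["sigma"]
        # Gradient of the log-likelihood with respect to mu and sigma.
        c["mu"] += eta_mu * posterior[k] * z / c["sigma"]
        c["sigma"] += eta_sigma * posterior[k] * (z * z - 1.0) / c["sigma"]
        c["sigma"] = max(c["sigma"], min_sigma)  # hypothetical floor

    # Competition (assumed form): only the winning category gains weight.
    categories[winner]["phi"] += eta_phi
    norm = sum(c["phi"] for c in categories)
    for c in categories:
        c["phi"] /= norm


if __name__ == "__main__":
    rng = random.Random(0)
    # Start with more categories than needed; values are arbitrary.
    cats = [{"mu": rng.uniform(-20, 80), "sigma": 10.0, "phi": 0.2} for _ in range(5)]
    for _ in range(20000):
        mean = rng.choice([0.0, 50.0])  # hypothetical two-category input
        update_categories(cats, rng.gauss(mean, 10.0), 1.0, 0.1, 0.001)
    for c in cats:
        print(c)
```

Because every update renormalizes Φ, a category that rarely wins loses weight over time, which is how competition of this kind can prune unneeded categories.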
slide-21
SLIDE 21

[Figure panels: English VOTs, Spanish VOTs, Thai VOTs]

  • Acquiring phonetic categories
  • McMurray, Aslin, & Toscano (2009); Toscano & McMurray (2010)
slide-22
SLIDE 22
  • The model can learn the correct categories for a variety of acoustic cues and phonological distinctions across different languages

  • Makes few assumptions:
  • Unsupervised, incremental learning
  • Competition between categories
  • Small number of parameters (3) used to describe each category
  • Acquiring phonetic categories
  • McMurray, Aslin, & Toscano (2009); Toscano & McMurray (2010)
slide-23
SLIDE 23
  • Overview
  • Modeling approach
  • Gaussian mixture model
  • Statistical learning and competition
  • Acquisition during development
  • Simulation 1: Determining the number of categories and their properties
  • Adaptation in the same model
  • Simulation 2: Perceptual learning of shifted VOT distributions
  • Other aspects of perceptual learning in the model
  • Simulation 3: Speaking rate adaptation
  • Simulation 4: Learning new phonetic categories
  • Simulation 5: Learning the categories of a second language
slide-24
SLIDE 24
  • Can the same model adjust its categories in an adaptation experiment?
  • Without changes in learning rates?
  • Without separate long- and short-term representations of categories?

  • Examined this by exploring the model's parameter space
  • Compared the model's responses with those of listeners in Munson (2011)

  • Learning and adapting categories in a single model
slide-25
SLIDE 25
  • Learning and adapting categories in a single model
  • Gaussian mixture model (GMM)
  • Categories defined by Gaussian distributions
  • Mean (μ)
  • Standard deviation (σ)
  • Likelihood (Φ)
  • McMurray, Aslin, & Toscano (2009)

[Figure: posterior probability by cue value; example category with μ = 35, σ = 10, Φ = 0.03]

Each parameter has a learning rate associated with it:

  • μ: 0.5, 1, 2, 4, 8, ...
  • σ: 0.1, 0.2, 0.4, 0.8, 1.6, ...
  • Φ: 0.01, 0.02, 0.04, 0.08, 0.16, ...

slide-26
SLIDE 26
  • Learning and adapting categories in a single model
[Figure: schematic of learning rates from slower to faster, marking successful developmental parameters, successful adaptation parameters, and the common parameters shared by both]
slide-27
SLIDE 27
  • Ran simulations exploring the parameter space of the model
  • Which learning rates yield successful development? (Generally slower ones?)
  • Which yield successful perceptual learning? (Generally faster ones?)
  • Are there learning rates that are common to both?
  • Learning and adapting categories in a single model
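
The search for common learning rates can be sketched as a grid search followed by a set intersection. This is a hedged illustration: the doubling series start from the example values shown for each parameter earlier in the deck, while the grid size, the function names, and the two success predicates are hypothetical stand-ins for full developmental and adaptation simulations.

```python
from itertools import product


def doubling_series(start, n):
    """n values starting at start, each double the previous one."""
    return [start * 2 ** i for i in range(n)]


def explore(grid, succeeds_development, succeeds_adaptation):
    """Return (developmental, adaptation, common) sets of learning-rate triples."""
    developmental = {rates for rates in grid if succeeds_development(rates)}
    adaptation = {rates for rates in grid if succeeds_adaptation(rates)}
    return developmental, adaptation, developmental & adaptation


# Each grid point is a triple (eta_mu, eta_sigma, eta_phi).
GRID = list(product(doubling_series(0.5, 5),
                    doubling_series(0.1, 5),
                    doubling_series(0.01, 5)))


# Toy predicates for illustration only; real criteria would run the model.
def toy_development(rates):
    return rates[0] <= 2.0


def toy_adaptation(rates):
    return rates[0] >= 2.0


DEV, ADAPT, COMMON = explore(GRID, toy_development, toy_adaptation)

if __name__ == "__main__":
    print(len(GRID), len(DEV), len(ADAPT), len(COMMON))
```

A non-empty `COMMON` set is what would indicate that one set of learning rates supports both acquisition and adaptation.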
slide-28
SLIDE 28
  • Learning and adapting categories in a single model

Which learning rates yield successful development?

[Figure: percent of simulations with an n-category solution (n = 1, 2, 3 or more) as a function of ηϕ (thousandths; 0.0625 to 64), shown for ημ = 32, ησ = 0.4 and for ημ = 0.03, ησ = 0.002]

slide-29
SLIDE 29
  • Learning and adapting categories in a single model

Which learning rates yield successful development?

[Figure: number of categories (n = 1, 2, 3 or more) as a function of ημ and ησ, each running from slower to faster learning rates]

slide-32
SLIDE 32
  • Learning and adapting categories in a single model

Which learning rates yield successful development?

[Figure: percent of successful simulations (0 to 100) as a function of ημ, ησ, and ηϕ]

slide-33
SLIDE 33
  • Results of developmental simulation
  • A range of learning rates leads to successful category acquisition
  • Demonstrates that the model is relatively flexible in its ability to discover the category structure over development

Next question: do some of these learning rates also lead to successful adaptation?

  • Learning and adapting categories in a single model
slide-34
SLIDE 34

[Figure: number of tokens by VOT (ms) for the left and right distributions; proportion of /p/ responses by VOT (ms) in the first and second half of Day 1 and Day 2, for the left and right distributions]

  • Learning and adapting categories in a single model
  • Can the model capture learning effect seen for listeners in Munson (2011)?
  • Tested model in same adaptation experiment
  • Compared model and listener responses across sets of learning rates
slide-35
SLIDE 35
  • Learning and adapting categories in a single model

[Figure: RMS error as a function of ημ, ησ, and ηϕ; legend values 0.04, 0.16, 0.36]

  • Can the model capture learning effect seen for listeners in Munson (2011)?
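
The comparison between model and listener responses is reported as RMS error. A minimal sketch of that measure, assuming both response curves are given as proportions at the same VOT steps. The function name `rmse` and the example values are hypothetical, not data from Munson (2011).

```python
import math


def rmse(model_props, listener_props):
    """Root-mean-square error between two equal-length response curves."""
    if len(model_props) != len(listener_props):
        raise ValueError("curves must have the same number of VOT steps")
    squared = [(m - l) ** 2 for m, l in zip(model_props, listener_props)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up proportions of /p/ responses at five VOT steps.
    model = [0.02, 0.10, 0.50, 0.90, 0.98]
    listeners = [0.05, 0.15, 0.45, 0.85, 0.99]
    print(rmse(model, listeners))
```

Computing this value for every set of learning rates gives an error surface over the parameter space, with lower values marking better fits.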
slide-37
SLIDE 37
  • Learning and adapting categories in a single model
  • Can the model capture learning effect seen for listeners in Munson (2011)?
  • Model accurately captures responses to left- and rightward shifted distributions
  • Can also model individual differences

[Figure: proportion of /p/ responses (0.00 to 1.00) by voice onset time (0 to 40 ms) in three panels: slow learning rates, best fit model, fast learning rates. VOT distribution shift: left, right. Group: listeners, model.]

Learning rates shown:

  • ημ = 0.0625, ησ = 0.00625, ηϕ = 0.008
  • ημ = 8, ησ = 0.8, ηϕ = 0.008
  • Left shift: ημ = 0.125, ησ = 0.1, ηϕ = 0.002, RMSE = 0.025
  • Right shift: ημ = 0.0625, ησ = 0.2, ηϕ = 0.004, RMSE = 0.044

slide-38
SLIDE 38

[Figure: percent voiceless responses (25 to 100) by VOT (ms) in panels 1 to 18, comparing listener and model for left and right distribution shifts]

  • Learning and adapting categories in a single model
slide-39
SLIDE 39
  • A single model can capture both acquisition of speech sound categories during development and adaptation in adulthood

  • Simple unsupervised learning procedure
  • No changes in model plasticity over development
  • Represents a “minimal description” of the process
  • Learning and adapting categories in a single model
slide-40
SLIDE 40
  • Overview
  • Modeling approach
  • Gaussian mixture model
  • Statistical learning and competition
  • Acquisition during development
  • Simulation 1: Determining the number of categories and their properties
  • Adaptation in the same model
  • Simulation 2: Perceptual learning of shifted VOT distributions
  • Other aspects of perceptual learning in the model
  • Simulation 3: Speaking rate adaptation
  • Simulation 4: Learning new phonetic categories
  • Simulation 5: Learning the categories of a second language
slide-41
SLIDE 41
  • Simulation 3: Speaking rate adaptation
  • Can the model update its VOT representations in the context of variable speaking rates?

[Figure: number of utterances by VOT (ms) at slow and fast speaking rates]

  • Adapting phonetic categories
  • Toscano & McMurray (2012), Attn Percep & Psychophys; Toscano & McMurray (submitted)

[Figure: proportion of /p/ responses (0% to 100%) by speaking rate (slow, fast)]

slide-42
SLIDE 42
  • Adapting phonetic categories
  • McMurray, Horst, Toscano, & Samuelson (2009)
  • Simulation 3: Speaking rate adaptation
  • Can the model update its VOT representations in the context of variable speaking rates?

slide-43
SLIDE 43
  • Adapting phonetic categories
  • Simulation 4: Learning a new category
  • Pisoni, Aslin, Perey, & Hennessy (1982)
  • 3-way voicing distinction based on VOT
slide-44
SLIDE 44
  • Potential implications for second language learning

[Figure panels: Discontinuous shift, Gradual shift]

Gradual vs. discontinuous changes in language environment

slide-45
SLIDE 45
  • Summary and conclusions
  • A single model can capture both acquisition of phonetic categories during development and adaptation in adulthood

  • Simple unsupervised learning procedure
  • No changes in model plasticity over development
  • Represents a “minimal description” of the process
  • No need to have separate representations for acquisition and adaptation

This suggests that

  • aspects of perceptual adaptation can be explained by changes to long-term representation of phonetic categories
  • the same learning mechanism can operate over vastly different time-scales
slide-46
SLIDE 46
  • Thanks!