4aSC43 Patterns in the perception of VC(C)V Nearey and Smits

slide-1
SLIDE 1

Nearey & Smits: Perception of VCCV

1

4aSC43 Patterns in the perception of VC(C)V Nearey and Smits

slide-2
SLIDE 2

Patterns in the perception of VC(C)V strings

Terrance M. Nearey

University of Alberta

Roel Smits

Max Planck Institute for Psycholinguistics

(Work supported by SSHRC & MPI )

slide-3
SLIDE 3

Experiment based on Repp 1983

  • Our study, like Repp [1] experiment 1, used synthetic VC(C)V stimuli where C = [+stop]

– Repp ran 3 sub-experiments: 1a, 1b and 1c
– Our experiment had a smaller total gap duration range than Repp's, but a larger range than any single sub-experiment
– Our experiment had more spectral patterns than Repp's

slide-4
SLIDE 4

Challenge to modeling

  • Our models of perception of fixed-length strings have been quite successful

– e.g. CV or /hVC/ syllables [3,4]

  • Repp’s 1983 abda experiments

– Variable-length strings in one stimulus set
– VCV: aba, ada; VCCV: abda, adba, abba, adda
– Apparently very complex perceptual results [1, 2]

  • We take complexity as a challenge to our models

– Can we extend the simple architecture of our models to handle the variable string-length case?

slide-5
SLIDE 5

/abda/ Stimulus #78 (of 144)

[Figure: stimulus #78 of 144, the most extreme [abda] place cues. Axis ticks: 100 to 600 and 1000 to 5000.]

slide-6
SLIDE 6

Experiment details

  • Stimuli arrayed in fully crossed design

– F2 (F3) VC offset: 1060 (2180) to 1450 (2539) Hz in 6 steps
– F2 (F3) CV onset: 1099 (2262) to 1635 (2500) Hz in 6 steps

  • F2 and F3 correlated (r = +1.0) in both offsets and onsets

– Gap duration: 80, 120, 190 and 300 ms
– Total 144 = 6 x 6 x 4 stimuli

  • Subjects and responses

– Each responded to 10 repetitions of each of 144 stimuli
– 13 native speakers of Canadian English
– Response button layout:

[b] [bb] [bd] [d] [dd] [db]
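The fully crossed design can be sketched in a few lines. This is only an illustration: the equal spacing of the F2 steps, the ordering of the stimuli, and all names (`build_design`, `close_step`, and so on) are assumptions, since the slides give only the endpoints, the number of steps and the gap durations.

```python
from itertools import product

N_STEPS = 6                      # steps on each transition continuum
GAPS_MS = (80, 120, 190, 300)    # silent gap durations listed on the slide


def linspace(lo, hi, n):
    """Return n equally spaced values from lo to hi (equal spacing is assumed)."""
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]


# F2 endpoints come from the slide; intermediate values are illustrative only.
F2_OFFSET_HZ = linspace(1060.0, 1450.0, N_STEPS)   # VC closing transition
F2_ONSET_HZ = linspace(1099.0, 1635.0, N_STEPS)    # CV opening transition


def build_design():
    """Return one dict per stimulus in the 6 x 6 x 4 fully crossed design."""
    steps = range(1, N_STEPS + 1)
    design = []
    for close_step, open_step, gap_ms in product(steps, steps, GAPS_MS):
        design.append({
            "close_step": close_step,
            "open_step": open_step,
            "gap_ms": gap_ms,
            "f2_offset_hz": F2_OFFSET_HZ[close_step - 1],
            "f2_onset_hz": F2_ONSET_HZ[open_step - 1],
        })
    return design


design = build_design()
print(len(design))  # 144
```

The ordering produced here need not match the original stimulus numbering (for example "stimulus #78").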

slide-7
SLIDE 7

Phonetic cover terms

  • Canonical “duration” classes

– Singletons: /aba/ or /ada/
– Clusters (heterorganic): /abda/ or /adba/
– Geminates: /ab#ba/ or /ad#da/

  • Place of articulation classes (closing and opening)

– Closing place class (place of first or only stop)

  • Labial closers: /b/, /bb/, /bd/; Dental closers: /d/, /dd/, /db/

– Opening place class (place of second or only stop)

  • Labial openers: /b/, /bb/, /db/; Dental openers: /d/, /dd/, /bd/
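The cover terms above amount to a three-way classification of each response label. A minimal sketch follows; the function name `classify` and the dictionary keys are illustrative, not taken from the original work.

```python
RESPONSES = ("b", "d", "bb", "dd", "bd", "db")
PLACE = {"b": "labial", "d": "dental"}


def classify(response):
    """Map a response label such as 'bd' to closer, opener and duration class."""
    if response not in RESPONSES:
        raise ValueError(f"unknown response label: {response!r}")
    first, last = response[0], response[-1]
    if len(response) == 1:
        duration = "singleton"      # one stop: it is both closer and opener
    elif first == last:
        duration = "geminate"       # same place on both sides of the gap
    else:
        duration = "cluster"        # heterorganic cluster
    return {"closer": PLACE[first], "opener": PLACE[last], "duration": duration}


for label in RESPONSES:
    print(label, classify(label))
```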
slide-8
SLIDE 8

General results I: Primary duration class patterns

  • Duration class patterns

– Short duration gaps favor singletons
– Intermediate gaps favor heterorganic clusters
– Very long gaps favor geminates
– But single-place responses (singletons and geminates) are always less likely when transition patterns clash (see below)

slide-9
SLIDE 9

General Results II: Primary place patterns

  • Closing place class affected by VC- F2 F3 offset

– Low F2 F3 offset favors labial closers /b bb bd/
– High F2 F3 offset favors dental closers /d dd db/

  • Opening place class affected by -CV F2 F3 onset

– Low F2 F3 onset favors labial openers /b bb db/
– High F2 F3 onset favors dental openers /d dd bd/

slide-10
SLIDE 10

General Pattern III: Apparent complications

Definition: Clash of transition patterns

– Closing VC_ transitions near the low /_b/ end but opening _CV transitions near the high /_d/ end
– or vice versa

  • At longer gap durations

– even a fairly small clash tends to favor clusters [bd], [db] over singletons or geminates

  • At short gap durations

– singletons are favored unless the clash is quite large
– singleton responses are dominated by opening _CV cues

slide-11
SLIDE 11

Apparent time-dependent assimilation/dissimilation effects

  • Repp 1983 summarized key results by:
  • Time course

– Proactive (left to right): preceding VC1_ stimulus affects judgment of following _C2V
– Retroactive (right to left): following _C2V stimulus affects judgment of preceding VC1_

  • Phonetic cross-gap agreement of responses

– Assimilation (more one-place responses: /d, dd, b, bb/)
– Dissimilation (more two-place responses: /db, bd/)

slide-12
SLIDE 12

General trends in apparent assimilation

  • For Repp 1983 experiment 1

– Retroactive effects larger than proactive
– Strong assimilation prevalent for shorter gap durations
– Some dissimilation present for longer gap durations
– Longest gap duration shows little effect

  • Our experiment

– Trends (using Repp’s measure) are similar [see notes].

slide-13
SLIDE 13

Baseline logistic model

  • Define 6 ‘diphone’ consonant categories

– Label them ‘CC’
– CC1 = /b/, CC2 = /d/, CC3 = /bb/, CC4 = /dd/, CC5 = /bd/, CC6 = /db/
– Note: /d/ and /b/ are ‘degenerate’ diphones

  • Three stimulus properties

1) Xclose – VC (syl. 1) closure transition step
2) Xopen – CV (syl. 2) opening transition step
3) Xdur – gap duration [note: duration coded as sqrt(ms)]

slide-14
SLIDE 14

Results Baseline Model

  • Pretty good fit

1) RMS = 5.96%

  • Score response rate in percentage points
  • Calculate RMS error of (predicted – observed)

2) Percent modal agreement PMA = 93.75%

  • Percentage of cells in which the observed response category with the most votes matches the response category with the highest predicted score
  • Agrees for 135 of 144 stimuli
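The two fit measures can be sketched as follows. The slide does not say how the RMS error is averaged, so averaging over every stimulus-by-category cell is an assumption here, as are the function names and the toy numbers.

```python
import math


def rms_error(predicted, observed):
    """RMS of (predicted - observed) response rates, in percentage points.

    Both arguments are lists of rows, one row per stimulus, holding the
    response proportion for each category. Averaging is over all cells
    (an assumption; the slide does not spell this out).
    """
    diffs = [100.0 * (p - o)
             for p_row, o_row in zip(predicted, observed)
             for p, o in zip(p_row, o_row)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def argmax(row):
    """Index of the largest value; ties go to the first maximum."""
    return max(range(len(row)), key=row.__getitem__)


def percent_modal_agreement(predicted, observed):
    """Percentage of stimuli whose modal observed and predicted categories match."""
    hits = sum(argmax(p) == argmax(o) for p, o in zip(predicted, observed))
    return 100.0 * hits / len(predicted)


# Toy example with two stimuli and two categories (made-up numbers).
pred = [[0.7, 0.3], [0.4, 0.6]]
obs = [[0.6, 0.4], [0.6, 0.4]]
print(rms_error(pred, obs), percent_modal_agreement(pred, obs))
```

As a check on the reported figure, 135 agreeing stimuli out of 144 gives 100 * 135 / 144 = 93.75.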
slide-15
SLIDE 15

Analysis of regression weights

  • Fit baseline model to each subject’s data separately

– Basis for simple repeated measures comparisons

  • Examine means and between-subject variation in weight patterns for each cue
  • May give insight into how information is integrated

slide-16
SLIDE 16

[Figure: boxplots of baseline model CC * Xclose coefficients by response category (d, dd, db, b, bb, bd); axes: coefficient, response category.]

Matlab boxplots. “Belt” is median; notch width is a robust 95% confidence estimate.

slide-17
SLIDE 17

Coefficients of Xclose depend only on phonetic closer class

  • Coefficients for Xclose [= closing (VC) F2 F3 transition patterns] show very strong clustering

– Labial closers /b, bb, bd/ show low values
– Dental closers /d, dd, db/ show high values [Statistics in appendix panels]

  • Key fact: the phonetic nature of the opener has negligible effect on how Xclose tunes likelihood of closer place. Only phonetic closer class matters.

slide-18
SLIDE 18

[Figure: boxplots of baseline model CC * Xopen coefficients by response category (d, dd, db, b, bb, bd); axes: coefficient, response category.]

Matlab boxplots. “Belt” is median; notch width is a robust 95% confidence estimate.

slide-19
SLIDE 19

Coefficients of Xopen depend on phonetic opener class

  • Coefficients for Xopen [= opening (CV) F2 F3 transition patterns] show very strong clustering

– Labial openers /b, bb, db/ show low values
– Dental openers /d, dd, bd/ show high values [Statistics in appendix panels]

  • Key fact: the phonetic nature of the closer has minimal effect on how Xopen tunes likelihood of opener place. Only phonetic opener class matters.
slide-20
SLIDE 20

[Figure: boxplots of baseline model CC * Xdur coefficients by response category (d, dd, db, b, bb, bd); axes: coefficient, response category.]

slide-21
SLIDE 21

Coefficients of Xdur depend only on phonetic duration class

  • Coefficients for Xdur (= silent gap duration) show strong clustering

– Singletons /b, d/ show low coefficients for Xdur
– Heterorganic clusters /db, bd/ show moderately high coefficients
– Geminates /bb, dd/ show highest coefficients

  • There is negligible differentiation of members within duration class

[See appendix panels for statistics]

slide-22
SLIDE 22

Summary of baseline model

  • Only the ‘obvious’, primary cues count
  • Results from baseline model suggest extremely simple tuning of response by ‘local’ stimulus properties
  • This permits construction of a more restricted model
  • Fewer fitted parameters implementing a factored model

– Extension of methods of Nearey 1990

slide-23
SLIDE 23

Factored model

Factoring characteristics of CC in baseline model

– Pdur: tripartite category of phonological duration classes

  • 1 - singleton, 2 - geminate and 3 - heterorganic cluster

– Pclose:

  • 1 - labial closer: [b, bb, bd]
  • 2 - dental closer: [d, dd, db]

– Popen:

  • 1 - labial opener: [b, bb, db]
  • 2 - dental opener: [d, dd, bd]
slide-24
SLIDE 24

Example Factoring of CC * Xdur

  • Example: if contrasts among the 6 CC categories can be factored for different stimuli

– (e.g.) replace CC * Xdur with Pdur * Xdur
– Reduces coefficients from 6 to 3 (and df from 5 to 2)

  • If gap duration affects CC judgments only through duration class, then this should give a similar fit with fewer coefficients
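The factoring step can be sketched as sharing one Xdur coefficient among all categories in the same duration class. The coefficient values below are hypothetical, chosen only so that they sum to zero, and the helper names are illustrative.

```python
CATEGORIES = ("b", "d", "bb", "dd", "bd", "db")


def duration_class(cc):
    """Return the phonological duration class (Pdur) of a CC label."""
    if len(cc) == 1:
        return "singleton"
    return "geminate" if cc[0] == cc[1] else "cluster"


def expand_pdur(pdur_coefs):
    """Expand 3 duration-class coefficients to the 6 CC categories."""
    return {cc: pdur_coefs[duration_class(cc)] for cc in CATEGORIES}


def free_df(n_levels):
    """Free coefficients under a sum-to-zero restriction."""
    return n_levels - 1


# Hypothetical Pdur * Xdur coefficients (sum to zero across the 3 classes).
pdur = {"singleton": -0.5, "geminate": 0.3, "cluster": 0.2}
cc_coefs = expand_pdur(pdur)

print(cc_coefs)
print(free_df(len(CATEGORIES)), free_df(len(pdur)))  # 5 2
```

Because each duration class has exactly two members, coefficients that sum to zero over the three classes also sum to zero over the six expanded categories.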

slide-25
SLIDE 25

Comparative fit of Baseline and Factored models

Model      Model df   Resid. df   G2       rms    pma
Baseline   20         700         2403.1   5.96   93.75
Factored   9          711         2514.1   5.94   93.75

The better G2 fit of the larger baseline model is not reliable according to bootstrap model comparison (train on 5 Ss, test on 13, repeat random splits 200 times).

slide-26
SLIDE 26

Simple logistics can lead to complex probability patterns

  • The simple linear relations are in the space of relative log-likelihoods
  • log(p(/CCj/, X)) – log(p(/CCk/, X)) is a simple linear function of the sets of coefficients of categories j and k and stimulus properties X.
  • Where do apparently complex assimilation and dissimilation effects come from?
  • When projected back to raw probability-of-response space, they ‘fall out’ of the baseline model
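A small numerical illustration of this point, assuming a standard multinomial logistic (softmax) form: the log-odds between two categories change linearly with a cue, while the response probabilities themselves do not. The biases, slopes and the single cue `x` are made-up values for three hypothetical categories.

```python
import math


def softmax(scores):
    """Convert category scores to probabilities that sum to 1."""
    top = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical bias and slope for three categories on one cue x.
BIAS = [0.0, 0.5, -0.5]
SLOPE = [-1.0, 0.2, 0.8]


def probs(x):
    """Response probabilities at cue value x."""
    return softmax([b + a * x for b, a in zip(BIAS, SLOPE)])


for x in (0.0, 1.0, 2.0):
    p = probs(x)
    log_odds = math.log(p[2]) - math.log(p[0])
    # log_odds equals (BIAS[2] - BIAS[0]) + (SLOPE[2] - SLOPE[0]) * x
    print(x, round(log_odds, 6), [round(v, 3) for v in p])
```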

slide-27
SLIDE 27

Observed assimilation indices

(indices calculated per Repp 83, data from current experiment)

[Figure: observed assimilation index at gap durations of 80, 120, 190 and 300 ms; series: Retroactive, Proactive.]

slide-28
SLIDE 28

Factored model predicted assimilation indices

[Figure: assimilation index predicted by the factored model at gap durations of 80, 120, 190 and 300 ms; series: Retroactive, Proactive.]

slide-29
SLIDE 29

Consistent with simple factored recognition model

  • Not quite a simple phoneme-based model

– Need to break phoneme into parts
– Not all parts show up in all contexts

  • Example: in V_V, /b/ is associated with three phases

– Closing (implosion): [>b]
– Closure (hold): [$]
– Opening (explosion): [B<]
– Complete pattern: [>b $ B<]

slide-30
SLIDE 30

Compiled networks

[see notes pages for more]

  • /ada/

[a]—[<d]—[$]—[D>]—[a]

  • /abda/

[a]—[<b]—[$]—[$]—[D>]—[a]

  • /ab#ba/

[a]—[<b]—[$]—[#]—[$]—[B>]—[a]

slide-31
SLIDE 31

Exploratory work

  • Exploratory work with more complex models shows some may work reliably better
  • Dutch /apmas/ experiment: work in progress by Smits

– Plotted proportion of one versus two consonants against offset-onset F2 F3 discontinuity measure
– Simple measure of formant offset/onset clash: Xdiscon = abs(ClosingF2F3StepNo - OpeningF2F3StepNo)

  • When Xdiscon is large, two distinct consonant responses are more likely
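The discontinuity measure is a one-line computation on the step numbers. The sketch below tabulates it over the 6 x 6 grid of closing and opening steps, assuming steps are numbered 1 to 6; the function and variable names are illustrative.

```python
STEPS = range(1, 7)   # transition step numbers, assumed to run from 1 to 6


def xdiscon(closing_step, opening_step):
    """Absolute offset/onset clash between closing and opening F2 F3 steps."""
    return abs(closing_step - opening_step)


# Rows: closing step; columns: opening step.
table = [[xdiscon(c, o) for o in STEPS] for c in STEPS]
for row in table:
    print(row)
```

The measure is symmetric, so a labial-to-dental clash and a dental-to-labial clash of the same size get the same value.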

slide-32
SLIDE 32

Baseline + Transition Discontinuity

  • Fit improved when terms CC * Xdiscon are added to baseline model [see notes]
  • Factored model with discontinuity tuning duration class (Pdur * Xdiscon) is nearly as good

– Improvement appears to be primarily due to prediction of additional /bd/ and/or /db/ responses when discontinuity is large
– Puzzling fact is that direction of discontinuity doesn’t matter
– Further research contemplated

slide-33
SLIDE 33

Conclusions

  • Very simple, highly interpretable model works well
  • Key aspects of apparently complex behavior accounted for
  • More complex models work somewhat better

– May be more difficult to interpret

slide-34
SLIDE 34

References

[1] Repp, B. H. (1983). Bidirectional contrast effects in the perception of VC-CV sequences. Perception and Psychophysics, 33(2), 147-155.
[2] Repp, B. (1978). Perceptual integration and differentiation of spectral cues for intervocalic stop consonants. Perception and Psychophysics, 24(5), 471-485.
[3] Nearey, T. (1997). Speech perception as pattern recognition. J. Acoust. Soc. Amer., 101, 3241-3254.
[4] Smits, R. (2001). Evidence for hierarchical categorization of coarticulated phonemes. J. Exp. Psych.: HPP, 27, 111-135.
[5] Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge: Cambridge University Press.
[6] Shao, J. (1996). Bootstrap model selection. J. Amer. Statist. Assoc., 91(434), 655-665.

Remaining panels are notes

slide-35
SLIDE 35

Details of Repp’s [1] assimilation index

  • Procedure at each gap duration for retroactive assimilation

– Consider only stimuli with most extreme /d/-like _C2V pattern (step 6 of opening F2 F3)

  • Calculate %D|D2 = mean percentage of dental closure responses (/d, dd, db/) over all VC1_ patterns

– Consider only stimuli with most extreme /b/-like _C2V pattern (step 1 of opening F2 F3)

  • Calculate %D|B2 = mean percentage of dental closure responses (/d, dd, db/) over all VC1_ patterns
  • Define %A = (%D|D2) - (%D|B2)
  • Proactive assimilation index defined analogously
  • Graphic results of our experiment resemble those of Repp
  • Notes:

– Repp’s patterns change across the three sub-experiments 1a, 1b and 1c, which cover different gap duration ranges.

  • Our duration ranges cover parts of all of Repp’s experiments 1a, 1b and 1c
  • Our results are generally compatible with a mixture of the overall trends

– Repp’s original formulation of assimilation appears to have typos, switching -/ba/ and -/da/ on p. 149.
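The retroactive index described above can be sketched as follows. The data layout (a dictionary keyed by closing step, opening step and gap duration, holding response counts) and all names are assumptions made for illustration; the toy counts are invented.

```python
DENTAL_CLOSERS = {"d", "dd", "db"}


def pct_dental_closure(counts):
    """Percentage of dental-closer responses in one stimulus cell.

    counts maps a response label to the number of responses.
    """
    total = sum(counts.values())
    dental = sum(n for label, n in counts.items() if label in DENTAL_CLOSERS)
    return 100.0 * dental / total


def retroactive_index(cells, gap_ms, n_steps=6):
    """%A = (%D|D2) - (%D|B2) at one gap duration.

    cells maps (close_step, open_step, gap_ms) to response counts.
    """
    def mean_pct(open_step):
        # Average over all VC1_ (closing) patterns at this opening step.
        values = [pct_dental_closure(cells[(c, open_step, gap_ms)])
                  for c in range(1, n_steps + 1)]
        return sum(values) / len(values)

    return mean_pct(n_steps) - mean_pct(1)


# Toy data: more dental-closer responses when the opening is /d/-like.
toy = {}
for c in range(1, 7):
    toy[(c, 6, 80)] = {"d": 6, "b": 4}   # most /d/-like opening (step 6)
    toy[(c, 1, 80)] = {"d": 4, "b": 6}   # most /b/-like opening (step 1)

print(retroactive_index(toy, 80))
```

A positive value indicates assimilation in this formulation; the proactive index would swap the roles of closing and opening.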

slide-36
SLIDE 36

Comparison of empirical assimilation indices of Repp 83 experiment 1b and current experiment

[Figure: empirical assimilation indices by gap duration (80, 120, 160, 190, 300 ms); legend: Retro-R, Proactive, Retro-ns, Pro-ns.]

slide-37
SLIDE 37

Comparative fit of 4 models

Model                Error df   G2       rms    pma
Baseline             700        2403.1   5.96   93.75
Baseline + Xdiscon   695        1918     5.44   95.83
Factored + Xdiscon   707        2007.1   5.69   96.53
Factored             711        2514.1   5.94   93.75
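The error df values can be cross-checked with a little arithmetic. The sketch assumes that total df is 144 stimuli times 5 free response categories, which is not stated on the slides but reproduces the model df of 20 (baseline) and 9 (factored) reported in the two-model comparison. The model df derived for the two Xdiscon models are implied values only.

```python
N_STIMULI = 144
N_CATEGORIES = 6

# Assumed total df: one multinomial cell per stimulus with 5 free categories.
TOTAL_DF = N_STIMULI * (N_CATEGORIES - 1)   # 720

# Error df as listed in the table.
ERROR_DF = {
    "Baseline": 700,
    "Baseline + Xdiscon": 695,
    "Factored + Xdiscon": 707,
    "Factored": 711,
}

implied_model_df = {name: TOTAL_DF - df for name, df in ERROR_DF.items()}
for name, df in implied_model_df.items():
    print(f"{name}: implied model df = {df}")
```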

slide-38
SLIDE 38

Bootstrap comparison 4 models

  • RMS & PMA measures, best to worst:

1) Factored + Xdiscon; 2) Baseline + Xdiscon; 3) Factored; 4) Baseline

  • G2, best to worst:

1) Baseline + Xdiscon; 2) Factored + Xdiscon; 3) Factored; 4) Baseline

  • 200 bootstrap model selection runs with 4 models

– Train on 5 samples; test on 13 samples; repeat 200 times (see [5,6])
– Random samples of entire subject data sets with replacement (13 subjects total)
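The resampling scheme can be sketched as below. Only the sizes (5 training samples, 13 test samples, 200 runs, 13 subjects, sampling with replacement) come from the slide; drawing the training and test samples independently, the fixed seed, and the function name are assumptions. Model fitting and scoring are left out.

```python
import random


def bootstrap_splits(n_subjects=13, n_train=5, n_test=13, n_runs=200, seed=0):
    """Return (train, test) lists of subject indices for each bootstrap run.

    Whole-subject data sets are sampled with replacement. Drawing train and
    test independently is an assumption of this sketch.
    """
    rng = random.Random(seed)
    subjects = list(range(n_subjects))
    splits = []
    for _ in range(n_runs):
        train = rng.choices(subjects, k=n_train)
        test = rng.choices(subjects, k=n_test)
        splits.append((train, test))
    return splits


splits = bootstrap_splits()
print(len(splits), splits[0])
```

In each run, every candidate model would be fit to the pooled training subjects and scored (for example by G2, RMS or PMA) on the test subjects.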

slide-39
SLIDE 39

Expansion of shorthand notation for baseline model

  • Bias terms: CC

– Expansion: b_j for j = 1 to 6 (ranging over 6 categories /d, dd, db, b, bb, bd/)

  • Restriction Σ_j(b_j) = 0, where Σ_j is summation over all j.
  • Stimulus-tuned terms

– Let X_1i, X_2i, X_3i respectively represent the F2-F3 offset (closing transition) step, the F2-F3 onset (opening transition) step and the gap duration for the i-th stimulus

  • CC * Xclose = diphone-tuned closing (VC) transition

– Expansion: a_1j X_1i for j = 1 to 6; Restriction Σ_j(a_1j) = 0

  • CC * Xopen = diphone-tuned opening (CV) transition

– Expansion: a_2j X_2i for j = 1 to 6; Restriction Σ_j(a_2j) = 0

  • CC * Xdur = diphone-tuned gap duration term

– Expansion: a_3j X_3i for j = 1 to 6; Restriction Σ_j(a_3j) = 0
– Thus each set of terms involves 6 coefficients with 5 df each

  • Total df in model = 4 x (6 - 1) = 20
  • Evaluation function for category j on stimulus i: f(i, j) = b_j + a_1j X_1i + a_2j X_2i + a_3j X_3i
  • Predicted probability: P(i, j) = exp(f(i, j)) / Σ_k exp(f(i, k)); k = 1:6
  • Maximum likelihood parameter estimation assuming multinomial error distribution to fit individual subjects. Maximum quasi-likelihood for pooled data.
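The evaluation function and predicted probabilities can be sketched directly from the expansion above, assuming the usual multinomial logistic form in which scores are exponentiated before normalizing. All coefficient values are hypothetical; they only mimic the reported pattern (dental closers and openers high, singletons low and geminates high on duration) and satisfy the sum-to-zero restrictions.

```python
import math

CATEGORIES = ("d", "dd", "db", "b", "bb", "bd")

# Hypothetical coefficients, one per category, each set summing to zero.
B = [6.0, -6.0, 0.0, 6.0, -6.0, 0.0]       # bias terms b_j
A1 = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]     # Xclose: dental closers d, dd, db high
A2 = [1.0, 1.0, -1.0, -1.0, -1.0, 1.0]     # Xopen: dental openers d, dd, bd high
A3 = [-0.6, 0.5, 0.1, -0.6, 0.5, 0.1]      # Xdur: singleton < cluster < geminate


def predict(x_close, x_open, gap_ms):
    """Return P(i, j) for one stimulus; gap duration is coded as sqrt(ms)."""
    x_dur = math.sqrt(gap_ms)
    scores = [B[j] + A1[j] * x_close + A2[j] * x_open + A3[j] * x_dur
              for j in range(len(CATEGORIES))]
    top = max(scores)                       # stabilize the exponentials
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modal_category(x_close, x_open, gap_ms):
    """Category with the highest predicted probability."""
    p = predict(x_close, x_open, gap_ms)
    return CATEGORIES[max(range(len(p)), key=p.__getitem__)]


print(modal_category(6, 6, 80), modal_category(6, 6, 300))
```

Fitting the coefficients by maximum likelihood is not shown; this only evaluates the model for given coefficients.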
slide-40
SLIDE 40

Statistics on Baseline models I

  • Basic method: data fit to individual subjects’ response patterns; multiple comparisons via paired-difference t-tests; Fisher LSD and Sidak criteria for multiple comparisons (MC) applied.

– Fisher: declare significance at pt <= .05 test-wise criterion, where pt is the nominal alpha level for a two-tailed t-test with 12 df (based on 13 subjects).
– Sidak: declare significance using ps = 1 - (1 - pt)^k, where k = 15 pairwise comparisons of the 6 coefficients shown in boxplots

  • Results of Sidak tests: “Perfect MC clustering” on phonetic classes for Xclose, Xopen and Xdur. Significant differences between classes, non-significant within classes.
  • Results of LSD tests: same as above for Xclose and Xdur. For Xopen, the tests show /b/ < /bd/ and /b/ < /bb/
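The Sidak adjustment and the count of k = 15 pairwise comparisons can be checked with a short sketch. The function names are illustrative; the t-tests themselves are not reproduced here.

```python
from itertools import combinations

CATEGORIES = ("d", "dd", "db", "b", "bb", "bd")

# All unordered pairs of the 6 coefficients: 6 * 5 / 2 = 15 comparisons.
PAIRS = list(combinations(CATEGORIES, 2))


def sidak_adjusted(p_t, k):
    """Family-wise p-value ps = 1 - (1 - pt)^k for a test-wise p-value pt."""
    return 1.0 - (1.0 - p_t) ** k


def sidak_testwise_alpha(alpha, k):
    """Test-wise level that keeps the family-wise level at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


k = len(PAIRS)
print(k)
print(sidak_adjusted(0.05, k))        # family-wise rate if each test uses .05
print(sidak_testwise_alpha(0.05, k))  # stricter per-test level under Sidak
```

The first printed rate shows why the uncorrected LSD criterion is more liberal than the Sidak criterion across 15 comparisons.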

slide-41
SLIDE 41

Phoneme parts in context

  • In V_V contexts, /b/ shows up as >b $ B< pattern
  • But in other contexts, not everything shows up

– 1) $ B< in env. {#, C}___

  • Hold and release cues only

– 2) >b $ in env. ___{#, C}

  • Closing and hold cues only
  • Terminal symbols used to construct pseudo-network
  • Closures: { [<b], [<d] }: b and d closure tuned by closing F2 F3 transitions
  • Hold: [$]: hold element tuned by duration of gap
  • Pause: [#]: pause element between words tuned by duration of gap
  • Releases: { [B>], [D>] }: b and d release tuned by opening F2 F3 transitions

slide-42
SLIDE 42

Grammar for network construction

1) S → {VCV, VCCV, VC#CV}
2) V → [a]
3) C → {/b/, /d/}
4) # → [#]
5) /b/ → one of:
   [>b] [$] [B<] / V_V
   [>b] ($) / __(#)C
   [$] [B<] / C(#)__
6) /d/ → one of:
   [>d] [$] [D<] / V_V
   [>d] ($) / __(#)C
   [$] [D<] / C(#)__

Note: [x] indicates x is a terminal symbol and an element in the network
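The context-sensitive rules for /b/ and /d/ can be turned into a small expansion routine. This is a sketch only: the input strings such as "ab#ba", the function name, and the choice to always include the optional hold ($) before a following consonant are assumptions. It uses the [>b] and [B<] spelling of the grammar rules.

```python
def compile_network(string):
    """Expand a string like 'abda' or 'ab#ba' into its terminal symbols."""
    symbols = []
    for i, ch in enumerate(string):
        if ch == "a":
            symbols.append("[a]")
        elif ch == "#":
            symbols.append("[#]")
        elif ch in "bd":
            # Look past any pause marker to find the neighbouring segments.
            before = string[:i].replace("#", "")
            after = string[i + 1:].replace("#", "")
            after_vowel = before.endswith("a")
            before_vowel = after.startswith("a")
            closing, hold, opening = f"[>{ch}]", "[$]", f"[{ch.upper()}<]"
            if after_vowel and before_vowel:
                symbols += [closing, hold, opening]   # V_V: all three phases
            elif after_vowel:
                symbols += [closing, hold]            # __(#)C: closing and hold
            elif before_vowel:
                symbols += [hold, opening]            # C(#)__: hold and release
            else:
                raise ValueError(f"no rule for {ch!r} at position {i}")
        else:
            raise ValueError(f"unknown symbol: {ch!r}")
    return symbols


for s in ("ada", "abda", "ab#ba"):
    print(s, " ".join(compile_network(s)))
```

The three expansions have the same shape as the compiled networks shown for /ada/, /abda/ and /ab#ba/.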