[PPT] - 4aSC43 Patterns in the perception of VC(C)V Nearey and Smits PowerPoint Presentation

SLIDE 1

Nearey & Smits: Perception of VCCV

4aSC43 Patterns in the perception of VC(C)V Nearey and Smits

SLIDE 2

Patterns in the perception of VC(C)V strings

Terrance M. Nearey

University of Alberta

Roel Smits

Max Planck Institute for Psycholinguistics

(Work supported by SSHRC & MPI)

SLIDE 3

Experiment based on Repp 1983

Our study is like Repp [1] experiment 1, with synthetic VC(C)V stimuli where C = [+stop]

– Repp ran 3 sub-experiments: 1a, 1b and 1c
– Our experiment had a smaller total gap duration range than Repp's, but a larger range than any single sub-experiment
– Our experiment had more spectral patterns than Repp's

SLIDE 4

Challenge to modeling

Our models of perception of fixed-length strings have been quite successful

– e.g. CV or /hVC/ syllables [3, 4]

Repp’s 1983 abda experiments

– Variable-length strings in one stimulus set
– VCV: aba, ada; VCCV: abda, adba, abba, adda
– Apparently very complex perceptual results [1, 2]

We take complexity as a challenge to our models

– Can we extend the simple architecture of our models to handle the variable string-length case?

SLIDE 5

/abda/ Stimulus #78 (of 144)

[Figure: stimulus #78 of 144, the stimulus with the most extreme [abda] place cues. Axis ticks run from 100 to 600 and from 1000 to 5000.]

SLIDE 6

Experiment details

Stimuli arrayed in a fully crossed design

– F2 (F3) VC offset: 1060 (2180) to 1450 (2539) Hz in 6 steps
– F2 (F3) CV onset: 1099 (2262) to 1635 (2500) Hz in 6 steps
– F2 and F3 perfectly correlated (r = +1.0) in both offsets and onsets
– Gap duration: 80, 120, 190 and 300 ms
– Total 144 = 6 x 6 x 4 stimuli
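The fully crossed design can be enumerated in a few lines. This is only a sketch: the formant endpoints and gap durations are from the panel, but the equal spacing between endpoints, the dictionary field names, and the enumeration order are assumptions and need not match the original stimulus numbering.

```python
from itertools import product

import numpy as np

N_STEPS = 6
GAP_MS = [80, 120, 190, 300]

# Endpoints come from the slide; equal (linear) spacing between them is an assumption.
f2_offset = np.linspace(1060, 1450, N_STEPS)  # VC offset F2 (Hz)
f3_offset = np.linspace(2180, 2539, N_STEPS)  # VC offset F3 (Hz)
f2_onset = np.linspace(1099, 1635, N_STEPS)   # CV onset F2 (Hz)
f3_onset = np.linspace(2262, 2500, N_STEPS)   # CV onset F3 (Hz)

# Fully crossed design: every closing step x every opening step x every gap.
# The enumeration order is hypothetical, not the original stimulus numbering.
stimuli = [
    {
        "close_step": c + 1,
        "open_step": o + 1,
        "gap_ms": gap,
        "f2_offset": f2_offset[c],
        "f3_offset": f3_offset[c],
        "f2_onset": f2_onset[o],
        "f3_onset": f3_onset[o],
    }
    for c, o, gap in product(range(N_STEPS), range(N_STEPS), GAP_MS)
]

print(len(stimuli))  # number of stimuli in the 6 x 6 x 4 design
```

Because F2 and F3 share one step index per transition, they are perfectly correlated by construction, which is why a single step number per transition suffices as a predictor.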

Subjects and responses

– 13 native speakers of Canadian English
– Each responded to 10 repetitions of each of the 144 stimuli
– Response button layout:

[b] [bb] [bd] [d] [dd] [db]

SLIDE 7

Phonetic cover terms

Canonical “duration” classes

– Singletons: /aba/ or /ada/
– Clusters (heterorganic): /abda/ or /adba/
– Geminates: /ab#ba/ or /ad#da/

Place of articulation classes (closing and opening)

– Closing place class (place of first or only stop)
  Labial closers: /b/, /bb/, /bd/; dental closers: /d/, /dd/, /db/
– Opening place class (place of second or only stop)
  Labial openers: /b/, /bb/, /db/; dental openers: /d/, /dd/, /bd/
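The cover terms above amount to three simple functions of the response label. A minimal sketch follows; the function names are illustrative, not from the original analysis.

```python
RESPONSES = ["b", "bb", "bd", "d", "dd", "db"]


def closing_place(resp):
    """Place of the first (or only) stop in the response label."""
    return "labial" if resp[0] == "b" else "dental"


def opening_place(resp):
    """Place of the second (or only) stop in the response label."""
    return "labial" if resp[-1] == "b" else "dental"


def duration_class(resp):
    """Singleton, geminate, or heterorganic cluster."""
    if len(resp) == 1:
        return "singleton"
    return "geminate" if resp[0] == resp[1] else "cluster"


for r in RESPONSES:
    print(r, closing_place(r), opening_place(r), duration_class(r))
```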

SLIDE 8

General results I: Primary duration class patterns

Duration class patterns

– Short gaps favor singletons
– Intermediate gaps favor heterorganic clusters
– Very long gaps favor geminates
– But single-place responses (singletons and geminates) are always less likely when transition patterns clash (see General Pattern III)

SLIDE 9

General Results II: Primary place patterns

Closing place class affected by VC F2 F3 offset

– Low F2 F3 offset favors labial closers /b bb bd/
– High F2 F3 offset favors dental closers /d dd db/

Opening place class affected by CV F2 F3 onset

– Low F2 F3 onset favors labial openers /b bb db/
– High F2 F3 onset favors dental openers /d dd bd/

SLIDE 10

General Pattern III: Apparent complications

Definition: clash of transition patterns

– Closing VC_ transitions near the low /_b/ end but opening _CV transitions near the high /_d/ end, or vice versa

At longer gap durations

– even a fairly small clash tends to favor clusters [bd], [db] over singletons or geminates

At short gap durations

– singletons are favored unless the clash is quite large
– singleton responses are dominated by opening _CV cues

SLIDE 11

Apparent time-dependent assimilation/dissimilation effects

Repp 1983 summarized key results along two dimensions

Time course

– Proactive (left to right): preceding VC1_ stimulus affects judgment of following _C2V
– Retroactive (right to left): following _C2V stimulus affects judgment of preceding VC1_

Phonetic cross-gap agreement of responses

– Assimilation (more one-place responses: /d, dd, b, bb/)
– Dissimilation (more two-place responses: /db, bd/)

SLIDE 12

General trends in apparent assimilation

For Repp 1983 experiment 1

– Retroactive effects larger than proactive
– Strong assimilation prevalent for shorter gap durations
– Some dissimilation present for longer gap durations
– Longest gap duration shows little effect

Our experiment

– Trends (using Repp’s measure) are similar [see notes].

SLIDE 13

Baseline logistic model

Define 6 ‘diphone’ consonant categories

– Label them ‘CC’
  CC1 = /b/, CC2 = /d/, CC3 = /bb/, CC4 = /dd/, CC5 = /bd/, CC6 = /db/
  (/d/ and /b/ are ‘degenerate’ diphones)

Three stimulus properties

1) Xclose – VC (syl. 1) closure transition step
2) Xopen – CV (syl. 2) opening transition step
3) Xdur – gap duration [note: duration coded as sqrt(ms)]

SLIDE 14

Results Baseline Model

Pretty good fit

1) RMS = 5.96% (percentage points)
   Score response rates in percentage points
   Calculate RMS error of (predicted – observed)

2) Percent modal agreement PMA = 93.75%
   Percentage of cells (stimuli) in which the observed response category with the most votes matches the response category with the highest predicted score
   Agrees for 135 of 144 stimuli
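Both fit measures are easy to state in code. The sketch below assumes predicted and observed response proportions are stored as arrays of shape (stimuli, categories), and that the RMS is averaged over all stimulus-by-category cells; the panel does not spell out the averaging, so that part is an assumption, as are the function names and the toy data.

```python
import numpy as np


def rms_error(pred, obs):
    """RMS of (predicted - observed), in percentage points.

    Inputs are response proportions; averaging over every
    stimulus-by-category cell is an assumption.
    """
    diff = 100.0 * (np.asarray(pred, float) - np.asarray(obs, float))
    return float(np.sqrt(np.mean(diff ** 2)))


def percent_modal_agreement(pred, obs):
    """Percentage of stimuli whose most frequent observed category
    is also the category with the highest predicted probability."""
    pred_mode = np.argmax(np.asarray(pred), axis=1)
    obs_mode = np.argmax(np.asarray(obs), axis=1)
    return float(100.0 * np.mean(pred_mode == obs_mode))


# Toy data (hypothetical): 2 stimuli x 2 categories.
pred = [[0.7, 0.3], [0.4, 0.6]]
obs = [[0.6, 0.4], [0.6, 0.4]]
print(rms_error(pred, obs), percent_modal_agreement(pred, obs))
```

With 144 stimuli, agreement on 135 of them gives 100 x 135 / 144 = 93.75, the PMA reported on this panel.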

SLIDE 15

Analysis of regression weights

Fit baseline model to each subject’s data separately

– Basis for simple repeated-measures comparisons

Examine means and between-subject variation in weight patterns for each cue

May give insight into how information is integrated

SLIDE 16

[Figure: boxplots of baseline-model coefficients for CC * Xclose, by response category (d, dd, db, b, bb, bd). Vertical axis: coefficient.]

Matlab boxplots. “Belt” is the median; notch width is a robust 95% confidence estimate.

SLIDE 17

Coefficients of Xclose depend only on phonetic closer class

Coefficients for Xclose [= closing (VC) F2 F3 transition patterns] show very strong clustering

– Labial closers /b, bb, bd/ show low values
– Dental closers /d, dd, db/ show high values [statistics in appendix panels]

Key fact: the phonetic nature of the opener has negligible effect on how Xclose tunes the likelihood of closer place. Only phonetic closer class matters.

SLIDE 18

[Figure: boxplots of baseline-model coefficients for CC * Xopen, by response category (d, dd, db, b, bb, bd). Vertical axis: coefficient.]

Matlab boxplots. “Belt” is the median; notch width is a robust 95% confidence estimate.

SLIDE 19

Coefficients of Xopen depend on phonetic opener class

Coefficients for Xopen [= opening (CV) F2 F3 transition patterns] show very strong clustering

– Labial openers /b, bb, db/ show low values
– Dental openers /d, dd, bd/ show high values [statistics in appendix panels]

Key fact: the phonetic nature of the closer has minimal effect on how Xopen tunes the likelihood of opener place. Only phonetic opener class matters.

SLIDE 20

[Figure: boxplots of baseline-model coefficients for CC * Xdur, by response category (d, dd, db, b, bb, bd). Vertical axis: coefficient.]

SLIDE 21

Coefficients of Xdur depend only on phonetic duration class

Coefficients for Xdur (= silent gap duration) show strong clustering

– Singletons /b, d/ show low coefficients for Xdur
– Heterorganic clusters /db, bd/ show moderately high coefficients
– Geminates /bb, dd/ show the highest coefficients

There is negligible differentiation of members within a duration class

[See appendix panels for statistics]

SLIDE 22

Summary of baseline model

Only the ‘obvious’, primary cues count

Results from the baseline model suggest extremely simple tuning of responses by ‘local’ stimulus properties

This permits construction of a more restricted model: a factored model with fewer fitted parameters

– Extension of methods of Nearey 1990

SLIDE 23

Factored model

Factoring characteristics of CC in baseline model

– Pdur: tripartite category of phonological duration classes
  1 - singleton, 2 - geminate, 3 - heterorganic cluster
– Pclose:
  1 - labial closer: [b, bb, bd]
  2 - dental closer: [d, dd, db]
– Popen:
  1 - labial opener: [b, bb, db]
  2 - dental opener: [d, dd, bd]

SLIDE 24

Example Factoring of CC * Xdur

Example: contrasts among the 6 CC categories can be factored for the different stimulus properties

– e.g. replace CC * Xdur with Pdur * Xdur
– Reduces coefficients from 6 to 3 (and df from 5 to 2)

If gap duration affects CC judgments only through duration class, then this should give a similar fit with fewer coefficients
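The factoring step can be pictured as tying the six per-category duration coefficients to three per-class values. The sketch below is illustrative only: the coefficient values are hypothetical, and the function name and category ordering are assumptions.

```python
import numpy as np

CATEGORIES = ["d", "dd", "db", "b", "bb", "bd"]
CLASSES = ["singleton", "geminate", "cluster"]
DURATION_CLASS = {
    "d": "singleton", "b": "singleton",
    "dd": "geminate", "bb": "geminate",
    "db": "cluster", "bd": "cluster",
}


def expand_pdur_coefficients(class_coefs):
    """Map 3 duration-class coefficients onto the 6 CC categories.

    The class coefficients are centered first, so the expanded set
    sums to zero (each class has exactly two members).
    """
    c = np.asarray(class_coefs, dtype=float)
    c = c - c.mean()
    return np.array([c[CLASSES.index(DURATION_CLASS[cat])] for cat in CATEGORIES])


# Hypothetical class coefficients: singleton < cluster < geminate.
a_dur = expand_pdur_coefficients([-0.4, 0.3, 0.1])
print(dict(zip(CATEGORIES, a_dur.round(3))))

# Degrees of freedom: CC * Xdur uses 6 - 1 = 5, Pdur * Xdur uses 3 - 1 = 2.
df_cc_xdur = len(CATEGORIES) - 1
df_pdur_xdur = len(CLASSES) - 1
```

Applying the same idea to Pclose and Popen (two classes each, 1 df each) while keeping 5 df of bias terms would give 5 + 2 + 1 + 1 = 9; that breakdown is an inference, though it matches the model df reported for the factored model.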

SLIDE 25

Comparative fit of Baseline and Factored models

Model      Resid. df   G2       rms    pma     Model df
Baseline   700         2403.1   5.96   93.75   20
Factored   711         2514.1   5.94   93.75   9

The better G2 fit of the larger baseline model is not reliable according to bootstrap model comparison (train on 5 Ss, test on 13, repeat random splits 200 x)

SLIDE 26

Simple logistics can lead to complex probability patterns

The simple linear relations are in the space of relative log-likelihoods

$\log p(\mathrm{CC}_j \mid X) - \log p(\mathrm{CC}_k \mid X)$ is a simple linear function of the coefficients of categories j and k and the stimulus properties X.

Where do the apparently complex assimilation and dissimilation effects come from?

When projected back to the space of raw response probabilities, they ‘fall out’ of the baseline model
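A short derivation makes the point explicit. It assumes the usual multinomial-logistic (softmax) form for the predicted probabilities, with the evaluation function $f(i,j)$ and coefficients as defined in the notation panel of the notes.

```latex
\begin{align*}
P(i,j) &= \frac{\exp f(i,j)}{\sum_{k=1}^{6} \exp f(i,k)},
\qquad f(i,j) = b_j + a_{1j}X_{1i} + a_{2j}X_{2i} + a_{3j}X_{3i} \\
\log P(i,j) - \log P(i,k) &= f(i,j) - f(i,k) \\
&= (b_j - b_k) + (a_{1j} - a_{1k})X_{1i} + (a_{2j} - a_{2k})X_{2i} + (a_{3j} - a_{3k})X_{3i}
\end{align*}
```

The shared denominator cancels in the log ratio, so the pairwise relation is linear in the stimulus properties. Each raw probability $P(i,j)$ still depends on all six evaluation functions through that denominator, which is where the nonlinear, context-dependent patterns in response proportions arise.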

SLIDE 27

Observed assimilation indices

(indices calculated per Repp 83, data from current experiment)

[Figure: observed retroactive and proactive assimilation indices at gap durations of 80, 120, 190 and 300 ms.]

SLIDE 28

Factored model predicted assimilation indices

[Figure: retroactive and proactive assimilation indices predicted by the factored model at gap durations of 80, 120, 190 and 300 ms.]

SLIDE 29

Consistent with simple factored recognition model

Not quite a simple phoneme-based model

– Need to break phoneme into parts
– Not all parts show up in all contexts

Example: in V_V, /b/ is associated with three phases

– Closing (implosion): [>b]
– Closure (hold): [$]
– Opening (explosion): [B<]
– Complete pattern: [>b $ B<]

SLIDE 30

Compiled networks

[see notes pages for more]

/ada/

[a]—[<d]—[$]—[D>]—[a]

/abda/

[a]—[<b]—[$]—[$]—[D>]—[a]

/ab#ba/

[a]—[<b]—[$]—[#]—[$]—[B>]—[a]

SLIDE 31

Exploratory work

Exploratory work with more complex models shows some may work reliably better

Dutch /apmas/ experiment: work in progress by Smits

– Plotted proportion of one versus two consonants against an offset-onset F2 F3 discontinuity measure
– Simple measure of formant offset/onset clash: Xdiscon = abs(ClosingF2F3StepNo - OpeningF2F3StepNo)

When Xdiscon is large, two distinct consonant responses are more likely
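For the 6 x 6 grid of transition steps used here, the discontinuity measure can be tabulated directly. A small sketch, with array names chosen for illustration:

```python
import numpy as np

steps = np.arange(1, 7)  # transition step numbers 1..6
close_step, open_step = np.meshgrid(steps, steps, indexing="ij")

# Xdiscon = abs(ClosingF2F3StepNo - OpeningF2F3StepNo)
xdiscon = np.abs(close_step - open_step)
print(xdiscon)
```

The table is symmetric: a low closing step paired with a high opening step gets the same value as the reverse pairing, so the measure by construction ignores the direction of the clash.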

SLIDE 32

Baseline + Transition Discontinuity

Fit improved when terms CC * Xdiscon are added to the baseline model [see notes]

Factored model with discontinuity tuning duration class (Pdur * Xdiscon) is nearly as good

– Improvement appears to be primarily due to prediction of additional /bd/ and/or /db/ responses when discontinuity is large
– Puzzling fact: the direction of the discontinuity doesn’t matter
– Further research contemplated

SLIDE 33

Conclusions

Very simple, highly interpretable model works well

Key aspects of apparently complex behavior accounted for

More complex models work somewhat better

– May be more difficult to interpret

SLIDE 34

References

[1] Repp, B. H. (1983). Bidirectional contrast effects in the perception of VC-CV sequences. Perception and Psychophysics, 33(2), 147-155.
[2] Repp, B. (1978). Perceptual integration and differentiation of spectral cues for intervocalic stop consonants. Perception and Psychophysics, 24(5), 471-485.
[3] Nearey, T. (1997). Speech perception as pattern recognition. J. Acoust. Soc. Amer., 101, 3241-3254.
[4] Smits, R. (2001). Evidence for hierarchical categorization of coarticulated phonemes. J. Exp. Psych.: HPP, 27, 111-135.
[5] Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge: Cambridge University Press.
[6] Shao, J. (1996). Bootstrap model selection. J. Amer. Statist. Assoc., 91(434), 655-665.

==========================================================

Remaining panels are notes

SLIDE 35

Details of Repp’s [1] assimilation index

Procedure at each gap duration for retroactive assimilation

– Consider only stimuli with the most extreme /d/-like _C2V pattern (step 6 of opening F2 F3)
  Calculate %D|D2 = mean percentage of dental closure responses (/d, dd, db/) over all VC1_ patterns
– Consider only stimuli with the most extreme /b/-like _C2V pattern (step 1 of opening F2 F3)
  Calculate %D|B2 = mean percentage of dental closure responses (/d, dd, db/) over all VC1_ patterns
– Define %A = (%D|D2) - (%D|B2)

Proactive assimilation index defined analogously

Graphic results of our experiment resemble those of Repp

Notes:

– Repp’s patterns change across the three sub-experiments 1a, 1b and 1c, which cover different gap duration ranges
  Our duration ranges cover parts of all of Repp’s experiments 1a, 1b and 1c
  Our results are generally compatible with a mixture of the overall trends
– Repp’s original formulation of assimilation appears to have typos, switching -/ba/ and -/da/ on p. 149
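The retroactive index described in this panel can be sketched as follows. The array layout (closing step, opening step, response category), the category ordering, and the function name are assumptions made for illustration; the toy data are hypothetical.

```python
import numpy as np

CATEGORIES = ["d", "dd", "db", "b", "bb", "bd"]
DENTAL_CLOSERS = ["d", "dd", "db"]


def retroactive_index(pct):
    """%A = (%D|D2) - (%D|B2) at one gap duration.

    pct: array of shape (n_close_steps, n_open_steps, n_categories)
    holding response percentages; steps are ordered from most
    /b/-like (index 0) to most /d/-like (last index).
    """
    pct = np.asarray(pct, dtype=float)
    idx = [CATEGORIES.index(c) for c in DENTAL_CLOSERS]
    dental_close = pct[:, :, idx].sum(axis=2)  # % dental closure responses per stimulus
    d_given_d2 = dental_close[:, -1].mean()    # most /d/-like opening, over all VC1_ steps
    d_given_b2 = dental_close[:, 0].mean()     # most /b/-like opening, over all VC1_ steps
    return float(d_given_d2 - d_given_b2)


# Toy case: responses follow the opening cue completely.
toy = np.zeros((6, 6, 6))
toy[:, :, CATEGORIES.index("b")] = 100.0
toy[:, -1, CATEGORIES.index("b")] = 0.0
toy[:, -1, CATEGORIES.index("d")] = 100.0
print(retroactive_index(toy))
```

A positive value means the closing consonant is heard as dental more often when the following opening transition is /d/-like, i.e. retroactive assimilation; a negative value would indicate dissimilation.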

SLIDE 36

Comparison of empirical assimilation indices of Repp 83 experiment 1b and current experiment

[Figure: retroactive and proactive assimilation indices at gap durations of 80, 120, 160, 190 and 300 ms. Legend: Retro-R, Proactive, Retro-ns, Pro-ns.]

SLIDE 37

Comparative fit of 4 models

Model                Error df   G2       rms    pma
Baseline             700        2403.1   5.96   93.75
Baseline + Xdiscon   695        1918     5.44   95.83
Factored + Xdiscon   707        2007.1   5.69   96.53
Factored             711        2514.1   5.94   93.75

SLIDE 38

Bootstrap comparison 4 models

RMS & PMA measures, best to worst:

1 Factored + Xdiscon; 2 Baseline + Xdiscon; 3 Factored; 4 Baseline

G2, best to worst:

1 Baseline + Xdiscon; 2 Factored + Xdiscon; 3 Factored; 4 Baseline

200 bootstrap model selection runs with 4 models

– Train on 5 samples; test on 13 samples; repeat 200 times (see [5, 6])
– Random samples of entire subject data sets with replacement (13 subjects total)
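The resampling scheme can be sketched as a generator of subject indices. Whether the training and test samples were drawn independently of each other is not stated on the panel, so independent draws are an assumption here, as are the function name and the seed; fitting and scoring the four models on each split is left out.

```python
import numpy as np


def bootstrap_splits(n_subjects=13, n_train=5, n_test=13, n_reps=200, seed=0):
    """Yield (train, test) arrays of subject indices.

    Whole subjects are sampled with replacement; train and test are
    drawn independently (an assumption, not confirmed by the source).
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_reps):
        train = rng.integers(0, n_subjects, size=n_train)
        test = rng.integers(0, n_subjects, size=n_test)
        yield train, test


splits = list(bootstrap_splits())
print(len(splits), splits[0][0].shape, splits[0][1].shape)
```

In each repetition, every candidate model would be fit to the pooled data of the training subjects and scored (RMS, PMA, G2) on the test subjects, and the models ranked by their scores across the 200 runs.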

SLIDE 39

Expansion of shorthand notation for baseline model

Bias terms: CC

– Expansion: $b_j$ for j = 1 to 6 (ranging over the 6 categories /d, dd, db, b, bb, bd/)
– Restriction: $\sum_j b_j = 0$, where $\sum_j$ is summation over all j

Stimulus-tuned terms

– Let $X_{1i}$, $X_{2i}$, $X_{3i}$ respectively represent the F2-F3 offset (closing transition) step, the F2-F3 onset (opening transition) step and the gap duration of the i-th stimulus

CC * Xclose = diphone-tuned closing (VC) transition term

– Expansion: $a_{1j} X_{1i}$ for j = 1 to 6; restriction $\sum_j a_{1j} = 0$

CC * Xopen = diphone-tuned opening (CV) transition term

– Expansion: $a_{2j} X_{2i}$ for j = 1 to 6; restriction $\sum_j a_{2j} = 0$

CC * Xdur = diphone-tuned gap duration term

– Expansion: $a_{3j} X_{3i}$ for j = 1 to 6; restriction $\sum_j a_{3j} = 0$

Thus each set of terms involves 6 coefficients with 5 df each

Total df in model = 4 x (6 - 1) = 20

Evaluation function for category j on stimulus i: $f(i, j) = b_j + a_{1j} X_{1i} + a_{2j} X_{2i} + a_{3j} X_{3i}$

Predicted probability: $P(i, j) = \exp f(i, j) / \sum_{k=1}^{6} \exp f(i, k)$

Maximum likelihood parameter estimation, assuming a multinomial error distribution, to fit individual subjects. Maximum quasi-likelihood for pooled data.
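The evaluation function and predicted probabilities can be written out for a single stimulus. This sketch assumes the standard multinomial-logistic (softmax) link, consistent with the log-linear relations stated on the "Simple logistics" panel. All coefficient values below are hypothetical, chosen only to respect the sum-to-zero restrictions and the qualitative clustering reported; the function names are illustrative.

```python
import numpy as np

CATEGORIES = ["d", "dd", "db", "b", "bb", "bd"]


def center(values):
    """Impose the sum-to-zero restriction on a coefficient set."""
    v = np.asarray(values, dtype=float)
    return v - v.mean()


# Hypothetical coefficients (not fitted values from the study).
b = center([0.2, -0.1, 0.0, 0.3, -0.2, 0.1])
a_close = center([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])  # dental closers high
a_open = center([1.0, 1.0, -1.0, -1.0, -1.0, 1.0])   # dental openers high
a_dur = center([-0.5, 0.5, 0.2, -0.5, 0.5, 0.2])     # singleton < cluster < geminate


def evaluation(x_close, x_open, gap_ms):
    """f(i, j) for all six categories; duration is coded as sqrt(ms)."""
    return b + a_close * x_close + a_open * x_open + a_dur * np.sqrt(gap_ms)


def predict_probabilities(x_close, x_open, gap_ms):
    """Softmax of the evaluation functions (assumed logistic link)."""
    f = evaluation(x_close, x_open, gap_ms)
    e = np.exp(f - f.max())  # subtract the max for numerical stability
    return e / e.sum()


p = predict_probabilities(x_close=2, x_open=5, gap_ms=190)
print(dict(zip(CATEGORIES, p.round(3))))
```

Fitting would then choose the coefficients to maximize the multinomial likelihood of the observed response counts, which is not shown here.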

SLIDE 40

Statistics on Baseline models I

Basic method. Data fit to individual subjects’ response patterns; multiple comparisons via paired-difference t-tests; Fisher LSD and Sidak criteria for multiple comparisons (MC) applied.

– Fisher: declare significance at the test-wise criterion pt <= .05, where pt is the nominal alpha level for a two-tailed t-test with 12 df (based on 13 subjects)
– Sidak: declare significance at ps <= .05, where ps = 1 - (1 - pt)^k and k = 15 pairwise comparisons of the 6 coefficients shown in the boxplots

Results of Sidak tests: “perfect MC clustering” on phonetic classes for Xclose, Xopen and Xdur. Significant differences between classes, non-significant within classes.

Results of LSD tests: same as above for Xclose and Xdur. For Xopen, shows /b/ < /bd/ and /b/ < /bb/
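The Sidak adjustment for the 15 pairwise comparisons is a one-line formula. The sketch below shows both directions: adjusting a test-wise p-value, and the equivalent test-wise threshold for a family-wise level of .05. Function names are illustrative.

```python
from itertools import combinations

CATEGORIES = ["d", "dd", "db", "b", "bb", "bd"]
pairs = list(combinations(CATEGORIES, 2))  # all pairwise comparisons
k = len(pairs)


def sidak_adjust(p_t, k):
    """Family-wise p-value: ps = 1 - (1 - pt)^k."""
    return 1.0 - (1.0 - p_t) ** k


def sidak_testwise_alpha(alpha, k):
    """Test-wise threshold that yields family-wise level alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


print(k, sidak_testwise_alpha(0.05, k))
```

With k = 15, a test-wise p-value has to fall below roughly .0034 to be declared significant under the Sidak criterion, compared with .05 under Fisher's LSD.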

SLIDE 41

Phoneme parts in context

In V_V contexts, /b/ shows up as the >b $ B< pattern

But in other contexts, not everything shows up

– 1) $ B< in env. {#, C}___ : hold and release cues only
– 2) >b $ in env. ___{#, C} : closing and hold cues only

Terminal symbols used to construct pseudo-network

– Closures: { [<b], [<d] }: b and d closure tuned by closing F2 F3 transitions
– Hold: [$]: hold element tuned by duration of gap
– Pause: [#]: pause element between words tuned by duration of gap
– Releases: { [B>], [D>] }: b and d release tuned by opening F2 F3 transitions

SLIDE 42

Grammar for network construction

1) S → {VCV, VCCV, VC#CV}
2) V → [a]
3) C → {/b/, /d/}
4) # → [#]
5) /b/ → [>b] [$] [B<] / V_V
         [>b] ($) / __(#)C
         [$] [B<] / C(#)__
6) /d/ → [>d] [$] [D<] / V_V
         [>d] ($) / __(#)C
         [$] [D<] / C(#)__

Note: [x] indicates x is a terminal symbol and an element in the network
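The grammar can be exercised with a small expansion routine that turns a phoneme string into its sequence of terminal symbols. This is a sketch under stated assumptions: the function name is hypothetical, the optional hold ($) in the pre-consonantal rule is always realized, and the bracket orientation follows this grammar panel rather than the compiled-networks panel.

```python
def realize(phonemes):
    """Expand a string such as 'ada', 'abda' or 'ab#ba' into terminals."""
    out = []
    for i, seg in enumerate(phonemes):
        prev = phonemes[i - 1] if i > 0 else ""
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else ""
        if seg == "a":
            out.append("[a]")
        elif seg == "#":
            out.append("[#]")
        elif seg in ("b", "d"):
            closing = f"[>{seg}]"
            release = f"[{seg.upper()}<]"
            if prev == "a" and nxt == "a":   # V_V: all three phases
                out += [closing, "[$]", release]
            elif prev == "a":                # __(#)C: closing and hold (hold assumed present)
                out += [closing, "[$]"]
            else:                            # C(#)__: hold and release
                out += ["[$]", release]
        else:
            raise ValueError(f"unexpected segment: {seg!r}")
    return out


for word in ["ada", "abda", "ab#ba"]:
    print(word, " ".join(realize(word)))
```

Under these assumptions, the three expansions have the same element counts and ordering as the /ada/, /abda/ and /ab#ba/ networks on the compiled-networks panel.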
