CAPT Xie Yanlu Beijing Language and Culture University Outline - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Landmark in Chinese CAPT

Xie Yanlu

Beijing Language and Culture University

slide-2
SLIDE 2

Outline

  • English landmark
  • Methods to select Chinese landmark
  • Experiments in Chinese CAPT
  • Discussion

slide-3
SLIDE 3

2016/2/11


slide-4
SLIDE 4

Objective in using computer aided pronunciation training (CAPT)

  • Basic fact: a learner's erroneous sound always deviates a little from the canonical sound.

Lip: spread (pinyin “e”) vs. rounding (pinyin “o”)
Rounding sound: “e{o}”
Spreading sound: “o{w}”

Mispronunciation detection is a typical distinctive feature selection problem

slide-5
SLIDE 5

Quantal nonlinearities

  • High-slope nonlinearities are natural category boundaries (Stevens, 1989)

[Figure: acoustics plotted against articulation, with stable regions marked]

Natural categories are robust to noise and variation, so languages tend to choose natural boundaries as their distinctive features.

slide-6
SLIDE 6

Nonlinear Map from Acoustic Features to Perceptual Features (Kuhl 1992)

slide-7
SLIDE 7

Consonant Confusions at -6dB SNR

Columns: P T K F TH S SH B D G V DH Z ZH M N
Each row lists its nonzero counts in column order; empty cells are omitted.

P    80 43 64 17 14 6 2 1 1 1 1 2
T    71 84 55 5 9 3 8 1 1 1
K    66 76 107 12 8 9 4 1 1
F    18 12 9 175 48 11 1 7 2 1 2 2
TH   19 17 16 104 64 32 7 5 4 5 6 4 5
S    8 5 4 23 39 107 45 4 2 3 1 1 3 2 1
SH   1 6 3 4 6 29 195 3 1
B    1 5 4 4 136 10 9 47 16 6 1 5 4
D    8 5 80 45 11 20 20 26 1
G    2 3 63 66 3 19 37 56 3
V    2 2 48 5 5 145 45 12 4
DH   6 31 6 17 86 58 21 5 6 4
Z    1 1 17 20 27 16 28 94 44 1
ZH   1 26 18 3 8 45 129 2
M    1 4 4 1 3 177 46
N    4 1 5 2 7 1 6 47 163

Distinctive Features: ±nasal, ±voiced, ±fricative, ±strident

slide-8
SLIDE 8

Pronunciation Erroneous Tendency (PET) Confusions in CAPT

PET            Diacritics   E.g.     Notation
Spreading      w            u{w}     Round sound “u” has a spreading lip
Backing        -            n{-}     The tongue position of the phoneme is a little back
Shortening     ;            p{;}     The aspiration duration of phoneme p is shorter
Laminalizing   sh           sh{sh}   Blade-palatal phoneme sh is pronounced as the Japanese lamino-alveolar

PET types: raising, lowering, advancing, backing, lengthening, shortening, centralizing, rounding, spreading, labiodentalizing, laminalizing, devoicing, voicing, insertion, deletion, stopping, fricativizing, lateral, nasalizing, flapping

slide-9
SLIDE 9

Confusions in CAPT

Laminalizing
  sh: sh、x
  zh: zh、z、j
  ch: ch、q、q6、en
  x: sh
  j: x、sh

Backing
  an: an、ang、e
  v: v、j
  ang: ang
  ing: ing

Spreading
  u: u、iu、q6
  f: f
  eng: eng、ang

Shortening
  q: q、j、i|sh|、zh|sh|
  k: k、g
  r: r
  uo: uo

slide-10
SLIDE 10

Phonetic landmark

A phonetic landmark is an instantaneous speech event that

  • is perceptually salient (“salient” = easy to detect), and
  • has high information density about the message the speaker wishes to communicate.

slide-11
SLIDE 11

Landmarks are Redundant

Stevens, 1999: To recognize a stop consonant, it is necessary and sufficient to hear any one of these:

  • Release into vowel
  • Closure from vowel
  • “Ejective” burst

… three “acoustic landmarks” with very different spectral patterns.

slide-12
SLIDE 12

Landmark locations

Four different candidate landmark locations:

  • the temporal midpoint of the vowel
  • the boundary between the vowel and the consonant
  • the middle of the consonant
  • the boundary between the consonant and its following segment

slide-13
SLIDE 13

English Landmark

  1) For all vowel-type phones (labels usually start with the letters a, e, i, o, u, for example [ih], [ae], etc.) => find the middle of the interval = (start time + end time)/2 and put a V landmark
  2) For all glide-type phones ([h], [w], [y], [r], [l]) => find the middle of the interval and put a G landmark
  3) For all nasal-type phones ([m], [n], [ng]) => at the start time, put the Nc landmark, and at the end time, put the Nr landmark
  4) For all stop-closure phones ([b-cl], [d-cl], etc.) => at the start time, put the Sc landmark
  5) For all stop-type phones ([b], [d], etc.) => at the start time, put the Sr landmark
  6) For all fricative-type phones ([v], [dh], [z], etc.) => at the start time, put the Fc landmark, and at the end time, put the Fr landmark
  7) For all affricate-type phones ([jh] or [dj], [ch]) => at the start time, put the Sr landmark and also the Fc landmark, and at the end time, put the Fr landmark

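The seven English landmark rules on this slide can be sketched as a small labelling function. This is only an illustration: the function names and the phone inventories are assumptions (the slide gives examples followed by "etc."), so the sets below would need to be adapted to the label set actually used.

```python
from typing import List, Tuple

# Assumed phone inventories: the slide lists only examples ("etc."),
# so these sets are illustrative, not the author's full label set.
GLIDES = {"h", "w", "y", "r", "l"}
NASALS = {"m", "n", "ng"}
STOPS = {"b", "d", "g", "p", "t", "k"}
FRICATIVES = {"v", "dh", "z", "zh", "f", "th", "s", "sh"}
AFFRICATES = {"jh", "dj", "ch"}
VOWEL_INITIALS = ("a", "e", "i", "o", "u")


def landmarks_for_phone(label: str, start: float, end: float) -> List[Tuple[float, str]]:
    """Return (time, landmark) pairs for one labelled phone interval."""
    mid = (start + end) / 2
    if label in GLIDES:                      # rule 2: G at the midpoint
        return [(mid, "G")]
    if label in NASALS:                      # rule 3: Nc at start, Nr at end
        return [(start, "Nc"), (end, "Nr")]
    if label.endswith("-cl"):                # rule 4: stop closure
        return [(start, "Sc")]
    if label in STOPS:                       # rule 5: stop release
        return [(start, "Sr")]
    if label in FRICATIVES:                  # rule 6: Fc at start, Fr at end
        return [(start, "Fc"), (end, "Fr")]
    if label in AFFRICATES:                  # rule 7: Sr and Fc at start, Fr at end
        return [(start, "Sr"), (start, "Fc"), (end, "Fr")]
    if label.startswith(VOWEL_INITIALS):     # rule 1: V at the midpoint
        return [(mid, "V")]
    return []                                # unknown label: no landmark


def landmarks_for_utterance(segments) -> List[Tuple[float, str]]:
    """Apply the rules to a list of (label, start, end) segments."""
    out: List[Tuple[float, str]] = []
    for label, start, end in segments:
        out.extend(landmarks_for_phone(label, start, end))
    return out
```

The vowel test comes last so that a consonant label is never mistaken for a vowel by its first letter.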

slide-14
SLIDE 14

How to find Chinese landmark

  • Refer to English Landmark in IPA
  • Perception
  • Observation
  • Intuition/Guess?

slide-15
SLIDE 15

How to find Chinese landmark

  • English landmark in CAPT
  • IPA projection
  • Chinese landmark in CAPT
  • Nasal: an/ang, en/eng, in/ing
  • Dorsal: j q x k / z c s
  • Vowel: v u eng r uo
  • Zh/ch

[Table: phones sh, zh, ch, x, j, an, v, ang, ing, u, f, eng, q, k, r, uo]

slide-16
SLIDE 16

How to find Chinese landmark: perception of modified speech

Segments: V = pure vowel, T = nasalized vowel, N = nasal consonant

  • IV+t-N: the nasal consonant is cut and the nasalized vowel is exchanged
  • IV-T+N: the nasalized vowel is cut
  • IV-T+n: the nasalized vowel is cut and the nasal consonant is exchanged

[Figure: segment diagrams for the three conditions]

slide-17
SLIDE 17

/ban/ vs /bang/

Original syllables: ban1 = V1 T1 N1; bang1 = V2 T2 N2

Condition   Stimulus   Segments   Manipulation
IV+t-N      Revised1   V1 T2      nasal consonant is cut and nasalized vowel is exchanged
            Revised2   V2 T1
IV-T+N      Revised3   V1 N1      nasalized vowel is cut
            Revised4   V2 N2
IV-T+n      Revised5   V1 N2      nasalized vowel is cut and nasal consonant is exchanged
            Revised6   V2 N1
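The construction of the six revised stimuli can be sketched symbolically. The function name `make_stimuli` and the dictionary layout are hypothetical, and reading "I" as the syllable-initial consonant is an assumption; the segments are plain labels here, standing in for the audio pieces that would actually be spliced.

```python
from typing import Dict, List

# One syllable as labelled segments. Keys: "I" (assumed: initial consonant),
# "V" (pure vowel), "T" (nasalized vowel), "N" (nasal consonant).
Syllable = Dict[str, str]


def make_stimuli(a: Syllable, b: Syllable) -> Dict[str, List[List[str]]]:
    """Build the segment sequences for the three manipulation conditions."""
    return {
        # nasal consonant cut, nasalized vowel exchanged between the syllables
        "IV+t-N": [[a["I"], a["V"], b["T"]], [b["I"], b["V"], a["T"]]],
        # nasalized vowel cut, own nasal consonant kept
        "IV-T+N": [[a["I"], a["V"], a["N"]], [b["I"], b["V"], b["N"]]],
        # nasalized vowel cut, nasal consonant exchanged between the syllables
        "IV-T+n": [[a["I"], a["V"], b["N"]], [b["I"], b["V"], a["N"]]],
    }


ban = {"I": "b", "V": "V1", "T": "T1", "N": "N1"}
bang = {"I": "b", "V": "V2", "T": "T2", "N": "N2"}
stimuli = make_stimuli(ban, bang)
```

Each condition yields two stimuli, matching Revised1 to Revised6 in order.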

slide-18
SLIDE 18

The nasalized vowels play a dominant role in perception.

slide-19
SLIDE 19

How to find Chinese landmark

  • Dorsal

slide-20
SLIDE 20

Following vowel landmark

  • T and VOT (Wu 1989)
  • Coarticulation (Öhman 1966)
  • Initial C, first V, T and P all start at the syllable onset (Xu 2006)
  • We cannot explain the result for the dorsals
  • Due to the landmark?
  • Or due to the coarticulation?

slide-21
SLIDE 21

English Landmark & Chinese Landmark

slide-22
SLIDE 22

System validation

Text                           301 utterances
#speakers                      7 females
#utterances                    1899
#phonemes                      26431
Average length per utterance   14
#kinds of specific PETs        65

  • F1 score
  • True positive rate (TPR)
  • False positive rate (FPR)
  • Receiver Operating Characteristic (ROC): a metric that formulates the relationship between the true positive rate (TPR) and the false positive rate (FPR)
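As a reminder of how these metrics relate, here is a minimal sketch that computes TPR, FPR and F1 from detection counts and traces ROC points from scores. The function names and the convention that label 1 means "mispronounced" are assumptions for illustration; the slides do not describe the actual evaluation code.

```python
from typing import List, Sequence, Tuple


def detection_rates(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (TPR, FPR, F1) from confusion counts; 0.0 where a ratio is undefined."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return tpr, fpr, f1


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores.

    labels: 1 = positive (assumed: mispronounced), 0 = negative.
    A sample is predicted positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives if negatives else 0.0,
                       tp / positives if positives else 0.0))
    return points
```

Plotting the returned points with FPR on the horizontal axis and TPR on the vertical axis gives the ROC curve.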

slide-23
SLIDE 23

Phonetic Labels

slide-24
SLIDE 24

Best acoustic cues selected for individual phones


slide-25
SLIDE 25

Landmark: onset of vowel

[Figure: Receiver Operating Characteristic (ROC) curves; panel annotations: nearly the same, Eng>Chn, Chn>Eng]

slide-26
SLIDE 26

Landmark: following vowel

[Figure panels annotated: Eng>Chn, Eng>Chn, Chn>Eng, Eng>Chn]

slide-27
SLIDE 27

Discussion

  • English landmarks, located at both the start and the end of the phone duration for most of the 16 phones, slightly outperformed Chinese landmarks, which were defined by empirical analysis of error pairs in the large-scale corpus.
  • Chinese landmarks might lose some significant information for discriminating pronunciation errors, especially for the nasal phones and fricative phones.

slide-28
SLIDE 28

Convolution Forgetting Curve Model

Xie Yanlu

Beijing Language and Culture University

slide-29
SLIDE 29

Outline

  • Introduction
  • Exponential shape forgetting curve model
  • Convolution Forgetting Curve Model
  • Experiments in cognitive learning

slide-30
SLIDE 30

The procedure of memory (Ebbinghaus, H., 1913)

  • Quantitative Description
  • Mathematical Description

Exponential function in forgetting (Wixted, J. T., et al. 1991):

$$f(t) = a_1 \exp(-a_2 t) + a_3$$

(Rubin, David C., et al. 1999):

$$f(t) = a_1 \exp(-t/T_1) + a_2 \exp(-t/T_2) + a_3$$

slide-31
SLIDE 31

Exponential shape forgetting curve model

Forgetting curve from University of Waterloo

slide-32
SLIDE 32

Procedure of convolution memory model (Baddeley, A. D. 2000)

[Diagram: Input, Central Executive, Visuo-spatial sketchpad, Phonological loop, Episodic Buffer, Long term memory, Output]

slide-33
SLIDE 33

Convolution Forgetting Curve Model

  • Long-term memory formation is the result of the interaction of the input and the central executive in working memory.
  • The relationship between stimulation (study) and memory is like the interaction of signal and system in circuit theory:

$$y(t) = \int_{-\infty}^{\infty} f(\tau)\, h(t-\tau)\, d\tau = f(t) * h(t)$$

slide-34
SLIDE 34

One time learning convolution model (OCM)

$$y(t) = \delta(t) * h(t) = h(t)$$

$$y(t) = h(t) = a_1 \exp(-a_2 t) + a_3$$

The parameters represent the learner's intrinsic personal characteristics.

slide-35
SLIDE 35

Repeated learning convolution model (RCM)

$$y(t) = \sum_{n=1}^{N} f_n(t - nT) * h(t)$$

$$y(t) = \sum_{n=1}^{N} \delta(t - T_n) * h(t) = \sum_{n=1}^{N} h(t - T_n)$$

$$y(t) = a_1 \sum_{n=1}^{N} \exp\left(-a_2 (t - T_n)\right) + N a_3$$
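The repeated-learning model is a sum of shifted copies of the one-time response h(t) = a1·exp(−a2·t) + a3, one copy per learning time T_n. The sketch below evaluates that sum directly and compares it with the closed form a1·Σ exp(−a2·(t − T_n)) + N·a3. It assumes h is causal (zero before the learning event), which the slides do not state explicitly; the function names and parameter values are illustrative only.

```python
import math
from typing import Sequence


def h(t: float, a1: float, a2: float, a3: float) -> float:
    """One-time response a1*exp(-a2*t) + a3 for t >= 0; assumed zero for t < 0."""
    return a1 * math.exp(-a2 * t) + a3 if t >= 0 else 0.0


def rcm(t: float, learn_times: Sequence[float], a1: float, a2: float, a3: float) -> float:
    """Sum of shifted responses: one impulse of learning at each time T_n."""
    return sum(h(t - tn, a1, a2, a3) for tn in learn_times)


def rcm_closed_form(t: float, learn_times: Sequence[float],
                    a1: float, a2: float, a3: float) -> float:
    """Closed form, valid once t is at or past every learning time."""
    n = len(learn_times)
    return a1 * sum(math.exp(-a2 * (t - tn)) for tn in learn_times) + n * a3


# Hypothetical parameters and learning times, chosen only for illustration.
times = [0.0, 2.0, 5.0]
direct = rcm(10.0, times, a1=0.5, a2=0.3, a3=0.1)
closed = rcm_closed_form(10.0, times, a1=0.5, a2=0.3, a3=0.1)
```

Before the last learning time the two functions differ, because the closed form also counts sessions that have not happened yet.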

slide-36
SLIDE 36

General repeated learning convolution model (GRCM)

[Figure: two plots]

$$y(t) = \sum_{n=1}^{N} \left[u(t - T_n) - u(t - T_n - \Delta)\right] * h(t) = \sum_{n=1}^{N} \left[u(t - T_n) - u(t - T_n - \Delta)\right] * \left(a_1 \exp(-a_2 t) + a_3\right)$$

$$y(t) = \begin{cases} \sum_{n=1}^{N} \left[\frac{a_1}{a_2}\left(1 - \exp(-a_2 (t - T_n))\right) + a_3 (t - T_n)\right], & t - T_n \le \Delta \\ \frac{a_1}{a_2} \sum_{n=1}^{N} \exp(-a_2 (t - T_n)) \left(\exp(a_2 \Delta) - 1\right) + N \Delta a_3, & t - T_n > \Delta \end{cases}$$

slide-37
SLIDE 37

Perceptual training

[Diagram: Day 1, Day 2, Day 7; pre-test, mid-test, post-test; adaptive training; high variability training; synthesized F0 continuity samples; Mandarin perception pattern; single syllable database]

slide-38
SLIDE 38

Experiments in cognitive learning

The test materials are the same 60 or 20 natural words, spoken by a native speaker. Learners must judge the tone of each word within 5 minutes.

The probability of recall for the experiments of 60 trials

Learner\Day   1       3       7
1             0.60    0.75    0.80
3             0.87    0.75    0.92
4             0.68    0.85    0.85
5             0.85    0.93    0.95
6             0.93    0.98    0.98
8             0.97    0.95    0.92
12            0.97    1       0.98
13            0.87    0.98    0.98
Avg           0.843   0.899   0.941

The probability of recall for the experiments of 20 trials

Learner\Day   1      2      3      4      5      6
1             0.75   0.6    0.75   0.95   0.9    0.8
2             0.9    0.9    1      1      1      0.95
3             0.95   0.65   0.95   0.9    1      1
4             0.65   0.85   0.9    0.85   0.9    0.9
5             0.85   0.95   0.75   0.65   0.85   1
6             0.8    1      0.95   1      1      1
7             0.9    0.95   0.75   0.85   0.85   0.95
8             0.85   0.8    0.95   1      0.95   0.95
9             0.9    0.95   0.85   0.85   0.9    0.9
10            0.85   0.9    1      0.95   0.95   1
11            0.9    0.9    0.85   0.9    0.9    0.9
Avg           0.85   0.86   0.88   0.9    0.93   0.94

slide-39
SLIDE 39

Experiments in cognitive learning

The calculated a1, a2, a3 with 60 trials tests result (averaged)

formula   MSE     MSE of day 1 and 3   MSE of day 7   r2      a1     a2     a3
1         0.001   0.001                0.003          1       0.05   0.13   0.14
2         0.004                                       0.984   (parameter values 0.28, 0.14; their columns are not recoverable)

[Figure: The forgetting curves of convolution model (average); x-axis: half day; y-axis: probability of recall; series: formula (1), formula (2), data for days 1, 3 and 7]

slide-40
SLIDE 40

Experiments in cognitive learning

The MSE and r2 with 20 trials tests result

Train day             formula   MSE of all day   MSE of train day   MSE of predict day   r2
1                     1         0.023            -                  0.028                0.563
                      2         0.018            -                  0.022                0.906
1 and 2               1         0.018            -                  0.027                0.570
                      2         0.015            -                  0.022                0.923
1, 2 and 3            1         0.003            0.002              0.006                0.992
                      2         0.007            0.003              0.014                0.931
1, 2, 3 and 4         1         0.003            0.002              0.008                0.992
                      2         0.004            0.002              0.010                0.966
1, 2, 3, 4 and 5      1         0.003            0.003              0.010                0.992
                      2         0.002            0.002              0.001                0.982
1, 2, 3, 4, 5 and 6   1         0.002            0.002              -                    0.977
                      2         0.002            0.002              -                    0.982

[Figure: x-axis: half day; y-axis: probability of recall; series: formula (1), formula (2), day 1 to day 6]
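For reference, the two fit-quality measures reported in these tables can be computed as in the sketch below. It assumes r2 is the coefficient of determination (1 − SS_res/SS_tot); the slides do not say which definition was used, and the function names are illustrative.

```python
from typing import Sequence


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between observed recall and model prediction."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Computing the MSE separately on the training days and on the held-out days gives the "train day" and "predict day" columns.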

slide-41
SLIDE 41

Discussion

  • Improving the traditional forgetting curve model.
  • With few memory data, the individual's forgetting curve can be drawn.
  • Providing a certain basis to design better teaching methods.
  • Some factors that affect the phonetic teaching performance can be analyzed.

slide-42
SLIDE 42

Thank you for your attention! Any questions?