 
Landmark in Chinese CAPT
Xie Yanlu, Beijing Language and Culture University
Outline
• English landmark
• Methods to select Chinese landmarks
• Experiments in Chinese CAPT
• Discussion
2016/2/11 3
Objective of using computer-aided pronunciation training (CAPT)
• Basic fact: a learner's erroneous sound deviates only slightly from the canonical sound.
• Example (lip shape: rounded vs. spread; Pinyin "e" vs. "o"):
  - "e" produced with rounding: e{o}
  - "o" produced with spreading: o{w}
• Mispronunciation detection is therefore a typical distinctive-feature selection problem.
Quantal nonlinearities
• High-slope nonlinearities are natural category boundaries (Stevens, 1989).
[Figure: acoustics as a function of articulation, with two stable regions separated by a high-slope region]
• Natural category = robustness to noise and variation; therefore languages tend to choose natural boundaries as their distinctive features.
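To make the idea of a high-slope boundary concrete, the sketch below uses a logistic curve as an assumed stand-in for an articulation-to-acoustics mapping (it is not Stevens' data, and `x0` and `k` are illustrative values). The numerical slope is small in the two stable regions and peaks at the boundary between them.

```python
import numpy as np

# Hypothetical articulation-to-acoustics mapping: a logistic curve whose
# centre (x0) and steepness (k) are illustrative assumptions.
def acoustic_output(articulation, x0=0.5, k=20.0):
    return 1.0 / (1.0 + np.exp(-k * (articulation - x0)))

x = np.linspace(0.0, 1.0, 101)            # articulatory parameter, arbitrary units
y = acoustic_output(x)
slope = np.gradient(y, x)                 # dy/dx estimated numerically

boundary = x[np.argmax(slope)]            # high-slope point = category boundary
stable = x[slope < 0.1 * slope.max()]     # low-slope points = stable regions

print(f"boundary near x = {boundary:.2f}")
print(f"stable below x = {stable[stable < boundary].max():.2f} "
      f"and above x = {stable[stable > boundary].min():.2f}")
```

Small articulatory variation inside a stable region barely changes the acoustic output, which is why such regions make robust categories.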
Nonlinear Map from Acoustic Features to Perceptual Features (Kuhl 1992)
Consonant confusions at -6 dB SNR
[Table: confusion counts among the consonants P, T, K, F, TH, S, SH, B, D, G, V, DH, Z, ZH, M, N]
Distinctive features: ±nasal, ±voiced, ±fricative, ±strident
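One way to read such a matrix in terms of the distinctive features listed above is to pool the counts by a binary feature and check how often that feature alone is preserved. The sketch below does this for ±voiced on a small 4×4 matrix whose counts are invented for illustration (they are not taken from the table); it assumes rows are the spoken consonant and columns the perceived one, and `collapse_by_feature` is a hypothetical helper.

```python
import numpy as np

# Hypothetical confusion counts (rows: spoken, columns: perceived).
labels = ["P", "T", "B", "D"]
counts = np.array([
    [50, 30,  5,  3],
    [28, 55,  2,  6],
    [ 4,  3, 60, 25],
    [ 2,  5, 22, 58],
])
voiced = {"P": 0, "T": 0, "B": 1, "D": 1}   # feature value per consonant


def collapse_by_feature(counts, labels, feature):
    """Pool an N x N confusion matrix into a 2 x 2 matrix for a binary feature."""
    values = [feature[lab] for lab in labels]
    pooled = np.zeros((2, 2), dtype=counts.dtype)
    for i, fi in enumerate(values):
        for j, fj in enumerate(values):
            pooled[fi, fj] += counts[i, j]
    return pooled


pooled = collapse_by_feature(counts, labels, voiced)
print(pooled)
print("voicing preserved:", pooled.trace() / pooled.sum())
```

In this toy matrix most errors stay within the same voicing class, so the pooled matrix is strongly diagonal even though place of articulation is often confused.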
Pronunciation Erroneous Tendency (PET) confusions in CAPT
• PET diacritics: raising, lowering, advancing, backing, centralizing, rounding, spreading, lengthening, shortening, labiodentalizing, laminalizing, devoicing, voicing, insertion, deletion, stopping, fricativizing, lateral, nasalizing, flapping
• Examples (PET, diacritic, notation):
  - Spreading, w, u{w}: the round sound "u" is produced with spread lips.
  - Backing, -, n{-}: the tongue position of the phoneme is slightly too far back.
  - Shortening, ;, p{;}: the aspiration duration of phoneme p is too short.
  - Laminalizing, sh, sh{sh}: the blade-palatal phoneme sh is pronounced as the Japanese lamina-alveolar fricative.
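The `phone{diacritic}` notation is regular enough to parse mechanically. The sketch below splits a label into its canonical phone and PET diacritic and looks up the PET name. The `PET_DIACRITICS` table holds only the diacritics shown in the examples (with `o` for rounding taken from the e{o} example), and `parse_pet_label` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Diacritic -> PET name, limited to the examples on the slides.
PET_DIACRITICS = {
    "w": "spreading",
    "o": "rounding",
    "-": "backing",
    ";": "shortening",
    "sh": "laminalizing",
}

# A phone label optionally followed by one diacritic in braces, e.g. "u{w}".
LABEL_RE = re.compile(r"^(?P<phone>[^{}\s]+)(?:\{(?P<mark>[^{}]+)\})?$")


def parse_pet_label(label: str) -> Tuple[str, Optional[str]]:
    """Return (canonical phone, PET name or None) for a label like 'p{;}'."""
    m = LABEL_RE.match(label.strip())
    if m is None:
        raise ValueError(f"unrecognised label: {label!r}")
    mark = m.group("mark")
    if mark is None:
        return m.group("phone"), None          # canonical pronunciation
    mark = mark.strip()
    if mark not in PET_DIACRITICS:
        raise ValueError(f"unknown diacritic: {mark!r}")
    return m.group("phone"), PET_DIACRITICS[mark]


for lab in ["u{w}", "n{-}", "p{;}", "sh{sh}", "e{o}", "a"]:
    print(lab, "->", parse_pet_label(lab))
```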
Confusions in CAPT: PETs and diacritics
[Table: canonical Pinyin phones (sh, zh, ch, x, j, q, an, ang, en, eng, ing, v, u, f, k, r, uo) and the phones they are confused with, grouped by PET (laminalizing, backing, spreading, shortening)]
Phonetic landmark
• A phonetic landmark is an instantaneous speech event that is perceptually salient ("salient" = easy to detect) and that has high information density about the message the speaker wishes to communicate.
Landmarks are redundant (Stevens, 1999)
To recognize a stop consonant, it is necessary and sufficient to hear any one of these:
• Release into vowel
• Closure from vowel
• "Ejective" burst
… three "acoustic landmarks" with very different spectral patterns.
Landmark locations
• Four candidate landmark locations:
  - the temporal midpoint of the vowel
  - the boundary between the vowel and the consonant
  - the middle of the consonant
  - the boundary between the consonant and its following segment
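Given the boundaries of a vowel followed by a consonant, the four candidate locations reduce to simple arithmetic on segment times. A minimal sketch, assuming the consonant starts where the vowel ends (`v_end`) and ends at `c_end`, its boundary with the following segment; `candidate_landmarks` is a hypothetical helper.

```python
def candidate_landmarks(v_start: float, v_end: float, c_end: float) -> dict:
    """Four candidate landmark times for a vowel followed by a consonant.

    Assumes the consonant spans [v_end, c_end], so v_end is the vowel/consonant
    boundary and c_end is the boundary with the following segment.
    """
    if not v_start <= v_end <= c_end:
        raise ValueError("expected v_start <= v_end <= c_end")
    return {
        "vowel_midpoint": (v_start + v_end) / 2,
        "vowel_consonant_boundary": v_end,
        "consonant_midpoint": (v_end + c_end) / 2,
        "consonant_following_boundary": c_end,
    }


# Times in milliseconds (illustrative values).
print(candidate_landmarks(100, 260, 340))
```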
English landmarks
1) For all vowel-type phones (usually labels that start with the letters a, e, i, o, u, e.g. [ih], [ae]) => find the middle of the interval, (start time + end time)/2, and put a V landmark there.
2) For all glide-type phones ([h], [w], [y], [r], [l]) => find the middle of the interval and put a G landmark there.
3) For all nasal-type phones ([m], [n], [ng]) => put an Nc landmark at the start time and an Nr landmark at the end time.
4) For all stop-closure phones ([b-cl], [d-cl], etc.) => put an Sc landmark at the start time.
5) For all stop-type phones ([b], [d], etc.) => put an Sr landmark at the start time.
6) For all fricative-type phones ([v], [dh], [z], etc.) => put an Fc landmark at the start time and an Fr landmark at the end time.
7) For all affricate-type phones ([jh] or [dj], [ch]) => put both an Sr landmark and an Fc landmark at the start time, and an Fr landmark at the end time.
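The seven rules above translate directly into a lookup over time-aligned phone labels. A minimal sketch: where the rules say "etc.", the full stop and fricative sets below are assumptions, labels outside every class (e.g. silence) produce no landmark, and `english_landmarks` is a hypothetical name.

```python
from typing import List, Tuple

# Phone classes from the rules; STOPS and FRICATIVES are assumed completions of "etc.".
GLIDES = {"h", "w", "y", "r", "l"}
NASALS = {"m", "n", "ng"}
STOPS = {"b", "d", "g", "p", "t", "k"}                      # assumed
FRICATIVES = {"f", "th", "s", "sh", "v", "dh", "z", "zh"}    # assumed
AFFRICATES = {"jh", "dj", "ch"}

Phone = Tuple[str, float, float]      # (label, start, end)
Landmark = Tuple[float, str]          # (time, landmark type)


def english_landmarks(phones: List[Phone]) -> List[Landmark]:
    """Apply rules 1-7 to a time-aligned phone sequence."""
    out: List[Landmark] = []
    for label, start, end in phones:
        if label.endswith("-cl"):                     # rule 4: stop closure
            out.append((start, "Sc"))
        elif label in AFFRICATES:                     # rule 7
            out += [(start, "Sr"), (start, "Fc"), (end, "Fr")]
        elif label in STOPS:                          # rule 5
            out.append((start, "Sr"))
        elif label in FRICATIVES:                     # rule 6
            out += [(start, "Fc"), (end, "Fr")]
        elif label in NASALS:                         # rule 3
            out += [(start, "Nc"), (end, "Nr")]
        elif label in GLIDES:                         # rule 2
            out.append(((start + end) / 2, "G"))
        elif label.startswith(("a", "e", "i", "o", "u")):  # rule 1: vowels
            out.append(((start + end) / 2, "V"))
    return out


# "sit" with times in milliseconds (illustrative alignment).
print(english_landmarks([("s", 0, 100), ("ih", 100, 200), ("t-cl", 200, 250), ("t", 250, 300)]))
```

Closures are checked first so that a label such as `t-cl` is never mistaken for the stop release `t`.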
How to find Chinese landmarks
• Refer to English landmarks in IPA
• Perception
• Observation
• Intuition/guess?
How to find Chinese landmarks
• English landmarks in CAPT
• IPA projection
• Chinese landmarks in CAPT:
  - Nasal: an/ang, en/eng, in/ing
  - Dorsal: j q x k/z c s
  - Vowel: v u
  - Zh/ch
[Figure: confusion pairs among sh, zh, ch, x, j, q, k, f, r, an, ang, eng, ing, u, v, uo]
How to find Chinese landmarks: perception of modified speech
Segments: pure vowel (I V), nasalized vowel (T), nasal consonant (N); a prime (') marks a segment exchanged from the other syllable of the pair.
• IV+t-N (I V T'): the nasal consonant is cut and the nasalized vowel is exchanged.
• IV-T+N (I V N): the nasalized vowel is cut.
• IV-T+n (I V N'): the nasalized vowel is cut and the nasal consonant is exchanged.
/ban/ vs. /bang/
[Figure: ban1 (V1 T1 N1) and bang1 (V2 T2 N2) with six revised stimuli (Revised1 to Revised6): V1 T2 and V2 T1 (IV+t-N); V1 N1 and V2 N2 (IV-T+N); V1 N2 and V2 N1 (IV-T+n)]
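The three manipulations amount to cutting and concatenating segments of a minimal pair. A minimal sketch with NumPy, assuming each syllable has already been segmented into a pure-vowel part (`IV`), a nasalized-vowel part (`T`) and a nasal-consonant part (`N`) stored as 1-D sample arrays. The dictionary keys and `make_stimuli` are hypothetical, and no cross-fading is applied at the joins.

```python
import numpy as np


def make_stimuli(a: dict, b: dict) -> dict:
    """Build the modified versions of syllable `a`, borrowing segments from `b`.

    a, b: {"IV": samples, "T": samples, "N": samples}, e.g. ban1 and bang1.
    """
    return {
        # nasal consonant cut, nasalized vowel taken from the other syllable
        "IV+t-N": np.concatenate([a["IV"], b["T"]]),
        # nasalized vowel cut, own nasal consonant kept
        "IV-T+N": np.concatenate([a["IV"], a["N"]]),
        # nasalized vowel cut, nasal consonant taken from the other syllable
        "IV-T+n": np.concatenate([a["IV"], b["N"]]),
    }


# Toy "waveforms": constant arrays make the origin of each segment visible.
ban1 = {"IV": np.full(3, 1.0), "T": np.full(2, 2.0), "N": np.full(2, 3.0)}
bang1 = {"IV": np.full(3, 4.0), "T": np.full(2, 5.0), "N": np.full(2, 6.0)}
revised = make_stimuli(ban1, bang1)       # V1 T2, V1 N1, V1 N2
revised_rev = make_stimuli(bang1, ban1)   # V2 T1, V2 N2, V2 N1
print(revised["IV+t-N"])
```

With real recordings, a short fade at each splice point would usually be added to avoid audible clicks.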
The nasalized vowels play a dominant role in perception.
How to find Chinese landmarks: dorsal
Following-vowel landmark
• T and VOT (Wu, 1989)
• Coarticulation (Öhman, 1966)
• Initial C, first V, T and P all start at the syllable onset (Xu, 2006)
• We cannot explain the result for the dorsal group:
  - Is it due to the landmark?
  - Or is it due to coarticulation?
English landmarks & Chinese landmarks
System validation
Corpus:
• Text utterances: 301
• #speakers: 7 (female)
• #utterances: 1899
• #phonemes: 26431
• Average length per utterance: 14
• #kinds of specific PETs: 65
Metrics:
• F1 score
• True positive rate (TPR)
• False positive rate (FPR)
• Receiver Operating Characteristic (ROC): a metric that formulates the relationship between true positive rate (TPR) and false positive rate (FPR).
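For reference, the listed metrics can be computed from per-phone detection decisions as sketched below. The sketch treats a mispronounced phone as the positive class, sweeps a threshold over detector scores to trace the ROC curve, and integrates it with the trapezoidal rule. How ROC is averaged across phones or PETs is not specified here, so no averaging is shown; the labels, scores and function names are hypothetical.

```python
from typing import List, Tuple


def rates(labels: List[int], preds: List[int]) -> Tuple[float, float, float]:
    """Return (TPR, FPR, F1); label/pred 1 = mispronounced (positive)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0          # recall
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return tpr, fpr, f1


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve traced by thresholding the scores."""
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tpr, fpr, _ = rates(labels, [int(s >= thr) for s in scores])
        points.append((fpr, tpr))
    points.append((1.0, 1.0))
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]                 # hypothetical ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]     # hypothetical detector scores
print(rates(labels, [int(s >= 0.5) for s in scores]))
print(roc_auc(labels, scores))
```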
Phonetic Labels
Best acoustic cues selected for individual phones
Landmark: onset of vowel
[Figure: Receiver Operating Characteristic (ROC) comparisons of English vs. Chinese landmarks, grouped as "Nearly the same", "Eng>Chn" and "Chn>Eng"]
Landmark: following vowel
[Figure: comparison panels of English vs. Chinese landmarks, three labelled Eng>Chn and one labelled Chn>Eng]
Discussion
• English landmarks, located at both the start and the end of the segments for most of the 16 phones, slightly outperformed the Chinese landmarks, which were defined by empirical analysis of error pairs in the large-scale corpus.
• Chinese landmarks might lose some information that is significant for discriminating pronunciation errors, especially for nasal and fricative phones.
Convolution Forgetting Curve Model
Xie Yanlu, Beijing Language and Culture University
Outline
• Introduction
• Exponential-shape forgetting curve model
• Convolution forgetting curve model
• Experiments in cognitive learning
The procedure of memory (Ebbinghaus, 1913): quantitative description
Mathematical description:
• Exponential function in forgetting (Wixted et al., 1991):
$$f(t) = a_1 \exp(-a_2 t) + a_3$$
• (Rubin et al., 1999):
$$f(t) = a_1 \exp(-t/T_1) + a_2 \exp(-t/T_2) + a_3$$
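To illustrate how the single-exponential curve f(t) = a_1 exp(-a_2 t) + a_3 can be fitted to retention data, the sketch below uses a simple approach chosen here for self-containment, not necessarily the one used in the cited studies: for each candidate decay rate a_2 on a grid, a_1 and a_3 follow from linear least squares, and the rate with the smallest residual is kept. The data are synthetic, generated from hypothetical parameters, so the fit can be checked; `fit_single_exponential` is a hypothetical name.

```python
import numpy as np


def fit_single_exponential(t, y, rates=np.linspace(0.01, 2.0, 200)):
    """Fit y ~ a1*exp(-a2*t) + a3 by grid search over a2 plus linear least squares."""
    best = None
    for a2 in rates:
        design = np.column_stack([np.exp(-a2 * t), np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = np.sum((design @ coef - y) ** 2)
        if best is None or sse < best[0]:
            best = (sse, coef[0], a2, coef[1])
    _, a1, a2, a3 = best
    return a1, a2, a3


# Synthetic retention data (hypothetical parameters, time in days, no noise).
t = np.array([0.0, 1, 2, 4, 7, 14, 30])
y = 0.6 * np.exp(-0.3 * t) + 0.3
a1, a2, a3 = fit_single_exponential(t, y)
print(f"a1={a1:.3f}, a2={a2:.3f}, a3={a3:.3f}")
```

The same idea extends to the two-exponential form by searching over (T_1, T_2) pairs and solving for a_1, a_2, a_3 linearly.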
Exponential-shape forgetting curve model
[Figure: forgetting curve, from the University of Waterloo]
Procedure of the convolution memory model (Baddeley, 2000)
[Figure: input, central executive, visuo-spatial sketchpad, episodic buffer, phonological loop, long-term memory, output]
Convolution Forgetting Curve Model
• Long-term memory formation is the result of the interaction between the input and the central executive in working memory.
• Considering the relationship between stimulation (study) and memory, this resembles the interaction of a signal and a system in circuit theory, where f(t) is the stimulation, h(t) the memory system and y(t) the resulting memory:
$$y(t) = \int f(\tau)\, h(t - \tau)\, d\tau = f(t) * h(t)$$
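Numerically, the convolution integral can be approximated on a uniform time grid: sample f and h with step dt, convolve, and scale by dt (a Riemann sum). The sketch is illustrative only. It assumes a rectangular study input of unit intensity lasting one time unit and borrows the exponential form h(t) = a_1 exp(-a_2 t) + a_3 for t ≥ 0, with hypothetical parameter values.

```python
import numpy as np

dt = 0.01                                   # time step (arbitrary units)
t = np.arange(0.0, 10.0, dt)

# Hypothetical memory "impulse response" h(t) = a1*exp(-a2*t) + a3, t >= 0.
a1, a2, a3 = 0.6, 0.5, 0.2
h = a1 * np.exp(-a2 * t) + a3

# Hypothetical study input f(t): unit intensity for 0 <= t < 1.
f = np.where(t < 1.0, 1.0, 0.0)

# y(t) = integral of f(tau) * h(t - tau) dtau, approximated by a Riemann sum.
y = np.convolve(f, h)[: len(t)] * dt

print(f"y(1) = {y[int(round(1.0 / dt))]:.3f}, y(5) = {y[int(round(5.0 / dt))]:.3f}")
```

For this input the exact result at t ≥ 1 is a_3 + (a_1/a_2)[exp(-a_2(t-1)) - exp(-a_2 t)], which the discrete approximation matches closely for small dt.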
One-time learning convolution model (OCM)
$$y(t) = \delta(t) * h(t) = h(t)$$
$$y(t) = h(t) = a_1 \exp(-a_2 t) + a_3$$
• The parameters represent the learner's personal intrinsic characteristics.
Repeated learning convolution model (RCM)
$$f(t) = \sum_{n=1}^{N} \delta(t - T_n)$$
$$y(t) = f(t) * h(t) = \sum_{n=1}^{N} \delta(t - T_n) * h(t) = \sum_{n=1}^{N} h(t - T_n)$$
$$y(t) = \sum_{n=1}^{N} a_1 \exp[-a_2 (t - T_n)] + N a_3, \quad t \ge T_N$$
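The RCM closed form can be evaluated directly and checked against a numerical convolution. The sketch assumes hypothetical learner parameters and review times T_n, and treats h(t) as zero for t < 0, so each review contributes only after it happens; this is why the N a_3 term holds once t ≥ T_N. A binary-exact step (dt = 0.25) keeps the impulse positions exact on the grid.

```python
import numpy as np

# Hypothetical learner parameters and review times (e.g. days).
a1, a2, a3 = 0.6, 0.5, 0.2
review_times = [0.0, 2.0, 5.0]


def h(t):
    """h(t) = a1*exp(-a2*t) + a3 for t >= 0, and 0 before learning."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0, a1 * np.exp(-a2 * np.clip(t, 0, None)) + a3, 0.0)


def rcm_closed_form(t):
    """y(t) = sum_n h(t - T_n); equals sum_n a1*exp(-a2*(t - T_n)) + N*a3 for t >= T_N."""
    return sum(h(t - tn) for tn in review_times)


# Numerical check: convolve a discrete impulse train f(t) with sampled h(t).
dt = 0.25
t = np.arange(0.0, 10.0, dt)
f = np.zeros_like(t)
for tn in review_times:
    f[int(round(tn / dt))] = 1.0 / dt       # discrete approximation of delta(t - T_n)
y_conv = np.convolve(f, h(t))[: len(t)] * dt

print(np.max(np.abs(y_conv - rcm_closed_form(t))))
```

Each review adds a fresh copy of the learner's response h, so retention after the last review sits above what a single study session would give.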
Recommend
More recommend