Landmark in Chinese CAPT
Xie Yanlu
Beijing Language and Culture University
Outline
English landmark
Methods to select Chinese landmark
Experiments in Chinese CAPT
Discussion
2016/2/11
Basic fact: learner's erroneous sound always deviates a little from the canonical sound.
Lip rounding: Pinyin "e" produced with lip rounding, notated e{o}
Lip spreading: the round sound Pinyin "o" produced with spread lips, notated o{w}
[Figure: articulation vs. acoustics, showing stable region I]
Natural category = robustness to noise and variation, therefore languages tend to choose natural boundaries as their distinctive features.
Consonant confusion matrix (counts; blank cells were lost, so the entries in a row are not column-aligned)
Columns: P T K F TH S SH B D G V DH Z ZH M N
P: 80 43 64 17 14 6 2 1 1 1 1 2
T: 71 84 55 5 9 3 8 1 1 1
K: 66 76 107 12 8 9 4 1 1
F: 18 12 9 175 48 11 1 7 2 1 2 2
TH: 19 17 16 104 64 32 7 5 4 5 6 4 5
S: 8 5 4 23 39 107 45 4 2 3 1 1 3 2 1
SH: 1 6 3 4 6 29 195 3 1
B: 1 5 4 4 136 10 9 47 16 6 1 5 4
D: 8 5 80 45 11 20 20 26 1
G: 2 3 63 66 3 19 37 56 3
V: 2 2 48 5 5 145 45 12 4
DH: 6 31 6 17 86 58 21 5 6 4
Z: 1 1 17 20 27 16 28 94 44 1
ZH: 1 26 18 3 8 45 129 2
M: 1 4 4 1 3 177 46
N: 4 1 5 2 7 1 6 47 163
Distinctive Features: ±nasal, ±voiced, ±fricative, ±strident
PET | Diacritic | Example | Explanation
Spreading | w | u{w} | The round sound "u" is produced with spread lips
Backing | | | The tongue position is further back
Shortening | ; | p{;} | The aspiration duration of phoneme p is shorter
Laminalizing | sh | sh{sh} | The blade-palatal phoneme sh is pronounced as the Japanese lamino-alveolar
PET types: raising, lowering, advancing, backing, lengthening, shortening, centralizing, rounding, spreading, labiodentalizing, laminalizing, devoicing, voicing, insertion, deletion, stopping, fricativizing, lateral, nasalizing, flapping
PET diacritics

Laminalizing
sh: sh, x
zh: zh, z, j
ch: ch, q, q6, en
x: sh
j: x, sh

Backing
an: an, ang, e
v: v, j
ang: ang
ing: ing

Spreading
u: u, iu, q6
f: f
eng: eng, ang

Shortening
q: q, j, i|sh|, zh|sh|
k: k, g
r: r
uo: uo
Landmark: a point in the speech signal that is perceptually salient ("salient" = easy to detect) and that has high information density about the message the speaker wishes to communicate.
Stevens, 1999: To recognize a stop consonant, it is necessary and sufficient to hear any one of these:
… three “acoustic landmarks” with very different spectral patterns. “backed”
the temporal midpoint of the vowel
the boundary between the vowel and the consonant
the middle of the consonant
the boundary between the consonant and its following segment
1) For all vowel-type phones (labels usually start with the letters a, e, i, o, u, for example [ih], [ae]): find the middle of the interval, (start time + end time)/2, and put a V landmark.
2) For all glide-type phones ([h], [w], [y], [r], [l]): find the middle of the interval and put a G landmark.
3) For all nasal-type phones ([m], [n], [ng]): put the Nc landmark at the start time and the Nr landmark at the end time.
4) For all stop-closure phones ([b-cl], [d-cl], etc.): put the Sc landmark at the start time.
5) For all stop-type phones ([b], [d], etc.): put the Sr landmark at the start time.
6) For all fricative-type phones ([v], [dh], [z], etc.): put the Fc landmark at the start time and the Fr landmark at the end time.
7) For all affricate-type phones ([jh] or [dj], [ch]): put both the Sr landmark and the Fc landmark at the start time, and the Fr landmark at the end time.
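The seven rules above can be sketched as a small labelling function. This is a minimal illustration rather than the authors' implementation: the name `place_landmarks`, the `(label, start, end)` input format, and the exact membership of the stop and fricative sets (the slide lists only a few examples followed by "etc.") are assumptions.

```python
# Phone classes. GLIDES, NASALS and AFFRICATES follow the slide; the STOPS and
# FRICATIVES sets are assumed completions of the slide's "etc.".
GLIDES = {"h", "w", "y", "r", "l"}
NASALS = {"m", "n", "ng"}
AFFRICATES = {"jh", "dj", "ch"}
STOPS = {"p", "t", "k", "b", "d", "g"}
FRICATIVES = {"f", "v", "th", "dh", "s", "z", "sh", "zh"}


def place_landmarks(phones):
    """Map aligned phones to landmarks.

    phones: list of (label, start, end) tuples from a forced alignment.
    Returns a list of (time, landmark) tuples in input order.
    """
    marks = []
    for label, start, end in phones:
        mid = (start + end) / 2
        if label[:1] in ("a", "e", "i", "o", "u"):   # rule 1: vowels
            marks.append((mid, "V"))
        elif label in GLIDES:                        # rule 2: glides
            marks.append((mid, "G"))
        elif label in NASALS:                        # rule 3: nasals
            marks += [(start, "Nc"), (end, "Nr")]
        elif label.endswith("-cl"):                  # rule 4: stop closures
            marks.append((start, "Sc"))
        elif label in STOPS:                         # rule 5: stop releases
            marks.append((start, "Sr"))
        elif label in FRICATIVES:                    # rule 6: fricatives
            marks += [(start, "Fc"), (end, "Fr")]
        elif label in AFFRICATES:                    # rule 7: affricates
            marks += [(start, "Sr"), (start, "Fc"), (end, "Fr")]
    return marks


# Times in milliseconds for a [b-cl] [b] [ae] [n] sequence.
example = place_landmarks(
    [("b-cl", 0, 50), ("b", 50, 80), ("ae", 80, 200), ("n", 200, 300)]
)
```

Phones that match none of the classes are skipped silently in this sketch; a real pipeline would probably want to report them.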
Refer to English landmarks in IPA
Perception
Observation
Intuition/Guess?
IPA projection
Nasal: an/ang, en/eng, in/ing
Dorsal: j q x k/z c s
Vowel: v u eng r uo
Zh/ch
Phones: sh, zh, ch, x, j, an, v, ang, ing, u, f, eng, q, k, r, uo
Stimulus conditions (V: pure vowel, T: nasalized vowel, N: nasal consonant):
IV+t-N: the nasal consonant is cut and the nasalized vowel is exchanged
IV-T+N: the nasalized vowel is cut
IV-T+n: the nasalized vowel is cut and the nasal consonant is exchanged
Source syllables: ban1 = V1 T1 N1, bang1 = V2 T2 N2

IV+t-N (nasal consonant is cut and nasalized vowel is exchanged): Revised1 = V1 T2, Revised2 = V2 T1
IV-T+N (nasalized vowel is cut): Revised3 = V1 N1, Revised4 = V2 N2
IV-T+n (nasalized vowel is cut and nasal consonant is exchanged): Revised5 = V1 N2, Revised6 = V2 N1
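The six revised stimuli are recombinations of the segments of the two source syllables (Revised1 = V1 T2, Revised2 = V2 T1, Revised3 = V1 N1, Revised4 = V2 N2, Revised5 = V1 N2, Revised6 = V2 N1). A minimal sketch of that bookkeeping, assuming each syllable has already been segmented into pure vowel V, nasalized vowel T and nasal consonant N; the name `revised_stimuli` and the use of strings as stand-ins for audio segments are illustrative assumptions, and the initial is left out as on the slide.

```python
def revised_stimuli(first, second):
    """Recombine two segmented syllables into the six revised stimuli.

    first, second: (V, T, N) triples for the two source syllables.
    Each stimulus is returned as the ordered list of segments to concatenate.
    """
    v1, t1, n1 = first
    v2, t2, n2 = second
    return {
        # IV+t-N: nasal consonant cut, nasalized vowel exchanged
        "Revised1": [v1, t2],
        "Revised2": [v2, t1],
        # IV-T+N: nasalized vowel cut, original nasal consonant kept
        "Revised3": [v1, n1],
        "Revised4": [v2, n2],
        # IV-T+n: nasalized vowel cut, nasal consonant exchanged
        "Revised5": [v1, n2],
        "Revised6": [v2, n1],
    }


ban1 = ("V1", "T1", "N1")    # placeholders for the segments of ban1
bang1 = ("V2", "T2", "N2")   # placeholders for the segments of bang1
stimuli = revised_stimuli(ban1, bang1)
```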
T and VOT (Wu 1989)
Coarticulation (Öhman 1966)
Initial C, first V, T and P all start at the syllable onset (Xu 2006)
We cannot explain the result for the dorsals: is it due to the landmark, or due to coarticulation?
Text: 301 utterances
#speakers: 7 females
#utterances: 1899
#phonemes: 26431
Average length per utterance: 14
#kinds of specific PETs: 65
Evaluation metrics: the F1 score, and the Receiver Operating Characteristic (ROC), a metric that describes the relationship between the true positive rate (TPR) and the false positive rate (FPR).
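For reference, these metrics can be computed from detection counts and detector scores as follows. This is a generic sketch using the standard definitions, not the evaluation code used in the experiments; the function names, the convention that label 1 marks a true pronunciation error, and the toy scores are assumptions.

```python
def detection_metrics(tp, fp, tn, fn):
    """TPR, FPR and F1 from confusion counts (all denominators assumed non-zero)."""
    tpr = tp / (tp + fn)          # true positive rate (recall)
    fpr = fp / (fp + tn)          # false positive rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"TPR": tpr, "FPR": fpr, "F1": f1}


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by flagging every item with score >= threshold.

    scores: detector outputs; labels: 1 for a true error, 0 otherwise.
    One point is produced per distinct score, from strictest to loosest threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Toy example with made-up scores.
curve = roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
```

Plotting the returned pairs with FPR on the x-axis and TPR on the y-axis gives the ROC curve; a curve closer to the top-left corner indicates a better detector.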
[Figure: Receiver Operating Characteristic (ROC) curves; outcomes labelled "Nearly the same", "Eng>Chn" and "Chn>Eng"]
[Figure: four panels labelled Eng>Chn, Eng>Chn, Chn>Eng, Eng>Chn]
English landmarks, located at both the start and the end of the duration for most of the 16 phones, slightly outperform the Chinese landmarks selected by the empirical analysis of error pairs in the large-scale corpus. Chinese landmarks might lose some significant information for discriminating pronunciation errors, especially for the nasal and fricative phones.
Quantitative Description Mathematical Description
$$f(t) = a_1 \exp(-a_2 t) + a_3$$
Exponential function in forgetting (Wixted, J. T., et al., 1991)
$$f(t) = a_1 \exp(-t/T_1) + a_2 \exp(-t/T_2) + a_3$$
(Rubin, D. C., et al., 1999)
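The two forgetting functions can be written directly as code. A minimal sketch, assuming both are decaying exponentials with a constant asymptote, i.e. f(t) = a1·exp(-a2·t) + a3 and f(t) = a1·exp(-t/T1) + a2·exp(-t/T2) + a3; the function names and the parameter values in the example are illustrative, not fitted values from the slides.

```python
import math


def forgetting_exp(t, a1, a2, a3):
    """Single-exponential forgetting curve: a1 * exp(-a2 * t) + a3."""
    return a1 * math.exp(-a2 * t) + a3


def forgetting_two_exp(t, a1, T1, a2, T2, a3):
    """Two-exponential forgetting curve with time constants T1 and T2."""
    return a1 * math.exp(-t / T1) + a2 * math.exp(-t / T2) + a3


# Illustrative parameters only: recall at t = 0, 1, ..., 5 time units.
single = [forgetting_exp(t, 0.5, 0.3, 0.2) for t in range(6)]
double = [forgetting_two_exp(t, 0.3, 1.0, 0.2, 10.0, 0.2) for t in range(6)]
```

In both forms recall starts at the sum of the amplitudes plus a3 and decays toward the asymptote a3; the second form separates a fast and a slow decay component.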
Forgetting curve from University of Waterloo
[Figure: working memory model with input, central executive, visuo-spatial sketchpad, phonological loop, episodic buffer, long-term memory, and output]
Long-term memory formation is the result of interaction.
The relationship between stimulation (study) and memory is analogous to the interaction of a signal and a system in circuit theory.
$$y(t) = \int f(\tau)\, h(t - \tau)\, d\tau = f(t) * h(t)$$
$$y(t) = h(t) = a_1 \exp(-a_2 t) + a_3$$
Parameters represent the personal intrinsic characteristic of the learner
$$y(t) = \sum_{n=1}^{N} f(t - nT) * h(t)$$
$$y(t) = \sum_{n=1}^{N} \delta(t - T_n) * h(t) = \sum_{n=1}^{N} h(t - T_n)$$
$$y(t) = a_1 \sum_{n=1}^{N} \exp(-a_2 (t - T_n)) + N a_3$$
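The response to N study sessions can be evaluated numerically by summing shifted copies of h(t). A minimal sketch, assuming h(t) = a1·exp(-a2·t) + a3 for t >= 0 and zero before the session (causality is an assumption made here), with each session modelled as an impulse at time T_n; the function name and the parameter values are illustrative.

```python
import math


def recall_after_sessions(t, session_times, a1, a2, a3):
    """Sum of h(t - T_n) over all sessions that have already taken place.

    h(t) = a1 * exp(-a2 * t) + a3 for t >= 0, and 0 for t < 0 (assumed causal).
    """
    total = 0.0
    for t_n in session_times:
        if t >= t_n:
            total += a1 * math.exp(-a2 * (t - t_n)) + a3
    return total


# Three sessions at t = 0, 2, 4 (arbitrary time units), evaluated at t = 6.
times = [0, 2, 4]
a1, a2, a3 = 0.1, 0.2, 0.15
y = recall_after_sessions(6, times, a1, a2, a3)

# Once all N sessions are in the past this equals
# a1 * sum(exp(-a2 * (t - T_n))) + N * a3.
closed_form = a1 * sum(math.exp(-a2 * (6 - t_n)) for t_n in times) + len(times) * a3
```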
[Equations: $y(t)$ for inputs written with unit step functions $u(t - T_n)$, convolved with $h(t) = a_1 \exp(-a_2 t) + a_3$ and summed over $n = 1, \dots, N$, followed by the resulting closed-form expression]
[Figure: experiment design with a pre-test, mid-test and post-test (Day 1, Day 2, Day 7); adaptive training and high-variability training using synthesized F0 continuity samples, the Mandarin perception pattern, and a single-syllable database]
All tests use the same natural words (60 or 20 items), spoken by a native speaker. Learners are required to judge the tones of the words within 5 minutes.
The probability of recall for the experiments of 60 trials

Learner \ Day | 1 | 3 | 7
1 | 0.60 | 0.75 | 0.80
3 | 0.87 | 0.75 | 0.92
4 | 0.68 | 0.85 | 0.85
5 | 0.85 | 0.93 | 0.95
6 | 0.93 | 0.98 | 0.98
8 | 0.97 | 0.95 | 0.92
12 | 0.97 | 1 | 0.98
13 | 0.87 | 0.98 | 0.98
Avg | 0.843 | 0.899 | 0.941

The probability of recall for the experiments of 20 trials

Learner \ Day | 1 | 2 | 3 | 4 | 5 | 6
1 | 0.75 | 0.6 | 0.75 | 0.95 | 0.9 | 0.8
2 | 0.9 | 0.9 | 1 | 1 | 1 | 0.95
3 | 0.95 | 0.65 | 0.95 | 0.9 | 1 | 1
4 | 0.65 | 0.85 | 0.9 | 0.85 | 0.9 | 0.9
5 | 0.85 | 0.95 | 0.75 | 0.65 | 0.85 | 1
6 | 0.8 | 1 | 0.95 | 1 | 1 | 1
7 | 0.9 | 0.95 | 0.75 | 0.85 | 0.85 | 0.95
8 | 0.85 | 0.8 | 0.95 | 1 | 0.95 | 0.95
9 | 0.9 | 0.95 | 0.85 | 0.85 | 0.9 | 0.9
10 | 0.85 | 0.9 | 1 | 0.95 | 0.95 | 1
11 | 0.9 | 0.9 | 0.85 | 0.9 | 0.9 | 0.9
Avg | 0.85 | 0.86 | 0.88 | 0.9 | 0.93 | 0.94
The calculated a1, a2, a3 from the 60-trial test results (averaged)

Formula | MSE | MSE of days 1 and 3 | MSE of day 7 | r2 | a1 | a2 | a3
1 | 0.001 | 0.001 | 0.003 | 1 | 0.05 | 0.13 | 0.14
2 | 0.004, 0.984, 0.28, 0.14 (blank cells were lost; values are not column-aligned)

[Figure: The forgetting curves of the convolution model (average). Probability of recall vs. time in half days; curves for formula (1) and formula (2); data for day 1, day 3 and day 7]
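Train days | Formula | MSE of all days | MSE (train days) | MSE (remaining days) | r2
1 | 1 | 0.023 | - | 0.028 | 0.563
1 | 2 | 0.018 | - | 0.022 | 0.906
1 and 2 | 1 | 0.018 | - | 0.027 | 0.570
1 and 2 | 2 | 0.015 | - | 0.022 | 0.923
1, 2 and 3 | 1 | 0.003 | 0.002 | 0.006 | 0.992
1, 2 and 3 | 2 | 0.007 | 0.003 | 0.014 | 0.931
1, 2, 3 and 4 | 1 | 0.003 | 0.002 | 0.008 | 0.992
1, 2, 3 and 4 | 2 | 0.004 | 0.002 | 0.010 | 0.966
1, 2, 3, 4 and 5 | 1 | 0.003 | 0.003 | 0.010 | 0.992
1, 2, 3, 4 and 5 | 2 | 0.002 | 0.002 | 0.001 | 0.982
1, 2, 3, 4, 5 and 6 | 1 | 0.002 | 0.002 | - | 0.977
1, 2, 3, 4, 5 and 6 | 2 | 0.002 | 0.002 | - | 0.982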
The MSE and r2 for the 20-trial test results
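The two goodness-of-fit measures can be computed as below. This is a sketch under assumptions: the slides do not define r2, so it is taken here to be the usual coefficient of determination, and the `predicted` values are hypothetical model outputs used only to exercise the functions. The `observed` values are the averaged 20-trial recall probabilities for days 1 to 6.

```python
def mse(observed, predicted):
    """Mean squared error between observed and predicted recall probabilities."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


observed = [0.85, 0.86, 0.88, 0.90, 0.93, 0.94]   # averaged recall, days 1 to 6
predicted = [0.84, 0.87, 0.88, 0.91, 0.92, 0.94]  # hypothetical model output
fit_mse = mse(observed, predicted)
fit_r2 = r_squared(observed, predicted)
```

Restricting both lists to the training days or to the remaining days gives the corresponding per-subset MSE.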
[Figure: probability of recall vs. time in half days; curves for formula (1) and formula (2); data for day 1 to day 6]
The traditional forgetting curve model is improved.
An individual's forgetting curve can be drawn from only a few memory measurements.
The model provides a basis for designing better teaching methods.
Some factors that affect phonetic teaching performance can be analyzed.