Acoustic Modeling for Speech Recognition
References:
- 1. X. Huang et al. Spoken Language Processing. Chapter 8
- 2. S. Young. The HTK Book (HTK Version 3.2)
Berlin Chen, 2004
Introduction
SP 2004 - Berlin Chen 2
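For the given acoustic observation $X = x_1, x_2, \ldots, x_n$, find the word sequence $W = w_1, w_2, \ldots, w_m$ with the maximum posterior probability:
$$\hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \frac{P(W)\,P(X \mid W)}{P(X)} = \arg\max_{W} P(W)\,P(X \mid W)$$
where $W = w_1, w_2, \ldots, w_i, \ldots, w_m$ and $w_i \in V: \{v_1, v_2, \ldots, v_N\}$
Language Modeling, $P(W)$: domain, topic, style, etc.
Acoustic Modeling, $P(X \mid W)$: speaker, pronunciation, environment, context, etc.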
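As a rough illustration of how the language-modeling and acoustic-modeling scores combine, the sketch below picks the word sequence with the highest combined score. The candidate sentences, their probabilities, and the `decode` helper are hypothetical and made up for this example. A real decoder searches a far larger hypothesis space.

```python
import math

def decode(candidates, lm_prob, am_prob):
    """Return the W that maximizes log P(W) + log P(X|W).

    P(X) is identical for every candidate, so it is dropped from the argmax.
    """
    return max(candidates, key=lambda w: math.log(lm_prob[w]) + math.log(am_prob[w]))

# Hypothetical scores for a single utterance X (illustrative numbers only).
candidates = [("recognize", "speech"), ("wreck", "a", "nice", "beach")]
lm_prob = {candidates[0]: 1e-2, candidates[1]: 1e-4}  # P(W), language model
am_prob = {candidates[0]: 2e-3, candidates[1]: 3e-3}  # P(X|W), acoustic model

best = decode(candidates, lm_prob, am_prob)
print(" ".join(best))
```

In this toy case the second candidate has the slightly better acoustic score, but the language model outweighs it, so the first candidate wins.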
Time Domain
Frequency Domain Modeling the cepstral feature vectors
$$b_j(v_k) = \sum_{m=1}^{M} c_{jm}\, p(\mathbf{o}_t = v_k \mid m, s_t = j), \qquad \sum_{m=1}^{M} c_{jm} = 1$$
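$$b_j(\mathbf{o}_t) = \sum_{m=1}^{M} c_{jm}\, b_{jm}(\mathbf{o}_t) = \sum_{m=1}^{M} c_{jm}\, N(\mathbf{o}_t; \boldsymbol{\mu}_{jm}, \boldsymbol{\Sigma}_{jm}) = \sum_{m=1}^{M} c_{jm} \left( \frac{1}{(2\pi)^{L/2}\, |\boldsymbol{\Sigma}_{jm}|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{o}_t - \boldsymbol{\mu}_{jm})^{T} \boldsymbol{\Sigma}_{jm}^{-1} (\mathbf{o}_t - \boldsymbol{\mu}_{jm}) \right) \right)$$
$$\sum_{m=1}^{M} c_{jm} = 1$$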
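A minimal sketch of evaluating a Gaussian-mixture state output density, under the assumption of diagonal covariance matrices (the slide does not say whether the covariances are full or diagonal). The function names and the toy parameters are hypothetical. The sum over mixture components is done with log-sum-exp to avoid underflow, and all mixture weights are assumed to be strictly positive.

```python
import math

def log_gaussian_diag(o, mean, var):
    """log N(o; mean, diag(var)) for an L-dimensional vector o."""
    L = len(o)
    log_det = sum(math.log(v) for v in var)
    quad = sum((x - m) ** 2 / v for x, m, v in zip(o, mean, var))
    return -0.5 * (L * math.log(2.0 * math.pi) + log_det + quad)

def log_mixture_density(o, weights, means, variances):
    """log b_j(o) = log sum_m c_jm N(o; mu_jm, Sigma_jm), via log-sum-exp."""
    terms = [math.log(c) + log_gaussian_diag(o, mu, var)
             for c, mu, var in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Toy state with M = 2 identical components in L = 2 dimensions.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [0.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
log_b = log_mixture_density([0.0, 0.0], weights, means, variances)
print(log_b)
```

Because the two toy components coincide, the mixture reduces to a single standard Gaussian evaluated at its mean, so `log_b` equals `-log(2*pi)`.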
$$b_j(\mathbf{o}) = \sum_{k=1}^{K} b_j(k)\, f(\mathbf{o} \mid v_k) = \sum_{k=1}^{K} b_j(k)\, N(\mathbf{o}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$
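$$b_j(\mathbf{o}) = \prod_{m=1}^{M} \sum_{k=1}^{K} b_{jm}(k)\, f(\mathbf{o}_m \mid v_{m,k}) = \prod_{m=1}^{M} \sum_{k=1}^{K} b_{jm}(k)\, N(\mathbf{o}_m; \boldsymbol{\mu}_{m,k}, \boldsymbol{\Sigma}_{m,k})$$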
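The idea of a codebook of K Gaussians shared by all states, with only the weights being state-specific, can be sketched as follows. This is a one-dimensional, single-codebook toy. The codebook parameters, the state names, and the weights are hypothetical values chosen for illustration.

```python
import math

def gaussian_1d(o, mean, var):
    """Univariate Gaussian density N(o; mean, var)."""
    return math.exp(-0.5 * (o - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Shared codebook of K = 2 Gaussians as (mean, variance) pairs (hypothetical).
codebook = [(0.0, 1.0), (3.0, 1.0)]

# State-specific weights b_j(k); each row sums to 1 (hypothetical).
state_weights = {"s1": [1.0, 0.0], "s2": [0.3, 0.7]}

def state_output_densities(o):
    """Evaluate the shared Gaussians once, then reuse them for every state."""
    shared = [gaussian_1d(o, mu, var) for mu, var in codebook]
    return {j: sum(b * f for b, f in zip(w, shared))
            for j, w in state_weights.items()}

out = state_output_densities(0.0)
print(out)
```

The Gaussian evaluations are done once per observation and shared by every state, which is the main computational appeal of tying the densities this way.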
subword
- Syllables (1,345)
- Base-syllables (408)
- INITIALs (21)
- FINALs (37)
- Phone-like Units/Phones (33)
- Tones (4+1)
- Robustness Enhancement
- Speaker-independency
- Speaker-adaptation
- Speaker-dependency
- Context-Dependent Acoustic Modeling
- Pronunciation Variation
door)
Pause or intonation information is needed.
The effect is more important in fast speech, since many phonemes are not fully realized.
Statistics of the speaking rates
collected in Taiwan
Allophones: the different realizations of a phoneme are called allophones
→ Triphones are examples of allophones
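To make the triphone idea concrete, here is a small sketch that expands a phone sequence into context-dependent labels. The `l-c+r` label format follows the HTK naming convention. The helper name `to_triphones` and the example phone sequence are hypothetical, and the sketch only handles word-internal contexts, so the first and last phones come out as biphones.

```python
def to_triphones(phones):
    """Expand a phone sequence into l-c+r labels (word-internal contexts only)."""
    labels = []
    for i, centre in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + centre + right)
    return labels

print(to_triphones(["k", "ae", "t"]))
```

Each label is a distinct model of the same centre phone in a particular context, which is why the number of models grows quickly and parameter sharing becomes necessary.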
In this example, the tree can be applied to the second state of any /k/ triphone
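A minimal sketch of how such a tree could map any /k/ triphone to a tied second state by asking questions about the left and right context. The phone classes, the questions, the tree shape, and the tied-state names are all hypothetical. They are not the questions shown on the slide.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    # Internal nodes hold a yes/no question about the (left, right) context.
    question: Optional[Callable[[str, str], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    # Leaves hold the name of the tied (shared) state.
    tied_state: Optional[str] = None

# Hypothetical phone classes used by the questions.
NASALS = {"m", "n", "ng"}
VOWELS = {"aa", "ae", "iy", "uw"}

# Hypothetical tree for the second state of /k/ triphones.
tree = Node(
    question=lambda left, right: left in NASALS,
    yes=Node(tied_state="k_s2_a"),
    no=Node(
        question=lambda left, right: right in VOWELS,
        yes=Node(tied_state="k_s2_b"),
        no=Node(tied_state="k_s2_c"),
    ),
)

def tied_state_for(left, right, root=tree):
    """Walk from the root to a leaf by answering each context question."""
    node = root
    while node.tied_state is None:
        node = node.yes if node.question(left, right) else node.no
    return node.tied_state

print(tied_state_for("n", "iy"))
print(tied_state_for("s", "l"))
```

Because every context reaches some leaf, a triphone that never occurred in the training data still receives a state.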
model-based clustering state-based clustering
ㄊㄧㄢ ㄐㄧㄣ ㄐㄧㄢ
from Ming-yi Tsai
from Ming-yi Tsai
io (ㄧㄛ, e.g., for 唷) was ignored here
1 2 3 4
T: tall, t: medium-tall, M: medium, s: medium-short, S: short
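$$\sum_{i} P(\omega_i \mid t) = 1$$
where $P(\omega_i \mid t)$ is the probability of $\omega_i$ at node $t$ and $P(t)$ is the probability of node $t$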
i i i t t t
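$$\Delta \bar{H}_t(q) = \bar{H}_t - (\bar{H}_l + \bar{H}_r), \qquad q^{*} = \arg\max_{q} \Delta \bar{H}_t(q)$$
where $\bar{H}$ denotes the entropy of a node weighted by the node probability, and $l$, $r$ are the left and right child nodes of $t$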
$$H(X) = -\sum_{i} P(x_i) \log_2 P(x_i)$$
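$$\bar{H}_l = H_l \cdot P(\text{Node}_l) = 0 \cdot 1/4 = 0, \qquad \bar{H}_r = H_r \cdot P(\text{Node}_r) = 1.6 \cdot 3/4 = 1.2, \qquad \bar{H} = \bar{H}_l + \bar{H}_r = 1.2$$
$$\bar{H}_l = H_l \cdot P(\text{Node}_l) = 1 \cdot 1/2 = 1/2, \qquad \bar{H}_r = H_r \cdot P(\text{Node}_r) = 1 \cdot 1/2 = 1/2, \qquad \bar{H} = \bar{H}_l + \bar{H}_r = 1.0$$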
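The comparison of the two candidate splits (total weighted entropy 1.2 versus 1.0) can be reproduced with a short sketch. The class labels below are hypothetical: four samples with four distinct labels, chosen so that the node sizes are 1/4 and 3/4 for the first split and 1/2 and 1/2 for the second. With exact arithmetic the first split gives about 1.19 rather than the rounded 1.2.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of the empirical label distribution at a node."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def weighted_split_entropy(left, right):
    """Sum of each child's entropy multiplied by its node probability."""
    n = len(left) + len(right)
    return entropy(left) * len(left) / n + entropy(right) * len(right) / n

# Hypothetical data: one label per sample.
split_1 = (["a"], ["b", "c", "d"])       # node sizes 1/4 and 3/4
split_2 = (["a", "b"], ["c", "d"])       # node sizes 1/2 and 1/2

h1 = weighted_split_entropy(*split_1)
h2 = weighted_split_entropy(*split_2)
best = min([split_1, split_2], key=lambda s: weighted_split_entropy(*s))
print(h1, h2)
```

The split with the lower total weighted entropy, here the second one, gives the larger entropy reduction and would be preferred.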
terminal is t t
1
2
$X_1 \sim N(\mu_1, \Sigma_1)$
$X_2 \sim N(\mu_2, \Sigma_2)$
2 1
2 2 2 2 2 2 1 1 1 1 1 1
x x
2 1 2 2 2 1 1 1
t
X
$a$, $b$ are the sample counts for $X_1$ and $X_2$
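The formulas on this slide did not survive extraction, so the following is only a sketch of one common way to use the sample counts a and b: pool two Gaussians by moment matching and measure how much log-likelihood is gained by keeping them separate. It is one-dimensional, assumes maximum-likelihood variances, and all function names are hypothetical. It should not be read as the slide's exact derivation.

```python
import math

def pooled_gaussian(a, mu1, var1, b, mu2, var2):
    """Mean and variance of the single Gaussian fitted to both sample sets."""
    n = a + b
    mu = (a * mu1 + b * mu2) / n
    var = (a * (var1 + (mu1 - mu) ** 2) + b * (var2 + (mu2 - mu) ** 2)) / n
    return mu, var

def ml_log_likelihood(n, var):
    """Log-likelihood of n samples under their own ML-fitted 1-D Gaussian."""
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def split_gain(a, mu1, var1, b, mu2, var2):
    """Log-likelihood gained by modeling X1 and X2 separately, not pooled."""
    _, var = pooled_gaussian(a, mu1, var1, b, mu2, var2)
    separate = ml_log_likelihood(a, var1) + ml_log_likelihood(b, var2)
    return separate - ml_log_likelihood(a + b, var)

print(pooled_gaussian(1, 0.0, 1.0, 1, 2.0, 1.0))
print(split_gain(1, 0.0, 1.0, 1, 2.0, 1.0))
```

A gain near zero suggests the two sets can share one Gaussian, while a large gain suggests keeping them apart.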