Lecture 7
LVCSR Training and Decoding (Part A)
Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen, Markus Nussbaum-Thom
Watson Group IBM T.J. Watson Research Center Yorktown Heights, New York, USA {picheny,bhuvana,stanchen,nussbaum}@us.ibm.com
[Figure: word models expanded into phone sequences. TEN: T EH N; FOUR: F AO R]
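The word-to-phone expansion for TEN and FOUR amounts to a lexicon lookup followed by concatenation. The sketch below assumes a single pronunciation per word; `LEXICON` and `words_to_phones` are hypothetical names, and only the TEN and FOUR entries come from the slide.

```python
# Hypothetical pronunciation lexicon; only TEN and FOUR are taken from the slide.
LEXICON: dict[str, list[str]] = {
    "TEN": ["T", "EH", "N"],
    "FOUR": ["F", "AO", "R"],
}


def words_to_phones(words: list[str], lexicon: dict[str, list[str]]) -> list[str]:
    """Concatenate the (single, assumed) pronunciation of each word."""
    phones: list[str] = []
    for word in words:
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(lexicon[word])
    return phones


if __name__ == "__main__":
    print(words_to_phones(["TEN", "FOUR"], LEXICON))
    # prints ['T', 'EH', 'N', 'F', 'AO', 'R']
```

The resulting phone string is what the phone-level HMMs are chained along.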
[Figure: decision tree for phone N, built from the contexts {P EY N}, {R EY N}, {| IX N}]
Is phone 2 positions to the left a vowel?
  yes: ….
  no: {P EY N}, {R EY N}, {| IX N}
    Is phone 1 position to the left a long vowel?
      yes: {P EY N}, {R EY N}
        Is phone 2 positions to the left a plosive?
          yes: {P EY N}
          no: {R EY N}
      no: {| IX N}
        Is phone 1 position to the left a boundary phone?
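Each node of the tree applies a yes/no question about the phonetic context and partitions the contexts that reach it. The sketch below reproduces two of the splits for phone N. The phone classes `PLOSIVES` and `LONG_VOWELS` are hypothetical illustrations, not the question set used in the lecture, and the tuple layout (phone 2 to the left, phone 1 to the left, center phone) is an assumption.

```python
from typing import Callable

# Context layout (assumed): (phone 2 to the left, phone 1 to the left, center phone).
# "|" marks a boundary.
Context = tuple[str, str, str]

# Hypothetical phone classes for illustration only.
PLOSIVES = {"P", "B", "T", "D", "K", "G"}
LONG_VOWELS = {"EY", "IY", "AY", "OW", "UW", "AW", "OY"}


def split(contexts: list[Context], question: Callable[[Context], bool]):
    """Partition contexts into those answering yes and those answering no."""
    yes = [c for c in contexts if question(c)]
    no = [c for c in contexts if not question(c)]
    return yes, no


def left1_is_long_vowel(c: Context) -> bool:
    return c[1] in LONG_VOWELS


def left2_is_plosive(c: Context) -> bool:
    return c[0] in PLOSIVES


contexts: list[Context] = [("P", "EY", "N"), ("R", "EY", "N"), ("|", "IX", "N")]

# First split: {P EY N}, {R EY N} versus {| IX N}.
long_vowel, not_long_vowel = split(contexts, left1_is_long_vowel)
# Second split on the "yes" branch: {P EY N} versus {R EY N}.
plosive, not_plosive = split(long_vowel, left2_is_plosive)

if __name__ == "__main__":
    print(long_vowel, not_long_vowel)
    print(plosive, not_plosive)
```

In tree building the question chosen at each node is normally the one that best improves the likelihood of the training data; here the questions are fixed by hand to match the figure.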
[Figure: HMM with output distributions P1(x) through P6(x)]
[Figure: expansion of the transcript THE DOG]
Word-level HMM states: THE1 THE2 THE3 THE4 DOG1 DOG2 DOG3 DOG4 DOG5 DOG6
Phone sequence: DH AH D AO G
Phone-level HMM states: DH1 DH2 AH1 AH2 D1 D2 AO1 AO2 G1 G2
States with decision-tree leaf indices: DH1,3 DH2,7 AH1,2 AH2,4 D1,3 D2,9 AO1,1 AO2,1 G1,2 G2,7
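The THE DOG expansion can be reproduced in a few lines: words become phones, each phone becomes two numbered states, and each state is tagged with a leaf index. In this sketch the leaf indices are copied from the example rather than computed; `LEAF_OF` is a hypothetical lookup that is only valid for this one utterance, because in general the leaf depends on the phonetic context and would come from walking that state's decision tree.

```python
# Pronunciations as shown for this example.
LEXICON = {"THE": ["DH", "AH"], "DOG": ["D", "AO", "G"]}
STATES_PER_PHONE = 2  # two states per phone, as in DH1 DH2

# Leaf index per state, copied from the example (e.g. "DH1,3").
# Hypothetical shortcut: a real system asks each state's decision-tree
# questions about its context instead of using a fixed table.
LEAF_OF = {
    "DH1": 3, "DH2": 7, "AH1": 2, "AH2": 4, "D1": 3,
    "D2": 9, "AO1": 1, "AO2": 1, "G1": 2, "G2": 7,
}


def to_states(words: list[str]) -> list[str]:
    """Expand words into phones, then each phone into numbered HMM states."""
    return [
        f"{phone}{k}"
        for word in words
        for phone in LEXICON[word]
        for k in range(1, STATES_PER_PHONE + 1)
    ]


def to_leaves(states: list[str]) -> list[str]:
    """Attach the leaf index to each state, e.g. DH1 -> DH1,3."""
    return [f"{s},{LEAF_OF[s]}" for s in states]


if __name__ == "__main__":
    states = to_states(["THE", "DOG"])
    print(" ".join(states))             # DH1 DH2 AH1 AH2 D1 D2 AO1 AO2 G1 G2
    print(" ".join(to_leaves(states)))  # DH1,3 DH2,7 AH1,2 AH2,4 D1,3 D2,9 AO1,1 AO2,1 G1,2 G2,7
```

The counts agree with the word-level labels: THE has 2 phones x 2 states = 4 states and DOG has 3 x 2 = 6.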
[Figure: training graph for THE DOG with silence ~SIL(01) and pronunciation variants: ~SIL(01), THE(01) or THE(02), ~SIL(01), DOG(01) or DOG(02) or DOG(03), ~SIL(01). Example path: ~SIL(01) THE(01) DOG(02) ~SIL(01)]
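The graph lets training pick a pronunciation for each word and decide whether a pause occurs, even though only the word transcript is known. The sketch below enumerates every path such a graph allows. The rules are assumptions read off the figure: silence is required at both ends, optional between words (the example path skips it), and each word uses exactly one listed variant. Enumerating paths explicitly grows exponentially; a real trainer would build the graph and let the alignment choose, so this is only meant to make the graph's contents concrete.

```python
from itertools import product
from typing import Iterator

SIL = "~SIL(01)"
# Variant lists read off the figure: two entries for THE, three for DOG.
VARIANTS = {
    "THE": ["THE(01)", "THE(02)"],
    "DOG": ["DOG(01)", "DOG(02)", "DOG(03)"],
}


def training_paths(words: list[str]) -> Iterator[list[str]]:
    """Yield every label sequence the sketched graph allows.

    Assumed rules: silence at both ends, optional silence between words,
    exactly one pronunciation variant per word.
    """
    if not words:
        yield [SIL]
        return
    for prons in product(*(VARIANTS[w] for w in words)):
        # One on/off choice for the optional silence in each gap between words.
        for gaps in product([False, True], repeat=len(words) - 1):
            path = [SIL, prons[0]]
            for sil_here, pron in zip(gaps, prons[1:]):
                if sil_here:
                    path.append(SIL)
                path.append(pron)
            path.append(SIL)
            yield path


if __name__ == "__main__":
    paths = list(training_paths(["THE", "DOG"]))
    print(len(paths))  # 2 variants x 3 variants x 2 silence choices = 12
```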
[Figure: multi-pass system diagram. Front ends and models: MFCC, MFCC-SI, PLP, VTLN; ML-SAT, ML-AD, ML-SAT-L, ML-AD-L, MMI-SAT, MMI-AD. Stages include 4-gram rescoring, 100-best rescoring, consensus decoding and ROVER combination. Each stage is annotated with its word error rate on Eval'01 and on Eval'98 (SWB only).]
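The diagram's final stage, ROVER (Recognizer Output Voting Error Reduction), combines the outputs of several systems by voting. Full ROVER first aligns the hypotheses into a word transition network with dynamic programming and can weight votes by confidence; the sketch below covers only the voting step, assuming the alignment is already done and using plain vote counts. The `NULL` symbol and `rover_vote` name are hypothetical.

```python
from collections import Counter

NULL = "@"  # hypothetical placeholder for "no word here" in an aligned slot


def rover_vote(aligned: list[list[str]]) -> list[str]:
    """Pick the most frequent word in each slot of pre-aligned hypotheses.

    Assumes all hypotheses are already aligned to the same length, with gaps
    filled by NULL. Ties go to the word that appears first in the slot,
    i.e. the earliest-listed system.
    """
    if not aligned or len({len(h) for h in aligned}) != 1:
        raise ValueError("need one or more hypotheses aligned to equal length")
    output: list[str] = []
    for slot in zip(*aligned):
        word, _count = Counter(slot).most_common(1)[0]
        if word != NULL:  # a winning NULL means no word is emitted here
            output.append(word)
    return output


if __name__ == "__main__":
    systems = [
        ["THE", "DOG", NULL],
        ["THE", "DOG", "BARKED"],
        ["A", "DOG", "BARKED"],
    ]
    print(rover_vote(systems))  # ['THE', 'DOG', 'BARKED']
```

Voting helps most when the combined systems make different errors, which is why the diagram feeds it branches built from different front ends and training criteria.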