Automatic Speech Recognition (CS753)
Lecture 4: WFST algorithms contd. + WFSTs in ASR
Instructor: Preethi Jyothi
August 3, 2017
Quiz-1 Postmortem
- Common Mistakes:
- Missing insertion/deletion in E.fst
- Forgot to mark final states/self-loops
- Output vocabulary for T.fst has to be complete words (“bad”, “bead”, etc.) rather than letters
[Figure: bar chart of correct vs. incorrect answers for (a) E.fst and (b) T.fst]
Project Proposal
- Start brainstorming!
- In case of doubt, discuss potential ideas with me during my office hours (Thur, 5:00 pm to 6:30 pm)
- Once decided, you will have to fill out a form specifying:
- Title of the project
- Names/roll numbers of all project members
- A 300-400 word abstract of the proposed project
- Due by 11:59 pm on Aug 14th
Composition: Recap
- If T1 transduces x to z, and T2 transduces z to y, then T1 ○ T2 transduces x to y
- Note: output alphabet of T1 ⊆ input alphabet of T2
- E.g. if T1 removes punctuation symbols from a string, and T2 changes uppercase letters to lowercase letters, then T1 ○ T2 brings about both changes
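The recap above can be illustrated with a minimal sketch of composition for epsilon-free WFSTs over the tropical semiring (weights add along a path). The dictionary layout, the function names `compose` and `transduce`, and the toy machines `T1` and `T2` are assumptions made for illustration. Because the sketch has no epsilon handling, T1 here rewrites "!" to "." rather than deleting punctuation.

```python
from collections import deque

def compose(t1, t2):
    """Product construction for epsilon-free WFSTs (tropical semiring).

    Each machine is {"start": s, "finals": {state: weight},
                     "arcs": {state: [(ilabel, olabel, weight, next_state)]}}.
    """
    start = (t1["start"], t2["start"])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        q1, q2 = state
        if q1 in t1["finals"] and q2 in t2["finals"]:
            finals[state] = t1["finals"][q1] + t2["finals"][q2]
        for i1, o1, w1, n1 in t1["arcs"].get(q1, []):
            for i2, o2, w2, n2 in t2["arcs"].get(q2, []):
                if o1 == i2:  # output of T1 must match input of T2
                    nxt = (n1, n2)
                    arcs.setdefault(state, []).append((i1, o2, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}

def transduce(t, inputs):
    """Return {output tuple: best (minimum) weight} for an input sequence."""
    frontier = [(t["start"], (), 0.0)]
    for sym in inputs:
        frontier = [(n, out + (o,), w + aw)
                    for q, out, w in frontier
                    for i, o, aw, n in t["arcs"].get(q, [])
                    if i == sym]
    results = {}
    for q, out, w in frontier:
        if q in t["finals"]:
            total = w + t["finals"][q]
            results[out] = min(total, results.get(out, float("inf")))
    return results

# Toy machines (hypothetical): T1 normalizes "!" to ".", T2 lowercases "A".
T1 = {"start": 0, "finals": {0: 0.0},
      "arcs": {0: [("A", "A", 0.0, 0), ("a", "a", 0.0, 0),
                   ("!", ".", 0.5, 0), (".", ".", 0.0, 0)]}}
T2 = {"start": 0, "finals": {0: 0.0},
      "arcs": {0: [("A", "a", 0.25, 0), ("a", "a", 0.0, 0),
                   (".", ".", 0.0, 0)]}}

T12 = compose(T1, T2)
result = transduce(T12, "A!")
```

Running the composed machine on "A!" applies both rewrites in a single pass, and the path weight is the sum of the weights contributed by T1 and T2.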
Determinization: Recap
- A (W)FST is deterministic if:
- Unique start state
- No two transitions from a state share the same input label
- No epsilon input labels
- Not all WFSAs can be determinized
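The conditions listed above translate into a short check. This is a sketch only: the arc tuple layout, the `EPS` marker, and the function name are assumptions, and a unique start state is taken for granted by the representation.

```python
EPS = "<eps>"

def is_deterministic(arcs):
    """arcs: iterable of (src, ilabel, olabel, dst) tuples.

    Returns False if any arc has an epsilon input label or if two arcs
    leaving the same state share an input label.
    """
    seen = set()
    for src, ilabel, _olabel, _dst in arcs:
        if ilabel == EPS:
            return False
        if (src, ilabel) in seen:
            return False
        seen.add((src, ilabel))
    return True

# Illustrative inputs (hypothetical).
with_epsilon = [(0, "an", "a", 1), (1, EPS, "n", 2), (0, "a", "a", 2)]
shared_label = [(0, "a", "x", 1), (0, "a", "y", 2)]
deterministic = [(0, "a", "x", 1), (0, "b", "y", 2)]
```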
Determinization: Weighted FSA
Some weighted FSAs are not determinizable! [M97]
[Figure: a weighted FSA in which the weight of string a^n b is n and the weight of string a^n c is 2n]
- After seeing a^n, an FSA can’t remember n
[M97] M. Mohri. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, 23(2), 1997
Determinization: Recap
- Determinization is guaranteed to yield a deterministic WFSA under some technical conditions characterising the automata (e.g. the twins property)
Minimization
Minimization: find an equivalent deterministic FSA with the least number of states (and transitions)
- Unweighted FSAs have a unique minimal FSA [Aho74]
- Obtained by identifying and merging equivalent states
[Figure: an example FSA and its minimized equivalent, with equivalent states merged]
[Aho74] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
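Merging equivalent states can be sketched with simple partition refinement (Moore-style), which is not necessarily the algorithm presented in [Aho74]. The sketch assumes a deterministic unweighted FSA; missing transitions are treated as leading to a dead state, and all names are hypothetical.

```python
def equivalence_classes(states, alphabet, delta, finals):
    """Return {state: block id}; states with the same id are equivalent.

    delta maps (state, symbol) -> next state for a deterministic FSA.
    """
    # Start by separating final from non-final states.
    block = {q: (q in finals) for q in states}
    while True:
        # A state's signature is its own block plus its successors' blocks.
        signature = {
            q: (block[q],) + tuple(block.get(delta.get((q, a))) for a in alphabet)
            for q in states
        }
        ids = {}
        refined = {q: ids.setdefault(signature[q], len(ids)) for q in states}
        if len(set(refined.values())) == len(set(block.values())):
            return refined  # no block was split, so the partition is stable
        block = refined

# Illustrative FSA (hypothetical): states 1 and 2 behave identically.
states = [0, 1, 2, 3]
alphabet = ["a", "b", "c"]
delta = {(0, "a"): 1, (0, "b"): 2, (1, "c"): 3, (2, "c"): 3}
classes = equivalence_classes(states, alphabet, delta, finals={3})
```

States that end up in the same block can be merged into one state of the minimal FSA.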
Minimization: Weighted FSA
Two states are equivalent only if, for every input string, the outcome (the weight assigned to the string, if accepted) starting from the two states is the same
Redistribute weights before identifying equivalent states
Minimization: Weighted FSA
Reweighting is OK as long as the resulting WFSA is equivalent
Can reweight using a “potential function” on states
- “Weight pushing”: reweighting using a potential function that optimally moves weights towards the start state
[Figure: a WFSA before and after weight pushing]
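A minimal sketch of weight pushing in the tropical semiring (min, +) follows. It assumes every state can reach a final state and that there are no negative-weight cycles; the potential of a state is taken to be its shortest distance to a final state. The function name, data layout, and example automaton are hypothetical.

```python
def push_weights(arcs, finals, start):
    """arcs: list of (src, label, weight, dst); finals: {state: final weight}.

    Returns (initial weight, reweighted arcs, reweighted final weights).
    """
    states = {start} | set(finals) | {a[0] for a in arcs} | {a[3] for a in arcs}
    inf = float("inf")
    # Potential V[q]: shortest distance from q to a final state,
    # computed by repeated relaxation (Bellman-Ford style).
    V = {q: finals.get(q, inf) for q in states}
    for _ in range(len(states)):
        for src, _label, w, dst in arcs:
            V[src] = min(V[src], w + V[dst])
    # Reweight: each arc absorbs the potential of its destination and
    # gives up the potential of its source.
    pushed_arcs = [(s, l, w + V[d] - V[s], d) for s, l, w, d in arcs]
    pushed_finals = {q: rho - V[q] for q, rho in finals.items()}
    return V[start], pushed_arcs, pushed_finals

# Illustrative WFSA (hypothetical weights).
arcs = [(0, "a", 1, 1), (0, "b", 3, 2), (1, "c", 3, 3), (2, "c", 2, 3)]
initial, pushed_arcs, pushed_finals = push_weights(arcs, {3: 0}, start=0)
```

In this example the string "bc" weighs 3 + 2 = 5 before pushing and 4 + 1 + 0 = 5 after (initial weight plus arc weights), so the automaton stays equivalent, while states 1 and 2 now carry identical outgoing arcs and become candidates for merging.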
Minimization: Weighted FSA
After weight pushing, can simply apply unweighted FSA minimization (treating each label/weight pair as a single label)
Guaranteed to yield a minimal WFSA (under some technical conditions required for weight pushing)
[Figure: the weight-pushed WFSA and its minimized equivalent]
Toolkits to work with finite-state machines
- AT&T FSM Library (no longer supported)
http://www3.cs.stonybrook.edu/~algorith/implement/fsm/implement.shtml
- RWTH FSA Toolkit
https://www-i6.informatik.rwth-aachen.de/~kanthak/fsa.html
- Carmel
https://www.isi.edu/licensed-sw/carmel/
- MIT FST Toolkit
http://people.csail.mit.edu/ilh/fst/
- OpenFst Toolkit (actively supported)
http://www.openfst.org/twiki/bin/view/FST/WebHome
Brief Introduction to OpenFst
[Figure: FST with states 0, 1, 2 and arcs 0 → 1 labelled an:a, 1 → 2 labelled <eps>:n, 0 → 2 labelled a:a]
A.txt
0 1 an a
1 2 <eps> n
0 2 a a
2
Input alphabet (in.txt)
<eps> 0
an 1
a 2
Output alphabet (out.txt)
<eps> 0
a 1
n 2
The “0” label is reserved for epsilon
Quick Intro to OpenFst (www.openfst.org)
[Figure: the same FST with weights; final state 2 has weight 0.1]
A.txt
0 1 an a 0.5
1 2 <eps> n 1.0
0 2 a a 0.5
2 0.1
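To make the text format concrete, here is a small sketch of a reader for it. It relies on the OpenFst text conventions that an arc line is `src dst ilabel olabel [weight]`, a final-state line is `state [weight]`, and the source state of the first line is the start state. The function name and the choice of 0.0 as the default weight (the tropical semiring's identity) are assumptions, and the sample text is an illustrative weighted FST in that format.

```python
def parse_text_fst(text):
    """Parse OpenFst-style text into (start, arcs, finals).

    arcs: list of (src, dst, ilabel, olabel, weight)
    finals: {state: final weight}
    """
    arcs, finals, start = [], {}, None
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if start is None:
            start = int(fields[0])  # first line's source state is the start
        if len(fields) >= 4:
            weight = float(fields[4]) if len(fields) == 5 else 0.0
            arcs.append((int(fields[0]), int(fields[1]),
                         fields[2], fields[3], weight))
        else:
            finals[int(fields[0])] = float(fields[1]) if len(fields) == 2 else 0.0
    return start, arcs, finals

SAMPLE = """
0 1 an a 0.5
1 2 <eps> n 1.0
0 2 a a 0.5
2 0.1
"""
start, arcs, finals = parse_text_fst(SAMPLE)
```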
Quick Intro to OpenFst (www.openfst.org)
Compiling & Printing FSTs
The text FSTs need to be “compiled” into binary objects before further use with OpenFst utilities
- Command used to compile:
fstcompile --isymbols=in.txt --osymbols=out.txt A.txt A.fst
- Get back the text FST using a print command with the binary file:
fstprint --isymbols=in.txt --osymbols=out.txt A.fst A.txt
Drawing FSTs
Small FSTs can be visualized easily using the draw tool:
fstdraw --isymbols=in.txt --osymbols=out.txt A.fst | dot -Tpdf > A.pdf
[Figure: rendering of A.fst with states 0, 1, 2 and arcs an:a, <eps>:n, a:a]
FSTs can get very large!
WFSTs applied to ASR
WFST-based ASR System
[Figure: decoding pipeline. Acoustic Indices → Acoustic Models → Triphones → Context Transducer → Monophones → Pronunciation Model → Words → Language Model → Word Sequence]
WFST-based ASR System
H (acoustic models)
- One 3-state HMM for each triphone (a/a_b, b/a_b, ..., x/y_z)
- FST union + closure of these HMMs gives the resulting FST H
[Figure: HMM for triphone a/a_b drawn as an FST, with arcs f0:a/a_b, f1:ε, f2:ε, f3:ε, f4:ε, f5:ε, f6:ε]
WFST-based ASR System
C (context transducer)
[Figure: C⁻¹ over the phones x and y, with states (ε,*), (x,ε), (x,x), (x,y), (y,ε), (y,x), (y,y) and arcs such as x:x/ε_y and y:y/x_x]
Figure reproduced from “Weighted Finite State Transducers in Speech Recognition”, Mohri et al., 2002
Arc labels: “monophone : phone / left-context_right-context”
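WFST-based ASR System
L (pronunciation model)
[Figure (b): pronunciation lexicon transducer. “data”: 0 → 1 on d:data/1, 1 → 2 on ey:ε/0.5 or ae:ε/0.5, 2 → 3 on t:ε/0.3 or dx:ε/0.7, 3 → 4 on ax:ε/1. “dew”: 0 → 5 on d:dew/1, 5 → 6 on uw:ε/1]
Figure reproduced from “Weighted Finite State Transducers in Speech Recognition”, Mohri et al., 2002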
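The lexicon figure can be read as a set of weighted pronunciations per word. The sketch below enumerates them, assuming the arc weights are probabilities that multiply along a path; the `LEXICON` layout and function name are hypothetical.

```python
from itertools import product

# Each word maps to a sequence of slots; each slot lists
# (phone, probability) alternatives taken from the arc labels.
LEXICON = {
    "data": [[("d", 1.0)], [("ey", 0.5), ("ae", 0.5)],
             [("t", 0.3), ("dx", 0.7)], [("ax", 1.0)]],
    "dew": [[("d", 1.0)], [("uw", 1.0)]],
}

def pronunciations(word):
    """Return {phone sequence: probability} for every path of a word."""
    result = {}
    for path in product(*LEXICON[word]):
        phones = tuple(phone for phone, _ in path)
        prob = 1.0
        for _, weight in path:
            prob *= weight
        result[phones] = prob
    return result

data_prons = pronunciations("data")
```

Under this reading, "data" has four pronunciations whose probabilities sum to 1, for example d ey t ax with probability 0.5 × 0.3 = 0.15.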
WFST-based ASR System
G (language model)
[Figure: word-level weighted acceptor with arcs the, birds/0.404, animals/1.789, boy/1.789, are/0.693, were/0.693, is, walking]
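A word-level acceptor like this one scores a sentence by summing arc weights along its path. The sketch below is illustrative only: the state numbering and topology are assumptions (the extracted text preserves only the arc labels), unlabelled weights are taken to be 0, and the weights are treated as costs that add along a path.

```python
# (state, word) -> (next state, weight); topology assumed for illustration.
G_ARCS = {
    (0, "the"): (1, 0.0),
    (1, "birds"): (2, 0.404),
    (1, "animals"): (2, 1.789),
    (1, "boy"): (3, 1.789),
    (2, "are"): (4, 0.693),
    (2, "were"): (4, 0.693),
    (3, "is"): (4, 0.0),
    (4, "walking"): (5, 0.0),
}
G_FINALS = {5}

def score(words, start=0):
    """Return the total path weight, or None if the sentence is rejected."""
    state, total = start, 0.0
    for word in words:
        if (state, word) not in G_ARCS:
            return None
        state, weight = G_ARCS[(state, word)]
        total += weight
    return total if state in G_FINALS else None

birds_score = score("the birds are walking".split())
```

With this assumed topology, "the boy are walking" is rejected because no "are" arc leaves the state reached after "boy".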