Automatic Speech Recognition (CS753)

  1. Automatic Speech Recognition (CS753). Lecture 4: WFST algorithms contd. + WFSTs in ASR. Instructor: Preethi Jyothi. August 3, 2017

  2. Quiz-1 Postmortem
 [figure: bar chart of results for a) E.fst and b) T.fst; legend: Correct, Incorrect, Missing; axis from 0 to 50]
 Common Mistakes:
 • Insertion/deletion in E.fst
 • Forgot to mark final states/self-loops
 • Output vocabulary for T.fst has to be complete words, “bad”, “bead”, etc. rather than letters

  3. Project Proposal
 • Start brainstorming!
 • In case of doubt, discuss potential ideas with me during my office hours (Thur, 5:00 pm to 6:30 pm)
 • Once decided, you will have to fill out a form specifying:
   • Title of the project
   • Names/roll numbers of all project members
   • A 300-400 word abstract of the proposed project
 • Due by 11:59 pm on Aug 14th

  4. Composition: Recap
 • If T1 transduces x to z, and T2 transduces z to y, then T1 ○ T2 transduces x to y
 • Note: output alphabet of T1 ⊆ input alphabet of T2
 • E.g. If T1 removes punctuation symbols from a string, and T2 changes uppercase letters to lowercase letters, then T1 ○ T2 brings about both changes
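
The composition rule above can be sketched in a few lines of Python. This is a minimal illustration, not how OpenFst implements it: it assumes epsilon-free transducers, weights that add along a path (tropical semiring), and a hypothetical dict layout (`start`, `finals`, `arcs`) invented for this example. Since removing punctuation needs epsilon outputs, the toy T1 here replaces "!" with "." instead, and T2 lowercases letters.

```python
from collections import deque

def compose(t1, t2):
    """Compose two epsilon-free weighted transducers (weights add along a path)."""
    start = (t1["start"], t2["start"])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        q1, q2 = state = queue.popleft()
        arcs[state] = []
        # A pair state is final only if both components are final.
        if q1 in t1["finals"] and q2 in t2["finals"]:
            finals[state] = t1["finals"][q1] + t2["finals"][q2]
        for i1, o1, w1, n1 in t1["arcs"].get(q1, []):
            for i2, o2, w2, n2 in t2["arcs"].get(q2, []):
                if o1 == i2:  # output of T1 must match input of T2
                    nxt = (n1, n2)
                    arcs[state].append((i1, o2, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}

def transduce(t, symbols):
    """Return {output string: least path weight} for the given input sequence."""
    results = {}
    stack = [(t["start"], 0, "", 0.0)]
    while stack:
        state, pos, out, w = stack.pop()
        if pos == len(symbols):
            if state in t["finals"]:
                total = w + t["finals"][state]
                results[out] = min(total, results.get(out, float("inf")))
            continue
        for i, o, aw, nxt in t["arcs"].get(state, []):
            if i == symbols[pos]:
                stack.append((nxt, pos + 1, out + o, w + aw))
    return results

# Toy T1: replaces "!" with "." (weight 0.5), passes letters through.
t1 = {"start": 0, "finals": {0: 0.0},
      "arcs": {0: [("A", "A", 0.0, 0), ("b", "b", 0.0, 0), ("!", ".", 0.5, 0)]}}
# Toy T2: lowercases "A" (weight 0.25), passes "b" and "." through.
t2 = {"start": 0, "finals": {0: 0.0},
      "arcs": {0: [("A", "a", 0.25, 0), ("b", "b", 0.0, 0), (".", ".", 0.0, 0)]}}

result = transduce(compose(t1, t2), "Ab!")
print(result)  # {'ab.': 0.75}
```

The composed machine maps "Ab!" to "ab." directly, with the weights of the two steps added together.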

  5. Determinization: Recap
 • A (W)FST is deterministic if:
   • Unique start state
   • No two transitions from a state share the same input label
   • No epsilon input labels
 • Not all WFSAs can be determinized
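
The three conditions above translate directly into a check over an arc list. The sketch below is illustrative only; the function name `is_deterministic` and the `(source, input label, destination)` arc tuples are assumptions made for this example, not part of any toolkit.

```python
EPS = "<eps>"

def is_deterministic(start_states, arcs):
    """Check the three conditions: unique start state, no shared input
    label among the arcs leaving a state, and no epsilon input labels."""
    if len(start_states) != 1:
        return False
    seen = set()
    for src, ilabel, _dst in arcs:
        if ilabel == EPS:
            return False
        if (src, ilabel) in seen:
            return False
        seen.add((src, ilabel))
    return True

# Two arcs leaving state 0 share the input label "a": not deterministic.
nondet = [(0, "a", 1), (0, "a", 2), (1, "b", 3), (2, "c", 3)]
# Every (state, input label) pair is unique: deterministic.
det = [(0, "a", 1), (1, "b", 2), (1, "c", 2)]

print(is_deterministic([0], nondet))  # False
print(is_deterministic([0], det))     # True
```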

  6. Determinization: Weighted FSA
 Some weighted FSAs are not determinizable! [M97]
 [figure: weighted FSA with states 0, 1, 2, 3]
 Weight of string (…)^n(…) = n and weight of (…)^n(…) = 2n
 After seeing (…)^n, an FSA can’t remember n
 [M97] M. Mohri. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, 23(2), 1997

  7. Determinization: Recap
 • A (W)FST is deterministic if:
   • Unique start state
   • No two transitions from a state share the same input label
   • No epsilon input labels
 • Not all WFSAs can be determinized
 • Guaranteed to yield a deterministic WFSA under some technical conditions characterising the automata (e.g. twins property)

  8. Minimization
 Minimization: find an equivalent deterministic FSA with the least number of states (and transitions)
 Unweighted FSAs have a unique minimal FSA [Aho74]
 [figure: example FSA and its minimized equivalent]
 Obtained by identifying and merging equivalent states
 [Aho74] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The design and analysis of computer algorithms. Addison Wesley, 1974.
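
One simple way to identify equivalent states is partition refinement (Moore's algorithm), sketched below. This is not necessarily the algorithm from [Aho74], which describes a faster method; it assumes a complete deterministic FSA, and the function name `minimize_dfa` and the example automaton are made up for illustration.

```python
def minimize_dfa(states, alphabet, delta, finals):
    """Return {state: block id}; states sharing a block id are equivalent.

    delta maps (state, symbol) -> next state and must be total.
    """
    # Start by separating final from non-final states.
    block = {q: int(q in finals) for q in states}
    while True:
        ids, new_block = {}, {}
        for q in states:
            # Signature: own block plus the blocks reached on each symbol.
            sig = (block[q],) + tuple(block[delta[q, a]] for a in alphabet)
            new_block[q] = ids.setdefault(sig, len(ids))
        # Refinement only ever splits blocks, so an unchanged count means done.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block

# Hypothetical 4-state FSA: states 1 and 2 behave identically.
states = [0, 1, 2, 3]
alphabet = ["a", "b"]
delta = {(0, "a"): 1, (0, "b"): 2,
         (1, "a"): 3, (1, "b"): 3,
         (2, "a"): 3, (2, "b"): 3,
         (3, "a"): 3, (3, "b"): 3}
blocks = minimize_dfa(states, alphabet, delta, finals={3})
print(blocks)  # {0: 0, 1: 1, 2: 1, 3: 2}
```

States 1 and 2 land in the same block, so merging them gives a 3-state FSA that accepts the same language.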

  9. Minimization: Weighted FSA
 Two states are equivalent only if, for every input string, the outcome (the weight assigned to the string, if accepted) starting from the two states is the same
 [figure: example weighted FSA and its minimized equivalent]
 Redistribute weights before identifying equivalent states

  10. Minimization: Weighted FSA
 Reweighting OK as long as resulting WFSA is equivalent
 Can reweight using a “potential function” on states
 [figure: example WFSA with a potential assigned to each state and the resulting reweighted arcs]
 “Weight pushing”: Reweighting using a potential function that optimally moves weights towards the start state
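
A minimal sketch of weight pushing follows, under stated assumptions: weights add along a path (tropical semiring), every state can reach a final state, and there are no negative-weight cycles. The potential V(q) is taken to be the least weight needed to get from q to acceptance; each arc weight w from p to q becomes w + V(q) - V(p), final weights lose V(q), and V(start) becomes the initial weight. The function names and the example automaton are hypothetical.

```python
def shortest_to_final(states, arcs, finals):
    """Potential V(q): least weight of any accepting path starting at q."""
    V = {q: finals.get(q, float("inf")) for q in states}
    for _ in range(len(states)):  # Bellman-Ford style relaxation
        for src, _label, w, dst in arcs:
            V[src] = min(V[src], w + V[dst])
    return V

def push_weights(start, states, arcs, finals):
    """Reweight so that weight moves towards the start state."""
    V = shortest_to_final(states, arcs, finals)
    new_arcs = [(s, l, w + V[d] - V[s], d) for s, l, w, d in arcs]
    new_finals = {q: r - V[q] for q, r in finals.items()}
    return V[start], new_arcs, new_finals  # V[start] is the new initial weight

# Arcs are (source, label, weight, destination).
arcs = [(0, "a", 1, 1), (0, "b", 4, 2), (1, "c", 4, 3), (2, "c", 2, 3)]
initial, pushed, pushed_finals = push_weights(0, [0, 1, 2, 3], arcs, {3: 0})
print(initial)  # 5
print(pushed)   # [(0, 'a', 0, 1), (0, 'b', 1, 2), (1, 'c', 0, 3), (2, 'c', 0, 3)]
```

Every accepted string keeps its total weight ("ac" is 1 + 4 = 5 before and 5 + 0 + 0 = 5 after; "bc" is 6 both times), but states 1 and 2 now have identical outgoing arcs, so treating label/weight pairs as plain labels lets unweighted minimization merge them.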

  11. Minimization: Weighted FSA
 After weight-pushing, can simply apply unweighted FSA minimization (treating label/weight as label)
 [figure: the weight-pushed WFSA and its minimized equivalent]
 Guaranteed to yield a minimal WFSA (under some technical conditions required for weight-pushing)

  12. Toolkits to work with finite-state machines
 • AT&T FSM Library (no longer supported)
   http://www3.cs.stonybrook.edu/~algorith/implement/fsm/implement.shtml
 • RWTH FSA Toolkit
   https://www-i6.informatik.rwth-aachen.de/~kanthak/fsa.html
 • Carmel
   https://www.isi.edu/licensed-sw/carmel/
 • MIT FST Toolkit
   http://people.csail.mit.edu/ilh/fst/
 • OpenFST Toolkit (actively supported)
   http://www.openfst.org/twiki/bin/view/FST/WebHome

  13. Brief Introduction to OpenFst

  14. Quick Intro to OpenFst (www.openfst.org)
 [figure: FST with states 0, 1, 2 (final state 2) and arcs an:a from 0 to 1, <eps>:n from 1 to 2, a:a from 0 to 2]
 The “0” label is reserved for epsilon.
 A.txt:
   0 1 an a
   1 2 <eps> n
   0 2 a a
   2
 Input alphabet (in.txt):
   <eps> 0
   an 1
   a 2
 Output alphabet (out.txt):
   <eps> 0
   a 1
   n 2
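
For larger machines these text files are usually generated by a script rather than typed by hand. The sketch below writes the same layout as on this slide: one `source destination input output [weight]` line per arc, one `state [weight]` line per final state, and symbol tables that reserve id 0 for `<eps>`. The helper names `to_text_fst` and `to_symbol_table` are hypothetical, not part of OpenFst.

```python
def to_text_fst(arcs, finals):
    """Render arcs and final states as text FST lines.

    arcs: tuples (src, dst, ilabel, olabel) with an optional trailing weight.
    finals: tuples (state, weight), where weight may be None.
    """
    lines = []
    for src, dst, ilabel, olabel, *weight in arcs:
        lines.append(" ".join(map(str, [src, dst, ilabel, olabel, *weight])))
    for state, weight in finals:
        lines.append(str(state) if weight is None else f"{state} {weight}")
    return "\n".join(lines) + "\n"

def to_symbol_table(symbols):
    """Render a symbol table, reserving id 0 for epsilon."""
    return "".join(f"{s} {i}\n" for i, s in enumerate(["<eps>", *symbols]))

fst_text = to_text_fst(
    [(0, 1, "an", "a"), (1, 2, "<eps>", "n"), (0, 2, "a", "a")],
    [(2, None)],
)
in_table = to_symbol_table(["an", "a"])   # contents for in.txt
out_table = to_symbol_table(["a", "n"])   # contents for out.txt
print(fst_text, in_table, out_table, sep="\n")
```

Writing `fst_text` to A.txt and the two tables to in.txt and out.txt gives the three files used on this slide.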

  15. Quick Intro to OpenFst (www.openfst.org)
 [figure: the same FST with weights on the arcs; final state 2 has weight 0.1]
 Weighted text FST:
   0 1 an a 0.5
   1 2 <eps> n 1.0
   0 2 a a 0.5
   2 0.1

  16. Compiling & Printing FSTs
 The text FSTs need to be “compiled” into binary objects before further use with OpenFst utilities
 • Command used to compile:
   fstcompile --isymbols=in.txt --osymbols=out.txt A.txt A.fst
 • Get back the text FST using a print command with the binary file:
   fstprint --isymbols=in.txt --osymbols=out.txt A.fst A.txt

  17. Drawing FSTs
 Small FSTs can be visualized easily using the draw tool:
   fstdraw --isymbols=in.txt --osymbols=out.txt A.fst | dot -Tpdf > A.pdf
 [figure: rendered FST over states 0, 1, 2 with arcs an:a, <eps>:n and a:a]

  18. FSTs can get very large!

  19. WFSTs applied to ASR

  20. WFST-based ASR System
 Acoustic Indices → [Acoustic Models] → Triphones → [Context Transducer] → Monophones → [Pronunciation Model] → Words → [Language Model] → Word Sequence

  21. WFST-based ASR System: H (Acoustic Models)
 [figure: one 3-state HMM for each triphone (a/a_b, b/a_b, ..., x/y_z), with input labels f0 to f6; the f0 arc emits the triphone a/a_b and the remaining arcs emit ε]
 FST union + closure of the per-triphone HMMs gives the resulting FST H

  22. WFST-based ASR System: C (Context Transducer)
 [figure: context-dependency transducer over the phones x and y, with states (ε,*), (x,ε), (x,x), (x,y), (y,ε), (y,x), (y,y) and arcs such as x:x/ε_y, x:x/y_x, y:y/x_x and y:y/y_ε]
 C⁻¹: Arc labels: “monophone : phone / left-context_right-context”
 Figure reproduced from “Weighted Finite State Transducers in Speech Recognition”, Mohri et al., 2002

  23. WFST-based ASR System: L (Pronunciation Model)
 [figure: pronunciation transducer over states 0 to 6 with arcs d:data/1, ey:ε/0.5, ae:ε/0.5, t:ε/0.3, dx:ε/0.7, ax:ε/1, d:dew/1, uw:ε/1]
 Figure reproduced from “Weighted Finite State Transducers in Speech Recognition”, Mohri et al., 2002

  24. WFST-based ASR System: G (Language Model)
 [figure: word-level acceptor starting at state 0 with arcs labelled the, birds/0.404, animals/1.789, boy/1.789, are/0.693, were/0.693, is, walking]
