Automatic Speech Recognition (CS753)
Lecture 18: Search & Decoding (Part I)
Instructor: Preethi Jyothi, Mar 23, 2017
Recall ASR Decoding
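$$W^* = \arg\max_{W} \Pr(O_A \mid W)\,\Pr(W)$$
$$W^* = \arg\max_{w_1^N,\,N} \left\{ \left[\prod_{n=1}^{N} \Pr(w_n \mid w_{n-m+1}^{n-1})\right] \left[\sum_{q_1^T,\,w_1^N} \prod_{t=1}^{T} \Pr(O_t \mid q_t, w_1^N)\,\Pr(q_t \mid q_{t-1}, w_1^N)\right] \right\}$$
$$\approx \arg\max_{w_1^N,\,N} \left\{ \left[\prod_{n=1}^{N} \Pr(w_n \mid w_{n-m+1}^{n-1})\right] \left[\max_{q_1^T,\,w_1^N} \prod_{t=1}^{T} \Pr(O_t \mid q_t, w_1^N)\,\Pr(q_t \mid q_{t-1}, w_1^N)\right] \right\} \quad \text{(Viterbi approximation)}$$
Recall Viterbi search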
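The ≈ step in the decoding equation replaces the sum over state sequences with a max over them. The following is a minimal Python sketch on a toy two-state HMM. All probabilities are hypothetical and chosen only for illustration. It enumerates every state sequence and compares the exact sum with the single best path. The max is always a lower bound on the sum.

```python
import itertools

# Toy 2-state HMM; all numbers are hypothetical, chosen only for illustration.
init = [0.6, 0.4]                  # Pr(q_1 = s)
trans = [[0.7, 0.3], [0.4, 0.6]]   # trans[i][j] = Pr(q_t = j | q_{t-1} = i)
emit = [[0.5, 0.5], [0.1, 0.9]]    # emit[s][o] = Pr(O_t = o | q_t = s)
obs = [0, 1, 1]                    # observation sequence O_1^T, T = 3

def joint_prob(path):
    """Pr(O_1^T, q_1^T) for one state sequence q_1^T."""
    p = init[path[0]] * emit[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
    return p

# Brute-force enumeration of all 2^T state sequences (exponential in T; the
# forward and Viterbi recursions compute the same quantities in O(T * S^2)).
paths = list(itertools.product(range(2), repeat=len(obs)))
total = sum(joint_prob(q) for q in paths)  # exact: sum over q_1^T
best = max(joint_prob(q) for q in paths)   # Viterbi approximation: max over q_1^T
print(f"sum = {total:.6f}, max = {best:.6f}")
```

With these numbers, the sum over all eight paths is 0.145984. The best single path (states 0, 1, 1) contributes 0.04374 of that.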
ASR Search Network
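[Figure: search network over the words "the", "birds", "are", "boy", "is", "walking", with words expanded into phone-level states (labels d, ax, b visible)]
Time-state trellis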
[Figure: time-state trellis; the states of word1, word2 and word3 plotted against time t]
Viterbi search over the large trellis
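Viterbi search keeps, for each trellis node (state s at time t), only the best-scoring path into it, plus a backpointer. The best word/state sequence is then recovered by tracing back from the best final node. Below is a minimal Python sketch for a small dense HMM. It works in the log domain and reuses the hypothetical toy parameters from the sum/max example. A real decoder runs the same recursion over the (much larger) composed search network.

```python
import math

def _log(p):
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else -math.inf

def viterbi(obs, init, trans, emit):
    """Best state sequence through the time-state trellis of an HMM.

    Returns (best log-probability, best state sequence). Scores are kept in
    the log domain so long utterances do not underflow.
    """
    n = len(init)
    # delta[s]: best log-score of any partial path ending in state s at time t
    delta = [_log(init[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    backptrs = []  # backptrs[t-1][j]: best predecessor of state j at time t
    for o in obs[1:]:
        bp = [max(range(n), key=lambda i: delta[i] + _log(trans[i][j]))
              for j in range(n)]
        delta = [delta[bp[j]] + _log(trans[bp[j]][j]) + _log(emit[j][o])
                 for j in range(n)]
        backptrs.append(bp)
    # Trace back from the best-scoring final state
    state = max(range(n), key=lambda s: delta[s])
    score, path = delta[state], [state]
    for bp in reversed(backptrs):
        state = bp[state]
        path.append(state)
    return score, path[::-1]

# Hypothetical toy HMM (same numbers as in the sum/max comparison)
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.5], [0.1, 0.9]]
score, path = viterbi([0, 1, 1], init, trans, emit)
print(path, math.exp(score))
```

The recursion costs O(T · S²) instead of enumerating all S^T paths. For the toy model it recovers the path 0, 1, 1 with probability 0.04374.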
Two main WFST Optimizations
Recall that not all weighted transducers are determinizable. To ensure that L ○ G is determinizable, introduce disambiguation symbols in L to deal with homophones in the lexicon:
read : r eh d #0
red : r eh d #1
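Adding disambiguation symbols to a lexicon can be sketched in a few lines of Python. This is a minimal sketch, not a toolkit's implementation. The slide only mentions homophones. Also marking pronunciations that are a proper prefix of another word's pronunciation is an assumption here, since such prefixes also block determinization. Symbols are numbered from #0 within each group to match the slide.

```python
from collections import defaultdict

def add_disambig(lexicon):
    """Append #0, #1, ... to pronunciations that would block determinization.

    lexicon: list of (word, [phones]). A pronunciation gets disambiguation
    symbols if several words share it (homophones) or if it is a proper
    prefix of another pronunciation (an extra rule assumed for this sketch).
    """
    by_pron = defaultdict(list)           # pronunciation -> words using it
    for word, phones in lexicon:
        by_pron[tuple(phones)].append(word)
    prons = list(by_pron)
    out = []
    for pron, words in by_pron.items():
        is_prefix = any(len(p) > len(pron) and p[:len(pron)] == pron for p in prons)
        needs_symbol = len(words) > 1 or is_prefix
        for k, word in enumerate(words):
            out.append((word, list(pron) + ([f"#{k}"] if needs_symbol else [])))
    return out

lex = [("read", ["r", "eh", "d"]), ("red", ["r", "eh", "d"]), ("ate", ["ey", "t"])]
for word, phones in add_disambig(lex):
    print(word, ":", " ".join(phones))
```

For this lexicon the output reproduces the slide's `read : r eh d #0` and `red : r eh d #1`. It leaves `ate : ey t` unchanged.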
Minimization ensures that the final composed machine has the minimum number of states.
Final optimization cascade:
$$N = \pi_\varepsilon(\min(\det(\tilde{H} \circ \det(\tilde{C} \circ \det(\tilde{L} \circ G)))))$$
Here $\pi_\varepsilon$ replaces the disambiguation symbols in the input alphabet of $\tilde{H}$ with $\varepsilon$.
Example G
[Figure: G accepts one of bob, bond, rob (arcs bob:bob, bond:bond, rob:rob) followed by one of slept, read, ate (arcs slept:slept, read:read, ate:ate)]
Compact language models (G)
Example L̃: Lexicon with disambiguation symbols
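[Figure: L̃ as a transducer; each word is a chain of phone arcs whose first arc outputs the word (b:bob, b:bond, r:rob, s:slept, r:read, ey:ate) and whose remaining arcs output nothing (e.g., aa:-, l:-, eh:-, t:-), plus a disambiguation arc #0:-]
L̃ ○ G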
[Figure: L̃ ○ G; separate paths beginning with b:bob, b:bond and r:rob, followed by output-free phone arcs (aa:-, b:-, n:-, d:-) and a #0:- arc]
det(L̃ ○ G)
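[Figure: det(L̃ ○ G); the two b-initial paths now share one b:- arc and one aa:- arc, and the word outputs are delayed to the arcs where the paths diverge (b:bob, n:bond); the rob path starts with r:rob]
min(det(L̃ ○ G))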
[Figure: min(det(L̃ ○ G)); the same arcs as det(L̃ ○ G), with equivalent states merged (8 states, compared with 10 in det(L̃ ○ G))]
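The effect of det and min on the small example lexicon can be sketched in Python without a WFST library. This covers only a special case: an unweighted, acyclic lexicon. It is not general weighted determinization or minimization, and it omits G's second-word arcs and the #0 arc. The pronunciations bob = b aa b, bond = b aa n d and rob = r aa b are read off the figure labels and should be treated as assumptions. Determinization here builds a prefix tree in which each word is output on the first arc after which it is the only word still reachable. This yields the r:rob, b:bob and n:bond arcs seen in det(L̃ ○ G). Minimization of an acyclic machine merges states with identical right languages, which is checked bottom-up by comparing each state's finality and outgoing (input, output, target-class) arcs. The helper names `build_det_lexicon` and `minimize_acyclic` are hypothetical.

```python
def build_det_lexicon(lexicon):
    """Prefix tree over pronunciations with delayed word outputs.

    Returns (arcs, finals) where arcs[state] maps an input phone to
    [next_state, output_label]; "-" marks an arc with no output.
    Assumes no homophones and no pronunciation that is a prefix of another
    (those cases are what disambiguation symbols are for).
    """
    arcs, reach, finals = [{}], [set()], set()   # reach[s]: words reachable from s
    for word, phones in lexicon:
        s = 0
        reach[0].add(word)
        for ph in phones:
            if ph not in arcs[s]:
                arcs.append({})
                reach.append(set())
                arcs[s][ph] = [len(arcs) - 1, "-"]
            s = arcs[s][ph][0]
            reach[s].add(word)
        finals.add(s)
    # Emit each word on the earliest arc into a state from which only it is reachable
    for word, phones in lexicon:
        s = 0
        for ph in phones:
            nxt = arcs[s][ph][0]
            if reach[nxt] == {word}:
                arcs[s][ph][1] = word
                break
            s = nxt
    return arcs, finals

def minimize_acyclic(arcs, finals):
    """Number of states after merging states with identical right languages."""
    sig_to_id, state_class = {}, {}
    def classify(s):
        if s not in state_class:
            sig = (s in finals,
                   tuple(sorted((ph, out, classify(t))
                                for ph, (t, out) in arcs[s].items())))
            state_class[s] = sig_to_id.setdefault(sig, len(sig_to_id))
        return state_class[s]
    classify(0)
    return len(sig_to_id)

lex = [("bob", ["b", "aa", "b"]), ("bond", ["b", "aa", "n", "d"]), ("rob", ["r", "aa", "b"])]
arcs, finals = build_det_lexicon(lex)
print(len(arcs), "states before minimization,", minimize_acyclic(arcs, finals), "after")
```

Here the prefix tree has 9 states. Minimization merges the three word-final states into one, leaving 7. That is the same reduction of two states seen between the det and min figures, which additionally contain the #0 arc.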
Viterbi search over the large trellis
Beam pruning
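Beam pruning limits Viterbi search to the promising part of the trellis. After each frame, any active state whose score is worse than the frame's best score minus a fixed beam width is dropped. The following is a minimal Python sketch over a sparse state graph in the log domain. The graph, frame scores and the parameter name `beam` are hypothetical, and state 0 is assumed to be a non-emitting start state.

```python
import math

def viterbi_beam(frames, graph, start, beam):
    """Frame-synchronous Viterbi with beam pruning; returns surviving states.

    frames[t][s]: log Pr(O_t | s); graph[s]: list of (next_state, log_trans).
    """
    active = {start: 0.0}                  # state -> best log score so far
    for frame in frames:
        new = {}
        for s, score in active.items():
            for nxt, log_trans in graph[s]:
                cand = score + log_trans + frame[nxt]
                if cand > new.get(nxt, -math.inf):   # Viterbi: keep the max
                    new[nxt] = cand
        best = max(new.values())
        # Beam pruning: keep only hypotheses within `beam` of the best
        active = {s: v for s, v in new.items() if v >= best - beam}
    return active

lg = math.log
graph = {0: [(1, lg(0.5)), (2, lg(0.5))],
         1: [(1, lg(0.9)), (3, lg(0.1))],
         2: [(2, lg(0.9)), (3, lg(0.1))],
         3: [(3, 0.0)]}
frames = [{1: lg(0.9), 2: lg(0.01), 3: lg(0.1)},
          {1: lg(0.9), 2: lg(0.5), 3: lg(0.1)}]
print(viterbi_beam(frames, graph, start=0, beam=3.0))
```

With one frame, a beam of 3.0 prunes state 2, whose score is about 4.5 nats below the best. A beam of 10.0 keeps it. A narrower beam makes decoding faster but risks pruning the path that would eventually have been best.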
Beam search
Static and dynamic networks
Multi-pass search
Multi-pass decoding with N-best lists
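In multi-pass decoding with N-best lists, a first pass produces the N best hypotheses together with their scores. A second pass rescores each hypothesis with a more expensive model, such as a higher-order language model, and re-ranks them. Below is a minimal Python sketch. The combined score is the acoustic log-probability plus a language-model weight times the new LM log-probability, a common log-linear form. The weight value, the hypotheses and their scores are all hypothetical.

```python
def rescore_nbest(nbest, new_lm_logprob, lm_weight=10.0):
    """Re-rank first-pass hypotheses using a second-pass language model.

    nbest: list of (words, acoustic_logprob) from the first pass.
    new_lm_logprob: callable mapping a word tuple to log Pr(W).
    """
    rescored = [(am + lm_weight * new_lm_logprob(words), words)
                for words, am in nbest]
    rescored.sort(key=lambda x: x[0], reverse=True)  # best combined score first
    return [words for _, words in rescored]

# Hypothetical first-pass output and second-pass LM scores
nbest = [(("bob", "read"), -100.0), (("rob", "read"), -101.0), (("bond", "read"), -102.0)]
toy_lm = {("bob", "read"): -5.0, ("rob", "read"): -3.0, ("bond", "read"): -8.0}
print(rescore_nbest(nbest, toy_lm.get))
```

Here the second-pass LM promotes "rob read" above the first-pass winner "bob read". Its combined score is -131, against -150.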
Multi-pass decoding with lattices
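A word lattice stores many alternative hypotheses compactly as a directed acyclic graph of word arcs, so a second pass can rescore far more alternatives than an N-best list holds. The following is a minimal Python sketch of finding the best path through a lattice under a bigram LM. Because an arc's bigram score depends on the preceding word, the search state is (node, previous word). The lattice, the scores, the `<s>` start symbol and the finite `unseen_logprob` floor for missing bigrams are hypothetical. A real LM would back off instead, and end-of-sentence scoring is omitted. Nodes are assumed to be numbered in topological order.

```python
from collections import defaultdict

def lattice_best_path(arcs, start, final, bigram, lm_weight=10.0,
                      unseen_logprob=-1e9):
    """Best (score, words) through an acyclic word lattice under a bigram LM.

    arcs: list of (from_node, to_node, word, acoustic_logprob), with every arc
    going from a lower-numbered node to a higher-numbered one.
    bigram[(prev, word)]: log Pr(word | prev); "<s>" is the start history.
    """
    out = defaultdict(list)
    for arc in arcs:
        out[arc[0]].append(arc)
    best = {(start, "<s>"): (0.0, [])}  # (node, prev word) -> (score, words)
    for node in sorted({arc[0] for arc in arcs}):   # topological order
        for (n, prev), (score, words) in list(best.items()):
            if n != node:
                continue
            for _, nxt, word, am in out[node]:
                lm = bigram.get((prev, word), unseen_logprob)
                cand = score + am + lm_weight * lm
                key = (nxt, word)
                if key not in best or cand > best[key][0]:
                    best[key] = (cand, words + [word])
    return max(v for (n, _), v in best.items() if n == final)

lattice = [(0, 1, "bob", -10.0), (0, 1, "rob", -9.0),
           (1, 2, "read", -5.0), (1, 2, "ate", -6.0)]
bigram = {("<s>", "bob"): -0.69, ("<s>", "rob"): -2.30,
          ("bob", "read"): -0.69, ("bob", "ate"): -0.69,
          ("rob", "read"): -0.11, ("rob", "ate"): -2.30}
print(lattice_best_path(lattice, 0, 2, bigram)[1])
```

With the LM switched off (`lm_weight=0.0`), the acoustically best path "rob read" wins. With the LM weighted in, "bob read" overtakes it.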