SLIDE 1

Automatic Speech Recognition (CS753)

Lecture 4: WFST algorithms contd. + WFSTs in ASR

Instructor: Preethi Jyothi
August 3, 2017

SLIDE 2

Quiz-1 Postmortem

  • Common mistakes:
      • Missing insertions/deletions in E.fst
      • Forgot to mark final states/self-loops
      • Output vocabulary for T.fst has to be complete words, “bad”, “bead”, etc., rather than letters

[Figure: number of Correct vs. Incorrect answers for (a) E.fst and (b) T.fst; y-axis from 10 to 50]

SLIDE 3

Project Proposal

  • Start brainstorming!
  • In case of doubt, discuss potential ideas with me during my office hours (Thur, 5:00 pm to 6:30 pm)
  • Once decided, you will have to fill out a form specifying:
      • Title of the project
      • Names/roll numbers of all project members
      • A 300-400 word abstract of the proposed project
  • Due by 11:59 pm on Aug 14th
SLIDE 4

Composition: Recap

  • If T1 transduces x to z, and T2 transduces z to y, then T1 ∘ T2 transduces x to y
  • Note: output alphabet of T1 ⊆ input alphabet of T2
  • E.g. if T1 removes punctuation symbols from a string, and T2 changes uppercase letters to lowercase letters, then T1 ∘ T2 brings about both changes

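A minimal sketch of composition over the tropical semiring (weights add along a path, the minimum is kept across paths), assuming epsilon-free transducers stored as plain Python dicts. The `FST` class, `compose` and `transduce` are hypothetical helpers, not OpenFst's API. The slide's punctuation example needs ε outputs (deleting a symbol), and handling ε correctly requires a composition filter that this sketch omits, so the toy transducers below only relabel symbols.

```python
from collections import deque

# Hypothetical representation: arcs[state] = list of (in_label, out_label, weight, next_state).
class FST:
    def __init__(self, start, finals, arcs):
        self.start = start      # start state
        self.finals = finals    # final state -> final weight
        self.arcs = arcs        # state -> list of arcs

def compose(t1, t2):
    """T1 ∘ T2 for epsilon-free FSTs: pair an arc of T1 with an arc of T2
    whenever T1's output label equals T2's input label."""
    start = (t1.start, t2.start)
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        q1, q2 = state
        arcs[state] = []
        if q1 in t1.finals and q2 in t2.finals:
            finals[state] = t1.finals[q1] + t2.finals[q2]
        for a, b, w1, n1 in t1.arcs.get(q1, []):
            for b2, c, w2, n2 in t2.arcs.get(q2, []):
                if b == b2:                       # T1's output feeds T2's input
                    nxt = (n1, n2)
                    arcs[state].append((a, c, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return FST(start, finals, arcs)

def transduce(t, inp):
    """Map an input string to {output_string: best weight} by following arcs."""
    frontier = {(t.start, ""): 0.0}
    for sym in inp:
        nxt = {}
        for (q, out), w in frontier.items():
            for a, b, aw, n in t.arcs.get(q, []):
                if a == sym:
                    key = (n, out + b)
                    nxt[key] = min(nxt.get(key, float("inf")), w + aw)
        frontier = nxt
    result = {}
    for (q, out), w in frontier.items():
        if q in t.finals:
            result[out] = min(result.get(out, float("inf")), w + t.finals[q])
    return result

# Toy transducers: T1 swaps a <-> b; T2 rewrites a -> x and b -> y.
T1 = FST(0, {0: 0.0}, {0: [("a", "b", 1.0, 0), ("b", "a", 0.0, 0)]})
T2 = FST(0, {0: 0.0}, {0: [("a", "x", 0.0, 0), ("b", "y", 0.5, 0)]})
T12 = compose(T1, T2)
print(transduce(T12, "ab"))   # {'yx': 1.5}
```

Applying T1 to "ab" gives "ba" (weight 1.0), and T2 maps "ba" to "yx" (weight 0.5); the composed machine produces the same output with the summed weight in a single pass.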
SLIDE 5

Determinization: Recap

  • A (W)FST is deterministic if:
      • Unique start state
      • No two transitions from a state share the same input label
      • No epsilon input labels
  • Not all WFSAs can be determinized
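The three conditions in the definition can be checked mechanically. A minimal sketch, assuming a hypothetical dict representation (`arcs[state]` is a list of `(input_label, output_label, weight, next_state)`) and `"<eps>"` as the epsilon label; only input labels matter, so a transducer may still emit different outputs.

```python
EPS = "<eps>"

def is_deterministic(start_states, arcs):
    """Check the slide's three conditions.
    start_states: collection of initial states.
    arcs: state -> list of (input_label, output_label, weight, next_state)."""
    if len(start_states) != 1:                 # unique start state
        return False
    for state_arcs in arcs.values():
        labels = [arc[0] for arc in state_arcs]
        if EPS in labels:                      # no epsilon input labels
            return False
        if len(labels) != len(set(labels)):    # no two arcs share an input label
            return False
    return True

A = {0: [("a", "x", 0.0, 1), ("b", "y", 1.0, 2)], 1: [("a", "x", 0.5, 2)]}
B = {0: [("a", "x", 0.0, 1), ("a", "y", 1.0, 2)]}          # two arcs share input "a"
C = {0: [(EPS, "x", 0.0, 1)], 1: [("a", "x", 0.0, 1)]}     # epsilon input label
print(is_deterministic({0}, A), is_deterministic({0}, B), is_deterministic({0}, C))
# True False False
```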
SLIDE 6

Determinization: Weighted FSA

Some weighted FSAs are not determinizable! [M97]

  • Weight of string a^n b = n and weight of a^n c = 2n

[Figure: a WFSA that is not determinizable]

  • After seeing a^n, an FSA can’t remember n

[M97] M. Mohri. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, 23(2), 1997.

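To see why n must be remembered, here is a small sketch under an assumed reconstruction of the example: from a start state 0, one branch loops on a with weight 1 and exits on b, the other loops on a with weight 2 and exits on c (tropical weights, added along a path). The automaton, its state numbers and the helper names are hypothetical. After reading a^n, the best weights in the two branches differ by n, and a deterministic machine would have to carry that gap in its state, for every n.

```python
import math

# Assumed example WFSA: arcs[state] = list of (label, weight, next_state).
ARCS = {
    0: [("a", 1, 1), ("a", 2, 2)],
    1: [("a", 1, 1), ("b", 0, 3)],
    2: [("a", 2, 2), ("c", 0, 3)],
}
FINALS = {3: 0}

def best_weights(s):
    """Return {state: min weight of a path from state 0 that reads s}."""
    cur = {0: 0}
    for sym in s:
        nxt = {}
        for q, w in cur.items():
            for label, aw, n in ARCS.get(q, []):
                if label == sym:
                    nxt[n] = min(nxt.get(n, math.inf), w + aw)
        cur = nxt
    return cur

def string_weight(s):
    reach = best_weights(s)
    return min((w + FINALS[q] for q, w in reach.items() if q in FINALS), default=math.inf)

for n in range(1, 5):
    after = best_weights("a" * n)
    # weight of a^n b, weight of a^n c, and the gap between the two branches after a^n
    print(n, string_weight("a" * n + "b"), string_weight("a" * n + "c"), after[2] - after[1])
```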
SLIDE 7

Determinization: Recap

  • A (W)FST is deterministic if:
      • Unique start state
      • No two transitions from a state share the same input label
      • No epsilon input labels
  • Not all WFSAs can be determinized
  • Determinization is guaranteed to yield a deterministic WFSA under some technical conditions characterising the automata (e.g. the twins property)

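A minimal sketch of weighted determinization (weighted subset construction) over the tropical semiring, assuming an epsilon-free WFSA in a hypothetical dict representation. Each new state is a set of `(original_state, residual_weight)` pairs: the new arc carries the smallest weight among the matching arcs, and each target remembers how much weight it still owes. The `max_states` cap is an illustrative safeguard, since on a non-determinizable input such as the a^n b / a^n c example the residuals never repeat. Real toolkits typically compare weights up to a small tolerance rather than exactly.

```python
from collections import deque

def determinize(start, arcs, finals, max_states=100):
    """Weighted subset construction (tropical semiring) for an epsilon-free WFSA.
    arcs[q] = list of (label, weight, next_state); finals[q] = final weight."""
    init = frozenset({(start, 0.0)})
    d_arcs, d_finals = {}, {}
    queue, seen = deque([init]), {init}
    while queue:
        subset = queue.popleft()
        final_ws = [r + finals[q] for q, r in subset if q in finals]
        if final_ws:
            d_finals[subset] = min(final_ws)
        by_label = {}                               # group outgoing arcs by label
        for q, r in subset:
            for label, w, nxt in arcs.get(q, []):
                by_label.setdefault(label, []).append((r + w, nxt))
        d_arcs[subset] = {}
        for label, cands in by_label.items():
            w_out = min(w for w, _ in cands)        # weight placed on the new arc
            residual = {}
            for w, nxt in cands:                    # leftover weight per target state
                residual[nxt] = min(residual.get(nxt, float("inf")), w - w_out)
            target = frozenset(residual.items())
            d_arcs[subset][label] = (w_out, target)
            if target not in seen:
                if len(seen) >= max_states:
                    raise RuntimeError("state limit hit: WFSA may not be determinizable")
                seen.add(target)
                queue.append(target)
    return init, d_arcs, d_finals

def det_weight(det, s):
    """Weight of string s in the deterministic result (at most one path)."""
    state, d_arcs, d_finals = det
    total = 0.0
    for sym in s:
        if sym not in d_arcs[state]:
            return float("inf")
        w, state = d_arcs[state][sym]
        total += w
    return total + d_finals.get(state, float("inf"))

GOOD = {0: [("a", 1, 1), ("a", 2, 2)], 1: [("b", 1, 3)], 2: [("b", 3, 3), ("c", 0, 3)]}
BAD = {0: [("a", 1, 1), ("a", 2, 2)], 1: [("a", 1, 1), ("b", 0, 3)], 2: [("a", 2, 2), ("c", 0, 3)]}
det = determinize(0, GOOD, {3: 0})
print(det_weight(det, "ab"), det_weight(det, "ac"))   # 2.0 2.0
try:
    determinize(0, BAD, {3: 0}, max_states=50)
except RuntimeError as err:
    print(err)
```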
SLIDE 8

Minimization

Minimization: find an equivalent deterministic FSA with the least number of states (and transitions)

Unweighted FSAs have a unique minimal FSA, up to renaming of states [Aho74]

[Figure: a deterministic FSA and its equivalent minimal FSA]

  • Obtained by identifying and merging equivalent states

[Aho74] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.

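Identifying equivalent states can be done by partition refinement. Below is a minimal sketch in the style of Moore's algorithm (Hopcroft's algorithm is the faster classic alternative), assuming a complete DFA given as a hypothetical `delta[(state, symbol)]` dict. It starts from the split final vs. non-final and keeps splitting blocks whose states lead to different blocks, until nothing changes.

```python
def minimize(states, alphabet, delta, finals):
    """Moore-style partition refinement for a complete DFA.
    delta[(state, symbol)] -> next state. Returns state -> block id;
    states sharing a block id are equivalent and can be merged."""
    block = {q: int(q in finals) for q in states}     # start: final vs non-final
    while True:
        # Signature: a state's current block plus the blocks its arcs lead to.
        sig = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
               for q in states}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new_block = {q: ids[sig[q]] for q in states}
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block                          # no block was split: stable
        block = new_block

STATES = [0, 1, 2, 3]
DELTA = {(0, "a"): 1, (0, "b"): 2,
         (1, "a"): 3, (1, "b"): 3,
         (2, "a"): 3, (2, "b"): 3,
         (3, "a"): 3, (3, "b"): 3}
blocks = minimize(STATES, "ab", DELTA, {3})
print(blocks)   # {0: 0, 1: 1, 2: 1, 3: 2}
```

States 1 and 2 land in the same block, so the minimal DFA has three states.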
SLIDE 9

Minimization: Weighted FSA

Two states are equivalent only if, for every input string, the outcome (the weight assigned to the string, if it is accepted) starting from the two states is the same

[Figure: a small example WFSA]

Redistribute weights before identifying equivalent states

SLIDE 10

Minimization: Weighted FSA

Reweighting is OK as long as the resulting WFSA is equivalent

Can reweight using a “potential function” on states

  • “Weight pushing”: reweighting using a potential function that optimally moves weights towards the start state

[Figure: a WFSA before and after weight pushing, with state potentials +2, +1, +1]
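A minimal sketch of weight pushing in the tropical semiring, using as potential V(q) the weight of the best path from q to a final state (including the final weight). Each arc q → n with weight w becomes w + V(n) − V(q), each final weight f becomes f − V(q), and V(start) becomes an initial weight, so every accepted string keeps its total weight. The dict representation is hypothetical, and the sketch assumes no negative cycles and that every state can reach a final state (otherwise V is infinite).

```python
import math

def potentials(states, arcs, finals):
    """V[q]: best (minimum) weight from q to a final state, final weight included.
    Bellman-Ford-style relaxation; assumes no negative cycles."""
    V = {q: finals.get(q, math.inf) for q in states}
    for _ in range(len(states)):
        changed = False
        for q in states:
            for _label, w, nxt in arcs.get(q, []):
                if w + V[nxt] < V[q]:
                    V[q] = w + V[nxt]
                    changed = True
        if not changed:
            break
    return V

def push_weights(start, states, arcs, finals):
    """Reweight every arc and final weight with the potential V; return the
    start state's potential as the new initial weight."""
    V = potentials(states, arcs, finals)
    new_arcs = {q: [(l, w + V[n] - V[q], n) for l, w, n in arcs.get(q, [])] for q in states}
    new_finals = {q: f - V[q] for q, f in finals.items()}
    return V[start], new_arcs, new_finals

ARCS = {0: [("a", 1, 1), ("b", 2, 2)], 1: [("c", 3, 3)], 2: [("c", 2, 3)]}
initial, pushed, pushed_finals = push_weights(0, [0, 1, 2, 3], ARCS, {3: 0})
print(initial, pushed)
# 4 {0: [('a', 0, 1), ('b', 0, 2)], 1: [('c', 0, 3)], 2: [('c', 0, 3)], 3: []}
```

Both "ac" and "bc" still weigh 4 (1 + 3 and 2 + 2 before; 4 + 0 + 0 after), but states 1 and 2 now carry identical outgoing arcs.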
SLIDE 11

Minimization: Weighted FSA

After weight pushing, we can simply apply unweighted FSA minimization (treating each label/weight pair as a single label)

Guaranteed to yield a minimal WFSA (under some technical conditions required for weight pushing)

[Figure: the weight-pushed WFSA after minimization]

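A minimal sketch of that last step, assuming a deterministic WFSA in a hypothetical dict representation: ordinary partition refinement where each `(label, weight)` pair acts as one symbol and final weights are part of a state's signature. Run on a small WFSA before and after pushing, it finds no mergeable states before pushing and merges two states after. The initial weight produced by pushing (4 here) has to be kept alongside the result.

```python
def minimize_weighted(states, arcs, finals):
    """Partition refinement on a deterministic WFSA, treating each (label, weight)
    pair as one symbol. arcs[q] = list of (label, weight, next_state);
    finals[q] = final weight. Returns state -> block id."""
    block = {q: 0 for q in states}
    while True:
        sig = {q: (finals.get(q),
                   tuple(sorted((l, w, block[n]) for l, w, n in arcs.get(q, []))))
               for q in states}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values()), key=repr))}
        new_block = {q: ids[sig[q]] for q in states}
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block                 # partition is stable
        block = new_block

STATES = [0, 1, 2, 3]
BEFORE = {0: [("a", 1, 1), ("b", 2, 2)], 1: [("c", 3, 3)], 2: [("c", 2, 3)]}
AFTER = {0: [("a", 0, 1), ("b", 0, 2)], 1: [("c", 0, 3)], 2: [("c", 0, 3)]}
print(len(set(minimize_weighted(STATES, BEFORE, {3: 0}).values())))  # 4
print(len(set(minimize_weighted(STATES, AFTER, {3: 0}).values())))   # 3
```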
SLIDE 12

Toolkits to work with finite-state machines

  • AT&T FSM Library (no longer supported)
    http://www3.cs.stonybrook.edu/~algorith/implement/fsm/implement.shtml
  • RWTH FSA Toolkit
    https://www-i6.informatik.rwth-aachen.de/~kanthak/fsa.html
  • Carmel
    https://www.isi.edu/licensed-sw/carmel/
  • MIT FST Toolkit
    http://people.csail.mit.edu/ilh/fst/
  • OpenFst Toolkit (actively supported)
    http://www.openfst.org/twiki/bin/view/FST/WebHome

SLIDE 13

Brief Introduction to OpenFst

SLIDE 14

Quick Intro to OpenFst (www.openfst.org)

[Figure: a small FST with arcs 0 → 1 an:a, 1 → 2 <eps>:n and 0 → 2 a:a; state 2 is final]

A.txt:
0 1 an a
1 2 <eps> n
0 2 a a
2

Input alphabet (in.txt):
<eps> 0
an 1
a 2

Output alphabet (out.txt):
<eps> 0
a 1
n 2

“0” label is reserved for epsilon
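Before running `fstcompile`, every label used in A.txt must appear in the matching symbol table. A minimal sketch that builds the two tables (with `<eps>` reserved as 0) and checks A.txt against them before writing the three files. `symbol_table` and `missing_labels` are hypothetical helpers, not part of OpenFst.

```python
def symbol_table(symbols):
    """Symbol-table text: one 'symbol id' pair per line, <eps> reserved as 0."""
    return "".join(f"{s} {i}\n" for i, s in enumerate(["<eps>"] + symbols))

def missing_labels(fst_text, isyms, osyms):
    """Labels used on arc lines (src dst ilabel olabel [weight]) that lack a table entry."""
    known_in = {line.split()[0] for line in isyms.splitlines()}
    known_out = {line.split()[0] for line in osyms.splitlines()}
    missing = set()
    for line in fst_text.splitlines():
        f = line.split()
        if len(f) >= 4:                      # arc line; shorter lines are final states
            if f[2] not in known_in:
                missing.add(("in", f[2]))
            if f[3] not in known_out:
                missing.add(("out", f[3]))
    return missing

A_TXT = "0 1 an a\n1 2 <eps> n\n0 2 a a\n2\n"
IN_TXT = symbol_table(["an", "a"])
OUT_TXT = symbol_table(["a", "n"])
assert not missing_labels(A_TXT, IN_TXT, OUT_TXT)

for name, text in [("A.txt", A_TXT), ("in.txt", IN_TXT), ("out.txt", OUT_TXT)]:
    with open(name, "w") as fh:
        fh.write(text)
```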

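SLIDE 15

Quick Intro to OpenFst (www.openfst.org)

[Figure: the same FST with weights: 0 → 1 an:a/0.5, 1 → 2 <eps>:n/1.0 and 0 → 2 a:a/0.5; final state 2 has final weight 0.1]

A.txt:
0 1 an a 0.5
1 2 <eps> n 1.0
0 2 a a 0.5
2 0.1
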
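To make the text format concrete, a minimal sketch that reads it and lists what the FST does: arc lines are `src dst ilabel olabel [weight]`, final-state lines are `state [weight]`, the source state of the first line is the start state, and a missing weight defaults to 0 (the "one" of the tropical semiring). `parse_att` and `paths` are hypothetical helpers, and `paths` assumes an acyclic FST.

```python
def parse_att(text):
    """Parse OpenFst/AT&T text format into (start, arcs, finals)."""
    arcs, finals, start = {}, {}, None
    for line in text.strip().splitlines():
        f = line.split()
        if start is None:
            start = int(f[0])                # first line's source is the start state
        if len(f) >= 4:                      # arc line
            w = float(f[4]) if len(f) == 5 else 0.0
            arcs.setdefault(int(f[0]), []).append((f[2], f[3], w, int(f[1])))
        else:                                # final-state line
            finals[int(f[0])] = float(f[1]) if len(f) == 2 else 0.0
    return start, arcs, finals

def paths(start, arcs, finals):
    """(input, output, weight) for every accepting path of an acyclic FST,
    dropping <eps> labels and adding weights along the path."""
    def walk(q, ins, outs, w):
        if q in finals:
            yield " ".join(ins), " ".join(outs), w + finals[q]
        for i, o, aw, n in arcs.get(q, []):
            yield from walk(n, ins + ([i] if i != "<eps>" else []),
                            outs + ([o] if o != "<eps>" else []), w + aw)
    return list(walk(start, [], [], 0.0))

WEIGHTED = "0 1 an a 0.5\n1 2 <eps> n 1.0\n0 2 a a 0.5\n2 0.1\n"
for inp, out, w in paths(*parse_att(WEIGHTED)):
    print(f"{inp!r} -> {out!r} (weight {w:.1f})")
```

The word "an" comes out as the letters "a n" (weight 0.5 + 1.0 + 0.1), and "a" as "a" (weight 0.5 + 0.1).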
SLIDE 16

Compiling & Printing FSTs

The text FSTs need to be “compiled” into binary objects before further use with OpenFst utilities

  • Command used to compile:

fstcompile --isymbols=in.txt --osymbols=out.txt A.txt A.fst

  • Get back the text FST using a print command with the binary file:

fstprint --isymbols=in.txt --osymbols=out.txt A.fst A.txt

SLIDE 17

Drawing FSTs

Small FSTs can be visualized easily using the draw tool:

fstdraw --isymbols=in.txt --osymbols=out.txt A.fst | dot -Tpdf > A.pdf

[Figure: fstdraw output for A.fst, with arcs an:a, a:a and <eps>:n]

SLIDE 18

FSTs can get very large!

SLIDE 19

WFSTs applied to ASR

SLIDE 20

WFST-based ASR System

[Figure: cascade Acoustic Indices → (Acoustic Models) → Triphones → (Context Transducer) → Monophones → (Pronunciation Model) → Words → (Language Model) → Word Sequence]

SLIDE 21

WFST-based ASR System

H

[Figure: one 3-state HMM for each triphone (a/a_b, b/a_b, ..., x/y_z), with arcs labelled by HMM-state indices such as f1:ε, f2:ε, f3:ε; FST union + closure of these HMMs gives the resulting FST H]

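A minimal sketch of the union + closure construction, assuming hypothetical HMM-state ids 1 to 6 and a hypothetical dict representation. In this sketch the first frame of each HMM emits the triphone label (the lecture figure may place the output label differently), and closure is implemented without ε arcs by copying the start state's arcs onto each final HMM state. The transducer then collapses a frame-level sequence of HMM-state ids into a triphone sequence.

```python
from collections import defaultdict

EPS = "<eps>"

def build_H(triphones):
    """Union + closure of one 3-state left-to-right HMM per triphone.
    triphones: triphone label -> (s1, s2, s3), hypothetical HMM-state ids.
    Returns (start, finals, arcs) with arcs[state] = list of (input, output, next)."""
    arcs = defaultdict(list)
    start = "start"
    entry = []                                   # union: one entry arc per triphone HMM
    for tri, (s1, s2, s3) in triphones.items():
        q1, q2, q3 = (tri, 1), (tri, 2), (tri, 3)
        entry.append((s1, tri, q1))              # first frame emits the triphone label
        arcs[q1] += [(s1, EPS, q1), (s2, EPS, q2)]   # self-loop, then advance
        arcs[q2] += [(s2, EPS, q2), (s3, EPS, q3)]
        arcs[q3] += [(s3, EPS, q3)]
    arcs[start] = list(entry)
    for tri in triphones:                        # closure without epsilon arcs:
        arcs[(tri, 3)] += entry                  # a finished HMM may start the next one
    finals = {start} | {(tri, 3) for tri in triphones}
    return start, finals, dict(arcs)

def transduce(start, finals, arcs, inputs):
    """All output sequences for a sequence of HMM-state ids."""
    frontier = {(start, ())}
    for sym in inputs:
        frontier = {(n, out + ((o,) if o != EPS else ()))
                    for q, out in frontier
                    for i, o, n in arcs.get(q, [])
                    if i == sym}
    return {out for q, out in frontier if q in finals}

START, FINALS, ARCS = build_H({"a/a_b": (1, 2, 3), "b/a_b": (4, 5, 6)})
print(transduce(START, FINALS, ARCS, [1, 1, 2, 3, 3, 4, 5, 5, 6]))
# {('a/a_b', 'b/a_b')}
```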
SLIDE 22

WFST-based ASR System

C

[Figure: inverse context-dependency transducer C⁻¹ over the phones x and y; states are labelled by phone pairs such as (ε,*), (x,ε), (x,y), (y,x), and arcs carry labels such as x:x/ε_ε, x:x/ε_y, y:y/x_ε, y:y/y_y]

Figure reproduced from “Weighted Finite State Transducers in Speech Recognition”, Mohri et al., 2002

Arc labels of C⁻¹: “monophone : phone / left-context_right-context”

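At the string level, C⁻¹ rewrites each phone as the same phone annotated with its left and right neighbours, in the "phone/left_right" label format, with ε at the utterance boundaries. A minimal sketch of that mapping (the function name is hypothetical). A finite-state implementation cannot see the right context when it reads a phone, which is why the transducer's states track phone pairs; this function sidesteps that by looking at the whole sequence at once.

```python
EPS = "ε"

def to_triphones(phones):
    """Label each phone with its left and right neighbours as 'phone/left_right',
    padding both ends of the utterance with ε."""
    padded = [EPS, *phones, EPS]
    return [f"{padded[i]}/{padded[i - 1]}_{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]

print(to_triphones(["x", "y", "y"]))   # ['x/ε_y', 'y/x_y', 'y/y_ε']
```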
SLIDE 23

WFST-based ASR System

L

[Figure, panels (a) and (b): pronunciation lexicon transducer L. Panel (b) has arcs 0 → 1 d:data/1, 1 → 2 ey:ε/0.5 and ae:ε/0.5, 2 → 3 t:ε/0.3 and dx:ε/0.7, 3 → 4 ax:ε/1 (pronunciations of “data”), and 0 → 5 d:dew/1, 5 → 6 uw:ε/1 (pronunciation of “dew”)]

Figure reproduced from “Weighted Finite State Transducers in Speech Recognition”, Mohri et al., 2002

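Reading the lexicon's arc weights as pronunciation probabilities (the weights leaving each branching state sum to 1), one can enumerate every path to list the pronunciations of each word with their probabilities. A minimal sketch, assuming the state numbering of panel (b) with 0 as the start state and a hypothetical dict representation; the word is emitted on the first arc and later arcs output ε (`None` here).

```python
import math

# state -> list of (phone, word_or_None, probability, next_state)
LEXICON = {
    0: [("d", "data", 1.0, 1), ("d", "dew", 1.0, 5)],
    1: [("ey", None, 0.5, 2), ("ae", None, 0.5, 2)],
    2: [("t", None, 0.3, 3), ("dx", None, 0.7, 3)],
    3: [("ax", None, 1.0, 4)],
    5: [("uw", None, 1.0, 6)],
}
FINALS = {4, 6}

def pronunciations(state=0, phones=(), word=None, prob=1.0):
    """Yield (word, phone string, probability) for every path to a final state."""
    if state in FINALS:
        yield word, " ".join(phones), prob
    for phone, out, p, nxt in LEXICON.get(state, []):
        yield from pronunciations(nxt, phones + (phone,), out or word, prob * p)

results = list(pronunciations())
for word, phones, prob in results:
    print(f"{word}: {phones} ({prob:.2f})")
total = sum(p for w, _, p in results if w == "data")
print("data total:", math.isclose(total, 1.0))
```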
SLIDE 24

WFST-based ASR System

G

[Figure: language model acceptor G with arcs labelled the, birds/0.404, animals/1.789, boy/1.789, are/0.693, were/0.693, is, walking]