

SLIDE 1

Automatic Speech Recognition (CS753)

Lecture 5: Hidden Markov Models (Part I)

Instructor: Preethi Jyothi

SLIDE 2

OpenFst Cheat Sheet

SLIDE 3
[Figure: an example FST written in OpenFst text format (A.txt), together with its input alphabet (in.txt) and output alphabet (out.txt). Each arc line gives source state, destination state, input label, and output label; the label <eps> is reserved for epsilon.]
Quick Intro to OpenFst (www.openfst.org)
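The exact contents of A.txt are hard to recover from this page, but the arc labels that reappear in the fstdraw output on a later slide (an:a, a:a, <eps>:n) suggest a small three-state transducer. A plausible sketch, with the state numbering and topology being assumptions, is:

0 1 an a
1 2 <eps> n
0 2 a a
2

The symbol-table files list each symbol with an integer id, with <eps> assigned id 0; in.txt might, for example, contain the lines "<eps> 0", "a 1", "an 2".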

SLIDE 4
[Figure: the same FST with weights added. Each arc line carries an extra weight field (0.5 on the an:a and a:a arcs, 1.0 on the <eps>:n arc), and state 2 is final with weight 0.1, drawn as 2/0.1.]

Quick Intro to OpenFst (www.openfst.org)
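In text form, the weighted FST adds a fifth field (the arc weight) to each arc line and a weight on the final-state line. Reading the weights off the figure (0.5, 1.0, 0.5, final weight 0.1), and keeping the same assumed state numbering as before, A.txt would look roughly like:

0 1 an a 0.5
1 2 <eps> n 1.0
0 2 a a 0.5
2 0.1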

SLIDE 5

Compiling & Printing FSTs

The text FSTs need to be “compiled” into binary objects before further use with OpenFst utilities

  • Command used to compile:

fstcompile --isymbols=in.txt --osymbols=out.txt A.txt A.fst

  • Get back the text FST from the binary file using the print command:

fstprint --isymbols=in.txt --osymbols=out.txt A.fst A.txt

SLIDE 6

Drawing FSTs

Small FSTs can be visualized easily using the draw tool:

fstdraw --isymbols=in.txt --osymbols=out.txt A.fst | dot -Tpdf > A.pdf

[Figure: the rendered FST from A.pdf, with arcs labeled an:a, a:a, and <eps>:n.]

SLIDE 7

Fairly large FST!

SLIDE 8

Hidden Markov Models (HMMs)

Following slides contain figures/material from “Hidden Markov Models”, Chapter 9 of “Speech and Language Processing”, D. Jurafsky and J. H. Martin, 2016 (https://web.stanford.edu/~jurafsky/slp3/9.pdf).

SLIDE 9

Markov Chains

Q = q_1 q_2 ... q_N: a set of N states

A = a_01 a_02 ... a_n1 ... a_nn: a transition probability matrix A, each a_ij representing the probability of moving from state i to state j, s.t. \sum_{j=1}^{n} a_{ij} = 1 \;\forall i

q_0, q_F: a special start state and end (final) state that are not associated with observations

π = π_1, π_2, ..., π_N: an initial probability distribution over states. π_i is the probability that the Markov chain will start in state i. Some states j may have π_j = 0, meaning that they cannot be initial states. Also \sum_{i=1}^{n} \pi_i = 1

Q_A = {q_x, q_y, ...}: a set Q_A ⊂ Q of legal accepting states

SLIDE 10

Hidden Markov Model

Q = q_1 q_2 ... q_N: a set of N states

A = a_11 a_12 ... a_n1 ... a_nn: a transition probability matrix A, each a_ij representing the probability of moving from state i to state j, s.t. \sum_{j=1}^{n} a_{ij} = 1 \;\forall i

O = o_1 o_2 ... o_T: a sequence of T observations, each one drawn from a vocabulary V = v_1, v_2, ..., v_V

B = b_i(o_t): a sequence of observation likelihoods, also called emission probabilities, each expressing the probability of an observation o_t being generated from a state i

q_0, q_F: a special start state and end (final) state that are not associated with observations, together with transition probabilities a_01 a_02 ... a_0n out of the start state and a_1F a_2F ... a_nF into the end state

SLIDE 11

HMM Assumptions

[Figure: the ice-cream HMM, with emitting states HOT (1) and COLD (2) plus non-emitting start (0) and end (3) states. Emission probabilities: B1 = {P(1 | HOT) = .2, P(2 | HOT) = .4, P(3 | HOT) = .4}; B2 = {P(1 | COLD) = .5, P(2 | COLD) = .4, P(3 | COLD) = .1}. Transition probabilities: .8/.2 from start to HOT/COLD, .6/.3/.1 from HOT to HOT/COLD/end, and .5/.4/.1 from COLD to COLD/HOT/end.]

Markov Assumption: P(q_i | q_1 ... q_{i-1}) = P(q_i | q_{i-1})

Output Independence: P(o_i | q_1, ..., q_i, ..., q_T, o_1, ..., o_i, ..., o_T) = P(o_i | q_i)
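As a sanity check on the figure above, here is a minimal Python sketch of the same toy parameters; the state names 'H'/'C' and the dictionary layout are my own choices, not part of the slides.

# Toy ice-cream HMM parameters read off the figure above.
A = {'start': {'H': 0.8, 'C': 0.2},          # transitions out of the start state
     'H': {'H': 0.6, 'C': 0.3, 'end': 0.1},  # transitions out of HOT
     'C': {'H': 0.4, 'C': 0.5, 'end': 0.1}}  # transitions out of COLD
B = {'H': {1: 0.2, 2: 0.4, 3: 0.4},          # P(num. ice creams | HOT)
     'C': {1: 0.5, 2: 0.4, 3: 0.1}}          # P(num. ice creams | COLD)

# Every row of A and B is a probability distribution, so each must sum to 1.
for dist in list(A.values()) + list(B.values()):
    assert abs(sum(dist.values()) - 1.0) < 1e-9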

SLIDE 12

Three problems for HMMs

Problem 1 (Likelihood): Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).

Problem 2 (Decoding): Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.

Problem 3 (Learning): Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B.

Computing Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).

SLIDE 13

Forward Trellis

[Figure: forward trellis for the observation sequence 3 1 3, unrolled over states q_0 = start, q_1 = C, q_2 = H, q_F = end. Arc scores shown: P(H | start)·P(3 | H) = .8·.4, P(C | start)·P(3 | C) = .2·.1, P(H | H)·P(1 | H) = .6·.2, P(C | C)·P(1 | C) = .5·.5, P(C | H)·P(1 | C) = .3·.5, P(H | C)·P(1 | H) = .4·.2. Forward values: α_1(2) = .32, α_1(1) = .02, α_2(2) = .32·.12 + .02·.08 = .040, α_2(1) = .32·.15 + .02·.25 = .053.]

\alpha_t(j) = P(o_1, o_2 \ldots o_t, q_t = j \mid \lambda)

\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i) a_{ij} b_j(o_t)

SLIDE 14

Forward Algorithm

  • 1. Initialization:

\alpha_1(j) = a_{0j} b_j(o_1), \quad 1 \le j \le N

  • 2. Recursion (since states 0 and F are non-emitting):

\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i) a_{ij} b_j(o_t); \quad 1 \le j \le N, \; 1 < t \le T

  • 3. Termination:

P(O \mid \lambda) = \alpha_T(q_F) = \sum_{i=1}^{N} \alpha_T(i) a_{iF}
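To make the recursion concrete, here is a minimal Python sketch of the forward algorithm on the ice-cream HMM from the earlier slides. The parameters are repeated so the snippet stands alone; the state names 'H'/'C' and the dict layout are my own choices. Its α_1 and α_2 values should match the .32, .02, .040 and .053 on the forward-trellis slide.

A = {'start': {'H': 0.8, 'C': 0.2},
     'H': {'H': 0.6, 'C': 0.3, 'end': 0.1},
     'C': {'H': 0.4, 'C': 0.5, 'end': 0.1}}
B = {'H': {1: 0.2, 2: 0.4, 3: 0.4},
     'C': {1: 0.5, 2: 0.4, 3: 0.1}}
states = ['H', 'C']

def forward(observations):
    # alpha[t][j] = P(o_1 ... o_t, q_t = j | lambda)
    alpha = [{j: A['start'][j] * B[j][observations[0]] for j in states}]
    for o_t in observations[1:]:
        alpha.append({j: sum(alpha[-1][i] * A[i][j] * B[j][o_t] for i in states)
                      for j in states})
    # Termination: sum over transitions into the non-emitting end state.
    return sum(alpha[-1][i] * A[i]['end'] for i in states), alpha

likelihood, alpha = forward([3, 1, 3])
print(alpha[0])    # alpha_1: {'H': 0.32, 'C': 0.02}
print(alpha[1])    # alpha_2: approximately {'H': 0.040, 'C': 0.053}
print(likelihood)  # P(O | lambda)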

SLIDE 15

Visualizing the forward recursion

[Figure: one step of the forward recursion. Columns of states q_1 ... q_N are shown at times t−2, t−1, t, and t+1; the arcs a_1j, a_2j, a_3j, ..., a_Nj from the column at time t−1 feed into state q_j at time t, giving α_t(j) = Σ_i α_{t−1}(i) a_ij b_j(o_t).]

SLIDE 16

Three problems for HMMs

Problem 1 (Likelihood): Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).

Problem 2 (Decoding): Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.

Problem 3 (Learning): Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B.

Decoding: Given as input an HMM λ = (A, B) and a sequence of observations O = o_1, o_2, ..., o_T, find the most probable sequence of states Q = q_1 q_2 q_3 ... q_T.

SLIDE 17

Viterbi Trellis

[Figure: Viterbi trellis for the observation sequence 3 1 3, over the same states and arc scores as the forward trellis. Viterbi values: v_1(2) = .32, v_1(1) = .02, v_2(2) = max(.32·.12, .02·.08) = .038, v_2(1) = max(.32·.15, .02·.25) = .048.]

v_t(j) = \max_{q_0, q_1, \ldots, q_{t-1}} P(q_0, q_1 \ldots q_{t-1}, o_1, o_2 \ldots o_t, q_t = j \mid \lambda)

v_t(j) = \max_{i=1}^{N} v_{t-1}(i) a_{ij} b_j(o_t)

SLIDE 18

Viterbi recursion

  • 1. Initialization:

v_1(j) = a_{0j} b_j(o_1), \quad 1 \le j \le N

bt_1(j) = 0

  • 2. Recursion (recall that states 0 and q_F are non-emitting):

v_t(j) = \max_{i=1}^{N} v_{t-1}(i) a_{ij} b_j(o_t); \quad 1 \le j \le N, \; 1 < t \le T

bt_t(j) = \operatorname*{argmax}_{i=1}^{N} v_{t-1}(i) a_{ij} b_j(o_t); \quad 1 \le j \le N, \; 1 < t \le T

  • 3. Termination:

The best score: P^* = v_T(q_F) = \max_{i=1}^{N} v_T(i) \cdot a_{iF}

The start of backtrace: q_T* = bt_T(q_F) = \operatorname*{argmax}_{i=1}^{N} v_T(i) \cdot a_{iF}
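A minimal Python sketch of this recursion, again on the ice-cream HMM and again with parameters repeated so it stands alone (state names 'H'/'C' are my own shorthand). Its v_1 and v_2 values should match the .32, .02, .038 and .048 on the Viterbi-trellis slide.

A = {'start': {'H': 0.8, 'C': 0.2},
     'H': {'H': 0.6, 'C': 0.3, 'end': 0.1},
     'C': {'H': 0.4, 'C': 0.5, 'end': 0.1}}
B = {'H': {1: 0.2, 2: 0.4, 3: 0.4},
     'C': {1: 0.5, 2: 0.4, 3: 0.1}}
states = ['H', 'C']

def viterbi(observations):
    # v[t][j]  = score of the single best path ending in state j at time t
    # bp[t][j] = backpointer: the best previous state for (t, j)
    v = [{j: A['start'][j] * B[j][observations[0]] for j in states}]
    bp = [{j: 'start' for j in states}]
    for o_t in observations[1:]:
        scores = {j: {i: v[-1][i] * A[i][j] * B[j][o_t] for i in states}
                  for j in states}
        v.append({j: max(scores[j].values()) for j in states})
        bp.append({j: max(scores[j], key=scores[j].get) for j in states})
    # Termination: the best path must also take a transition into the end state.
    final = {i: v[-1][i] * A[i]['end'] for i in states}
    best_last = max(final, key=final.get)
    # Backtrace: follow the stored backpointers from the last state to the first.
    path = [best_last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(bp[t][path[-1]])
    return final[best_last], list(reversed(path))

best_score, best_path = viterbi([3, 1, 3])
print(best_path)   # most probable hidden state sequence
print(best_score)  # P*, the probability of that single best path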

SLIDE 19

Viterbi backtrace

[Figure: Viterbi backtrace over the same trellis for 3 1 3, following the stored backpointers from the end state back through the best predecessor at each time step. The Viterbi values v_1(2) = .32, v_1(1) = .02, v_2(2) = .038, v_2(1) = .048 are as on the previous slide.]