HIDDEN MARKOV MODELS IN SPEECH RECOGNITION - Wayne Ward - PowerPoint PPT Presentation



SLIDE 1

HIDDEN MARKOV MODELS IN SPEECH RECOGNITION

Wayne Ward, Carnegie Mellon University, Pittsburgh, PA

SLIDE 2

Acknowledgements

Much of this talk is derived from the paper "An Introduction to Hidden Markov Models" by Rabiner and Juang, and from the talk "Hidden Markov Models: Continuous Speech Recognition" by Kai-Fu Lee.

SLIDE 3

Topics

  • Markov Models and Hidden Markov Models
  • HMMs applied to speech recognition
  • Training
  • Decoding

SLIDE 4

Speech Recognition

[Figure: recognition pipeline. Analog speech enters the Front End, which produces discrete observations O_1 O_2 … O_T; the Match/Search stage maps these to a word sequence W_1 W_2 … W_T.]

SLIDE 5

ML Continuous Speech Recognition

Goal: Given acoustic data A = a1, a2, ..., ak, find the word sequence W = w1, w2, ..., wn such that P(W|A) is maximized.

Bayes Rule:

  P(W|A) = P(A|W) · P(W) / P(A)

P(A|W) is the acoustic model (HMMs), P(W) is the language model, and P(A) is a constant for a complete sentence.

SLIDE 6

Markov Models

Elements:

  States: S = S_0, S_1, …, S_N
  Transition probabilities: P(q_t = S_i | q_t-1 = S_j) = a_ji

Markov Assumption: the transition probability depends only on the current state:

  P(q_t = S_i | q_t-1 = S_j, q_t-2 = S_k, …) = P(q_t = S_i | q_t-1 = S_j) = a_ji

  a_ji ≥ 0  ∀ j, i        Σ_i=0..N a_ji = 1  ∀ j

[Figure: two states A and B with transition probabilities P(A|A), P(B|A), P(A|B), P(B|B).]

SLIDE 7

Single Fair Coin

[Figure: two states, 1 and 2, with all transition probabilities equal to 0.5.]

State 1: P(H) = 1.0, P(T) = 0.0. State 2: P(H) = 0.0, P(T) = 1.0.
Outcome head corresponds to state 1, tail to state 2.
The observation sequence uniquely defines the state sequence.

SLIDE 8

Hidden Markov Models

[Figure: two states A and B with transition probabilities P(A|A), P(B|A), P(A|B), P(B|B); each state also carries an output probability distribution P(O_1|·), P(O_2|·), …, P(O_M|·).]

Elements:

  States: S = S_0, S_1, …, S_N
  Transition probabilities: P(q_t = S_i | q_t-1 = S_j) = a_ji
  Output probability distributions (at state j for symbol k): P(y_t = O_k | q_t = S_j) = b_j(k)

SLIDE 9

Discrete Observation HMM

[Figure: three states with output distributions
  State 1: P(R) = 0.31, P(B) = 0.50, P(Y) = 0.19
  State 2: P(R) = 0.50, P(B) = 0.25, P(Y) = 0.25
  State 3: P(R) = 0.38, P(B) = 0.12, P(Y) = 0.50]

  • Observation sequence: R B Y Y ••• R is not unique to a state sequence

SLIDE 10

HMMs In Speech Recognition

Represent speech as a sequence of observations. Use an HMM to model some unit of speech (phone, word). Concatenate units into larger units.

Phone model: /ih/. Word model ("did"): /d/ /ih/ /d/.

SLIDE 11

HMM Problems And Solutions

Evaluation:

  • Problem: compute probability of an observation sequence given a model
  • Solution: Forward Algorithm and Viterbi Algorithm

Decoding:

  • Problem: find the state sequence which maximizes the probability of the observation sequence
  • Solution: Viterbi Algorithm

Training:

  • Problem: adjust model parameters to maximize the probability of observed sequences
  • Solution: Forward-Backward Algorithm

SLIDE 12

Evaluation

O = O_1 O_2 … O_T is the observation sequence; Q = q_0 q_1 … q_T is a state sequence; N = number of states in the model; T = number of observations in the sequence.

The probability of the observation sequence given HMM model λ is:

  P(O|λ) = Σ_Q P(O, Q|λ) = Σ_Q a_q0,q1 b_q1(O_1) · a_q1,q2 b_q2(O_2) × … × a_qT-1,qT b_qT(O_T)

Not practical, since the number of paths is O(N^T).

SLIDE 13

The Forward Algorithm

  α_t(j) = P(O_1 O_2 … O_t, q_t = S_j | λ)

Compute α recursively:

  α_0(j) = 1 if j is the start state, 0 otherwise

  α_t(j) = [ Σ_i=0..N α_t-1(i) a_ij ] b_j(O_t),   t > 0

  P(O|λ) = α_T(S_N)

Computation is O(N² T).
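A minimal sketch of this recursion (function and variable names are illustrative; the model values are those of the trellis example on the next slide):

```python
def forward(a, b, obs, start=0):
    """Return alpha after consuming obs: alpha[j] = P(O_1..O_t, q_t = S_j | lambda)."""
    n = len(a)
    alpha = [1.0 if j == start else 0.0 for j in range(n)]   # alpha_0
    for o in obs:
        # One O(N^2) update per observation: sum over predecessors, then emit.
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o]
                 for j in range(n)]
    return alpha

a = [[0.6, 0.4], [0.0, 1.0]]
b = [{'A': 0.8, 'B': 0.2}, {'A': 0.3, 'B': 0.7}]
print(forward(a, b, ['A', 'A', 'B']))
```

The final entry for the end state equals P(O|λ), the same value the brute-force path enumeration produces at far lower cost.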

SLIDE 14

Forward Trellis

[Forward trellis for a two-state model: state 1 emits A with 0.8 and B with 0.2; state 2 emits A with 0.3 and B with 0.7; transitions: state 1→1 = 0.6, state 1→2 = 0.4, state 2→2 = 1.0; state 1 is initial, state 2 is final; observation sequence A A B.

  α values at t = 0, 1, 2, 3:  state 1: 1.0, 0.48, 0.23, 0.03;  state 2: 0.0, 0.12, 0.09, 0.13]

SLIDE 15

The Backward Algorithm

  β_t(i) = P(O_t+1 O_t+2 … O_T | q_t = S_i, λ)

Compute β recursively:

  β_T(i) = 1 if i is the end state, 0 otherwise

  β_t(i) = Σ_j=0..N a_ij b_j(O_t+1) β_t+1(j),   t < T

  P(O|λ) = α_T(S_N) = β_0(S_0)

Computation is O(N² T).
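The backward recursion is the mirror image of the forward one; a sketch on the same toy model (names illustrative):

```python
def backward(a, b, obs, end=1):
    """Return beta before consuming obs: beta[i] = P(O_t+1..O_T | q_t = S_i, lambda)."""
    n = len(a)
    beta = [1.0 if i == end else 0.0 for i in range(n)]      # beta_T
    for o in reversed(obs):
        # Walk the observations right to left, summing over successors.
        beta = [sum(a[i][j] * b[j][o] * beta[j] for j in range(n))
                for i in range(n)]
    return beta

a = [[0.6, 0.4], [0.0, 1.0]]
b = [{'A': 0.8, 'B': 0.2}, {'A': 0.3, 'B': 0.7}]
print(backward(a, b, ['A', 'A', 'B']))
```

As the slide states, β_0 at the start state equals P(O|λ), the same number the forward pass yields at the end state.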

SLIDE 16

Backward Trellis

[Backward trellis for the same two-state model (state 1 emits A with 0.8 and B with 0.2, state 2 emits A with 0.3 and B with 0.7; transitions 1→1 = 0.6, 1→2 = 0.4, 2→2 = 1.0; state 1 initial, state 2 final; observation sequence A A B).

  β values at t = 0, 1, 2, 3:  state 1: 0.13, 0.22, 0.28, 0.0;  state 2: 0.06, 0.21, 0.7, 1.0]

SLIDE 17

The Viterbi Algorithm

For decoding: find the state sequence Q which maximizes P(O, Q|λ). Similar to the Forward Algorithm, except MAX instead of SUM.

  VP_t(i) = MAX over q_0, …, q_t-1 of P(O_1 O_2 … O_t, q_t = i | λ)

Recursive computation:

  VP_t(j) = MAX_i=0..N [ VP_t-1(i) a_ij ] b_j(O_t),   t > 0

Save each maximum for backtrace at end.

  P(O, Q|λ) = VP_T(S_N)
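A sketch of the MAX recursion with backtrace, again on the toy model (names illustrative; for simplicity it backtraces from the best-scoring final state rather than forcing a designated end state):

```python
def viterbi(a, b, obs, start=0):
    """Best path: the forward recursion with MAX in place of SUM, plus backtrace."""
    n = len(a)
    vp = [1.0 if j == start else 0.0 for j in range(n)]
    back = []                                  # saved argmaxes for the backtrace
    for o in obs:
        col, new = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: vp[i] * a[i][j])
            col.append(best)
            new.append(vp[best] * a[best][j] * b[j][o])
        vp = new
        back.append(col)
    q = max(range(n), key=lambda j: vp[j])     # best final state
    path = [q]
    for col in reversed(back):
        q = col[q]
        path.append(q)
    return max(vp), path[::-1]                 # probability, states q0..qT

a = [[0.6, 0.4], [0.0, 1.0]]
b = [{'A': 0.8, 'B': 0.2}, {'A': 0.3, 'B': 0.7}]
print(viterbi(a, b, ['A', 'A', 'B']))
```

Note the returned probability is that of the single best path, which is at most the forward probability P(O|λ) that sums over all paths.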

SLIDE 18

Viterbi Trellis

[Viterbi trellis for the same two-state model and observation sequence A A B.

  VP values at t = 0, 1, 2, 3:  state 1: 1.0, 0.48, 0.23, 0.03;  state 2: 0.0, 0.12, 0.06, 0.06]

SLIDE 19

Training HMM Parameters

Train parameters of the HMM:

  • Tune λ to maximize P(O|λ)
  • No efficient algorithm for the global optimum
  • An efficient iterative algorithm finds a local optimum

Baum-Welch (Forward-Backward) re-estimation:

  • Compute probabilities using current model λ
  • Refine λ → λ' based on the computed values
  • Use α and β from Forward-Backward

SLIDE 20

Forward-Backward Algorithm

Probability of transiting from S_i to S_j at time t, given O:

  ξ_t(i, j) = P(q_t = S_i, q_t+1 = S_j | O, λ)
            = α_t(i) a_ij b_j(O_t+1) β_t+1(j) / P(O|λ)

SLIDE 21

Baum-Welch Reestimation

  a_ij = (expected number of transitions from S_i to S_j) / (expected number of transitions from S_i)

       = Σ_t=1..T-1 ξ_t(i, j)  /  Σ_t=1..T-1 Σ_j=0..N ξ_t(i, j)

  b_j(k) = (expected number of times in state j with symbol k) / (expected number of times in state j)

         = Σ_t: O_t+1=k Σ_i=0..N ξ_t(i, j)  /  Σ_t=1..T-1 Σ_i=0..N ξ_t(i, j)
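The ξ table and the a_ij re-estimate can be sketched by reusing the α/β recursions on the two-state toy model (all names and model values are illustrative; a full Baum-Welch pass would also re-estimate b_j(k) and iterate):

```python
N = 2
a = [[0.6, 0.4], [0.0, 1.0]]
b = [{'A': 0.8, 'B': 0.2}, {'A': 0.3, 'B': 0.7}]
obs = ['A', 'A', 'B']
T = len(obs)

# Full alpha and beta tables (start state 0, end state 1).
alpha = [[1.0, 0.0]]
for o in obs:
    alpha.append([sum(alpha[-1][i] * a[i][j] for i in range(N)) * b[j][o]
                  for j in range(N)])
beta = [[0.0, 1.0]]
for o in reversed(obs):
    beta.insert(0, [sum(a[i][j] * b[j][o] * beta[0][j] for j in range(N))
                    for i in range(N)])

p_obs = alpha[T][1]                       # P(O | lambda)
# xi[t][i][j]: probability of the transition S_i -> S_j between t and t+1.
xi = [[[alpha[t][i] * a[i][j] * b[j][obs[t]] * beta[t + 1][j] / p_obs
        for j in range(N)] for i in range(N)] for t in range(T)]

# a_ij = expected transitions i->j / expected transitions out of i.
a_new = [[sum(xi[t][i][j] for t in range(T)) /
          sum(xi[t][i][jj] for t in range(T) for jj in range(N))
          for j in range(N)] for i in range(N)]
print(a_new)
```

Each ξ slice sums to 1 over (i, j), and each re-estimated row of a_new sums to 1, as the counting interpretation requires.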

SLIDE 22

Convergence of FB Algorithm

1. Initialize λ = (A, B)
2. Compute α, β, and ξ
3. Estimate λ' = (A', B') from ξ
4. Replace λ with λ'
5. If not converged, go to 2

It can be shown that P(O|λ') > P(O|λ) unless λ' = λ.

SLIDE 23

HMMs In Speech Recognition

Represent speech as a sequence of symbols. Use an HMM to model some unit of speech (phone, word). Output probabilities: the probability of observing a symbol in a state. Transition probabilities: the probability of staying in or skipping a state. [Figure: phone model.]

SLIDE 24

Training HMMs for Continuous Speech

  • Use only the orthographic transcription of the sentence; no need for segmented/labelled data
  • Concatenate phone models to give a word model
  • Concatenate word models to give a sentence model
  • Train the entire sentence model on the entire spoken sentence

SLIDE 25

Forward-Backward Training for Continuous Speech

Sentence: SHOW ALL ALERTS → phone sequence: SH OW AA L AX L ER TS

SLIDE 26

Recognition Search

[Figure: recognition search network built from phone models /w/ → /ah/ → /ts/ ("what's") and /th/ → /ax/ ("the"), connecting the words: what's, the, display, kirk's, willamette's, sterett's, location, longitude, latitude.]

SLIDE 27

Viterbi Search

  • Uses Viterbi decoding
  • Takes MAX, not SUM
  • Finds the optimal state sequence P(O, Q|λ), not the optimal word sequence P(O|λ)
  • Time synchronous
  • Extends all paths by 1 time step
  • All paths have the same length (no need to normalize to compare scores)

SLIDE 28

Viterbi Search Algorithm

0. Create state list with one cell for each state in the system
1. Initialize state list with initial states for time t = 0
2. Clear state list for time t + 1
3. Compute within-word transitions from time t to t + 1
   • If a new state is reached, update score and BackPtr
   • If a better score for a state, update score and BackPtr
4. Compute between-word transitions at time t + 1
   • If a new state is reached, update score and BackPtr
   • If a better score for a state, update score and BackPtr
5. If end of utterance, print backtrace and quit
6. Else increment t and go to step 2

SLIDE 29

Viterbi Search Algorithm

[Figure: trellis columns for time t and time t+1; each column holds Word1 and Word2, each with states S1, S2, S3. Within-word transition score: OldProb(S1) • OutProb • TransProb. Between-word transition score: OldProb(S3) • P(W2|W1). Each cell stores Score, BackPtr, ParmPtr.]

SLIDE 30

Viterbi Beam Search

Viterbi Search: all states are enumerated. This is not practical for large grammars, and most states are inactive at any given time.

Viterbi Beam Search: prune the less likely paths. States whose scores are worse than a threshold range from the best are pruned. FROM and TO structures are created dynamically as lists of active states.
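The pruning step can be sketched as follows (hypothetical state names; the beam width is an assumed tuning parameter, and scores are kept in the log domain as real decoders typically do):

```python
def prune(active, beam_width):
    """Keep only active states whose log score is within beam_width of the
    best state's score; the surviving dict plays the role of the TO list,
    built dynamically instead of enumerating every state in the system."""
    best = max(active.values())
    return {s: v for s, v in active.items() if v >= best - beam_width}

# Log-probability scores; state w2.s1 falls outside the beam and is dropped.
active = {'w1.s1': -10.0, 'w1.s2': -14.5, 'w2.s1': -30.0}
print(prune(active, 10.0))
```

Because Viterbi search is time synchronous, all active paths have the same length, so their scores can be compared directly against the single best score.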

SLIDE 31

Viterbi Beam Search

[Figure: FROM beam at time t (Word1: S1, S2, S3) feeding a dynamically constructed TO beam at time t+1 (Word2: S1). Only states within a threshold of the best state are kept. Each transition is checked: Within threshold? Exists in TO beam? Better than the existing score in TO beam?]

SLIDE 32

Continuous Density HMMs

The model so far has assumed discrete observations: each observation in a sequence was one of a set of M discrete symbols. Speech input must be vector quantized in order to provide discrete input, and VQ leads to quantization error. The discrete probability density b_j(k) can be replaced with the continuous probability density b_j(x), where x is the observation vector. Typically Gaussian densities are used. A single Gaussian is not adequate, so a weighted sum of Gaussians is used to approximate the actual PDF.

SLIDE 33

Mixture Density Functions

  b_j(x) = Σ_m=1..M c_jm N[x, µ_jm, U_jm],     with Σ_m=1..M c_jm = 1

where:
  b_j(x) = probability density function for state j
  x = observation vector (x_1, x_2, …, x_D)
  M = number of mixtures (Gaussians)
  c_jm = weight of mixture m in state j
  N = Gaussian density function
  µ_jm = mean vector for mixture m, state j
  U_jm = covariance matrix for mixture m, state j
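A minimal sketch of evaluating b_j(x) for one state with a diagonal covariance matrix, so the D-dimensional Gaussian factors into a product of 1-D Gaussians (function name and test values are illustrative):

```python
import math

def mixture_density(x, c, mu, var):
    """b_j(x) = sum_m c_m N(x; mu_m, U_m) with diagonal covariance:
    each mixture's density is a product of 1-D Gaussians over dimensions."""
    total = 0.0
    for cm, mum, varm in zip(c, mu, var):
        # Accumulate the log density per dimension, then exponentiate once.
        log_n = sum(-0.5 * (math.log(2 * math.pi * v) + (xd - m) ** 2 / v)
                    for xd, m, v in zip(x, mum, varm))
        total += cm * math.exp(log_n)
    return total

# Two-mixture example in D = 1; the mixture weights c must sum to 1.
print(mixture_density([0.0], c=[0.6, 0.4], mu=[[0.0], [2.0]], var=[[1.0], [1.0]]))
```

Summing in the log domain per dimension avoids underflow for large D, which matters in practice since this density is evaluated for every state at every frame.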

SLIDE 34

Discrete HMM vs. Continuous HMM

Problems with Discrete:

  • quantization errors
  • codebook and HMMs modelled separately

Problems with Continuous Mixtures:

  • a small number of mixtures performs poorly
  • a large number of mixtures increases computation and the parameters to be estimated: c_jm, µ_jm, U_jm for j = 1, …, N and m = 1, …, M
  • Continuous makes more assumptions than Discrete, especially with a diagonal covariance pdf
  • discrete probability is a table lookup; continuous mixtures require many multiplications

SLIDE 35

Model Topologies

Ergodic: fully connected; each state has a transition to every other state.
Left-to-Right: transitions only to states with a higher index than the current state. Inherently imposes temporal order. These are most often used for speech.