1
HIDDEN MARKOV MODELS IN SPEECH RECOGNITION
Wayne Ward
Carnegie Mellon University
Pittsburgh, PA
2
Acknowledgements
Much of this talk is derived from the paper "An Introduction to Hidden Markov Models" by Rabiner and Juang, and from the talk "Hidden Markov Models: Continuous Speech Recognition" by Kai-Fu Lee
3
Topics
- Markov Models and Hidden Markov Models
- HMMs applied to speech recognition
- Training
- Decoding
4
Speech Recognition
[Diagram: Analog Speech → Front End → Discrete Observations O1 O2 ... OT → Match/Search → Word Sequence W1 W2 ... WT]
5
ML Continuous Speech Recognition
Goal: Given acoustic data A = a1, a2, ..., ak
Find word sequence W = w1, w2, ..., wn
Such that P(W|A) is maximized
Bayes' Rule:
P(W|A) = P(A|W) • P(W) / P(A)
P(A|W): acoustic model (HMMs);  P(W): language model
P(A) is a constant for a complete sentence
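Since P(A) is constant over candidate word sequences, it drops out of the maximization; spelled out as a worked equation:

```latex
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(A \mid W)\,P(W)}{P(A)}
        = \arg\max_{W} \underbrace{P(A \mid W)}_{\text{acoustic model}} \;
                       \underbrace{P(W)}_{\text{language model}}
```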
6
Markov Models
[Diagram: two states A and B with transitions P(A|A), P(B|B), P(B|A), P(A|B)]
Elements:
States: S = S0, S1, ..., SN
Transition probabilities: P(qt = Si | qt-1 = Sj) = aji
Markov Assumption: the transition probability depends only on the current state:
P(qt = Si | qt-1 = Sj, qt-2 = Sk, ...) = P(qt = Si | qt-1 = Sj) = aji
aji ≥ 0 ∀ j,i        Σi=1..N aji = 1 ∀ j
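A minimal Python sketch of the two-state chain pictured above; the numeric transition values are assumptions for illustration, since the diagram gives none:

```python
import numpy as np

rng = np.random.default_rng(0)
states = ["A", "B"]
T = np.array([[0.7, 0.3],   # row A: P(A|A), P(B|A) -- assumed values
              [0.4, 0.6]])  # row B: P(A|B), P(B|B)
assert np.allclose(T.sum(axis=1), 1.0)  # each row must sum to 1

q = 0                          # start in state A
for _ in range(10):
    q = rng.choice(2, p=T[q])  # next state depends only on the current one
    print(states[q], end=" ")
```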
7
Single Fair Coin
[Diagram: two states, all transition probabilities 0.5]
State 1: P(H) = 1.0, P(T) = 0.0     State 2: P(H) = 0.0, P(T) = 1.0
Outcome head corresponds to state 1, tail to state 2
Observation sequence uniquely defines state sequence
8
Hidden Markov Models
[Diagram: states A and B as before, each now with an output probability table over symbols O1, O2, ..., OM]
Elements:
States: S = S0, S1, ..., SN
Transition probabilities: P(qt = Si | qt-1 = Sj) = aji
Output probability distributions (at state j for symbol k):
P(yt = Ok | qt = Sj) = bj(k)
9
Discrete Observation HMM
[Diagram: three states with output distributions]
State 1: P(R) = 0.31, P(B) = 0.50, P(Y) = 0.19
State 2: P(R) = 0.50, P(B) = 0.25, P(Y) = 0.25
State 3: P(R) = 0.38, P(B) = 0.12, P(Y) = 0.50
- Observation sequence R B Y Y ••• R is not unique to a state sequence
10
HMMs In Speech Recognition
Represent speech as a sequence of observations
Use an HMM to model some unit of speech (phone, word)
Concatenate units into larger units
[Diagram: phone model for /ih/; word model for "did" = /d/ /ih/ /d/]
11
HMM Problems And Solutions
Evaluation:
- Problem - Compute probability of observation sequence given a model
- Solution - Forward Algorithm and Viterbi Algorithm
Decoding:
- Problem - Find state sequence which maximizes probability of observation sequence
- Solution - Viterbi Algorithm
Training:
- Problem - Adjust model parameters to maximize probability of observed sequences
- Solution - Forward-Backward Algorithm
12
Evaluation
Probability of observation sequence O = O1 O2 ... OT
given HMM model λ is:
P(O | λ) = Σ∀Q P(O, Q | λ)
         = Σ∀Q aq0q1 bq1(O1) aq1q2 bq2(O2) × ... × aqT-1qT bqT(OT)
Q = q0 q1 ... qT is a state sequence
N = number of states in model
T = number of observations in sequence
Not practical since the number of paths is O(N^T)
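For concreteness, a brute-force sketch that sums over all N^T paths of the small two-state model used on the trellis slides below (start state 1, end state 2); workable here, hopeless for realistic N and T:

```python
from itertools import product

A = {(1, 1): 0.6, (1, 2): 0.4, (2, 1): 0.0, (2, 2): 1.0}  # transition probs
B = {1: {"A": 0.8, "B": 0.2}, 2: {"A": 0.3, "B": 0.7}}    # output probs
obs = ["A", "A", "B"]

total = 0.0
for Q in product([1, 2], repeat=len(obs)):  # enumerate all N^T state paths
    p, prev = 1.0, 1                        # every path starts in state 1
    for q, o in zip(Q, obs):
        p *= A[(prev, q)] * B[q][o]         # one a-times-b term per step
        prev = q
    if prev == 2:                           # keep paths ending in end state 2
        total += p
print(total)  # 0.130032 ≈ 0.13, matching the Forward Trellis slide
```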
13
The Forward Algorithm
αt(j) = P(O1 O2 ... Ot, qt = Sj | λ)
Compute α recursively:
α0(j) = 1 if j is the start state
        0 otherwise
αt(j) = [ Σi=0..N αt-1(i) aij ] bj(Ot),  t > 0
P(O | λ) = αT(SN)
Computation is O(N²T)
14
Forward Trellis
Model: start state 1, final state 2; a11 = 0.6, a12 = 0.4, a22 = 1.0
Output probs: state 1: A 0.8, B 0.2; state 2: A 0.3, B 0.7
Observations: A (t=1), A (t=2), B (t=3)
         t=0    t=1    t=2    t=3
state 1: 1.0    0.48   0.23   0.03
state 2: 0.0    0.12   0.09   0.13
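A sketch of the forward recursion reproducing the trellis values above:

```python
import numpy as np

A = np.array([[0.6, 0.4],     # a11, a12
              [0.0, 1.0]])    # a21, a22
B = np.array([[0.8, 0.2],     # state 1: P(A), P(B)
              [0.3, 0.7]])    # state 2: P(A), P(B)
alpha = np.array([1.0, 0.0])  # alpha_0: probability 1 in start state 1
for o in [0, 0, 1]:           # observations A, A, B
    alpha = (alpha @ A) * B[:, o]  # [sum_i alpha_t-1(i) a_ij] * b_j(O_t)
    print(alpha.round(2))
# [0.48 0.12] [0.23 0.09] [0.03 0.13] -> P(O|lambda) = alpha_T(state 2) = 0.13
```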
15
The Backward Algorithm
βt(i) = P(Ot+1 Ot+2 ... OT | qt = Si, λ)
Compute β recursively:
βT(i) = 1 if i is the end state
        0 otherwise
βt(i) = Σj=0..N aij bj(Ot+1) βt+1(j),  t < T
P(O | λ) = αT(SN) = β0(S0)
Computation is O(N²T)
16
Backward Trellis
Model and observations as in the forward trellis
         t=0    t=1    t=2    t=3
state 1: 0.13   0.22   0.28   0.0
state 2: 0.06   0.21   0.7    1.0
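The matching backward pass, reproducing the trellis values above; note that β0 at the start state equals the forward result, 0.13:

```python
import numpy as np

A = np.array([[0.6, 0.4], [0.0, 1.0]])  # same model as the forward trellis
B = np.array([[0.8, 0.2], [0.3, 0.7]])
beta = np.array([0.0, 1.0])    # beta_T: state 2 is the end state
for o in reversed([0, 0, 1]):  # observations A, A, B, processed backward
    beta = A @ (B[:, o] * beta)  # sum_j a_ij b_j(O_t+1) beta_t+1(j)
    print(beta.round(2))
# [0.28 0.7 ] [0.22 0.21] [0.13 0.06] -> beta_0(start) = P(O|lambda) = 0.13
```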
17
The Viterbi Algorithm
For decoding: find the state sequence Q which maximizes P(O, Q | λ)
Similar to the Forward Algorithm, except MAX instead of SUM
VPt(i) = MAXq0,...,qt-1 P(O1 O2 ... Ot, qt = i | λ)
Recursive computation: VPt(j) = MAXi=0..N VPt-1(i) aij bj(Ot),  t > 0
P(O, Q | λ) = VPT(SN)
Save each maximum for backtrace at end
18
Viterbi Trellis
Model and observations as in the forward trellis
         t=0    t=1    t=2    t=3
state 1: 1.0    0.48   0.23   0.03
state 2: 0.0    0.12   0.06   0.06
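The same recursion with MAX in place of SUM, plus the backtrace, as a sketch over the model above:

```python
import numpy as np

A = np.array([[0.6, 0.4], [0.0, 1.0]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
vp, back = np.array([1.0, 0.0]), []   # VP_0: start in state 1
for o in [0, 0, 1]:                   # observations A, A, B
    cand = vp[:, None] * A            # cand[i, j] = VP_t-1(i) * a_ij
    back.append(cand.argmax(axis=0))  # best predecessor of each state j
    vp = cand.max(axis=0) * B[:, o]
state, path = 1, [2]                  # backtrace from final state 2
for ptrs in reversed(back):
    state = ptrs[state]
    path.append(state + 1)
print(vp[1], path[::-1])              # 0.0645, best path [1, 1, 1, 2]
```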
19
Training HMM Parameters
Train parameters of HMM
- Tune λ to maximize P(O | λ)
- No efficient algorithm for global optimum
- Efficient iterative algorithm finds a local optimum
Baum-Welch (Forward-Backward) re-estimation
- Compute probabilities using current model λ
- Refine λ → λ' based on computed values
- Use α and β from Forward-Backward
20
Forward-Backward Algorithm
Probability of transiting from Si to Sj at time t, given O:
ξt(i,j) = P(qt = Si, qt+1 = Sj | O, λ)
        = αt(i) aij bj(Ot+1) βt+1(j) / P(O | λ)
21
Baum-Welch Re-estimation
aij = expected number of transitions from Si to Sj
      / expected number of transitions from Si
    = Σt=1..T-1 ξt(i,j) / Σt=1..T-1 Σj=1..N ξt(i,j)
bj(k) = expected number of times in state j with symbol k
        / expected number of times in state j
      = Σt: Ot+1=k Σi ξt(i,j) / Σt Σi ξt(i,j)
22
ConvergenceofFBAlgorithm
- 1. Initializeλ
λ λ λ=(A,B)
- 2. Computeα
α α α,β β β β,andξ ξ ξ ξ 3. 3. 3. 3.Estimateλ λ λ λ =(A,B)fromξ ξ ξ ξ 4. 4. 4. 4.Replaceλ λ λ λ withλ λ λ λ 5.Ifnotconvergedgoto2 ItcanbeshownthatP(O|λ λ λ λ)>P(O|λ λ λ λ)unlessλ λ λ λ =λ λ λ λ
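A compact sketch of one iteration of this loop, combining the ξ computation (slide 21) with the re-estimation formulas (slide 22); treating every state as a legal end state (summing αT) is a simplifying assumption here:

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Forward-Backward re-estimation step for a discrete HMM."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T + 1, N)); alpha[0] = pi
    for t in range(T):                          # forward pass
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t]]
    beta = np.ones((T + 1, N))                  # backward pass, beta_T = 1
    for t in range(T - 1, -1, -1):
        beta[t] = A @ (B[:, obs[t]] * beta[t + 1])
    prob = alpha[T].sum()                       # P(O | lambda)
    # xi[t,i,j] = alpha_t(i) a_ij b_j(O_t+1) beta_t+1(j) / P(O|lambda)
    xi = np.array([alpha[t][:, None] * A * (B[:, obs[t]] * beta[t + 1])
                   for t in range(T)]) / prob
    A_new = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]
    gamma = xi.sum(axis=1)                      # expected time in state j at t
    B_new = np.array([gamma[np.array(obs) == k].sum(axis=0)
                      for k in range(B.shape[1])]).T
    B_new /= gamma.sum(axis=0)[:, None]         # guard zero counts in real code
    return A_new, B_new, prob
```

Iterating baum_welch_step, replacing (A, B) each time until prob stops improving, implements steps 2 through 5 above.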
23
HMMs In Speech Recognition
Represent speech as a sequence of symbols
Use an HMM to model some unit of speech (phone, word)
Output probabilities - probability of observing a symbol in a state
Transition probabilities - probability of staying in or skipping a state
[Diagram: phone model]
24
Training HMMs forContinuousSpeech
- Useonly orthograph transcriptionofsentence
- noneedforsegmented/labelled data
- Concatenatephonemodelstogivewordmodel
- Concatenatewordmodelstogivesentencemodel
- Trainentiresentencemodelonentirespokensentence
25
Forward-Backward Training for Continuous Speech
Sentence: SHOW ALL ALERTS
Phone models: SH OW AA L AX L ER TS
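A toy sketch of the concatenation idea; the pronunciation dictionary here is an illustrative assumption, not from the source:

```python
lexicon = {"SHOW": ["SH", "OW"],
           "ALL": ["AA", "L"],
           "ALERTS": ["AX", "L", "ER", "TS"]}

def sentence_model(words):
    """Chain the phone models of each word into one linear sentence model."""
    return [ph for w in words for ph in lexicon[w]]

print(sentence_model(["SHOW", "ALL", "ALERTS"]))
# ['SH', 'OW', 'AA', 'L', 'AX', 'L', 'ER', 'TS']
```

In a real trainer each phone label stands for a full phone HMM, and the chained sentence model is trained on the whole spoken sentence with Forward-Backward.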
26
Recognition Search
[Diagram: recognition network; "what's the" expands to /w/ → /ah/ → /ts/ and /th/ → /ax/, then branches to words: display, kirk's, willamette's, sterett's, location, longitude, latitude]
27
Viterbi Search
- Uses Viterbi decoding
- TakesMAX,notSUM
- FindsoptimalstatesequenceP(O,Q|λ
λ λ λ ) notoptimalwordsequenceP(O|λ λ λ λ )
- Timesynchronous
- Extendsallpathsby1timestep
- Allpathshavesamelength(noneedto
normalizetocomparescores)
28
Viterbi Search Algorithm
0. Create a state list with one cell for each state in the system
1. Initialize the state list with the initial states for time t = 0
2. Clear the state list for time t+1
3. Compute within-word transitions from time t to t+1
   - If a new state is reached, update score and BackPtr
   - If better score for a state, update score and BackPtr
4. Compute between-word transitions at time t+1
   - If a new state is reached, update score and BackPtr
   - If better score for a state, update score and BackPtr
5. If end of utterance, print backtrace and quit
6. Else increment t and go to step 2
29
Viterbi Search Algorithm
[Diagram: states S1 S2 S3 of Word1 and Word2 at time t and time t+1]
Within-word transition score: OldProb(S1) • OutProb • TransProb
Between-word transition score: OldProb(S3) • P(W2|W1)
Each cell stores: Score, BackPtr, ParmPtr
30
Viterbi Beam Search
Viterbi Search:
- all states enumerated
- not practical for large grammars
- most states inactive at any given time
Viterbi Beam Search - prune less likely paths:
- states worse than a threshold range from the best are pruned
- FROM and TO structures created dynamically - list of active states
31
Viterbi BeamSearch
timet timet+1 Word1 Word2 S1 S1 S2 S3 FROM BEAM TO BEAM Stateswithinthreshold frombeststate Dynamicallyconstructed ?
Withinthreshold? ExistinTObeam? Betterthanexisting scoreinTObeam?
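A minimal sketch of the TO-beam update these three checks imply, assuming scores are log probabilities and threshold is a tuning parameter:

```python
def add_to_beam(to_beam, state, score, best, threshold):
    """Insert a transition target into the TO beam if it survives pruning."""
    if score < best - threshold:   # within threshold of the best state?
        return                     # no: prune the path
    if state not in to_beam:       # exists in TO beam?
        to_beam[state] = score     # no: create a new active state
    elif score > to_beam[state]:   # better than the existing score?
        to_beam[state] = score     # yes: replace it (and BackPtr, in full code)
```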
32
Continuous Density HMMs
The model so far has assumed discrete observations: each observation in a sequence was one of a set of M discrete symbols
Speech input must be Vector Quantized in order to provide discrete input, and VQ leads to quantization error
The discrete probability density bj(k) can be replaced with the continuous probability density bj(x), where x is the observation vector
Typically Gaussian densities are used
A single Gaussian is not adequate, so a weighted sum of Gaussians is used to approximate the actual PDF
33
Mixture Density Functions
bj(x) = Σm=1..M cjm N(x, µjm, Ujm)
bj(x) = probability density function for state j
x = observation vector (x1, x2, ..., xD)
M = number of mixtures (Gaussians)
cjm = weight of mixture m in state j, where Σm=1..M cjm = 1
N = Gaussian density function
µjm = mean vector for mixture m, state j
Ujm = covariance matrix for mixture m, state j
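A sketch of evaluating bj(x) for one state, assuming diagonal covariance matrices (the simplification the next slide refers to):

```python
import numpy as np

def gmm_density(x, c, mu, var):
    """b_j(x) = sum_m c[m] * N(x; mu[m], diag(var[m]))."""
    x, dens = np.asarray(x, float), 0.0
    for m in range(len(c)):                          # M mixture components
        d = x - mu[m]
        norm = np.prod(2.0 * np.pi * var[m]) ** -0.5  # |2*pi*Sigma|^(-1/2)
        dens += c[m] * norm * np.exp(-0.5 * np.sum(d * d / var[m]))
    return dens

# two-mixture, two-dimensional example; weights sum to 1 as required
c = [0.6, 0.4]
mu = np.array([[0.0, 0.0], [1.0, 2.0]])
var = np.array([[1.0, 1.0], [0.5, 0.5]])
print(gmm_density([0.5, 0.5], c, mu, var))
```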
34
Discrete HMM vs. Continuous HMM
Problems with Discrete:
- quantization errors
- codebook and HMMs modelled separately
Problems with Continuous Mixtures:
- small number of mixtures performs poorly
- large number of mixtures increases computation and the parameters to be estimated (cjm, µjm, Ujm for j = 1, ..., N and m = 1, ..., M)
- Continuous makes more assumptions than Discrete, especially if diagonal covariance pdf
- Discrete probability is a table lookup; continuous mixtures require many multiplications