  1. Hidden Markov Models
     Hsin-min Wang
     References:
     1. L. R. Rabiner and B. H. Juang (1993), Fundamentals of Speech Recognition, Chapter 6
     2. X. Huang et al. (2001), Spoken Language Processing, Chapter 8
     3. L. R. Rabiner (1989), "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, no. 2, February 1989

  2. Speech Recognition - Acoustic Processing
     [Figure: a left-to-right 3-state HMM (self-transitions a11, a22, a33; transitions a12, a23) aligned against a speech waveform; framing and signal processing turn the waveform into a feature vector sequence O = o1 o2 ... oT, scored by the state output distributions b1(o), b2(o), b3(o).]
     a_ij = P(s_t = j | s_{t-1} = i)
     b_i(o_t) = P(o_t | s_t = i) = Σ_{k=1}^{M} c_ik N(o_t; μ_ik, Σ_ik)
     S* = arg max_S P(O|S)   (Hidden Markov Model)
     W* = arg max_W P(O|W)

  3. Hidden Markov Model (HMM)
     • History
       – Published in Baum's papers in the late 1960s and early 1970s
       – Introduced to speech processing by Baker (CMU) and Jelinek (IBM) in the 1970s
     • Assumptions
       – The speech signal can be characterized as a parametric random process
       – The parameters can be estimated in a precise, well-defined manner
     • Three fundamental problems
       – Evaluation of the probability (likelihood) of a sequence of observations given a specific HMM
       – Determination of the best sequence of model states
       – Adjustment of the model parameters so as to best account for the observed signal

  4. Several Useful Formulas
     • Bayes' Rule:
       P(A|B) = P(B|A) P(A) / P(B) = P(A,B) / P(B)
       P(A|B, λ) = P(B|A, λ) P(A|λ) / P(B|λ) = P(A,B|λ) / P(B|λ)
       (λ: the model describing the probability)
       P(A,B) = P(B|A) P(A) = P(A|B) P(B)
     • Total probability:
       P(A) = Σ_{all B} P(A,B) = Σ_{all B} P(A|B) P(B)      if B is discrete
       f(A) = ∫_B f(A,B) dB = ∫_B f(A|B) f(B) dB            if B is continuous
     • Independence:
       If x1, x2, ..., xn are independent, then P(x1, x2, ..., xn) = P(x1) P(x2) ... P(xn)
     • Expectation:
       E(q(z)) = Σ_k q(z_k) P(z_k)      z discrete
       E(q(z)) = ∫_z q(z) f(z) dz       z continuous
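The identities above can be checked numerically. The sketch below uses a small, purely illustrative joint distribution P(A, B) over binary A and B (the distribution values and helper names are assumptions, not from the slides) to verify marginalization and Bayes' rule:

```python
# Illustrative joint distribution P(A, B) for A, B in {0, 1}.
P_joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

def P_A(a):
    """Marginal: P(A) = sum over all B of P(A, B)."""
    return sum(p for (ai, _), p in P_joint.items() if ai == a)

def P_B(b):
    """Marginal: P(B) = sum over all A of P(A, B)."""
    return sum(p for (_, bi), p in P_joint.items() if bi == b)

def P_A_given_B(a, b):
    """Conditional via the joint: P(A|B) = P(A, B) / P(B)."""
    return P_joint[(a, b)] / P_B(b)

def P_B_given_A(b, a):
    """Conditional the other way: P(B|A) = P(A, B) / P(A)."""
    return P_joint[(a, b)] / P_A(a)

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
lhs = P_A_given_B(1, 0)
rhs = P_B_given_A(0, 1) * P_A(1) / P_B(0)
assert abs(lhs - rhs) < 1e-12
```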

  5. The Markov Chain
     P(X1, X2, ..., Xn) = P(Xn | X1, ..., X_{n-1}) P(X1, ..., X_{n-1}) = P(X1) Π_{i=2}^{n} P(Xi | X1, X2, ..., X_{i-1})
     First-order Markov chain: P(X1, X2, ..., Xn) = P(X1) Π_{i=2}^{n} P(Xi | X_{i-1})
     • An Observable Markov Model
       – A Markov chain with N states labeled {1, ..., N}; with the state at time t denoted q_t, the parameters of the chain can be described as
         a_ij = P(q_t = j | q_{t-1} = i), 1 ≤ i, j ≤ N      (Σ_{j=1}^{N} a_ij = 1 for all i)
         π_i = P(q_1 = i), 1 ≤ i ≤ N                        (Σ_{i=1}^{N} π_i = 1)
       – The output of the process is the set of states at each time instant t, where each state corresponds to an observable event X_i
       – There is a one-to-one correspondence between the observable sequence and the Markov chain state sequence (the observation is deterministic!) (Rabiner 1989)

  6. The Markov Chain – Ex 1
     • Example 1: a 3-state Markov chain λ
       – State 1 generates symbol A only, state 2 generates symbol B only, state 3 generates symbol C only

             | 0.6  0.3  0.1 |
         A = | 0.1  0.7  0.2 |
             | 0.3  0.2  0.5 |

         π = [0.4  0.5  0.1]

       – Given a sequence of observed symbols O = {CABBCABC}, the only corresponding state sequence is Q = {S3 S1 S2 S2 S3 S1 S2 S3}, and the corresponding probability is
         P(O|λ) = P(CABBCABC|λ) = P(Q|λ) = P(S3 S1 S2 S2 S3 S1 S2 S3|λ)
                = π(S3) P(S1|S3) P(S2|S1) P(S2|S2) P(S3|S2) P(S1|S3) P(S2|S1) P(S3|S2)
                = 0.1 × 0.3 × 0.3 × 0.7 × 0.2 × 0.3 × 0.3 × 0.2 = 0.00002268
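The computation in Example 1 can be sketched in a few lines; state indices 0..2 stand for s1..s3, and the function name is illustrative:

```python
# Transition matrix and initial distribution from Example 1.
A = [[0.6, 0.3, 0.1],
     [0.1, 0.7, 0.2],
     [0.3, 0.2, 0.5]]
pi = [0.4, 0.5, 0.1]

def chain_prob(states):
    """P(Q|λ) = π(q1) · Π_t a(q_{t-1}, q_t) for an observable Markov chain."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

# O = CABBCABC maps one-to-one to Q = s3 s1 s2 s2 s3 s1 s2 s3.
q = [2, 0, 1, 1, 2, 0, 1, 2]
print(chain_prob(q))   # ≈ 0.00002268
```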

  7. The Markov Chain – Ex 2
     • Example 2: a three-state Markov chain for the Dow Jones Industrial Average (Huang et al., 2001)
       [Figure: the three-state (up / down / unchanged) transition diagram; only a11 = 0.6 and π are recoverable from the extracted text.]
       π = [0.5  0.2  0.3]^t
       The probability of 5 consecutive up days:
       P(5 consecutive up days) = P(1,1,1,1,1)
                                = π1 a11 a11 a11 a11 = 0.5 × 0.6^4 = 0.0648
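The arithmetic above is a one-liner; only π(up) = 0.5 and a11 = 0.6 are needed (the variable names below are illustrative):

```python
# Only the values used by the slide's computation: π1 = 0.5, a11 = 0.6.
pi_up, a_up_up = 0.5, 0.6

# P(1,1,1,1,1) = π1 · a11^4: one initial "up" plus four up→up transitions.
p_5_up = pi_up * a_up_up ** 4
print(p_5_up)   # ≈ 0.0648
```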

  8. Extension to Hidden Markov Models
     • HMM: an extended version of the Observable Markov Model
       – The observation is a probabilistic function (discrete or continuous) of a state, instead of a one-to-one correspondence with a state
       – The model is a doubly embedded stochastic process with an underlying stochastic process that is not directly observable (hidden)
         • What is hidden? The state sequence! Given the observation sequence, we are not sure which state sequence generated it!

  9. Hidden Markov Models – Ex 1
     • Example: a 3-state discrete HMM λ

             | 0.6  0.3  0.1 |
         A = | 0.1  0.7  0.2 |
             | 0.3  0.2  0.5 |

         b1(A) = 0.3, b1(B) = 0.2, b1(C) = 0.5
         b2(A) = 0.7, b2(B) = 0.1, b2(C) = 0.2
         b3(A) = 0.3, b3(B) = 0.6, b3(C) = 0.1
         π = [0.4  0.5  0.1]

       – Given a sequence of observations O = {ABC}, there are 27 possible corresponding state sequences, and therefore the probability is
         P(O|λ) = Σ_{i=1}^{27} P(O, Q_i|λ) = Σ_{i=1}^{27} P(O|Q_i, λ) P(Q_i|λ),   Q_i: state sequence
         e.g. when Q_i = {S2 S2 S3}:
         P(O|Q_i, λ) = P(A|S2) P(B|S2) P(C|S3) = 0.7 × 0.1 × 0.1 = 0.007
         P(Q_i|λ) = π(S2) P(S2|S2) P(S3|S2) = 0.5 × 0.7 × 0.2 = 0.07

  10. Hidden Markov Models – Ex 2
      Given a three-state hidden Markov model for the Dow Jones Industrial Average as follows: (Huang et al., 2001)
      – How to find the probability P(up, up, up, up, up|λ)?
      – How to find the optimal state sequence of the model which generates the observation sequence "up, up, up, up, up"?

  11. Elements of an HMM
      • An HMM is characterized by the following:
        1. N, the number of states in the model
        2. M, the number of distinct observation symbols per state
        3. The state-transition probability distribution A = {a_ij}, where a_ij = P[q_{t+1} = j | q_t = i], 1 ≤ i, j ≤ N
        4. The observation symbol probability distribution in state j, B = {b_j(v_k)}, where b_j(v_k) = P[o_t = v_k | q_t = j], 1 ≤ j ≤ N, 1 ≤ k ≤ M
        5. The initial state distribution π = {π_i}, where π_i = P[q_1 = i], 1 ≤ i ≤ N
      • For convenience, we usually use the compact notation λ = (A, B, π) to indicate the complete parameter set of an HMM
        – This requires specification of the two model sizes N and M
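The parameter set λ = (A, B, π) maps naturally onto a small container type. A minimal sketch (the class and method names are assumptions for illustration), including the stochastic-constraint checks implied by elements 3–5:

```python
from dataclasses import dataclass

@dataclass
class HMM:
    """λ = (A, B, π) for a discrete HMM with N states and M symbols."""
    A: list    # N x N, A[i][j] = P[q_{t+1} = j | q_t = i]
    B: list    # N x M, B[j][k] = P[o_t = v_k | q_t = j]
    pi: list   # length N, pi[i] = P[q_1 = i]

    def check(self, tol=1e-9):
        """Every row of A, every row of B, and π must each sum to 1."""
        ok = abs(sum(self.pi) - 1.0) < tol
        ok &= all(abs(sum(row) - 1.0) < tol for row in self.A)
        ok &= all(abs(sum(row) - 1.0) < tol for row in self.B)
        return ok
```

Here N and M are implicit in the shapes of A and B, mirroring the slide's remark that the two model sizes must be specified.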

  12. Two Major Assumptions for HMMs
      • First-order Markov assumption
        – The state transition depends only on the origin and destination states
        – The state-transition probability is time invariant:
          a_ij = P[q_{t+1} = j | q_t = i], 1 ≤ i, j ≤ N
      • Output-independence assumption
        – An observation depends only on the state that generates it, not on its neighboring observations

  13. Three Basic Problems for HMMs
      • Given an observation sequence O = (o1, o2, ..., oT) and an HMM λ = (A, B, π):
        – Problem 1: How to efficiently compute P(O|λ)? → Evaluation problem
        – Problem 2: How to choose an optimal state sequence Q = (q1, q2, ..., qT) which best explains the observations? → Decoding problem
          Q* = arg max_Q P(Q, O|λ)
        – Problem 3: How to adjust the model parameters λ = (A, B, π) to maximize P(O|λ)? → Learning/training problem

  14. Solution to Problem 1 - Direct Evaluation
      Given O and λ, find P(O|λ) = Pr{observing O given λ}
      • Evaluate all possible state sequences of length T that could generate the observation sequence O:
        P(O|λ) = Σ_{all Q} P(O, Q|λ) = Σ_{all Q} P(O|Q, λ) P(Q|λ)
      • P(Q|λ): the probability of the path Q
        – By the first-order Markov assumption,
          P(Q|λ) = P(q1|λ) Π_{t=2}^{T} P(q_t|q_{t-1}, λ) = π_{q1} a_{q1 q2} a_{q2 q3} ... a_{q_{T-1} q_T}
      • P(O|Q, λ): the joint output probability along the path Q
        – By the output-independence assumption,
          P(O|Q, λ) = Π_{t=1}^{T} P(o_t|q_t, λ) = Π_{t=1}^{T} b_{q_t}(o_t)
```
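The two-factor decomposition above can be written directly as code. This sketch (function name illustrative) computes the two factors separately for every one of the N^T paths, which makes the exponential cost of direct evaluation explicit:

```python
from itertools import product

def direct_eval(A, B, pi, obs):
    """P(O|λ) = Σ_Q P(O|Q,λ)·P(Q|λ), summed over all N^T state paths."""
    N, T = len(pi), len(obs)
    total = 0.0
    for q in product(range(N), repeat=T):
        path_p = pi[q[0]]                  # P(Q|λ): first-order Markov assumption
        for t in range(1, T):
            path_p *= A[q[t - 1]][q[t]]
        out_p = 1.0                        # P(O|Q,λ): output-independence assumption
        for t in range(T):
            out_p *= B[q[t]][obs[t]]
        total += path_p * out_p
    return total
```

With N states and T observations this performs on the order of T·N^T multiplications, which is why an efficient alternative to direct evaluation is needed in practice.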

  15. Solution to Problem 1 - Direct Evaluation (cont.)
      [Figure: a state-time trellis with states s1, s2, s3 on the vertical axis and times 1, 2, 3, ..., T-1, T on the horizontal axis, with observations o1, o2, o3, ..., o_{T-1}, o_T below; a filled node s_i means b_j(o_t) has been computed, and a drawn arc means a_ij has been computed.]
