SLIDE 1

Hidden Markov Models

Hsin-min Wang

References:
1. L. R. Rabiner and B. H. Juang (1993), Fundamentals of Speech Recognition, Chapter 6
2. X. Huang et al. (2001), Spoken Language Processing, Chapter 8
3. L. R. Rabiner (1989), "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, no. 2, February 1989

SLIDE 2

Speech Recognition - Acoustic Processing

Signal processing: the speech waveform is framed and converted into a feature vector sequence O = o1 o2 o3 o4 ... oT.

Hidden Markov model: the feature vector sequence is modeled by states s=1, s=2, s=3 with transition probabilities a11, a12, a22, a23, a33 and output distributions b1(o), b2(o), b3(o), where

  aij = P(st=j|st-1=i)
  bi(ot) = Σk=1..M cik N(ot; µik, Σik)

Decoding finds the best state sequence S* and word sequence W*:

  S* = argmaxS P(O|S)
  W* = argmaxW P(O|W)

SLIDE 3

Hidden Markov Model (HMM)

History
– Published in Baum's papers in the late 1960s and early 1970s
– Introduced to speech processing by Baker (CMU) and Jelinek (IBM) in the 1970s

Assumption
– The speech signal can be characterized as a parametric random process
– The parameters can be estimated in a precise, well-defined manner

Three fundamental problems
– Evaluation of the probability (likelihood) of a sequence of observations given a specific HMM
– Determination of a best sequence of model states
– Adjustment of the model parameters so as to best account for the observed signal
SLIDE 4

Several Useful Formulas

Bayes' Rule:
  P(A|B) = P(A,B)/P(B) = P(B|A)P(A)/P(B)
  P(A|B,λ) = P(A,B|λ)/P(B|λ) = P(B|A,λ)P(A|λ)/P(B|λ),  λ: model describing the probability

Total probability:
  P(A) = ΣB P(A,B) = ΣB P(A|B)P(B), if B is discrete
  P(A) = ∫ f(A,B)dB = ∫ f(A|B)f(B)dB, if B is continuous

Independence:
  P(x1,x2,…,xn) = P(x1)P(x2)…P(xn), if x1,x2,…,xn are independent

Expectation:
  E[q(z)] = Σk q(k)P(z=k), if z is discrete
  E[q(z)] = ∫ q(z)f(z)dz, if z is continuous

SLIDE 5

The Markov Chain

An Observable Markov Model
– A Markov chain with N states labeled {1,…,N}; with the state at time t denoted qt, the parameters of the chain are
  aij = P(qt=j|qt-1=i), 1≤i,j≤N, with Σj=1..N aij = 1 (all i)
  πi = P(q1=i), 1≤i≤N, with Σi=1..N πi = 1
– The output of the process is the set of states at each time instant t, where each state corresponds to an observable event Xi
– There is a one-to-one correspondence between the observable sequence and the Markov chain state sequence (the observation is deterministic!)

(Rabiner 1989)

By the chain rule,
  P(X1,X2,…,Xn) = P(X1)P(X2|X1)…P(Xn|X1,X2,…,Xn-1)

For a first-order Markov chain,
  P(X1,X2,…,Xn) = P(X1) Πi=2..n P(Xi|Xi-1)

SLIDE 6

The Markov Chain - Ex 1

Example 1: a 3-state Markov chain λ
– State 1 generates symbol A only, state 2 generates symbol B only, and state 3 generates symbol C only

  π = [0.4 0.5 0.1]
      | 0.6 0.3 0.1 |
  A = | 0.1 0.7 0.2 |
      | 0.3 0.2 0.5 |

– Given a sequence of observed symbols O={CABBCABC}, the only corresponding state sequence is Q={S3 S1 S2 S2 S3 S1 S2 S3}, and the corresponding probability is
  P(O|λ) = P(CABBCABC|λ) = P(Q|λ) = P(S3 S1 S2 S2 S3 S1 S2 S3|λ)
  = π(S3)P(S1|S3)P(S2|S1)P(S2|S2)P(S3|S2)P(S1|S3)P(S2|S1)P(S3|S2)
  = 0.1×0.3×0.3×0.7×0.2×0.3×0.3×0.2 = 0.00002268
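The computation above can be sketched in a few lines of Python (a minimal sketch; the matrix values are the ones on this slide, with 0-based state indices):

```python
# Probability of the single state path Q = S3 S1 S2 S2 S3 S1 S2 S3 that can
# generate O = CABBCABC in the observable Markov model of Slide 6.
pi = [0.4, 0.5, 0.1]                 # initial state probabilities
A = [[0.6, 0.3, 0.1],                # a_ij = P(q_t = j | q_t-1 = i)
     [0.1, 0.7, 0.2],
     [0.3, 0.2, 0.5]]

def path_probability(states, pi, A):
    """P(Q|lambda) = pi(q1) * product of the transition probabilities."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

# S3 S1 S2 S2 S3 S1 S2 S3 with 0-based indices:
p = path_probability([2, 0, 1, 1, 2, 0, 1, 2], pi, A)   # ≈ 0.00002268
```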

SLIDE 7

The Markov Chain - Ex 2

Example 2: a three-state Markov chain for the Dow Jones Industrial average

  π = (πi) = (0.5, 0.2, 0.3)t

The probability of 5 consecutive up days:
  P(5 consecutive up days) = P(1,1,1,1,1) = π1 a11 a11 a11 a11 = 0.5×(0.6)^4 = 0.0648

(Huang et al., 2001)

SLIDE 8

Extension to Hidden Markov Models

HMM: an extended version of the Observable Markov Model
– The observation is a probabilistic function (discrete or continuous) of a state, instead of being in one-to-one correspondence with the state
– The model is a doubly embedded stochastic process with an underlying stochastic process that is not directly observable (hidden)

What is hidden? The state sequence! Given the observation sequence, we are not sure which state sequence generated it!

SLIDE 9

Hidden Markov Models - Ex 1

Example: a 3-state discrete HMM λ

  π = [0.4 0.5 0.1]
      | 0.6 0.3 0.1 |
  A = | 0.1 0.7 0.2 |
      | 0.3 0.2 0.5 |
  b1: {A:.3, B:.2, C:.5}, b2: {A:.7, B:.1, C:.2}, b3: {A:.3, B:.6, C:.1}

– Given a sequence of observations O={ABC}, there are 27 possible corresponding state sequences, and therefore the corresponding probability is
  P(O|λ) = Σi=1..27 P(O, Qi|λ) = Σi=1..27 P(O|Qi,λ)P(Qi|λ)
  e.g. when Qi = {S2, S2, S3}:
  P(O|Qi,λ) = P(A|S2)P(B|S2)P(C|S3) = 0.7×0.1×0.1 = 0.007
  P(Qi|λ) = π(S2)P(S2|S2)P(S3|S2) = 0.5×0.7×0.2 = 0.07
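A brute-force check of this sum, enumerating all 27 state sequences, can be sketched as follows (model values read off the slide; the total is not stated on the slide, so the code only sums the terms):

```python
from itertools import product

# Brute-force evaluation of P(O|lambda) for O = ABC by summing P(O,Q|lambda)
# over all 3^3 = 27 state sequences, as described on Slide 9.
pi = [0.4, 0.5, 0.1]
A = [[0.6, 0.3, 0.1],
     [0.1, 0.7, 0.2],
     [0.3, 0.2, 0.5]]
B = [{'A': 0.3, 'B': 0.2, 'C': 0.5},   # b1
     {'A': 0.7, 'B': 0.1, 'C': 0.2},   # b2
     {'A': 0.3, 'B': 0.6, 'C': 0.1}]   # b3

O = ['A', 'B', 'C']

def joint(O, Q):
    """P(O, Q | lambda) = P(Q|lambda) * P(O|Q,lambda)."""
    p = pi[Q[0]] * B[Q[0]][O[0]]
    for t in range(1, len(O)):
        p *= A[Q[t - 1]][Q[t]] * B[Q[t]][O[t]]
    return p

total = sum(joint(O, Q) for Q in product(range(3), repeat=3))
# The slide's example path Q = (S2, S2, S3):
example = joint(O, (1, 1, 2))   # 0.07 * 0.007 = 4.9e-04
```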

SLIDE 10

Hidden Markov Models – Ex 2

Given a three-state Hidden Markov Model for the Dow Jones Industrial average as follows:

(Huang et al., 2001)

How to find the probability P(up, up, up, up, up|λ)? How to find the optimal state sequence of the model which generates the observation sequence “up, up, up, up, up”?

SLIDE 11

Elements of an HMM

An HMM is characterized by the following:

  • 1. N, the number of states in the model
  • 2. M, the number of distinct observation symbols per state
  • 3. The state transition probability distribution A={aij}, where

aij=P[qt+1=j|qt=i], 1≤i,j≤N

  • 4. The observation symbol probability distribution in state j,

B={bj(vk)} , where bj(vk)=P[ot=vk|qt=j], 1≤j≤N, 1≤k≤M

  • 5. The initial state distribution π={πi}, where πi=P[q1=i], 1≤i≤N

For convenience, we usually use a compact notation λ=(A,B,π) to indicate the complete parameter set of an HMM

– Requires specification of two model parameters (N and M)
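As a minimal sketch, the parameter set λ=(A,B,π) can be held in a small container that checks the stochastic constraints. The Dow Jones numbers used to populate it come from Huang et al. (2001); only some of them (π, b(up), the first column of A) appear later in this deck, so treat the remaining entries as assumptions:

```python
# Minimal container for lambda = (A, B, pi), mirroring items 1-5 on this slide.
class HMM:
    def __init__(self, A, B, pi):
        self.A, self.B, self.pi = A, B, pi
        self.N = len(pi)       # number of states
        self.M = len(B[0])     # number of distinct observation symbols
        # Stochastic constraints: each row of A and B, and pi, sums to 1.
        assert all(abs(sum(row) - 1.0) < 1e-9 for row in A)
        assert all(abs(sum(row) - 1.0) < 1e-9 for row in B)
        assert abs(sum(pi) - 1.0) < 1e-9

# Dow Jones model (Huang et al., 2001); entries not shown in this deck are assumed.
# Observation symbols: 0=up, 1=down, 2=unchanged.
dow = HMM(A=[[0.6, 0.3, 0.1], [0.5, 0.2, 0.3], [0.4, 0.1, 0.5]],
          B=[[0.7, 0.1, 0.2], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]],
          pi=[0.5, 0.2, 0.3])
```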

SLIDE 12

Two Major Assumptions for HMMs

First-order Markov assumption
– The state transition depends only on the origin and destination states
– The state transition probability is time invariant:
  aij = P[qt+1=j|qt=i], 1≤i,j≤N

Output-independent assumption
– The observation depends only on the state that generates it, not on its neighboring observations

SLIDE 13

Three Basic Problems for HMMs

Given an observation sequence O=(o1,o2,…,oT) and an HMM λ=(A,B,π):
– Problem 1: How to efficiently compute P(O|λ)? → Evaluation problem
– Problem 2: How to choose an optimal state sequence Q=(q1,q2,…,qT) which best explains the observations? → Decoding problem
  Q* = argmaxQ P(Q, O|λ)
– Problem 3: How to adjust the model parameters λ=(A,B,π) to maximize P(O|λ)? → Learning/Training problem

SLIDE 14

Solution to Problem 1 - Direct Evaluation

Given O and λ, find P(O|λ) = Pr{observing O given λ} by evaluating all possible state sequences Q of length T that generate the observation sequence O:

  P(O|λ) = Σall Q P(O, Q|λ) = Σall Q P(Q|λ)P(O|Q,λ)

P(Q|λ): the probability of the path Q
– By the first-order Markov assumption,
  P(Q|λ) = P(q1|λ) Πt=2..T P(qt|qt-1,λ) = πq1 aq1q2 aq2q3 … aqT-1qT

P(O|Q,λ): the joint output probability along the path Q
– By the output-independent assumption,
  P(O|Q,λ) = Πt=1..T P(ot|qt,λ) = Πt=1..T bqt(ot)

SLIDE 15

Solution to Problem 1 - Direct Evaluation (cont.)

(Trellis diagram: states s1, s2, s3 at times 1, 2, 3, …, T-1, T over observations O1, O2, O3, …, OT-1, OT; a shaded state si means bi(ot) has been computed, and a marked arc means aij has been computed.)

SLIDE 16

Solution to Problem 1 - Direct Evaluation (cont.)

– Huge computation requirements: O(N^T) (there are N^T state sequences)

  P(O|λ) = Σq1,q2,…,qT πq1 bq1(o1) aq1q2 bq2(o2) … aqT-1qT bqT(oT)

  Complexity: MUL: (2T-1)N^T, ADD: N^T - 1, so roughly 2T·N^T operations

  • Exponential computational complexity

A more efficient algorithm can be used to evaluate P(O|λ)
– The Forward Procedure/Algorithm

SLIDE 17

Solution to Problem 1 - The Forward Procedure

Based on the HMM assumptions, the calculation of P(qt|qt-1,λ) and P(ot|qt,λ) involves only qt-1, qt, and ot, so it is possible to compute the likelihood P(O|λ) with a recursion on t.

Forward variable:
  αt(i) = P(o1,o2,…,ot, qt=i|λ)
– The probability of the joint event that o1,o2,…,ot are observed and the state at time t is i, given the model λ

Recursion:
  αt+1(j) = P(o1,o2,…,ot+1, qt+1=j|λ) = [Σi=1..N αt(i)aij] bj(ot+1)

SLIDE 18

Solution to Problem 1 - The Forward Procedure (cont.)

  αt+1(j) = P(o1,o2,…,ot+1, qt+1=j|λ)
  = P(o1,…,ot, qt+1=j|λ) P(ot+1|o1,…,ot, qt+1=j, λ)
  = P(o1,…,ot, qt+1=j|λ) P(ot+1|qt+1=j, λ)                      (output-independent assumption)
  = [Σi=1..N P(o1,…,ot, qt=i, qt+1=j|λ)] bj(ot+1)
  = [Σi=1..N P(o1,…,ot, qt=i|λ) P(qt+1=j|qt=i, λ)] bj(ot+1)     (first-order Markov assumption)
  = [Σi=1..N αt(i)aij] bj(ot+1)

Useful identities:
  P(A,B|λ) = P(A|B,λ)P(B|λ), so P(A|B,λ) = P(A,B|λ)/P(B|λ)
  P(A|λ) = Σall B P(A,B|λ)

SLIDE 19

Solution to Problem 1 - The Forward Procedure (cont.)

  α3(3) = P(o1,o2,o3, q3=3|λ) = [α2(1)a13 + α2(2)a23 + α2(3)a33] b3(o3)

(Trellis diagram as on Slide 15.)

SLIDE 20

Solution to Problem 1 - The Forward Procedure (cont.)

Algorithm
1. Initialization: α1(i) = πi bi(o1), 1≤i≤N
2. Induction: αt+1(j) = [Σi=1..N αt(i)aij] bj(ot+1), 1≤t≤T-1, 1≤j≤N
3. Termination: P(O|λ) = Σi=1..N αT(i)

– Complexity: O(N²T); MUL: N(N+1)(T-1)+N ≈ N²T, ADD: (N-1)N(T-1)+(N-1) ≈ N²T

Based on the lattice (trellis) structure
– Computed in a time-synchronous fashion from left to right, where each cell for time t is completely computed before proceeding to time t+1
– All state sequences, regardless of how long previously, merge to N nodes (states) at each time instance t

SLIDE 21

Solution to Problem 1 - The Forward Procedure (cont.)

A three-state Hidden Markov Model for the Dow Jones Industrial average (Huang et al., 2001)

  π1=0.5, π2=0.2, π3=0.3
  b1(up)=0.7, b2(up)=0.1, b3(up)=0.3
  a11=0.6, a21=0.5, a31=0.4

Initialization: α1(1)=0.5×0.7=0.35, α1(2)=0.2×0.1=0.02, α1(3)=0.3×0.3=0.09
Induction: α2(1) = (0.35×0.6 + 0.02×0.5 + 0.09×0.4)×0.7 = 0.1792
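A sketch of the forward procedure that reproduces these numbers. Only π, b(up), and the first column of A appear on the slide; the remaining entries of A and B are filled in from Huang et al. (2001) and should be treated as assumptions:

```python
# Forward procedure for the Dow Jones HMM; symbols: 0=up, 1=down, 2=unchanged.
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.3, 0.1],          # only column 1 (0.6, 0.5, 0.4) is on the slide
     [0.5, 0.2, 0.3],
     [0.4, 0.1, 0.5]]
B = [[0.7, 0.1, 0.2],          # only b(up) = (0.7, 0.1, 0.3) is on the slide
     [0.1, 0.6, 0.3],
     [0.3, 0.3, 0.4]]
UP = 0

def forward(obs, A, B, pi):
    """Return the alpha trellis; P(O|lambda) = sum(alpha[-1])."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]            # initialization
    for o in obs[1:]:                                             # induction
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha

alpha = forward([UP, UP], A, B, pi)
# alpha[0] matches the slide's (0.35, 0.02, 0.09); alpha[1][0] matches 0.1792.
```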

SLIDE 22

Solution to Problem 2 - The Viterbi Algorithm

The Viterbi algorithm can be regarded as the dynamic programming algorithm applied to the HMM or as a modified forward algorithm

– Instead of summing up probabilities from different paths coming to the same destination state, the Viterbi algorithm picks and remembers the best path

  • Find a single optimal state sequence Q=(q1,q2,……, qT)

– The Viterbi algorithm also can be illustrated in a trellis framework similar to the one for the forward algorithm

SLIDE 23

Solution to Problem 2 - The Viterbi Algorithm (cont.)

(Trellis diagram: at each state and time, only the best incoming path is kept.)

SLIDE 24

Solution to Problem 2 - The Viterbi Algorithm (cont.)

1. Initialization: δ1(i) = πi bi(o1), Ψ1(i) = 0, 1≤i≤N
2. Induction:
   δt+1(j) = max1≤i≤N [δt(i)aij] bj(ot+1), 1≤t≤T-1, 1≤j≤N
   Ψt+1(j) = argmax1≤i≤N [δt(i)aij], 1≤t≤T-1, 1≤j≤N
3. Termination:
   P* = max1≤i≤N δT(i), qT* = argmax1≤i≤N δT(i)
4. Backtracking: qt* = Ψt+1(qt+1*), t = T-1, T-2, …, 1
   Q* = (q1*, q2*, …, qT*) is the best state sequence

Complexity: O(N²T)

SLIDE 25

Solution to Problem 2 - The Viterbi Algorithm (cont.)

A three-state Hidden Markov Model for the Dow Jones Industrial average (Huang et al., 2001)

  π1=0.5, π2=0.2, π3=0.3
  b1(up)=0.7, b2(up)=0.1, b3(up)=0.3
  a11=0.6, a21=0.5, a31=0.4

  δ1(1)=0.5×0.7=0.35, δ1(2)=0.2×0.1=0.02, δ1(3)=0.3×0.3=0.09
  δ2(1) = max(0.35×0.6, 0.02×0.5, 0.09×0.4)×0.7 = 0.21×0.7 = 0.147
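A sketch of Viterbi decoding for the same model. As before, only part of the model appears on the slide; the missing entries of A and B are assumed from Huang et al. (2001):

```python
# Viterbi decoding for the Dow Jones HMM; symbols: 0=up, 1=down, 2=unchanged.
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.3, 0.1],
     [0.5, 0.2, 0.3],
     [0.4, 0.1, 0.5]]
B = [[0.7, 0.1, 0.2],
     [0.1, 0.6, 0.3],
     [0.3, 0.3, 0.4]]
UP = 0

def viterbi(obs, A, B, pi):
    """Return (best state path, its probability) via the max/argmax recursion."""
    N = len(pi)
    delta = [[pi[i] * B[i][obs[0]] for i in range(N)]]    # initialization
    psi = [[0] * N]
    for o in obs[1:]:                                     # induction
        prev = delta[-1]
        row, back = [], []
        for j in range(N):
            i_best = max(range(N), key=lambda i: prev[i] * A[i][j])
            back.append(i_best)
            row.append(prev[i_best] * A[i_best][j] * B[j][o])
        delta.append(row)
        psi.append(back)
    q = [max(range(N), key=lambda i: delta[-1][i])]       # termination
    for back in reversed(psi[1:]):                        # backtracking
        q.append(back[q[-1]])
    return list(reversed(q)), max(delta[-1])

path, p = viterbi([UP, UP, UP, UP, UP], A, B, pi)
# For two up days, delta_2(1) = max(0.35*0.6, 0.02*0.5, 0.09*0.4)*0.7 = 0.147.
```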

SLIDE 26

Homework #1 (due March 29)

Given a three-state Hidden Markov Model for the Dow Jones Industrial average as follows:
P1. Please find P(up, up, unchanged, down, unchanged, down, up|λ) using the forward and backward algorithms, respectively.
P2. Please find the optimal state sequence of the model which generates the observation sequence "up, up, unchanged, down, unchanged, down, up" using the Viterbi algorithm.

SLIDE 27

Solution to Problem 3 – The Baum-Welch Algorithm

How to adjust (re-estimate) the model parameters λ=(A,B,π) to maximize P(O|λ)?
– The most difficult of the three problems, because there is no known analytical method that maximizes the joint probability of the training data in closed form
  • The data is incomplete because of the hidden state sequence
– The problem can be solved by the iterative Baum-Welch algorithm, also known as the forward-backward algorithm
  • The EM (Expectation Maximization) algorithm is perfectly suitable for this problem

SLIDE 28

Solution to Problem 3 – The Backward Procedure

Backward variable:
  βt(i) = P(ot+1,ot+2,…,oT|qt=i,λ)
– The probability of the partial observation sequence ot+1,ot+2,…,oT, given state i at time t and the model λ
– e.g. β2(3) = P(o3,o4,…,oT|q2=3,λ)
  = a31 b1(o3) β3(1) + a32 b2(o3) β3(2) + a33 b3(o3) β3(3)

(Trellis diagram as on Slide 15.)

SLIDE 29

Solution to Problem 3 – The Backward Procedure (cont.)

  βt(i) = P(ot+1,ot+2,…,oT|qt=i,λ)

Algorithm
1. Initialization: βT(i) = 1, 1≤i≤N
2. Induction: βt(i) = Σj=1..N aij bj(ot+1) βt+1(j), t = T-1, …, 1, 1≤i≤N

– Complexity: MUL ≈ 2N²T; ADD ≈ N²T

The likelihood can then be computed as
  P(O|λ) = Σi=1..N P(o1,o2,…,oT, q1=i|λ)
  = Σi=1..N P(o1,o2,…,oT|q1=i,λ)P(q1=i|λ)
  = Σi=1..N πi P(o1|q1=i,λ) P(o2,o3,…,oT|q1=i,λ)
  = Σi=1..N πi bi(o1) β1(i)
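A sketch of the backward procedure; terminating it as above gives the same P(O|λ) as the forward pass, which makes a convenient sanity check. Model values are the ones used earlier (partly from the deck, the rest assumed from Huang et al., 2001):

```python
# Backward procedure for the Dow Jones HMM; symbols: 0=up, 1=down, 2=unchanged.
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.3, 0.1],
     [0.5, 0.2, 0.3],
     [0.4, 0.1, 0.5]]
B = [[0.7, 0.1, 0.2],
     [0.1, 0.6, 0.3],
     [0.3, 0.3, 0.4]]
UP = 0

def backward(obs, A, B, pi):
    """Return the beta trellis, built from t=T back to t=1."""
    N = len(pi)
    beta = [[1.0] * N]                       # initialization: beta_T(i) = 1
    for o in reversed(obs[1:]):              # induction, t = T-1, ..., 1
        beta.insert(0, [sum(A[i][j] * B[j][o] * beta[0][j] for j in range(N))
                        for i in range(N)])
    return beta

obs = [UP, UP, UP]
beta = backward(obs, A, B, pi)
# Termination: P(O|lambda) = sum_i pi_i * b_i(o1) * beta_1(i)
p_backward = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
```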

SLIDE 30

Solution to Problem 3 – The Forward-Backward Algorithm

Relation between the forward and backward variables:

  αt(i) = P(o1,…,ot, qt=i|λ) = [Σj=1..N αt-1(j)aji] bi(ot)
  βt(i) = P(ot+1,…,oT|qt=i,λ) = Σj=1..N aij bj(ot+1) βt+1(j)

  αt(i)βt(i) = P(O, qt=i|λ)
  P(O|λ) = Σi=1..N αt(i)βt(i), for any t

(Huang et al., 2001)

SLIDE 31

Solution to Problem 3 – The Forward-Backward Algorithm (cont.)

  αt(i)βt(i) = P(o1,…,ot, qt=i|λ) × P(ot+1,…,oT|qt=i,λ)
  = P(o1,…,ot|qt=i,λ)P(qt=i|λ) × P(ot+1,…,oT|qt=i,λ)
  = P(o1,…,ot, ot+1,…,oT|qt=i,λ)P(qt=i|λ)      (output-independent assumption)
  = P(O|qt=i,λ)P(qt=i|λ)
  = P(O, qt=i|λ)

  P(O|λ) = Σi=1..N P(O, qt=i|λ) = Σi=1..N αt(i)βt(i)

SLIDE 32

Solution to Problem 3 – The Intuitive View

Define two new variables:

γt(i) = P(qt=i|O,λ)
– Probability of being in state i at time t, given O and λ
  γt(i) = P(qt=i, O|λ)/P(O|λ) = αt(i)βt(i)/P(O|λ) = αt(i)βt(i) / Σi=1..N αt(i)βt(i)

ξt(i,j) = P(qt=i, qt+1=j|O,λ)
– Probability of being in state i at time t and state j at time t+1, given O and λ
  ξt(i,j) = P(qt=i, qt+1=j, O|λ)/P(O|λ)
  = αt(i) aij bj(ot+1) βt+1(j) / Σm=1..N Σn=1..N αt(m) amn bn(ot+1) βt+1(n)

  γt(i) = Σj=1..N ξt(i,j)

where αt(i)βt(i) = P(qt=i, O|λ) and P(O|λ) = Σi=1..N αt(i)βt(i)
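Computing γ and ξ from the forward and backward trellises can be sketched as below, again with the partly assumed Dow Jones model (missing entries taken from Huang et al., 2001) and O = (up, up, up):

```python
# gamma_t(i) and xi_t(i,j) from alpha and beta, as defined on Slide 32.
pi = [0.5, 0.2, 0.3]
A = [[0.6, 0.3, 0.1], [0.5, 0.2, 0.3], [0.4, 0.1, 0.5]]
B = [[0.7, 0.1, 0.2], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]
obs = [0, 0, 0]                  # up, up, up
N, T = 3, len(obs)

# Forward pass.
alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
for o in obs[1:]:
    alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][o]
                  for j in range(N)])
# Backward pass.
beta = [[1.0] * N]
for o in reversed(obs[1:]):
    beta.insert(0, [sum(A[i][j] * B[j][o] * beta[0][j] for j in range(N))
                    for i in range(N)])

pO = sum(alpha[-1])              # P(O|lambda)
gamma = [[alpha[t][i] * beta[t][i] / pO for i in range(N)] for t in range(T)]
xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / pO
        for j in range(N)] for i in range(N)] for t in range(T - 1)]
```

By construction, each γt sums to 1 over the states, and γt(i) = Σj ξt(i,j) for t < T.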

SLIDE 33

Solution to Problem 3 – The Intuitive View (cont.)

  P(q3=3, O|λ) = α3(3)β3(3)

(Trellis diagram.)

SLIDE 34

Solution to Problem 3 – The Intuitive View (cont.)

  P(q3=3, q4=1, O|λ) = α3(3) a31 b1(o4) β4(1)

(Trellis diagram.)

SLIDE 35

Solution to Problem 3 – The Intuitive View (cont.)

  γt(i) = P(qt=i|O,λ), ξt(i,j) = P(qt=i, qt+1=j|O,λ)

  Σt=1..T-1 ξt(i,j) = expected number of transitions from state i to state j in O
  Σt=1..T-1 γt(i) = expected number of transitions from state i in O

SLIDE 36

Solution to Problem 3 – The Intuitive View (cont.)

Reestimation formulas for π, A, and B (′ denotes the re-estimated parameters):

  π′i = expected frequency (number of times) in state i at time t=1 = γ1(i)

  a′ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)
       = Σt=1..T-1 ξt(i,j) / Σt=1..T-1 γt(i)

  b′j(vk) = (expected number of times in state j and observing symbol vk) / (expected number of times in state j)
          = Σt=1..T, s.t. ot=vk γt(j) / Σt=1..T γt(j)

SLIDE 37

Homework #2 (due April 5)

Given an initial model λ=(A,B,π):

  π = [0.34 0.33 0.33]
      | 0.34 0.33 0.33 |
  A = | 0.33 0.34 0.33 |
      | 0.33 0.33 0.34 |
  b1: {A:.34, B:.33, C:.33}, b2: {A:.33, B:.34, C:.33}, b3: {A:.33, B:.33, C:.34}

train HMMs for the following two classes using their training data respectively.

Training set for class 1:
  • 1. ABBCABCAABC
  • 2. ABCABC
  • 3. ABCAABC
  • 4. BBABCAB
  • 5. BCAABCCAB
  • 6. CACCABCA
  • 7. CABCABCA
  • 8. CABCA
  • 9. CABCA

Training set for class 2:
  • 1. BBBCCBC
  • 2. CCBABB
  • 3. AACCBBB
  • 4. BBABBAC
  • 5. CCAABBAB
  • 6. BBBCCBAA
  • 7. ABBBBABA
  • 8. CCCCC
  • 9. BBAAA
SLIDE 38

Homework #2 (cont.)

P1. Please specify the model parameters after the first and 50th iterations of Baum-Welch training.
P2. Please show the recognition results by using the above training sequences as the testing data (the so-called inside testing). Note that you have to perform the recognition task with the HMMs trained from the first and 50th iterations of Baum-Welch training, respectively.
P3. Which class do the following testing sequences belong to?
  ABCABCCAB
  AABABCCCCBBB
Note that you have to perform the recognition task with the HMMs trained from the first and 50th iterations of Baum-Welch training, respectively.

SLIDE 39

The EM Algorithm - A Simple Example

(Figure: two bottles A and B of colored balls; the output is the observed ball sequence.)

Observed data: O → "ball sequence"
Latent data: Q → "bottle sequence"
Parameters to be estimated to maximize logP(O|λ) (ML criterion):
  λ = {P(A), P(B), P(A|A), P(B|A), P(A|B), P(B|B), P(R|A), P(G|A), P(R|B), P(G|B)}

SLIDE 40

The EM Algorithm

EM: Expectation Maximization
– Why EM?
  • Simple optimization algorithms for likelihood functions rely on intermediate variables, called latent (hidden) data; for HMM, the state sequence is the latent data
  • Direct access to the data necessary to estimate the parameters is impossible or difficult; for HMM, it is almost impossible to estimate {A, B, π} without consideration of the state sequence
– Two major steps:
  • E step: calculate the expectation with respect to the latent data, given the current estimate of the parameters and the observations
  • M step: estimate a new set of parameters according to the Maximum Likelihood (ML) or Maximum A Posteriori (MAP) criterion

SLIDE 41

The EM Algorithm (cont.)

The EM algorithm is important to HMMs and many other model learning techniques.

Basic idea
– Assume we have λ and the probability that each Q=q occurred in the generation of O=o, i.e., we have in fact observed a complete data pair (o,q) with frequency proportional to the probability P(O=o, Q=q|λ)
– We then find a new λ′ that maximizes the expectation
  Σq P(Q=q|O=o,λ) logP(O=o, Q=q|λ′)
– It can be guaranteed that P(O=o|λ′) ≥ P(O=o|λ)

EM can discover parameters of the model λ to maximize the log-likelihood of the incomplete data, logP(O=o|λ), by iteratively maximizing the expectation of the log-likelihood of the complete data, logP(O=o, Q=q|λ)

SLIDE 42

The EM Algorithm (cont.)

Our goal is to maximize the log-likelihood of the observable data o generated by model λ′, i.e., logP(O=o|λ′). Since

  P(Q=q|O=o,λ′) = P(O=o, Q=q|λ′)/P(O=o|λ′)

we have

  logP(O=o|λ′) = logP(O=o, Q=q|λ′) − logP(Q=q|O=o,λ′)

Taking the expectation over q with respect to P(Q=q|O=o,λ):

  logP(O=o|λ′) = Σq P(Q=q|O=o,λ) logP(O=o, Q=q|λ′) − Σq P(Q=q|O=o,λ) logP(Q=q|O=o,λ′)
               = Q(λ,λ′) − H(λ,λ′)

We want logP(O=o|λ′) ≥ logP(O=o|λ), i.e.

  logP(O=o|λ′) − logP(O=o|λ) = [Q(λ,λ′) − Q(λ,λ)] − [H(λ,λ′) − H(λ,λ)] ≥ 0 ?

SLIDE 43

The EM Algorithm (cont.)

  H(λ,λ′) − H(λ,λ) = Σq P(Q=q|O=o,λ) log[P(Q=q|O=o,λ′)/P(Q=q|O=o,λ)]
  ≤ log Σq P(Q=q|O=o,λ) [P(Q=q|O=o,λ′)/P(Q=q|O=o,λ)]      (Jensen's inequality)
  = log Σq P(Q=q|O=o,λ′) = log 1 = 0

  ∴ −[H(λ,λ′) − H(λ,λ)] ≥ 0

1. Jensen's inequality: if f is a concave function and X is a r.v., then E[f(X)] ≤ f(E[X])
2. log x ≤ x − 1

The Q-function (auxiliary function):
  Q(λ,λ′) = Σq P(Q=q|O=o,λ) logP(O=o, Q=q|λ′)

If we choose λ′ so that Q(λ,λ′) ≥ Q(λ,λ), then

  logP(O=o|λ′) ≥ logP(O=o|λ)

SLIDE 44

Solution to Problem 3 - The EM Algorithm

The auxiliary function

  Q(λ,λ′) = ΣQ [P(O,Q|λ)/P(O|λ)] logP(O,Q|λ′)

where P(O,Q|λ) and logP(O,Q|λ′) can be expressed as

  P(O,Q|λ) = πq1 bq1(o1) Πt=2..T aqt-1qt bqt(ot)
  logP(O,Q|λ′) = logπ′q1 + Σt=2..T loga′qt-1qt + Σt=1..T logb′qt(ot)

SLIDE 45

Solution to Problem 3 - The EM Algorithm (cont.)

Rewrite the auxiliary function as

  Q(λ,λ′) = ΣQ [P(O,Q|λ)/P(O|λ)] [logπ′q1 + Σt=2..T loga′qt-1qt + Σt=1..T logb′qt(ot)]
          = Qπ(λ,π′) + Qa(λ,a′) + Qb(λ,b′)

where

  Qπ(λ,π′) = Σi=1..N [P(O, q1=i|λ)/P(O|λ)] logπ′i          (weights wi → γ1(i), variables yi → π′i)
  Qa(λ,a′) = Σi=1..N Σj=1..N Σt=2..T [P(O, qt-1=i, qt=j|λ)/P(O|λ)] loga′ij   (weights wj → ξt-1(i,j))
  Qb(λ,b′) = Σj=1..N Σk=1..M Σt: ot=vk [P(O, qt=j|λ)/P(O|λ)] logb′j(vk)      (weights wk → γt(j))

SLIDE 46

A Simple Example

(Trellis: two states s1, s2 over times 1, 2, 3 with observations x1, x2, x3; each node carries the product αt(i)βt(i).)

The Forward/Backward Procedure gives

  γt(i) = P(qt=i, O|λ)/P(O|λ) = αt(i)βt(i) / Σj=1..N αt(j)βt(j)
  ξt(i,j) = P(qt=i, qt+1=j, O|λ)/P(O|λ)
          = αt(i) aij bj(ot+1) βt+1(j) / Σi=1..N Σj=1..N αt(i) aij bj(ot+1) βt+1(j)

SLIDE 47

A Simple Example (cont.)

A two-state model (initial probabilities π1, π2, transitions a11, a12, a21, a22) observing the sequence (v4, v7, v4). There are 8 paths in total (bi,k denotes bi(vk); paths are numbered 1 to 8):

  q        P(O,q|λ)                      logP(O,q|λ)
  1 1 1    π1·b1,4·a11·b1,7·a11·b1,4     logπ1 + logb1,4 + loga11 + logb1,7 + loga11 + logb1,4
  1 1 2    π1·b1,4·a11·b1,7·a12·b2,4     logπ1 + logb1,4 + loga11 + logb1,7 + loga12 + logb2,4
  1 2 1    π1·b1,4·a12·b2,7·a21·b1,4     logπ1 + logb1,4 + loga12 + logb2,7 + loga21 + logb1,4
  1 2 2    π1·b1,4·a12·b2,7·a22·b2,4     logπ1 + logb1,4 + loga12 + logb2,7 + loga22 + logb2,4
  2 1 1    π2·b2,4·a21·b1,7·a11·b1,4     logπ2 + logb2,4 + loga21 + logb1,7 + loga11 + logb1,4
  2 1 2    π2·b2,4·a21·b1,7·a12·b2,4     logπ2 + logb2,4 + loga21 + logb1,7 + loga12 + logb2,4
  2 2 1    π2·b2,4·a22·b2,7·a21·b1,4     logπ2 + logb2,4 + loga22 + logb2,7 + loga21 + logb1,4
  2 2 2    π2·b2,4·a22·b2,7·a22·b2,4     logπ2 + logb2,4 + loga22 + logb2,7 + loga22 + logb2,4

SLIDE 48

A Simple Example (cont.)

Writing pi = P(O, q(i)|λ) for the 8 paths and all = p1+p2+…+p8 = P(O|λ):

  Qπ = [(p1+p2+p3+p4)/all] logπ′1 + [(p5+p6+p7+p8)/all] logπ′2
     = γ1(1) logπ′1 + γ1(2) logπ′2

  Qa = [(p1+p2+p1+p5)/all] loga′11 + [(p3+p4+p2+p6)/all] loga′12
     + [(p5+p6+p3+p7)/all] loga′21 + [(p7+p8+p4+p8)/all] loga′22

(in each coefficient, the first pair of paths contributes the transition at t=1 and the second pair the transition at t=2; i is the origin state, j the destination state)

SLIDE 49

Solution to Problem 3 - The EM Algorithm (cont.)

The auxiliary function is separated into three independent terms, corresponding respectively to πi, aij, and bj(k)
– The maximization of Q(λ,λ′) can be done by maximizing the individual terms separately, subject to the probability constraints
  Σi=1..N πi = 1, Σj=1..N aij = 1 ∀i, Σk=1..M bj(k) = 1 ∀j
– All these terms have the following form:
  F(y1,y2,…,yN) = Σj=1..N wj log yj, where Σj=1..N yj = 1
  and F has its maximum value when yj = wj / Σn=1..N wn

SLIDE 50

Solution to Problem 3 - The EM Algorithm (cont.)

Proof: apply a Lagrange multiplier ℓ.

  Suppose F = Σj=1..N wj log yj + ℓ(Σj=1..N yj − 1)
  ∂F/∂yj = wj/yj + ℓ = 0 ⇒ yj = −wj/ℓ, ∀j
  Constraint Σj yj = 1 ⇒ Σj (−wj/ℓ) = 1 ⇒ ℓ = −Σj=1..N wj
  ∴ yj = wj / Σn=1..N wn

(Side note: e ≡ limh→∞ (1+1/h)^h = 2.71828…, and d(ln x)/dx = limh→0 [ln(x+h) − ln x]/h = limh→0 ln[(1+h/x)^(1/h)] = ln e^(1/x) = 1/x.)

SLIDE 51

Solution to Problem 3 - The EM Algorithm (cont.)

  Qπ(λ,π′) = Σi=1..N [P(O, q1=i|λ)/P(O|λ)] logπ′i       (wi, yi = π′i)

Applying yi = wi / Σn=1..N wn, with

  Σn=1..N wn = Σn=1..N P(O, q1=n|λ)/P(O|λ) = 1

gives

  π′i = P(O, q1=i|λ)/P(O|λ) = P(q1=i|O,λ) = γ1(i)

where γt(i) = P(qt=i|O,λ)

SLIDE 52

Solution to Problem 3 - The EM Algorithm (cont.)

  Qa(λ,a′) = Σi=1..N Σj=1..N Σt=2..T [P(O, qt-1=i, qt=j|λ)/P(O|λ)] loga′ij       (wj, yj = a′ij)

Applying yj = wj / Σn=1..N wn gives

  a′ij = Σt=2..T P(qt-1=i, qt=j|O,λ) / Σt=2..T P(qt-1=i|O,λ)
       = Σt=1..T-1 ξt(i,j) / Σt=1..T-1 γt(i)

SLIDE 53

Solution to Problem 3 - The EM Algorithm (cont.)

  Qb(λ,b′) = Σj=1..N Σk=1..M Σt: ot=vk [P(O, qt=j|λ)/P(O|λ)] logb′j(vk)       (wk, yk = b′j(vk))

Applying yk = wk / Σn wn gives

  b′j(k) = Σt=1..T, s.t. ot=vk P(qt=j|O,λ) / Σt=1..T P(qt=j|O,λ)
         = Σt=1..T, s.t. ot=vk γt(j) / Σt=1..T γt(j)

SLIDE 54

Solution to Problem 3 - The EM Algorithm (cont.)

The new model parameter set λ′ = (A′, B′, π′) can be expressed as:

  π′i = P(q1=i|O,λ) = γ1(i)
  a′ij = Σt=1..T-1 ξt(i,j) / Σt=1..T-1 γt(i)
  b′j(k) = Σt=1..T, s.t. ot=vk γt(j) / Σt=1..T γt(j)

where γt(i) = P(qt=i|O,λ) and ξt(i,j) = P(qt=i, qt+1=j|O,λ)
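One full Baum-Welch reestimation step implementing these formulas can be sketched as follows. The model values are the partly assumed Dow Jones parameters used earlier (Huang et al., 2001), and the observation sequence is illustrative; by the EM guarantee, the likelihood does not decrease after the update:

```python
# One Baum-Welch reestimation iteration for a discrete HMM (single sequence).
def forward(obs, A, B, pi):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha

def backward(obs, A, B, pi):
    N = len(pi)
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):
        beta.insert(0, [sum(A[i][j] * B[j][o] * beta[0][j] for j in range(N))
                        for i in range(N)])
    return beta

def reestimate(obs, A, B, pi):
    """Return (A', B', pi') from the gamma/xi statistics of one sequence."""
    N, T = len(pi), len(obs)
    alpha, beta = forward(obs, A, B, pi), backward(obs, A, B, pi)
    pO = sum(alpha[-1])
    gamma = [[alpha[t][i] * beta[t][i] / pO for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / pO
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    M = len(B[0])
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)]
             for j in range(N)]
    return new_A, new_B, new_pi

A = [[0.6, 0.3, 0.1], [0.5, 0.2, 0.3], [0.4, 0.1, 0.5]]
B = [[0.7, 0.1, 0.2], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]
pi = [0.5, 0.2, 0.3]
obs = [0, 0, 2, 1, 2, 1, 0]     # up, up, unchanged, down, unchanged, down, up
A2, B2, pi2 = reestimate(obs, A, B, pi)
```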

SLIDE 55

Discrete vs. Continuous Density HMMs

Two major types of HMMs according to the observations

– Discrete and finite observations:
  • The observations that all distinct states generate are finite in number, i.e., V={v1, v2, v3, …, vM}, vk∈R^L
  • In this case, the observation symbol probability distribution in state j, B={bj(k)}, is defined as bj(k)=P(ot=vk|qt=j), 1≤k≤M, 1≤j≤N (ot: observation at time t, qt: state at time t), and bj(k) consists of only M probability values

– Continuous and infinite observations:
  • The observations that all distinct states generate are infinite and continuous, i.e., V={v | v∈R^L}
  • In this case, the observation probability distribution in state j, B={bj(v)}, is defined as bj(v)=f(ot=v|qt=j), 1≤j≤N (ot: observation at time t, qt: state at time t)
  • bj(v) is a continuous probability density function (pdf) and is often a mixture of multivariate Gaussian (normal) distributions
SLIDE 56

Gaussian Distribution

A continuous random variable X is said to have a Gaussian distribution with mean μ and variance σ² (σ>0) if X has a continuous pdf of the following form:

$$f_X(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$
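A one-line transcription of this density (the function name is illustrative):

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density f(x | mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
```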

SLIDE 57

Multivariate Gaussian Distribution

If X=(X1,X2,X3,…,XL) is an L-dimensional random vector with a multivariate Gaussian distribution with mean vector μ and covariance matrix Σ, then the pdf can be expressed as

$$f_{\mathbf{X}}(\mathbf{x}) = N(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{L/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right)$$

where $|\boldsymbol{\Sigma}|$ is the determinant of $\boldsymbol{\Sigma}$, and

$$\boldsymbol{\mu} = E[\mathbf{x}], \qquad \boldsymbol{\Sigma} = E[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}}] = E[\mathbf{x}\mathbf{x}^{\mathrm{T}}] - \boldsymbol{\mu}\boldsymbol{\mu}^{\mathrm{T}}, \qquad \sigma_{ij} = E[(x_i-\mu_i)(x_j-\mu_j)]$$

If X1,X2,X3,…,XL are independent random variables, the covariance matrix is reduced to a diagonal one, i.e., $\sigma_{ij} = 0,\ \forall i \neq j$, and the pdf factorizes:

$$f_{\mathbf{X}}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \prod_{i=1}^{L} \frac{1}{(2\pi\sigma_{ii}^2)^{1/2}} \exp\left( -\frac{(x_i-\mu_i)^2}{2\sigma_{ii}^2} \right)$$

SLIDE 58

Multivariate Mixture Gaussian Distribution

An L-dimensional random vector X=(X1,X2,X3,…,XL) has a multivariate mixture Gaussian distribution if

$$f_{\mathbf{X}}(\mathbf{x}) = \sum_{k=1}^{M} w_k\, N(\mathbf{x}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \qquad \sum_{k=1}^{M} w_k = 1$$

In a CDHMM, bj(v) is a continuous probability density function (pdf) and is often a mixture of multivariate Gaussian distributions:

$$b_j(\mathbf{v}) = \sum_{k=1}^{M} c_{jk}\, \frac{1}{(2\pi)^{L/2} |\boldsymbol{\Sigma}_{jk}|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{v}-\boldsymbol{\mu}_{jk})^{\mathrm{T}} \boldsymbol{\Sigma}_{jk}^{-1} (\mathbf{v}-\boldsymbol{\mu}_{jk}) \right), \qquad c_{jk} \geq 0,\ \sum_{k=1}^{M} c_{jk} = 1$$

where $\mathbf{v}$ is the observation vector, and $\boldsymbol{\mu}_{jk}$ and $\boldsymbol{\Sigma}_{jk}$ are the mean vector and covariance matrix of the k-th mixture of the j-th state.
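These two densities transcribe directly into NumPy (function names are my own). For diagonal Σ the multivariate density factors into a product of univariate Gaussians, which makes a convenient sanity check:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of an L-dimensional Gaussian N(x; mu, Sigma)."""
    L = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)          # (x-mu)^T Sigma^-1 (x-mu)
    norm = (2 * np.pi) ** (L / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def mixture_pdf(x, weights, means, covs):
    """f(x) = sum_k w_k N(x; mu_k, Sigma_k)."""
    return sum(w * mvn_pdf(x, m, S) for w, m, S in zip(weights, means, covs))
```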

SLIDE 59

Solution to Problem 3 – The Intuitive View (CDHMM)

Define a new variable γt(j,k)

– probability of being in state j at time t with the k-th mixture component accounting for ot:

$$\gamma_t(j,k) = P(q_t = j, m_t = k \mid \mathbf{O}, \lambda) = P(q_t = j \mid \mathbf{O}, \lambda)\, P(m_t = k \mid q_t = j, \mathbf{O}, \lambda)$$

By the observation-independence assumption, $P(m_t = k \mid q_t = j, \mathbf{O}, \lambda) = P(m_t = k \mid q_t = j, o_t, \lambda)$, so

$$\gamma_t(j,k) = \gamma_t(j) \cdot \frac{c_{jk}\, N(o_t; \boldsymbol{\mu}_{jk}, \boldsymbol{\Sigma}_{jk})}{\sum_{m=1}^{M} c_{jm}\, N(o_t; \boldsymbol{\mu}_{jm}, \boldsymbol{\Sigma}_{jm})}, \qquad \gamma_t(j) = \frac{\alpha_t(j)\beta_t(j)}{\sum_{s=1}^{N} \alpha_t(s)\beta_t(s)}$$

SLIDE 60

Solution to Problem 3 – The Intuitive View (CDHMM) (cont.)

Reestimation formulas for $c_{jk}$, $\boldsymbol{\mu}_{jk}$, $\boldsymbol{\Sigma}_{jk}$ are:

$$\bar{c}_{jk} = \frac{\text{expected number of times in state } j \text{ and mixture } k}{\text{expected number of times in state } j} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j,m)}$$

$$\bar{\boldsymbol{\mu}}_{jk} = \text{weighted average (mean) of observations at state } j \text{ and mixture } k = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j,k)}$$

$$\bar{\boldsymbol{\Sigma}}_{jk} = \text{weighted covariance of observations at state } j \text{ and mixture } k = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, (o_t - \bar{\boldsymbol{\mu}}_{jk})(o_t - \bar{\boldsymbol{\mu}}_{jk})^{\mathrm{T}}}{\sum_{t=1}^{T} \gamma_t(j,k)}$$
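For a single state j, the split of γt(j) across mixtures and the resulting weight and mean updates can be sketched as follows (NumPy; names are my own, and the covariance update is omitted for brevity):

```python
import numpy as np

def mixture_posteriors(gamma_j, c_j, obs, means, covs):
    """gamma_t(j,k) = gamma_t(j) * c_jk N(o_t; mu_jk, Sigma_jk) / sum_m c_jm N(o_t; mu_jm, Sigma_jm).
    gamma_j: (T,) state posteriors for one state j; c_j: (M,) mixture weights;
    obs: (T, L) observations; means/covs: per-mixture Gaussian parameters."""
    T, M = len(obs), len(c_j)
    L = obs.shape[1]
    dens = np.zeros((T, M))
    for k in range(M):
        diff = obs - means[k]
        quad = np.einsum('ti,ij,tj->t', diff, np.linalg.inv(covs[k]), diff)
        dens[:, k] = c_j[k] * np.exp(-0.5 * quad) / (
            (2 * np.pi) ** (L / 2) * np.sqrt(np.linalg.det(covs[k])))
    gamma_jk = gamma_j[:, None] * dens / dens.sum(1, keepdims=True)
    # Reestimates, following the weighted-average formulas above
    c_new = gamma_jk.sum(0) / gamma_jk.sum()
    mu_new = (gamma_jk.T @ obs) / gamma_jk.sum(0)[:, None]
    return gamma_jk, c_new, mu_new
```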
SLIDE 61

Solution to Problem 3 - The EM Algorithm (CDHMM)

Express $b_j(o_t)$ with respect to each single mixture component $b_{jk}(o_t)$:

$$P(\mathbf{O}, \mathbf{Q} \mid \lambda) = \left[ \pi_{q_1} \prod_{t=2}^{T} a_{q_{t-1} q_t} \right] \prod_{t=1}^{T} b_{q_t}(o_t) = \left[ \pi_{q_1} \prod_{t=2}^{T} a_{q_{t-1} q_t} \right] \prod_{t=1}^{T} \sum_{k=1}^{M} c_{q_t k}\, b_{q_t k}(o_t)$$

$$P(\mathbf{O}, \mathbf{Q}, \mathbf{K} \mid \lambda) = \left[ \pi_{q_1} \prod_{t=2}^{T} a_{q_{t-1} q_t} \right] \prod_{t=1}^{T} c_{q_t k_t}\, b_{q_t k_t}(o_t)$$

$$P(\mathbf{O} \mid \lambda) = \sum_{\mathbf{Q}} \sum_{\mathbf{K}} P(\mathbf{O}, \mathbf{Q}, \mathbf{K} \mid \lambda)$$

K: one of the possible mixture component sequences along with the state sequence Q

SLIDE 62

Solution to Problem 3 - The EM Algorithm (CDHMM) (cont.)

The auxiliary function can be written as:

$$Q(\lambda, \bar{\lambda}) = \sum_{\mathbf{Q}} \sum_{\mathbf{K}} P(\mathbf{O}, \mathbf{Q}, \mathbf{K} \mid \lambda) \log P(\mathbf{O}, \mathbf{Q}, \mathbf{K} \mid \bar{\lambda})$$

$$\log P(\mathbf{O}, \mathbf{Q}, \mathbf{K} \mid \bar{\lambda}) = \log \bar{\pi}_{q_1} + \sum_{t=2}^{T} \log \bar{a}_{q_{t-1} q_t} + \sum_{t=1}^{T} \log \bar{b}_{q_t k_t}(o_t) + \sum_{t=1}^{T} \log \bar{c}_{q_t k_t}$$

$$Q(\lambda, \bar{\lambda}) = Q_\pi(\lambda, \bar{\pi}) + Q_a(\lambda, \bar{a}) + Q_b(\lambda, \bar{b}) + Q_c(\lambda, \bar{c})$$

Compared to the DHMM case, we need to further solve

$$Q_b(\lambda, \bar{b}) = \sum_{t=1}^{T} \sum_{j=1}^{N} \sum_{k=1}^{M} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda) \log \bar{b}_{jk}(o_t)$$

$$Q_c(\lambda, \bar{c}) = \sum_{t=1}^{T} \sum_{j=1}^{N} \sum_{k=1}^{M} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda) \log \bar{c}_{jk}$$

SLIDE 63

Solution to Problem 3 - The EM Algorithm (CDHMM) (cont.)

The new mixture weights $\bar{c}_{jk}$ can be derived from

$$Q_c(\lambda, \bar{c}) = \sum_{t=1}^{T} \sum_{j=1}^{N} \sum_{k=1}^{M} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda) \log \bar{c}_{jk} = \sum_{j=1}^{N} \sum_{k=1}^{M} \left[ \sum_{t=1}^{T} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda) \right] \log \bar{c}_{jk}$$

which again has the $\sum_k w_k \log \bar{y}_k$ form, so

$$\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{m=1}^{M} \sum_{t=1}^{T} \gamma_t(j,m)}, \qquad \gamma_t(j,k) = P(q_t = j, k_t = k \mid \mathbf{O}, \lambda)$$

SLIDE 64

Solution to Problem 3 - The EM Algorithm (CDHMM) (cont.)

The new mean vectors and covariance matrices $\bar{\boldsymbol{\mu}}_{jk}, \bar{\boldsymbol{\Sigma}}_{jk}$ can be derived as

$$\bar{\boldsymbol{\mu}}_{jk} = \frac{\sum_{t=1}^{T} P(q_t = j, k_t = k \mid \mathbf{O}, \lambda)\, o_t}{\sum_{t=1}^{T} P(q_t = j, k_t = k \mid \mathbf{O}, \lambda)} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j,k)}$$

$$\bar{\boldsymbol{\Sigma}}_{jk} = \frac{\sum_{t=1}^{T} P(q_t = j, k_t = k \mid \mathbf{O}, \lambda)\, (o_t - \bar{\boldsymbol{\mu}}_{jk})(o_t - \bar{\boldsymbol{\mu}}_{jk})^{\mathrm{T}}}{\sum_{t=1}^{T} P(q_t = j, k_t = k \mid \mathbf{O}, \lambda)} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, (o_t - \bar{\boldsymbol{\mu}}_{jk})(o_t - \bar{\boldsymbol{\mu}}_{jk})^{\mathrm{T}}}{\sum_{t=1}^{T} \gamma_t(j,k)}$$

SLIDE 65

Solution to Problem 3 - The EM Algorithm (CDHMM) (cont.)

We want to find $\bar{\boldsymbol{\mu}}_{jk}, \bar{\boldsymbol{\Sigma}}_{jk}$ to maximize

$$Q_b(\lambda, \bar{b}) = \sum_{t=1}^{T} \sum_{j=1}^{N} \sum_{k=1}^{M} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda) \log \bar{b}_{jk}(o_t)$$

Since

$$\log \bar{b}_{jk}(o_t) = -\frac{L}{2}\log(2\pi) - \frac{1}{2}\log|\bar{\boldsymbol{\Sigma}}_{jk}| - \frac{1}{2}(o_t - \bar{\boldsymbol{\mu}}_{jk})^{\mathrm{T}} \bar{\boldsymbol{\Sigma}}_{jk}^{-1} (o_t - \bar{\boldsymbol{\mu}}_{jk})$$

we thus solve

$$\nabla_{\bar{\boldsymbol{\mu}}_{jk}, \bar{\boldsymbol{\Sigma}}_{jk}} \sum_{t=1}^{T} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda) \left[ -\frac{1}{2}\log|\bar{\boldsymbol{\Sigma}}_{jk}| - \frac{1}{2}(o_t - \bar{\boldsymbol{\mu}}_{jk})^{\mathrm{T}} \bar{\boldsymbol{\Sigma}}_{jk}^{-1} (o_t - \bar{\boldsymbol{\mu}}_{jk}) \right] = 0$$

SLIDE 66

Solution to Problem 3 - The EM Algorithm (CDHMM) (cont.)

Using $\frac{d(\mathbf{x}^{\mathrm{T}} C \mathbf{x})}{d\mathbf{x}} = (C + C^{\mathrm{T}})\mathbf{x}$ and the symmetry of $\bar{\boldsymbol{\Sigma}}_{jk}$:

$$\nabla_{\bar{\boldsymbol{\mu}}_{jk}} \sum_{t=1}^{T} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda) \left[ -\frac{1}{2}(o_t - \bar{\boldsymbol{\mu}}_{jk})^{\mathrm{T}} \bar{\boldsymbol{\Sigma}}_{jk}^{-1} (o_t - \bar{\boldsymbol{\mu}}_{jk}) \right] = \sum_{t=1}^{T} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda)\, \bar{\boldsymbol{\Sigma}}_{jk}^{-1} (o_t - \bar{\boldsymbol{\mu}}_{jk}) = 0$$

$$\Rightarrow \sum_{t=1}^{T} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda)(o_t - \bar{\boldsymbol{\mu}}_{jk}) = 0 \;\Rightarrow\; \bar{\boldsymbol{\mu}}_{jk} = \frac{\sum_{t=1}^{T} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda)\, o_t}{\sum_{t=1}^{T} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda)}$$

SLIDE 67

Solution to Problem 3 - The EM Algorithm (CDHMM) (cont.)

Using $\frac{d \log|X|}{dX} = (X^{-1})^{\mathrm{T}}$, $\frac{d(\mathbf{a}^{\mathrm{T}} X^{-1} \mathbf{b})}{dX} = -X^{-\mathrm{T}} \mathbf{a} \mathbf{b}^{\mathrm{T}} X^{-\mathrm{T}}$, and $\bar{\boldsymbol{\Sigma}}_{jk} = \bar{\boldsymbol{\Sigma}}_{jk}^{\mathrm{T}}$:

$$\nabla_{\bar{\boldsymbol{\Sigma}}_{jk}} \sum_{t=1}^{T} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda) \left[ -\frac{1}{2}\log|\bar{\boldsymbol{\Sigma}}_{jk}| - \frac{1}{2}(o_t - \bar{\boldsymbol{\mu}}_{jk})^{\mathrm{T}} \bar{\boldsymbol{\Sigma}}_{jk}^{-1} (o_t - \bar{\boldsymbol{\mu}}_{jk}) \right]$$
$$= \sum_{t=1}^{T} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda) \left[ -\frac{1}{2}\bar{\boldsymbol{\Sigma}}_{jk}^{-1} + \frac{1}{2}\bar{\boldsymbol{\Sigma}}_{jk}^{-1} (o_t - \bar{\boldsymbol{\mu}}_{jk})(o_t - \bar{\boldsymbol{\mu}}_{jk})^{\mathrm{T}} \bar{\boldsymbol{\Sigma}}_{jk}^{-1} \right] = 0$$

Multiplying by $\bar{\boldsymbol{\Sigma}}_{jk}$ on both sides gives

$$\bar{\boldsymbol{\Sigma}}_{jk} = \frac{\sum_{t=1}^{T} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda)\, (o_t - \bar{\boldsymbol{\mu}}_{jk})(o_t - \bar{\boldsymbol{\mu}}_{jk})^{\mathrm{T}}}{\sum_{t=1}^{T} P(q_t = j, k_t = k, \mathbf{O} \mid \lambda)}$$

SLIDE 68

Semicontinuous HMMs

The HMM state mixture density functions are tied together across all the models to form a set of shared kernels

– The semicontinuous or tied-mixture HMM
– A combination of the discrete HMM and the continuous HMM
  • A combination of discrete model-dependent weights with the continuous codebook probability density functions

$$b_j(\mathbf{v}) = \sum_{k=1}^{M} b_j(k)\, f_k(\mathbf{v}) = \sum_{k=1}^{M} b_j(k)\, N(\mathbf{v}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$

where $b_j(k)$ is the output probability in the discrete HMM, playing the role of the mixture weight in the continuous HMM, and $f_k(\mathbf{v})$ is the k-th mixture density function (the k-th codeword), shared among all HMM states; M is very large. Compare the continuous case:

$$b_j(\mathbf{v}) = \sum_{k=1}^{M} c_{jk}\, b_{jk}(\mathbf{v}) = \sum_{k=1}^{M} c_{jk}\, N(\mathbf{v}; \boldsymbol{\mu}_{jk}, \boldsymbol{\Sigma}_{jk})$$

– Because M is large, we can simply use the L most significant values of $f_k(\mathbf{v})$
  • Experience showed that an L of 1~3% of M is adequate
– Partial tying of $f_k(\mathbf{v})$ for different phonetic classes
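The tied-mixture output probability, including the "top-L codewords" truncation mentioned above, can be sketched as follows (the names and the masking strategy are illustrative assumptions):

```python
import numpy as np

def semicontinuous_output_prob(weights_j, codebook_densities, top_l=None):
    """b_j(v) = sum_k b_j(k) f_k(v) over a codebook of M shared Gaussians.
    weights_j: (M,) discrete weights for state j;
    codebook_densities: (M,) precomputed f_k(v) for the current frame.
    If top_l is given, keep only the L most significant codebook densities."""
    if top_l is not None:
        keep = np.argsort(codebook_densities)[-top_l:]
        masked = np.zeros_like(codebook_densities)
        masked[keep] = codebook_densities[keep]
        codebook_densities = masked
    return float(weights_j @ codebook_densities)
```

The truncated value can only underestimate the full sum, and with L large enough it converges to it.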

SLIDE 69

Initialization of HMMs

A good initialization of HMM training: Segmental K-Means Segmentation into States

– Assume that we have a training set of observations and an initial estimate of all model parameters
– Step 1: The set of training observation sequences is segmented into states, based on the current model (finding the optimal state sequence by the Viterbi algorithm)
– Step 2: Reestimate the output distributions from the segmentation.
  • For a discrete density HMM (using an M-codeword codebook):

$$\bar{b}_j(k) = \frac{\text{number of vectors with codebook index } k \text{ in state } j}{\text{number of vectors in state } j}$$

  • For a continuous density HMM (M Gaussian mixtures per state): cluster the observation vectors within each state j into a set of M clusters; then
    – $c_{jm}$ = number of vectors classified in cluster m of state j divided by the number of vectors in state j
    – $\boldsymbol{\mu}_{jm}$ = sample mean of the vectors classified in cluster m of state j
    – $\boldsymbol{\Sigma}_{jm}$ = sample covariance matrix of the vectors classified in cluster m of state j
– Step 3: Evaluate the model. If the difference between the previous and current model scores exceeds a threshold, go back to Step 1; otherwise stop, and the current model is saved
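Step 2 for the discrete case is just relative-frequency counting over the Viterbi alignment; a minimal sketch (helper name is my own, and it assumes every state receives at least one frame):

```python
import numpy as np

def init_discrete_b(state_seq, codeword_seq, N, M):
    """b_j(k) = count of frames assigned to state j with codeword k,
    divided by the count of frames in state j."""
    counts = np.zeros((N, M))
    for s, k in zip(state_seq, codeword_seq):
        counts[s, k] += 1
    return counts / counts.sum(axis=1, keepdims=True)
```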

SLIDE 70

Initialization of HMMs (cont.)

[Flow chart: Training Data + Initial Model → State Sequence Segmentation → Estimate Parameters of Observation via Segmental K-means → Model Reestimation → Model Convergence? If NO, loop back to segmentation; if YES, output Model Parameters]

SLIDE 71

Initialization of HMMs (cont.)

An example for a discrete HMM

– 3 states and 2 codewords (v1, v2); 10 training frames O1–O10 aligned to states

  • b1(v1)=3/4, b1(v2)=1/4
  • b2(v1)=1/3, b2(v2)=2/3
  • b3(v1)=2/3, b3(v2)=1/3
  • a11=3/4, a12=1/4
  • a22=2/3, a23=1/3
  • a33=1
  • π1=1, π2=π3=0

[Figure: a 3-state trellis over frames O1–O10 showing the state segmentation from which the counts above are derived]

SLIDE 72

Initialization of HMMs (cont.)

An example for a continuous HMM

– 3 states and 4 Gaussian mixtures per state

[Figure: observations O1–ON segmented into states s1–s3; within a state, K-means splits the vectors around the global mean into cluster means, giving per-cluster parameters {µ11,Σ11,c11}, {µ12,Σ12,c12}, {µ13,Σ13,c13}, {µ14,Σ14,c14} for state s1]

SLIDE 73

HMM Topology

Speech is a time-evolving non-stationary signal

– Each HMM state has the ability to capture some quasi-stationary segment in the non-stationary speech signal
– A left-to-right topology is a natural candidate to model the speech signal
– Each state has a state-dependent output probability distribution that can be used to interpret the observable speech signal
– It is common to represent a phone using 3~5 states (English) and a syllable using 6~8 states (Mandarin Chinese)

SLIDE 74

HMM Limitations

HMMs have proved themselves to be a good model of speech variability in time and feature space simultaneously, but there are a number of limitations in the conventional HMMs:

  • The state duration follows an exponential (geometric) distribution

$$d_i(t) = a_{ii}^{\,t-1}(1 - a_{ii})$$

    – This does not provide an adequate representation of the temporal structure of speech
  • First-order (Markov) assumption: the state transition depends only on the previous state
  • Output-independence assumption: each observation frame depends only on the state that generated it, not on neighboring observation frames

HMMs are well defined only for processes that are a function of a single independent variable, such as time or one-dimensional position. Although speech recognition remains the dominant field in which HMMs are applied, their use has been spreading steadily to other fields.
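The implied geometric duration model is easy to check numerically: d_i(t) sums to 1 over t ≥ 1 and has mean 1/(1 − a_ii). A sketch (function name is my own):

```python
def duration_pmf(a_ii, t):
    """d_i(t) = a_ii**(t-1) * (1 - a_ii): probability of staying
    exactly t frames in state i under a conventional HMM."""
    return a_ii ** (t - 1) * (1 - a_ii)
```

For example, a self-loop probability a_ii = 0.8 gives an expected stay of 5 frames, with the most likely duration still being a single frame, which is one sense in which the model fits speech durations poorly.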

SLIDE 75

ML vs. MAP

Estimation principle based on observations O=[o1, o2, ……, oN]

– The Maximum Likelihood (ML) principle: find the model parameter λ so that the likelihood P(O|λ) is maximum

$$\lambda^{*} = \arg\max_{\lambda} P(\mathbf{O} \mid \lambda)$$

  • For example, if λ={µ,Σ} parameterizes a multivariate normal distribution and O is i.i.d. (independent, identically distributed), then the ML estimate of λ={µ,Σ} is

$$\boldsymbol{\mu}_{ML} = \frac{1}{N}\sum_{i=1}^{N} o_i, \qquad \boldsymbol{\Sigma}_{ML} = \frac{1}{N}\sum_{i=1}^{N} (o_i - \boldsymbol{\mu}_{ML})(o_i - \boldsymbol{\mu}_{ML})^{\mathrm{T}}$$

– The Maximum a Posteriori (MAP) principle: find the model parameter λ so that the posterior P(λ|O) is maximum

$$\lambda^{*} = \arg\max_{\lambda} P(\lambda \mid \mathbf{O}) = \arg\max_{\lambda} \frac{P(\mathbf{O} \mid \lambda) P(\lambda)}{P(\mathbf{O})} = \arg\max_{\lambda} P(\mathbf{O} \mid \lambda) P(\lambda)$$
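The ML estimates above (note the 1/N normalization rather than 1/(N−1)) can be written directly (NumPy; function name is my own):

```python
import numpy as np

def ml_gaussian(O):
    """ML estimates for an i.i.d. sample O of shape (N, L):
    mu = sample mean, Sigma = sample covariance with 1/N normalization."""
    mu = O.mean(axis=0)
    diff = O - mu
    Sigma = diff.T @ diff / len(O)
    return mu, Sigma
```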

SLIDE 76

Appendix - Common Mathematical Symbols

SLIDE 77

Appendix - Matrix Calculus

Notation: a, b, x are vectors of dimension n ($x_i$ is the i-th element of x); B, X are n×n matrices ($B_{ij}$ is the element in the i-th row and j-th column):

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nn} \end{bmatrix}$$

– For scalar y and vector x, $dy/d\mathbf{x}$ is a vector whose i-th element is $dy/dx_i$:

$$\frac{dy}{d\mathbf{x}} = \begin{bmatrix} dy/dx_1 \\ dy/dx_2 \\ \vdots \\ dy/dx_n \end{bmatrix}$$

– For vector y and scalar x, $d\mathbf{y}/dx$ is a vector whose i-th element is $dy_i/dx$
– For vectors y and x, $d\mathbf{y}/d\mathbf{x}$ is a matrix whose (i,j) element is $dy_i/dx_j$
– For matrix X and scalar x, $dX/dx$ is a matrix whose (i,j) element is $dX_{ij}/dx$
– For scalar y and matrix X, $dy/dX$ is a matrix whose (i,j) element is $dy/dX_{ij}$:

$$\frac{dy}{dX} = \begin{bmatrix} dy/dX_{11} & dy/dX_{12} & \cdots & dy/dX_{1n} \\ dy/dX_{21} & dy/dX_{22} & \cdots & dy/dX_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ dy/dX_{n1} & dy/dX_{n2} & \cdots & dy/dX_{nn} \end{bmatrix}$$

SLIDE 78

Appendix - Matrix Calculus (cont.)

Property 1: $\dfrac{d(\mathbf{a}^{\mathrm{T}} X \mathbf{b})}{dX} = \mathbf{a}\mathbf{b}^{\mathrm{T}}$

– proof: $\mathbf{a}^{\mathrm{T}} X \mathbf{b} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i X_{ij} b_j$, so $\dfrac{d(\mathbf{a}^{\mathrm{T}} X \mathbf{b})}{dX_{kt}} = a_k b_t$, i.e., $\dfrac{d(\mathbf{a}^{\mathrm{T}} X \mathbf{b})}{dX} = \mathbf{a}\mathbf{b}^{\mathrm{T}}$.

SLIDE 79

Appendix - Matrix Calculus (cont.)

Property 1 - Extension: $\dfrac{d(\mathbf{a}^{\mathrm{T}} X^{\mathrm{T}} \mathbf{b})}{dX} = \mathbf{b}\mathbf{a}^{\mathrm{T}}$

– proof: $\mathbf{a}^{\mathrm{T}} X^{\mathrm{T}} \mathbf{b} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i X_{ji} b_j$, so $\dfrac{d(\mathbf{a}^{\mathrm{T}} X^{\mathrm{T}} \mathbf{b})}{dX_{kt}} = b_k a_t$, i.e., $\dfrac{d(\mathbf{a}^{\mathrm{T}} X^{\mathrm{T}} \mathbf{b})}{dX} = \mathbf{b}\mathbf{a}^{\mathrm{T}}$.

SLIDE 80

Appendix - Matrix Calculus (cont.)

Property 2: $\dfrac{d(\mathbf{x}^{\mathrm{T}} C \mathbf{x})}{d\mathbf{x}} = (C + C^{\mathrm{T}})\mathbf{x}$

– proof: $\mathbf{x}^{\mathrm{T}} C \mathbf{x} = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i C_{ij} x_j$, so

$$\frac{d(\mathbf{x}^{\mathrm{T}} C \mathbf{x})}{dx_k} = \sum_{t=1}^{n} C_{kt} x_t + \sum_{t=1}^{n} x_t C_{tk} = (C\mathbf{x})_k + (C^{\mathrm{T}}\mathbf{x})_k$$

(the diagonal term $C_{kk} x_k^2$ contributes $2C_{kk}x_k$, split across the two sums). If C is symmetric, the derivative reduces to $2C\mathbf{x}$.
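Identities like Property 2 are easy to verify numerically with central differences (a sketch; `numeric_grad` is a hypothetical helper):

```python
import numpy as np

def numeric_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at the vector x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
C = rng.normal(size=(4, 4))
x = rng.normal(size=4)
analytic = (C + C.T) @ x                        # Property 2
numeric = numeric_grad(lambda v: v @ C @ v, x)  # finite differences
print(np.max(np.abs(analytic - numeric)))       # should be tiny
```

The same finite-difference check applies to the matrix-valued identities by flattening X and perturbing one entry at a time.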

slide-81
SLIDE 81

81

Appendix - Matrix Calculus (cont.)

Property 3:

– proof

( )

[ ]

B X BX = d tr d

T

( ) ( ) ( ) ( )

[ ]

( )

[ ]

B X BX BX BX BX X X B B X B BX = ∴ = ⇒ = = ∴ = =

∑∑ ∑ ∑

= = = =

d tr d B dX tr d X B tr X B

ij ij n k n t kt kt n k kk j i n t jt it j i ij T T 1 1 1 T T 1 T T

)

  • f

row th

  • j

the : ,

  • f

row th

  • i

the : ( Q

                       

nn n n n n nn n n n n

X X X X X X X X X B B B B B B B B B K M O M M K K K M O M M K K

2 1 2 22 12 1 21 11 2 1 2 22 21 1 12 11

SLIDE 82

Appendix - Matrix Calculus (cont.)

Property 4: $\dfrac{d \det(X)}{dX} = \det(X)\,(X^{-1})^{\mathrm{T}}$

– proof: Expanding the determinant along row u, $\det(X) = \sum_{k=1}^{n} X_{uk} W_{uk}$, where the cofactor $W_{uk}$ does not depend on any entry of row u; hence $\dfrac{d \det(X)}{dX_{uv}} = W_{uv}$. Since $X^{-1} = W^{\mathrm{T}} / \det(X)$ (W the cofactor matrix), it follows that $\dfrac{d \det(X)}{dX} = W = \det(X)\,(X^{-1})^{\mathrm{T}}$.

SLIDE 83

Appendix - Matrix Calculus (cont.)

Property 5: $\dfrac{d(\mathbf{a}^{\mathrm{T}} X^{\mathrm{T}} C X \mathbf{b})}{dX} = C^{\mathrm{T}} X \mathbf{a} \mathbf{b}^{\mathrm{T}} + C X \mathbf{b} \mathbf{a}^{\mathrm{T}}$

– proof: Let $\mathbf{u} = X\mathbf{a}$ and $\mathbf{v} = X\mathbf{b}$, so $\mathbf{a}^{\mathrm{T}} X^{\mathrm{T}} C X \mathbf{b} = \mathbf{u}^{\mathrm{T}} C \mathbf{v} = \sum_i \sum_j u_i C_{ij} v_j$. Since $\dfrac{\partial u_i}{\partial X_{pq}} = \delta_{ip} a_q$ and $\dfrac{\partial v_j}{\partial X_{pq}} = \delta_{jp} b_q$,

$$\frac{\partial(\mathbf{u}^{\mathrm{T}} C \mathbf{v})}{\partial X_{pq}} = a_q \sum_j C_{pj} v_j + b_q \sum_i u_i C_{ip} = (C X \mathbf{b})_p\, a_q + (C^{\mathrm{T}} X \mathbf{a})_p\, b_q$$

i.e., $\dfrac{d(\mathbf{a}^{\mathrm{T}} X^{\mathrm{T}} C X \mathbf{b})}{dX} = C X \mathbf{b} \mathbf{a}^{\mathrm{T}} + C^{\mathrm{T}} X \mathbf{a} \mathbf{b}^{\mathrm{T}}$.

SLIDE 84

Appendix - Matrix Calculus (cont.)

Property 6: $\mathbf{x}^{\mathrm{T}} A \mathbf{x} = \mathrm{tr}(A \mathbf{x} \mathbf{x}^{\mathrm{T}})$

– proof: $\mathbf{x}^{\mathrm{T}} A \mathbf{x} = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i A_{ij} x_j$, and $\mathrm{tr}(A\mathbf{x}\mathbf{x}^{\mathrm{T}}) = \sum_{i=1}^{n} (A\mathbf{x}\mathbf{x}^{\mathrm{T}})_{ii} = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_j x_i$, which is the same double sum. The same argument gives $\mathbf{x}^{\mathrm{T}} A \mathbf{x} = \mathrm{tr}(\mathbf{x}\mathbf{x}^{\mathrm{T}} A)$.

SLIDE 85

Appendix - Matrix Calculus (cont.)

Property 7: $\dfrac{d(\mathbf{a}^{\mathrm{T}} \mathbf{x})}{d\mathbf{x}} = \dfrac{d(\mathbf{x}^{\mathrm{T}} \mathbf{a})}{d\mathbf{x}} = \mathbf{a}$

– proof: $\mathbf{a}^{\mathrm{T}} \mathbf{x} = \mathbf{x}^{\mathrm{T}} \mathbf{a} = \sum_{k=1}^{n} a_k x_k$, so the i-th element of the derivative is $\dfrac{d}{dx_i} \sum_{k=1}^{n} a_k x_k = a_i$, i.e., the derivative is the vector $\mathbf{a}$.

SLIDE 86

Appendix - Matrix Calculus (cont.)

Property 8: $\dfrac{d\,\mathrm{tr}(X^{\mathrm{T}} A X B)}{dX} = A X B + A^{\mathrm{T}} X B^{\mathrm{T}}$

– proof: $\mathrm{tr}(X^{\mathrm{T}} A X B) = \sum_{i,j,k,l} X_{ji} A_{jk} X_{kl} B_{li}$. Differentiating the two occurrences of X separately,

$$\frac{\partial\,\mathrm{tr}(X^{\mathrm{T}} A X B)}{\partial X_{uv}} = \sum_{k,l} A_{uk} X_{kl} B_{lv} + \sum_{i,j} X_{ji} A_{ju} B_{vi} = (A X B)_{uv} + (A^{\mathrm{T}} X B^{\mathrm{T}})_{uv}$$

i.e., $\dfrac{d\,\mathrm{tr}(X^{\mathrm{T}} A X B)}{dX} = A X B + A^{\mathrm{T}} X B^{\mathrm{T}}$.

SLIDE 87

Appendix - Matrix Calculus (cont.)

Property 9: $\dfrac{d(\mathbf{a}^{\mathrm{T}} X^{-1} \mathbf{b})}{dX} = -X^{-\mathrm{T}} \mathbf{a} \mathbf{b}^{\mathrm{T}} X^{-\mathrm{T}}$

– proof: From $X X^{-1} = I$, $\dfrac{dX}{dX_{uv}} X^{-1} + X \dfrac{dX^{-1}}{dX_{uv}} = 0$. Note that $\dfrac{dX}{dX_{uv}} = \mathbf{e}_u \mathbf{e}_v^{\mathrm{T}}$ is a matrix containing a 1 in position (u,v) and zeros elsewhere, so $\dfrac{dX^{-1}}{dX_{uv}} = -X^{-1} \mathbf{e}_u \mathbf{e}_v^{\mathrm{T}} X^{-1}$. Hence

$$\frac{d(\mathbf{a}^{\mathrm{T}} X^{-1} \mathbf{b})}{dX_{uv}} = -\mathbf{a}^{\mathrm{T}} X^{-1} \mathbf{e}_u \mathbf{e}_v^{\mathrm{T}} X^{-1} \mathbf{b} = -(X^{-\mathrm{T}} \mathbf{a})_u (X^{-1} \mathbf{b})_v$$

i.e., $\dfrac{d(\mathbf{a}^{\mathrm{T}} X^{-1} \mathbf{b})}{dX} = -X^{-\mathrm{T}} \mathbf{a} \mathbf{b}^{\mathrm{T}} X^{-\mathrm{T}}$.