

  1. HMMs for Acoustic Modeling (Part II) Lecture 3 CS 753 Instructor: Preethi Jyothi

  2. Recap: HMMs for Acoustic Modeling
  What are (first-order) HMMs? What are the simplifying assumptions governing HMMs? What are the three fundamental problems related to HMMs?
  1. What is the forward algorithm? What is it used to compute?
     Computing Likelihood: Given an HMM $\lambda = (A, B)$ and an observation sequence $O$, determine the likelihood $P(O \mid \lambda)$.
  2. What is the Viterbi algorithm? What is it used to compute?
     Decoding: Given as input an HMM $\lambda = (A, B)$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, find the most probable sequence of states $Q = q_1 q_2 q_3 \ldots q_T$.
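As a concrete companion to the Viterbi recap question, here is a minimal NumPy sketch of Viterbi decoding for a discrete-output HMM. The array conventions (A as the N x N transition matrix, B as the N x K emission matrix, pi as the initial state distribution, obs as an integer-coded observation sequence) are illustrative assumptions, not something fixed by the slides.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most probable state sequence q_1..q_T for an integer-coded observation sequence."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))            # delta[t, j]: best path probability ending in state j at time t
    psi = np.zeros((T, N), dtype=int)   # psi[t, j]: best predecessor of state j at time t
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # scores[i, j] = delta[t-1, i] * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # backtrace the best path
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta[-1].max()
```

In practice the products are computed in log space to avoid underflow on long utterances.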

  3. Problem 3: Learning in HMMs
  Problem 1 (Likelihood): Given an HMM $\lambda = (A, B)$ and an observation sequence $O$, determine the likelihood $P(O \mid \lambda)$.
  Problem 2 (Decoding): Given an observation sequence $O$ and an HMM $\lambda = (A, B)$, discover the best hidden state sequence $Q$.
  Problem 3 (Learning): Given an observation sequence $O$ and the set of possible states in the HMM, learn the HMM parameters $A$ and $B$.
  Standard algorithm for HMM training: the forward-backward or Baum-Welch algorithm.

  4. Forward and Backward Probabilities
  The Baum-Welch algorithm iteratively estimates transition and observation probabilities and uses these values to derive even better estimates.
  Two probabilities are required to compute estimates for the transition and observation probabilities:
  1. Forward probability (recall): $\alpha_t(j) = P(o_1, o_2, \ldots, o_t, q_t = j \mid \lambda)$
  2. Backward probability: $\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)$
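For reference, a minimal NumPy sketch of the forward pass that computes $\alpha_t(j)$, under the same illustrative array conventions as the Viterbi sketch above (A is N x N, B is N x K, pi the initial distribution, obs integer-coded):

```python
import numpy as np

def forward(A, B, pi, obs):
    """alpha[t, j] = P(o_1..o_t, q_t = j | lambda) for an integer-coded observation sequence."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                          # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]      # recursion
    return alpha, alpha[-1].sum()                         # P(O | lambda) = sum_j alpha_T(j)
```

On long sequences this needs scaling or log-space arithmetic to avoid underflow.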

  5. Backward probability
  1. Initialization: $\beta_T(i) = 1, \quad 1 \le i \le N$
  2. Recursion: $\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j), \quad 1 \le i \le N, \; 1 \le t < T$
  3. Termination: $P(O \mid \lambda) = \sum_{j=1}^{N} \pi_j \, b_j(o_1) \, \beta_1(j)$
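A minimal NumPy sketch of this backward pass, with the same illustrative conventions as the forward sketch above:

```python
import numpy as np

def backward(A, B, pi, obs):
    """beta[t, i] = P(o_{t+1}..o_T | q_t = i, lambda)."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                                    # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                       # recursion, from t = T-1 down to 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = np.sum(pi * B[:, obs[0]] * beta[0])     # termination: P(O | lambda)
    return beta, likelihood
```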

  6. Visualising backward probability computation
  [Figure A.11: the computation of $\beta_t(i)$ by summing all the successive values $\beta_{t+1}(j)$, weighted by the transition probabilities $a_{ij}$ and the observation probabilities $b_j(o_{t+1})$: $\beta_t(i) = \sum_j \beta_{t+1}(j) \, a_{ij} \, b_j(o_{t+1})$, shown over the trellis columns for $o_{t-1}$, $o_t$, $o_{t+1}$.]

  7. Baum-Welch: Estimating $\hat{a}_{ij}$
  To estimate $a_{ij}$, we first compute the probability $\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda)$, which works out to be
  $\xi_t(i, j) = \dfrac{\alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)}{\sum_{j=1}^{N} \alpha_t(j) \, \beta_t(j)}$
  Then, $\hat{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_t(i, k)}$
  [Figure: trellis fragment showing states $s_i$ and $s_j$, with $\alpha_t(i)$, the arc $a_{ij} \, b_j(o_{t+1})$, and $\beta_{t+1}(j)$ over observations $o_{t-1}, o_t, o_{t+1}, o_{t+2}$.]
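A minimal NumPy sketch of this computation, assuming alpha and beta are the (T, N) forward and backward arrays from the sketches above and obs is integer-coded (names are illustrative):

```python
import numpy as np

def xi_probs(A, B, alpha, beta, obs):
    """xi[t, i, j] = P(q_t = i, q_{t+1} = j | O, lambda), for t = 1..T-1."""
    T, N = alpha.shape
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # numerator: alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        num = alpha[t, :, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1, None, :]
        xi[t] = num / np.sum(alpha[t] * beta[t])   # denominator: sum_j alpha_t(j) beta_t(j) = P(O | lambda)
    return xi

def reestimate_A(xi):
    """a_hat[i, j] = sum_t xi[t, i, j] / sum_t sum_k xi[t, i, k]."""
    return xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]
```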

  8. Baum-Welch: Estimating $\hat{b}_j(v_k)$
  To estimate $b_j(v_k)$, we define the state occupancy probability $\gamma_t(j) = P(q_t = j \mid O, \lambda)$, which works out to be
  $\gamma_t(j) = \dfrac{\alpha_t(j) \, \beta_t(j)}{P(O \mid \lambda)}$
  Then, for discrete outputs, $\hat{b}_j(v_k) = \dfrac{\sum_{t=1 \,:\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
  [Figure: trellis fragment showing state $s_j$ with $\alpha_t(j)$ and $\beta_t(j)$ over observations $o_{t-1}, o_t, o_{t+1}$.]
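And the corresponding sketch for the state-occupancy probabilities and the discrete-output re-estimate, under the same illustrative array conventions (K is the size of the output vocabulary):

```python
import numpy as np

def gamma_probs(alpha, beta):
    """gamma[t, j] = P(q_t = j | O, lambda) = alpha_t(j) beta_t(j) / P(O | lambda)."""
    return alpha * beta / np.sum(alpha * beta, axis=1, keepdims=True)

def reestimate_B(gamma, obs, K):
    """b_hat_j(v_k): expected count of emitting v_k in state j, normalised by time spent in j."""
    obs = np.asarray(obs)
    B_hat = np.vstack([gamma[obs == k].sum(axis=0) for k in range(K)]).T   # (N, K)
    return B_hat / gamma.sum(axis=0)[:, None]
```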

  9. Bringing it all together: Baum-Welch
  Estimating HMM parameters iteratively using the EM algorithm. For each iteration, do:
  E-step: For all time-state pairs, compute the state occupation probabilities $\gamma_t(j)$ and $\xi_t(i, j)$.
  M-step: Re-estimate the HMM parameters, i.e. the transition probabilities and observation probabilities, based on the estimates derived in the E-step.

  10. Baum-Welch algorithm (pseudocode)
  function FORWARD-BACKWARD(observations of length T, output vocabulary V, hidden state set Q) returns HMM = (A, B)
    initialize A and B
    iterate until convergence
      E-step:
        $\gamma_t(j) = \dfrac{\alpha_t(j) \, \beta_t(j)}{\alpha_T(q_F)}$  for all $t$ and $j$
        $\xi_t(i, j) = \dfrac{\alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)}{\alpha_T(q_F)}$  for all $t$, $i$, and $j$
      M-step:
        $\hat{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_t(i, k)}$
        $\hat{b}_j(v_k) = \dfrac{\sum_{t=1 \,:\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
    return A, B
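The pseudocode can be turned into a small working implementation. Below is a minimal NumPy sketch under illustrative assumptions: there is no dedicated final state, so $P(O \mid \lambda)$ is taken as $\sum_j \alpha_T(j)$, and no scaling or log-space arithmetic is used, so it will underflow on long sequences.

```python
import numpy as np

def baum_welch(obs, N, K, n_iter=20, seed=0):
    """Estimate (A, B, pi) for a discrete-output HMM from one integer-coded sequence obs."""
    rng = np.random.default_rng(seed)
    A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)   # random row-stochastic init
    B = rng.random((N, K)); B /= B.sum(axis=1, keepdims=True)
    pi = np.full(N, 1.0 / N)
    obs = np.asarray(obs)
    T = len(obs)
    for _ in range(n_iter):
        # E-step: forward and backward passes, then gamma and xi
        alpha = np.zeros((T, N)); beta = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta[T - 1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()                      # P(O | lambda)
        gamma = alpha * beta / likelihood                 # (T, N)
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * B[:, obs[1:]].T[:, None, :] * beta[1:, None, :]) / likelihood  # (T-1, N, N)
        # M-step: re-estimate parameters from expected counts
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.vstack([gamma[obs == k].sum(axis=0) for k in range(K)]).T
        B /= gamma.sum(axis=0)[:, None]
    return A, B, pi

# usage: A, B, pi = baum_welch([0, 1, 0, 0, 2, 1], N=2, K=3)
```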

  11. Discrete to continuous outputs
  We derived Baum-Welch updates for discrete outputs. However, HMMs in acoustic models emit real-valued vectors as observations. Before we look at how Baum-Welch works for acoustic modelling using HMMs, let's look at an overview of the Expectation Maximization (EM) algorithm and establish some notation.

  12. EM Algorithm: Fitting Parameters to Data
  Observed data: i.i.d. samples $x_i$, $i = 1, \ldots, N$ ($x$ is observed and $z$ is hidden)
  Goal: Find $\arg\max_\theta L(\theta)$ where $L(\theta) = \sum_{i=1}^{N} \log \Pr(x_i; \theta)$
  Initial parameters: $\theta^0$. Iteratively compute $\theta^\ell$ as follows:
  $Q(\theta, \theta^{\ell-1}) = \sum_{i=1}^{N} \sum_{z} \Pr(z \mid x_i; \theta^{\ell-1}) \log \Pr(x_i, z; \theta)$
  $\theta^\ell = \arg\max_\theta Q(\theta, \theta^{\ell-1})$
  The estimate $\theta^\ell$ cannot get worse over iterations because, for all $\theta$:
  $L(\theta) - L(\theta^{\ell-1}) \ge Q(\theta, \theta^{\ell-1}) - Q(\theta^{\ell-1}, \theta^{\ell-1})$
  EM is guaranteed to converge to a local optimum or saddle point [Wu83].
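One step that is implicit on the slide: plugging $\theta = \theta^{\ell}$ into the bound gives the monotonicity claim. A short sketch of this standard EM argument, added here for completeness:

```latex
\[
  L(\theta^{\ell}) - L(\theta^{\ell-1})
    \;\ge\; Q(\theta^{\ell}, \theta^{\ell-1}) - Q(\theta^{\ell-1}, \theta^{\ell-1})
    \;\ge\; 0
  \quad\Longrightarrow\quad
  L(\theta^{\ell}) \;\ge\; L(\theta^{\ell-1}),
\]
% the second inequality holds because $\theta^{\ell} = \arg\max_{\theta} Q(\theta, \theta^{\ell-1})$.
```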

  13. Coin example to illustrate EM
  Three biased coins, coin 1, coin 2 and coin 3, with $\rho_1 = \Pr(H)$, $\rho_2 = \Pr(H)$ and $\rho_3 = \Pr(H)$ respectively.
  Repeat:
    Toss coin 1 privately;
    if it shows H: toss coin 2 twice
    else: toss coin 3 twice
  The following sequence is observed: "HH, TT, HH, TT, HH"
  How do you estimate $\rho_1$, $\rho_2$ and $\rho_3$?

  14. Coin example to illustrate EM
  Recall, for partially observed data, the log likelihood is given by:
  $L(\theta) = \sum_{i=1}^{N} \log \Pr(x_i; \theta) = \sum_{i=1}^{N} \log \sum_{z} \Pr(x_i, z; \theta)$
  where, for the coin example:
  • each observation $x_i \in X = \{HH, HT, TH, TT\}$
  • the hidden variable $z \in Z = \{H, T\}$

  15. Coin example to illustrate EM
  Recall, for partially observed data, the log likelihood is given by:
  $L(\theta) = \sum_{i=1}^{N} \log \Pr(x_i; \theta) = \sum_{i=1}^{N} \log \sum_{z} \Pr(x_i, z; \theta)$
  For the three coins ($\rho_1 = \Pr(H)$ for coin 1, $\rho_2 = \Pr(H)$ for coin 2, $\rho_3 = \Pr(H)$ for coin 3):
  $\Pr(x, z; \theta) = \Pr(x \mid z; \theta) \Pr(z; \theta)$
  where $\Pr(z; \theta) = \begin{cases} \rho_1 & \text{if } z = H \\ 1 - \rho_1 & \text{if } z = T \end{cases}$
  and $\Pr(x \mid z; \theta) = \begin{cases} \rho_2^{h} (1 - \rho_2)^{t} & \text{if } z = H \\ \rho_3^{h} (1 - \rho_3)^{t} & \text{if } z = T \end{cases}$
  ($h$: number of heads in $x$, $t$: number of tails in $x$)
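With these two factors in place, the marginal log likelihood of the observed data is obtained by summing out $z$. A minimal sketch (the function name is illustrative, not from the slides):

```python
import math

def log_likelihood(data, rho1, rho2, rho3):
    """L(theta) = sum_i log sum_z Pr(x_i, z; theta) for the three-coin model."""
    total = 0.0
    for x in data:
        h, t = x.count('H'), x.count('T')
        p = rho1 * rho2**h * (1 - rho2)**t + (1 - rho1) * rho3**h * (1 - rho3)**t
        total += math.log(p)
    return total

print(log_likelihood(['HH', 'TT', 'HH', 'TT', 'HH'], 0.3, 0.4, 0.6))
```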

  16. Coin example to illustrate EM
  Our observed data is: {HH, TT, HH, TT, HH}. Let's use EM to estimate $\theta = (\rho_1, \rho_2, \rho_3)$.
  [EM Iteration, E-step] Compute the quantities involved in
  $Q(\theta, \theta^{\ell-1}) = \sum_{i=1}^{N} \sum_{z} \gamma(z, x_i) \log \Pr(x_i, z; \theta)$
  where $\gamma(z, x) = \Pr(z \mid x; \theta^{\ell-1})$, i.e., compute $\gamma(z, x_i)$ for all $z$ and all $i$.
  Suppose $\theta^{\ell-1}$ is $\rho_1 = 0.3$, $\rho_2 = 0.4$, $\rho_3 = 0.6$:
  What is $\gamma(H, HH)$? = 0.16
  What is $\gamma(H, TT)$? = 0.49
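The two numbers on the slide can be reproduced with a small posterior computation; a minimal sketch (the function name is illustrative):

```python
def gamma_H(x, rho1, rho2, rho3):
    """Posterior gamma(H, x) = Pr(z = H | x; theta) for a two-toss observation x like 'HH'."""
    h, t = x.count('H'), x.count('T')
    joint_H = rho1 * rho2**h * (1 - rho2)**t        # Pr(x, z = H): coin 1 shows H, coin 2 tossed twice
    joint_T = (1 - rho1) * rho3**h * (1 - rho3)**t  # Pr(x, z = T): coin 1 shows T, coin 3 tossed twice
    return joint_H / (joint_H + joint_T)

print(round(gamma_H('HH', 0.3, 0.4, 0.6), 2))  # 0.16
print(round(gamma_H('TT', 0.3, 0.4, 0.6), 2))  # 0.49
```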

  17. Coin example to illustrate EM
  Our observed data is: {HH, TT, HH, TT, HH}. Let's use EM to estimate $\theta = (\rho_1, \rho_2, \rho_3)$.
  [EM Iteration, M-step] Find $\theta$ which maximises
  $Q(\theta, \theta^{\ell-1}) = \sum_{i=1}^{N} \sum_{z} \gamma(z, x_i) \log \Pr(x_i, z; \theta)$
  $\rho_1 = \dfrac{\sum_{i=1}^{N} \gamma(H, x_i)}{N}$
  $\rho_2 = \dfrac{\sum_{i=1}^{N} \gamma(H, x_i) \, h_i}{\sum_{i=1}^{N} \gamma(H, x_i) (h_i + t_i)}$
  $\rho_3 = \dfrac{\sum_{i=1}^{N} \gamma(T, x_i) \, h_i}{\sum_{i=1}^{N} \gamma(T, x_i) (h_i + t_i)}$
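Putting the E-step and M-step together, here is a minimal sketch of one full EM iteration on the observed data; gamma_H is the same illustrative posterior as in the sketch above, repeated so the block is self-contained.

```python
def gamma_H(x, rho1, rho2, rho3):
    """Posterior Pr(z = H | x; theta) for a two-toss observation x."""
    h, t = x.count('H'), x.count('T')
    joint_H = rho1 * rho2**h * (1 - rho2)**t
    joint_T = (1 - rho1) * rho3**h * (1 - rho3)**t
    return joint_H / (joint_H + joint_T)

def em_step(data, rho1, rho2, rho3):
    """One EM iteration for the three-coin model; returns the updated (rho1, rho2, rho3)."""
    g = [gamma_H(x, rho1, rho2, rho3) for x in data]   # gamma(H, x_i); gamma(T, x_i) = 1 - g[i]
    h = [x.count('H') for x in data]
    t = [x.count('T') for x in data]
    rho1_new = sum(g) / len(data)
    rho2_new = sum(gi * hi for gi, hi in zip(g, h)) / sum(gi * (hi + ti) for gi, hi, ti in zip(g, h, t))
    rho3_new = sum((1 - gi) * hi for gi, hi in zip(g, h)) / sum((1 - gi) * (hi + ti) for gi, hi, ti in zip(g, h, t))
    return rho1_new, rho2_new, rho3_new

data = ['HH', 'TT', 'HH', 'TT', 'HH']
print(em_step(data, 0.3, 0.4, 0.6))   # approximately (0.29, 0.33, 0.71)
```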

  18. Coin example to illustrate EM
  This was a very simple HMM (with observations from 2 states): the state remains the same after the first transition, and $\gamma$ estimated the distribution of this one state.
  [Figure: a two-state HMM; the initial transition reaches state H with probability $\rho_1$ and state T with probability $1 - \rho_1$; state H emits H/$\rho_2$, T/$1 - \rho_2$ and state T emits H/$\rho_3$, T/$1 - \rho_3$; each state self-loops with probability 1.]
  More generally, we will need the distribution of the state at each time step.
  EM for general HMMs: the Baum-Welch algorithm (1972), which predates the general formulation of EM (1977).

  19. Baum-Welch Algorithm as EM
  Observed data: N sequences $x_i$, $i = 1, \ldots, N$, each a sequence over the output vocabulary $V$.
  Parameters $\theta$: transition matrix $A$, observation probabilities $B$.
  [EM Iteration, E-step] Compute the quantities involved in $Q(\theta, \theta^{\ell-1})$:
  $\gamma_{i,t}(j) = \Pr(z_t = j \mid x_i; \theta^{\ell-1})$
  $\xi_{i,t}(j, k) = \Pr(z_t = j, z_{t+1} = k \mid x_i; \theta^{\ell-1})$

  20. Baum-Welch Algorithm as EM
  Observed data: N sequences $x_i$, $i = 1, \ldots, N$, each a sequence over the output vocabulary $V$.
  Parameters $\theta$: transition matrix $A$, observation probabilities $B$.
  [EM Iteration, M-step] Find $\theta$ which maximises $Q(\theta, \theta^{\ell-1})$:
  $A_{j,k} = \dfrac{\sum_{i=1}^{N} \sum_{t=1}^{T_i - 1} \xi_{i,t}(j, k)}{\sum_{i=1}^{N} \sum_{t=1}^{T_i - 1} \sum_{k'} \xi_{i,t}(j, k')}$
  $B_{j,v} = \dfrac{\sum_{i=1}^{N} \sum_{t \,:\, x_{i,t} = v} \gamma_{i,t}(j)}{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j)}$
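A minimal sketch of this pooled M-step, assuming the per-sequence posteriors have already been computed (gammas[i] is a (T_i, N) array of $\gamma_{i,t}(j)$, xis[i] a (T_i - 1, N, N) array of $\xi_{i,t}(j, k)$, obs_seqs[i] an integer-coded sequence, K the vocabulary size; all names are illustrative):

```python
import numpy as np

def m_step(gammas, xis, obs_seqs, K):
    """Pool expected counts over all N sequences, then normalise row-wise."""
    A_num = sum(xi.sum(axis=0) for xi in xis)             # sum_i sum_t xi_{i,t}(j, k)
    A_den = sum(xi.sum(axis=(0, 2)) for xi in xis)        # sum_i sum_t sum_k' xi_{i,t}(j, k')
    A = A_num / A_den[:, None]
    n_states = gammas[0].shape[1]
    B_num = np.zeros((n_states, K))
    B_den = sum(g.sum(axis=0) for g in gammas)            # sum_i sum_t gamma_{i,t}(j)
    for g, obs in zip(gammas, obs_seqs):
        obs = np.asarray(obs)
        for v in range(K):
            B_num[:, v] += g[obs == v].sum(axis=0)        # expected count of emitting v in state j
    B = B_num / B_den[:, None]
    return A, B
```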
