HMMs for Acoustic Modeling (Part II)
Lecture 3, CS 753
Instructor: Preethi Jyothi
Recap: HMMs for Acoustic Modeling
- What are (first-order) HMMs?
- What are the simplifying assumptions governing HMMs?
- What are the three fundamental problems related to HMMs?
- 1. What is the forward algorithm? What is it used to compute?
Computing Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O|λ).
- 2. What is the Viterbi algorithm? What is it used to compute?
Decoding: Given as input an HMM λ = (A, B) and a sequence of observations O = o_1, o_2, ..., o_T, find the most probable sequence of states Q = q_1 q_2 q_3 ... q_T.
Problem 3: Learning in HMMs
Problem 1 (Likelihood): Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O|λ).
Problem 2 (Decoding): Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
Problem 3 (Learning): Given an observation sequence O and the set of possible states in the HMM, learn the HMM parameters A and B.
Standard algorithm for HMM training: Forward-backward or Baum-Welch algorithm
Forward and Backward Probabilities
The Baum-Welch algorithm iteratively estimates the transition and observation probabilities, and uses these values to derive even better estimates. Two probabilities are required to compute estimates for the transition and observation probabilities:
- 1. Forward probability (recall): \alpha_t(j) = P(o_1, o_2, \ldots, o_t, q_t = j \mid \lambda)
- 2. Backward probability: \beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)
Backward probability
- 1. Initialization:
\beta_T(i) = 1, \quad 1 \le i \le N
- 2. Recursion:
\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N,\ 1 \le t < T
- 3. Termination:
P(O \mid \lambda) = \sum_{j=1}^{N} \pi_j\, b_j(o_1)\, \beta_1(j)
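As an illustration, here is a minimal NumPy sketch of this backward recursion. The names A, B, pi, obs and the indexing conventions are assumptions for illustration; the slides do not prescribe an implementation, and no scaling is applied, so long sequences will underflow.

```python
import numpy as np

def backward(A, B, pi, obs):
    """Backward probabilities for a discrete-output HMM.

    A:   (N, N) transition matrix, A[i, j] = a_ij
    B:   (N, V) observation matrix, B[j, k] = b_j(v_k)
    pi:  (N,)   initial state distribution
    obs: length-T sequence of observation indices o_1 .. o_T
    Returns beta of shape (T, N) and the likelihood P(O | lambda).
    """
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[T - 1, :] = 1.0                        # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):              # recursion, t = T-1 .. 1
        beta[t, :] = A @ (B[:, obs[t + 1]] * beta[t + 1, :])
    likelihood = np.sum(pi * B[:, obs[0]] * beta[0, :])   # termination
    return beta, likelihood
```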
Visualising backward probability computation
[Figure A.11 (Jurafsky and Martin): trellis illustrating the computation of β_t(i) = Σ_j a_ij b_j(o_{t+1}) β_{t+1}(j), i.e. summing over all successor states j at time t+1, weighted by β_{t+1}(j).]
- 1. Baum-Welch: Estimating â_ij

To estimate a_ij, we first need to define ξ_t(i, j), the probability of being in state i at time t and state j at time t+1:
\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda)
which works out to be
\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}
[Figure: trellis showing α_t(i) at state s_i, the arc a_ij b_j(o_{t+1}) from s_i at time t to s_j at time t+1, and β_{t+1}(j) at state s_j.]
Then,
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_t(i, k)}
- 2. Baum-Welch: Estimating b̂_j(v_k)

To estimate b_j(v_k), we need to define γ_t(j), the state occupancy probability:
\gamma_t(j) = P(q_t = j \mid O, \lambda)
which works out to be
\gamma_t(j) = \frac{\alpha_t(j)\, \beta_t(j)}{P(O \mid \lambda)}
[Figure: trellis showing α_t(j) and β_t(j) meeting at state s_j at time t.]
Then, for discrete outputs,
\hat{b}_j(v_k) = \frac{\sum_{t=1 \text{ s.t. } o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
The quantities ξ_t(i, j) and γ_t(j) are thus used to re-estimate a_ij and b_j(v_k), respectively.
Bringing it all together: Baum-Welch
Estimate the HMM parameters iteratively using the EM algorithm. In each iteration:
E-step: For all time-state pairs, compute the state occupation probabilities γ_t(j) and ξ_t(i, j).
M-step: Re-estimate the HMM parameters, i.e. the transition and observation probabilities, based on the estimates derived in the E-step.
Baum-Welch algorithm (pseudocode)
function FORWARD-BACKWARD(observations of length T, output vocabulary V, hidden state set Q) returns HMM = (A, B)
  initialize A and B
  iterate until convergence:
    E-step:
      \gamma_t(j) = \frac{\alpha_t(j)\, \beta_t(j)}{\alpha_T(q_F)}   for all t and j
      \xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\alpha_T(q_F)}   for all t, i, and j
    M-step:
      \hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_t(i, k)}
      \hat{b}_j(v_k) = \frac{\sum_{t=1 \text{ s.t. } o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
  return A, B
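Below is a minimal single-sequence NumPy sketch of this pseudocode for a discrete-output HMM. It is only meant to mirror the update equations above, not to be a reference implementation: the function and variable names are mine, there is no dedicated final state q_F (P(O|λ) is taken as Σ_j α_T(j)), and no scaling or log-space arithmetic is used, so it will underflow on long sequences.

```python
import numpy as np

def forward(A, B, pi, obs):
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch(obs, N, V, n_iter=20, seed=0):
    """Baum-Welch for one observation sequence; obs holds indices in [0, V)."""
    obs = np.asarray(obs)
    rng = np.random.default_rng(seed)
    A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((N, V)); B /= B.sum(axis=1, keepdims=True)
    pi = np.full(N, 1.0 / N)
    for _ in range(n_iter):
        alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
        P_O = alpha[-1].sum()                       # P(O | lambda)
        # E-step: state and transition occupancies
        gamma = alpha * beta / P_O                  # gamma[t, j]
        xi = (alpha[:-1, :, None] * A[None, :, :]   # xi[t, i, j]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / P_O
        # M-step: re-estimate A and B (pi re-estimated as gamma_1 in this sketch)
        A = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]
        for k in range(V):
            B[:, k] = gamma[obs == k].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]
        pi = gamma[0]
    return A, B, pi
```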
Discrete to continuous outputs
We derived Baum-Welch updates for discrete outputs. However, HMMs in acoustic models emit real-valued vectors as observations. Before we understand how Baum-Welch works for acoustic modelling using HMMs, let’s look at an overview of the Expectation Maximization (EM) algorithm and establish some notation.
EM Algorithm: Fitting Parameters to Data

Observed data: i.i.d. samples x_i, i = 1, ..., N (x is observed and z is hidden)
Goal: Find \arg\max_{\theta} L(\theta), where
L(\theta) = \sum_{i=1}^{N} \log \Pr(x_i; \theta)
Initial parameters: θ^0. Iteratively compute θ^ℓ as follows:
Q(\theta, \theta^{\ell-1}) = \sum_{i=1}^{N} \sum_{z} \Pr(z \mid x_i; \theta^{\ell-1}) \log \Pr(x_i, z; \theta)
\theta^{\ell} = \arg\max_{\theta} Q(\theta, \theta^{\ell-1})
The estimate θ^ℓ cannot get worse over iterations, because for all θ:
L(\theta) - L(\theta^{\ell-1}) \ge Q(\theta, \theta^{\ell-1}) - Q(\theta^{\ell-1}, \theta^{\ell-1})
EM is guaranteed to converge to a local optimum or a saddle point [Wu83].
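A compact justification of this bound (the standard EM argument, not spelled out on the slides) follows from Jensen's inequality applied to the posterior Pr(z | x_i; θ^{ℓ−1}):

\begin{aligned}
L(\theta) - L(\theta^{\ell-1})
&= \sum_{i=1}^{N} \log \sum_{z} \Pr(z \mid x_i; \theta^{\ell-1})\,
   \frac{\Pr(x_i, z; \theta)}{\Pr(z \mid x_i; \theta^{\ell-1})\,\Pr(x_i; \theta^{\ell-1})} \\
&\ge \sum_{i=1}^{N} \sum_{z} \Pr(z \mid x_i; \theta^{\ell-1})\,
   \log \frac{\Pr(x_i, z; \theta)}{\Pr(z \mid x_i; \theta^{\ell-1})\,\Pr(x_i; \theta^{\ell-1})} \\
&= Q(\theta, \theta^{\ell-1}) - Q(\theta^{\ell-1}, \theta^{\ell-1})
\end{aligned}

where the last step uses Pr(x_i, z; θ^{ℓ−1}) = Pr(z | x_i; θ^{ℓ−1}) Pr(x_i; θ^{ℓ−1}). Choosing θ^ℓ to maximise Q makes the right-hand side nonnegative, so L(θ^ℓ) ≥ L(θ^{ℓ−1}).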
Coin example to illustrate EM
ρ_1 = Pr(H) for coin 1, ρ_2 = Pr(H) for coin 2, ρ_3 = Pr(H) for coin 3.
Repeat: Toss coin 1 privately; if it shows H, toss coin 2 twice, else toss coin 3 twice.
The following sequence is observed: "HH, TT, HH, TT, HH". How do you estimate ρ_1, ρ_2 and ρ_3?
Coin example to illustrate EM
Recall, for partially observed data, the log likelihood is given by:
L(\theta) = \sum_{i=1}^{N} \log \Pr(x_i; \theta) = \sum_{i=1}^{N} \log \sum_{z} \Pr(x_i, z; \theta)
where, for the coin example, each observation x_i ∈ X = {HH, HT, TH, TT} and the hidden variable z ∈ Z = {H, T}.
Coin example to illustrate EM
Recall, for partially observed data, the log likelihood is given by:
L(\theta) = \sum_{i=1}^{N} \log \Pr(x_i; \theta) = \sum_{i=1}^{N} \log \sum_{z} \Pr(x_i, z; \theta)
where \Pr(x, z; \theta) = \Pr(x \mid z; \theta)\, \Pr(z; \theta). For the coin example (ρ_1 = Pr(H) for coin 1, ρ_2 = Pr(H) for coin 2, ρ_3 = Pr(H) for coin 3):
\Pr(z; \theta) = \begin{cases} \rho_1 & \text{if } z = H \\ 1 - \rho_1 & \text{if } z = T \end{cases}
\qquad
\Pr(x \mid z; \theta) = \begin{cases} \rho_2^{h} (1 - \rho_2)^{t} & \text{if } z = H \\ \rho_3^{h} (1 - \rho_3)^{t} & \text{if } z = T \end{cases}
where h is the number of heads and t the number of tails in x.
Coin example to illustrate EM

Our observed data is: {HH, TT, HH, TT, HH}. Let's use EM to estimate θ = (ρ_1, ρ_2, ρ_3).
Suppose θ^{ℓ−1} is ρ_1 = 0.3, ρ_2 = 0.4, ρ_3 = 0.6.
[EM Iteration, E-step] Compute the quantities involved in
Q(\theta, \theta^{\ell-1}) = \sum_{i=1}^{N} \sum_{z} \gamma(z, x_i) \log \Pr(x_i, z; \theta)
where γ(z, x) = Pr(z | x; θ^{ℓ−1}), i.e., compute γ(z, x_i) for all z and all i.
What is γ(H, HH)? It equals 0.16. What is γ(H, TT)? It is approximately 0.49.
Coin example to illustrate EM

Our observed data is: {HH, TT, HH, TT, HH}. Let's use EM to estimate θ = (ρ_1, ρ_2, ρ_3).
[EM Iteration, M-step] Find θ which maximises
Q(\theta, \theta^{\ell-1}) = \sum_{i=1}^{N} \sum_{z} \gamma(z, x_i) \log \Pr(x_i, z; \theta)
which gives (with h_i, t_i the number of heads and tails in x_i):
\rho_1 = \frac{\sum_{i=1}^{N} \gamma(H, x_i)}{N}
\qquad
\rho_2 = \frac{\sum_{i=1}^{N} \gamma(H, x_i)\, h_i}{\sum_{i=1}^{N} \gamma(H, x_i)\,(h_i + t_i)}
\qquad
\rho_3 = \frac{\sum_{i=1}^{N} \gamma(T, x_i)\, h_i}{\sum_{i=1}^{N} \gamma(T, x_i)\,(h_i + t_i)}
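As a sanity check, here is a small Python sketch of one EM iteration for this coin example. The data, initial parameters, and update formulas are those on the slides; the function and variable names are mine.

```python
def coin_em_step(data, rho1, rho2, rho3):
    """One EM iteration for the three-coin example.

    data: list of observed two-toss strings, e.g. ["HH", "TT", ...]
    Returns updated (rho1, rho2, rho3).
    """
    # E-step: gamma(H, x) = Pr(z = H | x; theta) for each observation x
    gammas = []
    for x in data:
        h, t = x.count("H"), x.count("T")
        p_h = rho1 * (rho2 ** h) * ((1 - rho2) ** t)        # Pr(x, z = H)
        p_t = (1 - rho1) * (rho3 ** h) * ((1 - rho3) ** t)  # Pr(x, z = T)
        gammas.append(p_h / (p_h + p_t))

    # M-step: closed-form updates from the slides
    N = len(data)
    new_rho1 = sum(gammas) / N
    num2 = sum(g * x.count("H") for g, x in zip(gammas, data))
    den2 = sum(g * len(x) for g, x in zip(gammas, data))
    num3 = sum((1 - g) * x.count("H") for g, x in zip(gammas, data))
    den3 = sum((1 - g) * len(x) for g, x in zip(gammas, data))
    return new_rho1, num2 / den2, num3 / den3

data = ["HH", "TT", "HH", "TT", "HH"]
print(coin_em_step(data, 0.3, 0.4, 0.6))
# In the E-step, gamma(H, "HH") = 0.16 and gamma(H, "TT") is about 0.49, as above.
```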
Coin example to illustrate EM
- This was a very simple HMM (with observations from 2 states): the state remains the same after the first transition.
- γ estimated the distribution of this single state. More generally, we will need the distribution of the state at each time step.
- EM for general HMMs: the Baum-Welch algorithm (1972), which predates the general formulation of EM (1977).
[Figure: two-state HMM with start probabilities ρ_1 and 1−ρ_1, emissions H/ρ_2 and T/1−ρ_2 from one state, H/ρ_3 and T/1−ρ_3 from the other, and self-loop probability 1 on each state.]
Baum-Welch Algorithm as EM

Observed data: N sequences x_i, i = 1, ..., N, where each x_it ∈ V.
Parameters θ: transition matrix A, observation probabilities B.
[EM Iteration, E-step] Compute the quantities involved in Q(θ, θ^{ℓ−1}):
\gamma_{i,t}(j) = \Pr(z_t = j \mid x_i; \theta^{\ell-1})
\qquad
\xi_{i,t}(j, k) = \Pr(z_t = j, z_{t+1} = k \mid x_i; \theta^{\ell-1})

Baum-Welch Algorithm as EM

Observed data: N sequences x_i, i = 1, ..., N, where each x_it ∈ V.
Parameters θ: transition matrix A, observation probabilities B.
[EM Iteration, M-step] Find θ which maximises Q(θ, θ^{ℓ−1}):
A_{j,k} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_i - 1} \xi_{i,t}(j, k)}{\sum_{i=1}^{N} \sum_{t=1}^{T_i - 1} \sum_{k'} \xi_{i,t}(j, k')}
\qquad
B_{j,v} = \frac{\sum_{i=1}^{N} \sum_{t:\, x_{it} = v} \gamma_{i,t}(j)}{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j)}
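To make the double sums over sequences concrete, here is a small NumPy sketch of the pooled M-step. It assumes the per-sequence posteriors γ_{i,t}(j) and ξ_{i,t}(j, k) have already been computed (for example with the forward-backward sketch earlier); the function and argument names are illustrative, not from the slides.

```python
import numpy as np

def m_step_multi(gammas, xis, obs_seqs, V):
    """Pooled M-step over multiple observation sequences.

    gammas:   list of (T_i, N) arrays, gamma_{i,t}(j) for each sequence
    xis:      list of (T_i - 1, N, N) arrays, xi_{i,t}(j, k) for each sequence
    obs_seqs: list of integer observation sequences (values in [0, V))
    Returns re-estimated (A, B).
    """
    N = gammas[0].shape[1]
    A_num = np.zeros((N, N))
    B_num = np.zeros((N, V))
    occ = np.zeros(N)
    for gamma, xi, obs in zip(gammas, xis, obs_seqs):
        A_num += xi.sum(axis=0)                   # sum over t of xi_{i,t}(j, k)
        occ += gamma.sum(axis=0)                  # sum over t of gamma_{i,t}(j)
        for t, o in enumerate(obs):
            B_num[:, o] += gamma[t]               # sum over t with x_{it} = v
    A = A_num / A_num.sum(axis=1, keepdims=True)  # normalize over k'
    B = B_num / occ[:, None]                      # normalize by state occupancy
    return A, B
```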
Discrete to continuous outputs
We derived Baum-Welch updates for discrete outputs. However, HMMs in acoustic models emit real-valued vectors as observations. Use probability density functions to define observation probabilities
If x were a 1D value, the HMM observation probabilities would be
b_j(x) = \mathcal{N}(x \mid \mu_j, \sigma_j^2)
where μ_j is the mean associated with state j and σ_j^2 is its variance.
If x ∈ ℝ^d, then we use multivariate Gaussians,
b_j(x) = \mathcal{N}(x \mid \mu_j, \Sigma_j)
where Σ_j is the covariance matrix associated with state j.
BW for Gaussian Observation Model
Observed data: N sequences x_i = (x_{i1}, ..., x_{iT_i}), i = 1, ..., N, where each x_it ∈ ℝ^d.
Parameters θ: transition matrix A, observation probabilities B = {(μ_j, Σ_j)} for all j.
[EM Iteration, M-step] Find θ which maximises Q(θ, θ^{ℓ−1}):
\mu_j = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j)\, x_{it}}{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j)}
\qquad
\Sigma_j = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j)\,(x_{it} - \mu_j)(x_{it} - \mu_j)^{\top}}{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j)}
The update for A is the same as with discrete outputs.
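A minimal NumPy sketch of these Gaussian updates for a single sequence is shown below. The names X and gamma are illustrative assumptions, and numerical safeguards such as variance flooring are omitted.

```python
import numpy as np

def gaussian_m_step(X, gamma):
    """Re-estimate per-state Gaussian parameters from state occupancies.

    X:     (T, d) observation vectors for one sequence
    gamma: (T, N) state occupation probabilities gamma_t(j)
    Returns means mu of shape (N, d) and covariances Sigma of shape (N, d, d).
    """
    occ = gamma.sum(axis=0)                          # (N,) total occupancy per state
    mu = (gamma.T @ X) / occ[:, None]                # occupancy-weighted means
    N, d = gamma.shape[1], X.shape[1]
    Sigma = np.zeros((N, d, d))
    for j in range(N):
        diff = X - mu[j]                             # (T, d) deviations from the state mean
        Sigma[j] = (gamma[:, j, None] * diff).T @ diff / occ[j]
    return mu, Sigma
```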
Gaussian Mixture Model
- Assuming that observations associated with a state follow a single Gaussian distribution is too simplistic.
- More generally, we use a "mixture of Gaussians" to allow for acoustic vectors associated with a state to be non-Gaussian.
- Instead of b_j(x) = 𝒩(x | μ_j, Σ_j) as in the single Gaussian case, b_j(x) can be an M-component mixture model:
b_j(x) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(x \mid \mu_{jm}, \Sigma_{jm}), \quad \sum_{m=1}^{M} c_{jm} = 1,\ c_{jm} \ge 0
where c_jm is the mixing probability for Gaussian component m of state j.
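For concreteness, a tiny sketch of evaluating this mixture observation density with SciPy is below; the argument names (c_j, mu_j, Sigma_j) are hypothetical.

```python
from scipy.stats import multivariate_normal

def gmm_obs_prob(x, c_j, mu_j, Sigma_j):
    """b_j(x) for an M-component GMM: c_j (M,), mu_j (M, d), Sigma_j (M, d, d)."""
    return sum(c_j[m] * multivariate_normal.pdf(x, mean=mu_j[m], cov=Sigma_j[m])
               for m in range(len(c_j)))
```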
BW for Gaussian Mixture Model
Observed data: N sequences x_i = (x_{i1}, ..., x_{iT_i}), i = 1, ..., N, where each x_it ∈ ℝ^d.
Parameters θ: transition matrix A, observation probabilities B = {(μ_jm, Σ_jm, c_jm)} for all j, m.
[EM Iteration, M-step] Find θ which maximises Q(θ, θ^{ℓ−1}). Here γ_{i,t}(j, m) is the probability of component m of state j at time t:
\mu_{jm} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j, m)\, x_{it}}{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j, m)}
\qquad
\Sigma_{jm} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j, m)\,(x_{it} - \mu_{jm})(x_{it} - \mu_{jm})^{\top}}{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j, m)}
\qquad
c_{jm} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j, m)}{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \sum_{m'=1}^{M} \gamma_{i,t}(j, m')}
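The slides use γ_{i,t}(j, m) without spelling out how it is obtained; in the standard formulation (an addition here, not from the slides) it splits the state occupancy γ_{i,t}(j) across mixture components according to their posterior responsibility for the observation:

\gamma_{i,t}(j, m) = \gamma_{i,t}(j)\,
\frac{c_{jm}\, \mathcal{N}(x_{it} \mid \mu_{jm}, \Sigma_{jm})}
     {\sum_{m'=1}^{M} c_{jm'}\, \mathcal{N}(x_{it} \mid \mu_{jm'}, \Sigma_{jm'})}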
Baum-Welch: In summary
[Every EM Iteration] Compute θ = {A_{j,k}, (μ_{jm}, Σ_{jm}, c_{jm})} for all j, k, m:
A_{j,k} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_i - 1} \xi_{i,t}(j, k)}{\sum_{i=1}^{N} \sum_{t=1}^{T_i - 1} \sum_{k'} \xi_{i,t}(j, k')}
\qquad
\mu_{jm} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j, m)\, x_{it}}{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j, m)}
\qquad
\Sigma_{jm} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j, m)\,(x_{it} - \mu_{jm})(x_{it} - \mu_{jm})^{\top}}{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j, m)}
\qquad
c_{jm} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \gamma_{i,t}(j, m)}{\sum_{i=1}^{N} \sum_{t=1}^{T_i} \sum_{m'=1}^{M} \gamma_{i,t}(j, m')}