  1. Introduction to Machine Learning (CMU-10701): Hidden Markov Models
     Barnabás Póczos & Aarti Singh
     Slides courtesy: Eric Xing

  2. i.i.d. to sequential data
     • So far we assumed independent, identically distributed data.
     • Sequential (non-i.i.d.) data:
       – Time-series data, e.g. speech
       – Characters in a sentence
       – Base pairs along a DNA strand

  3. Markov Models
     • Joint distribution of n arbitrary random variables, by the chain rule:
       p(X_1, …, X_n) = ∏_{t=1}^{n} p(X_t | X_1, …, X_{t-1})
     • Markov assumption (m-th order): the current observation depends only on the past m observations:
       p(X_t | X_1, …, X_{t-1}) = p(X_t | X_{t-m}, …, X_{t-1})

  4. Markov Models
     • Markov assumption:
       – 1st order: p(X_t | X_1, …, X_{t-1}) = p(X_t | X_{t-1})
       – 2nd order: p(X_t | X_1, …, X_{t-1}) = p(X_t | X_{t-2}, X_{t-1})

  5. Markov Models
     • Number of parameters in a stationary model with K-ary variables:
       – 1st order: O(K^2)
       – m-th order: O(K^{m+1})
       – (n-1)-th order ≡ no assumptions (complete, but directed, graph): O(K^n)
     • Homogeneous/stationary Markov model: the probabilities do not depend on the position in the sequence.
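A quick illustrative calculation (my own addition, not from the slides): a stationary m-th order model over K-ary variables needs one conditional distribution per configuration of the previous m values, each with K - 1 free entries, giving K^m (K - 1) = O(K^{m+1}) parameters. A minimal Python sketch:

```python
def markov_param_count(K: int, m: int) -> int:
    """Free parameters of a stationary m-th order Markov model over K-ary variables:
    K^m conditioning configurations, each with a (K - 1)-parameter distribution."""
    return K ** m * (K - 1)

# Example: a die-valued sequence (K = 6).
for m in (1, 2, 3):
    print(f"order {m}: {markov_param_count(6, m)} parameters")  # 30, 180, 1080
```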

  6. Hidden Markov Models
     • Distributions that characterize sequential data with few parameters, but are not limited by strong Markov assumptions.
     [Graphical model: a hidden chain S_1 → S_2 → … → S_T, where each hidden state S_t emits an observation O_t]
     • Observation space: O_t ∈ {y_1, y_2, …, y_K}
     • Hidden states: S_t ∈ {1, …, I}

  7. Hidden Markov Models
     • 1st-order Markov assumption on the hidden states {S_t}, t = 1, …, T (can be extended to higher order).
     • Note: O_t depends on all previous observations {O_{t-1}, …, O_1}.

  8. Hidden Markov Models
     • Parameters of the stationary/homogeneous Markov model (independent of time t):
       – Initial probabilities: p(S_1 = i) = π_i
       – Transition probabilities: p(S_t = j | S_{t-1} = i) = p_ij
       – Emission probabilities: p(O_t = y | S_t = i)

  9. HMM Example: The Dishonest Casino
     • A casino has two dice:
       – Fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
       – Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2
     • The casino player switches back and forth between the fair and loaded die with 5% probability.

  10. HMM Problems

  11. HMM Example
     • An example hidden state sequence (F = fair die, L = loaded die): L F F F L L L F

  12. State Space Representation
     • Switch between F and L with 5% probability:
       [State diagram: two states F and L, self-transition probability 0.95, switching probability 0.05]
     • HMM parameters:
       – Initial probabilities: P(S_1 = L) = P(S_1 = F) = 0.5
       – Transition probabilities: P(S_t = L | S_{t-1} = L) = P(S_t = F | S_{t-1} = F) = 0.95,
         P(S_t = F | S_{t-1} = L) = P(S_t = L | S_{t-1} = F) = 0.05
       – Emission probabilities: P(O_t = y | S_t = F) = 1/6 for y = 1, …, 6;
         P(O_t = y | S_t = L) = 1/10 for y = 1, …, 5 and 1/2 for y = 6
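To make the parameterization concrete, here is a minimal Python/NumPy sketch of the dishonest-casino HMM above. The variable names (`init_probs`, `trans_probs`, `emit_probs`) and the encodings (state 0 = Fair, 1 = Loaded; die faces stored as indices 0-5) are my own conventions, not from the slides.

```python
import numpy as np

init_probs = np.array([0.5, 0.5])              # p(S_1 = F), p(S_1 = L)

trans_probs = np.array([[0.95, 0.05],          # p(S_t = j | S_{t-1} = i), rows = from-state
                        [0.05, 0.95]])

emit_probs = np.array([
    [1/6] * 6,                                 # fair die: uniform over faces 1..6
    [1/10, 1/10, 1/10, 1/10, 1/10, 1/2],       # loaded die: face 6 with probability 1/2
])

# Every row of the transition and emission matrices is a probability distribution.
assert np.allclose(trans_probs.sum(axis=1), 1.0)
assert np.allclose(emit_probs.sum(axis=1), 1.0)
```

These arrays are reused in the algorithm sketches after the following slides.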

  13. Three main problems in HMMs
     • Evaluation: given the HMM parameters and an observation sequence, find the probability of the observed sequence.
     • Decoding: given the HMM parameters and an observation sequence, find the most probable sequence of hidden states.
     • Learning: given an HMM with unknown parameters and an observation sequence, find the parameters that maximize the likelihood of the observed data.

  14. HMM Algorithms
     • Evaluation: What is the probability of the observed sequence? – Forward algorithm
     • Decoding:
       – What is the probability that the third roll was loaded, given the observed sequence? – Forward-Backward algorithm
       – What is the most likely die sequence, given the observed sequence? – Viterbi algorithm
     • Learning: Under what parameterization is the observed sequence most probable? – Baum-Welch algorithm (EM)

  15. Evaluation Problem
     • Given the HMM parameters and an observation sequence O_1, …, O_T, find the probability of the observed sequence:
       p(O_1, …, O_T) = ∑_{S_1, …, S_T} p(O_1, …, O_T, S_1, …, S_T)
     • Direct summation over all possible hidden state values at all times requires K^T terms – exponential in T!
     • Instead: compute the forward probability α_T(k) recursively.

  16. Forward Probability
     • Define the forward probability α_t(k) = p(O_1, …, O_t, S_t = k) and compute it recursively over t.
     • Introduce S_{t-1}, apply the chain rule, and then the Markov assumption:
       α_t(k) = p(O_t | S_t = k) ∑_i p(S_t = k | S_{t-1} = i) α_{t-1}(i)

  17. Forward Algorithm
     • Can compute α_t(k) for all k, t using dynamic programming:
       – Initialize: α_1(k) = p(O_1 | S_1 = k) p(S_1 = k) for all k
       – Iterate: for t = 2, …, T:  α_t(k) = p(O_t | S_t = k) ∑_i α_{t-1}(i) p(S_t = k | S_{t-1} = i) for all k
       – Termination: p(O_1, …, O_T) = ∑_k α_T(k)
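A sketch of how the recursion above translates to Python code, assuming the NumPy arrays and 0-based observation encoding from the casino sketch after slide 12:

```python
import numpy as np

def forward(init_probs, trans_probs, emit_probs, obs):
    """Forward probabilities: alpha[t-1, k] corresponds to alpha_t(k) on the slides."""
    T, K = len(obs), len(init_probs)
    alpha = np.zeros((T, K))
    alpha[0] = init_probs * emit_probs[:, obs[0]]                      # initialize
    for t in range(1, T):                                              # iterate
        alpha[t] = emit_probs[:, obs[t]] * (alpha[t - 1] @ trans_probs)
    return alpha

# Termination: the likelihood of the observed sequence is the sum over final states.
obs = [5, 5, 0, 3, 5, 5]                      # a hypothetical roll sequence (face 6 = index 5)
alpha = forward(init_probs, trans_probs, emit_probs, obs)
likelihood = alpha[-1].sum()                  # p(O_1, ..., O_T)
```

In practice one would rescale alpha at each step or work in log space to avoid numerical underflow for long sequences; that detail is omitted here.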

  18. Decoding Problem 1
     • Given the HMM parameters and an observation sequence, find the probability that the hidden state at time t was k:
       p(S_t = k, O_1, …, O_T) = α_t(k) β_t(k)
     • Compute α_t(k) and β_t(k) recursively.

  19. Backward Probability
     • Define the backward probability β_t(k) = p(O_{t+1}, …, O_T | S_t = k) and compute it recursively over t.
     • Introduce S_{t+1}, apply the chain rule, and then the Markov assumption:
       β_t(k) = ∑_i p(S_{t+1} = i | S_t = k) p(O_{t+1} | S_{t+1} = i) β_{t+1}(i)

  20. Backward Algorithm
     • Can compute β_t(k) for all k, t using dynamic programming:
       – Initialize: β_T(k) = 1 for all k
       – Iterate: for t = T-1, …, 1:  β_t(k) = ∑_i p(S_{t+1} = i | S_t = k) p(O_{t+1} | S_{t+1} = i) β_{t+1}(i) for all k
       – Termination: p(O_1, …, O_T) = ∑_k β_1(k) p(O_1 | S_1 = k) p(S_1 = k)
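A matching Python sketch of the backward pass, plus the posterior from Decoding Problem 1 obtained by combining it with the forward probabilities (again assuming the arrays, `obs`, `alpha`, and `forward` from the earlier sketches):

```python
import numpy as np

def backward(trans_probs, emit_probs, obs):
    """Backward probabilities: beta[t-1, k] corresponds to beta_t(k) on the slides."""
    T, K = len(obs), trans_probs.shape[0]
    beta = np.zeros((T, K))
    beta[T - 1] = 1.0                                                  # initialize: beta_T(k) = 1
    for t in range(T - 2, -1, -1):                                     # iterate backwards
        beta[t] = trans_probs @ (emit_probs[:, obs[t + 1]] * beta[t + 1])
    return beta

# Decoding Problem 1: p(S_t = k | O_1..O_T) is proportional to alpha_t(k) * beta_t(k).
beta = backward(trans_probs, emit_probs, obs)
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)      # gamma[t, k] = p(S_t = k | observations)
```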

  21. Most likely state vs. most likely sequence
     • Most likely state assignment at time t, e.g.: Which die was most likely used by the casino in the third roll, given the observed sequence?
     • Most likely assignment of the whole state sequence, e.g.: What was the most likely sequence of dice used by the casino, given the observed sequence?
     • These are not the same solution: the maximum-likelihood assignment of x alone need not match the value x takes in the maximum-likelihood assignment of the pair (x, y).

  22. Decoding Problem 2
     • Given the HMM parameters and an observation sequence, find the most likely assignment of the hidden state sequence.
     • Compute recursively V_T(k) – the probability of the most likely sequence of states ending at state S_T = k.

  23. Viterbi Decoding
     • Compute the probability V_t(k) recursively over t.
     • Using Bayes rule and the Markov assumption:
       V_t(k) = p(O_t | S_t = k) max_i p(S_t = k | S_{t-1} = i) V_{t-1}(i)

  24. Viterbi Algorithm
     • Can compute V_t(k) for all k, t using dynamic programming:
       – Initialize: V_1(k) = p(O_1 | S_1 = k) p(S_1 = k) for all k
       – Iterate: for t = 2, …, T:  V_t(k) = p(O_t | S_t = k) max_i p(S_t = k | S_{t-1} = i) V_{t-1}(i) for all k
       – Termination: max_k V_T(k) is the probability of the most likely state sequence
       – Traceback: follow the argmax choices back from argmax_k V_T(k) to recover that sequence
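A Python sketch of the Viterbi recursion with back-pointers for the traceback, using the same array conventions and `obs` as the earlier sketches:

```python
import numpy as np

def viterbi(init_probs, trans_probs, emit_probs, obs):
    """Most likely hidden state sequence (returned as 0-based state indices)."""
    T, K = len(obs), len(init_probs)
    V = np.zeros((T, K))                       # V[t, k]: prob. of best path ending in state k at t
    ptr = np.zeros((T, K), dtype=int)          # back-pointers for the traceback
    V[0] = init_probs * emit_probs[:, obs[0]]                          # initialize
    for t in range(1, T):                                              # iterate
        scores = V[t - 1][:, None] * trans_probs                       # scores[i, k]
        ptr[t] = scores.argmax(axis=0)
        V[t] = emit_probs[:, obs[t]] * scores.max(axis=0)
    path = [int(V[-1].argmax())]                                       # termination
    for t in range(T - 1, 0, -1):                                      # traceback
        path.append(int(ptr[t, path[-1]]))
    return path[::-1]

states = viterbi(init_probs, trans_probs, emit_probs, obs)             # 0 = Fair, 1 = Loaded per roll
```

As with the forward pass, log probabilities are preferable for long sequences to avoid numerical underflow.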

  25. Computational complexity
     • What is the running time of Forward, Backward, and Viterbi? O(K^2 T) – linear in T, instead of O(K^T), which is exponential in T!

  26. Learning Problem
     • Given an HMM with unknown parameters and an observation sequence, find the parameters that maximize the likelihood of the observed data.
     • But the likelihood does not factorize, since the observations are not i.i.d.; the hidden variables are the state sequence.
     • EM (Baum-Welch) algorithm:
       – E-step: fix the parameters, find the expected state assignments.
       – M-step: fix the expected state assignments, update the parameters.

  27. Baum-Welch (EM) Algorithm
     • Start with a random initialization of the parameters.
     • E-step: fix the parameters, find the expected state assignments using the Forward-Backward algorithm.

  28. Baum-Welch (EM) Algorithm
     • Start with a random initialization of the parameters.
     • E-step: using the forward and backward probabilities, compute
       – the expected # of times in state i,
       – the expected # of transitions from state i (over t = 1, …, T-1),
       – the expected # of transitions from state i to j.
     • M-step: re-estimate the initial, transition, and emission probabilities from these expected counts, e.g. the new p_ij = (expected # of transitions from i to j) / (expected # of transitions from i). A sketch of one full iteration follows below.
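The following is a minimal Python sketch of one Baum-Welch iteration on a single observation sequence, built on the `forward` and `backward` functions above. The names `gamma` (expected state occupancies) and `xi` (expected transitions) follow standard EM-for-HMM notation rather than anything in the slides, and numerical-stability concerns (scaling/log space) are ignored.

```python
import numpy as np

def baum_welch_step(init_probs, trans_probs, emit_probs, obs):
    """One EM iteration: E-step via forward-backward, M-step via expected counts."""
    obs = np.asarray(obs)
    alpha = forward(init_probs, trans_probs, emit_probs, obs)
    beta = backward(trans_probs, emit_probs, obs)

    # E-step: gamma[t, i] = p(S_t = i | O), xi[t, i, j] = p(S_t = i, S_{t+1} = j | O).
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = (alpha[:-1, :, None] * trans_probs[None, :, :] *
          (emit_probs[:, obs[1:]].T * beta[1:])[:, None, :])
    xi /= xi.sum(axis=(1, 2), keepdims=True)

    # M-step: re-estimate parameters from expected counts.
    new_init = gamma[0]
    new_trans = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]       # expected i->j / expected from i
    new_emit = np.stack([gamma[obs == y].sum(axis=0)                   # expected emissions of y in state i
                         for y in range(emit_probs.shape[1])], axis=1)
    new_emit /= gamma.sum(axis=0)[:, None]                             # / expected times in state i
    return new_init, new_trans, new_emit

# Repeating baum_welch_step until the likelihood stops improving yields a local maximum.
```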

  29. Some connections
     • HMM vs. Linear Dynamical Systems (Kalman filters):
       – HMM: the states are discrete; the observations are discrete or continuous.
       – Linear dynamical systems: the observations and states are multivariate Gaussians whose means are linear functions of their parent states (see Bishop, Sec. 13.3).

  30. HMMs: What you should know
     • Useful for modeling sequential data with few parameters, using discrete hidden states that satisfy the Markov assumption.
     • Representation: initial probabilities, transition probabilities, emission probabilities; state space representation.
     • Algorithms for inference and learning in HMMs:
       – Computing the marginal likelihood of the observed sequence: Forward algorithm
       – Predicting a single hidden state: Forward-Backward
       – Predicting an entire sequence of hidden states: Viterbi
       – Learning the HMM parameters: an EM algorithm known as Baum-Welch
