
6.864 (Fall 07) The EM Algorithm Part II

1

Overview

  • Hidden Markov models
  • The EM algorithm in general form
  • Products of multinomial (PM) models
  • The EM algorithm for PM models
  • The EM algorithm for hidden Markov models (dynamic

programming)

2

The Structure of Hidden Markov Models

  • Have N states, states 1 . . . N
  • Without loss of generality, take N to be the final or stop state
  • Have an alphabet Σ. For example Σ = {a, b}
  • Parameter πi for i = 1 . . . N is probability of starting in state i
  • Parameter ai,j for i = 1 . . . (N − 1), and j = 1 . . . N is

probability of state j following state i

  • Parameter bi(o) for i = 1 . . . (N − 1), and o ∈ Σ is probability

of state i emitting symbol o

3

An Example

  • Take N = 3 states. States are {1, 2, 3}. Final state is state 3.
  • Alphabet Σ = {the, dog}.
  • Distribution over initial state is π1 = 1.0, π2 = 0, π3 = 0.
  • Parameters ai,j are

      j=1   j=2   j=3
i=1    0    0.5   0.5
i=2   0.5   0.5    0

  • Parameters bi(o) are

      o=the  o=dog
i=1    0.9    0.1
i=2    0.1    0.9

4


A Generative Process

  • Pick the start state s1 to be state i for i = 1 . . . N with

probability πi.

  • Set t = 1
  • Repeat while current state st is not the stop state (N):

– Emit a symbol ot ∈ Σ with probability bst(ot)
– Pick the next state st+1 as state j with probability ast,j
– t = t + 1

5
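The generative process above can be sketched in a few lines of Python. The parameter values below are illustrative assumptions in the spirit of the earlier example (state 3 is the stop state); `sample_categorical` and `generate` are our own helper names, not part of the lecture.

```python
import random

# Illustrative parameters (assumptions for this sketch):
# N = 3 states, state 3 is the stop state, alphabet {"the", "dog"}.
PI = {1: 1.0, 2: 0.0}                      # start-state distribution π_i
A = {1: {1: 0.0, 2: 0.5, 3: 0.5},          # A[i][j] = a_{i,j} = P(next = j | current = i)
     2: {1: 0.5, 2: 0.5, 3: 0.0}}
B = {1: {"the": 0.9, "dog": 0.1},          # B[i][o] = b_i(o) = P(emit o | state i)
     2: {"the": 0.1, "dog": 0.9}}
STOP = 3

def sample_categorical(dist, rng):
    """Draw one outcome from a dict mapping outcome -> probability."""
    u = rng.random()
    acc = 0.0
    for outcome, p in dist.items():
        acc += p
        if u < acc:
            return outcome
    return outcome  # guard against floating-point rounding

def generate(rng):
    """Run the generative process: pick a start state with probability
    π_i, then repeatedly emit a symbol and transition until the stop
    state is reached."""
    outputs, states = [], []
    s = sample_categorical(PI, rng)
    while s != STOP:
        states.append(s)
        outputs.append(sample_categorical(B[s], rng))
        s = sample_categorical(A[s], rng)
    return outputs, states
```

Each call to `generate` yields one state/output sequence pair of the kind the next slides assign probabilities to.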

Probabilities Over Sequences

  • An output sequence is a sequence of observations o1 . . . oT

where each oi ∈ Σ e.g. the dog the dog dog the

  • A state sequence is a sequence of states s1 . . . sT where each

si ∈ {1 . . . N} e.g. 1 2 1 2 2 1

  • HMM defines a probability for each state/output sequence pair

e.g. the/1 dog/2 the/1 dog/2 the/2 dog/1 has probability π1 b1(the) a1,2 b2(dog) a2,1 b1(the) a1,2 b2(dog) a2,2 b2(the) a2,1 b1(dog)a1,3

6

Hidden Markov Models

  • An HMM specifies a probability for each possible (x, y) pair,

where x is a sequence of symbols drawn from Σ, and y is a sequence of states drawn from the integers 1 . . . (N − 1). The sequences x and y are restricted to have the same length.

  • E.g., say we have an HMM with N = 3, Σ = {a, b}, and with

some choice of the parameters Θ. Take x = a, a, b, b and y = 1, 2, 2, 1. Then in this case, P(x, y|Θ) = π1 a1,2 a2,2 a2,1 a1,3 b1(a) b2(a) b2(b) b1(b)

7

Hidden Markov Models

In general, if we have the sequence x = x1, x2, . . . xn where each
xj ∈ Σ, and the sequence y = y1, y2, . . . yn where each
yj ∈ 1 . . . (N − 1), then

P(x, y|Θ) = πy1 × ∏j=2...n ayj−1,yj × ∏j=1...n byj(xj) × ayn,N

8
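The general formula can be transcribed directly as a function; this is a sketch, and the dictionary-based parameter layout (`a[i][j]` for ai,j, `b[i][o]` for bi(o)) is our own choice.

```python
def joint_prob(x, y, pi, a, b, N):
    """P(x, y|Θ) = π_{y1} · ∏_{j=2..n} a_{y_{j-1},y_j} · ∏_{j=1..n} b_{y_j}(x_j) · a_{y_n,N}"""
    assert len(x) == len(y) >= 1
    p = pi[y[0]]                      # π_{y1}
    for j in range(1, len(y)):        # transition terms a_{y_{j-1}, y_j}
        p *= a[y[j - 1]][y[j]]
    for j in range(len(x)):           # emission terms b_{y_j}(x_j)
        p *= b[y[j]][x[j]]
    return p * a[y[-1]][N]            # final transition into the stop state
```

For the example above (x = a, a, b, b and y = 1, 2, 2, 1), this multiplies out exactly the terms π1 a1,2 a2,2 a2,1 a1,3 b1(a) b2(a) b2(b) b1(b).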


A Hidden Variable Problem

  • We have an HMM with N = 3, Σ = {e, f, g, h}
  • We see the following output sequences in training data

e g
e h
f h
f g

  • How would you choose the parameter values for πi, ai,j, and

bi(o)?

9

Another Hidden Variable Problem

  • We have an HMM with N = 3, Σ = {e, f, g, h}
  • We see the following output sequences in training data

e g h
e h
f h g
f g g
e h

  • How would you choose the parameter values for πi, ai,j, and

bi(o)?

10

Overview

  • Hidden Markov models
  • The EM algorithm in general form
  • Products of multinomial (PM) models
  • The EM algorithm for PM models
  • The EM algorithm for hidden Markov models (dynamic

programming)

11

EM: the Basic Set-up

  • We have some data points—a “sample”—x1, x2, . . . xm.
  • For example, each xi might be a sentence such as “the

dog slept”: this will be the case in EM applied to hidden
Markov models (HMMs) or probabilistic context-free
grammars (PCFGs). (Note that in this case each xi is a
sequence, which we will sometimes write xi1, xi2, . . . xini,
where ni is the length of the sequence.)

  • Or in the three coins example (see the lecture notes), each xi

might be a sequence of three coin tosses, such as HHH, THT,
or TTT.

12

  • We have a parameter vector Θ.

For example, see the description of HMMs in the previous section. As another example, in a PCFG, Θ would contain the probability P(α → β|α) for every rule expansion α → β in the context-free grammar within the PCFG.

13

  • We have a model P(x, y|Θ): A function that for any x, y, Θ

triple returns a probability, which is the probability of seeing x and y together given parameter settings Θ.

  • This model defines a joint distribution over x and y; note that we

can also derive a marginal distribution over x alone, defined as

P(x|Θ) = Σy P(x, y|Θ)

14

  • Given the sample x1, x2, . . . xm, we define the likelihood as

L′(Θ) = ∏i=1...m P(xi|Θ) = ∏i=1...m Σy P(xi, y|Θ)

and we define the log-likelihood as

L(Θ) = log L′(Θ) = Σi=1...m log P(xi|Θ) = Σi=1...m log Σy P(xi, y|Θ)

15

  • The maximum-likelihood estimation problem is to find

ΘML = arg maxΘ∈Ω L(Θ)

where Ω is a parameter space specifying the set of allowable
parameter settings. In the HMM example, Ω would enforce the
restrictions Σj=1...N πj = 1; for all j = 1 . . . (N − 1),
Σk=1...N aj,k = 1; and for all j = 1 . . . (N − 1),
Σo∈Σ bj(o) = 1.

16


The EM Algorithm

  • Θt is the parameter vector at t’th iteration
  • Choose Θ0 (at random, or using various heuristics)
  • Iterative procedure is defined as

Θt = argmaxΘ Q(Θ, Θt−1)

where

Q(Θ, Θt−1) = Σi Σy∈Y P(y | xi, Θt−1) log P(xi, y | Θ)
17

The EM Algorithm

  • Iterative procedure is defined as Θt = argmaxΘ Q(Θ, Θt−1), where

Q(Θ, Θt−1) = Σi Σy∈Y P(y | xi, Θt−1) log P(xi, y | Θ)

  • Key points:

– Intuition: fill in hidden variables y according to P(y | xi, Θ)

– EM is guaranteed to converge to a local maximum, or saddle-point,
of the likelihood function

– In general, if argmaxΘ Σi log P(xi, yi | Θ) has a simple (analytic)
solution, then argmaxΘ Σi Σy P(y | xi, Θt−1) log P(xi, y | Θ) also
has a simple (analytic) solution.

18

Overview

  • Hidden Markov models
  • The EM algorithm in general form
  • Products of multinomial (PM) models
  • The EM algorithm for PM models
  • The EM algorithm for hidden Markov models (dynamic

programming)

19

Products of Multinomial (PM) Models

  • In a PCFG, each sample point x is a sentence, and each y is a

possible parse tree for that sentence. We have

P(x, y|Θ) = ∏i=1...n P(αi → βi|αi)

assuming that (x, y) contains the n context-free rules αi → βi
for i = 1 . . . n.

  • For example,

if (x, y) contains the rules S → NP VP, NP → Jim, and VP → sleeps, then P(x, y|Θ) = P(S → NP VP|S)×P(NP → Jim|NP)×P(VP → sleeps|VP)

20


Products of Multinomial (PM) Models

  • HMMs define a model with a similar form.

Recall the example in the section on HMMs, where we had the following probability for a particular (x, y) pair: P(x, y|Θ) = π1 a1,2 a2,2 a2,1 a1,3 b1(a) b2(a) b2(b) b1(b)

21

Products of Multinomial (PM) Models

  • In both HMMs and PCFGs, the model can be written in the

following form

P(x, y|Θ) = ∏r=1...|Θ| Θr^Count(x,y,r)

Here:
– Θr for r = 1 . . . |Θ| is the r’th parameter in the model
– Count(x, y, r) for r = 1 . . . |Θ| is a count corresponding to how
many times Θr is seen in the expression for P(x, y|Θ).

  • We will refer to any model that can be written in this form as

a product of multinomials (PM) model.

22
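The PM form is a one-liner to evaluate; in this sketch, `theta` maps each parameter name r to Θr and `count` maps r to Count(x, y, r), with zero-count parameters simply omitted (the dictionary layout is our own assumption).

```python
def pm_prob(theta, count):
    """P(x, y|Θ) = ∏_{r=1..|Θ|} Θ_r ** Count(x, y, r).
    theta: dict mapping parameter name r -> value Θ_r.
    count: dict mapping parameter name r -> Count(x, y, r);
    parameters with count zero contribute a factor of 1 and may be omitted."""
    p = 1.0
    for r, c in count.items():
        p *= theta[r] ** c
    return p
```

For the HMM example above, the counts would record one occurrence each of π1, of the transition parameters a1,2, a2,2, a2,1, a1,3, and of the four emission parameters.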

Overview

  • Hidden Markov models
  • The EM algorithm in general form
  • Products of multinomial (PM) models
  • The EM algorithm for PM models
  • The EM algorithm for hidden Markov models (dynamic

programming)

23

The EM Algorithm for PM Models

  • We will use Θt to denote the parameter values at the t’th

iteration of the algorithm.

  • In the initialization step, some choice for initial parameter

settings Θ0 is made.

  • The algorithm then defines an iterative sequence of parameters

Θ0, Θ1, . . . , ΘT, before returning ΘT as the final parameter settings.

  • Crucial detail: deriving Θt from Θt−1

24


The EM Algorithm for PM Models: Step 1

  • At each iteration of the algorithm, two steps are taken.
  • Step 1: expected counts Count(r) are calculated for each

parameter Θr in the model:

Count(r) = Σi=1...m Σy P(y|xi, Θt−1) Count(xi, y, r)

25

The EM Algorithm for PM Models: Step 1

  • For example, say we are estimating the parameters of a PCFG

using the EM algorithm. Take a particular rule, such as
S → NP VP. Then at the t’th iteration,

Count(S → NP VP) = Σi=1...m Σy P(y|xi, Θt−1) Count(xi, y, S → NP VP)

26

The EM Algorithm for PM Models: Step 2

  • Step 2: Calculate the updated parameters Θt. For example,

we would re-estimate

P(S → NP VP|S) = Count(S → NP VP) / ΣS→β∈R Count(S → β)

Note that the denominator in this term involves a summation
over all rules of the form S → β in the grammar. This term
ensures that ΣS→β∈R P(S → β|S) = 1, the usual constraint
on rule probabilities in PCFGs.

27
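Step 2 amounts to normalizing the expected counts within each multinomial, one nonterminal at a time. A minimal sketch (the dictionary layout and rule encoding are our own assumptions):

```python
def normalize_rule_counts(counts):
    """counts: dict mapping (lhs, rhs) -> expected Count(lhs -> rhs).
    Returns P(lhs -> rhs | lhs), dividing each expected count by the
    total expected count of all rules sharing the same left-hand side."""
    totals = {}
    for (lhs, _rhs), c in counts.items():
        totals[lhs] = totals.get(lhs, 0.0) + c
    return {(lhs, rhs): c / totals[lhs] for (lhs, rhs), c in counts.items()}
```

By construction, the returned probabilities for each left-hand side sum to 1, which is exactly the PCFG constraint the denominator enforces.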

The EM Algorithm for HMMs: Step 1

  • Define Count(xi, y, p → q) to be the number of times a

transition from state p to state q is seen in y.

  • Step 1: Calculate expected counts such as

Count(1 → 2) = Σi=1...m Σy P(y|xi, Θt−1) Count(xi, y, 1 → 2)

  • (Note: similar counts will be calculated for emission and

initial-state parameters)

28


The EM Algorithm for HMMs: Step 2

  • Step 2: Re-estimate the transition parameter as

a1,2 = Count(1 → 2) / Σk=1...N Count(1 → k)

where in this case the denominator ensures that Σk=1...N a1,k = 1.

  • Similar calculations will be performed for other transition

parameters, as well as the initial state parameters and emission parameters.

29

Inputs: A sample of m points, x1, x2, . . . , xm. A model P(x, y|Θ)
which takes the following form:

P(x, y|Θ) = ∏r=1...|Θ| Θr^Count(x,y,r)

Initialization: Choose some initial value for the parameters, call
this Θ0.

Algorithm: For t = 1 . . . T,

  • For r = 1 . . . |Θ|, set Count(r) = 0
  • For i = 1 . . . m,

– For all y, calculate ty = P(xi, y|Θt−1)
– Set sum = Σy ty
– For all y, set uy = ty/sum (note that uy = P(y|xi, Θt−1))
– For all r = 1 . . . |Θ|, set Count(r) = Count(r) + Σy uy Count(xi, y, r)

  • For all r = 1 . . . |Θ|, set

Θtr = Count(r) / Z

where Z is a normalization constant that ensures that the
multinomial distribution of which Θtr is a member sums to 1.

Output: Return parameter values ΘT

30
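The algorithm above can be transcribed almost line for line, enumerating y by brute force. As a concrete PM model we use the three-coins setup mentioned earlier: a hidden coin y ∈ {0, 1} selects one of two biased coins, which is then tossed three times. The parameter naming scheme, initialization, and toy data here are all our own assumptions for the sketch.

```python
def counts_fn(x, y):
    """Count(x, y, r) as a dict of nonzero counts.  x is a string over
    {'H', 'T'}; y says which biased coin produced it."""
    c = {('lam', y): 1}                    # one draw of the hidden coin
    for s in x:
        c[('coin', y, s)] = c.get(('coin', y, s), 0) + 1
    return c

def pm_prob(theta, counts):
    """P(x, y|Θ) = ∏_r Θ_r ** Count(x, y, r)."""
    p = 1.0
    for r, c in counts.items():
        p *= theta[r] ** c
    return p

def em(samples, theta, groups, T=20):
    """EM for a PM model, enumerating the hidden variable y in {0, 1}.
    Each group lists the parameter names forming one multinomial, so
    each group is normalized separately (the role of Z in the slides)."""
    for _ in range(T):
        count = {r: 0.0 for r in theta}
        for x in samples:
            t = {y: pm_prob(theta, counts_fn(x, y)) for y in (0, 1)}
            s = sum(t.values())
            for y in (0, 1):
                u = t[y] / s               # u_y = P(y | x, Θ^{t-1})
                for r, c in counts_fn(x, y).items():
                    count[r] += u * c
        theta = dict(theta)
        for group in groups:
            Z = sum(count[r] for r in group)
            for r in group:
                theta[r] = count[r] / Z
    return theta
```

Run on data such as HHH, TTT, HHH, TTT with a mildly asymmetric initialization, the two coins separate: one ends up heads-heavy, the other tails-heavy, with the hidden coin near 0.5.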

Overview

  • Hidden Markov models
  • The EM algorithm in general form
  • Products of multinomial (PM) models
  • The EM algorithm for PM models
  • The EM algorithm for hidden Markov models (dynamic

programming)

31

The Forward-Backward Algorithm for HMMs

  • Define Count(xi, y, p → q) to be the number of times a

transition from state p to state q is seen in y.

  • Step 1: Calculate expected counts such as

Count(1 → 2) = Σi=1...m Σy P(y|xi, Θt−1) Count(xi, y, 1 → 2)

  • A problem: the inner sum Σy P(y|xi, Θt−1) Count(xi, y, 1 → 2)

ranges over all possible state sequences y, of which there are
exponentially many, so computing it by brute force is infeasible.

32


The Forward-Backward Algorithm for HMMs

  • Fortunately, there is a way of avoiding this brute force strategy

with HMMs, using a dynamic programming algorithm called the forward-backward algorithm.

  • Say that we could efficiently calculate the following quantities

for any x of length n, for any j ∈ 1 . . . n, and for any
p ∈ 1 . . . (N − 1) and q ∈ 1 . . . N:

P(yj = p, yj+1 = q|x, Θ) = Σy: yj=p, yj+1=q P(y|x, Θ)

  • The inner sum can now be re-written using terms such as this:

Σy P(y|xi, Θt−1) Count(xi, y, p → q) = Σj=1...ni P(yj = p, yj+1 = q|xi, Θt−1)

33

The Forward Probabilities

Given an input sequence x1 . . . xn, we will define the forward
probabilities as

αp(j) = P(x1 . . . xj−1, yj = p | Θ)

for all j ∈ 1 . . . n, for all p ∈ 1 . . . N − 1.

34

The Backward Probabilities

Given an input sequence x1 . . . xn, we will define the backward
probabilities as

βp(j) = P(xj . . . xn | yj = p, Θ)

for all j ∈ 1 . . . n, for all p ∈ 1 . . . N − 1.

35

Given the forward and backward probabilities, the first thing we
can calculate is the following:

Z = P(x1, x2, . . . xn|Θ) = Σp αp(j) βp(j)

for any j ∈ 1 . . . n. Thus we can calculate the probability of the
sequence x1, x2, . . . xn being emitted by the HMM.

36


We can calculate the probability of being in any state at any
position:

P(yj = p|x, Θ) = αp(j) βp(j) / Z

for any p, j.

37

We can calculate the probability of each possible state transition,
as follows:

P(yj = p, yj+1 = q|x, Θ) = αp(j) ap,q bp(xj) βq(j + 1) / Z

for any p, q, j.

38

  • Given an input sequence x1 . . . xn, for any p ∈ 1 . . . N, j ∈ 1 . . . n,

the forward probabilities are αp(j) = P(x1 . . . xj−1, yj = p | Θ)

  • Base case:

αp(1) = πp for all p

  • Recursive case:

αp(j + 1) = Σq αq(j) aq,p bq(xj) for all p = 1 . . . N − 1 and j = 1 . . . n − 1

  • Given an input sequence x1 . . . xn, the backward probabilities are

βp(j) = P(xj . . . xn | yj = p, Θ)

  • Base case:

βp(n) = ap,N bp(xn) for all p = 1 . . . N − 1

  • Recursive case:

βp(j) = Σq ap,q bp(xj) βq(j + 1) for all p = 1 . . . N − 1 and j = 1 . . . n − 1

39
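The recursions above can be implemented directly and checked against brute-force enumeration over all state sequences. The parameter values in the usage below are arbitrary illustrative assumptions (N = 3, with states 1 and 2 emitting), not the lecture's example.

```python
import itertools

def forward_backward(x, pi, a, b, N):
    """Compute α_p(j) and β_p(j) exactly as defined above.
    States 1..N-1 emit; N is the stop state.  x is the symbol sequence."""
    n = len(x)
    S = list(range(1, N))                 # emitting states
    # Base case: α_p(1) = π_p
    alpha = {1: {p: pi[p] for p in S}}
    # Recursive case: α_p(j+1) = Σ_q α_q(j) a_{q,p} b_q(x_j)
    for j in range(1, n):
        alpha[j + 1] = {p: sum(alpha[j][q] * a[q][p] * b[q][x[j - 1]] for q in S)
                        for p in S}
    # Base case: β_p(n) = a_{p,N} b_p(x_n)
    beta = {n: {p: a[p][N] * b[p][x[n - 1]] for p in S}}
    # Recursive case: β_p(j) = Σ_q a_{p,q} b_p(x_j) β_q(j+1)
    for j in range(n - 1, 0, -1):
        beta[j] = {p: sum(a[p][q] * b[p][x[j - 1]] * beta[j + 1][q] for q in S)
                   for p in S}
    return alpha, beta

def brute_force_Z(x, pi, a, b, N):
    """P(x|Θ) by explicit enumeration of every state sequence."""
    n, total = len(x), 0.0
    for y in itertools.product(range(1, N), repeat=n):
        p = pi[y[0]] * a[y[-1]][N]
        for j in range(n):
            p *= b[y[j]][x[j]]
        for j in range(1, n):
            p *= a[y[j - 1]][y[j]]
        total += p
    return total
```

For any position j, Σp αp(j) βp(j) should give the same value Z = P(x|Θ), matching the brute-force sum; this is the constancy the slides rely on.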

Justification for the Algorithm

We will make use of a particular directed graph. The graph is associated with a particular input sequence x1, x2, . . . xn, and parameter vector Θ, and has the following vertices:

  • A “source” vertex, which we will label s.
  • A “final” vertex, which we will label f.
  • For all j ∈ 1 . . . n, for all p ∈ 1 . . . N−1, there is an associated

vertex which we will label j, p.

40


Justification for the Algorithm

Given this set of vertices, we define the following directed edges:

  • There is an edge from s to each vertex 1, p for p = 1 . . . N − 1.

Each such edge has a weight equal to πp.

  • For any j ∈ 1 . . . n−1, and p, q ∈ 1 . . . N −1, there is an edge

from vertex j, p to vertex (j + 1), q. This edge has weight
equal to ap,q bp(xj).

  • There is an edge from each vertex n, p for p = 1 . . . N − 1

to the final vertex f. Each such edge has a weight equal to
ap,N bp(xn).

41

Justification for the Algorithm

The resulting graph has a large number of paths from the source
s to the final vertex f; each path goes through a number of
intermediate vertices. The weight of an entire path will be taken
as the product of weights on the edges in the path. You should be
able to convince yourself that:

  1. For every state sequence y1, y2, . . . yn in the original HMM,
there is a path through the graph that has the sequence of
vertices s, 1, y1, . . . , n, yn, f

  2. The path associated with state sequence y1, y2, . . . yn has
weight equal to P(x, y|Θ)

42

Justification for the Algorithm

We can now interpret the forward and backward probabilities as following:

  • αp(j) is the sum of weights of all paths from s to the vertex j, p

  • βp(j) is the sum of weights of all paths from the vertex j, p to

the final vertex f

43

Another Application of EM in NLP: Topic Modeling

  • Say we have a collection of m documents
  • Each document xi for i = 1 . . . m is a sequence of words

xi1, xi2, . . . xin

  • E.g., we might have a few thousand articles from the New York

Times

44


Another Application of EM in NLP: Topic Modeling

  • We’ll assume that yi for i = 1 . . . m is a hidden “topic

variable”. yi can take on any of the values 1, 2, . . . K

  • For any document xi, and topic variable y, we write

P(xi, y|Θ) = P(y) ∏j=1...n P(xij|y)

  • Θ contains two types of parameters:

– P(y) for y ∈ 1 . . . K is the probability of selecting topic y
– P(w|y) for y ∈ 1 . . . K, w in some vocabulary of possible
words, is the probability of generating the word w given topic y

45

  • As before, we can use EM to find

ΘML = arg maxΘ L(Θ) = arg maxΘ Σi log Σy P(xi, y|Θ)

  • Result: for each of the K topics, we have a different

distribution over words, P(w|y)

46
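A minimal EM loop for this model, with toy documents standing in for the newspaper collection; the function and variable names, the toy data, and the initialization are all our own assumptions for the sketch.

```python
def topic_em(docs, K, p_y, p_w, T=30):
    """EM for the mixture-of-unigrams topic model
    P(x, y|Θ) = P(y) ∏_j P(x_j | y).
    docs: list of word lists; p_y[y] and p_w[y][w] hold the initial
    parameters P(y) and P(w|y)."""
    vocab = {w for d in docs for w in d}
    for _ in range(T):
        cy = {y: 0.0 for y in range(K)}
        cw = {y: {w: 0.0 for w in vocab} for y in range(K)}
        for doc in docs:
            # E-step: posterior P(y | doc, Θ), obtained by normalizing
            # the joint P(doc, y | Θ) over the K topics
            joint = {}
            for y in range(K):
                p = p_y[y]
                for w in doc:
                    p *= p_w[y][w]
                joint[y] = p
            s = sum(joint.values())
            for y in range(K):
                u = joint[y] / s
                cy[y] += u                      # expected topic counts
                for w in doc:
                    cw[y][w] += u               # expected word counts
        # M-step: re-estimate by normalizing the expected counts
        total = sum(cy.values())
        p_y = {y: cy[y] / total for y in range(K)}
        p_w = {y: {w: cw[y][w] / sum(cw[y].values()) for w in vocab}
               for y in range(K)}
    return p_y, p_w
```

With two clearly separable groups of documents and a mildly asymmetric initialization, the two learned word distributions P(w|y) concentrate on the two groups' vocabularies, just as in the Hofmann results that follow.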

Results from Hofmann, SIGIR 1999

  • Applied the method to 15,000 documents, using K = 128

topics

  • Examples of 6 topics (in each case, the table shows the 10 words

for which P(w|y) is maximized):

plane, airport, crash, flight, safety, aircraft, air, passenger, board, airline
space, shuttle, mission, astronauts, launch, station, crew, nasa, satellite, earth
home, family, like, love, kids, mother, life, happy, friends, cnn
film, movie, music, new, best, hollywood, love, actor, entertainment, star
un, bosnian, serbs, bosnia, serb, sarajevo, nato, peacekeepers, nations, peace
refugees, aid, rwanda, relief, people, camps, zaire, camp, food, rwandan

47

Another Application of EM in NLP: Word Clustering

  • Say we have a collection of m bigrams
  • Each bigram consists of a word pair wi1, wi2, where wi2 follows wi1
  • We’d like to build a model of P(w2|w1)

48


Another Application of EM in NLP: Word Clustering

  • We’ll assume that yi for i = 1 . . . m is a hidden “cluster

variable”. yi can take on any of the values 1, 2, . . . K

  • For any bigram wi1, wi2, we write

P(wi2|wi1, Θ) = Σy P(wi2|y) P(y|wi1)

  • Θ contains two types of parameters:

– P(y|w1) for y ∈ 1 . . . K is the probability of selecting cluster y
given that the first word in the bigram is w1
– P(w2|y) for y ∈ 1 . . . K, w2 in some vocabulary of possible
words, is the probability of selecting w2, given cluster y

49

  • As before, we can use EM to find

ΘML = arg maxΘ L(Θ) = arg maxΘ Σi log Σy P(wi2|y) P(y|wi1)

  • Result: for each of the K clusters, we have the distributions

P(w2|y) and P(y|w1)

  • See Saul and Pereira, 1997, for more details

50