CSE 473: Artificial Intelligence Hidden Markov Models Daniel Weld - PDF document

CSE 473: Artificial Intelligence Hidden Markov Models Daniel Weld University of Washington [Many of these slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.] Hidden Markov Model: Example R t-1 P(R t | R t-1 ) P(R 1 ) t 0.7 0.6 f 0.1 R t P(U t | R t ) t 0.9 f 0.2 § An HMM is defined by: § Initial distribution: § Transitions: § Emissions: 1

Filtering (aka Monitoring) § Filtering, or monitoring, is the task of tracking the distribution B(X) (called “the belief state”) over time § We start with B 0 (X) in an initial setting, usually uniform § We update B t (X) computing B t+1 (X) 1. As time passes, and using prob model of how ghosts move 2. As we get observations using prob model of how noisy sensors work Passage of Time § Assume we have current belief P(X | evidence to date) X 1 X 2 § Then, after one time step passes: X P ( X t +1 | e 1: t ) = P ( X t +1 , x t | e 1: t ) x t § Or compactly: X = P ( X t +1 | x t , e 1: t ) P ( x t | e 1: t ) x t X B 0 ( X t +1 ) = P ( X 0 | x t ) B ( x t ) X = P ( X t +1 | x t ) P ( x t | e 1: t ) x t x t § Basic idea: beliefs get “pushed” through the transitions § With the “B” notation, we have to be careful about what time step t the belief is about, and what evidence it includes 2

Observation X 1 § Assume we have current belief P(X | previous evidence): E 1 B 0 ( X t +1 ) = P ( X t +1 | e 1: t ) § Then, after evidence comes in: P ( X t +1 | e 1: t +1 ) = P ( X t +1 , e t +1 | e 1: t ) /P ( e t +1 | e 1: t ) Defn cond prob = P ( e t +1 | e 1: t , X t +1 ) P ( X t +1 | e 1: t ) t ) /P ( e t +1 | e 1: t ) Defn cond prob = P ( e t +1 | X t +1 ) P ( X t +1 | e 1: t ) t ) /P ( e t +1 | e 1: t ) Independence § Or, compactly: § Basic idea: beliefs “reweighted” by likelihood of evidence B ( X t +1 ) = +1 P ( e t +1 | X t +1 ) B 0 ( X t +1 ) t ) /P ( e t +1 | e 1: t ) § Unlike passage of time, we have to normalize Observation § Assume we have current belief P(X | previous evidence): X 1 B 0 ( X t +1 ) = P ( X t +1 | e 1: t ) § Then, after evidence comes in: E 1 P ( X t +1 | e 1: t +1 ) = P ( X t +1 , e t +1 | e 1: t ) /P ( e t +1 | e 1: t ) ∝ X t +1 P ( X t +1 , e t +1 | e 1: t ) = P ( e t +1 | e 1: t , X t +1 ) P ( X t +1 | e 1: t ) = P ( e t +1 | X t +1 ) P ( X t +1 | e 1: t ) § Basic idea: beliefs “reweighted” § Or, compactly: by likelihood of evidence § Unlike passage of time, we have B ( X t +1 ) ∝ X t +1 P ( e t +1 | X t +1 ) B 0 ( X t +1 ) to renormalize 3

Normalization to Account for Evidence SELECT the joint NORMALIZE the probabilities selection X E P (make it sum to one) matching the rain U 0.4 evidence X P X E P rain - 0.1 rain 0.67 rain U 0.4 sun U 0.2 sun 0.33 sun U 0.2 sun - 0.3 Since could have seen other evidence, we normalize by dividing by the probability of the evidence we did see (in this case dividing by 0.5)… Example: Weather HMM B’( x 1 =r ) = P( x 1 =r | x 0 =r ) * 0.5 + P( x 1 =r | x 0 =s ) * 0.5 = 0.8*0.5 + 0.6*0.5 = 0.7 X B 0 ( X t +1 ) = P ( X 0 | x t ) B ( x t ) x t B(x 0 =r) = 0.5 R t-1 P(R t | R t-1 ) Rain 0 Rain 1 Rain 2 t 0.8 f 0.6 P(R 1 ) R t P(U t | R t ) 0.5 Umbr 1 = T Umbr 2 = T t 0.9 f 0.3 4

Example: Weather HMM B’( x 1 =r ) = P( x 1 =r | x 0 =r ) * 0.5 + P( x 1 =r | x 0 =s ) * 0.5 = 0.8*0.5 + 0.6*0.5 = 0.7 X B 0 ( X t +1 ) = P ( X 0 | x t ) B ( x t ) x t B(x 1 =r) ∝ 0.9 * 0.7 = 0.63 B(x 1 =s) ∝ 0.3 * 0.3 = 0.09 B ( X t +1 ) ∝ X t +1 Divide by 0.72 (=0.63+0.09) to normalize +1 P ( e t +1 | X t +1 ) B 0 ( X t +1 ) B(x 0 =r) = 0.5 B(x 1 =r) =0.63/0.72 = 0.875 R t-1 P(R t | R t-1 ) Rain 0 Rain 1 Rain 2 t 0.8 f 0.6 P(R 1 ) R t P(U t | R t ) 0.5 Umbr 1 = T Umbr 2 = T t 0.9 f 0.3 Example: Weather HMM B’( x 2 =r ) = P( x 2 =r | x 1 =r )*0.875 + P( x 2 =r | x 1 =s )*0.125 B’( x 1 =r ) = 0.7 = 0.8*0.875 + 0.6*0.125 = 0.775 X B 0 ( X t +1 ) = P ( X 0 | x t ) B ( x t ) x t B(x 0 =r) = 0.5 B(x 1 =r) = 0.875 R t-1 P(R t | R t-1 ) Rain 0 Rain 1 Rain 2 t 0.8 f 0.6 P(R 1 ) R t P(U t | R t ) 0.5 Umbr 1 = T Umbr 2 = T t 0.9 f 0.3 5

Example: Weather HMM B’( x 2 =r ) = P( x 2 =r | x 1 =r )*0.875 + P( x 2 =r | x 1 =s )*0.125 B’( x 1 =r ) = 0.7 = 0.8*0.875 + 0.6*0.125 = 0.775 X B 0 ( X t +1 ) = P ( X 0 | x t ) B ( x t ) x t B(x 1 =r) ∝ 0.9 * 0.775 = 0.6975 B(x 1 =s) ∝ 0.3 * 0.225 = 0.0675 B ( X t +1 ) ∝ X t +1 Divide by 0.765 to normalize +1 P ( e t +1 | X t +1 ) B 0 ( X t +1 ) B(x 0 =r) = 0.5 B(x 1 =r) = 0.875 B(x 1 =r) = 0.912 R t-1 P(R t | R t-1 ) Rain 0 Rain 1 Rain 2 t 0.8 f 0.6 P(R 1 ) R t P(U t | R t ) 0.5 Umbr 1 = T Umbr 2 = T t 0.9 f 0.3 Particle Filtering 6

Particle Filtering Overview § Approximation technique to solve filtering problem § Represents P distribution with samples § Filtering still operates in two steps § Elapse time § Incorporate observations § (But this part has two sub-steps: weight & resample) 38 Particle Filtering § Sometimes |X| is too big to use exact inference § |X| may be too big to even store B(X) § E.g. X is continuous § Solution: approximate inference § Track samples of X , not exact distribution of values § Samples are called particles § Time per step is linear in the number of samples § But: number needed may be large § In memory: list of particles, not states § Particle is just new name for sample § This is how robot localization works in practice 7

Remember… An HMM is defined by: § Initial distribution: § Transitions: § Emissions: Here’s a Single Particle § It represents a hypothetical state where the robot is in (1,2) 8

Particles Approximate Distribution § Our representation of P(X) is now a list of N particles (samples) P(x) § Generally, N << |X| Distribution Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3) P(x=<3,3>) = 5/10 = 50% Particle Filtering A more compact view overlays the samples: 0.0 0.2 0.5 Encodes à 0.1 0.0 0.2 0.0 0.2 0.5 9

Another Example In the weather HMM, suppose we decide to approximate the distributions with 5 particles. To initialize the filter, we draw 5 samples from B(x 0 =r) = 0.5 and we might get the following set of particles: P(x) Distribution Particles: S R R S S Not such a good approximation, but that’s life. 44 Representation: Particles § Our representation of P(X) is now a list of N particles (samples) § Generally, N << |X| § Storing map from X to counts would defeat the purpose Particles: (3,3) § P(x) approximated by (number of particles with value x) / N (2,3) § More particles, more accuracy (3,3) (3,2) (3,3) § What is P((2,2))? 0/10 = 0% (3,2) (1,2) (3,3) § In fact, many x may have P(x) = 0! (3,3) (2,3) 10

Particle Filtering Algorithm 1. Elapse Time 2. Observe 2a. Downweight samples based on evidence 2b. Resample 47 Particle Filtering: Elapse Time § For each particle, x, move x by sampling its next Particles: position from the transition model (3,3) (2,3) (3,3) (3,2) (3,3) Aka: sample( P(x t+1 | x t )) (3,2) (1,2) (3,3) § This is like prior sampling – samples’ frequencies (3,3) reflect the transition probabilities (2,3) § Here, most samples move clockwise, but some move in another direction or stay in place Particles: (3,2) (2,3) (3,2) (3,1) § This captures the passage of time (3,3) (3,2) § If enough samples, close to exact values before and (1,3) after (consistent) (2,3) (3,2) (2,2) 11

Particle Filtering: Observe Particles: § Slightly trickier: (3,2) (2,3) § Don’t sample observation, fix it (3,2) (3,1) (3,3) § Similar to likelihood weighting, (3,2) § For each particle, x, down-weight x (1,3) (2,3) based on the evidence (3,2) (2,2) Particles: (3,2) w=.9 (2,3) w=.2 (3,2) w=.9 § As before, the probabilities don’t sum to (3,1) w=.4 one, since all have been downweighted (3,3) w=.4 (in fact they now sum to (N times) an (3,2) w=.9 (1,3) w=.1 approximation of P(e)) (2,3) w=.2 (3,2) w=.9 (2,2) w=.4 Particle Filtering Observe Part II: Resample Particles: (3,2) w=.9 § Rather than tracking weighted samples, we (2,3) w=.2 (3,2) w=.9 resample (3,1) w=.4 § N times, we choose from our weighted sample (3,3) w=.4 (3,2) w=.9 distribution (i.e. draw with replacement) (1,3) w=.1 (2,3) w=.2 Draw random number in [0, 5.3] (3,2) w=.9 (2,2) w=.4 (New) Particles: (3,2) (3,2) (2,3) (3,2) (2,2) (2,2) (3,2) w=.9 w=.9 w=.4 w=.2 (2,3) (3,3) § This is equivalent to renormalizing the distribution (3,2) (3,1) (3,2) § Now the update is complete for this time step, (3,1) continue with the next one (3,2) 12

Recap: Particle Filtering § Particles: track samples of states rather than an explicit distribution Elapse Weight Resample Particles: Particles: Particles: (New) Particles: (3,3) (3,2) (3,2) w=.9 (3,2) (2,3) (2,3) (2,3) w=.2 (2,2) (3,3) (3,2) (3,2) w=.9 (3,2) (3,2) (3,1) (3,1) w=.4 (2,3) (3,3) (3,3) (3,3) w=.4 (3,3) (3,2) (3,2) (3,2) w=.9 (3,2) (1,2) (1,3) (1,3) w=.1 (1,3) (3,3) (2,3) (2,3) w=.2 (2,3) (3,3) (3,2) (3,2) w=.9 (3,2) (2,3) (2,2) (2,2) w=.4 (3,2) [Demos: ghostbusters particle filtering (L15D3,4,5)] Video of Demo – Moderate Number of Particles Uniform initialization (!) Circular dynamics 13

Video of Demo – One Particle Uniform initialization (ha!) Circular dynamics Video of Demo – Huge Number of Particles Actually looks uniform ! but look closely Circular dynamics 14

CSE 473: Artificial Intelligence Hidden Markov Models Daniel Weld - PDF document

CSE 473: Artificial Intelligence Hidden Markov Models Daniel Weld University of Washington [Many of these slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at

Hidden Markov Models Discrete Markov Processes 1 Hidden Markov Models Hidden Markov Models 2

CSCE 471/871 Lecture 3: Markov Chains Markov Chains and and Hidden Markov Models Hidden

Outline depmixS4: an R-package for hidden Markov models Hidden Markov Models Ingmar Visser 1

Markov chains and Hidden Markov Models 9000 Markov chains and HMMs We will discuss: Markov

Hidden Markov Models Steven J Zeil Old Dominion Univ. Fall 2010 1 Discrete Markov Processes

Hidden Markov Models Pratik Lahiri Introduction A hidden Markov model (HMM) is a

Markov Models Kunsch, H.R., State Space and Hidden Markov Models . ETH- Zurich, Zurich;

1 Real HMM Examples Real HMM Examples Speech recognition HMMs: Machine translation HMMs:

1 Real HMM Examples Real HMM Examples Speech recognition HMMs: Machine translation HMMs:

CSE 473: Artificial Intelligence Hidden Markov Models Daniel Weld University of Washington

CSE 473: Artificial Intelligence Spring 2014 Hidden Markov Models & Particle Filtering

Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University Markov Chains and

Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University 2 Markov Chains

Markov Chains Markov Processes Discrete-time Markov Chains Continuous-time Markov Chains Dr

Artificial Intelligence Artificial Intelligence Artificial Intelligence Study and design of

Markov Chains and Hidden Markov Models COMP 571 - Spring 2015 Luay Nakhleh, Rice University

100 Years of Light Quanta Roy J. Glauber Harvard University Max Planck: October 19, 1900

Advanced Macroeconomics 5. Rational Expectations and Asset Prices Karl Whelan School of

Federation-enabled ticketing system The RT Case EuroCAMP Dubrovnik 14 th - 15 th November 2007

Evaluation of real-time operating systems for use in Integrated Modular Avionics Professor:

Threaded Network Interrupts Steven Rostedt srostedt@redhat.com <rostedt@goodmis.org>

Median Closings Report The Transportation Technical Advisory Committee reviewed the Rt 13

Previous lecture: Introduction to objects and classes T odays lecture: Defining

Foundations of Chemical Kinetics Lecture 12: Transition-state theory: The thermodynamic formalism

CSE 473: Artificial Intelligence Hidden Markov Models Daniel Weld - PDF document

CSE 473: Artificial Intelligence Hidden Markov Models Daniel Weld University of Washington [Many of these slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at

Hidden Markov Models Discrete Markov Processes 1 Hidden Markov Models Hidden Markov Models 2

CSCE 471/871 Lecture 3: Markov Chains Markov Chains and and Hidden Markov Models Hidden

Outline depmixS4: an R-package for hidden Markov models Hidden Markov Models Ingmar Visser 1

Markov chains and Hidden Markov Models 9000 Markov chains and HMMs We will discuss: Markov

Hidden Markov Models Steven J Zeil Old Dominion Univ. Fall 2010 1 Discrete Markov Processes

Hidden Markov Models Pratik Lahiri Introduction A hidden Markov model (HMM) is a

Markov Models Kunsch, H.R., State Space and Hidden Markov Models . ETH- Zurich, Zurich;

1 Real HMM Examples Real HMM Examples Speech recognition HMMs: Machine translation HMMs:

1 Real HMM Examples Real HMM Examples Speech recognition HMMs: Machine translation HMMs:

CSE 473: Artificial Intelligence Hidden Markov Models Daniel Weld University of Washington

CSE 473: Artificial Intelligence Spring 2014 Hidden Markov Models &amp; Particle Filtering

Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University Markov Chains and

Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University 2 Markov Chains

Markov Chains Markov Processes Discrete-time Markov Chains Continuous-time Markov Chains Dr

Artificial Intelligence Artificial Intelligence Artificial Intelligence Study and design of

Markov Chains and Hidden Markov Models COMP 571 - Spring 2015 Luay Nakhleh, Rice University

100 Years of Light Quanta Roy J. Glauber Harvard University Max Planck: October 19, 1900

Advanced Macroeconomics 5. Rational Expectations and Asset Prices Karl Whelan School of

Federation-enabled ticketing system The RT Case EuroCAMP Dubrovnik 14 th - 15 th November 2007

Evaluation of real-time operating systems for use in Integrated Modular Avionics Professor:

Threaded Network Interrupts Steven Rostedt srostedt@redhat.com &lt;rostedt@goodmis.org&gt;

Median Closings Report The Transportation Technical Advisory Committee reviewed the Rt 13

Previous lecture: Introduction to objects and classes T odays lecture: Defining

Foundations of Chemical Kinetics Lecture 12: Transition-state theory: The thermodynamic formalism

CSE 473: Artificial Intelligence Spring 2014 Hidden Markov Models & Particle Filtering

Threaded Network Interrupts Steven Rostedt srostedt@redhat.com <rostedt@goodmis.org>