

SLIDE 1

Markov Chains and Hidden Markov Models

CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2019

Soleymani

Slides are based on Klein and Abbeel, CS188, UC Berkeley.

SLIDE 2

Reasoning over Time or Space

- Often, we want to reason about a sequence of observations
  - Speech recognition
  - Robot localization
  - User attention
  - Medical monitoring
- Need to introduce time (or space) into our models

SLIDE 3

Markov Models

- Value of X at a given time is called the state
- Parameters, called transition probabilities or dynamics, specify how the state evolves over time (also, initial state probabilities)
- Stationarity assumption: transition probabilities are the same at all times
- Same as an MDP transition model, but no choice of action

X1 → X2 → X3 → X4

SLIDE 4

Joint Distribution of a Markov Model

X1 → X2 → X3 → X4

- Joint distribution:

  P(X1, X2, X3, X4) = P(X1) P(X2|X1) P(X3|X2) P(X4|X3)

- More generally:

  P(X1, X2, …, XT) = P(X1) P(X2|X1) P(X3|X2) … P(XT|XT−1)
                   = P(X1) ∏_{t=2}^{T} P(Xt|Xt−1)
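As a quick illustration, here is a minimal Python sketch of this product, assuming the sun/rain weather chain introduced a few slides below (the dict names init and trans are ours, not from the slides):

```python
# Minimal sketch: joint probability of a state sequence under a Markov model.
init = {"sun": 1.0, "rain": 0.0}                         # P(X1)
trans = {("sun", "sun"): 0.9, ("sun", "rain"): 0.1,      # P(Xt | Xt-1)
         ("rain", "sun"): 0.3, ("rain", "rain"): 0.7}

def joint(states):
    """P(x1, ..., xT) = P(x1) * prod_{t=2..T} P(xt | xt-1)."""
    p = init[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[(prev, cur)]
    return p

print(joint(["sun", "sun", "rain", "rain"]))  # 1.0 * 0.9 * 0.1 * 0.7 = 0.063
```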

SLIDE 5

Chain Rule and Markov Models

X1 → X2 → X3 → X4

- From the chain rule, every joint distribution over X1, X2, …, XT can be written as:

  P(X1, X2, …, XT) = P(X1) ∏_{t=2}^{T} P(Xt | X1, X2, …, Xt−1)

- Assuming that for all t:

  Xt ⊥⊥ X1, …, Xt−2 | Xt−1

  gives us the expression posited on the earlier slide:

  P(X1, X2, …, XT) = P(X1) ∏_{t=2}^{T} P(Xt | Xt−1)

SLIDE 6

Markov Models

- Explicit assumption for all t:

  Xt ⊥⊥ X1, …, Xt−2 | Xt−1

- Consequence: the joint distribution can be written as:

  P(X1, X2, …, XT) = P(X1) P(X2|X1) P(X3|X2) … P(XT|XT−1)
                   = P(X1) ∏_{t=2}^{T} P(Xt|Xt−1)

- Implied conditional independencies:
  - Past variables independent of future variables given the present,
    i.e., if t1 < t2 < t3 or t1 > t2 > t3, then: Xt1 ⊥⊥ Xt3 | Xt2
- Additional explicit assumption: P(Xt | Xt−1) is the same for all t

SLIDE 7

Conditional Independence

- Basic conditional independence:
  - Past and future independent given the present
  - Each time step only depends on the previous
  - This is called the (first order) Markov property

- Note that the chain is just a (growable) BN
  - We can always use generic BN reasoning on it if we truncate the chain at a fixed length

SLIDE 8

Example Markov Chain: Weather

- States: X = {rain, sun}

- Initial distribution: 1.0 sun

- CPT P(Xt | Xt−1):

  Xt−1   Xt     P(Xt|Xt−1)
  sun    sun    0.9
  sun    rain   0.1
  rain   sun    0.3
  rain   rain   0.7

Two new ways of representing the same CPT: the table above, and a state diagram (sun→sun 0.9, sun→rain 0.1, rain→sun 0.3, rain→rain 0.7).

SLIDE 9

Example Markov Chain: Weather

- Initial distribution: 1.0 sun
- What is the probability distribution after one step?

  P(X2 = sun) = P(sun|sun)·1.0 + P(sun|rain)·0.0 = 0.9, so ⟨0.9 sun, 0.1 rain⟩

SLIDE 10

Mini-Forward Algorithm

- Question: What's P(X) on some day t?

X1 → X2 → X3 → X4

Forward simulation:

  P(xt) = Σ_{xt−1} P(xt−1, xt)
        = Σ_{xt−1} P(xt | xt−1) P(xt−1)
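A minimal Python sketch of this forward simulation, again assuming the sun/rain chain from the weather example (variable names are ours):

```python
# Mini-forward sketch: push P(X) through the transition model each step.
trans = {("sun", "sun"): 0.9, ("sun", "rain"): 0.1,
         ("rain", "sun"): 0.3, ("rain", "rain"): 0.7}
states = ["sun", "rain"]

p = {"sun": 1.0, "rain": 0.0}          # initial distribution
for t in range(2, 6):
    # P(xt) = sum_{xt-1} P(xt | xt-1) P(xt-1)
    p = {s: sum(trans[(prev, s)] * p[prev] for prev in states) for s in states}
    print(t, p)
# t=2 gives {'sun': 0.9, 'rain': 0.1}; later steps drift toward <0.75, 0.25>.
```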

SLIDE 11

Example Run of Mini-Forward Algorithm

- From initial observation of sun
- From initial observation of rain
- From yet another initial distribution P(X1):

[Figures: distributions P(X1), P(X2), P(X3), P(X4), P(X∞) for each case]

[Demo: L13D1,2]

SLIDE 12

Stationary Distributions

- For most chains:
  - Influence of the initial distribution gets less and less over time.
  - The distribution we end up in is independent of the initial distribution.

- Stationary distribution:
  - The distribution we end up with is called the stationary distribution P∞ of the chain.
  - It satisfies:

    P∞(X) = P∞+1(X) = Σ_x P(X|x) P∞(x)

SLIDE 13

Example: Stationary Distributions

- Question: What's P(X) at time t = infinity?

X1 → X2 → X3 → X4

  Xt−1   Xt     P(Xt|Xt−1)
  sun    sun    0.9
  sun    rain   0.1
  rain   sun    0.3
  rain   rain   0.7

  P∞(sun) = P(sun|sun) P∞(sun) + P(sun|rain) P∞(rain)
  P∞(rain) = P(rain|sun) P∞(sun) + P(rain|rain) P∞(rain)

  P∞(sun) = 0.9 P∞(sun) + 0.3 P∞(rain)
  P∞(rain) = 0.1 P∞(sun) + 0.7 P∞(rain)

  P∞(sun) = 3 P∞(rain)
  P∞(rain) = 1/3 P∞(sun)

  Also: P∞(sun) + P∞(rain) = 1

  P∞(sun) = 3/4, P∞(rain) = 1/4
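One way to check this numerically: solve the stationary equations directly. A sketch using numpy (the matrix layout is our own choice):

```python
import numpy as np

# T[i][j] = P(next = j | current = i), rows/cols ordered (sun, rain).
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Stationarity: p T = p, together with p.sum() == 1. Replace one redundant
# balance equation with the normalization constraint and solve the 2x2 system.
A = np.vstack([(T.T - np.eye(2))[0], np.ones(2)])
b = np.array([0.0, 1.0])
print(np.linalg.solve(A, b))   # [0.75 0.25] -> P∞(sun) = 3/4, P∞(rain) = 1/4
```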

SLIDE 14

Inference in Ghostbusters

- A ghost is in the grid somewhere
- Sensor readings tell how close a square is to the ghost
  - On the ghost: red
  - 1 or 2 away: orange
  - 3 or 4 away: yellow
  - 5+ away: green

- Sensors are noisy, but we know P(Color | Distance)

  P(red | 3)   P(orange | 3)   P(yellow | 3)   P(green | 3)
  0.05         0.15            0.5             0.3

SLIDE 15

Video of Demo Ghostbusters Basic Dynamics

SLIDE 16

Video of Demo Ghostbusters Circular Dynamics

SLIDE 17

Video of Demo Ghostbusters Whirlpool Dynamics

SLIDE 18

Application of Stationary Distribution: Web Link Analysis

- PageRank over a web graph
  - Each web page is a state
  - Initial distribution: uniform over pages
  - Transitions:
    - With prob. c, uniform jump to a random page (dotted lines, not all shown)
    - With prob. 1−c, follow a random outlink (solid lines)

- Stationary distribution
  - Will spend more time on highly reachable pages
  - E.g., many ways to get to the Acrobat Reader download page
  - Somewhat robust to link spam
  - Google 1.0 returned the set of pages containing all your keywords in decreasing rank; now all search engines use link analysis along with many other factors (rank actually getting less important over time)
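A small power-iteration sketch of this random-surfer chain (the tiny graph and the value of c are made up for illustration):

```python
# PageRank as the stationary distribution of the random-surfer Markov chain:
# with prob. c jump to a uniformly random page, else follow a random outlink.
def pagerank(links, c=0.15, iters=100):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}        # initial distribution: uniform
    for _ in range(iters):
        new = {p: c / n for p in pages}       # mass from the uniform jump
        for p in pages:
            for q in links[p]:                # follow a random outlink from p
                new[q] += (1 - c) * rank[p] / len(links[p])
        rank = new
    return rank

links = {"A": ["B"], "B": ["A", "C"], "C": ["B", "C"]}   # hypothetical graph
print(pagerank(links))   # highly reachable pages accumulate more mass
```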

SLIDE 19

Hidden Markov Models

SLIDE 20

Hidden Markov Models

- Markov chains not so useful for most agents
  - Need observations to update your beliefs

- Hidden Markov models (HMMs)
  - Underlying Markov chain over states X
  - You observe outputs (effects) at each time step

X1 → X2 → X3 → X4 → X5, with an observed output Et below each Xt (E1, …, E5)

SLIDE 21

Example: Weather HMM

Rain(t−1) → Rain(t) → Rain(t+1), emitting Umbrella(t−1), Umbrella(t), Umbrella(t+1)

  Rt   Rt+1   P(Rt+1|Rt)
  +r   +r     0.7
  +r   -r     0.3
  -r   +r     0.3
  -r   -r     0.7

  Rt   Ut   P(Ut|Rt)
  +r   +u   0.9
  +r   -u   0.1
  -r   +u   0.2
  -r   -u   0.8

- An HMM is defined by:
  - Initial distribution: P(X1)
  - Transitions: P(Xt | Xt−1)
  - Emissions: P(Et | Xt)

SLIDE 22

HMM: probabilistic model

- Transition probabilities: transition probabilities between states

  A_jk ≡ P(X_t = k | X_{t−1} = j)

- Initial state distribution: start probabilities in the different states

  π_j ≡ P(X_1 = j)

- Observation model: emission probabilities associated with each state

  P(E_t | X_t)

SLIDE 23

Joint Distribution of an HMM

X1 → X2 → X3 → X4 → X5, with emissions E1, …, E5

- Joint distribution:

  P(X1, E1, X2, E2, X3, E3) = P(X1) P(E1|X1) P(X2|X1) P(E2|X2) P(X3|X2) P(E3|X3)

- More generally:

  P(X1, E1, …, XT, ET) = P(X1) P(E1|X1) ∏_{t=2}^{T} P(Xt|Xt−1) P(Et|Xt)
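In the same spirit as the earlier Markov-chain sketch, a minimal Python version of this product for the umbrella HMM (the 0.5/0.5 prior is an assumption; dict names are ours):

```python
# Sketch: P(x1, e1, ..., xT, eT) = P(x1) P(e1|x1) * prod_t P(xt|xt-1) P(et|xt).
init = {"+r": 0.5, "-r": 0.5}                            # assumed P(X1)
trans = {("+r", "+r"): 0.7, ("+r", "-r"): 0.3,
         ("-r", "+r"): 0.3, ("-r", "-r"): 0.7}           # P(Xt | Xt-1)
emit = {("+r", "+u"): 0.9, ("+r", "-u"): 0.1,
        ("-r", "+u"): 0.2, ("-r", "-u"): 0.8}            # P(Et | Xt)

def joint(xs, es):
    p = init[xs[0]] * emit[(xs[0], es[0])]
    for t in range(1, len(xs)):
        p *= trans[(xs[t - 1], xs[t])] * emit[(xs[t], es[t])]
    return p

print(joint(["+r", "+r"], ["+u", "+u"]))   # 0.5 * 0.9 * 0.7 * 0.9 = 0.2835
```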

SLIDE 24

Chain Rule and HMMs

X1 → X2 → X3, with emissions E1, E2, E3

- From the chain rule, every joint distribution over X1, E1, X2, E2, X3, E3 can be written as:

  P(X1, E1, X2, E2, X3, E3) = P(X1) P(E1|X1) P(X2|X1, E1) P(E2|X1, E1, X2)
                              P(X3|X1, E1, X2, E2) P(E3|X1, E1, X2, E2, X3)

- Assuming that

  X2 ⊥⊥ E1 | X1,   E2 ⊥⊥ X1, E1 | X2,   X3 ⊥⊥ X1, E1, E2 | X2,   E3 ⊥⊥ X1, E1, X2, E2 | X3

  gives us the expression posited on the previous slide:

  P(X1, E1, X2, E2, X3, E3) = P(X1) P(E1|X1) P(X2|X1) P(E2|X2) P(X3|X2) P(E3|X3)

SLIDE 25

Conditional Independencies

X1 → X2 → X3, with emissions E1, E2, E3

- State independent of all past states and all past evidence given the previous state, i.e.:

  Xt ⊥⊥ X1, E1, …, Xt−2, Et−2, Et−1 | Xt−1

- Evidence is independent of all past states and all past evidence given the current state, i.e.:

  Et ⊥⊥ X1, E1, …, Xt−2, Et−2, Xt−1, Et−1 | Xt

SLIDE 26

Conditional Independence

- HMMs have two important independence properties:
  - Markov hidden process: future depends on past via the present
  - Current observation independent of all else given current state

- Quiz: does this mean that evidence variables are guaranteed to be independent?
  - [No, they tend to be correlated by the hidden state]

X1 → X2 → X3 → X4 → X5, with emissions E1, …, E5

SLIDE 27

Example: Ghostbusters HMM

- P(X1) = uniform
- P(X|X’) = usually move clockwise, but sometimes move in a random direction or stay in place
- P(Rij|X) = same sensor model as before: red means close, green means far away.

[Figures: P(X1) as a uniform 1/9 grid; P(X|X’=<1,2>) with mass 1/2 clockwise and 1/6 elsewhere; chain X1 → … → X5 with readings Ri,j]

SLIDE 28

Video of Demo Ghostbusters – Circular Dynamics -- HMM

SLIDE 29

Filtering / Monitoring

- Filtering, or monitoring, is the task of tracking the distribution Bt(X) = Pt(Xt | e1, …, et) (the belief state) over time
- We start with B1(X) in an initial setting, usually uniform
- As time passes, or we get observations, we update B(X)
- The Kalman filter was invented in the 1960s and first implemented as a method of trajectory estimation for the Apollo program

SLIDE 30

Example: Robot Localization

t=0

Sensor model: can read in which directions there is a wall, never more than 1 mistake
Motion model: may not execute action with small prob.

Example from Michael Pfeiffer

SLIDE 31

Example: Robot Localization

t=1

Lighter grey: was possible to get the reading, but less likely b/c required 1 mistake

SLIDE 32

Example: Robot Localization

t=2

SLIDE 33

Example: Robot Localization

t=3

SLIDE 34

Example: Robot Localization

t=4

SLIDE 35

Example: Robot Localization

t=5

SLIDE 36

Inference: Base Cases

[Diagrams: evidence base case X1 with observation E1; passage-of-time base case X1 → X2]

SLIDE 37

The Forward Algorithm

- We are given evidence at each time and want to know the belief B(Xt) = P(Xt | e1:t)
- We can derive the following update (in unnormalized form):

  P(xt, e1:t) = P(et | xt) Σ_{xt−1} P(xt | xt−1) P(xt−1, e1:t−1)

We can normalize as we go if we want to have P(x|e) at each time step, or just once at the end…
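A sketch of the update as code, normalizing at every step (the elapse/observe split mirrors the next two slides; the function and dict layout are our own):

```python
# Forward algorithm sketch: elapse time, then observe, then normalize.
def forward(prior, trans, emit, evidence, states):
    """Return the beliefs P(Xt | e1:t) for t = 1..T (HMM given as dicts)."""
    b = dict(prior)                  # belief before the first observation
    beliefs = []
    for e in evidence:
        # Elapse time: B'(xt) = sum_{xt-1} P(xt | xt-1) B(xt-1)
        b = {x: sum(trans[(xp, x)] * b[xp] for xp in states) for x in states}
        # Observe: B(xt) ∝ P(et | xt) B'(xt)
        b = {x: emit[(x, e)] * b[x] for x in states}
        z = sum(b.values())          # normalize (optional until the end)
        b = {x: p / z for x, p in b.items()}
        beliefs.append(b)
    return beliefs
```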

SLIDE 38

Passage of Time

- Assume we have current belief P(X | evidence to date)
- Then, after one time step passes:

  P(Xt+1 | e1:t) = Σ_{xt} P(Xt+1, xt | e1:t)
                 = Σ_{xt} P(Xt+1 | xt, e1:t) P(xt | e1:t)
                 = Σ_{xt} P(Xt+1 | xt) P(xt | e1:t)

- Or compactly:

  B′(Xt+1) = Σ_{xt} P(Xt+1 | xt) B(xt)

- Basic idea: beliefs get “pushed” through the transitions
- With the “B” notation, we have to be careful about what time step t the belief is about, and what evidence it includes

X1 → X2

SLIDE 39

Example: Passage of Time

- As time passes, uncertainty “accumulates”

[Figures: belief grids at T = 1, T = 2, T = 5]

(Transition model: ghosts usually go clockwise)

SLIDE 40

Observation

X1 with observation E1

- Assume we have current belief P(X | previous evidence):

  B′(Xt+1) = P(Xt+1 | e1:t)

- Then, after evidence comes in:

  P(Xt+1 | e1:t+1) = P(Xt+1, et+1 | e1:t) / P(et+1 | e1:t)
                   ∝ P(Xt+1, et+1 | e1:t)
                   = P(et+1 | e1:t, Xt+1) P(Xt+1 | e1:t)
                   = P(et+1 | Xt+1) P(Xt+1 | e1:t)

- Or, compactly:

  B(Xt+1) ∝ P(et+1 | Xt+1) B′(Xt+1)

- Basic idea: beliefs “reweighted” by likelihood of evidence
- Unlike passage of time, we have to renormalize

SLIDE 41

Example: Observation

- As we get observations, beliefs get reweighted, uncertainty “decreases”

[Figures: belief grids before observation vs. after observation]

SLIDE 42

Example: Weather HMM

Rain0 → Rain1 → Rain2, with Umbrella1, Umbrella2 observed

  Rt   Rt+1   P(Rt+1|Rt)
  +r   +r     0.7
  +r   -r     0.3
  -r   +r     0.3
  -r   -r     0.7

  Rt   Ut   P(Ut|Rt)
  +r   +u   0.9
  +r   -u   0.1
  -r   +u   0.2
  -r   -u   0.8

Belief updates (umbrella observed on both days):

  B(+r) = 0.5,    B(-r) = 0.5      (prior on Rain0)
  B’(+r) = 0.5,   B’(-r) = 0.5     (elapse time)
  B(+r) = 0.818,  B(-r) = 0.182    (observe +u)
  B’(+r) = 0.627, B’(-r) = 0.373   (elapse time)
  B(+r) = 0.883,  B(-r) = 0.117    (observe +u)
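The numbers above can be reproduced with the forward sketch from Slide 37, assuming that sketch is in scope (dict names are illustrative):

```python
prior = {"+r": 0.5, "-r": 0.5}                 # B(Rain0)
trans = {("+r", "+r"): 0.7, ("+r", "-r"): 0.3,
         ("-r", "+r"): 0.3, ("-r", "-r"): 0.7}
emit = {("+r", "+u"): 0.9, ("+r", "-u"): 0.1,
        ("-r", "+u"): 0.2, ("-r", "-u"): 0.8}

for b in forward(prior, trans, emit, ["+u", "+u"], ["+r", "-r"]):
    print(b)
# step 1: {'+r': 0.818..., '-r': 0.181...}
# step 2: {'+r': 0.883..., '-r': 0.116...}
```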

SLIDE 43

Online Belief Updates

- Every time step, we start with current P(X | evidence)
- We update for time:  X1 → X2
- We update for evidence:  X2 → E2
- The forward algorithm does both at once (and doesn’t normalize)

SLIDE 44

Real HMM Examples

- Speech recognition HMMs:
  - Observations are acoustic signals (continuous valued)
  - States are specific positions in specific words (so, tens of thousands)

- Machine translation HMMs:
  - Observations are words (tens of thousands)
  - States are translation options

- Robot tracking:
  - Observations are range readings (continuous)
  - States are positions on a map (continuous)

SLIDE 45

HMM examples

- Some applications of HMMs
  - Speech recognition, NLP, activity recognition
  - Part-of-speech tagging, e.g.:

    Students/NNP are/VBZ expected/VBN to/TO study/VB

SLIDE 46

Speech State Space

- HMM Specification
  - P(E|X) encodes which acoustic vectors are appropriate for each phoneme (each kind of sound)
  - P(X|X’) encodes how sounds can be strung together

- State Space
  - We will have one state for each sound in each word
  - Mostly, states advance sound by sound
  - Build a little state graph for each word and chain them together to form the state space X

SLIDE 47

Acoustic Feature Sequence

- Time slices are translated into acoustic feature vectors (~39 real numbers per slice)
- These are the observations E; now we need the hidden states X

[Figure: acoustic signal segmented into feature vectors …, e12, e13, e14, e15, e16, …]

SLIDE 48

Decoding

- Finding the words given the acoustics is an HMM inference problem
- Which state sequence x1:T is most likely given the evidence e1:T?
- From the sequence x, we can simply read off the words

SLIDE 49

Forward / Viterbi Algorithms

[Figure: state trellis with sun/rain nodes at each time step]

Forward Algorithm (Sum)      Viterbi Algorithm (Max)

SLIDE 50

Most Likely Explanation

SLIDE 51

HMMs: MLE Queries

- HMMs defined by:
  - States X
  - Observations E
  - Initial distribution: P(X1)
  - Transitions: P(Xt | Xt−1)
  - Emissions: P(Et | Xt)

- New query: most likely explanation:  argmax_{x1:T} P(x1:T | e1:T)
- New method: the Viterbi algorithm

X1 → X2 → X3 → X4 → X5, with emissions E1, …, E5

SLIDE 52

State Trellis

- State trellis: graph of states and transitions over time
- Each arc represents some transition
- Each arc has weight P(xt | xt−1) P(et | xt)
- Each path is a sequence of states
- The product of weights on a path is that sequence’s probability along with the evidence
- Forward algorithm computes sums of paths, Viterbi computes best paths

[Figure: state trellis with sun/rain nodes at each time step]
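A minimal Viterbi sketch over such a trellis: the same recurrence as the forward algorithm with max in place of sum, plus backpointers (the function layout is our own):

```python
def viterbi(prior, trans, emit, evidence, states):
    """Most likely state sequence x1:T given e1:T (HMM given as dicts)."""
    m = {x: prior[x] * emit[(x, evidence[0])] for x in states}   # best scores
    backptrs = []
    for e in evidence[1:]:
        # For each state, remember the best predecessor on the trellis.
        bp = {x: max(states, key=lambda xp: m[xp] * trans[(xp, x)])
              for x in states}
        m = {x: m[bp[x]] * trans[(bp[x], x)] * emit[(x, e)] for x in states}
        backptrs.append(bp)
    # Follow backpointers from the best final state.
    seq = [max(m, key=m.get)]
    for bp in reversed(backptrs):
        seq.append(bp[seq[-1]])
    return seq[::-1]
```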

SLIDE 53

Recap: Filtering

Elapse time: compute P(Xt | e1:t−1)
Observe: compute P(Xt | e1:t)

X1 → X2, with emissions E1, E2

Belief: ⟨P(rain), P(sun)⟩
Prior on X1: ⟨0.5, 0.5⟩ → Observe: ⟨0.82, 0.18⟩ → Elapse time: ⟨0.63, 0.37⟩ → Observe: ⟨0.88, 0.12⟩

[Demo: Ghostbusters Exact Filtering (L15D2)]

SLIDE 54

Particle Filtering

SLIDE 55

Particle Filtering

- Filtering: approximate solution

- Sometimes |X| is too big to use exact inference
  - |X| may be too big to even store B(X)
  - E.g. X is continuous

- Solution: approximate inference
  - Track samples of X, not all values
  - Samples are called particles
  - Time per step is linear in the number of samples
  - But: number needed may be large
  - In memory: list of particles, not states

- This is how robot localization works in practice
- Particle is just a new name for sample

[Figure: grid of approximate probabilities 0.0 0.1 0.0 / 0.0 0.0 0.2 / 0.0 0.2 0.5]

SLIDE 56

Representation: Particles

- Our representation of P(X) is now a list of N particles (samples)
  - Generally, N << |X|
  - Storing a map from X to counts would defeat the point

- P(x) approximated by number of particles with value x
  - So, many x may have P(x) = 0!
  - More particles, more accuracy

- For now, all particles have a weight of 1

Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)

SLIDE 57

Particle Filtering: Elapse Time

- Each particle is moved by sampling its next position from the transition model
  - This is like prior sampling – samples’ frequencies reflect the transition probabilities
  - Here, most samples move clockwise, but some move in another direction or stay in place

- This captures the passage of time
  - If enough samples, close to exact values before and after (consistent)

Particles (before): (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)
Particles (after):  (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2)

SLIDE 58

Particle Filtering: Observe

- Slightly trickier:
  - Don’t sample the observation, fix it (here, the evidence is an observed color reading, e.g. red)
  - Similar to likelihood weighting, downweight samples based on the evidence
  - As before, the probabilities don’t sum to one, since all have been downweighted (in fact they now sum to (N times) an approximation of P(e))

Particles (before):   (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2)
Particles (weighted): (3,2) w=.9  (2,3) w=.2  (3,2) w=.9  (3,1) w=.4  (3,3) w=.4  (3,2) w=.9  (1,3) w=.1  (2,3) w=.2  (3,2) w=.9  (2,2) w=.4

SLIDE 59

Particle Filtering: Resample

- Rather than tracking weighted samples, we resample
- N times, we choose from our weighted sample distribution (i.e. draw with replacement)
- This is equivalent to renormalizing the distribution
- Now the update is complete for this time step; continue with the next one

Particles (weighted): (3,2) w=.9  (2,3) w=.2  (3,2) w=.9  (3,1) w=.4  (3,3) w=.4  (3,2) w=.9  (1,3) w=.1  (2,3) w=.2  (3,2) w=.9  (2,2) w=.4
(New) Particles: (3,2) (2,2) (3,2) (2,3) (3,3) (3,2) (1,3) (2,3) (3,2) (3,2)

SLIDE 60

Particle Filtering: Summary

- Particles: track samples of states rather than an explicit distribution

Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)
Elapse:    (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2)
Weight:    (3,2) w=.9  (2,3) w=.2  (3,2) w=.9  (3,1) w=.4  (3,3) w=.4  (3,2) w=.9  (1,3) w=.1  (2,3) w=.2  (3,2) w=.9  (2,2) w=.4
Resample:  (3,2) (2,2) (3,2) (2,3) (3,3) (3,2) (1,3) (2,3) (3,2) (3,2)

[Demos: ghostbusters particle filtering (L15D3,4,5)]
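A compact sketch of one full update (elapse, weight, resample); sample_transition and evidence_prob are stand-ins for the model’s transition sampler and evidence likelihood, not names from the slides:

```python
import random

def particle_filter_step(particles, evidence, sample_transition, evidence_prob):
    # Elapse time: move each particle by sampling from the transition model.
    particles = [sample_transition(x) for x in particles]
    # Observe: weight each particle by the likelihood of the fixed evidence.
    weights = [evidence_prob(evidence, x) for x in particles]
    # Resample: draw N particles with replacement, proportional to weight.
    return random.choices(particles, weights=weights, k=len(particles))
```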

SLIDE 61

Video of Demo – Moderate Number of Particles

SLIDE 62

Robot Localization

- In robot localization:
  - We know the map, but not the robot’s position
  - Observations may be vectors of range finder readings
  - State space and readings are typically continuous (works basically like a very fine grid), so we cannot store B(X)
  - Particle filtering is a main technique

SLIDE 63

Particle Filter Localization (Sonar)

[Video: global-sonar-uw-annotated.avi]

SLIDE 64

Particle Filter Localization (Laser)

[Video: global-floor.gif]
