Temporal probability models



  1. Temporal probability models
     Chapter 15, Sections 1–3
     Artificial Intelligence, spring 2013, Peter Ljunglöf
     Based on AIMA Slides © Stuart Russell and Peter Norvig, 2004

  2. Outline
     ♦ Time and uncertainty
     ♦ Inference: filtering, prediction, smoothing
     ♦ Hidden Markov models

  3. Time and uncertainty
     The world changes; we need to track and predict it.
     The basic idea is to copy the state and evidence variables for each time step:
     X_t = set of unobservable state variables at time t,
       e.g., BloodSugar_t, StomachContents_t, etc.
     E_t = set of observable evidence variables at time t,
       e.g., MeasuredBloodSugar_t, PulseRate_t, FoodEaten_t
     This assumes discrete time; the step size depends on the problem.
     Notation: X_{a:b} = X_a, X_{a+1}, ..., X_{b-1}, X_b
     We want to construct a Bayes net from these variables:
     – what are the parents of X_t and E_t?

  4. Markov chains
     A Markov chain has a single observable state X_t that obeys the Markov
     assumption: X_t depends on a bounded subset of X_{0:t-1}.
     First-order Markov process: P(X_t | X_{0:t-1}) = P(X_t | X_{t-1})
     Second-order Markov process: P(X_t | X_{0:t-1}) = P(X_t | X_{t-2}, X_{t-1})
     (a second-order process can be reduced to first order by using the pair
     ⟨X_{t-2}, X_{t-1}⟩ as the state; see the sketch below)
     [Figure: first-order and second-order chains over X_{t-2}, X_{t-1}, X_t, X_{t+1}, X_{t+2}]
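A short sketch can make the pair-state reduction concrete. This is our illustration, not AIMA code: the table `P2` and the helper below are hypothetical names, where `P2[(a, b)]` is assumed to map each next state c to P(X_t = c | X_{t-2} = a, X_{t-1} = b).

```python
def first_order_transitions(P2):
    """Build a first-order transition table over pair-states from a
    hypothetical second-order table P2 (see assumptions above)."""
    T = {}
    for (a, b), dist in P2.items():
        for c, p in dist.items():
            # The only legal successor of pair-state (a, b) is a pair (b, c):
            # the shared middle component enforces consistency.
            T[((a, b), (b, c))] = p
    return T
```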

  5. Hidden Markov models (HMM)
     An HMM contains a Markov chain X_t which is not observable. Instead we
     observe the evidence variables E_t, and assume that they obey the sensor
     Markov assumption: P(E_t | X_{0:t}, E_{0:t-1}) = P(E_t | X_t)
     Both Markov chains and HMMs are stationary processes:
     – the transition model P(X_t | X_{t-1}) and
     – the sensor model P(E_t | X_t)
     are fixed for all t.

  6. Example (the umbrella world)
     Transition model:  P(R_t = t | R_{t-1} = t) = 0.7,  P(R_t = t | R_{t-1} = f) = 0.3
     Sensor model:      P(U_t = t | R_t = t) = 0.9,      P(U_t = t | R_t = f) = 0.2
     [Figure: DBN over Rain_{t-1}, Rain_t, Rain_{t+1}, each with an Umbrella child]
     Neither the Markov assumption nor the sensor Markov assumption is exactly
     true in the real world! Possible fixes:
     1. Increase the order of the Markov process
     2. Augment the state, e.g., add Temp_t, Pressure_t
     (A code encoding of these parameters follows below.)
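As a running example for the sketches on the following slides, the umbrella-world parameters can be written down directly in Python. The names `P_rain` and `P_umbrella` are ours, not from the AIMA code base:

```python
# Umbrella world parameters, read off the two tables above.
P_rain = {True: 0.7, False: 0.3}      # P(Rain_t = true | Rain_{t-1})
P_umbrella = {True: 0.9, False: 0.2}  # P(Umbrella_t = true | Rain_t)
```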

  7. Inference tasks
     Filtering: P(X_t | e_{1:t})
       computes the current belief state given all evidence;
       a better name would be state estimation
     Prediction: P(X_{t+k} | e_{1:t}) for k > 0
       computes a future belief state given the current evidence
       (it's like filtering, but without all the evidence)
     Smoothing: P(X_k | e_{1:t}) for 0 ≤ k < t
       computes a better estimate of past states
     Most likely explanation: arg max_{x_{1:t}} P(x_{1:t} | e_{1:t})
       computes the state sequence that is most likely, given the evidence;
       applications: speech recognition, decoding with a noisy channel, etc.

  8. Filtering / state estimation
     A useful filtering algorithm needs to maintain a current estimate and update
     it, instead of recalculating everything. I.e., we need a function f such that:
       P(X_{t+1} | e_{1:t+1}) = f(e_{t+1}, P(X_t | e_{1:t}))
     We divide the evidence e_{1:t+1} into e_{1:t} and e_{t+1}:
       P(X_{t+1} | e_{1:t+1}) = P(X_{t+1} | e_{1:t}, e_{t+1})     (divide the evidence)
         = α P(e_{t+1} | X_{t+1}, e_{1:t}) P(X_{t+1} | e_{1:t})   (Bayes' rule)
         = α P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})            (sensor Markov assumption)
       i.e., the sensor model times the one-step prediction.
     We obtain the one-step prediction by conditioning on the current state X_t:
       P(X_{t+1} | e_{1:t}) = Σ_{x_t} P(X_{t+1} | x_t, e_{1:t}) P(x_t | e_{1:t})
         = Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})              (Markov assumption)
       i.e., the transition model times the previous estimate.
     The final equation becomes:
       f_{1:t+1} = P(X_{t+1} | e_{1:t+1})
                 = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
       (sensor model × transition model × previous estimate f_{1:t})
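A minimal sketch of this update for the umbrella world, reusing the `P_rain` and `P_umbrella` tables above. Since the state is boolean, a single probability represents the whole belief state:

```python
def forward(prev, umbrella):
    """One filtering step: prev = P(Rain_t = true | e_{1:t});
    returns P(Rain_{t+1} = true | e_{1:t+1})."""
    # One-step prediction: sum over the current state Rain_t.
    pred = P_rain[True] * prev + P_rain[False] * (1 - prev)
    # Sensor model for the new evidence e_{t+1}.
    s_rain = P_umbrella[True] if umbrella else 1 - P_umbrella[True]
    s_norain = P_umbrella[False] if umbrella else 1 - P_umbrella[False]
    f_true, f_false = s_rain * pred, s_norain * (1 - pred)
    return f_true / (f_true + f_false)  # normalization: the α
```

Starting from the prior P(Rain_0 = true) = 0.5 and observing the umbrella on days 1 and 2 gives ≈ 0.818 and then ≈ 0.883, the same numbers as the umbrella example in AIMA.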

  9. Smoothing
     [Figure: chain X_0, X_1, ..., X_k, ..., X_t with evidence E_1, ..., E_k, ..., E_t]
     Divide the evidence e_{1:t} into e_{1:k} and e_{k+1:t}:
       P(X_k | e_{1:t}) = P(X_k | e_{1:k}, e_{k+1:t})
         = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k, e_{1:k})   (Bayes' rule)
         = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k)            (conditional independence)
         = α f_{1:k} b_{k+1:t}
     The backward message b_{k+1:t} is computed by a backwards recursion:
       P(e_{k+1:t} | X_k) = Σ_{x_{k+1}} P(e_{k+1:t} | X_k, x_{k+1}) P(x_{k+1} | X_k)
         = Σ_{x_{k+1}} P(e_{k+1:t} | x_{k+1}) P(x_{k+1} | X_k)
         = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)
       (sensor model × b_{k+2:t} × transition model)
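A matching sketch of one backward step. Here the message is a pair of unnormalized likelihoods, one per state, with the recursion started from b_{t+1:t} = (1, 1):

```python
def backward(b_next, umbrella):
    """One backward step: b_next = (P(e_{k+2:t} | rain), P(e_{k+2:t} | no rain));
    returns the corresponding pair b_{k+1:t}."""
    b_rain, b_norain = b_next
    # Sensor model P(e_{k+1} | Rain_{k+1}) for the observed umbrella value.
    s_rain = P_umbrella[True] if umbrella else 1 - P_umbrella[True]
    s_norain = P_umbrella[False] if umbrella else 1 - P_umbrella[False]
    # Sum over Rain_{k+1}, weighted by the transition model P(Rain_{k+1} | Rain_k).
    return (s_rain * b_rain * P_rain[True] + s_norain * b_norain * (1 - P_rain[True]),
            s_rain * b_rain * P_rain[False] + s_norain * b_norain * (1 - P_rain[False]))
```

Multiplying f_{1:k} by b_{k+1:t} elementwise and normalizing then yields the smoothed estimate P(X_k | e_{1:t}).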

  10. Forward and backward
      The forward algorithm computes the current belief state.
      The backward algorithm computes a previous belief state.
      Forward–backward algorithm: cache the forward messages along the way;
      they can then be reused when going backward.

  11. Most likely explanation
      The most likely sequence ≠ the sequence of most likely states!
        P(x_{1:t}, X_{t+1} | e_{1:t}, e_{t+1})
          = α P(e_{t+1} | x_{1:t}, X_{t+1}, e_{1:t}) P(x_{1:t}, X_{t+1} | e_{1:t})
          = α P(e_{t+1} | x_{1:t}, X_{t+1}, e_{1:t}) P(X_{t+1} | x_{1:t}, e_{1:t}) P(x_{1:t} | e_{1:t})
          = α P(e_{t+1} | X_{t+1}) P(X_{t+1} | x_t) P(x_{1:t-1}, x_t | e_{1:t})
      The most likely path to each x_{t+1} = the most likely path to some x_t, plus
      one more step. Since we don't care about the exact values, we can drop α:
        m_{1:t+1} = max_{x_{1:t}} P(x_{1:t}, X_{t+1} | e_{1:t}, e_{t+1})
          = P(e_{t+1} | X_{t+1}) max_{x_t} (P(X_{t+1} | x_t) max_{x_{1:t-1}} P(x_{1:t-1}, x_t | e_{1:t}))
          = P(e_{t+1} | X_{t+1}) max_{x_t} (P(X_{t+1} | x_t) m_{1:t})
      m_{1:t} gives the probability of the most likely path to each state x_t; the
      recursion above is the Viterbi algorithm.
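A minimal sketch of one Viterbi step for the umbrella world. The message representation and the back-pointers are our choices; the pointers record, for each new state, whether the best predecessor was rain:

```python
def viterbi_step(m, umbrella):
    """m = (m_rain, m_norain): probability of the most likely path ending in
    each state. Returns the updated message and back-pointers (True = the
    best predecessor was rain)."""
    m_rain, m_norain = m
    # max over x_t of P(X_{t+1} | x_t) * m_{1:t}, for each value of X_{t+1}.
    rain_from_rain, rain_from_norain = P_rain[True] * m_rain, P_rain[False] * m_norain
    norain_from_rain = (1 - P_rain[True]) * m_rain
    norain_from_norain = (1 - P_rain[False]) * m_norain
    # Weight by the sensor model for the new evidence.
    s_rain = P_umbrella[True] if umbrella else 1 - P_umbrella[True]
    s_norain = P_umbrella[False] if umbrella else 1 - P_umbrella[False]
    new_m = (s_rain * max(rain_from_rain, rain_from_norain),
             s_norain * max(norain_from_rain, norain_from_norain))
    pointers = (rain_from_rain >= rain_from_norain,
                norain_from_rain >= norain_from_norain)
    return new_m, pointers
```

The most likely sequence is recovered by taking the argmax of the final message and following the stored pointers backwards.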

  12. Hidden Markov models
      X_t is a single, discrete variable (and usually E_t is too).
      Assume that the domain of X_t is {1, ..., S}.
      Transition matrix: T_{ij} = P(X_t = j | X_{t-1} = i),
        e.g., the rain matrix  T = [ 0.7  0.3 ]
                                   [ 0.3  0.7 ]
      Sensor matrix O_t for each time step t: diagonal, with elements P(e_t | X_t = i),
        e.g., with U_1 = true,  O_1 = [ 0.9  0   ]
                                      [ 0    0.2 ]
      Forward and backward messages can now be represented as column vectors:
        f_{1:t+1} = α O_{t+1} T^⊤ f_{1:t}
        b_{k+1:t} = T O_{k+1} b_{k+2:t}
      The forward–backward algorithm needs O(S²t) time and O(St) space.
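The matrix recursions translate almost verbatim into NumPy. A sketch, assuming the umbrella-world numbers from slide 6 (state index 0 = rain, 1 = no rain):

```python
import numpy as np

T = np.array([[0.7, 0.3],
              [0.3, 0.7]])      # T[i, j] = P(X_t = j | X_{t-1} = i)

def O(umbrella):
    """Diagonal sensor matrix O_t for the observed umbrella value."""
    p = np.array([0.9, 0.2])    # P(U_t = true | X_t = i)
    return np.diag(p if umbrella else 1 - p)

def forward_step(f, umbrella):
    """f_{1:t+1} = α O_{t+1} T^⊤ f_{1:t}"""
    f = O(umbrella) @ T.T @ f
    return f / f.sum()          # the normalization α

def backward_step(b, umbrella):
    """b_{k+1:t} = T O_{k+1} b_{k+2:t}"""
    return T @ O(umbrella) @ b
```

For example, forward_step(np.array([0.5, 0.5]), True) returns ≈ [0.818, 0.182], matching the scalar computation on slide 8.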

  13. Summary for HMMs
      Temporal models use state variables X_t and sensor variables E_t replicated
      over time. To make the models tractable, we introduce simplifying assumptions:
      – Markov assumption: P(X_t | X_{0:t-1}) = P(X_t | X_{t-1})
      – sensor assumption: P(E_t | X_{0:t}, E_{0:t-1}) = P(E_t | X_t)
      – stationarity: P(X_t | X_{t-1}) = P(X_{t'} | X_{t'-1}) and P(E_t | X_t) = P(E_{t'} | X_{t'})
      With these assumptions we only need the following models:
      – the transition model P(X_t | X_{t-1})
      – the sensor model P(E_t | X_t)
      Possible computing tasks:
      – filtering/state estimation, prediction, smoothing, most likely sequence
      – all can be done with constant cost per time step

