
Module 4: Markov Processes (CS 886 Sequential Decision Making and Reinforcement Learning)


  1. Module 4: Markov Processes
     CS 886 Sequential Decision Making and Reinforcement Learning
     University of Waterloo

  2. Sequential Decision Making
     • In general: exponentially large decision tree
     [Figure: decision tree rooted at s1. Each state offers actions a and b, and each action leads with given probabilities to one of two successor states: s2, s3, s12, s13 at the first level, then leaves s4 to s11 and s14 to s21.]
     CS886 (c) 2013 Pascal Poupart
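
The exponential growth can be made concrete with a small count. The sketch below assumes a uniform tree in which every state offers the same number of actions and every action has the same number of possible outcomes, as in the two-action, two-outcome tree on this slide. `count_leaves` is a hypothetical helper name, not something from the course material.

```python
def count_leaves(num_actions: int, outcomes_per_action: int, horizon: int) -> int:
    """Number of leaves in a decision tree with `horizon` decision steps.

    Assumes a uniform tree: every state has `num_actions` actions and
    every action leads to `outcomes_per_action` possible next states.
    """
    return (num_actions * outcomes_per_action) ** horizon


# The tree on the slide: 2 actions, 2 outcomes each, 2 decision steps.
print(count_leaves(2, 2, 2))   # 16 leaves (s4..s11 and s14..s21)
print(count_leaves(2, 2, 10))  # 1048576
```

Ten decision steps already give over a million leaves, which is why the rest of the module looks for structure to exploit.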

  3. Common Properties
     • Processes are rarely arbitrary
     • They often exhibit some structure
       – The laws of the process do not change
       – A short history is sufficient to predict the future
     • Example: weather prediction
       – The same model can be used every day to predict the weather
       – Weather measurements from the past few days are sufficient to predict the weather

  4. Stochastic Process
     • Definition
       – Set of states: $S$
       – Stochastic dynamics: $\Pr(s_t \mid s_{t-1}, \ldots, s_0)$
     [Figure: graphical model over the states s_0, s_1, s_2, s_3, s_4]

  5. Stochastic Process
     • Problem:
       – Infinitely large conditional probability tables
     • Solutions:
       – Stationary process: dynamics do not change over time
       – Markov assumption: current state depends only on a finite history of past states

  6. K-order Markov Process
     • Assumption: the last k states are sufficient
     • First-order Markov process
       – $\Pr(s_t \mid s_{t-1}, \ldots, s_0) = \Pr(s_t \mid s_{t-1})$
       [Figure: graphical model over the states s_0, s_1, s_2, s_3, s_4]
     • Second-order Markov process
       – $\Pr(s_t \mid s_{t-1}, \ldots, s_0) = \Pr(s_t \mid s_{t-1}, s_{t-2})$
       [Figure: graphical model over the states s_0, s_1, s_2, s_3, s_4]
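
To see what the k-order assumption buys, one can count the entries of the conditional probability table. This is a minimal sketch, assuming a finite state set and a table with one row per possible history and one column per next state. `table_entries` and the value of `num_states` are illustrative choices, not taken from the slides.

```python
def table_entries(num_states: int, history_length: int) -> int:
    """Entries in a table for Pr(next state | last `history_length` states).

    One row per possible history (num_states ** history_length rows),
    one column per possible next state.
    """
    return num_states ** history_length * num_states


num_states = 10  # illustrative value, not from the slides
for k in (1, 2, 5):
    print(k, table_entries(num_states, k))
# 1 100
# 2 1000
# 5 1000000
```

The table size is fixed once k is fixed, whereas conditioning on the full history makes it grow with every time step.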

  7. Markov Process
     • By default, a Markov process refers to a
       – First-order process: $\Pr(s_t \mid s_{t-1}, s_{t-2}, \ldots, s_0) = \Pr(s_t \mid s_{t-1}) \;\; \forall t$
       – Stationary process: $\Pr(s_t \mid s_{t-1}) = \Pr(s_{t'} \mid s_{t'-1}) \;\; \forall t'$
     • Advantage: the entire process can be specified with a single concise conditional distribution $\Pr(s' \mid s)$
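
Because a single table Pr(s' | s) specifies the whole process, simulating it takes only a few lines. The following is a sketch under assumptions: the two-state weather model and its probabilities are made up for illustration, and `sample_trajectory` is a hypothetical helper name.

```python
import random

# Hypothetical two-state weather model: transitions[s][s2] = Pr(s2 | s).
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def sample_trajectory(transitions, start, num_steps, rng):
    """Sample s_0, ..., s_num_steps from a stationary first-order process."""
    trajectory = [start]
    for _ in range(num_steps):
        current = trajectory[-1]
        next_states = list(transitions[current])
        weights = [transitions[current][s] for s in next_states]
        # The same table is used at every step (stationarity), and only
        # the current state is consulted (first-order Markov assumption).
        trajectory.append(rng.choices(next_states, weights=weights)[0])
    return trajectory


print(sample_trajectory(transitions, "sunny", 5, random.Random(0)))
```

Passing in a seeded `random.Random` keeps the run reproducible without touching the global random state.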

  8. Examples
     • Robotic control
       – States: $x, y, z, \theta$ coordinates of joints
       – Dynamics: constant motion
     • Inventory management
       – States: inventory level
       – Dynamics: constant (stochastic) demand

  9. Non-Markovian and/or Non-stationary Processes
     • What if the process is not Markovian and/or not stationary?
     • Solution: add new state components until the dynamics are Markovian and stationary
       – Robotics: the dynamics of $x, y, z, \theta$ are not stationary when velocity varies…
       – Solution: add velocity to the state description, e.g. $x, y, z, \theta, \dot{x}, \dot{y}, \dot{z}, \dot{\theta}$
       – If velocity varies… then add acceleration
       – Where do we stop?
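
The same idea of adding state components can be illustrated on a discrete process. The sketch below assumes a hypothetical second-order model over two states and folds the previous state into the state description, giving a first-order model over pairs. The probabilities and the name `augment` are invented for illustration and do not come from the slides.

```python
from itertools import product

# Hypothetical second-order model over two states:
# second_order[(s_prev2, s_prev1)][s_next] = Pr(s_next | s_prev1, s_prev2).
states = ["a", "b"]
second_order = {
    ("a", "a"): {"a": 0.9, "b": 0.1},
    ("a", "b"): {"a": 0.5, "b": 0.5},
    ("b", "a"): {"a": 0.3, "b": 0.7},
    ("b", "b"): {"a": 0.2, "b": 0.8},
}


def augment(states, second_order):
    """Build a first-order model whose states are pairs (s_{t-1}, s_t)."""
    first_order = {}
    for pair in product(states, repeat=2):
        _older, newer = pair
        # From (older, newer) the only reachable pairs are (newer, s_next).
        first_order[pair] = {
            (newer, s_next): second_order[pair][s_next] for s_next in states
        }
    return first_order


augmented = augment(states, second_order)
print(augmented[("a", "b")])  # {('b', 'a'): 0.5, ('b', 'b'): 0.5}
```

The augmented process is first-order, but its state space has |S|^2 elements instead of |S|, which is the cost that the question "Where do we stop?" points at.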

  10. Markovian Stationary Process
      • Problem: adding components to the state description to force a process to be Markovian and stationary may significantly increase computational complexity
      • Solution: try to find the smallest state description that is self-sufficient (i.e., Markovian and stationary)

  11. Inference in Markov Processes
      • Common task:
        – Prediction: $\Pr(s_{t+k} \mid s_t)$
      • Computation:
        – $\Pr(s_{t+k} \mid s_t) = \sum_{s_{t+1}, \ldots, s_{t+k-1}} \prod_{i=1}^{k} \Pr(s_{t+i} \mid s_{t+i-1})$
      • Matrix operations:
        – Let $T$ be a $|S| \times |S|$ matrix representing $\Pr(s_{t+1} \mid s_t)$
        – Then $\Pr(s_{t+k} \mid s_t) = T^k$
        – Complexity: $O(k|S|^2)$ (propagating the distribution from a given $s_t$ with $k$ vector-matrix products)
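
The prediction task can be sketched in plain Python by pushing a distribution through the transition matrix k times. This is a minimal sketch, assuming states are numbered 0 to |S|-1 and the matrix is stored as a list of rows. The function name `predict` and the example numbers are illustrative, not from the slides.

```python
def predict(transition, start_state, k):
    """Return the distribution Pr(s_{t+k} | s_t = start_state) as a list.

    transition[i][j] = Pr(s_{t+1} = j | s_t = i), with states numbered
    0..|S|-1. Uses k vector-matrix products, i.e. O(k |S|^2) time.
    """
    n = len(transition)
    dist = [0.0] * n
    dist[start_state] = 1.0  # all probability mass on the known state
    for _ in range(k):
        dist = [
            sum(dist[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return dist


# Hypothetical two-state chain (numbers chosen for illustration).
T = [[0.9, 0.1],
     [0.5, 0.5]]
print(predict(T, start_state=0, k=2))  # approximately [0.86, 0.14]
```

The result is the row of T^k that corresponds to the starting state. Computing it through a vector avoids ever forming the full matrix power.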

  12. Decision Making
      • Predictions by themselves are useless
      • They are only useful when they will influence future decisions
      • Hence the ultimate task is decision making
      • How can we influence the process to visit desirable states?
      • Model: Markov Decision Process
