Module 4: Markov Processes
CS 886: Sequential Decision Making and Reinforcement Learning
University of Waterloo
CS886 (c) 2013 Pascal Poupart
Sequential Decision Making
- In general: exponentially large decision tree
[Figure: decision tree rooted at s1. Action a from s1 leads to s2 (prob .9) or s3 (prob .1); action b leads to s12 (prob .2) or s13 (prob .8). Each successor state again branches on actions a and b with stochastic outcomes (states s4–s21), so the tree grows exponentially with the horizon.]
Common Properties
- Processes are rarely arbitrary
- They often exhibit some structure:
– The laws of the process do not change over time
– A short history is sufficient to predict the future
- Example: weather prediction
– The same model can be used every day to predict the weather
– Weather measurements from the past few days are sufficient to predict the weather
Stochastic Process
- Definition
– Set of states: S
– Stochastic dynamics: Pr(s_t | s_{t-1}, …, s_0)

[Figure: chain s0 → s1 → s2 → s3 → s4]
Stochastic Process
- Problem:
– Infinitely large conditional probability tables
- Solutions:
– Stationary process: dynamics do not change over time
– Markov assumption: current state depends only on a finite history of past states
k-order Markov Process
- Assumption: the last k states are sufficient
- First-order Markov process
– Pr(s_t | s_{t-1}, …, s_0) = Pr(s_t | s_{t-1})
- Second-order Markov process
– Pr(s_t | s_{t-1}, …, s_0) = Pr(s_t | s_{t-1}, s_{t-2})

[Figure: two chains over states s0 … s4, one showing first-order dependencies (each state depends on its predecessor) and one showing second-order dependencies (each state depends on its two predecessors)]
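The idea behind the k-order definition can be sketched in code: any k-order process becomes first-order once the state is redefined as the last k states. A minimal sketch for k = 2 with binary states (the transition probabilities are illustrative, not from the slides):

```python
# Second-order dynamics Pr(s_t | s_{t-2}, s_{t-1}), keyed by the
# history (s_{t-2}, s_{t-1}) in chronological order (illustrative numbers).
P2 = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.5, 1: 0.5},
    (1, 0): {0: 0.3, 1: 0.7},
    (1, 1): {0: 0.2, 1: 0.8},
}

# Equivalent first-order dynamics over pair-states x_t = (s_{t-1}, s_t):
# from pair (a, b) we can only move to pairs (b, c), with probability
# Pr(c | a, b) taken from the second-order table.
P1 = {
    (a, b): {(b, c): p for c, p in P2[(a, b)].items()}
    for (a, b) in P2
}
```

Each row of P1 is still a proper distribution, and transitions are only possible between overlapping pairs: exactly the first-order encoding of the second-order process.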
Markov Process
- By default, a Markov Process refers to a
– First-order process: Pr(s_t | s_{t-1}, s_{t-2}, …, s_0) = Pr(s_t | s_{t-1}) ∀t
– Stationary process: Pr(s_t | s_{t-1}) = Pr(s_{t'} | s_{t'-1}) ∀t, t'
- Advantage: the entire process can be specified with a single concise conditional distribution Pr(s' | s)
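Because a stationary first-order process is fully determined by Pr(s' | s), it can be written down as a single table and simulated directly. A minimal sketch with a hypothetical two-state weather chain (states and probabilities are illustrative, not from the slides):

```python
import random

# The single conditional distribution Pr(s' | s) that specifies the
# whole process (illustrative numbers).
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state, rng):
    """Sample the next state from Pr(s' | s) by inverse-CDF sampling."""
    r = rng.random()
    cum = 0.0
    for nxt, p in P[state].items():
        cum += p
        if r < cum:
            return nxt
    return nxt  # guard against floating-point round-off

rng = random.Random(0)  # fixed seed for reproducibility
trajectory = ["sunny"]
for _ in range(5):
    trajectory.append(step(trajectory[-1], rng))
print(trajectory)
```

The same table is reused at every step, which is exactly what stationarity buys: one |S| × |S| table instead of one table per time step.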
Examples
- Robotic control
– States: (x, y, z, θ) coordinates of the joints
– Dynamics: constant motion
- Inventory management
– States: inventory level
– Dynamics: constant (stochastic) demand
Non-Markovian and/or non-stationary processes
- What if the process is not Markovian and/or not stationary?
- Solution: add new state components until the dynamics are Markovian and stationary
– Robotics: the dynamics of (x, y, z, θ) are not stationary when velocity varies…
– Solution: add velocity to the state description, e.g. (x, y, z, θ, ẋ, ẏ, ż, θ̇)
– If velocity varies… then also add acceleration
– Where do we stop?
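The augmentation trick can be made concrete with the constant-velocity case: position alone is not a Markovian state (predicting the next position needs the previous two positions), while the pair (position, velocity) is. A minimal 1-D sketch with illustrative numbers:

```python
dt = 1.0  # time step (illustrative)

def step(state):
    """Dynamics on the augmented state (x, v): constant velocity."""
    x, v = state
    return (x + v * dt, v)

s = (0.0, 2.0)  # start at x = 0 with velocity 2
positions = []
for _ in range(4):
    positions.append(s[0])
    s = step(s)

# From positions alone, the process is second-order: recovering the
# velocity needs two consecutive positions, v = (x_t - x_{t-1}) / dt.
v_recovered = (positions[1] - positions[0]) / dt
```

With (x, v) as the state, the dynamics are a fixed function of the current state only, i.e. first-order Markovian and stationary.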
Markovian Stationary Process
- Problem: adding components to the state
description to force a process to be Markovian and stationary may significantly increase computational complexity
- Solution: try to find the smallest state
description that is self-sufficient (i.e., Markovian and stationary)
Inference in Markov processes
- Common task:
– Prediction: Pr(s_{t+k} | s_t)
- Computation:
– Pr(s_{t+k} | s_t) = Σ_{s_{t+1} … s_{t+k-1}} Π_{j=1}^{k} Pr(s_{t+j} | s_{t+j-1})
- Matrix operations:
– Let T be a |S| × |S| matrix representing Pr(s_{t+1} | s_t)
– Then Pr(s_{t+k} | s_t) is given by the corresponding entry of T^k
– Complexity: O(k |S|^2)
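The prediction formula is just repeated application of the transition matrix: for a known start state, k vector-matrix products suffice, each costing O(|S|^2), which matches the stated O(k |S|^2) complexity. A minimal sketch with an illustrative 2-state chain:

```python
# Row-stochastic transition matrix: T[i][j] = Pr(s' = j | s = i)
# (illustrative numbers).
T = [
    [0.8, 0.2],
    [0.4, 0.6],
]

def predict(T, start, k):
    """Distribution over s_{t+k} given s_t = start, computed with k
    vector-matrix products (O(k |S|^2) total)."""
    n = len(T)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(k):
        dist = [sum(dist[i] * T[i][j] for i in range(n))
                for j in range(n)]
    return dist

print(predict(T, 0, 3))  # three-step prediction from state 0
```

Propagating a distribution vector avoids forming T^k explicitly, which would cost O(k |S|^3) with naive matrix-matrix products.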
Decision Making
- Predictions by themselves are useless
- They are only useful when they influence future decisions
- Hence the ultimate task is decision making
- How can we influence the process to visit
desirable states?
- Model: Markov Decision Process