SLIDE 1

Module 4 Markov Processes

CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo

SLIDE 2

CS886 (c) 2013 Pascal Poupart

2

Sequential Decision Making

  • In general: exponentially large decision tree

[Figure: a decision tree rooted at s1 with actions a and b at each state; each action branches stochastically to successor states (s2–s21) with transition probabilities such as .9/.1, .2/.8, .5/.5, .6/.4, .7/.3, and .1/.9]

SLIDE 3

Common Properties

  • Processes are rarely arbitrary
  • They often exhibit some structure

– Laws of the process do not change
– A short history is sufficient to predict the future

  • Example: weather prediction

– The same model can be used every day to predict the weather
– Weather measurements from the past few days are sufficient to predict the weather
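The weather example can be sketched as a tiny first-order Markov chain. The two states and their transition probabilities below are illustrative assumptions, not numbers from the slides; the point is that one fixed table (the same model every day) plus only today's state is enough to sample tomorrow's weather.

```python
import random

# Hypothetical two-state weather model; the probabilities are
# illustrative, not taken from the slides.
# Each row is Pr(tomorrow | today) and sums to 1.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_day(today):
    """Sample tomorrow's weather from today's state alone (first-order Markov)."""
    states = list(P[today])
    weights = [P[today][s] for s in states]
    return random.choices(states, weights=weights)[0]

random.seed(0)
week = ["sunny"]
for _ in range(6):
    week.append(next_day(week[-1]))  # same table reused every day: stationary
print(week)
```

Because the same table `P` is reused at every step, the process is stationary; because `next_day` looks only at `week[-1]`, it is first-order Markov.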

SLIDE 4

Stochastic Process

  • Definition

– Set of states: S
– Stochastic dynamics: Pr(s_t | s_{t-1}, …, s_0)

[Figure: a chain of states s0 → s1 → s2 → s3 → s4]

SLIDE 5

Stochastic Process

  • Problem:

– Infinitely large conditional probability tables

  • Solutions:

– Stationary process: the dynamics do not change over time
– Markov assumption: the current state depends only on a finite history of past states

SLIDE 6

K-order Markov Process

  • Assumption: the last k states are sufficient
  • First-order Markov process

– Pr(s_t | s_{t-1}, …, s_0) = Pr(s_t | s_{t-1})

  • Second-order Markov process

– Pr(s_t | s_{t-1}, …, s_0) = Pr(s_t | s_{t-1}, s_{t-2})

[Figure: two chains over s0 … s4; in the first-order chain each state depends on one predecessor, in the second-order chain on two]
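The two definitions above differ only in how much history the sampler conditions on. A minimal sketch, with a generic two-element state set and illustrative (invented) transition tables: the first-order table is indexed by one previous state, the second-order table by the last two.

```python
import random

random.seed(1)
S = [0, 1]  # a generic two-element state set; all numbers are illustrative

# First-order: Pr(s_t | s_{t-1}), one distribution per previous state.
P1 = {0: [0.9, 0.1],
      1: [0.3, 0.7]}

# Second-order: Pr(s_t | s_{t-1}, s_{t-2}), one distribution per pair.
P2 = {(0, 0): [0.9, 0.1], (0, 1): [0.5, 0.5],
      (1, 0): [0.5, 0.5], (1, 1): [0.2, 0.8]}

def sample(dist):
    return random.choices(S, weights=dist)[0]

first = [0]                          # first-order: condition on one past state
for _ in range(10):
    first.append(sample(P1[first[-1]]))

second = [0, 0]                      # second-order: condition on two past states
for _ in range(10):
    second.append(sample(P2[(second[-1], second[-2])]))

print(first)
print(second)
```

Note how the table size grows with the order: k-order needs a distribution for each of |S|^k histories, which is why a short sufficient history matters.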

SLIDE 7

Markov Process

  • By default, a Markov process refers to a

– First-order process: Pr(s_t | s_{t-1}, s_{t-2}, …, s_0) = Pr(s_t | s_{t-1}) ∀t
– Stationary process: Pr(s_t | s_{t-1}) = Pr(s_{t'} | s_{t'-1}) ∀t, t'

  • Advantage: the entire process can be specified with a single concise conditional distribution Pr(s' | s)

SLIDE 8

Examples

  • Robotic control

– States: (x, y, z, θ) coordinates of joints
– Dynamics: constant motion

  • Inventory management

– States: inventory level
– Dynamics: constant (stochastic) demand

SLIDE 9

Non-Markovian and/or non-stationary processes

  • What if the process is not Markovian and/or not stationary?
  • Solution: add new state components until the dynamics are Markovian and stationary

– Robotics: the dynamics of (x, y, z, θ) are not stationary when velocity varies…
– Solution: add velocity to the state description, e.g. (x, y, z, θ, ẋ, ẏ, ż, θ̇)
– If velocity varies… then add acceleration
– Where do we stop?
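The robotics point can be seen in a toy 1-D example (my construction, not from the slides): a point moving under constant acceleration. Position alone is not a Markovian state, because the same position can be followed by different positions depending on the hidden velocity; augmenting the state with velocity makes the dynamics first-order and stationary.

```python
# A 1-D point under constant acceleration (illustrative toy values).
DT, ACCEL = 1.0, 0.5

def step(state):
    """Deterministic first-order dynamics on the augmented state (pos, vel)."""
    pos, vel = state
    return (pos + vel * DT, vel + ACCEL * DT)

state = (0.0, 0.0)
positions = [state[0]]
for _ in range(4):
    state = step(state)
    positions.append(state[0])

print(positions)
# Position 0.0 occurs twice (t=0 and t=1) but is followed by different
# positions (0.0, then 0.5): position alone cannot predict its successor,
# while the augmented (pos, vel) state can.
```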

SLIDE 10

Markovian Stationary Process

  • Problem: adding components to the state description to force a process to be Markovian and stationary may significantly increase computational complexity
  • Solution: try to find the smallest state description that is self-sufficient (i.e., Markovian and stationary)

SLIDE 11

Inference in Markov processes

  • Common task:

– Prediction: Pr(s_{t+k} | s_t)

  • Computation:

– Pr(s_{t+k} | s_t) = Σ_{s_{t+1} … s_{t+k-1}} Π_{i=1}^{k} Pr(s_{t+i} | s_{t+i-1})

  • Matrix operations:

– Let T be a |S| × |S| matrix representing Pr(s_{t+1} | s_t)
– Then Pr(s_{t+k} | s_t) = T^k
– Complexity: O(k |S|^2)
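The matrix computation above can be sketched directly. Starting from a one-hot distribution over s_t and multiplying by T once per step, each vector-matrix product costs O(|S|^2), so a k-step prediction costs O(k |S|^2). The 2×2 transition matrix is an illustrative assumption.

```python
# Illustrative 2-state transition matrix T; row i is Pr(s_{t+1} | s_t = i)
# and each row sums to 1.
T = [[0.9, 0.1],
     [0.2, 0.8]]

def predict(T, s0, k):
    """Return the distribution Pr(s_{t+k} | s_t = s0) via k
    vector-matrix products, O(k |S|^2) total."""
    n = len(T)
    dist = [0.0] * n
    dist[s0] = 1.0                       # one-hot: Pr(s_t = s0) = 1
    for _ in range(k):
        dist = [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]
    return dist

print(predict(T, 0, 1))   # one step: the row of T for state 0
print(predict(T, 0, 3))   # three steps: row 0 of T^3
```

Computing the full matrix power T^k instead costs O(k |S|^3); when only one starting state matters, the repeated vector-matrix form is cheaper.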

SLIDE 12

Decision Making

  • Predictions by themselves are useless
  • They are only useful when they influence future decisions
  • Hence the ultimate task is decision making
  • How can we influence the process to visit desirable states?
  • Model: Markov Decision Process