CMU-Q 15-381 Lecture 15: Predictions in Markov Chains; Markov Decision Processes (PowerPoint PPT Presentation)



SLIDE 1

CMU-Q 15-381

Lecture 15: Predictions in Markov Chains; Markov Decision Processes

Teacher: Gianni A. Di Caro

SLIDE 2

MAKING PREDICTIONS: GENERAL TWO-STATE MC

✓ $p^{(n)} = p^{(0)} T^n$: probability distribution over the states after $n$ steps, given the initial distribution $p^{(0)}$
✓ $p_j^{(n)} = P(X_n = j)$: absolute probability of state $j$ at step $n$, given the initial distribution
§ How do we compute $T^n$? $T = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}$, $0 < \alpha, \beta < 1$
§ Diagonalization: $U^{-1} T U = \Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1-\alpha-\beta \end{pmatrix}$, the eigenvalue matrix
§ $U = (u_1 \; u_2) = \begin{pmatrix} 1 & -\alpha \\ 1 & \beta \end{pmatrix}$, the eigenvector matrix, with $U^{-1} = \frac{1}{\alpha+\beta} \begin{pmatrix} \beta & \alpha \\ -1 & 1 \end{pmatrix}$
§ Pre-multiplying both sides by $U$ and post-multiplying by $U^{-1}$: $T = U \Lambda U^{-1}$
§ $T^2 = (U\Lambda U^{-1})(U\Lambda U^{-1}) = (U\Lambda)(U^{-1}U)(\Lambda U^{-1}) = (U\Lambda)I_2(\Lambda U^{-1}) = U\Lambda^2 U^{-1}$, with $\Lambda^2 = \begin{pmatrix} \lambda_1^2 & 0 \\ 0 & \lambda_2^2 \end{pmatrix}$; in general, $T^n = U \Lambda^n U^{-1}$
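The diagonalization above can be checked numerically; a minimal sketch assuming NumPy, with arbitrary example values for α and β:

```python
import numpy as np

alpha, beta = 0.3, 0.2  # arbitrary example values, 0 < alpha, beta < 1
T = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Eigenvector matrix U (columns u1, u2) and eigenvalue matrix Lambda
U = np.array([[1.0, -alpha],
              [1.0, beta]])
Lam = np.diag([1.0, 1.0 - alpha - beta])

# T = U Lambda U^{-1}, hence T^n = U Lambda^n U^{-1}
n = 7
Tn_eig = U @ np.linalg.matrix_power(Lam, n) @ np.linalg.inv(U)
Tn_pow = np.linalg.matrix_power(T, n)

assert np.allclose(Tn_eig, Tn_pow)
```

Raising the diagonal matrix to the $n$-th power only requires raising the two eigenvalues, which is the whole point of diagonalizing.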

SLIDE 3

GENERAL TWO-STATE MC

§ $T = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}$, $0 < \alpha, \beta < 1$
§ $T^n = U \Lambda^n U^{-1}$, $\Lambda^n = \begin{pmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{pmatrix}$, $U = \begin{pmatrix} 1 & -\alpha \\ 1 & \beta \end{pmatrix}$, $U^{-1} = \frac{1}{\alpha+\beta} \begin{pmatrix} \beta & \alpha \\ -1 & 1 \end{pmatrix}$
§ $T^n = \cdots = \frac{1}{\alpha+\beta} \begin{pmatrix} \beta & \alpha \\ \beta & \alpha \end{pmatrix} + \frac{\lambda^n}{\alpha+\beta} \begin{pmatrix} \alpha & -\alpha \\ -\beta & \beta \end{pmatrix}$, with $\lambda = 1 - \alpha - \beta$
§ $\lambda^n \to 0$ as $n \to \infty$
§ $T^n \to \frac{1}{\alpha+\beta} \begin{pmatrix} \beta & \alpha \\ \beta & \alpha \end{pmatrix} = Q$, the matrix $T^n$ in the limit of large $n$
§ Probability distribution over the states after $n$ steps, given the initial distribution $p^{(0)}$:
§ $p^{(n)} = p^{(0)} T^n = \begin{pmatrix} p_1^{(0)} & p_2^{(0)} \end{pmatrix} T^n \to \begin{pmatrix} p_1^{(0)} & p_2^{(0)} \end{pmatrix} Q = \frac{1}{\alpha+\beta} \begin{pmatrix} \beta p_1^{(0)} + \beta p_2^{(0)} & \alpha p_1^{(0)} + \alpha p_2^{(0)} \end{pmatrix} = \begin{pmatrix} \frac{\beta}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta} \end{pmatrix}$ as $n \to \infty$, given that $p_1^{(0)} + p_2^{(0)} = 1$
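The closed form for $T^n$ above, a constant term plus a term that decays as $\lambda^n$, can be verified against direct matrix powers; a sketch assuming NumPy, with arbitrary α, β:

```python
import numpy as np

alpha, beta = 0.4, 0.1
T = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])
lam = 1 - alpha - beta

for n in (1, 5, 50):
    # Closed form: first term is the limit matrix Q, second term decays as lam^n
    Tn_closed = (np.array([[beta, alpha], [beta, alpha]])
                 + lam**n * np.array([[alpha, -alpha], [-beta, beta]])) / (alpha + beta)
    assert np.allclose(Tn_closed, np.linalg.matrix_power(T, n))

# As n grows, T^n approaches Q, whose rows are both [beta, alpha]/(alpha+beta)
Q = np.array([[beta, alpha], [beta, alpha]]) / (alpha + beta)
assert np.allclose(np.linalg.matrix_power(T, 200), Q)
```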

SLIDE 4

LIMITING DISTRIBUTION FOR GENERAL 2-STATE MC

§ State distribution over the states after $n$ steps, given the initial distribution $p^{(0)}$:
$\lim_{n \to \infty} p^{(0)} T^n = \lim_{n \to \infty} p^{(n)} = \begin{pmatrix} \frac{\beta}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta} \end{pmatrix} = p$
→ The chain has a limiting state probability distribution, denoted here as $p$
§ $p$ is independent of $p^{(0)}$
§ → $p$ is an invariant limiting distribution of the chain: the limit exists and it is invariant with respect to the initial distribution
§ The limiting distribution $p$ is also a stationary distribution: if the chain starts (or arrives) in $p$ as a state probability distribution, it stays in $p$ (i.e., the distribution becomes stationary, it won't change): $pT = p$

$\begin{pmatrix} \frac{\beta}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta} \end{pmatrix} \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix} = \begin{pmatrix} \frac{\beta(1-\alpha)+\alpha\beta}{\alpha+\beta} & \frac{\alpha\beta+\alpha(1-\beta)}{\alpha+\beta} \end{pmatrix} = \begin{pmatrix} \frac{\beta}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta} \end{pmatrix}$
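The two claims on this slide, stationarity ($pT = p$) and invariance with respect to $p^{(0)}$, can both be mirrored in a few lines; a sketch with NumPy and example values of α, β:

```python
import numpy as np

alpha, beta = 0.3, 0.2
T = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Limiting distribution of the general 2-state chain
p = np.array([beta, alpha]) / (alpha + beta)

# Stationarity: applying T leaves p unchanged
assert np.allclose(p @ T, p)

# Invariance: the limit is the same for any initial distribution
for p0 in (np.array([1.0, 0.0]), np.array([0.25, 0.75])):
    assert np.allclose(p0 @ np.linalg.matrix_power(T, 300), p)
```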

SLIDE 5

LONG-TERM BEHAVIOR: LIMITING DISTRIBUTIONS

§ For studying the long-term behavior of a generic MC with one-step transition matrix $T$ and $m$ states, let's consider the limit of the $n$-step conditional transition probabilities, denoted with $Q$:
$\lim_{n \to \infty} p_{ij}^{(n)} = \lim_{n \to \infty} P(X_n = j \mid X_0 = i) = Q_{ij}$
Let's consider three different cases that can arise from the limit:
1) Limiting distribution exists
2) Limiting but no invariant distribution
3) No limiting (but possibly stationary) distribution

$\lim_{n \to \infty} T^n = \lim_{n \to \infty} \begin{pmatrix} p_{11}^{(n)} & p_{12}^{(n)} & \cdots & p_{1m}^{(n)} \\ p_{21}^{(n)} & p_{22}^{(n)} & \cdots & p_{2m}^{(n)} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1}^{(n)} & p_{m2}^{(n)} & \cdots & p_{mm}^{(n)} \end{pmatrix} = \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1m} \\ Q_{21} & Q_{22} & \cdots & Q_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{m1} & Q_{m2} & \cdots & Q_{mm} \end{pmatrix}$

SLIDE 6

LIMITING DISTRIBUTION DOES EXIST

1. Limiting distribution: Let's consider the case when, for all $i, j$:

  • the limit reaches convergence values $Q_{ij}$,
  • and for each $j$ the value $Q_{ij}$ is independent of the initial state $i$,
  • → we can write $Q_{ij}$ as $Q_j$ (i.e., $Q_{ij} = Q_{kj}$, $\forall i, k, j \in S$), and $\sum_{j=1}^{m} Q_j = 1$

$\lim_{n \to \infty} p_{ij}^{(n)} = \lim_{n \to \infty} P(X_n = j \mid X_0 = i) = Q_{ij}$

$\lim_{n \to \infty} T^n = \lim_{n \to \infty} \begin{pmatrix} p_{11}^{(n)} & p_{12}^{(n)} & \cdots & p_{1m}^{(n)} \\ p_{21}^{(n)} & p_{22}^{(n)} & \cdots & p_{2m}^{(n)} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1}^{(n)} & p_{m2}^{(n)} & \cdots & p_{mm}^{(n)} \end{pmatrix} = \begin{pmatrix} Q_1 & Q_2 & \cdots & Q_m \\ Q_1 & Q_2 & \cdots & Q_m \\ \vdots & \vdots & \ddots & \vdots \\ Q_1 & Q_2 & \cdots & Q_m \end{pmatrix}$

SLIDE 7

LIMITING DISTRIBUTION IS INVARIANT

→ The (unconditional) convergence values of the limits for the $n$-step conditional transition probabilities define the limiting distribution of the chain, which is invariant with respect to the initial conditions
§ After the process has been in operation for some long duration, the probability of finding it in state $j$ is $Q_j$, irrespective of the starting state

$\lim_{n \to \infty} p^{(0)} T^n = \begin{pmatrix} p_1^{(0)} & p_2^{(0)} & \cdots & p_m^{(0)} \end{pmatrix} \begin{pmatrix} Q_1 & Q_2 & \cdots & Q_m \\ Q_1 & Q_2 & \cdots & Q_m \\ \vdots & \vdots & \ddots & \vdots \\ Q_1 & Q_2 & \cdots & Q_m \end{pmatrix} = \begin{pmatrix} Q_1 \sum_{i=1}^m p_i^{(0)} & Q_2 \sum_{i=1}^m p_i^{(0)} & \cdots & Q_m \sum_{i=1}^m p_i^{(0)} \end{pmatrix} = \begin{pmatrix} Q_1 & Q_2 & \cdots & Q_m \end{pmatrix} = p$

SLIDE 8

LIMITING ⟹ STATIONARY DISTRIBUTIONS

§ From $p^{(n)} = p^{(n-1)} T$, and $p^{(n)} = p^{(n-1)}$ for $n \to \infty$, → the limiting distribution is the solution of the fixed-point equation: $pT = p$
§ → Because of the above equation, the limiting distribution is always also a stationary distribution: if the chain starts with, or arrives at any step $n$ to, a state probability distribution equal to $p$, the distribution doesn't change anymore
§ $p = pT$ looks similar to an eigenvector equation, $Av = \lambda v$, with eigenvalue $\lambda = 1$
§ By transposing both sides, and writing the transpose of $T$ as $T^\top$: $p^\top = (pT)^\top \Rightarrow T^\top p^\top = p^\top$, which is a "regular" (right) eigenvector equation
§ → The transposed transition matrix $T^\top$ has eigenvectors with eigenvalue 1 that are stationary distributions expressed as column vectors.
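The transposed-matrix eigenvector route can be used directly to read off a stationary distribution; a sketch with NumPy, reusing the earlier 2-state chain as an example:

```python
import numpy as np

alpha, beta = 0.3, 0.2
T = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Right eigenvectors of T^T are left eigenvectors of T
eigvals, eigvecs = np.linalg.eig(T.T)

# Pick the eigenvector for eigenvalue 1 and normalize it to sum to 1
k = np.argmin(np.abs(eigvals - 1.0))
p = np.real(eigvecs[:, k])
p = p / p.sum()

assert np.allclose(p @ T, p)  # stationary: pT = p
assert np.allclose(p, [beta / (alpha + beta), alpha / (alpha + beta)])
```

Normalizing by the sum also fixes the arbitrary sign that `eig` may return for an eigenvector.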

SLIDE 9

EIGENVECTORS AND STATIONARY DISTRIBUTION

§ Therefore, if the eigenvectors of the transposed transition matrix $T^\top$ are known, then so are the stationary distributions of the Markov chain. This can save a lot of computation, avoiding computing powers of $T$!
§ The stationary distribution is a left eigenvector (as opposed to the usual right eigenvectors) of the transition matrix: $p = pT$
§ Note: when there are multiple eigenvectors associated with an eigenvalue of value 1, each such eigenvector gives rise to an associated stationary distribution. However, this can only occur when the Markov chain is reducible, i.e., has multiple communicating classes.

SLIDE 10

STATIONARY DISTRIBUTION

✓ Using $p = pT$ we can easily find the stationary distribution (assuming there is one, and independently from the limiting distribution) either:
✓ by solving the linear equation $p = pT$,
✓ or by using the eigenvectors of the transposed transition matrix $T^\top$
§ For instance, in the case of the general 2-state MC, let $p = \begin{pmatrix} x & 1-x \end{pmatrix}$; then we can solve the matrix equation and find the stationary distribution: $x(1-\alpha) + (1-x)\beta = x$ gives $x = \frac{\beta}{\alpha+\beta}$, i.e., $p = \begin{pmatrix} \frac{\beta}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta} \end{pmatrix}$
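The linear-equation route generalizes to any number of states: replace one redundant equation of $(T^\top - I)\,p^\top = 0$ with the normalization $\sum_i p_i = 1$. A sketch with NumPy:

```python
import numpy as np

alpha, beta = 0.3, 0.2
T = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])
m = T.shape[0]

# p T = p  <=>  (T^T - I) p^T = 0; append the constraint sum(p) = 1
A = np.vstack([T.T - np.eye(m), np.ones(m)])
b = np.zeros(m + 1)
b[-1] = 1.0

# Least-squares solve of the (consistent) overdetermined system
p, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(p @ T, p)
assert np.allclose(p, [beta / (alpha + beta), alpha / (alpha + beta)])
```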

SLIDE 11

LIMITING BUT NO INVARIANT DISTRIBUTION

2. Limiting but no invariant distribution: Consider the case when, for all $i, j$, the limit reaches convergence values $Q_{ij}$, but for some $j$ the value $Q_{ij}$ depends on the initial state $i$, so that we cannot write $Q_{ij}$ as $Q_j$ as before; $\sum_{j=1}^{m} Q_{ij} = 1$, $\forall i$ must hold:

$\lim_{n \to \infty} T^n = \lim_{n \to \infty} \begin{pmatrix} p_{11}^{(n)} & p_{12}^{(n)} & \cdots & p_{1m}^{(n)} \\ p_{21}^{(n)} & p_{22}^{(n)} & \cdots & p_{2m}^{(n)} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1}^{(n)} & p_{m2}^{(n)} & \cdots & p_{mm}^{(n)} \end{pmatrix} = \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1m} \\ Q_{21} & Q_{22} & \cdots & Q_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{m1} & Q_{m2} & \cdots & Q_{mm} \end{pmatrix}$

$\lim_{n \to \infty} p^{(0)} T^n = \begin{pmatrix} p_1^{(0)} & p_2^{(0)} & \cdots & p_m^{(0)} \end{pmatrix} \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1m} \\ Q_{21} & Q_{22} & \cdots & Q_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{m1} & Q_{m2} & \cdots & Q_{mm} \end{pmatrix}$

→ Each different initial distribution $p^{(0)}$ defines a possibly different limiting (stationary) distribution

SLIDE 12

LIMITING BUT NO INVARIANT DISTRIBUTION

$\lim_{n \to \infty} p^{(0)} T^n = \begin{pmatrix} p_1^{(0)} & p_2^{(0)} & \cdots & p_m^{(0)} \end{pmatrix} \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1m} \\ Q_{21} & Q_{22} & \cdots & Q_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{m1} & Q_{m2} & \cdots & Q_{mm} \end{pmatrix}$

§ Example: $T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2$, a 2-state MC with $0 \le \alpha, \beta \le 1$ (here $\alpha = \beta = 0$)
§ $T^n = T$ for all $n$, so a limiting distribution does exist, but it always depends on $p^{(0)}$:

$\begin{pmatrix} p_1^{(0)} & p_2^{(0)} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} p_1^{(0)} & p_2^{(0)} \end{pmatrix}$

SLIDE 13

NO LIMITING DISTRIBUTION

3. No limiting distribution: The limit doesn't reach a convergence value $Q_{ij}$ for all $i, j$; therefore a limiting distribution as defined doesn't exist.
§ $T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$; in this case $T^{2n} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $T^{2n+1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$
§ → the succession of $T$'s powers oscillates between the two matrices; the MC is periodic with period 2
§ However, a stationary distribution can still exist
§ Limiting ⇒ Stationary, but the opposite doesn't necessarily hold

SLIDE 14

NO LIMITING, YES STATIONARY DISTRIBUTION

§ $T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, with $T^{2n} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $T^{2n+1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$
§ The solution of the fixed-point equation:
$pT = p \;\Rightarrow\; \begin{pmatrix} a & 1-a \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} a & 1-a \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} 1-a & a \end{pmatrix} = \begin{pmatrix} a & 1-a \end{pmatrix}$
The resulting equation system, $1-a = a$ and $a = 1-a$, is satisfied by $a = 0.5$ → $p = \begin{pmatrix} 0.5 & 0.5 \end{pmatrix}$ is a stationary distribution
§ This is intuitively expected, since the oscillating behavior of the powers of $T$, which results in pairwise symmetric matrices, perfectly balances the probabilities of the two states of the chain.
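Both facts, oscillating powers with no limit, and a surviving stationary distribution, can be seen numerically; a sketch with NumPy:

```python
import numpy as np

T = np.array([[0.0, 1.0],
              [1.0, 0.0]])
I = np.eye(2)

# Powers oscillate: even powers give the identity, odd powers give T back
assert np.allclose(np.linalg.matrix_power(T, 10), I)
assert np.allclose(np.linalg.matrix_power(T, 11), T)

# No limiting distribution, but p = [0.5, 0.5] is still stationary
p = np.array([0.5, 0.5])
assert np.allclose(p @ T, p)

# Any other distribution keeps oscillating forever instead of converging
p0 = np.array([0.8, 0.2])
assert np.allclose(p0 @ np.linalg.matrix_power(T, 10), p0)
assert np.allclose(p0 @ np.linalg.matrix_power(T, 11), p0[::-1])
```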

SLIDE 15

M-STATE MC

§ What about $m$-state chains? → Same analysis as for the 2-state case, with $m$-dimensional matrices
§ $T^n = U \Lambda^n U^{-1}$, $\Lambda^n = \begin{pmatrix} \lambda_1^n & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_m^n \end{pmatrix}$
§ Example: $T = \begin{pmatrix} 1/4 & 1/2 & 1/4 \\ 1/2 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/2 \end{pmatrix}$
§ Eigenvectors: $u_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$, $u_2 = \begin{pmatrix} -1 \\ -1 \\ 2 \end{pmatrix}$, $u_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}$
§ Eigenvalues: $\lambda_1 = 1$, $\lambda_2 = 1/4$, $\lambda_3 = -1/4$
§ $T^n = \frac{1}{6} \begin{pmatrix} 1 & -1 & 1 \\ 1 & -1 & -1 \\ 1 & 2 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & (1/4)^n & 0 \\ 0 & 0 & (-1/4)^n \end{pmatrix} \begin{pmatrix} 2 & 2 & 2 \\ -1 & -1 & 2 \\ 3 & -3 & 0 \end{pmatrix} \to Q = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$ as $n \to \infty$
§ $p^{(n)} = p^{(0)} T^n$ → the limiting distribution: $p = \lim_{n \to \infty} p^{(n)} = \lim_{n \to \infty} p^{(0)} T^n = p^{(0)} Q$
§ $p = \begin{pmatrix} 1/3 & 1/3 & 1/3 \end{pmatrix}$, which is also a stationary distribution
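The 3-state example can be replayed numerically; a sketch with NumPy, using the eigenpairs stated above:

```python
import numpy as np

T = np.array([[0.25, 0.50, 0.25],
              [0.50, 0.25, 0.25],
              [0.25, 0.25, 0.50]])

# Columns of U are the eigenvectors u1, u2, u3 from the slide
U = np.array([[1.0, -1.0, 1.0],
              [1.0, -1.0, -1.0],
              [1.0, 2.0, 0.0]])
lams = np.array([1.0, 0.25, -0.25])

# T u_k = lambda_k u_k for each eigenpair
for k in range(3):
    assert np.allclose(T @ U[:, k], lams[k] * U[:, k])

# T^n = U Lambda^n U^{-1}; only the lambda = 1 term survives as n -> infinity
Q = U @ np.diag([1.0, 0.0, 0.0]) @ np.linalg.inv(U)
assert np.allclose(Q, np.full((3, 3), 1.0 / 3.0))
assert np.allclose(np.linalg.matrix_power(T, 60), Q)
```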

SLIDE 16

M-STATE MC

§ $T^n = \frac{1}{6} \begin{pmatrix} 1 & -1 & 1 \\ 1 & -1 & -1 \\ 1 & 2 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & (1/4)^n & 0 \\ 0 & 0 & (-1/4)^n \end{pmatrix} \begin{pmatrix} 2 & 2 & 2 \\ -1 & -1 & 2 \\ 3 & -3 & 0 \end{pmatrix} \to Q = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$ as $n \to \infty$
§ $p^{(n)} = p^{(0)} T^n$ → the limiting distribution: $p = \lim_{n \to \infty} p^{(n)} = \lim_{n \to \infty} p^{(0)} T^n = p^{(0)} Q$
§ $p = \begin{pmatrix} 1/3 & 1/3 & 1/3 \end{pmatrix}$, which is an invariant limiting distribution (stationary distribution)

$\begin{pmatrix} p_1^{(0)} & p_2^{(0)} & p_3^{(0)} \end{pmatrix} \begin{pmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \\ q_{31} & q_{32} & q_{33} \end{pmatrix} = \begin{pmatrix} p_1^{(0)} q_{11} + p_2^{(0)} q_{21} + p_3^{(0)} q_{31} & p_1^{(0)} q_{12} + p_2^{(0)} q_{22} + p_3^{(0)} q_{32} & p_1^{(0)} q_{13} + p_2^{(0)} q_{23} + p_3^{(0)} q_{33} \end{pmatrix} = \begin{pmatrix} \frac{1}{3}\left(p_1^{(0)} + p_2^{(0)} + p_3^{(0)}\right) & \frac{1}{3}\left(p_1^{(0)} + p_2^{(0)} + p_3^{(0)}\right) & \frac{1}{3}\left(p_1^{(0)} + p_2^{(0)} + p_3^{(0)}\right) \end{pmatrix} = \begin{pmatrix} 1/3 & 1/3 & 1/3 \end{pmatrix}$

SLIDE 17

RELEVANT QUESTIONS FOR A MC

§ Fundamental prediction queries:
§ What will be the probability of each state in the long run?
§ What will be the probability of state $j$ in the long run?
§ Will the state probability distribution be stationary?
§ Under which conditions does a MC have a limiting (and therefore a stationary) distribution?
§ Under which conditions is the limiting distribution invariant?
§ How long does it take to reach (approximately) the limiting distribution?

(Figure: snapshots of the state distribution at t = 0, 100, 1000, 100000, 100001)

SLIDE 18

STATE CLASSIFICATION

§ Answering these questions requires introducing a state classification, based on whether a state can be reached or not during the evolution of the chain
§ Absorbing states: once entered there's no escape, $p_{ii} = 1$, $p_{ij} = 0 \;\forall j \neq i$
§ Periodic states: the probability of a return to state $i$ at step $n$ is $p_{ii}^{(n)} > 0$ for $n = d, 2d, 3d, \ldots$ (periodic with period $d$)
§ Persistent states (also referred to as recurrent states): following a first visit, a return to the state at some later step is certain
§ Non-null (persistent) states: starting in state $i$, the mean number of steps $\mu_i$ to return to state $i$ is finite, $\mu_i < \infty$
§ Null (persistent) states: starting in state $i$, the mean number of steps $\mu_i$ to return to state $i$ is infinite, $\mu_i = \infty$
§ Transient states: a return to the state is not certain
§ Ergodic states: aperiodic + persistent + non-null

SLIDE 19

IRREDUCIBLE AND REGULAR CHAINS

§ Irreducible chain: every state can be reached (is accessible) from every other state in the chain in a finite number of steps
§ Since any state $s_j$ can be reached from any other state $s_i$, irreducibility means: $p_{ij}^{(n)} > 0$ for some integer $n$
§ A matrix $A = (a_{ij})$ is said to be positive if $a_{ij} > 0$ for all $i, j$
§ Regular Markov chain: there exists an integer $n$ such that $T^n$ is positive
§ Regular chain ⇒ Irreducible
§ Irreducible ⇏ Regular chain (not necessarily)

$T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad T^{2n} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad T^{2n+1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = T$

$T$ is irreducible, but no power of $T$ is a positive matrix

§ Theorem: all states of an irreducible chain are of the same type, either all transient or all persistent, and all have the same period.
§ However, they cannot all be transient, since that would mean that the return to any state would not be certain, even though all states are accessible from all other states in a finite number of steps ⇒ all states are recurrent
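Regularity can be tested by brute force: check whether some power $T^n$ is strictly positive. A sketch with NumPy, where the bound on $n$ is a pragmatic cutoff chosen here, not a theoretical one:

```python
import numpy as np

def is_regular(T, max_power=100):
    """Return True if some power T^n (n <= max_power) is strictly positive."""
    Tn = np.eye(T.shape[0])
    for _ in range(max_power):
        Tn = Tn @ T
        if np.all(Tn > 0):
            return True
    return False

# The periodic swap chain: irreducible, but never regular
T_swap = np.array([[0.0, 1.0], [1.0, 0.0]])
assert not is_regular(T_swap)

# A general 2-state chain with 0 < alpha, beta < 1 is positive already at n = 1
T_gen = np.array([[0.7, 0.3], [0.2, 0.8]])
assert is_regular(T_gen)
```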

SLIDE 20

ERGODIC CHAINS

§ Ergodic chain: all states are ergodic, that is, persistent, non-null, and aperiodic
§ Irreducible + aperiodic states ⇒ Ergodic
§ Regular ⇒ Irreducible (⇒ recurrent states) + aperiodic ⇒ Ergodic
§ Note: Ergodic ⇏ Regular
§ A MC is ergodic if there is a number $N$ such that any state can be reached from any other state in any number of steps greater than or equal to $N$
§ In the case of a fully connected transition matrix, where all transitions have a non-zero probability, this condition is trivially fulfilled with $N = 1$

SLIDE 21

ERGODIC CHAINS AND STATIONARY DISTRIBUTIONS

§ Ergodic Markov chains have a limiting invariant distribution $p$
§ → They have a stationary distribution $p$
§ → Regardless of the initial state, the time-$t$ distribution of the chain converges to $p$ as $t$ tends to infinity
§ How large must $t$ be until the time-$t$ distribution is approximately $p$? → Mixing time
§ For an ergodic chain, the invariant distribution $p$ is the vector of the reciprocals of the mean recurrence times
§ Check for ergodicity: if there is only one eigenvalue of $T$ that takes value 1, then the Markov chain is ergodic (this derives from the eigenvector equation $T^\top p^\top = p^\top$)
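The mean-recurrence-time property, $p_i = 1/\mu_i$, can be cross-checked by computing recurrence times with first-step analysis (solving hitting-time equations); a sketch with NumPy on an example 2-state chain:

```python
import numpy as np

T = np.array([[0.7, 0.3],
              [0.2, 0.8]])
m = T.shape[0]

# Stationary distribution via the left eigenvector of T
eigvals, eigvecs = np.linalg.eig(T.T)
k = np.argmin(np.abs(eigvals - 1.0))
p = np.real(eigvecs[:, k])
p = p / p.sum()

# Mean recurrence times by first-step analysis:
# h_j = expected steps to hit state i from j (with h_i = 0), then
# mu_i = 1 + sum_k T[i, k] h_k
mu = np.zeros(m)
for i in range(m):
    others = [j for j in range(m) if j != i]
    A = np.eye(len(others)) - T[np.ix_(others, others)]
    h = np.zeros(m)
    h[others] = np.linalg.solve(A, np.ones(len(others)))
    mu[i] = 1.0 + T[i] @ h

# For an ergodic chain: p_i = 1 / mu_i
assert np.allclose(mu, 1.0 / p)
```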

SLIDE 22

TAXONOMY OF MARKOV PROCESSES

§ Markov chain: prediction, what is the state distribution at time $t$?
Common to all models: a discrete-time random process, a countable state set, and a transition matrix that defines the internal stochastic dynamics

SLIDE 23

TAXONOMY OF MARKOV PROCESSES

§ Markov chain: prediction, what is the state distribution at time $t$?
§ Markov reward process: MC ∪ {Rewards}; prediction, what is the expected cumulative reward at time $t$? what is the state distribution at $t$?

(Figure: environment's dynamics, an example chain annotated with transition rewards such as −2, −10, +1, +2, +3)

SLIDE 24

TAXONOMY OF MARKOV PROCESSES

§ Markov chain: prediction, what is the state distribution at time $t$?
§ Markov reward process: MC ∪ {Rewards}; prediction, what is the expected cumulative reward at time $t$? what is the state distribution at $t$?
§ Markov decision process (MDP): MC ∪ {Rewards} ∪ {Actions}; control, what is the optimal decision policy to optimize the collected rewards?

(Figure: the same chain with actions added, and transition rewards such as −2, −10, +1, +2, +3)

SLIDE 25

MARKOV MODELS TAXONOMY: STATE OBSERVABILITY, PREDICTION VS. CONTROL

25

SLIDE 26

STOCHASTIC OUTCOMES WITH ACTIONS

(Figure: deterministic actions vs. uncertain actions)

SLIDE 27

STOCHASTIC OUTCOMES AND MARKOV PROPERTY

Action effect is stochastic: probability distribution over next states

In general, non-Markov, the outcome can depend on the entire action history:
$P(s_{t+1} = s' \mid s_t, s_{t-1}, \ldots, s_0, a_t, a_{t-1}, \ldots, a_0) = P(s_{t+1} = s' \mid s_{t:0}, a_{t:0})$
✓ Deterministic: one single successor state, $s, a \to s'$
✓ Probabilistic: conditional distribution over successor states + Markov property: $s_t, a_t \to P(s_{t+1} = s' \mid s_t = s, a_t = a)$, i.e., $s, a \to P(s' \mid s, a)$

SLIDE 28

EXAMPLE: GRID WORLD

§ A maze-like problem
§ The agent lives in a grid world
§ Walls block the agent's path
§ The agent receives rewards each time step
§ Small "living" reward $R$ each step (can be negative)
§ Big rewards come at the end (good or bad)
§ Goal: maximize the sum of rewards
§ Potentially unlimited horizon
§ Noisy movement: actions do not always go as planned
§ 80% of the time, the action takes the agent in the desired direction (if there is no wall there)
§ 10% of the time, the action takes the agent in the direction perpendicular to the right; 10% perpendicular to the left
§ If there is a wall in the direction the agent would have gone, the agent stays put

(Figure: grid world with two exit states, rewards +1 and −1, and living reward R)

How do we formalize it and find the optimal policy?
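The 80/10/10 movement model can be written as an explicit successor distribution; a minimal sketch in plain Python, where the grid size and wall layout are illustrative assumptions, not taken from the slides:

```python
# Noisy grid-world movement: 80% intended direction, 10% each perpendicular.
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
PERP = {"N": ("W", "E"), "S": ("E", "W"), "E": ("N", "S"), "W": ("S", "N")}

def successors(state, action, walls, rows, cols):
    """Return {next_state: probability} under the noisy movement model."""
    dist = {}
    left, right = PERP[action]
    for direction, prob in ((action, 0.8), (right, 0.1), (left, 0.1)):
        dr, dc = MOVES[direction]
        nxt = (state[0] + dr, state[1] + dc)
        # Blocked by a wall or the grid boundary: the agent stays put
        if nxt in walls or not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
            nxt = state
        dist[nxt] = dist.get(nxt, 0.0) + prob
    return dist

# Example: moving North from (1, 1) on a 3x3 grid with a wall to the East
d = successors((1, 1), "N", walls={(1, 2)}, rows=3, cols=3)
assert abs(sum(d.values()) - 1.0) < 1e-12
```

In the example, the 10% "slip right" bumps into the wall, so that probability mass stays on the current cell.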

SLIDE 29

MARKOV DECISION PROCESSES (MDP)

Goal: define the action decision policy $\pi(s, t)$ that maximizes a given (utility) function of the rewards, potentially for $t \to \infty$

§ A set $S$ of world states
§ A set $A$ of feasible actions
§ A stochastic transition matrix $T$, $T: S \times S \times A \times \{0, 1, \ldots, t\} \mapsto [0, 1]$, $T(s, s', a) = P(s' \mid s, a)$
§ A reward function $R$: $R(s)$, $R(s, a)$, or $R(s, a, s')$, $R: S \times A \times S \times \{0, 1, \ldots, t\} \mapsto \mathbb{R}$
§ A start state (or a distribution over initial states), optional
§ Terminal/absorbing states, optional
The presence of $t$ accounts for non-homogeneous Markov processes
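The components above can be grouped into a small container; a sketch as a Python dataclass, where the field names and the tiny two-state instance are illustrative, not the lecture's notation:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

State = str
Action = str

@dataclass
class MDP:
    states: List[State]                                         # S
    actions: List[Action]                                       # A
    transition: Dict[Tuple[State, Action], Dict[State, float]]  # P(s' | s, a)
    reward: Callable[[State, Action, State], float]             # R(s, a, s')
    start: Optional[State] = None                               # optional start state
    terminals: List[State] = field(default_factory=list)        # optional absorbing states

# Tiny two-state example: each distribution P(. | s, a) must sum to 1
mdp = MDP(
    states=["s1", "s2"],
    actions=["stay", "go"],
    transition={
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "go"): {"s1": 0.2, "s2": 0.8},
        ("s2", "stay"): {"s2": 1.0},
        ("s2", "go"): {"s1": 0.8, "s2": 0.2},
    },
    reward=lambda s, a, s2: 1.0 if s2 == "s2" else 0.0,
    start="s1",
)

for dist in mdp.transition.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-12
```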

SLIDE 30

MARKOV DECISION PROCESSES (MDP)

General model for probabilistic planning: MDP = $\langle S, s_{start}, S_{goal}, A, T, R \rangle$ → find the policy optimizing the expected utility
Classical deterministic planning: P = $\langle S, s_{start}, S_{goal}, A, T, c \rangle$ → find the action sequence achieving the (best) goal state (least-cost path)

SLIDE 31

RECYCLING ROBOT

Example from Sutton and Barto

Note: the "state" (the robot's battery status) is a parameter of the agent itself, not a property of the physical environment

§ At each step, a recycling robot has to decide whether it should: search for a can; wait for someone to bring it a can; or go to the home base and recharge.
§ Searching is better but runs down the battery; if the robot runs out of power while searching, it has to be rescued.
§ States are battery levels: high, low.
§ Reward = (expected) number of cans collected

SLIDE 32

POLICIES

§ In deterministic single-agent search problems, we were looking for an optimal plan, or sequence of actions, from the start to a goal
§ In MDPs we (usually) don't have a specific goal; instead, we look for a policy, a mapping from states to actions: $\pi: S \to A$
§ $\pi(s)$ deterministically specifies what action to take in each state → deterministic policy
§ An explicit policy defines a reflex agent
§ A policy can also be stochastic: $\pi(s, a)$ specifies the probability of taking action $a$ in state $s$
§ In MDPs, if the transition model $T$ is deterministic, the optimal policy is deterministic

SLIDE 33

HOW MANY POLICIES?

§ How many non-terminal states? How many actions? How many deterministic policies over the non-terminal states?
§ 9 states, 4 actions → $4^9$ deterministic policies
§ For a grid of 100×100 cells, the number of policies is $4^{10000}$, a huge number!
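The counting argument is just $|A|^{|S|}$, one independent action choice per state; a quick check in Python:

```python
# Number of deterministic policies = actions ** non_terminal_states
assert 4 ** 9 == 262144

# For a 100x100 grid with 4 actions the count is astronomically large
num_policies = 4 ** (100 * 100)
assert num_policies > 10 ** 6000  # 4^10000 has more than 6000 decimal digits
```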