Planning and Optimization
- F1. Markov Decision Processes
Malte Helmert and Thomas Keller
Universität Basel
November 27, 2019
Content of this Course: Foundations, Logic, Classical ...
Motivation Markov Decision Process Policy Summary
$c : L \to \mathbb{R}^+_0$ is a label cost function,
Figure: example transition system over the states LR, LL, TL, RL, TR, RR
$c : L \to \mathbb{R}^+_0$ is a label cost function,
$\sum_{s' \in S} T(s, \ell, s') = 1$.
Figure: example over the states LR, LL, TL, RL, TR, RR with transition probabilities .8 and .2
$\sum_{s' \in S} T(s, \ell, s') = 1$.
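The normalization condition on the transition function can be checked mechanically. The following is a minimal sketch, assuming T is stored as a Python dictionary from (s, ℓ, s′) triples to probabilities; the function name `check_transition_function` and the example entries (including the 0.8 / 0.2 split for move-R) are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict
from math import isclose


def check_transition_function(T):
    """Return all (state, label) pairs whose outgoing probabilities do not sum to 1.

    T maps (s, label, s_next) -> probability. Pairs that never occur in T
    (label not applicable in s) are simply not checked.
    """
    totals = defaultdict(float)
    for (s, label, _s_next), p in T.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range for {(s, label)}: {p}")
        totals[(s, label)] += p
    return [pair for pair, total in totals.items()
            if not isclose(total, 1.0, abs_tol=1e-9)]


# Hypothetical example entries; the probabilities are assumptions.
T_example = {
    ("TL", "move-R", "TR"): 0.8,
    ("TL", "move-R", "TL"): 0.2,
    ("LL", "pickup", "TL"): 1.0,
}
violations = check_transition_function(T_example)

# A deliberately broken table: the outcomes of ("a", "x") only sum to 0.5.
broken = check_transition_function({("a", "x", "b"): 0.5})
```

An empty result means every listed (s, ℓ) pair defines a proper probability distribution over successor states.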
sets position back to (1,1)
gives reward of +1 in (4,3)
gives reward of −1 in (4,2)
Notation: $s \xrightarrow{p:\ell} s'$, $s \xrightarrow{p} s'$, $s \xrightarrow{\ell} s'$
Note: $n = 0$ is possible; then $s = s'$.
$s_0, \dots, s_n$ is called the (state) path from $s$ to $s'$.
$\ell_1, \dots, \ell_n$ is called the (action) path from $s$ to $s'$.
The length of the path is $n$.
The cost of the path in an SSP is $\sum_{i=1}^{n} c(\ell_i)$, and the reward of the path in an MDP is $\sum_{i=1}^{n} \gamma^{i-1} R(s_{i-1}, \ell_i)$.
$s'$ is reached from $s$ through this path with probability $\prod_{i=1}^{n} p_i$.
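The three path quantities can be computed directly from their definitions. The following is a minimal sketch, assuming the cost function c and the reward function R are given as dictionaries; the function names and all numeric values are illustrative assumptions.

```python
from math import isclose, prod


def path_cost(labels, c):
    """SSP cost of an action path: sum of c(l_i) for i = 1..n."""
    return sum(c[label] for label in labels)


def path_reward(states, labels, R, gamma):
    """MDP reward of a path: sum of gamma^(i-1) * R(s_{i-1}, l_i) for i = 1..n.

    enumerate() counts from 0, so its index k equals i - 1.
    """
    return sum(gamma ** k * R[(states[k], label)]
               for k, label in enumerate(labels))


def path_probability(probs):
    """Probability of following the path: product of the p_i."""
    return prod(probs)


# Hypothetical two-step path LR -> LL -> TL; all numbers are assumptions.
states = ["LR", "LL", "TL"]
labels = ["move-L", "pickup"]
c = {"move-L": 1, "pickup": 1}
R = {("LR", "move-L"): 1.0, ("LL", "pickup"): 2.0}

cost = path_cost(labels, c)                         # 1 + 1
reward = path_reward(states, labels, R, gamma=0.5)  # 1.0 + 0.5 * 2.0
probability = path_probability([0.8, 1.0])          # 0.8 * 1.0
```

For the empty path (n = 0) the sums evaluate to 0 and the product to 1, which matches the case s = s′.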
Figure: example over the states LR, LL, TL, RL, TR, RR
Action sequence: move-L, pickup, move-R, drop
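Figure: the same states LR, LL, TL, RL, TR, RR, now with transition probabilities .8 and .2
Action sequence: move-L, pickup, move-R, drop
With probabilistic outcomes the fixed sequence can fail: can't drop!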
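Figure: policy drawn on the example; the states LR, LL, TL, RL, TR, RR are annotated with the actions move-L, pickup, move-R, drop and the transition probabilities .8 and .2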
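Unlike a fixed action sequence, a policy picks the action from the current state, so a failed move is simply retried. The following is a minimal sketch of executing such a policy. The state-to-action mapping is one plausible reading of the figure, and the assumption that both move actions succeed with probability 0.8 and otherwise leave the state unchanged is hypothetical; neither is confirmed by the slides.

```python
import random

# Assumed policy: one action per non-goal state (hypothetical reading of the figure).
policy = {"LR": "move-L", "LL": "pickup", "TL": "move-R", "TR": "drop"}

# Assumed transition function: (state, label) -> list of (successor, probability).
T = {
    ("LR", "move-L"): [("LL", 0.8), ("LR", 0.2)],
    ("LL", "pickup"): [("TL", 1.0)],
    ("TL", "move-R"): [("TR", 0.8), ("TL", 0.2)],
    ("TR", "drop"): [("RR", 1.0)],
}
GOAL = "RR"


def execute(policy, T, state, goal, rng, max_steps=100):
    """Follow the policy from `state`, sampling one successor per step."""
    trace = [state]
    for _ in range(max_steps):
        if state == goal:
            break
        label = policy[state]  # the action depends only on the current state
        successors, weights = zip(*T[(state, label)])
        state = rng.choices(successors, weights=weights)[0]
        trace.append(state)
    return trace


# Seeded generator so the run is reproducible.
trace = execute(policy, T, "LR", GOAL, random.Random(0))

# Collapse repeated states (failed moves) to see the order in which states are visited.
visited = [s for i, s in enumerate(trace) if i == 0 or s != trace[i - 1]]
```

A failed move shows up as a repeated state in `trace`; the policy then selects the same move again instead of continuing with an action that is not applicable.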