Parity Objectives in Countable MDPs
Stefan Kiefer Richard Mayr Mahsa Shirmohammadi Dominik Wojtczak LICS 2017, Reykjavik 20 June 2017
Kiefer, Mayr, Shirmohammadi, Wojtczak Parity Objectives in Countable MDPs 1
Countable MDPs
[Figure: an example countable MDP with controlled and random states. Priorities: 1 (bad/odd), 2 (good/even). Transition probabilities include 1/2, 1/2^i and 1 − 1/2^i (and, e.g., 0.3 / 0.7 at a random state).]
[Figure: the same MDP as above.]
There is no almost-surely winning strategy, yet sup_σ Pr^σ(Parity) = 1. All finite-memory strategies lose almost surely.
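The gap between the supremum and what any fixed strategy achieves can be sketched numerically. The gadget below is a hypothetical reading of the figure's labels 1 − 1/2^i: committing to parameter i wins with probability 1 − 2^(−i), so the supremum over i is 1, but no single commitment attains it.

```python
# Hypothetical gadget read off the example MDP above: committing to
# parameter i satisfies the parity objective with probability 1 - 2^(-i).
def success_prob(i: int) -> float:
    return 1.0 - 2.0 ** (-i)

# The supremum over all choices of i is 1 ...
print(max(success_prob(i) for i in range(1, 40)))
# ... but no single choice attains it:
assert all(success_prob(i) < 1.0 for i in range(1, 40))
```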
[Figure: the same MDP, now with an additional state of priority 3.]
There exists an almost-surely winning strategy. All finite-memory strategies lose almost surely.
The objective hierarchy: Safety, Reach, {0, 1}-Parity, {1, 2}-Parity, {0, 1, 2}-Parity, {1, 2, 3}-Parity, {0, 1, 2, 3}-Parity, {1, 2, 3, 4}-Parity.
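For orientation, recall the winning condition behind these objectives (min-parity convention, matching priority sets like {0, 1, 2}): a run satisfies the parity objective iff the least priority seen infinitely often is even. For a lasso-shaped run this reduces to a check on the repeated cycle, as this small sketch shows:

```python
def parity_satisfied(cycle_priorities):
    """A lasso-shaped run satisfies the (min-)parity objective iff the
    smallest priority occurring on its repeated cycle is even."""
    return min(cycle_priorities) % 2 == 0

# Priorities seen infinitely often: {1, 2} -> minimum 1 is odd: lose.
assert not parity_satisfied([1, 2])
# Priorities seen infinitely often: {0, 1, 2} -> minimum 0 is even: win.
assert parity_satisfied([0, 1, 2])
```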
ε-optimal MD

ε-optimal MD means: sup_σ Pr^σ(Parity) = sup_{MD σ} Pr^σ(Parity).
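To make the MD notion concrete, here is a minimal value-iteration sketch on a hypothetical finite MDP (states 's', 't', 'd' and actions 'a', 'b' are invented for illustration) with a reachability objective. The greedy memoryless deterministic choice extracted from the computed values attains the supremum, which is the finite-MDP situation the talk contrasts against:

```python
# Hypothetical finite MDP: controlled state 's' with two actions, each a
# distribution over the target 't' and a dead end 'd'.
actions = {
    's': {'a': {'t': 0.3, 'd': 0.7},
          'b': {'t': 0.7, 'd': 0.3}},
}
# Value = probability of reaching 't'; terminal states are fixed.
values = {'s': 0.0, 't': 1.0, 'd': 0.0}
for _ in range(20):  # value iteration (converges immediately here)
    for state, acts in actions.items():
        values[state] = max(
            sum(p * values[nxt] for nxt, p in dist.items())
            for dist in acts.values()
        )
# The greedy MD choice at 's' attains the optimal value:
best = max(actions['s'],
           key=lambda a: sum(p * values[n] for n, p in actions['s'][a].items()))
print(best, values['s'])  # 'b' with value 0.7
```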
Dichotomy between MD and infinite memory; contrast to finite MDPs.
Theorem. Consider a countable-state MDP with a {0, 1, 2}-parity objective. If there exists an optimal strategy, then there exists an optimal strategy that is MD. In short: optimal strategies for {0, 1, 2}-parity may be chosen MD.
Theorem. Almost-surely winning strategies for co-Büchi may be chosen MD.

Proof idea: suppose there is an almost-surely winning strategy σ. Focus on the states used by σ; they all have an a.s. winning strategy. Set a more ambitious goal: Safety (= never see 1 again).

[Figure: a chain of states with transition probabilities 1/2, 3/4, 1/4, 7/8, 1/8, ...]

Always playing for safety is too greedy.
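Why "too greedy"? Assuming (a hypothetical reading of the figure's labels 1/2, 3/4, 7/8, ...) that the i-th safety step survives with probability 1 − 2^(−i), always playing for safety wins only with the infinite product of these survival probabilities, which is positive but strictly below 1:

```python
import math

# Assumed survival probability of the i-th safety step, read off the
# figure's labels 1/2, 3/4, 7/8, ...
def survive(i: int) -> float:
    return 1.0 - 2.0 ** (-i)

# Always playing for safety wins with probability prod_i (1 - 2^(-i)),
# which converges to a constant strictly between 0 and 1:
prob = math.prod(survive(i) for i in range(1, 200))
print(round(prob, 4))  # ≈ 0.2888: positive, but far from almost sure
```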
In the marked (blue) region: max_σ Pr^σ(never see 1 again) ≥ 2/3, so with probability ≥ 1/2 we stay in blue.
[Summary table: which objectives in the hierarchy Safety, Reach, {0, 1}-Parity, {1, 2}-Parity, {0, 1, 2}-Parity, {1, 2, 3}-Parity, {0, 1, 2, 3}-Parity, {1, 2, 3, 4}-Parity admit ε-optimal MD strategies.]
Our work: countable MDPs. Other work: mostly finite MDPs.
Our work: maximizing the probability of parity objectives. Other work: maximizing expected (discounted) total/average reward/cost.
Our work: general countable MDPs. Other work: countable MDPs arising from specific models: recursive MDPs, nondeterministic probabilistic lossy channel systems, VASS-induced MDPs, controlled queueing systems, controlled multitype branching processes, ...
[Figure: an infinite Markov chain with branching probabilities 1/3; conditioning reweights them to values such as 1/6, 2/3, 1/2.]

Pr(A) = Pr(A | Parity), and Pr(A | ¬Parity) = Pr(A).
Infinite Markov chains are very different from finite ones. Gambler's ruin:

[Figure: random walk on the nonnegative integers with absorbing state 0; step probabilities 1/3 and 2/3.]

Dependence on exact probabilities.
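The dependence on exact probabilities can be seen in the classical gambler's ruin formula (stated here for a walk on the nonnegative integers started at 1, independent of the specific figure): the ruin probability is 1 when the rightward probability p is at most 1/2, and (1 − p)/p otherwise. Swapping the figure's 1/3 and 2/3 flips the qualitative behaviour:

```python
# Gambler's ruin on the nonnegative integers, started at state 1:
# step right with probability p, left with probability 1 - p; 0 absorbs.
# Classical fact: Pr(hit 0) = 1 if p <= 1/2, else (1 - p) / p.
def ruin_prob(p: float) -> float:
    return 1.0 if p <= 0.5 else (1.0 - p) / p

print(ruin_prob(1 / 3))  # 1.0 -- with p = 1/3, ruin is almost sure
print(ruin_prob(2 / 3))  # ≈ 0.5 -- swapping the probabilities changes everything
```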