Parity Objectives in Countable MDPs - PowerPoint PPT Presentation
Stefan Kiefer, Richard Mayr, Mahsa Shirmohammadi, Dominik Wojtczak. LICS 2017, Reykjavik, 20 June 2017.


  1. Parity Objectives in Countable MDPs
     Stefan Kiefer, Richard Mayr, Mahsa Shirmohammadi, Dominik Wojtczak
     LICS 2017, Reykjavik, 20 June 2017
     Kiefer, Mayr, Shirmohammadi, Wojtczak: Parity Objectives in Countable MDPs

  2. Countable MDPs
     [Figure: an infinite ladder-shaped MDP with controlled and random states. Legend: priority 1 = bad/odd, priority 2 = good/even; the transition probabilities (e.g. 0.3, 0.7, 1/2) depend on the rung index i.]

  3. Countable MDPs
     [Figure: the ladder MDP from the previous slide.]
     There is no almost-surely winning strategy, yet sup_σ Pr^σ(Parity) = 1.
     All finite-memory strategies lose almost surely.

  4. {1,2,3}-Parity
     [Figure: the ladder MDP, now with priorities from {1,2,3}.]
     There exists an almost-surely winning strategy.
     All finite-memory strategies lose almost surely.
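Which mode wins in these examples hinges on which priority recurs forever. For an ultimately periodic ("lasso") run this is easy to check, since the priorities seen infinitely often are exactly those on the cycle. A minimal sketch (illustrative helper, not from the talk):

```python
def parity_wins(prefix, cycle):
    """Decide whether the run prefix . cycle^omega satisfies the
    parity objective: the least priority occurring infinitely often
    (i.e. the least priority on the cycle) must be even."""
    return min(cycle) % 2 == 0

# Looping through a priority-2 state wins; if priority 1 recurs,
# the run loses, regardless of the finite prefix.
print(parity_wins([1, 1], [2]))     # True
print(parity_wins([2, 2], [2, 1]))  # False
```

The talk's point is that in countable MDPs no finite-memory strategy may be able to force an even priority to be the recurring one, even when some infinite-memory strategy wins almost surely.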

  5. Our Results in the Mostowski Hierarchy
     [Figure: the Mostowski hierarchy of parity objectives: {0,1,2,3}-Parity and {1,2,3,4}-Parity at the top, {0,1,2}-Parity and {1,2,3}-Parity in the middle, {0,1}-Parity and {1,2}-Parity above Safety and Reach at the bottom.]

  6. Our Results in the Mostowski Hierarchy
     [Figure: the hierarchy, now with a small example MDP (transition probabilities 0.5, 0.5 and 1).]

  8. Our Results in the Mostowski Hierarchy
     [Figure: the hierarchy, annotated: "optimal MD" at {0,1,2}-Parity and {1,2,3}-Parity; "ε-optimal MD" at {0,1}-Parity and {1,2}-Parity, Safety and Reach.]
     ε-optimal MD means: sup_{σ MD} Pr^σ(Parity) = sup_σ Pr^σ(Parity)

  9. Our Results in the Mostowski Hierarchy
     [Figure: as on the previous slide, now also showing the ladder MDP with priorities {1,2} as an example.]
     ε-optimal MD means: sup_{σ MD} Pr^σ(Parity) = sup_σ Pr^σ(Parity)

  10. Our Results in the Mostowski Hierarchy
      [Figure: as before, now with the ladder MDP with priorities {1,2,3}.]
      ε-optimal MD means: sup_{σ MD} Pr^σ(Parity) = sup_σ Pr^σ(Parity)
      optimal MD means: if there exists an optimal σ, then there exists an optimal σ that is MD

  11. Our Results in the Mostowski Hierarchy
      [Figure: as before.]
      ε-optimal MD means: sup_{σ MD} Pr^σ(Parity) = sup_σ Pr^σ(Parity)
      optimal MD means: if there exists an optimal σ, then there exists an optimal σ that is MD
      Dichotomy between MD and infinite memory; contrast to finite MDPs.

  12. Optimal MD-Strategies
      Theorem. Consider a countable-state MDP with a {0,1,2}-parity objective. If there exists an optimal strategy, then there exists an optimal strategy that is MD.
      "Optimal strategies for {0,1,2}-parity may be chosen MD."

  13. Optimal MD-Strategies for Co-Büchi
      Theorem. Almost-surely winning strategies for co-Büchi may be chosen MD.
      Suppose there is an almost-surely winning strategy σ. Focus on the states used by σ: they all have an a.s. winning strategy.
      Set a more ambitious goal: Safety (= never see priority 1 again).
      [Figure: an example MDP with priority-0 and priority-1 states and transition probabilities such as 7/8, 1/2, 3/4 and their complements 1/8, 1/2, 1/4.]
      Always playing for safety is too greedy.

  14. An Optimal MD-Strategy for Co-Büchi
      max_σ Pr^σ(never see a priority-1 state again)
      [Figure: the example MDP, with safety values 1 and 0 marked at some states.]

  15. An Optimal MD-Strategy for Co-Büchi
      max_σ Pr^σ(never see a priority-1 state again)
      [Figure: the example MDP, with safety values 1 and 0 marked.]
      0. Playing the safest action everywhere is not ok.

  16. An Optimal MD-Strategy for Co-Büchi
      max_σ Pr^σ(never see a priority-1 state again)
      [Figure: the example MDP; a safety value 1/3 and the blue region are marked.]
      0. Playing the safest action everywhere is not ok.
      1. Fixing the safest action in the blue region is ok.

  17. An Optimal MD-Strategy for Co-Büchi
      max_σ Pr^σ(never see a priority-1 state again)
      [Figure: the example MDP with safety values 2/3 and 1/3.]
      0. Playing the safest action everywhere is not ok.
      1. Fixing the safest action in the blue region is ok.

  18. An Optimal MD-Strategy for Co-Büchi
      max_σ Pr^σ(never see a priority-1 state again)
      [Figure: the example MDP with safety values 2/3 and 1/3; blue and dark blue regions marked.]
      0. Playing the safest action everywhere is not ok.
      1. Fixing the safest action in the blue region is ok.
      2. Once we are in dark blue: with prob ≥ 1/2 we stay in blue.

  20. An Optimal MD-Strategy for Co-Büchi
      max_σ Pr^σ(never see a priority-1 state again)
      [Figure: the example MDP with safety values 2/3 and 1/3; blue and dark blue regions marked.]
      0. Playing the safest action everywhere is not ok.
      1. Fixing the safest action in the blue region is ok.
      2. Once we are in dark blue: with prob ≥ 1/2 we stay in blue.
      3. The a.s. winning strategy gets us into dark blue a.s.
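On a finite MDP the safety values driving this construction are a fixpoint of a one-step operator, and the "safest action" is the argmax at each state. A toy sketch under that finite-state assumption (the MDP below is made up for illustration, not the paper's construction):

```python
# Value iteration for the safety objective "never visit a bad state".
# actions[s] is a list of actions; each action is a list of
# (probability, successor) pairs.

def safety_values(actions, bad, iters=1000):
    # v[s] approximates sup_sigma Pr_s(never visit bad),
    # iterating down from 1 (greatest fixpoint).
    v = {s: 0.0 if s in bad else 1.0 for s in actions}
    for _ in range(iters):
        v = {s: 0.0 if s in bad else
                max(sum(p * v[t] for p, t in a) for a in actions[s])
             for s in actions}
    return v

def safest_action(s, actions, v):
    # Index of the action maximizing the one-step safety value (an MD choice).
    return max(range(len(actions[s])),
               key=lambda i: sum(p * v[t] for p, t in actions[s][i]))

mdp = {
    's':   [[(0.6, 'g'), (0.4, 'bad')],   # risky shortcut
            [(0.5, 's'), (0.5, 'g')]],    # safe detour
    'g':   [[(1.0, 'g')]],
    'bad': [[(1.0, 'bad')]],
}
v = safety_values(mdp, bad={'bad'})
print(v['s'], safest_action('s', mdp, v))  # 1.0 1
```

In the countable setting of the talk this greedy picture breaks: playing the safest action everywhere can fail, which is why the paper fixes it only inside the high-value (blue) region.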

  21. When MD Suffices for Finitely Branching MDPs
      [Figure: the Mostowski hierarchy: "optimal MD" at {0,1,2}-Parity and {1,2,3}-Parity; "ε-optimal MD" at {0,1}-Parity and {1,2}-Parity, Safety and Reach.]

  22. When MD Suffices for Infinitely Branching MDPs
      [Figure: the hierarchy again; rotated labels "optimal MD" and "ε-optimal MD" mark where MD strategies still suffice when branching is infinite.]

  23. Context of the Paper
      Our work: countable MDPs. Other work: mostly finite MDPs.
      Our work: maximizing the probability of Parity objectives. Other work: maximizing expected (discounted) total/average reward/cost.
      Our work: general countable MDPs. Other work: countable MDPs arising from specific models: recursive MDPs, nondeterministic probabilistic lossy channel systems, VASS-induced MDPs, one-counter MDPs, controlled queueing systems, controlled multitype branching processes, ...

  24. Conditioning a Markov Chain
      [Figure: a Markov chain with priorities 0 and 1 and transition probabilities 1/3, with the events A (Parity) and ¬A marked.]
      Conditioning on A: Pr(· | A) = Pr(· ∩ A) / Pr(A)

  25. Conditioning a Markov Chain
      [Figure: the chain from the previous slide and, below it, the conditioned chain with adjusted transition probabilities (e.g. 2/3, 1/6, 1/2).]
      Conditioning on A: Pr(· | A) = Pr(· ∩ A) / Pr(A)
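State by state, this conditioning is the classical Doob h-transform: with h(s) = Pr_s(A), the conditioned chain has transitions p'(s, t) = p(s, t) · h(t) / h(s). A small worked example (the three-state chain here is made up for illustration, not the one on the slide):

```python
from fractions import Fraction as F

# From 'u', stay, reach 'win', or reach 'lose', each with probability 1/3.
# Condition on the event A = "eventually reach 'win'".
P = {
    'u':    {'u': F(1, 3), 'win': F(1, 3), 'lose': F(1, 3)},
    'win':  {'win': F(1)},
    'lose': {'lose': F(1)},
}

# h(s) = Pr_s(A): h(win) = 1, h(lose) = 0, and h(u) solves
# h(u) = 1/3 h(u) + 1/3, giving h(u) = 1/2.
h = {'u': F(1, 2), 'win': F(1), 'lose': F(0)}

# Doob h-transform: p'(s, t) = p(s, t) * h(t) / h(s), restricted to h > 0.
cond = {s: {t: p * h[t] / h[s] for t, p in row.items() if h[t] > 0}
        for s, row in P.items() if h[s] > 0}

print(cond['u'])  # {'u': Fraction(1, 3), 'win': Fraction(2, 3)}
```

Note that each conditioned row still sums to 1, and states with h = 0 drop out of the chain.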

  26. Countable Markov Chains
      Infinite Markov chains are very different from finite ones. Gambler's ruin:
      [Figure: a random walk on the nonnegative integers with step probabilities 2/3 and 1/3 in the two directions; the end state has a self-loop with probability 1.]

  27. Countable Markov Chains
      Infinite Markov chains are very different from finite ones. Gambler's ruin:
      [Figure: the same random walk with step probabilities 2/3 and 1/3.]
      Dependence on exact probabilities
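Assuming the 2/3 steps lead away from the absorbing end state and the 1/3 steps lead toward it, the classical gambler's ruin formula gives absorption probability ((1-p)/p)^i = (1/2)^i from state i, while lowering p to 1/2 makes absorption almost sure: that is the dependence on exact probabilities. A quick sanity check, with a step cap standing in for "escapes forever":

```python
import random

def ruin_prob(i, p=2/3):
    # Biased walk on the naturals, absorbing at 0, up-probability p > 1/2:
    # Pr_i(ever hit 0) = ((1 - p) / p) ** i.
    return ((1 - p) / p) ** i

def ruined(i, p=2/3, cap=10_000, rng=random.Random(7)):
    # One trajectory; surviving 'cap' steps counts as escaping
    # (a truncation; the walk has positive drift, so this is accurate).
    pos = i
    for _ in range(cap):
        if pos == 0:
            return True
        pos += 1 if rng.random() < p else -1
    return False

est = sum(ruined(1) for _ in range(20_000)) / 20_000
print(ruin_prob(1), round(est, 2))  # exact value 1/2 vs. a Monte Carlo estimate
```

With p = 1/2 the same walk is recurrent and every trajectory is eventually absorbed, even though the change in each individual probability is small.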
