PAC Statistical Model Checking for Markov Decision Processes and Stochastic Games



slide-1
SLIDE 1

PAC Statistical Model Checking for Markov Decision Processes and Stochastic Games¹

Pranav Ashok, Jan Křetínský, Maximilian Weininger

Technical University of Munich

Highlights of Logic, Automata and Games, Warsaw, Poland

September 19, 2019

¹ Based on a paper presented at CAV 2019

slide-2
SLIDE 2

Stochastic Game

Reachability

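Objective

Maximizing player: maximize P(F )

Minimizing player: minimize P(F )

[Figure: example stochastic game with labels a, b, c and transition probabilities 0.2 and 0.8]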

Reachability in limited information stochastic games 2/6

slide-5
SLIDE 5

This work: Black-box (limited information setting)

Problem statement

Compute $V(s) = \max_{\sigma} \min_{\tau} P_s^{\sigma,\tau}(F\ ) = \min_{\tau} \max_{\sigma} P_s^{\sigma,\tau}(F\ )$ with guarantees

Unknown successor distribution
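
To make the max/min in the problem statement concrete, here is a minimal Python sketch of value iteration on a small game whose transition probabilities are known. The game itself (states `s0`, `s1`, `goal`, `sink`, and actions `a` to `d`) is a hypothetical toy example, not the game from the slides, and the known-model setting is exactly what the black-box setting of this work does not assume.

```python
from typing import Dict

# Hypothetical toy game (an assumption for illustration only).
# owner[s] says which player controls state s.
# trans[s][action] maps each successor to its probability.
owner = {"s0": "max", "s1": "min"}
trans = {
    "s0": {"a": {"s1": 1.0}, "b": {"goal": 0.2, "sink": 0.8}},
    "s1": {"c": {"goal": 0.5, "sink": 0.5}, "d": {"goal": 1.0}},
}
TARGET, SINK = "goal", "sink"


def value_iteration(iterations: int = 100) -> Dict[str, float]:
    """Approximate the reachability value V(s) from below."""
    v = {s: 0.0 for s in owner}
    v[TARGET], v[SINK] = 1.0, 0.0
    for _ in range(iterations):
        for s in owner:
            # Expected value of each action under the current estimate.
            action_values = [
                sum(p * v[t] for t, p in dist.items())
                for dist in trans[s].values()
            ]
            # The maximizer picks the best action, the minimizer the worst.
            v[s] = max(action_values) if owner[s] == "max" else min(action_values)
    return v


values = value_iteration()
```

In this toy game the minimizer at `s1` prefers action `c`, so the maximizer at `s0` cannot do better than moving to `s1`.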


slide-8
SLIDE 8

Background

◮ Seminal paper on Stochastic Games [Condon 90]

quadratic programming, strategy iteration, value iteration

◮ Algorithms not directly applicable to general SG

◮ First practical algorithm for general SG giving guarantees [Kelmendi et al. 2018]

◮ This work: first algorithm for limited information SG


slide-13
SLIDE 13

The Algorithm

Similar to Kelmendi et al. 2018

while U − L is large

  1. Simulate and estimate
  2. Back-propagate

The how

◮ Simulation finds important parts of state space

◮ Simulation computes Hoeffding confidence intervals

ball around the estimate such that the real probability falls in the ball with high confidence

◮ Information conservatively back-propagated

◮ Other tricks to ensure fixpoint convergence
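
The Hoeffding interval and the conservative back-propagation step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the standard two-sided Hoeffding bound, the function names are hypothetical, and "conservative" is read here as giving all probability mass not confirmed by the samples to the worst case (value 0 for the lower bound L, value 1 for the upper bound U).

```python
import math
from typing import Dict, Tuple


def hoeffding_radius(n: int, delta: float) -> float:
    """Two-sided Hoeffding radius: |estimate - p| <= radius with prob. >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def lower_probs(counts: Dict[str, int], delta: float) -> Dict[str, float]:
    """Lower confidence bound on each successor probability of one state-action pair."""
    n = sum(counts.values())
    radius = hoeffding_radius(n, delta)
    return {t: max(0.0, k / n - radius) for t, k in counts.items()}


def conservative_backup(
    counts: Dict[str, int],
    lower: Dict[str, float],
    upper: Dict[str, float],
    delta: float,
) -> Tuple[float, float]:
    """Back-propagate bounds through one action using only confirmed mass."""
    p_low = lower_probs(counts, delta)
    unconfirmed = 1.0 - sum(p_low.values())
    # Lower bound: the unconfirmed mass is assumed to reach value 0.
    lo = sum(p * lower[t] for t, p in p_low.items())
    # Upper bound: the unconfirmed mass is assumed to reach value 1.
    hi = sum(p * upper[t] for t, p in p_low.items()) + unconfirmed
    return lo, hi


# Hypothetical sample counts: 1000 simulated steps of one action.
counts = {"goal": 800, "sink": 200}
bounds = {"goal": 1.0, "sink": 0.0}
lo, hi = conservative_backup(counts, lower=bounds, upper=bounds, delta=0.01)
```

With more samples the radius shrinks, so the interval `[lo, hi]` tightens around the true action value. To obtain an overall guarantee, `delta` would have to be split across all estimated transitions, for example by a union bound.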


slide-14
SLIDE 14

Conclusion

◮ Algorithm for reachability in limited information MDP/SG

result ∈ [0.6 − ε, 0.6 + ε] with probability of going wrong 10⁻⁸

◮ Implemented and benchmarked in PRISM Model Checker

◮ First algorithm to do so for SG

◮ First practical algorithm for MDPs
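
To give a feel for an error probability as small as 10⁻⁸, the sketch below inverts the two-sided Hoeffding bound to get the number of samples needed to estimate a single transition probability to within a given radius. The radius 0.01 is an assumed illustrative value, and the result says nothing about the total cost of the algorithm, where the precision ε refers to the computed value rather than to one transition.

```python
import math


def samples_needed(epsilon: float, delta: float) -> int:
    """Smallest n such that sqrt(ln(2 / delta) / (2 n)) <= epsilon."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


# Assumed radius 0.01; error probability 1e-8 as in the conclusion.
n = samples_needed(epsilon=0.01, delta=1e-8)
```

Because the dependence on `delta` is only logarithmic, tightening the error probability is cheap compared with tightening the radius, which enters quadratically.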
