

slide-1
SLIDE 1

Statistical Model Checking for Markov Decision Processes

David Henriques

Joint work with João Martins, Paolo Zuliani, André Platzer and Edmund M. Clarke

QEST, September 18th, 2012

David Henriques (CMU) SMC for MDPs QEST’12 1 / 37

slide-2
SLIDE 2

Outline of this Presentation

1 Markov Decision Processes
2 Probabilistic MC and Statistical MC
3 SMC for MDPs
4 Why does it work?
5 Experimental Validation

David Henriques (CMU) SMC for MDPs QEST’12 2 / 37

slide-3
SLIDE 3

Markov Decision Processes

Summary

1 Markov Decision Processes
2 Probabilistic MC and Statistical MC
3 SMC for MDPs
4 Why does it work?
5 Experimental Validation

David Henriques (CMU) SMC for MDPs QEST’12 3 / 37

slide-4
SLIDE 4

Markov Decision Processes

Common Settings in MC

Fully probabilistic systems vs. non-deterministic systems

[Diagram: example transition systems with branching probabilities 2/3, 1/6, 1/4, 1/2, ...]

Non-deterministic Probabilistic Systems

David Henriques (CMU) SMC for MDPs QEST’12 4 / 37


slide-8
SLIDE 8

Markov Decision Processes

Markov Decision Processes

Definition [Markov Decision Process]
A (finite, state-labeled) MDP, M, is a tuple (S, si, A, τ, Λ, L) where:
- S is a finite set of states with initial state si;
- A is a finite set of action names;
- τ : S × A → Dist(S) is a probabilistic transition function;
- Λ is a set of propositions and L : S → 2^Λ is a labeling function.

[Diagram: example MDP with actions send, wait, reset; ack and fail states; transition probabilities 0.999 and 0.001]

David Henriques (CMU) SMC for MDPs QEST’12 5 / 37
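The tuple above maps naturally onto a small data structure. The following is an illustrative sketch (the model, state names, and helper function are ours, not from the talk), loosely echoing the send/ack/fail example on this slide:

```python
import random

# A minimal MDP sketch following the slide's tuple (S, s_i, A, tau, Lambda, L).
mdp = {
    "states": {"s0", "s1", "s2"},
    "init": "s0",
    "actions": {"send", "wait"},
    # tau: (state, action) -> distribution over successor states
    "tau": {
        ("s0", "send"): {"s1": 0.999, "s2": 0.001},
        ("s0", "wait"): {"s0": 1.0},
        ("s1", "wait"): {"s1": 1.0},
        ("s2", "wait"): {"s2": 1.0},
    },
    # L: state -> set of propositions that hold there
    "labels": {"s0": set(), "s1": {"ack"}, "s2": {"fail"}},
}

def step(mdp, state, action, rng=random):
    """Sample a successor state from tau(state, action)."""
    dist = mdp["tau"][(state, action)]
    r, acc = rng.random(), 0.0
    for succ, p in dist.items():
        acc += p
        if r <= acc:
            return succ
    return succ  # guard against floating-point rounding
```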


slide-12
SLIDE 12

Markov Decision Processes

How to choose actions?

[Diagram: MDP fragment with action probabilities ¾, ½, ½, ½, ½, ¼, 1]

Definition [Scheduler]
A memoryless scheduler for M, σ, is a function σ : S → Dist(S) s.t. for each s ∈ S, σ(s) = ∑_{a∈A} p_{s,a} τ(s, a) with ∑_{a∈A} p_{s,a} = 1.

Schedulers “solve” the nondeterminism.

David Henriques (CMU) SMC for MDPs QEST’12 6 / 37
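The mixing in the definition can be spelled out directly: a scheduler's weights p_{s,a} combine the per-action distributions into one successor distribution. A small sketch (the transition table and weights are illustrative, not from the talk):

```python
# sigma(s) = sum_a p[s][a] * tau(s, a): the scheduler mixes the MDP's
# action distributions into the Markov chain it induces.
tau = {
    ("s0", "a"): {"s1": 0.5, "s2": 0.5},
    ("s0", "b"): {"s2": 1.0},
}
p = {"s0": {"a": 0.25, "b": 0.75}}  # p_{s,a}, summing to 1 per state

def induced_dist(state):
    """Next-state distribution of the chain induced by the scheduler."""
    out = {}
    for a, pa in p[state].items():
        for succ, q in tau[(state, a)].items():
            out[succ] = out.get(succ, 0.0) + pa * q
    return out
```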


slide-16
SLIDE 16

Markov Decision Processes

Paths and Probabilities (Paths)

Definition [Path]
For M and σ, a path π is a sequence of states π0 · π1 · ... s.t. ∀i : σ(πi)(πi+1) > 0.

[Diagram: example path with transition probabilities ½, ¼, ¾, 1]

David Henriques (CMU) SMC for MDPs QEST’12 8 / 37
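Sampling a bounded path under a fixed memoryless scheduler is a one-loop simulation. A sketch (here σ maps a state directly to a successor distribution, as in the definition; the example chain is ours):

```python
import random

# Illustrative induced chain: sigma[s] is a successor distribution.
sigma = {
    "s0": {"s0": 0.5, "s1": 0.5},
    "s1": {"s1": 0.25, "s2": 0.75},
    "s2": {"s2": 1.0},
}

def sample_path(sigma, init, length, rng=random):
    """Sample pi_0 . pi_1 ... pi_length; every step has positive probability."""
    path = [init]
    for _ in range(length):
        dist = sigma[path[-1]]
        r, acc = rng.random(), 0.0
        for succ, q in dist.items():
            acc += q
            if r <= acc:
                break
        path.append(succ)
    return path
```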


slide-25
SLIDE 25

Markov Decision Processes

Paths and Probabilities (Probabilities)

Proposition
Each σ induces a probability measure Pσ over the set of paths, given by

Pσ({π0 · π1 · ... · πn · ∗ | ∗ is a path, π0 = si}) = ∏_{0≤i<n} σ(πi)(πi+1)

[Diagram: example prefix with probability ½ × ¼ × 1]

David Henriques (CMU) SMC for MDPs QEST’12 8 / 37
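The measure of a cylinder set is just the product of the one-step probabilities along the finite prefix, as in the slide's ½ × ¼ × 1 example. A sketch (the chain is illustrative):

```python
# P_sigma of all paths extending a finite prefix: multiply step probabilities.
sigma = {
    "s0": {"s0": 0.5, "s1": 0.5},
    "s1": {"s1": 0.25, "s2": 0.75},
    "s2": {"s2": 1.0},
}

def prefix_probability(sigma, prefix):
    p = 1.0
    for cur, nxt in zip(prefix, prefix[1:]):
        p *= sigma[cur].get(nxt, 0.0)  # 0 if the step is impossible
    return p
```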


slide-29
SLIDE 29

Probabilistic MC and Statistical MC

Summary

1 Markov Decision Processes
2 Probabilistic MC and Statistical MC
3 SMC for MDPs
4 Why does it work?
5 Experimental Validation

David Henriques (CMU) SMC for MDPs QEST’12 9 / 37


slide-31
SLIDE 31

Probabilistic MC and Statistical MC

Bounded LTL

Syntax of BLTL
ϕ := λ | ¬ϕ | ϕ ∨ ϕ | F≤n ϕ | G≤n ϕ | ϕ U≤n ϕ, where λ ∈ Λ.

Semantics of BLTL
- π ⊨ λ if λ ∈ L(π0)
- π ⊨ ¬ϕ if π ⊭ ϕ
- π ⊨ ϕ1 ∨ ϕ2 if π ⊨ ϕ1 or π ⊨ ϕ2
- π ⊨ F≤n ϕ if ∃i ≤ n : π|i ⊨ ϕ
- π ⊨ G≤n ϕ if ∀i ≤ n : π|i ⊨ ϕ
- π ⊨ ϕ1 U≤n ϕ2 if ∃i ≤ n : π|i ⊨ ϕ2 and ∀k < i : π|k ⊨ ϕ1

[Diagram: example traces illustrating F≤n a, G≤n a and a U≤n b]

David Henriques (CMU) SMC for MDPs QEST’12 10 / 37
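The bounded semantics can be checked recursively on a finite trace of label sets. A sketch (formula encoding and helper names are ours; it follows the usual convention that in U≤n the left operand holds strictly before the witness index):

```python
# Formulas are nested tuples, e.g. ("U", 5, ("ap", "a"), ("ap", "b")).
def holds(trace, i, phi):
    """Does the BLTL formula phi hold at position i of the trace?"""
    op = phi[0]
    if op == "ap":
        return phi[1] in trace[i]
    if op == "not":
        return not holds(trace, i, phi[1])
    if op == "or":
        return holds(trace, i, phi[1]) or holds(trace, i, phi[2])
    last = min(i + phi[1], len(trace) - 1)  # bound n, clipped to the trace
    if op == "F":
        return any(holds(trace, j, phi[2]) for j in range(i, last + 1))
    if op == "G":
        return all(holds(trace, j, phi[2]) for j in range(i, last + 1))
    if op == "U":
        for j in range(i, last + 1):
            if holds(trace, j, phi[3]):   # right operand: until is satisfied
                return True
            if not holds(trace, j, phi[2]):  # left operand broken first
                return False
        return False
    raise ValueError(op)
```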



slide-45
SLIDE 45

Probabilistic MC and Statistical MC

Probabilistic BLTL

The decision problem of MC in fully probabilistic settings is finding out whether, for a given parameter θ, Pσ({π : π ⊨ ϕ}) ≤ θ.

Proposition
This is a well-posed problem.

David Henriques (CMU) SMC for MDPs QEST’12 11 / 37

slide-48
SLIDE 48

Probabilistic MC and Statistical MC

We should be so lucky...

We may not have a scheduler, but we still want to guarantee properties... We make claims that hold for all schedulers, no matter how adversarial. The (decision) problem of MC for MDPs is finding out whether, for a given parameter θ, Pσ({π : π ⊨ ϕ}) ≤ θ for all σ.

David Henriques (CMU) SMC for MDPs QEST’12 12 / 37

slide-49
SLIDE 49

SMC for MDPs

Summary

1 Markov Decision Processes
2 Probabilistic MC and Statistical MC
3 SMC for MDPs
4 Why does it work?
5 Experimental Validation

David Henriques (CMU) SMC for MDPs QEST’12 13 / 37

slide-50
SLIDE 50

SMC for MDPs

SMC for MDPs

Basic idea “Learn the most adversarial scheduler (or a good enough approximation) by successively refining an initial guess”

David Henriques (CMU) SMC for MDPs QEST’12 14 / 37



slide-53
SLIDE 53

SMC for MDPs

Scheduler Evaluation

Same ideas as classical Statistical Model Checking

Inputs: a fully probabilistic system (the MDP together with σ), a BLTL formula (e.g. φ ≡ p1 U<12 (G<10 (¬p3))) and a probability threshold θ.

[Pipeline: sample traces → evaluate traces against φ → hypothesis testing → answer, once there is sufficient statistical evidence]

David Henriques (CMU) SMC for MDPs QEST’12 15 / 37
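The evaluation loop can be sketched in a few lines. Note this simplified version uses a fixed sample size and compares the empirical satisfaction rate to θ, whereas a real SMC engine would use a sequential hypothesis test; the toy model and all names are ours:

```python
import random

def estimate_satisfaction(sample_trace, check, n_samples, rng=random):
    """Fraction of sampled traces satisfying the property."""
    hits = sum(check(sample_trace(rng)) for _ in range(n_samples))
    return hits / n_samples

# Toy stand-in: a "trace" is one Bernoulli outcome with success prob. 0.3,
# so the true satisfaction probability is 0.3.
random.seed(0)
p_hat = estimate_satisfaction(
    sample_trace=lambda rng: rng.random() < 0.3,
    check=lambda ok: ok,
    n_samples=10_000,
)
# p_hat is close to 0.3, so for theta = 0.5 the claim P <= theta is accepted
```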


slide-61
SLIDE 61

SMC for MDPs

Scheduler Evaluation

Record whether the state-action pairs crossed by sample traces satisfied ϕ. The empirical quality Q̂σ of a visited pair (s, a) is

Q̂σ(s, a) = #((s, a) seen in satisfying traces) / #((s, a) seen)

and, as #samples → ∞,

Q̂σ(s, a) → Qσ(s, a) ≡ P(π ⊨ ϕ | (s, a) ∈ π).

[Diagram: state s with actions a, b, c; a: 500 tries, 500 successes; b: 1000 tries, 0 successes; c: 700 tries, 525 successes; giving Q(s,a) = 1, Q(s,b) = 0, Q(s,c) = ¾]

David Henriques (CMU) SMC for MDPs QEST’12 16 / 37
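The worked example on the slide can be checked directly (counts taken from the slide; the code itself is an illustrative sketch):

```python
# (#seen, #seen in satisfying traces) per (state, action), from the slide.
counts = {
    ("s", "a"): (500, 500),
    ("s", "b"): (1000, 0),
    ("s", "c"): (700, 525),
}

def q_hat(sa):
    """Empirical quality: fraction of visits that lay on satisfying traces."""
    seen, sat = counts[sa]
    return sat / seen if seen else 0.0
```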



slide-66
SLIDE 66

SMC for MDPs

Scheduler Improvement

A new scheduler σ′ is obtained from σ by giving higher probability to transitions with higher quality.

Update Rule: σ′(s, a) = Q̂σ(s, a) / ∑_{b∈A} Q̂σ(s, b)

[Diagram: with Q(s,a) = 1, Q(s,b) = 0, Q(s,c) = ¾, the update gives σ′(s,a) = 1/(1 + ¾ + 0) = 4/7, σ′(s,b) = 0, σ′(s,c) = 3/7]

David Henriques (CMU) SMC for MDPs QEST’12 17 / 37
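The update rule is a plain renormalization of the qualities over the actions; the slide's numbers come out as expected (a sketch, quality values from the worked example):

```python
# Scheduler improvement: renormalise Q_hat over the actions at state s.
q = {"a": 1.0, "b": 0.0, "c": 0.75}   # the slide's worked example
total = sum(q.values())               # 1 + 0 + 3/4 = 7/4
sigma_new = {a: v / total for a, v in q.items()}
# sigma_new is {"a": 4/7, "b": 0, "c": 3/7}, matching the slide
```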



slide-72
SLIDE 72

SMC for MDPs

History and Greediness

What if we explore too little? If there are state-action pairs with Q̂(s, a) = 0, keep a history parameter h and update instead

σ′(s, a) = h·σ(s, a) + (1 − h)·Q̂σ(s, a) / ∑_{b∈A} Q̂σ(s, b)

This avoids “blocking” transitions.

[Diagram: starting from σ(s,·) = (⅓, ⅓, ⅓), the smoothed update gives σ′(s,a) = ⅓·h + 4/7·(1−h), σ′(s,b) = ⅓·h + 0·(1−h) > 0, σ′(s,c) = ⅓·h + 3/7·(1−h)]

David Henriques (CMU) SMC for MDPs QEST’12 18 / 37
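The history-smoothed update keeps every previously possible action alive, as the slide's σ′(s,b) > 0 example shows. A sketch (h and the numbers are the slide's worked example; the code is ours):

```python
# sigma'(s,a) = h*sigma(s,a) + (1-h) * Q_hat(s,a) / sum_b Q_hat(s,b)
h = 0.5                                     # history parameter (illustrative)
sigma_old = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
q = {"a": 1.0, "b": 0.0, "c": 0.75}
total = sum(q.values())
sigma_new = {a: h * sigma_old[a] + (1 - h) * q[a] / total for a in q}
# action b keeps positive probability h/3 even though Q_hat(s,b) = 0
```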

slide-73
SLIDE 73

SMC for MDPs

History and Greediness

What if we explore too much? Keep a greediness parameter ǫ and give all probability to the best action except for ǫ, which is distributed according to the update rule This avoids slow updates.

David Henriques (CMU) SMC for MDPs QEST’12 19 / 37

slide-74
SLIDE 74

SMC for MDPs

History and Greediness

What if we explore too much? Keep a greediness parameter ǫ and give all probability to the best action except for ǫ, which is distributed according to the update rule This avoids slow updates.

a b c σ’(s,a) = 4/7 σ’(s,b) = 0 σ’(s,c) = 3/7

David Henriques (CMU) SMC for MDPs QEST’12 19 / 37

slide-75
SLIDE 75

SMC for MDPs

History and Greediness

What if we explore too much? Keep a greediness parameter ǫ and give all probability to the best action except for ǫ, which is distributed according to the update rule This avoids slow updates.

a b c σ’(s,a) = 4/7 σ’(s,b) = 0 σ’(s,c) = 3/7 ε

David Henriques (CMU) SMC for MDPs QEST’12 19 / 37

slide-76
SLIDE 76

SMC for MDPs

History and Greediness

What if we explore too much? Keep a greediness parameter ǫ and give all probability to the best action except for ǫ, which is distributed according to the update rule This avoids slow updates.

a b c σ’(s,a) = 4/7 σ’(s,b) = 0 σ’(s,c) = 3/7 1-ε

David Henriques (CMU) SMC for MDPs QEST’12 19 / 37

slide-77
SLIDE 77

SMC for MDPs

History and Greediness

What if we explore too much? Keep a greediness parameter ε: the current best action receives an extra probability mass ε, and the remaining 1 − ε is distributed according to the update rule. This avoids slow updates.

[Diagram: with the update σ′(s,·) = (4/7, 0, 3/7), greediness gives σ′(s,a) = ε + 4/7·(1−ε), σ′(s,b) = 0·(1−ε), σ′(s,c) = 3/7·(1−ε)]

David Henriques (CMU) SMC for MDPs QEST’12 19 / 37


slide-79
SLIDE 79

SMC for MDPs

If at first you don’t succeed...

If σ makes Pσ({π : π ⊨ ϕ}) > θ, the property is surely false. If not:
- we may be converging towards a local optimum;
- the property may be true.

David Henriques (CMU) SMC for MDPs QEST’12 20 / 37

slide-80
SLIDE 80

SMC for MDPs

If at first you don’t succeed...

Algorithms like this are called “false-biased Monte Carlo algorithms”: a False answer can be trusted, while a True answer means we have to reconsider and rerun a couple of times.

Confidence increases exponentially with the number of times we restart.

David Henriques (CMU) SMC for MDPs QEST’12 21 / 37
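The restart argument can be made concrete: if each independent run detects a false property (i.e. answers False) with probability at least p, the chance that c restarts all wrongly answer True is at most (1 − p)^c, so confidence grows exponentially in c. A small illustrative computation (p and c are made-up values, not from the talk):

```python
# Confidence after c independent restarts of a false-biased Monte Carlo
# algorithm whose per-run detection probability is at least p.
p = 0.5
confidence = {c: 1 - (1 - p) ** c for c in (1, 5, 10)}
# e.g. ten restarts already give confidence 1 - 2**-10
```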

slide-81
SLIDE 81

Why does it work?

Summary

1 Markov Decision Processes
2 Probabilistic MC and Statistical MC
3 SMC for MDPs
4 Why does it work?
5 Experimental Validation

David Henriques (CMU) SMC for MDPs QEST’12 22 / 37

slide-84
SLIDE 84

Why does it work?

Value

Definition [Value]
The value of a state s under a scheduler σ is defined as V^σ(s) = P(π ⊨ ϕ | (s, a) ∈ π, a ∈ A(s)).

Notice that the MC problem can be reduced to finding V^σ(si), and that

V^σ(s) = ∑_{a∈A(s)} σ(s, a) Q^σ(s, a)

David Henriques (CMU) SMC for MDPs QEST’12 23 / 37
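This identity is easy to sanity-check on the running worked example (the scheduler and qualities are the numbers from the scheduler-improvement slides; the sketch is ours):

```python
# V_sigma(s) = sum_a sigma(s,a) * Q_sigma(s,a), on the worked example.
sigma = {"a": 4 / 7, "b": 0.0, "c": 3 / 7}
q = {"a": 1.0, "b": 0.0, "c": 0.75}
value = sum(sigma[a] * q[a] for a in sigma)
# value = 4/7 + (3/7)*(3/4) = 25/28
```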

slide-85
SLIDE 85

Why does it work?

Value

Definition [Local Update]
Let σ and σ′ be two schedulers. The local update of σ by σ′ at s, written σ[σ(s) → σ′(s)], is the scheduler that behaves like σ everywhere except at s, where it behaves like σ′.

[Diagram: schedulers σ and σ′, and the locally updated σ[σ(s) → σ′(s)]]

David Henriques (CMU) SMC for MDPs QEST’12 24 / 37


slide-88
SLIDE 88

Why does it work?

Value

Theorem [SB]
Let σ and σ′ be two schedulers. If ∀s ∈ S : V^{σ[σ(s)→σ′(s)]}(s) ≥ V^σ(s), then ∀s ∈ S : V^{σ′}(s) ≥ V^σ(s).

Corollary
Let σ be the input scheduler and σ′ the output of Scheduler Improvement. Then ∀s ∈ S : V^{σ′}(s) ≥ V^σ(s) and, in particular, V^{σ′}(si) ≥ V^σ(si).

David Henriques (CMU) SMC for MDPs QEST’12 25 / 37

slide-89
SLIDE 89

Experimental Validation

Summary

1 Markov Decision Processes
2 Probabilistic MC and Statistical MC
3 SMC for MDPs
4 Why does it work?
5 Experimental Validation

David Henriques (CMU) SMC for MDPs QEST’12 26 / 37

slide-90
SLIDE 90

Experimental Validation

Experimental Validation

We divided models into three categories:
- heavily structured models;
- structured models;
- unstructured models.

Comparisons were made against PRISM, a state-of-the-art probabilistic model checker.

David Henriques (CMU) SMC for MDPs QEST’12 27 / 37

slide-91
SLIDE 91

Experimental Validation

Highly Structured Models

CSMA - Carrier Sense, Multiple Access protocol
WLAN - IEEE 802.11 wireless LAN protocol

David Henriques (CMU) SMC for MDPs QEST’12 28 / 37


slide-97
SLIDE 97

Experimental Validation

Highly Structured Models

CSMA 3 4    θ:       0.5    0.8    0.85   0.9    0.95   | PRISM
            output:  F      F      F      T      T      | 0.86
            t:       1.7    11.5   35.9   115.7  111.9  | 136

CSMA 3 6    θ:       0.3    0.4    0.45   0.5    0.8    | PRISM
            output:  F      F      F      T      T      | 0.48
            t:       2.5    9.4    18.8   133.9  119.3  | 2995

CSMA 4 4    θ:       0.5    0.7    0.8    0.9    0.95   | PRISM
            output:  F      F      F      F      T      | 0.93
            t:       3.5    3.7    17.5   69.0   232.8  | 16244

CSMA 4 6    θ:       0.5    0.7    0.8    0.9    0.95   | PRISM
            output:  F      F      F      F      F      | timeout
            t:       3.7    4.1    4.2    26.2   258.9  | timeout

WLAN 5      θ:       0.1    0.15   0.2    0.25   0.5    | PRISM
            output:  F      F      T      T      T      | 0.18
            t:       4.9    11.1   124.7  104.7  103.2  | 1.6

WLAN 6      θ:       0.1    0.15   0.2    0.25   0.5    | PRISM
            output:  F      F      T      T      T      | 0.18
            t:       5.0    11.3   127.0  104.9  102.9  | 1.6

David Henriques (CMU) SMC for MDPs QEST’12 29 / 37


slide-99
SLIDE 99

Experimental Validation

Highly Structured Models

Takeaways
- Symmetry makes the number of “meaningful” actions relatively small;
- SMC works well in highly structured systems;
- Exact methods still work best in most cases.

David Henriques (CMU) SMC for MDPs QEST’12 30 / 37

slide-100
SLIDE 100

Experimental Validation

Structured Models

Motion Planning - Two robots move around an n × n plant.

P≤θ( (Safe1 U≤30 (pickup1 ∧ (Safe′1 U≤30 RendezVous)))
   ∧ (Safe2 U≤30 (pickup2 ∧ (Safe′2 U≤30 RendezVous))) )
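As a concrete illustration of the bounded-until operator used in this property (this is a sketch, not the tool's actual implementation; the trace encoding and predicate names are hypothetical):

```python
def bounded_until(trace, phi1, phi2, k):
    """Check phi1 U<=k phi2 on a finite sampled trace (a list of states):
    phi2 must hold at some step j <= k, with phi1 holding at every earlier step."""
    for state in trace[:k + 1]:
        if phi2(state):
            return True
        if not phi1(state):
            return False
    return False

# Hypothetical toy trace: each state is a dict of atomic propositions.
trace = [{"safe": True, "pickup": False},
         {"safe": True, "pickup": False},
         {"safe": True, "pickup": True}]
print(bounded_until(trace, lambda s: s["safe"], lambda s: s["pickup"], 30))  # True
```

SMC estimates the probability of the full property by evaluating checks like this on many simulated traces of the MDP under a candidate scheduler.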

David Henriques (CMU) SMC for MDPs QEST’12 31 / 37


slide-108
SLIDE 108

Experimental Validation

Structured Models

Model                 θ values         SMC result   SMC time t (s)        PRISM (prob, time)
Robot n = 50, r = 1   0.9 0.95 0.99    F F F        23.4 27.5 40.8        0.999, 1252.7
Robot n = 50, r = 2   0.9 0.95 0.99    F F F        71.7 73.9 250.4       0.999, 3651.045
Robot n = 75, r = 2   0.95 0.97 0.99   F F F        382.5 377.1 2676.9    timeout
Robot n = 200, r = 3  0.85 0.9 0.95    F F T        903.1 1129.3 2302.8   timeout

(The i-th entry of "SMC result" and "SMC time" corresponds to the i-th value of θ.)

David Henriques (CMU) SMC for MDPs QEST’12 32 / 37

slide-109
SLIDE 109

Experimental Validation

Structured Models

Takeaways:
Exact methods cannot exploit symmetry as much here;
The number of truly "meaningful" actions is still relatively small;
SMC works very well in structured systems.

David Henriques (CMU) SMC for MDPs QEST’12 33 / 37


slide-112
SLIDE 112

Experimental Validation

Unstructured Models

Uniform random model - the number of enabled actions, the number of target states per choice, the targets themselves and the transition probabilities are all drawn uniformly at random. Objective: as little structure as possible. Results are very unpredictable and typically poor: less than 0.3 probability mass gathered after a few hours with SMC, and exact methods fail to produce answers at all.
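A minimal sketch of how such an unstructured benchmark might be generated (the function and parameter names are illustrative, not the authors' actual generator):

```python
import random

def random_mdp(n_states, max_actions=4, max_targets=3, seed=0):
    """Build an unstructured MDP: for each state, draw a uniform number of
    enabled actions; for each action, draw uniformly chosen target states
    and a normalized, uniformly drawn probability distribution over them."""
    rng = random.Random(seed)
    mdp = {}
    for s in range(n_states):
        actions = []
        for _ in range(rng.randint(1, max_actions)):
            targets = rng.sample(range(n_states), rng.randint(1, max_targets))
            weights = [rng.uniform(0.1, 1.0) for _ in targets]  # bounded away from 0
            total = sum(weights)
            actions.append({t: w / total for t, w in zip(targets, weights)})
        mdp[s] = actions
    return mdp

mdp = random_mdp(10)
```

Because nothing distinguishes one action from another, neither SMC's sampling nor a symbolic encoding finds any regularity to exploit in models like this.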

David Henriques (CMU) SMC for MDPs QEST’12 34 / 37

slide-113
SLIDE 113

Experimental Validation

Unstructured Models

Takeaways:
Lack of structure makes this problem very hard;
SMC cannot focus on "good" areas;
Symbolic methods cannot exploit symmetry when encoding the system.

David Henriques (CMU) SMC for MDPs QEST’12 35 / 37

slide-114
SLIDE 114

Experimental Validation

Conclusions and Future Work

Conclusions:
Statistical model checking method for systems combining probabilism and non-determinism;
Empirically and theoretically validated;
Uses bounded memory;
Efficient for complex but structured models.

Future Work:
Unbounded LTL;
Distributed systems;
Schedulers with memory;
...

David Henriques (CMU) SMC for MDPs QEST’12 36 / 37

slide-115
SLIDE 115

Experimental Validation

Bibliography

Zuliani, P., Platzer, A., Clarke, E.M. Bayesian statistical model checking with applications to Simulink/Stateflow verification. HSCC, 2010.
Sutton, R.S., Barto, A.G. Reinforcement Learning: An Introduction. MIT Press, 1998.
Brassard, G., Bratley, P. Algorithmics: Theory and Practice. Prentice Hall, 1988.

David Henriques (CMU) SMC for MDPs QEST’12 37 / 37

slide-116
SLIDE 116

Experimental Validation

False Biased Monte Carlo Algorithms

Since our algorithm is false biased (a result of "false" is always accurate), we can simply run the algorithm again to exponentially increase our confidence in a "probably true" result.

Bounding theorem [BB]: If the probability of success of a single trial of a false-biased algorithm is greater than

p = 1 − 2^((log₂ η) / T),

where T is the number of iterations of the algorithm, then we can ensure a correctness level of 1 − η (0 < η < 1).
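The bound can be checked numerically: at a per-trial success probability exactly at the threshold, the only way to be wrong is for all T independent runs to fail, which happens with probability (1 − p)^T = η. The sketch below is an illustration of the theorem, not the paper's implementation:

```python
import math

def threshold(eta, T):
    """Per-trial success probability needed so that T independent runs of a
    false-biased algorithm reach correctness level 1 - eta ([BB] bound)."""
    return 1.0 - 2.0 ** (math.log2(eta) / T)

# Residual error after T runs at the threshold probability is exactly eta;
# any higher per-trial success probability drives the error below eta.
eta, T = 0.01, 20
p = threshold(eta, T)
print((1.0 - p) ** T)  # ≈ 0.01
```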

Back David Henriques (CMU) SMC for MDPs QEST’12 37 / 37

slide-117
SLIDE 117

Experimental Validation

Proof of Improvement Theorem

For an ε-soft scheduler σ (so σ(s,a) ≥ p_ε(s,a) for every action a, with Σ_{a∈A(s)} p_ε(s,a) = ε), the locally improved scheduler is at least as good:

V^{σ[σ(s)→σ′(s)]}(s)
  = Σ_{a∈A(s)} p_ε(s,a) Q^σ(s,a) + (1 − ε) max_{a∈A(s)} Q^σ(s,a)
  = Σ_{a∈A(s)} p_ε(s,a) Q^σ(s,a) + [Σ_{a∈A(s)} σ(s,a) − Σ_{a∈A(s)} p_ε(s,a)] max_{a∈A(s)} Q^σ(s,a)
  = Σ_{a∈A(s)} p_ε(s,a) Q^σ(s,a) + Σ_{a∈A(s)} (σ(s,a) − p_ε(s,a)) max_{a∈A(s)} Q^σ(s,a)
  ≥ Σ_{a∈A(s)} p_ε(s,a) Q^σ(s,a) + Σ_{a∈A(s)} (σ(s,a) − p_ε(s,a)) Q^σ(s,a)
  = Σ_{a∈A(s)} p_ε(s,a) Q^σ(s,a) + Σ_{a∈A(s)} σ(s,a) Q^σ(s,a) − Σ_{a∈A(s)} p_ε(s,a) Q^σ(s,a)
  = Σ_{a∈A(s)} σ(s,a) Q^σ(s,a) = V^σ(s)

(The second equality uses Σ_{a∈A(s)} σ(s,a) = 1; the inequality uses Q^σ(s,a) ≤ max_{a∈A(s)} Q^σ(s,a) together with σ(s,a) − p_ε(s,a) ≥ 0.)

The scheduler update used in the algorithm is

σ′(s,a) = (1 − h) [ I{a = argmax_{a′∈A(s)} Q^σ(s,a′)} · (1 − ε) + ε · Q^σ(s,a) / Σ_{b∈A(s)} Q^σ(s,b) ] + h · σ(s,a)
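This scheduler update can be transcribed directly. The sketch below is illustrative only: the action names are made up, the Q-values are assumed positive so the normalization term is well defined, and ties in the argmax are broken arbitrarily.

```python
def improved_scheduler(sigma, q, h, eps):
    """Compute sigma'(s, a) = (1-h) * [ I{a = argmax Q} * (1-eps)
    + eps * Q(s,a) / sum_b Q(s,b) ] + h * sigma(s, a) for a fixed state s.
    sigma and q are dicts mapping action -> probability / (positive) Q-value."""
    best = max(q, key=q.get)       # greedy action argmax_a Q(s, a)
    total_q = sum(q.values())      # normalizer sum_b Q(s, b)
    return {
        a: (1 - h) * ((1 - eps) * (a == best) + eps * q[a] / total_q)
           + h * sigma[a]
        for a in sigma
    }

sigma = {"left": 0.5, "right": 0.5}
q = {"left": 1.0, "right": 3.0}
new = improved_scheduler(sigma, q, h=0.2, eps=0.1)
# new is still a probability distribution, shifted toward the greedy action
```

Note the structure of the update: with weight 1 − h it mixes greedy exploitation (probability 1 − ε on the argmax) with Q-proportional exploration, and with weight h it retains the old scheduler, so the result always sums to 1.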

Back David Henriques (CMU) SMC for MDPs QEST’12 37 / 37