Probabilistic Model Checking and Controller Synthesis - PowerPoint PPT Presentation



SLIDE 1

AVACS Autumn School, October 2015

Probabilistic Model Checking and Controller Synthesis


Dave Parker


University of Birmingham

SLIDE 2

Overview

  • Probabilistic model checking

− verification vs. strategy/controller synthesis
− Markov decision processes (MDPs)
− example: robot navigation

  • Multi-objective probabilistic model checking

− examples: power management/team-formation

  • Stochastic (multi-player) games

− example: energy management

  • Permissive controller synthesis
SLIDE 3

Motivation

  • Verifying probabilistic systems…

− unreliable or unpredictable behaviour

  • failures of physical components
  • message loss in wireless communication
  • unreliable sensors/actuators

− randomisation in algorithms/protocols

  • random back-off in communication protocols
  • random routing to reduce flooding or provide anonymity

  • We need to verify quantitative system properties

− “the probability of the airbag failing to deploy
 within 0.02 seconds of being triggered is at most 0.001”
− not just correctness: reliability, timeliness, performance, …
− not just verification: correctness by construction

SLIDE 4

Probabilistic model checking

  • Construction and analysis of probabilistic models

− state-transition systems labelled with probabilities
 (e.g. Markov chains, Markov decision processes)
− from a description in a high-level modelling language

  • Properties expressed in temporal logic, e.g. PCTL:

− trigger → P≥0.999 [ F≤20 deploy ]
− “the probability of the airbag deploying within
 20ms of being triggered is at least 0.999”
− properties checked against models using
 exhaustive search and numerical computation

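The bounded-reachability check above can be sketched as a simple fixed-point iteration. The Markov chain below is a made-up three-state stand-in (its states and probabilities are illustrative, not the deck's airbag model):

```python
# Sketch: checking a bounded-reachability PCTL property on a tiny,
# hypothetical discrete-time Markov chain.
P = {  # transition probabilities P[s][s']
    "trigger": {"deploy": 0.999, "fault": 0.001},
    "fault":   {"deploy": 0.5, "fault": 0.5},
    "deploy":  {"deploy": 1.0},
}

def prob_reach_within(P, target, k):
    """x[s] = probability of reaching `target` from s within k steps."""
    x = {s: 1.0 if s == target else 0.0 for s in P}
    for _ in range(k):
        x = {s: 1.0 if s == target else
             sum(p * x[t] for t, p in P[s].items())
             for s in P}
    return x

# Does trigger → P≥0.999 [ F≤20 deploy ] hold?
p = prob_reach_within(P, "deploy", 20)["trigger"]
print(p >= 0.999)
```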

SLIDE 5

Probabilistic model checking

  • Many types of probabilistic models supported
  • Wide range of quantitative properties, expressible in
 temporal logic (probabilities, timing, costs, rewards, …)

  • Often focus on numerical results (probabilities etc.)

− analyse trends, look for system flaws, anomalies

  • P≤0.1 [ F fail ] – “the probability of a
 failure occurring is at most 0.1”

  • P=? [ F fail ] – “what is the probability
 of a failure occurring?”
SLIDE 6

Probabilistic model checking

  • Many types of probabilistic models supported
  • Wide range of quantitative properties, expressible in
 temporal logic (probabilities, timing, costs, rewards, …)

  • Often focus on numerical results (probabilities etc.)

− analyse trends, look for system flaws, anomalies

  • Provides "exact" numerical results/guarantees

− compared to, for example, simulation

  • Combines numerical & exhaustive analysis

− especially useful for nondeterministic models

  • Fully automated, tools available, widely applicable

− network/communication protocols, security, biology,
 robotics & planning, power management, …

SLIDE 7

Markov decision processes (MDPs)

  • Markov decision processes (MDPs)

− widely used also in: AI, planning, optimal control, …
− model nondeterministic as well as probabilistic behaviour

  • Nondeterminism for:

− control: decisions made by a controller or scheduler
− adversarial behaviour of the environment
− concurrency/scheduling: interleavings of parallel components
− abstraction, or under-specification, of unknown behaviour

[MDP diagram: states s0–s3, labels {init}, {succ}, {err}, actions a, b, c]

SLIDE 8

Strategies

  • A strategy (or “policy”, “scheduler”, “adversary”)

− is a resolution of nondeterminism, based on history
− is (formally) a mapping σ from finite paths to distributions
− induces an (infinite-state) discrete-time Markov chain

  • Classes of strategies:

− randomisation: deterministic or randomised
− memory: memoryless, finite-memory, or infinite-memory

[MDP diagram: states s0–s3, labels {init}, {succ}, {err}, actions a, b, c]
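To make the definition concrete, here is a minimal sketch of a strategy resolving nondeterminism. The MDP is hypothetical (the slide's diagram is not fully reproduced in the text), with states s0–s3 and actions a, b, c:

```python
# Sketch: a small MDP and the Markov chain induced by a memoryless
# deterministic strategy. The MDP below is hypothetical.
mdp = {  # mdp[state][action] = {successor: probability}
    "s0": {"a": {"s1": 1.0}},
    "s1": {"b": {"s0": 0.7, "s1": 0.3},
           "c": {"s2": 0.9, "s3": 0.1}},
    "s2": {"a": {"s2": 1.0}},   # labelled {succ}
    "s3": {"a": {"s3": 1.0}},   # labelled {err}
}

def induce_dtmc(mdp, sigma):
    """Resolve nondeterminism with a memoryless strategy sigma: state -> action."""
    return {s: dict(mdp[s][sigma[s]]) for s in mdp}

sigma = {"s0": "a", "s1": "c", "s2": "a", "s3": "a"}
dtmc = induce_dtmc(mdp, sigma)
print(dtmc["s1"])  # {'s2': 0.9, 's3': 0.1}
```

A history-dependent strategy (like the "b then c" example on the next slide) would instead map finite paths to actions, inducing a Markov chain over path histories.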

SLIDE 9

Example strategy

  • Strategy σ which picks b then c in s1

− σ is finite-memory
 and deterministic

  • Fragment of induced Markov chain:

[Diagram: fragment of the induced Markov chain, whose states are path histories s0, s0s1, s0s1s0, s0s1s1, s0s1s0s1, …, shown next to the MDP over s0–s3]

SLIDE 10

Verification vs. Strategy synthesis

  • 1. Verification

− quantify over all possible
 strategies (i.e. best/worst-case)
− P≤0.1 [ F err ] : “the probability of an
 error occurring is ≤ 0.1 for all strategies”
− applications: randomised communication
 protocols, randomised distributed algorithms, security, …

  • 2. Strategy synthesis

− generation of "correct-by-construction" controllers
− P≤0.1 [ F err ] : "does there exist a strategy for which the probability of an error occurring is ≤ 0.1?”
− applications: robotics, power management, security, …

  • Two dual problems; same underlying computation:

− compute optimal (minimum or maximum) values

[MDP diagram: states s0–s3, labels {init}, {succ}, {err}, actions a, b, c]
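The shared computation can be sketched as value iteration for Pmax=? [ F target ], with an optimal memoryless deterministic strategy read off from the converged values. The MDP below is a simplified, hypothetical stand-in for the running examples:

```python
# Sketch: value iteration for Pmax=? [ F target ] on an MDP, plus
# extraction of an optimal memoryless deterministic strategy.
def max_reach(mdp, targets, eps=1e-10):
    x = {s: 1.0 if s in targets else 0.0 for s in mdp}
    while True:
        nx = {}
        for s in mdp:
            if s in targets:
                nx[s] = 1.0
            else:  # best one-step value over available actions
                nx[s] = max((sum(p * x[t] for t, p in dist.items())
                             for dist in mdp[s].values()), default=0.0)
        if max(abs(nx[s] - x[s]) for s in mdp) < eps:
            return nx
        x = nx

def optimal_strategy(mdp, targets, x):
    sigma = {}
    for s in mdp:
        if s in targets or not mdp[s]:
            continue
        sigma[s] = max(mdp[s],
                       key=lambda a: sum(p * x[t] for t, p in mdp[s][a].items()))
    return sigma

mdp = {  # hypothetical MDP
    "s0": {"east": {"s1": 0.5, "s2": 0.5}, "south": {"s2": 1.0}},
    "s1": {"south": {"goal": 0.8, "s2": 0.2}},
    "s2": {"stuck": {"s2": 1.0}},
    "goal": {},
}
x = max_reach(mdp, {"goal"})
sigma = optimal_strategy(mdp, {"goal"}, x)
print(round(x["s0"], 4), sigma["s0"])  # 0.4 east
```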

SLIDE 11

Running example

  • Example MDP

− robot moving through terrain divided into a 3 x 2 grid

[MDP diagram: 3 x 2 grid of states s0–s5, labels {goal1}, {goal2}, {hazard}; actions east, south, west, north, stuck; transition probabilities such as 0.5, 0.8, 0.1, 0.6, 0.4, 0.9]

SLIDE 12

Example - Reachability

[Robot navigation MDP diagram]

Verify: P≤0.6 [ F goal1 ]
Synthesise for: P≥0.4 [ F goal1 ]
 ⇓
Compute: Pmax=? [ F goal1 ]

Optimal strategies:
 memoryless and deterministic
Computation:
 graph analysis + numerical soln.
 (linear programming, value
 iteration, policy iteration)

SLIDE 13

Example - Reachability

[Robot navigation MDP diagram]

Linear program over x0, x1 (minimise):
 x0 ≥ x1 (east)
 x1 ≥ 0.5 (south)
 ⇒ value = 0.5

[Plot: feasible region over x0, x1, with the minimum at a corner point]

Verify: P≤0.6 [ F goal1 ]
Synthesise for: P≥0.4 [ F goal1 ]
 ⇓
Compute: Pmax=? [ F goal1 ]

Optimal strategies:
 memoryless and deterministic
Computation:
 graph analysis + numerical soln.
 (linear programming, value
 iteration, policy iteration)
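Of the solution methods listed, policy iteration can be sketched as follows: evaluate the current memoryless policy by solving the induced Markov chain's linear equations, then greedily improve the policy until no action is strictly better. The MDP and the small Gaussian-elimination solver below are illustrative, not the slides' exact example:

```python
# Sketch: policy iteration for Pmax=? [ F target ] on a tiny MDP.
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def policy_iteration(mdp, targets):
    states = [s for s in mdp if s not in targets and mdp[s]]
    idx = {s: i for i, s in enumerate(states)}
    sigma = {s: next(iter(mdp[s])) for s in states}  # arbitrary initial policy
    while True:
        # Evaluate: x_s = prob. into targets + sum over successors
        n = len(states)
        A = [[0.0] * n for _ in range(n)]
        b = [0.0] * n
        for s in states:
            i = idx[s]
            A[i][i] = 1.0
            for t, p in mdp[s][sigma[s]].items():
                if t in targets:
                    b[i] += p
                elif t in idx:
                    A[i][idx[t]] -= p
        sol = solve_linear(A, b)
        x = {s: sol[idx[s]] for s in states}
        x.update({t: 1.0 for t in targets})
        # Improve: switch to a strictly better action where one exists
        changed = False
        for s in states:
            best = max(mdp[s], key=lambda a: sum(p * x.get(t, 0.0)
                                                 for t, p in mdp[s][a].items()))
            val = sum(p * x.get(t, 0.0) for t, p in mdp[s][best].items())
            if val > x[s] + 1e-12:
                sigma[s], changed = best, True
        if not changed:
            return x, sigma

mdp = {  # hypothetical MDP; "fail" is an absorbing dead end
    "s0": {"south": {"s2": 1.0}, "east": {"s1": 1.0}},
    "s1": {"go": {"goal": 0.5, "s2": 0.5}},
    "s2": {"go": {"goal": 0.9, "fail": 0.1}},
    "goal": {}, "fail": {},
}
x, sigma = policy_iteration(mdp, {"goal"})
print(round(x["s0"], 3), sigma["s0"])  # 0.95 east
```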

SLIDE 14

Example - Reachability

[Robot navigation MDP diagram]

Optimal strategy:
 s0 : east
 s1 : south
 s2 : -
 s3 : -
 s4 : east
 s5 : -
 ⇒ value = 0.5

Verify: P≤0.6 [ F goal1 ]
Synthesise for: P≥0.4 [ F goal1 ]
 ⇓
Compute: Pmax=? [ F goal1 ]

Optimal strategies:
 memoryless and deterministic
Computation:
 graph analysis + numerical soln.
 (linear programming, value
 iteration, policy iteration)

SLIDE 15

Linear temporal logic (LTL)

  • Probabilistic LTL (multiple temporal operators)

− e.g. Pmax=? [ (G¬hazard) ∧ (GF goal1) ] – "maximum probability
 of avoiding hazard and visiting goal1 infinitely often?"
− e.g. Pmax=? [ ¬zone3 U (zone1 ∧ F zone4) ] – "max. probability of patrolling zones 1 then 4, without passing through 3".

  • Probabilistic model checking

− convert LTL formula ψ to
 deterministic automaton Aψ
 (Buchi, Rabin, finite, …)
− build/solve product MDP M⊗Aψ
− reduces to reachability problem
− optimal strategies are:
  • deterministic
  • finite-memory

[Det. Buchi automaton Aψ for ψ = G¬h ∧ GF g1: states q0, q1, q2, transitions labelled g1∧¬h, ¬g1∧¬h, h, true]
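The product construction can be sketched as a graph exploration over pairs (s, q). One convention is assumed here: the automaton moves on the label set of the MDP state just entered (definitions vary); the MDP and automaton below are illustrative and encode the simple property F g:

```python
# Sketch: product of an MDP with a deterministic automaton.
def product_mdp(mdp, labels, delta, q0, init):
    """States of the product are pairs (s, q); delta[(q, labelset)] -> q'."""
    start = (init, delta[(q0, labels[init])])
    prod, frontier = {}, [start]
    while frontier:
        s, q = frontier.pop()
        if (s, q) in prod:
            continue
        prod[(s, q)] = {}
        for a, dist in mdp[s].items():
            new = {}
            for t, p in dist.items():
                tq = delta[(q, labels[t])]  # automaton reads successor's labels
                new[(t, tq)] = new.get((t, tq), 0.0) + p
                frontier.append((t, tq))
            prod[(s, q)][a] = new
    return start, prod

# Hypothetical MDP and a 2-state automaton for "F g" (q1 = accepting sink).
mdp = {"s0": {"a": {"s0": 0.5, "s1": 0.5}}, "s1": {"a": {"s1": 1.0}}}
labels = {"s0": frozenset(), "s1": frozenset({"g"})}
delta = {("q0", frozenset()): "q0", ("q0", frozenset({"g"})): "q1",
         ("q1", frozenset()): "q1", ("q1", frozenset({"g"})): "q1"}
start, prod = product_mdp(mdp, labels, delta, "q0", "s0")
print(start, sorted(prod))
```

Model checking the LTL formula then reduces to a reachability (or accepting end-component) analysis on `prod`, as the slide states.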

SLIDE 16

Example: Product MDP construction

[Diagram: MDP M, automaton Aψ (ψ = G¬h ∧ GF g1, states q0, q1, q2), and the product MDP M⊗Aψ with states s0q0, s2q0, s4q0, s3q0, s5q1, s1q2, s2q2, s4q2, s5q2]

SLIDE 17

Example: Product MDP construction

[Diagram: MDP M, automaton Aψ (ψ = G¬h ∧ GF g1, states q0, q1, q2), and the product MDP M⊗Aψ with states s0q0, s2q0, s4q0, s3q0, s5q1, s1q2, s2q2, s3q2, s4q2, s5q2]

SLIDE 18

MDPs – Other properties

  • Costs and rewards (expected, accumulated values)

− e.g. Rmax=? [ F end ] - "what is the worst-case (maximum) expected time for the protocol to complete?"
− e.g. Rmin=? [ F goal2 ] - "what is the optimal (minimum) expected number of moves needed to reach goal2?"
− optimal strategies: memoryless and deterministic
− similar computation to probabilistic reachability

  • Expected cost/reward to satisfy (co-safe) LTL formula

− e.g. Rmin=? [ ¬zone3 U (zone1 ∧ F zone4) ] – "minimise exp. time to patrol zones 1 then 4, without passing through 3"
− optimal strategies: finite-memory and deterministic
− build/solve product of MDP and det. finite automaton

  • Nested properties, e.g. using PCTL (branching time logic)
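The expected-reward computation is analogous to reachability, with a cost accumulated until the target is reached. A minimal sketch on a hypothetical MDP, charging cost 1 per move and iterating from below:

```python
# Sketch: value iteration for Rmin=? [ F goal ] -- minimum expected number
# of moves to reach goal (hypothetical MDP; every move costs 1).
def min_expected_steps(mdp, targets, iters=5000):
    x = {s: 0.0 for s in list(mdp) + list(targets)}
    for _ in range(iters):
        x = {s: 0.0 if s in targets else
             min(1.0 + sum(p * x[t] for t, p in dist.items())
                 for dist in mdp[s].values())
             for s in x}
    return x

mdp = {"s0": {"east": {"s1": 1.0}, "south": {"s2": 1.0}},
       "s1": {"south": {"goal": 0.5, "s1": 0.5}},
       "s2": {"east": {"goal": 1.0}}}
x = min_expected_steps(mdp, {"goal"})
print(round(x["s0"], 2))  # 2.0 (go south: 1 move + 1 move, vs. east: 1 + 2 expected)
```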
SLIDE 19

Application: Robot navigation

  • Navigation planning: [IROS'14]

− MDP models navigation through
 an uncertain environment
− LTL used to formally specify
 tasks to be executed
− synthesise finite-memory strategies
 to construct plans/controllers
− links to continuous-space planner

[Architecture diagram: task scheduler, map generator, motion planner, navigation planner]

SLIDE 20

Application: Robot navigation

  • Navigation planning MDPs

− expected times on edges + probabilities
− learnt using data from previous explorations

  • LTL-based task specification

− expected time to satisfy (one or more) co-safe LTL formulas
− c.f. ad-hoc reward structures, e.g. with discounting
− also: efficient re-planning [IROS'14]; progress metric [IJCAI'15]

  • Implementation

− MetraLabs Scitos A5 robot + ROS module based on PRISM

SLIDE 21

Overview

  • Probabilistic model checking

− verification vs. strategy synthesis
− Markov decision processes (MDPs)
− example: robot navigation

  • Multi-objective probabilistic model checking

− examples: power management/team-formation

  • Stochastic (multi-player) games

− example: energy management

  • Permissive controller synthesis
SLIDE 22

Multi-objective model checking

  • Multi-objective probabilistic model checking

− investigate trade-offs between conflicting objectives
− in PRISM, objectives are probabilistic LTL or expected rewards

  • Achievability queries: multi(P>0.95 [ F send ], Rtime>10 [ C ])

− e.g. “is there a strategy such that the probability of message transmission is > 0.95 and expected battery life > 10 hrs?”

  • Numerical queries: multi(Pmax=? [ F send ], Rtime>10 [ C ])

− e.g. “maximum probability of message transmission, assuming expected battery life-time is > 10 hrs?”

  • Pareto queries: multi(Pmax=? [ F send ], Rtimemax=? [ C ])

− e.g. "Pareto curve for maximising
 probability of transmission and
 expected battery life-time”

[Pareto curve plot, axes obj1 and obj2]
SLIDE 23

Multi-objective model checking

(Content identical to the previous slide.)
SLIDE 24

Multi-objective model checking

  • Optimal strategies:

− usually finite-memory (e.g. when using LTL formulae)
− may also need to be randomised

  • Computation:

− construct a product MDP (with several automata),
 then reduces to linear programming [TACAS'07,TACAS'11]
− can be approximated using iterative numerical methods,
 via approximation of the Pareto curve [ATVA'12]

  • Extensions [ATVA'12]

− arbitrary Boolean combinations of objectives

  • e.g. ψ1⟹ψ2 (all strategies satisfying ψ1 also satisfy ψ2)
  • (e.g. for assume-guarantee reasoning)

− time-bounded (finite-horizon) properties
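A small sketch of the Pareto idea: each candidate strategy yields a vector of objective values, randomisation makes convex combinations of those vectors achievable, and the Pareto curve is the non-dominated boundary. The points below are illustrative, not the slides' data:

```python
# Sketch: Pareto-optimal points among achievement vectors of candidate
# strategies (both objectives maximised). With randomised strategies, any
# convex combination of these points is also achievable, so the Pareto
# curve is the upper-right boundary of their convex hull.
def pareto(points):
    """Keep points not weakly dominated by a distinct point."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p
                       for q in points)]

pts = [(0.7, 0.0), (0.5, 0.2), (0.2, 0.2), (0.0, 0.1)]
print(pareto(pts))  # [(0.7, 0.0), (0.5, 0.2)]
```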

SLIDE 25

Example – Multi-objective

[Robot navigation MDP diagram]

  • Achievability query

− P≥0.7 [ G ¬hazard ] ∧ P≥0.2 [ GF goal1 ] ?  → true (achievable)

  • Numerical query

− Pmax=? [ GF goal1 ] such that P≥0.7 [ G ¬hazard ] ?  → ≈ 0.2278

  • Pareto query

− for Pmax=? [ G ¬hazard ] ∧ Pmax=? [ GF goal1 ] ?

[Pareto curve plot for ψ1 = G ¬hazard, ψ2 = GF goal1]

SLIDE 26

Example – Multi-objective

[Robot navigation MDP diagram]

Strategy 1 (deterministic):
 s0 : east, s1 : south, s2 : -, s3 : -, s4 : east, s5 : west

[Point on the Pareto curve for ψ1 = G ¬hazard, ψ2 = GF goal1]

SLIDE 27

Example – Multi-objective

Strategy 2 (deterministic):
 s0 : south, s1 : south, s2 : -, s3 : -, s4 : east, s5 : west

[Point on the Pareto curve for ψ1 = G ¬hazard, ψ2 = GF goal1]

[Robot navigation MDP diagram]

SLIDE 28

Example – Multi-objective

Optimal strategy (randomised):
 s0 : 0.3226 : east, 0.6774 : south
 s1 : 1.0 : south
 s2 : -
 s3 : -
 s4 : 1.0 : east
 s5 : 1.0 : west

[Pareto curve for ψ1 = G ¬hazard, ψ2 = GF goal1]

[Robot navigation MDP diagram]

SLIDE 29

Multi-objective: Applications

Synthesis of controllers for
 dynamic power management [TACAS'11]

IBM TravelStar VP disk drive
  • switches between power modes:
  • active/idle/idlelp/stby/sleep

MDP model in PRISM:
  • power manager
  • disk requests
  • request queue
  • power usage

Multi-objective: "minimise energy consumption, subject to constraints on: (i) expected job queue size; (ii) expected number of lost jobs"

[Plot: min power consumption vs. expected lost customers and queue size]

Synthesis of team
 formation strategies [CLIMA'11, ATVA'12]

Pareto curve: x = "probability of
 completing task 1"; y = "probability of
 completing task 2"; z = "expected size of successful team"


SLIDE 30

Overview

  • Probabilistic model checking

− verification vs. strategy synthesis
− Markov decision processes (MDPs)
− example: robot navigation

  • Multi-objective probabilistic model checking

− examples: power management/team-formation

  • Stochastic (multi-player) games

− example: energy management

  • Permissive controller synthesis
SLIDE 31

Stochastic multi-player games (SMGs)

  • Stochastic multi-player games

− players control states; choose actions
− models competitive/collaborative behaviour
− applications: security (system vs. attacker),
 controller synthesis (controller vs. environment),
 distributed algorithms/protocols, …

  • Property specifications: rPATL

− ⟨⟨{1,2}⟩⟩ P≥0.95 [ F≤45 done ] : "can nodes 1,2 collaborate so that the probability of the protocol terminating within 45 seconds is at least 0.95, whatever nodes 3,4 do?"
− formally: ⟨⟨C⟩⟩ψ : do there exist strategies for players in C such that, for all strategies of other players, property ψ holds?

  • Model checking [TACAS'12,FMSD'13]

− zero-sum properties: analysis reduces to 2-player games
− PRISM-games: www.prismmodelchecker.org/games

[Stochastic multi-player game diagram, actions a, b, probabilities ¼, ½, 1]
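The 2-player reduction can be sketched as value iteration in which the coalition maximises at its states and the opponent minimises at theirs. The turn-based game below is hypothetical:

```python
# Sketch: value iteration for ⟨⟨{1}⟩⟩ Pmax=? [ F target ] in a turn-based
# two-player stochastic game: player 1 maximises, player 2 minimises.
def game_values(game, owner, targets, iters=1000):
    x = {s: 1.0 if s in targets else 0.0 for s in game}
    for _ in range(iters):
        nx = {}
        for s, acts in game.items():
            if s in targets:
                nx[s] = 1.0
            elif not acts:          # absorbing dead end
                nx[s] = 0.0
            else:
                vals = [sum(p * x[t] for t, p in dist.items())
                        for dist in acts.values()]
                nx[s] = max(vals) if owner[s] == 1 else min(vals)
        x = nx
    return x

game = {  # hypothetical game
    "s0": {"a": {"s1": 1.0}, "b": {"goal": 0.5, "lose": 0.5}},  # player 1
    "s1": {"p": {"goal": 0.9, "lose": 0.1},                     # player 2
           "q": {"goal": 0.4, "s0": 0.6}},
    "goal": {}, "lose": {},
}
owner = {"s0": 1, "s1": 2, "goal": 1, "lose": 1}
x = game_values(game, owner, {"goal"})
print(round(x["s0"], 3))  # 0.9
```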

SLIDE 32

Example – Stochastic games

  • Two players: 1 (robot controller), 2 (environment)

− probability of s1-south→s4 is in [p,q] = [0.5-Δ, 0.5+Δ]

[Diagram: robot navigation game; the s1-south transition passes through a player-2 state s6, where the environment chooses between probabilities p and q of reaching s4; legend: si = player 1 state, sj = player 2 state]

SLIDE 33

Example – Stochastic games

  • Two players: 1 (robot controller), 2 (environment)

− probability of s1-south→s4 is in [p,q] = [0.5-Δ, 0.5+Δ]

[Robot navigation game diagram]

rPATL: ⟨⟨{1}⟩⟩ Pmax=? [ F goal1 ]
Optimal strategies:
 memoryless and deterministic
Computation: graph analysis
 & numerical approximation

SLIDE 34

Example – Stochastic games

  • Two players: 1 (robot controller), 2 (environment)

− probability of s1-south→s4 is in [p,q] = [0.5-Δ, 0.5+Δ]

[Robot navigation game diagram]

rPATL: ⟨⟨{1}⟩⟩ Pmax=? [ F goal1 ]
Optimal strategies:
 memoryless and deterministic
Computation: graph analysis
 & numerical approximation

[Plot: max. probability of F goal1 as a function of Δ, for strategies choosing east vs. south]

SLIDE 35

Example: Energy management

  • Energy management protocol for Microgrid

− Microgrid: local energy management
− randomised demand management protocol
− random back-off when demand is high

  • Original analysis [Hildmann/Saffre'11]

− protocol increases "value" for clients
− simulation-based, clients are honest

  • Our analysis

− stochastic multi-player game model
− clients can cheat (and cooperate)
− model checking: PRISM-games

SLIDE 36

Example: Energy management

  • Exposes protocol weakness

− incentive for clients
 to act selfishly

  • We propose a simple fix (and verify it)

− clients can be punished

[Plots: value per client vs. number of clients, without and with the fix; curves: all follow alg., no use of alg., deviations of varying size]

SLIDE 37

Overview

  • Probabilistic model checking

− verification vs. strategy synthesis
− Markov decision processes (MDPs)
− example: robot navigation

  • Multi-objective probabilistic model checking

− examples: power management/team-formation

  • Stochastic (multi-player) games

− example: energy management

  • Permissive controller synthesis
SLIDE 38

Permissive controller synthesis

  • Multi-strategy synthesis [TACAS'14]

− for Markov decision processes and stochastic games
− choose sets of actions to take in each state
− controller is free to choose any action at runtime
− flexible/robust (e.g. actions become unavailable or goals change)

  • Example

Multi-strategy:
 s0 : east or south
 s1 : south
 s2 : -
 s3 : -
 s4 : east
 s5 : west

[Robot navigation MDP diagram]

SLIDE 39

Permissive controller synthesis

  • Multi-strategies and temporal logic

− multi-strategy Θ satisfies a property P>p [ F goal ] iff
 any strategy σ that adheres to Θ satisfies P>p [ F goal ]

  • We quantify the permissivity of multi-strategies

− by assigning penalties to each action in each state
− a multi-strategy is penalised for every action it blocks
− static and dynamic (expected) penalty schemes

  • Permissive controller synthesis

− ∃ a multi-strategy satisfying P≤0.6 [ F goal1 ] with penalty < c?
− what is the multi-strategy with optimum permissivity?
− reduction to mixed-integer LP problems
− applications: energy management, cloud scheduling, …
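The cited work reduces this to mixed-integer linear programming; purely to illustrate the definitions, here is a brute-force sketch on a tiny hypothetical MDP: enumerate multi-strategies (non-empty allowed-action sets per state), check a lower-bound property against the worst adhering strategy (a minimum over allowed actions), and keep the smallest penalty:

```python
# Brute-force sketch of permissive controller synthesis (the paper uses
# mixed-integer LP; this exhaustive search only illustrates the idea).
# A multi-strategy satisfies P>=p [ F goal ] iff every adhering strategy
# does, i.e. iff Pmin restricted to the allowed actions meets the bound.
from itertools import combinations, product

def pmin_reach(mdp, allowed, targets, iters=200):
    x = {s: 1.0 if s in targets else 0.0 for s in mdp}
    for _ in range(iters):
        x = {s: 1.0 if s in targets else
             (min(sum(p * x[t] for t, p in mdp[s][a].items())
                  for a in allowed[s]) if mdp[s] else 0.0)
             for s in mdp}
    return x

def most_permissive(mdp, init, targets, p):
    ctrl = [s for s in mdp if mdp[s] and s not in targets]
    subsets = {s: [set(c) for r in range(1, len(mdp[s]) + 1)
                   for c in combinations(mdp[s], r)] for s in ctrl}
    best, best_pen = None, None
    for choice in product(*(subsets[s] for s in ctrl)):
        allowed = dict(zip(ctrl, choice))
        pen = sum(len(mdp[s]) - len(allowed[s]) for s in ctrl)  # blocked actions
        if pmin_reach(mdp, allowed, targets)[init] >= p:
            if best is None or pen < best_pen:
                best, best_pen = allowed, pen
    return best, best_pen

mdp = {  # hypothetical MDP
    "s0": {"east": {"goal": 0.5, "s1": 0.5}, "south": {"goal": 0.8, "fail": 0.2}},
    "s1": {"west": {"goal": 1.0}},
    "goal": {}, "fail": {},
}
best, pen = most_permissive(mdp, "s0", {"goal"}, 0.9)
print(best["s0"], pen)  # {'east'} 1
```

With the weaker bound 0.8, both actions can stay allowed (penalty 0), showing how the required threshold trades off against permissivity.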

SLIDE 40

Conclusion

  • Probabilistic model checking

− verification vs. controller synthesis
− Markov decision processes, temporal logic, applications

  • Recent directions and extensions

− multi-objective probabilistic model checking
− model checking for stochastic games
− permissive controller synthesis

  • Challenges

− stochastic games: multi-objective, equilibria, richer logics
− partial information/observability
− probabilistic models with continuous time (or space)
− scalability, e.g. symbolic methods, abstraction

SLIDE 41

More info here:

www.prismmodelchecker.org/lectures/avacs15/

Thanks for your attention