Probabilistic Model Checking & PRISM
Dave Parker
University of Birmingham HIERATIC kick-off meeting, Birmingham, Dec 2012
Overview
− probabilistic model checking
− discrete-time Markov chains + PCTL
− continuous-time Markov chains + CSL
− discrete stochastic models of biological systems
− PRISM: overview, modelling language, symbolic implementation
− bisimulation, symmetry, abstraction, simulation
Formal verification
− is the application of rigorous, mathematics-based techniques to establish the correctness of computerised systems
Quantitative verification
− applies formal verification techniques to the modelling and analysis of non-functional aspects of system behaviour (e.g. probability, time, cost, …)
Probabilistic model checking
− is an automated quantitative verification technique for systems that exhibit probabilistic behaviour
Model checking: automatic formal verification of correctness properties of computerised systems
[Figure: the system is modelled as a finite-state model and the system requirements as a temporal logic specification (e.g. ¬EF fail); a model checker (e.g. SMV, Spin) takes both and returns a result or a counterexample.]
− unreliability (e.g. component failures) − uncertainty (e.g. message losses/delays over wireless) − randomisation (e.g. in protocols such as Bluetooth, ZigBee) − stochasticity (e.g. biological/chemical reaction rates)
− reliability, performance, quality of service, … − “the probability of an airbag failing to deploy within 0.02s” − “the expected power usage of a sensor network over 1 hour” − “the expected time for a cell signalling pathway to complete”
Probabilistic model checking: automatic verification of quantitative properties of systems with stochastic behaviour
[Figure: the system is modelled as a probabilistic model (e.g. a Markov chain) and the system requirements as a probabilistic temporal logic specification (e.g. PCTL, CSL, LTL), such as P<0.01 [ F≤t fail ]; a probabilistic model checker (e.g. PRISM) takes both and returns a result, quantitative results or a counterexample.]
− e.g. Markov chains, Markov decision processes, … − specified in high-level modelling formalisms − exhaustive model exploration (all possible states/executions)
− properties specified using temporal logic − “exact” results obtained via numerical computation − linear equation systems, iterative methods, uniformisation, … − as opposed to, for example, Monte Carlo simulations − efficient techniques from verification + performance analysis − mature tool support available, e.g. PRISM
                      Discrete time                          Continuous time
Nondeterministic      Markov decision processes (MDPs)       CTMDPs/IMCs
                      (probabilistic automata)               Probabilistic timed automata (PTAs)
Fully probabilistic   Discrete-time Markov chains (DTMCs)    Continuous-time Markov chains (CTMCs)
Discrete-time Markov chains (DTMCs)
− discrete states + probability
− for: randomisation, unreliable communication media, …
Continuous-time Markov chains (CTMCs)
− discrete states + exponentially distributed delays
− for: component failures, job arrivals, molecular reactions, …
Markov decision processes (MDPs)
− in fact: probabilistic automata [Segala]
− probability + nondeterminism (e.g. for concurrency)
− for: randomised distributed algorithms, security protocols, …
Probabilistic timed automata (PTAs)
− probability, nondeterminism + real-time
− for: wireless communication protocols, embedded control systems, …
A DTMC is a tuple $(S, s_{init}, P, L)$ where:
− $S$ is a finite set of states (“state space”)
− $s_{init} \in S$ is the initial state
− $P : S \times S \to [0,1]$ is the transition probability matrix, where $\sum_{s' \in S} P(s,s') = 1$ for all $s \in S$
− $L : S \to 2^{AP}$ is a function labelling states with atomic propositions
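− i.e. every state has at least one outgoing transition
− can add self loops to represent final/terminating states
[Figure: example DTMC with states s0, s1, s2, s3, labels {try}, {fail}, {succ}, and transition probabilities 1, 0.01, 0.01 and 0.98.]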
A path through a DTMC
− is a sequence of states $s_0 s_1 s_2 s_3 \dots$ such that $P(s_i, s_{i+1}) > 0$ for all $i$
− represents an execution (i.e. one possible behaviour) of the system which the DTMC is modelling
− need to define a probability space over paths
− sample space: $Path(s)$ = set of all infinite paths from a state $s$
− basic events: cylinder sets (or “cones”)
− cylinder set $C(\omega)$, for a finite path $\omega$ = set of infinite paths with the common finite prefix $\omega$
− event set: least σ-algebra on $Path(s)$ containing $C(\omega)$ for all finite paths $\omega$ starting in $s$
− probability of a cylinder set is the product of the transition probabilities along its prefix, e.g. $Pr_s(C(s s_1 s_2)) = P(s,s_1) \cdot P(s_1,s_2)$
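The probability of a cylinder set can be sketched in a few lines of Python. The DTMC below is a hypothetical example chosen for illustration (it is not taken from the slides), and the dictionary representation and the function name cylinder_probability are assumptions of this sketch.

```python
# Hypothetical example DTMC (an assumption for illustration only).
# P[s][t] is the probability of moving from state s to state t;
# every row sums to 1, and s2 and s3 carry self loops.
P = {
    "s0": {"s1": 1.0},
    "s1": {"s1": 0.01, "s2": 0.01, "s3": 0.98},
    "s2": {"s2": 1.0},
    "s3": {"s3": 1.0},
}

def cylinder_probability(P, prefix):
    """Probability of the cylinder set C(prefix): the set of all infinite
    paths that start with the finite path `prefix`."""
    prob = 1.0
    # Multiply the transition probabilities along consecutive state pairs.
    for s, t in zip(prefix, prefix[1:]):
        prob *= P[s].get(t, 0.0)  # a missing entry means probability 0
    return prob

if __name__ == "__main__":
    print(cylinder_probability(P, ["s0", "s1", "s3"]))
```

A prefix that uses a transition with probability 0 is not a path, so its cylinder set gets probability 0.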
− PCTL = Probabilistic Computation Tree Logic [HJ94]
− key addition is probabilistic operator P − quantitative extension of CTL’s A and E operators
− send → P≥0.95 [ F≤10 deliver ] − “if a message is sent, then the probability of it being delivered within 10 steps is at least 0.95”
− unbounded reachability (F), until (U), globally (G), …
− determine states of a DTMC satisfying a PCTL formula − boils down to: graph analysis, solution of linear equation systems, iterative numerical solution
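As a rough illustration of the "iterative numerical solution" step, the sketch below computes unbounded reachability probabilities (P=? [ F target ]) for a small DTMC by repeated substitution. The example chain, the function name reach_probability and the convergence threshold are assumptions of this sketch, not PRISM's implementation; a real tool would first use graph analysis to find the states with probability exactly 0 or 1.

```python
# Hypothetical example DTMC (an assumption for illustration only):
# s2 is a failure state and s3 a success state, both absorbing.
P = {
    "s0": {"s1": 1.0},
    "s1": {"s1": 0.01, "s2": 0.01, "s3": 0.98},
    "s2": {"s2": 1.0},
    "s3": {"s3": 1.0},
}

def reach_probability(P, targets, tol=1e-12, max_iter=100000):
    """Probability of eventually reaching a state in `targets`, per state.

    Solves x(s) = 1 for target states and x(s) = sum_t P(s,t) * x(t)
    otherwise, by Gauss-Seidel style iteration starting from 0, which
    converges to the least solution of the equation system.
    """
    x = {s: (1.0 if s in targets else 0.0) for s in P}
    for _ in range(max_iter):
        delta = 0.0
        for s in P:
            if s in targets:
                continue
            new = sum(p * x[t] for t, p in P[s].items())
            delta = max(delta, abs(new - x[s]))
            x[s] = new  # updated values are reused within the same sweep
        if delta < tol:
            break
    return x

if __name__ == "__main__":
    print(reach_probability(P, {"s3"})["s0"])
```

For this chain the equation for s1 is x = 0.01·x + 0.98, so the exact answer from s0 and s1 is 0.98/0.99.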
− if the probability is unknown, how to choose the bound p?
− we allow the form P=? [ ψ ] − “what is the probability that path formula ψ is true?”
− P=? [ F err/total>0.1 ] − “what is the probability that 10% of the NAND gate outputs are erroneous?” (reliability)
− P=? [ F≤t reply_count=k ] − “what is the probability that the sender has received k acknowledgements within t clock-ticks?” (performance)
− P=? [ F (pairs_a=0 & pairs_b>0) ] − “what is the probability that the party B gains an unfair advantage during the execution of the protocol?” (fairness)
− labelled transition systems augmented with rates − continuous time delays, exponentially distributed
− $S$ is a finite set of states (the “state space”)
− $s_{init} \in S$ is the initial state
− $R : S \times S \to \mathbb{R}_{\geq 0}$ is the transition rate matrix
− $L : S \to 2^{AP}$ is a labelling with atomic propositions
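− the rate $R(s,s')$ is used as the parameter of an exponential distribution
− there is a transition between $s$ and $s'$ when $R(s,s') > 0$
− probability that the transition is triggered before $t$ time units: $1 - e^{-R(s,s') \cdot t}$
[Figure: example CTMC with states s0, s1, s2, s3, labels {empty} and {full}, and transition rates including 3/2 and 3.]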
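The exponential delay formula can be evaluated directly. The following is a minimal sketch; the function name prob_triggered_within is an assumption of this example.

```python
import math

def prob_triggered_within(rate, t):
    """Probability that a transition with rate R(s,s') = `rate` is
    triggered within `t` time units: 1 - exp(-rate * t)."""
    if rate < 0 or t < 0:
        raise ValueError("rate and time must be non-negative")
    return 1.0 - math.exp(-rate * t)

if __name__ == "__main__":
    # Rates 3/2 and 3 appear in the example CTMC; the time bound is arbitrary.
    for rate in (1.5, 3.0):
        print(rate, prob_triggered_within(rate, 1.0))
```

Higher rates give shorter expected delays: the mean of the exponential distribution with rate R is 1/R.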
− CSL = Continuous Stochastic Logic [ASSB00,BHHK03] − extension of (non-probabilistic) temporal logic CTL − transient, steady-state and path-based properties
− probabilistic operator P (like PCTL) − steady state operator S
− when a shutdown occurs, the probability of a system recovery being completed between 1 and 2.5 hours without further failure is greater than 0.75
− in the long run, the chance that an inadequate number of routers are operational is less than 0.1
− multiple molecular species, interacting through reactions − cell signalling pathway, gene regulatory network, … − fixed volume (spatially uniform), pressure and temperature
− 3 species A, B and AB; 3 reactions: − reversible binding of A and B to form AB; degradation of A
− discrete, stochastic − continuous, deterministic
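[Figure: reactions A + B ⇌ AB (binding with rate constant k1, unbinding with k2) and degradation of A (rate constant k3).]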
Discrete, stochastic:
− (integer) counts of the number of each molecule: $x = (x_A, x_B, x_{AB})$
− inherently stochastic process [McQuarrie, Gillespie]
− continuous-time Markov chain with states $x$
− stochastic simulation, numerical solution, probabilistic model checking, …
Continuous, deterministic:
− (real-valued) concentrations: [A], [B], [AB]
− solution of a system of coupled ordinary differential equations (ODEs)
− good approximation of $E[x]$ for very large numbers of molecules
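− state vector $x = (x_A, x_B, x_{AB})$
− probability $P(x,t)$ that at time $t$ there will be $x_Z$ of species $Z$
− stoichiometric vectors: $v_1 = (-1,-1,1)$, $v_2 = (1,1,-1)$, $v_3 = (-1,0,0)$
− $a_i(x)$ are time-independent propensity functions
− mass-action: proportional to reactant combinations
− chemical master equation: $\frac{\partial P(x,t)}{\partial t} = \sum_{i=1}^{3} \left[ a_i(x - v_i) \, P(x - v_i, t) - a_i(x) \, P(x,t) \right]$
− transition rates (of exponential delays) derived from $a_i$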
− states $(x_A, x_B, x_{AB}) \in S = \{0,1,2\}^3$
− initial state (2,2,0)
− r1 (binding): rate = $x_A \cdot x_B \cdot k_1$
− r2 (unbinding): rate = $x_{AB} \cdot k_2$
− r3 (degradation): rate = $x_A \cdot k_3$
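[Figure: CTMC over the reachable states (2,2,0), (1,1,1), (0,0,2), (1,2,0), (0,1,1) and (0,2,0), with transitions labelled by the rates 4k1, k1, 2k2, k2, 2k3, k3, k3, 2k1 and k2.]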
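The construction of the CTMC from the three reactions can be sketched as a breadth-first exploration from the initial state. This is an illustrative sketch under the mass-action rates listed above; the function name build_ctmc and the numeric rate constants used in the usage example are assumptions.

```python
from collections import deque

def build_ctmc(initial, k1, k2, k3):
    """Explore the states (xA, xB, xAB) reachable from `initial` and
    return (states, rates), where rates[(x, y)] is the transition rate
    from state x to state y."""
    # Each reaction: (propensity function, stoichiometric vector).
    reactions = [
        (lambda x: x[0] * x[1] * k1, (-1, -1, 1)),  # r1: binding
        (lambda x: x[2] * k2, (1, 1, -1)),          # r2: unbinding
        (lambda x: x[0] * k3, (-1, 0, 0)),          # r3: degradation of A
    ]
    states = {initial}
    rates = {}
    queue = deque([initial])
    while queue:
        x = queue.popleft()
        for propensity, v in reactions:
            rate = propensity(x)
            if rate > 0:  # a zero propensity means the reaction is disabled
                y = tuple(a + b for a, b in zip(x, v))
                rates[(x, y)] = rates.get((x, y), 0.0) + rate
                if y not in states:
                    states.add(y)
                    queue.append(y)
    return states, rates

if __name__ == "__main__":
    # Rate constants are arbitrary placeholder values.
    states, rates = build_ctmc((2, 2, 0), k1=1.0, k2=2.0, k3=3.0)
    for (x, y), r in sorted(rates.items()):
        print(x, "->", y, "rate", r)
```

From the initial state (2,2,0) the binding reaction has rate xA·xB·k1 = 4·k1, and state (0,2,0) has no outgoing transitions because no A or AB molecules remain.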
− “the probability that there are exactly i A after t seconds”
− “probability that all A proteins are eventually degraded”
− “long-run probability that the total number of Cs and Ds activated is above M”
− “highest probability of it taking more than t seconds for C to become activated, from any state where there are none”
− “the (conditional) probability that all C proteins are eventually activated, given that at least some of them are”
− “the expected number of activated D at time instant t”
− pathway: 12 species, 14 sets of reaction rules − model checking (PRISM) and simulation (stoch. π-calculus) − “in-silico” experiments: systematic removal of components − results validated by subsequent lab experiments
− probability that a signal is present at time T? − P=? [ F=T (FRS2_GRB>0 &relocFRS2=0 & degFRS2=0) ]
Probabilistic model checking for systems biology…
[Figure: the biological system is modelled as a CTMC (the system model) and the system properties as temporal logic formulae (e.g. CSL, LTL), such as P=? [ F=t a>0 ]; PRISM takes both and returns a result, quantitative results or a counterexample.]
− developed at Birmingham/Oxford University, since 1999 − free, open source software (GPL), runs on all major OSs
− models: Markov chains, Markov decision processes, … − properties: PCTL, CSL, LTL, PCTL*, costs/rewards, …
− simple but flexible high-level modelling language − user interface: editors, simulator, experiments, graph plotting − multiple efficient model checking engines (e.g. symbolic)
− in: (Bio)PEPA, stochastic π-calculus, DSD, SBML, Petri nets, … − out: Matlab, MRMC, INFAMY, PARAM, …
− Bluetooth, FireWire, Zeroconf, 802.11, Zigbee, gossiping, …
− consensus, leader election, self-stabilisation, …
− pin cracking, anonymity, quantum crypto, contract signing, …
− robotics, dynamic power management, …
− nanotechnology, cloud computing, manufacturing systems, …
− cell signalling pathways, DNA computation, …
− for Markov chains (and other models)
− networks formed from interacting modules − state of each module given by finite-ranging variables − behaviour of each module specified by guarded commands − interactions between modules through synchronisation − interactions are associated with state-dependent rates
    [r1] (a>0) -> k1*a : (a'=a-1)&(ab'=ab+1);
where r1 is the action, (a>0) the guard, k1*a the rate and (a'=a-1)&(ab'=ab+1) the update
    module A
      a : [0..N] init N;
      ab : [0..N] init 0;
      [r1] a>0 -> k1*a : (a'=a-1)&(ab'=ab+1);
      [r2] ab>0 -> k2*ab : (a'=a+1)&(ab'=ab-1);
      [r3] a>0 -> k3*a : (a'=a-1);
    endmodule

    module B
      b : [0..N] init N;
      [r1] b>0 -> b : (b'=b-1);
      // rate 1 here, so that r2 has overall rate k2*ab (as for unbinding above)
      [r2] b<N -> 1 : (b'=b+1);
    endmodule
Example (r1): (a,ab,b) → (a-1,ab+1,b-1) with rate k1·a·b
Reactions r1/r2: A + B ⇌ AB (rate constants k1, k2). Reaction r3: degradation of A (rate constant k3).
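The way synchronised commands combine (all guards must hold, rates are multiplied, updates are applied together) can be sketched in Python as follows. The dictionary encoding of modules, the function name synchronise and the value chosen for K1 are assumptions of this sketch; it only mimics the r1 example and is not PRISM's implementation.

```python
K1 = 0.5  # hypothetical rate constant, chosen only for this example

# Each module maps an action name to (guard, rate, update), all functions
# of the current state, which is a dict of variable values.
module_A = {
    "r1": (lambda s: s["a"] > 0,
           lambda s: K1 * s["a"],
           lambda s: {"a": s["a"] - 1, "ab": s["ab"] + 1}),
}
module_B = {
    "r1": (lambda s: s["b"] > 0,
           lambda s: s["b"],
           lambda s: {"b": s["b"] - 1}),
}

def synchronise(state, action, modules):
    """Return (rate, next_state) for a synchronised action, or None if
    some participating module's guard is false in `state`."""
    rate = 1.0
    changes = {}
    for module in modules:
        if action not in module:
            continue  # this module does not take part in the action
        guard, module_rate, update = module[action]
        if not guard(state):
            return None  # synchronisation blocks unless all guards hold
        rate *= module_rate(state)     # rates of the participants multiply
        changes.update(update(state))  # each module updates its own variables
    return rate, {**state, **changes}

if __name__ == "__main__":
    print(synchronise({"a": 2, "ab": 0, "b": 2}, "r1", [module_A, module_B]))
```

With a = 2 and b = 2 the combined rate is K1·a·b, matching the k1·a·b rate of the r1 example.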
− graph-based algorithms, e.g. reachability, qualitative verif. − numerical solution techniques, e.g. probability computation − usually rely on iterative methods: uniformisation-based for transient properties, Gauss-Seidel/etc. for linear equations
− primary source: symbolic implementation techniques − (multi terminal) binary decision diagrams: (MT)BDDs − exploit structure, regularity in high-level model
− “MTBDD”: fully symbolic (up to 10^10 states for regular models)
− “sparse”: converts to explicit-state storage for fast solution
− “hybrid”: mix of symbolic/explicit; best overall performance; usually allows model checking for up to 10^7-10^8 states
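Matrix M and its encoding as a function f_M over Boolean variables (row bits x1 x2, column bits y1 y2, interleaved as x1 y1 x2 y2):

Entry in M    x1 x2    y1 y2    x1y1x2y2    f_M
(0,1) = 8     0  0     0  1     0001        8
(1,0) = 2     0  1     0  0     0010        2
(0,3) = 5     0  0     1  1     0101        5
(1,3) = 5     0  1     1  1     0111        5
(2,3) = 5     1  0     1  1     1101        5
(3,2) = 2     1  1     1  0     1110        2

[Figure: matrix M and the MTBDD for f_M, with one x1 node, two y1 nodes, two x2 nodes, three y2 nodes and terminals 8, 2 and 5.]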
[The same matrix and MTBDD are shown again, annotated in turn with:]
− recursion
− repeated submatrices: shared MTBDD node
− identical adjacent submatrices: MTBDD node removed
− blocks of zeros: edge goes straight to zero node
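The three reduction effects (shared nodes, removed nodes, direct edges to zero) can be reproduced with a small sketch that builds a reduced decision diagram for the 4×4 example matrix. The matrix entries are those listed for M; the tuple-based node representation and the function name build_mtbdd are assumptions of this sketch, and real MTBDD packages avoid enumerating every variable assignment.

```python
def build_mtbdd(matrix, nbits):
    """Build a reduced MTBDD for `matrix` using the interleaved variable
    order x1, y1, x2, y2, ... Returns (root, nodes), where `nodes` holds
    the distinct non-terminal nodes."""
    nvars = 2 * nbits
    nodes = {}  # unique table: identical nodes are stored only once

    def entry(bits):
        # Even positions are row bits (x), odd positions column bits (y).
        row = col = 0
        for i in range(nbits):
            row = (row << 1) | bits[2 * i]
            col = (col << 1) | bits[2 * i + 1]
        return matrix[row][col]

    def build(bits):
        if len(bits) == nvars:
            return ("leaf", entry(bits))
        low = build(bits + (0,))
        high = build(bits + (1,))
        if low == high:
            return low  # both branches equal: the node is removed
        node = (len(bits), low, high)  # (variable index, 0-child, 1-child)
        return nodes.setdefault(node, node)  # repeated sub-functions are shared

    return build(()), nodes

# The example matrix: entries (0,1)=8, (1,0)=2, (0,3)=(1,3)=(2,3)=5, (3,2)=2.
M = [
    [0, 8, 0, 5],
    [2, 0, 0, 5],
    [0, 0, 0, 5],
    [0, 0, 2, 0],
]

if __name__ == "__main__":
    root, nodes = build_mtbdd(M, 2)
    print(len(nodes), "non-terminal nodes")
```

For this matrix the unreduced decision tree would have 15 non-terminal nodes, while the reduced diagram has one x1 node, two y1 nodes, two x2 nodes and three y2 nodes.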
− bisimulation, symmetry, …
− “safe” approximation from a smaller model − analysis of infinite-state systems
− Monte-Carlo simulation + sampling
− generalised to probabilistic models, e.g.: − probabilistic bisimulation for CTMCs (lumping) [Buchholz’94] − preserves important classes of (temporal logic) queries
− construct and analyse smaller but equivalent quotient model − can be fully automated, using extensions of classic algorithm based on iterative partition splitting (but may be expensive) − however, for probabilistic models, cost of minimisation shown to be worthwhile in some cases [Katoen et al.’07]
− can be applied compositionally (minimise components first) − can be combined with symbolic techniques [Wimmer et al.]
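Iterative partition splitting can be sketched as follows for a discrete-time chain: start from the partition induced by the labels and keep splitting blocks whose states have different total probabilities of moving into some block. This is a simple, unoptimised illustration; the function name lump, the rounding used to compare probabilities and the example chain are assumptions, and for CTMCs the same idea applies with rates in place of probabilities.

```python
def lump(P, labels):
    """Return a map from state to block id such that states in the same
    block have the same label and the same probability of moving into
    each block (probabilistic bisimulation)."""
    block = dict(labels)  # initial partition: one block per label
    while True:
        signature = {}
        for s in P:
            into = {}
            for t, p in P[s].items():
                into[block[t]] = into.get(block[t], 0.0) + p
            # Include the old block so the new partition refines the old one.
            signature[s] = (block[s],
                            frozenset((b, round(p, 12)) for b, p in into.items()))
        ids = {}
        new_block = {s: ids.setdefault(signature[s], len(ids)) for s in P}
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block  # no block was split: the partition is stable
        block = new_block

# Hypothetical example (assumption): "a" and "b" behave identically.
P = {
    "s0": {"a": 0.5, "b": 0.5},
    "a": {"goal": 1.0},
    "b": {"goal": 1.0},
    "goal": {"goal": 1.0},
}
labels = {"s0": "init", "a": "mid", "b": "mid", "goal": "goal"}

if __name__ == "__main__":
    print(lump(P, labels))
```

The quotient model has one state per block, here three states instead of four.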
− a simple but common case is component symmetry, i.e. multiple copies of identical processes
− state-level manipulates model (transition matrix) directly; model-level works on high-level modelling formalism − e.g. two approaches implemented for PRISM: − symbolic (MTBDD-based) model-level algorithm [CAV’06] − GRIP: language-level translation [Donaldson/Miller]
− and usually much more efficient to perform the reduction − but not necessarily the smallest bisimulation, and may need manual identification of symmetries in model
− corresponds to “population-based” model (which may be difficult to model directly by hand for complex systems)
− more complex types of symmetry required for systems biology
− hide details irrelevant to the property of interest − essential for verification of large/infinite-state systems − yields smaller/finite model, but with some loss of precision
− e.g. for DTMCs, can use abstract Markov chains (probabilities replaced with intervals) [Fecher/Leucker/Wolf] [Huth]
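[Figure: a DTMC over states A, B, C, D with transition probabilities 0.5, 2/3, 1/3, 0.25 and 0.25, and an abstract Markov chain whose transitions carry the probability intervals [0.5,1] and [0,0.5].]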
− abstract Markov chains for DTMCs can be seen as Markov decision processes (MDPs) − yields lower/upper bounds on quantitative properties − generalised to CTMDP abstraction of CTMCs [Katoen/Klink/Leucker/Wolf]
− Erlang-k interval processes [Katoen/Klink/Leucker/Wolf] − Poisson processes + prob. intervals; better lower bounds − sliding window abstraction [Henzinger/Mateescu/Wolf] − abstract different parts of state space for each time point
− e.g. truncation of model for time-bounded properties − INFAMY [Hahn/Hermanns/Wachter/Zhang]
− one promising direction: abstraction refinement − inspired by counterexample-guided abstraction refinement (CEGAR) techniques for non-probabilistic model checking
− so far, mostly for Markov decision processes − RAPTURE [D’Argenio/Jeannet/Jensen/Larsen] − probabilistic CEGAR [Hermanns/Wachter/Zhang] − quantitative abstraction refinement [Kwiatkowska/Norman/Parker] − magnifying lens abstraction [de Alfaro/Roy] − MDP-based abstractions [Chadha/Viswanathan] − and more…
− efficient refinement strategies/heuristics − how to adapt to CTMCs and time-bounded properties? − how to adapt to other types of abstractions for CTMCs?
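[Figure: abstraction-refinement loop. Starting from an initial partition, build the abstraction and model check it to obtain bounds and strategies; if error ≥ ε, refine to obtain a new partition and repeat; if error < ε, return the bounds. Example results: Israeli/Jalfon self-stabilisation protocol.]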
− discrete event (Monte Carlo) simulation + sampling
− approximate result for quantitative query such as P=? [ φ ]
− plus a probabilistic guarantee regarding result precision
− $Prob( |p_{actual} - p_{estimated}| \leq \varepsilon ) \geq 1 - \delta$
− can also generate corresponding confidence intervals
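One standard way to obtain such a guarantee (an assumption of this sketch, not necessarily the method used by any particular tool) is to fix the number of samples in advance using the Chernoff-Hoeffding bound, N ≥ ln(2/δ) / (2ε²). The example chain, the bounded property and all function names below are hypothetical.

```python
import math
import random

# Hypothetical example DTMC (assumption): s3 is the success state.
P = {
    "s0": {"s1": 1.0},
    "s1": {"s1": 0.01, "s2": 0.01, "s3": 0.98},
    "s2": {"s2": 1.0},
    "s3": {"s3": 1.0},
}

def samples_needed(eps, delta):
    """Number of samples such that Prob(|actual - estimate| <= eps) >= 1 - delta,
    according to the Chernoff-Hoeffding bound."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))

def simulate_bounded_reach(P, start, targets, max_steps, rng):
    """Sample one path and report whether it reaches `targets`
    within `max_steps` steps (i.e. satisfies F<=max_steps target)."""
    state = start
    for _ in range(max_steps):
        if state in targets:
            return True
        successors = list(P[state])
        weights = [P[state][t] for t in successors]
        state = rng.choices(successors, weights=weights)[0]
    return state in targets

def estimate(P, start, targets, max_steps, eps, delta, seed=0):
    """Monte Carlo estimate of P=? [ F<=max_steps target ]."""
    rng = random.Random(seed)
    n = samples_needed(eps, delta)
    hits = sum(simulate_bounded_reach(P, start, targets, max_steps, rng)
               for _ in range(n))
    return hits / n

if __name__ == "__main__":
    print(samples_needed(0.02, 0.01))
    print(estimate(P, "s0", {"s3"}, 100, eps=0.02, delta=0.01))
```

The sample count grows with 1/ε², which is why tight precision requirements quickly become expensive.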
− applied to boolean-valued queries such as P∼p [ φ ] − basic idea: stop sampling as soon as the result can be shown to be either true or false with high probability − sensitive to distance between bound p and actual answer − also extended to Bayesian approaches [Jha et al.]
− much more scalable than conventional (numerical computation based) probabilistic model checking
− (almost no scalability issues: no need to build the model)
− wider range of model types (anything that can be effectively simulated) and property types
− loss of precision: only approximate answers
− lose ability to definitively establish causal relationships and identify best/worst-case scenarios
− speed: possibly very high number of samples required to generate suitably accurate approximations
− may be hard to estimate likelihood of rare events
− automatic, exhaustive construction of probabilistic models − analysis of formally specified quantitative properties − efficient techniques and tools available, e.g. PRISM − applications: communication protocols, computer security, randomised algorithms, systems biology, …
− richer models: stochastic hybrid systems, game-based models − more expressive property specification languages − scalability and efficiency − efficient construction of “good” abstractions − how to exploit symmetries, high-level structure − combining simulation-based and numerical methods