Principles of Probabilistic Programming
Lectures at EWSCS 2020 Winter School
Joost-Pieter Katoen
EWSCS 2020, Palmse, Estonia
Overview
1. Expected runtime analysis
2. Analysing Bayesian networks
3. Epilogue

Nuances of termination
Olivier Bournez, Florent Garnier

- certain termination
- termination with probability one: almost-sure termination
- termination in an expected finite number of steps: "positive" almost-sure termination
- termination in an expected infinite number of steps: "null" almost-sure termination

This lecture
A weakest-precondition technique for proving "positive" almost-sure termination.

In fact: a weakest-precondition technique for determining the expected runtime of a probabilistic program.

Expected run-time analysis
- What?
  - AST + termination in finite expected time
  - Generalise.
- How?
  - Provide a weakest precondition calculus for expected run-times
  - A compositional calculus to reason at program syntax level
- Why?
  - Classical weakest preconditions cannot be used
  - Proving positive AST is a special instance
  - Reason about the efficiency of randomised algorithms
  - Reason about simulative inference of Bayesian networks

The run time of a probabilistic program is random
    int i := 0;
    repeat { i++; (c := false [1/2] c := true) } until (c)

The expected runtime is $1 + 3 \cdot \tfrac{1}{2} + 5 \cdot \tfrac{1}{4} + \ldots + (2n+1) \cdot \tfrac{1}{2^n} = \ldots < \infty$.

Hurdles in runtime analysis
- 1. Programs may diverge while having a finite expected runtime:
- 2. Expected runtimes are extremely sensitive to variations in probabilities
- 3. Having a finite expected time is not compositional (cf. next slide)
PAST is not compositional
Consider the two probabilistic programs:

    int x := 1; bool c := true;
    while (c) { {c := false [0.5] c := true}; x := 2*x }

Finite expected termination time.

    while (x > 0) { x-- }

Finite termination time.

Running the right after the left program yields an infinite expected termination time.

Using wp for expected runtimes?
    while (true) { x++ }

- Consider the post-expectation $x$
- Characteristic function: $\Phi_x(X) = X[x := x + 1]$
- Candidate upper bound is $I = 0$
- Induction: $\Phi_x(I) = 0[x := x + 1] = 0 = I \leq I$

We would, wrongly, conclude that 0 is the runtime.

Using weakest pre-expectations is unsound for expected run-time analysis.

Expected run-times
- Expected run-time of program P on input s: $\sum_{k=1}^{\infty} k \cdot \Pr(\text{"P terminates after k steps on input s"})$
- In general, ert() is a function $t : \Sigma \to \mathbb{R}_{\geq 0} \cup \{\infty\}$
- Let's call this a run-time. Let $\mathbb{T}$ denote the set of run-times.
- Complete partial order on $\mathbb{T}$: $t_1 \preceq t_2$ iff $\forall s \in \Sigma.\ t_1(s) \leq t_2(s)$

Continuation passing
Same principle as for weakest pre-conditions: reason backwards.

Expected runtime transformer
Syntax and semantics $\mathrm{ert}(P, t)$:

- skip: $1 + t$
- diverge: $\infty$
- x := E: $1 + t[x := E]$
- P1 ; P2: $\mathrm{ert}(P_1, \mathrm{ert}(P_2, t))$
- if (G) P1 else P2: $1 + [G] \cdot \mathrm{ert}(P_1, t) + [\neg G] \cdot \mathrm{ert}(P_2, t)$
- P1 [p] P2: $1 + p \cdot \mathrm{ert}(P_1, t) + (1 - p) \cdot \mathrm{ert}(P_2, t)$
- while (G) P: $\mathrm{lfp}\, X.\ 1 + ([G] \cdot \mathrm{ert}(P, X) + [\neg G] \cdot t)$

lfp is the least fixed point operator wrt. the ordering $\leq$ on run-times.

This simple mild twist of weakest pre-expectations is sound.
Example: straight-line code
[Handwritten worked example on the slide; content not recoverable.]
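To illustrate how the ert rules apply to straight-line code, here is a minimal Python sketch of the transformer for loop-free programs. The tuple encoding of programs, the function name `ert`, and the example program are assumptions made for this sketch and do not come from the slides; run-times are modelled as functions from states (dicts) to numbers.

```python
from fractions import Fraction

def ert(prog, t):
    """Expected-runtime transformer for loop-free programs.

    prog is a nested tuple (hypothetical encoding, not from the slides);
    t is the post run-time, a function from a state (dict) to a number.
    """
    kind = prog[0]
    if kind == "skip":            # ert(skip, t) = 1 + t
        return lambda s: 1 + t(s)
    if kind == "assign":          # ert(x := E, t) = 1 + t[x := E]
        _, var, expr = prog
        return lambda s: 1 + t({**s, var: expr(s)})
    if kind == "seq":             # ert(P1; P2, t) = ert(P1, ert(P2, t))
        _, p1, p2 = prog
        return ert(p1, ert(p2, t))
    if kind == "if":              # 1 + [G] ert(P1, t) + [not G] ert(P2, t)
        _, guard, p1, p2 = prog
        e1, e2 = ert(p1, t), ert(p2, t)
        return lambda s: 1 + (e1(s) if guard(s) else e2(s))
    if kind == "choice":          # 1 + p ert(P1, t) + (1 - p) ert(P2, t)
        _, p, p1, p2 = prog
        e1, e2 = ert(p1, t), ert(p2, t)
        return lambda s: 1 + p * e1(s) + (1 - p) * e2(s)
    raise ValueError(f"unsupported construct: {kind}")

zero = lambda s: 0  # post run-time 0

# Illustrative program: x := 1 [1/2] (x := 2; skip)
prog = ("choice", Fraction(1, 2),
        ("assign", "x", lambda s: 1),
        ("seq", ("assign", "x", lambda s: 2), ("skip",)))

runtime = ert(prog, zero)
print(runtime({"x": 0}))  # 5/2
```

The result follows the table rule by rule: the left branch costs 1, the right branch costs 2, and the probabilistic choice adds one step, giving 1 + 1/2 * 1 + 1/2 * 2 = 5/2.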
A simple rule for upper bounds
We have $\mathrm{ert}(\texttt{while}(G)\,P, t) = \mathrm{lfp}\, X.\ \Phi_t(X)$ with

$\Phi_t(X) = 1 + ([G] \cdot \mathrm{ert}(P, X) + [\neg G] \cdot t)$

By Park's lemma: if $\Phi_t(I) \preceq I$ then $\mathrm{ert}(\texttt{while}(G)\,P, t) \preceq I$.

Induction on loops
    while (c) { (x++ [1/2] c := false) }

- Post runtime equals 0
- Characteristic function: $\Phi_0(X) = 1 + [c = 1] \cdot \left(2 + \tfrac{1}{2} \cdot (X[x := x + 1] + X[c := 0])\right)$
- Candidate for upper bound: $I = 1 + [c = 1] \cdot 6$
- Induction: $\Phi_0(I) = 1 + [c = 1] \cdot \left(2 + \tfrac{1}{2} \cdot (1 + [c = 1] \cdot 6 + 1 + [0 = 1] \cdot 6)\right) = 1 + [c = 1] \cdot 6 \leq I$

By Park's lemma: $\mathrm{ert}(\texttt{while} \ldots) \leq 1 + [c = 1] \cdot 6$
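The induction step above can be checked mechanically. The following Python sketch is an illustration only: it assumes that a run-time for this loop depends on `c` alone (true for the candidate I, which does not mention `x`), so a run-time is represented as a dict with the two keys 0 and 1. The names `phi` and `I` are chosen for this sketch.

```python
from fractions import Fraction

def phi(X):
    """Characteristic function of  while (c) { x++ [1/2] c := false }  for post run-time 0.

    X maps the value of c (0 or 1) to a run-time value.
    """
    half = Fraction(1, 2)
    return {
        0: Fraction(1),                       # guard is false: one step for the guard check
        1: 1 + (2 + half * (X[1] + X[0])),    # x++ keeps c = 1, c := false sets c = 0
    }

# Candidate upper bound I = 1 + [c = 1] * 6
I = {0: Fraction(1), 1: Fraction(7)}
print(phi(I) == I)  # True: Phi_0(I) <= I holds (even with equality)

# Fixed-point iteration from the zero run-time approaches the same value from below.
X = {0: Fraction(0), 1: Fraction(0)}
for _ in range(60):
    X = phi(X)
print(float(X[1]))
```

Since `phi(I)` equals `I`, the premise of Park's lemma holds, and the iteration from 0 suggests that the bound is tight for this loop.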
Coupon collector's problem
    cp := [0,...,0];            // no coupons yet
    i := 1;                     // coupon to be collected next
    x := 0;                     // number of coupons collected
    while (x < N) {
      while (cp[i] != 0) {
        i := uniform(1..N)      // next coupon
      }
      cp[i] := 1;               // coupon i obtained
      x++                       // one coupon less to go
    }

Using our ert-calculus one can prove that the expected runtime is $\Theta(N \log N)$.

By systematic formal verification à la Floyd-Hoare. Machine checkable.

Elementary properties of the ert-calculus
- Continuity: $\mathrm{ert}(P, t)$ is continuous, that is, for every chain $T = t_0 \leq t_1 \leq t_2 \leq \ldots$: $\mathrm{ert}(P, \sup T) = \sup \mathrm{ert}(P, T)$
- Monotonicity: $t \leq t'$ implies $\mathrm{ert}(P, t) \leq \mathrm{ert}(P, t')$
- Constant propagation: $\mathrm{ert}(P, k + t) = k + \mathrm{ert}(P, t)$
- Preservation of $\infty$: $\mathrm{ert}(P, \infty) = \infty$
- Connection to wp: $\mathrm{ert}(P, t) = \mathrm{ert}(P, 0) + \mathrm{wp}(P, t)$
- Affinity: $\mathrm{ert}(P, r \cdot t + u) = \mathrm{ert}(P, 0) + r \cdot \mathrm{wp}(P, t) + \mathrm{wp}(P, u)$

(Positive) almost-sure termination
For every pGCL program P and input state s:

$\mathrm{ert}(P, 0)(s) < \infty$ (positive a.s.-termination on s) implies $\mathrm{wp}(P, 1)(s) = 1$ (almost-sure termination on s).

Moreover:

$\mathrm{ert}(P, 0) < \infty$ (universal positive a.s.-termination) implies $\mathrm{wp}(P, 1) = 1$ (universal almost-sure termination).

These (well-known) facts have a short proof using the elementary properties.
Obtaining lower bounds inductively
Let n be a natural number and let while(G) P be our loop.

The sequence of run-times $I_n$ is a lower ω-invariant w.r.t. t iff

$I_0 \preceq F_t(0)$ and $I_{n+1} \preceq F_t(I_n)$ for all n.

Recall: $F_t(X) = 1 + ([G] \cdot \mathrm{ert}(P, X) + [\neg G] \cdot t)$, the characteristic function written $\Phi_t$ earlier.
Lower bounds
If $I_n$ is a lower ω-invariant w.r.t. t and $\lim_{n \to \infty} I_n$ exists, then:

$\lim_{n \to \infty} I_n \preceq \mathrm{ert}(\texttt{while}(G)\,P, t)$

Completeness: such lower ω-invariants always exist.

PAST is not compositional
Consider the two probabilistic programs:

    int x := 1; bool c := true;
    while (c) { {c := false [0.5] c := true}; x := 2*x }

Finite expected termination time.

    while (x > 0) { x-- }

Finite termination time.

Running the right after the left program yields an infinite expected termination time.

Proving that PAST is not compositional (1)
    while (x > 0) { x := x-1 }

It is easy to check that a lower ω-invariant is:

$J_n = 1 + \underbrace{[0 < x < n] \cdot 2x}_{\text{on iteration}} + \underbrace{\ldots}_{\text{on termination}}$
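As a small illustration of lower ω-invariants for this loop, the Python sketch below computes the Kleene iterates $F_0^n(0)$ of the characteristic function for post run-time 0. Taking $I_n = F_0^{n+1}(0)$ satisfies both conditions of a lower ω-invariant trivially; this is one possible choice and not the closed form $J_n$ from the slide. The function names `F` and `kleene` are assumptions of this sketch.

```python
def F(X):
    """F_0(X) for  while (x > 0) { x := x - 1 }  with post run-time 0.

    One step for the guard, plus (if x > 0) one step for the assignment
    followed by X evaluated in the updated state.
    """
    return lambda x: 1 + ((1 + X(x - 1)) if x > 0 else 0)

def kleene(n):
    """Return F_0^n(0), the n-th iterate starting from the constant-zero run-time."""
    X = lambda x: 0
    for _ in range(n):
        X = F(X)
    return X

# The iterates at x = 3 increase and stabilise at 1 + 2*3 = 7.
print([kleene(n)(3) for n in range(6)])  # [0, 2, 4, 6, 7, 7]
```

The limit of the iterates is $1 + [x > 0] \cdot 2x$, which is the expected (here: exact) runtime of the loop under the cost model of the ert table.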
Proving that PAST is not compositional (2)
    while (c) { {c := false [0.5] c := true}; x := 2*x };
    while (x > 0) { x := x-1 }

Template for a lower ω-invariant of the composed program:

$I_n = 1 + \underbrace{[c \neq 1] \cdot (1 + [x > 0] \cdot 2x)}_{\text{on termination}} + \underbrace{\ldots}_{\text{on iteration}}$
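The divergence of the expected runtime of the composed program can also be seen numerically. The sketch below assumes the initial state x = 1, c = true from the earlier slide, so the first loop stops after k iterations (k ≥ 1) with probability $2^{-k}$ and leaves $x = 2^k$; it further uses that the second loop started at x > 0 has runtime 1 + 2x under the ert cost model. Only the contribution of the second loop is summed; the function name is hypothetical.

```python
from fractions import Fraction

def partial_expected_runtime(K):
    """Sum over k = 1..K of Pr(first loop stops after k iterations)
    times the runtime of the second loop started with x = 2**k."""
    total = Fraction(0)
    for k in range(1, K + 1):
        prob = Fraction(1, 2 ** k)          # first loop runs exactly k iterations
        x = 2 ** k                          # value of x when the first loop exits
        second_loop_runtime = 1 + 2 * x     # ert of while (x > 0) { x := x - 1 } at x
        total += prob * second_loop_runtime
    return total

for K in (1, 10, 100):
    print(K, float(partial_expected_runtime(K)))
```

Each term contributes $2^{-k} \cdot (1 + 2^{k+1}) > 2$, so the partial sums grow without bound although each loop on its own has a finite expected runtime.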
Some works using the ert-calculus
- Certification of ert-calculus in Isabelle/HOL theorem prover. [Hölzl, ITP 2016]
- Automated resource analysis for probabilistic programs. [Hoffmann et al., PLDI 2018]
- Type-based complexity analysis of probabilistic functional programs. [Avanzini, Dal Lago et al., 2019]
- Expected run-time analysis of quantum programs. [Liu, Zhou and Ying 2019]

Overview
1. Expected runtime analysis
2. Analysing Bayesian networks
3. Epilogue

The importance of Bayesian networks
"Bayesian networks are as important to AI and machine learning as Boolean circuits are to computer science." [Stuart Russell (Univ. of California, Berkeley), 2009]

Key problem: probabilistic inference. This is PP-complete.

Printer troubleshooting in Windows 95
How likely is it that your print is garbled given that the ps-file is not and the page orientation is portrait?

Bayesian inference
How likely is it that a student ends up in a bad mood after getting a bad grade for an easy exam, given that she is well prepared?

Bayesian inference
$\Pr(D = 0, G = 0, M = 0 \mid P = 1) = \dfrac{\Pr(D = 0, G = 0, M = 0, P = 1)}{\Pr(P = 1)} = \dfrac{0.6 \cdot 0.5 \cdot 0.9 \cdot 0.3}{0.3} = 0.27$
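The arithmetic of this conditional probability can be replayed in a few lines of Python. The four factors are taken from the computation above; which conditional probability table entry each factor belongs to is an assumption of this sketch (the network figure is not part of the text), as are the variable names.

```python
from fractions import Fraction

# Factors of the joint probability, as they appear in the computation above.
# The attribution to table entries is an assumption for illustration.
p_d0 = Fraction(6, 10)         # Pr(D = 0)
p_g0_given = Fraction(5, 10)   # Pr(G = 0 | D = 0, P = 1)
p_m0_given = Fraction(9, 10)   # Pr(M = 0 | G = 0)
p_p1 = Fraction(3, 10)         # Pr(P = 1)

joint = p_d0 * p_g0_given * p_m0_given * p_p1   # Pr(D = 0, G = 0, M = 0, P = 1)
posterior = joint / p_p1                        # condition on the evidence P = 1

print(float(joint))      # 0.081
print(float(posterior))  # 0.27
```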
Bayesian inference by program verification
- Exact inference of Bayesian networks is PP-complete
- Approximate inference of BNs is NP-hard
- Typically simulative analyses are employed
  - Rejection Sampling
  - Markov Chain Monte Carlo (MCMC)
  - Metropolis-Hastings
  - Gibbs Sampling
  - Importance Sampling
  - ...
- Here: weakest precondition-reasoning

Reasoning about loops
Reasoning about loops is hard. Typically, loop invariants are used to capture the effect of loops. Finding such loop invariants is undecidable in general.

Bayesian networks correspond to "simple" probabilistic programs. Loops in such programs are "data-flow" free. Their effect can be given as a closed-form solution.

I.i.d. loops
Loop while(G)P is iid wrt. expectation f whenever both $\mathrm{wp}(P, [G])$ and $\mathrm{wp}(P, [\neg G] \cdot f)$ are unaffected by P.

f is unaffected by P if none of f's variables are modified by P:

x is a variable of f iff $\exists s.\ \exists v, u :\ f(s[x = v]) \neq f(s[x = u])$

If g is unaffected by program P, then: $\mathrm{wp}(P, g \cdot f) = g \cdot \mathrm{wp}(P, f)$

Example: sampling within a circle
    while ((x-5)**2 + (y-5)**2 >= 25) {
      x := uniform(0..10);
      y := uniform(0..10)
    }

This program is iid for every f, as both are unaffected by P's body:

$\mathrm{wp}(P, [G]) = \dfrac{48}{121}$ and $\mathrm{wp}(P, [\neg G] f) = \dfrac{1}{121} \sum_{i=0}^{10p} \sum_{j=0}^{10p} \left[(i/p - 5)^2 + (j/p - 5)^2 < 25\right] \cdot f[x/(i/p),\ y/(j/p)]$

Weakest precondition of iid-loops
If while(G)P is iid for expectation f, it holds for every state s:

$\mathrm{wp}(\texttt{while}(G)P, f)(s) = [G](s) \cdot \dfrac{\mathrm{wp}(P, [\neg G] f)(s)}{1 - \mathrm{wp}(P, [G])(s)} + [\neg G](s) \cdot f(s)$

where we let $\tfrac{0}{0} = 0$.

Proof: use $\mathrm{wp}(\texttt{while}^n(G)P, f) = [G] \cdot \mathrm{wp}(P, [\neg G] f) \cdot \sum_{i=0}^{n-2} \mathrm{wp}(P, [G])^i + [\neg G] \cdot f$

No loop invariant or martingale needed. Fully automatable.

Bayesian inference
How likely is it that a student ends up in a bad mood after getting a bad grade for an easy exam, given that she is well prepared?

Bayesian networks as programs
- Take a topological sort of the BN's vertices, e.g., D; P; G; M
- Map each conditional probability table (aka: node) to a program, e.g.:

      if (xD = 0 && xP = 0) {
        xG := 0 [0.95] xG := 1
      } else if (xD = 1 && xP = 1) {
        xG := 0 [0.05] xG := 1
      } else if (xD = 0 && xP = 1) {
        xG := 0 [0.5] xG := 1
      } else if (xD = 1 && xP = 0) {
        xG := 0 [0.6] xG := 1
      }

- Condition on the evidence, e.g., for P = 1 we get:

      repeat { progD ; progP ; progG ; progM } until (xP = 1)

Soundness
For BN B over V with evidence obs for $O \subseteq V$ and value $\underline{v}$ for node v:

$\underbrace{\mathrm{wp}\Big(\mathrm{prog}(B, \mathit{obs}),\ \Big[\bigwedge_{v \in V \setminus O} x_v = \underline{v}\Big]\Big)}_{\text{wp of the BN program of } B} = \Pr\Big(\bigwedge_{v \in V \setminus O} v = \underline{v} \ \Big|\ \bigwedge_{o \in O} o = \mathit{obs}(o)\Big)$
Exact inference by wp-reasoning
Ergo: exact Bayesian inference by wp-reasoning:

$\mathrm{wp}(P_{\mathit{mood}}, [x_D = 0 \wedge x_G = 0 \wedge x_M = 0]) = \dfrac{\Pr(D = 0, G = 0, M = 0, P = 1)}{\Pr(P = 1)} = 0.27$

How long to sample a BN?
[Gordon, Nori, Henzinger, Rajamani, 2014] "the main challenge in this setting [sampling-based approaches] is that many samples that are generated during execution are ultimately rejected for not satisfying the observations."

Andy Gordon, Tom Henzinger, Aditya Nori, Sriram Rajamani

Rejection sampling
For a given Bayesian network and some evidence:
- 1. Sample from the joint distribution described by the BN
- 2. If the sample complies with the evidence, accept the sample and halt
- 3. If not, repeat sampling (that is: go back to step 1.)
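The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the small network over D, P and G reuses the conditional probability table for G shown earlier, while the priors Pr(D = 0) = 0.6 and Pr(P = 1) = 0.3 are taken from the inference computation; the function names and the dict-based state are hypothetical.

```python
import random

# Pr(G = 0 | D, P), keyed by (D, P); values as in the table for node G shown earlier.
G0_GIVEN = {(0, 0): 0.95, (1, 1): 0.05, (0, 1): 0.5, (1, 0): 0.6}

def sample_joint(rng):
    """Step 1: draw one sample from the joint distribution, in topological order."""
    d = 0 if rng.random() < 0.6 else 1        # assumed prior Pr(D = 0) = 0.6
    p = 1 if rng.random() < 0.3 else 0        # assumed prior Pr(P = 1) = 0.3
    g = 0 if rng.random() < G0_GIVEN[(d, p)] else 1
    return {"D": d, "P": p, "G": g}

def rejection_sample(rng, evidence):
    """Steps 2 and 3: repeat sampling until the sample complies with the evidence."""
    attempts = 0
    while True:
        attempts += 1
        sample = sample_joint(rng)
        if all(sample[var] == val for var, val in evidence.items()):
            return sample, attempts

rng = random.Random(0)
sample, attempts = rejection_sample(rng, {"P": 1})
print(sample, attempts)
```

Because every attempt is an independent draw, the number of attempts is geometrically distributed; with evidence probability 0.3 its mean is 1/0.3, roughly 3.3 attempts per accepted sample.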
A toy Bayesian network
This BN is parametric (in a).

How many samples are needed on average for a single iid-sample for evidence G = 0?

Sampling time for example BN
Rejection sampling for G = 0 requires

$\dfrac{200a^2 - 40a - 460}{89a^2 - 69a - 21}$

samples. For $a \in [0.1, 0.78]$, the expected sampling time (EST) is below 18; for $a \geq 0.98$, about 100 samples are needed.

For real-life BNs, the EST may exceed $10^{15}$.

Expected runtime of iid-loops
For an a.s.-terminating iid-loop while(G)P for which every iteration runs in the same expected time, we have:

$\mathrm{ert}(\texttt{while}(G)P, t) = 1 + [G] \cdot \dfrac{1 + \mathrm{ert}(P, [\neg G] t)}{1 - \mathrm{wp}(P, [G])} + [\neg G] \cdot t$

where $0/0 := 0$ and $a/0 := \infty$ for $a \neq 0$.

Proof: similar to the one for inference (wp), using the decomposition lemma:

$\mathrm{ert}(P, t) = \mathrm{ert}(P, 0) + \mathrm{wp}(P, t)$

No loop invariant needed. Fully automatable.

Sample times of BN programs
Every BN-program is iid for every f, is almost surely terminating, and every loop iteration takes equally long on average.

This enables determining the exact expected sampling times of BNs in a fully automated manner.

But: BN-programs may not be positively a.s.-terminating. This holds for ill-conditioned BNs: the evidence in such BNs occurs with probability zero.

The student's mood example
$\mathrm{ert}(\underbrace{\texttt{repeat } D; P; G; M \texttt{ until } (P = 1)}_{\text{program of the student mood's BN}},\ 0) = \dfrac{1 + \mathrm{ert}(D; P; G; M,\ 0)}{\mathrm{wp}(D; P; G; M,\ [P = 1])} \approx 23.46$

Experimental results
Benchmark BNs from www.bnlearn.com

    BN           |V|    |E|    aMB    |O|   EST          time (s)
    hailfinder     56     66   3.54    5    5 * 10^5     0.63
    hepar2         70    123   4.51    1    1.5 * 10^2   1.84
    win95pts       76    112   5.92    3    4.3 * 10^5   0.36
    pathfinder    135    200   3.04    7    ∞            5.44
    andes         223    338   5.61    3    5.2 * 10^3   1.66
    pigs          441    592   3.92    1    2.9 * 10^3   0.74
    munin        1041   1397   3.54    5    ∞            1.43

aMB = average Markov Blanket size, a measure of independence in BNs

Printer troubleshooting in Windows 95
A Java implementation executes about $10^7$ steps in a single second.

For $|O| = 17$, an EST of $10^{15}$ yields 3.6 years of simulation for a single iid-sample.

Overview
1. Expected runtime analysis
2. Analysing Bayesian networks
3. Epilogue

Predictive probabilistic programming
Analysing probabilistic programs at source code level, compositionally.

Some open problems:
- Completeness
- Nondeterminism
- Invariant synthesis
Two take-home messages
Probabilistic programs are a universal quantitative modeling formalism: Bayesian networks, randomised algorithms, infinite-state Markov chains, pushdown Markov chains, security mechanisms, quantum programs, robotics, programs for inexact computing, ...

"The crux of probabilistic programming is to consider normal-looking programs as if they were probability distributions" [Michael Hicks, The Programming Language Enthusiast blog, 2014]

A big thanks to my co-authors!
Kevin Batz, Christian Dehnert, Friedrich Gretz, Nils Jansen, Benjamin Kaminski, Christoph Matheja, Annabelle McIver, Larissa Meinicke, Carroll Morgan, Federico Olmedo, Lukas Westhofen

Further reading
- A. Gordon, T. Henzinger, A. Nori, and S. Rajamani. Probabilistic programming. FOSE 2014.
- Z. Ghahramani. Probabilistic machine learning and artificial intelligence. Nature 2015.
- JPK, A. McIver, L. Meinicke, and C. Morgan. Linear-invariant generation for probabilistic programs. SAS 2010.
- F. Gretz, JPK, and A. McIver. PRINSYS - on a quest for probabilistic loop invariants. QEST 2013.
- F. Gretz, JPK, and A. McIver. Operational versus wp-semantics for pGCL. J. on Performance Evaluation, 2014.
- F. Olmedo, F. Gretz, N. Jansen, B. Kaminski, JPK, and A. McIver. Conditioning in probabilistic programming. ACM TOPLAS 2018.

pGCL model checking: www.stormchecker.org
- B. Kaminski, JPK, and C. Matheja. On the hardness of analysing probabilistic programs. Acta Inf. 2019.
- B. Kaminski and JPK. A wp-semantics for mixed-sign expectations. LICS 2017.
- B. Kaminski, JPK, C. Matheja, and F. Olmedo. Expected run-time analysis of probabilistic programs. J. ACM 2018.