Bayes meets Dijkstra: Exact Inference by Program Verification
Joost-Pieter Katoen, Dagstuhl Seminar "Model Checking and ML Join Forces", 2018


  1. Bayes meets Dijkstra: Exact Inference by Program Verification. Joost-Pieter Katoen, Dagstuhl Seminar "Model Checking and ML Join Forces", 2018.

  2. Bayes meets Dijkstra

  3. (image-only slide)

  4. Perspective. "There are several reasons why probabilistic programming could prove to be revolutionary for machine intelligence and scientific modelling."¹ Why? Probabilistic programming (1) obviates the need to manually provide inference methods, (2) enables rapid prototyping, and (3) clearly separates the model from the inference procedures. (¹ Ghahramani leads the Cambridge ML Group, and is with CMU, UCL, and the Turing Institute.)

  5. Predictive probabilistic programming. Verifiable programs are preferable to simulative guarantees. Our take: reason on program code, compositionally.

  6. Probabilistic graphical models

  7. Student's mood after an exam. How likely is it that a student ends up in a bad mood after getting a bad grade for an easy exam, given that she is well prepared?

  8. Printer troubleshooting in Windows 95. How likely is it that your printout is garbled, given that the ps-file is not and the page orientation is portrait?

  9. Probabilistic programs. What? Programs with random assignments and conditioning. Why?
     ▶ Random assignments: to describe randomised algorithms
     ▶ Conditioning: to describe stochastic decision making

  10. Applications. Languages: webPPL, ProbLog, R2, Figaro, Venture, ...

  11. Two take-home messages. Probabilistic programs are a universal quantitative modeling formalism: Bayes' networks, randomised algorithms, infinite-state Markov chains, pushdown Markov chains, security mechanisms, quantum programs, programs for inexact computing, ... "The goal of probabilistic programming is to enable probabilistic modeling and machine learning to be accessible to the working programmer." [Gordon, Henzinger, Nori, Rajamani, 2014]

  12. Roadmap: 1. Probabilistic weakest pre-conditions; 2. Bayesian inference by program analysis; 3. Termination; 4. Runtime analysis; 5. How long to sample a Bayes' network?; 6. Epilogue.

  13. Overview. Part 1: Probabilistic weakest pre-conditions.

  14. Probabilistic GCL [Kozen; McIver & Morgan]
      ▶ skip: empty statement
      ▶ diverge: divergence
      ▶ x := E: assignment
      ▶ observe (G): conditioning
      ▶ prog1 ; prog2: sequential composition
      ▶ if (G) prog1 else prog2: choice
      ▶ prog1 [p] prog2: probabilistic choice
      ▶ while (G) prog: iteration
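These constructs are straightforward to prototype. Below is a minimal sampling interpreter for the loop-free part of the language in Python; the helper names (prob_choice, observe, run) are assumptions of this sketch, not from the talk, and observe is modelled by rejecting runs.

    import random

    class ObservationFailure(Exception):
        """Raised when an observe(G) is violated; the run is rejected."""

    def prob_choice(p, left, right, state):
        # prog1 [p] prog2: execute left with probability p, else right
        (left if random.random() < p else right)(state)

    def observe(guard, state):
        # observe(G): block all runs violating G
        if not guard(state):
            raise ObservationFailure

    def run(program, tries=100_000):
        """Approximate the conditional distribution by rejection: dropped runs
        implicitly renormalize the probability of the feasible ones."""
        results = []
        for _ in range(tries):
            state = {}
            try:
                program(state)
                results.append(tuple(sorted(state.items())))
            except ObservationFailure:
                pass
        return results

    # The program of the next slide, transcribed with these helpers:
    def prog(s):
        prob_choice(0.5, lambda st: st.update(x=0), lambda st: st.update(x=1), s)
        prob_choice(0.5, lambda st: st.update(y=-1), lambda st: st.update(y=0), s)
        observe(lambda st: st['x'] + st['y'] == 0, s)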

  15. Let's start simple:
      x := 0 [0.5] x := 1;
      y := -1 [0.5] y := 0;
      observe (x+y = 0)
      This program blocks the two runs that violate x+y = 0. Outcome: Pr[x = 0, y = 0] = Pr[x = 1, y = -1] = 1/2. Observations thus normalize the probability of the "feasible" program runs.
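For a program this small, the normalization can also be checked by exact enumeration of the four runs (a Python sketch, not part of the talk):

    from fractions import Fraction

    half = Fraction(1, 2)
    runs = []  # (probability, x, y) for each resolution of the two coin flips
    for x, px in [(0, half), (1, half)]:
        for y, py in [(-1, half), (0, half)]:
            if x + y == 0:              # observe(x+y = 0): keep feasible runs only
                runs.append((px * py, x, y))

    total = sum(p for p, _, _ in runs)  # mass of the feasible runs: 1/2
    for p, x, y in runs:                # normalize: (1/4) / (1/2) = 1/2 each
        print(f"Pr[x={x}, y={y}] = {p / total}")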

  16. A loopy program. For an arbitrary probability 0 < p < 1:
      bool c := true;
      int i := 0;
      while (c) { i := i+1; (c := false [p] c := true) }
      observe (odd(i))
      The feasible program runs have probability ∑_{N≥0} (1−p)^{2N} ⋅ p = 1/(2−p). This program models the distribution: Pr[i = 2N+1] = (1−p)^{2N} ⋅ p ⋅ (2−p) for N ≥ 0, and Pr[i = 2N] = 0.
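The closed form can be sanity-checked by simulating the loop and rejecting even outcomes (a sketch; p = 0.3 is an arbitrary choice):

    import random

    p = 0.3

    def run():
        c, i = True, 0
        while c:
            i += 1
            c = random.random() >= p    # c := false [p] c := true
        return i

    # observe(odd(i)) as rejection: keep only odd outcomes
    samples = [i for i in (run() for _ in range(200_000)) if i % 2 == 1]
    for N in range(3):
        empirical = sum(1 for i in samples if i == 2 * N + 1) / len(samples)
        closed = (1 - p) ** (2 * N) * p * (2 - p)
        print(f"Pr[i={2*N+1}]: empirical {empirical:.4f}, closed form {closed:.4f}")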

  17. Or, equivalently:
      int i := 0;
      repeat {
        c := true; i := 0;
        while (c) { i := i+1; (c := false [p] c := true) }
      } until (odd(i))

  18. Weakest pre-expectations [McIver & Morgan 2004]. An expectation² maps states onto ℝ≥0 ∪ {∞}; it is the quantitative analogue of a predicate. Let f ≤ g iff f(s) ≤ g(s) for every state s. An expectation transformer is a total function from expectations to expectations. The transformer wp(P, f) yields the least expectation e on P's initial state ensuring that P terminates with expectation f. The annotation { e } P { f } holds for total correctness iff e ≤ wp(P, f). The weakest liberal pre-expectation is wlp(P, f) = "wp(P, f) + Pr[P diverges]". (² Not to be confused with expectations in probability theory.)

  19. Expectation transformer semantics of pGCL:
      wp(skip, f)               = f
      wp(diverge, f)            = 0
      wp(x := E, f)             = f[x := E]
      wp(observe (G), f)        = [G] ⋅ f
      wp(P1 ; P2, f)            = wp(P1, wp(P2, f))
      wp(if (G) P1 else P2, f)  = [G] ⋅ wp(P1, f) + [¬G] ⋅ wp(P2, f)
      wp(P1 [p] P2, f)          = p ⋅ wp(P1, f) + (1−p) ⋅ wp(P2, f)
      wp(while (G) P, f)        = µX. ([G] ⋅ wp(P, X) + [¬G] ⋅ f)
      Here µ is the least fixed point operator wrt. the ordering ≤. The wlp-semantics differs from the wp-semantics only for while and diverge.
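For the loop-free fragment, the table transcribes directly into a small symbolic wp-evaluator. The sketch below uses sympy, with Iverson brackets as Piecewise expressions; the tuple encoding of programs is an assumption of this sketch.

    import sympy as sp

    x = sp.Symbol('x')

    def iverson(guard):
        # Iverson bracket [G]: 1 where the guard holds, 0 elsewhere
        return sp.Piecewise((1, guard), (0, True))

    def wp(prog, f):
        """Weakest pre-expectation of a loop-free pGCL program, per the table above."""
        kind = prog[0]
        if kind == 'skip':                    # ('skip',)
            return f
        if kind == 'diverge':                 # ('diverge',)
            return sp.Integer(0)
        if kind == 'assign':                  # ('assign', var, E): f[x := E]
            _, var, expr = prog
            return f.subs(var, expr)
        if kind == 'observe':                 # ('observe', G): [G] * f
            return iverson(prog[1]) * f
        if kind == 'seq':                     # ('seq', P1, P2)
            return wp(prog[1], wp(prog[2], f))
        if kind == 'ite':                     # ('ite', G, P1, P2)
            _, g, p1, p2 = prog
            return iverson(g) * wp(p1, f) + iverson(sp.Not(g)) * wp(p2, f)
        if kind == 'pchoice':                 # ('pchoice', p, P1, P2)
            _, prob, p1, p2 = prog
            return prob * wp(p1, f) + (1 - prob) * wp(p2, f)
        raise ValueError(f"unknown statement: {kind}")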

  20. Examples.
      1. Let program P be: x := 5 [4/5] x := 10. For f = x, we have wp(P, x) = 4/5 ⋅ wp(x := 5, x) + 1/5 ⋅ wp(x := 10, x) = 4/5 ⋅ 5 + 1/5 ⋅ 10 = 6.
      2. Let program P′ be: x := x+5 [4/5] x := 10. For f = x, we have wp(P′, x) = 4/5 ⋅ wp(x := x+5, x) + 1/5 ⋅ wp(x := 10, x) = 4/5 ⋅ (x+5) + 1/5 ⋅ 10 = 4x/5 + 6.
      3. For program P′ (again) and f = [x = 10], we have wp(P′, [x = 10]) = 4/5 ⋅ wp(x := x+5, [x = 10]) + 1/5 ⋅ wp(x := 10, [x = 10]) = 4/5 ⋅ [x+5 = 10] + 1/5 ⋅ [10 = 10] = 4/5 ⋅ [x = 5] + 1/5.
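Reusing the evaluator sketched after slide 19, examples 2 and 3 can be reproduced mechanically:

    # assumes wp(), iverson() and the symbol x from the sketch after slide 19
    import sympy as sp

    P_prime = ('pchoice', sp.Rational(4, 5),
               ('assign', x, x + 5),
               ('assign', x, sp.Integer(10)))

    print(sp.simplify(wp(P_prime, x)))  # example 2: 4*x/5 + 6
    # example 3: 4/5 * [x = 5] + 1/5, printed as a Piecewise expression
    print(sp.simplify(wp(P_prime, iverson(sp.Eq(x, 10)))))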

  21. An operational perspective. For program P, input s, and expectation f:
      wp(P, f)(s) / wlp(P, 1)(s) = E_s[Rew^[[P]](◇ sink | ¬◇ ↯)]
      That is, the ratio wp(P, f) / wlp(P, 1) for input s equals³ the conditional expected reward to reach the successful terminal state sink while satisfying all observe's in the Markov chain [[P]]. For finite-state programs, wp-reasoning can be done with model checkers such as PRISM and Storm (www.stormchecker.org). (³ Either both sides are equal or both sides are undefined.)
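On the finite program of slide 15, the ratio can be computed by brute-force enumeration rather than a model checker; a hand-rolled sketch (not PRISM/Storm):

    from fractions import Fraction

    half = Fraction(1, 2)
    # the four runs of the slide-15 program: (probability, x, y, feasible?)
    runs = [(half * half, x, y, x + y == 0) for x in (0, 1) for y in (-1, 0)]

    f = lambda x, y: 1 if x == 0 else 0  # post-expectation f = [x = 0]
    wp_f  = sum(p * f(x, y) for p, x, y, ok in runs if ok)  # reward mass reaching sink
    wlp_1 = sum(p for p, x, y, ok in runs if ok)            # probability of avoiding ↯
    print(wp_f / wlp_1)  # conditional expectation of f: 1/2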

  22. Overview. Part 2: Bayesian inference by program analysis.

  23. Bayesian inference. How likely is it that a student ends up in a bad mood after getting a bad grade for an easy exam, given that she is well prepared?

  24. Bayesian inference.
      Pr(D=0, G=0, M=0 | P=1) = Pr(D=0, G=0, M=0, P=1) / Pr(P=1) = (0.6 ⋅ 0.5 ⋅ 0.9 ⋅ 0.3) / 0.3 = 0.27
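The arithmetic, as a two-line check (the factorization 0.6 ⋅ 0.5 ⋅ 0.9 ⋅ 0.3 and the evidence probability 0.3 are read off the slide's network, which is not reproduced here):

    joint = 0.6 * 0.5 * 0.9 * 0.3  # Pr(D=0, G=0, M=0, P=1), factored along the BN
    evidence = 0.3                 # Pr(P=1)
    print(joint / evidence)        # Pr(D=0, G=0, M=0 | P=1) = 0.27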

  25. Bayesian inference by program verification.
      ▶ Exact inference of Bayesian networks is NP-hard
      ▶ Approximate inference of BNs is NP-hard too
      ▶ Typically, simulative analyses are employed: rejection sampling, Markov Chain Monte Carlo (MCMC), Metropolis-Hastings, importance sampling, ... (a generic sketch follows below)
      ▶ Here: weakest precondition-reasoning
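For contrast with wp-reasoning, the simplest simulative analysis is rejection sampling: sample the prior, discard runs violating the evidence. A generic sketch, with an invented three-variable stand-in for the student network (the CPT values here are made up for illustration):

    import random

    def rejection_sample(prior, evidence, query, n=200_000):
        """Estimate Pr[query | evidence]: sample the prior, discard violating runs."""
        hits = total = 0
        for _ in range(n):
            s = prior()
            if evidence(s):       # the observe of the program becomes a filter
                total += 1
                hits += query(s)
        return hits / total

    # hypothetical stand-in for the student network (CPT values invented):
    def student():
        prepared = random.random() < 0.3
        bad_grade = random.random() < (0.2 if prepared else 0.7)
        bad_mood = random.random() < (0.9 if bad_grade else 0.1)
        return {'prepared': prepared, 'bad_grade': bad_grade, 'bad_mood': bad_mood}

    print(rejection_sample(student, lambda s: s['prepared'], lambda s: s['bad_mood']))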

  26. I.i.d. loops. A loop while (G) P is iid wrt. expectation f whenever both wp(P, [G]) and wp(P, [¬G] ⋅ f) are unaffected by P. Here f is unaffected by P if none of f's variables is modified by P, where x is a variable of f iff ∃s. ∃v, u: f(s[x := v]) ≠ f(s[x := u]). If g is unaffected by program P, then wp(P, g ⋅ f) = g ⋅ wp(P, f) (spot-checked in the sketch below).
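The unaffectedness lemma can be spot-checked with the wp-evaluator sketched after slide 19 (g mentions only y, which the program never writes):

    # assumes wp() and iverson() from the sketch after slide 19
    import sympy as sp

    x, y = sp.symbols('x y')
    P = ('pchoice', sp.Rational(1, 2),
         ('assign', x, x + 1),
         ('assign', x, sp.Integer(0)))  # P modifies only x

    g, f = y**2, x + y                  # g is unaffected by P
    print(sp.simplify(wp(P, g * f) - g * wp(P, f)))  # 0: wp(P, g*f) = g * wp(P, f)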

  27. Example: sampling within a circle.
      while ((x-5)**2 + (y-5)**2 >= 25) {
        x := uniform (0..10);
        y := uniform (0..10)
      }
      This loop is iid for every f, as both of the following are unaffected by P's body:
      wp(P, [G]) = 48/121, and
      wp(P, [¬G] ⋅ f) = 1/121 ⋅ ∑_{i=0}^{10} ∑_{j=0}^{10} [(i−5)² + (j−5)² < 25] ⋅ f[x := i, y := j]
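A quick simulation illustrates the iid character of this loop: on exit, (x, y) is uniform over the grid points inside the circle, regardless of how many iterations were needed (a sketch):

    import random
    from collections import Counter

    def sample_in_circle():
        x = y = 0  # any initial state outside the circle enters the loop
        while (x - 5) ** 2 + (y - 5) ** 2 >= 25:
            x = random.randint(0, 10)  # uniform(0..10)
            y = random.randint(0, 10)
        return x, y

    counts = Counter(sample_in_circle() for _ in range(100_000))
    # every accepted grid point should appear with roughly equal frequency
    freqs = sorted(c / 100_000 for c in counts.values())
    print(len(counts), freqs[0], freqs[-1])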
