Semantics for Probabilistic Programming, by Chris Heunen


SLIDE 1

Semantics for Probabilistic Programming

Chris Heunen

1 / 27

SLIDE 2

Bayes’ law

P(A | B) = P(B | A) × P(A) / P(B)

2 / 27

SLIDE 3

Bayes’ law

P(A | B) = P(B | A) × P(A) / P(B)

Bayesian reasoning:

◮ predict future, based on model and prior evidence
◮ infer causes, based on model and posterior evidence
◮ learn better model, based on prior model and evidence

2 / 27
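A quick numeric check of Bayes' law, with made-up numbers (the prevalence, sensitivity, and false-positive rate below are hypothetical, not from the talk):

```python
# Bayes' law: P(A | B) = P(B | A) * P(A) / P(B), with P(B) expanded
# by the law of total probability. All numbers here are hypothetical.
def posterior(prior, likelihood, likelihood_given_not_a):
    evidence = likelihood * prior + likelihood_given_not_a * (1.0 - prior)
    return likelihood * prior / evidence

# A 95%-sensitive test with a 10% false-positive rate,
# for a condition with 1% prevalence:
p = posterior(prior=0.01, likelihood=0.95, likelihood_given_not_a=0.10)
# p is about 0.088: a positive test still leaves the condition unlikely.
```

Even a fairly accurate test yields a small posterior when the prior is small, which is exactly the prior-sensitivity that Bayesian reasoning makes explicit.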

SLIDE 4

Bayesian networks

3 / 27

SLIDE 5

Bayesian inference

4 / 27

SLIDE 6

Bayesian data modelling

1. Develop probabilistic (generative) model
2. Design inference algorithm for model
3. Use algorithm to fit model to data

Example: find effect of drug on patient, given data

5 / 27

SLIDE 7

Linear regression

Generative model

s ∼ normal(0, 2)
b ∼ normal(0, 6)
f(x) = s · x + b
yi ∼ normal(f(i), 0.5) for i = 0 . . . 6

Conditioning

y0 = 0.6, y1 = 0.7, y2 = 1.2, y3 = 3.2, y4 = 6.8, y5 = 8.2, y6 = 8.4

Predict f

6 / 27
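The generative model on this slide can be simulated directly. A minimal Python sketch (the talk's examples are in Anglican; this is only an illustration, and it takes the second parameter of normal as a standard deviation):

```python
import random

random.seed(0)

# Generative model from the slide: a prior over slope s and intercept b,
# then noisy observations y_i ~ normal(f(i), 0.5) for i = 0..6.
def generate():
    s = random.gauss(0.0, 2.0)   # s ~ normal(0, 2)
    b = random.gauss(0.0, 6.0)   # b ~ normal(0, 6)
    f = lambda x: s * x + b
    ys = [random.gauss(f(i), 0.5) for i in range(7)]
    return s, b, ys

s, b, ys = generate()
```

Conditioning then asks: given observed values y0, ..., y6, which draws of (s, b) are plausible?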

SLIDE 8

Linear regression

7 / 27

SLIDE 9

Probabilistic programming

1. Develop probabilistic (generative) model: write a program
2. Design inference algorithm for model: use a built-in algorithm to fit the model to data

8 / 27

SLIDE 10

Probabilistic programming

1. Develop probabilistic (generative) model: write a program
2. Design inference algorithm for model: use a built-in algorithm to fit the model to data

P(A | B) ∝ P(B | A) × P(A)
posterior ∝ likelihood × prior
functional programming + observe + sample

8 / 27


SLIDE 12

Linear regression

(defquery Bayesian-linear-regression
  (let [f (let [s (sample (normal 0.0 3.0))
                b (sample (normal 0.0 3.0))]
            (fn [x] (+ (* s x) b)))]
    (observe (normal (f 1.0) 0.5) 2.5)
    (observe (normal (f 2.0) 0.5) 3.8)
    (observe (normal (f 3.0) 0.5) 4.5)
    (observe (normal (f 4.0) 0.5) 6.2)
    (observe (normal (f 5.0) 0.5) 8.0)
    (predict :f f)))

9 / 27

SLIDE 13

Linear regression

10 / 27

SLIDE 14

Linear regression

11 / 27

SLIDE 15

Measure theory

It is impossible to sample exactly 0.5 from the standard normal distribution, but a sample lands in the interval (0, 1) with probability around 0.34.

12 / 27

SLIDE 16

Measure theory

It is impossible to sample exactly 0.5 from the standard normal distribution, but a sample lands in the interval (0, 1) with probability around 0.34.

A measurable space is a set X with a family ΣX of subsets that is closed under countable unions and complements.

A (probability) measure on X is a function p : ΣX → [0, ∞] that satisfies p(⋃n Un) = Σn p(Un) for disjoint Un (and has p(X) = 1).

12 / 27

SLIDE 17

Measure theory

It is impossible to sample exactly 0.5 from the standard normal distribution, but a sample lands in the interval (0, 1) with probability around 0.34.

A measurable space is a set X with a family ΣX of subsets that is closed under countable unions and complements.

A (probability) measure on X is a function p : ΣX → [0, ∞] that satisfies p(⋃n Un) = Σn p(Un) for disjoint Un (and has p(X) = 1).

A function f : X → Y is measurable if f⁻¹(U) ∈ ΣX for all U ∈ ΣY.

A random variable is a measurable function R → X.

12 / 27
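The "around 0.34" can be checked with the standard normal CDF, expressed via the error function (a small Python check, not part of the talk):

```python
import math

# Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
def phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# A single point such as 0.5 has probability zero under a continuous
# distribution, but the interval (0, 1) has probability Phi(1) - Phi(0).
p_interval = phi(1.0) - phi(0.0)
# p_interval is about 0.3413, the "around 0.34" on the slide.
```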

SLIDE 18

Function types

Currying: a morphism f : Z × X → Y corresponds to f̂ : Z → [X → Y], with ev ∘ (f̂ × idX) = f for the evaluation map ev : [X → Y] × X → Y.

13 / 27

SLIDE 19

Function types

Currying: a morphism f : Z × X → Y corresponds to f̂ : Z → [X → Y], with ev ∘ (f̂ × idX) = f for the evaluation map ev : [X → Y] × X → Y.

But [R → R] cannot be a measurable space!

13 / 27

SLIDE 20

Quasi-Borel spaces

A quasi-Borel space is a set X together with MX ⊆ [R → X] satisfying:

14 / 27

SLIDE 21

Quasi-Borel spaces

A quasi-Borel space is a set X together with MX ⊆ [R → X] satisfying:

◮ α ∈ MX if α: R → X is constant

14 / 27

SLIDE 22

Quasi-Borel spaces

A quasi-Borel space is a set X together with MX ⊆ [R → X] satisfying:

◮ α ∈ MX if α : R → X is constant
◮ α ◦ ϕ ∈ MX if α ∈ MX and ϕ : R → R is measurable

14 / 27

SLIDE 23

Quasi-Borel spaces

A quasi-Borel space is a set X together with MX ⊆ [R → X] satisfying:

◮ α ∈ MX if α : R → X is constant
◮ α ◦ ϕ ∈ MX if α ∈ MX and ϕ : R → R is measurable
◮ if R = ⋃n∈N Sn, with each set Sn Borel, and α1, α2, . . . ∈ MX, then β ∈ MX, where β(r) = αn(r) for r ∈ Sn

14 / 27

SLIDE 24

Quasi-Borel spaces

A quasi-Borel space is a set X together with MX ⊆ [R → X] satisfying:

◮ α ∈ MX if α : R → X is constant
◮ α ◦ ϕ ∈ MX if α ∈ MX and ϕ : R → R is measurable
◮ if R = ⋃n∈N Sn, with each set Sn Borel, and α1, α2, . . . ∈ MX, then β ∈ MX, where β(r) = αn(r) for r ∈ Sn

A morphism is a function f : X → Y with f ◦ α ∈ MY whenever α ∈ MX

14 / 27

SLIDE 25

Quasi-Borel spaces

A quasi-Borel space is a set X together with MX ⊆ [R → X] satisfying:

◮ α ∈ MX if α : R → X is constant
◮ α ◦ ϕ ∈ MX if α ∈ MX and ϕ : R → R is measurable
◮ if R = ⋃n∈N Sn, with each set Sn Borel, and α1, α2, . . . ∈ MX, then β ∈ MX, where β(r) = αn(r) for r ∈ Sn

A morphism is a function f : X → Y with f ◦ α ∈ MY whenever α ∈ MX

Qbs:
◮ has product types
◮ has sum types
◮ has function types!

M[X→Y] = {α : R → [X → Y] | α̂ : R × X → Y is a morphism}

14 / 27

SLIDE 26

Example quasi-Borel spaces

Set ⇄ Qbs (⊥):
X ↦ (X, {case Sn . xn | (Sn) a Borel partition of R, xn ∈ X})
(X, MX) ↦ X

15 / 27

SLIDE 27

Example quasi-Borel spaces

Set ⇄ Qbs (⊥):
X ↦ (X, {case Sn . xn | (Sn) a Borel partition of R, xn ∈ X})
(X, MX) ↦ X

Set ⇄ Qbs (⊤):
X ↦ (X, {all α : R → X})
(X, MX) ↦ X

15 / 27

SLIDE 28

Example quasi-Borel spaces

Set ⇄ Qbs (⊥):
X ↦ (X, {case Sn . xn | (Sn) a Borel partition of R, xn ∈ X})
(X, MX) ↦ X

Set ⇄ Qbs (⊤):
X ↦ (X, {all α : R → X})
(X, MX) ↦ X

Meas ⇄ Qbs (⊤):
(X, ΣX) ↦ (X, {α : R → X measurable})
(X, MX) ↦ (X, {U | ∀α ∈ MX : α⁻¹(U) measurable})

15 / 27

SLIDE 29

Distribution types

A measure on a quasi-Borel space (X, MX) consists of

◮ α ∈ MX, and
◮ a probability measure µ on R

Two measures are identified when they induce the same pushforward µ(α⁻¹(−))

16 / 27

SLIDE 30

Distribution types

A measure on a quasi-Borel space (X, MX) consists of

◮ α ∈ MX, and
◮ a probability measure µ on R

Two measures are identified when they induce the same pushforward µ(α⁻¹(−)).

This gives a monad for distribution types:

◮ P(X, MX) = {(α, µ) measure on (X, MX)} / ∼
◮ return x = [λr. x, µ]∼ for arbitrary µ
◮ bind uses integration: ∫ f d(α, µ) := ∫ (f ◦ α) dµ for morphisms f : (X, MX) → R

16 / 27
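A toy Python rendering of this representation (an illustration, not the paper's formal construction): a measure is a pair of a random element α and a base measure µ on R, here given by a sampler, and probabilities are pushforwards µ(α⁻¹(−)) estimated by Monte Carlo.

```python
import random

random.seed(1)

class Measure:
    """A measure on a quasi-Borel space, as a pair (alpha, mu):
    alpha is a random element R -> X, and mu is a probability measure
    on R, represented here by a sampler."""
    def __init__(self, alpha, sample_mu):
        self.alpha = alpha
        self.sample_mu = sample_mu

    def prob(self, pred, n=100_000):
        # Monte Carlo estimate of mu(alpha^{-1}(U)) for U = {x | pred(x)}
        return sum(pred(self.alpha(self.sample_mu())) for _ in range(n)) / n

def ret(x):
    # return x = [lambda r: x, mu] for an arbitrary mu: the choice of mu
    # does not matter, since the pushforward is a point mass either way.
    return Measure(lambda r: x, random.random)

# Pushforward of the standard normal along squaring:
# P(r^2 < 1) = P(|Z| < 1), which is about 0.68.
m = Measure(lambda r: r * r, lambda: random.gauss(0.0, 1.0))
p = m.prob(lambda x: x < 1.0)
```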

SLIDE 31

Example: facts about distributions

let x = sample(gauss(0.0, 1.0)) in
return (x < 0)

=

sample(bern(0.5))

17 / 27
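This program equality can be checked empirically; a Monte Carlo sketch in Python (both sides should produce `true` about half the time, by symmetry of the normal around 0):

```python
import random

random.seed(2)
N = 100_000

# Left-hand side: sample x from gauss(0, 1) and return (x < 0)
lhs = sum(random.gauss(0.0, 1.0) < 0.0 for _ in range(N)) / N

# Right-hand side: sample from bern(0.5)
rhs = sum(random.random() < 0.5 for _ in range(N)) / N
```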

SLIDE 32

Example: importance sampling

sample(exp(2))

=

let x = sample(gauss(0, 1)) in
observe(exp-pdf(2, x) / gauss-pdf(0, 1, x));
return x

18 / 27
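A self-normalised importance-sampling sketch of this equality in Python (illustrative only: gauss(0, 1) is a poor proposal for exp(2) far in the tails, but it serves for a demonstration):

```python
import math
import random

random.seed(3)

# Samples from gauss(0, 1), weighted by exp-pdf(2, x) / gauss-pdf(0, 1, x),
# behave like samples from exp(2), the exponential with rate 2 (mean 1/2).
def exp_pdf(rate, x):
    return rate * math.exp(-rate * x) if x >= 0.0 else 0.0

def gauss_pdf(mu, sigma, x):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

N = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ws = [exp_pdf(2.0, x) / gauss_pdf(0.0, 1.0, x) for x in xs]

# Weighted estimate of the mean; for exp(2) the true mean is 0.5.
est_mean = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
```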
SLIDE 33

Example: conjugate priors

let x = sample(beta(1, 1)) in
observe(bern(x), true);
return x

=

observe(bern(0.5), true);
let x = sample(beta(2, 1)) in
return x

19 / 27
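This equality is the usual Beta-Bernoulli conjugacy, which can be checked analytically. A small sketch (the helper name is mine, not the talk's): observing `true` from bern(x) under a beta(a, b) prior on x gives posterior beta(a + 1, b), with marginal likelihood a / (a + b).

```python
# Conjugate update for a Bernoulli observation under a beta prior.
def update_beta_bernoulli(a, b, obs):
    evidence = a / (a + b) if obs else b / (a + b)
    posterior = (a + 1, b) if obs else (a, b + 1)
    return posterior, evidence

post, z = update_beta_bernoulli(1.0, 1.0, True)
# post == (2.0, 1.0) and z == 0.5: the beta(2, 1) and bern(0.5) on the slide.
```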
SLIDE 34

Linear regression

(defquery Bayesian-linear-regression
  ; Prior:
  (let [f (let [s (sample (normal 0.0 3.0))
                b (sample (normal 0.0 3.0))]
            (fn [x] (+ (* s x) b)))]
    ; Likelihood:
    (observe (normal (f 1.0) 0.5) 2.5)
    (observe (normal (f 2.0) 0.5) 3.8)
    (observe (normal (f 3.0) 0.5) 4.5)
    (observe (normal (f 4.0) 0.5) 6.2)
    (observe (normal (f 5.0) 0.5) 8.0)
    ; Posterior:
    (predict :f f)))

20 / 27

SLIDE 35

Linear regression: prior

Define a prior measure on [R → R]:

(let [f (let [s (sample (normal 0.0 3.0))
              b (sample (normal 0.0 3.0))]
          (fn [x] (+ (* s x) b)))]

=

[α, ν ⊗ ν]∼ ∈ P([R → R]), where ν is the normal distribution with mean 0 and standard deviation 3, and α : R × R → [R → R] is (s, b) ↦ λr. s·r + b

21 / 27

SLIDE 36

Linear regression: likelihood

Define the likelihood of the observations (with some noise):

(observe (normal (f 1.0) 0.5) 2.5)
(observe (normal (f 2.0) 0.5) 3.8)
(observe (normal (f 3.0) 0.5) 4.5)
(observe (normal (f 4.0) 0.5) 6.2)
(observe (normal (f 5.0) 0.5) 8.0)

=

d(f(1), 2.5) · d(f(2), 3.8) · d(f(3), 4.5) · d(f(4), 6.2) · d(f(5), 8.0)

where f is a free variable of type [R → R], and d : R² → [0, ∞) is the density of the normal distribution with standard deviation 0.5:

d(µ, x) = √(2/π) · exp(−2(x − µ)²)

22 / 27

SLIDE 37

Linear regression: Posterior

Normalise the combined prior and likelihood:

(predict :f f))) ∈ P([R → R])

23 / 27

SLIDE 38

Piecewise linear regression: Posterior

Normalise the combined prior and likelihood:

(predict :f f))) ∈ P([R → R])

24 / 27

SLIDE 39

Modular inference algorithms

An inference representation is a monad (T, return, >>=) with a meaning map T X → P X, sample : 1 → T [0, 1], and score : [0, ∞) → T 1.

◮ Discrete weighted sampler (e.g. coin flip)
◮ Continuous sampler

25 / 27

SLIDE 40

Modular inference algorithms

An inference representation is a monad (T, return, >>=) with a meaning map T X → P X, sample : 1 → T [0, 1], and score : [0, ∞) → T 1.

◮ Discrete weighted sampler (e.g. coin flip)
◮ Continuous sampler

An inference transformer respects meaning, sample, and score.

◮ List: T(−) → T(List(−))
◮ Continuous weighting: T(−) → T([0, ∞) ∗ (−))
◮ Population: T(−) → T(List([0, ∞) ∗ (−)))

25 / 27
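A toy weighted sampler in the spirit of the continuous-weighting representation (a Python sketch, with names of my choosing, not the paper's library): a computation is a thunk returning (value, weight); `sample` draws uniformly from [0, 1], `score` multiplies in a likelihood weight, and the meaning map to P normalises the weights.

```python
import random

random.seed(4)

def ret(x):
    return lambda: (x, 1.0)

def bind(m, k):
    # Run m, feed its value to k, and multiply the weights.
    def run():
        x, w1 = m()
        y, w2 = k(x)()
        return y, w1 * w2
    return run

def sample():
    return lambda: (random.random(), 1.0)

def score(w):
    return lambda: (None, w)

# Example: weight a uniform sample by 2u, conditioning towards larger u.
prog = bind(sample(), lambda u: bind(score(2.0 * u), lambda _: ret(u)))
draws = [prog() for _ in range(100_000)]

# "Meaning" of the weighted sampler: the self-normalised posterior mean.
# The target density on [0, 1] is proportional to u, which has mean 2/3.
post_mean = sum(u * w for u, w in draws) / sum(w for _, w in draws)
```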
SLIDE 41

Modular inference algorithms library

Sequential Monte Carlo: approximate a distribution by a population of weighted samples (particles / suspended computations), repeatedly applying a fixed random process (particle filter)

26 / 27

SLIDE 42

Want more?

◮ “Semantics for probabilistic programming: higher-order functions, continuous distributions, and soft constraints”, LICS 2016
◮ “A convenient category for higher-order probability theory”, LICS 2017
◮ “Denotational validation of higher-order Bayesian inference”, POPL 2018

27 / 27

SLIDE 43

De Finetti’s theorem

Every exchangeable sequence of random observations in R can be generated by:

◮ choose a single probability distribution on R
◮ sample from it independently, repeatedly

SLIDE 44

De Finetti’s theorem

Every exchangeable sequence of random observations in a quasi-Borel space X can be generated by:

◮ choose a single probability distribution on X
◮ sample from it independently, repeatedly

SLIDE 45

Trace Markov Chain Monte Carlo

Repeatedly use kernel to propose new value, decide whether to accept (Metropolis-Hastings update). Random walk in target space: program traces.

◮ Metropolis-Hastings-Green: the update preserves the distribution.
◮ Program traces form an inference representation.
◮ Trace MCMC is an inference transformation (parametrised by a proposal kernel).
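A minimal Metropolis-Hastings random walk in Python (a simplified stand-in: here the "trace" is a single real value rather than a full program trace, and the target is a standard normal):

```python
import math
import random

random.seed(5)

def target(x):
    # Unnormalised density of the standard normal.
    return math.exp(-0.5 * x * x)

def mh(n_steps, step=1.0):
    x, out = 0.0, []
    for _ in range(n_steps):
        prop = x + random.gauss(0.0, step)           # symmetric proposal kernel
        if random.random() < target(prop) / target(x):
            x = prop                                 # accept the proposal
        out.append(x)                                # on reject, keep the old value
    return out

chain = mh(50_000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
# For the standard-normal target, mean is near 0 and variance near 1.
```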