SLIDE 1

Semantics of Probabilistic and Differential Programming

Workshop on program transformations at NeurIPS

Christine Tasson (tasson@irif.fr) December 2019

Institut de Recherche en Informatique Fondamentale

SLIDE 2

Every programmer can perform data analysis: models are described as programs, and the key operations (inference and gradient computations) are delegated to the compiler.

Probabilistic programming languages: BUGS (Spiegelhalter et al. 1995), BLOG (Milch et al. 2005), Church (Goodman et al. 2008), WebPPL (Goodman et al. 2014), Venture (Mansinghka et al. 2014), Anglican (Wood et al. 2015), Stan (Stan Development Team 2014), Hakaru (Narayanan et al. 2016), BayesDB (Mansinghka et al. 2017), Edward (Tran et al. 2017), Birch (Murray et al. 2018), Turing (Ge et al. 2018), Gen (Cusumano-Towner et al. 2019), Pyro (Bingham et al. 2019), . . .

Differential programming languages: Theano (Bergstra et al. 2010), TensorFlow 1.0 (Abadi et al. 2016, Yu et al. 2018), Tangent (van Merrienboer et al. 2018), Autograd (Maclaurin et al. 2015), TensorFlow Eager Mode (Shankar and Dobson 2017), Chainer (Tokui 2018), PyTorch (PyTorch 2018), and JAX (Frostig et al. 2018), . . .

SLIDE 3

Probabilistic Programming

Bayesian Inference

SLIDE 4

Sampling

Idea: How to model probability distributions by programs

def plinko(n):
    if n == 0:
        return 0
    else:
        if coin():
            return plinko(n-1) + 1
        else:
            return plinko(n-1) - 1

[Photo of a Galton board, by Matemateca (IME USP)]


sample(plinko(4))          # one draw, e.g. 2
nSample(plinko(4), 1000)   # draw 1000 samples
plot(gaussian(0, 1))       # compare the histogram with a Gaussian
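A runnable sketch of the same experiment in plain Python (coin, sample and nSample are not defined on the slides, so random.random and a list comprehension stand in for them; the empirical distribution of plinko(4) concentrates on {-4, -2, 0, 2, 4} with binomial weights):

import random
from collections import Counter

def coin():
    return random.random() < 0.5

def plinko(n):
    if n == 0:
        return 0
    return plinko(n - 1) + 1 if coin() else plinko(n - 1) - 1

samples = [plinko(4) for _ in range(1000)]   # plays the role of nSample(plinko(4), 1000)
print(Counter(samples))                      # roughly 0: 375, ±2: 250 each, ±4: 62 each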

SLIDE 9

What is Bayesian Inference?

Gender Bias (Laplace): Paris, from 1745 to 1770, f0 = 241 945 females out of B0 = 493 472 births (49%). What is the probability of being born female?

  • female births are independent and follow the same law with bias θ
  • the probability of getting f females out of B births is

    P(f | θ, B) = (B choose f) · θ^f · (1 − θ)^(B−f)

Novelty: the bias θ of being born female itself follows a probability distribution.
Inference paradigm: what is the law of θ conditioned on f and B?

  • Sample θ from a postulated distribution π (prior)
  • Simulate data f from the outcome θ (likelihood)
  • Infer the distribution of θ (posterior) by Bayes' law

    P(θ | f, B) = P(f | θ, B) π(θ) / ∫_θ P(f | θ, B) π(θ) dθ = α · P(f | θ, B) π(θ)
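With the uniform prior used on the next slide, this posterior has a standard closed form (Beta-binomial conjugacy, not stated on the slide):

\[
\pi = \mathrm{Uniform}(0,1) \;\Longrightarrow\;
P(\theta \mid f_0, B_0) \propto \theta^{f_0}(1-\theta)^{B_0-f_0},
\quad\text{i.e.}\quad \theta \mid f_0, B_0 \sim \mathrm{Beta}(f_0+1,\, B_0-f_0+1),
\qquad \mathbb{E}[\theta \mid f_0, B_0] = \frac{f_0+1}{B_0+2} \approx 0.4903 .
\]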

SLIDE 10

Conditioning and inference

# model
def fBirth(theta, B):
    if B == 0:
        return 0
    else:
        f = flip(theta)
        return f + fBirth(theta, B-1)

# parameter (prior)
theta = uniform(0, 1)

# data 1747 - 1783
f0 = 241945
B0 = 493472

# inference (posterior)
infer(fBirth, theta, f0, B0)

Idea: adjust the distribution of theta by comparing simulated data to the observed data.

SLIDE 11

Inference by rejection sampling

# prior: Unit -> S
def guesser():
    return sample(uniform(0, 1))

# predicate: int x int -> (S -> Boolean)
def checker(f0, B0):
    return lambda theta: gBirth(theta, B0) == f0

# infer: (Unit -> S) -> (S -> Boolean) -> S
def rejection(guesser, accept):
    theta = guesser()
    if accept(theta):
        return theta
    else:
        return rejection(guesser, accept)

Problem: very inefficient (the simulated count must match f0 exactly), hence other, approximate methods.
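A runnable version of this rejection sampler (flip and gBirth are not defined on the slides, so random stands in; the data are scaled down to a hypothetical f0 = 6, B0 = 10, because requiring an exact match on 493 472 births makes acceptance astronomically unlikely, which is exactly the inefficiency noted above):

import random

def gBirth(theta, B):
    # simulate B births with bias theta and count the females
    return sum(random.random() < theta for _ in range(B))

def guesser():
    return random.uniform(0, 1)

def checker(f0, B0):
    return lambda theta: gBirth(theta, B0) == f0

def rejection(guesser, accept):
    while True:               # iterative rather than recursive, to avoid deep recursion
        theta = guesser()
        if accept(theta):
            return theta

print(rejection(guesser, checker(6, 10)))   # one posterior sample for the small data set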

SLIDE 12

Inference by Metropolis-Hastings

Infer θ by Bayes' law: P(θ | f, B) = α · P(f | θ, B) π(θ)

# proportion: S x S -> float   (acceptance ratio: proposal y against current x)
def proportion(x, y):
    return P(f | y, B0) / P(f | x, B0)

# Metropolis-Hastings: int * int * int -> S
def metropolis(n, f0, B0):
    if n == 0:
        return f0 / B0
    else:
        x = metropolis(n-1, f0, B0)
        y = gaussian(x, 1)
        z = bernoulli(min(1, proportion(x, y)))
        if z == 0:
            return x
        else:
            return y
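A runnable sketch of this chain (written iteratively to stay within Python's recursion limit; the likelihood ratio is computed through the binomial log-likelihood, whose constant factor cancels, and the proposal standard deviation is shrunk to 0.001 because with B0 ≈ 5·10^5 the posterior is far too narrow for the slide's gaussian(x, 1)):

import math, random

f0, B0 = 241945, 493472

def log_like(theta):
    # binomial log-likelihood up to an additive constant
    if theta <= 0.0 or theta >= 1.0:
        return float("-inf")
    return f0 * math.log(theta) + (B0 - f0) * math.log(1.0 - theta)

def metropolis(n):
    x = f0 / B0                            # start from the empirical frequency
    for _ in range(n):
        y = random.gauss(x, 0.001)         # symmetric proposal
        log_ratio = log_like(y) - log_like(x)
        if log_ratio >= 0 or random.random() < math.exp(log_ratio):
            x = y                          # accept with probability min(1, P(f0|y)/P(f0|x))
    return x

print(metropolis(10000))                   # approximately one posterior draw, close to 0.49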

SLIDE 13

Probabilistic Programming

Semantics

SLIDE 16

Problems in semantics

  • Prove formally the correspondence between algorithms, implementations and mathematics.
  • Prove that two programs have equivalent behavior.

Operational Semantics describes how probabilistic programs compute.
Proba(M, N) is the probability p that M reduces to N in one step, written M −p→ N, defined by induction on the structure of M:

  • (λx.M)N −1→ M[N/x]
  • coin −1/2→ 0
  • coin −1/2→ 1
  • . . .

Denotational Semantics describes what probabilistic programs compute.
⟦M⟧ is a probability distribution, if M is a closed ground-type program.

  • If M has type nat, then ⟦M⟧ is a discrete distribution over the integers
  • If M has type real, then ⟦M⟧ is a continuous distribution over the reals

SLIDE 17

Operational Semantics on an example

(Borgström-Dal Lago-Gordon-Szymczak ICFP’16)

def addCoins():
    a = coin
    b = coin
    c = coin
    return (a + b + c)

Rules: (λx.M)N −1→ M[N/x], coin −1/2→ 0, coin −1/2→ 1, . . .

One possible reduction sequence:

addCoins()
  −1→    a = coin; b = coin; c = coin; (a + b + c)
  −1/2→  a = 0; b = coin; c = coin; (a + b + c)
  −1/2→  a = 0; b = 1; c = coin; (a + b + c)
  −1/2→  a = 0; b = 1; c = 1; (a + b + c)
  −1→    b = 1; c = 1; (0 + b + c)
  −1→    c = 1; (0 + 1 + c)
  −1→    (0 + 1 + 1)
  −1→    2

SLIDE 19

Operational Semantics on an example (continued)

(Borgström-Dal Lago-Gordon-Szymczak ICFP’16)

Abbreviating each reduction sequence by the values drawn for a, b, c, three paths lead from addCoins() to 2, each with probability 1/2 · 1/2 · 1/2 = 1/8:

  • a=0, b=1, c=1
  • a=1, b=0, c=1
  • a=1, b=1, c=0

Proba∞(addCoins(), 2) = 3/8
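A quick check of this value, enumerating the eight equally likely outcomes of the three coins (itertools stands in for the reduction tree; not part of the slides):

from fractions import Fraction
from itertools import product

prob = sum(Fraction(1, 8) for a, b, c in product((0, 1), repeat=3) if a + b + c == 2)
print(prob)   # 3/8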

SLIDE 20

Operational Semantics

Proba∞(M, N) is the probability that M reduces to N in any number of steps.
Behavioral equivalence: M1 ≃ M2 iff ∀C[ ], Proba∞(C[M1], 0) = Proba∞(C[M2], 0)

def addCoins1():
    a = coin
    b = coin
    c = coin
    return (a + b + c)

def addCoins2():
    b = coin
    a = coin
    c = coin
    return (a + b + c)

def infer1(f0, B0):
    return rejection(guesser, checker(f0, B0))

def infer2(f0, B0):
    return metropolis(1000, f0, B0)

SLIDE 21

Denotational Semantics: a crucial challenge

"The developers of probabilistic programming languages need to ensure that the implementation of compilers, optimizers, and inference algorithms do not have bugs."

(van de Meent-Paige-Yang-Wood 2018)

Denotational semantics makes it possible to define the mathematical meaning of every probabilistic program.
Problem: measurable sets and measurable functions are not suitable to interpret higher-order functional probabilistic programming languages. The evaluation map ev : F(R, R) × R → R with ev(f, r) = f(r) is not measurable, whatever measurable sets we put on the set F(R, R) of measurable functions between the reals endowed with the Borel sets.

(Aumann 1961)

SLIDE 22

Denotational Semantics: a topic of active research

Semantics for HOPPL with continuous probability

  • Quasi Borel Spaces (Kammar-Staton+Heunen-Yang LICS’17, +Vakar POPL’19)
  • Measurable positive Cones and Stable maps (Ehrhard-Pagani-T. POPL’18)
  • Ordered Banach Spaces and Regular maps (Dahlqvist-Kozen POPL’20)

Applications

  • Probabilistic programming inference via intensional semantics for FO (Castellan-Paquet ESOP’19)
  • Well-typed inference programs are sound by construction. (Lew-Cusumano-Towner-Sherman-Carbin-Mansinghka POPL’20)
  • Denotational semantics and program analysis for score estimators (Lee-Yu-Rival-Yang POPL’20)

SLIDE 23

Semantics of a Bayesian Network

Network: Winter → Sprinkler, Winter → Rain, (Sprinkler, Rain) → Grass, with tables:

p(W) = [3/5, 2/5]   over {t, f}

P(S|W)          S=t     S=f
  W=t           1/5     4/5
  W=f           3/4     1/4

P(R|W)          R=t     R=f
  W=t           4/5     1/5
  W=f           1/10    9/10

P(G|S,R)        G=t     G=f
  S=t, R=t      19/20   1/20
  S=t, R=f      9/10    1/10
  S=f, R=t      4/5     1/5
  S=f, R=f      0       1

p(S)_b = Σ_{a ∈ {t,f}} P(S|W)_{a,b} · p(W)_a,   for b ∈ {t, f}

SLIDE 24

Semantics of a Bayesian Network

With the tables above read as matrices (kernels) and p(W) as a row vector:

p(W) · P(S|W) = p(S),   p(W) · P(R|W) = p(R),   and   (p(S) ⊗ p(R)) · P(G|S, R) = p(G)

SLIDE 25

Semantics of a Bayesian Network

Add the duplication (copy) map ∆ : W → W ⊗ W:

∆            (t,t)   (t,f)   (f,t)   (f,f)
  W=t         1       0       0       0
  W=f         0       0       0       1

Then the whole network factors as

p(W) · ∆ · (P(S|W) ⊗ P(R|W)) · P(G|S, R) = p(G)

(Jacobs-Kissinger-Zanasi FOSSACS’19)
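A numerical check of this factorization (NumPy's einsum stands in for the composition of kernels; the duplication ∆ appears as the shared index w):

import numpy as np

pW    = np.array([3/5, 2/5])                        # p(W) over (t, f)
PSgW  = np.array([[1/5, 4/5], [3/4, 1/4]])          # P(S|W), rows indexed by W
PRgW  = np.array([[4/5, 1/5], [1/10, 9/10]])        # P(R|W)
PGgSR = np.array([[[19/20, 1/20], [9/10, 1/10]],
                  [[4/5, 1/5], [0.0, 1.0]]])        # P(G|S,R), indexed by (S, R)

# p(W) followed by ∆, then P(S|W) ⊗ P(R|W), then P(G|S,R): ∆ makes S and R share the same W
pG = np.einsum("w,ws,wr,srg->g", pW, PSgW, PRgW, PGgSR)
print(pG)   # [p(G=t), p(G=f)] ≈ [0.6995, 0.3005]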

SLIDE 26

Denotational Semantics: Probabilistic Coherent Spaces

(Danos-Ehrhard 2011)

Probabilistic Coherent Spaces (Pcoh): an adequate model of probabilistic functional programming with discrete probability. Object (|X|, P(X)):

  • the universe |X| is a (potentially infinite) set of final states
  • a set of vectors P(X) ⊆ (ℝ+)^|X| such that
    closure: P(X)⊥⊥ = P(X), where for P ⊆ (ℝ+)^|X|, P⊥ = {v ∈ (ℝ+)^|X| ; ∀u ∈ P, Σ_{a∈|X|} u_a v_a ≤ 1}
    bounded covering: ∀a ∈ |X|, ∃v ∈ P(X) with v_a ≠ 0, and ∃p > 0 such that ∀v ∈ P(X), v_a ≤ p.

SLIDE 31

Denotational Semantics: Probabilistic Coherent Spaces

(Danos-Ehrhard 2011)

Types A, B ::= nat | A → B | . . . are interpreted by objects ⟦A⟧ = (|A|, P(A)), defined by induction on A.

  • unit type 1: |1| = {()} and P(1) = [0, 1]
  • B = 1 ⊕ 1: |B| = {t, f} and P(B) = {x · t + y · f | x + y ≤ 1}; e.g. p(W) = [3/5, 2/5] ∈ P(B)
  • nat = 1 ⊕ nat: |nat| = ℕ and P(nat) = sub-probability distributions over ℕ
  • B∗ = 1 ⊕ (B ⊗ B∗): |B∗| = {ε} ∪ {b1 · · · · · bn | n ∈ ℕ, bi ∈ |B|} and P(B∗) = sub-probability distributions over words of booleans

SLIDE 32

Semantics: Probabilistic Coherent Spaces

(Danos-Ehrhard 2011)

Morphism M : X → Y (linear case): a matrix M ∈ (ℝ+)^(|X|×|Y|) such that

  ∀x ∈ P(X) ⊆ (ℝ+)^|X|,   M·x = ( Σ_{a∈|X|} M_{a,b} · x_a )_{b∈|Y|} ∈ P(Y) ⊆ (ℝ+)^|Y|

SLIDE 36

Semantics: Probabilistic Coherent Spaces

(Danos-Ehrhard 2011)

Morphism M : X → Y (general case): a matrix M ∈ (ℝ+)^(Mfin(|X|)×|Y|), indexed by finite multisets over |X|, such that

  ∀x ∈ P(X) ⊆ (ℝ+)^|X|,   M(x) = ( Σ_{m∈Mfin(|X|)} M_{m,b} · Π_{a∈m} x_a^{m(a)} )_{b∈|Y|} ∈ P(Y) ⊆ (ℝ+)^|Y|

Programs M, N ::= x | λx^A.M | (M)N | fix(M) | n | coin | . . . are interpreted by morphisms, by induction on M.

  • if M : A, then ⟦M⟧ ∈ P(⟦A⟧); for instance ⟦n⟧ = (0, . . . , 1, 0, . . . ) (the 1 in position n) and ⟦coin⟧ = (1/2, 1/2, 0, . . . )
  • if M : A → B, then ⟦M⟧ : P(⟦A⟧) → P(⟦B⟧) is a Taylor series
    if M : 1 → 1, then ⟦M⟧ is a smooth real function from [0, 1] to [0, 1]
    if M : nat ⊸ nat, then ⟦M⟧ is a sub-stochastic matrix
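A worked example of the last point (not on the slide, under the standard Pcoh interpretation): the term M = λx^1. if coin then () else x of type 1 → 1 uses its argument zero times with probability 1/2 and exactly once with probability 1/2, so

\[
\llbracket M \rrbracket(p) \;=\; \tfrac{1}{2} + \tfrac{1}{2}\,p , \qquad p \in \mathsf{P}(1) = [0,1],
\]

a degree-one Taylor series mapping the termination probability of the argument to the termination probability of the result; genuinely recursive programs, such as F on Slide 48, give infinite series.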

SLIDE 37

Probabilistic Coherent Spaces

Sound: in the deterministic case, if M → N then ⟦M⟧ = ⟦N⟧; in general, ⟦M⟧ = Σ_N Proba(M, N) · ⟦N⟧.

Adequate: if M is a closed term of type nat, then ⟦M⟧_n = Proba∞(M, n), so ⟦M⟧ is a sub-probability distribution on ℕ. (Danos-Ehrhard 2011)

Fully abstract: ⟦M⟧ = ⟦N⟧ iff M ≃ N. (Ehrhard-Pagani-T. POPL’14) Based on Taylor series; this full abstraction result generalizes to quantum programming. (Clairambault-De Visme POPL’20)

CBPV: for every algebraic type X, the duplication ∆ : X → X ⊗ X is valid. (Ehrhard-T. 2019)

SLIDE 38

Differential Programming

Semantics

SLIDE 39

Semantics of Differential Programming

The basic ingredient in Pcoh is that ⟦M⟧ is a Taylor series. This is actually the case in many quantitative semantics stemming from the linear logic account of resource consumption.

(Girard 1987)

A zoology of topological vector spaces giving semantics of HOPL: Köthe spaces (Ehrhard 2002), Finiteness spaces (Ehrhard 2005), Convenient vector spaces (Blute-Ehrhard-T. 2012), Mackey-complete vector spaces (Kerjean-T. 2018).

Ingredients

  • Programs are smooth maps,
  • Programs are Taylor series
  • Derivative operator is a map in the model

SLIDE 40

From semantics to syntax

Differential lambda-calculus (Ehrhard-Regnier 2003)

Linearization of the application, mirroring f(x) ≈ f(x0) + f′(x0)(x − x0) around x0:

  D(λx.M)·N → λx.(∂M/∂x · N)

Taylor expansion (Ehrhard-Regnier 2006), mirroring f(x) = Σ_{n≥0} f^(n)(0)/n! · x^n:

  λ-calculus −TE→ resource calculus, where an application (M)N becomes ⟨s⟩[t1, . . . , tn]

(Barbarossa-Manzonetto POPL’20): a theory of approximation of programs based on resource consumption.

SLIDE 41

Differential Programming

Automatic Differentiation

SLIDE 44

Automatic Differentiation on computational graphs

Computational graph: z1 = x1 − x2, z2 = z1 · z1, y = sin(z2), so that G(x1, x2) = sin((x1 − x2)²).

Forward propagation at (x1, x2) = (5, 2): compute each node value si, paired with an adjoint initialized to 0:
  x1 = (5, 0), x2 = (2, 0), z1 = (3, 0), z2 = (9, 0), y = (0.412, 0);   G(5, 2) = 0.412.

Backward propagation: compute the adjoints αi using the chain rule
  ∂f(v1, . . . , vn)/∂x = Σ_{i=1}^{n} (∂f/∂vi) · (∂vi/∂x),
updating each node to zi = (si, βi + (∂y/∂zi) · αi):
  y = (0.412, 1); z2 = (9, 0 + cos(9)·1) = (9, −0.911); z1 = (3, 3·(−0.911) + 3·(−0.911)) = (3, −5.467);
  x1 = (5, −5.467), x2 = (2, 5.467);   ∇G(5, 2) = (−5.467, 5.467).
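A minimal reverse-mode sketch of this computation in Python (the Node class, the sub/mul/sin helpers and the topological sort are illustrative choices, not the slides' notation):

import math

class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.adjoint = 0.0
        self.parents = parents               # list of (parent_node, local_derivative)

def sub(a, b): return Node(a.value - b.value, [(a, 1.0), (b, -1.0)])
def mul(a, b): return Node(a.value * b.value, [(a, b.value), (b, a.value)])
def sin(a):    return Node(math.sin(a.value), [(a, math.cos(a.value))])

def topo(node, visited=None, order=None):
    # parents come before the node, so reversed(order) is a valid backward order
    if visited is None:
        visited, order = set(), []
    if id(node) not in visited:
        visited.add(id(node))
        for parent, _ in node.parents:
            topo(parent, visited, order)
        order.append(node)
    return order

def backward(y):
    y.adjoint = 1.0
    for node in reversed(topo(y)):
        for parent, d in node.parents:
            parent.adjoint += d * node.adjoint   # accumulate over all uses of the parent

x1, x2 = Node(5.0), Node(2.0)
z1 = sub(x1, x2)
y = sin(mul(z1, z1))
backward(y)
print(y.value, x1.adjoint, x2.adjoint)   # 0.412..., -5.467..., 5.467...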

SLIDE 45

Compositional backpropagation

(Brunel-Mazza-Pagani POPL’20)

Linear substitution calculus (Accattoli 2012):

  • HOPPL with explicit linear substitution
  • well suited for fine-grained complexity analysis
  • no recursion or conditionals

Linear negation of real: ∂y/∂v is a linear map from R to R.

  • delimited continuations (Wang et al ICFP’19)
  • backpropagators (Pearlmutter-Siskind 2008)

Backpropagation

  • a program transformation on programs of type real
  • corresponds to the usual algorithm on computational graphs

Backpropagation is sound, efficient and compositional.
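A tiny backpropagator-style sketch in Python, in the spirit of Pearlmutter-Siskind: each value is paired with a linear map (a "backpropagator") sending the output adjoint to the input adjoints; the concrete helpers var1, var2, sub, mul, sin are illustrative, not the calculus of the paper.

import math

def var1(x1, x2): return (x1, lambda d: (d, 0.0))
def var2(x1, x2): return (x2, lambda d: (0.0, d))

def sub(a, b):
    (va, ba), (vb, bb) = a, b
    return (va - vb, lambda d: tuple(p + q for p, q in zip(ba(d), bb(-d))))

def mul(a, b):
    (va, ba), (vb, bb) = a, b
    return (va * vb, lambda d: tuple(p + q for p, q in zip(ba(d * vb), bb(d * va))))

def sin(a):
    (va, ba) = a
    return (math.sin(va), lambda d: ba(d * math.cos(va)))

def grad_G(x1, x2):
    a, b = var1(x1, x2), var2(x1, x2)
    z1 = sub(a, b)
    y, back = sin(mul(z1, z1))
    return y, back(1.0)          # feed adjoint 1 into the composed linear map

print(grad_G(5.0, 2.0))          # (0.412..., (-5.467..., 5.467...))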

SLIDE 46

Formalized Reverse derivative

(Abadi-Plotkin POPL’20)

Language: first order, with fixpoints, conditionals and reverse derivatives.
  A, B ::= real | 1 | A × B
  M, N, L ::= x | r | f(M) | fix(f) | M · rdL(x.N) | if B then M else N | . . .
Traces: programs with no application, fixpoints, conditionals or reverse derivatives.
Operational Semantics formalizing trace-based differentiation:

  • Transformation of programs into trace programs
  • Reverse derivative as an operator on traces
  • Evaluation

Denotational Semantics (example: if (x < 0) then 0 else x):

  • Types as ordered sets with properties from domain theory
  • Programs as differentiable partial functions defined on open domains

Sound and adequate model

SLIDE 47

Differential Programming

Mixing with Probabilistic Programming

SLIDE 48

Derivative of probabilistic programs

(Ehrhard FSCD’19)

Real functions are maps from 1 to 1 in Pcoh. Consider

  F = fix f^(1→1). λx^1. if coin then () else x; f(x)

Its interpretation is a Taylor series φ = ⟦F⟧ : [0, 1] → [0, 1]:

  ∀x ∈ [0, 1],  φ(x) = Σ_{n≥0} a_n x^n,   with   φ(x) = 1/2 + (1/2) · x · φ(φ(x)) = 1 − √(1 − x).

Analysis of compilation: a_n is the probability that F() uses its argument exactly n times to produce an output.
The derivative φ′(1) = Σ_{n≥0} n · a_n is the expectation of the number of times F uses its argument to produce its output.
F(1) converges almost surely, but with an infinite expected computation time.
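A numerical illustration, assuming the closed form φ(x) = 1 − √(1 − x) stated above: its Taylor coefficients satisfy a_1 = 1/2 and a_{n+1} = a_n · (2n − 1)/(2(n + 1)); they sum to 1 (almost sure termination of F(1)), while Σ n · a_n diverges (infinite expected number of uses of the argument).

a, total, expected = 0.5, 0.0, 0.0     # a = a_1
for n in range(1, 200001):
    total += a
    expected += n * a
    a *= (2 * n - 1) / (2 * (n + 1))   # recurrence for the coefficients of 1 - sqrt(1 - x)
print(total, expected)                 # total -> 1, expected keeps growing (about sqrt(N/pi))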

SLIDE 49

Conclusion

Summary: formalizing compilers is a crucial challenge to avoid generating bugs. Semantics makes it possible to prove that the program transformations at play in probabilistic and differential programming are correct.

Future works

  • How to characterize inference approximations.
  • Use the semantics tools for certification and proofs.

Related works

  • Semantics of derivation potentially mixed with probability.
  • Probabilistic distributed computing.

PIHOC-PPS-DIAPASoN workshop, Paris, Feb 26-28, 2020: program semantics and formal methods for probabilistic programming, statistical learning, differential and approximate computing.

(organized by Dal Lago-Ehrhard-Pagani)
