SLIDE 1

Semantics for Probabilistic Programming

Chris Heunen

SLIDE 2–3

Bayes’ law

P(A | B) = P(B | A) × P(A) / P(B)

Bayesian reasoning:

◮ predict the future, based on model and prior evidence
◮ infer causes, based on model and posterior evidence
◮ learn a better model, based on prior model and evidence
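The law on this slide can be checked numerically; a minimal sketch with two hypotheses, where the prior 0.01 and the likelihoods 0.90 and 0.05 are illustrative numbers, not from the slides:

```python
# Bayes' law P(A|B) = P(B|A) * P(A) / P(B) for two hypotheses A and not-A.
# The denominator P(B) expands by the law of total probability.
p_a = 0.01              # illustrative prior P(A)
p_b_given_a = 0.90      # illustrative likelihood P(B|A)
p_b_given_not_a = 0.05  # illustrative likelihood P(B|not-A)

p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
posterior = p_b_given_a * p_a / p_b
print(round(posterior, 4))  # 0.1538
```

Even with a likelihood of 0.9, the small prior keeps the posterior near 0.15: exactly the kind of update the following slides automate.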

SLIDE 4

Bayesian networks

SLIDE 5

Bayesian inference

SLIDE 6

Linear regression

SLIDE 7

Probabilistic programming

P(A | B) ∝ P(B | A) × P(A)

posterior ∝ likelihood × prior

functional programming + observe + sample

SLIDE 9

Linear regression

(defquery Bayesian-linear-regression
  (let [f (let [s (sample (normal 0.0 3.0))
                b (sample (normal 0.0 3.0))]
            (fn [x] (+ (* s x) b)))]
    (observe (normal (f 1.0) 0.5) 2.5)
    (observe (normal (f 2.0) 0.5) 3.8)
    (observe (normal (f 3.0) 0.5) 4.5)
    (observe (normal (f 4.0) 0.5) 6.2)
    (observe (normal (f 5.0) 0.5) 8.0)
    (predict :f f)))
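For intuition, the same query can be approximated outside Anglican. This Python sketch uses generic likelihood weighting (sample parameters from the prior, weight by the likelihood of the data); it is an illustrative stand-in, not Anglican's actual inference backend:

```python
import math
import random

# Likelihood weighting for the regression query: draw (slope, intercept)
# from the prior, weight each draw by the likelihood of the observations.
data = [(1.0, 2.5), (2.0, 3.8), (3.0, 4.5), (4.0, 6.2), (5.0, 8.0)]

def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

random.seed(0)
draws = []
for _ in range(20000):
    s = random.gauss(0.0, 3.0)  # sample slope from its prior
    b = random.gauss(0.0, 3.0)  # sample intercept from its prior
    # each observe contributes the log-density of the datum under f(x) = s*x + b
    logw = sum(normal_logpdf(y, s * x + b, 0.5) for x, y in data)
    draws.append((s, b, logw))

max_logw = max(lw for _, _, lw in draws)
weights = [math.exp(lw - max_logw) for _, _, lw in draws]
total = sum(weights)
post_slope = sum(w * s for (s, _, _), w in zip(draws, weights)) / total
print(post_slope)  # posterior mean of the slope; the least-squares slope is 1.34
```

The weighted draws concentrate around the least-squares fit, mildly shrunk toward the Normal(0, 3) prior.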

SLIDE 10

Linear regression

SLIDE 11

Linear regression

SLIDE 12–14

Measure theory

It is impossible to sample exactly 0.5 from the standard normal distribution, but a sample lands in the interval (0, 1) with probability around 0.34.

A measurable space is a set X with a family ΣX of subsets that is closed under countable unions and complements.

A (probability) measure on X is a function p: ΣX → [0, ∞] that satisfies p(⋃ₙ Uₙ) = ∑ₙ p(Uₙ) for pairwise disjoint Uₙ (and has p(X) = 1).

A function f: X → Y is measurable if f⁻¹(U) ∈ ΣX for all U ∈ ΣY.

A random variable is a measurable function R → X.
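The 0.34 on this slide is Φ(1) − Φ(0), where Φ is the standard normal cumulative distribution function; a quick check via the error function:

```python
import math

# P(0 < X < 1) for X ~ Normal(0, 1), via the error-function form of the CDF
def phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p = phi(1.0) - phi(0.0)
print(round(p, 4))  # 0.3413
```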

SLIDE 15–16

Function types

Currying: each morphism f: Z × X → Y corresponds to a morphism f̂: Z → [X → Y] with ev ◦ (f̂ × idX) = f, where ev: [X → Y] × X → Y is evaluation.

But [R → R] cannot be a measurable space!

SLIDE 17–18

Quasi-Borel spaces

A quasi-Borel space is a set X together with MX ⊆ [R → X] satisfying:

◮ α ◦ f ∈ MX if α ∈ MX and f: R → R is measurable
◮ α ∈ MX if α: R → X is constant
◮ if R = ⋃ₙ Sₙ (n ∈ ℕ) with each Sₙ Borel and pairwise disjoint, and α₁, α₂, … ∈ MX, then β ∈ MX, where β(r) = αₙ(r) for r ∈ Sₙ

A morphism is a function f: X → Y with f ◦ α ∈ MY whenever α ∈ MX.

The category of quasi-Borel spaces:

◮ has product types
◮ has countable sum types
◮ has function types!

M[X→Y] = {α: R → [X → Y] | the uncurried α̂: R × X → Y is a morphism}

SLIDE 19–20

Distribution types

A measure on a quasi-Borel space (X, MX) consists of

◮ α ∈ MX and
◮ a probability measure µ on R

Two measures are identified when they induce the same pushforward µ(α⁻¹(−)).

This gives a monad for distribution types:

◮ P(X, MX) = {(α, µ) measure on (X, MX)}/∼
◮ return x = [λr. x, µ]∼ for arbitrary µ
◮ bind uses the integral ∫ f d(α, µ) := ∫ (f ◦ α) dµ for f: (X, MX) → R
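A rough executable picture of this monad, with the base measure µ fixed to the uniform measure on [0, 1) so that a measure is represented by α alone. This is an illustrative simplification (no quotient by ∼, and bind draws fresh randomness instead of reparametrising), not the actual construction:

```python
import random

# A measure on X is represented by alpha: [0,1) -> X over a uniform base mu.
def unit(x):
    # return x = [lambda r: x, mu] for arbitrary mu: a constant alpha
    return lambda r: x

def bind(m, k):
    # push r through alpha, then run the continuation on fresh randomness
    # (a sampling-style stand-in for the integral on the slide)
    return lambda r: k(m(r))(random.random())

def integrate(m, f, n=100_000):
    # estimates the slide's integral: the mean of (f . alpha) under mu
    return sum(f(m(random.random())) for _ in range(n)) / n

random.seed(0)
# a fair die: bind a deterministic 6 into "uniform integer from 1 to k"
die = bind(unit(6), lambda k: lambda r: int(r * k) + 1)
print(round(integrate(die, lambda x: x), 1))  # mean of a fair die, about 3.5
```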

SLIDE 21

Example: facts about distributions

let x = sample(gauss(0.0, 1.0))
in return (x < 0)

=

sample(bern(0.5))
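This equality of distributions can be tested empirically; a Monte Carlo sketch in Python:

```python
import random

# Frequency with which a standard normal sample is negative: should match
# the success probability of bern(0.5) in the slide's equation.
random.seed(0)
n = 100_000
freq = sum(random.gauss(0.0, 1.0) < 0.0 for _ in range(n)) / n
print(round(freq, 2))  # about 0.5
```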

SLIDE 22

Example: importance sampling

sample(exp(2))

=

let x = sample(gauss(0, 1)) in
observe(exp-pdf(2, x) / gauss-pdf(0, 1, x));
return x

SLIDE 23

Example: conjugate priors

let x = sample(beta(1, 1)) in
observe(bern(x), true);
return x

=

observe(bern(0.5), true);
let x = sample(beta(2, 1)) in
return x
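A quick weighted-sampling check of this equation: approximate the left-hand side by likelihood weighting and compare against the known mean 2/3 of beta(2, 1):

```python
import random

# Left-hand side: sample x from beta(1,1) (i.e. uniform on [0,1]), weight by
# the probability x assigns to the observed "true", take the weighted mean.
random.seed(0)
n = 200_000
num = den = 0.0
for _ in range(n):
    x = random.random()   # beta(1,1) is the uniform distribution
    w = x                 # observe(bern(x), true) contributes weight x
    num += w * x
    den += w

posterior_mean = num / den
print(round(posterior_mean, 2))  # about 0.67, the mean of beta(2,1)
```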

SLIDE 24

Linear regression

(defquery Bayesian-linear-regression
  ;; Prior:
  (let [f (let [s (sample (normal 0.0 3.0))
                b (sample (normal 0.0 3.0))]
            (fn [x] (+ (* s x) b)))]
    ;; Likelihood:
    (observe (normal (f 1.0) 0.5) 2.5)
    (observe (normal (f 2.0) 0.5) 3.8)
    (observe (normal (f 3.0) 0.5) 4.5)
    (observe (normal (f 4.0) 0.5) 6.2)
    (observe (normal (f 5.0) 0.5) 8.0)
    ;; Posterior:
    (predict :f f)))

SLIDE 25

Linear regression: prior

Define a prior measure on [R → R]:

(let [f (let [s (sample (normal 0.0 3.0))
              b (sample (normal 0.0 3.0))]
          (fn [x] (+ (* s x) b)))]

=

[α, ν ⊗ ν]∼ ∈ P([R → R]), where ν is the normal distribution with mean 0 and standard deviation 3, and α: R × R → [R → R] is (s, b) ↦ λr. sr + b
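A small illustrative Python sketch of one draw from this prior, mirroring α and ν (names chosen here for exposition):

```python
import random

# nu = Normal(0, 3); a prior draw pushes (s, b) ~ nu (x) nu through
# alpha(s, b) = lambda r: s*r + b, yielding a random affine function.
random.seed(0)
nu = lambda: random.gauss(0.0, 3.0)

def alpha(s, b):
    return lambda r: s * r + b

f = alpha(nu(), nu())   # one sample from the prior measure on [R -> R]
print(f(0.0), f(1.0))   # a random intercept, and intercept + slope
```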

SLIDE 26

Linear regression: likelihood

Define the likelihood of the observations (with some noise):

(observe (normal (f 1.0) 0.5) 2.5)
(observe (normal (f 2.0) 0.5) 3.8)
(observe (normal (f 3.0) 0.5) 4.5)
(observe (normal (f 4.0) 0.5) 6.2)
(observe (normal (f 5.0) 0.5) 8.0)

=

d(f(1), 2.5) · d(f(2), 3.8) · d(f(3), 4.5) · d(f(4), 6.2) · d(f(5), 8.0)

where f is a free variable of type [R → R], and d: R² → [0, ∞) is the density of a normal distribution with standard deviation 0.5:

d(µ, x) = √(2/π) exp(−2(x − µ)²)
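The closed form on this slide is the Normal(µ, 0.5) density with the constants folded in; a numeric check:

```python
import math

def d(mu, x):
    # the slide's closed form: sqrt(2/pi) * exp(-2 (x - mu)^2)
    return math.sqrt(2.0 / math.pi) * math.exp(-2.0 * (x - mu) ** 2)

def normal_pdf(x, mu, sigma):
    # the usual normal density, for comparison
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(abs(d(1.0, 2.5) - normal_pdf(2.5, 1.0, 0.5)) < 1e-12)  # True
```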

SLIDE 27

Linear regression: posterior

Normalise the combined prior and likelihood:

(predict :f f)))

∈ P([R → R])

SLIDE 28

Want more?

◮ “Semantics for probabilistic programming: higher-order functions, continuous distributions, and soft constraints”, LICS 2016

◮ “A convenient category for higher-order probability theory”, arXiv:1701.02547
