Probabilistic Programming
Frank Wood
frank@invrea.com fwood@robots.ox.ac.uk http://www.invrea.com/ http://www.robots.ox.ac.uk/~fwood PPAML Summer School, Portland 2016
https://goo.gl/SrNzPZ Public Google Calendar
Objectives for this talk: get you to
- understand how a probabilistic program is different to a normal program
- have the resources to get started, if you want to, with a probabilistic programming system
Probabilistic programming sits at the intersection of ML (algorithms & applications), STATS (inference & theory), and PL (compilers, semantics, transformations).

[Diagram] CS: parameters → program → output. Probabilistic programming: parameters → program → observations y, with statistical inference running the program backwards to recover the parameters from the observations.
“Probabilistic programs are usual functional or imperative programs with two added constructs: (1) the ability to draw values at random from distributions, and (2) the ability to condition values of variables in a program via observations.”
Gordon, Henzinger, Nori, and Rajamani “Probabilistic programming.” In Proceedings of On The Future of Software Engineering (2014).
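As a minimal sketch of these two constructs, in the Clojure-based Anglican syntax used throughout these slides (a toy model, not from the original deck):

;; A coin with unknown weight p: `sample` draws p at random,
;; `observe` conditions on each observed flip.
(defquery coin [flips]
  (let [p (sample (beta 1 1))]          ; (1) draw a value at random
    (map (fn [y] (observe (flip p) y))  ; (2) condition via observations
         flips)
    p))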
[Table: lines of Matlab/Java code vs. lines of Anglican code, for HPYP [Teh 2006], object tracking [Neiswanger et al. 2014], automata induction [Pfau et al. 2010], collapsed LDA, and a DP conjugate mixture]
The language acts as a representation / abstraction layer between models and simulators on one side and inference engines on the other.
[Plate diagrams of the example models: a hierarchical Pitman-Yor process (c, H, γ, λ, π, θ); a state-space model with states r_0, s_0, …, r_T, s_T and observations y_1, …, y_T; a Dirichlet process mixture (α, G, π, θ_k, c_i, y_i, k = 1…∞); and LDA (topics β_k with hyperparameter γ, proportions θ_d, assignments z_id, words w_id, for d = 1…D, i = 1…N_d, k = 1…K)]
[Timeline, 1990–2010, of probabilistic programming systems by community:
PL: HANSEI, IBAL, Figaro, probabilistic variants of ML, Haskell, Scheme, …, λ○
ML/STATS: BUGS, WinBUGS, JAGS, STAN, LibBi, Infer.NET, Factorie, Church, Venture, Anglican, Probabilistic-C, webPPL, Blog
AI: Prolog, Prism, KMP, ProbLog, Simula]
Graphical models / factor graphs: BUGS, STAN, Infer.NET, Factorie
model {
  x ~ dnorm(a, 1/b)
  for (i in 1:N) {
    y[i] ~ dnorm(x, 1/c)
  }
}

[Graphical model: constants a, b, c; latent x; observations y1, y2]
Spiegelhalter et al. "BUGS: Bayesian inference using Gibbs sampling, Version 0.50." Cambridge 1995.
STAN: finite-dimensional differentiable distributions
// (data block truncated on the slide)
parameters {
  real xs[T];
}
model {
  xs[1] ~ normal(0.0, 1.0);
  for (t in 2:T)
    xs[t] ~ normal(a * xs[t - 1], q);
  for (t in 1:T)
    ys[t] ~ normal(xs[t], 1.0);
}
Stan Development Team. "Stan: A C++ Library for Probability and Sampling." 2014.
Hamiltonian Monte Carlo uses the gradient ∇x log p(x, y).
[TrueSkill factor graph] Skills sa, sb, sc with factors N(sa; µs, σs²), N(sb; µs, σs²), N(sc; µs, σs²); performances pa1, pa3, pb1, pb2, pc2, pc3 with factors N(p·; µp, σp²); game outcomes w1 = I(pa1 > pb1), w2 = I(pb2 > pc2), w3 = I(pc3 > pa3).
TrueSkill CRF
Minka, Winn, Guiver, and Knowles. "Infer.NET 2.4." Microsoft Research Cambridge, 2010. McCallum, Schultz, and Singh. "Factorie: Probabilistic programming via imperatively defined factor graphs." NIPS 2009.
[Chain CRF factor graph: variables x1, x2, x3 linked by pairwise factors f(x1,x2), f(x2,x3), with each xi linked to its observation yi by a factor f(xi,yi)]
Limitation of these systems: finite RV cardinality, discrete RVs only.
(defquery captcha [image num-chars tol]
  (let [[w h] (size image)
        ;; sample random characters
        num-chars (sample (poisson num-chars))
        chars (repeatedly num-chars sample-char)]
    ;; compare rendering to true image
    (map (fn [y z] (observe (normal z tol) y))
         (reduce-dim image)
         (reduce-dim (render chars w h)))
    ;; predict captcha text
    {:text (map :symbol (sort-by :x chars))}))
[Figure: generative model and observation; posterior samples of the text x given the observed captcha image y]
Mansinghka, Kulkarni, Perov, and Tenenbaum. "Approximate Bayesian image interpretation using generative probabilistic graphics programs." NIPS (2013).
Kulkarni, Kohli, Tenenbaum, and Mansinghka. "Picture: a probabilistic programming language for scene perception." CVPR (2015). Mansinghka, Kulkarni, Perov, and Tenenbaum. "Approximate Bayesian image interpretation using generative probabilistic graphics programs." NIPS (2013).
Observed Image Inferred (reconstruction) Inferred model re-rendered with novel poses Inferred model re-rendered with novel lighting
Captcha solving and scene description: the latent x is a scene description, the observation y is an image.
Stuhlmüller and Goodman. "Reasoning about reasoning by nested conditioning: Modeling theory of mind with probabilistic programs." Cognitive Systems Research 28 (2014): 80-99.
Theory of mind: x = cognitive process, y = behavior.
Example: two friends want to meet up but their phones are dead. "I prefer the pub. Where will Noah go?" Simulate Noah: Noah prefers the pub but will go wherever Andreas is. Simulate Noah simulating Andreas: …
Ritchie, Lin, Goodman, and Hanrahan. "Generating Design Suggestions under Tight Constraints with Gradient-based Probabilistic Programming." Computer Graphics Forum (2015).
Stable Static Structures
Ritchie, Mildenhall, Goodman, & Hanrahan. “Controlling Procedural Modeling Programs with Stochastically-Ordered Sequential Monte Carlo.” SIGGRAPH (2015)
Procedural Graphics
x = simulation, y = constraint
Perov and Wood. "Automatic Sampler Discovery via Probabilistic Programming and Approximate Bayesian Computation" AGI (2016).
x = program source code, y = program output.
Unconditioned simulation: x ∼ p(x), ỹ ∼ p(·|x); conditioned: x ∼ p(x|y), ỹ ∼ p(·|x).
System comparison:
Interpreted: Church, WebPPL, VentureScript, Anglican
Compiled: WebChurch (Bher), Probabilistic-C, Anglican
Modeling-language inspiration: lisp (Church, WebChurch), javascript (WebPPL, VentureScript), clojure (Anglican), c (Probabilistic-C)
(defquery gaussian-model [data]
  (let [x (sample (normal 1 (sqrt 5)))
        sigma (sqrt 2)]
    (map (fn [y] (observe (normal x sigma) y)) data)
    x))
(def posterior
  ((conditional gaussian-model :pgibbs :number-of-particles 1000) dataset))

(def posterior-samples
  (repeatedly 20000 #(sample posterior)))
Model: x ∼ Normal(1, √5), yi | x ∼ Normal(x, √2)
Data: (def dataset [9 8]), i.e. y1 = 9, y2 = 8
Posterior: x | y ∼ Normal(7.25, 0.91)
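The quoted posterior follows from the standard conjugate Gaussian update (√5 and √2 are standard deviations, so the prior variance is 5 and the likelihood variance is 2):

posterior precision: 1/σ² = 1/5 + 2/2 = 1.2, so σ = √(1/1.2) ≈ 0.91
posterior mean: µ = (1/5 · 1 + (9 + 8)/2) / 1.2 = (0.2 + 8.5)/1.2 = 7.25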
(defquery sprinkler-bayes-net [sprinkler wet-grass]
  (let [is-cloudy (sample (flip 0.5))
        is-raining (cond (= is-cloudy true)  (sample (flip 0.8))
                         (= is-cloudy false) (sample (flip 0.2)))
        sprinkler-dist (cond (= is-cloudy true)  (flip 0.1)
                             (= is-cloudy false) (flip 0.5))
        wet-grass-dist (cond (and (= sprinkler true) (= is-raining true))
                             (flip 0.99)
                             (and (= sprinkler false) (= is-raining false))
                             (flip 0.0)
                             (or (= sprinkler true) (= is-raining true))
                             (flip 0.9))]
    (observe sprinkler-dist sprinkler)
    (observe wet-grass-dist wet-grass)
    is-raining))
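A hypothetical usage sketch, not from the original deck, assuming Anglican's doquery (anglican.core) and get-result (anglican.state); exact signatures may differ across Anglican versions:

(require '[anglican.core :refer [doquery]]
         '[anglican.state :refer [get-result]])

;; Estimate P(raining | sprinkler = true, wet-grass = true) from
;; lightweight Metropolis-Hastings samples.
(def samples
  (->> (doquery :lmh sprinkler-bayes-net [true true])
       (take 10000)
       (map get-result)))

(double (/ (count (filter true? samples))
           (count samples)))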
[HMM graphical model: latent chain x0 → x1 → x2 → x3 → ···, with observations y1, y2, y3, …]
(defquery hmm
  (let [init-dist (discrete [1 1 1])
        trans-dist (fn [s]
                     (cond (= s 0) (discrete [0 1 1])
                           (= s 1) (discrete [0 0 1])
                           (= s 2) (dirac 2)))
        ;; obs-dist elided on the slide
        y-1 1
        y-2 1
        x-0 (sample init-dist)
        x-1 (sample (trans-dist x-0))
        x-2 (sample (trans-dist x-1))]
    (observe (obs-dist x-1) y-1)
    (observe (obs-dist x-2) y-2)
    [x-0 x-1 x-2]))
(defquery hmm [ys init-dist trans-dists obs-dists]
  (reduce (fn [xs y]
            (let [x (sample (get trans-dists (peek xs)))]
              (observe (get obs-dists x) y)
              (conj xs x)))
          [(sample init-dist)]
          ys))
[Markov chain diagram for the geometric distribution: from each state n = 0, 1, 2, …, stop with probability p or continue to n + 1 with probability 1 − p]
(defquery geometric [p]
  "geometric distribution"
  (let [dist (flip p)
        samp (loop [n 0]
               (if (sample dist)
                 n
                 (recur (+ n 1))))]
    samp))
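Reading off the loop: n counts the failures before the first success, so the return value has P(n) = (1 − p)ⁿ p for n = 0, 1, 2, …, matching the chain diagram above.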
(defquery md5-inverse [L md5str]
  "conditional distribution of strings that map
   to the same MD5 hashed string"
  (let [mesg (sample (string-generative-model L))]
    (observe (dirac md5str) (md5 mesg))
    mesg))
Random choices and conditioning can occur anywhere during the execution of a generative model.
(let [t-1 3
      x-1 (sample (discrete (repeat t-1 1)))]
  (if (not= x-1 1)
    (let [t-2 (+ x-1 7)
          x-2 (sample (poisson t-2))])))
[Execution trace tree: x1 ∈ {0, 1, 2} sampled from (discrete `(1 1 1)); when x1 = 0 the program samples x2 from (poisson 7), when x1 = 2 from (poisson 9), with x2 = 0, 1, 2, …; the x1 = 1 branch samples nothing further]
(let [t-1 3
      x-1 (sample (discrete (repeat t-1 1)))]
  (if (not= x-1 1)
    (let [t-2 (+ x-1 7)
          x-2 (sample (poisson t-2))]
      (observe (gaussian x-2 0.0001) 1))))
[The same trace tree, where each x2 leaf is now weighted by the observe, i.e. by (normpdf 1 x2 0.0001)]
Given the values of its random choices, the rest of an execution is deterministic.
Executing the program, we encounter N observe statements {(gi, φi, yi)}, i = 1…N, and M sample statements {(fj, θj)}, j = 1…M, together with the sampled values {xj}, j = 1…M.
γ(x) ≜ p(x, y) = ∏_{i=1}^{N} gi(yi | φi) ∏_{j=1}^{M} fj(xj | θj)

Making the dependence of each statement on earlier samples explicit:

γ(x) = p(x, y) = ∏_{i=1}^{N} g̃i(x_{ni})(yi | φi(x_{ni})) ∏_{j=1}^{M} f̃j(x_{j−1})(xj | θj(x_{j−1}))
[Dependency graph over the trace: observations y1, y2, … depending on sampled values x1, …, x6.] Let x1:j = x1 × ··· × xj denote the sampled values up to the j-th sample statement.
π(x) ≜ p(x | y) = γ(x)/Z,  Z = p(y) = ∫ γ(x) dx
E[z] = E[Q(x)] = ∫ Q(x) π(x) dx = (1/Z) ∫ Q(x) (γ(x)/q(x)) q(x) dx
Propose from the prior:

q(x^k) = ∏_{j=1}^{M^k} fj(x^k_j | θ^k_j)

so the importance weight is the likelihood:

w(x^k) = γ(x^k)/q(x^k) = ∏_{i=1}^{N^k} g^k_i(y^k_i | φ^k_i)
BLOG default inference engine: http://bayesianlogic.github.io/pages/users-manual.html
W^k = w(x^k) / ∑_{ℓ=1}^{K} w(x^ℓ),  Ê_π[Q(x)] = ∑_{k=1}^{K} W^k Q(x^k)
[Weighted sample set: (z1, w1), (z2, w2), …, (zK, wK)]
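A concrete sketch of likelihood weighting in plain Clojure (not Anglican) for the Gaussian model above; normal-logpdf and sample-normal are helpers defined here, not library calls:

(def rng (java.util.Random.))

(defn normal-logpdf [x mu sd]
  (- (- (/ (Math/pow (- x mu) 2) (* 2 sd sd)))
     (Math/log (* sd (Math/sqrt (* 2 Math/PI))))))

(defn sample-normal [mu sd]
  (+ mu (* sd (.nextGaussian rng))))

(def ys [9 8])

(defn weighted-sample []
  ;; propose x from the prior, weight by the likelihood
  (let [x (sample-normal 1 (Math/sqrt 5))
        logw (reduce + (map #(normal-logpdf % x (Math/sqrt 2)) ys))]
    [x (Math/exp logw)]))

;; self-normalized estimate of E[x | y]; should approach 7.25
(let [samples (repeatedly 10000 weighted-sample)
      total (reduce + (map second samples))]
  (/ (reduce + (map (fn [[x w]] (* x w)) samples))
     total))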
γn(x̃1:n) = ∏_{m=1}^{n} g(ym | x̃1:m) p(x̃m | x̃1:m−1)

with normalized incremental targets

πn(x̃1:n) = (1/Zn) γn(x̃1:n)

[Dependency graph: y1, y2, … over x1, …, x6, grouped into subsets x̃1, x̃2, …] Here x̃n is the subspace of x associated with the n-th observe, and x̃1:n = x̃1 × ··· × x̃n.
Wood, van de Meent, and Mansinghka “A New Approach to Probabilistic Programming Inference” AISTATS 2014 Paige and Wood “A Compilation Target for Probabilistic Programming Languages” ICML 2014
Want samples from

πn(x̃1:n) ∝ p(yn | x̃1:n) p(x̃n | x̃1:n−1) πn−1(x̃1:n−1)

Have a sample-based approximation to the previous target:

π̂n−1(x̃1:n−1) ≜ ∑_{k=1}^{K} W^k_{n−1} δ_{x̃^k_{1:n−1}}(x̃1:n−1)

Sample an ancestor from it, then simulate forward:

x̃^{a^k_{n−1}}_{1:n−1} ∼ π̂n−1(x̃1:n−1),  x̃^k_n ∼ p(x̃n | x̃^{a^k_{n−1}}_{1:n−1}),  x̃^k_{1:n} = x̃^{a^k_{n−1}}_{1:n−1} × x̃^k_n

Importance weight by the likelihood of the new observation:

w(x̃^k_{1:n}) = p(yn | x̃^k_{1:n}) = g^k_n(yn | x̃^k_{1:n}),  W^k_n ≜ w(x̃^k_{1:n}) / ∑_{k′=1}^{K} w(x̃^{k′}_{1:n})
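One way to realize this update as a sketch in plain Clojure; simulate and loglik are hypothetical model-specific functions, and multinomial resampling is used for clarity:

(def rng (java.util.Random.))

(defn resample-index
  "Draw an index with probability proportional to the weights."
  [weights]
  (let [total (reduce + weights)
        u (* total (.nextDouble rng))]
    (loop [k 0, acc (nth weights 0)]
      (if (< u acc)
        k
        (recur (inc k) (+ acc (nth weights (inc k))))))))

(defn smc-step
  "Resample ancestors, simulate each trace forward to the next observe,
   and reweight by the likelihood of observation y-n."
  [particles weights y-n simulate loglik]
  (mapv (fn [_]
          (let [a (resample-index weights)   ; ancestor index
                x (simulate (nth particles a)) ; extend the trace
                w (Math/exp (loglik x y-n))]   ; weight by likelihood
            {:trace x :weight w}))
        particles))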
Intuitively: run many copies of the program as threads (or continuations), forking at each observe.

MCMC in trace space: the posterior distribution of execution traces is proportional to the trace score, and can be sampled with a Metropolis-Hastings acceptance rule.
Milch and Russell “General-Purpose MCMC Inference over Relational Structures.” UAI 2006. Goodman, Mansinghka, Roy, Bonawitz, and Tenenbaum “Church: a language for generative models.” UAI 2008. Wingate, Stuhlmüller, Goodman “Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation” AISTATS 2011
▪ Need a proposal distribution over traces
γ(x) ≜ p(x, y) = ∏_{i=1}^{N} gi(yi | φi) ∏_{j=1}^{M} fj(xj | θj),  π(x) ≜ p(x | y) = γ(x)/Z

α = min( 1, [π(x′) q(x | x′)] / [π(x) q(x′ | x)] )
Choose one of the M sample statements in the current trace uniformly at random; the remaining factor is the probability of the new part of the proposed execution trace:

q(x′ | x) = (1/M) κ(x′_m | x_m) ∏_{j=m+1}^{M′} f′j(x′j | θ′j)

“Single-site update”: resampling from the prior = running the program forward, then applying the MH acceptance ratio.
κ(x′_m | x_m) = fm(x′_m | θm),  θm = θ′m
α = min( 1, [γ(x′) M ∏_{j=m}^{M} fj(xj | θj)] / [γ(x) M′ ∏_{j=m}^{M′} f′j(x′j | θ′j)] )

Here M and M′ are the numbers of sample statements in the original and proposed traces; the numerator product is the probability of the original trace continuation (restarting the original trace at the m-th sample), and the denominator product is the probability of the proposal trace continuation (restarting the proposal trace at the m-th sample).
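A sketch of the single-site update in plain Clojure, for the special case of a fixed-structure trace with a priori independent sites, where everything except the likelihood terms cancels in the ratio; prior-sample and loglik are hypothetical model functions:

(def rng (java.util.Random.))

(defn lmh-step [trace prior-sample loglik]
  (let [site (rand-nth (keys trace))                  ; choose a site uniformly
        trace' (assoc trace site (prior-sample site)) ; resample it from the prior
        log-a (- (loglik trace') (loglik trace))]     ; MH log-acceptance ratio
    (if (< (Math/log (.nextDouble rng)) log-a)
      trace'
      trace)))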
Program transformations for inference: expose the sample and observe calls to an inference backend.
Wingate, Stuhlmüller, Goodman “Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation” AISTATS 2011 Paige and Wood “A Compilation Target for Probabilistic Programming Languages” ICML 2014
Probabilistic-C: standard C plus two new directives, observe and predict; during program execution, predict emits sampled values.
Actually: processes. The program forks new processes/continuations at each observe.
Paige and Wood “A Compilation Target for Probabilistic Programming Languages” ICML 2014
Continuation-passing style transforms programs so that the “rest of the computation” is always explicit as a function: the continuation.
Friedman and Wand. “Essentials of programming languages.” MIT press, 2008.
Fischer, Kiselyov, and Shan “Purely functional lazy non-deterministic programming” ACM Sigplan 2009 Goodman and Stuhlmüller http://dippl.org/ 2014 Tolpin https://bitbucket.org/probprog/anglican/ 2014
First continuation, second continuation:
;; CPS-transformed "primitives"
(defn +& [a b k] (k (+ a b)))
(defn *& [a b k] (k (* a b)))

;; Standard Clojure:
(println (+ (* 2 3) 4))
;; CPS transformed:
(*& 2 3 (fn [x] (+& x 4 println)))
(defn pythag&
  "compute sqrt(x^2 + y^2)"
  [x y k]
  (square& x
    (fn [xx]
      (square& y
        (fn [yy]
          (+& xx yy
            (fn [xxyy]
              (sqrt& xxyy k))))))))

Steps: xx = x², yy = y², xxyy = xx + yy, result = √xxyy
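The slide leaves square& and sqrt& undefined; here is one plausible definition in the same CPS style (using the +& defined above), plus a call:

(defn square& [x k] (k (* x x)))
(defn sqrt& [x k] (k (Math/sqrt x)))

(pythag& 3 4 println)   ;; prints 5.0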
Anglican:

(defquery flip-example [outcome]
  (let [p (sample (uniform-continuous 0 1))]
    (observe (flip p) outcome)
    (predict :p p)))

(flip-example true)

Anglican “linearized”:

(let [u (uniform-continuous 0 1)
      p (sample u)
      dist (flip p)]
  (observe dist outcome)
  (predict :p p))
Clojure (CPS-transformed):

(defn flip-query& [outcome k1]
  (uniform-continuous& 0 1
    (fn [dist1]
      (sample& dist1
        (fn [p]
          ((fn [p k2]
             (flip& p
               (fn [dist2]
                 (observe& dist2 outcome
                   (fn []
                     (predict& :p p k2))))))
           p k1))))))

Anglican “linearized”:

(let [u (uniform-continuous 0 1)
      p (sample u)
      dist (flip p)]
  (observe dist outcome)
  (predict :p p))
;; CPS-ed distribution constructors
(defn uniform-continuous& [a b k]
  (k (uniform-continuous a b)))
(defn flip& [p k]
  (k (flip p)))
The slides repeat the transformed flip-query& several times, highlighting in turn the continuation functions, the Anglican primitives, and the inference “backend” interface (sample&, observe&, predict&).
WebPPL CPS-compiles to pure functional JavaScript.
;; Implement a "backend"
(defn sample& [dist k]
  ;; [ ALGORITHM-SPECIFIC IMPLEMENTATION HERE ]
  ;; Pass the sampled value to the continuation
  (k (sample dist)))

(defn observe& [dist value k]
  (println "log-weight =" (observe dist value))
  ;; [ ALGORITHM-SPECIFIC IMPLEMENTATION HERE ]
  ;; Call continuation with no arguments
  (k))

(defn predict& [label value k]
  ;; [ ALGORITHM-SPECIFIC IMPLEMENTATION HERE ]
  (k label value))

[Two components: the “backend” and the pure compiled deterministic computation]
[Protocol between the compiled program P and the backend:
- P starts and runs deterministically until it reaches a checkpoint.
- sample: P sends (f, θ, k) (distribution type, parameter vector, continuation); the backend continues P by calling (k x) with a sampled value x.
- observe: P sends (g, φ, y, k); the backend records the weight of y and calls (k).
- predict: P sends (z, k); the backend stores z and calls (k).
- terminate: P signals completion.]
(defn sample& [dist k]
  ;; Call the continuation with a sampled value
  (k (sample dist)))

(defn observe& [dist value k]
  ;; Compute and record the log weight
  (add-log-weight! (observe dist value))
  ;; Call the continuation with no arguments
  (k))

(defn predict& [label value k]
  ;; Store predict, and call continuation
  (store! label value)
  (k))
For flip-example the “backend” computes:
p ∼ U(0, 1)
w ← p^{I(outcome=true)} (1 − p)^{I(outcome=false)}
(defquery flip-example [outcome]
  (let [p (sample (uniform-continuous 0 1))]
    (observe (flip p) outcome)
    (predict :p p)))
Compiled pure deterministic computation
(defn sample& [dist k]
  ;; Call the continuation with a sampled value
  (k (sample dist)))

(defn observe& [dist value k]
  ;; Block and wait for K calls to reach observe&
  ;; Compute weights
  ;; Use weights to subselect continuations to call
  ;; Call K sampled continuations (often multiple times)
  )

(defn predict& [label value k]
  ;; Store predict, and call continuation
  (store! label value)
  (k))
(defn sample& [a dist k]
  (let [;; reuse previous value, or sample from prior
        x (or (get-cache a) (sample dist))]
    ;; add to log-weight when reused
    (when (get-cache a)
      (add-log-weight! (observe dist x)))
    ;; store value and its log prob in trace
    (store-in-trace! a x dist)
    ;; continue with value x
    (k x)))

(defn observe& [dist value k]
  ;; Compute and record the log weight
  (add-log-weight! (observe dist value))
  ;; Call the continuation with no arguments
  (k))
"Venture: a higher-order probabilistic programming platform with programmable inference."
"C3: Lightweight Incrementalized MCMC for Probabilistic Programs using Continuations and Callsite Caching."
"Lightweight implementations of probabilistic programming languages via transformational compilation." AISTATS (2011). WebPPL Anglican
Use SMC within MCMC proposals, e.g. “particle independent Metropolis-Hastings” and “particle Gibbs (iterated conditional SMC)”.
Andrieu, Doucet, Holenstein “Particle Markov chain Monte Carlo methods.“ JRSSB 2010
Accept or reject new particle sets with probability governed by the marginal likelihood estimate:

α^s_PIMH = min(1, Ẑ* / Ẑ^{s−1})

Ê_PIMH[Q(x)] = (1/S) ∑_{s=1}^{S} ∑_{k=1}^{K} W^{s,k} Q(x^{s,k})

Ẑ = ∏_{n=1}^{N} Ẑn = ∏_{n=1}^{N} (1/K) ∑_{k=1}^{K} w(x̃^k_{1:n})

[Diagram: successive sweeps producing estimates Ẑ1, Ẑ2, and a proposed Ẑ*]
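A sketch of this accept/reject step in plain Clojure; run-smc is a hypothetical function returning {:particles … :log-Z …} for one independent SMC sweep:

(def rng (java.util.Random.))

(defn pimh-step
  "Propose a whole new particle set via SMC; accept it with
   probability min(1, Z*/Z)."
  [current run-smc]
  (let [proposal (run-smc)
        log-a (- (:log-Z proposal) (:log-Z current))]
    (if (< (Math/log (.nextDouble rng)) log-a)
      proposal
      current)))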
Paige and Wood “A Compilation Target for Probabilistic Programming Languages” ICML 2014
Wood, van de Meent, Mansinghka “A new approach to probabilistic programming inference” AISTATS 2014
[Animation: SMC in a linear dynamical system, slowed down for clarity]
Particle cascade: particles are launched and branched asynchronously, observation by observation (n = 1, n = 2, …).
Paige, Wood, Doucet, Teh “Asynchronous Anytime Sequential Monte Carlo” NIPS 2014
For each MCMC iteration r = 1, 2, …: each internal SMC node m produces a marginal likelihood estimate Ẑm and a candidate retained particle x′_{1:T,m}; the indices cj of the conditional nodes are then resampled according to

P(cj = m | c_{1:P∖j}) = Ẑm 1(m ∉ c_{1:P∖j}) / ∑_{n=1}^{M} Ẑn 1(n ∉ c_{1:P∖j})

x′_{1:T,j}[r] = x′_{1:T,cj}
[Plot: which nodes hold the retained particles, across MCMC iterations r]
Rainforth, Naesseth, Lindsten, Paige, van de Meent, Doucet, Wood, “Interacting Particle Markov Chain Monte Carlo” ICML 2016
iPMCMC
All of these algorithms are implemented against a single code base.
Algorithm | Type | Lines of Code | Citation | Description
smc | IS | 127 | Wood et al., AISTATS 2014 | Sequential Monte Carlo
importance | IS | 21 | | Likelihood weighting
pcascade | IS | 176 | Paige et al., NIPS 2014 | Particle cascade: anytime asynchronous sequential Monte Carlo
pgibbs | PMCMC | 121 | Wood et al., AISTATS 2014 | Particle Gibbs (iterated conditional SMC)
pimh | PMCMC | 68 | Wood et al., AISTATS 2014 | Particle independent Metropolis-Hastings
pgas | PMCMC | 179 | van de Meent et al., AISTATS 2015 | Particle Gibbs with ancestor sampling
lmh | MCMC | 177 | Wingate et al., AISTATS 2011 | Lightweight Metropolis-Hastings
ipmcmc | MCMC | 193 | Rainforth et al., ICML 2016 | Interacting PMCMC
almh | MCMC | 320 | Tolpin et al., ECML PKDD 2015 | Adaptive scheduling lightweight Metropolis-Hastings
rmh* | MCMC | 319 | |
palmh | MCMC | 66 | | … Metropolis-Hastings
plmh | MCMC | 62 | |
bamc | MAP | 318 | Tolpin et al., SoCS 2015 | Bayesian Ascent Monte Carlo
siman | MAP | 193 | Tolpin et al., SoCS 2015 | MAP estimation via simulated annealing
INVREA
Make Better Decisions
https://invrea.com/plugin/excel/v1/download/
Eliminate inference by moving observes up and out:
(defquery beta-bernoulli [observation]
  (let [dist (beta 1 1)
        theta (sample dist)
        like (flip theta)]
    (observe like observation)
    (predict :theta theta)))

;; transformed: the observe is absorbed into the (conjugate) prior
(defquery beta-bernoulli [observation]
  (let [dist (beta (if observation 2 1)
                   (if observation 1 2))
        theta (sample dist)]
    (predict :theta theta)))
Carette and Shan. "Simplifying Probabilistic Programs Using Computer Algebra." T.R. 719, Indiana University (2015). Yang. Keynote Lecture, APLAS (2015).
Use variable elimination to compute posteriors and marginal likelihoods exactly:
(defquery simple []
  (def y (sample (flip 0.5)))
  (def z (if y (dirac 0) (dirac 1)))
  (observe z 0)
  y)
x1 ∼ 0.5, x2 ∼ ⟦flip x1⟧, x3 ∼ P_{x2}, x4 ∼ 0, x5 ∼ ⟦dirac x4⟧, x6 ∼ 1, x7 ∼ ⟦dirac x6⟧, x8 ∼ if(x3, x5, x7), x9 ∼ 0, x10 ∼ ⟦= x8 x9⟧
Figaro, etc. Anglican
Cornish, Wood, and Yang. "Efficient exact inference in discrete Anglican programs." In preparation, 2016.
Paige, Wood “Inference Networks for Sequential Monte Carlo in Graphical Models” ICML (2016).
A probabilistic model; an inverse model that generates latents from data. Can we learn how to sample from the inverse model?

[Plate diagrams: the model over tn, zn, w0, w1, w2 (n = 1…N); its inverse; and the inverse parameterized by φw]
Approximating family q(x | λ), with parameters predicted from the data: λ = φ(η, y).

Target density: π(x) = p(x | y); in general qλ ≠ π.

Single dataset: argmin_λ DKL(π ∥ qλ)

Averaging over all possible datasets: argmin_η E_{p(y)}[ DKL(π ∥ q_{φ(η,y)}) ]

Fit λ to learn an importance sampling proposal; learn a mapping from arbitrary datasets to λ; …this compiles away the runtime costs of inference!
Paige, Wood “Inference Networks for Sequential Monte Carlo in Graphical Models” ICML (2016).
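The averaged objective is what makes compilation possible: since DKL(π ∥ q_{φ(η,y)}) = E_{p(x|y)}[log p(x|y) − log q(x | φ(η,y))] and the first term does not depend on η,

argmin_η E_{p(y)}[ DKL(π ∥ q_{φ(η,y)}) ] = argmax_η E_{p(x,y)}[ log q(x | φ(η, y)) ]

so the network can be trained on (x, y) pairs simulated from the generative model itself.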
[Diagram: pairing a model over x and y with inference, and with data: supervised vs. unsupervised learning of the inverse]
p(letters | captcha)
Compiled inference: compiled sequential importance sampling, 1 particle. Classical inference: sequential Monte Carlo, 10k particles; lightweight Metropolis-Hastings, 10k iterations.

Example draws: num-letters = 5, …, letters = "gtRai"; num-letters = 4, …, letters = "dF6D"; num-letters = 6, …, letters = "q5ihGt"
1) Compilation (1 day) 2) Inference (1 second)
(defquery captcha [baseline-image]
  (let [num-letters (sample (u-d 4 7))
        x-offset (sample (u-d min-x max-x))
        y-offset (sample (u-d min-y max-y))
        distort-x (sample (u-d 8 15))
        distort-y (sample (u-d 8 15))
        kerning (sample (u-d -1 3))
        letter-ids (repeatedly num-letters
                               #(sample (u-d 0 dict-size)))
        letters (get-letters letter-ids)
        rendered-image (render letters
                               x-offset
                               y-offset
                               kerning
                               distort-x
                               distort-y)]
    ;; ABC-style observe
    (observe (abc-dist rendered-image abc-sigma)
             baseline-image)
    (predict :letters letters)))
Pipeline: probabilistic program + training data → dynamically assembled RNN → trained RNN weights
1) Inference (20 minutes)
Le, Baydin, Wood “Inference Compilation and Universal Probabilistic Programming” in prep 2016
Across applications, the same x + y pattern: program source code / program output; scene description / image; neural net structures / input-output pairs; policy and world; simulator / constraints.
Approaches like these, while they work well, don't necessarily scale well in current probabilistic programming systems.
(defquery arrange-bumpers []
  (let [bumper-positions []
        ;; code to simulate the world
        world (create-world bumper-positions)
        end-world (simulate-world world)
        balls (:balls end-world)
        ;; how many balls entered the box?
        num-balls-in-box (balls-in-box end-world)]
    {:balls balls
     :num-balls-in-box num-balls-in-box
     :bumper-positions bumper-positions}))
goal: “world” that puts ~20% of balls in box…
(defquery arrange-bumpers []
  (let [number-of-bumpers (sample (poisson 20))
        bumpydist (uniform-continuous 0 10)
        bumpxdist (uniform-continuous -5 14)
        bumper-positions (repeatedly number-of-bumpers
                                     #(vector (sample bumpxdist)
                                              (sample bumpydist)))
        ;; code to simulate the world
        world (create-world bumper-positions)
        end-world (simulate-world world)
        balls (:balls end-world)
        ;; how many balls entered the box?
        num-balls-in-box (balls-in-box end-world)]
    {:balls balls
     :num-balls-in-box num-balls-in-box
     :bumper-positions bumper-positions}))
(defquery arrange-bumpers []
  (let [number-of-bumpers (sample (poisson 20))
        bumpydist (uniform-continuous 0 10)
        bumpxdist (uniform-continuous -5 14)
        bumper-positions (repeatedly number-of-bumpers
                                     #(vector (sample bumpxdist)
                                              (sample bumpydist)))
        ;; code to simulate the world
        world (create-world bumper-positions)
        end-world (simulate-world world)
        balls (:balls end-world)
        ;; how many balls entered the box?
        num-balls-in-box (balls-in-box end-world)]
    ;; condition toward the goal of ~20% of balls in the box
    ;; (obs-dist elided on the slide)
    (observe obs-dist num-balls-in-box)
    {:balls balls
     :num-balls-in-box num-balls-in-box
     :bumper-positions bumper-positions}))
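A hypothetical usage sketch, assuming Anglican's doquery (anglican.core) and get-result (anglican.state); option names are illustrative, and importance weights are ignored for brevity:

(require '[anglican.core :refer [doquery]]
         '[anglican.state :refer [get-result]])

;; Draw posterior worlds and keep the bumper layouts.
(->> (doquery :smc arrange-bumpers nil :number-of-particles 100)
     (take 1000)
     (map get-result)
     (map :bumper-positions))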
van de Meent, Paige, Perov, Le, Tolpin, Yang, Rainforth
https://goo.gl/US3b42