Probabilistic Programming Practical
Frank Wood, Brooks Paige {fwood,brooks}@robots.ox.ac.uk MLSS 2015
Installation: Java (> v. 1.5)
Mac and Windows: download and run the installer from https://www.java.com/en/download/manual.jsp

Linux:

# Debian/Ubuntu
sudo apt-get install default-jre

# Fedora
sudo yum install java-1.7.0-openjdk
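To confirm the install succeeded and java is on the PATH (a quick check, not part of the original instructions):

# should print the installed Java version
java -version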
Installation: Leiningen
# Download lein to ~/bin
mkdir ~/bin
cd ~/bin
wget http://git.io/XyijMQ

# Make executable
chmod a+x ~/bin/lein

# Add ~/bin to path
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc

# Run lein
lein

Further details: http://leiningen.org/
[Figure: Venn diagram placing probabilistic programming at the intersection of ML (algorithms & applications), STATS (inference & theory), and PL (compilers, semantics, analysis).]
[Figure: in CS, Parameters → Program → Output; in statistics, Parameters → Program → Observations y. Probabilistic programming runs the program "backwards": condition on the observations and infer the parameters.]
"Probabilistic programs are usual functional or imperative programs with two added constructs: (1) the ability to draw values at random from distributions, and (2) the ability to condition values of variables in a program via observations." [Gordon et al. 2014]
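In a Clojure-based language such as Anglican (used in this practical), these two constructs surface as sample and observe. A minimal illustrative sketch:

(let [p (sample (beta 1 1))]  ; (1) draw a value at random from a distribution
  (observe (flip p) true)     ; (2) condition on an observed value
  p)                          ; the return value is now a posterior quantity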
[Flowchart: the traditional applied-ML workflow, Start to End. Identify and formalize the problem and gather data; design a model (= read papers, do math) unless an existing model suffices; if the model is simple and a high-level modeling tool supports the required features, implement it there; otherwise choose an approximate inference algorithm and, unless a usable implementation exists, derive updates and code the inference algorithm; loop until it performs well statistically and computationally and is feasible; then test, scale, deploy. Legend: edge color indicates the skills required to traverse the edge; most edges demand PhD-level machine learning, statistics, or computer science, and only a few are open to non-specialists.]
[Flowchart: the same workflow with probabilistic programming, Start to End. Designing the model now means writing a probabilistic program, and the inference algorithm is supplied by the system, so the derive-updates-and-code-inference step disappears; what remains is debug, test, profile, scale, deploy. Legend: every edge is traversable by a non-specialist.]
[Diagram: the probabilistic programming language as a representation / abstraction layer between models (stochastic simulators) above and inference engine(s) below.]
[Figure: example models as graphical models / stochastic simulators: an unbounded-state HMM with states s_t, quantities r_t, and observations y_t; Dirichlet-process mixture models with variables α, G, π_k, θ_k, c_i, y_i; and an LDA-style topic model with words w_{d,i}, topic assignments z_{d,i}, topics β_k, and hyperparameter γ, over plates d = 1 ... D, i = 1 ... N_d, k = 1 ... K.]
Gaussian Unknown Mean
Models expressed as programs via query
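A sketch of this model in 2015-era Anglican syntax. With prior mu ~ Normal(1, √5) and likelihood y_i ~ Normal(mu, √2) for observations y = (9, 8), the exact posterior is Gaussian with mean 7.25 and variance ≈ 0.83, a handy correctness check:

(defquery gaussian-unknown-mean
  (let [mu    (sample (normal 1 (sqrt 5)))  ; prior over the unknown mean
        sigma (sqrt 2)]                     ; known observation noise
    (observe (normal mu sigma) 9)           ; condition on y1 = 9
    (observe (normal mu sigma) 8)           ; condition on y2 = 8
    (predict :mu mu)))                      ; report posterior samples of mu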
Learning objectives
Inference in probabilistic programs = inference over program executions
A deterministic generative process: here, a 2D physics simulator
Use inference to solve a mechanism design problem
Sample from the prior by running the program forward
[Figure: a program execution trace drawn as a graphical model, with parameters θ, latent random choices x, and observations y.]

The joint density over a trace factors as

p(y_{1:N}, x_{1:N}) = \prod_{n=1}^{N} g(y_n \mid x_{1:n}) \, f(x_n \mid x_{1:n-1})

[Figure: the corresponding state-space view, with transition density f(x_n \mid x_{1:n-1}) and observation density g(y_n \mid x_{1:n}).]
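To make the factorization concrete, a hedged Anglican-style sketch of a tiny state-space program, with illustrative Gaussian choices for f and g:

(defquery random-walk
  (loop [x  0.0
         ys [1.2 0.7 -0.3]]                  ; illustrative observations y_{1:3}
    (if (empty? ys)
      (predict :x x)
      (let [x' (sample (normal x 1.0))]      ; f(x_n | x_{n-1})
        (observe (normal x' 0.5) (first ys)) ; g(y_n | x_n)
        (recur x' (rest ys))))))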
[Figure: the tree of possible execution traces of the program below. The first sample gives x_{1,2} ∈ {0, 1, 2}; on branches with x_{1,2} ≠ 1, x_{2,1} = x_{1,2} + 7 ∈ {7, 9} and x_{2,2} is a Poisson draw (0, 1, . . .).]

(let [x-1-1 3
      x-1-2 (sample (discrete (range x-1-1)))]
  (if (not= x-1-2 1)
    (let [x-2-1 (+ x-1-2 7)]
      (sample (poisson x-2-1)))))
[Figure: the same execution-trace tree, now conditioned: the observe statement scores each trace, concentrating posterior mass on branches where x_{2,1} is near 7.]

(let [x-1-1 3
      x-1-2 (sample (discrete (range x-1-1)))]
  (if (not= x-1-2 1)
    (let [x-2-1 (+ x-1-2 7)]
      (observe (gaussian x-2-1 0.0001) 7)
      (sample (poisson x-2-1)))))
The posterior distribution over execution traces is proportional to the trace score; we can sample from it with a Metropolis-Hastings acceptance rule.

"Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation" [Wingate, Stuhlmüller, Goodman; AISTATS 2011]
▪ Need
▪ Proposal
▪ Have
▪ Likelihoods (via observe statement restrictions)
▪ Prior (sequence of ERP returns; scored in interpreter)

The target is the posterior over execution traces,

p(x \mid y) \propto \tilde{p}(y = \text{observes}, x),

and the MH acceptance probability is

\min\left(1, \frac{p(y \mid x')\, p(x')\, q(x \mid x')}{p(y \mid x)\, p(x)\, q(x' \mid x)}\right).

The single-site proposal resamples one of the |x| stochastic procedure (SP) outputs x_{m,j} from a kernel κ and regenerates the downstream part of the trace:

q(x' \mid x) = \frac{\kappa(x'_{m,j} \mid x_{m,j})}{|x|}\, p(x' \setminus x \mid x' \cap x),

where κ(x'_{m,j} \mid x_{m,j}) is the probability of the new SP return value (sample) given the trace prefix, and p(x' \setminus x \mid x' \cap x) is the probability of the new part of the proposed execution trace.
[Wingate, Stuhlmüller, Goodman 2011]
"Single site update" = sample from the prior = run the program forward, giving the simplified MH acceptance ratio

\min\left(1, \frac{p(y \mid x')\, p(x')\, |x|\, p(x \setminus x' \mid x \cap x')}{p(y \mid x)\, p(x)\, |x'|\, p(x' \setminus x \mid x' \cap x)}\right),

where |x| and |x'| are the numbers of SP applications in the original and new traces, p(x \setminus x' \mid x \cap x') is the probability of regenerating the current trace continuation given the proposal trace beginning, and p(x' \setminus x \mid x' \cap x) that of generating the proposal trace continuation given the current trace beginning.
This holds because each proposed value is sampled conditioned on the preceding trace, so that κ(x'_{m,j} \mid x_{m,j}) = p(x'_{m,j} \mid x' \cap x): the proposal density cancels against the prior, and the acceptance ratio can be expressed in the simplified form above.
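A minimal sketch of this simplified acceptance probability in log space. The arguments are assumed to be supplied by a hypothetical trace interpreter; nothing here is Anglican API:

(defn lmh-acceptance
  "Simplified single-site MH acceptance probability of Wingate et al.
   log-joint-* : log p(y|x)p(x) of the old/new trace
   n-sp-*      : number of SP applications |x|, |x'|
   log-regen-* : log probability of (re)generating the other
                 trace's continuation"
  [log-joint-new n-sp-old log-regen-old
   log-joint-old n-sp-new log-regen-new]
  (min 1.0
       (Math/exp (- (+ log-joint-new (Math/log n-sp-old) log-regen-old)
                    (+ log-joint-old (Math/log n-sp-new) log-regen-new)))))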
Sequential Monte Carlo targets

p(x_{1:N} \mid y_{1:N}) \propto \tilde{p}(y_{1:N}, x_{1:N}) \equiv \prod_{n=1}^{N} g(y_n \mid x_{1:n}) \, f(x_n \mid x_{1:n-1})

with a weighted set of particles

p(x_{1:N} \mid y_{1:N}) \approx \sum_{\ell=1}^{L} w_N^{\ell} \, \delta_{x_{1:N}^{\ell}}(x_{1:N}).

Noting the identity

p(x_{1:n} \mid y_{1:n}) \propto g(y_n \mid x_{1:n}) \, f(x_n \mid x_{1:n-1}) \, p(x_{1:n-1} \mid y_{1:n-1}),

we can use importance sampling to generate samples from p(x_{1:n} \mid y_{1:n}), given a sample-based approximation to p(x_{1:n-1} \mid y_{1:n-1}).
[Figure: particles evolving through observe checkpoints at n = 1, n = 2, . . .]

Iteratively, given the particle approximation

p(x_{1:n-1} \mid y_{1:n-1}) \approx \sum_{\ell=1}^{L} w_{n-1}^{\ell} \, \delta_{x_{1:n-1}^{\ell}}(x_{1:n-1}),

we target

p(x_{1:n} \mid y_{1:n}) \propto g(y_n \mid x_{1:n}) \, f(x_n \mid x_{1:n-1}) \, p(x_{1:n-1} \mid y_{1:n-1})

with the proposal

q(x_{1:n} \mid y_{1:n}) = f(x_n \mid x_{1:n-1}) \, p(x_{1:n-1} \mid y_{1:n-1}),

i.e. each particle runs the program forward until the next observe. The weight of a particle is then the observation likelihood:

p(x_{1:n} \mid y_{1:n}) \approx \sum_{\ell=1}^{L} g(y_n \mid x_{1:n}^{\ell}) \, \delta_{x_{1:n}^{\ell}}(x_{1:n}), \qquad x_{1:n}^{\ell} = \left(x_{1:n-1}^{a_{n-1}^{\ell}}, x_n^{\ell}\right), \quad x_n^{\ell} \sim f\left(\cdot \mid x_{1:n-1}^{a_{n-1}^{\ell}}\right),

where a_{n-1}^{\ell} is the resampled ancestor index of particle \ell.
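In an implementation the unnormalized weights g(y_n \mid x_{1:n}^{\ell}) live in log space. A small Clojure sketch (not Anglican API) of normalizing them with the log-sum-exp trick:

(defn normalize-log-weights
  "Turn unnormalized log-weights into normalized probabilities,
   subtracting the max first to guard against underflow."
  [log-ws]
  (let [m  (apply max log-ws)
        ws (map #(Math/exp (- % m)) log-ws)
        z  (reduce + ws)]
    (mapv #(/ % z) ws)))

;; (normalize-log-weights [-1000.0 -1001.0]) => [0.731... 0.268...]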
Fischer, Kiselyov, and Shan. "Purely Functional Lazy Non-deterministic Programming." ICFP 2009.
Wood, van de Meent, and Mansinghka. "A New Approach to Probabilistic Programming Inference." AISTATS 2014.
Paige and Wood. "A Compilation Target for Probabilistic Programming Languages." ICML 2014.
[Figure: parallel program executions laid out as a sequence of environments, annotated with the densities p(x_1), f(x_n \mid x_{1:n-1}), and g(y_n \mid x_{1:n}).]
Paige and Wood. "A Compilation Target for Probabilistic Programming Languages." ICML 2014.
Algorithm 1: Parallel SMC program execution
Assume: N observations, L particles
  launch L copies of the program                           (parallel)
  for n = 1 ... N do
    wait until all L reach observe y_n                     (barrier)
    update unnormalized weights w̃_n^{1:L}                  (serial)
    if ESS < τ then
      sample numbers of offspring O_n^{1:L}                (serial)
      set weights w̃_n^{1:L} = 1                            (serial)
      for ℓ = 1 ... L do
        fork or exit                                       (parallel)
      end for
    else
      set all numbers of offspring O_n^ℓ = 1               (serial)
    end if
    continue program execution                             (parallel)
  end for
  wait until L program traces terminate                    (barrier)
  predict from L samples from p̂(x_{1:N}^{1:L} | y_{1:N})   (serial)
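A sketch of the ESS test and resampling step from Algorithm 1, in plain Clojure (multinomial resampling for simplicity; actual systems may use other resampling schemes):

(defn ess
  "Effective sample size of a vector of normalized weights."
  [ws]
  (/ 1.0 (reduce + (map #(* % %) ws))))

(defn resample-indices
  "Draw L ancestor indices with probability proportional to ws
   (multinomial resampling via inverse-CDF lookup)."
  [ws]
  (let [cdf (vec (reductions + ws))
        L   (count ws)]
    (vec (repeatedly L
           #(let [u (rand)]
              (min (dec L)
                   (count (take-while (fn [c] (< c u)) cdf))))))))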
[Figure: intuitively, the particles are threads (or continuations): at each observe y_n the runtime blocks, reweights, and forks or kills threads according to their offspring counts.]
SMC is now a building block for more powerful techniques: "particle independent Metropolis-Hastings" and iterated "conditional SMC" (particle Gibbs).

[Figure: repeated SMC sweeps s = 1, 2, 3, . . .]

[Andrieu, Doucet, Holenstein 2010] [Wood, van de Meent, Mansinghka 2014]
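A sketch of the particle independent Metropolis-Hastings step: rerun SMC to propose an entirely new particle set, and accept it with probability min(1, Ẑ'/Ẑ), the ratio of marginal-likelihood estimates. Here run-smc is a hypothetical function returning {:particles ..., :log-Z ...}:

(defn pimh-step
  "One PIMH transition over complete SMC sweeps."
  [run-smc current]
  (let [proposal   (run-smc)
        log-accept (- (:log-Z proposal) (:log-Z current))]
    (if (< (Math/log (rand)) log-accept)
      proposal    ; accept the freshly proposed particle set
      current)))  ; otherwise keep the previous one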
All of these inference variants work in probabilistic programming systems, e.g. on the Poisson example above.
Coordination: mutually recursive queries
[Figure: timeline (1990–2010) of probabilistic programming languages by community. PL: HANSEI, IBAL, Figaro. ML/STATS: BUGS, WinBUGS, JAGS, Stan, LibBi, infer.NET, Factorie, Church, webChurch, Blog, Venture, Anglican, Probabilistic-C. AI: Prolog, Prism, KMP, ProbLog, Simula, probabilistic ML/Haskell/Scheme, … Annotations mark systems limited to discrete RVs only or to bounded recursion.]
"Asynchronous Anytime Sequential Monte Carlo" [Paige, Wood, Doucet, Teh; NIPS 2014]
"Particle Gibbs with Ancestor Sampling for Probabilistic Programs" [van de Meent, Yang, Mansinghka, Wood; AISTATS 2015]
"Maximum a Posteriori Estimation by Search in Probabilistic Models" [Tolpin, Wood; SOCS 2015]
"Output-Sensitive Adaptive Metropolis-Hastings for Probabilistic Programs" [Tolpin, van de Meent, Paige, Wood; ECML 2015]
"Neural Adaptive Inference for Probabilistic Programming" [Paige, Wood; in submission]
[Diagram: a probabilistic programming system ties together the probabilistic programming language, inference, models, and applications.]