Probabilistic Programming Practical
Frank Wood, Brooks Paige


SLIDE 1

Probabilistic Programming Practical

Frank Wood, Brooks Paige {fwood,brooks}@robots.ox.ac.uk MLSS 2015

SLIDE 2

Setup

SLIDE 3

Java (> v. 1.5)

Installation

Mac and Windows: download and run the installer from
https://www.java.com/en/download/manual.jsp

Linux:

# Debian/Ubuntu
sudo apt-get install default-jre

# Fedora
sudo yum install java-1.7.0-openjdk

SLIDE 4

Leiningen (v. > 2.0)

Installation

# Download lein to ~/bin
mkdir ~/bin
cd ~/bin
wget http://git.io/XyijMQ -O lein
# Make executable
chmod a+x ~/bin/lein
# Add ~/bin to path
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
# Run lein
lein

Further details: http://leiningen.org/

SLIDE 5

Practical Materials

  • Download and unzip https://bitbucket.org/probprog/mlss2015/get/master.zip
  • cd mlss2015
  • lein gorilla
  • open the URL that lein prints

SLIDE 6

Schedule

  • 15:35 - 16:05 Intro/Hello World!
  • 16:05 - 16:30 Gaussian (you code)
  • 16:30 - 16:40 Discuss / intro to physics problem
  • 16:40 - 16:55 Physics (you code)
  • 16:55 - 17:00 Share / discuss solutions
  • 17:00 - 17:20 Inference explanation
  • 17:20 - 17:45 Poisson (you code)
  • 17:45 - 17:50 Inference Q/A
  • 17:50 - 18:05 Coordination (you code)
SLIDE 7

What is probabilistic programming?

SLIDE 8

An Emerging Field

Probabilistic programming sits at the intersection of ML (algorithms & applications), STATS (inference & theory), and PL (compilers, semantics, analysis).

SLIDE 9

Conceptualization

CS view: Parameters → Program → Output

Probabilistic programming / statistics view: Parameters → Program → Observations. The program defines the joint p(y|x)p(x); inference runs "backwards" to recover the posterior p(x|y).

SLIDE 10

Operative Definition

“Probabilistic programs are usual functional or imperative programs with two added constructs: (1) the ability to draw values at random from distributions, and (2) the ability to condition values of variables in a program via observations.” [Gordon et al., 2014]

SLIDE 11

What are the goals of probabilistic programming?

SLIDE 12

Simplify Machine Learning…

[Flowchart: the traditional machine-learning workflow. Start: identify and formalize the problem, gather data. Design a model (read papers, do math) unless an existing model is sufficient; if no high-level modeling tool supports the required features, choose an approximate inference algorithm, derive updates, and code the inference algorithm (or search for a usable implementation); test, scale, and deploy once the model performs well statistically and computationally. Legend: edge colors indicate the skills required, from PhD-level machine learning, statistics, or computer science down to non-specialist.]

SLIDE 13

To This

[Flowchart: the same workflow with probabilistic programming. Designing a model now means writing a probabilistic program, and inference is derived automatically; every edge is labeled non-specialist.]

SLIDE 14

Automate Inference

Models / Stochastic Simulators
        ↕
Programming Language Representation / Abstraction Layer
        ↕
Inference Engine(s)

[Figure: plate diagrams of example models expressible as probabilistic programs, e.g. a switching state-space model, mixture models, and LDA.]
SLIDE 15

Hello World!


SLIDE 16

First Exercise

Gaussian Unknown Mean

  • Learning objectives
    1. Clojure
    2. Gorilla REPL
    3. Anglican
    4. Automatic inference over generative models expressed as programs via query
  • Resources
    • https://clojuredocs.org/
    • https://bitbucket.org/probprog/anglican/
    • http://www.robots.ox.ac.uk/~fwood/anglican/index.html
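The exercise itself is done in Anglican, but the idea behind query can be sketched in plain Python with importance sampling: draw the unknown mean from its prior and weight each draw by the likelihood of the observations. The prior, likelihood scale, and data below are illustrative choices, not necessarily those used in the practical; because the model is conjugate, the estimate can be checked against the closed-form posterior mean.

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def posterior_mean_is(ys, prior_mu, prior_sigma, lik_sigma, n=200_000):
    """Importance-sampling estimate of E[mu | ys]: sample mu from the prior,
    weight it by the likelihood of the observations."""
    total_w, total_wmu = 0.0, 0.0
    for _ in range(n):
        mu = random.gauss(prior_mu, prior_sigma)
        w = math.exp(sum(normal_logpdf(y, mu, lik_sigma) for y in ys))
        total_w += w
        total_wmu += w * mu
    return total_wmu / total_w

def posterior_mean_exact(ys, prior_mu, prior_sigma, lik_sigma):
    """Closed-form posterior mean for the conjugate Normal-Normal model."""
    prec = 1.0 / prior_sigma ** 2 + len(ys) / lik_sigma ** 2
    return (prior_mu / prior_sigma ** 2 + sum(ys) / lik_sigma ** 2) / prec

random.seed(0)
ys = [9.0, 8.0]
approx = posterior_mean_is(ys, 1.0, math.sqrt(5.0), math.sqrt(2.0))
exact = posterior_mean_exact(ys, 1.0, math.sqrt(5.0), math.sqrt(2.0))
```

For these illustrative numbers the exact posterior mean is (0.2 + 8.5)/1.2 = 7.25, and the importance-sampling estimate converges to it as n grows.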


SLIDE 17

Simulation

SLIDE 18

Second Exercise

Learning objectives

  • 1. Develop experience thinking about expressing problems as inference over program executions
  • 2. Understand how to perform inference over a complex deterministic generative process, here a 2D-physics simulator


SLIDE 19

Second Exercise

Use inference to solve a mechanism design / optimization task:

  • get all balls safely in the bin

SLIDE 20

Inference

SLIDE 21

Trace Probability

  • observed data points: yn
  • internal random choices: xn
  • simulate from f(xn | x1:n−1) by running the program forward
  • weight traces by the observes g(yn | x1:n)

p(y1:N, x1:N) = ∏_{n=1}^{N} g(yn | x1:n) f(xn | x1:n−1)

[Figure: an execution trace unrolled as a graphical model, with latent choices x and observed values y.]
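Concretely, the trace probability is just a running sum of log-f and log-g terms accumulated as the program executes. A minimal Python sketch, using an invented random-walk model (f = Normal(x_{n−1}, 1) for the internal choices, g = Normal(xn, 1) for the observes), not the practical's:

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def run_and_score(ys):
    """Run the program forward once, accumulating
    log p(y_{1:N}, x_{1:N}) = sum_n [log f(x_n | x_{1:n-1}) + log g(y_n | x_{1:n})]."""
    x_prev = 0.0
    log_joint = 0.0
    xs = []
    for y in ys:
        x = random.gauss(x_prev, 1.0)           # internal random choice: f
        log_joint += normal_logpdf(x, x_prev, 1.0)
        log_joint += normal_logpdf(y, x, 1.0)   # observe: g
        xs.append(x)
        x_prev = x
    return xs, log_joint

random.seed(0)
xs, log_joint = run_and_score([1.0, 2.0])
```

Each forward run yields one trace together with its joint log score; inference algorithms differ only in how they use these scores.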

SLIDE 22

Trace

(let [x-1-1 3
      x-1-2 (sample (discrete (range x-1-1)))]
  (if (not= x-1-2 1)
    (let [x-2-1 (+ x-1-2 7)]
      (sample (poisson x-2-1)))))

[Figure: the tree of possible traces; x1,1 = 3; x1,2 ∈ {0, 1, 2}; when x1,2 ≠ 1, x2,1 ∈ {7, 9} and x2,2 ∈ {0, 1, . . .}]

SLIDE 23

Observe

(let [x-1-1 3
      x-1-2 (sample (discrete (range x-1-1)))]
  (if (not= x-1-2 1)
    (let [x-2-1 (+ x-1-2 7)]
      (observe (gaussian x-2-1 0.0001) 7)
      (sample (poisson x-2-1)))))

[Figure: the same trace tree, with the observe now constraining x2,1 to be near 7.]

SLIDE 24

“Single Site” MCMC = LMH

The posterior distribution over execution traces is proportional to the trace score with the observed values plugged in:

p(x | y) ∝ p̃(y = observes, x)

Have:
  • Likelihoods (via observe statement restrictions)
  • Prior (sequence of ERP returns; scored in the interpreter)

Need:
  • Proposal q(x′ | x)

Metropolis-Hastings acceptance rule:

min( 1, [ p(y | x′) p(x′) q(x | x′) ] / [ p(y | x) p(x) q(x′ | x) ] )

“Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation” [Wingate, Stuhlmüller et al., 2011]

SLIDE 25

LMH Proposal

q(x′ | x) = [ κ(x′_m,j | x_m,j) / |x| ] · [ p(x′ \ x | x′ ∩ x) / p(x′_m,j | x′ ∩ x) ]

where:
  • x′_m,j is the single stochastic procedure (SP) output being resampled
  • |x| is the number of SPs in the original trace
  • p(x′_m,j | x′ ∩ x) is the probability of the new SP return value (sample) given the trace prefix
  • p(x′ \ x | x′ ∩ x) is the probability of the new part of the proposed execution trace

[Wingate, Stuhlmüller et al., 2011]

SLIDE 26

LMH Implementation

“Single site update” = sample from the prior = run the program forward. When the new value is sampled conditioned on the preceding trace, κ(x′_m,j | x_m,j) = p(x′_m,j | x′ ∩ x), and the acceptance ratio simplifies to

min( 1, [ p(y | x′) p(x′) |x| p(x \ x′ | x ∩ x′) ] / [ p(y | x) p(x) |x′| p(x′ \ x | x′ ∩ x) ] )

where:
  • |x| and |x′| are the numbers of SP applications in the original and new traces
  • p(x \ x′ | x ∩ x′) is the probability of regenerating the current trace continuation given the proposal trace beginning
  • p(x′ \ x | x′ ∩ x) is the probability of generating the proposal trace continuation given the current trace beginning
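The single-site scheme can be sketched outside Anglican. In this hedged Python illustration (the model, the addresses, and all constants are invented, and the helpers are not Anglican's API), a trace is a dictionary of named random choices; each step picks one address uniformly, discards its value, and re-runs the program so that choice is redrawn from the prior. Because the two latents here have fixed, independent priors and |x| = |x′|, the simplified acceptance ratio reduces to the observation-likelihood ratio.

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def run_program(old_trace):
    """Execute the generative program, reusing values from old_trace where an
    address is present and sampling from the prior otherwise.
    Toy model: a, b ~ Normal(0, 5); observe y = 4 ~ Normal(a + b, 1)."""
    trace = {}
    def sample(addr, mu, sigma):
        if addr in old_trace:
            trace[addr] = old_trace[addr]
        else:
            trace[addr] = random.gauss(mu, sigma)
        return trace[addr]
    a = sample("a", 0.0, 5.0)
    b = sample("b", 0.0, 5.0)
    log_lik = normal_logpdf(4.0, a + b, 1.0)  # the observe statement
    return trace, log_lik

def lmh(n_iters):
    trace, log_lik = run_program({})
    out = []
    for _ in range(n_iters):
        addr = random.choice(sorted(trace))             # pick a single site
        proposal = {k: v for k, v in trace.items() if k != addr}
        new_trace, new_log_lik = run_program(proposal)  # redraw it from the prior
        # Simplified MH acceptance: priors cancel against the proposal here.
        if math.log(random.random()) < new_log_lik - log_lik:
            trace, log_lik = new_trace, new_log_lik
        out.append(trace["a"] + trace["b"])
    return out

random.seed(1)
samples = lmh(20000)
post_mean = sum(samples[2000:]) / len(samples[2000:])
```

For this toy model a + b has prior Normal(0, sqrt(50)) and the exact posterior mean is 4 · 50/51 ≈ 3.92, which the chain's post-burn-in average approaches.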

SLIDE 27

Introduction: Sequential Monte Carlo

Sequential Monte Carlo targets

p(x1:N | y1:N) ∝ p̃(y1:N, x1:N) ≡ ∏_{n=1}^{N} g(yn | x1:n) f(xn | x1:n−1)

with a weighted set of particles

p(x1:N | y1:N) ≈ Σ_{ℓ=1}^{L} w^ℓ_N δ_{x^ℓ_1:N}(x1:N).

Noting the identity

p(x1:n | y1:n) ∝ g(yn | x1:n) f(xn | x1:n−1) p(x1:n−1 | y1:n−1),

we can use importance sampling to generate samples from p(x1:n | y1:n) given a sample-based approximation to p(x1:n−1 | y1:n−1).

SLIDE 28

SMC

Iteratively:

  • simulate
  • weight
  • resample

[Figure: particles simulated forward to each observe, weighted, and resampled, for n = 1, n = 2, . . .]

SLIDE 29

SMC for Probabilistic Programming

Run the program forward until the next observe; the weight of a particle is the observation likelihood.

Given a sample-based approximation to the previous target,

p(x1:n−1 | y1:n−1) ≈ Σ_{ℓ=1}^{L} w^ℓ_n−1 δ_{x^ℓ_1:n−1}(x1:n−1),

the target at step n factorizes as

p(x1:n | y1:n) ∝ g(yn | x1:n) f(xn | x1:n−1) p(x1:n−1 | y1:n−1),

and using the proposal

q(x1:n | y1:n) = f(xn | x1:n−1) p(x1:n−1 | y1:n−1)

yields the weighted approximation

p(x1:n | y1:n) ≈ Σ_{ℓ=1}^{L} g(yn | x^ℓ_1:n) δ_{x^ℓ_1:n}(x1:n), where x^ℓ_1:n = (x^{a^ℓ_n−1}_1:n−1, x^ℓ_n) and x^ℓ_n ∼ f.

Implemented via a sequence of environments and parallel executions.

Fischer, Kiselyov, and Shan. “Purely Functional Lazy Non-deterministic Programming.” ACM SIGPLAN 2009.
W., van de Meent, and Mansinghka. “A New Approach to Probabilistic Programming Inference.” AISTATS 2014.
Paige and W. “A Compilation Target for Probabilistic Programming Languages.” ICML 2014.

SLIDE 30
SMC Methods Only Require

  • Initialization (sample): p(x1)
  • Forward simulation (sample): f(xn | x1:n−1)
  • Observation likelihood computation (pointwise evaluation up to normalization): g(yn | x1:n)

SLIDE 31

SMC for Probabilistic Programming

Paige and W. “A Compilation Target for Probabilistic Programming Languages.” ICML 2014

Algorithm 1: Parallel SMC program execution
Assume: N observations, L particles
launch L copies of the program                      (parallel)
for n = 1 . . . N do
  wait until all L reach observe yn                 (barrier)
  update unnormalized weights w̃^1:L_n              (serial)
  if ESS < τ then
    sample numbers of offspring O^1:L_n             (serial)
    set weights w̃^1:L_n = 1                        (serial)
    for ℓ = 1 . . . L do fork or exit               (parallel)
  else
    set all numbers of offspring O^ℓ_n = 1          (serial)
  end if
  continue program execution                        (parallel)
end for
wait until L program traces terminate               (barrier)
predict from L samples from p̂(x^1:L_1:N | y1:N)    (serial)
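The loop above can be sketched sequentially in Python. This is a simplified sketch under stated assumptions: the model is an invented linear-Gaussian toy (not the practical's), the "parallel" program copies are just a list, and resampling happens at every observe rather than on an ESS threshold.

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def smc(ys, n_particles=2000):
    """SMC for the toy model x_n ~ Normal(x_{n-1}, 1), y_n ~ Normal(x_n, 1),
    with x_0 = 0. Returns the final particle set approximating p(x_N | y_{1:N})."""
    particles = [0.0] * n_particles
    for y in ys:
        # (parallel) run each program copy forward to the next observe
        particles = [random.gauss(x, 1.0) for x in particles]
        # (barrier, serial) weight by the observation likelihood
        logw = [normal_logpdf(y, x, 1.0) for x in particles]
        m = max(logw)
        weights = [math.exp(lw - m) for lw in logw]
        # (serial) resample: draw offspring, implicitly resetting weights to 1
        particles = random.choices(particles, weights=weights, k=n_particles)
    return particles

random.seed(0)
final = smc([1.0, 2.0, 3.0])
estimate = sum(final) / len(final)
```

For this conjugate toy model the exact filtering mean of x_3 is available from a Kalman filter (≈ 2.38 for these observations), so the particle estimate can be checked directly.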

SLIDE 32

SMC for Probabilistic Programming

Intuitively:

  • run
  • wait
  • fork

Implemented with threads, using observe as the delimiter, or with continuations.

SLIDE 33

SMC Inner Loop

  • Sequential Monte Carlo is now a building block for other inference techniques
  • Particle MCMC [Andrieu, Doucet, Holenstein 2010]
    • PIMH: “particle independent Metropolis-Hastings”
    • iCSMC: “iterated conditional SMC”

[Figure: SMC sweeps s = 1, 2, 3 used as the inner loop of a particle MCMC sampler.]

[W., van de Meent, Mansinghka 2014]

SLIDE 34

Third Exercise

Poisson

  • Learning objectives
    1. Understand trace
    2. Understand on which variables inference algorithms operate
    3. Develop intuitions about how LMH and SMC inference variants work in probabilistic programming systems

SLIDE 35

Fourth Exercise

Coordination

  • Learning objectives
    1. Learn how to write non-trivial models, including mutually recursive queries
  • Resources
    • http://forestdb.org/
    • http://www.robots.ox.ac.uk/~fwood/anglican/examples/index.html


SLIDE 37

Systems

[Figure: timeline (1990–2010) of probabilistic programming systems, grouped by community. PL: HANSEI, IBAL, Figaro. ML/STATS: BUGS, WinBUGS, JAGS, STAN, LibBi, infer.NET, Factorie, Blog, Church, webChurch, Venture, Anglican, Probabilistic-C. AI: Prism, KMP (Prolog), Problog, Simula, probabilistic ML/Haskell/Scheme, . . . Annotations mark systems limited to discrete random variables only or to bounded recursion.]

SLIDE 38

Opportunities

  • Parallelism

“Asynchronous Anytime Sequential Monte Carlo” [Paige, W., Doucet, Teh NIPS 2014]

  • Backwards passing

“Particle Gibbs with Ancestor Sampling for Probabilistic Programs” [van de Meent, Yang, Mansinghka, W. AISTATS 2015]

  • Search

“Maximum a Posteriori Estimation by Search in Probabilistic Models” [Tolpin, W., SOCS, 2015]

  • Adaptation

“Output-Sensitive Adaptive Metropolis-Hastings for Probabilistic Programs” [Tolpin, van de Meent, Paige, W ; ECML, 2015]

  • Novel proposals

“Neural Adaptive Inference for Probabilistic Programming” [Paige, W.; in submission]

SLIDE 39

Bubble Up

[Figure: the probabilistic programming system stack, bottom to top: inference, probabilistic programming language, models, applications.]