
Probabilistic Programming - Frank Wood (frank@invrea.com)

Probabilistic Programming
Frank Wood
frank@invrea.com | fwood@robots.ox.ac.uk
http://www.invrea.com/ | http://www.robots.ox.ac.uk/~fwood
PPAML Summer School, Portland 2016
Objectives For This


  1. Graphical Model

     Model:
       x ∼ Normal(1, √5)
       y_i | x ∼ Normal(x, √2)
       y_1 = 9, y_2 = 8
       x | y ∼ Normal(7.25, 0.91)

     Anglican program:
       (defquery gaussian-model [data]
         (let [x (sample (normal 1 (sqrt 5)))
               sigma (sqrt 2)]
           (map (fn [y] (observe (normal x sigma) y)) data)
           x))

       (def dataset [9 8])

       (def posterior
         ((conditional gaussian-model :pgibbs :number-of-particles 1000) dataset))

       (def posterior-samples (repeatedly 20000 #(sample posterior)))
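     A quick conjugacy check of the posterior quoted above (standard Gaussian-Gaussian math, not from the slides): the posterior precision is 1/5 + 2/2 = 1.2, so the posterior variance is 1/1.2 ≈ 0.83 (standard deviation ≈ 0.91), and the posterior mean is (1/5 + (9 + 8)/2) / 1.2 = 8.7 / 1.2 = 7.25, matching x | y ∼ Normal(7.25, 0.91) above.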

  2. Anglican: Syntax ≈ Clojure, Semantics ≠ Clojure

     (The same model and Anglican program as slide 1, shown alongside the math: Anglican borrows Clojure's
      syntax, but sample and observe give it different, probabilistic semantics.)

  3. Bayes Net

     (defquery sprinkler-bayes-net [sprinkler wet-grass]
       (let [is-cloudy (sample (flip 0.5))
             is-raining (cond (= is-cloudy true)  (sample (flip 0.8))
                              (= is-cloudy false) (sample (flip 0.2)))
             sprinkler-dist (cond (= is-cloudy true)  (flip 0.1)
                                  (= is-cloudy false) (flip 0.5))
             wet-grass-dist (cond (and (= sprinkler true)  (= is-raining true))  (flip 0.99)
                                  (and (= sprinkler false) (= is-raining false)) (flip 0.0)
                                  (or  (= sprinkler true)  (= is-raining true))  (flip 0.9))]
         (observe sprinkler-dist sprinkler)
         (observe wet-grass-dist wet-grass)
         is-raining))
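     As a usage check, a minimal sketch in plain Clojure (not Anglican, and not from the slides) that computes the same conditional by enumerating the two latent choices, here conditioning on sprinkler = true and wet-grass = true:

       (defn p-rain-given [sprinkler wet-grass]
         (let [joint (for [cloudy [true false]
                           raining [true false]]
                       (let [p-c 0.5
                             p-r (if cloudy
                                   (if raining 0.8 0.2)
                                   (if raining 0.2 0.8))
                             p-s (let [ps (if cloudy 0.1 0.5)]
                                   (if sprinkler ps (- 1 ps)))
                             p-w (let [pw (cond (and sprinkler raining) 0.99
                                                (and (not sprinkler) (not raining)) 0.0
                                                :else 0.9)]
                                   (if wet-grass pw (- 1 pw)))]
                         [raining (* p-c p-r p-s p-w)]))
               z (reduce + (map second joint))]
           (/ (reduce + (map second (filter first joint))) z)))

       (p-rain-given true true)   ;; ≈ 0.32, the exact posterior probability of rain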

  4. One Hidden Markov Model

     (defquery hmm
       (let [init-dist (discrete [1 1 1])
             trans-dist (fn [s]
                          (cond (= s 0) (discrete [0 1 1])
                                (= s 1) (discrete [0 0 1])
                                (= s 2) (dirac 2)))
             obs-dist (fn [s] (normal s 1))
             y-1 1
             y-2 1
             x-0 (sample init-dist)
             x-1 (sample (trans-dist x-0))
             x-2 (sample (trans-dist x-1))]
         (observe (obs-dist x-1) y-1)
         (observe (obs-dist x-2) y-2)
         [x-0 x-1 x-2]))

     (Graphical model: latent chain x_0 → x_1 → x_2 → x_3 → ··· with observations y_1, y_2, y_3.)

  5. All Hidden Markov Models

     (defquery hmm [ys init-dist trans-dists obs-dists]
       (reduce (fn [xs y]
                 (let [x (sample (get trans-dists (peek xs)))]
                   (observe (get obs-dists x) y)
                   (conj xs x)))
               [(sample init-dist)]
               ys))

     (Graphical model: latent chain x_0 → x_1 → x_2 → x_3 → ··· with observations y_1, y_2, y_3.)

  6. New Primitives

     (defquery geometric [p]
       "geometric distribution"
       (let [dist (flip p)
             samp (loop [n 0]
                    (if (sample dist) n (recur (+ n 1))))]
         samp))

     (Diagram: chain of states 0, 1, 2, …; from each state the process stops with probability p and moves on with probability 1 − p.)
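     A minimal sanity-check sketch in plain Clojure (not Anglican; rand is core Clojure) that forward-simulates the same geometric process as the query above:

       (defn sample-geometric [p]
         ;; count the failures before the first success, as the query's loop does
         (loop [n 0]
           (if (< (rand) p) n (recur (inc n)))))

       ;; e.g. estimate P(n = 0), which should be close to p:
       (let [p 0.3
             draws (repeatedly 10000 #(sample-geometric p))]
         (double (/ (count (filter zero? draws)) (count draws))))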

  7. A Hard Inference Problem

     (defquery md5-inverse [L md5str]
       "conditional distribution of strings that map to the same MD5 hashed string"
       (let [mesg (sample (string-generative-model L))]
         (observe (dirac md5str) (md5 mesg))
         mesg))

  8. Evaluation-Based Inference for Higher-Order PPLs

  9. The Gist
     • Explore as many “traces” as possible, intelligently
     • Each trace contains all random choices made during the execution of a generative model
     • Compute trace “goodness” (probability) as a side effect
     • Combine weighted traces in a probabilistically coherent way
     • Report a projection of the posterior over traces
     • If it’s going to be “hard,” let’s at least make it fast
       - First generation: interpreted
       - Second generation: compiled

  10. Traces

      (let [t-1 3
            x-1 (sample (discrete (repeat t-1 1)))]
        (if (not= x-1 1)
          (let [t-2 (+ x-1 7)
                x-2 (sample (poisson t-2))])))

      (Trace tree: x_1 ∼ discrete([1 1 1]) branches to x_1 = 0, 1, 2; for x_1 = 0 the program goes on to sample
       x_2 ∼ poisson(7), for x_1 = 2 it samples x_2 ∼ poisson(9), and for x_1 = 1 it stops.)

  11. Goodness of Trace

      (let [t-1 3
            x-1 (sample (discrete (repeat t-1 1)))]
        (if (not= x-1 1)
          (let [t-2 (+ x-1 7)
                x-2 (sample (poisson t-2))]
            (observe (gaussian x-2 0.0001) 1))))

      (As slide 10, but each completed trace now also carries an observe weight, the density of the observed
       value 1 under (gaussian x-2 0.0001), i.e. (normpdf x-2 1 0.0001) for x_2 = 0, 1, 2, ….)

  12. Trace
      • A sequence of N observe statements encountered: {(g_i, φ_i, y_i)}_{i=1}^N
      • A sequence of M sample statements encountered: {(f_j, θ_j)}_{j=1}^M
      • The sequence of M sampled values: {x_j}_{j=1}^M
      • Conditioned on these sampled values, the entire computation is deterministic

  13. Trace Probability
      • Defined (up to a normalization constant) as

          γ(x) ≜ p(x, y) = ∏_{i=1}^N g_i(y_i | φ_i) ∏_{j=1}^M f_j(x_j | θ_j)

      • This notation hides the true dependency structure: writing x_{1:j} = x_1 × ··· × x_j for the sampled values,

          γ(x) = p(x, y) = ∏_{i=1}^N g_i(y_i | φ̃_i(x_{n_i})) ∏_{j=1}^M f_j(x_j | θ̃_j(x_{1:j−1}))

      (Graphical-model sketch: sampled values x_1, …, x_6 with observations y_1, y_2.)
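      As a worked instance (reading the densities off slides 10-11 and using the standard Poisson and Gaussian densities): the trace of that program with x_1 = 0 and x_2 = 1 contributes one factor per sample and one per observe, γ(x) = (1/3) · Poisson(1; 7) · Normal(1; 1, 0.0001), while the trace with x_1 = 1 makes no further random choices and no observation, so γ(x) = 1/3.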

  14. Inference Goal
      • Posterior over traces:

          π(x) ≜ p(x | y) = γ(x) / Z,    Z = p(y) = ∫ γ(x) dx

      • Output:

          E[z] = E[Q(x)] = ∫ Q(x) π(x) dx = (1/Z) ∫ (Q(x) γ(x) / q(x)) q(x) dx

  15. Three Base Algorithms
      • Likelihood Weighting
      • Sequential Monte Carlo
      • Metropolis-Hastings

  16. Likelihood Weighting
      • Run K independent copies of the program, simulating from the prior:

          q(x^k) = ∏_{j=1}^{M^k} f_j(x^k_j | θ^k_j)

      • Accumulate unnormalized weights (likelihoods):

          w(x^k) = γ(x^k) / q(x^k) = ∏_{i=1}^{N^k} g^k_i(y^k_i | φ^k_i)

      • Use them in approximate (Monte Carlo) integration:

          E[Q(x)] ≈ Σ_{k=1}^K W^k Q(x^k),    W^k = w(x^k) / Σ_{ℓ=1}^K w(x^ℓ)

      BLOG default inference engine: http://bayesianlogic.github.io/pages/users-manual.html
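      To make this concrete, a minimal likelihood-weighting sketch in plain Clojure for the Gaussian model of slide 1 (illustrative only: the normal-sample and normal-logpdf helpers are defined locally here, not Anglican or library calls); its self-normalized estimate of E[x | y] should land near 7.25:

        (def rng (java.util.Random.))
        (defn normal-sample [mu sd] (+ mu (* sd (.nextGaussian rng))))
        (defn normal-logpdf [x mu sd]
          (- (/ (Math/pow (/ (- x mu) sd) 2) -2.0)
             (Math/log (* sd (Math/sqrt (* 2 Math/PI))))))

        ;; K = 10000 weighted samples: draw x from the prior, weight by the likelihood of y = [9 8]
        (def weighted-samples
          (repeatedly 10000
            (fn []
              (let [x (normal-sample 1 (Math/sqrt 5))
                    log-w (+ (normal-logpdf 9 x (Math/sqrt 2))
                             (normal-logpdf 8 x (Math/sqrt 2)))]
                [x (Math/exp log-w)]))))

        ;; Self-normalized posterior-mean estimate, E[x | y] ≈ 7.25
        (let [z (reduce + (map second weighted-samples))]
          (/ (reduce + (map (fn [[x w]] (* x w)) weighted-samples)) z))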

  17. Likelihood Weighting Schematic
      (Diagram: K independent weighted samples (z_1, w_1), (z_2, w_2), …, (z_K, w_K).)

  18. Sequential Monte Carlo
      • Notation: x̃_{1:n} = x̃_1 × ··· × x̃_n, where x̃_n is the (disjoint) subspace of x associated with the n-th observe
      • Incrementalized joint:

          γ_n(x̃_{1:n}) = ∏_{m=1}^n g(y_m | x̃_{1:m}) p(x̃_m | x̃_{1:m−1})

      • Incrementalized target:

          π_n(x̃_{1:n}) = (1/Z_n) γ_n(x̃_{1:n})

  19. SMC for Probabilistic Programming
      Want samples from

          π_n(x̃_{1:n}) ∝ p(y_n | x̃_{1:n}) p(x̃_n | x̃_{1:n−1}) π_{n−1}(x̃_{1:n−1})

      Have a sample-based approximation

          π̂_{n−1}(x̃_{1:n−1}) = Σ_{k=1}^K W^k_{n−1} δ_{x̃^k_{1:n−1}}(x̃_{1:n−1})

      Sample ancestors and extensions

          x̃^{a^k_{n−1}}_{1:n−1} ∼ π̂_{n−1}(x̃_{1:n−1}),    x̃^k_n ∼ p(x̃_n | x̃^{a^k_{n−1}}_{1:n−1}),    x̃^k_{1:n} = x̃^{a^k_{n−1}}_{1:n−1} × x̃^k_n

      Importance weight by

          w(x̃^k_{1:n}) = p(y_n | x̃^k_{1:n}) = g_n(y_n | x̃^k_{1:n}),    W^k_n = w(x̃^k_{1:n}) / Σ_{k′=1}^K w(x̃^{k′}_{1:n})

      Wood, van de Meent, and Mansinghka “A New Approach to Probabilistic Programming Inference” AISTATS 2014
      Paige and Wood “A Compilation Target for Probabilistic Programming Languages” ICML 2014

  20. SMC for Probabilistic Programming (build slide; repeats slide 19 verbatim)

  21. SMC for Probabilistic Programming (build slide; repeats slide 19 verbatim)

  22. SMC for Probabilistic Programming (build slide; repeats slide 19 verbatim)

  23. SMC Schematic
      Intuitively, threads:
      - run
      - wait/weight at each observe
      - continue via continuations

  24. Metropolis Hastings = “Single Site” MCMC = LMH
      The posterior distribution over execution traces is proportional to the trace score with the observed values plugged in:

          π(x) ≜ p(x | y) = γ(x) / Z,    γ(x) ≜ p(x, y) = ∏_{i=1}^N g_i(y_i | φ_i) ∏_{j=1}^M f_j(x_j | θ_j)

      Metropolis-Hastings acceptance rule:

          α = min(1, π(x′) q(x | x′) / (π(x) q(x′ | x)))

      ▪ Need a proposal
      Milch and Russell “General-Purpose MCMC Inference over Relational Structures.” UAI 2006.
      Goodman, Mansinghka, Roy, Bonawitz, and Tenenbaum “Church: a language for generative models.” UAI 2008.
      Wingate, Stuhlmüller, Goodman “Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation” AISTATS 2011
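      As a concrete reading of the acceptance rule, a minimal sketch (plain Clojure, illustrative only, not Anglican's implementation) of the generic MH accept test in log space:

        (defn mh-accept?
          "Accept x' with probability min(1, pi(x') q(x|x') / (pi(x) q(x'|x))), computed in log space."
          [log-pi-new log-q-back log-pi-old log-q-fwd]
          (< (Math/log (rand))
             (min 0.0 (+ (- log-pi-new log-pi-old)
                         (- log-q-back log-q-fwd)))))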

  25. LMH Proposal
      Probability of the new part of the proposed execution trace:

          q(x′ | x) = (1/M) κ(x′_m | x_m) ∏_{j=m+1}^{M′} f′_j(x′_j | θ′_j)

      where M is the number of sample statements in the original trace and m is the (uniformly chosen) resampled site.

  26. LMH Acceptance Ratio
      “Single site update” = sample from the prior = run the program forward:

          κ(x′_m | x_m) = f_m(x′_m | θ_m),    θ_m = θ′_m

      MH acceptance ratio:

          α = min(1, γ(x′) M ∏_{j=m}^{M} f_j(x_j | θ_j) / (γ(x) M′ ∏_{j=m}^{M′} f′_j(x′_j | θ′_j)))

      where M and M′ are the numbers of sample statements in the original and proposal traces, and the two products are the probabilities of the original / proposal trace continuations restarting the other trace at the m-th sample.

  27. LMH Schematic
      (Diagram: a Markov chain of traces z_1, z_2, z_3, …, z_K produced one at a time.)

  28. Implementation Strategy
      • Interpreted
        - The interpreter tracks side effects and directs control flow for inference
      • Compiled
        - Leverages existing compiler infrastructure
        - Can only exert control over flow from within function calls, e.g. sample, observe, predict
      Wingate, Stuhlmüller, Goodman “Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation” AISTATS 2011
      Paige and Wood “A Compilation Target for Probabilistic Programming Languages” ICML 2014

  29. Probabilistic C
      Standard C plus two new directives, observe and predict: observe constrains program execution, and predict emits sampled values.

  30. Probabilistic C Implementation
      In practice, processes:
      - run
      - wait/weight at each observe
      - fork new processes/continuations
      Paige and Wood “A Compilation Target for Probabilistic Programming Languages” ICML 2014

  31. Continuations
      • A continuation is a function that encapsulates the “rest of the computation”
      • A Continuation Passing Style (CPS) transformation rewrites programs so that
        - no function ever returns, and
        - every function takes an extra argument: a function called the continuation
      • A standard programming-language technique, with no limitations on what can be expressed
      Friedman and Wand. “Essentials of Programming Languages.” MIT Press, 2008.
      Fischer, Kiselyov, and Shan “Purely Functional Lazy Non-deterministic Programming” ACM SIGPLAN 2009
      Goodman and Stuhlmüller http://dippl.org/ 2014
      Tolpin https://bitbucket.org/probprog/anglican/ 2014

  32. Example CPS Transformation

      ;; Standard Clojure:
      (println (+ (* 2 3) 4))

      ;; CPS transformed (the first continuation receives (* 2 3); the second passes the sum to println):
      (*& 2 3 (fn [x] (+& x 4 println)))

      ;; CPS-transformed "primitives"
      (defn +& [a b k] (k (+ a b)))
      (defn *& [a b k] (k (* a b)))
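      Another small CPS example (plain Clojure, not from the slides): the same transformation applied to a recursive function; no call returns a value, each one hands its result to the continuation k:

        (defn fact& [n k]
          (if (zero? n)
            (k 1)
            (fact& (dec n) (fn [r] (k (* n r))))))

        (fact& 5 println)   ;; prints 120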

  33. CPS Explicitly Linearizes Execution

      (defn pythag&
        "compute sqrt(x^2 + y^2)"
        [x y k]
        (square& x                         ;; xx = x^2
          (fn [xx]
            (square& y                     ;; yy = y^2
              (fn [yy]
                (+& xx yy                  ;; xxyy = xx + yy
                  (fn [xxyy]
                    (sqrt& xxyy k))))))))  ;; result = sqrt(xxyy)

      • Compiling to a pure language with lexical scoping ensures that
        A. variables needed in subsequent computation are bound in the environment, and
        B. they can’t be modified by multiple calls to the continuation function.

  34. Anglican Programs

      ;; Anglican
      (defquery flip-example [outcome]
        (let [p (sample (uniform-continuous 0 1))]
          (observe (flip p) outcome)
          (predict :p p)))

      (flip-example true)

      ;; Anglican, "linearized"
      (defquery flip-example [outcome]
        (let [u (uniform-continuous 0 1)
              p (sample u)
              dist (flip p)]
          (observe dist outcome)
          (predict :p p)))

  35. Are “Compiled” to Native CPS-Clojure

      ;; The "linearized" Anglican query from slide 34 compiles to Clojure:
      (defn flip-query& [outcome k1]
        (uniform-continuous& 0 1
          (fn [dist1]
            (sample& dist1
              (fn [p]
                ((fn [p k2]
                   (flip& p
                     (fn [dist2]
                       (observe& dist2 outcome
                         (fn []
                           (predict& :p p k2))))))
                 p k1))))))

      ;; CPS-ed distribution constructors
      (defn uniform-continuous& [a b k] (k (uniform-continuous a b)))
      (defn flip& [p k] (k (flip p)))

  36. Are “Compiled” to Native CPS-Clojure (build slide; repeats slide 35 verbatim)

  37. Explicit Functional Form for “Rest of Program”
      (Same compiled code as slide 35, with the continuation functions (fn [dist1] …), (fn [p] …),
       (fn [dist2] …), and (fn [] …) highlighted: each one is an explicit function for the rest of the program.)

  38. Interruptible
      (Same compiled code, with the Anglican primitives uniform-continuous&, sample&, flip&, observe&, and
       predict& highlighted alongside the continuation functions: execution can be suspended at any primitive.)

  39. Controllable
      (Same compiled code, shown twice: the primitives sample&, observe&, and predict& form the inference
       “backend” interface. webPPL similarly CPS-compiles to pure functional JavaScript.)

  40. Inference “Backend”

      ;; Implement a "backend"
      (defn sample& [dist k]
        ;; [ ALGORITHM-SPECIFIC IMPLEMENTATION HERE ]
        ;; Pass the sampled value to the continuation
        (k (sample dist)))

      (defn observe& [dist value k]
        (println "log-weight =" (observe dist value))
        ;; [ ALGORITHM-SPECIFIC IMPLEMENTATION HERE ]
        ;; Call continuation with no arguments
        (k))

      (defn predict& [label value k]
        ;; [ ALGORITHM-SPECIFIC IMPLEMENTATION HERE ]
        (k label value))

  41. Common Framework
      Pure compiled deterministic computation (P_start, P_continue, …, P_terminate) alternates with calls into the “backend”:
      • sample receives (f, θ, k) and resumes the program by calling (k x)
      • observe receives (g, φ, y, k) and resumes by calling (k)
      • predict receives (z, k) and resumes by calling (k)
      • terminate ends the trace

  42. Likelihood Weighting “Backend”

      (defn sample& [dist k]
        ;; Call the continuation with a sampled value
        (k (sample dist)))

      (defn observe& [dist value k]
        ;; Compute and record the log weight
        (add-log-weight! (observe dist value))
        ;; Call the continuation with no arguments
        (k))

      (defn predict& [label value k]
        ;; Store predict, and call continuation
        (store! label value)
        (k))
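      The backend above assumes helpers named add-log-weight! and store!; a minimal sketch of what they might look like (illustrative only; Anglican actually threads this state through the computation rather than using global atoms):

        (def log-weight (atom 0.0))
        (def predicts (atom {}))

        (defn add-log-weight! [lw] (swap! log-weight + lw))
        (defn store! [label value] (swap! predicts assoc label value))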

  43. Likelihood Weighting Example

      (defquery flip-example [outcome]
        (let [p (sample (uniform-continuous 0 1))]
          (observe (flip p) outcome)
          (predict :p p)))

      Execution alternates between the compiled pure deterministic computation (P_start, P_continue, …, terminate)
      and the backend calls sample&, observe&, predict&:
          p ∼ U(0, 1),    w ← p^{I(outcome = true)} (1 − p)^{I(outcome = false)}

  44. SMC Backend

      (defn sample& [dist k]
        ;; Call the continuation with a sampled value
        (k (sample dist)))

      (defn observe& [dist value k]
        ;; Block and wait for K calls to reach observe&
        ;; Compute weights
        ;; Use weights to subselect continuations to call
        ;; Call K sampled continuations (often multiple times)
        )

      (defn predict& [label value k]
        ;; Store predict, and call continuation
        (store! label value)
        (k))
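      The “use weights to subselect continuations” step is a resampling step; a minimal multinomial-resampling sketch in plain Clojure (illustrative, not Anglican's implementation):

        (defn resample-indices
          "Draw k particle indices with probability proportional to weights."
          [weights k]
          (let [total (reduce + weights)
                cum   (vec (reductions + weights))]
            (repeatedly k
              (fn []
                (let [u (* (rand) total)]
                  (count (take-while #(< % u) cum)))))))

        ;; e.g. (resample-indices [0.1 0.7 0.2] 5) usually returns mostly 1s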

  45. LMH Backend

      (defn sample& [a dist k]
        (let [;; reuse previous value, or sample from prior
              x (or (get-cache a) (sample dist))]
          ;; add to log-weight when reused
          (when (get-cache a)
            (add-log-weight! (observe dist x)))
          ;; store value and its log prob in trace
          (store-in-trace! a x dist)
          ;; continue with value x
          (k x)))

      (defn observe& [dist value k]
        ;; Compute and record the log weight
        (add-log-weight! (observe dist value))
        ;; Call the continuation with no arguments
        (k))

  46. LMH Variants
      • WebPPL, Anglican: D. Wingate, A. Stuhlmüller, and N. D. Goodman. “Lightweight Implementations of Probabilistic Programming Languages via Transformational Compilation.” AISTATS (2011).
      • “C3: Lightweight Incrementalized MCMC for Probabilistic Programs using Continuations and Callsite Caching.” D. Ritchie, A. Stuhlmüller, and N. D. Goodman. arXiv:1509.02151 (2015).
      • “Venture: A Higher-Order Probabilistic Programming Platform with Programmable Inference.” V. Mansinghka, D. Selsam, and Y. Perov. arXiv:1404.0099 (2014).

  47. Inference Improvements Relevant to Higher-Order PPLs

  48. Add Hill Climbing
      • PMCMC = MH with SMC sweeps as proposals, e.g.:
        - PIMH: “particle independent Metropolis-Hastings”
        - PGIBBS: “iterated conditional SMC”
      (Diagram: repeated SMC sweeps.)
      Andrieu, Doucet, Holenstein “Particle Markov Chain Monte Carlo Methods.” JRSSB 2010

  49. Blockwise Anytime Algorithm
      • PIMH is MH that accepts an entire new particle set with probability

          α_PIMH = min(1, Ẑ* / Ẑ_{s−1})

      • Each SMC sweep computes a marginal likelihood estimate

          Ẑ = ∏_{n=1}^N Ẑ_n = ∏_{n=1}^N (1/K) Σ_{k=1}^K w(x̃^k_{1:n})

      • And all particles can be used:

          E_PIMH[Q(x)] ≈ (1/S) Σ_{s=1}^S Σ_{k=1}^K W^{s,k} Q(x^{s,k})

      Paige and Wood “A Compilation Target for Probabilistic Programming Languages” ICML 2014
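      A minimal sketch of the PIMH accept/reject step described above (plain Clojure, illustrative; run-smc stands in for one SMC sweep returning its particles and its log marginal likelihood estimate):

        (defn pimh-step
          "Propose a whole new particle set by running SMC; accept it w.p. min(1, Z-new / Z-old)."
          [run-smc current]
          (let [proposal  (run-smc)   ;; => {:log-Z ..., :particles ...}
                log-alpha (min 0.0 (- (:log-Z proposal) (:log-Z current)))]
            (if (< (Math/log (rand)) log-alpha)
              proposal
              current)))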

  50. PMCMC for Probabilistic Programming Inference
      Wood, van de Meent, Mansinghka “A new approach to probabilistic programming inference” AISTATS 2014

  51. Remove Synchronization
      (Video: SMC in a linear dynamical system, slowed down for clarity.)

  52. Particle Cascade
      Asynchronously:
      - simulate
      - weight
      - branch
      Paige, Wood, Doucet, Teh “Asynchronous Anytime Sequential Monte Carlo” NIPS 2014

  53. Particle Cascade

  54. Shared Memory Scalability: Multiple Cores

  55. Distributed SMC: iPMCMC
      For each MCMC iteration r = 1, 2, …:
      1. Nodes c_j ∈ {1, …, M}, j = 1, …, P run CSMC; the rest run SMC
      2. Each node m returns a marginal likelihood estimate Ẑ_m and a candidate retained particle x′_{1:T,m}
      3. A loop of Gibbs updates is applied to the retained-particle indices:

          P(c_j = m | c_{1:P\j}) = Ẑ_m 1(m ∉ c_{1:P\j}) / Σ_{n=1}^M Ẑ_n 1(n ∉ c_{1:P\j})

      4. The retained particles for the next iteration are set to x′_{1:T,j}[r] = x′_{1:T,c_j}
      (Figure: node assignments across MCMC iterations r.)
      Rainforth, Naesseth, Lindsten, Paige, van de Meent, Doucet, Wood “Interacting Particle Markov Chain Monte Carlo” ICML 2016

  56. CSMC Exploitation / SMC Exploration

  57. Inference Backends in Anglican
      • 14+ algorithms
      • Average 165 lines of code per algorithm
      • Can be implemented and used without touching the core code base

      Algorithm  | Type  | Lines of Code | Citation                          | Description
      smc        | IS    | 127           | Wood et al., AISTATS 2014         | Sequential Monte Carlo
      importance | IS    | 21            | -                                 | Likelihood weighting
      pcascade   | IS    | 176           | Paige et al., NIPS 2014           | Particle cascade: anytime asynchronous sequential Monte Carlo
      pgibbs     | PMCMC | 121           | Wood et al., AISTATS 2014         | Particle Gibbs (iterated conditional SMC)
      pimh       | PMCMC | 68            | Wood et al., AISTATS 2014         | Particle independent Metropolis-Hastings
      pgas       | PMCMC | 179           | van de Meent et al., AISTATS 2015 | Particle Gibbs with ancestor sampling
      lmh        | MCMC  | 177           | Wingate et al., AISTATS 2011      | Lightweight Metropolis-Hastings
      ipmcmc     | MCMC  | 193           | Rainforth et al., ICML 2016       | Interacting PMCMC
      almh       | MCMC  | 320           | Tolpin et al., ECML PKDD 2015     | Adaptive scheduling lightweight Metropolis-Hastings
      rmh*       | MCMC  | 319           | -                                 | Random-walk Metropolis-Hastings
      palmh      | MCMC  | 66            | -                                 | Parallelised adaptive scheduling lightweight Metropolis-Hastings
      plmh       | MCMC  | 62            | -                                 | Parallelised lightweight Metropolis-Hastings
      bamc       | MAP   | 318           | Tolpin et al., SoCS 2015          | Bayesian Ascent Monte Carlo
      siman      | MAP   | 193           | Tolpin et al., SoCS 2015          | MAP estimation via simulated annealing

  58. What Next?

  59. Commercial Impact
      INVREA: Make Better Decisions
      https://invrea.com/plugin/excel/v1/download/

  60. Symbolic Inference via Program Transformations
      • Automated program transformations that simplify or eliminate inference (moving observes up and out): “Automatic Rao-Blackwellization”

      ;; Before transformation
      (defquery beta-bernoulli [observation]
        (let [dist (beta 1 1)
              theta (sample dist)
              like (flip theta)]
          (observe like observation)
          (predict :theta theta)))

      ;; After transformation
      (defquery beta-bernoulli [observation]
        (let [dist (beta (if observation 2 1)
                         (if observation 1 2))
              theta (sample dist)]
          (predict :theta theta)))

      Carette and Shan. “Simplifying Probabilistic Programs Using Computer Algebra.” T.R. 719, Indiana University (2015)
      Yang, Keynote Lecture, APLAS (2015)
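      A quick conjugacy check of the transformation above (standard Beta-Bernoulli math, not from the slides): with θ ∼ Beta(a, b) and y ∼ Bernoulli(θ), the posterior is θ | y ∼ Beta(a + y, b + 1 − y). Starting from Beta(1, 1), observing true gives Beta(2, 1) and observing false gives Beta(1, 2), which is exactly what the transformed program samples from.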

  61. Exact Inference via Compilation

      ;; Anglican
      (defquery simple []
        (def y (sample (flip 0.5)))
        (def z (if y (dirac 0) (dirac 1)))
        (observe z 0)
        y)

      The query is compiled to a discrete graphical model (expressible in Figaro, etc.) with one variable per
      intermediate expression, e.g. x_1 ∼ δ_0.5, x_2 ∼ δ⟦flip x_1⟧, x_3 ∼ P_{x_2}, x_4 ∼ δ_0, x_6 ∼ δ_1,
      x_8 ∼ δ_{if(x_3, x_5, x_7)}, …; variable elimination on this model computes p(y) and p(x | y) exactly.
      R. Cornish, F. Wood, and H. Yang “Efficient exact inference in discrete Anglican programs” in prep. 2016

  62. Inference Compilation - FOPPLs
      A probabilistic model generates data from latents; an inverse model generates latents from data. Can we learn how to sample from the inverse model?
      • Target density π(x) = p(x | y); approximating family q(x | λ)
      • Single dataset: argmin_λ D_KL(π || q_λ) fits λ, learning an importance sampling proposal
      • Averaging over all possible datasets: argmin_η E_{p(y)}[D_KL(π || q_{ϕ(η, y)})], with λ = ϕ(η, y), learns a mapping from arbitrary datasets to λ
      • …compiles away runtime costs of inference!
      Paige, Wood “Inference Networks for Sequential Monte Carlo in Graphical Models” ICML (2016).

  63. Compiled Inference Results Paige, Wood “Inference Networks for Sequential Monte Carlo in Graphical Models” ICML (2016).

  64. Wrap Up

  65. Learning Dichotomy
      Supervised (model + data):
      • Needs lots of labeled data
      • Training is slow
      • Uninterpretable model
      • Fast at test time
      Unsupervised (model + inference):
      • Needs only unlabeled data
      • No training
      • Interpretable model
      • Slow at test time

  66. Unified Learning
      Combining the two (model + data + inference):
      • Needs only unlabeled data
      • Slow training
      • Interpretable model
      • Fast at test time

  67. HOPPL Compiled Inference
      p(letters | captcha)
      Compiled inference: 1) compilation (1 day), 2) inference (1 second). Classical inference: 1) inference (20 minutes).

      (defquery captcha [baseline-image]
        (let [num-letters (sample (u-d 4 7))
              x-offset (sample (u-d min-x max-x))
              y-offset (sample (u-d min-y max-y))
              distort-x (sample (u-d 8 15))
              distort-y (sample (u-d 8 15))
              kerning (sample (u-d -1 3))
              letter-ids (repeatedly num-letters
                                     #(sample (u-d 0 dict-size)))
              letters (get-letters letter-ids)
              rendered-image (render letters
                                     x-offset
                                     y-offset
                                     kerning
                                     distort-x
                                     distort-y)]
          ;; ABC-style observe
          (observe (abc-dist rendered-image abc-sigma) baseline-image)
          (predict :letters letters)))

      Pipeline: probabilistic program → training {x, y} data → dynamically assembled RNN → trained RNN weights.
      Example outputs: compiled sequential importance sampling (1 particle): num-letters = 5, letters = “gtRai”;
      sequential Monte Carlo (10k particles): num-letters = 4, letters = “dF6D”;
      lightweight Metropolis-Hastings (10k iterations): num-letters = 6, letters = “q5ihGt”.
      Le, Baydin, Wood “Inference Compilation and Universal Probabilistic Programming” in prep 2016

  68. Compiled HOPPL Models
      Model (x) and data (y) pairs this applies to:
      • program source code → program output
      • scene description → image
      • policy and world → observations and rewards
      • neural net structures → input/output pairs
      • simulator → constraints

  69. Wrap Up
