Parallel Gibbs Sampling
From Colored Fields to Thin Junction Trees
Yucheng Low Arthur Gretton Carlos Guestrin Joseph Gonzalez
Graphical Model
Suppose we wanted to know the probability that a coin lands “heads”. We use the same idea for graphical model inference.
[Figure: “draw samples” of the coin flip and count the outcomes, e.g., 4x heads and 6x tails; analogously, draw samples of the variables X1 ... X6.]
Focus on discrete factorized models with sparse structure:
[Figure: a factor graph over X1 ... X5 with factors f1,2, f1,3, f2,4, f3,4, f2,4,5, and the corresponding Markov random field.]
The goal is to estimate expectations under the model.
Example: marginal estimation.
If the sampler is ergodic the following is true*: the empirical average over samples converges to the desired expectation, (1/N) Σ_{n=1..N} f(x^(n)) → E[f(X)] as N → ∞.
*Consult your statistician about potential risks before using.
Sequentially, for each variable in the model:
Select the variable
Construct its conditional given the adjacent assignments
Flip a coin and update the variable’s assignment
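The scan above can be sketched as a minimal sequential Gibbs sampler. This is an illustrative sketch, assuming a toy Ising-style chain of ±1 spins with a single coupling strength; none of the names or values come from the talk:

```python
import math
import random

# Hypothetical toy model: a 1-D Ising-style chain of binary spins.
# Each step follows the slide's recipe: select a variable, construct
# its conditional given the adjacent assignments, flip a coin, update.

def gibbs_sweep(x, coupling, rng):
    """One sequential scan over all variables of the chain."""
    n = len(x)
    for i in range(n):
        # Only the adjacent assignments influence x[i] (sparse structure).
        field = 0.0
        if i > 0:
            field += coupling * x[i - 1]
        if i < n - 1:
            field += coupling * x[i + 1]
        # Conditional P(x[i] = +1 | neighbors) for spins in {-1, +1}.
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
        x[i] = 1 if rng.random() < p_plus else -1
    return x

rng = random.Random(0)
x = [rng.choice([-1, 1]) for _ in range(10)]
for _ in range(100):
    gibbs_sweep(x, coupling=0.5, rng=rng)
```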
Initial Assignment
“The Gibbs sampler ... might be considered the workhorse” –Robert and Casella
Ergodic with geometric convergence
Great for high-dimensional models
No need to tune a joint proposal
Easy to construct algorithmically
WinBUGS
Important Properties that help Parallelization:
Sparse structure → factorized computation
“…the MRF can be divided into collections of [variables] with each collection assigned to an independently running asynchronous processor.”
Converges to the wrong distribution!
Adjacent variables cannot be sampled simultaneously.
[Figure: two variables with strong positive correlation sampled synchronously; from t=0 through t=3 the paired samples exhibit strong negative correlation.]
Same problem as the original Geman paper
Parallel version of the sampler is not ergodic.
Unlike Geman, the recent work:
Recognizes the issue
Ignores the issue, or
Proposes an “approximate” solution
Parallelization in the Indian Buffet Process. NIPS 2009
Parallel computing community studied:
Construct an Equivalent Parallel Algorithm
[Figure: the sequential algorithm’s updates, laid out over time, form a directed acyclic dependency graph.]
Using Graph Coloring
Compute a k-coloring of the graphical model
Sample all variables with the same color in parallel
Sequential Consistency:
For t from 1 to T do
  For k from 1 to K do
    Parfor i in color k:
      Sample Xi given the current assignments of its neighbors
Quantifiable acceleration in mixing.
Speedup (measured on the time to update all variables once): a full sweep costs O(n/p + k), where n = # variables, k = # colors, and p = # processors; the k color-synchronization term is the penalty.
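A minimal sketch of the Chromatic sampler, assuming a small grid Ising model in the spirit of the talk’s experiments; the greedy coloring and the coupling value are illustrative choices. The inner “parfor” is written as a plain loop: within one color the updates are conditionally independent, so parallelizing that loop is safe.

```python
import math
import random

def greedy_coloring(neighbors):
    """Greedy k-coloring of the Markov random field (illustrative)."""
    color = {}
    for v in sorted(neighbors):
        used = {color[u] for u in neighbors[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def chromatic_sweep(x, neighbors, color, coupling, rng):
    k = max(color.values()) + 1
    for c in range(k):
        # Parfor: all variables of color c have no edges between them,
        # so this inner loop may run in parallel without races.
        for v in neighbors:
            if color[v] != c:
                continue
            field = coupling * sum(x[u] for u in neighbors[v])
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
            x[v] = 1 if rng.random() < p_plus else -1

# Build a 4x4 grid MRF; a grid is 2-colorable (checkerboard pattern).
side = 4
neighbors = {(i, j): [] for i in range(side) for j in range(side)}
for i in range(side):
    for j in range(side):
        for di, dj in ((0, 1), (1, 0)):
            u = (i + di, j + dj)
            if u in neighbors:
                neighbors[(i, j)].append(u)
                neighbors[u].append((i, j))

color = greedy_coloring(neighbors)
rng = random.Random(0)
x = {v: rng.choice([-1, 1]) for v in neighbors}
for _ in range(50):
    chromatic_sweep(x, neighbors, color, coupling=0.2, rng=rng)
```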
Version 1 (Sequential Consistency):
Chromatic Gibbs Sampler is equivalent to a Sequential Scan Gibbs Sampler
Version 2 (Probabilistic Interpretation):
Variables in the same color are conditionally independent →
Joint Sample is equivalent to Parallel Independent Samples
Many common models have two-colorings. For the [incorrect] Synchronous Gibbs sampler we:
Provide a method to correct the chains
Derive the stationary distribution
We can derive two valid chains:
[Figure: a synchronous chain t=0 ... t=5 over strongly positively correlated variables is an invalid sequence, but splitting it by color yields two valid chains, Chain 1 and Chain 2.]
Converges to the Correct Distribution
Theoretical contributions on 2-colorable models. Stationary distribution of Synchronous Gibbs: the joint stationary distribution factorizes as the product of the true marginals of the two color classes,
π_sync(x) = p(x_{color 1}) p(x_{color 2}).
Corollary: the Synchronous Gibbs sampler is correct for single-variable marginals.
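The corollary can be checked numerically. A sketch, assuming a toy two-variable (hence 2-colorable) model with made-up coupling and field values: run the [incorrect] Synchronous Gibbs sampler, in which both variables resample from the previous state, and compare the empirical single-variable marginal with the exact marginal from enumeration.

```python
import math
import random

# Hypothetical toy model over spins in {-1, +1}:
#   p(x1, x2) ∝ exp(J*x1*x2 + h1*x1 + h2*x2)
J, h1, h2 = 0.8, 0.3, -0.2  # illustrative values, not from the talk

def weight(x1, x2):
    return math.exp(J * x1 * x2 + h1 * x1 + h2 * x2)

# Exact marginal P(X1 = +1) by enumeration.
Z = sum(weight(a, b) for a in (-1, 1) for b in (-1, 1))
exact = sum(weight(1, b) for b in (-1, 1)) / Z

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

rng = random.Random(1)
x1, x2 = 1, 1
hits = 0
n = 200_000
for _ in range(n):
    # Synchronous step: BOTH variables resample from the OLD state.
    new_x1 = 1 if rng.random() < sigmoid(2 * (J * x2 + h1)) else -1
    new_x2 = 1 if rng.random() < sigmoid(2 * (J * x1 + h2)) else -1
    x1, x2 = new_x1, new_x2
    hits += (x1 == 1)

# hits / n tracks the exact marginal even though the joint stationary
# distribution is only the product of the two color marginals.
```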
Chromatic Gibbs Sampler ideal for:
Rapid mixing models
Conditional structure that does not admit Splash

Splash Gibbs Sampler ideal for:
Slowly mixing models
Conditional structure that admits Splash
Discrete models
Single-variable Gibbs updates tend to mix slowly; ideally we would like to draw joint samples (blocking).
[Figure: two strongly correlated variables X1, X2; single-site changes move slowly along the correlation ridge.]
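A sketch of a blocked update, assuming blocks small enough to enumerate: the strongly coupled pair (X1, X2) is resampled jointly from its exact conditional, so the chain can jump directly between the two modes instead of creeping one site at a time. All factor values here are illustrative.

```python
import math
import random
from itertools import product

def blocked_update(block, x, factors, rng):
    """Resample the variables in `block` jointly from their conditional,
    by enumerating every configuration of the block."""
    weights, configs = [], []
    for vals in product((-1, 1), repeat=len(block)):
        trial = dict(x)
        trial.update(zip(block, vals))
        # Factors are log-potentials over the full assignment.
        w = math.exp(sum(f(trial) for f in factors))
        weights.append(w)
        configs.append(vals)
    r = rng.random() * sum(weights)
    for w, vals in zip(weights, configs):
        r -= w
        if r <= 0:
            x.update(zip(block, vals))
            return x
    x.update(zip(block, configs[-1]))
    return x

# Strongly coupled pair plus a weakly attached third spin (made up).
factors = [
    lambda s: 3.0 * s["x1"] * s["x2"],  # strong coupling: slow single-site mixing
    lambda s: 0.1 * s["x2"] * s["x3"],
]
rng = random.Random(0)
x = {"x1": 1, "x2": 1, "x3": -1}
for _ in range(100):
    blocked_update(("x1", "x2"), x, factors, rng)
```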
Based on the papers:
Pedigrees with Many Loops. TR 1996
Step 1: Grow multiple Splashes in parallel:
Conditionally Independent
Step 2: Calibrate the trees in parallel
Step 3: Sample trees in parallel
Junction Trees
Data structure used for exact inference in loopy graphical models.
[Figure: a loopy MRF over A ... E with pairwise factors fAB, fBC, fCD, fAD, fDE, fCE, and the corresponding junction tree with cliques {A,B,D}, {B,C,D}, {C,D,E}. Tree-width = 2.]
Parallel Splash Junction Tree Algorithm
Step 1: Construct multiple conditionally independent thin (bounded-treewidth) junction trees, the Splashes: sequential junction tree extension
Step 2: Calibrate each thin junction tree in parallel: parallel belief propagation
Step 3: Exact backward sampling: parallel exact sampling
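As an illustrative special case (with assumed toy potentials, not the talk’s model), a chain MRF is a junction tree of treewidth 1: calibration is one backward pass of message passing, and an exact joint sample is then drawn in one forward pass, one variable per clique.

```python
import random

states = (0, 1)
n = 5

def edge_pot(a, b):   # pairwise potential (attractive; assumed values)
    return 2.0 if a == b else 1.0

def node_pot(i, a):   # unary potential (assumed values)
    return 1.5 if a == i % 2 else 1.0

# Calibration: backward messages m[i][a] summing out x_{i+1..n-1}.
m = [[1.0, 1.0] for _ in range(n)]
for i in range(n - 2, -1, -1):
    for a in states:
        m[i][a] = sum(node_pot(i + 1, b) * edge_pot(a, b) * m[i + 1][b]
                      for b in states)

def sample(rng):
    """Exact joint sample: x_0 from its calibrated marginal, then each
    x_i from P(x_i | x_{i-1}) using the calibrated messages."""
    x = []
    for i in range(n):
        w = []
        for a in states:
            p = node_pot(i, a) * m[i][a]
            if i > 0:
                p *= edge_pot(x[-1], a)
            w.append(p)
        x.append(0 if rng.random() * (w[0] + w[1]) < w[0] else 1)
    return x

rng = random.Random(0)
draw = sample(rng)
```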
Frontier extension algorithm:
[Figure: a Markov random field over A ... I and the corresponding junction tree, built incrementally. Starting from A, the frontier is extended one vertex at a time (B, C, D, E, F, ...), adding cliques such as {A,B}, {B,C,D}, {A,B,D}, {A,D,E}, {A,E,F}, {A,G}, {B,G,H}, and {D,I}; an extension that would create a clique exceeding the treewidth bound is rejected.]
Challenge:
Efficiently reject vertices that violate the treewidth constraint
Efficiently extend the junction tree
Choosing the next vertex

Solution, Splash Junction Trees:
Variable elimination with the reverse visit ordering (visit A, B, C, D, E, F, G, H, I; eliminate I, G, F, E, D, C, B, A)
Add the new clique and update the RIP (running intersection property)
If a clique is created that exceeds the treewidth bound, terminate the extension
Adaptively prioritize the boundary
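One way to realize the treewidth check is sketched below. This is a simplified illustration, not the paper’s optimized incremental data structure: to test a candidate vertex, rerun variable elimination over the current Splash plus the candidate in reverse visit order, and reject the candidate if any induced clique exceeds the treewidth bound.

```python
def elimination_width(vertices, edges, visit_order):
    """Width (max clique size - 1) induced by eliminating the given
    vertices in REVERSE visit order."""
    adj = {u: set() for u in vertices}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    width = 0
    for v in reversed([u for u in visit_order if u in adj]):
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))   # induced clique = {v} ∪ nbrs
        for a in nbrs:                  # connect the remaining neighbors
            adj[a].discard(v)
            adj[a] |= (nbrs - {a})
    return width

def grow_splash(root, edges, all_vertices, max_width):
    """Greedy BFS frontier extension with a treewidth check."""
    adj = {u: set() for u in all_vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    visit = [root]
    frontier = list(adj[root])
    while frontier:
        v = frontier.pop(0)
        if v in visit:
            continue
        if elimination_width(set(visit) | {v}, edges, visit + [v]) > max_width:
            continue                    # reject: violates treewidth bound
        visit.append(v)
        frontier.extend(adj[v] - set(visit))
    return visit

# 3x3 grid MRF (illustrative); with max_width=1 the Splash stays a tree.
verts = [(i, j) for i in range(3) for j in range(3)]
edges = [((i, j), (i, j + 1)) for i in range(3) for j in range(2)] + \
        [((i, j), (i + 1, j)) for i in range(2) for j in range(3)]
splash = grow_splash((0, 0), edges, verts, max_width=1)
```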
[Figure: worked example of incremental junction tree construction over six rounds; cliques such as {4} and {5,4} are added round by round, the RIP is fixed after the fourth round, and construction finishes in the sixth round.]
Assign priorities to boundary vertices:
Can be computed using only the factors that depend on Xv
Based on the current sample
Captures the difference between marginalizing out the variable (in the Splash) and fixing its assignment (out of the Splash)
Exponential in treewidth
Could consider other metrics …
Adapt the shape of the Splash to span strongly coupled variables:
Provably converges to the correct distribution
Requires vanishing adaptation
Identified a bug in the seminal Levine & Casella work on adaptive random scan
[Figure: a noisy image, BFS Splashes, and Adaptive Splashes.]
Implemented using GraphLab
Treewidth = 1:
Parallel tree construction, calibration, and sampling
No incremental junction trees needed

Treewidth > 1:
Sequential tree construction (use multiple Splashes)
Parallel calibration and sampling
Requires incremental junction trees

Relies heavily on:
Edge consistency model to prove ergodicity
FIFO/prioritized scheduling to construct Splashes
Evaluated on a 32-core Nehalem server.
Grid MRF with weak attractive potentials: 40K variables, 80K factors.
The Chromatic sampler slightly outperforms the Splash sampler.
[Figure panels: likelihood of the final sample, “mixing”, and speedup.]
Markov logic network with strong dependencies: 10K variables, 28K factors.
The Splash sampler outperforms the Chromatic sampler on models with strong dependencies.
[Figure panels: likelihood of the final sample, “mixing”, and speedup in sample generation.]
Chromatic Gibbs sampler for models with weak dependencies:
Converges to the correct distribution
Quantifiable improvement in mixing

Theoretical analysis of the Synchronous Gibbs sampler on 2-colorable models:
Proved marginal convergence on 2-colorable models

Splash Gibbs sampler for models with strong dependencies:
Adaptive asynchronous tree construction
Experimental evaluation demonstrates an improvement in mixing
Extend the Splash algorithm to models with continuous variables:
Requires continuous junction trees (Kernel BP)

Consider “freezing” the junction tree set:
Reduce the cost of tree generation?

Develop better adaptation heuristics:
Eliminate the need for vanishing adaptation?

Challenges of Gibbs sampling in high-coloring models:
Collapsed LDA

High-dimensional pseudorandom numbers:
Not currently addressed in the MCMC literature