Gibbs Sampling Bayesian Networks: A First Attempt with Cilk++ - PowerPoint PPT Presentation


SLIDE 1

Gibbs Sampling Bayesian Networks: A First Attempt with Cilk++

Alexander Dubbs May 13, 2010

Alexander Dubbs Gibbs Sampling Bayesian Networks: A First Attempt with Cilk++

SLIDE 2

Introduction

I Gibbs sample large Bayesian Networks using C and Cilk++, with MATLAB for pre- and post-processing. I chose a class of Bayes Nets with special structure, so that their mean, covariance, marginals, etc. are analytically knowable, making algorithm performance easy to analyze. I achieved no parallel speedups this time, but made substantial headway towards better implementations in the future.

SLIDE 3

Marginal Distributions

Let p(x1, . . . , xn) be a probability distribution that is not necessarily normalized (its integral over R^n is positive but possibly not 1). Define the i-th marginal of p as follows:

p(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) = \frac{p(x_1, \ldots, x_n)}{\int_{\mathbb{R}} p(x_1, \ldots, x_n)\, dx_i}

Shorthand: p(x_i | ¬x_i). This answers the question, “What is the probability that x_i is some value if we know the rest of the x_j’s?”

SLIDE 4

Gibbs Sampling

The Fundamental Assumption of Gibbs Sampling: holding x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n constant, we have a sampling rule for the one-dimensional distribution p(x_i | ¬x_i). (When that isn’t true, JAGS uses “slice sampling,” a very good adaptive Monte Carlo method for 1-D distributions.) X ∼ p(x_j | ¬x_j) will here mean “X is sampled from the distribution p(x_j | ¬x_j).” The distribution need not be normalized for you to sample from it.

SLIDE 5

Gibbs Sampling

The algorithm uses two for loops:

1. Initialize x_1^0, \ldots, x_n^0 to anything.
2. For i = 1, 2, \ldots:
   For j = 1, \ldots, n, sample
   x_j^i \sim p(x_j \mid x_1^i, \ldots, x_{j-1}^i, x_{j+1}^{i-1}, \ldots, x_n^{i-1}).

The “Gauss-Seidel” property of this algorithm - that some superscripts in the above marginal are i, instead of all (i − 1) - is crucial to its success.

SLIDE 6

Exploiting Independence for Parallelism

Say you want to Gibbs sample p(x1, x2). The algorithm dictates that you generate samples in the following order, given x_1^0 and x_2^0:

x_1^1, x_2^1, x_1^2, x_2^2, x_1^3, x_2^3, x_1^4, x_2^4, \ldots

However, if p(x1, x2) = p(x1)p(x2), you can do a “Jacobi iteration” instead. I place two variables in parentheses if they can be generated only as functions of variables on their left, and one does not need to know the other in order to be generated. The order becomes:

(x_1^1, x_2^1), (x_1^2, x_2^2), (x_1^3, x_2^3), (x_1^4, x_2^4), \ldots

SLIDE 7

Exploiting Independence for Parallelism

In general, take a distribution p(x1, . . . , xn). Define the following partition of {x1, . . . , xn} into nonempty sets C_k, called “colors”:

\{x_1, \ldots, x_n\} = \bigcup_{k=1}^{m} C_k,

where if x_i and x_j are in the same C_k, or “have the same color,” then they are independent. The converse need not hold (and usually doesn’t).

SLIDE 8

A Parallel Algorithm

Modified from R. Bradford, A. Thomas, “Markov chain Monte Carlo methods for family trees using a parallel processor,” Statistics and Computing (1996), 6, 67-75. Chad Scherrer had the idea independently.

1. Partition \{x_1, \ldots, x_n\} = \bigcup_{k=1}^{m} C_k as previously described.
2. Initialize x_1^0, \ldots, x_n^0 to anything.
3. For i = 1, 2, \ldots:
   For k = 1, \ldots, m, sample all x_j’s in C_k in parallel, to make the x_j^i’s. Specifically, for x_j \in C_k, sample
   x_j^i \sim p(x_j \mid x_1^{\alpha_1}, \ldots, x_{j-1}^{\alpha_{j-1}}, x_{j+1}^{\alpha_{j+1}}, \ldots, x_n^{\alpha_n}),
   where \alpha_p = i if p \in C_{k'} for k' < k, and \alpha_p = i − 1 if p \in C_{k'} for k' ≥ k. The “≥” is where the parallelism is.

SLIDE 9

Theoretical Pros and Cons

Successfully implemented in (Bradford, Thomas, 1996). Opportunity for data locality. Performance is highly dependent on the coloring (finding an optimal coloring is NP-complete). Can create cache-line contention: if two independent variables both depend on the same third variable, they may reach for its value at the same time when updating from their own marginals. Cilk++ hyperobjects, which I have not implemented (yet), will probably help.

SLIDE 10

The Test Problem

The following method creates a multivariate normal p(x1, . . . , xn) with sparse dependency structure among the variables:

1. Pick a subset of maybe four of the variables, say x1, x2, x3, x4. Assign them a pos. def. sym. “covariance” matrix.
2. Invert it, and then extend it to an n × n matrix by zero-padding.
3. Do the above two steps a number of times, and then add the inverted “covariance” matrices together. Also add in I_{n×n}. Then invert that quantity to get C, the covariance matrix of a multivariate normal over (x1, . . . , xn). The mean is 0.
4. x_i and x_j will only be dependent if they appear together in a set of four in step 1.

SLIDE 11

The Test Problem

There is an analytic formula for p(xj|¬xj) for normals, so we can plug that into the Gibbs Sampler. It involves inverting some of the minors of the covariance matrix C. I do that in MATLAB and dump the data to text to be read by the Cilk++ program. Part of this process is determining the dependency relations among the xj’s, forming a “dependency graph,” which can be colored (xj’s are nodes, connected by edges if they are dependent). I have MATLAB do a greedy coloring and output the data to text to be read by Cilk++.

SLIDE 12

The Test Problem

We know the mean and covariance of p(x1, . . . , xn), so we can assess the Gibbs Sampler’s convergence by seeing whether its sample (“ergodic”) mean and covariance match the true values. The Cilk++ program outputs text which is read by MATLAB, where postprocessing can be done smoothly. Remark: this is a bad way to sample from a multivariate normal; I only Gibbs sample it to test the Gibbs Sampler. The right way to sample from it is to rotate to a basis where the x_j’s are independent, sample them in parallel there, and then rotate back. The target basis is the eigenbasis of C.

SLIDE 13

Results

No parallel speedup this time. I used the test problem described previously, with ⌊N/3⌋ small 4 × 4 randomly chosen “covariance” matrices.

Figure: N from 10 to 100, five different distributions p(x1, . . . , xN) per N. The Cilk++ implementation took longer. 10,000 samples were done for all N; errors in the ergodic mean grew from roughly 3% to 9%.

SLIDE 14

Comments

I was unable to go to larger N, since my data-transfer system from MATLAB to Cilk++ doesn’t exploit sparsity. This will be fixed. I also tried quite a few embarrassingly parallel implementations that created massive slowdowns. Initially the problem was probably cache lines, but even after recopying all of the relevant data, things still aren’t working. I think that may be because multiple processors are trying to access nearby bytes, but I’m not sure.

SLIDE 15

The Future

Rewrite all code to be object-oriented and better take advantage of sparsity.

Explore parallelism with Cilk++ Hyperobjects, such as Reducers. Use the same data structures to explore Quadrature and Langevin-Equation Sampling approaches, as well as Gibbs Sampling.
