Gibbs Sampling Bayesian Networks: A First Attempt with Cilk++ - PowerPoint PPT Presentation


SLIDE 1

Gibbs Sampling Bayesian Networks: A First Attempt with Cilk++

Alexander Dubbs May 13, 2010

Alexander Dubbs Gibbs Sampling Bayesian Networks: A First Attempt with Cilk++

SLIDE 2

Introduction

I Gibbs sample large Bayesian Networks using C and Cilk++, with MATLAB for pre- and post-processing. I chose a class of Bayes Nets with special structure, so that their mean, covariance, marginals, etc. are analytically knowable, making algorithm performance easy to analyze. I achieved no parallel speedups this time, but made substantial headway towards better implementations in the future.

SLIDE 3

Marginal Distributions

Let p(x1, . . . , xn) be a probability distribution that is not necessarily normalized (its integral over R^n is positive but possibly not 1). Define the i-th marginal of p as follows:

p(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) = \frac{p(x_1, \ldots, x_n)}{\int_{\mathbb{R}} p(x_1, \ldots, x_n)\, dx_i}

Shorthand: p(x_i | ¬x_i). This answers the question, “What is the probability that x_i is some value if we know the rest of the x_j’s?”

SLIDE 4

Gibbs Sampling

The Fundamental Assumption of Gibbs Sampling: holding x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n constant, we have a sampling rule for the one-dimensional distribution p(x_i | ¬x_i). (When that isn’t true, JAGS uses “slice sampling,” a very good adaptive Monte Carlo method for 1-D distributions.) X ∼ p(x_j | ¬x_j) will here mean “X is sampled from the distribution p(x_j | ¬x_j).” The distribution need not be normalized for you to sample from it.

SLIDE 5

Gibbs Sampling

The algorithm uses two for loops:

1. Initialize x_1^0, \ldots, x_n^0 to anything.
2. For i = 1, 2, \ldots:
   For j = 1, \ldots, n, sample
   x_j^i \sim p(x_j \mid x_1^i, \ldots, x_{j-1}^i, x_{j+1}^{i-1}, \ldots, x_n^{i-1}).

The “Gauss-Seidel” property of this algorithm - that some superscripts in the above marginal are i, instead of all (i − 1) - is crucial to its success.

SLIDE 6

Exploiting Independence for Parallelism

Say you want to Gibbs sample p(x1, x2). The algorithm dictates that you generate samples in the following order, given x_1^0 and x_2^0:

x_1^1, x_2^1, x_1^2, x_2^2, x_1^3, x_2^3, x_1^4, x_2^4, \ldots

However, if p(x1, x2) = p(x1)p(x2), you can do a “Jacobi iteration” instead. I place two variables in parentheses if they can be generated only as functions of variables on their left, and one does not need to know the other in order to be generated. The order becomes:

(x_1^1, x_2^1), (x_1^2, x_2^2), (x_1^3, x_2^3), (x_1^4, x_2^4), \ldots

SLIDE 7

Exploiting Independence for Parallelism

In general, take a distribution p(x1, . . . , xn). Define the following partition of {x1, . . . , xn} into nonempty sets C_k, called “colors”:

\{x_1, \ldots, x_n\} = \bigcup_{k=1}^{m} C_k,

where if x_i and x_j are in the same C_k, or “have the same color,” then they are independent. The converse need not hold (and usually doesn’t).

SLIDE 8

A Parallel Algorithm

Modified from R. Bradford, A. Thomas, “Markov chain Monte Carlo methods for family trees using a parallel processor,” Statistics and Computing (1996), 6, 67-75. Chad Scherrer had the idea independently.

1. Partition \{x_1, \ldots, x_n\} = \bigcup_{k=1}^{m} C_k as previously described.
2. Initialize x_1^0, \ldots, x_n^0 to anything.
3. For i = 1, 2, \ldots:
   For k = 1, \ldots, m, sample all x_j’s in C_k in parallel, to make the x_j^i’s. Specifically, for x_j \in C_k, sample
   x_j^i \sim p(x_j \mid x_1^{\alpha_1}, \ldots, x_{j-1}^{\alpha_{j-1}}, x_{j+1}^{\alpha_{j+1}}, \ldots, x_n^{\alpha_n}),
   where \alpha_p = i if p \in C_{k'} for k' < k, and \alpha_p = i − 1 if p \in C_{k'} for k' ≥ k. The “≥” is where the parallelism is.

SLIDE 9

Theoretical Pros and Cons

Successfully implemented in (Bradford, Thomas, 1996). Opportunity for data locality. Performance is highly dependent on the coloring (finding an optimal coloring is NP-complete). Can create cache-line contention: if two independent variables both depend on the same third variable, they may reach for its value at the same time when updating from their own marginals. Cilk++ hyperobjects, which I have not implemented (yet), will probably help.

SLIDE 10

The Test Problem

The following method creates a multivariate normal p(x1, . . . , xn) with sparse dependency structure among the variables:

1. Pick a subset of maybe four of the variables, say x1, x2, x3, x4. Assign them a pos. def. sym. “covariance” matrix.
2. Invert it, and then extend it to an n × n matrix by zero-padding.
3. Do the above two steps a number of times, and then add the inverted “covariance” matrices together. Also add in I_{n×n}. Then invert that quantity to get C, the covariance matrix of a multivariate normal over (x1, . . . , xn). The mean is 0.
4. x_i and x_j will only be dependent if they appear together in a set of four in step 1.

SLIDE 11

The Test Problem

There is an analytic formula for p(xj|¬xj) for normals, so we can plug that into the Gibbs Sampler. It involves inverting some of the minors of the covariance matrix C. I do that in MATLAB and dump the data to text to be read by the Cilk++ program. Part of this process is determining the dependency relations among the xj’s, forming a “dependency graph,” which can be colored (xj’s are nodes, connected by edges if they are dependent). I have MATLAB do a greedy coloring and output the data to text to be read by Cilk++.

SLIDE 12

The Test Problem

We know the mean and covariance of p(x1, . . . , xn), so we can assess the Gibbs Sampler’s convergence by seeing whether its sample (“ergodic”) mean and covariance match the true values. The Cilk++ program outputs text which is read by MATLAB, where postprocessing can be done smoothly. Remark: this is a bad way to sample from a multivariate normal; I only Gibbs sample it to test the Gibbs Sampler. The right way to sample from it is to rotate to a basis where the x_j’s are independent, sample them in parallel there, and then rotate back. The target basis is the eigenbasis of C.

SLIDE 13

Results

No parallel speedup this time. I used the test problem described previously, with ⌊N/3⌋ small 4 × 4 randomly chosen “covariance” matrices.

Figure: N from 10 to 100, five different distributions p(x1, . . . , xN) per N. The Cilk++ implementation took longer. 10,000 samples were done for all N; errors in the ergodic mean grew from roughly 3% to 9%.

SLIDE 14

Comments

I was unable to go to larger N, since my data-transfer system from MATLAB to Cilk++ doesn’t exploit sparsity. This will be fixed. I also tried quite a few embarrassingly parallel implementations that created massive slowdowns. Initially the problem was probably cache lines, but even after recopying all of the relevant data, things still aren’t working. I think that may be because multiple processors are trying to access nearby bytes, but I’m not sure.

SLIDE 15

The Future

Rewrite all code to be object-oriented and better take advantage of sparsity.

Explore parallelism with Cilk++ Hyperobjects, such as Reducers. Use the same data structures to explore Quadrature and Langevin-Equation Sampling approaches, as well as Gibbs Sampling.
