Distributed Markov chain Monte Carlo, Lawrence Murray, CSIRO

SLIDE 1

Distributed Markov chain Monte Carlo

Lawrence Murray CSIRO Mathematics, Informatics and Statistics

SLIDE 2

Lawrence Murray Slide 2 of 7

Motivation

  • Bayesian inference in environmental models.
  • Particle Markov chain Monte Carlo (PMCMC):
      – state-space model,
      – Metropolis-Hastings over $p(\Theta \mid y_{1:T})$,
      – use a particle filter to estimate marginal likelihoods:

        $$p(y_{1:T} \mid \theta) = \int_{-\infty}^{\infty} p(y_{1:T}, x_{1:T} \mid \theta) \, dx_{1:T}$$

  • Particle filters are executed on GPU, but evaluations still take several seconds, and may require several minutes for larger models.
  • Scale up to cluster level, with one Markov chain per CPU-GPU pair.
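
The marginal likelihood estimate that the particle filter supplies to Metropolis-Hastings can be sketched as follows. This is a minimal CPU illustration, not the GPU implementation from the talk: the toy AR(1)-plus-noise model, the function name `bootstrap_log_likelihood`, the parameter names `phi`, `sigma_x`, `sigma_y`, and the data `ys` are all assumptions made for the example.

```python
import math
import random


def bootstrap_log_likelihood(ys, phi, sigma_x, sigma_y, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y_{1:T} | theta).

    Toy model (an assumption for illustration, not from the slides):
        x_0 ~ N(0, sigma_x^2 / (1 - phi^2))
        x_t = phi * x_{t-1} + N(0, sigma_x^2)
        y_t = x_t + N(0, sigma_y^2)
    """
    sd0 = sigma_x / math.sqrt(1.0 - phi * phi)
    xs = [rng.gauss(0.0, sd0) for _ in range(n_particles)]
    log_norm = -math.log(sigma_y * math.sqrt(2.0 * math.pi))
    log_lik = 0.0
    for y in ys:
        # Propagate each particle through the transition model.
        xs = [phi * x + rng.gauss(0.0, sigma_x) for x in xs]
        # Weight by the observation density, in log space for stability.
        log_w = [log_norm - 0.5 * ((y - x) / sigma_y) ** 2 for x in xs]
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        # The mean weight estimates p(y_t | y_{1:t-1}, theta).
        log_lik += top + math.log(sum(w) / n_particles)
        # Multinomial resampling in proportion to the weights.
        xs = rng.choices(xs, weights=w, k=n_particles)
    return log_lik


# Hypothetical observations, for illustration only.
ys = [0.3, -0.1, 0.8, 1.2, 0.5, -0.4, -1.1, -0.6, 0.2, 0.9]
estimate = bootstrap_log_likelihood(ys, 0.8, 0.5, 0.5, 2000, random.Random(1))
print(f"estimated log-likelihood: {estimate:.3f}")
```

Inside PMCMC, an estimate like this stands in for the exact likelihood in the Metropolis-Hastings acceptance ratio, so one full filter run is needed per proposed θ. That per-step cost is what motivates running one chain per CPU-GPU pair.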

SLIDE 3

Quasi-ergodicity and multiple chains

[Figure: three panels of p(X) against x, titled "Starting distribution", "Estimate by single quasi-ergodic chain" and "Estimate by ensemble of quasi-ergodic chains".]

p(X) is the target distribution, consisting of two isolated modes: (left) the starting distribution; (centre) typical posterior returned by a single quasi-ergodic chain; (right) typical posterior returned by multiple quasi-ergodic chains.

SLIDE 4

Convergence and multiple chains

If some portion ρ of steps, 0 < ρ ≤ 1 and typically up to 0.5, must be removed as burn-in from each chain, the maximum clock-time speedup through parallelisation is limited to 1/ρ (Amdahl's law). Thus, to scale well, a multiple-chain strategy must also reduce ρ as the number of chains increases.
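
The 1/ρ limit can be checked with a short calculation. The sketch below assumes that a single chain spends a fraction ρ of its steps on burn-in, that every parallel chain must repeat that same burn-in, and that the remaining steps are split evenly across chains; the function name `burn_in_speedup` is illustrative.

```python
def burn_in_speedup(rho, n_chains):
    """Clock-time speedup from n_chains chains when burn-in is not shared.

    Amdahl's law with serial fraction rho:
        speedup = 1 / (rho + (1 - rho) / n_chains)
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must satisfy 0 < rho <= 1")
    if n_chains < 1:
        raise ValueError("n_chains must be at least 1")
    return 1.0 / (rho + (1.0 - rho) / n_chains)


for c in (1, 2, 4, 8, 16):
    print(c, round(burn_in_speedup(0.5, c), 3))
```

With ρ = 0.5, the speedup stays below 2 however many chains are added, which is why the burn-in fraction itself has to shrink as chains are added.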

SLIDE 5

Method

For each chain $i$, consider a proposal that mixes some local component $l_i(\theta'_i \mid \theta_i)$ with a remote or global component $R_i(\theta'_i)$:

$$q_i(\theta'_i) := (1 - \alpha)\, l_i(\theta'_i \mid \theta_i) + \alpha\, R_i(\theta'_i).$$

$R_i(\cdot)$ can be constructed via some contributed component $r_j(\cdot)$ from each chain $j$. Consider:

$$R_i(\theta'_i) \propto \max_{j=1}^{C} r_j(\theta'_i).$$

Importantly, Ri(·) can be adapted asynchronously as new information is received from other chains. Faults only deprive chains of timely adaptation; they do not impact correctness.
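
A one-dimensional sketch of a Metropolis-Hastings step with this mixture proposal follows. Several choices here are assumptions made for illustration and are not taken from the talk: each contributed component r_j is a Gaussian, the local component is a Gaussian random walk, the normalising constant of max_j r_j is computed on a grid (feasible only in low dimension), R is sampled by rejection from the equal-weight mixture of the r_j, and R is held fixed rather than adapted asynchronously. All function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def remote_unnorm(x, comps):
    """max_j r_j(x), with each r_j a Gaussian (mean, sd) from chain j."""
    return max(normal_pdf(x, m, s) for m, s in comps)


def remote_norm_const(comps, lo=-30.0, hi=30.0, n=6000):
    """Midpoint-rule integral of max_j r_j (assumed practical in 1-D only)."""
    h = (hi - lo) / n
    return h * sum(remote_unnorm(lo + (k + 0.5) * h, comps) for k in range(n))


def sample_remote(comps, rng):
    """Rejection sampler: propose from the equal-weight mixture of the r_j,
    accept with probability max_j r_j(x) / sum_j r_j(x)."""
    while True:
        m, s = rng.choice(comps)
        x = rng.gauss(m, s)
        total = sum(normal_pdf(x, mj, sj) for mj, sj in comps)
        if rng.random() * total <= remote_unnorm(x, comps):
            return x


def proposal_pdf(x_new, x_old, alpha, local_sd, comps, z):
    """q(x_new | x_old) = (1 - alpha) * l(x_new | x_old) + alpha * R(x_new)."""
    local = normal_pdf(x_new, x_old, local_sd)
    return (1.0 - alpha) * local + alpha * remote_unnorm(x_new, comps) / z


def mh_step(x, log_target, alpha, local_sd, comps, z, rng):
    # Draw from the mixture: remote with probability alpha, else local.
    if rng.random() < alpha:
        x_new = sample_remote(comps, rng)
    else:
        x_new = rng.gauss(x, local_sd)
    forward = proposal_pdf(x_new, x, alpha, local_sd, comps, z)
    reverse = proposal_pdf(x, x_new, alpha, local_sd, comps, z)
    log_ratio = (log_target(x_new) - log_target(x)
                 + math.log(reverse) - math.log(forward))
    return x_new if rng.random() < math.exp(min(0.0, log_ratio)) else x


def log_target(x):
    """Toy bimodal target with isolated modes at -4 and +4."""
    return math.log(0.5 * normal_pdf(x, -4.0, 1.0) + 0.5 * normal_pdf(x, 4.0, 1.0))


def run_chain(n_steps, alpha, seed):
    rng = random.Random(seed)
    comps = [(-4.0, 1.5), (4.0, 1.5)]  # as if contributed by two other chains
    z = remote_norm_const(comps)
    x, xs = -4.0, []
    for _ in range(n_steps):
        x = mh_step(x, log_target, alpha, 1.0, comps, z, rng)
        xs.append(x)
    return xs


xs = run_chain(5000, 0.2, seed=0)
print("fraction of steps in right-hand mode:", sum(v > 0 for v in xs) / len(xs))
```

Because the acceptance ratio uses the full mixture density in both directions, the chain targets the same distribution whatever the remote component contains. Stale or missing contributions therefore affect only efficiency, in line with the remark on faults.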

SLIDE 6

Early results

[Figure: four panels plotting $\hat{R}_p$ against step number for four methods: random walk; adaptive; mixture, without sharing; mixture, with sharing.]

Evolution of the $\hat{R}_p$ statistic of Brooks & Gelman (1998) across steps for each method, with (left to right) 2, 4, 8 and 16 chains. Lines indicate the mean across 20 runs, and shaded areas a half standard deviation either side.
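
For orientation, a convergence diagnostic of this kind can be sketched in its simplest univariate form. The code below is not the multivariate $\hat{R}_p$ used in the figure. It is the basic scalar potential scale reduction factor, without degrees-of-freedom corrections, and the function name `gelman_rubin` is illustrative.

```python
import math
import statistics


def gelman_rubin(chains):
    """Basic univariate potential scale reduction factor.

    chains: list of equal-length lists of scalar samples, one per chain.
    Returns sqrt(((n - 1) / n * W + B / n) / W), where W is the mean
    within-chain variance and B / n is the variance of the chain means.
    """
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    means = [statistics.mean(c) for c in chains]
    within = statistics.mean([statistics.variance(c) for c in chains])
    between_over_n = statistics.variance(means)
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


# Chains stuck in different modes give a value far above 1.
print(gelman_rubin([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]))
```

Values near 1 suggest the chains agree with one another; values well above 1 indicate that between-chain variation still dominates.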

SLIDE 7

CSIRO Mathematics, Informatics and Statistics
Lawrence Murray
Phone: +61 8 9333 6480
Email: lawrence.murray@csiro.au
Web: www.cmis.csiro.au

Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: enquiries@csiro.au
Web: www.csiro.au