
Distributed Markov chain Monte Carlo: Lawrence Murray, CSIRO (presentation slides)



  1. Distributed Markov chain Monte Carlo
     Lawrence Murray
     CSIRO Mathematics, Informatics and Statistics

  2. Motivation
     • Bayesian inference in environmental models.
     • Particle Markov chain Monte Carlo (PMCMC):
       – state-space model,
       – Metropolis-Hastings over $p(\Theta \mid y_{1:T})$,
       – use particle filter to estimate marginal likelihoods:
         $$\int_{-\infty}^{\infty} p(y_{1:T}, x_{1:T} \mid \theta) \, dx_{1:T}$$
     • Particle filters executed on GPU, but evaluations still take several seconds, may require several minutes for larger models.
     • Scale up to cluster level, one Markov chain per CPU-GPU pair.
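To make the PMCMC recipe concrete, here is a minimal sketch of particle marginal Metropolis-Hastings in plain Python. Everything specific in it is an assumption for illustration and does not come from the slides: the toy scalar model ($x_t = \theta x_{t-1} + \text{noise}$, $y_t = x_t + \text{noise}$, unit variances), the uniform prior on $(-1, 1)$, the random-walk proposal, and the function names `simulate`, `particle_filter_loglik` and `pmmh`. It runs on the CPU and only shows how a particle filter estimate of the marginal likelihood slots into the acceptance ratio.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def simulate(theta, n_obs, rng):
    """Simulate observations from the assumed toy state-space model."""
    x, ys = rng.gauss(0, 1), []
    for _ in range(n_obs):
        x = theta * x + rng.gauss(0, 1)   # state transition
        ys.append(x + rng.gauss(0, 1))    # noisy observation
    return ys


def particle_filter_loglik(theta, ys, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y_{1:T} | theta)."""
    xs = [rng.gauss(0, 1) for _ in range(n_particles)]
    loglik = 0.0
    for y in ys:
        xs = [theta * x + rng.gauss(0, 1) for x in xs]        # propagate
        logw = [log_normal_pdf(y, x, 1.0) for x in xs]        # weight
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]               # stabilised weights
        loglik += top + math.log(sum(w) / n_particles)        # likelihood increment
        xs = rng.choices(xs, weights=w, k=n_particles)        # multinomial resampling
    return loglik


def pmmh(ys, n_steps, n_particles, rng, step_sd=0.1):
    """Random-walk Metropolis-Hastings over theta using estimated likelihoods."""
    theta = 0.0
    ll = particle_filter_loglik(theta, ys, n_particles, rng)
    chain = []
    for _ in range(n_steps):
        prop = theta + rng.gauss(0, step_sd)
        if abs(prop) < 1.0:  # assumed uniform prior on (-1, 1); reject outside it
            ll_prop = particle_filter_loglik(prop, ys, n_particles, rng)
            # Symmetric proposal and flat prior: ratio of likelihood estimates.
            if rng.random() < math.exp(min(0.0, ll_prop - ll)):
                theta, ll = prop, ll_prop   # keep the estimate with the state
        chain.append(theta)
    return chain
```

Each step of `pmmh` runs a full particle filter, so in this sketch the likelihood evaluation dominates the cost of every step, which is the expense the slide describes.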

  3. Quasi-ergodicity and multiple chains
     [Figure: three panels plotting $p(X)$ against $x$: "Starting distribution", "Estimate by single quasi-ergodic chain", "Estimate by ensemble of quasi-ergodic chains".]
     $p(X)$ is the target distribution, consisting of two isolated modes; (left) the starting distribution; (centre) typical posterior returned by a single quasi-ergodic chain; (right) typical posterior returned by multiple quasi-ergodic chains.

  4. Convergence and multiple chains
     If some portion $\rho$ of steps, $0 < \rho \le 1$ and typically up to $0.5$, must be removed as burn-in from each chain, the maximum clock-time speedup through parallelisation is limited to $1/\rho$ (Amdahl's law). Thus, a multiple-chain strategy must also reduce $\rho$ as the number of chains increases in order to scale well.
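The $1/\rho$ bound can be checked with a small calculation. The sketch below assumes a simple cost model that the slide does not spell out: a single chain runs $N$ steps, of which $\rho N$ are burn-in, while each of $C$ parallel chains repeats the full $\rho N$ burn-in and contributes $(1-\rho)N/C$ retained steps. The function name `burn_in_speedup` is hypothetical.

```python
def burn_in_speedup(rho, n_chains):
    """Clock-time speedup over one chain when every chain repeats the burn-in.

    Assumed cost model: one chain takes N steps; each of n_chains parallel
    chains takes rho*N burn-in steps plus (1 - rho)*N / n_chains retained steps.
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must satisfy 0 < rho <= 1")
    if n_chains < 1:
        raise ValueError("n_chains must be at least 1")
    return 1.0 / (rho + (1.0 - rho) / n_chains)


if __name__ == "__main__":
    # With rho = 0.5 the speedup stays below 1 / rho = 2 however many chains run.
    for c in (1, 2, 4, 8, 16, 1024):
        print(c, round(burn_in_speedup(0.5, c), 3))
```

Under this model the speedup equals 1 for a single chain and approaches, but never exceeds, $1/\rho$ as the number of chains grows, which is why reducing $\rho$ matters more than adding chains.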

  5. Method
     For each chain $i$, consider a proposal that mixes some local component $l_i(\theta_i' \mid \theta_i)$ with a remote or global component $R_i(\theta_i')$:
     $$q_i(\theta_i') := (1 - \alpha)\, l_i(\theta_i' \mid \theta_i) + \alpha R_i(\theta_i').$$
     $R_i(\cdot)$ can be constructed via some contributed component $r_j(\cdot)$ from each chain $j$. Consider:
     $$R_i(\theta_i') \propto \max_{j=1}^{C} r_j(\theta_i').$$
     Importantly, $R_i(\cdot)$ can be adapted asynchronously as new information is received from other chains. Faults only deprive chains of timely adaptation; they do not impact correctness.
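One way to draw from such a mixture proposal is sketched below. It is an illustration under stated assumptions, not the presenter's implementation: each contributed component $r_j$ is taken to be a one-dimensional Gaussian given as a `(mean, sd)` pair, the local component is a Gaussian random walk, $C$ is read as the number of contributing chains, and the names `remote_unnormalised`, `sample_remote` and `propose` are hypothetical. Sampling from a density proportional to $\max_j r_j$ uses rejection from the equal-weight mixture of the $r_j$, which is valid because the maximum never exceeds the sum. The sketch covers sampling only; evaluating $q_i$ inside a Metropolis-Hastings acceptance ratio would also need the normalising constant of $R_i$, which is not computed here.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def remote_unnormalised(x, components):
    """max_j r_j(x), proportional to the remote component R_i(x)."""
    return max(normal_pdf(x, m, s) for m, s in components)


def sample_remote(components, rng):
    """Draw from the density proportional to max_j r_j by rejection sampling."""
    while True:
        m, s = rng.choice(components)        # pick one contributed component
        x = rng.gauss(m, s)                  # draw from the equal-weight mixture
        total = sum(normal_pdf(x, mj, sj) for mj, sj in components)
        # Accept with probability max_j r_j(x) / sum_j r_j(x), at least 1/C.
        if rng.random() * total < remote_unnormalised(x, components):
            return x


def propose(theta, components, alpha, local_sd, rng):
    """Draw from the mixture: remote with probability alpha, else local walk."""
    if rng.random() < alpha:
        return sample_remote(components, rng), "remote"
    return rng.gauss(theta, local_sd), "local"
```

Because `components` is just a list, a chain could swap in a newer list whenever updates arrive from other chains and keep proposing from the stale one otherwise, which matches the slide's point that missing updates delay adaptation without stopping the chain.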

  6. Early results
     [Figure: four panels plotting $\hat{R}^p$ against steps (0 to 25000) for four methods: Random walk; Adaptive; Mixture, without sharing; Mixture, with sharing.]
     Evolution of the $\hat{R}^p$ statistic of Brooks & Gelman (1998) across steps for each method, with (left to right) 2, 4, 8 and 16 chains. Lines indicate mean across 20 runs, and shaded areas a half standard deviation either side.
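For readers unfamiliar with this kind of convergence diagnostic, the sketch below computes a potential scale reduction factor for a single scalar parameter. This is a simplification: the statistic in the figure is the multivariate $\hat{R}^p$, and the scalar form used here, $\frac{n-1}{n} + \frac{m+1}{m}\,\frac{B/n}{W}$ for $m$ chains of length $n$, is an assumed special case chosen so the example needs only the standard library. The function name `psrf_scalar` is hypothetical.

```python
import random
from statistics import mean, variance


def psrf_scalar(chains):
    """Scalar potential scale reduction factor for equal-length chains.

    chains: list of m >= 2 lists, each holding n >= 2 draws of one parameter.
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    within = mean([variance(c) for c in chains])          # W: mean within-chain variance
    between_over_n = variance([mean(c) for c in chains])  # B/n: variance of chain means
    return (n - 1) / n + (m + 1) / m * between_over_n / within


if __name__ == "__main__":
    rng = random.Random(0)
    mixed = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(4)]
    stuck = [[rng.gauss(mu, 1) for _ in range(500)] for mu in (0, 0, 10, 10)]
    print(psrf_scalar(mixed), psrf_scalar(stuck))
```

Values near 1 suggest the chains agree with one another; chains stuck in separate modes, as in the quasi-ergodic case on slide 3, give values well above 1.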

  7. CSIRO Mathematics, Informatics and Statistics
     Lawrence Murray
     Phone: +61 8 9333 6480
     Email: lawrence.murray@csiro.au
     Web: www.cmis.csiro.au
     Contact Us
     Phone: 1300 363 400 or +61 3 9545 2176
     Email: enquiries@csiro.au
     Web: www.csiro.au
