slide-1
SLIDE 1

Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets

Rob Cornish, Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, Arnaud Doucet

June 8, 2019

Cornish et al. Scalable Metropolis–Hastings June 8, 2019 1 / 24

slide-2
SLIDE 2

Problem

Bayesian inference via MCMC is expensive for large datasets


slide-5
SLIDE 5

Problem

Consider a posterior over parameters θ given n data points y_i:

π(θ) = p(θ | y_{1:n}) ∝ p(θ) ∏_{i=1}^n p(y_i | θ).

Metropolis–Hastings

Given a proposal q and current state θ:

1. Propose θ′ ∼ q(θ, ·)
2. Accept θ′ with probability

α_MH(θ, θ′) := 1 ∧ [q(θ′, θ) π(θ′)] / [q(θ, θ′) π(θ)]
             = 1 ∧ [q(θ′, θ) p(θ′)] / [q(θ, θ′) p(θ)] · ∏_{i=1}^n p(y_i | θ′) / p(y_i | θ)

⇒ O(n) computation per step to compute α_MH(θ, θ′)
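As a reference point, the O(n) step above can be sketched in a few lines. This is a minimal illustration with our own toy names, not code from the paper: evaluating the acceptance ratio touches every one of the n data points.

```python
import numpy as np

def mh_step(theta, log_prior, log_lik, y, propose, rng):
    """One Metropolis-Hastings step with a symmetric proposal.

    Illustrative sketch (our own names): the acceptance ratio sums over
    all n data points, which is the O(n) per-step bottleneck the talk
    sets out to remove.
    """
    theta_prop = propose(theta, rng)
    # log pi(theta') - log pi(theta) = prior term + sum over all n likelihoods
    log_ratio = log_prior(theta_prop) - log_prior(theta)
    log_ratio += sum(log_lik(yi, theta_prop) - log_lik(yi, theta) for yi in y)
    if np.log(rng.uniform()) < min(0.0, log_ratio):
        return theta_prop  # accept
    return theta           # reject
```

For n data points this costs n likelihood evaluations per proposal, which motivates the subsampling schemes that follow.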

slide-7
SLIDE 7

Our approach

Want a method with cost o(n) per step – subsampling
Want our method not to reduce accuracy – exactness

slide-9
SLIDE 9

Our approach

Several existing exact subsampling methods:

Firefly [Maclaurin and Adams, 2014]
Delayed acceptance [Banterle et al., 2015]
Piecewise-deterministic MCMC [Bouchard-Côté et al., 2018, Bierkens et al., 2018]

Our method: an exact subsampling scheme based on a proxy target that requires on average O(1) or O(1/√n) likelihood evaluations per step

[Plot: likelihood evaluations per iteration vs. n, for MH, SMH-1, and SMH-2.]

Figure 1: Average number of likelihood evaluations per iteration required by SMH for a 10-dimensional logistic regression posterior as the number of data points n increases.

slide-10
SLIDE 10

Three key ingredients

1. A factorised MH acceptance probability
2. Procedures for fast simulation of Bernoulli random variables
3. Control performance using an approximate target ("control variates")

slide-14
SLIDE 14

Ingredient 1 - Factorised Metropolis–Hastings

Suppose we can factor the target like

π(θ) ∝ ∏_{i=1}^n π_i(θ)

Obvious choice (with a flat prior) is π_i(θ) = p(y_i | θ)

Can show that (for a symmetric proposal)

α_FMH(θ, θ′) := ∏_{i=1}^n α_FMH,i(θ, θ′) := ∏_{i=1}^n 1 ∧ π_i(θ′) / π_i(θ)

is also a valid acceptance probability for an MH-style algorithm

Compare with the MH acceptance probability

α_MH(θ, θ′) = 1 ∧ ∏_{i=1}^n π_i(θ′) / π_i(θ)

slide-18
SLIDE 18

Ingredient 1 - Factorised Metropolis–Hastings

Explicitly (assuming symmetric q), the FMH algorithm is:

Factorised Metropolis-Hastings (FMH)

1. Propose θ′ ∼ q(θ, ·)
2. Accept θ′ with probability

α_FMH(θ, θ′) := ∏_{i=1}^n α_FMH,i(θ, θ′) := ∏_{i=1}^n 1 ∧ π_i(θ′) / π_i(θ)

Can implement the acceptance step by sampling independent B_i ∼ Bernoulli(α_FMH,i(θ, θ′)) and accepting if every B_i = 1

Can stop as soon as some B_i = 0: delayed acceptance

However, still must compute all n terms in order to accept

slide-19
SLIDE 19

Three key ingredients

1. A factorised MH acceptance probability
2. Procedures for fast simulation of Bernoulli random variables
3. Control performance using an approximate target ("control variates")

slide-23
SLIDE 23

Ingredient 2 - Fast Bernoulli simulation

How can we avoid simulating these n Bernoullis? Assuming we have bounds λ̄_i(θ, θ′) ≥ − log α_FMH,i(θ, θ′) =: λ_i(θ, θ′), we can use the following:

Poisson subsampling

1. C ∼ Poisson(∑_{i=1}^n λ̄_i(θ, θ′))
2. X_1, …, X_C iid ∼ Categorical([λ̄_i(θ, θ′) / ∑_{j=1}^n λ̄_j(θ, θ′)]_{1≤i≤n})
3. B_j ∼ Bernoulli(λ_{X_j}(θ, θ′) / λ̄_{X_j}(θ, θ′)) for 1 ≤ j ≤ C

⇒ P(B_1 = ⋯ = B_C = 0) = α_FMH(θ, θ′), so we can use this procedure to perform the FMH accept/reject step

Intuition: sample a discrete Poisson point process on {1, …, n} with intensity i ↦ λ_i(θ, θ′) by thinning one with intensity i ↦ λ̄_i(θ, θ′)
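A direct transcription of the three steps above can be sketched as follows (our own names; `np.random.Generator.choice` stands in for the alias sampler discussed on a later slide):

```python
import numpy as np

def poisson_subsample_accept(lam, lam_bar, theta, theta_prop, rng):
    """FMH accept/reject via Poisson subsampling (illustrative sketch).

    lam(i, theta, theta') = lambda_i = -log alpha_FMH,i  (per-factor intensity),
    lam_bar = array of upper bounds lambda_bar_i >= lambda_i.
    Accepts with probability prod_i alpha_FMH,i = exp(-sum_i lambda_i).
    """
    total = lam_bar.sum()                 # precomputable once per bound family
    C = rng.poisson(total)                # 1. number of candidate factors
    X = rng.choice(len(lam_bar), size=C, p=lam_bar / total)  # 2. which factors
    for i in X:                           # 3. thinning step
        if rng.uniform() < lam(i, theta, theta_prop) / lam_bar[i]:
            return False                  # some B_j = 1: reject
    return True                           # all B_j = 0: accept
```

Note `rng.choice` with explicit weights is O(n) per call; Walker's alias method (next slides) is what makes step 2 O(C) after a one-off Θ(n) setup.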

slide-32
SLIDE 32

Ingredient 2 - Fast Bernoulli simulation

Poisson subsampling

1. C ∼ Poisson(∑_{i=1}^n λ̄_i(θ, θ′))  ⇒ O(1) (after precomputing ∑_{i=1}^n ψ_i)
2. X_1, …, X_C iid ∼ Categorical([λ̄_i(θ, θ′) / ∑_{j=1}^n λ̄_j(θ, θ′)]_{1≤i≤n})  ⇒ O(C) (via Walker's alias method [Walker, 1977], after Θ(n) setup cost)
3. B_j ∼ Bernoulli(λ_{X_j}(θ, θ′) / λ̄_{X_j}(θ, θ′)) for 1 ≤ j ≤ C  ⇒ O(C)

⇒ Overall cost of O(C)

When is this efficient? Suppose our bounds have the form:

λ̄_i(θ, θ′) = ϕ(θ, θ′) ψ_i ≥ − log α_FMH,i(θ, θ′) = λ_i(θ, θ′).  (*)

Then:

∑_{i=1}^n λ̄_i(θ, θ′) = ϕ(θ, θ′) ∑_{i=1}^n ψ_i  and  λ̄_i(θ, θ′) / ∑_{j=1}^n λ̄_j(θ, θ′) = ψ_i / ∑_{j=1}^n ψ_j.

(*) holds for instance if log π_i is Lipschitz (but we will see a better case later).
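Step 2 relies on Walker's alias method for O(1) categorical draws after a Θ(n) setup; since under (*) the weights ψ_i / ∑_j ψ_j do not depend on (θ, θ′), the tables are built once for the whole chain. A minimal self-contained implementation (ours, for illustration):

```python
import numpy as np

def build_alias(weights):
    """Walker's alias tables: Theta(n) setup, then O(1) per draw.

    Sketch of the structure the slide refers to; in SMH the weights
    would be psi_i / sum_j psi_j, fixed for the whole run.
    """
    n = len(weights)
    prob = np.asarray(weights, dtype=float) * n / np.sum(weights)
    alias = np.zeros(n, dtype=int)
    small = [i for i in range(n) if prob[i] < 1.0]
    large = [i for i in range(n) if prob[i] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                      # overflow of column s goes to l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:
        prob[i] = 1.0                     # guard against floating-point residue
    return prob, alias

def alias_draw(prob, alias, rng):
    i = rng.integers(len(prob))           # pick a column uniformly
    return i if rng.uniform() < prob[i] else alias[i]  # keep it or take its alias
```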

slide-37
SLIDE 37

Potential problems

Two problems now to overcome:

1. Since C ∼ Poisson(∑_{i=1}^n λ̄_i(θ, θ′)), potentially C > n
   ⇒ Must ensure C = o(n) if we are to achieve anything
2. Since each α_FMH,i(θ, θ′) ≤ 1, can have α_FMH(θ, θ′) → 0 as n → ∞
   ⇒ Must ensure α_FMH(θ, θ′) is well behaved

These problems are related since

E[C | θ, θ′] = ∑_{i=1}^n λ̄_i(θ, θ′)  and  α_FMH(θ, θ′) ≥ exp(−∑_{i=1}^n λ̄_i(θ, θ′)).

The key question is how to choose bounds for which ∑_{i=1}^n λ̄_i(θ, θ′) is small.

slide-38
SLIDE 38

Three key ingredients

1. A factorised MH acceptance probability
2. Procedures for fast simulation of Bernoulli random variables
3. Control performance using an approximate target ("control variates")

slide-40
SLIDE 40

Ingredient 3 - control variates

Write the target as

π(θ) = ∏_{i=1}^n π_i(θ) = ∏_{i=1}^n exp(−U_i(θ))

for potentials U_i = − log π_i

Approximate Û_{k,i}(θ) ≈ U_i(θ), where Û_{k,i} is a k-th order Taylor expansion of U_i around some fixed θ̂ (not depending on i)

slide-44
SLIDE 44

Ingredient 3 - control variates

Also let

Û_k(θ) := ∑_{i=1}^n Û_{k,i}(θ) ≈ U(θ) := ∑_{i=1}^n U_i(θ) = − log π(θ)

which is itself a Taylor expansion of U(θ) around θ̂

Explicitly,

Û_1(θ) = U(θ̂) + ∇U(θ̂)ᵀ(θ − θ̂)
Û_2(θ) = U(θ̂) + ∇U(θ̂)ᵀ(θ − θ̂) + ½ (θ − θ̂)ᵀ ∇²U(θ̂) (θ − θ̂)

In particular, exp(−Û_2(θ)) ≈ π(θ) is a Gaussian approximation to the target at θ̂
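Building the k = 2 surrogate Û_2 above is mechanical once U, its gradient, and its Hessian at θ̂ are available. A sketch with our own helper names (the callables are assumed supplied by the model):

```python
import numpy as np

def make_U2_hat(U, grad_U, hess_U, theta_hat):
    """Second-order Taylor surrogate of a potential U around theta_hat.

    exp(-U2_hat) is then a Gaussian approximation of exp(-U) near theta_hat.
    Illustrative sketch; U, grad_U, hess_U are model-supplied callables.
    """
    U0 = U(theta_hat)            # value, gradient, Hessian: computed once
    g0 = grad_U(theta_hat)
    H0 = hess_U(theta_hat)

    def U2_hat(theta):
        d = theta - theta_hat
        return U0 + g0 @ d + 0.5 * d @ H0 @ d

    return U2_hat
```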

slide-48
SLIDE 48

Ingredient 3 - control variates

Define the Scalable Metropolis-Hastings (SMH) acceptance probability

α_SMH-k(θ, θ′) := [1 ∧ exp(−Û_k(θ′)) / exp(−Û_k(θ))] ∏_{i=1}^n [1 ∧ exp(Û_{k,i}(θ′) − U_i(θ′)) / exp(Û_{k,i}(θ) − U_i(θ))].

Corresponds to FMH using the factorisation π = exp(−Û_k) · ∏_{i=1}^n exp(Û_{k,i} − U_i), with exp(−Û_k) playing the role of an extra factor π_{n+1} and each exp(Û_{k,i} − U_i) the role of π_i

The first factor can be simulated directly in O(1) time
The remaining factors can be simulated with Poisson subsampling
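The two-stage structure can be sketched as follows (our own names; `residual_accept` stands in for the Poisson-subsampling routine of Ingredient 2 applied to the residual factors):

```python
import numpy as np

def smh_accept(Uk_hat, residual_accept, theta, theta_prop, rng):
    """Two-stage SMH accept/reject (illustrative sketch).

    Stage 1: Bernoulli(1 ∧ exp(-Uk_hat(theta')) / exp(-Uk_hat(theta)))
             -- O(1), uses only the Taylor surrogate.
    Stage 2: the product of residual factors 1 ∧ exp(Uk_hat_i - U_i)(...),
             delegated to a subsampling routine (residual_accept).
    """
    log_a1 = min(0.0, Uk_hat(theta) - Uk_hat(theta_prop))
    if np.log(rng.uniform()) >= log_a1:
        return False                      # cheap first factor rejects
    return residual_accept(theta, theta_prop, rng)
```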

slide-51
SLIDE 51

Ingredient 3 - control variates

Recall we need upper bounds − log α_FMH,i(θ, θ′) ≤ ϕ(θ, θ′) ψ_i =: λ̄_i(θ, θ′)

Possible to show that, if we can find constants

Ū_{k+1,i} ≥ sup_{θ∈Θ, |β|=k+1} |∂^β U_i(θ)|  (*)

then we can use

λ̄_i(θ, θ′) := (‖θ − θ̂‖₁^{k+1} + ‖θ′ − θ̂‖₁^{k+1}) · Ū_{k+1,i} / (k + 1)!

with ϕ(θ, θ′) = ‖θ − θ̂‖₁^{k+1} + ‖θ′ − θ̂‖₁^{k+1} and ψ_i = Ū_{k+1,i} / (k + 1)!

(*) constitutes the only quantity that must be specified by hand to use our method on a given model

slide-58
SLIDE 58

Ingredient 3 - control variates

Heuristically, suppose

θ ∼ π (chain is at stationarity)
‖θ − θ_MAP‖ = O(1/√n) (1/√n concentration - key assumption)
‖θ′ − θ‖ = O(1/√n) (proposal is scaled like 1/√n)
‖θ̂ − θ_MAP‖ = O(1/√n) (θ̂ is not too far from the mode)

then by the triangle inequality

∑_{i=1}^n λ̄_i(θ, θ′) = (‖θ − θ̂‖₁^{k+1} + ‖θ′ − θ̂‖₁^{k+1}) · ∑_{i=1}^n Ū_{k+1,i} / (k + 1)! = O(n^{−(k+1)/2}) · O(n) = O(n^{(1−k)/2})

In particular, ∑_{i=1}^n λ̄_i(θ, θ′) is O(1) if k = 1 and O(1/√n) if k = 2

slide-61
SLIDE 61

Summary

This directly yields an average cost per step

E[C | θ, θ′] = ∑_{i=1}^n λ̄_i(θ, θ′) = O(1) if k = 1, O(1/√n) if k = 2.

Likewise, the acceptance probability is stable, since in

α_SMH-k(θ, θ′) := [1 ∧ exp(−Û_k(θ′)) / exp(−Û_k(θ))] ∏_{i=1}^n [1 ∧ exp(Û_{k,i}(θ′) − U_i(θ′)) / exp(Û_{k,i}(θ) − U_i(θ))]

the first factor is ≥ exp(−O(1)) (can show) and the product over i is ≥ exp(−∑_{i=1}^n λ̄_i(θ, θ′)).

Can do even better with an exp(−Û_k)-reversible proposal (the first term vanishes).

slide-63
SLIDE 63

Application - logistic regression

We consider logistic regression with covariates x_i ∈ R^d and responses y_i ∈ {0, 1}:

p(y_i | θ, x_i) = Bernoulli(y_i | 1 / (1 + exp(−θᵀx_i)))
⇒ U_i(θ) = − log p(y_i | θ, x_i) = log(1 + exp(θᵀx_i)) − y_i θᵀx_i

Admits upper bounds

Ū_{2,i} = ¼ max_{1≤j≤d} |x_ij|²,   Ū_{3,i} = (1 / (6√3)) max_{1≤j≤d} |x_ij|³
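The potential and the bound constants on this slide translate directly to code (illustrative sketch; array names are ours):

```python
import numpy as np

def logistic_potential(theta, x, y):
    """U_i(theta) = log(1 + exp(theta^T x_i)) - y_i * theta^T x_i,
    computed with logaddexp for numerical stability."""
    t = x @ theta
    return np.logaddexp(0.0, t) - y * t

def smh_bound_constants(X):
    """Per-datum derivative bounds from the slide:
    U2bar_i = max_j x_ij^2 / 4,  U3bar_i = max_j |x_ij|^3 / (6 sqrt(3))."""
    m = np.abs(X).max(axis=1)          # max_j |x_ij| for each data point
    return m ** 2 / 4.0, m ** 3 / (6.0 * np.sqrt(3.0))
```

The Ū_{k+1,i} bounds are data-dependent constants, so they (and the ψ_i they induce) are computed once before the chain runs.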

slide-64
SLIDE 64

Application - logistic regression

Empirical result for d = 10

[Plot: likelihood evaluations per iteration vs. n, for MH, SMH-1, and SMH-2.]

Figure 2: Average number of likelihood evaluations per iteration required by SMH for a 10-dimensional logistic regression posterior as the number of data points n increases.

slide-65
SLIDE 65

Application - logistic regression

Empirical result for d = 10

[Plot: effective sample size per second vs. n, for MH, SMH-1, SMH-2, FlyMC, and Zig-Zag.]

Figure 3: Effective sample size per second of computation for the posterior mean of the first regression coefficient (higher is better)

slide-66
SLIDE 66

Thanks for listening

Please feel free to ask any questions now, or find us later at poster #202.
