Scalable Metropolis–Hastings for Exact Bayesian Inference with Large Datasets
Rob Cornish, Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, Arnaud Doucet
June 8, 2019


  1. Scalable Metropolis–Hastings for Exact Bayesian Inference with Large Datasets. Rob Cornish, Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, Arnaud Doucet. June 8, 2019. (1 / 24)

  2. Problem: Bayesian inference via MCMC is expensive for large datasets.

  3–5. Problem: Consider a posterior over parameters $\theta$ given $n$ data points $y_i$:
       $$\pi(\theta) = p(\theta \mid y_{1:n}) \propto p(\theta) \prod_{i=1}^{n} p(y_i \mid \theta).$$
       Metropolis–Hastings: given a proposal $q$ and current state $\theta$:
       1. Propose $\theta' \sim q(\theta, \cdot)$
       2. Accept $\theta'$ with probability
          $$\alpha_{\mathrm{MH}}(\theta, \theta') := 1 \wedge \frac{q(\theta', \theta)\,\pi(\theta')}{q(\theta, \theta')\,\pi(\theta)} = 1 \wedge \frac{q(\theta', \theta)\,p(\theta') \prod_{i=1}^{n} p(y_i \mid \theta')}{q(\theta, \theta')\,p(\theta) \prod_{i=1}^{n} p(y_i \mid \theta)}$$
       $\Rightarrow$ $O(n)$ computation per step to compute $\alpha_{\mathrm{MH}}(\theta, \theta')$.
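The $O(n)$ cost of the MH step above can be made concrete with a minimal numpy sketch. This is not the talk's code: the model (a unit-variance Gaussian likelihood with a flat prior) and all names are illustrative.

```python
import numpy as np

def log_post(theta, y):
    # Full log posterior under a flat prior and an illustrative
    # Gaussian likelihood: sum_i log N(y_i | theta, 1), up to a constant.
    return np.sum(-0.5 * (y - theta) ** 2)

def mh_step(theta, y, rng, step=0.5):
    # One Metropolis-Hastings step with a symmetric random-walk proposal,
    # so the q-ratio cancels and alpha = 1 ^ [pi(theta')/pi(theta)].
    theta_prop = theta + step * rng.standard_normal()
    log_alpha = log_post(theta_prop, y) - log_post(theta, y)
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return theta_prop  # accept
    return theta           # reject

# Every step evaluates all n likelihood terms: O(n) per iteration.
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, size=10_000)
theta = 0.0
for _ in range(100):
    theta = mh_step(theta, y, rng)
```

In practice one caches the current state's log posterior between iterations, but each new proposal still costs a full $O(n)$ sweep over the data.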

  6–7. Our approach: We want a method with cost $o(n)$ per step (subsampling), and one that does not reduce accuracy (exactness).

  8–9. Our approach: Several existing exact subsampling methods:
       - Firefly [Maclaurin and Adams, 2014]
       - Delayed acceptance [Banterle et al., 2015]
       - Piecewise-deterministic MCMC [Bouchard-Côté et al., 2018; Bierkens et al., 2018]
       Our method: an exact subsampling scheme based on a proxy target that requires on average $O(1)$ or $O(1/\sqrt{n})$ likelihood evaluations per step.
       [Figure 1: Average number of likelihood evaluations per iteration required by SMH for a 10-dimensional logistic regression posterior as the number of data points $n$ increases; shown for MH, SMH-1, and SMH-2 with $n$ ranging from 64 to 131072.]

  10. Three key ingredients:
      1. A factorised MH acceptance probability
      2. Procedures for fast simulation of Bernoulli random variables
      3. Control performance using an approximate target ("control variates")

  11–14. Ingredient 1 – Factorised Metropolis–Hastings: Suppose we can factorise the target as
         $$\pi(\theta) \propto \prod_{i=1}^{n} \pi_i(\theta).$$
         The obvious choice (with a flat prior) is $\pi_i(\theta) = p(y_i \mid \theta)$.
         One can show that (for a symmetric proposal)
         $$\alpha_{\mathrm{FMH}}(\theta, \theta') := \prod_{i=1}^{n} \alpha_{\mathrm{FMH},i}(\theta, \theta') := \prod_{i=1}^{n} \left[ 1 \wedge \frac{\pi_i(\theta')}{\pi_i(\theta)} \right]$$
         is also a valid acceptance probability for an MH-style algorithm. Compare the MH acceptance probability:
         $$\alpha_{\mathrm{MH}}(\theta, \theta') = 1 \wedge \prod_{i=1}^{n} \frac{\pi_i(\theta')}{\pi_i(\theta)}.$$
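The comparison above has a simple consequence worth spelling out (a short derivation, not on the slide): writing $r_i := \pi_i(\theta')/\pi_i(\theta) > 0$, the factorised acceptance probability never exceeds the standard one, since

```latex
\prod_{i=1}^{n} (1 \wedge r_i) \le 1
\quad\text{and}\quad
\prod_{i=1}^{n} (1 \wedge r_i) \le \prod_{i=1}^{n} r_i
\quad\Longrightarrow\quad
\alpha_{\mathrm{FMH}}(\theta, \theta')
  = \prod_{i=1}^{n} (1 \wedge r_i)
  \;\le\; 1 \wedge \prod_{i=1}^{n} r_i
  = \alpha_{\mathrm{MH}}(\theta, \theta').
```

So FMH rejects at least as often as MH; its appeal is that, combined with the remaining ingredients, the acceptance step can be carried out without touching all $n$ factors.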

  15–18. Ingredient 1 – Factorised Metropolis–Hastings: Explicitly (assuming symmetric $q$), the FMH algorithm is:
         Factorised Metropolis–Hastings (FMH):
         1. Propose $\theta' \sim q(\theta, \cdot)$
         2. Accept $\theta'$ with probability
            $$\alpha_{\mathrm{FMH}}(\theta, \theta') := \prod_{i=1}^{n} \alpha_{\mathrm{FMH},i}(\theta, \theta') := \prod_{i=1}^{n} \left[ 1 \wedge \frac{\pi_i(\theta')}{\pi_i(\theta)} \right]$$
         - The acceptance step can be implemented by sampling independent $B_i \sim \mathrm{Bernoulli}(\alpha_{\mathrm{FMH},i}(\theta, \theta'))$ and accepting iff every $B_i = 1$.
         - We can stop as soon as some $B_i = 0$: delayed acceptance.
         - However, we must still compute all $n$ terms in order to accept.
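A hypothetical numpy sketch of the per-factor Bernoulli implementation above, using illustrative Gaussian factors $\pi_i(\theta) = N(y_i \mid \theta, 1)$ (not the talk's code). The early return is the delayed-acceptance shortcut; note that an acceptance still evaluates all $n$ factors.

```python
import numpy as np

def fmh_accept(theta, theta_prop, y, rng):
    # Accept iff every independent B_i ~ Bernoulli(1 ^ pi_i(theta')/pi_i(theta))
    # equals 1, for illustrative Gaussian factors pi_i(theta) = N(y_i | theta, 1).
    for yi in y:
        log_ratio = 0.5 * (yi - theta) ** 2 - 0.5 * (yi - theta_prop) ** 2
        alpha_i = min(1.0, np.exp(log_ratio))
        if rng.uniform() >= alpha_i:  # B_i = 0
            return False              # delayed acceptance: stop at first failure
    return True                       # all B_i = 1, but all n factors were touched

rng = np.random.default_rng(1)
y = rng.normal(loc=0.0, size=1_000)
accepted = fmh_accept(0.0, 0.05, y, rng)
```

Rejections are cheap on average, which is exactly why the next ingredient targets the cost of acceptances.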

  19. Three key ingredients (recap):
      1. A factorised MH acceptance probability
      2. Procedures for fast simulation of Bernoulli random variables
      3. Control performance using an approximate target ("control variates")

  20–22. Ingredient 2 – Fast Bernoulli simulation: How can we avoid simulating these $n$ Bernoullis?
         Assuming we have bounds $\bar{\lambda}_i(\theta, \theta') \geq -\log \alpha_{\mathrm{FMH},i}(\theta, \theta') =: \lambda_i(\theta, \theta')$, we can use the following:
         Poisson subsampling:
         1. $C \sim \mathrm{Poisson}\left( \sum_{i=1}^{n} \bar{\lambda}_i(\theta, \theta') \right)$
         2. $X_1, \ldots, X_C \overset{\text{iid}}{\sim} \mathrm{Categorical}\left( \left[ \bar{\lambda}_i(\theta, \theta') \big/ \sum_{j=1}^{n} \bar{\lambda}_j(\theta, \theta') \right]_{1 \leq i \leq n} \right)$
         3. $B_j \sim \mathrm{Bernoulli}\left( \lambda_{X_j}(\theta, \theta') / \bar{\lambda}_{X_j}(\theta, \theta') \right)$ for $1 \leq j \leq C$
         $\Rightarrow \mathbb{P}(B_1 = \cdots = B_C = 0) = \alpha_{\mathrm{FMH}}(\theta, \theta')$, so this procedure can be used to perform the FMH accept/reject step.
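A sketch of the three-step procedure above, again with illustrative Gaussian factors $\pi_i(\theta) = N(y_i \mid \theta, 1)$; the bound $\bar{\lambda}_i$ used here is a simple hand-derived one for this model, not one from the talk, and all names are hypothetical.

```python
import numpy as np

def poisson_subsample_accept(theta, theta_prop, y, rng):
    # FMH accept/reject via Poisson subsampling, for illustrative Gaussian
    # factors pi_i(theta) = N(y_i | theta, 1). For these factors
    #   lambda_i = max(0, (theta - theta') * (y_i - (theta + theta')/2)),
    # and one simple valid upper bound is
    #   lbar_i = |theta' - theta| * (|y_i| + max(|theta|, |theta'|)) >= lambda_i.
    lbar = np.abs(theta_prop - theta) * (np.abs(y) + max(abs(theta), abs(theta_prop)))
    total = lbar.sum()

    C = rng.poisson(total)                              # 1. how many factors to inspect
    if C == 0:
        return True                                     # no Bernoulli can fire: accept
    idx = rng.choice(len(y), size=C, p=lbar / total)    # 2. which factors
    lam = np.maximum(0.0, (theta - theta_prop) * (y[idx] - 0.5 * (theta + theta_prop)))
    B = rng.uniform(size=C) < lam / lbar[idx]           # 3. thinned Bernoullis
    return not B.any()                                  # accept iff B_1 = ... = B_C = 0

rng = np.random.default_rng(2)
y = rng.normal(size=1_000)
# Only C (on average, the sum of the bounds) factors are evaluated, not all n.
accepted = poisson_subsample_accept(0.0, 0.01, y, rng)
```

With this crude bound the Poisson rate still grows with $n$ for a fixed proposal; ingredient 3 (the control-variate proxy target) is what shrinks the bounds so that the expected number of evaluations becomes $O(1)$ or smaller.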
