SLIDE 1 Tuning of pseudo-marginal MCMC
Alex Thiéry (National University of Singapore)
Joint work with C. Sherlock (Lancaster), G. Roberts (Warwick), J. Rosenthal (Toronto).
SLIDE 2
Pseudo-marginal MCMC
Target density π(x); positive unbiased estimator π̂(x) with E[π̂(x)] = π(x).
Theorem (Beaumont 2003; Andrieu–Roberts 2009): the usual MCMC algorithm run with π̂(x) in place of π(x) is still exact!
No free lunch: poor estimator π̂, poor mixing.
SLIDE 3 Wrong pseudo-marginal MCMC
1. Start from x0 ∈ X.
2. Proposal: y ∼ q(xk, ·).
3. Noisy Metropolis–Hastings: α = 1 ∧ [π̂(y) q(y, xk)] / [π̂(xk) q(xk, y)], with both estimates π̂(y) and π̂(xk) drawn afresh at each iteration.
4. y → xk+1 with probability α.
Not correct. Not too bad, after correction.
Notice that E[π̂(y)/π̂(xk)] ≠ π(y)/π(xk) in general: the ratio of two fresh estimates is a biased estimate of the true acceptance ratio.
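To see this bias concretely, here is a minimal numerical sketch of my own (using the log-normal noise model that appears later, on SLIDE 9): each estimate π̂(x) = π(x) exp(W) is unbiased, yet the expected ratio of two independent fresh estimates equals exp(σ²), not 1.

```python
# Illustration (not from the slides): multiplicative noise e^W with
# W = sigma*Z - sigma^2/2 gives unbiased estimates, E[e^W] = 1, but the
# ratio of two independent fresh estimates is biased:
# E[e^{W_y} / e^{W_x}] = E[e^{W_y}] * E[e^{-W_x}] = e^{sigma^2}.
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 1.0, 10**6
w_x = sigma * rng.standard_normal(n) - sigma**2 / 2
w_y = sigma * rng.standard_normal(n) - sigma**2 / 2

print(np.exp(w_y).mean())         # ~ 1.00 : each estimate is unbiased
print(np.exp(w_y - w_x).mean())   # ~ e ~ 2.72 : the ratio is biased upward
```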
SLIDE 4 Pseudo-marginal MCMC
1. Start from x0 ∈ X, with estimate z0 = π̂(x0).
2. Proposal: y ∼ q(xk, ·), with fresh estimate zy = π̂(y).
3. Metropolis–Hastings: α = 1 ∧ [zy q(y, xk)] / [zk q(xk, y)].
4. (y, zy) → (xk+1, zk+1) with probability α.
Key difference with the noisy version: the current estimate zk is recycled, never recomputed (see the sketch below).
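A minimal sketch of the algorithm above, assuming a symmetric random-walk proposal so that the q-ratio cancels; `pi_hat` is a hypothetical user-supplied function standing for any positive unbiased estimator of π.

```python
# Minimal pseudo-marginal Metropolis-Hastings sketch (symmetric proposal).
# pi_hat(x, rng) must return a positive unbiased estimate of pi(x).
import numpy as np

def pseudo_marginal_mh(pi_hat, x0, jump, n_iter, rng):
    x, z = x0, pi_hat(x0, rng)                 # carry the estimate with the state
    chain = np.empty(n_iter)
    for k in range(n_iter):
        y = x + jump * rng.standard_normal()   # proposal y ~ q(x_k, .)
        z_y = pi_hat(y, rng)                   # fresh estimate at the proposal only
        if rng.random() * z < z_y:             # accept with prob. 1 ^ (z_y / z_k)
            x, z = y, z_y                      # the accepted estimate is recycled
        chain[k] = x
    return chain
```

Note the single call to `pi_hat` per iteration: the current estimate z is refreshed only upon acceptance, which is exactly what makes the chain exact on the augmented space.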
SLIDE 5 Latent variables
Observation y, latent variable u, parameter x ∈ X:
π(x) ∝ π0(x) × ∫ P(y | u, x) P(du | x)
Data augmentation: MCMC on X × U.
High correlation between latent variable and parameter? High-dimensional latent variable u?
Pseudo-marginal: imitate MCMC on X only.
If one can simulate u | x and compute P(y | u, x), use the unbiased Monte Carlo estimate
(1/n) Σ_{k=1}^n P(y | uk, x), with u1, …, un iid ∼ P(du | x).
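A sketch of this averaged latent-variable estimator; `prior`, `sample_u` and `lik` are hypothetical user-supplied functions standing for π0, P(du | x) and P(y | u, x).

```python
# Averaged latent-variable estimator: u_1..u_n iid ~ P(du | x),
# pi_hat(x) = pi_0(x) * (1/n) * sum_k P(y | u_k, x)
# (unbiased up to the normalising constant of pi).
import numpy as np

def pi_hat(x, y_obs, rng, n, prior, sample_u, lik):
    us = [sample_u(x, rng) for _ in range(n)]        # u_k ~ P(du | x)
    avg = np.mean([lik(y_obs, u, x) for u in us])    # (1/n) sum_k P(y | u_k, x)
    return prior(x) * avg
```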
SLIDE 6
Unbiased estimators?
Importance sampling, sequential importance sampling, particle filters.
Unbiased estimator of f(µ) from an unbiased estimator of µ: random truncation, Russian roulette (a sketch follows below).
Probabilistic interpretation of PDEs (Feynman–Kac): replace solving a PDE by simulating a few particles.
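As a generic sketch of the random-truncation trick (my own example, not the slides' construction): to estimate S = Σ_{k≥0} Δk unbiasedly, draw a random truncation level N with P(N ≥ k) = pk > 0 and return Σ_{k=0}^N Δk/pk; by Fubini (given absolute convergence) its expectation is S. Applied to a series expansion of f, this turns an unbiased estimator of µ into one of f(µ).

```python
# Russian-roulette / random-truncation sketch with geometric survival
# probabilities P(N >= k) = q**k (an illustrative choice).
import numpy as np

def roulette_estimate(delta, q, rng):
    """Unbiased estimate of sum_k delta(k), assuming absolute convergence."""
    total, k = 0.0, 0
    while True:
        total += delta(k) / q**k       # compensate term k by 1 / P(N >= k)
        if rng.random() > q:           # stop here: N = k
            return total
        k += 1

# Example: S = sum_k (1/2)^k = 2
rng = np.random.default_rng(1)
draws = [roulette_estimate(lambda k: 0.5**k, q=0.6, rng=rng) for _ in range(10**5)]
print(np.mean(draws))                  # ~ 2.0
```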
SLIDE 7 Stickiness
Better estimator, better mixing [talk C. Andrieu].
The chain using (1/n) Σ_{k=1}^n P(y | uk, x) behaves like the marginal chain as Var(π̂(x)) → 0.
If π̂(x) ≫ π(x), the density at x is over-estimated: the chain gets stuck there!
Trade-off between MCMC mixing and estimator computational time:
good estimators take a lot of time to compute;
good estimators yield good mixing.
SLIDE 8 How to measure efficiency?
Integrated Autocorrelation Time (for which test function ϕ?):
IACT = 1 + 2 Σ_{k=1}^∞ Cor(ϕ(Xt), ϕ(Xt+k))
Spectral gap, log-Sobolev, mixing time: complicated!
The target distribution of (Xk, Wk) depends on the distribution of the noise: difficult to compare chains with different target distributions.
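A sketch of an empirical IACT estimate for a scalar test function, truncating the sum at the first non-positive autocorrelation (one common heuristic among several window rules):

```python
# Empirical IACT = 1 + 2 * sum_{k>=1} acf(k), truncated at the first
# non-positive autocorrelation.
import numpy as np

def iact(samples):
    x = np.asarray(samples, dtype=float) - np.mean(samples)
    n = len(x)
    acov = np.correlate(x, x, mode="full")[n - 1:] / n   # autocovariances
    acf = acov / acov[0]                                 # autocorrelations
    tau = 1.0
    for k in range(1, n):
        if acf[k] <= 0.0:        # stop at the first non-positive lag
            break
        tau += 2.0 * acf[k]
    return tau
```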
SLIDE 9 A toy example
One-dimensional Random Walk Metropolis: y = x + λξ with ξ ∼ N(0, 1).
Gaussian target π(x) ∝ exp(−x²/2).
Estimator of the form π̂(x) = π(x) exp(W), with W = σZ − σ²/2 and Z ∼ N(0, 1), so that Var(W) = σ² and E[exp(W)] = 1.
The process (Xk, Wk) is a Markov chain.
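An end-to-end sketch of this toy chain, working on the log scale; the jump size 2.4 and the σ value are illustrative choices.

```python
# Toy pseudo-marginal RWM: target N(0,1), estimate pi_hat(x) = pi(x) e^W,
# W = sigma*Z - sigma^2/2.  On the log scale the acceptance ratio reads
# log alpha = (-y^2/2 + W_y) - (-x^2/2 + W_x).
import numpy as np

def toy_pm_rwm(jump, sigma, n_iter, rng):
    x = 0.0
    w = sigma * rng.standard_normal() - sigma**2 / 2   # noise carried with x
    chain = np.empty(n_iter)
    for k in range(n_iter):
        y = x + jump * rng.standard_normal()
        w_y = sigma * rng.standard_normal() - sigma**2 / 2
        log_alpha = (-y**2 / 2 + w_y) - (-x**2 / 2 + w)
        if np.log(rng.random()) < log_alpha:
            x, w = y, w_y
        chain[k] = x
    return chain

rng = np.random.default_rng(0)
chain = toy_pm_rwm(jump=2.4, sigma=1.0, n_iter=50_000, rng=rng)
print(chain.mean(), chain.var())   # ~ 0 and ~ 1: the marginal target is exact
```

Whatever σ is, the chain still targets N(0, 1) exactly; only the mixing degrades, which is what the trace plots on the next slide illustrate.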
SLIDE 10 Influence of the noise σ
[Figure: trace plots of X against iteration number (500 iterations) for noise sd = 0, 1, 5, 10.]
SLIDE 11 Influence of the jump size λ
[Figure: IACT against mean acceptance probability; spectral gap against mean acceptance probability.]
SLIDE 12
Diffusion limit
High-dimensional state space with local-move MCMC: looks like a diffusion from far away.
All the efficiency criteria above coincide when a diffusion limit exists:
intuitive understanding + simple conclusions;
the process Wk is averaged out;
relevant already for moderate dimension;
robust results (e.g. the 0.234 rule).
SLIDE 13
Goal
Tuning of pseudo-marginal RWM:
small jump size: explore slowly;
large jump size: most proposals are rejected;
cheap estimator: fast computation time, bad mixing;
good estimator: expensive, good mixing.
SLIDE 14 High dimension + simple target distributions
Product-form target distribution on R^d: π(x) = f(x1) × f(x2) × … × f(xd).
Random Walk Metropolis: y = x + (µ/√d) ξ with ξ ∼ N(0, Id).
Unbiased estimator π̂(x) = π(x) exp(W), as in the toy example.
Assumption: W is independent of x. (A simulation sketch follows below.)
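A simulation sketch of this setting, with standard Gaussian marginals f = N(0, 1) and the log-normal noise from the toy example (both illustrative choices), estimating the mean acceptance probability and the expected squared jump distance:

```python
# Product target pi(x) = prod_i f(x_i), f = N(0,1); proposal
# y = x + (mu/sqrt(d)) * xi; noise W = sigma*Z - sigma^2/2 independent of x.
import numpy as np

def acceptance_and_esjd(mu, sigma, d=100, n_iter=20_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)                         # start near stationarity
    w = sigma * rng.standard_normal() - sigma**2 / 2
    n_acc, jump2 = 0, 0.0
    for _ in range(n_iter):
        y = x + (mu / np.sqrt(d)) * rng.standard_normal(d)
        w_y = sigma * rng.standard_normal() - sigma**2 / 2
        log_alpha = (-(y @ y) / 2 + w_y) - (-(x @ x) / 2 + w)
        if np.log(rng.random()) < log_alpha:
            jump2 += float((y - x) @ (y - x))          # squared jump distance
            x, w = y, w_y
            n_acc += 1
    return n_acc / n_iter, jump2 / n_iter              # alpha(mu), J(mu)

print(acceptance_and_esjd(mu=2.4, sigma=0.0))   # noise-free: alpha near 0.234
print(acceptance_and_esjd(mu=2.4, sigma=1.5))   # noisy: smaller alpha and J
```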
SLIDE 15
Accelerated first coordinate process: V^{(d)}(t) = X_1^{(d)}(d × t).
V^{(d)} is not Markovian. In the limit d → ∞ it is!
SLIDE 16 Theorem (Sherlock-T-Roberts-Rosenthal)
Weak convergence of V^{(d)} on [0, T] towards the Langevin diffusion
dV = (1/2) J(µ) (log f)′(V) dt + √J(µ) dB
Sketch of proof:
look at the Markov chain (Xk, Wk) on the augmented space;
compute the drift and variance (the generator of the diffusion) along X only;
two time scales, handled by homogenization arguments:
the X-process mixes after O(d) steps, the W-process mixes after O(1) steps;
average out the W-process.
SLIDE 17 Theorem (Sherlock-T-Roberts-Rosenthal)
Weak convergence of V^{(d)} on [0, T] towards the Langevin diffusion
dV = (1/2) J(µ) (log f)′(V) dt + √J(µ) dB
Speed function: the larger J(µ), the better the mixing.
Expected Squared Jump Distance: J(µ) = E‖Xk+1 − Xk‖².
Limiting mean acceptance probability: 0 < α(µ) < 1, with α(µ) → 0 as µ increases.
Almost closed-form expressions for α(µ) and J(µ).
If the noise W is fixed, the optimal µ is independent of the target f(x) dx.
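For orientation (an addition, classical rather than from this slide): when σ = 0 the pseudo-marginal RWM reduces to plain RWM, for which Roberts, Gelman and Gilks (1997) give the closed forms below; this is the source of the 0.234 rule mentioned on SLIDE 12. Here I = E_f[((log f)′(X))²].

```latex
\alpha(\mu) = 2\,\Phi\!\left(-\frac{\mu\sqrt{I}}{2}\right),
\qquad
J(\mu) = \mu^2\,\alpha(\mu) = 2\mu^2\,\Phi\!\left(-\frac{\mu\sqrt{I}}{2}\right).
```

Both are maximised at µ ≈ 2.38/√I, where α(µ) ≈ 0.234; the noisy case σ > 0 modifies both expressions through the extra W-terms in the acceptance ratio.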
SLIDE 18 Gaussian case: joint optimisation (µ, σ)
[Figure: contour plot of the speed function as a function of jump size and estimator variance; contour levels 0.2, 0.4, 0.6.]
SLIDE 19 Gaussian case
[Figure: speed function against the noise σ, with σ on a log scale; title: Speed function vs noise.]
SLIDE 20
Discussion
Gaussian assumption: is it relevant?
What do we want to optimise?
How robust are the conclusions?
SLIDE 21
Thank You! Questions?