Tuning of pseudo-marginal MCMC



1. Tuning of pseudo-marginal MCMC
Alex Thiéry (National University of Singapore)
Joint work with C. Sherlock (Lancaster), G. Roberts (Warwick), J. Rosenthal (Toronto).

2. Pseudo-marginal MCMC
Target density $\pi(x)$, positive unbiased estimate $\hat\pi(x)$.
Theorem (Beaumont 2003; Andrieu-Roberts 2009): the usual MCMC with $\hat\pi(x)$ instead of $\pi(x)$ is correct!
No free lunch: a poor estimator $\hat\pi$ means poor mixing.

3. Wrong pseudo-marginal MCMC
1. Start from $x_0 \in X$.
2. Proposal: $y \sim q(x_k, \cdot)$.
3. Noisy Metropolis-Hastings: $\alpha = 1 \wedge \dfrac{\hat\pi(y)\, q(y, x_k)}{\hat\pi(x_k)\, q(x_k, y)}$.
4. $y \mapsto x_{k+1}$ with probability $\alpha$.
Not correct; not too bad after correction.
Notice that $E\big[\hat\pi(y)/\hat\pi(x)\big] \neq \pi(y)/\pi(x)$.

4. Pseudo-marginal MCMC
1. Start from $x_0 \in X$, with estimator $z_0 = \hat\pi(x_0)$.
2. Proposal: $y \sim q(x_k, \cdot)$, with estimator $z_y = \hat\pi(y)$.
3. Noisy Metropolis-Hastings: $\alpha = 1 \wedge \dfrac{z_y\, q(y, x_k)}{z_k\, q(x_k, y)}$.
4. $(y, z_y) \mapsto (x_{k+1}, z_{k+1})$ with probability $\alpha$.
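To make the transition concrete, here is a minimal Python sketch of one pseudo-marginal Metropolis-Hastings step; `pi_hat`, `propose`, and `q_density` are illustrative placeholders for the problem-specific estimator, proposal sampler, and proposal density, not names from the talk.

```python
# A minimal sketch of one pseudo-marginal Metropolis-Hastings transition (slide 4).
import numpy as np

rng = np.random.default_rng(0)

def pm_mh_step(x, z, pi_hat, propose, q_density):
    """One transition (x_k, z_k) -> (x_{k+1}, z_{k+1}).

    q_density(a, b) evaluates the proposal density q(a, b) of moving from a to b.
    """
    y = propose(x)                    # y ~ q(x_k, .)
    z_y = pi_hat(y)                   # fresh positive unbiased estimate at the proposal
    # alpha = 1 ^ [z_y q(y, x_k)] / [z_k q(x_k, y)]
    alpha = min(1.0, (z_y * q_density(y, x)) / (z * q_density(x, y)))
    if rng.uniform() < alpha:
        return y, z_y                 # accept: the estimate travels with the state
    return x, z                       # reject: crucially, z_k is kept, never refreshed
```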

5. Latent variables
Observation $y$, latent variable $u$, parameter $x \in X$:
$\pi(x) \propto \pi_0(x) \times \int_U P(y \mid u, x)\, P(du \mid x)$.
Data augmentation: MCMC on $X \times U$. High correlation between latent variable and parameter? High-dimensional latent variable $u$?
Pseudo-marginal: imitate MCMC on $X$ only. If one can simulate $u \mid x$ and compute $P(y \mid u, x)$:
$\hat\pi(x) = \pi_0(x) \times \frac{1}{n} \sum_{k=1}^{n} P(y \mid u_k, x)$.
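A sketch of this averaged estimator, where `sample_latent(x)` (draws $u \mid x$) and `lik(y, u, x)` (evaluates $P(y \mid u, x)$) are hypothetical problem-specific helpers:

```python
# Unbiased estimate pi_hat(x) = pi_0(x) * (1/n) * sum_{k=1}^{n} P(y | u_k, x) (slide 5).
import numpy as np

def pi_hat(x, y_obs, prior, sample_latent, lik, n=100):
    us = [sample_latent(x) for _ in range(n)]              # u_1, ..., u_n ~ P(du | x)
    return prior(x) * np.mean([lik(y_obs, u, x) for u in us])
```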

6. Unbiased estimators?
- importance sampling
- sequential importance sampling
- particle filters
- unbiased estimator of $f(\mu)$ from an unbiased estimator of $\mu$: random truncation, Russian roulette (a toy sketch follows)
- probabilistic interpretation of PDEs (Feynman-Kac): replace solving a PDE by simulating a few particles
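As a toy illustration of the random-truncation idea (not an example from the talk): to estimate an infinite series $S = \sum_k a_k$ without summing it, draw a random index $K$ with $P(K = k) = p_k$ and return $a_K / p_K$; the expectation recovers $S$. Here the series is the expansion of $e^\mu$.

```python
# Russian-roulette / random-truncation single-term estimator of exp(mu).
import math
import numpy as np

rng = np.random.default_rng(1)

def single_term_estimator(mu, p=0.5):
    """Unbiased for exp(mu) = sum_k mu^k / k!, since E[a_K / p_K] = sum_k a_k."""
    k = rng.geometric(p) - 1                 # K in {0, 1, ...}, P(K = k) = p (1-p)^k
    a_k = mu**k / math.factorial(k)
    return a_k / (p * (1 - p)**k)
```

The estimate is positive whenever the series terms are, which is what pseudo-marginal MCMC needs; the price is extra variance.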

7. Stickiness
Better estimator, better mixing [talk by C. Andrieu].
$\hat\pi(x) = \pi_0(x) \times \frac{1}{n} \sum_{k=1}^{n} P(y \mid u_k, x)$
behaves like the marginal chain as $\mathrm{Var}(\hat\pi(x)) \to 0$.
If $\hat\pi(x) \gg \pi(x)$, the parameter $x$ is over-estimated: the chain gets stuck!
Trade-off between MCMC mixing and estimator computational time: good estimators take a lot of time to compute, but good estimators yield good mixing.

8. How to measure efficiency?
Integrated autocorrelation time (for which test function $\varphi$?):
$\mathrm{IACT} = 1 + 2 \sum_{k=1}^{\infty} \mathrm{Cor}\big(\varphi(X_t), \varphi(X_{t+k})\big)$.
Spectral gap, log-Sobolev, mixing time: complicated!
The target distribution of $(X_k, W_k)$ depends on the distribution of the noise: it is difficult to compare chains with different target distributions.
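A simple empirical IACT estimate for a scalar chain, truncating the infinite sum at a fixed lag (the truncation rule below is a common practical heuristic, not something prescribed by the talk):

```python
# Truncated-sum estimate of IACT = 1 + 2 * sum_k Cor(phi(X_t), phi(X_{t+k})).
import numpy as np

def iact(phi_x, max_lag=200):
    x = np.asarray(phi_x, dtype=float)
    x = x - x.mean()
    var = x.var()
    # empirical autocorrelation at lags 1, ..., max_lag
    acf = [np.mean(x[:-k] * x[k:]) / var for k in range(1, max_lag + 1)]
    return 1.0 + 2.0 * sum(acf)
```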

9. A toy example
One-dimensional random walk Metropolis: $y = x + \lambda \xi$ with $\xi \sim N(0, 1)$.
Gaussian target $\pi(x) \propto \exp(-x^2/2)$.
Estimator of the form $\hat\pi(x) = \pi(x) \times e^W$ with $W = \sigma Z - \sigma^2/2$ and $Z \sim N(0, 1)$, so that $\mathrm{Var}(W) = \sigma^2$.
The process $(X_k, W_k)$ is a Markov chain.
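A runnable sketch of this toy chain (parameter defaults are illustrative): random walk Metropolis on the standard Gaussian, with a fresh log-normal perturbation $e^W$ drawn at every proposal.

```python
# Toy pseudo-marginal RWM of slide 9: target N(0, 1), estimator pi(x) * e^W.
import numpy as np

rng = np.random.default_rng(2)

def toy_pm_rwm(n_iter=500, lam=2.0, sigma=1.0, x0=0.0):
    x = x0
    w = sigma * rng.standard_normal() - sigma**2 / 2     # current log-noise W_k
    xs = np.empty(n_iter)
    for k in range(n_iter):
        y = x + lam * rng.standard_normal()              # y = x + lambda * xi
        w_y = sigma * rng.standard_normal() - sigma**2 / 2
        # log acceptance ratio for pi(x) = exp(-x^2 / 2) and a symmetric proposal
        log_alpha = (x**2 - y**2) / 2 + (w_y - w)
        if np.log(rng.uniform()) < log_alpha:
            x, w = y, w_y            # a large accepted W makes later moves unlikely
        xs[k] = x
    return xs
```

Re-running this with sigma = 0, 1, 5, 10 reproduces the increasingly sticky traces of the next slide.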

10. Influence of the noise σ
[Figure: trace plots of $X$ over 500 iterations for noise standard deviations $\sigma = 0, 1, 5, 10$; the chain becomes stickier as $\sigma$ grows.]

11. Influence of the jump size λ
[Figure: IACT and spectral gap plotted against the mean acceptance probability.]

12. Diffusion limit
High-dimensional state space, local-move MCMC: looks like a diffusion from far away.
- in the diffusion limit, all of the above efficiency criteria coincide
- intuitive understanding + simple conclusions
- the noise process $W_k$ is averaged out
- relevant already in moderate dimension
- robust results (e.g. the 0.234 rule)

13. Goal: tuning of pseudo-marginal RWM
- small jump size: slow exploration
- large jump size: most proposals are rejected
- cheap estimator: fast computation time, bad mixing
- good estimator: expensive, good mixing

14. High dimension + simple target distributions
Product-form target distribution on $R^d$: $\pi(x) = f(x_1) \times f(x_2) \times \cdots \times f(x_d)$.
Random walk Metropolis: $y = x + \frac{\mu}{\sqrt{d}}\, \xi$ with $\xi \sim N(0, I_d)$.
Unbiased estimator $\hat\pi(x) = \pi(x) \times e^W$.
Assumption: $W$ is independent of $x$.
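A sketch of one step in this setting; `log_f` stands for the user's log marginal density $\log f$, and the log-normal noise model is borrowed from the toy example of slide 9 for concreteness (the slide only assumes $W$ independent of $x$).

```python
# One RWM step on a product target in R^d with x-independent multiplicative noise.
import numpy as np

rng = np.random.default_rng(3)

def highdim_pm_rwm_step(x, w, mu, sigma, log_f):
    d = x.size
    y = x + (mu / np.sqrt(d)) * rng.standard_normal(d)   # y = x + (mu / sqrt(d)) xi
    w_y = sigma * rng.standard_normal() - sigma**2 / 2   # fresh noise, independent of y
    # log pi(y) - log pi(x) = sum_i [log f(y_i) - log f(x_i)]
    log_alpha = np.sum(log_f(y) - log_f(x)) + (w_y - w)
    if np.log(rng.uniform()) < log_alpha:
        return y, w_y
    return x, w
```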

15. Accelerated first-coordinate process
$V^{(d)}(t) = X_1^{(d)}(d \times t)$.
$V^{(d)}$ is not Markovian. In the limit $d \to \infty$ it is!

16. Theorem (Sherlock-T-Roberts-Rosenthal)
Weak convergence of $V^{(d)}$ on $[0, T]$ towards the Langevin diffusion
$dV = \frac{1}{2} J(\mu) (\log f)'(V)\, dt + \sqrt{J(\mu)}\, dW$.
Sketch of proof:
- look at the Markov chain $(X_k, W_k)$ on the augmented space
- compute the drift and variance (generator of the diffusion) along $X$ only
- two time scales, homogenization arguments: the $X$-process mixes after $O(d)$ steps, the $W$-process mixes after $O(1)$ steps
- average out the $W$ process

17. Theorem (Sherlock-T-Roberts-Rosenthal)
Weak convergence of $V^{(d)}$ on $[0, T]$ towards the Langevin diffusion
$dV = \frac{1}{2} J(\mu) (\log f)'(V)\, dt + \sqrt{J(\mu)}\, dW$.
- Speed function: the larger $J(\mu)$, the better the mixing.
- Expected squared jump distance: $J(\mu) = E\|X_{k+1} - X_k\|^2$.
- Limiting mean acceptance probability $0 < \alpha(\mu) < 1$, with $\alpha(\mu) \to 0$ as $\mu$ increases.
- Almost-closed-form expressions for $\alpha(\mu)$ and $J(\mu)$.
- If the noise $W$ is fixed, the optimal $\mu$ is independent of $f(x)\, dx$.
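The jump-distance criterion is easy to estimate from simulation output; for instance, with the toy sampler sketched after slide 9 (the λ values below are illustrative):

```python
# Empirical expected squared jump distance, J(mu) ~ E ||X_{k+1} - X_k||^2.
import numpy as np

def esjd(xs):
    xs = np.asarray(xs, dtype=float)
    return np.mean(np.diff(xs) ** 2)

# Compare jump sizes at a fixed noise level:
# for lam in (0.5, 2.4, 5.0):
#     print(lam, esjd(toy_pm_rwm(n_iter=100_000, lam=lam, sigma=1.0)))
```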

18. Gaussian case: joint optimisation of $(\mu, \sigma)$
[Figure: contour plot of the speed function, levels 0.2 to 0.6, over jump size (0 to 5) and estimator variance (2 to 8).]

19. Gaussian case: speed function vs noise
[Figure: speed function on a log scale (roughly $10^{-3}$ to $10^{-0.5}$) against the noise level $\sigma \in [0, 5]$.]

20. Discussion
- Gaussian assumption: is it relevant?
- What do we want to optimise?
- How robust are the conclusions?

  21. Thank You! Questions?
