

SLIDE 1

Parallel tempering and Interacting MCMC algorithms: Adaptive Equi-Energy sampler

Parallel tempering and Interacting MCMC algorithms

Gersende FORT / Eric MOULINES

Telecom ParisTech, CNRS-LTCI

SLIDE 2

Part II: Adaptive Equi-Energy samplers

Joint work with Amandine Schreck, Aurélien Garivier and Eric Moulines, LTCI, Telecom ParisTech & CNRS, France.

SLIDE 3

From Parallel Tempering to Interacting Tempering

◮ The Equi-Energy sampler (Kou et al., 2006) is an example of an Interacting Tempering algorithm.

◮ The idea is to replace an instantaneous swap by an interaction with the whole past of a neighboring process on the temperature ladder.

SLIDE 4

From Parallel Tempering to Interacting Tempering

◮ The Equi-Energy sampler (Kou et al., 2006) is an example of an Interacting Tempering algorithm.

◮ The idea is to replace an instantaneous swap by an interaction with the whole past of a neighboring process on the temperature ladder.

Equi-Energy sampler (Kou et al., 2006)

◮ Define K processes X^(k) = {X^(k)_n, n ≥ 0}, from X^(1) (hot temperature) to X^(K) (the target process).

◮ Algorithm: given the previous level X^(k−1)_{1:n−1} and the current point X^(k)_{n−1}, define X^(k)_n as follows:

  ◮ (MCMC step / local moves) with probability ε, draw X^(k)_n ∼ P^(k)(X^(k)_{n−1}, ·), with P^(k) s.t. π^(k) P^(k) = π^(k);

  ◮ (Interaction step / global moves) otherwise,
    (i) select a point among the set {X^(k−1)_{1:n−1}} with the same energy level as X^(k)_{n−1};
    (ii) accept or reject it via an acceptance-rejection ratio.
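The two moves above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a one-dimensional state, a user-supplied local kernel `mcmc_kernel` leaving π^{1/T_k} invariant, and the standard equi-energy acceptance ratio 1 ∧ (π(y)/π(x))^{1/T_k − 1/T_{k−1}}.

```python
import numpy as np

def ee_step(x_prev, lower_chain, H, log_pi, inv_Tk, inv_Tkm1,
            mcmc_kernel, eps, rng):
    """One step of level k of the Equi-Energy sampler (sketch).

    x_prev      : current point X^(k)_{n-1}
    lower_chain : past points X^(k-1)_{1:n-1} of the neighboring level
    H           : increasing energy boundaries H_1 < ... < H_L
    log_pi      : log of the (untempered) target pi, up to a constant
    inv_Tk, inv_Tkm1 : inverse temperatures 1/T_k and 1/T_{k-1}
    mcmc_kernel : local move x -> x' leaving pi^(1/T_k) invariant
    eps         : probability of a local MCMC move
    """
    if rng.random() < eps:                       # MCMC step / local move
        return mcmc_kernel(x_prev)
    # interaction step / global move:
    # (i) select a past point of level k-1 in the same energy ring
    ring = int(np.searchsorted(H, -log_pi(x_prev)))
    same_ring = [y for y in lower_chain
                 if int(np.searchsorted(H, -log_pi(y))) == ring]
    if not same_ring:                            # empty ring: keep the point
        return x_prev
    y = same_ring[rng.integers(len(same_ring))]
    # (ii) acceptance-rejection: 1 ∧ (pi(y)/pi(x))^(1/T_k - 1/T_{k-1})
    log_alpha = (inv_Tk - inv_Tkm1) * (log_pi(y) - log_pi(x_prev))
    return y if np.log(rng.random()) < log_alpha else x_prev
```

Iterating `ee_step` while appending each level's draws to its chain reproduces the interaction with the whole past of the neighboring process.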

SLIDE 5

Numerical application: on the interest of EE

[Figure: target density, a mixture of two-dimensional Gaussians; draws and means of the components]

◮ target density: π = (1/20) Σ_{i=1}^{20} N_2(µ_i, Σ_i)
◮ K processes with target distributions π^{1/T_k} (T_K = 1)

[Figures: draws and means of the components for the target density at temperatures 1 to 5, and for plain Hastings-Metropolis]

SLIDE 6

“Design parameters” of the EE sampler

  • 1. How to choose the probability of interaction ε?
  • 2. How many temperatures, and which ones?
  • 3. How many energy levels, and which ones?

Despite many convergence analyses (on EE with no selection):

◮ ergodicity: lim_n E[h(X^(K)_n)] = π(h)

◮ law of large numbers: (1/n) Σ_{j=1}^{n} h(X^(K)_j) → π(h) in probability or a.s.

◮ CLT: (1/√n) Σ_{j=1}^{n} {h(X^(K)_j) − π(h)} →_D N(0, σ²)

see e.g. Kou, Zhou, Wong (2006); Atchadé (2010); Andrieu, Jasra, Doucet, Del Moral (2011); Fort, Moulines, Priouret (2012); Fort, Moulines, Priouret, Vandekerkhove (2012). These problems are still open.

SLIDE 7

“Design parameters” of the EE sampler

  • 1. How to choose the probability of interaction ε?
  • 2. How many temperatures, and which ones?
  • 3. How many energy levels, and which ones?

◮ In the original EE: energy rings = strata in the range of the energy H of the target π, where π(x) = exp(−H(x)). Choose H_i s.t. min H < H_1 < · · · < H_L. Energy Ring #i = {x : H(x) ∈ [H_{i−1}, H_i]}

◮ Our contribution: tune adaptively the boundaries of the strata
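The stratification above amounts to a one-line lookup. A hypothetical helper, assuming the energy is available as H(x) = −log π(x):

```python
import numpy as np

def energy_ring(x, boundaries, energy):
    """Index of the energy ring containing x (sketch).

    boundaries : increasing thresholds [H_1, ..., H_L]
    energy     : function x -> H(x) = -log pi(x)
    Rings are indexed 0..L: ring i collects points with
    H_i <= H(x) < H_{i+1}, with the conventions H_0 = -inf
    and H_{L+1} = +inf.
    """
    return int(np.searchsorted(boundaries, energy(x), side="right"))

# e.g. with H(x) = x^2 and boundaries [1, 4, 9]:
# energy_ring(0.5, [1, 4, 9], lambda x: x**2) -> 0
# energy_ring(2.5, [1, 4, 9], lambda x: x**2) -> 2
```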

SLIDE 8

Num. Appl.: fixed boundaries vs adapted boundaries

◮ Target distribution on R^6:

π = (1/2) N_6(µ, 0.3 Id) + (1/2) N_6(−µ, 0.2 Id), with µ = [2, · · · , 2]

◮ We compare Hastings-Metropolis (HM) with the EE sampler and the Adaptive EE sampler, both applied with 3 temperatures and 11 strata.

◮ The last plot is the 2-d projection (u^T X, v^T X) with u^T ∝ [1, 1, · · · , 1] and v^T ∝ [1, 1, 1, −1, −1, −1].

SLIDE 9

Behavior along one path: HM, EE, A-EE

[Top] L1 error when estimating the means, (1/6) Σ_{i=1}^{6} |(1/n) Σ_{j=1}^{n} X^(K)_{j,i} − E_π[X_i]|, for MH, EES and SA-AEES.

[Bottom left] Time spent in the mode where the path is initialized, for EES and AEES.

[Bottom right] Probability of being in some ellipsoids, for the first mode (solid line) and the second one (dashed line); the true probability is 0.05.

SLIDE 10

Behavior over 50 independent runs: HM, EE, A-EE

[Top] L1 error when estimating the means, (1/6) Σ_{i=1}^{6} |(1/n) Σ_{j=1}^{n} X^(K)_{j,i} − E_π[X_i]|, for HM (red), EES (black) and AEES (blue).

[Bottom left] Percentage of time spent in the first component (the mode where the path is initialized), for EES (black) and AEES (blue).

[Bottom right] Probability of being in the left ellipsoid (first mode), for EES (black) and AEES (blue).

SLIDE 11

Adaptive tuning of the boundaries of the energy rings

→ How to define the boundaries H_1, · · · , H_L of the energy rings?

Algorithm:

◮ Level 1 (hot level)
  ◮ Sample X^(1) with target π^{1/T_1} (MCMC).
  ◮ At each time n, update the boundaries H^(1)_{n,1}, · · · , H^(1)_{n,L}, computed from X^(1)_{1:n}.

◮ Level 2
  ◮ Sample X^(2) (MCMC step and interaction step) with target π^{1/T_2}. For the interaction step, use the boundaries H^(1)_{•}.
  ◮ At each time n, update the boundaries H^(2)_{n,1}, · · · , H^(2)_{n,L}, computed from X^(2)_{1:n}.

◮ Repeat until level K.
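A compressed sketch of this level-by-level scheme. The interaction step is elided here to keep the focus on the boundary updates, and the random-walk Metropolis kernel and quantile-based update rule are illustrative assumptions:

```python
import numpy as np

def run_adaptive_levels(n, K, log_pi, inv_T, step, L, rng):
    """Adaptive boundary updates, level by level (sketch).

    Each level k samples its tempered target pi^(1/T_k) with a
    random-walk Metropolis kernel and, at each time n, recomputes its
    boundaries H^(k)_{n,1..L} as empirical quantiles of the observed
    energies -log pi(X^(k)_{1:n}).  In the full algorithm the
    boundaries of level k-1 feed the interaction step of level k
    (omitted in this sketch).
    """
    all_bounds = []
    for k in range(K):
        x, energies = 0.0, []
        bounds = None
        for _ in range(n):
            prop = x + step * rng.standard_normal()
            if np.log(rng.random()) < inv_T[k] * (log_pi(prop) - log_pi(x)):
                x = prop                     # local Metropolis move
            energies.append(-log_pi(x))
            # update H^(k)_{n,.} from the whole past X^(k)_{1:n}
            bounds = np.quantile(energies,
                                 [(j + 1) / (L + 1) for j in range(L)])
        all_bounds.append(bounds)
    return all_bounds
```

Recomputing the quantiles from the whole past at every step is deliberately naive; the next slides discuss cheaper estimators.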

SLIDE 12

On the convergence of such adaptive schemes

Convergence result: we prove ergodicity and a strong law of large numbers for A-EE. Our approach for the proof is by induction:

◮ we assume the process X^(k−1) "converges";
◮ we prove that the process X^(k) has the same convergence properties;
◮ repeat from level 1 to K.

Tools for the proof:

◮ the conditional distribution L(X^(k)_n | past^(1:k)_{n−1}) is P^(k)_{θ_{n−1}}(X^(k)_{n−1}, ·), where

P^(k)_θ(x, dy) = ε P^(k)(x, dy) + (1 − ε) K^(k)_θ(x, dy)

K^(k)_θ(x, A) = ∫_A α^(k)_θ(x, y) g_θ(x, y) θ(dy) / ∫ g_θ(x, z) θ(dz) + δ_x(A) ∫ {1 − α^(k)_θ(x, y)} g_θ(x, y) θ(dy) / ∫ g_θ(x, z) θ(dz)

θ_n = (1/n) Σ_{j=1}^{n} δ_{X^(k−1)_j}

α^(k)_θ(x, y) = 1 ∧ [π^{1/T_k − 1/T_{k−1}}(y) / π^{1/T_k − 1/T_{k−1}}(x)] · [∫ g_θ(x, z) θ(dz) / ∫ g_θ(y, z) θ(dz)]

g_θ(x, y) = "x and y are in the same energy ring, with boundaries defined by H^(k−1)_{n,•}"; e.g. g_θ(x, y) = 1 if x and y are in the same energy ring, and 0 otherwise.

SLIDE 13

On the convergence of such adaptive schemes

Convergence result: we prove ergodicity and a strong law of large numbers for A-EE. Our approach for the proof is by induction:

◮ we assume the process X^(k−1) "converges";
◮ we prove that the process X^(k) has the same convergence properties;
◮ repeat from level 1 to K.

Tools for the proof:

◮ the conditional distribution L(X^(k)_n | past^(1:k)_{n−1}) is P^(k)_{θ_{n−1}}(X^(k)_{n−1}, ·);

◮ containment and diminishing adaptation conditions, extending the pioneering work of Roberts, Rosenthal (2005), plus the Poisson equation and limit theorems for martingales;

◮ conditions on the adapted boundaries:

(i) there exists β > 0 s.t. lim_n n^β |H^(k)_{n,•} − H^(k)_{n−1,•}| = 0 w.p.1;
(ii) H^(k)_{n,•} → H^(k)_{∞,•} w.p.1 when n → ∞;
(iii) assumption on the limiting boundaries: inf_x ∫ g^(k)_∞(x, y) π^{1/T_k}(dy) > 0.

SLIDE 14

Example of adaptive boundaries

Example of adaptive boundaries: choose exp(−H^(k)_i) for 1 ≤ i ≤ L (computed from X^(k)) as the quantiles of order i/(L + 1) of the distribution of π(Z) when Z ∼ π^{1/T_k}.
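This quantile rule translates directly into code. A hypothetical helper, assuming `draws` are samples from the tempered target π^{1/T_k} and working on log π(Z) (quantiles commute with monotone maps):

```python
import numpy as np

def quantile_boundaries(draws, log_pi, L):
    """Boundaries H_1 < ... < H_L such that exp(-H_i) is the empirical
    quantile of order i/(L+1) of pi(Z) for Z ~ pi^(1/Tk) (sketch)."""
    log_vals = np.array([log_pi(z) for z in draws])   # log pi(Z)
    orders = [(i + 1) / (L + 1) for i in range(L)]
    # H_i = -log(quantile of order i/(L+1)); reversing the ascending
    # quantiles makes the energy boundaries increasing
    return -np.quantile(log_vals, orders)[::-1]
```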

SLIDE 15

Example of adaptive boundaries

Example of adaptive boundaries: choose exp(−H^(k)_{n,i}) for 1 ≤ i ≤ L (computed from X^(k)_{1:n}) as an estimator of the quantiles of order i/(L + 1) of the distribution of π(Z) when Z ∼ π^{1/T_k}.

SLIDE 16

Example of adaptive boundaries

Example of adaptive boundaries: choose exp(−H^(k)_{n,i}) for 1 ≤ i ≤ L (computed from X^(k)_{1:n}) as an estimator of the quantiles of order i/(L + 1) of the distribution of π(Z) when Z ∼ π^{1/T_k}.

Note that in EE, when using the interacting step to sample X^(k)_n:

◮ determine the ring such that H_{i−1} ≤ −log π(X^(k)_{n−1}) ≤ H_i;

◮ choose (at random) one point among X^(k−1)_1, · · · , X^(k−1)_{n−1} such that exp(−H_i) ≤ π(X^(k−1)_•) ≤ exp(−H_{i−1}), and accept / reject;

◮ upon convergence: L(X^(k−1)_n) → π^{1/T_{k−1}} when n → ∞.

SLIDE 17

Quantile estimators

1) A first estimator is based on inversion of the empirical cdf

F^(k)_n(h) = (1/n) Σ_{j=1}^{n} 1{π(X^(k)_j) ≤ h}

(+) easy implementation
(−) time consuming

2) A second one is based on Stochastic Approximation procedures:

H^(k)_{n+1,•} = H^(k)_{n,•} + γ_{n+1} Ξ(X^(k)_{n+1}, H^(k)_{n,•})

(+) running time
(−) implementation of the SA algorithm (choice of the step-size, initialization)
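The two estimators can be compared on a toy stream of draws. A sketch with an illustrative step-size γ_n = n^{−0.7}; the mean-field increment Ξ below is the usual quantile-tracking choice q − 1{X ≤ H}, an assumption rather than the authors' exact specification:

```python
import numpy as np

rng = np.random.default_rng(1)
draws = rng.standard_normal(20000)   # stand-in for the chain's energies
q = 0.25                             # target quantile order i/(L+1)

# 1) inversion of the empirical cdf: exact on the stored past, but
#    requires keeping (and re-sorting) the whole history
h_cdf = np.quantile(draws, q)

# 2) stochastic approximation: one O(1) update per new draw,
#    H_{n+1} = H_n + gamma_{n+1} * (q - 1{X_{n+1} <= H_n})
h_sa = 0.0
for n, x in enumerate(draws, start=1):
    h_sa += n ** -0.7 * (q - (x <= h_sa))

# both estimators track the same quantile of the draw distribution
print(h_cdf, h_sa)
```

The trade-off matches the slide: the cdf inversion is accurate but costly, the SA recursion is cheap per step but sensitive to the step-size and initialization.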

SLIDE 18

Num. Appl.: Adaptive EE

[Left] True density (mixture of Gaussians, equal weights). [Right] Adaptive EE: frequency of visits to each component of the mixture. Boxplots over 50 independent runs.

SLIDE 19

Num. Appl.: Motif discovery in DNA sequence

Same model as in Dawn's talk yesterday:

◮ a background sequence, with a (known) Markovian transition;
◮ motifs of known length, with independent (unknown) multinomial transitions.

Here is the result for A-EE and EE.

SLIDE 20

Conclusion

◮ EE depends on many design parameters that all play a role in the efficiency of the sampler. We propose an adaptive procedure to tune the energy rings on the fly.

◮ Convergence results are established* when the quantiles are estimated by inversion of the cdf.

◮ Work in progress: convergence when the quantiles are estimated by a Stochastic Approximation procedure. Challenging: convergence of SA algorithms when the draws are not Markovian (thanks to M. Vihola).

◮ First convergence results on EE with selection of the auxiliary point during the interaction step.

* Submitted, available at http://perso.telecom-paristech.fr/ schreck