Convergence of Adaptive and Interacting MCMC algorithms

Gersende FORT, LTCI / CNRS - TELECOM ParisTech, France
Joint work with E. Moulines (LTCI, France) and P. Priouret (LPMA, France)

Outline
I. Examples of adaptive MCMC
II. Convergence of the marginals for adaptive MCMC samplers
III. Law of large numbers for adaptive MCMC samplers
IV. Convergence of the stationary distributions $\pi_{\theta_n}$
V. Applications

I. Two examples of adaptive MCMC samplers
1. an Adaptive MCMC algorithm
2. an Interacting MCMC algorithm

Example 1: The Adaptive Metropolis [Haario et al. (1999)]
Consider the Metropolis-Hastings algorithm with
- target density $\pi$ on $\mathsf{X} \subseteq \mathbb{R}^d$ (density w.r.t. the Lebesgue measure),
- Gaussian proposal $q_\theta(x, y) = \mathcal{N}_d(x, \theta)[y]$.
→ How to choose the design parameter $\theta$?
Answer: the covariance matrix of $\pi$, up to a scalar [Roberts et al. (1997)], iteratively estimated by the empirical covariance matrix or a robust estimator:
$$\mu_{n+1} = \mu_n + \frac{1}{n+1} \left( X_{n+1} - \mu_n \right),$$
$$\theta_{n+1} = \frac{n}{n+1}\, \theta_n + \frac{1}{n+1} \left\{ (X_{n+1} - \mu_{n+1})(X_{n+1} - \mu_{n+1})^T + \kappa\, \mathrm{Id}_d \right\}.$$
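The recursion above can be sketched in a few lines of NumPy. This is only an illustration: the function name `am_update` and the default value of `kappa` are assumptions, not part of the slides.

```python
import numpy as np

def am_update(theta, mu, x_new, n, kappa=1e-6):
    """One step (theta_n, mu_n, X_{n+1}) -> (theta_{n+1}, mu_{n+1}).

    kappa is the regularisation constant in front of Id_d; its default
    value here is an arbitrary illustrative choice.
    """
    x_new = np.asarray(x_new, dtype=float)
    d = x_new.shape[0]
    # mu_{n+1} = mu_n + (X_{n+1} - mu_n) / (n + 1)
    mu_next = mu + (x_new - mu) / (n + 1)
    # theta_{n+1} = n/(n+1) theta_n + 1/(n+1) {dev dev^T + kappa Id_d}
    dev = x_new - mu_next
    theta_next = (n * theta + np.outer(dev, dev) + kappa * np.eye(d)) / (n + 1)
    return theta_next, mu_next
```

Both updates cost $O(d^2)$ per iteration and need only the current pair $(\theta_n, \mu_n)$, so the past samples do not have to be stored.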

The Adaptive Metropolis
This yields the adaptive Metropolis algorithm: iteratively,
- draw $X_{n+1} \sim P_{\theta_n}(X_n, \cdot)$, where $P_{\theta_n}$ is the transition kernel of a Hastings-Metropolis algorithm with Gaussian proposal whose covariance matrix is $\propto \theta_n$;
- update the parameter $\theta_{n+1}$, based on $\theta_n$ and $X_{1:n+1}$.
In this example,
- $\pi P_\theta = \pi$ for every $\theta$, i.e. all the kernels have the same invariant distribution;
- $\theta_n \in \Theta$, where $\Theta$ is a finite-dimensional parameter space.
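The two steps above can be put together as follows. This is a minimal sketch, not the authors' implementation: the function name, the initial values $\theta = \mathrm{Id}_d$ and $\mu = x_0$, the value of `kappa`, and the scaling factor $2.38^2/d$ (a common choice for the "up to a scalar" constant) are all assumptions.

```python
import numpy as np

def adaptive_metropolis(log_pi, x0, n_iter, kappa=1e-6, scale=None, seed=0):
    """Adaptive Metropolis sketch: random-walk proposal N(x, scale * theta_n)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.shape[0]
    # Assumed default for the proportionality constant: 2.38^2 / d.
    scale = 2.38 ** 2 / d if scale is None else scale
    mu, theta = x.copy(), np.eye(d)          # assumed initialisation
    chain = np.empty((n_iter, d))
    for n in range(1, n_iter + 1):
        # Draw X_{n+1} ~ P_{theta_n}(X_n, .): symmetric Gaussian proposal,
        # so the acceptance ratio reduces to pi(y) / pi(x).
        y = rng.multivariate_normal(x, scale * theta)
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            x = y
        chain[n - 1] = x
        # Update (mu, theta) with the recursive mean / covariance estimator.
        mu = mu + (x - mu) / (n + 1)
        dev = x - mu
        theta = (n * theta + np.outer(dev, dev) + kappa * np.eye(d)) / (n + 1)
    return chain, theta
```

A call such as `adaptive_metropolis(lambda z: -0.5 * z @ z, np.zeros(2), 5000)` targets a standard Gaussian in dimension 2 and returns the chain together with the last covariance estimate.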

Example 2: The Equi-Energy sampler (simplified) [Kou et al. (2006)]
→ For the simulation of a multi-modal density $\pi$.
[Figure: four panels titled "Hastings-Metropolis", "Auxiliary process", "Equi-Energy, with selection" and "Equi-Energy, without selection".]

The Equi-Energy sampler (simplified)
Let
- a transition kernel $P$ such that $\pi P = \pi$,
- a probability of swap $\epsilon \in (0, 1)$,
- an auxiliary process $\{Y_n, n \geq 0\}$ that "targets" the tempered density $\pi^{1-\beta}$ ($\beta \in (0, 1)$).
Define iteratively the process of interest $\{X_n, n \geq 0\}$:
- with probability $(1 - \epsilon)$: draw $X_{n+1} \sim P(X_n, \cdot)$;
- with probability $\epsilon$: draw $Y$ at random among the past values $Y_{0:n}$ and accept or reject $Y$ as the new value, with an acceptance-rejection step.

The Equi-Energy sampler (simplified)
This yields the (simplified) Equi-Energy sampler: $X_{n+1} \sim P_{\theta_n}(X_n, \cdot)$, where
$$\theta_{n+1} = \frac{1}{n+1} \sum_{k=0}^{n} \delta_{Y_k},$$
$$P_\theta(x, A) = (1 - \epsilon)\, P(x, A) + \epsilon \left\{ \int_A \alpha(x, y)\, \theta(dy) + \mathbb{1}_A(x) \int \left(1 - \alpha(x, y)\right) \theta(dy) \right\},$$
and $\alpha(x, y)$ is defined such that $\pi P_{\theta_\star} = \pi$, where $\theta_\star = \lim_n \theta_n \propto \pi^{1-\beta}$.
In this example,
- $\pi_\theta P_\theta = \pi_\theta$, i.e. the invariant distributions depend upon $\theta$;
- $\theta_n \in \Theta$, where $\Theta$ is an infinite-dimensional parameter space.
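A one-dimensional sketch of this sampler is given below. Several ingredients are assumptions, since the slides do not spell them out: $P$ and the auxiliary chain are taken to be random-walk Hastings-Metropolis kernels, and the acceptance probability is taken as $\alpha(x, y) = 1 \wedge (\pi(y)/\pi(x))^{\beta}$, which is the independence Hastings-Metropolis ratio for target $\pi$ and proposal $\propto \pi^{1-\beta}$. Function names and default values are illustrative.

```python
import numpy as np

def rw_mh_step(x, log_target, step, rng):
    """One random-walk Hastings-Metropolis step (assumed choice of kernel)."""
    y = x + step * rng.standard_normal()
    if np.log(rng.uniform()) < log_target(y) - log_target(x):
        return y
    return x

def simplified_equi_energy(log_pi, n_iter, beta=0.5, eps=0.1, step=1.0,
                           x0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    log_aux = lambda z: (1.0 - beta) * log_pi(z)   # tempered target pi^(1-beta)
    x, y = x0, x0
    xs = np.empty(n_iter)
    ys = np.empty(n_iter + 1)
    ys[0] = y
    for n in range(n_iter):
        if rng.uniform() < eps:
            # Interaction step: propose from the empirical measure of Y_{0:n} ...
            z = ys[rng.integers(0, n + 1)]
            # ... and accept with probability 1 ^ (pi(z) / pi(x))^beta (assumed).
            if np.log(rng.uniform()) < beta * (log_pi(z) - log_pi(x)):
                x = z
        else:
            # Local step: X_{n+1} ~ P(X_n, .), with P invariant for pi.
            x = rw_mh_step(x, log_pi, step, rng)
        xs[n] = x
        # Advance the auxiliary process targeting pi^(1-beta).
        y = rw_mh_step(y, log_aux, step, rng)
        ys[n + 1] = y
    return xs, ys
```

The whole past of the auxiliary process is stored because $\theta_n$ is an empirical measure: this is where the infinite-dimensional parameter shows up in practice.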

Conclusion
Let $\{P_\theta, \theta \in \Theta\}$ be a family of transition kernels on $\mathsf{X}$. Consider an $\mathsf{X} \times \Theta$-valued process $\{(X_n, \theta_n), n \geq 0\}$ that is adapted to a filtration $\{\mathcal{F}_n, n \geq 0\}$ and satisfies
$$\mathbb{P}(X_{n+1} \in A \mid \mathcal{F}_n) = P_{\theta_n}(X_n, A).$$
→ What kind of conditions on the adaptation mechanism $\{\theta_n, n \geq 0\}$ and on the transition kernels $\{P_\theta, \theta \in \Theta\}$ imply that there exists a distribution $\pi$ such that
- convergence of the marginals: $\mathbb{E}[f(X_n)] \to \pi(f)$, for $f$ bounded;
- law of large numbers: $n^{-1} \sum_{k=1}^{n} f(X_k) \xrightarrow{\text{a.s.}} \pi(f)$, for $f$ unbounded?

II. Convergence of the marginals for adaptive MCMC samplers
We want, for a bounded function $f$, $\mathbb{E}[f(X_n)] - \pi(f) \to 0$.
Even when all the kernels $P_\theta$ have the same invariant distribution $\pi$, ergodicity is NOT guaranteed, because the kernels are chosen at random. Conditions on the adaptation mechanism are required.
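A classical two-state illustration of this failure can be checked numerically. The example is not taken from the slides; the state space $\{0, 1\}$, the kernels $P_\theta$ and the adaptation rule $\theta_n = \theta(X_n)$ below are assumptions chosen for the illustration. Every $P_\theta$ is symmetric, hence has the uniform distribution as invariant distribution, yet the adaptive chain converges to a different limit.

```python
import numpy as np

def stationary_distribution(K):
    """Left eigenvector of K for the eigenvalue 1, normalised to sum to 1."""
    w, v = np.linalg.eig(K.T)
    p = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return p / p.sum()

def P(theta):
    """Symmetric two-state kernel: switch state with probability theta."""
    return np.array([[1 - theta, theta], [theta, 1 - theta]])

# Adaptation rule (hypothetical): theta_n depends on the current state X_n.
theta_of_state = {0: 0.2, 1: 0.8}

# With this rule X_n is itself a Markov chain whose row x is row x of P(theta(x)).
K_adaptive = np.vstack([P(theta_of_state[x])[x] for x in (0, 1)])

pi_fixed = stationary_distribution(P(0.2))         # [0.5, 0.5]
pi_adaptive = stationary_distribution(K_adaptive)  # [0.8, 0.2]
```

Here the adaptation never vanishes, which is exactly what a condition on the adaptation mechanism is meant to rule out.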

Sketch of the proof
We write
$$\mathbb{E}[f(X_n)] - \pi(f) = \mathbb{E}\left[ f(X_n) - P^N_{\theta_{n-N}} f(X_{n-N}) \right] + \mathbb{E}\left[ P^N_{\theta_{n-N}} f(X_{n-N}) - \pi_{\theta_{n-N}}(f) \right] + \mathbb{E}\left[ \pi_{\theta_{n-N}}(f) - \pi(f) \right].$$
→ [A] condition on the ergodicity of the transition kernels (second term).
"Usually", the transition kernels $\{P_\theta, \theta \in \Theta\}$ are geometrically ergodic:
$$\sup_{f, |f| \leq 1} \left| P^n_\theta f(x) - \pi_\theta(f) \right| \leq C_\theta\, \rho_\theta^n\, V(x), \qquad \rho_\theta \in (0, 1).$$
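The geometric bound in [A] can be checked by hand on a toy kernel. The two-state kernel below is an assumption made for the illustration (it is not in the slides); for it the bound holds with equality, with $C_\theta = 1$, $V \equiv 1$ and $\rho_\theta = |1 - 2\theta|$.

```python
import numpy as np

def sup_f_gap(P, pi, n):
    """max_x sup_{|f|<=1} |P^n f(x) - pi(f)|, i.e. max_x sum_y |P^n(x,y) - pi(y)|."""
    Pn = np.linalg.matrix_power(P, n)
    return np.abs(Pn - pi).sum(axis=1).max()

theta = 0.2
P = np.array([[1 - theta, theta], [theta, 1 - theta]])  # toy symmetric kernel
pi = np.array([0.5, 0.5])                               # its invariant distribution
rho = abs(1 - 2 * theta)                                # second eigenvalue modulus

gaps = [sup_f_gap(P, pi, n) for n in range(1, 6)]
bounds = [rho ** n for n in range(1, 6)]                # C = 1, V = 1
```

On a finite state space the supremum over $|f| \leq 1$ is attained at $f(y) = \mathrm{sign}(P^n(x, y) - \pi(y))$, which is why the gap reduces to a row-wise $\ell^1$ distance.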

Sketch of the proof (continued)
→ [B] condition on the adaptation mechanism, since the first term of the decomposition satisfies
$$\left| \mathbb{E}\left[ f(X_n) - P^N_{\theta_{n-N}} f(X_{n-N}) \right] \right| \leq \sum_{j=1}^{N-1} (N - j)\, \mathbb{E}\left[ \sup_x \left\| P_{\theta_{n-N+j}}(x, \cdot) - P_{\theta_{n-N+j-1}}(x, \cdot) \right\|_{\mathrm{TV}} \right].$$

Sketch of the proof (continued)
→ [C] when $\pi_\theta \neq \pi$, condition on the convergence of $\{\pi_{\theta_n}, n \geq 0\}$ to $\pi$ (third term of the decomposition, $\mathbb{E}\left[ \pi_{\theta_{n-N}}(f) - \pi(f) \right]$).
