SLIDE 1

Limit theorems for adaptive MCMC algorithms

Gersende FORT

LTCI CNRS - TELECOM ParisTech

In collaboration with Yves ATCHADE (Univ. Michigan, US), Eric MOULINES (TELECOM ParisTech) and Pierre PRIOURET (Univ. Paris 6).

SLIDE 2

Markov chain Monte Carlo (MCMC) algorithms: algorithms to sample from a target density π

◮ in some applications, π is known only up to a (normalizing) constant;
◮ π is complex, so that exact sampling from π is not possible.

SLIDE 3

Define a Markov chain {Xn, n ≥ 0} with transition kernel P:

    E[f(Xn+1) | Fn] = ∫ f(y) P(Xn, dy)

so that

◮ for any bounded function f: lim_n Ex[f(Xn)] = π(f);
◮ for any function f increasing like ···: n⁻¹ ∑_{k=1}^{n} f(Xk) → π(f) a.s.;
◮ ···

SLIDE 4

I. Adaptive MCMC:

◮ why?
◮ does the process {Xn, n ≥ 0} approximate π?

SLIDE 5

1.1. Symmetric Random Walk Hastings-Metropolis algorithm

An example of transition kernel P is described by the following algorithm:

◮ Choose: a proposal density q.
◮ Iterate: starting from Xn,
  ◮ draw (an increment) Yn+1 ∼ q(·);
  ◮ compute the acceptance ratio

        α(Xn, Xn + Yn+1) := 1 ∧ π(Xn + Yn+1) / π(Xn);

  ◮ set

        Xn+1 = Xn + Yn+1   with probability α(Xn, Xn + Yn+1),
        Xn+1 = Xn          with probability 1 − α(Xn, Xn + Yn+1).

SLIDE 6

The efficiency of the algorithm depends upon the proposal q.
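The steps above can be sketched in a few lines. This is an illustrative implementation, not from the slides: the standard Gaussian target and the proposal standard deviation are arbitrary choices.

```python
import numpy as np

def rwm_step(x, log_pi, sigma, rng):
    """One symmetric random-walk Metropolis step with increment N(0, sigma^2)."""
    y = x + sigma * rng.standard_normal()        # draw the increment Y_{n+1} ~ q
    log_alpha = min(0.0, log_pi(y) - log_pi(x))  # alpha = 1 ∧ pi(y)/pi(x), in log scale
    return y if np.log(rng.uniform()) < log_alpha else x

# Usage: sample from a standard Gaussian target (an arbitrary choice).
log_pi = lambda x: -0.5 * x * x
rng = np.random.default_rng(0)
x, chain = 0.0, []
for _ in range(10_000):
    x = rwm_step(x, log_pi, sigma=2.38, rng=rng)
    chain.append(x)
```

Working in log scale avoids overflow when π is known only up to a constant: the unknown normalizing constant cancels in the ratio.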

SLIDE 7

1.2. On the choice of the variance of the proposal distribution

For example, when q is Gaussian, how should its variance matrix Σq be chosen?

SLIDE 8

◮ When π ∼ N_d(µπ, Σπ), the optimal choice for the variance of q is

    Σq = (2.38)² d⁻¹ Σπ.

Results obtained by the 'scaling' technique (see also 'fluid limit'). Generalizations exist (other MCMC algorithms; relaxed conditions on π): Roberts-Rosenthal (2001); Bédard (2007); Fort-Moulines-Priouret (2008).

◮ This suggests an adaptive procedure: learn Σπ "on the fly" and modify the variance Σq continuously during the run of the algorithm.

SLIDE 9

◮ This suggests an adaptive procedure: learn Σπ "on the fly" and modify the variance Σq continuously during the run of the algorithm.

Example: at each iteration, choose q equal to

    0.95 N(0, (2.38)² d⁻¹ Σ̂n) + 0.05 N(0, (0.1)² d⁻¹ I_d)

where

    Σ̂n = Σ̂n−1 + n⁻¹ ({Xn − µn}{Xn − µn}^T − Σ̂n−1),
    µn = µn−1 + n⁻¹ (Xn − µn−1).

Haario et al. (2001); Roberts-Rosenthal (2006)
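The mixture proposal and the running mean/covariance recursions above can be sketched as follows. The target, the initial values of µ and Σ̂, and the run length are illustrative assumptions, not prescribed by the slides.

```python
import numpy as np

def adaptive_metropolis(log_pi, x0, n_iter, d, seed=0):
    """Sketch of the adaptive Metropolis scheme on the slide:
    proposal = 0.95 N(0, 2.38^2/d * Sigma_n) + 0.05 N(0, 0.1^2/d * I_d),
    with mean/covariance estimates updated at rate 1/n."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    mu, Sigma = x.copy(), np.eye(d)   # initial guesses (an assumption)
    chain = []
    for n in range(1, n_iter + 1):
        # mixture proposal: adapted component + small fixed safeguard component
        if rng.uniform() < 0.95:
            y = x + rng.multivariate_normal(np.zeros(d), (2.38**2 / d) * Sigma)
        else:
            y = x + rng.multivariate_normal(np.zeros(d), (0.1**2 / d) * np.eye(d))
        if np.log(rng.uniform()) < min(0.0, log_pi(y) - log_pi(x)):
            x = y
        # recursions from the slide: mu_n first, then Sigma_n using mu_n
        mu = mu + (x - mu) / n
        diff = (x - mu).reshape(-1, 1)
        Sigma = Sigma + (diff @ diff.T - Sigma) / n
        chain.append(x.copy())
    return np.array(chain)

# Usage: a 2-d Gaussian target with variances (1, 4) (arbitrary choice).
chain = adaptive_metropolis(lambda x: -0.5 * (x[0]**2 + x[1]**2 / 4),
                            np.zeros(2), 20_000, d=2)
```

The 0.05-weight fixed component keeps the proposal non-degenerate even when Σ̂n is a poor early estimate; the 1/n step sizes make the adaptation diminish, which is exactly the Diminishing Adaptation condition discussed later.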

SLIDE 10 (figure only)

SLIDE 11

1.3. Be careful with adaptation!

The previous example illustrates the general framework:

◮ Let {Pθ, θ ∈ Θ} be a family of Markov kernels s.t. πPθ = π for any θ ∈ Θ.
◮ Define a process {(θn, Xn), n ≥ 0}:
  ◮ Xn+1 ∼ Pθn(Xn, ·);
  ◮ update θn+1 based on (θn, Xn, Xn+1) ("internal" adaptation).

Is it true that the marginal {Xn,n ≥ 0} approximates π?

SLIDE 12


Is it true that the marginal {Xn, n ≥ 0} approximates π? Not always, unfortunately. For θ ∈ ]0,1[, let

    Pθ = [ 1−θ    θ  ]          π = [ 1/2  1/2 ]
         [  θ    1−θ ]

Let t1, t2 ∈ ]0,1[ with t1 ≠ t2, and set θk = ti iff Xk = i. Then {Xn, n ≥ 0} is Markov with invariant probability π̃ ∝ [t2  t1]^T ≠ π.
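The counterexample can be checked numerically; the values of t1 and t2 below are arbitrary illustrative choices.

```python
import numpy as np

# Each fixed kernel P_theta = [[1-theta, theta], [theta, 1-theta]] preserves
# pi = [1/2, 1/2], yet the adaptive rule theta_k = t_i iff X_k = i induces a
# Markov chain whose rows come from P_{t1} (state 1) and P_{t2} (state 2),
# with invariant law proportional to [t2, t1].
t1, t2 = 0.2, 0.6          # arbitrary values in ]0,1[ with t1 != t2

for theta in (t1, t2):     # pi * P_theta = pi for each *fixed* theta
    P_theta = np.array([[1 - theta, theta], [theta, 1 - theta]])
    assert np.allclose(np.array([0.5, 0.5]) @ P_theta, [0.5, 0.5])

# transition matrix of the induced marginal chain
P = np.array([[1 - t1, t1],
              [t2, 1 - t2]])
vals, vecs = np.linalg.eig(P.T)                      # left eigenvectors of P
pi_tilde = np.real(vecs[:, np.argmax(np.real(vals))])
pi_tilde = pi_tilde / pi_tilde.sum()
print(pi_tilde)            # proportional to [t2, t1]: here [0.75, 0.25]
```

So although every frozen kernel leaves π invariant, the adaptively driven marginal chain converges to a different distribution: adaptation alone does not preserve the target.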

SLIDE 13

II. Sufficient conditions for convergence of adaptive schemes {(θn, Xn), n ≥ 0}:

◮ convergence of the marginals {Xn, n ≥ 0};
◮ law of large numbers w.r.t. {Xn, n ≥ 0}.

SLIDE 14

2.1. Convergence of the marginals: sufficient conditions

Let

◮ a family of Markov kernels {Pθ, θ ∈ Θ} s.t. Pθ has a unique invariant probability measure πθ;
◮ a filtration Fn and a process {(Xn, θn), n ≥ 0} s.t. for any f ≥ 0,

    E[f(Xn+1) | Fn] = ∫ f(y) Pθn(Xn, dy)    P-a.s.

Given a target density π⋆, which set of conditions will imply

    lim_n sup_{f : |f|∞ ≤ 1} |E[f(Xn)] − π⋆(f)| = 0 ?

SLIDE 15

Idea:

    E[f(Xn)] − π⋆(f) = E[E[f(Xn) | Fn−N]] − π⋆(f)
      = E[ E[f(Xn) | Fn−N] − P^N_{θn−N} f(Xn−N) ]
      + E[ P^N_{θn−N} f(Xn−N) − π_{θn−N}(f) ]
      + E[ π_{θn−N}(f) − π⋆(f) ]
SLIDE 16

i.e. conditions on

◮ (Diminishing Adaptation) the difference ∥Pθn(x, ·) − Pθn−1(x, ·)∥TV;
◮ (ergodicity of Pθ / Containment) the convergence of ∥P^N_θ(x, ·) − πθ∥TV as N → +∞;
◮ (convergence of the stationary measures) the convergence of πθn(f) − π⋆(f) as n → +∞.
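For intuition (not on the slides), the first two quantities can be computed exactly in the two-state family Pθ from the earlier counterexample; the values of θn, θn−1 and N below are arbitrary.

```python
import numpy as np

# Two-state family P_theta = [[1-theta, theta], [theta, 1-theta]]: the TV
# quantities in the Diminishing Adaptation and Containment conditions are
# computable in closed form, so we can check them numerically.
def P(theta):
    return np.array([[1 - theta, theta], [theta, 1 - theta]])

def tv(p, q):  # total variation distance between two probability vectors
    return 0.5 * np.abs(p - q).sum()

# Diminishing Adaptation: sup_x ||P_{theta_n}(x,.) - P_{theta_{n-1}}(x,.)||_TV
theta_n, theta_prev = 0.30, 0.31                 # arbitrary nearby parameters
da = max(tv(P(theta_n)[x], P(theta_prev)[x]) for x in (0, 1))

# Ergodicity/Containment: ||P_theta^N(x,.) - pi_theta||_TV -> 0 as N grows
pi_theta = np.array([0.5, 0.5])                  # invariant for every theta
PN = np.linalg.matrix_power(P(theta_n), 20)      # N = 20 steps
erg = max(tv(PN[x], pi_theta) for x in (0, 1))

print(da, erg)   # da = |theta_n - theta_prev|; erg decays like (1-2*theta)^N
```

Here the DA quantity is exactly |θn − θn−1|, so adaptation with diminishing step sizes makes it vanish, while the ergodicity term decays geometrically at rate |1 − 2θ|.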

SLIDE 17

Set Mǫ(x, θ) := inf{n ≥ 1 : ∥P^n_θ(x, ·) − πθ∥TV ≤ ǫ}.

Theorem
Assume
(i) (D.A. cond) sup_x ∥Pθn(x, ·) − Pθn−1(x, ·)∥TV →P 0;
(ii) (C. cond) ∀ǫ > 0, lim_M sup_n P(Mǫ(Xn, θn) ≥ M) = 0;
(iii) πθ = π⋆ for all θ.
Then lim_n sup_{f : |f|∞ ≤ 1} |E[f(Xn)] − π⋆(f)| = 0.

SLIDE 18

Theorem (with condition (iii) relaxed)
Assume
(i) (D.A. cond) sup_x ∥Pθn(x, ·) − Pθn−1(x, ·)∥TV →P 0;
(ii) (C. cond) ∀ǫ > 0, lim_M sup_n P(Mǫ(Xn, θn) ≥ M) = 0;
(iii) ∀ǫ > 0, sup_{f∈F} P(|πθn(f) − π⋆(f)| > ǫ) → 0.
Then lim_n sup_{f∈F} |E[f(Xn)] − π⋆(f)| = 0.

SLIDE 19

2.2. Convergence of the marginals: in practice

It is sufficient to establish

◮ (D.A. cond): problem specific;
◮ (C. cond): a uniform-in-θ drift condition (geometric or sub-geometric drift) and a uniform-in-θ minorization of the transition kernel (Roberts-Rosenthal (2007); Bai (2009); Atchadé-Fort (2009));
◮ (Cvg of πθn): ∃ θ⋆ and a set A with P(A) = 1 s.t.

    ∀ω ∈ A, ∀x, ∀B:  lim_n Pθn(ω)(x, B) = Pθ⋆(x, B).

SLIDE 20

3.1. Strong law of large numbers: sufficient conditions

Let

◮ a family of Markov kernels {Pθ, θ ∈ Θ} s.t. Pθ has a unique invariant probability measure πθ;
◮ a filtration Fn and a process {(Xn, θn), n ≥ 0} s.t. for any f ≥ 0,

    E[f(Xn+1) | Fn] = ∫ f(y) Pθn(Xn, dy)    P-a.s.

Given a target density π⋆, which set of conditions will imply

    n⁻¹ ∑_{k=1}^{n} f(Xk) → π⋆(f)    P-a.s.

for a large class of functions f?

SLIDE 21

Idea:

    n⁻¹ ∑_{k=1}^{n} f(Xk) − π⋆(f)
      = n⁻¹ ∑_{k=1}^{n} {f(Xk) − π_{θk−1}(f)} + n⁻¹ ∑_{k=0}^{n−1} {π_{θk}(f) − π⋆(f)}
      = Mn(f) + Rn(f) + n⁻¹ ∑_{k=0}^{n−1} {π_{θk}(f) − π⋆(f)}

where Mn is a martingale and Rn a residual term.

SLIDE 22

i.e. conditions for

◮ a.s. convergence of the martingale: from conditions on Lp-moments of its increments (p > 1);
◮ a.s. convergence of the residual terms: from a strengthened diminishing adaptation condition (↔ conditions on the regularity in θ of the solution of the Poisson equation);
◮ a.s. convergence of the stationary measures: from the "a.s." convergence of Pθn(x, B) to Pθ⋆(x, B).

SLIDE 23

3.2. Strong law of large numbers: in practice

It is sufficient to establish

◮ (strengthened D.A. cond): problem specific;
◮ (C. cond): a uniform-in-θ drift condition (geometric or sub-geometric drift) and a uniform-in-θ minorization of the transition kernel (Roberts-Rosenthal (2007); Bai (2009); Atchadé-Fort (2009));
◮ (Cvg of πθn): ∃ θ⋆ and a set A with P(A) = 1 s.t.

    ∀ω ∈ A, ∀x, ∀B:  lim_n Pθn(ω)(x, B) = Pθ⋆(x, B).

SLIDE 24

When the drift condition is of the form:

◮ (Geom) PθV ≤ λV + b ✶C: strong law of large numbers for functions increasing like V^α, for any α ∈ [0, 1[;
◮ (Sub-Geom) PθV ≤ V − c V^{1−α} + b ✶C: strong law of large numbers for functions increasing like V^β, for any β ∈ [0, 1 − α[.
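Such a strong law can be illustrated empirically. The sketch below is not taken from the slides: the adaptation rule (a Robbins-Monro update of the log proposal scale toward acceptance rate 0.44) and the N(0,1) target are assumptions chosen for simplicity, with 1/n step sizes so that adaptation diminishes.

```python
import numpy as np

# Empirical LLN check for a 1-d adaptive random-walk Metropolis targeting
# N(0,1): the log proposal scale is adapted toward acceptance rate 0.44
# with step sizes 1/n, and we average f(x) = x^2, for which pi*(f) = 1.
rng = np.random.default_rng(1)
log_pi = lambda x: -0.5 * x * x
x, log_sigma = 0.0, 0.0
running_sum, n_iter = 0.0, 50_000
for n in range(1, n_iter + 1):
    y = x + np.exp(log_sigma) * rng.standard_normal()
    alpha = min(1.0, np.exp(log_pi(y) - log_pi(x)))
    if rng.uniform() < alpha:
        x = y
    log_sigma += (alpha - 0.44) / n     # diminishing adaptation step
    running_sum += x * x                # accumulate f(X_k) = X_k^2
print(running_sum / n_iter)             # empirical average, close to pi*(f) = 1
```

Note that f(x) = x² grows like V^α for a standard geometric drift function V, so it falls inside the class of functions covered by the (Geom) case above.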

SLIDE 25

Conclusion

We provide answers to the problem: given

◮ a family of Markov kernels {Pθ, θ ∈ Θ} s.t. Pθ has a unique invariant probability distribution πθ;
◮ a filtration Fn and a process {(Xn, θn), n ≥ 0} s.t. for any f ≥ 0,

    E[f(Xn+1) | Fn] = ∫ f(y) Pθn(Xn, dy)    P-a.s.,

which set of conditions will imply

◮ convergence of the distribution of {Xn, n ≥ 0} to some probability π⋆;
◮ convergence of the empirical distribution n⁻¹ ∑_{k=1}^{n} δ_{Xk}.

Application: convergence of "internal" and "external" adaptive MCMC.

Details in:

◮ Y. Atchadé, G. Fort. Limit theorems for some adaptive MCMC algorithms with subgeometric kernels. Accepted in Bernoulli, 2009.
◮ Y. Atchadé, G. Fort, E. Moulines, P. Priouret. Adaptive MCMC: theory and practice. Submitted.