Optimal scaling of the transient phase of Metropolis Hastings

Optimal scaling of the transient phase of Metropolis Hastings algorithms
Tony Lelièvre, Ecole des Ponts and INRIA
Joint work with B. Jourdain and B. Miasojedow
MCMSki, Chamonix, 8 January 2014

Outline of the talk
• Introduction
• Optimal scaling of the transient phase of RWMH
• Long-time convergence of the nonlinear SDE
• Optimization strategies for the RWMH algorithm

Introduction

Metropolis Hastings algorithm
The aim of the MH algorithm is to sample a target probability measure, say with density $p$ on $\mathbb{R}^n$. Algorithm: iterate on $k \geq 0$,
• Proposal: at time $k$, given $X^n_k$, propose a move to $\hat X^n_{k+1} \sim q(X^n_k, y)\,dy$, where $q(x,y)$ is a Markov density kernel on $\mathbb{R}^n$.
• Acceptance/rejection: accept the move ($X^n_{k+1} = \hat X^n_{k+1}$) with probability $\alpha(X^n_k, \hat X^n_{k+1})$, where $\alpha(x,y) := \frac{p(y)\,q(y,x)}{p(x)\,q(x,y)} \wedge 1$. Otherwise, reject the move ($X^n_{k+1} = X^n_k$).
$(X^n_k)_{k \geq 0}$ is a reversible Markov chain with respect to $p(x)\,dx$. The efficiency of the algorithm crucially depends on the choice of the proposal distribution $q$.
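The two steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `mh_step` and the callables `log_p`, `sample_q` and `log_q` are hypothetical, and working with logarithms (with $p$ and $q$ known only up to constants that do not depend on the state) is an implementation choice made here to avoid overflow.

```python
import math
import random

def mh_step(x, log_p, sample_q, log_q, rng):
    """One Metropolis Hastings step from the current state x (hypothetical helper).

    log_p(x): log of the target density p, up to an additive constant
    sample_q(x, rng): draws a proposal y ~ q(x, y) dy
    log_q(x, y): log of the proposal density q(x, y), up to a state-independent constant
    Returns (next_state, accepted).
    """
    y = sample_q(x, rng)
    # log alpha(x, y) = min(0, log[p(y) q(y, x)] - log[p(x) q(x, y)])
    log_alpha = min(0.0, log_p(y) + log_q(y, x) - log_p(x) - log_q(x, y))
    if rng.random() < math.exp(log_alpha):
        return y, True   # accept: X_{k+1} = proposed point
    return x, False      # reject: X_{k+1} = X_k


# Usage: 1D standard Gaussian target with a Gaussian random walk proposal.
def log_p_gauss(x):
    return -0.5 * x * x

def sample_rw(x, rng, sigma=1.0):
    return x + sigma * rng.gauss(0.0, 1.0)

def log_q_rw(x, y, sigma=1.0):
    return -0.5 * (y - x) ** 2 / sigma ** 2

rng = random.Random(0)
x, samples = 0.0, []
for k in range(20000):
    x, accepted = mh_step(x, log_p_gauss, sample_rw, log_q_rw, rng)
    samples.append(x)
```

Because the random walk kernel is symmetric, the two `log_q_rw` terms cancel here; they are kept so that the same function also covers non-symmetric proposals.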

Metropolis Hastings algorithm
In the following, we focus on the Gaussian random walk proposal (RWM):
• $\hat X^n_{k+1} = X^n_k + \sigma G_{k+1}$, where $(G_k)_{k \geq 1}$ are i.i.d. $\sim \mathcal{N}_n(0, I_n)$;
• $q(x,y) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{|x-y|^2}{2\sigma^2}\right) = q(y,x)$;
• acceptance probability $\alpha(x,y) = \frac{p(y)}{p(x)} \wedge 1$.
Another standard choice is one step of the overdamped Langevin dynamics (MALA):
• $\hat X^n_{k+1} = X^n_k + \frac{\sigma^2}{2}(\nabla \ln p)(X^n_k) + \sigma G_{k+1}$, where $(G_k)_{k \geq 1}$ are i.i.d. $\sim \mathcal{N}_n(0, I_n)$;
• $q(x,y) \neq q(y,x)$.
Question: how to choose $\sigma$ as a function of the dimension $n$?
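Both proposals can be sketched for a target on $\mathbb{R}^n$ given through $\ln p$ (and $\nabla \ln p$ for MALA). This is an illustrative sketch under our own conventions: the function names are hypothetical, states are plain Python lists, and densities are handled in log form. It makes concrete that the RWM acceptance ratio only involves $p(y)/p(x)$, whereas MALA must keep the full Hastings ratio because $q(x,y) \neq q(y,x)$.

```python
import math
import random

def rwm_step(x, log_p, sigma, rng):
    """Gaussian random walk Metropolis step: y = x + sigma * G, G ~ N_n(0, I_n)."""
    y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    # q is symmetric, so alpha(x, y) = min(1, p(y) / p(x)).
    log_alpha = min(0.0, log_p(y) - log_p(x))
    return (y, True) if rng.random() < math.exp(log_alpha) else (x, False)

def mala_log_q(x, y, grad_log_p, sigma):
    """log q(x, y) for y ~ N(x + sigma^2 / 2 * grad log p(x), sigma^2 I_n),
    dropping the normalizing constant, which is the same for q(x, y) and q(y, x)."""
    g = grad_log_p(x)
    return -sum((yi - xi - 0.5 * sigma ** 2 * gi) ** 2
                for xi, yi, gi in zip(x, y, g)) / (2.0 * sigma ** 2)

def mala_step(x, log_p, grad_log_p, sigma, rng):
    """One MALA step: Langevin drift plus Gaussian noise, then Hastings correction."""
    g = grad_log_p(x)
    y = [xi + 0.5 * sigma ** 2 * gi + sigma * rng.gauss(0.0, 1.0)
         for xi, gi in zip(x, g)]
    # q(x, y) != q(y, x): both proposal densities enter the ratio.
    log_alpha = min(0.0, log_p(y) + mala_log_q(y, x, grad_log_p, sigma)
                    - log_p(x) - mala_log_q(x, y, grad_log_p, sigma))
    return (y, True) if rng.random() < math.exp(log_alpha) else (x, False)


# Usage: standard Gaussian target on R^n, i.e. V(x) = x^2 / 2 up to a constant.
def log_p(x):
    return -0.5 * sum(xi * xi for xi in x)

def grad_log_p(x):
    return [-xi for xi in x]

n = 10
rng = random.Random(0)
x_rwm = [0.0] * n
x_mala = [0.0] * n
acc_rwm = 0
acc_mala = 0
for k in range(2000):
    x_rwm, a1 = rwm_step(x_rwm, log_p, 2.38 / math.sqrt(n), rng)
    x_mala, a2 = mala_step(x_mala, log_p, grad_log_p, 1.0, rng)
    acc_rwm += a1
    acc_mala += a2
```

The step sizes in the usage lines are arbitrary illustrative values; how to choose them as $n$ grows is exactly the question raised above.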

Previous work: Roberts, Gelman, Gilks 97
Two fundamental assumptions:
• (H1) Product target: $p(x) = p(x_1, \dots, x_n) = \prod_{i=1}^n e^{-V(x_i)}$;
• (H2) Stationarity: $X^n_0 = (X^{1,n}_0, \dots, X^{n,n}_0) \sim p(x)\,dx$, and thus $\forall k$, $X^n_k = (X^{1,n}_k, \dots, X^{n,n}_k) \sim p(x)\,dx$.
Then, pick the first component $X^{1,n}_k$, choose $\sigma_n = \frac{\ell}{\sqrt n}$, and rescale the time accordingly (diffusive scaling) by considering $(X^{1,n}_{\lfloor nt \rfloor})_{t \geq 0}$. Under regularity assumptions on $V$, as $n \to \infty$, $(X^{1,n}_{\lfloor nt \rfloor})_{t \geq 0} \overset{(d)}{\Longrightarrow} (X_t)_{t \geq 0}$, the unique solution of the SDE
$$dX_t = -\frac{h(\ell)}{2} V'(X_t)\,dt + \sqrt{h(\ell)}\,dB_t,$$
where $h(\ell) = 2\ell^2\,\Phi\left(-\frac{\ell}{2}\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}\right)$ with $\Phi(x) = \int_{-\infty}^x \frac{e^{-y^2/2}}{\sqrt{2\pi}}\,dy$.
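For a given $I = \int_{\mathbb{R}} (V')^2 e^{-V}$, the speed $h(\ell)$ of the limiting diffusion can be evaluated and maximized numerically. The sketch below (hypothetical function names; standard library only, with $\Phi$ computed from `math.erfc`) uses the fact that $h$ depends on $\ell$ only through $u = \ell\sqrt{I}$, and maximizes $u \mapsto u^2\Phi(-u/2)$ by golden-section search, which is legitimate because this function is unimodal on $(0,+\infty)$.

```python
import math

def Phi(x):
    """Standard normal CDF: Phi(x) = 0.5 * erfc(-x / sqrt(2))."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def h(ell, I):
    """h(ell) = 2 ell^2 Phi(-(ell / 2) sqrt(I)), with I = int_R (V')^2 exp(-V)."""
    return 2.0 * ell ** 2 * Phi(-0.5 * ell * math.sqrt(I))

def golden_max(f, lo, hi, tol=1e-9):
    """Maximizer of a unimodal function f on [lo, hi] by golden-section search."""
    inv_gold = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_gold * (b - a)
        d = a + inv_gold * (b - a)
        if f(c) > f(d):
            b = d          # the maximizer lies in [a, d]
        else:
            a = c          # the maximizer lies in [c, b]
    return 0.5 * (a + b)

def ell_star(I):
    """Maximizer of ell -> h(ell, I).

    Since h(ell, I) = (2 / I) * u^2 * Phi(-u / 2) with u = ell * sqrt(I),
    we maximize over u once and rescale by 1 / sqrt(I).
    """
    u_star = golden_max(lambda u: u ** 2 * Phi(-0.5 * u), 1e-6, 10.0)
    return u_star / math.sqrt(I)

# Standard Gaussian target: V(x) = x^2 / 2 + const, so V'(x) = x and I = 1.
print(ell_star(1.0))   # approximately 2.38
```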

Previous work: Roberts, Gelman, Gilks 97
Practical counterparts: (i) scaling of the proposal variance, (ii) scaling of the number of iterations.
Question: how to choose $\ell$?
• The function $\ell \mapsto h(\ell) = 2\ell^2\,\Phi\left(-\frac{\ell}{2}\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}\right)$ is maximal at $\ell^\star \simeq \frac{2.38}{\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}}$.
• Besides, the average acceptance rate converges:
$$\mathbb{E}[\alpha(X^n_k, \hat X^n_{k+1})] = \int_{\mathbb{R}^n \times \mathbb{R}^n} \underbrace{\left(e^{\sum_{i=1}^n (V(x_i) - V(y_i))} \wedge 1\right)}_{\alpha(x,y)} q(x,y)\, e^{-\sum_{i=1}^n V(x_i)}\,dx\,dy \xrightarrow[n \to \infty]{} \mathrm{acc}(\ell) = 2\Phi\left(-\frac{\ell}{2}\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}\right) \in (0,1).$$
Observe that $\mathrm{acc}(\ell^\star) \simeq 0.234$, independently of $V$. This justifies a constant acceptance rate strategy, with a target acceptance rate of approximately 25%.
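The limiting acceptance rate $\mathrm{acc}(\ell)$ is easy to evaluate, and for the standard Gaussian target ($V(x) = x^2/2$ up to a constant, so that $\int_{\mathbb{R}} (V')^2 e^{-V} = 1$) it can be compared with the empirical acceptance rate of RWM started at stationarity, as in (H2). The sketch below is illustrative only: the function names, the dimension, the number of steps and the seed are arbitrary choices, and at finite $n$ the empirical value is only expected to be close to the limit, up to Monte Carlo error and finite-dimensional effects.

```python
import math
import random

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def acc_limit(ell, I):
    """Limiting mean acceptance rate acc(ell) = 2 Phi(-(ell / 2) sqrt(I)),
    where I = int_R (V')^2 exp(-V)."""
    return 2.0 * Phi(-0.5 * ell * math.sqrt(I))

def empirical_acceptance(n, ell, n_steps, seed=0):
    """Fraction of accepted moves of RWM with sigma_n = ell / sqrt(n) on the
    standard Gaussian target in R^n (V(x) = x^2 / 2 + const, so I = 1),
    started at stationarity as in (H2)."""
    rng = random.Random(seed)
    sigma = ell / math.sqrt(n)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]    # X_0 ~ p(x) dx
    accepted = 0
    for _ in range(n_steps):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        # log(p(y) / p(x)) = sum_i (V(x_i) - V(y_i))
        log_ratio = 0.5 * sum(xi * xi - yi * yi for xi, yi in zip(x, y))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
            accepted += 1
    return accepted / n_steps

limit = acc_limit(2.38, 1.0)                  # about 0.234
emp = empirical_acceptance(50, 2.38, 10000)
print(limit, emp)
```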

A few references
• (H1) + (H2) and various proposals: Gaussian RWM, Roberts, Gelman, Gilks 1997; MALA, Roberts, Rosenthal 1997; non-Gaussian RWM, Neal, Roberts 2011; RWM with discontinuous target, Neal, Roberts, Yuen 2012; multiple-try MCMC, Bédard, Douc, Moulines 2012; delayed rejection MCMC, Bédard, Douc, Moulines 2013; Hybrid Monte Carlo, Beskos, Pillai, Roberts, Sanz-Serna, Stuart 2013.
• Beyond (H1): independent but non-identically distributed components, RWM, Bédard 2007, 2009; finite-range interactions, Breyer, Roberts 2000; mean-field interaction, Breyer, Piccioni, Scarlatti 2004; density w.r.t. i.i.d., Beskos, Roberts, Stuart 2009; infinite-dimensional target with density w.r.t. a Gaussian field, RWM, Mattingly, Pillai, Stuart 2012, and MALA, Pillai, Stuart, Thiery 2012.
• Beyond (H2): Christensen, Roberts, Rosenthal 2005, partial results for RWM and MALA with Gaussian target; Pillai, Stuart, Thiery 2013, modified RWM for infinite-dimensional target with density w.r.t. a Gaussian field.
Aim of this work: study of the limit $n \to \infty$ without the stationarity assumption (H2).

Optimal scaling of the transient phase of RWMH

The limit $n \to \infty$ without (H2)
We consider the RWMH with target $p(x) = \prod_{i=1}^n \exp(-V(x_i))$: $(G^i_k)_{i,k \geq 1}$ are i.i.d. $\sim \mathcal{N}_1(0,1)$, independent of $(U_k)_{k \geq 1}$ i.i.d. $\sim \mathcal{U}[0,1]$, and
$$X^{i,n}_{k+1} = X^{i,n}_k + \frac{\ell}{\sqrt n} G^i_{k+1} \mathbf{1}_{A_{k+1}}, \quad 1 \leq i \leq n,$$
with $A_{k+1} = \left\{ U_{k+1} \leq e^{\sum_{i=1}^n \left(V(X^{i,n}_k) - V\left(X^{i,n}_k + \frac{\ell}{\sqrt n} G^i_{k+1}\right)\right)} \right\}$.
From now on, we assume that $V$ is $C^3$ with $V''$ and $V^{(3)}$ bounded.
Theorem. Assume that
1. $m$ is a probability measure on $\mathbb{R}$ s.t. $\int_{\mathbb{R}} (V')^4(x)\,m(dx) < +\infty$,
2. $\forall n \geq 1$, $X^{1,n}_0, \dots, X^{n,n}_0$ are i.i.d. according to $m$.
Then the process $(X^{1,n}_{\lfloor nt \rfloor})_{t \geq 0}$ converges in distribution to the unique solution of the SDE, nonlinear in the sense of McKean:
$$X_0 \sim m, \quad dX_t = -\mathcal{G}(a(t), b(t))\,V'(X_t)\,dt + \Gamma^{1/2}(a(t), b(t))\,dB_t,$$
with $a(t) = \mathbb{E}[(V'(X_t))^2]$, $b(t) = \mathbb{E}[V''(X_t)]$, and...
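The chain above can be simulated directly to observe the transient phase. The sketch below is not the authors' code: the function `transient_rwmh` is hypothetical, and the example assumes the Gaussian potential $V(x) = x^2/2$ (which satisfies the smoothness assumptions) together with the initial law $m = \mathcal{N}(3,1)$, chosen far from the target. Along the run it records the acceptance rate over each unit of rescaled time and the empirical average $\frac{1}{n}\sum_i V'(X^{i,n}_{\lfloor nt \rfloor})^2$ over the $n$ exchangeable components, used here only as a heuristic proxy for $a(t)$.

```python
import math
import random

def transient_rwmh(n, ell, T, V, dV, sample_m, seed=0):
    """Simulate the RWMH chain X^{i,n}_k, 1 <= i <= n, for n * T steps,
    starting from X^{i,n}_0 i.i.d. ~ m (no stationarity assumption).

    Returns two lists indexed by the rescaled time t = 0, ..., T - 1:
      acc_rates[t]: fraction of accepted moves for steps k in [n t, n (t + 1))
      a_hat[t]: (1/n) sum_i V'(X^{i,n}_{n t})^2, a heuristic proxy for a(t)
    """
    rng = random.Random(seed)
    x = [sample_m(rng) for _ in range(n)]
    sigma = ell / math.sqrt(n)            # proposal scale ell / sqrt(n)
    acc_rates, a_hat = [], []
    for t in range(T):
        a_hat.append(sum(dV(xi) ** 2 for xi in x) / n)
        accepted = 0
        for _ in range(n):                # n steps = one unit of rescaled time
            y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
            log_ratio = sum(V(xi) - V(yi) for xi, yi in zip(x, y))
            # Event A_{k+1} = {U_{k+1} <= exp(log_ratio)}; min() avoids overflow.
            if rng.random() <= math.exp(min(0.0, log_ratio)):
                x = y
                accepted += 1
        acc_rates.append(accepted / n)
    return acc_rates, a_hat

# Gaussian potential V(x) = x^2 / 2 and initial law m = N(3, 1), far from the target.
acc_rates, a_hat = transient_rwmh(
    n=100, ell=2.38, T=10,
    V=lambda x: 0.5 * x * x, dV=lambda x: x,
    sample_m=lambda rng: rng.gauss(3.0, 1.0))
print(a_hat[0], a_hat[-1])   # expected to move from about E_m[X^2] = 10 toward 1
print(acc_rates)
```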

The functions $\Gamma$ and $\mathcal{G}$
$$\Gamma(a,b) = \begin{cases} \ell^2\left[\Phi\left(-\frac{\ell b}{2\sqrt a}\right) + e^{\frac{\ell^2(a-b)}{2}}\,\Phi\left(\frac{\ell b}{2\sqrt a} - \ell\sqrt a\right)\right] & \text{if } a \in (0,+\infty),\\ \frac{\ell^2}{2} & \text{if } a = +\infty,\\ \ell^2 e^{-\frac{\ell^2 b^+}{2}}, \text{ where } b^+ = \max(b,0), & \text{if } a = 0, \end{cases}$$
$$\mathcal{G}(a,b) = \begin{cases} \ell^2 e^{\frac{\ell^2(a-b)}{2}}\,\Phi\left(\frac{\ell b}{2\sqrt a} - \ell\sqrt a\right) & \text{if } a \in (0,+\infty),\\ 0 & \text{if } a = +\infty,\\ \mathbf{1}_{\{b>0\}}\,\ell^2 e^{-\frac{\ell^2 b}{2}} & \text{if } a = 0. \end{cases}$$
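The piecewise definitions of $\Gamma$ and $\mathcal{G}$ translate directly into code. In the sketch below (hypothetical names `Gamma`, `G_drift` and `acc`; $\ell$ passed explicitly; $a = +\infty$ represented by `math.inf`), the product $e^{\ell^2(a-b)/2}\,\Phi(\cdot)$ is evaluated in log form, because for large $a$ the exponential overflows while $\Phi$ underflows; when $\Phi$ itself underflows, the standard tail approximation $\Phi(x) \approx e^{-x^2/2}/(|x|\sqrt{2\pi})$ as $x \to -\infty$ is used instead. This numerical treatment is our own choice, not taken from the slides.

```python
import math

def Phi(x):
    """Standard normal CDF, accurate in the lower tail thanks to erfc."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def _exp_times_Phi(c, x):
    """exp(c) * Phi(x), computed without forming exp(c) on its own."""
    p = Phi(x)
    if p > 0.0:
        return math.exp(c + math.log(p))
    # Phi(x) underflowed (x very negative): Phi(x) ~ exp(-x^2/2) / (|x| sqrt(2 pi)).
    return math.exp(c - 0.5 * x * x) / (-x * math.sqrt(2.0 * math.pi))

def Gamma(a, b, ell):
    """Diffusion coefficient Gamma(a, b) of the limiting nonlinear SDE."""
    if math.isinf(a):
        return ell ** 2 / 2.0
    if a == 0.0:
        return ell ** 2 * math.exp(-ell ** 2 * max(b, 0.0) / 2.0)
    sa = math.sqrt(a)
    return ell ** 2 * (Phi(-ell * b / (2.0 * sa))
                       + _exp_times_Phi(ell ** 2 * (a - b) / 2.0,
                                        ell * b / (2.0 * sa) - ell * sa))

def G_drift(a, b, ell):
    """Drift coefficient calG(a, b) of the limiting nonlinear SDE."""
    if math.isinf(a):
        return 0.0
    if a == 0.0:
        return ell ** 2 * math.exp(-ell ** 2 * b / 2.0) if b > 0.0 else 0.0
    sa = math.sqrt(a)
    return ell ** 2 * _exp_times_Phi(ell ** 2 * (a - b) / 2.0,
                                     ell * b / (2.0 * sa) - ell * sa)

def acc(a, b, ell):
    """Limiting acceptance rate acc(a, b) = Gamma(a, b) / ell^2."""
    return Gamma(a, b, ell) / ell ** 2

ell = 2.38
print(Gamma(10.0, 1.0, ell), G_drift(10.0, 1.0, ell), acc(10.0, 1.0, ell))
```

A quick sanity check is that the formula for $a \in (0,+\infty)$ approaches the separate cases $a = 0$ and $a = +\infty$ when $a$ is very small or very large.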

Remarks
• Limiting acceptance rate: $t \mapsto \mathbb{P}(A_{\lfloor nt \rfloor})$ converges to $t \mapsto \mathrm{acc}(a(t), b(t))$, where $a(t) = \mathbb{E}[(V'(X_t))^2]$, $b(t) = \mathbb{E}[V''(X_t)]$ and $\mathrm{acc}(a,b) = \frac{1}{\ell^2}\Gamma(a,b)$.
• Stationary case: if $m(dx) = e^{-V(x)}\,dx$, then $\forall t \geq 0$, $X_t \sim e^{-V(x)}\,dx$, and
$$a(t) = \mathbb{E}[(V'(X_t))^2] = \int_{\mathbb{R}} V'\,(V' e^{-V}) = \int_{\mathbb{R}} V'\,(-e^{-V})' = \int_{\mathbb{R}} V'' e^{-V} = \mathbb{E}[V''(X_t)] = b(t)$$
are constant (the third equality is an integration by parts). Using the fact that, for $a > 0$, $\Gamma(a,a) = 2\mathcal{G}(a,a) = 2\ell^2\,\Phi\left(-\frac{\ell\sqrt a}{2}\right)$, we are back to the dynamics
$$dX_t = -\frac{h(\ell)}{2} V'(X_t)\,dt + \sqrt{h(\ell)}\,dB_t \quad \text{with } h(\ell) = 2\ell^2\,\Phi\left(-\frac{\ell}{2}\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}\right).$$
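The integration by parts behind $a(t) = b(t)$ in the stationary case can be checked numerically for a non-Gaussian potential. The sketch assumes, for illustration only, $V(x) = \ln\cosh x + \ln\pi$, which has bounded $V'' = 1/\cosh^2$ and $V^{(3)}$ and for which $e^{-V} = 1/(\pi\cosh x)$ is a probability density; $V' = \tanh$ then satisfies $V' e^{-V} = -(e^{-V})'$. Composite Simpson quadrature on a truncated interval is a choice made here, not part of the slides.

```python
import math

def simpson(f, lo, hi, n_intervals=20000):
    """Composite Simpson rule for int_lo^hi f(x) dx (n_intervals must be even)."""
    if n_intervals % 2:
        raise ValueError("n_intervals must be even")
    step = (hi - lo) / n_intervals
    total = f(lo) + f(hi)
    for j in range(1, n_intervals):
        total += (4.0 if j % 2 else 2.0) * f(lo + j * step)
    return total * step / 3.0

# Illustrative potential V(x) = log(cosh x) + log(pi), so exp(-V) = 1 / (pi cosh x).
def density(x):
    return 1.0 / (math.pi * math.cosh(x))

def dV(x):
    return math.tanh(x)

def d2V(x):
    return 1.0 / math.cosh(x) ** 2

L = 40.0   # exp(-V) < 1e-17 outside [-L, L], so truncation is negligible here
mass = simpson(density, -L, L)                          # should be close to 1
a = simpson(lambda x: dV(x) ** 2 * density(x), -L, L)   # a = E[(V')^2]
b = simpson(lambda x: d2V(x) * density(x), -L, L)       # b = E[V'']
print(mass, a, b)
```

In exact arithmetic both integrals equal $\frac{1}{\pi}\int_{\mathbb{R}} \cosh^{-3} x\,dx = \frac{1}{2}$, so the stationary dynamics for this potential correspond to $h(\ell) = 2\ell^2\,\Phi\left(-\frac{\ell}{2\sqrt 2}\right)$.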
