
Optimal scaling of the transient phase of Metropolis-Hastings - PowerPoint PPT Presentation

Optimal scaling of the transient phase of Metropolis-Hastings algorithms - Tony Lelièvre


1. Optimal scaling of the transient phase of Metropolis-Hastings algorithms
Tony Lelièvre, École des Ponts and INRIA. Joint work with B. Jourdain and B. Miasojedow.
MCMSki, Chamonix, 8 January 2014.

2. Outline of the talk
• Introduction
• Optimal scaling of the transient phase of RWMH
• Longtime convergence of the nonlinear SDE
• Optimization strategies for the RWMH algorithm

3. Introduction

4. Metropolis-Hastings algorithm
The aim of the MH algorithm is to sample a target probability measure, say with density $p$ on $\mathbb{R}^n$. Algorithm: iterate on $k \ge 0$,
• Proposal: at time $k$, given $X^n_k$, propose a move to $\hat{X}^n_{k+1} \sim q(X^n_k, y)\,dy$, where $q(x,y)$ is a Markov density kernel on $\mathbb{R}^n$.
• Acceptance/rejection: accept the move ($X^n_{k+1} = \hat{X}^n_{k+1}$) with probability $\alpha(X^n_k, \hat{X}^n_{k+1})$, where
$$\alpha(x,y) := \frac{p(y)\,q(y,x)}{p(x)\,q(x,y)} \wedge 1.$$
Otherwise, reject the move ($X^n_{k+1} = X^n_k$).
$(X^n_k)_{k \ge 0}$ is a reversible Markov chain with respect to $p(x)\,dx$. The efficiency of the algorithm crucially depends on the choice of the proposal distribution $q$.
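To make the accept/reject mechanics concrete, here is a minimal Python sketch of one MH iteration with a generic proposal. The names `log_p`, `q_sample` and `q_logdensity` are illustrative placeholders (they are not notation from the slides); working with log densities avoids under/overflow in the ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_step(x, log_p, q_sample, q_logdensity):
    """One Metropolis-Hastings iteration targeting the density p.

    log_p(x): log of the target density (an additive constant is irrelevant).
    q_sample(x): draws a proposal y ~ q(x, y) dy.
    q_logdensity(x, y): log q(x, y).
    """
    y = q_sample(x)
    # log of alpha(x, y) = [p(y) q(y, x)] / [p(x) q(x, y)]  ∧  1
    log_alpha = min(0.0, log_p(y) + q_logdensity(y, x)
                         - log_p(x) - q_logdensity(x, y))
    if np.log(rng.uniform()) < log_alpha:
        return y, True   # accepted: X_{k+1} = proposed state
    return x, False      # rejected: X_{k+1} = X_k
```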

5. Metropolis-Hastings algorithm
In the following, we focus on the Gaussian random walk proposal (RWM):
• $\hat{X}^n_{k+1} = X^n_k + \sigma G_{k+1}$ where $(G_k)_{k \ge 1}$ i.i.d. $\sim \mathcal{N}_n(0, I_n)$,
• $q(x,y) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{|x-y|^2}{2\sigma^2}\right) = q(y,x)$,
• acceptance probability $\alpha(x,y) = \frac{p(y)}{p(x)} \wedge 1$.
Another standard choice: one step of overdamped Langevin (MALA):
• $\hat{X}^n_{k+1} = X^n_k + \frac{\sigma^2}{2}(\nabla \ln p)(X^n_k) + \sigma G_{k+1}$ where $(G_k)_{k \ge 1}$ i.i.d. $\sim \mathcal{N}_n(0, I_n)$,
• $q(x,y) \ne q(y,x)$.
Question: how to choose $\sigma$ as a function of the dimension $n$?
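As an illustration of the two proposals above, here is a hedged Python sketch of one RWM step and one MALA step. The callables `log_p` and `grad_log_p` stand for $\ln p$ and $\nabla \ln p$ and are assumed to be supplied by the user; the step size `sigma` is left free (its scaling with $n$ is the question addressed on the next slide).

```python
import numpy as np

rng = np.random.default_rng(1)

def rwm_step(x, log_p, sigma):
    """One RWM step: Gaussian random-walk proposal with standard deviation sigma.

    Since q is symmetric, the acceptance probability reduces to p(y)/p(x) ∧ 1.
    """
    y = x + sigma * rng.standard_normal(x.size)
    if np.log(rng.uniform()) < log_p(y) - log_p(x):
        return y, True
    return x, False

def mala_step(x, log_p, grad_log_p, sigma):
    """One MALA step: the proposal is one step of overdamped Langevin.

    Here q(x, y) != q(y, x), so the full Metropolis-Hastings ratio is needed.
    """
    def q_log(a, b):  # log q(a, b) up to an additive constant (it cancels below)
        mean = a + 0.5 * sigma**2 * grad_log_p(a)
        return -np.sum((b - mean)**2) / (2 * sigma**2)

    y = x + 0.5 * sigma**2 * grad_log_p(x) + sigma * rng.standard_normal(x.size)
    log_alpha = log_p(y) + q_log(y, x) - log_p(x) - q_log(x, y)
    if np.log(rng.uniform()) < log_alpha:
        return y, True
    return x, False
```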

6. Previous work: Roberts, Gelman, Gilks 97
Two fundamental assumptions:
• (H1) Product target: $p(x) = p(x_1, \ldots, x_n) = \prod_{i=1}^n e^{-V(x_i)}$,
• (H2) Stationarity: $X^n_0 = (X^{1,n}_0, \ldots, X^{n,n}_0) \sim p(x)\,dx$ and thus $\forall k$, $X^n_k = (X^{1,n}_k, \ldots, X^{n,n}_k) \sim p(x)\,dx$.
Then, pick the first component $X^{1,n}_k$, choose
$$\sigma_n = \frac{\ell}{\sqrt{n}},$$
and rescale the time accordingly (diffusive scaling) by considering $(X^{1,n}_{\lfloor nt \rfloor})_{t \ge 0}$.
Under regularity assumptions on $V$, as $n \to \infty$, $(X^{1,n}_{\lfloor nt \rfloor})_{t \ge 0}$ converges in distribution to $(X_t)_{t \ge 0}$, the unique solution of the SDE
$$dX_t = -\frac{h(\ell)}{2}\, V'(X_t)\,dt + \sqrt{h(\ell)}\,dB_t,$$
where $h(\ell) = 2\ell^2\, \Phi\!\left(-\frac{\ell\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}}{2}\right)$ with $\Phi(x) = \int_{-\infty}^x \frac{e^{-y^2/2}}{\sqrt{2\pi}}\,dy$.
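A minimal numerical sketch of this limiting dynamics, under the illustrative assumption (not from the slides) that $V(x) = x^2/2 + \log\sqrt{2\pi}$, so that $e^{-V}$ is the standard Gaussian density and $\int (V')^2 e^{-V} = 1$; the limiting SDE is discretized by Euler-Maruyama.

```python
import numpy as np
from scipy.stats import norm

# Illustrative potential (not from the slides): V(x) = x**2/2 + log(sqrt(2*pi)),
# so exp(-V) is the standard Gaussian density and I = int (V')**2 exp(-V) dx = 1.
I = 1.0

def h(ell):
    """Speed of the limiting diffusion: h(ell) = 2 ell^2 Phi(-ell sqrt(I) / 2)."""
    return 2 * ell**2 * norm.cdf(-ell * np.sqrt(I) / 2)

def limiting_sde_path(ell, x0=0.0, T=10.0, dt=1e-3, seed=2):
    """Euler-Maruyama discretization of dX = -(h(ell)/2) V'(X) dt + sqrt(h(ell)) dB,
    with V'(x) = x for the Gaussian potential chosen above."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        x[k + 1] = (x[k] - 0.5 * h(ell) * x[k] * dt
                    + np.sqrt(h(ell) * dt) * rng.standard_normal())
    return x
```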

7. Previous work: Roberts, Gelman, Gilks 97
Practical counterparts: (i) scaling of the variance of the proposal, (ii) scaling of the number of iterations.
Question: how to choose $\ell$?
• The function $\ell \mapsto h(\ell) = 2\ell^2\, \Phi\!\left(-\frac{\ell\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}}{2}\right)$ is maximal at $\ell^\star \simeq \frac{2.38}{\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}}$.
• Besides, the limiting average acceptance rate is
$$\mathbb{E}\big[\alpha(X^n_k, \hat{X}^n_{k+1})\big] = \int_{\mathbb{R}^n \times \mathbb{R}^n} \underbrace{\Big(e^{\sum_{i=1}^n (V(x_i) - V(y_i))} \wedge 1\Big)}_{\alpha(x,y)}\, q(x,y)\, e^{-\sum_{i=1}^n V(x_i)}\,dx\,dy \;\xrightarrow[n \to \infty]{}\; \mathrm{acc}(\ell) = 2\,\Phi\!\left(-\frac{\ell\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}}{2}\right) \in (0,1).$$
Observe that $\mathrm{acc}(\ell^\star) \simeq 0.234$, whatever $V$. This justifies a constant acceptance rate strategy, with a target acceptance rate of approximately 25%.
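The constants 2.38 and 0.234 can be recovered numerically. The sketch below maximizes $h$ over $\ell$ and evaluates the limiting acceptance rate at the maximizer; the value of $I = \int (V')^2 e^{-V}$ is set to 1 for illustration, since changing $I$ only rescales $\ell^\star$ and leaves $\mathrm{acc}(\ell^\star)$ unchanged.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

I = 1.0  # I = int (V')**2 exp(-V) dx; its value only rescales ell*, not acc(ell*)

def h(ell):
    return 2 * ell**2 * norm.cdf(-ell * np.sqrt(I) / 2)

def acc(ell):
    return 2 * norm.cdf(-ell * np.sqrt(I) / 2)

res = minimize_scalar(lambda ell: -h(ell), bounds=(1e-3, 10.0), method="bounded")
ell_star = res.x
print(ell_star * np.sqrt(I))  # ~ 2.38, i.e. ell* = 2.38 / sqrt(I)
print(acc(ell_star))          # ~ 0.234, independently of V
```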

8. A few references
• (H1) + (H2) and various proposals: Gaussian RWM Roberts, Gelman, Gilks 1997; MALA Roberts, Rosenthal 1997; non-Gaussian RWM Neal, Roberts 2011; RWM with discontinuous target Neal, Roberts, Yuen 2012; multiple-try MCMC Bédard, Douc, Moulines 2012; delayed-rejection MCMC Bédard, Douc, Moulines 2013; hybrid Monte Carlo Beskos, Pillai, Roberts, Sanz-Serna, Stuart 2013.
• Beyond (H1): i. but non-i.d. components, RWM Bédard 2007, 2009; finite-range interactions, Breyer, Roberts 2000; mean-field interaction, Breyer, Piccioni, Scarlatti 2004; density w.r.t. i.i.d., Beskos, Roberts, Stuart 2009; infinite-dimensional target with density w.r.t. a Gaussian field, RWM Mattingly, Pillai, Stuart 2012, MALA Pillai, Stuart, Thiery 2012.
• Beyond (H2): Christensen, Roberts, Rosenthal 2005, partial results for RWM and MALA with a Gaussian target; Pillai, Stuart, Thiery 2013, modified RWM for an infinite-dimensional target with density w.r.t. a Gaussian field.
Aim of this work: study the limit $n \to \infty$ without the stationarity assumption (H2).

9. Optimal scaling of the transient phase of RWMH

10. The limit $n \to \infty$ without (H2)
We consider the RWMH algorithm with target $p(x) = \prod_{i=1}^n \exp(-V(x_i))$: $(G^i_k)_{i,k \ge 1}$ are i.i.d. $\sim \mathcal{N}_1(0,1)$, independent of $(U_k)_{k \ge 1}$ i.i.d. $\sim \mathcal{U}[0,1]$, and
$$X^{i,n}_{k+1} = X^{i,n}_k + \frac{\ell}{\sqrt{n}}\, G^i_{k+1}\, \mathbf{1}_{A_{k+1}}, \quad 1 \le i \le n, \qquad \text{with } A_{k+1} = \Big\{ U_{k+1} \le e^{\sum_{i=1}^n \big( V(X^{i,n}_k) - V\big(X^{i,n}_k + \frac{\ell}{\sqrt{n}} G^i_{k+1}\big) \big)} \Big\}.$$
From now on, we assume that $V$ is $C^3$ with $V''$ and $V^{(3)}$ bounded.
Theorem. Assume that
1. $m$ is a probability measure on $\mathbb{R}$ s.t. $\int_{\mathbb{R}} (V')^4(x)\, m(dx) < +\infty$,
2. $\forall n \ge 1$, $X^{1,n}_0, \ldots, X^{n,n}_0$ are i.i.d. according to $m$.
Then the process $(X^{1,n}_{\lfloor nt \rfloor})_{t \ge 0}$ converges in distribution to the unique solution of the SDE, nonlinear in the sense of McKean,
$$X_0 \sim m, \qquad dX_t = -G(a(t), b(t))\, V'(X_t)\,dt + \Gamma^{1/2}(a(t), b(t))\,dB_t,$$
with $a(t) = \mathbb{E}[(V'(X_t))^2]$, $b(t) = \mathbb{E}[V''(X_t)]$, and...
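A simulation sketch of the pre-limit chain of the theorem, started out of stationarity. The potential $V(x) = x^2/2$ and the initial law $m = \mathcal{N}(2,1)$ are illustrative choices, not from the slides; the recorded quantities are the empirical counterparts of $a(t)$, $b(t)$ and of the running acceptance rate.

```python
import numpy as np

# Illustrative choices (not from the slides): V(x) = x**2/2, so V'(x) = x, V''(x) = 1,
# target exp(-V) proportional to N(0, 1); initial law m = N(2, 1) != exp(-V) dx.
def V(x):   return 0.5 * x**2
def Vp(x):  return x
def Vpp(x): return np.ones_like(x)

def transient_rwmh(n=1000, ell=2.38, T=5.0, seed=3):
    """Simulate the product-target RWMH of the theorem, started out of stationarity.

    Records the empirical a(t) = mean of (V')^2, b(t) = mean of V'' and the running
    acceptance rate once every n iterations, i.e. at integer times t on the
    diffusive scale k = floor(n t)."""
    rng = np.random.default_rng(seed)
    x = 2.0 + rng.standard_normal(n)            # X_0^{i,n} i.i.d. ~ m
    records, accepted = [], 0
    for k in range(int(T * n)):
        g = rng.standard_normal(n)
        y = x + ell / np.sqrt(n) * g
        if np.log(rng.uniform()) < np.sum(V(x) - V(y)):   # event A_{k+1}
            x, accepted = y, accepted + 1
        if (k + 1) % n == 0:
            records.append((np.mean(Vp(x)**2), np.mean(Vpp(x)), accepted / (k + 1)))
    return np.array(records)
```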

11. The functions $\Gamma$ and $G$
$$\Gamma(a,b) = \begin{cases} \ell^2\, \Phi\!\left(-\frac{\ell b}{2\sqrt{a}}\right) + \ell^2\, e^{\frac{\ell^2 (a-b)}{2}}\, \Phi\!\left(\frac{\ell b}{2\sqrt{a}} - \ell\sqrt{a}\right) & \text{if } a \in (0, +\infty), \\[4pt] \frac{\ell^2}{2} & \text{if } a = +\infty, \\[4pt] \ell^2\, e^{-\frac{\ell^2 b_+}{2}} \text{ where } b_+ = \max(b, 0) & \text{if } a = 0, \end{cases}$$
$$G(a,b) = \begin{cases} \ell^2\, e^{\frac{\ell^2 (a-b)}{2}}\, \Phi\!\left(\frac{\ell b}{2\sqrt{a}} - \ell\sqrt{a}\right) & \text{if } a \in (0, +\infty), \\[4pt] 0 & \text{if } a = +\infty, \\[4pt] \mathbf{1}_{\{b > 0\}}\, \ell^2\, e^{-\frac{\ell^2 b}{2}} & \text{if } a = 0. \end{cases}$$
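For reference, a direct Python transcription of $\Gamma$ and $G$ in the generic case $a \in (0, +\infty)$; the boundary cases $a = 0$ and $a = +\infty$ listed above are omitted, and overflow of the exponential factor for very large $\ell^2(a-b)$ is not handled.

```python
import numpy as np
from scipy.stats import norm

def Gamma(a, b, ell):
    """Limiting diffusion coefficient Gamma(a, b), generic case a in (0, +inf)."""
    sa = np.sqrt(a)
    return ell**2 * (norm.cdf(-ell * b / (2 * sa))
                     + np.exp(ell**2 * (a - b) / 2)
                       * norm.cdf(ell * b / (2 * sa) - ell * sa))

def G(a, b, ell):
    """Limiting drift coefficient G(a, b), generic case a in (0, +inf)."""
    sa = np.sqrt(a)
    return (ell**2 * np.exp(ell**2 * (a - b) / 2)
            * norm.cdf(ell * b / (2 * sa) - ell * sa))
```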

12. Remarks
• Limiting acceptance rate: $t \mapsto \mathbb{P}(A_{\lfloor nt \rfloor})$ converges to $t \mapsto \mathrm{acc}(a(t), b(t))$, where $a(t) = \mathbb{E}[(V'(X_t))^2]$, $b(t) = \mathbb{E}[V''(X_t)]$ and $\mathrm{acc}(a,b) = \frac{1}{\ell^2}\, \Gamma(a,b)$.
• Stationary case: if $m(dx) = e^{-V(x)}\,dx$, then $\forall t \ge 0$, $X_t \sim e^{-V(x)}\,dx$ and
$$a(t) = \mathbb{E}[(V'(X_t))^2] = \int_{\mathbb{R}} V'\,(V' e^{-V}) = \int_{\mathbb{R}} V'\,(-e^{-V})' = \int_{\mathbb{R}} V'' e^{-V} = \mathbb{E}[V''(X_t)] = b(t)$$
are constant (the third equality is an integration by parts). Using the fact that, for $a > 0$, $\Gamma(a,a) = 2G(a,a) = 2\ell^2\,\Phi\!\left(-\ell\sqrt{a}/2\right)$, we are back to the dynamics
$$dX_t = -\frac{h(\ell)}{2}\, V'(X_t)\,dt + \sqrt{h(\ell)}\,dB_t \quad \text{with } h(\ell) = 2\ell^2\, \Phi\!\left(-\frac{\ell\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}}{2}\right).$$
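To illustrate this remark numerically, here is a hedged particle approximation of the nonlinear SDE, under the illustrative assumption $V(x) = x^2/2$ (not from the slides). `Gamma` and `G` repeat the generic-case formulas of the previous sketch so the snippet runs on its own; the printed values show the limiting acceptance rate staying near 0.234 at stationarity, while it varies during the transient phase for a non-stationary start.

```python
import numpy as np
from scipy.stats import norm

ell = 2.38

def Gamma(a, b):   # same formula as in the previous sketch, generic case a in (0, +inf)
    sa = np.sqrt(a)
    return ell**2 * (norm.cdf(-ell * b / (2 * sa))
                     + np.exp(ell**2 * (a - b) / 2) * norm.cdf(ell * b / (2 * sa) - ell * sa))

def G(a, b):
    sa = np.sqrt(a)
    return ell**2 * np.exp(ell**2 * (a - b) / 2) * norm.cdf(ell * b / (2 * sa) - ell * sa)

def mckean_particles(n_particles=10_000, T=5.0, dt=1e-2, stationary_start=True, seed=4):
    """Particle approximation of the nonlinear SDE for V(x) = x**2/2 (V' = x, V'' = 1):
    a(t) and b(t) are replaced by empirical means over the particle cloud."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)        # stationary start: m = exp(-V) dx
    if not stationary_start:
        x = x + 2.0                             # transient start: m = N(2, 1)
    acc_path = []
    for _ in range(int(T / dt)):
        a, b = np.mean(x**2), 1.0               # a(t) = E[(V')^2], b(t) = E[V''] = 1
        acc_path.append(Gamma(a, b) / ell**2)   # limiting acceptance rate acc(a(t), b(t))
        x = (x - G(a, b) * x * dt
             + np.sqrt(Gamma(a, b) * dt) * rng.standard_normal(n_particles))
    return np.array(acc_path)

# Stationary start: a(t) = b(t) stays close to 1 and acc stays close to 0.234.
# Transient start from N(2, 1): the acceptance rate varies before settling near 0.234.
print(mckean_particles(stationary_start=True)[[0, -1]])
print(mckean_particles(stationary_start=False)[[0, -1]])
```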
