Langevin Dynamics, Loucas Pillaud-Vivien, November 7, 2019 (PowerPoint presentation)


  1. Langevin Dynamics. Loucas Pillaud-Vivien, November 7, 2019. Parts: Diffusions and their numerical approximation; Applications of Langevin algorithms.

  2. Introduction. Sampling distributions over high-dimensional spaces is an important topic in computational statistics and machine learning. Example of application: Bayesian inference for high-dimensional models. Problem: most sampling techniques do not scale
     1. to high dimension (big $d$);
     2. to a large number of data points (big $N$); recall that HMC needs the full gradient.

  3. Example: Bayesian setting. A Bayesian model is specified by:
     1. the sampling distribution of the observed data, i.e. the likelihood: $Y \sim L(\cdot \mid \theta)$;
     2. a prior distribution $p$ on the parameter space, $\theta \in \mathbb{R}^d$.

     Inference is based on the posterior distribution
     $$\pi(d\theta) = \frac{p(d\theta)\, L(Y \mid \theta)}{\int L(Y \mid u)\, p(du)}.$$
     The normalizing constant is often intractable (the integral is too high-dimensional), so we can only compute
     $$\pi(d\theta) \propto p(d\theta)\, L(Y \mid \theta).$$
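
As an illustration of working with a posterior known only up to its normalizing constant, here is a minimal sketch. The model is an assumption made for this example and is not taken from the slides: a standard Gaussian prior on $\theta$ and observations $Y_i \sim \mathcal{N}(\theta, I_d)$. The function names are hypothetical. Only the unnormalized log-density and its gradient are ever evaluated.

```python
import numpy as np

def log_prior(theta):
    # Assumed prior: theta ~ N(0, I_d), up to an additive constant.
    return -0.5 * np.sum(theta ** 2)

def log_likelihood(y, theta):
    # Assumed model: each row y_i ~ N(theta, I_d), up to an additive constant.
    return -0.5 * np.sum((y - theta) ** 2)

def log_unnormalized_posterior(y, theta):
    # log p(theta) + log L(Y | theta); the normalizing constant is never needed.
    return log_prior(theta) + log_likelihood(y, theta)

def grad_potential(y, theta):
    # Gradient of V = -log_unnormalized_posterior:
    # grad V(theta) = theta + sum_i (theta - y_i) = (n + 1) * theta - sum_i y_i.
    n = y.shape[0]
    return (n + 1) * theta - np.sum(y, axis=0)

# Toy data: n = 3 observations in dimension d = 2.
y = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]])
# In this conjugate toy model the posterior mean is sum_i y_i / (n + 1),
# which is also where the gradient of V vanishes.
theta_star = y.sum(axis=0) / (y.shape[0] + 1)
```

In this conjugate case the posterior is available in closed form, which makes it a convenient sanity check; in the high-dimensional models the slide has in mind, only the two functions above would be available.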

  4. Outline.
     1. Diffusions and their numerical approximation: (a) Setting; (b) Continuous time Markov process: diffusions; (c) Discretized Langevin diffusion.
     2. Applications of Langevin algorithms: (a) Sampling a strongly convex potential; (b) Stochastic Gradient Langevin Dynamics; (c) Non convex learning via SGLD.

  5. Framework. We want to sample from the following measure, which has a density with respect to the Lebesgue measure known up to a normalization factor:
     $$d\mu(x) = \frac{e^{-V(x)}\, dx}{\int_{\mathbb{R}^d} e^{-V(y)}\, dy}.$$
     We assume that $V$ is $L$-smooth, i.e. continuously differentiable and such that there exists $L > 0$ with
     $$\|\nabla V(x) - \nabla V(y)\| \le L\, \|x - y\|.$$

  6. Convergence to equilibrium for diffusions. Consider the overdamped Langevin diffusion in $\mathbb{R}^d$:
     $$dX_t = -\nabla V(X_t)\, dt + \sqrt{2}\, dB_t.$$
     $L$-smoothness of $V$ gives existence and uniqueness of a solution.
     Stationary measure: $d\mu(x) = e^{-V(x)}\, dx \big/ \int_{\mathbb{R}^d} e^{-V(y)}\, dy$.
     Semi-group: $P_t(f)(x) = \mathbb{E}[f(X_t) \mid X_0 = x]$, which encodes the "law of $X_t$".
     Infinitesimal generator: $\mathcal{L}\varphi = \Delta\varphi - \nabla V \cdot \nabla\varphi$.
     One can verify that the semi-group follows the dynamics $\frac{d}{dt} P_t(f) = \mathcal{L} P_t(f)$.
     Question: at what speed does the diffusion converge to equilibrium?

  7. Convergence to equilibrium for diffusions. Theorem (Poincaré implies convergence to equilibrium). With the notation above, the following propositions are equivalent:
     1. $\mu$ satisfies a Poincaré inequality with constant $P$;
     2. for all smooth $f$ and all $t \ge 0$, $\mathrm{Var}_\mu(P_t(f)) \le e^{-2t/P}\, \mathrm{Var}_\mu(f)$.

     Proof. The integration by parts formula ($\mu$ is reversible) gives
     $$-\int f\, (\mathcal{L} g)\, d\mu = \int \nabla f \cdot \nabla g\, d\mu = -\int (\mathcal{L} f)\, g\, d\mu.$$
     Since $\mu$ is invariant, $\int P_t(f)\, d\mu = \int f\, d\mu$ does not depend on $t$, hence
     $$\frac{d}{dt} \mathrm{Var}_\mu(P_t(f)) = \frac{d}{dt} \int (P_t(f))^2\, d\mu = 2 \int P_t(f)\, (\mathcal{L} P_t(f))\, d\mu = -2 \int \|\nabla P_t(f)\|^2\, d\mu \le -\frac{2}{P}\, \mathrm{Var}_\mu(P_t(f)).$$
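
The proof above stops at a differential inequality. A sketch of the remaining step, which is the standard Grönwall argument, writing $u(t)$ for the variance (a name introduced here, not on the slide):

```latex
u(t) := \operatorname{Var}_\mu\big(P_t(f)\big), \qquad u'(t) \le -\frac{2}{P}\, u(t)
\;\Longrightarrow\;
\frac{d}{dt}\Big( e^{2t/P}\, u(t) \Big) = e^{2t/P} \Big( u'(t) + \frac{2}{P}\, u(t) \Big) \le 0
\;\Longrightarrow\;
u(t) \le e^{-2t/P}\, u(0) = e^{-2t/P} \operatorname{Var}_\mu(f).
```

This gives the direction from the Poincaré inequality to exponential decay of the variance, using $P_0(f) = f$ for the initial value.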

  8. Poincaré inequalities: definition in modern language. Definition (Poincaré inequality). $\mu \in \mathcal{P}(\mathbb{R}^d)$ satisfies a Poincaré inequality with constant $P$ if
     $$\mathrm{Var}_\mu(f) \le P \int \|\nabla f\|^2\, d\mu$$
     for all (bounded) $f : \mathbb{R}^d \to \mathbb{R}$ of class $C^1$. Recall that
     $$\mathrm{Var}_\mu(f) = \int f^2\, d\mu - \Big( \int f\, d\mu \Big)^2 = \int \Big( f - \int f\, d\mu \Big)^2 d\mu,$$
     and that $\mathcal{E}(f) = \int \|\nabla f\|^2\, d\mu$ is the Dirichlet energy.
     Spectral interpretation: $\mathcal{E}(f) = \int \nabla f \cdot \nabla f\, d\mu = \int f\, (-\mathcal{L} f)\, d\mu$, so $1/P = \lambda_2$, the first non-trivial eigenvalue of $-\mathcal{L}$.

  9. Application to the Ornstein-Uhlenbeck process. The Ornstein-Uhlenbeck process follows the SDE in $\mathbb{R}^d$:
     $$dX_t = -X_t\, dt + \sqrt{2}\, dB_t.$$
     Denote by $\mathcal{L}$ the operator $\mathcal{L}\varphi = \Delta\varphi - x \cdot \nabla\varphi$. Then:
     1. for $d\mu(x) = \frac{1}{(2\pi)^{d/2}}\, e^{-\|x\|^2/2}\, dx$, $\mathcal{L}$ is self-adjoint in $L^2_\mu$;
     2. $\mu$ is the stationary measure of the O-U process;
     3. $\mu$ satisfies a Poincaré inequality with constant 1;
     4. for all smooth $f$ and all $t \ge 0$, $\mathrm{Var}_\mu(P_t(f)) \le e^{-2t}\, \mathrm{Var}_\mu(f)$.
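
The variance bound for the Ornstein-Uhlenbeck process can be checked numerically in dimension 1. The sketch below relies on the explicit transition of this SDE, $X_t \mid X_0 = x \sim \mathcal{N}(e^{-t} x,\, 1 - e^{-2t})$, which is a standard fact but is not stated on the slide. The helper names are hypothetical, and expectations are computed with Gauss-Hermite quadrature rather than by simulation.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes and weights for the weight exp(-x^2 / 2),
# rescaled so that the weights integrate against the N(0, 1) density.
NODES, WEIGHTS = hermegauss(40)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)

def gaussian_mean(g):
    # E[g(Z)] for Z ~ N(0, 1), by quadrature (g must accept arrays).
    return np.sum(WEIGHTS * g(NODES))

def ou_semigroup(f, t):
    # P_t f(x) = E[f(e^{-t} x + sqrt(1 - e^{-2t}) Z)] in dimension 1.
    a = np.exp(-t)
    b = np.sqrt(1.0 - np.exp(-2.0 * t))
    def pt_f(x):
        return np.array([gaussian_mean(lambda z: f(a * xi + b * z))
                         for xi in np.atleast_1d(x)])
    return pt_f

def variance(g):
    # Var_mu(g) for mu = N(0, 1).
    m = gaussian_mean(g)
    return gaussian_mean(lambda x: (g(x) - m) ** 2)

t = 0.5
f = lambda x: x ** 2
lhs = variance(ou_semigroup(f, t))      # Var_mu(P_t f)
rhs = np.exp(-2.0 * t) * variance(f)    # e^{-2t} Var_mu(f)
```

For $f(x) = x$ one gets $P_t(f)(x) = e^{-t} x$, so the bound holds with equality; for $f(x) = x^2$ the variance decays like $e^{-4t}$, strictly faster than the bound requires.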

  10. Poincaré inequalities. Long story short: Poincaré inequality $\iff$ spectral gap for $\mathcal{L}$ $\iff$ exponential convergence for the diffusion.

  14. Poincaré inequalities. For which distributions do they hold?
      When $V$ is $m$-strongly convex: $P = 1/m$ (compare with the linear convergence of gradient descent).
      When $V$ is only convex: yes, but no bound...
      A generic condition for a not necessarily convex potential: $\frac{1}{2} |\nabla V|^2 - \Delta V \ge \alpha$.
      For a mixture of Gaussians, $P$ explodes exponentially.

  15. OK, fine. But how do I get back to the real world and draw samples?

  16. Discretized Langevin diffusion. Idea: sample the diffusion paths using the Euler-Maruyama scheme. The diffusion
      $$dX_t = -\nabla V(X_t)\, dt + \sqrt{2}\, dB_t$$
      is discretized as
      $$X_{k+1} = X_k - \gamma_{k+1} \nabla V(X_k) + \sqrt{2 \gamma_{k+1}}\, \xi_{k+1},$$
      where $(\xi_k)_k$ is i.i.d. $\mathcal{N}(0, I_d)$ and $(\gamma_k)_k$ is a sequence of stepsizes, either constant or decreasing to 0.
      Note the similarity with gradient descent or its stochastic counterpart. This algorithm is referred to as the Unadjusted Langevin Algorithm, Langevin Monte Carlo, or Gradient Langevin Dynamics.
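
The recursion above can be sketched in a few lines of NumPy. This is a minimal illustration with a constant stepsize, not the author's implementation: the function name `ula`, its signature, and the standard Gaussian target $V(x) = \|x\|^2 / 2$ are assumptions made for the example.

```python
import numpy as np

def ula(grad_V, x0, step, n_iter, rng):
    # Unadjusted Langevin Algorithm with a constant stepsize:
    # X_{k+1} = X_k - step * grad_V(X_k) + sqrt(2 * step) * xi_{k+1},
    # with xi_{k+1} ~ N(0, I_d) drawn independently at each iteration.
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iter,) + x.shape)
    for k in range(n_iter):
        xi = rng.standard_normal(x.shape)
        x = x - step * grad_V(x) + np.sqrt(2.0 * step) * xi
        samples[k] = x
    return samples

# Assumed toy target: standard Gaussian in R^2, so grad V(x) = x.
rng = np.random.default_rng(0)
samples = ula(lambda x: x, x0=np.zeros(2), step=0.1, n_iter=100_000, rng=rng)

# Discard an initial burn-in before computing empirical moments.
empirical_mean = samples[1_000:].mean(axis=0)
empirical_var = samples[1_000:].var(axis=0)
```

Dropping the noise term recovers plain gradient descent on $V$, which is the similarity the slide points out. The empirical variance comes out close to, but slightly above, the target value 1, since the chain is not corrected by an accept-reject step.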

  17. Discretized Langevin diffusion: constant stepsize. When $\gamma_k = \gamma$ for all $k$, $(X_k)_k$ is a homogeneous Markov chain with Markov kernel $R_\gamma$. Under some mild assumptions, $R_\gamma$ is irreducible and positive recurrent, and hence has an invariant distribution $d\mu_\gamma \neq d\mu$.
      Typical questions:
      For a given precision $\epsilon$, how do we choose the stepsize $\gamma$ and the number of iterations $n$ such that $\mathrm{dist}(\delta_x R_\gamma^n, d\mu) \le \epsilon$?
      How do we choose $x$?
      How do we quantify $\mathrm{dist}(d\mu_\gamma, d\mu)$?
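
The gap between $\mu_\gamma$ and $\mu$ can be made explicit in a toy case chosen here for illustration (it is not on the slide): $V(x) = x^2 / 2$ in dimension 1, so that $\mu = \mathcal{N}(0, 1)$. The chain is then $X_{k+1} = (1 - \gamma) X_k + \sqrt{2\gamma}\, \xi_{k+1}$, its variance follows a linear recursion, and for $0 < \gamma < 2$ the invariant law is $\mathcal{N}(0, 1 / (1 - \gamma/2))$. The function names below are hypothetical.

```python
def ula_variance_recursion(step, n_iter, var0=0.0):
    # Variance of X_k for the constant-stepsize chain on V(x) = x^2 / 2,
    # started from a centred law with variance var0 (var0 = 0 is a point mass at 0):
    # Var(X_{k+1}) = (1 - step)^2 * Var(X_k) + 2 * step.
    var = var0
    for _ in range(n_iter):
        var = (1.0 - step) ** 2 * var + 2.0 * step
    return var

def ula_stationary_variance(step):
    # Fixed point of the recursion above, valid for 0 < step < 2.
    return 1.0 / (1.0 - step / 2.0)

# The stationary variance exceeds the target variance 1 for every stepsize,
# and the gap shrinks roughly like step / 2 as the stepsize goes to 0.
for step in (0.5, 0.1, 0.01):
    print(step, ula_variance_recursion(step, 5_000), ula_stationary_variance(step))
```

In this example the bias of $\mu_\gamma$ is of order $\gamma$, while the recursion contracts at rate $(1 - \gamma)^2$ per step: a smaller stepsize reduces the bias but requires more iterations, which is the trade-off behind the questions on the slide.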

  18. Outline.
      1. Diffusions and their numerical approximation: (a) Setting; (b) Continuous time Markov process: diffusions; (c) Discretized Langevin diffusion.
      2. Applications of Langevin algorithms: (a) Sampling a strongly convex potential; (b) Stochastic Gradient Langevin Dynamics; (c) Non convex learning via SGLD.
