

SLIDE 1

Diffusions and their numerical approximation Applications of Langevin algorithms

Langevin Dynamics

Loucas Pillaud-Vivien November 7, 2019

Loucas Pillaud-Vivien Langevin Dynamics

SLIDE 2

Introduction

Sampling distributions over high-dimensional spaces is an important topic in computational statistics and machine learning. Example of application: Bayesian inference for high-dimensional models. Problems:

1. Most sampling techniques do not scale to high dimension: big d.

2. Nor to a large number of data points (recall HMC, which needs the full gradient): big N.

SLIDE 3

Example: Bayesian setting

A Bayesian model is specified by:

1. the sampling distribution of the observed data (the likelihood): Y ∼ L(·|θ);

2. a prior distribution p on the parameter space, θ ∈ Rd.

Inference is based on the posterior distribution

π(dθ) = p(dθ)L(Y|θ) / ∫Rd L(Y|u)p(du).

The normalizing constant is often not tractable (the integral is too high-dimensional); we can only compute π(dθ) ∝ p(dθ)L(Y|θ).
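This is why working up to a normalizing constant is enough in practice. A minimal sketch (the Gaussian prior and likelihood below are assumptions for illustration, not from the talk):

```python
import numpy as np

# Toy Bayesian model: prior theta ~ N(0, I_d), likelihood Y_i ~ N(theta, I_d).
# Sampling algorithms such as Langevin dynamics only ever need the unnormalized
# log-posterior (and its gradient); the normalizing integral over R^d is never
# computed.
def log_posterior_unnorm(theta, Y):
    log_prior = -0.5 * np.dot(theta, theta)       # log p(theta) + const
    log_lik = -0.5 * ((Y - theta) ** 2).sum()     # log L(Y | theta) + const
    return log_prior + log_lik

theta = np.zeros(3)
Y = np.ones((2, 3))                               # two observations in R^3
print(log_posterior_unnorm(theta, Y))             # -3.0
```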

SLIDE 4

Outline

1. Diffusions and their numerical approximation
   - Setting
   - Continuous-time Markov processes: diffusions
   - Discretized Langevin diffusion

2. Applications of Langevin algorithms
   - Sampling a strongly convex potential
   - Stochastic Gradient Langevin Dynamics
   - Non-convex learning via SGLD

SLIDE 5

Framework

We want to sample a measure that has a density with respect to Lebesgue, known up to a normalization factor:

dµ(x) = e−V(x)dx / ∫Rd e−V(y)dy.

We assume that V is L-smooth, i.e. continuously differentiable with ∃L > 0 such that ‖∇V(x) − ∇V(y)‖ ≤ L‖x − y‖.

SLIDE 6

Convergence to equilibrium for Diffusions

Let us consider the overdamped Langevin diffusion in Rd:

dXt = −∇V(Xt)dt + √2 dBt.

L-smoothness of V gives existence and uniqueness of a solution.

Stationary measure: dµ(x) = e−V(x)dx / ∫Rd e−V(y)dy.

Semi-group: Pt(f)(x) = E[f(Xt) | X0 = x] → the "law of Xt".

Infinitesimal generator: Lφ = ∆φ − ∇V · ∇φ.

One can verify that the semi-group follows the dynamics (d/dt)Pt(f) = LPt(f).

→ Question: at what speed does it converge?

SLIDE 7

Convergence to equilibrium for Diffusions

Theorem (Poincaré implies convergence to equilibrium). With the notation above, the following propositions are equivalent:

- µ satisfies a Poincaré inequality with constant CP;
- for all smooth f and all t ≥ 0, Varµ(Pt(f)) ≤ e−2t/CP Varµ(f).

Proof: by the integration-by-parts formula (µ is reversible),

−∫ f(Lg)dµ = ∫ ∇f · ∇g dµ = −∫ (Lf)g dµ,

hence

(d/dt)Varµ(Pt(f)) = (d/dt)∫ (Pt(f))² dµ = 2∫ Pt(f)(LPt(f))dµ = −2∫ ‖∇Pt(f)‖² dµ ≤ −(2/CP) Varµ(Pt(f)).

SLIDE 8

Poincaré inequalities: definition in modern language

Definition (Poincaré inequality). µ ∈ P(Rd) satisfies a Poincaré inequality with constant CP if

Varµ(f) ≤ CP ∫ ‖∇f‖² dµ,

for all (bounded) f: Rd → R of class C1. Recall that

Varµ(f) = ∫ f² dµ − (∫ f dµ)² = ∫ (f − ∫ f dµ)² dµ,

and that ∫ ‖∇f‖² dµ = E(f) is the Dirichlet energy.

Spectral interpretation: E(f) = ∫ ∇f · ∇f dµ = ∫ f(−Lf)dµ

→ 1/CP = λ2, the first non-trivial eigenvalue of −L.

SLIDE 9

Application to the Ornstein-Uhlenbeck process

The Ornstein-Uhlenbeck process follows the SDE in Rd:

dXt = −Xt dt + √2 dBt.

Denote by L the operator Lφ = ∆φ − x · ∇φ. Then:

1. For dµ(x) = (2π)−d/2 e−‖x‖²/2 dx, L is self-adjoint in L²(µ).

2. µ is the stationary measure of the O-U process.

3. µ satisfies a Poincaré inequality with constant 1.

4. For all smooth f and all t ≥ 0, Varµ(Pt(f)) ≤ e−2t Varµ(f).
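This sharp rate can be checked numerically. A minimal sketch using the exact O-U transition, with the test function f(x) = x (an assumption for illustration), for which Pt f(x) = e−t x and the bound is attained:

```python
import numpy as np

# For the O-U process dX = -X dt + sqrt(2) dB, the transition is exact:
# X_t | X_0 = x ~ N(e^{-t} x, 1 - e^{-2t}), with stationary measure N(0, 1).
# For f(x) = x we have P_t f(x) = e^{-t} x, so Var_mu(P_t f) = e^{-2t} Var_mu(f):
# the Poincare rate e^{-2t} is attained.
rng = np.random.default_rng(1)
x0 = rng.standard_normal(200_000)          # X_0 ~ mu = N(0, 1), d = 1
for t in (0.5, 1.0, 2.0):
    ptf = np.exp(-t) * x0                  # P_t f evaluated on the samples
    print(t, ptf.var(), np.exp(-2 * t))    # empirical vs predicted variance
```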

SLIDE 10

Poincaré inequalities

Long story short:

Poincaré inequality ⇐⇒ spectral gap for L ⇐⇒ exponential convergence for the diffusion.

SLIDE 14

Poincaré inequalities

For which distributions do they hold?

- When V is m-strongly convex: CP = 1/m (matching the linear convergence of gradient descent).
- When V is only convex: yes, but with no bound...
- A generic condition for a not-necessarily-convex potential: ½|∇V|² − ∆V ≥ α.
- For a mixture of Gaussians, CP explodes exponentially.

SLIDE 15

Ok, fine. But how do I get back to the real world and draw samples?

SLIDE 16

Discretized Langevin Diffusion

Idea: sample the diffusion paths using the Euler-Maruyama scheme:

dXt = −∇V(Xt)dt + √2 dBt   ⇝   Xk+1 = Xk − γk+1∇V(Xk) + √(2γk+1) ξk+1,

where

- (ξk)k is i.i.d. N(0, Id);
- (γk)k is a sequence of step sizes, either constant or decreasing to 0.

Note the similarity with gradient descent and its stochastic counterpart. This algorithm is referred to as the Unadjusted Langevin Algorithm (ULA), Langevin Monte Carlo, or Gradient Langevin Dynamics.
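The scheme is a few lines of code. A minimal sketch (the standard-Gaussian target V(x) = ‖x‖²/2 is an assumption chosen so the answer is known):

```python
import numpy as np

def ula(grad_V, x0, gamma, n_steps, rng):
    """Unadjusted Langevin Algorithm with constant step size:
    X_{k+1} = X_k - gamma * grad_V(X_k) + sqrt(2 * gamma) * xi_{k+1}."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = x - gamma * grad_V(x) + np.sqrt(2.0 * gamma) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

# Target N(0, I_2): V(x) = ||x||^2 / 2, so grad V(x) = x.
rng = np.random.default_rng(0)
samples = ula(lambda x: x, x0=np.zeros(2), gamma=0.1, n_steps=20_000, rng=rng)
print(samples[5_000:].mean(axis=0))  # near 0
print(samples[5_000:].var(axis=0))   # near 1, up to the O(gamma) bias of mu_gamma
```

Even on this toy target the chain samples µγ rather than µ: for this Gaussian the stationary variance of the iterates is 1/(1 − γ/2), a visible instance of the discretization bias discussed on the next slide.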

SLIDE 17

Discretized Langevin Diffusion: constant stepsize

When γk = γ for all k, (Xk)k is a homogeneous Markov chain with Markov kernel Rγ. Under some mild assumptions Rγ is irreducible and positive recurrent, and hence has an invariant distribution µγ ≠ µ. Typical questions:

- For a given precision ε, how do we choose the step size γ and the number of iterations n such that dist(δxRγⁿ, µ) ≤ ε?
- How do we choose x?
- How do we quantify dist(µγ, µ)?

SLIDE 18

Outline

1. Diffusions and their numerical approximation
   - Setting
   - Continuous-time Markov processes: diffusions
   - Discretized Langevin diffusion

2. Applications of Langevin algorithms
   - Sampling a strongly convex potential
   - Stochastic Gradient Langevin Dynamics
   - Non-convex learning via SGLD

SLIDE 19

Result for a strongly convex potential

Theorem (Durmus, Moulines 2016). Assume that V is m-strongly convex and L-smooth. Set γ ∈ (0, 1/(m + L)] and κ = mL/(m + L). Then for all x ∈ Rd,

W₂²(δxRγⁿ, π) ≤ 2(1 − κγ)ⁿ W₂²(δx, π) + C d γ.

Remarks:

- Bias + variance decomposition, as for SGD: a geometric rate, then the distance from µγ to µ.
- One may choose γ such that after n = Θ(d/ε²) iterations, W₂²(δxRγⁿ, π) ≤ ε.
- This gives an explicit way of choosing γ (it was a problem! see MALA).

SLIDE 20

Result for a strongly convex potential: remarks

Remarks:

- Exactly the same results hold in total variation (Dalalyan 2014) and in KL divergence (Bartlett et al. 2017).
- The same result holds with decreasing step sizes, and then there is no parameter to tune!
- Quadratic improvement by Jordan et al. 2018 by considering the underdamped Langevin diffusion (similar to HMC): for n = Θ(d/ε) iterations, W₂²(δxRγⁿ, π) ≤ ε (and strong convexity is needed only outside of a ball).

SLIDE 21

Grrrrr... But you know... I do not like to compute all the gradients...

SLIDE 22

Stochastic Gradient Langevin Dynamics (SGLD)

Recall: the ULA algorithm is a discretization of the overdamped Langevin diffusion, which leaves the target distribution µ invariant. To further reduce the computational cost, SGLD uses unbiased estimates of the gradient! Initially proposed by Welling, M. and Teh, Y.W. (2011).

SLIDE 23

SGLD algorithm

We are interested in situations where the distribution µ arises as the posterior in a Bayesian inference problem, with prior µ0 and a large number N ≫ 1 of i.i.d. observations zi with likelihoods p(zi|X):

dµ(X|z1, …, zN) ∝ dµ0(X) ∏_{i=1}^{N} p(zi|X).

Denote, for i ∈ {1, …, N},

Vi(X) = −log p(zi|X),   V0(X) = −log dµ0(X),   V = ∑_{i=0}^{N} Vi.

The cost of one ULA iteration is Nd, which is prohibitively large.

SLIDE 24

SGLD algorithm

Welling, M. and Teh, Y.W. suggested replacing ∇V with the unbiased estimate

∇V0 + (N/p) ∑_{i∈S} ∇Vi,

where S is a minibatch of size p. A single SGLD update is thus (cost pd):

Xk+1 = Xk − γ [∇V0(Xk) + (N/p) ∑_{i∈Sk+1} ∇Vi(Xk)] + √(2γ) Zk+1.

Same idea as SGD. Two sources of randomness: the gradient estimates and the Gaussian noise added to sample.
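A minimal sketch of the update (the conjugate-Gaussian model below is an assumption chosen so the posterior is known in closed form):

```python
import numpy as np

# Toy model: prior N(0, 1), observations z_i ~ N(theta, 1), so
# V0(x) = x^2 / 2, V_i(x) = (x - z_i)^2 / 2, and the posterior is
# N(sum(z) / (N + 1), 1 / (N + 1)).
rng = np.random.default_rng(2)
N, p, gamma = 100, 10, 1e-3
z = rng.normal(1.0, 1.0, size=N)

x, samples = 0.0, []
for k in range(20_000):
    S = rng.choice(N, size=p, replace=False)       # minibatch S_{k+1}
    grad = x + (N / p) * (x - z[S]).sum()          # grad V0 + (N/p) sum_i grad V_i
    x = x - gamma * grad + np.sqrt(2.0 * gamma) * rng.standard_normal()
    samples.append(x)

post_mean = z.sum() / (N + 1)
print(np.mean(samples[5_000:]), post_mean)         # the two agree closely
```

Each iteration touches only p = 10 of the N = 100 per-datum gradients. The minibatch noise inflates the spread of the samples around the posterior, which is exactly what the variance-reduced update on the following slide addresses.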

SLIDE 25

SGLD algorithm: need for variance reduction

Xk+1 = Xk − γ [∇V0(Xk) + (N/p) ∑_{i∈Sk+1} ∇Vi(Xk)] + √(2γ) Zk+1.

Two sources of noise. For γ = γ0/N:

1. The noise from the gradient estimates is too big ⇒ no sampling.

2. We need to decrease the variance: assume x* is the unique minimizer of V, and use

Xk+1 = Xk − γ [∇V0(Xk) − ∇V0(x*) + (N/p) ∑_{i∈Sk+1} (∇Vi(Xk) − ∇Vi(x*))] + √(2γ) Zk+1.

If γ = γ0/N, SGLD behaves like SGD; use variance control to sample. Precise analysis in Moulines et al. (2018).
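A minimal sketch of this control-variate update, on the same toy conjugate-Gaussian model as before (an assumption; here x* is available in closed form, whereas in general it requires a preliminary optimization pass):

```python
import numpy as np

# Toy model: prior N(0, 1), z_i ~ N(theta, 1); posterior N(sum(z)/(N+1), 1/(N+1)).
# Control variates: replace grad V_i(X_k) by grad V_i(X_k) - grad V_i(x*), where
# x* minimizes V. The estimate stays unbiased because grad V(x*) = 0. For these
# quadratic V_i the difference is (X_k - x*), independent of i, so the minibatch
# noise vanishes entirely; in general it is only damped near x*.
rng = np.random.default_rng(3)
N, p, gamma = 100, 10, 1e-3
z = rng.normal(1.0, 1.0, size=N)
x_star = z.sum() / (N + 1)                # argmin of V = V0 + sum_i V_i

x, samples = 0.0, []
for k in range(20_000):
    S = rng.choice(N, size=p, replace=False)
    # (x - x_star) is grad V0(X_k) - grad V0(x*); the sum is the minibatch part.
    grad = (x - x_star) + (N / p) * ((x - z[S]) - (x_star - z[S])).sum()
    x = x - gamma * grad + np.sqrt(2.0 * gamma) * rng.standard_normal()
    samples.append(x)

print(np.mean(samples[5_000:]), x_star)        # mean matches the posterior mean
print(np.var(samples[5_000:]), 1.0 / (N + 1))  # variance close to 1/(N+1), up to O(gamma) bias
```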

SLIDE 26

Non-convex Learning via SGLD

Classical learning problem: find the minimum of F(w) := EP[f(w, Z)], where f is not necessarily convex. Call Fz(w) := (1/n) ∑_{i=1}^{n} f(w, zi).

Consider the Langevin diffusion and its associated discretization:

dXt = −∇Fz(Xt)dt + √(2β−1) dBt,   Xk+1 = Xk − η∇f(Xk, zk) + √(2ηβ−1) ξk.

The chain converges to dµz(w) ∝ exp(−βFz(w)); when β ∼ 1/T is big, this measure concentrates around the minimizers of Fz, and hence of F.
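A minimal sketch of this concentration effect (the double-well objective is an assumption for illustration, and the full gradient is used in place of the stochastic one for simplicity):

```python
import numpy as np

# Double well F(w) = (w^2 - 1)^2, minimized at w = +/-1, with a barrier at 0.
# The discretized Langevin dynamics
#   X_{k+1} = X_k - eta * grad F(X_k) + sqrt(2 * eta / beta) * xi_k
# samples approximately from exp(-beta * F); for large beta the samples
# concentrate near the two minimizers.
def grad_F(w):
    return 4.0 * w * (w ** 2 - 1.0)

rng = np.random.default_rng(4)
eta, beta = 0.01, 10.0
w, ws = 0.0, []
for k in range(50_000):
    w = w - eta * grad_F(w) + np.sqrt(2.0 * eta / beta) * rng.standard_normal()
    ws.append(w)

print(np.mean(np.abs(ws[10_000:])))   # close to 1: the mass sits in the wells
```

At small β the same chain spreads over the whole line; at large β it also takes exponentially long to hop between the two wells, which is the metastability issue raised in the conclusion.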

SLIDE 27

Non-convex Learning via SGLD

Xk+1 = Xk − η∇f(Xk, zk) + √(2ηβ−1) ξk,

and (Xk) converges to dµz(w) ∝ exp(−βFz(w)), with β ∼ 1/T.

Theorem (Raginsky, Rakhlin, Telgarsky (2018)). For k ≳ ε−4 and η ≲ ε4,

EF(Xk) − F* ≤ cε + (β + d)²/n + d log(β + 1)/β.

Sketch of proof: control three terms:

- how far the chain is from the true diffusion and its invariant measure exp(−βFz(w));
- how far Fz is from F;
- how close a sample from exp(−βFz(w)) is to a minimizer of Fz, in terms of β.

SLIDE 28

Conclusion

We have seen how Langevin dynamics can be used to derive new algorithms for:

- sampling;
- Bayesian learning;
- non-convex optimization.

Problem with non-convexity: metastability of the Markov process → an old problem in computational chemistry. "Particles remain trapped in wells for a long time before escaping." There has been a huge effort in that community to tackle this problem. Inspiration for machine learning?
