Non-asymptotic convergence bound for the Langevin MCMC Algorithm


1. Non-asymptotic convergence bound for the Langevin MCMC Algorithm. Alain Durmus, Eric Moulines, Marcelo Pereyra, Umut Şimşekli (Telecom ParisTech, Ecole Polytechnique, Bristol University). January 27, 2017, Von Dantzig Seminar, Amsterdam.

2. Outline: 1. Motivation; 2. Framework; 3. Strongly log-concave distribution; 4. Convex and super-exponential densities; 5. Non-smooth potentials; 6. The Unadjusted Langevin Algorithm within Gibbs (ULAwG).

3. Introduction. Sampling distributions over high-dimensional state spaces has recently attracted considerable research effort in the computational statistics and machine learning communities. Applications (non-exhaustive): (1) Bayesian inference for high-dimensional models; (2) aggregation of estimators and predictors; (3) Bayesian nonparametrics (function space); (4) Bayesian linear inverse problems (function space).

4. Introduction. "Classical" MCMC algorithms do not scale to high dimensions. However, the possibility of sampling high-dimensional distributions has been demonstrated in several fields (in particular, molecular dynamics) with specially tailored algorithms. Our objective: propose (or rather analyse) sampling algorithms that can be used for some challenging high-dimensional problems with a machine-learning flavour. Challenges are numerous in this area.

5. Illustration. Likelihood: a binary regression set-up in which the binary observations (responses) $(Y_1, \dots, Y_n)$ are conditionally independent Bernoulli random variables with success probability $F(\beta^T X_i)$, where (1) $X_i$ is a $d$-dimensional vector of known covariates, (2) $\beta$ is a $d$-dimensional vector of unknown regression coefficients, (3) $F$ is a distribution function. Two important special cases: (1) probit regression, where $F$ is the standard normal distribution function; (2) logistic regression, where $F$ is the standard logistic distribution function, $F(t) = e^t/(1+e^t)$.

6. Bayesian inference for binary regression. The posterior density of $\beta$ is given, up to a proportionality constant, by $\pi(\beta \mid (Y, X)) \propto \exp(-U(\beta))$ with
$$U(\beta) = -\sum_{i=1}^{n} \bigl\{ Y_i \log F(\beta^T X_i) + (1 - Y_i) \log\bigl(1 - F(\beta^T X_i)\bigr) \bigr\} + g(\beta),$$
where $g$ is the negative log density of the prior distribution. Two important cases: Gaussian prior, $g(\beta) = (1/2)\,\beta^T \Sigma \beta$ (ridge penalty); Laplace prior, $g(\beta) = \lambda \sum_{i=1}^{d} |\beta_i|$ (LASSO penalty).
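To make the potential concrete, here is a minimal numerical sketch of $U$ and $\nabla U$ for the logistic link with the ridge penalty. It is an illustration only, not the authors' code; the names `potential_and_grad`, `X`, `Y`, `Sigma` are hypothetical, and `Sigma` is assumed symmetric.

```python
import numpy as np

def potential_and_grad(beta, X, Y, Sigma):
    """U(beta) and its gradient for the logistic link with a Gaussian (ridge) prior.

    X : (n, d) matrix stacking the covariate vectors X_i^T as rows,
    Y : (n,) binary responses in {0, 1},
    Sigma : (d, d) symmetric matrix from the ridge penalty on the slide.
    """
    t = X @ beta                          # linear predictors beta^T X_i
    # log F(t) and log(1 - F(t)) for F(t) = e^t / (1 + e^t), computed stably
    log_F = -np.logaddexp(0.0, -t)
    log_1mF = -np.logaddexp(0.0, t)
    nll = -np.sum(Y * log_F + (1.0 - Y) * log_1mF)   # negative log-likelihood
    U = nll + 0.5 * beta @ Sigma @ beta              # add the ridge penalty g(beta)
    F = np.exp(log_F)                                # success probabilities F(beta^T X_i)
    grad = X.T @ (F - Y) + Sigma @ beta              # gradient of U
    return U, grad
```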

7. New challenges. Beware: the number of predictor variables $d$ is large ($10^4$ and up). Examples: text categorization; genomics and proteomics (gene expression analysis); other data mining tasks (recommendations, longitudinal clinical trials, ...).

8. State of the art. The most popular algorithms for Bayesian inference in binary regression models are based on data augmentation (DA): instead of sampling $\pi(\beta \mid (X, Y))$, sample $\pi(\beta, W \mid (X, Y))$, a probability measure on $\mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$, and take the marginal w.r.t. $\beta$. This is a typical application of the Gibbs sampler: sample in turn $\pi(\beta \mid (X, Y, W))$ and $\pi(W \mid (X, Y, \beta))$. The choice of the DA should make these two steps reasonably easy: probit link, Albert and Chib (1993); logistic link, Polya-Gamma sampler, Polson and Scott (2012).
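For illustration, a minimal sketch of the Albert and Chib (1993) DA Gibbs sampler for the probit link, assuming a Gaussian prior with precision matrix `Sigma` (matching the ridge penalty $g(\beta) = \tfrac12 \beta^T \Sigma \beta$ above). Function name, arguments, and defaults are hypothetical; this is not the authors' implementation.

```python
import numpy as np
from scipy.stats import truncnorm

def albert_chib_gibbs(X, Y, Sigma, n_iter=1000, rng=None):
    """DA Gibbs sampler for probit regression (Albert & Chib, 1993), sketched."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    beta = np.zeros(d)
    post_prec = X.T @ X + Sigma              # posterior precision of beta given W
    post_cov = np.linalg.inv(post_prec)
    samples = np.empty((n_iter, d))
    for k in range(n_iter):
        # 1) W_i | beta, Y_i ~ N(X_i^T beta, 1), truncated to (0, inf) if Y_i = 1
        #    and to (-inf, 0] if Y_i = 0 (bounds below are in standardized units).
        m = X @ beta
        lo = np.where(Y == 1, -m, -np.inf)
        hi = np.where(Y == 1, np.inf, -m)
        W = truncnorm.rvs(lo, hi, loc=m, scale=1.0, random_state=rng)
        # 2) beta | W ~ N(post_cov @ X^T W, post_cov).
        beta = rng.multivariate_normal(post_cov @ (X.T @ W), post_cov)
        samples[k] = beta
    return samples
```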

9. State of the art: shortcomings. The Albert and Chib probit DA algorithm and the Polya-Gamma sampler have been shown to be uniformly geometrically ergodic, BUT: the geometric rate of convergence is exponentially small in the dimension, and these bounds do not allow the construction of honest confidence intervals or credible regions. The algorithms are also very demanding in terms of computational resources: applicable only when $d$ is small ($\approx 10$) to moderate ($\approx 100$), but certainly not when $d$ is large ($10^4$ or more); the convergence time becomes prohibitive as soon as $d \geq 10^2$.

10. A daunting problem? In the case of ridge regression, the potential $U$ is smooth and strongly convex. In the case of lasso regression, the potential $U$ is non-smooth but still convex. A wealth of reasonably fast optimisation algorithms is available to solve such problems in high dimension.

11. Outline (recap; same sections as slide 2).

12. Framework. Denote by $\pi$ a target density w.r.t. the Lebesgue measure on $\mathbb{R}^d$, known up to a normalisation factor:
$$x \mapsto e^{-U(x)} \Big/ \int_{\mathbb{R}^d} e^{-U(y)}\,\mathrm{d}y.$$
Implicitly, $d \gg 1$. Assumption: $U$ is $L$-smooth, i.e. twice continuously differentiable and there exists a constant $L$ such that for all $x, y \in \mathbb{R}^d$, $\|\nabla U(x) - \nabla U(y)\| \leq L \|x - y\|$.
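As an illustration (a standard calculation, not on the slides), the ridge logistic example above satisfies this assumption, assuming $\Sigma$ is symmetric positive semidefinite and $X$ is the $n \times d$ matrix stacking the covariate vectors $X_i^T$ as rows:
$$\nabla^2 U(\beta) = \sum_{i=1}^{n} F(\beta^T X_i)\bigl(1 - F(\beta^T X_i)\bigr)\, X_i X_i^T + \Sigma \;\preceq\; \tfrac{1}{4} X^T X + \Sigma,$$
so one may take $L = \tfrac{1}{4}\lambda_{\max}(X^T X) + \lambda_{\max}(\Sigma)$.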

13. Langevin diffusion (overdamped). Langevin SDE:
$$\mathrm{d}Y_t = -\nabla U(Y_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t,$$
where $(B_t)_{t \geq 0}$ is a $d$-dimensional Brownian motion. Notation: $(P_t)_{t \geq 0}$ is the Markov semigroup associated with the Langevin diffusion; $\pi \propto e^{-U}$ is reversible, hence the unique invariant probability measure. Key property: for all $x \in \mathbb{R}^d$, $\lim_{t \to +\infty} \|\delta_x P_t - \pi\|_{\mathrm{TV}} = 0$.
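A one-line check of the invariance of $\pi \propto e^{-U}$ (not spelled out on the slide) goes through the Fokker-Planck equation associated with this SDE,
$$\partial_t p_t = \nabla \cdot \bigl(p_t \nabla U + \nabla p_t\bigr).$$
Plugging in $p_t = \pi \propto e^{-U}$ gives $\nabla \pi = -\pi \nabla U$, so $\pi \nabla U + \nabla \pi = 0$ and $\partial_t \pi = 0$: the target is a stationary solution of the diffusion.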

14. Discretized Langevin diffusion. Idea: sample the diffusion paths using the Euler-Maruyama (EM) scheme:
$$X_{k+1} = X_k - \gamma_{k+1} \nabla U(X_k) + \sqrt{2\gamma_{k+1}}\, Z_{k+1},$$
where $(Z_k)_{k \geq 1}$ is i.i.d. $\mathcal{N}(0, \mathrm{I}_d)$ and $(\gamma_k)_{k \geq 1}$ is a sequence of stepsizes, which can either be held constant or be chosen to decrease to 0 at a certain rate. Closely related to the gradient descent algorithm.
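A minimal sketch of this Euler-Maruyama recursion (the Unadjusted Langevin Algorithm). It is illustrative only; the function name `ula` and its arguments are not from the talk, and any gradient callable (e.g. the `potential_and_grad` sketch above) can be plugged in.

```python
import numpy as np

def ula(grad_U, x0, gammas, rng=None):
    """Euler-Maruyama discretization of the Langevin SDE.

    grad_U : callable returning the gradient of the potential U at x
    x0     : (d,) starting point
    gammas : iterable of stepsizes gamma_1, gamma_2, ... (constant or decreasing)
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for gamma in gammas:
        z = rng.standard_normal(x.shape)                     # Z_{k+1} ~ N(0, I_d)
        x = x - gamma * grad_U(x) + np.sqrt(2.0 * gamma) * z # EM step
        path.append(x.copy())
    return np.array(path)

# Example with a standard Gaussian target, U(x) = ||x||^2 / 2 (so grad U(x) = x):
# samples = ula(lambda x: x, x0=np.zeros(10), gammas=[0.05] * 10_000)
```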

15. Discretized Langevin diffusion: constant stepsize. When $\gamma_k = \gamma$, $(X_k)_{k \geq 1}$ is a homogeneous Markov chain with Markov kernel $R_\gamma$. Under some appropriate conditions, this Markov chain is irreducible and positive recurrent, hence has a unique invariant distribution $\pi_\gamma$. Problem: the limiting distribution of the discretization, $\pi_\gamma$, does not coincide with the target distribution $\pi$. Questions: Can we quantify the distance between $\pi_\gamma$ and $\pi$, e.g. a bound on $\|\pi_\gamma - \pi\|_{\mathrm{TV}}$ with explicit dependence on the dimension? Given a computational budget, is there an optimal trade-off between the "mixing" rate ($\|\delta_x R_\gamma^n - \pi_\gamma\|_{\mathrm{TV}}$) and the bias ($\|\pi_\gamma - \pi\|_{\mathrm{TV}}$)?
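A worked example of the bias, not taken from the slides: for the one-dimensional Gaussian target $U(x) = x^2/2$, the constant-stepsize recursion is an AR(1) process whose stationary law can be computed exactly,
$$X_{k+1} = (1 - \gamma) X_k + \sqrt{2\gamma}\, Z_{k+1}, \qquad
\sigma_\gamma^2 = (1-\gamma)^2 \sigma_\gamma^2 + 2\gamma
\;\Longrightarrow\;
\sigma_\gamma^2 = \frac{2\gamma}{1-(1-\gamma)^2} = \frac{1}{1 - \gamma/2},$$
so $\pi_\gamma = \mathcal{N}\bigl(0, (1-\gamma/2)^{-1}\bigr) \neq \mathcal{N}(0,1) = \pi$ for every fixed $\gamma \in (0, 2)$, and the bias vanishes only as $\gamma \to 0$.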
