
A Dynamical Systems Perspective on Nesterov Acceleration
Michael Muehlebach and Michael I. Jordan
UC Berkeley
Michael Muehlebach and Michael I. Jordan, Dynamical Systems Perspective, 1 / 7

Introduction
Find $x^* \in \mathbb{R}^n$ such that $f(x^*) \le f(x)$ for all $x \in \mathbb{R}^n$, where $f$ is smooth and convex.
Focus on the case where $f$ is strongly convex, i.e. $f$ is convex and satisfies, for any $\bar{x} \in \mathbb{R}^n$,
$$f(x) \ge f(\bar{x}) + \nabla f(\bar{x})(x - \bar{x}) + \frac{L}{2\kappa} |x - \bar{x}|^2, \quad \forall x \in \mathbb{R}^n.$$
$L > 0$ is the Lipschitz constant of the gradient.
$\kappa \ge 1$ is the condition number.

Dynamical Systems Perspective
Consider the ordinary differential equation (ODE)
$$\ddot{x}(t) + 2d\,\dot{x}(t) + \frac{1}{L} \nabla f(x(t) + \beta \dot{x}(t)) = 0,$$
with
$$d := \frac{1}{\sqrt{\kappa} + 1}, \qquad \beta := \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}.$$
[Figure: diagram with labels $\nabla f(x)$ and $f_{NP}$.]
The ODE can be brought to the form
$$\dot{q}(t) = p(t), \qquad \dot{p}(t) = -\frac{1}{L} \nabla f(q(t)) + f_{NP}(q(t), p(t)),$$
where
$$H(q, p) := \frac{1}{2} |p|^2 + \frac{1}{L} f(q), \qquad f_{NP}(q, p) := -2dp - \frac{1}{L} \left( \nabla f(q + \beta p) - \nabla f(q) \right).$$
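The first-order form on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the scalar test function $f(x) = 0.125 x^2 + \log\cosh(x)$ (for which $L = 1.25$ and $\kappa = 5$) are assumptions chosen for the example. It evaluates $(\dot{q}, \dot{p})$ and compares $\dot{p}$ with the acceleration given directly by the second-order ODE.

```python
import math

def coefficients(kappa):
    """Return (d, beta) for a condition number kappa >= 1."""
    s = math.sqrt(kappa)
    return 1.0 / (s + 1.0), (s - 1.0) / (s + 1.0)

def f_np(q, p, grad_f, L, kappa):
    """Non-potential force: -2 d p - (grad f(q + beta p) - grad f(q)) / L."""
    d, beta = coefficients(kappa)
    return -2.0 * d * p - (grad_f(q + beta * p) - grad_f(q)) / L

def rhs(q, p, grad_f, L, kappa):
    """Right-hand side (q_dot, p_dot) of the first-order system."""
    return p, -grad_f(q) / L + f_np(q, p, grad_f, L, kappa)

# Illustrative test function (an assumption, not from the slides):
# f(x) = 0.125 x^2 + log(cosh(x)), so f''(x) lies in (0.25, 1.25],
# which gives L = 1.25 and kappa = 5.
def grad_f(x):
    return 0.25 * x + math.tanh(x)

L, kappa = 1.25, 5.0
q, p = 0.7, -0.3
q_dot, p_dot = rhs(q, p, grad_f, L, kappa)

# The same acceleration computed directly from the second-order ODE.
d, beta = coefficients(kappa)
p_dot_direct = -2.0 * d * p - grad_f(q + beta * p) / L
```

The two values of the acceleration agree up to rounding, because the $\nabla f(q)$ terms in the potential force and in $f_{NP}$ cancel.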

Damping
The non-potential forces can be rewritten as
$$f_{NP}(q, p) = -2dp - \frac{1}{L} \left( \nabla f(q + \beta p) - \nabla f(q) \right) = \underbrace{-2dp}_{\text{isotropic damping}} \; \underbrace{- \frac{1}{L} \int_0^\beta \Delta f(q + \tau p) \, \mathrm{d}\tau \; p}_{\text{curvature-dependent damping}}.$$
[Figure: diagram with labels $\nabla f(x)$ and $f_{NP}$; two panels plotting $2d$ and $\beta$ against $\kappa$ for $\kappa$ from 0 to 100, vertical axes from 0 to 1.]
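For intuition, the split can be worked out for a quadratic objective. The quadratic is an assumption made here for illustration and is not a case treated on the slide. The integral form follows from the fundamental theorem of calculus applied to $\tau \mapsto \nabla f(q + \tau p)$, so $\Delta f$ has to be read as the Hessian of $f$. Take $f(x) = \tfrac{1}{2} x^\top A x$ with $A$ symmetric and eigenvalues in $[L/\kappa, L]$:

```latex
\begin{align*}
\nabla f(q + \beta p) - \nabla f(q)
  &= \int_0^\beta \Delta f(q + \tau p)\, p \,\mathrm{d}\tau
   = \beta A p, \\
f_{NP}(q, p)
  &= -\Bigl( 2d\, I + \frac{\beta}{L} A \Bigr) p .
\end{align*}
```

Along an eigenvector of $A$ with eigenvalue $\lambda$, the damping coefficient is $2d + \beta \lambda / L$. It ranges from $2d + \beta/\kappa$ in the flattest direction to $2d + \beta = 1$ in the steepest direction ($\lambda = L$), since $2d + \beta = (\sqrt{\kappa} + 1)/(\sqrt{\kappa} + 1)$.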

Convergence
Asymptotic stability (through dissipation).
Convergence rate (upper bound, stated for $p(0) = 0$):
$$f(q(t)) \le 2 \left( f(q(0)) - f^* \right) \exp\!\left( -\frac{1}{2\sqrt{\kappa}}\, t \right), \quad \forall t \in [0, \infty).$$
Convergence rate of $O(1/t^2)$ in the non-strongly convex case.
The derivation is based on the following Lyapunov-like function (stated for $x^* = f(x^*) = 0$):
$$V(t) = \frac{1}{2} |a q(t) + p(t)|^2 + \frac{1}{L} f(q(t)).$$
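The exponential bound can be checked numerically on a simple example. The sketch below is only an illustration under stated assumptions: it uses a two-dimensional quadratic $f(x) = \tfrac{1}{2}(L x_1^2 + (L/\kappa) x_2^2)$ with $f^* = 0$, a classical fourth-order Runge-Kutta integrator, and a step size of 0.01, none of which appear in the slides. It integrates the ODE from $p(0) = 0$ and tests the bound at every sample.

```python
import math

def simulate(kappa, L=1.0, q0=(1.0, 1.0), dt=0.01, t_end=60.0):
    """Integrate the ODE with RK4 for f(x) = 0.5 * (L*x1^2 + (L/kappa)*x2^2).

    Returns a list of (t, f(q(t))) samples, starting from p(0) = 0.
    """
    s = math.sqrt(kappa)
    d, beta = 1.0 / (s + 1.0), (s - 1.0) / (s + 1.0)
    eig = (L, L / kappa)  # Hessian eigenvalues of the test quadratic

    def f(q):
        return 0.5 * sum(e * x * x for e, x in zip(eig, q))

    def rhs(state):
        q, p = state[:2], state[2:]
        # p_dot = -2 d p - grad f(q + beta p) / L, componentwise
        acc = [-2.0 * d * pi - e * (qi + beta * pi) / L
               for e, qi, pi in zip(eig, q, p)]
        return list(p) + acc

    def rk4_step(state):
        k1 = rhs(state)
        k2 = rhs([x + 0.5 * dt * k for x, k in zip(state, k1)])
        k3 = rhs([x + 0.5 * dt * k for x, k in zip(state, k2)])
        k4 = rhs([x + dt * k for x, k in zip(state, k3)])
        return [x + dt * (a + 2.0 * b + 2.0 * c + g) / 6.0
                for x, a, b, c, g in zip(state, k1, k2, k3, k4)]

    state = list(q0) + [0.0, 0.0]
    samples = [(0.0, f(state[:2]))]
    for i in range(1, int(round(t_end / dt)) + 1):
        state = rk4_step(state)
        samples.append((i * dt, f(state[:2])))
    return samples

kappa = 100.0
samples = simulate(kappa)
f0 = samples[0][1]
# f* = 0 here, so the bound reads f(q(t)) <= 2 f(q(0)) exp(-t / (2 sqrt(kappa))).
bound_holds = all(ft <= 2.0 * f0 * math.exp(-t / (2.0 * math.sqrt(kappa)))
                  for t, ft in samples)
```

A single quadratic does not prove anything about the general case; the check only makes the statement concrete for one well-conditioned and one poorly conditioned direction at once.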

Discretization
Semi-implicit Euler discretization (with time step $T_s = 1$) leads to the accelerated gradient method
$$q_{k+1} = q_k + T_s\, p_{k+1}, \qquad p_{k+1} = p_k + T_s \left( -\frac{1}{L} \nabla f(q_k) + f_{NP}(q_k, p_k) \right).$$
What are the properties that are preserved through the discretization?
◮ phase-space area contraction rate (contraction for $T_s \in (0, 2)$)
◮ time-reversibility (for $T_s \in (0, 1)$) ⇒ yields a worst-case bound on the convergence rate
◮ convergence rate (for $T_s \in (0, 1]$)
[Figure: phase-space sketch of the map $\psi$ taking the region $\Gamma_k$ with boundary $\partial\Gamma_k$ to $\Gamma_{k+1}$ with boundary $\partial\Gamma_{k+1}$; axes $q_k$, $p_k$ and $q_{k+1}$, $p_{k+1}$.]
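The claim that the discretization reproduces the accelerated gradient method can be made concrete. The sketch below assumes the update uses the same force as the ODE, i.e. $-\frac{1}{L}\nabla f(q_k) + f_{NP}(q_k, p_k)$. With $T_s = 1$ the velocity update then collapses to $p_{k+1} = \beta p_k - \frac{1}{L} \nabla f(q_k + \beta p_k)$, because $1 - 2d = \beta$. The code compares the result with the textbook constant-momentum form of Nesterov's method on a quadratic test function; the function names and the test problem are illustrative assumptions.

```python
import math

def grad_f(x, eig):
    """Gradient of the quadratic f(x) = 0.5 * sum(eig_i * x_i^2)."""
    return [e * xi for e, xi in zip(eig, x)]

def semi_implicit_euler(q0, eig, L, kappa, steps, Ts=1.0):
    """Update p with the force at (q_k, p_k), then q with the new p_{k+1}."""
    s = math.sqrt(kappa)
    d, beta = 1.0 / (s + 1.0), (s - 1.0) / (s + 1.0)
    q, p = list(q0), [0.0] * len(q0)
    for _ in range(steps):
        g_q = grad_f(q, eig)
        g_shift = grad_f([qi + beta * pi for qi, pi in zip(q, p)], eig)
        # force = -grad f(q) / L + f_NP(q, p)
        force = [-gq / L - 2.0 * d * pi - (gs - gq) / L
                 for gq, gs, pi in zip(g_q, g_shift, p)]
        p = [pi + Ts * fi for pi, fi in zip(p, force)]
        q = [qi + Ts * pi for qi, pi in zip(q, p)]
    return q

def nesterov(x0, eig, L, kappa, steps):
    """Constant-momentum accelerated gradient method (textbook form)."""
    s = math.sqrt(kappa)
    beta = (s - 1.0) / (s + 1.0)
    x, y = list(x0), list(x0)
    for _ in range(steps):
        g = grad_f(y, eig)
        x_new = [yi - gi / L for yi, gi in zip(y, g)]
        y = [xn + beta * (xn - xi) for xn, xi in zip(x_new, x)]
        x = x_new
    return x

L, kappa = 1.0, 25.0
eig = (L, L / kappa)  # Hessian eigenvalues of the test quadratic
q_euler = semi_implicit_euler((1.0, 1.0), eig, L, kappa, steps=50)
x_nesterov = nesterov((1.0, 1.0), eig, L, kappa, steps=50)
max_gap = max(abs(a - b) for a, b in zip(q_euler, x_nesterov))
```

The correspondence is $x_k = q_k$ and $y_k = q_k + \beta p_k$, so `max_gap` should differ from zero only by floating-point rounding.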

Conclusion and Outlook
We derived a dynamical system model for the accelerated gradient method.
◮ The dynamics have an interpretation as a mass-spring-damper system.
◮ Discretization yields the accelerated gradient method.
◮ Certain key properties are preserved through the discretization.
Is a symplectic discretization the “right” discretization?
◮ The behavior for large $\kappa$ seems particularly important.
Come visit me at Poster 205.
