A Dynamical Systems Perspective on Nesterov Acceleration


  1. A Dynamical Systems Perspective on Nesterov Acceleration. Michael Muehlebach and Michael I. Jordan, UC Berkeley.

  2. Introduction

     Find $x^* \in \mathbb{R}^n$ such that $f(x^*) \le f(x)$ for all $x \in \mathbb{R}^n$, where $f$ is smooth and convex. Focus on the case where $f$ is strongly convex, i.e. $f$ is convex and satisfies, for any $\bar{x} \in \mathbb{R}^n$,

     $$f(x) \ge f(\bar{x}) + \nabla f(\bar{x})(x - \bar{x}) + \frac{L}{2\kappa} |x - \bar{x}|^2, \quad \forall x \in \mathbb{R}^n.$$

     $L > 0$ is the Lipschitz constant of the gradient. $\kappa \ge 1$ is the condition number.
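
The strong convexity inequality can be checked numerically on a concrete function. The sketch below is an illustration only: the test function $f(x) = x^2/2 + \log\cosh(x)$ and the constants `L = 2`, `kappa = 2` are assumptions chosen for this example (its second derivative lies between 1 and 2), not taken from the slides.

```python
import math

L, kappa = 2.0, 2.0  # assumed constants: 1 <= f''(x) <= 2 for the f below


def f(x):
    # Hypothetical 1-D test function, smooth and strongly convex.
    return 0.5 * x * x + math.log(math.cosh(x))


def grad_f(x):
    return x + math.tanh(x)


def strong_convexity_gap(x, x_bar):
    """f(x) minus the quadratic lower bound built at x_bar; should be >= 0."""
    lower = (f(x_bar) + grad_f(x_bar) * (x - x_bar)
             + L / (2.0 * kappa) * (x - x_bar) ** 2)
    return f(x) - lower


# Evaluate the gap on a grid of pairs (x, x_bar) in [-3, 3].
grid = [i / 4.0 for i in range(-12, 13)]
worst = min(strong_convexity_gap(x, xb) for x in grid for xb in grid)
print("smallest gap on the grid:", worst)
```

A non-negative smallest gap is consistent with the inequality on this grid; it is a sanity check, not a proof.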

  3. Dynamical Systems Perspective

     Consider the ordinary differential equation (ODE)

     $$\ddot{x}(t) + 2d\,\dot{x}(t) + \frac{1}{L} \nabla f\bigl(x(t) + \beta \dot{x}(t)\bigr) = 0,$$

     with

     $$d := \frac{1}{\sqrt{\kappa} + 1}, \qquad \beta := \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}.$$

     The ODE can be brought to the form

     $$\dot{q}(t) = p(t), \qquad \dot{p}(t) = -\frac{1}{L} \nabla f(q(t)) + f_{\mathrm{NP}}(q(t), p(t)),$$

     where

     $$H(q, p) := \frac{1}{2} |p|^2 + \frac{1}{L} f(q), \qquad f_{\mathrm{NP}}(q, p) := -2dp - \frac{1}{L} \bigl(\nabla f(q + \beta p) - \nabla f(q)\bigr).$$

     [Figure: diagram with labels $\nabla f(x)$ and $f_{\mathrm{NP}}$.]
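
The reduction to first-order form can be spelled out by setting $q := x$ and $p := \dot{x}$ and then adding and subtracting $\frac{1}{L}\nabla f(q)$. The following is a worked restatement of that step, not text from the slides:

```latex
\begin{align*}
\dot{q} &= \dot{x} = p = \frac{\partial H}{\partial p}, \\
\dot{p} &= \ddot{x} = -2d\,p - \frac{1}{L}\nabla f(q + \beta p) \\
        &= \underbrace{-\frac{1}{L}\nabla f(q)}_{-\,\partial H/\partial q}
           \;+\; \underbrace{\Bigl(-2d\,p - \frac{1}{L}\bigl(\nabla f(q + \beta p) - \nabla f(q)\bigr)\Bigr)}_{f_{\mathrm{NP}}(q,\,p)} .
\end{align*}
```

Read this way, the system is a Hamiltonian system with Hamiltonian $H$, perturbed by the non-potential force $f_{\mathrm{NP}}$.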

  4. Damping

     The non-potential forces can be rewritten as

     $$f_{\mathrm{NP}}(q, p) = -2dp - \frac{1}{L} \bigl(\nabla f(q + \beta p) - \nabla f(q)\bigr) = \underbrace{-2dp}_{\text{isotropic damping}} \; \underbrace{- \frac{1}{L} \int_0^{\beta} \Delta f(q + \tau p)\,\mathrm{d}\tau \; p}_{\text{curvature-dependent damping}},$$

     where $\Delta f$ denotes the Hessian of $f$.

     [Figure: diagram with labels $\nabla f(x)$ and $f_{\mathrm{NP}}$, and two plots of the damping parameters $2d$ and $\beta$ as functions of $\kappa$, for $\kappa$ from 0 to 100, with vertical axes from 0 to 1.]
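
The two expressions for the non-potential force can be compared numerically. The sketch below is a hedged illustration: the 1-D test function $f(x) = x^2/2 + \log\cosh(x)$, the constants `L = 2`, `kappa = 2`, and the midpoint quadrature are all assumptions made for this example. In one dimension the Hessian is simply $f''$.

```python
import math

L, kappa = 2.0, 2.0  # assumed constants: 1 <= f''(x) <= 2 for the f below
d = 1.0 / (math.sqrt(kappa) + 1.0)
beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)


def grad_f(x):
    # Gradient of the hypothetical test function f(x) = x^2/2 + log(cosh(x)).
    return x + math.tanh(x)


def hess_f(x):
    # Second derivative of the same function: 1 + sech(x)^2 = 2 - tanh(x)^2.
    return 2.0 - math.tanh(x) ** 2


def f_np_difference_form(q, p):
    """Non-potential force written with a gradient difference."""
    return -2.0 * d * p - (grad_f(q + beta * p) - grad_f(q)) / L


def f_np_integral_form(q, p, n=2000):
    """Same force with the curvature integral, via the midpoint rule."""
    h = beta / n
    integral = sum(hess_f(q + (i + 0.5) * h * p) for i in range(n)) * h
    return -2.0 * d * p - integral * p / L


for q, p in [(0.3, 1.5), (-1.0, 0.7)]:
    print(q, p, f_np_difference_form(q, p), f_np_integral_form(q, p))
```

Both forms should agree up to quadrature error, and the force should oppose the velocity, since both damping terms are non-negative multiples of $p$ for a convex $f$.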

  5. Convergence

     Asymptotic stability (through dissipation).

     Convergence rate (upper bound, stated for $p(0) = 0$):

     $$f(q(t)) \le 2 \bigl(f(q(0)) - f^*\bigr) \exp\!\left(-\frac{1}{2\sqrt{\kappa}}\, t\right), \quad \forall t \in [0, \infty).$$

     Convergence rate of $O(1/t^2)$ in the non-strongly convex case.

     Derivation is based on the following Lyapunov-like function (stated for $x^* = f(x^*) = 0$):

     $$V(t) = \frac{1}{2} |a q(t) + p(t)|^2 + \frac{1}{L} f(q(t)).$$
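
The stated rate can be compared against a numerical solution of the ODE. The sketch below is an illustration under stated assumptions: a diagonal quadratic $f(q) = \frac{1}{2}\sum_i h_i q_i^2$ with minimum $f^* = 0$, the example values `L = 10` and `kappa = 25`, and a classical Runge-Kutta (RK4) integrator. None of these choices come from the slides. It also records $H(q, p) = \frac{1}{2}|p|^2 + \frac{1}{L} f(q)$ to make the dissipation visible.

```python
import math

L, kappa = 10.0, 25.0  # assumed example values
h = (L / kappa, L)     # Hessian eigenvalues; f(q) = 0.5 * sum(h_i * q_i**2), f* = 0
d = 1.0 / (math.sqrt(kappa) + 1.0)
beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)


def f(q):
    return 0.5 * sum(hi * qi * qi for hi, qi in zip(h, q))


def rhs(state):
    """Right-hand side of q' = p, p' = -2 d p - (1/L) grad f(q + beta p)."""
    n = len(state) // 2
    q, p = state[:n], state[n:]
    dp = [-2.0 * d * pi - hi * (qi + beta * pi) / L
          for hi, qi, pi in zip(h, q, p)]
    return p + dp  # list concatenation: (q', p')


def rk4_step(state, dt):
    def shifted(k, c):
        return [s + c * ki for s, ki in zip(state, k)]
    k1 = rhs(state)
    k2 = rhs(shifted(k1, 0.5 * dt))
    k3 = rhs(shifted(k2, 0.5 * dt))
    k4 = rhs(shifted(k3, dt))
    return [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + e)
            for s, a, b, c, e in zip(state, k1, k2, k3, k4)]


def simulate(q0, t_end=60.0, dt=0.01):
    """Return samples (t, f(q(t)), H(q(t), p(t))) starting from p(0) = 0."""
    n = len(q0)
    state = list(q0) + [0.0] * n
    samples = []
    for k in range(int(round(t_end / dt)) + 1):
        q, p = state[:n], state[n:]
        energy = 0.5 * sum(pi * pi for pi in p) + f(q) / L
        samples.append((k * dt, f(q), energy))
        state = rk4_step(state, dt)
    return samples


samples = simulate([1.0, -2.0])
f0 = samples[0][1]
bound_holds = all(
    fq <= 2.0 * f0 * math.exp(-t / (2.0 * math.sqrt(kappa))) + 1e-12
    for t, fq, _ in samples
)
print("bound holds at all sampled times:", bound_holds)
```

On this example the bound is far from tight, which is expected for an upper bound that must cover every strongly convex $f$ with the given $\kappa$.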

  6. Discretization

     Semi-implicit Euler discretization (with time step $T_s = 1$) leads to the accelerated gradient method

     $$q_{k+1} = q_k + T_s\, p_{k+1}, \qquad p_{k+1} = p_k + T_s \Bigl(-\frac{1}{L} \nabla f(q_k) + f_{\mathrm{NP}}(q_k, p_k)\Bigr).$$

     What are the properties that are preserved through the discretization?

     ◮ phase-space area contraction rate (contraction for $T_s \in (0, 2)$)
     ◮ time-reversibility (for $T_s \in (0, 1)$)
       ⇒ yields a worst-case bound on the convergence rate
     ◮ convergence rate (for $T_s \in (0, 1]$)

     [Figure: phase-space diagram with labels $\psi$, $\Gamma_k$, $\partial\Gamma_k$, $\Gamma_{k+1}$, $\partial\Gamma_{k+1}$, $q_k$, $p_k$, $q_{k+1}$, $p_{k+1}$.]
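
The claim that the semi-implicit Euler step with $T_s = 1$ reproduces the accelerated gradient method can be checked on a small example. The sketch below makes several assumptions not taken from the slides: a diagonal quadratic $f$, the example values `L = 10` and `kappa = 25`, the sign convention $\dot{p} = -\frac{1}{L}\nabla f(q) + f_{\mathrm{NP}}(q, p)$ from the continuous-time model, and the common form of Nesterov's method $y_k = x_k + \beta (x_k - x_{k-1})$, $x_{k+1} = y_k - \frac{1}{L} \nabla f(y_k)$ as the reference.

```python
import math

L, kappa = 10.0, 25.0  # assumed example values
h = (L / kappa, L)     # Hessian eigenvalues of f(x) = 0.5 * sum(h_i * x_i**2)
d = 1.0 / (math.sqrt(kappa) + 1.0)
beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)


def grad(x):
    return [hi * xi for hi, xi in zip(h, x)]


def f_np(q, p):
    """Non-potential force -2 d p - (1/L) (grad f(q + beta p) - grad f(q))."""
    shifted = grad([qi + beta * pi for qi, pi in zip(q, p)])
    g = grad(q)
    return [-2.0 * d * pi - (si - gi) / L
            for pi, si, gi in zip(p, shifted, g)]


def semi_implicit_euler(q0, steps, Ts=1.0):
    """Update p explicitly first, then q using the new p."""
    q, p = list(q0), [0.0] * len(q0)
    traj = [list(q)]
    for _ in range(steps):
        g, fnp = grad(q), f_np(q, p)
        p = [pi + Ts * (-gi / L + fi) for pi, gi, fi in zip(p, g, fnp)]
        q = [qi + Ts * pi for qi, pi in zip(q, p)]
        traj.append(list(q))
    return traj


def nesterov(x0, steps):
    """Reference accelerated gradient method with constant momentum beta."""
    x_prev, x = list(x0), list(x0)
    traj = [list(x)]
    for _ in range(steps):
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad(y)
        x_prev, x = x, [yi - gi / L for yi, gi in zip(y, g)]
        traj.append(list(x))
    return traj


a = semi_implicit_euler([1.0, -2.0], 50)
b = nesterov([1.0, -2.0], 50)
max_diff = max(abs(u - v) for qa, qb in zip(a, b) for u, v in zip(qa, qb))
print("largest difference between the two trajectories:", max_diff)
```

The agreement rests on the identity $1 - 2d = \beta$: with $T_s = 1$ the momentum update collapses to $p_{k+1} = \beta p_k - \frac{1}{L} \nabla f(q_k + \beta p_k)$, where $p_k = q_k - q_{k-1}$.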

  7. Conclusion and Outlook

     We derived a dynamical system model for the accelerated gradient method.

     ◮ The dynamics have an interpretation as a mass-spring-damper system.
     ◮ Discretization yields the accelerated gradient method.
     ◮ Certain key properties are preserved through the discretization.

     Is a symplectic discretization the "right" discretization?

     ◮ The behavior for large $\kappa$ seems particularly important.

     Come visit me at Poster 205.
