SLIDE 1

Direct Runge-Kutta Discretization Achieves Acceleration

Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie. NeurIPS 2018.

This work is supported by DARPA Lagrange Program under grant No. FA 8650-18-2-7838

SLIDE 2

Acceleration in first-order convex optimization

Optimize a smooth convex function:

    min_x f(x),   f convex with L-Lipschitz gradient

SLIDE 3

Acceleration in first-order convex optimization

Optimize a smooth convex function: min_x f(x)

Gradient Descent:

    x_{k+1} = x_k − h ∇f(x_k),   f(x_k) − f* = O(1/k) with h = 1/L
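A minimal sketch (mine, not from the slides) of the gradient-descent update x_{k+1} = x_k − h ∇f(x_k); the quadratic objective and step size are hypothetical choices:

```python
# Gradient descent on the hypothetical quadratic f(x) = 0.5 * L * x^2.
# With step size h = 1/L, one step lands exactly on the minimizer x* = 0
# for this objective; for general smooth convex f the rate is O(1/k).

def grad_descent(grad, x0, h, iters):
    """Iterate x_{k+1} = x_k - h * grad(x_k) and return the final point."""
    x = x0
    for _ in range(iters):
        x = x - h * grad(x)
    return x

L = 4.0
grad = lambda x: L * x                 # gradient of f(x) = 0.5 * L * x^2
x_final = grad_descent(grad, x0=1.0, h=1.0 / L, iters=1)
```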


SLIDE 6

Acceleration in first-order convex optimization

Optimize a smooth convex function: min_x f(x)

Gradient Descent: x_{k+1} = x_k − (1/L) ∇f(x_k), rate O(1/k)

Accelerated Gradient Descent [Nesterov 1983]:

    x_{k+1} = y_k − (1/L) ∇f(y_k)
    y_{k+1} = x_{k+1} + ((k − 1)/(k + 2)) (x_{k+1} − x_k),   rate O(1/k²)
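A sketch of Nesterov's accelerated scheme with the classical momentum weight (k − 1)/(k + 2); the test objective and the conservative smoothness estimate are hypothetical:

```python
# Nesterov's accelerated gradient descent on the hypothetical objective
# f(x) = 0.5 * x^2 (true smoothness 1); we deliberately use the
# conservative estimate L = 2, so convergence is geometric but not
# one-step, and the momentum weight (k - 1) / (k + 2) matters.

def nesterov(grad, x0, L, iters):
    x_prev, y = x0, x0
    for k in range(1, iters + 1):
        x = y - grad(y) / L                        # gradient step at y_k
        y = x + (k - 1) / (k + 2) * (x - x_prev)   # momentum extrapolation
        x_prev = x
    return x_prev

x_out = nesterov(lambda x: x, x0=1.0, L=2.0, iters=50)
# The iterates converge to the minimizer x* = 0.
```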

SLIDE 7

Acceleration in first-order convex optimization

Optimize a smooth convex function: min_x f(x)

Gradient Descent: rate O(1/k). Accelerated Gradient Descent [Nesterov 1983]: rate O(1/k²).

Continuous-time limit of accelerated gradient descent [SBC 2015]:

    ẍ(t) + (3/t) ẋ(t) + ∇f(x(t)) = 0,   f(x(t)) − f* = O(1/t²)

[SBC 2015] Su, Weijie, Stephen Boyd, and Emmanuel Candes. "A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights." Advances in Neural Information Processing Systems, 2014.
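A hedged numerical sketch (details hypothetical, not from the slides): integrating the [SBC 2015] ODE for the test objective f(x) = 0.5 x² via forward Euler on the equivalent first-order system, to watch the objective decay along the trajectory:

```python
# Forward-Euler integration of  x'' + (3/t) x' + grad_f(x) = 0,
# rewritten as the first-order system (x, v) with v = x'.
# f(x) = 0.5 * x^2 is a hypothetical test objective; along the exact
# solution, f(x(t)) - f* decays at rate O(1/t^2).

def simulate_sbc_ode(grad, x0, t0, t_end, dt):
    x, v, t = x0, 0.0, t0
    while t < t_end:
        a = -(3.0 / t) * v - grad(x)   # acceleration from the ODE
        x, v = x + dt * v, v + dt * a
        t += dt
    return x

x_T = simulate_sbc_ode(lambda x: x, x0=1.0, t0=0.1, t_end=20.0, dt=1e-3)
# |x(T)| is small compared with |x(0)| = 1, so f has decayed substantially.
```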


SLIDE 11

Convergence in continuous time

Generalized ODE [WWJ 2016]:

    ẍ(t) + ((p + 1)/t) ẋ(t) + p² t^{p−2} ∇f(x(t)) = 0,   f(x(t)) − f* = O(1/t^p)

Arbitrary acceleration by change of variable: rescaling time makes the continuous rate O(1/t^p) for any p.

However, smooth convex optimization algorithms cannot achieve a faster rate than O(1/N²) in the number of gradient queries N, so a fast continuous-time rate does not by itself yield a fast algorithm.

[WWJ 2016] Wibisono, A., Wilson, A. C., & Jordan, M. I. (2016). A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 113(47), E7351-E7358.
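The "arbitrary acceleration by change of variable" remark admits a one-line (reconstructed) justification: a polynomial time rescaling of any O(1/t²) trajectory traces the same curve at any desired rate.

```latex
% Time-dilation sketch: if X(t) satisfies f(X(t)) - f^* \le C / t^2,
% then the reparametrized curve Y(t) = X(t^{p/2}) traces the same path and
% satisfies an ODE of the generalized family, yet converges as O(1/t^p):
\[
  f\bigl(Y(t)\bigr) - f^*
  \;=\; f\bigl(X(t^{p/2})\bigr) - f^*
  \;\le\; \frac{C}{\bigl(t^{p/2}\bigr)^{2}}
  \;=\; \frac{C}{t^{p}} .
\]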

SLIDE 13

Question: How do we relate the convergence rate of the continuous-time ODE to the convergence rate of a discrete optimization algorithm?

Our approach: Discretize the ODE with standard Runge-Kutta integrators (e.g. Euler, midpoint, RK44) and prove convergence-rate guarantees for the resulting discrete iterations.
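To make the approach concrete, here is a sketch (my own, with hypothetical objective, horizon, and step size) of directly discretizing the accelerated ODE with the classical fourth-order Runge-Kutta integrator:

```python
# Classical RK4 applied to the accelerated ODE written as a first-order
# system z = (x, v):  x' = v,  v' = -(3/t) v - grad_f(x).
# The objective f(x) = 0.5 * x^2 is a hypothetical test case.
import numpy as np

def ode_rhs(t, z, grad):
    x, v = z
    return np.array([v, -(3.0 / t) * v - grad(x)])

def rk4_step(t, z, h, grad):
    k1 = ode_rhs(t, z, grad)
    k2 = ode_rhs(t + h / 2, z + h / 2 * k1, grad)
    k3 = ode_rhs(t + h / 2, z + h / 2 * k2, grad)
    k4 = ode_rhs(t + h, z + h * k3, grad)
    return z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

grad = lambda x: x                     # gradient of f(x) = 0.5 * x^2
z, t, h = np.array([1.0, 0.0]), 0.1, 0.01
for _ in range(2000):                  # integrate up to t ~ 20
    z = rk4_step(t, z, h, grad)
    t += h
# f(x(t)) decays roughly like O(1/t^2) along the discretized trajectory.
```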

SLIDE 17

Main theorem:

For a p-flat, (s+2)-differentiable convex function, if we discretize the ODE with an order-s Runge-Kutta integrator, we have

    f(x_N) − f* = O(N^{−ps/(s+1)}),   N = number of gradient evaluations

p-flat: p (f(x) − f*) ≤ ⟨∇f(x), x − x*⟩ (e.g. f(x) = ‖x‖^p).
Order-s: the discretization error scales as O(h^s); h is the step size.

Objective         Integrator        Rate
L-smooth (p=2)    RK44 (s=4)        O(N^{−8/5})
p-flat (p=4)      Midpoint (s=2)    O(N^{−8/3})
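A tiny sketch (mine, not from the slides) that evaluates the theorem's rate exponent ps/(s+1) for the two configurations in the table:

```python
# Rate exponent from the main theorem: f(x_N) - f* = O(N^(-p*s/(s+1))).
# The two table rows correspond to (p=2, s=4) and (p=4, s=2).

def rate_exponent(p, s):
    """Exponent of N in the convergence rate O(N^-exponent)."""
    return p * s / (s + 1)

print(rate_exponent(2, 4))  # L-smooth objective with RK44:    8/5 = 1.6
print(rate_exponent(4, 2))  # 4-flat objective with midpoint:  8/3 ~ 2.67
```

Note that as the integrator order s grows, the exponent approaches p, the continuous-time rate.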

SLIDE 18

Our poster session: Thursday, Dec 6, 5:00-7:00 PM, Room 210 & 230 AB. Poster number: 9.