Direct Runge-Kutta Discretization Achieves Acceleration
Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie. NeurIPS 2018.
This work is supported by DARPA Lagrange Program under grant No. FA 8650-18-2-7838
Acceleration in first order
Optimize a smooth convex function:

    minimize f(x),  where f is convex and L-smooth.

Gradient Descent:

    x_{k+1} = x_k - h ∇f(x_k),   with rate f(x_k) - f* = O(1/k).

Accelerated Gradient Descent [Nesterov 1983]:

    y_{k+1} = x_k - h ∇f(x_k)
    x_{k+1} = y_{k+1} + ((k-1)/(k+2)) (y_{k+1} - y_k),   with rate f(y_k) - f* = O(1/k^2).
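The gap between the two rates is easy to see numerically. A minimal sketch (not the authors' code; the quadratic objective and iteration count are illustrative assumptions):

```python
import numpy as np

# f(x) = 0.5 * x^T A x on an ill-conditioned quadratic; L is the largest
# eigenvalue of A, so 1/L is the standard step size for both methods.
A = np.diag([1.0, 100.0])
L = 100.0
grad = lambda x: A @ x
f = lambda x: 0.5 * x @ A @ x

x0 = np.array([1.0, 1.0])
N = 200

# Gradient descent: x_{k+1} = x_k - (1/L) grad f(x_k), rate O(1/k).
x = x0.copy()
for k in range(N):
    x = x - grad(x) / L
f_gd = f(x)

# Nesterov's AGD with the (k-1)/(k+2) momentum schedule, rate O(1/k^2).
x, y = x0.copy(), x0.copy()
for k in range(N):
    y_next = x - grad(x) / L          # gradient step
    x = y_next + (k / (k + 3)) * (y_next - y)   # momentum extrapolation
    y = y_next
f_agd = f(y)

print(f_gd, f_agd)   # AGD ends at a noticeably smaller objective value
```

After 200 iterations, plain gradient descent is still dominated by the slow 0.99-per-step contraction along the small eigenvalue, while the momentum iterate has driven the objective far lower.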
In the limit h → 0, the AGD iterates follow the ODE [SBC 2015]:

    x''(t) + (3/t) x'(t) + ∇f(x(t)) = 0,   with rate f(x(t)) - f* = O(1/t^2).

[SBC 2015] Su, Weijie, Stephen Boyd, and Emmanuel Candès. "A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights." Advances in Neural Information Processing Systems, 2014.
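The continuous-time rate can be checked by integrating the ODE directly. A numerical sketch (assumption: f(x) = 0.5 x², small-step Euler used only to approximate the exact trajectory, not as the method under study):

```python
# Integrate  x'' + (3/t) x' + f'(x) = 0  for f(x) = 0.5 * x^2 and verify
# that f(x(t)) decays at the continuous-time rate O(1/t^2).

def fprime(x):
    return x                  # gradient of f(x) = 0.5 * x^2

t, x, v = 0.1, 1.0, 0.0       # start just after t = 0 to avoid the 3/t singularity
h = 1e-3                      # tiny step: we study the ODE itself, not the integrator
T = 50.0
while t < T:
    # one step on the equivalent first-order system (x, v)
    a = -(3.0 / t) * v - fprime(x)
    v += h * a
    x += h * v
    t += h

f_T = 0.5 * x * x
print(f_T * T * T)            # bounded, consistent with f(x(t)) - f* = O(1/t^2)
```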
[WWJ 2016] generalized this to a family of ODEs:

    x''(t) + ((p+1)/t) x'(t) + p^2 t^{p-2} ∇f(x(t)) = 0,   with rate f(x(t)) - f* = O(1/t^p).

Arbitrary acceleration by change of variable: reparametrizing time turns an O(1/t^p) rate into O(1/t^{λp}) for any λ > 1, so any polynomial rate is attainable in continuous time. However, smooth convex optimization algorithms cannot achieve a faster rate than O(1/k^2) in the number of gradient queries k. The real question is therefore what survives discretization.

[WWJ 2016] Wibisono, A., Wilson, A. C., & Jordan, M. I. (2016). A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 113(47), E7351-E7358.
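The change-of-variable argument is a one-line rescaling (a standard derivation, sketched here rather than taken verbatim from the slides):

```latex
% Suppose x(t) solves an accelerated ODE with guarantee
%   f(x(t)) - f^* \le C / t^p .
% Reparametrize time: define y(t) = x(t^\lambda) for any \lambda > 1. Then
f\bigl(y(t)\bigr) - f^* \;=\; f\bigl(x(t^\lambda)\bigr) - f^*
  \;\le\; \frac{C}{(t^\lambda)^p} \;=\; \frac{C}{t^{\lambda p}} .
% Any polynomial rate is thus achievable in continuous time; the cost
% reappears when the rescaled (stiffer) dynamics must be discretized.
```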
Main result: for a p-flat, (s+2)-differentiable convex function, if we discretize the ODE with an order-s Runge-Kutta integrator and an appropriate step size, the discrete iterates converge at an accelerated rate.

p-flat: a higher-order analogue of smoothness; for p = 2 it reduces to L-smoothness (model example: f(x) = ||x||^p, which is flat of order p at its minimizer).

Order-s: the integrator's one-step discretization error scales as O(h^{s+1}), i.e. global error O(h^s); h is the step size.

    Objective           Integrator        Rate
    L-smooth (p = 2)    RK44 (s = 4)
    (p = 4)             Midpoint (s = 2)
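The recipe in the theorem can be sketched in a few lines: apply a classical order-4 Runge-Kutta step (RK44) directly to the accelerated-ODE vector field with a moderate step size and watch the objective inherit the ODE's fast decay. Illustrative only, not the authors' code; the objective f(x) = 0.5 x² and the step size are assumptions:

```python
import numpy as np

def field(t, s):
    """First-order form of  x'' + (3/t) x' + grad f(x) = 0,  with s = (x, v)."""
    x, v = s
    return np.array([v, -(3.0 / t) * v - x])   # grad f(x) = x for f = 0.5 x^2

def rk4_step(t, s, h):
    # classical RK44: one-step error O(h^5), global error O(h^4)
    k1 = field(t, s)
    k2 = field(t + h / 2, s + (h / 2) * k1)
    k3 = field(t + h / 2, s + (h / 2) * k2)
    k4 = field(t + h, s + h * k3)
    return s + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

t, s, h = 1.0, np.array([1.0, 0.0]), 0.5   # start at t = 1 to avoid the 3/t blow-up
N = 100
for _ in range(N):
    s = rk4_step(t, s, h)
    t += h

f_end = 0.5 * s[0] ** 2
print(f_end)   # far below f(x_0) = 0.5: the discrete iterates track the ODE's decay
```

The point of the high-order integrator is exactly the theorem's trade-off: a larger order s tolerates a larger step size h for the same discretization error, so fewer gradient evaluations are needed to reach a given accuracy.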