Stochastic Cubic Regularization for Fast Nonconvex Optimization
Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeffrey Regier, and Michael I. Jordan
Presented by Achin Jain, University of Pennsylvania, STAT991, Spring 2019
1 – Motivation
Goal: solve the stochastic nonconvex optimization problem

min_{x ∈ R^d} f(x) := E_{ξ∼D}[f(x; ξ)],

given access only to stochastic gradients and stochastic Hessian-vector products of f(·; ξ).
2 – Objectives
Cubic regularization of Newton's method (Nesterov & Polyak, 2006) takes steps

x_{t+1} = argmin_x ∇f(x_t)ᵀ(x − x_t) + ½ (x − x_t)ᵀ ∇²f(x_t) (x − x_t) + (ρ/6) ||x − x_t||³,

where ρ is the Lipschitz constant of the Hessian. The cubic term keeps the step inside the region where the second-order model is trusted, and the update provably converges to second-order stationary points, escaping strict saddles.
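As a hedged illustration (not the paper's code), the cubic-regularized Newton step can be sketched in NumPy on a toy 2D objective; the test function, step size, and iteration counts here are all assumptions chosen for the example, and the cubic model is minimized by plain gradient descent over the step ∆.

```python
import numpy as np

# Toy nonconvex objective f(x) = x0^4/4 - x0^2/2 + x1^2/2:
# strict saddle at the origin, minima at (+-1, 0).
def grad_f(x):
    return np.array([x[0]**3 - x[0], x[1]])

def hess_f(x):
    return np.array([[3.0 * x[0]**2 - 1.0, 0.0],
                     [0.0, 1.0]])

def cubic_step(x, rho=2.0, lr=0.05, iters=1000):
    """One cubic-regularized Newton step: minimize the model
    m(d) = g^T d + 0.5 d^T H d + (rho/6) ||d||^3
    over d by gradient descent, then move to x + d."""
    g, H = grad_f(x), hess_f(x)
    d = np.zeros_like(x)
    for _ in range(iters):
        # gradient of m(d): g + H d + (rho/2) ||d|| d
        d -= lr * (g + H @ d + 0.5 * rho * np.linalg.norm(d) * d)
    return x + d

x = np.array([0.1, 1.0])   # start near the saddle
for _ in range(20):
    x = cubic_step(x)
print(x)                   # converges to a minimum near (+-1, 0)
```

Note the contrast with a pure Newton step: the negative curvature at the saddle makes the cubic model push ∆ away from the origin, which is exactly the escape behavior the method is designed for.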
Objective: find an ε-second-order stationary point of f, i.e., a point x with

||∇f(x)|| ≤ ε and λ_min(∇²f(x)) ≥ −√(ρε),

using as few stochastic oracle queries as possible.
3 – Algorithm
Assumptions on the stochastic oracles:
1. gradients: E||∇f(x; ξ) − ∇f(x)||² ≤ σ₁² and ||∇f(x; ξ) − ∇f(x)|| ≤ M₁ almost surely,
2. Hessians: E||∇²f(x; ξ) − ∇²f(x)||² ≤ σ₂² and ||∇²f(x; ξ) − ∇²f(x)|| ≤ M₂ almost surely.
Stochastic version: at iterate x_t, draw minibatches to form a gradient estimate g_t ≈ ∇f(x_t) and a Hessian-vector product oracle B_t[·] ≈ ∇²f(x_t)[·], then minimize the sampled cubic model

m_t(∆) = g_tᵀ ∆ + ½ ∆ᵀ B_t[∆] + (ρ/6) ||∆||³

over ∆ and set x_{t+1} = x_t + ∆_t. Only Hessian-vector products B_t[∆] are ever needed, never an explicit d × d Hessian.
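B_t[∆] can be obtained at gradient-like cost; one hedged way to do so from a gradient-only oracle is finite differencing of minibatch gradients, sketched below for an assumed toy per-sample loss f(x; ξ) = ½ (ξᵀx)² (this loss and the batch sizes are illustrative choices, not from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy per-sample loss f(x; xi) = 0.5 * (xi^T x)^2,
# with per-sample gradient (xi^T x) xi and Hessian xi xi^T.
def grad_sample(x, xi):
    return (xi @ x) * xi

def hvp_minibatch(x, delta, batch, h=1e-5):
    """Stochastic Hessian-vector product B_t[delta] via a finite
    difference of minibatch gradients: (g(x + h*delta) - g(x)) / h."""
    g0 = np.mean([grad_sample(x, xi) for xi in batch], axis=0)
    g1 = np.mean([grad_sample(x + h * delta, xi) for xi in batch], axis=0)
    return (g1 - g0) / h

d = 5
x, delta = rng.normal(size=d), rng.normal(size=d)
batch = rng.normal(size=(64, d))

approx = hvp_minibatch(x, delta, batch)
exact = np.mean([(xi @ delta) * xi for xi in batch], axis=0)  # true B_t[delta]
print(np.max(np.abs(approx - exact)))  # tiny finite-difference error
```

In autodiff frameworks the same quantity is typically computed exactly with a Hessian-vector product primitive rather than finite differences; the point is that its cost is comparable to a gradient evaluation.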
The inner problem min_∆ m_t(∆) is itself solved by gradient descent, which finds its global minimum despite the model's nonconvexity (Carmon & Duchi, 2016); each inner gradient evaluation costs one Hessian-vector product.
Theorem (Tripuraneni et al., 2018, informal). Provided ε ≤ min{ σ₁²/(c M₁), σ₂⁴/(c² M₂² ρ) } and the minibatch sizes are chosen large enough that g_t and B_t concentrate around ∇f(x_t) and ∇²f(x_t), Algorithm 1 will output an ε-second-order stationary point with high probability, using Õ(ε^{-3.5}) total stochastic gradient and Hessian-vector product evaluations, which improves on the Õ(ε^{-4}) rate of SGD.
4 – Experiments
Synthetic experiment: the stochastic cubic method on a nonconvex test function over x ∈ R².
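To give a flavor of the 2D setting, here is a hedged end-to-end sketch of the stochastic algorithm on an assumed toy objective with additive Gaussian oracle noise; the noise scale, batch sizes n1 and n2, and step counts are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective f(x) = x0^4/4 - x0^2/2 + x1^2/2 with noisy oracles:
# a minibatch of size n averages n zero-mean Gaussian perturbations.
def stoch_grad(x, n, sigma=0.5):
    g = np.array([x[0]**3 - x[0], x[1]])
    return g + rng.normal(scale=sigma, size=(n, 2)).mean(axis=0)

def stoch_hvp(x, d, n, sigma=0.5):
    Hd = np.array([(3.0 * x[0]**2 - 1.0) * d[0], d[1]])
    return Hd + rng.normal(scale=sigma, size=(n, 2)).mean(axis=0)

def stochastic_cubic_step(x, rho=2.0, n1=512, n2=512, lr=0.05, iters=400):
    """Minimize m_t(d) = g_t^T d + 0.5 d^T B_t[d] + (rho/6)||d||^3
    by gradient descent, using only noisy gradient / Hvp oracles."""
    g = stoch_grad(x, n1)
    d = np.zeros(2)
    for _ in range(iters):
        d -= lr * (g + stoch_hvp(x, d, n2) + 0.5 * rho * np.linalg.norm(d) * d)
    return x + d

x = np.array([0.1, 1.0])   # near the strict saddle at the origin
for _ in range(30):
    x = stochastic_cubic_step(x)
print(x)                   # ends in a small neighborhood of (+-1, 0)
```

Despite never seeing an exact gradient or Hessian, the iterates escape the saddle and settle near a local minimum; the residual fluctuation is governed by the minibatch sizes.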
5 – References
Allen-Zhu, Z. (2018). Natasha 2: Faster non-convex optimization than SGD. In Advances in Neural Information Processing Systems, pages 2675–2686.
Allen-Zhu, Z. and Li, Y. (2018). Neon2: Finding local minima via first-order oracles. In Advances in Neural Information Processing Systems, pages 3716–3726.
Carmon, Y. and Duchi, J. C. (2016). Gradient descent efficiently finds the cubic-regularized non-convex Newton step. arXiv preprint arXiv:1612.00547.
Carmon, Y., Duchi, J. C., Hinder, O., and Sidford, A. (2018). Accelerated methods for nonconvex optimization. SIAM Journal on Optimization, 28(2):1751–1772.
Ge, R., Huang, F., Jin, C., and Yuan, Y. (2015). Escaping from saddle points – online stochastic gradient for tensor decomposition. In Conference on Learning Theory, pages 797–842.
Jin, C., Ge, R., Netrapalli, P., Kakade, S. M., and Jordan, M. I. (2017). How to escape saddle points efficiently. In Proceedings of the 34th International Conference on Machine Learning, pages 1724–1732. JMLR.org.
Nesterov, Y. and Polyak, B. T. (2006). Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1):177–205.
Tripuraneni, N., Stern, M., Jin, C., Regier, J., and Jordan, M. I. (2018). Stochastic cubic regularization for fast nonconvex optimization. In Advances in Neural Information Processing Systems, pages 2899–2908.