Stochastic Composite Optimization:
Variance Reduction, Acceleration, and Robustness to Noise
Andrei Kulunchakov, Julien Mairal
Inria Grenoble
ML in the real world, Criteo
Julien Mairal Stochastic Composite Optimization 1/24
\[ \min_{x \in \mathbb{R}^p} \{ F(x) := f(x) + \psi(x) \}, \]
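A minimal sketch of how a composite objective F(x) = f(x) + ψ(x) is commonly handled: take a gradient step on the smooth part f, then apply the proximal operator of ψ. Everything concrete here is an assumption made for illustration and does not come from the slides: ψ is taken to be lam * ||x||_1 (whose proximal operator is soft-thresholding), f is a toy quadratic, and the function names are hypothetical.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied coordinate-wise."""
    return [max(abs(vi) - tau, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def prox_gradient_step(x, grad, step, lam):
    """One step x <- prox_{step * psi}(x - step * grad), with psi = lam * ||.||_1."""
    shifted = [xi - step * gi for xi, gi in zip(x, grad)]
    return soft_threshold(shifted, step * lam)


def grad_quadratic(x, target):
    """Gradient of the illustrative smooth part f(x) = 0.5 * ||x - target||^2."""
    return [xi - ti for xi, ti in zip(x, target)]


# Toy problem (assumed): minimize 0.5 * ||x - target||^2 + 0.5 * ||x||_1.
target = [3.0, -0.2, 0.0]
x = [0.0, 0.0, 0.0]
for _ in range(50):
    x = prox_gradient_step(x, grad_quadratic(x, target), step=0.5, lam=0.5)
```

In a stochastic variant, `grad` would be replaced by a noisy estimate of the gradient of f while the proximal step stays unchanged.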
(a) smooth (b) non-smooth
(a) non-convex (b) convex (c) strongly-convex
Picture from F. Bach
[Equation, partly recoverable: terms indexed by the sampled index $i_t$ and the average $\frac{1}{n} \sum_{i=1}^{n} y_i^{t-1}$.]
\[ \min_{x \in \mathbb{R}^p} \{ F(x) := \mathbb{E}[\, \ldots \]
\[ \ldots (x - x_{k-1}) + \mu \ldots \]
\[ \ldots + \frac{\mu}{2} \|x - x_0\|^2. \]
\[ \prod_{t=1}^{k} (1 - \delta_t), \qquad \ldots = \mathbb{E}\big[\|g_t - \nabla f(x_{t-1})\|^2\big]. \]
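The expectation E[||g_t − ∇f(x_{t−1})||^2] measures how far a stochastic gradient estimate g_t is from the true gradient. As a rough illustration, it can be computed exactly for a small finite sum when g_t is the gradient of one uniformly sampled term. The scalar least-squares terms and the function names below are assumptions for the example only.

```python
def full_gradient(x, data):
    """Gradient of f(x) = (1/n) * sum_i 0.5 * (a_i * x - b_i)^2 for scalar x."""
    return sum(a * (a * x - b) for a, b in data) / len(data)


def gradient_variance(x, data):
    """Exact E[(g - grad f(x))^2] when g is the gradient of one uniformly drawn f_i."""
    mean = full_gradient(x, data)
    return sum((a * (a * x - b) - mean) ** 2 for a, b in data) / len(data)


# Two illustrative terms (assumed data): (a_i, b_i) pairs.
data = [(1.0, 1.0), (1.0, 3.0)]
variance = gradient_variance(2.0, data)
```

For scalar x the squared norm reduces to a plain square; a vector version would sum the squared differences over coordinates.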
\[ \ldots (x - y_{k-1}) + \mu \ldots \]
\[ \ldots + \frac{\mu}{2} \|x - x_0\|^2. \]
[Figure: two panels, datasets alpha and ckn-cifar. x-axis: effective passes over data (50 to 300); y-axis: log(F/F* − 1), from 10^-5 to 10^0. Methods: rand-SVRG 1/12L, rand-SVRG 1/3L, acc-SVRG 1/3L, SGD 1/L, SGD-d, acc-SGD-d, acc-mb-SGD-d.]
[Figure: two further panels with the same axes and methods; panel titles not recoverable.]
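For readers unfamiliar with the SVRG family named in the plot legend, here is a generic sketch of an SVRG-style gradient estimator with a randomly refreshed anchor point. It is not taken from the slides, and whether it matches the "rand-SVRG" curves exactly is an assumption. The toy data, the refresh probability 1/n, the absence of a non-smooth term ψ, and all function names are illustrative choices. The step size 1/(3L) mirrors the "1/3L" label.

```python
import random


def grad_i(x, a, b):
    """Gradient of f_i(x) = 0.5 * (a * x - b)^2 for scalar x."""
    return a * (a * x - b)


def full_grad(x, data):
    """Gradient of the average of the f_i."""
    return sum(grad_i(x, a, b) for a, b in data) / len(data)


def svrg_random_anchor(data, step, iters, seed=0):
    """SVRG-style iterations where the anchor is refreshed at random."""
    rng = random.Random(seed)
    n = len(data)
    x = 0.0
    anchor = x
    anchor_grad = full_grad(anchor, data)
    for _ in range(iters):
        a, b = data[rng.randrange(n)]
        # Variance-reduced estimate: unbiased, and its variance shrinks
        # as both x and the anchor approach the minimizer.
        g = grad_i(x, a, b) - grad_i(anchor, a, b) + anchor_grad
        x -= step * g
        # Refresh the anchor with probability 1/n (illustrative choice).
        if rng.random() < 1.0 / n:
            anchor = x
            anchor_grad = full_grad(anchor, data)
    return x


# Toy finite sum (assumed data); L is the largest a_i^2.
data = [(1.0, 1.0), (2.0, 1.0), (1.0, -1.0), (0.5, 2.0)]
L = max(a * a for a, _ in data)
x_hat = svrg_random_anchor(data, step=1.0 / (3.0 * L), iters=2000)
x_star = sum(a * b for a, b in data) / sum(a * a for a, _ in data)
```

Unlike plain SGD with a constant step, the estimator's noise vanishes near the solution, so the iterate can converge to the minimizer `x_star` rather than hovering in a noise-dominated region.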