Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure
Alberto Bietti, Julien Mairal
Inria Grenoble (Thoth)
March 21, 2017
Stochastic optimization
◮ Infinite datasets (expected risk, D: data distribution), or “single pass”
  ◮ SGD, stochastic mirror descent, FOBOS, RDA
  ◮ O(1/ε) complexity
◮ Finite datasets (empirical risk): f_i(x) = ℓ(y_i, x⊤ξ_i) + (µ/2)‖x‖²
  ◮ SAG, SDCA, SVRG, SAGA, MISO, etc.
  ◮ O(log 1/ε) complexity
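For concreteness, a minimal numpy sketch of one term of this finite sum, assuming logistic loss as the choice of ℓ (the slide does not fix ℓ):

```python
import numpy as np

def f_i(x, xi_i, y_i, mu):
    # One term of the finite sum: f_i(x) = ℓ(y_i, x^T ξ_i) + (µ/2)‖x‖²,
    # here with the logistic loss ℓ(y, z) = log(1 + exp(-y z)).
    return np.log1p(np.exp(-y_i * (x @ xi_i))) + 0.5 * mu * np.dot(x, x)

def F(x, Xi, y, mu):
    # Empirical risk: average of the f_i over the n examples (rows of Xi).
    return np.mean([f_i(x, Xi[i], y[i], mu) for i in range(len(y))])
```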
◮ Image data augmentation: add random transformations of each image.
◮ Dropout: set coordinates of feature vectors to 0 with probability δ.
Example (dropout on text, words removed at random):
Original: “The colorful Norwegian city Bergen is also a gateway to majestic fjords. Bryggen Hanseatic Wharf will give you a sense of the local culture – take some time to snap photos of the commercial buildings, which look like scenery from a movie set.”
With dropout: “The colorful of gateway to fjords. Hanseatic Wharf will sense the culture – take some to snap photos the commercial buildings, which look scenery a”
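A minimal numpy sketch of the dropout perturbation on a feature vector; this version does not rescale the surviving coordinates by 1/(1 − δ), which is an assumption rather than something stated here:

```python
import numpy as np

def dropout(xi, delta, rng):
    # Perturbed example: each coordinate of the feature vector is set to 0
    # independently with probability delta (no rescaling in this sketch).
    mask = rng.random(xi.shape) >= delta
    return xi * mask

rng = np.random.default_rng(0)
xi = rng.standard_normal(10)           # a feature vector ξ_i
rho = dropout(xi, delta=0.3, rng=rng)  # one random perturbation of ξ_i
```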
◮ Sample index i_t, perturbation ρ_t ∼ Γ
◮ Update: x_t = x_{t−1} − η_t ∇f̃_{i_t}(x_{t−1}, ρ_t)
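A minimal sketch of this perturbed SGD iteration; grad_f_tilde, sample_perturbation, and stepsize are placeholder callables introduced for illustration:

```python
import numpy as np

def sgd_perturbed(x0, n, grad_f_tilde, sample_perturbation, stepsize, T, rng):
    # SGD on the perturbed finite sum: at each step, draw an index i_t and a
    # perturbation rho_t ~ Gamma, then take a stochastic gradient step.
    x = np.array(x0, dtype=float)
    for t in range(1, T + 1):
        i_t = rng.integers(n)             # sample index i_t uniformly
        rho_t = sample_perturbation(rng)  # sample perturbation rho_t ~ Gamma
        eta_t = stepsize(t)               # e.g. a decreasing step size
        x -= eta_t * grad_f_tilde(x, i_t, rho_t)
    return x
```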
◮ Select i_t, update the lower bounds:
    d_i^t(x) = f_i(x_{t−1}) + ⟨∇f_i(x_{t−1}), x − x_{t−1}⟩ + (µ/2)‖x − x_{t−1}‖²  if i = i_t,
    d_i^t(x) = d_i^{t−1}(x)  otherwise
◮ Minimize the model: x_t = arg min_x { D_t(x) = (1/n) Σ_{i=1}^n d_i^t(x) }
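Since each d_i^t is a quadratic with curvature µ centered at z_i = x_{t−1} − (1/µ)∇f_i(x_{t−1}), minimizing D_t reduces to averaging these centers. A minimal sketch of the resulting iteration with uniform sampling, where grad_f(x, i) stands for ∇f_i(x):

```python
import numpy as np

def miso(x0, n, grad_f, mu, T, rng):
    # MISO for a µ-strongly convex finite sum: each surrogate d_i is a quadratic
    # with curvature µ centered at z_i, so minimizing the model averages the z_i.
    z = np.tile(x0, (n, 1))          # centers z_i of the quadratic lower bounds
    x = np.array(x0, dtype=float)    # current iterate, x = (1/n) * sum_i z_i
    for _ in range(T):
        i = rng.integers(n)              # select i_t uniformly at random
        z_new = x - grad_f(x, i) / mu    # recenter the surrogate of f_{i_t}
        x = x + (z_new - z[i]) / n       # update the average incrementally
        z[i] = z_new
    return x
```

Updating the running average incrementally keeps the cost per iteration at one gradient evaluation plus O(p) extra work.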
◮ Composite objectives with a non-smooth regularizer h: MISO extends to this case by adding h to the lower-bound model (Lin et al.); a prox sketch follows after this list
◮ Different Lyapunov function (‖x_t − x*‖² replaced by an upper bound)
◮ Similar to Regularized Dual Averaging when n = 1
◮ Non-uniform sampling: smoothness constants L_i of each f̃_i
◮ Sampling “difficult” examples more often can improve the dependence on L (see the sketch after this list)
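Two illustrative pieces rather than the exact updates from the talk: sampling an index with probability proportional to L_i, and the proximal operator of h(x) = λ‖x‖₁ (soft-thresholding), the kind of step that appears once h is added to the model; how they enter the full algorithm is not detailed here:

```python
import numpy as np

def sample_index(L, rng):
    # Non-uniform sampling: pick example i with probability q_i ∝ L_i,
    # so "difficult" (large-L_i) examples are selected more often.
    q = np.asarray(L, dtype=float)
    q /= q.sum()
    return rng.choice(len(q), p=q)

def prox_l1(v, lam):
    # Proximal operator of h(x) = lam * ||x||_1 (soft-thresholding),
    # an example of the extra step needed for composite objectives.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
```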
[Figure: gene dropout experiments, δ ∈ {0.30, 0.10, 0.01}; x-axis: epochs, y-axis: f − f*; curves: S-MISO, SGD, and N-SAGA, each with η = 0.1 and η = 1.0.]
[Figure: STL-10 ckn experiments, µ ∈ {10⁻³, 10⁻⁴, 10⁻⁵}; x-axis: epochs, y-axis: f − f*; curves: S-MISO and SGD with η = 0.1 and η = 1.0, and N-SAGA with η = 0.1.]