Painless Stochastic Gradient Descent: Interpolation, Line-Search, and Convergence Rates.
NeurIPS 2019 Aaron Mishkin
Stochastic Gradient Descent: Workhorse of ML?

Stochastic gradient descent (SGD) is today one of the main optimization methods used to train machine learning models.
SGD targets the finite-sum training objective

\[
\min_{w \in \mathbb{R}^d} \; f(w) = \frac{1}{n} \sum_{i=1}^{n} f_i(w),
\]

where \(f_i\) is the loss on the \(i\)-th of \(n\) training examples, and updates \(w\) by stepping along the gradient of a single randomly sampled \(f_i\) at each iteration.
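The method behind the talk's title is SGD with a stochastic Armijo backtracking line-search. Below is a minimal sketch of that strategy, assuming user-supplied per-example loss and gradient callables (`f_i`, `grad_f_i`); the hyperparameter values (`eta_max`, `c`, `beta`) and the reset-to-a-fixed-maximum backtracking rule are illustrative simplifications, not the exact procedure from the paper.

```python
import numpy as np

def sgd_armijo(f_i, grad_f_i, w0, n, max_iters=1000,
               eta_max=1.0, c=0.1, beta=0.7, seed=0):
    """SGD with a stochastic Armijo backtracking line-search (sketch).

    f_i(w, i)      -- loss of example i at parameters w
    grad_f_i(w, i) -- its gradient with respect to w
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(max_iters):
        i = int(rng.integers(n))          # sample one training example
        g = grad_f_i(w, i)
        fw = f_i(w, i)
        eta = eta_max
        # Backtrack until the Armijo condition holds on the SAMPLED loss:
        #   f_i(w - eta * g) <= f_i(w) - c * eta * ||g||^2
        while f_i(w - eta * g, i) > fw - c * eta * g.dot(g) and eta > 1e-12:
            eta *= beta                   # shrink the trial step size
        w = w - eta * g
    return w

# Illustrative use on an over-parameterized least-squares problem
# (n < d, so interpolation holds and every f_i can be driven to zero).
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 50)), rng.normal(size=20)
f  = lambda w, i: 0.5 * (A[i] @ w - b[i]) ** 2
gf = lambda w, i: (A[i] @ w - b[i]) * A[i]
w_hat = sgd_armijo(f, gf, np.zeros(50), n=20, max_iters=2000)
```

Resetting the trial step to `eta_max` at every iteration is the simplest choice; the paper also considers cheaper resets based on the previously accepted step size, which avoids repeated backtracking once the step size stabilizes.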
Experiments

[Figure: time per iteration (seconds, 0.000 to 0.005) for each optimizer on MNIST, CIFAR10, and CIFAR100, and on mushrooms, ijcnn, and matrix factorization (MF: 1, MF: 10). Methods compared: Tuned SGD, SGD + Goldstein, SGD + Armijo, Polyak + Armijo, Nesterov + Armijo, Adam, Coin-Betting, AdaBound, SEG + Lipschitz.]
[Figure: distance to the optimum (log scale) vs. number of epochs on a bilinear problem, with interpolation (left panel) and without interpolation (right panel). Methods compared: Adam, Extra-Adam, SEG + Lipschitz, SVRE + Restarts.]
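The two bilinear panels are distinguished by the interpolation property from the talk's title. As a one-line reminder (the standard usage in this literature, stated here for completeness rather than quoted from the slides), interpolation holds when the model fits every training example, so each individual loss is already minimized at the solution:

\[
\nabla f_i(w^{*}) = 0 \ \text{ for all } i \in \{1, \dots, n\}, \qquad w^{*} \in \arg\min_{w} f(w).
\]

This is the regime in which stochastic gradients vanish at the solution and in which the paper proves fast convergence rates for line-search methods; the panel without interpolation shows how behavior degrades when the condition fails.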
[Figure: training loss (log scale) vs. iterations on synthetic matrix factorization. Methods compared: Adam, SGD + Armijo, Nesterov + Armijo.]