Painless Stochastic Gradient Descent: Interpolation, Line-Search, and Convergence Rates.
MLSS 2020, Aaron Mishkin (amishkin@cs.ubc.ca)
Stochastic Gradient Descent: Workhorse of ML?
Stochastic gradient descent (SGD) is today one of the most widely used optimization methods in machine learning.
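The talk's central method pairs SGD with a backtracking Armijo line-search evaluated on the sampled mini-batch, which removes step-size tuning when the model can interpolate the data. Below is a minimal sketch, assuming a generic f(w, batch) / grad(w, batch) interface; the constants (eta_max, c, beta) and the toy problem are illustrative, not the talk's exact setup.

```python
import numpy as np

def sgd_armijo(f, grad, sample_batch, w0, n_iters=500,
               eta_max=1.0, c=0.1, beta=0.7):
    """SGD where each step size is found by backtracking until the
    stochastic Armijo condition holds on the sampled mini-batch:
        f_i(w - eta * g) <= f_i(w) - c * eta * ||g||^2
    """
    w = w0.astype(float).copy()
    for _ in range(n_iters):
        batch = sample_batch()
        g = grad(w, batch)
        fw = f(w, batch)
        eta = eta_max
        # Backtrack: shrink eta until the mini-batch loss decreases enough.
        while f(w - eta * g, batch) > fw - c * eta * (g @ g) and eta > 1e-8:
            eta *= beta
        w = w - eta * g
    return w

# Toy usage: noiseless least squares, so interpolation holds exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true

f = lambda w, i: 0.5 * np.mean((X[i] @ w - y[i]) ** 2)
grad = lambda w, i: X[i].T @ (X[i] @ w - y[i]) / len(i)
sample = lambda: rng.integers(0, len(X), size=16)

w_hat = sgd_armijo(f, grad, sample, np.zeros(5))
print(np.linalg.norm(w_hat - w_true))  # shrinks toward 0 under interpolation
```

Because the sufficient-decrease test uses only the current mini-batch, each backtracking step costs one extra function evaluation on data that is already loaded, which is why the per-iteration overhead in the experiments below stays small.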
Experiments

[Figure: time per-iteration (s), on the order of 0.000 to 0.005, on MNIST, CIFAR10, CIFAR100, mushrooms, ijcnn, and matrix factorization (MF: 1, MF: 10). Methods compared: Tuned SGD, SGD + Goldstein, Adam, Polyak + Armijo, Coin-Betting, AdaBound, SGD + Armijo, Nesterov + Armijo, SEG + Lipschitz.]
[Figure: distance to the optimum (log scale) vs. number of epochs (up to 400) on bilinear problems, with interpolation (left) and without interpolation (right). Methods compared: Adam, Extra-Adam, SEG + Lipschitz, SVRE + Restarts.]
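The bilinear experiments use min-max problems where plain simultaneous gradient steps spiral away from the equilibrium, while extragradient-style updates (the core of SEG) converge. A minimal sketch on the scalar game f(x, y) = xy, with an illustrative fixed step size in place of the Lipschitz line-search:

```python
def extragradient(x, y, eta=0.1, n_iters=500):
    """Extragradient on the bilinear game min_x max_y f(x, y) = x * y."""
    for _ in range(n_iters):
        # Extrapolation step using gradients at the current point.
        x_half = x - eta * y   # grad_x f(x, y) = y (descent on x)
        y_half = y + eta * x   # grad_y f(x, y) = x (ascent on y)
        # Update step using gradients at the extrapolated point.
        x, y = x - eta * y_half, y + eta * x_half
    return x, y

print(extragradient(1.0, 1.0))  # approaches the equilibrium (0, 0)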