Stochastic Gradient Methods for Neural Networks
Chih-Jen Lin
National Taiwan University
Last updated: May 25, 2020
Chih-Jen Lin (National Taiwan Univ.) 1 / 45
Outline
1. Gradient descent
2. Mini-batch SG
3. Adaptive learning rate
4. Discussion
Gradient descent
$\min_{\theta} f(\theta)$
$\sum_{i=1}^{l} \xi(z^{L+1,i}(\theta); y^i, Z^{1,i})$
$\sum_{i=1}^{l} \xi(\theta; y^i, Z^{1,i})$
∆θ
$\min_{\alpha} f(\theta + \alpha \Delta\theta)$
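The one-dimensional problem over α can be illustrated with a small sketch. Exact minimization is replaced here by backtracking with a sufficient-decrease (Armijo) test, which is a common substitute and not necessarily what the slides use. The quadratic `f`, the constants `shrink` and `c`, and all function names are assumptions made for the example.

```python
def f(theta):
    # Hypothetical test objective (not from the slides): a convex quadratic.
    return 0.5 * (theta[0] ** 2 + 10.0 * theta[1] ** 2)


def grad_f(theta):
    # Gradient of the quadratic above.
    return [theta[0], 10.0 * theta[1]]


def backtracking(f, theta, direction, grad, alpha=1.0, shrink=0.5, c=1e-4,
                 max_halvings=50):
    """Approximate min over alpha of f(theta + alpha * direction) by shrinking
    alpha until the sufficient-decrease (Armijo) condition holds."""
    slope = sum(g * d for g, d in zip(grad, direction))  # grad^T direction
    f0 = f(theta)
    for _ in range(max_halvings):
        trial = [t + alpha * d for t, d in zip(theta, direction)]
        if f(trial) <= f0 + c * alpha * slope:
            return alpha, trial
        alpha *= shrink
    raise RuntimeError("no acceptable step size found")


theta = [1.0, 1.0]
g = grad_f(theta)
direction = [-gi for gi in g]  # steepest-descent direction
alpha, theta_new = backtracking(f, theta, direction, g)
print(alpha, f(theta), f(theta_new))
```

Starting from α = 1 and halving keeps the search cheap: each trial costs one function evaluation and no extra gradient.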
$\lim_{k \to \infty} \nabla f(\theta_k) = 0,$
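A small numerical check can make this limit concrete. The sketch below runs gradient descent with a fixed step size on a hypothetical convex quadratic and records the gradient norm at every iterate. The objective, the step size `eta`, and the iteration count are assumptions chosen for the example, not values from the slides.

```python
import math


def grad_f(theta):
    # Gradient of the hypothetical objective
    # f(theta) = 0.5 * (theta_0^2 + 10 * theta_1^2).
    return [theta[0], 10.0 * theta[1]]


def gradient_descent(theta, eta=0.05, num_iters=500):
    """Fixed-step gradient descent; returns the final iterate and the
    gradient norm measured at each iterate before the update."""
    grad_norms = []
    for _ in range(num_iters):
        g = grad_f(theta)
        grad_norms.append(math.sqrt(sum(gi * gi for gi in g)))
        theta = [t - eta * gi for t, gi in zip(theta, g)]
    return theta, grad_norms


theta_final, grad_norms = gradient_descent([1.0, 1.0])
print(grad_norms[0], grad_norms[-1])
```

On this problem the gradient norm shrinks at every step. For a non-convex neural network objective the same limit only says the iterates approach a stationary point, not a global minimum.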
Mini-batch SG
$\sum_{i=1}^{l} \xi(\theta; y^i, Z^{1,i})$
1: Given an initial learning rate η.
2: while do
3:
4:
5:
6: end while
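The loop above can be sketched in code under some loudly stated assumptions. Steps 3 to 5 and the stopping condition are not legible in the listing, so the sketch fills them with the usual mini-batch stochastic gradient recipe: sample a subset of the data, average the per-instance gradients over it, and take a step of size η. The squared-error loss, the scalar parameter, the fixed number of steps, and all names are hypothetical.

```python
import random


def minibatch_sg(data, theta, eta=0.1, batch_size=4, num_steps=500, seed=0):
    """Mini-batch stochastic gradient for a scalar model y ~ theta * x.

    Assumed steps (not taken from the slides): pick a random subset S,
    average the gradient of the squared error over S, update theta.
    """
    rng = random.Random(seed)
    for _ in range(num_steps):  # assumed stopping rule: fixed step budget
        batch = rng.sample(data, batch_size)  # choose S without replacement
        # Average gradient of (theta * x - y)^2 over the mini-batch.
        grad = sum(2.0 * (theta * x - y) * x for x, y in batch) / batch_size
        theta -= eta * grad  # eta is kept constant in this sketch
    return theta


# Noise-free toy data generated from y = 3x, for x = 0.1, 0.2, ..., 2.0.
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 21)]
theta = minibatch_sg(data, theta=0.0)
print(theta)
```

Each update touches only `batch_size` instances instead of all of them, which is the cost advantage over a full gradient step. A learning rate schedule could replace the constant `eta`.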
Adaptive learning rate
t
1 ❣ i
t
1 ❣ i]
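As a hedged illustration of what an adaptive learning rate can look like, the sketch below implements an AdaGrad-style update: each coordinate divides its step by the square root of its accumulated squared gradients. Whether the slides present exactly this form is an assumption. The test objective, `eta`, `delta`, and the function names are hypothetical.

```python
import math


def grad_f(theta):
    # Gradient of the hypothetical objective
    # f(theta) = 0.5 * (theta_0^2 + 10 * theta_1^2).
    return [theta[0], 10.0 * theta[1]]


def adagrad(grad_f, theta, eta=0.5, delta=1e-8, num_steps=500):
    """AdaGrad-style update with a per-coordinate effective learning rate
    eta / (delta + sqrt(sum of squared past gradients))."""
    r = [0.0] * len(theta)  # accumulated squared gradients, one per coordinate
    for _ in range(num_steps):
        g = grad_f(theta)
        r = [ri + gi * gi for ri, gi in zip(r, g)]
        theta = [t - eta * gi / (delta + math.sqrt(ri))
                 for t, gi, ri in zip(theta, g, r)]
    return theta


theta = adagrad(grad_f, [1.0, 1.0])
print(theta)
```

Because the gradient is rescaled per coordinate, the two coordinates of this quadratic make nearly the same progress even though their curvatures differ by a factor of ten. The accumulated sum never decreases, so the effective learning rate can only shrink over time.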
Discussion