Learning From Data, Lecture 22: Neural Networks and Overfitting
Approximation vs. Generalization; Regularization and Early Stopping; Minimizing Ein More Efficiently
M. Magdon-Ismail
CSCI 4100/6100
recap: Neural Networks and Fitting the Data
Forward propagation:

$$x = x^{(0)} \xrightarrow{W^{(1)}} s^{(1)} \xrightarrow{\theta} x^{(1)} \xrightarrow{W^{(2)}} s^{(2)} \;\cdots\; \xrightarrow{W^{(L)}} s^{(L)} \xrightarrow{\theta} x^{(L)} = h(x)$$

$$s^{(\ell)} = (W^{(\ell)})^{\mathsf{T}} x^{(\ell-1)}, \qquad x^{(\ell)} = \theta(s^{(\ell)})$$

The sensitivities propagate backwards:

$$\delta^{(1)} \longleftarrow \delta^{(2)} \;\cdots\; \longleftarrow \delta^{(L-1)} \longleftarrow \delta^{(L)}$$
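To make the two recursions concrete, here is a minimal NumPy sketch of one forward and one backward pass, assuming θ = tanh, squared error, a single output, and no bias terms; the names forward, backward, Ws, and deltas are illustrative, not from the lecture.

```python
import numpy as np

def forward(x, Ws):
    """Forward propagation: s(l) = (W(l))^T x(l-1), x(l) = theta(s(l))."""
    xs = [x]                      # x(0) is the input
    for W in Ws:
        s = W.T @ xs[-1]          # s(l) = (W(l))^T x(l-1)
        xs.append(np.tanh(s))     # x(l) = theta(s(l)), theta = tanh
    return xs                     # xs[-1] = x(L) = h(x)

def backward(xs, Ws, y):
    """Backpropagation of the sensitivities delta(l) for squared error."""
    L = len(Ws)
    deltas = [None] * (L + 1)
    # delta(L) = dE/ds(L); for tanh, theta'(s(L)) = 1 - x(L)**2
    deltas[L] = 2 * (xs[L] - y) * (1 - xs[L] ** 2)
    for l in range(L - 1, 0, -1):
        # delta(l) = theta'(s(l)) (elementwise) W(l+1) delta(l+1)
        deltas[l] = (1 - xs[l] ** 2) * (Ws[l] @ deltas[l + 1])
    # gradient of the error on this example: dE/dW(l) = x(l-1) (delta(l))^T
    return [np.outer(xs[l - 1], deltas[l]) for l in range(1, L + 1)]
```

Gradient descent then updates each W(ℓ) by subtracting η times the corresponding returned gradient.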
[Figures: log10(error) vs. log10(iteration) for gradient descent and SGD; the resulting fit on the digits data (average intensity vs. symmetry).]
2-layer neural network
[Figure: a 2-layer neural network; hidden units v1, …, vm feed through output weights w0, w1, …, wm to produce h(x).]
Tunable Transform
Neural Network: $h(x) = \theta\big(w_0 + \sum_{j=1}^{m} w_j\,\theta(v_j^{\mathsf{T}} x)\big)$

Nonlinear Transform: $h(x) = w_0 + \sum_{j=1}^{\tilde d} w_j\,\Phi_j(x)$

k-RBF-Network: $h(x) = \theta\big(w_0 + \sum_{j=1}^{k} w_j\,\phi(\|x - \mu_j\|)\big)$

Tunable basis functions give more approximation power.
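A hedged sketch of evaluating the two tunable models above, assuming θ = tanh and a Gaussian basis function φ(r) = e^(−r²); the function names are invented for illustration.

```python
import numpy as np

def nn_2layer(x, V, w):
    """2-layer NN: h(x) = tanh(w[0] + sum_j w[j] * tanh(vj^T x))."""
    return np.tanh(w[0] + w[1:] @ np.tanh(V @ x))   # rows of V are the vj

def rbf_network(x, mus, w, phi=lambda r: np.exp(-r ** 2)):
    """k-RBF network: h(x) = tanh(w[0] + sum_j w[j] * phi(||x - mu_j||))."""
    r = np.linalg.norm(mus - x, axis=1)             # distances to centers mu_j
    return np.tanh(w[0] + w[1:] @ phi(r))
```

In both cases the basis functions themselves (the vj or the μj) are learned from the data, unlike the fixed Φj of the nonlinear transform.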
Generalization
The tunable models are semi-parametric, because you still have to learn the parameters.
Regularization – weight decay
Augment the error with a weight-decay penalty, $E_{\text{aug}}(w) = E_{\text{in}}(w) + \frac{\lambda}{N} \sum_{\ell,i,j} \big(w^{(\ell)}_{ij}\big)^2$; the gradient of Ein is still computed by backpropagation.
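As a sketch, one descent step on the augmented error could look as follows, assuming the squared-weight penalty above and per-layer gradients of Ein already obtained from backpropagation (the extra term is the penalty's gradient).

```python
def weight_decay_step(Ws, grads, eta, lam, N):
    """One gradient step on E_aug(w) = E_in(w) + (lam/N) * sum of squared weights.

    Ws    -- weight matrices W(1), ..., W(L)
    grads -- gradients of E_in w.r.t. each W(l), from backpropagation
    """
    # d/dW of the penalty (lam/N) * sum w^2 is (2*lam/N) * W
    return [W - eta * (g + (2 * lam / N) * W) for W, g in zip(Ws, grads)]
```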
Digits data
Early Stopping
One gradient step of size $\eta$ takes $w_1 = w_0 - \eta\, g_0/\|g_0\|$, so $w_1$ lies in

$$H_1 = \{w : \|w - w_0\| \le \eta\}.$$

After two steps, $w_2 \in H_2 = H_1 \cup \{w : \|w - w_1\| \le \eta\}$; after three, $w_3 \in H_3 = H_2 \cup \{w : \|w - w_2\| \le \eta\}$. Each iteration enlarges the effective hypothesis set, so stopping early at $t^*$ amounts to learning with the smaller set $H_{t^*}$, which helps generalization.

[Figure: the descent path from $w(0)$ to $w(t^*)$, shown against a contour of constant Ein.]
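In practice the stopping time t* is chosen with a validation set: run gradient descent and keep the weights with the lowest validation error seen so far. A minimal sketch, assuming user-supplied callables grad_Ein and E_val:

```python
import numpy as np

def early_stopping_gd(w0, grad_Ein, E_val, eta=0.01, max_iter=100_000):
    """Fixed-step gradient descent; return the weights w(t*) that
    minimize the validation error along the optimization path."""
    w, best_w, best_val = w0, w0, E_val(w0)
    for t in range(max_iter):
        g = grad_Ein(w)
        ng = np.linalg.norm(g)
        if ng < 1e-12:                 # reached a minimum of Ein
            break
        w = w - eta * g / ng           # step of size eta, as in the sets H_t
        val = E_val(w)
        if val < best_val:             # new validation minimum: remember t*
            best_w, best_val = w, val
    return best_w                      # report w(t*), not the final weights
```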
Early stopping on digits data
Minimizing Ein
Beefing up gradient descent
Variable learning rate
1: Initialize w(0) and η0 at t = 0. Set α > 1 and β < 1.
2: while stopping criterion has not been met do
3:     Compute the gradient g(t) = ∇Ein(w(t)).
4:     Take a tentative step: w′ = w(t) − ηt g(t).
5:     if Ein(w′) < Ein(w(t)) then accept: w(t+1) = w′ and increase the rate, ηt+1 = α ηt (α ∈ [1.05, 1.1]).
6:     else
7:         reject: w(t+1) = w(t) and decrease the rate, ηt+1 = β ηt (β ∈ [0.7, 0.8]).
8:     end if
9:     t ← t + 1.
10: end while
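A compact Python sketch of the same accept/reject scheme, assuming callables Ein and grad_Ein; the defaults for α and β are taken from the ranges quoted above.

```python
def variable_eta_gd(w0, Ein, grad_Ein, eta=0.1, alpha=1.05, beta=0.75,
                    max_iter=10_000):
    """Gradient descent with a variable learning rate: grow eta on
    accepted steps, shrink it on rejected ones."""
    w, E = w0, Ein(w0)
    for t in range(max_iter):
        w_try = w - eta * grad_Ein(w)   # tentative step
        E_try = Ein(w_try)
        if E_try < E:                   # error decreased: accept, be aggressive
            w, E = w_try, E_try
            eta *= alpha                # alpha in [1.05, 1.1]
        else:                           # error increased: reject, be conservative
            eta *= beta                 # beta in [0.7, 0.8]
    return w
```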
Steepest Descent – Line Search
1: Initialize w(0) and set t = 0.
2: while stopping criterion has not been met do
3:     Compute the gradient g(t) = ∇Ein(w(t)) and set the direction v(t) = −g(t).
4:     Line search: η∗ = argminη≥0 Ein(w(t) + η v(t)).
5:     Update: w(t+1) = w(t) + η∗ v(t).
6:     t ← t + 1.
7: end while
[Figure: at the line-search minimum, the new gradient is orthogonal to the search direction: −g(t + 1) ⊥ v(t) at w(t + 1).]
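A sketch of this procedure with a golden-section line search, assuming Ein is unimodal along each search slice; line_search and its eta_max bracket are illustrative choices, not the lecture's.

```python
import numpy as np

def line_search(E, w, v, eta_max=2.0, tol=1e-8):
    """Golden-section search for eta minimizing E(w + eta*v) on [0, eta_max]."""
    phi = (np.sqrt(5) - 1) / 2                  # golden ratio conjugate
    a, b = 0.0, eta_max
    c, d = b - phi * (b - a), a + phi * (b - a)
    while b - a > tol:
        if E(w + c * v) < E(w + d * v):         # minimum lies in [a, d]
            b, d = d, c
            c = b - phi * (b - a)
        else:                                   # minimum lies in [c, b]
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2

def steepest_descent(w0, Ein, grad_Ein, max_iter=100):
    """Steepest descent: move to the line-search minimum along -g(t)."""
    w = w0
    for t in range(max_iter):
        v = -grad_Ein(w)                        # v(t) = -g(t)
        w = w + line_search(Ein, w, v) * v      # w(t+1) = w(t) + eta* v(t)
    return w
```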
Comparison of optimization heuristics
[Figure: error vs. optimization time, 0.1–10⁴ sec (log scale), for the gradient descent heuristics.]
Conjugate gradients
[Figure: error vs. optimization time, 0.1–10⁴ sec (log scale), now including conjugate gradients.]
Error after a given optimization time:

Method                         10 sec     1,000 sec      50,000 sec
Stochastic Gradient Descent    0.0203     0.000447       1.6310 × 10⁻⁵
Steepest Descent               0.0497     0.0194         0.000140
Conjugate Gradients            0.0200     1.13 × 10⁻⁶    2.73 × 10⁻⁹

There are better algorithms (e.g., Levenberg–Marquardt), but we will stop here.
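Conjugate gradients chooses each new search direction as a mix of the new negative gradient and the previous direction, so successive line searches do not undo each other. A self-contained sketch of the nonlinear Polak–Ribière variant with a crude backtracking line search; this is an illustration under those assumptions, not necessarily the exact method timed above.

```python
def backtrack(E, w, v, eta=1.0, shrink=0.5, tries=50):
    """Crude line search: halve eta until the step decreases E."""
    E0 = E(w)
    for _ in range(tries):
        if E(w + eta * v) < E0:
            break
        eta *= shrink
    return eta

def conjugate_gradients(w0, Ein, grad_Ein, max_iter=100):
    """Nonlinear conjugate gradients (Polak-Ribiere): each direction mixes
    the new negative gradient with the previous direction."""
    w = w0
    g = grad_Ein(w)
    v = -g                                   # first step is steepest descent
    for t in range(max_iter):
        w = w + backtrack(Ein, w, v) * v     # approximate minimization along v(t)
        g_new = grad_Ein(w)
        # Polak-Ribiere coefficient; clip at 0 to restart with -g if needed
        b = max(0.0, g_new @ (g_new - g) / (g @ g))
        v = -g_new + b * v                   # next direction, conjugate to the last
        g = g_new
    return w
```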