Regularization for Deep Learning
Lecture slides for Chapter 7 of Deep Learning (www.deeplearningbook.org), Ian Goodfellow, 2016-09-27
Definition
“Regularization is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error.”
(Goodfellow 2016)
Figure 7.1: The effect of L2 regularization in parameter space (w1, w2): the regularized solution w̃ lies between the origin and the unregularized minimum w∗.
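The shrinkage in Figure 7.1 can be sketched numerically. For a quadratic objective with Hessian H and minimum w∗, adding the penalty (α/2)‖w‖² moves the minimum to (H + αI)⁻¹ H w∗; the Hessian and values below are illustrative assumptions, not from the slides.

```python
import numpy as np

# Assumed quadratic objective J(w) = 0.5 (w - w_star)^T H (w - w_star).
# Adding an L2 penalty (alpha/2)||w||^2 moves the minimum from w_star to
# w_tilde = (H + alpha I)^{-1} H w_star: directions of low curvature are
# shrunk the most, as Figure 7.1 depicts.
H = np.array([[4.0, 0.0],
              [0.0, 0.5]])       # illustrative Hessian
w_star = np.array([1.0, 1.0])    # unregularized minimum
alpha = 0.5                      # weight-decay strength

w_tilde = np.linalg.solve(H + alpha * np.eye(2), H @ w_star)
print(w_tilde)  # [0.888..., 0.5] -- the low-curvature direction shrinks more
```

Note the high-curvature component (eigenvalue 4.0) barely moves, while the low-curvature one (0.5) is halved, matching the geometry of the figure.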
L1 regularization: MAP Bayesian estimation with a Laplace prior.
L2 regularization: MAP Bayesian estimation with a Gaussian prior.
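In the MAP view, the penalty term is −log p(w) up to a constant. A minimal numeric sketch, with prior scales chosen arbitrarily for illustration:

```python
import numpy as np

# MAP view of regularization: -log p(w) supplies the penalty term.
# An isotropic Gaussian prior gives a quadratic (L2) penalty; a Laplace
# prior gives an absolute-value (L1) penalty.  Scales are illustrative.
w = np.array([0.5, -1.5, 2.0])

sigma2 = 1.0  # Gaussian prior variance
l2_penalty = np.sum(w ** 2) / (2 * sigma2)   # -log N(w; 0, sigma2) + const

b = 1.0       # Laplace prior scale
l1_penalty = np.sum(np.abs(w)) / b           # -log Laplace(w; 0, b) + const

print(l2_penalty, l1_penalty)  # 3.25 4.0
```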
Dataset augmentation examples: affine distortion, noise, elastic deformation, horizontal flip, random translation, hue shift.
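Toy versions of three of these augmentations, applied to a small H×W array; a real pipeline would use an image library, this only shows the array operations involved.

```python
import numpy as np

# Three augmentations sketched on a toy "image" (H x W array of floats).
rng = np.random.default_rng(0)
img = np.arange(12, dtype=float).reshape(3, 4)

flipped = img[:, ::-1]                          # horizontal flip
shifted = np.roll(img, 1, axis=1)               # translation (wrap-around, toy choice)
noisy = img + rng.normal(0.0, 0.1, img.shape)   # additive Gaussian noise

print(flipped.shape, shifted.shape, noisy.shape)  # all remain (3, 4)
```

Each transformation yields a new training example whose label is unchanged, which is the point of augmentation as a regularizer.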
Figure 7.2: Multi-task learning. A shared representation h(shared), computed from the input x, feeds task-specific layers h(1) and h(2), which produce outputs y(1) and y(2); an additional representation h(3) built on h(shared) need not serve either task.
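The wiring of Figure 7.2 can be sketched with random placeholder weights; only the structure (shared trunk, task-specific heads) matters here.

```python
import numpy as np

# Wiring sketch of multi-task learning: h_shared is reused by both tasks,
# so its parameters receive gradient signal from both.  Weights are random
# placeholders, not trained values.
rng = np.random.default_rng(0)
x = rng.normal(size=3)                      # input

W_shared = rng.normal(size=(4, 3))
h_shared = np.tanh(W_shared @ x)            # representation shared across tasks

W1, W2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 4))
h1, h2 = np.tanh(W1 @ h_shared), np.tanh(W2 @ h_shared)   # task-specific layers

v1, v2 = rng.normal(size=2), rng.normal(size=2)
y1, y2 = v1 @ h1, v2 @ h2                   # one scalar output per task
print(y1, y2)
```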
Figure 7.3: Training set loss and validation set loss (negative log-likelihood) over time (epochs). Early stopping: terminate while validation set performance is still better, before the validation loss begins to rise.
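The early-stopping rule can be sketched as a loop that remembers the best validation epoch and stops after a fixed patience; the loss sequence below is made up to drive the loop.

```python
# Early-stopping sketch: keep the epoch with the best validation loss and
# stop after `patience` epochs without improvement.  In a real trainer the
# model parameters from best_epoch would be saved and restored.
val_losses = [0.20, 0.15, 0.12, 0.11, 0.115, 0.12, 0.13, 0.14]  # illustrative

best_loss, best_epoch, patience = float("inf"), -1, 2
for epoch, loss in enumerate(val_losses):
    if loss < best_loss:
        best_loss, best_epoch = loss, epoch
    elif epoch - best_epoch >= patience:
        break  # validation performance stopped improving

print(best_epoch, best_loss)  # 3 0.11
```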
Figure 7.4: Two panels in (w1, w2) space comparing early stopping (left) with L2 regularization (right); in both, the solution w̃ stops short of the unregularized minimum w∗, illustrating their equivalence for a quadratic objective.
Equation 7.47 (sparse representations): y = B h, with y ∈ R^m a dense observation, B ∈ R^{m×n}, and h ∈ R^n a sparse code (most entries zero). The slide's concrete example has m = 5, n = 6, so y is explained by just a few active columns of B.
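A numeric sketch of y = Bh with a sparse code; B and the active entries of h are arbitrary illustrative values, not the slide's numbers.

```python
import numpy as np

# Sparse representation: most entries of the code h are zero, so the dense
# observation y is a combination of only a few columns of B.
rng = np.random.default_rng(0)
m, n = 5, 6
B = rng.normal(size=(m, n))    # illustrative dictionary

h = np.zeros(n)
h[[1, 4]] = [2.0, -3.0]        # only two active code entries

y = B @ h                      # y mixes just columns 1 and 4 of B
print(np.count_nonzero(h), y.shape)  # 2 (5,)
```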
Figure 7.5: Bagging. The original dataset (here, images of the digit 8) is resampled with replacement into a first and second resampled dataset, so the first and second ensemble members train on different data and learn different features.
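The bootstrap-and-average procedure can be sketched with a deliberately tiny "model" (the sample mean), just to show the resampling and aggregation steps.

```python
import numpy as np

# Bagging sketch: each ensemble member sees a bootstrap resample of the
# data (sampling with replacement); predictions are then averaged.
rng = np.random.default_rng(0)
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative dataset

members = []
for _ in range(3):
    idx = rng.integers(0, len(data), size=len(data))  # resample with replacement
    members.append(data[idx].mean())                  # "fit" one member

ensemble_prediction = np.mean(members)
print(ensemble_prediction)
```

Averaging over members whose errors are not perfectly correlated reduces the variance of the combined predictor.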
Figure 7.6: Dropout trains an ensemble of subnetworks obtained by removing non-output units (x1, x2, h1, h2) from a base network with inputs x1, x2, hidden units h1, h2, and output y; every subset of units defines one ensemble member.
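A single dropout step can be sketched as sampling a binary mask over hidden units, which selects one subnetwork from the ensemble; the inverted scaling keeps the expected activation unchanged.

```python
import numpy as np

# Dropout sketch: a Bernoulli mask removes hidden units, selecting one
# subnetwork as in Figure 7.6.  Inverted dropout divides by keep_prob so
# no rescaling is needed at test time.
rng = np.random.default_rng(0)
h = np.ones(8)                     # hidden activations (illustrative)
keep_prob = 0.5

mask = rng.random(h.shape) < keep_prob
h_dropped = h * mask / keep_prob   # dropped units -> 0, survivors scaled up

print(mask.sum(), h_dropped.max())
```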
Figure 7.8: Adversarial example. An input x classified as “panda” (57.7% confidence), plus .007 × sign(∇x J(θ, x, y)) (classified alone as “nematode”, 8.2% confidence), yields x + ε sign(∇x J(θ, x, y)), classified as “gibbon” (99.3% confidence). Training on adversarial examples is mostly intended to improve security, but can sometimes provide generic regularization.
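The perturbation in Figure 7.8 is the fast gradient sign method. A minimal sketch, using a toy linear model (assumed here, not a real classifier) so the input gradient is trivial:

```python
import numpy as np

# Fast gradient sign sketch: perturb x by epsilon * sign(grad_x J).
# For the toy loss J = w . x, the gradient with respect to x is just w.
w = np.array([2.0, -1.0, 0.5])      # assumed model weights
x = np.array([1.0, 1.0, 1.0])       # input
epsilon = 0.007                     # same magnitude as in Figure 7.8

grad_x = w                          # d(w . x)/dx = w
x_adv = x + epsilon * np.sign(grad_x)

print(np.max(np.abs(x_adv - x)))    # 0.007 -- an imperceptibly small change
```

Every input coordinate moves by exactly ε, yet in high dimensions these many tiny steps can change a classifier's output drastically.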
Figure 7.9: The tangent and normal directions to a one-dimensional data manifold embedded in (x1, x2) space; tangent propagation penalizes changes in the model output along the tangent direction.
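The tangent-propagation penalty can be sketched with a finite-difference estimate of the directional derivative along a known tangent vector; the model f and the tangent below are illustrative stand-ins.

```python
import numpy as np

# Tangent propagation sketch: penalize how much f changes along a tangent
# direction of the data manifold.  The directional derivative is estimated
# by finite differences; f and the tangent are illustrative.
def f(x):
    return np.sin(x[0]) + x[1] ** 2   # toy model output

x = np.array([0.5, 1.0])
tangent = np.array([1.0, 0.0])        # assumed manifold tangent at x
eps = 1e-5

directional_deriv = (f(x + eps * tangent) - f(x)) / eps
tangent_penalty = directional_deriv ** 2   # added to the training loss
print(tangent_penalty)
```

Minimizing this penalty encourages f to be locally invariant to movements along the manifold while leaving the normal direction unconstrained.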