Deep Learning Basics Lecture 4: Regularization II
Princeton University COS 495 Instructor: Yingyu Liang
Review: Regularization as hard constraint

Constrained optimization:

$$\min_{\theta} \hat{L}(\theta) = \frac{1}{n}\sum_{i=1}^{n} l(\theta, x_i, y_i) \quad \text{subject to: } R(\theta) \le r$$
Review: Regularization as soft constraint

Unconstrained optimization with a penalty term:

$$\min_{\theta} \hat{L}_R(\theta) = \frac{1}{n}\sum_{i=1}^{n} l(\theta, x_i, y_i) + \lambda R(\theta)$$

for some regularization parameter $\lambda > 0$.
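As a concrete instance of the soft-constraint objective, a minimal NumPy sketch (the least-squares loss and $R(\theta) = \|\theta\|^2$ are illustrative assumptions, not the only choice):

```python
import numpy as np

def regularized_loss(theta, X, y, lam):
    """Soft-constraint objective: empirical loss plus lam * R(theta).

    Sketch assuming a least-squares loss and R(theta) = ||theta||^2
    (weight decay); any differentiable loss/penalty pair fits the template.
    """
    residuals = X @ theta - y
    empirical_loss = np.mean(residuals ** 2)  # (1/n) * sum_i l(theta, x_i, y_i)
    penalty = lam * np.sum(theta ** 2)        # lam * R(theta)
    return empirical_loss + penalty
```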
Review: Regularization as Bayesian prior

Bayes' rule on the parameters:

$$p(\theta \mid \{x_i, y_i\}) = \frac{p(\theta)\, p(\{x_i, y_i\} \mid \theta)}{p(\{x_i, y_i\})}$$

Maximum a posteriori (MAP) estimation:

$$\max_{\theta} \log p(\theta \mid \{x_i, y_i\}) = \max_{\theta} \big[ \log p(\theta) + \log p(\{x_i, y_i\} \mid \theta) \big]$$

The log-prior $\log p(\theta)$ plays the role of the regularization; the log-likelihood term is the MLE loss.
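As a standard worked case (not spelled out on this slide), a Gaussian prior on $\theta$ recovers the $\ell_2$ penalty:

$$p(\theta) \propto \exp\!\left(-\frac{\|\theta\|^2}{2\tau^2}\right) \;\Rightarrow\; \log p(\theta) = -\frac{1}{2\tau^2}\,\|\theta\|^2 + \text{const}$$

so MAP estimation with this prior is exactly the soft-constraint objective with $\lambda \propto 1/\tau^2$.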
[Figure: Class +1 vs. Class -1 with three separating classifiers $w_1$, $w_2$, $w_3$; prefer $w_2$ (higher confidence).]
[Figure: the same data with noise added; still prefer $w_2$ (higher confidence).]
[Figure: too much noise leads to data points crossing the boundary.]
Add noise to the input
For a linear model $f(x) = w^\top x$, add noise $\epsilon \sim N(0, \sigma^2 I)$ to the input before computing the prediction, $x' = x + \epsilon$. The expected loss becomes:

$$L(f) = \mathbb{E}_{x,y,\epsilon}\big[f(x+\epsilon) - y\big]^2 = \mathbb{E}_{x,y,\epsilon}\big[f(x) + w^\top\epsilon - y\big]^2$$

$$L(f) = \mathbb{E}_{x,y,\epsilon}\big[f(x) - y\big]^2 + 2\,\mathbb{E}_{x,y,\epsilon}\big[w^\top\epsilon\,\big(f(x) - y\big)\big] + \mathbb{E}_{x,y,\epsilon}\big[\big(w^\top\epsilon\big)^2\big]$$

$$L(f) = \mathbb{E}_{x,y,\epsilon}\big[f(x) - y\big]^2 + \sigma^2\,\|w\|^2$$

The cross term vanishes because $\epsilon$ is zero-mean and independent of $(x, y)$, so adding input noise is equivalent to an $\ell_2$ (weight decay) penalty.
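A quick Monte Carlo check of this equivalence (a sketch on made-up data; `sigma`, the dimensions, and the trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 5, 0.3
X = rng.normal(size=(n, d))           # inputs x_i
w = rng.normal(size=d)                # linear model f(x) = w^T x
y = X @ w + 0.1 * rng.normal(size=n)  # noisy targets y_i

# Left side: E[(f(x + eps) - y)^2], estimated by sampling eps ~ N(0, sigma^2 I).
trials = [np.mean(((X + sigma * rng.normal(size=X.shape)) @ w - y) ** 2)
          for _ in range(2000)]
noisy_loss = np.mean(trials)

# Right side: clean loss plus the weight-decay term sigma^2 * ||w||^2.
clean_loss = np.mean((X @ w - y) ** 2)
print(noisy_loss, clean_loss + sigma ** 2 * np.sum(w ** 2))  # approximately equal
```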
Add noise to the weights
For the loss on each data point, add noise to the weights before computing the prediction: $\epsilon \sim N(0, \eta I)$, $w' = w + \epsilon$, so we predict with $f_{w+\epsilon}(x)$ in place of $f_w(x)$. The expected loss becomes:

$$L(f) = \mathbb{E}_{x,y,\epsilon}\big[f_{w+\epsilon}(x) - y\big]^2$$

Taylor expansion in $w$:

$$f_{w+\epsilon}(x) \approx f_w(x) + \epsilon^\top \nabla f_w(x) + \tfrac{1}{2}\,\epsilon^\top \nabla^2 f_w(x)\,\epsilon$$

Taking the expectation over $\epsilon$ (using $\mathbb{E}[\epsilon] = 0$ and $\mathbb{E}[\epsilon\epsilon^\top] = \eta I$):

$$L(f) \approx \mathbb{E}_{x,y}\big[f_w(x) - y\big]^2 + \eta\,\mathbb{E}_{x,y}\big[\big(f_w(x) - y\big)\,\mathrm{tr}\,\nabla^2 f_w(x)\big] + \eta\,\mathbb{E}_{x,y}\big\|\nabla f_w(x)\big\|^2$$

The middle term is small, so it can be ignored; the last term is the regularization term.
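In training, the expectation over $\epsilon$ is realized by sampling a fresh perturbation at each step; a minimal sketch of one such step for a linear model (NumPy only; the function name and learning-rate handling are illustrative assumptions):

```python
import numpy as np

def noisy_weight_step(w, x, y, lr, eta, rng):
    """One SGD step where the prediction uses perturbed weights w + eps."""
    eps = np.sqrt(eta) * rng.normal(size=w.shape)  # eps ~ N(0, eta * I)
    pred = (w + eps) @ x               # f_{w+eps}(x) for the linear model
    grad = 2.0 * (pred - y) * x        # gradient of (f_{w+eps}(x) - y)^2 w.r.t. w
    return w - lr * grad
```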
Data augmentation
[Figure: image classification with rotated data augmentation.]
Figure from Image Classification with Pyramid Representation and Rotated Data Augmentation on Torch 7, by Keven Wang
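A sketch of rotation-based augmentation in the spirit of that figure (assumes images as NumPy arrays; the angle range is an arbitrary choice):

```python
import numpy as np
from scipy.ndimage import rotate

def augment_with_rotations(images, labels, rng, max_angle=15.0):
    """Append one randomly rotated copy of each training image.

    images: array of shape (n, H, W); labels: array of shape (n,).
    """
    rotated = np.stack([
        rotate(img, angle=rng.uniform(-max_angle, max_angle), reshape=False)
        for img in images
    ])
    return np.concatenate([images, rotated]), np.concatenate([labels, labels])
```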
Early stopping
Recall overfitting: the empirical loss and the expected loss differ, and the larger the hypothesis class, the easier it is to find a hypothesis that fits the difference between the two.
Idea: track the error on a validation set and use the validation error to decide when to stop training.
Figure from Deep Learning, Goodfellow, Bengio and Courville
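A minimal early-stopping loop as a sketch (the `train_epoch` and `eval_loss` helpers are hypothetical placeholders; the patience threshold is an arbitrary choice):

```python
import copy

def train_with_early_stopping(model, train_data, val_data,
                              max_epochs=100, patience=5):
    """Stop when the validation loss has not improved for `patience` epochs."""
    best_loss, best_model, best_epoch, bad_epochs = float("inf"), None, 0, 0
    for epoch in range(max_epochs):
        train_epoch(model, train_data)         # hypothetical: one pass of SGD
        val_loss = eval_loss(model, val_data)  # hypothetical: validation loss
        if val_loss < best_loss:
            best_loss, best_epoch, bad_epochs = val_loss, epoch, 0
            best_model = copy.deepcopy(model)  # keep the best parameters so far
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                          # validation error stopped improving
    return best_model, best_epoch
```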
How to reuse the validation data after early stopping?
Strategy 1: retrain the model from scratch on all the data (training + validation), for the number of epochs determined previously by early stopping.
Strategy 2: keep the parameters found by early stopping and continue training on all the data, including the validation data, until the validation loss falls below the training loss at the early-stopping point.
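A sketch of both strategies, reusing the hypothetical `train_epoch` / `eval_loss` helpers from the early-stopping sketch (`make_model` and `all_data`, the combined training and validation sets, are also assumptions):

```python
def retrain_strategy_1(make_model, all_data, best_epoch):
    """Strategy 1: fresh initialization; train on all data for the
    number of epochs that early stopping selected."""
    model = make_model()
    for _ in range(best_epoch + 1):
        train_epoch(model, all_data)
    return model

def retrain_strategy_2(model, all_data, val_data, stop_train_loss,
                       max_epochs=100):
    """Strategy 2: keep the early-stopped parameters; continue on all data
    until the validation loss falls below the training loss recorded at
    the early-stopping point."""
    for _ in range(max_epochs):
        train_epoch(model, all_data)
        if eval_loss(model, val_data) < stop_train_loss:
            break
    return model
```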