Deep Learning Basics Lecture 3: Regularization I
Princeton University COS 495 Instructor: Yingyu Liang
What is regularization? In general: any method to prevent overfitting or help the optimization. Specifically: additional terms in the training optimization objective to prevent overfitting or help the optimization.
Overfitting example: regression using polynomials, with data generated as $t = \sin(2\pi x) + \epsilon$.
Figure from Pattern Recognition and Machine Learning, Bishop
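A minimal sketch of the overfitting example, assuming NumPy. The sample size (10 points), noise level (standard deviation 0.3), random seed, and polynomial degrees are illustrative choices, not values taken from the figure. Fitting polynomials of increasing degree by least squares drives the training error toward zero (degree 9 interpolates 10 points), while the error against the noise-free curve $\sin(2\pi x)$ does not follow.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Illustrative data (assumed values): 10 noisy samples of t = sin(2*pi*x) + eps.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.3, size=x_train.shape)

# Dense grid on the noise-free curve, used to measure generalization.
x_test = np.linspace(0.0, 1.0, 200)
t_test = np.sin(2 * np.pi * x_test)


def rms(pred, target):
    """Root-mean-square error between predictions and targets."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


results = {}
for degree in (1, 3, 9):
    coeffs = P.polyfit(x_train, t_train, degree)  # least-squares polynomial fit
    train_err = rms(P.polyval(x_train, coeffs), t_train)
    test_err = rms(P.polyval(x_test, coeffs), t_test)
    results[degree] = (train_err, test_err)
    print(f"degree {degree}: train RMS {train_err:.3f}, test RMS {test_err:.3f}")
```

Because the polynomial spaces are nested, the training error can only decrease with the degree; the test error is what reveals overfitting.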
difference between the two
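Regularization as hard constraint. Training objective:
$$\min_{f} \hat{L}(f) = \frac{1}{n}\sum_{i=1}^{n} l(f, x_i, y_i) \quad \text{subject to: } f \in \mathcal{H}$$
When parametrized:
$$\min_{\theta} \hat{L}(\theta) = \frac{1}{n}\sum_{i=1}^{n} l(\theta, x_i, y_i) \quad \text{subject to: } \theta \in \Omega$$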
$$\min_{\theta} \hat{L}(\theta) = \frac{1}{n}\sum_{i=1}^{n} l(\theta, x_i, y_i) \quad \text{subject to: } R(\theta) \le r$$
Example, $\ell_2$ regularization:
$$\min_{\theta} \hat{L}(\theta) = \frac{1}{n}\sum_{i=1}^{n} l(\theta, x_i, y_i) \quad \text{subject to: } \|\theta\|_2^2 \le r^2$$
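One way to solve the hard-constraint $\ell_2$ problem is projected gradient descent: take a gradient step on $\hat{L}$, then project back onto the ball $\|\theta\|_2 \le r$. The sketch below assumes NumPy, the squared loss $l(\theta, x_i, y_i) = (\theta^T x_i - y_i)^2$ with a linear model, and synthetic data; the helper names, step size, and iteration count are hypothetical choices, not part of the lecture.

```python
import numpy as np


def project_l2_ball(theta, r):
    """Euclidean projection onto the ball {theta : ||theta||_2 <= r}."""
    norm = np.linalg.norm(theta)
    return theta if norm <= r else theta * (r / norm)


def projected_gd_least_squares(X, y, r, lr=0.1, steps=5000):
    """Minimize (1/n) ||X theta - y||^2 subject to ||theta||_2^2 <= r^2."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ theta - y)       # gradient of the empirical loss
        theta = project_l2_ball(theta - lr * grad, r)  # gradient step, then project back
    return theta


# Synthetic data (assumed): the unconstrained least-squares solution has norm about 2.3.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

theta_tight = projected_gd_least_squares(X, y, r=0.5)    # constraint is active
theta_loose = projected_gd_least_squares(X, y, r=100.0)  # constraint is inactive
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]          # unconstrained least squares
print(np.linalg.norm(theta_tight), np.allclose(theta_loose, theta_ls))
```

When the radius is small the solution sits on the boundary of the ball; when it is large the constraint never binds and the result matches ordinary least squares.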
Regularization as soft constraint:
$$\min_{\theta} \hat{L}_R(\theta) = \frac{1}{n}\sum_{i=1}^{n} l(\theta, x_i, y_i) + \lambda^* R(\theta)$$
for some regularization parameter $\lambda^* > 0$.
Example, $\ell_2$ regularization:
$$\min_{\theta} \hat{L}_R(\theta) = \frac{1}{n}\sum_{i=1}^{n} l(\theta, x_i, y_i) + \lambda^* \|\theta\|_2^2$$
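Assuming the squared loss $l(\theta, x_i, y_i) = (\theta^T x_i - y_i)^2$ with a linear model, setting the gradient $\frac{2}{n}X^T(X\theta - y) + 2\lambda^*\theta$ to zero gives the closed form $\theta = (X^T X/n + \lambda^* I)^{-1} X^T y/n$. A minimal NumPy sketch on synthetic data; `ridge` and `soft_objective_grad` are hypothetical helper names.

```python
import numpy as np


def ridge(X, y, lam):
    """Minimizer of (1/n) ||X theta - y||^2 + lam * ||theta||_2^2 (closed form)."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


def soft_objective_grad(X, y, theta, lam):
    """Gradient of (1/n) ||X theta - y||^2 + lam * ||theta||_2^2."""
    n = X.shape[0]
    return (2.0 / n) * X.T @ (X @ theta - y) + 2.0 * lam * theta


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

# Larger lambda* pulls the solution toward the origin.
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.01, 0.1, 1.0, 10.0)]
print(norms)
```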
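The two views are linked by the Lagrange multiplier method. Define
$$\mathcal{L}(\theta, \lambda) := \hat{L}(\theta) + \lambda[R(\theta) - r].$$
Suppose $\theta^*$ is optimal for the hard-constraint problem. Since the inner max is $+\infty$ when $R(\theta) > r$ and equals $\hat{L}(\theta)$ otherwise,
$$\theta^* = \arg\min_{\theta} \max_{\lambda \ge 0} \mathcal{L}(\theta, \lambda) := \hat{L}(\theta) + \lambda[R(\theta) - r].$$
Suppose $\lambda^*$ is the corresponding optimal value for the max (under standard conditions such as convexity, where strong duality holds). Then
$$\theta^* = \arg\min_{\theta} \mathcal{L}(\theta, \lambda^*) := \hat{L}(\theta) + \lambda^*[R(\theta) - r],$$
which is the soft-constraint objective up to the constant $-\lambda^* r$, so it has the same minimizer.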
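A numerical check of this equivalence, assuming NumPy, the squared loss with a linear model, $R(\theta) = \|\theta\|_2^2$, and synthetic data (all helper names are hypothetical). It solves the soft-constraint problem for $\lambda^* = 0.5$ in closed form, uses the norm of that solution as the radius of the hard constraint, solves the hard-constraint problem by projected gradient descent, and recovers the multiplier from the stationarity condition $\nabla \hat{L}(\theta^*) + 2\lambda^* \theta^* = 0$.

```python
import numpy as np


def ridge(X, y, lam):
    """Soft constraint: minimizer of (1/n) ||X theta - y||^2 + lam * ||theta||_2^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


def projected_gd(X, y, radius, lr=0.1, steps=5000):
    """Hard constraint: minimize (1/n) ||X theta - y||^2 subject to ||theta||_2 <= radius."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        theta = theta - lr * (2.0 / n) * X.T @ (X @ theta - y)
        norm = np.linalg.norm(theta)
        if norm > radius:
            theta = theta * (radius / norm)  # project back onto the ball
    return theta


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)
n = X.shape[0]

lam_star = 0.5
theta_soft = ridge(X, y, lam_star)
# Hard constraint ||theta||^2 <= r, with r chosen as ||theta_soft||^2.
theta_hard = projected_gd(X, y, radius=np.linalg.norm(theta_soft))

# Recover the multiplier from grad L(theta*) + 2 * lam * theta* = 0.
grad_L = (2.0 / n) * X.T @ (X @ theta_hard - y)
lam_recovered = -(theta_hard @ grad_L) / (2.0 * theta_hard @ theta_hard)
print(np.allclose(theta_hard, theta_soft), lam_recovered)
```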
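Regularization as Bayesian prior: treat $\theta$ as random, with prior $p(\theta)$ and likelihood $p(\{x_i, y_i\} \mid \theta)$. Bayes' rule gives the posterior
$$p(\theta \mid \{x_i, y_i\}) = \frac{p(\theta)\, p(\{x_i, y_i\} \mid \theta)}{p(\{x_i, y_i\})}$$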
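Maximum a posteriori (MAP) estimation: the evidence $p(\{x_i, y_i\})$ does not depend on $\theta$, so
$$\arg\max_{\theta} \log p(\theta \mid \{x_i, y_i\}) = \arg\max_{\theta} \Big[\underbrace{\log p(\theta)}_{\text{regularization}} + \underbrace{\log p(\{x_i, y_i\} \mid \theta)}_{\text{MLE loss}}\Big]$$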
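Example: $\ell_2$ loss with $\ell_2$ regularization,
$$\min_{\theta} \hat{L}_R(\theta) = \frac{1}{n}\sum_{i=1}^{n} \big(f_{\theta}(x_i) - y_i\big)^2 + \lambda^* \|\theta\|_2^2,$$
corresponds to a Gaussian likelihood (Gaussian noise on $y_i$ given $x_i$) and a zero-mean Gaussian prior on $\theta$.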
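A sketch of this correspondence, assuming NumPy, a linear model $f_\theta(x) = \theta^T x$, noise $y_i \sim \mathcal{N}(\theta^T x_i, \sigma^2)$, and prior $\theta \sim \mathcal{N}(0, \tau^2 I)$. Up to constants, $-\log p(\theta \mid \{x_i, y_i\}) = \frac{1}{2\sigma^2}\sum_i (\theta^T x_i - y_i)^2 + \frac{1}{2\tau^2}\|\theta\|_2^2$; multiplying by $2\sigma^2/n$ gives the regularized objective with $\lambda^* = \sigma^2/(n\tau^2)$. The values of $\sigma$, $\tau$, and the data are hypothetical.

```python
import numpy as np

# Assumed generative model: theta ~ N(0, tau^2 I), y_i = theta^T x_i + N(0, sigma^2) noise.
rng = np.random.default_rng(1)
n, d = 50, 4
sigma, tau = 0.5, 2.0
X = rng.normal(size=(n, d))
theta_true = rng.normal(0.0, tau, size=d)
y = X @ theta_true + rng.normal(0.0, sigma, size=n)


def neg_log_posterior(theta):
    """-log p(theta | data) up to an additive constant that does not depend on theta."""
    return np.sum((X @ theta - y) ** 2) / (2 * sigma**2) + theta @ theta / (2 * tau**2)


# MAP: zero gradient gives (X^T X / sigma^2 + I / tau^2) theta = X^T y / sigma^2.
theta_map = np.linalg.solve(X.T @ X / sigma**2 + np.eye(d) / tau**2, X.T @ y / sigma**2)

# (1/n) sum (theta^T x_i - y_i)^2 + lam * ||theta||^2 with lam = sigma^2 / (n tau^2).
lam_star = sigma**2 / (n * tau**2)
theta_ridge = np.linalg.solve(X.T @ X / n + lam_star * np.eye(d), X.T @ y / n)
print(np.allclose(theta_map, theta_ridge))
```

A noisier likelihood (larger $\sigma$) or a tighter prior (smaller $\tau$) both translate into a larger $\lambda^*$.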
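Classical regularization with a norm penalty $R(\theta)$:
$$\min_{\theta} \hat{L}_R(\theta) = \hat{L}(\theta) + \lambda R(\theta)$$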
$\ell_2$ regularization:
$$\min_{\theta} \hat{L}_R(\theta) = \hat{L}(\theta) + \frac{\alpha}{2}\|\theta\|_2^2$$
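Gradient:
$$\nabla \hat{L}_R(\theta) = \nabla \hat{L}(\theta) + \alpha\theta$$
Gradient descent update with step size $\eta$:
$$\theta \leftarrow \theta - \eta \nabla \hat{L}_R(\theta) = \theta - \eta \nabla \hat{L}(\theta) - \eta\alpha\theta = (1 - \eta\alpha)\theta - \eta \nabla \hat{L}(\theta)$$
Each step first shrinks $\theta$ by the factor $1 - \eta\alpha$ and then takes the usual gradient step, which is why $\ell_2$ regularization is also called weight decay.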
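A minimal sketch of the weight decay update, assuming NumPy and a hypothetical squared loss $\hat{L}(\theta) = \frac{1}{n}\|X\theta - y\|_2^2$ on synthetic data. Iterating the decay-then-step form converges to the minimizer of $\hat{L}(\theta) + \frac{\alpha}{2}\|\theta\|_2^2$, which for this loss has a closed form to compare against.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 80, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)


def grad_L(theta):
    """Gradient of the hypothetical loss L(theta) = (1/n) ||X theta - y||^2."""
    return (2.0 / n) * X.T @ (X @ theta - y)


def weight_decay_step(theta, eta, alpha):
    """One gradient step on L(theta) + (alpha/2) ||theta||^2, written as decay-then-step."""
    return (1.0 - eta * alpha) * theta - eta * grad_L(theta)


eta, alpha = 0.05, 0.3
theta = np.zeros(d)
for _ in range(3000):
    theta = weight_decay_step(theta, eta, alpha)

# Stationary point of L(theta) + (alpha/2)||theta||^2: (2 X^T X / n + alpha I) theta = 2 X^T y / n.
theta_closed = np.linalg.solve(2.0 * X.T @ X / n + alpha * np.eye(d), 2.0 * X.T @ y / n)
print(np.allclose(theta, theta_closed))
```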
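Effect on the optimal solution: let $\theta^*$ be the minimizer of $\hat{L}$ and $H$ the Hessian of $\hat{L}$ at $\theta^*$. A quadratic approximation around $\theta^*$ gives
$$\hat{L}(\theta) \approx \hat{L}(\theta^*) + (\theta - \theta^*)^T \nabla \hat{L}(\theta^*) + \frac{1}{2}(\theta - \theta^*)^T H (\theta - \theta^*)$$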
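Since $\theta^*$ is optimal, $\nabla \hat{L}(\theta^*) = 0$, so
$$\hat{L}(\theta) \approx \hat{L}(\theta^*) + \frac{1}{2}(\theta - \theta^*)^T H (\theta - \theta^*), \qquad \nabla \hat{L}(\theta) \approx H(\theta - \theta^*)$$
and the gradient of the regularized objective is
$$\nabla \hat{L}_R(\theta) \approx H(\theta - \theta^*) + \alpha\theta$$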
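At the regularized optimum $\theta_R^*$:
$$0 = \nabla \hat{L}_R(\theta_R^*) \approx H(\theta_R^* - \theta^*) + \alpha\theta_R^*$$
$$\theta_R^* \approx (H + \alpha I)^{-1} H \theta^*$$
Suppose $H$ has the eigendecomposition $H = Q\Lambda Q^T$ ($Q$ orthogonal, $\Lambda$ diagonal). Then
$$\theta_R^* \approx (H + \alpha I)^{-1} H \theta^* = Q(\Lambda + \alpha I)^{-1}\Lambda Q^T \theta^*$$
so the component of $\theta^*$ along the $i$-th eigenvector of $H$ is rescaled by $\Lambda_{ii}/(\Lambda_{ii} + \alpha)$: directions with large curvature ($\Lambda_{ii} \gg \alpha$) are barely changed, while directions with small curvature ($\Lambda_{ii} \ll \alpha$) are shrunk strongly toward zero.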
Figure from Deep Learning, Goodfellow, Bengio and Courville
Notations: $\theta^* = w^*$, $\theta_R^* = \tilde{w}$
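For a loss that is exactly quadratic the approximation above is exact, so the two expressions for $\theta_R^*$ can be compared numerically. A sketch assuming NumPy, with a randomly generated symmetric positive definite $H$ and a random $\theta^*$ standing in for a real model (both hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
d, alpha = 4, 0.5
A = rng.normal(size=(d, d))
H = A @ A.T + 0.1 * np.eye(d)    # hypothetical symmetric positive definite Hessian
theta_star = rng.normal(size=d)  # hypothetical unregularized minimizer

# Direct solution of 0 = H (theta - theta_star) + alpha * theta.
theta_R_direct = np.linalg.solve(H + alpha * np.eye(d), H @ theta_star)

# Same solution through H = Q diag(lam) Q^T: rescale each eigen-component of theta_star.
lam, Q = np.linalg.eigh(H)
shrink = lam / (lam + alpha)
theta_R_eig = Q @ (shrink * (Q.T @ theta_star))
print(shrink)
```

Every factor in `shrink` lies strictly between 0 and 1, and it is closest to 1 for the largest eigenvalues of $H$.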
$\ell_1$ regularization:
$$\min_{\theta} \hat{L}_R(\theta) = \hat{L}(\theta) + \alpha\|\theta\|_1$$
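$$\nabla \hat{L}_R(\theta) = \nabla \hat{L}(\theta) + \alpha\, \mathrm{sign}(\theta)$$
where sign applies to each element in $\theta$ (strictly a subgradient, since $|\theta_i|$ is not differentiable at $0$). Gradient descent update:
$$\theta \leftarrow \theta - \eta \nabla \hat{L}_R(\theta) = \theta - \eta \nabla \hat{L}(\theta) - \eta\alpha\, \mathrm{sign}(\theta)$$
Unlike weight decay, each step shrinks every nonzero coordinate by the constant amount $\eta\alpha$, independent of its magnitude.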
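Quadratic approximation around $\theta^*$:
$$\hat{L}(\theta) \approx \hat{L}(\theta^*) + (\theta - \theta^*)^T \nabla \hat{L}(\theta^*) + \frac{1}{2}(\theta - \theta^*)^T H (\theta - \theta^*)$$
Since $\nabla \hat{L}(\theta^*) = 0$:
$$\hat{L}(\theta) \approx \hat{L}(\theta^*) + \frac{1}{2}(\theta - \theta^*)^T H (\theta - \theta^*)$$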
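Further assume that $H$ is diagonal with positive entries $H_{ii}$. Then the regularized objective separates over coordinates:
$$\hat{L}_R(\theta) \approx \hat{L}(\theta^*) + \sum_{i} \left[\frac{1}{2} H_{ii} (\theta_i - \theta_i^*)^2 + \alpha|\theta_i|\right]$$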
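The optimal $\theta_R^*$, minimizing each coordinate separately:
$$(\theta_R^*)_i \approx \begin{cases} \max\left\{\theta_i^* - \dfrac{\alpha}{H_{ii}},\ 0\right\} & \text{if } \theta_i^* \ge 0 \\[4pt] \min\left\{\theta_i^* + \dfrac{\alpha}{H_{ii}},\ 0\right\} & \text{if } \theta_i^* < 0 \end{cases}$$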
Figure: $(\theta_R^*)_i$ as a function of $(\theta^*)_i$, with $-\alpha/H_{ii}$ and $\alpha/H_{ii}$ marked.
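Equivalently,
$$(\theta_R^*)_i \approx \mathrm{sign}(\theta_i^*)\max\left\{|\theta_i^*| - \frac{\alpha}{H_{ii}},\ 0\right\}$$
(soft thresholding). Every coordinate with $|\theta_i^*| \le \alpha/H_{ii}$ is set exactly to zero, so $\ell_1$ regularization produces sparse solutions.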
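A sketch, assuming NumPy and hypothetical values of $\theta^*$, $H_{ii}$, and $\alpha$, that compares the piecewise and sign forms of $\theta_R^*$ with a brute-force grid minimization of each coordinate's term $\frac{1}{2} H_{ii}(\theta_i - \theta_i^*)^2 + \alpha|\theta_i|$:

```python
import numpy as np


def soft_threshold(theta_star, alpha, h_diag):
    """(theta_R*)_i = sign(theta*_i) * max(|theta*_i| - alpha / H_ii, 0)."""
    return np.sign(theta_star) * np.maximum(np.abs(theta_star) - alpha / h_diag, 0.0)


theta_star = np.array([2.0, -0.3, 0.05, -1.5])  # hypothetical unregularized optimum
h_diag = np.array([1.0, 2.0, 0.5, 4.0])         # hypothetical positive diagonal of H
alpha = 0.4

theta_R = soft_threshold(theta_star, alpha, h_diag)

# Piecewise form: max{.} for nonnegative theta*_i, min{.} for negative theta*_i.
piecewise = np.where(theta_star >= 0,
                     np.maximum(theta_star - alpha / h_diag, 0.0),
                     np.minimum(theta_star + alpha / h_diag, 0.0))

# Brute force: minimize each separable term over a fine grid.
grid = np.linspace(-3.0, 3.0, 600001)
brute = np.array([
    grid[np.argmin(0.5 * h * (grid - ts) ** 2 + alpha * np.abs(grid))]
    for ts, h in zip(theta_star, h_diag)
])
print(theta_R)
```

Here the third coordinate ($|\theta_3^*| = 0.05 < \alpha/H_{33} = 0.8$) is driven exactly to zero, while the others move toward zero by $\alpha/H_{ii}$.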
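Bayesian view: $\ell_1$ regularization corresponds to a Laplace prior
$$p(\theta) \propto \exp\Big(-\alpha\sum_{i} |\theta_i|\Big), \qquad -\log p(\theta) = \alpha\sum_{i} |\theta_i| + \text{constant} = \alpha\|\theta\|_1 + \text{constant}$$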
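Taking the normalized Laplace density $p(\theta_i) = \frac{\alpha}{2} e^{-\alpha|\theta_i|}$ for each coordinate independently (an assumption consistent with the proportionality above), the constant is $-d\log(\alpha/2)$ for $\theta \in \mathbb{R}^d$. A small NumPy check with hypothetical values of $\alpha$, $d$, and $\theta$:

```python
import numpy as np


def neg_log_laplace_prior(theta, alpha):
    """-log p(theta) for independent coordinates with density (alpha/2) * exp(-alpha * |theta_i|)."""
    return -np.sum(np.log(alpha / 2.0) - alpha * np.abs(theta))


alpha, d = 0.7, 5
rng = np.random.default_rng(4)
theta_a = rng.normal(size=d)
theta_b = rng.normal(size=d)

# -log p(theta) - alpha * ||theta||_1 should be the same constant for every theta.
const_a = neg_log_laplace_prior(theta_a, alpha) - alpha * np.sum(np.abs(theta_a))
const_b = neg_log_laplace_prior(theta_b, alpha) - alpha * np.sum(np.abs(theta_b))

# Riemann-sum check that the 1-D density integrates to (approximately) one.
grid = np.linspace(-40.0, 40.0, 800001)
mass = np.sum(alpha / 2.0 * np.exp(-alpha * np.abs(grid))) * (grid[1] - grid[0])
print(const_a, const_b, -d * np.log(alpha / 2.0), mass)
```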