Lecture 5:
− Regularization
− ML Methodology
Aykut Erdem
February 2016 Hacettepe University
Recall from last time: Linear Regression

[Figure: E_RMS on training and test sets vs. polynomial order M (from Bishop)]
y(x) = w_0 + w_1 x, \qquad w = (w_0, w_1)

\ell(w) = \sum_{n=1}^{N} \left[ t^{(n)} - (w_0 + w_1 x^{(n)}) \right]^2

Gradient descent update rule:
w \leftarrow w + 2\lambda \left( t^{(n)} - y(x^{(n)}) \right) x^{(n)}

Closed-form solution:
w = (X^T X)^{-1} X^T t
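As a sketch, the closed-form solution can be computed with NumPy (the toy data and variable names here are assumptions for illustration, not from the lecture):

```python
import numpy as np

# Toy 1-D dataset (assumed): targets generated by t = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix X: a column of ones (for w0) and the inputs (for w1)
X = np.column_stack([np.ones_like(x), x])

# Closed-form solution: w = (X^T X)^(-1) X^T t
# (np.linalg.solve avoids forming the matrix inverse explicitly)
w = np.linalg.solve(X.T @ X, X.T @ t)
```

On this noise-free data the recovered weights are the generating ones, w = (1, 2).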
− The simplest models do not capture all the important variation (signal) in the data: they underfit
− More complex models may overfit the training data (fit not only the signal but also the noise in the data), especially if there is not enough data to constrain the model
− Generalization: the ability to predict the held-out data
− Fitting approaches: iterative optimization, or analytic when available
slide by Richard Zemel
Regularized least squares:

\tilde{E}(w) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2 + \frac{\lambda}{2} \|w\|^2

where \|w\|^2 = w^T w = w_0^2 + w_1^2 + \ldots + w_M^2. The coefficient \lambda governs the relative importance of the regularization term compared with the sum-of-squares error term.
slide by Erik Sudderth
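A minimal NumPy sketch of the regularized fit, assuming a toy sine dataset with degree-9 polynomial features (the helper `fit_ridge` and the data are illustrative, not from the slides):

```python
import numpy as np

def fit_ridge(Phi, t, lam):
    # Minimizes 0.5*||Phi w - t||^2 + 0.5*lam*||w||^2
    # Closed form: w = (Phi^T Phi + lam*I)^(-1) Phi^T t
    M = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ t)

# Assumed toy data: 10 noisy samples of sin(2*pi*x)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)
Phi = np.vander(x, 10, increasing=True)  # columns: 1, x, x^2, ..., x^9

# Unregularized fit (ln lambda = -inf): exact interpolation, wild coefficients
w_unreg = np.linalg.lstsq(Phi, t, rcond=None)[0]
# Regularized fit (ln lambda = 0): coefficients are shrunk toward zero
w_reg = fit_ridge(Phi, t, lam=1.0)
```

With 10 points and 10 basis functions the unregularized fit interpolates the noise exactly, so its coefficients are much larger in magnitude than the regularized ones.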
[Figure: M = 9 polynomial fits with ln λ = −18 and ln λ = 0 (from Bishop)]

[Figure: E_RMS on training and test sets vs. ln λ (from Bishop)]
        ln λ = −∞    ln λ = −18    ln λ = 0
w⋆0          0.35          0.35        0.13
w⋆1        232.37          4.74
w⋆2
w⋆3      48568.31
w⋆4
w⋆5     640042.26         55.28
w⋆6                       41.32
w⋆7    1042400.18
w⋆8                                    0.00
w⋆9     125201.43         72.68        0.01
The corresponding coefficients from the fitted polynomials, showing that regularization has the desired effect of reducing the magnitude of the coefficients.
[Figure: contours of the regularizer Σ_j |w_j|^q for q = 0.5, 1, 2, 4]
slide by Olga Veksler
Make the polynomial degree a parameter d: how should we choose d?
− degree 3 is the best according to the training error, but overfits the data
− degree 2 is the best model according to the test error
− but the test set cannot be used during training at all!
choosing among 3 classifiers (degree 1, 2, or 3)
Split the data into three subsets:
Training (60%): train the classifier parameters w
Validation (20%): tune the other parameters (e.g. the degree d)
Test (20%): use only to assess final performance
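A minimal sketch of such a split in NumPy (the dataset size and variable names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100  # assumed dataset size

# Shuffle the example indices, then cut into 60% / 20% / 20%
idx = rng.permutation(n)
n_train = int(0.6 * n)
n_val = int(0.2 * n)
train_idx = idx[:n_train]                  # train parameters w here
val_idx = idx[n_train:n_train + n_val]     # tune other parameters here
test_idx = idx[n_train + n_val:]           # touch only once, at the very end
```

Shuffling before cutting matters: if the data are ordered (e.g. by class), contiguous slices would give unrepresentative subsets.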
Training error: computed on the training examples
Validation error: computed on the validation examples
Test error: computed on the test examples
d = 2: validation error 1.8
d = 3: validation error 3.4
Choose d = 2, the degree with the smallest validation error.
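Sketching the selection step in NumPy, assuming a toy noisy quadratic dataset (the helper `validation_mse` and the data are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed data: a noisy quadratic, so degree 1 should underfit
x = rng.uniform(-1.0, 1.0, 60)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.standard_normal(60)

x_tr, y_tr = x[:40], y[:40]  # training set
x_va, y_va = x[40:], y[40:]  # validation set

def validation_mse(d):
    coeffs = np.polyfit(x_tr, y_tr, d)       # fit degree-d polynomial on training set
    resid = np.polyval(coeffs, x_va) - y_va  # residuals on the validation set
    return np.mean(resid**2)                 # validation MSE

errors = {d: validation_mse(d) for d in (1, 2, 3)}
best_d = min(errors, key=errors.get)  # degree with the smallest validation error
```

The underfitting linear model has a clearly larger validation error; degrees 2 and 3 are close, since degree 3 can fit the same quadratic plus a small spurious cubic term.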
[Figure: training and validation error vs. model complexity; regimes labeled underfitting, just right, and overfitting]
Drawback of the fixed split: it wastes data, using 20% for test and 20% for validation; with a small validation set, we may simply be lucky or unlucky.
Linear model: mean squared error = 2.4
Quadratic model: mean squared error = 0.9
Join-the-dots model: mean squared error = 2.2
Leave-one-out cross-validation:
For k = 1 to n:
  1. Let (x_k, y_k) be the kth example
  2. Temporarily remove (x_k, y_k) from the dataset
  3. Train on the remaining n − 1 examples
  4. Note your error on (x_k, y_k)
When you've done all points, report the mean error.
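The procedure above can be sketched in NumPy for polynomial regression (toy data assumed; `np.polyfit`/`np.polyval` stand in for the lecture's unspecified learner):

```python
import numpy as np

def loocv_mse(x, y, degree):
    """Leave-one-out CV for polynomial regression of a given degree."""
    n = len(x)
    sq_errors = []
    for k in range(n):
        keep = np.arange(n) != k                       # temporarily remove example k
        coeffs = np.polyfit(x[keep], y[keep], degree)  # train on the remaining n-1
        pred = np.polyval(coeffs, x[k])                # predict the held-out point
        sq_errors.append((pred - y[k]) ** 2)           # note the error on (x_k, y_k)
    return np.mean(sq_errors)                          # report the mean error

# Assumed toy data: a noisy line
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 20)
y = 2.0 * x + 0.1 * rng.standard_normal(20)
mse = loocv_mse(x, y, degree=1)
```

Since the model class matches the generating process, the LOOCV error is close to the noise variance (0.01 here).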
3-fold cross-validation:
Randomly break the dataset into 3 partitions (colored red, green, and blue).
Train on all the points not in the blue partition; find the test-set sum of errors on the blue points.
Repeat with the green partition, then with the red partition.
Then report the mean error.
3-fold CV results for the three models:
Linear regression: MSE_3FOLD = 2.05
Quadratic regression: MSE_3FOLD = 1.11
Join-the-dots: MSE_3FOLD = 2.93
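A sketch of the same procedure in NumPy (the toy data and the helper `kfold_mse` are assumptions for illustration):

```python
import numpy as np

def kfold_mse(x, y, degree, k=3, seed=0):
    """k-fold CV: hold out each fold in turn, train on the rest, pool the errors."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))          # randomly break the data into k partitions
    sq_errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)    # all points not in the held-out fold
        coeffs = np.polyfit(x[train], y[train], degree)
        sq_errors.extend((np.polyval(coeffs, x[fold]) - y[fold]) ** 2)
    return np.mean(sq_errors)              # mean over all held-out points

# Assumed toy data: a noisy quadratic
rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 30)
y = x**2 + 0.1 * rng.standard_normal(30)
mse_linear = kfold_mse(x, y, degree=1)
mse_quadratic = kfold_mse(x, y, degree=2)
```

As in the slide's example, the model matching the data (quadratic) gets the lower cross-validated error.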
Method          Downside                                                  Upside
Test set        may give an unreliable estimate of future performance     cheap
Leave-one-out   expensive                                                 doesn't waste data
10-fold         wastes 10% of the data; 10 times more expensive           only 10 times more expensive
                than the test-set method                                  instead of n times
3-fold          wastes more data than 10-fold; more expensive             slightly better than the
                than the test-set method                                  test-set method
N-fold          identical to leave-one-out
slide by Andrew Moore
[Figure: training error and 10-fold CV error for candidate models f1–f6; the choice is the model with the lowest CV error]
[Figure: training error and 10-fold CV error for kNN at several values of k; the choice is the k with the lowest CV error]
− training error cannot be used to choose k: it is best at k = 1 and gets worse as k increases