BBM406 Fundamentals of Machine Learning
Lecture 5: ML Methodology
Aykut Erdem // Hacettepe University // Fall 2019
Illustration: detail from The Alchemist Discovering Phosphorus by Joseph Wright (1771)
About class projects: this semester, projects involve presentations (classroom + video), a final report, and code. Details:
http://web.cs.hacettepe.edu.tr/~aykut/classes/fall2019/bbm406/project.html
Recall from last time… Linear Regression
Model: $y(x) = w_0 + w_1 x$, with $\mathbf{w} = (w_0, w_1)$
Loss: $\ell(\mathbf{w}) = \sum_{n=1}^{N} \left[ t^{(n)} - (w_0 + w_1 x^{(n)}) \right]^2$
Gradient Descent Update Rule: $\mathbf{w} \leftarrow \mathbf{w} + 2\lambda \left( t^{(n)} - y(x^{(n)}) \right) x^{(n)}$
Closed Form Solution: $\mathbf{w} = (X^T X)^{-1} X^T \mathbf{t}$
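As a concrete illustration of these two routes, here is a minimal numpy sketch (not from the lecture; the synthetic data, true weights, and learning rate are made up):

```python
import numpy as np

# Synthetic data for y(x) = w0 + w1*x; the true weights (2.0, 0.5),
# sample size, and noise level are invented for illustration.
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=50)
t = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=50)

X = np.column_stack([np.ones_like(x), x])  # design matrix with a bias column

# Closed form solution: w = (X^T X)^{-1} X^T t
w_closed = np.linalg.solve(X.T @ X, X.T @ t)

# Per-example gradient descent update from the slide, with lambda playing
# the role of a small learning rate:  w <- w + 2*lam*(t_n - y(x_n)) * x_n
w, lam = np.zeros(2), 0.01
for _ in range(1000):                      # passes over the data
    for n in range(len(x)):
        w = w + 2 * lam * (t[n] - X[n] @ w) * X[n]

print(w_closed, w)                         # the two estimates should nearly agree
```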
Recall from last time… Some key concepts
− The simplest models do not capture all the important variations (signal) in the data: they underfit.
− More complex models may overfit the training data (fit not only the signal but also the noise in the data), especially if there is not enough data to constrain the model.
− Generalization = the model's ability to predict held-out data.
slide by Richard Zemel
Recall from last time… Regularization
Regularized error function:
$\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2$
where $\|\mathbf{w}\|^2 = w_0^2 + w_1^2 + \ldots + w_M^2$, and the coefficient $\lambda$ governs the relative importance of the regularization term compared to the sum-of-squares error term.

Coefficients $w^\star$ of the degree-9 polynomial fit as $\ln \lambda$ varies (Bishop, Pattern Recognition and Machine Learning, Table 1.2):

          ln λ = −∞    ln λ = −18    ln λ = 0
w*_0           0.35          0.35        0.13
w*_1         232.37          4.74       −0.05
w*_2       −5321.83         −0.77       −0.06
w*_3       48568.31        −31.97       −0.05
w*_4     −231639.30         −3.89       −0.03
w*_5      640042.26         55.28       −0.02
w*_6    −1061800.52         41.32       −0.01
w*_7     1042400.18        −45.95       −0.00
w*_8     −557682.99        −91.53        0.00
w*_9      125201.43         72.68        0.01
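A small sketch of this effect in code, assuming the same setup as the table (a degree-9 polynomial and the regularized sum-of-squares loss above); the data generator and function name are illustrative, not the lecture's:

```python
import numpy as np

# A sketch, not the lecture's code: fit a degree-9 polynomial with the
# regularized loss above and watch the coefficients shrink as lambda grows.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

M = 9
X = np.vander(x, M + 1, increasing=True)   # columns 1, x, x^2, ..., x^M

def ridge_fit(X, t, lam):
    """Minimize (1/2)||Xw - t||^2 + (lam/2)||w||^2 in closed form."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ t)

for ln_lam in (-np.inf, -18.0, 0.0):
    lam = np.exp(ln_lam)                   # exp(-inf) == 0: no regularization
    w = ridge_fit(X, t, lam)
    print(f"ln(lambda) = {ln_lam}: max |w*| = {np.abs(w).max():.2f}")
```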
So far we have studied regression, where the target is continuous. Classification is solved very similarly, and everything discussed so far transfers to classification with very minor changes: fit a model on training data, then apply new examples to the fitted model.
slide by Olga Veksler
[Figure: training data points plotted in the (x, y) plane]
What we really care about is how well the fitted model generalizes to new data.
slide by Olga Veksler
Which polynomial degree should we use? Make the degree a parameter d and choose it.
− degree 3 is the best according to the training error, but it overfits the data
slide by Olga Veksler
− degree 2 is the best model according to the test error, which is computed on examples not used for training at all!
slide by Olga Veksler
Model selection: choosing among 3 classifiers (degree 1, 2, or 3). If the test set is also used to make this choice, its error is no longer an unbiased estimate of future performance, so a separate set is needed for the choice.
slide by Olga Veksler
Split the labeled data:
− Training ≈ 60%: train the tunable parameters w
− Validation ≈ 20%: train (tune) the other parameters, e.g. the degree d
− Test ≈ 20%: use only to assess the final performance of the classifier
slide by Olga Veksler
With this Training ≈ 60% / Validation ≈ 20% / Test ≈ 20% split of the labeled data:
− Training error: computed on training examples
− Validation error: computed on validation examples
− Test error: computed on test examples
slide by Olga Veksler
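In code, such a split might look like the following sketch; the 60/20/20 proportions follow the slide, while the data and variable names are made up:

```python
import numpy as np

# Made-up labeled dataset: 100 examples with 5 features and binary labels.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

idx = rng.permutation(len(X))              # shuffle before splitting
n_train = int(0.6 * len(X))
n_val = int(0.2 * len(X))

train_idx = idx[:n_train]                  # fit tunable parameters w
val_idx = idx[n_train:n_train + n_val]     # tune other parameters (e.g. degree d)
test_idx = idx[n_train + n_val:]           # only for the final assessment

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]
```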
Choosing Parameters: Example
− degree 1: validation error 3.3
− degree 2: validation error 1.8
− degree 3: validation error 3.4
Degree 2 gives the lowest validation error and is chosen.
slide by Olga Veksler
Diagnosing Underfitting/Overfitting
[Figure: training error and validation error as a function of the number of basis functions (up to 50)]
slide by Olga Veksler
− Underfitting: try more hidden units
− Just right
− Overfitting: try fewer hidden units
slide by Olga Veksler
Drawbacks of a single split: much of the labeled data is held out (20% for test and 20% for validation) instead of being used to fit parameters, and with a small validation set we may simply be lucky or unlucky. Cross validation is an evaluation method that wastes less data.
slide by Olga Veksler
slide by Olga Veksler
The test set method on the same data:
− Linear Model: Mean Squared Error = 2.4
− Quadratic Model: Mean Squared Error = 0.9
− 'Join the dots' Model: Mean Squared Error = 2.2
LOOCV (Leave-one-out Cross Validation)
For k = 1 to n:
− temporarily remove the k-th example from the dataset
− train on the remaining n − 1 examples
− note the error of the fitted model on the held-out example
When you've done all points, report the mean error.
slide by Olga Veksler
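The LOOCV loop is short to write out; below is a hedged sketch for polynomial regression, with np.polyfit standing in for whatever model is being evaluated and all data synthetic:

```python
import numpy as np

def loocv_mse(x, y, degree):
    errors = []
    for k in range(len(x)):                        # for k = 1 to n
        mask = np.arange(len(x)) != k              # temporarily remove example k
        coeffs = np.polyfit(x[mask], y[mask], degree)  # train on the remaining n-1
        pred = np.polyval(coeffs, x[k])            # note the error on example k
        errors.append((y[k] - pred) ** 2)
    return np.mean(errors)                         # report the mean error

# Synthetic data, invented for illustration.
rng = np.random.default_rng(3)
x = np.linspace(0, 5, 20)
y = 1.5 * x - 0.3 * x**2 + rng.normal(scale=0.5, size=x.size)

for d in (1, 2, 3):
    print(f"degree {d}: LOOCV MSE = {loocv_mse(x, y, d):.2f}")
```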
LOOCV for Linear Regression
MSE_LOOCV = 2.12
slide by Olga Veksler
LOOCV for Quadratic Regression
MSE_LOOCV = 0.96
slide by Olga Veksler

LOOCV for the 'Join the dots' Model
MSE_LOOCV = 3.33
slide by Olga Veksler
Which kind of Cross Validation?
− Test set: downside: may give an unreliable estimate of future performance; upside: cheap
− Leave-one-out: downside: expensive; upside: doesn't waste data
k-fold Cross Validation (here k = 3)
− Randomly break the dataset into k partitions (in this example, three partitions colored red, green, and blue)
− For the blue partition: train on all points not in the blue partition, then find the test-set sum of errors on the blue points
− Do the same for the green partition and for the red partition
− Report the mean error
slide by Olga Veksler

3-fold results for the three models:
− Linear Regression: MSE_3FOLD = 2.05
− Quadratic Regression: MSE_3FOLD = 1.1
− 'Join the dots' Model: MSE_3FOLD = 2.93
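A sketch of the same 3-fold procedure in code: the random partition plays the role of the red/green/blue coloring, and the data and helper name are invented for illustration:

```python
import numpy as np

def kfold_mse(x, y, degree, k=3, seed=0):
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(x)) % k            # random partition into k folds
    errors = []
    for fold in range(k):                          # blue, green, red, ...
        test = folds == fold
        coeffs = np.polyfit(x[~test], y[~test], degree)  # train on the rest
        pred = np.polyval(coeffs, x[test])
        errors.extend((y[test] - pred) ** 2)       # test-set errors on this fold
    return np.mean(errors)                         # report the mean error

# Synthetic data, invented for illustration.
rng = np.random.default_rng(4)
x = np.linspace(0, 5, 30)
y = 1.5 * x - 0.3 * x**2 + rng.normal(scale=0.5, size=x.size)

for d in (1, 2, 3):
    print(f"degree {d}: 3-fold MSE = {kfold_mse(x, y, d):.2f}")
```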
Which kind of Cross Validation?
− Test set: downside: may give an unreliable estimate of future performance; upside: cheap
− Leave-one-out: downside: expensive; upside: doesn't waste data
− 10-fold: downside: wastes 10% of the data, 10 times more expensive than the test set method; upside: only 10 times more expensive instead of n times
− 3-fold: downside: wastes more data than 10-fold, more expensive than the test set method; upside: slightly better than the test set method
− N-fold: identical to leave-one-out
slide by Olga Veksler
Cross-validation for classification
Instead of computing the sum of squared errors on a test set, you should compute the total number of misclassifications on a test set.
slide by Andrew Moore
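In code the change is one line: count disagreements instead of summing squared errors. A tiny sketch with made-up labels:

```python
import numpy as np

# Made-up true and predicted labels for eight held-out points.
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])

n_errors = np.sum(y_pred != y_true)       # total misclassifications on the test set
error_rate = n_errors / len(y_true)
print(n_errors, error_rate)               # 2 and 0.25
```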
Using 10-fold CV for model selection:
[Figure: training error for candidate models f1–f6]
[Figure: training error and 10-fold CV error for f1–f6; the choice is the model with the lowest CV error]
[Figure: training error and 10-fold CV error for k = 1, …, 6; again the choice is the k with the lowest CV error]
slide by Olga Veksler
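The selection rule in these figures boils down to an argmin over CV scores; a minimal sketch, where the candidates and their CV errors are invented for illustration:

```python
# cv_error stands in for any cross-validation routine (e.g. the k-fold
# sketch above); the candidate values and scores below are invented.
def select_by_cv(candidates, cv_error):
    scores = {c: cv_error(c) for c in candidates}
    best = min(scores, key=scores.get)    # lowest CV error wins
    return best, scores

# Hypothetical CV errors for k = 1..6:
fake = {1: 3.1, 2: 2.2, 3: 1.8, 4: 2.0, 5: 2.4, 6: 2.9}
best, scores = select_by_cv(list(fake), fake.get)
print(best)                               # -> 3, the k with the lowest CV error
```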
Could we stop the search as soon as the CV error starts getting worse as k increases? Not safely: there is no guarantee that the first local minimum will be the global optimum.
slide by Olga Veksler
Learning Theory & Probability Review