Model Validation and Selection
- Dr. Ilija Bogunovic
Introduction to Machine Learning, Learning and Adaptive Systems (las.ethz.ch)

Recap: Achieving generalization. Fundamental assumption: our data set is generated independently and identically distributed (i.i.d.).
Evaluating the learned weights $\hat{w}_{D_{train}}$ on an independent test set $D_{test}$,

$\hat{R}_{D_{test}}(\hat{w}) = \frac{1}{|D_{test}|} \sum_{(x,y) \in D_{test}} (y - \hat{w}^\top x)^2,$

gives an unbiased estimate of the true risk:

$\mathbb{E}_{D_{train}, D_{test}}\left[\hat{R}_{D_{test}}(\hat{w}_{D_{train}})\right] = \mathbb{E}_{D_{train}}\left[R(\hat{w}_{D_{train}})\right]$
Model selection: for each candidate degree $m$, fit

$\hat{w}_m = \arg\min_{w:\, \mathrm{degree}(w) \le m} \hat{R}_{train}(w)$
- Split the same data set into a training set and a validation set
- Train the model on the training set
- Estimate the error on the validation set
For each candidate model $m$, split the data into $D = D^{(i)}_{train} \uplus D^{(i)}_{val}$ for $i = 1, \dots, k$ and compute

$\hat{w}^{(i)}_m = \arg\min_{w}\, \hat{R}_{train^{(i)}}(w)$

$\hat{R}_m = \frac{1}{k} \sum_{i=1}^{k} \hat{R}_{val^{(i)}}\big(\hat{w}^{(i)}_m\big)$

then select $\hat{m} = \arg\min_{m} \hat{R}_m$.
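A minimal numpy sketch of this selection procedure, assuming toy data and polynomial models of varying degree (the data, the degree grid, and the number of folds are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: y = x^2 + noise
x = rng.uniform(-1, 1, 60)
y = x**2 + 0.1 * rng.normal(size=60)

def cv_risk(x, y, degree, k=5):
    """Average validation risk R_hat_m over k splits for a polynomial of given degree."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    risks = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = np.polyfit(x[train], y[train], degree)    # w_hat^(i): fit on the training part
        pred = np.polyval(w, x[val])
        risks.append(np.mean((y[val] - pred) ** 2))   # validation risk on split i
    return np.mean(risks)

degrees = [1, 2, 3, 5, 8]
risks = {m: cv_risk(x, y, m) for m in degrees}
m_hat = min(risks, key=risks.get)                     # m_hat = argmin_m R_hat_m
```

Each candidate degree gets its own cross-validated risk estimate, and the degree minimizing that estimate is selected.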
Monte Carlo cross-validation:
- Pick a training set of given size uniformly at random
- Validate on the remaining points
- Estimate the prediction error by averaging the validation error over the random splits
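A minimal sketch of Monte Carlo cross-validation, assuming synthetic linear-regression data (split size and repetition count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy regression data
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

def monte_carlo_cv(X, y, n_train=70, repetitions=20):
    """Repeatedly pick a random training set, validate on the rest, average the error."""
    errors = []
    for _ in range(repetitions):
        perm = rng.permutation(len(y))                      # uniformly random split
        tr, va = perm[:n_train], perm[n_train:]
        w, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)   # least-squares fit on training points
        errors.append(np.mean((y[va] - X[va] @ w) ** 2))    # validation error on the rest
    return np.mean(errors)

err = monte_carlo_cv(X, y)
```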
k-fold cross-validation:
- Partition the data into k "folds"
- Train on (k-1) folds, evaluating on the remaining fold
- Estimate the prediction error by averaging the validation error across the folds
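A minimal sketch of k-fold cross-validation on synthetic data (fold count and data are made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy regression data
X = rng.normal(size=(90, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=90)

def k_fold_cv(X, y, k=5):
    """Partition data into k folds; train on k-1 folds, validate on the held-out fold."""
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        va = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        w, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
        errors.append(np.mean((y[va] - X[va] @ w) ** 2))
    return np.mean(errors)              # average validation error

err = k_fold_cv(X, y)                   # k = len(y) would give leave-one-out CV (LOOCV)
```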
→ Risk of overfitting to the test set
→ Using too little data for training → risk of underfitting
In general, better performance! k = n is perfectly fine (called leave-one-out cross-validation, LOOCV), at the cost of higher computational complexity.
Recap of the regression pipeline:
- Loss function: squared loss, ℓp-loss
- Optimization: exact solution, gradient descent
- Model selection: k-fold cross-validation, Monte Carlo CV
- Hypothesis class: linear hypotheses, nonlinear hypotheses through feature transformations
- Evaluation metric: mean squared error

E.g., how should we order words in the bag-of-words model? More generally: which collection of nonlinear feature transformations should we use?
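The feature-transformation idea can be illustrated with a small numpy sketch: a model that stays linear in the weights can fit a nonlinear target once the inputs are mapped through polynomial basis features (the target function and polynomial degree here are made up):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(-1, 1, 50)
y = np.sin(3 * x)                            # a nonlinear target (hypothetical)

# Nonlinear fit with a model LINEAR in the features [1, x, x^2, ..., x^5]
Phi = np.vander(x, 6, increasing=True)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Compare against a purely linear fit with features [1, x]
Phi_lin = np.vander(x, 2, increasing=True)
w_lin, *_ = np.linalg.lstsq(Phi_lin, y, rcond=None)

mse_basis = np.mean((Phi @ w - y) ** 2)
mse_linear = np.mean((Phi_lin @ w_lin - y) ** 2)
```

The fit in the polynomial basis captures the curvature that the purely linear model cannot.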
Least-squares linear regression:

$\hat{w} = \arg\min_{w} \sum_{i=1}^{n} (y_i - w^\top x_i)^2$
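A sketch of the exact least-squares solution via the normal equations, on synthetic data (in practice `np.linalg.lstsq` is numerically preferable to forming $X^\top X$ explicitly):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
w_true = np.array([1.5, -0.5])               # hypothetical ground-truth weights
y = X @ w_true + 0.01 * rng.normal(size=50)

# Closed-form minimizer of sum_i (y_i - w^T x_i)^2 via the normal equations:
# (X^T X) w_hat = X^T y
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
```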
Written out in coordinates, $\hat{R}(w) = \sum_{i=1}^{n} \big(y_i - \sum_{j=1}^{d} w_j x_{i,j}\big)^2$, with gradient components

$\frac{\partial \hat{R}}{\partial w_j} = -2 \sum_{i=1}^{n} (y_i - w^\top x_i)\, x_{i,j}$
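The same minimizer can also be reached iteratively. A sketch of gradient descent on the (mean) squared error, with a made-up step size and synthetic noiseless data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 2))
w_true = np.array([0.7, -1.2])
y = X @ w_true                      # noiseless toy data (hypothetical)

w = np.zeros(2)
eta = 0.1                           # step size (assumed; must be small enough to converge)
for _ in range(2000):
    grad = -(2 / len(y)) * X.T @ (y - X @ w)   # gradient of the mean squared error
    w = w - eta * grad
```

On this convex quadratic objective, a sufficiently small constant step size drives the iterates to the least-squares solution.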
Ridge regression:

$\hat{w}_{ridge} = \arg\min_{w} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 + \lambda \|w\|_2^2$
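A sketch of ridge regression in numpy (the λ value and data are arbitrary); the penalty shrinks the weights relative to ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, 0.0, -1.0]) + 0.1 * rng.normal(size=40)

lam = 1.0  # regularization strength lambda (hypothetical value)

# Ridge minimizer: w = (X^T X + lambda * I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Ordinary least squares, for comparison
w_ols = np.linalg.solve(X.T @ X, X.T @ y)
```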
Closed-form solution: $\hat{w}_{ridge} = (X^\top X + \lambda I)^{-1} X^\top y$
Recap of the regression pipeline:
- Loss function: squared loss, ℓp-loss
- Optimization: exact solution, gradient descent
- Model selection: k-fold cross-validation, Monte Carlo CV
- Hypothesis class: linear hypotheses, nonlinear hypotheses through feature transformations
- Evaluation metric: mean squared error
- Regularizer: L2 norm

- How do you solve it? Closed form vs. gradient descent; nonlinear functions can be represented using basis functions
- Resampling: cross-validation
- Comparing different models via cross-validation
- Adding a penalty function to control the magnitude of the weights; choose the regularization parameter via cross-validation
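A sketch of choosing the regularization parameter by k-fold cross-validation (the data, the candidate λ grid, and the fold count are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 5))
w_true = np.array([1.0, -1.0, 0.0, 0.0, 0.0])   # hypothetical sparse ground truth
y = X @ w_true + 0.3 * rng.normal(size=60)

k = 5
folds = np.array_split(rng.permutation(len(y)), k)   # fix folds once for a fair comparison

def cv_error(lam):
    """k-fold CV estimate of the prediction error of ridge regression with parameter lam."""
    errs = []
    for i in range(k):
        va = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        w = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(X.shape[1]), X[tr].T @ y[tr])
        errs.append(np.mean((y[va] - X[va] @ w) ** 2))
    return np.mean(errs)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
lam_hat = min(lambdas, key=cv_error)    # regularization parameter with lowest CV error
```

Each candidate λ is scored on the same folds, and the value with the lowest cross-validated error is selected.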