SLIDE 1

Validation and Testing

COMPSCI 371D — Machine Learning

SLIDE 2

Outline

1 Training, Testing, and Model Selection
2 A Generative Data Model
3 Model Selection: Validation
4 Model Selection: Cross-Validation
5 Model Selection: The Bootstrap

SLIDE 3

Training and Testing

  • Empirical risk is the average loss over the training set (a Python sketch follows this list):

L_T(h) def= (1/|T|) ∑_{(x,y)∈T} ℓ(y, h(x))

  • Training is Empirical Risk Minimization (a fitting problem):

ERM_T(H) ∈ arg min_{h∈H} L_T(h)

  • Not enough for machine learning: Must generalize
  • Small loss on “previously unseen data”
  • How do we know? Evaluate on a separate test set S
  • This is called testing the predictor
  • How do we know that S and T are “related”?
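To make the two formulas above concrete, here is a minimal Python sketch of L_T(h); the toy training set, the identity predictor, and the quadratic loss are assumptions for illustration only.

import numpy as np

def empirical_risk(h, T, loss):
    # L_T(h): average of loss(y, h(x)) over all (x, y) in T
    return np.mean([loss(y, h(x)) for x, y in T])

T = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2)]      # toy training set of (x, y) pairs
h = lambda x: x                                # hypothetical predictor
quadratic = lambda y, yhat: (y - yhat) ** 2    # assumed quadratic loss
print(empirical_risk(h, T, quadratic))         # (0.01 + 0.01 + 0.04) / 3 = 0.02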

SLIDE 4

Model Selection

  • Hyper-parameters: degree k for polynomials, number k of neighbors in k-NN
  • How to choose? Why not just include them with the parameters, and train?
  • Difficulty 0: k-NN has no training! No big deal
  • Difficulty 1: k ∈ ℕ, while v ∈ ℝ^m for some predictors. Hybrid optimization. Medium deal, just a technical difficulty
  • Difficulty 2: Answer from training would be trivial!
  • Can always achieve zero risk on T (a sketch follows this list)
  • So k must be chosen separately from training. It tunes generalization

  • This is what makes it a hyper-parameter
  • Choosing hyper-parameters is called model selection
  • Evaluate choices on a separate validation set V
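Here is a toy sketch of Difficulty 2, with made-up data: a 1-NN classifier memorizes T, and each training point is its own nearest neighbor (assuming distinct inputs), so selecting k by training risk would always pick k = 1.

import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0]])     # toy inputs
y = np.array([0, 0, 1, 1])                     # toy labels

def knn_predict(x, X, y, k):
    # majority vote among the k nearest training points
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return np.bincount(y[idx]).argmax()

# 0-1 training risk for k = 1: exactly zero, since each training
# point retrieves itself as its own nearest neighbor
risk = np.mean([knn_predict(x, X, y, 1) != t for x, t in zip(X, y)])
print(risk)                                    # 0.0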

SLIDE 5

Model Selection, Training, Testing

  • “Model” = H
  • Given a parametric family of hypothesis spaces, model selection selects one particular member of the family
  • Given a specific hypothesis space, training selects one particular predictor out of it

  • Use V to select model, T to train, S to test (a split sketch follows this list)
  • V, T, S are mutually disjoint but “related”
  • What does “related” mean?
  • Train on cats and test on horses?
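A minimal sketch of carving mutually disjoint T, V, S out of one dataset drawn from the same source; the 70/15/15 split fractions are an assumption, not from the slides.

import numpy as np

rng = np.random.default_rng(0)
N = 1000                                   # total number of samples
perm = rng.permutation(N)                  # shuffle so the three splits look alike
n_t, n_v = int(0.7 * N), int(0.15 * N)
T_idx = perm[:n_t]                         # training set: fit the predictor
V_idx = perm[n_t:n_t + n_v]                # validation set: select the model
S_idx = perm[n_t + n_v:]                   # test set: final evaluation only
assert set(T_idx) | set(V_idx) | set(S_idx) == set(range(N))  # disjoint cover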

SLIDE 6

A Generative Data Model

  • What does “related” mean?
  • Every sample (x, y) comes from a joint probability distribution p(x, y)
  • True for training, validation, and test data, and for data seen during deployment
  • For the latter, y is “out there” but unknown
  • The goal of machine learning:
  • Define the (statistical) risk

L_p(h) = E_p[ℓ(y, h(x))] = ∫ ℓ(y, h(x)) p(x, y) dx dy

  • Learning performs (Statistical) Risk Minimization:

RM_p(H) ∈ arg min_{h∈H} L_p(h)

  • Lowest risk on H: L_p(H) def= min_{h∈H} L_p(h)
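Since L_p(h) is an expectation, it can be estimated by sampling whenever p is known. A sketch with an assumed toy Gaussian model (not from the slides):

import numpy as np

rng = np.random.default_rng(1)
M = 100_000                                # number of samples drawn from p
x = rng.normal(size=M)                     # x ~ N(0, 1)
y = 2 * x + rng.normal(scale=0.5, size=M)  # y | x ~ N(2x, 0.25)

h = lambda x: 2 * x                        # a candidate predictor
risk = np.mean((y - h(x)) ** 2)            # Monte Carlo estimate of L_p(h)
print(risk)                                # ≈ 0.25, the residual noise variance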

SLIDE 7

p is Unknown

  • So, we don’t need training data anymore?
  • We typically do not know p(x, y)
  • x = image? Or sentence?
  • Can we not estimate p?
  • The curse of dimensionality, again
  • We typically cannot find RM_p(H) or L_p(H)
  • That’s the goal all the same

SLIDE 8

So Why Talk About It?

  • Why talk about p(x, y) if we cannot know it?
  • L_p(h) is a mean, and we can estimate means
  • We can sandwich L_p(h) or L_p(H) between bounds over all possible choices of p
  • What else would we do anyway?
  • p is conceptually clean and simple
  • The unattainable holy grail
  • Think of p as an oracle that sells samples from X × Y
  • She knows p, we don’t
  • Samples cost money and effort!

[Example: MNIST Database]

SLIDE 9

Even More Importantly...

  • We know what “related” means:

T, V, S are all drawn independently from p(x, y)

  • We know what “generalize” means:

Find RM_p(H) ∈ arg min_{h∈H} L_p(h)

  • We know the goal of machine learning

SLIDE 10

Validation

  • Parametric family of hypothesis spaces H = ⋃_{π∈Π} H_π
  • Finding a good vector π̂ of hyper-parameters is called model selection
  • A popular method is called validation
  • Use a validation set V separate from T
  • Pick the hyper-parameter vector for which the predictor trained on the training set minimizes the validation risk:

π̂ = arg min_{π∈Π} L_V(ERM_T(H_π))

  • When the set Π of hyper-parameters is finite, try them all

SLIDE 11

Validation Algorithm

procedure VALIDATION(H, Π, T, V, ℓ)
    L̂ = ∞   ⊲ Stores the best risk so far on V
    for π ∈ Π do
        h ∈ arg min_{h′∈H_π} L_T(h′)   ⊲ Use the loss ℓ to compute the best predictor ERM_T(H_π) on T
        L = L_V(h)   ⊲ Use the loss ℓ to evaluate the predictor’s risk on V
        if L < L̂ then
            (π̂, ĥ, L̂) = (π, h, L)   ⊲ Keep track of the best hyper-parameters, predictor, and risk
        end if
    end for
    return (π̂, ĥ, L̂)   ⊲ Return the best hyper-parameters, predictor, and risk estimate
end procedure
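A direct Python transcription of the procedure may help; train(π, T) and risk(h, V) are caller-supplied stand-ins for ERM_T(H_π) and L_V(h), assumed here rather than taken from the slides.

import math

def validation(Pi, T, V, train, risk):
    pi_hat, h_hat, L_hat = None, None, math.inf   # best triple so far
    for pi in Pi:
        h = train(pi, T)                          # ERM_T(H_pi): best predictor on T
        L = risk(h, V)                            # L_V(h): risk on the validation set
        if L < L_hat:
            pi_hat, h_hat, L_hat = pi, h, L       # keep the best so far
    return pi_hat, h_hat, L_hat

Any pair of functions with these signatures will do, e.g. a wrapper around np.polyfit for train and an average quadratic loss for risk.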

SLIDE 12

Validation for Infinite Sets

  • When Π is countably infinite, scan and find a local minimum
  • Example: Polynomial degree

[Figure: training risk and validation risk as functions of the polynomial degree, with sample fits for k = 1, 2, 3, 6, 9]

  • When Π is not countable, scan a grid and find a local minimum (a degree-scan sketch follows)
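A sketch of such a scan over polynomial degrees, on assumed synthetic cubic-plus-noise data: training risk keeps falling with the degree, while validation risk eventually turns back up.

import numpy as np

rng = np.random.default_rng(2)
x_t, x_v = rng.uniform(-1, 1, 30), rng.uniform(-1, 1, 30)
truth = lambda x: x ** 3 - x                        # assumed true function
y_t = truth(x_t) + rng.normal(scale=0.1, size=30)   # noisy training data
y_v = truth(x_v) + rng.normal(scale=0.1, size=30)   # noisy validation data

for k in range(1, 11):
    c = np.polyfit(x_t, y_t, k)                     # ERM within degree-k polynomials
    r_t = np.mean((np.polyval(c, x_t) - y_t) ** 2)  # training risk: falls with k
    r_v = np.mean((np.polyval(c, x_v) - y_v) ** 2)  # validation risk: rises past the sweet spot
    print(k, r_t, r_v)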

SLIDE 13

Resampling Methods for Validation

  • Validation is good but expensive: needs separate data
  • A pity not to use V as part of T!
  • Resampling methods split T into T_k and V_k for k = 1, . . . , K
  • (Nothing to do with the number of classes or the polynomial degree!)
  • For each π, for each k, train on T_k, test on V_k to measure performance
  • Average performance over k is taken as the validation risk for π
  • Let π̂ be the best π
  • Train the predictor in H_π̂ on all of T
  • Cross-validation and the bootstrap differ in how the splits are made

SLIDE 14

K-Fold Cross-Validation

  • V_1, . . . , V_K are a partition of T into approximately equal-sized sets
  • T_k = T \ V_k (a fold sketch follows this list)
  • For π ∈ Π:
      For k = 1, . . . , K: train on T_k, measure performance on V_k
      Average performance over k is the validation risk for π
  • Pick π̂ as the π with the best average performance
  • Train the predictor in H_π̂ on all of T

  • Since performance is an average, we also get a variance!
  • We don’t have that for standard validation
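A minimal sketch of the fold bookkeeping, with assumed sizes N = 10 and K = 3:

import numpy as np

def kfold_split(N, K, seed=0):
    # random partition of {0, ..., N-1} into K approximately equal folds V_k
    perm = np.random.default_rng(seed).permutation(N)
    return np.array_split(perm, K)

folds = kfold_split(N=10, K=3)
for k, V_k in enumerate(folds):
    T_k = np.setdiff1d(np.arange(10), V_k)   # T_k = T \ V_k
    print(k, V_k, T_k)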

SLIDE 15

Cross-Validation Algorithm

procedure CROSSVALIDATION(H, Π, T, K, ℓ)
    {V_1, . . . , V_K} = SPLIT(T, K)   ⊲ Split T into K approximately equal-sized sets at random
    L̂ = ∞   ⊲ Will hold the lowest risk over Π
    for π ∈ Π do
        s, s2 = 0, 0   ⊲ Will hold the sum of risks and of their squares, to compute risk mean and variance
        for k = 1, . . . , K do
            T_k = T \ V_k   ⊲ Use all of T except V_k as the training set
            h ∈ arg min_{h′∈H_π} L_{T_k}(h′)   ⊲ Use the loss ℓ to compute h = ERM_{T_k}(H_π)
            L = L_{V_k}(h)   ⊲ Use the loss ℓ to compute the risk of h on V_k
            (s, s2) = (s + L, s2 + L²)   ⊲ Keep track of quantities for the risk mean and variance
        end for
        L = s/K   ⊲ Sample mean of the risk over the K folds
        if L < L̂ then
            σ² = (s2 − s²/K)/(K − 1)   ⊲ Sample variance of the risk over the K folds
            (π̂, L̂, σ̂²) = (π, L, σ²)   ⊲ Keep track of the best hyper-parameters and their risk statistics
        end if
    end for
    ĥ ∈ arg min_{h∈H_π̂} L_T(h)   ⊲ Train predictor afresh on all of T with the best hyper-parameters
    return (π̂, ĥ, L̂, σ̂²)   ⊲ Return best hyper-parameters, predictor, and risk statistics
end procedure
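A Python sketch of the same procedure, with the same assumed train/risk stand-ins as in the validation sketch above:

import math
import numpy as np

def cross_validation(Pi, T, K, train, risk, seed=0):
    folds = np.array_split(np.random.default_rng(seed).permutation(len(T)), K)
    pi_hat, L_hat, var_hat = None, math.inf, None
    for pi in Pi:
        risks = []
        for V_k in folds:
            T_k = [T[i] for i in np.setdiff1d(np.arange(len(T)), V_k)]
            h = train(pi, T_k)                          # ERM on T \ V_k
            risks.append(risk(h, [T[i] for i in V_k]))  # risk on the held-out fold
        L = float(np.mean(risks))                       # sample mean over the K folds
        if L < L_hat:
            pi_hat, L_hat = pi, L
            var_hat = float(np.var(risks, ddof=1))      # sample variance over the folds
    h_hat = train(pi_hat, T)                            # retrain afresh on all of T
    return pi_hat, h_hat, L_hat, var_hat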

SLIDE 16

How big is K?

  • T_k has |T|(K − 1)/K samples, so the predictor in each fold is a bit worse than the final predictor
  • Smaller K: more pessimistic risk estimate (upward bias because we train on a smaller T_k)
  • Bigger K decreases the bias of the risk estimate (training on a bigger T_k)
  • Why not K = N?
  • LOOCV (Leave-One-Out Cross-Validation)
  • Train on all but one data point, test on that data point, repeat
  • Any issue?
  • Nadeau and Bengio recommend K = 15

SLIDE 17

The Bootstrap

  • Bag or multiset: A set that allows for multiple instances
  • {a, a, b, b, b, c} has cardinality 6
  • Multiplicities: 2 for a, 3 for b, and 1 for c
  • A set is also a bag: {a, b, c}
  • Bootstrap: Same as CV, except
  • T_k: N samples drawn uniformly at random from T, with replacement
  • V_k = T \ T_k
  • T_k is a bag, V_k is a set
  • Repetitions change the training risk to a weighted average:

L_{T_k}(h) = (1/N) ∑_{n=1}^{N} ℓ(y_n, h(x_n)) = (1/N) ∑_{j=1}^{J} m_j ℓ(y_j, h(x_j))

where the second sum runs over the J distinct samples in T_k and m_j is the multiplicity of the j-th one
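A sketch of one bootstrap split, with an assumed N = 10: T_k is a bag of N draws with replacement, V_k the "out-of-bag" set of samples never drawn.

import numpy as np

rng = np.random.default_rng(3)
N = 10
draw = rng.integers(0, N, size=N)                 # N index draws, with replacement
T_k = sorted(draw.tolist())                       # a bag: repetitions allowed
V_k = sorted(set(range(N)) - set(draw.tolist()))  # a set: the never-drawn samples
print(T_k, V_k)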

SLIDE 18

How Many Elements are Missing from T_k?

  • Fix attention on one sample s

P[s is drawn in one draw] = 1/N
P[s is not drawn in one draw] = 1 − 1/N
P[s is never drawn] = (1 − 1/N)^N

  • Average fraction of missing elements: (1 − 1/N)^N
  • For large N, this is about lim_{N→∞} (1 − 1/N)^N = 1/e ≈ 0.37
  • Good approximation: (1 − 1/24)^24 ≈ 0.36
  • 37% of elements are missing from T_k on average
  • 63% of elements end up in T_k on average (a quick numerical check follows)
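A quick numerical check of the 1/e argument (the simulation parameters are assumptions): the average fraction of samples missing from a bootstrap bag should come out near 0.37.

import numpy as np

rng = np.random.default_rng(4)
N, trials = 1000, 200
missing = [1 - len(set(rng.integers(0, N, size=N).tolist())) / N
           for _ in range(trials)]
print(np.mean(missing))                           # ≈ 0.368 ≈ 1/e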

SLIDE 19

Cross-Validation vs Bootstrap

  • Bootstrap estimates are good
  • Sometimes somewhat more biased than CV
  • CV is the method of choice for model selection
  • Bootstrap leads to random decision forests
