
slide-1
SLIDE 1

Cross Validation & Ensembling

Shan-Hung Wu

shwu@cs.nthu.edu.tw

Department of Computer Science, National Tsing Hua University, Taiwan

Machine Learning

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 1 / 34

slide-2
SLIDE 2

Outline

1 Cross Validation
    How Many Folds?

2 Ensemble Methods
    Voting
    Bagging
    Boosting
    Why AdaBoost Works?

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 2 / 34

slide-4
SLIDE 4

Cross Validation

So far, we have used the hold-out method for:

  Hyperparameter tuning: validation set
  Performance reporting: testing set

What if we get an "unfortunate" split?

K-fold cross validation:

1 Split the data set $X$ evenly into $K$ subsets $X^{(i)}$ (called folds)

2 For $i = 1, \cdots, K$, train $f_N^{(i)}$ using all data but the $i$-th fold ($X \setminus X^{(i)}$)

3 Report the cross-validation error $C_{CV}$ by averaging all the testing errors $C[f_N^{(i)}]$ evaluated on $X^{(i)}$

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 4 / 34
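
Below is a minimal NumPy sketch of this K-fold procedure. The least-squares learner, the squared-error cost, and the synthetic data are placeholder choices for illustration only, not part of the slides.

```python
import numpy as np

def kfold_cv_error(X, y, train_fn, cost_fn, K=5, seed=0):
    """Split X into K folds; train on K-1 folds, evaluate on the held-out fold; average."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, K)
    errors = []
    for i in range(K):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(K) if j != i])
        model = train_fn(X[train], y[train])              # f_N^(i), trained on X \ X^(i)
        errors.append(cost_fn(model, X[test], y[test]))   # C[f_N^(i)] on X^(i)
    return np.mean(errors)                                # C_CV

# Placeholder learner and cost: ordinary least squares with squared error
train_fn = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
cost_fn = lambda w, X, y: np.mean((X @ w - y) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
print(kfold_cv_error(X, y, train_fn, cost_fn, K=5))
```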

slide-6
SLIDE 6

Nested Cross Validation

Cross validation (CV) can be applied to both hyperparameter tuning and performance reporting, e.g., $5 \times 2$ nested CV:

1 Inner (2 folds): select the hyperparameters giving the lowest $C_{CV}$
    Can be wrapped by grid search

2 Train the final model using both the training and validation sets with the selected hyperparameters

3 Outer (5 folds): report $C_{CV}$ as the test error

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 5 / 34
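
A sketch of this nested scheme with scikit-learn; the SVC estimator and the C grid are arbitrary stand-ins for whatever model and hyperparameters are being tuned.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Inner loop (2 folds): grid search picks the hyperparameters with the lowest CV error,
# then refits the final model on all the data it was given (training + validation).
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=2)

# Outer loop (5 folds): report the cross-validated score of the whole tuning procedure.
outer_scores = cross_val_score(inner, X, y, cv=5)
print("estimated test accuracy:", outer_scores.mean())
```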

slide-10
SLIDE 10

Outline

1 Cross Validation
    How Many Folds?

2 Ensemble Methods
    Voting
    Bagging
    Boosting
    Why AdaBoost Works?

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 6 / 34

slide-11
SLIDE 11

How Many Folds K? I

The cross-validation error $C_{CV}$ is an average of the $C[f_N^{(i)}]$'s
Regard each $C[f_N^{(i)}]$ as an estimator of the expected generalization error $E_X(C[f_N])$
$C_{CV}$ is an estimator too, and we have
$$\mathrm{MSE}(C_{CV}) = E_X\big[(C_{CV} - E_X(C[f_N]))^2\big] = \mathrm{Var}_X(C_{CV}) + \mathrm{bias}(C_{CV})^2$$

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 7 / 34

slide-15
SLIDE 15

Point Estimation Revisited: Mean Square Error

Let $\hat{\theta}_n$ be an estimator of a quantity $\theta$ related to a random variable $\mathrm{x}$, computed from $n$ i.i.d. samples of $\mathrm{x}$
Mean square error of $\hat{\theta}_n$: $\mathrm{MSE}(\hat{\theta}_n) = E_X\big[(\hat{\theta}_n - \theta)^2\big]$
Can be decomposed into the bias and variance:
$$\begin{aligned}
E_X\big[(\hat{\theta}_n - \theta)^2\big] &= E\big[(\hat{\theta}_n - E[\hat{\theta}_n] + E[\hat{\theta}_n] - \theta)^2\big] \\
&= E\big[(\hat{\theta}_n - E[\hat{\theta}_n])^2 + (E[\hat{\theta}_n] - \theta)^2 + 2(\hat{\theta}_n - E[\hat{\theta}_n])(E[\hat{\theta}_n] - \theta)\big] \\
&= E\big[(\hat{\theta}_n - E[\hat{\theta}_n])^2\big] + E\big[(E[\hat{\theta}_n] - \theta)^2\big] + 2\,E\big[\hat{\theta}_n - E[\hat{\theta}_n]\big]\cdot(E[\hat{\theta}_n] - \theta) \\
&= E\big[(\hat{\theta}_n - E[\hat{\theta}_n])^2\big] + (E[\hat{\theta}_n] - \theta)^2 + 2 \cdot 0 \cdot (E[\hat{\theta}_n] - \theta) \\
&= \mathrm{Var}_X(\hat{\theta}_n) + \mathrm{bias}(\hat{\theta}_n)^2
\end{aligned}$$
The MSE of an unbiased estimator is its variance

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 8 / 34
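
A quick numerical check of this decomposition: estimate the mean of a Gaussian with a deliberately biased estimator and compare the MSE against Var + bias² over many simulated datasets. All the numbers below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, trials = 2.0, 20, 200_000

# Many datasets of n i.i.d. samples each; one estimate per dataset.
samples = rng.normal(loc=theta, scale=1.0, size=(trials, n))
theta_hat = samples.mean(axis=1) + 0.3      # sample mean shifted by 0.3 -> biased on purpose

mse = np.mean((theta_hat - theta) ** 2)
var = np.var(theta_hat)
bias = np.mean(theta_hat) - theta
print(mse, var + bias ** 2)                 # the two values should nearly coincide
```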

slide-21
SLIDE 21

Example: 5-Fold vs. 10-Fold CV

$\mathrm{MSE}(C_{CV}) = E_X\big[(C_{CV} - E_X(C[f_N]))^2\big] = \mathrm{Var}_X(C_{CV}) + \mathrm{bias}(C_{CV})^2$
Consider polynomial regression where $P(y \,|\, x)$ is given by $y = \sin(x) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$
Let $C[\cdot]$ be the MSE of the predictions (made by a function) against the true labels
[Figure omitted] $E_X(C[f_N])$: red line; $\mathrm{bias}(C_{CV})$: gaps between the red line and the other solid lines ($E_X[C_{CV}]$); $\mathrm{Var}_X(C_{CV})$: shaded areas

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 9 / 34
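
A rough simulation in the spirit of that figure: draw many datasets from $y = \sin(x) + \varepsilon$, compute the 5-fold and 10-fold CV errors of a polynomial fit on each, and compare their means and spread. The sample size, noise level, and polynomial degree below are assumed values, not the settings used in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, degree, sigma = 60, 3, 0.3                    # assumed settings

def draw(n):
    x = rng.uniform(0, 2 * np.pi, n)
    return x, np.sin(x) + sigma * rng.normal(size=n)

fit = lambda x, y: np.polyfit(x, y, degree)                # least-squares polynomial fit
mse = lambda c, x, y: np.mean((np.polyval(c, x) - y) ** 2)

def cv_error(x, y, K):
    folds = np.array_split(rng.permutation(len(y)), K)
    errs = []
    for i in range(K):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(K) if j != i])
        errs.append(mse(fit(x[tr], y[tr]), x[te], y[te]))
    return np.mean(errs)

cv5 = [cv_error(*draw(N), 5) for _ in range(300)]          # distribution of C_CV over datasets
cv10 = [cv_error(*draw(N), 10) for _ in range(300)]
print("5-fold : mean %.3f, std %.3f" % (np.mean(cv5), np.std(cv5)))
print("10-fold: mean %.3f, std %.3f" % (np.mean(cv10), np.std(cv10)))
```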

slide-27
SLIDE 27

Decomposing Bias and Variance

$C_{CV}$ is an estimator of the expected generalization error $E_X(C[f_N])$:
$\mathrm{MSE}(C_{CV}) = \mathrm{Var}_X(C_{CV}) + \mathrm{bias}(C_{CV})^2$, where
$$\begin{aligned}
\mathrm{bias}(C_{CV}) &= E_X(C_{CV}) - E_X(C[f_N]) = E\Big[\sum_i \tfrac{1}{K} C[f_N^{(i)}]\Big] - E(C[f_N]) \\
&= \tfrac{1}{K}\sum_i \Big(E\big[C[f_N^{(i)}]\big] - E(C[f_N])\Big) \\
&= E\big[C[f_N^{(s)}]\big] - E(C[f_N]), \; \forall s \\
&= \mathrm{bias}\big(C[f_N^{(s)}]\big), \; \forall s
\end{aligned}$$
$$\begin{aligned}
\mathrm{Var}_X(C_{CV}) &= \mathrm{Var}\Big(\sum_i \tfrac{1}{K} C[f_N^{(i)}]\Big) = \tfrac{1}{K^2}\,\mathrm{Var}\Big(\sum_i C[f_N^{(i)}]\Big) \\
&= \tfrac{1}{K^2}\Big(\sum_i \mathrm{Var}\big(C[f_N^{(i)}]\big) + 2\sum_{i,j,\,j>i}\mathrm{Cov}_X\big(C[f_N^{(i)}], C[f_N^{(j)}]\big)\Big) \\
&= \tfrac{1}{K}\,\mathrm{Var}\big(C[f_N^{(s)}]\big) + \tfrac{2}{K^2}\sum_{i,j,\,j>i}\mathrm{Cov}\big(C[f_N^{(i)}], C[f_N^{(j)}]\big), \; \forall s
\end{aligned}$$
(The per-fold errors $C[f_N^{(i)}]$ are identically distributed, so any single fold $s$ gives the same bias and variance.)

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 10 / 34

slide-35
SLIDE 35

How Many Folds K? II

$\mathrm{MSE}(C_{CV}) = \mathrm{Var}_X(C_{CV}) + \mathrm{bias}(C_{CV})^2$, where
$\mathrm{bias}(C_{CV}) = \mathrm{bias}\big(C[f_N^{(s)}]\big), \; \forall s$
$\mathrm{Var}(C_{CV}) = \tfrac{1}{K}\mathrm{Var}\big(C[f_N^{(s)}]\big) + \tfrac{2}{K^2}\sum_{i,j,\,j>i}\mathrm{Cov}\big(C[f_N^{(i)}], C[f_N^{(j)}]\big), \; \forall s$

By learning theory, we can reduce $\mathrm{bias}(C_{CV})$ and $\mathrm{Var}(C_{CV})$ by:
  Choosing the right model complexity, avoiding both underfitting and overfitting
  Collecting more training examples ($N$)
Furthermore, we can reduce $\mathrm{Var}(C_{CV})$ by making $f_N^{(i)}$ and $f_N^{(j)}$ uncorrelated

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 11 / 34

slide-37
SLIDE 37

How Many Folds K? III

$\mathrm{bias}(C_{CV}) = \mathrm{bias}\big(C[f_N^{(s)}]\big), \; \forall s$
$\mathrm{Var}_X(C_{CV}) = \tfrac{1}{K}\mathrm{Var}\big(C[f_N^{(s)}]\big) + \tfrac{2}{K^2}\sum_{i,j,\,j>i}\mathrm{Cov}\big(C[f_N^{(i)}], C[f_N^{(j)}]\big), \; \forall s$

With a large K, $C_{CV}$ tends to have:
  Low $\mathrm{bias}\big(C[f_N^{(s)}]\big)$ and $\mathrm{Var}\big(C[f_N^{(s)}]\big)$, as each $f_N^{(s)}$ is trained on more examples
  High $\mathrm{Cov}\big(C[f_N^{(i)}], C[f_N^{(j)}]\big)$, as the training sets $X \setminus X^{(i)}$ and $X \setminus X^{(j)}$ are more similar, thus $C[f_N^{(i)}]$ and $C[f_N^{(j)}]$ are more positively correlated

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 12 / 34

slide-40
SLIDE 40

How Many Folds K? IV

$\mathrm{bias}(C_{CV}) = \mathrm{bias}\big(C[f_N^{(s)}]\big), \; \forall s$
$\mathrm{Var}_X(C_{CV}) = \tfrac{1}{K}\mathrm{Var}\big(C[f_N^{(s)}]\big) + \tfrac{2}{K^2}\sum_{i,j,\,j>i}\mathrm{Cov}\big(C[f_N^{(i)}], C[f_N^{(j)}]\big), \; \forall s$

Conversely, with a small K, the cross-validation error tends to have high $\mathrm{bias}\big(C[f_N^{(s)}]\big)$ and $\mathrm{Var}\big(C[f_N^{(s)}]\big)$ but low $\mathrm{Cov}\big(C[f_N^{(i)}], C[f_N^{(j)}]\big)$
In practice, we usually set K = 5 or 10

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 13 / 34

slide-42
SLIDE 42

Leave-One-Out CV

$\mathrm{bias}(C_{CV}) = \mathrm{bias}\big(C[f_N^{(s)}]\big), \; \forall s$
$\mathrm{Var}_X(C_{CV}) = \tfrac{1}{K}\mathrm{Var}\big(C[f_N^{(s)}]\big) + \tfrac{2}{K^2}\sum_{i,j,\,j>i}\mathrm{Cov}\big(C[f_N^{(i)}], C[f_N^{(j)}]\big), \; \forall s$

For a very small dataset:
  $\mathrm{MSE}(C_{CV})$ is dominated by $\mathrm{bias}\big(C[f_N^{(s)}]\big)$ and $\mathrm{Var}\big(C[f_N^{(s)}]\big)$, not by $\mathrm{Cov}\big(C[f_N^{(i)}], C[f_N^{(j)}]\big)$
  We can choose K = N, which we call the leave-one-out CV

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 14 / 34

slide-44
SLIDE 44

Outline

1 Cross Validation
    How Many Folds?

2 Ensemble Methods
    Voting
    Bagging
    Boosting
    Why AdaBoost Works?

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 15 / 34

slide-45
SLIDE 45

Ensemble Methods

No free lunch theorem: there is no single ML algorithm that always outperforms the others in all domains/tasks
Can we combine multiple base-learners to improve
  Applicability across different domains, and/or
  Generalization performance in a specific task?
These are the goals of ensemble learning
How do we "combine" multiple base-learners?

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 16 / 34

slide-49
SLIDE 49

Outline

1 Cross Validation
    How Many Folds?

2 Ensemble Methods
    Voting
    Bagging
    Boosting
    Why AdaBoost Works?

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 17 / 34

slide-50
SLIDE 50

Voting

Voting: linearly combine the predictions of the base-learners for each $x$:
$$\tilde{y}_k = \sum_j w_j\, \hat{y}_k^{(j)}, \quad \text{where } w_j \ge 0, \; \sum_j w_j = 1$$
If all learners are given equal weight $w_j = 1/L$, we have the plurality vote (the multi-class version of the majority vote)

Voting Rule   | Formula
Sum           | $\tilde{y}_k = \frac{1}{L}\sum_{j=1}^{L} \hat{y}_k^{(j)}$
Weighted sum  | $\tilde{y}_k = \sum_j w_j\, \hat{y}_k^{(j)}, \; w_j \ge 0, \, \sum_j w_j = 1$
Median        | $\tilde{y}_k = \mathrm{median}_j\, \hat{y}_k^{(j)}$
Minimum       | $\tilde{y}_k = \min_j \hat{y}_k^{(j)}$
Maximum       | $\tilde{y}_k = \max_j \hat{y}_k^{(j)}$
Product       | $\tilde{y}_k = \prod_j \hat{y}_k^{(j)}$

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 18 / 34
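
The rules in the table map directly to NumPy reductions over the stacked per-learner outputs; the class-probability numbers below are made up for illustration.

```python
import numpy as np

# preds[j, k]: the score learner j assigns to class k for one input x (made-up numbers)
preds = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3]])
w = np.array([0.5, 0.3, 0.2])                    # w_j >= 0 and sum_j w_j = 1

rules = {
    "sum":          preds.mean(axis=0),          # equal weights w_j = 1/L
    "weighted sum": w @ preds,
    "median":       np.median(preds, axis=0),
    "minimum":      preds.min(axis=0),
    "maximum":      preds.max(axis=0),
    "product":      preds.prod(axis=0),
}
for name, y_tilde in rules.items():
    print(f"{name:12s} -> class {np.argmax(y_tilde)}, combined scores {np.round(y_tilde, 3)}")
```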

slide-51
SLIDE 51

Why Voting Works? I

Assume that each $\hat{y}^{(j)}$ has the same expected value $E_X\big(\hat{y}^{(j)} \,|\, x\big)$ and variance $\mathrm{Var}_X\big(\hat{y}^{(j)} \,|\, x\big)$
When $w_j = 1/L$, we have:
$$E_X(\tilde{y} \,|\, x) = E\Big(\sum_j \tfrac{1}{L}\hat{y}^{(j)} \,\Big|\, x\Big) = \tfrac{1}{L}\sum_j E\big(\hat{y}^{(j)} \,|\, x\big) = E\big(\hat{y}^{(j)} \,|\, x\big)$$
$$\mathrm{Var}_X(\tilde{y} \,|\, x) = \mathrm{Var}\Big(\sum_j \tfrac{1}{L}\hat{y}^{(j)} \,\Big|\, x\Big) = \tfrac{1}{L^2}\mathrm{Var}\Big(\sum_j \hat{y}^{(j)} \,\Big|\, x\Big) = \tfrac{1}{L}\mathrm{Var}\big(\hat{y}^{(j)} \,|\, x\big) + \tfrac{2}{L^2}\sum_{i,j,\,i<j}\mathrm{Cov}\big(\hat{y}^{(i)}, \hat{y}^{(j)} \,|\, x\big)$$
The expected value doesn't change, so the bias doesn't change

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 19 / 34

slide-54
SLIDE 54

Why Voting Works? II

$$\mathrm{Var}_X(\tilde{y} \,|\, x) = \tfrac{1}{L}\mathrm{Var}\big(\hat{y}^{(j)} \,|\, x\big) + \tfrac{2}{L^2}\sum_{i,j,\,i<j}\mathrm{Cov}\big(\hat{y}^{(i)}, \hat{y}^{(j)} \,|\, x\big)$$

If $\hat{y}^{(i)}$ and $\hat{y}^{(j)}$ are uncorrelated, the variance can be reduced (by a factor of $L$)
Unfortunately, the $\hat{y}^{(j)}$'s may not be i.i.d. in practice
If the voters are positively correlated, the variance increases

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 20 / 34
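
A small simulation of this formula: average L voters with unit variance and a common pairwise correlation rho, and compare the simulated ensemble variance against (1 + (L-1)·rho)/L, which is what the expression above evaluates to under that assumption. L and the rho values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
L, trials = 10, 100_000

def ensemble_variance(rho):
    # Voters with unit variance and pairwise correlation rho, built from a shared component.
    common = rng.normal(size=(trials, 1))
    noise = rng.normal(size=(trials, L))
    voters = np.sqrt(rho) * common + np.sqrt(1 - rho) * noise
    return voters.mean(axis=1).var()             # variance of the equally weighted vote

for rho in (0.0, 0.3, 0.9):
    theory = (1 + (L - 1) * rho) / L             # (1/L) Var + (2/L^2) * sum of Cov terms
    print(f"rho = {rho:.1f}: simulated {ensemble_variance(rho):.3f}, theory {theory:.3f}")
```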

slide-57
SLIDE 57

Outline

1 Cross Validation
    How Many Folds?

2 Ensemble Methods
    Voting
    Bagging
    Boosting
    Why AdaBoost Works?

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 21 / 34

slide-58
SLIDE 58

Bagging

Bagging (short for bootstrap aggregating) is a voting method, but the base-learners are made different deliberately
How? Why not train them using slightly different training sets?

1 Generate $L$ slightly different samples from the given sample by bootstrapping: given $X$ of size $N$, draw $N$ points randomly from $X$ with replacement to get $X^{(j)}$
    Some instances may be drawn more than once, and some not at all

2 Train a base-learner on each $X^{(j)}$

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 22 / 34
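
A minimal bagging sketch following these two steps, using decision trees as an assumed base-learner and a plain majority vote over the bootstrap-trained copies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, random_state=0)
Xtr, ytr, Xte, yte = X[:400], y[:400], X[400:], y[400:]

L, N = 25, len(ytr)
learners = []
for _ in range(L):
    idx = rng.integers(0, N, size=N)             # bootstrap: draw N indices with replacement
    learners.append(DecisionTreeClassifier(random_state=0).fit(Xtr[idx], ytr[idx]))

votes = np.stack([d.predict(Xte) for d in learners])      # shape (L, n_test)
y_hat = (votes.mean(axis=0) >= 0.5).astype(int)           # majority vote for 0/1 labels
print("bagged trees:", (y_hat == yte).mean())
print("single tree :", (learners[0].predict(Xte) == yte).mean())
```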

slide-62
SLIDE 62

Outline

1 Cross Validation
    How Many Folds?

2 Ensemble Methods
    Voting
    Bagging
    Boosting
    Why AdaBoost Works?

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 23 / 34

slide-63
SLIDE 63

Boosting

In bagging, generating "uncorrelated" base-learners is left to chance and to the instability of the learning method
In boosting, we generate complementary base-learners
How? Why not train the next learner on the mistakes of the previous learners?
For simplicity, let's consider binary classification here: $d^{(j)}(x) \in \{-1, 1\}$
The original boosting algorithm combines three weak learners to generate a strong learner
  A weak learner has error probability less than 1/2 (better than random guessing)
  A strong learner has arbitrarily small error probability

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 24 / 34

slide-67
SLIDE 67

Original Boosting Algorithm

Training

1 Given a large training set, randomly divide it into three parts $X^{(1)}$, $X^{(2)}$, and $X^{(3)}$

2 Use $X^{(1)}$ to train the first learner $d^{(1)}$, then feed $X^{(2)}$ to $d^{(1)}$

3 Use all the points of $X^{(2)}$ misclassified by $d^{(1)}$ to train $d^{(2)}$; then feed $X^{(3)}$ to $d^{(1)}$ and $d^{(2)}$

4 Use the points on which $d^{(1)}$ and $d^{(2)}$ disagree to train $d^{(3)}$

Testing

1 Feed a point to $d^{(1)}$ and $d^{(2)}$ first; if their outputs agree, use that as the final prediction

2 Otherwise, take the output of $d^{(3)}$

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 25 / 34
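
A sketch of these steps with decision stumps as the weak learners. It follows the simplified description above (d^(2) is trained only on d^(1)'s mistakes, and it assumes d^(1) does make some mistakes and that d^(1) and d^(2) disagree somewhere); the dataset and split sizes are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1200, random_state=0)
X1, X2, X3, Xte = np.split(X, 4)                 # three training parts plus a held-out test part
y1, y2, y3, yte = np.split(y, 4)

stump = lambda: DecisionTreeClassifier(max_depth=1)       # weak learner: a decision stump

d1 = stump().fit(X1, y1)                                  # step 2: train d1 on X1
wrong = d1.predict(X2) != y2
d2 = stump().fit(X2[wrong], y2[wrong])                    # step 3: train d2 on d1's mistakes in X2
disagree = d1.predict(X3) != d2.predict(X3)
d3 = stump().fit(X3[disagree], y3[disagree])              # step 4: train d3 where d1 and d2 disagree

def predict(Xq):
    p1, p2, p3 = d1.predict(Xq), d2.predict(Xq), d3.predict(Xq)
    return np.where(p1 == p2, p1, p3)                     # testing: d3 breaks the disagreements

print("ensemble:", (predict(Xte) == yte).mean())
print("d1 alone:", (d1.predict(Xte) == yte).mean())
```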

slide-73
SLIDE 73

Example

Assuming $X^{(1)}$, $X^{(2)}$, and $X^{(3)}$ are the same:
Disadvantage: requires a large training set to afford the three-way split

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 26 / 34

slide-74
SLIDE 74

AdaBoost

AdaBoost: uses the same training set over and over again
How to make some points "larger"?
Modify the probabilities of drawing the instances as a function of error
Notation:
  $\Pr^{(i,j)}$: probability that an example $(x^{(i)}, y^{(i)})$ is drawn to train the $j$-th base-learner $d^{(j)}$
  $\varepsilon^{(j)} = \sum_i \Pr^{(i,j)}\, 1\big(y^{(i)} \ne d^{(j)}(x^{(i)})\big)$: error rate of $d^{(j)}$ on its training set

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 27 / 34

slide-77
SLIDE 77

Algorithm

Training

1 Initialize $\Pr^{(i,1)} = \frac{1}{N}$ for all $i$

2 For $j = 1, 2, \ldots$:
    1 Randomly draw $N$ examples from $X$ with probabilities $\Pr^{(i,j)}$ and use them to train $d^{(j)}$
    2 Stop adding new base-learners if $\varepsilon^{(j)} \ge \frac{1}{2}$
    3 Define $\alpha_j = \frac{1}{2}\log\big(\frac{1-\varepsilon^{(j)}}{\varepsilon^{(j)}}\big) > 0$ and set $\Pr^{(i,j+1)} = \Pr^{(i,j)} \cdot \exp\big(-\alpha_j\, y^{(i)} d^{(j)}(x^{(i)})\big)$ for all $i$
    4 Normalize $\Pr^{(i,j+1)}$, $\forall i$, by multiplying by $\big(\sum_i \Pr^{(i,j+1)}\big)^{-1}$

Testing

1 Given $x$, calculate $\hat{y}^{(j)}$ for all $j$

2 Make the final prediction $\tilde{y}$ by voting: $\tilde{y} = \sum_j \alpha_j d^{(j)}(x)$

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 28 / 34
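
A compact NumPy sketch of these training and testing steps, with decision stumps as an assumed weak learner; the weighted error is computed from the drawing probabilities exactly as defined above, and the small clip on epsilon is a defensive addition, not part of the listed algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, L=20, seed=0):
    """y must be in {-1, +1}. Returns the base-learners d^(j) and their weights alpha_j."""
    rng = np.random.default_rng(seed)
    N = len(y)
    p = np.full(N, 1.0 / N)                       # 1. Pr^(i,1) = 1/N
    learners, alphas = [], []
    for _ in range(L):
        idx = rng.choice(N, size=N, p=p)          # 2.1 draw N examples with probabilities Pr^(i,j)
        d = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        eps = np.sum(p * (d.predict(X) != y))     # eps^(j) = sum_i Pr^(i,j) 1(y^(i) != d^(j)(x^(i)))
        if eps >= 0.5:                            # 2.2 stop: no better than random guessing
            break
        eps = max(eps, 1e-12)                     # guard against a perfect stump (log of 1/0)
        alpha = 0.5 * np.log((1 - eps) / eps)     # 2.3 alpha_j > 0
        p = p * np.exp(-alpha * y * d.predict(X)) #     raise Pr for mistakes, lower it for hits
        p = p / p.sum()                           # 2.4 normalize
        learners.append(d)
        alphas.append(alpha)
    return learners, np.array(alphas)

def adaboost_predict(learners, alphas, X):
    score = sum(a * d.predict(X) for a, d in zip(alphas, learners))   # sum_j alpha_j d^(j)(x)
    return np.sign(score)

X, y = make_classification(n_samples=400, random_state=0)
y = 2 * y - 1                                     # map {0, 1} labels to {-1, +1}
ds, al = adaboost_train(X[:300], y[:300])
print("test accuracy:", (adaboost_predict(ds, al, X[300:]) == y[300:]).mean())
```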

slide-82
SLIDE 82

Example

$d^{(j+1)}$ complements $d^{(j)}$ and $d^{(j-1)}$ by focusing on the predictions they disagree on
The voting weights $\big(\alpha_j = \frac{1}{2}\log\big(\frac{1-\varepsilon^{(j)}}{\varepsilon^{(j)}}\big)\big)$ used in prediction are proportional to the base-learner's accuracy

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 29 / 34

slide-84
SLIDE 84

Outline

1 Cross Validation
    How Many Folds?

2 Ensemble Methods
    Voting
    Bagging
    Boosting
    Why AdaBoost Works?

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 30 / 34

slide-85
SLIDE 85

Why AdaBoost Works

Why does AdaBoost improve performance?
By increasing model complexity? Not exactly
  Empirical study: AdaBoost reduces overfitting as $L$ grows, even when there is no training error
AdaBoost increases the margin [1, 2]

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 31 / 34

slide-89
SLIDE 89

Margin as Confidence of Predictions

Recall that in SVC, a larger margin improves generalizability
  Due to higher-confidence predictions over the training examples
We can define the margin for AdaBoost similarly
In binary classification, define the margin of the prediction for an example $(x^{(i)}, y^{(i)}) \in X$ as:
$$\mathrm{margin}(x^{(i)}, y^{(i)}) = y^{(i)} f(x^{(i)}) = \sum_{j:\, y^{(i)} = d^{(j)}(x^{(i)})} \alpha_j \;-\; \sum_{j:\, y^{(i)} \ne d^{(j)}(x^{(i)})} \alpha_j$$

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 32 / 34

slide-92
SLIDE 92

Margin Distribution

Margin distribution over $\theta$:
$$\Pr_X\big(y^{(i)} f(x^{(i)}) \le \theta\big) \approx \frac{\big|\{(x^{(i)}, y^{(i)}) : y^{(i)} f(x^{(i)}) \le \theta\}\big|}{|X|}$$
A complementary learner:
  Clarifies low-confidence areas
  Increases the margin of points in these areas

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 33 / 34
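
The margin and the empirical margin distribution follow directly from these definitions; the sketch below reuses the hypothetical learners/alphas names from the AdaBoost sketch earlier and keeps the unnormalized margin exactly as defined above.

```python
import numpy as np

def margins(learners, alphas, X, y):
    # margin(x, y) = sum_{j: d_j(x) = y} alpha_j - sum_{j: d_j(x) != y} alpha_j = y * f(x)
    f = sum(a * d.predict(X) for a, d in zip(alphas, learners))
    return y * f

def margin_distribution(m, theta):
    # Pr_X(y f(x) <= theta), estimated as a fraction of the training set
    return np.mean(m <= theta)

# Example usage, reusing ds, al and the {-1,+1}-labeled data from the AdaBoost sketch above:
#   m = margins(ds, al, X[:300], y[:300])
#   print([margin_distribution(m, t) for t in (0.0, 0.5, 1.0)])
```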

slide-94
SLIDE 94

Reference I

[1] Yoav Freund, Robert Schapire, and N. Abe. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771-780):1612, 1999.
[2] Liwei Wang, Masashi Sugiyama, Cheng Yang, Zhi-Hua Zhou, and Jufu Feng. On the margin explanation of boosting algorithms. In COLT, pages 479-490. Citeseer, 2008.

Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning 34 / 34