  1. Cross Validation & Ensembling
     Shan-Hung Wu (shwu@cs.nthu.edu.tw)
     Department of Computer Science, National Tsing Hua University, Taiwan
     Machine Learning

  2. Outline
     1. Cross Validation
        - How Many Folds?
     2. Ensemble Methods
        - Voting
        - Bagging
        - Boosting
        - Why AdaBoost Works?

  3. Cross Validation
     So far, we have used the holdout method for:
     - Hyperparameter tuning: validation set
     - Performance reporting: testing set
     What if we get an "unfortunate" split?
     $K$-fold cross validation:
     1. Split the data set $\mathbb{X}$ evenly into $K$ subsets $\mathbb{X}^{(i)}$ (called folds)
     2. For $i = 1, \cdots, K$, train $f_{N^{(i)}}$ using all data but the $i$-th fold ($\mathbb{X} \setminus \mathbb{X}^{(i)}$)
     3. Report the cross-validation error $C_{CV}$ by averaging the testing errors $C[f_{N^{(i)}}]$ evaluated on $\mathbb{X}^{(i)}$
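A minimal sketch of the $K$-fold procedure above, using scikit-learn's KFold for the splitting; the linear model, toy data, and error metric are illustrative placeholders, not taken from the slides:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy data standing in for any (X, y); the regression model is a placeholder.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=100)

K = 5
fold_errors = []
for train_idx, test_idx in KFold(n_splits=K, shuffle=True, random_state=0).split(X):
    # Train on all data but the i-th fold (X \ X^(i)) ...
    f = LinearRegression().fit(X[train_idx], y[train_idx])
    # ... and test on the held-out fold X^(i).
    fold_errors.append(mean_squared_error(y[test_idx], f.predict(X[test_idx])))

# Cross-validation error: average of the per-fold testing errors.
C_CV = np.mean(fold_errors)
print(f"C_CV = {C_CV:.4f}")
```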

  4. Nested Cross Validation
     Cross validation (CV) can be applied to both hyperparameter tuning and performance reporting, e.g., 5×2 nested CV:
     1. Inner loop (2 folds): select the hyperparameters giving the lowest $C_{CV}$; can be wrapped by grid search
     2. Train the final model using both the training and validation sets with the selected hyperparameters
     3. Outer loop (5 folds): report $C_{CV}$ as the test error
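A sketch of the 5×2 scheme in scikit-learn, where GridSearchCV plays the inner 2-fold loop and cross_val_score the outer 5-fold loop; the SVC estimator, its C grid, and the iris data are stand-ins for whatever model and data are at hand:

```python
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)  # placeholder data set

# Inner 2-fold CV: grid search selects the hyperparameters with the lowest CV
# error, then (refit=True, the default) retrains on the full inner
# training + validation data with the selected hyperparameters.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=2)

# Outer 5-fold CV: report the averaged test score of the tuned models.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(f"nested-CV accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```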


  5. How Many Folds K? (I)
     - The cross-validation error $C_{CV}$ is an average of the $C[f_{N^{(i)}}]$'s
     - Regard each $C[f_{N^{(i)}}]$ as an estimator of the expected generalization error $E_{\mathbb{X}}(C[f_N])$
     - $C_{CV}$ is an estimator too, and we have
       $MSE(C_{CV}) = E_{\mathbb{X}}[(C_{CV} - E_{\mathbb{X}}(C[f_N]))^2] = Var_{\mathbb{X}}(C_{CV}) + bias(C_{CV})^2$

  6. Point Estimation Revisited: Mean Square Error
     Let $\hat{\theta}_n$ be an estimator of a quantity $\theta$ related to a random variable $x$, mapped from $n$ i.i.d. samples of $x$.
     Mean square error of $\hat{\theta}_n$: $MSE(\hat{\theta}_n) = E_{\mathbb{X}}[(\hat{\theta}_n - \theta)^2]$
     It can be decomposed into the bias and variance:
     $$\begin{aligned}
     E_{\mathbb{X}}[(\hat{\theta}_n - \theta)^2]
       &= E[(\hat{\theta}_n - E[\hat{\theta}_n] + E[\hat{\theta}_n] - \theta)^2] \\
       &= E[(\hat{\theta}_n - E[\hat{\theta}_n])^2 + (E[\hat{\theta}_n] - \theta)^2 + 2(\hat{\theta}_n - E[\hat{\theta}_n])(E[\hat{\theta}_n] - \theta)] \\
       &= E[(\hat{\theta}_n - E[\hat{\theta}_n])^2] + (E[\hat{\theta}_n] - \theta)^2 + 2\,E[\hat{\theta}_n - E[\hat{\theta}_n]]\,(E[\hat{\theta}_n] - \theta) \\
       &= E[(\hat{\theta}_n - E[\hat{\theta}_n])^2] + (E[\hat{\theta}_n] - \theta)^2 + 2 \cdot 0 \cdot (E[\hat{\theta}_n] - \theta) \\
       &= Var_{\mathbb{X}}(\hat{\theta}_n) + bias(\hat{\theta}_n)^2
     \end{aligned}$$
     The MSE of an unbiased estimator is its variance.
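A quick Monte Carlo check of this decomposition; here $\hat{\theta}_n$ is the (biased) maximum-likelihood variance estimator of a Gaussian sample, an illustrative choice of estimator and parameters:

```python
import numpy as np

# Monte Carlo check that MSE = Var + bias^2, illustrated with the biased
# variance estimator (1/n) * sum((x - mean)^2) of a N(0, sigma^2) sample.
rng = np.random.default_rng(0)
sigma2, n, trials = 4.0, 10, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
theta_hat = samples.var(axis=1)  # ddof=0: the biased MLE, E[theta_hat] = (n-1)/n * sigma2

mse = np.mean((theta_hat - sigma2) ** 2)
var = theta_hat.var()
bias = theta_hat.mean() - sigma2
print(f"MSE        = {mse:.4f}")
print(f"Var+bias^2 = {var + bias**2:.4f}")  # matches MSE up to Monte Carlo noise
```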

  7. Example: 5-Fold vs. 10-Fold CV
     $MSE(C_{CV}) = E_{\mathbb{X}}[(C_{CV} - E_{\mathbb{X}}(C[f_N]))^2] = Var_{\mathbb{X}}(C_{CV}) + bias(C_{CV})^2$
     - Consider polynomial regression where $y = \sin(x) + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$
     - Let $C[\cdot]$ be the MSE of a function's predictions against the true labels
     - In the figure: $E_{\mathbb{X}}(C[f_N])$ is the red line, and $bias(C_{CV})$ is the gap between the red line and the other solid lines ($E_{\mathbb{X}}[C_{CV}]$)
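A sketch of the experiment this slide describes: redraw the data set many times and look at the mean and spread of $C_{CV}$ for $K = 5$ vs. $K = 10$. The polynomial degree, sample size, noise level, and repeat count are assumed values, not taken from the slides:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Estimate the mean and variance of C_CV over many redraws of the data set.
rng = np.random.default_rng(0)
N, sigma, degree, repeats = 50, 0.5, 3, 200

def draw_dataset():
    X = rng.uniform(0, 2 * np.pi, size=(N, 1))
    y = np.sin(X).ravel() + rng.normal(0, sigma, size=N)
    return X, y

model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
for K in (5, 10):
    c_cv = []
    for _ in range(repeats):
        X, y = draw_dataset()
        scores = cross_val_score(model, X, y, cv=K, scoring="neg_mean_squared_error")
        c_cv.append(-scores.mean())  # C_CV: average test MSE over the K folds
    c_cv = np.array(c_cv)
    # The mean over data sets approximates E_X[C_CV] (the solid lines in the
    # figure); the spread approximates Var_X(C_CV).
    print(f"K={K:2d}: E[C_CV] ~ {c_cv.mean():.3f}, Var(C_CV) ~ {c_cv.var():.4f}")
```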
