

SLIDE 1

Machine Learning

Cross-Validation

SLIDE 2

Model selection

Very broadly: choosing the best model using the given data

  • What makes a model:

– Features
– Hyper-parameters that control the hypothesis space

  • Example: depth of a decision tree, neural network architecture, etc.

– The learning algorithm (which may have its own hyper-parameters)
– The actual model itself

  • The learning algorithms we see in this class only find the last one

– What about the rest?

2

slide-3
SLIDE 3

Model selection strategies

  • Many, many different approaches out there

– (Chapter 7 of The Elements of Statistical Learning)
– Minimum description length
– VC dimension and structural risk minimization
– Cross-validation
– Bayes factor, AIC, BIC, …

SLIDE 4

Cross-validation

We want to train a classifier using a given dataset. We know how to train given the features and hyper-parameters. How do we know what the best feature set and hyper-parameters are?

SLIDE 5

K-fold cross-validation

1. Split the data into K (say 5 or 10) equal-sized parts

Given a particular feature set and hyper-parameter setting:

[Diagram: the data split into five equal parts, Part 1–Part 5]
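Step 1 can be sketched in a few lines of Python. This is an illustrative sketch, not the lecture's code: the helper `make_folds` and the toy data are assumptions.

```python
import random

def make_folds(data, k=5, seed=0):
    """Shuffle the data and split it into k roughly equal-sized parts."""
    data = list(data)
    random.Random(seed).shuffle(data)
    # Fold i takes every k-th element starting at offset i,
    # so fold sizes differ by at most one example.
    return [data[i::k] for i in range(k)]

folds = make_folds(range(20), k=5)
print([len(f) for f in folds])  # → [4, 4, 4, 4, 4]
```

Together the folds partition the data: every example lands in exactly one part.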

SLIDE 6

K-fold cross-validation

1. Split the data randomly into K (say 5 or 10) equal-sized parts
2. Train a classifier on four parts and evaluate it on the fifth

Given a particular feature set and hyper-parameter setting:

[Diagram: train on Parts 1–4, holding out Part 5]

SLIDE 7

K-fold cross-validation

1. Split the data randomly into K (say 5 or 10) equal-sized parts
2. Train a classifier on four parts and evaluate it on the fifth

Given a particular feature set and hyper-parameter setting:

[Diagram: train on Parts 1–4, evaluate on Part 5 → Accuracy5]

SLIDE 8

K-fold cross-validation

1. Split the data randomly into K (say 5 or 10) equal-sized parts
2. Train a classifier on four parts and evaluate it on the fifth
3. Repeat this using each of the K parts as the validation set

Given a particular feature set and hyper-parameter setting:

[Diagram: five train/evaluate rounds, each holding out a different part → Accuracy1–Accuracy5]

SLIDE 9

K-fold cross-validation

1. Split the data randomly into K (say 5 or 10) equal-sized parts
2. Train a classifier on four parts and evaluate it on the fifth
3. Repeat this using each of the K parts as the validation set
4. The quality of this feature set/hyper-parameter setting is the average of these K estimates

Given a particular feature set and hyper-parameter setting

Performance = (accuracy1 + accuracy2 + accuracy3 + accuracy4 + accuracy5)/5

SLIDE 10

K-fold cross-validation

1. Split the data randomly into K (say 5 or 10) equal-sized parts
2. Train a classifier on four parts and evaluate it on the fifth
3. Repeat this using each of the K parts as the validation set
4. The quality of this feature set/hyper-parameter setting is the average of these K estimates
5. Repeat for every feature set/hyper-parameter choice

Given a particular feature set and hyper-parameter setting

Performance = (accuracy1 + accuracy2 + accuracy3 + accuracy4 + accuracy5)/5
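Steps 1–4 can be sketched end-to-end in Python. This is an illustrative sketch under assumptions, not the lecture's code: `kfold_score`, the majority-label `train`/`evaluate` pair, and the toy dataset are all made up for the example.

```python
import random
from statistics import mean

def kfold_score(data, train, evaluate, k=5, seed=0):
    """Estimate model quality as the average accuracy over k train/validate splits."""
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]            # step 1: k equal-sized parts
    accuracies = []
    for i in range(k):                                # step 3: each part is the validation set once
        held_out = folds[i]
        train_set = [x for j, f in enumerate(folds) if j != i for x in f]
        model = train(train_set)                      # step 2: train on the other k-1 parts...
        accuracies.append(evaluate(model, held_out))  # ...and evaluate on the held-out part
    return mean(accuracies)                           # step 4: average the k estimates

# Toy stand-in classifier (an assumption): "training" memorizes the majority label.
def train(rows):
    labels = [y for _, y in rows]
    return max(set(labels), key=labels.count)

def evaluate(model, rows):
    return mean(y == model for _, y in rows)

labeled = [(x, x % 4 == 0) for x in range(20)]  # 5 True, 15 False
print(kfold_score(labeled, train, evaluate, k=5))  # → 0.75
```

The majority label is always False here, so each fold's accuracy is its fraction of False labels, and the average comes out to 15/20 = 0.75.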

SLIDE 11

Cross-validation

We want to train a classifier using a given dataset. We know how to train given the features and hyper-parameters. How do we know what the best feature set and hyper-parameters are?

1. Evaluate every feature set and hyper-parameter setting using cross-validation (can be computationally expensive)
2. Pick the best according to cross-validation performance
3. Train on the full data using this setting
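The whole selection loop can be sketched in Python. Everything below is an illustrative assumption, not from the slides: the 1-D toy dataset, the k-NN stand-in classifier, and the names `knn_predict`, `cv_score`, and `best_k`.

```python
import random
from statistics import mean

# Toy 1-D dataset: points labeled True when x >= 12 (an assumption for illustration).
data = [(float(x), x >= 12) for x in range(24)]

def knn_predict(train_rows, x, n_neighbors):
    """Majority vote among the n_neighbors training points closest to x."""
    nearest = sorted(train_rows, key=lambda r: abs(r[0] - x))[:n_neighbors]
    votes = [y for _, y in nearest]
    return votes.count(True) > votes.count(False)

def cv_score(rows, n_neighbors, k=5, seed=0):
    """Average validation accuracy of one hyper-parameter setting over k folds."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        held_out = folds[i]
        train_rows = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(mean(knn_predict(train_rows, x, n_neighbors) == y
                           for x, y in held_out))
    return mean(scores)

# 1-2: evaluate every hyper-parameter choice by cross-validation, pick the best.
candidates = [1, 3, 5]
best_k = max(candidates, key=lambda n: cv_score(data, n))

# 3: "train" on the full data using the chosen setting (for k-NN,
# training just means keeping the whole dataset around).
final_model = lambda x: knn_predict(data, x, best_k)
```

In a real pipeline the inner loop would also range over feature sets, and the full grid of choices is what makes this expensive.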
