Model Selection Matt Gormley Lecture 4 January 29, 2018

10-601 Introduction to Machine Learning
Machine Learning Department, School of Computer Science, Carnegie Mellon University
Model Selection
Matt Gormley
Lecture 4
January 29, 2018

Q&A
Q: How do we deal with ties in k-Nearest Neighbors (e.g. even k or equidistant points)?
A: I would ask you all for a good solution!
Q: How do we define a distance function when the features are categorical (e.g. weather takes values {sunny, rainy, overcast})?
A: Step 1: Convert from categorical attributes to numeric features (e.g. binary).
   Step 2: Select an appropriate distance function (e.g. Hamming distance).
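The two steps in the answer above can be sketched as follows. This is a minimal illustration, not code from the lecture: the helper names `one_hot` and `hamming` and the ordering of the `WEATHER` list are assumptions made for the example.

```python
# Assumed category list; the slide's example is weather in {sunny, rainy, overcast}.
WEATHER = ["sunny", "rainy", "overcast"]

def one_hot(value, categories):
    """Step 1: map a categorical value to a binary indicator vector."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

def hamming(u, v):
    """Step 2: count the positions where two equal-length vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))

sunny = one_hot("sunny", WEATHER)   # [1, 0, 0]
rainy = one_hot("rainy", WEATHER)   # [0, 1, 0]
print(hamming(sunny, rainy))        # 2: the vectors differ in two positions
print(hamming(sunny, sunny))        # 0: identical values
```

With this encoding, any two distinct values of the same attribute are at Hamming distance 2, so the attribute contributes "same" or "different" and nothing finer.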

Reminders
• Homework 2: Decision Trees
  – Out: Wed, Jan 24
  – Due: Mon, Feb 5 at 11:59pm
• 10601 Notation Crib Sheet

K-NEAREST NEIGHBORS

k-Nearest Neighbors
Chalkboard:
– KNN for binary classification
– Distance functions
– Efficiency of KNN
– Inductive bias of KNN
– KNN Properties
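The chalkboard material is not reproduced in the slides, so the following is only a minimal sketch of KNN for binary classification under common assumptions: Euclidean distance, an unweighted majority vote, and a made-up toy dataset. The function names are hypothetical.

```python
import heapq
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k, dist=euclidean):
    """Predict a label for `query` by majority vote over its k nearest points.

    train: list of (features, label) pairs. One distance is computed per
    training point, so the distance pass is O(N) for N training examples.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    nearest = heapq.nsmallest(k, train, key=lambda ex: dist(ex[0], query))
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up 2D toy data with labels 0 and 1 (not from the lecture).
train = [((1.0, 1.0), 0), ((1.5, 2.0), 0), ((2.0, 1.0), 0),
         ((5.0, 5.0), 1), ((6.0, 5.5), 1), ((5.5, 6.5), 1)]

print(knn_predict(train, (1.2, 1.1), k=3))  # 0
print(knn_predict(train, (5.4, 5.6), k=3))  # 1
```

The example uses an odd k with two classes, so the vote cannot tie. How to break ties for even k is left open here, as it is in the Q&A.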

KNN ON FISHER IRIS DATA

Fisher Iris Dataset
Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2), collected by Anderson (1936).
Species  Sepal Length  Sepal Width  Petal Length  Petal Width
0        4.3           3.0          1.1           0.1
0        4.9           3.6          1.4           0.1
0        5.3           3.7          1.5           0.2
1        4.9           2.4          3.3           1.0
1        5.7           2.8          4.1           1.3
1        6.3           3.3          4.7           1.6
1        6.7           3.0          5.0           1.7
Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set

Fisher Iris Dataset
Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2), collected by Anderson (1936).
Deleted two of the four features, so that input space is 2D.
Species  Sepal Length  Sepal Width
0        4.3           3.0
0        4.9           3.6
0        5.3           3.7
1        4.9           2.4
1        5.7           2.8
1        6.3           3.3
1        6.7           3.0
Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set

KNN on Fisher Iris Data
[Figures omitted: KNN on the Fisher Iris data, including the special cases Nearest Neighbor and Majority Vote]
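The two special cases named on these slides can be illustrated with the seven sample rows from the 2D Iris table (sepal length, sepal width). This is a sketch assuming Euclidean distance; with k = 1 the prediction is the label of the single nearest point, and with k = N every training point votes, so the prediction is the overall majority label regardless of the query.

```python
import math
from collections import Counter

# The seven (sepal length, sepal width) rows shown in the 2D Iris table.
IRIS_2D = [((4.3, 3.0), 0), ((4.9, 3.6), 0), ((5.3, 3.7), 0),
           ((4.9, 2.4), 1), ((5.7, 2.8), 1), ((6.3, 3.3), 1), ((6.7, 3.0), 1)]

def knn_predict(train, query, k):
    """Majority vote over the k training points closest to `query`."""
    by_distance = sorted(train, key=lambda ex: math.dist(ex[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

query = (5.0, 3.5)
n = len(IRIS_2D)

# Special case k = 1 (nearest neighbor): the closest row is (4.9, 3.6), label 0.
print(knn_predict(IRIS_2D, query, k=1))   # 0

# Special case k = N (majority vote): four rows have label 1 and three have
# label 0, so every query receives label 1.
print(knn_predict(IRIS_2D, query, k=n))   # 1
```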

KNN ON GAUSSIAN DATA
[Figures omitted: KNN on Gaussian data]

K-NEAREST NEIGHBORS

Questions
• How could k-Nearest Neighbors (KNN) be applied to regression?
• Can we do better than majority vote? (e.g. distance-weighted KNN)
• Where does the Cover & Hart (1967) Bayes error rate bound come from?
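The slide leaves the first two questions open. One common set of answers, sketched here as an assumption rather than as the lecture's solution, is to average the neighbors' targets for regression and to let each neighbor vote with weight 1/distance for classification. The function names, the `eps` smoothing term, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

def k_nearest(train, query, k):
    """Return (distance, target) pairs for the k training points closest to query."""
    scored = [(math.dist(x, query), y) for x, y in train]
    return sorted(scored, key=lambda pair: pair[0])[:k]

def knn_regress(train, query, k):
    """KNN regression: average the targets of the k nearest neighbors."""
    nearest = k_nearest(train, query, k)
    return sum(y for _, y in nearest) / len(nearest)

def weighted_knn_classify(train, query, k, eps=1e-9):
    """Distance-weighted KNN: each neighbor votes with weight 1 / (distance + eps)."""
    scores = defaultdict(float)
    for d, label in k_nearest(train, query, k):
        scores[label] += 1.0 / (d + eps)
    return max(scores, key=scores.get)

# Made-up 1D data for illustration.
reg_train = [((1.0,), 1.0), ((2.0,), 2.0), ((3.0,), 3.0), ((10.0,), 10.0)]
print(knn_regress(reg_train, (2.1,), k=3))   # mean of targets 2.0, 3.0, 1.0

cls_train = [((0.0,), "a"), ((1.0,), "b"), ((1.2,), "b")]
# Unweighted majority vote with k=3 would say "b" (two votes to one), but the
# single "a" point is much closer to the query, so the weighted vote picks "a".
print(weighted_knn_classify(cls_train, (0.1,), k=3))
```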

KNN Learning Objectives
You should be able to…
• Describe a dataset as points in a high dimensional space [CIML]
• Implement k-Nearest Neighbors with O(N) prediction
• Describe the inductive bias of a k-NN classifier and relate it to feature scale [à la CIML]
• Sketch the decision boundary for a learning algorithm (compare k-NN and DT)
• State Cover & Hart (1967)'s large sample analysis of a nearest neighbor classifier
• Invent "new" k-NN learning algorithms capable of dealing with even k
• Explain computational and geometric examples of the curse of dimensionality
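One geometric example of the curse of dimensionality, offered here as an illustrative sketch rather than the lecture's own example: for points drawn uniformly from the unit cube, the nearest and farthest neighbors of a query become almost equally far away as the dimension grows, which weakens the notion of "nearest". The function name, sample size, and seed are arbitrary choices.

```python
import math
import random

def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio of the smallest to the largest distance from a random query
    to n_points random points, all drawn uniformly from the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(point, query))
    return min(dists) / max(dists)

# A ratio near 0 means the nearest neighbor is much closer than the farthest
# point; a ratio near 1 means all points are roughly equidistant.
for dim in (2, 10, 100, 1000):
    print(dim, round(nearest_to_farthest_ratio(dim), 3))
```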

k-Nearest Neighbors
But how do we choose k?
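One common way to answer this question is to hold out part of the labeled data as a validation set and keep the k with the best validation accuracy. The slides do not show a procedure at this point, so the sketch below is an assumption: the synthetic two-class Gaussian data, the 80/40 split, and the candidate list are all made up for illustration.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote over the k training points closest to `query`."""
    by_distance = sorted(train, key=lambda ex: math.dist(ex[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of examples in `data` that KNN (fit on `train`) labels correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in data)
    return correct / len(data)

def make_gaussian_data(n_per_class, rng):
    """Two 2D Gaussian classes with unit variance and assumed centers."""
    data = []
    for label, center in ((0, (0.0, 0.0)), (1, (2.0, 2.0))):
        for _ in range(n_per_class):
            x = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            data.append((x, label))
    rng.shuffle(data)
    return data

rng = random.Random(0)
data = make_gaussian_data(60, rng)
train, val = data[:80], data[80:]          # hold out 40 points for validation

candidates = [1, 3, 5, 7, 9, 11]           # odd values avoid two-class ties
val_acc = {k: accuracy(train, val, k) for k in candidates}
best_k = max(candidates, key=lambda k: val_acc[k])
print(val_acc)
print("selected k:", best_k)
```

Because k is chosen using the validation set, the validation accuracy of the selected k is an optimistic estimate; a separate test set would be needed to report final performance.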

MODEL SELECTION

Model Selection
WARNING:
• In some sense, our discussion of model selection is premature.
• The models we have considered thus far are fairly simple.
• The models and the many decisions available to the data scientist wielding them will grow to be much more complex than what we’ve seen so far.

Model Selection
Statistics
• Def: a model defines the data generation process (i.e. a set or family of parametric probability distributions)
• Def: model parameters are the values that give rise to a particular probability distribution in the model family
• Def: learning (aka. estimation) is the process of finding the parameters that best fit the data
• Def: hyperparameters are the parameters of a prior distribution over parameters
Machine Learning
• Def: (loosely) a model defines the hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model, that the learning algorithm does not select

Model Selection
Example: Decision Tree
• model = set of all possible trees, possibly restricted by some hyperparameters (e.g. max depth)
• parameters = structure of a specific decision tree
• learning algorithm = ID3, CART, etc.
• hyperparameters = max-depth, threshold for splitting criterion, etc.

Model Selection
Example: k-Nearest Neighbors
• model = set of all possible nearest neighbors classifiers
• parameters = none (KNN is an instance-based or non-parametric method)
• learning algorithm = for naïve setting, just storing the data
• hyperparameters = k, the number of neighbors to consider

Model Selection
Example: Perceptron
• model = set of all linear separators
• parameters = vector of weights (one for each feature)
• learning algorithm = mistake based updates to the parameters
• hyperparameters = none (unless using some variant such as averaged perceptron)
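The "mistake based updates" in the Perceptron example can be sketched as below. This is a minimal illustration under assumptions not stated on the slide: labels are +1/-1, a bias term `b` is kept alongside the weight vector, and an `epochs` cap is added only so the loop terminates on data that is not linearly separable. The toy data is made up.

```python
def perceptron_train(data, epochs=10):
    """Train a perceptron on (features, label) pairs with labels in {+1, -1}.

    The parameters (w, b) change only when the current hypothesis makes a mistake.
    """
    w = [0.0] * len(data[0][0])   # one weight per feature
    b = 0.0                       # bias term (an assumption; the slide lists only weights)
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:                      # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:         # a full pass with no mistakes: stop early
            break
    return w, b

def perceptron_predict(w, b, x):
    """Return +1 or -1 according to the side of the linear separator x falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Made-up linearly separable toy data.
data = [((2.0, 1.0), 1), ((3.0, 2.5), 1), ((-1.0, -1.5), -1), ((-2.0, -0.5), -1)]
w, b = perceptron_train(data)
print(w, b)
print([perceptron_predict(w, b, x) for x, _ in data])   # [1, 1, -1, -1]
```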
