SLIDE 1

Lecture #12: kNN Classification and Missing Data

Data Science 1: CS 109A, STAT 121A, AC 209A, E-109A
Pavlos Protopapas, Kevin Rader, Margo Levine, Rahul Dave

SLIDE 2

Lecture Outline

▶ ROC Curves
▶ k-NN Revisited
▶ Dealing with Missing Data
▶ Types of Missingness
▶ Imputation Methods

SLIDE 3

ROC Curves

SLIDE 4

ROC Curves

The ROC curve illustrates the trade-off, over all possible thresholds, between the two types of error (or correct classification). The vertical axis displays the true positive rate and the horizontal axis the false positive rate. What is the shape of an ideal ROC curve? See the next slide for an example.

SLIDE 6

ROC Curve Example

SLIDE 7

ROC Curve for measuring classifier performance

The overall performance of a classifier, calculated over all possible thresholds, is given by the area under the ROC curve (AUC). Let T be the threshold false positive rate, and let TPR(T) be the corresponding true positive rate at T; then the AUC is simply the integral

AUC = ∫₀¹ TPR(T) dT

What is the worst-case scenario for AUC? What is the best case? What is the AUC if we independently flip a coin to perform classification? The AUC can then be used to compare various approaches to classification: logistic regression, LDA (to come), k-NN, etc.
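As a concrete illustration (not from the slides, and on a toy dataset), a minimal sketch of computing an ROC curve and its AUC with sklearn:

```python
# Sketch: ROC curve and AUC for a logistic regression classifier on toy data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]        # predicted P(Y = 1)

fpr, tpr, thresholds = roc_curve(y_test, probs)  # one (FPR, TPR) point per threshold
print("AUC:", roc_auc_score(y_test, probs))      # 1.0 is ideal; 0.5 is a coin flip
```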

SLIDE 9

k-NN Revisited

SLIDE 10

k-Nearest Neighbors

We’ve already seen the k-NN method for predicting a quantitative response (it was the very first method we introduced). How was k-NN implemented in the regression setting (quantitative response)? The approach was simple: to predict an observation’s response, use the other available observations that are most similar to it. For a specified value of k, each observation’s outcome is predicted to be the average of the k closest observations, as measured by some distance on the predictor(s). With one predictor, the method was easily implemented.
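A minimal sketch (not from the slides) of one-predictor k-NN regression with sklearn, on made-up data:

```python
# Sketch: k-NN regression with a single predictor.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(-6, 6, size=100).reshape(-1, 1)              # one predictor
y_train = np.sin(x_train[:, 0]) + rng.normal(scale=0.3, size=100)

knn = KNeighborsRegressor(n_neighbors=5)   # predict the average of the 5 closest points
knn.fit(x_train, y_train)
print(knn.predict([[0.0]]))                # prediction for a new observation at x = 0
```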

SLIDE 12

Review: Choice of k

How well the predictions perform is related to the choice of k. What will the predictions look like if k is very small? What if it is very large? More specifically, what will the predictions be for new observations if k = n? (They will all be ȳ, the overall mean.) A picture is worth a thousand words...

SLIDE 14

Choice of k Matters

[Figure: k-NN regression fits to (x_train, y_train) for k = 2, 5, 10, 100, 500, and 1000.]

SLIDE 15

k-NN for Classification

How can we modify the k-NN approach for classification? The approach here is the same as for k-NN regression: use the other available observations that are most similar to the observation we are trying to predict (classify into a group), based on the predictors at hand. How do we classify which category a specific observation should be in based on its nearest neighbors? The category that shows up the most among the nearest neighbors.

SLIDE 18

k-NN for Classification: formal definition

The k-NN classifier first identifies the k points in the training data that are closest to x0, represented by N0. It then estimates the conditional probability for class j as the fraction of points in N0 whose response values equal j:

P(Y = j | X = x0) = (1/k) Σ_{i ∈ N0} I(yi = j)

Then the k-NN classifier applies the Bayes rule and classifies the test observation x0 to the class with the largest probability.
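In code, this neighbor-fraction estimate is what sklearn's KNeighborsClassifier reports; a minimal sketch (not from the slides) on the iris data:

```python
# Sketch: k-NN classification; predict_proba returns the neighbor class fractions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)          # k = 5
knn.fit(X_train, y_train)

print(knn.predict(X_test[:3]))        # majority class among the 5 nearest neighbors
print(knn.predict_proba(X_test[:3]))  # estimated P(Y = j | X = x0) for each class j
```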

SLIDE 23

k-NN for Classification (cont.)

There are some issues that may arise:

▶ How can we handle a tie? With a coin flip!

▶ What could be a major problem with always classifying to the most common group amongst the neighbors? If one category is much more common than the others, then all the predictions may be the same!

▶ How can we handle this? Rather than classifying to the most likely group, use a biased coin flip to decide which group to classify to!

SLIDE 26

k-NN with Multiple Predictors

How could we extend k-NN (both regression and classification) when there are multiple predictors? We would need to define a measure of distance between observations in order to determine which are the most similar to the observation we are trying to predict. Euclidean distance is a good option. To measure the distance of a new observation x0 from each observation xi in the data set:

D²(xi, x0) = Σ_{j=1}^{P} (xi,j − x0,j)²

SLIDE 29

k-NN with Multiple Predictors

But what must we be careful about when measuring distance?

1. Differences in variability in our predictors!
2. Having a mixture of quantitative and categorical predictors.

So what is good practice? To determine the closest neighbors when P > 1, you should first standardize the predictors! And you can even standardize the binaries if you want to include them (a sketch follows below). How else could we determine closeness in this multi-dimensional setting?
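A minimal sketch (not from the slides) of standardizing before k-NN via an sklearn pipeline; the toy data are an assumption:

```python
# Sketch: standardize predictors so no variable dominates the Euclidean distance.
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)
X[:, 0] *= 1000   # exaggerate one predictor's scale; unscaled, it would dominate

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X, y)     # scaling happens inside the pipeline before distances are computed
print(knn.score(X, y))
```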

SLIDE 30

Dealing with Missing Data

SLIDE 31

What is missing data?

Oftentimes when data are collected, there are some missing values apparent in the dataset. This leads to a few questions to consider:

1. How does this show up in pandas?
2. How do pandas and sklearn handle these NaNs?
3. How does this affect our modeling?
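As a quick illustration (not from the slides) of the first question, with made-up data:

```python
# Sketch: missing entries appear as NaN in pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [34, np.nan, 29], "income": [55000, 62000, np.nan]})
print(df.isna().sum())    # count of NaNs per column
print(df["age"].mean())   # pandas skips NaNs in summary statistics
# Most sklearn estimators raise an error on NaNs, so the missing values
# must be dropped or imputed before fitting a model.
```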

SLIDE 34

Naively handling missingness

What is the simplest way to handle missing data? Impute the mean (if quantitative) or the most common class (if categorical) for all missing values. What are some consequences of handling missingness in this fashion?
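A minimal sketch (not from the slides) of this naive imputation on made-up data:

```python
# Sketch: naive mean / most-common-class imputation.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [34.0, np.nan, 29.0, 41.0],
                   "os": ["Mac", "PC", None, "Mac"]})

df["age"] = df["age"].fillna(df["age"].mean())   # quantitative: plug in the mean
df["os"] = df["os"].fillna(df["os"].mode()[0])   # categorical: most common class
print(df)
# One consequence: the imputed variable's spread shrinks, since every
# missing entry receives the identical central value.
```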

SLIDE 36

Types of Missingness

SLIDE 37

Sources of Missingness

Missing data can arise from various sources:

▶ A survey was conducted and values were just randomly missed when being entered into the computer.
▶ A respondent chooses not to respond to a question like ‘Have you ever done cocaine?’.
▶ You decide to start collecting a new variable (like Mac vs. PC) partway through the data collection of a study.
▶ You want to measure the speed of meteors, and some observations are just ‘too quick’ to be measured properly.

The source of missing values in data can lead to the major types of missingness:

SLIDE 38

Types of Missingness

There are 3 major types of missingness to be concerned about:

1. Missing Completely at Random (MCAR): the probability of missingness in a variable is the same for all units. Like randomly poking holes in a data set.
2. Missing at Random (MAR): the probability of missingness in a variable depends only on available information (in the other predictors).
3. Missing Not at Random (MNAR): the probability of missingness depends on information that has not been recorded, and this information also predicts the missing values.

What are examples of each of these 3 types?

SLIDE 39

Missing completely at random (MCAR)

Missing Completely at Random is the best-case scenario, and the easiest to handle:

▶ Examples: a coin is flipped to determine whether an entry is removed. Or values were just randomly missed when being entered into the computer.
▶ Effect if you ignore: there is no effect on inferences (estimates of beta).
▶ How to handle: lots of options, but best to impute (more on the next slide).

SLIDE 40

Missing at random (MAR)

Missing at Random is still a case that can be handled.

▶ Example(s): men and women respond to the question ‘Have you ever felt harassed at work?’ at different rates (and may be harassed at different rates).
▶ Effect if you ignore: inferences (estimates of beta) are biased and predictions are usually worsened.
▶ How to handle: use the information in the other predictors to build a model and ‘impute’ a value for the missing entry.

Key: we can fix any biases by modeling and imputing the missing values based on what is observed!

SLIDE 41

Missing Not at Random (MNAR)

Missing Not at Random is the worst-case scenario, and impossible to handle fully:

▶ Example(s): patients drop out of a study because they experience some really bad side effect that was not measured. Or cheaters are less likely to respond when asked if they’ve ever cheated.
▶ Effect if you ignore: inferences (estimates of beta) are biased and predictions are worsened.
▶ How to handle: you can ‘improve’ things by treating it as if it were MAR, but you may never completely fix the bias.

SLIDE 42

What type of missingness is present?

Can you ever tell, based on your data alone, what type of missingness is actually present? Since we asked the question, the answer must be no. It generally cannot be determined whether data really are missing at random, or whether the missingness depends on unobserved predictors or on the missing data themselves. The problem is that these potential ‘lurking variables’ are unobserved (by definition) and so can never be completely ruled out. In practice, a model with as many predictors as possible is used so that the ‘missing at random’ assumption is reasonable.

SLIDE 44

Imputation Methods

SLIDE 45

Handling missing data

When encountering missing data, the approach to handling it depends on:

1. Whether the missing values are in the response or in the predictors. Generally speaking, it is much easier to handle missingness in the predictors.
2. Whether the variable is quantitative or categorical.
3. How much missingness is present in the variable. If there is too much missingness, you may be doing more damage than good.

Generally speaking, it is a good idea to attempt to impute (or ‘fill in’) entries for missing values in a variable (assuming your method of imputation is a good one).

SLIDE 46

Imputation methods

There are several different approaches to imputing missing values:

1. Plug in the mean (quantitative) or most common class (categorical) for all missing values in a variable.
2. Create a new variable that is an indicator of missingness, and include it in any model to predict the response (also plug in zero or the mean in the actual variable).
3. Hot-deck imputation: for each missing entry, randomly select an observed entry in the variable and plug it in.
4. Model the imputation: plug in predicted values (ŷ) from a model based on the other observed predictors.
5. Model the imputation with uncertainty: plug in predicted values plus randomness (ŷ + ε) from a model based on the other observed predictors.

What are the advantages and disadvantages of each approach?
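A minimal sketch (not from the slides) of approaches 2 and 3 on made-up data:

```python
# Sketch: missingness indicator (approach 2) and hot-deck imputation (approach 3).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"income": [55.0, np.nan, 48.0, np.nan, 61.0]})

# Approach 2: indicator of missingness plus a mean fill in the variable itself.
df["income_missing"] = df["income"].isna().astype(int)
df["income_meanfill"] = df["income"].fillna(df["income"].mean())

# Approach 3: hot deck -- draw a random observed value for each missing entry.
observed = df["income"].dropna().to_numpy()
n_missing = df["income"].isna().sum()
df.loc[df["income"].isna(), "income"] = rng.choice(observed, size=n_missing)
print(df)
```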

SLIDE 47

Schematic: imputation through modeling

How do we use models to fill in missing data?

SLIDE 49

Schematic: imputation through modeling

How do we use models to fill in missing data? Using kNN for k = 2?
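A minimal sketch (not from the slides) of this idea using sklearn's KNNImputer, which fills a missing entry with the average of the k nearest rows:

```python
# Sketch: k-NN imputation with k = 2.
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0],
              [2.0, np.nan],   # missing entry to fill in
              [3.0, 6.0],
              [4.0, 8.0]])

imputer = KNNImputer(n_neighbors=2)   # average the 2 nearest rows' observed values
print(imputer.fit_transform(X))
```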

SLIDE 53

Schematic: imputation through modeling

How do we use models to fill in missing data? Using linear regression, i.e. a fitted line of the form y = m·x + b? Here m and b are computed from the observations (rows) that do not have missingness (in regression notation, b = β̂0 and m = β̂1).

SLIDE 54

Imputation through modeling with uncertainty

The schematic in the last few slides ignores imputing with uncertainty. What happens if you ignore this and just impute values based solely on the ‘best’ model’s ŷ? The distribution of the imputed values will be too narrow and will not represent real data (see the next slide for an illustration). The goal is to impute values that include the uncertainty of the model. How can this be done in practice in k-NN? In linear regression? In logistic regression?

SLIDE 56

Imputation through modeling with uncertainty: an illustration

SLIDE 57

Imputation through modeling with uncertainty: linear regression

Recall the probabilistic model in linear regression:

Y = β0 + β1X1 + ... + βpXp + ε, where ε ∼ N(0, σ²)

How can we take advantage of this model to impute with uncertainty? It’s a 3-step process:

1. Fit a model to predict the predictor variable with missingness from all the other predictors.
2. Predict the missing values from the model in the previous part.
3. Add a measure of uncertainty to this prediction by randomly sampling from a N(0, σ̂²) distribution, where σ̂² is the mean squared error (MSE) from the model.
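A minimal sketch (not from the slides) of these 3 steps on made-up data:

```python
# Sketch: impute a predictor with linear regression plus N(0, sigma_hat^2) noise.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["x3"] = 2 * df["x1"] - df["x2"] + rng.normal(scale=0.5, size=n)
df.loc[rng.random(n) < 0.2, "x3"] = np.nan        # poke MCAR holes in x3

# Step 1: fit x3 ~ x1 + x2 on the fully observed rows.
obs = df["x3"].notna()
model = LinearRegression().fit(df.loc[obs, ["x1", "x2"]], df.loc[obs, "x3"])

# Steps 2-3: predict the missing values, then add N(0, sigma_hat^2) noise.
resid = df.loc[obs, "x3"] - model.predict(df.loc[obs, ["x1", "x2"]])
sigma_hat = np.sqrt(np.mean(resid ** 2))          # sqrt of the model's MSE
pred = model.predict(df.loc[~obs, ["x1", "x2"]])
df.loc[~obs, "x3"] = pred + rng.normal(scale=sigma_hat, size=pred.shape[0])
```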

SLIDE 59

Imputation through modeling with uncertainty: k-NN regression

How can we use k-NN regression to impute values that mimic the error in our observations? Two ways:

1. Use k = 1.
2. Use any other k, but randomly select from the nearest neighbors in N0. This can be done with equal probability or with some weighting (inverse to the distance measure used).
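A minimal sketch (not from the slides) of the second way, with equal selection probability:

```python
# Sketch: impute by sampling one of the k nearest neighbors instead of averaging.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_obs = rng.normal(size=(100, 2))            # fully observed predictors
y_obs = X_obs[:, 0] + rng.normal(size=100)   # observed values of the variable to impute
x_new = np.array([[0.5, -0.2]])              # row whose value is missing

nn = NearestNeighbors(n_neighbors=5).fit(X_obs)
_, idx = nn.kneighbors(x_new)                # indices of the 5 nearest neighbors (N0)
imputed = y_obs[rng.choice(idx[0])]          # draw one neighbor's value at random
print(imputed)
```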

SLIDE 60

Imputation through modeling with uncertainty: classifiers

For classifiers, this imputation with uncertainty/randomness is a little easier. How can it be implemented? If a classification model (logistic, k-NN, etc.) is used to predict the variable with missingness from the observed predictors, then all you need to do is flip a ‘biased coin’ (or roll a multi-sided die) with the probability of each class coming up equal to the predicted probabilities from the model. Warning: do not just classify blindly using the predict method in sklearn!
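A minimal sketch (not from the slides) of the biased-die idea, using predict_proba rather than predict:

```python
# Sketch: sample the imputed class from the model's predicted probabilities.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_classes=3, n_informative=4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

probs = model.predict_proba(X[:1])[0]                # predicted class probabilities
imputed_class = rng.choice(model.classes_, p=probs)  # the biased multi-sided die
print(probs, imputed_class)
```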

SLIDE 63

Imputation across multiple variables

If only one variable has missing entries, life is easy. But what if all the predictor variables have a little bit of missingness (with some observations having multiple entries missing)? How can we handle that? It’s an iterative process: impute X1 based on X2, ..., Xp; then impute X2 based on X1 and X3, ..., Xp; and continue down the line. Any issues? Yes: not all of the missing values may be imputed in just one ‘run’ through the data set, so you will have to repeat these ‘runs’ until you have a completely filled-in data set.
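sklearn implements this round-robin scheme in IterativeImputer (still marked experimental, hence the extra import); a minimal sketch, not from the slides:

```python
# Sketch: iterative imputation across multiple variables with missingness.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0, np.nan],
              [2.0, np.nan, 6.0],
              [np.nan, 4.0, 9.0],
              [4.0, 8.0, 12.0]])

# sample_posterior=True adds randomness to each fill-in (imputing with uncertainty).
imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
print(imputer.fit_transform(X))  # repeated passes until the fill-ins stabilize
```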

SLIDE 67

Multiple imputation: beyond this class

What is the issue with treating your now ‘complete’ data set (a mixture of actually observed values and imputed values) as if it were all observed values? Any inferences or predictions carried out will be tuned, and potentially overfit, to the random entries imputed for the missing values. How can we prevent this phenomenon? By performing multiple imputation: rerun the imputation algorithm many times, refit the model for the response each time, and then ‘average’ the predictions or estimates of the β coefficients to perform inferences (also incorporating the uncertainty involved). Note: this is beyond what we would expect in this class, but it is generally a good thing to be aware of.
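A bare-bones sketch (not from the slides) of the multiple-imputation idea, pooling coefficients by a simple average:

```python
# Sketch: multiple imputation -- impute several times, refit, average the estimates.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
X[rng.random(X.shape) < 0.1] = np.nan          # scatter some missingness

coefs = []
for seed in range(5):                          # 5 independent random imputations
    imp = IterativeImputer(sample_posterior=True, random_state=seed)
    coefs.append(LinearRegression().fit(imp.fit_transform(X), y).coef_)
print(np.mean(coefs, axis=0))                  # pooled ('averaged') estimates
```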
