

SLIDE 1

Machine Learning and Data Mining Nearest neighbor methods

Kalev Kask


SLIDE 2

Supervised learning

  • Notation

– Features x
– Targets y
– Predictions ŷ
– Parameters θ

[Diagram: the supervised learning loop]
– Training data (examples): features plus feedback / target values
– Program (“Learner”): characterized by some “parameters” θ; a procedure (using θ) that outputs a prediction
– Score performance (“cost function”): compare predictions against the target values
– Learning algorithm: change θ to improve performance
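A minimal sketch of this loop in Python (assuming NumPy; the linear model and the names predict, cost, and update are illustrative placeholders, not part of the slides):

```python
import numpy as np

def predict(theta, X):
    # Procedure (using theta) that outputs a prediction; a linear model stands in here.
    return X @ theta

def cost(theta, X, y):
    # Score performance ("cost function"): mean squared error vs. the target values.
    return np.mean((predict(theta, X) - y) ** 2)

def update(theta, X, y, step=0.01):
    # Learning algorithm: change theta to improve performance (one gradient step on the cost).
    grad = 2 * X.T @ (predict(theta, X) - y) / len(y)
    return theta - step * grad
```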

SLIDE 3

Regression; Scatter plots

  • Suggests a relationship between x and y
  • Regression: given new observed x(new), estimate y(new)

[Scatter plot: Feature x (horizontal) vs. Target y (vertical); a new input x(new) is marked with its unknown target y(new) = ?]

(c) Alexander Ihler

SLIDE 4

Nearest neighbor regression

  • Find training datum x(i) closest to x(new); predict y(i)

[Scatter plot: Feature x vs. Target y; the prediction at x(new) is the target value of the nearest training point]

“Predictor”: given new features, find the nearest training example and return its target value.

(c) Alexander Ihler

SLIDE 5

Nearest neighbor regression

  • Find training datum x(i) closest to x(new); predict y(i)
  • Defines an (implicit) function f(x)
  • “Form” is piecewise constant

“Predictor”: given new features, find the nearest training example and return its target value.

[Plot: the resulting piecewise constant prediction function over Feature x, with Target y on the vertical axis]

(c) Alexander Ihler
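A minimal sketch of this predictor in Python (assuming NumPy and 1-D features; the data and names are illustrative): store the training data and, for a new x, return the target of the closest training point. The resulting f(x) is the piecewise constant function described above.

```python
import numpy as np

def nn_regress(x_train, y_train, x_new):
    # 1-nearest-neighbor regression: return the y of the training point closest to x_new.
    dists = np.abs(x_train - x_new)        # distances to every training example
    return y_train[np.argmin(dists)]       # target value of the nearest example

x_train = np.array([10.0, 15.0, 20.0])
y_train = np.array([20.0, 35.0, 40.0])
print(nn_regress(x_train, y_train, 12.0))  # -> 20.0 (the nearest training x is 10)
```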

SLIDE 6

Nearest neighbor classifier

[Scatter plot: training points of two classes (0 and 1) in the X1–X2 plane, with a query point marked “?”]

“Predictor”: given new features, find the nearest training example and return its class label.

(c) Alexander Ihler

SLIDE 7

Nearest neighbor classifier

[Scatter plot: the same two-class training data in the X1–X2 plane, with the query point “?”]

“Predictor”: given new features, find the nearest training example and return its class label.

“Closest” training x? Typically Euclidean distance: d(x, x(i)) = sqrt( Σ_j ( x_j − x_j(i) )² )

(c) Alexander Ihler
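A minimal sketch in Python of the predictor and the Euclidean distance above (assuming NumPy arrays; the data are illustrative):

```python
import numpy as np

def nn_classify(X_train, y_train, x_new):
    # Euclidean distance d(x, x(i)) to every training point, then return the nearest label.
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    return y_train[np.argmin(dists)]

X_train = np.array([[1.0, 1.0], [2.0, 1.5], [6.0, 5.0], [7.0, 6.0]])  # features (x1, x2)
y_train = np.array([0, 0, 1, 1])                                      # class labels
print(nn_classify(X_train, y_train, np.array([6.5, 5.5])))            # -> 1
```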

SLIDE 8

Nearest neighbor classifier

[Figure: the feature space X1 vs. X2 split into the region of all points where we decide class 1 and the region of all points where we decide class 0; the decision boundary separates the two]

(c) Alexander Ihler

SLIDE 9

Nearest neighbor classifier

[Figure: Voronoi tessellation of the training points in the X1–X2 plane]

Voronoi tessellation: each datum is assigned to a region in which all points are closer to it than to any other datum.

Decision boundary: those edges across which the decision (the class of the nearest training datum) changes.

Nearest neighbor: piecewise linear boundary.

(c) Alexander Ihler

SLIDE 10

Nearest neighbor classifier

[Figure: nearest neighbor decision regions for Class 0 and Class 1 in the X1–X2 plane; the boundary is piecewise linear]

(c) Alexander Ihler

SLIDE 11

More Data Points

[Figure: a larger training set in the X1–X2 plane, with points labeled class 1 and class 2]

(c) Alexander Ihler

SLIDE 12

More Complex Decision Boundary

[Figure: nearest neighbor decision regions for the larger training set in the X1–X2 plane]

In general, the nearest-neighbor classifier produces piecewise linear decision boundaries.

(c) Alexander Ihler

SLIDE 13

K-Nearest Neighbor (kNN) Classifier

  • Find the k-nearest neighbors to x in the data

– i.e., rank the training feature vectors by Euclidean distance to x
– select the k vectors that have the smallest distance to x

  • Regression

– Usually just average the y-values of the k closest training examples

  • Classification

– ranking yields k feature vectors and a set of k class labels
– pick the class label that is most common in this set (“vote”)
– classify x as belonging to this class
– Note: for two-class problems, if k is odd (k = 1, 3, 5, …) there will never be any “ties”; otherwise, just use (any) tie-breaking rule

  • “Like” the optimal estimator, but using nearest k points to estimate p(y|x)
  • “Training” is trivial: just use the training data as a lookup table, and search it to classify a new datum (see the sketch below)

(c) Alexander Ihler
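A minimal sketch of the kNN classifier and regressor described on this slide (assuming NumPy; Euclidean distance as above, majority vote with an arbitrary tie-break):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    # Rank training points by Euclidean distance to x_new and take the k closest.
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # Vote: pick the class label that is most common among the k neighbors
    # (Counter resolves ties by first occurrence, i.e. an arbitrary tie-breaking rule).
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

def knn_regress(X_train, y_train, x_new, k=3):
    # Regression: average the y-values of the k closest training examples.
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()
```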

SLIDE 14

kNN Decision Boundary

  • Piecewise linear decision boundary
  • Increasing k “simplifies” decision boundary

– Majority voting means less emphasis on individual points

[Figures: kNN decision boundaries for K = 1 and K = 3]

SLIDE 15

kNN Decision Boundary

  • Piecewise linear decision boundary
  • Increasing k “simplifies” decision boundary

– Majority voting means less emphasis on individual points

[Figures: kNN decision boundaries for K = 5 and K = 7]

SLIDE 16

kNN Decision Boundary

  • Piecewise linear decision boundary
  • Increasing k “simplifies” decision boundary

– Majority voting means less emphasis on individual points

[Figure: kNN decision boundary for K = 25]

SLIDE 17

Error rates and K

[Plot: predictive error vs. K (# neighbors), showing error on training data and error on test data]

K = 1? Zero error on the training data: the training examples have been memorized. The best value of K is where the test error is lowest.

(c) Alexander Ihler

SLIDE 18

Complexity & Overfitting

  • Complex model predicts all training points well
  • Doesn’t generalize to new data points
  • k = 1 : perfect memorization of examples (complex)
  • k = m : always predict majority class in dataset (simple)
  • Can select k using validation data, etc. (see the sketch below)

[Diagram: model complexity decreases with K (# neighbors), from too complex at small K to simpler at large K]

(c) Alexander Ihler
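A minimal sketch of selecting k with validation data (assuming the knn_classify function from the earlier sketch; the candidate k values are illustrative):

```python
import numpy as np

def select_k(X_train, y_train, X_val, y_val, candidate_ks=(1, 3, 5, 7, 25)):
    # Evaluate each candidate k on held-out validation data and keep the best one.
    best_k, best_err = None, float("inf")
    for k in candidate_ks:
        preds = np.array([knn_classify(X_train, y_train, x, k) for x in X_val])
        err = np.mean(preds != y_val)       # validation error rate for this k
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err
```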

SLIDE 19

K-Nearest Neighbor (kNN) Classifier

  • Theoretical Considerations

– as k increases

  • we are averaging over more neighbors
  • the effective decision boundary is more “smooth”

– as m (the number of training examples) increases, the optimal k value tends to increase (as O(log m))
– for k = 1, as m increases to infinity, the error rate is less than 2x the optimal (Bayes) error rate

  • Extensions of the Nearest Neighbor classifier

– Weighted distances

  • e.g., some features may be more important; others may be irrelevant
  • Mahalanobis distance: d(x, x′) = sqrt( (x − x′)ᵀ Σ⁻¹ (x − x′) )

– Fast search techniques (indexing) to find the k nearest points in d-dimensional space
– Weighted average / voting based on distance (see the sketch below)

(c) Alexander Ihler
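A minimal sketch of two of these extensions (assuming NumPy; the inverse-distance weighting is one common choice, not prescribed by the slide):

```python
import numpy as np

def mahalanobis(x, x_prime, Sigma_inv):
    # Mahalanobis distance: sqrt((x - x')^T Sigma^{-1} (x - x')).
    diff = x - x_prime
    return np.sqrt(diff @ Sigma_inv @ diff)

def weighted_knn_classify(X_train, y_train, x_new, k=3, eps=1e-8):
    # Voting based on distance: closer neighbors get larger weights.
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + eps)
    scores = {}
    for idx, w in zip(nearest, weights):
        scores[y_train[idx]] = scores.get(y_train[idx], 0.0) + w
    return max(scores, key=scores.get)      # class with the largest total weight
```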

SLIDE 20

Curse of dimensionality

  • Various phenomena that occur when analyzing and organizing data in high dimensions (e.g., thousands of dimensions)

– When d >> 1, the volume of the space grows so rapidly that the data become sparse
– The amount of data needed for statistical validity grows exponentially with the dimensionality
– E.g., when d >> 1, distances between points become nearly uniform
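A minimal numerical illustration of the last point (assuming uniformly random data; exact numbers vary by run): as d grows, the nearest and farthest neighbors of a query point sit at nearly the same distance, so the relative contrast shrinks toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 1000):
    X = rng.random((1000, d))                       # 1000 random points in the unit hypercube
    q = rng.random(d)                               # a random query point
    dists = np.sqrt(((X - q) ** 2).sum(axis=1))
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d}: relative contrast {contrast:.2f}")  # shrinks as d grows
```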

SLIDE 21

Summary

  • K-nearest neighbor models

– Classification (vote)
– Regression (average or weighted average)

  • Piecewise linear decision boundary

– How to calculate

  • Test data and overfitting

– Model “complexity” for kNN
– Use validation data to estimate test error rates & select k

(c) Alexander Ihler