Nearest Neighbor Learning (Instance-Based Learning)

1. Nearest Neighbor Learning (Instance-Based Learning)
  - Classify based on local similarity
  - Ranges from simple nearest neighbor to case-based and analogical reasoning
  - Uses local information near the current query instance to decide the classification of that instance
  - As such, it can represent quite complex decision surfaces in a simple manner
    – A local model, versus a model such as an MLP, which uses a global decision surface

2. k-Nearest Neighbor Approach
  - Simply store all (or some representative subset) of the examples in the training set
  - To generalize on a new instance, measure the distance from the new instance to all the stored instances; the nearest ones vote to decide the class of the new instance
  - No need to pre-compute a specific hypothesis (lazy vs. eager learning)
    – Fast learning
    – Can be slow during execution and require significant storage
    – Some models index the data or reduce the instances stored to enhance efficiency

3. k-Nearest Neighbor (cont.)
  - Naturally supports real-valued attributes
  - Typically use Euclidean distance: $\mathrm{dist}(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$
  - Nominal/unknown attributes can just use a 1/0 distance (more on other distance metrics later)
  - The output class for the query instance is set to the most common class of its k nearest neighbors; could also output a confidence/probability (see the sketch below):
    $\hat{f}(x_q) = \operatorname{argmax}_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$, where $\delta(x, y) = 1$ if $x = y$, else $0$
  - k greater than 1 is more noise resistant, but too large a k leads to less accuracy, as less relevant neighbors gain influence (common values: k = 3, k = 5)
    – Usually choose k by cross-validation (trying different values for the task)
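As a concrete illustration, here is a minimal sketch of the unweighted k-NN vote described above (Python with NumPy; the function name `knn_classify` and the toy data are ours, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=3):
    """Unweighted k-NN: Euclidean distance, simple majority vote."""
    # dist(x, y) = sqrt(sum_i (x_i - y_i)^2) to every stored instance
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # indices of the k nearest stored instances
    nearest = np.argsort(dists)[:k]
    # most common class among those k neighbors
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# toy usage (hypothetical data)
X = np.array([[1.0, 5.0], [0.0, 8.0], [9.0, 9.0], [10.0, 10.0]])
y = np.array(['A', 'B', 'B', 'A'])
print(knn_classify(X, y, np.array([2.0, 6.0]), k=3))   # -> 'B'
```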

4. Decision Surface
  - For 1-nn, the decision boundary between the 2 closest points of different classes is linear

5. Decision Surface
  - Combining all the appropriate intersections gives a Voronoi diagram
  - (Figure: the same points, each point a unique class, under Euclidean distance vs. Manhattan distance)

6. k-Nearest Neighbor (cont.)
  - Usually do distance-weighted voting, where the strength of a neighbor's influence is inversely proportional to its distance (see the sketch below):
    $\hat{f}(x_q) = \operatorname{argmax}_{v \in V} \sum_{i=1}^{k} w_i\,\delta(v, f(x_i))$, with $w_i = \frac{1}{\mathrm{dist}(x_q, x_i)^2}$
  - Inverse of distance squared is a common weight
  - Gaussian is another common distance weight
  - In this case the k value is more robust; k could be even and/or larger (even all points, if desired), because the more distant points have negligible influence
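A hedged sketch of that distance-weighted vote (Python/NumPy; the function name and the handling of an exact distance-0 match are our assumptions):

```python
import numpy as np
from collections import defaultdict

def knn_classify_weighted(X_train, y_train, x_query, k=3):
    """Distance-weighted k-NN vote with w_i = 1 / dist(x_q, x_i)^2."""
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for i in nearest:
        if dists[i] == 0.0:                       # query coincides with a stored instance
            return y_train[i]
        votes[y_train[i]] += 1.0 / dists[i] ** 2  # inverse-squared-distance weight
    return max(votes, key=votes.get)
```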

7. *Challenge Question* – k-Nearest Neighbor
  - Assume the following data set and a new point (2, 6); for the nearest neighbor distance, use Manhattan distance
    – What would the output be for 3-nn with no distance weighting? What is the total vote?
    – What would the output be for 3-nn with distance weighting? What is the total vote?

    x    y    Label
    1    5    A
    0    8    B
    9    9    B
    10   10   A

  - Answer choices: A. A A    B. A B    C. B A    D. B B    E. None of the above

8. *Challenge Question* – k-Nearest Neighbor (answer)
  - New point (2, 6), Manhattan distance
  - 3-nn with no distance weighting: B wins with a vote of 2 out of 3
  - 3-nn with distance weighting: A wins with a vote of .25 vs. B's vote of .0625 + .01 = .0725

    x    y    Label   Manhattan distance   Weighted vote
    1    5    A       1 + 1 = 2            1/2^2 = .25
    0    8    B       2 + 2 = 4            1/4^2 = .0625
    9    9    B       7 + 3 = 10           1/10^2 = .01
    10   10   A       8 + 4 = 12           1/12^2 ≈ .0069
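A small check of that arithmetic (Python/NumPy; Manhattan distance and inverse-squared-distance weights, with variable names of our choosing):

```python
import numpy as np

X = np.array([[1, 5], [0, 8], [9, 9], [10, 10]], dtype=float)
y = np.array(['A', 'B', 'B', 'A'])
q = np.array([2.0, 6.0])

dists = np.abs(X - q).sum(axis=1)     # Manhattan distances: [2, 4, 10, 12]
nearest = np.argsort(dists)[:3]       # the 3 nearest neighbors

# unweighted vote: B gets 2 of the 3 votes
print({c: int(sum(y[i] == c for i in nearest)) for c in set(y)})

# inverse-squared-distance weighted vote: A = 0.25, B = 0.0725
print({c: sum(1 / dists[i] ** 2 for i in nearest if y[i] == c) for c in set(y)})
```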

9. Regression with k-nn
  - Can do regression by letting the output be the mean of the output values of the k nearest neighbors

10. Weighted Regression with k-nn
  - Can do weighted regression by letting the output be the weighted mean of the k nearest neighbors (see the sketch below)
  - For distance-weighted regression:
    $\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$, with $w_i = \frac{1}{\mathrm{dist}(x_q, x_i)^2}$
  - where $f(x)$ is the output value for instance $x$
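A minimal sketch of that weighted mean (Python/NumPy; the exact-match shortcut is our assumption, since the weight is undefined at distance 0):

```python
import numpy as np

def knn_regress_weighted(X_train, y_train, x_query, k=3):
    """Distance-weighted k-NN regression: weighted mean of the k nearest output values."""
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    if dists[nearest[0]] == 0.0:          # exact match: 1/dist^2 undefined, return its value
        return y_train[nearest[0]]
    w = 1.0 / dists[nearest] ** 2         # w_i = 1 / dist(x_q, x_i)^2
    return np.sum(w * y_train[nearest]) / np.sum(w)
```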

11. Regression Example
  - Using $\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$ with $w_i = \frac{1}{\mathrm{dist}(x_q, x_i)^2}$
  - (Figure: query $x_q$ with three neighbors whose output values are 8, 5, and 3)
  - What is the value of the new instance?
  - Assume dist($x_q$, $n_8$) = 2, dist($x_q$, $n_5$) = 3, dist($x_q$, $n_3$) = 4
  - $f(x_q) = (8/2^2 + 5/3^2 + 3/4^2)/(1/2^2 + 1/3^2 + 1/4^2) = 2.74/.42 \approx 6.5$
  - The denominator renormalizes the value
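A quick numeric check of that example (Python; values taken from the slide):

```python
# numeric check of the worked example (values taken from the slide)
vals  = [8.0, 5.0, 3.0]                  # neighbor output values
dists = [2.0, 3.0, 4.0]                  # their distances to x_q
w = [1.0 / d ** 2 for d in dists]        # inverse-squared-distance weights
print(sum(wi * v for wi, v in zip(w, vals)) / sum(w))   # ~6.48, i.e. about 6.5
```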

12. k-Nearest Neighbor Homework
  - Assume the following training set and a new point (.5, .2)
  - For all parts below, use Manhattan distance where a distance is required, and show your work
    – What would the output class for 3-nn be with no distance weighting?
    – What would the output class for 3-nn be with squared inverse distance weighting?
    – What would the 3-nn regression value for the point be if we used the regression labels rather than the class labels and used squared inverse distance weighting?

    x     y     Class Label   Regression Label
    .3    .8    A             .6
    -.3   1.6   B             -.3
    .9    0     B             .8
    1     1     A             1.2

13. Attribute Weighting
  - One of the main weaknesses of nearest neighbor is irrelevant features, since they can dominate the distance
    – Example: assume 2 relevant and 10 irrelevant features
  - Can create algorithms which weight the attributes (note that backprop and ID3 do higher-order weighting of features)
  - Could do attribute weighting (a sketch follows this list) – no longer lazy evaluation, since you need to come up with part of your hypothesis (the attribute weights) before generalizing
  - Still an open area of research
    – Higher-order weighting – 1st-order weighting helps, but not enough
    – Even if all features are relevant, all distances become similar as the number of features increases, since not all features are relevant at the same time, and the currently irrelevant ones can dominate the distance
    – A problem with all pure distance-based techniques; need higher-order weighting to ignore currently irrelevant features
    – What is the best method, etc.? – an important research area
    – Dimensionality reduction can be useful (feature pre-processing, PCA, NLDR, etc.)
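A minimal sketch of 1st-order (per-feature) weighting in the distance, assuming a weight vector has already been chosen somehow (e.g. by cross-validation); the function name and example weights are ours:

```python
import numpy as np

def weighted_euclidean(x, y, feature_weights):
    """1st-order attribute weighting: each feature's squared difference is scaled
    by its weight before summing, so features with weight near 0 barely contribute."""
    return np.sqrt(np.sum(feature_weights * (x - y) ** 2))

# e.g. 2 relevant features and 10 irrelevant ones, as in the example above
weights = np.array([1.0, 1.0] + [0.0] * 10)
```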

14. Reduction Techniques
  - Create a subset or other representative set of prototype nodes
    – Faster execution, and could even improve accuracy if noisy instances are removed
  - Approaches (a sketch of the leave-one-out rule follows this list)
    – Leave-one-out reduction: drop an instance if it would still be classified correctly without it
    – Growth algorithm: only add an instance if it is not already classified correctly
    – Both are order dependent, with similar results; there are also more global optimizing approaches
    – Just keep central points: lower accuracy (mostly linear Voronoi decision surface), best space savings
    – Just keep border points: best accuracy (pre-process noisy instances – Drop5)
    – Drop5 (Wilson & Martinez) maintains almost full accuracy with approximately 15% of the original instances
  - Wilson, D. R. and Martinez, T. R., "Reduction Techniques for Instance-Based Learning Algorithms," Machine Learning, vol. 38, no. 3, pp. 257-286, 2000.
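A hedged sketch of the leave-one-out reduction rule stated above (not the DROP5 algorithm from the cited paper): drop an instance if, with it removed, the remaining kept instances still classify it correctly.

```python
import numpy as np
from collections import Counter

def nn_predict(X, y, x_query, k=3):
    """Plain unweighted k-NN vote (Euclidean), reused by the reduction pass."""
    dists = np.sqrt(((X - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

def leave_one_out_reduce(X, y, k=3):
    """Drop an instance if the remaining kept instances still classify it correctly;
    a single, order-dependent pass, as the slide notes."""
    keep = list(range(len(X)))
    for i in range(len(X)):
        others = [j for j in keep if j != i]
        if len(others) >= k and nn_predict(X[others], y[others], X[i], k) == y[i]:
            keep.remove(i)
    return X[keep], y[keep]
```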

16. Distance Metrics
  - Wilson, D. R. and Martinez, T. R., "Improved Heterogeneous Distance Functions," Journal of Artificial Intelligence Research, vol. 6, pp. 1-34, 1997.
  - Normalization of features is critical
  - Don't-know (missing) values in novel or data set instances:
    – Can do some type of imputation and then compute the normal distance
    – Or assign a fixed distance (between 0 and 1) for don't-know values
  - Original main question: how best to handle nominal features (a combined sketch follows below)
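One simple way to combine these ideas (per-feature normalization, a fixed distance for unknowns, and 0/1 overlap for nominal features) is an HEOM-style per-attribute distance; this is a sketch in the spirit of the cited paper, not its exact definition:

```python
import numpy as np

def heom_distance(x, y, is_nominal, ranges):
    """HEOM-style heterogeneous distance (a sketch, not the paper's exact definition):
    - unknown value on either side -> per-attribute distance of 1
    - nominal attribute            -> 0 if values are equal, else 1
    - numeric attribute            -> |x - y| normalized by that attribute's range
    """
    total = 0.0
    for xi, yi, nominal, rng in zip(x, y, is_nominal, ranges):
        if xi is None or yi is None:
            d = 1.0
        elif nominal:
            d = 0.0 if xi == yi else 1.0
        else:
            d = abs(xi - yi) / rng if rng > 0 else 0.0
        total += d ** 2
    return np.sqrt(total)
```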

18. Value Difference Metric (VDM)
  - Assume an example with 2 output classes (A, B)
  - Attribute 1 = Shape (Round, Square, Triangle, etc.)
  - 10 total Round instances – 6 class A and 4 class B
  - 5 total Square instances – 3 class A and 2 class B
  - Since both attribute values suggest the same probabilities for the output class (60% A, 40% B), the distance between Round and Square would be 0
    – If Triangle and Round suggested very different outputs, Triangle and Round would have a large distance
  - The distance between two attribute values is a measure of how similarly they infer the output class (a sketch follows below)
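A minimal sketch of that idea (an unweighted VDM with exponent 2 is our assumption; the slide does not give the exact formula): the distance between two values of a nominal attribute is the difference in their class-conditional probabilities, summed over classes.

```python
from collections import Counter, defaultdict

def vdm_value_distance(attr_values, labels, v1, v2, q=2):
    """VDM-style distance between two values of one nominal attribute:
    sum over classes of |P(c | v1) - P(c | v2)|^q."""
    counts = defaultdict(Counter)              # counts[value][class] = N_{value, class}
    for v, c in zip(attr_values, labels):
        counts[v][c] += 1
    n1, n2 = sum(counts[v1].values()), sum(counts[v2].values())
    return sum(abs(counts[v1][c] / n1 - counts[v2][c] / n2) ** q for c in set(labels))

# the slide's example: Round = 6 A / 4 B, Square = 3 A / 2 B  ->  distance 0
shapes = ['Round'] * 10 + ['Square'] * 5
labels = ['A'] * 6 + ['B'] * 4 + ['A'] * 3 + ['B'] * 2
print(vdm_value_distance(shapes, labels, 'Round', 'Square'))   # 0.0
```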

24. IVDM
  - Distance metrics make a difference
  - IVDM also helps deal with the many-feature/irrelevant-feature problem of k-NN, because features only add significantly to the overall distance if their difference leads to different outputs
  - Two feature values which tend to lead to the same output probabilities (exactly what irrelevant features should do) will have zero or little distance, while their Euclidean distance could have been significantly larger
  - Need to take this further, to distance approaches that take higher-order combinations of features into account in the distance metric
