The Nearest Neighbor Algorithm


SLIDE 1

The Nearest Neighbor Algorithm

Hypothesis Space

– variable size
– deterministic
– continuous parameters

Learning Algorithm

– direct computation
– lazy

SLIDE 2

Nearest Neighbor Algorithm

Store all of the training examples ⟨x_i, y_i⟩.

Classify a new example x by finding the training example ⟨x_i, y_i⟩ that is nearest to x according to Euclidean distance:

    \|x - x_i\| = \sqrt{\sum_j (x_j - x_{ij})^2}

and guess the class ŷ = y_i.

Efficiency trick: the squared Euclidean distance gives the same answer but avoids the square root computation:

    \|x - x_i\|^2 = \sum_j (x_j - x_{ij})^2
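As a concrete illustration, here is a minimal 1-nearest-neighbor classifier in Python (a sketch only; the function and variable names are not from the slides):

    import numpy as np

    def nearest_neighbor_classify(X_train, y_train, x):
        """Return the label of the training example nearest to x.

        Uses the squared Euclidean distance: it has the same argmin as
        the Euclidean distance but skips the square root (the slide's
        efficiency trick).
        """
        diffs = X_train - x                    # (n, d) array of differences
        sq_dists = np.sum(diffs ** 2, axis=1)  # squared distance to each example
        i = np.argmin(sq_dists)                # index of the nearest neighbor
        return y_train[i]                      # guess yhat = y_i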

SLIDE 3

Decision Boundaries: The Voronoi Diagram

Nearest neighbor does not explicitly compute decision boundaries. However, the boundaries form a subset of the Voronoi diagram of the training data: each line segment is equidistant between two points of opposite class. The more examples that are stored, the more complex the decision boundaries can become.

SLIDE 4

Nearest Neighbor Depends Critically on the Distance Metric

Normalize Feature Values:

– All features should have the same range of values (e.g., [-1, +1]). Otherwise, features with larger ranges will be treated as more important.

Remove Irrelevant Features:

– Irrelevant or noisy features add random perturbations to the distance measure and hurt performance.

Learn a Distance Metric:

– One approach: weight each feature by its mutual information with the class. Let w_j = I(x_j; y). Then

    d(x, x') = \sum_{j=1}^{n} w_j (x_j - x'_j)^2

– Another approach: use the Mahalanobis distance (a sketch of these variants appears after this list):

    D_M(x, x') = (x - x')^T \Sigma^{-1} (x - x')

Smoothing:

– Find the k nearest neighbors and have them vote. This is especially good when there is noise in the class labels.
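To make the preceding variants concrete, here is a minimal sketch in Python; all names are illustrative, and it assumes the weight vector w (the per-feature mutual informations) and the covariance matrix Sigma have already been estimated from the training data:

    import numpy as np

    def normalize(X):
        """Rescale each feature to [-1, +1] using per-feature min/max."""
        lo, hi = X.min(axis=0), X.max(axis=0)
        return 2 * (X - lo) / (hi - lo) - 1

    def weighted_sq_distance(x, x2, w):
        """d(x, x') = sum_j w_j (x_j - x'_j)^2, with w_j = I(x_j; y)."""
        d = x - x2
        return np.sum(w * d ** 2)

    def mahalanobis_distance(x, x2, Sigma):
        """D_M(x, x') = (x - x')^T Sigma^{-1} (x - x')."""
        d = x - x2
        return d @ np.linalg.solve(Sigma, d)  # solve avoids forming Sigma^{-1}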

SLIDE 5

Reducing the Cost of Nearest Neighbor

– Efficient Data Structures for Retrieval (kd-trees)
– Selectively Storing Data Points (editing)
– Pipeline of Filters

SLIDE 6

kd-Trees

A kd-tree is similar to a decision tree except that we split using the median value along the dimension having the highest variance. Every internal node stores one data point, and the leaves are empty.

SLIDE 7

Log-Time Queries with kd-Trees

    KDTree root;

    Node NearestNeighbor(Point P) {
      PriorityQueue PQ;            // minimizing queue of (node, bound) pairs
      float bestDist = infinity;   // smallest distance seen so far
      Node bestNode;               // nearest neighbor so far
      PQ.push(root, 0);
      while (!PQ.empty()) {
        (node, bound) = PQ.pop();
        if (bound >= bestDist)
          return bestNode.p;       // no remaining node can beat the best
        float dist = distance(P, node.p);
        if (dist < bestDist) { bestDist = dist; bestNode = node; }
        if (node.test(P)) {        // P falls on the right side of the split
          PQ.push(node.left, P[node.feat] - node.thresh);
          PQ.push(node.right, 0);
        } else {                   // P falls on the left side of the split
          PQ.push(node.left, 0);
          PQ.push(node.right, node.thresh - P[node.feat]);
        }
      } // while
      return bestNode.p;
    } // NearestNeighbor
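The bound pushed with each child is a lower bound on the distance from P to any point stored in that subtree: zero for the child on P's own side of the splitting plane, and the distance from P to the plane for the far child. Because the queue pops nodes in increasing order of this bound, the early exit when bound >= bestDist can never discard the true nearest neighbor.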

SLIDE 8

Example

This is a form of A* search, using the minimum distance to a node as an underestimate of the true distance.

    Priority Queue         Best node   Best Distance   New Distance
    (f,0)                  none        ∞               none
    (c,0) (h,4)            f           4.00            4.00
    (e,0) (h,4) (b,7)      f           4.00            7.61
    (d,1) (h,4) (b,7)      e           1.00            1.00

SLIDE 9

Edited Nearest Neighbor

Select a subset of the training examples that still gives good classifications.

– Incremental deletion: loop through the memory and test each point to see whether it can be correctly classified given the other points in memory. If so, delete it from the memory.
– Incremental growth: start with an empty memory. Add each point to the memory only if it is not correctly classified by the points already stored (a sketch of this rule follows).
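A minimal sketch of the incremental-growth rule in Python (illustrative; it classifies each point with 1-nearest-neighbor against the points stored so far):

    import numpy as np

    def incremental_growth(X, y):
        """Store a point only if the current memory misclassifies it."""
        mem_X, mem_y = [], []
        for xi, yi in zip(X, y):
            if mem_X:
                diffs = np.array(mem_X) - xi
                j = np.argmin(np.sum(diffs ** 2, axis=1))  # nearest stored point
                if mem_y[j] == yi:
                    continue       # correctly classified: do not store it
            mem_X.append(xi)       # misclassified (or memory empty): store it
            mem_y.append(yi)
        return np.array(mem_X), np.array(mem_y)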

SLIDE 10

Filter Pipeline

Consider several distance measures D_1, D_2, ..., D_n, where D_{i+1} is more expensive to compute than D_i.

– Calibrate a threshold N_i for each filter using the training data.
– Apply the nearest neighbor rule with D_i to compute the N_i nearest neighbors.
– Then apply filter D_{i+1} to those neighbors and keep the N_{i+1} nearest, and so on (a sketch follows this list).
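A minimal sketch of the cascade in Python (illustrative; distances is a list of distance functions ordered from cheapest to most expensive, and thresholds holds the calibrated values N_1, ..., N_n, with the last equal to 1):

    import numpy as np

    def filter_pipeline(X_train, query, distances, thresholds):
        """Re-rank a shrinking candidate set with increasingly
        expensive distance measures."""
        candidates = np.arange(len(X_train))
        for dist, n_keep in zip(distances, thresholds):
            scores = np.array([dist(X_train[i], query) for i in candidates])
            candidates = candidates[np.argsort(scores)[:n_keep]]  # N_i nearest under D_i
        return candidates[0]  # index of the final nearest neighbor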

SLIDE 11

The Curse of Dimensionality

Nearest neighbor breaks down in high-dimensional spaces, because the "neighborhood" becomes very large.

Suppose we have 5000 points uniformly distributed in the unit hypercube and we want to apply the 5-nearest-neighbor algorithm. Suppose our query point is at the origin.

– On the 1-dimensional line, we must go a distance of 5/5000 = 0.001 on average to capture the 5 nearest neighbors.
– In 2 dimensions, we must go \sqrt{0.001} ≈ 0.032 to get a square that contains 0.001 of the volume.
– In d dimensions, we must go (0.001)^{1/d}.
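The required edge length is easy to tabulate; a quick check in Python:

    # Edge length needed to capture a fraction f = k/n of the unit hypercube.
    n, k = 5000, 5
    f = k / n                   # 0.001
    for d in (1, 2, 3, 10, 100):
        print(d, f ** (1 / d))  # 0.001, 0.032, 0.1, 0.501, 0.933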

SLIDE 12

The Curse of Dimensionality (2)

With 5000 points in 10 dimensions, we must go a distance of (0.001)^{1/10} ≈ 0.501 along each attribute in order to find the 5 nearest neighbors.

SLIDE 13

The Curse of Noisy/Irrelevant Features

Nearest neighbor also breaks down when the data contains irrelevant, noisy features. Consider a 1-dimensional problem where our query x is at the origin, our nearest neighbor is x1 at 0.1, and our second nearest neighbor is x2 at 0.5. Now add a uniformly random noisy feature. What is the probability that x2′ will now be closer to x than x1′? Approximately 0.15.
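The slide's figure can be checked by simulation. A sketch, under the assumption (not stated on the slide) that the noisy coordinate of the query and of both neighbors is drawn uniformly from [0, 1]:

    import numpy as np

    rng = np.random.default_rng(0)
    trials = 1_000_000
    u = rng.uniform(size=(3, trials))    # noisy coordinates: query, x1, x2
    d1 = 0.1 ** 2 + (u[1] - u[0]) ** 2   # squared distance from query to x1'
    d2 = 0.5 ** 2 + (u[2] - u[0]) ** 2   # squared distance from query to x2'
    print(np.mean(d2 < d1))              # fraction of trials where x2' is closer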

SLIDE 14

Curse of Noise (2)

[Figure: location of x1 versus x2 after adding the noisy feature]

SLIDE 15

Nearest Neighbor Evaluation

    Criterion                 Perc  Logistic  LDA  Trees     NNbr      Nets
    Mixed data                no    no        no   yes       no        no
    Missing values            no    no        yes  yes       somewhat  no
    Outliers                  no    yes       no   yes       yes       yes
    Monotone transformations  no    no        no   yes       no        no
    Scalability               yes   yes       yes  yes       no        yes
    Irrelevant inputs         no    no        no   somewhat  no        somewhat
    Linear combinations       yes   yes       yes  no        somewhat  yes
    Interpretable             yes   yes       yes  yes       no        no
    Accurate                  yes   yes       yes  no        no        yes

SLIDE 16

Nearest Neighbor Summary

Advantages

– variable-sized hypothesis space
– learning is extremely efficient and can be online or batch (however, growing a good kd-tree can be expensive)
– very flexible decision boundaries

Disadvantages

– distance function must be carefully chosen
– irrelevant or correlated features must be eliminated
– typically cannot handle more than 30 features
– computational costs: memory and classification-time computation