SLIDE 1

K nearest neighbor

LING 572: Advanced Statistical Methods for NLP
Shane Steinert-Threlkeld
January 16, 2020

SLIDE 2

The term “weight” in ML

  • Weights of features
  • Weights of instances
  • Weights of classifiers

SLIDE 3

The term “binary” in ML

  • Classification problem:
    • Binary: the number of classes is 2
    • Multi-class: the number of classes is > 2
  • Features:
    • Binary: the number of possible feature values is 2
    • Categorical / discrete: the number of possible feature values is > 2
    • Real-valued / scalar / continuous: the feature values are real numbers
  • File format:
    • Binary: not human-readable
    • Text: human-readable

SLIDE 4

kNN

SLIDE 5

Instance-based (IB) learning

  • No training: store all training instances.

➔ “Lazy learning”

  • Examples:
    • kNN
    • Locally weighted regression
    • Case-based reasoning
  • The most well-known IB method: kNN

SLIDE 6

kNN

(Figure: illustration of kNN classification. Image: Antti Ajanki, CC-BY-SA 3.0)

SLIDE 7

kNN

  • Training: record labeled instances as feature vectors
  • Test: for a new instance d,
    • find the k training instances that are closest to d
    • perform majority voting or weighted voting
  • Properties:
    • A “lazy” classifier: no learning in the training stage
    • Feature selection and the distance measure are crucial

SLIDE 8

The algorithm

  • Determine parameter K
  • Calculate the distance between the test instance and all the training instances
  • Sort the distances and determine K nearest neighbors
  • Gather the labels of the K nearest neighbors
  • Use simple majority voting or weighted voting (a minimal end-to-end sketch follows below).
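
A minimal sketch of these steps in Python. It is illustrative rather than the course's reference implementation: it assumes dense numeric feature vectors, Euclidean distance, and unweighted majority voting, and the names (euclidean, knn_classify) are made up for this example.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(test_x, train_data, k=3):
    """Label test_x by majority vote among its k nearest training instances.

    train_data is a list of (feature_vector, label) pairs.
    """
    # 1. Compute the distance from the test instance to every training instance.
    distances = [(euclidean(test_x, x), label) for x, label in train_data]
    # 2. Sort by distance and keep the k nearest neighbors.
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    # 3. Gather the neighbors' labels and take a simple majority vote.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy usage: two features, two classes.
train = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((5.0, 5.0), "B"), ((5.1, 4.8), "B")]
print(knn_classify((1.1, 1.1), train, k=3))  # -> "A"
```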

SLIDE 9

Issues

  • What’s K?
  • How do we weight/scale/select features?
  • How do we combine the neighbors’ labels when voting?

SLIDE 10

Picking K

  • Split the data into
    • Training data
    • Dev/val data
    • Test data
  • Pick the k with the lowest error rate on the validation set (a small selection-loop sketch follows below)
    • Use N-fold cross-validation if the training data is small
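
A minimal model-selection sketch for choosing k on a held-out dev set. It assumes a classify function with the same signature as the knn_classify sketch shown earlier (test instance, training pairs, k); the helper names and candidate k values are illustrative.

```python
def error_rate(classify, k, train, dev):
    """Fraction of dev instances that are misclassified when voting over k neighbors."""
    wrong = sum(1 for x, gold in dev if classify(x, train, k) != gold)
    return wrong / len(dev)

def pick_k(classify, train, dev, candidates=(1, 3, 5, 7, 9)):
    """Return the candidate k with the lowest error rate on the dev set."""
    return min(candidates, key=lambda k: error_rate(classify, k, train, dev))

# Usage with the knn_classify sketch from the algorithm slide:
# best_k = pick_k(knn_classify, train_pairs, dev_pairs)
```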

SLIDE 11

Normalizing attribute values

  • Distance can be dominated by attributes with large numeric ranges:
    • Example features: age, income
    • Original data: x1 = (35, 76K), x2 = (36, 80K), x3 = (70, 79K)
  • Rescale, i.e., normalize each feature to [0, 1]:
    • Assume age ∈ [0, 100] and income ∈ [0, 200K]
    • After normalization: x1 = (0.35, 0.38), x2 = (0.36, 0.40), x3 = (0.70, 0.395) (a small rescaling sketch follows below)
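
A small sketch of the min-max rescaling above, using the feature ranges assumed on the slide (age in [0, 100], income in [0, 200K]); the helper name is made up for this example.

```python
def min_max_scale(x, lows, highs):
    """Rescale each feature of x to [0, 1] given per-feature (min, max) ranges."""
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(x, lows, highs))

lows, highs = (0, 0), (100, 200_000)  # assumed ranges for (age, income)
for x in [(35, 76_000), (36, 80_000), (70, 79_000)]:
    print(min_max_scale(x, lows, highs))
# (0.35, 0.38), (0.36, 0.4), (0.7, 0.395) -- matches the slide
```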

SLIDE 12

The Choice of Features

  • Imagine there are 100 features, and only 2 of them are relevant to the target label.
  • Differences in the irrelevant features are likely to dominate the distance:
    • kNN is easily misled in high-dimensional spaces (a small illustration follows below).
  • Feature weighting or feature selection is key (it will be covered next time).
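
A small illustration of that point; the numbers and helper are made up, not from the slides. With 98 noisy, irrelevant dimensions appended to 2 relevant ones, a same-class pair and a different-class pair end up at roughly the same Euclidean distance.

```python
import math
import random

random.seed(0)

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def with_noise(relevant, n_irrelevant=98):
    """Append n_irrelevant random features to a short relevant feature vector."""
    return list(relevant) + [random.random() for _ in range(n_irrelevant)]

same_class = (with_noise((0.1, 0.1)), with_noise((0.15, 0.12)))  # nearly identical relevant features
diff_class = (with_noise((0.1, 0.1)), with_noise((0.9, 0.95)))   # very different relevant features

print(euclidean(*same_class))  # distance driven almost entirely by the irrelevant dimensions
print(euclidean(*diff_class))  # not much larger, despite the big gap in the relevant features
```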

SLIDE 13

Feature weighting

  • Reweight dimension j by a weight w_j:
    • Can increase or decrease the influence of the feature on that dimension
    • Setting w_j to zero eliminates the dimension altogether
  • Use (cross-)validation to automatically choose the weights w_1, …, w_|F|

SLIDE 14

Some distance measures

  • Euclidean distance: d(d_i, d_j) = \lVert d_i - d_j \rVert_2 = \sqrt{\sum_k (d_{i,k} - d_{j,k})^2}
  • Weighted Euclidean distance: d(d_i, d_j) = \sqrt{\sum_k w_k (d_{i,k} - d_{j,k})^2}
  • Cosine: \cos(d_i, d_j) = \frac{d_i \cdot d_j}{\lVert d_i \rVert_2 \, \lVert d_j \rVert_2}
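
A sketch of these three measures in Python. Note that cosine, as on the slide, is a similarity (larger means more alike), not a distance; how to fold it into a kNN distance (e.g. as 1 - cos) is a design choice the slide does not specify.

```python
import math

def euclidean(x, y):
    """d(x, y) = sqrt(sum_k (x_k - y_k)^2)"""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def weighted_euclidean(x, y, w):
    """d(x, y) = sqrt(sum_k w_k * (x_k - y_k)^2), with one weight per dimension."""
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y)))

def cosine(x, y):
    """cos(x, y) = (x . y) / (||x||_2 * ||y||_2)"""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

print(weighted_euclidean((1, 2), (3, 2), (0.0, 1.0)))  # 0.0: a zero weight removes that dimension
print(cosine((1, 0), (1, 1)))                          # ~0.707
```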

SLIDE 15

Voting by k-nearest neighbors

  • Suppose we have found the k nearest neighbors.
  • Let f_i(x) be the class label of the i-th nearest neighbor of x.
  • Define \delta(c, f_i(x)) = 1 if f_i(x) = c and 0 otherwise, and g(c) = \sum_i \delta(c, f_i(x)); that is, g(c) is the number of neighbors with label c.
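
A direct transcription of g(c), assuming the neighbor labels f_i(x) have already been collected into a list; the function name is made up for this example.

```python
def g(c, neighbor_labels):
    """g(c) = sum_i delta(c, f_i(x)): the number of neighbors carrying label c."""
    return sum(1 for label in neighbor_labels if label == c)

print(g("A", ["A", "B", "A"]))  # -> 2
```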

SLIDE 16

Voting

  • Majority voting: c^* = \arg\max_c g(c)
  • Weighted voting: the weighting is on each neighbor: c^* = \arg\max_c \sum_i w_i \, \delta(c, f_i(x)), e.g. with w_i = \frac{1}{d(x, x_i)}
  • Weighted voting allows us to use more training examples:

➔ We can use all the training examples (a sketch of both voting rules follows below).
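
A sketch of both voting rules, assuming each neighbor is given as a (distance, label) pair. The inverse-distance weight w_i = 1 / d(x, x_i) follows the slide; the small epsilon guarding against a zero distance is an added assumption the slide does not address.

```python
from collections import defaultdict

def majority_vote(neighbor_labels):
    """c* = argmax_c g(c): the most frequent label among the k neighbors."""
    counts = defaultdict(int)
    for label in neighbor_labels:
        counts[label] += 1
    return max(counts, key=counts.get)

def weighted_vote(neighbors, eps=1e-9):
    """c* = argmax_c sum_i w_i * delta(c, f_i(x)), with w_i = 1 / d(x, x_i).

    neighbors is a list of (distance, label) pairs; with distance weighting,
    all training instances can be used rather than only the k nearest.
    """
    scores = defaultdict(float)
    for dist, label in neighbors:
        scores[label] += 1.0 / (dist + eps)  # closer neighbors count more
    return max(scores, key=scores.get)

print(majority_vote(["A", "B", "A"]))                       # -> "A"
print(weighted_vote([(0.1, "B"), (2.0, "A"), (2.5, "A")]))  # -> "B": one very close neighbor outweighs two distant ones
```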

SLIDE 17

kNN Decision Boundary


1-NN: each class’s decision region is a union of cells of the Voronoi tessellation

IR, fig 14.6

SLIDE 18

kNN Decision Boundary


5-NN example


SLIDE 19

Summary of kNN algorithm

  • Decide k, feature weights, and similarity measure
  • Given a test instance x
  • Calculate the distances between x and all the training data
  • Choose the k nearest neighbors
  • Let the neighbors vote

SLIDE 20

Pros/Cons of kNN algorithm

  • Strengths:
    • Simplicity (conceptual)
    • Efficiency at training time: no training
    • Handles multi-class problems
    • Stability and robustness: averaging over k neighbors
    • Prediction accuracy: good when the training data is large
    • Can form complex decision boundaries
  • Weaknesses:
    • Efficiency at test time: need to calculate the distance to every training instance
      • Better search algorithms help, e.g., k-d trees
      • Reduce the amount of training data used at test time, e.g., the Rocchio algorithm
    • Sensitivity to irrelevant or redundant features
    • Distance metrics are unclear for non-numerical/binary feature values