About this class

• k-Nearest-Neighbors
• Bagging
• Boosting


Nearest-Neighbor Methods

• Store all the training examples.
• Given a new test example, find the k training examples that are closest to it in feature space (which distance: Euclidean? Mahalanobis?).
• Return the majority classification among those k points.
• Curse of dimensionality: irrelevant features can dominate the classification.
• Training is trivial, but how efficiently can we find the k nearest points? Use intelligent data structures such as kd-trees. Worst-case behavior of nearest-neighbor search is still bad (O(l)), but average-case behavior is much better (although distribution dependent). There is an initial fixed cost to build the tree. Big caveat: search cost seems to scale badly with the number of dimensions of the feature space!
• A very simple but effective algorithm! (A code sketch follows this list.)
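A minimal sketch of the procedure above, assuming numeric features, Euclidean distance, and a brute-force scan over the stored examples; the function name knn_predict is illustrative, not from the notes.

```python
# Minimal k-NN sketch: brute-force Euclidean distance + majority vote.
# Assumes X_train is an (n, d) float array and y_train an (n,) label array.
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)    # Euclidean distance to every stored example
    nearest = np.argsort(dists)[:k]                # indices of the k closest training points
    return Counter(y_train[nearest]).most_common(1)[0][0]   # majority label among those k
```

The brute-force scan is the O(l)-per-query cost mentioned above; an intelligent structure such as a kd-tree (e.g. scipy.spatial.cKDTree or sklearn.neighbors.KDTree) gives better average-case query time.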

Bagging

• Bootstrap Aggregating (Breiman, 1994).
• Key idea: build t independent replicates of the training set L by sampling with replacement (sketched in code after this list).
• Train a classifier on each replicate.
• Predict the majority vote of all these classifiers (for a regression problem, predict the average).
• For decision trees: a significant improvement in accuracy, but a loss in comprehensibility.
• Works well for unstable algorithms. Intuition: unstable algorithms can change their predictions substantially based on small changes in the training set, which is essentially what each replicate training set provides. When you average over multiple sets of training data, you get a more stable predictor.
• Let f_A be the aggregated predictor. Then f_A(x) attempts to approximate E_L f(x).
• How different are the training sets? The probability that a given example is not in a given replicate is (1 − 1/n)^n → 1/e ≈ 0.368 as n → ∞.
• Empirically, about 50 replicates give all the benefit of bagging, often a 20% to 40% reduction in error rate.
• Each trained model has higher initial variance, since it is effectively trained on a smaller training set.
• Bagging stable classifiers can somewhat degrade performance. What would happen with linear regression or Naive Bayes?
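A small sketch of the scheme, assuming scikit-learn decision trees as the unstable base learner and non-negative integer class labels; the helper names bagged_fit and bagged_predict are illustrative, not from the notes.

```python
# Bagging sketch: t bootstrap replicates, one decision tree per replicate,
# majority vote at prediction time (use the mean instead for regression).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_fit(X, y, t=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(t):
        idx = rng.integers(0, n, size=n)             # sample n examples with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagged_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])  # shape (t, n_test)
    # majority vote over the t replicates; assumes integer class labels >= 0
    return np.array([np.bincount(col).argmax() for col in votes.T])
```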

Boosting

• Basic question: can we take an algorithm that learns weak hypotheses performing somewhat better than chance and turn it into a strong learner?
• Answer: yes (Freund and Schapire, various papers).
• We will again build an ensemble classifier, but, unlike bagging, members of the ensemble will have different weights.
• Bagging reduces variance (albeit more slowly than 1/n, because the training set is replicated), but boosting reduces bias by making the hypothesis space more flexible.


AdaBoost Algorithm

Given:

• Training examples (x_1, y_1), ..., (x_m, y_m)
• A weak learning algorithm, guaranteed to make error ε ≤ 1/2 − γ

Maintain a weight distribution D over the training examples. Initialize D(i) = 1/m.

Now repeat for a number of rounds T:

1. Train the weak learner using distribution D. This gives a weak hypothesis h_t : X → {±1}, with error ε_t = Pr_{i∼D_t}[h_t(x_i) ≠ y_i].
2. Set α_t ← (1/2) log((1 − ε_t)/ε_t).
3. Update:
   D(i) ← (D(i)/Z) exp(−α_t) if h_t(x_i) = y_i
   D(i) ← (D(i)/Z) exp(+α_t) if h_t(x_i) ≠ y_i
   where Z is a normalization factor.

Return the final hypothesis: H(x) = sgn(Σ_{t=1}^{T} α_t h_t(x)).

Caveat: we need a weak learner that can learn even on hard weight distributions! (A from-scratch sketch of the loop follows.)
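A from-scratch sketch of the loop above, assuming ±1 labels and a one-feature threshold "stump" as the weak learner; the helper names (train_stump, adaboost, predict) are illustrative, not part of the original algorithm statement.

```python
# AdaBoost sketch with decision stumps as weak learners; labels y must be in {-1, +1}.
import numpy as np

def train_stump(X, y, D):
    """Pick the (feature, threshold, sign) stump with lowest weighted error under D."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = np.where(X[:, j] > thr, sign, -sign)
                err = D[pred != y].sum()               # weighted error eps_t
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, T=20):
    m = len(y)
    D = np.full(m, 1.0 / m)                            # D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        err, j, thr, sign = train_stump(X, y, D)
        eps = max(err, 1e-12)                          # guard against a perfect stump
        alpha = 0.5 * np.log((1 - eps) / eps)          # alpha_t = (1/2) log((1-eps)/eps)
        pred = np.where(X[:, j] > thr, sign, -sign)
        D *= np.exp(-alpha * y * pred)                 # down-weight correct, up-weight wrong
        D /= D.sum()                                   # divide by the normalizer Z
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = np.zeros(len(X))
    for alpha, j, thr, sign in ensemble:
        score += alpha * np.where(X[:, j] > thr, sign, -sign)
    return np.sign(score)                              # H(x) = sgn(sum_t alpha_t h_t(x))
```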


Training Error

First let's bound the weight distribution. Unrolling the update rule,

D_{T+1}(i) = D_T(i) exp(−α_T y_i h_T(x_i)) / Z_T
           = (1/m) Π_{t=1}^{T} exp(−α_t y_i h_t(x_i)) / Z_t
           = (1/m) exp(−y_i Σ_{t=1}^{T} α_t h_t(x_i)) / Π_{t=1}^{T} Z_t.

Now for the training error:

ε = (1/m) Σ_i I[y_i Σ_t α_t h_t(x_i) ≤ 0]
  ≤ (1/m) Σ_i exp(−y_i Σ_t α_t h_t(x_i))   (because exp(−z) ≥ 1 if z ≤ 0).

Substituting from above,

ε ≤ Σ_i D_{T+1}(i) Π_t Z_t = Π_t Z_t,

since Σ_i D_{T+1}(i) = 1. Finally,

Z_t = Σ_{i: h_t(x_i) = y_i} D_t(i) exp(−α_t) + Σ_{i: h_t(x_i) ≠ y_i} D_t(i) exp(α_t)
    = exp(−α_t)(1 − ε_t) + exp(α_t) ε_t
    = 2 √(ε_t(1 − ε_t))
    = √(1 − 4γ_t²),

where γ_t = 1/2 − ε_t. Using 1 − x ≤ exp(−x), each factor satisfies √(1 − 4γ_t²) ≤ exp(−2γ_t²), so

ε ≤ exp(−2 Σ_t γ_t²).

So, a proof that we can boost weak learners meeting the requisite conditions into strong learners!
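A quick numeric sanity check of the last two steps, using made-up per-round edges γ_t (the values are illustrative, not from the notes): the product of the Z_t should never exceed exp(−2 Σ_t γ_t²).

```python
# Check that prod_t Z_t <= exp(-2 * sum_t gamma_t^2),
# with Z_t = 2*sqrt(eps_t*(1 - eps_t)) and eps_t = 1/2 - gamma_t.
import numpy as np

gammas = np.array([0.10, 0.05, 0.20, 0.15])   # hypothetical per-round edges gamma_t
eps = 0.5 - gammas                             # per-round weak-learner errors
Z = 2 * np.sqrt(eps * (1 - eps))               # per-round normalization factors
print(np.prod(Z), "<=", np.exp(-2 * np.sum(gammas ** 2)))   # left side bounds the training error
```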


Generalization and Empirical Properties

• Fairly robust to overfitting. In fact, test error often keeps decreasing even after training error has converged (illustrated in the snippet below).
• Works well with a range of hypotheses, including decision trees, stumps, and Naive Bayes.
• Relation to SVMs? Boosting can be thought of as maximizing a different notion of margin, and as using multiple weak learners to reach a high-dimensional space instead of using a kernel as SVMs do. Computationally, boosting is easier (an LP as opposed to a QP).
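A short empirical illustration of the first point, assuming scikit-learn's AdaBoostClassifier (whose default weak learner is a depth-1 decision stump) and a synthetic dataset; none of the specific numbers come from the notes.

```python
# Track train/test error after each boosting round; training error typically
# flattens while test error can keep inching down for many more rounds.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clf = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# staged_score reports accuracy of the partial ensemble after each round
for t, (acc_tr, acc_te) in enumerate(zip(clf.staged_score(X_tr, y_tr),
                                         clf.staged_score(X_te, y_te)), start=1):
    if t % 50 == 0:
        print(f"round {t:3d}  train err {1 - acc_tr:.3f}  test err {1 - acc_te:.3f}")
```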
