K-Nearest Neighbors
Jia-Bin Huang, Virginia Tech, Spring 2019 (ECE-5424G / CS-5824)
  1. K-Nearest Neighbors Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824

  2. Administrative • Check out review materials • Probability • Linear algebra • Python and NumPy • Start your HW 0 • On your local machine: install Anaconda, Jupyter notebook • On the cloud: https://colab.research.google.com • Sign up for the Piazza discussion forum

  3. Enrollment • Maximum allowable capacity reached.

  4. Machine learning reading & study group • Reading group: Tuesday 11:00 AM - 12:00 PM, Whittemore Hall 457B • Research paper reading: machine learning, computer vision • Study group: Thursday 11:00 AM - 12:00 PM, Whittemore Hall 457B • Video lecture: machine learning • All are welcome. More info: https://github.com/vt-vl-lab/reading_group

  5. Recap: Machine learning algorithms

                  Supervised Learning    Unsupervised Learning
     Discrete     Classification         Clustering
     Continuous   Regression             Dimensionality reduction

  6. Today’s plan • Supervised learning • Setup • Basic concepts • K-Nearest Neighbor (kNN) • Distance metric • Pros/Cons of nearest neighbor • Validation, cross-validation, hyperparameter tuning

  7. Supervised learning • Input: x (images, texts, emails) • Output: y (e.g., spam or non-spam) • Data: (x^(1), y^(1)), (x^(2), y^(2)), ⋯, (x^(N), y^(N)) (labeled dataset) • (Unknown) target function: f: x → y ("true" mapping) • Model/hypothesis: h: x → y (learned model) • Learning = search in hypothesis space Slide credit: Dhruv Batra

  8. Training set → Learning algorithm → Hypothesis h: x → y

  9. Regression: Training set → Learning algorithm → Hypothesis h, mapping size of house (x) to estimated price (y)

  10. Classification: Training set → Learning algorithm → Hypothesis h, mapping an unseen image (x) to a predicted object class, e.g., 'Mug' Image credit: CS231n @ Stanford

  11. Procedural view of supervised learning • Training stage: • Raw data → x (feature extraction) • Training data {(x, y)} → h (learning) • Testing stage: • Raw data → x (feature extraction) • Test data x → h(x) (apply function, evaluate error) Slide credit: Dhruv Batra

  12. Basic steps of supervised learning • Set up a supervised learning problem • Data collection: collect training data with the "right" answer. • Representation: choose how to represent the data. • Modeling: choose a hypothesis class: H = {h: X → Y} • Learning/estimation: find the best hypothesis in the model class. • Model selection: try different models; pick the best one. (More on this later) • If happy, stop; else refine one or more of the above Slide credit: Dhruv Batra

  13. Nearest neighbor classifier • Training data: (x^(1), y^(1)), (x^(2), y^(2)), ⋯, (x^(N), y^(N)) • Learning: do nothing. • Testing: h(x) = y^(k), where k = argmin_i d(x, x^(i))
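The "do nothing at training, search at testing" recipe on slide 13 can be sketched in a few lines of NumPy. This is my own minimal illustration, not code from the deck; the function names (nn_train, nn_predict) and the toy data are invented for the example, and Euclidean distance is assumed for d.

```python
import numpy as np

def nn_train(X, y):
    # "Learning": do nothing but memorize the training set.
    return X, y

def nn_predict(model, x_query):
    # h(x) = y^(k), where k = argmin_i d(x, x^(i))
    X, y = model
    dists = np.linalg.norm(X - x_query, axis=1)  # Euclidean distance to every training point
    return y[np.argmin(dists)]

# Toy data: two 2-D clusters, labeled 0 and 1
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
model = nn_train(X, y)
print(nn_predict(model, np.array([0.1, 0.05])))  # query near the first cluster -> 0
```

Note that all the cost is deferred to test time: each prediction scans the entire training set, which is the "expensive" property discussed later in the deck.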

  14. Face recognition Image credit: MegaFace

  15. Face recognition – surveillance application

  16. Music identification https://www.youtube.com/watch?v=TKNNOMddkNc

  17. Album recognition (Instance recognition) http://record-player.glitch.me/auth

  18. Scene Completion (C) Dhruv Batra [Hays & Efros, SIGGRAPH 2007]

  19. Hays and Efros, SIGGRAPH 2007

  20. … 200 total [Hays & Efros, SIGGRAPH 2007]

  21. Context Matching [Hays & Efros, SIGGRAPH 2007]

  22. Graph cut + Poisson blending [Hays & Efros, SIGGRAPH 2007]

  23.–28. (Scene completion result images) [Hays & Efros, SIGGRAPH 2007]

  29. Synonyms • Nearest Neighbors • k-Nearest Neighbors • Member of following families: • Instance-based Learning • Memory-based Learning • Exemplar methods • Non-parametric methods Slide credit: Dhruv Batra

  30. Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin

  31. Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin

  32. Recall: 1-Nearest neighbor classifier • Training data: (x^(1), y^(1)), (x^(2), y^(2)), ⋯, (x^(N), y^(N)) • Learning: do nothing. • Testing: h(x) = y^(k), where k = argmin_i d(x, x^(i))

  33. Distance metrics (x: continuous variables) • L2-norm (Euclidean distance): d(x, x′) = √(Σ_i (x_i − x′_i)²) • L1-norm (sum of absolute differences): d(x, x′) = Σ_i |x_i − x′_i| • L∞-norm: d(x, x′) = max_i |x_i − x′_i| • Scaled Euclidean distance: d(x, x′) = √(Σ_i σ_i² (x_i − x′_i)²) • Mahalanobis distance: d(x, x′) = √((x − x′)ᵀ A (x − x′))
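The metrics on slide 33 all reduce to one-liners in NumPy. A sketch (my own, not from the slides; the per-dimension weights σ_i² and the matrix A are arbitrary assumed values for illustration):

```python
import numpy as np

x  = np.array([1.0, 2.0, 3.0])
xp = np.array([2.0, 0.0, 3.0])

l2   = np.sqrt(np.sum((x - xp) ** 2))   # Euclidean (L2) distance
l1   = np.sum(np.abs(x - xp))           # sum of absolute differences (L1)
linf = np.max(np.abs(x - xp))           # L-infinity norm

# Scaled Euclidean: per-dimension weights sigma_i^2 (assumed values)
sigma2 = np.array([1.0, 0.5, 2.0])
scaled = np.sqrt(np.sum(sigma2 * (x - xp) ** 2))

# Mahalanobis with matrix A (identity here, which reduces to plain L2)
A = np.eye(3)
maha = np.sqrt((x - xp) @ A @ (x - xp))

print(l2, l1, linf)  # sqrt(5) ≈ 2.236, 3.0, 2.0
```

In practice A is often the inverse covariance of the data, which makes Mahalanobis distance account for correlated, differently-scaled features.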

  34. Distance metrics (x: discrete variables) • Example application: document classification • Hamming distance
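For discrete vectors, the Hamming distance on slide 34 simply counts disagreeing positions. A small sketch (my own example; the bag-of-words presence vectors are invented):

```python
def hamming(a, b):
    # Number of positions where the two discrete vectors disagree
    assert len(a) == len(b)
    return sum(ai != bi for ai, bi in zip(a, b))

# e.g., word-presence vectors for two documents over a 5-word vocabulary
doc1 = [1, 0, 1, 1, 0]
doc2 = [1, 1, 1, 0, 0]
print(hamming(doc1, doc2))  # 2 (they disagree on words 2 and 4)
```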

  35. Distance metrics (x: histogram / PDF) • Histogram intersection: histint(x, x′) = 1 − Σ_i min(x_i, x′_i) • Chi-squared histogram matching distance: χ²(x, x′) = (1/2) Σ_i (x_i − x′_i)² / (x_i + x′_i) • Earth mover’s distance (cross-bin similarity measure) [Rubner et al., IJCV 2000] • Minimal cost paid to transform one distribution into the other

  36. Distance metrics (x: gene expression microarray data) • When “shape” matters more than values • Want d(x^(1), x^(2)) < d(x^(1), x^(3)) • How? • Correlation coefficients • Pearson, Spearman, Kendall, etc.

  37. Distance metrics ( 𝑦 : Learnable feature) Large margin nearest neighbor (LMNN)

  38. Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin

  39. kNN Classification • The predicted class of a query point can change with k (e.g., k = 3 vs. k = 5) Image credit: Wikipedia
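The k = 3 vs. k = 5 behavior on slide 39 comes from majority voting over the k nearest training points. A minimal sketch (mine, not the deck's; the toy data is invented and Euclidean distance is assumed):

```python
import numpy as np
from collections import Counter

def knn_predict(X, y, x_query, k):
    # Majority label among the k nearest training points
    dists = np.linalg.norm(X - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y[nearest].tolist()).most_common(1)[0][0]

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 1, 1, 1])

q = np.array([0.4, 0.4])
print(knn_predict(X, y, q, k=3))  # 3 nearest labels are 0, 0, 1 -> predicts 0
print(knn_predict(X, y, q, k=5))  # all 5 labels are 0, 0, 1, 1, 1 -> predicts 1
```

This also illustrates the skewed-class-distribution issue raised a few slides later: with k = 5, the more numerous far-away class outvotes the nearby one.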

  40. Classification decision boundaries Image credit: CS231 @ Stanford

  41. Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin

  42. Issue: Skewed class distribution • Problem with majority voting in kNN • Intuition: nearby points should be weighted strongly, far points weakly • Apply weight w^(i) = exp(−d(x^(i), query)² / σ²) • σ²: kernel width

  43. Instance/Memory-based Learning 1. A distance metric 2. How many nearby neighbors to look at? 3. A weighting function (optional) 4. How to fit with the local points? Slide credit: Carlos Guestrin

  44. 1-NN for Regression • Just predict the same output as the nearest neighbour (the closest data point) Figure credit: Carlos Guestrin

  45. 1-NN for Regression • Often bumpy (overfits) Figure credit: Andrew Moore

  46. 9-NN for Regression • Predict the average of the k nearest neighbor values Figure credit: Andrew Moore

  47. Weighting/Kernel functions • Weight: w^(i) = exp(−d(x^(i), query)² / σ²) • Prediction (use all the data): y = Σ_i w^(i) y^(i) / Σ_i w^(i) • (Our examples use a Gaussian kernel) Slide credit: Carlos Guestrin
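Slide 47's weighted prediction (kernel regression) uses every training point, with weights falling off with distance. A sketch under the slide's Gaussian-kernel formula (my own function name and toy data; σ² = 0.5 is an arbitrary choice):

```python
import numpy as np

def kernel_regress(X, y, x_query, sigma2):
    # Gaussian weights: w^(i) = exp(-d(x^(i), query)^2 / sigma^2)
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / sigma2)
    # Prediction uses ALL data: weighted average of the targets
    return np.sum(w * y) / np.sum(w)

# 1-D toy data sampled from y = x^2
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 4.0, 9.0])
print(kernel_regress(X, y, np.array([1.5]), sigma2=0.5))  # roughly 2.5, between y(1)=1 and y(2)=4
```

Unlike k-NN regression, the output varies smoothly with the query, which is why kernel regression avoids the "bumpy" fits shown on slide 45.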

  48. Effect of Kernel Width • What happens as σ → ∞? • What happens as σ → 0? Kernel regression Slide credit: Ben Taskar

  49. Problems with Instance-Based Learning • Expensive • No learning: most real work done during testing • For every test sample, must search through the whole dataset – very slow! • Must use tricks like approximate nearest neighbour search • Doesn’t work well with a large number of irrelevant features • Distances overwhelmed by noisy features • Curse of dimensionality • Distances become meaningless in high dimensions Slide credit: Dhruv Batra

  50. Curse of dimensionality • Consider a hypersphere with radius r in dimension d • Consider a hypercube with edges of length 2r • The distance between the center and a corner is r√d • As d grows, the hypercube consists almost entirely of its “corners”

  51. Hyperparameter selection • How to choose K? • Which distance metric should I use? L2, L1? • How large should the kernel width σ² be? • ….

  52. Tune hyperparameters on the test dataset? • It will give us stronger performance on the test set! • Why is this not okay? Let’s discuss. • Evaluate on the test set only a single time, at the very end.

  53. Validation set • Splitting the training set: a fake test set to tune hyperparameters Slide credit: CS231 @ Stanford

  54. Cross-validation • 5-fold cross-validation → split the training data into 5 equal folds • Use 4 of them for training and 1 for validation Slide credit: CS231 @ Stanford
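Putting slides 51–54 together: choose k by measuring held-out accuracy across the folds, never touching the test set. A self-contained sketch (all names and the synthetic two-cluster data are my own; not code from the course):

```python
import numpy as np
from collections import Counter

def knn_predict(X, y, x_query, k):
    dists = np.linalg.norm(X - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y[nearest].tolist()).most_common(1)[0][0]

def cv_accuracy(X, y, k, folds=5):
    # Split indices into `folds` parts; train on folds-1, validate on the held-out fold
    idx = np.arange(len(X))
    splits = np.array_split(idx, folds)
    correct = 0
    for val in splits:
        train = np.setdiff1d(idx, val)
        correct += sum(knn_predict(X[train], y[train], X[i], k) == y[i] for i in val)
    return correct / len(X)

# Synthetic, well-separated two-class data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
perm = rng.permutation(len(X))   # shuffle so every fold mixes both classes
X, y = X[perm], y[perm]

for k in (1, 3, 5, 7):
    print(k, cv_accuracy(X, y, k))  # pick the k with the best validation accuracy
```

The winning k would then be evaluated once, on the untouched test set, exactly as slide 52 prescribes.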
