8. Nearest neighbors

  1. Foundations of Machine Learning CentraleSupélec Paris, Fall 2017 8. Nearest neighbors Chloé-Agathe Azencott Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr

  2. Practical matters ● Class representatives – William PALMER william.palmer@student.ecp.fr – Léonard BOUSSIOUX leonard.boussioux@student.ecp.fr ● Kaggle project

  3. Learning objectives ● Implement the nearest-neighbor and k-nearest-neighbors algorithms. ● Compute distances between real-valued vectors as well as objects represented by categorical features. ● Define the decision boundary of the nearest-neighbor algorithm. ● Explain why kNN might not work well in high dimension.

  4. Nearest neighbors

  5. ● How would you color the blank circles?

  6. ● How would you color the blank circles?

  7. Partitioning the space ● The training data partitions the entire space.

  8. Nearest neighbor ● Learning: – Store all the training examples. ● Prediction: – For x: the label of the training example closest to it.

  9. k nearest neighbors ● Learning: – Store all the training examples. ● Prediction: – Find the k training examples closest to x. – Classification?

  10. k nearest neighbors ● Learning: – Store all the training examples. ● Prediction: – Find the k training examples closest to x. – Classification: majority vote, i.e. predict the most frequent label among the k neighbors.

  11. k nearest neighbors ● Learning: – Store all the training examples. ● Prediction: – Find the k training examples closest to x. – Classification: majority vote, i.e. predict the most frequent label among the k neighbors. – Regression?

  12. k nearest neighbors ● Learning: – Store all the training examples. ● Prediction: – Find the k training examples closest to x. – Classification: majority vote, i.e. predict the most frequent label among the k neighbors. – Regression: predict the average of the labels of the k neighbors.
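
A minimal sketch of these two prediction rules, assuming NumPy arrays X_train and y_train and a query point x (all names here are illustrative, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3, task="classification"):
    """Predict the label of x from its k nearest training examples."""
    # Euclidean distance from x to every stored training example
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(distances)[:k]
    if task == "classification":
        # Majority vote: most frequent label among the k neighbors
        return Counter(y_train[nearest]).most_common(1)[0][0]
    # Regression: average of the neighbors' labels
    return y_train[nearest].mean()

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y_train = np.array([0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.2, 0.9]), k=3))  # -> 1
```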

  13. Choice of k ● Small k: noisy. The idea behind using more than one neighbor is to average out the noise. ● Large k: computationally intensive. What if k = n?

  14. Choice of k ● Small k: noisy. The idea behind using more than one neighbor is to average out the noise. ● Large k: computationally intensive. If k = n, then we predict – for classification: the majority class – for regression: the average value. ● Set k by cross-validation. ● Heuristic: k ≈ √n.
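
One common way to set k by cross-validation, sketched here with scikit-learn (the candidate grid is arbitrary; the √n heuristic can suggest where to center it):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def select_k(X, y, candidates=(1, 3, 5, 7, 11, 15), cv=5):
    """Return the candidate k with the best cross-validated accuracy."""
    scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                 X, y, cv=cv).mean()
              for k in candidates}
    return max(scores, key=scores.get)
```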

  15. Non-parametric learning ● Non-parametric learning algorithm: – the complexity of the decision function grows with the number of data points; – contrast with linear regression (≈ as many parameters as features); – usually, the decision function is expressed directly in terms of the training examples. – Examples: ● kNN (this chapter) ● tree-based methods (Chap. 9) ● SVM (Chap. 10)

  16. Instance-based learning ● Learning: – Storing the training instances. ● Predicting: – Compute the label of a new instance based on its similarity with the stored instances. ● Also called lazy learning. ● Similar to case-based reasoning: – doctors treating a patient based on how patients with similar symptoms were treated; – judges ruling court cases based on legal precedent.

  17. Instance-based learning ● Learning: – Storing the training instances. ● Predicting: – Compute the label of a new instance based on its similarity with the stored instances (this is where the magic happens!). ● Also called lazy learning. ● Similar to case-based reasoning: – doctors treating a patient based on how patients with similar symptoms were treated; – judges ruling court cases based on legal precedent.

  18. Computing distances & similarities

  19. Distances between instances ● Distance

  20. Distances between instances ● Distance

  21. Distances between instances ● Euclidean distance

  22. Distances between instances ● Euclidean distance ● Manhattan distance. Why is this called the Manhattan distance?

  23. Distances between instances ● Euclidean distance ● Manhattan distance ● Lq-norm: Minkowski distance – L1 = Manhattan. – L2 = Euclidean. – L∞?

  24. Distances between instances ● Euclidean distance ● Manhattan distance ● Lq-norm: Minkowski distance – L1 = Manhattan. – L2 = Euclidean. – L∞ = Chebyshev distance (maximum coordinate-wise difference).
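
These distances are one-liners in NumPy; a quick illustrative check (function name and vectors are made up):

```python
import numpy as np

def minkowski(x, y, q):
    """L_q (Minkowski) distance between two real-valued vectors."""
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

x, y = np.array([1.0, 2.0, 3.0]), np.array([2.0, 0.0, 3.0])
print(minkowski(x, y, 1))       # Manhattan (L1): 3.0
print(minkowski(x, y, 2))       # Euclidean (L2): sqrt(5) ≈ 2.236
print(np.max(np.abs(x - y)))    # L∞ limit (Chebyshev): 2.0
```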

  25. Similarity between instances ● Pearson's correlation ● Assuming the data is centered. Geometric interpretation?

  26. Similarity between instances ● Pearson's correlation (centered data) ● Cosine similarity: the dot product can be used to measure similarities.
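
On centered data, Pearson's correlation is exactly the cosine of the angle between the two vectors, i.e. a normalized dot product. A small numerical check (illustrative code, not from the slides):

```python
import numpy as np

def cosine_similarity(x, y):
    """Dot product divided by the product of the norms."""
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

def pearson(x, y):
    """Pearson correlation = cosine similarity of the centered vectors."""
    return cosine_similarity(x - x.mean(), y - y.mean())

x, y = np.array([1.0, 2.0, 3.0]), np.array([2.0, 2.0, 5.0])
print(pearson(x, y), np.corrcoef(x, y)[0, 1])  # both ≈ 0.866
```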

  27. Categorical features ● Ex: a feature that can take 5 values – Sports – World – Culture – Internet – Politics ● Naive encoding, x1 in {1, 2, 3, 4, 5}: why would Sports be closer to World than to Politics? ● One-hot encoding, x1, x2, x3, x4, x5: – Sports: [1, 0, 0, 0, 0] – Internet: [0, 0, 0, 1, 0]
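
With a one-hot encoding all five categories are equidistant (Euclidean distance √2 between any two distinct values), which avoids the artificial ordering of the naive encoding. A minimal sketch, with an arbitrary category order:

```python
import numpy as np

CATEGORIES = ["Sports", "World", "Culture", "Internet", "Politics"]

def one_hot(value, categories=CATEGORIES):
    """Return the one-hot vector encoding `value`."""
    vec = np.zeros(len(categories))
    vec[categories.index(value)] = 1.0
    return vec

print(one_hot("Sports"))    # [1. 0. 0. 0. 0.]
print(one_hot("Internet"))  # [0. 0. 0. 1. 0.]
```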

  28. Categorical features ● Represent an object as the list of presence/absence (or counts) of the features that appear in it. ● Example: small molecules; features = atoms and bonds of a certain type – C, H, S, O, N... – O-H, O=C, C-N...

  29. Binary representation ● Example: 0 1 1 0 0 1 0 0 0 1 0 1 0 0 1 (bit 1 = 0: no occurrence of the 1st feature; bit 10 = 1: one or more occurrences of the 10th feature). ● Hamming distance: number of bits that are different. Equivalent to?

  30. Binary representation ● Example: 0 1 1 0 0 1 0 0 0 1 0 1 0 0 1 (bit 1 = 0: no occurrence of the 1st feature; bit 10 = 1: one or more occurrences of the 10th feature). ● Hamming distance: number of bits that are different. Equivalent to the Manhattan (L1) distance between the binary vectors.

  31. Binary representation ● Example: 0 1 1 0 0 1 0 0 0 1 0 1 0 0 1 ● Tanimoto/Jaccard similarity: number of shared features (normalized).

  32. Counts representation ● Example: 0 1 2 0 0 1 0 0 0 4 0 1 0 0 7 (entry 1 = 0: no occurrence of the 1st feature; entry 10 = 4: number of occurrences of the 10th feature). ● MinMax similarity: number of shared features (normalized). If x is binary, MinMax and Tanimoto are equivalent.

  33. Categorical features ● Features ● Compute the Hamming distance and the Tanimoto and MinMax similarities between these objects: ?

  34. Categorical features ● Features ● Compute the Hamming distance and the Tanimoto and MinMax similarities between these objects: – A: binary 100011010110 / counts 300011010120 – B: binary 111011011110 / counts 211021011120 – C: binary 111011010100 / counts 311011010100

  35. Categorical features ● A = 100011010110 / 300011010120 ● B = 111011011110 / 211021011120 ● C = 111011010100 / 311011010100 ● Hamming distance: d(A, B) = 3, d(A, C) = 3, d(B, C) = 2 ● Tanimoto similarity: s(A, B) = 6/9 ≈ 0.67, s(A, C) = 5/8 ≈ 0.63, s(B, C) = 7/9 ≈ 0.78 ● MinMax similarity: s(A, B) = 8/13 ≈ 0.62, s(A, C) = 7/11 ≈ 0.64, s(B, C) = 8/13 ≈ 0.62
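
The three measures are easy to verify in code; the vectors below are the A, B, C objects from the slide (function names are illustrative):

```python
import numpy as np

def hamming(x, y):
    """Number of positions where the binary vectors differ."""
    return int(np.sum(x != y))

def tanimoto(x, y):
    """Shared features over total features present (binary vectors)."""
    return np.sum(x & y) / np.sum(x | y)

def minmax(x, y):
    """Sum of coordinate-wise minima over sum of coordinate-wise maxima."""
    return np.sum(np.minimum(x, y)) / np.sum(np.maximum(x, y))

A_bin, A_cnt = (np.array([int(c) for c in s]) for s in ("100011010110", "300011010120"))
B_bin, B_cnt = (np.array([int(c) for c in s]) for s in ("111011011110", "211021011120"))
C_bin, C_cnt = (np.array([int(c) for c in s]) for s in ("111011010100", "311011010100"))

print(hamming(A_bin, B_bin), hamming(A_bin, C_bin), hamming(B_bin, C_bin))     # 3 3 2
print(tanimoto(A_bin, B_bin), tanimoto(A_bin, C_bin), tanimoto(B_bin, C_bin))  # 6/9 5/8 7/9
print(minmax(A_cnt, B_cnt), minmax(A_cnt, C_cnt), minmax(B_cnt, C_cnt))        # 8/13 7/11 8/13
```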

  36. Categorical features ● Features ● When new data has unknown features: ignore them.

  37. Back to nearest neighbors

  38. Advantages of kNN ● Training is very fast – Just store the training examples. – Can use smart indexing procedures to speed up testing (at the cost of slower training). ● Keeps the training data – Useful if we want to do something else with it. ● Rather robust to noisy data (averaging k votes). ● Can learn complex functions.

  39. Drawbacks of kNN ● Memory requirements. ● Prediction can be slow. – Complexity of labeling 1 new data point?

  40. Drawbacks of kNN ● Memory requirements. ● Prediction can be slow. – Complexity of labeling 1 new data point: O(np) distance computations for n training points in p dimensions. But kNN works best with lots of samples... → Efficient data structures (k-d trees, ball trees): ● construction: O(n) space, O(n log n) time ● query: O(log n) on average (degrades in high dimension) → Approximate solutions based on hashing. ● kNN is fooled by irrelevant attributes. E.g. p = 1000 but only 10 features are relevant; distances become meaningless.
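
Brute-force prediction scans every stored example; a tree-based index pays a one-off construction cost to make queries much faster in low to moderate dimension. A sketch with scikit-learn's NearestNeighbors (synthetic data, illustrative parameters):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))   # n = 10 000 stored examples, p = 3

# Build the index once (slower "training"), then answer queries quickly.
index = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(X)
distances, neighbors = index.kneighbors(rng.normal(size=(1, 3)))
print(neighbors)  # indices of the 5 nearest stored examples
```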

  41. Decision boundary of kNN ● Classification ● Decision boundary: the line (or surface) separating the positive from the negative regions. ● What decision boundary is kNN building?

  42. Voronoi tessellation ● Voronoi cell of x: – the set of all points of the space closer to x than to any other point of the training set – a polyhedron ● Voronoi tessellation of the space: union of all Voronoi cells. ● Draw the Voronoi cell of the blue dot.

  43. Voronoi tessellation ● Voronoi cell of x: – the set of all points of the space closer to x than to any other point of the training set – a polyhedron ● Voronoi tessellation of the space: union of all Voronoi cells.

  44. Voronoi tessellation ● The Voronoi tessellation defines the decision boundary of the 1-NN. ● The kNN also partitions the space (in a more complex way).
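
A 1-NN classifier is constant on each Voronoi cell, so its decision boundary follows the cell edges between training points of different classes. A small sketch that labels a grid by its nearest training point (illustrative data; the exact cell geometry could also be obtained with scipy.spatial.Voronoi):

```python
import numpy as np

X_train = np.array([[0.0, 0.0], [1.0, 0.5], [0.2, 1.0]])
y_train = np.array([0, 1, 0])

# Label every grid point with the class of its nearest training example.
xs, ys = np.meshgrid(np.linspace(-1, 2, 200), np.linspace(-1, 2, 200))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
dists = np.linalg.norm(grid[:, None, :] - X_train[None, :, :], axis=2)
labels = y_train[np.argmin(dists, axis=1)].reshape(xs.shape)
# `labels` is piecewise constant on the Voronoi cells of X_train.
```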

  45. Curse of dimensionality ● Remember from Chap. 3: when p increases, the proportion of a hypercube lying outside its inscribed hypersphere approaches 1. ● Volume of a p-sphere of radius r: π^(p/2) r^p / Γ(p/2 + 1). ● What this means: – hyperspace is very big – all points are far apart – dimensionality reduction is needed.
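
A quick numerical check of that claim: the fraction of the hypercube [-1, 1]^p covered by its inscribed unit ball is V_p / 2^p with V_p = π^(p/2) / Γ(p/2 + 1), and it vanishes rapidly as p grows:

```python
import math

def fraction_inside_ball(p):
    """Volume of the unit p-ball divided by the volume of [-1, 1]^p."""
    ball = math.pi ** (p / 2) / math.gamma(p / 2 + 1)
    return ball / 2 ** p

for p in (2, 5, 10, 20):
    print(p, fraction_inside_ball(p))
# 2 -> 0.785, 5 -> 0.164, 10 -> 0.0025, 20 -> 2.5e-08
```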

  46. kNN variants ● ε-ball neighbors – Instead of using the k nearest neighbors, use all points within a distance ε of the test point. – What if there are no such points?

  47. kNN variants ● Weighted kNN – Weigh the vote of each neighbor according to its distance to the test point. – Variant: learn the optimal weights [e.g. Swamidass, Azencott et al. 2009, Influence Relevance Voter]
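
Both variants are available off the shelf in scikit-learn; a minimal sketch (parameter values are arbitrary, and the learned-weights scheme cited on the slide is not part of scikit-learn):

```python
from sklearn.neighbors import RadiusNeighborsClassifier, KNeighborsClassifier

# ε-ball neighbors: vote among all training points within radius eps of the query.
# outlier_label handles queries that have no neighbor inside the ball.
eps_ball = RadiusNeighborsClassifier(radius=0.5, outlier_label="most_frequent")

# Weighted kNN: each of the k neighbors votes with weight 1 / distance.
weighted_knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
```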
