 
Learning From Data, Lecture 16: Similarity and Nearest Neighbor
Outline: Similarity; Nearest Neighbor
M. Magdon-Ismail, CSCI 4100/6100
My 5-Year-Old Called It “A ManoHorse”
The simplest method of learning that we know: classify according to similar objects you have seen.
© AML. Creator: Malik Magdon-Ismail
Measuring Similarity
Represent each object by its features, $x$. Measure how similar two objects are by the Euclidean distance between their feature vectors (the smaller the distance, the more similar the objects):
$$d(x, x') = \lVert x - x' \rVert.$$
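As a minimal sketch in plain Python (the function name `euclidean_distance` is our own, not from the lecture), the distance $d(x, x')$ between two feature vectors can be computed like this:

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], x_prime: Sequence[float]) -> float:
    """Return d(x, x') = ||x - x'||, the Euclidean distance between feature vectors."""
    if len(x) != len(x_prime):
        raise ValueError("feature vectors must have the same number of features")
    # Square root of the summed squared feature differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prime)))


# Two hypothetical objects, each described by two features.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

In Python 3.8+ the standard library's `math.dist` computes the same quantity; the later sketches use it directly.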
Nearest Neighbor
A test point $x$ is classified using its nearest neighbor. Order the data points by their distance to $x$:
$$d(x, x_{[1]}) \le d(x, x_{[2]}) \le \cdots \le d(x, x_{[N]}),$$
and classify with
$$g(x) = y_{[1]}(x).$$
No training needed! $E_{\text{in}} = 0$, since each (distinct) data point is its own nearest neighbor.
[Figure: the nearest-neighbor Voronoi tessellation, with a test point x and its neighbors x_[1] to x_[4].]
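A minimal sketch of the 1-NN rule, assuming Python 3.8+ (for `math.dist`) and a small hypothetical 2-D data set; the function name `nearest_neighbor_classify` is ours. It also checks the $E_{\text{in}} = 0$ claim on that data:

```python
import math
from typing import List, Sequence


def nearest_neighbor_classify(x: Sequence[float],
                              X: List[Sequence[float]],
                              y: List[int]) -> int:
    """1-NN rule: g(x) = y_[1](x), the label of the closest data point."""
    # Index of the data point at the smallest Euclidean distance from x;
    # min() keeps the first index among exact ties.
    nearest = min(range(len(X)), key=lambda n: math.dist(x, X[n]))
    return y[nearest]


# Hypothetical 2-D data set with labels +1/-1.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]]
y = [-1, -1, -1, +1, +1]

# "No training": the data set itself is the model.
print(nearest_neighbor_classify([0.4, 0.2], X, y))  # -1
print(nearest_neighbor_classify([3.6, 2.9], X, y))  # 1

# E_in = 0: every distinct data point is its own nearest neighbor.
E_in = sum(nearest_neighbor_classify(xn, X, y) != yn for xn, yn in zip(X, y)) / len(X)
print(E_in)  # 0.0
```

If the data contained two identical points with different labels, one of them would be misclassified and $E_{\text{in}}$ would no longer be exactly zero.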
What about $E_{\text{out}}$?
Theorem: $E_{\text{out}} \le 2E^*_{\text{out}}$ (with high probability, as $N \to \infty$), where $E^*_{\text{out}}$ is the best out-of-sample error anyone can achieve.
VC analysis: $E_{\text{in}}$ is an estimate for $E_{\text{out}}$.
Nearest neighbor analysis: $E_{\text{in}} = 0$, and $E_{\text{out}}$ is small.
So we will never know what $E_{\text{out}}$ is, but it cannot be much worse than the best anyone can do.
Half the classification power of the data is in the nearest neighbor.
Proving $E_{\text{out}} \le 2E^*_{\text{out}}$
Let $\pi(x) = P[y = +1 \mid x]$ (the target in logistic regression).
Assume $\pi(x)$ is continuous and $x_{[1]} \xrightarrow{N \to \infty} x$. Then $\pi(x_{[1]}) \xrightarrow{N \to \infty} \pi(x)$.
Since $y$ and $y_{[1]}$ are generated independently given $x$ and $x_{[1]}$,
$$\begin{aligned}
P[g_N(x) \ne y] &= P[y = +1,\ y_{[1]} = -1] + P[y = -1,\ y_{[1]} = +1] \\
&= \pi(x) \cdot (1 - \pi(x_{[1]})) + (1 - \pi(x)) \cdot \pi(x_{[1]}) \\
&\to \pi(x) \cdot (1 - \pi(x)) + (1 - \pi(x)) \cdot \pi(x) \\
&= 2\,\pi(x) \cdot (1 - \pi(x)) \\
&\le 2 \min\{\pi(x),\ 1 - \pi(x)\},
\end{aligned}$$
where the last step uses $\max\{\pi(x), 1 - \pi(x)\} \le 1$.
The best you can do is $E^*_{\text{out}}(x) = \min\{\pi(x),\ 1 - \pi(x)\}$, so asymptotically the nearest-neighbor error at $x$ is at most $2E^*_{\text{out}}(x)$.
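A quick numerical sanity check of the last two steps, as a sketch: it only evaluates the limiting formulas on a grid of values $p = \pi(x)$ and does not simulate any data. The helper names `nn_asymptotic_error` and `best_error` are ours.

```python
def nn_asymptotic_error(p: float) -> float:
    """Limit of P[g_N(x) != y] at a point with pi(x) = p, namely 2 p (1 - p)."""
    return 2.0 * p * (1.0 - p)


def best_error(p: float) -> float:
    """E*_out(x) = min{pi(x), 1 - pi(x)}."""
    return min(p, 1.0 - p)


# Largest value of 2p(1-p) - 2 min{p, 1-p} over a grid of p in [0, 1];
# the inequality says it should never be positive.
grid = [i / 1000 for i in range(1001)]
max_gap = max(nn_asymptotic_error(p) - 2.0 * best_error(p) for p in grid)
print(max_gap <= 1e-12)  # True

# A few sample points: p, asymptotic 1-NN error, best possible error.
for p in (0.05, 0.25, 0.5):
    print(p, round(nn_asymptotic_error(p), 4), round(best_error(p), 4))
```

The ratio of the two errors is $2(1 - \pi(x))$ for $\pi(x) \le 1/2$, so the factor of 2 is approached only where $\pi(x)$ is near 0 or 1; at $\pi(x) = 1/2$ both errors equal $1/2$.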
Nearest Neighbor ‘Self-Regularizes’
[Figure: nearest-neighbor decision boundaries for N = 2, 3, 4, 5, 6 data points.]
With few data points, a simple boundary is used. A more complicated boundary is possible only when you have more data points.
Regularization guides you to simpler hypotheses when data quality/quantity is lower.
k-Nearest Neighbor
$$g(x) = \operatorname{sign}\left( \sum_{i=1}^{k} y_{[i]}(x) \right),$$
where $k$ is odd and $y_n = \pm 1$ (an odd number of $\pm 1$ votes has an odd, hence nonzero, sum, so the vote never ties).
[Figure: decision regions of the 1-NN, 21-NN, and 127-NN rules.]
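A minimal sketch of the $k$-NN rule under the slide's assumptions (odd $k$, labels $\pm 1$), assuming Python 3.8+ for `math.dist`; the function name and the 1-D data set with one noisy label are hypothetical:

```python
import math
from typing import List, Sequence


def knn_classify(x: Sequence[float], X: List[Sequence[float]], y: List[int], k: int) -> int:
    """k-NN rule: g(x) = sign(sum of the labels of the k nearest neighbors)."""
    if k % 2 == 0 or not 1 <= k <= len(X):
        raise ValueError("k must be odd and between 1 and N")
    # Indices ordered so that d(x, x_[1]) <= d(x, x_[2]) <= ...;
    # the sort is stable, so exact ties keep their original order.
    order = sorted(range(len(X)), key=lambda n: math.dist(x, X[n]))
    vote = sum(y[n] for n in order[:k])
    # With odd k and labels +/-1 the vote is odd, hence never zero.
    return 1 if vote > 0 else -1


# Hypothetical 1-D data with one noisy label at x = 2.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [-1, -1, +1, -1, +1, +1, +1]
for k in (1, 3, 5):
    print(k, knn_classify([2.1], X, y, k))
```

Here the 1-NN rule copies the noisy label of the point at 2, while $k = 3$ and $k = 5$ outvote it, a small instance of $k$ trading off fitting against overfitting.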
The Role of k
$k$ determines the tradeoff between fitting the data and overfitting the data.
Theorem. For $N \to \infty$, if $k(N) \to \infty$ and $k(N)/N \to 0$, then
$$E_{\text{in}}(g) \to E_{\text{out}}(g) \quad \text{and} \quad E_{\text{out}}(g) \to E^*_{\text{out}}.$$
For example, $k = \lceil \sqrt{N} \rceil$.
3 Ways To Choose k
1. $k = 3$.
2. $k = \lceil \sqrt{N} \rceil$.
3. Validation or cross validation: the $k$-NN rule hypotheses $g_k$ are constructed on the training set and tested on the validation set, and the best $k$ is picked.
[Figure: $E_{\text{out}}$ (%) versus number of data points $N$ (0 to 5000) for $k = 1$, $k = 3$, $k = \lceil \sqrt{N} \rceil$, and $k$ chosen by CV.]
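A sketch of options 2 and 3, assuming Python 3.8+ and a single training/validation split (cross validation would average the validation error over several splits). All function names are ours, and rounding $\lceil \sqrt{N} \rceil$ up to the next odd number is our own assumption so that the $\pm 1$ vote cannot tie.

```python
import math


def knn_classify(x, X, y, k):
    """k-NN rule with +/-1 labels (k odd): sign of the sum of the k nearest labels."""
    order = sorted(range(len(X)), key=lambda n: math.dist(x, X[n]))
    return 1 if sum(y[n] for n in order[:k]) > 0 else -1


def validation_error(k, X_train, y_train, X_val, y_val):
    """Fraction of validation points misclassified by g_k built on the training set."""
    wrong = sum(knn_classify(x, X_train, y_train, k) != t for x, t in zip(X_val, y_val))
    return wrong / len(X_val)


def choose_k_by_validation(X_train, y_train, X_val, y_val, candidates):
    """Option 3: the candidate k with the smallest validation error (first one on ties)."""
    return min(candidates, key=lambda k: validation_error(k, X_train, y_train, X_val, y_val))


def sqrt_rule(N):
    """Option 2: k = ceil(sqrt(N)), bumped to the next odd number (our assumption)."""
    k = math.ceil(math.sqrt(N))
    return k if k % 2 == 1 else k + 1


# Hypothetical data: one noisy training label at x = 2.
X_train = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_train = [-1, -1, +1, -1, +1, +1, +1]
X_val = [[2.1], [5.2]]
y_val = [-1, +1]

print(choose_k_by_validation(X_train, y_train, X_val, y_val, [1, 3, 5]))  # 3
print(sqrt_rule(100))  # 11
```

Because `min` returns the first minimizer, listing candidates in increasing order makes ties go to the smallest $k$.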
Nearest Neighbor is Nonparametric

NN-rule                          Linear Model
no parameters                    (d + 1) parameters
expressive/flexible              rigid, always linear
g(x) needs data                  g(x) needs only weights
generic, can model anything      specialized
Nearest Neighbor Easily Extends to Multiclass
[Figure: two panels showing the digits 0 to 9 in the (Average Intensity, Symmetry) feature space.]
Confusion matrix (rows: true digit, columns: predicted digit; entries in %). 41% accuracy!

True\Pred     0     1     2     3     4     5     6     7     8     9   Total
0          13.5   0.5   0.5     1     0   0.5     0     0   0.5     0    16.5
1           0.5  13.5     0     0     0     0     0     0     0     0      14
2           0.5     0   3.5     1     1   1.5     1     1     0   0.5      10
3           2.5     0   1.5     2   0.5   0.5   0.5   0.5   0.5     1     9.5
4           0.5     0     1   0.5   1.5   0.5     1     2     0   1.5     8.5
5           0.5     0   2.5     1   0.5   1.5     1     1     0   0.5     7.5
6           0.5     0     2     1     1     1     1     1     0     1     8.5
7             0     0   1.5   0.5   1.5   0.5     1     3     0     1       9
8           3.5     0   0.5     1   0.5   0.5   0.5     0   0.5     1       8
9           0.5     0     1     1     1   0.5     1     1   0.5     2     8.5
Total      22.5    14    14     9   7.5     7     7   9.5     2   8.5     100
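A minimal sketch of the multiclass extension, assuming a plain majority vote among the $k$ nearest labels (our choice of rule; the slide does not spell one out) and Python 3.8+. The feature values below are hypothetical stand-ins for (average intensity, symmetry), not the digits data.

```python
import math
from collections import Counter


def knn_multiclass(x, X, y, k):
    """Multiclass k-NN: majority vote among the labels of the k nearest neighbors."""
    order = sorted(range(len(X)), key=lambda n: math.dist(x, X[n]))
    # Counter.most_common() lists equal counts in first-encountered order, and
    # neighbors are counted nearest first, so a tied vote goes to whichever
    # tied label appears first in distance order.
    votes = Counter(y[n] for n in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-feature points for three digit classes 0, 1, 2.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]]
y = [0, 0, 1, 1, 2]
print(knn_multiclass([0.2, 0.3], X, y, 3))  # 0
print(knn_multiclass([9.0, 0.5], X, y, 1))  # 2
```

With more than two classes an odd $k$ no longer rules out ties (for example, three classes with one vote each when $k = 3$), which is why the sketch needs a tie-breaking rule.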
Highlights of k-Nearest Neighbor
1. Simple.
2. No training.
3. Near-optimal $E_{\text{out}}$.
(Items 1 to 3 together: a good method!)
4. Easy to justify the classification to a customer.
5. Can easily do multiclass.
6. Can easily adapt to regression or logistic regression:
$$g(x) = \frac{1}{k} \sum_{i=1}^{k} y_{[i]}(x) \ \ \text{(regression)}, \qquad g(x) = \frac{1}{k} \sum_{i=1}^{k} [\![\, y_{[i]}(x) = +1 \,]\!] \ \ \text{(logistic regression)}.$$
7. Computationally demanding: every query compares $x$ with all $N$ data points. ← we will address this next
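A minimal sketch of item 6, assuming Python 3.8+; the helper names `k_nearest`, `knn_regression`, and `knn_probability` are ours. The second function returns the fraction of the $k$ nearest labels equal to $+1$, an estimate of $\pi(x) = P[y = +1 \mid x]$.

```python
import math


def k_nearest(x, X, k):
    """Indices of the k data points closest to x (Euclidean distance)."""
    return sorted(range(len(X)), key=lambda n: math.dist(x, X[n]))[:k]


def knn_regression(x, X, y, k):
    """g(x) = (1/k) * sum of y_[i](x): average target of the k nearest neighbors."""
    return sum(y[n] for n in k_nearest(x, X, k)) / k


def knn_probability(x, X, y, k):
    """g(x) = (1/k) * (number of the k nearest neighbors with label +1)."""
    return sum(1 for n in k_nearest(x, X, k) if y[n] == +1) / k


# Hypothetical 1-D inputs.
X = [[0.0], [1.0], [2.0], [3.0]]
print(knn_regression([0.1], X, [1.0, 2.0, 3.0, 10.0], k=3))  # 2.0
print(knn_probability([0.1], X, [+1, +1, -1, -1], k=3))  # 2 of the 3 nearest labels are +1
```

Each call to `k_nearest` computes and sorts all $N$ distances, which is exactly the per-query cost flagged in item 7.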