
Fast Nearest Neighbour Classification, Gordon Lesti, July 17, 2015



  1. Fast Nearest Neighbour Classification Gordon Lesti July 17, 2015

  2. Structure
  ◮ Introduction
    ◮ Problem
    ◮ Use
  ◮ Solutions
    ◮ Full Search
    ◮ Orchard's Algorithm
    ◮ Annulus Method
    ◮ AESA
  ◮ Outlook
  ◮ Resources

  3. Nearest-Neighbour Searching
  Input
  ◮ Set U
  ◮ Distance function d on U, with d: U × U → R
  ◮ Set S ⊂ U of size n
  ◮ Query item q ∈ U
  Output
  ◮ Item a ∈ S with d(q, a) ≤ d(q, x) for all x ∈ S

  4. Use
  ◮ Pattern recognition
  ◮ Statistical classification
  ◮ Image editing
  ◮ Coding theory
  ◮ Data compression
  ◮ Recommender systems
  ◮ . . .

  5. Full Search
  ◮ Calculate d(q, x) for all x ∈ S
  ◮ Return a ∈ S with d(q, a) ≤ d(q, x) for all x ∈ S
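The two bullets above can be sketched directly in Python. This is a minimal illustration, not from the slides; the function names and the Euclidean distance are illustrative choices:

```python
import math

def full_search(S, q, d):
    """Return the item of S closest to q under distance d (brute force)."""
    return min(S, key=lambda x: d(q, x))

def euclidean(a, b):
    # math.dist computes the Euclidean distance between two points
    return math.dist(a, b)

# The example data used throughout the slides
S = [(3, 3), (-1, 2), (-4, -4), (0, -1), (4, -3)]
q = (2, 1)
print(full_search(S, q, euclidean))  # (3, 3), i.e. x1
```

Note that `full_search` only needs a distance function, not a metric, which is why Full Search also works in non-metric spaces.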

  6. Full Search Example
  U = R², items:
  x1 = (3, 3), x2 = (−1, 2), x3 = (−4, −4), x4 = (0, −1), x5 = (4, −3)
  Query item q = (2, 1)
  [Figure: the five items and q plotted in the plane]

  7. Full Search Example Result
  d(q, x1) ≈ 2.236
  d(q, x2) ≈ 3.162
  d(q, x3) ≈ 7.810
  d(q, x4) ≈ 2.828
  d(q, x5) ≈ 4.472
  [Figure: same plot; x1 is the nearest neighbour]

  8. Full Search Advantages and disadvantages
  Advantages
  ◮ Easy to implement
  ◮ Works in non-metric spaces
  Disadvantages
  ◮ Slow on large data sets and in high-dimensional spaces

  9. Metric
  Given a set X, a metric on X is a function d: X × X → R, (x, y) ↦ d(x, y), with:
  1. d(x, y) = 0 exactly when x = y
  2. Symmetry: d(x, y) = d(y, x) for all x, y ∈ X
  3. Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X
  [Forster, 2006]
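The three axioms can be spot-checked numerically on a finite sample of points. A small sketch (not from the slides; the helper name and sample are illustrative):

```python
import math
import itertools

def is_metric_on(points, d, tol=1e-9):
    """Spot-check the three metric axioms on a finite sample of points."""
    for x, y in itertools.product(points, repeat=2):
        if (d(x, y) == 0) != (x == y):          # axiom 1: d(x, y) = 0 iff x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:        # axiom 2: symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:   # axiom 3: triangle inequality
            return False
    return True

sample = [(0, 0), (1, 2), (-3, 4), (2, -1)]
print(is_metric_on(sample, math.dist))  # True: Euclidean distance is a metric
```

Squared Euclidean distance, by contrast, fails the triangle inequality on this sample, which is exactly what rules it out for the pruning tricks that follow.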

  10. Triangle inequality
  Lemma: For any q, s, p ∈ U, r ∈ R and P ⊂ U:
  1. |d(p, q) − d(p, s)| ≤ d(q, s) ≤ d(p, q) + d(p, s)
  2. d(q, s) ≥ d_P(q, s) := max_{p ∈ P} |d(p, q) − d(p, s)|
  3. d(p, s) > d(p, q) + r ∨ d(p, s) < d(p, q) − r ⇒ d(q, s) > r
  4. d(p, s) ≥ 2 · d(p, q) ⇒ d(q, s) ≥ d(q, p)
  [Clarkson, 2005]
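Part 2 of the lemma is the lower bound that AESA will exploit later: from known distances to a pivot set P one gets a bound on d(q, s) without computing it. A quick numerical check (illustrative points, not from the slides):

```python
import math

def d(a, b):
    return math.dist(a, b)

# Pivot set P and two points q, s in the plane
P = [(3, 3), (4, -3)]
q, s = (2, 1), (-4, -4)

# d_P(q, s) = max over pivots p of |d(p, q) - d(p, s)|  (lemma, part 2)
d_P = max(abs(d(p, q) - d(p, s)) for p in P)

# The pivot bound never overestimates the true distance
assert d_P <= d(q, s)
print(d_P, d(q, s))
```

The bound only needs distances from the pivots, so it is free once those distances are known.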

  11. Orchard's Algorithm
  ◮ For every item p ∈ S, create a list of all items x ∈ S, sorted ascending by distance to p
  ◮ Choose a random item c ∈ S as initial candidate and calculate d(c, q)
  ◮ Walk along the list of c
  ◮ If the current item has a smaller distance to q than c, make it the new candidate c and continue in its list
  ◮ Abort if
    ◮ the end of the current list is reached, or
    ◮ d(c, s) > 2 · d(c, q) for the current item s of the list (triangle inequality 4)
  ◮ On abort, c is the nearest neighbour
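The steps above can be sketched as follows. This is a minimal illustration, not the original implementation; function names are chosen here:

```python
import math

def orchard_preprocess(S, d):
    """For every p in S, list all other items sorted ascending by distance to p."""
    return {p: sorted((x for x in S if x != p), key=lambda x: d(p, x))
            for p in S}

def orchard_search(S, lists, q, d, c=None):
    """Nearest neighbour of q via Orchard's walk; c is the start candidate."""
    c = c if c is not None else S[0]
    dc = d(c, q)
    improved = True
    while improved:
        improved = False
        for s in lists[c]:
            if d(c, s) > 2 * dc:     # triangle inequality 4: safe to abort
                break
            ds = d(s, q)
            if ds < dc:              # better candidate: switch to its list
                c, dc = s, ds
                improved = True
                break
    return c

S = [(3, 3), (-1, 2), (-4, -4), (0, -1), (4, -3)]
lists = orchard_preprocess(S, math.dist)
# Start at x3 as on slide 15; the walk visits x4, then x1, then aborts
print(orchard_search(S, lists, (2, 1), math.dist, c=(-4, -4)))  # (3, 3)
```

For clarity the sketch looks up d(c, s) via the distance function; in a real implementation those values are stored with the precomputed lists.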

  12. Orchard's Algorithm Example
  U = R², items:
  x1 = (3, 3), x2 = (−1, 2), x3 = (−4, −4), x4 = (0, −1), x5 = (4, −3)
  Query item q = (2, 1)
  [Figure: the five items and q plotted in the plane]

  13. Orchard's Algorithm Example
  Distances:
        x1       x2       x3       x4       x5
  x1    0        ≈4.123   ≈9.899   5        ≈6.083
  x2    ≈4.123   0        ≈6.708   ≈3.162   ≈7.071
  x3    ≈9.899   ≈6.708   0        5        ≈8.062
  x4    5        ≈3.162   5        0        ≈4.472
  x5    ≈6.083   ≈7.071   ≈8.062   ≈4.472   0

  14. Orchard's Algorithm Example
  Lists:
  L(x1) = {x2, x4, x5, x3}
  L(x2) = {x4, x1, x3, x5}
  L(x3) = {x4, x2, x5, x1}
  L(x4) = {x2, x5, x1, x3}
  L(x5) = {x4, x1, x2, x3}

  15. Orchard's Algorithm Example
  ◮ Set c := x3 and s := x4
  ◮ As 7.810 ≈ d(c, q) > d(s, q) ≈ 2.828, set c := s

  16. Orchard's Algorithm Example
  ◮ Set c := x4 and s := x2
  ◮ As 2.828 ≈ d(c, q) < d(s, q) ≈ 3.162, no new c
  ◮ As 3.162 ≈ d(c, s) < 2 · d(c, q) ≈ 5.656, no abort

  17. Orchard's Algorithm Example
  ◮ Set s := x5
  ◮ As 2.828 ≈ d(c, q) < d(s, q) ≈ 4.472, no new c
  ◮ As 4.472 ≈ d(c, s) < 2 · d(c, q) ≈ 5.656, no abort

  18. Orchard's Algorithm Example
  ◮ Set s := x1
  ◮ As 2.828 ≈ d(c, q) > d(s, q) ≈ 2.236, set c := s

  19. Orchard's Algorithm Example
  ◮ Set c := x1 and s := x2
  ◮ As 2.236 ≈ d(c, q) < d(s, q) ≈ 3.162, no new c
  ◮ As 4.123 ≈ d(c, s) < 2 · d(c, q) ≈ 4.472, no abort

  20. Orchard's Algorithm Example
  ◮ Set s := x4
  ◮ As 2.236 ≈ d(c, q) < d(s, q) ≈ 2.828, no new c
  ◮ As 5 ≈ d(c, s) > 2 · d(c, q) ≈ 4.472, abort: c = x1 is the nearest neighbour

  21. Orchard's Algorithm Advantages and disadvantages
  Advantages
  ◮ Faster than Full Search
  Disadvantages
  ◮ Preprocessing needs much memory and runtime: the n sorted lists store O(n²) entries
  Improvement
  ◮ Use mark bits to ensure that no distance is calculated twice

  22. Annulus Method
  ◮ Create a list for one random item p* ∈ S with all items x ∈ S, sorted ascending by distance to p*
  ◮ Choose a random item c ∈ S as initial candidate
  ◮ Walk the list alternating outward in both directions, towards p* and away from it
  ◮ If the current item s has a smaller distance to q than c, set c := s
  ◮ If the current item s lies below the annulus around q, i.e. d(p*, s) < d(p*, q) − d(c, q), ignore all items below s (triangle inequality 3)
  ◮ If it lies above, i.e. d(p*, s) > d(p*, q) + d(c, q), ignore all items above s (triangle inequality 3)
  ◮ c is the nearest neighbour once the entire list is traversed or pruned
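A sketch of the method, assuming a metric space and Python's `bisect` to find the starting position in the sorted list. Names and structure are illustrative, not from the slides:

```python
import math
from bisect import bisect_left

def annulus_search(S, q, d, pivot=None):
    """Nearest neighbour of q via the annulus method around one pivot p*."""
    p = pivot if pivot is not None else S[0]
    # Preprocessing: all items sorted ascending by distance to the pivot
    ranked = sorted(S, key=lambda x: d(p, x))
    dists = [d(p, x) for x in ranked]
    dpq = d(p, q)
    # Start the scan where d(p*, x) is closest to d(p*, q), then move outward
    i = bisect_left(dists, dpq)
    lo, hi = i - 1, i
    c, dc = None, math.inf
    while lo >= 0 or hi < len(ranked):
        if lo >= 0:
            if dists[lo] < dpq - dc:   # triangle inequality 3:
                lo = -1                # everything below is out of reach
            else:
                ds = d(ranked[lo], q)
                if ds < dc:
                    c, dc = ranked[lo], ds
                lo -= 1
        if hi < len(ranked):
            if dists[hi] > dpq + dc:   # triangle inequality 3:
                hi = len(ranked)       # everything above is out of reach
            else:
                ds = d(ranked[hi], q)
                if ds < dc:
                    c, dc = ranked[hi], ds
                hi += 1
    return c

S = [(3, 3), (-1, 2), (-4, -4), (0, -1), (4, -3)]
print(annulus_search(S, (2, 1), math.dist, pivot=(4, -3)))  # (3, 3)
```

Only one sorted list is kept, which is where the memory advantage over Orchard's n lists comes from.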

  23. Annulus Method Example
  U = R², items:
  x1 = (3, 3), x2 = (−1, 2), x3 = (−4, −4), x4 = (0, −1), x5 = (4, −3)
  Query item q = (2, 1)
  [Figure: the five items and q plotted in the plane]

  24. Annulus Method Example
  Distances to x5:
  d(x5, x1) ≈ 6.083, d(x5, x2) ≈ 7.071, d(x5, x3) ≈ 8.062, d(x5, x4) ≈ 4.472, d(x5, x5) = 0

  25. Annulus Method Example
  p* := x5 with d(p*, q) ≈ 4.472
  List: L(x5) = {x5, x4, x1, x2, x3}

  26. Annulus Method Example
  ◮ Set c := x2 with d(c, q) ≈ 3.162
  ◮ Set s := x3 with d(s, q) ≈ 7.810
  ◮ As 8.062 ≈ d(p*, s) > d(p*, q) + d(c, q) ≈ 7.634, ignore all items above x3

  27. Annulus Method Example
  ◮ Set s := x1 with d(s, q) ≈ 2.236
  ◮ As d(s, q) < d(c, q), set c := s

  28. Annulus Method Example
  ◮ Set c := x1 with d(c, q) ≈ 2.236
  ◮ Set s := x4 with d(s, q) ≈ 2.828
  ◮ As 4.472 ≈ d(p*, s) > d(p*, q) − d(c, q) ≈ 2.236, ignore no items

  29. Annulus Method Example
  ◮ Set s := x5 with d(s, q) ≈ 4.472 > 2.236 ≈ d(c, q), no new c
  ◮ End of list: c = x1 is the nearest neighbour of q

  30. Annulus Method Advantages and disadvantages
  Advantages
  ◮ Faster than Full Search
  ◮ Less memory usage than Orchard's Algorithm

  31. AESA
  Approximating and Eliminating Search Algorithm
  ◮ Precompute a matrix with all distances d(x, y) for x, y ∈ S
  ◮ Every item is always in exactly one state:
    ◮ Known: d(x, q) is known
    ◮ Unknown: only the lower bound d_P(x, q) is known
    ◮ Rejected: d_P(x, q) is greater than the smallest known distance r
  ◮ Initially, all x ∈ S are Unknown and d_P(x, q) = −∞
  ◮ Repeat until every x ∈ S is Known or Rejected:
    1. Choose the Unknown item x ∈ S with the smallest d_P(x, q)
    2. Calculate d(x, q), so that x becomes Known
    3. Update the smallest known distance r
    4. Set P := P ∪ {x}; for every Unknown x′, update d_P(x′, q) and mark x′ as Rejected if d_P(x′, q) > r
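The loop above can be sketched as follows. This is an illustration, not a reference implementation: the lower bounds start at 0 rather than −∞ (equivalent, since distances are non-negative), and for brevity the d(x, y) values are recomputed here instead of read from the precomputed matrix:

```python
import math

def aesa(S, q, d):
    """Nearest neighbour of q, spending as few distance computations to q as possible."""
    lower = {x: 0.0 for x in S}   # d_P lower bounds, P initially empty
    unknown = set(S)
    best, r = None, math.inf
    calls = 0                     # count of d(., q) evaluations
    while unknown:
        # 1. choose the Unknown item with the smallest lower bound
        x = min(unknown, key=lower.get)
        unknown.discard(x)
        # 2. compute its true distance: x becomes Known
        dxq = d(x, q)
        calls += 1
        # 3. update the smallest known distance r
        if dxq < r:
            best, r = x, dxq
        # 4. tighten bounds with x as a new pivot (lemma, part 2); reject hopeless items
        for y in list(unknown):
            lower[y] = max(lower[y], abs(d(x, y) - dxq))  # d(x, y): matrix lookup
            if lower[y] > r:
                unknown.discard(y)                        # y becomes Rejected
        return_value = (best, calls)
    return return_value

S = [(3, 3), (-1, 2), (-4, -4), (0, -1), (4, -3)]
print(aesa(S, (2, 1), math.dist))
```

Rejection only uses the valid lower bound from the triangle-inequality lemma, so the returned item is always the true nearest neighbour, typically after far fewer than n calls to d(., q).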

  32. LAESA
  Linear Approximating and Eliminating Search Algorithm
  ◮ Works with a fixed set of pivot items instead of the full distance matrix
  ◮ Works best if the pivot items are well separated from each other

  33. Outlook ◮ Metric trees ◮ . . .

  34. Resources
  ◮ Otto Forster, 2006, Analysis 2, Friedr. Vieweg & Sohn Verlag
  ◮ Kenneth L. Clarkson, 2005, Nearest-Neighbor Searching and Metric Space Dimensions, http://kenclarkson.org/nn_survey/p.pdf
