
Approximate Nearest Neighbors



  1. Approximate Nearest Neighbors ● Sariel Har-Peled: Notes ● Arya, Mount, Netanyahu, Silverman, Wu: An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions

  2. Approximate Nearest Neighbors ● What we want – O(n log n) preprocess – O(n) space – O(log n) time query ● Possible in 1 and 2D ● Not really in 3D

  3. Let's Approximate ● Return a point within distance (1+ε)r of the query, where r is the distance to the true nearest neighbor ● Can achieve the bounds in several ways ● First – compute a rough approximation – use it to set the scale for the final solution ● Second – build a tree which solves the problem

  4. Ring Separator Tree [Figure: ring separator tree with points labeled "in" and "out"]

  5. Ring Separator Tree ● Answers (1+4/t)-ANN queries in O(height) time ● Check whether the node's representative is closer than the current closest point; if so, update closest ● Recurse on the correct side of the halfway ball
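
The query procedure on this slide can be sketched roughly as follows. This is a minimal sketch, not the construction from the notes: the `RingNode` layout, its field names, and the convention that a leaf is a node with radius 0 and no children are all assumptions made for illustration. Each node stores a ball of radius `r` around `center`, an empty ring out to `(1 + t) * r`, and a few representatives.

```python
from dataclasses import dataclass
from math import dist, inf
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class RingNode:
    """Hypothetical node: no input point lies between radius r and (1 + t) * r."""
    center: Point
    r: float
    t: float
    reps: Sequence[Point]               # representatives stored at this node
    inner: Optional["RingNode"] = None  # subtree for points inside the ball
    outer: Optional["RingNode"] = None  # subtree for points outside the ring


def ring_query(root: Optional[RingNode], q: Point):
    """Walk one root-to-leaf path, so the cost is O(height) distance checks."""
    best, best_d = None, inf
    node = root
    while node is not None:
        # Check whether a representative beats the current closest point.
        for p in node.reps:
            d = dist(q, p)
            if d < best_d:
                best, best_d = p, d
        # Recurse on the side of the halfway ball that contains q.
        halfway = (1 + node.t / 2) * node.r
        node = node.inner if dist(q, node.center) <= halfway else node.outer
    return best, best_d
```

Only one child is followed per node, which is where the approximation error comes from: the true nearest neighbor may sit on the other side of the ring.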

  6. Error Bounds ● Closest: at least rt/2 ● Returned: at most 2r+rt/2
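
These two bounds combine into the (1+4/t) factor quoted for the ring separator tree. The derivation below reads them (an assumption about the slide's intent) as a lower bound on the distance to the true nearest neighbor and an upper bound on the distance to the returned representative, in the case where the search descends to the wrong side of the halfway ball:

```latex
\[
\frac{d(q, \text{returned})}{d(q, \text{closest})}
\;\le\; \frac{2r + rt/2}{rt/2}
\;=\; \frac{2r}{rt/2} + 1
\;=\; 1 + \frac{4}{t}.
\]
```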

  7. Construction ● Find circle containing n/c points

  8. Construction ● Grid of side L = 16√d ● Number of points ≤ (n/c)·(4L)^d ● Set c = 2(4L)^d ● Ring has at most n/2 points

  9. Construction ● Put ring in largest gap ● Size 2r/n

  10. The Upshot ● Can preprocess in O(n log n) time ● Query time is O(log n) ● (4n+1) approximation! ● Amazingly, this is good enough

  11. Bounded Distance ● Normal quadtree gives O(1/ε^d + log …) ● Why? – The approximation and r eliminate cells smaller than (ε/4)r – Bound the number of cells visited by the last level – Do some algebra to get the bound...

  12. A Complete Algorithm ● Build – a compressed quadtree/finger tree – a ring separator tree ● Compute approximate value, R ● Start from – nodes of size approximately R – and closer than R to query point

  13. Arya and Mount ● O(dn log n) preprocessing time ● O(dn) space ● O(c_{d,ε} log n) query time for ANN – where c_{d,ε} ≤ d(1+6d/ε)^d ● Can find k nearest neighbors ● Any Minkowski metric ● Preprocessing does not depend on ε or metric
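
To get a feel for how quickly the constant in the query bound grows with the dimension, the slide's expression d(1+6d/ε)^d can be evaluated directly. The function name below is hypothetical, and the formula is taken from the slide as written; this is a worst-case bound on the constant, not a measured cost.

```python
def ann_constant_bound(d: int, eps: float) -> float:
    """Evaluate d * (1 + 6d/eps)**d, the slide's upper bound on c_{d,eps}."""
    if d < 1 or eps <= 0:
        raise ValueError("need d >= 1 and eps > 0")
    return d * (1 + 6 * d / eps) ** d


if __name__ == "__main__":
    # The bound is exponential in d, so it is only meaningful in fixed, low dimension.
    for d in (1, 2, 3, 4):
        print(d, ann_constant_bound(d, eps=1.0))
```

For example, d = 2 and ε = 1 give 2 · 13² = 338.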

  14. Overview ● Build BBD tree ● Locate leaf containing q ● Try nearby nodes in order of distance ● Stop when no node is close enough
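
The search order on this slide can be illustrated with a small sketch. It does not build a BBD tree: as a stand-in (an assumption for illustration), the leaf cells are given as a flat list of axis-aligned boxes with their points, and a heap replaces the tree-based enumeration of cells by distance. The function names are hypothetical. The stopping rule is the part being shown: the search ends once the next cell is too far away to beat the current best by more than a factor of 1+ε.

```python
import heapq
from math import dist, inf


def box_distance(q, lo, hi):
    """Euclidean distance from q to the axis-aligned box [lo, hi] (0 if q is inside)."""
    return sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)) ** 0.5


def approx_nn(q, leaves, eps):
    """leaves: list of (lo, hi, points) tuples standing in for BBD-tree leaf cells.

    Returns (best point, its distance, number of cells visited).
    """
    # Order cells by distance from q; the cell containing q has distance 0.
    heap = [(box_distance(q, lo, hi), i) for i, (lo, hi, _) in enumerate(leaves)]
    heapq.heapify(heap)
    best, best_d, visited = None, inf, 0
    while heap:
        cell_d, i = heapq.heappop(heap)
        # Every remaining point is at distance >= cell_d, so if
        # cell_d * (1 + eps) >= best_d the current best is a (1+eps)-ANN.
        if cell_d * (1 + eps) >= best_d:
            break
        visited += 1
        for p in leaves[i][2]:
            d = dist(q, p)
            if d < best_d:
                best, best_d = p, d
    return best, best_d, visited
```

With ε = 0 the loop degenerates to exact search; a larger ε lets it stop earlier, which is the trade-off the query bound quantifies.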

  15. Tree types ● Kd-tree reduces the number of points at each level ● Quadtree reduces cell size ● BBD does both – either Kd-like split – or shrink

  16. Properties ● Bounded aspect ratio – bound number of cells intersecting a volume ● Stickiness – control number of nearby cells ● Inner boxes not cut by children – so everything packs

  17. An Important Trick ● Maintain 3 sorted lists of points (x,y,z) ● Have links between lists ● Allows – removal of first k points in time k – O(d) time determination of min bounding box

  18. Computing Shrinks ● Compute a set of splits – until have n/c in a rectangle – trivially sticky ● Problems – doesn't respect nesting – may have to split many times

  19. Computing Shrinks II ● Always cut min enclosing box – constant time – always removes points – make sure it respects stickiness ● Include parent inner rectangle – go until it is cut out

  20. Computing Shrinks 2 ● More flexible ● Shrink roughly as before

  21. Tweaks ● Collapse trivial splits/shrinks – now no sequence of trivial splits ● Assign one point to each leaf – even to empty shrink cells

  22. Properties ● Bounded occupancy ● Point near each leaf ● Can do point location in O(d log n) time ● Packing constraint ● Distance enumeration

  23. Proof of Packing ● Ball of radius r – intersects (1+6r/s)^d leaves of size s ● Trivial packing argument except for shrinks – use stickiness to replace outer boxes

  24. ANN using BBD ● Number of leaves visited is O((1+6d/ε)^d) ● r is distance to last non-terminating leaf ● r(1+ε)≤dist(q,p) ● Can't have visited cell smaller than rε/d – this cell must have a point closer than r(1+ε) ● Use packing argument from before

  25. Experimental Results ● Choices – shrink only when necessary – leaves held 5-8 points ● Results – Slightly slower than Kd trees for even data – Much faster for clustered data (10x or so) – Slightly slower than Kd trees for surfaces (20%) [Figure: "Surface Data" plot comparing BBD and Kd trees]
