
Approximate Nearest Neighbor Problem: Improving Query Time, CS468



  1. Approximate Nearest Neighbor Problem: Improving Query Time CS468, 10/9/2006


  3. Outline
     • Reducing the "constant" from $O(\epsilon^{-d})$ to $O(\epsilon^{-(d-1)/2})$ in query time
     • Need to know $\epsilon$ ahead of time
       – Preprocessing time and storage feature $O(\epsilon^{-d})$, $O(\epsilon^{-(d-1)/2})$, etc.
     • Timothy M. Chan. Approximate Nearest Neighbor Queries Revisited. Discrete and Computational Geometry 1998.
       – Decomposition of space into cones
       – BBD-tree for range searching in $\mathbb{R}^{d-k}$ + point location in $\mathbb{R}^{k}$
     • Kenneth Clarkson. An Algorithm for Approximate Closest-point Queries. SoCG 1994.
       – Additional $\log(\rho/\epsilon)$ in space complexity
       – Polytope approximation in $\mathbb{R}^{d+1}$


  5. Chan's Algorithm: Motivation
     • $(1+\epsilon)$-ANN among (sorted) points in a narrow cone: $O(\log n)$ by binary search
     • Need a data structure that returns sorted points given $q$ and a cone direction
     • Uses the BBD-tree data structure: given a query point $q \in \mathbb{R}^d$ and a radius $r$, one can find $O(\log n)$ cells of the BBD-tree whose union contains $B(q, r)$ and is contained in $B(q, 2r)$. This takes $O(\log n)$ time.
     • Use for approximate range searching in $\mathbb{R}^{d-1}$
     [Figure: query point $q$ at the apex of a narrow cone]

  6. Conic ANN (with a Hint)
     Input: query point $q$ and a 2-approximation $r$ to the NN distance
     Output: a point $s$ such that $\|q - s\| \le (1+\epsilon)\,\|q - p\|$, where $p$ is the NN inside a cone with apex $q$ and angle $\delta = \sqrt{\epsilon/16}$
     Note: $s$ need not be in the cone!
     Note: the cone is fixed (not a part of the input, modulo translation to $q$)
     [Figure: cone with apex $q$, angle $\delta$ and radius $r$, with points $p$ and $s$]


  10. Main $(1+\epsilon)$-ANN Algorithm
      Uses the "conic-ANN with a hint" as a subroutine
      Query (given only $q$)
      • Obtain $r$ by [Arya and Mount 1998]
      • Get one point per data structure, return the one closest to $q$
      Preprocessing
      • "Tile" $\mathbb{R}^d$ with $O(\epsilon^{-(d-1)/2})$ "floating" cones of angle $\delta = \Theta(\sqrt{\epsilon})$
      • Build a "conic-ANN" data structure for each cone
      Correctness
      • The cone that contains the true NN $p$ returns a $(1+\epsilon)$-ANN $s$ from that cone's data structure
      Query time
      • $O(\epsilon^{-(d-1)/2} \log n)$ = [# of cones] × [conic query]
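The "one candidate per cone, return the closest" structure of the query can be illustrated with a small 2D sketch. This is not the slides' data structure: the per-cone conic-ANN structures are replaced by a brute-force scan, the cones are plain angular sectors around $q$, and the choice $\delta = \sqrt{\epsilon}$ (constant factor 1) as well as the names `cone_index` and `main_query` are assumptions made for illustration.

```python
import math

def cone_index(q, p, num_cones):
    """Index of the angular sector (a 2D cone with apex q) that contains p."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    # Guard against the modulo rounding up to exactly 2*pi.
    return min(int(angle / (2 * math.pi / num_cones)), num_cones - 1)

def main_query(points, q, eps):
    """Collect one candidate per cone, then return the candidate closest to q."""
    delta = math.sqrt(eps)  # assumed constant in delta = Theta(sqrt(eps))
    num_cones = max(1, math.ceil(2 * math.pi / delta))
    best_per_cone = {}
    for p in points:  # brute-force stand-in for the per-cone data structures
        c = cone_index(q, p, num_cones)
        if c not in best_per_cone or math.dist(q, p) < math.dist(q, best_per_cone[c]):
            best_per_cone[c] = p
    return min(best_per_cone.values(), key=lambda p: math.dist(q, p))
```

Because the stand-in returns the exact nearest site in each sector, the final minimum is the true nearest neighbor; with real conic-ANN structures each candidate would only be a $(1+\epsilon)$-approximation for its cone.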


  14. Conic-ANN Data Structure
      For preprocessing, given only the direction of the cone (wlog: the $d$-axis) and the angle $\delta$
      Query Algorithm (given $q$ and $r$)
      • Approximate range query on the set of projections $\{p' = [p_1\ p_2\ \cdots\ p_{d-1}]^T,\ p \in P\}$ with $B(q, \delta r)$: returns $O(\log n)$ BBD-nodes (cells) in $O(\log n)$ time
      • $O(\log n)$ binary searches
      • Return the point $s$ such that $|s_d - q_d|$ is minimal
      Correctness (proof for $\|q - s\| \le (1+\epsilon)\,\|q - p\|$)
      $$|s_d - q_d| \le |p_d - q_d| \le \|p - q\|$$
      $$\|s' - q'\| \le 2\delta r \le 4\delta\,\|p - q\|$$
      $$\|s - q\| \le \sqrt{1 + 16\delta^2}\;\|p - q\| \le (1+\epsilon)\,\|p - q\|$$
      Data structure
      • BBD-tree on the projection set
      • For every tree node $v$, the associated list of points is sorted in the $d$-th coordinate
      [Figure: cone along the $d$-axis with apex $q$, angle $\delta$ and radius $r$; projected balls of radius $\delta r$ and $2\delta r$; points $p$ and $s$]
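A minimal sketch of the conic query, assuming a brute-force filter in place of the BBD-tree: the filter keeps every site whose projection lies within $2\delta r$ of the projection of $q$, which is one admissible answer to the approximate range query (it covers the ball of radius $\delta r$ and stays inside the ball of radius $2\delta r$). The names `build` and `conic_query` are hypothetical, and a single sorted list stands in for the per-node sorted lists.

```python
import bisect
import math

def build(points):
    """Preprocessing stand-in: sort the sites by their last (d-th) coordinate."""
    return sorted(points, key=lambda p: p[-1])

def conic_query(sorted_pts, q, r, delta):
    """Return the site in the approximate range with minimal |s_d - q_d|, or None."""
    q_proj = q[:-1]
    # Brute-force stand-in for the O(log n) BBD-tree cells; order by d-th coordinate is preserved.
    cell = [p for p in sorted_pts if math.dist(p[:-1], q_proj) <= 2 * delta * r]
    if not cell:
        return None
    keys = [p[-1] for p in cell]
    i = bisect.bisect_left(keys, q[-1])    # binary search on the d-th coordinate
    neighbours = cell[max(i - 1, 0): i + 1]  # the sites just below and just above q_d
    return min(neighbours, key=lambda p: abs(p[-1] - q[-1]))
```

In the slides' structure the binary search is repeated once per returned BBD-node, which is where the $O(\log n)$ binary searches come from; the sketch collapses them into one search over the filtered list.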

  15. Conic-ANN Analysis
      Construction (preprocessing)
      • BBD-tree $O(n \log n)$ + sorting $O(n \log n)$ = $O(n \log n)$
      Query
      • Approximate range query $O(\log n)$ + binary searches $O(\log^2 n)$ = $O(\log^2 n)$
      Improving query time by exploiting correlation [Lueker and Willard]
      [Figure: an $O(\log n)$ search at node $v$, then $O(1)$ work at each of the $O(\log n)$ nodes reached through left($v$) and right($v$)]

  16. Summary and Remarks
      Variant with projecting to $d-2$ dimensions
      • BBD-tree + planar point location
      Rough ($\approx d^{3/2}$) approximation algorithms
      • Polynomial dependence on $d$


  19. Clarkson's Algorithm: Iterative Improvement
      Exact nearest neighbor problem
      Data structure
      • For each site $s$, a (small) list $L_s$ of other sites such that for any query point $q$: if $s$ is not the nearest neighbor of $q$, then $L_s$ contains a site closer to $q$
      Algorithm
        s ← arbitrary site
        while ∃ t ∈ L_s : ||t − q|| < ||s − q|| do
          s ← t
        return s
      Note: the same $L_s$ is valid for all $q$!
      [Figure: site $s$ with query points $q$ and $q'$]
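The iterative-improvement loop can be tried out on the real line, where a valid choice of $L_s$ is simply the left and right neighbors of $s$ in sorted order. This 1D setting, the assumption that sites are distinct, and the names `build_lists` and `iterative_nn` are illustrative choices, not part of the slides.

```python
def build_lists(sites):
    """For distinct 1D sites, L_s holds the sorted-order neighbours of s."""
    order = sorted(sites)
    lists = {}
    for i, s in enumerate(order):
        lists[s] = order[max(i - 1, 0):i] + order[i + 1:i + 2]
    return lists

def iterative_nn(lists, start, q):
    """Move to a strictly closer site from L_s until none exists."""
    s = start
    while True:
        closer = [t for t in lists[s] if abs(t - q) < abs(s - q)]
        if not closer:
            return s  # no site in L_s is closer, so s is the nearest neighbour
        s = closer[0]
```

Starting from the leftmost site with a query near the right end, the walk steps through every site in between, which is a small instance of the lack of a progress guarantee in the exact version.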


  23. Not Useful for Exact NN
      Reason 1: space complexity $\Omega(n^2)$
      • For all $s$, $L_s$ has to include all Delaunay neighbors of $s$
      • For $d > 2$, the Delaunay triangulation may have $\Omega(n^2)$ edges
      • Proof: if $t$ is a Delaunay neighbor of $s$ but $t \notin L_s$, there is a query $q$ for which $t$ is the only site closer to $q$ than $s$
      Reason 2: query time $\Omega(n)$
      • No "sufficient progress" guarantee, may have to visit all sites
      Conclusion
      • No improvement over the trivial algorithm!
      [Figures: sites $s$ and $t$ with query $q$ and point $c$; sites $s_1, \ldots, s_5$ around a query $q$]


  25. Modification for ANN
      Data structure
      • For each site $s$, a (small) list $L_s$ of other sites such that for any query point $q$: if $s$ is not a $(1+\epsilon)$-ANN of $q$, then $L_s$ contains a site $(1+\epsilon/2)$-closer to $q$, i.e. a site $t$ with $\|q - t\| \le \|q - s\| / (1+\epsilon/2)$
      Algorithm (simple version)
        s ← arbitrary site
        while ∃ t ∈ L_s : ||q − t|| ≤ ||q − s|| / (1 + ε/2) do
          s ← t
        return s
      [Figure: balls around $q$ of radii $\|q - s\|/(1+\epsilon)$, $\|q - s\|/(1+\epsilon/2)$ and $\|q - s\|$, with sites $s$, $t$ and a point $b$]
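A minimal sketch of the simple version, assuming distinct sites and the trivially valid (but not small) choice of $L_s$ as all other sites. Constructing small lists is the actual content of Clarkson's method and is not attempted here; `all_other_sites` and `ann_walk` are hypothetical names.

```python
import math

def all_other_sites(sites):
    """Trivially valid lists: L_s contains every other site (not small!)."""
    return {s: [t for t in sites if t != s] for s in sites}

def ann_walk(lists, start, q, eps):
    """Jump to any site that is (1 + eps/2)-closer to q until none exists."""
    s = start
    while True:
        threshold = math.dist(q, s) / (1 + eps / 2)
        better = [t for t in lists[s] if math.dist(q, t) <= threshold]
        if not better:
            return s  # by the list property, s is a (1 + eps)-ANN of q
        s = better[0]
```

Each jump shrinks the distance to $q$ by a factor of at least $1+\epsilon/2$, so, unlike in the exact version, the number of jumps is bounded in terms of the ratio between the starting distance and the nearest-neighbor distance.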


  29. Query Algorithm
      Skip list approach [Arya and Mount 1993]
      [Figure: levels $R_3$, $R_2$, $R_1$ stacked above $R_0 = S$]
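The levels $R_0 = S, R_1, R_2, \ldots$ can be sketched as nested random samples. The sampling rate of 1/2 per level is the usual skip-list convention and is an assumption here, since these slides do not state it; how the query descends through the levels is also not shown above, so the sketch only builds them. `build_levels` is a hypothetical name.

```python
import random

def build_levels(sites, rng):
    """Return [R_0, R_1, ...] with R_0 = S and each R_{i+1} a random sample of R_i."""
    levels = [list(sites)]
    while len(levels[-1]) > 1:
        # Assumed rate: keep each site of the previous level with probability 1/2.
        sample = [s for s in levels[-1] if rng.random() < 0.5]
        if not sample:
            sample = [levels[-1][0]]  # keep at least one site so the top level is non-empty
        levels.append(sample)
    return levels
```

For example, `build_levels(range(16), random.Random(0))` yields a short chain of shrinking subsets ending in a single site, with an expected number of levels logarithmic in the number of sites under the assumed rate.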
