Approximate Nearest Neighbor Problem: Improving Query Time CS468, 10/9/2006

Outline
• Reducing the "constant" in query time from $O(\epsilon^{-d})$ to $O(\epsilon^{-(d-1)/2})$
• Need to know ε ahead of time
  – Preprocessing time and storage feature $O(\epsilon^{-d})$, $O(\epsilon^{-(d-1)/2})$, etc.
• Timothy M. Chan. Approximate Nearest Neighbor Queries Revisited. Discrete and Computational Geometry, 1998.
  – Decomposition of space into cones
  – BBD-tree for range searching in $\mathbb{R}^{d-k}$ + point location in $\mathbb{R}^k$
• Kenneth Clarkson. An Algorithm for Approximate Closest-point Queries. SoCG 1994.
  – Additional $\log(\rho/\epsilon)$ in space complexity
  – Polytope approximation in $\mathbb{R}^{d+1}$

Chan's Algorithm: Motivation
• (1 + ε)-ANN among (sorted) points in a narrow cone around q: O(log n) by binary search
• Need a data structure that returns the sorted points, given q and a cone direction
Uses the BBD-tree data structure
• Given a query point $q \in \mathbb{R}^d$ and a radius r, one can find O(log n) cells of the BBD-tree whose union contains $B(q, r)$ and is contained in $B(q, 2r)$. This takes O(log n) time.
• Use for approximate range searching in $\mathbb{R}^{d-1}$

Conic ANN (with a Hint)
Input: Query point q and a 2-approximation r to the NN distance
Output: A point s such that $\|q - s\| \le (1 + \epsilon)\|q - p\|$, where p is the NN inside a cone with apex q and angle $\delta = \sqrt{\epsilon/16}$
Note: s need not be in the cone!
Note: The cone is fixed (not a part of the input, modulo translation to q)

Main (1 + ε)-ANN Algorithm
Uses the "conic-ANN with a hint" as a subroutine
Query (given only q)
• Obtain r by [Arya and Mount 1998]
• Get one point per data structure; return the one closest to q
Preprocessing "floating"
• "Tile" $\mathbb{R}^d$ with $O(\epsilon^{-(d-1)/2})$ cones of angle $\delta = \Theta(\sqrt{\epsilon})$
• Build a "conic-ANN" data structure for each cone
Correctness
[Figure: q, its true NN p, and the (1 + ε)-ANN s returned from the data structure of the cone containing p]
Query time
$O(\epsilon^{-(d-1)/2} \log n)$ = [# of cones] × [conic query]
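
To make the query loop concrete, here is a minimal sketch for the plane (d = 2). It rests on several assumptions. Each cone is treated as a double cone around an axis direction `theta`, since the conic structure minimizes $|s_d - q_d|$ on both sides. The conic-ANN structure is replaced by a brute-force scan. The hint r is computed by brute force as 1.5 times the true NN distance, standing in for [Arya and Mount 1998]; any r with $d(q,P) \le r \le 2\,d(q,P)$ would do. All function names are hypothetical. With axes spaced at most $2\delta$ apart, every direction lies within $\delta$ of some axis, which gives $O(1/\delta) = O(\epsilon^{-1/2})$ cones, i.e. $O(\epsilon^{-(d-1)/2})$ for d = 2.

```python
import math


def to_cone_frame(p, theta):
    """Coordinates of p in the orthonormal basis (u_perp, u), where
    u = (cos theta, sin theta) is the cone axis.
    Returns (projection onto u_perp, coordinate along the axis)."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = p
    return (x * s - y * c, x * c + y * s)


def build_index(points, eps):
    """Preprocessing: m double-cone axes spaced pi/m <= 2*delta apart,
    so every direction lies within angle delta of some axis."""
    delta = math.sqrt(eps / 16)
    m = math.ceil(math.pi / (2 * delta))
    cones = []
    for k in range(m):
        theta = k * math.pi / m
        framed = [(to_cone_frame(p, theta), p) for p in points]
        cones.append((theta, framed))
    return delta, cones


def conic_query(theta, framed, q, r, delta):
    """Brute-force stand-in for one conic-ANN structure: among points whose
    projection lies within delta*r of q's, return the one minimizing the
    distance to q along the cone axis (or None if there is none)."""
    qx, qy = to_cone_frame(q, theta)
    best = None
    for (x, y), p in framed:
        if abs(x - qx) <= delta * r and (best is None or abs(y - qy) < best[0]):
            best = (abs(y - qy), p)
    return None if best is None else best[1]


def ann_query(points, index, q):
    delta, cones = index
    # Stand-in hint (assumption): brute-force NN distance scaled by 1.5.
    nearest = min(points, key=lambda p: math.dist(q, p))
    r = 1.5 * math.dist(q, nearest)
    if r == 0:
        return nearest  # q coincides with a site
    # One candidate per cone; return the one closest to q.
    candidates = [conic_query(t, f, q, r, delta) for t, f in cones]
    return min((c for c in candidates if c is not None),
               key=lambda c: math.dist(q, c))
```

Only the cone whose axis is within $\delta$ of the direction to the true NN has to return a good point. Taking the closest of all returned points then keeps the $(1 + \epsilon)$ guarantee.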

Conic-ANN Data Structure
For preprocessing, we are given only the direction of the cone (wlog: the d-axis) and the angle δ
Query Algorithm (given q and r)
• Approximate range query on the set of projections $\{p' = [p_1\ p_2\ \cdots\ p_{d-1}]^T : p \in P\}$ with $B(q', \delta r)$
  – returns O(log n) BBD-nodes (cells) in O(log n) time; their union contains $B(q', \delta r)$ and is contained in $B(q', 2\delta r)$
• O(log n) binary searches along the d-axis
• Return the point s such that $|s_d - q_d|$ is minimum
Correctness (proof that $\|q - s\| \le (1 + \epsilon)\|q - p\|$)
• $|s_d - q_d| \le |p_d - q_d| \le \|p - q\|$
• $\|s' - q'\| \le 2\delta r \le 4\delta\|p - q\|$ (since $r \le 2\|p - q\|$)
• $\|s - q\| \le \sqrt{1 + 16\delta^2}\,\|p - q\| = \sqrt{1 + \epsilon}\,\|p - q\| \le (1 + \epsilon)\|p - q\|$ for $\delta = \sqrt{\epsilon/16}$
Data structure
• BBD-tree on the projection set
• For every tree node v, the associated list of points is sorted by the d-th coordinate
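
For d = 2 the projection set lives in $\mathbb{R}^1$. As an assumption for illustration, the BBD-tree can then be replaced by a segment tree over the points sorted by x, whose nodes keep their points sorted by y (a "merge-sort tree"). The sketch below makes two further simplifications:
• it uses the exact interval $[q_x - \delta r, q_x + \delta r]$ instead of the approximate ball query;
• it collects the O(log n) canonical nodes covering that interval and does one binary search for $q_y$ per node.

This gives O(n log n) preprocessing and O(log² n) query time, the same bounds as the Conic-ANN analysis before exploiting correlation. The class name `ConicANN2D` is hypothetical.

```python
import bisect
import heapq
import math


class ConicANN2D:
    """Sketch for d = 2: a segment tree over points sorted by projection x
    (stand-in for the BBD-tree); each node stores its points as (y, x) pairs
    sorted by y, the cone-axis (d-th) coordinate."""

    def __init__(self, points):
        self.pts = sorted(points)                  # sorted by projection x
        self.xs = [x for x, _ in self.pts]
        self.n = len(self.pts)
        self.tree = [[] for _ in range(4 * max(self.n, 1))]
        if self.n:
            self._build(1, 0, self.n - 1)

    def _build(self, node, lo, hi):
        if lo == hi:
            x, y = self.pts[lo]
            self.tree[node] = [(y, x)]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid)
        self._build(2 * node + 1, mid + 1, hi)
        # Merge the children's y-sorted lists: O(n) per level, O(n log n) total.
        self.tree[node] = list(heapq.merge(self.tree[2 * node], self.tree[2 * node + 1]))

    def _cover(self, node, lo, hi, i, j, out):
        """Collect the O(log n) canonical nodes covering ranks i..j."""
        if j < lo or hi < i:
            return
        if i <= lo and hi <= j:
            out.append(self.tree[node])
            return
        mid = (lo + hi) // 2
        self._cover(2 * node, lo, mid, i, j, out)
        self._cover(2 * node + 1, mid + 1, hi, i, j, out)

    def query(self, q, r, delta):
        """Among points with |p_x - q_x| <= delta*r, return one minimizing
        |p_y - q_y|, or None if the interval is empty."""
        qx, qy = q
        i = bisect.bisect_left(self.xs, qx - delta * r)
        j = bisect.bisect_right(self.xs, qx + delta * r) - 1
        if i > j:
            return None
        nodes = []
        self._cover(1, 0, self.n - 1, i, j, nodes)
        best = None
        for lst in nodes:                          # one binary search per node
            k = bisect.bisect_left(lst, (qy, -math.inf))
            for c in (k - 1, k):                   # neighbors of q_y in this list
                if 0 <= c < len(lst):
                    y, x = lst[c]
                    if best is None or abs(y - qy) < abs(best[1] - qy):
                        best = (x, y)
        return best
```

For example, `ConicANN2D([(0, 5), (0.1, 1), (3, 0.5), (-0.05, -2)]).query((0, 0), 2, 0.1)` returns `(0.1, 1)`. The interval [−0.2, 0.2] excludes (3, 0.5), and among the remaining points (0.1, 1) has the smallest $|y - 0|$.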

Conic-ANN Analysis
Construction (preprocessing): BBD-tree O(n log n) + sorting O(n log n) = O(n log n)
Query: approximate range query O(log n) + binary searches O(log² n) = O(log² n)
Improving query time by exploiting correlation [Lueker and Willard]
[Figure: one O(log n) search at node v, then O(1) work at each of the remaining O(log n) nodes, passing from v to left(v) and right(v)]

Summary and Remarks
• Variant with projecting to d − 2 dimensions
  – BBD-tree + planar point location
• Rough ($\approx d^{3/2}$) approximation algorithms
  – Polynomial dependence on d

Clarkson's Algorithm: Iterative Improvement
Exact nearest neighbor problem
Data structure
For each site s, a (small) list $L_s$ of other sites such that for any query point q, if s is not the nearest neighbor of q, then $L_s$ contains a site closer to q
Algorithm
s ← arbitrary site
while $\exists\, t \in L_s : \|t - q\| < \|s - q\|$ do $s \leftarrow t$
return s
Note: The same $L_s$ is valid for all q!
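
A minimal sketch of the exact walk follows. It assumes sites are coordinate tuples and that the lists are given as a dict `lists` mapping a site index to the indices in $L_s$; the names are hypothetical. It uses the trivially valid choice $L_s$ = all other sites. That choice satisfies the invariant but, as the next slide argues, is no better than a linear scan. Building small valid lists is the hard part and is not shown.

```python
import math


def iterative_improvement(sites, lists, q, start=0):
    """Exact version: move to any t in L_s that is strictly closer to q.
    Terminates because the distance to q strictly decreases at every move."""
    s = start
    moved = True
    while moved:
        moved = False
        for t in lists[s]:
            if math.dist(sites[t], q) < math.dist(sites[s], q):
                s, moved = t, True
                break
    return s


sites = [(0, 0), (2, 1), (5, 5), (1, 3)]
# Trivially valid lists (assumption for illustration): every other site.
lists = {i: [j for j in range(len(sites)) if j != i] for i in range(len(sites))}
print(iterative_improvement(sites, lists, (1.8, 1.2)))  # prints 1
```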

Not Useful for Exact NN
Reason 1: space complexity $\Omega(n^2)$
• For all s, $L_s$ has to include all Delaunay neighbors of s
• For d > 2, the Delaunay triangulation may have $\Omega(n^2)$ edges
• Proof: suppose t is a Delaunay neighbor of s but $t \notin L_s$. Then there is a query point q for which t is the only site closer to q than s, so $L_s$ contains no site closer to q.
Reason 2: query time $\Omega(n)$
• No "sufficient progress" guarantee; the walk may have to visit all sites
[Figure: a query q and a walk through sites $s_1, \dots, s_5$]
Conclusion: No improvement over the trivial algorithm!

Modification for ANN
Data structure
For each site s, a (small) list $L_s$ of other sites such that for any query point q, if s is not a $(1 + \epsilon)$-ANN of q, then $L_s$ contains a site t that is $(1 + \epsilon/2)$-closer to q, i.e., $\|q - t\| \le \|q - s\|/(1 + \epsilon/2)$
[Figure: around q, the radii $\|q - s\|$, $\|q - s\|/(1 + \epsilon)$ and $\|q - s\|/(1 + \epsilon/2)$]
Algorithm (simple version)
s ← arbitrary site
while $\exists\, t \in L_s : \|q - t\| \le \|q - s\|/(1 + \epsilon/2)$ do $s \leftarrow t$
return s
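
The ANN walk differs from the exact one only in its test: a move requires progress by a factor of $1 + \epsilon/2$. Below is a minimal sketch, again with the trivially valid lists (all other sites) and hypothetical names. With those lists the walk stops only when no site is $(1 + \epsilon/2)$-closer, so the returned site is within a factor $1 + \epsilon/2 \le 1 + \epsilon$ of the NN distance. The move counter shows that each step shrinks the distance to q by at least that factor.

```python
import math


def ann_walk(sites, lists, q, eps, start=0):
    """Simple ANN version of the walk. Returns (site index, number of moves).
    Each move shrinks the distance to q by a factor of at least 1 + eps/2."""
    s, moves = start, 0
    while math.dist(q, sites[s]) > 0:  # a site at q is already exact
        threshold = math.dist(q, sites[s]) / (1 + eps / 2)
        nxt = next((t for t in lists[s] if math.dist(q, sites[t]) <= threshold), None)
        if nxt is None:
            break
        s, moves = nxt, moves + 1
    return s, moves


# Points on a spiral; trivially valid lists (assumption for illustration).
sites = [(math.cos(0.4 * i) * i, math.sin(0.4 * i) * i) for i in range(50)]
lists = {i: [j for j in range(len(sites)) if j != i] for i in range(len(sites))}
s, moves = ann_walk(sites, lists, (3.0, -2.0), eps=0.5)
```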

Query Algorithm
Skip list approach [Arya and Mount 1993]
[Figure: levels $R_3, R_2, R_1, R_0 = S$]
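
The slides show only the level structure. Below is a minimal sketch of building it. It assumes the usual skip-list construction, in which each site of $R_i$ is kept in $R_{i+1}$ independently with probability 1/2. That yields nested levels, and $O(\log n)$ levels in expectation. How the query then uses the levels is also an assumption not spelled out here: presumably it starts at the top level and uses the answer found in $R_{i+1}$ as the starting site in $R_i$. The function name `build_levels` is hypothetical.

```python
import random


def build_levels(sites, seed=None):
    """R_0 = S; each site of R_i survives into R_(i+1) independently with
    probability 1/2, stopping before a level would become empty."""
    rng = random.Random(seed)
    levels = [list(sites)]
    while len(levels[-1]) > 1:
        survivors = [s for s in levels[-1] if rng.random() < 0.5]
        if not survivors:
            break
        levels.append(survivors)
    return levels


levels = build_levels(range(1000), seed=7)
print([len(level) for level in levels])  # sizes roughly halve level by level
```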
