  1. Approximate Nearest Line Search in High Dimensions – Sepideh Mahabadi

  2–5. The NLS Problem
• Given: a set L of n lines in ℝ^d
• Goal: build a data structure s.t.
 – given a query point q, it finds the closest line ℓ* to q
 – polynomial space
 – sub-linear query time
• Approximation: find an approximate closest line ℓ, i.e., dist(q, ℓ) ≤ (1 + ε) · dist(q, ℓ*)
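
For reference, here is a minimal brute-force solution to the exact NLS problem: a linear scan over the lines with no data structure at all. It is only a sketch to fix the setting; the (anchor point, unit direction) line representation and the function names are illustrative, not taken from the slides.

    import numpy as np

    def dist_point_to_line(q, a, u):
        """Distance from point q to the line {a + t*u : t in R}; u must be a unit vector."""
        w = q - a
        # Subtract the component of w along the line's direction; what remains is orthogonal.
        return np.linalg.norm(w - np.dot(w, u) * u)

    def nearest_line_brute_force(q, lines):
        """Exact nearest line by linear scan over (a, u) pairs: O(n*d) time, no preprocessing."""
        return min(lines, key=lambda au: dist_point_to_line(q, au[0], au[1]))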

  6. BACKGROUND: Nearest Neighbor Problems, Motivation, Previous Work, Our Result, Notation

  7–9. Nearest Neighbor Problem
• NN: Given a set P of n points, build a data structure s.t. given a query point q, it finds the closest point p* to q.
• Applications: databases, information retrieval, pattern recognition, computer vision
 – Features: dimensions
 – Objects: points
 – Similarity: distance between points
• Current solutions suffer from the "curse of dimensionality":
 – either the space or the query time is exponential in d
 – little improvement over a linear scan

  10–11. Approximate Nearest Neighbor (ANN)
• ANN: Given a set P of n points, build a data structure s.t. given a query point q, it finds an approximate closest point p to q, i.e., dist(q, p) ≤ (1 + ε) · dist(q, p*)
• There exist data structures with different tradeoffs. Example:
 – Space: d · n^{O(1/ε²)}
 – Query time: (d · log n / ε)^{O(1)}
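
The guarantee above is easy to state as a predicate. A small sketch, assuming points are numpy vectors; the exact linear scan is only used to verify a candidate answer, which is precisely the work a real ANN data structure avoids.

    import numpy as np

    def is_valid_ann_answer(q, p, points, eps):
        """True iff dist(q, p) <= (1 + eps) * dist(q, p*), with p* found by exact linear scan."""
        d_exact = min(np.linalg.norm(q - x) for x in points)
        return np.linalg.norm(q - p) <= (1.0 + eps) * d_exact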

  12–14. Motivation for NLS
• One of the simplest generalizations of ANN: data items are represented by k-flats (affine subspaces) instead of points
• Models data under linear variations
• Handles unknown or unimportant parameters in the database
• Example:
 – varying the light-gain parameter of images
 – each image/point becomes a line
 – search for the closest line to the query image
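
One natural reading of the light-gain example, assuming a purely multiplicative gain model in which a gain g maps an image x to g·x: each database image then sweeps the line through the origin with direction x, and the query image is compared against these lines. The representation below is an assumption for illustration, not something specified on the slides.

    import numpy as np

    def image_to_line(x):
        """Under a multiplicative gain g*x, the image x sweeps the line {t*x : t in R};
        represent it as (anchor point, unit direction), anchored at the origin."""
        return np.zeros_like(x), x / np.linalg.norm(x)

A query image q can then be answered, e.g., with the brute-force scan sketched earlier over [image_to_line(x) for x in images], or with the data structure described in the rest of the talk.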

  15–17. Previous and Related Work
• Magen [02]: Nearest Subspace Search, for constant k
 – query time is fast: (d + log n + 1/ε)^{O(1)}
 – space is super-polynomial: 2^{(log n)^{O(1)}}
• Dual problem: the database is a set of points and the query is a k-flat
 – [AIKN], for 1-flats (line queries): for any t > 0,
   query time O(d³ · n^{0.5+t}) and space d² · n^{O(1/ε² + 1/t²)}
 – Very recently, [MNSS] extended this to general k-flats, again with query time sub-linear in n (exponent roughly k/(k+1-σ) + t) and polynomial space

  18–21. Our Result
• We give a randomized algorithm that, for any sufficiently small ε, reports a (1 + ε)-approximate solution with high probability
 – Space: (n + d)^{O(1/ε²)}
 – Query time: (d + log n + 1/ε)^{O(1)}
• Matches, up to polynomial factors, the performance of the best algorithms for ANN; no exponential dependence on d
• The first algorithm with poly-logarithmic query time and polynomial space for objects other than points
• Only uses reductions to ANN

  22–26. Notation
• L: the set of lines, of size n
• q: the query point
• B(c, r): the ball of radius r around c
• dist: the Euclidean distance between objects
• angle: the angle, defined between two lines
• α-close: two lines ℓ, ℓ′ are α-close if sin(angle(ℓ, ℓ′)) ≤ α
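
A small sketch of the α-close test, assuming each line is given by a unit direction vector; sin of the angle is computed from the absolute cosine, with the sign ignored because lines are undirected.

    import numpy as np

    def sin_angle(u, v):
        """Sine of the (acute) angle between two lines with unit directions u and v."""
        c = abs(np.dot(u, v))                  # |cos(angle)|
        return np.sqrt(max(0.0, 1.0 - c * c))

    def alpha_close(u, v, alpha):
        """Two lines are alpha-close if sin(angle(u, v)) <= alpha."""
        return sin_angle(u, v) <= alpha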

  27. MODULES: Net Module, Unbounded Module, Parallel Module

  28–30. Net Module
• Intuition: sampling points from each line finely enough to get a set of points P, and building an ANN(P, ε) data structure, should suffice to find the approximate closest line.
• Lemma: let x be the separation parameter, i.e., the distance between two adjacent samples on a line. Then
 – either the returned line ℓ_p is an approximate closest line,
 – or dist(q, ℓ_p) ≤ x/ε
• Issue: the sampling can only be used inside a bounded region
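
A sketch of the net-module idea, restricted to a bounded ball B(0, R): sample every line at separation x inside the ball, index the samples, and answer a query with the line that owns the nearest sample. The radius R and the plain linear scan (standing in for ANN(P, ε)) are assumptions of this sketch, not the construction of the talk.

    import numpy as np

    def build_net(lines, x, R):
        """Sample each line (a, u) at spacing x and keep the samples inside B(0, R).
        Returns the sample array and, for each sample, the index of its line."""
        samples, owner = [], []
        ts = np.arange(-R, R + x, x)
        for i, (a, u) in enumerate(lines):
            pts = a + np.outer(ts, u)
            keep = np.linalg.norm(pts, axis=1) <= R
            samples.append(pts[keep])
            owner.extend([i] * int(keep.sum()))
        return np.vstack(samples), np.array(owner)

    def query_net(q, samples, owner):
        """Return the index of the line owning the sample nearest to q (ANN stand-in)."""
        return owner[int(np.argmin(np.linalg.norm(samples - q, axis=1)))]

The sampling is what forces the bounded region in the first place: without a bound such as R, a single line would need infinitely many samples.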

  31–33. Unbounded Module – Intuition
• All lines in L pass through the origin o
• Data structure:
 – project all lines onto any sphere S(o, r) to get a point set P
 – build an ANN data structure ANN(P, ε)
• Query algorithm:
 – project the query onto S(o, r) to get q′
 – find the approximate closest point to q′, i.e., p = ANN_P(q′)
 – return the line corresponding to p

  34. Unbounded Module
• All lines in L pass through a small ball B(o, r)
• The query is far enough away, i.e., outside of a larger ball B(o, R)
• Use the same data structure and query algorithm
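
A sketch of the unbounded module for the all-lines-through-the-origin case: each line meets the sphere S(o, r) in two antipodal points, which together form the point set P, and the query is projected radially onto the same sphere. As before, a linear scan stands in for the ANN(P, ε) data structure of the slides.

    import numpy as np

    def build_unbounded(directions, r=1.0):
        """Lines through the origin o = 0, given by unit direction vectors.
        Each line contributes its two intersection points with the sphere S(0, r)."""
        P, owner = [], []
        for i, u in enumerate(directions):
            P.extend([r * u, -r * u])
            owner.extend([i, i])
        return np.array(P), np.array(owner)

    def query_unbounded(q, P, owner, r=1.0):
        """Project q radially onto S(0, r), find the nearest point of P, return its line."""
        qp = r * q / np.linalg.norm(q)
        return owner[int(np.argmin(np.linalg.norm(P - qp, axis=1)))]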
