Graph-based time–space trade-offs for approximate near neighbors

Thijs Laarhoven, mail@thijs.com, http://thijs.com/. SoCG 2018, Budapest, Hungary (June 13, 2018).


  1. Graph-based time–space trade-offs for approximate near neighbors. Thijs Laarhoven, mail@thijs.com, http://thijs.com/. SoCG 2018, Budapest, Hungary (June 13, 2018)

  2. Nearest neighbor searching

  3. Nearest neighbor problem – Problem description

  4. Nearest neighbor problem – Problem description

  5. Nearest neighbor problem – Problem description

  6. Nearest neighbor problem – Approximate solutions (figure label: r)

  7. Nearest neighbor problem – Approximate solutions (figure labels: r, c · r)

  8. Nearest neighbor problem – Example: Voronoi cells

  9. Nearest neighbor problem – Example: Voronoi cells

  10. Nearest neighbor problem – Example: Voronoi cells

  11. Partition-based methods

  12. Partition-based methods – Data structure

  13. Partition-based methods – Data structure

  14. Partition-based methods – Hash table lookups

  15. Partition-based methods – Hash table lookups

  16. Partition-based methods – Hash table lookups

  17. Partition-based methods – Hash table lookups

  18. Partition-based methods – Near the boundaries

  19. Partition-based methods – Near the boundaries

  20. Partition-based methods – Near the boundaries

  21. Partition-based methods – Randomizations

  22. Partition-based methods – Randomizations

  23. Partition-based methods – Randomizations

  24–27. Partition-based methods – Challenges
      Main problem: choosing the best types of space partitions.
      • Requires an efficient decoding algorithm;
      • Space partitions should have nice shapes.
      Utopia: disjoint spheres lying on an efficiently decodable code or lattice.
      Real world: approximate the ideal solution as best as we can.
      • Product of bisections; [Cha03]
      • Voronoi cells induced by hypercube; [TT07, Laa16]
      • Random (overlapping) spheres; [AI06, AINR14]
      • Voronoi cells induced by cross-polytopes; [TT07, AIL+15, KW17]
      • Voronoi cells induced by (pseudo)random points. [BDGL16, ALRW17, Chr17]
      Best techniques are theoretically optimal as well as practical.
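
The partition-based pipeline (randomized partitions, hash table lookups, several independent tables) can be sketched as follows. This is a minimal illustration that assumes the simplest partition named on the slides, a product of random hyperplane bisections; the function names (`make_hyperplanes`, `cell_of`, `build_tables`, `query`) and the parameters `k` and `num_tables` are hypothetical and not taken from the talk.

```python
import random
from collections import defaultdict

def make_hyperplanes(k, dim, rng):
    """Draw k random Gaussian normal vectors; each one bisects the space."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]

def cell_of(point, hyperplanes):
    """Decode a point to its cell: one sign bit per bisection."""
    return tuple(
        sum(h_i * x_i for h_i, x_i in zip(h, point)) >= 0.0
        for h in hyperplanes
    )

def build_tables(data, num_tables, k, rng):
    """Build several independently randomized partitions (hash tables)."""
    dim = len(data[0])
    tables = []
    for _ in range(num_tables):
        planes = make_hyperplanes(k, dim, rng)
        buckets = defaultdict(list)
        for idx, p in enumerate(data):
            buckets[cell_of(p, planes)].append(idx)
        tables.append((planes, buckets))
    return tables

def query(q, data, tables):
    """Look up the query's cell in every table and return the closest candidate."""
    candidates = set()
    for planes, buckets in tables:
        candidates.update(buckets.get(cell_of(q, planes), []))
    if not candidates:
        return None  # the query fell into empty cells in all tables
    return min(
        candidates,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(data[i], q)),
    )
```

Using more tables raises the chance that a near neighbor lying close to a cell boundary shares a cell with the query in at least one table, at the cost of more space; that is the time–space trade-off the randomization slides refer to.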

  28–29. Nearest neighbor methods – Practice (ANN Benchmarks [ABF17])
      [Figure: queries per second (log scale) against recall rate for bruteforce-blas, rpforest, nearpy, hnsw(nmslib), fmann, falconn, faiss-lsh, faiss-ivf, faiss-gpu, DolphinnPy, dolphinn, bruteforce0(nmslib), BallTree(nmslib), annoy, and SW-graph(nmslib).]

  30. Graph-based methods

  31. Graph-based methods – Data structure

  32. Graph-based methods – Data structure

  33. Graph-based methods – Greedy algorithm

  34. Graph-based methods – Greedy algorithm

  35. Graph-based methods – Greedy algorithm

  36. Graph-based methods – Greedy algorithm

  37. Graph-based methods – Greedy algorithm

  38. Graph-based methods – Greedy algorithm

  39. Graph-based methods – Greedy algorithm

  40. Graph-based methods – Greedy algorithm

  41. Graph-based methods – Greedy algorithm

  42. Graph-based methods – Greedy algorithm

  43. Graph-based methods – Greedy algorithm

  44. Graph-based methods – Greedy algorithm

  45. Graph-based methods – Local solutions

  46. Graph-based methods – Local solutions

  47. Graph-based methods – Local solutions

  48. Graph-based methods – Local solutions

  49. Graph-based methods – Local solutions

  50. Graph-based methods – Local solutions

  51. Graph-based methods – Local solutions

  52. Graph-based methods – Randomizations

  53. Graph-based methods – Randomizations

  54. Graph-based methods – Randomizations

  55. Graph-based methods – Randomizations

  56. Graph-based methods – Randomizations

  57. Graph-based methods – Randomizations

  58–61. Graph-based methods – Challenges
      Main problem: designing the graph.
      • Intuitively: connect near neighbors for gradual progress;
      • Avoiding local minima: add a few long edges;
      • Hierarchical graphs: long edges in upper layers, short edges in bottom layers.
      Practically, graph-based methods are very efficient as well.
      Theoretically, little is known about the performance of these methods.
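
The greedy algorithm, its local solutions, and the randomization step from the preceding slides can be sketched as below. This is an illustrative version under stated assumptions: the graph is a brute-force k-nearest-neighbor graph with no long edges or hierarchy, and the names `build_knn_graph`, `greedy_walk`, and `search`, as well as the parameters `k` and `restarts`, are hypothetical rather than taken from the talk.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_knn_graph(data, k):
    """Brute-force near neighbor graph: node i links to its k closest points."""
    graph = []
    for i, p in enumerate(data):
        others = sorted(
            (j for j in range(len(data)) if j != i),
            key=lambda j: sq_dist(p, data[j]),
        )
        graph.append(others[:k])
    return graph

def greedy_walk(q, data, graph, start):
    """Step to the neighbor closest to q until no neighbor is strictly closer."""
    current = start
    while True:
        best = min(graph[current], key=lambda j: sq_dist(data[j], q))
        if sq_dist(data[best], q) >= sq_dist(data[current], q):
            return current  # local minimum: possibly not the true nearest neighbor
        current = best

def search(q, data, graph, restarts, rng):
    """Randomized greedy search: restart from random nodes, keep the best result."""
    results = [
        greedy_walk(q, data, graph, rng.randrange(len(data)))
        for _ in range(restarts)
    ]
    return min(results, key=lambda i: sq_dist(data[i], q))
```

Each walk ends because the distance to the query strictly decreases at every step. More restarts lower the chance of returning a poor local minimum at the cost of query time, while a larger `k` increases space.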

  62. Graph-based methods – Contributions
      Theorem (Main result, informal). For randomized greedy walks on the near neighbor graph and for “random” data sets, we can solve the approximate nearest neighbor problem on $n$ points with query time $O(n^{\rho_q})$ and space $O(n^{1 + \rho_s})$, with $\rho_q, \rho_s \geq 0$ satisfying
      $$(2c^2 - 1)\,\rho_q + 2c^2(c^2 - 1)\sqrt{\rho_s(1 - \rho_s)} \geq c^4.$$
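
To get a feel for the trade-off, the inequality can be solved for the smallest admissible query exponent given an approximation factor and a space exponent. The helper below is a sketch: the name `min_query_exponent` is hypothetical, and the range check on `rho_s` (between 0 and 1, so that the square root is defined) is an assumption, since the slide does not state the valid parameter range.

```python
import math

def min_query_exponent(c, rho_s):
    """Smallest rho_q >= 0 satisfying
    (2c^2 - 1) rho_q + 2c^2 (c^2 - 1) sqrt(rho_s (1 - rho_s)) >= c^4.
    """
    if c < 1.0:
        raise ValueError("approximation factor c must be at least 1")
    if not 0.0 <= rho_s <= 1.0:
        # Assumed range: keeps the square root real.
        raise ValueError("rho_s must lie in [0, 1]")
    c2 = c * c
    remainder = c2 * c2 - 2.0 * c2 * (c2 - 1.0) * math.sqrt(rho_s * (1.0 - rho_s))
    return max(0.0, remainder / (2.0 * c2 - 1.0))

if __name__ == "__main__":
    # Tabulate the boundary of the trade-off for one approximation factor.
    for rho_s in (0.0, 0.1, 0.25, 0.5):
        print(rho_s, min_query_exponent(1.2, rho_s))
```

For fixed $c$, the returned exponent decreases as `rho_s` grows from 0 to 1/2, which is the time–space trade-off in the title: more space buys faster queries.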

  63–64. Graph-based methods – Contributions
      In the most common regime of $c \approx 1$ (high recall rate) and $\rho_s \approx 0$ (near-linear space), this scales equivalently as the best partition-based trade-offs: [ALRW17]
      $$\rho_q = 1 - 4(c - 1)\sqrt{\rho_s} \cdot (1 + o(1)). \qquad (1)$$
      Positive result: greedy algorithm already “optimal” for $c \approx 1$ and $\rho_s \approx 0$.
      Negative result: (analysis of) this algorithm is not competitive for $c \gg 1$ or $\rho_s \gg 0$.
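
The asymptotic expression for $\rho_q$ can be recovered from the main theorem by taking the inequality with equality and expanding around $c = 1$. The derivation below is a sketch: the substitution $c = 1 + \varepsilon$ is introduced here for illustration, and the last step assumes that $\varepsilon$ is small relative to $\sqrt{\rho_s}$, which the slides do not state explicitly.

```latex
\begin{align*}
\rho_q
  &= \frac{c^4 - 2c^2(c^2 - 1)\sqrt{\rho_s(1 - \rho_s)}}{2c^2 - 1}
   = 1 + \frac{(c^2 - 1)^2}{2c^2 - 1}
       - \frac{2c^2(c^2 - 1)}{2c^2 - 1}\sqrt{\rho_s(1 - \rho_s)} \\
  % Substitute c = 1 + \varepsilon, so c^2 - 1 = 2\varepsilon + O(\varepsilon^2)
  % and 2c^2 - 1 = 1 + O(\varepsilon); also \sqrt{\rho_s(1-\rho_s)} = \sqrt{\rho_s}\,(1 + O(\rho_s)).
  &= 1 + O(\varepsilon^2)
       - 4\varepsilon\sqrt{\rho_s}\,\bigl(1 + O(\varepsilon) + O(\rho_s)\bigr) \\
  &= 1 - 4(c - 1)\sqrt{\rho_s}\,\bigl(1 + o(1)\bigr)
     \qquad \text{when } \varepsilon = o\bigl(\sqrt{\rho_s}\bigr),\ \rho_s \to 0.
\end{align*}
```

The first line uses the identity $c^4 - (2c^2 - 1) = (c^2 - 1)^2$, which also shows that the bound gives $\rho_q \geq 1$ at $\rho_s = 0$.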
