  1. Data-dependent Hashing for Nearest Neighbor Search
     Alex Andoni (Columbia University)
     Based on joint work with: Piotr Indyk, Huy Nguyen, Ilya Razenshteyn

  2. Nearest Neighbor Search (NNS)
     - Preprocess: a set P of points
     - Query: given a query point q, report a point p* ∈ P with the smallest distance to q
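
To fix notation, here is the problem statement as a brute-force baseline (a sketch assuming the points sit in a NumPy array; this is the "linear scan" of slide 4, not an algorithm from the talk):

```python
import numpy as np

# Baseline sketch: given the set P and a query q, report the point p* in P
# with the smallest distance to q, by brute-force linear scan.

def nearest_neighbor(P, q):
    dists = np.linalg.norm(P - q, axis=1)  # distance from q to every point of P
    return P[np.argmin(dists)]             # p*: the closest point

# Toy usage on random data (dataset size and dimension are arbitrary choices).
P = np.random.default_rng(0).standard_normal((1000, 16))
q = np.zeros(16)
p_star = nearest_neighbor(P, q)
```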

  3. Motivation
     - Generic setup:
       - points model objects (e.g. images)
       - distance models a (dis)similarity measure
     - Application areas: machine learning (k-NN rule), speech/image/video/music recognition, vector quantization, bioinformatics, etc.
     - Distances: Hamming, Euclidean, edit distance, earthmover distance, etc.
     - Core primitive: closest pair, clustering, etc.

  4. Curse of Dimensionality
     - All exact algorithms degrade rapidly with the dimension d:

       Algorithm                   Query time          Space
       Full indexing               d · log^O(1) n      n^O(d)  (Voronoi diagram size)
       No indexing (linear scan)   O(n · d)            O(n · d)

  5. Approximate NNS
     - c-approximate r-near neighbor: given a query point q, report a point p′ ∈ P s.t. ‖p′ − q‖ ≤ cr, as long as there is some point within distance r of q
     - Practice: use for exact NNS
       - filtering: gives a set of candidates (hopefully small); see the sketch below
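
The filtering step can be made concrete with a small sketch: assuming some approximate structure has already produced a candidate list (the `candidates` argument stands in for that output, an assumption for illustration), exact distances are verified and the closest candidate is returned.

```python
import numpy as np

# Minimal sketch of the "filtering" use mentioned above: verify exact
# distances over a (hopefully small) candidate set and keep the best one.

def filter_candidates(P, q, candidates):
    best_idx, best_dist = None, np.inf
    for idx in candidates:
        dist = np.linalg.norm(P[idx] - q)  # exact distance check
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist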

  6. NNS algorithms
     - Exponential dependence on dimension: [Arya-Mount'93], [Clarkson'94], [Arya-Mount-Netanyahu-Silverman-Wu'98], [Kleinberg'97], [Har-Peled'02], [Arya-Fonseca-Mount'11], …
     - Linear/poly dependence on dimension: [Kushilevitz-Ostrovsky-Rabani'98], [Indyk-Motwani'98], [Indyk'98, '01], [Gionis-Indyk-Motwani'99], [Charikar'02], [Datar-Immorlica-Indyk-Mirrokni'04], [Chakrabarti-Regev'04], [Panigrahy'06], [Ailon-Chazelle'06], [A.-Indyk'06], [A.-Indyk-Nguyen-Razenshteyn'14], [A.-Razenshteyn'15], [Pagh'16], [Laarhoven'16], …

  7. Locality-Sensitive Hashing [Indyk-Motwani'98]
     - Random hash function h on R^d satisfying:
       - for a close pair (‖q − p‖ ≤ r): P1 = Pr[h(q) = h(p)] is "high" ("not-so-small")
       - for a far pair (‖q − p′‖ > cr): P2 = Pr[h(q) = h(p′)] is "small"
     - Use several hash tables: n^ρ of them, where ρ = log(1/P1) / log(1/P2) (a minimal sketch follows below)
     (figure: Pr[h(q) = h(p)] as a function of ‖q − p‖, dropping from P1 at distance r to P2 at distance cr)
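
To make the hashing scheme concrete, below is a minimal sketch of bit-sampling LSH for Hamming space, in the spirit of [IM'98]; the parameters k, L and all helper names are illustrative assumptions rather than values specified in the talk.

```python
import random
from collections import defaultdict

# Each hash function concatenates k randomly sampled coordinates, and
# L independent hash tables are built.

def make_hash(d, k, rng):
    coords = [rng.randrange(d) for _ in range(k)]  # k random bit positions
    return lambda x: tuple(x[i] for i in coords)   # concatenated bit sample

def build_tables(points, d, k=20, L=10, seed=0):
    rng = random.Random(seed)
    hashes = [make_hash(d, k, rng) for _ in range(L)]
    tables = [defaultdict(list) for _ in range(L)]
    for idx, p in enumerate(points):
        for h, table in zip(hashes, tables):
            table[h(p)].append(idx)                # bucket the point in every table
    return hashes, tables

def query(q, points, hashes, tables, r):
    # Scan every point colliding with q in some table; report one within
    # Hamming distance r, if any (an r-near neighbor).
    for h, table in zip(hashes, tables):
        for idx in table[h(q)]:
            if sum(a != b for a, b in zip(points[idx], q)) <= r:
                return idx
    return None
```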

  8. LSH Algorithms (the "c = 2" column gives the exponent at approximation c = 2)

                        Space     Time   Exponent    c = 2     Reference
       Hamming space    n^(1+ρ)   n^ρ    ρ = 1/c     ρ = 1/2   [IM'98]
                                         ρ ≥ 1/c               [MNP'06, OWZ'11]
       Euclidean space  n^(1+ρ)   n^ρ    ρ = 1/c     ρ = 1/2   [IM'98, DIIM'04]
                                         ρ ≈ 1/c²    ρ = 1/4   [AI'06]
                                         ρ ≥ 1/c²              [MNP'06, OWZ'11]
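
The ρ = 1/c row for Hamming space can be sanity-checked numerically for the bit-sampling scheme sketched above; the values of d, r, c below are arbitrary illustrative choices.

```python
import math

# For one sampled bit, P1 = 1 - r/d and P2 = 1 - cr/d, and the exponent
# rho = log(1/P1) / log(1/P2) approaches 1/c as r/d gets small.
d, r, c = 1000, 10, 2
P1 = 1 - r / d                             # collision prob. for a close pair
P2 = 1 - c * r / d                         # collision prob. for a far pair
rho = math.log(1 / P1) / math.log(1 / P2)
print(rho)                                 # ~0.4975, close to 1/c = 0.5
```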

  9. LSH is tight… what's next?
     - Lower bounds (cell probe): [A.-Indyk-Patrascu'06, Panigrahy-Talwar-Wieder'08,'10, Kapralov-Panigrahy'12]
     - Space-time trade-offs: [Panigrahy'06, A.-Indyk'06]
     - Datasets with additional structure: [Clarkson'99, Karger-Ruhl'02, Krauthgamer-Lee'04, Beygelzimer-Kakade-Langford'06, Indyk-Naor'07, Dasgupta-Sinha'13, Abdullah-A.-Krauthgamer-Kannan'14, …]
     - But are we really done with basic NNS algorithms?

  10. Beyond Locality-Sensitive Hashing (the "c = 2" column gives the exponent at approximation c = 2)

                                Space     Time   Exponent          c = 2         Reference
        Hamming space    LSH    n^(1+ρ)   n^ρ    ρ = 1/c           ρ = 1/2       [IM'98]
                                                 ρ ≥ 1/c                         [MNP'06, OWZ'11]
                                n^(1+ρ)   n^ρ    complicated       ρ = 1/2 − ϵ   [AINR'14]
                                                 ρ ≈ 1/(2c − 1)    ρ = 1/3       [AR'15]
        Euclidean space  LSH    n^(1+ρ)   n^ρ    ρ ≈ 1/c²          ρ = 1/4       [AI'06]
                                                 ρ ≥ 1/c²                        [MNP'06, OWZ'11]
                                n^(1+ρ)   n^ρ    complicated       ρ = 1/4 − ϵ   [AINR'14]
                                                 ρ ≈ 1/(2c² − 1)   ρ = 1/7       [AR'15]

  11. New approach? Data-dependent hashing
      - A random hash function, chosen after seeing the given dataset
      - Efficiently computable

  12. Construction of hash function
      - Two components:
        - nice geometric structure (has better LSH)
        - reduction to such structure (data-dependent)
      - Like a (weak) "regularity lemma" for a set of points

  13. Nice geometric structure: average-case
      - Think: random dataset on a sphere (of radius ≈ cr/√2)
        - vectors essentially perpendicular to each other
        - so random points are at distance ≈ cr
      - Lemma: ρ = 1/(2c² − 1), via Cap Carving (a toy illustration follows below)
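
As a rough illustration of what a cap-carving partition of the sphere looks like (an idealized stand-in, not the talk's exact procedure or parameters), one can carve caps with random Gaussian directions:

```python
import numpy as np

# For a unit vector x, each inner product <g, x> with a standard Gaussian
# direction g is standard normal, so the threshold tau controls the cap size.

def cap_carving_hash(x, directions, tau):
    x = x / np.linalg.norm(x)          # work on the unit sphere
    for i, g in enumerate(directions):
        if np.dot(g, x) >= tau:        # x falls into the i-th spherical cap
            return i
    return len(directions)             # leftover bucket (rarely used)

rng = np.random.default_rng(0)
d, num_caps, tau = 128, 300, 1.5
directions = rng.standard_normal((num_caps, d))  # one Gaussian direction per cap

# Nearby points on the sphere tend to land in the same cap:
x = rng.standard_normal(d)
y = x + 0.1 * rng.standard_normal(d)             # small perturbation of x
print(cap_carving_hash(x, directions, tau), cap_carving_hash(y, directions, tau))
```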

  14. Reduction to nice structure
      - Idea: iteratively decrease the radius of the minimum enclosing ball
      - Algorithm (a simplified sketch follows below):
        - find dense clusters (with smaller radius; a large fraction of points); recurse on the dense clusters
        - apply cap carving on the rest; recurse on each "cap" (e.g., dense clusters might reappear)
      - Why is this ok? If there are no dense clusters, the set behaves like a "random dataset" of radius 100cr (even better!)
      - Why is this ok? The radius shrinks, e.g. from 100cr to 99cr
      (*picture not to scale & dimension)
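
The slide's recursion can be written out as the following simplified, runnable sketch; the dense-cluster test, the cap-carving split, and every constant are toy stand-ins chosen for illustration, not the construction's actual parameters or guarantees.

```python
import numpy as np

# Illustrative constants only (not from the paper).
DENSE_FRAC, SHRINK, TAU = 0.2, 0.05, 1.0
LEAF_SIZE, MIN_RADIUS = 8, 1e-3

def build(points, radius, rng):
    """Return a tree of the form ('leaf', points) or ('node', children)."""
    if len(points) <= LEAF_SIZE or radius <= MIN_RADIUS:
        return ('leaf', points)
    children, rest = [], points
    # 1) Peel off dense clusters of noticeably smaller radius; recurse on them.
    while len(rest) > LEAF_SIZE:
        center = rest[rng.integers(len(rest))]
        close = np.linalg.norm(rest - center, axis=1) <= (1 - SHRINK) * radius
        if close.sum() < DENSE_FRAC * len(points):
            break                                  # no (more) dense clusters found
        children.append(build(rest[close], (1 - SHRINK) * radius, rng))
        rest = rest[~close]
    # 2) Cap-carve the remaining "random-looking" points; recurse on each part.
    if len(rest) > 0:
        g = rng.standard_normal(points.shape[1])
        proj = (rest - rest.mean(axis=0)) @ g
        threshold = TAU * proj.std() if proj.std() > 0 else 0.0
        for part in (rest[proj >= threshold], rest[proj < threshold]):
            if 0 < len(part) < len(rest):
                children.append(build(part, radius, rng))
            elif len(part) == len(rest):
                children.append(('leaf', part))    # degenerate carve: stop here
    return ('node', children)

rng = np.random.default_rng(0)
tree = build(rng.standard_normal((500, 32)), radius=10.0, rng=rng)
```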

  15. Hash function
      - Described by a tree (like a hash table)
      (figure: recursive partition of a ball of radius 100cr; *picture not to scale & dimension)

  16. Dense clusters
      - Current dataset: radius R
      - A dense cluster:
        - contains n^(1−δ) points
        - has smaller radius: (1 − Ω(ϵ²)) R
      - After we remove all clusters:
        - for any point on the surface, there are at most n^(1−δ) points within distance (√2 − ϵ) R
        - the other points are essentially orthogonal!
        - (ϵ and δ are trade-off parameters)
      - When applying Cap Carving with parameters (P1, P2):
        - empirical number of far points colliding with the query: n·P2 + n^(1−δ)
        - as long as n·P2 ≫ n^(1−δ), the "impurity" doesn't matter!

  17. Tree recap
      - During query:
        - recurse in all clusters
        - just in one bucket in CapCarving
        - will look in >1 leaf!
      - How much branching? Claim: at most (n^δ + 1)^O(1/ϵ²)
        - each time we branch: at most n^δ clusters (+1)
        - a cluster reduces the radius by Ω(ϵ²), so cluster-depth is at most 100/Ω(ϵ²)
      - Progress in 2 ways (the δ trade-off):
        - clusters reduce the radius
        - CapCarving nodes reduce the number of far points (empirical P2)
      - A tree succeeds with probability ≥ n^(−1/(2c²−1) − o(1)) (see the numeric comparison below)
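
A small numeric comparison, derived only from the exponents quoted on these slides, of how many independent repetitions the two approaches need; the dataset size n below is an arbitrary illustrative choice.

```python
# A single tree succeeds with probability about n^(-1/(2c^2 - 1)), so roughly
# n^(1/(2c^2 - 1)) independent trees are needed, versus about n^(1/c^2) hash
# tables for classical Euclidean LSH (slides 7 and 8).
n, c = 10**9, 2
trees_data_dependent = n ** (1 / (2 * c**2 - 1))  # n^(1/7)  ~ 19
tables_classical_lsh = n ** (1 / c**2)            # n^(1/4)  ~ 178
print(trees_data_dependent, tables_classical_lsh)
```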

  18. Beyond "Beyond LSH"
      - Practice: often optimize the partition to your dataset
        - PCA-tree, spectral hashing, etc. [S91, McN01, VKD09, WTF08, …]
        - no guarantees (performance or correctness)
      - Theory: assume special structure in the dataset
        - low intrinsic dimension [KR'02, KL'04, BKL'06, IN'07, DS'13, …]
        - structure + noise [Abdullah-A.-Krauthgamer-Kannan'14]
      - Data-dependent hashing helps even when there is no a priori structure!

  19. Data-dependent hashing wrap-up
      - Dynamicity? Dynamization techniques [Overmars-van Leeuwen'81]
      - Better bounds?
        - For dimension d = O(log n), can get a better ρ! [Laarhoven'16]
        - For d > log^(1+δ) n: our ρ is optimal even for data-dependent hashing! [A.-Razenshteyn '??]
          - in the right formalization (to rule out the Voronoi diagram): description complexity of the hash function is n^(1−Ω(1))
      - Practical variant [A.-Indyk-Laarhoven-Razenshteyn-Schmidt'15]
      - NNS for ℓ∞
        - [Indyk'98] gets approximation O(log log d) (poly space, sublinear query time)
        - cf., ℓ∞ has no non-trivial sketch!
        - some matching lower bounds in the relevant model [ACP'08, KP'12]
        - can be thought of as data-dependent hashing
      - NNS for any norm (e.g., matrix norms, EMD)?
