SLIDE 1

Data-dependent Hashing for Nearest Neighbor Search

Alex Andoni (Columbia University)

Based on joint work with: Piotr Indyk, Huy Nguyen, Ilya Razenshteyn

SLIDE 2

Nearest Neighbor Search (NNS)

 Preprocess: a set 𝑃 of 𝑛 points
 Query: given a query point π‘ž, report a point π‘βˆ— ∈ 𝑃 with the smallest distance to π‘ž

π‘Ÿ π‘žβˆ—

2
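A minimal linear-scan baseline for the problem as stated (an illustrative sketch, not from the slides; it assumes points live in Euclidean space as rows of a NumPy array):

```python
import numpy as np

def nearest_neighbor(P, q):
    """Exact NNS by linear scan: O(n*d) work per query.

    P: (n, d) array of data points, q: (d,) query point.
    Returns the index of the closest point and its distance to q.
    """
    dists = np.linalg.norm(P - q, axis=1)   # distance from q to every point
    i = int(np.argmin(dists))
    return i, float(dists[i])

# toy usage
P = np.random.randn(1000, 64)
q = np.random.randn(64)
idx, dist = nearest_neighbor(P, q)
```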

SLIDE 3

Motivation

 Generic setup:

 Points model objects (e.g. images)
 Distance models (dis)similarity measure

 Application areas:

 machine learning: k-NN rule
 speech/image/video/music recognition, vector quantization, bioinformatics, etc…

 Distances:

 Hamming, Euclidean, edit distance, earthmover distance, etc…

 Core primitive: closest pair, clustering, etc…


SLIDE 4

Curse of Dimensionality

 All exact algorithms degrade rapidly with the dimension 𝑑

Algorithm | Query time | Space
Full indexing | 𝑑 β‹… log^{𝑂(1)} 𝑛 | 𝑛^{𝑂(𝑑)} (Voronoi diagram size)
No indexing (linear scan) | 𝑂(𝑛 β‹… 𝑑) | 𝑂(𝑛 β‹… 𝑑)

SLIDE 5

Approximate NNS

 𝑐-approximate π‘Ÿ-near neighbor: given a query point π‘ž, report a point 𝑝′ ∈ 𝑃 s.t. ‖𝑝′ βˆ’ π‘žβ€– ≀ π‘π‘Ÿ
 as long as there is some point within distance π‘Ÿ
 Practice: use for exact NNS
 Filtering: gives a set of candidates (hopefully small)

𝑠 π‘Ÿ π‘žβˆ— π‘žβ€² 𝑑𝑠

𝑑𝑠

5
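The "filtering" bullet can be read as follows: some approximate structure hands back a small candidate set, and only those candidates are checked exactly. A hedged sketch (the `candidates` argument stands in for whatever hashing scheme produced them):

```python
import numpy as np

def near_neighbor_from_candidates(P, q, candidates, r, c):
    """Solve the c-approximate r-near neighbor problem on a candidate set.

    P: (n, d) data array, q: (d,) query, candidates: indices from a filter.
    Returns any index whose point lies within distance c*r of q, or None.
    """
    for i in candidates:
        if np.linalg.norm(P[i] - q) <= c * r:
            return i        # a valid answer for the (c, r) problem
    return None             # allowed outcome when no point is within distance r
```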

SLIDE 6

NNS algorithms


Exponential dependence on dimension

 [Arya-Mount’93], [Clarkson’94], [Arya-Mount-Netanyahu-Silverman-Wu’98], [Kleinberg’97], [Har-Peled’02], [Arya-Fonseca-Mount’11], …

Linear/poly dependence on dimension

 [Kushilevitz-Ostrovsky-Rabani’98], [Indyk-Motwani’98], [Indyk’98, β€˜01], [Gionis-Indyk-Motwani’99], [Charikar’02], [Datar-Immorlica-Indyk-Mirrokni’04], [Chakrabarti-Regev’04], [Panigrahy’06], [Ailon-Chazelle’06], [A.-Indyk’06], [A.-Indyk-Nguyen-Razenshteyn’14], [A.-Razenshteyn’15], [Pagh’16], [Laarhoven’16], …

SLIDE 7

Locality-Sensitive Hashing

Random hash function β„Ž on ℝ^𝑑 satisfying:

 for close pair: when β€–π‘ž βˆ’ 𝑝‖ ≀ π‘Ÿ, Pr[β„Ž(π‘ž) = β„Ž(𝑝)] = 𝑃1 is β€œnot-so-small”
 for far pair: when β€–π‘ž βˆ’ 𝑝′‖ > π‘π‘Ÿ, Pr[β„Ž(π‘ž) = β„Ž(𝑝′)] = 𝑃2 is β€œsmall”

Use several hash tables: 𝑛^𝜌 of them, where 𝜌 = log(1/𝑃1) / log(1/𝑃2)   [Indyk-Motwani’98]

(figure: collision probability Pr[β„Ž(π‘ž) = β„Ž(𝑝)] as a function of β€–π‘ž βˆ’ 𝑝‖, equal to 𝑃1 at distance π‘Ÿ and 𝑃2 at distance π‘π‘Ÿ)

SLIDE 8

LSH Algorithms

Metric | Space | Time | Exponent | 𝒄 = 𝟐 | Reference
Hamming space | 𝑛^{1+𝜌} | 𝑛^𝜌 | 𝜌 = 1/𝑐 | 𝜌 = 1/2 | [IM’98]
 | | | 𝜌 β‰₯ 1/𝑐 | | [MNP’06, OWZ’11]
Euclidean space | 𝑛^{1+𝜌} | 𝑛^𝜌 | 𝜌 = 1/𝑐 | 𝜌 = 1/2 | [IM’98, DIIM’04]
 | | | 𝜌 β‰ˆ 1/𝑐² | 𝜌 = 1/4 | [AI’06]
 | | | 𝜌 β‰₯ 1/𝑐² | | [MNP’06, OWZ’11]
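To put the exponents in perspective (hypothetical numbers, not from the talk): for 𝑛 = 10⁹ Euclidean points and 𝑐 = 2, the [AI’06] row gives roughly

```python
n, rho = 10**9, 0.25        # Euclidean LSH exponent for c = 2 [AI'06]
query_cost = n ** rho       # ~ 178 probes per query, ignoring poly(d) and polylog(n) factors
space      = n ** (1 + rho) # ~ 1.8e11, versus n*d for a plain linear scan
```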

SLIDE 9

LSH is tight… what’s next?


But are we really done with basic NNS algorithms?

Lower bounds (cell probe)

[A.-Indyk-Patrascu’06, Panigrahy-Talwar-Wieder’08,β€˜10, Kapralov-Panigrahy’12]

Space-time trade-offs

[Panigrahy’06, A.-Indyk’06]

Datasets with additional structure

[Clarkson’99, Karger-Ruhl’02, Krauthgamer-Lee’04, Beygelzimer-Kakade-Langford’06, Indyk-Naor’07, Dasgupta-Sinha’13, Abdullah-A.-Krauthgamer-Kannan’14,…]

SLIDE 10

Beyond Locality Sensitive Hashing

Metric | Space | Time | Exponent | 𝒄 = 𝟐 | Reference
Hamming space | 𝑛^{1+𝜌} | 𝑛^𝜌 | 𝜌 = 1/𝑐 | 𝜌 = 1/2 | [IM’98] (LSH)
 | | | 𝜌 β‰₯ 1/𝑐 | | [MNP’06, OWZ’11]
 | 𝑛^{1+𝜌} | 𝑛^𝜌 | complicated | 𝜌 = 1/2 βˆ’ πœ€ | [AINR’14]
 | | | 𝜌 β‰ˆ 1/(2𝑐 βˆ’ 1) | 𝜌 = 1/3 | [AR’15]
Euclidean space | 𝑛^{1+𝜌} | 𝑛^𝜌 | 𝜌 β‰ˆ 1/𝑐² | 𝜌 = 1/4 | [AI’06] (LSH)
 | | | 𝜌 β‰₯ 1/𝑐² | | [MNP’06, OWZ’11]
 | 𝑛^{1+𝜌} | 𝑛^𝜌 | complicated | 𝜌 = 1/4 βˆ’ πœ€ | [AINR’14]
 | | | 𝜌 β‰ˆ 1/(2𝑐² βˆ’ 1) | 𝜌 = 1/7 | [AR’15]

SLIDE 11

New approach?

 A random hash function, chosen after seeing the given dataset

 Efficiently computable

Data-dependent hashing

SLIDE 12

Construction of hash function

 Two components:
 Nice geometric structure (this is the part with better LSH)
 Reduction to such structure (this is the data-dependent part)
 Like a (weak) β€œregularity lemma” for a set of points

SLIDE 13

Nice geometric structure: average-case

 Think: random dataset on a sphere
 vectors perpendicular to each other
 s.t. random points are at distance β‰ˆ π‘π‘Ÿ
 Lemma: 𝜌 = 1/(2𝑐² βˆ’ 1)
 via Cap Carving

(figure: sphere of radius π‘π‘Ÿ/√2, so that random points are at pairwise distance β‰ˆ π‘π‘Ÿ)
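A toy rendering of the cap-carving idea on the unit sphere (a simplified sketch under the assumption that points are unit vectors; the threshold `eta` and the number of caps `T` are tuning knobs, and this is not the exact [AR’15] construction):

```python
import numpy as np

rng = np.random.default_rng(1)

def cap_carving_hash(points, T, eta):
    """Assign each unit vector to the first random spherical cap that contains it.

    Caps are {x : <g_t, x> >= eta} for random Gaussian directions g_1..g_T.
    Close points on the sphere are likelier to land in the same cap than
    near-orthogonal ("random instance") points, which is the LSH property
    the average-case analysis exploits.
    """
    G = rng.standard_normal((T, points.shape[1]))
    buckets = []
    for p in points:
        hits = np.flatnonzero(G @ p >= eta)
        buckets.append(int(hits[0]) if hits.size else T)   # T = overflow bucket
    return np.array(buckets)
```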

SLIDE 14

Reduction to nice structure

 Idea: iteratively decrease the radius of the minimum enclosing ball
 Algorithm:
 find dense clusters
 with smaller radius
 large fraction of points
 recurse on dense clusters
 apply cap carving on the rest
 recurse on each β€œcap”
 e.g., dense clusters might reappear
 Why is each part ok?
 the rest has no dense clusters, so it behaves like a β€œrandom dataset” with radius = 100π‘π‘Ÿ
 the dense clusters have smaller radius (e.g. 99π‘π‘Ÿ instead of 100π‘π‘Ÿ): even better!

(*picture not to scale & dimension)
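The preprocessing loop above can be summarized as a recursion skeleton. This is a hedged paraphrase in which `find_dense_clusters` and `cap_carve` are placeholders for the two subroutines the slides describe only by their guarantees, and the 0.9 shrink factor is purely illustrative:

```python
def build_tree(points, radius, find_dense_clusters, cap_carve, min_size=1):
    """Recursive decomposition sketch: dense clusters vs. cap-carved remainder.

    find_dense_clusters(points, radius) -> (clusters, rest): each cluster has
        a noticeably smaller radius and a large (n^(1-delta)) share of points.
    cap_carve(points) -> list of caps: spherical-LSH partition of the rest.
    """
    if len(points) <= min_size:
        return {"leaf": points}
    clusters, rest = find_dense_clusters(points, radius)
    return {
        # recurse on every dense cluster with a strictly smaller enclosing ball
        "clusters": [build_tree(c, 0.9 * radius, find_dense_clusters, cap_carve, min_size)
                     for c in clusters],
        # cap-carve the pseudo-random remainder; dense clusters may reappear
        # inside a cap, so recurse there as well
        "caps": [build_tree(cap, radius, find_dense_clusters, cap_carve, min_size)
                 for cap in cap_carve(rest)],
    }
```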
SLIDE 15

Hash function


 Described by a tree (like a hash table)

(figure: the decomposition tree; root ball of radius 100π‘π‘Ÿ; *picture not to scale & dimension)

SLIDE 16

Dense clusters

 Current dataset: radius 𝑅
 A dense cluster:
 contains 𝑛^{1βˆ’π›Ώ} points
 smaller radius: (1 βˆ’ Ξ©(πœ€Β²)) 𝑅
 After we remove all clusters:
 for any point on the surface, there are at most 𝑛^{1βˆ’π›Ώ} points within distance (√2 βˆ’ πœ€) 𝑅
 the other points are essentially orthogonal!
 When applying Cap Carving with parameters (𝑃1, 𝑃2):
 empirical number of far points colliding with the query: 𝑛𝑃2 + 𝑛^{1βˆ’π›Ώ}
 as long as 𝑛𝑃2 ≫ 𝑛^{1βˆ’π›Ώ}, the β€œimpurity” doesn’t matter!

(𝛿 and πœ€ are trade-off parameters)
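A quick sanity check of the last condition with hypothetical numbers (not from the talk): with 𝑛 = 10⁢, 𝛿 = 0.2, and a cap-carving round whose far-collision probability is 𝑃2 = 𝑛^{βˆ’0.1}, the leftover "impure" points are dominated by the far points that collide anyway:

```python
n, delta = 10**6, 0.2
P2 = n ** -0.1                       # hypothetical collision prob. for far points
impurity       = n ** (1 - delta)    # leftover non-orthogonal points: n^0.8 ~ 6.3e4
far_collisions = n * P2              # far points colliding with the query: n^0.9 ~ 2.5e5
# far_collisions grows as n^0.9 >> n^0.8 = impurity, so the impurity does not change the analysis
```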

SLIDE 17

Tree recap

 During query:
 recurse in all clusters
 just in one bucket in CapCarving
 Will look in >1 leaf! How much branching?
 Claim: at most (𝑛^𝛿 + 1)^{𝑂(1/πœ€Β²)}
 each time we branch:
 at most 𝑛^𝛿 clusters (+1)
 a cluster reduces the radius by Ξ©(πœ€Β²)
 cluster-depth is at most 100/Ξ©(πœ€Β²)
 Progress in 2 ways:
 clusters reduce the radius
 CapCarving nodes reduce the # of far points (empirical 𝑃2)
 A tree succeeds with probability β‰₯ 𝑛^{βˆ’1/(2𝑐²−1) βˆ’ π‘œ(1)}

(𝛿 trade-off)

SLIDE 18

Beyond β€œBeyond LSH”


 Practice: often optimize partition to your dataset

 PCA-tree, spectral hashing, etc. [S91, McN01, VKD09, WTF08, …]
 no guarantees (performance or correctness)

 Theory: assume special structure in the dataset

 low intrinsic dimension [KR’02, KL’04, BKL’06, IN’07, DS’13, …]
 structure + noise [Abdullah-A.-Krauthgamer-Kannan’14]

Data-dependent hashing helps even when the dataset has no a priori structure!

SLIDE 19

Data-dependent hashing wrap-up


 Dynamicity?

 Dynamization techniques [Overmars-van Leeuwen’81]

 Better bounds?

 For dimension 𝑑 = 𝑂(log 𝑛), one can get a better 𝜌! [Laarhoven’16]
 For 𝑑 > log^{1+𝛿} 𝑛: our 𝜌 is optimal even for data-dependent hashing! [A.-Razenshteyn’??]
 in the right formalization (to rule out the Voronoi diagram):
 description complexity of the hash function is 𝑛^{1βˆ’Ξ©(1)}

 Practical variant [A.-Indyk-Laarhoven-Razenshteyn-Schmidt’15]
 NNS for β„“βˆž
 [Indyk’98] gets approximation 𝑂(log log 𝑑) (polynomial space, sublinear query time)
 Cf., β„“βˆž has no non-trivial sketch!
 Some matching lower bounds in the relevant model [ACP’08, KP’12]

 Can be thought of as data-dependent hashing

 NNS for any norm (e.g., matrix norms, EMD)?