CS 498ABD: Algorithms for Big Data, Spring 2019
LSH for ℓ2 distances
Lecture 15
March 12, 2019
LSH Approach for Approximate NNS

Use locality-sensitive hashing to solve a simplified decision problem.
Definition: A family of hash functions is (r, cr, p1, p2)-LSH with p1 > p2 and c > 1 if h drawn randomly from the family satisfies the following:
- Pr[h(x) = h(y)] ≥ p1 when dist(x, y) ≤ r
- Pr[h(x) = h(y)] ≤ p2 when dist(x, y) ≥ cr

Key parameter: the gap between p1 and p2, measured as ρ = (log p1)/(log p2), which is usually small.

Two-level hashing scheme:
- Amplify the basic locality-sensitive hash family by repetition to create a better family
- Use several copies of the amplified hash functions
- Layer a binary search over r on top of the above scheme
With amplification and L ≃ n^ρ hash tables:
- Storage: n^{1+ρ} (ignoring log factors)
- Query time: k·n^ρ (ignoring log factors), where k = log_{1/p2} n

A sketch of this two-level structure follows.
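Here is a minimal sketch of the two-level structure, assuming some generator basic_hash_family() that returns a fresh draw from an (r, cr, p1, p2)-LSH family; all names are illustrative. Concatenating k basic hashes drives the far-pair collision probability down to p2^k, and querying L independent tables recovers near pairs with good probability.

```python
def build_tables(points, basic_hash_family, k, L):
    """Build L hash tables, each keyed by a k-wise concatenated hash."""
    tables = []
    for _ in range(L):
        hs = [basic_hash_family() for _ in range(k)]   # amplification
        table = {}
        for i, x in enumerate(points):
            key = tuple(h(x) for h in hs)   # collide only if all k hashes agree
            table.setdefault(key, []).append(i)
        tables.append((hs, table))
    return tables

def query(tables, q):
    """Candidate indices colliding with q in at least one of the L tables."""
    candidates = set()
    for hs, table in tables:
        candidates.update(table.get(tuple(h(q) for h in hs), []))
    return candidates
```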
Now x1, x2, ..., xn ∈ R^d and dist(x, y) = ‖x − y‖_2. First do dimensionality reduction (JL) to reduce d (if necessary) to O(log n) (since we are using a c-approximation anyway).

What is a good basic locality-sensitive hashing scheme? That is, we want a hashing approach that makes nearby points more likely to collide than farther-away points.

Idea: projections onto random lines plus bucketing.
Question: How do we generate a random unit vector in R^d (the same as a uniform point on the sphere S^{d−1})?

Pick d independent random variables Z1, Z2, ..., Zd where each Zi ∼ N(0, 1) and let g = (Z1, Z2, ..., Zd) (also called a random Gaussian vector). The distribution of g is spherically symmetric, so g points in a random direction; to obtain a random unit vector, normalize: g′ = g/‖g‖_2.

When d is large, ‖g‖_2^2 = Σ_i Z_i^2 is concentrated around d, and hence ‖g‖_2 = (1 ± ε)√d with high probability. Thus g/√d is a proxy for a random unit vector and is easier to work with in many cases.
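A quick sketch of both options (exact normalization and the √d proxy); the parameters are illustrative.

```python
import numpy as np

d = 1000
g = np.random.randn(d)             # i.i.d. N(0,1) coordinates
u = g / np.linalg.norm(g)          # exactly uniform on the sphere S^{d-1}
u_proxy = g / np.sqrt(d)           # proxy: ||u_proxy||_2 = 1 +/- eps whp
print(np.linalg.norm(u_proxy))     # close to 1 for large d
```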
Lemma: Suppose x ∈ R^d and g is a random Gaussian vector. Let Y = x · g. Then Y ∼ ‖x‖_2 · N(0, 1), and hence E[Y^2] = ‖x‖_2^2.
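A quick empirical check of the Lemma (illustrative parameters): the empirical second moment of Y = x · g should match ‖x‖_2^2.

```python
import numpy as np

d, trials = 50, 100_000
x = np.random.rand(d)                 # any fixed vector
Y = np.random.randn(trials, d) @ x    # one draw of Y per Gaussian row
print(np.mean(Y**2), x @ x)           # both should be close to ||x||_2^2
```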
The basic hash family:
- Pick a random unit Gaussian vector u (each coordinate N(0, 1))
- Pick a random shift a ∈ (0, r]
- For vector x set h_{u,a}(x) = ⌊(x · u + a)/r⌋
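A sketch of drawing one h_{u,a} from this family (function names are illustrative):

```python
import numpy as np

def make_line_hash(d, r, rng=None):
    """One draw h_{u,a}(x) = floor((x.u + a)/r) from the basic family."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.standard_normal(d)     # random unit Gaussian vector
    a = rng.uniform(0.0, r)        # random shift, uniform over a length-r interval
    return lambda x: int(np.floor((np.dot(x, u) + a) / r))

h = make_line_hash(d=20, r=1.0)
x = np.random.rand(20)
print(h(x), h(x + 0.005))          # very close points usually collide
```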
Suppose x, y are such that ‖x − y‖_2 ≤ r. What is p1 = Pr[h_{u,a}(x) = h_{u,a}(y)]? Suppose x, y are such that ‖x − y‖_2 ≥ cr. What is p2 = Pr[h_{u,a}(x) = h_{u,a}(y)]?

Let q = x − y and let s = ‖q‖_2 be the length of q. From the Lemma, q · u is distributed as s·N(0, 1).

Observations:
- h(x) ≠ h(y) if |q · u| ≥ r
- If |q · u| < r then h(x) = h(y) with probability 1 − |q · u|/r (over the random shift a)

Thus the collision probability depends only on s.
For a fixed s the collision probability is

p(s) = ∫_0^r f_s(t) (1 − t/r) dt,

where f_s is the density function of |s·N(0, 1)|. Rewriting,

p(s) = ∫_0^r (1/s) f(t/s) (1 − t/r) dt,

where f is the density function of |N(0, 1)|.
Recall p1 = p(r) and p2 = p(cr), and we are interested in ρ = (log p1)/(log p2).
One can show ρ < 1/c by plotting.

[Plot: ρ and 1/c as functions of the approximation factor c, for c from 1 to 10.]
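The integral is easy to evaluate numerically. Below is a sketch that keeps the bucket width w as a knob (the scheme above fixes w = r; the wider bucket w = 10r used here is an illustrative choice for which the computed ρ falls below 1/c):

```python
import numpy as np
from scipy import integrate, stats

def p(s, w):
    """Collision probability at distance s with bucket width w."""
    # the density of |N(0,1)| is 2*phi(z) for z >= 0
    f = lambda t: (2.0 / s) * stats.norm.pdf(t / s) * (1.0 - t / w)
    val, _ = integrate.quad(f, 0.0, w)
    return val

r, w = 1.0, 10.0
for c in [1.5, 2.0, 4.0, 10.0]:
    rho = np.log(p(r, w)) / np.log(p(c * r, w))
    print(f"c = {c:4.1f}  rho = {rho:.3f}  1/c = {1 / c:.3f}")
```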
For any fixed c > 1, the above scheme obtains:
- Storage: O(n^{1+1/c} polylog(n))
- Query time: O(d·n^{1/c} polylog(n))

Can use JL to reduce d to O(log n).
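A hedged sketch of that JL preprocessing step: project the points into k = O(log n) dimensions with a scaled Gaussian matrix before building the LSH data structure (the constant 24 and the data are placeholders).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5000
X = rng.standard_normal((n, d))                # placeholder data set
k = int(np.ceil(24 * np.log(n)))               # k = O(log n)
Pi = rng.standard_normal((k, d)) / np.sqrt(k)  # scaled Gaussian projection
Y = X @ Pi.T   # pairwise l2 distances preserved up to (1 +/- eps) whp
```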
[Andoni-Indyk’06] The basic LSH scheme projects points onto lines. Better scheme: pick some small constant t and project points into R^t, and use a lattice-based space-partitioning scheme to “bucket” instead.
[Figures illustrating the lattice-based bucketing, from Piotr Indyk’s slides.]
Leads to ρ ≃ 1/c^2 + O(log t/√t), and hence ρ tends to 1/c^2 for large t and fixed c. A lower bound for LSH in ℓ2 says ρ ≥ 1/c^2.
LSH is data oblivious. That is, the hash families are chosen before seeing the data. Can one do better by choosing hash functions based on the data?
Yes [Andoni-Indyk-Nguyen-Razenshteyn’14, Andoni-Razenshteyn’15]:
- ρ = 1/(2c^2 − 1) for ℓ2, improving upon 1/c^2 for data-oblivious LSH
- ρ = 1/(2c − 1) for ℓ1/Hamming cube, improving upon 1/c for data-oblivious LSH
LSH is a modular hashing-based scheme for similarity estimation. Its main competitors are space-partitioning data structures such as variants of k-d trees. LSH provides speedups but uses more memory, and neither approach appears to be a clear winner.
For F2 estimation, JL, and LSH we used an important “stability” property of the Normal distribution.

Lemma: Let Y1, Y2, ..., Yd be independent random variables with distribution N(0, 1). Then Z = Σ_i x_i Y_i has distribution ‖x‖_2 · N(0, 1).

The standard Gaussian is 2-stable.

Definition: A distribution D is p-stable if Z = Σ_i x_i Y_i has distribution ‖x‖_p · D when the Yi are independent and each of them is distributed as D.

Question: Do p-stable distributions exist for p ≠ 2?
Fact: p-stable distributions exist for all p ∈ (0, 2] and do not exist for p > 2.

p = 1 gives the Cauchy distribution, which is the distribution of the ratio of two independent N(0, 1) random variables; its density function is 1/(π(1 + x^2)). Its mean and variance are not finite.

For general p there is no closed-form formula for the density, but one can sample from the distribution.

Streaming, sketching, and LSH ideas for ℓ2 generalize to ℓp for p ∈ (0, 2] via p-stable distributions and additional technical work.
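A sketch checking 1-stability empirically: sample Cauchy variables as ratios of independent Gaussians, and compare medians (the mean does not exist). The median of |Cauchy| is 1, so the median of |Z| should be about ‖x‖_1.

```python
import numpy as np

rng = np.random.default_rng(1)
d, trials = 20, 200_000
x = rng.random(d)
# standard Cauchy = ratio of two independent N(0,1) variables
Y = rng.standard_normal((trials, d)) / rng.standard_normal((trials, d))
Z = Y @ x                                     # Z ~ ||x||_1 * Cauchy
print(np.median(np.abs(Z)), np.abs(x).sum())  # should be close
```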
Thesis/assumption: Real-world data is high-dimensional in explicit representation but low-dimensional in “content”. There are several interpretations of what it means for data to be low-dimensional:
- Data lies in a low-dimensional manifold
- Data can be projected into low dimensions while preserving certain properties (JL, for instance)
- Data has a latent low-dimensional description (SVD, PCA, tensor decomposition, etc.)
- Data has low doubling dimension
- ...
Let (V, dist) be a finite metric space:
- dist(x, y) = dist(y, x) for all x, y ∈ V (symmetry)
- dist(x, x) = 0 for all x ∈ V (reflexivity)
- dist(x, y) + dist(y, z) ≥ dist(x, z) for all x, y, z ∈ V (triangle inequality)

Question: Can we quantify whether (V, dist) behaves like a low-dimensional Euclidean space? Does this have any benefits?
Property of R^d: a ball of radius r can be covered by c^d balls of radius r/2 for some constant c ≤ 4.

Given (V, dist), let B(p, r) be the ball of radius r around p, viewed as a set of points: B(p, r) = {q | dist(p, q) ≤ r}.

Definition: A finite metric space (V, dist) has doubling dimension d if for all p ∈ V and all r > 0, B(p, r) can be covered by 2^d balls of radius at most r/2.
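A sketch of probing this on a concrete finite metric, given as a distance matrix D: greedily cover B(p, r) with balls of radius r/2. The greedy centers are pairwise more than r/2 apart, so the number of centers gives a handle on the doubling constant 2^d.

```python
import numpy as np

def greedy_cover(D, p, r):
    """Centers of r/2-balls that greedily cover B(p, r)."""
    ball = [q for q in range(len(D)) if D[p][q] <= r]
    centers, uncovered = [], set(ball)
    while uncovered:
        c = uncovered.pop()        # any uncovered point becomes a center
        centers.append(c)
        uncovered -= {q for q in uncovered if D[c][q] <= r / 2}
    return centers

pts = np.random.rand(300, 2)       # 2-dimensional data: expect small covers
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
print(len(greedy_cover(D, p=0, r=0.5)))
```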
Many algorithms/data structures for R^d can be extended to metric spaces with doubling dimension d with comparable running times, including approximate NNS. See [Clarkson, Krauthgamer-Lee, Har-Peled-Mendel].