Feb 22, 2008 1
Locality-Sensitive Hashing
CS 395T: Visual Recognition and Search Marc Alban
Given any query point q, return the point closest to q.
Useful for finding similar objects in a database. Brute force linear search is not practical for
For , data structures exist that
Time or space requirements grow exponentially
The dimensionality of images or documents is
Brute force linear search is the best we can do.
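For reference, the brute-force baseline is a single pass over the database. The sketch below is a minimal illustration, not the code used for the experiments; the function names and the choice of L1 distance are assumptions made for the example.

```python
def l1_distance(a, b):
    """Sum of absolute coordinate differences (an assumed metric for this sketch)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbor(query, points, dist):
    """Scan every point and return (index, distance) of the closest one."""
    best_index, best_dist = -1, float("inf")
    for i, p in enumerate(points):
        d = dist(query, p)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


points = [(0, 0), (5, 5), (2, 1)]
print(nearest_neighbor((2, 2), points, l1_distance))  # (2, 1)
```

The cost is one distance computation per stored point for every query, which is what the approximate methods try to avoid.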
An approximate nearest neighbor should suffice
Definition: If for any query point , there exists
Definition: Hamming space is the set of all
Definition: The Hamming distance between
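As a small illustration of Hamming distance, the sketch below counts the positions at which two equal-length binary strings differ. Representing points as Python strings of '0' and '1' is an assumption made for readability.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("10110", "11100"))  # 2
```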
Let a hashing family be defined as
A k-bit locality-sensitive hash function (LSHF) is
Each is chosen randomly from . Each results in a single bit.
$\Pr(\text{similar points collide}) \ge P_1^k$
$\Pr(\text{dissimilar points collide}) \le P_2^k$
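A k-bit hash of this kind can be sketched with bit sampling, the family used by Gionis et al. for Hamming space; whether the slides use exactly this family is an assumption, and the names below are hypothetical. Each of the k sampled positions contributes one bit of the key. If two points differ in r of their d bits, one sampled bit agrees with probability 1 - r/d, so all k agree with probability (1 - r/d)^k.

```python
import random


def make_bit_sampling_hash(d, k, rng):
    """Pick k bit positions uniformly at random (with replacement) from d.

    Returns (g, positions), where g maps a d-bit string to a k-bit key.
    """
    positions = [rng.randrange(d) for _ in range(k)]

    def g(bits):
        return "".join(bits[i] for i in positions)

    return g, positions


def collision_probability(r, d, k):
    """Probability that two points at Hamming distance r share a k-bit key."""
    return (1 - r / d) ** k


rng = random.Random(0)  # fixed seed so the example is repeatable
g, positions = make_bit_sampling_hash(d=8, k=3, rng=rng)
print(positions, g("10110010"))
print(collision_probability(r=2, d=10, k=3))
```

Larger k makes every collision less likely, which drives down false matches but also shrinks the chance that true neighbors land in the same bin.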
Each training example is entered into hash
Preprocessing Space:
...
For each hash table
Return the bin indexed by
Perform a linear search on the union of the
...
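The table construction and query steps above can be sketched as follows. This is a minimal illustration under stated assumptions: points are bit strings, each table uses a bit-sampling hash, and the hypothetical `max_candidates` argument plays the role of a cap on the search length. The class and method names are invented for the example.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of differing positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


class HammingLSH:
    def __init__(self, d, k, l, seed=0):
        rng = random.Random(seed)
        # One independent k-bit hash (k sampled positions) per table.
        self.positions = [[rng.randrange(d) for _ in range(k)] for _ in range(l)]
        self.tables = [defaultdict(list) for _ in range(l)]
        self.points = []

    def _key(self, bits, t):
        return "".join(bits[i] for i in self.positions[t])

    def add(self, bits):
        """Enter one example into every hash table."""
        idx = len(self.points)
        self.points.append(bits)
        for t, table in enumerate(self.tables):
            table[self._key(bits, t)].append(idx)

    def query(self, bits, max_candidates=None):
        """Linear search over the union of the bins the query falls into."""
        candidates, seen = [], set()
        for t, table in enumerate(self.tables):
            for idx in table.get(self._key(bits, t), []):
                if idx not in seen:
                    seen.add(idx)
                    candidates.append(idx)
        if max_candidates is not None:
            candidates = candidates[:max_candidates]
        if not candidates:
            return None  # no collisions at all: the search fails
        return min(candidates, key=lambda i: hamming(bits, self.points[i]))


index = HammingLSH(d=8, k=2, l=4, seed=1)
for p in ["00000000", "11111111", "00001111"]:
    index.add(p)
print(index.query("11111111"))  # 1, a stored point always collides with itself
```

Returning `None` when no bin contains a candidate makes the failure case explicit instead of silently falling back to a full scan.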
Suppose we want to search at most
Compare LSH accuracy and performance to
k, the number of hash bits.
l, the number of hash tables.
B, the maximum search length.
Dataset
59500 20x20 patches taken from
Represented as 400-dimensional
Convert the feature vectors into binary strings
Given a vector we can create a unary
= 1's followed by 0's,
Note that for any two points :
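The unary conversion can be illustrated with a short sketch. It assumes non-negative integer coordinates bounded by a hypothetical `max_value` parameter: each coordinate x becomes x ones followed by (max_value - x) zeros. Under this encoding the Hamming distance between two codes equals the L1 distance between the original vectors.

```python
def unary_encode(vector, max_value):
    """Concatenate the unary code of each coordinate into one bit string."""
    parts = []
    for x in vector:
        if not 0 <= x <= max_value:
            raise ValueError("coordinates must lie in [0, max_value]")
        parts.append("1" * x + "0" * (max_value - x))
    return "".join(parts)


p, q = [3, 0, 2], [1, 2, 2]
up, uq = unary_encode(p, 4), unary_encode(q, 4)
l1 = sum(abs(a - b) for a, b in zip(p, q))
ham = sum(a != b for a, b in zip(up, uq))
print(up, uq, l1, ham)  # 111000001100 100011001100 4 4
```

In practice the full code never needs to be materialized: bit i of a coordinate's unary code is 1 exactly when the coordinate is greater than i, so a sampled bit can be computed with one comparison.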
[Figure: a query patch, the LSH result, and the actual nearest neighbors. Examples searched: 7,722 of 59,500.]
Let B = ∞
[Plot: search length (x1000, axis from 2 to 24) as a function of k and l, each ranging from 5 to 30]
More hash bits (k) result in shorter searches.
More hash tables (l) result in longer searches.
Let
Over-hashing can result in too few candidates to return a good approximation.
Over-hashing can cause the algorithm to fail.
[Plot: surface over k and l, each ranging from 5 to 30; vertical axis from 1.04 to 1.11]
Average search length = 8000
Let
[Plot: surface over k and l, each ranging from 5 to 30; vertical axis from 1.08 to 1.15]
Let B = 250 ≈
[Plot: surface over k and l, each ranging from 5 to 30; vertical axis from 1.25 to 1.6]
Examine the effect of the approximation on the
Dataset
D. Nistér and H. Stewénius, 'Scalable recognition with a vocabulary tree'
2550 sets of 4 images
Represented as a document-term matrix of the visual words.
LSH requires a vector representation. Not clear how to easily convert a bag of words
A binary vector where the presence of each word is
Each image has roughly the same number of
Boostmap?
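The binary-vector option can be sketched as below: one bit per vocabulary word, set when the word occurs in the image. The function name and the toy vocabulary are hypothetical, and word counts are deliberately discarded, which is one limitation of this representation.

```python
def presence_vector(words, vocabulary):
    """One bit per vocabulary entry: '1' if the word occurs at least once."""
    present = set(words)
    return "".join("1" if w in present else "0" for w in vocabulary)


vocabulary = ["w1", "w2", "w3", "w4"]  # toy visual-word vocabulary
print(presence_vector(["w3", "w1", "w3"], vocabulary))  # 1010
```

The resulting strings live in Hamming space, so they can be hashed directly without a unary encoding step.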
Approximate Nearest Neighbors is necessary
LSH is a simple approach to aNN.
LSH requires a vector representation.
Clear relationship between search length and
Octave (MATLAB)
LSH Matlab Toolbox -
Python
Gnuplot
'Fast Pose Estimation with Parameter-Sensitive Hashing' - Shakhnarovich et al.
'Similarity Search in High Dimensions via Hashing' - Gionis et al.
'Object Recognition Using Locality-Sensitive Hashing of Shape Contexts' - Andrea Frome and Jitendra Malik
'Nearest neighbors in high-dimensional spaces', Handbook of Discrete and Computational Geometry - Piotr Indyk
Algorithms for Nearest Neighbor Search - http://simsearch.yury.name/tutorial.html
LSH Matlab Toolbox - http://www.cs.brown.edu/~gregory/code/lsh/