Approximate Nearest Neighbors Search in High Dimensions and Locality-Sensitive Hashing


SLIDE 1

PAPERS

  • Piotr Indyk, Rajeev Motwani: Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. STOC 1998.
  • Eyal Kushilevitz, Rafail Ostrovsky, Yuval Rabani: Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces. SIAM J. Comput., 2000.
  • Mayur Datar, Nicole Immorlica, Piotr Indyk, Vahab S. Mirrokni: Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. Symposium on Computational Geometry 2004.
  • Alexandr Andoni, Mayur Datar, Nicole Immorlica, Vahab S. Mirrokni, Piotr Indyk: Locality-Sensitive Hashing Using Stable Distributions. In: Nearest Neighbor Methods in Learning and Vision: Theory and Practice, 2006.

Aneesh Sharma, Michael Wand. CS 468 | Geometric Algorithms


SLIDE 2

Overview

  • Introduction
  • Locality Sensitive Hashing (Aneesh)
  • Hash Functions Based on p-Stable Distributions (Michael)

SLIDE 3

Overview

  • Introduction
  • Nearest neighbor search problems
  • Higher dimensions
  • Johnson-Lindenstrauss lemma
  • Locality Sensitive Hashing (Aneesh)
  • Hash Functions Based on p-Stable Distributions (Michael)

SLIDE 4

Problem

SLIDE 5

Problem Statement

Today’s talks: NN-search in high-dimensional spaces

  • Given
  • Point set P = {p1, …, pn}
  • a query point q
  • Find
  • [ε-approximate] nearest neighbor to q from P
  • Goal:
  • Sublinear query time
  • “Reasonable” preprocessing time & space
  • “Reasonable” growth in d (exponential not acceptable)
SLIDE 6

Applications

Example Application: Feature Spaces

  • Vectors x ∈ R^d represent characteristic features of objects
  • There are often many features
  • Use nearest neighbor rule for classification / recognition

[Figure: feature space with axes “mileage” and “top speed”; classes SUV, sports car, sedan; query point “?”]

SLIDE 7

Applications

“Real World” Example: Image Completion

SLIDE 8

Applications

“Real World” Example: Image Completion

  • Iteratively fill in pixels with best match (+ multi-scale); see the sketch below
  • Typically 5×5 … 9×9 neighborhoods, i.e.: dimension 25 … 81
  • Performance limited by nearest neighbor search
  • 3D version: dimension 81 … 729
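Not from the original slides: a minimal numpy sketch of this patch-as-vector view, assuming a grayscale image and 5×5 neighborhoods. The brute-force scan at the end is exactly the step that sublinear ANN search is meant to replace.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)
img = rng.random((64, 64))                 # hypothetical grayscale image

# Every 5x5 neighborhood becomes one point in R^25.
patches = sliding_window_view(img, (5, 5)).reshape(-1, 25)

# Brute-force nearest neighbor for one query patch: O(n*d) per query,
# which is the bottleneck that ANN/LSH accelerates.
q = patches[0]
dists = np.linalg.norm(patches[1:] - q, axis=1)
best = 1 + int(np.argmin(dists))
```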
SLIDE 9

Higher Dimensions

SLIDE 10

Higher Dimensions are Weird

Issues with High-Dimensional Spaces:

  • d-dimensional space: d independent neighboring directions to each point
  • Volume-distance ratio explodes: vol(r) ∈ Θ(r^d)

[Figure: balls for d = 1, 2, 3, …, d → ∞]

SLIDE 11

No Grid Tricks

Regular Subdivision Techniques Fail

  • Regular k-grids contain k^d cells
  • The “grid trick” does not work
  • Adaptive grids usually also do not help
  • Conventional integration becomes infeasible (⇒ MC approximation)
  • Finite element function representations become infeasible

[Figure: grid with k subdivisions per axis]

SLIDE 12

Higher Dimensions are Weird

More Weird Effects:

  • Dart-throwing anomaly
  • Normal distributions gather probability mass in thin shells [Bishop 95]
  • Nearest neighbor ~ farthest neighbor
  • For unstructured points (e.g. i.i.d. random)
  • Not true for certain classes of structured data [Beyer et al. 99]

[Figure: distance concentration for d = 1..200]

SLIDE 13

Johnson-Lindenstrauss Lemma

SLIDE 14

Johnson-Lindenstrauss Lemma

JL-Lemma: [Dasgupta et al. 99]

  • Point set P in R^d, n := #P
  • There is f: R^d → R^k with k ∈ O(ε^-2 ln n) (concretely, k ≥ 4(ε²/2 - ε³/3)^-1 ln n)
  • …that preserves all inter-point distances up to a factor of (1+ε)

Random orthogonal linear projection works with probability ≥ 1 - 1/n
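A minimal numpy sketch of the lemma's statement (not from the slides): project with a random Gaussian matrix, a common stand-in for the random orthogonal projection mentioned above, and check that random pairwise distances stay within the (1±ε) band.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 500, 10_000, 0.25
# JL bound: k >= 4 (eps^2/2 - eps^3/3)^(-1) ln n
k = int(np.ceil(4 * np.log(n) / (eps**2 / 2 - eps**3 / 3)))

P = rng.normal(size=(n, d))               # point set P in R^d
f = rng.normal(size=(d, k)) / np.sqrt(k)  # random linear map f: R^d -> R^k
Q = P @ f

# Compare random pairwise distances before and after the projection.
i, j = rng.integers(0, n, size=(2, 2000))
mask = i != j
orig = np.linalg.norm(P[i[mask]] - P[j[mask]], axis=1)
proj = np.linalg.norm(Q[i[mask]] - Q[j[mask]], axis=1)
print((proj / orig).min(), (proj / orig).max())  # typically within [1-eps, 1+eps]
```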

SLIDE 15

This means…

What Does the JL-Lemma Imply?

Pairwise distances in small point set P (sub-exponential in d) can be well-preserved in low-dimensional embedding

What does it not say?

Does not imply that the points themselves are well-represented (just the pairwise distances)

SLIDE 16

Experiment

SLIDE 17

Intuition

Difference Vectors

  • Normalize (relative error)
  • Pole yields bad approximation
  • Non-pole area much larger (high dimension)
  • Need large number of poles (exponential in d)

[Figure: sphere of normalized difference vectors diff = u - v, with good-projection, bad-projection, and no-go areas around the poles]

SLIDE 18

Overview

  • Introduction
  • Locality Sensitive Hashing
  • Approximate Nearest Neighbors
  • Big picture
  • LSH on unit hypercube
  • Setup
  • Main idea
  • Analysis
  • Results
  • Hash Functions Based on p-Stable Distributions

SLIDE 19

Approximate Nearest Neighbors

SLIDE 20

ANN: Decision version

Input: P, q, r
Output:

  • If there is a NN (a point within r of q), return yes and output one ANN (a point within (1+ε)r)
  • If there is no ANN (no point within (1+ε)r), return no
  • Otherwise, return either answer
SLIDE 23

ANN: Decision version

General ANN reduces to PLEB (Point Location in Equal Balls): solve the decision version and binary search over the radius.
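A sketch of that reduction, assuming a hypothetical oracle decision_ann(q, r) that returns an ANN if some point lies within r of q, returns None if no point lies within (1+ε)r, and may do either in between:

```python
import math

def ann(query, decision_ann, r_min, r_max, eps):
    """Reduce ANN to the decision version: probe radii r_i = r_min*(1+eps)^i
    and binary search for the smallest level that answers "yes"."""
    levels = math.ceil(math.log(r_max / r_min, 1 + eps))
    lo, hi, answer = 0, levels, None
    while lo <= hi:
        mid = (lo + hi) // 2
        p = decision_ann(query, r_min * (1 + eps) ** mid)
        if p is not None:
            answer, hi = p, mid - 1   # "yes": try a smaller radius
        else:
            lo = mid + 1              # "no": need a larger radius
    return answer                     # an ANN, or None if r_max fails too
```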

SLIDE 24

ANN: previous results

|                    | LSH              | Kd-tree      | Voronoi      |
|--------------------|------------------|--------------|--------------|
| Preprocessing time | O(n^(1+ρ) log n) | O(n log n)   | O(n^(d/2))   |
| Space used         | O(n^(1+ρ))       | O(n)         | O(n^(d/2))   |
| Query time         | O(n^ρ log n)     | O(2^d log n) | O(2^d log n) |

SLIDE 25

LSH: Big picture

SLIDE 26

Locality Sensitive Hashing

  • Remember: solving decision ANN
  • Input:
  • No. of points: n
  • Number of dimensions: d
  • Point set: P
  • Query point: q
SLIDE 27

LSH: Big Picture

  • Family of hash functions:
  • Close points to same buckets
  • Faraway points to different buckets
  • Choose a random function and hash P
  • Only store non-empty buckets

SLIDE 28

LSH: Big Picture

  • Hash q in the table
  • Test every point in q’s bucket for ANN
  • Problem:
  • q’s bucket may be empty
SLIDE 29

LSH: Big Picture

  • Solution:
  • Use a number of hash tables!
  • We are done if any ANN is found
SLIDE 30

LSH: Big Picture

  • Problem:
  • Poor resolution ⇒ too many candidates!
  • Stop after reaching a limit; fails only with small probability
SLIDE 31

LSH: Big Picture

  • Want to find a family of hash functions such that, for a randomly picked h:

If u ∈ B(q, r) then Pr[h(u) = h(q)] ≥ α
If u ∉ B(q, R) then Pr[h(u) = h(q)] ≤ β

  • with α >> β and R = (1+ε) r

SLIDE 32

LSH on unit Hypercube

SLIDE 33

Setup: unit hypercube

  • Points lie on the hypercube: H^d = {0,1}^d
  • Every point is a binary string
  • Hamming distance (r): number of differing coordinates (see the sketch below)
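For concreteness, a two-line Python version of this distance on bit strings (not part of the original slides):

```python
def hamming(b1: str, b2: str) -> int:
    """Number of differing coordinates between two points of H^d = {0,1}^d."""
    assert len(b1) == len(b2)
    return sum(c1 != c2 for c1, c2 in zip(b1, b2))

hamming("10110", "11100")  # -> 2
```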
SLIDE 35

Main idea

SLIDE 36

Hash functions for hypercube

  • Define family F:
  • Intuition: compare a random coordinate

Given: hypercube H^d = {0,1}^d, point b = (b_1, …, b_d) ∈ H^d

F = {h_i : H^d → {0,1}},  h_i(b) = b_i for i = 1, …, d

α = 1 - r/d,  β = 1 - (1+ε)r/d

  • F is an (r, (1+ε)r, α, β)-sensitive family

SLIDE 37

Hash functions for hypercube

  • Define family G:
  • Intuition: compare k random coordinates
  • Choose k later: logarithmic in n (cf. the J-L lemma)

Given F: G = {g : H^d → {0,1}^k},  g(b) = (h_1(b), …, h_k(b)),  h_i ∈ F i.i.d.

α′ = α^k = (1 - r/d)^k,  β′ = β^k = (1 - (1+ε)r/d)^k
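A minimal sketch of sampling g ∈ G (function and variable names are mine, not the slides'): pick k coordinate indices at random and concatenate the corresponding h_i.

```python
import random

def sample_g(d: int, k: int):
    """Sample g in G: concatenation of k random coordinate functions h_i in F."""
    coords = [random.randrange(d) for _ in range(k)]
    return lambda b: tuple(b[i] for i in coords)  # b: 0/1 tuple of length d

g = sample_g(d=8, k=3)
print(g((1, 0, 1, 1, 0, 0, 1, 0)))  # e.g. (0, 1, 0), depending on the draw
```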

SLIDE 38

Constructing hash tables

  • Choose g_1, …, g_τ uniformly at random from G
  • Construct τ hash tables; hash P into each
  • Will choose τ later

[Figure: τ hash tables, one per g_1, g_2, …, g_τ]

SLIDE 39

LSH: ANN algorithm

  • Hash q into each of g_1, …, g_τ
  • Check colliding points for ANN
  • Stop if more than 4τ collisions; return fail

[Figure: q probed in each of the tables for g_1, g_2, …, g_τ]
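Putting slides 38 and 39 together, a self-contained sketch (assuming points are 0/1 tuples; the choice of k and τ follows on the next slide):

```python
import random
from collections import defaultdict

def build(P, d, k, tau):
    """Hash every point of P into tau independent tables, one per g_j."""
    gs = [[random.randrange(d) for _ in range(k)] for _ in range(tau)]
    tables = [defaultdict(list) for _ in range(tau)]
    for p in P:
        for g, table in zip(gs, tables):
            table[tuple(p[i] for i in g)].append(p)
    return gs, tables

def query(q, gs, tables, r, eps, tau):
    """Check colliding points for an ANN; give up after 4*tau collisions."""
    seen = 0
    for g, table in zip(gs, tables):
        key = tuple(q[i] for i in g)
        for p in table.get(key, []):
            if sum(a != b for a, b in zip(p, q)) <= (1 + eps) * r:
                return p          # an ANN of q
            seen += 1
            if seen >= 4 * tau:
                return None       # fail (happens with small probability)
    return None                   # no collision in any table
```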

SLIDE 40

Details…

SLIDE 41

Choosing parameters

  • Choose k and τ to ensure constant probability of:
  • Finding an ANN if there is a NN
  • Few (< 4τ) collisions when there is no ANN

Define: ρ = ln(1/α) / ln(1/β)
Choose: k = ln n / ln(1/β),  τ = 2 n^ρ
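Plugging in illustrative numbers (d = 784 and n = 60000 echo the MNIST experiment later in the talk; r and ε here are made-up values, not from the slides):

```python
import math

d, r, eps, n = 784, 50, 0.5, 60_000
alpha = 1 - r / d                 # per-coordinate collision prob., close pair
beta = 1 - (1 + eps) * r / d      # per-coordinate collision prob., far pair

rho = math.log(1 / alpha) / math.log(1 / beta)
k = math.ceil(math.log(n) / math.log(1 / beta))
tau = math.ceil(2 * n ** rho)
print(rho, k, tau)                # rho ~ 0.66, k ~ 110, tau ~ 2700 << n
```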

SLIDE 42

Analysis of LSH

  • Probability of finding an ANN if there is a NN
  • Consider a point p ∈ B(q, r) and a hash function g_i ∈ G

Pr[g_i(p) = g_i(q)] ≥ α^k = α^(ln n / ln(1/β)) = n^(-ln(1/α)/ln(1/β)) = n^(-ρ)

SLIDE 43

Analysis of LSH

  • Probability of finding an ANN if there is a NN
  • Consider a point p ∈ B(q, r) and the hash functions g_1, …, g_τ

Pr[p and q hash to different locations in all τ tables] ≤ (1 - n^(-ρ))^τ

Pr[p and q collide in at least one of the τ = 2n^ρ tables] ≥ 1 - (1 - n^(-ρ))^(2n^ρ) ≥ 1 - e^(-2) > 4/5

SLIDE 44

Analysis of LSH

  • Probability of collision if there is no ANN
  • Consider a point p ∉ B(q, (1+ε)r) and a hash function g ∈ G

Pr[g(p) = g(q)] ≤ β^k = exp(ln β · ln n / ln(1/β)) = 1/n

SLIDE 45

Analysis of LSH

  • Probability of collision if there is no ANN
  • Consider a point p ∉ B(q, (1+ε)r) and a hash function g ∈ G

E[collisions with q in a single table] ≤ n · (1/n) = 1
E[collisions with q in τ tables] ≤ τ
Pr[≥ 4τ collisions] ≤ τ/(4τ) = 1/4   (Markov’s inequality)
Pr[< 4τ collisions] ≥ 3/4

SLIDE 46

Results

SLIDE 47

Complexity of LSH

  • Given: an (r, (1+ε)r, α, β)-sensitive family for the hypercube
  • Can answer Decision-ANN with:

O(dn + n^(1+ρ)) space,  O(d n^ρ) query time

  • Show:

ρ = ln(1/α) / ln(1/β) = ln(1 - r/d) / ln(1 - (1+ε)r/d) ≤ 1/(1+ε)

SLIDE 48

Complexity of LSH

  • Given: an (r, (1+ε)r, α, β)-sensitive family for the hypercube
  • Can answer Decision-ANN with:

O(dn + n^(1 + 1/(1+ε))) space,  O(d n^(1/(1+ε))) query time

SLIDE 49

Complexity of LSH

  • Can amplify success probability
  • Build O(log n) structures
  • Can answer Decision-ANN with:

O(dn + n^(1 + 1/(1+ε)) log n) space,  O(d n^(1/(1+ε)) log n) query time

SLIDE 50

Complexity of LSH

  • Can answer (full) ANN on the hypercube:
  • Build O((1/ε) log n) structures with radii r_i = (1+ε)^i

O(dn + (1/ε) n^(1 + 1/(1+ε)) log² n) space,  O((1/ε) d n^(1/(1+ε)) log n) query time

SLIDE 51

LSH - Summary

  • Randomized Monte-Carlo algorithm for ANN
  • First truly sub-linear query time for ANN
  • Need to examine only a logarithmic number of coordinates
  • Can be extended to any metric space if we can find a hash function for it!
  • Easy to update dynamically
  • Can reduce ANN in R^d to ANN on the hypercube
SLIDE 52

Overview

  • Introduction
  • Locality Sensitive Hashing
  • Hash Functions Based on p-Stable Distributions

  • The basic idea
  • The details (more formal)
  • Analysis, experimental results
SLIDE 53

LSH by Random Projections

Idea:

  • Hash function is a projection to a line of random orientation
  • One composite hash function is a random grid
  • Hashing buckets are grid cells
  • Multiple grids are used for prob. amplification
  • Jitter grid offset randomly (check only one cell)
  • Double hashing: do not store empty grid cells
SLIDE 54

LSH by Random Projections

Basic Idea:

SLIDE 55

LSH by Random Projections

Questions:

  • What distribution should be used for the projection vectors?
  • What is a good bucket size?
  • Local sensitivity:
  • How many lines per grid?
  • How many hash grids overall?
  • Depends on sensitivity (as explained before)
  • How efficient is this scheme?
SLIDE 56

The Details

SLIDE 57

p-Stable Distributions

Distribution for the Projection Vectors:

  • Need to make the projection process formally accessible
  • Mathematical tool: p-stable distributions
SLIDE 58

p-Stable Distributions

p-Stable Distributions: A prob. distribution D is called p-stable :⇔

  • For any v_1, …, v_n ∈ R
  • and i.i.d. random variables X_1, …, X_n ~ D:

Σ_i v_i X_i has the same distribution as (Σ_i |v_i|^p)^(1/p) · X, where X ~ D
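A quick numpy check of the definition for p = 2 (Gaussian), comparing the empirical spread of Σ v_i X_i against ‖v‖₂ · X (a sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
v = np.array([3.0, -1.0, 2.0])
m = 100_000

lhs = rng.normal(size=(m, 3)) @ v             # sum_i v_i X_i
rhs = np.linalg.norm(v) * rng.normal(size=m)  # (sum_i |v_i|^2)^(1/2) X

# Both samples follow N(0, ||v||_2^2); empirical std devs agree up to noise.
print(lhs.std(), rhs.std())                   # both ~ 3.74
```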

SLIDE 59

Gaussian Distribution

Gaussian Normal Distributions are 2-stable


SLIDE 60

More General Distributions

Other distributions:

  • Cauchy distribution, density f(x) = 1/(π(1 + x²)), is 1-stable (it must have infinite variance so that the central limit theorem is not violated)
  • Distributions exist for p ∈ (0, 2]
  • No closed form, but can be sampled
  • Sampling is sufficient for the LSH algorithm

SLIDE 61

Projection

Projection Algorithm:

  • Choose p according to the metric of the space ℓ_p
  • Compute vectors with entries drawn from a p-stable distribution [for example: Gaussian noise entries]
  • Each vector v_i yields a hash function h_i
  • Compute:

h_i(x) = ⌊(⟨v_i, x⟩ + b) / r⌋,  with random offset b ∈ [0, r] and bucket size r
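A sketch of one such hash function (function and variable names are mine), with Gaussian entries for ℓ₂ and Cauchy entries for ℓ₁:

```python
import numpy as np

def make_hash(d: int, r: float, rng, p: int = 2):
    """One p-stable LSH function h(x) = floor((<v, x> + b) / r)."""
    if p == 2:
        v = rng.normal(size=d)            # Gaussian entries: 2-stable, for l2
    else:
        v = rng.standard_cauchy(size=d)   # Cauchy entries: 1-stable, for l1
    b = rng.uniform(0.0, r)               # random offset in [0, r]
    return lambda x: int(np.floor((v @ x + b) / r))

rng = np.random.default_rng(0)
h = make_hash(d=128, r=4.0, rng=rng)      # h(x) gives the bucket index of x
```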

SLIDE 62

Locality Sensitive HF

Locality-Sensitive Hash Functions: H = {h: S → U} is (r_1, r_2, α, β)-sensitive :⇔

v ∈ B(q, r_1) ⇒ Pr[h(v) = h(q)] ≥ α
v ∉ B(q, r_2) ⇒ Pr[h(v) = h(q)] ≤ β

Performance: O(dn + n^(1+ρ)) space, O(d n^ρ) query time, with ρ = ln(1/α) / ln(1/β)

SLIDE 63

Locality Sensitivity

Computing the Locality “Sensitivity”:

Distance c = ||v_1 - v_2||_p; the projected distance is then cX-distributed, with X from the p-stable distribution.

Pr(collision) = ∫_0^r (1/c) f_p(t/c) (1 - t/r) dt

Here (1/c) f_p(t/c) is the density of the absolute projected distance (f_p: density of the absolute value of the p-stable variable), and (1 - t/r) is the probability of hitting the same bucket of h(x) = ⌊(⟨v, x⟩ + b)/r⌋.

The constructed family of hash functions is (r_1, r_2, α, β)-sensitive for α = p(1), β = p(c), r_2/r_1 = c.
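The integral is easy to evaluate numerically. A sketch for the Gaussian case, where f_2(t) = 2φ(t) is the density of |X|, X ~ N(0,1); the bucket width r = 4 echoes the factor-4 practice noted two slides later:

```python
import numpy as np

def p_collision(c: float, r: float, m: int = 100_000) -> float:
    """Midpoint-rule evaluation of p(c) = int_0^r (1/c) f_2(t/c) (1 - t/r) dt."""
    dt = r / m
    t = (np.arange(m) + 0.5) * dt
    f_abs = 2.0 * np.exp(-0.5 * (t / c) ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.sum((1.0 / c) * f_abs * (1.0 - t / r)) * dt)

r = 4.0
alpha, beta = p_collision(1.0, r), p_collision(2.0, r)  # c = 1 vs. c = 2
rho = np.log(1 / alpha) / np.log(1 / beta)
print(alpha, beta, rho)   # rho ~ 0.45, close to 1/c = 0.5 (next slide)
```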

SLIDE 64

Numerical Computation

With ρ = ln(1/α) / ln(1/β): O(dn + n^(1+ρ)) space, O(d n^ρ) query time

[Datar et al. 04]

[Figure: ρ as a function of c for ℓ_1 and ℓ_2]

Numerical result: ρ ~ 1/c = 1/(1+ε)

SLIDE 65

Numerical Computation

[Datar et al. 04]

[Figure: width-parameter study for ℓ_1 and ℓ_2]

Width Parameter r

  • Intuitively: In the range of ball radius
  • Num. result: not too small (too large increases k)
  • Practice: factor 4 (E2LSH manual)
SLIDE 66

Experimental Results

SLIDE 67

LSH vs. ANN

Comparison with ANN (Mount, Arya; kD/BBD-trees) on the MNIST handwritten digits: 60000 images of 28×28 pixels (d = 784)

[Datar et al. 04]

SLIDE 68

LSH vs. ANN

Remarks:

  • ANN with c = 10 is comparably fast and 65% correct, but there are no guarantees [Indyk]
  • LSH needs more memory: 1.2 GB vs. 360 MB [Indyk]
  • Empirically, LSH shows linear performance when forced to use linear memory [Goldstein et al. 05]
  • The benchmark searches only for points in the data set; LSH is much slower for negative results [Goldstein et al. 05, report ~1.5 orders of magnitude]