Approximate Nearest Neighbor Problem: Improving Query Time (CS468)


slide-1
SLIDE 1

CS468, 10/9/2006

Approximate Nearest Neighbor Problem: Improving Query Time

slide-3
SLIDE 3

Outline

  • Reducing the "constant" from $O(\epsilon^{-d})$ to $O(\epsilon^{-(d-1)/2})$ in query time
  • Need to know $\epsilon$ ahead of time
    – Preprocessing time and storage feature $O(\epsilon^{-d})$, $O(\epsilon^{-(d-1)/2})$, etc.
  • Timothy M. Chan. Approximate Nearest Neighbor Queries Revisited. Discrete and Computational Geometry, 1998.
    – Decomposition of space into cones
    – BBD-tree for range searching in $\mathbb{R}^{d-k}$ + point location in $\mathbb{R}^k$
  • Kenneth Clarkson. An Algorithm for Approximate Closest-point Queries. SoCG 1994.
    – Additional $\log(\rho/\epsilon)$ in space complexity
    – Polytope approximation in $\mathbb{R}^{d+1}$

slide-5
SLIDE 5

Chan’s Algorithm: Motivation

$(1+\epsilon)$-ANN among (sorted) points in a narrow cone
  • $O(\log n)$ by binary search
  • Need a data structure that returns sorted points given $q$ and a cone direction

[Figure: query point q at the apex of a narrow cone]

Uses the BBD-tree data structure
  • Given a query point $q \in \mathbb{R}^d$ and a radius $r$, one can find $O(\log n)$ cells of the BBD-tree which cover $B(q, r)$ and are contained in $B(q, 2r)$. This takes $O(\log n)$ time.
  • Use for approximate range searching in $\mathbb{R}^{d-1}$

slide-6
SLIDE 6

Conic ANN (with a Hint)

Input: Query point $q$ and a 2-approximation $r$ to the NN distance
Note: The cone is fixed (not a part of the input, modulo translation to $q$)

Output: A point $s$ such that $\|q - s\| \le (1+\epsilon)\|q - p\|$, where $p$ is the NN inside a cone with apex $q$ and angle $\delta = \sqrt{\epsilon/16}$
Note: $s$ need not be in the cone!

[Figure: cone with apex q and angle δ; points s and p; radius r]

slide-10
SLIDE 10

Main $(1+\epsilon)$-ANN Algorithm

Uses the "conic-ANN with a hint" as a subroutine

Preprocessing
  • "Tile" $\mathbb{R}^d$ with $O(\epsilon^{-(d-1)/2})$ cones of angle $\delta = \Theta(\sqrt{\epsilon})$
  • Build a "conic-ANN" data structure for each ("floating") cone

Query (given only $q$)
  • Obtain $r$ by [Arya and Mount 1998]
  • Get one point per data structure, return the one closest to $q$

Correctness
  • The cone containing the true NN $p$ yields a $(1+\epsilon)$-ANN $s$ (returned from that cone's data structure)

Query time: $O(\epsilon^{-(d-1)/2} \log n)$, i.e. [# of cones] × [conic query]

slide-14
SLIDE 14

Conic-ANN Data Structure

For preprocessing, given only the direction of the cone (wlog: the d-axis) and the angle $\delta$

Data structure
  • BBD-tree on the projection set $\{p' = [p_1\ p_2\ \cdots\ p_{d-1}]^T : p \in P\}$
  • For every tree node $v$, the associated list of points is sorted by the $d$-th coordinate

Query algorithm (given $q$ and $r$)
  • Approximate range query on the set of projections with $B(q', \delta r)$
    – returns $O(\log n)$ BBD-nodes (cells) in $O(\log n)$ time
  • $O(\log n)$ binary searches, one per returned node
  • Return the point $s$ such that $|s_d - q_d|$ is minimal

Correctness (proof of $\|q - s\| \le (1+\epsilon)\|q - p\|$)
  • $|s_d - q_d| \le |p_d - q_d| \le \|p - q\|$
  • $\|s' - q'\| \le 2\delta r \le 4\delta\|p - q\|$
  • $\|s - q\| \le \sqrt{1 + 16\delta^2}\,\|p - q\| = \sqrt{1+\epsilon}\,\|p - q\| \le (1+\epsilon)\|p - q\|$

[Figure: cone around the d-axis with apex q and angle δ; radii δr and 2δr; points s and p; radius r]
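The query step of the conic-ANN structure can be sketched in Python as follows. This is only an illustration under stated assumptions: a brute-force scan stands in for the BBD-tree range query and the per-node binary searches, the cone axis is taken to be the last coordinate, and the names `delta_for` and `conic_ann_query` are hypothetical.

```python
import math

def delta_for(eps):
    """Cone angle delta = sqrt(eps / 16), so that 16 * delta**2 == eps."""
    return math.sqrt(eps / 16)

def conic_ann_query(points, q, r, delta):
    """Return the candidate s minimizing |s_d - q_d|, or None if there is none.

    Assumption: a linear scan replaces the BBD-tree. It keeps every point whose
    projection onto the first d-1 coordinates lies within 2*delta*r of q',
    the outer radius that an approximate range query is allowed to report.
    """
    q_proj, q_d = q[:-1], q[-1]
    candidates = [p for p in points
                  if math.dist(p[:-1], q_proj) <= 2 * delta * r]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p[-1] - q_d))

# Small 2-D example: p = (0.05, 1.0) lies inside the cone around the y-axis,
# and r = 1.5 is within a factor 2 of the nearest-neighbor distance.
eps = 0.16
points = [(0.05, 1.0), (0.25, 0.9), (5.0, 0.1)]
q = (0.0, 0.0)
s = conic_ann_query(points, q, r=1.5, delta=delta_for(eps))
print(s, math.dist(s, q) <= (1 + eps) * math.dist((0.05, 1.0), q))
```

In this example the returned point lies outside the cone, which the guarantee permits: only its distance to q is bounded.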

slide-15
SLIDE 15

Conic-ANN Analysis

Construction (preprocessing)
  • BBD-tree $O(n \log n)$ + sorting $O(n \log n)$ = $O(n \log n)$

Query
  • Approximate range query $O(\log n)$ + binary searches $O(\log^2 n)$ = $O(\log^2 n)$

Improving query time by exploiting correlation [Lueker and Willard]

[Figure: tree node v with children left(v) and right(v); $O(\log n)$ nodes, one labeled with search cost $O(\log n)$ and the others with $O(1)$]

slide-16
SLIDE 16

Summary and Remarks

Variant with projecting to $d - 2$ dimensions
  • BBD-tree + planar point location

Rough ($\approx d^{3/2}$) approximation algorithms
  • Polynomial dependence on $d$

slide-19
SLIDE 19

Clarkson’s Algorithm: Iterative Improvement

Exact nearest neighbor problem

Data structure
  • For each site $s$, a (small) list $L_s$ of other sites such that for any query point $q$: if $s$ is not the nearest neighbor of $q$, then $L_s$ contains a site closer to $q$

Algorithm
    s ← arbitrary site
    while ∃ t ∈ L_s : ||t − q|| < ||s − q|| do
        s ← t
    return s

Note: The same $L_s$ is valid for all $q$!

[Figure: site s and two query points q and q′]
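A minimal Python sketch of this walk, assuming the lists are given as a dictionary from site index to neighbor indices. The function name `iterative_improvement` and the toy lists are illustrative, not taken from Clarkson's paper.

```python
import math

def iterative_improvement(sites, lists, q, start=0):
    """Walk from `start`, moving to any listed site strictly closer to q."""
    s = start
    visited = [s]
    while True:
        closer = next((t for t in lists[s]
                       if math.dist(sites[t], q) < math.dist(sites[s], q)), None)
        if closer is None:
            return s, visited
        s = closer
        visited.append(s)

# Toy input: sites on a line, each list holding only the adjacent sites.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
lists = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
nearest, visited = iterative_improvement(sites, lists, q=(2.9, 0.0))
print(nearest, visited)
```

On this input the walk passes through every site before stopping at the nearest one, so small lists alone do not make the walk short.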

slide-23
SLIDE 23

Not Useful for Exact NN

Reason 1: space complexity $\Omega(n^2)$
  • For all $s$, $L_s$ has to include all Delaunay neighbors of $s$
  • For $d > 2$, the Delaunay triangulation may have $\Omega(n^2)$ edges
  • Proof: if $t$ is a Delaunay neighbor of $s$ but $t \notin L_s$, there is a query point $q$ for which $t$ is the only site closer to $q$ than $s$
    [Figure: sites s and t, query point q, and center c]

Reason 2: query time $\Omega(n)$
  • No "sufficient progress" guarantee, may have to visit all sites
    [Figure: query point q and sites s1, s2, s3, s4, s5]

Conclusion: No improvement over the trivial algorithm!

slide-25
SLIDE 25

Modification for ANN

Data structure
  • For each site $s$, a (small) list $L_s$ of other sites such that for any query point $q$: if $s$ is not a $(1+\epsilon)$-ANN of $q$, then $L_s$ contains a site $(1+\epsilon/2)$-closer to $q$

Algorithm (simple version)
    s ← arbitrary site
    while ∃ t ∈ L_s : ||q − t|| ≤ ||q − s|| / (1 + ε/2) do
        s ← t
    return s

[Figure: query point q with concentric balls of radius ||q − s||, ||q − s||/(1 + ε/2), and ||q − s||/(1 + ε); sites s, t, and b]
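The simple version can be sketched in Python as below. The lists used here are the complete lists (every other site), which satisfy the required condition trivially and are chosen only to keep the sketch self-contained. The name `ann_walk` is hypothetical.

```python
import math

def ann_walk(sites, lists, q, eps, start=0):
    """Move to a listed site only if it is (1 + eps/2)-closer to q."""
    s, steps = start, 0
    while True:
        threshold = math.dist(sites[s], q) / (1 + eps / 2)
        better = next((t for t in lists[s]
                       if t != s and math.dist(sites[t], q) <= threshold), None)
        if better is None:
            return s, steps
        s, steps = better, steps + 1

sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
# Assumption: complete lists, not the small lists the preprocessing aims for.
lists = {i: [j for j in range(len(sites)) if j != i] for i in range(len(sites))}
result, steps = ann_walk(sites, lists, q=(2.9, 0.0), eps=1.0)
print(result, steps)
```

Each accepted move shrinks the distance to q by a factor of at least 1 + ε/2, so the number of moves is at most the logarithm, base 1 + ε/2, of the ratio between the starting and final distances.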

slide-31
SLIDE 31

Query Algorithm

Skip list approach [Arya and Mount 1993]
  • Levels $R_0 = S, R_1, R_2, R_3, \ldots, R_K$

Algorithm
  • start with any $t_{K-1} \in R_{K-1}$
  • for $j = K-2, K-3, \ldots, 0$
    – find $t_j$ = $(1+\epsilon)$-ANN of $q$ in $R_j$, starting from $t_{j+1}$ [using the naive algorithm]
  • return $t_0$
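The top-down search can be sketched in Python as follows. The sketch makes several assumptions that are not in the slides: each level keeps every site of the previous level with probability 1/2, the per-level lists are the whole level (so the naive walk is a plain scan), and the walk is also run once in the top level so that a single-level input is handled. All function names are hypothetical.

```python
import math
import random

def build_levels(sites, rng):
    """Levels R_0 = S, R_1, ...: each keeps every site of the previous one with probability 1/2."""
    levels = [list(sites)]
    while len(levels[-1]) > 1:
        sample = [s for s in levels[-1] if rng.random() < 0.5]
        if not sample:
            break
        levels.append(sample)
    return levels

def walk(level, q, eps, start):
    """Naive iterative improvement inside one level (assumed lists: the whole level)."""
    s = start
    while True:
        threshold = math.dist(s, q) / (1 + eps / 2)
        better = next((t for t in level
                       if t != s and math.dist(t, q) <= threshold), None)
        if better is None:
            return s
        s = better

def skip_list_query(levels, q, eps):
    """Top-down search: the answer at each level seeds the walk one level below."""
    t = levels[-1][0]  # any site of the top level
    for level in reversed(levels):
        t = walk(level, q, eps, t)
    return t

rng = random.Random(0)
sites = [(rng.random(), rng.random()) for _ in range(200)]
levels = build_levels(sites, rng)
answer = skip_list_query(levels, q=(0.3, 0.7), eps=0.5)
```

Because the last walk runs on $R_0 = S$ and stops only when no site is $(1+\epsilon/2)$-closer, the returned site is within a factor $1+\epsilon$ of the true nearest-neighbor distance.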

slide-34
SLIDE 34

Query Time Analysis

Suppose that any node's list size is at most $c$
  • Observation: Query time = $c \cdot$ number of visited nodes

Compare with a regular path
  • Visit nodes in the order of proximity to $q$, then go to the lower level
  • $\Pr[\text{regular path length} \ge C \log n] \le O(n^{-C})$
    [distribution of points across levels] [starting search point]

Claim: Our path visits at most $2K$ nodes more
  • $(1+\epsilon/2)^2 \ge 1+\epsilon \Rightarrow \|q - t'\| \le \|q - t\|$

[Figure: levels $R_{j+1}$ and $R_j$, query point q, sites t, t′, and $t_{j+1}$]

slide-37
SLIDE 37

Query Time Analysis

Skip list
  • $n$ possible search targets
  • Probability of failure $n \cdot O(n^{-C}) = O(n^{-(C-1)})$

What about any $q$? Only $n^{O(d)}$ "combinatorially distinct" regular paths
  • If $q_1$ and $q_2$ induce the same distance ordering on the input sites, their regular paths are the same
  • The arrangement of $\binom{n}{2}$ bisecting hyperplanes has $O\big(\binom{n}{2}^d\big)$ $d$-dimensional cells, and $\binom{n}{2}^d \le (n^2)^d = n^{2d}$
  • Union bound over the cells: probability of failure $n^{2d} \cdot O(n^{-C}) = O(n^{-(C-2d)})$
  • Setting $C = 2d + C'$: $\Pr[\text{regular path length} \ge O(d) \log n] = O(n^{-C'})$

slide-41
SLIDE 41

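Weighted Voronoi Diagrams

Goal: For each site $s$, compute $L_s$ such that, for all $q \in \mathbb{R}^d$,

  $\forall b \in S : \|q - b\| \ge \frac{\|q - s\|}{1+\epsilon} \;\Longleftarrow\; \forall t \in L_s : \|q - t\| \ge \frac{\|q - s\|}{1+\epsilon/2}$

  [left side: $s$ is a $(1+\epsilon)$-ANN of $q$] [right side: no "improvement" in $L_s$]

With $s$, $b$, $\epsilon$ fixed, the left condition defines a region $Q(b, \epsilon)$; with $s$, $t$, $\epsilon$ fixed, the right condition defines $Q(t, \epsilon/2)$.

[Figure: $Q(b, \epsilon)$ drawn with sites b, s and query point q, with labeled lengths $\frac{1}{\epsilon(2+\epsilon)}\|s - b\|$ and $\frac{2(1+\epsilon)}{\epsilon(2+\epsilon)}\|s - b\|$; $Q(t, \epsilon/2)$ drawn with sites t, s and query point q, with labeled lengths $\frac{1}{(\epsilon/2)(2+\epsilon/2)}\|s - t\|$ and $\frac{2(1+\epsilon/2)}{(\epsilon/2)(2+\epsilon/2)}\|s - t\|$]

Equivalently:

  $\forall b \in S : q \in Q(b, \epsilon) \;\Longleftarrow\; \forall t \in L_s : q \in Q(t, \epsilon/2)$

  $\bigcap_{b \in S} Q(b, \epsilon) \supseteq \bigcap_{t \in L_s} Q(t, \epsilon/2)$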

slide-44
SLIDE 44

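Linearization ("Lifting")

A point inside/outside a sphere in $\mathbb{R}^d$?
  • A point above/below a hyperplane in $\mathbb{R}^{d+1}$?

[Figure: example for $d = 1$; interval $D$ and point $q$ lifted to $D'$ and $q'$ on the parabola $y = \|q\|^2$]

$Q(b, \epsilon) = \{q \in \mathbb{R}^d : \|q - s\| \le (1+\epsilon)\|q - b\|\}$

$P(b, \epsilon) = \{(q, y) : \alpha y \ge 2\langle q, b\rangle - \|b\|^2\} \cap \{(q, y) : y = \|q\|^2\}$, with $\alpha \approx 2\epsilon$
  • First set: $H(b, \epsilon)$, a halfspace in $\mathbb{R}^{d+1}$ (note: contains the origin)
  • Second set: $\Psi$, the standard paraboloid in $\mathbb{R}^{d+1}$ (note: independent of $b$, $\epsilon$)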
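The step from the ball condition to the halfspace condition can be spelled out as follows. This derivation assumes that coordinates are translated so that $s$ is the origin, which the remark that the halfspace contains the origin suggests. The closed form for $\alpha$ is derived here rather than quoted from the slides.

```latex
\begin{align*}
\|q - s\| \le (1+\epsilon)\,\|q - b\|
  &\iff \|q\|^2 \le (1+\epsilon)^2 \left( \|q\|^2 - 2\langle q, b\rangle + \|b\|^2 \right)
     && \text{(with } s = 0\text{)} \\
  &\iff \left( 1 - \frac{1}{(1+\epsilon)^2} \right) \|q\|^2 \ge 2\langle q, b\rangle - \|b\|^2 \\
  &\iff \alpha\, y \ge 2\langle q, b\rangle - \|b\|^2,
     \qquad y = \|q\|^2,\quad
     \alpha = 1 - \frac{1}{(1+\epsilon)^2} = \frac{\epsilon(2+\epsilon)}{(1+\epsilon)^2} \approx 2\epsilon .
\end{align*}
```

Substituting $y = \|q\|^2$ is what turns the quadratic condition on $q$ into a linear condition on the lifted point $(q, y)$.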

slide-45
SLIDE 45

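Final Formulation

Paraboloid $\Psi = \{(q, y) : y = \|q\|^2\}$

[Figure: paraboloid in the $(q, y)$ plane with sites s and b; marked heights $-\|b\|^2/\alpha$, $\frac{1}{4}\|b\|^2$, and $\frac{4}{\alpha^2}\|b\|^2$]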

slide-46
SLIDE 46

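Final Formulation

Paraboloid $\Psi = \{(q, y) : y = \|q\|^2\}$
Halfspaces $H(b, \epsilon) = \{(q, y) : \alpha y \ge 2\langle b, q\rangle - \|b\|^2\}$ for all $b \in S$ [can compute using $S$ and $\epsilon$]

[Figure: paraboloid in the $(q, y)$ plane with site s and the halfspaces $H(b, \epsilon)$; the marked region is the set of query points for which $s$ is a $(1+\epsilon)$-ANN]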

slide-47
SLIDE 47

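Final Formulation

Paraboloid $\Psi = \{(q, y) : y = \|q\|^2\}$
Halfspaces $H(b, \epsilon) = \{(q, y) : \alpha y \ge 2\langle b, q\rangle - \|b\|^2\}$ for all $b \in S$ [can compute using $S$ and $\epsilon$]
Halfspaces $G(t, \epsilon' = \epsilon/2) = \{(q, y) : \alpha' y \ge 2\langle t, q\rangle - \|t\|^2\}$ for all $t \in L_s$ [unknown]

Goal: It suffices to make sure that $\bigcap_{t \in L_s} G(t, \epsilon/2) \cap \Psi \subseteq \bigcap_{b \in S} H(b, \epsilon)$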

slide-48
SLIDE 48

Preprocessing

    initialize the weight of all sites to 1
    repeat
        pick a (weighted) random sample R ⊆ S of size C_1 · c · d · log c
        if ⋂_{t∈R} G(t, ε/2) ∩ Ψ ⊆ ⋂_{b∈S} H(b, ε)
            return R
        else
            v = a violating vertex of ⋂_{t∈R} G(t, ε/2) ∩ Ψ
            double the weight of V = {t ∈ S \ R : v ∉ G(t, ε/2)}

  • The sample size depends on $c$, the optimal size of $L_s$
  • Next we bound $c$ using polytope approximation
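The reweighting loop can be illustrated on a finite abstraction in Python. The geometric containment test is replaced here by a fixed list `violator_sets`, which is an assumption of this sketch: each entry plays the role of the set V for one candidate vertex v, and a sample passes when it contains at least one site from every entry. The function name, its parameters, and the `max_rounds` guard are hypothetical.

```python
import random

def reweighted_sampling(universe, violator_sets, sample_size, rng, max_rounds=10_000):
    """Iterative reweighting: resample until the sample hits every violator set.

    Assumption: `violator_sets` is a finite stand-in for the geometric test.
    Each entry lists the sites t whose halfspace G(t, eps/2) would cut off
    one candidate vertex v.
    """
    items = list(universe)
    weight = {t: 1.0 for t in items}
    for _ in range(max_rounds):
        sample = set(rng.choices(items, weights=[weight[t] for t in items],
                                 k=sample_size))
        violated = next((V for V in violator_sets if not (V & sample)), None)
        if violated is None:
            return sample
        for t in violated:  # every such t lies outside the current sample
            weight[t] *= 2.0
    raise RuntimeError("no valid sample found within max_rounds")

rng = random.Random(1)
violator_sets = [{0, 1}, {0, 2}, {3, 4}, {3, 5}]
R = reweighted_sampling(range(8), violator_sets, sample_size=4, rng=rng)
```

Doubling the weights of the sites that would have repaired the violation makes them more likely to appear in later samples, which is what drives the loop toward a valid sample.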

slide-50
SLIDE 50

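Size of $L_s$

Exhibit a list of size $O\!\left(\epsilon^{-(d-1)/2} \log\frac{\rho}{\epsilon}\right)$, where $\rho = \frac{\max_{s,t \in S} \|s - t\|}{\min_{s \ne t \in S} \|s - t\|}$

Lemma: For any convex and compact set $P \subset \mathbb{R}^d$ contained in the unit sphere and any $\epsilon \in (0, 1)$, there is a polytope $P' \supset P$ with at most $O(\epsilon^{-(d-1)/2})$ facets which is in the $\epsilon$-neighborhood of $P$.

[Figure: $P$ inside the unit sphere, with the outer polytope $P'$ within distance $\epsilon$ of $P$]

Note: Always an "outer" approximation

Recall: We need an "inner" approximation of this
[Figure: the halfspaces $H(b, \epsilon)$, $b \in S$, in the $(q, y)$ plane]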

slide-53
SLIDE 53

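Size of $L_s$

Want an "inner" approximation of this [figure, left: region in the $(q, y)$ plane] using only these hyperplanes as potential facets [figure, right: hyperplanes after "stretching" ≈ 2 times]

Goal: Subsample (as much as possible) the hyperplanes on the right so that [region bounded by the subsampled hyperplanes] ⊆ [region on the left]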

slide-54
SLIDE 54

Size of $L_s$

Straightforward application of Dudley's Theorem does not work!
  • The value of $\epsilon$ is dictated by the smallest scale

[Figure: Dudley approximation in the $(q, y)$ plane, with gap $\ge \epsilon$ (in Dudley's Theorem)]

slide-55
SLIDE 55

Size of $L_s$

Solution: height-dependent slicing, per-slice Dudley approximations

Slices have
  • geometrically increasing height
  • "constant" gap

[Figure: a vertical slice in the $(q, y)$ plane]

slide-56
SLIDE 56

Size of $L_s$

Slice boundaries
  • $d_0 = \frac{1}{4} \min_{b \in S} \|b\|^2$
  • $d_i = \frac{3}{2} d_{i-1}$
  • $d_m > \frac{4}{\alpha^2} \max_{b \in S} \|b\|^2$

Number of slices: $m = O(\log(\rho/\alpha))$ (recall: $\rho$ is the spread)
Complexity (number of facets) of approximation: $O(\epsilon^{-(d-1)/2})$ per slice

Key fact: Red and blue projections into the $q$-hyperplane within one slice are at least a factor of $1 + \epsilon$ apart, so the same $\epsilon$ can be used in all approximations

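The slice boundaries are easy to tabulate. The following Python sketch generates $d_0, d_1, \ldots, d_m$ from the three rules above; the function name `slice_heights` and its parameter names are hypothetical, and the inputs are the minimum and maximum of $\|b\|^2$ over the sites.

```python
def slice_heights(min_b_sq, max_b_sq, alpha):
    """Heights d_0 = min/4, d_i = 1.5 * d_{i-1}, stopping once d_m > 4 * max / alpha**2."""
    heights = [min_b_sq / 4]
    top = 4 * max_b_sq / alpha ** 2
    while heights[-1] <= top:
        heights.append(1.5 * heights[-1])
    return heights

heights = slice_heights(min_b_sq=1.0, max_b_sq=1.0, alpha=1.0)
m = len(heights) - 1
print(m, heights[0], heights[-1])
```

Since $d_m / d_0$ is about $16 \max\|b\|^2 / (\alpha^2 \min\|b\|^2)$ and the heights grow by a factor of 3/2 per slice, $m$ is logarithmic in that ratio. When $s$ is taken as the origin (an assumption suggested by the halfspace formulas), $\max\|b\| / \min\|b\|$ is at most the spread $\rho$, which matches the stated $O(\log(\rho/\alpha))$.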
slide-57
SLIDE 57

Clarkson’s Algorithm: Summary

  • Improved query time at the expense of specifying $\epsilon$ in advance
    – $O(\epsilon^{-(d-1)/2})$ instead of $O(\epsilon^{-d})$
  • Express the condition on $L_s$ in the form of $P(S, \epsilon) \supseteq Q(L_s, \epsilon/2)$
  • Preprocessing by iterative random sampling from $S$ and checking the containment condition
  • Query procedure using
    – top-down search on a skip list
    – iterative improvement algorithm within one level