Approximate Nearest Line Search in High Dimensions - Sepideh Mahabadi



slide-1
SLIDE 1

Approximate Nearest Line Search in High Dimensions

Sepideh Mahabadi

slide-5
SLIDE 5

The NLS Problem

  • Given: a set of $N$ lines $L$ in $\mathbb{R}^d$
  • Goal: build a data structure s.t.
    – given a query $q$, find the closest line $\ell^*$ to $q$
    – polynomial space
    – sub-linear query time

Approximation

  • Finds an approximate closest line $\ell$:

$\mathrm{dist}(q, \ell) \le \mathrm{dist}(q, \ell^*)(1 + \varepsilon)$
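For reference, the problem statement can be made concrete with the linear-scan baseline that a sub-linear data structure is meant to beat. This is only a sketch: the representation of a line as a (point, direction) pair and the function names are assumptions made for illustration, not part of the talk.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Line = Tuple[Vector, Vector]  # assumed representation: (point on the line, direction)


def dist_point_line(q: Vector, line: Line) -> float:
    """Euclidean distance from the point q to the infinite line a + t * u."""
    a, u = line
    w = [qi - ai for qi, ai in zip(q, a)]
    # Parameter of the orthogonal projection of q onto the line.
    t = sum(wi * ui for wi, ui in zip(w, u)) / sum(ui * ui for ui in u)
    return math.sqrt(sum((wi - t * ui) ** 2 for wi, ui in zip(w, u)))


def nearest_line_linear_scan(q: Vector, lines: List[Line]) -> int:
    """Index of the exact closest line; costs O(N * d) per query."""
    return min(range(len(lines)), key=lambda i: dist_point_line(q, lines[i]))


def is_approx_nearest(q: Vector, lines: List[Line], i: int, eps: float) -> bool:
    """Check dist(q, lines[i]) <= (1 + eps) * dist(q, closest line)."""
    best = min(dist_point_line(q, line) for line in lines)
    return dist_point_line(q, lines[i]) <= (1 + eps) * best
```

The last helper simply restates the approximation guarantee; it still scans every line, so it is useful for checking answers rather than for answering queries.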

slide-6
SLIDE 6

BACKGROUND

  • Nearest Neighbor Problems
  • Motivation
  • Previous Work
  • Our Result
  • Notation

slide-9
SLIDE 9

Nearest Neighbor Problem

NN: Given a set of $N$ points $P$, build a data structure s.t. given a query point $q$, it finds the closest point $p^*$ to $q$.

  • Applications: database, information retrieval, pattern recognition, computer vision
    – Features: dimensions
    – Objects: points
    – Similarity: distance between points
  • Current solutions suffer from the "curse of dimensionality":
    – Either space or query time is exponential in $d$
    – Little improvement over linear search

slide-11
SLIDE 11

Approximate Nearest Neighbor (ANN)

  • ANN: Given a set of $N$ points $P$, build a data structure s.t. given a query point $q$, it finds an approximate closest point $p$ to $q$, i.e., $\mathrm{dist}(q, p) \le \mathrm{dist}(q, p^*)(1 + \varepsilon)$
  • There exist data structures with different tradeoffs. Example:
    – Space: $(dN)^{O(1/\varepsilon^2)}$
    – Query time: $(d \log N / \varepsilon)^{O(1)}$

slide-14
SLIDE 14

Motivation for NLS

One of the simplest generalizations of ANN: data items are represented by $k$-flats (affine subspaces) instead of points

  • Model data under linear variations
  • Unknown or unimportant parameters in the database
  • Example:
    – Varying light gain parameter of images
    – Each image/point becomes a line
    – Search for the closest line to the query image

slide-17
SLIDE 17

Previous and Related Work

  • Magen[02]: Nearest Subspace Search for constant $k$
    – Query time is fast: $(d + \log N + 1/\varepsilon)^{O(1)}$
    – Space is super-polynomial: $2^{(\log N)^{O(1)}}$

Dual Problem: Database is a set of points, query is a $k$-flat

  • [AIKN] for 1-flat: for any $t > 0$
    – Query time: $O(d^3 N^{0.5 + t})$
    – Space: $d^2 N^{O(1/\varepsilon^2 + 1/t^2)}$
  • Very recently, [MNSS] extended it to $k$-flats
    – Query time: $O(n^{k/(k+1-\rho) + t})$
    – Space: $O(n^{1 + \rho k/(k+1-\rho)} + n \log^{O(1/t)} n)$

slide-21
SLIDE 21

Our Result

We give a randomized algorithm that, for any sufficiently small $\varepsilon$, reports a $(1 + \varepsilon)$-approximate solution with high probability

  • Space: $(N + d)^{O(1/\varepsilon^2)}$
  • Time: $(d + \log N + 1/\varepsilon)^{O(1)}$
  • Matches, up to polynomials, the performance of the best algorithm for ANN. No exponential dependence on $d$
  • The first algorithm with poly-logarithmic query time and polynomial space for objects other than points
  • Only uses reductions to ANN
slide-26
SLIDE 26

Notation

  • $L$: the set of lines, with size $N$
  • $q$: the query point
  • $B(c, r)$: ball of radius $r$ around $c$
  • $\mathrm{dist}$: the Euclidean distance between objects
  • $\mathrm{angle}$: defined between lines
  • $\delta$-close: two lines $\ell$, $\ell'$ are $\delta$-close if $\sin(\mathrm{angle}(\ell, \ell')) \le \delta$
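The closeness test between two lines can be sketched in a few lines of Python. The sketch assumes each line is given by a nonzero direction vector (an illustrative choice; the function names are hypothetical) and uses the absolute value of the cosine, since a line has no preferred orientation.

```python
import math
from typing import Sequence


def sin_angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Sine of the angle between two lines with direction vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Lines are undirected, so only |cos| matters; clamp against rounding.
    cos = min(1.0, abs(dot) / (norm_u * norm_v))
    return math.sqrt(1.0 - cos * cos)


def is_delta_close(u: Sequence[float], v: Sequence[float], delta: float) -> bool:
    """True if the two lines are delta-close, i.e. sin(angle) <= delta."""
    return sin_angle(u, v) <= delta
```

Only the directions enter the test, so two lines can be $\delta$-close while being arbitrarily far apart in space.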

slide-27
SLIDE 27

MODULES

  • Net Module
  • Unbounded Module
  • Parallel Module

slide-30
SLIDE 30

Net Module

  • Intuition: sampling points from each line finely enough to get a set of points $P$, and building an $\mathrm{ANN}(P, \varepsilon)$, should suffice to find the approximate closest line.

Lemma:

  • Let $x$ be the separation parameter: the distance between two adjacent samples on a line. Then
    – Either the returned line $\ell_p$ is an approximate closest line
    – Or $\mathrm{dist}(q, \ell_p) \le x/\varepsilon$

Issue: It should be used inside a bounded region
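A minimal sketch of the sampling idea follows. It makes two assumptions that are not from the talk: the bounded region is modeled as a parameter window of half-length `half_length` around each line's anchor point, and an exact brute-force nearest-point scan stands in for the ANN data structure. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Line = Tuple[Sequence[float], Sequence[float]]  # (anchor point, direction)
NetPoint = Tuple[Tuple[float, ...], int]        # (sample point, index of its line)


def build_net(lines: List[Line], x: float, half_length: float) -> List[NetPoint]:
    """Sample each line at spacing x within [-half_length, half_length] of its anchor."""
    steps = int(half_length // x)
    net: List[NetPoint] = []
    for idx, (a, u) in enumerate(lines):
        norm = math.sqrt(sum(c * c for c in u))
        for j in range(-steps, steps + 1):
            t = j * x  # arc-length offset along the line
            net.append((tuple(ai + t * ui / norm for ai, ui in zip(a, u)), idx))
    return net


def query_net(net: List[NetPoint], q: Sequence[float]) -> int:
    """Return the line owning the sample closest to q (brute force in place of ANN)."""
    _, idx = min(net, key=lambda entry: math.dist(entry[0], q))
    return idx
```

The number of samples grows with `half_length / x`, which illustrates why the module is only usable inside a bounded region.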

slide-33
SLIDE 33

Unbounded Module - Intuition

  • All lines in $L$ pass through the origin $o$
  • Data structure:
    – Project all lines onto any sphere $S(o, r)$ to get point set $P$
    – Build ANN data structure $\mathrm{ANN}(P, \varepsilon)$
  • Query Algorithm:
    – Project the query on $S(o, r)$ to get $q'$
    – Find the approximate closest point to $q'$, i.e., $p = \mathrm{ANN}_P(q')$
    – Return the corresponding line of $p$
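The idealized case above (every line through the origin) can be sketched as follows. Two choices here are assumptions for illustration: each line contributes both of its antipodal intersection points with the sphere, and a brute-force nearest-point scan replaces the ANN structure. The query is assumed to differ from the origin.

```python
import math
from typing import List, Sequence, Tuple

SpherePoint = Tuple[Tuple[float, ...], int]  # (point on the sphere, index of its line)


def build_sphere_points(directions: List[Sequence[float]], r: float = 1.0) -> List[SpherePoint]:
    """Intersect each line through the origin with the sphere S(o, r)."""
    points: List[SpherePoint] = []
    for idx, u in enumerate(directions):
        norm = math.sqrt(sum(c * c for c in u))
        p = tuple(r * c / norm for c in u)
        points.append((p, idx))
        points.append((tuple(-c for c in p), idx))  # antipodal intersection
    return points


def query_sphere(points: List[SpherePoint], q: Sequence[float], r: float = 1.0) -> int:
    """Project q onto S(o, r) and return the line of the nearest stored point."""
    norm_q = math.sqrt(sum(c * c for c in q))
    q_proj = tuple(r * c / norm_q for c in q)
    return min(points, key=lambda entry: math.dist(entry[0], q_proj))[1]
```

For lines through the origin, the chord length between the projected query and a line's nearer intersection point increases with the angle between them, and so does the true point-to-line distance, which is why the nearest sphere point identifies the nearest line in this special case.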

slide-36
SLIDE 36

Unbounded Module

  • All lines in $L$ pass through a small ball $B(o, r)$
  • Query is far enough, outside of $B(o, R)$
  • Use the same data structure and query algorithm

Lemma: if $R \ge r/(\varepsilon\delta)$, the returned line $\ell_p$ is

  • Either an approximate closest line
  • Or is $\delta$-close to the closest line $\ell^*$

This helps us in two ways

  • Bound the region for the net module
  • Restrict search to almost parallel lines
slide-39
SLIDE 39

Parallel Module - Intuition

  • All lines in $L$ are parallel
  • Data structure:
    – Project all lines onto any hyperplane $g$ which is perpendicular to all the lines to get point set $P$
    – Build ANN data structure $\mathrm{ANN}(P, \varepsilon)$
  • Query algorithm:
    – Project the query on $g$ to get $q'$
    – Find the approximate closest point to $q'$, i.e., $p = \mathrm{ANN}_P(q')$
    – Return the corresponding line to $p$
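The exactly parallel case reduces to a point search in the hyperplane, which the following sketch illustrates. It assumes the hyperplane passes through the origin, that each line is given by one anchor point plus the shared direction, and that brute force stands in for the ANN structure; the names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

PlanePoint = Tuple[Tuple[float, ...], int]  # (projected point, index of its line)


def project_to_hyperplane(v: Sequence[float], u: Sequence[float]) -> Tuple[float, ...]:
    """Orthogonal projection of v onto the hyperplane through the origin with normal u."""
    norm_sq = sum(c * c for c in u)
    s = sum(vi * ui for vi, ui in zip(v, u)) / norm_sq
    return tuple(vi - s * ui for vi, ui in zip(v, u))


def build_parallel(anchors: List[Sequence[float]], u: Sequence[float]) -> List[PlanePoint]:
    """Each parallel line collapses to a single point in the hyperplane."""
    return [(project_to_hyperplane(a, u), idx) for idx, a in enumerate(anchors)]


def query_parallel(points: List[PlanePoint], q: Sequence[float], u: Sequence[float]) -> int:
    """Project q onto the hyperplane and return the line of the nearest projected point."""
    q_proj = project_to_hyperplane(q, u)
    return min(points, key=lambda entry: math.dist(entry[0], q_proj))[1]
```

When the lines are exactly parallel, the distance from the query to a line equals the distance between their projections, so the reduction loses nothing in this special case.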

slide-42
SLIDE 42

Parallel Module

  • All lines in $L$ are $\delta$-close to a base line $\ell_b$
  • Project the lines onto a hyperplane $g$ which is perpendicular to $\ell_b$
  • Query is close enough to $g$
  • Use the same data structure and query algorithm

Lemma: if $\mathrm{dist}(q, g) \le D\varepsilon/\delta$, then

  • Either the returned line $\ell_p$ is an approximate closest line
  • Or $\mathrm{dist}(q, \ell_p) \le D$

Thus, for a set of almost parallel lines, we can use a set of parallel modules to cover a bounded region.
slide-47
SLIDE 47

How the Modules Work Together

Given a set of lines, we come up with a polynomial number of balls.

  • If $q$ is inside the ball
    – Use net module
  • If $q$ is outside the ball
    – First use unbounded module to find a line $\ell$
    – Then use parallel module to search among the lines almost parallel to $\ell$

slide-55
SLIDE 55

Outline of the Algorithms

  • Input: a set of $n$ lines $S$
  • Randomly choose a subset of $n/2$ lines $T$
  • Solve the problem over $T$ to get a line $\ell_p$
  • For $\log n$ iterations (improvement step)
    – Use $\ell_p$ to find a much closer line $\ell_p'$
    – Update $\ell_p$ with $\ell_p'$

Why?

Let $t_1, \ldots, t_{\log n}$ be the $\log n$ closest lines to $q$ in the set $S$

With high probability at least one of $\{t_1, \ldots, t_{\log n}\}$ is sampled in $T$

    – $\mathrm{dist}(q, \ell_p) \le \mathrm{dist}(q, t_{\log n})(1 + \varepsilon)$
    – $\log n$ improvement steps suffice to find an approximate closest line
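The "with high probability" claim about the random half can be checked numerically. The sketch below assumes the subset is drawn uniformly without replacement and that the logarithm is base 2; under those assumptions the chance of missing all $k$ closest lines is a product of $k$ factors, each at most 1/2, so it is at most $2^{-k} = 1/n$ when $k = \log_2 n$. The function name is hypothetical.

```python
import math


def miss_probability(n: int, k: int) -> float:
    """Probability that a uniform random subset of n // 2 lines contains none of k fixed lines."""
    m = n // 2
    # Count subsets of size m that avoid the k fixed lines, over all subsets of size m.
    return math.comb(n - k, m) / math.comb(n, m)


if __name__ == "__main__":
    n = 1024
    k = int(math.log2(n))
    print(miss_probability(n, k), 1 / n)
```

The printed miss probability should not exceed the printed bound `1 / n`, which is the sense in which one of the closest lines lands in the sample with high probability.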

slide-58
SLIDE 58

Improvement Step

Given a line $\ell$, how to improve it, i.e., find a closer line?

  • Data structure
  • Query Processing Algorithm

Use the three modules here

slide-63
SLIDE 63

Conclusion

Bounds we get for NLS problem

    – Polynomial space: $O(N + d)^{O(1/\varepsilon^2)}$
    – Poly-logarithmic query time: $(d + \log N + 1/\varepsilon)^{O(1)}$

Future Work

  • The current result is not efficient in practice
    – Large exponents
    – Algorithm is complicated
  • Can we get a simpler algorithm?
  • Generalization to higher dimensional flats
  • Generalization to other objects, e.g. balls
slide-64
SLIDE 64

THANK YOU!