slide-1
SLIDE 1

Approximate Voronoi Diagrams: Techniques, tools, and applications to kth ANN search

Nirman Kumar

University of California, Santa Barbara

January 13th, 2016

slide-2
SLIDE 2

Similarity Search

?

Need similarity search to make sense of the world!

slide-3
SLIDE 3

When an appropriate metric is defined

Similarity search reduces to NN search

slide-4
SLIDE 4

Nearest neighbor search

Set of points P: find quickly for a query q, the closest point to q in P

slide-5
SLIDE 5

Nearest neighbor search

Also important in other domains

slide-6
SLIDE 6

Approximate nearest neighbor search (ANN)

Find any point x ∈ P with $d(q, x) \le (1 + \varepsilon)\, d_1(q, P)$, where $d_1(q, P)$ is the distance from q to its nearest point in P
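
The ANN condition can be checked directly by brute force. The following is a minimal sketch, not part of the talk: the function names `nearest_distance` and `is_valid_ann` are hypothetical, and the Euclidean metric is assumed.

```python
import math

def nearest_distance(q, points):
    """Exact d_1(q, P): distance from q to its nearest point, by linear scan."""
    return min(math.dist(q, p) for p in points)

def is_valid_ann(q, x, points, eps):
    """True if x is a (1 + eps)-approximate nearest neighbor of q in points."""
    return math.dist(q, x) <= (1 + eps) * nearest_distance(q, points)

# Small example (values chosen for illustration only).
P = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
q = (0.9, 0.1)
print(is_valid_ann(q, (1.0, 0.0), P, eps=0.1))  # the exact NN always qualifies
print(is_valid_ann(q, (0.0, 0.0), P, eps=0.1))  # a much farther point does not
```

The linear scan costs O(n) per query; the data structures in the rest of the talk aim to answer the same question much faster.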

slide-7
SLIDE 7

Space partitioning

Most data structures for NN (or ANN) search partition space

slide-8
SLIDE 8

Space partitioning

In low dimensions this is an explicit partitioning

slide-9
SLIDE 9

Space partitioning

In high dimensions the partitioning is implicit (via hash functions)

slide-10
SLIDE 10

Voronoi diagrams

slide-11
SLIDE 11

Voronoi diagrams

Very efficient in dimensions d ≤ 2

slide-12
SLIDE 12

Voronoi diagrams

Performance degrades sharply - bad even for d = 3

slide-13
SLIDE 13

This talk

◮ Construction of Approximate Voronoi Diagrams
◮ Tools used: Quadtrees, WSPD
◮ Construction of AVD for kth ANN
◮ Some open problems

slide-14
SLIDE 14

Approximate Voronoi Diagrams (AVD)

A space partition as before

slide-15
SLIDE 15

Approximate Voronoi Diagrams (AVD)

Each region has one associated representative (a point of P)

slide-16
SLIDE 16

Approximate Voronoi Diagrams (AVD)

This representative is a valid ANN for any query q in the region

slide-17
SLIDE 17

Main ideas behind ANN search and AVDs

◮ If the query point is “far”, any point is a good ANN
◮ A region can be approximated well by cubes
◮ Point location can be done efficiently in a set of cubes
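
The first idea can be made concrete. If $d_1(q, P) \ge \mathrm{diam}(P)/\varepsilon$, then for every p in P the triangle inequality gives $d(q, p) \le d_1(q, P) + \mathrm{diam}(P) \le (1 + \varepsilon)\, d_1(q, P)$. The sketch below checks this numerically; the threshold diam(P)/ε is one possible reading of “far”, and the function names are hypothetical.

```python
import itertools
import math

def diameter(points):
    """Largest pairwise distance in the point set."""
    return max(math.dist(a, b) for a, b in itertools.combinations(points, 2))

def every_point_is_ann(q, points, eps):
    """True if every point of the set is a (1 + eps)-ANN for q."""
    d1 = min(math.dist(q, p) for p in points)
    return all(math.dist(q, p) <= (1 + eps) * d1 for p in points)

P = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
eps = 0.1
far_q = (100.0, 100.0)   # distance to P is well above diameter(P) / eps
near_q = (0.1, 0.0)      # close to one point, so the choice of answer matters

print(every_point_is_ann(far_q, P, eps))
print(every_point_is_ann(near_q, P, eps))
```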

slide-18
SLIDE 18

Tool 1: Quadtrees

A quadtree - intuitively

[0, 1] × [0, 1]

slide-19
SLIDE 19

Tool 1: Quadtrees

A quadtree on points

[Figure: quadtree built on the points a to i]

slide-20
SLIDE 20

Tool 1: Quadtrees

The compressed version

[Figure: compressed quadtree on the points a to i]

slide-21
SLIDE 21

Tool 1: Quadtrees

Point Location ≡ find leaf node containing a point
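
As an illustration of point location, here is a minimal sketch of an uncompressed quadtree on the unit square with a root-to-leaf descent. It assumes points lie in [0, 1) × [0, 1) and caps the depth to guard against duplicate points; the class and function names are hypothetical. The descent shown takes time proportional to the height, so it does not implement the faster searches discussed on the following slides.

```python
class Node:
    """Quadtree cell: lower-left corner (x, y), side length, points or children."""

    def __init__(self, x, y, size, points, depth=0, max_depth=16):
        self.x, self.y, self.size = x, y, size
        self.points = points
        self.children = None
        if len(points) > 1 and depth < max_depth:
            half = size / 2
            self.children = []
            # Child order: index = 2 * dx + dy.
            for dx in (0, 1):
                for dy in (0, 1):
                    cx, cy = x + dx * half, y + dy * half
                    inside = [p for p in points
                              if cx <= p[0] < cx + half and cy <= p[1] < cy + half]
                    self.children.append(
                        Node(cx, cy, half, inside, depth + 1, max_depth))
            self.points = []

def locate(node, q):
    """Point location: walk down to the leaf cell containing q."""
    while node.children is not None:
        half = node.size / 2
        dx = 1 if q[0] >= node.x + half else 0
        dy = 1 if q[1] >= node.y + half else 0
        node = node.children[2 * dx + dy]
    return node

pts = [(0.1, 0.1), (0.8, 0.2), (0.3, 0.7), (0.35, 0.75)]
root = Node(0.0, 0.0, 1.0, pts)
leaf = locate(root, (0.34, 0.74))
print(leaf.x, leaf.y, leaf.size, leaf.points)
```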

slide-22
SLIDE 22

Tool 1: Quadtrees

With height h, point location takes O(log h) time, which is O(log log n) for a balanced tree!

slide-23
SLIDE 23

Tool 1: Quadtrees

But height not bounded as function of n

slide-24
SLIDE 24

Tool 1: Quadtrees

Use a compressed quadtree: its height is bounded by O(n)

slide-25
SLIDE 25

Tool 2: Well separated pairs decomposition

How many distances among n points? $\Omega(n^2)$

slide-26
SLIDE 26

Tool 2: Well separated pairs decomposition

What if distances within (1 ± ε) are considered the same?

slide-27
SLIDE 27

Tool 2: Well separated pairs decomposition

About $O(n/\varepsilon^d)$ distinct distances, up to a factor of $(1 \pm \varepsilon)$

slide-28
SLIDE 28

Tool 2: Well separated pairs decomposition

◮ How can we represent them?
◮ Given a pair of points, which bucket does it belong to?
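
To make the idea of a bucket concrete, here is a naive sketch that groups distances on a geometric scale, so that two distances in the same bucket differ by a factor of less than 1 + ε. This is only an illustration of treating nearby distances as equal: the helper `bucket_index` is hypothetical, it is not the WSPD, and it does not by itself yield the bound on the number of buckets stated above.

```python
import math

def bucket_index(distance, eps):
    """Index j with (1 + eps)**j <= distance < (1 + eps)**(j + 1).

    Values that are exact powers of (1 + eps) may land in a neighboring
    bucket because of floating-point rounding.
    """
    return math.floor(math.log(distance) / math.log(1 + eps))

eps = 0.5
for d in (1.0, 1.4, 1.6, 10.0):
    print(d, bucket_index(d, eps))
```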

slide-29
SLIDE 29

Tool 2: Well separated pairs decomposition

The WSPD data structure captures this

slide-30
SLIDE 30

Tool 2: Well separated pairs decomposition

More formally

◮ A collection of pairs $A_i, B_i \subset P$
◮ $A_i \cap B_i = \emptyset$
◮ Every pair of points is separated by some $(A_i, B_i)$
◮ Each pair $(A_i, B_i)$ is well separated

slide-31
SLIDE 31

Tool 2: Well separated pairs decomposition

A well separated pair is a dumbbell

$\ell \ge \frac{1}{\varepsilon} \max\{r_1, r_2\}$

[Figure: dumbbell with balls of radii $r_1$ and $r_2$ at distance $\ell$]
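
The dumbbell condition can be tested for two concrete point sets. The sketch below makes several assumptions that the slide does not state: each set is enclosed in a ball centered at the middle of its bounding box (not the minimum enclosing ball), and ℓ is taken to be the gap between the two balls. The function names are hypothetical.

```python
import math

def enclosing_ball(points):
    """A simple enclosing ball: bounding-box center, radius to the farthest point."""
    dim = len(points[0])
    center = tuple(
        (min(p[i] for p in points) + max(p[i] for p in points)) / 2
        for i in range(dim))
    radius = max(math.dist(center, p) for p in points)
    return center, radius

def is_well_separated(A, B, eps):
    """Check gap >= (1 / eps) * max(r1, r2) for the enclosing balls of A and B."""
    (c1, r1), (c2, r2) = enclosing_ball(A), enclosing_ball(B)
    gap = math.dist(c1, c2) - r1 - r2   # assumed meaning of the length l
    return gap >= max(r1, r2) / eps

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(10.0, 0.0), (11.0, 0.0)]
print(is_well_separated(A, B, eps=0.1))    # gap 9 versus required 5
print(is_well_separated(A, B, eps=0.05))   # gap 9 versus required 10
```

Two distinct singletons always pass this test, since both radii are zero.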

slide-32
SLIDE 32

Tool 2: Well separated pairs decomposition

WSPD example

[Figure: point set a, b, c, d, e, f]

$A_1 = \{a, b, c\}$, $B_1 = \{e\}$
$A_1 = \{a\}$, $B_1 = \{b, c\}$
. . .

slide-33
SLIDE 33

Tool 2: Well separated pairs decomposition

Main result about WSPDs

There is an $\varepsilon^{-1}$-WSPD of size $O(n\varepsilon^{-d})$. It can be constructed in $O(n \log n + n\varepsilon^{-d})$ time.

slide-34
SLIDE 34

AVD results

The main result

◮ $O(n/\varepsilon^d)$ cells
◮ Query time: $O(\log(n/\varepsilon))$

slide-35
SLIDE 35

The AVD algorithm

Construct an 8-WSPD for the point set

slide-36
SLIDE 36

The AVD algorithm

Let (Ai, Bi) for i = 1, . . . , m be the pairs

slide-37
SLIDE 37

The AVD algorithm

For each pair do some processing

  • output some cells
slide-38
SLIDE 38

The AVD algorithm

Preprocess them for point location

slide-39
SLIDE 39

The AVD algorithm

So what is the processing per pair?

slide-40
SLIDE 40

The AVD algorithm

Consider a WSPD dumbbell

slide-41
SLIDE 41

The AVD algorithm

Concentric balls of increasing radii, from r/4 to ≈ r/ε

slide-42
SLIDE 42

The AVD algorithm

Tile each ball (of radius x) with cubes of side length ≈ εx

slide-43
SLIDE 43

The AVD algorithm

For each cell, store an (ε/c)-ANN of some point in the cell

slide-44
SLIDE 44

So why does it work?

Every pair of competing points is resolved

slide-45
SLIDE 45

So why does it work?

p1, p2 resolved by the WSPD pair separating them

slide-46
SLIDE 46

So why does it work?

p1 p2

slide-47
SLIDE 47

So why does it work?

q

p1 p2

slide-50
SLIDE 50

Bounding the AVD complexity

The method shown gives $O((n/\varepsilon^d) \log(1/\varepsilon))$ cubes
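
A rough count shows where the logarithmic factor comes from. The sketch below assumes the radii double from r/4 up to r/ε (the slides only say “increasing”), that a ball of radius x is covered by a grid over its bounding cube of side 2x, and that cubes have side exactly εx. All constants are therefore assumptions, and `cubes_per_pair` is a hypothetical helper.

```python
import math

def cubes_per_pair(eps, d):
    """Approximate number of cubes generated for one WSPD pair."""
    # Radii r/4, r/2, r, ... until r/eps is reached: about log2(4 / eps) + 1 balls.
    num_balls = math.ceil(math.log2(4 / eps)) + 1
    # A ball of radius x sits in a cube of side 2x, tiled by cubes of side eps * x.
    cubes_per_ball = math.ceil(2 / eps) ** d
    return num_balls * cubes_per_ball

print(cubes_per_pair(0.5, 2))    # 4 balls with 16 cubes each
print(cubes_per_pair(0.25, 1))   # 5 balls with 8 cubes each
```

Each pair thus contributes roughly $(1/\varepsilon^d) \log(1/\varepsilon)$ cubes up to constants, and an 8-WSPD has O(n) pairs for fixed d.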

slide-51
SLIDE 51

Bounding the AVD complexity

This can be improved to $O(n/\varepsilon^d)$

slide-52
SLIDE 52

kth ANN search

Given q, output a point u ∈ P such that:

$(1 - \varepsilon)\, d_k(q, P) \le d(q, u) \le (1 + \varepsilon)\, d_k(q, P)$
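
The two-sided condition can be checked by brute force. This is a minimal sketch with hypothetical function names, assuming the Euclidean metric and that $d_k(q, P)$ denotes the distance from q to its kth nearest point in P.

```python
import math

def kth_nn_distance(q, points, k):
    """Exact d_k(q, P): distance from q to its kth nearest point (k starts at 1)."""
    return sorted(math.dist(q, p) for p in points)[k - 1]

def is_valid_kth_ann(q, u, points, k, eps):
    """True if (1 - eps) * d_k <= d(q, u) <= (1 + eps) * d_k."""
    dk = kth_nn_distance(q, points, k)
    return (1 - eps) * dk <= math.dist(q, u) <= (1 + eps) * dk

P = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
q = (0.0, 0.0)
print(kth_nn_distance(q, P, 2))
print(is_valid_kth_ann(q, (1.0, 0.0), P, k=2, eps=0.1))  # too close to q
print(is_valid_kth_ann(q, (3.0, 0.0), P, k=2, eps=0.6))  # within the allowed band
```

Unlike plain ANN, a point that is too close to q is rejected, which is why the lower bound appears in the definition.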

slide-53
SLIDE 53

Applications of kth ANN search

◮ Density estimation
◮ Functions of the form $F(q) = \sum_{i=1}^{k} f(d_i(q, P))$
◮ kth ANN on balls

slide-54
SLIDE 54

Applications of kth ANN search

Density estimation

$\text{density} \approx \dfrac{\#\text{points}}{\text{area}}$
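
One common way to instantiate this ratio is the k-nearest-neighbor density estimate: take the ball around q that reaches its kth nearest point, and divide k by the volume of that ball. The sketch below assumes this reading of the slide; `knn_density` is a hypothetical name, and an exact kth distance is used where a kth ANN structure would supply an approximate one.

```python
import math

def knn_density(q, points, k):
    """Estimate density at q as k / volume(ball of radius d_k(q, P))."""
    d = len(q)
    r = sorted(math.dist(q, p) for p in points)[k - 1]
    # Volume of a d-dimensional ball of radius r (area of a disk when d = 2).
    volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d
    return k / volume   # undefined if r == 0, e.g. q is an input point and k = 1

P = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (5.0, 5.0)]
print(knn_density((0.0, 0.0), P, k=4))   # 4 points in a disk of radius 1
```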

slide-55
SLIDE 55

The result

AVD for kth ANN

◮ $O((n/k)\,\varepsilon^{-d} \log(1/\varepsilon))$ cells
◮ Query time: $O(\log(n/(k\varepsilon)))$

slide-56
SLIDE 56

Quorum clustering

slide-57
SLIDE 57

Quorum clustering

Find smallest ball containing k points

slide-59
SLIDE 59

Quorum clustering

Remove points and repeat
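
The greedy loop can be sketched as follows. To keep the example short, candidate ball centers are restricted to the remaining input points, so the ball found in each round is not necessarily the smallest ball containing k points; this restriction is an assumption of the sketch, not the definition used in the talk. The function name is hypothetical.

```python
import math

def quorum_clustering(points, k):
    """Greedy quorum clustering with centers restricted to input points.

    Returns a list of (center, radius) balls, in the order they were found.
    """
    remaining = list(points)
    balls = []
    while len(remaining) >= k:
        best = None
        for c in remaining:
            by_dist = sorted(remaining, key=lambda p: math.dist(c, p))
            r = math.dist(c, by_dist[k - 1])   # radius needed to cover k points
            if best is None or r < best[0]:
                best = (r, c, by_dist[:k])
        r, c, members = best
        balls.append((c, r))
        for p in members:                      # remove the covered points, repeat
            remaining.remove(p)
    return balls

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
       (5.0, 5.0), (5.5, 5.0), (5.0, 5.5),
       (20.0, 20.0)]
for center, radius in quorum_clustering(pts, k=3):
    print(center, radius)
```

Each round scans every remaining center and sorts the remaining points, so this version is far too slow for large inputs.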

slide-65
SLIDE 65

Quorum clustering

A way to summarize points

slide-66
SLIDE 66

Quorum clustering

Has properties favorable for kth ANN problem

slide-67
SLIDE 67

Quorum clustering

Quorum clustering too expensive to compute

slide-68
SLIDE 68

Quorum clustering

Can compute approximate quorum clustering

slide-69
SLIDE 69

Quorum clustering

◮ Computed in $O(n \log^d n)$ time in $\mathbb{R}^d$ [Carmi, Dolev, Har-Peled, Katz and Segal, 2005]
◮ Computed in $O(n \log n)$ time in $\mathbb{R}^d$ [Har-Peled and K., 2012]

slide-70
SLIDE 70

Why is quorum clustering useful

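[Figure: quorum balls with centers $c_1, c_2, c_3$ and radii $r_1, r_2, r_3$; query point q]

◮ $x = d_k(q, P)$
◮ $r_1 \le x$
◮ $x + r_1 \ge d(q, c_1) \implies d(q, c_1) \le 2x$
◮ $x \le d(q, c_1) + r_1 \le 3x$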
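
The 3-approximation can be checked numerically on the line, where the smallest ball containing k points is an interval over k consecutive sorted points and is easy to compute exactly. The sketch assumes that the ball $b(c_1, r_1)$ in the bounds above is the first quorum ball that takes one of the k nearest neighbors of q; taking the minimum of $d(q, c_i) + r_i$ over all balls then stays within $[x, 3x]$. The one-dimensional setting and all function names are assumptions made for illustration.

```python
import random

def quorum_clustering_1d(values, k):
    """Exact quorum clustering on the line; returns (center, radius) intervals."""
    remaining = sorted(values)
    balls = []
    while len(remaining) >= k:
        # The smallest interval holding k points spans k consecutive sorted points.
        i = min(range(len(remaining) - k + 1),
                key=lambda j: remaining[j + k - 1] - remaining[j])
        lo, hi = remaining[i], remaining[i + k - 1]
        balls.append(((lo + hi) / 2, (hi - lo) / 2))
        del remaining[i:i + k]
    return balls

def kth_nn_distance_1d(q, values, k):
    """Exact x = d_k(q, P) on the line."""
    return sorted(abs(q - v) for v in values)[k - 1]

def estimate(q, balls):
    """Minimum over quorum balls of d(q, c_i) + r_i."""
    return min(abs(q - c) + r for c, r in balls)

random.seed(0)
values = [random.uniform(0, 100) for _ in range(60)]
balls = quorum_clustering_1d(values, k=5)
q = 42.0
x = kth_nn_distance_1d(q, values, 5)
print(x, estimate(q, balls))   # the estimate should lie between x and 3x
```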

slide-71
SLIDE 71

Refining the approximation

Just as in AVDs generate a list of cells

slide-72
SLIDE 72

Refining the approximation

slide-74
SLIDE 74

Refining the approximation

For the closest ball, use an ANN data structure in $\mathbb{R}^{d+1}$

slide-75
SLIDE 75

Refining the approximation

$b = b(c, r) \mapsto (c, r) \in \mathbb{R}^{d+1}$

slide-76
SLIDE 76

Refining the approximation

Some cells generated by AVD for ball centers

slide-77
SLIDE 77

Refining the approximation

Store some info with each cell

slide-78
SLIDE 78

Refining the approximation

A kth ANN and an approximate closest ball

slide-79
SLIDE 79

Open problems

◮ In high dimensions, is there a data structure for kth NN whose space requirement is f(n/k)?
◮ There is an AVD for weighted ANN similar to the AVD shown. Is there an extension to weighted kth ANN?