9/3/2009 1

Nearest Neighbor Queries

Ling Hu (lingh@usc.edu)

Nick Roussopoulos, Stephen Kelly and Frédéric Vincent

Outline

  • Spatial Index – R‐Tree
  • NN query – Naïve solution
  • A better solution – Branch‐and‐bound
  • Can we do better?
  • Experiment Results


R-Tree

[Figure sequence: points 1 to 12 are grouped into leaf MBRs E, F, G and H; E and F are grouped into MBR B, G and H into MBR C; the root node A holds B and C.]

R-tree nodes: A: B, C; B: E, F; C: G, H.

One entry: (Pointer, MBR)
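To make the node layout concrete, here is a minimal Python sketch, assuming points are coordinate tuples and using hypothetical names (`Node`, `Entry`, `mbr_of`, `leaf`, `internal`). Each entry pairs a pointer (a child node, or a data point at the leaf level) with the MBR that bounds it. The coordinates are invented for illustration; only the tree shape (A: B, C; B: E, F; C: G, H) follows the slides.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lower corner, upper corner)

@dataclass
class Node:
    """An R-tree node: a list of entries plus a leaf flag."""
    entries: List[Entry] = field(default_factory=list)
    is_leaf: bool = True

@dataclass
class Entry:
    """One entry = (pointer, MBR). The pointer is a child Node in an
    internal node, or a data point in a leaf."""
    mbr: Rect
    child: Union[Node, Point]

def mbr_of(rects: List[Rect]) -> Rect:
    """Smallest axis-aligned rectangle enclosing all given rectangles."""
    dims = len(rects[0][0])
    low = tuple(min(r[0][d] for r in rects) for d in range(dims))
    high = tuple(max(r[1][d] for r in rects) for d in range(dims))
    return (low, high)

def leaf(points: List[Point]) -> Entry:
    """Wrap points in a leaf node and return the parent entry pointing to it."""
    node = Node([Entry((p, p), p) for p in points], is_leaf=True)
    return Entry(mbr_of([e.mbr for e in node.entries]), node)

def internal(children: List[Entry]) -> Entry:
    """Wrap child entries in an internal node and return its parent entry."""
    node = Node(children, is_leaf=False)
    return Entry(mbr_of([e.mbr for e in children]), node)

# Same shape as the slides; coordinates are made up (the slides give none).
E = leaf([(1, 1), (2, 2), (1, 3)])
F = leaf([(4, 1), (5, 2), (4, 3)])
G = leaf([(8, 8), (9, 9), (8, 10)])
H = leaf([(11, 8), (12, 9), (11, 10)])
B = internal([E, F])
C = internal([G, H])
A = internal([B, C])  # root entry
print(A.mbr)  # ((1, 1), (12, 10))
```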


Outline

  • Spatial Index – R‐Tree
  • NN query – Naïve solution
  • A better solution – Branch‐and‐bound
  • Can we do better?
  • Experiment Results


Nearest Neighbor Search

  • Retrieve the nearest neighbor of query point Q
  • Simple strategy:

– Convert the nearest neighbor search to a range search.
– Guess a range around Q that contains at least one object, say O. If the current guess does not include any answers, increase the range size until an object is found.
– Compute the distance d’ between Q and O.
– Re-execute the range query with distance d’ around Q.
– Compute the distance of Q from each retrieved object. The object at minimum distance is the nearest neighbor.
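The range-search strategy above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `range_query` is a hypothetical stand-in for the index's range search (implemented here as a linear scan over points), the range is a circle of radius r around Q, and doubling the radius is an assumed growth rule, since the slide only says "increase range size".

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def range_query(points: Sequence[Point], q: Point, r: float) -> List[Point]:
    """Hypothetical stand-in for an index range search: all points within
    distance r of q. An R-tree would visit only nodes whose MBR meets the range."""
    return [p for p in points if math.dist(p, q) <= r]

def naive_nn(points: Sequence[Point], q: Point, guess: float = 1.0) -> Point:
    """Nearest neighbor via two rounds of range search, as on the slide."""
    if not points:
        raise ValueError("empty dataset")
    r = guess
    hits = range_query(points, q, r)
    while not hits:            # no answers yet: enlarge the range (assumed x2)
        r *= 2
        hits = range_query(points, q, r)
    o = hits[0]                # some object O inside the guessed range
    d_prime = math.dist(q, o)  # d' = distance between Q and O
    # The true NN is no farther than O, so a range of radius d' must contain it.
    candidates = range_query(points, q, d_prime)
    return min(candidates, key=lambda p: math.dist(p, q))
```

For example, `naive_nn([(0, 0), (1, 0)], (0.9, 0), guess=5.0)` first finds (0, 0), re-queries with d’ = 0.9, and then returns the closer point (1, 0). The two rounds of range search, and the dependence on a good initial guess, are exactly the weaknesses listed on the next slide.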

Naïve Approach

[Figure: the example R-tree (A: B, C; B: E, F; C: G, H; points 1 to 12) with a query point Q.]

Issues: How should the range be guessed? The retrieval may be suboptimal if an incorrect range is guessed. This would be a problem in high-dimensional spaces.

Outline

  • Spatial Index – R‐Tree
  • NN query – Naïve solution
  • A better solution – Branch‐and‐bound
  • Can we do better?
  • Experiment Results


MINDIST Property

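  • MINDIST(P,R) is a lower bound on the distance from P to any object enclosed in R, and hence on any k-NN distance among those objects

[Figure: rectangle R with corners S = (s1, s2) and T = (t1, t2); query point P = (p1, p2) shown in several positions relative to R.]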
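A minimal Python sketch of MINDIST, using the figure's notation: R has lower corner S and upper corner T, and P is the query point. Along each axis, P's coordinate is compared with R's extent and the nearest coordinate of R is used. The function name is hypothetical, and this sketch returns the ordinary Euclidean distance (a squared version gives the same ordering).

```python
import math
from typing import Sequence

def mindist(p: Sequence[float], s: Sequence[float], t: Sequence[float]) -> float:
    """MINDIST(P, R) for rectangle R with lower corner S and upper corner T:
    distance from P to the closest point of R (0 if P lies inside R)."""
    total = 0.0
    for pi, si, ti in zip(p, s, t):
        if pi < si:      # P is below R along this axis
            r = si
        elif pi > ti:    # P is above R along this axis
            r = ti
        else:            # P is within R's extent along this axis
            r = pi
        total += (pi - r) ** 2
    return math.sqrt(total)
```

For R with S = (3, 4) and T = (5, 6): a point P = (0, 0) outside on both axes gives 5.0, P = (4, 0) inside the x-extent gives 4.0, and P = (4, 5) inside R gives 0.0.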

A Better Strategy for KNN search

  • A priority queue sorted by MINDIST;
  • Nodes are traversed in MINDIST order;
  • The search stops when an object is at the top of the queue (1-NN found);
  • k-NN can be computed incrementally;
  • I/O optimal.
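A minimal Python sketch of this priority-queue strategy, assuming a toy R-tree where each node holds (MBR, pointer) pairs and using hypothetical names (`Node`, `incremental_nn`). Internal entries are keyed by MINDIST; data points are pushed into the same queue keyed by their actual distance (an assumption about how objects enter the queue). When a point reaches the head of the queue, it is the next nearest neighbor; consuming more results from the generator yields k-NN incrementally.

```python
import heapq
import itertools
import math
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple, Union

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lower corner, upper corner)

@dataclass
class Node:
    is_leaf: bool
    entries: List[Tuple[Rect, Union["Node", Point]]]  # (MBR, pointer) pairs

def mindist(p: Sequence[float], rect: Rect) -> float:
    """Distance from p to the closest point of rect (0 if p is inside)."""
    s, t = rect
    return math.sqrt(sum((pi - min(max(pi, si), ti)) ** 2
                         for pi, si, ti in zip(p, s, t)))

def incremental_nn(root: Node, q: Point) -> Iterator[Tuple[float, Point]]:
    """Yield (distance, point) pairs in increasing distance from q."""
    tie = itertools.count()  # tie-breaker so heap entries never compare nodes
    heap = [(0.0, next(tie), False, root)]
    while heap:
        dist, _, is_object, item = heapq.heappop(heap)
        if is_object:
            # Every remaining key is >= dist and MINDIST lower-bounds the
            # objects below a node, so nothing unseen can be closer.
            yield dist, item
            continue
        for rect, child in item.entries:
            if item.is_leaf:
                key, child_is_object = math.dist(q, child), True
            else:
                key, child_is_object = mindist(q, rect), False
            heapq.heappush(heap, (key, next(tie), child_is_object, child))

# Toy two-leaf tree (made-up coordinates).
leaf1 = Node(True, [(((1, 1), (1, 1)), (1, 1)), (((2, 3), (2, 3)), (2, 3))])
leaf2 = Node(True, [(((8, 8), (8, 8)), (8, 8)), (((9, 7), (9, 7)), (9, 7))])
root = Node(False, [(((1, 1), (2, 3)), leaf1), (((8, 7), (9, 8)), leaf2)])

# 1-NN is the first item; k-NN keeps consuming the generator.
first_two = [pt for _, pt in itertools.islice(incremental_nn(root, (7, 7)), 2)]
print(first_two)  # [(8, 8), (9, 7)]
```

In this run, leaf1 (MINDIST about 6.4) is never expanded, because two points with smaller distances reach the head of the queue first.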


Priority Queue

[Figure: best-first traversal of the example R-tree from query point Q, showing the priority-queue contents after each step until a point reaches the head of the queue (1NN).]

Outline

  • Spatial Index – R‐Tree
  • NN query – Naïve solution
  • A better solution – Branch‐and‐bound
  • Can we do better?
  • Experiment Results


MBR Face Property

  • An MBR is an n‐dimensional Minimal Bounding Rectangle, as used in R-trees: the minimal n‐dimensional rectangle that bounds its corresponding objects.
  • MBR face property: every face of any MBR contains at least one point of some object in the database.
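A small Python check of the face property, assuming the objects are points (the property applies to arbitrary objects; points keep the sketch short). The function names are hypothetical. Because the MBR takes the per-axis minimum and maximum, some point always attains each bound, so it lies on the corresponding face.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def mbr(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Minimal bounding rectangle: per-axis minimum and maximum."""
    dims = range(len(points[0]))
    low = tuple(min(p[d] for p in points) for d in dims)
    high = tuple(max(p[d] for p in points) for d in dims)
    return low, high

def faces_touched(points: Sequence[Point]) -> List[bool]:
    """For each of the 2n faces (lower then upper face per axis),
    report whether some point lies on that face."""
    low, high = mbr(points)
    result = []
    for d in range(len(low)):
        result.append(any(p[d] == low[d] for p in points))   # lower face, axis d
        result.append(any(p[d] == high[d] for p in points))  # upper face, axis d
    return result

pts = [(1.0, 4.0), (3.0, 1.0), (6.0, 3.0), (2.0, 2.0)]
print(mbr(pts))                 # ((1.0, 1.0), (6.0, 4.0))
print(all(faces_touched(pts)))  # True: minimality puts a point on every face
```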

MBR Face Property – 2D


MBR Face Property – 3D

[Figure: 3D rectangle R]

Improving the KNN Algorithm

  • While the MinDist-based algorithm is I/O optimal, its performance may be further improved by pruning nodes from the priority queue.


Properties of MINMAXDIST

  • MINMAXDIST(P,R) is the minimum, over all dimensions k, of the distance from P to the furthest point of the face of R closest to P along dimension k.
  • MINMAXDIST is the smallest possible upper bound on the distance from P to the nearest object in R.
  • MINMAXDIST guarantees that there is an object within R at a distance from P less than or equal to it.
  • MINMAXDIST is an upper bound on the 1-NN distance.
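A minimal Python sketch of MINMAXDIST, assuming R is given by its lower corner S and upper corner T, and using a hypothetical function name. For each axis k it takes the face of R nearer to P along k (coordinate `rm[k]`) and measures to the farthest point of that face (the farther bound `rM[i]` on every other axis i); the result is the minimum over k. Like the MINDIST sketch, it returns Euclidean rather than squared distance.

```python
import math
from typing import Sequence

def minmaxdist(p: Sequence[float], s: Sequence[float], t: Sequence[float]) -> float:
    """MINMAXDIST(P, R) for rectangle R with lower corner S and upper corner T."""
    n = len(p)
    # rm[k]: coordinate of the face of R nearer to P along axis k
    rm = [s[k] if p[k] <= (s[k] + t[k]) / 2 else t[k] for k in range(n)]
    # rM[i]: coordinate of the face of R farther from P along axis i
    rM = [s[i] if p[i] >= (s[i] + t[i]) / 2 else t[i] for i in range(n)]
    best = min(
        (p[k] - rm[k]) ** 2
        + sum((p[i] - rM[i]) ** 2 for i in range(n) if i != k)
        for k in range(n)
    )
    return math.sqrt(best)
```

For P = (0, 0) and R with S = (1, 1), T = (3, 2), the nearer face along x is x = 1 and its farthest point from P is (1, 2), giving √5; along y the candidate is (3, 1) at √10; so MINMAXDIST is √5. The face property guarantees an object on the face x = 1, hence within √5 of P.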

MINDIST & MINMAXDIST

MINDIST(P,R) <= NN(P) <= MINMAXDIST(P,R)


MinDist & MinMaxDist – 3D

[Figure: query point Q and 3D rectangle R, with MinDist(Q,R) and MinMaxDist(Q,R) marked.]

Pruning 1

  • Downward pruning: an MBR R is discarded if there exists another MBR R’ such that MINDIST(P,R) > MINMAXDIST(P,R’).

[Figure: query point P with MBRs R and R’; MINDIST(P,R) exceeds MINMAXDIST(P,R’).]

Pruning 2

  • Downward pruning: an object O is discarded if there exists an MBR R such that Actual_dist(P,O) > MINMAXDIST(P,R).

[Figure: query point P, object O and MBR R; Actual_dist(P,O) exceeds MINMAXDIST(P,R).]

Pruning 3

  • Upward pruning: an MBR R is discarded if an object O has been found such that MINDIST(P,R) > Actual_dist(P,O).

[Figure: query point P, MBR R and object O; MINDIST(P,R) exceeds Actual_dist(P,O).]
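The three pruning rules can be sketched as Python predicates over distances that have already been computed for the MBRs in the current branch list. The function names and the sample values are hypothetical. Checking against the smallest MINMAXDIST is equivalent to "there exists an R’", and an MBR can never prune itself under rule 1, because MINDIST(P,R) ≤ MINMAXDIST(P,R).

```python
from typing import List

def downward_prune_mbrs(mindists: List[float], minmaxdists: List[float]) -> List[bool]:
    """Pruning 1: keep[i] is False when MINDIST(P, R_i) > MINMAXDIST(P, R')
    for some MBR R'; comparing with the smallest MINMAXDIST suffices."""
    bound = min(minmaxdists)
    return [d <= bound for d in mindists]

def downward_prune_object(actual_dist: float, minmaxdists: List[float]) -> bool:
    """Pruning 2: True if object O survives, i.e. no MBR R has
    Actual_dist(P, O) > MINMAXDIST(P, R)."""
    return actual_dist <= min(minmaxdists)

def upward_prune_mbrs(mindists: List[float], best_actual: float) -> List[bool]:
    """Pruning 3: once an object at distance best_actual is found,
    keep only MBRs with MINDIST(P, R) <= best_actual."""
    return [d <= best_actual for d in mindists]

# Three sibling MBRs with made-up precomputed distances.
mind = [1.0, 2.5, 6.0]
minmax = [3.0, 4.0, 7.0]
print(downward_prune_mbrs(mind, minmax))   # [True, True, False]
print(downward_prune_object(3.5, minmax))  # False
print(upward_prune_mbrs(mind, 2.0))        # [True, False, False]
```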


MINDIST vs MINMAXDIST Ordering

  • MINDIST: optimistic
  • MINMAXDIST: pessimistic


  • Example: MINDIST ordering finds the 1‐NN first

MINDIST vs MINMAXDIST Ordering


  • Example: MINMAXDIST ordering finds the 1‐NN first

Generalize to k‐NN

  • Keep a sorted buffer of at most k current nearest neighbors.
  • Pruning is done according to the distance of the furthest nearest neighbor in this buffer.
  • Example:

[Figure: query point P, an MBR R with MINDIST(P,R), and the k-th object in the buffer with its Actual_dist from P.]
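A minimal Python sketch of the k-NN generalization as a depth-first branch-and-bound search, assuming the same toy (MBR, pointer) node layout and hypothetical names (`Node`, `knn_branch_and_bound`). Children are visited in MINDIST order, and a branch is pruned when its MINDIST exceeds the distance of the k-th object in the buffer. Only this upward rule is applied: a MINMAXDIST bound guarantees just one object per MBR, so the downward rules as stated bound only the 1-NN distance.

```python
import bisect
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lower corner, upper corner)

@dataclass
class Node:
    is_leaf: bool
    entries: List[Tuple[Rect, Union["Node", Point]]]  # (MBR, pointer) pairs

def mindist(p: Sequence[float], rect: Rect) -> float:
    """Distance from p to the closest point of rect (0 if p is inside)."""
    s, t = rect
    return math.sqrt(sum((pi - min(max(pi, si), ti)) ** 2
                         for pi, si, ti in zip(p, s, t)))

def knn_branch_and_bound(root: Node, q: Point, k: int) -> List[Tuple[float, Point]]:
    """Depth-first k-NN search with a sorted buffer of at most k neighbors."""
    buffer: List[Tuple[float, Point]] = []  # (distance, point), sorted ascending

    def kth_dist() -> float:
        # Pruning distance: the furthest neighbor in a full buffer, else infinity.
        return buffer[-1][0] if len(buffer) == k else math.inf

    def visit(node: Node) -> None:
        if node.is_leaf:
            for _, pt in node.entries:
                d = math.dist(q, pt)
                if d < kth_dist():
                    bisect.insort(buffer, (d, pt))
                    if len(buffer) > k:
                        buffer.pop()  # drop the previous k-th neighbor
            return
        # Active branch list, ordered by MINDIST.
        branches = sorted(((mindist(q, rect), child) for rect, child in node.entries),
                          key=lambda b: b[0])
        for md, child in branches:
            if md > kth_dist():  # upward pruning against the k-th distance
                break            # later branches have even larger MINDIST
            visit(child)

    visit(root)
    return buffer

# Toy two-leaf tree (made-up coordinates).
leaf1 = Node(True, [(((1, 1), (1, 1)), (1, 1)), (((2, 3), (2, 3)), (2, 3))])
leaf2 = Node(True, [(((8, 8), (8, 8)), (8, 8)), (((9, 7), (9, 7)), (9, 7))])
root = Node(False, [(((1, 1), (2, 3)), leaf1), (((8, 7), (9, 8)), leaf2)])

result = knn_branch_and_bound(root, (7, 7), 2)
print([pt for _, pt in result])  # [(8, 8), (9, 7)]
```

With k = 2, leaf1 is pruned because its MINDIST (about 6.4) exceeds the buffer's k-th distance of 2.0; with k = 3 the buffer is not yet full, so leaf1 is visited and (2, 3) enters the result.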

Outline

  • Spatial Index – R‐Tree
  • NN Query – Intuitive Solutions
  • Optimized NN Query – branch‐and‐bound
  • Experiment Results

Key Insights

  • The number of pages accessed grows as k grows;
  • The denser the dataset, the more page accesses;
  • MinDist vs. MinMaxDist: same in shape, but MinMaxDist has a higher I/O cost;
  • In dense areas, MinMaxDist performs poorly;

Thanks & Questions?