
Nearest Neighbor Queries: Spatial Index, R-Tree, NN Query, Naïve Solution



  1. 9/3/2009
     Nearest Neighbor Queries
     Nick Roussopoulos, Stephen Kelly and Frédéric Vincent
     Ling Hu, lingh@usc.edu
     Outline
     • Spatial Index – R-Tree
     • NN query – Naïve solution
     • A better solution – Branch-and-bound
     • Can we do better?
     • Experiment Results
     R-Tree
     [Figure: R-tree example built up over several slides. Twelve objects (1–12) are grouped into leaf MBRs E, F, G and H, which sit under internal nodes B and C and root A. Each node entry holds a pointer and an MBR.]

  2. Nearest Neighbor Search
     • Retrieve the nearest neighbor of query point Q
     • Simple strategy: convert the nearest neighbor search to a range search.
       – Guess a range around Q that contains at least one object, say O.
         • If the current guess does not include any answers, increase the range size until an object is found.
       – Compute the distance d' between Q and O.
       – Re-execute the range query with the distance d' around Q.
       – Compute the distance from Q to each retrieved object. The object at minimum distance is the nearest neighbor.
     Naïve Approach
     [Figure: the R-tree example (root A, internal nodes B and C, leaf MBRs E, F, G, H, objects 1–12) with query point Q.]
     Issues:
     • How to guess the range?
     • The retrieval may be sub-optimal if an incorrect range is guessed.
     • Would be a problem in high-dimensional spaces.
     A Better Strategy for KNN Search
     • A sorted priority queue based on MINDIST;
     • Nodes are traversed in order;
     • Stops when there is an object at the top of the queue (1-NN found);
     • k-NN can be computed incrementally;
     • I/O optimal.
     MINDIST Property
     • MINDIST is a lower bound of any k-NN distance.
     [Figure: MINDIST from query points (p1, p2) to a rectangle with corners (s1, s2) and (t1, t2).]
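The range-search strategy in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the presenters' code: the function names `range_query` and `naive_nn`, the circular range, and the radius-doubling rule are all assumptions, and a linear scan stands in for the R-tree range query.

```python
import math

def range_query(points, q, radius):
    """Stand-in for an R-tree range query (assumption: a plain linear scan)."""
    return [p for p in points if math.dist(p, q) <= radius]

def naive_nn(points, q, initial_radius=1.0):
    """Nearest neighbor of q via repeated range queries."""
    if initial_radius <= 0:
        raise ValueError("initial_radius must be positive")
    if not points:
        return None
    radius = initial_radius
    # Guess a range around Q; enlarge it until at least one object is found.
    hits = range_query(points, q, radius)
    while not hits:
        radius *= 2  # assumed growth rule; the slides do not specify one
        hits = range_query(points, q, radius)
    # Distance d' between Q and some object O inside the guessed range.
    d_prime = math.dist(hits[0], q)
    # Re-execute the range query with distance d' around Q.
    candidates = range_query(points, q, d_prime)
    # The candidate at minimum distance from Q is the nearest neighbor.
    return min(candidates, key=lambda p: math.dist(p, q))

points = [(0, 0), (5, 5), (2, 1), (9, 3)]  # hypothetical data
print(naive_nn(points, (3, 1)))
```

A poor initial guess costs either extra empty queries (range too small) or a large candidate set (range too big), which is the issue the slides raise.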

  3. Priority Queue
     [Figure: priority-queue traversal of the R-tree example for query point Q. The queue starts with root A, which is replaced by its entries B and C; entries are then expanded in MINDIST order until an object is at the top of the queue (the 1-NN).]
     MBR Face Property
     • An MBR is the n-dimensional Minimal Bounding Rectangle used in R-trees: the smallest n-dimensional rectangle that bounds its corresponding objects.
     • MBR face property: every face of any MBR contains at least one point of some object in the database.
     [Figures: MBR Face Property – 2D; MBR Face Property – 3D (rectangle R).]
     Improving the KNN Algorithm
     • While the MINDIST-based algorithm is I/O optimal, its performance may be further improved by pruning nodes from the priority queue.
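A priority queue ordered by MINDIST, as described in the slides, might look like the sketch below. The dictionary-based tree, the helper names (`leaf`, `inner`, `best_first_nn`) and all coordinates are hypothetical; a real R-tree would read nodes from disk pages instead.

```python
import heapq
import math

def mindist(q, mbr):
    """Distance from point q to the closest point of rectangle mbr = (lo, hi)."""
    lo, hi = mbr
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def leaf(points):
    """Leaf node: its MBR is the bounding box of its points."""
    mbr = (tuple(map(min, zip(*points))), tuple(map(max, zip(*points))))
    return {"mbr": mbr, "points": points}

def inner(children):
    """Internal node: its MBR covers the MBRs of all children."""
    los = [c["mbr"][0] for c in children]
    his = [c["mbr"][1] for c in children]
    mbr = (tuple(map(min, zip(*los))), tuple(map(max, zip(*his))))
    return {"mbr": mbr, "children": children}

def best_first_nn(root, q):
    """Pop entries in distance order; the first object popped is the 1-NN."""
    counter = 0  # unique tie-breaker so the heap never compares nodes
    heap = [(mindist(q, root["mbr"]), counter, "node", root)]
    while heap:
        dist, _, kind, item = heapq.heappop(heap)
        if kind == "point":
            return item, dist  # an object is at the top of the queue
        if "points" in item:   # leaf: push objects with their actual distance
            entries = [(math.dist(p, q), "point", p) for p in item["points"]]
        else:                  # internal node: push children with their MINDIST
            entries = [(mindist(q, c["mbr"]), "node", c) for c in item["children"]]
        for d, k, e in entries:
            counter += 1
            heapq.heappush(heap, (d, counter, k, e))
    return None

# Hypothetical layout loosely mirroring the slides' node names.
E = leaf([(1, 1), (2, 2), (1, 3)])
F = leaf([(4, 1), (5, 2)])
G = leaf([(8, 8), (9, 7)])
H = leaf([(6, 5), (7, 6)])
root = inner([inner([E, F]), inner([G, H])])
query = (5.5, 4)
print(best_first_nn(root, query))
```

Because MINDIST never overestimates the distance to anything inside a node, no unexpanded node can hide a closer object once a point reaches the top of the queue.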

  4. Properties of MINMAXDIST
     • MINMAXDIST(P,R) is the minimum, over all dimensions, of the distances from P to the furthest point of the closest face of R.
     • MINMAXDIST is the smallest possible upper bound of distances from the point P to the rectangle R.
     • MINMAXDIST guarantees there is an object within R at a distance to P less than or equal to it.
     • MINMAXDIST is an upper bound of the 1-NN distance.
     MINDIST & MINMAXDIST
     MINDIST(P,R) <= NN(P) <= MINMAXDIST(P,R)
     [Figure: MinDist & MinMaxDist – 3D. Query point Q, rectangle R, MinDist(Q,R) and MinMaxDist(Q,R).]
     Pruning 1
     • Downward pruning: an MBR R is discarded if there exists another R' such that MINDIST(P,R) > MINMAXDIST(P,R').
     Pruning 2
     • Downward pruning: an object O is discarded if there exists an R such that Actual_dist(P,O) > MINMAXDIST(P,R).
     Pruning 3
     • Upward pruning: an MBR R is discarded if an object O is found such that MINDIST(P,R) > Actual_dist(P,O).
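As an illustration of the two bounds, here is a small sketch that computes MINDIST and MINMAXDIST for an axis-aligned rectangle given by its lower corner `lo` and upper corner `hi`. The function signatures and the sample rectangle are assumptions; the per-axis near/far rule follows the slide's description (closest face along one dimension, furthest point on that face).

```python
import math

def mindist(p, lo, hi):
    """Lower bound: distance from p to the closest point of the rectangle."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(p, lo, hi)))

def minmaxdist(p, lo, hi):
    """Upper bound: for each axis k, take the nearer face along k and the
    furthest point on that face; return the minimum over all k."""
    mids = [(l + h) / 2 for l, h in zip(lo, hi)]
    # Coordinate of the nearer / farther rectangle side along each axis.
    near = [l if x <= m else h for x, l, h, m in zip(p, lo, hi, mids)]
    far = [l if x >= m else h for x, l, h, m in zip(p, lo, hi, mids)]
    best = min(
        sum((x - (near[i] if i == k else far[i])) ** 2 for i, x in enumerate(p))
        for k in range(len(p))
    )
    return math.sqrt(best)

# Hypothetical example: P at the origin, R spanning (1, 1) to (3, 2).
P, lo, hi = (0, 0), (1, 1), (3, 2)
print(mindist(P, lo, hi))     # distance to corner (1, 1)
print(minmaxdist(P, lo, hi))  # distance to corner (1, 2), the far end of the face x = 1
```

With these two functions, Pruning 1 reduces to a comparison such as `mindist(P, *R) > minmaxdist(P, *R_prime)` for two rectangles `R` and `R_prime`.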

  5. MINDIST vs MINMAXDIST Ordering
     • MINDIST: optimistic
     • MINMAXDIST: pessimistic
     • Example: MINDIST ordering finds the 1-NN first [figure]
     • Example: MINMAXDIST ordering finds the 1-NN first [figure]
     Generalize to k-NN
     • Keep a sorted buffer of at most k current nearest neighbors
     • Pruning is done according to the distance of the furthest nearest neighbor in this buffer
     • Example: [Figure: query point P and rectangle R; MINDIST(P,R) is compared with Actual_dist from P to the k-th object in the buffer.]
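The k-NN generalization can be sketched with a sorted buffer as below. To keep the example short it assumes a single level of leaf MBRs rather than a full R-tree, and the names `knn` and `leaves`, along with the sample data, are hypothetical.

```python
import bisect
import math

def mindist(q, lo, hi):
    """Distance from q to the closest point of the rectangle (lo, hi)."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def knn(leaves, q, k):
    """leaves: list of (lo, hi, points). Returns (k nearest points, leaves visited)."""
    if k <= 0:
        return [], 0
    buffer = []  # sorted list of (distance, point), at most k entries
    visited = 0
    ordered = sorted(leaves, key=lambda leaf: mindist(q, leaf[0], leaf[1]))
    for lo, hi, points in ordered:
        # Prune: this MBR cannot beat the furthest neighbor in a full buffer,
        # and every later leaf has an even larger MINDIST.
        if len(buffer) == k and mindist(q, lo, hi) > buffer[-1][0]:
            break
        visited += 1
        for p in points:
            d = math.dist(p, q)
            if len(buffer) < k or d < buffer[-1][0]:
                bisect.insort(buffer, (d, p))
                del buffer[k:]  # keep only the k closest
    return [p for _, p in buffer], visited

# Hypothetical leaf MBRs with their points.
leaves = [
    ((1, 1), (2, 3), [(1, 1), (2, 2), (1, 3)]),
    ((4, 1), (5, 2), [(4, 1), (5, 2)]),
    ((8, 7), (9, 8), [(8, 8), (9, 7)]),
    ((6, 5), (7, 6), [(6, 5), (7, 6)]),
]
print(knn(leaves, (5.5, 4), k=2))
```

The pruning threshold is the distance of the last buffer entry, so it only becomes active once k candidates have been collected, and it tightens as closer objects replace farther ones.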

  6. Key Insights
     • The number of pages accessed grows as k grows;
     • The denser the dataset, the more page accesses;
     • MinDist vs. MinMaxDist: same in shape, but MinMaxDist has more I/O cost;
     • In dense areas, MinMaxDist is bad.
     Thanks & Questions?
