The V*-Diagram: A Query-Dependent Approach to Moving KNN Queries

SLIDE 1

The V*-Diagram: A Query-Dependent Approach to Moving KNN Queries

Sarana Nutanong, Rui Zhang, Egemen Tanin, Lars Kulik

Dept. of Computer Science and Software Engineering, University of Melbourne

SLIDE 2

Motivation

Consider two scenarios:

  • a driver in a GPS-equipped car finding the nearest gas station along the route of a trip;
  • a tourist walking in the city looking for the nearest ATM.

These scenarios are examples of moving k nearest neighbor queries (MkNN).

SLIDE 3

Simple Approach

The Voronoi Diagram

Figure 1: Voronoi diagrams

Drawbacks:

  • 1. Expensive precomputations
  • 2. Inefficient update operations
  • 3. No support for dynamically changing k values

SLIDE 4

Best Existing Approach

Influence-set Retrieval [Zhang et al., 2003]

Figure 2: Computing a Voronoi cell locally. (a) Bisector B_ad is discovered as a boundary. (b) All boundaries are discovered.

SLIDE 5

Our Approach: V*-Diagram

Objectives:

  • 1. Requires no precomputation
  • 2. Supports dynamic insertions/deletions of objects
  • 3. Handles dynamically changing k

Result: Outperforms the best existing approach [Zhang et al., 2003] by two orders of magnitude.

SLIDE 7

The V*-Diagram

Known Region

If the known NNs of q are {d, f, j}, with j the farthest of them, the known region W(q, j) is {v : dist(q, v) ≤ dist(q, j)}.
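The definition above amounts to a disc centred at q whose radius is the distance to the farthest known neighbor. A minimal Python sketch of that membership test follows; the coordinates for d, f and j are hypothetical, chosen only for illustration, and the helper names are not from the slides.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2D points given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def in_known_region(q, farthest_nn, v):
    """True if v lies in W(q, farthest_nn): the disc centred at q
    whose radius is dist(q, farthest_nn)."""
    return dist(q, v) <= dist(q, farthest_nn)

# Hypothetical coordinates standing in for the slide's points d, f, j.
q = (0.0, 0.0)
d, f, j = (1.0, 0.0), (0.0, 2.0), (3.0, 0.0)

# The known region is bounded by the farthest of the known NNs.
farthest = max((d, f, j), key=lambda p: dist(q, p))

print(in_known_region(q, farthest, (2.0, 2.0)))
print(in_known_region(q, farthest, (3.0, 1.0)))
```

Every data point inside this disc has already been retrieved, which is what lets the later safe-region test reason about the points that have not been seen.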

SLIDE 8

The V*-Diagram

Safe region wrt a data point

We retrieve (k + x) objects. In this example, k and x are 1, so we retrieve p and z.

If q′ ∈ S(q_b, z, p), then ∀p′ ∉ W(q_b, z): dist(q′, p) < dist(q′, p′), where

S(q_b, z, p) = {q′ : dist(p, q′) ≤ dist(q_b, z) − dist(q_b, q′)}
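The guarantee follows from the triangle inequality: a point p′ outside the known region satisfies dist(q_b, p′) > dist(q_b, z), so dist(q′, p′) ≥ dist(q_b, p′) − dist(q_b, q′) > dist(q_b, z) − dist(q_b, q′) ≥ dist(p, q′). The sketch below checks the safe-region condition directly. It is an illustration only; the coordinates and function names are assumptions, not taken from the slides.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2D points given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def in_safe_region(q_b, z, p, q_new):
    """True if q_new lies in S(q_b, z, p), i.e.
    dist(p, q_new) <= dist(q_b, z) - dist(q_b, q_new)."""
    return dist(p, q_new) <= dist(q_b, z) - dist(q_b, q_new)

# Hypothetical layout with k = x = 1: p is the NN, z the extra retrieved point.
q_b = (0.0, 0.0)
p = (1.0, 0.0)
z = (4.0, 0.0)

print(in_safe_region(q_b, z, p, (1.0, 1.0)))  # small move, stays inside
print(in_safe_region(q_b, z, p, (3.0, 0.0)))  # large move, leaves the region

# Any point outside W(q_b, z) is farther from a safe q_new than p is.
p_outside = (0.0, 4.5)
q_new = (1.0, 1.0)
print(dist(q_new, p) < dist(q_new, p_outside))
```

Rearranged, the condition reads dist(p, q′) + dist(q_b, q′) ≤ dist(q_b, z), which describes an ellipse with foci q_b and p.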

SLIDE 9

The V*-Diagram

The Fixed-rank Region (FRR) [Kulik and Tanin, 2006]

Figure 3: Incremental rank update. (a) Rank order a, c, b, f, e, d. (b) Rank order a, c, b, e, f, d.
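The slide only names the fixed-rank region. Assuming it is the set of query positions at which the distance ranking of a given set of points stays unchanged, a membership test can be sketched as below. The point coordinates and helper names are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2D points given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def rank_order(q, points):
    """Names of the points sorted by distance from q, nearest first."""
    return sorted(points, key=lambda name: dist(q, points[name]))

def in_fixed_rank_region(q_new, ranked_names, points):
    """True if the points are still in the order ranked_names
    when ranked by distance from q_new."""
    ds = [dist(q_new, points[name]) for name in ranked_names]
    return all(ds[i] <= ds[i + 1] for i in range(len(ds) - 1))

# Hypothetical coordinates for three of the slide's points.
points = {"a": (1.0, 0.0), "c": (0.0, 2.0), "b": (-3.0, 0.0)}
ranked = rank_order((0.0, 0.0), points)

print(ranked)
print(in_fixed_rank_region((0.0, 0.5), ranked, points))
print(in_fixed_rank_region((-2.0, 0.0), ranked, points))
```

Only adjacent pairs in the ranking need to be compared, because a ranking changes exactly when two neighbouring entries swap, as in the swap of f and e between panels (a) and (b).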

SLIDE 10

The V*-Diagram

Integrated Safe Region (ISR) and V*-kNN

The ISR is the intersection of

  • 1. the safe region wrt the kth NN, S(q_b, z, p_k);
  • 2. the FRR of the (k + x) NNs of q_b.

Figure 4: V*-kNN Example (k = 2, x = 2)
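The two conditions can be combined into a single test, sketched below. It assumes z is the farthest of the (k + x) retrieved points and that the FRR means the distance ranking of those points is unchanged. The coordinates and the name `in_isr` are hypothetical, and this is not the authors' implementation.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2D points given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def in_isr(q_b, q_new, ranked, k):
    """ranked holds the (k + x) NNs of q_b, nearest first.
    Returns True if q_new satisfies both ISR conditions."""
    p_k = ranked[k - 1]  # the kth NN
    z = ranked[-1]       # farthest retrieved point (assumed to bound the known region)
    # Condition 1: safe region wrt the kth NN.
    safe = dist(p_k, q_new) <= dist(q_b, z) - dist(q_b, q_new)
    # Condition 2: the ranking of the retrieved points is unchanged at q_new.
    ds = [dist(q_new, p) for p in ranked]
    fixed_rank = all(ds[i] <= ds[i + 1] for i in range(len(ds) - 1))
    return safe and fixed_rank

# Hypothetical example with k = 2, x = 2 (four retrieved points).
q_b = (0.0, 0.0)
ranked = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0), (0.0, -5.0)]

print(in_isr(q_b, (0.0, 0.5), ranked, k=2))   # both conditions hold
print(in_isr(q_b, (-2.0, 0.0), ranked, k=2))  # safe, but the ranking changed
```

Under this reading, while the query stays inside the ISR the first k entries of the ranking remain its kNN: the safe region rules out unseen points, and the fixed-rank condition rules out reordering among the retrieved ones.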

SLIDE 11

V*-kNN Algorithm

http://www.csse.unimelb.edu.au/~sarana/demo.html

SLIDE 12

Experiments

  • Data Structure: R*-trees (1-kB block size).
  • Comparative Method: RIS-kNN [Zhang et al.]
  • Datasets:
  • (U) 25,000 data points in a uniform distribution
  • (Z) 25,000 data points in a Zipfian distribution
  • (C) 65,743 postal addresses from California
  • (N) 119,897 postal addresses from North-Eastern USA

SLIDE 13

Experiments

Trajectories

Figure 5: Trajectory types. (a) Directional (D). (b) Random (R).

SLIDE 14

Experiments

total cost wrt x

Figure 6: Effect of x. (a) Total cost (D), time (sec) against x. (b) Page access (D) against x. Series: U, Z, C, N.

SLIDE 15

Experiments

total cost wrt k

Figure 7: Effect of k. (a) Total Cost (California). (b) Total Cost (North-Eastern USA). Both panels plot time (sec) against k. Series: V* (D), V* (R), RIS (D), RIS (R).

SLIDE 16

Experiments

total cost wrt n

Figure 8: Effect of dataset size. (a) Total Cost (Uniform). (b) Total Cost (Zipfian). Both panels plot time (sec) against n (×1000). Series: V* (D), V* (R), RIS (D), RIS (R).

SLIDE 17

Cost model

RIS-kNN

The number of kVD cells in 2D space is approximated as 2kn [Okabe et al., 1992]. For a given trajectory length l, the number n_v of kVD cells crossed by the trajectory is given by n_v = l·√(2kn).
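One way to read this estimate, assuming the data space is normalised to a unit square (the slide does not say so): 2kn cells of equal size each cover an area of 1/(2kn) and have a side of roughly 1/√(2kn), so a path of length l crosses about l·√(2kn) of them. The short sketch below evaluates the formula; the function name and the sample values are illustrative only.

```python
import math

def kvd_cells_crossed(l, k, n):
    """Estimated number n_v of order-k Voronoi cells crossed by a
    trajectory of length l over n data points: n_v = l * sqrt(2 k n).
    Assumes l is expressed in units of a unit-square data space."""
    return l * math.sqrt(2 * k * n)

# Illustrative values only: k = 2 and n = 25 give 2kn = 100 cells.
print(kvd_cells_crossed(1.0, 2, 25))

# With the slides' synthetic dataset size (n = 25,000) and a hypothetical l = 0.1:
print(round(kvd_cells_crossed(0.1, 2, 25000), 2))
```

The estimate grows with the square root of both k and n, so RIS-kNN must recompute a cell more often as either increases.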

SLIDE 18

Cost model

V*-kNN

Directional: n_b = l / d_e.

Random: n_b = l·s / d_e², where s is the step size.
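The two formulas can be evaluated side by side, as in the sketch below. The slides do not define n_b or d_e in this excerpt, so the code treats them purely as the symbols in the formulas; the function names and sample values are hypothetical.

```python
def n_b_directional(l, d_e):
    """n_b for a directional trajectory of length l: n_b = l / d_e."""
    return l / d_e

def n_b_random(l, s, d_e):
    """n_b for a random trajectory of length l and step size s:
    n_b = l * s / d_e**2."""
    return l * s / d_e ** 2

# Hypothetical values, chosen only to make the arithmetic easy to follow.
l, s, d_e = 1000.0, 10.0, 50.0

print(n_b_directional(l, d_e))
print(n_b_random(l, s, d_e))
```

The random case equals the directional case scaled by s / d_e, so for a step size smaller than d_e the random trajectory yields the smaller n_b.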

SLIDE 19

Experiments

Cost Model

Figure 9: Cost model validation. (a) Effect of n, #accesses against n (×1000). (b) Effect of k, #accesses against k. Series: V* (D), V* (R), RIS (D), RIS (R), Est.

SLIDE 20

The V*-Diagram in a spatial network

Figure 10: Safe region

Figure 11: Fixed-rank region

Figure 12: ISR is S(q_1, u, s) ∩ F_{s,t,u}

SLIDE 21

Experiments

The V*-Diagram in a spatial network

Figure 13: Road network in North America (175,813 nodes and 179,179 edges)

SLIDE 22

Experiments

The V*-Diagram in a spatial network

Figure 14: Spatial network: effect of x. (a) Total Response Time, time (sec) against x. (b) Access Cost, #accesses against x. Series: k = 2, 4, 6, 8, 10.

SLIDE 23

Experiments

The V*-Diagram in a spatial network

Figure 15: Spatial network: effect of l. (a) Total Response Time, time (sec) against l. (b) Access Cost, #accesses against l. Series: k = 2, 4, 6, 8, 10.

SLIDE 24

Conclusions

  • The V*-Diagram constructs a safe region using:
  • 1. the location of the query point,
  • 2. kNN-search coverage (known region),
  • 3. known data points.
  • V*-kNN is local, incremental and dynamic.
  • V*-kNN outperforms the best existing technique by two orders of magnitude.
  • The V*-Diagram is a general philosophy, which can be applied to most safe-region-based techniques.

SLIDE 25

Related Publications

  • S. Nutanong, R. Zhang, E. Tanin, L. Kulik: Analysis and Evaluation of V*-kNN: An Efficient Algorithm for Moving k Nearest Neighbor Queries. To appear in VLDB Journal.
  • S. Nutanong, R. Zhang, E. Tanin, L. Kulik: V*-kNN: An Efficient Algorithm for Moving k Nearest Neighbor Queries (Demo). ICDE 2009: 1519-1522.
  • S. Nutanong, R. Zhang, E. Tanin, L. Kulik: The V*-Diagram: a query-dependent approach to moving KNN queries. PVLDB 1(1): 1095-1106 (2008).

SLIDE 26

Key References

  • Lars Kulik, Egemen Tanin: Incremental Rank Updates for Moving Query Points. GIScience 2006: 251-268.
  • Atsuyuki Okabe, Barry Boots, Kokichi Sugihara, Sung Nok Chiu: Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. John Wiley & Sons, Inc., 1992.
  • Jun Zhang, Manli Zhu, Dimitris Papadias, Yufei Tao, Dik Lun Lee: Location-based Spatial Queries. SIGMOD 2003: 443-454.
