Shooting Stars in the Sky: An Online Algorithm for Skyline Queries

SLIDE 1

Shooting Stars in the Sky

An Online Algorithm for Skyline Queries

Donald Kossmann, Frank Ramsak, Steffen Rost
kossmann@in.tum.de, frank.ramsak@forwiss.de, rost@in.tum.de
Technische Universität München, Institut für Informatik
Boltzmannstr. 3, 85748 Garching b. München, Germany

SLIDE 2

Shooting Stars in the Sky, VLDB 2002, Hong Kong

Outline

Motivation
– Skyline & known algorithms
– Challenges in online scenarios

The NN algorithm for Skyline queries
– Algorithm for 2D
– Relationship between NN and Skyline
– Algorithm for higher dimensionality

Evaluation

Supporting user control

Summary

SLIDE 3

What is the Skyline?

Literature: minimum/maximum vector problem

– Two vectors are not always comparable
– Dominance: a vector/point dominates another point if it is as good or better in all dimensions and better in at least one dimension
– Skyline: all points of a data set that are not dominated by any other point

[Figure: example data set; x-axis: price [€], y-axis: distance to the beach [km]]
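As a minimal sketch of the two definitions, assuming that smaller values are better in every dimension (lower price, shorter distance to the beach), dominance and a quadratic reference Skyline can be written as below. The function names and the hotel data are hypothetical and only serve to illustrate the definitions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: a is as good or better in all dimensions and
    strictly better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_naive(points: List[Point]) -> List[Point]:
    """All points not dominated by any other point (O(n^2) reference version)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels: (price in €, distance to the beach in km)
hotels = [(50, 8.0), (80, 2.0), (120, 0.5), (90, 3.0), (60, 9.0)]
print(skyline_naive(hotels))  # → [(50, 8.0), (80, 2.0), (120, 0.5)]
```

The pair (50, 8.0) and (80, 2.0) shows why two vectors need not be comparable: each is better in one dimension, so neither dominates the other and both belong to the Skyline.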

SLIDE 4

Traditional Skyline algorithms

Blocking algorithms: must read the complete data set before returning results

– Compare each point with all other points
– [Börzsönyi et al. ICDE 01]

  • Block-Nested-Loops (BNL): keep a window of candidate Skyline points
  • Divide-and-Conquer (D&C): divide the data set, compute partial Skylines, and merge them

Progressive algorithms [Tan et al. VLDB 01]

– Bitmap: operations on range-encoded bitmaps
– Index: transformation of the d-dimensional space to one dimension + B-tree
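The sketch below contrasts the two blocking strategies and a bitmap-style dominance test on in-memory data, again assuming smaller is better. It deliberately simplifies the published algorithms: this BNL has an unbounded window (no temporary file), the D&C merge is a plain pairwise filter rather than the original recursive merge, and the bitmap variant uses ordinary Python integers as bit vectors instead of range-encoded bitmaps. All function names are hypothetical.

```python
def dominates(a, b):
    """Assumption: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def bnl_skyline(points):
    """Block-Nested-Loops: keep a window of candidate Skyline points."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue                                         # p cannot be in the Skyline
        window = [w for w in window if not dominates(p, w)]  # evict candidates p dominates
        window.append(p)
    return window


def dc_skyline(points):
    """Divide-and-Conquer: split, compute partial Skylines, merge them."""
    def rec(pts):
        if len(pts) <= 1:
            return list(pts)
        mid = len(pts) // 2
        left, right = rec(pts[:mid]), rec(pts[mid:])
        # pts is sorted lexicographically, so no point of `right` can dominate
        # a point of `left`; only `right` must be filtered during the merge.
        return left + [q for q in right if not any(dominates(p, q) for p in left)]
    return rec(sorted(points))


def bitmap_skyline(points):
    """Bitmap-style test: bit j of each mask stands for points[j]."""
    n = len(points)
    result = []
    for p in points:
        le_all = (1 << n) - 1  # points that are <= p in every dimension
        lt_any = 0             # points that are < p in at least one dimension
        for i, v in enumerate(p):
            le_all &= sum(1 << j for j, q in enumerate(points) if q[i] <= v)
            lt_any |= sum(1 << j for j, q in enumerate(points) if q[i] < v)
        if le_all & lt_any == 0:  # no point dominates p
            result.append(p)
    return result
```

All three return the same set of points; they differ in when results become available and in the order in which they are produced. BNL and D&C only know the answer after the last input point has been read, which is what makes them blocking.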
SLIDE 5

The challenge in online scenarios

Compute the first few Skyline points almost instantaneously

Compute more and more results incrementally

“Big picture”: compute Skyline points from the whole range; do not favor points that are good in one dimension
SLIDE 6

Result ≠ Result - Quality of Results

[Figure, three panels: complete Skyline; progressive: first 10 points returned by Index (the order for Bitmap depends on insertion order); online: first 10 points returned by our NN algorithm]

SLIDE 7

The challenge in online scenarios

Compute the first few Skyline points almost instantaneously

Compute more and more results incrementally

“Big picture”: compute Skyline points from the whole range; do not favor points that are good in one dimension

Do not compute approximations, however good; return only true Skyline points

The user should be able to state preferences while the algorithm is running, i.e., control which Skyline points are produced next

Universality with respect to data sets and types of Skyline queries

SLIDE 8

The NN algorithm: 2D example

[Figure: RangeNNSearch on the 2D example; x-axis: price [€], y-axis: distance to the beach [km]]

SLIDE 9

The NN algorithm for 2 dimensions

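Input: data set D, monotonic distance function f

Additional structures: to-do list T, which keeps the regions still to be processed; each region spans from the origin O to an upper corner (mx, my), and T initially holds the whole space (O, ∞)

RangeNNSearch(O, D, (mx, my), f): the nearest neighbor of O w.r.t. f among the points of D in the region spanned by O and (mx, my)

Algorithm:

    T = {(∞, ∞)}
    while (T ≠ ∅) do
        (mx, my) = takeElement(T)
        if (∃ RangeNNSearch(O, D, (mx, my), f)) then
            (nx, ny) = RangeNNSearch(O, D, (mx, my), f)
            output (nx, ny)
            T = T ∪ {(nx, my), (mx, ny)}
        endif
    endwhile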
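A runnable sketch of this loop, assuming non-negative coordinates with the origin O = (0, 0), smaller values being better, and the region of a to-do entry (mx, my) being the half-open box [0, mx) × [0, my). The hypothetical range_nn_search is a linear scan standing in for an index-based nearest-neighbor search, and takeElement is modeled as a FIFO queue, which is only one possible policy.

```python
import math
from collections import deque


def range_nn_search(points, upper, f):
    """Stand-in for RangeNNSearch: the point nearest to O w.r.t. f among the
    points strictly inside the box [0, upper); None if the box is empty."""
    inside = [p for p in points if all(c < u for c, u in zip(p, upper))]
    return min(inside, key=f) if inside else None


def nn_skyline_2d(points, f=lambda p: math.hypot(p[0], p[1])):
    """Yield the Skyline points of a 2D data set one at a time."""
    todo = deque([(math.inf, math.inf)])  # to-do list T of upper region corners
    while todo:
        mx, my = todo.popleft()           # takeElement(T), FIFO here
        n = range_nn_search(points, (mx, my), f)
        if n is not None:
            nx, ny = n
            yield n                       # output n: a Skyline point
            todo.append((nx, my))         # region with x < nx
            todo.append((mx, ny))         # region with y < ny


points = [(1, 9), (2, 4), (3, 3), (5, 1), (6, 6), (4, 8)]
print(list(nn_skyline_2d(points)))  # → [(3, 3), (2, 4), (5, 1), (1, 9)]
```

The first Skyline point is available after a single NN search. Points with x ≥ nx and y ≥ ny are never searched again because n dominates them; an exact copy of n also falls outside both new regions, so this sketch reports it only once.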

SLIDE 10

Correctness of the NN algorithm

Relationship between Nearest Neighbor (NN) and Skyline: given a data set D with origin O and an arbitrary monotonic distance function f, we can state:

– Observation 1: The nearest neighbor NN of O in D w.r.t. f is in the Skyline
– Observation 2: Given a region R = (O, X), i.e., the region spanned by O and a point X, the nearest neighbor NN of O in R w.r.t. f is in the Skyline
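Both observations can be spot-checked empirically. The sketch below assumes smaller is better and O = (0, 0), draws random 2D data sets, and checks, for a few strictly increasing distance functions chosen for illustration, that the NN of O in D and the NN of O inside a random box [0, X) are never dominated by any point of D. Observation 2 holds because any point dominating a point of such a box also lies in the box and would be strictly closer to O. Non-strict functions such as the maximum of the coordinates can produce ties, in which case a tie-breaking rule would be needed. This is a check on random data, not a proof.

```python
import math
import random


def dominates(a, b):
    """Assumption: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def in_skyline(p, data):
    """p is a Skyline point of data if no point of data dominates it."""
    return not any(dominates(q, p) for q in data)


def nn_in_box(data, upper, f):
    """NN of O w.r.t. f among the points strictly inside [0, upper)."""
    inside = [p for p in data if all(c < u for c, u in zip(p, upper))]
    return min(inside, key=f) if inside else None


# Strictly increasing distance functions (our choice, for illustration only)
DISTANCES = [
    lambda p: p[0] + p[1],             # sum of coordinates
    lambda p: math.hypot(p[0], p[1]),  # Euclidean distance to O
    lambda p: 3 * p[0] + p[1],         # weighted sum, hypothetical weights
]


def check_observations(trials=200, size=30, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        data = [(rng.random(), rng.random()) for _ in range(size)]
        for f in DISTANCES:
            assert in_skyline(min(data, key=f), data)     # Observation 1
            upper = (rng.random(), rng.random())          # corner X of R = (O, X)
            nn = nn_in_box(data, upper, f)
            assert nn is None or in_skyline(nn, data)     # Observation 2
    return True


print(check_observations())  # → True
```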

SLIDE 11

Extending NN algorithm to higher dimensionality

The observations also hold in d-dimensional space

Modification: each processed region is partitioned into d subregions with respect to the NN

Problem: for d ≥ 3 the subregions overlap, so duplicate Skyline points may occur

Solutions: post-filtering, merging, propagation, ...

[Figure: partitioning of a 3D region (axes x, y, z) with respect to the NN n; points labeled n and p]
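A sketch of the d-dimensional variant with post-filtering, one of the solutions listed above, under the same assumptions as in 2D: smaller is better, origin O = 0, half-open boxes, a linear-scan NN search as a stand-in for an index, and a FIFO to-do list. Subregion i of a region with upper corner m replaces m[i] by n[i]. A set of already reported points removes the duplicates that overlapping subregions produce. The function name is hypothetical.

```python
import math
from collections import deque


def nn_skyline(points, f=sum):
    """NN-based Skyline for d dimensions with post-filtering of duplicates.

    Each region is the box [0, upper); after finding its NN n, the region is
    split into d subregions, subregion i having upper[i] replaced by n[i].
    For d >= 3 these subregions overlap, so the same point can be found
    more than once; `reported` filters those duplicates.
    """
    if not points:
        return
    d = len(points[0])
    todo = deque([(math.inf,) * d])
    reported = set()
    while todo:
        upper = todo.popleft()
        inside = [p for p in points if all(c < u for c, u in zip(p, upper))]
        if not inside:
            continue                      # empty region: nothing left to find
        n = min(inside, key=f)
        if n not in reported:             # post-filtering
            reported.add(n)
            yield n
        for i in range(d):                # partition into d subregions w.r.t. n
            todo.append(upper[:i] + (n[i],) + upper[i + 1:])


data = [(1, 5, 5), (5, 1, 5), (5, 5, 1), (2, 2, 2), (6, 6, 6)]
print(sorted(nn_skyline(data)))  # → [(1, 5, 5), (2, 2, 2), (5, 1, 5), (5, 5, 1)]
```

Post-filtering keeps the output correct but not the work: overlapping regions are still searched repeatedly, which is why merging or propagation can be attractive alternatives.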

SLIDE 12

Comparison with other approaches

Algorithms:

– Online algorithm: our NN algorithm
– Blocking algorithms: BNL, D&C
– Progressive algorithms: Bitmap, Index

Data sets:

– Sizes: 100K points, 1M points
– Distributions: correlated, anti-correlated, independent
– Dimensionality: 2–10

SLIDE 13

Performance in 2D

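anti = anti-correlated, corr = correlated, ind = independent data. The Skyline row gives the number of Skyline points; the other rows give the running time of each algorithm (B-tree = the Index algorithm).

                   100K points                1M points
               anti    corr     ind      anti    corr     ind
Skyline          49       1      12        54       1      12
NN             0.57    0.02     0.2      0.69    0.02     0.5
BNL            1.77    1.65    1.68     17.16   16.24   16.07
D&C            2.63    2.56    2.63     28.65   28.53   28.50
Bitmap         6.09    0.84    1.40     57.12   12.23   17.90
B-tree        13.86    0.01    0.26      >200    0.12    0.92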

SLIDE 17


Performance in 2D

NN depends on the data distribution (Skyline size), not on the data set size

BNL and D&C depend on the data set size only

Bitmap depends on both the data set size and the data distribution

Index depends on both the data distribution and the data set size
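The size dependence stated above can be read off the 2D performance table by dividing each 1M-point time by the matching 100K-point time. The sketch below does this arithmetic; it assumes the first three value columns are the 100K runs and the last three the 1M runs, which is how we read the flattened slide (BNL and D&C grow roughly tenfold, consistent with ten times as much data). The B-tree value ">200" is not a number, so that ratio is left out.

```python
# Running times from the 2D table, per distribution (anti, corr, ind).
# Assumption: first triple = 100K points, second triple = 1M points.
# The B-tree (Index) 1M anti-correlated entry is ">200", stored as None.
TIMES = {
    "NN":     ((0.57, 0.02, 0.2),  (0.69, 0.02, 0.5)),
    "BNL":    ((1.77, 1.65, 1.68), (17.16, 16.24, 16.07)),
    "D&C":    ((2.63, 2.56, 2.63), (28.65, 28.53, 28.50)),
    "Bitmap": ((6.09, 0.84, 1.40), (57.12, 12.23, 17.90)),
    "B-tree": ((13.86, 0.01, 0.26), (None, 0.12, 0.92)),
}
DISTS = ("anti", "corr", "ind")


def growth_factors(times):
    """Ratio time(1M) / time(100K), rounded to one decimal, per distribution."""
    return {
        algo: {
            dist: (round(big / small, 1) if big is not None else None)
            for dist, small, big in zip(DISTS, small_run, big_run)
        }
        for algo, (small_run, big_run) in times.items()
    }


for algo, row in growth_factors(TIMES).items():
    print(algo, row)
```

On these numbers NN grows by at most about 2.5× (independent data) when the data set grows tenfold, while BNL grows by about 9.6 to 9.8× and D&C by about 10.8 to 11.1×.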


SLIDE 18

Performance in higher dimensional spaces

d < 4: NN is typically the winner in all respects

d ≥ 4: performance depends on the goal

– Complete Skyline: BNL and D&C are usually the best choice to compute the complete Skyline
– Big picture: NN produces the big picture the fastest
– Data rate: Index produces Skyline points at the highest rate, but always returns “extreme” points first, so there is no big picture

SLIDE 19

Providing control

Goal: the user should be able to determine the order of Skyline points at run time

Order of region processing

– Influences the “direction”

Adaptation of the distance function

– Does not change the Skyline (→ Observations 1 & 2)
– Influences the order of points

Bitmap and Index lack control

– The order of Skyline points is determined by the insertion order (Bitmap) and by the one-dimensional mapping (Index), respectively
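To illustrate that adapting the distance function reorders the output without changing the result, the sketch below runs the same NN loop (FIFO to-do list, smaller is better, linear-scan NN search as a stand-in for an index) with two distance functions. The names df1 and df2 echo DF1 and DF2 in the user-control example, but the slides do not define those functions, so both definitions here are hypothetical.

```python
import math
from collections import deque


def nn_skyline_2d(points, f):
    """NN-based 2D Skyline (smaller is better); f measures distance to O."""
    todo = deque([(math.inf, math.inf)])   # upper corners of pending regions
    while todo:
        mx, my = todo.popleft()
        inside = [p for p in points if p[0] < mx and p[1] < my]
        if inside:
            n = min(inside, key=f)         # NN of O inside [0, mx) x [0, my)
            nx, ny = n
            yield n
            todo.extend([(nx, my), (mx, ny)])


def df1(p):
    """Hypothetical DF1: Euclidean distance, treats both dimensions alike."""
    return math.hypot(p[0], p[1])


def df2(p):
    """Hypothetical DF2: weighted sum that strongly prefers a low x (price)."""
    return 10 * p[0] + p[1]


hotels = [(1, 9), (2, 4), (3, 3), (5, 1), (6, 6), (4, 8)]
print(list(nn_skyline_2d(hotels, df1)))  # → [(3, 3), (2, 4), (5, 1), (1, 9)]
print(list(nn_skyline_2d(hotels, df2)))  # → [(1, 9), (2, 4), (3, 3), (5, 1)]
```

Both runs return the same four Skyline points; with df2 the cheapest one comes first. Swapping the FIFO queue for a priority order on the to-do list would be the corresponding knob for the order of region processing.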

SLIDE 20

Example: User control

[Figure: RangeNNSearch with distance function DF1 vs. DF2; x-axis: price [€], y-axis: distance to the beach [km]]

SLIDE 21

Summary and future work

Online algorithm for Skyline queries based on NN search

The NN algorithm
– returns the first Skyline points instantaneously
– builds the complete Skyline incrementally
– generates a “big picture” of the Skyline
– generates only Skyline points → no approximation
– supports user interaction
– is universal

Future work:

– Main memory, multidimensional indexing for the region list
– Continuous Skyline queries