SLIDE 1

July 11 SSDBM'08

Caching Dynamic Skyline Queries

  • D. Sacharidis (1), P. Bouros (1), T. Sellis (1, 2)

(1) National Technical University of Athens
(2) Institute for Management of Information Systems, R.C. Athena

SLIDE 2

Outline

  • Introduction
    – Skyline (SL) and dynamic skyline (DSL) queries
  • Related work
  • Evaluating dynamic skyline queries
    – Computing orthant skylines (OSL)
    – Computing dynamic skylines via caching
      • LRU, LFU, LPP cache replacement policies
  • Experimental evaluation
  • Conclusions and future work
SLIDE 7

Skyline queries (SL)

  • Given a dataset of d-dimensional points
    – SL contains the points not dominated by any other point
    – x dominates y iff x is as good as y in all dimensions and strictly better in at least one
  • Example
    – Dataset of hotels
    – Prefer cheap hotels close to the sea

[Figure: axes Distance from sea and Price; labels: skyline points, p1, p2]
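
The dominance definition above can be tried out with a minimal sketch. It assumes that smaller values are better in every dimension (closer to the sea, cheaper); the hotel coordinates and the function names `dominates` and `skyline` are made up for illustration, and the quadratic scan is not one of the algorithms discussed in the talk.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y: as good in every dimension, strictly better in one.

    Assumption: smaller values are better in every dimension.
    """
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (naive quadratic scan)."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


# Hypothetical hotels given as (distance from sea, price).
hotels = [(1, 9), (3, 4), (5, 5), (8, 1), (9, 9)]
print(skyline(hotels))
```

In this toy dataset the hotel (5, 5) drops out because (3, 4) is both closer to the sea and cheaper.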

SLIDE 11

Dynamic skyline queries (DSL)

  • Extension of skyline queries
    – Given a query point q
    – DSL contains the points not dynamically dominated by others w.r.t. q
    – x dynamically dominates y iff x is as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one
  • Can be treated as a static SL
    – Transform points w.r.t. q
  • Example
    – User defines an “ideal” hotel q

[Figure: axes Distance from sea and Price; labels: dynamic skyline points, q, p4, p5]
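
One way to read "transform points w.r.t. q" is sketched below. It assumes that a point is more preferable in a dimension when its absolute distance from q in that dimension is smaller; the slides do not spell out the transform, so this is an assumption, as are the sample hotels and the function names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """Static dominance, smaller is better in every dimension."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def dynamic_skyline(points: List[Point], q: Point) -> List[Point]:
    """Dynamic skyline w.r.t. q, computed as a static skyline of |p - q|.

    Assumption: preference in a dimension means a smaller absolute
    distance from the query point in that dimension.
    """
    transformed = [tuple(abs(a - b) for a, b in zip(p, q)) for p in points]
    return [
        p
        for p, t in zip(points, transformed)
        if not any(dominates(o, t) for o in transformed)
    ]


# Hypothetical hotels and a hypothetical "ideal" hotel.
hotels = [(4, 6), (6, 4), (1, 1), (5, 9), (9, 5)]
ideal = (5, 5)
print(dynamic_skyline(hotels, ideal))
```

Note that (1, 1) would be in the static skyline, but it is far from the ideal hotel in both dimensions, so it is dynamically dominated by (4, 6).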

SLIDE 12

Intuition (1)

  • Traditional SL algorithms need to run anew for each DSL query
  • Our idea
    – Exploit results from past queries to reduce processing cost for future DSL queries
    – Cache past queries
    – Decide which queries in cache are useful

SLIDE 14

Intuition (2)

  • 2 past DSL queries
    – qa, qb
  • Each query partitions the space into 4 quadrants

[Figure: axes Distance from sea and Price; labels: qa, qb]

SLIDE 15

Intuition (3)

  • A new query q arrives
  • Consider DSL for qa
    – p1 is contained in DSL(qa)
    – p1 dominates p2, p3, p4
  • p1 lies in the upper right quadrant w.r.t. qa
  • qa lies in the upper right quadrant w.r.t. q
  • p1 also dominates p2, p3, p4 w.r.t. q
    – Exclude p2, p3, p4 from the dominance test for DSL(q)
  • Shaded area denotes points dominated by p1

[Figure: axes Distance from sea and Price; labels: qa, qb, p1, p2, p3, p4, q]
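
The argument on this slide can be checked numerically with a small sketch. It assumes the absolute-distance notion of preference, assumes the dominated point lies in the same quadrant as p1 w.r.t. qa, and uses a hypothetical quadrant numbering (bit i is set when the point's coordinate i is at least the query's). All coordinates and function names are invented for illustration.

```python
from typing import Sequence


def orthant(p: Sequence[float], q: Sequence[float]) -> int:
    """Orthant id of p w.r.t. q; bit i is set when p[i] >= q[i] (assumed numbering)."""
    return sum(1 << i for i, (a, b) in enumerate(zip(p, q)) if a >= b)


def dyn_dominates(x: Sequence[float], y: Sequence[float], q: Sequence[float]) -> bool:
    """x dynamically dominates y w.r.t. q under the |p - q| assumption."""
    dx = [abs(a - b) for a, b in zip(x, q)]
    dy = [abs(a - b) for a, b in zip(y, q)]
    return all(a <= b for a, b in zip(dx, dy)) and any(a < b for a, b in zip(dx, dy))


def dominance_carries_over(p1, p2, qa, q) -> bool:
    """True when the slide's conditions hold: p1 and p2 share an orthant w.r.t. qa,
    qa lies in that same orthant w.r.t. q, and p1 dominates p2 w.r.t. qa."""
    same_orthant = orthant(p1, qa) == orthant(p2, qa) == orthant(qa, q)
    return same_orthant and dyn_dominates(p1, p2, qa)


q, qa = (0, 0), (2, 2)
p1, p2 = (3, 3), (5, 4)
print(dominance_carries_over(p1, p2, qa, q), dyn_dominates(p1, p2, q))
```

When the conditions hold, each per-dimension distance to q equals the distance to qa plus the distance from qa to q, so the ordering between p1 and p2 is preserved. If q moves to (10, 10), qa is no longer in the matching quadrant and p1 stops dominating p2.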

SLIDE 16

Contribution in brief

  • Caching past DSL queries alone cannot reduce processing cost for future ones
    – We need more information about dominance relationships
  • Introduce orthant skylines (OSL) and examine their relationship with DSL
  • Extend the Bitmap algorithm to compute OSL in parallel with DSL
  • Cache OSL to enhance DSL query evaluation
    – Present 3 cache replacement policies
      • LRU, LFU, LPP
  • Experimental evaluation of the caching mechanism
SLIDE 17

Related work

  • Non-indexed methods
    – Block-Nested Loops (BnL)
    – Bitmap
    – Multidimensional Divide and Conquer (DnC)
    – Sort First Scan (SFS)
  • Index-based methods
    – B-tree
      • sort points according to the lowest valued coordinate
    – R-tree
      • Nearest neighbor based (NN)
      • Branch and bound (BBS)
SLIDE 19

Bitmap

  • BnL variant
  • Suitable for discrete domains with low cardinality
  • In brief
    – Computes a bitmap representation of the points in the dataset
    – Examines each point separately (dominance test)
      • Checks whether it is contained in the skyline or not
      • Exploits fast bitwise operations OR/AND
SLIDE 20

Bitmap – Dominance test

  • For each point p
    – Define A = A1 & A2 & … & Ad
      • Denotes the points as good as p in all dimensions
    – Define B = B1 | B2 | … | Bd
      • Denotes the points strictly better than p in at least one dimension
    – Dominance test:
      • If C = A & B has all bits set to 0 then p is in SL
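
The test above can be sketched with Python integers used as bit vectors, one bit per point. This is an illustrative reading, not the original Bitmap implementation: it assumes smaller values are better, and the per-value lookup tables `le` and `lt` and all function names are invented here.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, ...]


def build_bitmaps(points: List[Point]):
    """For each dimension i and value v, precompute two bit vectors:
    le[i][v] has bit j set iff points[j][i] <= v (as good as v),
    lt[i][v] has bit j set iff points[j][i] <  v (strictly better than v)."""
    le: List[Dict[int, int]] = []
    lt: List[Dict[int, int]] = []
    for i in range(len(points[0])):
        le_i: Dict[int, int] = {}
        lt_i: Dict[int, int] = {}
        for v in {p[i] for p in points}:
            le_i[v] = sum(1 << j for j, p in enumerate(points) if p[i] <= v)
            lt_i[v] = sum(1 << j for j, p in enumerate(points) if p[i] < v)
        le.append(le_i)
        lt.append(lt_i)
    return le, lt


def bitmap_skyline(points: List[Point]) -> List[Point]:
    """Skyline via the A / B / C test, assuming smaller is better."""
    if not points:
        return []
    le, lt = build_bitmaps(points)
    result = []
    for p in points:
        a = (1 << len(points)) - 1  # start with all bits set
        b = 0
        for i, v in enumerate(p):
            a &= le[i][v]  # A = A1 & A2 & ... & Ad
            b |= lt[i][v]  # B = B1 | B2 | ... | Bd
        if a & b == 0:  # C has no bit set: nothing dominates p
            result.append(p)
    return result


print(bitmap_skyline([(1, 9), (3, 4), (5, 5), (8, 1), (9, 9)]))
```

The lookup tables grow with the number of distinct values per dimension, which is consistent with the method being suited to discrete, low-cardinality domains.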
SLIDE 24

Orthant skyline (OSL)

  • OSL provides more information about dominance relationships than DSL
    – Useful for pruning
  • Given a dataset of d-dimensional points and a query point q
    – Space is partitioned into 2^d orthants
    – The o-th orthant skyline (OSL) of q contains the points of the o-th orthant not dynamically dominated by others inside orthant o w.r.t. q

[Figure: axes Distance from sea and Price; query point q splits the plane into Quadrant 0, Quadrant 1, Quadrant 2 and Quadrant 3; Quadrant 2 skyline points highlighted]
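
A minimal sketch of the definition: group the points by orthant w.r.t. q and compute a dynamic skyline inside each group. The orthant numbering (bit i set when the coordinate is at least the query's) is an assumption and need not match the quadrant labels in the figure; boundary points are assigned to the "at least" side by the same assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def orthant(p: Sequence[float], q: Sequence[float]) -> int:
    """Orthant id of p w.r.t. q; bit i is set when p[i] >= q[i] (assumed numbering)."""
    return sum(1 << i for i, (a, b) in enumerate(zip(p, q)) if a >= b)


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def orthant_skylines(points: List[Point], q: Point) -> Dict[int, List[Point]]:
    """Map each non-empty orthant id to the skyline of the points inside it."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for p in points:
        groups[orthant(p, q)].append(p)
    result: Dict[int, List[Point]] = {}
    for o, members in groups.items():
        # Distances to q; dominance is only tested among points of the same orthant.
        dist = [tuple(abs(a - b) for a, b in zip(p, q)) for p in members]
        result[o] = [
            p for p, t in zip(members, dist) if not any(dominates(u, t) for u in dist)
        ]
    return result


print(orthant_skylines([(6, 6), (7, 7), (4, 4), (2, 2), (6, 4)], (5, 5)))
```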

SLIDE 31

OSL and DSL relationship

  • Map points from quadrants 1, 2, 3 to points inside quadrant 0
  • Compute DSL w.r.t. q
  • Union of all OSLs is a superset of DSL w.r.t. q

[Figure: axes Distance from sea and Price; labels: Quadrant 0, Quadrant 1, Quadrant 2, Quadrant 3, q, p3, p2]
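
The superset claim follows because a point that no point dominates is, in particular, not dominated by any point of its own orthant. The sketch below checks this on a tiny example, under the same assumptions as before (absolute-distance preference, a hypothetical orthant numbering, invented names and data).

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def orthant(p: Sequence[float], q: Sequence[float]) -> int:
    """Bit i is set when p[i] >= q[i] (assumed numbering)."""
    return sum(1 << i for i, (a, b) in enumerate(zip(p, q)) if a >= b)


def dyn_dominates(x: Sequence[float], y: Sequence[float], q: Sequence[float]) -> bool:
    dx = [abs(a - b) for a, b in zip(x, q)]
    dy = [abs(a - b) for a, b in zip(y, q)]
    return all(a <= b for a, b in zip(dx, dy)) and any(a < b for a, b in zip(dx, dy))


def dsl(points: List[Point], q: Point) -> Set[Point]:
    """Points not dynamically dominated by any other point."""
    return {p for p in points if not any(dyn_dominates(o, p, q) for o in points)}


def osl_union(points: List[Point], q: Point) -> Set[Point]:
    """Points not dynamically dominated by any point of their own orthant."""
    return {
        p
        for p in points
        if not any(
            dyn_dominates(o, p, q) for o in points if orthant(o, q) == orthant(p, q)
        )
    }


pts, q = [(6, 6), (3, 3)], (5, 5)
print(dsl(pts, q), osl_union(pts, q))
```

Here (3, 3) is alone in its quadrant, so it is in an OSL, yet (6, 6) dominates it once both are mapped into the same quadrant, so the inclusion is strict.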

SLIDE 32

Computing orthant skylines

  • Algorithm DBM
    – Extends Bitmap to compute DSL and OSLs at the same time
  • Method:
    – Compute bitmap representation
      • Transform each point's coordinates w.r.t. query q
    – Dominance test for point p in orthant o, with three possible outcomes:
      • p not in OSLo and not in DSL
      • p not in DSL, but in OSLo
      • p in DSL and in OSLo
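
The three outcomes can be illustrated with a plain quadratic scan. This is not the bitmap-based DBM algorithm, only a sketch of what it decides for each point; it assumes absolute-distance preference and a hypothetical orthant numbering, and the labels and names are made up.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def orthant(p: Sequence[float], q: Sequence[float]) -> int:
    """Bit i is set when p[i] >= q[i] (assumed numbering)."""
    return sum(1 << i for i, (a, b) in enumerate(zip(p, q)) if a >= b)


def dyn_dominates(x: Sequence[float], y: Sequence[float], q: Sequence[float]) -> bool:
    dx = [abs(a - b) for a, b in zip(x, q)]
    dy = [abs(a - b) for a, b in zip(y, q)]
    return all(a <= b for a, b in zip(dx, dy)) and any(a < b for a, b in zip(dx, dy))


def classify(points: List[Point], q: Point) -> Dict[Point, str]:
    """Label each point with one of the three outcomes of the dominance test."""
    labels: Dict[Point, str] = {}
    for p in points:
        o = orthant(p, q)
        in_osl = not any(
            dyn_dominates(u, p, q) for u in points if orthant(u, q) == o
        )
        in_dsl = not any(dyn_dominates(u, p, q) for u in points)
        if in_dsl:
            labels[p] = "in DSL and in OSL"  # being in DSL implies being in OSL
        elif in_osl:
            labels[p] = "not in DSL, but in OSL"
        else:
            labels[p] = "not in OSL and not in DSL"
    return labels


for point, label in classify([(6, 6), (7, 7), (3, 3)], (5, 5)).items():
    print(point, label)
```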
SLIDE 33

Dynamic skylines via caching

  • Cache OSLs instead of DSLs
    – Query cache contains (query point qj, OSLs)
    – OSLs are encoded as bitmaps
  • Algorithm cDBM
    – OSL contains information about dominance tests inside an orthant
    – Discard points inside orthants from dominance tests
  • Method:
    – Compute bitmap representation
    – For each point p consider its position (orthant) w.r.t. the cached queries qj
    – If p is in the same orthant o w.r.t. qj as qj is w.r.t. q and p is not in OSLo(qj), then exclude it from OSLo(q) and DSL(q)
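
The pruning rule can be sketched as follows. The cache layout (a list of pairs of a query point and a dictionary from orthant id to a set of OSL points) is an assumption standing in for the bitmap encoding, and the orthant numbering, data, and function names are hypothetical. The surviving points are then fed to an ordinary dynamic skyline computation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]
Cache = List[Tuple[Point, Dict[int, Set[Point]]]]


def orthant(p: Sequence[float], q: Sequence[float]) -> int:
    """Bit i is set when p[i] >= q[i] (assumed numbering)."""
    return sum(1 << i for i, (a, b) in enumerate(zip(p, q)) if a >= b)


def dyn_dominates(x: Sequence[float], y: Sequence[float], q: Sequence[float]) -> bool:
    dx = [abs(a - b) for a, b in zip(x, q)]
    dy = [abs(a - b) for a, b in zip(y, q)]
    return all(a <= b for a, b in zip(dx, dy)) and any(a < b for a, b in zip(dx, dy))


def orthant_skylines(points: List[Point], q: Point) -> Dict[int, Set[Point]]:
    groups: Dict[int, List[Point]] = defaultdict(list)
    for p in points:
        groups[orthant(p, q)].append(p)
    return {
        o: {p for p in ms if not any(dyn_dominates(u, p, q) for u in ms)}
        for o, ms in groups.items()
    }


def prune_with_cache(points: List[Point], q: Point, cache: Cache) -> List[Point]:
    """Drop p when some cached qj sits in orthant o w.r.t. q, p sits in the
    same orthant o w.r.t. qj, and p is missing from OSL_o(qj)."""
    survivors = []
    for p in points:
        pruned = any(
            orthant(p, qj) == orthant(qj, q)
            and p not in osls.get(orthant(qj, q), set())
            for qj, osls in cache
        )
        if not pruned:
            survivors.append(p)
    return survivors


def dynamic_skyline(points: List[Point], q: Point) -> List[Point]:
    return [p for p in points if not any(dyn_dominates(u, p, q) for u in points)]


points = [(6, 6), (7, 7), (8, 9), (3, 3), (1, 8)]
qa = (5, 5)
cache = [(qa, orthant_skylines(points, qa))]
q = (4, 4)
survivors = prune_with_cache(points, q, cache)
print(survivors, dynamic_skyline(survivors, q))
```

In this example the cached query removes (7, 7) and (8, 9) before any dominance test for q is run, and the dynamic skyline of the survivors equals that of the full dataset.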

SLIDE 34

Cache Replacement Policies

  • General idea
    – Limited cache space
    – Identify the least useful query point in cache
    – Replace it with the new one

SLIDE 39

Usage-based policies

  • Only a few queries in cache are useful
  • Log cache query usage
  • Given a new query q
    – Consider as input the query points in cache Q
    – Only query points in the OSLs of Q w.r.t. q are useful
    – Update cache: remove the
      • Least Recently Used (LRU) query point, or
      • Least Frequently Used (LFU) query point

[Figure: axes Distance from sea and Price; labels: qa, qb, qc, qd, q; redundant queries marked]
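
A minimal sketch of the usage log and the two eviction rules. The class `UsageCache` and its methods are hypothetical; it stores only the cached query points (the OSL payload is left out), uses a logical clock for recency, and treats a cached query as "used" when it lies in an orthant skyline of the cached query points w.r.t. the new query.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def orthant(p: Sequence[float], q: Sequence[float]) -> int:
    """Bit i is set when p[i] >= q[i] (assumed numbering)."""
    return sum(1 << i for i, (a, b) in enumerate(zip(p, q)) if a >= b)


def dyn_dominates(x: Sequence[float], y: Sequence[float], q: Sequence[float]) -> bool:
    dx = [abs(a - b) for a, b in zip(x, q)]
    dy = [abs(a - b) for a, b in zip(y, q)]
    return all(a <= b for a, b in zip(dx, dy)) and any(a < b for a, b in zip(dx, dy))


class UsageCache:
    """Query-point cache with LRU or LFU replacement (illustrative only)."""

    def __init__(self, capacity: int, policy: str = "LRU") -> None:
        if policy not in ("LRU", "LFU"):
            raise ValueError("policy must be 'LRU' or 'LFU'")
        self.capacity = capacity
        self.policy = policy
        self.clock = 0
        self.last_used: Dict[Point, int] = {}
        self.use_count: Dict[Point, int] = {}

    def lookup(self, q: Point) -> List[Point]:
        """Return the useful cached queries for q and log their usage."""
        self.clock += 1
        cached = list(self.last_used)
        hits = [
            c
            for c in cached
            if not any(
                dyn_dominates(o, c, q)
                for o in cached
                if orthant(o, q) == orthant(c, q)
            )
        ]
        for c in hits:
            self.last_used[c] = self.clock
            self.use_count[c] += 1
        return hits

    def insert(self, q: Point) -> None:
        """Add q, evicting one cached query point first if the cache is full."""
        if q in self.last_used:
            return
        if len(self.last_used) >= self.capacity:
            if self.policy == "LRU":
                victim = min(self.last_used, key=lambda c: self.last_used[c])
            else:  # LFU, ties broken by recency
                victim = min(
                    self.use_count,
                    key=lambda c: (self.use_count[c], self.last_used[c]),
                )
            del self.last_used[victim]
            del self.use_count[victim]
        self.clock += 1
        self.last_used[q] = self.clock
        self.use_count[q] = 0
```

A typical flow would call `lookup(q)` before evaluating a new query and `insert(q)` afterwards.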

SLIDE 45

Pruning power-based policy

  • Usage-based policies do not indicate usefulness
  • Useful cached query
    – Great pruning power
      • Probability that a query can prune points of the dataset from DSL computation
    – Depends on
      • Points dominated by the query in an orthant j
      • Points contained in the antisymmetric orthant of j
  • Update cache: remove
    – Query point with the least pruning power (LPP)

[Figure: axes Distance from sea and Price; labels: qa, q]
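
The slides name the two ingredients of pruning power but not a formula, so the score below is a hypothetical stand-in rather than the authors' definition. It assumes that a cached query qa can prune inside orthant j only when the new query falls in the opposite orthant w.r.t. qa, and that the chance of this can be estimated by the fraction of data points lying there. The orthant numbering and all names are also assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def orthant(p: Sequence[float], q: Sequence[float]) -> int:
    """Bit i is set when p[i] >= q[i] (assumed numbering)."""
    return sum(1 << i for i, (a, b) in enumerate(zip(p, q)) if a >= b)


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def pruning_power(qa: Point, points: List[Point]) -> float:
    """Hypothetical score: for each orthant j, the number of points dominated
    inside j times the fraction of points in the opposite orthant."""
    full_mask = (1 << len(qa)) - 1
    groups: Dict[int, List[Point]] = defaultdict(list)
    for p in points:
        groups[orthant(p, qa)].append(p)
    score = 0.0
    for j, members in groups.items():
        dist = [tuple(abs(a - b) for a, b in zip(p, qa)) for p in members]
        dominated = sum(1 for t in dist if any(dominates(u, t) for u in dist))
        opposite = full_mask ^ j  # flip every bit: the antisymmetric orthant
        score += dominated * len(groups.get(opposite, [])) / len(points)
    return score


def least_pruning_power(cached: List[Point], points: List[Point]) -> Point:
    """Pick the eviction victim under the hypothetical score."""
    return min(cached, key=lambda qa: pruning_power(qa, points))


data = [(6, 6), (7, 7), (8, 8), (3, 3)]
print(pruning_power((5, 5), data))
```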

SLIDE 46

Experimental Evaluation

  • Synthetic datasets
    – Distribution types
      • Independent, correlated, anti-correlated
    – Number of points N
      • 10k, 20k, 50k, 100k
    – Dimensionality
      • d = {2,3,4,5,6}
    – Domain size per dimension
      • |D| = {10,20,50}
  • Compare
    – Bitmap (NO-CACHE)
    – cDBM with LFU, LRU, LPP cache replacement policies
    – Query cache
      • |Q| = {10,20,30,40,50} past query points
      • Cache size is |Q|*N bits uncompressed
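
As a quick worked example of the |Q|*N formula, the snippet below evaluates it for the largest cache size listed and N = 50k. The function name is made up, and the conversion to bytes simply divides by 8.

```python
def cache_size_bits(num_queries: int, num_points: int) -> int:
    """Uncompressed cache size in bits: one bit per point per cached query."""
    return num_queries * num_points


bits = cache_size_bits(50, 50_000)
print(bits, "bits =", bits // 8, "bytes")
```

For |Q| = 50 and N = 50k this gives 2,500,000 bits, that is 312,500 bytes, or roughly 305 KiB.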
SLIDE 47

Varying query cache size

  • Dataset: N = 50k points, with d = 4 dimensions of |D| = 20 domain size
  • LFU and LRU cache queries that are not representative of future ones
  • LPP caches queries with great pruning power

[Figure: two panels, Anti-correlated and Independent]

SLIDE 48

Effect of distribution parameters

  • Relative improvement in running time over NO-CACHE
  • Vary number of points N
    – d = 4 dimensions of |D| = 20 domain size
  • Vary number of dimensions d
    – N = 50k, |D| = 20

[Figure: two panels, Correlated vary d and Correlated vary N]

SLIDE 49

Conclusions and Future work

  • Conclusions
    – Introduced orthant skylines (OSLs) and discussed their relationship with DSL
    – Extended Bitmap to compute OSLs and DSL at the same time (DBM algorithm)
    – Proposed a caching mechanism for OSLs to reduce the cost of future DSL queries
      • LRU, LFU, LPP cache replacement policies
    – Experimentally verified the efficiency of the caching mechanism
  • Future work
    – Apply the caching mechanism to index-based methods
    – Further increase the pruning power of cached queries

SLIDE 50

Questions?