ZINC: Efficient Indexing for Skyline Computation



SLIDE 1

ZINC: Efficient Indexing for Skyline Computation

Bin Liu, Chee-Yong Chan
Department of Computer Science
National University of Singapore

SLIDE 2

Skyline Queries

◮ Skyline – points that are not dominated by other points wrt a set of dimensions

◮ Point x dominates point y if

  (1) x is as good as y in all dimensions, and
  (2) x is better than y in at least one dimension

◮ Example: Find used cars that are cheap and have low mileage

[Figure: scatter plot of cars A to J, Mileage vs. Price]
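
The dominance test above can be sketched in a few lines of Python. This is only an illustration: it assumes that smaller values are better in every dimension (as with price and mileage), and the sample points are hypothetical, not read from the slide's figure.

```python
from typing import Sequence


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """Return True if x dominates y, assuming smaller is better in every dimension."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    # Condition (1): x is as good as y in all dimensions.
    as_good_everywhere = all(a <= b for a, b in zip(x, y))
    # Condition (2): x is better than y in at least one dimension.
    better_somewhere = any(a < b for a, b in zip(x, y))
    return as_good_everywhere and better_somewhere


# Hypothetical (mileage, price) pairs, not taken from the slide's figure.
cheap_low = (30_000, 8_000)
pricey_high = (60_000, 12_000)
trade_off = (20_000, 15_000)

print(dominates(cheap_low, pricey_high))  # better on both dimensions
print(dominates(cheap_low, trade_off))    # neither dominates: a trade-off
```

Points such as `cheap_low` and `trade_off`, where neither dominates the other, can both appear in the skyline.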

VLDB 2011 2

SLIDE 5

Simple Evaluation Algorithm

Input: set of data points P
Output: set of skyline points in P

initialize set of candidate skyline points S to be empty
for each data point p in P do
    if (p is not dominated by any point in S) then
        delete each s ∈ S if p dominates s
        insert p into S
return S

Drawbacks:

◮ Need to scan entire data set
◮ Performs many dominance comparisons
◮ Non-progressive
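
A minimal runnable sketch of the pseudocode above, in Python. It assumes smaller is better in every dimension, and the sample data set `POINTS` is hypothetical. The candidate list can still shrink while the scan is running, which is why no point can safely be reported before the loop ends.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Point, y: Point) -> bool:
    """x dominates y, assuming smaller is better in every dimension."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def simple_skyline(points: Sequence[Point]) -> List[Point]:
    """Scan all points once, maintaining a set of candidate skyline points."""
    candidates: List[Point] = []
    for p in points:
        # Skip p if some current candidate already dominates it.
        if any(dominates(s, p) for s in candidates):
            continue
        # Otherwise p evicts every candidate it dominates, then joins the set.
        candidates = [s for s in candidates if not dominates(p, s)]
        candidates.append(p)
    return candidates


# Hypothetical two-dimensional data set.
POINTS = [(5, 1), (3, 3), (4, 4), (1, 5), (2, 2), (6, 6)]
print(simple_skyline(POINTS))
```

In this run, (3, 3) is first accepted as a candidate and later evicted by (2, 2), which illustrates the non-progressive behaviour listed under the drawbacks.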

SLIDE 6

Processing Skyline Queries

◮ Scan-based solutions:

  ◮ BNL, D&C [Börzsönyi, Kossmann, Stocker, ICDE’01]
  ◮ SFS [Chomicki, Godfrey, Gryz, Liang, ICDE’03]
  ◮ LESS [Godfrey, Shipley, Gryz, VLDB’05]
  ◮ LS [Morse, Patel, Jagadish, VLDB’07]

◮ Index-based solutions:

  ◮ Bitmap, Index [Tan, Eng, Ooi, VLDB’01]
  ◮ NN [Kossmann, Ramsak, Rost, VLDB’02]
  ◮ BBS [Papadias, Tao, Fu, Seeger, SIGMOD’03]
  ◮ ZB-tree [Lee, Zheng, Li, Lee, VLDB’07]
  ◮ OPS, LCRS [Zhang, Mamoulis, Cheung, SIGMOD’09]
  ◮ BSkyTree [Lee, Hwang, EDBT’10]

SLIDE 7

Partially-Ordered Domains

◮ Many kinds of data have partially-ordered domains:

  ◮ User preferences

    [Figure: preference graph over the car brands Ferrari, Audi, Honda, BMW, Toyota, Yugo]

  ◮ Interval data (e.g., availability period, price range)
  ◮ Type/class hierarchies (e.g., categorical data)
  ◮ Set-valued domains (e.g., skill sets, hotel facilities)

SLIDE 10

Our Work: ZINC

◮ Index method for skyline queries with partially-ordered (PO) domains
◮ Inspired by ZB-tree
◮ ZB-tree [Lee, Zheng, Li, Lee, VLDB’07]

  ◮ Index method for totally-ordered domains
  ◮ Outperforms BBS [Papadias, Tao, Fu, Seeger, SIGMOD’03]

◮ Related work

  ◮ SDC+ [Chan, Eng, Tan, SIGMOD’05]
  ◮ TSS [Sacharidis, Papadopoulos, Papadias, ICDE’09]
  ◮ Recent technique:
    ⋆ CPS, SCL [Zhang, Mamoulis, Cheung, Kao, VLDB’10]

SLIDE 11

ZB-tree

◮ Maps multi-dimensional data point to 1-dimensional Z-address
◮ Z-address = interleaved bitstring representation of attribute values

  ◮ Example: (0,5) = (000,101) → 010001

◮ Index Z-addresses using B+-tree
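
The bit interleaving can be sketched as follows. The function name `z_address` and the fixed `bits` width per attribute are assumptions made for illustration; the first attribute contributes the leading bit of each group, which is what the slide's example implies.

```python
from typing import Sequence


def z_address(point: Sequence[int], bits: int) -> str:
    """Interleave the `bits`-wide binary strings of all attribute values."""
    for value in point:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    # One fixed-width bitstring per attribute, e.g. 5 -> "101" for bits=3.
    binary = [format(value, f"0{bits}b") for value in point]
    # zip(*binary) groups the i-th bit of every attribute together.
    return "".join(bit for group in zip(*binary) for bit in group)


print(z_address((0, 5), bits=3))  # the slide's example: (000,101) -> 010001
```

The resulting strings compare lexicographically in Z-order, so they can serve directly as keys of an ordered index such as a B+-tree.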

SLIDE 14

ZB-tree: Example

Monotonic ordering property: if p dominates q, then p precedes q in Z-order

[Figure: points a to i plotted on an x-y grid with both axes running from 1 to 7, connected in Z-order]
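
The monotonic ordering property can be checked exhaustively on a small grid. The sketch below assumes two dimensions, 3 bits per attribute, and that smaller values are better; `z_value` and `monotonic_on_grid` are illustrative names.

```python
from itertools import product
from typing import Tuple


def z_value(x: int, y: int, bits: int = 3) -> int:
    """Interleave the bits of x and y (x first) into a single integer."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


def dominates(p: Tuple[int, int], q: Tuple[int, int]) -> bool:
    """p dominates q, assuming smaller is better in both dimensions."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q


def monotonic_on_grid(bits: int = 3) -> bool:
    """Check that every dominating point precedes the dominated one in Z-order."""
    grid = list(product(range(1 << bits), repeat=2))
    return all(
        z_value(*p, bits) < z_value(*q, bits)
        for p in grid
        for q in grid
        if dominates(p, q)
    )


print(monotonic_on_grid())
```

The converse does not hold: (0, 1) precedes (1, 0) in Z-order, yet neither dominates the other, so a Z-order scan still needs dominance tests.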

SLIDE 15

ZB-tree: Example

[Figure: the grid of points a to i together with the ZB-tree built over them. Root entries: [a, d], [e, i]. Second-level entries: [a, a], [b, d], [e, g], [h, i]. Leaves: a to i.]

SLIDE 16

Encoding Schemes for Partial Orders

◮ Given a partial order domain D, find the smallest set S and an embedding f : D → 2^S such that x dominates y iff f(x) ⊆ f(y)

◮ Many proposed heuristics:

  ◮ Aït-Kaci et al., ACM TOPLAS 1989
  ◮ Caseau, OOPSLA 1993
  ◮ Krall, Vitek, Horspool, ECOOP 1997
  ◮ etc.
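
To make the embedding condition concrete, here is a naive embedding that is not one of the cited heuristics: it takes S = D and maps each value to the set containing itself and everything that dominates it. The partial order in `EDGES` is hypothetical, and the check covers distinct values only. The cited heuristics aim for a much smaller S.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical partial order: an edge u -> v means u directly dominates v.
EDGES: Dict[str, Set[str]] = {
    "a": {"b", "c"},
    "b": {"d"},
    "c": {"d"},
    "d": set(),
    "e": set(),  # incomparable with every other value
}


def dominated_by(node: str, edges: Dict[str, Set[str]]) -> Set[str]:
    """All values that `node` dominates (transitive closure of the edges)."""
    seen: Set[str] = set()
    stack = list(edges[node])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(edges[v])
    return seen


def dominates(x: str, y: str, edges: Dict[str, Set[str]]) -> bool:
    return y in dominated_by(x, edges)


def naive_embedding(edges: Dict[str, Set[str]]) -> Dict[str, FrozenSet[str]]:
    """f(x) = {x} plus every value that dominates x."""
    return {
        x: frozenset({x} | {z for z in edges if dominates(z, x, edges)})
        for x in edges
    }


F = naive_embedding(EDGES)
for value, code in sorted(F.items()):
    print(value, sorted(code))
```

With this embedding a dominance test between distinct values reduces to a subset test, `F[x] <= F[y]`, at the cost of |D| bits per value.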

SLIDE 17

ZINC: Nested Encoding Scheme

◮ ZINC = Z-order Indexing with Nested Code
◮ Key idea:

  ◮ Organize PO into nested layers of simpler POs
  ◮ Encode each value in PO as a concatenation of encodings in simpler POs

SLIDE 20

Example of Partial Order Reduction

[Figure: partial order G0 over nodes a to p, with regions R1 and R2 marked]

A subset of nodes R in a PO is a region if every node in R has the same dominance relationship wrt nodes outside of R:

◮ if u ∈ R dominates v ∉ R, then every u′ ∈ R dominates v
◮ if v ∉ R dominates u ∈ R, then v dominates every u′ ∈ R
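
The region condition can be tested mechanically. The sketch below uses a small hypothetical partial order (not the slide's G0, whose edges are not recoverable from this transcript); `closure` and `is_region` are illustrative names.

```python
from typing import Dict, Set

# Hypothetical partial order: an edge u -> v means u directly dominates v.
EDGES: Dict[str, Set[str]] = {
    "a": {"c", "e"},
    "c": {"d"},
    "e": {"f"},
    "d": {"g"},
    "f": {"g"},
    "g": set(),
}


def closure(edges: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each node to the set of all nodes it dominates."""
    below: Dict[str, Set[str]] = {}

    def visit(u: str) -> Set[str]:
        if u not in below:
            below[u] = set()
            for v in edges[u]:
                below[u] |= {v} | visit(v)
        return below[u]

    for u in edges:
        visit(u)
    return below


def is_region(region: Set[str], edges: Dict[str, Set[str]]) -> bool:
    """True if every outside node relates to all members of `region` alike."""
    below = closure(edges)
    for v in set(edges) - region:
        dominated_by_v = region & below[v]
        dominating_v = {u for u in region if v in below[u]}
        # Each set must be empty or the whole region; a partial overlap
        # means two members of the region relate to v differently.
        if 0 < len(dominated_by_v) < len(region):
            return False
        if 0 < len(dominating_v) < len(region):
            return False
    return True


print(is_region({"c", "d", "e", "f"}, EDGES))  # a is above all, g below all
print(is_region({"d", "g"}, EDGES))            # e dominates g but not d
```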

SLIDE 23

Example of Partial Order Reduction

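[Figure: three-step reduction of the partial order.
G0: the original partial order over nodes a to p, with regions R1 and R2 marked.
G1: R1 and R2 collapsed into single nodes v1 and v2, leaving nodes a, b, i, j, v1, v2, g, h, o, p, with region R3 marked.
G2: R3 collapsed into v3, leaving nodes a, v3, p.]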

SLIDE 24

Example of Nested Encodings

[Figure: the reduction G0 → G1 → G2, with regions R1, R2, R3 and nodes v1, v2, v3]

Encode(a,G0) = Encode(a,G2)
Encode(h,G0) = Encode(v3,G2) + Encode(h,R3)
Encode(k,G0) = Encode(v3,G2) + Encode(v2,R3) + Encode(k,R2)

Here + denotes concatenation of the per-level codes.

SLIDE 25

Vertical Regions

A region R in a PO is a vertical region if

◮ R = S0 ∪ ··· ∪ Sk, k ≥ 1, each Si is a total order,
◮ nodes from different total orders are incomparable, and
◮ R is a maximal subgraph of PO that satisfies the above properties

[Figure: partial order over nodes a to p with the vertical region R marked]

R = S0 ∪ S1, S0 = {c,d}, S1 = {e,f}

Each v ∈ R is encoded by two components: (1) which Si contains v, and (2) rank of v within Si

c = 00, d = 01, e = 10, f = 11
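
A small sketch of the two-component code for a vertical region. The fixed bit widths (just enough bits for the number of chains and for the longest chain) are an assumption; the slide only shows the resulting codes. Each chain is listed in rank order.

```python
from typing import Dict, List


def bits_needed(n: int) -> int:
    """Bits required to number n items from 0, with a minimum of one bit."""
    return max(1, (n - 1).bit_length())


def encode_vertical(chains: List[List[str]]) -> Dict[str, str]:
    """Code = (index of the chain containing v) followed by (rank of v in it)."""
    chain_bits = bits_needed(len(chains))
    rank_bits = bits_needed(max(len(chain) for chain in chains))
    codes: Dict[str, str] = {}
    for i, chain in enumerate(chains):
        for rank, node in enumerate(chain):
            codes[node] = format(i, f"0{chain_bits}b") + format(rank, f"0{rank_bits}b")
    return codes


# The slide's example: S0 = {c,d}, S1 = {e,f}.
print(encode_vertical([["c", "d"], ["e", "f"]]))
```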

SLIDE 26

Horizontal Regions

A region R in a PO is a horizontal region if

◮ R = S0 ∪ ··· ∪ Sk, k ≥ 1,
◮ the nodes within each Si are incomparable,
◮ u ∈ Si dominates v ∈ Sj if i < j, and
◮ R is a maximal subgraph of PO that satisfies the above properties

[Figure: partial order over nodes a to p with the horizontal region R marked]

R = S0 ∪ S1, S0 = {k,l}, S1 = {m,n}

Each v ∈ R is encoded by i if v ∈ Si

k = 0, l = 0, m = 1, n = 1
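
The horizontal case is simpler, since only the layer index is stored. As before, the fixed bit width is an assumption and `encode_horizontal` is an illustrative name.

```python
from typing import Dict, List


def encode_horizontal(layers: List[List[str]]) -> Dict[str, str]:
    """Code of v = index i of the layer Si containing v, as a fixed-width bitstring."""
    width = max(1, (len(layers) - 1).bit_length())
    return {
        node: format(i, f"0{width}b")
        for i, layer in enumerate(layers)
        for node in layer
    }


# The slide's example: S0 = {k,l}, S1 = {m,n}.
print(encode_horizontal([["k", "l"], ["m", "n"]]))
```

Nodes in the same layer share a code, which matches the slide: k and l both map to 0, m and n both map to 1.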

SLIDE 27

Regular & Irregular Regions

◮ A region R in a PO is a regular region if R is either a vertical or horizontal region

◮ A region R in a PO is an irregular region if

  ◮ R is not a regular region, and
  ◮ R is a minimal subgraph of PO containing at least two nodes

◮ Example of an irregular region:

  [Figure: partial order over nodes a to f with irregular region R4 marked]

◮ Irregular regions are encoded using Compact Hierarchical Encoding (CHE) [Caseau, OOPSLA 1993]

SLIDE 28

Putting everything together

[Figure: G0, G1 and G2, each node annotated with its code at that level]

G2: a = 00, v3 = 01, p = 10
R3 (in G1): b = 000, v1 = 001, g = 010, h = 011, i = 100, j = 101, v2 = 110, o = 111
R1 (in G0): c = 00, d = 01, e = 10, f = 11
R2 (in G0): k = 0, l = 0, m = 1, n = 1

Encode(a,G0) = Encode(a,G2) = 00 00000
Encode(h,G0) = Encode(v3,G2) + Encode(h,R3) = 01 011 00
Encode(k,G0) = Encode(v3,G2) + Encode(v2,R3) + Encode(k,R2) = 01 110 0 0
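
The nested encoding can be sketched as a walk from the outermost level down to the value, concatenating one code per level. The per-level codes below are a best-effort reading of the slide's figure. The mapping of v1, v2, v3 to R1, R2, R3 and the padding with trailing zeros to 7 bits are inferred from the three worked examples rather than stated on the slide.

```python
from typing import Dict, List, Optional, Tuple

# Per-level codes as read from the figure (best-effort reading).
CODES: Dict[str, Dict[str, str]] = {
    "G2": {"a": "00", "v3": "01", "p": "10"},
    "R3": {"b": "000", "v1": "001", "g": "010", "h": "011",
           "i": "100", "j": "101", "v2": "110", "o": "111"},
    "R1": {"c": "00", "d": "01", "e": "10", "f": "11"},
    "R2": {"k": "0", "l": "0", "m": "1", "n": "1"},
}
# Assumed mapping from each virtual node to the region it stands for.
EXPANDS: Dict[str, str] = {"v3": "R3", "v1": "R1", "v2": "R2"}
TOTAL_BITS = 7  # assumed fixed width: 2 (G2) + 3 (R3) + 2 (widest inner region)


def find_path(value: str, level: str = "G2") -> Optional[List[Tuple[str, str]]]:
    """Return the (level, node) steps from the top level down to `value`."""
    for node in CODES[level]:
        if node == value:
            return [(level, node)]
        if node in EXPANDS:
            rest = find_path(value, EXPANDS[node])
            if rest is not None:
                return [(level, node)] + rest
    return None


def encode(value: str) -> str:
    """Concatenate the code at each level, then pad with trailing zeros."""
    path = find_path(value)
    if path is None:
        raise KeyError(value)
    bits = "".join(CODES[level][node] for level, node in path)
    return bits.ljust(TOTAL_BITS, "0")


for v in ("a", "h", "k"):
    print(v, encode(v))
```

For a, h and k this reproduces the three encodings shown on the slide.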

SLIDE 29

Performance Comparison

[Figure: processing time (sec) of TSS, TSS+ZB, CHE+ZB and ZINC for (|TO|, |PO|) = (2,1), (3,1), (4,1), (2,2), (3,2), (4,2); time axis from 10 to 50 sec]

SLIDE 30

Conclusion

◮ Presented a novel index method for computing skyline queries on data with partially-ordered attribute domains
◮ ZINC = Z-order based indexing (ZB-tree) + Nested encoding scheme
◮ Future work:

  ◮ ZINC vs CPS, SCL [Zhang, Mamoulis, Cheung, Kao, VLDB’10]
  ◮ Other techniques?
