
ZINC: Efficient Indexing for Skyline Computation, by Bin Liu and Chee-Yong Chan



  1. ZINC: Efficient Indexing for Skyline Computation
     Bin Liu, Chee-Yong Chan
     Department of Computer Science, National University of Singapore

  2. Skyline Queries
     ◮ Skyline: points that are not dominated by other points with respect to a set of dimensions
     ◮ Point x dominates point y if (1) x is as good as y in all dimensions, and (2) x is better than y in at least one dimension
     ◮ Example: find used cars that are cheap and have low mileage
     [Figure: scatter plot of cars A to J, with axes Price and Mileage]
     VLDB 2011
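The dominance test defined on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes that smaller values are better in every dimension (as with price and mileage), and the function name `dominates` and the sample cars are hypothetical.

```python
from typing import Sequence


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """Return True if x dominates y, assuming smaller is better in every dimension."""
    # Condition (1): x is as good as y in all dimensions.
    no_worse = all(a <= b for a, b in zip(x, y))
    # Condition (2): x is strictly better than y in at least one dimension.
    strictly_better = any(a < b for a, b in zip(x, y))
    return no_worse and strictly_better


# Hypothetical cars as (price, mileage) pairs; the values are made up.
cheap_low = (5000, 30000)
pricey_high = (9000, 80000)
cheap_high = (4000, 90000)
```

With these values, `cheap_low` dominates `pricey_high`, while `cheap_low` and `cheap_high` are incomparable, so both would appear in the skyline.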



  5. Simple Evaluation Algorithm
     Input: set of data points P
     Output: set of skyline points in P

       initialize set of candidate skyline points S to be empty
       for each data point p in P do
         if (p is not dominated by any point in S) then
           delete each s ∈ S if p dominates s
           insert p into S
       return S

     Drawbacks:
     ◮ Need to scan entire data set
     ◮ Performs many dominance comparisons
     ◮ Non-progressive
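The pseudocode above translates almost line for line into Python. The sketch below assumes points are tuples of numbers and that smaller is better in every dimension; the names `dominates` and `skyline` are illustrative rather than taken from the slides.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Return the skyline of `points` using the simple candidate-set scan."""
    candidates = []  # the set S of candidate skyline points
    for p in points:
        # Skip p if some current candidate dominates it.
        if any(dominates(s, p) for s in candidates):
            continue
        # Otherwise drop every candidate that p dominates, then keep p.
        candidates = [s for s in candidates if not dominates(p, s)]
        candidates.append(p)
    return candidates


# Hypothetical (price, mileage) data, made up for illustration.
cars = [(1, 9), (3, 3), (2, 8), (4, 4), (9, 1), (5, 5)]
result = skyline(cars)
```

The drawbacks listed on the slide are visible here: every point is read, each point is compared against the whole candidate list, and no result can be reported until the scan finishes.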

  6. Processing Skyline Queries
     ◮ Scan-based solutions:
       ◮ BNL, D&C [Börzsönyi, Kossmann, Stocker, ICDE’01]
       ◮ SFS [Chomicki, Godfrey, Gryz, Liang, ICDE’03]
       ◮ LESS [Godfrey, Shipley, Gryz, VLDB’05]
       ◮ LS [Morse, Patel, Jagadish, VLDB’07]
     ◮ Index-based solutions:
       ◮ Bitmap, Index [Tan, Eng, Ooi, VLDB’01]
       ◮ NN [Kossmann, Ramsak, Rost, VLDB’02]
       ◮ BBS [Papadias, Tao, Fu, Seeger, SIGMOD’03]
       ◮ ZB-tree [Lee, Zheng, Li, Lee, VLDB’07]
       ◮ OPS, LCRS [Zhang, Mamoulis, Cheung, SIGMOD’09]
       ◮ BSkyTree [Lee, Hwang, EDBT’10]

  7. Partially-Ordered Domains
     ◮ Many data have partially-ordered domains:
       ◮ User preferences
         [Figure: preference graph over the car brands Ferrari, Audi, Honda, Toyota, BMW, Yugo]
       ◮ Interval data (e.g., availability period, price range)
       ◮ Type/class hierarchies (e.g., categorical data)
       ◮ Set-valued domains (e.g., skill sets, hotel facilities)


  10. Our Work: ZINC
      ◮ Index method for skyline queries with partially-ordered (PO) domains
      ◮ Inspired by ZB-tree
        ◮ ZB-tree [Lee, Zheng, Li, Lee, VLDB’07]
        ◮ Index method for totally-ordered domains
        ◮ Outperforms BBS [Papadias, Tao, Fu, Seeger, SIGMOD’03]
      ◮ Related work
        ◮ SDC+ [Chan, Eng, Tan, SIGMOD’05]
        ◮ TSS [Sacharidis, Papadopoulos, Papadias, ICDE’09]
        ◮ Recent technique: CPS, SCL [Zhang, Mamoulis, Cheung, Kao, VLDB’10]

  11. ZB-tree
      ◮ Maps a multi-dimensional data point to a 1-dimensional Z-address
      ◮ Z-address = interleaved bitstring representation of attribute values
      ◮ Example: (0,5) = (000,101) → 010001
      ◮ Index Z-addresses using a B+-tree
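The bit interleaving can be illustrated with a short Python sketch. It assumes non-negative integer attribute values that fit in a fixed number of bits, and it takes one bit from each attribute in turn, most significant bits first, which reproduces the slide's example. The name `z_address` is hypothetical.

```python
def z_address(point, bits):
    """Interleave the `bits`-bit binary representations of the coordinates."""
    out = []
    # Walk from the most significant bit to the least significant bit.
    for i in range(bits - 1, -1, -1):
        # At each bit position, take one bit from every coordinate in order.
        for value in point:
            out.append(str((value >> i) & 1))
    return "".join(out)


# The slide's example: (0, 5) = (000, 101).
example = z_address((0, 5), bits=3)
```

Here `example` is the string `010001`, matching the slide. In a ZB-tree these addresses would serve as the keys of a B+-tree; that part is not shown.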


  14. ZB-tree: Example
      Monotonic ordering property: if p dominates q, then p precedes q in Z-order
      [Figure: points a to i plotted on a grid with x and y axes running from 0 to 7]

  15. ZB-tree: Example
      [Figure: the same grid of points a to i, shown with the ZB-tree built over them]
      Root:           [a, d]  [e, i]
      Internal nodes: [a, a]  [b, d]  [e, g]  [h, i]
      Leaf entries:   a b c d e f g h i

  16. Encoding Schemes for Partial Orders
      ◮ Given a partial order domain D, find the smallest set S and an embedding f : D → 2^S such that x dominates y iff f(x) ⊆ f(y)
      ◮ Many proposed heuristics:
        ◮ Ait-Kaci et al, ACM TOPLAS 1989
        ◮ Caseau, OOPSLA 1993
        ◮ Krall, Vitek, Horspool, ECOOP 1997
        ◮ etc
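To make the embedding condition concrete, here is a deliberately naive embedding, not one of the cited heuristics and not minimal: each value is mapped to the set of values that dominate it, plus itself. For distinct x and y, x dominates y exactly when f(x) ⊆ f(y). The diamond-shaped partial order and the name `upset_embedding` are assumptions made for illustration.

```python
def upset_embedding(better_than):
    """Map each value to the set containing itself and every value that dominates it.

    `better_than` maps a value to the values it directly dominates.
    """
    nodes = set(better_than) | {v for vs in better_than.values() for v in vs}
    code = {n: {n} for n in nodes}
    changed = True
    # Propagate dominators down the edges until nothing changes.
    while changed:
        changed = False
        for u, dominated in better_than.items():
            for v in dominated:
                if not code[u] <= code[v]:
                    code[v] |= code[u]
                    changed = True
    return code


# Hypothetical diamond: a dominates b and c, which both dominate d.
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
f = upset_embedding(edges)
```

This uses a set S as large as the domain itself, which is exactly the cost that the encoding heuristics on the slide try to reduce.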

  17. ZINC: Nested Encoding Scheme
      ◮ ZINC = Z-order Indexing with Nested Code
      ◮ Key idea:
        ◮ Organize PO into nested layers of simpler POs
        ◮ Encode each value in PO as a concatenation of encodings in simpler POs

  18. Example of Partial Order Reduction
      [Figure: partial order G0 over nodes a to p, with regions R1 and R2 marked]

  20. Example of Partial Order Reduction
      A subset of nodes R in PO is a region if every node in R has the same dominance relationship with respect to nodes outside of R:
      ◮ if u ∈ R dominates v ∉ R, then every u′ ∈ R dominates v
      ◮ if v ∉ R dominates u ∈ R, then v dominates every u′ ∈ R
      [Figure: partial order G0 over nodes a to p, with regions R1 and R2 marked]
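The region condition can be tested directly from its definition. The sketch below assumes the dominance relation is given as a transitively closed set of (dominator, dominated) pairs; the small partial order and the name `is_region` are hypothetical, since the slide's graph G0 is not recoverable from the transcript.

```python
def is_region(nodes, dominance, region):
    """Check that all nodes in `region` relate identically to every outside node."""
    region = set(region)
    for v in set(nodes) - region:
        # Does each region node dominate v? All answers must agree.
        dominates_v = {(u, v) in dominance for u in region}
        # Is each region node dominated by v? All answers must agree.
        dominated_by_v = {(v, u) in dominance for u in region}
        if len(dominates_v) > 1 or len(dominated_by_v) > 1:
            return False
    return True


# Hypothetical partial order: t dominates x and y, which both dominate b.
nodes = {"t", "x", "y", "b"}
dominance = {("t", "x"), ("t", "y"), ("t", "b"), ("x", "b"), ("y", "b")}
```

With this relation, `{"x", "y"}` is a region, because t dominates both and both dominate b. `{"t", "x"}` is not, because t dominates y while x does not.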

  21. Example of Partial Order Reduction
      [Figure: G0 is reduced to G1 by replacing regions R1 and R2 with virtual nodes v1 and v2; G1 is reduced to G2 by replacing region R3 with virtual node v3]

  24. Example of Nested Encodings
      [Figure: partial orders G0, G1, G2 with regions R1, R2, R3 and virtual nodes v1, v2, v3]
      Encode(a, G0) = Encode(a, G2)
      Encode(h, G0) = Encode(v3, G2) + Encode(h, R3)
      Encode(k, G0) = Encode(v3, G2) + Encode(v2, R3) + Encode(k, R2)

  25. Vertical Regions
      A region R in a PO is a vertical region if
      ◮ R = S0 ∪ ··· ∪ Sk, k ≥ 1, and each Si is a total order,
      ◮ nodes from different total orders are incomparable, and
      ◮ R is a maximal subgraph of PO that satisfies the above properties
      Example: R = S0 ∪ S1, with S0 = {c, d} and S1 = {e, f}
      Each v ∈ R is encoded by two components: (1) which Si contains v, and (2) the rank of v within Si
      c = 00, d = 01, e = 10, f = 11
      [Figure: partial order over nodes a to p with the vertical region R marked]
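The two-component code for a vertical region can be sketched as follows. The slide does not state how many bits each component takes; this sketch assumes the minimum fixed width for each, which reproduces the slide's codes for c, d, e, and f. The function names are hypothetical.

```python
def bits_needed(count):
    """Smallest fixed width (at least 1 bit) that can number `count` items."""
    return max(1, (count - 1).bit_length())


def encode_vertical(chains):
    """Encode each node as (index of its chain) followed by (rank within the chain)."""
    chain_bits = bits_needed(len(chains))
    rank_bits = bits_needed(max(len(chain) for chain in chains))
    codes = {}
    for i, chain in enumerate(chains):
        for rank, node in enumerate(chain):
            codes[node] = format(i, f"0{chain_bits}b") + format(rank, f"0{rank_bits}b")
    return codes


# The slide's example: S0 = {c, d} and S1 = {e, f}, each listed best first.
vertical_codes = encode_vertical([["c", "d"], ["e", "f"]])
```

The result maps c, d, e, f to 00, 01, 10, 11, as on the slide.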

  26. Horizontal Regions
      A region R in a PO is a horizontal region if
      ◮ R = S0 ∪ ··· ∪ Sk, k ≥ 1,
      ◮ the nodes within each Si are incomparable,
      ◮ u ∈ Si dominates v ∈ Sj if i < j, and
      ◮ R is a maximal subgraph of PO that satisfies the above properties
      Example: R = S0 ∪ S1, with S0 = {k, l} and S1 = {m, n}
      Each v ∈ R is encoded by i if v ∈ Si
      k = 0, l = 0, m = 1, n = 1
      [Figure: partial order over nodes a to p with the horizontal region R marked]
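For a horizontal region the code is just the layer index. The sketch below assumes the index is written with the minimum fixed number of bits, which matches the one-bit codes on the slide; `encode_horizontal` is a hypothetical name.

```python
def encode_horizontal(layers):
    """Encode each node by the index of the layer that contains it."""
    # Minimum fixed width (at least 1 bit) needed to number the layers.
    width = max(1, (len(layers) - 1).bit_length())
    return {
        node: format(i, f"0{width}b")
        for i, layer in enumerate(layers)
        for node in layer
    }


# The slide's example: S0 = {k, l} dominates S1 = {m, n}.
horizontal_codes = encode_horizontal([["k", "l"], ["m", "n"]])
```

Nodes in the same layer share a code (k and l are both 0), which is acceptable here because they are mutually incomparable and relate identically to the other layers.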

  27. Regular & Irregular Regions
      ◮ A region R in a PO is a regular region if R is either a vertical or horizontal region
      ◮ A region R in a PO is an irregular region if
        ◮ R is not a regular region, and
        ◮ R is a minimal subgraph of PO containing at least two nodes
      ◮ Example of an irregular region:
        [Figure: irregular region R4 over nodes a, b, c, d, e, f]
      ◮ Irregular regions are encoded using Compact Hierarchical Encoding (CHE) [Caseau, OOPSLA 1993]

  28. Putting everything together
      [Figure: G0, G1, and G2 with regions R1, R2, R3, each node labelled with its code at that level]
      Encode(a, G0) = Encode(a, G2) = 00 00000
      Encode(h, G0) = Encode(v3, G2) + Encode(h, R3) = 01 011 00
      Encode(k, G0) = Encode(v3, G2) + Encode(v2, R3) + Encode(k, R2) = 01 110 0 0
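The three worked encodings can be reproduced by concatenating the per-level codes and padding to a common length. The padding rule is an assumption inferred from the examples, which all come to 7 bits with trailing zeros; the slides do not state it explicitly. Only the per-level codes that appear in the Encode lines are used, and `nested_code` is a hypothetical name.

```python
TOTAL_BITS = 7  # assumed fixed code length, inferred from the slide's examples


def nested_code(parts, total_bits=TOTAL_BITS):
    """Concatenate per-level codes and right-pad with zeros to a fixed length."""
    return "".join(parts).ljust(total_bits, "0")


# Per-level codes as they appear in the Encode lines on the slide.
code_a = nested_code(["00"])              # a in G2
code_h = nested_code(["01", "011"])       # v3 in G2, then h in R3
code_k = nested_code(["01", "110", "0"])  # v3 in G2, v2 in R3, k in R2
```

The results are `0000000`, `0101100`, and `0111000`, which agree with the slide's `00 00000`, `01 011 00`, and `01 110 0 0` once the spaces are removed.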

  29. Performance Comparison
      [Figure: processing time (sec), axis from 0 to 50, for (|TO|, |PO|) = (2,1), (3,1), (4,1), (2,2), (3,2), (4,2), comparing TSS, TSS+ZB, CHE+ZB, and ZINC]

  30. Conclusion
      ◮ Presented a novel index method for computing skyline queries on data with partially-ordered attribute domains
      ◮ ZINC = Z-order based indexing (ZB-tree) + Nested encoding scheme
      ◮ Future work:
        ◮ ZINC vs CPS, SCL [Zhang, Mamoulis, Cheung, Kao, VLDB’10]
        ◮ Other techniques?
