SLIDE 1

Quotient Cube: How to Summarize the Semantics of a Data Cube

Laks V.S. Lakshmanan (Univ. of British Columbia)*
Jian Pei (State Univ. of New York at Buffalo)*
Jiawei Han (Univ. of Illinois at Urbana-Champaign)+

* The work is partially supported by NSERC and NCE/IRIS

+ The work is partially supported by NSF, UI, and Microsoft Research

SLIDE 2

Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube 2

Outline

  • Introduction and motivation
  • Cube lattice partitions
  • Semantics preserving partitions
  • Algorithms
  • Experimental results
  • Discussion and summary

SLIDE 3

Data Cube

Base table (dimensions: Store, Product, Season; measure: Sales)

Store  Product  Season  Sales
S1     P1       Spring  6
S1     P2       Spring  12
S2     P1       Fall    9

Aggregation yields the data cube (dimensions: Store, Product, Season; measure: AVG(Sales))

Store  Product  Season  AVG(Sales)
S1     P1       Spring  6
S1     P2       Spring  12
S2     P1       Fall    9
S1     *        Spring  9
…      …        …       …
*      *        *       9
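
The aggregation step can be illustrated with a minimal Python sketch. It assumes the three-tuple base table from this slide, abbreviates Spring and Fall to "s" and "f" as the later lattice slides do, and uses the hypothetical helper names `generalizations` and `build_cube`. It is a brute-force illustration, not one of the cube algorithms cited in the talk.

```python
from collections import defaultdict
from itertools import product

# Base table from the slide: (Store, Product, Season) -> Sales.
BASE = [(("S1", "P1", "s"), 6), (("S1", "P2", "s"), 12), (("S2", "P1", "f"), 9)]

def generalizations(dims):
    """Every cell a base tuple rolls up to: keep each value or replace it by '*'."""
    return {tuple(v if keep else "*" for v, keep in zip(dims, mask))
            for mask in product((True, False), repeat=len(dims))}

def build_cube(base):
    """Map each cell with a non-empty cover to AVG(Sales) over the tuples it covers."""
    sales = defaultdict(list)
    for dims, measure in base:
        for cell in generalizations(dims):
            sales[cell].append(measure)
    return {cell: sum(vals) / len(vals) for cell, vals in sales.items()}

cube = build_cube(BASE)
print(len(cube))               # number of non-empty cells
print(cube[("*", "P1", "*")])  # AVG(Sales) of the cell (*,P1,*)
```

Even this tiny table produces 18 non-empty cells, which is the lattice the following slides summarize.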

SLIDE 4

Previous Work: Efficient Cube Computation

  • Compute a cube from a base table: e.g. (Agarwal et al. 96), (Zhao et al. 97)
  • View materialization with space constraint: e.g. (Harinarayan et al. 96)
  • Handling sparsity: (Ross & Srivastava 97)
  • Cube compression: e.g. (Sismanis et al. 02), (Shanmugasundaram et al. 99), (Wang et al. 02)
  • Approximation: e.g. (Barbara & Sullivan 97), (Barbara & Wu 00), (Vitter et al. 98)
  • Constrained cube construction: e.g. (Beyer & Ramakrishnan 99)

SLIDE 5

Previous Work: Extracting Semantics From Cubes

  • General contexts of patterns (Sathe & Sarawagi 01)
  • Generalized association rules (Imielinski et al. 00)
  • Cube gradient analysis (Dong et al. 01)

SLIDE 6

Cube (Cell) Lattice

  • Many cells have the same aggregate values
  • Can we summarize the semantics of the cube by grouping cells by aggregate values?

[Figure: cube lattice over (Store, Product, Season), one level per line, each cell shown as cell:AVG(Sales)]

(S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9
(S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9
(S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9
(*,*,*):9

SLIDE 7

A Naïve Attempt

  • Put all cells having the same aggregate value in a class

[Figure: the cube lattice with its cells grouped by AVG(Sales) value into four classes, C1 to C4]

SLIDE 8

Problems w/ the Naïve Attempt

  • The result is not a lattice anymore!
    – Anomaly: C3 → C4 → C3, where each arrow is a rollup
    – The rollup/drilldown semantics is lost

[Figure: the cube lattice grouped by AVG(Sales) value into classes C1 to C4]

SLIDE 9

A Better Partitioning

  • Quotient cube: a partitioning that preserves the rollup/drilldown semantics

[Figure: the cube lattice partitioned into five classes, C1 to C5]

SLIDE 10

Problem Statement

  • Given a cube, characterize a good way (quotient cube) of partitioning its cells into classes such that
    – The partition generates a reduced lattice preserving the rollup/drilldown semantics
    – The partition is optimal: the number of classes is as small as possible
  • Compute quotient cubes efficiently

SLIDE 11

Why Is a Quotient Cube Useful?

  • Semantic compression
  • Semantic OLAP browsing

[Figure: the cube lattice partitioned into five classes, C1 to C5]

SLIDE 12

Why Is a Quotient Cube Useful?

  • Semantic compression
  • Semantic OLAP browsing

[Figure: the quotient lattice with classes C1, C2, C4, and C5 collapsed and the remaining class expanded into its cells:]

(S2,P1,f):9 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9 (*,*,f):9 (S2,*,*):9

SLIDE 14

Convex Partitions

  • A convex partition retains semantics
    – $c_1 \xrightarrow{\text{rollup}} c_2 \xrightarrow{\text{rollup}} c_3,\quad c_1, c_3 \in CLS \;\Rightarrow\; c_2 \in CLS$

[Figure: the cube lattice partitioned into five convex classes, C1 to C5]
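
The convexity condition can be checked mechanically. The sketch below assumes the three-tuple base table from the Data Cube slide and uses the hypothetical helper names `rolls_up_to` and `is_convex`. It tests the class of all cells with AVG(Sales) = 9 (the naive grouping) and two convex pieces that class can be split into.

```python
from collections import defaultdict
from itertools import product

# Base table from the slides: (Store, Product, Season) -> Sales.
BASE = [(("S1", "P1", "s"), 6), (("S1", "P2", "s"), 12), (("S2", "P1", "f"), 9)]

def rolls_up_to(c, d):
    """True if cell d equals c or generalizes it ('*' matches any value)."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def is_convex(cls, all_cells):
    """c1 -> c2 -> c3 (rollups) with c1, c3 in cls must imply c2 in cls."""
    return not any(
        rolls_up_to(c1, c2) and rolls_up_to(c2, c3)
        for c1 in cls for c3 in cls for c2 in all_cells if c2 not in cls
    )

# AVG(Sales) for every cell with a non-empty cover.
sales = defaultdict(list)
for dims, measure in BASE:
    for mask in product((True, False), repeat=len(dims)):
        cell = tuple(v if keep else "*" for v, keep in zip(dims, mask))
        sales[cell].append(measure)
avg = {cell: sum(v) / len(v) for cell, v in sales.items()}

# Naive class: every cell whose AVG is 9.
naive_9 = {cell for cell, value in avg.items() if value == 9}
# Split it: cells reachable by rollup from (S2,P1,f), excluding (*,*,*), and the rest.
split_a = {c for c in naive_9
           if rolls_up_to(("S2", "P1", "f"), c) and c != ("*", "*", "*")}
split_b = naive_9 - split_a

print(is_convex(naive_9, avg))   # the naive class has a "hole" at (*,P1,*):7.5
print(is_convex(split_a, avg), is_convex(split_b, avg))
```

The naive class fails because (S2,P1,f):9 rolls up to (*,P1,*):7.5, which in turn rolls up to (*,*,*):9.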

SLIDE 15

A Non-convex Partition

  • Anomaly: C3 → C4 → C3, where each arrow is a rollup
  • The rollup/drilldown semantics is lost

[Figure: the cube lattice grouped by AVG(Sales) value into classes C1 to C4]

SLIDE 16

Connected Partitions

  • Cells c1 and c2 are connected if a series of rollup/drilldown operations starting from c1 can reach c2
  • Intuitively, (each class of) a partition should be connected

SLIDE 17

Cover Partition

  • For a cell c, a tuple t in the base table is in c's cover if t can be rolled up to c
    – E.g., Cov(S1,*,spring) = {(S1,P1,spring), (S1,P2,spring)}

Base table (dimensions: Store, Product, Season; measure: Sales)

Store  Product  Season  Sales
S1     P1       Spring  6
S1     P2       Spring  12
S2     P1       Fall    9
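
The cover of a cell is straightforward to compute from the base table. The following is a minimal sketch, assuming the base table above with lowercase season names; `rolls_up_to` and `cover` are hypothetical helper names, and the dictionary representation assumes no two base tuples share the same dimension values.

```python
# Base table from the slide: (Store, Product, Season) -> Sales.
BASE = {
    ("S1", "P1", "spring"): 6,
    ("S1", "P2", "spring"): 12,
    ("S2", "P1", "fall"): 9,
}

def rolls_up_to(t, cell):
    """True if base tuple t can be rolled up to cell ('*' matches any value)."""
    return all(c == "*" or c == v for v, c in zip(t, cell))

def cover(cell, base=BASE):
    """Cov(cell): the set of base tuples that roll up to the cell."""
    return {t for t in base if rolls_up_to(t, cell)}

cov = cover(("S1", "*", "spring"))
print(sorted(cov))  # the two spring tuples of store S1, as in the slide's example
```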

SLIDE 18

Cover Partitions Are Convex

  • All cells having the same cover are in one class
  • (S1,P2,s) and (*,P2,*) cover the same tuples in the base table, so (S1,P2,*) and (*,P2,s), which lie between them, are in the same class.

[Figure: the cube lattice with cells grouped by cover]

SLIDE 19

Cover Partitions Are Connected

  • If cells c1 and c2 have the same cover, there must be some common ancestor c3 of c1 and c2 such that c3 has the same cover
    – Cells c1 and c2 are in the same class and connected

[Figure: the cube lattice with cells grouped by cover]

SLIDE 20

Cover Partitions & Aggregates

  • All cells in the same class of a cover partition carry the same aggregate value w.r.t. any aggregate function
    – But cells in a class of MIN() may have different covers
  • For COUNT() and SUM() (positive), cover equivalence coincides with aggregate equivalence
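
These claims can be checked on the running example. The sketch below makes one assumption that the slides do not spell out: aggregate equivalence is read as the closure of "a cell and one of its parents carry the same value of f", so classes are connected components. All helper names (`cover`, `parents`, `cover_partition`, `aggregate_partition`) are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Base table from the slides: (Store, Product, Season) -> Sales.
BASE = [(("S1", "P1", "s"), 6), (("S1", "P2", "s"), 12), (("S2", "P1", "f"), 9)]

def cover(cell):
    """Indices of the base tuples that roll up to the cell."""
    return frozenset(i for i, (dims, _) in enumerate(BASE)
                     if all(c == "*" or c == v for v, c in zip(dims, cell)))

def parents(cell):
    """Cells reached by one rollup step: replace one value with '*'."""
    return [cell[:i] + ("*",) + cell[i + 1:] for i, v in enumerate(cell) if v != "*"]

# All cells with a non-empty cover.
CELLS = {tuple(v if keep else "*" for v, keep in zip(dims, mask))
         for dims, _ in BASE for mask in product((True, False), repeat=3)}

def cover_partition():
    classes = defaultdict(set)
    for cell in CELLS:
        classes[cover(cell)].add(cell)
    return {frozenset(c) for c in classes.values()}

def aggregate_partition(f):
    """Connected components of: parent and child carry the same value of f."""
    value = {cell: f([BASE[i][1] for i in cover(cell)]) for cell in CELLS}
    root = {cell: cell for cell in CELLS}
    def find(c):
        while root[c] != c:
            c = root[c]
        return c
    for cell in CELLS:
        for p in parents(cell):
            if value[p] == value[cell]:
                root[find(cell)] = find(p)
    classes = defaultdict(set)
    for cell in CELLS:
        classes[find(cell)].add(cell)
    return {frozenset(c) for c in classes.values()}

def avg(xs):
    return sum(xs) / len(xs)

same_for_count = aggregate_partition(len) == cover_partition()
same_for_sum = aggregate_partition(sum) == cover_partition()
n_min_classes = len(aggregate_partition(min))
n_avg_classes = len(aggregate_partition(avg))
print(same_for_count, same_for_sum, n_min_classes, n_avg_classes)
```

On this table the COUNT and SUM partitions equal the six-class cover partition, while MIN and AVG yield fewer classes, so some of their classes mix cells with different covers.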

SLIDE 22

Weak Congruence

  • Weak congruence preserves semantics

[Figure: cells c and c' lie in Class 1, cells d and d' lie in Class 2; if c rolls up to d and d' rolls up to c', this implies Class 1 = Class 2]

SLIDE 23

Weak Congruence = Convex

  • Convex ⇔ no “hole” in the class ⇔ weak congruence
  • They preserve the rollup/drilldown semantics
  • The quotient cube lattice is the lattice of convex classes
  • How to derive the coarsest quotient cube?

SLIDE 24

Monotone Aggregate Functions

  • Monotone functions
    – S ⊆ T ⇒ f(S) ≥ f(T), or
    – S ⊆ T ⇒ f(S) ≤ f(T)
    – E.g., MIN(), MAX(), COUNT(), PSUM(), …
  • If the aggregate function f is monotone, then ≡f is the unique coarsest partition
    – MIN(): put all cells having the same MIN() value into a class
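
The monotonicity condition can be spot-checked on sample data. The sketch below tests both inequalities over all nested pairs of non-empty subsets of the Sales column from the running example. It is a check on one sample, not a proof, and it assumes PSUM() means SUM() over positive values; `is_monotone_on` is a hypothetical helper name.

```python
from itertools import combinations

def is_monotone_on(f, values):
    """On this sample, does f only shrink, or only grow, as a set gains elements?"""
    idx = range(len(values))
    subsets = [frozenset(c)
               for r in range(1, len(values) + 1)
               for c in combinations(idx, r)]
    def ev(s):
        return f([values[i] for i in sorted(s)])
    # All pairs (f(S), f(T)) with S a proper subset of T.
    pairs = [(ev(s), ev(t)) for s in subsets for t in subsets if s < t]
    return all(a >= b for a, b in pairs) or all(a <= b for a, b in pairs)

def avg(xs):
    return sum(xs) / len(xs)

SALES = [6, 12, 9]  # Sales column of the slides' base table (all positive)
results = {name: is_monotone_on(f, SALES)
           for name, f in [("MIN", min), ("MAX", max), ("COUNT", len),
                           ("SUM", sum), ("AVG", avg)]}
print(results)
```

AVG fails because adding 12 to {6} raises the average while adding 6 to {12} lowers it.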

SLIDE 25

Non-monotone Functions

  • Bad news: ≡f may or may not be convex (a weak congruence).
  • Good news: the cover partition is convex (i.e., a weak congruence) and always yields a quotient cube w.r.t. any aggregate function! ☺

SLIDE 27

How to Compute A QC

  • Aggregate functions
    – Monotone functions
    – Non-monotone functions
  • Settings
    – The cube is available
    – Only the base table is available

SLIDE 28

Monotone Functions

  • The cube is available: grab all cells with the same aggregate value and put them into a class
  • Only the base table is available: bottom-up, depth-first search
    – For a cell, compute its cover and find the upper bound having the same aggregate value
    – Group lower bounds by upper bounds

SLIDE 29

Example: Cover QC

[Figure: the cube lattice partitioned into its cover quotient cube classes]
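
For the running example, the cover quotient cube can be reproduced directly from the base table. The sketch below is a simplified illustration, not the authors' algorithm: it enumerates cells bottom-up and depth-first starting from (*,*,*), without pruning, and groups each cell under the upper bound of its cover. It assumes "upper bound" means the most specific cell with the same cover, obtained by keeping a value only where all covered tuples agree. `upper_bound` and `cover_classes` are hypothetical names.

```python
from collections import defaultdict

# Dimension values of the base table (Store, Product, Season); the measure is not needed.
BASE = [("S1", "P1", "s"), ("S1", "P2", "s"), ("S2", "P1", "f")]

def upper_bound(rows):
    """Most specific cell covering exactly these rows: keep values all rows share."""
    return tuple(col[0] if len(set(col)) == 1 else "*" for col in zip(*rows))

def cover_classes(base):
    """Map each upper bound to the set of cells that share its cover."""
    n = len(base[0])
    classes = defaultdict(set)

    def dfs(cell, rows, start):
        classes[upper_bound(rows)].add(cell)
        # Instantiate one more dimension at a time; each cell is visited once.
        for d in range(start, n):
            groups = defaultdict(list)
            for r in rows:
                groups[r[d]].append(r)
            for value, sub in groups.items():
                dfs(cell[:d] + (value,) + cell[d + 1:], sub, d + 1)

    dfs(("*",) * n, list(base), 0)
    return classes

classes = cover_classes(BASE)
for ub in sorted(classes):
    print(ub, len(classes[ub]))
```

Each printed line is one class, identified by its upper bound and followed by the number of cells in the class.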

SLIDE 30

Non-monotone Functions

  • Class merging
  • Find cover partition classes
  • Merge classes as long as convexity is retained
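
The merging steps above can be sketched for AVG() on the running example. This is a greedy illustration under stated assumptions, not the authors' algorithm: two classes are merged only if they carry the same AVG value, are adjacent (one rollup step apart), and their union is convex. The adjacency requirement and all helper names are assumptions of this sketch, and a greedy merge order is not guaranteed to be optimal in general.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Base table from the slides: (Store, Product, Season) -> Sales.
BASE = [(("S1", "P1", "s"), 6), (("S1", "P2", "s"), 12), (("S2", "P1", "f"), 9)]

def rolls_up_to(c, d):
    """True if d equals c or generalizes it ('*' matches any value)."""
    return all(y == "*" or x == y for x, y in zip(c, d))

CELLS = {tuple(v if keep else "*" for v, keep in zip(dims, mask))
         for dims, _ in BASE for mask in product((True, False), repeat=3)}

def avg(cell):
    covered = [m for dims, m in BASE if rolls_up_to(dims, cell)]
    return Fraction(sum(covered), len(covered))  # exact, avoids float comparison

def is_convex(cls):
    """No cell outside cls lies between two cells of cls in the rollup order."""
    return not any(rolls_up_to(c1, c2) and rolls_up_to(c2, c3)
                   for c1 in cls for c3 in cls for c2 in CELLS - cls)

def adjacent(a, b):
    """Some cell of one class is exactly one rollup step from a cell of the other."""
    def one_step(c, p):
        return rolls_up_to(c, p) and sum(x != y for x, y in zip(c, p)) == 1
    return any(one_step(c, d) or one_step(d, c) for c in a for d in b)

def cover_classes():
    classes = defaultdict(set)
    for cell in CELLS:
        key = frozenset(dims for dims, _ in BASE if rolls_up_to(dims, cell))
        classes[key].add(cell)
    return list(classes.values())

def merge(classes):
    """Greedily merge equal-AVG adjacent classes while the union stays convex."""
    classes = [set(c) for c in classes]
    changed = True
    while changed:
        changed = False
        for i, j in product(range(len(classes)), repeat=2):
            a, b = classes[i], classes[j]
            if (i < j and avg(next(iter(a))) == avg(next(iter(b)))
                    and adjacent(a, b) and is_convex(a | b)):
                classes[i] = a | b
                del classes[j]
                changed = True
                break
    return classes

avg_qc = merge(cover_classes())
print(sorted(len(c) for c in avg_qc))
```

On this table the six cover classes reduce to five: the class of (S1,*,s) absorbs (*,*,*), while the class of (S2,P1,f) stays separate because merging it would leave (*,P1,*):7.5 as a hole.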

SLIDE 31

Example: AVG QC

[Figure: the cube lattice partitioned into its AVG quotient cube classes]

SLIDE 33

Reduction Ratio vs. Dimensionality

[Figure: reduction ratio (%) vs. dimensionality (2 to 10) for MinCube, QC_Cov, and QC_MIN; # base tuples = 200k, Zipf factor = 2.0]

SLIDE 34

Reduction Ratio vs. Zipf Factor

[Figure: reduction ratio (%) vs. Zipf factor (0.5 to 3) for MinCube, QC_Cov, and QC_MIN; # base tuples = 200k, # dimensions = 6]

SLIDE 35

Reduction Ratio vs. Base Table Size

[Figure: reduction ratio (%) vs. number of tuples (200k to 1400k) for MinCube, QC_Cov, and QC_MIN; Zipf factor = 2.0, # dimensions = 6]

SLIDE 36

Runtime

[Figure: runtime (seconds) vs. number of tuples (200k to 1400k) for MinCube, QC_Cov, QC_MIN, and BUC; Zipf factor = 2.0, # dimensions = 6]

SLIDE 37

Compression Ratio on Weather Data Set

[Figure, first panel: reduction ratio (%) vs. number of dimensions (2 to 7) for QC_Cov and QC_AVG]

[Figure, second panel: reduction ratio (%) vs. number of dimensions (2 to 9) for MinCube and QC_Cov]

SLIDE 39

Semantic Cube Exploration

  • Theoretical foundation for semantic summarization in data cubes
    – Concept and properties of quotient cubes
  • Efficient algorithms for quotient cube construction
    – Quotient cubes can be computed directly from base tables

SLIDE 40

Ongoing Research

  • Efficient implementation of a quotient cube-based OLAP system
    – Data warehouse built using quotient cubes

  • Hierarchies and constraints
  • Incremental maintenance
  • Semantics based OLAP and mining
  • Efficient query answering

SLIDE 41

References (1)

  • R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994.
  • S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. VLDB, 1996.
  • D. Barbara and M. Sullivan. Quasi-cubes: Exploiting approximation in multidimensional databases. SIGMOD Record, 26:12--17, 1997.
  • D. Barbara and X. Wu. Using loglinear models to compress datacube. In WAIM'2000, pages 311--322, 2000.
  • K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes. In SIGMOD'99.

SLIDE 42

References (2)

  • G. Birkhoff. Lattice Theory, 2nd edition. New York, American Mathematical Society (Colloquium Publications, vol. 25), 1948.
  • S. Geffner, D. Agrawal, A. El Abbadi, and T. R. Smith. Relative prefix sums: An efficient approach for querying dynamic OLAP data cubes. In ICDE'99.
  • J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. In ICDE'96.
  • C.-T. Ho, J. Bruck, and R. Agrawal. Partial-sum queries in data cubes using covering codes. In PODS'97.
  • J. Han, J. Pei, G. Dong, and K. Wang. Efficient Computation of Iceberg Cubes with Complex Measures. In SIGMOD'01.

SLIDE 43

References (3)

  • V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In SIGMOD'96.
  • T. Imielinski, L. Khachiyan, and A. Abdulghani. Cubegrades: Generalizing Association Rules. Technical Report, Rutgers University, August 2000.
  • H. V. Jagadish, J. Madar, and R.T. Ng. Semantic Compression and Pattern Extraction with Fascicles. In VLDB'99.
  • K. Ross and D. Srivastava. Fast computation of sparse datacubes. In VLDB'97.
  • G. Sathe and S. Sarawagi. Intelligent Rollups in Multidimensional OLAP Data. In VLDB'01.

SLIDE 44

References (4)

  • J. Shanmugasundaram, U.M. Fayyad, and P. S. Bradley. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. In SIGKDD'99.
  • J. S. Vitter, M. Wang, and B. R. Iyer. Data cube approximation and histograms via wavelets. In CIKM'98.
  • W. Wang, H. Lu, J. Feng, and J. X. Yu. Condensed cube: An effective approach to reducing data cube size. In ICDE'02.
  • Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97.
  • G.K. Zipf. Human Behavior and The Principle of Least Effort. Addison-Wesley, 1949.