Data Cube: A Relational Aggregation Operator Generalizing Group-By, - - PowerPoint PPT Presentation

data cube a relational aggregation operator generalizing
SMART_READER_LITE
LIVE PREVIEW

Data Cube: A Relational Aggregation Operator Generalizing Group-By, - - PowerPoint PPT Presentation

Data Analysis Analysis Operations The Data Cube Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals Jim Gray Surajit Chaudhuri Adam Bosworth Andrew Layman Don Reichart Murali Venkatrao Frank


slide-1
SLIDE 1

Data Analysis Analysis Operations The Data Cube

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

Jim Gray Surajit Chaudhuri Adam Bosworth Andrew Layman Don Reichart Murali Venkatrao Frank Pellow Hamid Pirahesh October 31+1, 2006 Presented by Michael Lawrence

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-2
SLIDE 2

Data Analysis Analysis Operations The Data Cube

L

A

T EX Beamer

L

AT

EX Beamer http://latex-beamer.sourceforge.net/

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-3
SLIDE 3

Data Analysis Analysis Operations The Data Cube

Outline

1

Data Analysis

2

Analysis Operations Histograms Roll-Up/Drill-Down Cross-Tab

3

The Data Cube The ALL Dummy Value The CUBE Operator

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-4
SLIDE 4

Data Analysis Analysis Operations The Data Cube

Data Analysis

Key features Dimensionality reduction Aggregation

Extract Q ? Analyze Formulate Visualize

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-5
SLIDE 5

Data Analysis Analysis Operations The Data Cube

Example Data

attrs (dims) measurements

  • Model

Year Colour Num_Sold Golf 2001 Black 40 Golf 2001 Silver 65 Golf 2002 Black 70 Golf 2002 Silver 100 Jetta 2001 Black 70 Jetta 2001 Silver 50 Jetta 2002 Black 60 Jetta 2002 Silver 80

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-6
SLIDE 6

Data Analysis Analysis Operations The Data Cube

Discussion 1: Multi-dimensional aggregation

How useful is multi-dimensional aggregation? Besides the data warehousing applications mentioned in the paper, can you think of any other applications?

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-7
SLIDE 7

Data Analysis Analysis Operations The Data Cube

Problems with SQL GROUP BY

Common analysis operations akward Histograms Roll-Up/Drill-Down Cross-Tabulations

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-8
SLIDE 8

Data Analysis Analysis Operations The Data Cube Histograms

Histograms

Categorize models by type Model Type Golf Hatchback Rabbit . . . Jetta Sedan Passat . . .

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-9
SLIDE 9

Data Analysis Analysis Operations The Data Cube Histograms

Histograms

Type Year Colour Num_Sold Hatchback 2001 Black 40 Hatchback 2001 Silver 65 . . . Sedan 2001 Black 70 Sedan 2001 Silver 50 . . . Problem: not directly supported by SQL

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-10
SLIDE 10

Data Analysis Analysis Operations The Data Cube Histograms

Histograms

Type Year Colour Num_Sold Hatchback 2001 Black 40 Hatchback 2001 Silver 65 . . . Sedan 2001 Black 70 Sedan 2001 Silver 50 . . . Problem: not directly supported by SQL

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-11
SLIDE 11

Data Analysis Analysis Operations The Data Cube Roll-Up/Drill-Down

Roll-Up/Drill-Down

View data at decreasing/increasing levels of granularity Model, Year, Colour Model, Year Model

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-12
SLIDE 12

Data Analysis Analysis Operations The Data Cube Roll-Up/Drill-Down

Roll-Up/Drill-Down (1)

Model Year Colour Num_MCY Num_MY Num_M Golf 2001 Black 40 Silver 65 105 2002 Black 70 Silver 100 170 275 Problem: not relational

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-13
SLIDE 13

Data Analysis Analysis Operations The Data Cube Roll-Up/Drill-Down

Roll-Up/Drill-Down (1)

Model Year Colour Num_MCY Num_MY Num_M Golf 2001 Black 40 Silver 65 105 2002 Black 70 Silver 100 170 275 Problem: not relational

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-14
SLIDE 14

Data Analysis Analysis Operations The Data Cube Roll-Up/Drill-Down

Roll-Up/Drill-Down (2)

Chris Date (1996) Model Year Colour Num_Sold Num_MY Num_M Golf 2001 Black 40 105 275 Golf 2001 Silver 65 105 275 Golf 2002 Black 70 170 275 Golf 2002 Silver 100 170 275 Problem: 2D columns

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-15
SLIDE 15

Data Analysis Analysis Operations The Data Cube Roll-Up/Drill-Down

Roll-Up/Drill-Down (2)

Chris Date (1996) Model Year Colour Num_Sold Num_MY Num_M Golf 2001 Black 40 105 275 Golf 2001 Silver 65 105 275 Golf 2002 Black 70 170 275 Golf 2002 Silver 100 170 275 Problem: 2D columns

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-16
SLIDE 16

Data Analysis Analysis Operations The Data Cube Cross-Tab

Cross-Tab

Roll-Up

Model, Year, Colour Model, Year Model

Cross Tab

Model, Year, Colour Model, Year Model, Colour Model

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-17
SLIDE 17

Data Analysis Analysis Operations The Data Cube Cross-Tab

Pivot Table (Excel)

2001 2002 Model Black Silver 2001 Total Black Silver 2002 Total Grand Total Golf 40 65 105 70 100 170 275 Jetta 70 50 120 60 80 140 260 Total 110 115 225 130 180 310 535

Problem: N × M values

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-18
SLIDE 18

Data Analysis Analysis Operations The Data Cube Cross-Tab

Pivot Table (Excel)

Model, Colour, Year

2001 2002 Model Black Silver 2001 Total Black Silver 2002 Total Grand Total Golf 40 65 105 70 100 170 275 Jetta 70 50 120 60 80 140 260 Total 110 115 225 130 180 310 535

Problem: N × M values

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-19
SLIDE 19

Data Analysis Analysis Operations The Data Cube Cross-Tab

Pivot Table (Excel)

Model, Year

2001 2002 Model Black Silver 2001 Total Black Silver 2002 Total Grand Total Golf 40 65 105 70 100 170 275 Jetta 70 50 120 60 80 140 260 Total 110 115 225 130 180 310 535

Problem: N × M values

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-20
SLIDE 20

Data Analysis Analysis Operations The Data Cube Cross-Tab

Pivot Table (Excel)

Colour, Year

2001 2002 Model Black Silver 2001 Total Black Silver 2002 Total Grand Total Golf 40 65 105 70 100 170 275 Jetta 70 50 120 60 80 140 260 Total 110 115 225 130 180 310 535

Problem: N × M values

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-21
SLIDE 21

Data Analysis Analysis Operations The Data Cube Cross-Tab

Pivot Table (Excel)

Model

2001 2002 Model Black Silver 2001 Total Black Silver 2002 Total Grand Total Golf 40 65 105 70 100 170 275 Jetta 70 50 120 60 80 140 260 Total 110 115 225 130 180 310 535

Problem: N × M values

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-22
SLIDE 22

Data Analysis Analysis Operations The Data Cube Cross-Tab

Pivot Table (Excel)

Year

2001 2002 Model Black Silver 2001 Total Black Silver 2002 Total Grand Total Golf 40 65 105 70 100 170 275 Jetta 70 50 120 60 80 140 260 Total 110 115 225 130 180 310 535

Problem: N × M values

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-23
SLIDE 23

Data Analysis Analysis Operations The Data Cube Cross-Tab

Pivot Table (Excel)

Total

2001 2002 Model Black Silver 2001 Total Black Silver 2002 Total Grand Total Golf 40 65 105 70 100 170 275 Jetta 70 50 120 60 80 140 260 Total 110 115 225 130 180 310 535

Problem: N × M values

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-24
SLIDE 24

Data Analysis Analysis Operations The Data Cube Cross-Tab

Pivot Table (Excel)

2001 2002 Model Black Silver 2001 Total Black Silver 2002 Total Grand Total Golf 40 65 105 70 100 170 275 Jetta 70 50 120 60 80 140 260 Total 110 115 225 130 180 310 535

Problem: N × M values

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-25
SLIDE 25

Data Analysis Analysis Operations The Data Cube The ALL Dummy Value

Solution: Overload Column Values

Add special ALL value Model Year Colour Num_Sold Golf 2001 Black 40 Golf 2001 Silver 65 Golf 2001 ALL 105 Golf 2002 Black 70 Golf 2002 Silver 100 Golf 2002 ALL 175 Golf ALL ALL 275 SQL: UNION of GROUP-BYs on ALL

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-26
SLIDE 26

Data Analysis Analysis Operations The Data Cube The ALL Dummy Value

Solution: Overload Column Values

Add special ALL value Model Year Colour Num_Sold Golf 2001 Black 40 Golf 2001 Silver 65 Golf 2001 ALL 105 Golf 2002 Black 70 Golf 2002 Silver 100 Golf 2002 ALL 175 Golf ALL ALL 275 SQL: UNION of GROUP-BYs on ALL

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-27
SLIDE 27

Data Analysis Analysis Operations The Data Cube The ALL Dummy Value

Solution: Overload Column Values

Add special ALL value Model Year Colour Num_Sold Golf 2001 Black 40 Golf 2001 Silver 65 Golf 2001 ALL 105 Golf 2002 Black 70 Golf 2002 Silver 100 Golf 2002 ALL 175 Golf ALL ALL 275 SQL: UNION of GROUP-BYs on ALL

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-28
SLIDE 28

Data Analysis Analysis Operations The Data Cube The ALL Dummy Value

Solution: Overload Column Values

Add special ALL value Model Year Colour Num_Sold Golf 2001 Black 40 Golf 2001 Silver 65 Golf 2001 ALL 105 Golf 2002 Black 70 Golf 2002 Silver 100 Golf 2002 ALL 175 Golf ALL ALL 275 SQL: UNION of GROUP-BYs on ALL

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-29
SLIDE 29

Data Analysis Analysis Operations The Data Cube The ALL Dummy Value

Solution: Overload Column Values

Add special ALL value Model Year Colour Num_Sold Golf 2001 Black 40 Golf 2001 Silver 65 Golf 2001 ALL 105 Golf 2002 Black 70 Golf 2002 Silver 100 Golf 2002 ALL 175 Golf ALL ALL 275 SQL: UNION of GROUP-BYs on ALL

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-30
SLIDE 30

Data Analysis Analysis Operations The Data Cube The ALL Dummy Value

Is ALL Enough?

2D UNIONs for Roll-Up/Drill-Down 2D UNIONs for Cross-Tab Awkward Inefficient

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-31
SLIDE 31

Data Analysis Analysis Operations The Data Cube The ALL Dummy Value

Is ALL Enough?

2D UNIONs for Roll-Up/Drill-Down 2D UNIONs for Cross-Tab Awkward Inefficient

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-32
SLIDE 32

Data Analysis Analysis Operations The Data Cube The ALL Dummy Value

Is ALL Enough?

2D UNIONs for Roll-Up/Drill-Down 2D UNIONs for Cross-Tab Awkward Inefficient

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-33
SLIDE 33

Data Analysis Analysis Operations The Data Cube The ALL Dummy Value

Is ALL Enough?

2D UNIONs for Roll-Up/Drill-Down 2D UNIONs for Cross-Tab Awkward Inefficient

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-34
SLIDE 34

Data Analysis Analysis Operations The Data Cube The ALL Dummy Value

Is ALL Enough?

2D UNIONs for Roll-Up/Drill-Down 2D UNIONs for Cross-Tab Awkward Inefficient

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-35
SLIDE 35

Data Analysis Analysis Operations The Data Cube The CUBE Operator

Discussion 2: The CUBE Operator

How hard did you find it to understand the CUBE operator? As a query writer, would you feel comfortable using it? Or, would you rather use the "solutions" described in the previous slides?

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-36
SLIDE 36

Data Analysis Analysis Operations The Data Cube The CUBE Operator

The CUBE Operator

SELECT <attribute-list> <aggregated-measurements> FROM <table> WHERE <selection-cons> GROUP BY CUBE <attribute-list> Generates equivalent result as UNION of GROUP BY on all subsets of <attribute-list> using ALL.

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-37
SLIDE 37

Data Analysis Analysis Operations The Data Cube The CUBE Operator

The CUBE Operator

SELECT <attribute-list> <aggregated-measurements> FROM <table> WHERE <selection-cons> GROUP BY CUBE <attribute-list> Generates equivalent result as UNION of GROUP BY on all subsets of <attribute-list> using ALL.

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-38
SLIDE 38

Data Analysis Analysis Operations The Data Cube The CUBE Operator

The CUBE Operator

Sum Black Silver Colour Model Model, Year Year Model, Colour Colour, Year Jetta Golf 2001 2002

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

slide-39
SLIDE 39

Data Analysis Analysis Operations The Data Cube The CUBE Operator

Discussion 3: Overloading NULL

The authors intended to reduce special-case handling code by “overloading” NULL with the “ALL ” value. Do you think overloading NULL is a good idea? Does this actually reduce special-case handling code, or is special-case handling code still required, but even more difficult to deal with because of the overloading?

Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals