September 23, 2002
KAIST
1
VLDB 2002
A One-Pass Aggregation Algorithm with the Optimal Buffer Size in Multidimensional OLAP
September 23, 2002
Young-Koo Lee, Kyu-Young Whang, Yang-Sae Moon, and Il-Yeol Song
Department of Computer Science and Advanced Information
Figure: Sales values stored in cells, organized by Year (1998 to 2001) and Age (10 to 50); a third axis is ticked from 10,000 to 40,000.
Figure: accessing a two-dimensional space with axes X (x0 to x8) and Y (y0 to y8). (a) Accessing in the unit of cells. (b) Accessing in the unit of pages. Legend: cell, page region.
Figure: aggregation of Z values for each pair of X and Y values in a three-dimensional space. Regions A to F lie in the X-Y domain (axis ticks 0, 50, 75, 99), which is divided into aggregation windows W1, W2, W3, and W4.
Figure: regions ΠG A to ΠG F of pages A to F, for organizing attributes X, Y, Z and grouping attributes G = {X, Y}. (a) A DIP. (b) A non-DIP: ΠG A and ΠG D (also ΠG A and ΠG E) do not satisfy the disjoint-inclusive relationship.
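The disjoint-inclusive relationship can be illustrated with a small check. This is only a sketch under an assumption the slides do not spell out: two regions are taken to satisfy the relationship when they are either disjoint or one includes the other. The `Rect` type, the helper names, and the example coordinates are all hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned region in the (X, Y) grouping space, bounds inclusive."""
    x_lo: int
    x_hi: int
    y_lo: int
    y_hi: int


def disjoint(a: Rect, b: Rect) -> bool:
    # The regions share no point if they are separated along X or along Y.
    return (a.x_hi < b.x_lo or b.x_hi < a.x_lo
            or a.y_hi < b.y_lo or b.y_hi < a.y_lo)


def includes(a: Rect, b: Rect) -> bool:
    # True if region a fully contains region b.
    return (a.x_lo <= b.x_lo and b.x_hi <= a.x_hi
            and a.y_lo <= b.y_lo and b.y_hi <= a.y_hi)


def disjoint_inclusive(a: Rect, b: Rect) -> bool:
    # Assumed definition: disjoint, or one region includes the other.
    return disjoint(a, b) or includes(a, b) or includes(b, a)


def is_dip(regions: dict) -> bool:
    # Every pair of page grouping regions must satisfy the relationship.
    return all(disjoint_inclusive(a, b)
               for a, b in combinations(regions.values(), 2))


# Hypothetical layouts, not the coordinates from the figure.
dip_regions = {
    "A": Rect(0, 49, 0, 99),
    "F": Rect(50, 99, 0, 99),
    "C": Rect(50, 74, 50, 99),   # nested inside F
}
non_dip_regions = {
    "A": Rect(0, 49, 0, 99),
    "D": Rect(25, 74, 0, 49),    # partially overlaps A
}

print(is_dip(dip_regions))       # True
print(is_dip(non_dip_regions))   # False
```

In the second layout, A and D overlap without either containing the other, which is the kind of pair that panel (b) flags.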
Figure: (a) Page grouping regions (Ri's): R1 = ΠG A, R2 = ΠG B, R3 = ΠG C, R4 = ΠG D, R5 = ΠG E, R6 = ΠG F. (b) Aggregation windows (Wi's): W1, W2, W3, W4. (c) An ISFCR∪W. (d) A non-ISFCR∪W.
ISFCR∪W order: R1(W1 W2) R6(W4(R3 R4) W3(R2))
Traversing order for Wi's: W1 W2 W4 W3
Algorithm One_Pass_Aggregation
Input:
  (1) DIP multidimensional file md-file that contains OLAP data
  (2) Set G of grouping attributes
  (3) Aggregated attribute A
Output: Result of aggregation
1 Partition the grouping domain space into aggregation windows so that aggregation windows and page grouping regions satisfy the disjoint-inclusive relationship.
2 Initialize the buffer:
  2.1 Compute the one-pass buffer size, BUFSIZE = $\max_{W_i} \{ (\text{the number of L-pages of } W_i) + \alpha_i \}$
  2.2 Allocate the buffer of size BUFSIZE
3 Traversing aggregation windows in ISFCR∪W order, for each aggregation window Wcurr, DO
  3.1 Construct a range query. Here the query region consists of the intervals corresponding to the aggregation window for the grouping attributes and the entire domain of each attribute for the other organizing attributes.
  3.2 Process the range query against md-file
    3.2.1 While evaluating the range query, when processing the current page Pcurr is completed, if ISFCR∪W(ΠG Pcurr) ≤ ISFCR∪W(ΠG Wcurr), then remove Pcurr from the buffer.
    3.2.2 For each record retrieved, using the values of the attributes in G, find the corresponding entry from the window result table, aggregate the value of the attribute A, and store the result into the entry.
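The window-by-window loop of step 3 can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: records live in an in-memory list rather than in md-file, pages and the buffer (steps 2 and 3.2.1) are not modeled, the aggregate is assumed to be SUM, and the window coordinates, record values, and function names are hypothetical. Only the traversal order W1 W2 W4 W3 is taken from the slides.

```python
from collections import defaultdict

# Each record is (x, y, z, a): X and Y are the grouping attributes G,
# Z is the other organizing attribute, and a is the aggregated attribute A.
records = [
    (0, 0, 0, 5), (0, 0, 1, 7), (1, 3, 0, 2), (2, 1, 4, 1),
    (3, 3, 2, 4), (3, 3, 3, 6), (1, 3, 2, 8),
]

# Hypothetical aggregation windows: ((x_lo, x_hi), (y_lo, y_hi)), inclusive.
windows = {
    "W1": ((0, 1), (0, 1)),
    "W2": ((0, 1), (2, 3)),
    "W3": ((2, 3), (0, 1)),
    "W4": ((2, 3), (2, 3)),
}
traversal_order = ["W1", "W2", "W4", "W3"]


def range_query(data, window):
    """Step 3.1/3.2: window intervals on X and Y, the entire domain on Z."""
    (x_lo, x_hi), (y_lo, y_hi) = window
    for x, y, z, a in data:
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            yield x, y, z, a


def aggregate_by_windows(data, windows, order):
    result = {}
    for name in order:
        # Window result table for the current window (step 3.2.2).
        table = defaultdict(int)
        for x, y, _z, a in range_query(data, windows[name]):
            table[(x, y)] += a
        # Windows are disjoint, so each (x, y) entry is final once its
        # window has been processed and can be emitted immediately.
        result.update(table)
    return result


totals = aggregate_by_windows(records, windows, traversal_order)
print(totals)
```

Because the windows partition the grouping domain space, each window result table is complete once its window has been processed, so only one such table needs to be held at a time.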
$\max_{W_i} \{ (\text{the number of L-pages of } W_i) + \alpha_i \}$
Figure: normalized I/O accesses versus buffer size in pages (10 to 110) for Naive_Aggregation, DIP_Aggregation, ISFC_Aggregation, and One_Pass_Aggregation. Marked buffer size: 94 pages.
Figure: normalized I/O accesses versus buffer size in pages (10 to 110) for Naive_Aggregation, DIP_Aggregation, ISFC_Aggregation, and One_Pass_Aggregation. Marked buffer size: 49 pages.
Figure: normalized memory size versus window result table size as the ratio to the database size, for (a) SMALL-DATA and (b) MEDIUM-DATA. Series: window result table size, memory requirement.
Figure: normalized memory size versus window result table size as the ratio to the database size, for (c) LARGE-DATA and (d) REAL-DATA. Series: window result table size, memory requirement.