Incremental Maintenance for Non-Distributive Aggregate Functions - - PowerPoint PPT Presentation
Incremental Maintenance for Non-Distributive Aggregate Functions - - PowerPoint PPT Presentation
Incremental Maintenance for Non-Distributive Aggregate Functions Themis Palpanas (U of Toronto) Richard Sidle Bobbie Cochrane Hamid Pirahesh work done at IBM Almaden Research Center Motivation large amounts of data stored in databases
VLDB, Aug 2002 Themis Palpanas - U of Toronto 2
Motivation
large amounts of data stored in databases
expensive OLAP queries, but with nice properties:
based on same set of tables perform similar aggregations
can efficiently support such queries with
Automatic Summary Tables (ASTs)
precomputed once, used many times answer complex queries fast
must maintain ASTs when base tables change
inserts, updates, deletes
3
Motivation (cont’d)
base tables AST insert/update/delete AST definition
VLDB, Aug 2002 Themis Palpanas - U of Toronto 4
Problem Statement
given ASTs with aggregate functions
distributive
SUM, COUNT
non-distributive
STDDEV, CORRELATION, REGRESSION,
MIN/MAX, XMLAGG, …
when base tables change
incrementally maintain affected ASTs
efficient maintenance of ASTs with non-distributive aggregate functions
VLDB, Aug 2002 Themis Palpanas - U of Toronto 5
Outline
Current Approach Our Solution Experimental Evaluation Related Work Conclusions
6
AST
Current Approach
base tables insert/update/delete delta combine
- ld and
new values AST definition Propagate phase Apply phase
VLDB, Aug 2002 Themis Palpanas - U of Toronto 7
Current Approach (cont’d)
works for distributive
SUM, COUNT
does not work for non-distributive
STDDEV, CORRELATION, REGRESSION MIN/MAX XMLAGG
need new way to deal with these functions
VLDB, Aug 2002 Themis Palpanas - U of Toronto 8
Our Solution
selective recomputation
no longer enough to compute delta must recompute some aggregation groups
minimize work to be done
choose which groups to recompute
- ptimize query plan
9
Our Solution (cont’d)
AST base tables insert/update/delete delta recompute affected groups combine
- ld and
new values Propagate phase AST definition Apply phase
VLDB, Aug 2002 Themis Palpanas - U of Toronto 10
Our Solution (cont’d)
the 5 steps
1.
compute new aggregate values
2.
change column derivation
3.
recompute only affected groups
4.
eliminate unnecessary operations
5.
- ptimize for special cases
VLDB, Aug 2002 Themis Palpanas - U of Toronto 11
Initial Query Plan
prop UDI LOJ AST
Query Graph Model (QGM)
VLDB, Aug 2002 Themis Palpanas - U of Toronto 12
- 1. Compute New Aggregate Values
compute delta for distributive functions recompute non- distributive functions get those values only for affected groups duplicate computation for distributive functions!
prop UDI LOJ AST AST LOJ
VLDB, Aug 2002 Themis Palpanas - U of Toronto 13
- 2. Change Column Derivation
change column derivation rewrite phase projects
- ut unused columns
entire AST gets recomputed!
prop UDI LOJ AST AST LOJ
non-distributive
- nly
distributive
- nly
VLDB, Aug 2002 Themis Palpanas - U of Toronto 14
- 3. Recompute Affected Groups
push join predicate down in AST
- nly affected groups
are recomputed
special rules for super-aggregates
prop UDI LOJ AST AST* LOJ
non-distributive
- nly
distributive
- nly
T1 Tk … J J
VLDB, Aug 2002 Themis Palpanas - U of Toronto 15
- 3. Recompute Affected Groups
special treatment for ASTs with super-aggregates
predicates not pushdownable caution not to compute totals of totals
build special join predicate
ensure correct aggregations
change rewrite rules
allow predicate pushdown through super aggregates applicable only for special join predicate
VLDB, Aug 2002 Themis Palpanas - U of Toronto 16
- 4. Remove Unnecessary Operations
- uterjoin not always
needed when changes are
- nly inserts
reroute columns
from propagate phase through AST
remove outerjoin
- perator
same for updates not referencing AST grouping columns and predicates
prop UDI LOJ AST AST T1 Tk … J J
all columns distributive
- nly
VLDB, Aug 2002 Themis Palpanas - U of Toronto 17
- 5. Optimize for Special Cases
recomputation step not needed when
- nly insertions and only MIN/MAX functions
build predicate in apply phase check if new min/max should replace old values
- nly deletions referring only to grouping
columns of AST
can only cause entire groups to be deleted handled in apply phase
VLDB, Aug 2002 Themis Palpanas - U of Toronto 18
Experimental Evaluation
prototype implementation in IBM DB2 UDB star schema database
- sales of products over 5 year time period
- fact table: 10 million tuples
AST with non-distributive aggregate function
- 240,000 tuples
workload simulates nightly updates
1.
add/delete data for first day of month
2.
add/delete data for second day of month
3.
add/delete data for full month
VLDB, Aug 2002 Themis Palpanas - U of Toronto 19
Experimental Evaluation (cont’d)
692 702 294 workload 2 699 full refresh 420 286 incremental workload 3 workload 1
deletions require 40-60% of full refresh time
692 702 n/a workload 2 699 full refresh 31 3 incremental workload 3 workload 1
- ptimized deletions require 1-4% of full refresh
time
VLDB, Aug 2002 Themis Palpanas - U of Toronto 20
Experimental Evaluation (cont’d)
721 702 158 workload 2 702 full refresh 180 151 incremental workload 3 workload 1
insertions/updates require 20-25% of full refresh time
VLDB, Aug 2002 Themis Palpanas - U of Toronto 21
Related Work
incremental view maintenance
differential refresh algorithms
Lindsay et al. 1986, Blakeley et al. 1986, Qian
and Wiederhold 1991, Ceri and Widom 1991
deferred incremental maintenance
Colby et al. 1996, Salem et al. 2000
views with aggregation
Quass 1996, Mumick et al. 1997
VLDB, Aug 2002 Themis Palpanas - U of Toronto 22
Conclusions
incremental maintenance for ASTs with non-distributive aggregate functions
support MIN/MAX, STDDEV, CORRELATION,
REGRESSION, XMLAGG, …
efficient selective recomputation
recompute only affected groups
- ptimize query plan
customize for special cases