Incremental Maintenance for Non-Distributive Aggregate Functions - - PowerPoint PPT Presentation

incremental maintenance for non distributive aggregate
SMART_READER_LITE
LIVE PREVIEW

Incremental Maintenance for Non-Distributive Aggregate Functions - - PowerPoint PPT Presentation

Incremental Maintenance for Non-Distributive Aggregate Functions Themis Palpanas (U of Toronto) Richard Sidle Bobbie Cochrane Hamid Pirahesh work done at IBM Almaden Research Center Motivation large amounts of data stored in databases


slide-1
SLIDE 1

Incremental Maintenance for Non-Distributive Aggregate Functions

work done at IBM Almaden Research Center

Themis Palpanas (U of Toronto) Richard Sidle Bobbie Cochrane Hamid Pirahesh

slide-2
SLIDE 2

VLDB, Aug 2002 Themis Palpanas - U of Toronto 2

Motivation

large amounts of data stored in databases

expensive OLAP queries, but with nice properties:

based on same set of tables perform similar aggregations

can efficiently support such queries with

Automatic Summary Tables (ASTs)

precomputed once, used many times answer complex queries fast

must maintain ASTs when base tables change

inserts, updates, deletes

slide-3
SLIDE 3

3

Motivation (cont’d)

base tables AST insert/update/delete AST definition

slide-4
SLIDE 4

VLDB, Aug 2002 Themis Palpanas - U of Toronto 4

Problem Statement

given ASTs with aggregate functions

distributive

SUM, COUNT

non-distributive

STDDEV, CORRELATION, REGRESSION,

MIN/MAX, XMLAGG, …

when base tables change

incrementally maintain affected ASTs

efficient maintenance of ASTs with non-distributive aggregate functions

slide-5
SLIDE 5

VLDB, Aug 2002 Themis Palpanas - U of Toronto 5

Outline

Current Approach Our Solution Experimental Evaluation Related Work Conclusions

slide-6
SLIDE 6

6

AST

Current Approach

base tables insert/update/delete delta combine

  • ld and

new values AST definition Propagate phase Apply phase

slide-7
SLIDE 7

VLDB, Aug 2002 Themis Palpanas - U of Toronto 7

Current Approach (cont’d)

works for distributive

SUM, COUNT

does not work for non-distributive

STDDEV, CORRELATION, REGRESSION MIN/MAX XMLAGG

need new way to deal with these functions

slide-8
SLIDE 8

VLDB, Aug 2002 Themis Palpanas - U of Toronto 8

Our Solution

selective recomputation

no longer enough to compute delta must recompute some aggregation groups

minimize work to be done

choose which groups to recompute

  • ptimize query plan
slide-9
SLIDE 9

9

Our Solution (cont’d)

AST base tables insert/update/delete delta recompute affected groups combine

  • ld and

new values Propagate phase AST definition Apply phase

slide-10
SLIDE 10

VLDB, Aug 2002 Themis Palpanas - U of Toronto 10

Our Solution (cont’d)

the 5 steps

1.

compute new aggregate values

2.

change column derivation

3.

recompute only affected groups

4.

eliminate unnecessary operations

5.

  • ptimize for special cases
slide-11
SLIDE 11

VLDB, Aug 2002 Themis Palpanas - U of Toronto 11

Initial Query Plan

prop UDI LOJ AST

Query Graph Model (QGM)

slide-12
SLIDE 12

VLDB, Aug 2002 Themis Palpanas - U of Toronto 12

  • 1. Compute New Aggregate Values

compute delta for distributive functions recompute non- distributive functions get those values only for affected groups duplicate computation for distributive functions!

prop UDI LOJ AST AST LOJ

slide-13
SLIDE 13

VLDB, Aug 2002 Themis Palpanas - U of Toronto 13

  • 2. Change Column Derivation

change column derivation rewrite phase projects

  • ut unused columns

entire AST gets recomputed!

prop UDI LOJ AST AST LOJ

non-distributive

  • nly

distributive

  • nly
slide-14
SLIDE 14

VLDB, Aug 2002 Themis Palpanas - U of Toronto 14

  • 3. Recompute Affected Groups

push join predicate down in AST

  • nly affected groups

are recomputed

special rules for super-aggregates

prop UDI LOJ AST AST* LOJ

non-distributive

  • nly

distributive

  • nly

T1 Tk … J J

slide-15
SLIDE 15

VLDB, Aug 2002 Themis Palpanas - U of Toronto 15

  • 3. Recompute Affected Groups

special treatment for ASTs with super-aggregates

predicates not pushdownable caution not to compute totals of totals

build special join predicate

ensure correct aggregations

change rewrite rules

allow predicate pushdown through super aggregates applicable only for special join predicate

slide-16
SLIDE 16

VLDB, Aug 2002 Themis Palpanas - U of Toronto 16

  • 4. Remove Unnecessary Operations
  • uterjoin not always

needed when changes are

  • nly inserts

reroute columns

from propagate phase through AST

remove outerjoin

  • perator

same for updates not referencing AST grouping columns and predicates

prop UDI LOJ AST AST T1 Tk … J J

all columns distributive

  • nly
slide-17
SLIDE 17

VLDB, Aug 2002 Themis Palpanas - U of Toronto 17

  • 5. Optimize for Special Cases

recomputation step not needed when

  • nly insertions and only MIN/MAX functions

build predicate in apply phase check if new min/max should replace old values

  • nly deletions referring only to grouping

columns of AST

can only cause entire groups to be deleted handled in apply phase

slide-18
SLIDE 18

VLDB, Aug 2002 Themis Palpanas - U of Toronto 18

Experimental Evaluation

prototype implementation in IBM DB2 UDB star schema database

  • sales of products over 5 year time period
  • fact table: 10 million tuples

AST with non-distributive aggregate function

  • 240,000 tuples

workload simulates nightly updates

1.

add/delete data for first day of month

2.

add/delete data for second day of month

3.

add/delete data for full month

slide-19
SLIDE 19

VLDB, Aug 2002 Themis Palpanas - U of Toronto 19

Experimental Evaluation (cont’d)

692 702 294 workload 2 699 full refresh 420 286 incremental workload 3 workload 1

deletions require 40-60% of full refresh time

692 702 n/a workload 2 699 full refresh 31 3 incremental workload 3 workload 1

  • ptimized deletions require 1-4% of full refresh

time

slide-20
SLIDE 20

VLDB, Aug 2002 Themis Palpanas - U of Toronto 20

Experimental Evaluation (cont’d)

721 702 158 workload 2 702 full refresh 180 151 incremental workload 3 workload 1

insertions/updates require 20-25% of full refresh time

slide-21
SLIDE 21

VLDB, Aug 2002 Themis Palpanas - U of Toronto 21

Related Work

incremental view maintenance

differential refresh algorithms

Lindsay et al. 1986, Blakeley et al. 1986, Qian

and Wiederhold 1991, Ceri and Widom 1991

deferred incremental maintenance

Colby et al. 1996, Salem et al. 2000

views with aggregation

Quass 1996, Mumick et al. 1997

slide-22
SLIDE 22

VLDB, Aug 2002 Themis Palpanas - U of Toronto 22

Conclusions

incremental maintenance for ASTs with non-distributive aggregate functions

support MIN/MAX, STDDEV, CORRELATION,

REGRESSION, XMLAGG, …

efficient selective recomputation

recompute only affected groups

  • ptimize query plan

customize for special cases

significant performance improvements