SLIDE 1

Efficient Computation of Parsimonious Temporal Aggregation

Giovanni Mahlknecht, Anton Dignös, Johann Gamper

Free University of Bozen-Bolzano, Italy

ADBIS 2015 September 8-11, 2015 - Futuroscope, Poitiers, France

ADBIS 2015 1/23

G. Mahlknecht et al.

SLIDE 2

Outline

◮ Introduction
◮ Diagonal Pruning
◮ Split Point Graph
◮ Experimental Evaluation

SLIDE 3

Instant Temporal Aggregation (ITA)

Patient treatment periods with daily cost

      P      C     T
r1    Bob    600   [1,4]
r2    Mary   400   [1,2]
r3    Eve    40    [3,3]
r4    Eric   310   [3,4]
r5    Joe    30    [6,6]
r6    John   300   [6,9]
r7    Alex   100   [9,9]

[Figure: treatment periods of the seven patients on a time axis of days 1 to 9, each labeled with its daily cost]

SLIDE 4

Instant Temporal Aggregation (ITA)

ITA: SUM(C) at each time point (consecutive time points with the same sum are coalesced into one tuple)

      Val    T
s1    1000   [1,2]
s2    950    [3,3]
s3    910    [4,4]
s4    330    [6,6]
s5    300    [7,8]
s6    400    [9,9]

[Figure: the ITA result plotted over days 1 to 9]
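A minimal Python sketch of the ITA step for this example, for illustration only: it expands every interval day by day (fine for a nine-day example; a sweep over interval endpoints would scale better) and then coalesces consecutive days that carry the same sum. The names `patients` and `ita_sum` are hypothetical.

```python
from collections import defaultdict

# Input relation from the slide: (patient P, daily cost C, closed interval T = [start, end])
patients = [
    ("Bob", 600, (1, 4)), ("Mary", 400, (1, 2)), ("Eve", 40, (3, 3)),
    ("Eric", 310, (3, 4)), ("Joe", 30, (6, 6)), ("John", 300, (6, 9)),
    ("Alex", 100, (9, 9)),
]

def ita_sum(rows):
    """SUM(C) at every time point, then coalesce consecutive time points
    with equal sums into (value, start, end) tuples."""
    per_day = defaultdict(int)
    for _, cost, (start, end) in rows:
        for t in range(start, end + 1):
            per_day[t] += cost
    result = []
    for t in sorted(per_day):                  # days with no tuple (day 5) are skipped
        value = per_day[t]
        if result and result[-1][0] == value and result[-1][2] == t - 1:
            result[-1] = (value, result[-1][1], t)   # extend the current run
        else:
            result.append((value, t, t))             # start a new ITA tuple
    return result

for value, start, end in ita_sum(patients):
    print(value, [start, end])
```

The output reproduces s1 to s6 above, including the gap on day 5 where no patient is in treatment.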

SLIDE 5

Parsimonious Temporal Aggregation (PTA)

Input: ITA result
Output: merged tuples, reduced to size c with minimum error

Rules:

◮ Merge only temporally adjacent tuples
◮ The merged value is the mean of the input values, weighted by interval length
◮ The error is the sum of squared errors (SSE)
◮ The result has size c

[Figure: the ITA result over days 1 to 9. Merging s1 and s2 gives value = 983.33, error = 1667; merging s5 and s6 gives value = 333.33, error = 6667]

SLIDE 6

Parsimonious Temporal Aggregation (PTA)

[Figure: the ITA result over days 1 to 9. Merging s4, s5 and s6 gives value = 332.5, error = 6675]
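The merge rules can be checked with a short sketch. It assumes, consistently with the numbers on these slides, that each value is weighted by the length of its interval; `merge` is a hypothetical helper, not code from the paper.

```python
# ITA result from the slide: (value, start, end)
ita = [(1000, 1, 2), (950, 3, 3), (910, 4, 4), (330, 6, 6), (300, 7, 8), (400, 9, 9)]

def merge(tuples):
    """Merge a run of temporally adjacent tuples into one.
    Returns (weighted mean, SSE), each value weighted by its interval length."""
    for (_, _, prev_end), (_, next_start, _) in zip(tuples, tuples[1:]):
        if next_start != prev_end + 1:
            raise ValueError("only adjacent tuples can be merged")
    weights = [end - start + 1 for _, start, end in tuples]
    mean = sum(w * v for w, (v, _, _) in zip(weights, tuples)) / sum(weights)
    sse = sum(w * (v - mean) ** 2 for w, (v, _, _) in zip(weights, tuples))
    return mean, sse

print(merge(ita[0:2]))  # s1, s2: about (983.33, 1666.67)
print(merge(ita[4:6]))  # s5, s6: about (333.33, 6666.67)
print(merge(ita[3:6]))  # s4, s5, s6: (332.5, 6675.0)
```

Calling `merge(ita[2:4])` raises an error, since s3 ends on day 4 and s4 starts on day 6.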

SLIDE 7

PTA Optimal Solution

Result of ITA for SUM(C) (size n = 6)

[Figure: the ITA result over days 1 to 9. Merging s2 and s3 costs error = 800; merging s4 and s5 costs error = 600]

PTA result (c = 4): total error 1,400 (optimal solution)

[Figure: the PTA result over days 1 to 9: 1,000 on [1,2], 930 on [3,4], 310 on [6,8], 400 on [9,9]]
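The two errors can be checked by hand. This assumes, as the merged value 983.33 = (2 · 1000 + 950)/3 on the previous slides indicates, that each value v_j is weighted by the length w_j of its interval:

```latex
\begin{aligned}
\bar v &= \frac{\sum_j w_j v_j}{\sum_j w_j}, \qquad
\mathrm{SSE} = \sum_j w_j\,(v_j - \bar v)^2 \\
s_2, s_3:\ \bar v &= \tfrac{950 + 910}{2} = 930, \qquad
\mathrm{SSE} = (950 - 930)^2 + (910 - 930)^2 = 800 \\
s_4, s_5:\ \bar v &= \tfrac{330 + 2 \cdot 300}{3} = 310, \qquad
\mathrm{SSE} = (330 - 310)^2 + 2\,(300 - 310)^2 = 600 \\
\text{total error} &= 800 + 600 = 1400
\end{aligned}
```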

SLIDE 8

Split Points / Split Path

◮ PTA computes a split path (a sequence of split points)
◮ Tuples between consecutive split points are merged

. days

[Figure: the ITA result over days 1 to 9 with split points 1, 3 and 5; the merged runs cost error = 800 and error = 600]

Split path: [1, 3, 5]

[Figure: the PTA result over days 1 to 9 with values 1,000, 930, 310, 400]
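Turning a split path into the PTA result amounts to merging each run between consecutive split points. In this sketch a split point j is read as "cut after the j-th ITA tuple", which is how the path [1, 3, 5] yields the four tuples above; `apply_split_path` is a hypothetical name, and the path is assumed valid (no run spans a temporal gap).

```python
# ITA result of the running example: (value, start, end)
ita = [(1000, 1, 2), (950, 3, 3), (910, 4, 4), (330, 6, 6), (300, 7, 8), (400, 9, 9)]

def apply_split_path(tuples, split_path):
    """Merge the tuples between consecutive split points (1-based cut positions).
    Assumes every resulting run is gap-free; returns (merged tuples, total SSE)."""
    bounds = [0] + list(split_path) + [len(tuples)]
    result, total_error = [], 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        group = tuples[lo:hi]
        weights = [end - start + 1 for _, start, end in group]
        mean = sum(w * v for w, (v, _, _) in zip(weights, group)) / sum(weights)
        total_error += sum(w * (v - mean) ** 2 for w, (v, _, _) in zip(weights, group))
        result.append((mean, group[0][1], group[-1][2]))  # span of the merged run
    return result, total_error

merged, error = apply_split_path(ita, [1, 3, 5])
print(merged)  # [(1000.0, 1, 2), (930.0, 3, 4), (310.0, 6, 8), (400.0, 9, 9)]
print(error)   # 1400.0
```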

SLIDE 9

PTA Existing Algorithm

Dynamic Programming Algorithm

Error Matrix E

        i=1    i=2    i=3    i=4    i=5    i=6
k=1     0      1667   5700   ∞      ∞      ∞
k=2     -      -      800    5700   6300   12375
k=3     -      -      -      800    1400   6300
k=4     -      -      -      -      600    1400

E_{i,k}: minimum error in reducing the first i tuples to size k

Split Point Matrix J

        i=1    i=2    i=3    i=4    i=5    i=6
k=1     0      0      0      0      0      0
k=2     0      1      1      3      3      3
k=3     0      0      2      3      3      5
k=4     0      0      0      3      3      5

J_{i,k}: optimum split point when reducing the first i tuples to size k

◮ E: only the last two rows are needed at any time
◮ J: the whole matrix is needed (to backtrack the split path)
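The dynamic program behind E and J can be sketched in Python. This is a minimal sketch, not the authors' implementation: it assumes the recurrence E_{i,k} = min over j of E_{j,k-1} + SSE(j+1..i), treats a run that spans a temporal gap as unmergeable (error ∞, as in row k=1 above), and keeps the whole of E for readability even though only two rows are needed. The helpers `make_sse` and `pta` are hypothetical; the arrays are indexed `E[k][i]`.

```python
import math

# ITA result of the running example: (value, start, end)
ita = [(1000, 1, 2), (950, 3, 3), (910, 4, 4), (330, 6, 6), (300, 7, 8), (400, 9, 9)]

def make_sse(tuples):
    """Return sse(a, b): weighted SSE of merging tuples a..b (1-based, inclusive),
    or infinity if the run spans a temporal gap. Prefix sums make each call O(1)."""
    W, S, Q, G = [0], [0], [0], [0]
    for idx, (v, start, end) in enumerate(tuples):
        w = end - start + 1                        # interval length is the weight
        W.append(W[-1] + w)
        S.append(S[-1] + w * v)
        Q.append(Q[-1] + w * v * v)
        G.append(G[-1] + int(idx > 0 and tuples[idx - 1][2] + 1 != start))  # gap count
    def sse(a, b):
        if G[b] != G[a]:                           # a gap lies inside the run a..b
            return math.inf
        w, s = W[b] - W[a - 1], S[b] - S[a - 1]
        return (Q[b] - Q[a - 1]) - s * s / w
    return sse

def pta(tuples, c):
    """E[k][i]: min error reducing the first i tuples to k; J[k][i]: its split point."""
    n, sse = len(tuples), make_sse(tuples)
    E = [[math.inf] * (n + 1) for _ in range(c + 1)]
    J = [[0] * (n + 1) for _ in range(c + 1)]
    for i in range(1, n + 1):
        E[1][i] = sse(1, i)                        # all i tuples merged into one
    for k in range(2, c + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):              # last output tuple covers j+1..i
                err = E[k - 1][j] + sse(j + 1, i)
                if err < E[k][i]:
                    E[k][i], J[k][i] = err, j
    path, i = [], n
    for k in range(c, 1, -1):                      # backtrack from J[c][n]
        i = J[k][i]
        path.append(i)
    return E, J, path[::-1]

E, J, path = pta(ita, 4)
print(round(E[4][6]), path)   # 1400 [1, 3, 5]
```

Backtracking reads J[4][6] = 5, J[3][5] = 3 and J[2][3] = 1, which is why the whole of J must be kept while E only needs its previous row.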

SLIDE 10

Problem and Contribution

Problem

◮ The runtime and space requirements of the existing algorithm do not scale

Contribution

◮ Diagonal Pruning: reduces the computational complexity by avoiding unnecessary computations
◮ Split Point Graph: reduces the space complexity
◮ The result remains optimal

SLIDE 12

Diagonal Pruning

Lemma (Diagonal Pruning)

For the computation of the error matrix E and the split point matrix J, there exists an upper bound for the variable i.

[Figure: split point matrix J of the running example (i = 1 to 6, k = 1 to 4), with the cells that can be skipped shown in red]

◮ Red cells need not be computed, which reduces runtime
◮ Parts of the matrices can be eliminated, which reduces memory
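The slide does not spell out the bound. One bound that fits the running example, assumed here, follows from counting: once the first i tuples are reduced to k, the remaining n − i tuples must still produce c − k output tuples, and n − i tuples can yield at most n − i, so n − i ≥ c − k, i.e. i ≤ n − c + k (and trivially i ≥ k). The sketch below, with the hypothetical function `pta_diagonal_pruning`, restricts every row to that range and counts the visited cells. For n = 6 and c = 4 it visits 12 of the 6 × 4 = 24 cells, matching the counts on the Graph Evolution slide, and still finds the optimal error of 1,400.

```python
import math

# ITA result of the running example: (value, start, end)
ita = [(1000, 1, 2), (950, 3, 3), (910, 4, 4), (330, 6, 6), (300, 7, 8), (400, 9, 9)]

def make_sse(tuples):
    """sse(a, b): weighted SSE of merging tuples a..b (1-based), inf across a gap."""
    W, S, Q, G = [0], [0], [0], [0]
    for idx, (v, start, end) in enumerate(tuples):
        w = end - start + 1
        W.append(W[-1] + w)
        S.append(S[-1] + w * v)
        Q.append(Q[-1] + w * v * v)
        G.append(G[-1] + int(idx > 0 and tuples[idx - 1][2] + 1 != start))
    def sse(a, b):
        if G[b] != G[a]:
            return math.inf
        w, s = W[b] - W[a - 1], S[b] - S[a - 1]
        return (Q[b] - Q[a - 1]) - s * s / w
    return sse

def pta_diagonal_pruning(tuples, c):
    """PTA dynamic program that only visits cells with k <= i <= n - c + k."""
    n, sse = len(tuples), make_sse(tuples)
    E = [[math.inf] * (n + 1) for _ in range(c + 1)]
    J = [[0] * (n + 1) for _ in range(c + 1)]
    computed = 0
    for k in range(1, c + 1):
        for i in range(k, n - c + k + 1):      # assumed upper bound i <= n - c + k
            computed += 1
            if k == 1:
                E[1][i] = sse(1, i)            # all i tuples merged into one
                continue
            for j in range(k - 1, i):          # j <= n - c + (k - 1): row k-1 was visited
                err = E[k - 1][j] + sse(j + 1, i)
                if err < E[k][i]:
                    E[k][i], J[k][i] = err, j
    return E, J, computed

n, c = len(ita), 4
E, J, computed = pta_diagonal_pruning(ita, c)
print(round(E[c][n]), computed, n * c - computed)   # 1400 12 12
```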

SLIDE 14

Split Point Graph

Challenge

Replace the split point matrix J with an alternative structure that needs less memory

Idea

Unnecessary nodes are not stored

Split Point Graph

◮ Only necessary nodes are inserted
◮ Nodes are removed when they become obsolete (Path Pruning)
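The slides describe the split point graph only at a high level, so the following is an assumption about how it could work, not the paper's implementation. Each node (i, k) keeps a pointer to the node of its split point in row k − 1 plus a reference count; only the previous row of E is kept; and path pruning is modeled as reference counting: once row k is finished, every node of row k − 1 that no node points to is removed, and the removal propagates up its path. `Node`, `release` and `pta_spg` are hypothetical names. On the running example the sketch recovers the split path [1, 3, 5] with error 1,400 and removes 4 nodes, the number of path pruned nodes reported on the Graph Evolution slide.

```python
import math

# ITA result of the running example: (value, start, end)
ita = [(1000, 1, 2), (950, 3, 3), (910, 4, 4), (330, 6, 6), (300, 7, 8), (400, 9, 9)]

def make_sse(tuples):
    """sse(a, b): weighted SSE of merging tuples a..b (1-based), inf across a gap."""
    W, S, Q, G = [0], [0], [0], [0]
    for idx, (v, start, end) in enumerate(tuples):
        w = end - start + 1
        W.append(W[-1] + w)
        S.append(S[-1] + w * v)
        Q.append(Q[-1] + w * v * v)
        G.append(G[-1] + int(idx > 0 and tuples[idx - 1][2] + 1 != start))
    def sse(a, b):
        if G[b] != G[a]:
            return math.inf
        w, s = W[b] - W[a - 1], S[b] - S[a - 1]
        return (Q[b] - Q[a - 1]) - s * s / w
    return sse

class Node:
    """Graph node for cell (i, k); parent is the node (j, k - 1) of its split point j."""
    def __init__(self, i, k, parent):
        self.i, self.k, self.parent, self.refs = i, k, parent, 0
        if parent is not None:
            parent.refs += 1

def release(node, pruned):
    """Path pruning as reference counting: remove an unreferenced node, then keep
    walking up its path while ancestors lose their last reference."""
    while node is not None and node.refs == 0:
        pruned.append((node.i, node.k))
        parent, node.parent = node.parent, None
        if parent is not None:
            parent.refs -= 1
        node = parent

def pta_spg(tuples, c):
    n, sse = len(tuples), make_sse(tuples)
    prev_err = {i: sse(1, i) for i in range(1, n - c + 2)}   # row k = 1, diagonal pruned
    prev_nodes = {i: Node(i, 1, None) for i in prev_err}
    pruned = []
    for k in range(2, c + 1):
        err, nodes = {}, {}
        for i in range(k, n - c + k + 1):                     # assumed bound i <= n - c + k
            # best split point; ties go to the smallest j
            e, best = min((prev_err[j] + sse(j + 1, i), j) for j in range(k - 1, i))
            err[i], nodes[i] = e, Node(i, k, prev_nodes[best])
        for node in prev_nodes.values():                      # row k - 1 is complete
            release(node, pruned)
        prev_err, prev_nodes = err, nodes
    path, node = [], prev_nodes[n].parent                     # backtrack from node (n, c)
    while node is not None:
        path.append(node.i)
        node = node.parent
    return prev_err[n], sorted(path), pruned

error, path, pruned = pta_spg(ita, 4)
print(round(error), path, len(pruned))   # 1400 [1, 3, 5] 4
```

Unlike matrix J, the graph never holds more than two rows of candidates plus the surviving paths, which is where the memory savings would come from under this model.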

SLIDE 15

Graph Evolution

[Figure: step-by-step evolution of the split point graph for tuples 1 to 6; legend: diagonal pruned node, active path, path pruned nodes]

SLIDE 22

Graph Evolution

[Figure: final state of the split point graph for tuples 1 to 6; legend: diagonal pruned node, active path, path pruned nodes]

Total number of nodes: 24
Not computed nodes: 12
Path pruned nodes: 4

SLIDE 24

Experimental Configuration

Synthetic Datasets

◮ SYNTH: randomly distributed values
◮ ETDS: evolution of employees in a company

Algorithm Comparisons

◮ PTA: original algorithm
◮ DP: PTA with diagonal pruning
◮ SPG: split point graph with diagonal and path pruning

SLIDE 25

Runtime: PTA vs Diagonal Pruning

[Plots for ETDS and SYNTH: runtime [sec] vs. reduction size [k], PTA vs. DP]

Diagonal pruning substantially reduces runtime

SLIDE 26

Runtime: Split Point Graph vs PTA with Diagonal Pruning

[Plots for ETDS and SYNTH: runtime [sec] vs. reduction size [k], SPG vs. DP]

The overhead of the dynamic graph structure and path pruning is very small

SLIDE 27

Space Efficiency: PTA vs SPG (compression to 10%)

[Plots for ETDS and SYNTH: memory [MB] vs. input cardinality n [k], PTA vs. SPG with c = 10%]

The split point graph with diagonal pruning and path pruning substantially reduces space consumption

SLIDE 28

Space Efficiency: PTA vs SPG (compression to 1%)

[Plots for ETDS and SYNTH: memory [MB] vs. input cardinality n [k], PTA vs. SPG with c = 1%]

The split point graph with diagonal pruning and path pruning substantially reduces space consumption

SLIDE 29

Space Efficiency: Effect of Path Pruning

[Plots for ETDS and SYNTH: memory [MB] vs. reduction size c [10³], SPG without path pruning vs. PTA vs. SPG]

Path pruning has a large effect: it prunes about 2/3 of the graph

SLIDE 30

Related Work

◮ Tuma, P.: Implementing Historical Aggregates in TempIS. Ph.D. thesis, Wayne State University, Detroit, Michigan (1992)
◮ Kline, N., Snodgrass, R.T.: Computing temporal aggregates. In: ICDE. pp. 222-231 (1995)
◮ Moon, B., Vega Lopez, I.F., Immanuel, V.: Efficient algorithms for large-scale temporal aggregation. IEEE Trans. Knowl. Data Eng. 15(3), 744-759 (2003)
◮ Tao, Y., Papadias, D., Faloutsos, C.: Approximate temporal aggregation. In: ICDE. pp. 190-201 (2004)
◮ Gordevičius, J., Gamper, J., Böhlen, M.H.: Parsimonious temporal aggregation. VLDB J. 21(3), 309-332 (2012)

SLIDE 31

Conclusion

◮ Diagonal Pruning reduces the runtime of the computation by reducing the search space of the DP scheme adopted by PTA
◮ Split Point Graph in combination with Path Pruning reduces memory consumption
◮ Experiments showed that the two optimizations reduce memory requirements to one third of those of the original PTA implementation
