Efficient Computation of Parsimonious Temporal Aggregation

SLIDE 1

Efficient Computation of Parsimonious Temporal Aggregation

Giovanni Mahlknecht, Anton Dignös, Johann Gamper

Free University of Bozen-Bolzano, Italy

ADBIS 2015 September 8-11, 2015 - Futuroscope, Poitiers, France

ADBIS 2015 1/23

  • G. Mahlknecht et al.
SLIDE 2

Outline

◮ Introduction
◮ Diagonal Pruning
◮ Split Point Graph
◮ Experimental Evaluation

SLIDE 3

Instant Temporal Aggregation (ITA)

Patient treatment periods with daily cost

      P      C     T
r1    Bob    600   [1,4]
r2    Mary   400   [1,2]
r3    Eve     40   [3,3]
r4    Eric   310   [3,4]
r5    Joe     30   [6,6]
r6    John   300   [6,9]
r7    Alex   100   [9,9]

[Figure: timeline over days 1 to 9 showing each tuple's interval and daily cost: Bob 600, Mary 400, Eve 40, Eric 310, Joe 30, John 300, Alex 100]

SLIDE 4

Instant Temporal Aggregation (ITA)

ITA: SUM(C) at each time point

      Val    T
s1    1000   [1,2]
s2     950   [3,3]
s3     910   [4,4]
s4     330   [6,6]
s5     300   [7,8]
s6     400   [9,9]

[Figure: ITA result over days 1 to 9 with values 1000, 950, 910, 330, 300, 400; day 5 has no tuple]
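The ITA table above can be recomputed with a short sketch. It is a naive day-by-day illustration, not the authors' implementation: the name `ita_sum` and the tuple layout are assumptions made for this example, and consecutive days are coalesced whenever their sums are equal, which reproduces the table.

```python
from collections import defaultdict

# Input relation from the slide: (patient, daily cost, first day, last day).
treatments = [
    ("Bob", 600, 1, 4), ("Mary", 400, 1, 2), ("Eve", 40, 3, 3),
    ("Eric", 310, 3, 4), ("Joe", 30, 6, 6), ("John", 300, 6, 9),
    ("Alex", 100, 9, 9),
]

def ita_sum(rows):
    """Return [(value, start, end)]: SUM(cost) per day, coalesced into maximal runs."""
    per_day = defaultdict(int)
    for _, cost, start, end in rows:
        for day in range(start, end + 1):
            per_day[day] += cost
    result = []
    for day in sorted(per_day):
        value = per_day[day]
        # Extend the previous tuple if the day is consecutive and the sum is unchanged.
        if result and result[-1][0] == value and result[-1][2] == day - 1:
            result[-1] = (value, result[-1][1], day)
        else:
            result.append((value, day, day))
    return result

ita = ita_sum(treatments)
for value, start, end in ita:
    print(value, [start, end])
```

Day 5 is covered by no treatment, so it produces no tuple; this gap is what later prevents s3 and s4 from being merged.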

SLIDE 5

Parsimonious Temporal Aggregation (PTA)

Input: ITA result
Output: tuples merged down to size c with minimum error

Rules:

◮ Merge only adjacent tuples
◮ Merged values are the weighted mean (weighted by interval length)
◮ Error is the sum squared error (SSE)
◮ Result is of size c

[Figure: ITA result over days 1 to 9 with two example merges: 1000 and 950 give value = 983.33, error = 1667; 300 and 400 give value = 333.33, error = 6667]

SLIDE 6

Parsimonious Temporal Aggregation (PTA)

[Figure: ITA result over days 1 to 9; merging 330, 300 and 400 gives value = 332.5, error = 6675]
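The merge values quoted on these slides can be checked with a small sketch. The function name `merge` and the `(value, start, end)` layout are assumptions for illustration; the weighting by interval length in days is inferred from the numbers on the slides.

```python
def merge(tuples):
    """Merge adjacent ITA tuples (value, start, end) into one.

    Returns (weighted mean, sum squared error), weighting by interval length in days.
    """
    lengths = [end - start + 1 for _, start, end in tuples]
    total = sum(lengths)
    mean = sum(v * n for (v, _, _), n in zip(tuples, lengths)) / total
    sse = sum(n * (v - mean) ** 2 for (v, _, _), n in zip(tuples, lengths))
    return mean, sse

print(merge([(1000, 1, 2), (950, 3, 3)]))               # about (983.33, 1666.67)
print(merge([(300, 7, 8), (400, 9, 9)]))                # about (333.33, 6666.67)
print(merge([(330, 6, 6), (300, 7, 8), (400, 9, 9)]))   # (332.5, 6675.0)
```

The tuple with value 1000 spans two days, so it counts twice in both the mean and the error.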

SLIDE 7

PTA Optimal Solution

Result of ITA for SUM(C) (size n = 6)

[Figure: ITA result over days 1 to 9 with values 1000, 950, 910, 330, 300, 400; two merges are marked with error = 800 and error = 600]

Result of PTA (c = 4), total error 1,400 (optimal solution)

[Figure: PTA result over days 1 to 9 with values 1000, 930, 310, 400]

SLIDE 8

Split Points / Split Path

◮ PTA computes a split path (sequence of split points)
◮ Tuples between split points are merged

[Figure: ITA result over days 1 to 9 with split points 1, 3 and 5 marked; the merged segments have error = 800 and error = 600]

Split Path: [1, 3, 5]

[Figure: resulting PTA tuples over days 1 to 9 with values 1000, 930, 310, 400]
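As a hedged illustration of how a split path turns into a result, the sketch below cuts the ITA result after each split point and merges every segment. The helper names `merge` and `apply_split_path` are hypothetical, and split points are read as 1-based positions after which a cut is made, which is an assumption that matches the example.

```python
ita = [(1000, 1, 2), (950, 3, 3), (910, 4, 4), (330, 6, 6), (300, 7, 8), (400, 9, 9)]

def merge(segment):
    """Weighted mean and sum squared error of a run of (value, start, end) tuples."""
    lengths = [end - start + 1 for _, start, end in segment]
    mean = sum(v * n for (v, _, _), n in zip(segment, lengths)) / sum(lengths)
    sse = sum(n * (v - mean) ** 2 for (v, _, _), n in zip(segment, lengths))
    return mean, sse

def apply_split_path(ita, split_path):
    """Cut after each 1-based split point, merge each segment, and sum the errors."""
    bounds = [0] + list(split_path) + [len(ita)]
    result, total_error = [], 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        segment = ita[lo:hi]
        mean, sse = merge(segment)
        result.append((mean, segment[0][1], segment[-1][2]))
        total_error += sse
    return result, total_error

result, total_error = apply_split_path(ita, [1, 3, 5])
print(result)       # [(1000.0, 1, 2), (930.0, 3, 4), (310.0, 6, 8), (400.0, 9, 9)]
print(total_error)  # 1400.0
```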

SLIDE 9

PTA Existing Algorithm

Dynamic Programming Algorithm

Error Matrix E

        i=1      2      3      4      5      6
k=1       0   1667   5700      ∞      ∞      ∞
  2       -      0    800   5700   6300  12375
  3       -      -      0    800   1400   6300
  4       -      -      -      0    600   1400

E_{i,k}: minimum error in reducing the first i tuples to size k

Split Point Matrix J

        i=1   2   3   4   5   6
k=1       0   0   0   0   0   0
  2       0   1   1   3   3   3
  3       0   0   2   3   3   5
  4       0   0   0   3   3   5

J_{i,k}: optimum split point when reducing the first i tuples to size k

For E, only the last two rows are used; for J, the whole matrix is needed.
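The two matrices can be reproduced with a textbook dynamic program. This is a sketch under assumptions, not the authors' code: the names `sse`, `pta_dp` and `split_path` are hypothetical, and merging across a temporal gap is given infinite error so that only adjacent tuples are merged.

```python
import math

ita = [(1000, 1, 2), (950, 3, 3), (910, 4, 4), (330, 6, 6), (300, 7, 8), (400, 9, 9)]

def sse(segment):
    """SSE of merging a run of (value, start, end) tuples; infinite across a gap."""
    for (_, _, end), (_, start, _) in zip(segment, segment[1:]):
        if start != end + 1:
            return math.inf
    lengths = [e - s + 1 for _, s, e in segment]
    mean = sum(v * n for (v, _, _), n in zip(segment, lengths)) / sum(lengths)
    return sum(n * (v - mean) ** 2 for (v, _, _), n in zip(segment, lengths))

def pta_dp(ita, c):
    """Fill E[k][i] and J[k][i] (1-based; row and column 0 are padding)."""
    n = len(ita)
    E = [[math.inf] * (n + 1) for _ in range(c + 1)]
    J = [[0] * (n + 1) for _ in range(c + 1)]
    for i in range(1, n + 1):
        E[1][i] = sse(ita[:i])
    for k in range(2, c + 1):
        for i in range(k, n + 1):
            # Try every split point j: first j tuples in k - 1 groups, rest merged.
            for j in range(k - 1, i):
                err = E[k - 1][j] + sse(ita[j:i])
                if err < E[k][i]:
                    E[k][i], J[k][i] = err, j
    return E, J

def split_path(J, n, c):
    """Walk J backwards from cell (n, c) to recover the split points."""
    path, i = [], n
    for k in range(c, 1, -1):
        i = J[k][i]
        path.append(i)
    return path[::-1]

E, J = pta_dp(ita, 4)
print(E[4][6])              # 1400.0
print(split_path(J, 6, 4))  # [1, 3, 5]
```

Row k of E depends only on row k - 1, whereas the backward walk in `split_path` can touch any row of J, which is why J cannot simply be truncated to two rows.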

SLIDE 10

Problem and Contribution

Problem

◮ The runtime and space requirements of the existing algorithm do not scale

Contribution

◮ Diagonal Pruning: reduces the computational complexity by avoiding unnecessary computations
◮ Split Point Graph: reduces the space complexity
◮ The result remains optimal

SLIDE 12

Diagonal Pruning

Lemma (Diagonal Pruning)

For the computation of the error matrix E and the split point matrix J, there exists an upper bound for the variable i.

[Figure: split point matrix J of the running example (i = 1..6, k = 1..4), with the cells that need not be computed marked in red]

◮ The red cells need not be computed, which reduces runtime
◮ Parts of the matrices can be eliminated, which reduces memory
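The slide does not state the bound itself. One reading, labelled here as an assumption, is k ≤ i ≤ n − c + k: at least k tuples are needed to form k groups, and the remaining n − i tuples must still supply the other c − k groups. The sketch below only counts the cells that survive under that assumed bound; `computed_cells` is a hypothetical helper.

```python
def computed_cells(n, c):
    """Cells (k, i) left after pruning, assuming the bound k <= i <= n - c + k."""
    return [(k, i) for k in range(1, c + 1) for i in range(k, n - c + k + 1)]

n, c = 6, 4
cells = computed_cells(n, c)
print(len(cells), n * c - len(cells))  # 12 12
```

Under this assumption each row keeps only a diagonal band of n − c + 1 cells, so for n = 6 and c = 4 half of the 24 cells are never computed.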

SLIDE 14

Split Point Graph

Challenge

Replace the split point matrix J with an alternative structure to reduce memory consumption

Idea

Unnecessary nodes are not stored

Split Point Graph

◮ Only necessary nodes are inserted
◮ Nodes are removed when they become obsolete (Path Pruning)

SLIDE 22

Graph Evolution

[Figure: final split point graph over the cells i = 1..6, k = 1..4; legend: diagonal pruned node, active path, path pruned nodes]

Total number of nodes: 24
Not computed nodes: 12
Path pruned nodes: 4
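The node counts above can be reproduced with a sketch that stores split points as graph nodes with parent links instead of a matrix. Everything here is an assumption for illustration rather than the authors' data structure: the names `Node`, `prune` and `pta_graph`, the use of reference counts to detect obsolete nodes, and the diagonal bound k ≤ i ≤ n − c + k.

```python
import math

ita = [(1000, 1, 2), (950, 3, 3), (910, 4, 4), (330, 6, 6), (300, 7, 8), (400, 9, 9)]

def sse(segment):
    """SSE of merging a run of (value, start, end) tuples; infinite across a gap."""
    for (_, _, end), (_, start, _) in zip(segment, segment[1:]):
        if start != end + 1:
            return math.inf
    lengths = [e - s + 1 for _, s, e in segment]
    mean = sum(v * n for (v, _, _), n in zip(segment, lengths)) / sum(lengths)
    return sum(n * (v - mean) ** 2 for (v, _, _), n in zip(segment, lengths))

class Node:
    """Graph node for cell (i, k); parent is the chosen node in row k - 1."""
    def __init__(self, i, k, parent):
        self.i, self.k, self.parent, self.refs = i, k, parent, 0
        if parent is not None:
            parent.refs += 1

def prune(node):
    """Remove an unreferenced node and cascade upwards; return how many were removed."""
    removed = 0
    while node is not None and node.refs == 0:
        parent, node.parent = node.parent, None
        removed += 1
        if parent is not None:
            parent.refs -= 1
        node = parent
    return removed

def pta_graph(ita, c):
    n = len(ita)
    width = n - c  # assumed diagonal bound: row k covers i = k .. k + width
    err = {i: sse(ita[:i]) for i in range(1, width + 2)}
    nodes = {i: Node(i, 1, None) for i in err}
    created, pruned = len(nodes), 0
    for k in range(2, c + 1):
        new_err, new_nodes = {}, {}
        for i in range(k, k + width + 1):
            best_j, best = None, math.inf
            for j in err:
                if j < i:
                    e = err[j] + sse(ita[j:i])
                    if best_j is None or e < best:
                        best_j, best = j, e
            new_err[i] = best
            new_nodes[i] = Node(i, k, nodes[best_j])
        created += len(new_nodes)
        # Row k is complete: nodes of row k - 1 without children are obsolete.
        for node in nodes.values():
            pruned += prune(node)
        err, nodes = new_err, new_nodes
    # Follow parent links from cell (n, c) to read off the split path.
    path, node = [], nodes[n].parent
    while node is not None:
        path.append(node.i)
        node = node.parent
    return path[::-1], err[n], created, pruned

print(pta_graph(ita, 4))  # ([1, 3, 5], 1400.0, 12, 4)
```

For the running example the sketch creates 12 nodes (the other 12 of the 24 cells are never computed) and removes 4 of them by path pruning, while still returning the split path [1, 3, 5] with total error 1400.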

SLIDE 24

Experimental Configuration

Synthetic Datasets

◮ SYNTH: randomly distributed values
◮ ETDS: evolution of employees in a company

Algorithm Comparisons

◮ PTA: original algorithm
◮ DP: PTA with diagonal pruning
◮ SPG: split point graph with diagonal and path pruning

SLIDE 25

Runtime: PTA vs Diagonal Pruning

[Figure: two panels, ETDS and SYNTH; x-axis: reduction size [k], y-axis: runtime [sec]; curves: PTA, DP]

Diagonal pruning substantially reduces runtime

SLIDE 26

Runtime: Split Point Graph vs PTA with Diagonal Pruning

[Figure: two panels, ETDS and SYNTH; x-axis: reduction size [k], y-axis: runtime [sec]; curves: SPG, DP]

The overhead of the dynamic graph structure and path pruning is very small

SLIDE 27

Space Efficiency: PTA vs SPG (compression to 10%)

[Figure: two panels, ETDS and SYNTH; x-axis: input cardinality n [k], y-axis: memory [MB]; curves: PTA with c=10%, SPG with c=10%]

Graph Implementation with Diagonal Pruning and Path Pruning substantially reduces space consumption

SLIDE 28

Space Efficiency: PTA vs SPG (compression to 1%)

[Figure: two panels, ETDS and SYNTH; x-axis: input cardinality n [k], y-axis: memory [MB]; curves: PTA with c=1%, SPG with c=1%]

Graph Implementation with Diagonal Pruning and Path Pruning substantially reduces space consumption

SLIDE 29

Space Efficiency: Effect of Path Pruning

[Figure: two panels, ETDS and SYNTH; x-axis: reduction size c [10^3], y-axis: memory [MB]; curves: SPG without Path Pruning, PTA, SPG]

Path Pruning has a huge pruning effect. It prunes about 2/3 of the graph

SLIDE 30

Related Work

◮ Tuma, P.: Implementing Historical Aggregates in TempIS. Ph.D. thesis, Wayne State University, Detroit, Michigan (1992)
◮ Kline, N., Snodgrass, R.T.: Computing temporal aggregates. In: ICDE. pp. 222-231 (1995)
◮ Moon, B., Vega Lopez, I.F., Immanuel, V.: Efficient algorithms for large-scale temporal aggregation. IEEE Trans. Knowl. Data Eng. 15(3), 744-759 (2003)
◮ Tao, Y., Papadias, D., Faloutsos, C.: Approximate temporal aggregation. In: ICDE. pp. 190-201 (2004)
◮ Gordevičius, J., Gamper, J., Böhlen, M.H.: Parsimonious temporal aggregation. VLDB J. 21(3), 309-332 (2012)

SLIDE 31

Conclusion

◮ Diagonal Pruning reduces the runtime of the computation by reducing the search space of the DP scheme adopted by PTA
◮ Split Point Graph in combination with Path Pruning reduces memory consumption
◮ Experiments showed that the two optimizations reduce memory requirements to one third of the original PTA implementation

SLIDE 32

Future Work

◮ Generalization of the split point graph to some classes of DP problems
◮ Computation of PTA queries while the ITA result is being computed, avoiding the two-step computation
