SLIDE 1

Stream Monitoring under the Time Warping Distance

Yasushi Sakurai (NTT Cyber Space Labs), Christos Faloutsos (Carnegie Mellon Univ.), Masashi Yamamuro (NTT Cyber Space Labs)

SLIDE 2

ICDE 2007

Y. Sakurai et al

2

Introduction

- Data-stream applications
  - Network analysis
  - Sensor monitoring
  - Financial data analysis
  - Moving object tracking
- Goal
  - Monitor numerical streams
  - Find subsequences similar to the given query sequence
  - Distance measure: Dynamic Time Warping (DTW)

SLIDE 3

Introduction

- DTW is computed by dynamic programming
  - Stretch sequences along the time axis to minimize the distance
  - Warping path: set of grid cells in the time warping matrix

[Figure: sequences X = (x1, …, xi, …, xN) and Y = (y1, …, yj, …, yM) and their time warping matrix, with the optimal warping path (the best alignment) marked]
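A minimal sketch of the dynamic program, assuming a squared-difference cost between individual values (the worked example later in the deck is consistent with this, e.g. (5 − 11)² = 36) and the usual boundary conditions f(0, 0) = 0 with infinity elsewhere on the border. The name `dtw_distance` is illustrative and does not come from the slides.

```python
import math

def dtw_distance(x, y):
    """DTW distance between two numeric sequences (squared-difference cost)."""
    n, m = len(x), len(y)
    # f[i][j] = best cumulative cost of aligning x[:i] with y[:j]
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three neighbouring alignments.
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    return f[n][m]

print(dtw_distance([12, 6, 10, 6], [11, 6, 9, 4]))  # prints 6.0
```

The full (n+1) × (m+1) matrix is kept here only for readability; a single column suffices when only the distance is needed.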

SLIDE 4

Related Work

- Sequence indexing, subsequence matching
  - Agrawal et al. (FODO 1998)
  - Keogh et al. (SIGMOD 2001)
  - Faloutsos et al. (SIGMOD 1994)
  - Moon et al. (SIGMOD 2002)
- Fast sequence matching for DTW
  - Yi et al. (ICDE 1998)
  - Keogh (VLDB 2002)
  - Zhu et al. (SIGMOD 2003)
  - Sakurai et al. (PODS 2005)

SLIDE 5

Related Work

- Data stream processing for pattern discovery
  - Clustering for data streams: Guha et al. (TKDE 2003)
  - Monitoring multiple streams: Zhu et al. (VLDB 2002)
  - Forecasting: Papadimitriou et al. (VLDB 2003)
  - Detecting lag correlations: Sakurai et al. (SIGMOD 2005)
- DTW has been studied for finite, stored sequence sets
- We address a new problem for DTW

SLIDE 6

Overview

- Introduction / Related work
- Problem definition
- Main ideas
- Experimental results

SLIDE 7

Problem Definition

- Subsequence matching for data streams
  - (Fixed-length) query sequence Y = (y1, y2, …, ym)
  - Sequence (data stream) X = (x1, x2, …, xn)
  - Find all subsequences X[ts:te] such that $D(X[t_s : t_e], Y) \le \varepsilon$

SLIDE 8

Subsequence Matching

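[Figure: the stream X = (x1, …, xn) and the query Y = (y1, …, ym); the matching subsequence X[ts:te] = (xts, …, xte) is marked. Labels: "Other similar subsequences", "Redundant, useless subsequences"]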

SLIDE 9

Problem Definition

- Subsequence matching for data streams
  - (Fixed-length) query sequence Y
  - Sequence (data stream) X = (x1, x2, …, xn)
  - Find all subsequences X[ts:te] such that $D(X[t_s : t_e], Y) \le \varepsilon$
- Multiple matches by subsequences which heavily overlap with the "local minimum" best match do double harm:
  - Flood the user with redundant information
  - Slow down the algorithm by forcing it to keep track of and report all these useless "solutions"
- Eliminate the redundant subsequences, and report only the "optimal" ones

SLIDE 10

Problem Definition

- Problem: Disjoint query
  - Given a threshold ε, report all X[ts:te] such that
    1. $D(X[t_s : t_e], Y) \le \varepsilon$
    2. Only the local minimum: $D(X[t_s : t_e], Y)$ is the smallest value in the group of overlapping subsequences that satisfy the first condition
- Additional challenges: streaming solution
  - Process a new value of X efficiently
  - Guarantee no false dismissals
  - Report each match as early as possible

SLIDE 11

Overview

- Introduction / Related work
- Problem definition
- Main ideas
- Experimental results

SLIDE 12

Why not 'naive'?

- Compute the time warping matrices starting from every time-tick
  - Need O(n) matrices, O(nm) time per time-tick
- Disjoint query
  - Compute all the possible subsequences and then choose the optimal ones

[Figure: time warping matrix of Y against the stream X from x1, with the subsequence xts to xte marked. Caption: "Capture the optimal subsequence starting from t = ts"]
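The cost of the naive approach can be seen in a short sketch. It assumes a squared-difference cost and keeps only the latest column of each matrix, so the per-tick work still grows with the number of start positions seen so far. The function name `naive_monitor` and the `(ts, te, distance)` tuple layout are illustrative choices, not taken from the slides.

```python
import math

def naive_monitor(stream, query, eps):
    """Naive baseline: keep one time warping matrix per start position."""
    columns = []   # columns[k]: latest DP column of the matrix starting at tick k + 1
    matches = []   # (ts, te, distance) for every subsequence within eps
    for t, x in enumerate(stream, start=1):
        columns.append(None)              # open a new matrix that starts at tick t
        for k, prev in enumerate(columns):
            cur = []
            for i, y in enumerate(query):
                cost = (x - y) ** 2
                if prev is None:          # first column of this matrix
                    best = 0.0 if i == 0 else cur[i - 1]
                elif i == 0:              # bottom row: only the previous tick
                    best = prev[0]
                else:
                    best = min(cur[i - 1], prev[i], prev[i - 1])
                cur.append(cost + best)
            columns[k] = cur
            if cur[-1] <= eps:
                matches.append((k + 1, t, cur[-1]))
    return matches

matches = naive_monitor([5, 12, 6, 10, 6, 5, 13], [11, 6, 9, 4], eps=20)
print(min(matches, key=lambda mt: mt[2]))  # prints (2, 5, 6.0)
```

Every qualifying subsequence is reported, including the heavily overlapping ones, which is exactly the redundancy the disjoint query is meant to remove.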

SLIDE 13

Main idea (1)

- Star-padding
  - Use only a single matrix (the naïve solution uses n matrices)
  - Prefix Y with '*', which always gives zero distance
  - Instead of Y = (y1, y2, …, ym), compute distances with Y′:
    $Y' = (y_0, y_1, y_2, \ldots, y_m), \quad y_0 = (-\infty : +\infty)$
  - O(m) time and space (the naïve solution requires O(nm))
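Star-padding can be sketched as a single DP column whose extra cell at index 0 is always zero, so a warping path may begin at any time-tick. The sketch assumes a squared-difference cost; `star_padded_distances` is a hypothetical name chosen for this example.

```python
import math

def star_padded_distances(stream, query):
    """For each tick t, the smallest DTW distance of any subsequence ending at t."""
    m = len(query)
    # d[0] is the star-padded cell y0 (always zero); d[1..m] start at infinity.
    d = [0.0] + [math.inf] * m
    result = []
    for x in stream:
        new = [0.0] * (m + 1)
        for i in range(1, m + 1):
            cost = (x - query[i - 1]) ** 2
            new[i] = cost + min(new[i - 1], d[i], d[i - 1])
        d = new                 # only the latest column is kept: O(m) space
        result.append(d[m])
    return result

print(star_padded_distances([5, 12, 6, 10, 6, 5, 13], [11, 6, 9, 4]))
# prints [54.0, 110.0, 14.0, 38.0, 6.0, 7.0, 88.0]
```

The output matches the top row (y4) of the worked matrix in the deck, but it says nothing about where each match began.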

SLIDE 14

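SPRING

- Every cell of the bottom row starts at zero distance

[Figure: single time warping matrix of Y′ against the stream X from t = 1; the match X[ts:te] (t = ts to t = te) is reported, followed by a second subsequence]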

SLIDE 15

Main idea (2)

- STWM (Subsequence Time Warping Matrix)
  - Problem with star-padding: we lose the information about the starting time-tick of the match
  - After the scan, "which is the optimal subsequence?"
- Elements of STWM
  - Distance value of each subsequence
  - Starting position
- Combination of star-padding and STWM
  - Efficiently identify the optimal subsequence in a streaming fashion
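A sketch of the STWM update: each cell carries a (distance, starting position) pair, and the starting position is inherited from whichever neighbour supplies the minimum. The tie-breaking order used here (same tick first, then previous tick, then diagonal) is an assumption, chosen because it reproduces the starting positions in the deck's worked example. `stwm_columns` is an illustrative name.

```python
import math

def stwm_columns(stream, query):
    """Yield, per tick t, the STWM column as a list of (distance, start) pairs."""
    m = len(query)
    dist = [0.0] + [math.inf] * m   # index 0 is the star-padded cell
    start = [0] * (m + 1)
    for t, x in enumerate(stream, start=1):
        new_dist = [0.0] * (m + 1)
        new_start = [t] * (m + 1)   # a path leaving the star cell starts at tick t
        for i in range(1, m + 1):
            candidates = [
                (new_dist[i - 1], new_start[i - 1]),  # same tick, previous query element
                (dist[i], start[i]),                  # previous tick, same query element
                (dist[i - 1], start[i - 1]),          # diagonal
            ]
            # min() returns the first minimal candidate, which fixes the tie order.
            best_d, best_s = min(candidates, key=lambda c: c[0])
            new_dist[i] = (x - query[i - 1]) ** 2 + best_d
            new_start[i] = best_s
        dist, start = new_dist, new_start
        yield list(zip(dist[1:], start[1:]))

cols = list(stwm_columns([5, 12, 6, 10, 6, 5, 13], [11, 6, 9, 4]))
print(cols[4])  # column t = 5, listed from y1 up to y4
```

For t = 5 the last pair is (6.0, 2): the best subsequence ending at tick 5 has distance 6 and started at tick 2.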

SLIDE 16

Main idea (3)

- Algorithm for disjoint queries
- Designed to:
  - Guarantee no false dismissals
  - Report each match as early as possible

SLIDE 17

Algorithm for disjoint queries

1. Update m elements (distance and starting position) at every time-tick
2. Keep track of the minimum distance dmin when a subsequence within ε is found
3. Report the subsequence that gives dmin if (a) and (b) are satisfied
   (a) the captured optimal subsequence cannot be replaced by the upcoming subsequences
   (b) the upcoming subsequences do not overlap with the captured optimal subsequence
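The three steps can be sketched as one loop over the stream. This is a reading of the slides, not the authors' code: it assumes a squared-difference cost, tests conditions (a) and (b) together as "every cell either has distance at least dmin or starts after te", and resets to infinity the cells that overlap a reported match. The name `spring` and the report tuple layout are illustrative.

```python
import math

def spring(stream, query, eps):
    """Report disjoint best matches as (ts, te, distance, report_tick) tuples."""
    m = len(query)
    dist = [0.0] + [math.inf] * m
    start = [0] * (m + 1)
    d_min, ts, te = math.inf, 0, 0
    reports = []
    for t, x in enumerate(stream, start=1):
        # Step 1: update the m (distance, starting position) elements.
        new_dist, new_start = [0.0] * (m + 1), [t] * (m + 1)
        for i in range(1, m + 1):
            best_d, best_s = min(
                [(new_dist[i - 1], new_start[i - 1]),
                 (dist[i], start[i]),
                 (dist[i - 1], start[i - 1])],
                key=lambda c: c[0])
            new_dist[i] = (x - query[i - 1]) ** 2 + best_d
            new_start[i] = best_s
        dist, start = new_dist, new_start
        # Step 3: report the captured candidate once (a) and (b) both hold.
        if d_min <= eps and all(dist[i] >= d_min or start[i] > te
                                for i in range(1, m + 1)):
            reports.append((ts, te, d_min, t))
            d_min = math.inf
            for i in range(1, m + 1):
                if start[i] <= te:      # overlaps the reported match: reset
                    dist[i] = math.inf
        # Step 2: track the minimum distance among matches within eps.
        if dist[m] <= eps and dist[m] < d_min:
            d_min, ts, te = dist[m], start[m], t
    return reports

print(spring([5, 12, 6, 10, 6, 5, 13], [11, 6, 9, 4], eps=20))
# prints [(2, 5, 6.0, 7)]
```

On the deck's example this reports X[2:5] with distance 6 at t = 7. A candidate still pending when the stream ends is not flushed in this sketch.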

SLIDE 18

Algorithm for disjoint queries

- Distance (upper number), starting position (number in parentheses)
- X = (5, 12, 6, 10, 6, 5, 13), Y = (11, 6, 9, 4), ε = 20

y4 = 4     54 (1)   110 (2)   14 (2)   38 (2)    6 (2)    7 (2)   88 (2)
y3 = 9     53 (1)    46 (2)   10 (2)    2 (2)   10 (4)   17 (4)   18 (4)
y2 = 6     37 (1)    37 (2)    1 (2)   17 (4)    1 (4)    2 (4)   51 (4)
y1 = 11    36 (1)     1 (2)   25 (3)    1 (4)   25 (5)   36 (6)    4 (7)
xt          5        12        6       10        6        5       13
t           1         2        3        4        5        6        7

SLIDE 19

Algorithm for disjoint queries

- Distance (upper number), starting position (number in parentheses)
- X = (5, 12, 6, 10, 6, 5, 13), Y = (11, 6, 9, 4), ε = 20
- Optimal subsequence, redundant subsequences (highlighted in the matrix)

[Same matrix as on slide 18]

SLIDE 22

Algorithm for disjoint queries

- Guarantee to report the optimal subsequence
  (a) The captured optimal subsequence cannot be replaced
  (b) The upcoming subsequences do not overlap with the captured optimal subsequence

[Same matrix as on slide 18]

SLIDE 23

Algorithm for disjoint queries

- Guarantee to report the optimal subsequence
  - Finally report the optimal subsequence X[2:5] at t = 7
  - Re-initialize the distance values whose starting positions overlap the reported subsequence (d2 = 51, d3 = 18, d4 = 88)

[Same matrix as on slide 18]

SLIDE 24

Overview

- Introduction / Related work
- Problem definition
- Main ideas
- Experimental results

SLIDE 25

Experimental Results

- Experiments with real and synthetic data sets
  - MaskedChirp, Temperature, Kursk, Sunspots
- Evaluation
  - Accuracy for pattern discovery
  - Computation time
  - (Memory space consumption)

SLIDE 26

Pattern Discovery

- MaskedChirp

[Figure: query sequence and data stream]

SLIDE 27

Pattern Discovery

- MaskedChirp
  - SPRING identifies all sound parts with varying time periods
  - The output time of each captured subsequence is very close to its end position

[Figure: query sequence and data stream]

SLIDE 28

Pattern Discovery

- Temperature

[Figure: query sequence and data stream]

SLIDE 29

Pattern Discovery

- Temperature
  - SPRING finds the days when the temperature fluctuates from cool to hot

[Figure: query sequence and data stream]

SLIDE 30

Pattern Discovery

- Kursk

[Figure: query sequence and data stream]

SLIDE 31

Pattern Discovery

- Kursk
  - SPRING is not affected by the difference in the environmental conditions

[Figure: query sequence and data stream]

SLIDE 32

Pattern Discovery

- Sunspots

[Figure: query sequence and data stream]

SLIDE 33

Pattern Discovery

- Sunspots
  - SPRING can capture bursty periods and identify the time-varying periodicity

[Figure: query sequence and data stream]

SLIDE 34

Computation time

- Wall clock time per time-tick
  - Naïve method: O(nm)
  - SPRING: O(m), independent of the sequence length n

SLIDE 35

Extension to multiple streams

- Motion capture data
  - Place special markers on the joints of a human actor
  - Record their x-, y-, z-velocities
  - Use 16-dimensional sequences
  - Capture motions based on the similarity of rotational energy

$E_{rotation} = \frac{1}{2} I \omega^2$

where $E_{rotation}$ is the rotational energy, $I$ the moment of inertia, and $\omega$ the angular velocity

SLIDE 36

High-speed Motion Capture

SLIDE 37

High-speed Motion Capture

- Recognize all motions in a streaming fashion
  - Entertainment applications, etc.

[Figure: motion stream segmented into Walk, Swing, Rotate, Swing, Rotate, One-leg jump, Jump, Walk, Run, Walk]

SLIDE 38

Conclusions

- Subsequence matching under the DTW distance over data streams
  1. High speed and low memory consumption
     - O(m) time and space; independent of n
  2. Accuracy
     - Guarantees no false dismissals
- Stored data sets
  - SPRING can be applied to stored sequence sets

SLIDE 39

Appendix

SLIDE 40

Mini-introduction to DTW

- DTW allows sequences to be stretched along the time axis
  - Minimize the distance of sequences
  - Insert 'stutters' into a sequence
  - THEN compute the (Euclidean) distance

[Figure: the original sequence and the same sequence with 'stutters' inserted]

SLIDE 41

Mini-introduction to DTW

- DTW is computed by dynamic programming
  - Warping path: set of grid cells in the time warping matrix

[Figure: time warping matrix for a data sequence P = (p1, …, pi, …, pN) of length N and a query sequence Q = (q1, …, qj, …, qM) of length M, showing p-stutters, q-stutters, and the optimum warping path (the best alignment)]

SLIDE 42

Mini-introduction to DTW

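- DTW is computed by dynamic programming over the prefixes p1, p2, …, pi and q1, q2, …, qj

$$D_{dtw}(P, Q) = f(N, M)$$

$$f(i, j) = \lVert p_i - q_j \rVert + \min \begin{cases} f(i, j-1) \\ f(i-1, j) \\ f(i-1, j-1) \end{cases}$$

The diagonal term f(i-1, j-1) is the no-stutter step; the other two terms are the p-stutter and q-stutter steps.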
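The warping path itself can be recovered by walking back through the matrix. The sketch below assumes a squared-difference cost, non-empty inputs, and the standard boundary conditions f(0, 0) = 0 with infinity elsewhere on the border; `dtw_with_path` is an illustrative name. A repeated i in the path means p_i is matched to several query elements, and a repeated j means the same for q_j.

```python
import math

def dtw_with_path(p, q):
    """Return (distance, path); path holds 1-based (i, j) cells from (1, 1) to (N, M)."""
    n, m = len(p), len(q)
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (p[i - 1] - q[j - 1]) ** 2
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    # Walk back from (N, M), always stepping to the cheapest predecessor.
    path, i, j = [(n, m)], n, m
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda c: f[c[0]][c[1]])
        path.append((i, j))
    path.reverse()
    return f[n][m], path

print(dtw_with_path([1, 2, 3], [1, 2, 2, 3]))
# prints (0.0, [(1, 1), (2, 2), (2, 3), (3, 4)])
```

In the printed path, i = 2 appears twice: the value p2 = 2 is stretched to cover both q2 and q3, which gives a distance of zero.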

SLIDE 43

Pattern Discovery

- Humidity

[Figure: query sequence and data stream]

SLIDE 45

Two Algorithms of SPRING

- SPRING-optimal

[Figure: query sequence and the matches found for ε = 10,000 and ε = 15,000]

SLIDE 46

Two Algorithms of SPRING

- SPRING-first

[Figure: query sequence and the matches found for ε = 10,000 and ε = 15,000]

SLIDE 47

Memory space consumption

- Memory space for time warping matrix (matrices)
  - Naïve method: O(nm)
  - SPRING: O(m), independent of the sequence length n
  - SPRING (path): clearly lower than that of the naïve method