FTW: Fast Similarity Search under the Time Warping Distance


SLIDE 1

FTW: Fast Similarity Search under the Time Warping Distance

Yasushi Sakurai (NTT Cyber Space Labs) Masatoshi Yoshikawa (Nagoya Univ.) Christos Faloutsos (Carnegie Mellon Univ.)

SLIDE 2

PODS 2005

  • Y. Sakurai et al

Motivation

  • Time-series data
      • Many applications: computational biology, astrophysics, geology, meteorology, multimedia, economics
  • Similarity search
      • Euclidean distance
      • DTW (Dynamic Time Warping)
          • Useful for different sequence lengths
          • Different sampling rates
          • Scaling along the time axis

SLIDE 3

Mini-introduction to DTW

  • DTW allows sequences to be stretched along the time axis
      • Minimize the distance between sequences
      • Insert ‘stutters’ into a sequence
      • THEN compute the (Euclidean) distance

[Figure: an original sequence and the same sequence with ‘stutters’ inserted]

SLIDE 4

Mini-introduction to DTW

  • DTW is computed by dynamic programming
      • Warping path: set of grid cells in the time warping matrix

[Figure: time warping matrix for a data sequence P = (p1, …, pi, …, pN) of length N and a query sequence Q = (q1, …, qj, …, qM) of length M, showing p-stutters, q-stutters, and the optimum warping path (the best alignment)]

SLIDE 5

Mini-introduction to DTW

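  • DTW is computed by dynamic programming

$$ D_{dtw}(P, Q) = f(N, M) $$

$$ f(i, j) = \lVert p_i - q_j \rVert + \min \begin{cases} f(i, j-1) \\ f(i-1, j) \\ f(i-1, j-1) \end{cases} $$

The three alternatives cover a q-stutter, a p-stutter, and no stutter (the diagonal step f(i-1, j-1)).

f(i, j) is defined over the prefixes p1, p2, …, pi and q1, q2, …, qj.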
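
The recurrence on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `dtw_distance`, the absolute difference used as the local distance, and the boundary values (0 at the origin, infinity along the borders) are assumptions not stated on the slide.

```python
import math

def dtw_distance(p, q):
    """Exact DTW distance between two numeric sequences, in O(N * M) time."""
    n, m = len(p), len(q)
    # f[i][j]: cost of the best alignment of the prefixes p[:i] and q[:j].
    # Assumed boundary: f[0][0] = 0, every other border cell is infinite.
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(p[i - 1] - q[j - 1])  # assumed local distance
            # Two stutter steps and the diagonal (no stutter) step.
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    return f[n][m]

# Repeating the 2 is absorbed by a stutter, so the distance is zero.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

The full table is kept here for readability; only the previous row is needed if memory matters.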

SLIDE 6

Mini-introduction to DTW

  • Global constraints limit the warping scope
      • Warping scope: area that the warping path is allowed to visit

[Figure: two time warping matrices of P and Q, one showing the Sakoe-Chiba Band and one showing the Itakura Parallelogram]

SLIDE 7

Mini-introduction to DTW

  • Width of the warping scope W is user-defined

[Figure: Sakoe-Chiba Band on the time warping matrix of P and Q, drawn for two widths W1 and W2]
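
As a rough sketch of how a user-defined width restricts the computation, the code below limits the dynamic program to cells with |i - j| <= w, one common way to express a Sakoe-Chiba style band. The name `dtw_banded`, the parameter `w`, and the absolute-difference local distance are illustrative assumptions.

```python
import math

def dtw_banded(p, q, w):
    """DTW restricted to grid cells with |i - j| <= w (assumed band definition)."""
    n, m = len(p), len(q)
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        # Visit only the columns that lie inside the band around the diagonal.
        lo, hi = max(1, i - w), min(m, i + w)
        for j in range(lo, hi + 1):
            cost = abs(p[i - 1] - q[j - 1])
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    # Infinite when no warping path fits inside the band.
    return f[n][m]

print(dtw_banded([1, 2, 3], [1, 2, 2, 3], 1))  # 0.0
print(dtw_banded([1, 2, 3], [1, 2, 2, 3], 0))  # inf
```

With this band definition, a width smaller than the length difference of the two sequences leaves no admissible path, which is why the second call returns infinity.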

SLIDE 8

Motivation

  • Similarity search for time-series data
      • DTW (Dynamic Time Warping)
          • Scaling along the time axis

But…

  • High search cost O(NM)
  • Prohibitive for long sequences

SLIDE 9

Our Solution, FTW

  • Requirements:
  • 1. Fast
  • 2. No false dismissals
  • 3. No restriction on the sequence length
      • It should handle data sequences of different lengths
  • 4. Support for any restriction on the “warping scope”, as well as for none

SLIDE 10

Problem Definition

  • Given
      • S time-series data sequences of unequal lengths {P1, P2, …, PS},
      • a query sequence Q,
      • an integer k,
      • (optionally) a warping scope W,
  • Find the k-nearest neighbors of Q from the data sequence set by using DTW with W

SLIDE 11

Overview

  • Introduction
  • Related work
  • Main ideas
  • Experimental results
  • Conclusions

SLIDE 12

Related Work

  • Sequence indexing
      • Agrawal et al. (FODO 1998)
      • Keogh et al. (SIGMOD 2001)
      • …
  • Subsequence matching
      • Faloutsos et al. (SIGMOD 1994)
      • Moon et al. (SIGMOD 2002)
      • …

SLIDE 13

Related Work

  • Fast sequence matching for DTW
      • Yi et al. (ICDE 1998)
      • Kim et al. (ICDE 2001)
      • Chu et al. (SDM 2002)
      • Keogh (VLDB 2002)
      • Zhu et al. (SIGMOD 2003)
      • …
  • None of the existing methods for DTW fulfills all the requirements

SLIDE 14

Overview

  • Introduction
  • Related work
  • Main ideas
  • Experimental results
  • Conclusions

SLIDE 15

Main Idea (1) - LBS

  • LBS (Lower Bounding distance measure with Segmentation)
  • $P^A$: approximate sequences
      • $p_i^R = (p_i^L : p_i^U)$: segment range
      • $p_i^U$: upper value
      • $p_i^L$: lower value
      • t: length of time intervals*

[Figure: sequence P and its approximation $P^A$, made of four segment ranges $p_1^R$, $p_2^R$, $p_3^R$, $p_4^R$, each covering a time interval of length t]
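
A possible way to build such an approximate sequence is sketched below. The helper name `approximate` and the tuple layout (lower value, upper value, segment length) are hypothetical choices made for this illustration; the slide only specifies that each segment keeps its range and that intervals have length t.

```python
def approximate(seq, t):
    """Split seq into intervals of length t and keep each interval's value range.

    Returns one (lower, upper, length) tuple per segment; the last segment
    may be shorter than t when len(seq) is not a multiple of t.
    """
    segments = []
    for start in range(0, len(seq), t):
        chunk = seq[start:start + t]
        segments.append((min(chunk), max(chunk), len(chunk)))
    return segments

print(approximate([1, 3, 2, 5, 4], 2))  # [(1, 3, 2), (2, 5, 2), (4, 4, 1)]
```

Larger values of t give fewer, wider ranges, so the approximation becomes cheaper to compare but less precise.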

SLIDE 16

Main Idea (1) - LBS

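  • Compute lower bounding distance
      • Distance of the two ranges $p_i^R$ and $q_j^R$: distance of their two closest points

[Figure: two value-versus-time plots of a pair of ranges; in the first the ranges are separated by a positive lower bound, in the second they overlap and the lower bound = 0]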

SLIDE 17

Main Idea (1) - LBS

  • Compute lower bounding distance
      • Distance of the two ranges $p_i^R$ and $q_j^R$: distance of their two closest points

$$ D_{seg}(p_i^R, q_j^R) = \begin{cases} \lVert p_i^L - q_j^U \rVert & (p_i^L > q_j^U) \\ \lVert q_j^L - p_i^U \rVert & (q_j^L > p_i^U) \\ 0 & (\text{otherwise}) \end{cases} $$

details
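
The range distance can be written directly as a three-way case split. In this sketch the function name `d_seg` and the representation of a range as a (lower, upper) pair are assumptions, and the plain difference stands in for whatever norm the local distance uses.

```python
def d_seg(p_range, q_range):
    """Distance between the two closest points of two value ranges."""
    p_lo, p_hi = p_range
    q_lo, q_hi = q_range
    if p_lo > q_hi:        # p's range lies entirely above q's range
        return p_lo - q_hi
    if q_lo > p_hi:        # q's range lies entirely above p's range
        return q_lo - p_hi
    return 0               # the ranges overlap

print(d_seg((5, 7), (1, 3)))  # 2
print(d_seg((1, 4), (3, 6)))  # 0
```

Because every original value lies inside its segment range, this value can never exceed the distance between any two actual points drawn from the two segments.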

SLIDE 18

Main Idea (1) - LBS

  • Exact DTW distance

[Figure: sequences P and Q and their time warping matrix]

SLIDE 19

Main Idea (1) - LBS

  • Compute lower bounding distance from $P^A$ and $Q^A$
  • Use a dynamic programming approach

$$ D_{lbs}(P^A, Q^A) \le D_{dtw}(P, Q) $$

[Figure: approximate sequences $P^A$ and $Q^A$ and their coarse time warping matrix]

SLIDE 20

Main Idea (1) - LBS

  • Compute lower bounding distance from $P^A$ and $Q^A$
  • Use a dynamic programming approach

$$ D_{lbs}(P^A, Q^A) \le D_{dtw}(P, Q) $$

[Figure: coarse time warping matrix of $P^A$ and $Q^A$ alongside the original sequences P and Q]
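
The sketch below illustrates the lower-bounding property on a toy example. It is a deliberately conservative simplification, not the authors' exact definition: the slides do not spell out how segment lengths enter the recurrence, so each coarse cell is charged only once with its range distance. That weaker bound is still guaranteed not to exceed the exact DTW distance, because every coarse cell crossed by a warping path contains at least one fine cell whose cost is at least the range distance. All names (`approximate`, `d_seg`, `lbs`, `dtw`) and the absolute-difference local distance are assumptions.

```python
import math

def approximate(seq, t):
    """(lower, upper) value range of each length-t interval of seq."""
    return [(min(seq[s:s + t]), max(seq[s:s + t])) for s in range(0, len(seq), t)]

def d_seg(a, b):
    """Gap between two closed value ranges; 0 when they overlap."""
    return max(a[0] - b[1], b[0] - a[1], 0.0)

def _dp(n, m, cost):
    """DTW-style recurrence over an n x m grid with a per-cell cost function."""
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = cost(i - 1, j - 1) + min(g[i][j - 1], g[i - 1][j], g[i - 1][j - 1])
    return g[n][m]

def dtw(p, q):
    """Exact DTW distance with absolute difference as local cost (assumed)."""
    return _dp(len(p), len(q), lambda i, j: abs(p[i] - q[j]))

def lbs(pa, qa):
    """Conservative lower bound: every coarse cell is charged once."""
    return _dp(len(pa), len(qa), lambda i, j: d_seg(pa[i], qa[j]))

P = [0, 1, 3, 2, 5, 6, 4, 4]
Q = [5, 6, 8, 7, 9, 9]
for t in (1, 2, 4):
    bound = lbs(approximate(P, t), approximate(Q, t))
    print(t, bound, dtw(P, Q), bound <= dtw(P, Q))
```

With t = 1 every range collapses to a single point and the bound coincides with the exact distance; coarser intervals give cheaper but looser bounds.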

SLIDE 21

Main Idea (2) - EarlyStopping

  • Exploit the fact that we have already found k nearest-neighbor candidates at distance $d_{cb}$
      • $d_{cb}$: k-nearest neighbor distance (the Current Best), i.e. the exact distance of the k-th best candidate found so far

SLIDE 22

Main Idea (2) - EarlyStopping

  • Exclude useless warping paths by using $d_{cb}$
      • Omit g(1,3) if $g(1,2) > d_{cb}$
      • Omit g(4,1) if $g(3,1) > d_{cb}$

[Figure: coarse time warping matrix of $P^A$ and $Q^A$ with cells g(1,2) and g(3,1) marked]
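
One way to realize this pruning is sketched below, under stated assumptions: segment ranges are (lower, upper) pairs, each coarse cell is charged once with its range distance (a conservative simplification), and the function name `lbs_early_stop` is hypothetical. Since cell costs are non-negative, a cell whose cumulative value already exceeds the current best can be treated as unreachable, and cells that are only reachable through such cells are skipped.

```python
import math

def lbs_early_stop(pa, qa, d_cb):
    """Coarse DP that drops cells whose cumulative value exceeds d_cb.

    Returns the lower bound, or math.inf as soon as no warping path can
    stay within d_cb.
    """
    m = len(qa)
    prev = [0.0] + [math.inf] * m            # row 0 of the g matrix
    for a in pa:
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            best = min(cur[j - 1], prev[j], prev[j - 1])
            if best == math.inf:
                continue                     # all predecessors pruned: omit cell
            b = qa[j - 1]
            g = best + max(a[0] - b[1], b[0] - a[1], 0.0)
            if g <= d_cb:
                cur[j] = g                   # keep only cells within d_cb
        if all(v == math.inf for v in cur):
            return math.inf                  # whole row pruned: stop early
        prev = cur
    return prev[m]

pa = [(0, 1), (0, 1)]
qa = [(5, 6), (5, 6)]
print(lbs_early_stop(pa, qa, 10))  # 8.0
print(lbs_early_stop(pa, qa, 5))   # inf
```

In the second call g(1,1) = 4 survives, but every continuation exceeds 5, so the computation stops after the second row without producing a finite bound.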

SLIDE 23

Main Idea (3) - Refinement

  • Q: How to choose t (length of time intervals)?

[Figure: coarse time warping matrices of $P^A$ and $Q^A$ for two different interval lengths t, with cells g(1,2) and g(3,1) marked]

SLIDE 24

Main Idea (3) - Refinement

  • Q: How to choose t (length of intervals)?
  • A: Use multiple granularities, as follows:

[Figure: coarse time warping matrices of $P^A$ and $Q^A$ for two different interval lengths t, with cells g(1,2) and g(3,1) marked]

SLIDE 25

Main Idea (3) - Refinement

  • Compute the lower bounding distance from the coarsest sequences as the first refinement step
  • Ignore P if $D_{lbs}(P^A, Q^A) > d_{cb}$, otherwise:

[Figure: coarsest approximations $P^A$ and $Q^A$ and their time warping matrix, with cells g(1,2) and g(3,1) marked]

SLIDE 26

Main Idea (3) - Refinement

  • … compute the distance from more accurate sequences as the second refinement step
  • … repeat

[Figure: approximate sequences $P^A$ and $Q^A$ at a finer granularity]

SLIDE 27

Main Idea (3) - Refinement

  • … until the finest granularity
  • Update the list of k-nearest neighbors if $D_{dtw}(P, Q) \le d_{cb}$

[Figure: original sequences P and Q and their time warping matrix]
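
The refinement loop can be sketched end to end as follows. This is an illustration under several assumptions rather than the authors' algorithm: the lower bound charges each coarse cell once (a conservative simplification), approximations are recomputed on the fly instead of being stored as feature data, and the names `knn_search` and `intervals` are hypothetical.

```python
import heapq
import math

def _dp(n, m, cost):
    """DTW-style recurrence over an n x m grid with a per-cell cost function."""
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = cost(i - 1, j - 1) + min(g[i][j - 1], g[i - 1][j], g[i - 1][j - 1])
    return g[n][m]

def dtw(p, q):
    return _dp(len(p), len(q), lambda i, j: abs(p[i] - q[j]))

def approximate(seq, t):
    return [(min(seq[s:s + t]), max(seq[s:s + t])) for s in range(0, len(seq), t)]

def lbs(pa, qa):
    return _dp(len(pa), len(qa),
               lambda i, j: max(pa[i][0] - qa[j][1], qa[j][0] - pa[i][1], 0.0))

def knn_search(data, query, k, intervals=(8, 2)):
    """k-NN under DTW; `intervals` lists segment lengths, coarsest first."""
    q_approx = {t: approximate(query, t) for t in intervals}
    best = []  # max-heap through negated distances: (-distance, index)
    for idx, p in enumerate(data):
        d_cb = -best[0][0] if len(best) == k else math.inf
        # Refinement steps: any() stops at the first bound that exceeds d_cb.
        if any(lbs(approximate(p, t), q_approx[t]) > d_cb for t in intervals):
            continue
        d = dtw(p, query)  # finest granularity: the exact distance
        if len(best) < k:
            heapq.heappush(best, (-d, idx))
        elif d <= d_cb:
            heapq.heapreplace(best, (-d, idx))
    return sorted((-neg, idx) for neg, idx in best)

data = [[1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 2, 3, 4, 5, 6, 7, 8], [0] * 8, [10] * 8]
print(knn_search(data, [1, 2, 3, 4, 5, 6, 7, 8, 8], k=2))
```

Because a sequence is skipped only when a lower bound already exceeds the current best, the result matches an exhaustive scan, which is the "no false dismissals" requirement.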

SLIDE 28

Overview

  • Introduction
  • Related work
  • Main ideas
  • Experimental results
  • Conclusions

SLIDE 29

Experimental results

  • Setup
      • Intel Xeon 2.8GHz, 1GB memory, Linux
      • Datasets: Temperature, Fintime, RandomWalk
      • Four different time intervals (for n=2048): t1=2, t2=8, t3=32, t4=128
  • Evaluation
      • Compared FTW with LB_PAA (the best so far)
      • Mainly computation time

SLIDE 30

Outline of experiments

  • Speed vs db size
  • Speed vs warping scope W
  • Effect of filtering
  • Effect of varying-length data sequences

SLIDE 31

Search Performance

  • Itakura Parallelogram

[Figure: Itakura Parallelogram on the time warping matrix of P and Q]

SLIDE 32

Search Performance

  • Wall clock time as a function of data set size
  • Temperature

FTW is up to 50 times faster!

SLIDE 33

Search Performance

  • Wall clock time as a function of data set size
  • Fintime

FTW is up to 40 times faster!

SLIDE 34

Search Performance

  • Wall clock time as a function of data set size
  • RandomWalk

FTW is up to 40 times faster!
More effective as the size grows

SLIDE 35

Outline of experiments

  • Speed vs db size
  • Speed vs warping scope W
  • Effect of filtering
  • Effect of varying-length data sequences

SLIDE 36

Search Performance

  • Sakoe-Chiba Band

[Figure: Sakoe-Chiba Band on the time warping matrix of P and Q, drawn for two widths W1 and W2]

SLIDE 37

Search Performance

  • Wall clock time as a function of warping scope
  • Temperature

FTW is up to 220 times faster!

SLIDE 38

Search Performance

  • Wall clock time as a function of warping scope
  • Fintime

FTW is up to 70 times faster!

SLIDE 39

Search Performance

  • Wall clock time as a function of warping scope
  • RandomWalk

FTW is up to 100 times faster!

SLIDE 40

Outline of experiments

  • Speed vs db size
  • Speed vs warping scope W
  • Effect of filtering
  • Effect of varying-length data sequences

SLIDE 41

Effect of filtering

  • Most of the data sequences are excluded by the coarser approximations (t4=128 and t3=32)
      • Using multiple granularities has significant advantages

[Figure: frequency of approximation use]

SLIDE 42

Outline of experiments

  • Speed vs db size
  • Speed vs warping scope W
  • Effect of filtering
  • Effect of varying-length sequences

SLIDE 43

Difference in Sequence Lengths

  • 5 sequence data sets
      • Random(2048,0): length 2048 +/- 0
      • Random(2048,32): length 2048 +/- 16
      • Random(2048,64), Random(2048,128), Random(2048,256)

FTW outperforms LB_PAA by 2+ orders of magnitude
LB_PAA cannot handle data sets whose sequence lengths differ

SLIDE 44

Overview

  • Introduction
  • Related work
  • Main ideas
  • Experimental results
  • Conclusions

SLIDE 45

Conclusions

Design goals:

  • 1. Fast
  • 2. No false dismissals
  • 3. No restriction on the sequence length
  • 4. Support for any restriction on the “warping scope”, as well as for none

SLIDE 46

Conclusions

Design goals:

  • 1. Fast (up to 220 times faster)
  • 2. No false dismissals
  • 3. No restriction on the sequence length
  • 4. Support for any restriction on the “warping scope”, as well as for none

SLIDE 47

Page Accesses

  • Sequential scan of feature data should boost performance (speed-up factors SF=5, SF=10)

$PA_{ds}$: page accesses for data sequences
$PA_{fd}$: page accesses for feature data

$$ PA_{SF} = PA_{ds} + \frac{PA_{fd}}{SF} $$

details
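
As a small worked example, and assuming the formula on this slide reads as "page accesses for data sequences plus page accesses for feature data divided by the speed-up factor", the adjusted count can be computed as below. The function name and the sample numbers are made up for illustration and are not results from the talk.

```python
def adjusted_page_accesses(pa_ds, pa_fd, sf):
    """Page accesses with sequentially scanned feature data discounted by sf.

    Assumed reading of the slide: PA_SF = PA_ds + PA_fd / SF.
    """
    return pa_ds + pa_fd / sf

# Hypothetical counts: 100 data-sequence pages and 50 feature-data pages.
for sf in (5, 10):
    print(sf, adjusted_page_accesses(100, 50, sf))
```

Under this reading, a larger speed-up factor shrinks only the feature-data term, so the data-sequence accesses dominate once SF is large.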