FTW: Fast Similarity Search under the Time Warping Distance
Yasushi Sakurai (NTT Cyber Space Labs)
Masatoshi Yoshikawa (Nagoya Univ.)
Christos Faloutsos (Carnegie Mellon Univ.)
PODS 2005
- Y. Sakurai et al
2
Motivation
- Time-series data
  - Many applications: computational biology, astrophysics, geology, meteorology, multimedia, economics
- Similarity search
  - Euclidean distance
  - DTW (Dynamic Time Warping)
    - Useful for different sequence lengths
    - Different sampling rates
    - Scaling along the time axis
Mini-introduction to DTW
- DTW allows sequences to be stretched along the time axis
  - Minimize the distance between the sequences
  - Insert ‘stutters’ into a sequence
  - THEN compute the (Euclidean) distance
[Figure: the original sequence and the same sequence with ‘stutters’ inserted]
Mini-introduction to DTW
- DTW is computed by dynamic programming
  - Warping path: set of grid cells in the time warping matrix
[Figure: time warping matrix for a data sequence P = (p1, …, pi, …, pN) of length N and a query sequence Q = (q1, …, qj, …, qM) of length M, showing p-stutters, q-stutters, and the optimum warping path (the best alignment)]
Mini-introduction to DTW
- DTW is computed by dynamic programming over the prefixes p1, p2, …, pi and q1, q2, …, qj

$$D_{dtw}(P, Q) = f(N, M)$$

$$f(i, j) = \lVert p_i - q_j \rVert + \min \{\, f(i, j-1),\; f(i-1, j),\; f(i-1, j-1) \,\}$$

- Each candidate inside the minimum corresponds to one of three moves: q-stutter, no stutter, or p-stutter
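The dynamic program for the DTW distance can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the squared difference as the element distance and the usual boundary conditions f(0,0) = 0 and f(i,0) = f(0,j) = ∞, neither of which is spelled out on the slide. The name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(p, q):
    """DTW distance between numeric sequences p and q.

    Assumptions (not from the slides): squared difference as the element
    distance; f(0, 0) = 0 and every other border cell is infinite.
    Runs in O(N*M) time and keeps only two rows of the matrix.
    """
    n, m = len(p), len(q)
    prev = [0.0] + [math.inf] * m          # row i = 0 of the warping matrix
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)         # cur[0] is the border cell f(i, 0)
        for j in range(1, m + 1):
            cost = (p[i - 1] - q[j - 1]) ** 2
            # Three candidate predecessors: f(i, j-1), f(i-1, j), f(i-1, j-1).
            cur[j] = cost + min(cur[j - 1], prev[j], prev[j - 1])
        prev = cur
    return prev[m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0, because the repeated 2 is absorbed by a stutter even though the lengths differ.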
Mini-introduction to DTW
- Global constraints limit the warping scope
  - Warping scope: area that the warping path is allowed to visit
[Figure: warping scope in the time warping matrix of P and Q under the Itakura Parallelogram and under the Sakoe-Chiba Band]
Mini-introduction to DTW
- Width of the warping scope W is user-defined
[Figure: Sakoe-Chiba Band in the time warping matrix of P and Q for two widths, W1 and W2]
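A warping scope can be added to the dynamic program by visiting only the cells inside it. The sketch below assumes a Sakoe-Chiba Band defined as |i - j| ≤ w on the cell indices, which is a common formulation but is only shown as a figure in the slides. The function name `dtw_band` and the squared-difference element distance are likewise assumptions.

```python
import math


def dtw_band(p, q, w):
    """DTW restricted to cells (i, j) with |i - j| <= w.

    Returns math.inf when no warping path fits inside the band, which
    happens whenever the lengths differ by more than w.
    """
    n, m = len(p), len(q)
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        # Only the columns inside the band are computed for this row.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (p[i - 1] - q[j - 1]) ** 2
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    return f[n][m]
```

With `p = [0, 0, 1, 2]` and `q = [0, 1, 2, 2]`, a band of width 0 forces the diagonal alignment and gives 2, while width 1 allows the stutters needed to reach 0.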
Motivation
- Similarity search for time-series data
  - DTW (Dynamic Time Warping)
    - Scaling along the time axis
- But…
  - High search cost: O(NM)
  - Prohibitive for long sequences
Our Solution, FTW
- Requirements:
  1. Fast
  2. No false dismissals
  3. No restriction on the sequence length
     - It should handle data sequences of different lengths
  4. Support for any restriction, as well as for no restriction, on the “warping scope”
Problem Definition
- Given
  - S time-series data sequences of unequal lengths {P1, P2, …, PS},
  - a query sequence Q,
  - an integer k,
  - (optionally) a warping scope W,
- Find the k-nearest neighbors of Q from the data sequence set by using DTW with W
Overview
- Introduction
- Related work
- Main ideas
- Experimental results
- Conclusions
Related Work
- Sequence indexing
  - Agrawal et al. (FODO 1998)
  - Keogh et al. (SIGMOD 2001)
  - …
- Subsequence matching
  - Faloutsos et al. (SIGMOD 1994)
  - Moon et al. (SIGMOD 2002)
  - …
Related Work
- Fast sequence matching for DTW
  - Yi et al. (ICDE 1998)
  - Kim et al. (ICDE 2001)
  - Chu et al. (SDM 2002)
  - Keogh (VLDB 2002)
  - Zhu et al. (SIGMOD 2003)
  - …
- None of the existing methods for DTW fulfills all the requirements
Main Idea (1) - LBS
- LBS (Lower Bounding distance measure with Segmentation)
- $P^A$: approximate sequence
  - $p_i^R = (p_i^L : p_i^U)$: segment range
  - $p_i^U$: upper value
  - $p_i^L$: lower value
  - t: length of time intervals*
[Figure: a sequence P approximated as $P^A$ by four segment ranges $p_1^R$, $p_2^R$, $p_3^R$, $p_4^R$, each covering a time interval of length t]
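Building the approximate sequence amounts to recording the lowest and highest value in each time interval. The following sketch assumes fixed, non-overlapping intervals of length t and lets the last interval be shorter when the length is not a multiple of t (the slides do not say how the remainder is handled). The name `approximate` is hypothetical.

```python
def approximate(seq, t):
    """Return the segment ranges (lower, upper) of seq for interval length t.

    Each range covers t consecutive values; the final range may cover
    fewer (an assumption made for this sketch).
    """
    if t < 1:
        raise ValueError("t must be a positive integer")
    return [(min(seq[s:s + t]), max(seq[s:s + t]))
            for s in range(0, len(seq), t)]


print(approximate([1, 3, 2, 5, 4, 4, 0], 2))
# [(1, 3), (2, 5), (4, 4), (0, 0)]
```

With t = 1 every range collapses to a single point, so the approximation is the sequence itself.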
Main Idea (1) - LBS
- Compute lower bounding distance
  - Distance of the two ranges $p_i^R$ and $q_j^R$: distance of their two closest points
[Figure: two pairs of ranges plotted as value over time; for disjoint ranges the lower bound is the gap between them, for overlapping ranges the lower bound = 0]
Main Idea (1) - LBS
- Compute lower bounding distance
  - Distance of the two ranges $p_i^R$ and $q_j^R$: distance of their two closest points

$$D_{seg}(p_i^R, q_j^R) = \begin{cases} \lVert p_i^L - q_j^U \rVert & (p_i^L > q_j^U) \\ \lVert q_j^L - p_i^U \rVert & (q_j^L > p_i^U) \\ 0 & (\text{otherwise}) \end{cases}$$
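The distance between two ranges translates directly into a three-way case split. This sketch represents a range as a `(lower, upper)` tuple and uses the squared gap as the distance between the two closest points. Both choices are assumptions made for illustration, as is the name `d_seg`.

```python
def d_seg(p_range, q_range):
    """Distance between the two closest points of two value ranges.

    Ranges are (lower, upper) tuples. The result is 0 when they overlap.
    """
    p_lower, p_upper = p_range
    q_lower, q_upper = q_range
    if p_lower > q_upper:                  # p lies entirely above q
        return (p_lower - q_upper) ** 2
    if q_lower > p_upper:                  # q lies entirely above p
        return (q_lower - p_upper) ** 2
    return 0                               # the ranges overlap


print(d_seg((5, 7), (1, 3)))  # 4
print(d_seg((1, 5), (4, 8)))  # 0
```

Because no point of one range can be closer to a point of the other than this gap, the value never exceeds the distance between any pair of original values drawn from the two segments.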
Main Idea (1) - LBS
- Exact DTW distance
[Figure: time warping matrix over the original sequences P and Q]
Main Idea (1) - LBS
- Compute lower bounding distance from $P^A$ and $Q^A$
- Use a dynamic programming approach

$$D_{lbs}(P^A, Q^A) \le D_{dtw}(P, Q)$$

[Figure: time warping matrix over the approximate sequences $P^A$ and $Q^A$]
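A lower bound of this kind can be sketched by running the same dynamic program over segment ranges instead of points. The slides do not give the recurrence, so the version below is a simplified, unweighted variant: each coarse cell contributes its range distance once, regardless of segment length. It is still a valid lower bound, because every exact warping path crosses at least one point pair in each coarse cell it visits, but it may be looser than the published LBS. All names are hypothetical.

```python
import math


def approximate(seq, t):
    """Segment ranges (lower, upper) for interval length t."""
    return [(min(seq[s:s + t]), max(seq[s:s + t]))
            for s in range(0, len(seq), t)]


def d_seg(p_range, q_range):
    """Squared gap between two (lower, upper) ranges, 0 if they overlap."""
    gap = max(p_range[0] - q_range[1], q_range[0] - p_range[1], 0.0)
    return gap ** 2


def lbs_distance(pa, qa):
    """Unweighted lower bound on DTW computed from segment ranges."""
    n, m = len(pa), len(qa)
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = d_seg(pa[i - 1], qa[j - 1]) + min(
                g[i][j - 1], g[i - 1][j], g[i - 1][j - 1])
    return g[n][m]
```

For P = [0, 0, 0, 0] and Q = [5, 5, 5, 5] with t = 2, the bound is 50 while the exact squared-difference DTW distance is 100. With t = 1 the two coincide.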
Main Idea (2) - EarlyStopping
- Exploit the fact that we have already found k near neighbors within distance $d_{cb}$
  - $d_{cb}$: k-nearest neighbor distance (the Current Best), i.e. the largest exact distance among the best k candidates found so far
Main Idea (2) - EarlyStopping
- Exclude useless warping paths by using $d_{cb}$
  - Omit g(1,3) if g(1,2) > $d_{cb}$
  - Omit g(4,1) if g(3,1) > $d_{cb}$
[Figure: time warping matrix over $P^A$ and $Q^A$ with the cells g(1,2) and g(3,1) marked]
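The pruning rule can be sketched as a small change to the lower-bound dynamic program: a cell whose cumulative value already exceeds d_cb is treated as unreachable, and the computation stops once a whole row is unreachable. Since all cell costs are non-negative, a pruned cell can never lie on a path that ends within d_cb. The code reuses the simplified, unweighted recurrence and squared gaps as assumptions, and the name `lbs_early_stop` is hypothetical.

```python
import math


def d_seg(p_range, q_range):
    """Squared gap between two (lower, upper) ranges, 0 if they overlap."""
    gap = max(p_range[0] - q_range[1], q_range[0] - p_range[1], 0.0)
    return gap ** 2


def lbs_early_stop(pa, qa, d_cb):
    """Lower bound from segment ranges, abandoning paths that exceed d_cb.

    Returns the bound if it is at most d_cb, otherwise math.inf.
    """
    n, m = len(pa), len(qa)
    prev = [0.0] + [math.inf] * m
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            best = min(cur[j - 1], prev[j], prev[j - 1])
            if best == math.inf:
                continue                   # every predecessor was pruned
            g = d_seg(pa[i - 1], qa[j - 1]) + best
            if g <= d_cb:
                cur[j] = g                 # cells above d_cb stay at infinity
        if all(v == math.inf for v in cur):
            return math.inf                # no live path left: stop early
        prev = cur
    return prev[m]
```

In the first row this reproduces the rule on the slide: once g(1,2) exceeds d_cb, g(1,3) has no live predecessor and is skipped.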
Main Idea (3) - Refinement
- Q: How to choose t (length of time intervals)?
- A: Use multiple granularities, as follows:
[Figure: time warping matrices over $P^A$ and $Q^A$ at two granularities of t, with the cells g(1,2) and g(3,1) marked]
Main Idea (3) - Refinement
- Compute the lower bounding distance from the coarsest sequences as the first refinement step
- Ignore P if $D_{lbs}(P^A, Q^A) > d_{cb}$, otherwise:
[Figure: time warping matrix over the coarsest $P^A$ and $Q^A$]
Main Idea (3) - Refinement
- … compute the distance from more accurate sequences as the second refinement step
- … repeat
[Figure: time warping matrices over successively finer $P^A$ and $Q^A$]
Main Idea (3) - Refinement
- … until the finest granularity
- Update the list of k-nearest neighbors if $D_{dtw}(P, Q) \le d_{cb}$
[Figure: time warping matrix over the original sequences P and Q]
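The refinement loop can be put together roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: it uses squared differences, the simplified unweighted lower bound, no warping scope, no early stopping inside the dynamic program, and illustrative interval lengths (8, 2) rather than the ones used in the experiments. The names `knn_search`, `lbs`, `dtw`, `approximate`, and `d_seg` are hypothetical.

```python
import heapq
import math


def _warp(n, m, cost):
    """Cheapest warping path through an n-by-m grid of cell costs."""
    prev = [0.0] + [math.inf] * m
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cur[j] = cost(i - 1, j - 1) + min(cur[j - 1], prev[j], prev[j - 1])
        prev = cur
    return prev[m]


def dtw(p, q):
    return _warp(len(p), len(q), lambda i, j: (p[i] - q[j]) ** 2)


def approximate(seq, t):
    return [(min(seq[s:s + t]), max(seq[s:s + t]))
            for s in range(0, len(seq), t)]


def d_seg(pr, qr):
    return max(pr[0] - qr[1], qr[0] - pr[1], 0.0) ** 2


def lbs(pa, qa):
    return _warp(len(pa), len(qa), lambda i, j: d_seg(pa[i], qa[j]))


def knn_search(data, query, k, intervals=(8, 2)):
    """k-nearest neighbors of query under DTW, filtering coarse to fine."""
    best = []                              # max-heap of (-distance, index)
    q_approx = [approximate(query, t) for t in intervals]
    for idx, p in enumerate(data):
        d_cb = -best[0][0] if len(best) == k else math.inf
        # any() is lazy, so finer bounds are computed only when the coarser
        # ones fail to exclude the sequence.
        if d_cb < math.inf and any(
                lbs(approximate(p, t), qa) > d_cb
                for t, qa in zip(intervals, q_approx)):
            continue                       # excluded without exact DTW
        d = dtw(p, query)
        if d <= d_cb:
            heapq.heappush(best, (-d, idx))
            if len(best) > k:
                heapq.heappop(best)        # drop the current worst candidate
    return sorted((-nd, idx) for nd, idx in best)
```

Because every filter is a lower bound on the exact distance, a sequence is skipped only when it cannot enter the current top k, which is what rules out false dismissals in this sketch. A real implementation would precompute the approximations of the data sequences instead of rebuilding them per query.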
Experimental results
- Setup
  - Intel Xeon 2.8GHz, 1GB memory, Linux
  - Datasets: Temperature, Fintime, RandomWalk
  - Four different time intervals (for n=2048): t1=2, t2=8, t3=32, t4=128
- Evaluation
  - Compared FTW with LB_PAA (the best so far)
  - Mainly computation time
Outline of experiments
- Speed vs db size
- Speed vs warping scope W
- Effect of filtering
- Effect of varying-length data sequences
Search Performance
- Itakura Parallelogram
[Figure: Itakura Parallelogram warping scope in the time warping matrix of P and Q]
Search Performance
- Wall clock time as a function of data set size
- Dataset: Temperature
- FTW is up to 50 times faster!
Search Performance
- Wall clock time as a function of data set size
- Dataset: Fintime
- FTW is up to 40 times faster!
Search Performance
- Wall clock time as a function of data set size
- Dataset: RandomWalk
- FTW is up to 40 times faster!
- More effective as the size grows
Search Performance
- Sakoe-Chiba Band
[Figure: Sakoe-Chiba Band warping scope in the time warping matrix of P and Q for two widths, W1 and W2]
Search Performance
- Wall clock time as a function of warping scope
- Dataset: Temperature
- FTW is up to 220 times faster!
Search Performance
- Wall clock time as a function of warping scope
- Dataset: Fintime
- FTW is up to 70 times faster!
Search Performance
- Wall clock time as a function of warping scope
- Dataset: RandomWalk
- FTW is up to 100 times faster!
Effect of filtering
- Most data sequences are excluded by the coarser approximations (t4=128 and t3=32)
  - Using multiple granularities has significant advantages
[Figure: frequency of approximation use]
Difference in Sequence Lengths
- 5 sequence data sets
  - Random(2048,0): length 2048 +/- 0
  - Random(2048,32): length 2048 +/- 16
  - Random(2048,64), Random(2048,128), Random(2048,256)
- FTW outperforms LB_PAA by 2+ orders of magnitude
- LB_PAA cannot handle data sequences of different lengths
Conclusions
- Design goals:
  1. Fast (up to 220 times faster)
  2. No false dismissals
  3. No restriction on the sequence length
  4. Support for any restriction, as well as for no restriction, on the “warping scope”
Page Accesses
- Sequential scan of feature data should boost performance (speed-up factors SF=5, SF=10)
  - $PA_{ds}$: page accesses for data sequences
  - $PA_{fd}$: page accesses for feature data

$$PA_{SF} = \frac{PA_{fd}}{SF} + PA_{ds}$$