
FTW: Fast Similarity Search under the Time Warping Distance



  1. FTW: Fast Similarity Search under the Time Warping Distance
     Yasushi Sakurai (NTT Cyber Space Labs), Masatoshi Yoshikawa (Nagoya Univ.), Christos Faloutsos (Carnegie Mellon Univ.)

  2. Motivation
     - Time-series data
       - Many applications: computational biology, astrophysics, geology, meteorology, multimedia, economics
     - Similarity search
       - Euclidean distance
       - DTW (Dynamic Time Warping)
         - Useful for different sequence lengths
         - Different sampling rates
         - Scaling along the time axis
     (PODS 2005, Y. Sakurai et al.)

  3. Mini-introduction to DTW
     - DTW allows sequences to be stretched along the time axis
       - Minimize the distance between the sequences
       - Insert 'stutters' into a sequence
       - Then compute the (Euclidean) distance
     [Figure: sequences labeled 'original' and 'stutters']

  4. Mini-introduction to DTW
     - DTW is computed by dynamic programming
       - Warping path: set of grid cells in the time warping matrix
     [Figure: time warping matrix for data sequence $P = (p_1, \ldots, p_i, \ldots, p_N)$ of length $N$ and query sequence $Q = (q_1, \ldots, q_j, \ldots, q_M)$ of length $M$, showing the optimum warping path (the best alignment) with p-stutters and q-stutters]

  5. Mini-introduction to DTW
     - DTW is computed by dynamic programming over the prefixes $p_1, p_2, \ldots, p_i$ and $q_1, q_2, \ldots, q_j$

     $$D_{dtw}(P, Q) = f(N, M)$$

     $$f(i, j) = \lVert p_i - q_j \rVert + \min \begin{cases} f(i, j-1) & \text{($p$-stutter)} \\ f(i-1, j) & \text{($q$-stutter)} \\ f(i-1, j-1) & \text{(no stutter)} \end{cases}$$
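
The recurrence on this slide maps directly onto a two-dimensional table. The following is a minimal Python sketch, not the authors' code: the function name `dtw_distance`, the absolute difference as the per-cell distance, and the boundary values f(0, 0) = 0 and f(i, 0) = f(0, j) = infinity are assumptions that the slide does not state.

```python
import math


def dtw_distance(p, q):
    """Return f(N, M) for the recurrence above.

    Assumption: the per-cell cost is the absolute difference |p_i - q_j|.
    """
    n, m = len(p), len(q)
    # f[i][j] is the best cost of aligning the prefixes p[:i] and q[:j].
    # Assumed boundary: f[0][0] = 0, every other border cell is infinite.
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(p[i - 1] - q[j - 1])
            f[i][j] = cost + min(
                f[i][j - 1],      # p-stutter: p_i is repeated
                f[i - 1][j],      # q-stutter: q_j is repeated
                f[i - 1][j - 1],  # no stutter
            )
    return f[n][m]
```

The table has (N + 1) x (M + 1) cells and each cell takes constant time to fill, which is where the O(NM) cost quoted later in the deck comes from.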

  6. Mini-introduction to DTW
     - Global constraints limit the warping scope
       - Warping scope: area that the warping path is allowed to visit
     [Figure: two time warping matrices over P and Q, one showing the Sakoe-Chiba Band and one showing the Itakura Parallelogram]

  7. Mini-introduction to DTW
     - Width of the warping scope W is user-defined
     [Figure: Sakoe-Chiba Band drawn with two different widths, $W_1$ and $W_2$]
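
A band constraint can be sketched by filling only the cells near the diagonal. This Python example is illustrative only: the name `dtw_band`, the parameter `w`, the definition of the band as the cells with |i - j| <= w, and the absolute-difference cell cost are all assumptions, since the slides only say that the width W is chosen by the user.

```python
import math


def dtw_band(p, q, w):
    """DTW restricted to cells with |i - j| <= w (assumed band definition)."""
    n, m = len(p), len(q)
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        # Only columns inside the band are filled; the rest stay infinite,
        # so no warping path can pass through them.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(p[i - 1] - q[j - 1])
            f[i][j] = cost + min(f[i][j - 1], f[i - 1][j], f[i - 1][j - 1])
    return f[n][m]
```

With `w = 0` only the diagonal is allowed, so the result is the plain point-by-point distance for equal-length inputs. Under this band definition the result is infinite whenever the two lengths differ by more than `w`, because the final cell then lies outside the band.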

  8. Motivation
     - Similarity search for time-series data
       - DTW (Dynamic Time Warping): scaling along the time axis
     - But…
       - High search cost: O(NM)
       - Prohibitive for long sequences

  9. Our Solution, FTW
     - Requirements:
       1. Fast
       2. No false dismissals
       3. No restriction on the sequence length
          - It should handle data sequences of different lengths
       4. Support for any "warping scope", as well as for no restriction on it

  10. Problem Definition
      - Given
        - $S$ time-series data sequences of unequal lengths $\{P_1, P_2, \ldots, P_S\}$,
        - a query sequence $Q$,
        - an integer $k$,
        - (optionally) a warping scope $W$,
      - Find the $k$-nearest neighbors of $Q$ from the data sequence set by using DTW with $W$

  11. Overview
      - Introduction
      - Related work
      - Main ideas
      - Experimental results
      - Conclusions

  12. Related Work
      - Sequence indexing
        - Agrawal et al. (FODO 1998)
        - Keogh et al. (SIGMOD 2001)
        - …
      - Subsequence matching
        - Faloutsos et al. (SIGMOD 1994)
        - Moon et al. (SIGMOD 2002)
        - …

  13. Related Work
      - Fast sequence matching for DTW
        - Yi et al. (ICDE 1998)
        - Kim et al. (ICDE 2001)
        - Chu et al. (SDM 2002)
        - Keogh (VLDB 2002)
        - Zhu et al. (SIGMOD 2003)
        - …
      - None of the existing methods for DTW fulfills all the requirements

  14. Overview
      - Introduction
      - Related work
      - Main ideas
      - Experimental results
      - Conclusions

  15. Main Idea (1) - LBS
      - LBS (Lower Bounding distance measure with Segmentation)
      - $P^A$: approximate sequences
        - $p_i^R$: segment range
        - $p_i^U$: upper value
        - $p_i^L$: lower value
        - $p_i^R = (p_i^L : p_i^U)$
        - $t$: length of time intervals*
      [Figure: $P^A$ drawn as segment ranges $p_1^R$, $p_2^R$, $p_3^R$, $p_4^R$, each spanning a time interval of length $t$]
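
One way to picture the approximate sequence is as a list of (lower, upper) pairs, one per time interval. The sketch below assumes fixed-length, non-overlapping intervals and lets the final interval be shorter when the length is not a multiple of `t`; the function name `approximate` and that handling of the last interval are assumptions, not details taken from the slides.

```python
def approximate(seq, t):
    """Return the segment ranges (p_i^L, p_i^U) of seq for interval length t."""
    segments = []
    for start in range(0, len(seq), t):
        window = seq[start:start + t]  # the last window may be shorter than t
        segments.append((min(window), max(window)))
    return segments
```

For example, `approximate([1, 3, 2, 5, 4, 4, 0], 2)` yields four ranges, the last of which covers the single trailing value.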

  16. Main Idea (1) - LBS
      - Compute lower bounding distance
        - Distance of the two ranges $p_i^R$ and $q_j^R$: distance of their two closest points
      [Figure: two value-versus-time plots of the ranges $p_i^R$ and $q_j^R$, one labeled "Lower bound" and one labeled "Lower bound = 0"]

  17. Main Idea (1) - LBS details
      - Compute lower bounding distance
        - Distance of the two ranges $p_i^R$ and $q_j^R$: distance of their two closest points

      $$D_{seg}(p_i^R, q_j^R) = \begin{cases} p_i^L - q_j^U & (p_i^L > q_j^U) \\ q_j^L - p_i^U & (q_j^L > p_i^U) \\ 0 & (\text{otherwise}) \end{cases}$$
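
The three cases of the range distance translate into a short function. In this sketch a range is represented as a `(lower, upper)` tuple and the function is called `d_seg`; both choices are illustrative assumptions.

```python
def d_seg(p_range, q_range):
    """Distance between the closest points of two (lower, upper) ranges."""
    p_lo, p_hi = p_range
    q_lo, q_hi = q_range
    if p_lo > q_hi:   # p's range lies entirely above q's range
        return p_lo - q_hi
    if q_lo > p_hi:   # q's range lies entirely above p's range
        return q_lo - p_hi
    return 0          # the ranges overlap or touch
```

Two overlapping ranges such as `(1, 4)` and `(3, 6)` give 0, while `(5, 7)` and `(1, 3)` give the gap of 2 between them.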

  18. Main Idea (1) - LBS
      - Exact DTW distance
      [Figure: time warping matrix for P and Q]

  19. Main Idea (1) - LBS
      - Compute lower bounding distance from $P^A$ and $Q^A$
      - Use a dynamic programming approach

      $$D_{lbs}(P^A, Q^A) \le D_{dtw}(P, Q)$$

      [Figure: time warping matrix for $P^A$ and $Q^A$]

  20. Main Idea (1) - LBS
      - Compute lower bounding distance from $P^A$ and $Q^A$
      - Use a dynamic programming approach

      $$D_{lbs}(P^A, Q^A) \le D_{dtw}(P, Q)$$

      [Figure: $P^A$ and $Q^A$ shown together with P and Q]
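
The slides state that the lower bound is computed by dynamic programming and never exceeds the exact DTW distance, but they do not spell out the recurrence. The sketch below is one conservative way to obtain such a bound, assuming absolute-difference cell costs: it runs the DTW recurrence over segment ranges and charges each visited segment pair once. Every exact warping path crosses a connected chain of segment pairs and pays at least the range distance in each, so the result cannot exceed the exact distance. The authors' measure may weight segment pairs differently and be tighter; all function names here are hypothetical.

```python
import math


def approximate(seq, t):
    """(lower, upper) value range of each length-t segment of seq."""
    return [(min(seq[s:s + t]), max(seq[s:s + t])) for s in range(0, len(seq), t)]


def d_seg(a, b):
    """Gap between two (lower, upper) ranges, 0 if they overlap."""
    return max(a[0] - b[1], b[0] - a[1], 0)


def warp(n, m, cost):
    """Shared DTW-style recurrence; cost(i, j) is the 0-based cell cost."""
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = cost(i - 1, j - 1) + min(
                f[i][j - 1], f[i - 1][j], f[i - 1][j - 1]
            )
    return f[n][m]


def d_dtw(p, q):
    """Exact DTW distance (assumed cell cost: absolute difference)."""
    return warp(len(p), len(q), lambda i, j: abs(p[i] - q[j]))


def d_lbs(pa, qa):
    """Conservative lower bound: each segment pair is charged d_seg once."""
    return warp(len(pa), len(qa), lambda i, j: d_seg(pa[i], qa[j]))
```

Because the coarse table has roughly (N/t) x (M/t) cells, larger intervals make the bound cheaper to compute but looser. With `t = 1` every range collapses to a single point and the bound equals the exact distance.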

  21. Main Idea (2) - EarlyStopping
      - Exploit the fact that we have found $k$ near neighbors at distance $d_{cb}$
        - $d_{cb}$: $k$-nearest neighbor distance (the Current Best), the exact distance of the best $k$ candidates so far

  22. Main Idea (2) - EarlyStopping
      - Exclude useless warping paths by using $d_{cb}$
        - Omit $g(1,3)$ if $g(1,2) > d_{cb}$
        - Omit $g(4,1)$ if $g(3,1) > d_{cb}$
      [Figure: matrix for $P^A$ and $Q^A$ with the cells $g(1,2)$ and $g(3,1)$ marked]
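
The pruning rule works because cell costs are non-negative, so the accumulated value can only grow along a warping path: once a cell exceeds the current best distance, nothing reached through it can be a useful answer. The sketch below applies that idea to the plain DTW table for brevity, whereas the slide applies it to the coarse values g(i, j). The name `dtw_early_stop`, the row-by-row layout, and the absolute-difference cost are assumptions.

```python
import math


def dtw_early_stop(p, q, d_cb):
    """Return the DTW distance if it is <= d_cb, otherwise math.inf."""
    m = len(q)
    prev = [0.0] + [math.inf] * m  # row 0 of the table
    for p_i in p:
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            value = abs(p_i - q[j - 1]) + min(cur[j - 1], prev[j], prev[j - 1])
            # A cell above d_cb is treated as unreachable, which also
            # removes every path that would have continued through it.
            if value <= d_cb:
                cur[j] = value
        if all(v == math.inf for v in cur):
            return math.inf  # the whole row is pruned: stop early
        prev = cur
    return prev[m]
```

When the true distance is within `d_cb` the function returns it unchanged; otherwise it returns infinity, often after filling only a few rows.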

  23. Main Idea (3) - Refinement
      - Q: How to choose $t$ (length of time intervals)?
      [Figure: matrix for $P^A$ and $Q^A$ with interval length $t$ and the cells $g(1,2)$ and $g(3,1)$ marked]

  24. Main Idea (3) - Refinement
      - Q: How to choose $t$ (length of intervals)?
      - A: Use multiple granularities, as follows:
      [Figure: matrix for $P^A$ and $Q^A$ with interval length $t$ and the cells $g(1,2)$ and $g(3,1)$ marked]

  25. Main Idea (3) - Refinement
      - Compute the lower bounding distance from the coarsest sequences as the first refinement step
      - Ignore $P$ if $D_{lbs}(P^A, Q^A) > d_{cb}$, otherwise:
      [Figure: matrix for the coarsest $P^A$ and $Q^A$ with the cells $g(1,2)$ and $g(3,1)$ marked]

  26. Main Idea (3) - Refinement
      - … compute the distance from more accurate sequences as the second refinement step
      - … repeat
      [Figure: matrix for a finer $P^A$ and $Q^A$]

  27. Main Idea (3) - Refinement
      - … until the finest granularity
      - Update the list of $k$-nearest neighbors if $D_{dtw}(P, Q) \le d_{cb}$
      [Figure: matrix for P and Q]
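
The refinement steps can be put together into a small k-nearest-neighbor loop. The sketch below is a simplified illustration under several assumptions: it uses a conservative lower bound that charges each segment pair once (the authors' measure may differ), absolute-difference cell costs, approximations computed on the fly instead of being stored in advance, and no warping scope. The names `knn_search` and `intervals` are hypothetical.

```python
import heapq
import math


def approximate(seq, t):
    """(lower, upper) value range of each length-t segment of seq."""
    return [(min(seq[s:s + t]), max(seq[s:s + t])) for s in range(0, len(seq), t)]


def d_seg(a, b):
    """Gap between two (lower, upper) ranges, 0 if they overlap."""
    return max(a[0] - b[1], b[0] - a[1], 0)


def warp(n, m, cost):
    """Shared DTW-style recurrence; cost(i, j) is the 0-based cell cost."""
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = cost(i - 1, j - 1) + min(
                f[i][j - 1], f[i - 1][j], f[i - 1][j - 1]
            )
    return f[n][m]


def d_dtw(p, q):
    return warp(len(p), len(q), lambda i, j: abs(p[i] - q[j]))


def d_lbs(pa, qa):
    return warp(len(pa), len(qa), lambda i, j: d_seg(pa[i], qa[j]))


def knn_search(data, query, k, intervals=(8, 2)):
    """Return the k nearest sequences as sorted (distance, index) pairs."""
    coarse_to_fine = sorted(intervals, reverse=True)
    query_approx = {t: approximate(query, t) for t in coarse_to_fine}
    best = []  # max-heap of the current best k, stored as (-distance, index)
    for idx, p in enumerate(data):
        # d_cb is the k-th best exact distance so far (infinite until k are found).
        d_cb = -best[0][0] if len(best) == k else math.inf
        # Try the coarsest bound first; any() stops at the first rejection.
        if any(d_lbs(approximate(p, t), query_approx[t]) > d_cb
               for t in coarse_to_fine):
            continue  # safe to skip: the exact distance also exceeds d_cb
        dist = d_dtw(p, query)  # finest granularity: the exact distance
        if len(best) < k:
            heapq.heappush(best, (-dist, idx))
        elif dist <= d_cb:
            heapq.heapreplace(best, (-dist, idx))
    return sorted((-d, i) for d, i in best)
```

Because a sequence is skipped only when a lower bound already exceeds the current best distance, the loop returns the same distances as a brute-force scan, which is the "no false dismissals" requirement. Sequences of different lengths need no special handling here, since both the bound and the exact distance accept unequal inputs.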

  28. Overview
      - Introduction
      - Related work
      - Main ideas
      - Experimental results
      - Conclusions

  29. Experimental results
      - Setup
        - Intel Xeon 2.8GHz, 1GB memory, Linux
        - Datasets: Temperature, Fintime, RandomWalk
        - Four different time intervals (for $n = 2048$): $t_1 = 2$, $t_2 = 8$, $t_3 = 32$, $t_4 = 128$
      - Evaluation
        - Compared FTW with LB_PAA (the best so far)
        - Mainly computation time

  30. Outline of experiments
      - Speed vs db size
      - Speed vs warping scope W
      - Effect of filtering
      - Effect of varying-length data sequences

  31. Search Performance
      - Itakura Parallelogram
      [Figure: time warping matrix over P and Q with the Itakura Parallelogram]

  32. Search Performance
      - Wall clock time as a function of data set size
      - Temperature: FTW is up to 50 times faster!

  33. Search Performance
      - Wall clock time as a function of data set size
      - Fintime: FTW is up to 40 times faster!

  34. Search Performance
      - Wall clock time as a function of data set size
      - RandomWalk: FTW is up to 40 times faster!
      - More effective as the size grows

  35. Outline of experiments
      - Speed vs db size
      - Speed vs warping scope W
      - Effect of filtering
      - Effect of varying-length data sequences

  36. Search Performance
      - Sakoe-Chiba Band
      [Figure: Sakoe-Chiba Band drawn with two different widths, $W_1$ and $W_2$]

  37. Search Performance
      - Wall clock time as a function of warping scope
      - Temperature: FTW is up to 220 times faster!

  38. Search Performance
      - Wall clock time as a function of warping scope
      - Fintime: FTW is up to 70 times faster!

  39. Search Performance
      - Wall clock time as a function of warping scope
      - RandomWalk: FTW is up to 100 times faster!

  40. Outline of experiments
      - Speed vs db size
      - Speed vs warping scope W
      - Effect of filtering
      - Effect of varying-length data sequences

  41. Effect of filtering
      - Most data sequences are excluded by the coarser approximations ($t_4 = 128$ and $t_3 = 32$)
        - Using multiple granularities has significant advantages
      [Figure: frequency of approximation use]

  42. Outline of experiments
      - Speed vs db size
      - Speed vs warping scope W
      - Effect of filtering
      - Effect of varying-length sequences

  43. Difference in Sequence Lengths
      - 5 sequence data sets
        - Random (2048,0): length 2048 +/- 0
        - Random (2048,32): length 2048 +/- 16
        - Random (2048,64), Random (2048,128), Random (2048,256)
      [Chart annotations: "Outperform by 2+ orders of magnitude"; "LB_PAA can not handle"]

  44. Overview
      - Introduction
      - Related work
      - Main ideas
      - Experimental results
      - Conclusions
