Fast Algorithms for Coevolving Time Series Mining



slide-1
SLIDE 1

Lei Li

Computer Science Department Carnegie Mellon University

Fast Algorithms for Coevolving Time Series Mining

3/21/2010

Advisor: Christos Faloutsos

ICDE 2010 PHD workshop

slide-2
SLIDE 2

Thanks

  • Organizers:

– Nikos Mamoulis
– Yannis Papakonstantinou
– Timos Sellis

  • Travel fellowship from NSF

– NSF grant IIS-0956600


slide-3
SLIDE 3

Coevolving Time Series (TS)

  • Temperature in datacenter
  • Marker positions in mocap
  • BGP updates in network
  • Chlorine level in water

Need fast algorithms for time series mining

slide-4
SLIDE 4

Outline

  • Motivation

– Mining tasks, goals, and problems

  • Completed Work

– P1: Mining w/ Missing Value [Li+ 2009]
– P2: Parallel Learning [Li+ 2008b]
– P3: Natural Motion Stitching [Li+ 2008a]

  • Conclusion


slide-5
SLIDE 5

M1: Natural Motion Generation

  • How to generate new realistic motions from mocap database?
  • e.g. “karate kick”, “boxing”
  • Applications:

– Game ($57 billion in 2009)
– Movie animation
– Quality of Life (assistive devices)

slide-6
SLIDE 6

M2: Data Summarization

  • How to compress & manage large time series?

– A datacenter with 5000 servers: 1 TB of data per day, 55 million streams ([Reeves+ 2009])

  • Goal: save energy in data center

– $4.5 billion in power for US data centers in 2006

[Figure: CMU DCO temperatures over time]

slide-7
SLIDE 7

M3: Anomaly Detection

  • How to detect anomalies?
  • Applications:

– Intrusion detection in computer network traffic (e.g. # of packets)
– Detect leakage or attack in drinking water system by monitoring chlorine levels
– Spam/robot in web clicks

slide-8
SLIDE 8

Time Series Mining Tasks

  • Pattern Discovery (e.g. cross-correlation, lag-correlation)

– T1: Forecasting
– T2: Summarization
– T3: Segmentation (detecting change points)
– T4: Anomaly detection

  • Feature Extraction (e.g. wavelet coefficients)

– T5: Clustering
– T6: Indexing TS database
– T7: Visualization

slide-9
SLIDE 9

Goals for Mining Algorithms

  • G1:Effective:

– achieve low reconstruction error (mean square error) (DynaMMo, [Li+2009])
– high precision/recall, classification accuracy

  • G2:Scalable:

– to the size (e.g. length) of sequences
– on modern hardware (Cut-And-Stitch [Li+2008b])

slide-10
SLIDE 10

Outline

  • Motivation
  • Completed Work

– P1: DynaMMo: Mining w/ Missing Value [Li+09] (recovering, compression, segmentation)

    • Problem Definition
    • Intuition of Proposed Method
    • Results

– P2: Cut-And-Stitch: Parallel Learning [Li+08b]
– P3: Natural Motion Stitching [Li+08a]

  • Conclusion

slide-11
SLIDE 11

Missing Values in Time Series

  • Motion Capture:

– Markers on human actors
– Cameras used to track the 3D positions
– Duration: 100-500
– 93 dimensional body-local coordinates after preprocessing (31 bones)

  • Sensor data missing due to:

– Low battery
– RF error

[Figure source: mocap.cs.cmu.edu]

Joint work w/ C. Faloutsos, J. McCann, N. Pollard [Li et al, KDD 2009]

slide-12
SLIDE 12

Problem Definition [Li+2009]

  • Given: coevolving sequences over time (sensor 1, sensor 2, …, sensor m) with a blackout (missing values)
  • Find algorithms for:

– Recovering missing values
– Compression/summarization (T2)
– Segmentation (T3)

slide-13
SLIDE 13

Problem Definition (cont’)

  • Want the algorithms to be:

– G1: Effective
– G2: Scalable: to duration of sequences

[Figure: sensor 1, sensor 2, …, sensor m over time, with a blackout]

slide-14
SLIDE 14

Proposed Method: Intuition

[Figure: position of left hand marker and position of right hand marker, with a missing segment]

Recover using correlation among multiple sequences

slide-15
SLIDE 15

Proposed Method: DynaMMo Intuition

[Figure: position of left hand marker and position of right hand marker, with a missing segment]

Recover using dynamics: the temporal moving pattern

more results in [Li et al 2009]

slide-16
SLIDE 16

Underlying Model

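Use Linear Dynamical Systems to model the whole sequence.

$z_1 = z_0 + \omega_0, \quad \omega_0 \sim N(0, \Gamma)$
$z_{n+1} = F z_n + \omega_n, \quad \omega_n \sim N(0, \Lambda)$
$x_n = G z_n + \varepsilon_n, \quad \varepsilon_n \sim N(0, \Sigma)$

Model parameters: $\theta = \{z_0, \Gamma, F, \Lambda, G, \Sigma\}$

[Figure: graphical model with hidden states $z_1, \dots, z_4$ and observations $x_1, \dots, x_4$, where $z_1 \sim N(z_0, \Gamma)$, $z_{n+1} \sim N(F z_n, \Lambda)$, $x_n \sim N(G z_n, \Sigma)$. Legend: observed; partially observed.]

(details)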
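
The generative model on this slide can be sketched in a few lines of Python. This is only an illustration of sampling from a Linear Dynamical System with parameters θ = {z0, Γ, F, Λ, G, Σ}; the function name `simulate_lds` and all toy parameter values are assumptions made for the example and are not taken from the talk or from DynaMMo.

```python
import numpy as np

def simulate_lds(z0, Gamma, F, Lambda, G, Sigma, n_steps, seed=0):
    """Sample hidden states z_1..z_N and observations x_1..x_N from an LDS."""
    rng = np.random.default_rng(seed)
    h, m = F.shape[0], G.shape[0]
    Z = np.zeros((n_steps, h))
    X = np.zeros((n_steps, m))
    # z_1 = z_0 + omega_0, with omega_0 ~ N(0, Gamma)
    Z[0] = rng.multivariate_normal(z0, Gamma)
    for n in range(n_steps):
        if n > 0:
            # z_{n+1} = F z_n + omega_n, with omega_n ~ N(0, Lambda)
            Z[n] = rng.multivariate_normal(F @ Z[n - 1], Lambda)
        # x_n = G z_n + eps_n, with eps_n ~ N(0, Sigma)
        X[n] = rng.multivariate_normal(G @ Z[n], Sigma)
    return Z, X

# Toy parameters (hypothetical): 2 hidden dimensions rotating slowly,
# observed through 3 coevolving sequences.
angle = 0.1
F = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
G = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
Z, X = simulate_lds(
    z0=np.array([1.0, 0.0]),
    Gamma=0.01 * np.eye(2),
    F=F,
    Lambda=0.001 * np.eye(2),
    G=G,
    Sigma=0.01 * np.eye(3),
    n_steps=100,
)
```

Because all three observed sequences are driven by the same two hidden dimensions, they are correlated with each other and evolve smoothly over time, which are the two properties the intuition slides rely on.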

slide-17
SLIDE 17

Results – Better Missing Value Recovery

[Figure: reconstruction error vs. average missing length, with the ideal marked. Methods compared: proposed DynaMMo, MSVD [Srebro’03], linear interpolation, spline.]

Dataset: CMU Mocap #16 mocap.cs.cmu.edu

more results in [Li et al 2009]
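
For readers unfamiliar with the simplest baseline in this comparison, the following is a minimal sketch of linear interpolation over a gap and of a mean-square reconstruction error computed on the missing positions only. The helper names `linear_interpolate` and `reconstruction_mse`, and the sine-wave test signal, are assumptions for illustration; the exact error definition used in [Li et al 2009] may differ.

```python
import numpy as np

def linear_interpolate(x):
    """Fill NaN entries of a 1-D sequence by linear interpolation."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    observed = ~np.isnan(x)
    filled = x.copy()
    # np.interp evaluates the piecewise-linear curve through observed points
    filled[~observed] = np.interp(t[~observed], t[observed], x[observed])
    return filled

def reconstruction_mse(truth, recovered, missing_mask):
    """Mean square error, measured only where values were missing."""
    diff = (np.asarray(truth) - np.asarray(recovered))[missing_mask]
    return float(np.mean(diff ** 2))

# Hypothetical example: a sine wave with a blackout of 10 time ticks.
truth = np.sin(np.linspace(0.0, 2.0 * np.pi, 50))
missing = np.zeros(50, dtype=bool)
missing[20:30] = True
observed = truth.copy()
observed[missing] = np.nan

recovered = linear_interpolate(observed)
error = reconstruction_mse(truth, recovered, missing)
```

Interpolation looks at one sequence at a time, so it cannot use the correlation among sequences or the dynamics that the proposed method exploits.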

slide-18
SLIDE 18

Results – Better Compression

[Figure: error vs. compression ratio for DynaMMo w/ optimal compression, with the ideal marked]

Dataset: Chlorine levels

more results in [Li et al 2009]

slide-19
SLIDE 19

Results: segment synthetic data

  • Segment by threshold on reconstruction error

[Figure panels: original data; reconstruction error]
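
Segmenting by a threshold on reconstruction error can be sketched as follows. This is a simplified illustration, not the authors' code: the function name `segment_by_threshold`, the error values, and the threshold are all hypothetical, and a boundary is reported wherever the error first rises above the threshold.

```python
def segment_by_threshold(errors, threshold):
    """Return the indices where the error first crosses above the threshold."""
    boundaries = []
    above = False
    for i, e in enumerate(errors):
        if e > threshold and not above:
            boundaries.append(i)  # start of a high-error region
        above = e > threshold
    return boundaries

# Hypothetical per-time-tick reconstruction errors with two spikes.
errors = [0.1, 0.1, 0.2, 1.5, 1.7, 0.2, 0.1, 2.0, 0.1]
print(segment_by_threshold(errors, threshold=1.0))  # prints [3, 7]
```

The idea is that a model fitted to one regime reconstructs that regime well, so a spike in error suggests the sequence has changed behavior.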

slide-20
SLIDE 20

Results – Segmentation

  • Find the transition from “running” to “stop”.

[Figure: left hip and left femur sequences, with reconstruction error]

slide-21
SLIDE 21

Results – Segmentation

  • Find the transition from “running” to “stop”.

[Figure: left hip and left femur sequences, with reconstruction error; segments labeled run, slow down, stop]

slide-22
SLIDE 22

Outline

  • Motivation
  • Completed Work

– P1: DynaMMo: Mining w/ Missing Value [Li+09]

    • Contribution: the most accurate mining algorithms for TS with missing values so far.

– P2: Cut-And-Stitch: Parallel Learning [Li+08b]
– P3: Natural Motion Stitching [Li+08a]

  • Conclusion

slide-23
SLIDE 23

Outline

  • Motivation
  • Completed Work

– P1: DynaMMo: Mining w/ Missing Value [Li+09]
– P2: Cut-And-Stitch: Parallel Learning [Li+08b]

    • Problem Definition
    • Basic Intuition
    • Results

Goals for Mining Algorithms

  • G1:Effective:

– achieve low reconstruction error (mean square error) (DynaMMo, [Li+2009])
– high precision/recall, classification accuracy

  • G2:Scalable:

– to the size (e.g. length) of sequences
– on modern hardware (Cut-And-Stitch [Li+2008b])

slide-24
SLIDE 24

Recap Model for DynaMMo

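Use Linear Dynamical Systems to model the whole sequence.

$z_1 = z_0 + \omega_0, \quad \omega_0 \sim N(0, \Gamma)$
$z_{n+1} = F z_n + \omega_n, \quad \omega_n \sim N(0, \Lambda)$
$x_n = G z_n + \varepsilon_n, \quad \varepsilon_n \sim N(0, \Sigma)$

Model parameters: $\theta = \{z_0, \Gamma, F, \Lambda, G, \Sigma\}$

[Figure: graphical model with hidden states $z_1, \dots, z_4$ and observations $x_1, \dots, x_4$. Legend: observed; partially observed.]

(details)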

slide-25
SLIDE 25

Challenge of Learning LDS: Expectation-Maximization Alg.

  • Not easy to parallelize on multi-processors due to non-trivial data dependency (details in writeup)
  • Q: How to parallelize the learning to achieve scalability?

[Figure: graphical model with hidden states $z_1, \dots, z_4$ and observations $x_1, \dots, x_4$, where $z_1 \sim N(z_0, \Gamma)$, $z_{n+1} \sim N(F z_n, \Lambda)$, $x_n \sim N(G z_n, \Sigma)$]

slide-26
SLIDE 26

Challenge illustration Expectation-Maximization Alg.

[Figure: timeline for the E-step (forward-backward) in learning LDS, running from Step 1 to Step 8 over nodes 1 to 5]

EM can only use a single CPU due to data dependency.
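
The data dependency can be seen in a standard Kalman filter forward pass, which is the forward half of the E-step for an LDS. The sketch below is a textbook filter written for illustration, with a hypothetical function name `kalman_forward` and toy inputs; it is not the authors' implementation. Each loop iteration needs the mean and covariance produced by the previous iteration, so the iterations cannot simply be handed to different processors.

```python
import numpy as np

def kalman_forward(X, z0, Gamma, F, Lambda, G, Sigma):
    """Forward (filtering) pass: mean and covariance of z_n given x_1..x_n."""
    N, h = X.shape[0], F.shape[0]
    mus = np.zeros((N, h))
    Vs = np.zeros((N, h, h))
    mu_pred, V_pred = z0, Gamma  # prior on z_1 is N(z0, Gamma)
    for n in range(N):
        # Correct the prediction with observation x_n.
        S = G @ V_pred @ G.T + Sigma
        K = V_pred @ G.T @ np.linalg.inv(S)  # Kalman gain
        mus[n] = mu_pred + K @ (X[n] - G @ mu_pred)
        Vs[n] = (np.eye(h) - K @ G) @ V_pred
        # Predict step n+1 from the result of step n: the data dependency.
        mu_pred = F @ mus[n]
        V_pred = F @ Vs[n] @ F.T + Lambda
    return mus, Vs

# Toy inputs (hypothetical): 20 time ticks, 3 observed and 2 hidden dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
G = rng.normal(size=(3, 2))
mus, Vs = kalman_forward(
    X, z0=np.zeros(2), Gamma=np.eye(2), F=0.9 * np.eye(2),
    Lambda=0.1 * np.eye(2), G=G, Sigma=0.5 * np.eye(3),
)
```

The backward (smoothing) pass has the same shape in reverse, which is why a straightforward EM implementation proceeds one step at a time along the sequence.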

slide-27
SLIDE 27

Problem Definition


  • Problem:

– Given a sequence of numbers, design a parallel learning algorithm to find the best model parameters for Linear Dynamical Systems

  • Goal:

– Achieve near-linear speedup on multi-core

  • Assumption:

– Shared memory architecture (e.g. multi-core)

slide-28
SLIDE 28

Proposed Method: Cut-And-Stitch

Intuition:

[Figure: timeline over nodes 1 to 5, running from Step 1 to Step 4]

Goal: with 2 CPUs

Details in [Li et al 2008b]. Joint work w/ Wenjie Fu, Fan Guo, Todd C. Mowry, Christos Faloutsos.

slide-29
SLIDE 29

Near Linear Speedup

[Figure: speedup vs. # of processors for the proposed Cut-And-Stitch EM algorithm, compared with the ideal]

Dataset: 58 motion sequences, CMU Mocap #16 (mocap.cs.cmu.edu), tested on an NCSA supercomputer

more results in [Li et al 2008b]

slide-30
SLIDE 30

No loss of accuracy

[Figure: normalized reconstruction error (axis from 0.0% to 2.5%) of the EM alg vs. Cut-And-Stitch on sequences #16.22, #16.01, #16.45. The two are ~ identical.]

more results in [Li et al 2008b]

slide-31
SLIDE 31

Goals for Mining Algorithms

  • G1:Effective:

– achieve low reconstruction error (mean square error) (DynaMMo, [Li+2009])
– high precision/recall, classification accuracy

  • G2:Scalable:

– to the size (e.g. length) of sequences
– on modern hardware (Cut-And-Stitch [Li+2008b])

Outline

  • Motivation
  • Completed Work

– P1: DynaMMo: Mining w/ Missing Value [Li+09]
– P2: Cut-And-Stitch: Parallel Learning [Li+08b]

    • Contribution: the 1st parallel algorithm for learning LDS

slide-32
SLIDE 32

Outline

  • Motivation
  • Completed Work

– P1: DynaMMo: Mining w/ Missing Value [Li+09]
– P2: Cut-And-Stitch: Parallel Learning [Li+08b]
– P3: Natural Motion Stitching [Li+08a]

    • Problem Definition
    • Proposed Method
    • Results

  • Conclusion

slide-33
SLIDE 33

Motion Stitching: A Database Approach

  • Select best stitchable segments from a set of basic motion pieces and generate new natural motions

slide-34
SLIDE 34

Problem Definition

  • Given two motion-capture sequences that are to be stitched together, how can we assess the goodness of the stitching?

[Figure: three candidate stitchings, labeled 1, 2, 3]

Which stitching looks best?

Joint work w/ Jim McCann, Nancy Pollard, Christos Faloutsos [Li et al, Eurographics 2008]

slide-35
SLIDE 35

Competitor: Euclidean distance fails

[Figure: straight moving vs. U-turn]

Equally “good” under Euclidean distance
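
One way to see how a distance on poses can fail is a toy calculation. The example below is an assumption-laden illustration and not the comparison from [Li 2008a]: it assumes the stitching cost is the Euclidean distance between the last frame of one clip and the first frame of the next, and the 2-D positions and the helper `frame_distance` are hypothetical.

```python
import numpy as np

def frame_distance(end_frame, start_frame):
    """Euclidean distance between two frames (poses)."""
    return float(np.linalg.norm(np.asarray(end_frame) - np.asarray(start_frame)))

# Clip A moves in the +x direction.
clip_a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
# Candidate 1 keeps moving in +x (straight moving).
straight = np.array([[3.0, 0.0], [4.0, 0.0], [5.0, 0.0]])
# Candidate 2 starts at the same pose but reverses direction (U-turn).
u_turn = np.array([[3.0, 0.0], [2.0, 0.0], [1.0, 0.0]])

d_straight = frame_distance(clip_a[-1], straight[0])
d_uturn = frame_distance(clip_a[-1], u_turn[0])
print(d_straight, d_uturn)  # prints 1.0 1.0
```

Both candidates get the same score because the distance looks only at positions and ignores the direction of motion, even though the U-turn would require an abrupt change of velocity.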

slide-36
SLIDE 36

Result – Synthetic Transition

Laziness-score prefers straightforward moving.

[Figure: straight moving vs. U-turn]

more results in [Li 2008a]

slide-37
SLIDE 37

Conclusion

  • Pattern discovery w/ missing values (DynaMMo)

– Recovering missing values
– Compression
– Segmentation

  • Scale up learning on multicore

– Parallel learning algorithm for LDS (Cut-And-Stitch)

  • Natural human motion stitching

– An intuitive distance function (Laziness score)

slide-38
SLIDE 38

References

  • Lei Li, Jim McCann, Nancy Pollard, Christos Faloutsos. DynaMMo: Mining and Summarization of Coevolving Sequences with Missing Values. KDD '09.
  • Lei Li, Wenjie Fu, Fan Guo, Todd C. Mowry, Christos Faloutsos. Cut-and-stitch: efficient parallel learning of linear dynamical systems on SMPs. KDD '08.
  • Lei Li, Jim McCann, Christos Faloutsos, Nancy Pollard. Laziness is a virtue: Motion stitching using effort minimization. Eurographics 2008.

slide-39
SLIDE 39

Question

  • Thanks!
  • contact: Lei Li (leili@cs.cmu.edu)
  • paper, software, dataset on http://www.cs.cmu.edu/~leili