
SLIDE 1

CS6220: DATA MINING TECHNIQUES

Instructor: Yizhou Sun

yzsun@ccs.neu.edu November 30, 2015

Mining Time Series Data

SLIDE 2

Announcement

  • No class next week; see you on Dec. 14.
  • The final report and presentation guideline will be released soon.

  • Office hour:
  • Tuesday: 3:30-5:00pm
  • Friday: 2:30-4:00pm

SLIDE 3

Methods to Learn


Tasks by data type (matrix, text, set, sequence, time series, graph & network, images):

  • Classification: Decision Tree; Naïve Bayes; Logistic Regression; SVM; kNN (matrix data) | HMM (sequence data) | Label Propagation* (graph & network) | Neural Network (images)
  • Clustering: K-means; hierarchical clustering; DBSCAN; Mixture Models; kernel k-means* (matrix data) | PLSA (text data) | SCAN*; Spectral Clustering* (graph & network)
  • Frequent Pattern Mining: Apriori; FP-growth (set data) | GSP; PrefixSpan (sequence data)
  • Prediction: Linear Regression (matrix data) | Autoregression (time series)
  • Similarity Search: DTW (time series) | P-PageRank (graph & network)
  • Ranking: PageRank (graph & network)

SLIDE 4

Mining Time Series Data

  • Basic Concepts
  • Time Series Prediction and Forecasting
  • Time Series Similarity Search
  • Summary

SLIDE 5

Example: Inflation Rate Time Series

SLIDE 6

Example: Unemployment Rate Time Series

SLIDE 7

Example: Stock

SLIDE 8

Example: Product Sale

SLIDE 9

Time Series

  • A time series is a sequence of numerical data points, typically measured at successive times, spaced at (often uniform) time intervals
  • Random variables for a time series are represented as:
  • Y = Y_1, Y_2, …, Y_T
  • Y = {Y_t : t ∈ T}, where T is the index set
  • An observation of a time series with length N is represented as:
  • Y = {y_1, y_2, …, y_N}

SLIDE 10

Mining Time Series Data

  • Basic Concepts
  • Time Series Prediction and Forecasting
  • Time Series Similarity Search
  • Summary

SLIDE 11

Categories of Time-Series Movements

  • Categories of time-series movements (T, C, S, I)
  • Long-term or trend movements (trend curve): the general direction in which a time series is moving over a long interval of time
  • Cyclic movements or cyclic variations: long-term oscillations about a trend line or curve
  • e.g., business cycles; may or may not be periodic
  • Seasonal movements or seasonal variations
  • e.g., almost identical patterns that a time series appears to follow during corresponding months of successive years
  • Irregular or random movements

SLIDE 12

(figure)

SLIDE 13

Lag, Difference

  • The first lag of Y_t is Y_{t−1}; the j-th lag of Y_t is Y_{t−j}
  • The first difference of a time series: ΔY_t = Y_t − Y_{t−1}
  • Sometimes the difference in logarithms is used: Δln(Y_t) = ln(Y_t) − ln(Y_{t−1})
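These operations can be sketched with numpy; the series values below are made up for illustration, not taken from the slides.

```python
import numpy as np

# Toy series; values are illustrative only.
y = np.array([100.0, 102.0, 101.0, 105.0, 110.0])

first_lag = y[:-1]             # Y_{t-1}, aligned with y[1:]
first_diff = np.diff(y)        # delta Y_t = Y_t - Y_{t-1}
log_diff = np.diff(np.log(y))  # delta ln(Y_t), approx. growth rate
```

`np.diff` pairs each value with its first lag, so `first_diff` has length T−1.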

SLIDE 14

Example: First Lag and First Difference

SLIDE 15

Autocorrelation

  • Autocorrelation: the correlation between a time series and its lagged values
  • The first autocorrelation: ρ_1
  • The j-th autocorrelation: ρ_j

Autocovariance

SLIDE 16

Sample Autocorrelation Calculation

  • The j-th sample autocorrelation:
  • ρ_j = cov(Y_t, Y_{t−j}) / var(Y_t)
  • where cov(Y_t, Y_{t−j}) is calculated over the overlapping pairs:
  • i.e., considering the two time series Y(1,…,T−j) and Y(j+1,…,T)


๐‘

๐‘ข

๐‘

๐‘ขโˆ’๐‘˜

๐‘ง๐‘˜+1 ๐‘ง1 ๐‘ง๐‘˜+2 ๐‘ง2 โ‹ฎ โ‹ฎ ๐‘ง๐‘ˆโˆ’1 ๐‘ง๐‘ˆโˆ’๐‘˜โˆ’1 ๐‘ง๐‘ˆ ๐‘ง๐‘ˆโˆ’๐‘˜

SLIDE 17

Example of Autocorrelation

  • For inflation and its change

ρ_1 = 0.85, very high: last quarter's inflation rate contains much information about this quarter's inflation rate

SLIDE 18

Focus on Stationary Time Series

  • Stationarity is key for time series regression: the future is similar to the past in terms of distribution

SLIDE 19

Autoregression

  • Use past values Y_{t−1}, Y_{t−2}, … to predict Y_t
  • An autoregression is a regression model in which Y_t is regressed against its own lagged values
  • The number of lags used as regressors is called the order of the autoregression
  • In a first-order autoregression, Y_t is regressed against Y_{t−1}
  • In a p-th order autoregression, Y_t is regressed against Y_{t−1}, Y_{t−2}, …, Y_{t−p}

SLIDE 20

The First Order Autoregression Model AR(1)

  • AR(1) model: Y_t = β_0 + β_1 Y_{t−1} + u_t
  • The AR(1) model can be estimated by OLS regression of Y_t against Y_{t−1}
  • Testing β_1 = 0 vs. β_1 ≠ 0 provides a test of the hypothesis that Y_{t−1} is not useful for forecasting Y_t
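A sketch of estimating AR(1) by OLS with numpy. The simulated data and the "true" coefficients (1.0 and 0.6) are made up for illustration; the slides do not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) process: Y_t = b0 + b1 * Y_{t-1} + u_t
b0, b1, T = 1.0, 0.6, 500
y = np.zeros(T)
for t in range(1, T):
    y[t] = b0 + b1 * y[t - 1] + rng.normal(scale=0.5)

# OLS regression of Y_t on an intercept and Y_{t-1}
X = np.column_stack([np.ones(T - 1), y[:-1]])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
# beta[1] estimates b1; testing whether it is zero asks whether
# Y_{t-1} helps forecast Y_t
```

With 500 observations the OLS estimate of the slope should land close to the simulated 0.6.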

SLIDE 21

Prediction vs. Forecast

  • A predicted value refers to the value of Y predicted (using a regression) for an observation in the sample used to estimate the regression; this is the usual definition
  • Predicted values are "in sample"
  • A forecast refers to the value of Y forecasted for an observation not in the sample used to estimate the regression
  • Forecasts are forecasts of the future, which cannot have been used to estimate the regression

SLIDE 22

Time Series Regression with Additional Predictors

  • So far we have considered forecasting models that use only past values of Y
  • It makes sense to add other variables (X) that might be useful predictors of Y, above and beyond the predictive value of lagged values of Y
SLIDE 23

Mining Time Series Data

  • Basic Concepts
  • Time Series Prediction and Forecasting
  • Time Series Similarity Search
  • Summary

23

slide-24
SLIDE 24

Why Similarity Search?

  • Wide applications
  • Find a time period with similar inflation rate and unemployment time series
  • Find a similar stock to Facebook
  • Find a similar product to a query one according to its sales time series
  • …

SLIDE 24

Example

VanEck International Fund
Fidelity Selective Precious Metal and Mineral Fund

Two similar mutual funds in different fund groups

SLIDE 26

Similarity Search for Time Series Data

  • Time Series Similarity Search
  • Euclidean distance and L_p norms
  • Dynamic Time Warping (DTW)
  • Time Domain vs. Frequency Domain

SLIDE 27

Euclidean Distance and Lp Norms

  • Given two time series with equal length n:
  • C = c_1, c_2, …, c_n
  • Q = q_1, q_2, …, q_n
  • d(C, Q) = (Σ_i |c_i − q_i|^p)^(1/p)
  • When p = 2, it is the Euclidean distance
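The formula translates directly into a short numpy sketch:

```python
import numpy as np

def lp_distance(c, q, p=2):
    """L_p distance between two equal-length series; p=2 is Euclidean."""
    c, q = np.asarray(c, dtype=float), np.asarray(q, dtype=float)
    return np.sum(np.abs(c - q) ** p) ** (1.0 / p)

d = lp_distance([0.0, 0.0], [3.0, 4.0])  # 5.0, the familiar Euclidean case
```

Setting p = 1 gives the Manhattan distance over the same pair of series.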

SLIDE 28

Enhanced Lp Norm-based Distance

  • Issues with the L_p norm: cannot deal with offset and scaling in the Y-axis
  • Solution: normalize the time series:
  • c_i′ = (c_i − μ(C)) / σ(C)

SLIDE 29

Dynamic Time Warping (DTW)

  • For two sequences that do not line up well on the X-axis but share a roughly similar shape
  • We need to warp the time axis to achieve a better alignment

SLIDE 30

Goal of DTW

  • Given
  • Two sequences (with possibly different lengths):
  • X = {x_1, x_2, …, x_N}
  • Y = {y_1, y_2, …, y_M}
  • A local distance (cost) measure between x_n and y_m
  • Goal:
  • Find an alignment between X and Y such that the overall cost is minimized

SLIDE 31

Cost Matrix of Two Time Series

SLIDE 32

Represent an Alignment by Warping Path

  • An (N, M)-warping path is a sequence p = (p_1, p_2, …, p_L) with p_l = (n_l, m_l), satisfying three conditions:
  • Boundary condition: p_1 = (1, 1), p_L = (N, M)
  • Start at the first point and end at the last point
  • Monotonicity condition: n_l and m_l are non-decreasing in l
  • Step size condition: p_{l+1} − p_l ∈ {(0, 1), (1, 0), (1, 1)}
  • Move one step right, up, or diagonally up-right
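The three conditions can be checked mechanically. A small sketch (the function name is ours; paths are lists of 1-indexed (n, m) pairs):

```python
def is_warping_path(path, N, M):
    """Check the boundary, monotonicity, and step-size conditions."""
    if path[0] != (1, 1) or path[-1] != (N, M):
        return False  # boundary condition violated
    steps = {(0, 1), (1, 0), (1, 1)}  # right, up, or diagonal
    # Step sizes being non-negative also enforces monotonicity
    return all((b[0] - a[0], b[1] - a[1]) in steps
               for a, b in zip(path, path[1:]))

# Valid: moves only right/up/diagonally from (1,1) to (3,4)
is_warping_path([(1, 1), (2, 2), (2, 3), (3, 4)], 3, 4)   # True
# Invalid: the jump from (1,1) to (3,3) is not an allowed step
is_warping_path([(1, 1), (3, 3), (3, 4)], 3, 4)           # False
```

A checker like this answers the quiz on the next slide: trace each candidate path and see which condition fails.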

SLIDE 33

Q: Which Path is a Warping Path?

SLIDE 34

Optimal Warping Path

  • The total cost given a warping path p:
  • c_p(X, Y) = Σ_l c(x_{n_l}, y_{m_l})
  • The optimal warping path p*:
  • c_{p*}(X, Y) = min { c_p(X, Y) : p is an (N, M)-warping path }
  • The DTW distance between X and Y is defined as the optimal cost c_{p*}(X, Y)

SLIDE 35

How to Find p*?

  • Naïve solution:
  • Enumerate all possible warping paths
  • Exponential in N and M!

SLIDE 36

Dynamic Programming for DTW

  • Dynamic programming:
  • Let D(n, m) denote the DTW distance between X(1,…,n) and Y(1,…,m)
  • D is called the accumulated cost matrix
  • Note D(N, M) = DTW(X, Y)
  • Recursively calculate D(n, m):
  • D(n, m) = min{ D(n−1, m), D(n, m−1), D(n−1, m−1) } + c(x_n, y_m)
  • When m or n = 1:
  • D(n, 1) = Σ_{k=1:n} c(x_k, y_1)
  • D(1, m) = Σ_{k=1:m} c(x_1, y_k)


Time complexity: O(MN)
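The recursion translates into a short sketch; the choice of absolute difference as the local cost c is an assumption, since the slides leave the local measure generic.

```python
import numpy as np

def dtw_distance(x, y):
    """Fill the accumulated cost matrix D; the last cell is DTW(X, Y)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    N, M = len(x), len(y)
    c = np.abs(np.subtract.outer(x, y))   # local cost matrix
    D = np.zeros((N, M))
    D[0, 0] = c[0, 0]
    for n in range(1, N):                 # first column: D(n, 1)
        D[n, 0] = D[n - 1, 0] + c[n, 0]
    for m in range(1, M):                 # first row: D(1, m)
        D[0, m] = D[0, m - 1] + c[0, m]
    for n in range(1, N):                 # O(MN) fill, as noted above
        for m in range(1, M):
            D[n, m] = c[n, m] + min(D[n - 1, m], D[n, m - 1], D[n - 1, m - 1])
    return D[N - 1, M - 1]

# Warping absorbs the repeated 2, so these sequences match exactly
dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])  # 0.0
```

Keeping the full matrix D also allows the trace-back on the next slide: starting from (N, M), repeatedly step to whichever of the three predecessors achieved the minimum.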

SLIDE 37

Trace back to Get p* from D

SLIDE 38

Example

SLIDE 39

Time Domain vs. Frequency Domain

  • Many techniques for signal analysis require the data to be in the frequency domain
  • Usually data-independent transformations are used
  • The transformation matrix is determined a priori
  • discrete Fourier transform (DFT)
  • discrete wavelet transform (DWT)
  • The distance between two signals in the time domain is the same as their Euclidean distance in the frequency domain

SLIDE 40

Example of DFT

SLIDE 41

SLIDE 42

Example of DWT (with Haar Wavelet)

SLIDE 43

SLIDE 44

*Discrete Fourier Transformation

  • DFT does a good job of concentrating energy in the first few coefficients
  • If we keep only the first few coefficients of the DFT, we can compute a lower bound on the actual distance
  • Feature extraction: keep the first few coefficients (F-index) as representative of the sequence
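A sketch of this idea with numpy's FFT. The helper names are ours; the 1/√n scaling makes the transform orthonormal so that Parseval's theorem holds with equal norms on both sides.

```python
import numpy as np

def f_index(x, k=3):
    """First k DFT coefficients as a feature vector (F-index)."""
    x = np.asarray(x, dtype=float)
    return np.fft.fft(x)[:k] / np.sqrt(len(x))  # orthonormal scaling

def feature_distance(s, q, k=3):
    """Distance on the first k coefficients. By Parseval this
    lower-bounds the true Euclidean distance, so range search on
    the F-index has no false dismissals."""
    return np.sqrt(np.sum(np.abs(f_index(s, k) - f_index(q, k)) ** 2))

rng = np.random.default_rng(1)
s, q = rng.normal(size=16), rng.normal(size=16)
full = feature_distance(s, q, k=16)  # all coefficients: exact distance
lb = feature_distance(s, q, k=3)     # first 3: lb <= full
```

Because the FFT is linear, the coefficient distance with all n coefficients equals the time-domain Euclidean distance exactly; truncating to k coefficients can only drop non-negative terms from the sum.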

SLIDE 45

*DFT (Cont.)

  • Parseval's Theorem
  • The Euclidean distance between two signals in the time domain is the same as their distance in the frequency domain
  • Keeping only the first few (say, 3) coefficients underestimates the distance, so there are no false dismissals!

Σ_{t=0}^{n−1} |x[t]|² = Σ_{f=0}^{n−1} |X[f]|²

Σ_t |S[t] − Q[t]|² ≤ ε ⟹ Σ_{f=1}^{3} |F(S)[f] − F(Q)[f]|² ≤ ε

SLIDE 46

Mining Time Series Data

  • Basic Concepts
  • Time Series Prediction and Forecasting
  • Time Series Similarity Search
  • Summary

SLIDE 47

Summary

  • Time Series Prediction and Forecasting
  • Autocorrelation; autoregression
  • Time series similarity search
  • Euclidean distance and Lp norm
  • Dynamic time warping
  • Time domain vs. frequency domain
