SLIDE 1

CSE 6242 / CX 4242

Time Series Mining and Forecasting

Duen Horng (Polo) Chau
Georgia Tech

Slides based on Prof. Christos Faloutsos’s materials
SLIDE 2

Outline

  • Motivation
  • Similarity search – distance functions
  • Linear Forecasting
  • Non-linear forecasting
  • Conclusions
SLIDE 3

Problem definition

  • Given: one or more sequences

x_1, x_2, …, x_t, …   (y_1, y_2, …, y_t, …)   (…)

  • Find

– similar sequences; forecasts
– patterns; clusters; outliers

SLIDE 4

Motivation - Applications

  • Financial, sales, economic series
  • Medical

– ECGs, blood pressure, etc. monitoring
– reactions to new drugs
– elderly care

SLIDE 5

Motivation - Applications (cont’d)

  • ‘Smart house’

– sensors monitor temperature, humidity, air quality

  • video surveillance
SLIDE 6

Motivation - Applications (cont’d)

  • Weather, environment/anti-pollution

– volcano monitoring
– air/water pollutant monitoring

SLIDE 7

Motivation - Applications (cont’d)

  • Computer systems

– ‘Active Disks’ (buffering, prefetching)
– web servers (ditto)
– network traffic monitoring
– ...

SLIDE 8

Stream Data: Disk accesses

[Plot: #bytes accessed over time]

SLIDE 9

Problem #1:

Goal: given a signal (e.g., #packets over time)
Find: patterns, periodicities, and/or compress

[Plot: count of lynx caught per year vs. year (similarly: packets per day; temperature per day)]

SLIDE 10

Problem #2: Forecast

Given x_t, x_{t-1}, …, forecast x_{t+1}

[Plot: number of packets sent vs. time tick; forecast the next value (??)]

SLIDE 11

Problem #2’: Similarity search

E.g., find a 3-tick pattern similar to the last one

[Plot: number of packets sent vs. time tick; locate a similar 3-tick pattern (??)]

SLIDE 12

Problem #3:

  • Given: A set of correlated time sequences
  • Forecast ‘Sent(t)’

[Plot: number of packets (sent, lost, repeated) vs. time tick]

SLIDE 13

Important observations

Patterns, rules, forecasting and similarity indexing are closely related:

  • To do forecasting, we need
    – to find patterns/rules
    – to find similar settings in the past
  • To find outliers, we need to have forecasts
    – (outlier = too far away from our forecast)

SLIDE 14

Outline

  • Motivation
  • Similarity Search and Indexing
  • Linear Forecasting
  • Non-linear forecasting
  • Conclusions
SLIDE 15

Outline

  • Motivation
  • Similarity search and distance functions

– Euclidean
– Time-warping

  • ...
SLIDE 16

Importance of distance functions

Subtle, but absolutely necessary:

  • A ‘must’ for similarity indexing (-> forecasting)
  • A ‘must’ for clustering

Two major families:
  – Euclidean and Lp norms
  – Time warping and variations

SLIDE 17

Euclidean and Lp

$$D(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

$$L_p(\vec{x}, \vec{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

[Plot: two time series x(t), y(t) compared tick by tick]

L1: city-block = Manhattan
L2: Euclidean
L∞ (max)
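A minimal NumPy sketch of these Lp distances (an added illustration, not from the original slides; the example arrays are made up):

```python
import numpy as np

def lp_distance(x, y, p=2):
    """L_p distance between two equal-length sequences.
    p=1: city-block (Manhattan), p=2: Euclidean, p=np.inf: max norm."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    diff = np.abs(x - y)
    if np.isinf(p):
        return diff.max()
    return (diff ** p).sum() ** (1.0 / p)

# Example: two short time series of equal length
x = [1.0, 2.0, 4.0, 3.0]
y = [1.0, 2.5, 3.5, 3.0]
print(lp_distance(x, y, p=1))       # city-block
print(lp_distance(x, y, p=2))       # Euclidean
print(lp_distance(x, y, p=np.inf))  # L-infinity
```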

SLIDE 18

Observation #1

  • Time sequence -> n-d vector

[Diagram: a length-n sequence becomes one point in n-d space, with axes Day-1, Day-2, …, Day-n]

SLIDE 19

Observation #2

Euclidean distance is closely related to
  – cosine similarity
  – dot product
  – ‘cross-correlation’ function

[Diagram: the same n-d vector view, axes Day-1, Day-2, …, Day-n]
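An added note to make the connection concrete: assuming each sequence is z-normalized so that it has zero mean and Σᵢ xᵢ² = Σᵢ yᵢ² = n, then

$$\|\vec{x}-\vec{y}\|_2^2 \;=\; \sum_i x_i^2 + \sum_i y_i^2 - 2\,\vec{x}\cdot\vec{y} \;=\; 2n\,\bigl(1 - \mathrm{corr}(\vec{x},\vec{y})\bigr)$$

so a small Euclidean distance corresponds to a high correlation / cosine similarity.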

SLIDE 20

Time Warping

  • allow accelerations - decelerations
    – (with or w/o penalty)
  • THEN compute the (Euclidean) distance (+ penalty)
  • related to the string-editing distance
SLIDE 21

Time Warping

‘stutters’:

SLIDE 22

Time warping

Q: how to compute it?
A: dynamic programming.

D(i, j) = cost to match the prefix of length i of the first sequence x with the prefix of length j of the second sequence y
SLIDE 23

Time warping

Thus, with no penalty for stutter, for sequences x_1, x_2, …, x_i; y_1, y_2, …, y_j:

$$D(i,j) = \|x[i] - y[j]\| + \min \begin{cases} D(i-1,\,j-1) & \text{(no stutter)} \\ D(i,\,j-1) & \text{(x-stutter)} \\ D(i-1,\,j) & \text{(y-stutter)} \end{cases}$$
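A minimal Python sketch of this dynamic program (an added illustration, not from the slides; it uses squared difference as the local cost and allows unlimited, unpenalized stutters):

```python
import numpy as np

def dtw_distance(x, y):
    """Time-warping distance via dynamic programming, O(M*N)."""
    M, N = len(x), len(y)
    D = np.full((M + 1, N + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2       # local matching cost
            D[i, j] = cost + min(D[i - 1, j - 1],   # no stutter
                                 D[i, j - 1],       # x-stutter
                                 D[i - 1, j])       # y-stutter
    return D[M, N]

# Example: y is a "stuttered" version of x, so the warping distance stays small
x = [1, 2, 3, 4]
y = [1, 2, 2, 2, 3, 4]
print(dtw_distance(x, y))  # 0.0 -- time warping absorbs the stutters
```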

SLIDE 24

Time warping

VERY SIMILAR to the string-editing distance:

$$D(i,j) = \|x[i] - y[j]\| + \min \begin{cases} D(i-1,\,j-1) & \text{(no stutter)} \\ D(i,\,j-1) & \text{(x-stutter)} \\ D(i-1,\,j) & \text{(y-stutter)} \end{cases}$$

SLIDE 25

Time warping

  • Complexity: O(M*N), quadratic in the length of the sequences
  • Many variations (penalty for stutters; limit on the number/percentage of stutters; …)
  • Popular in voice processing [Rabiner + Juang]

SLIDE 26

Other Distance functions

  • piece-wise linear/flat approximation; compare pieces [Keogh+01] [Faloutsos+97]
  • ‘cepstrum’ (for voice [Rabiner + Juang])
    – do DFT; take log of amplitude; do DFT again!
  • allow for small gaps [Agrawal+95]

See tutorial by [Gunopulos + Das, SIGMOD’01]
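A rough NumPy sketch of the cepstrum recipe above (an added illustration; a real-valued signal is assumed and a small epsilon guards the log):

```python
import numpy as np

def cepstrum(signal, eps=1e-12):
    """'Cepstrum': DFT -> log of amplitude -> DFT again."""
    spectrum = np.fft.fft(signal)                    # DFT
    log_amplitude = np.log(np.abs(spectrum) + eps)   # log of amplitude
    return np.fft.fft(log_amplitude)                 # DFT again

t = np.arange(256)
x = np.sin(2 * np.pi * t / 16) + 0.1 * np.random.randn(256)
print(np.abs(cepstrum(x))[:5])
```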

SLIDE 27

Other Distance functions

  • In [Keogh+, KDD’04]: parameter-free, MDL-based

SLIDE 28

Conclusions

Prevailing distances:
  – Euclidean
  – time-warping

SLIDE 29

Outline

  • Motivation
  • Similarity search and distance functions
  • Linear Forecasting
  • Non-linear forecasting
  • Conclusions
SLIDE 30

Linear Forecasting

SLIDE 31

Forecasting

“Prediction is very difficult, especially about the future.”
  – Niels Bohr, Danish physicist and Nobel Prize laureate

SLIDE 32

Outline

  • Motivation
  • ...
  • Linear Forecasting

– Auto-regression: Least Squares; RLS
– Co-evolving time sequences
– Examples
– Conclusions

SLIDE 33

Reference

[Yi+00] Byoung-Kee Yi et al.: Online Data Mining for Co-Evolving Time Sequences, ICDE 2000. (Describes MUSCLES and Recursive Least Squares)

SLIDE 34

Problem #2: Forecast

  • Example: given x_{t-1}, x_{t-2}, …, forecast x_t

[Plot: number of packets sent vs. time tick; forecast the next value (??)]

SLIDE 35

Forecasting: Preprocessing

MANUALLY: remove trends; spot periodicities

[Plots: a series with a trend to remove; a series with a 7-day periodicity to spot]
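An added sketch of what this manual preprocessing could look like in NumPy (illustrative only: the trend is removed with a least-squares line and the dominant period is read off the DFT):

```python
import numpy as np

def remove_linear_trend(x):
    """Fit a straight line to the series and subtract it."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, deg=1)
    return x - (slope * t + intercept)

def dominant_period(x):
    """Return the period (in ticks) of the strongest non-zero frequency."""
    spectrum = np.abs(np.fft.rfft(x - np.mean(x)))
    freqs = np.fft.rfftfreq(len(x))
    k = np.argmax(spectrum[1:]) + 1   # skip the DC component
    return 1.0 / freqs[k]

t = np.arange(140)
series = 0.5 * t + 10 * np.sin(2 * np.pi * t / 7) + np.random.randn(140)
detrended = remove_linear_trend(series)
print(round(dominant_period(detrended)))   # ~7 (a weekly periodicity)
```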

SLIDE 36

Problem #2: Forecast

  • Solution: try to express x_t as a linear function of the past x_{t-1}, x_{t-2}, … (up to a window of w).

Formally: x_t ≈ a_1 x_{t-1} + a_2 x_{t-2} + … + a_w x_{t-w}

[Plot: number of packets sent vs. time tick; forecast the next value (??)]

SLIDE 37

(Problem: Back-cast; interpolate)

  • Solution - interpolate: try to express x_t as a linear function of the past AND the future:
    x_{t+1}, x_{t+2}, …, x_{t+w_future};  x_{t-1}, …, x_{t-w_past}
    (up to windows of w_past, w_future)
  • EXACTLY the same algo’s

[Plot: number of packets sent vs. time tick; interpolate the missing value (??)]

SLIDE 38

Refresher: Linear Regression

[Scatter plot: Body height vs. Body weight]

patient   weight   height
1         27       43
2         43       54
3         54       72
…         …        …
N         25       ??

  • express what we don’t know (= “dependent variable”)
  • as a linear function of what we know (= “independent variable(s)”)
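An added one-line version of this refresher in NumPy, using the toy table above (the prediction is only as good as the toy data):

```python
import numpy as np

weight = np.array([27.0, 43.0, 54.0])   # independent variable (what we know)
height = np.array([43.0, 54.0, 72.0])   # dependent variable (what we don't)

slope, intercept = np.polyfit(weight, height, deg=1)   # least-squares line
print(slope * 25 + intercept)   # predicted height for patient N (weight 25)
```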


SLIDE 42

Linear Auto Regression

Time   Packets Sent (t-1)   Packets Sent (t)
1      -                    43
2      43                   54
3      54                   72
…      …                    …
N      25                   ??

SLIDE 43

Linear Auto Regression

[Lag-plot: #packets sent at time t vs. #packets sent at time t-1]

Time   Packets Sent (t-1)   Packets Sent (t)
1      -                    43
2      43                   54
3      54                   72
…      …                    …
N      25                   ??

Lag w = 1
Dependent variable = # of packets sent (S[t])
Independent variable = # of packets sent (S[t-1])
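An added sketch of building such lag-plot pairs from a raw series (the packet counts beyond the table’s 43, 54, 72 are made up for illustration):

```python
import numpy as np

def lag_plot_pairs(series, lag=1):
    """Return (S[t-lag], S[t]) pairs -- the 'lag-plot' points."""
    s = np.asarray(series, dtype=float)
    return np.column_stack([s[:-lag], s[lag:]])

packets = [43, 54, 72, 61, 58, 66]
pairs = lag_plot_pairs(packets, lag=1)
# each row: (# packets sent at time t-1, # packets sent at time t)
print(pairs)
```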


SLIDE 47

Outline

  • Motivation
  • ...
  • Linear Forecasting

– Auto-regression: Least Squares; RLS
– Co-evolving time sequences
– Examples
– Conclusions

SLIDE 48

More details:

  • Q1: Can it work with window w > 1?
  • A1: YES!

[3-d diagram: axes x_{t-2}, x_{t-1}, x_t]

SLIDE 49

More details:

  • Q1: Can it work with window w > 1?
  • A1: YES! (we’ll fit a hyper-plane, then!)

[3-d diagram: axes x_{t-2}, x_{t-1}, x_t; the fitted hyper-plane]


SLIDE 51

More details:

  • Q1: Can it work with window w > 1?
  • A1: YES! The problem becomes:

X[N × w] × a[w × 1] = y[N × 1]

  • OVER-CONSTRAINED
    – a is the vector of the regression coefficients
    – X has the N values of the w indep. variables
    – y has the N values of the dependent variable

SLIDE 52

More details:

  • X[N × w] × a[w × 1] = y[N × 1]

$$
\begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1w} \\
X_{21} & X_{22} & \cdots & X_{2w} \\
\vdots &        & \ddots & \vdots \\
X_{N1} & X_{N2} & \cdots & X_{Nw}
\end{bmatrix}
\times
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_w \end{bmatrix}
=
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
$$

(rows of X: one per time tick; columns of X: Ind-var-1, …, Ind-var-w)


SLIDE 54

More details

  • Q2: How to estimate a1, a2, …, aw = a?
  • A2: with Least Squares fit (Moore-Penrose pseudo-inverse)
  • a is the vector that minimizes the RMSE from y

$$a = (X^T \times X)^{-1} \times (X^T \times y)$$
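An added NumPy sketch of this least-squares step for auto-regression with window w (illustrative; np.linalg.lstsq solves the same system in a numerically safer way):

```python
import numpy as np

def fit_ar_coefficients(series, w):
    """Fit x_t ~ a_1*x_{t-1} + ... + a_w*x_{t-w} by least squares."""
    s = np.asarray(series, dtype=float)
    # Build X (N x w): one row per time tick, columns = the w previous values
    X = np.column_stack([s[w - k - 1: len(s) - k - 1] for k in range(w)])
    y = s[w:]                                   # dependent variable
    # Textbook formula from the slide: a = (X^T X)^{-1} (X^T y)
    a = np.linalg.inv(X.T @ X) @ (X.T @ y)
    # (np.linalg.lstsq(X, y, rcond=None) is the numerically safer equivalent)
    return a

series = np.sin(np.arange(200) * 0.3) + 0.01 * np.random.randn(200)
print(fit_ar_coefficients(series, w=2))
```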

SLIDE 55

More details

  • Straightforward solution: a = (X^T × X)^{-1} × (X^T × y)
    (a: regression coefficient vector; X: sample matrix, X_N is N × w)
  • Observations:
    – Sample matrix X grows over time
    – needs matrix inversion
    – O(N × w^2) computation
    – O(N × w) storage
SLIDE 56

Even more details

  • Q3: Can we estimate a incrementally?
  • A3: Yes, with the brilliant, classic method of “Recursive Least Squares” (RLS) (see, e.g., [Yi+00], for details).
  • We can do the matrix inversion, WITHOUT inversion! (How is that possible?!)

SLIDE 57

Even more details

  • Q3: Can we estimate a incrementally?
  • A3: Yes, with the brilliant, classic method of “Recursive Least Squares” (RLS) (see, e.g., [Yi+00], for details).
  • We can do the matrix inversion, WITHOUT inversion! (How is that possible?!)
  • A: our matrix has special form: (X^T X)
SLIDE 58

More details

At the (N+1)-st time tick:

[Diagram: X_{N+1} = the N × w matrix X_N with the new 1 × w row x_{N+1} appended]

SKIP

SLIDE 59
More details: key ideas

  • Let G_N = (X_N^T × X_N)^{-1}   (the “gain matrix”, w × w)
  • G_{N+1} can be computed recursively from G_N, without matrix inversion

SKIP

SLIDE 60

Comparison:

  • Straightforward Least Squares
    – Needs huge matrix (growing in size): O(N×w)
    – Costly matrix operation: O(N×w^2)
  • Recursive LS
    – Need much smaller, fixed size matrix: O(w×w)
    – Fast, incremental computation: O(1×w^2)
    – no matrix inversion

(e.g., N = 10^6, w = 1-100)

SLIDE 61

EVEN more details:

Let’s elaborate (VERY IMPORTANT, VERY VALUABLE!):

$$G_{N+1} = G_N - c^{-1} \, [G_N \times x_{N+1}^T] \times [x_{N+1} \times G_N]$$

$$c = [1 + x_{N+1} \times G_N \times x_{N+1}^T]$$

(x_{N+1}: 1 × w row vector)

SKIP

SLIDE 62

EVEN more details:

$$a_{N+1} = [X_{N+1}^T \times X_{N+1}]^{-1} \times [X_{N+1}^T \times y_{N+1}]$$

SKIP

SLIDE 63

EVEN more details:

$$a_{N+1} = [X_{N+1}^T \times X_{N+1}]^{-1} \times [X_{N+1}^T \times y_{N+1}]$$

dimensions: a_{N+1}: [w × 1];  X_{N+1}^T: [w × (N+1)];  X_{N+1}: [(N+1) × w];  y_{N+1}: [(N+1) × 1]

SKIP

SLIDE 64

EVEN more details:

$$a_{N+1} = [X_{N+1}^T \times X_{N+1}]^{-1} \times [X_{N+1}^T \times y_{N+1}]$$

(dimensions highlighted: X_{N+1}^T: [w × (N+1)], X_{N+1}: [(N+1) × w])

SKIP

SLIDE 65

EVEN more details:

$$a_{N+1} = [X_{N+1}^T \times X_{N+1}]^{-1} \times [X_{N+1}^T \times y_{N+1}]$$

Define the ‘gain matrix’:

$$G_{N+1} \equiv [X_{N+1}^T \times X_{N+1}]^{-1}$$

$$G_{N+1} = G_N - c^{-1} \, [G_N \times x_{N+1}^T] \times [x_{N+1} \times G_N]$$

$$c = [1 + x_{N+1} \times G_N \times x_{N+1}^T]$$

dimensions: G_N, G_{N+1}: w × w;  G_N × x_{N+1}^T: w × 1;  x_{N+1} × G_N: 1 × w;  c: 1 × 1 (SCALAR!)

SKIP

SLIDE 66

Altogether:

$$G_0 \equiv \delta \, I$$

where I: w × w identity matrix; δ: a large positive number

SKIP
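Putting the recursion together, an added Python sketch (the gain-matrix update follows the formulas above; the coefficient-update line is the standard RLS step implied by a = G × X^T × y and is not spelled out on these slides; see [Yi+00]):

```python
import numpy as np

class RecursiveLeastSquares:
    """Incrementally maintain a = argmin ||X a - y|| using the gain matrix G."""
    def __init__(self, w, delta=1000.0):
        self.G = delta * np.eye(w)     # G_0 = delta * I, delta large
        self.a = np.zeros(w)           # current regression coefficients

    def update(self, x_new, y_new):
        """x_new: the w independent values at this tick; y_new: dependent value."""
        x = np.asarray(x_new, dtype=float)               # 1 x w row vector
        Gx = self.G @ x                                  # w x 1
        c = 1.0 + x @ Gx                                 # scalar
        self.G = self.G - np.outer(Gx, x @ self.G) / c   # G_{N+1}, no inversion
        # standard RLS coefficient update, driven by the prediction error
        self.a = self.a + self.G @ x * (y_new - x @ self.a)
        return self.a

# Usage: AR(2) forecasting, feeding one time tick at a time
rls = RecursiveLeastSquares(w=2)
series = np.sin(np.arange(300) * 0.3)
for t in range(2, len(series)):
    rls.update([series[t - 1], series[t - 2]], series[t])
print(rls.a)                              # learned AR coefficients
print(rls.a @ [series[-1], series[-2]])   # forecast for the next tick
```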

SLIDE 67

Comparison:

  • Straightforward Least Squares
    – Needs huge matrix (growing in size): O(N×w)
    – Costly matrix operation: O(N×w^2)
  • Recursive LS
    – Need much smaller, fixed size matrix: O(w×w)
    – Fast, incremental computation: O(1×w^2)
    – no matrix inversion

(e.g., N = 10^6, w = 1-100)

SLIDE 68

Pictorially:

  • Given: [scatter plot of a dependent variable vs. an independent variable]

SLIDE 69

Pictorially:

[Scatter plot: dependent vs. independent variable; a new point arrives]

SLIDE 70

Pictorially:

[Scatter plot: a new point arrives; RLS quickly computes the new best fit]

SLIDE 71

Even more details

  • Q4: can we ‘forget’ the older samples?
  • A4: Yes - RLS can easily handle that [Yi+00]:
SLIDE 72

Adaptability - ‘forgetting’

[Scatter plot: dependent variable (e.g., #bytes sent) vs. independent variable (e.g., #packets sent)]

SLIDE 73

Adaptability - ‘forgetting’

[Scatter plot: dependent variable (e.g., #bytes sent) vs. independent variable (e.g., #packets sent); a trend change occurs; the (R)LS fit with no forgetting is shown]

SLIDE 74

Adaptability - ‘forgetting’

[Scatter plot: trend change; (R)LS fit with no forgetting vs. (R)LS fit with forgetting]

  • RLS: can *trivially* handle ‘forgetting’
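One common way to implement such ‘forgetting’ (an added sketch, not necessarily the exact formulation in [Yi+00]) is an exponential forgetting factor 0 < λ ≤ 1 in the gain-matrix update; λ = 1 recovers ordinary RLS:

```python
import numpy as np

def rls_update_with_forgetting(G, a, x_new, y_new, lam=0.98):
    """One RLS step that exponentially down-weights old samples (lam = 1: ordinary RLS)."""
    x = np.asarray(x_new, dtype=float)
    Gx = G @ x
    c = lam + x @ Gx
    G = (G - np.outer(Gx, x @ G) / c) / lam      # forgetting-factor gain update
    a = a + G @ x * (y_new - x @ a)              # coefficient update on the prediction error
    return G, a
```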