CX4242: Time Series Mining and Forecasting. Mahdi Roozbahani, Lecturer, Computational Science and Engineering, Georgia Tech (PowerPoint PPT Presentation)



SLIDE 1

Class Website

CX4242: Time Series Mining and Forecasting

Mahdi Roozbahani Lecturer, Computational Science and Engineering, Georgia Tech

SLIDE 2

Outline

  • Motivation
  • Similarity search – distance functions
  • Linear Forecasting
  • Non-linear forecasting
  • Conclusions
SLIDE 3

Problem definition

  • Given: one or more sequences

x1, x2, …, xt, …; (y1, y2, …, yt, …); (…)

  • Find

– similar sequences; forecasts
– patterns; clusters; outliers

SLIDE 4

Motivation - Applications

  • Financial, sales, economic series
  • Medical

– ECG, blood pressure, etc. monitoring
– reactions to new drugs
– elderly care

SLIDE 5

Motivation - Applications (cont’d)

  • ‘Smart house’

– sensors monitor temperature, humidity, air quality

  • video surveillance
SLIDE 6

Motivation - Applications (cont’d)

  • Weather, environment/anti-pollution

– volcano monitoring
– air/water pollutant monitoring

SLIDE 7

Motivation - Applications (cont’d)

  • Computer systems

– ‘Active Disks’ (buffering, prefetching)
– web servers (ditto)
– network traffic monitoring
– ...

SLIDE 8

Stream Data: Disk accesses

[Plot: #bytes accessed over time]

SLIDE 9

Problem #1:

Goal: given a signal (e.g., #packets over time), find patterns, periodicities, and/or compress it.

[Plot: count of lynx caught per year vs. year (similarly: packets per day; temperature per day)]

SLIDE 10

Problem#2: Forecast

Given xt, xt-1, …, forecast xt+1

[Plot: number of packets sent vs. time tick, with the next value to forecast marked ‘??’]

SLIDE 11

Problem#2’: Similarity search

E.g., find a 3-tick pattern similar to the last one.

[Plot: number of packets sent vs. time tick, with the query pattern marked ‘??’]

SLIDE 12

Problem #3:

  • Given: A set of correlated time sequences
  • Forecast ‘Sent(t)’

[Plot: three correlated sequences over time ticks: packets sent, lost, repeated]

SLIDE 13

Important observations

Patterns, rules, forecasting and similarity indexing are closely related:

  • To do forecasting, we need

– to find patterns/rules
– to find similar settings in the past

  • To find outliers, we need to have forecasts

– (outlier = too far away from our forecast)

SLIDE 14

Outline

  • Motivation
  • Similarity search and distance functions

– Euclidean
– Time-warping

  • ...
SLIDE 15

Importance of distance functions

Subtle, but absolutely necessary:

  • A ‘must’ for similarity indexing (-> forecasting)
  • A ‘must’ for clustering

Two major families:

– Euclidean and Lp norms
– Time warping and variations

SLIDE 16

Euclidean and Lp

[Plot: two sequences x(t) and y(t)]

– L1: city-block = Manhattan
– L2 = Euclidean
– Lp in general
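As a minimal sketch (not from the slides): the whole Lp family fits in a few NumPy lines. The helper name `lp_distance` is illustrative.

```python
import numpy as np

def lp_distance(x, y, p=2):
    """L_p distance between two equal-length sequences.

    p=1: city-block (Manhattan); p=2: Euclidean;
    p=inf: maximum coordinate-wise difference.
    """
    diff = np.abs(np.asarray(x, float) - np.asarray(y, float))
    if np.isinf(p):
        return diff.max()
    return (diff ** p).sum() ** (1.0 / p)
```

For example, between the points (0, 0) and (3, 4), L1 gives 7, L2 gives 5, and L-infinity gives 4.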

SLIDE 17

Observation #1

Time sequence -> n-d vector

[Illustration: a sequence of n daily values becomes a point in n-dimensional space (axes Day-1, Day-2, …, Day-n)]

SLIDE 18

Observation #2

Euclidean distance is closely related to

– cosine similarity
– dot product

[Illustration: the same n-dimensional vector view (axes Day-1, Day-2, …, Day-n)]

SLIDE 19

Time Warping

  • allow accelerations and decelerations

– (with or without penalty)

  • THEN compute the (Euclidean) distance (+ penalty)
  • related to the string-editing distance
SLIDE 20

Time Warping

‘stutters’:

SLIDE 21

Time warping

Q: how to compute it?
A: dynamic programming.
D(i, j) = cost to match the prefix of length i of the first sequence x with the prefix of length j of the second sequence y
SLIDE 22

Time warping

http://www.psb.ugent.be/cbd/papers/gentxwarper/DTWalgorithm.htm

SLIDE 23

Time warping

Thus, with no penalty for stutter, for sequences x1, x2, …, xi; y1, y2, …, yj:

D(i, j) = ||xi − yj|| + min{ D(i−1, j) [x-stutter], D(i, j−1) [y-stutter], D(i−1, j−1) [no stutter] }

https://nipunbatra.github.io/blog/2014/dtw.html
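The recurrence above translates directly into a dynamic program. A sketch, with illustrative choices of mine: the function name `dtw`, absolute difference as the local cost, and no stutter penalty.

```python
import numpy as np

def dtw(x, y):
    """Dynamic-time-warping distance with no stutter penalty.

    D[i, j] = cost of matching the length-i prefix of x with the
    length-j prefix of y: the local cost |x_i - y_j| plus the best
    of the three predecessors (x-stutter, y-stutter, no stutter).
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # x-stutter
                                 D[i, j - 1],      # y-stutter
                                 D[i - 1, j - 1])  # no stutter
    return D[n, m]
```

For instance, `dtw([1, 2, 3], [1, 2, 2, 3])` is 0: the repeated 2 is absorbed by a stutter at no cost, exactly where plain Euclidean distance would be undefined (unequal lengths) or penalized.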

SLIDE 24

Time warping

VERY SIMILAR to the string-editing distance (same three-way recurrence: x-stutter, y-stutter, no stutter)

SLIDE 25

Time warping

  • Complexity: O(M*N), quadratic in the length of the sequences
  • Many variations (penalty for stutters; limit on the number/percentage of stutters; …)
  • popular in voice processing

[Rabiner + Juang]

SLIDE 26

Other Distance functions

  • piece-wise linear/flat approximations; compare pieces [Keogh+01] [Faloutsos+97]
  • ‘cepstrum’ (for voice [Rabiner+Juang])

– do DFT; take log of amplitude; do DFT again!

  • Allow for small gaps [Agrawal+95]

See tutorial by [Gunopulos + Das, SIGMOD01]
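The cepstrum recipe above ("do DFT; take log of amplitude; do DFT again") can be sketched as follows. One assumption of mine: the common "real cepstrum" formulation uses the *inverse* DFT for the second transform, which is what this sketch does; the `eps` guard against log(0) is also illustrative.

```python
import numpy as np

def cepstrum(x, eps=1e-12):
    """Real cepstrum of a sequence: DFT -> log|amplitude| -> inverse DFT.

    eps guards against taking log of exactly-zero spectral bins.
    """
    spectrum = np.fft.fft(x)
    return np.fft.ifft(np.log(np.abs(spectrum) + eps)).real
```

The output has the same length as the input and can be compared with, e.g., Euclidean distance as a voice-oriented feature vector.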

SLIDE 27

Other Distance functions

  • In [Keogh+, KDD’04]: parameter-free, MDL-based

SLIDE 28

Conclusions

Prevailing distances:

– Euclidean
– time-warping

SLIDE 29

Outline

  • Motivation
  • Similarity search and distance functions
  • Linear Forecasting
  • Non-linear forecasting
  • Conclusions
SLIDE 30

Linear Forecasting

SLIDE 31

Outline

  • Motivation
  • ...
  • Linear Forecasting

– Auto-regression: Least Squares; RLS
– Co-evolving time sequences
– Examples
– Conclusions

SLIDE 32

Problem#2: Forecast

  • Example: given xt-1, xt-2, …, forecast xt

[Plot: number of packets sent vs. time tick, with the next value to forecast marked ‘??’]

SLIDE 33

Forecasting: Preprocessing

MANUALLY: remove trends; spot periodicities

[Plots: a series with a rising trend over time; a series with a 7-day periodicity over time]

https://machinelearningmastery.com/time-series-trends-in-python/
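Trend removal can be sketched as fitting a straight line over time and subtracting it. The helper name `remove_linear_trend` is illustrative, not from the slides.

```python
import numpy as np

def remove_linear_trend(x):
    """Fit a line x ~ slope*t + intercept over the time index
    and subtract it, leaving the detrended residual."""
    x = np.asarray(x, float)
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)
```

On a purely linear series the residual is (numerically) zero; on real data what remains is the periodic and noisy part, which is then easier to model.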

SLIDE 34

Problem#2: Forecast

  • Solution: try to express xt as a linear function of the past xt-1, xt-2, … (up to a window of w). Formally:

xt ≈ a1·xt-1 + a2·xt-2 + … + aw·xt-w

[Plot: number of packets sent vs. time tick, ‘??’ marking the value to forecast]

SLIDE 35

(Problem: Back-cast; interpolate)

  • Solution (interpolate): try to express xt as a linear function of the past AND the future:

xt+1, xt+2, …, xt+wfuture; xt-1, …, xt-wpast

(up to windows of wpast, wfuture)

  • EXACTLY the same algorithms apply

[Plot: number of packets sent vs. time tick, ‘??’ marking the value to interpolate]

SLIDE 36

Refresher: Linear Regression

Express what we don’t know (the “dependent variable”) as a linear function of what we know (the “independent variable(s)”).

[Scatter plot: body weight vs. body height, with a fitted line]

SLIDE 37

Linear Auto Regression

SLIDE 38

Linear Auto Regression

[‘Lag-plot’: #packets sent at time t vs. #packets sent at time t-1 (lag w = 1)]

Dependent variable = # of packets sent (S[t])
Independent variable = # of packets sent (S[t-1])


SLIDE 40

More details:

  • Q1: Can it work with window w > 1?
  • A1: YES! (we’ll fit a hyper-plane, then!)

[3-d scatter: xt as a function of xt-1 and xt-2]


SLIDE 42

More details:

  • Q1: Can it work with window w > 1?
  • A1: YES! The problem becomes:

X[N×w] · a[w×1] = y[N×1]

  • OVER-CONSTRAINED:

– a is the vector of the regression coefficients
– X has the N values of the w independent variables
– y has the N values of the dependent variable
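Building this over-constrained system from a raw series can be sketched as follows; the helper name `lag_matrix` is illustrative, not from the slides. Each row of X holds the w most recent past values, and y holds the value to predict.

```python
import numpy as np

def lag_matrix(series, w):
    """Build X (N x w) and y (N,) for the AR(w) system X . a = y.

    Row t of X is [x_{t-1}, x_{t-2}, ..., x_{t-w}]; y[t] is x_t.
    N = len(series) - w usable equations.
    """
    series = np.asarray(series, float)
    X = np.array([series[t - w:t][::-1] for t in range(w, len(series))])
    y = series[w:]
    return X, y
```

With N rows and only w unknowns (N >> w), the system has no exact solution in general, which is why the next slides turn to Least Squares.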

SLIDE 43

More details:

  • X[N w]  a[w 1] = y[N 1]

Ind-var1 Ind-var-w time


SLIDE 45

More details

  • Q2: How to estimate a1, a2, …, aw = a?
  • A2: with the Least Squares fit (Moore-Penrose pseudo-inverse)
  • a is the vector that minimizes the RMSE from y:

a = (X^T · X)^{-1} · (X^T · y)
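The Least Squares formula can be sketched in NumPy. One implementation choice of mine: a linear solve instead of an explicit matrix inverse, which is mathematically equivalent for full-rank X but numerically safer.

```python
import numpy as np

def ols_coefficients(X, y):
    """Least-squares fit: solve (X^T X) a = X^T y for a,
    i.e. a = (X^T X)^{-1} (X^T y) without forming the inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```

(`np.linalg.lstsq(X, y, rcond=None)` gives the same answer and also handles rank-deficient X via the pseudo-inverse.)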

SLIDE 46

More details

  • Straightforward solution:

a = (X^T · X)^{-1} · (X^T · y)

(a: regression coefficient vector; X: sample matrix)

  • Observations:

– sample matrix X grows over time
– needs matrix inversion
– O(N·w^2) computation
– O(N·w) storage

[Illustration: X_N is an N × w matrix]


SLIDE 48

Even more details

  • Q3: Can we estimate a incrementally?
  • A3: Yes, with the brilliant, classic method of “Recursive Least Squares” (RLS) (see, e.g., [Yi+00], for details).
  • We can do the matrix inversion WITHOUT inversion! (How is that possible?!)
  • A: our matrix has a special form: (X^T · X)
SLIDE 49

More details

[Illustration: at time tick N+1, the sample matrix X_{N+1} is X_N (N × w) with one new row x_{N+1} appended] SKIP

SLIDE 50
More details: key ideas

  • Let G_N = (X_N^T · X_N)^{-1} (the “gain matrix”)
  • G_{N+1} can be computed recursively from G_N without matrix inversion

[Illustration: G_N is a small w × w matrix] SKIP

SLIDE 51

Comparison:

  • Straightforward Least Squares:

– needs a huge matrix (growing in size): O(N×w)
– costly matrix operation: O(N×w^2)

  • Recursive LS:

– needs a much smaller, fixed-size matrix: O(w×w)
– fast, incremental computation: O(1×w^2) per time tick
– no matrix inversion

(e.g., N = 10^6, w = 1-100)

SLIDE 52

EVEN more details:

Let’s elaborate (VERY IMPORTANT, VERY VALUABLE!)

[Derivation shown as figures; x_{N+1} is a 1 × w row vector] SKIP


SLIDE 54

EVEN more details:

[Derivation step; matrix dimensions: w×1, w×(N+1), (N+1)×w, w×(N+1), (N+1)×1] SKIP

SLIDE 55

EVEN more details:

[Derivation step; matrix dimensions: w×(N+1), (N+1)×w] SKIP

SLIDE 56

EVEN more details:

[Derivation step for the ‘gain matrix’; dimensions w×w, w×1, 1×w, and a 1×1 SCALAR!] SKIP

SLIDE 57

Altogether:

G_0 = d · I (where I: w × w identity matrix; d: a large positive number)

SKIP
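The RLS recursion can be sketched as follows, under some stated assumptions: the update uses the Sherman-Morrison identity (a rank-one update of the inverse, which is what makes "inversion without inversion" possible for the special form X^T · X), the class name `RLS` is mine, and `d=1e6` is an illustrative "large positive number" for the G_0 = d·I initialization.

```python
import numpy as np

class RLS:
    """Recursive Least Squares (no forgetting).

    Maintains the w x w gain matrix G ~ (X^T X)^{-1}, updated one
    sample at a time by the Sherman-Morrison identity -- no matrix
    inversion, O(w^2) work and O(w^2) storage per time tick.
    """
    def __init__(self, w, d=1e6):
        self.G = d * np.eye(w)   # gain matrix, G_0 = d * I
        self.a = np.zeros(w)     # regression coefficients

    def update(self, x, y):
        """Fold in one new row x (length w) with target y."""
        x = np.asarray(x, float)
        Gx = self.G @ x
        # rank-one (Sherman-Morrison) update: denominator is a scalar
        self.G -= np.outer(Gx, Gx) / (1.0 + x @ Gx)
        # correct the coefficients by the prediction error
        self.a += self.G @ x * (y - x @ self.a)
```

Feeding it samples from y = 2·x1 + 3·x2 recovers a ≈ (2, 3) after a handful of updates, matching the batch Least Squares answer up to the tiny 1/d regularization.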

SLIDE 58

Comparison:

  • Straightforward Least Squares:

– needs a huge matrix (growing in size): O(N×w)
– costly matrix operation: O(N×w^2)

  • Recursive LS:

– needs a much smaller, fixed-size matrix: O(w×w)
– fast, incremental computation: O(1×w^2) per time tick
– no matrix inversion

(e.g., N = 10^6, w = 1-100)

SLIDE 59

Pictorially:

  • Given:

[Scatter plot: dependent variable vs. independent variable]

SLIDE 60

Pictorially:

[Scatter plot: dependent vs. independent variable, with a new point arriving]

SLIDE 61

Pictorially:

[Scatter plot: dependent vs. independent variable; RLS quickly computes the new best fit when a new point arrives]

SLIDE 62

Even more details

  • Q4: can we ‘forget’ the older samples?
  • A4: Yes - RLS can easily handle that [Yi+00]:
SLIDE 63

Adaptability - ‘forgetting’

[Scatter plot: dependent variable (e.g., #bytes sent) vs. independent variable (e.g., #packets sent)]

SLIDE 64

Adaptability - ‘forgetting’

[Scatter plot: #bytes sent vs. #packets sent; after a trend change, (R)LS with no forgetting keeps fitting the old trend]

SLIDE 65

Adaptability - ‘forgetting’

[Scatter plot: after the trend change, (R)LS with forgetting tracks the new trend while (R)LS with no forgetting does not]

  • RLS can *trivially* handle ‘forgetting’
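A sketch of RLS with exponential forgetting: each step down-weights all earlier samples by a factor λ in (0, 1], so the fit adapts after a trend change, and λ = 1 recovers plain RLS. The class name `ForgetfulRLS` and the defaults λ = 0.98 and d = 10^6 are illustrative choices of mine, not the slides' notation.

```python
import numpy as np

class ForgetfulRLS:
    """Recursive Least Squares with exponential forgetting factor lam.

    Old samples are discounted by lam per time tick, so the estimate
    tracks a changing trend instead of averaging over all history.
    """
    def __init__(self, w, lam=0.98, d=1e6):
        self.G = d * np.eye(w)   # gain matrix, G_0 = d * I
        self.a = np.zeros(w)     # regression coefficients
        self.lam = lam

    def update(self, x, y):
        x = np.asarray(x, float)
        Gx = self.G @ x
        # forgetting variant of the rank-one gain-matrix update
        self.G = (self.G - np.outer(Gx, Gx) / (self.lam + x @ Gx)) / self.lam
        self.a += self.G @ x * (y - x @ self.a)
```

If the true relationship jumps from y = 2·x to y = 5·x mid-stream, the estimate converges to 5 within a few dozen updates; plain RLS would stay stuck near a blend of the two regimes.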