
Marcel Dettling, Zurich University of Applied Sciences

Applied Time Series Analysis

FS 2012 – Week 02

Marcel Dettling

Institute for Data Analysis and Process Design Zurich University of Applied Sciences

marcel.dettling@zhaw.ch http://stat.ethz.ch/~dettling

ETH Zürich, February 27, 2012


Stochastic Model for Time Series

Def: A time series process is a set of random variables $\{X_t, t \in T\}$, where $T$ is the set of times. Each of the random variables $X_t, t \in T$ has a univariate probability distribution $F_t$.

  • If we exclusively consider time series processes with equidistant time intervals, we can enumerate $T = \{1, 2, 3, \ldots\}$.
  • An observed time series is a realization of $X = (X_1, \ldots, X_n)$, and is denoted with small letters as $x = (x_1, \ldots, x_n)$.
  • We have a multivariate distribution, but only 1 observation (i.e. 1 realization from this distribution) is available. In order to perform "statistics", we require some additional structure.
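A small R sketch can make the idea of "one realization of a process" concrete. The AR(1) process with coefficient 0.7 is an assumption chosen only for illustration. The code simulates three realizations of the same process, although in practice we would observe just one of them.

```r
## One stochastic process, several realizations (illustrative AR(1), coefficient 0.7).
set.seed(21)
n    <- 100
sims <- replicate(3, as.numeric(arima.sim(model = list(ar = 0.7), n = n)))  # n x 3 matrix
plot(ts(sims), plot.type = "single", col = 1:3,
     main = "Three realizations of the same process")
# In practice, only one realization x = (x_1, ..., x_n) is available:
x <- sims[, 1]
```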


Stationarity

To be able to do statistics with time series, we require that the series "doesn't change its probabilistic character" over time. This is mathematically formulated by strict stationarity.

Def: A time series $\{X_t, t \in T\}$ is strictly stationary if the joint distribution of the random vector $(X_t, \ldots, X_{t+k})$ is equal to the one of $(X_s, \ldots, X_{s+k})$ for all combinations of $t$, $s$ and $k$.

This implies:
  • $X_t \sim F$: all $X_t$ are identically distributed
  • $E[X_t] = \mu$: all $X_t$ have identical expected value
  • $Var(X_t) = \sigma^2$: all $X_t$ have identical variance
  • $Cov(X_t, X_{t+h}) = \gamma_h$: the autocovariance depends only on the lag $h$


Stationarity

It is impossible to "prove" the theoretical concept of stationarity from data. We can only search for evidence in favor of or against it. However, with strict stationarity, even finding evidence is too difficult. We thus resort to the concept of weak stationarity.

Def: A time series $\{X_t, t \in T\}$ is said to be weakly stationary if

$E[X_t] = \mu$
$Cov(X_t, X_{t+h}) = \gamma_h$, for all lags $h$

and thus also $Var(X_t) = Cov(X_t, X_t) = \gamma_0 = \sigma^2$.

Note that weak stationarity is sufficient for "practical purposes".
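Weak stationarity is a property of the process, so a simulation in R is one way to see what it demands. This sketch assumes a stationary AR(1) process with coefficient 0.5 and standard normal innovations (the arima.sim() default). It simulates many independent realizations and estimates the mean, variance and lag-1 autocovariance across realizations at two different times $t$. For a weakly stationary process, the two rows should agree up to simulation noise.

```r
## Ensemble estimates at two time points for an assumed stationary AR(1), coefficient 0.5.
set.seed(4)
nrep <- 2000; n <- 100
sims <- replicate(nrep, as.numeric(arima.sim(model = list(ar = 0.5), n = n)))  # n x nrep
ens  <- function(t, h = 1) {
  c(mean = mean(sims[t, ]), var = var(sims[t, ]),
    cov  = cov(sims[t, ], sims[t + h, ]))      # estimate of Cov(X_t, X_{t+h})
}
rbind(t20 = ens(20), t80 = ens(80))
# Theory for this process: mean 0, variance 1 / (1 - 0.5^2) = 4/3,
# lag-1 autocovariance 0.5 * 4/3 = 2/3, identical for every t.
```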


Testing Stationarity

  • In time series analysis, we need to verify whether the series has arisen from a stationary process or not. Be careful: stationarity is a property of the process, not of the data.
  • Treat stationarity as a hypothesis! We may be able to reject it when the data strongly speak against it. However, we can never prove stationarity with data. At best, it is plausible.
  • Formal tests for stationarity do exist (see the scriptum). We discourage their use because of their low power for detecting general non-stationarity and their complexity.

⇒ Use the time series plot for deciding on stationarity!


Evidence for Non-Stationarity

  • Trend, i.e. non-constant expected value
  • Seasonality, i.e. deterministic, periodical oscillations
  • Non-constant variance, i.e. multiplicative error
  • Non-constant dependency structure

Remark: Note that some periodic oscillations, as for example in the lynx data, can be stochastic, and thus the underlying process is assumed to be stationary. However, the boundary between the two is fuzzy.


Strategies for Detecting Non-Stationarity

1) Time series plot

  • non-constant expected value (trend/seasonal effect)
  • changes in the dependency structure
  • non-constant variance

2) Correlogram (presented later...)

  • non-constant expected value (trend/seasonal effect)
  • changes in the dependency structure

A (sometimes) useful trick, especially when working with the correlogram, is to split up the series into two or more parts and to produce plots for each of the pieces separately.
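Here is a minimal R sketch of this trick. It uses R's built-in annual lynx series (1821-1934) only as an example; any ts object would do. The series is cut into two halves with window(). For each half, the code draws the time series plot and the correlogram (acf()) and compares level and spread.

```r
## Split a series into two halves and inspect each piece separately.
x     <- lynx
mid   <- time(x)[floor(length(x) / 2)]        # last time point of the first half
part1 <- window(x, end = mid)
part2 <- window(x, start = mid + deltat(x))   # next time point after 'mid'
op <- par(mfrow = c(2, 2))
plot(part1, main = "First half");  acf(part1, main = "Correlogram, first half")
plot(part2, main = "Second half"); acf(part2, main = "Correlogram, second half")
par(op)
# Rough numeric comparison of level and spread in the two pieces
rbind(mean = c(mean(part1), mean(part2)), sd = c(sd(part1), sd(part2)))
```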


Example: Simulated Time Series 1

[Figure: "Simulated Time Series Example", time series plot of ts.sim against Time]


Example: Simulated Time Series 2

[Figure: "Simulated Time Series Example", time series plot of ts.sim against Time]


Example: Simulated Time Series 3

[Figure: "Simulated Time Series Example", time series plot of ts.sim against Time]

Example: Simulated Time Series 4

[Figure: "Simulated Time Series Example", time series plot against Time]


Time Series in R

  • In R, there are objects, which are organized in a large number of classes. These classes include, e.g., vectors, data frames, model output, functions, and many more. Not surprisingly, there are also several classes for time series.
  • We focus on ts, the basic class for regularly spaced time series in R. This class is comparatively simple, as it can only represent time series with fixed-interval records, and only uses numeric time stamps, i.e. it enumerates the index set.
  • For defining a ts object, we have to supply the data, but also the starting time (as argument start) and the frequency of measurements (as argument frequency).
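For monthly data, start is usually given as c(year, month) together with frequency = 12. In this sketch, the values 1:24 are placeholders for illustration, not real measurements. start(), end(), frequency() and time() are the standard accessors for the numeric time stamps of a ts object.

```r
## A monthly 'ts' object: data plus start (year, month) and frequency.
x <- ts(1:24, start = c(2004, 3), frequency = 12)   # first value: March 2004
start(x)        # c(2004, 3)
end(x)          # c(2006, 2): 24 months later, February 2006
frequency(x)    # 12 observations per time unit (year)
head(time(x))   # numeric time stamps 2004 + 2/12, 2004 + 3/12, ...
```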

Time Series in R: Example

Data: number of days per year with traffic holdups in front of the Gotthard road tunnel north entrance in Switzerland.

> rawdat <- c(88, 76, 112, 109, 91, 98, 139)
> ts.dat <- ts(rawdat, start=2004, freq=1)
> ts.dat
Time Series:
Start = 2004
End = 2010
Frequency = 1
[1]  88  76 112 109  91  98 139

Year:  2004 2005 2006 2007 2008 2009 2010
Days:    88   76  112  109   91   98  139


Time Series in R: Example

> plot(ts.dat, ylab="# of Days", main="Traffic Holdups")

[Figure: "Traffic Holdups", time series plot; x-axis: Time, y-axis: # of Days]


Addendum: Daily Data and Leap Years

Example from the exercises: rainfall data, 8 years of daily data from 2000-2007. While 2001-2003 and 2005-2007 have 365 days each, the years 2000 and 2004 are leap years with 366 days.

  • Never cancel the leap days, and do not introduce missing values for Feb 29 in non-leap years either.
  • Is this a (deterministically) periodic series? Using the Gregorian calendar, we can say the time unit is 4 years, and the frequency is $366 + 3 \cdot 365 = 1461$.
  • Physically, we can say that the frequency equals $1461 / 4 = 365.25$ (per year).
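The arithmetic above can be checked in R. This sketch also builds a daily ts with the "physical" frequency of 365.25 per year. The values are placeholders, because only the time axis matters here. The sketch keeps every calendar day, leap days included, as recommended above.

```r
## Daily data 2000-2007 as a 'ts' with (average) frequency 365.25 per year.
days <- seq(as.Date("2000-01-01"), as.Date("2007-12-31"), by = "day")
length(days)          # 2922 = 2 * 366 + 6 * 365, leap days included
x <- ts(seq_along(days), start = 2000, frequency = 365.25)
frequency(x)          # 365.25
(366 + 3 * 365) / 4   # also 365.25: one 4-year cycle has 1461 days
```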


Further Topics in R

The scriptum discusses some further topics which are of interest when doing time series analysis in R:

  • Handling of dates and times in R
  • Reading/importing data into R

⇒ Please read and study these chapters thoroughly. Examples will be shown/discussed in the exercises.


Visualization: Time Series Plot

> plot(tsd, ylab="(%)", main="Unemployment in Maine")

[Figure: "Unemployment in Maine", time series plot; x-axis: Time, y-axis: (%)]


Multiple Time Series Plots

> plot(tsd, main="Chocolate, Beer & Electricity")

[Figure: "Chocolate, Beer & Electricity", three stacked panels (choc, beer, elec) sharing the x-axis Time]


Only One or Multiple Frames?

  • Due to different scales/units, it is often impossible to directly plot multiple time series in one single frame. Also, multiple frames are convenient for visualizing the series.
  • If the relative development of multiple series is of interest, then we can (manually) index the series and (manually) plot them into one single frame.
  • This clearly shows the magnitudes of trend and seasonality. However, the original units are lost.
  • For details on how indexing is done, see the scriptum.
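One common way of indexing is to divide each series by its first value and multiply by 100. The scriptum may use a different base period, so this is only an assumption. The sketch uses R's built-in EuStockMarkets (four daily stock indices) as a stand-in, because the chocolate/beer/electricity data do not ship with base R.

```r
## Index several series to a common base (first observation = 100), one frame.
x    <- EuStockMarkets
base <- as.numeric(x[1, ])                        # first value of each series
idx  <- 100 * x / rep(base, each = nrow(x))       # column j divided by base[j]
plot(idx, plot.type = "single", col = 1:4, ylab = "Index (first obs. = 100)")
legend("topleft", legend = colnames(x), col = 1:4, lty = 1)
```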

Multiple Time Series Plots

> plot(tsd, main="Chocolate, Beer & Electricity")

[Figure: "Indexed Chocolate, Beer & Electricity", series choc, beer and elec in a single frame; x-axis: Time, y-axis: Index]


Descriptive Decomposition

It is convenient to describe non-stationary time series with a simple decomposition model:

$X_t = m_t + s_t + E_t$ = trend + seasonal effect + stationary remainder

The modelling can be done with:
1) taking differences with appropriate lag (= differencing)
2) smoothing approaches (= filtering)
3) parametric models (= curve fitting)
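To see the model in action, one can build a synthetic series exactly as $X_t = m_t + s_t + E_t$. All components in this R sketch are assumed for illustration: a linear trend, a sine-shaped seasonal effect with period 12, and a stationary AR(1) remainder.

```r
## Synthetic monthly series following the additive decomposition model.
set.seed(1)
n  <- 120                                               # 10 years of monthly data
tt <- 1:n
m  <- 0.05 * tt                                         # trend m_t
s  <- rep(sin(2 * pi * (1:12) / 12), length.out = n)    # seasonal effect s_t, period 12
E  <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))  # stationary remainder E_t
X  <- ts(m + s + E, start = c(2000, 1), frequency = 12)
plot(X, main = "Simulated: trend + seasonal effect + stationary remainder")
```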


Differencing: Theory

In the absence of a seasonal effect, a piecewise linear trend of a non-stationary time series can be removed by taking differences of first order at lag 1:

$Y_t = X_t - X_{t-1}$

The new time series $Y_t$ is then going to be stationary, but has some new, strong and artificial dependencies. If there is a seasonal effect, we have to take first-order differences at the lag of the period $p$, which removes both trend and season:

$Y_t = X_t - X_{t-p}$
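In R, both kinds of differences are available through diff(). We assume that the Mauna Loa data on the following slides correspond to R's built-in co2 series (monthly, 1959-1997). The sketch also checks the lag-12 result against the formula by hand and shows that differencing shortens the series.

```r
## Differencing the built-in 'co2' series at lag 1 and at lag 12 (the period).
d1  <- diff(co2)              # Y_t = X_t - X_{t-1}: removes a linear trend, season remains
d12 <- diff(co2, lag = 12)    # Y_t = X_t - X_{t-12}: removes trend and season
c(length(co2), length(d1), length(d12))   # 468 467 456: the series gets shorter
all.equal(as.numeric(d12), co2[13:468] - co2[1:456])   # TRUE: same as by hand
op <- par(mfrow = c(2, 1))
plot(d1,  main = "CO2 - differences, lag 1")
plot(d12, main = "CO2 - differences, lag 12")
par(op)
```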


Differencing: Example

Mauna Loa Data: original series, containing trend and season

[Figure: "Mauna Loa Data", time series plot; x-axis: Time, y-axis: co2]


Differencing: Example

Mauna Loa Data: first order differences with lag 1

[Figure: "CO2 - differences, lag 1"; x-axis: Time, y-axis: diff(co2)]


Differencing: Example

Mauna Loa Data: first order differences with lag 12

[Figure: "CO2 - differences, lag 12"; x-axis: Time, y-axis: diff(co2, lag = 12)]


Differencing: Remarks

Some advantages and disadvantages:
+ trend and seasonal effect can be removed
+ procedure is very quick and very simple to implement
- $\hat{m}_t$ and $\hat{s}_t$ are not known, and cannot be visualised
- resulting time series will be shorter than the original
- differencing leads to strong artificial dependencies
- extrapolation of $\hat{m}_t$, $\hat{s}_t$ is not possible


Smoothing, Filtering: Part 1

In the absence of a seasonal effect, the trend of a non-stationary time series can be determined by applying any additive, linear filter. We obtain a new time series $\hat{m}_t$, representing the trend:

$\hat{m}_t = \sum_{i=-p}^{q} a_i X_{t+i}$

  • the window, defined by $p$ and $q$, may or may not be symmetric
  • the weights, given by $a_i$, may or may not be uniformly distributed
  • other smoothing procedures can be applied, too.
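A symmetric moving average with uniform weights is the simplest instance of this filter. This R sketch assumes $p = q = 2$ and $a_i = 1/5$. It uses the built-in annual Nile series, which has no seasonality, as an example. stats::filter() with sides = 2 centres the weights around lag 0, so the first and last $p$ values of $\hat{m}_t$ are NA.

```r
## Trend estimate m^_t = sum_{i=-2}^{2} a_i X_{t+i} with uniform weights a_i = 1/5.
x  <- Nile
a  <- rep(1 / 5, 5)                          # weights a_{-2}, ..., a_{2}
mh <- stats::filter(x, filter = a, method = "convolution", sides = 2)
head(cbind(x, mh))                          # the first 2 values of mh are NA
plot(x, main = "Nile flow with moving-average trend estimate")
lines(mh, col = "red", lwd = 2)
```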


Smoothing, Filtering: Part 2

In the presence of a seasonal effect, smoothing approaches are still valid for estimating the trend. We have to make sure that the sum is taken over an entire season, i.e. for monthly data:

$\hat{m}_t = \frac{1}{12}\left(\frac{1}{2} X_{t-6} + X_{t-5} + \cdots + X_{t+5} + \frac{1}{2} X_{t+6}\right)$ for $t = 7, \ldots, n-6$

An estimate of the seasonal effect $s_t$ at time $t$ can be obtained by:

$\hat{s}_t = x_t - \hat{m}_t$

By averaging these estimates of the effects for each month, we obtain a single estimate of the effect for each month.
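The monthly recipe can be written out step by step in R. This sketch assumes R's built-in co2 series as the example. It uses the 13 weights from the formula above, computes $\hat{s}_t = x_t - \hat{m}_t$, and averages the estimates month by month via cycle(). The final centering step anticipates the remark that the effects are usually centered to sum to zero.

```r
## Trend by a 2x12 moving average, then monthly seasonal effects, for 'co2'.
x    <- co2
w    <- c(0.5, rep(1, 11), 0.5) / 12          # weights for X_{t-6}, ..., X_{t+6}; sum to 1
mh   <- stats::filter(x, w, sides = 2)        # m^_t; NA for the first and last 6 values
sh   <- x - mh                                # s^_t = x_t - m^_t
seas <- tapply(sh, cycle(x), mean, na.rm = TRUE)   # one average per month (1..12)
seas <- seas - mean(seas)                     # centre the 12 effects to sum to zero
round(seas, 2)
```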


Smoothing, Filtering: Part 3

  • The smoothing approach is based on estimating the trend first, and then the seasonality.
  • The generalization to periods other than $p = 12$ (i.e. monthly data) is straightforward. Just choose a symmetric window and use uniformly distributed coefficients that sum up to 1 (for an even period $p$, with half weights at both ends, as in the monthly case).
  • The sum over all seasonal effects will be close to zero. Usually, it is centered to be exactly zero.
  • This procedure is implemented in R with the function decompose().
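The same trend-then-season procedure is a one-liner with decompose(). This sketch again assumes the built-in co2 series. The component figure holds the 12 centered seasonal effects, and the trend has NA values at both ends, just as with the manual filter.

```r
## Additive decomposition X_t = m_t + s_t + E_t via decompose() for 'co2'.
dec <- decompose(co2)          # type = "additive" is the default
round(dec$figure, 2)           # the 12 centred monthly effects
sum(dec$figure)                # zero up to rounding error
plot(dec)                      # panels: observed, trend, seasonal, random
```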


Smoothing, Filtering: Remarks

Some advantages and disadvantages:
+ trend and seasonal effect can be estimated
+ $\hat{m}_t$ and $\hat{s}_t$ are explicitly known and can be visualised
+ procedure is transparent and simple to implement
- resulting time series will be shorter than the original
- averaging leads to strong artificial dependencies
- extrapolation of $\hat{m}_t$, $\hat{s}_t$ is not entirely obvious


Smoothing, Filtering: STL-Decomposition

The Seasonal-Trend Decomposition Procedure by Loess (STL)
  • is an iterative, non-parametric smoothing algorithm
  • yields a simultaneous estimation of trend and seasonal effect

⇒ similar to what was presented above, but more robust!
+ very simple to apply
+ very illustrative and quick
+ seasonal effect can be constant or smoothly varying
- model-free, so extrapolation and forecasting are difficult

⇒ Good method for "having a quick look at the data"

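A minimal STL call in R, under one assumption: the slides' airline series is taken to correspond to R's built-in AirPassengers data (monthly, 1949-1960). With s.window = "periodic" the seasonal effect is constant. The three components in fit$time.series add back up to the (log) data.

```r
## STL with a constant seasonal effect on the log airline data (assumed AirPassengers).
x   <- log(AirPassengers)
fit <- stl(x, s.window = "periodic")
head(fit$time.series)                     # columns: seasonal, trend, remainder
max(abs(rowSums(fit$time.series) - x))    # ~ 0: components sum to the data
plot(fit)
```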

STL-Decomposition: Constant Season

stl(log(ts(airline, freq=12)), s.window="periodic")

[Figure: STL decomposition with panels data, seasonal, trend and remainder; x-axis: time]


STL-Decomposition: Constant Season

stl(log(ts(airline, freq=12)), s.window="periodic")

The seasonal effect here is not time-varying.

[Figure: monthplot of the seasonal component erg$time.series[, 1], by month (J to D)]


STL-Decomposition: Evolving Season

stl(log(ts(airline,freq=12)),s.window=15)

[Figure: STL decomposition with panels data, seasonal, trend and remainder; x-axis: time]


STL-Decomposition: Evolving Season

stl(log(ts(airline,freq=12)),s.window=15)

Correct amount of smoothing on the time-varying seasonal effect.

[Figure: monthplot of the seasonal component erg$time.series[, 1], by month (J to D)]


STL-Decomposition: Evolving Season

stl(log(ts(airline,freq=12)),s.window=7)

[Figure: STL decomposition with panels data, seasonal, trend and remainder; x-axis: time]


STL-Decomposition: Evolving Season

stl(log(ts(airline,freq=12)),s.window=7)

[Figure: monthplot of the seasonal component erg$time.series[, 1], by month (J to D)]

Monthplot: not enough smoothing on the time-varying seasonal effect.
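The three settings shown on these slides can be compared side by side with monthplots. As before, R's AirPassengers series is assumed to stand in for the slides' airline data. monthplot() has a method for stl objects that plots the chosen component month by month. With "periodic", each month's seasonal effect is exactly constant.

```r
## Comparing s.window settings via monthplots of the STL seasonal component.
x    <- log(AirPassengers)
wins <- list("periodic", 15, 7)
fits <- lapply(wins, function(sw) stl(x, s.window = sw))
op <- par(mfrow = c(1, 3))
for (k in seq_along(fits)) {
  monthplot(fits[[k]], choice = "seasonal",
            main = paste("s.window =", wins[[k]]))
}
par(op)
```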


Parametric Modelling

When to use?
  • Parametric modelling is often used if we have previous knowledge that the trend follows a functional form.
  • If the main goal of the analysis is forecasting, a trend in functional form may allow for easier extrapolation than a trend obtained via smoothing.
  • It can also be useful if we have a specific model in mind and want to draw inference on it.

Caution: correlated errors!


Parametric Modeling: Example

Mauna Loa Data: original series, containing trend and season

[Figure: "Mauna Loa Data", time series plot; x-axis: Time, y-axis: co2]


Parametric Modeling for the Mauna Loa Data

Most often, time series are parametrically decomposed by using regression models. For the trend, polynomial functions are widely used, whereas the seasonal effect is modelled with dummy variables (= a factor).

$X_t = \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + \alpha_{i(t)} + E_t$

where $t = 1, 2, \ldots, 468$ and $i(t) \in \{1, 2, \ldots, 12\}$ is the month of time $t$.

Remark: the choice of the polynomial degree is crucial!
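This model can be fitted by least squares with lm(). The sketch assumes the slides' series is R's built-in co2 data (468 monthly values). The formula has no intercept, so the 12 month coefficients play the role of $\alpha_1, \ldots, \alpha_{12}$. The final correlogram of the residuals is a quick check for the correlated errors cautioned about earlier.

```r
## Cubic trend plus monthly factor for 'co2', fitted by least squares.
X     <- as.numeric(co2)
tt    <- seq_along(X)                          # t = 1, ..., 468
month <- factor(as.vector(cycle(co2)))         # i(t) in 1, ..., 12
fit   <- lm(X ~ 0 + tt + I(tt^2) + I(tt^3) + month)
coef(fit)[1:3]                                 # estimates of beta_1, beta_2, beta_3
fhat  <- ts(fitted(fit), start = start(co2), frequency = frequency(co2))
plot(co2, main = "Mauna Loa CO2 with parametric fit")
lines(fhat, col = "red")
acf(resid(fit), main = "Correlogram of the residuals")   # look for remaining dependence
```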


Parametric Modeling: Remarks

Some advantages and disadvantages:
+ trend and seasonal effect can be estimated
+ $\hat{m}_t$ and $\hat{s}_t$ are explicitly known and can be visualised
+ even some inference on trend/season is possible
+ time series keeps the original length
- choice of a/the correct model is necessary/difficult
- residuals are correlated: this is a model violation!
- extrapolation of $\hat{m}_t$, $\hat{s}_t$ is not entirely obvious