

SLIDE 1

ECMM703

Analysis and Computation for Finance Time Series - An Introduction

Alejandra González Harrison, 161. Email: mag208@exeter.ac.uk

SLIDE 2

Time Series - An Introduction

  • A time series is a sequence of observations ordered in time; observations are numbers (e.g. measurements).
  • Time series analysis comprises methods that attempt to:

– understand the underlying context of the data (where did it come from? what generated it?);
– make forecasts (predictions).

SLIDE 3

Definitions/Setting

  • A stochastic process is a collection of random variables {Yt : t ∈ T} defined on a probability space (Ω, F, P).
  • In time series modelling, a sequence of observations is considered as one realisation of an unknown stochastic process:
  1. can we infer properties of this process?
  2. can we predict its future behaviour?
  • By time series we shall mean both the sequence of observations and the process of which it is a realisation (an abuse of language).
  • We will only consider discrete time series: observations (y1, . . . , yN) of a variable at different times (yi = y(ti), say).

SLIDE 4

Setting (cont.)

  • We will only deal with time series observed at regular time points (days, months, etc.).
  • We focus on pure univariate time series models: a single time series (y1, . . . , yN) is modelled in terms of its own values and their order in time. No external factors are considered.
  • Modelling of time series which:

– are measured at irregular time points, or
– are made up of several observations at each time point (multivariate data), or
– involve explanatory variables xt measured at each time point,

is based upon the ideas presented here.

SLIDE 5

Work plan

  • We provide an overview of pure univariate time series models:

– ARMA (‘Box-Jenkins’) models;
– ARIMA models;
– GARCH models.

  • Models will be implemented in the public-domain, general-purpose statistical language R.

SLIDE 6

References

  1. Anderson, O. D. Time Series Analysis and Forecasting: The Box-Jenkins Approach. Butterworths, London-Boston, 1976.
  2. Box, G. E. P. and Jenkins, G. M. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, 1976.
  3. Brockwell, P. J. and Davis, R. A. Time Series: Theory and Methods, Second Edition. Springer Series in Statistics, Springer-Verlag, 1991.
  4. Cryer, J. D. Time Series Analysis. PWS-KENT Publishing Company, Boston, 1986.
  5. R webpage: http://cran.r-project.org
  6. Shumway, R. H. and Stoffer, D. S. Time Series Analysis and Its Applications: With R Examples. http://www.stat.pitt.edu/stoffer/tsa2/R_time_series_quick_fix.html

SLIDE 7

Statistical versus Time series modelling

Problem: Given a time series (y1, y2, . . . , yN): (i) determine temporal structure and patterns; (ii) forecast non-observed values.
Approach: Construct a mathematical model for the data.

  • In statistical modelling it is typically assumed that the observations (y1, . . . , yN) are a sample from a sequence of independent random variables. Then:

– there is no covariance (or correlation) structure between the observations; in other words,
– the joint probability distribution for the data is just the product of the univariate probability distributions for each observation;
– we are mostly concerned with estimation of the mean behaviour µi and the variance σ²i of the error about the mean, errors being unrelated to each other.

SLIDE 8

Statistical vs. Time series modelling (cont.)

  • However, for a time series we cannot assume that the observations (y1, y2, . . . , yN) are independent: the data will be serially correlated or auto-correlated, rather than independent.
  • Since we want to understand/predict the data, we need to explain/use the correlation structure between observations.
  • Hence, we need stochastic processes with a correlation structure over time in their random component.
  • Thus we need to directly consider the joint multivariate distribution for the data, p(y1, . . . , yN), rather than just each marginal distribution p(yt).

SLIDE 9

Time series modelling

  • If one could assume joint normality of (y1, . . . , yN) then the joint distribution, p(y1, . . . , yN), would be completely characterised by:

– the means: µ = (µ1, µ2, . . . , µN);
– the auto-covariance matrix Σ, i.e. the N × N matrix with entries σij = cov(yi, yj) = E[(yi − µi)(yj − µj)].

  • In practice joint normality is not an appropriate assumption for most time series (certainly not for most financial time series).
  • Nevertheless, in many cases knowledge of µ and Σ will be sufficient to capture the major properties of the time series.

SLIDE 10

Time series modelling (cont.)

  • Thus the focus in time series analysis reduces to understanding the mean µ and the autocovariance Σ of the generating process (weakly stationary time series).
  • In applications both µ and Σ are unknown and so must be estimated from the data.
  • There are N elements involved in the mean component µ and N(N + 1)/2 distinct elements in Σ: vastly too many distinct unknowns to estimate without some further restrictions.
  • To reduce the number of unknowns, we have to introduce parametric structure so that the modelling becomes manageable.

SLIDE 11

Strict Stationarity

  • The time series {Yt : t ∈ Z} is strictly stationary if the joint distributions of (Yt1, . . . , Ytk) and (Yt1+τ, . . . , Ytk+τ) are the same for all positive integers k and all t1, . . . , tk, τ ∈ Z.
  • Equivalently, the time series {Yt : t ∈ Z} is strictly stationary if the random vectors (Y1, . . . , Yk) and (Y1+τ, Y2+τ, . . . , Yk+τ) have the same joint probability distribution for any time shift τ.
  • Taking k = 1 yields that Yt has the same distribution for all t.
  • If E[|Yt|²] < ∞, then E[Yt] and Var(Yt) are both constant.
  • Taking k = 2, we find that the joint distribution of (Yt, Yt+h) does not depend on t, and hence cov(Yt, Yt+h) is the same for all t.

SLIDE 12

Weak Stationarity

  • Let {Yt : t ∈ Z} be a stochastic process with mean µt and variance σ²t < ∞ for each t. Then the autocovariance function is defined by:

γ(t, s) = cov(Yt, Ys) = E[(Yt − µt)(Ys − µs)].

  • The stochastic process {Yt : t ∈ Z} is weakly stationary if for all t ∈ Z the following holds:

– E[|Yt|²] < ∞ and E[Yt] = m;
– γ(r, s) = γ(r + t, s + t) for all r, s ∈ Z.

  • Notice that the autocovariance function of a weakly stationary process is a function only of the time shift (or lag) τ ∈ Z:

γτ = γ(τ, 0) = cov(Yt+τ, Yt), for all t ∈ Z.

In particular the variance is independent of time: Var(Yt) = γ0.

SLIDE 13

Autocorrelation

  • Let {Yt : t ∈ Z} be a stochastic process with mean µt and variance σ²t < ∞ for each t. Then the autocorrelation is defined by:

ρ(t, s) = cov(Yt, Ys) / (σt σs) = γ(t, s) / (σt σs).

  • If the function ρ(t, s) is well-defined, its value must lie in the range [−1, 1], with 1 indicating perfect correlation and −1 indicating perfect anti-correlation.
  • The autocorrelation describes the correlation between the process at different points in time.

SLIDE 14

Autocorrelation Function (ACF)

  • If {Yt : t ∈ Z} is weakly stationary then the autocorrelation depends only on the lag τ ∈ Z:

ρτ = cov(Yt+τ, Yt) / σ² = γτ / σ², for all t ∈ Z,

where σ² = γ0 denotes the variance of the process.

  • So weak stationarity (and therefore also strict stationarity) implies that the auto-correlations depend only on the lag τ; this relationship is referred to as the auto-correlation function (ACF) of the process.
SLIDE 15

Partial Autocorrelation Functions (PACF)

  • For a weakly stationary process {Yt : t ∈ Z}, the PACF αk at lag k may be regarded as the correlation between Y1 and Y1+k, adjusted for the intervening observations Y2, . . . , Yk.
  • For k ≥ 2 the PACF is the correlation of the two residuals obtained after regressing Y1+k and Y1 on the intermediate observations Y2, Y3, . . . , Yk.
  • The PACF at lag k is defined by αk = ψkk, k ≥ 1, where ψkk is uniquely determined by:

| ρ0    ρ1    ρ2   · · · ρk−1 | | ψk1 |   | ρ1 |
| ρ1    ρ0    ρ1   · · · ρk−2 | | ψk2 |   | ρ2 |
| ·     ·     ·          ·    | |  ·  | = |  · |
| ρk−1  ρk−2  ρk−3 · · · ρ0   | | ψkk |   | ρk |

SLIDE 16

Stationary models

  • Assuming weak stationarity, modelling a time series reduces to estimation of a constant mean µ = µt and of a covariance matrix:

        | 1     ρ1    ρ2    · · · ρN−1 |
        | ρ1    1     ρ1    · · · ρN−2 |
Σ = σ²  | ρ2    ρ1    1     · · · ρN−3 |
        | ·     ·     ·           ·    |
        | ρN−1  ρN−2  ρN−3  · · · 1    |

  • There are many fewer parameters in Σ (the N − 1 correlations ρ1, . . . , ρN−1, plus σ²) than in an arbitrary, unrestricted covariance matrix.
  • Still, for large N the estimation can be problematic without additional structure in Σ to further reduce the number of parameters.

SLIDE 17

Auto-regressive Moving Average (ARMA) processes

  • Weakly stationary Auto-regressive Moving Average (ARMA) processes allow reduction to a manageable number of parameters.
  • The simple structure of ARMA processes makes them very useful and flexible models for weakly stationary time series (y1, . . . , yN).
  • We assume that yt has zero mean. Incorporation of a non-zero mean is straightforward.
  • Modelling of non-stationary data is based on variations of ARMA models.

SLIDE 18

ARMA Modelling

First order auto-regressive processes: AR(1)

  • The simplest example from the ARMA family is the first-order auto-regressive process, denoted AR(1):

yt = ϕ1 yt−1 + ǫt .   (1)

Here the ǫt constitute a white noise process, i.e. zero-mean ‘random shocks’ or ‘innovations’ assumed to be independent of each other and identically distributed with constant variance σ²ǫ.

  • Equation (1) can be written symbolically in the more compact form ϕ(B)yt = ǫt, where ϕ(z) = 1 − ϕ1 z and B is the backward shift or lag operator defined by B^m yt = yt−m.
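As a quick illustration (our own sketch in base R; seed, parameter and sample size are arbitrary but mirror the figures that follow), an AR(1) can be simulated with arima.sim():

set.seed(1)
y <- arima.sim(model = list(ar = 0.8), n = 1000)  # AR(1) with phi_1 = 0.8
op <- par(mfrow = c(1, 2))
plot.ts(y, main = "AR(1), phi = 0.8")             # the simulated series
acf(y, lag.max = 30)                              # its sample ACF
par(op)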

SLIDE 19

AR(1) (cont.)

  • The stationarity condition for an AR(1) process yt = ϕ1 yt−1 + ǫt amounts to |ϕ1| < 1. Equivalently, ϕ(z) = 1 − ϕ1 z ≠ 0 for all z ∈ C such that |z| ≤ 1.
  • By slight rearrangement and using the lag operator, the AR(1) model (1 − ϕ1B)yt = ǫt can be written as:

yt = (1 − ϕ1B)⁻¹ ǫt = (1 + ϕ1B + ϕ1²B² + ϕ1³B³ + · · ·) ǫt.

Notice that this series representation will converge as long as |ϕ1| < 1.

SLIDE 20

AR(1) (cont.)

  • For the AR(1) process it can be shown that:

Var(yt) = γ0 = σ²ǫ (1 + ϕ1² + ϕ1⁴ + · · ·) = σ²ǫ / (1 − ϕ1²),
cov(yt, yt−k) = γk = γk−1 ϕ1 , k > 0,
ρk = γk / γ0 = ϕ1^k .

  • Since |ϕ1| < 1, the ACF ρk is decreasing in absolute value. This implies that the linear dependence of two observations yt and ys becomes weaker with increasing time distance between t and s.
  • If 0 < ϕ1 < 1, the ACF decays exponentially to zero, while if −1 < ϕ1 < 0, the ACF decays in an oscillatory manner. Both decays are slow if ϕ1 is close to the non-stationary boundaries ±1.
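This geometric decay can be inspected with ARMAacf() from base R (a sketch; the phi values match the following three figures):

for (phi in c(0.3, 0.8, 0.99)) {
  rho <- ARMAacf(ar = phi, lag.max = 30)  # theoretical rho_k = phi^k
  cat(sprintf("phi = %4.2f: rho_5 = %.3f, rho_30 = %.3f\n",
              phi, rho["5"], rho["30"]))
}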

SLIDE 21

AR(1), phi = 0.3

[Figure: simulated AR(1) series (n = 1000) and its sample ACF up to lag 30.]

SLIDE 22

AR(1), phi = 0.8

[Figure: simulated AR(1) series (n = 1000) and its sample ACF up to lag 30.]

SLIDE 23

AR(1), phi = 0.99

[Figure: simulated AR(1) series (n = 1000) and its sample ACF up to lag 30.]

SLIDE 24

First order moving average processes: MA(1)

  • A first-order moving-average process, MA(1), is defined by:

yt = ǫt − θ1 ǫt−1 = (1 − θ1B) ǫt.

  • For the MA(1) process it can be shown that:

Var(yt) = γ0 = (1 + θ1²) σ²ǫ ,
cov(yt, yt−1) = γ1 = −θ1 σ²ǫ ,
cov(yt, yt−k) = 0, k > 1,
ρ1 = γ1 / γ0 = −θ1 / (1 + θ1²),
ρk = 0, k > 1.

  • Note: two observations yt and ys generated by an MA(1) process are uncorrelated if t and s are more than one observation apart.
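The lag-1 cut-off can be checked with ARMAacf() (a sketch; note the sign convention: ARMAacf() parameterises yt = ǫt + θ ǫt−1, so θ = −0.9 below corresponds to θ1 = +0.9 in the slides' convention):

round(ARMAacf(ma = -0.9, lag.max = 6), 3)  # rho_1 = -0.497, rho_k = 0 for k > 1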

SLIDE 25

MA(1), theta = 0.9

[Figure: simulated MA(1) series (n = 1000) and its sample ACF up to lag 6.]

SLIDE 26

MA(1), theta = −0.9

[Figure: simulated MA(1) series (n = 1000) and its sample ACF up to lag 6.]

SLIDE 27

AR(p) and MA(q) processes

  • Both the AR(1) and MA(1) processes impose strong restrictions on the pattern of the corresponding ACF.
  • More general ACF patterns are allowed by autoregressive or moving average models of higher order.
  • The AR(p) and MA(q) models are defined as follows:

yt = ϕ1 yt−1 + ϕ2 yt−2 + · · · + ϕp yt−p + ǫt   (AR(p) process)

and

yt = ǫt − θ1 ǫt−1 − θ2 ǫt−2 − · · · − θq ǫt−q   (MA(q) process).

The ϕi and θj, i = 1, . . . , p; j = 1, . . . , q, are parameters.

SLIDE 28

Autoregressive Moving Average Processes: ARMA(p,q)

  • Combining the AR(p) and MA(q) processes we define an autoregressive moving average process, ARMA(p,q):

yt = ϕ1 yt−1 + ϕ2 yt−2 + · · · + ϕp yt−p + ǫt − θ1 ǫt−1 − θ2 ǫt−2 − · · · − θq ǫt−q .

  • Using the lag operator B, the ARMA(p,q) model may be written:

(1 − ϕ1B − ϕ2B² − · · · − ϕpB^p) yt = (1 − θ1B − θ2B² − · · · − θqB^q) ǫt

or more compactly as:

ϕ(B)yt = θ(B)ǫt, where ϕ(z) = 1 − ϕ1z − ϕ2z² − · · · − ϕpz^p and θ(z) = 1 − θ1z − θ2z² − · · · − θqz^q.
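A sketch of simulating an ARMA(1,1) in base R (arima.sim() writes the MA part as yt = · · · + ǫt + θ ǫt−1, the opposite sign to the slides):

set.seed(2)
x <- arima.sim(model = list(ar = 0.5, ma = 0.8), n = 1000)
acf(x, lag.max = 12)  # compare with the sample/true ACF in the next figure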

SLIDE 29

ARMA(1,1): +0.5, +0.8

[Figure: simulated ARMA(1,1) series (n = 1000) with its sample ACF and the true ACF up to lag 12.]

SLIDE 30

Stationarity Conditions

  • Assume that the polynomials θ(z) and ϕ(z) have no common zeroes.
  • An ARMA(p,q) model defined by ϕ(B)yt = θ(B)ǫt is stationary if

ϕ(z) = 1 − ϕ1z − ϕ2z² − · · · − ϕpz^p ≠ 0 for |z| ≤ 1.

  • For a stationary ARMA(p,q) process the polynomial ϕ(B) can be ‘inverted’, and so yt has a moving average representation of infinite order:

yt = ∑_{j=0}^{∞} ψj ǫt−j ,   (2)

where the coefficients ψj are determined by the relation

θ(z) / ϕ(z) = ∑_{j=0}^{∞} ψj z^j , |z| ≤ 1.

We write equation (2) in compact form: yt = ϕ⁻¹(B)θ(B)ǫt.
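The stationarity condition can be checked numerically with polyroot(); a sketch for an example AR(2) (coefficient values arbitrary):

phi <- c(0.5, 0.3)             # AR coefficients phi_1, phi_2
roots <- polyroot(c(1, -phi))  # roots of phi(z) = 1 - phi1*z - phi2*z^2
Mod(roots)                     # both moduli exceed 1 here, so the model is stationary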

SLIDE 31

Invertibility Conditions

  • The ARMA(p,q) model ϕ(B)yt = θ(B)ǫt is called invertible if there exists a sequence of constants {πj} such that ∑_{j=0}^{∞} |πj| < ∞ and

ǫt = ∑_{j=0}^{∞} πj yt−j .   (3)

  • Assume that the polynomials θ(z) and ϕ(z) have no common zeroes. Then the ARMA(p,q) process is invertible if and only if

θ(z) = 1 − θ1z − θ2z² − · · · − θqz^q ≠ 0 for |z| ≤ 1.

The coefficients πj are determined by the relation

ϕ(z) / θ(z) = ∑_{j=0}^{∞} πj z^j , |z| ≤ 1.

We write equation (3) in the following compact form: θ⁻¹(B)ϕ(B)yt = ǫt .

SLIDE 32

[Figure: three examples plotting the AR polynomial against B (‘Polynomial Function vs. B’) together with the corresponding roots and the unit circle in the complex plane (real part vs. imaginary part).]

SLIDE 33

Non-zero mean ARMA processes

  • For ARMA models we have so far assumed a zero-mean stationary process.
  • The generalisation to a stationary non-zero constant mean ARMA(p,q) is straightforward:

– Augmenting the stationary process with an additional parameter ν ≠ 0 one obtains: ϕ(B)yt = ν + θ(B)ǫt.
– Inversion of ϕ(B) then immediately yields the mean of yt as: µ = E(yt) = ϕ⁻¹(B)ν = ν / ϕ(1).
– Note that if ϕ(B) = 1 (which is the case for the pure MA(q) model) one has µ = ν.

SLIDE 34

Modelling using ARMA processes

Step 1. ARMA model identification;
Step 2. ARMA parameter estimation;
Step 3. ARMA model selection;
Step 4. ARMA model checking;
Step 5. Forecasting from ARMA models.

SLIDE 35

ARMA model identification

  • A plot of the data will give us some clue as to whether the series is non-stationary.
  • To analyse an observed stationary time series through an ARMA(p,q) model, the first step is to determine appropriate values for p and q.
  • One of the basic tools in such model order identification is a plot of the estimated ACF ρ̂k and PACF α̂k against the lag k.
  • The shape of these plots can help to discriminate between competing models.

SLIDE 36

ARMA model identification (cont.)

  • The autocorrelations:

– for an MA(q) process, ρk = 0 for k ≥ q + 1;
– for an AR(p) process, they decay exponentially;
– for a mixed ARMA(p,q), we expect the correlations to tail off after lag q − p.

  • These considerations assist in deciding whether p > 0 and, if not, to choose the value of q.

SLIDE 37

Estimators for ACF/PACF (see Ch. 7 in ref 3)

  • Let (y1, y2, . . . , yN) be a realization of a weakly stationary time series.
  • The sample autocovariance function is defined by

γ̂k = (1/N) ∑_{t=1}^{N−k} (yt − ȳ)(yt+k − ȳ), 0 ≤ k < N,
γ̂k = γ̂−k , −N < k ≤ 0,

where ȳ is the sample mean ȳ = (1/N) ∑_{j=1}^{N} yj.

  • The sample autocorrelation function is defined by

ρ̂k = γ̂k / γ̂0 , |k| < N.
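A sketch computing these estimators directly and checking them against base R's acf(), which uses the same 1/N convention:

sample_rho <- function(y, k) {
  N <- length(y); ybar <- mean(y)
  g <- function(j) sum((y[1:(N - j)] - ybar) * (y[(1 + j):N] - ybar)) / N
  g(k) / g(0)                                  # rho_hat_k = gamma_hat_k / gamma_hat_0
}

set.seed(3)
y <- arima.sim(model = list(ar = 0.5), n = 500)
c(manual = sample_rho(y, 1),
  builtin = acf(y, lag.max = 1, plot = FALSE)$acf[2])  # the two agree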

SLIDE 38

Estimators ACF/PACF (cont.)

  • The sample PACF at lag k can be computed as a function of the sample estimate of the ACF as α̂k = ψ̂kk , k ≥ 1, where ψ̂kk is uniquely determined by:

| ρ̂0    ρ̂1    ρ̂2   · · · ρ̂k−1 | | ψ̂k1 |   | ρ̂1 |
| ρ̂1    ρ̂0    ρ̂1   · · · ρ̂k−2 | | ψ̂k2 |   | ρ̂2 |
| ·     ·     ·          ·    | |  ·  | = |  · |
| ρ̂k−1  ρ̂k−2  ρ̂k−3 · · · ρ̂0   | | ψ̂kk |   | ρ̂k |

SLIDE 39

[Figure: simulated series (n = 1000) with sample ACF and PACF up to lag 12 for four models — AR(1) with ϕ1 = +0.5; AR(1) with ϕ1 = −0.5; AR(2) with (ϕ1, ϕ2) = (+0.5, +0.3); AR(2) with (ϕ1, ϕ2) = (−0.5, +0.3).]

SLIDE 40

[Figure: simulated series (n = 1000) with sample ACF and PACF up to lag 12 for four models — MA(1) with θ = +0.8; MA(1) with θ = −0.8; ARMA(1,1) with parameters (0.5, −0.8); ARMA(1,1) with parameters (−0.5, −0.8).]

SLIDE 41

[Figure: simulated AR(2) series (n = 1000) with sample and true ACF and PACF up to lag 12, for parameters (+0.5, 0.3) and (−0.5, 0.3).]

SLIDE 42

ARMA Parameter estimation

  • Fitting an ARMA(p,q) model requires estimation of:

– the model parameters (ϕ1, . . . , ϕp) and (θ1, . . . , θq);
– the mean µ (where this is non-zero); and
– the variance σ²ǫ of the underlying white noise process ǫt.

  • If we denote the full set of these parameters by a vector Θ then we can proceed:

– to write down a likelihood for the data, L(Θ; y) = p(y; Θ);
– to estimate the parameters by maximum likelihood; and
– to derive standard errors and confidence intervals through the asymptotic likelihood theory results.

SLIDE 43

ARMA Parameter estimation (cont.)

  • The usual way to proceed is to assume that ǫt ∼ N(0, σ²ǫ).
  • The resulting derivation of the likelihood function and the associated maximisation algorithm for the general ARMA(p,q) model is somewhat involved and we do not go into details here.
  • The basic idea is to factorise the joint distribution p(y1, y2, . . . , yN) as

p(y1, y2, . . . , yN) = p(y1) ∏_{t=2}^{N} p(yt | y1, . . . , yt−1).

  • It may then be shown that p(yt | y1, . . . , yt−1) is normal with mean given by the predicted value ŷt of yt, and similarly that the marginal distribution p(y1) is normal with mean ŷ1.
  • The log-likelihood can then be expressed in terms of the prediction errors (yt − ŷt). This assists in developing algorithms to effect the maximisation.
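In base R this Gaussian likelihood is available through arima(), which evaluates the prediction-error decomposition via a state-space representation; a sketch:

set.seed(4)
y <- arima.sim(model = list(ar = 0.5, ma = 0.8), n = 1000)
fit <- arima(y, order = c(1, 0, 1), method = "ML")
fit                       # ML estimates of ar1, ma1, intercept and sigma^2
sqrt(diag(fit$var.coef))  # asymptotic standard errors of the coefficients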

SLIDE 44

ARMA Model Selection

  • We want to find a model that fits the observed data as well as possible.
  • Once fitted, models can be compared by use of a suitable penalised log-likelihood measure, for example Akaike's Information Criterion (AIC).
  • There exists a variety of other selection criteria that have been suggested for choosing an appropriate model.
  • All of these are similar, differing only in the penalty adjustment involving the number of estimated parameters.
  • As for the AIC, the criteria are generally arranged so that better-fitting models correspond to lower values of the criteria.
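A sketch of such a comparison in base R, fitting a small grid of ARMA(p,q) models and tabulating AIC (the grid bounds and simulated series are arbitrary choices):

set.seed(5)
y <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 1000)
aic <- matrix(NA, 3, 3, dimnames = list(p = 0:2, q = 0:2))
for (p in 0:2) for (q in 0:2)
  aic[p + 1, q + 1] <- AIC(arima(y, order = c(p, 0, q), method = "ML"))
aic                                     # lower values indicate better fits
which(aic == min(aic), arr.ind = TRUE)  # the selected (p, q)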

SLIDE 45

ARMA Model checking

  • The residuals for an ARMA model are estimated by subtracting the adopted model's predictions from the observed time series.
  • If the model assumptions are valid then we would expect the (standardised) residuals to be independent and normally distributed.
  • In time series analysis it is important to check that there is no autocorrelation remaining in the residuals. Plots of residuals against the time ordering are therefore important.
  • Various tests for serial correlation in the residuals are available.
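A sketch of these checks in base R (tsdiag() plots standardized residuals, their ACF and Ljung-Box p-values; fitdf adjusts the test's degrees of freedom for the p + q = 1 estimated ARMA parameters):

set.seed(6)
y <- arima.sim(model = list(ar = 0.5), n = 1000)
fit <- arima(y, order = c(1, 0, 0))
tsdiag(fit)
Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)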

SLIDE 46
Ex. 4

[Figure: simulated AR(5) series (n = 1000) with parameters (−0.4, 0.1, 0, 0, 0.1) and its sample PACF up to lag 12.]

SLIDE 47

Example 5

  • The function armaFit() estimates the parameters of ARMA models (its arguments are described on the help page).
  • Consider the time series generated in Ex. 4 from an AR(5) model with parameters: ϕ1 = −0.4, ϕ2 = 0.1, ϕ3 = ϕ4 = 0, ϕ5 = 0.1.
  • Examination of the PACF (see above) reveals significant correlation at lag 5, after which the correlation is negligible.
  • This suggests using an ARMA(p,q) model with p = 5 and q equal to 1 or 2 (this is because the PACF of an MA(q) decreases exponentially).
  • We first apply the function armaFit() to estimate the parameters of an AR(5) model.
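armaFit() is presumably supplied by the Rmetrics fArma package (the slides do not name it, so this is an assumption). An equivalent fit using base R only, for a series x simulated as in Ex. 4, would be:

set.seed(7)  # the exact numbers below then differ from the slides' output
x <- arima.sim(model = list(ar = c(-0.4, 0.1, 0, 0, 0.1)), n = 1000)
fit <- arima(x, order = c(5, 0, 0), method = "ML")
fit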

SLIDE 48

Example 5 (cont.)

fit <- armaFit(x ~ ar(5), x, method = "mle")
summary(fit)

Model: ARIMA(5,0,0) with method: CSS-ML

Coefficient(s):
      ar1        ar2        ar3        ar4        ar5   intercept
-0.419200   0.108544   0.006913  -0.004710   0.146163   -0.054552

Residuals:
     Min       1Q   Median       3Q      Max
-3.36283 -0.65182  0.02615  0.65574  3.19371

Moments:
Skewness  Kurtosis
 -0.1242    0.1234
SLIDE 49

Example 5 (cont.)

Coefficient(s):
            Estimate  Std. Error  t value  Pr(>|t|)
ar1        -0.419200    0.031291  -13.397   < 2e-16 ***
ar2         0.108544    0.033978    3.195    0.0014 **
ar3         0.006913    0.034145    0.202    0.8396
ar4        -0.004710    0.034024   -0.138    0.8899
ar5         0.146163    0.031329    4.665  3.08e-06 ***
intercept  -0.054552    0.027412   -1.990    0.0466 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

sigma^2 estimated as: 1.016
log likelihood: -1427.07
AIC Criterion: 2868.15

SLIDE 50

Example 5 (cont)

  • Note that summary() also provides the estimate of the variance σ²ǫ of the white noise process.
  • The values of the AR coefficients of order 3 and 4 are small and the associated standard errors are large: as a consequence, these coefficients have large p-values (last column) and are not statistically significant according to a 5% t-test. It is therefore a good idea to fit an AR(5) process in which these coefficients (as well as the intercept) are fixed to zero. This can be specified with the parameter fixed=c():

SLIDE 51

Example 5 (cont.)

fit <- armaFit(x ~ ar(5), x, fixed = c(NA, NA, 0, 0, NA, 0), method = "mle")
par(mfrow = c(2, 2))
summary(fit)

Model: ARIMA(5,0,0) with method: CSS-ML

Coefficient(s):
    ar1     ar2     ar3     ar4     ar5  intercept
-0.3564  0.1135  0.0000  0.0000  0.1231     0.0000

Residuals:
     Min       1Q    Median       3Q      Max
-3.13847 -0.66654  -0.01819  0.68648  3.36718

SLIDE 52

Example 5 (cont.)

Moments:
Skewness  Kurtosis
 0.07226  -0.02576

Coefficient(s):
            Estimate  Std. Error  t value  Pr(>|t|)
ar1        -0.35642     0.03115  -11.441   < 2e-16 ***
ar2         0.11350     0.03120    3.637  0.000275 ***
ar3         0.00000     0.02861    0.000  1.000000
ar4         0.00000     0.03115    0.000  1.000000
ar5         0.12309     0.03120    3.945  7.98e-05 ***
intercept   0.00000     0.02861    0.000  1.000000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

sigma^2 estimated as: 1.095
log likelihood: -1464.51
AIC Criterion: 2937.02

SLIDE 53

[Figure: diagnostics for the fitted model — standardized residuals, ACF of residuals, normal QQ-plot of residuals, and Ljung-Box p-values by lag.]

SLIDE 54

Ex. 5 (cont.)

  • The summary() method automatically plots the residuals, the autocorrelation function of the residuals, the standardized residuals, and the Ljung-Box statistic (a test of independence).
  • In order to investigate the model fit we could estimate the parameters for various ARMA(p,q) models with pmax = 5 and qmax = 2 for the same simulated time series and compare the relative fits through the AIC value (see the R script ex5.r).

SLIDE 55

Modelling with ARMA(p,q) models (summary)

  • Model identification: use the ACF and the PACF to get indicators of p and q. The following table can assist:

Process    | ACF                           | Partial ACF
AR(p)      | Exp. decay or damped cosine   | Zero after lag p
MA(q)      | Cuts off after lag q          | Exp. decay or damped cosine
ARMA(p,q)  | Exponential decay after q − p | Decay after p − q

  • Parameter estimation: estimate values for the model parameters (ϕ1, . . . , ϕp), (θ1, . . . , θq), µ and σ²ǫ (there are several ways one can do this, e.g. the Yule-Walker method).

SLIDE 56

Modelling with ARMA(p,q) models (summary cont.)

  • Model selection:

– Fit ARMA(p,q) models by maximum likelihood, using the (Yule-Walker) estimates of the parameters as initial values for the maximisation algorithm.
– Prevent over-fitting by imposing a cost for increasing the number of parameters in the fitted model. One way in which this can be done is by the information criterion of Akaike (AIC).
– The model selected is the one that minimises the value of the AIC.

SLIDE 57
Model checking

– The residuals of a fitted model are the scaled differences between observed and predicted values.
– Goodness of fit is checked essentially by checking that the residuals are like white noise (i.e. a mean-zero i.i.d. process with constant variance).
– There are several candidates for the residuals; one is computed in the course of determining the maximum likelihood estimates:

Ŵt = (Yt − Ŷt(ϕ̂, θ̂)) / rt−1(ϕ̂, θ̂)^{1/2} ,

where Ŷt(ϕ̂, θ̂) is the predicted value of Yt based on Y1, . . . , Yt−1 for the fitted ARMA(p,q) model, and rt−1(ϕ̂, θ̂) is the corresponding (scaled) mean squared error of prediction. Another is: Ẑt = θ̂⁻¹(B)ϕ̂(B)Yt.

SLIDE 59

Forecasting from ARMA models

  • Given a series (y1, y2, . . . , yN) up to time N, a prominent issue within time series analysis is:

– to provide estimates of future values yN+h, h = 1, 2, . . .,
– conditionally on the available information, i.e. yN, yN−1, yN−2, . . .

  • Within the class of weakly stationary ARMA(p,q) processes, yN+h is given by:

yN+h = ν + ϕ1 yN+h−1 + · · · + ϕp yN+h−p + ǫN+h − θ1 ǫN+h−1 − · · · − θq ǫN+h−q   (∗)

SLIDE 60

Forecasting from ARMA models (cont.)

  • An obvious forecast for yN+h is

ŷN+h = E[yN+h | yN, yN−1, yN−2, . . .],

i.e. its expected value given the observed series.

  • The computation of this expectation follows a recursive scheme of substituting

ŷN+j = yN+j for j ≤ 0, and the previously computed forecast ŷN+j for j > 0,

into equation (∗) in place of yN+j, and taking ǫN+j = 0 for j > 0.

SLIDE 61

Forecasting from ARMA models (cont.)

  • For example, for the ARMA(1,1) model with a non-zero mean, equation (∗) is yN+h = ν + ϕ1 yN+h−1 + ǫN+h − θ1 ǫN+h−1, so we obtain successively:

ŷN+1 = ν + ϕ1 yN − θ1 ǫN
ŷN+2 = ν + ϕ1 ŷN+1 = ν + ϕ1(ν + ϕ1 yN − θ1 ǫN)
ŷN+3 = ν + ϕ1 ŷN+2 = ν + ϕ1(ν + ϕ1(ν + ϕ1 yN − θ1 ǫN))
. . .

Iterating this scheme shows that with increasing forecast horizon the forecast converges to the mean of the process, µ.

SLIDE 62

Forecasting from ARMA models (cont.)

  • Obtaining the sequence of forecast errors ǫ̂N+h = yN+h − ŷN+h follows the same sort of scheme, so that:

ǫ̂N+1 = yN+1 − ŷN+1 = ν + ϕ1 yN + ǫN+1 − θ1 ǫN − (ν + ϕ1 yN − θ1 ǫN) = ǫN+1 .

  • Iterating along similar lines we obtain:

ǫ̂N+2 = ǫN+2 + (ϕ1 − θ1) ǫN+1
ǫ̂N+3 = ǫN+3 + (ϕ1 − θ1) ǫN+2 + ϕ1(ϕ1 − θ1) ǫN+1

and so on.

SLIDE 63

Forecasting from ARMA models (cont.)

  • The forecasts ŷN+h are unbiased, and so the expected values of the forecast errors ǫ̂N+h are zero.
  • The variance of the forecast error, however, increases with h.
  • In the limit as h increases this variance converges to the unconditional variance of the process, i.e. Var(yt) = σ² = γ0.
  • Clearly, in practical forecasting from an ARMA(p,q) model the values of the parameters (ϕ1, . . . , ϕp) and (θ1, . . . , θq) will be unknown, and they are replaced by their maximum likelihood estimates.
  • Standard errors and confidence intervals for the forecasts may be derived from the general likelihood theory in the usual way. See Ex. 6.

SLIDE 64

[Figure: the end of series x (around t = 950–1010) with forecasts from the fitted ARIMA(5,0,0) model, method CSS-ML.]

SLIDE 65

Non–stationary processes

  • Many time series encountered in practice exhibit non-stationary behaviour. For example, there may be non-stationarity in the mean component, e.g. a time trend or seasonal effect in µt.
  • We may think of this situation as the series consisting of a non-constant systematic (trend) component (usually some relatively simple function of time) plus a random component which is a zero-mean stationary series.
  • Note that such a model is only reasonable if there are good reasons for believing that the trend is appropriate forever.
  • There are several methods to eliminate trend and seasonal effects in order to generate stationary data.

SLIDE 66

ARIMA models

  • For some types of time series the non-stationary behaviour of the mean µt is simple enough that some differencing of the original series yields a new series which is stationary (so that µt is constant).
  • For example, for financial time series (comprising log prices), first differencing (yielding log returns) is often sufficient to produce a stationary time series with a constant mean.
  • The differenced series can then be modelled directly by an ARMA process, and no additional systematic component is required.
  • This type of time series modelling, where some degree of differencing is combined with an ARMA model, is called Auto-regressive Integrated Moving Average (ARIMA) modelling.
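A sketch of this first-differencing step in R (x here is a simulated random-walk ‘log price’; with real data, x would be log(prices)):

set.seed(9)
x <- cumsum(rnorm(1000, sd = 0.01))  # random walk: non-stationary in the mean
r <- diff(x)                         # ‘log returns’: first difference, d = 1
op <- par(mfrow = c(1, 2)); plot.ts(x); plot.ts(r); par(op)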

SLIDE 67

ARIMA models (cont.)

  • We have seen already that if the moduli of the roots of the characteristic equation of an ARMA(p,q) model lie inside the unit circle then the process will not be stationary.
  • In general, if the modulus of a root is strictly inside the unit circle then this will lead to exponential or explosive behaviour in the series, and no practical models result.
  • If the modulus of the offending root lies on the circle then a more reasonable type of non-stationarity results: for example, the simple random walk yt = yt−1 + ǫt. Note that the first difference of this series, yt − yt−1, is a white noise process.

SLIDE 68

ARIMA models (cont.)

  • This differencing idea can be generalised to the notion of a model for Yt in which the first difference of the process, Xt = (1 − B)Yt = Yt − Yt−1, is a stationary ARMA process, rather than white noise.
  • More generally, if d ≥ 1, Yt is an ARIMA(p,d,q) process if Xt = (1 − B)^d Yt is an ARMA(p,q) process.
  • An ARIMA(p,d,q) process Yt satisfies:

ϕ∗(B)Yt ≡ ϕ(B)(1 − B)^d Yt = θ(B)ǫt ,

where ϕ(z) and θ(z) are polynomials of degrees p and q respectively, ϕ(z) ≠ 0 for |z| ≤ 1, and ǫt is a white noise process.

  • An ARIMA model for a series yt is one where a differencing operation on yt leads to a series with stationary ARMA behaviour.

SLIDE 69

ARIMA models (cont.)

  • A distinctive feature of the data which suggests the appropriateness of an ARIMA model is a slowly decaying positive sample ACF.
  • Sample ACFs with slowly decaying oscillatory behaviour are associated with models ϕ∗(B)Yt = θ(B)ǫt in which ϕ∗ has a zero near e^{iα} for some α ∈ (−π, π] other than α = 0.
  • In modelling using ARIMA processes the original series is simply differenced until stationarity is obtained, and then the differenced series is modelled following the standard ARMA approach.

SLIDE 70

ARIMA models (cont.)

  • Results may then be transformed back to the undifferenced original scale if required.
  • The choice of an appropriate differencing parameter adds an extra dimension to model choice.
  • For financial time series that have non-stationary behaviour, as mentioned earlier, first differencing (which leads to the use of log returns) is usually sufficient to produce a time series with a stationary mean.

SLIDE 71

ARIMA models (summary)

  • Plot the data to determine whether there is a trend. Of course this is only an indication, and what we see as a trend may be part of a very long-term cycle.
  • Use the sample ACF and PACF to determine whether it is possible to model the time series with an ARIMA model.
  • Use differencing to obtain an ARMA model.
  • Model the differenced data using ARMA modelling.

See Ex. 7

SLIDE 72

ARCH and GARCH Modelling

  • ARMA and ARIMA modelling is quite flexible and widely applicable. However, in some financial time series there are effects which cannot be adequately explained by these sorts of models.
  • One particular feature is so-called volatility clustering.
  • This refers to a tendency for the variance of the random component to be large if the magnitude of recent ‘errors’ has been large, and smaller if the magnitude of recent ‘errors’ has been small.
  • This kind of behaviour requires non-stationarity in the variance (i.e. heteroscedasticity) rather than in the mean.
  • This leads to alternative kinds of models to the ARIMA family, referred to as ARCH and GARCH models.

SLIDE 73

ARCH and GARCH Modelling

  • A dominant feature of many financial series is volatility clustering: the conditional variance of ǫt appears to be large if recent observations ǫt−1, ǫt−2, . . . are large in absolute value, and small during periods where lagged innovations are also small in absolute value.
  • This effect cannot be explained by ARIMA models, which assume a constant variance.
  • Autoregressive Conditionally Heteroscedastic (ARCH) models (Engle, 1982) were developed to model changes in volatility.
  • These were extended to Generalised ARCH, or GARCH, models (Bollerslev, 1986).

SLIDE 74

ARCH Models

  • Let xt be the value of a stock at time t. The return, or relative gain, yt, of the stock at time t is

yt = (xt − xt−1) / xt−1 .

  • Note that, for financial series, the return does not have a constant variance, with highly volatile periods tending to be clustered together: there is a strong dependence of sudden bursts of variability in a return on the time series' own past.
  • Volatility models like ARCH and GARCH are used to study the returns yt.

SLIDE 75

ARCH(1) Models

  • The simplest ARCH model, the ARCH(1), models the return as

yt = σt ǫt ,
σ²t = ω + α1 y²t−1 ,

where ǫt ∼ N(0, 1).

  • As with ARMA models, we impose constraints on the model parameters to obtain desirable properties: sufficient conditions that guarantee σ²t > 0 are ω > 0 and α1 ≥ 0.
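A sketch simulating an ARCH(1) directly from these two equations (ω, α1 and n are arbitrary choices):

set.seed(10)
n <- 1000; omega <- 0.1; alpha1 <- 0.5
y <- numeric(n)
y[1] <- sqrt(omega / (1 - alpha1)) * rnorm(1)  # start at the unconditional sd
for (t in 2:n) {
  s2 <- omega + alpha1 * y[t - 1]^2            # sigma_t^2
  y[t] <- sqrt(s2) * rnorm(1)                  # y_t = sigma_t * eps_t
}
op <- par(mfrow = c(1, 2))
acf(y)    # returns: approximately uncorrelated
acf(y^2)  # squared returns: clearly correlated (see the next slide)
par(op)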

SLIDE 76

ARCH(1) (Properties)

  • Conditionally on yt−1, yt is Gaussian: yt | yt−1 ∼ N(0, ω + α1 y²t−1).
  • The returns {yt} have zero mean and they are uncorrelated.
  • The squared returns {y²t} satisfy:

y²t = ω + α1 y²t−1 + vt ,

where the error process vt = σ²t (ǫ²t − 1) is a white noise process.

  • Hence:

– ARCH(1) models the returns {yt} as a white noise process with non-constant conditional variance, and the conditional variance depends on the previous return;
– the returns {yt} are uncorrelated, whereas their squares {y²t} follow a non-Gaussian autoregressive process.

SLIDE 77

ARCH(1) Models (cont.)

  • Moreover, the kurtosis of yt is

κ = E[y⁴t] / (E[y²t])² = 3 (1 − α1²) / (1 − 3α1²),

which (provided 3α1² < 1) is always larger than 3, the kurtosis of the normal distribution.

  • Thus, the marginal distribution of the returns yt is leptokurtic, i.e. has heavy tails.
  • So outliers are more likely. This agrees with empirical evidence: outliers appear more often in asset returns than implied by an i.i.d. sequence of normal random variates.
SLIDE 78

ARCH(1) Models (cont.)

  • Estimation of the parameters ω and α1 of the ARCH(1) model is accomplished using conditional MLE.
  • The likelihood of the data y2, . . . , yn conditional on y1 is given by

L(ω, α1 | y1) = ∏_{t=2}^{n} f_{ω,α1}(yt | yt−1),

where f_{ω,α1}(yt | yt−1) is the N(0, ω + α1 y²t−1) density, that is,

f_{ω,α1}(yt | yt−1) ∝ (ω + α1 y²t−1)^{−1/2} exp( −(1/2) y²t / (ω + α1 y²t−1) ).
SLIDE 79

ARCH(1) Models (cont.)

  • Hence, the objective function to be maximised is the conditional log-likelihood

l(ω, α1 | y1) = ln[L(ω, α1 | y1)] ∝ −(1/2) ∑_{t=2}^{n} ln(ω + α1 y²t−1) − (1/2) ∑_{t=2}^{n} y²t / (ω + α1 y²t−1).

  • Maximisation of this function is achieved using numerical methods (analytic expressions for the gradient vector and Hessian matrix of the log-likelihood function can be obtained).

SLIDE 80

ARCH(m) Models (cont.)

  • The general ARCH(m) model is defined by:

yt = σt ǫt ,
σ²t = ω + α1 y²t−1 + · · · + αm y²t−m ,

where the parameter m determines the maximum order of lagged innovations which are supposed to have an impact on current volatility.

  • Results similar to those for the ARCH(1) model hold:

yt | yt−1, . . . , yt−m ∼ N(0, ω + α1 y²t−1 + · · · + αm y²t−m),
y²t = ω + α1 y²t−1 + · · · + αm y²t−m + vt ,

where vt = σ²t (ǫ²t − 1) is a shifted χ²1 random variable.

  • yt and vt have zero mean.
  • Estimation of the parameters ω, α1, . . . , αm is similar to that for ARCH(1).
SLIDE 81

Building ARCH models

  • An ARIMA model is built for the observed time series to remove any serial correlation in the data.
  • Examine the squared residuals to check for conditional heteroscedasticity.
  • Use the PACF of the squared residuals to determine the ARCH order (a sketch of these checks follows below).

As final remarks we should comment on some of the weaknesses:

  • ARCH treats positive and negative returns in the same way (through past squared returns).
  • ARCH often over-predicts the volatility, because it responds slowly to large shocks.
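A sketch of the squared-residual checks (fit is assumed to be an arima() fit of the kind shown earlier):

r <- residuals(fit)
op <- par(mfrow = c(1, 2))
acf(r^2)   # correlation in r^2 signals conditional heteroscedasticity
pacf(r^2)  # its cut-off lag suggests the ARCH order m
par(op)
Box.test(r^2, lag = 10, type = "Ljung-Box")  # a small p-value flags ARCH effects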

SLIDE 82

GARCH(m,r) models

  • Generalised ARCH models, the GARCH(m,r) processes (Bollerslev, 1986), are obtained by augmenting σ²t with a component autoregressive in σ²t.
  • For instance, a GARCH(1,1) model is

yt = σt ǫt ,
σ²t = ω + α1 y²t−1 + β1 σ²t−1 .

  • Assuming α1 + β1 < 1 and using similar manipulations as before, it can be shown that the GARCH(1,1) model admits a non-Gaussian ARMA(1,1) model for the squared process.
  • Indeed:

y²t = σ²t ǫ²t and ω + α1 y²t−1 + β1 σ²t−1 = σ²t .
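A sketch of simulating and fitting a GARCH(1,1) with garchFit() from the fGarch package (the package choice is our assumption, in keeping with the Rmetrics armaFit() used earlier):

library(fGarch)
set.seed(11)
spec <- garchSpec(model = list(omega = 0.1, alpha = 0.1, beta = 0.8))
y <- garchSim(spec, n = 2000)
fit <- garchFit(~ garch(1, 1), data = y, trace = FALSE)
coef(fit)  # estimates of mu, omega, alpha1, beta1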

SLIDE 83

GARCH(m,r) models (cont)

It can be seen that:

y²t − σ²t = σ²t (ǫ²t − 1) = vt  ⟹  y²t−1 − σ²t−1 = σ²t−1 (ǫ²t−1 − 1) = vt−1 ,

and then

y²t − ω − α1 y²t−1 − β1 σ²t−1 = vt
⟹ y²t = ω + α1 y²t−1 + β1 y²t−1 + β1 (σ²t−1 − y²t−1) + vt ,

and so

y²t = ω + (α1 + β1) y²t−1 − β1 vt−1 + vt .

SLIDE 84

GARCH(m,r) models (cont)

In general, the GARCH(m,r) model is

yt = σt ǫt ,
σ²t = ω + α1 y²t−1 + · · · + αm y²t−m + β1 σ²t−1 + · · · + βr σ²t−r .

Sufficient conditions for the conditional variance to be positive are obvious:

ω > 0; αi ≥ 0, i = 1, . . . , m; βj ≥ 0, j = 1, . . . , r.

Using polynomials in the lag operator B, the specification of σ²t may also be given by

(1 − β1B − · · · − βrB^r) σ²t = ω + (α1B + · · · + αmB^m) y²t

or

(1 − β(B)) σ²t = ω + α(B) y²t .

SLIDE 85

GARCH(m,r) models (cont)

Assuming the zeros of the polynomial 1 − β(z) are larger than one in absolute value, the model can also be written as an ARCH process of infinite order:

σ²t = (1 − β(B))⁻¹ ω + (1 − β(B))⁻¹ α(B) y²t .

Note that a GARCH(m,r) admits a non-Gaussian ARMA(max(m,r), r) model for the squared process:

y²t = ω + ∑_{i=1}^{max(m,r)} (αi + βi) y²t−i + vt − ∑_{i=1}^{r} βi vt−i ,

with the convention αi = 0 for i > m and βi = 0 for i > r.

Building and fitting GARCH models proceeds similarly to the approach discussed previously for ARCH models. See Ex. 8.
