SLIDE 1

Comments on Choice of ARMA model

  • Keep it simple! Use small p and q.
  • Some systems have autoregressive-like structure.
  • E.g. first order dynamics:

dx(t)/dt = −αx(t)

or in stochastic form,

dx(t) = −αx(t)dt + dW(t),

where W(t) is a Wiener process, the continuous-time limit of the random walk.

SLIDE 2
  • Discrete time approximation:

δx(t) = x(t + δt) − x(t) = −αx(t)δt + δW(t)

or

x(t + δt) = x(t) − αx(t)δt + δW(t) = (1 − αδt)x(t) + δW(t),

an AR(1) (causal if α > 0 and δt is small).

  • Similarly a second order system leads to AR(2).
  • Since many real-world systems can be approximated by first or second order dynamics, this suggests using p = 1 or 2, and q = 0.
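The discretization above can be checked by simulation; a minimal sketch (the values of α and δt are illustrative choices, not from the slides):

```python
import numpy as np

# Discretize dx(t) = -alpha*x(t)*dt + dW(t) with an Euler step:
# x(t + dt) = (1 - alpha*dt)*x(t) + dW(t), i.e. an AR(1) with phi = 1 - alpha*dt.
# alpha and dt below are illustrative choices, not values from the slides.
rng = np.random.default_rng(0)
alpha, dt, n = 2.0, 0.01, 100_000
phi = 1.0 - alpha * dt                       # implied AR(1) coefficient (0.98 here)
dW = rng.normal(scale=np.sqrt(dt), size=n)   # Wiener increments, Var = dt
x = np.empty(n + 1)
x[0] = 0.0
for t in range(n):
    x[t + 1] = phi * x[t] + dW[t]

# The lag-1 sample autocorrelation of the simulated path should be close to phi
phi_hat = np.corrcoef(x[:-1], x[1:])[0, 1]
print(phi, round(phi_hat, 3))
```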

SLIDE 3
  • Some systems have more dimensions. E.g. first order vector autoregression, VARp(1):

xt = Φ xt−1 + wt,

where xt and wt are p × 1 vectors and Φ is a p × p matrix.

  • Here each component time series is typically ARMA(p, p−1).
  • This suggests using q < p, especially q = p − 1.
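The component-wise ARMA(p, p−1) claim can be checked numerically for p = 2: applying the scalar AR polynomial det(I − ΦB) = 1 − tr(Φ)B + det(Φ)B² to one component should leave an MA(1) residual. A sketch with an arbitrary stable Φ (the coefficients are illustrative assumptions):

```python
import numpy as np

# Each component of a p-dimensional VAR(1) should be ARMA(p, p-1); here p = 2,
# so a component is ARMA(2, 1).  Applying the scalar AR polynomial
# det(I - Phi*B) = 1 - tr(Phi)*B + det(Phi)*B^2 to one component should leave
# an MA(1) residual: autocorrelation nonzero at lag 1, near zero at lag 2.
# The coefficient matrix Phi below is an arbitrary stable example.
rng = np.random.default_rng(1)
Phi = np.array([[0.5, 0.2],
                [0.1, 0.4]])
n = 100_000
x = np.zeros((n, 2))
for t in range(1, n):
    x[t] = Phi @ x[t - 1] + rng.normal(size=2)

tr, det = np.trace(Phi), np.linalg.det(Phi)
x1 = x[:, 0]
u = x1[2:] - tr * x1[1:-1] + det * x1[:-2]    # det(I - Phi*B) applied to x1

def acf(z, lag):
    z = z - z.mean()
    return (z[:-lag] * z[lag:]).mean() / (z * z).mean()

print(round(acf(u, 1), 3), round(acf(u, 2), 3))
```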

SLIDE 4
  • Added noise: if yt is ARMA(p, q) with q < p, but we observe

xt = yt + w′t,

where w′t is white noise, uncorrelated with yt, then xt is ARMA(p, p).

  • This suggests using q = p.
  • Summary: you’ll often find that you can use small p and q ≤ p, perhaps q = 0, q = p − 1, or q = p, depending on the background of the series.
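The added-noise result can be illustrated by simulation: filtering the observed series with the AR polynomial of yt should leave an MA(1) residual (autocorrelation nonzero at lag 1, near zero beyond). A sketch with illustrative parameter values:

```python
import numpy as np

# Observe x_t = y_t + w'_t where y_t is AR(1) and w'_t is independent white
# noise; x_t should then be ARMA(1, 1).  Filtering x_t with the AR polynomial
# of y_t leaves an MA(1) residual: autocorrelation nonzero at lag 1, near zero
# at lag 2 and beyond.  phi and the noise variances are illustrative.
rng = np.random.default_rng(6)
phi, n = 0.8, 100_000
e = rng.normal(size=n)
y = np.empty(n)
y[0] = 0.0
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]
x = y + rng.normal(size=n)             # AR(1) plus added white noise

v = x[1:] - phi * x[:-1]               # apply the AR(1) polynomial of y

def acf(z, lag):
    z = z - z.mean()
    return (z[:-lag] * z[lag:]).mean() / (z * z).mean()

print(round(acf(v, 1), 3), round(acf(v, 2), 3))
```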

SLIDE 5

Estimation

  • Current methods are likelihood-based:

f1,2,...,n(x1, x2, . . . , xn) = f1(x1) × f2|1(x2 | x1) × · · · × fn|n−1,...,1(xn | xn−1, xn−2, . . . , x1).

  • If xt is AR(p) and n > p, then

fn|n−1,...,1(xn | xn−1, xn−2, . . . , x1) = fn|n−1,...,n−p(xn | xn−1, xn−2, . . . , xn−p).
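This Markov reduction can be verified with the Gaussian conditioning formula: for an AR(1), the regression coefficients of xn on the whole past should be (0, . . . , 0, φ). A sketch (the values φ = 0.7, n = 6 are arbitrary):

```python
import numpy as np

# For a Gaussian AR(1), the conditional density of x_n given the whole past
# should depend only on x_{n-1}.  Check via the Gaussian conditioning formula:
# the regression coefficients of x_n on (x_1, ..., x_{n-1}) are
# Sigma11^{-1} Sigma21, and for an AR(1) they should be (0, ..., 0, phi).
phi, n = 0.7, 6                        # illustrative values
# Stationary AR(1) autocovariance with sigma_w^2 = 1: gamma(h) = phi^|h|/(1 - phi^2)
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Gamma = phi ** lags / (1 - phi ** 2)
Sigma11 = Gamma[:-1, :-1]              # covariance of the past (x_1, ..., x_{n-1})
Sigma21 = Gamma[-1, :-1]               # covariance of x_n with the past
beta = np.linalg.solve(Sigma11, Sigma21)
print(np.round(beta, 6))               # expect zeros except phi in the last slot
```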

SLIDE 6
  • Assume xt is Gaussian. E.g. AR(1):

ft|t−1(xt | xt−1) is N[(1 − φ)µ + φxt−1, σ²_w] for t > 1,

and f1(x1) is N[µ, σ²_w / (1 − φ²)].

  • So the likelihood, still for AR(1), is

L(µ, φ, σ²_w) = (2πσ²_w)^(−n/2) (1 − φ²)^(1/2) exp[−S(µ, φ) / (2σ²_w)],

where

S(µ, φ) = (1 − φ²)(x1 − µ)² + Σ_{t=2}^{n} [(xt − µ) − φ(xt−1 − µ)]².
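As a sanity check, the closed-form likelihood above should agree with the product of f1 and the one-step conditionals from the previous slide. A sketch (the data and parameter values are arbitrary illustrations):

```python
import numpy as np

# Evaluate the exact AR(1) log-likelihood via S(mu, phi) and check it against
# the sum of the log one-step conditional densities plus log f1.
def loglik_S(x, mu, phi, s2):
    n = len(x)
    S = (1 - phi**2) * (x[0] - mu)**2 \
        + np.sum(((x[1:] - mu) - phi * (x[:-1] - mu))**2)
    return -0.5 * n * np.log(2 * np.pi * s2) + 0.5 * np.log(1 - phi**2) - S / (2 * s2)

def loglik_factored(x, mu, phi, s2):
    def lognorm(z, m, v):              # log N(m, v) density at z
        return -0.5 * np.log(2 * np.pi * v) - (z - m)**2 / (2 * v)
    ll = lognorm(x[0], mu, s2 / (1 - phi**2))                     # stationary f1
    for t in range(1, len(x)):
        ll += lognorm(x[t], (1 - phi) * mu + phi * x[t - 1], s2)  # f_{t|t-1}
    return ll

rng = np.random.default_rng(2)
x = rng.normal(size=50)
a = loglik_S(x, 0.3, 0.6, 1.2)
b = loglik_factored(x, 0.3, 0.6, 1.2)
print(a, b)
```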

SLIDE 7

Methods in proc arima

  • method = ml: maximize the likelihood.
  • method = uls: minimize the unconditional sum of squares S(µ, φ).

  • method = cls: minimize the conditional sum of squares Sc(µ, φ):

Sc(µ, φ) = S(µ, φ) − (1 − φ²)(x1 − µ)² = Σ_{t=2}^{n} [(xt − µ) − φ(xt−1 − µ)]².

This is essentially least squares regression of xt on xt−1.
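A sketch showing that minimizing Sc(µ, φ) is the same as OLS of xt on xt−1, with the intercept mapped back via c = µ(1 − φ) (the simulation settings are illustrative):

```python
import numpy as np

# Conditional least squares for AR(1) is just OLS of x_t on x_{t-1}:
# Sc(mu, phi) = sum_t (x_t - mu(1 - phi) - phi*x_{t-1})^2, so the OLS
# intercept c and slope phi map back to mu = c / (1 - phi).
rng = np.random.default_rng(3)
n, mu_true, phi_true = 5000, 2.0, 0.6
x = np.empty(n)
x[0] = mu_true
for t in range(1, n):
    x[t] = (1 - phi_true) * mu_true + phi_true * x[t - 1] + rng.normal()

A = np.column_stack([np.ones(n - 1), x[:-1]])       # design: intercept, lag
(c, phi_hat), *_ = np.linalg.lstsq(A, x[1:], rcond=None)
mu_hat = c / (1 - phi_hat)

def Sc(mu, phi):
    return np.sum(((x[1:] - mu) - phi * (x[:-1] - mu))**2)

# The OLS solution minimizes Sc: nearby parameter values do no better
print(round(phi_hat, 2), round(mu_hat, 2))
```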

SLIDE 8
  • AR(p), p > 1, can be handled similarly.
  • ARMA(p, q) with q > 0 is more complicated; state space methods can be used to calculate the exact likelihood.
  • proc arima implements the same three methods in all cases.
  • All three methods give estimators with the same large-sample normal distribution; all are asymptotically optimal.

SLIDE 9

Brute Force

  • Above methods fail (or need serious modification) if any data are missing.
  • Can always fall back to brute force:

x1, x2, . . . , xn ∼ Nn(µ1, Γ), where

Γ (n × n) =

  ⎡ γ(0)     γ(1)     γ(2)     · · ·  γ(n−1) ⎤
  ⎢ γ(1)     γ(0)     γ(1)     · · ·  γ(n−2) ⎥
  ⎢ γ(2)     γ(1)     γ(0)     · · ·  γ(n−3) ⎥
  ⎢   ⋮        ⋮        ⋮       ⋱       ⋮    ⎥
  ⎣ γ(n−1)   γ(n−2)   γ(n−3)   · · ·  γ(0)   ⎦

9
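A brute-force sketch for the AR(1) case, where γ(h) = σ²_w φ^|h| / (1 − φ²) is known in closed form, cross-checked against the exact AR(1) likelihood from the earlier slide (the parameter values are arbitrary):

```python
import numpy as np

# Build the n x n Toeplitz matrix Gamma from the AR(1) autocovariance
# gamma(h) = s2 * phi^|h| / (1 - phi^2) and evaluate the multivariate
# normal log-density directly; compare with the closed-form AR(1) answer.
rng = np.random.default_rng(4)
n, mu, phi, s2 = 40, 0.5, 0.7, 1.3
x = rng.normal(size=n)                      # any data vector will do here

lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Gamma = s2 * phi ** lags / (1 - phi ** 2)   # Gamma[i, j] = gamma(|i - j|)

d = x - mu
sign, logdet = np.linalg.slogdet(2 * np.pi * Gamma)
ll_brute = -0.5 * logdet - 0.5 * d @ np.linalg.solve(Gamma, d)

# Exact AR(1) formula via S(mu, phi)
S = (1 - phi**2) * (x[0] - mu)**2 + np.sum(((x[1:] - mu) - phi * (x[:-1] - mu))**2)
ll_exact = -0.5 * n * np.log(2 * np.pi * s2) + 0.5 * np.log(1 - phi**2) - S / (2 * s2)
print(ll_brute, ll_exact)
```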

slide-10
SLIDE 10
  • Write γ(h) = σ²_w γ∗(h), and use e.g. R’s ARMAacf(...) to compute γ∗(h).

  • Likelihood is

(1 / √det(2πΓ)) exp[−(1/2)(x − µ1)′ Γ⁻¹ (x − µ1)]
  = (1 / √det(2πσ²_w Γ∗)) exp[−(1/(2σ²_w))(x − µ1)′ Γ∗⁻¹ (x − µ1)].

  • Can maximize analytically with respect to µ and σ²_w, then numerically with respect to φ and θ.
  • Missing data? Just leave out the corresponding rows and columns of Γ∗.
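A sketch of the missing-data recipe: evaluate the Gaussian likelihood after deleting the rows and columns of Γ at the missing time points (the AR(1) closed-form autocovariance and the missing indices are illustrative assumptions):

```python
import numpy as np

# Missing data under the brute-force approach: drop the rows and columns of
# Gamma (and the entries of x) at the missing time points, then evaluate the
# Gaussian log-likelihood of the remaining observations.
rng = np.random.default_rng(5)
n, mu, phi, s2 = 30, 0.0, 0.6, 1.0
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Gamma = s2 * phi ** lags / (1 - phi ** 2)   # AR(1) autocovariance matrix
x = rng.normal(size=n)

obs = np.setdiff1d(np.arange(n), [4, 11, 12, 25])   # indices actually observed
G_obs = Gamma[np.ix_(obs, obs)]             # delete missing rows and columns
d = x[obs] - mu
sign, logdet = np.linalg.slogdet(2 * np.pi * G_obs)
ll = -0.5 * logdet - 0.5 * d @ np.linalg.solve(G_obs, d)
print(round(ll, 3))
```

The submatrix of a positive definite Toeplitz matrix is still positive definite, so the reduced likelihood is well defined.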
