Analysis of Multiple Time Series, Kevin Sheppard (PDF slides)



SLIDE 1

Analysis of Multiple Time Series

Kevin Sheppard (http://www.kevinsheppard.com)

Oxford MFE

This version: February 24, 2020

February – March, 2020

SLIDE 2

This week’s material

Vector Autoregressions
◮ Basic examples
◮ Properties
⊲ Stationarity
◮ Revisiting univariate ARMA processes
◮ Forecasting
◮ Granger Causality
◮ Impulse Response functions
Cointegration
◮ Examining long-run relationships
◮ Determining whether a VAR is cointegrated
◮ Error Correction Models
◮ Testing for Cointegration
⊲ Engle-Granger

Lots of revisiting univariate time series.

SLIDE 3

Why VAR analysis?

Stationary VARs
◮ Determine whether variables feed back into one another
◮ Improve forecasts
◮ Model the effect of a shock in one series on another
◮ Differentiate between short-run and long-run dynamics
Cointegration
◮ Link random walks
◮ Uncover long-run relationships
◮ Can substantially improve medium- to long-term forecasting

SLIDE 4

VAR Defined

Pth order autoregression, AR(P):

yt = φ0 + φ1yt−1 + . . . + φP yt−P + ǫt

Pth order vector autoregression, VAR(P):

yt = Φ0 + Φ1yt−1 + . . . + ΦP yt−P + ǫt

where yt and ǫt are k by 1 vectors.

Bivariate VAR(1):

[y1,t; y2,t] = [φ01; φ02] + [φ11 φ12; φ21 φ22] [y1,t−1; y2,t−1] + [ǫ1,t; ǫ2,t]

Compactly expresses two linked models:

y1,t = φ01 + φ11y1,t−1 + φ12y2,t−1 + ǫ1,t
y2,t = φ02 + φ21y1,t−1 + φ22y2,t−1 + ǫ2,t

SLIDE 5

Stationarity Revisited

Stationarity is a statistically meaningful form of regularity.

A stochastic process {yt} is covariance stationary if

E[yt] = µ ∀t
V[yt] = σ2 < ∞ ∀t
E[(yt − µ)(yt−s − µ)] = γs ∀t, s

AR(1) stationarity: yt = φyt−1 + ǫt
◮ |φ| < 1
◮ ǫt is white noise
AR(P) stationarity: yt = φ1yt−1 + . . . + φP yt−P + ǫt
◮ Roots of z^P − φ1z^{P−1} − φ2z^{P−2} − . . . − φP−1 z − φP all less than 1 in absolute value
◮ ǫt is white noise
No dependence on t

SLIDE 6

Relationship to AR

AR(1)

yt = φ0 + φ1yt−1 + ǫt
   = φ0 + φ1(φ0 + φ1yt−2 + ǫt−1) + ǫt
   = φ0 + φ1φ0 + φ1^2 yt−2 + φ1ǫt−1 + ǫt
   = φ0 + φ1φ0 + φ1^2(φ0 + φ1yt−3 + ǫt−2) + φ1ǫt−1 + ǫt
   = φ0 Σ_{i=0}^{∞} φ1^i + Σ_{i=0}^{∞} φ1^i ǫt−i
   = (1 − φ1)^{−1} φ0 + Σ_{i=0}^{∞} φ1^i ǫt−i

SLIDE 7

Relationship to AR

VAR(1)

yt = Φ0 + Φ1yt−1 + ǫt
   = Φ0 + Φ1(Φ0 + Φ1yt−2 + ǫt−1) + ǫt
   = Φ0 + Φ1Φ0 + Φ1^2 yt−2 + Φ1ǫt−1 + ǫt
   = Φ0 + Φ1Φ0 + Φ1^2(Φ0 + Φ1yt−3 + ǫt−2) + Φ1ǫt−1 + ǫt
   = Σ_{i=0}^{∞} Φ1^i Φ0 + Σ_{i=0}^{∞} Φ1^i ǫt−i
   = (Ik − Φ1)^{−1} Φ0 + Σ_{i=0}^{∞} Φ1^i ǫt−i

SLIDE 8

Properties of a VAR(1) and AR(1)

AR(1): yt = φ0 + φ1yt−1 + ǫt
VAR(1): yt = Φ0 + Φ1yt−1 + ǫt

                       AR(1)                 VAR(1)
Mean                   φ0/(1 − φ1)           (Ik − Φ1)^{−1} Φ0
Variance               σ2/(1 − φ1^2)         (I − Φ1 ⊗ Φ1)^{−1} vec(Σ)
sth Autocovariance     γs = φ1^s V[yt]       Γs = Φ1^s V[yt]
−sth Autocovariance    γ−s = φ1^s V[yt]      Γ−s = V[yt] (Φ1^s)′

Autocovariances of vector processes are not symmetric, but Γs = Γ′−s

Stationarity
◮ AR(1): |φ1| < 1
◮ VAR(1): |λi| < 1 where λi are the eigenvalues of Φ1
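The VAR(1) eigenvalue condition is easy to check numerically. A minimal sketch; the Φ1 matrix below is made up for illustration:

```python
import numpy as np

# Hypothetical VAR(1) coefficient matrix for a bivariate system
Phi1 = np.array([[0.5, 0.2],
                 [0.1, 0.6]])

# The VAR(1) is covariance stationary iff all eigenvalues of Phi1
# lie strictly inside the unit circle
eigenvalues = np.linalg.eigvals(Phi1)
is_stationary = bool(np.all(np.abs(eigenvalues) < 1))
print(eigenvalues, is_stationary)
```

Here the eigenvalues are 0.7 and 0.4, so this particular system is stationary.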

SLIDE 9

Stock and Bond VAR

VWM from CRSP
TERM constructed from 10-year bond return minus 1-year return, from FRED

February 1962 until December 2018 (683 months)

[VWMt; TERMt] = [φ01; φ02] + [φ11,1 φ12,1; φ21,1 φ22,1] [VWMt−1; TERMt−1] + [ǫ1,t; ǫ2,t]

Market model:

VWMt = φ01 + φ11,1 VWMt−1 + φ12,1 TERMt−1 + ǫ1,t

Long bond model:

TERMt = φ02 + φ21,1 VWMt−1 + φ22,1 TERMt−1 + ǫ2,t.

Estimates (p-values in parentheses):

[VWMt; TERMt] = [0.801 (0.000); 0.232 (0.041)] + [0.059 (0.122)  0.166 (0.004); −0.104 (0.000)  0.116 (0.002)] [VWMt−1; TERMt−1] + [ǫ1,t; ǫ2,t]
SLIDE 10

Stock and Bond VAR

Estimates from VAR (p-values in parentheses):

VWMt = 0.816 (0.000) + 0.060 (0.117) VWMt−1 + 0.168 (0.003) TERMt−1
TERMt = 0.228 (0.045) − 0.104 (0.000) VWMt−1 + 0.115 (0.002) TERMt−1

Estimates from AR:

VWMt = 0.830 (0.000) + 0.073 (0.057) VWMt−1
TERMt = 0.137 (0.224) + 0.098 (0.011) TERMt−1

SLIDE 11

Comparing AR and VAR forecasts

[Figure: 1-month-ahead forecasts of the VWM returns]

[Figure: 1-month-ahead forecasts of 10-year bond returns]

SLIDE 12

Monetary Policy VAR

Standard tool in monetary policy analysis
◮ Unemployment rate (differenced)
◮ Federal Funds rate
◮ Inflation rate (differenced)

[∆UNEMPt; FFt; ∆INFt] = Φ0 + Φ1 [∆UNEMPt−1; FFt−1; ∆INFt−1] + [ǫ1,t; ǫ2,t; ǫ3,t].

Estimates (p-values in parentheses):

              ∆ln UNEMPt−1    FFt−1            ∆INFt−1
∆ln UNEMPt    0.624 (0.000)   0.015 (0.001)    0.016 (0.267)
FFt           −0.816 (0.000)  0.979 (0.000)    −0.045 (0.317)
∆INFt         −0.501 (0.010)  −0.009 (0.626)   −0.401 (0.000)

SLIDE 13

Interpreting Estimates

Variable scale affects cross-parameter estimates
◮ Not an issue in ARMA analysis
Standardizing data can improve interpretation when scales differ

              ∆ln UNEMPt−1    FFt−1            ∆INFt−1
∆ln UNEMPt    0.624 (0.000)   0.153 (0.001)    0.053 (0.267)
FFt           −0.080 (0.000)  0.979 (0.000)    −0.015 (0.317)
∆INFt         −0.151 (0.010)  −0.028 (0.626)   −0.401 (0.000)

Other important measures – statistical significance, persistence, model selection – are unaffected by standardization

SLIDE 14

VAR(P) is really a VAR(1)

Companion form:

yt = Φ0 + Φ1yt−1 + Φ2yt−2 + . . . + ΦP yt−P + ǫt

Reform into a single VAR(1), zt = Υzt−1 + ξt, where

µ = E[yt] = (I − Φ1 − . . . − ΦP)^{−1} Φ0

zt = [yt − µ; yt−1 − µ; . . . ; yt−P+1 − µ]

Υ = [Φ1 Φ2 Φ3 . . . ΦP−1 ΦP; Ik 0 0 . . . 0 0; 0 Ik 0 . . . 0 0; . . . ; 0 0 0 . . . Ik 0]

◮ All results can be directly applied to the companion form.
◮ Can also be used to transform an AR(P) into a VAR(1)
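Building the companion matrix is mechanical: stack the coefficient matrices in the top block row and put identity blocks on the subdiagonal. A small sketch; the AR(2) coefficients are made up for illustration:

```python
import numpy as np

def companion(phis):
    """Stack VAR(P) coefficient matrices [Phi_1, ..., Phi_P], each k x k,
    into the kP x kP companion matrix Upsilon."""
    k = phis[0].shape[0]
    P = len(phis)
    upsilon = np.zeros((k * P, k * P))
    upsilon[:k, :] = np.hstack(phis)        # top block row: Phi_1 ... Phi_P
    upsilon[k:, :-k] = np.eye(k * (P - 1))  # identity blocks on the subdiagonal
    return upsilon

# Hypothetical AR(2) written as a VAR(1): y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + e_t
U = companion([np.array([[0.5]]), np.array([[0.3]])])
# Stationarity of the VAR(P) = all eigenvalues of Upsilon inside the unit circle
max_eig = np.abs(np.linalg.eigvals(U)).max()
```

For these values the largest companion eigenvalue is about 0.85, so the AR(2) is stationary.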

SLIDE 15

Revisiting Univariate Forecasting

Consider a standard AR(1)

yt = φ0 + φ1yt−1 + ǫt

Optimal 1-step ahead forecast:

Et[yt+1] = Et[φ0] + Et[φ1yt] + Et[ǫt+1] = φ0 + φ1yt + 0

Optimal 2-step ahead forecast:

Et[yt+2] = Et[φ0] + Et[φ1yt+1] + Et[ǫt+2]
         = φ0 + φ1Et[yt+1] + 0
         = φ0 + φ1(φ0 + φ1yt)
         = φ0 + φ1φ0 + φ1^2 yt

Optimal h-step ahead forecast:

Et[yt+h] = Σ_{i=0}^{h−1} φ1^i φ0 + φ1^h yt

SLIDE 16

Forecasting with VARs

Identical to the univariate case

yt = Φ0 + Φ1yt−1 + ǫt

Optimal 1-step ahead forecast:

Et[yt+1] = Et[Φ0] + Et[Φ1yt] + Et[ǫt+1] = Φ0 + Φ1yt + 0

Optimal h-step ahead forecast:

Et[yt+h] = Φ0 + Φ1Φ0 + . . . + Φ1^{h−1} Φ0 + Φ1^h yt
         = Σ_{i=0}^{h−1} Φ1^i Φ0 + Φ1^h yt

Higher-order forecasts can be computed recursively:

Et[yt+h] = Φ0 + Φ1Et[yt+h−1] + . . . + ΦP Et[yt+h−P]
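The recursion above can be sketched directly: iterate the 1-step forecast forward h times. The VAR(1) coefficients below are made up for illustration:

```python
import numpy as np

def var1_forecast(phi0, phi1, y_t, h):
    """h-step ahead forecast from a VAR(1) by iterating E_t[y_{t+j}] forward."""
    forecast = np.asarray(y_t, dtype=float)
    for _ in range(h):
        forecast = phi0 + phi1 @ forecast
    return forecast

# Hypothetical bivariate VAR(1)
phi0 = np.array([0.1, 0.2])
phi1 = np.array([[0.5, 0.1],
                 [0.0, 0.4]])
y_t = np.array([1.0, 1.0])
f1 = var1_forecast(phi0, phi1, y_t, 1)   # = Phi0 + Phi1 y_t
```

As h grows the forecast converges to the unconditional mean (I − Φ1)^{−1} Φ0, consistent with stationarity.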

SLIDE 17

What makes a good forecast?

Forecast residuals

êt+h|t = yt+h − ŷt+h|t

Residuals are not white noise
Can contain an MA(h − 1) component
◮ The forecast error for yt+1 − ŷt+1|t−h+1 was not known at time t.

Plot your residuals
Residual ACF
Mincer-Zarnowitz regressions
Three-period procedure
◮ Training sample: used to build the model
◮ Validation sample: used to refine the model
◮ Evaluation sample: ultimate test, ideally one shot

SLIDE 18

Multi-step Forecasting

Two methods

Iterative method
◮ Build model for 1-step ahead forecasts

yt = Φ0 + Φ1yt−1 + ǫt

◮ Iterate forecast out to period h

ŷt+h|t = Σ_{i=0}^{h−1} Φ1^i Φ0 + Φ1^h yt

◮ Makes efficient use of information
◮ Imposes a lot of structure on the problem

Direct method
◮ Build model for h-step ahead forecasts

yt = Φ0 + Φh yt−h + ǫt

◮ Directly forecast using a pseudo 1-step ahead method

ŷt+h|t = Φ0 + Φh yt

◮ Robust to some nonlinearities

SLIDE 19

Multi-step Forecast Evaluation

Multi-step forecast evaluation is identical to one-step ahead forecast evaluation with one caveat

h-step ahead forecast errors may be correlated with any forecast error not known at time t: êt+1|t−h+1, êt+2|t−h+2, . . . , êt+h−1|t−1

Leads to an MA(h − 1) structure in the forecast errors
Solutions:
◮ Use a regular GMZ regression with a Newey-West covariance estimator

yt+h − ŷt+h|t = β1 + β2 ŷt+h|t + γxt + ηt
H0 : β1 = β2 = γ = 0, H1 : β1 ≠ 0 ∪ β2 ≠ 0 ∪ γj ≠ 0 ∃j

◮ Explicitly model the MA(h − 1) and use a standard covariance estimator

yt+h − ŷt+h|t = β1 + β2 ŷt+h|t + γxt + ηt + Σ_{i=1}^{h−1} θi ηt−i

Note: the null is the same; it does not impose a restriction on θ

SLIDE 20

Example: Monetary Policy VAR

Forecasts produced iteratively for 1 to 8 quarters ahead
Random walk (FF) or constant mean benchmark
AR and VAR select lag length using BIC
Restricted models force reversion to the in-sample mean using a 2-step estimator

1. Estimate the sample mean, and subtract it to produce ỹt = yt − µ̂
2. Estimate a VAR without a constant

ỹt = Φ1 ỹt−1 + . . . + ΦP ỹt−P + ǫt

3. Forecast and then add the in-sample mean, Et[ỹt+h] + µ̂

Evaluation based on relative MSE

Rel. MSE = MSEm / MSEb,  MSE = (T − h − R)^{−1} Σ_{t=R}^{T−h} (yt+h − ŷt+h|t)^2

SLIDE 21

Example: Monetary Policy VAR

                             VAR                        AR
Horizon  Series              Restricted  Unrestricted   Restricted  Unrestricted
1        Unemployment        0.522       0.520          0.507       0.507
         Fed. Funds Rate     0.887       0.903          0.923       0.933
         Inflation           0.869       0.868          0.839       0.840
2        Unemployment        0.716       0.710          0.717       0.718
         Fed. Funds Rate     0.923       0.943          1.112       1.130
         Inflation           1.082       1.081          1.031       1.030
4        Unemployment        0.872       0.861          0.937       0.940
         Fed. Funds Rate     0.952       0.976          1.082       1.109
         Inflation           1.000       0.999          0.998       0.998
8        Unemployment        0.820       0.806          0.973       0.979
         Fed. Funds Rate     0.974       1.007          1.062       1.110
         Inflation           1.001       1.000          0.998       0.997

SLIDE 22

Estimation and Identification

Univariate identification: Box-Jenkins
◮ Use ACF and PACF to determine AR and MA lag order
◮ Examine residuals
◮ Parsimony principle

The autocorrelation of a scalar process is defined

ρs = γs / γ0

where γs is the sth autocovariance

◮ Regression coefficient:

yt = µ + ρs yt−s + ǫt

Partial autocorrelation ψs
◮ Regression interpretation of the sth partial autocorrelation:

yt = µ + φ1yt−1 + φ2yt−2 + . . . + φs−1 yt−s+1 + ψs yt−s + ǫt

◮ ψs is the sth partial autocorrelation

SLIDE 23

CCF and Partial CCF

Multivariate equivalents
◮ ACF and PACF have the same regression definitions
◮ Cross-correlation function

ρxy,s = E[(xt − µx)(yt−s − µy)] / √(V[xt]V[yt])
ρyx,s = E[(yt − µy)(xt−s − µx)] / √(V[xt]V[yt])

◮ Generally different
◮ Cross-partial-correlation function ψxy,s

xt = φ0 + φx1 xt−1 + . . . + φxs−1 xt−(s−1) + φy1 yt−1 + . . . + φys−1 yt−(s−1) + ψxy,s yt−s + ǫx,t

⊲ Can help identify VAR order
Deeper issue: too many and too complicated
Simple solution: model selection

SLIDE 24

Interpreting CCFs and PCCFs

y has HAR dynamics, spills over to x

[xt; yt] = [0.5 0.9; 0.0 0.47] [xt−1; yt−1] + Σ_{i=2}^{5} 0.06 [xt−i; yt−i] + Σ_{j=6}^{22} 0.01 [xt−j; yt−j] + [ǫx,t; ǫy,t]

[Figure: simulated data, x and y, 1000 observations]

SLIDE 25

ACFs and CCFs

[Figure: ACF (x on lagged x), CCF (x on lagged y), CCF (y on lagged x), ACF (y on lagged y), lags 1 to 24]

SLIDE 26

PACFs and Partial CCFs

[Figure: PACF (x on lagged x), PCCF (x on lagged y), PCCF (y on lagged x), PACF (y on lagged y), lags 1 to 24]

SLIDE 27

Model Selection

Step 1: Pick maximum lag length
◮ Information criteria

AIC: ln |Σ(P)| + 2 k^2 P / T
HQIC: ln |Σ(P)| + 2 k^2 P ln ln T / T
SIC: ln |Σ(P)| + k^2 P ln T / T

⊲ Σ(P) is the covariance of the residuals using P lags
⊲ | · | is the determinant
◮ Hypothesis-testing based
⊲ General to specific
⊲ Specific to general
◮ Likelihood ratio

(T − P2 k^2)(ln |Σ(P1)| − ln |Σ(P2)|) ∼A χ²_{(P2−P1)k^2}
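The three criteria differ only in the penalty attached to k²P. A minimal sketch of the formulas as functions of the residual covariance (the inputs below are made up for illustration):

```python
import numpy as np
from math import log

def var_ic(sigma, k, P, T):
    """Information criteria for a VAR(P) with k series and T observations.
    sigma is the residual covariance matrix Sigma(P) from the slide."""
    ll = log(np.linalg.det(sigma))
    penalty = k**2 * P
    return {"AIC": ll + 2 * penalty / T,
            "HQIC": ll + 2 * penalty * log(log(T)) / T,
            "SIC": ll + penalty * log(T) / T}

# Hypothetical inputs: identity residual covariance, k = 3, P = 2, T = 200
ic = var_ic(np.eye(3), k=3, P=2, T=200)
```

For moderate T the penalties order as AIC < HQIC < SIC, which is why SIC (BIC) selects the most parsimonious model and AIC the least, matching the lag-selection table on the next slide.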

SLIDE 28

Lag Length Selection in Monetary Policy VAR

Maximum lag: 12 (1 year)

Lag Length   AIC     HQIC    BIC     LR     P-val
0            4.014   3.762   3.605   925    0.000
1            0.279   0.079   0.000   39.6   0.000
2            0.190   0.042   0.041   40.9   0.000
3            0.096   0.000   0.076   29.0   0.001
4            0.050   0.007   0.160   7.34   0.602
5            0.094   0.103   0.333   29.5   0.001
6            0.047   0.108   0.415   13.2   0.155
7            0.067   0.180   0.564   32.4   0.000
8            0.007   0.172   0.634   19.8   0.019
9            0.000   0.217   0.756   7.68   0.566
10           0.042   0.312   0.928   13.5   0.141
11           0.061   0.382   1.076   13.5   0.141
12           0.079   0.453   1.224   –      –

SLIDE 29

Granger Causality

First fundamentally new concept
Examines whether lags of one variable are helpful in predicting another

Definition (Granger Causality)

A scalar random variable {xt} is said to not Granger cause {yt} if E[yt|xt−1, yt−1, xt−2, yt−2, . . .] = E[yt|yt−1, yt−2, . . .]. That is, {xt} does not Granger cause {yt} if the forecast of yt is the same whether conditioned on past values of xt or not.

SLIDE 30

Granger Causality

Translates directly into a restriction in a VAR

Unrestricted

[xt; yt] = [φ01; φ02] + [φ11 φ12; φ21 φ22] [xt−1; yt−1] + [ǫ1,t; ǫ2,t]

Restricted so that xt does not Granger cause yt

[xt; yt] = [φ01; φ02] + [φ11 φ12; 0 φ22] [xt−1; yt−1] + [ǫ1,t; ǫ2,t]

xt = φ01 + φ11xt−1 + φ12yt−1 + ǫ1,t
yt = φ02 + φ22yt−1 + ǫ2,t ⇐ No xt!

SLIDE 31

More Granger Causality

In a P-lag model

yt = Φ0 + Φ1yt−1 + Φ2yt−2 + . . . + ΦP yt−P + ǫt

the null hypothesis is

H0 : φij,1 = φij,2 = . . . = φij,P = 0

The alternative is

H1 : φij,1 ≠ 0 or φij,2 ≠ 0 or . . . or φij,P ≠ 0

Likelihood ratio test

(T − Pk^2)(ln |Σr| − ln |Σu|) ∼A χ²_P

Σu is the covariance of the errors from the unrestricted model
Σr is the covariance of the errors from the restricted model
T − Pk^2 is the number of observations minus the number of free parameters in the unrestricted model

◮ Why χ²_P?
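The test can be sketched on simulated data. This is a schematic single-equation version (the coefficients and the (T − 1) degrees-of-freedom scaling are illustrative, not the slide's exact (T − Pk²) correction); with one excluded coefficient the statistic is asymptotically χ²₁:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a bivariate VAR(1) in which x Granger causes y (coefficient 0.4)
T = 2000
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.4 * x[t - 1] + 0.3 * y[t - 1] + rng.standard_normal()

def ols_resid_var(dep, regressors):
    """Residual variance from an OLS regression with a constant."""
    X = np.column_stack([np.ones(len(dep))] + regressors)
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    return resid @ resid / len(dep)

# Restricted: y on its own lag; unrestricted: y on lags of y and x
s2_r = ols_resid_var(y[1:], [y[:-1]])
s2_u = ols_resid_var(y[1:], [y[:-1], x[:-1]])
lr = (T - 1) * (np.log(s2_r) - np.log(s2_u))  # asymptotically chi^2_1 under H0
```

Because x genuinely helps predict y here, the statistic far exceeds the 5% χ²₁ critical value of 3.84.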

SLIDE 32

Monetary Policy VAR

Standard tool in monetary policy analysis
◮ Unemployment rate (differenced)
◮ Federal Funds rate
◮ Inflation rate (differenced)

[∆UNEMPt; FFt; ∆INFt] = Φ0 + Φ1 [∆UNEMPt−1; FFt−1; ∆INFt−1] + [ǫ1,t; ǫ2,t; ǫ3,t].

SLIDE 33

Granger Causality in Campbell’s VAR

Using the model with 3 lags (HQIC)
H0 : φij,1 = φij,2 = φij,3 = 0
H1 : φij,1 ≠ 0 or φij,2 ≠ 0 or φij,3 ≠ 0
i represents the series being affected by lags of series j

                   Fed. Funds Rate    Inflation          Unemployment
Exclusion          P-val    Stat      P-val    Stat      P-val    Stat
Fed. Funds Rate    –        –         0.001    13.068    0.014    8.560
Inflation          0.001    14.756    –        –         0.375    1.963
Unemployment       0.000    19.586    0.775    0.509     –        –
All                0.000    33.139    0.000    18.630    0.005    10.472

SLIDE 34

Impulse Response Functions

Second fundamentally new concept
The complicated dynamics of a VAR make direct interpretation of the coefficients difficult
◮ Hard to decipher

The solution is to examine impulse responses
The impulse response function of yi with respect to a shock in ǫj, for any j and i, is defined as the change in yi,t+s, s ≥ 0, for a unit shock in ǫj,t

As long as yt is covariance stationary it must have a VMA representation,

yt = µ + ǫt + Ξ1ǫt−1 + Ξ2ǫt−2 + . . .

The Ξj are the impulse responses! Why?
◮ They directly measure the effect in period j of any shock

SLIDE 35

AR(P) and MA(∞)

Any stationary AR(P)

yt = φ0 + φ1yt−1 + φ2yt−2 + . . . + φP yt−P + ǫt

can be represented as an MA(∞)

yt = φ0/(1 − φ1 − φ2 − . . . − φP) + ǫt + Σ_{i=1}^{∞} θi ǫt−i

AR(1)

yt = φ0 + φ1yt−1 + ǫt

becomes

yt = φ0/(1 − φ1) + ǫt + Σ_{i=1}^{∞} φ1^i ǫt−i

Stationary VAR(P)s have the same relationship to the VMA(∞)

yt = Φ0 + Φ1yt−1 + Φ2yt−2 + . . . + ΦP yt−P + ǫt
yt = µ + ǫt + Ξ1ǫt−1 + Ξ2ǫt−2 + . . .

SLIDE 36

Solving IR

Easy in a VAR(1)

yt = (Ik − Φ1)^{−1} Φ0 + ǫt + Φ1ǫt−1 + Φ1^2 ǫt−2 + . . .

Ξj = Φ1^j

In the general VAR(P),

Ξj = Φ1Ξj−1 + Φ2Ξj−2 + . . . + ΦP Ξj−P

where Ξ0 = Ik and Ξm = 0 for m < 0.

◮ In a VAR(2), yt = Φ1yt−1 + Φ2yt−2 + ǫt
⊲ Ξ0 = Ik, Ξ1 = Φ1, Ξ2 = Φ1^2 + Φ2, and Ξ3 = Φ1^3 + Φ1Φ2 + Φ2Φ1.

Confidence intervals are also somewhat painful
◮ Explained in notes
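The recursion is short to implement. A minimal sketch, with made-up VAR(2) matrices, that reproduces the Ξ2 and Ξ3 identities above:

```python
import numpy as np

def vma_coefficients(phis, horizon):
    """Impulse responses Xi_0 ... Xi_h from VAR matrices [Phi_1, ..., Phi_P]
    via the recursion Xi_j = sum_i Phi_i Xi_{j-i}, with Xi_0 = I."""
    k = phis[0].shape[0]
    xis = [np.eye(k)]
    for j in range(1, horizon + 1):
        xi = np.zeros((k, k))
        for i, phi in enumerate(phis, start=1):
            if j - i >= 0:
                xi = xi + phi @ xis[j - i]
        xis.append(xi)
    return xis

# Hypothetical VAR(2) coefficient matrices
Phi1 = np.array([[0.4, 0.1], [0.0, 0.3]])
Phi2 = np.array([[0.2, 0.0], [0.1, 0.1]])
xis = vma_coefficients([Phi1, Phi2], horizon=3)
```

Checking against the slide: the recursion returns Ξ2 = Φ1² + Φ2 and Ξ3 = Φ1³ + Φ1Φ2 + Φ2Φ1 exactly.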

SLIDE 37

Considerations for Shocks

Simple bivariate VAR(1)

[xt; yt] = [φ01; φ02] + [φ11 φ12; φ21 φ22] [xt−1; yt−1] + [ǫ1,t; ǫ2,t]

How you shock matters
Depends on the correlation between ǫ1,t and ǫ2,t
3 methods
◮ Ignore correlation and just shock ǫj,t with a 1 standard deviation shock
◮ Use the Cholesky factor of Σ and shock with Σ_C^{1/2} ej, where ej is a vector of zeros with 1 in the jth position

Σ = [1 .5; .5 1],  Σ_C^{1/2} = [1 0; .5 .866]

⊲ Variable order matters
◮ “Generalized” impulse response that uses a projection method
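The Cholesky shock from the slide can be reproduced in a few lines:

```python
import numpy as np

# Error covariance from the slide: unit variances, correlation 0.5
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

# Lower-triangular Cholesky factor: a shock to the first variable moves the
# second contemporaneously, but not vice versa (so ordering matters)
L = np.linalg.cholesky(Sigma)

e1 = np.array([1.0, 0.0])  # unit shock to the first error
shock = L @ e1             # = first column of L
```

The factor matches the slide's [1 0; .5 .866], and the shock to the first variable is [1, 0.5].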

SLIDE 38

Example of the different shocks

Define the error covariance

Σ = [σx^2  σxσyρ; σxσyρ  σy^2]

◮ Standardized: shocks of size σx and σy
◮ Cholesky:

Σ_C^{1/2} = [σx  0; σyρ  σy√(1 − ρ^2)]

so a unit shock to the first error moves the system by the first column, [σx; σyρ], and a unit shock to the second by the other column, [0; σy√(1 − ρ^2)]
SLIDE 39

Impulse Responses

Federal Funds ordered first
Response to a Federal Funds shock, Cholesky factorization

[Figure: impulse responses of the Inflation Rate, Unemployment Rate, and Federal Funds rate to a Federal Funds shock, horizons 4 to 16]

SLIDE 40

Cointegration

Cointegration is the VAR version of unit roots
Establishes long-run relationships between two unit-root variables
◮ Consumption has a unit root, income has a unit root
◮ Consumption − Income: ????

Definition (Integrated of Order 1)

A variable yt is integrated of order 1 (I(1)) if yt is non-stationary and ∆yt = yt − yt−1 is stationary.

SLIDE 41

Cointegration

Definition (Bivariate Cointegration)

xt and yt are cointegrated if both are I(1) and there exists a vector β with both elements non-zero such that β1xt − β2yt ∼ I(0)

Strong link between xt and yt
Both are random walks but the difference is mean reverting
Mean reversion to the trend (stochastic trend)

SLIDE 42

What does cointegration look like?

yt = Φij yt−1 + ǫt

Φ11 = [.8 .2; .2 .8],  λi = 1, 0.6
Φ12 = [1 0; 0 1],  λi = 1, 1
Φ21 = [.7 .2; .2 .7],  λi = 0.9, 0.5
Φ22 = [−.3 .3; .1 −.2],  λi = −0.43, −0.06

SLIDE 43

Persistence, Anti-persistence and Cointegration

[Figure: simulated paths, 100 observations each: Cointegration (Φ11), Independent Unit Roots (Φ12), Persistent Stationary (Φ21), Anti-persistent Stationary (Φ22)]

SLIDE 44

How do we know when a VAR is cointegrated?

An eigenvalue condition determines whether a VAR(1) is cointegrated

[yt; xt] = [.8 .2; .2 .8] [yt−1; xt−1] + [ǫ1,t; ǫ2,t]

Cointegrated if exactly 1 eigenvalue is unity.
If all less than 1: ?
If both 1: two independent unit roots

Φ11 = [.8 .2; .2 .8],  λi = 1, 0.6
Φ12 = [1 0; 0 1],  λi = 1, 1
Φ21 = [.7 .2; .2 .7],  λi = 0.9, 0.5
Φ22 = [−.3 .3; .1 −.2],  λi = −0.43, −0.06

SLIDE 45

Error Correction Models

Major point of cointegration
◮ Cointegrated ⇔ Error correction model
What is an error correction model?
◮ Cointegrated VAR:

[yt; xt] = [.8 .2; .2 .8] [yt−1; xt−1] + [ǫ1,t; ǫ2,t]

◮ Error correction model:

[∆yt; ∆xt] = [−.2 .2; .2 −.2] [yt−1; xt−1] + [ǫ1,t; ǫ2,t]

◮ Normalized form:

[∆yt; ∆xt] = [−.2; .2] [1 −1] [yt−1; xt−1] + [ǫ1,t; ǫ2,t]

[1 −1] is the cointegrating vector
[−.2 .2]′ measures the speed of adjustment

SLIDE 46

From VAR to VECM

[yt; xt] = [.8 .2; .2 .8] [yt−1; xt−1] + [ǫ1,t; ǫ2,t]

Subtracting [yt−1 xt−1]′ from both sides,

[yt; xt] − [yt−1; xt−1] = [.8 .2; .2 .8] [yt−1; xt−1] − [yt−1; xt−1] + [ǫ1,t; ǫ2,t]
[∆yt; ∆xt] = ([.8 .2; .2 .8] − [1 0; 0 1]) [yt−1; xt−1] + [ǫ1,t; ǫ2,t]
[∆yt; ∆xt] = [−.2 .2; .2 −.2] [yt−1; xt−1] + [ǫ1,t; ǫ2,t]
[∆yt; ∆xt] = [−.2; .2] [1 −1] [yt−1; xt−1] + [ǫ1,t; ǫ2,t]
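The algebra above checks out numerically: the VECM impact matrix π = Φ1 − I has rank one and factors as αβ′ with the cointegrating vector [1, −1]:

```python
import numpy as np

# VAR(1) matrix from the slide
Phi1 = np.array([[0.8, 0.2],
                 [0.2, 0.8]])

# VECM impact matrix: pi = Phi1 - I
pi = Phi1 - np.eye(2)

# Decompose pi = alpha @ beta' with beta = [1, -1] the cointegrating vector
alpha = np.array([[-0.2], [0.2]])
beta = np.array([[1.0], [-1.0]])
```

The rank-one factorization and the unit eigenvalue of Φ1 together confirm that this system has exactly one cointegrating relationship.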
SLIDE 47

Cointegrating vectors

[∆yt; ∆xt] = [−.2 .2; .2 −.2] [yt−1; xt−1] + [ǫ1,t; ǫ2,t]
[∆yt; ∆xt] = [−.2; .2] [1 −1] [yt−1; xt−1] + [ǫ1,t; ǫ2,t]

The cointegrating relationship can always be decomposed

∆yt = π yt−1 + ǫt,  π = αβ′

α measures the speed of convergence
β contains the cointegrating vectors
The number of cointegrating vectors is rank(αβ′)

αβ′ = [0.3 0.2 −0.36; 0.2 0.5 −0.35; −0.3 −0.3 0.39]

How many?

SLIDE 48

Determining the cointegrating vectors

∆yt = π yt−1 + ǫt,  π = [0.3 0.2 −0.36; 0.2 0.5 −0.35; −0.3 −0.3 0.39]

Put π in row echelon form

Row Echelon Form = [1 0 −1; 0 1 −0.3; 0 0 0]

Recall π = αβ′

β = [1 0; 0 1; −1 −0.3],  α = [.3 .2; .2 .5; −.3 −.3]

SLIDE 49

Solving for the cointegrating vectors

αβ′ = [0.3 0.2 −0.36; 0.2 0.5 −0.35; −0.3 −0.3 0.39],  Row Echelon Form ⇒ [1 0 −1; 0 1 −0.3; 0 0 0]

β = [1 0; 0 1; β1 β2] and α has 6 unknown parameters. αβ′ can be combined to produce

π = [α11 α12 α11β1 + α12β2; α21 α22 α21β1 + α22β2; α31 α32 α31β1 + α32β2]

SLIDE 50

Testing for Cointegration

Two tests for cointegration
◮ Engle-Granger
◮ Johansen
We will focus on Engle-Granger
◮ Simple and intuitive
◮ Only applicable with 1 cointegrating relationship
Tests the key property of cointegration: the difference is I(0)
Most of the work is a simple OLS

yt = δ0 + βxt + ǫt

The rest of the work is testing ǫ̂t for a unit root

Johansen tests the eigenvalues of π = αβ′ directly.

SLIDE 51

Engle-Granger Procedure

Algorithm (Engle-Granger Test)

1. Begin by analyzing xt and yt in isolation. Both must have unit roots to consider cointegration.
2. Estimate the long-run relationship

yt = δ0 + βxt + ǫt

and test H0 : γ = 0 against H1 : γ < 0 in the ADF regression

∆ǫ̂t = γǫ̂t−1 + δ1∆ǫ̂t−1 + . . . + δP ∆ǫ̂t−P + ηt.

3. Using the estimated parameters, specify and estimate the error correction form of the relationship,

[∆xt; ∆yt] = [π01 + α1ǫ̂t−1; π02 + α2ǫ̂t−1] + π1 [∆xt−1; ∆yt−1] + . . . + πP [∆xt−P; ∆yt−P] + [η1,t; η2,t]

4. Assess the model
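Steps 1-2 can be sketched on simulated data. This numpy-only sketch skips the unit-root pre-tests and uses the residual ADF coefficient γ only as a rough mean-reversion check; a proper test needs Engle-Granger critical values (as provided by, e.g., a library implementation). All data below are simulated:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a cointegrated pair: x is a random walk, y = 0.5 + 2 x + stationary error
T = 1000
x = np.cumsum(rng.standard_normal(T))
y = 0.5 + 2.0 * x + rng.standard_normal(T)

# Step 2: OLS of y on a constant and x (the long-run relationship)
X = np.column_stack([np.ones(T), x])
delta0, beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - delta0 - beta * x

# ADF-style regression with no lags: delta e_t = gamma e_{t-1} + eta_t;
# gamma well below 0 signals mean reversion in the residual
gamma = np.linalg.lstsq(resid[:-1, None], np.diff(resid), rcond=None)[0][0]
```

Because the cointegrating regression is superconsistent, β̂ is very close to 2 even though x is nonstationary, and γ is strongly negative since the residual here is nearly white noise.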

SLIDE 52

Engle-Granger Considerations

Deterministic terms
◮ No deterministic terms: only in special circumstances

yt = βxt + ǫt

◮ Constant: standard case

yt = δ0 + βxt + ǫt

◮ Time trend and constant: allows different growth rates/time trends in the variables

yt = δ0 + δ1t + βxt + ǫt

Critical values
◮ Critical values depend on the deterministics in the CI regression
⊲ Models with more deterministics have lower (more negative) critical values
◮ Critical values depend on the number of RHS I(1) variables
⊲ Larger models have lower critical values

SLIDE 53

Example: cay

Consumption-Aggregate Wealth has been an interesting cointegrated series in the recent finance literature
Has revived the CCAPM
Three components:
◮ Consumption (c)
◮ Asset Wealth (a)
◮ Labor Income (Human Wealth) (y)
Deviation from the long run is related to expected returns
Cointegrating relationship: ct + .643 − 0.249at − 0.785yt

Unit Root Tests

Series   T-stat   P-val   ADF Lags
c        −1.198   0.674   5
a        −0.205   0.938   3
y        −2.302   0.171   –
ǫ̂c_t    −2.706   0.383   1
ǫ̂a_t    −2.573   0.455   –
ǫ̂y_t    −2.679   0.398   1

SLIDE 54

cay Cointegration Analysis

[Figure: original series in logs, 1960-2010: Consumption, Asset Prices, Labor Income]

[Figure: cointegrating errors, 1960-2010: c residual, a residual (neg.), y residual (neg.)]

SLIDE 55

Vector Error Correction Model

VECM estimated using the residuals from the cointegrating regression (p-values in parentheses):

[∆ct; ∆at; ∆yt] = [0.003 (0.000); 0.004 (0.014); 0.003 (0.000)] + [−0.000 (0.281); 0.002 (0.037); 0.000 (0.515)] ǫ̂t−1
  + [0.192 (0.005)  0.102 (0.000)  0.147 (0.004); 0.282 (0.116)  0.220 (0.006)  −0.149 (0.414); 0.369 (0.000)  0.061 (0.088)  −0.139 (0.140)] [∆ct−1; ∆at−1; ∆yt−1] + ηt

Estimation of the cointegrating relationship has no effect on the standard errors
◮ It converges fast (at rate T)
◮ VECM parameters converge at rate √T

SLIDE 56

Spurious Regression and Balance

Caution is needed when working with I(1) data
◮ I(0) on I(0): the usual case. Standard asymptotic arguments apply.
◮ I(1) on I(0): this regression is unbalanced.
◮ I(1) on I(1): cointegration or spurious regression.
◮ I(0) on I(1): this regression is unbalanced.
Spurious regression can lead to large t-stats even when the series are independent.
◮ Two unrelated I(1) processes, xt and yt:

xt = xt−1 + ǫt
yt = yt−1 + ηt

◮ When T = 50, approximately 80% of t-stats are significant
◮ Always check for I(1) when using time-series data
◮ If both are I(1), make sure they are cointegrated.

SLIDE 57

Spurious Regression

[Figure: histograms of spurious-regression t-stats for T = 50 and T = 200, with ±1.96 bounds marked]

SLIDE 58

Cross-section Regression with Time Series Data

It is common to run regressions using time-series data

yt = xtβ + ǫt

Using time-series data in a cross-sectional regression may require modification to inference

Modification is needed if the scores {xtǫt} are autocorrelated

β̂ − β = (T^{−1} Σ_{t=1}^{T} xt x′t)^{−1} (T^{−1} Σ_{t=1}^{T} xt ǫt)
⇒ V[β̂ − β] = Σ_XX^{−1} V[T^{−1} Σ_{t=1}^{T} xt ǫt] Σ_XX^{−1}

◮ Usually occurs when the errors ǫt are autocorrelated due to mis- or under-specification of the model

SLIDE 59

Why the difference?

Consider the estimation of the mean when yt has white noise errors

yt = µ + ǫt

The sample mean is obviously unbiased:

E[µ̂] = E[T^{−1} Σ_{t=1}^{T} yt] = T^{−1} Σ_{t=1}^{T} E[yt] = µ

SLIDE 60

Why the difference?

The variance of the sample mean:

V[µ̂] = E[(T^{−1} Σ_{t=1}^{T} yt − µ)^2]
     = E[T^{−2} (Σ_{t=1}^{T} ǫt^2 + Σ_{r=1}^{T} Σ_{s=1, s≠r}^{T} ǫr ǫs)]
     = T^{−2} Σ_{t=1}^{T} E[ǫt^2] + T^{−2} Σ_{r=1}^{T} Σ_{s=1, s≠r}^{T} E[ǫr ǫs]
     = T^{−2} Σ_{t=1}^{T} σ^2 + 0
     = σ^2 / T

Because the errors are white noise, E[ǫi ǫj] = 0 whenever i ≠ j. This is the usual result.

SLIDE 61

The case of an MA(1) error

Now suppose that the error follows an MA(1)

ηt = θǫt−1 + ǫt

where {ǫt} is a white noise process

The error has mean 0, so the sample mean is still unbiased
The variance of the sample mean is different since ηt is autocorrelated
◮ E[ηt ηt−1] ≠ 0.

V[µ̂] = E[(T^{−1} Σ_{t=1}^{T} ηt)^2]
     = E[T^{−2} (Σ_{t=1}^{T} ηt^2 + 2 Σ_{t=1}^{T−1} ηt ηt+1 + 2 Σ_{t=1}^{T−2} ηt ηt+2 + . . . + 2 Σ_{t=1}^{2} ηt ηt+T−2 + 2 Σ_{t=1}^{1} ηt ηt+T−1)]
slide-62
SLIDE 62

The case of an MA(1) error

In terms of autocovariances,

V[ˆ µ] = T −2

T

  • t=1

E[η2

t ] + 2T −2 T −1

  • t=1

E[ηtηt+1] + 2T −2

T −2

  • t=1

E[ηtηt+2] + . . . + 2T −2

2

  • t=1

E[ηtηt+T −2] + 2T −2

1

  • t=1

E[ηtηt+T −1] = T −2

T

  • t=1

γ0 + 2T −2

T −1

  • t=1

γ1 + 2T −2

T −2

  • t=1

γ2 + . . . + 2T −2

1

  • t=1

γT −1

γ0 = V[ηt]=

  • 1 + θ2

V [ǫt] and γs = E[ηtηt−s]

An MA(1) has 1 non-zero autocovariance,

γ1 = E[ηtηt−1] = E[(θǫt−1 + ǫt)(θǫt−2 + ǫt−1)] = θ2E[ǫt−1ǫt−2] + θE[ǫ2

t−1] + θE[ǫtǫt−2] + E[ǫtǫt−1]

= θσ2

62 / 67

SLIDE 63

The case of an MA(1) error

Putting it all together,

V[µ̂] = T^{−2} Σ_{t=1}^{T} γ0 + 2T^{−2} Σ_{t=1}^{T−1} γ1
     = T^{−2} T γ0 + 2T^{−2} (T − 1) γ1
     ≈ (γ0 + 2γ1)/T
     = σ^2 (1 + θ^2 + 2θ)/T

Can be larger or smaller than σ^2/T (smaller when −2 < θ < 0)
The variance of the sum is the sum of the variances only when the errors are uncorrelated
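The approximation σ²(1 + θ² + 2θ)/T is easy to verify by Monte Carlo (parameters below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# MA(1) errors: eta_t = theta * eps_{t-1} + eps_t, with sigma^2 = 1
theta, T, n_sims = 0.5, 100, 20000
eps = rng.standard_normal((n_sims, T + 1))
eta = theta * eps[:, :-1] + eps[:, 1:]

# Monte Carlo variance of the sample mean vs the slide's approximation
mc_var = eta.mean(axis=1).var()
theory = (1 + theta**2 + 2 * theta) / T  # = (1 + theta)^2 / T
```

With θ = 0.5 and T = 100 the theoretical value is 0.0225, and the simulated variance lands within a few percent of it.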

SLIDE 64

Estimating the parameter covariance (from the cross-section lecture)

When the scores are uncorrelated (a martingale difference sequence, MDS), White's covariance estimator is consistent

Theorem (Consistency of Asymptotic Covariance Estimator)

Under the large sample assumptions,

Σ̂_XX = T^{−1} X′X →p Σ_XX
Ŝ = T^{−1} Σ_{t=1}^{T} ǫ̂t^2 x′t xt →p S

and

Σ̂_XX^{−1} Ŝ Σ̂_XX^{−1} →p Σ_XX^{−1} S Σ_XX^{−1}

SLIDE 65

Modification to regression parameter covariance

White's estimator is only heteroskedasticity robust – not heteroskedasticity and autocorrelation robust

Ŝ = T^{−1} Σ_{t=1}^{T} ǫ̂t^2 x′t xt →p S

The solution is to use a Newey-West covariance for the scores (xt ǫt)

Definition (Newey-West Covariance Estimator)

Let zt be a k by 1 vector series that may be autocorrelated and define z*t = zt − z̄ where z̄ = T^{−1} Σ_{t=1}^{T} zt. The L-lag Newey-West covariance estimator for the variance of z̄ is

Σ̂_NW = Γ̂0 + Σ_{l=1}^{L} wl (Γ̂l + Γ̂′l)

where Γ̂l = T^{−1} Σ_{t=l+1}^{T} z*t z*′t−l and wl = 1 − l/(L + 1).
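The definition translates almost line-for-line into code. A minimal sketch with Bartlett weights, applied to simulated data:

```python
import numpy as np

def newey_west(z, L):
    """L-lag Newey-West long-run covariance for the rows of z (T x k),
    following the slide's definition with weights w_l = 1 - l/(L+1)."""
    T = z.shape[0]
    zs = z - z.mean(axis=0)
    sigma = zs.T @ zs / T                    # Gamma_0
    for l in range(1, L + 1):
        w = 1 - l / (L + 1)
        gamma_l = zs[l:].T @ zs[:-l] / T     # Gamma_l
        sigma = sigma + w * (gamma_l + gamma_l.T)
    return sigma

rng = np.random.default_rng(3)
z = rng.standard_normal((500, 2))
S0 = newey_west(z, 0)   # with L = 0 this is just the sample covariance
S3 = newey_west(z, 3)
```

By construction the estimator is symmetric, and the Bartlett weights guarantee it is positive semi-definite.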

SLIDE 66

Modification to regression parameter covariance

Applied to a cross-sectional regression with time-series data,

Ŝ_NW = T^{−1} (Σ_{t=1}^{T} et^2 x′t xt + Σ_{l=1}^{L} wl (Σ_{s=l+1}^{T} es es−l x′s xs−l + Σ_{q=l+1}^{T} eq−l eq x′q−l xq))
     = Γ̂0 + Σ_{l=1}^{L} wl (Γ̂l + Γ̂′l)

The HAC robust covariance of β̂ is Σ̂_XX^{−1} Ŝ_NW Σ̂_XX^{−1}

SLIDE 67

Considerations when using a Newey-West estimator

Is a Newey-West estimator needed? Complex estimators have worse finite-sample performance
It must be the case that L → ∞ as T → ∞
◮ Even if the scores follow an MA(1)!
The optimal rate is O(T^{1/3}), so L ∝ T^{1/3}, or L = cT^{1/3} for some (unknown) c
Other HAC estimators are available and may work well if the scores are very persistent
◮ Den Haan-Levin
An alternative is to include lagged regressand(s) in the regression

yt = xtβ + Σ_{p=1}^{P} φp yt−p + ǫt

◮ Not popular when the focus is on the cross-section component of the model