Weight Selection for a Model Average Estimator



slide-1
SLIDE 1

1

Weight Selection for a Model Average Estimator

Alan Wan
City University of Hong Kong

(joint work with H. Liang and G. Zou, University of Rochester)
slide-2
SLIDE 2

2

  • Model selection methods proceed as if the final model had been chosen in advance.
  • Result: under-reporting of variability, and confidence intervals that are too narrow.
  • Papers on under-reporting due to model selection:
    – Danilov and Magnus (2004, J. of Econometrics)
    – Leeb and Pötscher (2006, Annals of Stat)
    – Leeb and Pötscher (2008, Econometric Theory)

slide-3
SLIDE 3

3

  • Current paper: frequentist model averaging
  • Bayesian model averaging
    – very common
    – based on prior probabilities for potential models and priors for parameters
    – Hoeting et al. (1999, Stat Science)
  • Frequentist model averaging
    – Hjort and Claeskens (2003, JASA)
    – Yuan and Yang (2005, JASA)
    – Leung and Barron (2006, IEEE Info Th.)
    – Hansen (2007, Econometrica)

slide-4
SLIDE 4

4

  • Current paper motivated by Hansen (2007, Econometrica)
  • Hansen’s approach: weights are chosen by minimizing the Mallows criterion, which is equivalent to the squared error in large samples.
  • Model framework:

$$
\underset{(n\times 1)}{y} \;=\; \underset{(n\times P)}{H}\;\underset{(P\times 1)}{\theta} \;+\; \underset{(n\times 1)}{\varepsilon},
\qquad \varepsilon_i \sim \text{i.i.d.}\,(0, \sigma^2)
$$

slide-5
SLIDE 5

5

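Hansen’s approach:

  • Order the regressors at the outset: X1, X2, X3, …, Xp
  • Estimate a set of nested models:

$$
\begin{aligned}
y &= X_1\theta_1 + \varepsilon; \\
y &= X_1\theta_1 + X_2\theta_2 + \varepsilon; \\
  &\;\;\vdots \\
y &= X_1\theta_1 + X_2\theta_2 + \cdots + X_p\theta_p + \varepsilon
\end{aligned}
$$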

slide-6
SLIDE 6

6

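  • Let $H_p$ be the n × p (p ≤ P) matrix comprising the first p columns of H, and let $\omega_p$ be the weight attached to the p-th nested model.
  • Hansen’s (MMA) estimator, with each nested estimate padded by zeros to length P:

$$
\hat{\Theta}_m = \sum_{p=1}^{P} \omega_p
\begin{pmatrix} (H_p' H_p)^{-1} H_p'\, y \\ 0 \end{pmatrix}
$$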

slide-7
SLIDE 7

7

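  • Mallows criterion:

$$
C(\omega) = (y - H\hat{\theta})'(y - H\hat{\theta}) + 2\sigma^2 k(\omega),
$$

where $\omega = (\omega_1, \omega_2, \dots, \omega_P)'$, $\hat{\theta}$ is the weighted average estimate under $\omega$, and $k(\omega)$ is the effective number of parameters.

  • Selected weights:

$$
\hat{\omega} = \arg\min_{\omega}\, C(\omega)
$$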

slide-8
SLIDE 8

8

Difficulties with Hansen’s (2007) approach:
1) Explicit ordering of regressors
2) Estimation of nested models only (cannot handle, for example, combining (X1, X4, X8) and (X1, X5, X7))
3) Criterion based on asymptotic justification

$$
\begin{aligned}
y &= X_1\theta_1 + \varepsilon; \\
y &= X_1\theta_1 + X_2\theta_2 + \varepsilon; \\
  &\;\;\vdots \\
y &= X_1\theta_1 + X_2\theta_2 + \cdots + X_p\theta_p + \varepsilon
\end{aligned}
$$

slide-9
SLIDE 9

9

slide-10
SLIDE 10

10

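Alternative approach:
X : focus (required) regressors
Z : auxiliary regressors
Framework follows Magnus and Durbin (1999, Econometrica):

$$
\underset{(n\times 1)}{y} \;=\; \underset{(n\times k)}{X}\;\underset{(k\times 1)}{\beta} \;+\; \underset{(n\times m)}{Z}\;\underset{(m\times 1)}{\gamma} \;+\; \underset{(n\times 1)}{\varepsilon}
$$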

slide-11
SLIDE 11

11

Choice of weights

  • When m = 1, Magnus (2002, Econometrics Journal) and Danilov (2005, Econometrics Journal) considered weights based on the Laplace prior.
  • Our approach: select weights based on the MSE of the weighted average estimator.

slide-12
SLIDE 12

12

  • With m auxiliary regressors in Z, there are $2^m$ models.
  • Unrestricted estimators:

$$
b_u = b_r - Q\hat{\theta}; \qquad \hat{\gamma}_u = D^2 Z' M y
$$

  • Fully restricted estimators:

$$
b_r = (X'X)^{-1} X' y; \qquad \hat{\gamma}_r = 0
$$

where

$$
Q = (X'X)^{-1} X' Z D, \quad D = (Z' M Z)^{-1/2}, \quad M = I_n - X (X'X)^{-1} X',
$$

$$
\text{and}\quad \hat{\theta} = D Z' M y \sim N(\theta, \sigma^2 I_m), \quad \text{with}\quad \theta = D^{-1}\gamma .
$$
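
The restricted and unrestricted estimators can be illustrated in the simplest setting. The sketch below assumes a single focus regressor and a single auxiliary regressor (k = m = 1), so that every matrix in the definitions of M, D, Q and θ̂ collapses to a scalar or a vector; the helper names and the data are made up for illustration and are not from the slides.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def restricted_and_unrestricted(x, z, y):
    """Scalar-case (k = m = 1) versions of b_r, b_u and gamma_u."""
    xx = dot(x, x)
    b_r = dot(x, y) / xx                        # b_r = (X'X)^{-1} X'y
    # M v = v - X (X'X)^{-1} X'v: residual of v after projecting on x
    Mz = [zi - xi * dot(x, z) / xx for xi, zi in zip(x, z)]
    My = [yi - xi * b_r for xi, yi in zip(x, y)]
    D = 1.0 / math.sqrt(dot(z, Mz))             # D = (Z'MZ)^{-1/2}
    theta_hat = D * dot(z, My)                  # theta_hat = D Z'My
    Q = dot(x, z) / xx * D                      # Q = (X'X)^{-1} X'Z D
    b_u = b_r - Q * theta_hat                   # unrestricted focus coefficient
    gamma_u = D * theta_hat                     # equals D^2 Z'My
    return b_r, b_u, gamma_u

# Made-up data: x is an intercept column, z a single auxiliary regressor.
x = [1.0] * 6
z = [0.5, 1.5, 2.0, 3.5, 4.0, 5.5]
y = [1.1, 2.0, 2.4, 4.1, 4.3, 6.0]
b_r, b_u, gamma_u = restricted_and_unrestricted(x, z, y)
```

With an intercept as the focus regressor, b_r is the sample mean of y, while b_u and gamma_u coincide with the intercept and slope of the ordinary least-squares line of y on z, which is a convenient sanity check on the formulas.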

slide-13
SLIDE 13

13

The partially restricted estimators, for $i = 1, \dots, 2^m$:

$$
b_{(i)} = b_r - Q W_i \hat{\theta}; \qquad \hat{\gamma}_{(i)} = D W_i \hat{\theta},
$$

where

$$
W_i = I_m - P_i, \qquad P_i = D S_i (S_i' D^2 S_i)^{-1} S_i' D,
$$

and $S_i$ is a selection matrix of rank $r_i$.

slide-14
SLIDE 14

14

  • Traditional model selection chooses the “best” among the $2^m$ models.
  • Frequentist Model Average estimators:

$$
b_f = \sum_{i=1}^{2^m} \lambda_i\, b_{(i)}, \qquad \hat{\gamma}_f = \sum_{i=1}^{2^m} \lambda_i\, \hat{\gamma}_{(i)},
$$

where $\lambda_i \ge 0$ and $\sum_{i=1}^{2^m} \lambda_i = 1$.

  • Consider weights $\lambda_i = \lambda_i(\hat{\theta}, \hat{\sigma}^2)$.
  • Write $W = \sum_{i=1}^{2^m} \lambda_i W_i$.
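
The averaging step itself is a convex combination of the per-model estimates. A minimal sketch, assuming the per-model estimates are already available as plain lists; the function names and the numbers are hypothetical and serve only to show the constraints on the weights (non-negative, summing to one) and the size of the model set.

```python
from itertools import product

def enumerate_models(m):
    """All 2^m inclusion patterns for m auxiliary regressors (1 = included)."""
    return list(product((0, 1), repeat=m))

def model_average(estimates, weights, tol=1e-12):
    """Elementwise convex combination: sum_i lambda_i * estimate_i."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to one")
    length = len(estimates[0])
    return [sum(w * est[j] for w, est in zip(weights, estimates))
            for j in range(length)]

models = enumerate_models(2)          # 4 candidate models when m = 2
# Made-up per-model estimates of two focus coefficients.
b_i = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [1.1, 1.9]]
lam = [0.1, 0.2, 0.3, 0.4]
b_f = model_average(b_i, lam)
```

Traditional model selection corresponds to the special case in which one weight equals one and all others are zero.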

slide-15
SLIDE 15

15

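Theorem 3.1

$$
\widehat{\mathrm{MSE}}(b_f) = \hat{\sigma}^2 (X'X)^{-1} - \hat{\sigma}^2 Q Q' + Q (I_m - W)\hat{\theta}\hat{\theta}'(I_m - W)' Q' + \Psi(\hat{\theta}, \hat{\sigma}^2) + \Psi'(\hat{\theta}, \hat{\sigma}^2),
$$

where

$$
\Psi(\hat{\theta}, \hat{\sigma}^2) = \frac{n-k-m}{2}\,(\hat{\sigma}^2)^{-(n-k-m)/2 + 1} \int_0^{\hat{\sigma}^2} t^{(n-k-m)/2 - 1}\, \Psi_1(\hat{\theta}, t)\, dt
$$

and

$$
\Psi_1(\hat{\theta}, t) = Q \left\{ W + \sum_{i=1}^{2^m} \frac{\partial \lambda_i(\hat{\theta}, t)}{\partial \hat{\theta}}\, \hat{\theta}' W_i \right\} Q'.
$$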

slide-16
SLIDE 16

16

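One problem with minimizing $\hat{R}(b_f)$ is that $\Psi(\hat{\theta}, \hat{\sigma}^2)$ is complex:

$$
\hat{R}(b_f) = \hat{\sigma}^2\, \mathrm{tr}\{(X'X)^{-1}\} - \hat{\sigma}^2\, \mathrm{tr}(Q'Q) + \hat{\theta}'(I_m - W)' Q'Q (I_m - W)\hat{\theta} + 2\, \mathrm{tr}\{\Psi(\hat{\theta}, \hat{\sigma}^2)\},
$$

$$
\Psi(\hat{\theta}, \hat{\sigma}^2) = \frac{n-k-m}{2}\,(\hat{\sigma}^2)^{-(n-k-m)/2 + 1} \int_0^{\hat{\sigma}^2} t^{(n-k-m)/2 - 1}\, \Psi_1(\hat{\theta}, t)\, dt .
$$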

slide-17
SLIDE 17

17

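Solution: Replace $\Psi(\hat{\theta}, \hat{\sigma}^2)$ by $\hat{\sigma}^2\, \Psi_1(\hat{\theta}, \hat{\sigma}^2)$, since

$$
E\{\Psi(\hat{\theta}, \hat{\sigma}^2)\} = E\{\sigma^2\, \Psi_1(\hat{\theta}, \hat{\sigma}^2)\},
$$

where

$$
\Psi_1(\hat{\theta}, t) = Q \left\{ W + \sum_{i=1}^{2^m} \frac{\partial \lambda_i(\hat{\theta}, t)}{\partial \hat{\theta}}\, \hat{\theta}' W_i \right\} Q'.
$$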
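
The expectation identity that motivates this replacement can be checked in closed form in a scalar special case. The sketch below rests on assumptions that are not stated on the slides: the variance estimate is taken to be distributed as σ²χ²_ν/ν with ν = n − k − m, and Ψ₁ is replaced by the hypothetical test function Ψ₁(t) = t^p, for which the integral transform reduces to (ν/2)/(ν/2 + p) · s^(p+1).

```python
import math

def moment(q, nu, sigma2):
    """E[s^q] for s = sigma2 * chi2_nu / nu (requires nu/2 + q > 0)."""
    log_ratio = math.lgamma(nu / 2 + q) - math.lgamma(nu / 2)
    return (2.0 * sigma2 / nu) ** q * math.exp(log_ratio)

def lhs(p, nu, sigma2):
    """E[Psi(s)] when Psi_1(t) = t^p, using Psi(s) = (nu/2)/(nu/2 + p) * s^(p+1)."""
    return (nu / 2) / (nu / 2 + p) * moment(p + 1, nu, sigma2)

def rhs(p, nu, sigma2):
    """E[sigma^2 * Psi_1(s)] = sigma^2 * E[s^p], with the true sigma^2 in front."""
    return sigma2 * moment(p, nu, sigma2)

# Compare both sides for a few exponents; nu and sigma2 are arbitrary choices.
checks = [(p, lhs(p, 12, 1.7), rhs(p, 12, 1.7)) for p in (0, 1, 2.5)]
```

In this special case the two sides agree for every admissible p, which is why the integral can be traded for a plug-in term in which the unknown σ² is then estimated by its sample counterpart.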

slide-18
SLIDE 18

18

So, we have

$$
\hat{R}_a(b_f) = \hat{\sigma}^2\, \mathrm{tr}\{(X'X)^{-1}\} - \hat{\sigma}^2\, \mathrm{tr}(Q'Q) + \hat{\theta}'(I_m - W)' Q'Q (I_m - W)\hat{\theta} + 2\hat{\sigma}^2\, \mathrm{tr}\{\Psi_1(\hat{\theta}, \hat{\sigma}^2)\}.
$$

Write

$$
\lambda_i(\hat{\theta}, \hat{\sigma}^2) = \frac{a_i\, (\hat{\sigma}_i^2)^c}{\sum_{j=1}^{2^m} a_j\, (\hat{\sigma}_j^2)^c},
$$

where the $a_i$'s are positive constants and c is a non-positive constant.

S-AIC (Buckland et al. (1997, Biometrics)): $a_i = \exp\{-(q_i + 1)\}$; $c = -n/2$
S-BIC: $a_i = n^{-(q_i + 1)/2}$; $c = -n/2$
S-AICC (Hurvich and Tsai (1989, Biometrika)): $a_i = \exp\{-n(q_i + 1)/(n - q_i - 2)\}$; $c = -n/2$
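
The three smoothed weighting schemes differ only in the constants a_i, so they can share one routine. A minimal sketch, assuming that q_i denotes the number of regression coefficients in model i and that σ̂_i² is that model's error variance estimate (neither is spelled out on the slide); the function names and the input numbers are made up. Working on the log scale avoids overflow when (σ̂_i²)^(−n/2) is large.

```python
import math

def log_a(kind, q, n):
    """log a_i for the three smoothed criteria (q = assumed model size)."""
    if kind == "S-AIC":
        return -(q + 1)
    if kind == "S-BIC":
        return -(q + 1) / 2 * math.log(n)
    if kind == "S-AICC":
        return -n * (q + 1) / (n - q - 2)
    raise ValueError(f"unknown criterion: {kind}")

def smoothed_weights(kind, sigma2, q, n):
    """lambda_i = a_i (sigma2_i)^c / sum_j a_j (sigma2_j)^c with c = -n/2."""
    c = -n / 2
    logs = [log_a(kind, qi, n) + c * math.log(s) for s, qi in zip(sigma2, q)]
    top = max(logs)                       # subtract the maximum for stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Made-up variance estimates and model sizes for four candidate models.
sigma2 = [1.30, 1.10, 1.12, 1.05]
q = [4, 5, 5, 6]
lam = smoothed_weights("S-AIC", sigma2, q, n=50)
```

The same call with "S-BIC" or "S-AICC" changes only the penalty term, so the resulting weights remain non-negative and sum to one.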

slide-19
SLIDE 19

19

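Recall that

$$
\Psi_1(\hat{\theta}, \hat{\sigma}^2) = Q \left\{ W + \sum_{i=1}^{2^m} \frac{\partial \lambda_i(\hat{\theta}, \hat{\sigma}^2)}{\partial \hat{\theta}}\, \hat{\theta}' W_i \right\} Q'.
$$

Now,

$$
\frac{\partial \lambda_i(\hat{\theta}, \hat{\sigma}^2)}{\partial \hat{\theta}} = \frac{2c}{n}\, \lambda_i(\hat{\theta}, \hat{\sigma}^2) \left\{ (\hat{\sigma}_i^2)^{-1} (I_m - W_i) - \sum_{j=1}^{2^m} \lambda_j(\hat{\theta}, \hat{\sigma}^2)\, (\hat{\sigma}_j^2)^{-1} (I_m - W_j) \right\} \hat{\theta}. \qquad (*)
$$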

slide-20
SLIDE 20

20

Putting (*) in $\Psi_1(\hat{\theta}, \hat{\sigma}^2)$, we have

$$
\hat{R}_a(b_f) = \hat{\sigma}^2\, \mathrm{tr}\{(X'X)^{-1}\} - \hat{\sigma}^2\, \mathrm{tr}(Q'Q) + \lambda' L \lambda - \frac{4c}{n}\,\hat{\sigma}^2\, \lambda' G \lambda + 2\hat{\sigma}^2\, \lambda' \phi + \frac{4c}{n}\,\hat{\sigma}^2\, \lambda' g,
$$

where $L = (l_{ij})$, $G = (g_{ij})$,

$$
l_{ij} = \hat{\theta}'(I_m - W_i)\, Q'Q\, (I_m - W_j)\hat{\theta}, \qquad
g_{ij} = (\hat{\sigma}_j^2)^{-1}\, \hat{\theta}' W_i\, Q'Q\, (I_m - W_j)\hat{\theta}, \qquad i, j = 1, \dots, 2^m,
$$

and g and φ are each $2^m \times 1$ vectors, with g consisting of the diagonal elements of G and the i-th element of φ being $\mathrm{tr}(Q W_i Q')$, $i = 1, \dots, 2^m$.

slide-21
SLIDE 21

21

Interesting special case

Setting c = 0 and considering only mixing $b_u$ and $b_r$, the minimization criterion leads to

$$
b_{js} = \left\{ 1 - \frac{\hat{\sigma}^2\, \mathrm{tr}(Q'Q)}{\| b_u - b_r \|^2} \right\} b_u + \frac{\hat{\sigma}^2\, \mathrm{tr}(Q'Q)}{\| b_u - b_r \|^2}\, b_r ,
$$

i.e. the James and Stein estimator!
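
The James and Stein weight can be recovered numerically. The sketch below assumes, as my own reduction rather than a statement from the slides, that with c = 0 and only b_u (weight λ) and b_r (weight 1 − λ) in the mix, the λ-dependent part of the criterion is (1 − λ)²·A + 2λ·B, where A = ‖b_u − b_r‖² and B = σ̂²·tr(Q'Q). The values of A and B are made up.

```python
def criterion(lam, A, B):
    """Lambda-dependent part of the criterion: (1 - lam)^2 * A + 2 * lam * B."""
    return (1.0 - lam) ** 2 * A + 2.0 * lam * B

def js_weight(A, B):
    """Closed-form minimiser: weight 1 - B / A on the unrestricted estimator."""
    return 1.0 - B / A

A, B = 4.0, 1.0     # made-up ||b_u - b_r||^2 and sigma2_hat * tr(Q'Q)
lam_star = js_weight(A, B)

# Brute-force check of the closed form on a grid over [0, 1].
grid = [i / 1000 for i in range(1001)]
lam_grid = min(grid, key=lambda l: criterion(l, A, B))
```

The closed form does not truncate the weight at zero, so when B exceeds A a positive-part version would be needed to keep the weight in [0, 1].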

slide-22
SLIDE 22

22

Optimal predictor

Let

$$
\hat{\mu}_f = H \hat{\theta}_f ,
$$

where $\hat{\theta}_f = (b_f', \hat{\gamma}_f')'$ and

$$
\hat{\gamma}_f = \sum_{i=1}^{2^m} \lambda_i(\hat{\theta}, \hat{\sigma}^2)\, \hat{\gamma}_{(i)}
$$

is the estimator of γ corresponding to $b_f$.

slide-23
SLIDE 23

23

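$$
\widehat{\mathrm{MSE}}(\hat{\mu}_f) = \hat{\sigma}^2 X (X'X)^{-1} X' + \varphi(XQ, XQ, \hat{\theta}, \hat{\sigma}^2) - \varphi(XQ, ZD, \hat{\theta}, \hat{\sigma}^2) - \varphi(ZD, XQ, \hat{\theta}, \hat{\sigma}^2) + \varphi(ZD, ZD, \hat{\theta}, \hat{\sigma}^2),
$$

where

$$
\varphi(C_1, C_2, \hat{\theta}, \hat{\sigma}^2) = -\hat{\sigma}^2 C_1 C_2' + C_1 (I_m - W)\hat{\theta}\hat{\theta}'(I_m - W)' C_2' + C_1\, \Xi(\hat{\theta}, \hat{\sigma}^2)\, C_2' + C_1 \{\Xi(\hat{\theta}, \hat{\sigma}^2)\}' C_2',
$$

$$
\Xi(\hat{\theta}, \hat{\sigma}^2) = \frac{n-k-m}{2}\,(\hat{\sigma}^2)^{-(n-k-m)/2 + 1} \int_0^{\hat{\sigma}^2} t^{(n-k-m)/2 - 1}\, \Xi_1(\hat{\theta}, t)\, dt
$$

and

$$
\Xi_1(\hat{\theta}, t) = W + \sum_{i=1}^{2^m} \frac{\partial \lambda_i(\hat{\theta}, t)}{\partial \hat{\theta}}\, \hat{\theta}' W_i .
$$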

slide-24
SLIDE 24

24

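The trace of $\widehat{\mathrm{MSE}}(\hat{\mu}_f)$ is

$$
\hat{R}(\hat{\mu}_f) = k\hat{\sigma}^2 + \mathrm{tr}\{\varphi(XQ, XQ, \hat{\theta}, \hat{\sigma}^2)\} - 2\, \mathrm{tr}\{\varphi(ZD, XQ, \hat{\theta}, \hat{\sigma}^2)\} + \mathrm{tr}\{\varphi(ZD, ZD, \hat{\theta}, \hat{\sigma}^2)\},
$$

which, by analogy with the previous analysis, is approximately equal to

$$
\hat{R}_a(\hat{\mu}_f) = \hat{\sigma}^2 (k - m) + \lambda' L \lambda + 2\hat{\sigma}^2\, \lambda' \phi - \frac{4c}{n}\,\hat{\sigma}^2\, \lambda' G \lambda + \frac{4c}{n}\,\hat{\sigma}^2\, \lambda' g .
$$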

slide-25
SLIDE 25

25

Experiment 1

$$
y_i = \sum_{j=1}^{7} \theta_j x_{ji} + e_i, \qquad e_i \sim N(0, 1),
$$

$$
x_{1i} = 1,\; x_{2i} \sim N(0,1),\; x_{3i} \sim N(0,1),\; x_{4i} \sim N(4,1),\; x_{5i} \sim U(0,1),\; x_{6i} \sim N(4,1),\; x_{7i} \sim U(0,1);
$$

$$
\theta_j = c\, j^{-\alpha - 1/2}; \qquad R^2 = \frac{c^2\, \tau' v}{1 + c^2\, \tau' v}; \qquad \tau_j = j^{-2\alpha - 1};
$$

n = 30, 80, 150, 300; α = 0.5, 1.0, 1.5; v is a vector comprising the variances of the regressors.

slide-26
SLIDE 26

26

OPT (Optimal Frequentist MA) estimator

Focus regressors : x1, x2, x3, x4
Auxiliary regressors : x5, x6, x7 (i.e., 2^3 = 8 models)

MMA (Hansen’s) estimator

Order regressors x1, x2, x3, x4, x5, x6, x7 (i.e., 7 models)

slide-27
SLIDE 27

27

$$
L_1 = \left[ \Big( y - \sum_{j=1}^{7} x_j \hat{\theta}_j \Big)' \Big( y - \sum_{j=1}^{7} x_j \hat{\theta}_j \Big) \right], \qquad
L_2 = \left[ \sum_{j=1}^{4} (\hat{\theta}_j - \theta_j)^2 \right]
$$

slide-28
SLIDE 28

28

slide-29
SLIDE 29

29

slide-30
SLIDE 30

30

slide-31
SLIDE 31

31

slide-32
SLIDE 32

32

Risk (OPT) < Risk (MMA) under L1

  α      n = 30    n = 80    n = 150   n = 300
  0.5    73.99%    69.77%    53.34%    30.88%
  1.0    72.47%    80.36%    68.45%    47.25%
  1.5    61.51%    84.98%    82.98%    64.95%

slide-33
SLIDE 33

33

Risk (OPT) < Risk (MMA) under L2

  α      n      Risk (OPT) < Risk (MMA)
  0.5    30     79.96%
  1.0    80     91.13%
  1.0    150    100%
  1.5    300    98.67%

slide-34
SLIDE 34

34

Experiment 2

  • Data taken from Pesaran and Timmermann (1994, J. of Forecasting)
  • Predictability of excess returns for the S&P 500 Index
  • Same data used by Danilov and Magnus (2004, J. of Forecasting)
slide-35
SLIDE 35

35

Model and definitions:

$$
y_t = \beta_1 + \beta_2\, PI_{t-2} + \beta_3\, DI3_{t-1} + \beta_4\, SPREAD_{t-1} + \gamma_1\, YSP_{t-1} + \gamma_2\, DIP_{t-1} + \gamma_3\, PER_{t-1} + \gamma_4\, DLEAD_{t-2} + \varepsilon_t
$$

$y_t$ = excess returns,
$PI_{t-2}$ = annual inflation rate (lagged two periods),
$DI3_{t-1}$ = change in 3-month T-bill rate (lagged one period),
$SPREAD_{t-1}$ = credit spread (lagged one period),
$YSP_{t-1}$ = dividend yield on SP500 portfolio (lagged one period),
$DIP_{t-1}$ = annual change in industrial production (lagged one period),
$PER_{t-1}$ = price-earnings ratio (lagged one period),
$DLEAD_{t-2}$ = annual change in leading business cycle indicator (lagged two periods),
t = 1956 – 2001 (46 observations).

slide-36
SLIDE 36

36

Focus regressors : $PI_{t-2}$, $DI3_{t-1}$, $SPREAD_{t-1}$ and intercept
Auxiliary regressors : $YSP_{t-1}$, $DIP_{t-1}$, $PER_{t-1}$ and $DLEAD_{t-2}$

slide-37
SLIDE 37

37

OPT estimator: 2^4 = 16 models
MMA estimator: 8! = 40320 possible ordering sequences
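
The two counts follow directly from the design: four auxiliary regressors give 2^4 inclusion patterns, and eight regressors (the intercept, three focus variables and four auxiliary variables) can be ordered in 8! ways. A short check, with variable names chosen here for illustration:

```python
import math

m_aux = 4              # auxiliary regressors: YSP, DIP, PER, DLEAD
n_regressors = 8       # intercept + 3 focus + 4 auxiliary regressors

n_opt_models = 2 ** m_aux                       # candidate models for OPT
n_orderings = math.factorial(n_regressors)      # ordering sequences for MMA
```

The contrast is that OPT averages over one fixed set of 16 models, whereas the MMA result depends on which of the 40320 orderings is used to build the nested sequence.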

slide-38
SLIDE 38

38

Design of Monte Carlo Study

  • Uses OLS estimates as true parameters.
  • yt in each round of simulation is obtained by drawing 46 random disturbances with replacement from the OLS residuals.
  • A total of 100 Monte-Carlo samples are drawn.
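
The resampling scheme described above is a residual bootstrap. A minimal sketch, assuming a single regressor plus intercept to keep the least-squares step short (the study itself uses eight regressors); the data, function names and seed are made up for illustration.

```python
import random

def ols_line(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def residual_bootstrap(x, y, n_samples=100, seed=0):
    """Treat the OLS fit as the truth and rebuild y from resampled residuals."""
    rng = random.Random(seed)
    a, b = ols_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    samples = []
    for _ in range(n_samples):
        draws = rng.choices(resid, k=len(resid))   # sampling with replacement
        samples.append([fi + e for fi, e in zip(fitted, draws)])
    return (a, b), samples

# Made-up data with 46 observations, matching the sample size of the study.
x = [float(t) for t in range(46)]
y = [0.5 + 0.1 * t + ((-1) ** t) * 0.3 for t in range(46)]
truth, sims = residual_bootstrap(x, y)
```

Each simulated series can then be passed to the competing estimators, and their losses against the "true" OLS coefficients averaged over the 100 samples to estimate risk.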

slide-39
SLIDE 39

39

  • R(OPT) under L1 is 0.0878
  • R(MMA) under L1 ranges from 0.0771 to 0.1057 depending on the ordering pattern.
  • Average R(MMA) = 0.0942
  • R(OPT) < R(MMA) in 35778 out of 40320 orderings, or 88.7% of cases.
slide-40
SLIDE 40

40

Conclusions

  • Alternative way to select model weights for a frequentist model average estimator
  • Merits
    – Framework does not require explicit ordering of regressors (ordering pattern is a key determinant of Hansen’s estimator)
    – Finite sample justifications
  • Work ahead: extend present analysis to out-of-sample forecasts.