Weight Selection for a Model Average Estimator

Alan Wan, City University of Hong Kong (joint work with H. Liang and G. Zou, University of Rochester)


  1. Weight Selection for a Model Average Estimator. Alan Wan, City University of Hong Kong (joint work with H. Liang and G. Zou, University of Rochester)

  2. • Model selection methods treat the final model as if it had been chosen in advance of seeing the data.
     • The result is under-reporting of variability and over-narrow confidence intervals.
     • Papers on under-reporting due to model selection: Danilov and Magnus (2004, J. of Econometrics); Leeb and Pötscher (2006, Annals of Statistics); Leeb and Pötscher (2008, Econometric Theory).

  3. • Current paper: frequentist model averaging.
     • Bayesian model averaging is very common; it is based on prior probabilities for the potential models and priors for the parameters. See Hoeting et al. (1999, Statistical Science).
     • Frequentist model averaging: Hjort and Claeskens (2003, JASA); Yuan and Yang (2005, JASA); Leung and Barron (2006, IEEE Info. Theory); Hansen (2007, Econometrica).

  4. • Current paper is motivated by Hansen (2007, Econometrica).
     • Hansen's approach: weights are chosen by minimizing the Mallows criterion, which is equivalent to squared error in large samples.
     • Model framework: $y = H\theta + \varepsilon$, $\varepsilon \sim \text{i.i.d.}(0, \sigma^2)$, where $y$ is $n \times 1$, $H$ is $n \times P$, $\theta$ is $P \times 1$, and $\varepsilon$ is $n \times 1$.

  5. Hansen's approach:
     • Order the regressors at the outset: $X_1, X_2, X_3, \ldots, X_P$.
     • Estimate a set of nested models:
       $y = X_1\theta_1 + \varepsilon_1$;
       $y = X_1\theta_1 + X_2\theta_2 + \varepsilon_2$;
       $\vdots$
       $y = X_1\theta_1 + X_2\theta_2 + \cdots + X_P\theta_P + \varepsilon_P$.

  6. • Let $H_p$ be the $n \times p$ ($p \le P$) matrix comprising the first $p$ columns of $H$, and let $\omega_p$ be the weight on model $p$.
     • Hansen's Mallows model average (MMA) estimator:
       $\hat{\Theta}_m = \sum_{p=1}^{P} \omega_p \begin{pmatrix} (H_p'H_p)^{-1}H_p'y \\ 0 \end{pmatrix}$
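A minimal numerical sketch of this construction: each nested OLS estimate is zero-padded to length $P$ and the weighted combination is returned. The simulated data and all names such as `mma_estimate` are ours, not Hansen's code.

```python
# Sketch of the MMA estimator for a given weight vector w (illustrative
# setup; data and names are ours).
import numpy as np

rng = np.random.default_rng(0)
n, P = 100, 6
H = rng.normal(size=(n, P))                    # ordered regressors
y = H @ (1.0 / (1.0 + np.arange(P))) + rng.normal(size=n)

def mma_estimate(H, y, w):
    """Return sum_p w[p] * (theta_hat of model p, zero-padded to length P)."""
    P = H.shape[1]
    theta_avg = np.zeros(P)
    for p in range(1, P + 1):
        theta_p, *_ = np.linalg.lstsq(H[:, :p], y, rcond=None)
        theta_avg[:p] += w[p - 1] * theta_p
    return theta_avg

w = np.full(P, 1.0 / P)                        # uniform weights, illustration only
print(mma_estimate(H, y, w))
```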

  7. • Mallows criterion:
       $C(\omega) = (y - H\hat{\theta})'(y - H\hat{\theta}) + 2\sigma^2 k(\omega)$,
       where $\omega = (\omega_1, \omega_2, \ldots, \omega_P)'$ and $k(\omega)$ is the effective number of parameters.
     • $\hat{\omega} = \arg\min_{\omega} C(\omega)$
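A sketch of choosing the weights by minimizing this criterion over the probability simplex, assuming $k(\omega) = \sum_p \omega_p\,p$ for the nested sequence and estimating $\sigma^2$ from the largest model; the setup and names are ours.

```python
# Sketch: minimize C(w) = ||y - sum_p w_p H_p theta_hat_p||^2 + 2 sigma^2 k(w)
# over w >= 0, sum(w) = 1, with k(w) = sum_p w_p * p for nested models.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, P = 100, 6
H = rng.normal(size=(n, P))
y = H @ (1.0 / (1.0 + np.arange(P))) + rng.normal(size=n)

# Column p-1 holds the fitted values H_p theta_hat_p of the p-th nested model.
fits = np.column_stack([
    H[:, :p] @ np.linalg.lstsq(H[:, :p], y, rcond=None)[0]
    for p in range(1, P + 1)
])
resid_full = y - fits[:, -1]
sigma2 = resid_full @ resid_full / (n - P)     # sigma^2 from the largest model

def mallows(w):
    resid = y - fits @ w
    return resid @ resid + 2.0 * sigma2 * (w @ np.arange(1, P + 1))

cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
res = minimize(mallows, np.full(P, 1.0 / P), bounds=[(0.0, 1.0)] * P,
               constraints=cons)
print(np.round(res.x, 3))
```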

  8. Difficulties with Hansen's (2007) approach:
     1) It requires an explicit ordering of the regressors.
     2) It estimates only nested models:
        $y = X_1\theta_1 + \varepsilon_1$; $y = X_1\theta_1 + X_2\theta_2 + \varepsilon_2$; $\ldots$; $y = X_1\theta_1 + X_2\theta_2 + \cdots + X_P\theta_P + \varepsilon_P$.
        It cannot handle, for example, combining $(X_1, X_4, X_8)$ and $(X_1, X_5, X_7)$.
     3) The criterion is based on an asymptotic justification.

  9. [Figure]

  10. Alternative approach:
      $y = X\beta + Z\gamma + \varepsilon$,
      where $y$ is $n \times 1$, $X$ is $n \times k$, $\beta$ is $k \times 1$, $Z$ is $n \times m$, $\gamma$ is $m \times 1$, and $\varepsilon$ is $n \times 1$.
      $X$: focus (required) regressors; $Z$: auxiliary regressors.
      The framework follows Magnus and Durbin (1999, Econometrica).

  11. Choice of weights
      • When $m = 1$, Magnus (2002, Econometrics Journal) and Danilov (2005, Econometrics Journal) considered a weight based on a Laplace prior.
      • Our approach: select weights based on the MSE of the weighted average estimator.

  12. • With $m$ auxiliary regressors in $Z$, there are $2^m$ models.
      • Unrestricted estimators: $b_u = b_r - Q\hat{\theta}$; $\hat{\gamma}_u = D^{-2}Z'My$.
      • Fully restricted estimators: $\gamma = 0$; $b_r = (X'X)^{-1}X'y$.
      • Here $Q = (X'X)^{-1}X'ZD^{-1}$, $D^2 = Z'MZ$, $M = I_n - X(X'X)^{-1}X'$, and $\hat{\theta} = D^{-1}Z'My \sim N(\theta, \sigma^2 I_m)$ with $\theta = D\gamma$.
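A numerical sketch of these building blocks on simulated data, taking $D$ to be the symmetric square root of $Z'MZ$; all names are ours.

```python
# Sketch: M = I - X(X'X)^{-1}X', D = (Z'MZ)^{1/2}, Q = (X'X)^{-1}X'Z D^{-1},
# theta_hat = D^{-1}Z'My, b_r = (X'X)^{-1}X'y, b_u = b_r - Q theta_hat.
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 200, 3, 4
X = rng.normal(size=(n, k))                 # focus regressors
Z = rng.normal(size=(n, m))                 # auxiliary regressors
y = X @ np.ones(k) + Z @ np.full(m, 0.3) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T
evals, evecs = np.linalg.eigh(Z.T @ M @ Z)  # Z'MZ is symmetric PSD
D = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
D_inv = np.linalg.inv(D)

Q = XtX_inv @ X.T @ Z @ D_inv
theta_hat = D_inv @ Z.T @ M @ y             # ~ N(theta, sigma^2 I_m)
b_r = XtX_inv @ X.T @ y                     # fully restricted (gamma = 0)
b_u = b_r - Q @ theta_hat                   # unrestricted estimator
gamma_u = D_inv @ theta_hat                 # equals (Z'MZ)^{-1} Z'My
```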

  13. The $i$th ($1 \le i \le 2^m$) partially restricted estimators:
      $b_i = b_r - QW_i\hat{\theta}$; $\hat{\gamma}_i = D^{-1}W_i\hat{\theta}$,
      where $W_i = I_m - P_i$, $P_i = DS_i(S_i'D^2S_i)^{-1}S_i'D$, and $S_i$ is a selection matrix of rank $r_i$.
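A sketch of one $W_i$, reading $S_i$ as the selection matrix that picks out the auxiliary coefficients restricted to zero in model $i$. This reading is our assumption; it gives $W_i = I_m$ for the unrestricted model and $W_i = 0$ for the fully restricted one, consistent with the previous slide.

```python
# Sketch: W_i = I_m - D S_i (S_i' D^2 S_i)^{-1} S_i' D for one candidate model.
import numpy as np

def W_matrix(D, restricted_idx):
    """W_i for the model setting gamma[j] = 0 for every j in restricted_idx."""
    m = D.shape[0]
    if len(restricted_idx) == 0:
        return np.eye(m)                    # nothing restricted: W_i = I_m
    S = np.eye(m)[:, restricted_idx]        # selection matrix of rank r_i
    P_i = D @ S @ np.linalg.inv(S.T @ D @ D @ S) @ S.T @ D
    return np.eye(m) - P_i

D = np.diag([1.0, 2.0, 3.0])                # toy D, m = 3
print(W_matrix(D, [0, 2]))                  # restrict gamma_1 and gamma_3
```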

  14. • Traditional model selection chooses the "best" among the $2^m$ models.
      • Frequentist model average (FMA) estimators:
        $b_f = \sum_{i=1}^{2^m}\lambda_i b_i$; $\hat{\gamma}_f = \sum_{i=1}^{2^m}\lambda_i\hat{\gamma}_i$,
        where $\lambda_i \ge 0$ and $\sum_{i=1}^{2^m}\lambda_i = 1$.
      • Consider weights $\lambda_i = \lambda_i(\hat{\theta}, \hat{\sigma}^2)$.
      • Write $W = \sum_{i=1}^{2^m}\lambda_i W_i$.
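The combination itself is a one-liner; a sketch with names of our choosing:

```python
# Sketch: the FMA estimator is a convex combination of the 2^m candidates.
import numpy as np

def fma_combine(b_list, lam):
    lam = np.asarray(lam, dtype=float)
    assert np.all(lam >= 0.0) and np.isclose(lam.sum(), 1.0)
    return sum(l * b for l, b in zip(lam, b_list))

b_list = [np.array([1.0, 2.0]) + 0.1 * i for i in range(4)]   # toy candidates
print(fma_combine(b_list, np.full(4, 0.25)))
```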

  15. Theorem 3.1.
      $\mathrm{MSE}(b_f) = \sigma^2(X'X)^{-1} - \sigma^2 QQ' + E\big[\{Q(I_m - W)\hat{\theta}\}^{\otimes 2}\big] + E\{\Psi(\hat{\theta}, \hat{\sigma}^2)\} + \big[E\{\Psi(\hat{\theta}, \hat{\sigma}^2)\}\big]'$,
      where $A^{\otimes 2} = AA'$,
      $\Psi(\hat{\theta}, \hat{\sigma}^2) = \dfrac{n-k-m}{2}\,(\hat{\sigma}^2)^{-(n-k-m)/2}\displaystyle\int_0^{\hat{\sigma}^2} t^{(n-k-m)/2-1}\,\Psi_1(\hat{\theta}, t)\,dt$,
      and
      $\Psi_1(\hat{\theta}, t) = Q\Big\{W + \sum_{i=1}^{2^m} W_i\hat{\theta}\,\big(\partial\lambda_i(\hat{\theta}, t)/\partial\hat{\theta}\big)'\Big\}Q'$.

  16. The estimated risk is
      $\hat{R}(b_f) = \hat{\sigma}^2\,\mathrm{tr}\{(X'X)^{-1}\} - \hat{\sigma}^2\,\mathrm{tr}(Q'Q) + \hat{\theta}'(I_m - W)'Q'Q(I_m - W)\hat{\theta} + 2\,\mathrm{tr}\{\Psi(\hat{\theta}, \hat{\sigma}^2)\}$.
      One problem with minimizing $\hat{R}(b_f)$ is that
      $\Psi(\hat{\theta}, \hat{\sigma}^2) = \dfrac{n-k-m}{2}\,(\hat{\sigma}^2)^{-(n-k-m)/2}\displaystyle\int_0^{\hat{\sigma}^2} t^{(n-k-m)/2-1}\,\Psi_1(\hat{\theta}, t)\,dt$
      is complex.
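Concretely, the integral is a weighted average of $\Psi_1(\hat{\theta}, t)$ over $t \in (0, \hat{\sigma}^2)$, since the weight $\frac{n-k-m}{2}\,t^{(n-k-m)/2-1}/(\hat{\sigma}^2)^{(n-k-m)/2}$ integrates to one, so it can be approximated by simple quadrature. A sketch under our reconstruction of the formula, with a stand-in $\Psi_1$:

```python
# Sketch: approximate Psi(theta_hat, s2) = a * s2^{-a} * int_0^{s2} t^{a-1}
# * Psi_1(theta_hat, t) dt, with a = (n-k-m)/2, by a Riemann sum
# (reconstruction of the slide's integral assumed).
import numpy as np

def Psi_quadrature(psi1, theta_hat, s2, n, k, m, grid=2000):
    a = (n - k - m) / 2.0
    t = np.linspace(s2 / grid, s2, grid)
    vals = np.stack([psi1(theta_hat, ti) for ti in t])   # (grid, m, m) array
    w = a * t ** (a - 1.0) / s2 ** a                     # density on (0, s2)
    return np.tensordot(w, vals, axes=1) * (t[1] - t[0])

# Toy example with a stand-in Psi_1 (purely illustrative).
psi1 = lambda th, t: t * np.outer(th, th)
print(Psi_quadrature(psi1, np.array([1.0, -0.5]), s2=2.0, n=50, k=3, m=2))
```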

  17. Solution: replace $\Psi(\hat{\theta}, \hat{\sigma}^2)$ by $\hat{\sigma}^2\,\Psi_1(\hat{\theta}, \hat{\sigma}^2)$,
      where
      $\Psi_1(\hat{\theta}, \hat{\sigma}^2) = Q\Big\{W + \sum_{i=1}^{2^m} W_i\hat{\theta}\,\big(\partial\lambda_i(\hat{\theta}, \hat{\sigma}^2)/\partial\hat{\theta}\big)'\Big\}Q'$,
      since $E\{\Psi(\hat{\theta}, \hat{\sigma}^2)\} = E\{\hat{\sigma}^2\,\Psi_1(\hat{\theta}, \hat{\sigma}^2)\}$.

  18. So, we have
      $\hat{R}_a(b_f) = \hat{\sigma}^2\,\mathrm{tr}\{(X'X)^{-1}\} - \hat{\sigma}^2\,\mathrm{tr}(Q'Q) + \hat{\theta}'(I_m - W)'Q'Q(I_m - W)\hat{\theta} + 2\hat{\sigma}^2\,\mathrm{tr}\{\Psi_1(\hat{\theta}, \hat{\sigma}^2)\}$.
      Write $\lambda_i(\hat{\theta}, \hat{\sigma}^2) = a_i(\hat{\sigma}_i^2)^c \Big/ \sum_{j=1}^{2^m} a_j(\hat{\sigma}_j^2)^c$,
      where the $a_i$'s are positive constants and $c$ is a non-positive constant.
      • S-AIC (Buckland et al. (1997, Biometrics)): $a_i = \exp\{-(q_i + 1)\}$, $c = -n/2$;
      • S-BIC: $a_i = n^{-(q_i + 1)/2}$, $c = -n/2$;
      • S-AICC (Hurvich and Tsai (1989, Biometrika)): $a_i = \exp\{-n(q_i + 1)/(n - q_i - 2)\}$, $c = -n/2$.
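A sketch computing these weights in log space for numerical stability; with $c = -n/2$ this is $\lambda_i \propto a_i(\hat{\sigma}_i^2)^{-n/2}$, i.e. $\lambda_i \propto \exp(-\mathrm{IC}_i/2)$. Names and the toy inputs are ours.

```python
# Sketch of the smoothed IC weights: lambda_i = a_i (s2_i)^c / sum_j a_j (s2_j)^c,
# computed via log-weights to avoid under/overflow; c = -n/2 throughout.
import numpy as np

def sic_weights(s2, q, n, kind="aic"):
    s2, q = np.asarray(s2, float), np.asarray(q, float)
    if kind == "aic":        # S-AIC:  a_i = exp{-(q_i + 1)}
        log_a = -(q + 1.0)
    elif kind == "bic":      # S-BIC:  a_i = n^{-(q_i + 1)/2}
        log_a = -(q + 1.0) / 2.0 * np.log(n)
    elif kind == "aicc":     # S-AICC: a_i = exp{-n(q_i + 1)/(n - q_i - 2)}
        log_a = -n * (q + 1.0) / (n - q - 2.0)
    else:
        raise ValueError(kind)
    log_w = log_a - (n / 2.0) * np.log(s2)   # c = -n/2
    log_w -= log_w.max()                     # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

print(sic_weights(s2=[1.10, 1.02, 1.00], q=[1, 2, 3], n=50, kind="aic"))
```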

  19. Recall that
      $\Psi_1(\hat{\theta}, \hat{\sigma}^2) = Q\Big\{W + \sum_{i=1}^{2^m} W_i\hat{\theta}\,\big(\partial\lambda_i(\hat{\theta}, \hat{\sigma}^2)/\partial\hat{\theta}\big)'\Big\}Q'$.
      Now,
      $\dfrac{\partial\lambda_i(\hat{\theta}, \hat{\sigma}^2)}{\partial\hat{\theta}} = 2n^{-1}c\,\lambda_i(\hat{\theta}, \hat{\sigma}^2)\Big\{\hat{\sigma}_i^{-2}(I_m - W_i) - \sum_{j=1}^{2^m}\lambda_j(\hat{\theta}, \hat{\sigma}^2)\,\hat{\sigma}_j^{-2}(I_m - W_j)\Big\}\hat{\theta}. \quad (*)$

  20. Putting (*) into $\hat{\sigma}^2\,\Psi_1(\hat{\theta}, \hat{\sigma}^2)$, we have
      $\hat{R}_a(b_f) = \hat{\sigma}^2\,\mathrm{tr}\{(X'X)^{-1}\} - \hat{\sigma}^2\,\mathrm{tr}(Q'Q) + \lambda'L\lambda - 4n^{-1}c\,\hat{\sigma}^2\,\lambda'G\lambda + 2\hat{\sigma}^2\,\lambda'\phi + 4n^{-1}c\,\hat{\sigma}^2\,\lambda'g$,
      where $L = (l_{ij})$, $G = (g_{ij})$,
      $l_{ij} = \hat{\theta}'(I_m - W_i)'Q'Q(I_m - W_j)\hat{\theta}$,
      $g_{ij} = \hat{\sigma}_j^{-2}\,\hat{\theta}'W_i'Q'Q(I_m - W_j)\hat{\theta}$, $i, j = 1, 2, \ldots, 2^m$,
      and $g$ and $\phi$ are each $2^m \times 1$ vectors, with $g$ consisting of the diagonal elements of $G$ and the $i$th element of $\phi$ being $\mathrm{tr}(QW_iQ')$, $i = 1, 2, \ldots, 2^m$.
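A sketch assembling $L$, $G$, $g$, $\phi$ and evaluating $\hat{R}_a$ for a given weight vector, under the reconstruction above; the inputs follow the earlier sketches and all names are ours.

```python
# Sketch: build L, G, g, phi from the W_i's, Q, theta_hat and the per-model
# variance estimates s2_models, then evaluate R_a(lambda).
import numpy as np

def criterion_parts(W_list, Q, theta_hat, s2_models):
    K = len(W_list)
    Im = np.eye(len(theta_hat))
    QtQ = Q.T @ Q
    u = [(Im - W) @ theta_hat for W in W_list]        # (I_m - W_i) theta_hat
    v = [W @ theta_hat for W in W_list]               # W_i theta_hat
    L = np.array([[u[i] @ QtQ @ u[j] for j in range(K)] for i in range(K)])
    G = np.array([[v[i] @ QtQ @ u[j] / s2_models[j] for j in range(K)]
                  for i in range(K)])
    g = np.diag(G).copy()
    phi = np.array([np.trace(Q @ W @ Q.T) for W in W_list])
    return L, G, g, phi

def R_a(lam, L, G, g, phi, s2, n, c, tr_XtX_inv, tr_QtQ):
    lam = np.asarray(lam, float)
    return (s2 * tr_XtX_inv - s2 * tr_QtQ
            + lam @ L @ lam
            - 4.0 * c / n * s2 * (lam @ G @ lam)
            + 2.0 * s2 * (lam @ phi)
            + 4.0 * c / n * s2 * (lam @ g))
```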

  21. Interesting special case
      Setting $c = 0$ and considering only mixing $b_u$ and $b_r$, the minimization criterion leads to
      $b_{js} = \Big(1 - \dfrac{\hat{\sigma}^2\,\mathrm{tr}(Q'Q)}{\|b_u - b_r\|^2}\Big)b_u + \dfrac{\hat{\sigma}^2\,\mathrm{tr}(Q'Q)}{\|b_u - b_r\|^2}\,b_r$,
      i.e., a James-Stein estimator!
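A sketch of this special case, with names following the earlier sketches; the positive-part refinement often used with James-Stein weights is omitted, matching the slide.

```python
# Sketch: shrink b_u toward b_r by the data-driven factor
# sigma2 * tr(Q'Q) / ||b_u - b_r||^2.
import numpy as np

def js_combine(b_u, b_r, Q, sigma2):
    shrink = sigma2 * np.trace(Q.T @ Q) / np.sum((b_u - b_r) ** 2)
    return (1.0 - shrink) * b_u + shrink * b_r
```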

  22. Optimal predictor
      Let $\hat{\mu}_f = H\hat{\theta}_f$,
      where $H = (X, Z)$, $\hat{\theta}_f = (b_f', \hat{\gamma}_f')'$, and
      $\hat{\gamma}_f = \sum_{i=1}^{2^m}\lambda_i(\hat{\theta}, \hat{\sigma}^2)\,\hat{\gamma}_i$ is the estimator of $\gamma$ corresponding to $b_f$.
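A sketch of the predictor, reading $H = (X, Z)$ as above (our reading, consistent with the framework on slide 10):

```python
# Sketch: mu_hat_f = H theta_hat_f = X b_f + Z gamma_hat_f.
import numpy as np

def fma_predict(X, Z, b_f, gamma_f):
    return np.hstack([X, Z]) @ np.concatenate([b_f, gamma_f])
```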

  23. $\mathrm{MSE}(\hat{\mu}_f) = \sigma^2 X(X'X)^{-1}X' + \varphi(XQ, XQ, \hat{\theta}, \hat{\sigma}^2) - \varphi(XQ, ZD^{-1}, \hat{\theta}, \hat{\sigma}^2) - \varphi(ZD^{-1}, XQ, \hat{\theta}, \hat{\sigma}^2) + \varphi(ZD^{-1}, ZD^{-1}, \hat{\theta}, \hat{\sigma}^2)$,
      where
      $\varphi(C_1, C_2, \hat{\theta}, \hat{\sigma}^2) = -\sigma^2 C_1C_2' + E\big[\{C_1(I_m - W)\hat{\theta}\}\{C_2(I_m - W)\hat{\theta}\}'\big] + C_1E\{\Xi(\hat{\theta}, \hat{\sigma}^2)\}C_2' + C_1\big[E\{\Xi(\hat{\theta}, \hat{\sigma}^2)\}\big]'C_2'$,
      $\Xi(\hat{\theta}, \hat{\sigma}^2) = \dfrac{n-k-m}{2}\,(\hat{\sigma}^2)^{-(n-k-m)/2}\displaystyle\int_0^{\hat{\sigma}^2} t^{(n-k-m)/2-1}\,\Xi_1(\hat{\theta}, t)\,dt$,
      and
      $\Xi_1(\hat{\theta}, t) = W + \sum_{i=1}^{2^m} W_i\hat{\theta}\,\big(\partial\lambda_i(\hat{\theta}, t)/\partial\hat{\theta}\big)'$.
