Weight Selection for a Model Average Estimator
Alan Wan, City University of Hong Kong (joint work with H. Liang and G. Zou, University of Rochester)

• Model selection methods proceed as if the final model had been chosen in advance.
• This leads to under-reporting of variability and to confidence intervals that are too narrow.
• Papers on under-reporting due to model selection:
  – Danilov and Magnus (2004, J. of Econometrics)
  – Leeb and Pötscher (2006, Annals of Stat)
  – Leeb and Pötscher (2008, Econometric Theory)

• Current paper: frequentist model averaging
• Bayesian model averaging
  – very common
  – based on prior probabilities for potential models and priors for parameters
  – Hoeting et al. (1999, Stat Science)
• Frequentist model averaging
  – Hjort and Claeskens (2003, JASA)
  – Yuan and Yang (2005, JASA)
  – Leung and Barron (2006, IEEE Info Th.)
  – Hansen (2007, Econometrica)

• Current paper motivated by Hansen (2007, Econometrica)
• Hansen's approach: weights chosen by minimizing the Mallows criterion, which is equivalent to squared error in large samples.
• Model framework:
$$ y = H\theta + \varepsilon, \qquad \varepsilon \sim \text{i.i.d.}\,(0, \sigma^2), $$
where $y$ is $n \times 1$, $H$ is $n \times P$, $\theta$ is $P \times 1$ and $\varepsilon$ is $n \times 1$.

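Hansen's approach:
• Order the regressors at the outset: $X_1, X_2, X_3, \ldots, X_p$
• Estimate a set of nested models:
$$
\begin{aligned}
y &= X_1\theta_1 + \varepsilon_1; \\
y &= X_1\theta_1 + X_2\theta_2 + \varepsilon_2; \\
  &\;\;\vdots \\
y &= X_1\theta_1 + X_2\theta_2 + \cdots + X_p\theta_p + \varepsilon
\end{aligned}
$$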

• Let $H_p$ be an $n \times p$ ($p \le P$) matrix comprising the first $p$ columns of $H$, and let $\omega_p$ be the weight attached to the $p$-th model.
• Hansen's (MMA) estimator:
$$ \hat{\Theta}_m = \sum_{p=1}^{P} \omega_p \begin{pmatrix} (H_p'H_p)^{-1}H_p'y \\ 0 \end{pmatrix} $$

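• Mallows criterion:
$$ C(\omega) = (y - H\hat{\theta})'(y - H\hat{\theta}) + 2\sigma^2 k(\omega), $$
where $\hat{\theta}$ is the model average estimate of $\theta$, $\omega = (\omega_1, \omega_2, \ldots, \omega_P)'$, and $k(\omega)$ is the effective number of parameters.
• Weights are chosen as
$$ \hat{\omega} = \arg\min_{\omega} C(\omega) $$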
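The criterion above can be evaluated numerically for a small set of nested models. The following is only a sketch under stated assumptions: it takes $k(\omega) = \sum_p \omega_p\, p$ (the form used in Hansen's nested setting), estimates $\sigma^2$ from the largest model, and replaces the quadratic programme by a crude grid search over the weight simplex. The function names and the simulated data are illustrative, not part of the slides.

```python
import numpy as np

def nested_ols_fits(y, H):
    """Fitted values of the P nested models that use the first p columns of H."""
    n, P = H.shape
    fits = []
    for p in range(1, P + 1):
        Hp = H[:, :p]
        coef, *_ = np.linalg.lstsq(Hp, y, rcond=None)
        fits.append(Hp @ coef)
    return np.column_stack(fits)  # column p-1 holds the fit of model p

def mallows_criterion(y, H, w, sigma2):
    """C(w) = ||y - averaged fit||^2 + 2*sigma2*k(w), assuming k(w) = sum_p w_p * p."""
    w = np.asarray(w, dtype=float)
    resid = y - nested_ols_fits(y, H) @ w
    k = np.dot(w, np.arange(1, H.shape[1] + 1))
    return float(resid @ resid + 2.0 * sigma2 * k)

# Simulated data (illustrative only)
rng = np.random.default_rng(0)
n, P = 50, 3
H = rng.normal(size=(n, P))
y = H @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

# Assumption of this sketch: sigma^2 is estimated from the largest model
full_fit = nested_ols_fits(y, H)[:, -1]
sigma2 = float(np.sum((y - full_fit) ** 2) / (n - P))

# Crude grid search over the simplex {w >= 0, sum w = 1} for P = 3
grid = np.linspace(0.0, 1.0, 21)
candidates = [(a, b, 1.0 - a - b) for a in grid for b in grid if a + b <= 1.0 + 1e-12]
best_w = min(candidates, key=lambda w: mallows_criterion(y, H, w, sigma2))
print("grid-search weights:", best_w)
```

Putting all weight on the largest model recovers its residual sum of squares plus $2\sigma^2 P$, which is a convenient sanity check on the implementation.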

Difficulties with Hansen's (2007) approach:
1) Explicit ordering of regressors
2) Estimation of nested models
$$
\begin{aligned}
y &= X_1\theta_1 + \varepsilon_1; \\
y &= X_1\theta_1 + X_2\theta_2 + \varepsilon_2; \\
  &\;\;\vdots \\
y &= X_1\theta_1 + X_2\theta_2 + \cdots + X_p\theta_p + \varepsilon
\end{aligned}
$$
(cannot handle, for example, combining $(X_1, X_4, X_8)$ and $(X_1, X_5, X_7)$)
3) Criterion based on asymptotic justification

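Alternative approach:
$$ y = X\beta + Z\gamma + \varepsilon, $$
where $y$ is $n \times 1$, $X$ is $n \times k$, $\beta$ is $k \times 1$, $Z$ is $n \times m$, $\gamma$ is $m \times 1$ and $\varepsilon$ is $n \times 1$.
• $X$: focus (required) regressors
• $Z$: auxiliary regressors
Framework follows Magnus and Durbin (1999, Econometrica)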

Choice of weights
• When $m = 1$, Magnus (2002, Econometrics Journal) and Danilov (2005, Econometrics Journal) considered weights based on a Laplace prior.
• Our approach: select weights based on the MSE of the weighted average estimator.

• With $m$ auxiliary regressors in $Z$, there are $2^m$ models.
• Unrestricted estimators:
$$ b_u = b_r - Q\hat{\theta}, \qquad \hat{\gamma}_u = D^2 Z'My $$
• Fully restricted estimators:
$$ \hat{\gamma}_r = 0, \qquad b_r = (X'X)^{-1}X'y $$
where
$$ Q = (X'X)^{-1}X'ZD, \qquad D = (Z'MZ)^{-1/2}, \qquad M = I_n - X(X'X)^{-1}X', $$
and
$$ \hat{\theta} = DZ'My \sim N(\theta, \sigma^2 I_m) \quad \text{with} \quad \theta = D^{-1}\gamma. $$
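The quantities above can be checked numerically. The sketch below builds $M$, $D$, $Q$ and $\hat{\theta}$ from simulated data and confirms that $b_u$ and $\hat{\gamma}_u$ coincide with ordinary least squares on the full design $(X, Z)$. The helper names (`inv_sqrt_sym`, `focus_auxiliary_pieces`) and the data are illustrative assumptions; the inverse square root is computed through an eigendecomposition, which is one of several possible choices.

```python
import numpy as np

def inv_sqrt_sym(A):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def focus_auxiliary_pieces(y, X, Z):
    """Return M, D, Q, theta_hat and the restricted/unrestricted estimators."""
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    M = np.eye(n) - X @ XtX_inv @ X.T          # M = I_n - X (X'X)^{-1} X'
    D = inv_sqrt_sym(Z.T @ M @ Z)              # D = (Z'MZ)^{-1/2}
    Q = XtX_inv @ X.T @ Z @ D                  # Q = (X'X)^{-1} X'ZD
    theta_hat = D @ Z.T @ M @ y                # theta_hat = DZ'My
    b_r = XtX_inv @ X.T @ y                    # fully restricted (gamma = 0)
    b_u = b_r - Q @ theta_hat                  # unrestricted
    gamma_u = D @ D @ Z.T @ M @ y              # gamma_u = D^2 Z'My
    return {"M": M, "D": D, "Q": Q, "theta_hat": theta_hat,
            "b_r": b_r, "b_u": b_u, "gamma_u": gamma_u}

# Simulated data (illustrative only)
rng = np.random.default_rng(1)
n, k, m = 60, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, m))
y = X @ np.array([1.0, 2.0]) + Z @ np.array([0.5, 0.0, -0.3]) + rng.normal(size=n)

est = focus_auxiliary_pieces(y, X, Z)
ols_full = np.linalg.lstsq(np.hstack([X, Z]), y, rcond=None)[0]
print(np.allclose(est["b_u"], ols_full[:k]), np.allclose(est["gamma_u"], ols_full[k:]))
```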

The $i$-th ($0 \le i \le 2^m$) partially restricted estimators:
$$ b_{(i)} = b_r - QW_i\hat{\theta}, \qquad \hat{\gamma}_{(i)} = DW_i\hat{\theta}, $$
where
$$ W_i = I_m - P_i, \qquad P_i = DS_i(S_i'D^2S_i)^{-1}S_i'D, $$
and $S_i$ is a selection matrix of rank $r_i$.
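A small numerical check may help fix the role of $W_i$. The sketch assumes, following the usual convention in this framework, that the columns of $S_i$ pick out the auxiliary coefficients that are restricted to zero; under that assumption $b_{(i)}$ and $\hat{\gamma}_{(i)}$ should match least squares on $X$ and the remaining columns of $Z$. The function name and the data are hypothetical.

```python
import numpy as np

def inv_sqrt_sym(A):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def partially_restricted(y, X, Z, excluded):
    """Estimator with gamma_j = 0 for every j in `excluded` (assumed meaning of S_i)."""
    n, m = Z.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    M = np.eye(n) - X @ XtX_inv @ X.T
    D = inv_sqrt_sym(Z.T @ M @ Z)
    Q = XtX_inv @ X.T @ Z @ D
    theta_hat = D @ Z.T @ M @ y
    b_r = XtX_inv @ X.T @ y

    # Selection matrix: one column of the identity per restricted coefficient
    S = np.eye(m)[:, np.asarray(excluded, dtype=int)]
    if S.shape[1] == 0:
        P = np.zeros((m, m))                   # nothing restricted: W_i = I_m
    else:
        P = D @ S @ np.linalg.inv(S.T @ D @ D @ S) @ S.T @ D
    W = np.eye(m) - P
    return b_r - Q @ W @ theta_hat, D @ W @ theta_hat, W

# Simulated data (illustrative only)
rng = np.random.default_rng(2)
n, k, m = 80, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, m))
y = X @ np.array([1.0, 2.0]) + Z @ np.array([0.5, 0.0, -0.3]) + rng.normal(size=n)

# Restrict the second auxiliary coefficient to zero
b_i, gamma_i, W_i = partially_restricted(y, X, Z, excluded=[1])
ols_sub = np.linalg.lstsq(np.hstack([X, Z[:, [0, 2]]]), y, rcond=None)[0]
print(np.allclose(b_i, ols_sub[:k]), np.allclose(gamma_i[[0, 2]], ols_sub[k:]))
```

Passing an empty `excluded` list gives $W_i = I_m$ and reproduces the unrestricted estimator, while excluding every column gives $W_i = 0$ and the fully restricted one.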

• Traditional model selection chooses the "best" among the $2^m$ models.
• Frequentist Model Average estimators:
$$ b_f = \sum_{i=1}^{2^m} \lambda_i b_{(i)}, \qquad \hat{\gamma}_f = \sum_{i=1}^{2^m} \lambda_i \hat{\gamma}_{(i)}, $$
where $\lambda_i \ge 0$ and $\sum_{i=1}^{2^m} \lambda_i = 1$.
• Consider weights $\lambda_i = \lambda_i(\hat{\theta}, \hat{\sigma}^2)$.
• Write $W = \sum_{i=1}^{2^m} \lambda_i W_i$.

Theorem 3.1
$$ \widehat{MSE}(b_f) = \hat{\sigma}^2 (X'X)^{-1} - \hat{\sigma}^2 QQ' + \{Q(I_m - W)\hat{\theta}\}\{Q(I_m - W)\hat{\theta}\}' + \Psi(\hat{\theta}, \hat{\sigma}^2) + \Psi'(\hat{\theta}, \hat{\sigma}^2), $$
where
$$ \Psi(\hat{\theta}, \hat{\sigma}^2) = \frac{n-k-m}{2}\,(\hat{\sigma}^2)^{-(n-k-m)/2+1} \int_0^{\hat{\sigma}^2} t^{(n-k-m)/2-1}\, \Psi_1(\hat{\theta}, t)\, dt $$
and
$$ \Psi_1(\hat{\theta}, t) = Q\left\{ W + \sum_{i=1}^{2^m} \frac{\partial \lambda_i(\hat{\theta}, t)}{\partial \hat{\theta}}\, \hat{\theta}' W_i \right\} Q'. $$

$$ \hat{R}(b_f) = \hat{\sigma}^2\, tr\{(X'X)^{-1}\} - \hat{\sigma}^2\, tr(QQ') + \hat{\theta}'(I_m - W)Q'Q(I_m - W)\hat{\theta} + 2\, tr\{\Psi(\hat{\theta}, \hat{\sigma}^2)\} $$
One problem with minimizing $\hat{R}(b_f)$ is that
$$ \Psi(\hat{\theta}, \hat{\sigma}^2) = \frac{n-k-m}{2}\,(\hat{\sigma}^2)^{-(n-k-m)/2+1} \int_0^{\hat{\sigma}^2} t^{(n-k-m)/2-1}\, \Psi_1(\hat{\theta}, t)\, dt $$
is complex.

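Solution: Replace $\Psi(\hat{\theta}, \hat{\sigma}^2)$ by $\hat{\sigma}^2\, \Psi_1(\hat{\theta}, \hat{\sigma}^2)$, where
$$ \Psi_1(\hat{\theta}, t) = Q\left\{ W + \sum_{i=1}^{2^m} \frac{\partial \lambda_i(\hat{\theta}, t)}{\partial \hat{\theta}}\, \hat{\theta}' W_i \right\} Q', $$
since
$$ E\{\Psi(\hat{\theta}, \hat{\sigma}^2)\} = E\{\sigma^2\, \Psi_1(\hat{\theta}, \hat{\sigma}^2)\}. $$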
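A short worked case shows why the replacement is natural. Suppose, as an illustrative assumption, that $\Psi_1(\hat{\theta}, t)$ does not depend on $t$ (for instance when the weights are fixed constants, so that $\Psi_1 = QWQ'$). Writing $\nu = n - k - m$ as shorthand, the integral can then be evaluated in closed form and the replacement is exact:

```latex
\Psi(\hat{\theta}, \hat{\sigma}^2)
  = \frac{\nu}{2}\,(\hat{\sigma}^2)^{-\nu/2+1}\,\Psi_1 \int_0^{\hat{\sigma}^2} t^{\nu/2-1}\,dt
  = \frac{\nu}{2}\,(\hat{\sigma}^2)^{-\nu/2+1}\,\Psi_1 \cdot \frac{(\hat{\sigma}^2)^{\nu/2}}{\nu/2}
  = \hat{\sigma}^2\,\Psi_1 .
```

When the weights depend on $\hat{\sigma}^2$ the two expressions differ in general, and the substitution is an approximation motivated by the expectation identity stated on the slide.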

So, we have
$$ \hat{R}_a(b_f) = \hat{\sigma}^2\, tr\{(X'X)^{-1}\} - \hat{\sigma}^2\, tr(QQ') + \hat{\theta}'(I_m - W)Q'Q(I_m - W)\hat{\theta} + 2\hat{\sigma}^2\, tr\{\Psi_1(\hat{\theta}, \hat{\sigma}^2)\}. $$
Write
$$ \lambda_i(\hat{\theta}, \hat{\sigma}^2) = \frac{a_i (\hat{\sigma}_i^2)^c}{\sum_{j=1}^{2^m} a_j (\hat{\sigma}_j^2)^c}, $$
where the $a_i$'s are positive constants and $c$ is a non-positive constant.
• S-AIC (Buckland et al. (1997, Biometrics)): $a_i = \exp\{-(q_i + 1)\}$; $c = -n/2$
• S-BIC: $a_i = n^{-(q_i+1)/2}$; $c = -n/2$
• S-AICC (Hurvich and Tsai (1989, Biometrika)): $a_i = \exp\{-n(q_i + 1)/(n - q_i - 2)\}$; $c = -n/2$
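The three smoothed weighting schemes differ only in the constants $a_i$. A minimal sketch of the weight formula follows; it works on the log scale to avoid underflow when $c = -n/2$ is large in magnitude. The function names, the interpretation of $q_i$ as a per-model parameter count, and the numbers fed in are assumptions made for illustration.

```python
import numpy as np

def smoothed_weights(log_a, sigma2_i, c):
    """lambda_i = a_i (sigma2_i)^c / sum_j a_j (sigma2_j)^c, computed on the log scale."""
    log_w = np.asarray(log_a, dtype=float) + c * np.log(np.asarray(sigma2_i, dtype=float))
    log_w -= log_w.max()                       # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

def log_a_saic(q):
    """log a_i for S-AIC: a_i = exp{-(q_i + 1)}."""
    return -(np.asarray(q, dtype=float) + 1.0)

def log_a_sbic(q, n):
    """log a_i for S-BIC: a_i = n^{-(q_i + 1)/2}."""
    return -(np.asarray(q, dtype=float) + 1.0) * np.log(n) / 2.0

def log_a_saicc(q, n):
    """log a_i for S-AICC: a_i = exp{-n (q_i + 1) / (n - q_i - 2)}."""
    q = np.asarray(q, dtype=float)
    return -n * (q + 1.0) / (n - q - 2.0)

# Hypothetical inputs: per-model variance estimates and parameter counts
n = 20
sigma2_i = np.array([1.30, 1.10, 1.05, 1.04])
q = np.array([2, 3, 3, 4])
c = -n / 2.0

w_aic = smoothed_weights(log_a_saic(q), sigma2_i, c)
w_bic = smoothed_weights(log_a_sbic(q, n), sigma2_i, c)
w_aicc = smoothed_weights(log_a_saicc(q, n), sigma2_i, c)
print(w_aic, w_bic, w_aicc)
```

With these constants the S-AIC weights are proportional to $\exp(-\mathrm{AIC}_i/2)$ for $\mathrm{AIC}_i = n\log\hat{\sigma}_i^2 + 2(q_i + 1)$, which gives a direct way to cross-check the output.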

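Recall that
$$ \Psi_1(\hat{\theta}, \hat{\sigma}^2) = Q\left\{ W + \sum_{i=1}^{2^m} \frac{\partial \lambda_i(\hat{\theta}, \hat{\sigma}^2)}{\partial \hat{\theta}}\, \hat{\theta}' W_i \right\} Q'. $$
Now,
$$ \frac{\partial \lambda_i(\hat{\theta}, \hat{\sigma}^2)}{\partial \hat{\theta}} = 2n^{-1}c\, \lambda_i(\hat{\theta}, \hat{\sigma}^2) \left\{ (\hat{\sigma}_i^2)^{-1}(I_m - W_i)\hat{\theta} - \sum_{j=1}^{2^m} \lambda_j(\hat{\theta}, \hat{\sigma}^2)\, (\hat{\sigma}_j^2)^{-1} (I_m - W_j)\hat{\theta} \right\}. \qquad (*) $$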

Putting (*) in $\Psi_1(\hat{\theta}, \hat{\sigma}^2)$, we have
$$ \hat{R}_a(b_f) = \hat{\sigma}^2\, tr\{(X'X)^{-1}\} - \hat{\sigma}^2\, tr(QQ') + \lambda' L \lambda - 4n^{-1}c\,\hat{\sigma}^2\, \lambda' G \lambda + 2\hat{\sigma}^2\, \lambda'\phi + 4n^{-1}c\,\hat{\sigma}^2\, \lambda' g, $$
where $L = (l_{ij})$, $G = (g_{ij})$,
$$ l_{ij} = \hat{\theta}'(I_m - W_i)Q'Q(I_m - W_j)\hat{\theta}, $$
$$ g_{ij} = (\hat{\sigma}_j^2)^{-1}\, \hat{\theta}' W_i Q'Q (I_m - W_j)\hat{\theta}, \qquad i, j = 1, 2, \ldots, 2^m, $$
and $g$ and $\phi$ are each $2^m \times 1$ vectors, with $g$ consisting of the diagonal elements of $G$ and the $i$-th element of $\phi$ being $tr(QW_iQ')$, $i = 1, 2, \ldots, 2^m$.
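The quadratic and linear terms in $\lambda$ follow from the weights summing to one. A brief sketch of that step, using only the definitions of $W$, $l_{ij}$ and $\phi$ given on the slides:

```latex
\begin{aligned}
I_m - W &= \sum_{i=1}^{2^m} \lambda_i (I_m - W_i)
  \quad \text{because } \sum_{i=1}^{2^m} \lambda_i = 1, \\
\hat{\theta}'(I_m - W)Q'Q(I_m - W)\hat{\theta}
  &= \sum_{i=1}^{2^m}\sum_{j=1}^{2^m} \lambda_i \lambda_j\,
     \hat{\theta}'(I_m - W_i)Q'Q(I_m - W_j)\hat{\theta}
   = \lambda' L \lambda, \\
tr(QWQ') &= \sum_{i=1}^{2^m} \lambda_i\, tr(QW_iQ') = \lambda'\phi .
\end{aligned}
```

The remaining two terms, in $G$ and $g$, collect the contribution of the derivative (*) and vanish when $c = 0$.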

Interesting special case
Setting $c = 0$ and considering only mixing $b_u$ and $b_r$, the minimization criterion leads to
$$ b_{js} = \left\{ 1 - \frac{\hat{\sigma}^2\, tr(QQ')}{\| b_u - b_r \|^2} \right\} b_u + \frac{\hat{\sigma}^2\, tr(QQ')}{\| b_u - b_r \|^2}\, b_r, $$
i.e. a James and Stein estimator!
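The closed form can be compared with a direct minimization. With constant weights ($c = 0$) and only $b_u$ (weight $\lambda_u$) and $b_r$ (weight $1 - \lambda_u$) in the mix, $W = \lambda_u I_m$ and $Q\hat{\theta} = b_r - b_u$, so the part of the criterion that depends on $\lambda_u$ reduces to $(1 - \lambda_u)^2 \|b_u - b_r\|^2 + 2\hat{\sigma}^2 \lambda_u\, tr(QQ')$. The sketch below uses made-up numbers and hypothetical function names, and applies no positive-part truncation.

```python
import numpy as np

def js_combination(b_u, b_r, Q, sigma2_hat):
    """Closed-form mix of b_u and b_r; returns the estimator and the weight on b_r."""
    shrink = sigma2_hat * np.trace(Q @ Q.T) / np.sum((b_u - b_r) ** 2)
    return (1.0 - shrink) * b_u + shrink * b_r, shrink

def two_model_criterion(lam_u, b_u, b_r, Q, sigma2_hat):
    """Terms of the risk estimate that depend on lam_u when c = 0."""
    dist2 = np.sum((b_u - b_r) ** 2)           # equals ||Q theta_hat||^2
    return (1.0 - lam_u) ** 2 * dist2 + 2.0 * sigma2_hat * lam_u * np.trace(Q @ Q.T)

# Made-up inputs for illustration
b_u = np.array([1.0, 2.0])
b_r = np.array([0.5, 1.0])
Q = np.array([[0.3, 0.1],
              [0.0, 0.2]])
sigma2_hat = 1.0

b_js, shrink = js_combination(b_u, b_r, Q, sigma2_hat)

# Brute-force check: minimise the criterion over a grid of weights on b_u
grid = np.linspace(0.0, 1.0, 1001)
values = [two_model_criterion(l, b_u, b_r, Q, sigma2_hat) for l in grid]
lam_u_grid = grid[int(np.argmin(values))]
print(1.0 - shrink, lam_u_grid)
```

Setting the derivative of the criterion to zero gives $1 - \lambda_u = \hat{\sigma}^2\, tr(QQ') / \|b_u - b_r\|^2$, which is the shrinkage factor in the formula above.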

Optimal predictor
Let $\hat{\mu}_f = H\hat{\theta}_f$, where $\hat{\theta}_f = (b_f', \hat{\gamma}_f')'$ and
$$ \hat{\gamma}_f = \sum_{i=1}^{2^m} \lambda_i(\hat{\theta}, \hat{\sigma}^2)\, \hat{\gamma}_{(i)} $$
is the estimator of $\gamma$ corresponding to $b_f$.

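$$
\begin{aligned}
\widehat{MSE}(\hat{\mu}_f) = {} & \hat{\sigma}^2 X(X'X)^{-1}X' + \varphi(XQ, XQ, \hat{\theta}, \hat{\sigma}^2) - \varphi(XQ, ZD, \hat{\theta}, \hat{\sigma}^2) \\
& - \varphi(ZD, XQ, \hat{\theta}, \hat{\sigma}^2) + \varphi(ZD, ZD, \hat{\theta}, \hat{\sigma}^2),
\end{aligned}
$$
where
$$ \varphi(C_1, C_2, \hat{\theta}, \hat{\sigma}^2) = -\hat{\sigma}^2 C_1 C_2' + C_1 \{(I_m - W)\hat{\theta}\}^{\otimes 2} C_2' + C_1\, \Xi(\hat{\theta}, \hat{\sigma}^2)\, C_2' + C_1 \{\Xi(\hat{\theta}, \hat{\sigma}^2)\}' C_2', $$
$$ \Xi(\hat{\theta}, \hat{\sigma}^2) = \frac{n-k-m}{2}\,(\hat{\sigma}^2)^{-(n-k-m)/2+1} \int_0^{\hat{\sigma}^2} t^{(n-k-m)/2-1}\, \Xi_1(\hat{\theta}, t)\, dt $$
and
$$ \Xi_1(\hat{\theta}, t) = W + \sum_{i=1}^{2^m} \frac{\partial \lambda_i(\hat{\theta}, t)}{\partial \hat{\theta}}\, \hat{\theta}' W_i. $$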
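The four $\varphi$ terms can be traced back to a single expansion. The sketch below assumes $H = (X, Z)$ in the definition of the predictor and uses $b_f = b_r - QW\hat{\theta}$ and $\hat{\gamma}_f = DW\hat{\theta}$, which follow from the earlier definitions; $A$ stands for any $m \times m$ matrix and is introduced here only for illustration.

```latex
\begin{aligned}
\hat{\mu}_f &= X b_f + Z\hat{\gamma}_f = X b_r + (ZD - XQ)\,W\hat{\theta}, \\
(ZD - XQ)\,A\,(ZD - XQ)'
  &= XQ\,A\,(XQ)' - XQ\,A\,(ZD)' - ZD\,A\,(XQ)' + ZD\,A\,(ZD)' .
\end{aligned}
```

Each term on the right has the form $C_1 A C_2'$ with $C_1, C_2 \in \{XQ, ZD\}$, which is the pattern of the arguments passed to $\varphi$: the two terms with matching arguments enter with a plus sign and the two cross terms with a minus sign.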
