

  1. The Factor-Lasso and K-Step Bootstrap Approach for Inference in High-Dimensional Economic Applications

  Christian Hansen and Yuan Liao
  May 2017, Montreal

  2. Introduction

  ◮ Observe many control variables
  ◮ Two popular (formal) dimension reduction techniques:
    - Variable/model selection, e.g. lasso
    - Factor models

  3. Variable Selection Review (α parameter of interest):

  y_i = α d_i + x_i′β + ε_i
  d_i = x_i′γ + u_i

  1. Allow MANY control variables
  2. Impose SPARSITY on β, γ

  ◮ Literature: Belloni, Chernozhukov and Hansen (2012, REStud), etc.
  ◮ Weak dependence among x
  ◮ Just a few x have impact on y, d

  4. Large Factor Model Review (α parameter of interest):

  y_i = α d_i + f_i′β + ε_i
  d_i = f_i′γ + v_i
  x_i = Λ f_i + U_i

  1. Most of x have impact on y, d.
  2. Dimension of f_i is small.

  ◮ Literature: factor-augmented regressions, diffusion index forecasts (e.g. Bai and Ng (2003), Stock and Watson (2002))
  ◮ Generally results in strong dependence among x
  ◮ Regression directly on x will generally NOT produce sparse coefficients
  ◮ Does not worry about the “remaining information” in U_i

  5. What we aim to do

  A model that nests large factor models and variable selection:

  y_i = α d_i + f_i′θ_y + U_i′β + ε_i
  d_i = f_i′γ + U_i′θ_d + v_i
  x_i = Λ f_i + U_i

  1. U_i represents variation in observables not captured by factors.
  2. Estimation method: lasso on U_i.
  3. Justification of key assumptions for lasso:
     ◮ Weak dependence among regressors: most of the variation in x is driven by factors.
     ◮ Sparsity of θ: only a few x have “useful remaining information” after factors are controlled for.

  6. Some “why not” questions we had...

  1. Control for (f_i, x_i) instead of (f_i, U_i):

     y_i = α d_i + f_i′β + x_i′θ_y + ε_i
     d_i = f_i′γ + x_i′θ_d + v_i
     x_i = Λ f_i + U_i

     ◮ Within x_i: strongly correlated.
     ◮ Between x_i and f_i: strongly correlated.

  2. Use lots of factors:

     y_i = α d_i + f_i′β + ε_i
     d_i = f_i′γ + v_i
     x_i = Λ f_i + U_i

     ◮ Allow dim(f_i) to increase fast with p = dim(x_i).
     ◮ Assume (β, γ) sparse, then “lasso” them.
     ◮ Not a sufficient amount of “cross-sectional” information for factors.
     ◮ Estimating factors is either inconsistent or has a slow rate, impacting inference on α.

  7. Some “why not” questions we had...

  3. Sparse PCA:

     x_{i,l} = λ_l′ f_i + U_{i,l},   l = 1, ..., p,  i = 1, ..., n

     ◮ Most of (λ_1, ..., λ_p) are zero.
     ◮ Most of x do not depend on factors.
     ◮ Becomes a sparse model:

       y_i = α d_i + x_i′β + ε_i
       d_i = x_i′γ + u_i

  8. What we do

  y_i = α d_i + f_i′β + U_i′θ_y + ε_i
  d_i = f_i′γ + U_i′θ_d + v_i
  x_i = Λ f_i + U_i,   i = 1, ..., n

  ◮ Do not directly observe (f, U); (θ_y, θ_d) are sparse.
  ◮ dim(f_i), dim(α) are small.

  1. Estimate (f, U) from the third equation.
  2. Lasso on

     y_i − Ê(y_i | f_i) = Û_i′θ_new + ε_i^new,   where ε_i^new = α v_i + ε_i
     d_i − Ê(d_i | f_i) = Û_i′θ_d + v_i

  3. OLS on ε̂_i^new = α v̂_i + ε_i.
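A minimal numerical sketch of these three steps on simulated cross-sectional data, assuming PCA for the factor step and scikit-learn's Lasso for the penalized regressions; the data-generating process, the number of factors k, the penalty level lam, and all variable names are illustrative choices, not the paper's.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k, alpha_true = 200, 100, 3, 0.5

# Simulate x_i = Lambda f_i + U_i and the outcome/treatment equations.
f = rng.normal(size=(n, k))
Lam = rng.normal(size=(p, k))
U = rng.normal(size=(n, p))
x = f @ Lam.T + U
theta_d = np.zeros(p)
theta_d[:3] = 1.0                      # sparse "remaining information" in d
theta_y = np.zeros(p)
theta_y[:3] = 0.5                      # sparse "remaining information" in y
d = f @ np.ones(k) + U @ theta_d + rng.normal(size=n)
y = alpha_true * d + f @ np.ones(k) + U @ theta_y + rng.normal(size=n)

# Step 1: estimate (f, U) from x by PCA (top-k principal components).
x_c = x - x.mean(axis=0)
_, _, Vt = np.linalg.svd(x_c, full_matrices=False)
f_hat = x_c @ Vt[:k].T                 # estimated factor space
U_hat = x_c - f_hat @ Vt[:k]           # estimated idiosyncratic part

# Partial the estimated factors out of y and d by OLS.
def resid_on(F, v):
    coef, *_ = np.linalg.lstsq(F, v, rcond=None)
    return v - F @ coef

y_t, d_t = resid_on(f_hat, y), resid_on(f_hat, d)

# Step 2: lasso of the factor residuals on U_hat; keep the lasso residuals.
lam = 0.1                              # illustrative penalty level
eps_new = y_t - Lasso(alpha=lam).fit(U_hat, y_t).predict(U_hat)
v_hat = d_t - Lasso(alpha=lam).fit(U_hat, d_t).predict(U_hat)

# Step 3: OLS of the y-residual on the d-residual recovers alpha.
alpha_hat = (v_hat @ eps_new) / (v_hat @ v_hat)
print(alpha_hat)
```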

  9. Extensions: I, II

  I: Endogenous treatment

     y_i = α d_i + f_i′β + U_i′θ_y + ε_i
     d_i = π z_i + f_i′γ + U_i′θ_d + v_i
     z_i = f_i′ψ + U_i′θ_z + u_i
     x_i = Λ f_i + U_i,   i = 1, ..., n

  II: Diffusion index forecast

     y_{t+h} = α y_t + f_t′β + U_t′θ + ε_{t+h}
     x_t = Λ f_t + U_t,   t = 1, ..., T.

     Include U_t to capture idiosyncratic information in x_t.

  10. Extensions: III: Panel data

  What we focused on in this paper:

     y_it = α d_it + (λ_t^y)′ f_i + U_it′θ_y + µ_i^y + δ_t^y + ε_it
     d_it = (λ_t^d)′ f_i + U_it′θ_d + µ_i^d + δ_t^d + η_it
     X_it = Λ_t f_i + µ_i^X + δ_t^X + U_it,   i ≤ n, t ≤ T,  dim(X_it) = p

  ◮ µ_i and δ_t are unrestricted individual and time effects.
  ◮ p → ∞, n → ∞.
  ◮ T is either fixed or growing but satisfies T = o(n), because we need accurate estimation of U_it, which relies on estimating Λ_t.
  ◮ n = o(p²), because we need accurate estimation of f_i.

  11. Asymptotic Normality

  Define

     σ_ηε = Var( (1/√(nT)) Σ_{i,t} (η_it − η̄_i)(ε_it − ε̄_i) ),    σ̂_ηε = (1/(nT)) Σ_i ( Σ_t η̂_it ε̂_it )²
     σ²_η = (1/(nT)) Σ_{i,t} E(η_it − η̄_i)²,                        σ̂²_η = (1/(nT)) Σ_{i,t} η̂_it²

  Then

     √(nT) σ²_η σ_ηε^{−1/2} (α̂ − α) →_d N(0, 1)
     √(nT) σ̂²_η σ̂_ηε^{−1/2} (α̂ − α) →_d N(0, 1)

  Additional comments:
  ◮ Not clear that you could get these results even if λ_t^y = 0 were known, due to strong dependence in X resulting from the presence of factors.
  ◮ First taking care of the factor structure in X seems potentially important.
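One concrete reading of the feasible statistic above, as a sketch: a small helper that forms the implied plug-in standard error and a 95% interval. Here eta_hat and eps_hat are hypothetical (n, T) arrays of within-demeaned residuals from the final partialled-out regression; the function name and the clustered form of σ̂_ηε follow my reading of the display, not code from the paper.

```python
import numpy as np

def plugin_ci_95(alpha_hat, eta_hat, eps_hat):
    """95% interval from sqrt(nT) * s_eta2 * (alpha_hat - alpha) / sqrt(s_cross) ~ N(0,1)."""
    n, T = eta_hat.shape
    # sigma_hat_{eta eps}: clustered-by-i variance of the score eta*eps
    s_cross = np.sum(np.sum(eta_hat * eps_hat, axis=1) ** 2) / (n * T)
    # sigma_hat^2_eta: average squared treatment-equation residual
    s_eta2 = np.mean(eta_hat ** 2)
    se = np.sqrt(s_cross) / (s_eta2 * np.sqrt(n * T))
    return alpha_hat - 1.96 * se, alpha_hat + 1.96 * se
```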

  12. Extensions of Inference I: K-Step Bootstrap

  An alternative to inference from the plug-in asymptotic distribution is bootstrap inference.

  Full bootstrap lasso:
  ◮ Generate bootstrap data (X_i*, Y_i*).
  ◮ Solve β̂* = argmin_β (1/n) Σ_{i=1}^n (Y_i* − X_i*′β)² + λ‖β‖_1.
  ◮ Repeat B times.

  Full bootstrap lasso is potentially burdensome.
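For concreteness, the full bootstrap lasso loop might look like the sketch below, re-solving the penalized problem from scratch on every bootstrap draw. The pairs (row) resampling scheme, the penalty level, and the use of scikit-learn's Lasso are illustrative choices of this sketch.

```python
import numpy as np
from sklearn.linear_model import Lasso

def full_bootstrap_lasso(X, Y, lam=0.1, B=500, seed=0):
    """Re-optimize the lasso from scratch on each of B bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    betas = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)                      # bootstrap data (X*, Y*)
        betas.append(Lasso(alpha=lam).fit(X[idx], Y[idx]).coef_)
    return np.array(betas)                                    # B x p array of beta*
```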

  13. K-Step Bootstrap

  Consider a K-step bootstrap as in Andrews (2002):
  ◮ Start lasso at the full-sample solution β̂_lasso.
  ◮ For each bootstrap dataset, initialize at β̂_0* = β̂_lasso.
  ◮ Employ iterative algorithms: obtain β̂_0* = β̂_lasso ⇒ β̂_1* ⇒ ... ⇒ β̂_k*.
  ◮ Similar to Andrews (2002), each step is in closed form, so it is fast even in large problems.
  ◮ Different from Andrews (2002), each step is still an ℓ_1-penalized problem.
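A sketch of the k-step idea: warm-start every bootstrap problem at the full-sample lasso solution and take only k closed-form coordinate-descent sweeps rather than solving to convergence (the coordinate update itself is spelled out on the next slide). The unweighted penalty, the pairs resampling, and all function names are assumptions of this sketch.

```python
import numpy as np

def cd_sweep(beta, X, Y, lam):
    """One coordinate-descent pass on (1/n)||Y - X beta||^2 + lam * ||beta||_1."""
    n = len(Y)
    for j in range(X.shape[1]):
        r_j = Y - X @ beta + X[:, j] * beta[j]          # partial residual excluding column j
        z = X[:, j] @ r_j / n
        denom = X[:, j] @ X[:, j] / n
        beta[j] = np.sign(z) * max(abs(z) - lam / 2, 0.0) / denom   # soft-threshold update
    return beta

def kstep_bootstrap(X, Y, beta_lasso, lam=0.1, k=3, B=500, seed=0):
    """K-step bootstrap: warm start at beta_lasso, take k cheap sweeps per draw."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    out = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)                 # bootstrap sample (X*, Y*)
        beta = beta_lasso.copy()                         # initialize at full-sample solution
        for _ in range(k):
            beta = cd_sweep(beta, X[idx], Y[idx], lam)
        out.append(beta)
    return np.array(out)
```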

  14. Coordinate descent (Fu 1998)

  ◮ Update one component at a time, fixing the remaining components at their current values:

     min_{β_j} (1/n) Σ_i ( Y_i* − X_{i,−j}*′ β̂_{ℓ,−j}* − X_ij* β_j )² + λ|ψ_j β_j|  =  min_{β_j} L_ℓ(β_j) + λ|ψ_j β_j|

     β̂_{ℓ+1,j}* = argmin_{β_j} L_ℓ(β_j) + λ|ψ_j β_j|,   for j = 1, ..., p,

    where the components other than j are treated as known from the previous step.

  ◮ Each β̂_{ℓ+1,j}* is in closed form: soft-thresholding.

     argmin_{β ∈ R} (1/2)(z − β)² + λ|β| = sgn(z) max(|z| − λ, 0)
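The scalar closed form can be checked directly: for the one-dimensional problem (1/2)(z − β)² + λ|β|, the minimizer equals the soft-thresholding of z. A quick numerical comparison, with an arbitrary grid and arbitrary values of z and λ:

```python
import numpy as np

def soft_threshold(z, lam):
    """Closed-form minimizer of 0.5*(z - b)^2 + lam*|b| over b."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

z, lam = 0.7, 0.3
grid = np.linspace(-2, 2, 400001)
brute = grid[np.argmin(0.5 * (grid - z) ** 2 + lam * np.abs(grid))]
print(soft_threshold(z, lam), brute)   # both approximately 0.4
```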

  15. Faster methods

  ◮ “Composite gradient descent” (Nesterov 07; Agarwal et al. 12, Ann. Statist.): update the entire vector at once,

     β̂_{l+1}* = argmin_β (β − β̂_l*)′ V (β − β̂_l*) + b′(β − β̂_l*) + λ‖ψβ‖_1.

    The original problem uses the full matrix V. Replace V by (h/2) × identity ⇒ the entire vector update is in closed form: soft-thresholding.

  ◮ Choosing h: if the dimension is small, use h = 2 λ_max(V) to “majorize” V.
    If the dimension is large, 2 λ_max(V) is unbounded (Johnstone 01).
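A sketch of one such full-vector step for Q(β) = (1/n)‖Y − Xβ‖² + λ‖ψβ‖_1, with V = X′X/n replaced by (h/2) times the identity so the update is a single soft-thresholding of the whole vector (an ISTA-style step). The step size h and the function name are illustrative, not the paper's recommendation.

```python
import numpy as np

def composite_gradient_step(beta, X, Y, lam, psi, h):
    """One full-vector step: soft-threshold the gradient step of the smooth part."""
    n = len(Y)
    grad = -2.0 / n * X.T @ (Y - X @ beta)       # gradient b of the smooth part at beta
    c = beta - grad / h                          # minimizer of the (h/2)-quadratic surrogate
    return np.sign(c) * np.maximum(np.abs(c) - lam * psi / h, 0.0)
```

Iterating this map from β̂_0* = β̂_lasso for k steps gives a k-step update in which the entire coefficient vector is refreshed at once, rather than one coordinate at a time.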

  16. General Conditions for Iterative Algorithms

     Q(β) = (1/n)‖Y* − X*β‖²_2 + λ‖Ψβ‖_1

  Suppose β̂_k* satisfies:

  1. The minimization error is smaller than the statistical error:
     Q(β̂_k*) ≤ min_β Q(β) + o_{P*}(|β̂ − β_0|).
  2. Sparsity: |β̂_k*|_0 = O_{P*}(|J|_0). This can be directly verified using the KKT condition.

  We verified both conditions for coordinate descent (Fu 98).
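As a small illustration of the kind of KKT check referenced in condition 2, the helper below measures how far a candidate β̂ is from satisfying the stationarity conditions of Q(β) = (1/n)‖Y − Xβ‖² + λ‖ψβ‖_1: on the active set the column-residual correlation should equal λψ_j sgn(β̂_j), and elsewhere it should be at most λψ_j in absolute value. Names and the factor-of-two convention follow that objective; this is a sketch, not the paper's verification argument.

```python
import numpy as np

def kkt_violations(beta, X, Y, lam, psi):
    """Max KKT violations on the active and inactive coordinates (both ~0 at an exact solution)."""
    n = len(Y)
    corr = 2.0 / n * X.T @ (Y - X @ beta)        # negative gradient of the smooth part
    active = beta != 0
    v_active = np.abs(corr[active] - lam * psi[active] * np.sign(beta[active]))
    v_inactive = np.maximum(np.abs(corr[~active]) - lam * psi[~active], 0.0)
    return np.max(v_active, initial=0.0), np.max(v_inactive, initial=0.0)
```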

  17. Bootstrap Confidence Interval

  Let q*_{τ/2} be the τ/2-th upper quantile of { √(nT) |α̂_b* − α̂| : b = 1, ..., B }.

  The k-step bootstrap does not affect the first-order asymptotics (proved for the linear model).

  ◮ P( α ∈ α̂ ± q*_{τ/2} / √(nT) ) → 1 − τ.
  ◮ Extendable to nonlinear models with orthogonality conditions.
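A sketch of forming this interval from the bootstrap draws; alpha_boot is a hypothetical array of the B bootstrap estimates α̂_b*, and the function name is illustrative.

```python
import numpy as np

def bootstrap_ci(alpha_hat, alpha_boot, nT, tau=0.05):
    """Interval alpha_hat +/- q*_{tau/2}/sqrt(nT) from the k-step bootstrap draws."""
    stats = np.sqrt(nT) * np.abs(np.asarray(alpha_boot) - alpha_hat)
    q = np.quantile(stats, 1.0 - tau / 2)        # upper tau/2 quantile
    half = q / np.sqrt(nT)
    return alpha_hat - half, alpha_hat + half
```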

  18. Technical remarks

  ◮ We spent most of the time proving that the effect of estimating (f, U) is first-order negligible under the weakest possible conditions on (n, T, p).
  ◮ This requires bounding weighted errors of the form:

     max_{d ≤ p} | (1/n) Σ_i (f̂_i − f_i) w_id |,   max_{d ≤ p} | (1/(nT)) Σ_{i,t} (f̂_i − f_i) z_{it,d} |

  ◮ These are easy to bound using Cauchy-Schwarz and (1/n) Σ_i ‖f̂_i − f_i‖², but that is very crude, leading to stronger-than-necessary conditions.
  ◮ Need to use the expansion of f̂_i − f_i (f̂_i = PCA estimator).
  ◮ If f̂_i has no closed form (e.g., MLE), need its Bahadur expansion.

  19. Extensions of Inference: II, III

  II: Factor-augmented regression:

     y_t = α d_t + f_t′β + U_t′θ_y + ε_t
     d_t = f_t′γ + U_t′θ_d + v_t
     x_t = Λ f_t + U_t,   t = 1, ..., T

     ◮ α ⊥ E(y_t | f_t, U_t), E(d_t | f_t, U_t): lasso does NOT affect the first-order asymptotics (Robinson 88, Andrews 94, Chernozhukov et al. 16).
     ◮ Apply HAC (Newey-West).

  III: Out-of-sample forecast interval:

     y_{t+h} = α y_t + f_t′β + U_t′θ + ε_{t+h},   where y_{t+h|t} = α y_t + f_t′β + U_t′θ
     x_t = Λ f_t + U_t,   t = 1, ..., T.

     ◮ y_{T+h|T} is not orthogonal to U_t′θ: lasso estimation of U_t′θ DOES affect the confidence interval for y_{T+h|T}.
