De-biasing arbitrary convex regularizers and asymptotic normality



  1. De-biasing arbitrary convex regularizers and asymptotic normality. Pierre C. Bellec, Rutgers University. Mathematical Methods of Modern Statistics 2, June 2020.

  2. Joint work with Cun-Hui Zhang (Rutgers). ◮ Second order Poincaré inequalities and de-biasing arbitrary convex regularizers, arXiv:1912.11943. ◮ De-biasing the Lasso with degrees-of-freedom adjustment, arXiv:1902.08885.

  3. High-dimensional statistics ◮ $n$ data points $(x_i, Y_i)$, $i = 1, \dots, n$ ◮ $p$ covariates, $x_i \in \mathbb{R}^p$, in regimes where $p$ is large: $p \ge n^\alpha$, $p \ge n$, or $p \ge cn$. For instance, the linear model $Y_i = x_i^\top \beta + \epsilon_i$ for unknown $\beta$.

  4. M-estimators and regularization
  $$\hat\beta = \arg\min_{b \in \mathbb{R}^p} \Big\{ \frac{1}{n} \sum_{i=1}^n \ell(x_i^\top b, Y_i) + \mathrm{regularizer}(b) \Big\}$$
  for some loss $\ell(\cdot, \cdot)$ and regularization penalty. Typically in the linear model, with the least-squares loss,
  $$\hat\beta = \arg\min_{b \in \mathbb{R}^p} \big\{ \|y - Xb\|^2 / (2n) + g(b) \big\}$$
  with $g$ convex. Examples: ◮ Lasso, Elastic-Net ◮ Bridge, $g(b) = \sum_{j=1}^p |b_j|^c$ ◮ Group-Lasso ◮ Nuclear norm penalty ◮ Sorted L1 penalty (SLOPE)
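  As a concrete illustration, here is a minimal sketch (not the speaker's code) of computing such an estimator with cvxpy on toy data; the dimensions and the tuning parameter `lam` are arbitrary assumptions, and swapping the penalty line yields any of the convex examples above.

  ```python
  import cvxpy as cp
  import numpy as np

  rng = np.random.default_rng(0)
  n, p = 100, 50
  X = rng.standard_normal((n, p))
  y = X[:, 0] + rng.standard_normal(n)   # toy data with beta = e_1

  b = cp.Variable(p)
  lam = 0.1                              # hypothetical tuning parameter
  # g(b) = lam * ||b||_1 (Lasso); any convex penalty works here, e.g. a sum of
  # group norms (Group-Lasso) or cp.normNuc on a matrix-shaped variable.
  objective = cp.sum_squares(y - X @ b) / (2 * n) + lam * cp.norm1(b)
  cp.Problem(cp.Minimize(objective)).solve()
  beta_hat = b.value                     # fitted coefficient vector
  ```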

  5. Different goals, different scales
  $$\hat\beta = \arg\min_{b \in \mathbb{R}^p} \big\{ \|y - Xb\|^2 / (2n) + g(b) \big\}, \quad g \text{ convex.}$$
  1. Design of the regularizer $g$ with intuition about complexity and structure ◮ convex relaxation of unknown structure (sparsity, low rank) ◮ $\ell_1$ balls are spiky at sparse vectors.
  2. Upper and lower bounds on the risk of $\hat\beta$: $c\, r_n \le \|\hat\beta - \beta\|^2 \le C\, r_n$.
  3. Characterization of the risk, $\|\hat\beta - \beta\|^2 = r_n (1 + o_P(1))$, under some asymptotics, e.g., $p/n \to \gamma$ or $s \log(p/s)/n \to 0$.
  4. Asymptotic distribution in a fixed direction $a_0 \in \mathbb{R}^p$ (resp. $a_0 = e_j$) and confidence interval for $a_0^\top \beta$ (resp. $\beta_j$):
  $$\sqrt{n}\, a_0^\top (\hat\beta - \beta) \overset{?}{\to} N(0, V_0), \qquad \sqrt{n}\, (\hat\beta_j - \beta_j) \overset{?}{\to} N(0, V_j).$$

  6. Focus of today: confidence intervals in the linear model based on convex regularized estimators of the form
  $$\hat\beta = \arg\min_{b \in \mathbb{R}^p} \big\{ \|y - Xb\|^2 / (2n) + g(b) \big\}, \quad g \text{ convex,}$$
  $$\sqrt{n}\, (\hat\beta_j - \beta_j) \Rightarrow N(0, V_j), \quad \beta_j \text{ the unknown parameter of interest.}$$

  7. Confidence interval in the linear model. $y = X\beta + \varepsilon$, with design $X$ with iid $N(0, \Sigma)$ rows, known $\Sigma$, noise $\varepsilon \sim N(0, \sigma^2 I_n)$, and a given initial estimator $\hat\beta$. Goal: inference for $\theta = a_0^\top \beta$, the projection in direction $a_0$. Examples: ◮ $a_0 = e_j$: inference on the $j$-th coefficient $\beta_j$ ◮ $a_0 = x_{\mathrm{new}}$, the characteristics of a new patient: inference for $x_{\mathrm{new}}^\top \beta$.

  8. De-biasing, confidence intervals for the Lasso

  9. Confidence interval in the linear model. Recall the setting: $y = X\beta + \varepsilon$, design $X$ with iid $N(0, \Sigma)$ rows, known $\Sigma$, noise $\varepsilon \sim N(0, \sigma^2 I_n)$, and a given initial estimator $\hat\beta$; the goal is inference for $\theta = a_0^\top \beta$ (e.g., $a_0 = e_j$ for the $j$-th coefficient, or $a_0 = x_{\mathrm{new}}$ for a new patient). De-biasing: construct an unbiased estimate in the direction $a_0$, i.e., find a correction such that $a_0^\top \hat\beta - \text{correction}$ is an unbiased estimator of $a_0^\top \beta$.

  10. Existing results. Lasso: ◮ Zhang and Zhang (2014) ($s \log(p/s)/n \to 0$) ◮ Javanmard and Montanari (2014a; 2014b; 2018) ($s \log(p/s)/n \to 0$) ◮ van de Geer et al. (2014) ($s \log(p/s)/n \to 0$) ◮ Bayati and Montanari (2012); Miolane and Montanari (2018) ($p/n \to \gamma$). Beyond the Lasso? ◮ Robust M-estimators: El Karoui et al. (2013); Lei, Bickel, and El Karoui (2018); Donoho and Montanari (2016) ($p/n \to \gamma$) ◮ Celentano and Montanari (2019): symmetric convex penalty ($\Sigma = I_p$, $p/n \to \gamma$), using Approximate Message Passing ideas from statistical physics ◮ Logistic regression: Sur and Candès (2018) ($\Sigma = I_p$, $p/n \to \gamma$).

  11. Focus today: a general theory for confidence intervals based on any convex regularized estimator of the form
  $$\hat\beta = \arg\min_{b \in \mathbb{R}^p} \big\{ \|y - Xb\|^2 / (2n) + g(b) \big\}, \quad g \text{ convex,}$$
  with little or no constraint on the convex regularizer $g$.

  12. Degrees of freedom of the estimator
  $$\hat\beta = \arg\min_{b \in \mathbb{R}^p} \big\{ \|y - Xb\|^2 / (2n) + g(b) \big\}$$
  ◮ the map $y \mapsto X\hat\beta$ for fixed $X$ is 1-Lipschitz ◮ hence the Jacobian of $y \mapsto X\hat\beta$ exists almost everywhere (Rademacher's theorem), and
  $$\hat{\mathrm{df}} = \operatorname{trace}\big[ \nabla (y \mapsto X\hat\beta) \big] = \operatorname{trace}\Big[ X\, \frac{\partial \hat\beta(X, y)}{\partial y} \Big],$$
  used for instance in Stein's Unbiased Risk Estimate (SURE). The Jacobian matrix
  $$\hat H = X\, \frac{\partial \hat\beta(X, y)}{\partial y} \in \mathbb{R}^{n \times n}$$
  is also useful; $\hat H$ is always symmetric.¹
  ¹ P.C.B. and C.-H. Zhang (2019), Second order Poincaré inequalities and de-biasing arbitrary convex regularizers when $p/n \to \gamma$.
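  Since $\hat H$ rarely has a closed form, a convenient numerical check is a finite-difference Hutchinson estimate of the divergence, in the spirit of Monte Carlo SURE. The sketch below is an assumption of this write-up, not the paper's method; `fit` is any black box mapping $y$ to $X\hat\beta$ with $X$ held fixed.

  ```python
  import numpy as np

  def estimate_df(fit, y, eps=1e-4, n_probe=50, seed=0):
      """Estimate df_hat = trace(d fit(y) / dy) by random probing."""
      rng = np.random.default_rng(seed)
      base = fit(y)
      total = 0.0
      for _ in range(n_probe):
          u = rng.standard_normal(y.shape)               # probe with E[u u^T] = I_n
          total += u @ (fit(y + eps * u) - base) / eps   # ~ u^T H_hat u
      return total / n_probe                             # E[u^T H u] = trace(H)
  ```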

  13. Isotropic design, any $g$, $p/n \to \gamma$ (B. and Zhang, 2019). Assumptions: ◮ a sequence of linear regression problems $y = X\beta + \varepsilon$ ◮ with $n, p \to +\infty$ and $p/n \to \gamma \in (0, \infty)$ ◮ $g : \mathbb{R}^p \to \mathbb{R}$ a coercive convex penalty, strongly convex if $\gamma \ge 1$ ◮ rows of $X$ iid $N(0, I_p)$ ◮ noise $\varepsilon \sim N(0, \sigma^2 I_n)$ independent of $X$.

  14. Isotropic design, any penalty $g$, $p/n \to \gamma$. Theorem (B. and Zhang, 2019). Let
  $$\hat\beta = \arg\min_{b \in \mathbb{R}^p} \big\{ \|y - Xb\|^2 / (2n) + g(b) \big\}$$
  ◮ $\beta_j = \langle e_j, \beta \rangle$ the parameter of interest ◮ $\hat H = X (\partial/\partial y) \hat\beta$, $\hat{\mathrm{df}} = \operatorname{trace} \hat H$ ◮ $\hat V(\beta_j) = \|y - X\hat\beta\|^2 + \operatorname{trace}[(\hat H - I_n)^2]\, (\hat\beta_j - \beta_j)^2$.
  Then there exists a subset $J_p \subset [p]$ of size at least $p - \log\log p$ such that
  $$\sup_{j \in J_p} \sup_{t \in \mathbb{R}} \Big| P\Big( \frac{(n - \hat{\mathrm{df}})(\hat\beta_j - \beta_j) + e_j^\top X^\top (y - X\hat\beta)}{\hat V(\beta_j)^{1/2}} \le t \Big) - \Phi(t) \Big| \to 0.$$
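  In a simulation where the true $\beta$ is known, this pivot can be checked coordinate by coordinate. A minimal sketch, assuming $\hat{\mathrm{df}}$ and $\hat H$ are available (e.g., from the numerical estimator above):

  ```python
  import numpy as np

  def pivot_isotropic(j, X, y, beta_hat, beta_true, df_hat, H_hat):
      """Pivot from the theorem for coordinate j; ~ N(0,1) for most j."""
      n = X.shape[0]
      resid = y - X @ beta_hat
      # trace[(H_hat - I_n)^2]: since H_hat is symmetric this is a sum of squares
      tr2 = np.sum((H_hat - np.eye(n)) ** 2)
      V_hat = resid @ resid + tr2 * (beta_hat[j] - beta_true[j]) ** 2
      num = (n - df_hat) * (beta_hat[j] - beta_true[j]) + X[:, j] @ resid
      return num / np.sqrt(V_hat)
  ```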

  15. Correlated design, any $g$, $p/n \to \gamma$. Assumptions: ◮ a sequence of linear regression problems $y = X\beta + \varepsilon$ ◮ with $n, p \to +\infty$ and $p/n \to \gamma \in (0, \infty)$ ◮ $g : \mathbb{R}^p \to \mathbb{R}$ a coercive convex penalty, strongly convex if $\gamma \ge 1$ ◮ rows of $X$ iid $N(0, \Sigma)$ ◮ noise $\varepsilon \sim N(0, \sigma^2 I_n)$ independent of $X$.

  16. Correlated design, any penalty $g$, $p/n \to \gamma$. Theorem (B. and Zhang, 2019). Let
  $$\hat\beta = \arg\min_{b \in \mathbb{R}^p} \big\{ \|y - Xb\|^2 / (2n) + g(b) \big\}$$
  ◮ $\theta = \langle a_0, \beta \rangle$ the parameter of interest ◮ $\hat H = X (\partial/\partial y) \hat\beta$, $\hat{\mathrm{df}} = \operatorname{trace} \hat H$ ◮ $\hat V(\theta) = \|y - X\hat\beta\|^2 + \operatorname{trace}[(\hat H - I_n)^2]\, (\langle a_0, \hat\beta \rangle - \theta)^2$ ◮ assume $a_0^\top \Sigma a_0 = 1$ and set $z_0 = X \Sigma^{-1} a_0$, so that $\langle z_0, y - X\hat\beta \rangle$ reduces to $e_j^\top X^\top (y - X\hat\beta)$ when $\Sigma = I_p$ and $a_0 = e_j$.
  Then there exists a subset $S \subset S^{p-1}$ with relative volume $|S| / |S^{p-1}| \ge 1 - 2 e^{-p^{0.99}}$ such that
  $$\sup_{a_0 \in \Sigma^{1/2} S} \sup_{t \in \mathbb{R}} \Big| P\Big( \frac{(n - \hat{\mathrm{df}})(\langle \hat\beta, a_0 \rangle - \theta) + \langle z_0, y - X\hat\beta \rangle}{\hat V(\theta)^{1/2}} \le t \Big) - \Phi(t) \Big| \to 0.$$
  This applies to at least $p - \phi_{\mathrm{cond}}(\Sigma) \log\log p$ indices $j \in [p]$.

  17. Resulting 0.95 confidence interval
  $$\hat{\mathrm{CI}} = \Big\{ \theta \in \mathbb{R} : \Big| \frac{(n - \hat{\mathrm{df}})(\langle \hat\beta, a_0 \rangle - \theta) + \langle z_0, y - X\hat\beta \rangle}{\hat V(\theta)^{1/2}} \Big| \le 1.96 \Big\}$$
  Variance approximation: typically $\hat V(\theta) \approx \|y - X\hat\beta\|^2$, in which case the interval becomes
  $$\hat{\mathrm{CI}}_{\mathrm{approx}} = \Big\{ \theta \in \mathbb{R} : \Big| \frac{(n - \hat{\mathrm{df}})(\langle \hat\beta, a_0 \rangle - \theta) + \langle z_0, y - X\hat\beta \rangle}{\|y - X\hat\beta\|} \Big| \le 1.96 \Big\}$$
  and its length is $2 \cdot 1.96\, \|y - X\hat\beta\| / (n - \hat{\mathrm{df}})$.
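  Solving the inequality in $\hat{\mathrm{CI}}_{\mathrm{approx}}$ for $\theta$ gives an explicit interval. A minimal sketch, assuming known $\Sigma$, the normalization $a_0^\top \Sigma a_0 = 1$, and a $\hat{\mathrm{df}}$ from an explicit formula or the numerical estimator of slide 12:

  ```python
  import numpy as np

  def debiased_ci(X, y, beta_hat, df_hat, a0, Sigma, z_crit=1.96):
      """Approximate 95% CI for theta = <a0, beta>, using V_hat ~ ||y - X beta_hat||^2."""
      n = X.shape[0]
      z0 = X @ np.linalg.solve(Sigma, a0)                    # z0 = X Sigma^{-1} a0
      resid = y - X @ beta_hat
      center = a0 @ beta_hat + (z0 @ resid) / (n - df_hat)   # de-biased estimate
      half = z_crit * np.linalg.norm(resid) / (n - df_hat)   # half-width
      return center - half, center + half
  ```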

  18. Simulations using the approximation $\hat V(\theta) \approx \|y - X\hat\beta\|^2$. $n = 750$, $p = 500$, correlated $\Sigma$. $\beta$ is the vectorization of a row-sparse matrix of size $25 \times 20$; $a_0$ is a direction that leads to large initial bias. Estimators: 7 different penalty functions ◮ Group-Lasso with tuning parameters $\mu_1, \mu_2$ ◮ Lasso with tuning parameters $\lambda_1, \dots, \lambda_4$ ◮ nuclear norm penalty. Boxplots of the initial errors $\sqrt{n}\, a_0^\top (\hat\beta - \beta)$ (biased!).
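  The slide does not fully specify the simulation design; a hypothetical data-generating sketch consistent with its description (the AR(1) covariance, the 5 nonzero rows, and the unit noise level are all assumptions):

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  n, p = 750, 500                          # dimensions from the slide
  idx = np.arange(p)
  Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # assumed AR(1) correlation
  B = np.zeros((25, 20))                   # beta = vec(B), row-sparse, 25*20 = p
  B[:5, :] = rng.standard_normal((5, 20))  # 5 nonzero rows (assumed)
  beta = B.ravel()
  X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T  # rows iid N(0, Sigma)
  y = X @ beta + rng.standard_normal(n)    # noise sigma = 1 (assumed)
  ```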

  19. Simulations using the approximation $\hat V(\theta) \approx \|y - X\hat\beta\|^2$. Same setting: $n = 750$, $p = 500$, correlated $\Sigma$, $\beta$ the vectorization of a row-sparse matrix of size $25 \times 20$, and the same 7 penalty functions ◮ Group-Lasso with tuning parameters $\mu_1, \mu_2$ ◮ Lasso with tuning parameters $\lambda_1, \dots, \lambda_4$ ◮ nuclear norm penalty. Boxplots of the de-biased quantities $\sqrt{n}\, [a_0^\top (\hat\beta - \beta) + z_0^\top (y - X\hat\beta)]$.

  20. Before/after bias correction

  21. QQ-plots, Lasso, $\lambda_1, \lambda_2, \lambda_3, \lambda_4$. For the Lasso, $\hat{\mathrm{df}} = |\{ j = 1, \dots, p : \hat\beta_j \ne 0 \}|$. Pivotal quantity when using $\|y - X\hat\beta\|^2$ instead of $\hat V(\theta)$ for the variance. ◮ The visible discrepancy in the last plot is fixed when using $\hat V(\theta)$ instead.
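  In code, this degrees-of-freedom adjustment is just the size of the Lasso's active set, counted up to numerical tolerance:

  ```python
  import numpy as np

  df_hat = int(np.sum(np.abs(beta_hat) > 1e-10))   # |{j : beta_hat_j != 0}|
  ```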

  22. QQ-plots, Group-Lasso, $\mu_1, \mu_2$. An explicit formula for $\hat{\mathrm{df}}$ is available.

  23. QQ-plots, nuclear norm penalty. No explicit formula for $\hat{\mathrm{df}}$ is available, although it is possible to compute numerical approximations.
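  One way to obtain such a numerical approximation is to combine any solver for the nuclear-norm problem with the divergence estimator sketched after slide 12. Below is a self-contained proximal-gradient (ISTA) sketch, an assumption of this write-up rather than the speaker's implementation:

  ```python
  import numpy as np

  def fit_nuclear(y, X, lam, shape=(25, 20), n_iter=500):
      """Return X @ beta_hat for nuclear-norm-penalized least squares."""
      n, p = X.shape
      step = n / np.linalg.norm(X, 2) ** 2      # 1/L for the smooth part
      b = np.zeros(p)
      for _ in range(n_iter):
          grad = X.T @ (X @ b - y) / n          # gradient of ||y - Xb||^2 / (2n)
          U, s, Vt = np.linalg.svd((b - step * grad).reshape(shape),
                                   full_matrices=False)
          s = np.maximum(s - step * lam, 0.0)   # soft-threshold singular values
          b = ((U * s) @ Vt).ravel()            # prox of step * lam * ||.||_*
      return X @ b

  # Numerical df_hat via the Monte Carlo estimator from slide 12:
  # df_hat = estimate_df(lambda y_: fit_nuclear(y_, X, lam), y)
  ```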

  24. Summary of the main result.² Asymptotic normality, and a valid $1 - \alpha$ confidence interval, by de-biasing any convex regularized M-estimator. ◮ Asymptotics $p/n \to \gamma$ ◮ Gaussian design with known covariance matrix $\Sigma$ ◮ strong convexity of the penalty required if $\gamma \ge 1$; otherwise any penalty is allowed.
  ² P.C.B. and C.-H. Zhang (2019), Second order Poincaré inequalities and de-biasing arbitrary convex regularizers when $p/n \to \gamma$.

  25. Time permitting: 1. Necessity of the degrees-of-freedom adjustment 2. Central limit theorems and second order Poincaré inequalities 3. Unknown $\Sigma$.
