  1. De-biasing the Lasso: Optimal Sample Size for Gaussian Designs. Adel Javanmard, USC Marshall School of Business, Data Science and Operations Department. Based on joint work with Andrea Montanari. October 2015.

  2. An example. Kaggle challenge: identify patients diagnosed with type-2 diabetes.

  3. Statistical model. Data $(Y_1, X_1), \dots, (Y_n, X_n)$: $Y_i \in \{0, 1\}$ indicates whether patient $i$ gets type-2 diabetes, and $X_i \in \mathbb{R}^p$ collects the features of patient $i$. Model: $Y_i \sim f_{\theta_0}(\cdot \mid X_i)$ for a parameter vector $\theta_0 \in \mathbb{R}^p$, where $\theta_{0,j}$ is the contribution of feature $j$.


  4. Regularized estimator: $\hat{\theta} \equiv \operatorname{argmin}_{\theta \in \mathbb{R}^p} \{ L(\theta) + \lambda \|\theta\|_1 \}$, where $L(\theta)$ is the logistic loss and $\lambda \|\theta\|_1$ is the regularizer. This is a convex optimization problem, and the $\ell_1$ penalty performs variable selection.

  5. Practice Fusion data set (Kaggle). Database: $n = 500$ patients, $p = 805$ items of medical information (medications, lab results, diagnoses, ...).

  6. [Figure: estimated coefficients $\hat{\theta}$ across the 805 features, roughly in the range $-0.5$ to $0.4$; labeled features include blood pressure, bilirubin, globulin, (HDL) cholesterol, and year of birth.] Regularized logistic regression selects 62 features ($\lambda$ chosen via cross-validation; resulting AUC = 0.75). Shall we trust our findings?
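A minimal sketch of this kind of fit in scikit-learn; the random matrices below merely stand in for the Practice Fusion records (which are not reproduced here), and `LogisticRegressionCV` selects the regularization strength by cross-validation, analogous to the choice of $\lambda$ on the slide:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
n, p = 500, 805                       # sizes from the slide
X = rng.standard_normal((n, p))       # placeholder for the patient features
y = rng.integers(0, 2, size=n)        # placeholder diabetes labels

# L1-penalized logistic regression; C (inverse regularization) chosen by CV on AUC
clf = LogisticRegressionCV(penalty="l1", solver="liblinear",
                           scoring="roc_auc", Cs=10, cv=5)
clf.fit(X, y)
n_selected = np.count_nonzero(clf.coef_)  # number of features kept by the fit
```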


  7. In summary: we will focus on the linear model and the Lasso, and compute confidence intervals and p-values.

  8. Outline: (1) problem definition; (2) debiasing approach; (3) hypothesis testing under nearly optimal sample size.

  9. Problem definition.

  10. Linear model. We focus on linear models: $Y = X\theta_0 + W$, with $Y \in \mathbb{R}^n$ (response), $X \in \mathbb{R}^{n \times p}$ (design matrix), and $\theta_0 \in \mathbb{R}^p$ (parameters). The noise vector $W$ has independent entries with $\mathbb{E}(W_i) = 0$, $\mathbb{E}(W_i^2) = \sigma^2$, and $\mathbb{E}(|W_i|^{2+\kappa}) < \infty$ for some $\kappa > 0$.
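To make the setup concrete, here is a minimal NumPy sketch that draws data from this model; the dimensions, sparsity level, and noise scale are illustrative assumptions, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s, sigma = 200, 500, 10, 1.0   # illustrative: n < p, s-sparse signal

X = rng.standard_normal((n, p))      # Gaussian design, Sigma = I for simplicity
theta0 = np.zeros(p)
theta0[:s] = 1.0                     # s nonzero coefficients
W = sigma * rng.standard_normal(n)   # independent noise, mean 0, variance sigma^2
Y = X @ theta0 + W
```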

  11. Problem. Confidence intervals: for each $i \in \{1, \dots, p\}$, find $\underline{\theta}_i, \overline{\theta}_i \in \mathbb{R}$ such that $\mathbb{P}\big(\theta_{0,i} \in [\underline{\theta}_i, \overline{\theta}_i]\big) \ge 1 - \alpha$, with $|\overline{\theta}_i - \underline{\theta}_i|$ as small as possible. Hypothesis testing: $H_{0,i}: \theta_{0,i} = 0$ versus $H_{A,i}: \theta_{0,i} \neq 0$.

  12. LASSO. $\hat{\theta} \equiv \operatorname{argmin}_{\theta \in \mathbb{R}^p} \big\{ \frac{1}{2n} \|y - X\theta\|_2^2 + \lambda \|\theta\|_1 \big\}$ [Tibshirani 1996; Chen, Donoho 1996]. What is the distribution of $\hat{\theta}$? Debiasing approach (the LASSO is biased towards small $\ell_1$ norm): map $\hat{\theta}$ to a debiased estimator $\hat{\theta}^d$, and characterize the distribution of $\hat{\theta}^d$.
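As a side note, scikit-learn's `Lasso` minimizes this exact objective (its documented form is $\frac{1}{2n}\|y - Xw\|_2^2 + \alpha\|w\|_1$, with `alpha` in the role of $\lambda$), so a minimal sketch reusing the synthetic data above is:

```python
from sklearn.linear_model import Lasso

lam = 0.1                                # illustrative choice of lambda
lasso = Lasso(alpha=lam, fit_intercept=False)
lasso.fit(X, Y)
theta_hat = lasso.coef_                  # Lasso estimate of theta_0
```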

  13. Debiasing approach.

  14. Classical setting ($n \gg p$). We know everything about the least-squares estimator $\hat{\theta}^{LS} = \frac{1}{n} \hat{\Sigma}^{-1} X^T Y$, where $\hat{\Sigma} \equiv (X^T X)/n$ is the empirical covariance. Confidence intervals: $[\underline{\theta}_i, \overline{\theta}_i] = [\hat{\theta}^{LS}_i - c_\alpha \Delta_i, \; \hat{\theta}^{LS}_i + c_\alpha \Delta_i]$, with $\Delta_i \equiv \sigma \sqrt{(\hat{\Sigma}^{-1})_{ii} / n}$.
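A minimal sketch of these intervals, assuming $n \gg p$ and a known noise level $\sigma$ (with $c_\alpha$ the two-sided normal quantile):

```python
import numpy as np
from scipy.stats import norm

def ls_confidence_intervals(X, Y, sigma, alpha=0.05):
    """Least-squares estimate with per-coordinate (1 - alpha) confidence intervals."""
    n = X.shape[0]
    Sigma_hat = X.T @ X / n                  # empirical covariance
    Sigma_inv = np.linalg.inv(Sigma_hat)     # invertible only when n >> p
    theta_ls = Sigma_inv @ X.T @ Y / n
    c_alpha = norm.ppf(1 - alpha / 2)
    delta = sigma * np.sqrt(np.diag(Sigma_inv) / n)
    return theta_ls, theta_ls - c_alpha * delta, theta_ls + c_alpha * delta
```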

  15. High-dimensional setting ($n < p$). $\hat{\theta}^{LS} = \frac{1}{n} \hat{\Sigma}^{-1} X^T Y$. Problem in high dimension: $\hat{\Sigma}$ is not invertible!

  16. High-dimensional setting ($n < p$). Take your favorite $M \in \mathbb{R}^{p \times p}$: $\hat{\theta}^* = \frac{1}{n} M X^T Y = \frac{1}{n} M X^T X \theta_0 + \frac{1}{n} M X^T W = \theta_0 + (M\hat{\Sigma} - I)\theta_0 + \frac{1}{n} M X^T W$, where $(M\hat{\Sigma} - I)\theta_0$ is the bias and $\frac{1}{n} M X^T W$ is the Gaussian error.

  17. Debiased estimator. Since $\hat{\theta}^* = \theta_0 + (M\hat{\Sigma} - I)\theta_0 + \frac{1}{n} M X^T W$ (bias plus Gaussian error), let us (try to) subtract the bias, substituting the Lasso estimate for the unknown $\theta_0$: $\hat{\theta}^d = \hat{\theta}^* - (M\hat{\Sigma} - I)\hat{\theta}^{\mathrm{Lasso}}$. Writing $\hat{\theta} = \hat{\theta}^{\mathrm{Lasso}}$, the debiased estimator is $\hat{\theta}^d \equiv \hat{\theta} + \frac{1}{n} M X^T (Y - X\hat{\theta})$.
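The debiasing step itself is one line of linear algebra; a minimal sketch, with the choice of $M$ left to the caller (constructions of $M$ follow on the next slides):

```python
import numpy as np

def debias(theta_hat, X, Y, M):
    """Debiased estimator: theta_hat + (1/n) * M X^T (Y - X theta_hat)."""
    n = X.shape[0]
    return theta_hat + M @ X.T @ (Y - X @ theta_hat) / n
```

For instance, with a known design covariance one would pass `M = np.linalg.inv(Sigma)`, matching the choice on the next slide.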

  18. Debiased estimator: choosing $M$? Recall $\hat{\theta}^d \equiv \hat{\theta} + \frac{1}{n} M X^T (y - X\hat{\theta})$. Gaussian design ($x_i \sim N(0, \Sigma)$): assume $\Sigma$ is known (relevant in semi-supervised learning) and take $M = \Sigma^{-1}$ [Javanmard, Montanari 2012]. Does this remind you of anything? $\hat{\theta}^d \equiv \hat{\theta} + \Sigma^{-1} \frac{1}{n} X^T (y - X\hat{\theta})$ is a (pseudo-)Newton step.

  19. Debiased estimator: choosing $M$? Alternatively, build an approximate inverse of $\hat{\Sigma}$ via nodewise LASSO on $X$ (under a row-sparsity assumption on $\Sigma^{-1}$) [S. van de Geer, P. Bühlmann, Y. Ritov, R. Dezeure 2014].

  20. Debiased estimator: choosing $M$? Our approach: optimize the two error terms (bias and variance of $\hat{\theta}^d$) [A. Javanmard, A. Montanari 2014]: $\sqrt{n}(\hat{\theta}^d - \theta_0) = \sqrt{n}(M\hat{\Sigma} - I)(\theta_0 - \hat{\theta}) + Z$, where the first term is the bias and $Z \mid X \sim N(0, \sigma^2 M \hat{\Sigma} M^T)$ with $\hat{\Sigma} = \frac{1}{n} X^T X$; $M \hat{\Sigma} M^T$ is the noise covariance.

  21. Debiased estimator: choosing $M$? Our approach: find $M$ by solving an optimization problem [A. Javanmard, A. Montanari]: minimize over $M$ the quantity $\max_{1 \le i \le p} (M \hat{\Sigma} M^T)_{i,i}$ subject to $|M \hat{\Sigma} - I|_\infty \le \xi$.

  22. Debiased estimator: choosing $M$? Equivalently, for each row $m_i$ of $M$: minimize $m_i^T \hat{\Sigma} m_i$ subject to $\|\hat{\Sigma} m_i - e_i\|_\infty \le \xi$. The optimization thus decouples across rows and can be solved in parallel.
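A minimal sketch of one row subproblem using cvxpy (an assumption of this illustration, not necessarily the solver used by the authors); it exploits $m^T \hat{\Sigma} m = \|Xm\|_2^2 / n$ to keep the objective in a form the solver accepts directly:

```python
import numpy as np
import cvxpy as cp

def choose_m_row(X, i, xi):
    """Solve: min_m m^T Sigma_hat m  s.t.  ||Sigma_hat m - e_i||_inf <= xi."""
    n, p = X.shape
    Sigma_hat = X.T @ X / n
    e_i = np.zeros(p)
    e_i[i] = 1.0
    m = cp.Variable(p)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(X @ m) / n),
                      [cp.norm(Sigma_hat @ m - e_i, "inf") <= xi])
    prob.solve()
    return m.value   # i-th row of M
```

Since the rows are independent problems, they can be dispatched in parallel (e.g., one `choose_m_row` call per worker) and stacked into $M$.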

  23. What does it look like? [Figure: histogram of the debiased coordinates $\hat{\theta}^d_i$, with a density that is approximately Gaussian.] We can also estimate $\sigma$. 'Ground truth' computed from $n_{\mathrm{tot}} = 10{,}000$ records.
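Tying the pieces together, a sketch of how such a histogram could be produced on synthetic data (reusing `X`, `Y`, `theta0`, `sigma`, `theta_hat`, and `debias` from the sketches above; all parameter choices are illustrative, and the studentization follows the variance formula on slide 20):

```python
import numpy as np
import matplotlib.pyplot as plt

# The design above was drawn with identity covariance, so M = Sigma^{-1} = I.
M = np.eye(X.shape[1])
theta_d = debias(theta_hat, X, Y, M)

# Studentized coordinates should look approximately N(0, 1)
n = X.shape[0]
Sigma_hat = X.T @ X / n
se = sigma * np.sqrt(np.diag(M @ Sigma_hat @ M.T) / n)
z = (theta_d - theta0) / se

plt.hist(z, bins=40, density=True)
plt.xlabel(r"$(\hat\theta^d_i - \theta_{0,i}) / \mathrm{se}_i$")
plt.ylabel("Density")
plt.show()
```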
