Ultrahigh dimensional variable selection: Beyond the linear model



  1. Ultrahigh dimensional variable selection: Beyond the linear model. Jianqing Fan, Princeton University, with Richard Samworth, Yichao Wu, and Rui Song. http://www.princeton.edu/~jqfan. May 16, 2009.

  2. Outline: 1. Introduction; 2. Large-scale screening; 3. Moderate-scale selection; 4. Iterative feature selection; 5. Numerical studies.

  3. Introduction

  4. Introduction. High-dimensional variable selection characterizes many contemporary statistical problems. Bioinformatics: disease classification using microarray, proteomics, and fMRI data. Document or text classification: e-mail spam. Association studies between phenotypes and SNPs.

  5. Growth of Dimensionality. Dimensionality grows rapidly with interactions. Portfolio selection and network modeling: 2,000 stocks involve over 2 million unknown parameters in the covariance matrix. Gene-gene interaction: pairwise interactions among 5,000 genes result in 12.5 million features.
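A quick arithmetic check of those two counts (a minimal sketch; only the stock and gene numbers come from the slide):

```python
from math import comb

# Covariance matrix of p assets: p variances plus p*(p-1)/2 covariances.
p_stocks = 2000
cov_params = p_stocks + comb(p_stocks, 2)
print(cov_params)        # 2,001,000 -- "over 2 million" parameters

# Pairwise gene-gene interactions among 5,000 genes.
p_genes = 5000
print(comb(p_genes, 2))  # 12,497,500 -- roughly 12.5 million features
```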

  6. Aims of High-dimensional Regression and Classification. To construct as effective a method as possible for predicting future observations. To gain insight into the relationship between features and response for scientific purposes and, hopefully, to construct an improved prediction method. (Bickel, 2008, discussion of the SIS paper, JRSS-B.)

  7. Challenges with Ultrahigh Dimensionality. Computational cost. Estimation accuracy. Stability. Key idea: large-scale screening followed by moderate-scale searching.

  8. Large-scale screening

  9. Independence learning. Regression: feature ranking by correlation learning (Fan and Lv, 2008, JRSS-B). When Y = ±1, this implies the classification version: feature ranking by two-sample t-tests or other tests (Tibshirani et al., 2003; Fan and Fan, 2008). SIS: with an appropriate threshold (e.g., retaining n variables), the relevant features are contained in the selected set (Fan and Lv, 2008), relying on a joint-normality assumption. Other independence learning: Hall, Titterington and Xue (2009) derive such a method from an empirical-likelihood point of view.
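A minimal sketch of SIS-style correlation screening (function and variable names are illustrative, not from the talk):

```python
import numpy as np

def correlation_screen(X, y, d):
    """Rank features by absolute marginal correlation with y and
    keep the d highest-ranked ones (SIS-style screening)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
    ys = (y - y.mean()) / y.std()
    omega = np.abs(Xs.T @ ys) / len(y)         # |marginal correlations|
    return np.argsort(omega)[::-1][:d]

# Toy check: n = 100, p = 1000, with features 0, 1, 2 truly active.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1000))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + rng.standard_normal(100)
print(correlation_screen(X, y, d=10))  # should usually contain 0, 1, 2
```

For Y = ±1, ranking by absolute correlation is essentially ranking by two-sample t-statistics, which is why the regression and classification screens agree.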


  10. Model setting. GLIM: $f_Y(y \mid X = x; \theta) = \exp\{(y\theta - b(\theta))/\phi + c(y, \phi)\}$ with canonical link $\theta = b'^{-1}(\mu) = x^T \beta$. Objective: find a sparse $\beta$ to minimize $Q(\beta) = \sum_{i=1}^n L(Y_i, x_i^T \beta)$. GLIM: $L(Y_i, x_i^T \beta) = b(x_i^T \beta) - Y_i x_i^T \beta$. Classification ($Y = \pm 1$): SVM, $L(Y_i, x_i^T \beta) = (1 - Y_i x_i^T \beta)_+$; AdaBoost, $L(Y_i, x_i^T \beta) = \exp(-Y_i x_i^T \beta)$. Robustness: $L(Y_i, x_i^T \beta) = |Y_i - x_i^T \beta|$.
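These losses are easy to write down directly; a minimal Python sketch (function names are illustrative):

```python
import numpy as np

# Each loss L(y, t) takes the response y and the linear score t = x^T beta.

def glim_loss(y, t, b):
    """Canonical-link GLIM loss b(t) - y*t, where b is the cumulant
    function, e.g. b(t) = log(1 + exp(t)) for logistic regression."""
    return b(t) - y * t

def hinge_loss(y, t):
    """SVM hinge loss (1 - y*t)_+ for y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * t)

def exp_loss(y, t):
    """AdaBoost exponential loss exp(-y*t) for y in {-1, +1}."""
    return np.exp(-y * t)

def abs_loss(y, t):
    """Robust absolute-deviation loss |y - t|."""
    return np.abs(y - t)
```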


  11. Questions. 1. How do we screen discrete variables (genome-wide association studies)? 2. Do these screens have the sure screening property? 3. How large must the selected model be for sure independence screening? The arguments in Fan and Lv (2008) cannot be applied here.


  12. Independence learning. Marginal utility: letting $\hat{L}_0 = \min_{\beta_0} n^{-1} \sum_{i=1}^n L(Y_i, \beta_0)$, define $\hat{L}_j = \hat{L}_0 - \min_{\beta_0, \beta_j} n^{-1} \sum_{i=1}^n L(Y_i, \beta_0 + X_{ij}\beta_j)$ (Wilks), or use $|\hat{\beta}_j^M|$ (Wald), assuming $E X_j^2 = 1$. Feature ranking: select the features with the largest marginal utilities, $\hat{\mathcal{M}}_{\nu_n} = \{j : \hat{L}_j \ge \nu_n\}$ or $\hat{\mathcal{M}}^w_{\gamma_n} = \{j : |\hat{\beta}_j^M| \ge \gamma_n\}$. Dimensionality reduction: from $p_n = O(\exp(n^a))$ down to $O(n^b)$, e.g., from 10,000 features to 200.
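A minimal sketch of the Wilks-type marginal-utility screen for logistic loss (0/1 response coding; names are illustrative, and the one-dimensional fits use scipy rather than any code from the talk):

```python
import numpy as np
from scipy.optimize import minimize

def logistic_nll(params, x, y):
    """Average logistic loss b(t) - y*t with b(t) = log(1 + exp(t))
    and score t = beta0 + beta1 * x, for y coded in {0, 1}."""
    t = params[0] + params[1] * x
    return np.mean(np.log1p(np.exp(t)) - y * t)

def marginal_utility_screen(X, y, d):
    """Compute utility_j = L_hat_0 - L_hat_j, where L_hat_j is the
    minimized marginal loss using feature j alone, and keep the d
    features with the largest utilities."""
    n, p = X.shape
    L0 = minimize(lambda b: np.mean(np.log1p(np.exp(b[0])) - y * b[0]),
                  x0=[0.0]).fun                    # intercept-only fit
    util = np.empty(p)
    for j in range(p):
        util[j] = L0 - minimize(logistic_nll, x0=[0.0, 0.0],
                                args=(X[:, j], y)).fun
    return np.argsort(util)[::-1][:d]

# Toy check: features 0 and 1 drive the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 300))
prob = 1.0 / (1.0 + np.exp(-(X[:, 0] - 2.0 * X[:, 1])))
y = (rng.random(200) < prob).astype(float)
print(marginal_utility_screen(X, y, d=8))  # should usually contain 0 and 1
```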


  13. Theoretical Basis – Population Aspect I. Marginal utility: $L_j^\star = E\,\ell(Y, \beta_0^M) - \min_{\beta_0, \beta_j} E\,\ell(Y, \beta_0 + \beta_j X_j)$, where $\beta_0^M$ minimizes the intercept-only risk. Theorem 1 (likelihood ratio; Fan and Song, 2009): $L_j^\star = 0 \iff \beta_j^M = 0 \iff \operatorname{cov}(Y, X_j) = \operatorname{cov}(b'(X^T \beta^\star), X_j) = 0$. For Gaussian covariates, the conclusion holds if $\operatorname{cov}(X^T \beta^\star, X_j) = 0$, i.e., under independence.
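A sketch of why $\beta_j^M = 0$ forces $\operatorname{cov}(Y, X_j) = 0$, using only the population score equations of the marginal fit (a standard M-estimation argument, not taken verbatim from the talk):

```latex
\begin{align*}
&\text{Score equations of the marginal fit } (\beta_0^M, \beta_j^M):\\
&\qquad E\bigl[Y - b'(\beta_0^M + \beta_j^M X_j)\bigr] = 0, \qquad
E\bigl[\{Y - b'(\beta_0^M + \beta_j^M X_j)\}\,X_j\bigr] = 0.\\[2pt]
&\text{If } \beta_j^M = 0, \text{ the first equation gives } EY = b'(\beta_0^M)
\text{ and the second gives } E[Y X_j] = b'(\beta_0^M)\,E X_j,\\
&\text{so } \operatorname{cov}(Y, X_j) = E[Y X_j] - (EY)(E X_j) = 0.
\end{align*}
```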

  14. Theoretical Basis – Population Aspect II. True model: $\mathcal{M}_\star = \{j : \beta_j^\star \ne 0\}$, where $\beta^\star = \operatorname{argmin} E\,L(Y, X^T \beta)$. Theorem 2: if $|\operatorname{cov}(b'(X^T \beta^\star), X_j)| \ge c_1 n^{-\kappa}$ for $j \in \mathcal{M}_\star$, then $\min_{j \in \mathcal{M}_\star} |\beta_j^M| \ge c_1 n^{-\kappa}$ and $\min_{j \in \mathcal{M}_\star} L_j^\star \ge c_2 n^{-2\kappa}$. If $\{X_j, j \notin \mathcal{M}_\star\}$ is independent of $\{X_i, i \in \mathcal{M}_\star\}$, then $L_j^\star = 0$ for $j \notin \mathcal{M}_\star$. For Gaussian covariates, the conclusion holds if $\min_{j \in \mathcal{M}_\star} |\operatorname{cov}(X^T \beta^\star, X_j)| \ge c_1 n^{-\kappa}$, the minimum condition needed even for least squares.

