Sliced Inverse Regression with Interaction (SIRI) Detection for non-Gaussian BN learning

  1. Sliced Inverse Regression with Interaction (SIRI) Detection for non-Gaussian BN learning. Jun S. Liu, Department of Statistics, Harvard University. Joint work with Bo Jiang.

  2. General: Regression and Classification. Data layout (covariates and responses):
     Ind 1: $x_{11}, x_{12}, \dots, x_{1p}$; $Y_1$
     Ind 2: $x_{21}, x_{22}, \dots, x_{2p}$; $Y_2$
     ⋮
     Ind N: $x_{N1}, x_{N2}, \dots, x_{Np}$; $Y_N$

  3. [Image-only slide; no recoverable text.]

  4. Variable Selection with Interaction. Let $Y \in \mathbb{R}$ be a univariate response variable and $X \in \mathbb{R}^p$ a vector of $p$ continuous predictor variables, with $Y = X_1 \times X_2 + \epsilon$, $\epsilon \sim N(0, \sigma^2)$, $X \sim \mathrm{MVN}(0, I_p)$. Suppose $p = 1000$. How do we find $X_1$ and $X_2$? One-step forward selection would have to consider $\sim$500,000 interaction terms.

  5. Variable Selection with Interaction, continued. Is there any marginal relationship between $Y$ and $X_1$? (See the simulation sketch below.)
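
A quick simulation (a sketch, not from the slides; the sample size, noise level, and seed are arbitrary choices) shows why marginal screening fails here: under the pure-interaction model, $\mathrm{Cor}(Y, X_1) = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 1000

# Pure-interaction model from the slide: Y = X1 * X2 + noise
X = rng.standard_normal((n, p))
y = X[:, 0] * X[:, 1] + 0.5 * rng.standard_normal(n)

# Marginal correlations of Y with the truly relevant predictors are
# near zero, so one-variable-at-a-time screening cannot find them.
print(np.corrcoef(y, X[:, 0])[0, 1])
print(np.corrcoef(y, X[:, 1])[0, 1])
```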

  6. $[Y \mid X]$? $[X \mid Y]$? Who is behind the bar?

  7. General: Regression and Classification, continued (same data layout as slide 2). By Bayes' rule, $P(Y \mid X) = P(X \mid Y)\,P(Y)/P(X)$. How to model this?

  8. Naïve Bayes model: $Y \to X_1, X_2, X_3, \dots, X_m$, with the predictors conditionally independent given $Y$. (A posterior-computation sketch follows below.)
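
To make the diagram concrete, here is a minimal sketch of the posterior computation that slide 7's Bayes rule implies under the naïve Bayes factorization, assuming Gaussian class-conditional densities (the function name and all numbers are hypothetical):

```python
import numpy as np
from scipy.stats import norm

def naive_bayes_posterior(x, priors, means, sds):
    """P(Y=c | X=x) is proportional to P(Y=c) * prod_i P(X_i | Y=c),
    using the naive Bayes assumption that the X_i are independent given Y."""
    log_post = np.log(priors) + np.array([
        norm.logpdf(x, loc=means[c], scale=sds[c]).sum()
        for c in range(len(priors))
    ])
    log_post -= log_post.max()   # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()     # normalizing divides out P(X)

# Two classes, three predictors; all values are made up for illustration.
print(naive_bayes_posterior(
    x=np.array([0.9, 1.1, 0.2]),
    priors=np.array([0.5, 0.5]),
    means=np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]]),
    sds=np.ones((2, 3)),
))
```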

  9. (Augmented) Naïve Bayes Models:
     • BEAM: Bayesian Epistasis Association Mapping (Zhang and Liu 2007): discrete univariate response and discrete predictors.
     • (Augmented) Naïve Bayes Classifier with Variable Selection and Interaction Detection (Yuan Yuan et al.): discrete univariate response and continuous (but discretized) predictors.
     • Bayesian Partition Model for eQTL study (Zhang et al. 2010): continuous multivariate responses and discrete predictors.
     • Sliced Inverse Regression with Interaction Detection (SIRI): continuous univariate response and continuous predictors.

  10. Tree-Augmented Naïve Bayes (TAN) (Pearl 1988; Friedman 1997). [Diagram: $Y$ points to $X_1, \dots, X_6$, augmented with tree edges among the predictors.]

  11. Augmented Naïve Bayes. [Diagram: $Y$ with grouped predictors: Group 0 ($X_{01}, X_{02}$), Group 1 ($X_{11}, X_{12}, X_{13}$), Group 21 ($X_{2.11}, X_{2.12}, X_{2.13}$), Group 22 ($X_{2.21}, X_{2.22}$).]

  12. How about continuous covariates?
     • We may discretize Y, and discretize each X.
     • Or discretize Y, assuming joint Gaussian distributions on X?
     • Sound familiar?

  13. An observation: $Y = X_1 \times X_2 + \epsilon$, $\epsilon \sim N(0, \sigma^2)$, $X \sim \mathrm{MVN}(0, I_p)$. [Scatter plot of $y$ against $x_1$.]

  14. Sliced Inverse Regression (SIR; Li 1991). SIR is a tool for dimension reduction in multivariate statistics. Let $Y \in \mathbb{R}$ be a univariate response variable and $X \in \mathbb{R}^p$ a vector of $p$ continuous predictor variables, with $Y = f(\beta_1^T X, \dots, \beta_K^T X, \epsilon)$, where $f$ is an unknown function and $\epsilon$ is an error with finite variance. How can the unknown projection vectors $\beta_1, \dots, \beta_K$ be identified?

  15.–30. [Image-only or mis-encoded slides; no recoverable text.]

  31. SIR Algorithm. Let $\Sigma_{xx}$ be the covariance matrix of $X$.
     1. Standardize $X$: $Z = \Sigma_{xx}^{-1/2}\{X - E(X)\}$.
     2. Divide the range of the $y_i$ into $S$ nonoverlapping slices $H_s$, $s \in \{1, \dots, S\}$, with $n_s$ the number of observations in slice $s$.
     3. Compute the slice means $\bar{z}_s = n_s^{-1} \sum_{i \in H_s} z_i$ and the estimate of $\mathrm{Cov}\{E(Z \mid Y)\}$: $\hat{M} = n^{-1} \sum_{s=1}^{S} n_s \bar{z}_s \bar{z}_s^T$.
     4. Identify the largest $K$ eigenvalues $\hat{\lambda}_k$ of $\hat{M}$ and the corresponding eigenvectors $\hat{\eta}_k$; then $\hat{\beta}_k = \hat{\Sigma}_{xx}^{-1/2} \hat{\eta}_k$ ($k = 1, \dots, K$).
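
A minimal NumPy sketch of these four steps (the function name `sir` is mine, and equal-count slices are an assumed convention; the slide does not fix the slicing scheme):

```python
import numpy as np

def sir(X, y, n_slices=10, K=2):
    """Sliced Inverse Regression (Li 1991), following the slide's four steps."""
    n, p = X.shape
    # Step 1: standardize, Z = Sigma_xx^{-1/2} (X - mean(X))
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ Sigma_inv_sqrt
    # Step 2: slice observations by the sorted response (equal counts)
    slices = np.array_split(np.argsort(y), n_slices)
    # Step 3: M-hat = (1/n) sum_s n_s zbar_s zbar_s^T, estimating Cov{E(Z|Y)}
    M = np.zeros((p, p))
    for idx in slices:
        zbar = Z[idx].mean(axis=0)
        M += len(idx) * np.outer(zbar, zbar)
    M /= n
    # Step 4: top-K eigenpairs of M-hat, mapped back to the original X scale
    lam, eta = np.linalg.eigh(M)
    top = np.argsort(lam)[::-1][:K]
    return lam[top], Sigma_inv_sqrt @ eta[:, top]
```

Note that for the pure interaction $Y = X_1 X_2$ of slide 13, the slice means $E(Z \mid Y)$ vanish by symmetry, so first-moment SIR alone cannot detect that pair; this is the gap the interaction-detection extension (SIRI) is built to close.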

  32. SIR with Variable Selection. Only a subset of the predictors is relevant: $\beta_1, \dots, \beta_K$ are sparse.
     • Backward subset selection (Cook 2004; Li et al. 2005).
     • Shrinkage estimates of $\beta_1, \dots, \beta_K$ using an $L_1$- or $L_2$-penalty: Regularized SIR (RSIR, Zhong et al. 2005); Sparse SIR (SSIR, Li 2007).
     • Correlation Pursuit (Zhong et al. 2012): a forward-selection and backward-elimination procedure motivated by the F-test in stepwise regression, $F_{1,\,n-d-1} = \frac{(n-d-1)(\hat{R}_{d+1}^2 - \hat{R}_d^2)}{1 - \hat{R}_{d+1}^2}$.

  33. Correlation Pursuit (COP). Let $A$ be the current set of selected predictors and $\hat{\lambda}_k^A$ the $k$th largest eigenvalue estimated by SIR based on the predictors in $A$. For the $j$th predictor $X_j$ ($j \notin A$), define the statistic $\mathrm{COP}_k^{A+j} = \frac{n(\hat{\lambda}_k^{A+j} - \hat{\lambda}_k^A)}{1 - \hat{\lambda}_k^{A+j}}$.

  34. Correlation Pursuit (COP), continued. If $X_j$ is irrelevant, the statistics $\mathrm{COP}_k^{A+j}$ ($k = 1, \dots, K$) are asymptotically i.i.d. $\chi^2(1)$, and $\mathrm{COP}_{1:K}^{A+j} = \sum_{k=1}^{K} \mathrm{COP}_k^{A+j}$ is asymptotically $\chi^2(K)$.

  35. Correlation Pursuit (COP), continued. The stepwise procedure is consistent if $p = O(n^r)$ with $r < 1/2$.

  36. Correlation Pursuit (COP), continued. The dimension $K$ and the thresholds for forward selection (backward elimination) are chosen by cross-validation. (A sketch of the COP statistic in code follows below.)
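
The COP statistic can be computed from two SIR fits, with and without the candidate predictor. A minimal sketch reusing the hypothetical `sir` function from slide 31's example (it assumes $|A| \ge K$ so that $K$ eigenvalues exist, and that `A` is a list of column indices):

```python
import numpy as np
from scipy.stats import chi2

def cop_statistic(X, y, A, j, K=2, n_slices=10):
    """COP_{1:K}^{A+j} = sum_k n (lam_k^{A+j} - lam_k^A) / (1 - lam_k^{A+j}),
    comparing SIR eigenvalues with and without candidate predictor X_j."""
    n = X.shape[0]
    lam_A, _ = sir(X[:, A], y, n_slices=n_slices, K=K)
    lam_Aj, _ = sir(X[:, A + [j]], y, n_slices=n_slices, K=K)
    return np.sum(n * (lam_Aj - lam_A) / (1 - lam_Aj))

# A forward step would admit X_j when the statistic exceeds a chi2(K)
# quantile; the slide instead tunes such thresholds by cross-validation.
threshold = chi2.ppf(0.999, df=2)
```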

  37. SIR via MLE. Let $A$ be the set of relevant predictors, $C = A^c$, and $d = |A|$. Within each slice, $X_A \mid Y \in H_s \sim N(\mu_s, \Sigma)$.

  38. SIR via MLE, continued. The irrelevant predictors depend on $Y$ only through $X_A$: $X_C \mid X_A, Y \in H_s \sim N(X_A\beta, \Sigma_0)$.

  39. SIR via MLE, continued. The slice means are structured: $\mu_s = \alpha + \Gamma\gamma_s$, where $\gamma_s \in \mathbb{R}^K$ and $\Gamma$ is a $d \times K$ orthogonal matrix.

  40. SIR via MLE, continued. Equivalently, $\mu_s$ belongs to a $K$-dimensional affine space $\alpha + \mathcal{V}_K$ ($K < d$).

  41. SIR via MLE, continued. The MLE of the span of $\mathcal{V}_K$ coincides with the SIR directions (Cook 2007; Szretter and Yohai 2009).

  42. SIR via MLE, continued. Given the current $A$ and a predictor $X_j \notin A$, we want to test $H_0$: $X_j$ is irrelevant, vs. $H_1$: $X_j$ is relevant.

  43. SIR via MLE, continued. Since the two models differ only in the conditional law of $X_j$, the likelihood ratio reduces to $\frac{P_{M_1}(X \mid Y)}{P_{M_0}(X \mid Y)} = \frac{P_{M_1}(X_j \mid X_A, Y)}{P_{M_0}(X_j \mid X_A, Y)}$.

  44. SIR via MLE, continued. The test statistic is the estimated ratio $\widehat{\mathrm{LR}}_j = \frac{\hat{P}_{M_1}(X_j \mid X_A, Y)}{\hat{P}_{M_0}(X_j \mid X_A, Y)}$.

  45. LR Test vs. COP. Given the current $A$, the likelihood ratio (LR) test statistic of $H_0$: $X_j$ is irrelevant, vs. $H_1$: $X_j$ is relevant, is
     $2\,\mathrm{LR}_j = -n\big(\sum_{k=1}^{K} \log(1 - \hat{\lambda}_k^{A+j}) - \sum_{k=1}^{K} \log(1 - \hat{\lambda}_k^A)\big) = n \sum_{k=1}^{K} \log\big(1 + \frac{\hat{\lambda}_k^{A+j} - \hat{\lambda}_k^A}{1 - \hat{\lambda}_k^{A+j}}\big)$.
     Under $H_0$ ($X_j$ irrelevant), $\mathrm{COP}_k^{A+j} = \frac{n(\hat{\lambda}_k^{A+j} - \hat{\lambda}_k^A)}{1 - \hat{\lambda}_k^{A+j}}$ is asymptotically $\chi^2(1)$, and $\frac{\hat{\lambda}_k^{A+j} - \hat{\lambda}_k^A}{1 - \hat{\lambda}_k^{A+j}} \to_p 0$.
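
The eigenvalue form of $2\,\mathrm{LR}_j$ is a one-liner. A sketch under the same assumptions as above (`lam_A` and `lam_Aj` are the top-$K$ SIR eigenvalue arrays without and with $X_j$, e.g. as returned by the hypothetical `sir` function from slide 31's example):

```python
import numpy as np

def lr_statistic(lam_A, lam_Aj, n):
    """2*LR_j = n * sum_k log(1 + (lam_k^{A+j} - lam_k^A) / (1 - lam_k^{A+j}))."""
    return n * np.sum(np.log1p((lam_Aj - lam_A) / (1 - lam_Aj)))
```

Since $\log(1 + x) \approx x$ for small $x$, and the per-direction eigenvalue gaps vanish under $H_0$, $2\,\mathrm{LR}_j \approx \sum_{k=1}^{K} \mathrm{COP}_k^{A+j}$: the LR test and COP are asymptotically equivalent under the null.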
