
High Dimensional Classification in the Presence of Correlation: A Factor Model Approach



  1. High Dimensional Classification in the Presence of Correlation: A Factor Model Approach
     A. PEDRO DUARTE SILVA (*)
     Faculdade de Economia e Gestão / Centro de Estudos em Gestão e Economia, Universidade Católica Portuguesa, Centro Regional do Porto
     Compstat'2010, Paris, 23-28 August 2010
     (*) Supported by: FEDER / POCI 2010

  2. High Dimensional Correlation Adjusted Classification
     Overview
     1. A factor-model linear classification rule for high-dimensional correlated data
     2. Asymptotic properties with $p \to \infty$
     3. Variable selection for problems with "rare" and "mostly weak" group differences
     4. Performance in microarray problems
     5. Conclusions and perspectives

  3. Problem Statement
     Data: pairs $(Y; X)$ with $X \in \mathbb{R}^p$ and $Y \in \{0, 1\}$. We want to find a rule that predicts $Y$ given $X$.
     Bayes rule: $\hat{Y} = \arg\max_g \pi_g f_g(X)$
     Assuming $X \mid Y \sim N_p(\mu_{(Y)}, \Sigma)$, the Bayes rule becomes
     $$\hat{Y}_i = 1\left\{ \Delta^T \Sigma^{-1} \left( X_i - \tfrac{1}{2}\left(\mu_{(0)} + \mu_{(1)}\right) \right) > \log\frac{\pi_0}{\pi_1} \right\}, \qquad \Delta = \mu_{(1)} - \mu_{(0)}$$
     How should $\Sigma^{-1}$ be estimated when $p > n$ and the correlations among the $X$ variables are important?

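  4. A Factor-Model Approach
     $$X_i = \mu_{(Y_i)} + B f_i + \varepsilon_i, \qquad f_i \in \mathbb{R}^q, \quad \varepsilon_i \in \mathbb{R}^p, \quad q \ll p$$
     $$f_i \sim N_q(0, I_q), \qquad \varepsilon_i \sim N_p(0, D_\varepsilon), \qquad \forall j:\ D_\varepsilon(j) > k_0 > 0$$
     Implied covariance and its inverse:
     $$\Sigma = B B^T + D_\varepsilon$$
     $$\Sigma^{-1} = D_\varepsilon^{-1} - D_\varepsilon^{-1} B \left[ I_q + B^T D_\varepsilon^{-1} B \right]^{-1} B^T D_\varepsilon^{-1}$$
     Estimator:
     $$\hat{\Sigma}_{RFctq} = \hat{B} \hat{B}^T + \hat{D}_\varepsilon$$
     $$(\hat{B}, \hat{D}_\varepsilon) = \arg\min_{B,\, D_\varepsilon} \left\| \hat{V}^{-1/2}\, \Sigma_{RFctq}\, \hat{V}^{-1/2} - \hat{V}^{-1/2}\, S\, \hat{V}^{-1/2} \right\|_F^2, \qquad \Sigma_{RFctq} = B B^T + D_\varepsilon$$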
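For a single factor (q = 1, the setting reported in the data comparisons), the inverse of Sigma = b b^T + D reduces to the Sherman-Morrison formula, so Sigma^{-1} v can be computed in O(p) operations without forming a p x p matrix. The sketch below is only an illustration under that assumption: the loading vector `b`, the specific variances `d`, and the group means are taken as already estimated, and the function names are hypothetical rather than taken from the presentation.

```python
def factor_precision_times(v, b, d):
    """Return Sigma^{-1} v for Sigma = b b^T + diag(d) (single factor, q = 1)."""
    dinv_v = [vj / dj for vj, dj in zip(v, d)]            # D^{-1} v
    dinv_b = [bj / dj for bj, dj in zip(b, d)]            # D^{-1} b
    num = sum(bj * x for bj, x in zip(b, dinv_v))         # b^T D^{-1} v
    den = 1.0 + sum(bj * x for bj, x in zip(b, dinv_b))   # 1 + b^T D^{-1} b
    scale = num / den
    # Sherman-Morrison: D^{-1} v - D^{-1} b (b^T D^{-1} v) / (1 + b^T D^{-1} b)
    return [x - scale * y for x, y in zip(dinv_v, dinv_b)]


def linear_rule(x, mean0, mean1, b, d, log_prior_ratio=0.0):
    """Predict 1 when Delta^T Sigma^{-1} (x - midpoint) exceeds log(pi_0 / pi_1)."""
    delta = [m1 - m0 for m0, m1 in zip(mean0, mean1)]
    centered = [xj - 0.5 * (m0 + m1) for xj, m0, m1 in zip(x, mean0, mean1)]
    w = factor_precision_times(centered, b, d)
    score = sum(dj * wj for dj, wj in zip(delta, w))
    return 1 if score > log_prior_ratio else 0
```

With `b = [1, 2]` and `d = [1, 1]`, Sigma is [[2, 2], [2, 5]], and `factor_precision_times([1, 0], b, d)` gives the first column of its inverse, [5/6, -1/3].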

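  5. Asymptotic Properties
     We will compare empirical linear rules
     $$\delta_L = 1\left\{ \hat{\Delta}_{\delta_L}^T\, \hat{\Sigma}_{\delta_L}^{-1} \left( X_i - \tfrac{1}{2}\left(\bar{X}_0 + \bar{X}_1\right) \right) > \log\frac{n_0}{n_1} \right\}$$
     built on an estimator $\hat{\Delta}_{\delta_L}$ of $\Delta$ that, for some parameter space $\Gamma$, satisfies
     $$\max_\Gamma E_\theta \left\| \hat{\Delta}_{\delta_L} - \Delta \right\|^2 = o(1) \qquad (C1)$$
     The comparison is based on the worst-case misclassification criterion
     $$W_\Gamma(\delta_L) = \max_\Gamma P_\theta\left( \delta_L(X_i) = 1 \mid Y_i = 0 \right) = \max_\Gamma \left[ 1 - \Phi\left( \frac{\hat{\Delta}_{\delta_L}^T\, \hat{\Sigma}_{\delta_L}^{-1}\, \hat{\Delta}_{\delta_L}}{2 \sqrt{\hat{\Delta}_{\delta_L}^T\, \hat{\Sigma}_{\delta_L}^{-1}\, \Sigma\, \hat{\Sigma}_{\delta_L}^{-1}\, \hat{\Delta}_{\delta_L}}} \right) \right]$$
     when $p \to \infty$, with $n = n(p)$ [the rate condition relating $n(p)$, $p$ and a constant $d$ is illegible in the source].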

  6. Asymptotic Properties: Main Result
     Parameter space, with $\pi_0 = \pi_1 = 1/2$:
     $$\theta = \left(\mu_{(0)}, \mu_{(1)}, \Sigma\right) \in \Gamma_F(k_0, k_1, k_2, q, B, c): \quad \Delta^T \Sigma^{-1} \Delta \ge c^2, \quad k_1 \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le k_2, \quad \Delta \in B$$
     together with conditions holding for all indices on $\beta(j, a)$, $R(j', l')$ and $D_\varepsilon(j)$ [bounds illegible in the source].
     Assume (C1) is satisfied, and let
     $$\Sigma_{RFctq} = B B^T + D, \qquad (B, D) = \arg\min_{B,\, D} \left\| R - V^{-1/2}\, \Sigma_{RFctq}\, V^{-1/2} \right\|_F^2, \qquad R = V^{-1/2}\, \Sigma\, V^{-1/2}$$
     It follows that, when $p \to \infty$ and $n(p) / \log p \to \infty$:
     $$W_\Gamma(\delta_{Fq}) \to 1 - \Phi\left( \frac{\sqrt{K_{0Fq}}}{1 + K_{0Fq}}\, c \right)$$
     $$K_{0Fq} = \max_\Gamma \frac{\lambda_{\max}(\Sigma_{0Fq})}{\lambda_{\min}(\Sigma_{0Fq})}, \qquad \Sigma_{0Fq} = \Sigma_{RFctq}^{-1/2}\, \Sigma\, \Sigma_{RFctq}^{-1/2}$$
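To get a feel for the limiting error rate, one can evaluate 1 - Phi(sqrt(K) c / (1 + K)) for a few condition numbers K. The sketch below assumes the limit on this slide has that form, with K standing for K_0Fq and c for the lower bound on the Mahalanobis distance between the groups; the function names are illustrative only.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def worst_case_error(cond_number, c):
    """Evaluate 1 - Phi(sqrt(K) / (1 + K) * c) for condition number K >= 1."""
    return 1.0 - normal_cdf(math.sqrt(cond_number) / (1.0 + cond_number) * c)
```

With K = 1 the expression equals 1 - Phi(c / 2), the Bayes error for Mahalanobis distance c; as K grows, the value moves toward 1/2. A factor model that brings Sigma_RFctq close to Sigma keeps K, and hence the limiting error, small.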

  7. Selecting Predictors
     1 - Rank variables according to two-sample t-scores
     2 - Choose a selection cut-off for the score values
     Higher Criticism (Donoho and Jin 2004)
     Given p ordered p-values $\pi_{(1)} \le \dots \le \pi_{(p)}$:
     $$HC(j; \pi_{(j)}) = \sqrt{p}\; \frac{j/p - \pi_{(j)}}{\sqrt{(j/p)\,\left(1 - j/p\right)}}$$
     $$HC^* = \max_{j \le \alpha_0 p} HC(j; \pi_{(j)})$$
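The Higher Criticism cut-off can be computed directly from a list of p-values. The following is a minimal sketch rather than the author's implementation: the default `alpha0 = 0.10` and the function name are assumptions made for illustration, and the search stops at j = p - 1 because the denominator vanishes at j = p.

```python
import math


def higher_criticism(pvalues, alpha0=0.10):
    """Return (HC*, maximizing index j, p-value at that index)."""
    p = len(pvalues)
    if p < 2:
        raise ValueError("need at least two p-values")
    ordered = sorted(pvalues)
    # Search j = 1 .. alpha0 * p, never reaching j = p (zero denominator).
    j_max = min(p - 1, max(1, int(alpha0 * p)))
    best_hc, best_j = -math.inf, 1
    for j in range(1, j_max + 1):
        frac = j / p
        hc = math.sqrt(p) * (frac - ordered[j - 1]) / math.sqrt(frac * (1.0 - frac))
        if hc > best_hc:
            best_hc, best_j = hc, j
    return best_hc, best_j, ordered[best_j - 1]
```

Variables whose p-values do not exceed the returned cut-off p-value are the ones kept.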

  8. Selecting Predictors: Higher Criticism
     In a two-group homoskedastic model, with:
     - Diagonal classification rules
     - p-values derived from two-group t-scores
     - Independent variables
     - Rare "effects" (mean group differences)
     - Weak effects
     when $p \to \infty$, HC* is asymptotically equivalent to the optimal selection threshold (Donoho and Jin 2009).

  9. Selecting Predictors: Control of false discovery rates
     Given a sequence of p independent tests with ordered p-values $\pi_{(1)} \le \dots \le \pi_{(p)}$, reject the null hypotheses $H_{0j}$ for $j \le k$, with (Benjamini and Hochberg 1995)
     $$k = \max\left\{ j : \pi_{(j)} \le \frac{j}{p}\, \alpha \right\}$$
     Given a sequence of p dependent tests with ordered p-values $\pi_{(1)} \le \dots \le \pi_{(p)}$, reject the null hypotheses $H_{0j}$ for $j \le k$, with (Benjamini and Yekutieli 2001)
     $$k = \max\left\{ j : \pi_{(j)} \le \frac{j}{p \sum_{i=1}^{p} 1/i}\, \alpha \right\}$$
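Both step-up rules differ only in the constant that divides the threshold, so a single function can cover them. This is a minimal sketch with a hypothetical function name; `dependent=True` switches from the Benjamini-Hochberg threshold to the Benjamini-Yekutieli one by dividing by the harmonic sum 1 + 1/2 + ... + 1/p.

```python
def fdr_cutoff(pvalues, alpha=0.05, dependent=False):
    """Return k, the number of ordered hypotheses rejected by the step-up rule."""
    p = len(pvalues)
    ordered = sorted(pvalues)
    # Benjamini-Yekutieli correction: harmonic sum; Benjamini-Hochberg: 1.
    correction = sum(1.0 / i for i in range(1, p + 1)) if dependent else 1.0
    k = 0
    for j, pv in enumerate(ordered, start=1):
        if pv <= j * alpha / (p * correction):
            k = j  # keep the largest j that satisfies the inequality
    return k
```

The hypotheses with the k smallest p-values are rejected; k = 0 means no rejection.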

  10. Selecting Predictors: Expanded Higher Criticism
      A selection scheme for problems where effects are rare and most (but not necessarily all) effects are weak
      1 - Include all variables that satisfy Benjamini and Yekutieli's criterion
      2 - Estimate an "empirical null distribution"
      3 - Compute p-values for the effects of non-selected variables, based on the null estimated in step 2
      4 - Find the HC* threshold from the p-values computed in step 3

  11. Singh's Prostate Cancer Data: p = 6033; n = 50 + 52
      Rule                        | Error estimate (std error) | # Variables kept (min - median - max)
      Fisher's LDA*               | 0.2146 (0.0101)            | 58 - 134.5 - 421
      Naive Bayes*                | 0.0670 (0.0052)            | 58 - 134.5 - 421
      Support Vector Machines*    | 0.0642 (0.0052)            | 58 - 134.5 - 421
      Nearest Shrunken Centroids  | 0.0838 (0.0063)            | 108 - 356 - 1771
      Regularized DA              | 0.0741 (0.0053)            | 82 - 390 - 1201
      Shrunken DA*                | 0.0650 (0.0051)            | 58 - 134.5 - 421
      Factor-based LDA* (q=1)     | 0.0641 (0.0052)            | 58 - 134.5 - 421
      NLDA*                       | 0.0720 (0.0052)            | 58 - 134.5 - 421
      * After variable selection by the maximum of FDR (False Discovery Rates) and HC (Higher Criticism), both derived from independence-based t-scores. The p-values used in the HC computations are derived from empirical null distributions.

  12. Golub's Leukemia Data: p = 7129; n = 47 + 25
      Rule                        | Error estimate (std error) | # Variables kept (min - median - max)
      Fisher's LDA*               | 0.2558 (0.0109)            | 326 - 478 - 712
      Naive Bayes*                | 0.480 (0.0085)             | 326 - 478 - 712
      Support Vector Machines*    | 0.0405 (0.0049)            | 326 - 478 - 712
      Nearest Shrunken Centroids  | 0.0201 (0.0039)            | 703 - 3166 - 7129
      Regularized DA              | 0.0491 (0.0062)            | 12 - 1934 - 7124
      Shrunken DA*                | 0.0276 (0.0044)            | 326 - 478 - 712
      Factor-based LDA* (q=1)     | 0.0174 (0.0034)            | 326 - 478 - 712
      NLDA*                       | 0.1510 (0.0085)            | 326 - 478 - 712
      * After variable selection by the maximum of FDR (False Discovery Rates) and HC (Higher Criticism), both derived from independence-based t-scores. The p-values used in the HC computations are derived from empirical null distributions.

  13. Alon's Colon Data: p = 2000; n = 40 + 22
      Rule                        | Error estimate (std error) | # Variables kept (min - median - max)
      Fisher's LDA*               | 0.3285 (0.0143)            | 3 - 71.5 - 200
      Naive Bayes*                | 0.2275 (0.0133)            | 3 - 71.5 - 200
      Support Vector Machines*    | 0.1576 (0.0095)            | 3 - 71.5 - 200
      Nearest Shrunken Centroids  | 0.1563 (0.0098)            | 7 - 39 - 527
      Regularized DA              | 0.2174 (0.0126)            | 14 - 425 - 2000
      Shrunken DA*                | 0.1865 (0.0100)            | 3 - 71.5 - 200
      Factor-based LDA* (q=1)     | 0.1746 (0.0098)            | 3 - 71.5 - 200
      NLDA*                       | 0.2614 (0.0114)            | 3 - 71.5 - 200
      * After variable selection by the maximum of FDR (False Discovery Rates) and HC (Higher Criticism), both derived from independence-based t-scores. The p-values used in the HC computations are derived from empirical null distributions.

  14. Conclusions
      - A factor-model classification rule, designed for high-dimensional correlated data, was proposed.
      - Asymptotic analysis shows that, as $p \to \infty$, the new rule can approach a low expected error rate, often much lower than that of unrestricted covariance rules and of independence-based rules.
      - Empirical comparisons suggest that, when combined with sensible variable selection schemes, the new rule is highly competitive in microarray applications.
