SLIDE 1

Background Factor Analysis for Multiple Testing The FAMT package procedure Concluding comments

Factor Analysis for Multiple Testing : an R package for large-scale significance testing under dependence

Maela Kloareg, Chloé Friguet & David Causeur

Applied mathematics department Agrocampus Ouest, Université Européenne de Bretagne

The UseR! Conference, July 2009 Agrocampus Ouest, France

SLIDE 2

Outline

1. Background
2. Factor Analysis for Multiple Testing
3. The FAMT package procedure
4. Concluding comments

SLIDE 3

Impact of dependence in multiple testing

Multiple testing: to identify genes whose expressions (Y) depend significantly on the experimental condition (X).

High dimension: a few microarrays and a huge number of gene expressions.

A major concern: the biological links among genes and the high-dimensional setting generate a large-scale correlation structure, which induces high instability in multiple testing procedures.

SLIDE 4

Distribution of error rates in multiple tests

Distribution of the False Discovery Proportion (FDP = Vt/Rt) on 1,000 simulated datasets per scenario (Friguet et al., 2009, JASA)

[Figure: mean FDP with 0.05 and 0.95 quantiles; axes labeled πj and FDP. Standard deviation of the FDP by scenario: 3.18e−02, 3.32e−02, 4.45e−02, 6.03e−02, 7.91e−02, 10.61e−02, 11.33e−02, 14.14e−02, 14.24e−02, 14.16e−02.]

            Declared H0   Declared H1   Total
True H0     Ut            Vt            m0
True H1     Tt            St            m1
Total       m−Rt          Rt            m
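The two error proportions used on these slides can be read directly off the table of decisions. The following is a minimal Python sketch, not part of the FAMT package; the function name `error_proportions` and the toy data are hypothetical, and the convention FDP = 0 when nothing is rejected is an assumption.

```python
def error_proportions(is_null, rejected):
    """Return (FDP, NDP) for one set of test decisions.

    is_null[k]:  True when hypothesis k is a true H0
    rejected[k]: True when hypothesis k is declared H1
    """
    v = sum(1 for n, r in zip(is_null, rejected) if n and r)          # Vt: false discoveries
    r_total = sum(1 for r in rejected if r)                           # Rt: all discoveries
    t = sum(1 for n, r in zip(is_null, rejected) if not n and not r)  # Tt: missed true H1
    m1 = sum(1 for n in is_null if not n)                             # m1: number of true H1
    fdp = v / r_total if r_total > 0 else 0.0  # assumed convention: FDP = 0 with no rejection
    ndp = t / m1 if m1 > 0 else 0.0
    return fdp, ndp


# Toy example: 6 hypotheses, the last 3 are true H1.
is_null = [True, True, True, False, False, False]
rejected = [True, False, False, True, True, False]
fdp, ndp = error_proportions(is_null, rejected)
```

Here one of the three rejections is a true H0 and one of the three true H1 is missed, so both proportions equal one third.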

SLIDE 5

Distribution of error rates in multiple tests

Distribution of the Non-Discovery Proportion (NDP = Tt/m1) on 1,000 simulated datasets per scenario (Friguet et al., 2009, JASA)

[Figure: mean NDP with 0.05 and 0.95 quantiles; axes labeled πj and NDP.]

SLIDE 7

Factor Analysis for Multiple Testing

The common information shared by all the variables (m) is modeled by a factor analysis structure.

The common factors Z: a small number (q << m) of latent variables (Friguet et al., 2009, JASA)

$$Y^{(k)} = \beta_0^{(k)} + x'\beta^{(k)} + b_k' Z + \varepsilon^{(k)}, \qquad Z \sim N(0; I_q), \qquad V(\varepsilon) = \Psi$$

$$\Sigma = BB' + \Psi$$

where Z holds the common factors, b_k' is the k-th row of B, BB' is the common variability and Ψ is the specific variability (uniqueness).
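To see how a few common factors induce a large-scale correlation structure, the decomposition Σ = BB′ + Ψ can be evaluated on a small case. This is an illustrative sketch only: the helper `factor_covariance` and the loadings are hypothetical values, chosen so that every variable has unit variance.

```python
def factor_covariance(B, psi):
    """Sigma = B B' + Psi for an m x q loading matrix B (list of rows)
    and a list psi holding the m diagonal entries of Psi."""
    m = len(B)
    # Common variability: entry (k, l) is the inner product of rows k and l of B.
    sigma = [[sum(bk * bl for bk, bl in zip(B[k], B[l])) for l in range(m)]
             for k in range(m)]
    # Specific variability only contributes to the diagonal.
    for k in range(m):
        sigma[k][k] += psi[k]
    return sigma


# Hypothetical single-factor example (q = 1, m = 3).
B = [[0.9], [0.8], [0.1]]
psi = [1 - row[0] ** 2 for row in B]   # uniquenesses giving unit variances
sigma = factor_covariance(B, psi)
```

The two variables with strong loadings end up highly correlated (about 0.72), while the third, which barely loads on the factor, stays almost uncorrelated with them.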

SLIDE 8

Factor Analysis for Multiple Testing

Similar idea: the Surrogate Variable Analysis method (Leek and Storey, 2007, 2008).

SLIDE 9

Factor-adjusted test statistics

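The adjusted test statistics are conditionally centered and scaled versions of the usual test statistics.

Conditional distribution of the usual test statistic $T^{(k)}$:

$$E(T^{(k)} \mid Z) = \tau_k + \frac{b_k'}{\sigma_k}\,\tau(Z), \qquad \mathrm{Var}(T^{(k)} \mid Z) = \frac{\psi_k^2}{\sigma_k^2}.$$

Conditional centering and scaling:

$$T_z^{(k)} = \frac{\sigma_k}{\psi_k}\left[\, T^{(k)} - \frac{b_k'}{\sigma_k}\,\tau(Z) \right],$$

with

$$E(T_z^{(k)}) = \frac{\tau_k}{\sqrt{1-h_k^2}} \quad \text{and} \quad \mathrm{Var}(T_z) = I_m.$$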
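The mean and variance of the adjusted statistic follow in one line from the conditional moments of the usual statistic. The derivation below is a sketch under two assumptions that the slide does not state: σ_k² is the k-th diagonal entry of Σ = BB′ + Ψ, and h_k² denotes the communality b_k′b_k/σ_k².

```latex
\begin{align*}
\sigma_k^2 &= b_k' b_k + \psi_k^2
  \quad\Longrightarrow\quad
  \frac{\psi_k^2}{\sigma_k^2} = 1 - h_k^2,
  \qquad h_k^2 := \frac{b_k' b_k}{\sigma_k^2} \quad \text{(assumed definition)} \\
E\big(T_z^{(k)} \mid Z\big)
  &= \frac{\sigma_k}{\psi_k}\Big[ E\big(T^{(k)} \mid Z\big) - \frac{b_k'}{\sigma_k}\,\tau(Z) \Big]
   = \frac{\sigma_k}{\psi_k}\,\tau_k
   = \frac{\tau_k}{\sqrt{1 - h_k^2}} \\
\mathrm{Var}\big(T_z^{(k)} \mid Z\big)
  &= \frac{\sigma_k^2}{\psi_k^2}\cdot\frac{\psi_k^2}{\sigma_k^2} = 1
\end{align*}
```

Since 1 − h_k² ≤ 1, the non-centrality of the adjusted statistic is at least as large in absolute value as that of the usual statistic, which is consistent with the gain in power reported on the simulation slides.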

SLIDE 10

Distribution of error rates in multiple tests

Distribution of the False Discovery Proportion on 1,000 simulated datasets per scenario (Friguet et al., 2009, JASA)

[Figure, two panels (usual t-tests; factor-adjusted t-tests): mean FDP with 0.05 and 0.95 quantiles; axes labeled πj and FDP.]

Standard deviation of the FDP by scenario:
  • Usual t-tests: 3.18e−02, 3.32e−02, 4.45e−02, 6.03e−02, 7.91e−02, 10.61e−02, 11.33e−02, 14.14e−02, 14.24e−02, 14.16e−02
  • Factor-adjusted t-tests: 3.18e−02, 3.39e−02, 3.34e−02, 3.06e−02, 2.99e−02, 3.12e−02, 3e−02, 4.1e−02, 4.33e−02, 3.88e−02

SLIDE 11

Distribution of error rates in multiple tests

Distribution of the Non-Discovery Proportion on 1,000 simulated datasets per scenario (Friguet et al., 2009, JASA)

[Figure, two panels (usual t-tests; factor-adjusted t-tests): mean NDP with 0.05 and 0.95 quantiles; axes labeled πj and NDP.]

SLIDE 14

The FAMT package steps

1. Estimation of the number of factors
2. Factor Analysis model (using M0 = {k, Pk ≥ α})
3. Multiple testing: conditional statistics and p-values
   • M0 is updated; steps 1 to 3 are done twice
4. Estimation of the proportion of null hypotheses
5. Benjamini and Hochberg’s procedure to control the FDR

Illustration on the Lymphoma dataset (Alizadeh et al., 2000)

  • 32 samples: 2 classes of B cell-like diffuse large cell lymphoma (DLCL): germinal center B cell-like DLCL (18 samples) and activated B cell-like DLCL (14 samples)
  • Expression levels of 10295 genes

SLIDE 15

1/ Estimation of the number of factors

The number of factors is chosen to reduce the variance of the number of false positives in multiple tests.

[Figure: Variance Inflation Criterion (y-axis ticks from 2000000 to 2400000) plotted against the number of factors (1 to 7).]

SLIDE 16

2/ Factor Analysis model

To deal with the high dimension, the model parameters are estimated with an EM algorithm (Rubin and Thayer, 1982):

  • E step: estimation of Z
  • M step: estimation of B and Ψ

$$Y^{(k)} = \beta_0^{(k)} + x'\beta^{(k)} + b_k' Z + \varepsilon^{(k)}, \qquad Z \sim N(0; I_q), \qquad V(\varepsilon) = \Psi$$

$$\Sigma = BB' + \Psi$$

where Z holds the common factors, b_k' is the k-th row of B, BB' is the common variability and Ψ is the specific variability (uniqueness).
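The alternation between the E step and the M step can be illustrated on the simplest case of a single factor (q = 1), where the posterior of Z has a closed form and no matrix inversion is needed. This is a minimal sketch in the spirit of Rubin and Thayer's EM, not the FAMT implementation; the function `em_one_factor`, the starting values and the simulated loadings are all hypothetical.

```python
import random


def em_one_factor(Y, n_iter=200):
    """EM for y_i = b * z_i + eps_i, z_i ~ N(0, 1), eps_i ~ N(0, diag(psi)).

    Y is a list of n centered observations, each a list of m values.
    Returns (b, psi) as two lists of length m.
    """
    n, m = len(Y), len(Y[0])
    syy = [sum(y[j] ** 2 for y in Y) / n for j in range(m)]  # sample variances
    b = [0.5 * s ** 0.5 for s in syy]   # arbitrary positive starting loadings
    psi = [0.75 * s for s in syy]       # matching starting uniquenesses
    for _ in range(n_iter):
        # E step: posterior mean and variance of each z_i given y_i.
        delta = sum(b[j] ** 2 / psi[j] for j in range(m))
        post_var = 1.0 / (1.0 + delta)
        weights = [post_var * b[j] / psi[j] for j in range(m)]
        z_mean = [sum(w * yj for w, yj in zip(weights, y)) for y in Y]
        # M step: regress each variable on the expected factor scores.
        szz = sum(z * z for z in z_mean) / n + post_var
        syz = [sum(y[j] * z for y, z in zip(Y, z_mean)) / n for j in range(m)]
        b = [s / szz for s in syz]
        psi = [syy[j] - b[j] * syz[j] for j in range(m)]
    return b, psi


# Simulated check with hypothetical loadings (not taken from the slides).
rng = random.Random(0)
true_b = [0.9, 0.8, 0.7, 0.6]
Y = []
for _ in range(2000):
    z = rng.gauss(0, 1)
    Y.append([bj * z + rng.gauss(0, (1 - bj ** 2) ** 0.5) for bj in true_b])
means = [sum(y[j] for y in Y) / len(Y) for j in range(len(true_b))]
Y = [[y[j] - means[j] for j in range(len(true_b))] for y in Y]
b_hat, psi_hat = em_one_factor(Y)
```

With more than one factor the same two steps apply, but the E step then requires the q × q posterior covariance of Z rather than a scalar.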

SLIDE 17

3/ Multiple testing (conditional p-values)

[Figure: scatter plot of the conditional p-values (cond_pvalues) against the Student p-values (stud_pvalues), both on a 0 to 1 scale.]

SLIDE 18

3/ Multiple testing (conditional p-values)

[Figure: Y plotted against Class. Observed Y: p = 0.29; adjusted Y: pz = 0.005.]

SLIDE 19

4/ Estimation of the proportion of null hypotheses

Key parameter to control the error rates. FAMT provides 2 estimation algorithms:

  • one based on the density of the conditional p-values
  • the other uses a modified smoothing spline approach (based on Storey and Tibshirani, 2003).

[Figure: diagnostic plot "Distribution of conditional p-values and estimated pi0": density of the conditional p-values on a 0 to 1 scale, with the estimate 0.73 marked. method = density, cond_pi0 = 0.7288]
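The idea behind both estimators is that p-values of true null hypotheses are uniform, so the upper part of the p-value distribution reflects the nulls only. The sketch below uses the simpler fixed-threshold estimator that the Storey and Tibshirani approach smooths over; it is a stand-in for illustration, not either of FAMT's two algorithms, and the function name, the threshold 0.5 and the p-values are hypothetical.

```python
def pi0_storey(pvalues, lam=0.5):
    """Fixed-threshold estimate of the proportion of true nulls:
    pi0(lam) = #{p > lam} / (m * (1 - lam)), capped at 1."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)   # p-values attributed to true nulls
    return min(1.0, above / (m * (1.0 - lam)))


# Hypothetical p-values: a few small ones, the rest spread over (0, 1).
pvals = [0.001, 0.004, 0.02, 0.03, 0.2, 0.45, 0.55, 0.7, 0.85, 0.95]
pi0_hat = pi0_storey(pvals)
```

Four of the ten p-values exceed 0.5, so the estimate is 4 / (10 × 0.5) = 0.8.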

SLIDE 20

5/ Benjamini and Hochberg’s procedure (q-values)

[Figure: number of H0 rejections (about 200 to 1200) against the cut-off on q-values (FDR, 0.00 to 0.15), for FAMT and Student.]
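The q-values compared in this step can be obtained from any vector of p-values with the Benjamini and Hochberg step-up rule, optionally scaled by an estimate of the proportion of true nulls. This is a generic sketch rather than FAMT's code; the function `bh_qvalues`, its `pi0` argument and the p-values are hypothetical.

```python
def bh_qvalues(pvalues, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values, optionally scaled by an
    estimate pi0 of the proportion of true null hypotheses."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])  # indices by increasing p-value
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pi0 * pvalues[k] * m / rank)
        q[k] = running_min
    return q


pvals = [0.01, 0.04, 0.03, 0.5]
qvals = bh_qvalues(pvals)
rejected = [q <= 0.05 for q in qvals]   # 5% FDR control level
```

In this toy case only the first hypothesis is rejected at the 5% level; passing `pi0=0.75` lowers every q-value by a quarter and brings the first three under the cut-off, which shows why the estimate of the proportion of nulls matters for power.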

SLIDE 21

Heat maps

Cut-off on the adjusted q-values: 5% FDR control level (389 genes)

[Figure: two heat maps (observed values; factor-adjusted values) of the selected genes, with the samples labeled Activated or GC.]

SLIDE 23

Concluding comments

  • FAMT procedure: large improvements in multiple testing procedures regarding FDR control and power (decreasing the non-discovery proportion)
  • The interpretation of the factors can be useful for biologists
  • The factor-adjustment of test statistics also decreases misclassification rates and improves the stability of model selection in supervised classification
  • FAMT package available at http://www.agrocampus-ouest.fr/math/FAMT

SLIDE 24

Interpretation of the factors
