
Differential analysis of microarray data, Multiple testing problems and Local False Discovery Rate.
S. Robin, robin@inapg.inra.fr
UMR INA-PG / INRA, Paris, Mathématique et Informatique Appliquées
Semi-parametric modeling: joint work with J.-J. Daudin, A. Bar-Hen, L. Pierre
Bio-Info-Math Workshop, Tehran, April 2005
S. Robin: Differential analysis of microarrays

Microarray data and differential analysis
Molecular biology central dogma:
DNA molecule (gene)
  ↓ transcription
messenger RNA (transcript)
  ↓ translation
Protein (biological function)
"Definition": expression level of a gene $\propto$ number of copies of mRNA in the cell

Microarray technology
Aims to monitor the expression level of several thousands of genes simultaneously.
1 spot = 1 gene
Expression level in the cell:
• at a given time,
• in a given condition.
Inferring genes' functions. Determining the conditions (times, tissues, etc.) in which the expression of a given gene is the highest (or lowest) should help in understanding its function.

Differential analysis
Elementary data: $Y_{itr}$ = expression level of gene $i$ in condition $t$ ($t = 1$ or $2$) at replicate $r$.
Differentially expressed genes are genes for which $Y_{i1r}$ is not distributed as $Y_{i2r}$.
Null hypothesis for gene $i$: $H_0(i) = \{ Y_{i1r} \overset{\mathcal{L}}{=} Y_{i2r} \}$ (equality in distribution).
Statistical test: Student, Wilcoxon, permutation, etc.
For each gene we get:
• the value of the test statistic $T_i$,
• the corresponding p-value $P_i = \Pr\{ T > T_i \mid H_0(i) \}$.
Comparing more than 2 conditions. Same problem: Fisher, Kruskal-Wallis tests provide one p-value for each gene.
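As an illustration of the permutation option, here is a minimal sketch of an exact permutation p-value for a single gene. The statistic (absolute difference of means, hence a two-sided test rather than the one-sided $\Pr\{T > T_i\}$ of the slide), the function name `permutation_pvalue` and the toy values are assumptions made for this example.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(cond1, cond2):
    """Exact two-sided permutation p-value for one gene.

    Assumed statistic: absolute difference between the two condition means.
    Every way of relabeling the pooled replicates is enumerated.
    """
    pooled = list(cond1) + list(cond2)
    n1, n_total = len(cond1), len(pooled)
    total = sum(pooled)
    observed = abs(mean(cond1) - mean(cond2))
    hits = 0
    relabelings = 0
    for idx in combinations(range(n_total), n1):
        s1 = sum(pooled[j] for j in idx)
        stat = abs(s1 / n1 - (total - s1) / (n_total - n1))
        relabelings += 1
        # Small tolerance so that ties with the observed value are counted.
        if stat >= observed - 1e-12:
            hits += 1
    return hits / relabelings


# Hypothetical gene with 4 replicates per condition.
p = permutation_pvalue([1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0])
print(p)
```

With 4 replicates per condition there are only 70 relabelings, so the smallest p-value this test can return is 2/70.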

Multiple testing problem
Rejection rule: For a given level $\alpha$, $P_i < \alpha \implies$ gene $i$ is declared positive (i.e. differentially expressed).
Multiple testing: When performing $n$ simultaneous tests

                 Decision (random)
                 $H_0$ accepted           $H_0$ rejected
$H_0$ true       TN (true negatives)      FP (false positives)     $n_0$ negatives
$H_0$ false      FN (false negatives)     TP (true positives)      $n_1$ positives
                 N negatives              R positives              $n$

All the random quantities (capital) depend on the data and the pre-fixed level $\alpha$.

Microarray experiment: Typically $n = 10\,000$ tests are performed simultaneously.
For $\alpha = 5\%$, if no gene is actually differentially expressed ($n_1 = 0$, $n_0 = n$), we expect $0.05 \times 10\,000 = 500$ "positive" genes, which are all false positives.
Problem: We'd like to control some "global risk" $\alpha^*$ such as
• the probability of having at least one false positive (FWER): $FWER = \Pr\{FP \ge 1\}$,
• or the expected proportion of false positives among the positives (FDR): $FDR = E(FP/R)$.
(Benjamini & Hochberg, JRSS-B, 1995; Dudoit & al., Stat. Sci., 2003)
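The count of 500 can be checked by simulation. The sketch below assumes that, under $H_0$, p-values are uniform on [0, 1]; the function name `count_positives` and the seed are arbitrary choices for this example.

```python
import random


def count_positives(n, alpha, seed=0):
    """Count p-values below alpha when all n null hypotheses are true.

    Null p-values are drawn as independent uniforms on [0, 1] (assumption).
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if rng.random() < alpha)


n, alpha = 10_000, 0.05
expected_fp = n * alpha                  # expected number of false positives
simulated_fp = count_positives(n, alpha)  # one simulated experiment
print(expected_fp, simulated_fp)
```

Every gene counted here is a false positive, since no gene is differentially expressed in the simulation.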

Family Wise Error Rate (FWER)
$FWER = \Pr\{FP \ge 1\}$
Sidak: If the $n$ tests are independent, $FP \sim \mathcal{B}(n, \alpha) \implies \Pr\{FP \ge 1\} = 1 - (1 - \alpha)^n$.
Fixing the level at $\alpha = 1 - (1 - \alpha^*)^{1/n}$ ($\simeq \alpha^*/n$) ensures $FWER = \alpha^*$.
Bonferroni: In any case, $FWER = \Pr\{\bigcup_i \{i \text{ false positive}\}\} \le \sum_i \Pr\{i \text{ false positive}\} = n\alpha$.
Fixing the level at $\alpha = \alpha^*/n$ ensures $FWER \le \alpha^*$.
Remark: The independent case is, in some sense, the worst case.
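The two per-test levels can be compared numerically. This is a small sketch of the formulas above; the function names are chosen for this example only.

```python
def fwer_independent(alpha, n):
    """FWER of n independent tests at level alpha when every H0 is true."""
    return 1.0 - (1.0 - alpha) ** n


def sidak_level(alpha_star, n):
    """Per-test level giving FWER = alpha_star under independence."""
    return 1.0 - (1.0 - alpha_star) ** (1.0 / n)


def bonferroni_level(alpha_star, n):
    """Per-test level giving FWER <= alpha_star in any case."""
    return alpha_star / n


n, alpha_star = 10_000, 0.05
a_sidak = sidak_level(alpha_star, n)
a_bonf = bonferroni_level(alpha_star, n)
print(a_sidak, a_bonf, fwer_independent(0.05, n))
```

The Sidak level is only slightly larger than the Bonferroni level, which is why the two corrections give almost the same gene lists in practice.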

Adaptive procedure for FWER
Idea: One-step procedures are designed for the smallest p-value $\implies$ they are too conservative.
Principle: Order the p-values $P_{(1)} \le \dots \le P_{(i)} \le \dots \le P_{(n)}$.
Step 1: Apply (say) the Bonferroni correction to $P_{(1)}$: if $P_{(1)} \le \alpha^*/n$, declare gene (1) positive and go to step 2; otherwise stop.
Step 2: Apply the same correction to $P_{(2)}$, replacing $n$ by $n - 1$: if $P_{(2)} \le \alpha^*/(n - 1)$, declare gene (2) positive and go to step 3; otherwise stop.
Step $k$: Apply the same correction to $P_{(k)}$, replacing $n$ by $n - k + 1$: if $P_{(k)} \le \alpha^*/(n - k + 1)$, declare gene ($k$) positive and go to step $k + 1$; otherwise stop.
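The steps above can be sketched as a short loop. The function name `holm_step_down` and the five p-values are hypothetical, chosen so that the adaptive procedure rejects one more gene than the one-step Bonferroni threshold $\alpha^*/n = 0.01$.

```python
def holm_step_down(pvalues, alpha_star):
    """Indices of the genes declared positive by the step-down Bonferroni rule."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    rejected = []
    for k, i in enumerate(order, start=1):
        # Step k: compare the k-th smallest p-value to alpha* / (n - k + 1).
        if pvalues[i] <= alpha_star / (n - k + 1):
            rejected.append(i)
        else:
            break  # stop at the first p-value that fails
    return rejected


pvals = [0.001, 0.045, 0.012, 0.3, 0.02]  # hypothetical p-values
rejected = holm_step_down(pvals, 0.05)
print(rejected)
```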

Thresholds for Golub data: 27 patients with ALL, 11 with AML, $n = 7070$ genes, Welch test
[Figure: p-value (log scale, $10^{0}$ down to $10^{-16}$) against gene index (0 to 8000). Curves: p-value, 5%, Bonferroni, Holm, Sidak, adaptive Sidak.]

Adjusted p-values can be directly compared to the desired FWER $\alpha^*$.
• One-step Bonferroni: $\tilde P_{(i)} = \min(n P_{(i)}, 1) \le \alpha^* \iff P_{(i)} \le \alpha^*/n$
• One-step Sidak: $\tilde P_{(i)} = 1 - (1 - P_{(i)})^n \le \alpha^* \iff P_{(i)} \le 1 - (1 - \alpha^*)^{1/n}$
• Adaptive Bonferroni (Holm, 79): $\tilde P_{(i)} = \max_{j \le i} \{ \min[(n - j + 1) P_{(j)}, 1] \}$
• Adaptive Sidak: $\tilde P_{(i)} = \max_{j \le i} \{ \min[1 - (1 - P_{(j)})^{n - j + 1}, 1] \}$
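The four formulas can be computed in one pass over the sorted p-values. This is a minimal sketch; the function name, the dictionary keys and the example p-values are assumptions of this illustration.

```python
def adjusted_pvalues(pvalues):
    """Adjusted p-values, returned in the original gene order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    names = ("bonferroni", "sidak", "holm", "sidak_adaptive")
    adj = {name: [0.0] * n for name in names}
    run_holm = 0.0   # running max over j <= i (adaptive Bonferroni)
    run_sidak = 0.0  # running max over j <= i (adaptive Sidak)
    for j, i in enumerate(order, start=1):
        p = pvalues[i]
        adj["bonferroni"][i] = min(n * p, 1.0)
        adj["sidak"][i] = 1.0 - (1.0 - p) ** n
        run_holm = max(run_holm, min((n - j + 1) * p, 1.0))
        run_sidak = max(run_sidak, min(1.0 - (1.0 - p) ** (n - j + 1), 1.0))
        adj["holm"][i] = run_holm
        adj["sidak_adaptive"][i] = run_sidak
    return adj


pvals = [0.001, 0.045, 0.012, 0.3, 0.02]  # hypothetical p-values
adj = adjusted_pvalues(pvals)
positives = [i for i, v in enumerate(adj["holm"]) if v <= 0.05]
print(positives)
```

Comparing each adjusted p-value to $\alpha^*$ gives the same decisions as running the corresponding procedure at level $\alpha^*$.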

Accounting for dependency
The Westfall & Young (93) procedure preserves the correlation between genes using permutation tests and applying the same permutations to all the genes.
Adjusted p-values are estimated by
$\hat{\tilde p}_g = \frac{1}{S} \sum_s I\{ p^s_{(g)} < p_g \}$ ("minP" procedure),
$\hat{\tilde p}_g = \frac{1}{S} \sum_s I\{ |T^s_{(g)}| > |T_g| \}$ ("maxT" procedure),
where $s$ indexes the $S$ permutations.
The same procedure makes it possible to estimate the distribution of the second, third, etc., smallest p-value.
Limitation. The number of replicates strongly conditions the precision of the estimated distribution: $\binom{8}{4} = 70$, $\binom{10}{5} = 252$.
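A simplified illustration of the idea: the sketch below implements a single-step "maxT" variant, not the full step-down procedure of Westfall & Young. It assumes the absolute difference of means as test statistic, enumerates every relabeling of the samples, and counts ties with $\ge$ so that the observed labeling is included. All names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def abs_mean_diff(values, group1):
    """|mean of samples in group1 - mean of the other samples| for one gene."""
    g1 = [v for j, v in enumerate(values) if j in group1]
    g2 = [v for j, v in enumerate(values) if j not in group1]
    return abs(mean(g1) - mean(g2))


def maxt_adjusted_pvalues(data, n1):
    """Single-step maxT adjusted p-values.

    data[g] holds the values of gene g over all samples; the first n1
    samples belong to condition 1. The same relabeling is applied to all
    genes, which preserves the correlation between genes.
    """
    n_samples = len(data[0])
    observed = [abs_mean_diff(row, set(range(n1))) for row in data]
    relabelings = list(combinations(range(n_samples), n1))
    counts = [0] * len(data)
    for idx in relabelings:
        group1 = set(idx)
        t_max = max(abs_mean_diff(row, group1) for row in data)
        for g, t_obs in enumerate(observed):
            if t_max >= t_obs - 1e-12:
                counts[g] += 1
    return [c / len(relabelings) for c in counts]


data = [
    [1, 2, 3, 4, 11, 12, 13, 14],  # hypothetical differentially expressed gene
    [5, 3, 6, 4, 4, 6, 3, 5],      # hypothetical non-differential gene
]
adjusted = maxt_adjusted_pvalues(data, n1=4)
print(adjusted)
```

With 4 + 4 samples there are only $\binom{8}{4} = 70$ relabelings, so no adjusted p-value can fall below 2/70 here, which illustrates the limitation stated above.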

False Discovery Rate (FDR)
$FDR = E(FP/R)$
Idea: Instead of preventing any error, just control the proportion of errors $\implies$ less conservative.
Benjamini & Hochberg (95) procedure: Given the sorted p-values $P_{(1)} \le \dots \le P_{(i)} \le \dots \le P_{(n)}$, rejecting $H_0$ for all $(i)$ up to the largest one such that
$P_{(i)} \le i\alpha^*/n \ (\le i\alpha^*/n_0)$
ensures $FDR \le (n_0/n)\,\alpha^* \le \alpha^*$.
Benjamini & Yekutieli (01): The procedure remains valid for positively correlated test statistics; under arbitrary dependency, use $P_{(i)} \le i\alpha^* / (n \sum_j 1/j)$.
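A minimal sketch of the Benjamini & Hochberg rule follows. The function name and the p-values are hypothetical; on these values the procedure rejects three genes, where the adaptive Bonferroni procedure at the same $\alpha^*$ would reject two.

```python
def benjamini_hochberg(pvalues, alpha_star):
    """Indices of the genes rejected by the Benjamini & Hochberg procedure."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    k_max = 0
    for k, i in enumerate(order, start=1):
        # Keep the largest rank k whose p-value lies below k * alpha* / n.
        if pvalues[i] <= k * alpha_star / n:
            k_max = k
    return sorted(order[:k_max])


pvals = [0.001, 0.045, 0.012, 0.3, 0.02]  # hypothetical p-values
bh_rejected = benjamini_hochberg(pvals, 0.05)
print(bh_rejected)
```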

Adjusted p-values for Golub data
Number of positive genes at $\alpha^* = 5\%$:
• p-value: 1887
• Bonferroni: 111
• Sidak: 113
• Holm: 112
• Adaptive Sidak: 113
• FDR: 903
[Figure: adjusted p-values (log scale, $10^{0}$ down to $10^{-18}$) against gene index (0 to 1500).]

Local False Discovery Rate
FDR provides general information about the risk of the whole procedure (up to step $i$). We are interested in a specific risk, associated with each gene.
Local FDR ($\ell FDR$). First defined by Efron & al. (JASA, 2001) in a mixture model framework:
$\ell FDR_i := \Pr\{ H_0(i) \text{ is true} \mid T_i \}$.
Derivative of the FDR: $\ell FDR$ can also be defined as the derivative of the FDR,
$\ell FDR(t) = \lim_{h \downarrow 0} \frac{FDR(t + h) - FDR(t)}{h}$,
which can be estimated by $n_0 (P_{(i)} - P_{(i-1)})$ (Aubert & al., BMC Bioinfo., 04).

Estimation of the proportion $n_1/n$
The efficiency of all multiple testing procedures would be improved if $n_0$ was known.
Empirical cdf. The cumulative distribution function (cdf) of the p-values can be estimated via its empirical version:
$\hat G(p) = \frac{1}{n} \sum_{i=1}^{n} I\{ P_i \le p \}$.
The cdf of the negative p-values is given by the uniform distribution: $\Pr\{ P_i \le p \mid i \in H_0 \} = p$.
Cdf mixture. Denoting $F$ the cdf of the positive p-values, we have
$G(p) = aF(p) + (1 - a)p$, where $a = n_1/n$.
Above a certain threshold $t$, $F(p)$ should be close to 1:
$p > t$: $G(p) \simeq a + (1 - a)p$.

Empirical proportion. Storey & al, Genovese & Wasserman (JRSS-B, 02) propose an estimate of $a$ based on this approximation. Since $1 - G(t) \simeq (1 - a)(1 - t)$,
$1 - \hat a = [1 - P(t)/n] / (1 - t)$,
reading $P(t)$ as the number of p-values not exceeding $t$, so that $P(t)/n = \hat G(t)$.
Linear regression. $(1 - a)$ can also be estimated by the slope of the linear regression of $\hat G(p)$ with respect to $p$.
[Figure: two panels plotted against $p$ in [0, 1], with vertical axes running from 0 to 80 and from 0 to 1.]
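The empirical-proportion estimate can be tried on simulated p-values. In the sketch below the data are entirely hypothetical: 8000 null p-values drawn uniformly on [0, 1] and 2000 positive p-values concentrated below 0.01, so that $F(t) = 1$ at $t = 0.5$. The function names are assumptions of this example.

```python
import random


def empirical_cdf(pvalues, p):
    """Empirical cdf of the p-values evaluated at p."""
    return sum(1 for x in pvalues if x <= p) / len(pvalues)


def estimate_null_proportion(pvalues, t):
    """Estimate 1 - a = n0 / n from the p-values above the threshold t."""
    return (1.0 - empirical_cdf(pvalues, t)) / (1.0 - t)


rng = random.Random(1)
n0, n1 = 8000, 2000
null_p = [rng.random() for _ in range(n0)]       # uniform under H0
alt_p = [0.01 * rng.random() for _ in range(n1)]  # hypothetical positives
pvals = null_p + alt_p

a_hat = 1.0 - estimate_null_proportion(pvals, t=0.5)
print(a_hat)  # should be close to n1 / n = 0.2
```

The choice of $t$ is a trade-off: a small $t$ risks including positive p-values (bias), a large $t$ leaves few p-values above it (variance).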

Mixture model
Model: $f(x) = \pi_1 f_1(x) + \pi_2 f_2(x) + \pi_3 f_3(x)$
Posterior probability: $\tau_{gk} = \Pr\{ g \in f_k \mid x_g \} = \pi_k f_k(x_g) / f(x_g)$

$\tau_{gk}$ (%)    g = 1    g = 2    g = 3
k = 1          65.8     0.7      0.0
k = 2          34.2     47.8     0.0
k = 3          0.0      51.5     1.0
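The posterior probabilities follow directly from Bayes' formula. The sketch below assumes Gaussian components with made-up proportions, means and standard deviations; none of these values come from the slide.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


def posterior_probabilities(x, weights, means, sds):
    """tau_k = pi_k f_k(x) / f(x) for each component k of the mixture."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)  # mixture density f(x)
    return [j / total for j in joint]


# Hypothetical three-component Gaussian mixture.
weights = [0.5, 0.3, 0.2]
means = [-2.0, 0.0, 3.0]
sds = [1.0, 1.0, 1.0]

tau = posterior_probabilities(0.0, weights, means, sds)
print([round(100 * t, 1) for t in tau])  # percentages, as in the table
```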

Distribution of the test statistic. Efron & al. (01) propose to describe the distribution of the test statistic $T_i$ using a mixture model:
$T_i \sim f(t) = p_1 f_1(t) + p_0 f_0(t)$,
where the proportion $p_1 = 1 - p_0$ (the $a$ of the previous slides), $f_0$ and $f_1$ are all unknown.
[Figure 2 (from Efron & al.): Estimates of $f(\cdot)$, $f_0(\cdot)$ and $f_1(\cdot)$ for the situation of Figure 1, model (3.3); $p_1 = .189$, its minimum possible value. Axes: Z value (-4 to 4) against density (0.0 to 0.5).]
