Improving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses
Amit Zeisel, Or Zuk, Eytan Domany (W.I.S.)
June 15, 2009

Introduction/Motivation
High-throughput technologies routinely require testing thousands of hypotheses simultaneously. The field of multiple testing develops methods for determining the level of significance in such a scenario.

Multiple testing
$m$ null hypotheses $H_{0,i}$, $i = 1, 2, \ldots, m$.
Example: for $m$ variables, $H_{0,i}: \mu_i^A = \mu_i^B$.
Calculate p-values $p_i$ and set a threshold for significance.
The random variables $(U, V, T, S)$ and the parameters $(m, m_0, m_1)$ describe this scenario:

"ground truth"              non-rejected hypotheses   rejected hypotheses   total
null hypothesis is true     $U$                       $V$                   $m_0$
null hypothesis is false    $T$                       $S$                   $m_1$
total                       $m - R$                   $R$                   $m$

Fraction of false discoveries $= V/R$.

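The bookkeeping in the table above can be sketched in a few lines of Python. This is only an illustration: the function names `confusion_counts` and `false_discovery_proportion` and the boolean-list inputs are assumptions made for this example, and in real data the ground truth is of course unknown.

```python
def confusion_counts(is_true_null, rejected):
    """Return (U, V, T, S) for two equal-length sequences of booleans.

    is_true_null[i] is True when null hypothesis i is true (ground truth);
    rejected[i] is True when hypothesis i was rejected.
    """
    if len(is_true_null) != len(rejected):
        raise ValueError("inputs must have the same length")
    U = V = T = S = 0
    for null, rej in zip(is_true_null, rejected):
        if null and not rej:
            U += 1          # true null, not rejected
        elif null and rej:
            V += 1          # true null, rejected (false discovery)
        elif not null and not rej:
            T += 1          # false null, not rejected
        else:
            S += 1          # false null, rejected (true discovery)
    return U, V, T, S


def false_discovery_proportion(V, R):
    """V / R, using max(R, 1) in the denominator so that R = 0 gives 0."""
    return V / max(R, 1)
```

Here `m_0 = U + V`, `m_1 = T + S` and `R = V + S`, matching the row and column totals of the table.
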
Control of the FDR $= E(V/R^+)$
In 1995 Benjamini and Hochberg proposed a procedure (BH95) to control the FDR for a given set of p-values:
1. Sort and relabel the p-values, $p_{(1)} \le p_{(2)} \le \ldots \le p_{(m)}$.
2. Choose the desired FDR level $0 \le q \le 1$.
3. Define the set of constants $\alpha_i = \frac{i}{m} q$, $i = 1, 2, \ldots, m$.
4. Identify $R = \max\{ i : p_{(i)} \le \alpha_i \}$.
5. If $R \ge 1$, reject the hypotheses corresponding to $p_{(1)}, \ldots, p_{(R)}$; otherwise no hypothesis is rejected.
BH proved that $\mathrm{FDR} \le \frac{m_0}{m} q \le q$.
The bound is not tight, so there is room for improvement.

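The five steps above can be sketched as follows. This is a minimal illustration in plain Python rather than a reference implementation; the function name `benjamini_hochberg` and the choice to return the original indices of the rejected hypotheses are assumptions made for this example.

```python
def benjamini_hochberg(p_values, q):
    """Return the sorted original indices of the hypotheses rejected by BH95."""
    m = len(p_values)
    # Step 1: indices of the p-values in increasing order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Steps 3-4: R is the largest rank i with p_(i) <= (i / m) * q.
    R = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            R = rank
    # Step 5: reject the R smallest p-values (nothing if R == 0).
    return sorted(order[:R])
```

Because R is a maximum over all ranks, a p-value can be rejected even when it exceeds its own constant, as long as a larger p-value falls below a later one. For example, with `p_values = [0.02, 0.03, 0.035, 0.9]` and `q = 0.05`, only the third-ranked p-value satisfies its constant, yet the first three hypotheses are all rejected.
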
Control vs. estimation, and improved procedures
Control ($q \Rightarrow R$): the significance is preset at a desired level $q$. The procedure yields a set of rejected hypotheses with $\mathrm{FDR} \le q$.
Estimation ($R \Rightarrow q$): the threshold is preset at a level that yields a desired number of rejections $R$. The corresponding FDR is then estimated.
There have been many attempts to produce tighter bounds on the FDR using an estimator for $m_0$.
Difficulty: the estimator $\hat{m}_0$ is itself a fluctuating random variable.

Aim
Produce an improved BH procedure using an estimator $\hat{m}_0$ for $m_0$, the number of true null hypotheses.

Definitions: monotonic estimator
An estimator for $m_0$ is a family of functions $\hat{m}_0^{(m)} : [0,1]^m \to \mathbb{R}$, written $\hat{m}_0 \equiv \hat{m}_0^{(m)}(p_1, \ldots, p_m)$.
$\hat{m}_0$ is a monotonic estimator if it satisfies:
1. $\hat{m}_0^{(m)}(p_1, \ldots, p_i, \ldots, p_m) \ge \hat{m}_0^{(m)}(p_1, \ldots, p_i', \ldots, p_m)$ if $p_i \ge p_i'$, $\forall i = 1, 2, \ldots, m$, $m \ge 1$.
2. $\hat{m}_0^{(m)}(p_1, \ldots, p_i, \ldots, p_m) \ge \hat{m}_0^{(m-1)}(p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_m)$, $\forall i = 1, 2, \ldots, m$, $m \ge 2$.

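The two conditions can be spot-checked numerically for a candidate estimator. The sketch below is only a randomized sanity check, not a proof of monotonicity; the function name `spot_check_monotonic`, its parameters, and the convention that an estimator is a callable taking a list of p-values are all assumptions made for this example.

```python
import random


def spot_check_monotonic(estimator, m=20, trials=200, seed=0, tol=1e-12):
    """Return False if a random trial violates condition 1 or 2, else True."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = [rng.random() for _ in range(m)]
        i = rng.randrange(m)
        base = estimator(p)

        # Condition 1: raising one p-value must not lower the estimate.
        raised = list(p)
        raised[i] = p[i] + (1.0 - p[i]) * rng.random()
        if estimator(raised) < base - tol:
            return False

        # Condition 2: dropping one p-value must not raise the estimate.
        reduced = p[:i] + p[i + 1:]
        if estimator(reduced) > base + tol:
            return False
    return True
```

For instance, `spot_check_monotonic(lambda p: 2 * sum(p))` passes, while an estimator that decreases in the p-values, such as `lambda p: -sum(p)`, is flagged.
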
Definitions: modified BH procedure ($m \to \hat{m}_0$)
Given $m$ hypotheses of which $m_0$ are true nulls, let $p_1, \ldots, p_m$ be the respective p-values. The modified BH procedure with estimator $\hat{m}_0$ is:
1. Compute $\hat{m}_0 \equiv \hat{m}_0(p_1, \ldots, p_m)$.
2. Sort and relabel the p-values, $p_{(1)} \le \ldots \le p_{(m)}$.
3. Define the set of constants $q_k = \frac{qk}{\hat{m}_0}$, $k = 1, 2, \ldots, m$.
4. Let $R = \max\{ k : p_{(k)} \le q_k \}$.
5. If $R \ge 1$, reject the hypotheses corresponding to $p_{(1)}, \ldots, p_{(R)}$; otherwise do not reject any hypothesis.

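A minimal sketch of the modified procedure follows. It differs from BH95 only in that the constants use the estimate in place of m. The function name `modified_bh` and the `estimator` callable interface are assumptions made for this example, as is the guard against a non-positive estimate.

```python
def modified_bh(p_values, q, estimator):
    """Return the sorted original indices rejected by the modified BH procedure.

    estimator: callable mapping the list of p-values to an estimate of m_0
    (a hypothetical interface chosen for this sketch).
    """
    m = len(p_values)
    # Step 1: estimate the number of true null hypotheses.
    m0_hat = estimator(p_values)
    if m0_hat <= 0:
        raise ValueError("the estimate of m_0 must be positive")
    # Step 2: indices of the p-values in increasing order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Steps 3-4: R is the largest rank k with p_(k) <= q * k / m0_hat.
    R = 0
    for k, idx in enumerate(order, start=1):
        if p_values[idx] <= q * k / m0_hat:
            R = k
    # Step 5: reject the R smallest p-values (nothing if R == 0).
    return sorted(order[:R])
```

Passing `estimator=len` sets the estimate to m and reproduces BH95. A smaller estimate enlarges every constant, so the procedure can only reject more hypotheses.
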
Theorem for control (see Benjamini et al 2006 (BKY))
Let $\hat{m}_0 \equiv \hat{m}_0(p_1, \ldots, p_m)$ be a monotonic estimator for $m_0$.
Let $\hat{m}_0^{(\setminus 1)}(p_1, \ldots, p_m) \equiv \hat{m}_0(p_2, \ldots, p_m)$ be the same estimator, but disregarding the first p-value $p_1$.
Assume that the null p-values are i.i.d. $U[0,1]$. Then the modified BH procedure satisfies:
$\mathrm{FDR} = E\left[\frac{V}{R^+}\right] \le m_0\, q\, E\left[\frac{1}{\hat{m}_0^{(\setminus 1)}}\right]$   (1)
Note: if $E\left[\frac{1}{\hat{m}_0^{(\setminus 1)}}\right] \le \frac{1}{m_0}$, then $\mathrm{FDR} \le q$.

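The note follows from inequality (1) in one step. Writing $\hat{m}_0^{(\setminus 1)}$ for the estimator that disregards $p_1$ (the superscript notation is a reconstruction chosen for this sketch), substituting the assumed bound on the expectation gives:

```latex
\[
\mathrm{FDR}
  \;\le\; m_0\, q\, E\!\left[\frac{1}{\hat{m}_0^{(\setminus 1)}}\right]
  \;\le\; m_0\, q \cdot \frac{1}{m_0}
  \;=\; q .
\]
```
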
Two improved procedures
The two procedures are based on the estimators:
1. IBHsum: $\hat{m}_0 = C(m) \cdot \min\left[ m, \max\left( s(m),\ 2 \sum_{j=1}^{m} p_j \right) \right]$.
   $C(m)$ and $s(m)$ are universal correction factors.
   [Figure: (a) $C$ versus $m$; (b) $s/m$ versus $m$; $m$ on a logarithmic axis from $10^2$ to $10^5$.]
2. IBHlog: $\tilde{m}_0 = 2 - \sum_{i=1}^{m} \log(1 - p_i)$.
Both procedures satisfy $E(V/R^+) \le q$.

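The two estimators can be sketched directly from their formulas. Several things here are assumptions for illustration: the function names, reading "log" as the natural logarithm, and the handling of the correction factors. The slide gives C(m) and s(m) only as curves in a figure, so they are taken as caller-supplied arguments, and the defaults `C=1.0` and `s=0.0` are placeholders rather than the authors' values.

```python
import math


def m0_hat_ibhsum(p_values, C=1.0, s=0.0):
    """IBHsum estimate: C * min(m, max(s, 2 * sum of the p-values)).

    C and s stand in for the correction factors C(m) and s(m); the defaults
    are placeholders, not values taken from the source.
    """
    m = len(p_values)
    return C * min(m, max(s, 2.0 * sum(p_values)))


def m0_hat_ibhlog(p_values):
    """IBHlog estimate: 2 - sum of log(1 - p_i), natural log assumed.

    Requires every p-value to be strictly below 1, since log(0) is undefined.
    """
    return 2.0 - sum(math.log(1.0 - p) for p in p_values)
```

Either function can be used as the estimator in the modified BH procedure, for example by fixing the correction factors with `lambda p: m0_hat_ibhsum(p, C=c, s=s)` for caller-chosen `c` and `s`.
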
Performance: simulations (IBHsum)
$\rho$ is the correlation between test statistics.
[Figure: four contour panels, for ($q = 0.05$, $\rho = 0$), ($q = 0.05$, $\rho = 0.8$), ($q = 0.2$, $\rho = 0$) and ($q = 0.2$, $\rho = 0.8$); horizontal axis $m_0/m$, vertical axis $\mu_1$.]

Comparison with other methods
Results from simulations, $m = 500$, $\mu_1 = 3.5$:
[Figure: $E(V/R)$ and $E(S)/m_1$ versus $m_0/m$, for $\rho = 0$ and $\rho = 0.8$. Methods shown: Oracle, BH (1995), BKY (Benjamini et al 2006), STS (Storey et al 2004), IBHsum, IBHlog.]

Applying to 33 gene expression datasets
[Figure: four panels plotted against $i/m$; columns "Large number of discoveries" and "Small number of discoveries", rows "Two tailed p" and "One tailed p"; both axes run from 0 to 1.]

Group                                                       q      BKY             STS             IBHsum          IBHlog
a. Two tailed, large number of discoveries (10 studies)     0.05   1.110 (0.043)   1.239 (0.138)   1.200 (0.110)   1.222 (0.130)
b. Two tailed, small number of discoveries (10 studies)     0.05   1.003 (0.003)   1.316 (0.197)   1.231 (0.140)   1.291 (0.179)
c. One tailed, large number of discoveries (8 studies)      0.05   1.049 (0.019)   1.011 (0.033)   1.014 (0.026)   0.108 (0.306)
d. One tailed, small number of discoveries (5 studies)      0.05   0.998 (0.020)   1.027 (0.052)   1.025 (0.017)   0.882 (0.123)

Summary and conclusions
We proved a theorem that provides a bound on the FDR for any improved procedure based on a monotonic estimator of $m_0$.
We proposed two improved procedures based on the estimators $2 \sum_{j=1}^{m} p_j$ and $\sum_{i=1}^{m} \log(1 - p_i)$.
For the case of independent statistics, all improved procedures provide similar results: saturation of the bound and more power than BH95.
We showed by simulations that even in the case of dependent statistics our procedures provide a reliable bound and improved power.
For real gene expression data, where dependencies are expected, our methods improve, in general, over existing ones.