Hommel’s Method for False Discovery Proportions


Hommel’s Method for False Discovery Proportions. Jelle Goeman. Joint work with: Aldo Solari, Rosa Meijer. Van Dantzig, 2016-02-26.


SLIDE 1

Exploratory data analysis Closed testing A Confidence Set Simes Relationships Applications Discussion

Hommel’s Method for False Discovery Proportions

Jelle Goeman

Joint work with: Aldo Solari, Rosa Meijer

Van Dantzig, 2016-02-26

SLIDE 5

Data analysis in genomics

Top differential expression

Gene      p-value
XDH       5.5e-10
NEK3      6.7e-7
TAF5      7.1e-7
CYP2A7    1.6e-6
NAT2      1.8e-6
ZNF19     2.6e-6
SKP1      2.7e-6
NAT1      3.1e-6
GDF3      2.0e-5
CCDC25    2.1e-5
...       ...

Familywise error control
95% confidence: no false positives

False discovery rate control
Expected proportion of false positives < 5%

Practice
Genes chosen for validation

Question
How many false positives to expect?

SLIDE 6

Set-up

Hypotheses
$H_1, \ldots, H_m$

True hypotheses
$T \subseteq \{1, \ldots, m\}$: indices of true hypotheses

Rejections
$R \subseteq \{1, \ldots, m\}$: set of rejected hypotheses (usually random)

Type I errors
$T \cap R \subseteq \{1, \ldots, m\}$

SLIDE 8

FWER, FDR, k-FWER

User role
Before seeing the data, choose the error rate to be controlled:
  • FWER: $P(T \cap R \neq \emptyset)$
  • FDR: $E\left[ \frac{\#(T \cap R)}{\#R \vee 1} \right]$

Procedure
Chooses R that controls the chosen error rate

Problem
  • R is often too small or too large
  • R is based on p-values only
  • “Take it or leave it”
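As a small illustration of the quantities inside these two error rates, the sketch below computes the set of type I errors and the false discovery proportion for one fixed pair of sets. The sets `T` and `R` are made-up values for illustration only; FWER and FDR are then the probability of a non-empty error set and the expectation of the proportion over repeated experiments.

```python
def type_one_errors(true_nulls, rejected):
    """Indices that are both true hypotheses and rejected: T ∩ R."""
    return set(true_nulls) & set(rejected)


def false_discovery_proportion(true_nulls, rejected):
    """#(T ∩ R) / (#R ∨ 1); the maximum with 1 avoids dividing by zero."""
    errors = type_one_errors(true_nulls, rejected)
    return len(errors) / max(len(rejected), 1)


# Hypothetical example: hypotheses 1..6 are true, the procedure rejects 5..9.
T = {1, 2, 3, 4, 5, 6}
R = {5, 6, 7, 8, 9}

errors = type_one_errors(T, R)
fdp = false_discovery_proportion(T, R)
any_error = len(errors) > 0  # the event whose probability is the FWER
```

An empty rejection set gives a proportion of zero, which is the role of the `∨ 1` in the FDR formula.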

SLIDE 12

Alternative: simultaneous control

Role of the user
The user selects the collection of hypotheses R freely and post hoc

Role of the multiple testing procedure
Inform the user of the number or proportion of false rejections incurred

Number of false rejections
  = #(T ∩ R)
  = function of the model parameters
  = something we can estimate or make a confidence interval for

Post hoc
If we make a simultaneous CI, post hoc choice of R is allowed

SLIDE 13

Closed Testing: ingredients

Marcus, Peritz and Gabriel (1976)
Fundamental principle of FWER control

Intersection hypothesis
$H_C = \bigcap_{i \in C} H_i$, for $C \subseteq \{1, \ldots, m\}$

Closure
Collection of all intersection hypotheses: $\mathcal{C} = \{ H_C : C \subseteq \{1, \ldots, m\} \}$

Local test
Valid α-level test for every intersection hypothesis

SLIDE 15

Closed testing (graphically)

[Figure: graph of the hypotheses A, B, C and their intersections A ∩ B, A ∩ C, B ∩ C and A ∩ B ∩ C]

SLIDE 18

Closed testing: procedure

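Raw rejections
Hypotheses $\mathcal{U} \subseteq \mathcal{C}$ rejected by the local test

Multiplicity-corrected rejections
Reject $H \in \mathcal{C}$ if $J \in \mathcal{U}$ for every $J \subseteq H$

Statement
$P(\mathcal{R} \cap \mathcal{T} = \emptyset) \geq 1 - \alpha$, with $\mathcal{R} = \{C \in \mathcal{C} : C \text{ rejected}\}$ and $\mathcal{T} = \{C \in \mathcal{C} : C \text{ true}\}$

Proof
$\{\mathcal{R} \cap \mathcal{T} = \emptyset\} \supseteq \{H_T \notin \mathcal{U}\}$, and $P(H_T \notin \mathcal{U}) \geq 1 - \alpha$ because the local test of the true intersection hypothesis $H_T$ has level α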
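The procedure can be sketched by brute force for a handful of hypotheses. The code below is a minimal illustration, not the authors' implementation: it uses a Bonferroni-type local test as a stand-in (an assumption made only for this example), and the p-values are invented. Note that $J \subseteq H$ as hypotheses corresponds to the index set of $J$ containing the index set of $H$.

```python
from itertools import combinations


def all_index_sets(m):
    """All non-empty subsets of {0, ..., m-1}, as frozensets."""
    return [frozenset(c) for k in range(1, m + 1)
            for c in combinations(range(m), k)]


def bonferroni_local_test(pvals, index_set, alpha):
    """Illustrative local test: reject H_I if min p_i <= alpha / |I|."""
    return min(pvals[i] for i in index_set) <= alpha / len(index_set)


def closed_testing(pvals, alpha, local_test):
    """Return the index sets I whose H_I is rejected by closed testing."""
    subsets = all_index_sets(len(pvals))
    # Raw rejections U: everything the local test rejects.
    raw = {I for I in subsets if local_test(pvals, I, alpha)}
    # H_I is rejected if every H_J with index set J ⊇ I is in U.
    return {I for I in subsets
            if all(J in raw for J in subsets if I <= J)}


pvals = [0.01, 0.02, 0.2]  # hypothetical p-values
rejected = closed_testing(pvals, 0.05, bonferroni_local_test)
elementary = sorted(i for i in range(len(pvals))
                    if frozenset([i]) in rejected)
```

With this local test the elementary rejections coincide with Holm's procedure; the enumeration over all subsets is exponential in the number of hypotheses, so it is only usable for very small examples.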

SLIDE 19

Consonance

Traditionally, only rejection of elementary hypotheses is of interest

[Figure: the closed graph of hypotheses A, B and C]

SLIDE 20

Consonance

Traditionally, only rejection of elementary hypotheses is of interest

[Figure: the closed graph of hypotheses A, B and C, showing consonant rejections]

SLIDE 21

Consonance

Traditionally, only rejection of elementary hypotheses is of interest

[Figure: the closed graph of hypotheses A, B and C, showing non-consonant rejections of A ∩ B, A ∩ C, B ∩ C]

SLIDE 22

Parameter, confidence bound and coverage

Parameter
$\tau(R) = \#(T \cap R)$ for a fixed set R

Closed testing
Let $\mathcal{X}$ be the collection of hypotheses rejected

Confidence bound
$t_\alpha(R) = \max\{\#C : C \subseteq R,\ H_C \notin \mathcal{X}\}$

SLIDE 23

In the example

[Figure: the closed graph of hypotheses A, B and C with the rejected hypotheses marked]

$t_\alpha(\{A, B, C\}) = 1$

SLIDE 27

Coverage

Coverage statement
$P(\tau(R) \leq t_\alpha(R)) \geq 1 - \alpha$

Proof
$\{\tau(R) \leq t_\alpha(R)\} \supseteq \{H_T \notin \mathcal{U}\}$

Confidence set
  • Trivial lower bound $\tau(R) \geq 0$: confidence set $\{0, \ldots, t_\alpha(R)\}$
  • Confidence set for $\phi(R) = \#R - \tau(R)$ immediate
  • Confidence set for FDP $= \tau(R)/\#R$ immediate

Simultaneous control over all R
Consequence: coverage is robust against post hoc selection of R
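The one-line proof can be unpacked as follows. This is a sketch in the notation of the slides, writing $\mathcal{U}$ for the raw rejections of the local test and $\mathcal{X}$ for the rejections of closed testing; the event name $E$ is introduced here only for convenience.

```latex
% E: the local test does not reject the intersection of all true hypotheses.
\begin{align*}
E &= \{ H_T \notin \mathcal{U} \}, \qquad P(E) \geq 1 - \alpha
  && \text{(the local test of } H_T \text{ has level } \alpha\text{)} \\
\text{On } E:\quad & H_{T \cap R} \notin \mathcal{X} \text{ for every } R
  && \text{(rejecting } H_{T \cap R} \text{ requires } H_T \in \mathcal{U}
     \text{, since } T \supseteq T \cap R\text{)} \\
\Rightarrow\quad & t_\alpha(R) \geq \#(T \cap R) = \tau(R) \text{ for every } R
  && \text{(} T \cap R \text{ is one of the sets } C \subseteq R
     \text{ in the maximum)} \\
\Rightarrow\quad & P\bigl(\tau(R) \leq t_\alpha(R) \text{ for all } R\bigr)
  \geq P(E) \geq 1 - \alpha .
\end{align*}
```

Because the single event $E$ implies the bound for every $R$ at once, the coverage is simultaneous, which is what permits choosing $R$ after seeing the data.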

SLIDE 28

Reject hypotheses

R            confidence set for τ(R)    confidence set for φ(R)
{A}          {0,1}                      {0,1}
{B}          {0,1}                      {0,1}
{C}          {0,1}                      {0,1}
{A, B}       {0,1}                      {1,2}
{A, C}       {0,1}                      {1,2}
{B, C}       {0,1}                      {1,2}
{A, B, C}    {0,1}                      {2,3}
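The table can be reproduced from the definition of the confidence bound by brute force. The sketch below assumes, as in the non-consonant example, that closed testing rejected every intersection of two or more of A, B, C and none of the elementary hypotheses; the function names are illustrative.

```python
from itertools import combinations

# Assumption: closed testing rejected A∩B, A∩C, B∩C and A∩B∩C only.
rejected = {frozenset(s) for s in ("AB", "AC", "BC", "ABC")}


def upper_bound_true(R, rejected):
    """t_alpha(R): size of the largest C ⊆ R with H_C not rejected."""
    sizes = [k for k in range(1, len(R) + 1)
             for C in combinations(sorted(R), k)
             if frozenset(C) not in rejected]
    return max(sizes, default=0)


def confidence_sets(R, rejected):
    """Confidence sets for tau(R) (true) and phi(R) (false hypotheses)."""
    t = upper_bound_true(R, rejected)
    tau_set = list(range(0, t + 1))
    phi_set = list(range(len(R) - t, len(R) + 1))
    return tau_set, phi_set


for R in ("A", "B", "C", "AB", "AC", "BC", "ABC"):
    tau_set, phi_set = confidence_sets(R, rejected)
    print(sorted(R), tau_set, phi_set)
```

No single hypothesis is rejected, yet the set {A, B, C} is shown to contain at least two false hypotheses; this is the extra information carried by non-consonant rejections.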

SLIDE 29

Bonus: an estimate

Point estimate of FDP
Take the confidence bound at α = 1/2

Property (immediate)
FDP underestimated with probability at most 0.5

Reporting (classical!)
FDP estimate and confidence bound

Single hypothesis
Estimated false if p < 0.5; confidently false if p < 0.05

SLIDE 30

Shortcuts

General
Procedure can be used for any local test

Number of intersection hypotheses
$2^m - 1$: computationally prohibitive above ≈20 hypotheses

Concept: shortcut
Smart choice of local test to save calculations

Smart choice of local test
Also crucial for the power properties of the procedure

SLIDE 31

Simes’ inequality

[Figure: sorted p-value curve and lower confidence bound; x-axis: rank of p-value (1 to 4919), y-axis: p-value]

SLIDE 32

Local test based on Simes’ inequality

Simes’ inequality
With probability ≥ 1 − α, we have $p_{(i:T)} > \frac{i\alpha}{\#T}$ for all $i = 1, \ldots, \#T$, where $p_{(i:I)}$ is the $i$th smallest p-value among $p_j$, $j \in I$.

Use Simes as local test
Reject if any $p_{(i:I)} \leq \frac{i\alpha}{\#I}$

Assumptions (Sarkar, Yekutieli and others)
Generally assumed valid for two-sided asymptotically normal tests

Variant without assumptions (conservative)
Reject if any $p_{(i)} \leq \frac{i\alpha}{k\,b(k)}$ with $b(k) = \sum_{s=1}^{k} 1/s$
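Both local tests are short to write down. The sketch below takes the p-values belonging to one intersection hypothesis and returns whether that hypothesis is rejected; it assumes that k in the conservative variant is the number of p-values in the intersection, and the function names and example values are illustrative.

```python
def simes_local_test(pvals, alpha):
    """Reject if the i-th smallest p-value is <= i * alpha / n for some i."""
    p = sorted(pvals)
    n = len(p)
    return any(p_i <= i * alpha / n for i, p_i in enumerate(p, start=1))


def conservative_local_test(pvals, alpha):
    """Variant with thresholds i * alpha / (k * b(k)), b(k) = sum_{s<=k} 1/s.

    Assumption: k is the number of hypotheses in the intersection.
    """
    p = sorted(pvals)
    k = len(p)
    b = sum(1.0 / s for s in range(1, k + 1))
    return any(p_i <= i * alpha / (k * b) for i, p_i in enumerate(p, start=1))


example = [0.026, 0.030, 0.033]  # hypothetical p-values
simes_rejects = simes_local_test(example, 0.05)
conservative_rejects = conservative_local_test(example, 0.05)
```

For these values the Simes test rejects through the second smallest p-value, while the conservative variant does not reject, which illustrates the price of dropping the assumptions.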

SLIDE 33

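Example: $\alpha/2 < p_A \leq p_B \leq p_C \leq 2\alpha/3$ and $p_D > \alpha$

Hypothesis   Simes conditions satisfied
ABCD         $p_C \leq \frac{3}{4}\alpha$
ABC          $p_C \leq \alpha$; $p_B \leq \frac{2}{3}\alpha$
ABD          $p_B \leq \frac{2}{3}\alpha$
ACD          $p_C \leq \frac{2}{3}\alpha$
BCD          $p_C \leq \frac{2}{3}\alpha$
AB           $p_B \leq \alpha$
AC           $p_C \leq \alpha$
AD           none
BC           $p_C \leq \alpha$
BD           none
CD           none
A            $p_A \leq \alpha$
B            $p_B \leq \alpha$
C            $p_C \leq \alpha$
D            none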
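The example can be checked numerically by enumerating the closure. The sketch below picks concrete p-values that satisfy the stated inequalities at α = 0.05 (these numbers are an assumption chosen for illustration) and applies the Simes local test followed by closed testing.

```python
from itertools import combinations

alpha = 0.05
# Hypothetical values with alpha/2 < pA <= pB <= pC <= 2*alpha/3 and pD > alpha.
pvals = {"A": 0.026, "B": 0.030, "C": 0.033, "D": 0.2}


def simes(index_set):
    """Simes local test for the intersection hypothesis of index_set."""
    p = sorted(pvals[name] for name in index_set)
    n = len(p)
    return any(p_i <= i * alpha / n for i, p_i in enumerate(p, start=1))


subsets = [frozenset(c) for k in range(1, 5) for c in combinations("ABCD", k)]
raw = {I for I in subsets if simes(I)}
# Closed testing: H_I is rejected if every superset of I is raw-rejected.
closed = {I for I in subsets if all(J in raw for J in subsets if I <= J)}

not_raw = sorted("".join(sorted(I)) for I in subsets if I not in raw)
closed_names = sorted("".join(sorted(I)) for I in closed)
```

Under these values the local test fails to reject only AD, BD, CD and D. As a result none of A, B, C, D is rejected by closed testing, while AB, AC and BC are: rejections that a procedure looking only at elementary hypotheses would discard.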

SLIDE 34

Shortcut

Lemma
$H_I$ is rejected in closed testing with Simes local tests at level α iff there is an $i \in \{1, \ldots, \#I\}$ such that $p_{(i:I)} \leq \frac{i\alpha}{j(\alpha)}$

Crucial quantity j(α)
  • All $H_I$ with $|I| > j(\alpha)$ are rejected
  • At least one $H_I$ with $|I| = j(\alpha)$ is not rejected

$j(\alpha) = \max\{s \in \{1, \ldots, m\} : p_{(m-s+k)} > k\alpha/s \text{ for } k = 1, \ldots, s\}$
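A direct, quadratic-time reading of the definition of j(α) and of the lemma might look as follows. This is a sketch rather than the fast algorithm; the p-values are invented, and returning 0 when no s qualifies (so that every intersection hypothesis counts as rejected) is a convention assumed here.

```python
def j_alpha(pvals, alpha):
    """Largest s with p_(m-s+k) > k*alpha/s for all k = 1..s (naive search)."""
    p = sorted(pvals)  # p[0] <= ... <= p[m-1]
    m = len(p)
    for s in range(m, 0, -1):
        # p_(m-s+k) in 1-based notation is p[m - s + k - 1] here.
        if all(p[m - s + k - 1] > k * alpha / s for k in range(1, s + 1)):
            return s
    return 0  # assumed convention: no s qualifies


def rejected_by_shortcut(pvals_I, j, alpha):
    """Lemma: H_I is rejected iff some p_(i:I) <= i * alpha / j(alpha)."""
    if j == 0:
        return True  # assumed convention: everything is rejected
    p = sorted(pvals_I)
    return any(p_i <= i * alpha / j for i, p_i in enumerate(p, start=1))


all_p = [0.026, 0.030, 0.033, 0.2]  # hypothetical p-values, m = 4
j = j_alpha(all_p, 0.05)
ab_rejected = rejected_by_shortcut([0.026, 0.030], j, 0.05)
a_rejected = rejected_by_shortcut([0.026], j, 0.05)
```

Once j(α) is known, deciding whether any single intersection hypothesis is rejected needs only its own sorted p-values, with no enumeration of supersets.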

SLIDE 35

Calculating j(α)

Steps of j(α)
j(α) jumps from s to s − 1 (s = 1, . . . , m) at $\alpha_s = \min_{k = 1, \ldots, s} \frac{s \cdot p_{(m-s+k)}}{k}$

Naive calculation of $\alpha_s$, $s = 1, \ldots, m$
Order $m^2$ steps

Use lemma (next slide)
Reduce to $m \log(m)$ steps

SLIDE 36

Calculating $\alpha_s$

To calculate
$\alpha_m, \ldots, \alpha_1$ are, up to the factor $s$ in $\alpha_s$, the minima of the columns of the matrix

$$M = \begin{pmatrix} p_1 & & & \\ p_2/2 & p_2 & & \\ p_3/3 & p_3/2 & p_3 & \\ \vdots & \vdots & & \ddots \\ p_m/m & p_m/(m-1) & \cdots & p_m \end{pmatrix}$$

Lemma
Row location of the minimum is non-increasing

Find minima in $m \log(m)$ time
By starting in the middle column
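To make the link between the matrix and the jump points concrete, the sketch below builds the lower-triangular matrix for sorted p-values, takes column minima naively, and compares them with the direct formula $\alpha_s = \min_k s \cdot p_{(m-s+k)}/k$. It reads column c as corresponding to s = m − c + 1 and multiplies the column minimum by s; that reading, the function names and the p-values are assumptions of this example, and the fast $m \log(m)$ search is not implemented.

```python
import math


def alpha_steps_direct(pvals):
    """alpha_s = min over k of s * p_(m-s+k) / k, for s = 1..m (O(m^2))."""
    p = sorted(pvals)
    m = len(p)
    return {s: min(s * p[m - s + k - 1] / k for k in range(1, s + 1))
            for s in range(1, m + 1)}


def alpha_steps_from_matrix(pvals):
    """Same quantities via column minima of the lower-triangular matrix M."""
    p = sorted(pvals)
    m = len(p)
    # Row i (1-based) holds p_i/i, p_i/(i-1), ..., p_i in columns 1..i.
    M = [[p[i - 1] / (i - c + 1) for c in range(1, i + 1)]
         for i in range(1, m + 1)]
    steps = {}
    for c in range(1, m + 1):
        col_min = min(M[i - 1][c - 1] for i in range(c, m + 1))
        s = m - c + 1
        steps[s] = s * col_min
    return steps


all_p = [0.026, 0.030, 0.033, 0.2]  # hypothetical p-values
direct = alpha_steps_direct(all_p)
steps = alpha_steps_from_matrix(all_p)
agree = all(math.isclose(direct[s], steps[s]) for s in direct)
j_at_005 = max((s for s in steps if steps[s] > 0.05), default=0)
```

Reading off j(α) as the largest s whose jump point exceeds α gives the same value as the definition of j(α), since the condition $p_{(m-s+k)} > k\alpha/s$ for all k is equivalent to $\alpha < \alpha_s$.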

SLIDE 37

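Calculating the confidence bound $t_\alpha(R)$

Category
Find the category $c_i = \left\lceil \frac{j(\alpha)}{\alpha} p_i \right\rceil$ for all $i \in R$

Then the (1 − α)-confidence upper bound for τ(R) is
$t_\alpha(R) = \#R - \max_{r = 1, \ldots, \#R} \left\{ 1 - r + \#\{i \in R : c_i \leq r\} \right\}$

Computation
Linear complexity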
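The category formula can be tried out on a small example. The sketch below reads the category as the ceiling of $j(\alpha)\,p_i/\alpha$ (an assumption, since the brackets are not legible in the slide text) and treats $t_\alpha(R)$ as an upper bound on the number of true hypotheses in R. The maximum is evaluated naively rather than in linear time, and the value j(α) = 2 for the invented p-values is taken as given.

```python
import math


def t_alpha(pvals_R, j, alpha):
    """Upper confidence bound on the number of true hypotheses in R."""
    n = len(pvals_R)
    if n == 0:
        return 0
    # Category of each p-value: smallest r with p <= r * alpha / j.
    cats = [math.ceil(j * p / alpha) for p in pvals_R]
    # Lower bound on the number of false hypotheses in R.
    false_lower = max(1 - r + sum(c <= r for c in cats)
                      for r in range(1, n + 1))
    return n - false_lower


alpha = 0.05
j = 2  # assumed j(alpha) for p-values 0.026, 0.030, 0.033, 0.2
t_abc = t_alpha([0.026, 0.030, 0.033], j, alpha)
t_all = t_alpha([0.026, 0.030, 0.033, 0.2], j, alpha)
t_a = t_alpha([0.026], j, alpha)
```

For these values the bound says that at most one of the three smallest p-values belongs to a true hypothesis, which agrees with what brute-force closed testing gives on the same numbers.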

SLIDE 38

Relationship with Hommel

Hommel’s procedure
  • FWER control
  • Uniformly better than Hochberg’s procedure
  • Also based on closed testing plus Simes

Relationship with Hommel
R rejected by Hommel → bound $t_\alpha(R) = 0$

Improvements
  • Better bounds by exploiting non-consonant rejections
  • Faster algorithm (order $m \log(m)$ instead of classical $m^2$)

SLIDE 39

Relationship with Benjamini/Hochberg

Assumptions
≈ same assumptions; same weak FWER control

Lemma
Let R with r = |R| and $m\,p_{(r:R)}/r = q \leq \alpha$. Then $\frac{t_\alpha(R)}{r} \leq \frac{j(\alpha)\,q}{m\,\alpha}$.

Colloquially
Set R with maximal FDR-corrected p-value q has (1 − α)-confidence of FDP ≤ q/α

Consequences
  • FDR rejected set R has FDP estimate < 0.10
  • FDR rejected set R has (1 − α)-confidence of FDP < 1

SLIDE 40

Scalability

Assume non-vanishing alternative
#T/m → const < 1 as m → ∞

FWER methods as m → ∞: not scalable
  • Rejected set → ∅
  • Adjusted p-values → 1

FDR methods as m → ∞: scalable (under condition)
  • Rejected set R has #R → const > 0
  • Adjusted p-value $\tilde{p}_{(cm)}$ → const < 1

If FDR scales, FDP confidence scales too
∃R with #R/m → const > 0 so that $t_\alpha(R)/\#R$ → const < 1

SLIDE 42

Data analysis in genomics

Top differential expression

Gene      p-value
XDH       5.5e-10
NEK3      6.7e-7
TAF5      7.1e-7
CYP2A7    1.6e-6
NAT2      1.8e-6
ZNF19     2.6e-6
SKP1      2.7e-6
NAT1      3.1e-6
GDF3      2.0e-5
CCDC25    2.1e-5
...       ...

How many false positives to expect?
95% confidence: at most 1 false positive

Estimated number of false positives
No false positives

SLIDE 43

Example: Rosenwald DLBCL data

Data
240 diffuse large B-cell lymphoma patients; 7399 hypotheses

Classical results
  • Bonferroni, Holm, Hochberg, Hommel: 4 hypotheses
  • Benjamini and Hochberg: 72 hypotheses

SLIDE 44

# false hypotheses among top k p-values

[Figure: number of false hypotheses against number of hypotheses (top k p-values, k up to 72), showing the point estimate and the 95% confidence bound]

SLIDE 45

FDP estimates and bounds: top k p-values

[Figure: FDP (0.0 to 1.0) against number of hypotheses (top k p-values, k up to 72), showing the point estimate and the 95% confidence bound]

SLIDE 46

Conclusion

New method
  • Between weak and strong FWER control
  • Counting false positives: tail probabilities for FDP

Nothing new
  • Just closed testing and simultaneous confidence sets
  • But free additional statements relative to classical Hommel

Fast algorithms
  • Reduced from exponential to m log(m) complexity
  • Side effect: fast algorithm for Hommel’s procedure

Simultaneous but still scalable
Rejections don’t vanish when m → ∞

SLIDE 47

Read more?

Goeman JJ and Solari A (2011) Multiple Testing for Exploratory Research. Statistical Science 26:584–597 and 608–612

Goeman JJ and Solari A (2014) Tutorial in Biostatistics: Multiple Hypothesis Testing in Genomics. Statistics in Medicine 33(11):1946–1978

Meijer RJ, Krebs T, Solari A and Goeman JJ (2015) Extending Hommel’s method. In preparation

Goeman JJ, Solari A, Meijer RJ. cherry R package. cran.r-project.org
