SLIDE 1
Power, Sample Size, and the FDR
Peter Dalgaard
Department of Biostatistics, University of Copenhagen
Center for Bioinformatics, University of Copenhagen, June 2005
SLIDE 2 Sample Size
“How many observations do we need?” Depends on
- Design
- Standard error of measurements
- Effect size
- How sure you want to be of finding it
SLIDE 3 Reminders
(Continuous data)
One-sample (or paired, differences): SEM = s/√n
Significance if |x̄ − µ0| / SEM > t.975(DF)
Two-sample: SEDM = s × √(1/n1 + 1/n2)
Significance if |x̄1 − x̄2| / SEDM > t.975(DF)
t.975(DF) ≈ 2. Notice that SE(D)M decreases with n.
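A minimal R sketch of the one-sample reminder (not part of the original slides; the data x and the value µ0 are made up for illustration):

    set.seed(1)
    x    <- rnorm(20, mean = 0.5, sd = 1)    # 20 hypothetical measurements
    mu0  <- 0
    n    <- length(x)
    SEM  <- sd(x) / sqrt(n)                  # SEM = s/sqrt(n)
    tval <- abs(mean(x) - mu0) / SEM
    tval > qt(0.975, df = n - 1)             # significant at the 5% level?
    t.test(x, mu = mu0)                      # the same test, done by R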
SLIDE 4
Variation of Observations and Means
[Figure: normal density curves illustrating the variation of single observations versus means; the curve for the mean is dnorm(x, sd = sqrt(1/20))]
SLIDE 5 t Test
[Figure: density of the t distribution with 38 degrees of freedom, dt(t, 38)]
- If there is no (true) difference then there is little chance of
getting an observation in the tails
- If there is a difference, then the center of the distribution is
shifted.
SLIDE 6
Type I and Type II Errors
A test of a hypothesis can go wrong in two ways:
- Type I error: rejecting a true null hypothesis
- Type II error: accepting a false null hypothesis
Error probabilities: α and β, respectively.
- α: significance level (e.g. 0.05)
- 1 − β: power – the probability of detecting the difference
Notice that the power depends on the effect size as well as on the number of observations and the significance level.
SLIDE 7 Calculating n – Preliminaries
- (First consider the one-sample case)
- We wish to detect a difference of δ = µ − µ0 (the "clinically relevant difference")
- Naive guess: n should satisfy δ = 2 × SEM?
- But the observed difference is not precisely δ. It is smaller with 50% probability, and then it wouldn't be significant.
- We need to make SEM so small that there is a high probability of getting a significant result.
SLIDE 8
Power, Sketch of Principle
[Figure: sketch of normal densities, dnorm(x), showing the null distribution and the shifted distribution of the observed difference under the alternative]
(x axis in units of SEM)
SLIDE 9
Size of SEM relative to δ
(Notice: these formulas assume a known SD. Watch out if n is very small. More accurate formulas are in R's power.t.test.)
zp denotes quantiles of the normal distribution, e.g. z0.975 = 1.96.
Two-tailed test, α = 0.05, power 1 − β = 0.90:
δ = (1.96 + k) × SEM
k is the distance between the middle and right peaks in slide 8.
Find k so that there is a probability of 0.90 of observing a difference of at least 1.96 × SEM:
k = −z0.10 = z0.90 = 1.28, so δ = 3.24 × SEM
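A one-line check of the 3.24 constant in R (not on the original slide; qnorm gives the normal quantiles zp):

    qnorm(0.975) + qnorm(0.90)    # = 1.96 + 1.28, approximately 3.24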
SLIDE 10
Calculating n
Just insert SEM = σ/√n in δ = 3.24 × SEM and solve for n:
n = (3.24 × σ/δ)²
(for a two-sided test at level α = 0.05, with power 1 − β = 0.90)
General formula for arbitrary α and β:
n = ((z1−α/2 + z1−β) × σ/δ)² = (σ/δ)² × f(α, β)
(f(α, β) is tabulated on the next slide)
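As an illustration (not from the original slides; δ = 10 and σ = 20 are made-up numbers), the normal-approximation formula can be compared with R's exact power.t.test:

    delta <- 10; sigma <- 20                                  # hypothetical effect size and SD
    (qnorm(0.975) + qnorm(0.90))^2 * (sigma / delta)^2        # approximate n, about 42
    power.t.test(delta = delta, sd = sigma, power = 0.9,
                 type = "one.sample")                         # exact t-based n, slightly larger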
SLIDE 11
A Useful Table
f(α, β) = (z1−α/2 + z1−β)²

               β = 0.05    0.1     0.2     0.5
    α = 0.1      10.82    8.56    6.18    2.71
    α = 0.05     12.99   10.51    7.85    3.84
    α = 0.02     15.77   13.02   10.04    5.41
    α = 0.01     17.81   14.88   11.68    6.63
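The table can be reproduced in R with the standard quantile function (a sketch, not part of the original slides):

    alpha <- c(0.1, 0.05, 0.02, 0.01)
    beta  <- c(0.05, 0.1, 0.2, 0.5)
    f <- outer(qnorm(1 - alpha / 2), qnorm(1 - beta), "+")^2   # (z_{1-alpha/2} + z_{1-beta})^2
    dimnames(f) <- list(alpha = alpha, beta = beta)
    round(f, 2)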
SLIDE 12 Two-Sample Test
Optimal to have equal group sizes. Then SEDM = s × √(2/n) and we get (two-tailed test, α = 0.05, 1 − β = 0.90):
n = 2 × 3.24² × (σ/δ)²
e.g. for δ = σ: n = 2 × 3.24² = 21.0, i.e. 21 per group.
Or in general: n = 2 × (σ/δ)² × f(α, β)
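A quick check in R for the δ = σ case (not on the original slide; power.t.test gives the exact, slightly larger t-based answer):

    2 * (qnorm(0.975) + qnorm(0.90))^2               # normal approximation, about 21 per group
    power.t.test(delta = 1, sd = 1, power = 0.9)     # exact two-sample answer, about 22 per group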
SLIDE 13 Multiple Tests
- Traditional significance tests and power calculations are designed for testing one and only one null hypothesis
- Modern screening procedures (microarrays, genome scans) generate thousands of individual tests
- One traditional approach is the Bonferroni correction: adjust the p values by multiplying them by the number of tests (sketched in R below)
- This controls the familywise error rate (FWE): the risk of making at least one Type I error
- However, this tends to give low power. An alternative is the False Discovery Rate (FDR)
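A minimal sketch of the Bonferroni correction in R (the p values below are made up; p.adjust is the built-in adjustment function):

    p <- c(0.0001, 0.004, 0.03, 0.2, 0.7)     # hypothetical unadjusted p values
    pmin(1, p * length(p))                    # multiply by the number of tests, cap at 1
    p.adjust(p, method = "bonferroni")        # the same adjustment, via R's built-in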
SLIDE 14 The False Discovery Rate
- Basic idea: we have a family of m null hypotheses, some of which are false (i.e. there is an effect on those "genes")
- "On average", we have a table like

                           Accept   Reject   Total
        True hypotheses      a0       r0      m0
        False hypotheses     a1       r1      m1
        Total                a        r       m

- FDR = r0/r, the proportion of rejections that come from true hypotheses (illustrated by the simulation below)
- Compare the FWE: the probability that r0 > 0. The FDR at the same level allows more Type I errors if not all null hypotheses are true.
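An illustrative simulation of such a table (not from the original slides; m = 1000 tests with 100 real effects of size 3 SEs are made-up numbers):

    set.seed(1)
    m <- 1000; m1 <- 100
    false.null <- c(rep(FALSE, m - m1), rep(TRUE, m1))    # which null hypotheses are false
    z <- rnorm(m, mean = ifelse(false.null, 3, 0))        # test statistics
    p <- 2 * pnorm(-abs(z))                               # two-sided p values
    reject <- p < 0.05
    r  <- sum(reject)                                     # total rejections
    r0 <- sum(reject & !false.null)                       # rejections of true hypotheses
    r0 / r                                                # realised false discovery proportion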
SLIDE 15 Controlling the FDR
- The FDR sounds like a good idea, but you don't know r0, so what should you do?
- Benjamini and Hochberg step-up procedure (sketched in R below):
- Ensures that the FDR is at most q under some assumptions
- Sort the unadjusted p values
- Compare them in turn with q × i/m and reject until non-significance
- I.e. the first critical value is as in the Bonferroni correction, the 2nd is twice as big, etc.
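A minimal sketch of the step-up rule on made-up p values; p.adjust with method = "BH" is R's built-in implementation of the same procedure:

    p <- c(0.0001, 0.004, 0.03, 0.2, 0.7)      # hypothetical unadjusted p values
    q <- 0.05
    m <- length(p); i <- seq_len(m)
    ps <- sort(p)
    crit <- q * i / m                          # critical values q*i/m
    k <- max(c(0, which(ps <= crit)))          # largest i with p_(i) <= q*i/m
    ps[seq_len(k)]                             # rejected p values (step-up rule)
    p.adjust(p, method = "BH")                 # equivalent adjusted ("BH") p values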
SLIDE 16 Assumptions for Step-Up Procedure
- Works if the tests are independent
- — or positively correlated (which is not necessarily the case)
- Benjamini and Yekutieli: replace q with q/(1 + 1/2 + ... + 1/m) in the procedure and the FDR is less than q for any correlation pattern.
- Notice that the divisor in the B-Y procedure is quite big: ≈ ln m + 0.5772. For m = 10000 it is 9.8 (see the check below).
- The B-Y procedure is probably way too conservative in practice
- Resampling procedures seem like a better approach to the correlation issue
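A quick check of the size of the B-Y divisor in R (not on the original slide; p.adjust also provides method = "BY"):

    m <- 10000
    sum(1 / (1:m))          # the exact divisor, about 9.79
    log(m) + 0.5772         # the approximation ln(m) + 0.5772
    # p.adjust(p, method = "BY") applies the correspondingly more conservative adjustment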
SLIDE 17 FDR and Sample Size Calculations
- This is tricky...
- VERY recent research. I don’t think there is full consensus
about what to do
- Interesting papers coming up in Bioinformatics (April 21,
Advance Access)
- Pawitan et al.
- Sin-Ho Jung