SLIDE 1

Power, Sample Size, and the FDR

Peter Dalgaard

Department of Biostatistics University of Copenhagen

Center for Bioinformatics, University of Copenhagen, June 2005

SLIDE 2

Sample Size

“How many observations do we need?” Depends on

  • Design
  • Standard error of measurements
  • Effect size
  • How sure you want to be of finding it

SLIDE 7

Reminders

(Continuous data)

One-sample (or paired differences): SEM = s × √(1/n)
Significance if |x̄ − µ0| / SEM > t.975(DF)

Two-sample: SEDM = s × √(1/n1 + 1/n2)
Significance if |x̄1 − x̄2| / SEDM > t.975(DF)

t.975(DF) ≈ 2. Notice that SE(D)M decreases with n.
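The two standard errors and the rough "greater than about 2" criterion can be sketched in code. This is my illustration, not part of the talk; the function names are made up for the example.

```python
# Standard error of a mean (SEM) and of a difference of means (SEDM),
# with the rough t criterion |statistic| / SE > ~2 from the slide.
from math import sqrt

def sem(s: float, n: int) -> float:
    """Standard error of the mean: s * sqrt(1/n)."""
    return s * sqrt(1 / n)

def sedm(s: float, n1: int, n2: int) -> float:
    """Standard error of a difference of means: s * sqrt(1/n1 + 1/n2)."""
    return s * sqrt(1 / n1 + 1 / n2)

def significant(diff: float, se: float, crit: float = 2.0) -> bool:
    """Rough criterion: t.975(DF) is approximately 2 for moderate DF."""
    return abs(diff) / se > crit

# Example: mean difference 1.0, SD 2.0, 20 paired observations
print(sem(2.0, 20))                      # 0.447...
print(significant(1.0, sem(2.0, 20)))    # True: 1.0 / 0.447 ≈ 2.24 > 2
```

Note how doubling n shrinks the SEM only by a factor √2, which is why sample size calculations scale quadratically in σ/δ later in the talk.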


SLIDE 12

Variation of Observations and Means

[Figure: standard normal density of single observations, overlaid with the density of the mean of 20 observations, dnorm(x, sd = sqrt(1/20)); x from −3 to 3]

SLIDE 13

t Test

[Figure: density of the t distribution with 38 DF, dt(t, 38); t from −3 to 3]

  • If there is no (true) difference, there is little chance of getting an observation in the tails
  • If there is a difference, the center of the distribution is shifted


SLIDE 15

Type I and Type II Errors

A test of a hypothesis can go wrong in two ways:

Type I error: rejecting a true null hypothesis
Type II error: accepting a false null hypothesis

Error probabilities: α and β, respectively.
α: significance level (e.g. 0.05)
1 − β: power, the probability of detecting the difference

Notice that the power depends on the effect size as well as on the number of observations and the significance level.


SLIDE 19

Calculating n – Preliminaries

  • (First consider the one-sample case)
  • We wish to detect a difference of δ = µ − µ0 (the “clinically relevant difference”)
  • Naive guess: n should satisfy δ = 2 × SEM?
  • But the observed difference is not precisely δ. It is smaller with 50% probability, and then it wouldn’t be significant.
  • We need to make SEM so small that there is a high probability of getting a significant result


SLIDE 24

Power, Sketch of Principle

[Figure: standard normal density dnorm(x) under the null hypothesis, with a shifted copy under the alternative; x axis in units of SEM, from −2 to 6]

SLIDE 25

Size of SEM relative to δ

(Notice: these formulas assume a known SD. Watch out if n is very small. More accurate formulas are in R’s power.t.test.)

zp: quantiles of the normal distribution, z0.975 = 1.96, etc.

Two-tailed test, α = 0.05, power 1 − β = 0.90:

δ = (1.96 + k) × SEM

k is the distance between the middle and the right peak in the power sketch above. Find k so that there is a probability of 0.90 of observing a difference of at least 1.96 × SEM:

k = −z0.10 = z0.90 = 1.28, so δ = 3.24 × SEM


SLIDE 28

Calculating n

Just insert SEM = σ/√n in δ = 3.24 × SEM and solve for n:

n = (3.24 × σ/δ)²

(for a two-sided test at level α = 0.05, with power 1 − β = 0.90)

General formula for arbitrary α and β:

n = ((z1−α/2 + z1−β) × σ/δ)² = (σ/δ)² × f(α, β)

with f(α, β) tabulated on the next slide.
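The normal-approximation formula above is easy to compute directly. A minimal sketch (mine, not from the talk) using only the Python standard library; recall the slide's caveat that R's power.t.test gives a slightly larger, more accurate n for small samples.

```python
# One-sample size via n = ((z_{1-α/2} + z_{1-β}) * σ/δ)^2, rounded up.
from math import ceil
from statistics import NormalDist

def n_one_sample(sigma: float, delta: float,
                 alpha: float = 0.05, power: float = 0.90) -> int:
    z = NormalDist().inv_cdf
    f = (z(1 - alpha / 2) + z(power)) ** 2   # f(α, β), see the table
    return ceil(f * (sigma / delta) ** 2)

# δ = σ: the multiplier is (1.96 + 1.28)^2 ≈ 3.24^2 ≈ 10.5, so n = 11
print(n_one_sample(sigma=1.0, delta=1.0))   # 11
```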


SLIDE 31

A Useful Table

f(α, β) = (z1−α/2 + z1−β)²

                 β
  α        0.05     0.1     0.2     0.5
  0.1     10.82    8.56    6.18    2.71
  0.05    12.99   10.51    7.85    3.84
  0.02    15.77   13.02   10.04    5.41
  0.01    17.81   14.88   11.68    6.63
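The table can be regenerated in a few lines, which is a useful check when you need a combination of α and β that it does not list. A sketch of mine using the standard library:

```python
# Reproduce the table of f(α, β) = (z_{1-α/2} + z_{1-β})^2 to two decimals.
from statistics import NormalDist

z = NormalDist().inv_cdf
betas = [0.05, 0.1, 0.2, 0.5]
print("α    " + "".join(f"{b:>8}" for b in betas))
for a in [0.1, 0.05, 0.02, 0.01]:
    row = [(z(1 - a / 2) + z(1 - b)) ** 2 for b in betas]
    print(f"{a:<5}" + "".join(f"{v:8.2f}" for v in row))
```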

SLIDE 32

Two-Sample Test

It is optimal to have equal group sizes. Then SEDM = s × √(2/n), and we get (two-tailed test, α = 0.05, 1 − β = 0.90)

n = 2 × 3.24² × (σ/δ)²

e.g. for δ = σ: n = 2 × 3.24² = 21.0, i.e. 21 per group. Or in general

n = 2 × (σ/δ)² × f(α, β)
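A Monte Carlo check of the two-sample result (my sketch, not from the talk): with δ = σ and n = 21 per group, the normal approximation promises power 0.90, while the exact t test achieves a bit less, which is precisely the "watch out if n is very small" caveat from slide 25.

```python
# Simulate the pooled two-sample t test with n = 21 per group, δ = σ = 1,
# and estimate the power empirically.
import random
from math import sqrt

def pooled_t(x: list, y: list) -> float:
    """Two-sample t statistic with pooled variance estimate."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    s = sqrt(ss / (n1 + n2 - 2))
    return (m1 - m2) / (s * sqrt(1 / n1 + 1 / n2))

random.seed(1)
n, reps, crit = 21, 4000, 2.021          # t.975(40) ≈ 2.021
hits = 0
for _ in range(reps):
    x = [random.gauss(1.0, 1.0) for _ in range(n)]   # shifted group, δ = σ
    y = [random.gauss(0.0, 1.0) for _ in range(n)]
    hits += abs(pooled_t(x, y)) > crit
print(hits / reps)   # a bit below the nominal 0.90
```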


SLIDE 34

Multiple Tests

  • Traditional significance tests and power calculations are designed for testing one and only one null hypothesis
  • Modern screening procedures (microarrays, genome scans) generate thousands of individual tests
  • One traditional approach is the Bonferroni correction: adjust the p values by multiplying them by the number of tests
  • This controls the familywise error rate (FWE): the risk of making at least one Type I error
  • However, this tends to give low power. An alternative is the False Discovery Rate (FDR)
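The Bonferroni correction mentioned above is a one-liner; a minimal sketch (mine):

```python
# Bonferroni adjustment: multiply each p value by the number of tests m,
# capping the result at 1.
def bonferroni(pvals: list) -> list:
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

p = [0.001, 0.012, 0.03, 0.4]
print(bonferroni(p))   # [0.004, 0.048, 0.12, 1.0]
```

With thousands of tests the multiplier m is huge, which is why the slide notes that FWE control tends to give low power.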


slide-38
SLIDE 38

Multiple Tests

  • Traditional significance tests and power calculations are

designed for testing one and only one null hypothesis

  • Modern screening procedures (microarrays, genome

scans) generate thousands of individual tests

  • One traditional approach is the Bonferroni correction:

Adjust p values by multiplication with number of tests.

  • This controls the familywise error rate (FWE): Risk of

making at least one Type I error

  • However, this tends to give low power. An alternative is the

False Discovery Rate (FDR)

slide-39
SLIDE 39

The False Discovery Rate

  • Basic idea: We have a family of m null hypotheses, some
  • f which are false (i.e. there is an effect on those “genes”)
  • “On average”, we have a table like

Accept Reject Total True hypotheses a0 r0 m0 False hypotheses a1 r1 m1 Total a r m

  • FDR = r0/r proportion of rejects that are from true

hypotheses

  • Compare FWE: probability that r0 > 0. FDR at same level

allows more Type I errors if not all null hypotheses are true.
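A toy simulation (my illustration, with made-up numbers) of the table above: test m = 1000 hypotheses at an unadjusted α = 0.05, with m0 = 900 true nulls and m1 = 100 real effects, and look at the realized false discovery proportion r0/r.

```python
# Two-sided z-test p values; true nulls have mean 0, false nulls mean 3.
import random
from statistics import NormalDist

random.seed(2)
nd = NormalDist()
m0, m1, alpha = 900, 100, 0.05
p0 = [2 * (1 - nd.cdf(abs(random.gauss(0, 1)))) for _ in range(m0)]
p1 = [2 * (1 - nd.cdf(abs(random.gauss(3, 1)))) for _ in range(m1)]
r0 = sum(p < alpha for p in p0)    # rejected true nulls (Type I errors)
r1 = sum(p < alpha for p in p1)    # rejected false nulls (true discoveries)
print(r0, r1, r0 / (r0 + r1))      # with no correction, the FDP is sizeable
```

Without any correction, roughly α × m0 ≈ 45 true nulls get rejected, so a substantial fraction of the "discoveries" are false; FDR procedures aim to cap exactly this fraction.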


SLIDE 43

Controlling the FDR

  • The FDR sounds like a good idea, but you don’t know r0, so what should you do?
  • Benjamini and Hochberg step-up procedure: ensures that the FDR is at most q under some assumptions
  • Sort the unadjusted p values
  • Compare the i-th smallest in turn with q × i/m and reject until non-significance
  • I.e. the first critical value is as in the Bonferroni correction, the 2nd is twice as big, etc.
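The step-up procedure above fits in a few lines. A sketch of mine; in R the same adjustment is available as p.adjust(..., method = "BH").

```python
# Benjamini–Hochberg: sort the p values, find the largest rank i with
# p_(i) <= q * i / m, and reject all hypotheses with the i smallest p values.
def bh_reject(pvals: list, q: float = 0.05) -> list:
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank                     # step-up: keep the largest such rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bh_reject(p, q=0.05))   # only the two smallest p values are rejected
```

Note the "step-up" detail: 0.039 fails its own threshold (0.015), but if some later p value had passed, everything below it would be rejected too.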


SLIDE 49

Assumptions for Step-Up Procedure

  • Works if tests are independent
  • — or positively correlated (which is not necessarily the case)
  • Benjamini and Yekutieli: replace q with q / (1 + 1/2 + … + 1/m) in the procedure, and the FDR is less than q for any correlation pattern
  • Notice that the divisor in the B-Y procedure is quite big: ≈ ln m + 0.5772. For m = 10000 it is 9.8.
  • The B-Y procedure is probably way too conservative in practice
  • Resampling procedures seem like a better approach to the correlation issue
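The B-Y divisor is just the harmonic sum, and the slide's 9.8 is easy to verify (a sketch of mine):

```python
# Harmonic sum 1 + 1/2 + ... + 1/m, and its approximation ln m + γ
# with Euler's constant γ ≈ 0.5772.
from math import log

def by_divisor(m: int) -> float:
    return sum(1 / i for i in range(1, m + 1))

print(round(by_divisor(10000), 1))       # 9.8, as on the slide
print(round(log(10000) + 0.5772, 1))     # 9.8 via the approximation
```

So for m = 10000 tests, B-Y effectively runs the step-up procedure at q/9.8, which is why it is regarded as very conservative.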


SLIDE 55

FDR and Sample Size Calculations

  • This is tricky . . .
  • VERY recent research. I don’t think there is full consensus about what to do
  • Interesting papers coming up in Bioinformatics (April 21, Advance Access):
  • Pawitan et al.
  • Sin-Ho Jung