
Power, Sample Size, and the FDR

Peter Dalgaard

Department of Biostatistics University of Copenhagen

Center for Bioinformatics, University of Copenhagen, June 2005

Sample Size

“How many observations do we need?” Depends on

  • Design
  • Standard error of measurements
  • Effect size
  • How sure you want to be of finding it

Reminders

(Continuous data) One-sample (or paired differences):

  SEM = s × √(1/n)

Significance if |x̄ − µ0| / SEM > t.975(DF)

Two-sample:

  SEDM = s × √(1/n1 + 1/n2)

Significance if |x̄1 − x̄2| / SEDM > t.975(DF)

t.975(DF) ≈ 2. Notice that SE(D)M decreases with n.
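The quantities above are easy to check numerically. A small sketch in plain Python (the course itself uses R; the values s = 1, n = 20 and the observed difference 1.2 are made-up illustrations):

```python
import math

def sem(s, n):
    # Standard error of the mean, one-sample or paired design: s * sqrt(1/n)
    return s * math.sqrt(1 / n)

def sedm(s, n1, n2):
    # Standard error of the difference of means: s * sqrt(1/n1 + 1/n2)
    return s * math.sqrt(1 / n1 + 1 / n2)

s = 1.0
print(sem(s, 20))        # shrinks as n grows
print(sedm(s, 10, 10))
# Rough significance check with the t.975(DF) ≈ 2 rule from the slide:
print(abs(1.2 - 0.0) / sem(s, 20) > 2)  # → True
```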

Variation of Observations and Means

[Figure: normal densities of single observations (SD = 1) and of means of 20 observations, dnorm(x, sd = sqrt(1/20)); x from −3 to 3]

t Test

[Figure: density of the t distribution with 38 DF, dt(t, 38); t from −3 to 3]

  • If there is no (true) difference, then there is little chance of getting an observation in the tails.
  • If there is a difference, then the center of the distribution is shifted.

Type I and Type II Errors

A test of a hypothesis can go wrong in two ways:

  • Type I error: rejecting a true null hypothesis
  • Type II error: accepting a false null hypothesis

Error probabilities: α resp. β.

  • α: significance level (e.g. 0.05)
  • 1 − β: power, the probability of detecting the difference

Notice that the power depends on the effect size as well as on the number of observations and the significance level.

Calculating n – Preliminaries

  • (First consider the one-sample case.)
  • Wish to detect a difference of δ = µ − µ0 (the “clinically relevant difference”).
  • Naive guess: n should satisfy δ = 2 × SEM?
  • But the observed difference is not precisely δ. It is smaller with 50% probability, and then it wouldn’t be significant.
  • We need to make SEM so small that there is a high probability of getting a significant result.

Power, Sketch of Principle

[Figure: two standard normal densities, one centered at 0 (null) and one shifted right by δ (alternative); x axis in units of SEM]


Size of SEM relative to δ

(Notice: these formulas assume a known SD. Watch out if n is very small; more accurate formulas are in R’s power.t.test.)

zp denotes quantiles of the standard normal distribution, z0.975 = 1.96, etc.

Two-tailed test, α = 0.05, power 1 − β = 0.90:

  δ = (1.96 + k) × SEM

where k is the distance between the middle and right peaks in the power sketch above. Find k so that there is a probability of 0.90 of observing a difference of at least 1.96 × SEM:

  k = −z0.10 = z0.90 = 1.28,  so  δ = 3.24 × SEM
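The arithmetic can be reproduced with standard normal quantiles; a small Python check (the slides use R, and `statistics.NormalDist` is just a convenient stand-in for its `qnorm`):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf        # standard normal quantile function z_p

alpha, beta = 0.05, 0.10
k = z(1 - beta)                 # k = z0.90 ≈ 1.28
multiplier = z(1 - alpha / 2) + k
print(round(k, 2))              # → 1.28
print(round(multiplier, 2))     # → 3.24, i.e. δ = 3.24 × SEM
```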

Calculating n

Just insert SEM = σ/√n in δ = 3.24 × SEM and solve for n:

  n = (3.24 × σ/δ)²

(for a two-sided test at level α = 0.05, with power 1 − β = 0.90.)

General formula for arbitrary α and β:

  n = ((z1−α/2 + z1−β) × σ/δ)² = (σ/δ)² × f(α, β)

with f(α, β) tabulated on the next slide.
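As a sketch, the formula can be wrapped in a small function (R’s power.t.test is the more accurate tool, since it accounts for the t distribution; this Python version implements only the slide’s normal approximation):

```python
import math
from statistics import NormalDist

def n_one_sample(sigma, delta, alpha=0.05, beta=0.10):
    # n = ((z_{1-alpha/2} + z_{1-beta}) * sigma/delta)^2, rounded up
    z = NormalDist().inv_cdf
    return math.ceil(((z(1 - alpha / 2) + z(1 - beta)) * sigma / delta) ** 2)

print(n_one_sample(sigma=1.0, delta=1.0))   # → 11 (3.24² ≈ 10.5, rounded up)
```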

A Useful Table

f(α, β) = (z1−α/2 + z1−β)²

                 β
  α       0.05    0.1    0.2    0.5
  0.1    10.82   8.56   6.18   2.71
  0.05   12.99  10.51   7.85   3.84
  0.02   15.77  13.02  10.04   5.41
  0.01   17.81  14.88  11.68   6.63
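The table can be regenerated directly from its defining formula; a quick Python check:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf

def f(alpha, beta):
    # f(alpha, beta) = (z_{1-alpha/2} + z_{1-beta})^2
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2

for a in (0.1, 0.05, 0.02, 0.01):
    print(a, [round(f(a, b), 2) for b in (0.05, 0.1, 0.2, 0.5)])
# The 0.05 row prints [12.99, 10.51, 7.85, 3.84], matching the table.
```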

Two-Sample Test

Optimal to have equal group sizes. Then

  SEDM = s × √(2/n)

and we get (two-tailed test, α = 0.05, 1 − β = 0.90):

  n = 2 × 3.24² × (σ/δ)²

e.g. for δ = σ: n = 2 × 3.24² = 21.0, i.e., 21 per group. Or in general:

  n = 2 × (σ/δ)² × f(α, β)
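The 21-per-group example can be checked numerically (same normal approximation as before; round up to a whole number of subjects in practice):

```python
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, beta=0.10):
    # n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sigma/delta)^2, per group
    z = NormalDist().inv_cdf
    return 2 * ((z(1 - alpha / 2) + z(1 - beta)) * sigma / delta) ** 2

print(round(n_per_group(sigma=1.0, delta=1.0), 1))   # → 21.0, i.e. 21 per group
```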

Multiple Tests

  • Traditional significance tests and power calculations are designed for testing one and only one null hypothesis.
  • Modern screening procedures (microarrays, genome scans) generate thousands of individual tests.
  • One traditional approach is the Bonferroni correction: adjust p values by multiplication with the number of tests.
  • This controls the familywise error rate (FWE): the risk of making at least one Type I error.
  • However, this tends to give low power. An alternative is the False Discovery Rate (FDR).

The False Discovery Rate

  • Basic idea: we have a family of m null hypotheses, some of which are false (i.e. there is an effect on those “genes”).
  • “On average”, we have a table like:

                        Accept  Reject  Total
    True hypotheses       a0      r0     m0
    False hypotheses      a1      r1     m1
    Total                 a       r      m

  • FDR = r0/r: the proportion of rejections that come from true hypotheses.
  • Compare FWE: the probability that r0 > 0. FDR control at the same level allows more Type I errors if not all null hypotheses are true.
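To make the definition concrete, here is the FDR computed for one hypothetical filled-in table (the counts are invented for illustration, not from the slides):

```python
# Hypothetical counts for the table above (invented for illustration)
r0, r1 = 5, 45        # rejected true nulls / rejected false nulls
r = r0 + r1           # total rejections
fdr = r0 / r          # proportion of rejections that are false discoveries
print(fdr)            # → 0.1
```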

Controlling the FDR

  • The FDR sounds like a good idea, but you don’t know r0, so what should you do?
  • Benjamini and Hochberg step-up procedure: ensures that the FDR is at most q under some assumptions.
  • Sort the unadjusted p values.
  • Compare them in turn with q × i/m and reject until non-significance.
  • I.e. the first critical value is as in the Bonferroni correction, the 2nd is twice as big, etc.
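The step-up procedure can be sketched in a few lines (a minimal Python version; in R the same adjustment is available via p.adjust with method = "BH". The p values below are made up):

```python
def benjamini_hochberg(pvalues, q=0.05):
    # Step-up: find the largest rank k with p_(k) <= q*k/m,
    # then reject the k hypotheses with the smallest p values.
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])          # indices of rejected hypotheses

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))  # → [0, 1]
```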

Assumptions for Step-Up Procedure

  • Works if tests are independent
  • — or positively correlated (which is not necessarily the case)

  • Benjamini and Yekutieli: replace q with q/(∑ᵢ₌₁..ₘ 1/i) in the procedure, and the FDR is less than q for any correlation pattern.

  • Notice that the divisor in the B-Y procedure, ∑ᵢ₌₁..ₘ 1/i, is quite big: ≈ ln m + 0.5772. For m = 10000 it is ≈ 9.8.

  • The B-Y procedure is probably way too conservative in practice.
  • Resampling procedures seem like a better approach to the correlation issue.
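The size of the B-Y divisor is easy to verify numerically:

```python
import math

m = 10000
divisor = sum(1 / i for i in range(1, m + 1))  # harmonic sum, the B-Y divisor
print(round(divisor, 1))                        # → 9.8
print(round(math.log(m) + 0.5772, 1))           # ln m + Euler's constant → 9.8
```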


FDR and Sample Size Calculations

  • This is tricky . . .
  • VERY recent research. I don’t think there is full consensus about what to do.
  • Interesting papers coming up in Bioinformatics (April 21, Advance Access):
  • Pawitan et al.
  • Sin-Ho Jung