On the Optimum Number of Hypotheses to Test when the Number of - - PowerPoint PPT Presentation



slide-1
SLIDE 1

On the Optimum Number of Hypotheses to Test when the Number of Observations is Limited

  • A. Futschik and M. Posch

Vienna University & Medical Univ. of Vienna

On the Optimum Number of Hypotheses to Test when the Number of Observations is Limited – p. 1/??

slide-2
SLIDE 2

A Main Goal in Statistics

Extract as much information as possible from a limited number of observations.

In the context of multiple hypothesis testing: reject (correctly!) as many null hypotheses as possible while still ensuring some global control of the type I error.

Much work has been done to derive multiple test procedures that achieve this goal! We address the issue from a different point of view than usual ...


slide-3
SLIDE 3

Our Framework

Consider a situation where ...

  • multiple hypotheses are to be tested
  • there is control at the design stage concerning how many hypotheses will be tested
  • the overall number of observations is limited by some constant m
  • there is control at the design stage concerning the allocation of the observations among the hypotheses to be tested


slide-4
SLIDE 4


slide-5
SLIDE 5

Some Applications

  • Clinical trials with subgroups defined by age, treatment etc.
  • Crop variety selection
  • Microarrays
  • Discrete event systems


slide-6
SLIDE 6

Our Goal

Given

  • a maximum overall number of observations,
  • a certain multiple test procedure,

maximize (in the number k of considered hypotheses) the expected number of correct rejections.


slide-7
SLIDE 7

Outline

  • Framework of optimization problem
  • Optimization w.r.t. a reference alternative
  • Optimum number of hypotheses when controlling the family-wise error (Bonferroni, Bonferroni–Holm, Dunnett)
  • Optimum number of hypotheses when controlling the false discovery rate (Benjamini–Hochberg)
  • Optimization w.r.t. a composite alternative
  • Classification Procedures


slide-8
SLIDE 8

The Optimization Problem

Total of m observations and K potential hypothesis pairs available.

Focus on hypotheses of type $H_{0,i}: \theta_i = 0$ vs. $H_{1,i}: \theta_i > 0$ $(1 \le i \le K)$.

If k hypothesis pairs are selected at random, m/k observations are available for each hypothesis pair (up to round-off differences).

Choose k to maximize the expected number of correct rejections $E N_k$.


slide-9
SLIDE 9

General Observations

If no correction for multiplicity is applied, k as large as possible is often optimal.

With a correction for multiplicity, there is usually a unique optimum k.


slide-10
SLIDE 10

Bonferroni Tests

Define

\[ \Delta_m := \frac{\theta^{(1)} \sqrt{m}}{\sigma}. \]

Then, for normally $N(0, \sigma^2)$ distributed data and one-sided Bonferroni z-tests:

\[ E(N_k) = q\,k \left[ 1 - \Phi_{(\Delta_m/\sqrt{k},\,1)}(z_{\alpha/k}) \right], \]

where q is the expected proportion of incorrect null hypotheses.
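The Bonferroni objective above can be evaluated numerically. The following is a minimal sketch, assuming that $\Phi_{(\mu,1)}$ denotes the normal c.d.f. with mean $\mu$ and variance 1 and that $z_{\alpha/k}$ is the upper $\alpha/k$ quantile of N(0, 1); the function name and the parameter values are illustrative and not taken from the slides.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def expected_rejections(k, delta_m, q, alpha):
    """E(N_k) = q k [1 - Phi_{(delta_m/sqrt(k), 1)}(z_{alpha/k})]."""
    # Bonferroni cutoff z_{alpha/k}: upper alpha/k quantile of N(0, 1).
    z_crit = -STD_NORMAL.inv_cdf(alpha / k)
    # Under the reference alternative the z-statistic is N(delta_m / sqrt(k), 1),
    # so Phi_{(mu, 1)}(z) equals Phi(z - mu).
    shift = delta_m / k ** 0.5
    power = 1.0 - STD_NORMAL.cdf(z_crit - shift)
    return q * k * power


# Illustrative values (assumed): delta_m = 100, q = 0.01, alpha = 0.05.
for k in (100, 500, 2000):
    print(k, expected_rejections(k, delta_m=100.0, q=0.01, alpha=0.05))
```

With few hypotheses the power is close to one but k is small; with many hypotheses the per-test power collapses, so the product peaks in between.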


slide-11
SLIDE 11

Example: Bonferroni z- and t-tests

[Figure: E(N_k) plotted against k for the z-test and the t-test.]

The expected number of correctly rejected null hypotheses for given k and the parameters m = 100000, q = 0.01, α = 0.05, and θ = σ under H1.


slide-12
SLIDE 12

Optimum Number of Hypotheses

Theorem: Define

\[ k_m := \frac{\Delta_m^2}{2 \log(\Delta_m^2)}. \]

Then, as $m \to \infty$, the optimum number of hypotheses to test is

\[ k_m^* = k_m [1 + o(1)], \]

with the remainder term being negative.
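As a rough check of the theorem, the asymptotic value $k_m$ can be compared with an exhaustive search over integer k. This is only a sketch under the Bonferroni z-test objective from the earlier slide; the helper names, the search limit `k_max`, and the parameter values are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_rejections(k, delta_m, q, alpha):
    """Bonferroni z-test objective E(N_k) = q k [1 - Phi(z_{alpha/k} - delta_m/sqrt(k))]."""
    z_crit = -STD_NORMAL.inv_cdf(alpha / k)
    return q * k * (1.0 - STD_NORMAL.cdf(z_crit - delta_m / math.sqrt(k)))


def asymptotic_k(delta_m):
    """k_m = delta_m^2 / (2 log(delta_m^2))."""
    d2 = delta_m ** 2
    return d2 / (2.0 * math.log(d2))


def brute_force_k(delta_m, q, alpha, k_max):
    """Integer k in 1..k_max with the largest E(N_k) (exhaustive search)."""
    return max(range(1, k_max + 1),
               key=lambda k: expected_rejections(k, delta_m, q, alpha))


delta_m = 100.0
k_m = asymptotic_k(delta_m)
k_star = brute_force_k(delta_m, q=0.01, alpha=0.05, k_max=2000)
print(k_m, k_star)
```

The searched optimum should fall below $k_m$, in line with the negative remainder term, and can be compared with the entries for ∆m = 100 on the Numerical Example slide.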


slide-13
SLIDE 13

Numerical Example

The optimum number of hypotheses $k_m^*$ and the power (in %) to reject an individual incorrect null hypothesis:

             ∆m = 5    10        20        50         100        1000
α = 0.01     3 (57)    8 (70)    25 (74)   124 (76)   425 (78)   28908 (82)
α = 0.025    3 (69)    9 (71)    29 (72)   138 (75)   469 (77)   30883 (81)
α = 0.05     4 (60)    11 (66)   33 (70)   152 (74)   508 (76)   32564 (81)


slide-14
SLIDE 14

Bonferroni–Holm Tests

[Figure: E(N_k) plotted against k for the Bonferroni and Bonferroni–Holm procedures.]

Bonferroni vs. Bonferroni–Holm tests: θ = 1, m = 200, α = 0.025, and q = 0.5.


slide-15
SLIDE 15

Control of False Discovery Rate

Benjamini–Hochberg:

\[ \mathrm{FDR} = E\!\left( \frac{V}{\max(R, 1)} \right) \]

Asymptotically equivalent problem (see Genovese and Wasserman (2002)):

\[ E(N_k) = q\,k \left[ 1 - \Phi_{(\Delta_m/\sqrt{k},\,1)}(z_u) \right] \to \max_k , \]


slide-16
SLIDE 16

Benjamini–Hochberg

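Theorem: Asymptotically, the optimum solution is

\[ k_m^* = \frac{\Delta_m^2}{\left( z_{u_\beta^*} - z_{\beta u_\beta^*} \right)^2}, \]

where $u_\beta^*$ maximizes

\[ \frac{u}{\left( z_u - z_{\beta u} \right)^2}. \]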


slide-17
SLIDE 17

Asymptotic vs. Simulated Objective Function

[Figure: E(N_k) plotted against k, comparing the BH asymptotic objective function with BH simulation.]

The parameters: θ = 1, m = 200, α = 0.025, and q = 0.5.


slide-18
SLIDE 18

t-Tests I

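Bonferroni tests:

\[ E N_k^{(t)} = q\,k \left[ 1 - F^{(t)}_{m/k,\,\Delta_m/\sqrt{k}}\!\left( t_{\alpha/k,\,m/k} \right) \right], \]

with $F^{(t)}_{\nu,\delta}$ the non-central t-c.d.f. with $\nu - 1$ degrees of freedom and noncentrality parameter $\delta$, and $t_{\gamma,\nu}$ the $1 - \gamma$ quantile of the standard t-distribution with $\nu - 1$ degrees of freedom.

Benjamini–Hochberg procedure:

\[ E N_k^{(t)} = q\,k \left[ 1 - F^{(t)}_{m/k,\,\Delta_m/\sqrt{k}}\!\left( t_{u,\,m/k} \right) \right]. \]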


slide-19
SLIDE 19

t-Tests II

Theorem: Let $\theta^{(1)} > 0$, and define $\theta_m = \theta^{(1)}/\sqrt{m}$.

Assume that $\Delta_m = \theta_m \sqrt{m}/\sigma = \theta^{(1)}/\sigma$. Then, for $m \to \infty$, the optimum solution for t-tests converges to that for z-tests.


slide-20
SLIDE 20

Possible Rejections for z- and t-Test

[Figure: E(N_k) plotted against k for the z-test and the t-test.]

Parameters: m = 100000, q = 0.01, α = 0.05 and θ(1)/σ = 1.


slide-21
SLIDE 21

Composite Alternatives I

Bonferroni z-Tests:

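\[ E N_k = q\,k \int_0^\infty \left[ 1 - \Phi\!\left( z_{\alpha/k} - \frac{\Delta_m(\theta)}{\sqrt{k}} \right) \right] dF(\theta), \]

where F is the conditional c.d.f. of θ given θ > 0, q = P(θ > 0), and

\[ \Delta_m(\theta) = \frac{\theta \sqrt{m}}{\sigma}. \]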
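The integral over F can be approximated numerically once a distribution for the effect size is chosen. The sketch below assumes, purely for illustration, that θ given θ > 0 is half-normal with a user-supplied scale and uses a simple midpoint rule; the function name, the truncation point, and the step count are hypothetical choices, not part of the slides.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_rejections_composite(k, m, sigma, q, alpha, scale, steps=4000):
    """Midpoint-rule approximation of
    q k * int_0^inf [1 - Phi(z_{alpha/k} - theta sqrt(m) / (sigma sqrt(k)))] dF(theta),
    with F ASSUMED half-normal with the given scale (illustration only)."""
    z_crit = -STD_NORMAL.inv_cdf(alpha / k)   # Bonferroni cutoff z_{alpha/k}
    upper = 8.0 * scale                       # truncation; the tail mass beyond is negligible
    h = upper / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        density = 2.0 * STD_NORMAL.pdf(theta / scale) / scale  # half-normal density
        power = 1.0 - STD_NORMAL.cdf(z_crit - theta * math.sqrt(m / k) / sigma)
        total += power * density * h
    return q * k * total


# Illustrative values (assumed): k = 50, m = 1000, sigma = 1, q = 0.1, alpha = 0.05.
value = expected_rejections_composite(50, 1000, 1.0, 0.1, 0.05, scale=1.0)
print(value)
```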


slide-22
SLIDE 22

Composite Alternatives II

Theorem: Assume that F is continuous and define

\[ k_{m,F} := \frac{m\, d_F^2/\sigma^2}{2 \log(m\, d_F^2/\sigma^2)}, \]

where $d_F$ maximizes $d^2 [1 - F(d)]$. Assuming that $d^2 (1 - F(d)) \to 0$ as $d \to \infty$, the optimum solution $k_{m,F}^*$ satisfies

\[ k_{m,F}^* = k_{m,F} (1 + o(1)). \]
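To make the theorem concrete, $d_F$ and $k_{m,F}$ can be computed for a specific F. The sketch below assumes a half-normal F (θ given θ > 0 with θ ~ N(0, scale²)) and locates $d_F$ by a plain grid search; the choice of F, the grid, and all function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def half_normal_tail(d, scale):
    """1 - F(d) for theta | theta > 0 when theta ~ N(0, scale^2) (assumed F)."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(d / scale))


def find_d_f(scale, upper_factor=10.0, steps=20000):
    """Grid search for the d maximizing d^2 [1 - F(d)] on (0, upper_factor * scale]."""
    grid = (upper_factor * scale * i / steps for i in range(1, steps + 1))
    return max(grid, key=lambda d: d * d * half_normal_tail(d, scale))


def asymptotic_k_composite(m, sigma, d_f):
    """k_{m,F} = (m d_F^2 / sigma^2) / (2 log(m d_F^2 / sigma^2))."""
    x = m * d_f ** 2 / sigma ** 2
    return x / (2.0 * math.log(x))


d_f = find_d_f(scale=1.0)
k_mf = asymptotic_k_composite(m=100000, sigma=1.0, d_f=d_f)
print(d_f, k_mf)
```

The formula has the same form as $k_m$ for a point alternative, with $d_F$ playing the role of the reference effect size.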


slide-23
SLIDE 23

Composite Alternatives III

[Figure: E(N_k) plotted against k for the z-test and the t-test.]

Parameters: m = 100000, q = 0.01, α = 0.05. Effect size under alternative N(0, 1.2) distributed.


slide-24
SLIDE 24

Composite Alternatives IV

A similar result can be obtained for the Benjamini–Hochberg procedure ...


slide-25
SLIDE 25

Classification Procedures I

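Classification between $\theta = \theta_0$ and $\theta = \theta_1$.

Minimize

\[ k \left( w_1\, q\, [1 - g_k(\theta_1)] + w_0\, (1 - q)\, g_k(\theta_0) \right), \]

with $g_k(\theta)$ the probability of deciding for $\theta_1$ under θ.

For fixed k, the problem is equivalent to maximizing

\[ U(k) = k \left( w_1\, q\, g_k(\theta_1) - w_0\, (1 - q)\, g_k(\theta_0) \right). \]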


slide-26
SLIDE 26

Classification Procedures II

Theorem: For the Bayes classifier, normal data and $r = w_0 (1 - q)/(w_1 q)$: If $r > 1$, then the optimum k satisfies

\[ k = \left( \frac{\Delta_m}{x_r} \right)^2, \]

where $x_r$ is the solution of

\[ 0 = x\, \varphi[x - c(r, x)]/2 - \Phi[x - c(r, x)] + r\, \Phi[-c(r, x)], \]

with $c(r, x) = \log(r)/x + x/2$, and $\Delta_m = (\theta_1 - \theta_0) \sqrt{m}/\sigma$.
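The equation for $x_r$ has no closed form but is easy to solve numerically. The following is a minimal sketch using bisection; the bracket [0.5, 10] is an assumption that happens to contain a sign change for the example value r = 3, and the function names are hypothetical. The example takes r = 3 and ∆m = 5, which would correspond to q = 0.5, w0 = 3, w1 = 1, m = 100 and (θ1 − θ0)/σ = 1/2.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def c(r, x):
    """c(r, x) = log(r)/x + x/2."""
    return math.log(r) / x + x / 2.0


def stationarity(r, x):
    """x phi[x - c]/2 - Phi[x - c] + r Phi[-c]; x_r is a zero of this function."""
    cx = c(r, x)
    return (x * STD_NORMAL.pdf(x - cx) / 2.0
            - STD_NORMAL.cdf(x - cx)
            + r * STD_NORMAL.cdf(-cx))


def solve_x_r(r, lo=0.5, hi=10.0, tol=1e-10):
    """Bisection for x_r; the bracket [lo, hi] is an assumption and must show a sign change."""
    f_lo, f_hi = stationarity(r, lo), stationarity(r, hi)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the bracket; choose other lo/hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = stationarity(r, mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2.0


def optimum_k(delta_m, r):
    """k = (delta_m / x_r)^2."""
    return (delta_m / solve_x_r(r)) ** 2


x_r = solve_x_r(3.0)
print(x_r, optimum_k(5.0, 3.0))
```

Bisection is used here only because it needs nothing beyond the standard library; any bracketing root finder would serve.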


slide-27
SLIDE 27

Objective Function U(k)

[Figure: objective function U(k) plotted against k; curves labelled "correct" and "incorrect".]

Parameters: m = 100, q = 0.5, w0 = 3, w1 = 1, and θ(1)/σ = 1/2.


slide-28
SLIDE 28

Summary

Given a limited maximum number of observations, the number of possible rejections depends considerably on the number of considered hypotheses.

With a good design involving an appropriate allocation of the observations to the hypotheses, a lot more can be gained than by using a more sophisticated multiple test procedure.

For more details see: On the Optimum Number of Hypotheses to Test when the Number of Observations is Limited (Futschik & Posch (2005)).
