

SLIDE 1

2011 Rao Prize Conference, Penn State, May 19


Bayesian Adjustment for Multiplicity

Jim Berger

Duke University

with James Scott

University of Texas

2011 Rao Prize Conference, Department of Statistics, Penn State University, May 19, 2011

SLIDE 2

SLIDE 3

Outline

  • Background on multiplicity
  • Illustration of the Bayesian approach through simpler examples

  – Multiple testing under exclusivity
  – Multiple testing under non-exclusivity
  – Sequence multiple testing

  • The general Bayesian approach to multiplicity adjustment
  • Multiple models
  • Variable selection (including comparison with empirical Bayes)
  • Subgroup analysis

SLIDE 4

Some Multiplicity Problems in SAMSI Research Programs

  • Stochastic Computation / Data Mining and Machine Learning

– Example: Microarrays, with 100,000 mean gene expression differentials µi, and testing H0 : µi = 0 versus H1 : µi ≠ 0. Multiplicity problem: Even if all µi = 0, one would expect roughly αm = 5,000 tests to reject at, say, level α = 0.05, so a correction for this effect is needed.

  • Astrostatistics and Phystat

– Example: 1.6 million tests of Cosmic Microwave Background radiation for non-Gaussianity in its spatial distribution.
– Example: At the LHC, up to 10^12 tests are being considered for each particle event to try to detect particles such as the Higgs boson. And recently (pre-LHC), there was an 8σ event that didn't replicate.

  • Multiplicity and Reproducibility in Scientific Studies

– In the USA, drug compounds entering Phase I development today have an 8% chance of reaching market, versus a 14% chance 15 years ago.
– 70% Phase III failure rates, versus a 20% failure rate 10 years ago.
– Reports that 30% of Phase III successes fail to replicate.

SLIDE 5

Simple Examples of the Bayesian Approach to Multiplicity Adjustment

Key Fact: Bayesian analysis deals with multiplicity adjustment solely through the assignment of prior probabilities to models or hypotheses.

Example: Multiple Testing under Exclusivity. Suppose one is testing mutually exclusive hypotheses Hi, i = 1, . . . , m, so each hypothesis is a separate model. If the hypotheses are viewed as exchangeable, choose P(Hi) = 1/m.

Example: 1000 energy channels are searched for a signal:

  • if the signal is known to exist and occupy only one channel, but no channel is theoretically preferred, each channel can be assigned prior probability 0.001.
  • if the signal is not known to exist (e.g., it is the prediction of a non-standard physics theory), prior probability 1/2 should be given to 'no signal,' and probability 0.0005 to each channel.

This is the Bayesian solution regardless of the structure of the data.

SLIDE 6

In contrast, frequentist solutions depend on the structure of the data.

Example: For each channel, test H0i : µi = 0 versus H1i : µi > 0.

Data: Xi, i = 1, ..., m, are normally distributed with mean µi, variance 1, and correlation ρ.

If ρ = 0, one can just do individual tests at level α/m (Bonferroni) to obtain an overall error probability of α.

If ρ > 0, harder work is needed:

  • Choose an overall decision rule, e.g., "declare channel i to have the signal if Xi is the largest value and Xi > K."
  • Compute the corresponding error probability, which can be shown to be

    α = Pr( max_i Xi > K | µ1 = . . . = µm = 0 ) = E_Z[ 1 − Φ( (K − √ρ Z)/√(1 − ρ) )^m ],

where Φ is the standard normal cdf and Z is standard normal. Note that this gives (essentially) the Bonferroni correction when ρ = 0, and converges to 1 − Φ(K) as ρ → 1 (the one-dimensional solution).

SLIDE 7

An example of non-mutually-exclusive Bayesian multiple testing

(Scott and Berger, 2006 JSPI; other, more sophisticated full Bayesian analyses are in Gönen et al. (03), Do, Müller and Tang (02), Newton et al. (01), Newton and Kendziorski (03), Müller et al. (03), Guindani, Zhang and Müller (2007), . . .; many empirical Bayes, such as Storey, Dai and Leek (2007))

  • Suppose xi ∼ N(µi, σ²), i = 1, . . . , m, are observed, σ² known, and test H0i : µi = 0 versus H1i : µi ≠ 0.
  • Most of the µi are thought to be zero; let p denote the unknown common prior probability that µi is zero.
  • Assume that the nonzero µi follow a N(0, V) distribution, with V unknown.
  • Assign p the uniform prior on (0, 1) and V the prior density π(V) = σ²/(σ² + V)².

SLIDE 8

  • Then the posterior probability that µi ≠ 0 is

    pi = 1 − [ ∫₀¹ ∫₀¹ p ∏_{j≠i} ( p + (1 − p) √(1 − w) e^{w xj²/(2σ²)} ) dp dw ] / [ ∫₀¹ ∫₀¹ ∏_{j=1}^m ( p + (1 − p) √(1 − w) e^{w xj²/(2σ²)} ) dp dw ],

    where w = V/(σ² + V), so that π(V) transforms to the uniform density on (0, 1).

  • (p1, p2, . . . , pm) can be computed numerically; for large m, it is most efficient to use importance sampling, with a common importance sample for all pi.

Example: Consider the following ten 'signal' observations:

  • 8.48, −5.43, −4.81, −2.64, −2.40, 3.32, 4.07, 4.81, 5.81, 6.24
  • Generate n = 10, 50, 500, and 5000 N(0, 1) noise observations.
  • Mix them together and try to identify the signals.
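For small m the double integral can be evaluated on a grid rather than by importance sampling; a sketch with σ = 1 and a hypothetical mix of two large 'signals' and three noise-sized values:

```python
import math

def factor(x, p, w):
    """One term of the product: p + (1 - p) sqrt(1 - w) e^{w x^2 / 2}  (sigma = 1)."""
    return p + (1.0 - p) * math.sqrt(1.0 - w) * math.exp(w * x * x / 2.0)

def posterior_nonzero_probs(xs, grid=200):
    """p_i = Pr(mu_i != 0 | x), via midpoint-rule integration over (p, w) in (0,1)^2."""
    pts = [(k + 0.5) / grid for k in range(grid)]
    den = 0.0
    nums = [0.0] * len(xs)
    for p in pts:
        for w in pts:
            f = [factor(x, p, w) for x in xs]
            prod_all = math.prod(f)
            den += prod_all                      # denominator integrand
            for i, fi in enumerate(f):
                nums[i] += p * prod_all / fi     # p times product over j != i
    return [1.0 - num / den for num in nums]

xs = [4.2, -3.7, 0.3, -0.8, 1.1]  # hypothetical data: two 'signals', three noise values
probs = posterior_nonzero_probs(xs)
```

Since each pi depends on xi only through xi², the computed probabilities are ordered by |xi|, which is the automatic multiplicity penalty in action.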

SLIDE 9
The ten 'signal' observations:

| #noise n | 8.5 | −5.4 | −4.8 | −2.6 | −2.4 | 3.3 | 4.1 | 4.8 | 5.8 | 6.2 | # noise pi > .6 |
| 10   | 1 | 1   | 1   | .94 | .89 | .99 | 1   | 1   | 1 | 1 | 1 |
| 50   | 1 | 1   | 1   | .71 | .59 | .94 | 1   | 1   | 1 | 1 |   |
| 500  | 1 | 1   | 1   | .26 | .17 | .67 | .96 | 1   | 1 | 1 | 2 |
| 5000 | 1 | 1.0 | .98 | .03 | .02 | .16 | .67 | .98 | 1 | 1 | 1 |

Table 1: The posterior probabilities of being nonzero for the ten 'signal' means.

Note 1: The penalty for multiple comparisons is automatic.

Note 2: Theorem: E[#i : pi > .6 | all µj = 0] = O(1) as m → ∞, so the Bayesian procedure exerts medium-strong control over false positives. (In comparison, E[#i : Bonferroni rejects | all µj = 0] = α.)

SLIDE 10

[Figure: four panels of posterior densities for µi over (−10, 10), labeled −5.65, −5.56, −2.98, and −2.62, with vertical bars of heights including 0.32 and 0.45.]

Figure 1: For four of the observations, 1 − pi = Pr(µi = 0 | y) (the vertical bar), and the posterior densities for µi ≠ 0.

SLIDE 11

Sequence Multiple Testing

SLIDE 12

Hypotheses and Data:

  • Alvac had shown no effect
  • Aidsvax had shown no effect

Question: Would Alvac as a primer and Aidsvax as a booster work?

The Study: Conducted in Thailand with 16,395 individuals from the general (not high-risk) population:

  • 74 HIV cases reported in the 8198 individuals receiving placebos
  • 51 HIV cases reported in the 8197 individuals receiving the treatment

SLIDE 13

The test that was performed:

  • Let p1 and p2 denote the probability of HIV in the placebo and treatment populations, respectively.
  • Test H0 : p1 = p2 versus H1 : p1 ≠ p2.
  • Normal approximation okay, so

    z = (p̂1 − p̂2)/σ̂_{p̂1−p̂2} = (.009027 − .006222)/.001359 = 2.06

    is approximately N(θ, 1), where θ = (p1 − p2)/.001359. We thus test H0 : θ = 0 versus H1 : θ ≠ 0, based on z.

  • Observed z = 2.06, so the p-value is 0.04.
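The statistic can be reproduced directly from the trial counts (74/8198 versus 51/8197); the standard error shown, .001359, matches the usual unpooled normal approximation, a sketch of which is:

```python
from math import erf, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """z = (p1_hat - p2_hat) / se_hat, with the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

z = two_proportion_z(74, 8198, 51, 8197)
p_value = 2.0 * (1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0))))  # two-sided normal p-value
```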

Questions:

  • Is the p-value usable as a direct measure of vaccine efficacy?
  • Should the fact that there were two previous similar trials be taken into account (the multiple testing part of the story)?

SLIDE 14

Bayesian Analysis of the Single Trial:

Prior distribution:

  • Pr(Hi) = prior probability that Hi is true, i = 0, 1.
  • On H1 : θ > 0, let π(θ) be the prior density for θ.

Note: H0 must be believable (at least approximately) for this to be reasonable (i.e., no fake nulls).

Subjective Bayes: choose these based on personal beliefs.

Objective (or default) Bayes: choose

  • Pr(H0) = Pr(H1) = 1/2,
  • π(θ) = Uniform(0, 6.46), which arises from assigning
    – uniform for p2 on 0 < p2 < p1,
    – plug-in for p1.

SLIDE 15

Posterior probability of hypotheses:

    Pr(H0 | z) = probability that H0 is true, given data z
               = f(z | θ = 0) Pr(H0) / [ Pr(H0) f(z | θ = 0) + Pr(H1) ∫₀^∞ f(z | θ) π(θ) dθ ]

For the objective prior, Pr(H0 | z = 2.06) ≈ 0.33 (recall, p-value ≈ .04).

The posterior density on H1 : θ > 0 is

    π(θ | z = 2.06, H1) ∝ π(θ) f(2.06 | θ) = (0.413) e^{−(2.06−θ)²/2}  for 0 < θ < 6.46.
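A sketch of the posterior-probability calculation with f(z | θ) = N(θ, 1); the constants behind the slide's 0.33 are not fully reproducible here, so the number this produces is illustrative rather than a reproduction:

```python
from math import exp, pi, sqrt

def phi(x):
    """Standard normal density."""
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def posterior_null_prob(z, C, prior_h0=0.5, n=20000):
    """Pr(H0 | z) with f(z | theta) = N(theta, 1) and pi(theta) = Uniform(0, C) on H1,
    averaging the H1 likelihood by a midpoint Riemann sum."""
    avg_lik_h1 = sum(phi(z - (k + 0.5) * C / n) for k in range(n)) / n
    num = prior_h0 * phi(z)
    return num / (num + (1.0 - prior_h0) * avg_lik_h1)

p0 = posterior_null_prob(2.06, 6.46)
```

Whatever the exact prior width, the resulting Pr(H0 | z) is several times larger than the p-value of 0.04, which is the point of the comparison on this slide.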

SLIDE 16

[Figure: the posterior density for θ over (−2, 6), with a vertical bar of height 0.337 = Pr(H0 | z = 2.06) at θ = 0.]

SLIDE 17

Robust Bayes: Report the Bayes factor (the odds of H0 to H1) as a function of π_C(θ) ≡ Uniform(0, C):

    B01(C) = [likelihood of H0 for observed data] / [average likelihood of H1]
           = (1/√(2π)) e^{−(2.06)²/2} / ∫₀^C (1/√(2π)) e^{−(2.06−θ)²/2} C^{−1} dθ.

[Figure: B01(C) plotted against C for 1 ≤ C ≤ 6.]

Note: min_C B01(C) = 0.265 (while B01(6.46) = 0.51).
Note: The robustness analysis applies to all nonincreasing priors.

SLIDE 18

Incorporating information from multiple tests: To adjust for the two previous similar failed trials, the (exchangeable) Bayesian solution

  • assigns each trial common unknown probability p of success, with p having a uniform distribution;
  • computes the resulting posterior probability that the current trial exhibits no efficacy:

    Pr(H0 | x1, x2, x3) = ( 1 + [B01(x1)B01(x2) + B01(x1) + B01(x2) + 3] / [3 B01(x1)B01(x2) + B01(x1) + B01(x2) + 1] × 1/B01(x3) )^{−1},

where B01(xi) is the Bayes factor of "no effect" to "effect" for trial i. The result is Pr(H0 | x1, x2, x3) = 0.54.
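The combination rule is easy to package as a function. The Bayes-factor inputs below are hypothetical stand-ins (the slide's 0.54 comes from the actual Bayes factors of the three trials):

```python
def combined_null_prob(b1, b2, b3):
    """Pr(H0 for the current trial | all three trials), with each trial's null
    indicator i.i.d. Bernoulli given a common p ~ Uniform(0, 1);
    b_i is the Bayes factor B01 of 'no effect' to 'effect' for trial i."""
    ratio = (b1 * b2 + b1 + b2 + 3.0) / (3.0 * b1 * b2 + b1 + b2 + 1.0)
    return 1.0 / (1.0 + ratio / b3)

# hypothetical inputs: two earlier 'failed' trials (large B01), current trial B01 near 0.5
p_h0 = combined_null_prob(10.0, 10.0, 0.51)
```

Two sanity checks: with b1 = b2 = 1 (uninformative earlier trials) the formula collapses to the single-trial answer b3/(1 + b3), and strongly null earlier trials push Pr(H0) above the single-trial value.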

SLIDE 19

General Approach to Bayesian Multiplicity Adjustment

1. Represent the problem as a model uncertainty problem: models Mi, with densities fi(x | θi) for data x, given unknown parameters θi; prior distributions πi(θi); and marginal likelihoods mi(x) = ∫ fi(x | θi) πi(θi) dθi.

2. Specify prior probabilities P(Mi) of models to reflect the multiplicity issues; Bayesian analysis controls multiplicity through the P(Mi).ᵃ

   • Subjective Bayesian analysis: If the P(Mi) are real subjective probabilities, that's it: multiplicity correction has been done.
   • Objective Bayesian analysis: One has to be careful to make choices of the P(Mi) that ensure multiplicity correction (e.g., specifying equal prior probabilities does not generally control multiplicity)!

3. Implement Bayesian model averaging (model selection?), based on

    P(Mi | x) = P(Mi) mi(x) / Σ_{j=1}^k P(Mj) mj(x).

ᵃ See, e.g., Jeffreys 1961; Waller and Duncan 1969; Meng and Dempster 1987; Berry 1988; Westfall, Johnson and Utts 1997; Carlin and Louis 2000.

SLIDE 20

Choice of transformation/model

Bayesian solution: model averaging.

  • Assign each model/transformation a prior probability.
  • Compute model/transformation posterior probabilities.
  • Perform inference with weighted averages over the models/transformations. (An overwhelmingly supported model/transformation will receive weight near one.)

SLIDE 21

Example: From i.i.d. vehicle emission data X = (X1, . . . , Xn), one desires to determine the probability that the vehicle type will meet regulatory standards. Traditional models for this type of data are Weibull and lognormal distributions, given respectively by

    M1 : fW(x; β, γ) = (γ/β) (x/β)^{γ−1} exp[ −(x/β)^γ ]

    M2 : fL(x; µ, σ²) = (1/(x √(2πσ²))) exp[ −(log x − µ)²/(2σ²) ].

Note that both distributions are in the location-scale family (the Weibull being so after a log transformation).

SLIDE 22

Model Averaging Analysis:

  • Assign each model prior probability 1/2.
  • Because of the common location-scale invariance structures, assign the right-Haar prior densities πW(β, γ) = 1/(βγ) and πL(µ, σ) = 1/σ, respectively (Berger, Pericchi and Varshavsky, 1998 Sankhyā).
  • The posterior probabilities (and conditional frequentist error probabilities) of the two models are then

    P(M1 | x) = 1 − P(M2 | x) = B(x)/(1 + B(x)),

    where zi = log xi, z̄ = (1/n) Σᵢ zi, s_z² = (1/n) Σᵢ (zi − z̄)², and

    B(x) = [ 2 Γ(n) n^{n/2} π^{(n−1)/2} / Γ((n−1)/2) ] ∫₀^∞ y^{n−2} [ Σ_{i=1}^n exp( (zi − z̄) y / s_z ) ]^{−n} dy.

  • For the studied data set, P(M1 | x) = .712. Hence,

    P(meeting standard) = .712 P(meeting standard | M1) + .288 P(meeting standard | M2).
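The Bayes factor reduces to a one-dimensional quadrature. The closed form coded below was re-derived from the two right-Haar marginal likelihoods to match the structure of the slide's expression, so its leading constant is an assumption, and the data vector is hypothetical (the studied emissions data set is not reproduced in the talk):

```python
import math

def weibull_vs_lognormal_B(x):
    """B(x) = m_Weibull(x) / m_lognormal(x) under right-Haar priors 1/(beta*gamma)
    and 1/sigma; depends on the data only through c_i = (z_i - zbar)/s_z."""
    n = len(x)
    z = [math.log(v) for v in x]
    zbar = sum(z) / n
    s_z = math.sqrt(sum((zi - zbar) ** 2 for zi in z) / n)
    c = [(zi - zbar) / s_z for zi in z]

    def log_integrand(y):
        # log of y^(n-2) [sum_i exp(c_i y)]^(-n), via log-sum-exp for stability
        m_ = max(ci * y for ci in c)
        lse = m_ + math.log(sum(math.exp(ci * y - m_) for ci in c))
        return (n - 2) * math.log(y) - n * lse

    upper, steps = 60.0, 60000  # integrand decays like exp(-n * max(c_i) * y)
    h = upper / steps
    integral = sum(math.exp(log_integrand((k + 0.5) * h)) for k in range(steps)) * h
    const = (2.0 * math.gamma(n) * n ** (n / 2.0)
             * math.pi ** ((n - 1) / 2.0) / math.gamma((n - 1) / 2.0))
    return const * integral

x = [0.8, 1.3, 2.1, 0.6, 1.7, 1.1, 0.9, 2.5, 1.4, 1.0]  # hypothetical emissions data
B = weibull_vs_lognormal_B(x)
p_weibull = B / (1.0 + B)  # P(M1 | x)
```

Because B(x) depends only on the standardized logs, it is invariant to rescaling the data, consistent with the invariance structure that justifies the right-Haar priors.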

SLIDE 23

Variable Selection

Problem: Data X arises from a normal linear regression model, with m possible regressors having associated unknown regression coefficients βi, i = 1, . . . , m, and unknown variance σ².

Models: Consider selection from among the submodels Mi, i = 1, . . . , 2^m, having only ki regressors with coefficients βᵢ (a subset of (β1, . . . , βm)) and resulting density fi(x | βi, σ²).

Prior density under Mi: Zellner–Siow priors πi(βi, σ²).

Marginal likelihood of Mi: mi(x) = ∫ fi(x | βi, σ²) πi(βi, σ²) dβi dσ².

Prior probability of Mi: P(Mi).

Posterior probability of Mi: P(Mi | x) = P(Mi) mi(x) / Σⱼ P(Mj) mj(x).

SLIDE 24

Common Choices of the P(Mi)

Equal prior probabilities: P(Mi) = 2^{−m}.

Bayes exchangeable variable inclusion:

  • Each variable, βi, is independently in the model with unknown probability p (called the prior inclusion probability).
  • p has a Beta(p | a, b) distribution. (We use a = b = 1, the uniform distribution, as did Jeffreys 1961, who also suggested alternative choices of the P(Mi). Probably a = b = 1/2 is better.)
  • Then, since ki is the number of variables in model Mi,

    P(Mi) = ∫₀¹ p^{ki} (1 − p)^{m−ki} Beta(p | a, b) dp = Beta(a + ki, b + m − ki) / Beta(a, b).

Empirical Bayes exchangeable variable inclusion: Find the MLE p̂ by maximizing the marginal likelihood of p, Σⱼ p^{kj}(1 − p)^{m−kj} mj(x), and use P(Mi) = p̂^{ki}(1 − p̂)^{m−ki} as the prior model probabilities.
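The P(Mi) formula is easy to compute and to sanity-check: summed over all 2^m models it must give 1. A sketch, with a hypothetical m = 8:

```python
from math import comb, gamma

def beta_fn(a, b):
    """Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def prior_model_prob(k, m, a=1.0, b=1.0):
    """P(M_i) for a model containing k of the m candidate variables, under
    exchangeable Beta(a, b) variable inclusion."""
    return beta_fn(a + k, b + m - k) / beta_fn(a, b)

m = 8  # hypothetical number of candidate variables
# summing over model sizes, weighted by the number of models of each size
total = sum(comb(m, k) * prior_model_prob(k, m) for k in range(m + 1))
```

With a = b = 1, each model size k receives total prior mass 1/(m + 1), split evenly among the C(m, k) models of that size; models of middling size are therefore individually penalized, which is the source of the multiplicity adjustment.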

SLIDE 25

Controlling for multiplicity in variable selection

Equal prior probabilities: P(Mi) = 2^{−m} does not control for multiplicity here (as it did in the simpler examples); it corresponds to fixed prior inclusion probability p = 1/2 for each variable.

Empirical Bayes exchangeable variable inclusion does control for multiplicity, in that p̂ will be small if there are many βi that are zero.

Bayes exchangeable variable inclusion also controls for multiplicity (see Scott and Berger, 2008), although the P(Mi) are fixed.

Note: The control of multiplicity by Bayes and EB variable inclusion usually reduces model complexity, but is different from the usual Bayesian Ockham's razor effect that reduces model complexity.

  • The Bayesian Ockham's razor operates through the effect of model priors πi(βi, σ²) on mi(x), penalizing models with more parameters.
  • Multiplicity correction occurs through the choice of the P(Mi).

SLIDE 26

Posterior inclusion probabilities as the number of noise variables grows (column headings give the number of noise variables):

|            | Equal model probabilities |      |      |      | Bayes variable inclusion |      |      |      |
| Signal     | 1    | 10   | 40   | 90   | 1    | 10   | 40   | 90   |
| β1 : −1.08 | .999 | .999 | .999 | .999 | .999 | .999 | .999 | .999 |
| β2 : −0.84 | .999 | .999 | .999 | .999 | .999 | .999 | .999 | .988 |
| β3 : −0.74 | .999 | .999 | .999 | .999 | .999 | .999 | .999 | .998 |
| β4 : −0.51 | .977 | .977 | .999 | .999 | .991 | .948 | .710 | .345 |
| β5 : −0.30 | .292 | .289 | .288 | .127 | .552 | .248 | .041 | .008 |
| β6 : +0.07 | .259 | .286 | .055 | .008 | .519 | .251 | .039 | .011 |
| β7 : +0.18 | .219 | .248 | .244 | .275 | .455 | .216 | .033 | .009 |
| β8 : +0.35 | .773 | .771 | .994 | .999 | .896 | .686 | .307 | .057 |
| β9 : +0.41 | .927 | .912 | .999 | .999 | .969 | .861 | .567 | .222 |
| β10 : +0.63 | .995 | .995 | .999 | .999 | .996 | .990 | .921 | .734 |

False positives: 2 5 10 1

Table 2: Posterior inclusion probabilities for 10 real variables in a simulated data set.

SLIDE 27

Comparison of Bayes and Empirical Bayes Approaches

Theorem 1. In the variable-selection problem, if the null model (or full model) has the largest marginal likelihood m(x) among all models, then the MLE of p is p̂ = 0 (or p̂ = 1). (The naive EB approach, which assigns P(Mi) = p̂^{ki}(1 − p̂)^{m−ki}, concludes that the null (full) model has probability 1.)

A simulation with 10,000 repetitions to gauge the severity of the problem:

  • m = 14 covariates, orthogonal design matrix;
  • p drawn from U(0, 1); regression coefficients are 0 with probability p and drawn from a Zellner–Siow prior with probability (1 − p);
  • n = 16, 60, and 120 observations drawn from the given regression model.

| Case    | p̂ = 0 | p̂ = 1 |
| n = 16  | 820   | 781   |
| n = 60  | 783   | 766   |
| n = 120 | 723   | 747   |

SLIDE 28

Is empirical Bayes at least accurate asymptotically as m → ∞?

Posterior model probabilities, given p:

    P(Mi | x, p) = p^{ki}(1 − p)^{m−ki} mi(x) / Σⱼ p^{kj}(1 − p)^{m−kj} mj(x)

Posterior distribution of p:

    π(p | x) = K Σⱼ p^{kj}(1 − p)^{m−kj} mj(x)

This does concentrate about the true p as m → ∞, so one might expect that

    P(Mi | x) = ∫₀¹ P(Mi | x, p) π(p | x) dp ≈ P(Mi | x, p̂) ∝ mi(x) p̂^{ki}(1 − p̂)^{m−ki}.

This is not necessarily true; indeed,

    ∫₀¹ P(Mi | x, p) π(p | x) dp = ∫₀¹ [ p^{ki}(1 − p)^{m−ki} mi(x) / (π(p | x)/K) ] π(p | x) dp ∝ mi(x) ∫₀¹ p^{ki}(1 − p)^{m−ki} dp ∝ mi(x) P(Mi).

Caveat: Some EB techniques have been justified; see Efron and Tibshirani (2001), Johnstone and Silverman (2004), Cui and George (2006), and Bogdan et al. (2008).

SLIDE 29
Theorem 2. Suppose the true model size kT satisfies kT/m → pT as m → ∞, where 0 < pT < 1. Consider all models Mi such that kT − ki = O(√m), and consider the optimal situation for EB, in which p̂ = pT + O(1/√m) as m → ∞. Then the ratio of the prior probabilities assigned to such models by the Bayes approach and the empirical Bayes approach satisfies

    P^B(Mi) / P^{EB}(Mi) = ∫₀¹ p^{ki}(1 − p)^{m−ki} π(p) dp / [ p̂^{ki}(1 − p̂)^{m−ki} ] = O(1/√m),

provided π(·) is continuous and nonzero.

SLIDE 30

Subgroup Analysis

SLIDE 31

SLIDE 32

Frequentist adjustment for performing 26 hypothesis tests

  • Split the data into one part to suggest a subgroup and another part to confirm (or confirm with a new experiment).
  • Bonferroni correction:
    – To achieve an overall error probability level of 0.05 when conducting 26 tests, one would need to use a per-test rejection level of α = 0.05/26 ≈ 0.002.
    – This is likely much too conservative because of the dependence in the 26 tests.
  • Various bootstrap types of correction to try to account for dependence.
SLIDE 33

Bayesian adjustment

Let v be the vector of 25 zeroes and ones indicating subgroup characteristics. For each possible such vector, let µv denote the mean of the intersected subgroup (e.g., young, male, diabetic, non-smoker, . . .).

Data: x ∼ f(x | {µv, all possible v}).

Two classes of approaches:

  • Factor-based approaches
  • Aggregation-based approaches

SLIDE 34

An example factor-based approach

Model the intersected subgroup means additively as

    µv = µ + vβ,  β = (β1, . . . , β25)′,

where µ is an overall mean and βi is the effect corresponding to the ith subgroup factor.

Conversion to model selection:

  • Let γ = (γ0, γ∗) = (γ0, γ1, . . . , γ25) be the vector of zeroes and ones indicating whether µ (corresponding to γ0) and each factor βi is zero or not.
  • This defines the model Mγ.

SLIDE 35

A reasonable objective choice of prior model probabilities:

  • P(γ0 = 0) = P(µ = 0) = 3/4.
  • Independently, P(γ∗ = 0) = 2/3, and each γ∗ ≠ 0 has probability

    P(γ∗) = (26/75) × Beta(1 + r, 1 + 25 − r) / Beta(1, 1),  where r = # zeroes in γ∗.

  • Note that then
    – P(no effect) = P(µ = 0, γ∗ = 0) = 1/2
    – P(µ ≠ 0, γ∗ = 0) = 1/6
    – P(µ = 0, γ∗ ≠ 0) = 1/4
    – P(µ ≠ 0, γ∗ ≠ 0) = 1/12
    – P(γi ≠ 0) = 13/75

The experimenter could (pre-experimentally) make different choices here, as long as P(no effect) is kept at 1/2. Post-experimentally, one would need to utilize an objective choice such as the above.
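The stated probabilities can be verified numerically from the P(γ∗) formula; by the symmetry Beta(1 + r, 26 − r) = Beta(26 − r, 1 + r), it does not matter whether r counts the zeroes or the ones of γ∗. A sketch:

```python
from math import comb, gamma

def beta_fn(a, b):
    """Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

m = 25  # number of subgroup factors

def p_gamma_star(r):
    """Prior mass of one nonzero gamma* vector with r ones (equivalently r zeroes)."""
    return (26.0 / 75.0) * beta_fn(1 + r, 1 + m - r) / beta_fn(1, 1)

# total mass over all nonzero gamma*; should equal P(gamma* != 0) = 1/3
total_nonzero = sum(comb(m, r) * p_gamma_star(r) for r in range(1, m + 1))

p_no_effect = (3.0 / 4.0) * (2.0 / 3.0)  # P(mu = 0) * P(gamma* = 0)

# P(gamma_i != 0): sum over the nonzero gamma* vectors that include factor i
p_factor_i = sum(comb(m - 1, r - 1) * p_gamma_star(r) for r in range(1, m + 1))
```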

SLIDE 36

Possible Bayesian outputs of interest:

  • P(effect of factor i ≠ 0 | x) = Σ_{γ : γi = 1} P(Mγ | x).
  • P(effect in subgroup i ≠ 0 | x) = Σ_{γ : γ0 = 1 or γi = 1} P(Mγ | x).
  • P(a constant effect ≠ 0 | x) = P(M(1,0) | x).

Of course, posterior densities for all effects, conditional on their being nonzero, are also available.

SLIDE 37

Aggregation-based approaches

Basic idea: Recall that for every intersected subgroup (e.g., young, male, diabetic, non-smoker, . . .) there is an unknown mean µv. Plausible models involve aggregation of these means into common effects, e.g., µv1 = µv2. There are a number of ways to aggregate means, including:

  • Product partition models (Hartigan and Berry)
  • Dirichlet process models (Gopalan and Berry use for multiplicity control)
  • Generalized partition models
  • Species sampling models
  • Tree-based models (our current favorite)

Surmountable problem: Any of these aggregate means could be zero; with some work, this can typically be handled by adding "zero" to the list.

Harder problem: Not all (not even most) aggregations are sensible (e.g., µF1G1 = µF2G2 ≠ µF1G2 = µF2G1 versus µF1G1 = µF2G1 ≠ µF1G2 = µF2G2).

SLIDE 38

Summary

  • Developing methods for controlling for multiplicity is a dramatically increasing need in science.
  • Approaching multiplicity control from the Bayesian perspective has the attractions that
    – there is a single approach that can be applied in any situation;
    – since multiplicity is controlled solely through prior probabilities of models, it does not depend on the error structure of the model;
    – there is flexibility in the assignment of prior probabilities to hypotheses, from pure objective assignments to (pre-experimental) subjective assignments favoring scientifically preferred hypotheses;
    – objective Bayesian control can even be implemented retroactively.
  • Associated empirical Bayes analysis exhibits multiplicity control, but cannot be assumed to be an approximation to the Bayesian analysis.
  • Bayesian implementation of subgroup analysis is promising.

SLIDE 39

Thanks!
