A Bayesian Method for Partially Paired High Dimensional Data Fei - - PowerPoint PPT Presentation

a bayesian method for partially paired high dimensional
SMART_READER_LITE
LIVE PREVIEW

A Bayesian Method for Partially Paired High Dimensional Data Fei - - PowerPoint PPT Presentation

A Bayesian Method for Partially Paired High Dimensional Data Fei Liu, Feng Liang, Woncheol Jang Institute of Statistics and Decision Sciences Duke University SAMSI, Summer 2006 Outline Bayesian methods have been developed for paired high


slide-1
SLIDE 1

A Bayesian Method for Partially Paired High Dimensional Data

Fei Liu, Feng Liang, Woncheol Jang

Institute of Statistics and Decision Sciences Duke University

SAMSI, Summer 2006

slide-2
SLIDE 2

Outline

◮ Bayesian methods have been developed for paired high

dimensional data such as gene expression data.

◮ For partially paired data, however, excluding those

unpaired observations for the analysis may lead to significant information loss.

◮ Using test statistics with FDR control is a possible solution. ◮ We provides a generalized Bayesian method for partially

paired high dimensional data.

slide-3
SLIDE 3

Statistical Model

◮ The data for jth gene are arranged as:

X1j, . . . , Xnj; X ∗

1j, . . . , X ∗ n1j;

Y1j, . . . , Ynj; Y ∗

1j . . . , Y ∗ n2j.

(Xij, Yij): paired gene expressions. X ∗

ij , Y ∗ ij : unpaired

  • bservations.

◮ (Xij, Yij)T ∼ N(µj, Σj)

µj =

  • µj

µj + δj

  • , Σj =
  • σ2

1

ρjσ1σ2 ρjσ1σ2 σ2

2

  • .

◮ For the incomplete data

X ∗

ij ∼ N(µj, σ2 1), Y ∗ ij ∼ N(µj + δj, σ2 2)

slide-4
SLIDE 4

Review the FDR Control

Accept Reject Total True Null

U V (False positive) m0

Untrue Null

T (False negative) S m − m0

Total

W R m FDR = E(V/R | R = 0) Benjamini and Hochberg(1995) procedure to control FDR at q∗: H1 H2 . . . Hm p1 p2 . . . pm

  • =

⇒ H(1) H(2) . . . H(m) p(1) p(2) . . . p(m)

  • k = max(i, p(j) ≤

j mq∗, for all j ≤ i), Reject H(1):(k).

slide-5
SLIDE 5

Test Statistics for Partially Paired Data in One Dimensional Space

◮ Lin and Stivers (1974) use the test statistic when n is small

and | ρ |≤ 0.5:

T = ¯ x1

(n+n1) − ¯

x2

(n+n2)

r

1 n+n1 + 1 n+n2 − 2nr (n+n1)(n+n2)

q (a∗

11 + b22)/(N − 2)

where T ∼ tN−4 approximately, and N = n + n1 + n2.

◮ Another test statistic is given by the mixed effect model:

zik = µ + αi + βk + ǫik, where βk ∼ N(0, σ2

β), ǫij ∼ N(0, σ2 ǫ )

Perform ANOVA to test the fixed effect αi = 0.

slide-6
SLIDE 6

Scott and Berger (2003)

Noticing the built-in penalty (“Ockham’s razor effect”) of the Bayesian method, Scott and Berger (2003) propose A Bayesian Hierarchical model for multiple comparisons, Observe x = (x1, . . . , xM): xj ∼ N(µj, σ2) γj = 1 − δ(µj = 0) f(x | σ2, γ, µ) =

M

  • j=1

1 √ 2πσ2 exp −(xj − γjµj)2 2σ2

  • µj

∼ N(0, V) π(V, σ2) ∝ (V + σ2)−2 γj ∼ Bernoulli(p) π(p) ∼ Beta(α, β)

slide-7
SLIDE 7

EBarrays Method

Kendziorski et. al (2004) propose Parametric Empirical Bayes Method to account for replicating arrays (multiple conditions as well). Observe x = (x1, . . . , xJ), where xj = (xj1, xj2, . . . , xjI)

◮ If gene j is not differentially expressed (δj = 0),

f0(xj) =

  • (

I

  • i=1

fobs(xji | µ))π(µ)dµ

◮ If gene j is differentially expressed (δj = 0),

f1(xj) = f0(xj1)f0(xj2)

◮ Data is marginally distributed: pf1(xj) + (1 − p)f0(xj) ◮ By Bayes’ rule, posterior probability of δj = 0 is

pf1(xj) pf1(xj) + (1 − p)f0(xj)

slide-8
SLIDE 8

Mixture Prior

◮ Our primary interest is:

H0 : δj = 0

◮ We propose a mixture distribution for δj, i.e.,

π(δj | p, τ 2) = pφ(δj/τ) + (1 − p)I{0}(δj),

p : probability of being differentially expressed. γj: Latent variables. Set to 1 if the jth gene is differentially expressed;

  • therwise 0. Interest P(γj = 1 | Data).
slide-9
SLIDE 9

Priors and Posteriors

◮ Priors distributions for (µ, σ2, p, τ 2) are:

π(µj) ∝ 1 π(τ 2 | σ2) ∝ 1 σ2

  • 1 + τ 2

σ2 −2 π(p) ∝ pα−1(1 − p)β−1 ≡ Beta(α, β)

◮ Improper prior distributions for ρ

π1(ρj) ∝ 1 (1 − ρ2

j ) , π2(ρj) ∝

1 (1 − ρ2

j )2

π1 and π2 are both can be shown to have proper posteriors. Bayarri(1981) shows that π1 avoids the “Jeffrey-Lindley” paradox.

slide-10
SLIDE 10

Gibbs Sampling

Θ = (µ, δ, γ, ρ, σ2, τ 2, p), Data = (x, y, x∗, y∗) Closed forms for sampling µ, δ, γ, σ2:

(µj | Θ−µ, Data) ∼ N(m(µ)

j

, σ(µ)

j

) (γj | Θ−(γ∪δ), Data) ∼ Bernoulli(p(γ)

j

) (δj | Θ−δ, Data) ∼ N(m(δ)

j

, σ(δ)

j

) (p | Θ−p, Data) ∼ Beta @α +

J

X

j=1

γj , β + J −

J

X

j=1

γj 1 A (σ2 | Θ−σ2 , Data) ∼ IG „ J „ n + n1 + n2 2 « , η «

No closed forms for (τ 2, ρ).

slide-11
SLIDE 11

Simulation Study with Normal Distributions

Simulate the data with 1000 genes, 5 paired, 2 unpaired control, 2 unpaired treatment, and ρ = 0.1, τ 2 = 100, σ2 = 1.0, p = 0.01. False Positive False Negative FDR - T test 0/9 2/991 FDR - random effect 1/11 1/989 Bayesian Model 0/10 1/990

Histogram of p, true 0.01 p density 0.005 0.010 0.015 0.020 0.025 20 40 60 80 100 120 140

Figure: Posterior distribution of p (true is 0.01)

slide-12
SLIDE 12

Simulation Study with Normal Distributions (Cont...)

Delta_ 325

delta density −13.5 −13.0 −12.5 −12.0 −11.5 −11.0 −10.5 0.0 0.2 0.4 0.6 0.8

True = −12.68011;P = 1

Delta_ 666

delta density −4 −3 −2 −1 5 10 15

True = −3.878074;P = 0.081

Delta_ 84

delta density 0.0 0.2 0.4 0.6 0.8 1.0 20 40 60 80 100

True = 0;P = 0

Figure: Posterior distribution for δ’s

slide-13
SLIDE 13

Simulation Study with t Distributions

Simulate the data with 1000 genes, 9 samples (5 pairs, 2 unpaired control, 2 unpaired treatment) Data = Bivariate T4 with mean 0 and Σ = 0.1 1 1 0.1

  • +µ + δ

µ ∼ U(−0.01, 0.01) δi | δi = 0 ∼ N(0, τ 2 = 100); P(δi = 0) = 0.01 False Positive False Negative FDR - T test 1/7 3/993 FDR - random effect 6/13 2/987 Bayesian Model 6/13 2/987

slide-14
SLIDE 14

Simuation Study with t Distributions (Cont...)

Histogram of p, true 0.01

p density 0.005 0.010 0.015 0.020 0.025 0.030 0.035 20 40 60 80 100

Figure: Posterior distribution for p

slide-15
SLIDE 15

Simulation Study with t Distributions (Cont...)

Delta_ 6

delta density −10 −9 −8 −7 0.0 0.2 0.4 0.6 0.8 1.0

True = −8.308207;P = 1

Delta_ 390

delta density −6 −5 −4 −3 −2 −1 0.0 0.5 1.0 1.5 2.0 2.5 3.0

True = −4.34904;P = 0.855

Delta_ 401

delta density −1 1 2 5 10 15

True = 0;P = 0.178

Figure: Posterior distribution for δ’s

slide-16
SLIDE 16

Future work

◮ Apply the method to gene expression data. ◮ Use different p to achieve different thresholding. ◮ EBarrays method with random effects ek.

Observe Xjk = (Xjk1, Xjk2) for gene j and sample k.

  • If gene j is not differentially expressed,

f0(Xjk) =

i k

fobs(Xjki | µ + ek)π(µ)π(ek)dek

  • If gene j is differentially expressed,

f0(Xjk) =

i k

fobs(Xjki | µi + ek)π(µi)π(ek)dek

  • dµi