Biospecimen Assessment Michelle Danaher University of Maryland, - - PowerPoint PPT Presentation

biospecimen assessment
SMART_READER_LITE
LIVE PREVIEW

Biospecimen Assessment Michelle Danaher University of Maryland, - - PowerPoint PPT Presentation

Biospecimen Assessment Michelle Danaher University of Maryland, Baltimore County and Eunice Kennedy Shriver National Institute of Child Health and Human Development 1/60 Outline Biospecimen Assessment Chapter 1: Background Pooling variables


slide-1
SLIDE 1

Biospecimen Assessment

Michelle Danaher

University of Maryland, Baltimore County and Eunice Kennedy Shriver National Institute of Child Health and Human Development

1/60

slide-2
SLIDE 2

Outline

Biospecimen Assessment

Chapter 1: Background Pooling variables in biomarker assessment models

Chapter 2: Gene-environment interactions Chapter 3: Estimation of interaction effects using pooled biospecimen

Constrained estimation in biomarker assessment models

Chapter 4: Minkowski-Weyl priors for models with parameter constraints

2/60 Chapter 1 Chapter 2 Chapter 3 Chapter 4 Acknowledgments

slide-3
SLIDE 3

Outline

Biospecimen Assessment

Chapter 1: Background Pooling variables in biomarker assessment models

Chapter 2: Gene-environment interactions Chapter 3: Estimation of interaction effects using pooled biospecimen

Constrained estimation in biomarker assessment models

Chapter 4: Minkowski-Weyl priors for models with parameter constraints

3/60

Danaher, M.R., Schisterman, E.F ., Roy, A. and Albert, P .S. (2012), Estimation of gene-environment interaction by pooling biospecimens. Statistics in Medicine, 31: 3241–3252. DOI: 10.1002/sim.5357 Danaher, M.R., Roy, A., Chen, Z., Mumford, S.L. and Schisterman, E.F . (2012), Minkowski-Weyl priors for models with parameter constraints: an analysis of the BioCycle Study. Journal of the American Statistical Association, in press.

Chapter 1 Chapter 2 Chapter 3 Chapter 4 Acknowledgments

slide-4
SLIDE 4

Chapter 1: Background

4/60 Chapter 1 Chapter 2 Chapter 3 Chapter 4 Acknowledgments

slide-5
SLIDE 5

Background

Why are we interested in biospecimen assessment?

5/60

slide-6
SLIDE 6

Background

Why are we interested in biospecimen assessment? Epidemiologic studies increasingly depend on biomarkers to indicate exposure.

5/60

slide-7
SLIDE 7

Background

Why are we interested in biospecimen assessment? Epidemiologic studies increasingly depend on biomarkers to indicate exposure. Financial expense limits epidemiologists from measuring biomarkers

5/60

slide-8
SLIDE 8

Background

Why are we interested in biospecimen assessment? Epidemiologic studies increasingly depend on biomarkers to indicate exposure. Financial expense limits epidemiologists from measuring biomarkers We develop methods for proper statistical inference when biospecimens are pooled.

5/60

slide-9
SLIDE 9

Background

Why are we interested in biospecimen assessment? Epidemiologic studies increasingly depend on biomarkers to indicate exposure. Financial expense limits epidemiologists from measuring biomarkers We develop methods for proper statistical inference when biospecimens are pooled. Biomarkers have known relationships due to biology

5/60

slide-10
SLIDE 10

Background

Why are we interested in biospecimen assessment? Epidemiologic studies increasingly depend on biomarkers to indicate exposure. Financial expense limits epidemiologists from measuring biomarkers We develop methods for proper statistical inference when biospecimens are pooled. Biomarkers have known relationships due to biology We develop methods for effectively incorporating known biological constraints into a model.

5/60

slide-11
SLIDE 11

Background

Pooling variables in biomarker assessment models

In World War II Dorfman (1943) proposed pooling biospecimen to identify syphilis. Later, pooling was used to obtain a sum of individual components which are used in estimation of model parameters. (Brown and Fisher, 1972; Rhode, 1976)

6/60

slide-12
SLIDE 12

Background

Pooling variables in biomarker assessment models

The gene-environment interaction is a particularly important hypothesis. Traditional interaction estimators require genotyping a lot

  • f individuals.

We propose a novel pooling strategy to obtain a substantially less expensive estimator.

7/60

slide-13
SLIDE 13

Background

Pooling variables in biomarker assessment models

When two or more exposures are measured in pools, it is interesting to estimate interaction effects. Weinberg and Umbach (1999) proposed the set-based logistic model which,

estimates main effects of exposures measured in pools, estimates interaction effects when one exposure is used to form pooling strata. HOWEVER, it can not estimate interaction, quadratic, or higher order effects of exposures measured in pools.

We propose a novel method for estimating interactions of pooled exposures.

8/60

slide-14
SLIDE 14

Background

Constrained estimation in biomarker assessment models

20 40 60 80 100 120 2 4 6 8 10 12 14 16 18 20

1 2

20 40 60 80 100 120 2 4 6 8 10 12 14 16 18 20

1 2

(10,0) (98.89, 8.89)

                5 . 1 5 . 89 . 8 89 . 98 10 ) , ( U V U V     

Gelfand et al. (1992) proposed drawing samples from an unconstrained posterior, and retaining only those satisfying the constraints. Dunson et al. (2003) proposed a transformation approach for draws falling outside the constraint space. We propose a Bayesian model with a prior density on a constraint space using Minkowski-Weyl decomposition.

9/60

slide-15
SLIDE 15

Chapter 3: Estimation of interaction effects using pooled biospecimen

10/60 Chapter 1 Chapter 2 Chapter 3 Chapter 4 Acknowledgments

slide-16
SLIDE 16

Outline

Estimation of interaction effects using pooled biospecimen

Introduction Notation and assumptions A cohort study

The EM algorithm Variance estimation Simulation

A case control study

The EM algorithm Variance estimation Simulation

Discussion

11/60

   1 2 p

Intro Notation Cohort Cohort Simulation Case Control Case Control Simulation Discussion

slide-17
SLIDE 17

Introduction

Why are we interested in pooling biospecimen?

12/60

slide-18
SLIDE 18

Introduction

Why are we interested in pooling biospecimen?

Pooling saves financial resources, and invaluable biological resources.

12/60

slide-19
SLIDE 19

Introduction

Why are we interested in pooling biospecimen?

Pooling saves financial resources, and invaluable biological resources. When measuring a continuous biomarker, pooling may reduce the percentage of measurements under the LOD.

Pooled exposure assessment Individual exposure assessment

  • 50
  • 40
  • 30
  • 20
  • 10
10 20 30 40 50 500 1000 1500 2000 2500
  • 40
  • 30
  • 20
  • 10
10 20 30 500 1000 1500 2000 2500

LOD LOD

12/60

slide-20
SLIDE 20

Notation and assumptions

n = the # of individuals in the study. p = the # of individuals’ biospecimen within each pool. np = n/p is the # of pools of biospecimen. xij = [xij1, xij2]′ are two continuous exposures of interest, of the jth person in the ith pool. xp

i = p j=1 xij, is the observed exposure within a pool.

yij is the binary response of the jth person in the ith pool.

13/60

slide-21
SLIDE 21

Notation and assumptions

We assume that the disease status and exposures have the following distributions, yij|xij, β ind ∼ Ber[π(xij, β)], xij|µ, Σ iid ∼ N2(µ, Σ), for i = 1, 2, . . . , np, j = 1, 2, . . . , p π(xij, β) is the logistic link given by, π(xij, β) = exp

  • β′D(xij)
  • 1 + exp
  • β′D(xij)

, β = [β0, β1, β2, β3]′ are the unknown parameters of interest, and D(xij) = [1, xij1, xij2, xij1 × xij2]′. Define the unknown parameters by θ = [β′, γ′]′, where γ = [σ1, σ2, ρ, µ′]′.

14/60

slide-22
SLIDE 22

A cohort study

The likelihood for a cohort study is as follows, L(θ|y1, . . . yp, X1 . . . Xp) =

np

  • i=1

p

  • j=1

Pr(Y = yij|xij, β)φ(xij|µ, Σ) When p = 2, and pooling is matched within disease status (i.e. y1 = y2), L(θ|y1, X1, X p) =

np

  • i=1

g(xi1, yi1, xp

i , θ),

where the function g is defined as, g(xi1, yi1, xp

i , θ)

= Pr(Y = yi1|xi1, β)φ(xi1|µ, Σ) ×Pr(Y = yi1|xp

i − xi1, β)φ(xp i − xi1|µ, Σ).

15/60

slide-23
SLIDE 23

A cohort study

The EM algorithm

The first step of the EM algorithm is to evaluate Q, Q(θ|θt) = Ef{log[L(θ|y1, X1, X p)]} =

np

  • i=1
  • xi1∈R2 log[g(xi1, yi1, xp

i , θ)]fX1|y1,Xp,θt (xi1|y1, X p, θt) dxi1,

where the density fX1|y1,X p,θt is, fX1|y1,X p,θt(xi1|y1, X p, θt) = g(xi1, yi1, xp

i , θt)

  • xi1∈R2 g(xi1, yi1, xp

i , θt) dxi1

.

16/60

slide-24
SLIDE 24

A cohort study

The EM algorithm

The second step in the EM algorithm is to maximize Q with respect to (w.r.t.) the unknown parameters θ. We use Newton Raphson to maximize Q w.r.t β, where,

βk+1 = βk −

  • δ2Q(θ|θt)

δβδβ′ −1 δQ(θ|θt) δβ ,

and k increases until convergence. The values of µ and Σ which maximize Q are,

µt+1 = 1 np

np

  • i=1

xp

i

2 , Σt+1 = µt+1(µt+1)′ + 1 np

np

  • i=1

xp

i xp′ i

2 − xp

i (µt+1)′ + Ef

  • xi1x′

i1

  • − xp

i

2 Ef

  • x′

i1

  • − Ef (xi1) xp′

i

2 ,

17/60

slide-25
SLIDE 25

A cohort study

Variance estimation

We use the variance estimator proposed by Louis (1982),

Iobs(ˆ θ) = Icomplete(ˆ θ) − Imiss|obs(ˆ θ),

Icomplete(ˆ θ) is the observed information matrix of the conditional expected complete data,

Icomplete(ˆ θ) = −δ2Q(θ|θt) δθδθ′

  • θ=θt = ˆ

θ

= −  

δ2Q(θ|θt ) δβδβ′ δ2Q(θ|θt ) δγδγ′

 

  • θ=θt = ˆ

θ

,

Imiss|obs(ˆ θ) is the expected information for the distribution f.

Imiss|obs(ˆ θ) =

np

  • i=1

Ef

  • δlog
  • f(xi1|y1, X p, θt)
  • δθt

δlog

  • f(xi1|y1, X p, θt)
  • δθt

  • θt = ˆ

θ 18/60

slide-26
SLIDE 26

A cohort study

Simulation

β0 = 0.2 β1 = 0.1 β2 = 0.2 β3 = 0.1 Traditional MLE, n = 1000, Without Pooling (i.e. p = 1) Bias

  • 0.001

0.001 0.002

  • 0.003

RMSE 0.070 0.071 0.066 0.067 Est SD 0.072 0.072 0.066 0.067 Obs SD 0.070 0.071 0.066 0.067 Power 0.810 0.278 0.857 0.339 MLE using the EM algorithm, n = 1000, p = 2 Bias 0.001 0.008

  • 0.001
  • 0.001

RMSE 0.070 0.077 0.065 0.096 Est SD 0.072 0.079 0.067 0.096 Obs SD 0.071 0.077 0.066 0.096 Power 0.780 0.200 0.878 0.173 Traditional MLE, n = 500, Without Pooling (i.e. p = 1) Bias

  • 0.004
  • 0.001

0.001

  • 0.004

RMSE 0.102 0.104 0.096 0.102 Est SD 0.102 0.103 0.094 0.096 Obs SD 0.102 0.105 0.096 0.102 Power 0.521 0.157 0.560 0.199

Table : For this simulation γ = [1, 1, 0.02, 0.1, 0.5]. Obs SD is the observed standard

deviation from the 1000 Monte Carlo replicate estimates, and Est SD is the estimated standard deviation.

19/60

slide-27
SLIDE 27

A case control study

The likelihood for a case control study is as follows,

L(θ|y1, . . . yp, X1 . . . Xp, s1, . . . sp) =

np

  • i=1

p

  • j=1

Pr(Y = yij, xij|sij = 1, θ) =

np

  • i=1

p

  • j=1

Pr(Y = yij|xij, β)φ(xij|µ, Σ)τ(yij) α(yij, θ) ,

where τ(yij) = Pr(Y = yij|sij = 1), and α(yij, θ) is defined,

α(yij, θ) =

  • x∈R2 Pr(Y = yij|x, β)φ(x|µ, Σ) dx.

When p = 2, and pooling is within strata of the disease (i.e. y1 = y2),

L(θ|y1, X1, X p, s1) =

np

  • i=1

g(xi1, yi1, xp

i , θ)τ(yi1)2

α(yi1, θ)2 .

20/60

slide-28
SLIDE 28

A case control study

Define κ to be an estimate of disease prevalence. Approximate α(1, θ) using Gauss-Hermite quadratures, α(1, θ) ≈ (π)−1

j

exp

  • β′D
  • s∗(rj)
  • wj

1 + exp

  • β′D
  • s∗(rj)

, where s∗(z) = (2Σ)1/2z + µ, and rj and wj are the Gauss-Hermite nodes and weights. We solve the following equation for β0, 0 = κ − (π)−1

j

exp

  • β′D
  • s∗(rj)
  • wj

1 + exp

  • β′D
  • s∗(rj)

. Define β∗ =

  • β∗

0, β1, β2, β3

′ and θ∗ = [β∗′, γ′].

21/60

slide-29
SLIDE 29

A case control study

The EM algorithm

The first step of the EM algorithm is to evaluate Q∗,

Q∗(θ∗|θ∗t) = E{log[L(θ∗|y1, X1, X p, s1)]|y1, X p, θ∗t, s1} =

np

  • i=1
  • xi1∈R2 log[g(xi1, yi1, xp

i , θ∗)]fX1|y1,Xp,θ∗t (xi1|y1, X p, θ∗t) dxi1

+2log(τ(yi1)) − 2log(α(yi1, θ∗)) = Q(θ∗|θ∗t) + 2np1 [log(τ(1)) − log(α(1, θ∗))] +2np0 [log(1 − τ(1)) − log(1 − α(1, θ∗))] .

22/60

slide-30
SLIDE 30

A case control study

The EM algorithm

The second step in the EM algorithm is to maximize Q with respect to (w.r.t.) the unknown parameters θ∗. We are not able to separately maximize Q∗ w.r.t. β∗ and γ as we did for the cohort design because of α(1, θ∗) Instead, we use the Optimization Toolbox in MATLAB to maximize Q∗ w.r.t. [β1, β2, β3, γ′].

23/60

slide-31
SLIDE 31

A case control study

Variance estimation

Imiss|obs(ˆ θ) is exactly the same as in the cohort design. Icomplete(ˆ θ) must be modified slightly as follows,

Icomplete(ˆ θ) = −δ2Q(θ|θt) δθδθ′

  • θ=θt = ˆ

θ

+ δ2 2np1log(α(1, θ)) + 2np0log(1 − α(1, θ)

  • δθδθ′
  • θ= ˆ

θ 24/60

slide-32
SLIDE 32

A case control study

Simulation

β0 = −3.000 β1 = 0.100 β2 = 0.100 β3 = 0.100 Traditional MLE, n = 1000, Without Pooling (i.e. p = 1) Bias

  • 2.912
  • 0.002

0.002 0.000 RMSE 2.913 0.075 0.067 0.066 Power 0.069 0.307 0.324 0.334 MLE using the EM algorithm, n = 1000, p = 2, κ = 0.05 Bias 0.034 0.005

  • 0.002
  • 0.004

RMSE 0.051 0.079 0.065 0.091 Est SD 0.060 0.082 0.067 0.091 Obs SD 0.038 0.079 0.065 0.091 Power 1.000 0.193 0.337 0.202 MLE using the EM algorithm, n = 1000, p = 2, κ = 0.15 Bias

  • 1.175

0.003

  • 0.004
  • 0.004

RMSE 1.176 0.080 0.064 0.091 Est SD 0.064 0.082 0.067 0.093 Obs SD 0.037 0.080 0.064 0.091 Power 1.000 0.207 0.340 0.188 Traditional MLE, n = 500, Without Pooling (i.e. p = 1) Bias

  • 2.911
  • 0.002

0.000

  • 0.001

RMSE 2.912 0.104 0.094 0.093 Power 0.018 0.165 0.185 0.193

Table : γ = [1.0, 1.0, 0.02, 0.1, 0.5], and the true disease prevalence is 0.0516.

25/60

slide-33
SLIDE 33

Discussion

The EM algorithm is a useful tool when biospecimen have been measured in pools. Through simulation we showed that sampling half of the individuals compared to pooling pairs,

loses power in regards to the main effects, the power to test the interaction effect is approximately the same.

Future work:

Use Box-Cox transformation to allow more flexible distributions for the exposure. For pools larger than two individuals, use the Monte Carlo EM (MCEM) algorithm.

26/60 Chapter 1 Chapter 2 Chapter 3 Chapter 4 Acknowledgments

slide-34
SLIDE 34

Chapter 2: Gene-environment interactions

27/60 Chapter 1 Chapter 2 Chapter 3 Chapter 4 Acknowledgments

slide-35
SLIDE 35

Outline

Gene-environment interactions

Introduction Pooled GxE estimator The pooling process The pooled GxE estimator with measurement error Simulations Discussion

28/60

   1 2 p

slide-36
SLIDE 36

Introduction

Why are we interested in pooling for estimating GxE?

GxE are critical hypotheses in studying diseases with complex etiologies. Case-control interaction estimator is known to require a large sample size for reasonable power. Geneotyping a large number of individuals can be expensive and time consuming. Access to sufficient quantities of biospecimens from each individual may not be available.

29/60

slide-37
SLIDE 37

Pooled GxE estimator

Notation

D = binary disease status. E = binary environmental exposure. Ai = binary allele status, i ∈ {1, 2}, each individual has two copies of the allele. G = number of copies of the allele of interest, G = A1 + A2.

30/60

slide-38
SLIDE 38

Pooled GxE estimator

Our propose pooling strategy

D = 0 D = 1 E = 0 f00 f10 E = 1 f01 f11 Pool biospecimens within all levels of D and E. Perform R replicate measurements on each level (4R

  • verall measurements).

31/60

slide-39
SLIDE 39

Pooled GxE estimator

Model for D given A and E

Assume that the disease is related to the allele and environmental statuses by logit(Pr(D|A, E)) = α + βEE + βAA + βA×EAE. Therefore βA×E is the allele-environment interaction parameter.

32/60

slide-40
SLIDE 40

Pooled GxE estimator

Model for D given G and E

Assume independence between A1 and A2. The model for the disease given genotype and environmental status is logit(Pr(D|G, E)) = ˜ α + ˜ βEE + βAG + βA×EGE. The GxE parameter of interest, is equivalent to the allele-environment interaction (βG×E = βA×E).

33/60

slide-41
SLIDE 41

Pooled GxE estimator

The estimator!

The closed form MLE of βA×E is, ˜ βG×E(MLE) = ˜ γ11 − ˜ γ01 − ˜ γ10 + ˜ γ00 where ˜ γDE = log ˜

fDE 1−˜ fDE

  • , and ˜

fDE is the sample allele frequency.

34/60

slide-42
SLIDE 42

The pooling process

Three stages

1

Pool formation

2

PCR (polymerase chain reaction)

3

Estimation

35/60

slide-43
SLIDE 43

The pooling process

Estimation: as described in Germer et al. (2000)

  Frequency PCR cycle Threshold

t

C 

 

Each PCR amplification cycle doubles the frequency. The minor allele frequency can be estimated as, ˜ fDE = 1 1 + 2∆˜

CtDE ,

36/60

slide-44
SLIDE 44

The pooled GxE estimator with measurement error

The estimator and it’s estimated variance

Define ∆ˆ CtDE(k) to be the estimated cycle difference, ∆ˆ CtDE(k) = ∆˜ CtDE + ǫDE(k) The estimator for GxE is, ˆ βG×E = − log(2) R

R

  • k=1

∆ˆ Ct11(k)−∆ˆ Ct01(k)−∆ˆ Ct10(k)+∆ˆ Ct00(k) The variance of the GxE estimator is, ˆ var

  • ˆ

βG×E

  • =

1

  • D=0

1

  • E=0

1 R

R

  • k=1
  • 1

rDE(1 − ˆ fDE(k))ˆ fDE(k)

  • + 1

R (log(2))2s2

DE

37/60

slide-45
SLIDE 45

Simulations

Simulation parameters

We define Pr(A = 1) = 0.37, Pr(E = 1) = 0.25, and Pr(D|E, G) as follows, logit(Pr(D = 1|E, G)) = α + βEE + βGG + βG×EGE where α = −2.26, βE = 1.06, and βG = 0.18. We found ∆˜ CtDE and add measurement error using ∆ˆ CtDE(k) = ∆˜ CtDE + ǫDE(k), where ǫDE(k)

iid

∼ N(0, σ2

DE).

We defined R to be 3 and σ2

DE = 0.05.

38/60

slide-46
SLIDE 46

Simulations

Simulation results

βGXE Mean

  • ˆ

βGXE

  • Empirical

Mean

  • ˆ

var

  • ˆ

βGXE

  • Coverage

Variance Probability

  • f ˆ

βGXE (95% CI)

  • .5
  • 0.5017

0.0764 0.0759 0.9416

  • .3
  • 0.2996

0.0757 0.0757 0.9432 0.0023 0.0732 0.0756 0.9478 .3 0.2926 0.0744 0.0768 0.9508 .5 0.4915 0.0773 0.0785 0.9470

Table : Simulation results. The level for the empirical coverage probability is 0.05. The number of Monte Carlo replicates is 5000.

39/60

slide-47
SLIDE 47

Simulations

Simulation results

40/60

slide-48
SLIDE 48

Discussion

We proposed a powerful method to estimate a gene-environment interaction by pooling biospecimens. Benefits:

Pooling can greatly reduce the number of measurements. Pooling can save valuable biospecimens.

Limitations:

The disease and environment statuses, must be known before the pooling. We assume that A1 and A2 (i.e. the mother and father) are independent within disease and environmental status.

Although our method has limitations in its assumptions, it may be very useful in a pilot study.

41/60 Chapter 1 Chapter 2 Chapter 3 Chapter 4 Acknowledgments

slide-49
SLIDE 49

Chapter 4: Minkowski-Weyl priors for models with parameter constraints

42/60 Chapter 1 Chapter 2 Chapter 3 Chapter 4 Acknowledgments

slide-50
SLIDE 50

Outline

Minkowski-Weyl priors for models with parameter constraints

Introduction The Minkowski-Weyl Prior Prior Specification Posterior Computation and Simulation Analysis of the BioCycle Study Data Conclusions

43/60

slide-51
SLIDE 51

Introduction

Scientific Question

?

Hormonal Profiles Oxidative Stress

Associated

?

Profiles Stress

44/60

slide-52
SLIDE 52

Introduction

The BioCycle Study

Prospective cohort of 259 healthy menstruating women aged 18-44 from western New York. Women were provided with fertility monitors to schedule 8 clinic visits.

45/60

slide-53
SLIDE 53

The Minkowski-Weyl Prior

For illustration, let yi = µ + ǫi, ǫi

iid

∼ Np(0, Σ). Suppose there is prior knowledge that, M = {µ ∈ Rp : Aµ < b, µ > 0}. The likelihood is then given by, L(µ, Σ|y1, . . . , yn) =

n

  • i=1

Np(yi; µ, Σ)I(µ ∈ M). The covariance parameter Σ is taken to belong to C, the cone of all p × p positive definite matrices. Bayesian analysis would require one to specify a prior on the restricted set M × C.

46/60

slide-54
SLIDE 54

The Minkowski-Weyl Prior

Definition (Minkowski-Weyl) A set P in Rp is a convex polyhedral iff P = conv(v1, . . . , vk)

  • cone(u1, . . . , ul)

(1) for some vectors v1, . . . , vk, u1, . . . , ul. v1, . . . , vk denote the k extreme points of P. u1, . . . , ul denote the extreme directions of P. Any vector θ in P can be represented as θ = (α1v1 + · · · + αkvk) + (γ1u1 + · · · + γlul), where α = (α1, . . . , αk)′ belongs to the k-dimensional simplex and γ = (γ1, . . . , γl)′ is a vector of positive coefficients.

47/60

slide-55
SLIDE 55

The Minkowski-Weyl Prior

20

+ = ) , ( U V γ α γ α μ

16 18

⎥ ⎤ ⎢ ⎡ = 89 . 98 10 V

12 14

2

⎥ ⎦ ⎤ ⎢ ⎣ ⎡ = ⎥ ⎦ ⎢ ⎣ 5 1 5 . 89 . 8 U V

8 10

μ2

(98.89,8.89)

⎥ ⎦ ⎢ ⎣ 5 . 1

4 6 20 40 60 80 100 120 2

(10,0) (0.5,0.5) (0,1) (0 0)

20 40 60 80 100 120

μ1

(10,0) (0,0)

48/60

slide-56
SLIDE 56

Prior Specification

We invoke a prior distribution for µ via prior distributions on α and γ. We assume that α, γ and Σ are a priori mutually independent, therefore, p(α, γ, Σ) = p(MW)(πα, πγ)p(Σ), α ∼ Dir(M, a1, . . . , ak), γ ∼

l

  • i=1

G(g1i, g2i), Σ ∼ IW(Ψ, κ).

49/60

slide-57
SLIDE 57

Posterior Computation and Simulation

Consider the case where the parameters are as follows, µ = 100.1 10.5

  • , Σ =
  • 5

0.2 0.2 5

  • ,

where prior knowledge restricts the mean, µ, to M. We generate the outcomes, yi

iid

∼ N2(µ, Σ) for i = 1, 2, . . . , n, n = 250. We assumed the following prior distributions, α ∼ Dir(M, a1, . . . , ak), γ ∼

l

  • i=1

G(1, 4), Σ ∼ IW(10I2, 10).

50/60

slide-58
SLIDE 58

Posterior Computation and Simulation

10 20 30 40 50 60 70 80 90 100 110 5 10 15 20 25 30

µ1 µ2

(a) M, a = (2, 0.45, 0.05, 0.5)

99 99.2 99.4 99.6 99.8 100 100.2 100.4 100.6 100.8 101 9.5 10 10.5 11 11.5

µ1 µ2

(b) M, a = (2, 0.45, 0.05, 0.5)

Figure : On the left is the prior density of the means, µ, when the Dirichlet distribution with parameters (M, a) is the prior for α. On the right is samples from the posterior distribution.

51/60

slide-59
SLIDE 59

Posterior Computation and Simulation

10 20 30 40 50 60 70 80 90 100 110 5 10 15 20 25 30

µ1 µ2

(a) M, a = (1, 0.05, 0.05, 0.9)

99 99.2 99.4 99.6 99.8 100 100.2 100.4 100.6 100.8 101 9.5 10 10.5 11 11.5

µ1 µ2

(b) M, a = (1, 0.05, 0.05, 0.9)

10 20 30 40 50 60 70 80 90 100 110 5 10 15 20 25 30

µ1 µ2

(c) M, a = (1, 0.9, 0.05, 0.05)

99 99.2 99.4 99.6 99.8 100 100.2 100.4 100.6 100.8 101 9.5 10 10.5 11 11.5

µ1 µ2

(d) M, a = (1, 0.9, 0.05, 0.05)

52/60

slide-60
SLIDE 60

Analysis of the BioCycle Study Data

We assume the following linear mixed model for i = 1, . . . , n, n = 250, h ∈ {E, L, F, P}, and t ∈ {1, . . . , 8} , yiht = µht + w′

itφh + ξih + ζit + ǫiht,

    ξiE ξiL ξiF ξiP    

iid

∼ N(0, DH),     ζi1 ζi2 . . . ζi8    

iid

∼ N(0, DT ), ǫi

iid

∼ N(0, σ2

eI)

53/60

slide-61
SLIDE 61

Analysis of the BioCycle Study Data

Let µ = (µE1, . . . , µE8, . . . , µP1, . . . , µP8)′, We partition µ = [µC, µU]′, and µC = [µCE, µCL, µCP]′. Transform µC to the positive space by defining a maximum such that, µC < K, where K = 10 × 19. The following is the constraint space for µC∗, where µC = K − µC∗, M = {µC∗ : AhµCh∗ < bh, µCh∗ > 0, h ∈ {E, L, P}}. Applying Minkowski decomposition of each partition of M, we find the corresponding V h and Uh matrices.

54/60

slide-62
SLIDE 62

Analysis of the BioCycle Study Data

Time varying covariate effects on hormones covariate hormone mean 90% credible int. log(beta- estrogen 0.000

  • 0.037 , 0.036

carotene) FSH 0.000

  • 0.032 , 0.032

LH

  • 0.015
  • 0.052 , 0.023

progesterone 0.048 0.001 , 0.095 log(γ-vitamin) estrogen 0.021

  • 0.014 , 0.056

FSH 0.012

  • 0.018 , 0.043

LH

  • 0.014
  • 0.050 , 0.021

progesterone

  • 0.011
  • 0.053 , 0.032

cholesterol estrogen 0.027

  • 0.013 , 0.066

FSH

  • 0.033
  • 0.068 , 0.001

LH 0.003

  • 0.038 , 0.043

progesterone 0.042

  • 0.010 , 0.094

homocysteine estrogen

  • 0.049
  • 0.088 , -0.010

FSH 0.038 0.006 , 0.070 LH 0.015

  • 0.025 , 0.055

progesterone

  • 0.002
  • 0.060 , 0.056

log(F2- estrogen 0.038 0.002 , 0.074 isoprostane) FSH

  • 0.032
  • 0.064 , 0.000

LH

  • 0.013
  • 0.049 , 0.024

progesterone 0.072 0.027 , 0.116

55/60

slide-63
SLIDE 63

Conclusions

We defined a new class of priors that permit full Bayesian analysis for models with linear inequality constraints. Analyzing the BioCycle data using MW prior provides new insight to the association between menstrual hormone levels and agents for oxidative stress. We are interested in extending exploring prior distributions for models with nonlinear inequality constraints.

56/60 Chapter 1 Chapter 2 Chapter 3 Chapter 4 Acknowledgments

slide-64
SLIDE 64

Thank You

Thank You.

Funding: Intramural Research Training Award, Eunice Kennedy Shriver, National Institute of Child Health and Human Development

57/60

slide-65
SLIDE 65

Thank You

58/60

slide-66
SLIDE 66

Thank You

59/60

slide-67
SLIDE 67

Thank you for coming and listening! Questions?

60/60 Chapter 1 Chapter 2 Chapter 3 Chapter 4 Acknowledgments