SLIDE 1

Pleiotropy Latent Variable Model Bayesian Model Simulations

Bayesian Latent Variable Modelling of Longitudinal Family Data for Genetic Pleiotropy Studies

Radu Craiu

Department of Statistics, University of Toronto

Joint with Lei Sun and Lizhen Xu (Toronto)

McGill University, November 2013

SLIDE 2

Outline

◮ Pleiotropy
◮ Latent Variable Model
  ◮ Data and Statistical Model
  ◮ Statistical Complications
◮ Bayesian Model
  ◮ Computational Complications
  ◮ A Parameter Expanded Model for Added Efficiency
  ◮ Variable Selection
◮ Simulations
  ◮ Simulation Study
  ◮ Application: GAW 18
  ◮ Application: Type 1 Diabetes

SLIDE 3

Pleiotropy

◮ For many complex human diseases, the trait of interest ("state of disease") is not directly observable (e.g. diabetes, hypertension, cardiovascular disease).
◮ Instead we observe a set of surrogate phenotypes (physical manifestations of the disease), which may be continuous or discrete.
◮ These response variables (phenotypes or outcomes) are mutually correlated, as they measure the underlying trait from different perspectives.
◮ In order to increase statistical efficiency, it is desirable to model these outcomes jointly.
◮ Many studies also involve repeated measures over time in samples that include families/clusters ⇒ complex dependence structures in the data.
◮ Here we consider continuous and binary phenotypes.

SLIDE 4

The Data and Model

◮ Let $Y_{cit} = (Y_{cit}^{c\,T}, Y_{cit}^{b\,T})^T$ be the $J \times 1$ vector of responses (e.g. phenotypes) measured at the $t$th time on the $i$th individual from the $c$th family (or cluster), for $c = 1, 2, \ldots, C$, $i = 1, 2, \ldots, N_c$, $t = 1, 2, \ldots, B$, with responses indexed by $j = 1, 2, \ldots, J$. Here $C$ denotes the total number of families, $N_c$ is the number of individuals within the $c$th family, $B$ is the total number of repeated measurements and $J$ is the total number of responses.
◮ The dependence patterns are approximated via random effects.
◮ The trait of interest is included in the model as a latent variable $U_{cit}$.

SLIDE 5

Illustration of the Data Structure

[Figure: data structure for one cluster with two subjects observed at times t1 and t2. It shows the latent variables U11, U12, U21, U22, the responses Y_it1, Y_it2, ..., Y_itJ for each subject i and time t, the covariates X11, X12, X21, X22 and W11, W12, and links labelled "serial dependence" (over time) and "dependence" (within the cluster).]

SLIDE 6

The Statistical Model

◮ The latent variable model

$$U_{cit} = X_{cit}^T \alpha + Z_{cit}^T a_c + Q_{cit}^T d_{ci} + \epsilon_{cit}, \qquad (1)$$

where $\epsilon_{cit} \overset{\text{iid}}{\sim} N(0, \psi^2)$.
◮ $d_{ci} \in \mathbb{R}^{q_2 \times 1}$ represents the subject-specific random effects.
◮ $a_c$ are cluster-specific random effects.
◮ $a_c \overset{\text{iid}}{\sim} N_{q_1}(0, \Sigma_A)$, $d_{ci} \overset{\text{iid}}{\sim} N_{q_2}(0, \Sigma_D)$, and all random effects are independent of the $\epsilon_{cit}$.
◮ We are particularly interested in the regression coefficient for the SNP's genotype.
◮ Pleiotropy is detected if the SNP's genotype effect on $U$ is statistically significant.
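The structure of the latent variable model (1) can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: NumPy is used, the sizes `C`, `N`, `B` are arbitrary, every cluster has the same number of members, $Z_{cit}$ is taken to be an intercept plus one covariate, $Q_{cit} = 1$ (a subject-level random intercept), and the default parameter values are illustrative.

```python
import numpy as np

def simulate_latent(C=50, N=4, B=3, alpha=(-1.0, 1.0), Sigma_A=None,
                    sigma_D=0.1, psi=1.0, seed=0):
    """Draw U[c, i, t] = X'alpha + Z'a_c + Q'd_ci + eps (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    Sigma_A = 0.3 * np.eye(2) if Sigma_A is None else np.asarray(Sigma_A)
    # Fixed-effect covariates X_cit (assumed standard normal here)
    X = rng.normal(size=(C, N, B, alpha.size))
    # Cluster-level design Z_cit: intercept plus one covariate (assumption)
    Z = np.concatenate([np.ones((C, N, B, 1)),
                        rng.normal(size=(C, N, B, 1))], axis=-1)
    a = rng.multivariate_normal(np.zeros(2), Sigma_A, size=C)  # a_c, shape (C, 2)
    d = rng.normal(0.0, sigma_D, size=(C, N))                  # d_ci with Q_cit = 1
    eps = rng.normal(0.0, psi, size=(C, N, B))                 # eps_cit ~ N(0, psi^2)
    U = (X @ alpha
         + np.einsum("cntk,ck->cnt", Z, a)   # Z_cit' a_c, shared within cluster c
         + d[:, :, None]                     # shared across time for subject (c, i)
         + eps)
    return U, X, Z

U, X, Z = simulate_latent()
print(U.shape)  # (50, 4, 3)
```

Because `a` is shared by all members of a cluster and `d` by all time points of a subject, the draws of `U` are correlated within families and within subjects, which is the dependence the random effects are meant to approximate.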

SLIDE 7

The Statistical Model

◮ The continuous response model

$$y^c_{citj} = \beta_{0j} + b_{cij} + W_{cit}^T \beta_j + \lambda_j U_{cit} + e_{citj}, \qquad (2)$$

where $e_{citj} \overset{\text{iid}}{\sim} N(0, \sigma_j^2)$ and $W_{cit}$ is a $p_1$-dimensional vector of direct-effect covariates.
◮ The $\lambda$'s are the factor loadings that allow different contributions of the latent variable to each phenotype.
◮ The random component $b_{cij}$ captures the family-specific within-subject correlations over time. We assume $b_{cij} \overset{\text{iid}}{\sim} N(0, \tau_j^2)$, and $e_{citj}$ and $b_{cij}$ are mutually independent for $c = 1, \ldots, C$, $i = 1, \ldots, N_c$, $t = 1, \ldots, B$ and $j = 1, \ldots, J$.
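A short numerical sketch shows how the shared latent variable in model (2) induces correlation between phenotypes. It assumes NumPy, simplifies $U$ to a standard normal, and drops the random effect $b_{cij}$ and the covariate term $W^T\beta_j$; the loadings, intercepts and residual variances are borrowed from simulation scenario M1 purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 20000, 3                      # n stands in for all (c, i, t) triples
lam = np.array([2.0, 3.0, 4.0])      # factor loadings lambda_j
beta0 = np.array([5.0, 5.0, 5.0])    # intercepts beta_0j
sigma2 = np.array([3.0, 3.0, 3.0])   # residual variances sigma_j^2

U = rng.normal(size=n)               # latent trait, simplified to N(0, 1)
e = rng.normal(size=(n, J)) * np.sqrt(sigma2)
Y = beta0 + np.outer(U, lam) + e     # b_cij and W'beta_j omitted (assumption)

# Correlation implied by the model:
# lam_j * lam_k / sqrt((lam_j^2 + sigma_j^2) * (lam_k^2 + sigma_k^2))
var = lam**2 + sigma2
implied = np.outer(lam, lam) / np.sqrt(np.outer(var, var))
np.fill_diagonal(implied, 1.0)
empirical = np.corrcoef(Y, rowvar=False)
print(np.round(implied, 3))
print(np.round(empirical, 3))
```

Larger loadings give stronger correlation between the corresponding phenotypes, which is why modelling the outcomes jointly can recover information about $U$.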

SLIDE 8

Statistical Complications - Identifiability

◮ For any nonzero $K \in \mathbb{R}$,

$$y^c_{citj} = \beta_{0j} + W_{cit}^T \beta_j + \lambda_j K^{-1} K U_{cit} + b_{cij} + e_{citj}. \qquad (3)$$

◮ Without any restriction on $\lambda$ or the variance of $U_{cit}$, an infinite number of equivalent models can be created.
◮ We assume that:
  ◮ The variance of $U_{cit}$ is equal to 1 and $\lambda_j$ is non-negative.
  ◮ The direct-effect covariates ($W_{cit}$) and the indirect-effect covariates ($X_{cit}$, $Z_{cit}$, $Q_{cit}$) are distinct.
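The scale and sign ambiguity behind equation (3) is easy to check numerically. The sketch below assumes NumPy and uses arbitrary illustrative values for the loading and for $K$; it only demonstrates that the pairs $(\lambda_j, U)$ and $(\lambda_j / K, K U)$ give the same contribution to the response.

```python
import numpy as np

rng = np.random.default_rng(2)
U = rng.normal(size=1000)            # stand-in for the latent variable
lam = 2.0                            # illustrative factor loading

same = []
for K in (5.0, 0.1, -1.0):           # any nonzero K, including a sign flip
    contribution = lam * U
    rescaled = (lam / K) * (K * U)   # loading lam/K paired with latent K*U
    same.append(np.allclose(contribution, rescaled))
print(same)

# Fixing Var(U) = 1 removes the scale ambiguity (|K| must be 1);
# requiring lam >= 0 removes the remaining sign ambiguity (K = -1).
```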

SLIDE 9

Statistical Complications - Effect of Ignoring Cluster Correlation

◮ Individuals from the same family are genetically related, resulting in correlation between their latent disease status.
◮ If familial dependence is ignored, inference is biased.
◮ Consider the case of continuous-only phenotypes and no repeated measurements.

SLIDE 10

Statistical Complications - Effect of Ignoring Cluster Correlation

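◮ Model 1 (correct):

$$y_{cij} = \beta_{0j} + W_{ci}^T \beta_j + \lambda_j U_{ci} + e_{cij}, \qquad U_{ci} = X_{ci}^T \alpha + Z_{ci}^T a_c + \epsilon_{ci},$$

where $e_{cij} \sim N(0, \sigma_j^2)$, $\epsilon_{ci} \sim N(0, 1)$, $\lambda_j > 0$ and $a_c \sim N(0, \Sigma_A)$.
◮ Model 2 (misspecified):

$$y_{hj} = \beta_{0j} + W_h^T \beta_j + \tilde{\lambda}_j \tilde{U}_h + e_{hj}, \qquad \tilde{U}_h = X_h^T \tilde{\alpha} + \epsilon_h.$$

◮ We can show that

$$\tilde{\lambda}_j^2 = (Z_{ci}^T \Sigma_A Z_{ci} + 1)\, \lambda_j^2 \quad \text{and} \quad \tilde{\alpha} = \frac{\lambda_j}{\tilde{\lambda}_j}\, \alpha = \frac{1}{\sqrt{Z_{ci}^T \Sigma_A Z_{ci} + 1}}\, \alpha < \alpha.$$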
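The size of the attenuation caused by ignoring the cluster effect can be computed directly: under Model 1 the latent variable has conditional variance $Z^T \Sigma_A Z + 1$, so forcing unit variance rescales $\alpha$ by $1/\sqrt{Z^T \Sigma_A Z + 1}$. The sketch below assumes NumPy; the design vector `Z` is hypothetical and $\Sigma_A = 0.3 I_2$ is borrowed from the simulation scenarios for illustration.

```python
import numpy as np

def attenuation(Z, Sigma_A):
    """Return 1 / sqrt(Z' Sigma_A Z + 1), the shrinkage applied to alpha."""
    Z = np.asarray(Z, dtype=float)
    return 1.0 / np.sqrt(Z @ Sigma_A @ Z + 1.0)

Sigma_A = 0.3 * np.eye(2)
Z = np.array([1.0, 1.0])             # hypothetical design vector
shrink = attenuation(Z, Sigma_A)
print(round(shrink, 4))              # 0.7906

# Monte Carlo check that Var(Z'a_c + eps) = Z' Sigma_A Z + 1 = 1.6
rng = np.random.default_rng(3)
a = rng.multivariate_normal(np.zeros(2), Sigma_A, size=200_000)
eps = rng.normal(size=200_000)
mc_var = np.var(a @ Z + eps)
print(mc_var)                        # close to 1.6
```

With these illustrative values, the coefficient estimated under the misspecified model would be about 79% of the magnitude of the true $\alpha$.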

SLIDE 11

Bayesian Model

◮ We consider a Bayesian framework for inference.
◮ If conditional conjugate priors are defined for the model parameters $\Theta$, then a standard Gibbs (SG) sampler can be used to analyze the posterior distribution.
◮ The implementation requires introducing the random effects as latent variables/missing data. The set of all latent variables is denoted $\Omega$.

SLIDE 12

Computational Complications: Torpid Mixing

◮ Due to high dependence between the components of the Markov chain corresponding to the parameter vector $\Theta$ and the latent data vector $\Omega$, the chain mixes very slowly.
◮ For instance, a small variance $\tau_j^2$ leads to small random effects $b_{cij}$ and vice versa. Similar patterns develop between the factor loadings $\lambda_j$ and the latent variable $U$.
◮ These dependencies lead to computational inefficiency.
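Slow mixing is usually quantified through the autocorrelation function (ACF) and the effective sample size (ESS) of the chain. The sketch below assumes NumPy and uses a stationary AR(1) process as a stand-in for sampler output, not the actual Gibbs sampler; the truncation rule in `effective_sample_size` is one simple choice among several in use.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation of a 1-D chain at lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def effective_sample_size(x, max_lag=200):
    """ESS = n / (1 + 2 * sum of autocorrelations), with the sum truncated
    at the first non-positive lag."""
    rho = autocorr(x, max_lag)[1:]
    cut = np.argmax(rho <= 0) if np.any(rho <= 0) else len(rho)
    return len(x) / (1.0 + 2.0 * rho[:cut].sum())

def ar1_chain(phi, n, rng):
    """Stationary AR(1) chain with lag-1 correlation phi."""
    x = np.empty(n)
    x[0] = rng.normal() / np.sqrt(1 - phi**2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

rng = np.random.default_rng(4)
slow = ar1_chain(0.95, 20000, rng)   # strongly autocorrelated draws
fast = ar1_chain(0.30, 20000, rng)   # weakly autocorrelated draws
ess_slow = effective_sample_size(slow)
ess_fast = effective_sample_size(fast)
print(round(ess_slow), round(ess_fast))
```

Both chains have 20000 draws, but the strongly autocorrelated one carries far fewer effectively independent draws, which is the inefficiency that parameter expansion targets.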

SLIDE 13

Parameter Expansion for Increased Computational Efficiency

◮ Parameter Expansion/Auxiliary Variable methods have a long tradition in MCMC (Besag and Green, JRSSB '93; Higdon, JASA '98; Liu and Wu, JASA '99; van Dyk and Meng, JCGS '01).
◮ These methods aim at eliminating "bottlenecks" in simulation experiments by expanding the parameter space or by introducing "missing" data/latent variables in the model.
◮ However, the parameter expansion guidelines need to be modified/adapted for each model.

SLIDE 14

A Parameter Expanded Model - Continuous Outcomes

◮ The original model is

$$y^c_{citj} = \beta_{0j} + b_{cij} + W_{cit}^T \beta_j + \lambda_j U_{cit} + e_{citj},$$

$$U_{cit} = X_{cit}^T \alpha + Z_{cit}^T a_c + Q_{cit}^T d_{ci} + \epsilon_{cit}.$$

SLIDE 15

A Parameter Expanded Model - Continuous Outcomes

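◮ Introduce auxiliary parameters $\{\xi_j : 1 \le j \le J\}$ and $\psi$ and reparametrise the model.
◮ Transformed model:

$$y^c_{citj} = \xi_j \frac{\beta_{0j}}{\xi_j} + \xi_j \frac{b_{cij}}{\xi_j} + W_{cit}^T \beta_j + \frac{\lambda_j}{\psi}\, \psi U_{cit} + e_{citj},$$

$$\psi U_{cit} = X_{cit}^T\, \psi\alpha + Z_{cit}^T\, \psi a_c + Q_{cit}^T\, \psi d_{ci} + \psi \epsilon_{cit}.$$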

SLIDE 16

A Parameter Expanded Model - Continuous Outcomes

◮ Transformed model:

$$y^c_{citj} = \xi_j b^*_{cij} + W_{cit}^T \beta_j + \lambda^*_j U^*_{cit} + e_{citj},$$

$$U^*_{cit} = X_{cit}^T \alpha^* + Z_{cit}^T a^*_c + Q_{cit}^T d^*_{ci} + \epsilon^*_{cit}.$$

SLIDE 17

A Parameter Expanded Model - Continuous Outcomes

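◮ $b^*_{cij} \sim N(\beta^*_{0j}, \tau^{*2}_j)$, $a^*_c \overset{\text{iid}}{\sim} N_{q_1}(0, \Sigma^*_A)$, $d^*_{ci} \overset{\text{iid}}{\sim} N_{q_2}(0, \Sigma^*_D)$, and $\epsilon^*_{cit} \overset{\text{iid}}{\sim} N(0, \psi^2)$.
◮ The conditional conjugate priors assigned to $\theta^* = (\alpha^*, \lambda^*, \ldots, \psi)$ impose particular priors on $\theta = (\alpha, \lambda, \ldots)$.
◮ The parametrization is redundant and the algorithm is not efficient on the expanded state space, but it gains efficiency for the original set of parameters!
◮ The original parameters are recovered using

$$\alpha = \alpha^*/\psi, \quad U_{cit} = U^*_{cit}/\psi, \quad \Sigma_A = \Sigma^*_A/\psi^2, \quad \Sigma_D = \Sigma^*_D/\psi^2, \quad \lambda_j = \lambda^*_j \psi, \quad \beta_{0j} = \beta^*_{0j} \xi_j, \quad \tau_j^2 = \xi_j^2 \tau^{*2}_j,$$

for all $1 \le j \le J$.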
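In practice the recovery step is applied to every posterior draw of the expanded parameters. The following sketch, assuming NumPy, simply transcribes the recovery formulas; the function name `recover_original`, the `_s` suffix (for "star") and the numerical values of the example draw are all hypothetical.

```python
import numpy as np

def recover_original(alpha_s, lam_s, Sigma_A_s, Sigma_D_s,
                     beta0_s, tau2_s, xi, psi):
    """Map one draw of the expanded parameters back to the original
    parametrization, following the recovery formulas."""
    xi = np.asarray(xi, dtype=float)
    return {
        "alpha": np.asarray(alpha_s, dtype=float) / psi,
        "lam": np.asarray(lam_s, dtype=float) * psi,
        "Sigma_A": np.asarray(Sigma_A_s, dtype=float) / psi**2,
        "Sigma_D": np.asarray(Sigma_D_s, dtype=float) / psi**2,
        "beta0": np.asarray(beta0_s, dtype=float) * xi,
        "tau2": xi**2 * np.asarray(tau2_s, dtype=float),
    }

# Hypothetical draw from the expanded model (J = 2 phenotypes)
psi, xi = 2.0, np.array([0.5, 2.0])
draw = recover_original(alpha_s=[-2.0, 2.0], lam_s=[1.0, 1.5],
                        Sigma_A_s=1.2 * np.eye(2), Sigma_D_s=[[0.04]],
                        beta0_s=[10.0, 2.5], tau2_s=[1.2, 0.125],
                        xi=xi, psi=psi)
print(draw["alpha"], draw["lam"], draw["beta0"], draw["tau2"])

# The product lambda_j * U is unchanged by the transformation:
U_s = np.array([0.4, -1.0])          # hypothetical draws of U*
invariant = np.allclose(np.outer(draw["lam"], U_s / psi),
                        np.outer([1.0, 1.5], U_s))
```

Only the recovered quantities are summarised for inference; the expanded parameters themselves are not identifiable.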

SLIDE 18

A Parameter Expanded Model - Mixed Outcomes

◮ When the traits are mixed, denote by $\{y^c_{citj} : 1 \le j \le J_1\}$ the continuous outcomes and by $\{y^b_{citj} : J_1 + 1 \le j \le J\}$ the binary ones.
◮ The probit model is expanded using the latent variables $\tilde{y}^b_{citj}$ so that $y^b_{citj} = 1_{(0,\infty)}(\tilde{y}^b_{citj})$.
◮ The continuous response models are expanded as before.
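A common way to update latent probit variables of this kind inside a Gibbs sampler is to draw each one from a unit-variance normal truncated to $(0, \infty)$ when the binary response is 1 and to $(-\infty, 0]$ otherwise. The slides do not spell out this step, so the sketch below is an assumption about how it could be done, using only the Python standard library; `draw_latent` and its arguments are hypothetical names, and `mean` stands for the linear predictor.

```python
import random
from statistics import NormalDist

_STD = NormalDist()

def draw_latent(y, mean, rng=random):
    """Draw ytilde ~ N(mean, 1) truncated to (0, inf) if y == 1, else to
    (-inf, 0], by inverting the normal CDF on the allowed probability range.
    Adequate for moderate |mean|; extreme tails need a dedicated sampler."""
    p0 = _STD.cdf(-mean)                 # P(ytilde <= 0)
    lo, hi = (p0, 1.0) if y == 1 else (0.0, p0)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1 - 1e-12)    # keep inv_cdf away from 0 and 1
    return mean + _STD.inv_cdf(u)

rng = random.Random(5)
draws1 = [draw_latent(1, -0.4, rng) for _ in range(1000)]
draws0 = [draw_latent(0, -0.4, rng) for _ in range(1000)]
print(min(draws1) > 0, max(draws0) <= 0)
```

Every draw is consistent with its observed binary response, so conditional on the latent values the binary phenotypes can be handled like continuous ones.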

SLIDE 19

A Parameter Expanded Model - Mixed Outcomes

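◮ Define $\tilde{y}^b_{citj}$ such that $y^b_{citj} = 1_{(0,\infty)}(\tilde{y}^b_{citj})$.
◮ Start with the usual latent model for probit regression:

$$\tilde{y}^b_{citj} = \beta_{0j} + b_{cij} + W_{cit}^T \beta_j + \lambda_j U_{cit} + \epsilon.$$

◮ Use auxiliary parameters $\{\gamma_j : J_1 + 1 \le j \le J\}$:

$$\gamma_j \tilde{y}^b_{citj} = \gamma_j \xi_j \frac{\beta_{0j}}{\xi_j} + \gamma_j \xi_j \frac{b_{cij}}{\xi_j} + W_{cit}^T \gamma_j \beta_j + \gamma_j \lambda_j U_{cit} + \gamma_j \epsilon.$$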

SLIDE 20

A Parameter Expanded Model - Mixed Outcomes

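◮ And finally work with

$$\tilde{y}^{b*}_{citj} = \tilde{\xi}_j b^*_{cij} + W_{cit}^T \tilde{\beta}_j + \tilde{\lambda}_j U_{cit} + \tilde{\epsilon}.$$

◮ The original parameters are recovered via $\beta_j = \tilde{\beta}_j/\gamma_j$, $\lambda_j = \tilde{\lambda}_j/\gamma_j$ and $\xi_j = \tilde{\xi}_j/\gamma_j$, for all $J_1 + 1 \le j \le J$.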

SLIDE 21

Variable Selection

◮ Of primary interest is the effect of a genetic marker on the latent variable:

$$U_{cit} = X_{cit}^T \alpha + Z_{cit}^T a_c + Q_{cit}^T d_{ci} + \epsilon_{cit}.$$

◮ Of secondary interest is to determine whether the $j$th phenotype is indeed related to the latent disease status (i.e. whether $\lambda_j = 0$ or not):

$$y^c_{citj} = \beta_{0j} + b_{cij} + W_{cit}^T \beta_j + \lambda_j U_{cit} + e_{citj}.$$

SLIDE 22

Variable Selection

◮ We can use a spike-and-slab prior for $\lambda^*_j$,

$$p(\lambda^*_j \mid \omega_j) = \omega_j 1_{\{0\}}(\lambda^*_j) + (1 - \omega_j)\, TN_+(\lambda^*_j \mid 0, 1)$$

and $p(\omega_j) = \text{Beta}(a, b)$. The relevance of the $j$th phenotype is based on $P(\lambda_j > 0 \mid Y)$.
◮ We can consider comparing two models (almost identical, but one imposes $\lambda_j = 0$) via the Bayes factor.
◮ Compare the two models via the Deviance Information Criterion (DIC).
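To make the spike-and-slab prior concrete, the sketch below draws from it and estimates the prior probability that a loading is non-zero. It assumes NumPy, reads $TN_+(0, 1)$ as a standard normal truncated to positive values (a half-normal), and uses the hypothetical hyperparameters $a = b = 1$.

```python
import numpy as np

def sample_spike_slab(a, b, size, rng):
    """Draw lambda* from omega * delta_0 + (1 - omega) * TN_+(0, 1),
    with omega ~ Beta(a, b) drawn separately for each sample."""
    omega = rng.beta(a, b, size=size)
    at_zero = rng.random(size) < omega        # spike at 0 with probability omega
    slab = np.abs(rng.normal(size=size))      # half-normal draw for the slab
    return np.where(at_zero, 0.0, slab)

rng = np.random.default_rng(6)
lam = sample_spike_slab(a=1.0, b=1.0, size=100_000, rng=rng)
prior_nonzero = np.mean(lam > 0)
print(prior_nonzero)     # near b / (a + b) = 0.5 for a = b = 1
```

The same summary, the fraction of draws with a strictly positive loading, applied to posterior rather than prior draws, gives an estimate of $P(\lambda_j > 0 \mid Y)$.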

SLIDE 23

Simulation Scenarios

M1: All responses are continuous. We let $J = 3$ and set $\beta_0 = (5, 5, 5)$, $\beta_{11} = \beta_{12} = \beta_{13} = 1$, $\alpha_1 = -1$, $\alpha_2 = 1$, $\lambda = (2, 3, 4)$, $\tau^2 = (0.3, 0.5, 1)$, $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = 3$, $\sigma_D = 0.1$ and $\Sigma_A = 0.3 I_2$.

M2: We set $J = 4$ and simulate $y_1, y_2$ as continuous and $y_3, y_4$ as binary responses. We set $\beta_0 = (5, 3, 1, 1)$, $\beta_{1j} = 1$ for all $j = 1, \ldots, 4$, $\alpha_1 = -1$, $\alpha_2 = -1$, $\lambda = (2, 3, 2, 2)$, $\tau^2 = (0.6, 0.6, 0.6, 0.6)$, $\sigma_1^2 = \sigma_2^2 = 1$, $\sigma_D = 0.1$ and $\Sigma_A = 0.3 I_2$.

M3: The simulation model parameters were chosen to mimic the hypertension data example. We set $J = 3$ with $y_1$ and $y_2$ continuous and $y_3$ binary. The parameter values were: $\beta_0 = (100, 60, 1.0)$, $\beta_{1j} = 1$ for all $j = 1, \ldots, 3$, $\alpha_1 = -1$, $\alpha_2 = -1$, $\lambda = (6, 4, 0.5)$, $\tau^2 = (10, 7, 0.2)$, $\sigma_1^2 = 25$, $\sigma_2^2 = 20$, $\sigma_D = 0.1$ and $\Sigma_A = 0.3 I_2$.

SLIDE 24

Trace plots for M2: λ1

[Figure: two trace plots of λ1 under M2, iterations 5000 to 25000. Panel 1: "M2: Trace plot for SG sample of λ1" (vertical range about 1.80 to 2.00). Panel 2: "M2: Trace plot for PX2-HC sample of λ1" (vertical range about 1.85 to 2.10). Horizontal axis: iterations; vertical axis: λ1.]

SLIDE 25

ACF plots for M2: λ1 − λ4

[Figure: four panels, "M2: ACF plots for SG and PX2-HC sample of λ1", and likewise for λ2, λ3 and λ4. Each panel overlays the SG and PX2-HC autocorrelation functions. Horizontal axis: lag (10 to 60); vertical axis: ACF (0.0 to 1.0).]

SLIDE 26

ACF plots for M3: λ1 − λ3

[Figure: three panels, "M3: ACF plots for SG and PX2-HC sample of λ1", and likewise for λ2 and λ3. Each panel overlays the SG and PX2-HC autocorrelation functions. Horizontal axis: lag (10 to 60); vertical axis: ACF (0.0 to 1.0).]

SLIDE 27

M1: Simulation Results

Parameter   True value   RMSE reduction (%)   ESS increase (%)
α1          -1.0         43                   578
α2          1.0          14                   66
λ1          2.0          49                   1100
λ2          3.0          62                   933
λ3          4.0          62                   1019
(ΣA)11      0.3          46                   41
(ΣA)22      0.3          58                   80
σD          0.1          76                   90

SLIDE 28

M2: Simulation Results

Parameter   True value   RMSE reduction (%)   ESS increase (%)
α1          -1.0         45                   1088
α2          1.0          14                   52
λ1          2.0          63                   2179
λ2          3.0          62                   1897
λ3          2.0          7                    79
(ΣA)11      0.3          39                   57
(ΣA)22      0.3          54                   146
σD          0.1          73                   95

SLIDE 29

M3: Simulation Results

Parameter   True value   RMSE reduction (%)   ESS increase (%)
β01         100          69                   72
β02         60.0         65                   88
β03         1.0          31                   139
α1          -1.0         57                   269
α2          1.0          30                   80
λ1          6.0          72                   258
λ2          4.0          66                   401
λ3          0.5          24                   178
(ΣA)11      0.3          44                   33
(ΣA)22      0.3          67                   73
(ΣD)11      0.1          85                   87

SLIDE 30

Identification of Pleiotropic Genetic Marker

◮ We simulate five indirect fixed-effect covariates $X_1, \ldots, X_5$.
◮ We assume that $X_3$ is the genotype of the marker of interest and the remaining ones are standardized continuous variables.
◮ The coefficients $(\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5)$ quantify the effects of the five covariates and are set to $(1.0, -0.5, 0.2, 0, 0)$.

SLIDE 31

Identification of Pleiotropic Genetic Marker via BF

Table: Estimated logBF for testing whether $\alpha_j = 0$, where $\alpha_j$ quantifies the association between the covariate $X_j$ and the latent variable.

Covariates with effect on the latent variable:

                  X1       X2      X3 (genotype)   X4      X5
True α            1.0      -0.5    0.2             0       0
Estimated logBF   337.53   94.99   6.30            -1.94   -1.95
SD                33.25    11.19   2.39            0.46    0.42

SLIDE 32

Selection of Relevant Phenotypes via BF

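Table: Estimated logBF for testing the factor loading $\lambda_j$ that quantifies the association between phenotype $y_j$ and the latent variable. The phenotypes $y_1, \ldots, y_7$ are listed with the continuous phenotypes first, followed by the binary phenotypes.

Phenotype         y1       y2     y3      y4      y5      y6      y7
Estimated logBF   121.60   5.91   -1.76   -2.67   22.11   -2.43   -2.43
SD                6.47     2.22   1.05    0.47    8.19    0.32    0.13

True λj, in the order extracted: 0.5, 0.05, 0.02, 0.2, 0.01 (the remaining two entries and the alignment with $y_1, \ldots, y_7$ are not recoverable).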

SLIDE 33

GAW18: Genetic study of Hypertension

◮ Data included genotypes from a real human whole genome sequencing study (N = 483 individuals) and systolic and diastolic blood pressure phenotypes, plus age, sex, medication use and cigarette smoking.
◮ The data were longitudinal, with three measurements for most participants at roughly 5-year intervals.
◮ Among the 464 individuals, 396 have at least one blood pressure measurement (90 have only one, 78 have two, 131 have three and 97 have four measurements).
◮ The length of time between two consecutive measurements ranges from 3 to 9 years, and the number of family members varies from 11 to 36.

SLIDE 34

GAW18: Genetic study of Hypertension

◮ We focused on a set of six SNPs that had been reported to be significantly associated with either DBP or the binary hypertension trait.
◮ We applied the Bayesian LVM method to analyze one SNP at a time, assuming an additive genetic model.
◮ The phenotypes are SBP and DBP, and the covariates include the genotype of the SNP, age and sex.

SLIDE 35

GAW18: Results for SNP rs9816772

◮ rs9816772 had been identified to be associated with DBP.

Parameter          Estimate   logBF    95% HpdI
SBP λ1             13.15      255.3    (12.19, 14.11)
DBP λ2             7.60       139.6    (7.01, 8.14)
Sex for SBP β11    -0.66      -0.074   (-2.12, 0.81)
Sex for DBP β21    -1.79      2.017    (-2.92, -0.65)
rs9816772 α1       -0.045     -0.653   (-0.208, 0.124)
Age α2             0.043      126.53   (0.036, 0.049)

SLIDE 36

Genetic study of type 1 diabetes (T1D) complications.

◮ The study sample consists of n = 1300 individuals with T1D from the Diabetes Control and Complications Trial (DCCT).
◮ Various phenotypes are thought to be related to T1D complication severity, including glycosylated hemoglobin (HbA1c), diastolic blood pressure (DBP) and systolic blood pressure (SBP). We define hyperglycaemia as HPG = 1(HbA1c > 8).
◮ Previous studies have identified rs7842868 on chromosome 8 as a SNP significantly associated with DBP.
◮ Our goal here is to formally perform a multi-phenotype analysis, jointly analyzing the measured manifest variables using the proposed Bayesian LVM methodology. This approach allows us not only to determine whether rs7842868 is associated with the latent conceptual T1D complication variable, but also to test whether DBP and SBP are truly related to the LV.

SLIDE 37

Genetic study of type 1 diabetes (T1D) complications

Analysis of SNP rs7842868

Parameter        Estimate   95% HpdI                         Estimated logBF
SBP λ1           6.621      (6.153, 7.077)                   114.85
DBP λ2           3.842      (3.566, 4.110)                   112.98
HPG λ3           0.011      (2.19 × 10^-7, 2.98 × 10^-2)     -1.05
rs7842868 α1     -0.269     (-0.372, -0.164)                 10.06
sex α2           -0.721     (-0.866, -0.584)                 62.27
cohort α3        0.443      (0.299, 0.585)                   20.15
treatment α4     0.128      (-0.004, 0.263)                  0.366

SLIDE 38

References

◮ Liu, J. S. and Wu, Y. N. (1999), "Parameter Expansion for Data Augmentation," Journal of the American Statistical Association, 94, 1264–1274.
◮ Hobert, J. P. and Marchev, D. (2008), "A Theoretical Comparison of the Data Augmentation, Marginal Augmentation and PX-DA Algorithms," Annals of Statistics, 36, 532–554.
◮ Ghosh, J. and Dunson, D. B. (2009), "Default Prior Distributions and Efficient Posterior Computation in Bayesian Factor Analysis," Journal of Computational and Graphical Statistics, 18(2), 306–320.
◮ Xu, L., Craiu, R. V., Derkach, A., Paterson, A. and Sun, L. (2013), "Using a Bayesian Latent Variable Approach to Detect Pleiotropy in the Genetic Analysis Workshop 18 Data," BMC Proceedings.
◮ Xu, L., Craiu, R. V., Sun, L. and Paterson, A. (2012), "Parameter Expanded Algorithms for Bayesian Latent Variable Modeling of Genetic Pleiotropy Data," submitted to Journal of Computational and Graphical Statistics.