Pleiotropy Latent Variable Model Bayesian Model Simulations
Bayesian Latent Variable Modelling of Longitudinal Family Data for Genetic Pleiotropy Studies
Radu Craiu, Department of Statistics, University of Toronto
Joint with Lei Sun and
Outline
◮ Pleiotropy
◮ Latent Variable Model
  ◮ Data and Statistical Model
  ◮ Statistical Complications
◮ Bayesian Model
  ◮ Computational Complications
  ◮ A Parameter Expanded Model for Added Efficiency
  ◮ Variable Selection
◮ Simulations
  ◮ Simulation Study
  ◮ Application: GAW 18
  ◮ Application: Type 1 Diabetes
Pleiotropy
◮ For many complex human diseases (e.g. diabetes, hypertension, cardiovascular disease), the trait of interest (the "state of disease") is not directly observable.
◮ Instead, we observe a set of surrogate phenotypes (physical manifestations of the disease), which may be continuous or discrete.
◮ These response variables (phenotypes or outcomes) are mutually correlated because they measure the same underlying trait from different perspectives.
◮ To increase statistical efficiency, it is desirable to model these outcomes jointly.
◮ Many studies also involve repeated measures over time in samples that include families/clusters ⇒ complex dependence structures in the data.
◮ We consider here continuous and binary phenotypes.
The Data and Model
◮ Let $\mathbf{Y}_{cit} = (\mathbf{Y}^{c\top}_{cit}, \mathbf{Y}^{b\top}_{cit})^\top$ be the $J \times 1$ vector of responses (e.g. phenotypes), with continuous part $\mathbf{Y}^c_{cit}$ and binary part $\mathbf{Y}^b_{cit}$, measured at the $t$th time on the $i$th individual from the $c$th family (or cluster), for $c = 1, \dots, C$, $i = 1, \dots, N_c$, $t = 1, \dots, B$; its components are indexed by $j = 1, \dots, J$. Here $C$ is the total number of families, $N_c$ the number of individuals in the $c$th family, $B$ the total number of repeated measurements and $J$ the total number of responses.
◮ The dependence patterns are approximated via random effects.
◮ The trait of interest is included in the model as a latent variable $U_{cit}$.
Illustration of the Data Structure
[Figure: schematic of the data structure. Latent variables U are linked over time within an individual (serial dependence) and across individuals of the same cluster (cluster dependence); each U generates the J responses Y observed at that time point, and covariates W and X are attached to the observations.]
The Statistical Model
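◮ The latent variable model
$$U_{cit} = \mathbf{X}_{cit}^\top\boldsymbol\alpha + \mathbf{Z}_{cit}^\top\mathbf{a}_c + \mathbf{Q}_{cit}^\top\mathbf{d}_{ci} + \epsilon_{cit}, \qquad (1)$$
where $\epsilon_{cit} \overset{\text{iid}}{\sim} N(0, \psi^2)$.
◮ $\mathbf{d}_{ci} \in \mathbb{R}^{q_2 \times 1}$ represents the subject-specific random effects.
◮ $\mathbf{a}_c$ are cluster-specific random effects.
◮ $\mathbf{a}_c \overset{\text{iid}}{\sim} N_{q_1}(\mathbf{0}, \Sigma_A)$, $\mathbf{d}_{ci} \overset{\text{iid}}{\sim} N_{q_2}(\mathbf{0}, \Sigma_D)$, and all random effects are independent of the $\epsilon_{cit}$.
◮ We are particularly interested in the regression coefficient for the SNP's genotype.
◮ Pleiotropy is detected if the SNP's genotype effect on $U$ is statistically significant.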
The Statistical Model
◮ The continuous response model
$$y^c_{citj} = \beta_{0j} + b_{cij} + \mathbf{W}_{cit}^\top\boldsymbol\beta_j + \lambda_j U_{cit} + e_{citj}, \qquad (2)$$
where $e_{citj} \overset{\text{iid}}{\sim} N(0, \sigma_j^2)$ and $\mathbf{W}_{cit}$ is a $p_1$-dimensional vector of direct-effect covariates.
◮ The $\lambda$'s are the factor loadings, which allow the latent variable to contribute differently to each phenotype.
◮ The random component $b_{cij}$, specific to individual $i$ in family $c$ and shared across time points, captures within-subject correlation over time. We assume $b_{cij} \overset{\text{iid}}{\sim} N(0, \tau_j^2)$, and $e_{citj}$ and $b_{cij}$ are mutually independent for $c = 1, \dots, C$, $i = 1, \dots, N_c$, $t = 1, \dots, B$ and $j = 1, \dots, J$.
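To make the indexing concrete, here is a minimal NumPy sketch that simulates continuous outcomes from models (1) and (2). It rests on simplifying assumptions that are not from the slides: scalar random effects with $Z_{cit} = Q_{cit} = 1$, standard normal covariates, a single direct-effect covariate, and illustrative defaults that only loosely echo scenario M1. The function name `simulate_continuous` is hypothetical.

```python
import numpy as np

def simulate_continuous(C=50, N=4, B=3, alpha=(-1.0, 1.0), beta0=(5.0, 5.0, 5.0),
                        beta1=(1.0, 1.0, 1.0), lam=(2.0, 3.0, 4.0),
                        tau2=(0.3, 0.5, 1.0), sigma2=(3.0, 3.0, 3.0),
                        sigma_a=0.5, sigma_d=0.1, psi=1.0, seed=0):
    """Simulate y[c, i, t, j] and U[c, i, t] from models (1)-(2), scalar random effects."""
    rng = np.random.default_rng(seed)
    alpha, beta0, beta1, lam = map(np.asarray, (alpha, beta0, beta1, lam))
    J = beta0.size
    X = rng.normal(size=(C, N, B, alpha.size))       # indirect-effect covariates X_cit
    W = rng.normal(size=(C, N, B, 1))                 # one direct-effect covariate W_cit
    a = rng.normal(0.0, sigma_a, size=(C, 1, 1))      # cluster effect a_c (Z_cit = 1)
    d = rng.normal(0.0, sigma_d, size=(C, N, 1))      # subject effect d_ci (Q_cit = 1)
    eps = rng.normal(0.0, psi, size=(C, N, B))        # epsilon_cit ~ N(0, psi^2)
    U = X @ alpha + a + d + eps                       # latent trait, equation (1)
    b = rng.normal(0.0, np.sqrt(tau2), size=(C, N, J))        # b_cij, same for every t
    e = rng.normal(0.0, np.sqrt(sigma2), size=(C, N, B, J))   # e_citj
    y = beta0 + b[:, :, None, :] + W * beta1 + lam * U[..., None] + e  # equation (2)
    return y, U

y, U = simulate_continuous()
print(y.shape, U.shape)  # (50, 4, 3, 3) (50, 4, 3)
```

Because $b_{cij}$ is drawn once per individual and broadcast over $t$, repeated measures of the same phenotype share it, which is what induces the serial correlation.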
Statistical Complications - Identifiability
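◮ For any non-zero $K \in \mathbb{R}$,
$$y^c_{citj} = \beta_{0j} + \mathbf{W}_{cit}^\top\boldsymbol\beta_j + (\lambda_j K^{-1})(K U_{cit}) + b_{cij} + e_{citj}, \qquad (3)$$
so replacing $\lambda_j$ by $\lambda_j/K$ and $U_{cit}$ by $K U_{cit}$ leaves the model unchanged (a negative $K$ even flips the sign of $\lambda_j$).
◮ Without any restriction on $\lambda$ or on the variance of $U_{cit}$, infinitely many equivalent models can be constructed.
◮ We assume that:
  ◮ the variance of $U_{cit}$ is equal to 1 and $\lambda_j$ is non-negative;
  ◮ the direct-effect covariates ($\mathbf{W}_{cit}$) and the indirect-effect covariates ($\mathbf{X}_{cit}, \mathbf{Z}_{cit}, \mathbf{Q}_{cit}$) are distinct.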
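Equation (3) says that only the product $\lambda_j U_{cit}$ reaches the data. A quick numerical check of that statement, using arbitrary illustrative values for $\lambda_j$ and $K$:

```python
import numpy as np

rng = np.random.default_rng(1)
U = rng.normal(size=1000)             # latent variable values U_cit
lam, K = 2.5, -4.0                    # a loading lambda_j and an arbitrary non-zero K

mean_original = lam * U               # contribution lambda_j * U_cit
lam_new, U_new = lam / K, K * U       # (lambda_j K^{-1}) and (K U_cit), as in (3)
mean_rescaled = lam_new * U_new

print(np.allclose(mean_original, mean_rescaled))  # True: the data cannot tell them apart
print(lam_new < 0)                                # True: K < 0 flips the sign of lambda_j
print(np.var(U_new) / np.var(U))                  # K**2 up to rounding: Var(U) is not pinned down
```

Fixing the variance of $U_{cit}$ removes the scale freedom, and requiring $\lambda_j \ge 0$ removes the remaining sign flip.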
Statistical Complications - Effect of Ignoring Cluster Correlation
◮ Individuals from the same family are genetically related, which induces correlation between their latent disease statuses.
◮ If familial dependence is ignored, inference is biased.
◮ Consider the case of continuous-only phenotypes and no repeated measurements.
Statistical Complications - Effect of Ignoring Cluster Correlation
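◮ Model 1 (correct):
$$y_{cij} = \beta_{0j} + \mathbf{W}_{ci}^\top\boldsymbol\beta_j + \lambda_j U_{ci} + e_{cij}, \qquad U_{ci} = \mathbf{X}_{ci}^\top\boldsymbol\alpha + \mathbf{Z}_{ci}^\top\mathbf{a}_c + \epsilon_{ci},$$
where $e_{cij} \sim N(0, \sigma_j^2)$, $\epsilon_{ci} \sim N(0, 1)$, $\lambda_j > 0$ and $\mathbf{a}_c \sim N(\mathbf{0}, \Sigma_A)$.
◮ Model 2 (misspecified; individuals indexed by $h$, family structure ignored):
$$y_{hj} = \beta_{0j} + \mathbf{W}_h^\top\boldsymbol\beta_j + \tilde\lambda_j \tilde U_h + e_{hj}, \qquad \tilde U_h = \mathbf{X}_h^\top\tilde{\boldsymbol\alpha} + \epsilon_h,$$
with $\epsilon_h \sim N(0, 1)$ as in Model 1.
◮ Marginally, the Model 1 error $\mathbf{Z}_{ci}^\top\mathbf{a}_c + \epsilon_{ci}$ has variance $\mathbf{Z}_{ci}^\top\Sigma_A\mathbf{Z}_{ci} + 1$, which Model 2 forces back to 1. We can show that
$$\tilde\lambda_j^2 = (\mathbf{Z}_{ci}^\top\Sigma_A\mathbf{Z}_{ci} + 1)\,\lambda_j^2 \quad\text{and}\quad \tilde{\boldsymbol\alpha} = \frac{\lambda_j}{\tilde\lambda_j}\,\boldsymbol\alpha = \frac{1}{\sqrt{\mathbf{Z}_{ci}^\top\Sigma_A\mathbf{Z}_{ci} + 1}}\,\boldsymbol\alpha,$$
so $|\tilde\alpha_k| < |\alpha_k|$ whenever $\mathbf{Z}_{ci}^\top\Sigma_A\mathbf{Z}_{ci} > 0$: ignoring the family structure attenuates the covariate (e.g. genotype) effects toward zero.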
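A quick Monte Carlo check of the variance argument, assuming a scalar random effect with $Z_{ci} = 1$ (so $\mathbf{Z}_{ci}^\top\Sigma_A\mathbf{Z}_{ci} = \sigma_a^2$, set to an illustrative 0.3) and one standard normal covariate. This sketch does not fit Model 2; it only measures the error variance that Model 2 must absorb and applies the implied rescaling to $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(2)
C, N = 2000, 5                       # families and members per family
alpha, sigma_a2 = 1.0, 0.3           # covariate effect and Var(a_c); scalar Z_ci = 1 assumed
x = rng.normal(size=(C, N))                             # covariate X_ci
a = rng.normal(0.0, np.sqrt(sigma_a2), size=(C, 1))     # family effect a_c shared by members
U = alpha * x + a + rng.normal(size=(C, N))             # Model 1 with eps_ci ~ N(0, 1)

# Model 2 has no a_c, so its error term must absorb a_c + eps_ci
resid_var = np.var(U - alpha * x)    # approximately Z' Sigma_A Z + 1 = 1.3
# Forcing that error variance back to 1 divides U (and hence alpha) by its square root
alpha_tilde = alpha / np.sqrt(resid_var)
print(alpha_tilde, alpha / np.sqrt(sigma_a2 + 1.0))     # both close to 0.877
```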
Bayesian Model
◮ We consider a Bayesian framework for inference.
◮ If conditionally conjugate priors are specified for the model parameters $\Theta$, then a standard Gibbs (SG) sampler can be used to explore the posterior distribution.
◮ The implementation requires treating the random effects as latent variables/missing data. The set of all latent variables is denoted $\Omega$.
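With conjugate priors every full conditional is a standard distribution, and the random effects are simply updated as latent data alongside the parameters. A minimal sketch of such a standard Gibbs sampler for a much simpler one-way random-effects model (not the full LVM), $y_{ik} = \mu + b_i + e_{ik}$; the flat prior on $\mu$ and the inverse-gamma priors on the variances are assumptions chosen for conjugacy, and `gibbs_one_way` is a hypothetical name.

```python
import numpy as np

def gibbs_one_way(y, n_iter=2000, prior_shape=0.01, prior_rate=0.01, seed=0):
    """Standard Gibbs sampler for y[i, k] = mu + b[i] + e[i, k] (illustrative sketch).

    b[i] ~ N(0, tau2), e[i, k] ~ N(0, sigma2); flat prior on mu and
    inverse-gamma(prior_shape, prior_rate) priors on tau2 and sigma2.
    The random effects b are sampled as latent data (the analogue of Omega).
    """
    rng = np.random.default_rng(seed)
    m, n = y.shape
    mu, tau2, sigma2 = y.mean(), 1.0, 1.0
    draws = np.empty((n_iter, 3))
    for it in range(n_iter):
        # b_i | rest: normal, combining n observations with the N(0, tau2) prior
        prec = n / sigma2 + 1.0 / tau2
        b = rng.normal((n / sigma2) * (y.mean(axis=1) - mu) / prec, np.sqrt(1.0 / prec))
        # mu | rest: normal under the flat prior
        mu = rng.normal((y - b[:, None]).mean(), np.sqrt(sigma2 / (m * n)))
        # tau2 | b and sigma2 | rest: inverse-gamma, drawn as 1 / gamma(shape, scale=1/rate)
        tau2 = 1.0 / rng.gamma(prior_shape + m / 2, 1.0 / (prior_rate + 0.5 * np.sum(b**2)))
        resid = y - mu - b[:, None]
        sigma2 = 1.0 / rng.gamma(prior_shape + m * n / 2,
                                 1.0 / (prior_rate + 0.5 * np.sum(resid**2)))
        draws[it] = mu, tau2, sigma2
    return draws

# Illustrative data: 30 groups of 10, mu = 5, tau2 = 1, sigma2 = 0.5
rng = np.random.default_rng(10)
y = 5.0 + rng.normal(0.0, 1.0, size=(30, 1)) + rng.normal(0.0, np.sqrt(0.5), size=(30, 10))
draws = gibbs_one_way(y)
print(draws[500:].mean(axis=0))  # posterior means of (mu, tau2, sigma2) after burn-in
```

In the LVM the same pattern applies with many more blocks ($\mathbf{a}_c$, $\mathbf{d}_{ci}$, $b_{cij}$, $U_{cit}$ and the parameters), which is where the dependence between blocks starts to matter.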
Computational Complications: Torpid Mixing
◮ Because of the strong dependence between the components of the Markov chain corresponding to the parameter vector $\Theta$ and those corresponding to the latent data vector $\Omega$, the chain mixes very slowly.
◮ For instance, a small variance $\tau_j^2$ leads to small random effects $b_{cij}$, and vice versa. Similar patterns develop between the factor loadings $\lambda_j$ and the latent variable $U$.
◮ These dependencies lead to computational inefficiency.
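Slow mixing shows up as high autocorrelation in the chain and hence a small effective sample size (ESS), the quantity whose percentage increase is reported in the simulation results. A minimal sketch, assuming the simple estimator $n / (1 + 2\sum_k \rho_k)$ truncated at the first non-positive autocorrelation (one of several estimators in use; the slides do not say which one was used), with AR(1) series standing in for a sticky and a well-mixing chain.

```python
import numpy as np

def acf(x, max_lag=60):
    """Sample autocorrelations of a chain at lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

def ess(x, max_lag=1000):
    """Effective sample size n / (1 + 2 * sum of autocorrelations), truncating the
    sum at the first non-positive autocorrelation (one simple, common rule)."""
    rho = acf(x, min(max_lag, len(x) - 1))[1:]
    cut = int(np.argmax(rho <= 0)) if np.any(rho <= 0) else len(rho)
    return len(x) / (1.0 + 2.0 * rho[:cut].sum())

def ar1(phi, n, rng):
    """AR(1) chain: a stand-in for MCMC output with lag-1 autocorrelation phi."""
    z = np.zeros(n)
    for t in range(1, n):
        z[t] = phi * z[t - 1] + rng.normal()
    return z

rng = np.random.default_rng(3)
sticky, mixing = ar1(0.95, 20000, rng), ar1(0.2, 20000, rng)
# For AR(1), theory gives ESS = n (1 - phi) / (1 + phi): about 513 and 13333
print(round(ess(sticky)), round(ess(mixing)))
```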
Parameter Expansion for Increased Computational Efficiency
◮ Parameter expansion/auxiliary variable methods have a long tradition in MCMC (Besag and Green, JRSSB ’93; Higdon, JASA ’98; Liu and Wu, JASA ’99; van Dyk and Meng, JCGS ’01).
◮ These methods aim to eliminate "bottlenecks" in simulation by expanding the parameter space or by introducing "missing" data/latent variables into the model.
◮ However, the parameter expansion guidelines must be modified/adapted for each model.
A Parameter Expanded Model - Continuous Outcomes
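◮ The original model is
$$y^c_{citj} = \beta_{0j} + b_{cij} + \mathbf{W}_{cit}^\top\boldsymbol\beta_j + \lambda_j U_{cit} + e_{citj},$$
$$U_{cit} = \mathbf{X}_{cit}^\top\boldsymbol\alpha + \mathbf{Z}_{cit}^\top\mathbf{a}_c + \mathbf{Q}_{cit}^\top\mathbf{d}_{ci} + \epsilon_{cit}.$$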
A Parameter Expanded Model - Continuous Outcomes
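◮ Introduce auxiliary parameters $\{\xi_j : 1 \le j \le J\}$ and $\psi$, and reparametrise the model.
◮ Transformed model:
$$y^c_{citj} = \xi_j\frac{\beta_{0j}}{\xi_j} + \xi_j\frac{b_{cij}}{\xi_j} + \mathbf{W}_{cit}^\top\boldsymbol\beta_j + \frac{\lambda_j}{\psi}\,(\psi U_{cit}) + e_{citj},$$
$$\psi U_{cit} = \mathbf{X}_{cit}^\top(\psi\boldsymbol\alpha) + \mathbf{Z}_{cit}^\top(\psi\mathbf{a}_c) + \mathbf{Q}_{cit}^\top(\psi\mathbf{d}_{ci}) + \psi\epsilon_{cit}.$$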
A Parameter Expanded Model - Continuous Outcomes
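◮ Transformed model:
$$y^c_{citj} = \xi_j b^*_{cij} + \mathbf{W}_{cit}^\top\boldsymbol\beta_j + \lambda^*_j U^*_{cit} + e_{citj},$$
$$U^*_{cit} = \mathbf{X}_{cit}^\top\boldsymbol\alpha^* + \mathbf{Z}_{cit}^\top\mathbf{a}^*_c + \mathbf{Q}_{cit}^\top\mathbf{d}^*_{ci} + \epsilon^*_{cit}.$$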
A Parameter Expanded Model - Continuous Outcomes
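◮ $b^*_{cij} \sim N(\beta^*_{0j}, \tau^{*2}_j)$, $\mathbf{a}^*_c \overset{\text{iid}}{\sim} N_{q_1}(\mathbf{0}, \Sigma^*_A)$, $\mathbf{d}^*_{ci} \overset{\text{iid}}{\sim} N_{q_2}(\mathbf{0}, \Sigma^*_D)$, and $\epsilon^*_{cit} \overset{\text{iid}}{\sim} N(0, \psi^2)$.
◮ The conditionally conjugate priors assigned to $\theta^* = (\boldsymbol\alpha^*, \boldsymbol\lambda^*, \dots, \psi)$ induce particular priors on $\theta = (\boldsymbol\alpha, \boldsymbol\lambda, \dots)$.
◮ The parametrization is redundant and the algorithm is not efficient on the expanded state space, but it gains efficiency for the original set of parameters!
◮ The original parameters are recovered using
$$\boldsymbol\alpha = \boldsymbol\alpha^*/\psi, \quad U_{cit} = U^*_{cit}/\psi, \quad \Sigma_A = \Sigma^*_A/\psi^2, \quad \Sigma_D = \Sigma^*_D/\psi^2,$$
$$\lambda_j = \lambda^*_j\psi, \quad \beta_{0j} = \beta^*_{0j}\xi_j, \quad \tau^2_j = \xi^2_j\tau^{*2}_j,$$
for all $1 \le j \le J$.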
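The recovery step is a deterministic map applied to every retained draw. A minimal sketch of that map with a round-trip check; the function and argument names are hypothetical, and the numerical values are illustrative (they echo scenario M1 where possible, with arbitrary working values for $\psi$ and $\xi_j$).

```python
import numpy as np

def recover_original(alpha_s, U_s, SigmaA_s, SigmaD_s, lam_s, beta0_s, tau2_s, psi, xi):
    """Map one draw on the expanded scale back to the original parameters."""
    return {
        "alpha": alpha_s / psi,        # alpha = alpha* / psi
        "U": U_s / psi,                # U_cit = U*_cit / psi
        "SigmaA": SigmaA_s / psi**2,   # Sigma_A = Sigma*_A / psi^2
        "SigmaD": SigmaD_s / psi**2,   # Sigma_D = Sigma*_D / psi^2
        "lam": lam_s * psi,            # lambda_j = lambda*_j psi
        "beta0": beta0_s * xi,         # beta_0j = beta*_0j xi_j
        "tau2": xi**2 * tau2_s,        # tau_j^2 = xi_j^2 tau*_j^2
    }

# Round trip: push original values onto the expanded scale, then recover them
alpha, lam = np.array([-1.0, 1.0]), np.array([2.0, 3.0, 4.0])
beta0, tau2 = np.array([5.0, 5.0, 5.0]), np.array([0.3, 0.5, 1.0])
SigmaA, SigmaD, U = 0.3 * np.eye(2), np.array([[0.01]]), 0.8
psi, xi = 1.7, np.array([0.5, 2.0, 1.3])       # arbitrary working values of the auxiliaries
expanded = dict(alpha_s=psi * alpha, U_s=psi * U, SigmaA_s=psi**2 * SigmaA,
                SigmaD_s=psi**2 * SigmaD, lam_s=lam / psi, beta0_s=beta0 / xi,
                tau2_s=tau2 / xi**2)
back = recover_original(**expanded, psi=psi, xi=xi)
truth = dict(alpha=alpha, U=U, SigmaA=SigmaA, SigmaD=SigmaD, lam=lam, beta0=beta0, tau2=tau2)
print(all(np.allclose(back[k], v) for k, v in truth.items()))  # True
```

The product $\lambda^*_j U^*_{cit}$ equals $\lambda_j U_{cit}$ for every $\psi$, which is why $\psi$ can move freely in the sampler without changing the fit.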
A Parameter Expanded Model - Mixed Outcomes
◮ When the traits are mixed, denote by $\{y^c_{citj} : 1 \le j \le J_1\}$ the continuous outcomes and by $\{y^b_{citj} : J_1 + 1 \le j \le J\}$ the binary ones.
◮ The probit model is expanded using latent variables $\tilde y^b_{citj}$ such that $y^b_{citj} = \mathbf{1}_{(0,\infty)}(\tilde y^b_{citj})$.
◮ The continuous response models are expanded as before.
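Given everything else, each latent $\tilde y^b_{citj}$ is normal with unit variance, truncated to the side of zero indicated by the observed binary outcome; this is the standard Albert and Chib data-augmentation step for probit models. A minimal stdlib-only sketch, assuming the conditional mean ($\beta_{0j} + b_{cij} + \mathbf{W}_{cit}^\top\boldsymbol\beta_j + \lambda_j U_{cit}$) has already been computed; `draw_latent_probit` is a hypothetical helper name.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal: provides cdf and inv_cdf

def draw_latent_probit(y, mean, rng=random):
    """Draw y_tilde ~ N(mean, 1) restricted to (0, inf) if y == 1 and (-inf, 0] if y == 0,
    by inverting the normal CDF on the allowed probability range
    (numerically adequate for moderate |mean|)."""
    p0 = STD_NORMAL.cdf(-mean)                  # P(y_tilde <= 0)
    lo, hi = (p0, 1.0) if y == 1 else (0.0, p0)
    u = min(max(rng.uniform(lo, hi), 1e-12), 1.0 - 1e-12)   # inv_cdf needs 0 < u < 1
    return mean + STD_NORMAL.inv_cdf(u)

gen = random.Random(5)
ones = [draw_latent_probit(1, 0.3, gen) for _ in range(2000)]
zeros = [draw_latent_probit(0, 0.3, gen) for _ in range(2000)]
print(min(ones) > 0, max(zeros) <= 0)   # True True
```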
A Parameter Expanded Model - Mixed Outcomes
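◮ Define $\tilde y^b_{citj}$ such that $y^b_{citj} = \mathbf{1}_{(0,\infty)}(\tilde y^b_{citj})$.
◮ Start with the usual latent model for probit regression
$$\tilde y^b_{citj} = \beta_{0j} + b_{cij} + \mathbf{W}_{cit}^\top\boldsymbol\beta_j + \lambda_j U_{cit} + \epsilon.$$
◮ Use auxiliary parameters $\{\gamma_j : J_1 + 1 \le j \le J\}$:
$$\gamma_j\tilde y^b_{citj} = \gamma_j\xi_j\frac{\beta_{0j}}{\xi_j} + \gamma_j\xi_j\frac{b_{cij}}{\xi_j} + \mathbf{W}_{cit}^\top\gamma_j\boldsymbol\beta_j + \gamma_j\lambda_j U_{cit} + \gamma_j\epsilon.$$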
A Parameter Expanded Model - Mixed Outcomes
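◮ And finally work with
$$\tilde y^{b*}_{citj} = \tilde\xi_j b^*_{cij} + \mathbf{W}_{cit}^\top\tilde{\boldsymbol\beta}_j + \tilde\lambda_j U_{cit} + \tilde\epsilon.$$
◮ The original parameters are recovered via $\boldsymbol\beta_j = \tilde{\boldsymbol\beta}_j/\gamma_j$, $\lambda_j = \tilde\lambda_j/\gamma_j$ and $\xi_j = \tilde\xi_j/\gamma_j$, for all $J_1 + 1 \le j \le J$.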
Variable Selection
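◮ Of primary interest is the effect of a genetic marker on the latent variable:
$$U_{cit} = \mathbf{X}_{cit}^\top\boldsymbol\alpha + \mathbf{Z}_{cit}^\top\mathbf{a}_c + \mathbf{Q}_{cit}^\top\mathbf{d}_{ci} + \epsilon_{cit}.$$
◮ Of secondary interest is determining whether the $j$th phenotype is indeed related to the latent disease status (i.e. whether $\lambda_j = 0$):
$$y^c_{citj} = \beta_{0j} + b_{cij} + \mathbf{W}_{cit}^\top\boldsymbol\beta_j + \lambda_j U_{cit} + e_{citj}.$$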
Variable Selection
◮ We can use a spike-and-slab prior for $\lambda^*_j$,
$$p(\lambda^*_j \mid \omega_j) = \omega_j\,\mathbf{1}_{\{0\}}(\lambda^*_j) + (1 - \omega_j)\,\mathrm{TN}_+(\lambda^*_j \mid 0, 1),$$
with $p(\omega_j) = \mathrm{Beta}(a, b)$. The relevance of the $j$th phenotype is assessed through $P(\lambda_j > 0 \mid Y)$.
◮ Alternatively, we can compare two models (identical except that one imposes $\lambda_j = 0$) via a Bayes factor.
◮ The two models can also be compared via the Deviance Information Criterion (DIC).
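A minimal NumPy sketch of the spike-and-slab prior above and of the Monte Carlo estimate of an inclusion probability. The helper names are hypothetical; $a = b = 1$ is an illustrative choice. Applied to posterior draws of $\lambda_j$ from the sampler (rather than to the prior draws used here), the same estimator gives $P(\lambda_j > 0 \mid Y)$.

```python
import numpy as np

def sample_spike_slab_prior(n, a=1.0, b=1.0, seed=0):
    """Draws of lambda*_j from omega * 1{0} + (1 - omega) * TN+(0, 1), omega ~ Beta(a, b)."""
    rng = np.random.default_rng(seed)
    omega = rng.beta(a, b, size=n)              # spike weight omega_j
    in_spike = rng.uniform(size=n) < omega      # point mass at zero with probability omega_j
    slab = np.abs(rng.normal(size=n))           # TN+(0, 1) is the half-normal distribution
    return np.where(in_spike, 0.0, slab)

def inclusion_probability(lam_draws):
    """Monte Carlo estimate of P(lambda_j > 0) from draws of lambda_j."""
    return float(np.mean(np.asarray(lam_draws) > 0))

prior_draws = sample_spike_slab_prior(100_000)
# Under the prior, P(lambda*_j > 0) = E[1 - omega_j] = b / (a + b) = 0.5 here
print(inclusion_probability(prior_draws))
```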
Simulation Scenarios
M1 All responses are continuous. We let $J = 3$ and set $\beta_0 = (5, 5, 5)$, $\beta_{11} = \beta_{12} = \beta_{13} = 1$, $\alpha_1 = -1$, $\alpha_2 = 1$, $\lambda = (2, 3, 4)$, $\tau^2 = (0.3, 0.5, 1)$, $\sigma^2_1 = \sigma^2_2 = \sigma^2_3 = 3$, $\sigma_D = 0.1$ and $\Sigma_A = 0.3 I_2$.
M2 We set $J = 4$ and simulate continuous responses $y_1, y_2$ and binary responses $y_3, y_4$. We set $\beta_0 = (5, 3, 1, 1)$, $\beta_{1j} = 1$ for all $j = 1, \dots, 4$, $\alpha_1 = -1$, $\alpha_2 = -1$, $\lambda = (2, 3, 2, 2)$, $\tau^2 = (0.6, 0.6, 0.6, 0.6)$, $\sigma^2_1 = \sigma^2_2 = 1$, $\sigma_D = 0.1$ and $\Sigma_A = 0.3 I_2$.
M3 The simulation model parameters were chosen to mimic the hypertension data example. We set $J = 3$, with $y_1$ and $y_2$ continuous and $y_3$ binary. The parameter values were $\beta_0 = (100, 60, 1.0)$, $\beta_{1j} = 1$ for all $j = 1, \dots, 3$, $\alpha_1 = -1$, $\alpha_2 = -1$, $\lambda = (6, 4, 0.5)$, $\tau^2 = (10, 7, 0.2)$, $\sigma^2_1 = 25$, $\sigma^2_2 = 20$, $\sigma_D = 0.1$ and $\Sigma_A = 0.3 I_2$.
Trace plots for M2: λ1
[Figure: trace plots of $\lambda_1$ against iteration (up to about 25,000 iterations) for scenario M2, one panel for the SG sample and one for the PX2-HC sample.]
ACF plots for M2: λ1 to λ4
[Figure: autocorrelation functions (lags up to 60) of the SG and PX2-HC samples of $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ for scenario M2, one panel per loading.]
ACF plots for M3: λ1 − λ3
[Figure: autocorrelation functions (lags up to 60) of the SG and PX2-HC samples of $\lambda_1$, $\lambda_2$ and $\lambda_3$ for scenario M3, one panel per loading.]
M1: Simulation Results
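| Parameter | True value | RMSE reduction (%) | ESS increase (%) |
|---|---|---|---|
| $\alpha_1$ | −1.0 | 43 | 578 |
| $\alpha_2$ | 1.0 | 14 | 66 |
| $\lambda_1$ | 2.0 | 49 | 1100 |
| $\lambda_2$ | 3.0 | 62 | 933 |
| $\lambda_3$ | 4.0 | 62 | 1019 |
| $(\Sigma_A)_{11}$ | 0.3 | 46 | 41 |
| $(\Sigma_A)_{22}$ | 0.3 | 58 | 80 |
| $\sigma_D$ | 0.1 | 76 | 90 |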
M2: Simulation Results
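| Parameter | True value | RMSE reduction (%) | ESS increase (%) |
|---|---|---|---|
| $\alpha_1$ | −1.0 | 45 | 1088 |
| $\alpha_2$ | 1.0 | 14 | 52 |
| $\lambda_1$ | 2.0 | 63 | 2179 |
| $\lambda_2$ | 3.0 | 62 | 1897 |
| $\lambda_3$ | 2.0 | 7 | 79 |
| $(\Sigma_A)_{11}$ | 0.3 | 39 | 57 |
| $(\Sigma_A)_{22}$ | 0.3 | 54 | 146 |
| $\sigma_D$ | 0.1 | 73 | 95 |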
M3: Simulation Results
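| Parameter | True value | RMSE reduction (%) | ESS increase (%) |
|---|---|---|---|
| $\beta_{01}$ | 100 | 69 | 72 |
| $\beta_{02}$ | 60.0 | 65 | 88 |
| $\beta_{03}$ | 1.0 | 31 | 139 |
| $\alpha_1$ | −1.0 | 57 | 269 |
| $\alpha_2$ | 1.0 | 30 | 80 |
| $\lambda_1$ | 6.0 | 72 | 258 |
| $\lambda_2$ | 4.0 | 66 | 401 |
| $\lambda_3$ | 0.5 | 24 | 178 |
| $(\Sigma_A)_{11}$ | 0.3 | 44 | 33 |
| $(\Sigma_A)_{22}$ | 0.3 | 67 | 73 |
| $(\Sigma_D)_{11}$ | 0.1 | 85 | 87 |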
Identification of Pleiotropic Genetic Marker
◮ We simulate five indirect fixed-effect covariates $X_1, \dots, X_5$.
◮ We assume that $X_3$ is the genotype of the marker of interest and that the remaining ones are standardized continuous variables.
◮ The coefficients $(\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5)$, which quantify the effects of the five covariates, are set to $(1.0, -0.5, 0.2, 0, 0)$.
Identification of Pleiotropic Genetic Marker via BF
Table: The estimated log BF for testing whether $\alpha = 0$, where $\alpha$ quantifies the association between covariate $X_j$ and the latent variable.

| Covariates with effect on the latent variable | $X_1$ | $X_2$ | $X_3$ (genotype) | $X_4$ | $X_5$ |
|---|---|---|---|---|---|
| True $\alpha$ | 1.0 | −0.5 | 0.2 | 0 | 0 |
| log BF | 337.53 | 94.99 | 6.30 | −1.94 | −1.95 |
| SD | 33.25 | 11.19 | 2.39 | 0.46 | 0.42 |
Selection of Relevant Phenotypes via BF
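Table: The estimated log BF for testing the factor loading $\lambda_j$, which quantifies the association between phenotype $y_j$ and the latent variable. The phenotypes $y_1, \dots, y_7$ comprise continuous and binary phenotypes. True $\lambda_j$ values: 0.5, 0.05, 0.02, 0.2, 0.01.

| | $y_1$ | $y_2$ | $y_3$ | $y_4$ | $y_5$ | $y_6$ | $y_7$ |
|---|---|---|---|---|---|---|---|
| log BF | 121.60 | 5.91 | −1.76 | −2.67 | 22.11 | −2.43 | −2.43 |
| SD | 6.47 | 2.22 | 1.05 | 0.47 | 8.19 | 0.32 | 0.13 |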
GAW18: Genetic study of Hypertension
◮ The data included genotypes from a real human whole-genome sequencing study (N = 483 individuals), systolic and diastolic blood pressure (SBP, DBP) phenotypes, and age, sex, medication use and cigarette smoking.
◮ The data were longitudinal, with three measurements for most participants at roughly 5-year intervals.
◮ Among the 464 individuals, 396 have at least one blood pressure measurement (90 have only one, 78 have two, 131 have three and 97 have four measurements).
◮ The time between two consecutive measurements ranges from 3 to 9 years, and family size ranges from 11 to 36 members.
GAW18: Genetic study of Hypertension
◮ We focused on a set of six SNPs that had been reported to be significantly associated with either DBP or the binary hypertension trait.
◮ We applied the Bayesian LVM method to analyze one SNP at a time, assuming an additive genetic model.
◮ The phenotypes are SBP and DBP, and the covariates include the genotype of the SNP, age and sex.
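Under an additive genetic model the SNP enters $\mathbf{X}_{cit}$ as the number of copies of one chosen allele, so its coefficient in $\boldsymbol\alpha$ is the change in the latent variable per additional copy. A minimal sketch of that coding; the two-character genotype strings and the choice of coded allele are hypothetical input conventions.

```python
def additive_code(genotype, coded_allele):
    """Additive genetic model: count copies of the coded allele (0, 1 or 2)."""
    return sum(allele == coded_allele for allele in genotype)

codes = [additive_code(g, "G") for g in ("AA", "AG", "GG")]
print(codes)  # [0, 1, 2]
```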
GAW18: Results for SNP rs9816772
◮ rs9816772 had previously been identified as associated with DBP.

| Parameter | Estimate | log BF | 95% HPD interval |
|---|---|---|---|
| SBP $\lambda_1$ | 13.15 | 255.3 | (12.19, 14.11) |
| DBP $\lambda_2$ | 7.60 | 139.6 | (7.01, 8.14) |
| Sex for SBP $\beta_{11}$ | −0.66 | −0.074 | (−2.12, 0.81) |
| Sex for DBP $\beta_{21}$ | −1.79 | 2.017 | (−2.92, −0.65) |
| rs9816772 $\alpha_1$ | −0.045 | −0.653 | (−0.208, 0.124) |
| Age $\alpha_2$ | 0.043 | 126.53 | (0.036, 0.049) |
Genetic study of type 1 diabetes (T1D) complications.
◮ The study sample consists of n = 1300 individuals with T1D from the Diabetes Control and Complications Trial (DCCT).
◮ Various phenotypes are thought to be related to T1D complication severity, including glycosylated hemoglobin (HbA1c) and diastolic (DBP) and systolic (SBP) blood pressure. We define hyperglycaemia as HPG = 1(HbA1c > 8).
◮ Previous studies have identified rs7842868 on chromosome 8 as a SNP significantly associated with DBP.
◮ Our goal here is to formally perform a multi-phenotype analysis, jointly analyzing the measured manifest variables with the proposed Bayesian LVM methodology. This approach allows us not only to determine whether rs7842868 is associated with the latent T1D complication variable, but also to test whether DBP and SBP are truly related to the latent variable (LV).
Genetic study of type 1 diabetes (T1D) complications
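Analysis of SNP rs7842868

| Parameter | Estimate | 95% HPD interval | log BF |
|---|---|---|---|
| SBP $\lambda_1$ | 6.621 | (6.153, 7.077) | 114.85 |
| DBP $\lambda_2$ | 3.842 | (3.566, 4.110) | 112.98 |
| HPG $\lambda_3$ | 0.011 | ($2.19 \times 10^{-7}$, $2.98 \times 10^{-2}$) | −1.05 |
| rs7842868 $\alpha_1$ | −0.269 | (−0.372, −0.164) | 10.06 |
| sex $\alpha_2$ | −0.721 | (−0.866, −0.584) | 62.27 |
| cohort $\alpha_3$ | 0.443 | (0.299, 0.585) | 20.15 |
| treatment $\alpha_4$ | 0.128 | (−0.004, 0.263) | 0.366 |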