SLIDE 1
Marianne Jonker
VU University Amsterdam, The Netherlands.
A FRAILTY MODEL FOR CENSORED FAMILY SURVIVAL DATA,
applied to the age at onset of mental problems
Joint work with: D.I Boomsma
SLIDE 2 Motivation
Observations from the Netherlands Twin Register: For a study on health and personality, longitudinal data were collected
- f twins and their sibs at different time points between 1991 and 2002.
At every timepoint it is observed whether the individual ever had contacted social service before.
SLIDE 3 Motivation
Observations from the Netherlands Twin Register: For a study on health and personality, longitudinal data were collected
- f twins and their sibs at different time points between 1991 and 2002.
At every timepoint it is observed whether the individual ever had contacted social service before. Aims:
- to estimate the degree of heritability and environmental effects of
the age at which individuals contact social service for non-physical problems for the first time,
- to test if the survival functions for twins and non-twins differ.
SLIDE 4 Heritability
Heritability is the proportion of phenotypic variation in a population that is attributable to genetic variation among the individuals in that population. Phenotypic variation may be due to genetic and environmental factors. If a trait has a heritability of 0.5, this means that the phenotypic variation due to genetic variation is 50%. It does not imply that the trait is caused by genetics for 50%. Example: body height. Heritability of “height” in Western society is approximately ..%. Nowadays, environmental effects are very homogenous among these
- populations. Decades ago, the heriatabilty of height was probably
much lower.
SLIDE 5 Heritability
Heritability is the proportion of phenotypic variation in a population that is attributable to genetic variation among the individuals in that population. Phenotypic variation may be due to genetic and environmental factors. If a trait has a heritability of 0.5, this means that the phenotypic variation due to genetic variation is 50%. It does not imply that the trait is caused by genetics for 50%. Example: body height Heritability of “height” in Western society is approximately 80%. Nowadays, environmental effects are very homogenous among these
- populations. Decades ago, the heriatabilty of height was probably
much lower.
SLIDE 6 Body height
Heritability is implied if mono-zygotic twins are more similar to each
- ther than dizygotic twins for the trait under study.
170 180 190 165 170 175 180 185 190 195 155 160 165 170 175 180 155 160 165 170 175 180 170 175 180 185 190 195 170 175 180 185 190 195 160 165 170 175 180 185 155 165 175 185
SLIDE 7 Estimation of Heritability
Suppose we are interested in a trait that has been observed for every individual. We model the trait value as: T = µ + G + C + E. Then for a monozygotic twin: T MZ
1
= µ + G + C + E1 T MZ
2
= µ + G + C + E2 and for a dizygotic twin T DZ
1
= µ + G1 + C + E1 T DZ
2
= µ + G2 + C + E2.
SLIDE 8
Estimation of Heritability
Heritability is defined as h2 = Var G Var T . It can be computed that ρMZ = Var G + Var C Var T ρDZ = 0.5Var G + Var C Var T . So, h2 = Var G Var T = 2(ρMZ − ρDZ). The term at the right can be easily estimated by inserting sample correlations.
SLIDE 9
Interval Censored Data
The age at which an individual visists social service for the first time is interval censored; we do not observe the exact age, but an interval in which this age falls. An interval may start at zero or end at infinity. Heritability as defined before can not be estimated.
SLIDE 10
The model, hazards
Suppose T is the age at which an individual contacted social service for the first time with distribution function F and density f, then the hazard is defined as λ(t) = f(t) 1 − F(t) 1 − F(t) = e−Λ(t), Λ(t) = t λ(s)ds Interpretation If f(t)dt is interpreted as the probability that an individual visits social service for the first time in the interval [t, t + dt), then λ(t)dt ≈ P(t ≤ T < t + dt) P(T ≥ t) = P(t ≤ T < t + dt|T ≥ t)
SLIDE 11
The model, hazards
Suppose T is the age at which an individual contacted social service for the first time with distribution function F and density f, then the hazard is defined as λ(t) = f(t) 1 − F(t) 1 − F(t) = e−Λ(t), Λ(t) = t λ(s)ds Interpretation If f(t)dt is interpreted as the probability that an individual visits social service for the first time in the interval [t, t + dt), then λ(t)dt ≈ P(t ≤ T < t + dt) P(T ≥ t) = P(t ≤ T < t + dt|T ≥ t)
SLIDE 12 Statistical Model (Frailty Model)
A frailty model is a random effects proportional hazards model. The hazard function is defined as: λi(t|Zi) = Ziλ(t). Family survival data can be modelled with a correlated frailty model where the random effects (“frailties”) account for the dependence between family members. Let: T1, T2, T3: ages at onset of mental help for a twin and a sib Z1, Z2, Z3: corresponding frailty variables
- T1, T2, T3 are independent given (Z1, Z2, Z3)
- T1, T2, T3 have hazard functions t → λi(t|Zi) = Ziλ(t), i = 1, 2, 3
Equivalently P(T1 > t1, T2 > t2, T3 > t3|Z1, Z2, Z3) = e−Z1Λ(t1)−Z2Λ(t2)−Z3Λ(t3).
SLIDE 13 Statistical Model (Frailty Model)
A frailty model is a random effects proportional hazards model. The hazard function is defined as: λi(t|Zi) = Ziλ(t). Family survival data can be modelled with a correlated frailty model where the random effects (“frailties”) account for the dependence between family members. Let: T1, T2, T3: ages at onset of mental help for a twin and a sib Z1, Z2, Z3: corresponding frailty variables
- T1, T2, T3 are independent given (Z1, Z2, Z3)
- T1, T2, T3 have hazard functions t → λi(t|Zi) = Ziλ(t), i = 1, 2, 3
Equivalently P(T1 > t1, T2 > t2, T3 > t3|Z1, Z2, Z3) = e−Z1Λ(t1)−Z2Λ(t2)−Z3Λ(t3).
SLIDE 14 Statistical Model (Frailty Variable)
The coordinates in the frailty vector (Z1, Z2, Z3) are decomposed as Zi = Ai + C + Bi + Ei i = 1, 2, 3 with
- Ai: genetic effects for individual i
- C: common environmental effects for twin and sib
- Bi: specific twin environmental effects
- Ei: non-shared, specific environmental and genetic effects
SLIDE 15 Model Assumptions
Zi = Ai + C + Bi + Ei i = 1, 2, 3
- Ai: cor(A1, A2) = 1 if MZ-twin-pair, otherwise cor(Ai, Aj) = 1/2
- Bi: cor(B1, B2) = 1, otherwise cor(Bi, Bj) = 0, i = j
- Ei: E1, E2, E3 are independent; cor(Ei, Ej) = 0, i = j
SLIDE 16 Model Assumptions
Zi = Ai + C + Bi + Ei i = 1, 2, 3
- Ai: cor(A1, A2) = 1 if MZ-twin-pair, otherwise cor(Ai, Aj) = 1/2
- Bi: cor(B1, B2) = 1, otherwise cor(Bi, Bj) = 0, i = j
- Ei: E1, E2, E3 are independent; cor(Ei, Ej) = 0, i = j
Furthermore:
- (A1, A2, A3), C, (B1, B2, B3) and (E1, E2, E3) are independent.
- Ai, C, Bi, Ei have gamma-distributions with inverse scale
parameter η and shape parameter ν, νc, νb, νe, respectively. So, Zi has a gamma distribution with parameters η and ν + νc + νb + νe Assume: η = ν + νc + νb + νe, then EZ1 = EZ2 = EZ3 = 1.
SLIDE 17 Heritability, Environmental and Twin effect
Heritability, and environmental and twin effect are defined as the proportions of variance of the frailty associated with genetic, environmental and twin effects: h2 = Var Ai Var Z = ν η c2 = Var C Var Z = νc η b2 = νb η (see also Yashin et al, 1999 and Jonker et al, 2009). These definitions are as usual, except that the frailties are viewed as the latent
- phenotypes. Then, the correlations between the frailty variables of two
individuals equal ρMZ = h2 + c2 + b2, ρDZ = 0.5h2 + c2 + b2, ρS,S = 0.5h2 + c2.
SLIDE 18 Simultaneous Survival function
Because the vector of frailties has a multivariate gamma distribution, the simultaneous survival functions can be written in terms of the marginal survival function S. For instance, SMZ,S(t1, t2, t3) = P(T1 > t1, T2 > t2, T3 > t3) =
- S(t1)−σ2 + S(t2)−σ2 + S(t3)−σ2 − 2
−ν/2−νc ×
−ν/2−νb ×S(t3)(ν/2+νb)σ2(S(t1)S(t2)S(t3))νeσ2 (1) with σ2 = 1/η and with S the marginal survival function. S can be taken gender and/or twin - non twin specific.
SLIDE 19 Aims
- to estimate the degree of heritability (h2), environmental effects
(c2), and twin effects (b2),
- to investigate whether the survival functions differ for twins and
non-twins. Steps:
- Compute the likelihood (interval censored data)
- Estimate the unknown parameters in the model (maximum
likelihood estimators)
- Estimate h2, c2, b2 and test whether they are positive (likelihood
ratio test)
- Test whether the survival fuctions for twins and non-twins differ
(likelihood ratio test)
SLIDE 20
Likelihood and estimators
For the twin and sib timepoints Ui and Vi are observed, with Ui ≤ Ti ≤ Vi, i = 1, 2, 3. The likelihood for a MZ-twin with a sib is proportional to L(ν, νc, νb, η, S)[(U1, V1), (U2, V2), (U3, V3)] ∝ SMZ,S(U1, U2, U3) − SMZ,S(U1, U2, V3) − SMZ,S(U1, V2, U3) −SMZ,S(V1, U2, U3) + SMZ,S(U1, V2, V3) + SMZ,S(V1, U2, V3) +SMZ,S(V1, V2, U3) − SMZ,S(V1, V2, V3).
SLIDE 21
Likelihood and estimators
For the twin and sib timepoints Ui and Vi are observed, with Ui ≤ Ti ≤ Vi, i = 1, 2, 3. The likelihood for a MZ-twin with a sib is proportional to L(ν, νc, νb, η, S)[(U1, V1), (U2, V2), (U3, V3)] ∝ SMZ,S(U1, U2, U3) − SMZ,S(U1, U2, V3) − SMZ,S(U1, V2, U3) −SMZ,S(V1, U2, U3) + SMZ,S(U1, V2, V3) + SMZ,S(V1, U2, V3) +SMZ,S(V1, V2, U3) − SMZ,S(V1, V2, V3). SMZ,S can be expressed in terms of the model parameters by (1). Under the assumption that all families are independent, the total likelihood is the product of the marginal likelihoods. The unknown parameters are estimated by their maximum likelihood estimators (S was taken equal to a parametric distribution).
SLIDE 22
Hypothesis testing
We tested whether heritability, environmental and twin effects are positive: H0 : h2 = 0 ∨ H1 : h2 > 0, H0 : c2 = 0 ∨ H1 : c2 > 0, H0 : b2 = 0 ∨ H1 : b2 > 0, and whether the survival functions for twins (Stwin) and non-twins Snon−twin differ: H0 : Stwin ≡ Snon−twin ∨ H1 : Stwin ≡ Snon−twin, We used the likelihood ratio test. We assumed that S(t) = 1 − α(1 − exp(−λ(t − s))) for t ≥ s, with α, λ and s gender and twin- non twin specific (a truncated and shifted exponential distribution).
SLIDE 23
Data
Data of 4498 families (twins and siblings) 727 families with 1 individual 1988 families with 2 individuals 1232 families with 3 individuals 551 families with 4 or more individuals In total, we have data of 6289 females and 4314 males.
SLIDE 24 20 40 60 80 0.0 0.2 0.4 0.6 0.8 1.0
age
NPMLEs of distribution functions (1 − S) for males (dashed curve) and females (continued curve) under the assumption of independence between all individuals.
SLIDE 25 20 40 60 80 0.0 0.2 0.4 0.6 0.8 1.0
age
NPMLEs of distribution functions (1 − S) for males (dashed curve) and females (continued curve) under the assumption of independence between all individuals.
20 40 60 80 0.0 0.2 0.4 0.6 0.8 1.0
age
20 40 60 80 0.0 0.2 0.4 0.6 0.8 1.0
age
NPMLEs for females (left) and males (right) under the independence
- assumption. Dashed curves: non-twins, continued curves: twins
SLIDE 26 Results
Estimates: ˆ h2 = 0.392, ˆ c2 = 0.101,ˆ b2 = 0.137. p-values for the four hypotheses are 1.1810−6(h2), 0.0198(c2), 0.000361(b2), 0.000264(S, for males), 0.282(S, for females).
20 40 60 80 0.0 0.2 0.4 0.6 0.8 1.0
age
Parametric maximum likelihood estimates for the distribution function for all women (dashed curve) and male twins and non-twins (continued lines).
SLIDE 27 Problems to think over
- Definition of “Social Service”
- Fit of the model (especially for male non-twins)
- Generations-effect
SLIDE 28 Ackowledgments
- Dorret Boomsma (VU University, Amsterdam) for providing the
data.
- Piet Groeneboom for using his computer program for estimating
the NPMLE for the survival curve based on interval censored survival data.
- All Dutch twins and their families for participation in the study.
Financially supported by: NDNS+ and NWO
SLIDE 29 References
- MA Jonker, S Bhulai, DI Boomsma, RSL Ligthart, D Posthuma, and
AW van der Vaart. Gamma frailty model for linkage analysis with application to interval censored migraine data Biostatistics 10: 187-200; 2009.
- MA Jonker and DI Boomsma. A frailty model for (interval) censored
family survival data, applied to the age at onset of non-physical problems, Lifetime data analysis, 16: 299-315; 2010
- AI Yashin, AZ Begun and IA Iachine. Genetic factors in susceptibility
to death: a comparitive analysis of bivariate survival models. Journal of Epidemiology and Biostatistics 4(1), 53-60; 1999.