Gompertz regression parameterized as accelerated failure time model
Filip Andersson and Nicola Orsini
Biostatistics Team, Department of Public Health Sciences, Karolinska Institutet
2017 Nordic and Baltic Stata meeting
Content
§ Introduction
§ Proportional hazard model
§ Accelerated failure time model
§ The Gompertz distribution
§ Structural equation models and mediation
§ Mediation in survival models
§ Estimating confidence intervals
§ What I am working on
2017-08-31 Filip Andersson 2
Content
§ Example
→ Data
→ Pre-estimation
→ Gompertz proportional hazard
→ Cox regression
→ Gompertz vs. Kaplan-Meier
→ Gompertz AFT model
→ Post-estimation
→ Conclusion
Introduction
§ Why use parametric survival models?
→ Can handle right-, left- or interval-censored data
→ Cox regression can't handle left- or interval-censored data
→ Produce better estimates if you have a theoretical expectation of the baseline hazard
→ Can estimate expected life, not only hazard ratios (AFT models)
→ Can include random effects – frailty models (not discussed here)
Introduction
§ A model that lacks an easy way to estimate it in Stata
→ Gompertz regression parameterized as an accelerated failure time model
→ Exists in R
§ eha package, with the command aftreg
§ Why use Stata?
→ Easy handling of survival data
§ Data management
§ Setup
→ Good graphical capabilities
Proportional hazard model
§ Easy to compare with Cox regression
→ Hazard ratios
→ Plots
§ Cumulative hazard function
§ Survival function
→ Commonly used
§ Hazard function, general form
→ $h(t \mid x) = h_0(t)\, e^{x\beta}$
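A minimal Python sketch of the proportional hazards form, assuming a Gompertz baseline written as in Stata's log relative-hazard parametrization, $h_0(t) = \lambda e^{\gamma t}$. The parameter values and helper names are hypothetical. The only point is that the ratio $h(t \mid x=1)/h(t \mid x=0)$ equals $e^{\beta}$ at every $t$.

```python
import math

# Hypothetical Gompertz baseline in Stata's log relative-hazard form,
# h0(t) = lam * exp(gamma * t); the values are illustrative only.
LAM = 1e-5
GAMMA = 2e-4
BETA = 0.23   # hypothetical log hazard ratio for a binary covariate x

def baseline_hazard(t):
    """Gompertz baseline hazard h0(t) = lam * exp(gamma * t)."""
    return LAM * math.exp(GAMMA * t)

def ph_hazard(t, x):
    """Proportional hazards: h(t | x) = h0(t) * exp(x * beta)."""
    return baseline_hazard(t) * math.exp(x * BETA)

# The ratio h(t | x=1) / h(t | x=0) is exp(beta) at every t
for t in (0.0, 5000.0, 10000.0, 20000.0):
    print(t, ph_hazard(t, 1) / ph_hazard(t, 0), math.exp(BETA))
```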
Accelerated failure time model
§ Can be seen as a linear model (simplest form):
→ $\log t = a + bx + \varepsilon$
→ Useful in mediation
§ Estimation on the life scale
→ Estimation of expected baseline life
§ Area under the survival curve when all covariates are zero
→ Compare expected life between two groups
§ Logarithmic change in expected life compared to the baseline life expectancy
§ Expected life = baseline life expectancy × exp(effect)
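The multiplicative rule for expected life follows from the AFT definition $S(t \mid x) = S_0(t/\theta(x))$: the area under the survival curve is stretched by $\theta(x) = e^{bx}$. Below is a Python sketch that checks this numerically with the trapezoid rule. It assumes a hypothetical Gompertz baseline, although any decreasing $S_0$ with a finite mean would behave the same way. All values and helper names are illustrative.

```python
import math

def area_under(surv, upper, n=100_000):
    """Trapezoid-rule approximation of the integral of surv(t) over [0, upper]."""
    h = upper / n
    total = 0.5 * (surv(0.0) + surv(upper))
    for k in range(1, n):
        total += surv(k * h)
    return total * h

# Hypothetical baseline survival (Gompertz, reparametrized form); any
# decreasing S0 with a finite mean would do. Values are illustrative only.
LAM, GAM = 0.05, 4600.0

def s0(t):
    return math.exp(-LAM * (math.exp(t / GAM) - 1.0))

b = -0.05                 # hypothetical AFT coefficient for a binary x
theta = math.exp(b)       # acceleration factor theta(x) for x = 1

def s1(t):
    """AFT: S(t | x = 1) = S0(t / theta)."""
    return s0(t / theta)

UPPER = 60_000.0          # both curves are numerically zero well before this
base_life = area_under(s0, UPPER)
group_life = area_under(s1, UPPER)
print(base_life, group_life, group_life / base_life, theta)
```

The ratio of the two areas reproduces $\theta$, which is why "expected life = baseline × exp(effect)" holds exactly in an AFT model.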
Accelerated failure time model
§ Definition of the accelerated failure time model
→ For a group with covariates $(X_1, X_2, \dots, X_p)$, the model is written as $S(t \mid x) = S_0\bigl(t/\theta(x)\bigr)$, where $S_0(t)$ is the baseline survival function and $\theta(x)$ is an acceleration factor, i.e. the ratio of survival times corresponding to any fixed value of $S(t)$. The acceleration factor is $\theta(x) = \exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)$ (Qi, J. (2009)).
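Here is why $\theta(x)$ is a ratio of survival times, shown in one step and assuming $S_0$ is strictly decreasing so that it can be inverted. Let $t_0$ be the time at which the baseline group reaches survival probability $p$, and let $t_x$ be the time at which the group with covariates $x$ reaches the same $p$:

```latex
S(t_x \mid x) = S_0(t_0) = p
\;\Longrightarrow\;
S_0\!\left(\frac{t_x}{\theta(x)}\right) = S_0(t_0)
\;\Longrightarrow\;
\theta(x) = \frac{t_x}{t_0}
```

So with $\theta(x) = e^{x\beta}$, every quantile of the survival time is multiplied by the same factor, and a positive coefficient lengthens survival.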
§ Hazard function
→ $h(t \mid x) = \dfrac{1}{\theta(x)}\, h_0\bigl(t/\theta(x)\bigr)$
§ Log-linear form
→ $\log t = a + bx + \sigma\varepsilon$, where $t$ and $\varepsilon$ follow corresponding distributions
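The following short derivation obtains both the hazard and the log-linear form from the definition $S(t \mid x) = S_0(t/\theta(x))$. It assumes $T_0$ denotes a survival time drawn from the baseline distribution $S_0$, so that $T = \theta(x)\,T_0$ has survival function $S_0(t/\theta(x))$:

```latex
h(t \mid x) = -\frac{d}{dt}\log S_0\!\left(\frac{t}{\theta(x)}\right)
            = \frac{1}{\theta(x)}\, h_0\!\left(\frac{t}{\theta(x)}\right)

T = \theta(x)\,T_0
\;\Longrightarrow\;
\log T = x\beta + \log T_0 = a + x\beta + \sigma\varepsilon,
\qquad \text{with } \log T_0 = a + \sigma\varepsilon
```

With a single covariate, $b$ in the log-linear form plays the role of $\beta$.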
The Gompertz distribution
§ When is it useful?
→ Adult and old-age mortality in humans
§ Demographic models
§ Including models with treatment effects, such as for cancer patients
§ Can be a problem for very old individuals
§ Standard parametrization
→ $h(t) = \lambda e^{\gamma t}$, with $\lambda > 0$, $\gamma \ge 0$, $t > 0$
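For reference, integrating the standard-form hazard gives the cumulative hazard and the survival function. This is a standard calculation. For $\gamma = 0$ the model reduces to the exponential distribution:

```latex
H(t) = \int_0^t \lambda e^{\gamma u}\,du = \frac{\lambda}{\gamma}\bigl(e^{\gamma t} - 1\bigr) \quad (\gamma > 0),
\qquad
S(t) = \exp\!\Bigl\{-\frac{\lambda}{\gamma}\bigl(e^{\gamma t} - 1\bigr)\Bigr\},
\qquad
\lim_{\gamma \to 0} H(t) = \lambda t
```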
The Gompertz distribution
§ New parametrization suggested by Broström, G. & Edvinsson, S. (2013)
→ $\lambda \to \lambda/\gamma$, $\gamma \to 1/\gamma$
→ $h(t) = \dfrac{\lambda}{\gamma}\, e^{t/\gamma}$
→ $\lambda > 0$, $\gamma > 0$, $t > 0$
§ Proof of the new parametrization
→ Hazard for the AFT model: $h(t \mid x) = \dfrac{1}{\theta(x)}\, h_0\bigl(t/\theta(x)\bigr)$
→ Here, the new $\gamma$ can be seen as an acceleration factor
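Substituting the new parametrization into the AFT hazard makes the last point concrete: the covariates only rescale $\gamma$. Here is a worked step using the baseline $h_0(t) = (\lambda/\gamma)e^{t/\gamma}$:

```latex
h(t \mid x) = \frac{1}{\theta(x)} \cdot \frac{\lambda}{\gamma}\, e^{t/(\gamma\theta(x))}
            = \frac{\lambda}{\gamma\theta(x)}\, e^{t/(\gamma\theta(x))},
\qquad
S(t \mid x) = \exp\!\Bigl\{-\lambda\bigl(e^{t/(\gamma\theta(x))} - 1\bigr)\Bigr\}
```

This is again a Gompertz distribution with the same $\lambda$ and scale $\gamma\theta(x)$. In the standard parametrization, by contrast, $h_0(t/\theta)/\theta = (\lambda/\theta)e^{\gamma t/\theta}$, so both parameters change.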
The Gompertz distribution
§ Linear model: $\log t = a + bx + \varepsilon$
§ Here, $\varepsilon$ follows a log-Gompertz or inverse Weibull distribution
§ Compare with the Weibull model, where $\varepsilon$ follows a Gumbel distribution
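§ Likelihood function
→ Survival function: $S(t) = \exp\bigl\{-\lambda\bigl(e^{t/\gamma} - 1\bigr)\bigr\}$
→ Density function: $f(t) = h(t)\,S(t)$
→ Hazard function: $h(t) = \dfrac{\lambda}{\gamma}\, e^{t/\gamma}$
→ $L = \prod_{i=1}^{n} \bigl[h_i(t_i)\,S_i(t_i)\bigr]^{d_i}\, S_i(t_i)^{1-d_i}$, where $d_i = 1$ if individual $i$ died and $d_i = 0$ if censored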
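Here is a minimal Python sketch of this log-likelihood for right-censored data. It assumes the reparametrized baseline and $\theta(x) = e^{x\beta}$, so individual $i$ has scale $\gamma\theta(x_i)$. $\lambda$ and $\gamma$ are passed on the log scale to keep them positive. The function name and argument layout are hypothetical and are not staftgomp's interface. Each term uses $[h_i S_i]^{d_i} S_i^{1-d_i} = h_i^{d_i} S_i$.

```python
import math

def gompertz_aft_loglik(beta, log_lam, log_gam, times, events, covariates):
    """Log-likelihood of a right-censored Gompertz AFT model (sketch).

    Baseline (reparametrized Gompertz):
        h0(t) = (lam / gam) * exp(t / gam),  S0(t) = exp(-lam * (exp(t / gam) - 1))
    Acceleration factor theta(x) = exp(x . beta), so individual i has scale
    gam_i = gam * theta(x_i). lam and gam are passed on the log scale.
    """
    lam = math.exp(log_lam)
    gam = math.exp(log_gam)
    ll = 0.0
    for t, d, x in zip(times, events, covariates):
        gam_i = gam * math.exp(sum(b * xj for b, xj in zip(beta, x)))
        log_h = math.log(lam / gam_i) + t / gam_i          # log h_i(t_i)
        log_s = -lam * (math.exp(t / gam_i) - 1.0)          # log S_i(t_i)
        # event (d = 1): log h + log S; censored (d = 0): log S only
        ll += d * log_h + log_s
    return ll

# Toy data (hypothetical): times, event indicators, one covariate
times = [3.0, 5.5, 8.0, 2.0]
events = [1, 0, 1, 1]
covariates = [[0.0], [1.0], [1.0], [0.0]]
print(gompertz_aft_loglik([-0.1], math.log(0.5), math.log(4.0), times, events, covariates))
```

To fit the model, one would maximize this function numerically (in Stata, for example, with a user-written `ml` evaluator). This sketch only evaluates it.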
Structural equation models and mediation
§ Simple way to estimate linear models within a pathway framework
§ Estimate all equations and combine them into the direct and indirect effects
§ Supported by most statistical programs
→ In Stata, the gsem command combined with simulation is preferable
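For a continuous mediator $m$ and linear models, the pathway can be written as two equations: a mediator model and an AFT outcome model on the log-time scale. The effects then follow by the product-of-coefficients method. This sketch assumes no exposure–mediator interaction. The symbols $\alpha_0, \alpha_1, \beta_1, \beta_2, u$ are introduced here for illustration only.

```latex
m = \alpha_0 + \alpha_1 x + u
\qquad
\log t = a + \beta_1 x + \beta_2 m + \sigma\varepsilon

\text{indirect} = \alpha_1\beta_2, \qquad
\text{direct} = \beta_1, \qquad
\text{total} = \beta_1 + \alpha_1\beta_2
```

If the error terms do not depend on $x$, the effects act on the expected-life scale as time ratios, with $e^{\text{total}} = e^{\text{direct}} \cdot e^{\text{indirect}}$.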
Mediation in survival models
§ What do we need to do?
1. Estimate a parametric survival model
2. Estimate the effect of the exposure on the mediator
§ The first two steps come directly from the gsem output
3. Estimate the indirect, direct and total effects
4. Estimate confidence intervals and significance
§ Steps three and four can be done with either simulation or the delta method
§ These models are simple for continuous mediators, but can be tricky with binary or categorical mediators
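Step three can be sketched in Python. Given the exposure slope from the mediator model and the two coefficients from the AFT outcome model (in practice read from the gsem output), the effects follow by simple arithmetic. The sketch assumes a continuous mediator and no exposure–mediator interaction. The function name and input values are hypothetical.

```python
import math

def mediation_effects(alpha1, beta1, beta2):
    """Product-of-coefficients effects for a continuous mediator (sketch).

    alpha1: effect of exposure x on mediator m       (mediator model)
    beta1:  effect of x on log t, holding m fixed     (AFT outcome model)
    beta2:  effect of m on log t                      (AFT outcome model)
    Returns effects on the log-time scale and as time ratios exp(effect).
    """
    indirect = alpha1 * beta2
    direct = beta1
    total = direct + indirect
    log_scale = {"indirect": indirect, "direct": direct, "total": total}
    time_ratio = {k: math.exp(v) for k, v in log_scale.items()}
    return log_scale, time_ratio

# Hypothetical estimates, for illustration only
log_eff, ratios = mediation_effects(alpha1=0.40, beta1=-0.05, beta2=-0.10)
print(log_eff)
print(ratios)
```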
Estimating confidence intervals
§ Simulation
→ Bootstrapping
§ Seems to be the more popular simulation method
§ Calculate point estimates for the indirect and direct effects
§ Resample the data and recompute these point estimates
→ Monte Carlo simulation
§ More flexible in handling problematic correlations
§ Not as straightforward
§ Delta method
§ Easiest method and probably the most popular
§ Needs a stronger assumption of normality
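Here is a Python sketch of the Monte Carlo approach and the delta method (a Sobel-type standard error) for an indirect effect $\alpha_1\beta_2$. It assumes the two estimates are independent and approximately normal. The estimates, standard errors and function names are hypothetical. Bootstrapping is not shown, since it requires refitting both models on resampled data.

```python
import math
import random
import statistics

def delta_ci(a, se_a, b, se_b, z=1.959964):
    """First-order delta-method (Sobel) SE and CI for a*b, assuming cov(a, b) = 0."""
    est = a * b
    se = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    return est, se, (est - z * se, est + z * se)

def monte_carlo_ci(a, se_a, b, se_b, draws=100_000, seed=2017):
    """Monte Carlo percentile CI for a*b: draw a and b from independent
    normals centred on the estimates and take the 2.5th/97.5th percentiles."""
    rng = random.Random(seed)
    prods = [rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(draws)]
    cuts = statistics.quantiles(prods, n=40)   # 39 cut points in steps of 2.5%
    return cuts[0], cuts[-1]

# Hypothetical estimates: alpha1 (exposure -> mediator), beta2 (mediator -> log t)
alpha1, se_alpha1 = 0.40, 0.10
beta2, se_beta2 = -0.10, 0.03
print(delta_ci(alpha1, se_alpha1, beta2, se_beta2))
print(monte_carlo_ci(alpha1, se_alpha1, beta2, se_beta2))
```

With correlated estimates, the Monte Carlo draws would instead come from a bivariate normal using the estimated covariance. The resulting percentile interval need not be symmetric around the point estimate, whereas the delta-method interval always is.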
What I am working on
§ A Stata command, staftgomp, to estimate Gompertz regression parameterized as an accelerated failure time model, similar to what streg does
§ A post-estimation command that would make it simple to estimate direct, indirect and total effects, with confidence intervals, for survival models
Example
§ Scanian Economic Demographic Database (Bengtsson, T., Dribe, M. and Svensson, P. (2012))
§ Longitudinal historical database
→ Data from the 17th century onwards
→ Here, data from individuals born between 1815 and 1860 are used
→ Comes from five rural parishes in western Scania
→ Contains important life-course events such as birth and death; births of children, marriage and socioeconomic status are also recorded
Data
§ Variables used:
→ “Treatment variable”:
§ Approximation of bad early-life conditions
§ Infant mortality rate (imr) in the year of birth
§ High imr vs. low imr (binary)
§ Years of high disease load, such as measles, smallpox and whooping cough (Quaranta, L. (2013))
→ Parental socioeconomic status
§ Socioeconomic status at birth (binary)
§ Confounder
→ Outcome
§ The individuals are followed until death or out-migration.
Pre-estimation
§ Compare hazard estimates from the Gompertz proportional hazard model and Cox regression
§ Plot the survival curve and compare it with the Kaplan-Meier estimate
§ If the fit is not acceptable, try different survival distributions until the parametric model is acceptable
→ Here, we choose Gompertz as it fits well and is theoretically supported for adult mortality
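Here is a minimal Kaplan-Meier estimator in Python, for comparing against a fitted parametric curve at the observed death times. In Stata, `sts graph` draws the Kaplan-Meier curve and `stcurve, survival` after `streg` draws the fitted one. The sketch assumes right-censored data and treats observations censored at a death time as still at risk at that time. The function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) for right-censored data.

    times:  follow-up times
    events: 1 if the follow-up ended in death, 0 if censored
    Returns [(t, S(t)), ...] at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every observation tied at time t (deaths and censorings)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk   # product-limit step
            curve.append((t, surv))
        n_at_risk -= removed                   # censored at t were at risk at t
    return curve

print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```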
Gompertz proportional hazard
. streg imr_high ses, dist(gompertz)

Gompertz regression -- log relative-hazard form

No. of subjects =        3,756                  Number of obs    =      3,756
No. of failures =          880
Time at risk    =     19824107
                                                LR chi2(2)       =      26.53
Log likelihood  =   -1773.9194                  Prob > chi2      =     0.0000

          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    imr_high |   1.259023   .0951873     3.05   0.002     1.085624    1.460119
         ses |   1.362878   .1010669     4.17   0.000     1.178513    1.576084
       _cons |   9.57e-06   8.25e-07  -134.05   0.000     8.08e-06    .0000113
-------------+----------------------------------------------------------------
      /gamma |   .0002332   8.35e-06    27.92   0.000     .0002168    .0002496
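To overlay the Gompertz proportional hazard fit on the Kaplan-Meier estimate, the estimates have to be turned into survival curves. In this parametrization, $S(t \mid x) = \exp\{-(\lambda e^{x\beta}/\gamma)(e^{\gamma t} - 1)\}$. The Python sketch below uses the point estimates copied from the output above. Because `_cons` is printed to only three significant digits, the curves are approximate. The helper name is hypothetical.

```python
import math

# Point estimates from the streg output above (log relative-hazard form)
LAM = 9.57e-06        # _cons (exp of the intercept)
GAMMA = 0.0002332     # /gamma
HR = {"imr_high": 1.259023, "ses": 1.362878}

def gompertz_ph_survival(t, imr_high=0, ses=0):
    """S(t | x) = exp(-(lam * exp(x*beta) / gamma) * (exp(gamma * t) - 1))."""
    rel_haz = HR["imr_high"] ** imr_high * HR["ses"] ** ses
    cum_haz = LAM * rel_haz / GAMMA * (math.exp(GAMMA * t) - 1.0)
    return math.exp(-cum_haz)

# Survival by analysis time (days) for low vs. high imr, ses = 0
for t in range(0, 30001, 5000):
    print(t, round(gompertz_ph_survival(t), 3), round(gompertz_ph_survival(t, imr_high=1), 3))
```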
Cox regression
. stcox imr_high ses

Cox regression -- Breslow method for ties

No. of subjects =        3,756                  Number of obs    =      3,756
No. of failures =          880
Time at risk    =     19824107
                                                LR chi2(2)       =      28.17
Log likelihood  =   -5889.8259                  Prob > chi2      =     0.0000

          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    imr_high |   1.261686   .0955679     3.07   0.002     1.087617    1.463614
         ses |   1.381581    .102833     4.34   0.000     1.194043    1.598573
Gompertz vs. Kaplan-Meier
Gompertz AFT model
. staftgomp imr_high ses

Gompertz AFT regression                         No. of obs       =       3756
Log likelihood  =   -9325.8767                  LR chi2(2)       =      14.34
Baseline life expectancy = 11669.94             Prob > chi2      =     0.0008

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
    imr_high |  -.0496389   .0261027    -1.90   0.057    -.1007992    .0015214
         ses |  -.0873728   .0273233    -3.20   0.001    -.1409255   -.0338202
-------------+----------------------------------------------------------------
bp           |
      lambda |   8.434053   .0451675   186.73   0.000     8.345526    8.522579
       gamma |  -2.931995    .102271   -28.67   0.000    -3.132442   -2.731547
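The "Baseline life expectancy" in the header is the area under the baseline survival curve. For the reparametrized Gompertz baseline $S_0(t) = \exp\{-\lambda(e^{t/\gamma} - 1)\}$, substituting $u = e^{t/\gamma}$ gives the closed form $E[T_0] = \gamma e^{\lambda} E_1(\lambda)$, where $E_1$ is the exponential integral. The Python sketch below checks the closed form against numerical integration. The output does not show how the `lambda` and `gamma` rows of staftgomp map onto $\lambda$ and $\gamma$, so the inputs are hypothetical, as are the helper names.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp_integral_e1(x, terms=60):
    """Exponential integral E1(x) from its power series (adequate for moderate x > 0)."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for k in range(1, terms + 1):
        term *= -x / k              # term is now (-x)^k / k!
        total -= term / k           # E1(x) = -euler_gamma - ln x - sum (-x)^k / (k * k!)
    return total

def gompertz_mean(lam, gam):
    """Closed form: area under S0(t) = exp(-lam * (exp(t / gam) - 1))."""
    return gam * math.exp(lam) * exp_integral_e1(lam)

def gompertz_mean_numeric(lam, gam, n=200_000):
    """Trapezoid-rule check, truncated where the cumulative hazard reaches 50."""
    upper = gam * math.log(1.0 + 50.0 / lam)
    h = upper / n

    def s0(t):
        return math.exp(-lam * (math.exp(t / gam) - 1.0))

    total = 0.5 * (s0(0.0) + s0(upper))
    for k in range(1, n):
        total += s0(k * h)
    return total * h

# Hypothetical parameter values, for illustration only
lam, gam = 0.05, 4600.0
print(gompertz_mean(lam, gam), gompertz_mean_numeric(lam, gam))
```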
Post-estimation
. lincom imr_high, eform

 ( 1)  [xb]imr_high = 0

          _t |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .951573   .0248386    -1.90   0.057     .9041146    1.001523

. nlcom exp([xb]imr_high)*11699

       _nl_1:  exp([xb]imr_high)*11699

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   11132.45   290.5866    38.31   0.000     10562.91    11701.99
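Both post-estimation results can be reproduced by hand, which shows what they compute. `lincom, eform` exponentiates the coefficient and its confidence limits. `nlcom` applies the first-order delta method to $g(b) = 11699\,e^{b}$, whose derivative is $g(b)$ itself. The Python sketch below uses the estimate and standard error copied from the staftgomp output. `Z` is the rounded 97.5% normal quantile.

```python
import math

Z = 1.959964                       # standard normal 97.5% quantile (rounded)

b, se_b = -0.0496389, 0.0261027    # [xb]imr_high from the staftgomp output
baseline = 11699                   # constant used in the nlcom call (the staftgomp header reports 11669.94)

# lincom, eform: exp(b); its CI is exp() applied to the CI for b
ratio = math.exp(b)
ratio_se = ratio * se_b                              # delta-method SE of exp(b)
ratio_ci = (math.exp(b - Z * se_b), math.exp(b + Z * se_b))

# nlcom: g(b) = baseline * exp(b), gradient g'(b) = baseline * exp(b)
life = baseline * ratio
life_se = life * se_b                                # delta-method SE
life_ci = (life - Z * life_se, life + Z * life_se)   # symmetric Wald interval

print(f"exp(b) = {ratio:.6f}  SE = {ratio_se:.7f}  CI = ({ratio_ci[0]:.7f}, {ratio_ci[1]:.6f})")
print(f"life   = {life:.2f}  SE = {life_se:.4f}  CI = ({life_ci[0]:.2f}, {life_ci[1]:.2f})")
```

The `lincom` interval is asymmetric around exp(b) because it is the exponentiated log-scale interval. The `nlcom` interval is symmetric by construction.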
Post-estimation
§ Baseline life expectancy
→ $11699 / 365 \approx 32.1$ years
§ Estimate for individuals who survive past 16000 days
→ $(11699 + 16000) / 365 \approx 75.9$ years of age
§ Effect of high imr in the year of birth
→ $(11132 + 16000) / 365 \approx 74.3$ years of age
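Here is the age arithmetic in Python. It assumes, as the calculation implies, that analysis time starts at 16000 days of age, so that expected age at death is 16000 days plus expected remaining life. The divisor 365 follows the slide. The constant names are hypothetical.

```python
DAYS_PER_YEAR = 365        # the slide divides by 365
ENTRY_AGE_DAYS = 16000     # assumed start of follow-up (age in days)
baseline_days = 11699      # baseline expected remaining life used in nlcom
high_imr_days = 11132.45   # exp([xb]imr_high) * 11699 from nlcom

def to_years(days):
    return days / DAYS_PER_YEAR

print(round(to_years(baseline_days), 1))                    # remaining life, baseline
print(round(to_years(baseline_days + ENTRY_AGE_DAYS), 1))   # age at death, low imr
print(round(to_years(high_imr_days + ENTRY_AGE_DAYS), 1))   # age at death, high imr
print(round(to_years(baseline_days - high_imr_days), 1))    # difference in years
```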
Conclusion
§ Conclusion
→ Even if you survive over the age of 40, you still have a mean life expectancy that is 1.6 years shorter if you were born in a year with high imr
→ Latent effect
→ Support for the fetal origins hypothesis
→ Is the estimate reasonable?
§ If needed
→ Mediation analysis and calculation of direct, indirect and total effects
→ Here, total effect = direct effect