Gompertz regression parameterized as accelerated failure time model

SLIDE 1

Gompertz regression parameterized as accelerated failure time model

Filip Andersson and Nicola Orsini
Biostatistics Team, Department of Public Health Sciences, Karolinska Institutet
2017 Nordic and Baltic Stata meeting

SLIDE 2

Content

• Introduction
• Proportional hazard model
• Accelerated failure time model
• The Gompertz distribution
• Structural equation models and mediation
• Mediation in survival models
• Estimating confidence intervals
• What I am working on

2017-08-31 Filip Andersson 2

SLIDE 3

Content

• Example
  → Data
  → Pre-estimation
  → Gompertz proportional hazard
  → Cox regression
  → Gompertz vs. Kaplan-Meier
  → Gompertz AFT model
  → Post-estimation
  → Conclusion

SLIDE 4

Introduction

• Why use parametric survival models?
  → They can handle right-, left- and interval-censored data
  → Cox regression cannot handle left- or interval-censored data
  → They give better estimates when there is a theoretical expectation for the baseline hazard
  → They can estimate expected life, not only hazard ratios (AFT models)
  → They can include random effects, i.e. frailty models (not discussed here)

SLIDE 5

Introduction

• A model that lacks an easy way to be estimated in Stata
  → Gompertz regression parameterized as an accelerated failure time model
  → Exists in R
    • eha package, with the command aftreg
• Why use Stata?
  → Easy handling of survival data
    • Data management
    • Setup
  → Good graphical capabilities

SLIDE 6

Proportional hazard model

• Easy to compare with Cox regression
  → Hazard ratios
  → Plots
    • Cumulative hazard function
    • Survival function
  → Commonly used
• Hazard function, general form
  → $h(t \mid x) = h_0(t)\, e^{x\beta}$

SLIDE 7

Accelerated failure time model

• Can be seen as a linear model (simplest form):
  → $\log t = a + bx + \varepsilon$
  → Useful in mediation
• Estimation on the life scale
  → Estimation of expected baseline life
    • Area under the survival curve when all covariates are zero
  → Compare expected life between two groups
    • The effect is the logarithmic change in expected life compared to the baseline life expectancy
    • Expected life = baseline life expectancy × exp(effect)

SLIDE 8

Accelerated failure time model

• Definition of the accelerated failure time model
  → For a group $(X_1, X_2, \dots, X_p)$, the model is written mathematically as $S(t \mid x) = S_0\!\left(\frac{t}{\eta(x)}\right)$, where $S_0(t)$ is the baseline survival function and $\eta(x)$ is an acceleration factor, that is, a ratio of survival times corresponding to any fixed value of $S(t)$. The acceleration factor is given by the formula $\eta(x) = e^{\alpha_1 x_1 + \dots + \alpha_p x_p}$ (Qi, J. (2009)).
• Hazard function
  → $h(t \mid x) = \frac{1}{\eta(x)}\, h_0\!\left(\frac{t}{\eta(x)}\right)$
• Log-linear form
  → $\log t = a + bx + \sigma\varepsilon$
  → where $t$ and $\varepsilon$ follow corresponding distributions
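A small numerical sketch of the definition above. It assumes a Gompertz baseline with made-up parameter values (`lam`, `gam`) and a hypothetical coefficient `alpha`; none of these come from the talk. It checks that, when S(t | x) = S_0(t / eta(x)), the median survival time is multiplied by eta(x) = exp(alpha * x).

```python
import math

def baseline_survival(t, lam=0.02, gam=0.09):
    """Gompertz baseline S0(t) = exp(-(lam/gam) * (exp(gam*t) - 1)).
    The parameter values are illustrative assumptions."""
    return math.exp(-(lam / gam) * (math.exp(gam * t) - 1.0))

def aft_survival(t, x, alpha):
    """AFT survival S(t | x) = S0(t / eta(x)) with eta(x) = exp(alpha * x)."""
    eta = math.exp(alpha * x)
    return baseline_survival(t / eta)

def survival_quantile(p, x, alpha, lo=0.0, hi=200.0):
    """Time at which S(t | x) falls to p, located by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if aft_survival(mid, x, alpha) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

alpha = -0.05  # hypothetical log time ratio for a binary covariate
median_unexposed = survival_quantile(0.5, x=0, alpha=alpha)
median_exposed = survival_quantile(0.5, x=1, alpha=alpha)
ratio = median_exposed / median_unexposed

# The two printed numbers should agree: the ratio of medians is eta(1).
print(ratio, math.exp(alpha))
```

The same ratio holds for any other quantile, which is what "ratio of survival times corresponding to any fixed value of S(t)" means.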

SLIDE 9

The Gompertz distribution

• When is it useful?
  → Adult and old-age mortality for humans
    • Demographic models
    • Including models with treatment effects, such as for cancer patients
    • Can be a problem with very old individuals
• Usual parametrization
  → $h(t) = \lambda e^{\gamma t}$
  → $\lambda > 0,\ \gamma \ge 0,\ t > 0$

SLIDE 10

The Gompertz distribution

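• New parametrization suggested by Broström, G. & Edvinsson, S. (2013)
  → $\lambda \to \frac{\lambda}{\gamma}$, $\gamma \to \frac{1}{\gamma}$
  → $h(t) = \frac{\lambda}{\gamma}\, e^{t/\gamma}$
  → $\lambda > 0,\ \gamma > 0,\ t > 0$
• Proof of the new parametrization
  → Hazard for the AFT model
  → $h(t \mid x) = \frac{1}{\eta(x)}\, h_0\!\left(\frac{t}{\eta(x)}\right)$
  → Here, the new $\gamma$ can be seen as an acceleration factor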
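The step behind this claim can be written out as follows. This is a sketch under an assumption made here for illustration: the baseline is a Gompertz hazard with unit rate, $h_0(t) = \lambda e^{t}$, and the acceleration factor is the constant $\eta(x) = \gamma$.

```latex
\begin{align*}
h_0(t) &= \lambda e^{t}, \\
h(t)   &= \frac{1}{\gamma}\, h_0\!\left(\frac{t}{\gamma}\right)
        = \frac{\lambda}{\gamma}\, e^{t/\gamma}, \\
S(t)   &= \exp\!\left(-\int_0^{t} \frac{\lambda}{\gamma}\, e^{s/\gamma}\, ds\right)
        = \exp\!\left(-\lambda\left(e^{t/\gamma} - 1\right)\right).
\end{align*}
```

The second line is the hazard of the new parametrization, so $\gamma$ rescales time in the same way an acceleration factor does.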

SLIDE 11

The Gompertz distribution

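• Linear model: $\log t = a + bx + \varepsilon$
  • Here, $\varepsilon$ follows a log-Gompertz or inverse Weibull distribution
  • Compare with the Weibull model, where $\varepsilon$ follows a Gumbel distribution
• Likelihood function
  → Survival function: $S(t) = \exp\!\left(-\lambda\left(e^{t/\gamma} - 1\right)\right)$
  → Density function: $f(t) = h(t)\, S(t)$
  → Hazard function: $h(t) = \frac{\lambda}{\gamma}\, e^{t/\gamma}$
  → $L(\alpha, \mu, \sigma) = \prod_{i=1}^{n} \left[h_i(t_i)\, S_i(t_i)\right]^{d_i} \left[S_i(t_i)\right]^{1 - d_i}$, where $d_i = 1$ if individual $i$ fails and $d_i = 0$ if the observation is censored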
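As an illustration of the likelihood above, the sketch below evaluates its logarithm for right-censored data without covariates, using the hazard h(t) = (lam/gam) * exp(t/gam) and the survival function S(t) = exp(-lam * (exp(t/gam) - 1)). The function name and the toy data are hypothetical. This is not the code behind staftgomp, only a direct transcription of the formula.

```python
import math

def gompertz_loglik(times, events, lam, gam):
    """Log-likelihood sum_i [ d_i * log h(t_i) + log S(t_i) ].

    This follows from [h * S]^d * S^(1 - d) = h^d * S.
    times  : observed follow-up times
    events : 1 if the individual failed, 0 if censored
    """
    total = 0.0
    for t, d in zip(times, events):
        log_h = math.log(lam / gam) + t / gam          # log hazard
        log_s = -lam * (math.exp(t / gam) - 1.0)       # log survival
        total += d * log_h + log_s
    return total

# Toy data (made up): two failures and one censored observation.
times = [2.0, 3.5, 1.2]
events = [1, 0, 1]
print(gompertz_loglik(times, events, lam=0.5, gam=2.0))
```

A maximum likelihood fit would pass the negative of this function to a numerical optimizer over (lam, gam), typically on the log scale so that both parameters stay positive.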

SLIDE 12

Structural equation models and mediation

• A simple way to estimate linear models within a pathway framework
• Estimate all equations and combine them to obtain the direct and indirect effects
• Supported by most statistical programs
  → In Stata, the gsem command combined with simulation is preferable

SLIDE 13

Mediation in survival models

• What do we need to do?
  1. Estimate a parametric survival model
  2. Estimate the effect of the exposure on the mediator
• The first two steps come directly from the gsem output
  3. Estimate the indirect, direct and total effects
  4. Estimate confidence intervals and significance
• Steps three and four can be done with either simulation or the delta method
• These models are simple for continuous mediators, but can be tricky with binary or categorical mediators
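Step three can be sketched for the simplest case: a continuous mediator and a log-linear survival model. The decomposition used here is the usual product-of-coefficients rule for linear models, which is an assumption of this example, since the talk does not spell out its formulas. The coefficient values are hypothetical.

```python
import math

def decompose_effects(a, b, c):
    """Product-of-coefficients decomposition on the log-time scale.

    a : effect of exposure on the mediator   (mediator equation)
    b : effect of the mediator on log time   (survival equation)
    c : effect of exposure on log time, given the mediator (direct effect)
    """
    indirect = a * b
    direct = c
    total = direct + indirect
    return {"indirect": indirect, "direct": direct, "total": total}

# Hypothetical estimates, as they might be read off two fitted equations.
effects = decompose_effects(a=0.5, b=-0.04, c=-0.03)

# On the AFT scale, exp(effect) is a ratio of expected lifetimes.
time_ratios = {name: math.exp(value) for name, value in effects.items()}
print(effects)
print(time_ratios)
```

With a binary or categorical mediator the mediator equation is no longer linear, so this simple product does not apply directly.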

SLIDE 14

Estimating confidence intervals

• Simulation
  → Bootstrapping
    • Seems to be the more popular simulation method
    • Calculate point estimates for the indirect and direct effects
    • Simulate these point estimates
  → Monte Carlo simulation
    • More flexible in handling problematic correlations
    • Not as straightforward
• Delta method
    • Easiest method and probably the most popular
    • Needs a stronger assumption of normality
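To make the contrast concrete, the sketch below computes a 95% interval for an indirect effect a*b in two of the ways listed above: the delta method and Monte Carlo simulation from the estimated sampling distributions. It assumes the two estimates are independent and normally distributed, and all numbers are hypothetical. Bootstrapping is not shown because it requires resampling the original data.

```python
import math
import random

Z_975 = 1.959964  # 97.5th percentile of the standard normal

def delta_ci(a, se_a, b, se_b):
    """Delta-method interval for a*b, assuming cov(a, b) = 0."""
    estimate = a * b
    se = math.sqrt((b * se_a) ** 2 + (a * se_b) ** 2)
    return estimate - Z_975 * se, estimate + Z_975 * se

def monte_carlo_ci(a, se_a, b, se_b, draws=20000, seed=1):
    """Percentile interval from simulated products of normal draws."""
    rng = random.Random(seed)
    sims = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(draws))
    lower = sims[int(0.025 * draws)]
    upper = sims[int(0.975 * draws) - 1]
    return lower, upper

# Hypothetical path estimates and standard errors.
a, se_a = 0.5, 0.1
b, se_b = -0.04, 0.01

print("delta method:", delta_ci(a, se_a, b, se_b))
print("Monte Carlo :", monte_carlo_ci(a, se_a, b, se_b))
```

The delta interval is symmetric by construction, while the simulated interval can be asymmetric because a product of two normal variables is not itself normal.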

SLIDE 15

What I am working on

• A Stata command, staftgomp, to estimate the Gompertz regression parameterized as an accelerated failure time model, similar to what streg does
• A post-estimation command that would make it simple to estimate direct, indirect and total effects, with confidence intervals, for survival models

SLIDE 16

Example

• Scanian Economic Demographic Database (Bengtsson, T., Dribe, M. and Svensson, P. (2012))
• Longitudinal historical database
  → Data from the 17th century onwards
  → Here, data on individuals born between 1815 and 1860 are used
  → Comes from five rural parishes in western Scania
  → Records important life course events such as birth and death, as well as births of children, marriage and socioeconomic status

SLIDE 17

Data

• Variables used:
  → "Treatment variable":
    • Approximation of bad early-life conditions
    • Infant mortality rate (imr) in the year of birth
    • High imr vs. low imr (binary)
    • Years of high disease load, such as measles, smallpox and whooping cough (Quaranta, L. (2013))
  → Parental socioeconomic status
    • Socioeconomic status at birth (binary)
    • Confounder
  → Outcome
    • The individuals are followed until death or out-migration.

SLIDE 18

Pre-estimation

• Compare the hazard estimates from the Gompertz proportional hazard model and Cox regression
• Plot the survival curve and compare it with Kaplan-Meier
• If the fit is not acceptable, try different survival distributions until the parametric model is acceptable
  → Here, we choose Gompertz, as it fits well and is supported theoretically for adult mortality

SLIDE 19

Gompertz proportional hazard

. streg imr_high ses, dist(gompertz)

Gompertz regression -- log relative-hazard form

No. of subjects =      3,756                  Number of obs    =      3,756
No. of failures =        880
Time at risk    =   19824107
                                              LR chi2(2)       =      26.53
Log likelihood  = -1773.9194                  Prob > chi2      =     0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    imr_high |   1.259023   .0951873     3.05   0.002     1.085624    1.460119
         ses |   1.362878   .1010669     4.17   0.000     1.178513    1.576084
       _cons |   9.57e-06   8.25e-07  -134.05   0.000     8.08e-06    .0000113
-------------+----------------------------------------------------------------
      /gamma |   .0002332   8.35e-06    27.92   0.000     .0002168    .0002496
------------------------------------------------------------------------------
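As a reading aid for the table, the sketch below recomputes the z statistic and the 95% interval of the imr_high hazard ratio from the two numbers printed in the output. It assumes the usual convention that the test and interval are formed on the log-hazard scale and that the reported standard error of the hazard ratio comes from the delta method. The function name is hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal

def hazard_ratio_summary(hr, se_hr):
    """Recover the log-scale coefficient, z statistic and 95% CI from a
    reported hazard ratio and its delta-method standard error."""
    b = math.log(hr)          # coefficient on the log-hazard scale
    se_b = se_hr / hr         # since se(exp(b)) = exp(b) * se(b)
    z = b / se_b
    lower = math.exp(b - Z_975 * se_b)
    upper = math.exp(b + Z_975 * se_b)
    return b, se_b, z, lower, upper

# Values for imr_high taken from the streg output.
b, se_b, z, lower, upper = hazard_ratio_summary(1.259023, 0.0951873)
print(round(z, 2), round(lower, 6), round(upper, 6))
```

The interval is asymmetric around 1.259 because it is symmetric on the log scale and then exponentiated.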

SLIDE 20

Cox regression

. stcox imr_high ses

Cox regression -- Breslow method for ties

No. of subjects =      3,756                  Number of obs    =      3,756
No. of failures =        880
Time at risk    =   19824107
                                              LR chi2(2)       =      28.17
Log likelihood  = -5889.8259                  Prob > chi2      =     0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    imr_high |   1.261686   .0955679     3.07   0.002     1.087617    1.463614
         ses |   1.381581    .102833     4.34   0.000     1.194043    1.598573
------------------------------------------------------------------------------

SLIDE 21

Gompertz vs. Kaplan-Meier

[Figure: fitted Gompertz survival curve compared with the Kaplan-Meier estimate]

SLIDE 22

Gompertz AFT model

. staftgomp imr_high ses

Gompertz AFT regression                       No. of obs       =       3756
Log likelihood = -9325.8767                   LR chi2(2)       =      14.34
Baseline life expectancy = 11669.94           Prob > chi2      =     0.0008

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
    imr_high |  -.0496389   .0261027    -1.90   0.057    -.1007992    .0015214
         ses |  -.0873728   .0273233    -3.20   0.001    -.1409255   -.0338202
-------------+----------------------------------------------------------------
bp           |
      lambda |   8.434053   .0451675   186.73   0.000     8.345526    8.522579
       gamma |  -2.931995    .102271   -28.67   0.000    -3.132442   -2.731547
------------------------------------------------------------------------------

SLIDE 23

Post-estimation

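. lincom imr_high, eform

 ( 1)  [xb]imr_high = 0

------------------------------------------------------------------------------
          _t |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .951573   .0248386    -1.90   0.057     .9041146    1.001523
------------------------------------------------------------------------------

. nlcom exp([xb]imr_high)*11699

       _nl_1:  exp([xb]imr_high)*11699

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   11132.45   290.5866    38.31   0.000     10562.91    11701.99
------------------------------------------------------------------------------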
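Both tables can be checked by hand from the imr_high coefficient and standard error in the staftgomp output. The sketch below assumes a first-order delta-method standard error for a scaled exponential, an exponentiated interval for the eform result, and a symmetric interval for the nlcom result. The function name is hypothetical, and the constant 11699 is the one typed in the nlcom call.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal

def scaled_exp(b, se_b, scale=1.0):
    """Estimate and delta-method SE of scale * exp(b).

    d/db [scale * exp(b)] = scale * exp(b), so SE = scale * exp(b) * se_b.
    """
    estimate = scale * math.exp(b)
    return estimate, estimate * se_b

b, se_b = -0.0496389, 0.0261027  # [xb]imr_high from the staftgomp output

# lincom ..., eform: time ratio, with the interval exponentiated from the log scale.
ratio, se_ratio = scaled_exp(b, se_b)
ratio_ci = (math.exp(b - Z_975 * se_b), math.exp(b + Z_975 * se_b))

# nlcom exp(b)*11699: expected life in days, symmetric interval.
days, se_days = scaled_exp(b, se_b, scale=11699)
days_ci = (days - Z_975 * se_days, days + Z_975 * se_days)

print(ratio, se_ratio, ratio_ci)
print(days, se_days, days_ci)
```

The two intervals are built differently, which is why the eform interval just includes 1 while the nlcom interval just includes 11699, both consistent with p = 0.057.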

SLIDE 24

Post-estimation

• Baseline life expectancy
  → $\frac{11699}{365}$ days = 32.1 years
• Estimating for individuals after 16000 days
  → $\frac{11699 + 16000}{365}$ days = 75.9 years of age
• Effect of high imr at birth
  → $\frac{11132 + 16000}{365}$ days = 74.3 years of age
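The arithmetic on this slide can be reproduced in a few lines. The sketch uses the figures as they appear on the slides: 11699 days for the baseline (the staftgomp output itself prints 11669.94), 11132 days for the high-imr group from the nlcom result, and a 365-day year. Variable names are hypothetical.

```python
DAYS_PER_YEAR = 365

baseline_days = 11699      # baseline life expectancy as entered on the slide
high_imr_days = 11132      # nlcom estimate for high imr, rounded to whole days
offset_days = 16000        # number of days added on the slide

def to_years(days):
    """Convert days to years using a 365-day year."""
    return days / DAYS_PER_YEAR

baseline_years = to_years(baseline_days)
age_low_imr = to_years(baseline_days + offset_days)
age_high_imr = to_years(high_imr_days + offset_days)
difference = age_low_imr - age_high_imr

print(round(baseline_years, 1), round(age_low_imr, 1),
      round(age_high_imr, 1), round(difference, 1))
```

The difference depends only on the two expected lifetimes, not on the 16000-day offset, since the offset cancels.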

SLIDE 25

Conclusion

• Conclusion
  → Even if you survive past the age of 40, you still have, on average, a 1.6 years shorter life expectancy if you were born in a year with high imr
  → Latent effect
  → Support for the fetal origins hypothesis
  → Is the estimate reasonable?
• If needed
  → Mediation analysis and calculation of the direct, indirect and total effect of treatment
  → Here, total effect = direct effect
