A novel approach to competing risks analysis using case-base sampling

Maxime Turgeon
June 10th, 2017

McGill University, Department of Epidemiology, Biostatistics, and Occupational Health

1/19


Acknowledgements

This project is joint work with:

  • Sahir Bhatnagar
  • Olli Saarela (U. Toronto)
  • Jim Hanley

2/19


Introduction

Motivation

  • Jane Doe, 35 yo, received a stem-cell transplant for acute myeloid leukemia.

  • “What is my 5-year risk of relapse?”
  • P(Time to event < 5, Relapse | Covariates)
  • “What about 1-year? 2-year?”
  • A smooth absolute risk curve.

3/19

Current methods

  • Proportional (cause-specific) hazards assumption
      • Disease etiology
      • E.g. Cox regression.
  • Proportional subdistribution hazards assumption
      • Absolute risk
      • E.g. Fine-Gray model.

4/19

Summary

  • We propose a simple approach to directly modeling the cause-specific hazards using (smooth) parametric families.
  • Our approach relies on Hanley & Miettinen’s case-base sampling method [1].
  • Smooth hazards give rise to smooth absolute risk curves.
  • Our approach treats time as a covariate, which allows for a symmetric treatment of all time variables.
  • Finally, it also allows for hypothesis testing and variable selection.

This method is currently available in the R package casebase on CRAN. See also our website: http://sahirbhatnagar.com/casebase/

5/19
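To illustrate why smooth hazards give smooth absolute risk curves, here is a minimal Python sketch (not the casebase implementation) that integrates two cause-specific hazards into the absolute risk of one cause, using the standard cumulative incidence formula CIF_k(t) = ∫₀ᵗ λk(u) exp(−Σj Λj(u)) du. The Gompertz-type hazards, their parameter values, and the function names are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Hazard = Callable[[float], float]


def gompertz_hazard(a: float, b: float) -> Hazard:
    """Cause-specific hazard with log-hazard a + b * t (b = 0 gives a constant hazard)."""
    return lambda t: math.exp(a + b * t)


def cumulative_incidence(hazards: Sequence[Hazard], cause: int, t_max: float,
                         n_steps: int = 2000) -> List[Tuple[float, float]]:
    """Absolute risk of `cause` (0-based index) on a grid over [0, t_max].

    Trapezoid rule for CIF_k(t) = integral_0^t lambda_k(u) S(u) du, where
    S(u) = exp(-sum_j Lambda_j(u)) is the probability of no event of any type by u.
    """
    dt = t_max / n_steps
    cum_haz = 0.0                               # running sum_j Lambda_j(t)
    prev_total = sum(h(0.0) for h in hazards)   # sum_j lambda_j at the previous grid point
    prev_integrand = hazards[cause](0.0)        # lambda_k(0) * S(0), with S(0) = 1
    curve = [(0.0, 0.0)]
    for i in range(1, n_steps + 1):
        t = i * dt
        total = sum(h(t) for h in hazards)
        cum_haz += 0.5 * (prev_total + total) * dt
        integrand = hazards[cause](t) * math.exp(-cum_haz)
        risk = curve[-1][1] + 0.5 * (prev_integrand + integrand) * dt
        curve.append((t, risk))
        prev_total, prev_integrand = total, integrand
    return curve


# Hypothetical hazards (per month) for relapse (index 0) and a competing event (index 1).
relapse = gompertz_hazard(math.log(0.02), -0.01)
competing = gompertz_hazard(math.log(0.01), 0.005)
curve = cumulative_incidence([relapse, competing], cause=0, t_max=60.0)
print(f"60-month relapse risk: {curve[-1][1]:.3f}")
```

Because every grid point is computed from the same smooth hazards, the resulting curve can be read off at 1, 2, or 5 years without the step-function jumps of nonparametric estimators.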


Case-base sampling

[Figure: population-time plot of the cohort, with axes Follow-up time (months) and Population; legend: Relapse, Competing event, Base series]

6/19

Case-base sampling

  • The unit of analysis is a person-moment.
  • Case-base sampling reduces model fitting to a familiar multinomial regression.
  • The sampling process is taken into account using an offset term.
  • With a sufficiently large base series, the loss of information relative to the full cohort becomes negligible.
  • This framework can easily accommodate time-varying covariates (e.g. time-varying exposure).

7/19
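A minimal Python sketch of the sampling step, assuming the cohort is summarized by each subject's follow-up time and status (0 = censored, j = event of type j). The class and function names are hypothetical; this is not the casebase package's code. The base series is drawn uniformly over total person-time, so the sampling rate is ρ = B / (total person-time) for a base series of size B, and each row carries the offset log(1/ρ). Using a fixed B and a time-constant ρ is a simplifying assumption.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Subject:
    time: float  # follow-up time (e.g. months)
    status: int  # 0 = censored, j >= 1 = event of type j


@dataclass
class PersonMoment:
    subject: int     # index into the cohort, used to attach covariates later
    time: float      # moment of follow-up that was sampled
    event_type: int  # 0 = base series, j >= 1 = case of type j
    offset: float    # log(1 / rho), with rho = base_size / total person-time


def case_base_sample(cohort: Sequence[Subject], base_size: int,
                     rng: random.Random) -> List[PersonMoment]:
    """Hypothetical helper: case series plus a base series drawn uniformly over person-time."""
    total_pt = sum(s.time for s in cohort)
    offset = math.log(total_pt / base_size)
    # Case series: every observed event, at the moment it occurred.
    sample = [PersonMoment(i, s.time, s.status, offset)
              for i, s in enumerate(cohort) if s.status > 0]
    # Base series: pick subjects in proportion to their follow-up time, then a
    # moment uniformly within that follow-up; overall, uniform over person-time.
    chosen = rng.choices(range(len(cohort)), weights=[s.time for s in cohort], k=base_size)
    for i in chosen:
        sample.append(PersonMoment(i, rng.uniform(0.0, cohort[i].time), 0, offset))
    return sample


cohort = [Subject(12.0, 1), Subject(30.5, 0), Subject(4.2, 2)]
moments = case_base_sample(cohort, base_size=100, rng=random.Random(2017))
```

Each resulting row is one person-moment: `event_type` is the multinomial outcome, the sampled `time` (and any covariates looked up through `subject`) are predictors, and `offset` enters the linear predictor with a fixed coefficient of 1.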


Theoretical details

Assumptions

We make the following assumptions:

  • Each event type j = 1, …, m follows a non-homogeneous Poisson process with hazard λj(t).
  • At most one event type can occur.
  • Censoring is non-informative.
  • Case-base sampling follows a non-homogeneous Poisson process with hazard ρ(t).

8/19
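In the special case of constant hazards (homogeneous Poisson processes), these assumptions describe a race between the m event types: the first event occurs after an exponential time with rate Σj λj, and it is of type j with probability λj / Σj λj. A minimal Python sketch that ignores censoring; the function name and hazard values are illustrative assumptions.

```python
import random
from typing import Sequence, Tuple


def simulate_first_event(hazards: Sequence[float], rng: random.Random) -> Tuple[float, int]:
    """Time and type of the first event when type j has constant hazard hazards[j-1]."""
    total = sum(hazards)
    t = rng.expovariate(total)  # time to the first event of any type
    u = rng.random() * total    # type j is chosen with probability lambda_j / total
    cumulative = 0.0
    for j, lam in enumerate(hazards, start=1):
        cumulative += lam
        if u < cumulative:
            return t, j
    return t, len(hazards)      # guard against floating-point round-off


rng = random.Random(1)
draws = [simulate_first_event([0.05, 0.15], rng) for _ in range(20000)]
```

Only the first event is recorded, which is how the "at most one event type" assumption plays out in practice.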

Likelihood

Each person-moment’s contribution to the likelihood is of the form:

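$$\frac{\prod_{j=1}^{m} \lambda_j(t)^{dN_j(t)}}{\rho(t) + \sum_{j=1}^{m} \lambda_j(t)},$$

where dNj(t) = 1 if the person-moment is an event of type j, and 0 otherwise.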

This is reminiscent of a multinomial likelihood, with offset log(1/ρ(t)).

9/19
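A small numerical check of the multinomial connection, as a Python sketch with made-up values for λj(t) and ρ(t). A sampled person-moment is a base-series moment with probability ρ/(ρ + Σj λj) and a case of type j with probability λj/(ρ + Σj λj); the expression above agrees with this up to the factor ρ(t)^(1 − Σj dNj(t)), which does not involve the hazard parameters. Dividing through by ρ gives a multinomial logit with the base series as reference and linear predictor log λj(t) + log(1/ρ(t)), i.e. the offset on this slide.

```python
import math
from typing import List, Sequence


def case_base_probs(hazards: Sequence[float], rho: float) -> List[float]:
    """P(base series), P(case of type 1), ..., for a sampled person-moment at time t."""
    denom = rho + sum(hazards)
    return [rho / denom] + [lam / denom for lam in hazards]


def multinomial_probs(log_hazards: Sequence[float], rho: float) -> List[float]:
    """Multinomial logit with the base series as reference and offset log(1 / rho)."""
    offset = math.log(1.0 / rho)
    etas = [lh + offset for lh in log_hazards]  # log-odds of type j vs. base
    denom = 1.0 + sum(math.exp(e) for e in etas)
    return [1.0 / denom] + [math.exp(e) / denom for e in etas]


hazards = [0.02, 0.05]  # hypothetical lambda_1(t), lambda_2(t)
rho = 0.5               # hypothetical base-series sampling rate at t
direct = case_base_probs(hazards, rho)
via_glm = multinomial_probs([math.log(h) for h in hazards], rho)
```

The two lists coincide, which is why a standard multinomial (or, for m = 1, logistic) regression routine with an offset column can fit the hazard model.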

Likelihood

Main Theorem: The score function of the likelihood defined above has mean zero and is asymptotically normal.

Implication: All the GLM machinery (e.g. deviance tests, information criteria, regularization) is available to us.

10/19

Parametric families

We can fit any model of the following form:

log λ(t; α, β) = g(t; α) + βX.

Different choices of the function g lead to familiar parametric families:

  • Exponential: g is constant.
  • Gompertz: g(t; α) = αt.
  • Weibull: g(t; α) = α log t.

11/19
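In a case-base fit, these families differ only in which function of time enters the regression as a covariate: none (exponential), t (Gompertz), or log t (Weibull). A minimal Python sketch of that mapping, assuming a scalar covariate X and a separate intercept; `time_features` and `log_hazard` are hypothetical names, not casebase functions.

```python
import math
from typing import List, Sequence


def time_features(family: str, t: float) -> List[float]:
    """Time covariates whose coefficients alpha give alpha . features = g(t; alpha)."""
    if family == "exponential":
        return []               # g is constant, so it is absorbed by the intercept
    if family == "gompertz":
        return [t]              # g(t; alpha) = alpha * t
    if family == "weibull":
        return [math.log(t)]    # g(t; alpha) = alpha * log t (requires t > 0)
    raise ValueError(f"unknown family: {family}")


def log_hazard(family: str, t: float, intercept: float, alphas: Sequence[float],
               beta: float, x: float) -> float:
    """log lambda(t; alpha, beta) = intercept + g(t; alpha) + beta * x, for a scalar x."""
    feats = time_features(family, t)
    if len(alphas) != len(feats):
        raise ValueError("need one alpha per time feature")
    g = sum(a * f for a, f in zip(alphas, feats))
    return intercept + g + beta * x


# Example: a Weibull-type hazard, lambda(t) = exp(0.1 + 0.3 x) * t ** 0.5.
lam = math.exp(log_hazard("weibull", 2.0, intercept=0.1, alphas=[0.5], beta=0.3, x=1.0))
```

Replacing the single time feature with a spline basis in t gives the "case-base with splines" variant mentioned in the simulation study.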


Simulation study

Simulation scenario

  • We simulate 1000 datasets from an exponential and a Gompertz family.
  • Binary covariate
  • Random censoring
  • We compare case-base with a correctly specified family, case-base with splines, and Cox regression.

12/19
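The slides do not give the simulation parameters, so the following is only a Python sketch of one way to draw a single dataset from a Gompertz model (exponential when α = 0) with a binary covariate and random censoring. It uses inverse-transform sampling of the cumulative hazard Λ(t) = e^(a0 + βx)(e^(αt) − 1)/α. A single event type, exponential censoring, the parameter values, and the function name are all assumptions.

```python
import math
import random
from typing import List, Tuple


def simulate_gompertz(n: int, a0: float, alpha: float, beta: float,
                      censor_rate: float, rng: random.Random) -> List[Tuple[float, int, int]]:
    """Draw (observed time, event indicator, x) with hazard exp(a0 + alpha * t + beta * x)."""
    data = []
    for _ in range(n):
        x = rng.randint(0, 1)                 # binary covariate
        rate = math.exp(a0 + beta * x)
        e = rng.expovariate(1.0)              # Lambda(T) ~ Exp(1): inverse-transform sampling
        if alpha == 0.0:
            t_event = e / rate                # exponential special case
        else:
            arg = alpha * e / rate            # solve rate * (exp(alpha t) - 1) / alpha = e
            t_event = math.log1p(arg) / alpha if arg > -1.0 else math.inf
        c = rng.expovariate(censor_rate)      # random censoring time
        data.append((min(t_event, c), int(t_event <= c), x))
    return data


one_dataset = simulate_gompertz(400, a0=math.log(0.05), alpha=0.02, beta=0.7,
                                censor_rate=0.01, rng=random.Random(42))
```

Repeating this with different seeds, then fitting each method to every replicate, gives the sampling distribution of the estimated β.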


Simulation results

[Figure: estimates of Beta by Method (Case-base, Case-base/Splines, Cox), in two panels: Exponential and Gompertz]

13/19


Data analysis


Data

Variable description      Statistical summary
Sex                       M = Male (87), F = Female (72)
Disease                   ALL (59), AML (100)
Phase                     CR1 (43), CR2 (40), CR3 (10), Relapse (65)
Type of transplant        BM+PB (15), PB (144)
Age of patient (years)    16–62; 33 (IQR 19.5)
Failure time (months)     0.13–131.77; 20.28 (30.78)
Status indicator          0 = censored (40), 1 = relapse (49), 2 = competing event (70)

14/19


[Figure: Relapse risk vs. Time (in Months) by Method (Case-base, Fine-Gray, Kaplan-Meier), in two panels: Acute Lymphoid Leukemia and Acute Myeloid Leukemia]

Absolute risk of relapse for a female patient of median age, in relapse at transplant (stem cells from peripheral blood).

15/19


Model fit

                 Case-base                       Cox regression
Variable         Hazard ratio   95% CI           Hazard ratio   95% CI
Sex              0.64           (0.35, 1.20)     0.75           (0.42, 1.35)
Disease          0.54           (0.27, 1.07)     0.63           (0.34, 1.19)
Phase CR2        1.00           (0.37, 2.70)     0.95           (0.36, 2.51)
Phase CR3        1.25           (0.24, 6.53)     1.38           (0.28, 6.76)
Phase Relapse    4.71           (2.11, 10.54)    4.06           (1.85, 8.92)
Source           1.89           (0.40, 8.99)     1.49           (0.32, 6.85)
Age              0.99           (0.97, 1.02)     0.99           (0.97, 1.02)

16/19
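Hazard ratios are exp(coefficient) on the log-hazard scale. Assuming Wald-type 95% intervals (the slides do not say how the intervals were computed), a Python sketch of the conversion in both directions; the helper names are hypothetical.

```python
import math
from typing import Tuple

Z_975 = 1.959963984540054  # standard normal 97.5% quantile


def hazard_ratio_ci(log_hr: float, se: float) -> Tuple[float, float, float]:
    """Hazard ratio and Wald 95% CI from a log hazard ratio and its standard error."""
    return (math.exp(log_hr),
            math.exp(log_hr - Z_975 * se),
            math.exp(log_hr + Z_975 * se))


def log_hr_from_ci(lower: float, upper: float) -> Tuple[float, float]:
    """Recover (log HR, SE) from a Wald 95% CI; the HR is the geometric mean of the bounds."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2, (log_hi - log_lo) / (2 * Z_975)


# Phase Relapse, case-base column; only approximate, since the bounds are rounded.
approx_log_hr, approx_se = log_hr_from_ci(2.11, 10.54)
```

Putting both columns on the log scale this way makes it easier to compare the precision of the case-base and Cox estimates for each variable.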


Discussion

Discussion

  • We proposed a simple and flexible way of directly modeling the hazard function, using multinomial regression.
  • This leads to smooth estimates of the absolute risks.
  • We explicitly model time.
  • We can test the significance of covariates.

17/19


References I

  • J. A. Hanley and O. S. Miettinen. Fitting smooth-in-time prognostic risk functions via logistic regression. The International Journal of Biostatistics, 5(1), 2009.
  • O. Saarela. A case-base sampling method for estimating recurrent event intensities. Lifetime Data Analysis, pages 1–17, 2015.
  • O. Saarela and J. A. Hanley. Case-base methods for studying vaccination safety. Biometrics, 71(1):42–52, 2015.

18/19


References II

  • L. Scrucca, A. Santucci, and F. Aversa. Regression modeling of competing risk using R: an in depth guide for clinicians. Bone Marrow Transplantation, 45(9):1388–1395, 2010.

19/19


Questions or comments? For more details, visit http://sahirbhatnagar.com/casebase/
