SLIDE 1

Survival analysis

Elements of survival analysis

Gilbert Ritschard

Department of Econometrics and Laboratory of Demography, University of Geneva http://mephisto.unige.ch/biomining

APA-ATI Workshop on Exploratory Data Mining University of Southern California, Los Angeles, CA, July 2009

24/7/2009gr 1/22

SLIDE 2

Classical statistical approaches

Survival Approaches

Survival or Event history analysis (Blossfeld and Rohwer, 2002)

Focuses on one event. Concerned with the duration until the event occurs, or with the hazard of experiencing the event.

Survival curves: distribution of the duration until the event occurs, S(t) = p(T ≥ t).

Hazard models: regression-like models for S(t, x) or for the hazard h(t) = p(T = t | T ≥ t):

h(t, x) = g(t, β0 + β1x1 + β2x2(t) + ···)
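
To make the two definitions concrete, here is a minimal Python sketch. It assumes a small made-up sample of fully observed (uncensored) integer durations; the function name `discrete_hazard_and_survival` and the data are illustrative and do not come from the slides. It computes S(t) = p(T ≥ t) and h(t) = p(T = t | T ≥ t) as empirical proportions and checks the identity S(t + 1) = S(t)(1 − h(t)) that follows from these definitions.

```python
from collections import Counter
from fractions import Fraction

def discrete_hazard_and_survival(durations):
    """Empirical S(t) = p(T >= t) and h(t) = p(T = t | T >= t).

    Assumes every duration is fully observed (no censoring).
    """
    n = len(durations)
    events = Counter(durations)
    survival, hazard = {}, {}
    at_risk = n  # number of cases with T >= t
    for t in range(min(durations), max(durations) + 1):
        survival[t] = Fraction(at_risk, n)
        hazard[t] = Fraction(events[t], at_risk)
        at_risk -= events[t]
    return survival, hazard

# Hypothetical durations (for example, years until the event), illustration only.
durations = [1, 2, 2, 3, 3, 3, 4, 5]
S, h = discrete_hazard_and_survival(durations)

for t in sorted(S):
    print(t, float(S[t]), float(h[t]))

# The survival curve can be rebuilt from the hazards alone.
identity_holds = all(S[t + 1] == S[t] * (1 - h[t]) for t in range(1, 5))
print(identity_holds)
```

With censored data the proportions above are no longer valid as written; the risk set then has to be handled as in the Kaplan-Meier estimator.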

SLIDE 4

Survival curves (Switzerland, SHP 2002 biographical survey)

[Figure: survival curves for women. Horizontal axis: age (years); vertical axis: survival probability. One curve per event: leaving home, marriage, first childbirth, parents' death, last child left, divorce, widowing.]

SLIDE 5

Analysis of sequences

Frequencies of given subsequences

Essentially event sequences. Subsequences considered as categories ⇒ Methods for categorical data apply (Frequencies, cross tables, log-linear models, logistic regression, ...).
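
As an illustration of treating a subsequence as a category, the following Python sketch flags each event sequence according to whether it contains a given ordered pattern and reports the relative frequency. The event labels, the toy sequences and the helper name `contains_subsequence` are assumptions made for this example; the resulting 0/1 flag is the kind of categorical variable that could then enter a cross table or a logistic regression.

```python
def contains_subsequence(sequence, pattern):
    """True if the events in `pattern` occur in `sequence` in the same order
    (not necessarily consecutively)."""
    remaining = iter(sequence)
    # `event in remaining` advances the iterator up to the first match,
    # so later pattern events are only searched after earlier ones.
    return all(event in remaining for event in pattern)

# Hypothetical event sequences, for illustration only.
sequences = [
    ["leave_home", "marriage", "childbirth"],
    ["leave_home", "childbirth", "marriage"],
    ["marriage", "leave_home", "childbirth"],
    ["leave_home", "marriage", "childbirth", "divorce"],
]
pattern = ["marriage", "childbirth"]

flags = [contains_subsequence(s, pattern) for s in sequences]
frequency = sum(flags) / len(flags)
print(flags, frequency)
```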

Markov chain models

State sequences. Focuses on transition rates between states. Does the rate also depend on previous states? How many previous states are significant?
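
A minimal sketch of how first-order transition rates can be estimated from state sequences, assuming sequences coded as strings with one character per time unit (S = single, M = married, D = divorced; the coding and the data are made up for this example). Each rate is the share of transitions out of a state that go to a given next state.

```python
from collections import Counter, defaultdict

def transition_rates(sequences):
    """Estimate first-order transition rates p(next state | current state)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    rates = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        rates[state] = {nxt: c / total for nxt, c in next_counts.items()}
    return rates

# Hypothetical yearly state sequences, for illustration only.
sequences = ["SSMM", "SMMD", "SSSM"]
rates = transition_rates(sequences)
print(rates)
```

Asking whether the rate also depends on previous states amounts to conditioning on pairs (or longer tuples) of past states instead of on the current state alone, and comparing the fit of the resulting higher-order model.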

Optimal Matching (Abbott and Forrest, 1986)

State sequences. Edit distance (Levenshtein, 1966; Needleman and Wunsch, 1970) between pairs of sequences. Clustering of sequences.
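
The edit distance can be sketched with the usual dynamic programming recursion. The version below is a generic implementation with constant insertion/deletion and substitution costs; the parameter names `indel` and `substitution` and the example sequences are assumptions for this illustration, and optimal matching applications often use state-dependent substitution costs instead.

```python
def edit_distance(a, b, indel=1.0, substitution=1.0):
    """Minimal total cost of insertions, deletions and substitutions
    needed to turn sequence `a` into sequence `b`."""
    previous = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        current = [i * indel]
        for j, y in enumerate(b, start=1):
            cost = 0.0 if x == y else substitution
            current.append(min(previous[j] + indel,       # delete x
                               current[j - 1] + indel,    # insert y
                               previous[j - 1] + cost))   # match or substitute
        previous = current
    return previous[-1]

# Hypothetical state sequences (S = single, M = married, D = divorced).
print(edit_distance("SSMM", "SMMD"))
print(edit_distance("kitten", "sitting"))
```

The matrix of pairwise distances obtained this way is what a clustering algorithm would take as input.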


SLIDE 6

Typology of methods for life course data

Questions: descriptive

  Issue: duration/hazard

  • Survival curves: parametric (Weibull, Gompertz, ...) and non-parametric (Kaplan-Meier, Nelson-Aalen) estimators.

  Issue: state/event sequencing

  • Optimal matching clustering
  • Frequencies of given patterns
  • Discovering typical episodes

Questions: causality

  Issue: duration/hazard

  • Hazard regression models (Cox, ...)
  • Survival trees

  Issue: state/event sequencing

  • Markov models
  • Mobility trees
  • Association rules among episodes
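
For the non-parametric estimators named in the typology, here is a minimal pure-Python sketch of the Kaplan-Meier survival estimate and the Nelson-Aalen cumulative hazard on right-censored data. The function name, the argument names and the toy data are assumptions for this example. Note that the product below is the usual Kaplan-Meier estimate of p(T > t) just after each event time, which differs slightly from the convention S(t) = p(T ≥ t) used earlier in the slides.

```python
def km_and_nelson_aalen(times, observed):
    """Return (t, KM survival, Nelson-Aalen cumulative hazard) at each event time.

    times:    duration until event or censoring
    observed: 1 if the event was observed, 0 if the case is censored
    """
    data = sorted(zip(times, observed))
    n_at_risk = len(data)
    survival, cumulative_hazard = 1.0, 0.0
    result = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = leaving = 0
        # Collect all cases that end (event or censoring) at time t.
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events > 0:
            survival *= 1 - events / n_at_risk
            cumulative_hazard += events / n_at_risk
            result.append((t, survival, cumulative_hazard))
        n_at_risk -= leaving
    return result

# Hypothetical durations with two censored cases, for illustration only.
times = [2, 3, 3, 5, 7, 8]
observed = [1, 1, 0, 1, 0, 1]
estimates = km_and_nelson_aalen(times, observed)
for t, s, h in estimates:
    print(t, round(s, 4), round(h, 4))
```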

SLIDE 7

Table of contents

1 Survival Trees

  • The biographical SHP dataset
  • Survival Tree Principle
  • Example
  • Social Science Issues

SLIDE 9

SHP biographical retrospective survey

http://www.swisspanel.ch

SHP retrospective survey: 2001 (860 cases) and 2002 (4700 cases). We consider only data collected in 2002. Data completed with variables from the 2002 wave (language).

Characteristics of retained data for divorce (individuals who married at least once):

                            men      women    Total
Total                       1414     1656     3070
1st marriage dissolution    231      308      539
                            16.3%    18.6%    17.6%

SLIDE 11

Distribution by birth cohort

[Figure: histogram of birth year. Horizontal axis: year (ticks 1910 to 1960); vertical axis: frequency (ticks 100 to 500).]

SLIDE 12

Marriage duration until divorce

Survival curves

[Figure: two panels, "Marriage duration, women" and "Marriage duration, men". Horizontal axis: marriage duration (ticks 10 to 40); vertical axis: survival probability (ticks 0.5 to 1). One curve per birth cohort: 1942 and before, 1943-1952, 1953 and after.]

SLIDE 13

Marriage duration until divorce

Hazard model

Discrete time model (logistic regression on person-year data).

exp(B) gives the odds ratio, i.e. the multiplicative change in the odds h/(1 − h) when the covariate increases by 1 unit.

                        exp(B)          Sig.
birthyr                 1.0088          0.002
university              1.22            0.043
child                   0.73            0.000
language   unknown      1.47            0.000
           French       1.26            0.007
           German       1               ref
           Italian      0.89            0.537
Constant                0.0000000004    0.000
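
The person-year layout behind such a discrete time model can be sketched as follows. This is a minimal illustration under stated assumptions: the helper `person_years`, its argument names and the example marriage are hypothetical, and only the exp(B) value for birthyr is taken from the table above. Each marriage contributes one row per year at risk, with the outcome equal to 1 only in the year of the divorce; a logistic regression on these rows then models the yearly hazard h.

```python
def person_years(duration, divorced, covariates):
    """Expand one marriage into one row per year at risk.

    The outcome `event` is 1 only in the last year, and only if a divorce
    was observed; censored marriages contribute zeros only.
    """
    rows = []
    for year in range(1, duration + 1):
        event = int(divorced and year == duration)
        rows.append({"year": year, "event": event, **covariates})
    return rows

# Hypothetical marriage: divorce observed in the third year.
rows = person_years(3, True, {"birthyr": 1950, "child": 1})
for row in rows:
    print(row)

# Odds ratios multiply: exp(B) = 1.0088 per birth year (from the table)
# compounds over a ten-year difference in birth year.
odds_ratio_10_years = 1.0088 ** 10
print(round(odds_ratio_10_years, 3))
```

Reading the result: a cohort born ten years later has odds of divorce in any given year roughly 9% higher, other covariates being equal.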

SLIDE 15

Survival trees: Principle

Target is the survival curve or some other survival characteristic.

Aim: Partition the data set into groups that

  • differ as much as possible (max between-class variability)

    Example: Segal (1988) maximizes the difference in KM survival curves by selecting the split with the smallest p-value of the Tarone-Ware Chi-square statistic

    $$TW = \frac{\sum_i w_i \left( d_{i1} - E(D_i) \right)}{\left( \sum_i w_i^2 \,\mathrm{var}(D_i) \right)^{1/2}}$$

  • are as homogeneous as possible (min within-class variability)

    Example: Leblanc and Crowley (1992) maximize the gain in deviance (-log-likelihood) of relative risk estimates.
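
The statistic TW can be sketched for one candidate binary split as follows. This is a minimal illustration, not Segal's implementation: it assumes that i indexes the distinct event times, that E(D_i) and var(D_i) are the usual hypergeometric mean and variance of the number of events in group 1 given the risk sets, and that the weight is w_i = sqrt(n_i) with n_i the number at risk (the Tarone-Ware choice; a constant weight gives the log-rank statistic). Function name, argument names and data are made up for the example.

```python
from math import sqrt

def tarone_ware(times, observed, group, weight=sqrt):
    """Standardized two-sample statistic
    TW = sum_i w_i (d_i1 - E(D_i)) / sqrt(sum_i w_i^2 var(D_i)).

    group:  1 for the first child node of the split, 0 for the other
    weight: maps the number at risk n_i to w_i
    """
    event_times = sorted({t for t, e in zip(times, observed) if e})
    numerator = denominator = 0.0
    for t in event_times:
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, group) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, observed) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, observed, group)
                 if u == t and e and g == 1)
        expected = d * n1 / n
        if n > 1:
            variance = d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        else:
            variance = 0.0
        w = weight(n)
        numerator += w * (d1 - expected)
        denominator += w * w * variance
    return numerator / sqrt(denominator) if denominator > 0 else 0.0

# Hypothetical data: group 1 tends to survive longer than group 0.
times = [5, 8, 12, 20, 3, 4, 6, 9]
observed = [1, 0, 1, 1, 1, 1, 1, 0]
group = [1, 1, 1, 1, 0, 0, 0, 0]

tw = tarone_ware(times, observed, group)
tw_swapped = tarone_ware(times, observed, [1 - g for g in group])
print(tw, tw_swapped)
```

A negative value means fewer events than expected in group 1. For binary splits the squared statistic is referred to a chi-square distribution with one degree of freedom, so choosing the smallest p-value amounts to choosing the candidate split with the largest |TW|.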

SLIDE 19

Divorce, Switzerland, Differences in KM Survival Curves I

SLIDE 20

Divorce, Switzerland, Differences in KM Survival Curves II

[Figure: one KM survival curve per group. Horizontal axis ticks: 10 to 40; vertical axis ticks: 0.5 to 1.0. Groups:]

  • Cohort <= 1940 & Non French Speaking & University
  • Cohort <= 1940 & Non French Speaking & < University
  • Cohort <= 1940 & French Speaking
  • Cohort > 1940 & No Child & University
  • Cohort > 1940 & No Child & < University
  • Cohort > 1940 & Child & German or Italian Speaking
  • Cohort > 1940 & Child & French or Unknown Speaking

SLIDE 21

Divorce, Switzerland, Relative risk

SLIDE 22

Hazard model with interaction

Adding the interaction effects detected with the tree approach significantly improves the fit (sig ∆χ2 = 0.004).

                           exp(B)    Sig.
born after 1940            1.78      0.000
university                 1.22      0.049
child                      0.94      0.619
language   unknown         1.50      0.000
           French          1.12      0.282
           German          1         ref
           Italian         0.92      0.677
b_before_40*French         1.46      0.028
b_after_40*child           0.68      0.010
Constant                   0.008     0.000
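
Because the model is multiplicative on the odds scale, the effect of a covariate involved in an interaction is the product of its main-effect odds ratio and the interaction odds ratio. The short sketch below works this out with the exp(B) values from the table above; the dictionary keys and the helper `combined_odds_ratio` are names chosen for this illustration.

```python
# exp(B) values copied from the table on this slide.
odds_ratio = {
    "child": 0.94,
    "French": 1.12,
    "b_before_40*French": 1.46,
    "b_after_40*child": 0.68,
}

def combined_odds_ratio(*terms):
    """Multiply the odds ratios of all model terms that apply to a profile."""
    result = 1.0
    for term in terms:
        result *= odds_ratio[term]
    return result

# Effect of having a child, for people born after 1940.
child_after_1940 = combined_odds_ratio("child", "b_after_40*child")
# Effect of being French speaking (vs German), for people born before 1940.
french_before_1940 = combined_odds_ratio("French", "b_before_40*French")

print(round(child_after_1940, 2), round(french_before_1940, 2))
```

On this reading, having a child lowers the odds of divorce by about a third in the younger cohort while its main effect alone (0.94) is not significant, and the French-speaking effect is concentrated in the older cohort.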

SLIDE 24

Issues with survival trees in social sciences

1 Dealing with time-varying predictors

  • Segal (1992) discusses a few possibilities, none of them really satisfactory.
  • Huang et al. (1998) propose a piecewise constant approach suitable for discrete variables and a limited number of changes.
  • Room for development ...

2 Multi-level analysis

  • How can we account for multi-level effects in survival trees, and more generally in trees?
  • Conjecture: It should be possible to include an unobserved shared effect in deviance-based splitting criteria.
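
The data preparation that a piecewise constant treatment of a time-varying predictor relies on can be sketched as episode splitting: each spell is cut at the times where the covariate changes, so that the covariate is constant within every piece. The sketch below shows only this general idea, not the tree algorithm of Huang et al. (1998); the function `split_episodes`, its arguments and the example are hypothetical.

```python
def split_episodes(start, end, event, changes, initial_value):
    """Split one spell [start, end) into pieces with a constant covariate.

    changes: list of (time, new_value) pairs for the time-varying covariate
    Returns (t0, t1, value, event) tuples; the event flag is carried by the
    last piece only, all earlier pieces end without the event.
    """
    episodes = []
    t0, value = start, initial_value
    for time, new_value in sorted(changes):
        if time <= start:
            value = new_value  # change happened before the spell began
        elif time < end:
            episodes.append((t0, time, value, 0))
            t0, value = time, new_value
        # changes at or after `end` are ignored
    episodes.append((t0, end, value, int(event)))
    return episodes

# Hypothetical marriage lasting 12 years and ending in divorce;
# the covariate "child" switches from 0 to 1 after 3 years.
episodes = split_episodes(0, 12, True, [(3, 1)], 0)
print(episodes)
```

With many changes per individual the number of pieces grows quickly, which is consistent with the remark that the approach suits discrete variables with a limited number of changes.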

SLIDE 26

For Further Reading I

Abbott, A. and J. Forrest (1986). Optimal matching methods for historical sequences. Journal of Interdisciplinary History 16, 471–494.

Blossfeld, H.-P. and G. Rohwer (2002). Techniques of Event History Modeling, New Approaches to Causal Analysis (2nd ed.). Mahwah NJ: Lawrence Erlbaum.

Huang, X., S. Chen, and S. Soong (1998). Piecewise exponential survival trees with time-dependent covariates. Biometrics 54, 1420–1433.

Leblanc, M. and J. Crowley (1992). Relative risk trees for censored survival data. Biometrics 48, 411–425.

Levenshtein, V. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10, 707–710.

SLIDE 27

For Further Reading II

Needleman, S. and C. Wunsch (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48, 443–453.

Segal, M. R. (1988). Regression trees for censored data. Biometrics 44, 35–47.

Segal, M. R. (1992). Tree-structured methods for longitudinal data. Journal of the American Statistical Association 87(418), 407–418.