Cox's proportional hazards/regression model - model assessment
Rasmus Waagepetersen, October 19, 2020

Topics:
◮ Plots based on estimated cumulative hazards
◮ Cox-Snell residuals: overall check of fit
◮ Martingale residuals: assessment of functional form of covariate
◮ Deviance residuals: detection of outliers
◮ Score-process residuals: check of proportional hazards for each covariate
◮ Detection of influential observations

Why not just proceed as for linear normal models?
Issues:
◮ Censoring.
◮ For the Cox PH model we do not have a fully specified model, so we do not know the distribution of the residuals.
Generally, residual analysis is a bit tricky not only for survival data but for non-normal data in general: residuals tend to look 'ugly' even if the model is correct.

Model with one factor
Suppose we have observations $(t_{ij}, \delta_{ij})$ from groups $i = 1, \dots, K$, and for the $i$th group the model
$$h_i(t) = h_0(t) \exp(\beta_i).$$
Compute a cumulative hazard estimate $\hat H_i$ for each group. Recall
$$H_i(t) = H_0(t) \exp(\beta_i) \iff \log H_i(t) = \log H_0(t) + \beta_i.$$
Various types of plots can be considered:
1. $\log \hat H_i(t)$ against $t$
2. $\log \hat H_i$ vs $\log \hat H_j$
3. $\hat H_i$ vs $\hat H_j$
4. $\log \hat H_i(t) - \log \hat H_1(t)$ vs $t$
Under proportional hazards the curves in plot 1 are vertical shifts of each other, and the curves in plot 4 are roughly constant at $\beta_i - \beta_1$.
Alternatives 2.-4. require a bit of programming since the estimates are not obtained at the same time points.
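The "bit of programming" needed for plot 4 can be sketched as follows: compute a Nelson-Aalen estimate per group, treat it as a right-continuous step function, and evaluate both groups on the union of their death times. This is a minimal sketch, not the course's own code; the function names and the toy data are made up for illustration, and the choice of the Nelson-Aalen estimator is an assumption (the slide only says "a cumulative hazard estimate").

```python
import math

def nelson_aalen(times, events):
    """Return [(death time, cumulative hazard estimate)] sorted by time."""
    death_times = sorted({t for t, d in zip(times, events) if d == 1})
    cum, steps = 0.0, []
    for s in death_times:
        at_risk = sum(1 for t in times if t >= s)
        deaths = sum(1 for t, d in zip(times, events) if t == s and d == 1)
        cum += deaths / at_risk
        steps.append((s, cum))
    return steps

def step_value(steps, t):
    """Evaluate the step function at t (0 before the first jump)."""
    value = 0.0
    for s, h in steps:
        if s <= t:
            value = h
        else:
            break
    return value

def log_hazard_difference(steps_i, steps_1, grid):
    """log H_i(t) - log H_1(t) on a common grid; points where an estimate is 0 are skipped."""
    out = []
    for t in grid:
        hi, h1 = step_value(steps_i, t), step_value(steps_1, t)
        if hi > 0 and h1 > 0:
            out.append((t, math.log(hi) - math.log(h1)))
    return out

# Toy data, made up for illustration: times and event indicators per group.
g1_t, g1_d = [2, 3, 5, 7, 8], [1, 1, 0, 1, 1]
g2_t, g2_d = [1, 2, 4, 6, 9], [1, 1, 1, 0, 1]
H1 = nelson_aalen(g1_t, g1_d)
H2 = nelson_aalen(g2_t, g2_d)
grid = sorted({s for s, _ in H1} | {s for s, _ in H2})
diff = log_hazard_difference(H2, H1, grid)
```

Plotting the pairs in `diff` against time gives plot 4 for group 2 relative to group 1; a clear trend rather than a roughly horizontal pattern would speak against proportional hazards.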

Stratified Cox model
Suppose we have several covariates and the first is a factor dividing subjects into $K$ groups. Then a stratified Cox model is specified by
$$h_i(t \mid z_{-1}) = h_{0i}(t) \exp(z_{-1}^T \beta_{-1})$$
where $h_i(\cdot \mid z_{-1})$ is the hazard for a subject in the $i$th group with remaining covariate vector $z_{-1} = (z_2, \dots, z_p)^T$. That is, there is a separate baseline hazard $h_{0i}$ for each group/stratum.
If proportional hazards holds for the factor used for stratification, then
$$H_{0i}(t) = H_0(t) \exp(\beta_i).$$
So we can make plots similar to those on the previous slide to assess proportional hazards for the factor considered.
If we want to assess PH for a quantitative covariate, we can initially discretize it into a factor variable.

Martingale residuals
Martingale residuals:
$$r^M_i = \delta_i - \hat H_0(t_i) \exp(z_i^T \hat\beta)$$
Very skewed, with values in the interval $]-\infty, 1]$. Not useful for detecting outliers.
May be used for assessing the functional form of a covariate by computing $r^M_i$ for the model without the covariate and plotting $r^M_i$ against the omitted covariate. A curve fitted to the scatter plot may give an indication of a possible transformation of the covariate.
The reason for the terminology will become clearer when we later discuss counting processes and martingales.
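As an illustration of the formula, the sketch below computes martingale residuals for given linear predictors $z_i^T\beta$, using the Breslow estimator for $\hat H_0$. The Breslow estimator is an assumption here (the slide does not say which baseline estimate is used), and the function name and toy data are hypothetical. With `lin_pred=None` the residuals are those of the model without covariates, which is the version one would plot against an omitted covariate.

```python
import math

def martingale_residuals(times, events, lin_pred=None):
    """r_i = delta_i - H0_hat(t_i) * exp(lin_pred_i), with a Breslow baseline estimate."""
    n = len(times)
    if lin_pred is None:
        lin_pred = [0.0] * n  # model without covariates
    risk = [math.exp(e) for e in lin_pred]
    death_times = sorted({t for t, d in zip(times, events) if d == 1})
    # Breslow increment at each death time: deaths / sum of risk scores at risk.
    increments = []
    for s in death_times:
        deaths = sum(1 for t, d in zip(times, events) if t == s and d == 1)
        denom = sum(r for t, r in zip(times, risk) if t >= s)
        increments.append((s, deaths / denom))
    residuals = []
    for t, d, r in zip(times, events, risk):
        h0 = sum(inc for s, inc in increments if s <= t)
        residuals.append(d - h0 * r)
    return residuals

# Toy data, made up for illustration.
times = [2, 3, 5, 7, 8]
events = [1, 1, 0, 1, 1]
r_m = martingale_residuals(times, events)
r_m_cov = martingale_residuals(times, events, lin_pred=[0.5, 0.0, 0.5, 0.0, 0.5])
```

With this baseline estimate the residuals sum to zero and none exceeds 1, in line with the range stated above.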

Cox-Snell
Cox-Snell residuals are based on results for a continuous random variable $X$ with survivor function $S$ and cumulative hazard $H$:
$$S(X) \sim \mathrm{Unif}(]0, 1[), \qquad H(X) \sim \mathrm{Exp}(1).$$
Cox-Snell residual:
$$r^C_i = \hat H_0(t_i) \exp(z_i^T \hat\beta) = \delta_i - r^M_i$$
Cox-Snell residuals should look like a censored sample of unit-rate exponential random variables, which have $H(t) = t$. This can be checked by considering the estimated cumulative hazard for the $r^C_i$.
Cox-Snell residuals may be used for checking the overall fit of the model, but see the reservations in the practical notes in KM pages 358-359.
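A short derivation of the two distributional facts, sketched under the added assumption that $H$ is continuous and strictly increasing, so that $H^{-1}$ exists (this assumption is not stated on the slide). For $x > 0$,
$$P(H(X) > x) = P(X > H^{-1}(x)) = S(H^{-1}(x)) = \exp\big(-H(H^{-1}(x))\big) = e^{-x},$$
so $H(X) \sim \mathrm{Exp}(1)$. Since $S(X) = \exp(-H(X))$, for $0 < u < 1$
$$P(S(X) \le u) = P(H(X) \ge -\log u) = e^{\log u} = u,$$
so $S(X) \sim \mathrm{Unif}(]0, 1[)$.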

Deviance residuals
Deviance residuals are obtained by applying a 'symmetrizing' transformation to the martingale residuals:
$$r^D_i = \operatorname{sign}(r^M_i) \left[ -2 \left( r^M_i + \delta_i \log(\delta_i - r^M_i) \right) \right]^{1/2}.$$
These residuals should look (approximately) like a sample of iid normal random variables if the model is correct. However, under heavy censoring the distribution becomes bimodal.
May be useful for spotting outliers.
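The transformation is easy to apply once martingale residuals are available. A minimal sketch, with a hypothetical function name and made-up input values; the term $\delta_i \log(\delta_i - r^M_i)$ is taken to be 0 for censored observations ($\delta_i = 0$).

```python
import math

def deviance_residual(r_m, delta):
    """Deviance residual from a martingale residual r_m and event indicator delta."""
    # For delta = 0 the log term is multiplied by 0 and dropped.
    log_term = math.log(delta - r_m) if delta == 1 else 0.0
    inner = -2.0 * (r_m + log_term)
    sign = (r_m > 0) - (r_m < 0)
    # max() only guards against tiny negative values from rounding.
    return sign * math.sqrt(max(inner, 0.0))

# Martingale residuals and event indicators, made up for illustration.
r_m = [0.8, 0.55, -0.45, 0.05, -0.95]
delta = [1, 1, 0, 1, 1]
r_d = [deviance_residual(r, d) for r, d in zip(r_m, delta)]
```

The sign of each martingale residual is preserved, while the long left tail is pulled in towards zero.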

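Schoenfeld residuals and score process
For a time $t$, let $R_t$ denote the random index of the person that dies at $t$, given that the persons in $R(t)$ are at risk and that a death occurs at time $t$.
Recall that the score function $u(\beta)$ for Cox's partial likelihood is a sum of terms ($p$-dimensional vectors)
$$u_i(\beta) = z_i - E[z_{R_{t_i}} \mid \mathcal{H}(t_i)] = z_i - e_i, \quad i \in D,$$
where $D$ is the set of subjects with an observed death and $\mathcal{H}(t_i)$ is the history up to time $t_i$ (which determines $R(t_i)$ and that a death occurs at time $t_i$). Under the Cox model the conditional expectation is the risk-weighted average of the covariates in the risk set,
$$e_i = \frac{\sum_{k \in R(t_i)} z_k \exp(z_k^T \beta)}{\sum_{k \in R(t_i)} \exp(z_k^T \beta)}.$$
The components of these terms are also known as Schoenfeld residuals (KM page 376).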

We can define the score process (KM page 376) as
$$u(\beta, t) = \sum_{l \in D:\, t_l \le t} u_l(\beta).$$
By definition $u(\hat\beta, t) = 0$ for $t$ at or beyond the maximal observed death time.
KM suggest plotting the score process $u(\hat\beta, t)$ against time and comparing with 95% boundaries of a Brownian bridge process.
Martinussen and Scheike (2006), Dynamic Regression Models for Survival Data, suggest comparing with simulations of the score process under the assumed model.
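A minimal sketch of the score process for a single covariate, assuming no tied death times. The terms being cumulated are $z_l - e_l$, with $e_l$ the risk-weighted average of the covariate over the risk set at $t_l$. The estimate $\hat\beta$ is found by bisection on the score purely for brevity (this is not how standard software fits the model), and all names and the toy data are made up for illustration.

```python
import math

def schoenfeld_terms(times, events, z, beta):
    """[(death time, z_l - e_l)] sorted by time, for a scalar covariate and no ties."""
    out = []
    for t_l, d_l, z_l in zip(times, events, z):
        if d_l != 1:
            continue
        at_risk = [k for k, t_k in enumerate(times) if t_k >= t_l]
        w = [math.exp(z[k] * beta) for k in at_risk]
        e_l = sum(wk * z[k] for wk, k in zip(w, at_risk)) / sum(w)
        out.append((t_l, z_l - e_l))
    return sorted(out)

def score_process(times, events, z, beta):
    """Cumulative sums of the Schoenfeld terms over ordered death times."""
    total, path = 0.0, []
    for t, r in schoenfeld_terms(times, events, z, beta):
        total += r
        path.append((t, total))
    return path

def solve_score(times, events, z, lo=-10.0, hi=10.0, iters=100):
    """Bisection for the root of the score, which is decreasing in beta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score_process(times, events, z, mid)[-1][1] > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy data, made up for illustration.
times = [2, 3, 5, 7, 8]
events = [1, 1, 0, 1, 1]
z = [1, 0, 1, 0, 1]
beta_hat = solve_score(times, events, z)
path = score_process(times, events, z, beta_hat)
```

The last value in `path` is zero up to numerical error, so the process starts at 0, wanders, and is tied down at 0 at the last death time; this is the behaviour that makes a Brownian bridge the natural reference process.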

The score process can also be expressed as
$$u(\beta, t) = \sum_{i=1}^n \left[ \delta_i 1[t_i \le t] (z_i - e_i) - \exp(z_i^T \beta) \sum_{l \in D:\, t_l \le \min(t, t_i)} \frac{z_i - e_l}{\sum_{k \in R(t_l)} \exp(z_k^T \beta)} \right]$$
(we will see later why, when considering counting processes and martingales).
The score residuals are given by the components of the $n$ summands, one $p$-dimensional vector per subject (i.e. in total $np$ residuals). These are also available from the residuals function and can be cumulated to obtain the score process.

Assessment of time-varying effects
Suppose that we do not have proportional hazards for the $j$th covariate, in the sense that the true effect of $z_j$ is time-varying:
$$\beta_j(t) = \beta_j + \gamma_j g(t).$$
Let $r^S_{j,i}$ be the Schoenfeld residual scaled with the covariance matrix of $\hat\beta$. Then the expected value of $r^S_{j,i}$ is approximately equal to $\gamma_j g(t_i)$.
Thus a plot of scaled Schoenfeld residuals versus time may reveal deviations from proportional hazards. Implemented in the cox.zph procedure.
This is not covered in KM. See e.g. the book by Collett.

Influential observations
Do some observations have an unusually large influence on the estimation of $\beta$?
Let $\hat\beta$ and $\hat\beta_{-i}$ denote the estimates of $\beta$ based on the full data set and on the data with the $i$th observation omitted. We want to look for $i$ where $\hat\beta - \hat\beta_{-i}$ is an outlier.
Based on the score process residuals it is possible to compute an approximation of $\hat\beta_{-i}$, i.e. we do not need to fit a Cox model for every data set obtained by omitting one observation.
The resulting estimates of $\hat\beta - \hat\beta_{-i}$ are called dfbeta in the residuals function for coxph objects.
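For intuition, the exact differences $\hat\beta - \hat\beta_{-i}$ that dfbeta approximates can be computed by brute force on a small data set. The sketch below does this for a single covariate, assuming no tied death times, and refits by bisection on the score for brevity. It is not the dfbeta approximation itself, and the names and toy data are made up for illustration.

```python
import math

def score(times, events, z, beta):
    """Partial-likelihood score for a scalar covariate, assuming no tied death times."""
    total = 0.0
    for t_i, d_i, z_i in zip(times, events, z):
        if d_i == 1:
            at_risk = [k for k, t_k in enumerate(times) if t_k >= t_i]
            w = [math.exp(z[k] * beta) for k in at_risk]
            total += z_i - sum(wk * z[k] for wk, k in zip(w, at_risk)) / sum(w)
    return total

def fit(times, events, z, lo=-10.0, hi=10.0, iters=100):
    """Bisection for the root of the score, which is decreasing in beta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(times, events, z, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def leave_one_out_differences(times, events, z):
    """Return beta_hat and the list of beta_hat - beta_hat_{-i}."""
    beta_full = fit(times, events, z)
    diffs = []
    for i in range(len(times)):
        keep = [k for k in range(len(times)) if k != i]
        beta_minus_i = fit([times[k] for k in keep],
                           [events[k] for k in keep],
                           [z[k] for k in keep])
        diffs.append(beta_full - beta_minus_i)
    return beta_full, diffs

# Toy data, made up for illustration: all deaths observed, alternating covariate.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
z = [1, 0, 1, 0, 1, 0, 1, 0]
beta_hat, diffs = leave_one_out_differences(times, events, z)
```

Observations whose entry in `diffs` stands out from the rest are the influential ones. This costs one refit per observation, which is exactly what the approximation based on score residuals avoids.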

Use of formal testing?
KM note 5 on page 380 advocates the use of graphical checks rather than formal tests.
This is because we know that any statistical model is just an approximation and thus is bound to be rejected if the sample size is large enough.
Remember the famous quote by Box: 'all models are wrong but some are useful'.
Graphical checks may reveal whether there are any serious deviations between model and data, and possibly also hint at the cause of such deviations.
