Variable Selection and Model Choice in Survival Models with Time-Varying Effects


  1. Variable Selection and Model Choice in Survival Models with Time-Varying Effects: Boosting Survival Models. Benjamin Hofner [1], Department of Medical Informatics, Biometry and Epidemiology (IMBE), Friedrich-Alexander-Universität Erlangen-Nürnberg; joint work with Thomas Kneib and Torsten Hothorn, Department of Statistics, Ludwig-Maximilians-Universität München. useR! 2008. [1] benjamin.hofner@imbe.med.uni-erlangen.de

  2. Introduction
     Cox PH model: λ_i(t) = λ(t, x_i) = λ_0(t) exp(x_i'β), with
       λ_i(t)   hazard rate of observation i (i = 1, ..., n)
       λ_0(t)   baseline hazard rate
       x_i      vector of covariates for observation i
       β        vector of regression coefficients
     Problem: this is a restrictive model, allowing neither non-proportional hazards (e.g., time-varying effects) nor non-linear effects.
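For orientation, the standard Cox PH model on this slide can be fitted in R with the survival package; a minimal sketch using the lung data that ships with survival (the covariates age and sex belong to that example data set, not to the talk):

```r
## Standard Cox PH fit as a point of reference; cox.zph() tests the
## proportional-hazards assumption whose violation (time-varying effects)
## motivates the extensions in this talk.
library(survival)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(fit)   # hazard ratios exp(beta) and Wald tests
cox.zph(fit)   # small p-values indicate non-proportional hazards
```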

  3. Additive Hazard Regression
     Generalisation: additive hazard regression (Kneib & Fahrmeir, 2007)
       λ_i(t) = exp(η_i(t))   with   η_i(t) = Σ_{j=1}^J f_j(x_i(t)),
     a generic representation of covariate effects f_j(x_i):
       a) linear effects:        f_j(x_i(t)) = f_linear(x̃_i) = x̃_i β
       b) smooth effects:        f_j(x_i(t)) = f_smooth(x̃_i)
       c) time-varying effects:  f_j(x_i(t)) = f_smooth(t) · x̃_i
     where x̃_i ∈ x_i(t). Note: c) includes the log-baseline hazard for x̃_i ≡ 1.
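A small toy illustration (my own; f_smooth, xtilde and beta are made-up stand-ins, not from the talk) of how the three cases differ when evaluated for one observation:

```r
## Toy illustration of cases a)-c); f_smooth is only a stand-in for an
## estimated P-spline.
f_smooth <- function(z) sin(z)
xtilde   <- 0.7
beta     <- 1.3
f_a <- function(t) xtilde * beta         # a) linear effect (constant in t)
f_b <- function(t) f_smooth(xtilde)      # b) smooth effect (constant in t)
f_c <- function(t) f_smooth(t) * xtilde  # c) time-varying effect
## hazard at a few time points if the predictor were eta_i(t) = f_a(t) + f_c(t);
## with xtilde = 1, case c) plays the role of the log-baseline hazard
sapply(c(0.5, 1, 2), function(t) exp(f_a(t) + f_c(t)))
```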

  4. P-Splines
     Flexible terms can be represented using P-splines (Eilers & Marx, 1996).
     Model term (x can be either x̃_i or t):
       f_j(x) = Σ_{m=1}^M β_{jm} B_{jm}(x)   (j = 1, ..., J)
     Penalty:
       pen_j(β_j) = κ_j β_j' K β_j   in cases b), c)
       pen_j(β_j) = 0                in case a)
     with K = D'D (i.e., the cross product of the difference matrix D), e.g.
       D = ( 1 −2  1  0 ... )
           ( 0  1 −2  1 ... )
     κ_j is the smoothing parameter (larger κ_j ⇒ more penalization ⇒ smoother fit).
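A minimal sketch (base R plus the splines package; my own code, not the talk's) of the two ingredients: a cubic B-spline design matrix on equidistant knots and the second-order difference penalty K = D'D shown above.

```r
## P-spline building blocks: B-spline design matrix B and penalty K = D'D.
library(splines)
set.seed(1)
x   <- runif(100)
M   <- 20                                       # number of basis functions
deg <- 3                                        # cubic B-splines
dx  <- diff(range(x)) / (M - deg)
kn  <- min(x) + ((-deg):M) * dx                 # equidistant knots, extended range
B   <- splineDesign(kn, x, ord = deg + 1, outer.ok = TRUE)  # n x M design matrix
D   <- diff(diag(M), differences = 2)           # difference matrix as on the slide
K   <- crossprod(D)                             # penalty matrix K = D'D
## penalized least-squares fit for a given kappa (larger kappa => smoother fit):
y        <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
kappa    <- 10
beta_hat <- solve(crossprod(B) + kappa * K, crossprod(B, y))
```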

  5. Inference
     Penalized likelihood criterion (NB: this is the full log-likelihood):
       L_pen(β) = Σ_{i=1}^n [ δ_i η_i(t_i) − ∫_0^{t_i} exp(η_i(t)) dt ] − Σ_j pen_j(β_j)
     with
       T_i                   true survival time
       C_i                   censoring time
       t_i = min(T_i, C_i)   observed survival time (right censoring)
       δ_i = 1(T_i ≤ C_i)    indicator for non-censoring
     Problem: estimation and, in particular, model choice.
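As a sketch of what evaluating this criterion involves (my own toy code, not from the talk), the unpenalized part can be computed by numerical integration of the cumulative hazard for each observation:

```r
## Unpenalized full log-likelihood for a given additive predictor eta(t);
## toy data and a toy predictor, integral evaluated numerically per observation.
set.seed(1)
n     <- 50
time  <- rexp(n, rate = 0.3)                 # observed times t_i
delta <- rbinom(n, 1, 0.7)                   # non-censoring indicators delta_i
eta   <- function(t) -1 + 0.2 * log(t + 1)   # toy predictor (same for all i)

full_loglik <- function(eta, time, delta) {
  sum(vapply(seq_along(time), function(i) {
    cum_haz <- integrate(function(s) exp(eta(s)), lower = 0, upper = time[i])$value
    delta[i] * eta(time[i]) - cum_haz
  }, numeric(1)))
}
full_loglik(eta, time, delta)
```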

  6. CoxflexBoost
     Aim: maximization of a (potentially) high-dimensional log-likelihood with different modeling alternatives.
     Thus, we use an iterative, likelihood-based boosting algorithm with component-wise base-learners:
     one base-learner g_j(·) for each covariate (or each model component), j ∈ {1, ..., J}.
     Component-wise boosting serves as a means of estimation and variable selection, combined with model choice.

  9. CoxflexBoost Algorithm
     (i) Initialization: set the iteration index m := 0.
         Function estimates (for all j ∈ {1, ..., J}):  f̂_j^[0](·) ≡ 0
         Offset (MLE for a constant log-hazard):  η̂^[0](·) ≡ log( Σ_{i=1}^n δ_i / Σ_{i=1}^n t_i )
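The offset is the maximizer of the full log-likelihood under a constant log-hazard; a short worked check (my derivation, not spelled out on the slide):

```latex
\[
\ell(c) \;=\; \sum_{i=1}^{n}\bigl(\delta_i\, c - t_i\, e^{c}\bigr), \qquad
\ell'(c) \;=\; \sum_{i=1}^{n}\delta_i \;-\; e^{c}\sum_{i=1}^{n} t_i \;=\; 0
\;\;\Longrightarrow\;\;
\hat c \;=\; \log\!\left(\frac{\sum_{i=1}^{n}\delta_i}{\sum_{i=1}^{n} t_i}\right),
\]
```

which is exactly the offset η̂^[0](·) used in step (i).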

  10. (ii) Estimation: set m := m + 1.
      Fit all (linear/P-spline) base-learners separately,
        ĝ_j = g_j(·; β̂_j),  for all j ∈ {1, ..., J},
      by penalized MLE, i.e.,
        β̂_j = argmax_β L_{j,pen}^[m](β),
      with the penalized log-likelihood (analogous to the criterion above)
        L_{j,pen}^[m](β) = Σ_{i=1}^n [ δ_i · ( η̂_i^[m−1](t_i) + g_j(x_i(t_i); β) )
                           − ∫_0^{t_i} exp( η̂_i^[m−1](t̃) + g_j(x_i(t̃); β) ) dt̃ ] − pen_j(β),
      i.e., the additive predictor η_i is split into the estimate η̂_i^[m−1] from the previous iteration and the current base-learner g_j(·; β).

  11. (iii) Selection: choose the base-learner ĝ_{j*} with
        j* = argmax_{j ∈ {1, ..., J}} L_{j,unpen}^[m](β̂_j)
      (iv) Update:
        Function estimates (for all j ∈ {1, ..., J}):
          f̂_j^[m] = f̂_j^[m−1] + ν · ĝ_j   if j = j*
          f̂_j^[m] = f̂_j^[m−1]             if j ≠ j*
        Additive predictor (= fit):  η̂^[m] = η̂^[m−1] + ν · ĝ_{j*}
        with step-length ν ∈ (0, 1] (here: ν = 0.1)
      (v) Stopping rule: continue iterating steps (ii) to (iv) until m = m_stop.
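A minimal, self-contained sketch of steps (i)-(v) in R (my own toy code, not the CoxflexBoost implementation): only linear base-learners, effects constant in time, so the integral in the full log-likelihood reduces to t_i · exp(η_i); penalization is omitted, so fitting and selection both use the unpenalized likelihood.

```r
## Toy component-wise likelihood boosting for a survival model with constant
## hazards exp(eta_i); illustrates steps (i)-(v), not the actual package code.
set.seed(1)
n  <- 200
x  <- matrix(rnorm(3 * n), ncol = 3)          # three candidate covariates
tt <- rexp(n, rate = exp(0.5 * x[, 1]))       # true effect only for x1
cc <- rexp(n, rate = 0.1)                     # censoring times
time  <- pmin(tt, cc)
delta <- as.numeric(tt <= cc)

loglik <- function(eta) sum(delta * eta - time * exp(eta))  # full log-likelihood

nu    <- 0.1                                  # step-length
mstop <- 100
eta   <- rep(log(sum(delta) / sum(time)), n)  # (i) offset: constant log-hazard MLE
coefs <- numeric(ncol(x))

for (m in seq_len(mstop)) {
  ## (ii) fit each base-learner separately, keeping the current fit as offset
  fits <- sapply(seq_len(ncol(x)), function(j)
    optimize(function(b) loglik(eta + b * x[, j]),
             interval = c(-5, 5), maximum = TRUE)$maximum)
  ## (iii) select the base-learner with the largest log-likelihood
  ll <- sapply(seq_len(ncol(x)), function(j) loglik(eta + fits[j] * x[, j]))
  js <- which.max(ll)
  ## (iv) weak update with step-length nu
  coefs[js] <- coefs[js] + nu * fits[js]
  eta       <- eta + nu * fits[js] * x[, js]
}                                             # (v) stop at m = mstop
coefs  # in this toy run, x1 should receive most of the updates
```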

  12. Some Aspects of CoxflexBoost
      Estimation:     full penalized MLE, scaled by the step-length ν
      Selection:      based on the unpenalized log-likelihood L_{j,unpen}^[m]
      Base-learners:  specified via (initial) degrees of freedom df_j
      Likelihood-based boosting (in general): see, e.g., Tutz and Binder (2006)
      The above aspects in CoxflexBoost: see, e.g., model-based boosting (Bühlmann & Hothorn, 2007)

  13. Degrees of Freedom
      Specifying df is more intuitive than specifying the smoothing parameter κ, and it makes flexible terms comparable to other modeling components, e.g., linear effects.
      Problem: df is not constant over the (boosting) iterations.
      But simulation studies showed no big deviation from the initially specified df_j.
      [Figure: estimated degrees of freedom df(m) for the flexible base-learner bbs(x3), traced over the boosting iterations m (in 200 replicates), together with the initially specified degrees of freedom (dashed line).]
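A sketch (my own, unweighted least-squares case for simplicity) of how an initial df can be translated into a smoothing parameter via the trace definition df(κ) = tr{(B'B + κK)⁻¹ B'B}, assuming B and K as in the P-spline sketch after slide 4:

```r
## Translate a target df into a smoothing parameter kappa (and back);
## assumes the design matrix B and penalty K from the P-spline sketch above.
df_of_kappa <- function(kappa, B, K) {
  BtB <- crossprod(B)
  sum(diag(solve(BtB + kappa * K, BtB)))      # trace of the hat-matrix core
}
kappa_for_df <- function(df_target, B, K, upper = 1e8) {
  ## df_target must exceed 2 here: with second-order differences, df -> 2
  ## as kappa -> Inf (cf. the decomposition on slide 16)
  uniroot(function(k) df_of_kappa(k, B, K) - df_target,
          interval = c(1e-8, upper))$root
}
kappa_4 <- kappa_for_df(4, B, K)              # kappa corresponding to df = 4
df_of_kappa(kappa_4, B, K)                    # check: approximately 4
```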

  14. Model Choice
      Recall from the generic representation that f_j(x̃_i) can be
        a) a linear effect:        f_j(x_i(t)) = f_linear(x̃_i) = x̃_i β
        b) a smooth effect:        f_j(x_i(t)) = f_smooth(x̃_i)
        c) a time-varying effect:  f_j(x_i(t)) = f_smooth(t) · x̃_i
      ⇒ We see: x̃_i can enter the model in three different ways. But how do we decide which one?
      Add all possibilities as base-learners to the model; boosting can then choose between them (see the sketch below).
      But the df must be comparable! Otherwise, the more flexible base-learners are preferred.
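To make the idea concrete, a sketch in mboost-style base-learner notation (Bühlmann & Hothorn, 2007); this is purely illustrative and not the CoxflexBoost interface, the names x1 and time are placeholders, and the centred smooth terms with df = 1 already use the decomposition discussed on slide 16.

```r
## Illustration only: one covariate offered in all three representations with
## comparable degrees of freedom (mboost-style notation, not CoxflexBoost's API).
library(mboost)
fm <- ~ bols(x1, intercept = FALSE) +               # a) linear effect, df = 1
        bbs(x1, center = TRUE, df = 1) +            # b) smooth deviation from a)
        bbs(time, by = x1, center = TRUE, df = 1)   # c) time-varying effect
```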

  16. For higher-order differences (d ≥ 2), the df cannot be reduced to 1 (df > 1 even as κ → ∞), since a polynomial of order d − 1 remains unpenalized.
      Solution: decomposition (based on Kneib, Hothorn, & Tutz, 2008)
        g(x) = [ β_0 + β_1 x + ... + β_{d−1} x^{d−1} ] + g_centered(x),
      where the bracketed term is the unpenalized, parametric part and g_centered(x) is the deviation from the polynomial.
      Add the unpenalized part as separate, parametric base-learners.
      Assign df = 1 to the centered effect (and add it as a P-spline base-learner).
      Proceed analogously for time-varying effects.
      Technical realization (see Fahrmeir, Kneib, & Lang, 2004): decompose the vector of regression coefficients β into (β̃_unpen, β̃_pen) utilizing a spectral decomposition of the penalty matrix (see the sketch below).
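A sketch of this technical realization (my own base R code, reusing B and K from the P-spline sketch after slide 4): the spectral decomposition of K splits the coefficients into an unpenalized polynomial part and a penalized deviation whose new penalty is the identity.

```r
## Spectral decomposition of the penalty matrix K = Gamma Lambda Gamma':
## the null space of K gives the unpenalized (polynomial) part, the rest is
## rescaled so that its penalty becomes the identity matrix.
eig <- eigen(K, symmetric = TRUE)
pen <- eig$values > 1e-10                      # strictly positive eigenvalues
X_unpen <- B %*% eig$vectors[, !pen]                                     # polynomial part
X_pen   <- B %*% eig$vectors[, pen] %*% diag(1 / sqrt(eig$values[pen]))  # deviation part
qr(X_unpen)$rank   # = 2 for second-order differences: constant + linear term
```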

  17. Early Stopping
      1. Run the algorithm for m_stop iterations (m_stop defined beforehand).
      2. Determine a new m̂_stop,opt ≤ m_stop
         ... based on an out-of-bag sample (easy to use in simulations; sketched below), or
         ... based on an information criterion, e.g., AIC.
      ⇒ Prevents the algorithm from stopping in a local maximum (of the log-likelihood).
      ⇒ Early stopping prevents overfitting.
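A sketch of the out-of-bag variant (my own toy code, reusing the data x, time, delta and the settings nu, mstop from the boosting sketch after slide 11): track the out-of-bag log-likelihood along the boosting path and stop where it is maximal.

```r
## Early stopping on an out-of-bag sample for the toy boosting loop above.
oob <- sample(n, n %/% 4)                     # hold out 25% of the observations
inb <- setdiff(seq_len(n), oob)
ll_on <- function(eta, idx) sum(delta[idx] * eta[idx] - time[idx] * exp(eta[idx]))

eta    <- rep(log(sum(delta[inb]) / sum(time[inb])), n)   # offset from in-bag data
ll_oob <- numeric(mstop)
for (m in seq_len(mstop)) {
  fits <- sapply(seq_len(ncol(x)), function(j)
    optimize(function(b) ll_on(eta + b * x[, j], inb),
             interval = c(-5, 5), maximum = TRUE)$maximum)
  ll  <- sapply(seq_len(ncol(x)), function(j) ll_on(eta + fits[j] * x[, j], inb))
  js  <- which.max(ll)
  eta <- eta + nu * fits[js] * x[, js]
  ll_oob[m] <- ll_on(eta, oob)                # out-of-bag log-likelihood at step m
}
mstop_opt <- which.max(ll_oob)                # early-stopping iteration
```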
