Boosted Tree Ensembles for Predicting Postsurgical ICU Mortality


slide-1
SLIDE 1

Boosted Tree Ensembles for Predicting Postsurgical ICU Mortality

Garrick Aden-Buie, Yun Chen, Rashad Kayal, Gina Romero, Hui Yang

Dept. of Industrial and Management Sciences Engineering

College of Engineering, University of South Florida, Tampa, FL

INFORMS Annual Meeting 2013, Minneapolis, MN

slide-2
SLIDE 2

Motivation MIMIC II Clinical Data Methods Results

Outline

◮ Motivation
◮ MIMIC II Clinical Data
◮ Methods
◮ Results

  • G. Aden-Buie

Boosted Tree Ensembles for Predicting Postsurgical ICU Mortality 1

slide-4
SLIDE 4

Trends in Critical Care in US

◮ Critical care beds increased by 6.5% (2000-2005)
◮ Despite 12.2% decrease in hospitals with critical care and 4.2% reduction overall in hospital beds
◮ Constrained ICU capacity
◮ High quality care: safe, effective, equitable, patient-centered, timely and efficient (IOM)

Halpern, Neil A., and Stephen M. Pastores, 2010. "Critical Care Medicine in the United States 2000-2005."

slide-5
SLIDE 5

Acuity Scores in ICUs

◮ Existing acuity scores
  ◮ APACHE
  ◮ SAPS
  ◮ MPM
  ◮ SOFA
◮ Aim to compensate for population differences to objectively compare practices across ICUs
◮ Need for patient-specific prognostic models

slide-6
SLIDE 6

Objective

◮ To develop a data-driven, patient-specific prognostic model to predict in-hospital death in post-surgical ICU patients.
◮ To support effective, efficient use of critical care resources

slide-7
SLIDE 7

Overview

◮ We created and evaluated a gradient boosted trees model using routine patient data recorded during the first 48 hours of an ICU visit.
◮ Uses heterogeneous, routinely collected data
◮ Requires minimal preprocessing
◮ Effectively addresses sampling and missing information issues
◮ Accurately predicts in-hospital mortality

slide-9
SLIDE 9

MIMIC II Clinical Data

◮ Physiologic signals and vital signs from patient monitoring and hospital information systems
◮ PhysioNet Computing in Cardiology 2012 Challenge
◮ 12,000 patients divided into 3 sets of 4,000
  ◮ Set A: Training
  ◮ Set B: Validation
  ◮ Set C: Testing
◮ Inclusion criteria
  ◮ Age ≥ 16 years
  ◮ Initial ICU stay ≥ 48 hrs

http://www.physionet.org/challenge/2012/

slide-10
SLIDE 10

MIMIC II Clinical Data

◮ Physiologic signals and vital signs from patient monitoring and hospital information systems
◮ PhysioNet Computing in Cardiology 2012 Challenge
◮ 12,000 patients divided into 3 sets of 4,000
  ◮ Set A: Training
  ◮ Set B: Testing
◮ Inclusion criteria
  ◮ Age ≥ 16 years
  ◮ Initial ICU stay ≥ 48 hrs

http://www.physionet.org/challenge/2012/

slide-11
SLIDE 11

Input Variables

◮ Up to 41 variables recorded per patient
  ◮ 5 general descriptors
  ◮ 36 time series variables

slide-12
SLIDE 12

General Descriptors

Variable            Mean        S.D.
Age                 64.5 yrs    17.1
Height              169.5 cm    17.1
Weight              81.2 kg     23.8

Gender              Male: 56.1%, Female: 43.8%
ICU Type            Medical: 35.8%, Surgical: 28.4%, Cardiac surgery: 21.1%, Coronary: 21.1%
In-Hospital Death   13.85%

slide-13
SLIDE 13

Time Series Variables

36 variables describing

◮ Arterial Blood Gasses
◮ Cardiac Biomarkers
◮ Blood Count
◮ Consciousness
◮ Hepatic Function
◮ Overall Condition
◮ Renal Function
◮ Serum Electrolytes
◮ Ventilation Support
◮ Vital Signs

slide-14
SLIDE 14

Example Patient: Survived

[Figure: one time series panel per variable, each plotted against Time. Panels: Albumin, ALP, ALT, AST, Bilirubin, Cholesterol, SaO2, TroponinI, TroponinT, MechVent, BUN, Creatinine, Glucose, HCO3, HCT, K, Lactate, Mg, Na, PaCO2, PaO2, pH, Platelets, WBC, DiasABP, NIDiasABP, MAP, NIMAP, SysABP, NISysABP, GCS, HR, Temp, Urine.Sum, FiO2, Weight.]

Patient 133659, Outcome: 0, Female, Age: 46, Weight: 220 lbs, Height: 5' 10", BMI: 31.63 kg/m², ICUType: 1 (Coronary Care)

slide-15
SLIDE 15

Example Patient: In-Hospital Death

[Figure: one time series panel per variable, each plotted against Time. Panels: Albumin, ALP, ALT, AST, Bilirubin, Cholesterol, SaO2, TroponinI, TroponinT, MechVent, BUN, Creatinine, Glucose, HCO3, HCT, K, Lactate, Mg, Na, PaCO2, PaO2, pH, Platelets, WBC, DiasABP, NIDiasABP, MAP, NIMAP, SysABP, NISysABP, GCS, HR, Temp, Urine.Sum, FiO2, Weight.]

Patient 142106, Outcome: 1, Male, Age: 70, Weight: 115 lbs, Height: 5' 2", BMI: 20.96 kg/m², ICUType: 1 (Coronary Care)

slide-17
SLIDE 17

Preprocessing Overview

◮ Correct implausible values
◮ Categorize variables by
  1. Consistency of inclusion
  2. Number of observations when recorded
◮ Missing information
◮ Feature extraction
◮ Feature selection

slide-18
SLIDE 18

Consistency of variable inclusion

[Figure: inclusion of each variable (axis ticks at 25, 50, 75, 100), with legend Type: Infrequent, Time Series. Variables shown: DiasABP, HR, MAP, SysABP, BUN, Creatinine, GCS, HCT, Temp, Platelets, WBC, Na, HCO3, K, Mg, Glucose, Urine.Sum, pH, PaCO2, PaO2, FiO2, MechVent, Lactate, SaO2, AST, ALT, Bilirubin, ALP, Albumin, RespRate, TroponinT, Cholesterol, TroponinI.]

slide-19
SLIDE 19

Infrequently Included Variables

◮ Infrequently included variables
  ◮ Are included in ≤ 45% training set patients
◮ Transformed to a categorical variable:
  ◮ 0 = Not recorded
  ◮ 1 = Recorded & within normal range
  ◮ 2 = Recorded & abnormal
◮ Significant portion of missing minimal information
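The three-level encoding above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `encode_infrequent` and the normal range in the usage lines are hypothetical, and "abnormal" is assumed to mean that at least one recorded value falls outside the range.

```python
from typing import Optional, Sequence


def encode_infrequent(values: Optional[Sequence[float]],
                      normal_low: float,
                      normal_high: float) -> int:
    """Map one patient's observations of a rarely recorded variable to a category.

    0 = not recorded, 1 = recorded and within the normal range,
    2 = recorded and abnormal (assumption: any value outside the range).
    """
    if not values:
        return 0
    if all(normal_low <= v <= normal_high for v in values):
        return 1
    return 2


# Hypothetical normal range, for illustration only.
LOW, HIGH = 0.0, 0.4
print(encode_infrequent(None, LOW, HIGH))        # never measured
print(encode_infrequent([0.1, 0.2], LOW, HIGH))  # all values within range
print(encode_infrequent([0.1, 1.3], LOW, HIGH))  # one value outside range
```

Encoding "not recorded" as its own level lets the model use the absence of a test as a signal instead of discarding the variable.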

slide-20
SLIDE 20

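Number of observations per variable per patient

[Figure: observations per patient for each variable (axis ticks at 20, 40, 60, 80), with legend Type: Low-Freq Time Series, Full Time Series. Variables shown: SysABP, DiasABP, MAP, HR, Weight, Urine.Sum, Temp, GCS, FiO2, pH, PaO2, PaCO2, HCT, WBC, Platelets, Na, Mg, K, HCO3, Glucose, Creatinine, BUN, Lactate.]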

slide-21
SLIDE 21

Time Series Variables

◮ Low-frequency time series
  ◮ < 10 observations for ≥ 75% training set patients
◮ Full time series
  ◮ Variables not meeting the above criteria
◮ If no observation recorded for a variable:
  ◮ Impute from normal distribution representing gender-specific normal physiologic values
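A rough sketch of the two rules above, under stated assumptions: `is_low_frequency` takes one observation count per training patient for a single variable, and `impute_normal` draws from a lookup table of gender-specific means and standard deviations. The function names and the reference values in `NORMALS` are hypothetical and not taken from the slides.

```python
import random
from typing import Dict, List, Tuple


def is_low_frequency(obs_counts: List[int],
                     max_obs: int = 10,
                     patient_share: float = 0.75) -> bool:
    """True if at least `patient_share` of patients have fewer than `max_obs` observations."""
    if not obs_counts:
        raise ValueError("need at least one patient")
    sparse = sum(1 for n in obs_counts if n < max_obs)
    return sparse / len(obs_counts) >= patient_share


def impute_normal(variable: str, gender: str,
                  normals: Dict[Tuple[str, str], Tuple[float, float]],
                  rng: random.Random) -> float:
    """Draw one value from the gender-specific normal distribution for a variable."""
    mean, sd = normals[(variable, gender)]
    return rng.gauss(mean, sd)


# Hypothetical reference values as (mean, s.d.); illustration only.
NORMALS = {("HCT", "F"): (41.0, 2.5), ("HCT", "M"): (45.0, 2.5)}

rng = random.Random(0)
print(is_low_frequency([2, 3, 4, 12]))    # 3 of 4 patients have fewer than 10 observations
print(is_low_frequency([20, 30, 4, 12]))  # only 1 of 4 patients has fewer than 10
print(impute_normal("HCT", "F", NORMALS, rng))
```

Drawing from a distribution, rather than filling in a single constant, avoids creating an artificial spike at one value for every patient with a missing variable.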

slide-22
SLIDE 22

Feature Extraction

◮ Low-frequency time series
  1. Mean
◮ Full time series
  1. Mean, Median
  2. Min, Max
  3. First/Last Observation
  4. Trend over 0–24, 24–48, and 0–48 hours
    ◮ Requires at least 5, 5, and 10 observations, respectively
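The full time series features listed above could be computed along these lines. The slides do not say how the trend is calculated, so this sketch assumes a least-squares slope of value against time, treats the windows as [0, 24) and [24, 48] hours, and reads the observation counts as minimums. The names `slope` and `full_series_features` are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Optional, Tuple

Obs = Tuple[float, float]  # (hours since ICU admission, value)


def slope(points: List[Obs], min_obs: int) -> Optional[float]:
    """Least-squares slope of value against time, or None if too few observations."""
    if len(points) < min_obs:
        return None
    t_bar = mean(t for t, _ in points)
    v_bar = mean(v for _, v in points)
    denom = sum((t - t_bar) ** 2 for t, _ in points)
    if denom == 0:
        return None  # all observations share one timestamp
    return sum((t - t_bar) * (v - v_bar) for t, v in points) / denom


def full_series_features(obs: List[Obs]) -> Dict[str, Optional[float]]:
    """Summary features for one full time series; `obs` must be non-empty."""
    obs = sorted(obs)  # order by time
    values = [v for _, v in obs]
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "first": values[0],
        "last": values[-1],
        "trend_0_24": slope([p for p in obs if p[0] < 24], min_obs=5),
        "trend_24_48": slope([p for p in obs if p[0] >= 24], min_obs=5),
        "trend_0_48": slope(obs, min_obs=10),
    }


# Synthetic series: one reading every 4 hours, rising by 1 unit per hour.
series = [(float(t), 80.0 + t) for t in range(0, 48, 4)]
print(full_series_features(series))
```

Returning `None` for a trend that lacks enough observations leaves the decision about missing features to the downstream model.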

slide-23
SLIDE 23

Feature Selection by mRMR

◮ mRMR: Minimum Redundancy, Maximum Relevancy
  ◮ Redundancy: mutual information between two features
  ◮ Relevancy: mutual information between features and outcome
◮ Heuristic: scores and ranks features
◮ One feature per category selected

Peng, H., Fuhui Long, and C. Ding, 2005. "Feature Selection Based on Mutual Information Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy."
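As a rough illustration of the mRMR heuristic, the sketch below ranks discrete features greedily by relevance minus mean redundancy (the "difference" form of the criterion). It assumes features have already been discretized, and it does not implement the per-category selection mentioned above. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log
from typing import Dict, List, Sequence


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr_rank(features: Dict[str, Sequence[int]], outcome: Sequence[int]) -> List[str]:
    """Greedy ranking: each step picks the feature with the highest
    relevance minus mean redundancy against the features already chosen."""
    relevance = {name: mutual_information(col, outcome) for name, col in features.items()}
    selected: List[str] = []
    remaining = set(features)
    while remaining:
        def score(name: str) -> float:
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: f1_copy duplicates f1, f2 is weaker but less redundant.
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1":      [0, 0, 0, 1, 1, 1, 1, 1],
    "f1_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "f2":      [0, 1, 0, 1, 0, 1, 1, 1],
}
print(mrmr_rank(features, outcome))
```

In the toy data the duplicate feature is ranked last even though it is as relevant as `f1`, because everything it says about the outcome is already carried by `f1`.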

slide-24
SLIDE 24

Boosted Tree Ensembles

◮ A weak learner can be boosted by aggregating the predictions of an ensemble of weak learners
◮ Boost accuracy and retain benefits of weak learner
◮ Decision stumps
  ◮ Natural handling of heterogeneous data
  ◮ Non-linear
  ◮ Minimal preprocessing

Schapire, Robert E. "The strength of weak learnability." Machine learning 5, no. 2 (1990)

slide-25
SLIDE 25

Gradient Boosted Trees

◮ Given a feature vector, $x = (x_1, x_2, \ldots, x_i)$, and outcome labels $Y = \{0, 1\}$
◮ Build a function $g(x): x \to y \in Y$
◮ $g(x) = \log \frac{p(x)}{1 - p(x)}$
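Since g(x) is the log-odds of p(x), a predicted probability is recovered by inverting that relation: p(x) = 1 / (1 + exp(-g(x))). The short sketch below shows both directions. The function names are hypothetical, and the example uses the 13.85% in-hospital death rate reported for this data set.

```python
from math import exp, log


def log_odds(p: float) -> float:
    """g = log(p / (1 - p)) for a probability strictly between 0 and 1."""
    return log(p / (1.0 - p))


def probability(g: float) -> float:
    """Inverse of log_odds: p = 1 / (1 + exp(-g))."""
    return 1.0 / (1.0 + exp(-g))


g0 = log_odds(0.1385)          # log-odds of the overall death rate
print(round(g0, 3))            # negative, because the rate is below 50%
print(round(probability(g0), 4))
```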

slide-26
SLIDE 26

Gradient Boosted Trees Algorithm

◮ Initialize $g_0(x)$ = baseline log-odds of in-hospital death
◮ Each step: find an $h(x)$ to add to collection $g_m(x)$:
  ◮ Select a random subsample of training data, $\tilde{N}$
  ◮ Search for a decision stump $h(x)$ that best improves fit of $g_m(x) + h(x)$ on $\tilde{N}$
  ◮ Best fit is determined by maximized Bernoulli log-likelihood
  ◮ $g_{m+1}(x) \leftarrow g_m(x) + \lambda h(x)$
◮ Parameters selected by 10-fold cross validation

Generalized Boosted Regression Models, Greg Ridgeway, R package version 2.0-8, http://cran.R-project.org/package=gbm
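The loop above can be sketched in plain Python. This is a toy illustration and not the `gbm` package's implementation: it assumes numeric features only, sets each leaf value with a single Newton step, searches thresholds exhaustively, and requires both outcome classes to be present. All function names and default parameter values are hypothetical.

```python
import random
from math import exp, log, log1p
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def log_lik(y: int, g: float) -> float:
    """Bernoulli log-likelihood of label y given log-odds g."""
    return y * g - log1p(exp(g))


def leaf_value(rows: List[int], y: Sequence[int], g: List[float]) -> float:
    """One Newton step toward the best constant log-odds shift for a leaf."""
    p = [1.0 / (1.0 + exp(-g[i])) for i in rows]
    grad = sum(y[i] - pi for i, pi in zip(rows, p))
    hess = sum(pi * (1.0 - pi) for pi in p)
    return grad / hess if hess > 1e-12 else 0.0


def fit_stump(X, y, g, rows) -> Stump:
    """Pick the split that maximizes the log-likelihood of g + h on `rows`."""
    best, best_ll = (0, float("inf"), 0.0, 0.0), float("-inf")
    for j in range(len(X[0])):
        cuts = sorted({X[i][j] for i in rows})
        for lo, hi in zip(cuts, cuts[1:]):
            t = (lo + hi) / 2.0
            left = [i for i in rows if X[i][j] <= t]
            right = [i for i in rows if X[i][j] > t]
            vl, vr = leaf_value(left, y, g), leaf_value(right, y, g)
            ll = (sum(log_lik(y[i], g[i] + vl) for i in left)
                  + sum(log_lik(y[i], g[i] + vr) for i in right))
            if ll > best_ll:
                best, best_ll = (j, t, vl, vr), ll
    return best


def boost(X, y, n_trees=50, shrinkage=0.1, bag_fraction=0.5, seed=0):
    """Return (g0, stumps) so that g(x) = g0 + shrinkage * sum of stump outputs."""
    rng = random.Random(seed)
    rate = sum(y) / len(y)
    g0 = log(rate / (1.0 - rate))  # baseline log-odds of the outcome
    g = [g0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        rows = rng.sample(range(len(y)), max(2, int(bag_fraction * len(y))))
        j, t, vl, vr = fit_stump(X, y, g, rows)
        stumps.append((j, t, vl, vr))
        for i in range(len(y)):  # shrunken update applied to every training row
            g[i] += shrinkage * (vl if X[i][j] <= t else vr)
    return g0, stumps


def predict_log_odds(x, g0, stumps, shrinkage=0.1):
    """Evaluate g(x); pass the same shrinkage that was used in boost()."""
    return g0 + shrinkage * sum(vl if x[j] <= t else vr for j, t, vl, vr in stumps)
```

The subsample fraction, shrinkage λ, and number of stumps are the kind of parameters the slide describes as selected by 10-fold cross validation; the defaults here are placeholders.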

slide-28
SLIDE 28

PhysioNet Scoring

Optimize Precision-Recall curve: min(Se, PPV)

Sensitivity: $Se = \frac{TP}{TP + FN}$

Positive Predictivity: $PPV = \frac{TP}{TP + FP}$
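The score can be computed directly from binary labels and predictions, and a decision threshold can be chosen by scanning candidate cutoffs on the predicted risk. The sketch below is a minimal version under the assumption that a patient is predicted positive when the risk is at or above the cutoff. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def se_ppv(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and positive predictivity from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    se = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return se, ppv


def score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """min(Se, PPV)."""
    return min(se_ppv(y_true, y_pred))


def best_threshold(y_true: Sequence[int], risk: Sequence[float]) -> Tuple[float, float]:
    """Scan each observed risk as a cutoff and keep the one with the highest score."""
    def score_at(cut: float) -> float:
        preds: List[int] = [int(r >= cut) for r in risk]
        return score(y_true, preds)
    cut = max(sorted(set(risk)), key=score_at)
    return cut, score_at(cut)


# Toy labels and predicted risks, for illustration only.
labels = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
print(best_threshold(labels, risks))
```

Taking the minimum of the two quantities rewards a threshold that balances missed deaths against false alarms, which matters when only about 14% of patients have the outcome.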

slide-29
SLIDE 29

Performance on Sets A & B

Set A
Score             0.481
Threshold         0.568
Score at thresh   0.453
Sensitivity       0.795
Specificity       0.767
AUC               0.848

Average across 10 folds

Set B
Se                0.532
PPV               0.496
Final Score       0.496

slide-30
SLIDE 30

Performance Comparison

Method                                      Score
Random Classifier                           0.15
SAPS-I                                      0.32
Fuzzy Rule Based System                     0.36
Cascaded AdaBoost                           0.38
Time Series Motifs                          0.50
Gradient Boosted Trees                      0.50
Logistic Regression & Hidden Markov Model   0.50
2-Layer Neural Network                      0.51
Bayesian Ensemble                           0.53

slide-31
SLIDE 31

Summary

◮ We developed a boosted tree ensemble model for prediction of in-hospital mortality of ICU patients, using patient data collected over the first 48 hours of ICU stay.
◮ Effectively uses routinely collected ICU patient data
◮ Addresses ICU needs in clinical planning
◮ Future Work:
  ◮ Extend our model to provide and update predictions during the 48 hour period

slide-32
SLIDE 32

Acknowledgements

US National Science Foundation CMMI-1266331, IOS-1146882
University of South Florida Internal Research Award (Grant No. 76734)

Thank you

Questions?