SLIDE 1

Model selection and variable aggregation of Australian hospital data

CSIRO HEALTH & BIOSECURITY FLAGSHIP

Liam HEINIGER (a), Norm GOOD (b) and Sankalp KHANNA (b)

(a) University of Queensland, Brisbane, Australia
(b) The CSIRO Australian e-Health Research Centre, Brisbane, Australia

SLIDE 2

Patient Flow @ CSIRO AEHRC

Enabling hospitals to better manage their resources & hence reduce waiting times

www.csiro.au/patientflow

2 | Model selection and variable aggregation of Australian hospital data

SLIDE 3

Project Background

  • National Emergency Access Target (NEAT): the percentage of patients presenting to the Emergency Department who leave it (admitted, transferred or discharged) within four hours
  • Hospital Standardised Mortality Ratio (HSMR): ratio of the actual number of deaths to the expected number of deaths
  • Looking at the relationship between NEAT and HSMR
  • Our focus here:
      • Predicting the probability of death for patients using statistical modelling
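
The link between the two ideas on this slide can be sketched in a few lines: if a model supplies a predicted probability of death for each patient, the expected number of deaths is the sum of those probabilities, and HSMR is the observed count divided by that sum. The function name `hsmr` and the toy data below are illustrative assumptions, not part of the original study.

```python
def hsmr(died, predicted_prob):
    """Ratio of observed deaths to expected deaths.

    died: 0/1 outcomes per patient (1 = in-hospital death).
    predicted_prob: model-predicted probability of death per patient.
    """
    died = list(died)
    predicted_prob = list(predicted_prob)
    if len(died) != len(predicted_prob):
        raise ValueError("one predicted probability is needed per patient")
    observed = sum(died)            # actual number of deaths
    expected = sum(predicted_prob)  # expected number of deaths under the model
    if expected == 0:
        raise ValueError("expected number of deaths is zero")
    return observed / expected


# Hypothetical toy data: five patients, two of whom died.
outcomes = [1, 0, 0, 1, 0]
probs = [0.5, 0.25, 0.25, 0.75, 0.25]
ratio = hsmr(outcomes, probs)
```

A ratio above 1 would indicate more deaths than the model expects, and a ratio below 1 fewer. Some reports multiply the ratio by 100; the slides use the plain ratio.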

SLIDE 4

UNDERSTANDING THE PROBLEM

SLIDE 5

Problem Complexity

  • Problem at hand: building statistical models of HSMR
      • Model and predict the probability of in-hospital mortality for a given patient
      • HSMR = [Actual number of deaths] / [Expected number of deaths]
  • Data: Emergency Department (ED) and Inpatient Admission records from several Australian hospitals over several years
      • In excess of 20 million ED records.
      • In excess of 20 million inpatient records.
      • Large sets of multicollinear variables and potentially complex interactions
      • Categorical variables consisting of hundreds of sparsely populated levels.
  • Initial approach
      • Apply a binomial generalised linear model
      • Intel E5-2630 CPU machine with 2 x 2.6 GHz processors and 128 GB of RAM
      • Infeasible: computing the variable estimates required an unreasonable amount of time and processing power.
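
To make the "hundreds of sparsely populated levels" point concrete, the sketch below counts how many dummy columns one categorical variable adds to a GLM design matrix and which levels have very few observations. The helper name `level_summary`, the `min_count` threshold and the toy diagnosis codes are assumptions made for illustration only.

```python
from collections import Counter


def level_summary(values, min_count=5):
    """Summarise one categorical variable before it enters a GLM."""
    counts = Counter(values)
    n_levels = len(counts)
    return {
        "levels": n_levels,
        # Treatment (dummy) coding drops one reference level.
        "dummy_columns": max(n_levels - 1, 0),
        # Levels with too few observations to estimate reliably.
        "sparse_levels": sorted(l for l, c in counts.items() if c < min_count),
    }


# Hypothetical toy data: two common codes and three rare ones.
codes = ["I21"] * 6 + ["J18"] * 5 + ["A41", "C34", "K35"]
summary = level_summary(codes, min_count=5)
```

With several such variables plus interactions, the number of columns multiplies, which is one plausible reading of why the plain GLM fit was infeasible on the hardware listed above.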

SLIDE 6

The Solution

Regularisation – address multicollinearity, reduce number of predictors

  • Statistical technique for tuning or selecting the preferred level of model complexity so that models are better at predicting (generalising).
  • Employed Elastic Net regularisation
      • Hybrid of 2 popular techniques (lasso and ridge regression)
      • Encourages grouping of correlated predictors
      • Shrinks some coefficients exactly to zero
      • Works well with highly correlated predictors
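
For reference, one common way to write the elastic net objective for a binomial model is shown below. The slides do not state which parameterisation was used, so the mixing parameter \alpha, the penalty weight \lambda and the 1/n scaling follow a widely used convention and should be read as assumptions.

```latex
(\hat{\beta}_0, \hat{\beta}) =
  \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i\,\eta_i - \log\!\big(1 + e^{\eta_i}\big) \Big]
  + \lambda\Big[\, \alpha\,\lVert\beta\rVert_1
  + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \Big],
\qquad \eta_i = \beta_0 + x_i^{\top}\beta
```

Here y_i is 1 if patient i died and 0 otherwise. Setting \alpha = 1 gives the lasso penalty, which sets coefficients exactly to zero, and \alpha = 0 gives the ridge penalty, which shrinks correlated predictors together; values in between combine the two behaviours listed above.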

Variable Aggregation

  • Reduce number of categories
  • Reduce sparsity

SLIDE 7

The Solution

Step 1 : Pre-aggregation

  • Diseases where all patients died – Highest Risk group
  • Diseases where all patients survived – Lowest Risk group
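
Step 1 can be sketched as a single pass over (diagnosis code, outcome) pairs. The function name `pre_aggregate`, the group labels and the toy records are hypothetical; the slides only state the rule itself.

```python
from collections import defaultdict


def pre_aggregate(records):
    """Assign each diagnosis code to a pre-aggregation group.

    records: iterable of (code, died) pairs with died in {0, 1}.
    """
    totals = defaultdict(lambda: [0, 0])  # code -> [patients, deaths]
    for code, died in records:
        totals[code][0] += 1
        totals[code][1] += died

    groups = {}
    for code, (patients, deaths) in totals.items():
        if deaths == patients:
            groups[code] = "highest_risk"  # all patients died
        elif deaths == 0:
            groups[code] = "lowest_risk"   # all patients survived
        else:
            groups[code] = "remaining"     # passed on to Step 2
    return groups


# Hypothetical toy data.
records = [("A", 1), ("A", 1), ("B", 0), ("B", 0), ("B", 0), ("C", 1), ("C", 0)]
groups = pre_aggregate(records)
```

Only the codes labelled "remaining" keep their own level, so the model in Step 2 has fewer, better-populated levels to estimate.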

Step 2 : Regularisation

  • Parameter estimates for remaining levels determined
  • Using binomial generalised linear modelling
  • Using Elastic Net modelling (cut-off is 1 standard deviation from the minimum error)

Step 3 : Aggregation

  • Parameter estimates aggregated into natural bins using the Jenks natural breaks algorithm
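
Natural-breaks binning looks for class boundaries that minimise the total within-class squared deviation of the values. The sketch below uses an exact dynamic-programming formulation of that objective (often called Fisher-Jenks) rather than Jenks' original iterative procedure; the function name `jenks_bins` and the toy estimates are assumptions, and the slides do not say which implementation the authors used.

```python
def jenks_bins(values, n_bins):
    """Split values into n_bins contiguous classes with minimal
    total within-class sum of squared deviations."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= n_bins <= n:
        raise ValueError("n_bins must be between 1 and len(values)")

    # Prefix sums give the squared deviation of xs[i:j] in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting xs[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_bins + 1)]
    split = [[0] * (n + 1) for _ in range(n_bins + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_bins + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk the stored split points back from the last class to the first.
    bins, j = [], n
    for k in range(n_bins, 0, -1):
        i = split[k][j]
        bins.append(xs[i:j])
        j = i
    bins.reverse()
    return bins


# Hypothetical parameter estimates for seven diagnosis codes.
estimates = [-2.1, -1.9, -2.0, 0.1, 0.0, 2.5, 2.4]
bins = jenks_bins(estimates, 3)
```

Each resulting bin would become one aggregated level of the categorical variable. The triple loop is quadratic in the number of estimates, which is acceptable for a few hundred levels.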

SLIDE 8

Results from Step 2 – GLM Model

[Figure panels: Without Pre-aggregation | After Pre-aggregation]

AUC = 0.75
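
AUC (area under the ROC curve) can be read as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen patient who survived. The sketch below computes it directly from that definition; the function name `auc` and the toy scores are assumptions, and the pairwise loop is for illustration only because it would be far too slow for millions of records.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 0/1 outcomes (1 = died).
    scores: predicted probabilities of death.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcomes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three of the four positive/negative pairs are ordered correctly.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
value = auc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.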

SLIDE 9

Results from Step 2 – Elastic Net Model

  • AUC = 0.65
  • 75% less time
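
The elastic net cut-off mentioned in Step 2 resembles what is commonly called the one-standard-error rule: among the penalty values tried in cross-validation, pick the largest one whose mean error is within one standard error of the minimum. That reading of the slide is an assumption, as are the function name `one_se_lambda` and the toy cross-validation results below.

```python
def one_se_lambda(lambdas, mean_errors, std_errors):
    """Largest penalty whose cross-validated error is within one
    standard error of the smallest cross-validated error."""
    best = min(range(len(lambdas)), key=lambda i: mean_errors[i])
    threshold = mean_errors[best] + std_errors[best]
    eligible = [lam for lam, err in zip(lambdas, mean_errors) if err <= threshold]
    return max(eligible)


# Hypothetical cross-validation results for four penalty values.
lambdas = [0.001, 0.01, 0.1, 1.0]
mean_errors = [0.31, 0.28, 0.29, 0.40]
std_errors = [0.02, 0.02, 0.02, 0.03]
chosen = one_se_lambda(lambdas, mean_errors, std_errors)
```

Choosing the larger penalty trades a small loss in fit for a sparser model, which would be consistent with the lower AUC and shorter run time reported on this slide.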

SLIDE 10

Results from Step 3

  • Parameter estimates aggregated into natural bins using the Jenks natural breaks algorithm
  • Calculated parameter estimates placed back into a larger model with the other variables and second order interactions

GLM Model

  • AUC = 0.85

Elastic Net

  • More ICD-10 codes placed in the “all survival” level.
  • AUC = 0.85

The method chosen for aggregating variables is less significant than the act of aggregation itself

SLIDE 11

Summary

  • Complexity often confounds health data modelling
      • Multicollinearity
      • High number of levels in categorical variables
  • Conventional models often fail due to such issues
  • Techniques like Elastic Net regularisation and variable aggregation can provide efficient mechanisms for handling them

SLIDE 12

THE AUSTRALIAN E-HEALTH RESEARCH CENTRE

Thank you

For more information, please contact:

Norm Good
Senior Experimental Scientist
t +61 7 3253 3640
e Norm.Good@csiro.au
w www.aehrc.com

Sankalp Khanna
Research Scientist
t +61 7 3253 3629
e Sankalp.Khanna@csiro.au
w www.aehrc.com