Predicting ED Attendance from GP Records, Jon Patrick, CEO



slide-1
SLIDE 1

Predicting ED Attendance from GP Records

Jon Patrick CEO

slide-2
SLIDE 2

Statement of Interests

  • Project was funded by HCF Foundation to Outcome Health
  • HLA was contracted by Outcome Health to perform the predictive modeling and NLP work

slide-3
SLIDE 3

The beginnings

  • HLA was contracted to provide the SNOMED CT coding of 56,00 GP Reason for Visit notes.
  • A Common Usage Classification System was created by HLA to better represent the classes of information important to the client.
  • This was based on a “dismembering and reassembly” of the SCT hierarchy to fit the client's needs.

slide-4
SLIDE 4

Outcome Health Records

  • 35,416 GP visits for 20,971 unique patients with subsequent ED records extracted from the VEMD

– Data attributes as part of the Patient Record:

  • Care Plan Goal (4% have a goal) with major classes:

    – Improve general health 1424
    – Prevent influenza 1348
    – Prevent complications 1166
    – Maintain function 1147
    – Manage pain 1125
    – Reduce rate of progression of disease 1058
    – Improve knowledge of condition 1035
    – Maintain mobility 979

slide-5
SLIDE 5

More on Records

Data attributes from the Clinical tables include:

  • Smoking status (12% smoker, 23% ex-smoker),
  • Alcohol status (9% drinker, 5% non-drinker),
  • Allergy status (known allergy 12%).
slide-6
SLIDE 6

Descriptive Statistics

  • List of Diagnoses has about 30% (6478) of patients with reported diagnoses. The relative importance of this data attribute vis-a-vis the Reason_for_Visit attribute was an open question.
  • About 24% of visits use 4500 unique diagnosis descriptions.
  • The most frequent diagnoses by visit are:

    – Hypertension 197
    – URTI 174
    – Asthma 174
    – Depression 146
    – Bronchitis 135
    – Tonsillitis 115
    – UTI 114
    – Otitis media 97
    – Gastroenteritis 95
    – Review 87

slide-7
SLIDE 7

Diagnosis records

slide-8
SLIDE 8

Descriptive Statistics

  • 86% of visits have some value for Diagnosis-Status-at-Visit.
  • The list of diseases: {COPD, BoneJointDisease, Diabetes, Cancer, CHD, Asthma, Gastroenteritis, Stroke, Influenza, Hypertension, Anxiety, Depression, Hepatitis}
  • Frequencies vary from 3-12%.
slide-9
SLIDE 9

Attributes for the Model – Pt 1

  • There are 14 attribute groups making up 27 attributes. Two attributes {Reaction Types and Pathology Result Types} expand to a total of 2341 attributes, making 2368 attributes overall. The range of values for each attribute is listed below. Many attribute values are left empty or have content equivalent to “unknown”.

  • BP recorded (6 values)
  • Care goal (925 values)
  • Clinical fields

    – clinical-smoke info (5 values)
    – clinical-alcohol info (4 values)
    – clinical-allergy info (4 values)

  • Diagnosis Details

    – diagnosis-name (4099 values)
    – diagnosis-SCT category (29 values)

  • Immunisation (403 values)
  • MBS (69 values)
  • Reaction types and values (1661 types, 4 values)
slide-10
SLIDE 10

Attributes for the Model – Pt 2

  • Script Details

    – script-generic name (982 values)
    – script-drug name (2340 values)
    – script-product name (2058 values)
    – script-frequency (21 values)
    – script-repeat (4 values)
    – script-substitutions (2 values)
    – script-reason (956 values)
    – script-medication id (918 values)

  • Tobacco Usage

    – tobacco-risk factor (4 values)
    – tobacco-quit status (2 values)

  • GP Visit details

    – GP visit-duration (5 values)
    – GP visit-age (6 values)
    – GP visit-type (8 values)

  • Gender (4 values)
slide-11
SLIDE 11

Baseline Predictive Model

The results also show that 24% of the patients who are admitted to ED within 30 days of a GP visit are not correctly identified by this model, of which about 77% are classified into the over_365_days class.
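
The two percentages on this slide can be read as a miss rate for one class plus a breakdown of where the missed cases went. A minimal sketch of that calculation follows; the function name and the toy label pairs are hypothetical and are not the study's data or code.

```python
from collections import Counter


def miss_breakdown(pairs, target="30_day"):
    """Return (miss_rate, Counter of predicted labels among the misses).

    `pairs` is a list of (true_label, predicted_label) tuples.
    """
    target_preds = [pred for true, pred in pairs if true == target]
    if not target_preds:
        raise ValueError(f"no cases with true class {target!r}")
    misses = Counter(pred for pred in target_preds if pred != target)
    miss_rate = sum(misses.values()) / len(target_preds)
    return miss_rate, misses


# Toy, made-up (true, predicted) pairs for illustration only.
toy_pairs = [
    ("30_day", "30_day"), ("30_day", "30_day"), ("30_day", "30_day"),
    ("30_day", "over_365_day"), ("90_day", "30_day"),
    ("over_365_day", "over_365_day"),
]
rate, misses = miss_breakdown(toy_pairs)
share_over_365 = misses["over_365_day"] / sum(misses.values())
print(f"missed {rate:.0%} of 30_day cases; "
      f"{share_over_365:.0%} of the misses went to over_365_day")
```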
slide-12
SLIDE 12

2-class Model

slide-13
SLIDE 13

Feature Sets

  • Sparse features
  • No useful feature set can be extracted from over 13% of patient records (group 1), and only one useful feature set can be extracted from over 17% of patient records.
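
One way to arrive at shares like these is to count, per patient record, how many feature sets hold at least one informative value. The sketch below assumes a record is a mapping from feature-set name to a list of values and treats blank or "unknown" values as empty; the record layout and all function names are assumptions for illustration.

```python
from collections import Counter


def is_informative(value):
    """Treat missing, blank and 'unknown' values as carrying no information."""
    return value is not None and str(value).strip().lower() not in {"", "unknown"}


def count_useful_feature_sets(record):
    """Count the feature sets in one record that hold an informative value."""
    return sum(
        1 for values in record.values()
        if any(is_informative(v) for v in values)
    )


def sparsity_profile(records):
    """Map 'number of useful feature sets' to the share of records with it."""
    if not records:
        raise ValueError("records must not be empty")
    counts = Counter(count_useful_feature_sets(r) for r in records)
    return {n: c / len(records) for n, c in sorted(counts.items())}


# Toy records, made up for illustration.
toy_records = [
    {"diagnosis-name": ["unknown"], "scrip-drug-name": []},
    {"diagnosis-name": ["Asthma"], "scrip-drug-name": [""]},
    {"diagnosis-name": ["URTI"], "scrip-drug-name": ["Salbutamol"]},
    {"diagnosis-name": [None], "scrip-drug-name": ["Unknown"]},
]
profile = sparsity_profile(toy_records)
print(profile)
```

The entries for 0 and 1 in the resulting profile correspond to the "no useful feature set" and "only one useful feature set" groups.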

slide-14
SLIDE 14

All Class Gender Distribution

[Chart: number of TP, FP and FN cases for each class (30_day, 90_day, 180_day, 365_day, over_365_day), broken down by gender (U, M, F); count axis from 1000 to 8000.]

slide-15
SLIDE 15

Age Distribution for TP

[Chart: number of TP cases by age (2 to 104) for each class (30_day, 90_day, 180_day, 365_day, over_365_day); count axis from 50 to 400.]
slide-16
SLIDE 16

Age Distribution for FP

[Chart: number of FP cases by age (2 to 104) for each class (30_day, 90_day, 180_day, 365_day, over_365_day); count axis from 50 to 400.]
slide-17
SLIDE 17

Interim Analysis

  • Since the performance in the middle three classes (90_day, 180_day, 365_day) is much poorer than that in the two end classes (30_day, over_365_day), we need to consider reducing the number of classes.
  • The sparsity of valid data in the data sets implies that we may need to extract data from previous GP visits as well as the most recent visit.
  • The gender and age distributions differ significantly between the 30_day class and the other classes; however, they do not distinguish the other classes from one another.
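
Reducing the number of classes amounts to relabelling the time-to-ED bins. The sketch below first assigns a class from the number of days between the GP visit and the ED attendance, then merges the three middle classes. The inclusive upper bounds and the merged label name are assumptions; the slides do not state the exact boundaries or which merges were tried.

```python
from bisect import bisect_left

# Assumed inclusive upper bounds (in days) for the first four classes.
UPPER_BOUNDS = [30, 90, 180, 365]
LABELS = ["30_day", "90_day", "180_day", "365_day", "over_365_day"]
MIDDLE_CLASSES = {"90_day", "180_day", "365_day"}


def days_to_class(days):
    """Map days between GP visit and ED attendance to a class label."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return LABELS[bisect_left(UPPER_BOUNDS, days)]


def collapse_middle(label, merged_label="31_to_365_day"):
    """Merge the three middle classes into one (hypothetical) label."""
    return merged_label if label in MIDDLE_CLASSES else label


for days in (7, 30, 31, 200, 365, 400):
    label = days_to_class(days)
    print(days, label, collapse_middle(label))
```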

slide-18
SLIDE 18

Virtual Visit

  • Compilation of the visits over the past 2 years.
  • Selection of criteria for admission for many attributes
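
A rough sketch of how a virtual visit could be compiled: the most recent visit keeps its own attributes, while values from earlier visits inside the look-back window are pooled under a "historical_" prefix. The record layout, the function name and the 730-day window (standing in for "2 years") are assumptions, and the per-attribute admission criteria mentioned above are not modelled.

```python
from datetime import date, timedelta


def build_virtual_visit(visits, lookback_days=730):
    """Compress one patient's visits into a single record.

    `visits` is a list of (visit_date, {attribute: value}) tuples.
    """
    if not visits:
        raise ValueError("at least one visit is required")
    ordered = sorted(visits, key=lambda v: v[0])
    latest_date, latest_attrs = ordered[-1]
    cutoff = latest_date - timedelta(days=lookback_days)

    record = dict(latest_attrs)  # data of the most recent visit, kept separate
    for visit_date, attrs in ordered[:-1]:
        if visit_date < cutoff:
            continue  # older than the look-back window
        for name, value in attrs.items():
            record.setdefault(f"historical_{name}", set()).add(value)
    return record


# Toy visits, made up for illustration.
toy_visits = [
    (date(2012, 3, 1), {"diagnosis-name": "Hypertension"}),
    (date(2015, 1, 10), {"diagnosis-name": "Asthma"}),
    (date(2016, 6, 1), {"diagnosis-name": "URTI"}),
    (date(2016, 9, 1), {"diagnosis-name": "Bronchitis"}),
]
virtual = build_virtual_visit(toy_visits)
print(virtual)
```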

slide-19
SLIDE 19
slide-20
SLIDE 20

Model Reframing – Add non-ED records

  • Extract 380,000 GP records without a subsequent hospital visit.
  • Extract the patient intrinsic attributes and visit assigned attributes for those 380,000 GP records.
  • Convert those extracted attributes into learning features.
  • Draw a matching sample to the ED cases.
  • Re-build the predictive models.
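
The slide does not say how the matching sample was drawn. As one possible reading, the sketch below pairs each ED case with one randomly chosen non-ED record that shares the same matching key, without replacement. The choice of key (age band and gender), the field names and the function names are all hypothetical.

```python
import random
from collections import defaultdict


def draw_matched_sample(ed_cases, non_ed_records, key, seed=0):
    """Pick one non-ED record per ED case with the same key, without reuse.

    ED cases with no remaining candidate are skipped.
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for rec in non_ed_records:
        pools[key(rec)].append(rec)
    for pool in pools.values():
        rng.shuffle(pool)

    matched = []
    for case in ed_cases:
        pool = pools.get(key(case))
        if pool:
            matched.append(pool.pop())
    return matched


def match_key(rec):
    """Hypothetical matching key: age band and gender."""
    return (rec["age_band"], rec["gender"])


# Toy records, made up for illustration.
ed = [
    {"id": 1, "age_band": "65+", "gender": "F"},
    {"id": 2, "age_band": "0-5", "gender": "M"},
    {"id": 3, "age_band": "65+", "gender": "F"},
]
non_ed = [
    {"id": 10, "age_band": "65+", "gender": "F"},
    {"id": 11, "age_band": "0-5", "gender": "M"},
    {"id": 12, "age_band": "0-5", "gender": "M"},
    {"id": 13, "age_band": "18-64", "gender": "F"},
]
matched = draw_matched_sample(ed, non_ed, match_key)
print([rec["id"] for rec in matched])
```

Fixing the seed keeps the draw reproducible, which matters when models are re-built and compared across runs.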

slide-21
SLIDE 21

10 Essential Attributes

– current diagnosis-name (current visit and active only)
– current diagnosis-sct category (current visit and active only)
– historical diagnosis-name (up to 10 years)
– historical diagnosis-sct category (up to 10 years)
– pt-age
– pt-type
– pt-gender
– pt-atsi
– pt-pension
– pt-dva

slide-22
SLIDE 22

41 Optional Attributes

  • immunisation (current visit)
  • historical immunisation (up to 5 years)
  • mbs
  • reaction (current visit)
  • historical reaction (up to 5 years)
  • current pathology test-test name (current visit)
  • current pathology test-radiology test (current visit)
  • current pathology result (current visit)
  • historical pathology test-test name (within 12 months)
  • historical pathology test-radiology test (within 12 months)
  • historical pathology result (within 12 months)
  • current scrip-generic name (within 8 months)
  • current scrip-drug name (within 8 months)
  • current scrip-product name (within 8 months)
  • current scrip-frequency (within 8 months)
  • current scrip-repeat (within 8 months)
  • current scrip-substitutions (within 8 months)
  • current scrip-reason (within 8 months)
  • current scrip-medication id (within 8 months)
  • current scrip-drug-class (within 8 months)
slide-23
SLIDE 23

Revised 90-day class model

6 Class Model F=73.90
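
The slide reports only the F value, not how it was averaged across classes. For reference, a minimal sketch of the standard F1 calculation from TP, FP and FN counts, with an unweighted (macro) average over classes as one possible aggregation; the counts below are made up and the averaging choice is an assumption.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        raise ValueError("F1 is undefined when TP, FP and FN are all zero")
    return 2 * tp / denominator


def macro_f1(counts_by_class):
    """Unweighted mean of the per-class F1 scores."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in counts_by_class.values()]
    return sum(scores) / len(scores)


# Toy (TP, FP, FN) counts per class, made up for illustration.
toy_counts = {
    "30_day": (8, 2, 2),
    "over_365_day": (6, 2, 6),
}
overall = macro_f1(toy_counts)
print(f"macro F1 = {100 * overall:.2f}")
```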

slide-24
SLIDE 24

Adopted Model

slide-25
SLIDE 25

Concluding Points

  • Built a coherent representation of the patient records suited to computing a predictive model.
  • Tested a variety of combinations of attributes for the best results.
  • Converted the many attributes available into domain ranges that were relevant to the task.
  • Tested many class configurations around 30-day, 90-day, 180-day, 365-day and post-1-year attendances.
  • Devised representations of the various time lapses between the GP visits of patients.
  • Separated the analysis to use non-injury cases.
  • Designed a “virtual visit record” from the historical records which compressed the historical data yet separated it from the data of the most recent visit.

slide-26
SLIDE 26

Feature Importance for Under 6

Feature importance for aged under 6

[Chart: importance ranking of each feature; ranking axis from 0.05 to 0.15.]

Features shown: alcohol-days, alcohol-drinks, alcohol-risk, bp-recored, care-goal, clinical-alcohol, clinical-allergy, clinical-smoke, diagnosis-category, diagnosis-name, historical_diagnosis-category, historical_diagnosis-name, historical_immunisation, historical_measurement, historical_pathology-result, historical_pathology-test, historical_scrip-drug-class, historical_scrip-drug-name, historical_scrip-frequency, historical_scrip-generic-name, historical_scrip-medication-id, historical_scrip-product-name, historical_scrip-rating, historical_scrip-reason, historical_scrip-repeat, historical_scrip-substitutions, immunisation, mbs, measurement, pathology-result, pathology-test, pathology-test-radiology, pt-age, pt-atsi, pt-dva, pt-gender, pt-pension, pt-type, reaction, scrip-drug-class, scrip-drug-name, scrip-frequency, scrip-generic-name, scrip-medication-id, scrip-product-name, scrip-reason, scrip-repeat, scrip-substitutions, tobacco-risk, tobacco-status

slide-27
SLIDE 27

Feature Importance for Over 65

Feature importance for aged over 65

[Chart: importance ranking of each feature, using the same feature list as the under-6 chart; ranking axis from 0.01 to 0.09.]

slide-28
SLIDE 28

Conclusions

  • It is apparent that for the under-6-year-olds the major predictive attributes are historical-diagnosis-name and diagnosis-name (see figure).
  • For the over-65s the most important attributes are historical-diagnosis-name and pathology result (see figure), with further support from general pathology, prescription and MBS information, as well as the patient socio-demographic details.

slide-29
SLIDE 29

Epilogue

  • Outcome Health say:
  • Tested on 12 GPs, who say:

    – Good for over 365 predictions
    – Not good for 30 day predictions
    – Makes them think more about what to do for the 30-day attendance prediction

  • Providing percentages was a bad idea; need to use a categorical variable of 5 bands
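
A small sketch of what replacing percentages with a 5-band categorical variable might look like. The equal-width cut-points and the band names are assumptions; the slides do not specify either.

```python
# Hypothetical band names, ordered from lowest to highest predicted risk.
BANDS = ["very low", "low", "moderate", "high", "very high"]


def probability_to_band(probability):
    """Map a predicted probability in [0, 1] to one of five equal-width bands."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    index = min(int(probability * len(BANDS)), len(BANDS) - 1)
    return BANDS[index]


for p in (0.05, 0.5, 0.95, 1.0):
    print(p, probability_to_band(p))
```

In practice the cut-points would more likely be chosen with clinicians than spaced evenly.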