Predicting ED Attendance from GP Records
Jon Patrick, CEO
Statement of Interests
- The project was funded by the HCF Foundation, with the funding going to Outcome Health.
- HLA was contracted by Outcome Health to perform the predictive modeling and NLP work.
The beginnings
- HLA was contracted to provide the SNOMED CT coding of 56,00 GP Reason for Visit notes.
- A Common Usage Classification System was created by HLA to better represent the classes of information important to the client.
- This was based on a “dismembering and reassembly” of the SCT hierarchy to fit the client's needs.
Outcome Health Records
- 35,416 GP visits for 20,971 unique patients with subsequent ED records extracted from the VEMD
– Data attributes as part of the Patient Record:
- Care Plan Goal (4% have a goal), with major classes:
– Improve general health 1424
– Prevent influenza 1348
– Prevent complications 1166
– Maintain function 1147
– Manage pain 1125
– Reduce rate of progression of disease 1058
– Improve knowledge of condition 1035
– Maintain mobility 979
More on Records
Data attributes from the Clinical tables include:
- Smoking status (12% smoker, 23% ex-smoker),
- Alcohol status (9% drinker, 5% non-drinker),
- Allergy status (known allergy 12%).
Descriptive Statistics
- About 30% of patients (6478) have reported diagnoses in the List of Diagnoses. The relative importance of this data attribute vis-a-vis the Reason_for_Visit attribute was an open question.
- About 24% of visits use 4500 unique diagnosis descriptions.
- The most frequent diagnoses by visit are:
– Hypertension 197
– URTI 174
– Asthma 174
– Depression 146
– Bronchitis 135
– Tonsillitis 115
– UTI 114
– Otitis media 97
– Gastroenteritis 95
– Review 87
Diagnosis records
Descriptive Statistics
- 86% of visits have some value for Diagnosis-Status-at-Visit.
- The list of diseases:
– COPD
– BoneJointDisease
– Diabetes
– Cancer
– CHD
– Asthma
– Gastroenteritis
– Stroke
– Influenza
– Hypertension
– Anxiety
– Depression
– Hepatitis
- Frequencies vary from 3% to 12%.
Attributes for the Model – Pt 1
- There are 14 attribute groups making up 27 attributes. Two attributes (Reaction Types and Pathology Result Types) account for a total of 2341 attributes, making a total of 2368 attributes. The range of values for each attribute is listed below. Many attribute values are left empty or have content equivalent to “unknown”.
- BP recorded (6 values)
- Care goal (925 values)
- Clinical fields
– clinical-smoke info (5 values)
– clinical-alcohol info (4 values)
– clinical-allergy info (4 values)
- Diagnosis Details
– diagnosis-name (4099 values)
– diagnosis-SCT category (29 values)
- Immunisation (403 values)
- MBS (69 values)
- Reaction types and values (1661 types, 4 values)
Attributes for the Model – Pt 2
- Script Details
– script-generic name (982 values)
– script-drug name (2340 values)
– script-product name (2058 values)
– script-frequency (21 values)
– script-repeat (4 values)
– script-substitutions (2 values)
– script-reason (956 values)
– script-medication id (918 values)
- Tobacco Usage
– tobacco-risk factor (4 values)
– tobacco-quit status (2 values)
- GP Visit details
– GP visit-duration (5 values)
– GP visit-age (6 values)
– GP visit-type (8 values)
- Gender (4 values)
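The attribute lists above are mostly categorical, with many empty or "unknown" values. One common way to turn such attributes into learning features is a sparse indicator encoding; the slides do not say which encoding was actually used. The sketch below is only an illustration under that assumption: the function names, the record layout and the set of "unknown" markers are all hypothetical.

```python
# Illustrative sketch only: record layout and UNKNOWN_VALUES are assumptions,
# not taken from the project.
UNKNOWN_VALUES = {"", "unknown"}


def normalise(value):
    """Lower-case and trim a raw attribute value."""
    return value.strip().lower()


def build_vocabulary(records):
    """Assign a column index to every observed (attribute, value) pair."""
    vocab = {}
    for record in records:
        for attribute, values in record.items():
            for value in values:
                cleaned = normalise(value)
                if cleaned in UNKNOWN_VALUES:
                    continue  # empty/unknown values carry no signal
                vocab.setdefault((attribute, cleaned), len(vocab))
    return vocab


def encode(record, vocab):
    """Return the sorted indices of the indicator features set for a record."""
    active = set()
    for attribute, values in record.items():
        for value in values:
            key = (attribute, normalise(value))
            if key in vocab:
                active.add(vocab[key])
    return sorted(active)


records = [
    {"clinical-smoke": ["Smoker"], "diagnosis-name": ["Asthma", "Hypertension"]},
    {"clinical-smoke": ["unknown"], "diagnosis-name": ["Asthma"]},
]
vocab = build_vocabulary(records)
encoded = [encode(r, vocab) for r in records]
```

Storing only the active indices keeps the representation compact when thousands of attribute values exist but each record uses only a few.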
Baseline Predictive Model
The results also show that 24% of the patients who are admitted to ED within 30 days of a GP visit are not correctly identified in this model,
of which about 77% are classified into the over_365_day class.
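The class names used on these slides (30_day, 90_day, 180_day, 365_day, over_365_day) suggest that each GP visit is labelled by the gap in days to the next ED attendance. A minimal sketch of such a labelling follows; the inclusive upper bounds and the function name are assumptions. It also works through the arithmetic implied by the two percentages above.

```python
# Assumed class boundaries (inclusive upper bounds in days); the slides give
# only the class names, not the exact cut-offs.
CLASS_BOUNDS = [(30, "30_day"), (90, "90_day"), (180, "180_day"), (365, "365_day")]


def label_class(days_to_ed):
    """Map the gap between a GP visit and the next ED attendance to a class."""
    if days_to_ed < 0:
        raise ValueError("ED attendance must follow the GP visit")
    for upper, name in CLASS_BOUNDS:
        if days_to_ed <= upper:
            return name
    return "over_365_day"


# Arithmetic from the slide: 24% of true 30_day patients are missed, and about
# 77% of those misses land in over_365_day.
missed_share = 0.24
sent_to_far_class = 0.77
share_of_30_day_labelled_over_365 = missed_share * sent_to_far_class
```

On those figures, roughly 18% of the patients who actually attend ED within 30 days are placed in the class furthest from the truth.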
2-class Model
Feature Sets
- Sparse features
– No useful feature set can be extracted from over 13% of patient records (group 1).
– Only one useful feature set can be extracted from over 17% of patient records.
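A sparsity profile of this kind can be obtained by counting, for each record, how many feature sets hold at least one usable value. The sketch below assumes that a "feature set" corresponds to one attribute group and that empty or "unknown" values are unusable; both are assumptions, and the names are hypothetical.

```python
from collections import Counter


def count_useful_sets(record):
    """Count attribute groups that contain at least one usable value."""
    return sum(
        1
        for values in record.values()
        if any(v.strip() and v.strip().lower() != "unknown" for v in values)
    )


def sparsity_profile(records):
    """Share of records having 0, 1, 2, ... useful feature sets."""
    counts = Counter(count_useful_sets(r) for r in records)
    total = len(records)
    return {n: counts[n] / total for n in sorted(counts)}


# Toy records, not project data.
toy_records = [
    {"diagnosis": [], "scrip": ["unknown"]},
    {"diagnosis": ["asthma"], "scrip": []},
    {"diagnosis": ["asthma"], "scrip": ["salbutamol"]},
    {"diagnosis": [], "scrip": []},
]
profile = sparsity_profile(toy_records)
```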
[Figure: TP, FP and FN counts for each class (30_day, 90_day, 180_day, 365_day, over_365_day)]
All Class Gender Distribution
[Figure: gender distribution (U, M, F) across all classes]
Age Distribution for TP
[Figure: TP counts by age for each class (30_day, 90_day, 180_day, 365_day, over_365_day)]
Age Distribution for FP
[Figure: FP counts by age for each class (30_day, 90_day, 180_day, 365_day, over_365_day)]
Interim Analysis
- Since the performance in the middle three classes (90_day, 180_day, 365_day) is much poorer than in the two end classes (30_day, over_365_day), we need to consider reducing the number of classes.
- The sparsity of valid data in the data sets implies that we may need to extract data from previous GP visits as well as the most recent visit.
- The gender and age distributions differ markedly between the 30_day class and the other classes; however, they do not distinguish the other classes from one another.
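The effect of reducing the number of classes can be checked by merging labels and recomputing per-class scores from TP, FP and FN counts. The sketch below is a hedged illustration: the merged class name and the toy labels are invented, and the F-score shown is the standard F1, which may differ from the measure used in the project.

```python
from collections import Counter

# Hypothetical merged label; the slides do not name the reduced classes.
MERGED = "31_to_365_day"
MERGE_MAP = {"90_day": MERGED, "180_day": MERGED, "365_day": MERGED}


def merge_classes(labels, mapping=MERGE_MAP):
    """Collapse the middle classes into one, leaving other labels untouched."""
    return [mapping.get(label, label) for label in labels]


def per_class_f1(y_true, y_pred):
    """Compute F1 per class from TP/FP/FN counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy labels in which the model confuses the middle classes with each other.
y_true = ["30_day", "90_day", "180_day", "365_day", "over_365_day", "90_day"]
y_pred = ["30_day", "180_day", "90_day", "365_day", "over_365_day", "365_day"]

before = per_class_f1(y_true, y_pred)
after = per_class_f1(merge_classes(y_true), merge_classes(y_pred))
```

In this toy case the confusions among the middle classes disappear after merging, which is the kind of gain the first bullet anticipates; real data would need to be checked the same way.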
Virtual Visit
- Compilation of the visits over the past 2 years.
- Selection of criteria for admission for many attributes.
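One way to picture the virtual visit is as a single record that keeps the most recent visit's values as they are and folds earlier visits from the past two years into separate "historical_" attributes. The sketch below follows that reading; the record layout, the function name and the 730-day approximation of two years are assumptions.

```python
from datetime import date, timedelta

LOOKBACK = timedelta(days=2 * 365)  # "past 2 years", approximated in days


def build_virtual_visit(visits):
    """Fold a patient's visits into one record.

    Each visit is a dict with 'date' (datetime.date) and 'attributes'
    (attribute name -> list of values). This layout is hypothetical.
    """
    if not visits:
        return {}
    ordered = sorted(visits, key=lambda v: v["date"])
    latest = ordered[-1]
    cutoff = latest["date"] - LOOKBACK
    # Values from the most recent visit keep their plain attribute names.
    record = {name: list(vals) for name, vals in latest["attributes"].items()}
    for visit in ordered[:-1]:
        if visit["date"] < cutoff:
            continue  # older than the look-back window
        for name, vals in visit["attributes"].items():
            bucket = record.setdefault("historical_" + name, [])
            for value in vals:
                if value not in bucket:
                    bucket.append(value)
    return record


# Toy visits, not project data.
toy_visits = [
    {"date": date(2012, 1, 10), "attributes": {"diagnosis-name": ["asthma"]}},
    {"date": date(2014, 3, 1),
     "attributes": {"diagnosis-name": ["urti"], "immunisation": ["influenza"]}},
    {"date": date(2014, 9, 1), "attributes": {"diagnosis-name": ["bronchitis"]}},
]
virtual = build_virtual_visit(toy_visits)
```

Keeping historical values under separate names lets a model weigh the current visit differently from the patient's history.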
Model Reframing – Add non-ED records
- Extract 380,000 GP records without a subsequent hospital visit.
- Extract the patient intrinsic attributes and visit assigned attributes for those 380,000 GP records.
- Convert those extracted attributes into learning features.
- Draw a matching sample to the ED cases.
- Re-build the predictive models.
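The slides do not state how the matching sample was drawn. As one possible reading, the sketch below pairs each ED case with a single non-ED record from the same age band and gender, without replacement. The matching keys, the 10-year band width and the function names are all assumptions.

```python
import random
from collections import defaultdict


def age_band(age, width=10):
    """Bucket an age into bands of `width` years (assumed matching key)."""
    return age // width


def draw_matched_sample(ed_cases, non_ed_pool, seed=0):
    """Pick one non-ED record per ED case from the same age band and gender."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for rec in non_ed_pool:
        pool[(age_band(rec["pt-age"]), rec["pt-gender"])].append(rec)
    for bucket in pool.values():
        rng.shuffle(bucket)
    matched = []
    for case in ed_cases:
        bucket = pool[(age_band(case["pt-age"]), case["pt-gender"])]
        if bucket:
            matched.append(bucket.pop())  # sample without replacement
    return matched


# Toy records, not project data.
ed_cases = [
    {"pt-age": 4, "pt-gender": "F"},
    {"pt-age": 70, "pt-gender": "M"},
    {"pt-age": 33, "pt-gender": "U"},
]
non_ed_pool = [
    {"id": 1, "pt-age": 3, "pt-gender": "F"},
    {"id": 2, "pt-age": 7, "pt-gender": "F"},
    {"id": 3, "pt-age": 72, "pt-gender": "M"},
    {"id": 4, "pt-age": 45, "pt-gender": "M"},
]
matched = draw_matched_sample(ed_cases, non_ed_pool)
```

ED cases with no counterpart in the pool (the third toy case here) are simply left unmatched in this sketch; a real design would need a policy for them.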
10 Essential Attributes
– current diagnosis-name (current visit and active only)
– current diagnosis-sct category (current visit and active only)
– historical diagnosis-name (up to 10 years)
– historical diagnosis-sct category (up to 10 years)
– pt-age
– pt-type
– pt-gender
– pt-atsi
– pt-pension
– pt-dva
41 Optional Attributes
- immunisation (current visit)
- historical immunisation (up to 5 years)
- mbs
- reaction (current visit)
- historical reaction (up to 5 years)
- current pathology test-test name (current visit)
- current pathology test-radiology test (current visit)
- current pathology result (current visit)
- historical pathology test-test name (within 12 months)
- historical pathology test-radiology test (within 12 months)
- historical pathology result (within 12 months)
- current scrip-generic name (within 8 months)
- current scrip-drug name (within 8 months)
- current scrip-product name (within 8 months)
- current scrip-frequency (within 8 months)
- current scrip-repeat (within 8 months)
- current scrip-substitutions (within 8 months)
- current scrip-reason (within 8 months)
- current scrip-medication id (within 8 months)
- current scrip-drug-class (within 8 months)
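Each attribute above carries its own look-back window. A small lookup table makes those windows explicit when deciding whether an earlier event still counts. In the sketch below only the window lengths come from the lists above; the 365-day year, the 30-day month, the table layout and the function name are assumptions.

```python
from datetime import date

# Window lengths in days. Lengths follow the attribute lists; the conversion
# of years and months to days is an approximation.
LOOKBACK_DAYS = {
    "historical diagnosis-name": 10 * 365,
    "historical immunisation": 5 * 365,
    "historical reaction": 5 * 365,
    "historical pathology result": 365,
    "current scrip-generic name": 8 * 30,
}


def in_window(attribute, event_date, visit_date):
    """True if the event falls inside the attribute's look-back window."""
    age_in_days = (visit_date - event_date).days
    return 0 <= age_in_days <= LOOKBACK_DAYS[attribute]


visit = date(2014, 9, 1)  # toy visit date
recent_script = in_window("current scrip-generic name", date(2014, 3, 1), visit)
old_script = in_window("current scrip-generic name", date(2013, 12, 1), visit)
old_immunisation = in_window("historical immunisation", date(2010, 1, 1), visit)
```

Here a script written six months before the visit still counts, one written nine months before does not, and an immunisation given well over four years earlier still falls inside its five-year window.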
Revised 90-day class model
6 Class Model F=73.90
Adopted Model
Concluding Points
- Built a coherent representation of the patient records suited to computing a predictive model.
- Tested a variety of combinations of attributes for the best results.
- Converted the many attributes available into domain ranges that were relevant to the task.
- Tested many class configurations around 30-day, 90-day, 180-day, 365-day and post-1-year attendances.
- Devised representations of the various time lapses between the GP visits of patients.
- Separated the analysis to use non-injury cases.
- Designed a “virtual visit record” from the historical records which compressed the historical data yet separated it from the data of the most recent visit.
Feature Importance for Under 6
[Figure: feature importance ranking for patients aged under 6; importance axis ticks from 0.05 to 0.15]
Feature Importance for Over 65
[Figure: feature importance ranking for patients aged over 65; importance axis ticks from 0.01 to 0.09]
Features shown in both figures: alcohol-days, alcohol-drinks, alcohol-risk, bp-recored, care-goal, clinical-alcohol, clinical-allergy, clinical-smoke, diagnosis-category, diagnosis-name, historical_diagnosis-category, historical_diagnosis-name, historical_immunisation, historical_measurement, historical_pathology-result, historical_pathology-test, historical_scrip-drug-class, historical_scrip-drug-name, historical_scrip-frequency, historical_scrip-generic-name, historical_scrip-medication-id, historical_scrip-product-name, historical_scrip-rating, historical_scrip-reason, historical_scrip-repeat, historical_scrip-substitutions, immunisation, mbs, measurement, pathology-result, pathology-test, pathology-test-radiology, pt-age, pt-atsi, pt-dva, pt-gender, pt-pension, pt-type, reaction, scrip-drug-class, scrip-drug-name, scrip-frequency, scrip-generic-name, scrip-medication-id, scrip-product-name, scrip-reason, scrip-repeat, scrip-substitutions, tobacco-risk, tobacco-status
Conclusions
- For under-6-year-olds, the major predictive attributes are historical-diagnosis-name and diagnosis-name (see figure).
- For the over-65s, the most important attributes are historical-diagnosis-name and pathology result (see figure), with further support from general pathology, prescription and MBS information, as well as the patient socio-demographic details.
Epilogue
- Outcome Health say:
- Tested on 12 GPs who say:
– Good for over 365 predictions
– Not good for 30 day predictions
– Makes them think more about what to do for the 30-day attendance prediction
- Providing %ages was a bad idea – need to use