
Mining Administrative and Clinical Diabetes Data with Temporal Association Rules. Stefano Concaro, Lucia Sacchi, Carlo Cerra, Riccardo Bellazzi. MIE 2009, Sarajevo


  1. Mining Administrative and Clinical Diabetes Data with Temporal Association Rules
     Stefano Concaro, Lucia Sacchi, Carlo Cerra, Riccardo Bellazzi
     MIE 2009, Sarajevo, August 31st 2009

  2. Summary
     • DataWarehouse: Healthcare Agency (ASL) of Pavia
       - Administrative healthcare data
       - Clinical data
     • Methods
       - Representation of temporal sequences
       - Integration of data sources
       - Temporal Association Rules (TARs) mining
       - Management of temporal heterogeneity in the data
     • Application
       - Diabetes Mellitus
     • Conclusions

  3. DataWarehouse: Local Healthcare Agency (ASL) of Pavia (1)
     Administrative healthcare data (since 2002): "process data" collected for reimbursement purposes
     • DW sources: Hospital Admissions, Ambulatory Visits/Lab Tests, Drug Prescriptions, …
     • Pavia area: 530,000 people
     • 170,000 admissions/year
     • 4,500,000 drug prescriptions/year
     • 9,000,000 visits-tests/year
     Hospital admissions (example record)
       CodPat  CodHosp  AdmDate     DischDate   Diagn(1-6)  Proc(1-6)  Refund
       xxx     yyy      16/06/2008  24/06/2008  428.1, 410  41.00      2300 €
     Drug prescriptions (example record)
       CodPat  CodPharm  PurchDate   CodDrug  ATC code  Quantity  Cost
       xxx     yyy       22/07/2008  72364    C01AA02   2         60 €
     Ambulatory visits/lab tests (example record)
       CodPat  CodAmb  ContDate    CodTest  Refund
       xxx     yyy     29/03/2008  347.1    50 €
     Progressive increase of DW dimension due to new data introduction and historical data maintenance

  4. DataWarehouse: Local Healthcare Agency (ASL) of Pavia (2)
     Clinical healthcare data (since 2007): outcomes of medical inspections and clinical tests
     • DW areas: Diabetes Mellitus, Cardio-Cerebro-Vascular Disease, Essential Hypertension, …
     • Jan 2007 – Oct 2008
     • 1,300 diabetic patients
     • 5,000 inspections
       Variable                              Range                           IQ Range       Unit
       1. Body Mass Index (BMI)              [10-80]                         [25.15-31.28]  Kg/m2
       2. Systolic Blood Pressure (SBP)      [60-240]                        [130-150]      mmHg
       3. Diastolic Blood Pressure (DBP)     [30-150]                        [75-85]        mmHg
       4. Glycaemia                          [50-500]                        [112-162]      mg/dl
       5. Glycated Haemoglobin (HbA1c)       [3-20]                          [6.3-7.9]      %
       6. Total Cholesterol                  [80-500]                        [175-232]      mg/dl
       7. HDL Cholesterol                    [10-120]                        [43-62]        mg/dl
       8. Triglycerides                      [10-2000]                       [91-177]       mg/dl
       9. Cardio-Vascular Risk (CVR)         [0-100]                         [8.57-30.33]   %
       10. Anti-Hypertensive Therapy         {Yes; No}                       -              -
       11. Care Intervention                 {Diet; Health training; None}   -              -

  5. Temporal Representation
     Primary role of the temporal dimension
     [Figure: timeline for Patient j, time in days (t1 to t7), combining clinical data (SBP=190 mmHg, Triglycerides=230 mg/dl, Glycaemia=115 mg/dl), drug prescriptions (ACE inhibitors), lab tests (Creatinine test, Blood glucose test, Gamma GT) and hospital admissions (DRG 121, Diagn 410)]
     Case history: temporal sequences of healthcare events
     Temporal Association Rules (TARs) mining

  6. Integration of Data Sources
     • Administrative data: naturally represented as event sequences
     • Clinical data: pre-processing to shift from a quantitative representation to a qualitative description
     Knowledge-based Temporal Abstractions (TAs)
     • State TAs
       - Glycaemia<65 (low)
       - Glycaemia 65-100 (regular)
       - Glycaemia 100-125 (IFG)
       - Glycaemia 125-180 (high)
     • Trend TAs
       - Glycaemia Increasing
       - Glycaemia Steady
       - Glycaemia Decreasing
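
The state and trend abstractions listed on this slide can be sketched as two small functions. This is only an illustration: the glycaemia thresholds are the ones shown on the slide, while the handling of boundary values, the label for values of 180 mg/dl and above, and the `tolerance` parameter of the trend function are assumptions that the slides do not specify.

```python
def glycaemia_state(value: float) -> str:
    """Map a glycaemia measurement (mg/dl) to a qualitative state label.

    Thresholds come from the slide; assigning each boundary value to the
    upper range is an assumption.
    """
    if value < 65:
        return "low"
    if value < 100:
        return "regular"
    if value < 125:
        return "IFG"
    if value < 180:
        return "high"
    # The slide lists no state for 180 mg/dl and above (assumed label).
    return "above listed ranges"


def trend(previous: float, current: float, tolerance: float = 5.0) -> str:
    """Label the change between two consecutive measurements.

    `tolerance` (mg/dl) is a hypothetical parameter: changes smaller than
    it are treated as Steady.
    """
    delta = current - previous
    if delta > tolerance:
        return "Increasing"
    if delta < -tolerance:
        return "Decreasing"
    return "Steady"


if __name__ == "__main__":
    print(glycaemia_state(115))  # the Glycaemia=115 mg/dl value from slide 5
    print(trend(100, 120))
```

With these thresholds, the Glycaemia=115 mg/dl measurement shown on slide 5 falls in the 100-125 (IFG) state, which is the abstraction used on slide 7.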

  7. Temporal Heterogeneity
     Hybrid events: point-like and interval-like events (granularity)
     [Figure: timeline, time in days (t1 to t7). Interval-like events: SBP>180 (severe hypertension), Triglycerides>350 (very high), ACE inhibitors, Glycaemia 100-125 (IFG). Point-like events: Creatinine test, Blood glucose test, Gamma GT, DRG 121, Diagn 410]
     TARs mining on temporal sequences of hybrid events
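
One way to hold point-like and interval-like events in a single sequence is to give every event a start and an end, with point-like events having both equal. The sketch below assumes that representation; the `Event` class, the day indices, and the `build_sequence` helper are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    label: str
    start: int  # day index (hypothetical time unit)
    end: int    # equal to start for point-like events

    @property
    def is_point(self) -> bool:
        return self.start == self.end


def build_sequence(events: List[Event]) -> List[Event]:
    """Order one patient's hybrid events by start time, then by end time."""
    return sorted(events, key=lambda e: (e.start, e.end))


if __name__ == "__main__":
    # Labels are from the slide; the day indices are invented for illustration.
    history = build_sequence([
        Event("Blood glucose test", 5, 5),
        Event("ACE inhibitors", 1, 30),
        Event("Glycaemia 100-125 (IFG)", 3, 20),
    ])
    for e in history:
        kind = "point" if e.is_point else "interval"
        print(e.label, kind, e.start, e.end)
```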

  8. Temporal Association Rules (TARs)
     TAR: relationship defined through a temporal operator (op) which holds between an event A (the antecedent) and an event C (the consequent)
     • Basic rules: antecedent cardinality K=1 (A op C)
       E.g. ACE inhibitors BEFORE Heart failure diagnosis
     • Complex rules: K>1, Apriori*-like search strategy (A1 op A2 op A3 … op C)
       E.g. {ACE inhibitors AND Beta-blockers AND Diuretics} BEFORE Heart failure diagnosis
     * [Agrawal R., Srikant R. Fast Algorithms for Mining Association Rules in Large Databases. In: 20th International Conference on Very Large Data Bases, 487-499 (1994)]
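
As an illustration of a basic rule (K=1), the BEFORE operator can be sketched as a check on the gap between the end of the antecedent and the start of the consequent. The exact definition of the operator is not given on the slide, so the `max_gap` parameter, the choice to accept a gap of zero, and the tuple representation of events are assumptions.

```python
from typing import Tuple

# An event is represented here as (start_day, end_day); a point-like event
# has start_day == end_day. This representation is hypothetical.
Interval = Tuple[int, int]


def before(antecedent: Interval, consequent: Interval, max_gap: int) -> bool:
    """Return True if the antecedent ends no later than the consequent starts
    and the gap between them is at most max_gap days (assumed semantics)."""
    gap = consequent[0] - antecedent[1]
    return 0 <= gap <= max_gap


if __name__ == "__main__":
    ace_inhibitors = (1, 30)    # interval-like event, invented days
    heart_failure = (40, 40)    # point-like event, invented day
    print(before(ace_inhibitors, heart_failure, max_gap=365))
```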

  9. Support
     Support = (# subjects supporting the rule) / (total # of subjects)
     [Figure: rule occurrences over time for three subjects (s1, s2, s3), annotated f = 3, span = 0; f = 1, span = 0.7; f = 2, span = 0.2]
     The number of subjects supporting the rule is based on a frequency threshold (f_th) and a duration threshold (span_th)
     E.g. f_th = 3, span_th = 0.5: support = 2/3
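
The worked example on this slide can be reproduced with a short sketch. The slide does not say how the two thresholds are combined; the code assumes a subject supports the rule when either threshold is reached, because that reading yields the 2/3 reported on the slide. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


def subject_supports(f: int, span: float, f_th: int, span_th: float) -> bool:
    """Assumed criterion: enough occurrences OR a long enough overall span."""
    return f >= f_th or span >= span_th


def support(subjects: List[Tuple[int, float]], f_th: int, span_th: float) -> Fraction:
    """Fraction of subjects supporting the rule; each subject is (f, span)."""
    n_supporting = sum(
        1 for f, span in subjects if subject_supports(f, span, f_th, span_th)
    )
    return Fraction(n_supporting, len(subjects))


if __name__ == "__main__":
    # (f, span) pairs from the slide figure.
    subjects = [(3, 0.0), (1, 0.7), (2, 0.2)]
    print(support(subjects, f_th=3, span_th=0.5))
```

Under this reading, the subject with f = 3 passes the frequency threshold, the subject with span = 0.7 passes the duration threshold, and the subject with f = 2, span = 0.2 passes neither.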

  10. Support
     Support = (# subjects supporting the rule) / (total # of subjects)
     Support = 4/7

  11. Confidence
     Confidence = (# subjects supporting the rule, NSR) / (# subjects supporting the antecedent, NSA)
     Confidence = 2/3

  12. Confidence
     Confidence = (# subjects supporting the rule, NSR) / (# subjects supporting the antecedent, NSA)
     Probability that a patient experiences the consequent given that the antecedent occurred for that patient
     [Figure: occurrences of events A and C over time for three subjects (s1, s2, s3)]
     NSR = 1, NSA = 2: Confidence = 1/2
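
The confidence ratio can be computed directly from the sets of subjects involved. In the sketch below the subject identifiers are hypothetical: the slide reports only the counts NSR = 1 and NSA = 2, not which subjects they refer to.

```python
from fractions import Fraction
from typing import Optional, Set


def confidence(rule_subjects: Set[str], antecedent_subjects: Set[str]) -> Optional[Fraction]:
    """Confidence = NSR / NSA; returns None when no subject supports the antecedent."""
    nsa = len(antecedent_subjects)
    if nsa == 0:
        return None
    # Only subjects that also support the antecedent can support the rule.
    nsr = len(rule_subjects & antecedent_subjects)
    return Fraction(nsr, nsa)


if __name__ == "__main__":
    antecedent_subjects = {"s1", "s3"}  # hypothetical assignment, NSA = 2
    rule_subjects = {"s1"}              # hypothetical assignment, NSR = 1
    print(confidence(rule_subjects, antecedent_subjects))
```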

  13. Application: Diabetes Mellitus
     Rule template: A BEFORE (gap) C, where both the antecedent A and the consequent C can be State TAs, Trend TAs or SSN accesses
     Clinical (State TAs, Trend TAs)
     • Jan 2007 – Oct 2008
     • 1,300 diabetic patients
     • 5,000 inspections
     Administrative (SSN accesses)
       Access      Code    Description
       DRG         134     Hypertension
       Diagnosis   25000   Type II Diabetes Mellitus
       ATC         C07A    Beta Blocking Agents
       …           …       …
     Parameter settings
     • minsup = 0.01 (13 patients)
     • minconf = 0.3
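
The parameter settings translate into a simple filter on candidate rules. The sketch below uses the minsup, minconf and sample size reported on the slide; the function names are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
N_PATIENTS = 1300  # diabetic patients in the clinical data set (from the slide)
MINSUP = 0.01
MINCONF = 0.3


def patients_for(support: float, n_patients: int = N_PATIENTS) -> int:
    """Approximate number of patients corresponding to a support value."""
    return round(support * n_patients)


def passes_thresholds(support: float, confidence: float) -> bool:
    """Keep a rule only if it reaches both thresholds (assumed inclusive)."""
    return support >= MINSUP and confidence >= MINCONF


if __name__ == "__main__":
    print(patients_for(MINSUP))            # minsup expressed as a patient count
    print(passes_thresholds(0.013, 0.56))  # a rule above both thresholds
    print(passes_thresholds(0.005, 0.90))  # hypothetical rule, too rare
```

The first call reproduces the 13 patients quoted next to minsup.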

  14. Interesting rules
     Data Mining methods often produce a great amount of output information which is irrelevant, uninteresting or redundant
       Rule                                                                     Support  Confidence
       HbA1c 7-8 (high) BEFORE ATC A10: Drugs used in Diabetes                  0.25     0.62
       Total Cholesterol 220-280 (high) BEFORE ATC C10A: Lipid modifying agents 0.18     0.73
     Post-processing target: obtain only a reduced set of "interesting" rules
     Pipeline: Clinical data, Mining, Raw RuleSet, Minimp>1, Reduced RuleSet, Sequential step evaluation, "Interesting" rules
     • Minimum improvement: keep only the rules which increase the confidence value with respect to all their subrules
     • Rule classification based on the evidence of a clinical relationship between the events involved in the rules
       - ClinR=1: quantitative verification of a-priori knowledge
       - ClinR=0: suggestion for the discovery of unknown knowledge
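
The minimum improvement filter can be sketched as a comparison between a rule's confidence and the confidence of its subrules. The slide states only "Minimp>1" and the requirement to improve on all subrules; reading the improvement as the ratio between the rule's confidence and the best subrule confidence is an assumption, and the confidence values in the example are invented.

```python
from typing import Sequence


def keep_rule(rule_conf: float, subrule_confs: Sequence[float], minimp: float = 1.0) -> bool:
    """Keep a rule if its confidence exceeds minimp times the confidence of
    every subrule (assumed ratio-based reading of 'Minimp>1')."""
    if not subrule_confs:
        # Basic rules (K=1) have no subrules to compare against.
        return True
    return rule_conf > minimp * max(subrule_confs)


if __name__ == "__main__":
    # Hypothetical confidences for a complex rule and its subrules.
    print(keep_rule(0.56, [0.40, 0.50]))  # improves on every subrule
    print(keep_rule(0.56, [0.60]))        # a simpler subrule already does better
```

Comparing against the maximum subrule confidence is equivalent to requiring an improvement over each subrule individually, and it avoids a division when a subrule has zero confidence.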

  15. Results (1)
     Antecedent:
       - BMI 25-30 (overweight)
       - Glycaemia 65-110 (regular)
       - HbA1c 7-8 (high)
       - Anti-hypertensive therapy: yes
     BEFORE (gap: 1 visit)
     Consequent: Glycaemia Increasing
     Support = 0.013, Confidence = 0.56
     • TAR verified in 1.3% of the diabetic sample (17 p.)
     • Given the occurrence of the antecedent, there is a 56% probability of an increase in glycaemia in the following visit

  16. Results (2)
     Antecedent:
       - ATC C03C: High ceiling diuretics
       - ATC M04A: Antigout agents
     BEFORE (gap: 365 days)
     Consequent: HbA1c Increasing
     Support = 0.012, Confidence = 0.57
     • TAR verified in 1.2% of the diabetic sample (16 p.)
     • Given the occurrence of the antecedent, there is a 57% probability of an increase in HbA1c in the following year

  17. Results (2)
     Antecedent:
       - SBP<140 (regular)
       - ATC C03C: High ceiling diuretics
       - Care Intervention: Diet
     BEFORE (gap: 365 days)
     Consequent: SBP Increasing
     Support = 0.02, Confidence = 0.69
     • TAR verified in 2% of the diabetic sample (26 p.)
     • Given the occurrence of the antecedent, there is a 69% probability of an increase in systolic blood pressure in the following visit

  18. Results (2)
     Antecedent:
       - BMI 25-30 (overweight)
       - Glycaemia 110-180 (high)
       - HbA1c>9 (excessively high)
     BEFORE (gap: 365 days)
       Consequent                                                          Support  Confidence  ClinR
       ATC B01: Antithrombotic agents                                      0.013    0.537       0
       ATC B01A: Antithrombotic agents                                     0.013    0.537       0
       ATC B01AC: Platelet aggregation inhibitors, excluding heparin       0.011    0.464       0
     • TARs verified in about 1% of the diabetic sample (14-17 p.)
     • Apparently no clinical relationship between physiological observations and drug effects
     • Anti-platelet agents as the main antithrombotic drug therapy in the subgroup of patients

  19. Conclusions and Future Work
     General method to extract temporal relationships between diagnostic, therapeutic, or clinical patterns
       - Explicit handling of temporal heterogeneity (hybrid events)
       - Integration of different data sources with a uniform representation
     Ongoing Work and Future Developments
     Post-processing strategy
       - Rule set reduction: definition of "interesting" rules
       - Clinical classification of the rules
       - Hierarchical mining exploiting the taxonomical information
       - Ontology-driven rule classification to perform a totally automated post-processing procedure

  20. Conclusions (2)
     Future work
       - Hierarchical mining exploiting the taxonomical information
       - Ontology-driven rule classification to perform a totally automated post-processing procedure
       - Development of a method based on "chained" TARs to detect frequent temporal care-flows
     [Figure: timeline (t1 to t7) with ACE inhibitors, Beta-blockers, SBP>180 (severe hypertension), Heart failure]
     Chained rules:
       {ACE inhibitors} BEFORE Beta-blockers
       {ACE inhibitors BEFORE Beta-blockers} BEFORE SBP>180
       {ACE inhibitors BEFORE Beta-blockers BEFORE SBP>180} BEFORE Heart failure
     Given a temporal case history, which are the most frequently expected healthcare events?

  21. Acknowledgments Riccardo Bellazzi (riccardo.bellazzi@unipv.it) Stefano Concaro (stefano.concaro@unipv.it) Carlo Cerra, Pietro Fratino (carlo_cerra@asl.pavia.it)
