
A k-means approach to clustering disease progressions

Duc Thanh Anh Luong, Varun Chandola. Department of Computer Science & Engineering, University at Buffalo. IEEE ICHI 2017, August 26, 2017.


  1. A k-means approach to clustering disease progressions • Duc Thanh Anh Luong, Varun Chandola • Department of Computer Science & Engineering, University at Buffalo • IEEE ICHI 2017, August 26, 2017

  2. Outline • Motivation • K-means approach • An application for Chronic Kidney Disease • Generating patient-specific disease profiles

  3. Motivation • Find subgroups of patients that have similar disease progression • Identify the underlying mechanism of each subgroup • Provide better treatment for each subgroup

  4. Motivation • Different patients have different disease progressions • Consider the case of Chronic Kidney Disease • Are there a few general trends of disease progression? • Can we group patients by their progressions into a few groups? [Figure: eGFR versus days from first clinical record for ten patients, labeled by patient ID]

  5. Motivation [Figures: trajectories of 500 patients, and the trajectories after being clustered; eGFR versus days from first clinical record]

  6. Clustering problem and k-means algorithm • Cluster a set of data points into k clusters • Can be solved by the k-means approach • Reference: Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.
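For orientation, the generic k-means procedure on ordinary data points can be sketched as below. This is a minimal illustration and not the authors' code: the function name `kmeans`, the Euclidean distance, and the choice to pass the initial centroids explicitly are all assumptions made to keep the example small and deterministic.

```python
import math


def kmeans(points, init_centroids, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples.

    `init_centroids` is supplied by the caller (an assumption made here so
    the result is reproducible); k is the number of centroids given.
    """
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the procedure has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids
```

A call such as `kmeans(points, [points[0], points[-1]])` returns one cluster label per point together with the final centroids.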

  7. K-means approach • Data object: patient disease progression • Distance metric • Centroid: regression line [Figures: eGFR versus days from first clinical record, showing a single patient's records, a cluster centroid, and a regression line]

  8. K-means approach • Initial step: randomly assign patients into k clusters • Update step: perform regression for each cluster to obtain the “centroid” • Assignment step: assign each patient to the cluster that has the closest centroid • If any patient moved to another group, return to the update step; otherwise, end
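The steps above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the authors' implementation: the slides do not specify the regression model or the distance metric, so the code assumes an ordinary least-squares straight line as the centroid and the mean squared residual as the distance. The names `fit_line`, `mean_squared_residual`, `cluster_trajectories`, and the `init_labels` parameter are hypothetical.

```python
import random


def fit_line(records):
    """Least-squares line through (day, eGFR) records; returns (intercept, slope)."""
    n = len(records)
    mean_t = sum(t for t, _ in records) / n
    mean_v = sum(v for _, v in records) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in records)
    if sxx == 0:
        return mean_v, 0.0  # all records on one day: fall back to a flat line
    slope = sum((t - mean_t) * (v - mean_v) for t, v in records) / sxx
    return mean_v - slope * mean_t, slope


def mean_squared_residual(records, line):
    """Assumed distance: mean squared gap between a patient's records and a line."""
    intercept, slope = line
    return sum((v - (intercept + slope * t)) ** 2 for t, v in records) / len(records)


def cluster_trajectories(patients, k, init_labels=None, seed=0, max_iter=100):
    """patients maps a patient ID to a list of (day, eGFR) records."""
    rng = random.Random(seed)
    ids = list(patients)
    # Initial step: random assignment unless the caller provides one.
    labels = dict(init_labels) if init_labels else {p: rng.randrange(k) for p in ids}
    lines = [None] * k
    for _ in range(max_iter):
        # Update step: regress on the pooled records of each cluster.
        for j in range(k):
            pooled = [rec for p in ids if labels[p] == j for rec in patients[p]]
            if pooled:  # an empty cluster keeps its previous line, if any
                lines[j] = fit_line(pooled)
        # Assignment step: move each patient to the closest centroid line.
        usable = [j for j in range(k) if lines[j] is not None]
        moved = False
        for p in ids:
            best = min(usable, key=lambda j: mean_squared_residual(patients[p], lines[j]))
            if best != labels[p]:
                labels[p] = best
                moved = True
        if not moved:
            break  # no patient moved to another group
    return labels, lines
```

Like ordinary k-means, this procedure can settle in a local optimum that depends on the initial assignment, so in practice one would likely run it from several random starts.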

  9. Dataset & Preprocessing • DARTNet patients (n = 69,817) • Excluded: invalid birth year and sex value (n = 6,418) • Excluded: number of serum creatinine records < 1 (n = 181) • Excluded: invalid data records (n = 9) • “Preprocessed” DARTNet patients (n = 63,209) • Having eGFR values less than 60 for more than three months (n = 29,585) • Excluded: observation duration < 1 year (n = 5,285) • Excluded: number of serum creatinine records < 10 (n = 17,158) • Final CKD cohort (n = 7,142)
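The cohort counts in the flowchart can be checked with simple arithmetic. The short script below assumes that each "Excluded" box is subtracted from the count directly above it; the pairing of exclusion reasons with counts is a reading of the flattened flowchart, and the helper name `remaining` is hypothetical.

```python
def remaining(start, exclusions):
    """Subtract every exclusion count from the starting cohort size."""
    return start - sum(exclusions.values())


# First stage: from all DARTNet patients to the "preprocessed" set.
preprocessed = remaining(
    69_817,
    {
        "invalid birth year and sex value": 6_418,
        "serum creatinine records < 1": 181,
        "invalid data records": 9,
    },
)

# Second stage: from patients with eGFR < 60 for more than three months
# to the final CKD cohort.
final_cohort = remaining(
    29_585,
    {
        "observation duration < 1 year": 5_285,
        "serum creatinine records < 10": 17_158,
    },
)

print(preprocessed, final_cohort)  # prints: 63209 7142
```

Both results match the counts reported on the slide (n = 63,209 and n = 7,142).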

  10. Clustering result

  11. Demographic distribution in clusters [Figure: age by cluster for Clusters 1 to 10, annotated with cluster sizes: 0.73% (52), 4.56% (326), 5.91% (422), 9.49% (678), 14.86% (1061), 11.55% (825), 13.72% (980), 9.61% (686), 14.59% (1042), 14.98% (1070)]

  12. Other clinical markers

  13. Generating patient-specific disease profiles [Figure: eGFR versus days from first clinical record, with the labels “residuals” and “Gaussian processes”] • Reference: Rasmussen, Carl Edward, and Christopher K. I. Williams. Gaussian Processes for Machine Learning. Vol. 1. Cambridge: MIT Press, 2006.
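One way to read this slide is that a patient's residuals from the cluster's regression line are modeled with a Gaussian process, and the individual prediction is the cluster line plus the predicted residual. The sketch below illustrates that reading in plain Python. Everything specific in it is an assumption: the squared-exponential kernel, the hyperparameter values (`LENGTH`, `SIGNAL_VAR`, `NOISE_VAR`), the use of mean plus or minus two standard deviations as limits, and the function names.

```python
import math

LENGTH = 200.0     # assumed kernel length scale, in days
SIGNAL_VAR = 25.0  # assumed prior variance of the residual, in eGFR units squared
NOISE_VAR = 1.0    # assumed measurement noise variance


def rbf(a, b):
    """Squared-exponential covariance between two time points."""
    return SIGNAL_VAR * math.exp(-0.5 * ((a - b) / LENGTH) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def gp_predict(days, residuals, query_days):
    """Posterior (mean, variance) of a zero-mean GP at each query day."""
    K = [[rbf(a, b) + (NOISE_VAR if i == j else 0.0)
          for j, b in enumerate(days)] for i, a in enumerate(days)]
    alpha = solve(K, residuals)
    out = []
    for q in query_days:
        ks = [rbf(q, d) for d in days]
        mean = sum(k * a for k, a in zip(ks, alpha))
        var = rbf(q, q) - sum(k * v for k, v in zip(ks, solve(K, ks)))
        out.append((mean, var))
    return out


def predict_patient(line, records, query_days):
    """Cluster line plus GP residual; returns (lower, centre, upper) per query day."""
    intercept, slope = line
    days = [d for d, _ in records]
    resid = [v - (intercept + slope * d) for d, v in records]
    result = []
    for q, (m, var) in zip(query_days, gp_predict(days, resid, query_days)):
        centre = intercept + slope * q + m
        half = 2.0 * math.sqrt(max(var, 0.0))  # assumed two-standard-deviation band
        result.append((centre - half, centre, centre + half))
    return result
```

Far from any observation the predicted residual shrinks toward zero, so the individual prediction falls back to the cluster's line while the band widens toward the prior variance.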

  14. Generating patient-specific disease profiles [Figure: two panels, “Patient 391” and “Cluster 5”, plotting eGFR versus day; legend: cluster's trajectory, individual predicted trajectory, upper and lower limit, actual eGFR value]

  15. Conclusion & Future Work • Clustering disease progressions – k-means approach • Generating individual prediction – Gaussian processes • Extend the approach to cope with multiple clinical markers • Give a quantitative evaluation of clusters: tightness and separation

  16. Thank you
