A k-means approach to clustering disease progressions
Duc Thanh Anh Luong, Varun Chandola
Department of Computer Science & Engineering, University at Buffalo
IEEE ICHI 2017, August 26, 2017
Outline: Motivation; K-means approach; An…
[Figure: eGFR vs. days from first clinical record, one trajectory per patient ID]
[Figure: eGFR vs. days from first clinical record]
Bishop, Christopher M. Pattern recognition and machine learning. Springer, 2006.
K-means components: data object (patient disease progression), centroid, distance metric
[Figure: eGFR vs. days from first clinical record]
[Figure: eGFR vs. days from first clinical record, with the centroid drawn as a regression line (red)]
[Figure: eGFR vs. days from first clinical record]
Initial step: randomly assign each patient to one of k clusters.
Update step: perform regression for each cluster to obtain its "centroid".
Assignment step: assign each patient to the cluster that has the closest centroid.
Did no patient move to another group? Yes: end. No: return to the update step.
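The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the slides do not spell out the regression model or the distance metric, so the sketch assumes a straight-line fit of eGFR on days for each cluster and uses a patient's mean squared residual from that line as the distance. The names `fit_line`, `distance`, and `cluster_progressions` are hypothetical.

```python
import numpy as np

def fit_line(days, egfr):
    """Least-squares line: egfr ~ slope * days + intercept (assumed centroid form)."""
    slope, intercept = np.polyfit(days, egfr, deg=1)
    return slope, intercept

def distance(patient, line):
    """Assumed distance: mean squared residual of one patient from a cluster line."""
    days, egfr = patient
    slope, intercept = line
    return float(np.mean((egfr - (slope * days + intercept)) ** 2))

def cluster_progressions(patients, k, max_iter=100, seed=0):
    """patients: list of (days, egfr) array pairs. Returns (labels, lines)."""
    rng = np.random.default_rng(seed)
    # Initial step: randomly assign each patient to one of k clusters.
    labels = rng.integers(0, k, size=len(patients))
    for _ in range(max_iter):
        # Update step: pool each cluster's observations and fit one line.
        lines = []
        for c in range(k):
            members = [p for p, lab in zip(patients, labels) if lab == c]
            if not members:
                # Assumption: an empty cluster is re-seeded with a random patient.
                members = [patients[rng.integers(len(patients))]]
            days = np.concatenate([d for d, _ in members])
            egfr = np.concatenate([e for _, e in members])
            lines.append(fit_line(days, egfr))
        # Assignment step: move each patient to the closest centroid.
        new_labels = np.array(
            [int(np.argmin([distance(p, ln) for ln in lines])) for p in patients]
        )
        # Stop when no patient moved to another group.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, lines
```

Like ordinary k-means, a loop of this kind depends on the random initial assignment, so in practice one would likely rerun it with several seeds and compare the results.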
DARTNet patients (n = 69,817)
  Excluded: invalid birth year and sex value (n = 6,418)
  Excluded: number of serum creatinine records < 1 (n = 181)
  Excluded: invalid data records (n = 9)
"Preprocessed" DARTNet patients (n = 63,209)
Having eGFR values less than 60 for more than three months (n = 29,585)
  Excluded: number of serum creatinine records < 10 (n = 17,158)
  Excluded: observation duration < 1 year (n = 5,285)
Final CKD cohort (n = 7,142)
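The counts in the cohort flowchart are internally consistent, which a short arithmetic check makes explicit. The variable names are illustrative only; every number comes from the flowchart.

```python
# Preprocessing: start from all DARTNet patients and subtract the three exclusions.
dartnet_patients = 69_817
preprocessing_exclusions = {
    "invalid birth year and sex value": 6_418,
    "serum creatinine records < 1": 181,
    "invalid data records": 9,
}
preprocessed = dartnet_patients - sum(preprocessing_exclusions.values())

# CKD cohort: start from patients with eGFR < 60 for more than three months.
egfr_below_60 = 29_585
cohort_exclusions = {
    "serum creatinine records < 10": 17_158,
    "observation duration < 1 year": 5_285,
}
final_cohort = egfr_below_60 - sum(cohort_exclusions.values())

print(preprocessed, final_cohort)  # 63209 7142
```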
[Figure: patient age by cluster]
Cluster      Patients   Share
Cluster 1    422        5.91%
Cluster 2    1061       14.86%
Cluster 3    980        13.72%
Cluster 4    1042       14.59%
Cluster 5    1070       14.98%
Cluster 6    686        9.61%
Cluster 7    825        11.55%
Cluster 8    678        9.49%
Cluster 9    326        4.56%
Cluster 10   52         0.73%
[Figure: eGFR vs. days from first clinical record]
Residuals: modeled with Gaussian processes
Rasmussen, Carl Edward, and Christopher K. I. Williams. Gaussian processes for machine learning. Vol. 1. Cambridge: MIT Press, 2006.
[Figure, two panels: individual predicted trajectory with upper and lower limits, plotted against actual eGFR values]
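One way an individual trajectory with upper and lower limits could be produced is sketched below in plain NumPy. Everything beyond "residuals" and "Gaussian processes" is an assumption here: the residual is taken as a patient's eGFR minus a cluster regression line, the kernel is a squared-exponential (RBF) with fixed, hand-picked hyperparameters, and the limits are a 95% band (mean plus or minus 1.96 standard deviations). The names `rbf_kernel` and `predict_trajectory` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between two 1-D arrays of time points."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)

def predict_trajectory(days, egfr, line, query_days,
                       length_scale=200.0, signal_var=25.0, noise_var=4.0):
    """Return (mean, lower, upper) eGFR predictions at query_days.

    line is the (slope, intercept) of the patient's cluster centroid;
    the hyperparameter defaults are illustrative, not fitted values.
    """
    slope, intercept = line
    # Residuals of this patient from the cluster regression line.
    residuals = egfr - (slope * days + intercept)
    # Standard GP regression on the residuals via a Cholesky factorization.
    K = rbf_kernel(days, days, length_scale, signal_var) + noise_var * np.eye(len(days))
    K_s = rbf_kernel(query_days, days, length_scale, signal_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, residuals))
    resid_mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    # Predictive variance of a new noisy observation at each query point.
    var = signal_var - np.sum(v ** 2, axis=0) + noise_var
    sd = np.sqrt(np.maximum(var, 0.0))
    # Individual trajectory = cluster line + predicted residual.
    mean = slope * query_days + intercept + resid_mean
    return mean, mean - 1.96 * sd, mean + 1.96 * sd
```

Far from a patient's observations the predicted residual shrinks toward zero, so the individual trajectory falls back to the cluster line and the band widens to its prior width. In a real analysis the hyperparameters would normally be estimated from data rather than fixed by hand.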