A k-means approach to clustering disease progressions Duc Thanh Anh - PowerPoint PPT Presentation

A k-means approach to clustering disease progressions
Duc Thanh Anh Luong, Varun Chandola
Department of Computer Science & Engineering, University at Buffalo
IEEE ICHI 2017, August 26, 2017

Outline
• Motivation
• K-means approach
• An application for Chronic Kidney Disease
• Generating patient-specific disease profiles

Motivation
• Find subgroups of patients who have similar disease progressions
• Identify the underlying mechanism of each subgroup
• Provide better treatment for each subgroup

Motivation
• Different patients have different disease progressions
• Consider the case of Chronic Kidney Disease
• Are there a few general trends of disease progression?
• Can we group patients by their progressions into a few groups?
[Figure: eGFR vs. days from first clinical record for ten patients, colored by patient ID]

Motivation
[Figure: eGFR vs. days from first clinical record, two panels: trajectories of 500 patients; trajectories after being clustered]

Clustering problem and k-means algorithm
• Cluster a set of data points into k clusters
• Can be solved by the k-means approach
Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.
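
For reference, the standard k-means objective as given in the cited text (Bishop, 2006) is shown below. The notation is Bishop's, not the slides': x_n are the N data points, \mu_k the K cluster centers, and r_{nk} equals 1 if point n is assigned to cluster k and 0 otherwise.

```latex
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^{2}
```

K-means alternates between choosing the r_{nk} that minimize J for fixed \mu_k (assignment) and choosing the \mu_k that minimize J for fixed r_{nk} (update).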

K-means approach
• Data object: patient disease progression
• Centroid: regression line
• Distance metric
[Figures: eGFR vs. days from first clinical record, three panels: a single patient's progression (data object); a patient's observations plotted against a centroid (distance metric); a cluster's observations with its regression line (centroid)]

K-means approach
• Initial step: randomly assign patients into k clusters
• Update step: perform regression for each cluster to obtain its "centroid"
• Assignment step: assign each patient to the cluster that has the closest centroid
• Stopping check: if no patient moved to another group, end; otherwise return to the update step
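
The loop described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the regression is assumed to be an ordinary least-squares line over the pooled observations of a cluster, the distance is assumed to be the mean squared residual between a patient's observations and that line (the slides do not spell out the metric), and the names `fit_line`, `distance`, and `cluster_progressions` are hypothetical.

```python
import random


def fit_line(points):
    """Least-squares line through (day, eGFR) points; returns (intercept, slope)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:  # all observations on the same day: fall back to a flat line
        return mean_y, 0.0
    slope = sum((t - mean_t) * (y - mean_y) for t, y in points) / var_t
    return mean_y - slope * mean_t, slope


def distance(series, line):
    """Assumed metric: mean squared residual of one patient's points from a line."""
    intercept, slope = line
    return sum((y - (intercept + slope * t)) ** 2 for t, y in series) / len(series)


def cluster_progressions(patients, k, max_iter=100, seed=0):
    """patients: list of series, each a list of (day, eGFR) pairs."""
    rng = random.Random(seed)
    # Initial step: randomly assign patients into k clusters.
    labels = [rng.randrange(k) for _ in patients]
    lines = [None] * k
    for _ in range(max_iter):
        # Update step: regress over each cluster's pooled observations.
        for c in range(k):
            pooled = [p for s, lab in zip(patients, labels) if lab == c for p in s]
            if pooled:  # an empty cluster keeps its previous line (or None)
                lines[c] = fit_line(pooled)
        # Assignment step: move each patient to the closest fitted centroid.
        fitted = [c for c in range(k) if lines[c] is not None]
        new_labels = [
            min(fitted, key=lambda c: distance(series, lines[c]))
            for series in patients
        ]
        # Stopping check: end when no patient moved to another group.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, lines
```

Calling `cluster_progressions(patients, k=10)` returns one label per patient and one `(intercept, slope)` pair per cluster. As with ordinary k-means, the result depends on the random initial assignment, so running several seeds and comparing the outcomes is a reasonable precaution.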

Dataset & Preprocessing
• DARTNet patients (n = 69,817)
  • Excluded: invalid birth year and sex value (n = 6,418)
  • Excluded: number of serum creatinine records < 1 (n = 181)
  • Excluded: invalid data records (n = 9)
• "Preprocessed" DARTNet patients (n = 63,209)
• Having eGFR values less than 60 for more than three months (n = 29,585)
  • Excluded: observation duration < 1 year (n = 5,285)
  • Excluded: number of serum creatinine records < 10 (n = 17,158)
• Final CKD cohort (n = 7,142)

Clustering result

Demographic distribution in clusters
[Figure: age distribution by cluster for Clusters 1 to 10; x-axis: cluster, y-axis: age. Each cluster is annotated with its share of the cohort and patient count. Annotations in extraction order (the mapping to cluster numbers is not recoverable): 0.73% (52), 4.56% (326), 5.91% (422), 9.49% (678), 14.86% (1061), 11.55% (825), 13.72% (980), 9.61% (686), 14.59% (1042), 14.98% (1070)]

Other clinical markers

Generating patient-specific disease profiles
• Gaussian processes
[Figure: eGFR vs. days from first clinical record for one patient, with residuals marked]
Rasmussen, Carl Edward, and Christopher K. I. Williams. Gaussian Processes for Machine Learning. Vol. 1. Cambridge: MIT Press, 2006.
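
One plausible reading of this slide is that a Gaussian process is fitted to a patient's residuals, that is, the differences between the patient's eGFR values and the regression line of their cluster, and that the result is added back to the line to give an individual prediction. The sketch below illustrates that idea using only the standard library. Everything specific in it is an assumption for illustration: the squared-exponential kernel, the hyperparameter values, the 95% band as the "upper and lower limit", and the names `rbf`, `solve`, and `predict_profile`.

```python
import math


def rbf(a, b, length_scale, signal_var):
    """Squared-exponential kernel between two time points (in days)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def predict_profile(days, egfr, line, new_days,
                    length_scale=200.0, signal_var=25.0, noise_var=1.0):
    """Return (mean, lower, upper) per new day: cluster line + GP on residuals."""
    intercept, slope = line
    # Residuals of this patient's observations from the cluster's line.
    residuals = [y - (intercept + slope * t) for t, y in zip(days, egfr)]
    n = len(days)
    # Kernel matrix with observation noise added on the diagonal.
    K = [[rbf(days[i], days[j], length_scale, signal_var)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(K, residuals)
    out = []
    for t in new_days:
        k_star = [rbf(t, d, length_scale, signal_var) for d in days]
        resid_mean = sum(k * a for k, a in zip(k_star, alpha))
        v = solve(K, k_star)
        # Posterior variance of the latent residual (observation noise excluded).
        var = signal_var - sum(k * w for k, w in zip(k_star, v))
        mean = intercept + slope * t + resid_mean
        half = 1.96 * math.sqrt(max(var, 0.0))  # assumed 95% band
        out.append((mean, mean - half, mean + half))
    return out
```

Far from any observation the kernel terms vanish, so the prediction falls back to the cluster's line and the band widens to its prior width; near observations the prediction is pulled toward the patient's own values. In practice the hyperparameters would be fitted to data rather than fixed by hand.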

Generating patient-specific disease profiles
[Figure: two panels titled "Patient 391" and "Cluster 5"; x-axis: day (0 to 1000), y-axis: eGFR (0 to 100). Legend: cluster's trajectory, individual predicted trajectory, upper and lower limit, actual eGFR value]

Conclusion & Future Work
• Clustering disease progressions – k-means approach
• Generating individual prediction – Gaussian processes
• Extend the approach to cope with multiple clinical markers
• Give quantitative evaluation of clusters
  • Tightness
  • Separation

Thank you
