Extreme Spatio Temporal Data Analysis in Biomedical Informatics
- Joel Saltz MD, PhD
- Director, Center for Comprehensive Informatics
Contributions - Computer Science: Methods and middleware for
Center for Comprehensive Informatics
Shimada, Gurcan, Kong, Saltz
[Figure: flowchart of multi-resolution image tile classification, with TRAINING and TESTING paths. Training boxes: Training Tiles, Down-sampling, Segmentation, Feature Construction, Feature Extraction, Classifier Training. Testing boxes: Image Tile, Initialization (I = L), Background? (Yes: Label), Create Image I(L), Segmentation, Feature Construction, Feature Extraction, Classification (Bayesian), Controller (Confidence Region): Within Confidence Region? If not, I = I - 1 and repeat while I > 1. Other surviving labels: "resolution levels)", "(EMLDA)", "Features)".]
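The testing loop named in the flowchart labels (initialization I = L, a confidence-region check, I = I - 1, I > 1?) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `classify` callable, the `threshold` parameter, and the use of a single confidence score as the "confidence region" are assumptions, and the background check and feature steps are omitted.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical classifier signature: takes the tile at one resolution level
# and returns (label, confidence in [0, 1]).
Classifier = Callable[[object], Tuple[str, float]]


def classify_tile(
    pyramid: Sequence[object],
    classify: Classifier,
    threshold: float = 0.9,
) -> Tuple[str, int]:
    """Walk resolution levels I = L, L-1, ..., 1 until a decision is confident.

    pyramid[i - 1] holds the tile at level i. Returns (label, level_used).
    """
    level = len(pyramid)                      # initialization: I = L
    while True:
        label, confidence = classify(pyramid[level - 1])
        within_region = confidence >= threshold
        if within_region or level == 1:       # accept, or no levels left
            return label, level
        level -= 1                            # I = I - 1, then try again
```

The point of the controller is that most tiles are settled at the first level tried, so the remaining levels are only visited for tiles whose classification is still uncertain.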
Anaplastic Astrocytoma (WHO grade III) Glioblastoma (WHO grade IV)
Nuclei Segmentation Cellular Features
Whole Slide Imaging
Consensus clustering of morphological signatures
The study includes 200 million nuclei taken from 480 slides corresponding to 167 distinct patients.
Each possible number of clusters was evaluated using 2000 iterations of K-means to quantify co-clustering.
[Figure: clusters 1-3; plot of Silhouette Area vs. # Clusters (2 to 7); plot of Silhouette Value by Cluster (1 to 3).]
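Quantifying co-clustering over repeated K-means runs can be sketched as below. This is a minimal pure-Python illustration under stated assumptions: the slide does not say how runs differ, so here each run simply uses a different random initialization, and `kmeans_labels` and `co_clustering` are hypothetical names. A study at the scale described would need a vectorized or distributed implementation.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points: List[Point], k: int, rng: random.Random,
                  steps: int = 20) -> List[int]:
    """Plain Lloyd's K-means with random initial centers; one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(steps):
        labels = [min(range(k), key=lambda c: _dist2(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_clustering(points: List[Point], k: int, runs: int,
                  seed: int = 0) -> List[List[float]]:
    """Fraction of runs in which each pair of points lands in the same cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans_labels(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]
```

Entries near 0 or 1 indicate pairs that are consistently separated or grouped; comparing how clean this matrix is for each candidate number of clusters is one way to choose among them.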
Cell Cycle (CC), Chromatin Modification (CM), Protein Biosynthesis (PB)
[Figure: feature indices by cluster (CC, CM, PB); survival vs. days for clusters CC, CM, PB.]
– Gene expression class: not significant (p = 0.58)
– Morphology clustering: p = 5.0e-3
[Figure: subtype percentage (%) by cluster (CC, CM, PB) for the Classical, Mesenchymal, Neural and Proneural subtypes.]
[Figure: feature indices by cluster (CC, Mixed, CM); survival vs. months for clusters CC, Mixed, CM.]
1000’s of genes
computer resources to squeeze the most out of image, sensor or simulation data
algorithms to derive same features
derive complementary features
management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms
– Microscopy image analyses
– Biomass monitoring using satellite imagery
– Weather prediction using satellite and ground sensor data
– Large scale simulations
unified framework
data
changes using higher resolution data (e.g. multitemporal AWiFS data)
Pipeline of filters connected through logical streams
In-transit processing
Flow control between filters and streams
Developed 1990s-2000s; led to IBM System S
Two-level hierarchical pipeline framework
In-transit processing
Coarse-grained components are handled by a Manager that coordinates work on pipeline stages between nodes
Fine-grained pipeline operations are managed at the node level
Both levels employ the filter/stream paradigm
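The filter/stream paradigm with flow control can be illustrated with a small single-node sketch. It is only an analogy for the ideas on these slides, not the framework itself: filters are modeled as threads, streams as bounded queues (a full queue blocks the upstream filter, which gives a simple form of flow control), and `run_filter`, `run_pipeline`, and `capacity` are hypothetical names. Error handling is omitted, so a filter that raises would stall the pipeline.

```python
import queue
import threading
from typing import Callable, Iterable, List

_END = object()  # sentinel marking the end of a stream


def run_filter(fn: Callable, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply fn to every item from inbox and forward the result downstream."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)
            return
        outbox.put(fn(item))


def run_pipeline(source: Iterable, filters: List[Callable],
                 capacity: int = 4) -> list:
    """Chain filters with bounded queues; a full queue blocks its producer."""
    streams = [queue.Queue(maxsize=capacity) for _ in range(len(filters) + 1)]
    workers = [
        threading.Thread(target=run_filter,
                         args=(fn, streams[i], streams[i + 1]))
        for i, fn in enumerate(filters)
    ]
    for worker in workers:
        worker.start()

    def feed() -> None:
        for item in source:
            streams[0].put(item)
        streams[0].put(_END)

    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        item = streams[-1].get()
        if item is _END:
            break
        results.append(item)

    feeder.join()
    for worker in workers:
        worker.join()
    return results
```

For example, `run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2])` pushes each item through both filters in order while the stages overlap in time.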
– Nodes contain CPUs, GPUs
– Each CPU contains multiple cores
– GPU has complex internal architecture
– Data locality within node
– Data paths between CPUs and GPUs
Keeneland Node
Morphological Reconstruction:
8-15 fold speedup vs. one CPU core (Intel i7, 2.66 GHz) on NVIDIA C2070 and GTX580 GPUs
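For readers unfamiliar with the operation being benchmarked, grayscale morphological reconstruction by dilation can be written as a naive CPU reference. This is the textbook iterative definition (dilate the marker, clip it to the mask, repeat until nothing changes), not the GPU implementation behind the speedup figures; the 8-connected neighborhood and the function name are assumptions.

```python
from typing import List

Image = List[List[int]]


def reconstruct_by_dilation(marker: Image, mask: Image) -> Image:
    """Repeat: dilate marker (8-connected), clip to mask, until stable."""
    rows, cols = len(mask), len(mask[0])
    # Enforce marker <= mask before iterating.
    current = [[min(marker[r][c], mask[r][c]) for c in range(cols)]
               for r in range(rows)]
    while True:
        nxt = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                best = current[r][c]
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            best = max(best, current[rr][cc])
                nxt[r][c] = min(best, mask[r][c])  # never exceed the mask
        if nxt == current:
            return current
        current = nxt
```

Each pass touches every pixel and the number of passes depends on how far values must propagate, which is why the operation is a natural target for faster queue-based or GPU formulations.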
Point query: human marked point inside a nucleus
Window query: return markups contained in a rectangle
Spatial join query: algorithm validation/comparison
Containment query: nuclear feature aggregation in tumor regions
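Three of the query types listed above (point, window, containment) can be sketched in a few lines. This is an in-memory illustration only: the slides indicate that such queries run inside a spatial database, whereas here markups are plain vertex lists, all function names are hypothetical, and containment is approximated by testing polygon vertices. The spatial join is omitted.

```python
from typing import Dict, List, Tuple

Pt = Tuple[float, float]
Polygon = List[Pt]


def contains_point(poly: Polygon, p: Pt) -> bool:
    """Ray-casting point-in-polygon test (boundary points not special-cased)."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_query(markups: Dict[str, Polygon], p: Pt) -> List[str]:
    """Markups (for example nuclei) that contain a human-marked point."""
    return [name for name, poly in markups.items() if contains_point(poly, p)]


def window_query(markups: Dict[str, Polygon],
                 window: Tuple[Pt, Pt]) -> List[str]:
    """Markups lying entirely inside the rectangle (min corner, max corner)."""
    (xmin, ymin), (xmax, ymax) = window
    return [name for name, poly in markups.items()
            if all(xmin <= x <= xmax and ymin <= y <= ymax for x, y in poly)]


def containment_query(markups: Dict[str, Polygon],
                      region: Polygon) -> List[str]:
    """Markups whose vertices all fall inside a region polygon (approximation)."""
    return [name for name, poly in markups.items()
            if all(contains_point(region, v) for v in poly)]
```

A containment result like this would then feed an aggregation step, such as averaging nuclear features over the nuclei found inside a tumor region.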
imageReferences, provenance
querying and sharing data
RDBMS + SDBMS
PAIS
PAIS: Example Queries
SELECT c.pais_uid, pc.subtype,
       AVG(area), AVG(perimeter), AVG(eccentricity),
       COVARIANCE(area, perimeter), COVARIANCE(area, eccentricity)
FROM pais.calculation_flat c, TCGA.PATIENT_CHARACTERISTIC pc, pais.patient p
WHERE p.patientid = pc.patient_id
  AND p.pais_uid = c.pais_uid
GROUP BY c.pais_uid, pc.subtype;
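To make explicit what the query above returns, the same per-group summary can be sketched in Python over rows that have already been joined. The dictionary keys mirror the column names in the query; the function names are hypothetical, and treating COVARIANCE as the population covariance (dividing by n) is an assumption about the database's definition.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Tuple


def covariance(xs: List[float], ys: List[float]) -> float:
    """Population covariance (divides by n); assumed to match COVARIANCE."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def summarize(rows: List[dict]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Group joined rows by (pais_uid, subtype) and aggregate nuclear features."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["pais_uid"], row["subtype"])].append(row)

    summary = {}
    for key, members in groups.items():
        area = [m["area"] for m in members]
        perimeter = [m["perimeter"] for m in members]
        eccentricity = [m["eccentricity"] for m in members]
        summary[key] = {
            "avg_area": fmean(area),
            "avg_perimeter": fmean(perimeter),
            "avg_eccentricity": fmean(eccentricity),
            "cov_area_perimeter": covariance(area, perimeter),
            "cov_area_eccentricity": covariance(area, eccentricity),
        }
    return summary
```

Each output entry corresponds to one row of the SQL result: one slide-level feature signature per image and patient subtype.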
[Figure: feature indices by cluster (clusters 1 to 4); survival vs. days for Cluster 1, Cluster 2, Cluster 3 and Cluster 4.]
Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director)
Kurc, Himanshu Rathod Emory leads
Daniel Rubin, Fred Prior, Larry Tarbox and many others
Pantalone
Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers)
Hammoud, Manal Jilwan, Prashant Raghavan, Max Wintermark, David Gutman, Carlos Moreno, Lee Cooper, John Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani, Carl Jaffe
Alexander Quarshie, Circe Tsui, Adam Davis, Sharon Mason, Andrew Post, Alfredo Tirado-Ramos
Tahsin Kurc, P. Sadayappan, Gaurang Mehta, Karan Vahi