Big Image-Omics Data Analytics for Clinical Outcome Prediction

Junzhou Huang, Ph.D.
Associate Professor
Dept. Computer Science & Engineering
University of Texas at Arlington
Dept. CSE, UT Arlington, Scalable Modeling & Imaging & Learning Lab (SMILE)

Morphology and Prognosis
• Integration:
  – Connections between morphology and prognosis
  – How can pathological image data and molecular profiling data be integrated to learn this connection?

Clinical Outcome Prediction from Heterogeneous Cancer Data
• Problem:
  – Subtype Recognition
  – Survival Prediction
• Data:
  – Pathological Image
  – Gene Mutation
  – CNV
  – mRNA Expression
  – Protein Expression
• Cohort:
  – TCGA (The Cancer Genome Atlas)
  – NLST (The National Lung Screening Trial)
  – UT lung SPORE cohort

Pipeline Overview
Yao, J. and others: Clinical Imaging Biomarker Discovery for Lung Cancer Survival Prediction. To appear in MICCAI 2016.

Subtype Cell Detection
• Motivation:
  – Different cell types (tumor cells, stromal cells, lymphocytes) play different roles in tumor growth and metastasis
  – Accurately classifying cell types is a critical step toward better characterization of tumor growth and better outcome prediction
• Traditional cell detection methods [1]:
  – Pros: easy to implement and interpret; faster
  – Cons: detection performance is not good enough
• Deep learning cell detection methods [2]:
  – Pros: better detection performance
  – Cons: slow
[1] Arteta, C., Lempitsky, V., and others: Learning to Detect Cells Using Non-overlapping Extremal Regions. MICCAI (2012)
[2] Pan, H., Xu, Z., Huang, J.: An Effective Approach for Robust Lung Cancer Cell Detection. MICCAI Workshop (2015)
[3] Irshad, H., Veillard, A., Roux, L., Racoceanu, D.: Methodological Review (2014)

Deep Learning for Subtype Cell Detection
• We designed a special structure for subtype cell detection:
  – Shared convolution weights: the cell/non-cell deep convolutional neural network and the subtype deep convolutional neural network share all convolution weights, which compensates for the insufficiency and imbalance of the subtype cell patches.
  – Sparse kernel: d-regularly sparse kernels are introduced to eliminate all redundant computation and to speed up the detection process.
Figure legend:
  C: Convolutional layers (with pooling and ReLU layers)
  F: Fully-connected layers
  S: Soft-max layers
Sheng Wang, Jiawen Yao, Zheng Xu, Junzhou Huang: Subtype Cell Detection with an Accelerated Deep Convolution Neural Network, MICCAI 2016
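The weight-sharing idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the architecture from the cited paper: the class name `SharedTrunkDetector`, the layer sizes, the single convolutional layer without pooling, and the random weights are all hypothetical. It only shows one convolutional trunk (C) feeding two separate fully-connected heads (F), each followed by a soft-max (S).

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Valid 2-D cross-correlation of one image with a bank of kernels."""
    k = kernels.shape[1]
    # windows has shape (H-k+1, W-k+1, k, k)
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    return np.einsum("ijkl,ckl->cij", windows, kernels)

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

class SharedTrunkDetector:
    """Two classifiers that share one convolutional trunk (hypothetical sizes)."""

    def __init__(self, rng, patch=8, k=3, channels=4, n_subtypes=3):
        n_features = channels * (patch - k + 1) ** 2
        self.kernels = rng.normal(size=(channels, k, k))            # shared C weights
        self.w_cell = 0.01 * rng.normal(size=(2, n_features))       # F: cell / non-cell
        self.w_subtype = 0.01 * rng.normal(size=(n_subtypes, n_features))  # F: subtype

    def features(self, patch):
        # Convolution followed by ReLU; pooling is omitted in this sketch.
        return np.maximum(conv2d_valid(patch, self.kernels), 0.0).ravel()

    def predict(self, patch):
        f = self.features(patch)   # computed once, consumed by both heads
        return softmax(self.w_cell @ f), softmax(self.w_subtype @ f)
```

Because both heads read the same features, every cell/non-cell training patch also updates the weights the subtype head depends on, which is the motivation the slide gives for sharing.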

Results on Subtype Cell Detection
• Detection Results

  Method      Precision  Recall  F1 score  Time (s)
  NERS [1]    0.7990     0.6109  0.6757    31.47
  RLCCD [2]   0.7280     0.8030  0.7759    52.89
  Proposed    0.8029     0.8683  0.8215    0.7147

• Subtype Detection Results
  – Subtype Classification Neural Network Accuracy: 88.64%
  – Accuracy of Detected Cells: 87.18%
    - Lymphocytes Accuracy: 88.05%
    - Stromal Cell Accuracy: 81.08%
    - Tumor Cell Accuracy: 87.39%
Our method has better performance in terms of both accuracy and computational time.
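For readers unfamiliar with the detection metrics in the table, here is a minimal sketch of how precision, recall and F1 are conventionally computed from detection counts. The function name `detection_scores` and the example counts are hypothetical; the slides do not state how detections were matched to ground truth or how the scores were averaged.

```python
def detection_scores(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for a single image (not taken from the slides).
precision, recall, f1 = detection_scores(tp=80, fp=20, fn=10)
```

When scores are averaged over many images, the mean of the per-image F1 scores is in general not equal to the F1 computed from the mean precision and mean recall.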

Is lung cancer subtype cell detection easy?
The size of one sample image is […] (usually larger), while traditional cell detection methods still deal with images of size ~[…].
Cell density can be very high!
Image size: 512 × 512
Pixel scale: 0.25 μm/pixel

Acceleration (1): Sparse Kernel
Sparse kernels are used to eliminate all redundant computation in the convolutions.
With patch-by-patch detection, the yellow area is computed several times because it appears in several overlapping patches.
We take the whole image as input and reuse the convolution results during detection, which gives a speedup of roughly several hundred times, depending on the sliding-window size.
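The reuse argument can be checked on a toy case. The sketch below is an illustration for a single convolutional layer only: it does not implement the d-regularly sparse kernels named in the slides, and the function names and sizes are hypothetical. It compares convolving every overlapping window separately with convolving the whole image once and slicing the result.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid 2-D cross-correlation of an image with one square kernel."""
    k = kernel.shape[0]
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def patchwise_features(image, kernel, patch):
    """Naive approach: convolve every overlapping patch on its own."""
    h, w = image.shape
    return {(r, c): conv2d_valid(image[r:r + patch, c:c + patch], kernel)
            for r in range(h - patch + 1)
            for c in range(w - patch + 1)}

def shared_features(image, kernel, patch):
    """Convolve the whole image once, then slice out each window's features."""
    full = conv2d_valid(image, kernel)
    size = patch - kernel.shape[0] + 1   # feature-map side length per window
    h, w = image.shape
    return {(r, c): full[r:r + size, c:c + size]
            for r in range(h - patch + 1)
            for c in range(w - patch + 1)}
```

Both functions return identical feature maps for every window position, but the second performs the convolution arithmetic once instead of once per window. The saving grows with the window size, which is consistent with the slide's remark that the speedup depends on the sliding-window size.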

Acceleration (2): Prefetching Technique
We use Asynchronous Prefetching Technique to

Acceleration (3): Cluster Computing
• Single-node Multi-GPU Computing
  - Communication through the PCI-e bus
• Multi-node Computing
  - Communication through the network
  - The data is mapped to a high-performance, large-scale Network File System
  - Only the coordinates are communicated in the distributed system, which makes our framework scalable and communication-efficient in the cluster computing system.
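A minimal sketch of coordinate-only task distribution, assuming every worker can open the same image file on the shared file system. The tiling scheme, the round-robin assignment and both function names are hypothetical, and tile overlap at the borders (which a real detector would likely need) is ignored.

```python
def tile_coordinates(width, height, tile):
    """Return (x, y, w, h) tuples for tiles covering a width x height image."""
    return [(x, y, min(tile, width - x), min(tile, height - y))
            for y in range(0, height, tile)
            for x in range(0, width, tile)]

def assign_round_robin(tiles, n_workers):
    """Split the list of tile coordinates across workers.

    Only these small tuples would travel over the network; each worker
    reads the pixels of its own tiles from the shared file system.
    """
    return [tiles[i::n_workers] for i in range(n_workers)]

# Hypothetical example: a 1000 x 600 image, 256-pixel tiles, 4 workers.
tiles = tile_coordinates(1000, 600, 256)
jobs = assign_round_robin(tiles, 4)
```

Each message is four integers per tile regardless of image size, so communication cost scales with the number of tiles rather than the number of pixels.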

Results on the Single Machine
Time Comparison in Different Whole-slide Images
Test machine:
• CPU: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz
• RAM: 64 gigabytes
• GPU: 4 Nvidia Titan X GPUs
• Storage: Samsung 950 Pro solid-state drive
We are able to detect cells in a […] image within only 20 seconds on a single machine (4000 times acceleration; the larger the image, the greater the acceleration).

More Results on the GPU Clusters
Time Comparison on TACC Stampede Cluster
With 32 Nvidia Tesla K20 GPU nodes, our framework processes a […]-pixel image in only 155 seconds (~10,000 times acceleration).
TACC Stampede Cluster: https://www.tacc.utexas.edu/stampede/

Pipeline Overview

Biomarker Discovery for Survival Prediction
• Data set
  – The National Lung Screening Trial (NLST): 144 ADC, 113 SCC
• Predictive models
  – Multivariate Cox proportional hazards model with Lasso
  – Component-wise likelihood-based boosting (CoxBoost)
  – Random survival forest (RSF)
• Experiments
  – Compare with the state-of-the-art framework in lung cancer
  – Compare performance on different types of data (CNV, mRNA, microRNA and protein expression) on TCGA-LUSC
Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267-288, 1996.
Harald Binder and Martin Schumacher. Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinformatics, 9(1):1-10, 2008.
Hemant Ishwaran, Udaya B Kogalur, Eugene H Blackstone, and Michael S Lauer. Random survival forests. The Annals of Applied Statistics, pages 841-860, 2008.
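For the first predictive model, the quantity being minimized is the negative Cox partial log-likelihood plus an L1 penalty on the coefficients. The sketch below evaluates that objective only. It assumes no tied survival times (tie handling is omitted), and the function name and the penalty weight `lam` are hypothetical. It is not the fitting procedure used in the slides.

```python
import numpy as np

def cox_lasso_objective(beta, X, time, event, lam):
    """Negative Cox partial log-likelihood plus lam * ||beta||_1.

    X: (n, p) feature matrix; time: (n,) survival or censoring times;
    event: (n,) array with 1 for an observed event and 0 for censoring.
    Assumes all times are distinct.
    """
    eta = X @ beta                      # linear risk score per subject
    order = np.argsort(-time)           # sort by decreasing time
    eta_sorted = eta[order]
    event_sorted = event[order]
    # After sorting, the risk set of subject i (everyone with time >= t_i)
    # is the prefix ending at i, so a running log-sum-exp gives the
    # log of the risk-set denominator for every subject.
    log_risk = np.logaddexp.accumulate(eta_sorted)
    nll = -np.sum((eta_sorted - log_risk)[event_sorted == 1])
    return nll + lam * np.abs(beta).sum()
```

The L1 term drives many coefficients to exactly zero when the objective is minimized, which is why the Lasso variant is suited to selecting a small set of features from a large candidate pool.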

Results: Survival Prediction
[Figure: ADC; panels: Proposed, Wang]
A significant difference can be seen (smaller p-value) in the proposed framework.
Hongyuan Wang, Fuyong Xing, Hai Su, Arnold Stromberg, and Lin Yang. Novel image markers for non-small cell lung cancer classification and survival prediction. BMC Bioinformatics, 15(1):310, 2014.

Results
[Figure: SCC; panels: Proposed, Wang]
A significant difference can be seen (smaller p-value) in the proposed framework.

Results: Random Experiments (50 splits)
Values annotated in the figure: 0.5741, 0.5401, 0.5563, 0.4792, 0.5965, 0.5946, 0.5690, 0.5638
Fig. Boxplot of C-index distributions (Left: ADC, Right: SCC).
Concordance index (C-index): 1 indicates perfect prediction accuracy; 0.5 is as good as a random guess.
Harrell, F., Lee, K. & Mark, D. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 15, 361–387 (1996).
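A minimal sketch of the concordance index as it is commonly defined for right-censored data: among all comparable pairs, the fraction in which the subject with the shorter survival time received the higher predicted risk. The function name and the handling of ties in predicted risk (counted as half) are assumptions of this sketch. The slides do not say which implementation was used.

```python
def concordance_index(time, event, risk):
    """Harrell-style C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    and time[i] < time[j]. It is concordant when risk[i] > risk[j].
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue                     # a censored subject cannot anchor a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5    # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A model that assigns the same risk to everyone scores exactly 0.5 under this definition, matching the slide's remark that 0.5 is as good as a random guess.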

Integration with Molecular Data
