Machine learning for cancer genomics


SLIDE 1

Machine learning for cancer genomics

Jean-Philippe Vert (Jean-Philippe.Vert@mines.org)

Mines ParisTech / Curie Institute / Inserm

"Informatics and mathematical sciences: interactions with biomedical sciences" workshop, Paris, June 17, 2011.

SLIDE 2

Outline

1. Introduction
2. Cancer prognosis from DNA copy number variations
3. Diagnosis and prognosis from gene expression data
4. Conclusion


SLIDE 4

Chromosomal aberrations in cancer

SLIDE 5

Comparative Genomic Hybridization (CGH)

Motivation

Comparative genomic hybridization (CGH) data measure DNA copy number along the genome. They are very useful, in particular in cancer research, to observe systematically the variations in DNA content.

[Figure: CGH profile; log-ratio plotted along the genome, chromosome by chromosome]

SLIDE 6

Cancer prognosis: can we predict the future evolution?

[Figure: CGH profiles of several melanoma samples]

Aggressive (left) vs non-aggressive (right) melanoma

SLIDE 7

DNA → RNA → protein

CGH shows the (static) DNA. Cancer cells also have abnormal (dynamic) gene expression (= transcription).

SLIDE 8

Tissue profiling with DNA chips

Data

Gene expression is measured for more than 10,000 genes, typically on fewer than 100 samples from two (or more) classes (e.g., different tumors).

SLIDE 9

Can we identify the cancer subtype? (diagnosis)

SLIDE 10

Can we predict the future evolution? (prognosis)

SLIDE 11

Pattern recognition, aka supervised classification

[Figure: CGH profiles]


SLIDE 15

Pattern recognition, aka supervised classification

Challenges

Few samples
High dimension
Structured data
Heterogeneous data
Prior knowledge
Fast and scalable implementations
Interpretable models

SLIDE 16

Shrinkage estimators

1. Define a large family of "candidate classifiers", e.g., linear predictors: fβ(x) = β⊤x for x ∈ Rp.

2. For any candidate classifier fβ, quantify how "good" it is on the training set with some empirical risk, e.g.:

   R(β) = (1/n) ∑_{i=1}^{n} ℓ(fβ(xi), yi) .

3. Choose the β that achieves the minimum empirical risk, subject to some constraint:

   min_β R(β)   subject to   Ω(β) ≤ C .
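As a minimal sketch of this recipe, assuming scikit-learn and synthetic expression-like data (not from the original slides), the constrained problem is solved here in its equivalent penalized Lagrangian form R(β) + λΩ(β):

```python
# Minimal sketch: penalized empirical risk minimization on synthetic data.
# Assumes scikit-learn; C in sklearn is the inverse of the penalty strength.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 80, 5000                      # few samples, high dimension
X = rng.normal(size=(n, p))          # hypothetical expression matrix
beta_true = np.zeros(p)
beta_true[:20] = 1.0                 # only 20 informative genes
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

# Ridge penalty Omega(beta) = sum(beta_i^2), i.e. an l2-penalized linear model.
clf = LogisticRegression(penalty="l2", C=0.1, max_iter=1000)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```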


SLIDE 19

Why shrinkage classifiers?

min_β R(β)   subject to   Ω(β) ≤ C .

[Figure: empirical risk contours with the constraint set Ω(β) ≤ C; shrinking C pulls the estimate away from the unconstrained minimizer β*, increasing bias and decreasing variance]


SLIDE 25

Why shrinkage classifiers?

Shrinking the constraint set "increases bias and decreases variance". Common choices are:

Ω(β) = ∑_{i=1}^{p} βi²   (ridge regression, SVM, ...)

Ω(β) = ∑_{i=1}^{p} |βi|   (lasso, boosting, ...)
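A small sketch contrasting the two penalties on synthetic data, assuming scikit-learn: the ℓ2 (ridge) penalty shrinks all coefficients, while the ℓ1 (lasso) penalty sets most of them exactly to zero.

```python
# Contrast the two common penalties on the same high-dimensional problem.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))                 # 60 samples, 500 features
y = X[:, :5] @ np.ones(5) + 0.1 * rng.normal(size=60)

ridge = Ridge(alpha=10.0).fit(X, y)            # Omega(beta) = sum beta_i^2
lasso = Lasso(alpha=0.1).fit(X, y)             # Omega(beta) = sum |beta_i|
print("non-zero ridge coefficients:", np.sum(ridge.coef_ != 0))  # essentially all
print("non-zero lasso coefficients:", np.sum(lasso.coef_ != 0))  # only a few
```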

SLIDE 26

Including prior knowledge in the penalty?

min_β R(β)   subject to   Ω(β) ≤ C .

[Figure: the shape of the constraint set Ω(β) ≤ C determines which estimates are favored]



SLIDE 33

CGH array classification

Prior knowledge

For a CGH profile x ∈ Rp, we focus on linear classifiers, i.e., the sign of fβ(x) = β⊤x. We expect β to be:

sparse: not all positions should be discriminative;
piecewise constant: within a selected region, all probes should contribute equally.

[Figure: CGH profile; log-ratio plotted along the genome, chromosome by chromosome]

SLIDE 34

Promoting sparsity with the ℓ1 penalty

The ℓ1 penalty (Tibshirani, 1996; Chen et al., 1998)

The solution of

min_{β∈Rp} R(β) + λ ∑_{i=1}^{p} |βi|

is usually sparse.

SLIDE 35

Promoting piecewise constant profiles with the fusion penalty

The variable fusion penalty (Land and Friedman, 1996)

The solution of

min_{β∈Rp} R(β) + λ ∑_{i=1}^{p−1} |βi+1 − βi|

is usually piecewise constant.

SLIDE 36

Fused Lasso signal approximator (Tibshirani et al., 2005)

min_{β∈Rp} ∑_{i=1}^{p} (yi − βi)² + λ1 ∑_{i=1}^{p} |βi| + λ2 ∑_{i=1}^{p−1} |βi+1 − βi| .

The first penalty term leads to sparse solutions; the second penalty term leads to piecewise constant solutions.
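A minimal sketch of this signal approximator, assuming cvxpy as a generic convex solver and a hypothetical noisy piecewise-constant profile (the slides do not prescribe a solver):

```python
# Fused lasso signal approximation of a noisy piecewise-constant profile.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(50), 0.8 * np.ones(30), np.zeros(40)])
y = y + 0.3 * rng.normal(size=y.size)        # noisy CGH-like log-ratios

beta = cp.Variable(y.size)
lam1, lam2 = 0.5, 2.0
objective = cp.Minimize(cp.sum_squares(y - beta)          # data-fitting term
                        + lam1 * cp.norm1(beta)           # sparsity
                        + lam2 * cp.norm1(cp.diff(beta))) # piecewise constancy
cp.Problem(objective).solve()
print(np.round(beta.value[:10], 2))          # estimates in the flat segments are ~0
```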

SLIDE 37

Fused lasso for supervised classification (Rapaport et al., 2008)

min_{β∈Rp} ∑_{i=1}^{n} ℓ(yi, β⊤xi) + λ1 ∑_{i=1}^{p} |βi| + λ2 ∑_{i=1}^{p−1} |βi+1 − βi| ,

where ℓ is, e.g., the hinge loss ℓ(y, t) = max(1 − yt, 0).

Implementation

When ℓ is the hinge loss (fused SVM), this is a linear program → feasible up to p ≈ 10³–10⁴.
When ℓ is convex and smooth (logistic, quadratic), efficient implementations exist with proximal methods → feasible up to p ≈ 10⁸–10⁹.
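A hedged sketch of the hinge-loss (fused SVM) variant on tiny synthetic data, again assuming cvxpy rather than the dedicated linear-programming or proximal solvers mentioned above:

```python
# Fused lasso with the hinge loss on synthetic copy-number-like data.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 200
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[50:80] = 1.0                              # one discriminative region
y = np.sign(X @ w_true + 0.5 * rng.normal(size=n))

beta = cp.Variable(p)
hinge = cp.sum(cp.pos(1 - cp.multiply(y, X @ beta)))   # hinge loss
lam1, lam2 = 0.5, 2.0
objective = cp.Minimize(hinge
                        + lam1 * cp.norm1(beta)
                        + lam2 * cp.norm1(cp.diff(beta)))
cp.Problem(objective).solve()
print("probes with non-zero weight:", int(np.sum(np.abs(beta.value) > 1e-4)))
```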


SLIDE 39

Example: predicting metastasis in melanoma

[Figure: two learned weight vectors plotted along the genome; x-axis BAC index, y-axis weight]

SLIDE 40

Extension: joint segmentation of many profiles

[Figure: several CGH profiles to be segmented jointly]

SLIDE 41

Fused group Lasso signal approximator

min_{β∈Rn×p} ‖Y − β‖² + λ ∑_{i=1}^{p−1} ‖βi+1 − βi‖ ,

where βi ∈ Rn denotes the i-th column of β (the values of all n profiles at probe i) and ‖·‖ is the Euclidean norm, so that all profiles share the same breakpoints.

[Figure: (a), (b) probe vs. log-ratio; (c) probe vs. score]
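A minimal sketch of this group-fused objective, assuming cvxpy and a few hypothetical synthetic profiles; the Euclidean norm on column differences couples the profiles so that they change at the same probes:

```python
# Joint segmentation: the group fusion penalty couples all profiles at each probe.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
n_profiles, p = 4, 120
Y = np.zeros((n_profiles, p))
Y[:, 40:70] = rng.normal(1.0, 0.2, size=(n_profiles, 1))  # shared amplified region
Y = Y + 0.3 * rng.normal(size=Y.shape)

beta = cp.Variable((n_profiles, p))
diffs = cp.diff(beta, axis=1)                     # column differences, shape (n, p-1)
group_fusion = cp.sum(cp.norm(diffs, 2, axis=0))  # Euclidean norm per probe boundary
objective = cp.Minimize(cp.sum_squares(Y - beta) + 5.0 * group_fusion)
cp.Problem(objective).solve()
print(np.round(beta.value[:, 35:45], 2))          # breakpoints shared across profiles
```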


SLIDE 43

Molecular diagnosis / prognosis / theragnosis

SLIDE 44

Gene selection, signature

The idea

We look for a limited set of genes that are sufficient for prediction. Equivalently, the linear classifier will be sparse.

Why?

Bet on sparsity: we believe the "true" model is sparse.
Interpretation: a biological interpretation is easier to obtain by looking at the selected genes.
Statistics: this is one way to constrain the solution and reduce the complexity enough to allow learning.

SLIDE 45

But...

Challenging the idea of gene signature

We often observe little stability in the genes selected.
Is gene selection the most biologically relevant hypothesis?
What about thinking instead in terms of "pathway" or "module" signatures?

SLIDE 46

Gene networks

[Figure: gene network organized into functional modules: glycan biosynthesis; protein kinases; DNA and RNA polymerase subunits; glycolysis / gluconeogenesis; sulfur metabolism; porphyrin and chlorophyll metabolism; riboflavin metabolism; folate biosynthesis; biosynthesis of steroids, ergosterol metabolism; lysine biosynthesis; phenylalanine, tyrosine and tryptophan biosynthesis; purine metabolism; oxidative phosphorylation, TCA cycle; nitrogen, asparagine metabolism]

SLIDE 47

Graph-based penalty

Prior hypothesis

Genes near each other on the graph should have similar weights.

Two solutions (Rapaport et al., 2007, 2008)

Ωspectral(β) = ∑_{i∼j} (βi − βj)² ,

Ωgraphfusion(β) = ∑_{i∼j} |βi − βj| + ∑_i |βi| .
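To make the two penalties concrete, here is a small NumPy sketch evaluating them on a hypothetical toy network (an illustration, not code from the original work):

```python
# Evaluate the two graph-based penalties for a toy weight vector.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 4)]         # small chain-shaped gene network
beta = np.array([1.0, 0.9, 0.8, 0.0, 0.0])       # smooth on one module, zero elsewhere

omega_spectral = sum((beta[i] - beta[j]) ** 2 for i, j in edges)
omega_graphfusion = (sum(abs(beta[i] - beta[j]) for i, j in edges)
                     + np.sum(np.abs(beta)))
print(omega_spectral, omega_graphfusion)

# The spectral penalty can also be written beta^T L beta with L the graph Laplacian:
p = beta.size
L = np.zeros((p, p))
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1
print(float(beta @ L @ beta))                    # equals omega_spectral
```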


SLIDE 49

Classifiers

[Figure: gene network with the same functional modules as the previous figure]

SLIDE 50

Classifiers

SLIDE 51

Limits

We are happy to see pathways appear. However, in some cases connected genes should have "opposite" weights (inhibition, pathway branching, etc.). How can we capture pathways without constraints on the weight similarities?

SLIDE 52

Selecting pre-defined groups of variables

Group lasso (Yuan & Lin, 2006)

If groups of covariates are likely to be selected together, the ℓ1/ℓ2-norm induces sparse solutions at the group level:

Ωgroup(w) = ∑_g ‖wg‖₂

For example, with groups {1, 2} and {3}: Ω(w1, w2, w3) = ‖(w1, w2)‖₂ + ‖w3‖₂ .
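A small sketch of group-level selection with this ℓ1/ℓ2 penalty, assuming cvxpy and hypothetical pre-defined groups (illustration only):

```python
# Group lasso: whole groups of coefficients are switched on or off together.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
n, p, group_size = 50, 60, 10
groups = [slice(g, g + group_size) for g in range(0, p, group_size)]  # 6 groups
X = rng.normal(size=(n, p))
y = X[:, :group_size] @ rng.normal(size=group_size) + 0.1 * rng.normal(size=n)

w = cp.Variable(p)
penalty = sum(cp.norm(w[g], 2) for g in groups)            # sum_g ||w_g||_2
objective = cp.Minimize(cp.sum_squares(y - X @ w) + 5.0 * penalty)
cp.Problem(objective).solve()
for k, g in enumerate(groups):
    print(f"group {k}: max |w| = {np.max(np.abs(w.value[g])):.3f}")
```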

SLIDE 53

Graph lasso

Hypothesis: selected genes should form connected components on the graph.

Two solutions (Jacob et al., 2009):

Ωgroup(β) = ∑_{i∼j} √(βi² + βj²) ,

Ωoverlap(β) = sup { α⊤β : α ∈ Rp, αi² + αj² ≤ 1 for all i ∼ j } .
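Since Ωoverlap is defined through a supremum, it can itself be evaluated by a small convex program; a hedged sketch with cvxpy on a toy chain graph (illustration only, not the solver used by Jacob et al.):

```python
# Evaluate the two graph-lasso penalties for a toy weight vector.
import cvxpy as cp
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
beta = np.array([0.0, 0.0, 1.0, 1.0, 0.0])       # support is connected on the chain

omega_group = sum(np.sqrt(beta[i] ** 2 + beta[j] ** 2) for i, j in edges)

alpha = cp.Variable(beta.size)                   # dual variable of the overlap norm
constraints = [cp.square(alpha[i]) + cp.square(alpha[j]) <= 1 for i, j in edges]
problem = cp.Problem(cp.Maximize(cp.sum(cp.multiply(alpha, beta))), constraints)
omega_overlap = problem.solve()
print(round(omega_group, 3), round(omega_overlap, 3))
```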


SLIDE 55

Overlap and group unit balls

Unit balls for Ω^G_group(·) (middle) and Ω^G_overlap(·) (right) for the groups G = {{1, 2}, {2, 3}}, where w2 is represented as the vertical coordinate.

SLIDE 56

Summary: Graph lasso vs kernel

Graph lasso:

Ωgraph lasso(w) = ∑_{i∼j} √(wi² + wj²)

constrains the sparsity pattern, not the values.

Graph kernel:

Ωgraph kernel(w) = ∑_{i∼j} (wi − wj)²

constrains the values (smoothness), not the sparsity.

SLIDE 57

Preliminary results

Breast cancer data

Gene expression data for 8,141 genes in 295 breast cancer tumors. Canonical pathways from MSigDB containing 639 groups of genes, 637 of which involve genes from our study.

Pathway groups:

METHOD          ℓ1            Ω^G_overlap(·)
ERROR           0.38 ± 0.04   0.36 ± 0.03
MEAN # PATH.    130           30

Graph on the genes:

METHOD          ℓ1            Ωgraph(·)
ERROR           0.39 ± 0.04   0.36 ± 0.01
AV. SIZE C.C.   1.03          1.30

SLIDE 58

Lasso signature

SLIDE 59

Graph Lasso signature


SLIDE 61

Conclusion

Many challenging problems for statistical learning in genomics (high dimension, structure, noise, ...).
Integrating prior knowledge in the penalization / regularization function is an efficient way to fight the curse of dimensionality.
Several computationally efficient approaches exist (structured lasso, kernels, ...).
Tight collaborations with domain experts can help develop specific learning machines for specific data.
Natural extensions exist for data integration.

SLIDE 62

People I need to thank

Franck Rapaport (MSKCC); Emmanuel Barillot, Andrei Zynoviev, Kevin Bleakley, Anne-Claire Haury (Institut Curie / ParisTech); Laurent Jacob (UC Berkeley); Guillaume Obozinski (INRIA).