Cancer gene discovery via network analysis of somatic mutation data
Insuk Lee

Cancer is a progressive genetic disorder.
• Accumulation of somatic mutations causes cancer.
• For example, in colorectal cancer, the first gatekeeping mutation (often occurring in APC) is followed by a series of oncogene activations and loss-of-function events in tumor suppressor genes, which eventually generates a malignant tumor.

Sequencing approach to the comprehensive catalog of cancer genes
• Tumor samples and matched normal samples (adjacent healthy tissue or blood) are sequenced by whole-exome sequencing (WES) and aligned to identify cancer-associated somatic mutations (and cancer genes).
Nat. Rev. Genet. 15:556 (2014)

Driver vs. Passenger mutations
• Driver mutation: a mutation that directly or indirectly confers a selective growth advantage on the cell in which it occurs (as opposed to a passenger mutation).
• Not all mutations are driver mutations. Therefore, not all genes that contain somatic mutations are cancer driver genes.
Nature 458:719 (2009)

Distinguishing Drivers from Passengers
• Based on recurrent mutations
• Use deleteriousness of the mutations
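
The recurrence criterion can be sketched in a few lines. This is a minimal illustration rather than any published tool: the sample IDs and mutation calls are invented toy data, and `recurrence` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy somatic mutation calls as (sample_id, gene). Invented for illustration.
calls = [
    ("S1", "TP53"), ("S2", "TP53"), ("S3", "TP53"),
    ("S1", "APC"), ("S3", "APC"),
    ("S2", "GENE_X"),
    ("S2", "GENE_X"),  # second hit in the same sample; counted once
]

def recurrence(calls):
    """Return {gene: number of distinct samples carrying a mutation}."""
    samples_by_gene = defaultdict(set)
    for sample, gene in calls:
        samples_by_gene[gene].add(sample)
    return {gene: len(samples) for gene, samples in samples_by_gene.items()}

counts = recurrence(calls)
# Rank genes by how many samples they are mutated in (ties broken by name).
ranked = sorted(counts, key=lambda g: (-counts[g], g))
print(ranked)
```

Counting distinct samples rather than raw calls keeps one heavily mutated tumor from inflating a gene's rank.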

Using additional information to reduce false positives
• In MutSigCV, mutation frequency is normalized by a gene-specific background mutation rate (BMR) that accounts for expression level and replication timing.
Nat. Rev. Genet. 15:556 (2014)
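
A rough sketch of why the background rate matters, under a simple Poisson assumption. This is not the MutSigCV model, which estimates gene-specific rates from covariates; the rates, gene lengths, counts, and function names below are all invented for illustration.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as one minus the CDF at k - 1."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def excess_pvalue(observed, bmr_per_bp, gene_length_bp, n_samples):
    """Probability of seeing at least `observed` mutations from background alone."""
    expected = bmr_per_bp * gene_length_bp * n_samples
    return poisson_sf(observed, expected)

# Two hypothetical genes with the same raw count but different backgrounds.
# The long gene in a mutation-prone region expects many passenger mutations.
p_short = excess_pvalue(observed=12, bmr_per_bp=1e-6, gene_length_bp=1_500, n_samples=1_000)
p_long = excess_pvalue(observed=12, bmr_per_bp=5e-6, gene_length_bp=3_000, n_samples=1_000)
print(p_short, p_long)
```

With these toy numbers the short gene's 12 mutations are far above its expectation of 1.5, while the long gene's 12 mutations fall below its expectation of 15, so only the first looks like a candidate.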

What about cancer genes with low mutation rate?
Many hills but only a few mountains in the genomic landscapes of human colorectal cancers (Wood et al. Science 2007)
• Map of mutations in 11 breast and 11 colorectal cancers.
• In the landscape, the heights of the peaks reflect the mutation frequency of each gene. A few gene “mountains” are mutated in a large proportion of tumors, but most genes are mutated in <5% of tumors and are represented as “hills” in the figure.
• We observed a similar distribution of mutation frequency in TCGA data.

Long-tail distribution of mutation frequency
• The majority of cancer genes are infrequently mutated and have somatic mutations in only a few patients, which results in a long-tail distribution of mutation frequency.
• Therefore, methods based on recurrent mutations have an intrinsic limitation in cancer gene identification.
• Among 422 known cancer genes by CGC: 7 genes are mutated in >5% of tumors, 128 genes in >1% of tumors, and 12 genes in no tumors.
[Figure: mutation count per gene for the 410 mutated genes; labeled genes: TP53, PIK3CA, PTEN, BRAF, KMT2C, KMT2D, APC, ATRX, IDH1, ARID1A.] Mutation distribution across 422 CGC (Cancer Gene Census) genes in 6764 pan-cancer samples (April 2014 TCGA).
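
The frequency bins quoted above can be computed as follows. This is a sketch on invented counts, assuming the >1% bin includes the genes that also exceed 5%; `frequency_bins` and the toy gene names are hypothetical.

```python
def frequency_bins(mutated_samples_per_gene, n_samples):
    """Count genes mutated in >5% of samples, in >1% (cumulative), and in none."""
    freqs = {g: n / n_samples for g, n in mutated_samples_per_gene.items()}
    return {
        "over_5pct": sum(f > 0.05 for f in freqs.values()),
        "over_1pct": sum(f > 0.01 for f in freqs.values()),
        "unmutated": sum(f == 0 for f in freqs.values()),
    }

# Hypothetical numbers of mutated samples per gene, out of 1,000 samples.
toy = {"GENE_A": 320, "GENE_B": 60, "GENE_C": 14, "GENE_D": 3, "GENE_E": 0}
bins = frequency_bins(toy, n_samples=1000)
print(bins)

# Sanity check on the slide's own numbers: 410 mutated + 12 unmutated genes.
assert 410 + 12 == 422
```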

Cancer is a disease of pathway disorders
• However, mutations are concentrated in known cancer-related pathways, which suggests that a pathway-centric approach will be useful in the analysis of cancer genomics data.
Nat. Rev. Cancer Poster (2002)

MUFFINN: mutations for functional impact on network neighbors
• Predict driver genes based on pathway-level mutational information
Genome Biology (2016)

Three ways to take account of neighbors’ mutational burden
• On the following two functional gene networks
Genome Res. (2011); Nucleic Acids Res. (2015)
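
One way to read "neighbors' mutational burden" is sketched below. The text here does not spell out the three scoring variants, so the two shown (maximum and mean over direct neighbors, added to the gene's own count) are assumptions for illustration only; the toy network, the counts, and the name `neighbor_burden` are hypothetical.

```python
# Toy undirected functional network as an adjacency map (hypothetical genes).
network = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"A", "C"},
}
# Toy per-gene mutation counts across a cohort.
mutations = {"A": 0, "B": 40, "C": 5, "D": 1}

def neighbor_burden(gene, mode="max"):
    """Score a gene by its own mutations plus those of its direct neighbors."""
    neigh = [mutations.get(n, 0) for n in network.get(gene, ())]
    if not neigh:
        return float(mutations.get(gene, 0))
    if mode == "max":
        extra = max(neigh)
    elif mode == "mean":
        extra = sum(neigh) / len(neigh)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return mutations.get(gene, 0) + extra

scores = {g: neighbor_burden(g, "max") for g in network}
ranking = sorted(scores, key=lambda g: (-scores[g], g))
print(ranking)
```

In this toy example gene A carries no mutations itself but ranks at the top because it is connected to the heavily mutated gene B, which is the situation a recurrence-only method cannot detect.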

Cancer gene sets for benchmarking prediction
• No comprehensive gold-standard cancer gene set exists.
• We compiled multiple cancer gene sets from various sources of annotations.
• Each cancer gene set has a different trade-off between accuracy, coverage, and bias.
• CGC (422 genes): from the CGC (Cancer Gene Census). Futreal et al. 2004
• CGC PointMut (118 genes): CGC genes that act in cancer via point mutations.
• 20/20 Rules (124 genes): based on mutational patterns. Vogelstein et al. 2013
• HCD (288 genes): high-confidence driver genes identified by a rule-based approach. Tamborero et al. 2013
• MouseMut (797 genes): ortholog-mapped genes identified by mutagenesis experiments in mice. March et al. 2011; Mann et al. 2012

Result 1: MUFFINN performs better than gene-based methods
• 18 cancer types
• ~6700 TCGA samples

Result 1: MUFFINN performs better than gene-based methods
• Evaluation based on all candidates
• Evaluation based on the top candidates, which go into follow-up studies

Testing the significance of using mutational information among indirect network neighbors for MUFFINN
• Use mutation information of direct neighbors only
• Use mutation information of all genes

Result 2: MUFFINN predicts cancer drivers better when it takes only direct neighbors’ mutational information.
GS: Gaussian smoothing; IR: Iterative Rank; RWR: random walk with restart
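
For contrast with direct-neighbor scoring, random walk with restart (RWR) spreads mutation signal to indirect neighbors as well. The sketch below is a generic textbook RWR on a toy graph, not the configuration benchmarked here; the restart probability of 0.5, the graph, and the seed weights are assumptions.

```python
def rwr(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 with W column-normalized by degree.

    Assumes every node has at least one neighbor and the seeds are not all zero.
    """
    nodes = sorted(adj)
    total = sum(seeds.get(n, 0.0) for n in nodes)
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass returns to the seed genes.
        nxt = {n: restart * p0[n] for n in nodes}
        # The remaining mass moves one step, split evenly among neighbors.
        for src in nodes:
            share = (1 - restart) * p[src] / len(adj[src])
            for dst in adj[src]:
                nxt[dst] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Toy path graph A - B - C - D in which only A carries mutations.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
p = rwr(adj, seeds={"A": 1.0})
print({n: round(v, 3) for n, v in p.items()})
```

The stationary scores decay with distance from the mutated gene, so genes two or three steps away still receive some signal, unlike in a direct-neighbor score.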

Result 3: The larger pan-cancer data set yields only a marginal improvement in predictions.

Result 4: MUFFINN effectively predicts cancer genes with only 10% of tumor samples.
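
A down-sampling experiment of this kind can be set up roughly as follows. This sketch only covers drawing a 10% subset and recounting mutations; re-running the predictor on each subset is left out. The cohort, the calls, and the helper names are invented.

```python
import random

def subsample(sample_ids, fraction, seed=0):
    """Draw a reproducible random subset of tumor samples without replacement."""
    rng = random.Random(seed)
    k = max(1, round(len(sample_ids) * fraction))
    return rng.sample(sample_ids, k)

def gene_counts(calls, keep):
    """Count distinct kept samples per gene."""
    keep = set(keep)
    seen = {}
    for sample, gene in calls:
        if sample in keep:
            seen.setdefault(gene, set()).add(sample)
    return {g: len(s) for g, s in seen.items()}

samples = [f"S{i}" for i in range(100)]
# Invented calls: every sample has GENE_A mutated, every tenth has GENE_B.
calls = [(s, "GENE_A") for s in samples] + [(s, "GENE_B") for s in samples[::10]]
subset = subsample(samples, fraction=0.10)
print(len(subset), gene_counts(calls, subset))
```

Fixing the seed makes each subset reproducible; repeating the draw with several seeds would show how much the ranking varies between subsets.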

Manual examination of the novel candidate drivers
• Selected 199 novel candidate drivers that pass all of the following criteria:
1. Predicted in the top 1000 by MUFFINN (Prob > 0.5)
2. Predicted in the top 1000 by neither MutSig nor MutationAssessor
3. Annotated by neither the CGC nor the 20/20 cancer gene sets (to exclude all known genes)
• Among the 199 candidate cancer genes, 128 (64%) have direct or indirect supporting evidence in the literature.
• Class 1 (11 genes): already reported as cancer genes but not yet annotated by the CGC or 20/20 database.
• Class 2 (14 genes): known to increase cancer susceptibility through germline variants.
• Class 3 (14 genes): known to be involved in cancer through copy number variation (CNV) or structural variation (SV).
• Class 4 (89 genes): associated with cancer via expression dysregulation with non-genetic alterations (e.g., epigenetic regulation, miRNA target).
• Class 5 (71 genes): no evidence (novel candidates to be investigated in the future).
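
The three selection criteria amount to set differences, as in the sketch below. The gene lists are hypothetical stand-ins for the real top-1000 rankings, and the MUFFINN list is assumed to be already filtered at Prob > 0.5; `novel_candidates` is an illustrative name.

```python
def novel_candidates(muffinn_top, mutsig_top, massessor_top, cgc, rule_2020):
    """Keep genes found only by the network method and absent from known sets."""
    network_only = set(muffinn_top) - set(mutsig_top) - set(massessor_top)
    return network_only - set(cgc) - set(rule_2020)

# Hypothetical gene lists for illustration.
picked = novel_candidates(
    muffinn_top={"G1", "G2", "G3", "G4"},
    mutsig_top={"G2"},
    massessor_top={"G5"},
    cgc={"G3"},
    rule_2020={"G6"},
)
print(sorted(picked))

# Consistency check on the reported class sizes (Classes 1 to 4, then Class 5).
supported = 11 + 14 + 14 + 89
assert supported == 128 and supported + 71 == 199
print(round(100 * supported / 199))
```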

Novel candidate drivers with low mutation occurrence have neighboring genes known to be involved in cancer pathways

Performing prediction using a companion web server www.inetbio.org/muffinn

Summary
• Cancer genome sequencing can facilitate discovery of cancer driver genes.
• We can distinguish drivers from passengers based on recurrent mutations.
• Conventional methods based on recurrent mutations are intrinsically limited for cancer genes with low mutation occurrence.
• Since cancer is a pathway disease, incorporating pathway information will enhance cancer genomics data analysis.
• We developed a network-based method, MUFFINN, and a companion web server, and demonstrated its superiority in cancer gene prediction.
• Network-based analysis of cancer genomics data will provide a promising route to the comprehensive catalog of cancer genes.

Acknowledgements
MUFFINN: cancer gene discovery via network analysis of somatic mutation data. Genome Biology 17:129 (June 2016)
Yonsei University, Department of Biotechnology (Korea): Ara Cho, Jung Eun Shim, Eiru Kim
EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation (Spain): Ben Lehner, Fran Supek

Network Biology Lab (www.netbiolab.org) Current members Former members PhD. Jung Eun Shim Sangyoung Lee PhD. Sohyun Hwang PhD. Taeyun Oh PhD. Eiru Kim Chan Yeong Kim PhD. Jawon Song PhD. Samuel Beck Tak Lee Muyoung Lee PhD. Jonghoon Lee PhD. Yoonhee Ko Sunmo Yang Jaewon Cho PhD. Junha Shin PhD. Hanhae Kim Kyungsoo Kim Eunbeen Kim PhD. Ara Cho PhD. Sungou Ji Heonjong Han Hongseok Shim Hyojin Kim Dasom Bae

Result: Accounting for mutational heterogeneity is not important for MUFFINN.

HotNet2 vs. MUFFINN
HotNet2 (Nat. Genet. 2015)
1. Assign heat (mutation) to each gene
2. Diffuse heat from hot (highly mutated) to cold genes in the network
3. Extract significantly hot subnetworks (cancer pathways)
MUFFINN (this study)
1. Assign heat (mutation) to each gene
2. For each gene, measure mutational burden over network neighbors
3. Rank genes (cancer genes) by the mutational burden

Result: HotNet2 and MUFFINN are complementary
• Retrieval rate for known cancer genes in 144 candidates by HotNet2 and top 144 candidates by MUFFINN
• Venn diagram among 422 CGC genes, 144 candidates by HotNet2, and top 144 candidates by MUFFINN
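
The retrieval rate and the Venn regions reduce to set arithmetic. The sketch below uses invented gene sets, not the 422 CGC genes or either method's 144 candidates; `retrieval_rate` and `venn3` are hypothetical helpers.

```python
def retrieval_rate(candidates, known):
    """Fraction of candidates that are already-known cancer genes."""
    candidates = set(candidates)
    return len(candidates & set(known)) / len(candidates)

def venn3(a, b, c):
    """Sizes of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": len(a - b - c), "b_only": len(b - a - c), "c_only": len(c - a - b),
        "a_b": len((a & b) - c), "a_c": len((a & c) - b), "b_c": len((b & c) - a),
        "a_b_c": len(a & b & c),
    }

known = {"K1", "K2", "K3", "K4"}
method_1 = {"K1", "K2", "N1"}
method_2 = {"K2", "K3", "N2"}
print(retrieval_rate(method_1, known), retrieval_rate(method_2, known))
regions = venn3(known, method_1, method_2)
print(regions)
```

Two methods with the same retrieval rate can still recover different known genes, which is what the pairwise-only regions of the diagram capture.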
