Tools for analyzing cancer variation (Ekta Khurana, PhD)


slide-1
SLIDE 1

Tools for analyzing cancer variation

Ekta Khurana, PhD

Assistant Professor
Meyer Cancer Center
Englander Institute for Precision Medicine
Institute for Computational Biomedicine
Department of Physiology and Biophysics
Weill Cornell Medicine, New York, NY

ekk2003@med.cornell.edu @ekta_khurana

slide-2
SLIDE 2

  • First cancer WGS: Ley, Mardis, et al., Nature, 2008
  • ~500 cancer WGS: Alexandrov, et al., Nature, 2013
  • ~3000 WGS from ICGC/TCGA: 2014

slide-3
SLIDE 3

International Cancer Genome Consortium & The Cancer Genome Atlas

~3000 WGS (tumor & normal), ~1600 RNA-Seq, ~1500 methylation

slide-4
SLIDE 4

Most variants are in noncoding regions

Khurana et al, Nature Rev Genet, 2016

MB: medulloblastoma
DLBC: B cell lymphoma
STAD: gastric
BRCA: breast
PAAD: pancreatic
PRAD: prostate
LIHC: liver
PA: pilocytic astrocytoma
LUAD: lung adenocarcinoma

slide-5
SLIDE 5

[Figure: altered binding effects of noncoding variants, wild type (WT) vs. mutated. Gain-of-motif panel: sequences CGGAGG and CGGAAG, TF binding, and mRNA level. Loss-of-motif panel: sequences TATCTAT and TATTTAT, with TF binding marked X. Sequence logos for the motifs and a promoter/gene schematic are also shown.]

  • MYB motif created & drives TAL1 overexpression in T-ALL (Mansour et al, Science, 2014)

Modes of action of noncoding variants: transcription factor binding disruption

TERT promoter mutated in many different cancer types

Killela et al, PNAS, 2013; Horn et al, Science, 2013; Huang et al, Science, 2013

slide-6
SLIDE 6

Co-variates of mutation rates: Increased mutation density at TF binding sites in melanoma and lung cancer

Perera et al, Nature, 2016; Sabarinathan et al, Nature, 2016; Khurana, Nature News & Views, 2016

slide-7
SLIDE 7

Outline

  • Variants with high functional impact: FunSeq
  • Driver elements w/ more recurrent & high functional impact mutations than expected randomly: CompositeDriver

slide-8
SLIDE 8

Khurana et al, Nature Rev Genet, 2016

Identifying noncoding variants associated with cancer

slide-9
SLIDE 9

Khurana et al, Nature Rev Genet, 2016

Identifying noncoding variants associated with cancer

FunSeq

slide-10
SLIDE 10

Estimating negative selection

Evolutionary conservation

  • Typically defined by comparison across species

Conservation among humans

  • Depletion of common variants/Enrichment of rare variants

[Figure: a common variant compared with rare variants]

Fraction of rare variants = (Num of rare variants / Total num of variants)
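The fraction-of-rare-variants metric can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name and the example frequencies are hypothetical, and the 0.5% cutoff is the derived allele frequency threshold quoted on the next slide.

```python
def fraction_rare(derived_allele_freqs, rare_threshold=0.005):
    """Return (number of rare variants) / (total number of variants).

    A variant counts as rare when its derived allele frequency is
    below rare_threshold (0.5% by default).
    """
    freqs = list(derived_allele_freqs)
    if not freqs:
        raise ValueError("need at least one variant")
    n_rare = sum(1 for f in freqs if f < rare_threshold)
    return n_rare / len(freqs)


# Hypothetical derived allele frequencies for two element categories.
constrained = [0.001, 0.002, 0.004, 0.0005, 0.12]
neutral = [0.001, 0.03, 0.2, 0.45, 0.08]

print(fraction_rare(constrained))  # 4 of 5 variants are rare
print(fraction_rare(neutral))      # 1 of 5 variants is rare
```

A higher fraction for one category than for another is read as stronger negative selection on that category, since deleterious alleles are kept at low frequency.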

slide-11
SLIDE 11

Enrichment of rare SNPs as a metric for negative selection

[Figure: fraction of rare nonsynonymous SNPs (rare = derived allele freq < 0.5%; axis from 0.4 to 0.9) for gene categories: All Coding, LOF-tol., Recessive, GWAS, Dominant, Essential, Cancer]

  • Depletion of common polymorphisms in regions under selection

Negative selection restricts the allele frequency of deleterious mutations.

  • Results for coding genes consistent with known phenotypic impacts
  • Other metrics for selection
  • Evolutionary conservation (e.g. GERP)
  • SNP density (confounded by mutation rate)

LOF-tol (Loss-of-function tolerant): least negative selection
Cancer: most selection

Khurana et al., Science, 2013

slide-12
SLIDE 12

Organism-level negative selection in noncoding elements

Khurana et al., Science, 2013

slide-13
SLIDE 13

Negative selection and tissue-specificity of coding and noncoding regions

  • Ubiquitously expressed genes and bound regions show stronger selection
  • Differences in constraints amongst tissues
  • Constraints in coding genes and regulatory genes are correlated across tissues

slide-14
SLIDE 14

Which noncoding categories are under very strong “coding-like” selection?

  • Top categories among 102 ranked categories
  • Binding peaks of some general TFs (e.g. FAM48A)
  • Core motifs of some TF families (e.g. JUN, GATA)
  • DHS sites in spinal cord and connective tissue

Genomic coverage: ~0.4% (~top 25 categories); ~0.02% (top 5 categories)

Enrichment of known disease-causing mutations from the Human Gene Mutation Database: ~400-fold; ~40-fold

slide-15
SLIDE 15

Human regulatory network from ENCODE ChIP-Seq

  • Peak calling (ChIP-Seq)
  • Assigning TF binding sites to targets
  • Filtering high confidence edges
  • ~28K proximal edges

[Figure: network schematic with TFs, a strong proximal edge, and a potential distal edge]

Nodes: 119 TFs and ~9000 target genes
Edges: 28,000 interactions
Using correlation with expression data

Gerstein¶…..Khurana¶…., Nature, 2012 (¶ co-first authors); Yip et al, Genome Res, 2012

slide-16
SLIDE 16

Gene essentiality and human regulatory network

[Figure: example network with a TF and a non-TF target; labels: In-degree = 1, Out-degree = 5]

Gerstein¶…..Khurana¶…., Nature, 2012 (¶ co-first authors)

[Figure: regulatory degree, i.e. total degree (IN + OUT) on a log scale, for LoF-tolerant vs. essential genes; Wilcoxon p-value = 1.29e-2]

Essential genes tend to be central

Khurana et al., PLoS Comp. Bio., 2013

[Figure: network view of essential and LoF-tolerant genes, with size of nodes scaled by total degree; iCAVE movie, Z Gumus]
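The in-degree, out-degree, and total degree used on this slide can be computed directly from a list of directed regulator-to-target edges. The sketch below is illustrative only: the function name and the toy edge list are hypothetical, and the toy network is built so that one TF has in-degree 1 and out-degree 5, the same values as the figure labels.

```python
from collections import Counter


def regulatory_degrees(edges):
    """Map each node to (in_degree, out_degree, total_degree).

    edges is an iterable of (regulator, target) pairs; duplicate
    edges are counted once.
    """
    out_degree = Counter()
    in_degree = Counter()
    for regulator, target in set(edges):
        out_degree[regulator] += 1
        in_degree[target] += 1
    nodes = set(out_degree) | set(in_degree)
    return {
        n: (in_degree[n], out_degree[n], in_degree[n] + out_degree[n])
        for n in nodes
    }


# Hypothetical toy network: TF_B regulates TF_A, and TF_A regulates five genes.
toy_edges = [("TF_B", "TF_A")] + [("TF_A", f"gene{i}") for i in range(1, 6)]
degrees = regulatory_degrees(toy_edges)

print(degrees["TF_A"])   # (1, 5, 6)
print(degrees["gene1"])  # (1, 0, 1)
```

Total degree per gene is the quantity compared between essential and LoF-tolerant genes; a rank-based test such as the Wilcoxon test quoted on the slide would then be applied to the two sets of values.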

slide-17
SLIDE 17

Identification of noncoding mutations with high impact: FunSeq

slide-18
SLIDE 18
  • Feature weight
  • Weighted with mutation patterns in natural polymorphisms (features frequently observed weighed less)
  • entropy based method

p_d = probability of feature d overlapping natural polymorphisms

Feature weight:

$$w_d = 1 + p_d \log_2 p_d + (1 - p_d) \log_2 (1 - p_d)$$

For a variant:

$$\text{score} = \sum_{d \,\in\, \text{observed features}} w_d$$

[Figure: genome track showing a HOT region, a sensitive region, and polymorphisms; plot of w_d against p]

FunSeq2: weighted scoring scheme

Fu et al., Genome Biology, 2014

https://github.com/khuranalab/FunSeq_PCAWG
http://funseq2.gersteinlab.org
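The entropy-based weighting is easy to check numerically. The sketch below assumes the feature weight is one minus the binary entropy of p (the probability of the feature overlapping natural polymorphisms) and that a variant's score is the sum of weights over its observed features. The function names, feature names, and probabilities are hypothetical, and this is not the FunSeq2 code itself.

```python
import math


def feature_weight(p):
    """Weight = 1 + p*log2(p) + (1 - p)*log2(1 - p), with 0*log2(0) taken as 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")

    def plog2(x):
        return x * math.log2(x) if x > 0 else 0.0

    return 1.0 + plog2(p) + plog2(1.0 - p)


def variant_score(observed_features, p_by_feature):
    """Sum the weights of the features observed for one variant."""
    return sum(feature_weight(p_by_feature[f]) for f in observed_features)


# Hypothetical overlap probabilities for three features.
p_by_feature = {"sensitive_region": 0.01, "hot_region": 0.2, "common_feature": 0.5}

for name, p in p_by_feature.items():
    print(name, round(feature_weight(p), 3))

print(round(variant_score(["sensitive_region", "hot_region"], p_by_feature), 3))
```

A feature that overlaps polymorphisms half the time gets weight 0, while a feature that almost never overlaps them gets a weight close to 1, which is how frequently observed features end up weighed less.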

slide-19
SLIDE 19

Identifying noncoding variants associated with cancer

CompositeDriver FunSeq

slide-20
SLIDE 20

CompositeDriver for detecting driver coding & noncoding elements

(A) Alterations are functionally annotated by the FunSeq2 pipeline

FS_i = original FunSeq2 score

(B) Calculate positional recurrence of each mutation in the cohort

[Figure: mutations across Sample 1, Sample 2, and Sample 3]

(C) Within each functional region, the composite functional score (CFS_r) is the sum of recurrence multiplied by FunSeq2 score at each position with an alteration.

r = region (cds, promoter, enhancer and lincRNA)
n = number of variants in r
W_i = number of samples with variant i
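Read literally, step (C) amounts to CFS_r = sum over i = 1..n of W_i × FS_i. The following sketch computes that sum for one region. It is an illustration under stated assumptions rather than the CompositeDriver implementation: the function name, the tuple layout, and the variant identifiers are hypothetical, and each variant is assumed to carry a single FunSeq2 score.

```python
from collections import defaultdict


def composite_functional_score(alterations):
    """Compute CFS for one region as the sum of W_i * FS_i over its variants.

    alterations is an iterable of (sample_id, variant_id, funseq_score)
    tuples, all falling in the same region. W_i is the number of distinct
    samples carrying variant i.
    """
    samples_by_variant = defaultdict(set)
    score_by_variant = {}
    for sample_id, variant_id, funseq_score in alterations:
        samples_by_variant[variant_id].add(sample_id)
        score_by_variant[variant_id] = funseq_score
    return sum(
        len(samples) * score_by_variant[variant_id]
        for variant_id, samples in samples_by_variant.items()
    )


# Hypothetical promoter: one variant recurs in three samples, another is private.
promoter_alterations = [
    ("sample1", "chr1:100:C>T", 2.0),
    ("sample2", "chr1:100:C>T", 2.0),
    ("sample3", "chr1:100:C>T", 2.0),
    ("sample2", "chr1:140:G>A", 1.5),
]

print(composite_functional_score(promoter_alterations))  # 3*2.0 + 1*1.5 = 7.5
```

Multiplying by recurrence means a moderately scored variant seen in many samples can outweigh a single high-scoring private variant.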

(D) The P-value for each region is produced from a permutation test, and the Benjamini and Hochberg method is used to correct for multiple hypothesis testing.
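Step (D) can be sketched as an empirical P-value followed by Benjamini-Hochberg adjustment. The slide does not say what is permuted, so the null model here is an assumption made only for illustration: per-variant contributions (W_i × FS_i) are shuffled across regions while each region keeps its number of variants. All names and numbers are hypothetical.

```python
import random


def permutation_p_values(contribs_by_region, n_perm=1000, seed=0):
    """Empirical P-value per region from shuffling contributions across regions.

    ASSUMPTION: the null keeps each region's variant count and reassigns
    the pooled per-variant contributions at random.
    """
    rng = random.Random(seed)
    regions = list(contribs_by_region)
    sizes = [len(contribs_by_region[r]) for r in regions]
    observed = [sum(contribs_by_region[r]) for r in regions]
    pool = [c for r in regions for c in contribs_by_region[r]]
    exceed = [0] * len(regions)
    for _ in range(n_perm):
        rng.shuffle(pool)
        start = 0
        for k, size in enumerate(sizes):
            if sum(pool[start:start + size]) >= observed[k]:
                exceed[k] += 1
            start += size
    # Add-one correction so that no P-value is exactly zero.
    return {r: (exceed[k] + 1) / (n_perm + 1) for k, r in enumerate(regions)}


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-variant contributions for four regions.
contribs = {
    "promoter_A": [10.0, 10.0],
    "promoter_B": [0.5],
    "enhancer_C": [0.5],
    "lincRNA_D": [0.5, 0.5],
}
p = permutation_p_values(contribs)
names = list(p)
q = benjamini_hochberg([p[r] for r in names])
for region, q_value in zip(names, q):
    print(region, round(p[region], 4), round(q_value, 4))
```

In this toy cohort only promoter_A concentrates high contributions, so it is the only region whose empirical P-value falls well below 1. A real analysis would need a null model that accounts for the mutation-rate covariates discussed earlier in the talk.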

[Figure: FunSeq2 scores FS1 to FS9 assigned to individual alterations]

Eric Minwei Liu

slide-21
SLIDE 21

Results from 40 lung adenocarcinoma samples

[Figure: three panels (Coding seq., Promoters, lincRNA), each with x-axis Expected (-logP)]

Data from TCGA

slide-22
SLIDE 22

[Figure: Q-Q plots of SNVs in coding regions, promoters, and enhancers; axes: Expected (−logP) vs. Observed (−logP); SPOP is labeled]

Results from 188 prostate cancer samples

Data from ICGC, Baca et al Cell 2013, Berger et al Nature 2011
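The coordinates behind a Q-Q plot of this kind can be derived from a list of per-region P-values. The sketch below is a generic construction, not the plotting code used for the slide: the function name is hypothetical, and the expected quantiles use rank / (n + 1) under a uniform null, which is one common convention among several.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, sorted ascending.

    Expected values assume P-values are uniform under the null and use
    rank / (n + 1) as the expected quantile. All P-values must be > 0.
    """
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10(rank / (n + 1)) for rank in range(1, n + 1))
    return list(zip(expected, observed))


# Hypothetical P-values for three regions.
points = qq_points([0.5, 0.01, 0.2])
for expected, observed in points:
    print(round(expected, 3), round(observed, 3))
```

Regions whose observed value lies far above the diagonal (observed much larger than expected) are the candidates a plot like this is meant to highlight.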

slide-23
SLIDE 23

Functional validation of candidates in prostate cancer

WDR74 promoter

  • Sanger sequencing in 19 additional samples confirms the recurrence
  • WDR74 shows increased expression in tumor samples

[Figure: WDR74 expression, benign vs. PCa]

RET promoter: Increased activity

In collaboration w/ Mark Rubin

EIF4EBP3 promoter: Reduced activity

slide-24
SLIDE 24

Acknowledgements

Yale

Yao Fu (now at Bina), Xinmeng Mu (now at Broad), Jieming Chen, Lucas Lochovsky, Arif Harmanci, Alexej Abyzov, Suganthi Balasubramanian, Cristina Sisu, Declan Clarke, Mike Wilson, Yong Kong, Mark Gerstein

Sanger

Vincenza Colonna, Yuan Chen, Yali Xue, Chris Tyler-Smith

Cornell

Steven Lipkin, Jishnu Das, Robert Fragoza, Xiaomu Wei, Haiyuan Yu, Andrea Sboner, Dimple Chakravarty, Naoki Kitabayashi, Vaja Liluashvili, Zeynep H. Gümüş, Kellie Cotter, Mark A. Rubin

U of Michigan

Hyun Min Kang

U of Geneva

Tuuli Lappalainen (NYGC), Emmanouil T. Dermitzakis

Baylor

Daniel Challis, Uday Evani, Donna Muzny, Fuli Yu, Richard Gibbs

EBI

Kathryn Beal, Laura Clarke, Fiona Cunningham, Paul Flicek, Javier Herrero, Graham R. S. Ritchie

Boston College

Erik Garrison, Gabor Marth

Mass Gen Hospital

Kasper Lage, Daniel G. MacArthur, Tune H. Pers

Rutgers

Jeffrey A. Rosenfeld

Functional Interpretation Group

~50 participants, ~40 Institutes, ~550 participants

slide-25
SLIDE 25

Khurana lab

Eric Minwei Liu, Priyanka Dhingra, Alexander Fundichely, Tawny Cuykendall, Andrea Sboner, Mark Rubin, Dimple Chakravarty, Kellie Cotter, Steve Lipkin, Chason Lee

Sandra and Edward Meyer Cancer Center

Englander Institute for Precision Medicine

Institute for Computational Biomedicine

Postdoc positions available: khuranalab.med.cornell.edu/jobs, ekk2003@med.cornell.edu