Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

SLIDE 1

Predicting Cancer Phenotypes from Somatic Genomic Alterations via Genomic Impact Transformer

Yifeng Tao¹, Chunhui Cai², William W. Cohen¹,*, Xinghua Lu²,³,*

¹ School of Computer Science, Carnegie Mellon University
² Department of Biomedical Informatics, University of Pittsburgh
³ Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh

SLIDE 2

Tumor origin and progression

  • Cancers are mainly caused by somatic genomic alterations (SGAs)
  • Driver SGAs (~10s/tumor): Promote tumor progression
  • Passenger SGAs (~100s/tumor): Neutral mutations
  • How to distinguish drivers from passengers?

S Nik-Zainal et al. 2017

SLIDE 3

Cancer drivers

  • How to distinguish drivers from passengers?
  • Frequency: recurrent mutations more likely to be drivers
  • Conserved domain: protein function significantly disturbed
  • All unsupervised. But drivers are defined as mutations that promote tumor development…
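
The frequency heuristic above can be illustrated with a toy ranking: count the number of tumors in which each gene is altered, then sort. This is only a sketch. The cohort below is made-up illustration data, and `rank_by_recurrence` is a hypothetical helper, not part of any published tool.

```python
from collections import Counter

def rank_by_recurrence(tumors):
    """Rank genes by the number of tumors in which they are altered.

    tumors: dict mapping a tumor ID to the set of genes altered in it.
    Returns (gene, count) pairs, most recurrent first; ties sorted by name.
    """
    counts = Counter(gene for genes in tumors.values() for gene in set(genes))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Made-up toy cohort (not real data).
toy_tumors = {
    "T1": {"TP53", "KRAS", "TTN"},
    "T2": {"TP53", "PIK3CA"},
    "T3": {"TP53", "KRAS"},
}
ranking = rank_by_recurrence(toy_tumors)
```

A ranking like this uses no phenotype information, which is the limitation the last bullet points out.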

B Vogelstein et al. 2013; ND Dees et al. 2012; MS Lawrence et al. 2013; B Reva et al. 2011; B Niu et al. 2016

SLIDE 4

Cancer drivers

  • Identify driver SGAs with supervision of downstream phenotypes
  • Change of RNA expression
  • Differentially expressed genes (DEGs)
  • Candidate models
  • Bayesian model (C Cai et al. 2019)
  • Lasso/Elastic net (R Tibshirani 1994)
  • Multi-layer perceptrons (MLPs) (F Rosenblatt 1958)
  • Can models do both prediction & driver detection?

Model (?) that predicts DEGs accurately & identifies driver SGAs

SLIDE 5

Self-attention mechanism

  • Can models do both prediction & driver detection?
  • Attention mechanism
  • Initially in CV (K Xu et al. 2015)/NLP (A Vaswani et al. 2017)
  • Better interpretability
  • Improves performance
  • Self-attention mechanism (Z Yang et al. 2016)
  • Contextual deep learning framework: weights determined by all the input mutations

Model with self-attention that predicts DEGs accurately & identifies driver SGAs

[Figure: attention weights $\beta_1, \beta_2, \dots$ on the input SGAs, with $\sum_i \beta_i = 1$]

SLIDE 6

Genomic impact transformer (GIT)

  • Transformer: encoder-decoder architecture
  • Encoder: self-attention mechanism; Decoder: MLP

SLIDE 7

Encoder: Multi-head self-attention

  • Tumor embedding is the weighted sum of gene embeddings:
  • Weights determined by input gene embeddings:
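
The equations for these two bullets did not survive in this transcript. As a rough illustration only, the sketch below uses a common additive multi-head self-attention form: each head scores every altered gene from its own embedding, a softmax normalizes the scores across genes, and the tumor embedding is the weighted sum. The `tanh` projection, the averaging of heads, and the names `w0` and `thetas` are assumptions, not taken from the slides.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def tumor_embedding(gene_embs, w0, thetas):
    """Attention-weighted sum of gene embeddings for one tumor.

    gene_embs: list of m gene embedding vectors (each of length d).
    w0:        projection matrix as a list of d_a rows of length d (assumed).
    thetas:    one query vector of length d_a per attention head (assumed).
    Returns (embedding, weights); the weights are positive and sum to 1.
    """
    # Hidden representation of every gene: tanh(W0 . e_g).
    hidden = [[math.tanh(sum(w * x for w, x in zip(row, e))) for row in w0]
              for e in gene_embs]
    # Each head scores every gene, then normalizes across the genes.
    heads = [softmax([sum(t * h for t, h in zip(theta, hid)) for hid in hidden])
             for theta in thetas]
    # Average the heads (a simplification) so the final weights sum to 1.
    weights = [sum(head[g] for head in heads) / len(heads)
               for g in range(len(gene_embs))]
    dim = len(gene_embs[0])
    embedding = [sum(w * e[k] for w, e in zip(weights, gene_embs))
                 for k in range(dim)]
    return embedding, weights

# Made-up numbers: three altered genes, d = 2, d_a = 2, two heads.
genes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w0 = [[0.5, -0.5], [0.3, 0.8]]
thetas = [[1.0, 0.0], [0.0, 1.0]]
emb, weights = tumor_embedding(genes, w0, thetas)
```

Because the weights depend on every gene present in the tumor, the same gene can receive a different weight in a different tumor, and the per-gene weights can be read off as a ranking of candidate drivers.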

SLIDE 8

Pre-training gene embedding: Gene2Vec

  • Co-occurrence pattern (e.g., mutually exclusive alterations)
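
One way to picture the co-occurrence input is to treat the set of genes altered in a tumor as a "sentence" and count (gene, context gene) pairs, as a skip-gram style model would consume them. This is a sketch under that assumption. `cooccurrence_pairs` is a hypothetical helper and the tumors are made-up toy data.

```python
from collections import Counter
from itertools import permutations

def cooccurrence_pairs(tumors):
    """Count ordered (gene, context gene) pairs altered in the same tumor."""
    pairs = Counter()
    for genes in tumors:
        # Every ordered pair of distinct genes in one tumor is one observation.
        pairs.update(permutations(sorted(set(genes)), 2))
    return pairs

# Made-up toy tumors: KRAS and BRAF never appear together here.
toy = [{"KRAS", "TP53"}, {"BRAF", "TP53"}, {"KRAS", "TP53", "PTEN"}]
pairs = cooccurrence_pairs(toy)
```

In this toy set, KRAS and BRAF have a pair count of zero yet both co-occur with TP53, which is the kind of mutually exclusive pattern the bullet refers to.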

[Figure labels: g, c; Pathway 1, Pathway 2, Pathway 3]

MD Leiserson et al. 2015; T Mikolov et al. 2013

SLIDE 9

Improved performance in predicting DEGs

  • Predicting DEGs from SGAs
  • Conventional models
  • Ablation studies
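
For reference, the two metrics reported on this slide can be computed as in the sketch below for a single gene's binary DEG labels. The labels are made up, `accuracy_and_f1` is a hypothetical helper, and how the scores are aggregated across genes is not stated on the slide.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 score for binary labels (1 = DEG, 0 = not DEG)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1

# Made-up labels for six tumors.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```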

[Figure: F1 score (axis 51 to 63) and accuracy (axis 73 to 79) across models]

SLIDE 10

Candidate drivers via attention mechanism

SLIDE 11

Gene embedding space

  • Functionally similar genes are close in gene embedding space
  • Qualitatively and quantitatively (i.e., GO enrichment, NN accuracy)

SLIDE 12

Tumor embedding: Survival analysis

  • Tumor embeddings reveal distinct survival profiles

SLIDE 13

Tumor embedding: Drug response

  • Tumor embeddings are predictive of drug response

SLIDE 14

Conclusions and future work

  • Biologically inspired neural network framework
  • Identifying cancer drivers with supervision of DEGs
  • Accurate prediction of DEGs from mutations
  • Side products
  • Gene embedding: informative of gene functions
  • Tumor embedding: transferable to other phenotype prediction tasks
  • Code and pretrained gene embedding: https://github.com/yifengtao/genome-transformer
  • Future work
  • Fine-grained embedding representation at the codon level
  • Tumor evolutionary features, e.g., hypermutability, intra-tumor heterogeneity

SLIDE 15

Acknowledgments

  • Dr. Xinghua Lu
  • Dr. William W. Cohen
  • Dr. Chunhui Cai
  • Michael Q. Ding
  • Yifan Xue

SLIDE 16

Quantitative measurement of gene embeddings

  • Functionally similar genes → closer in embedding space
  • GO enrichment:
  • NN accuracy:
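
The definition of NN accuracy is not preserved in this transcript. One plausible reading, shown here purely as an assumption, is the fraction of genes whose nearest neighbor in embedding space shares at least one functional annotation (for example a GO term). The Euclidean distance, the helper name `nn_accuracy`, and the toy embeddings and annotations are all illustrative.

```python
import math

def nn_accuracy(embeddings, annotations):
    """Fraction of genes whose nearest neighbor shares an annotation.

    embeddings:  dict mapping gene name to a coordinate tuple.
    annotations: dict mapping gene name to a set of annotation labels.
    """
    hits = 0
    for gene, vec in embeddings.items():
        # Nearest other gene by Euclidean distance (an assumed metric).
        nearest = min((g for g in embeddings if g != gene),
                      key=lambda g: math.dist(vec, embeddings[g]))
        if annotations[gene] & annotations[nearest]:
            hits += 1
    return hits / len(embeddings)

# Made-up 2-D embeddings and annotation sets.
toy_emb = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (5.0, 5.0), "D": (5.0, 6.0)}
toy_ann = {"A": {"x"}, "B": {"x"}, "C": {"y"}, "D": {"z"}}
score = nn_accuracy(toy_emb, toy_ann)
```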

[Figure: NN accuracy (%), axis 4 to 11, for Random pairs, Gene2Vec, and Gene2Vec+GIT]

SLIDE 17

Tumor embedding space

SLIDE 18

Gene2Vec algorithm

SLIDE 19

Gene2Vec: Co-occurrence patterns

  • Co-occurrence does not necessarily mean similar embeddings
  • Ex 1: two cats sit there .
  • Ex 2: two cats stand there .
  • Ex 3: two dogs sit there .
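
The three example sentences can be checked directly. The sketch below collects, for each word, the set of other words it shares a sentence with, then compares those context sets with Jaccard similarity. Jaccard overlap is only a stand-in for embedding similarity, the helper names are hypothetical, and the trailing periods are dropped.

```python
def context_sets(sentences):
    """Map each word to the set of other words it shares a sentence with."""
    contexts = {}
    for sentence in sentences:
        words = sentence.split()
        for word in words:
            contexts.setdefault(word, set()).update(
                w for w in words if w != word)
    return contexts

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)

sentences = ["two cats sit there", "two cats stand there", "two dogs sit there"]
ctx = context_sets(sentences)
cats_dogs = jaccard(ctx["cats"], ctx["dogs"])  # never in the same sentence
cats_two = jaccard(ctx["cats"], ctx["two"])    # together in two sentences
```

Here "cats" and "dogs" never co-occur but have more similar contexts than "cats" and "two", which do co-occur.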

[Figure: words grouped by role. Pathway 1 (number): one, two, several; Pathway 2 (noun): cat, dog; Pathway 3 (verb): stand, sit, lie]

MD Leiserson et al. 2015; T Mikolov et al. 2013