Predicting Cancer Phenotypes based on Somatic Genomic Alterations

SLIDE 1

Predicting Cancer Phenotypes based on Somatic Genomic Alterations via Genomic Impact Transformer

Yifeng Tao¹, Chunhui Cai², William W. Cohen¹*, Xinghua Lu²*

¹ Carnegie Mellon University, ² University of Pittsburgh

Carnegie Mellon University 1 Yifeng Tao

SLIDE 2

Background

  • Cancers are mainly caused by somatic genomic alterations (SGAs)
      • Driver SGAs → causal to tumor development
      • Passenger SGAs → neutral mutations

[Figure: normal cells, driver SGAs, biological/cellular processes perturbed, tumor cells]

SLIDE 3

Challenges

  • Driver SGA detection
      • Solution 1: alteration frequency
      • Solution 2: conserved protein domains
      • Problem: neither accounts for the downstream effects of SGAs
  • SGA/tumor representation
      • Solution: a high-dimensional one-hot/sparse vector
      • Problem: carries little information/knowledge
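
The sparse representation above can be sketched in a few lines. This is only an illustration: the six-gene vocabulary and the helper names `multi_hot` and `dot` are hypothetical, and a real vocabulary would span thousands of genes.

```python
def multi_hot(sga_genes, vocabulary):
    """Return a 0/1 vector with one position per gene in the vocabulary."""
    altered = set(sga_genes)
    return [1 if gene in altered else 0 for gene in vocabulary]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy vocabulary, for illustration only.
VOCAB = ["BRCA1", "GATA3", "KRAS", "PIK3CA", "PTEN", "TP53"]

tumor = multi_hot(["TP53", "PIK3CA", "GATA3"], VOCAB)
print(tumor)  # [0, 1, 0, 1, 0, 1]

# Any two distinct genes are orthogonal, so the encoding says nothing
# about which genes are functionally related.
tp53 = multi_hot(["TP53"], VOCAB)
pten = multi_hot(["PTEN"], VOCAB)
print(dot(tp53, pten))  # 0
```

Because every pair of distinct genes has a dot product of zero, the vector carries no notion of similarity between SGAs, which is the limitation the slide points at.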

SLIDE 4

Genomic Impact Transformer (GIT)

  • GIT: encoder-decoder architecture
      • Mimics the cellular signaling process
  • Driver SGA detection
      • Problem: downstream effect of SGAs
      • Solution: supervision by gene expression
  • SGA/tumor representation
      • Problem: little information/knowledge
      • Solution: gene/tumor embeddings

[Figure: GIT pipeline, with labels: gene embeddings, encoder, SGA, tumor embedding, decoder, gene expression]

SLIDE 5

Genomic Impact Transformer (GIT)

[Figure: GIT architecture, panels (a), (b), (c). Recoverable labels:
  Example cancer patient: TCGA-D8-A1JJ
  Somatic genomic alterations (SGAs): CNBD1, MATN2, PIK3CA, PURG, TP53, GATA3, ZBTB10, MRPS28
  Cancer type: BRCA
  Gene embeddings e_1, e_2, e_3, ..., e_m; cancer type embedding e_s; tumor embedding e_t
  Tumor embedding: e_t = e_s + α_1 e_1 + α_2 e_2 + ... + α_m e_m (e_s has weight 1)
  Multi-head self-attention with h heads: scores β_1,h, ..., β_m,h (labels W_0, tanh, θ_h) pass through a softmax to give α_1,h, ..., α_m,h, which yield the attention weights α_1, ..., α_m
  Output: differentially expressed genes (DEGs), shown as over-expressed and under-expressed genes]
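
The attention step in the figure can be sketched as follows. The score function used here, β_i,h = θ_h · tanh(W_0 e_i), and the summation of per-head weights into α_i are assumptions inferred from the figure labels (W_0, tanh, θ_h, softmax, h heads), not a confirmed reproduction of the authors' code. All dimensions and parameter values are random placeholders, and every function name is hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(gene_embs, W0, thetas):
    """Assumed form: alpha_i = sum_h softmax_i(theta_h . tanh(W0 e_i))."""
    hidden = [[math.tanh(z) for z in matvec(W0, e)] for e in gene_embs]
    alphas = [0.0] * len(gene_embs)
    for theta in thetas:  # one theta vector per attention head
        betas = [sum(t * h for t, h in zip(theta, hid)) for hid in hidden]
        for i, a in enumerate(softmax(betas)):
            alphas[i] += a
    return alphas


def tumor_embedding(gene_embs, cancer_type_emb, alphas):
    """e_t = e_s + sum_i alpha_i * e_i, as written on the slide."""
    e_t = list(cancer_type_emb)
    for a, e in zip(alphas, gene_embs):
        for k, v in enumerate(e):
            e_t[k] += a * v
    return e_t


def rand_vec(n, rng):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


rng = random.Random(0)
dim, hidden_dim, heads, m = 4, 3, 2, 5  # placeholder sizes
gene_embs = [rand_vec(dim, rng) for _ in range(m)]
e_s = rand_vec(dim, rng)
W0 = [rand_vec(dim, rng) for _ in range(hidden_dim)]
thetas = [rand_vec(hidden_dim, rng) for _ in range(heads)]

alphas = attention_weights(gene_embs, W0, thetas)
e_t = tumor_embedding(gene_embs, e_s, alphas)
print(alphas)
print(e_t)
```

Under this assumed form each head's softmax sums to 1, so the combined weights sum to the number of heads rather than to 1.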

SLIDE 6

Encoder: Attention

SLIDE 7

Encoder: Attention

SLIDE 8

Decoder: MLP
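
The slide names only the decoder type, so the following is a generic sketch of a multilayer perceptron that maps a tumor embedding to one probability per target gene. The single hidden layer, the ReLU and sigmoid activations, the layer sizes, and the 0.5 threshold are all assumptions made for illustration; the function names are hypothetical.

```python
import math
import random


def linear(W, b, x):
    """Affine layer: W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, z) for z in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def mlp_decoder(e_t, params):
    """Map a tumor embedding to one probability per target gene."""
    W1, b1, W2, b2 = params
    hidden = relu(linear(W1, b1, e_t))
    return sigmoid(linear(W2, b2, hidden))


def init_params(in_dim, hidden_dim, out_dim, rng):
    """Random placeholder weights; a trained model would learn these."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return mat(hidden_dim, in_dim), [0.0] * hidden_dim, mat(out_dim, hidden_dim), [0.0] * out_dim


rng = random.Random(0)
params = init_params(in_dim=4, hidden_dim=8, out_dim=3, rng=rng)
probs = mlp_decoder([0.2, -0.1, 0.4, 0.0], params)
predicted_deg = [p > 0.5 for p in probs]  # assumed decision threshold
print(probs)
print(predicted_deg)
```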

SLIDE 9

Pre-training Gene Embedding: Gene2Vec

  • Co-occurrence pattern

[Figure: labels g and c, with Pathway 1, Pathway 2, Pathway 3]
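
One way to extract a co-occurrence pattern is to collect (gene, context) pairs from genes altered in the same tumor, as a skip-gram style model would consume. The pairing scheme below (every ordered pair of distinct genes within a tumor) is an assumption, since the slide does not spell it out, and the three toy tumors are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_pairs(tumors):
    """Yield (gene, context) for every ordered pair of distinct genes
    altered in the same tumor."""
    for genes in tumors:
        yield from permutations(sorted(set(genes)), 2)


# Hypothetical toy data: each inner list is the SGA gene set of one tumor.
TUMORS = [
    ["TP53", "PIK3CA", "GATA3"],
    ["TP53", "PIK3CA"],
    ["KRAS", "TP53"],
]

counts = Counter(cooccurrence_pairs(TUMORS))
print(counts[("TP53", "PIK3CA")])  # 2
print(counts[("KRAS", "PIK3CA")])  # 0
```

Genes that are frequently altered together receive many shared pairs, which is the signal an embedding model can use to place them close to each other.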

SLIDE 10

Performance

  • Predicting gene expression using SGAs

[Figure: two charts, one of F1 score (axis ticks 51 to 63) and one of accuracy (axis ticks 73 to 79)]
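
For reference, the two metrics named in the charts can be computed for binary predictions as below. The labels are hypothetical toy data, and how the scores were aggregated across genes and tumors in the reported results is not stated on the slide, so this shows only the standard definitions.

```python
def f1_and_accuracy(y_true, y_pred):
    """Standard binary F1 score and accuracy from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return f1, accuracy


# Hypothetical example: 1 = differentially expressed, 0 = not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
f1, accuracy = f1_and_accuracy(y_true, y_pred)
print(f1, accuracy)  # 0.75 0.75
```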

SLIDE 11

Gene Embedding

SLIDE 12

Gene Embedding

[Figure: NN accuracy (%) for random pairs, Gene2Vec, and Gene2Vec+GIT; axis ticks 4 to 11]

  • NN accuracy: functionally similar genes → closer in embedding space

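
A nearest-neighbor (NN) accuracy of this kind can be sketched as the fraction of genes whose closest neighbor in embedding space shares a functional annotation. The Euclidean distance, the "shares at least one annotation" criterion, and the toy embeddings and annotation sets below are assumptions for illustration; the slide does not define the metric in detail.

```python
import math


def nn_accuracy(embeddings, annotations):
    """Fraction of genes whose nearest neighbor (Euclidean distance)
    shares at least one functional annotation with them."""
    genes = list(embeddings)
    hits = 0
    for g in genes:
        nearest = min(
            (o for o in genes if o != g),
            key=lambda o: math.dist(embeddings[g], embeddings[o]),
        )
        if annotations[g] & annotations[nearest]:
            hits += 1
    return hits / len(genes)


# Hypothetical 2-D embeddings and annotation sets.
EMB = {"GENE_A": (0.0, 0.0), "GENE_B": (0.1, 0.0),
       "GENE_C": (1.0, 1.0), "GENE_D": (1.0, 1.2)}
ANN = {"GENE_A": {"f1"}, "GENE_B": {"f1"},
       "GENE_C": {"f2"}, "GENE_D": {"f3"}}

print(nn_accuracy(EMB, ANN))  # 0.5
```

In the toy data, GENE_A and GENE_B are mutual nearest neighbors with a shared annotation, while GENE_C and GENE_D are neighbors without one, giving 2 hits out of 4.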
SLIDE 13

Candidate Drivers via Attention

SLIDE 14

Tumor Embedding

  • Common cellular signaling process across cancer types

SLIDE 15

Application - Survival Analysis

SLIDE 16

Application - Drug Response

SLIDE 17

Summary

  • Biologically inspired neural network that mimics cellular signaling
  • Distinguishes driver from passenger SGAs under supervision of gene expression
  • Gene embedding: informative of gene functions
  • Tumor embedding: transferable to other phenotype prediction tasks
  • Gene2Vec pre-training: speeds up training and alleviates overfitting

SLIDE 18

Acknowledgements

  • Lu Lab
      • Xinghua Lu
      • Chunhui Cai
      • Yifan Xue
      • Michael Q. Ding
  • Cohen Lab
      • William W. Cohen
  • Funding
      • NIH
      • Pennsylvania Department of Health
