DeepGene: An Advanced Cancer Type Classifier Based on Deep Learning



SLIDE 1

DeepGene: An Advanced Cancer Type Classifier Based on Deep Learning and Somatic Point Mutations

石毅 (Shi, Yi)

2016.10.03, Center for Systems Biomedicine, Shanghai Jiao Tong University

SLIDE 2

Outline

  • Motivation
  • Methods
  • Results & Discussion

SLIDE 3

Motivation

SLIDE 4

Motivation: Traditional cancer diagnosis

  • Morphological appearance, imaging techniques
  • Gene expression
  • Protein profiling

Images from radiology.med.nyu.edu, well.ox.ac.uk, and sigmaaldrich.com

SLIDE 5

Motivation: Internal drivers of cancer

  • Somatic point mutations
  • Insertions and deletions (INDELs)
  • Chromosomal translocations
  • Copy number abnormalities

SLIDE 6

Motivation

  • Neural network (1940s)
  • Support vector machine (1960s)
  • Deep neural network (1980s)

Supervised learning combined with unsupervised learning

SLIDE 7

Motivation

Applications of deep neural networks (DNNs)

SLIDE 8

Methods

SLIDE 9

Methods: Three steps of DeepGene

  • Step 1. Clustered gene filtering (CGF)
  • Step 2. Indexed sparsity reduction (ISR)
  • Step 3. Deep neural network (DNN) classifier

SLIDE 10

Methods

  • Step 1. Clustered gene filtering (CGF)
  • Intuitive idea: [figure comparing Team A vs. Team B]

SLIDE 11

Methods

  • Step 1. Clustered gene filtering (CGF)
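The slides describe CGF only through its intuition (the Team A vs. Team B comparison) and its output, "clustered discriminatory gene data" of size n1×1. To make the *discriminatory filtering* idea concrete, here is a hypothetical Python stand-in that is not DeepGene's actual CGF procedure. It scores each gene by how much its mutation frequency varies across cancer types and keeps the n1 highest-scoring genes. The scoring rule, the function names, and the absence of any gene clustering are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# HYPOTHETICAL stand-in for CGF: it is NOT DeepGene's published procedure.
# It only illustrates "keep the most discriminatory genes"; no gene clustering is done.


def discriminatory_genes(
    X: Sequence[Sequence[int]],  # samples x N genes, non-zero = gene mutated (assumed)
    y: Sequence[str],            # cancer type of each sample
    n1: int,                     # number of genes to keep
) -> List[int]:
    """Return the 0-based indices of the n1 genes whose per-type mutation
    frequency varies most across cancer types (max minus min)."""
    n_genes = len(X[0])
    counts: Dict[str, List[int]] = defaultdict(lambda: [0] * n_genes)
    totals: Dict[str, int] = defaultdict(int)
    for row, label in zip(X, y):
        totals[label] += 1
        for g, v in enumerate(row):
            counts[label][g] += int(v != 0)

    scores = []
    for g in range(n_genes):
        freqs = [counts[c][g] / totals[c] for c in totals]
        scores.append(max(freqs) - min(freqs))

    # Highest score first; ties broken by the lower gene index.
    ranked = sorted(range(n_genes), key=lambda g: (-scores[g], g))
    return sorted(ranked[:n1])


def filter_sample(x: Sequence[int], keep: Sequence[int]) -> List[int]:
    """Project one raw N x 1 vector onto the kept genes (n1 x 1)."""
    return [x[g] for g in keep]


X = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y = ["A", "A", "B", "B"]
keep = discriminatory_genes(X, y, n1=2)
print(keep, filter_sample([1, 0, 1], keep))  # [0, 1] [1, 0]
```

A real CGF would presumably group similar genes before filtering. This sketch only shows the shape of the step: an N×1 vector goes in and an n1×1 vector comes out.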
SLIDE 12

Methods

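  • Step 2. Indexed sparsity reduction (ISR)

Raw gene data ($N \times 1$): $\mathbf{x} = [1, 0, 1, 0, \ldots, 0, 1]^{\mathsf{T}}$

Indexed gene data ($n_{NZ} \times 1$), the indices of the non-zero elements: $[1, 3, \ldots, N]^{\mathsf{T}}$

  • If $n_{NZ} \ge n_{ISR}$: truncate to the top $n_{ISR}$ elements with the highest occurrence frequency.
  • If $n_{NZ} < n_{ISR}$: add zeros to the tail, e.g. $[1, 3, \ldots, N, 0, \ldots, 0]^{\mathsf{T}}$.

Gene data after ISR ($n_{ISR} \times 1$)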
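The ISR rule on this slide fits in a few lines of Python. This minimal sketch rests on a few assumptions. Indices are 1-based, so that 0 can serve unambiguously as padding. `gene_freq` is a hypothetical per-gene occurrence count, for example how often each gene is mutated across the training samples. Kept indices are returned in ascending order, and frequency ties go to the lower index.

```python
from typing import List, Sequence


def indexed_sparsity_reduction(
    x: Sequence[int],          # raw N x 1 gene vector (non-zero = mutated)
    gene_freq: Sequence[int],  # HYPOTHETICAL per-gene occurrence frequency
    n_isr: int,                # target output length n_ISR
) -> List[int]:
    # Indexed gene data: 1-based positions of the non-zero elements (n_NZ x 1).
    nz = [i + 1 for i, v in enumerate(x) if v != 0]

    if len(nz) >= n_isr:
        # Keep the n_ISR indices whose genes occur most frequently;
        # ties are broken by the lower index (an assumption).
        top = sorted(nz, key=lambda i: (-gene_freq[i - 1], i))[:n_isr]
        return sorted(top)

    # Fewer non-zero elements than n_ISR: pad the tail with zeros.
    return nz + [0] * (n_isr - len(nz))


x = [1, 0, 1, 0, 0, 1]
freq = [5, 9, 1, 9, 9, 3]
print(indexed_sparsity_reduction(x, freq, 5))  # [1, 3, 6, 0, 0] (zero-padded)
print(indexed_sparsity_reduction(x, freq, 2))  # [1, 6] (gene 3 is the least frequent)
```

The design choice is that every sample, however many mutations it carries, becomes a fixed-length $n_{ISR} \times 1$ vector, which a fully connected classifier requires.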

SLIDE 13

Methods

  • Step 3. Deep neural network classifier
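This slide does not give the network's architecture, and the layer count and width are tuned on the parameters slide. To show what a DNN classifier computes at inference time, here is a pure-Python forward pass. It stacks fully connected ReLU layers and ends in a softmax over the 12 cancer types, then returns a 1-based class label. The layer sizes, the ReLU/softmax choice, and the random initialisation are assumptions. Training (e.g. by backpropagation) is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights W[out][in], biases b[out])


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Small random weights scaled by 1/sqrt(n_in); zero biases.
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    W, b = layer
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(x: Sequence[float], layers: Sequence[Layer]) -> Tuple[int, List[float]]:
    h = list(x)
    for layer in layers[:-1]:
        h = [max(0.0, v) for v in dense(h, layer)]  # ReLU hidden layers
    probs = softmax(dense(h, layers[-1]))
    label = max(range(len(probs)), key=probs.__getitem__) + 1  # 1-based class label
    return label, probs


# Hypothetical sizes: 8 input features -> 16 -> 16 -> 12 cancer types.
rng = random.Random(0)
sizes = [8, 16, 16, 12]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
label, probs = predict([1, 0, 0, 1, 3, 10, 0, 0], layers)
print(label, round(sum(probs), 6))  # a label in 1..12, and probabilities summing to 1.0
```

With untrained random weights the predicted label is arbitrary. Only the shapes and the label convention are meaningful here.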
SLIDE 14

Methods: Overall flowchart of DeepGene

Raw gene data ($N \times 1$), e.g. $[1, 0, 1, 0, \ldots]^{\mathsf{T}}$

  • Clustered gene filtering (CGF) of the raw data → clustered discriminatory gene data ($n_1 \times 1$), e.g. $[1, 0, 0, 0, \ldots]^{\mathsf{T}}$
  • Indexed sparsity reduction (ISR) of the raw data → indexes of the non-zero elements ($n_2 \times 1$), e.g. $[1, 3, 10, 28, \ldots]^{\mathsf{T}}$
  • Concatenation → input to the DNN classifier ($(n_1+n_2) \times 1$), e.g. $[1, 0, 0, \ldots, 1, 3, 10, \ldots]^{\mathsf{T}}$
  • DNN classifier → classification label ($1 \times 1$), e.g. 6
  • Classification result (cancer type), e.g. KIRP
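The data flow in this flowchart reduces to plain list operations in Python. In the sketch below, the CGF output and the ISR indices are taken as given, with illustrative values. They are concatenated into the $(n_1+n_2) \times 1$ DNN input, and a 1-based classification label is mapped back to a cancer type. The mapping assumes the labels follow the order in which the 12 TCGA cohorts are listed on the dataset slide. Under that assumption, label 6 is KIRP, as in the flowchart.

```python
from typing import List, Sequence

# Assumed label order: the order of the 12 TCGA cohorts on the dataset slide.
CANCER_TYPES = ["ACC", "BLCA", "BRCA", "CESC", "HNSC", "KIRP",
                "LGG", "LUAD", "PAAD", "PRAD", "STAD", "UCS"]


def build_dnn_input(cgf_out: Sequence[int], isr_out: Sequence[int]) -> List[int]:
    """Concatenate the CGF output (n1 x 1) and the ISR indices (n2 x 1) into (n1+n2) x 1."""
    return list(cgf_out) + list(isr_out)


def label_to_cancer_type(label: int) -> str:
    """Map a 1-based classification label to a TCGA cohort abbreviation."""
    if not 1 <= label <= len(CANCER_TYPES):
        raise ValueError(f"label must be in 1..{len(CANCER_TYPES)}, got {label}")
    return CANCER_TYPES[label - 1]


cgf_out = [1, 0, 0, 0]    # illustrative, n1 = 4
isr_out = [1, 3, 10, 28]  # illustrative, n2 = 4
dnn_input = build_dnn_input(cgf_out, isr_out)
print(dnn_input)                # [1, 0, 0, 0, 1, 3, 10, 28]
print(label_to_cancer_type(6))  # KIRP
```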

SLIDE 15

Results & Discussion

SLIDE 16

Results & Discussion: Dataset

  • 12 tumor somatic point mutation datasets from TCGA (ACC, BLCA, BRCA, CESC, HNSC, KIRP, LGG, LUAD, PAAD, PRAD, STAD, UCS)
  • 22,834 genes from 3,122 samples in total

Note: ACC, adrenocortical carcinoma; BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; CESC, cervical squamous cell carcinoma and endocervical adenocarcinoma; HNSC, head and neck squamous cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LGG, brain lower grade glioma; LUAD, lung adenocarcinoma; PAAD, pancreatic adenocarcinoma; PRAD, prostate adenocarcinoma; STAD, stomach adenocarcinoma; UCS, uterine carcinosarcoma; TCGA, The Cancer Genome Atlas.
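Each sample enters DeepGene as a raw N×1 gene vector, presumably with N = 22,834 for this dataset. The Python sketch below builds such a vector from a sample's list of mutated genes. It assumes a binary encoding (1 if the gene carries at least one somatic point mutation, 0 otherwise) and a fixed gene order shared by all samples. Counting mutations instead of flagging them would be an equally plausible encoding. The gene symbols in the usage lines are placeholders only.

```python
from typing import Dict, Iterable, List, Sequence


def make_gene_index(genes: Sequence[str]) -> Dict[str, int]:
    """Fix one gene order for every sample (position = row of the N x 1 vector)."""
    return {g: i for i, g in enumerate(genes)}


def encode_sample(mutated_genes: Iterable[str], gene_index: Dict[str, int]) -> List[int]:
    """Binary N x 1 vector: 1 if the gene has >= 1 somatic point mutation (assumed encoding)."""
    x = [0] * len(gene_index)
    for g in mutated_genes:
        if g in gene_index:  # genes outside the fixed gene list are ignored
            x[gene_index[g]] = 1
    return x


# Placeholder gene list; the real data set has 22,834 genes.
gene_index = make_gene_index(["TP53", "KRAS", "PIK3CA", "VHL", "IDH1"])
print(encode_sample(["TP53", "PIK3CA", "TP53"], gene_index))  # [1, 0, 1, 0, 0]
```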

SLIDE 17

Results & Discussion: Parameters

(a) Parameter estimation for and , corresponding to Table 4; (b) parameter estimation for the number of layers and the number of parameters per layer of the DNN classifier, corresponding to Table 5; (c) parameter estimation for the SVM cost and gamma, corresponding to Table 6; (d) parameter estimation for Table 7.
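Panel (c) tunes the SVM's cost and gamma. The slides do not state the search strategy or the grid values. Assuming an exhaustive grid search that keeps the pair with the best validation accuracy, the procedure looks roughly like the Python below. `evaluate` is a hypothetical callback that would train an SVM with the given cost and gamma and return its validation accuracy. The toy scorer exists only to make the sketch runnable.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def grid_search(
    evaluate: Callable[[float, float], float],  # HYPOTHETICAL: (cost, gamma) -> validation accuracy
    costs: Sequence[float],
    gammas: Sequence[float],
) -> Tuple[Tuple[float, float], float]:
    """Return the (cost, gamma) pair with the highest score; the first one wins ties."""
    if not costs or not gammas:
        raise ValueError("both grids must be non-empty")
    best_params, best_score = (costs[0], gammas[0]), float("-inf")
    for cost, gamma in product(costs, gammas):
        score = evaluate(cost, gamma)
        if score > best_score:
            best_params, best_score = (cost, gamma), score
    return best_params, best_score


# Toy scorer that peaks at cost = 10, gamma = 0.01 (stands in for real SVM training).
def toy_evaluate(cost: float, gamma: float) -> float:
    return -abs(cost - 10) - 100 * abs(gamma - 0.01)


print(grid_search(toy_evaluate, [1, 10, 100], [0.001, 0.01, 0.1]))  # ((10, 0.01), 0.0)
```

The same loop applies to panel (b) if `evaluate` is swapped for one that trains a DNN with a given number of layers and units per layer.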

SLIDE 18

Results & Discussion: Do CGF and/or ISR help?

10-fold cross-validation accuracy of DeepGene with different design options
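For reference, here is a minimal Python sketch of k-fold cross-validation accuracy with k = 10, the measure reported on this slide. Several details are assumptions: the shuffling seed, the round-robin fold assignment, and averaging per-fold accuracies rather than pooling all predictions (both conventions are common). `train_and_predict` is a hypothetical callback standing in for training DeepGene, or any classifier, on the training folds and predicting the held-out fold.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_splits(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every sample lands in exactly one test fold."""
    if n < k:
        raise ValueError("need at least k samples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_val_accuracy(
    X: Sequence, y: Sequence,
    train_and_predict: Callable[[list, list, list], list],  # HYPOTHETICAL classifier callback
    k: int = 10, seed: int = 0,
) -> float:
    """Mean of the k per-fold accuracies."""
    accs = []
    for train, test in k_fold_splits(len(y), k, seed):
        preds = train_and_predict([X[i] for i in train], [y[i] for i in train],
                                  [X[i] for i in test])
        accs.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(accs) / len(accs)


# Toy data where the label equals the single feature, and a toy "classifier" that echoes it.
X = [[i % 3] for i in range(30)]
y = [i % 3 for i in range(30)]


def echo(X_train, y_train, X_test):
    return [x[0] for x in X_test]


print(cross_val_accuracy(X, y, echo))  # 1.0
```

Comparing design options, as on this slide, then amounts to calling `cross_val_accuracy` with the same splits (same `seed`) for each variant.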

SLIDE 19

Results & Discussion: Comparison with other widely used classifiers

Testing accuracy of DeepGene against three widely adopted classifiers

SLIDE 20

Results & Discussion: Further investigation

  • Integrating other heterogeneous mutation data, e.g. INDELs, CNVs, translocations
  • Which feature (gene) combinations contribute to better prediction accuracy, and why?

How can this help real diagnosis?

  • Applying it to circulating tumor cells (CTCs) or circulating tumor DNA (ctDNA) for early diagnosis, subtyping, and localization

SLIDE 21

Questions & Comments?