PREDICTION OF HETERODIMERIC PROTEIN COMPLEXES FROM PROTEIN-PROTEIN INTERACTION NETWORKS USING DEEP LEARNING, Peiying (Colleen) Ruan



SLIDE 1

Peiying (Colleen) Ruan, PhD, Deep Learning Solution Architect 3/26/2018

PREDICTION OF HETERODIMERIC PROTEIN COMPLEXES FROM PROTEIN-PROTEIN INTERACTION NETWORKS USING DEEP LEARNING

SLIDE 2

OUTLINE

▪ Background

▪ Method

▪ Computational Experiments and Results

▪ Conclusions

SLIDE 3

BACKGROUND

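SLIDE 4

BACKGROUND

[Figure: the biological system of a human cell. DNA is transcribed into mRNA, mRNA is translated into protein, and proteins form complexes through protein-protein interactions and perform functions, with outcomes labeled "Disease" and "Keeping healthy". The label "Our works" marks the part of this flow that the talk addresses.]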

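SLIDE 5

BACKGROUND

What is a heterodimer, and why predict it?

A heterodimer is a protein complex made of two different proteins. Heterodimers account for about 40% of catalogued protein complexes.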

SLIDE 6

BACKGROUND

[Figure: protein P1 shown as a structure and as its domain composition, and an interaction between proteins P1 and P2. Domains are labeled D1, D2, and D3.]

Pi: protein; Di: domain

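SLIDE 7

BACKGROUND

[Figure: protein P1 shown as a structure and as its domain composition, and an interaction between proteins P1 and P2. Domains are labeled D1, D2, and D3. The interaction between P1 and P2 is also drawn as an edge with weight w12 in a weighted PPI network.]

Pi: protein; Di: domain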

SLIDE 8

METHOD

SLIDE 9

OVERVIEW OF THE PROBLEM

Input: a weighted PPI network

Question: do two interacting proteins Pi and Pj form a heterodimer?

SLIDE 10

MULTIPLE INFORMATION + MULTIPLE DL MODELS

▪ Input data involving biological information
    Protein-protein interaction (PPI)
    Domain
    Phylogenetic profile

▪ Deep neural network models including
    Convolutional neural network (CNN)
    Recurrent neural network (RNN)
    CNN + RNN

SLIDE 11

PROTEIN-PROTEIN INTERACTION (PPI)

Figure 1. Example of a subgraph with an interacting protein pair and their neighboring proteins.

[Figure labels: Pi, Pj, Pk, wij, wik, wjk, Dn, Dr, Dm]

Table 1. Feature space mapping from two interacting proteins Pi, Pj and neighbors.

▪ The weight of the interaction between the focused proteins.

▪ The maximum weight of the interactions between either of the focused proteins and a neighboring protein.

▪ The minimum weight of the interactions between either of the focused proteins and a neighboring protein.

▪ The maximum of the smaller weights of interactions with neighboring proteins.

▪ The maximum difference among the weights of interactions with neighboring proteins.

…
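The five listed features can be sketched in Python as follows. This is an illustrative reading, not the talk's implementation: the `Network` type, the function names, and the example weights are hypothetical. Two interpretations are assumptions: the "smaller weight" is taken per shared neighbor Pk as min(wik, wjk), and the "maximum difference" is taken as the largest minus the smallest neighboring weight.

```python
from typing import Dict, FrozenSet, List

# Undirected weighted PPI network: edge {a, b} -> weight (hypothetical layout).
Network = Dict[FrozenSet[str], float]


def neighbor_weights(network: Network, protein: str, exclude: str) -> Dict[str, float]:
    """Map each neighbor of `protein` (other than `exclude`) to its edge weight."""
    result: Dict[str, float] = {}
    for edge, weight in network.items():
        if protein in edge and len(edge) == 2:
            (other,) = edge - {protein}
            if other != exclude:
                result[other] = weight
    return result


def ppi_features(network: Network, pi: str, pj: str) -> List[float]:
    """Return [w_ij, max, min, max-of-smaller, max-difference] for the pair."""
    w_ij = network.get(frozenset((pi, pj)), 0.0)
    ni = neighbor_weights(network, pi, exclude=pj)
    nj = neighbor_weights(network, pj, exclude=pi)
    around = list(ni.values()) + list(nj.values())
    max_w = max(around, default=0.0)
    min_w = min(around, default=0.0)
    # Assumption: "smaller weight" is min(w_ik, w_jk) for a shared neighbor Pk.
    shared = ni.keys() & nj.keys()
    max_smaller = max((min(ni[k], nj[k]) for k in shared), default=0.0)
    # Assumption: "maximum difference" is the spread of the neighboring weights.
    max_diff = max_w - min_w
    return [w_ij, max_w, min_w, max_smaller, max_diff]


# Hypothetical example weights.
network = {
    frozenset(("Pi", "Pj")): 0.9,
    frozenset(("Pi", "Pk")): 0.5,
    frozenset(("Pj", "Pk")): 0.3,
    frozenset(("Pj", "Pm")): 0.7,
}
features = ppi_features(network, "Pi", "Pj")
```

Pairs without neighbors fall back to 0.0 for the neighbor-based features, which is a design choice of this sketch rather than something stated on the slide.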

SLIDE 12

DOMAIN

The whole set of domain pairs for all complexes in the dataset (number of domain pairs: 5295):

{(D1, D1), (D1, D2), …, (D3, D3), …, (D9, D10), …, (Dn, Dn)}

Sample: a complex Cj formed by proteins P1 and P2, one with domains D3, D8, D9 and the other with domains D3, D3, D10.

Domain pairs of protein complex Cj: (D3, D3), (D3, D3), (D3, D10), (D8, D3), (D8, D3), (D8, D10), (D9, D3), (D9, D3), (D9, D10)

[Cj] = [0, …, 2, …, 1, …, 0]

Each entry counts how often the corresponding domain pair occurs in Cj.
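As a minimal sketch of this encoding, the domain pairs of a two-protein complex can be counted and projected onto a fixed pair vocabulary. The function names and the four-entry vocabulary are hypothetical, and treating (D8, D3) and (D3, D8) as the same pair is an assumption of this sketch.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def domain_pair_counts(domains_a: Sequence[str], domains_b: Sequence[str]) -> Counter:
    """Count every cross-protein domain pair, ignoring order within a pair."""
    return Counter(tuple(sorted(pair)) for pair in product(domains_a, domains_b))


def to_vector(counts: Counter, vocabulary: Sequence[Pair]) -> List[int]:
    """Project the counts onto a fixed, ordered list of domain pairs."""
    return [counts[tuple(sorted(pair))] for pair in vocabulary]


# Domains of the two proteins in the sample complex Cj.
counts = domain_pair_counts(["D3", "D8", "D9"], ["D3", "D3", "D10"])

# Tiny stand-in for the 5295-entry vocabulary (hypothetical selection).
vocabulary = [("D1", "D1"), ("D3", "D3"), ("D9", "D10"), ("D10", "D10")]
vector = to_vector(counts, vocabulary)
print(vector)  # [0, 2, 1, 0]
```

The printed vector mirrors the pattern on the slide: 0 for an absent pair, 2 for (D3, D3), and 1 for (D9, D10).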

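SLIDE 13

PHYLOGENETIC PROFILE

The whole set of organisms for all complexes in the dataset (number of organisms: 2717):

{SC, BS, EC, …}

[Figure: proteins found in each example organism. S. cerevisiae (SC): P1, P4. E. coli (EC): P1, P2. B. subtilis (BS): P1, P2, P3.]

[Table: phylogenetic profiles of P1 to P4 over the columns SC, BS, EC; an entry is 1 when the protein is present in that organism.]

Q(a, b) = min(a, b)

[Cj = Q(P1, P2)] = [ 1 , … ]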
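A minimal sketch of how Q(a, b) = min(a, b) combines two binary profiles, applied element-wise over the organisms. The function names are hypothetical, and the organism memberships are an illustrative reading of the extracted figure labels, so they may not match the slide's table exactly.

```python
from typing import Dict, List, Set


def profile(protein: str, organisms: List[str], members: Dict[str, Set[str]]) -> List[int]:
    """Binary phylogenetic profile: 1 if the protein occurs in the organism."""
    return [1 if protein in members[org] else 0 for org in organisms]


def q(profile_a: List[int], profile_b: List[int]) -> List[int]:
    """Element-wise Q(a, b) = min(a, b): 1 only where both proteins occur."""
    return [min(a, b) for a, b in zip(profile_a, profile_b)]


organisms = ["SC", "BS", "EC"]
# Assumed memberships, read from the figure labels (illustrative only).
members = {
    "SC": {"P1", "P4"},
    "BS": {"P1", "P2", "P3"},
    "EC": {"P1", "P2"},
}

p1 = profile("P1", organisms, members)
p2 = profile("P2", organisms, members)
complex_vector = q(p1, p2)
```

With binary entries, the minimum acts as a logical AND, so the complex vector marks the organisms in which both partner proteins are present.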

SLIDE 14

COMPUTATIONAL EXPERIMENTS

SLIDE 15

▪ Databases

CYC2008: A manually curated, comprehensive catalogue of yeast protein complexes, including 172 (42%) heterodimers.

WI-PHI: A weighted PPI database containing 49607 interacting protein pairs, excluding self-interactions.

▪ Positives and Negatives

[Figure: proteins P1, P2, P3, P4 and complexes C1, C2.]

Positives: (P1, P2)

Negatives: (P1, P3), (P2, P4), (P3, P4) and (P1, P4)

Number of samples: 5497

SLIDE 16

INPUT DATA

e.g. Domain property

The whole domain pair set for all complexes in the dataset:

{(D1, D1), (D1, D2), …, (D3, D3), …, (D9, D10), …, (Dn, Dn)}

Input data:

[C1] = [ , …, 2, …, 1, …, 0 ]
[C2] = [ 1, …, , …, 0, …, 1 ]
…
[C5497] = [ 0, …, 2, …, 1, …, 0 ]

Label: [ 1 … ]

SLIDE 17

INPUT DATA

e.g. Domain + Phylogenetic profile

The whole (domain pair set + organism) for all complexes in the dataset (5295 + 2717 entries):

{(D1, D1), (D1, D2), …, (Dn, Dn), SC, BS, EC, …}

Input data:

[C1] = [ , …, , 0, 0, 1, … ]
[C2] = [ 1, …, 1, 1, 0, 0, … ]
…
[C5497] = [ 0, …, , 0, 1, 1, … ]

Label: [ 1 … ]
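Combining the two properties can be sketched as a plain concatenation of the domain-pair vector and the phylogenetic-profile vector. The function name, the length checks, and the example values are hypothetical; only the two dimensions (5295 and 2717) come from the slides.

```python
from typing import List, Sequence

N_DOMAIN_PAIRS = 5295  # size of the domain-pair vocabulary
N_ORGANISMS = 2717     # number of organisms in the phylogenetic profile


def build_input(domain_vector: Sequence[int], phylo_vector: Sequence[int]) -> List[int]:
    """Concatenate the two feature vectors into one input row."""
    if len(domain_vector) != N_DOMAIN_PAIRS:
        raise ValueError("unexpected domain vector length")
    if len(phylo_vector) != N_ORGANISMS:
        raise ValueError("unexpected phylogenetic profile length")
    return list(domain_vector) + list(phylo_vector)


# Hypothetical sample: one domain pair seen twice, one shared organism.
domain_vector = [0] * N_DOMAIN_PAIRS
domain_vector[2] = 2
phylo_vector = [0] * N_ORGANISMS
phylo_vector[0] = 1

sample = build_input(domain_vector, phylo_vector)
```

Each sample then has 5295 + 2717 = 8012 entries, with the domain-pair counts first and the organism indicators after them.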

SLIDE 18

MODELS

▪ Input data → Convolutional Neural Network → Output

▪ Input data → Recurrent Neural Network → Output

▪ Input data → Convolutional Neural Network → Recurrent Neural Network → Output

  • D. Quang et al., DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences, Nucleic Acids Research, 2016

SLIDE 19

RESULTS

SLIDE 20

PERFORMANCE MEASURES

tp: true positive, tn: true negative, fp: false positive, fn: false negative

\[ \mathrm{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \]

\[ \mathrm{Precision} = \frac{tp}{tp + fp} \]

\[ \mathrm{Recall} = \frac{tp}{tp + fn} \]

\[ F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \]
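The four measures can be computed directly from the confusion counts, as in this small sketch. The function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the talk.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, not taken from the experiments.
metrics = classification_metrics(tp=6, tn=10, fp=2, fn=2)
```

F1 ignores true negatives, which is why it can rank models differently from accuracy when negatives dominate the samples.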

SLIDE 21

COMPARISON OF MODEL + INFORMATION

Models | Training accuracy | Training loss | Test accuracy | Evaluation score (F1)
CNN (domain) | 0.80 | 1.311 | 0.79 | 0.68
CNN (domain+PPI) | 0.84 | 1.124 | 0.81 | 0.69
CNN (domain+PPI+Phylogenetic profile) | 0.83 | 0.912 | 0.81 | 0.69
RNN (domain+PPI+Phylogenetic profile) | 0.71 | 2.334 | 0.72 | 0.66
CNN+RNN (domain+PPI+Phylogenetic profile) | 0.86 | 0.865 | 0.85 | 0.72
Baseline method*: SVM (PPI+domain) | 0.65 | - | 0.73 | 0.63

*P. Ruan et al., Prediction of Heterodimeric Protein Complexes from Weighted Protein-Protein Interaction Networks Using Novel Features and Kernel Functions, PLoS One, 2013

SLIDE 22

CPU VS GPU

[Figure: bar chart of training time per epoch in seconds (axis 0 to 600) for CPU and DGX Station; one bar is annotated "8 min 12 sec".]

DGX Station is 40 times faster!

SLIDE 23

CONCLUSIONS

▪ Applied deep learning to predict heterodimeric protein complexes using multiple types of biological information

▪ The hybrid model (CNN+RNN) with multiple types of information performs better than the single models

▪ Training on the DGX Station is 40 times faster than on the CPU

SLIDE 24

Thank you for your kind attention!

Email: cruan@nvidia.com