Prediction of heterodimeric protein complexes from protein-protein interaction networks using deep learning



  1. PREDICTION OF HETERODIMERIC PROTEIN COMPLEXES FROM PROTEIN-PROTEIN INTERACTION NETWORKS USING DEEP LEARNING
     Peiying (Colleen) Ruan, PhD, Deep Learning Solution Architect
     3/26/2018

  2. OUTLINE
     ▪ Background
     ▪ Method
     ▪ Computational Experiments and Results
     ▪ Conclusions

  3. BACKGROUND

  4. BACKGROUND
     [Figure: DNA → (transcription) → mRNA → (translation) → protein → forming complexes → performing functions. Other labels: protein-protein interactions; biological system (cell, human); keeping healthy; disease (marked "?"); "Our works".]

  5. BACKGROUND
     What is a heterodimer, and why predict it?
     Heterodimers account for about 40% of protein complexes.

  6. BACKGROUND
     [Figure panels: Structure; Domain Composition; Interaction. Labels: proteins P1, P2; domains D1, D2, D3.]
     P_i: protein, D_i: domain

  7. BACKGROUND
     [Figure panels: Structure; Domain Composition; Interaction; Weighted PPI Network. Labels: proteins P1, P2; domains D1, D2, D3; interaction weight w_12 between P1 and P2.]
     P_i: protein, D_i: domain

  8. METHOD

  9. OVERVIEW OF THE PROBLEM
     Input: weighted PPI network
     Question: is the interacting protein pair (P_i, P_j) a heterodimer?
     [Figure: the pair P_i, P_j in the network, labeled "Input data".]

  10. MULTIPLE INFORMATION + MULTIPLE DL MODELS
      ▪ Input data involving biological information
        - Protein-protein interaction (PPI)
        - Domain
        - Phylogenetic profile
      ▪ Deep neural network models including
        - Convolutional neural network (CNN)
        - Recurrent neural network (RNN)
        - CNN + RNN

  11. PROTEIN-PROTEIN INTERACTION (PPI)
      Table 1. Feature space mapping from two interacting proteins P_i, P_j and neighbors.
      ▪ The weights of interactions between the focused proteins.
      ▪ The maximum weights of interactions between either of focused proteins and a neighboring protein.
      ▪ The minimum weights of interactions between either of focused proteins and a neighboring protein.
      ▪ The maximum smaller weights of interactions with neighboring proteins.
      ▪ The maximum differences of weights among the neighboring weights.
      Figure 1. Example of a subgraph with an interacting protein pair and their neighboring proteins. [Figure labels: proteins P_i, P_j, P_k; domains D_m, D_n, D_r; weights w_ij, w_ik, w_jk.]
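
The five features listed in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `ppi_features`, the dict-of-dicts network layout, and the toy weights are all hypothetical. Two readings are assumptions on our part: "maximum smaller weights" is taken as the largest, over common neighbors P_k, of min(w_ik, w_jk), and "maximum differences" as the largest |w_ik - w_jk| over common neighbors. Returning 0.0 when there are no neighbors is also an assumption.

```python
def ppi_features(w, i, j):
    """Return five Table 1 style features for the protein pair (i, j).

    w is a symmetric dict of dicts: w[a][b] is the weight of interaction a-b.
    """
    # Weights from each focused protein to its neighbors, excluding the partner.
    wi = {k: v for k, v in w.get(i, {}).items() if k != j}
    wj = {k: v for k, v in w.get(j, {}).items() if k != i}
    neighbor_weights = list(wi.values()) + list(wj.values())
    common = set(wi) & set(wj)  # neighbors shared by both focused proteins

    return {
        "w_ij": w.get(i, {}).get(j, 0.0),
        "max_neighbor": max(neighbor_weights, default=0.0),
        "min_neighbor": min(neighbor_weights, default=0.0),
        # Assumed reading: smaller of the two weights per common neighbor, then the maximum.
        "max_of_smaller": max((min(wi[k], wj[k]) for k in common), default=0.0),
        # Assumed reading: largest weight gap over common neighbors.
        "max_difference": max((abs(wi[k] - wj[k]) for k in common), default=0.0),
    }


# Hypothetical toy network; the weights are made up for illustration.
edges = [("P1", "P2", 30.0), ("P1", "P3", 12.0), ("P2", "P3", 8.0), ("P2", "P4", 20.0)]
network = {}
for a, b, weight in edges:
    network.setdefault(a, {})[b] = weight
    network.setdefault(b, {})[a] = weight

features = ppi_features(network, "P1", "P2")
```

In the toy network, P3 is the only common neighbor of P1 and P2, so the last two features are both computed from the weights 12.0 and 8.0.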

  12. DOMAIN
      Sample: domain pairs of protein complex C_j (proteins P1 and P2):
      (D3, D3), (D3, D3), (D3, D10), (D8, D3), (D8, D3), (D8, D10), (D9, D3), (D9, D3), (D9, D10)
      [Figure labels: complexes C_i, C_j; proteins P1, P2; domains D3, D8, D9, D10.]
      The whole domain pair set for all complexes in the dataset:
      {(D1, D1), (D1, D2), …, (D3, D3), …, (D9, D10), …, (Dn, Dn)}
      #domain pair is 5295
      [C_j] = [0, 0, …, 2, …, 1, …, 0]
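
The domain encoding on this slide amounts to counting domain pairs and writing the counts into a fixed-length vector. The sketch below reproduces the slide's sample under stated assumptions: the domain lists for the two proteins ([3, 8, 9] and [3, 3, 10]) are inferred from the listed pairs, pairs are treated as unordered, and the index covers only a four-domain toy vocabulary rather than the 5295 pairs of the real dataset. The names `domain_pair_vector` and `pair_index` are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement, product


def domain_pair_vector(domains_a, domains_b, pair_index):
    """Count domain pairs between two proteins and place the counts in a vector."""
    # Sort each pair so that (D8, D3) and (D3, D8) share one slot (assumption).
    counts = Counter(tuple(sorted(p)) for p in product(domains_a, domains_b))
    vector = [0] * len(pair_index)
    for pair, n in counts.items():
        vector[pair_index[pair]] = n
    return vector


# Toy vocabulary: only the domains that appear in the slide's sample.
vocabulary = [3, 8, 9, 10]
pair_index = {pair: pos
              for pos, pair in enumerate(combinations_with_replacement(vocabulary, 2))}

p1_domains = [3, 8, 9]    # inferred from the first elements of the listed pairs
p2_domains = [3, 3, 10]   # inferred from the second elements of the listed pairs
vector = domain_pair_vector(p1_domains, p2_domains, pair_index)
```

As on the slide, the entry for (D3, D3) is 2 and the entry for (D9, D10) is 1.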

  13. PHYLOGENETIC PROFILE
      Organisms: S. cerevisiae (SC), B. subtilis (BS), E. coli (EC)

      |    | SC | BS | EC |
      |----|----|----|----|
      | P1 | 1  | 0  | 1  |
      | P2 | 0  | 1  | 1  |
      | P3 | 0  | 1  | 0  |
      | P4 | 1  | 1  | 0  |

      The whole organism set for all complexes in the dataset: {SC, BS, EC, …}
      #organism is 2717
      Q(a, b) = min(a, b)
      [C_j = Q(P1, P2)] = [0, 0, 1, …]
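
The pairwise profile Q(a, b) = min(a, b) is applied per organism. A minimal sketch using the three-organism table from this slide (the real dataset has 2717 organisms); the names `profiles` and `pair_profile` are hypothetical.

```python
# Presence (1) or absence (0) of each protein in SC, BS, EC, as in the slide's table.
profiles = {
    "P1": [1, 0, 1],
    "P2": [0, 1, 1],
    "P3": [0, 1, 0],
    "P4": [1, 1, 0],
}


def pair_profile(a, b):
    """Element-wise minimum: 1 only where both proteins are present."""
    return [min(x, y) for x, y in zip(a, b)]


c_j = pair_profile(profiles["P1"], profiles["P2"])
```

For binary profiles the minimum acts as a logical AND, so `c_j` marks the organisms in which both P1 and P2 occur.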

  14. COMPUTATIONAL EXPERIMENTS

  15. ▪ Databases
        - CYC2008: A manually curated comprehensive catalogue of yeast protein complexes, including 172 (42%) heterodimers.
        - WI-PHI: A weighted PPI database containing 49607 interacting protein pairs, excluding self-interactions.
      ▪ Positives and Negatives
        - Positives: (P1, P2)
        - Negatives: (P1, P3), (P2, P4), (P3, P4) and (P1, P4)
        - #Sample: 5497
      [Figure labels: complexes C1, C2; proteins P1, P2, P3, P4.]

  16. INPUT DATA
      e.g. Domain property
      The whole domain pair set for all complexes in the dataset:
      {(D1, D1), (D1, D2), …, (D3, D3), …, (D9, D10), …, (Dn, Dn)}

      | Input data                          | Label |
      |-------------------------------------|-------|
      | [C1] = [0, 0, …, 2, …, 1, …, 0]     | 0     |
      | [C2] = [0, 1, …, 0, …, 0, …, 1]     | 1     |
      | …                                   | …     |
      | [C5497] = [0, 0, …, 2, …, 1, …, 0]  | 0     |

  17. INPUT DATA
      e.g. Domain + Phylogenetic profile
      The whole (domain pair set + organism) for all complexes in the dataset:
      {(D1, D1), (D1, D2), …, (Dn, Dn), SC, BS, EC, …}
      Number of features: 5295 + 2717

      | Input data                          | Label |
      |-------------------------------------|-------|
      | [C1] = [0, 0, …, 0, 0, 0, 1, …]     | 0     |
      | [C2] = [0, 1, …, 1, 1, 0, 0, …]     | 1     |
      | …                                   | …     |
      | [C5497] = [0, 0, …, 0, 0, 1, 1, …]  | 0     |
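
Combining the two kinds of information is a plain concatenation of the domain-pair vector and the phylogenetic-profile vector, giving 5295 + 2717 = 8012 features per sample. A minimal sketch under that reading; the function name `combine_features` and the positions of the nonzero entries are hypothetical.

```python
N_DOMAIN_PAIRS = 5295  # number of domain pairs stated on the slides
N_ORGANISMS = 2717     # number of organisms stated on the slides


def combine_features(domain_vec, phylo_vec):
    """Concatenate the two feature vectors after checking their lengths."""
    if len(domain_vec) != N_DOMAIN_PAIRS or len(phylo_vec) != N_ORGANISMS:
        raise ValueError("unexpected feature vector length")
    return list(domain_vec) + list(phylo_vec)


# Hypothetical sample: the nonzero positions are made up for illustration.
domain_vec = [0] * N_DOMAIN_PAIRS
domain_vec[1] = 1
phylo_vec = [0] * N_ORGANISMS
phylo_vec[0] = 1

row = combine_features(domain_vec, phylo_vec)
```

The first 5295 positions of `row` hold the domain-pair counts and the remaining 2717 hold the phylogenetic profile.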

  18. MODELS
      ▪ Input data → Convolutional Neural Network → Output
      ▪ Input data → Recurrent Neural Network → Output
      ▪ Input data → Convolutional Neural Network → Recurrent Neural Network → Output
      D. Quang et al., DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences, Nucleic Acids Research, 2016
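
To make the data flow of the hybrid variant concrete, here is a forward pass in pure Python: 1-D convolution with ReLU, max pooling, a recurrent layer, and a sigmoid output. It is only a sketch of the layer ordering. The slides do not give layer sizes or the recurrent cell type, so the sizes below, the Elman-style `simple_rnn` (instead of, for example, an LSTM), and the random untrained weights are all assumptions, and every function name is hypothetical.

```python
import math
import random


def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution over a single-channel sequence, followed by ReLU."""
    width = len(kernels[0])
    out = []
    for t in range(len(x) - width + 1):
        window = x[t:t + width]
        out.append([max(0.0, sum(kv * xv for kv, xv in zip(k, window)) + b)
                    for k, b in zip(kernels, bias)])
    return out  # one list of channel values per time step


def max_pool(steps, size):
    """Non-overlapping max pooling along the time axis."""
    return [[max(col) for col in zip(*steps[s:s + size])]
            for s in range(0, len(steps) - size + 1, size)]


def simple_rnn(steps, w_in, w_rec, bias):
    """Elman recurrence h_t = tanh(W_in x_t + W_rec h_{t-1} + b); returns the last state."""
    h = [0.0] * len(bias)
    for x in steps:
        h = [math.tanh(sum(a * xv for a, xv in zip(w_in[u], x))
                       + sum(r * hv for r, hv in zip(w_rec[u], h))
                       + bias[u])
             for u in range(len(bias))]
    return h


def hybrid_forward(x, p):
    """Input -> convolution -> pooling -> recurrent layer -> sigmoid output."""
    steps = max_pool(conv1d_relu(x, p["kernels"], p["conv_bias"]), p["pool"])
    h = simple_rnn(steps, p["w_in"], p["w_rec"], p["rnn_bias"])
    logit = sum(w * hv for w, hv in zip(p["w_out"], h)) + p["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)  # fixed seed so the untrained weights are reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


params = {
    "kernels": rand_matrix(4, 3),  # 4 kernels of width 3 (arbitrary sizes)
    "conv_bias": [0.0] * 4,
    "pool": 2,
    "w_in": rand_matrix(5, 4),     # 5 recurrent units reading 4 channels
    "w_rec": rand_matrix(5, 5),
    "rnn_bias": [0.0] * 5,
    "w_out": rand_matrix(1, 5)[0],
    "b_out": 0.0,
}

x = [0, 0, 2, 1, 0, 0, 1, 0, 0, 0, 1]  # hypothetical short feature vector
probability = hybrid_forward(x, params)
```

Because the weights are untrained, `probability` carries no meaning here; in practice a deep learning framework would provide these layers along with training.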

  19. RESULTS

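  20. PERFORMANCE MEASURES
      $$\mathrm{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$$
      $$\mathrm{Precision} = \frac{tp}{tp + fp}$$
      $$\mathrm{Recall} = \frac{tp}{tp + fn}$$
      $$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
      tp: true positive, tn: true negative, fp: false positive, fn: false negative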
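
The four measures on this slide (accuracy, precision, recall, F1) follow directly from the confusion counts tp, tn, fp, fn. A minimal sketch; the function name `classification_metrics` and the example counts are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something the slides specify.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    def ratio(num, den):
        # Convention (assumption): an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts, not taken from the presentation's experiments.
metrics = classification_metrics(tp=40, tn=45, fp=10, fn=5)
```

With these made-up counts the accuracy is 85/100 and the precision is 40/50.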

  21. COMPARISON OF MODEL + INFORMATION

      | Models                                    | Training accuracy | Training loss | Test accuracy | Evaluation score (F1) |
      |-------------------------------------------|-------------------|---------------|---------------|-----------------------|
      | CNN (domain)                              | 0.80              | 1.311         | 0.79          | 0.68                  |
      | CNN (domain+PPI)                          | 0.84              | 1.124         | 0.81          | 0.69                  |
      | CNN (domain+PPI+Phylogenetic profile)     | 0.83              | 0.912         | 0.81          | 0.69                  |
      | RNN (domain+PPI+Phylogenetic profile)     | 0.71              | 2.334         | 0.72          | 0.66                  |
      | CNN+RNN (domain+PPI+Phylogenetic profile) | 0.86              | 0.865         | 0.85          | 0.72                  |
      | Baseline method* SVM (PPI+domain)         | 0.65              | -             | 0.73          | 0.63                  |

      *P. Ruan et al. Prediction of Heterodimeric Protein Complexes from Weighted Protein-Protein Interaction Networks Using Novel Features and Kernel Functions, PLoS One, 2013

  22. CPU VS GPU
      [Bar chart: Time (sec)/Epoch. CPU: 8 min; DGX Station: 12 sec.]
      DGX Station is 40 times faster!!
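
The 40x figure follows from the two per-epoch times on the chart, once both are expressed in seconds. A small check, assuming the chart's "8 min" and "12 sec" labels are exact; the variable names are hypothetical.

```python
cpu_seconds_per_epoch = 8 * 60   # "8 min" as labeled on the chart
dgx_seconds_per_epoch = 12       # "12 sec" as labeled on the chart

speedup = cpu_seconds_per_epoch / dgx_seconds_per_epoch
print(f"speedup: {speedup:.0f}x")  # prints "speedup: 40x"
```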

  23. CONCLUSIONS
      ▪ Applied deep learning to the prediction of heterodimeric protein complexes using multiple types of biological information
      ▪ The hybrid model with multiple types of information performs better than the single models
      ▪ DGX Station is 40 times faster than the CPU

  24. Thank you for your kind attention! Email: cruan@nvidia.com
