Interprotein coevolution: bridging scales from residues to genomes


  1. Interprotein coevolution: bridging scales from residues to genomes
Martin Weigt
Laboratoire de Biologie Computationnelle et Quantitative, Université Pierre & Marie Curie, Paris
Inria Paris
16 Nov 2017

  2. The different scales in protein-protein interaction
Who with whom? Protein-protein interaction networks

  3. The different scales in protein-protein interaction
How? Protein-protein interfaces, inter-protein residue contacts

  4. The different scales in protein-protein interaction
Evolution? Conservation and innovation of protein-protein interactions

  5. Protein sequence data are accumulating…
[Figure: growth of the UniProt database from 2004 to 2016, in millions of sequence entries (log scale, 0.1 to 100): UniProtKB/TrEMBL (without manual annotation) vs. UniProtKB/SwissProt (with manual annotation).]

  6. …and are classified into homologous protein families
Homologous proteins:
• frequently 10³–10⁶ proteins per family
• common evolutionary ancestry
• conserved 3D structure and biological function
• diverged amino-acid sequences (~20–30% sequence identity)
‣ sequence variability contains information about structure and function
• >5000 families without any known example structure
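Sequence identity between two family members is usually measured on their aligned sequences. A minimal sketch, assuming identity is counted over the columns where neither sequence has a gap ('-' or '.'); other conventions, such as dividing by the full alignment length or by the shorter sequence, give different numbers. The function name is illustrative.

```python
def percent_identity(seq_a: str, seq_b: str, gaps: str = "-.") -> float:
    """Percent identity of two aligned sequences over jointly ungapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a in gaps or b in gaps:
            continue  # skip columns where either sequence has a gap
        compared += 1
        identical += a.upper() == b.upper()
    return 100.0 * identical / compared if compared else 0.0
```

For example, `percent_identity("ACDE-", "ACDF-")` returns 75.0: three identical residues out of four jointly ungapped columns.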

  7. Statistical physics
From models via data to thermodynamic observables:
$$P(S) \propto e^{-\beta H(S)}, \qquad H(S) = -\sum_{i<j} J_{ij} S_i S_j - \sum_i h_i S_i$$
Sample from the model: $\{S^\mu\}_{\mu=1,\dots,M}$
$$\langle O_a(S) \rangle_P \simeq \frac{1}{M} \sum_{\mu} O_a(S^\mu), \qquad \text{e.g. } \langle S_i \rangle_P,\ \langle S_i S_j \rangle_P$$
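The forward direction can be illustrated on a small Ising model: draw configurations from $P(S) \propto e^{-\beta H(S)}$ and estimate $\langle S_i\rangle_P$ and $\langle S_iS_j\rangle_P$ as sample averages. A minimal sketch, assuming spins $S_i \in \{-1,+1\}$, a symmetric coupling matrix with zero diagonal, and single-spin Metropolis updates (one of several valid Monte Carlo schemes); the function names and the `n_samples`, `sweeps_between` and `burn_in` parameters are illustrative. For a handful of spins, exact enumeration of all $2^N$ states provides a check.

```python
import math
import random
from itertools import product


def energy(s, J, h):
    """H(S) = -sum_{i<j} J_ij S_i S_j - sum_i h_i S_i for spins S_i in {-1, +1}."""
    n = len(s)
    e = -sum(h[i] * s[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= J[i][j] * s[i] * s[j]
    return e


def metropolis_samples(J, h, beta=1.0, n_samples=5000, sweeps_between=10,
                       burn_in=200, seed=0):
    """Sample from P(S) ~ exp(-beta H(S)); J must be symmetric with zero diagonal."""
    rng = random.Random(seed)
    n = len(h)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    samples = []
    for sweep in range(burn_in + n_samples * sweeps_between):
        for i in range(n):
            field = h[i] + sum(J[i][j] * s[j] for j in range(n) if j != i)
            delta_e = 2.0 * s[i] * field  # energy change if S_i is flipped
            if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                s[i] = -s[i]
        # after burn-in, keep one configuration every `sweeps_between` sweeps
        if sweep >= burn_in and (sweep - burn_in) % sweeps_between == 0:
            samples.append(tuple(s))
    return samples


def sample_moments(samples):
    """Estimate <S_i> and <S_i S_j> as averages over the M samples."""
    m, n = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / m for i in range(n)]
    corr = [[sum(s[i] * s[j] for s in samples) / m for j in range(n)]
            for i in range(n)]
    return mean, corr


def exact_moments(J, h, beta=1.0):
    """Exact <S_i> and <S_i S_j> by enumerating all 2^n states (small n only)."""
    n = len(h)
    states = list(product((-1, 1), repeat=n))
    weights = [math.exp(-beta * energy(s, J, h)) for s in states]
    z = sum(weights)
    mean = [sum(w * s[i] for w, s in zip(weights, states)) / z for i in range(n)]
    corr = [[sum(w * s[i] * s[j] for w, s in zip(weights, states)) / z
             for j in range(n)] for i in range(n)]
    return mean, corr
```

With a fixed seed, the sampled moments agree with the enumerated ones up to Monte Carlo error of order $1/\sqrt{M}$.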

  8. Inverse statistical physics
From data via observables to models:
$$P(S) \propto e^{-\beta H(S)}, \qquad H(S) = -\sum_{i<j} J_{ij} S_i S_j - \sum_i h_i S_i$$
Data: $\{S^\mu\}_{\mu=1,\dots,M}$
$$\langle O_a(S) \rangle_P \simeq \frac{1}{M} \sum_{\mu} O_a(S^\mu), \qquad \text{e.g. } \langle S_i \rangle_P,\ \langle S_i S_j \rangle_P$$

  9. Inverse statistical physics
How to construct $P(S) \propto e^{-\beta H(S)}$ from data?
• coherence with data: $\langle O_a(S) \rangle_P = \frac{1}{M} \sum_{\mu} O_a(S^\mu)$
• maximum entropy principle (least constrained model): $-\sum_S P(S) \log P(S) \to \max$
➡ analytical form of the model: $H(S) = -\sum_a \lambda_a O_a(S)$
The selection of the observables $O_a$ requires a priori biological knowledge.
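When the chosen observables are $O_a \in \{S_i,\ S_iS_j\}$, the maximum-entropy model is exactly the Ising model of the previous slides, and its parameters $\lambda_a = (h_i, J_{ij})$ (with $\beta$ absorbed into them) can be fitted by gradient ascent on the log-likelihood, whose gradient is the mismatch between data and model averages: $\Delta h_i \propto \langle S_i\rangle_{\rm data} - \langle S_i\rangle_P$, $\Delta J_{ij} \propto \langle S_iS_j\rangle_{\rm data} - \langle S_iS_j\rangle_P$. A minimal sketch of this moment matching (often called Boltzmann-machine learning), assuming few enough spins that model averages can be computed by enumerating all states; at protein scale, approximations such as mean-field or pseudo-likelihood are needed instead. Function names, the learning rate and the step count are illustrative.

```python
import math
from itertools import product


def model_moments(J, h):
    """Exact <S_i>_P and <S_i S_j>_P for P(S) ~ exp(sum_{i<j} J_ij S_i S_j + sum_i h_i S_i)."""
    n = len(h)
    states = list(product((-1, 1), repeat=n))
    weights = []
    for s in states:
        e = sum(h[i] * s[i] for i in range(n))
        e += sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        weights.append(math.exp(e))
    z = sum(weights)
    mean = [sum(w * s[i] for w, s in zip(weights, states)) / z for i in range(n)]
    corr = [[sum(w * s[i] * s[j] for w, s in zip(weights, states)) / z
             for j in range(n)] for i in range(n)]
    return mean, corr


def fit_maxent(data_mean, data_corr, lr=0.1, n_steps=5000):
    """Adjust h_i and J_ij until the model reproduces the data's <S_i> and <S_i S_j>."""
    n = len(data_mean)
    h = [0.0] * n
    J = [[0.0] * n for _ in range(n)]
    for _ in range(n_steps):
        mean, corr = model_moments(J, h)
        for i in range(n):
            h[i] += lr * (data_mean[i] - mean[i])
            for j in range(i + 1, n):
                J[i][j] += lr * (data_corr[i][j] - corr[i][j])
                J[j][i] = J[i][j]  # keep the coupling matrix symmetric
    return J, h
```

Feeding it the exact moments of a known model recovers that model's $h_i$ and $J_{ij}$: for $\pm 1$ spins the log-likelihood is concave and these parameters are identifiable.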

  10. Conservation and coevolution in proteins
[Figure: toy alignment of homologous sequences related by evolution, marking variable residues, a conserved residue at the active site, and a pair of coevolving residues in contact.]
Statistical modeling:
• Profile model: $P(a_1, \dots, a_L) \propto \exp\left( \sum_i h_i(a_i) \right)$
• Direct Coupling Analysis (DCA): $P(a_1, \dots, a_L) \propto \exp\left( \sum_{i<j} J_{ij}(a_i, a_j) + \sum_i h_i(a_i) \right)$ [Weigt et al., PNAS ’09] [Morcos et al., PNAS ’11]
➡ strong couplings → residue contacts
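The DCA model has to be inferred from a multiple sequence alignment, and the slide does not say which inference scheme is meant. A minimal sketch of one common choice, mean-field DCA in the spirit of [Morcos et al., PNAS ’11], where couplings are approximated by the negative inverse of the connected correlation matrix, $J_{ij}(a,b) \approx -(C^{-1})_{ij}(a,b)$ with $C_{ij}(a,b) = f_{ij}(a,b) - f_i(a) f_j(b)$. Assumptions: a 21-letter alphabet (gap plus 20 amino acids), a pseudocount of 0.5, no sequence reweighting, and residue pairs ranked by the Frobenius norm of the zero-sum-gauge couplings with average-product correction (the mean-field paper itself ranks pairs by direct information). Function names are illustrative.

```python
import numpy as np

ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"  # gap + 20 amino acids, q = 21


def one_hot(msa):
    """Encode aligned sequences as an (M, L*q) 0/1 matrix; unknown symbols count as gaps."""
    idx = {c: k for k, c in enumerate(ALPHABET)}
    X = np.array([[idx.get(c, 0) for c in seq.upper()] for seq in msa])
    M, L = X.shape
    q = len(ALPHABET)
    onehot = np.zeros((M, L, q))
    onehot[np.arange(M)[:, None], np.arange(L)[None, :], X] = 1.0
    return onehot.reshape(M, L * q), L, q


def mfdca_scores(msa, pseudocount=0.5):
    """Mean-field DCA: (L, L) matrix of APC-corrected Frobenius-norm coupling scores."""
    X, L, q = one_hot(msa)
    # single-site and pair frequencies, mixed with a uniform pseudocount
    fi = (1 - pseudocount) * X.mean(axis=0) + pseudocount / q
    fij = (1 - pseudocount) * (X.T @ X) / len(X) + pseudocount / q**2
    for i in range(L):  # same column: f_ii(a, b) = f_i(a) * delta_ab
        block = slice(i * q, (i + 1) * q)
        fij[block, block] = np.diag(fi[block])
    C = fij - np.outer(fi, fi)  # connected correlations C_ij(a, b)
    # drop the last state of every column (a gauge choice) so that C is invertible
    keep = np.arange(L * q) % q != q - 1
    J = np.zeros((L * q, L * q))
    J[np.ix_(keep, keep)] = -np.linalg.inv(C[np.ix_(keep, keep)])
    J = J.reshape(L, q, L, q)
    F = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            B = J[i, :, j, :]
            # switch the q x q coupling block to the zero-sum gauge
            B = B - B.mean(axis=0) - B.mean(axis=1)[:, None] + B.mean()
            F[i, j] = F[j, i] = np.sqrt((B ** 2).sum())
    # average product correction (APC)
    row = F.sum(axis=1) / (L - 1)
    apc = F - np.outer(row, row) / (F.sum() / (L * (L - 1)))
    np.fill_diagonal(apc, 0.0)
    return apc
```

The highest-scoring pairs $(i, j)$ are the predicted contacts; in practice pairs that are close along the chain are usually excluded before ranking.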

  11. Interactions between protein families
[Figure: excerpts from the multiple sequence alignments of two protein families, Family 1 (e.g. >RS3_ORITB/122-204, >RS3_RICPR/123-205) and Family 2 (e.g. >RS14_NEOSM/47-100, >RS14_CHLT3/35-88), with a question mark asking which sequences of the two families interact.]

  12. Interactions between protein families
What can we learn from the empirical sequence variability?
• Do the families interact?
• Which specific proteins interact?
• Which residues are in contact?
➡ relation between protein structure/function and evolution

  13. Prediction of inter-protein residue contacts
Joint MSA of two protein families (protein 1 | protein 2) → DCA: strong inter-protein couplings predict contacts.
[Figure: predicted inter-protein contacts, e.g. histidine kinase / response regulator [Weigt et al., PNAS ’09]; further examples from [Ovchinnikov et al., eLife ’14].]
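Building the joint MSA means concatenating, for each genome, the sequence of protein 1 with that of its interaction partner; a coevolution score is then computed on the concatenated columns, and only the inter-protein block (one column from each protein) is ranked for contacts. A minimal sketch, assuming partners are already known and keyed by a shared identifier such as the species, and using mutual information as a simple stand-in for the DCA couplings on the slide (unlike DCA, mutual information does not separate direct from indirect correlations). All names are hypothetical.

```python
import math
from collections import Counter


def joint_msa(msa1, msa2):
    """Concatenate partner sequences that share a key; returns (rows, length of protein 1)."""
    keys = sorted(set(msa1) & set(msa2))
    rows = [msa1[k] + msa2[k] for k in keys]
    return rows, len(next(iter(msa1.values())))


def mutual_information(rows, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    m = len(rows)
    fi = Counter(r[i] for r in rows)
    fj = Counter(r[j] for r in rows)
    fij = Counter((r[i], r[j]) for r in rows)
    return sum(n / m * math.log(n * m / (fi[a] * fj[b]))
               for (a, b), n in fij.items())


def top_inter_pairs(msa1, msa2, k=5):
    """Rank (column in protein 1, column in protein 2) pairs by covariation."""
    rows, l1 = joint_msa(msa1, msa2)
    l2 = len(rows[0]) - l1
    scores = [(mutual_information(rows, i, l1 + j), i, j)
              for i in range(l1) for j in range(l2)]
    scores.sort(reverse=True)
    return [(i, j) for _, i, j in scores[:k]]
```

Sequences without a partner in the other alignment are simply left out of the joint MSA.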

  14. In silico prediction of high-resolution structures of transient protein complexes
Joint alignment of SK (sensor kinase) and RR (response regulator) sequences → DCA identifies inter-protein residue contacts → these contacts, together with the protein monomer structures, guide molecular dynamics simulations.
Spo0B/Spo0F: co-crystal structure [Zapf et al. (2000)] vs. our model [Schug, MW, Onuchic, Hwa, Szurmant, PNAS ’09]


  16. Specific interactions and paralog matching
[Figure: paralogs of protein family 1 and protein family 2; which of them pair?]
General idea:
• the correct matching shows inter-protein covariation
• a random matching has no inter-protein covariation
➡ maximise inter-protein covariation computationally
• reaches 80–90% accuracy in test cases
• simultaneous prediction of interacting paralogs and inter-protein contacts
[Gueudré, Baldassi, Zamparo, MW, Pagnani, PNAS ’16] [Bitbol, Dwyer, Colwell, Wingreen, PNAS ’16]
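Within one genome, several paralogs of family 1 and of family 2 may coexist, and the task is to choose their one-to-one pairing. The idea on the slide can be sketched as a search over within-species pairings that maximizes a covariation score of the resulting joint alignment. A toy version, assuming small paralog groups (so that all permutations per species can be enumerated), coordinate ascent over species, and summed inter-protein mutual information as the score; the cited works use DCA-based scores and more elaborate optimization, and all names here are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def inter_mi(pairs):
    """Sum of mutual information over all (column of protein 1, column of protein 2) pairs."""
    m = len(pairs)
    total = 0.0
    for i in range(len(pairs[0][0])):
        fa = Counter(a[i] for a, _ in pairs)
        for j in range(len(pairs[0][1])):
            fb = Counter(b[j] for _, b in pairs)
            fab = Counter((a[i], b[j]) for a, b in pairs)
            total += sum(n / m * math.log(n * m / (fa[x] * fb[y]))
                         for (x, y), n in fab.items())
    return total


def match_paralogs(species, n_passes=3):
    """species: {name: (family-1 seqs, family-2 seqs)}, equally many per species.

    Returns {name: perm}: family-1 sequence k is paired with family-2 sequence perm[k].
    """
    match = {s: tuple(range(len(a))) for s, (a, _) in species.items()}

    def pairs_for(assign):
        return [(a[k], b[p]) for s, (a, b) in species.items()
                for k, p in enumerate(assign[s])]

    for _ in range(n_passes):
        for s, (a, _) in species.items():
            # try every pairing within species s while the others stay fixed;
            # the current pairing is a candidate, so the score never decreases
            match[s] = max(permutations(range(len(a))),
                           key=lambda perm: inter_mi(pairs_for({**match, s: perm})))
    return match
```

Coordinate ascent only guarantees a local optimum; restarts from several random pairings are a cheap safeguard.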


  18. Inference of protein-protein interaction networks
Bacterial ribosomal proteins:
• Small ribosomal subunit: 20 proteins, 21 interactions (11% of 190 pairs)
• Large ribosomal subunit: 29 proteins, 29 interactions (7% of 406 pairs)
‣ sparse interaction network
[Feinauer, Szurmant, MW, Pagnani, PLoS ONE ’16]
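The percentages on the slide are the fraction of all possible protein pairs that interact, $n_{\rm int} / \binom{N}{2}$. A quick check of that arithmetic (the function name is illustrative):

```python
from math import comb


def interaction_density(n_proteins: int, n_interactions: int) -> float:
    """Fraction of the N*(N-1)/2 possible protein pairs that interact."""
    return n_interactions / comb(n_proteins, 2)


print(f"{interaction_density(20, 21):.1%}")  # small subunit: 21 / 190 -> 11.1%
print(f"{interaction_density(29, 29):.1%}")  # large subunit: 29 / 406 -> 7.1%
```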
