Structural Systems Biology: Modelling Protein Interactions and Complexes

Patrick Aloy

BWS – Feb ‘07

Proteins are social molecules

[Figure: yeast protein interaction network. Nodes are labelled with gene name, ORF and domain family, e.g. the protein kinases cdc28 (YBR160W), pho85 (YPL031C) and kin28 (YDL108W) with cks1 (YBR135W, CKS) and the cln/clb/pcl/ccl cyclins; the ras-family GTPases (ras1, ras2, cdc42, rho1, rho4, gsp1) with their regulators (cdc25, sdc25, ira2, sac7, rdi1, yrb2) and effectors (ste20, gic2, cla4); mge1 (GrpE) with ssc1 (HSP70); act1 (actin) with pfy1 (profilin); spt15 (TBP) with TF III B; and the syntaxins vam3, sed5 and tlg2 with the Sec1 proteins vps45, sly1 and vps33]

Gavin*, Aloy* et al, Nature (2006).

A great tool to study complexes (TAP / MS)

[Figure: mass spectrum, relative intensity (%) vs m/z]

[Figure: TAP tagging. A PCR product carrying the TAP tag and the Kluyveromyces lactis URA3 marker is integrated at the chromosomal ORF by homologous recombination, producing a TAP-fusion protein (NH2 ... TAP-COOH); purified complexes are analysed by mass spectrometry]
Genome-wide analysis of the yeast proteome

  • ORFs processed: 6,466 (30% with clear human orthologues)
  • ORFs with positive homologous recombination: 5,474 (85%)
  • Strains selected as expressing TAP-fusion proteins: 3,206 (59%)
  • Successful TAP purifications: 1,993 (62%)
  • MALDI-TOF samples: 52,000
  • Protein IDs: 36,000 (2,760 non-redundant)
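Each percentage in the pipeline is relative to the step before it (e.g. 5,474 / 6,466 ≈ 85%), not to the 6,466 ORFs processed. A small Python check that recomputes the step-wise rates, and the overall yield, from the counts listed above:

```python
# Counts from the yeast TAP pipeline summary; each rate is relative to the previous step.
STEPS = [
    ("ORFs processed", 6466),
    ("positive homologous recombination", 5474),
    ("strains expressing TAP-fusion proteins", 3206),
    ("successful TAP purifications", 1993),
]


def stepwise_rates(steps):
    """Return (label, count, % of previous step) for every step after the first."""
    rates = []
    for (_, previous), (label, count) in zip(steps, steps[1:]):
        rates.append((label, count, round(100 * count / previous)))
    return rates


if __name__ == "__main__":
    for label, count, pct in stepwise_rates(STEPS):
        print(f"{label}: {count:,} ({pct}%)")
    # Overall yield from processed ORFs to successful purifications
    print(f"overall: {100 * STEPS[-1][1] / STEPS[0][1]:.0f}%")
```

The overall yield (about 31%) shows how much of the proteome is lost before any complex is seen, even though no single step loses more than about 40%.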

Extensive re-purification of complexes

  • 64% of the known complexes were purified more than once
  • Reverse tagging is a means to validate new interactors
  • The screen ran to saturation
  • Reproducibility rate of 69% (n = 139 repeated purifications)

Capturing complex dynamics

Can we use our complete screen for complexes in yeast to extract general biological principles?

And just for the record: purifications are NOT complexes.


De novo definition of protein complexes

Affinity purification data

[Figure: a bait co-purified with preys V, W, X, Y and Z, represented with the spoke model (bait-prey pairs) and the matrix model (all pairs)]

Pros: information on biological re-use

Cons: no direct interactions

Socio-affinity index

[Figure: pairwise evidence for proteins A to D from tagged (TAG) purifications, recorded as spoke (S) or matrix (M) observations and scored low, medium or high]

Pair evidence (spoke, matrix):

$$A_{i,j} = S_{i,j\,|\,i=\text{bait}} + S_{i,j\,|\,j=\text{bait}} + M_{i,j}$$

$$S_{i,j\,|\,i=\text{bait}} = \log\left(\frac{n_{i,j\,|\,i=\text{bait}}}{f_i^{\text{bait}}\, n^{\text{bait}}\, f_j^{\text{prey}}\, n^{\text{prey}}_{i=\text{bait}}}\right)$$

$$M_{i,j} = \log\left(\frac{n^{\text{prey}}_{i,j}}{f_i^{\text{prey}}\, f_j^{\text{prey}} \sum_{\text{all baits}} n^{\text{prey}}\,(n^{\text{prey}}-1)/2}\right)$$
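A minimal Python sketch of the socio-affinity index as written above, to make the counting explicit. The input format (a list of (bait, preys) purifications), the reading of f_j^prey as the fraction of all prey observations that are protein j, and the reading of n^prey_{i=bait} as the mean number of preys per purification of bait i are assumptions; pairs that were never observed are left unscored rather than contributing log(0).

```python
import math
from collections import Counter
from itertools import combinations


def socio_affinity(purifications):
    """Sketch of A(i, j) = S(i,j|i=bait) + S(i,j|j=bait) + M(i,j).

    purifications: list of (bait, preys) tuples; preys is a set of protein
    names (the bait itself is not listed among its preys).
    Returns {frozenset({i, j}): score} for every observed pair.
    """
    # f_i^bait * n^bait: number of purifications in which i was the bait
    bait_runs = Counter(bait for bait, _ in purifications)
    # f_j^prey: fraction of all prey observations that are j (assumed definition)
    prey_obs = Counter(p for _, preys in purifications for p in preys)
    total_prey_obs = sum(prey_obs.values())
    f_prey = {p: c / total_prey_obs for p, c in prey_obs.items()}

    preys_with_bait = Counter()   # total preys retrieved by each bait
    spoke_counts = Counter()      # (i, j) -> n_{i,j|i=bait}
    matrix_counts = Counter()     # {i, j} -> n^prey_{i,j} (co-purified as preys)
    prey_pairs = 0.0              # sum over purifications of n^prey (n^prey - 1) / 2
    for bait, preys in purifications:
        preys_with_bait[bait] += len(preys)
        for prey in preys:
            spoke_counts[(bait, prey)] += 1
        for a, b in combinations(sorted(preys), 2):
            matrix_counts[frozenset((a, b))] += 1
        prey_pairs += len(preys) * (len(preys) - 1) / 2

    scores = Counter()
    # Spoke terms: (bait, prey) keys cover both S(i,j|i=bait) and S(i,j|j=bait)
    for (i, j), observed in spoke_counts.items():
        mean_preys = preys_with_bait[i] / bait_runs[i]   # n^prey_{i=bait} (assumed)
        expected = bait_runs[i] * f_prey[j] * mean_preys
        scores[frozenset((i, j))] += math.log(observed / expected)
    # Matrix term: co-occurrence of i and j as preys vs. chance expectation
    for pair, observed in matrix_counts.items():
        i, j = tuple(pair)
        expected = f_prey[i] * f_prey[j] * prey_pairs
        scores[pair] += math.log(observed / expected)
    return dict(scores)


if __name__ == "__main__":
    # Toy data: "S" is a sticky prey pulled down by every bait
    toy = [("A", {"B", "C", "S"}), ("A", {"B", "C", "S"}),
           ("D", {"E", "S"}), ("F", {"G", "S"})]
    for pair, score in sorted(socio_affinity(toy).items(), key=lambda kv: -kv[1]):
        print(sorted(pair), round(score, 2))
```

In the toy run the sticky prey S, which every bait pulls down, gets a negative score with bait A, while A's specific partners B and C score positively: observed counts are compared with what the overall frequency of each prey would predict.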

[Figure: interaction score vs interaction affinity (log Kd) for full-length proteins and domains; percentage of interactions per log Kd bin for all interactions, affinity purification (AP) and yeast two-hybrid (Y2H)]

Biophysical meaning of Socio-affinities

Real affinity? P < 0.08; APs cover a broad range of Kds

Socio-affinity

Biophysical meaning of Socio-affinities

Physical proximity?

[Figure: percentage of protein pairs in physical contact (PDB, Y2H) per interaction-score bin, from < 5 to > 15]

Socio-affinity

Very good at removing “sticky” proteins (e.g. Vma2 is present in 552 purifications but has good scores only with Vma5, Vma10, Vma6 and Rav1)


  • Socio-affinities capture the tendency of two proteins to be together under different conditions, and thus can be used to define complexes
  • It is known that proteins can belong to multiple complexes
  • We need an iterative clustering procedure to disentangle the biological redundancy and versatility of protein complex composition

De novo definition of protein complexes

[Figure: iterative clustering; at each iteration the score matrix is turned into a dendrogram, which is cut at a threshold to give complexes (labels: Score matrix, Dendrogram, Complexes, Iteration, Threshold)]

Score matrix, first iteration (non-zero pairs; all other pairs score 0): A-B 10, A-C 9, A-D 6, A-E 5, B-C 11, B-D 6, B-E 5, C-D 6, C-E 5, F-G 10, F-H 6, F-I 4, G-H 4, G-I 6, H-I 10

Score matrix, second iteration (non-zero pairs; all other pairs score 0): A-B 8, A-C 7, A-D 4, A-E 5, B-C 9, B-D 4, B-E 5, C-D 4, C-E 5, F-G 8, F-H 4, F-I 2, G-H 2, G-I 4, H-I 10
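The clustering idea can be sketched in Python under explicit assumptions: average-linkage agglomerative clustering cut at a score threshold, and a hypothetical penalty subtracted from every pair that ended up in the same complex before the next iteration, so later rounds can reveal alternative complex variants. The names `iterative_complexes`, `threshold` and `penalty` are illustrative and this is not the published procedure; the example scores are the non-zero entries of the first score matrix in the example above.

```python
from itertools import combinations


def average_linkage_clusters(names, score, threshold):
    """Merge clusters by average linkage until no pair averages >= threshold.
    Returns only clusters with more than one member."""
    clusters = [frozenset([n]) for n in names]

    def linkage(a, b):
        # mean pairwise score between the members of two clusters
        return sum(score(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, a, b = max(((linkage(c1, c2), c1, c2)
                          for c1, c2 in combinations(clusters, 2)),
                         key=lambda t: t[0])
        if best < threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return [c for c in clusters if len(c) > 1]


def iterative_complexes(pair_scores, threshold, penalty, iterations):
    """Hypothetical iterative scheme: cluster, record complexes, penalise the
    pairs that were clustered together, then cluster again."""
    scores = dict(pair_scores)               # frozenset({i, j}) -> score
    names = sorted({p for pair in scores for p in pair})

    def score(x, y):
        return scores.get(frozenset((x, y)), 0.0)   # unlisted pairs score 0

    found = []
    for _ in range(iterations):
        for cplx in average_linkage_clusters(names, score, threshold):
            if cplx not in found:
                found.append(cplx)
            for x, y in combinations(sorted(cplx), 2):
                scores[frozenset((x, y))] = score(x, y) - penalty
    return found


# Non-zero pair scores of the first score matrix in the example (other pairs = 0)
EXAMPLE = {frozenset(pair): s for pair, s in [
    ("AB", 10), ("AC", 9), ("AD", 6), ("AE", 5), ("BC", 11), ("BD", 6),
    ("BE", 5), ("CD", 6), ("CE", 5), ("FG", 10), ("FH", 6), ("FI", 4),
    ("GH", 4), ("GI", 6), ("HI", 10)]}

if __name__ == "__main__":
    for cplx in iterative_complexes(EXAMPLE, threshold=5, penalty=2, iterations=2):
        print(" ".join(sorted(cplx)))
```

With threshold 5, penalty 2 and two iterations, the first round returns A B C D and F G H I. In the second round the penalised A-D, B-D and C-D scores no longer hold D, so E joins instead and the variant A B C E appears (alongside F G and H I), showing how the same proteins can be reused in more than one complex.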

Clustering strategy: exploring the parameter space

  • We explored a sensible range of clustering parameters (number of iterations, penalty values, etc.) and generated 1,784 potential sets of protein complexes with varying degrees of stringency
  • We compared each set in terms of accuracy and coverage to a hand-curated set of protein complexes (Aloy et al. Science, 2004)
  • The best set consisted of 491 complexes with a coverage of 83% and an accuracy of 78%
  • Some known complexes and/or functional variants appeared only in sets with slightly poorer accuracy and coverage
  • We therefore picked all the sets with accuracy and coverage above 70% and clustered the similar complexes
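The set-selection step can be illustrated with a small Python sketch. The metric definitions are hypothetical stand-ins, since the slide does not define them: a predicted complex matches a reference complex when their Jaccard overlap is at least 0.5; coverage is the share of reference complexes matched; accuracy is the share of matched predicted complexes among those that share at least one protein with the reference set, so entirely novel complexes are not counted as errors.

```python
def jaccard(a, b):
    """Overlap between two complexes given as sets of protein names."""
    return len(a & b) / len(a | b)


def evaluate(predicted, reference, min_overlap=0.5):
    """Return (accuracy, coverage) of one candidate set of complexes against
    a reference set, using hypothetical metric definitions."""
    def matched(cplx, others):
        return any(jaccard(cplx, o) >= min_overlap for o in others)

    coverage = sum(matched(r, predicted) for r in reference) / len(reference)
    # only judge predicted complexes that touch the reference set at all
    relevant = [p for p in predicted if any(p & r for r in reference)]
    accuracy = (sum(matched(p, reference) for p in relevant) / len(relevant)
                if relevant else 0.0)
    return accuracy, coverage


def select_sets(candidate_sets, reference, cutoff=0.70):
    """Keep parameter settings whose accuracy and coverage both exceed cutoff."""
    kept = {}
    for params, complexes in candidate_sets.items():
        accuracy, coverage = evaluate(complexes, reference)
        if accuracy > cutoff and coverage > cutoff:
            kept[params] = (accuracy, coverage)
    return kept
```

`candidate_sets` would map each parameter combination (e.g. a hypothetical `(iterations, penalty)` tuple) to the complexes it produced; the sets that survive `select_sets` are the ones whose similar complexes are then pooled.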

Definitive set of protein complexes

  • We ended up with 5,488 slightly different variations (isoforms) of 491 complexes
  • The procedure increased the coverage to 90%
  • We retrieved 61% of the 279 previously known complexes (MIPS + literature mining) and identified, on average, 80% of their components
  • 257 out of the 491 complexes are entirely novel
  • We found no novel components for only 20 of the 279 complexes in our gold-standard set


Modular organisation of protein complexes

  • Core average size: 3.1 proteins (range 1-23)
  • Module average size: 2.9 proteins (range 2-9)
  • Each module is associated with 3.3 cores on average
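One way to make the core/module distinction concrete, given the isoforms of a single complex, is sketched below. The rules are assumptions for illustration only: core proteins are those present in at least two-thirds of the isoforms, a module is a group of two or more non-core proteins that appear in exactly the same isoforms, and the remaining non-core proteins are treated as single attachments.

```python
from collections import defaultdict


def split_core_modules(isoforms):
    """isoforms: list of sets of protein names, all variants of one complex.
    Returns (core, modules, attachments) under hypothetical rules."""
    n = len(isoforms)
    proteins = set().union(*isoforms)
    # indices of the isoforms each protein appears in
    presence = {p: frozenset(k for k, iso in enumerate(isoforms) if p in iso)
                for p in proteins}
    # core: present in at least two-thirds of isoforms (integer test avoids rounding)
    core = {p for p, where in presence.items() if 3 * len(where) >= 2 * n}
    # group the remaining proteins by identical presence pattern
    groups = defaultdict(set)
    for p in proteins - core:
        groups[presence[p]].add(p)
    modules = [frozenset(g) for g in groups.values() if len(g) >= 2]
    attachments = {p for g in groups.values() if len(g) == 1 for p in g}
    return core, modules, attachments


if __name__ == "__main__":
    isoforms = [{"A", "B", "C", "X", "Y"}, {"A", "B", "C", "X", "Y", "Z"},
                {"A", "B", "Z"}, {"A", "B", "C"}]
    print(split_core_modules(isoforms))
```

Here A, B and C form the core, X and Y always travel together and form a module, and Z is a lone attachment. Counting how many different cores the same module attaches to gives a statistic like the 3.3 cores per module quoted above.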

Evidence supporting the modular organisation

Functional requirements (RNA processing and degradation)

Modularity and cross-talk between functions & compartments

[Figure: functional categories of modules and of cores (cell cycle, cell fate, cell transport, defense, energy, environment, metabolism, protein fate, protein synthesis, transcription, mRNA processing, signaling, unknown)]


  • Protein networks may provide a molecular frame for the interpretation of “simple” genetic traits such as essentiality (only ~20% of genes in yeast)
  • Recent phenotypic screens have moved beyond essentiality in a single growth condition
  • They aim at providing phenotypic profiles for each gene

Rationalising phenotypes through complex architecture

[Figure: number of complexes per phenotypic similarity score bin (5, 10, 15, ≤50, >50) for complex cores compared with random]
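The core-versus-random comparison can be sketched as follows, assuming each gene has a numeric phenotypic profile (one value per screened condition) and using the mean pairwise Pearson correlation within a group as the similarity score. Both choices, and the random-sampling baseline, are assumptions rather than the measure behind the figure.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def group_similarity(members, profiles):
    """Mean pairwise profile correlation within a group of at least two genes."""
    pairs = list(combinations(sorted(members), 2))
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def random_baseline(size, profiles, trials=1000, seed=0):
    """Similarity scores of random gene groups of the same size as a core."""
    rng = random.Random(seed)
    genes = sorted(profiles)
    return [group_similarity(rng.sample(genes, size), profiles) for _ in range(trials)]
```

Under these assumptions, a core whose `group_similarity` lies above most of the `random_baseline` values for its size would count as phenotypically coherent.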

Hierarchical, dynamical and modular organisation of protein complexes

Gavin*, Aloy*, et al. (2006) Nature; Bravo & Aloy (2006) Curr Opin Struct Biol

  • 491 complexes (257 novel) with over 5000 isoforms
  • 147 functional (??) modules

But where are the details?

[Figure: Ras sub-network: ras1 (YOR101W) and ras2 (YNL098C) with the RasGEFs sdc25 (YLL016W) and cdc25 (YLR310C) and the RasGAP ira2 (YOL081W)]


Can we use 3D structures to understand the interaction space?

[Figure: ras / RhoGAP complex]

  • 1. Interface
  • 2. Specificity

Do homologous proteins interact in the same way?

Aloy et al. (2003) J Mol Biol

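[Figure: interacting pairs A-B, A’-B’ and A’’-B’’ (homologues); interface RMSD (iRMSD, up to 10 Å) plotted against % sequence identity, with high, medium and low similarity regions and the 80th percentile]

Chothia & Lesk, EMBO J. 1986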

Aloy et al. (2003) J Mol Biol Aloy et al. (2005) Curr Opin Struct Biol

iRMSD vs PID

[Figure: iRMSD vs % sequence identity (PID), with 80th and 90th percentile curves; http://www.russell.embl.de/simint]
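iRMSD compares how two homologous complexes interact. In outline (the exact atom selection is an assumption here), equivalent interface atoms from both partners, for example the Cα atoms of aligned interface residues, are superimposed and the root-mean-square deviation is taken. A minimal NumPy sketch of that superposition step (the Kabsch algorithm), assuming the two coordinate arrays are already matched atom-for-atom; selecting and aligning interface residues is left out:

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)          # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                     # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


if __name__ == "__main__":
    # Hypothetical interface: the same five atoms, rotated 90 degrees about z and shifted
    a = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
    rot_z = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
    b = a @ rot_z.T + [3.0, 0.0, 0.0]
    print(round(kabsch_rmsd(a, b), 6))   # ~0: same interface geometry
```

A rigidly moved copy gives an iRMSD of about 0 Å; homologous pairs that bind in a different orientation give large values, which is what the upper part of the iRMSD vs PID plot captures.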

[Figure: examples: ferredoxin-like domain pairs (Dom1, Dom2) in Asp transcarbamylase and Thr deaminase; SH2-SH3 domain pairs in lck and abl]


[Figure: CDK complexes with p25, p18, CKSs and cyclins, illustrating four interaction types (Type 1 to Type 4)]

Interaction type (equivalent to the concept of fold)

Aloy & Russell (2004) Nature Biotechnol.

Structural data, interaction data, functional data, genomic data

[Figure: the Drake equation, N = N* × fp × ne × fl × fi × fc × L, estimating the number of civilizations, used as an analogy]

Aloy & Russell (2004) Nature Biotechnol.

Is Nature restricted to a few interaction types?

… emulating Cyrus again (Chothia, 1992)

Is the number of interaction types limited?

$$N_{\text{Types}} = N_{\text{Ints}} \times C \times r_{\text{FN}}^{-1} \times r_{\text{FP}} \times E_{\text{all-species}}$$

~10,000 interaction types

EU Sixth Framework IP (~14 million €)

Growth in the number of interaction types

[Figure: total available interactions, interaction types and new interaction types per year, 1981-2003]


Can we use 3D structures to understand the interaction space?

[Figure: ras / RhoGAP complex]

  • 1. Interface
  • 2. Specificity

[Figure: sequences grouped as Family A, Family B and non-Family B]

What about the specificity?

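Interface pair potentials

[Figure: empirical pair potentials derived from interfaces of known structure, for side-chain to side-chain and side-chain to main-chain contacts; example residues Asp, Arg and Phe with favourable (+) and unfavourable (-) pairings]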

InterPreTS

Interaction Prediction through Tertiary Structure

Aloy & Russell, PNAS, 99, 5896, 2002; Aloy & Russell, Bioinformatics, 19, 161, 2003.

(Do RHO4 & YFE7 interact?)

Alignments:
1tx4A       PIVLRETVAYLQA-------HALTTE ...
YFE7_YEAST  PLIISSIFSYMDKIYPDLPNDKVR-T ...
1tx4B       KLVIVGDGACGKTCLLIVNSKDQF-- ...
RHO4_YEAST  KIVVVGDGAVGKTCLLISYVQGTFPT ...

Output: score and significance
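A simplified, InterPreTS-style scoring sketch: the query residues aligned to contacting positions of the template interface are scored with a pair-potential table, and the real score is compared with scores from shuffled sequences to give a Z-score. Everything here is an assumption for illustration: a single residue-pair table stands in for the separate side-chain/side-chain and side-chain/main-chain potentials, the potential values are made up, higher means more favourable, and `contacts` lists alignment columns already known to be in contact in the template.

```python
import random
import statistics


def interface_score(seq_a, seq_b, contacts, potential):
    """Sum pair potentials over template contacts.
    contacts: (i, j) alignment columns of query A and query B that are in
    contact in the template interface. Residue pairs missing from the
    table (including gaps) contribute 0."""
    total = 0.0
    for i, j in contacts:
        a, b = seq_a[i], seq_b[j]
        total += potential.get((a, b), potential.get((b, a), 0.0))
    return total


def interface_zscore(seq_a, seq_b, contacts, potential, trials=1000, seed=0):
    """Z-score of the real interface score against shuffled query sequences."""
    rng = random.Random(seed)
    real = interface_score(seq_a, seq_b, contacts, potential)
    background = []
    for _ in range(trials):
        a, b = list(seq_a), list(seq_b)
        rng.shuffle(a)
        rng.shuffle(b)
        background.append(interface_score(a, b, contacts, potential))
    return (real - statistics.mean(background)) / statistics.stdev(background)


if __name__ == "__main__":
    # Made-up potentials: higher = more favourable
    toy_potential = {("D", "R"): 1.0, ("F", "F"): 0.5, ("D", "F"): -0.5}
    contacts = [(0, 0), (1, 1), (2, 2), (3, 3)]
    print(interface_zscore("DDFFAAAAAAAA", "RRFFAAAAAAAA", contacts, toy_potential))
```

Shuffling keeps each sequence's composition, so a high Z-score says the residues placed at the interface fit each other better than the same residues would by chance.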

[Figure: FGF-receptor complex; β-trefoil fold members FGF, FHF, IL-1 and ricin]


Ras binding domains

Blind test on 33 potential binders: 22/27 (81%) correct predictions

[Figure: Z-scores for each case, grouped as Bind, Unclear and Don't bind; labels: RBPs, Ras]

Structure-based yeast protein-protein interaction network

Aloy & Russell (2005) FEBS Lett. (Systems Biology issue)

Putting structure into pathways

Aloy & Russell, Nat. Rev. Mol. Cell. Biol. 2006

[Figure: interactions of known structure, interaction discovery (‘omics) and cell biology (EM)]

We can predict interactions, good for us… and now what?

Complex structure prediction

[Figure: a five-component complex modelled by combining X-ray structures, homology models, a two-hybrid network and electron microscopy]

Russell et al., Curr. Opin. Struct. Biol. 2004; Aloy et al., Curr. Opin. Struct. Biol. 2005


Structure-based assembly of protein complexes from binary interactions

Aloy et al. (2004) Science

Modelling complexes from binary interactions

[Figure: proteins A to F of the same complex; pairwise interactions modelled on homologous proteins of known structure]

Aloy et al. (2004) Science

3Drepertoire pipeline

[Figure: pipeline with the figures 1,739 genes; 589 multi-protein assemblies; 232 complexes; 126 purifications; 102 manually annotated complexes; 634 proteins; EM quality 6-9]

Structural overview (102 hand-annotated complexes; Aloy et al., Science 2004)

  • Nearly complete: 42
  • Most individual components & few interactions: 12
  • Most individual components: 20
  • Some individual components: 25
  • No structural information: 3


Templates sharing less than 40% sequence identity: respiratory fumarate reductase, S. putrefaciens (1d4d); adenylylsulfate reductase, A. fulgidus (1jnr); succinate dehydrogenase, E. coli (1nek)

Models are filtered by:

  • Quality of the target/template superposition
  • Geometrical clashes (bumps, interactions made)
  • Quality of contacts (InterPreTS)

In this case:

  • 4/7 domains could be modelled
  • Distance to original complex: 8.1 Å
  • Good InterPreTS scores
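The geometric filter can be sketched as two counts over inter-chain atom pairs: bumps (atoms closer than a clash cutoff) and interactions made (atoms within a contact cutoff). The 3.0 Å and 5.0 Å cutoffs and the acceptance thresholds are hypothetical values chosen for illustration, and the function names are illustrative; a real pipeline would use a spatial index rather than this all-pairs loop.

```python
import math


def count_clashes(chain_a, chain_b, clash_cutoff=3.0):
    """Count inter-chain atom pairs closer than clash_cutoff (Å, hypothetical value).
    chain_a, chain_b: lists of (x, y, z) atom coordinates."""
    return sum(1 for p in chain_a for q in chain_b if math.dist(p, q) < clash_cutoff)


def count_contacts(chain_a, chain_b, contact_cutoff=5.0):
    """Count inter-chain atom pairs within contact_cutoff Å ('interactions made')."""
    return sum(1 for p in chain_a for q in chain_b if math.dist(p, q) <= contact_cutoff)


def passes_geometry_filter(chain_a, chain_b, max_clashes=10, min_contacts=20):
    """Hypothetical filter: few bumps, but enough contacts to form an interface."""
    return (count_clashes(chain_a, chain_b) <= max_clashes
            and count_contacts(chain_a, chain_b) >= min_contacts)
```

Requiring both conditions rejects models where the chains interpenetrate as well as models where they barely touch; models that pass would then go on to contact-quality scoring.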

[Figure: model of fumarate reductase, W. succinogenes (1qla), built from templates at <25%, <28% and <27% identity]

Matthieu Pichaud (EMBL-HD)

Structure-based assembly of protein complexes… and networks

[Figure: network of proteins A to K; legend: cross-talk, complex from affinity purification, complex from literature etc., interaction from two-hybrids, interaction predicted by structure, sequence similarity, similarity inferred through structure]

Bridge the gap between abstract networks and real cells

Aloy & Russell, Nat. Rev. Mol. Cell. Biol. 2006


[Figure: levels of description: binary interaction, interaction interface, macromolecular complex, sub-network/pathway, protein interaction network, whole-cell tomogram]

Building the cell from pieces

Understanding cell networks at atomic level

Acknowledgements

Rob Russell Anne-Claude Gavin

Structural Bioinformatics @ IRB Andreas Zanzoni Amelie Stein Sasha Panjkovich Roland Pache