Identification of direct residue contacts in protein-protein - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Identification of direct residue contacts in protein-protein interactions from multi-species sequence data

Martin Weigt Institute for Scientific Interchange, Torino joint work with R.A. White, H. Szurmant, J.A. Hoch, T. Hwa

[MW et al., PNAS 106, 67 (2009)]

slide-2
SLIDE 2

Outline

  • Motivation: Coevolution / sequence correlation of interacting proteins
  • Local inference: Mutual information
  • Global inference: Disentangling direct from indirect coupling
  • Prediction of protein complex structures
  • Outlook
slide-3
SLIDE 3

Protein-protein interactions

Mutation

slide-4
SLIDE 4

Protein-protein interactions

Repair Compensatory mutation

  • inter-protein correlations

Conservation

Use sequence variability of homologous proteins across genomes!

slide-5
SLIDE 5

Two-component signal transduction

  • most common signaling system in bacteria
  • conservation: most SK, RR described by same two HMMs
  • amplification: ~O(10) interacting pairs per genome
  • specificity of interaction: cross-talk between TCS under negative selection
  • genomic location: interacting pairs frequently in same operon

How do these proteins interact?

[Figure: schematic of histidine kinase (SK) and response regulator (RR); labels: Membrane, Signal, ATPase, ATP, H, P, D, Output]

slide-6
SLIDE 6

Two-component signal transduction

  • one known cocrystal structure (Zapf et al., Structure 2000)
  • allows for checking results of sequence analysis

[Figure: Spo0F / Spo0B structure]

slide-7
SLIDE 7

Data

  • ca. 600 bacterial genomes
  • scanned with Pfam HMMs HisKA, RR

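➡ global alignment: N_SK = 87, N_RR = 117
➡ M ~ 7000 SK-RR pairs in same operon
➡ correlations in frequency counts = contact pair in dimer?

[Figure: paired SK/RR alignment, one row per sequence pair (species 1, species 2, ...); columns i and j with single-site frequencies f_i(A_i), f_j(A_j) and pair frequencies f_ij(A_i, A_j)]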
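
The frequency counts named on this slide can be sketched as follows. This is a minimal illustration on a made-up three-column alignment, not the actual SK/RR data; the function names, the alphabet ordering, and the use of raw counts without any reweighting or pseudocounts are assumptions made here.

```python
import numpy as np

ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"   # 21 states: gap plus 20 amino acids (ordering is arbitrary)
STATE = {a: k for k, a in enumerate(ALPHABET)}

def encode(alignment):
    """Turn a list of equal-length aligned sequences into an (M, N) integer array."""
    return np.array([[STATE[a] for a in seq] for seq in alignment])

def frequency_counts(X, q=len(ALPHABET)):
    """Return single-site frequencies f_i(A) and pair frequencies f_ij(A, B)."""
    M, N = X.shape
    onehot = np.eye(q)[X]                                   # shape (M, N, q)
    f_i = onehot.mean(axis=0)                               # shape (N, q)
    f_ij = np.einsum("mia,mjb->ijab", onehot, onehot) / M   # shape (N, N, q, q)
    return f_i, f_ij

# Hypothetical toy alignment: columns 0 and 2 covary, column 1 is conserved.
toy = ["ACD", "ACD", "GCE", "GCE"]
X = encode(toy)
f_i, f_ij = frequency_counts(X)
```

For the real alignment, each row would be a concatenated SK-RR pair of length N_SK + N_RR, so `f_ij` would also cover the inter-protein column pairs of interest.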

slide-8
SLIDE 8

Mutual information as covariance measure

$$\mathrm{MI}_{ij} = \sum_{A_i, A_j} f_{ij}(A_i, A_j) \, \log \frac{f_{ij}(A_i, A_j)}{f_i(A_i) \, f_j(A_j)} - \mathrm{MI}^{(0)}_{ij}$$

[Figure, panel A: MI vs. min separation of atoms (Å). Panel B: sensitivity vs. specificity; labels "MI", "rand", "MI (t)"]
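
A minimal sketch of the sum in the MI formula for a single column pair, assuming the pair and single-site frequencies are already available as arrays. The function name is hypothetical, and the background term MI^(0)_ij is not computed here because the slide does not define it; it would be subtracted from the returned value.

```python
import numpy as np

def mutual_information(f_ij, f_i, f_j):
    """Sum over (A_i, A_j) of f_ij * log(f_ij / (f_i * f_j)), in nats."""
    outer = np.outer(f_i, f_j)
    mask = f_ij > 0                       # convention: 0 * log 0 = 0
    return float(np.sum(f_ij[mask] * np.log(f_ij[mask] / outer[mask])))

# Two-state toy examples (made-up numbers):
uniform = np.array([0.5, 0.5])
perfectly_coupled = np.array([[0.5, 0.0], [0.0, 0.5]])   # columns always agree
independent = np.outer(uniform, uniform)                  # no covariation

mi_coupled = mutual_information(perfectly_coupled, uniform, uniform)
mi_independent = mutual_information(independent, uniform, uniform)
```

The perfectly coupled pair reaches the maximum possible value for two equiprobable states (log 2), while the independent pair gives zero.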

slide-9
SLIDE 9

Direct vs. indirect interaction

[Figure: sites i and j with mutual information MI_ij, shown in three variants]

➡ need to consider i and j in context of other residues

slide-10
SLIDE 10

Statistical model learning

  • model data via global distribution P(A_1, ..., A_{N_SK+N_RR}) such that

$$P_{ij}(A_i, A_j) = \sum_{\{A_k \,|\, k \neq i,j\}} P(A_1, \ldots, A_{N_{SK}+N_{RR}}) = f_{ij}(A_i, A_j)$$

  • maximum-entropy model:

$$-\sum_{\{A_i\}} P(A_1, \ldots, A_{N_{SK}+N_{RR}}) \, \ln P(A_1, \ldots, A_{N_{SK}+N_{RR}}) \to \max$$

➡ disordered 21-states Potts model

$$P(A_1, \ldots, A_{N_{SK}+N_{RR}}) \sim \exp\left\{ -\sum_{i<j} e_{ij}(A_i, A_j) + \sum_i h_i(A_i) \right\}$$
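
A short sketch of why maximizing the entropy under the marginal constraints leads to a Potts form. The Lagrange multipliers λ_ij, λ_i and μ are auxiliary symbols introduced here, not taken from the slides, and N abbreviates N_SK + N_RR.

```latex
\begin{align}
\mathcal{L} &= -\sum_{\{A_k\}} P \ln P
  + \sum_{i<j} \sum_{A_i, A_j} \lambda_{ij}(A_i, A_j)
    \bigl[ P_{ij}(A_i, A_j) - f_{ij}(A_i, A_j) \bigr] \notag \\
 &\quad + \sum_i \sum_{A_i} \lambda_i(A_i) \bigl[ P_i(A_i) - f_i(A_i) \bigr]
  + \mu \Bigl[ \sum_{\{A_k\}} P - 1 \Bigr] \\
0 &= \frac{\partial \mathcal{L}}{\partial P(A_1, \ldots, A_N)}
  = -\ln P(A_1, \ldots, A_N) - 1
  + \sum_{i<j} \lambda_{ij}(A_i, A_j) + \sum_i \lambda_i(A_i) + \mu \\
\Rightarrow \quad P(A_1, \ldots, A_N) &\propto
  \exp\Bigl\{ \sum_{i<j} \lambda_{ij}(A_i, A_j) + \sum_i \lambda_i(A_i) \Bigr\}
\end{align}
```

Identifying e_ij = -λ_ij and h_i = λ_i gives a distribution proportional to exp{-Σ_{i<j} e_ij(A_i, A_j) + Σ_i h_i(A_i)}, with the multipliers fixed by the requirement that the model marginals reproduce the observed frequencies.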

slide-11
SLIDE 11

Statistical model learning (II)

Computational problem: Inverse Potts problem

➡ determine model parameters coherent with data

$$H = \sum_{i<j} e_{ij}(A_i, A_j) - \sum_i h_i(A_i)$$

➡ solved via iterative two-step procedure:

(i) given test parameters, estimate two-site distributions (MCMC, message passing)
(ii) update parameters:

$$\Delta e_{ij}(A_i, A_j) = \varepsilon \, \bigl[ P_{ij}(A_i, A_j) - f_{ij}(A_i, A_j) \bigr]$$

➡ introduce direct information as measure for direct coupling (MI due to single link)
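
The two-step procedure can be illustrated on a toy system small enough to enumerate. This sketch swaps the MCMC / message-passing estimate of step (i) for exact enumeration over all sequences, which is only feasible for a handful of sites. It also keeps the fields h_i at zero, relying on the fact that single-site terms can be absorbed into the couplings. System size, step size, iteration count and all names are assumptions made here.

```python
import itertools
import numpy as np

N, q = 3, 3                                    # toy size: 3 sites, 3 states (not 21)
configs = np.array(list(itertools.product(range(q), repeat=N)))   # all q**N sequences
pairs = list(itertools.combinations(range(N), 2))

def pair_marginals(e):
    """Exact two-site distributions P_ij for P ~ exp(-sum_{i<j} e_ij), fields at zero."""
    energy = sum(e[i, j][configs[:, i], configs[:, j]] for i, j in pairs)
    p = np.exp(-energy)
    p /= p.sum()
    P = {}
    for i, j in pairs:
        P[i, j] = np.zeros((q, q))
        np.add.at(P[i, j], (configs[:, i], configs[:, j]), p)   # sum out the other sites
    return P

def fit_couplings(f, eps=0.5, steps=5000):
    """Iterate: (i) compute P_ij from current couplings, (ii) e_ij += eps * (P_ij - f_ij)."""
    e = {ij: np.zeros((q, q)) for ij in pairs}
    for _ in range(steps):
        P = pair_marginals(e)
        for ij in pairs:
            e[ij] += eps * (P[ij] - f[ij])
    return e

# Stand-in for empirical frequencies: pair marginals of a known random model.
rng = np.random.default_rng(0)
e_true = {ij: rng.normal(scale=0.3, size=(q, q)) for ij in pairs}
f = pair_marginals(e_true)

e_fit = fit_couplings(f)
P_fit = pair_marginals(e_fit)
max_err = max(np.abs(P_fit[ij] - f[ij]).max() for ij in pairs)
```

The fitted couplings need not equal `e_true` entry by entry, since different parameter sets can describe the same distribution; what the procedure matches is the set of two-site distributions, which `max_err` measures.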

slide-12
SLIDE 12

Mutual information vs. direct information

[Figure: plot of MI (axis 0.1 to 0.4) and DI (axis 0.02 to 0.1) for inter-protein residue pairs. Labeled residue pairs, grouped as printed on the slide:
  272,14 291,21 272,18 275,22 271,18 298,14 267,15 268,14 275,21 272,22 268,18
  251,22 257,56 251,84 251,56 251,87 251,90 268,22 257,84 264,84 264,90 252,56 257,90 252,90 251,95 251,94 252,84 264,56 257,99
  291,22 272,21 294,21 294,14 268,15
  252,99 251,99 264,99 268,56
  275,18]

[Figure: HK853 / Spo0F structure with helices α1, α4 and α1, α2, N and C termini marked, and residues 251, 252, 257, 264, 267, 268, 271, 272, 275, 291, 294, 298 and 14, 15, 18, 21, 22, 56, 84, 87, 90, 94, 95, 99 highlighted]

  • high DI = spatial vicinity, defines interaction surface
  • low DI = far in 3D structure, but important for phosphotransfer

(independent evidence from mutation and NMR studies)
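
One possible reading of "MI due to single link" is sketched below, as an assumption rather than a definition confirmed by the slides. It builds a two-site distribution that keeps only the coupling e_ij, adjusts auxiliary single-site weights until the observed marginals f_i and f_j are reproduced, and then takes the mutual information of that distribution. The function name, the iterative proportional fitting used to find the weights, and the toy inputs are all choices made here.

```python
import numpy as np

def direct_information(e_ij, f_i, f_j, sweeps=200):
    """MI of P_dir(A, B) ~ exp(-e_ij(A, B)) * a(A) * b(B) with marginals f_i and f_j.

    Assumes all entries of f_i and f_j are strictly positive.
    """
    W = np.exp(-e_ij)                      # low e_ij = favourable pair, as in P ~ exp(-e)
    a = np.ones_like(f_i)
    b = np.ones_like(f_j)
    for _ in range(sweeps):                # iterative proportional fitting
        a = f_i / (W @ b)                  # enforce row marginals
        b = f_j / (W.T @ a)                # enforce column marginals
    P_dir = a[:, None] * W * b[None, :]
    return float(np.sum(P_dir * np.log(P_dir / np.outer(f_i, f_j))))

# Made-up two-state examples:
uniform = np.array([0.5, 0.5])
di_none = direct_information(np.zeros((2, 2)), uniform, uniform)      # no coupling
di_strong = direct_information(-3.0 * np.eye(2), uniform, uniform)    # favours equal states
```

Because only the single link between i and j enters, correlations that are mediated by other residues do not contribute, which is the property the bullets above rely on.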

slide-13
SLIDE 13

Direct / mutual information vs. distance

[Figure, panel A: DI (axis 0.03 to 0.12) and MI (axis 0.1 to 0.4) vs. min separation of atoms (Å). Panel B: sensitivity vs. specificity; labels "MI", "DI", "rand", "MI (t)"]
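
A sensitivity / specificity evaluation of this kind can be sketched as below, under assumptions that are not stated on the slide: a pair counts as a true contact when its minimum atom separation is below a cutoff (6 Å here, a hypothetical value), a pair is predicted as a contact when its score exceeds a threshold, and the standard definitions TP / (TP + FN) and TN / (TN + FP) are used. The scores and distances are made-up numbers.

```python
import numpy as np

def sensitivity_specificity(scores, distances, threshold, contact_cutoff=6.0):
    """Score-thresholded contact prediction evaluated against a distance cutoff (in Å)."""
    scores = np.asarray(scores)
    distances = np.asarray(distances)
    predicted = scores > threshold
    contact = distances < contact_cutoff
    tp = np.sum(predicted & contact)
    fn = np.sum(~predicted & contact)
    tn = np.sum(~predicted & ~contact)
    fp = np.sum(predicted & ~contact)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical scores (e.g. DI or MI) and minimum atom separations for four residue pairs.
scores = [0.10, 0.08, 0.02, 0.01]
distances = [4.0, 12.0, 5.0, 30.0]
sens, spec = sensitivity_specificity(scores, distances, threshold=0.05)
```

Sweeping `threshold` from high to low traces out one curve per score, which allows MI and DI to be compared on the same footing.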

slide-14
SLIDE 14

Predicting complexed protein structures

Input:

  • monomer structures of Spo0B, Spo0F (native-structure based model)
  • contact residue pairs (attractive pair interactions)

Output:

  • complex structure
  • 3.3 Å root-mean-square deviation from known Spo0B/0F co-crystal

Spo0B   Spo0F   Native distance   Simulation parameter   Distance in prediction
37      15      7.2 Å             5.5 Å                  5.6 Å
38      14      6.1 Å             5.5 Å                  5.8 Å
41      18      7.1 Å             5.5 Å                  5.8 Å
42      18      6.8 Å             5.5 Å                  9.5 Å
42      14      8.9 Å             5.5 Å                  9.3 Å
45      22      8.5 Å             5.5 Å                  11.2 Å

work in progress with A. Schug (UCSD)

slide-15
SLIDE 15

Outlook

  • Statistical-physics challenges:
      • inverse Ising / Potts model - reconstruct Hamiltonian from microscopic configurations
      • finite-sample fluctuations
      • dilution - describe data as well as possible with as few non-zero links as necessary
      • correlated input sequences (phylogenetic bias)
  • Biological challenges:
      • interactome scale: detect computationally efficient signature for domain-domain interactions
      • protein-family scale: predictions of specific interaction partners in case of amplified proteins
      • protein scale: DI informs structural prediction
      • amino-acid scale: molecular recognition code - influence of mutations, physical interaction mechanisms vs. statistical analysis