SLIDE 1

Identification of direct residue contacts in protein-protein interactions from multi-species sequence data

Martin Weigt Institute for Scientific Interchange, Torino joint work with R.A. White, H. Szurmant, J.A. Hoch, T. Hwa

[MW et al., PNAS 106, 67 (2009)]

SLIDE 2

Outline

Motivation: Coevolution / sequence correlation of interacting proteins
Local inference: Mutual information
Global inference: Disentangling direct from indirect coupling
Prediction of protein complex structures
Outlook

SLIDE 3

Protein-protein interactions

Mutation

SLIDE 4

Protein-protein interactions

Repair Compensatory mutation

inter-protein correlations

Conservation

Use sequence variability of homologous proteins across genomes!

SLIDE 5

Two-component signal transduction

most common signaling system in bacteria
conservation: most SK, RR described by same two HMMs
amplification: ~O(10) interacting pairs per genome
specificity of interaction: cross-talk between TCS under negative selection
genomic location: interacting pairs frequently in same operon

How do these proteins interact?

[Figure: schematic of a two-component system. Labels: Signal, Membrane, SK (Histidine Kinase) with H and ATPase/ATP, phosphate P, RR (Response Regulator) with D, Output.]

SLIDE 6

Two-component signal transduction

one known cocrystal structure (Zapf et al., Structure 2000)
allows for checking results of sequence analysis

[Figure: Spo0F/Spo0B cocrystal structure]

SLIDE 7

Data

ca. 600 bacterial genomes
scanned with Pfam HMMs HisKA, RR

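➡ global alignment: N_SK = 87, N_RR = 117
➡ M ~ 7000 SK-RR pairs in same operon
➡ correlations in frequency counts = contact pair in dimer?

[Figure: aligned SK-RR sequence pairs, one row per species (species 1, species 2, ...); column i gives f_i(A_i), column j gives f_j(A_j), and the column pair (i, j) gives f_ij(A_i, A_j)]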
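The frequency counts named on this slide can be illustrated with a minimal Python sketch. The toy alignment, the function name `frequency_counts`, and the absence of any pseudocount or sequence reweighting are assumptions made for illustration; the slides do not specify how the counts were regularized.

```python
from collections import Counter

def frequency_counts(alignment, i, j):
    """Relative frequencies f_i, f_j and f_ij for alignment columns i and j.

    `alignment` is a list of equal-length strings, one concatenated
    SK-RR sequence pair per row (hypothetical input format).
    """
    m = len(alignment)
    count_i = Counter(seq[i] for seq in alignment)
    count_j = Counter(seq[j] for seq in alignment)
    count_ij = Counter((seq[i], seq[j]) for seq in alignment)

    def normalize(counts):
        return {key: n / m for key, n in counts.items()}

    return normalize(count_i), normalize(count_j), normalize(count_ij)

# Toy alignment with M = 4 rows (made up, not real SK-RR data).
toy_alignment = ["AKLD", "AKLE", "GRLD", "GRLE"]
f_i, f_j, f_ij = frequency_counts(toy_alignment, 0, 1)
```

In this toy example columns 0 and 1 always change together, so only two of the four possible letter pairs occur in `f_ij`.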

SLIDE 8

Mutual information as covariance measure

$$\mathrm{MI}_{ij} = \sum_{A_i, A_j} f_{ij}(A_i, A_j) \log \frac{f_{ij}(A_i, A_j)}{f_i(A_i)\, f_j(A_j)} - \mathrm{MI}^{(0)}$$

[Figure, panel A: MI (0.1 to 0.4) vs. min separation of atoms (Å, 10 to 50). Panel B: sensitivity vs. specificity, curves labeled MI and rand MI.]
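As a worked illustration of the mutual-information sum, here is a short Python sketch. It assumes natural logarithms and omits the offset MI^(0), whose definition is not given on the slide; the toy alignment and the function name are hypothetical.

```python
import math
from collections import Counter

def mutual_information(alignment, i, j):
    """Sum over observed letter pairs of f_ij * log(f_ij / (f_i * f_j)).

    Natural logarithm assumed; the MI^(0) offset from the slide is omitted.
    """
    m = len(alignment)
    count_i = Counter(seq[i] for seq in alignment)
    count_j = Counter(seq[j] for seq in alignment)
    count_ij = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        f_ab = n_ab / m
        f_a = count_i[a] / m
        f_b = count_j[b] / m
        mi += f_ab * math.log(f_ab / (f_a * f_b))
    return mi

# Toy alignment (made up): columns 0 and 1 covary, columns 0 and 3 do not.
toy_alignment = ["AKLD", "AKLE", "GRLD", "GRLE"]
mi_covarying = mutual_information(toy_alignment, 0, 1)
mi_independent = mutual_information(toy_alignment, 0, 3)
```

Pairs that never occur contribute nothing to the sum, which is why the loop runs only over observed pairs.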

SLIDE 9

Direct vs. indirect interaction

[Figure: schematic panels of residues i and j with correlation MI_ij]

➡ need to consider i and j in context of other residues

SLIDE 10

Statistical model learning

model data via global distribution $P(A_1, \ldots, A_{N_{SK}+N_{RR}})$ such that

$$P_{ij}(A_i, A_j) = \sum_{\{A_k \mid k \neq i,j\}} P(A_1, \ldots, A_{N_{SK}+N_{RR}}) = f_{ij}(A_i, A_j)$$

maximum-entropy model:

$$-\sum_{\{A_i\}} P(A_1, \ldots, A_{N_{SK}+N_{RR}}) \ln P(A_1, \ldots, A_{N_{SK}+N_{RR}}) \to \max$$

➡ disordered 21-states Potts model

$$P(A_1, \ldots, A_{N_{SK}+N_{RR}}) \sim \exp\Big\{ -\sum_{i<j} e_{ij}(A_i, A_j) + \sum_i h_i(A_i) \Big\}$$
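To make the Potts form concrete, the following Python sketch enumerates a tiny model (3 sites, 2 states instead of 21) and computes a two-site marginal exactly. The chain couplings, the zero fields, and the function names are assumptions chosen for illustration; exhaustive enumeration is only feasible for toy sizes, not for N_SK + N_RR sites.

```python
import itertools
import numpy as np

def potts_distribution(e, h):
    """Exact P ~ exp(-sum_{i<j} e_ij(A_i, A_j) + sum_i h_i(A_i)) by enumeration.

    e: dict mapping (i, j) with i < j to a (q, q) coupling array.
    h: (n, q) array of local fields.
    """
    n, q = h.shape
    configs = list(itertools.product(range(q), repeat=n))
    log_w = np.array([
        -sum(e_ij[c[i], c[j]] for (i, j), e_ij in e.items())
        + sum(h[k, c[k]] for k in range(n))
        for c in configs
    ])
    weights = np.exp(log_w - log_w.max())  # shift for numerical stability
    return configs, weights / weights.sum()

def pair_marginal(configs, probs, i, j, q):
    """Two-site distribution P_ij: sum P over all sites except i and j."""
    P = np.zeros((q, q))
    for c, p in zip(configs, probs):
        P[c[i], c[j]] += p
    return P

# Toy chain 0 - 1 - 2: equal states are favored on links (0,1) and (1,2);
# there is no direct coupling between sites 0 and 2.
q = 2
attract = -np.eye(q)
e = {(0, 1): attract, (1, 2): attract}
h = np.zeros((3, q))
configs, probs = potts_distribution(e, h)
P02 = pair_marginal(configs, probs, 0, 2, q)
```

Sites 0 and 2 share no coupling term, yet `P02` is not uniform: the correlation is mediated by site 1. This is the indirect effect that a purely local measure such as MI cannot separate from a direct link.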

SLIDE 11

Statistical model learning (II)

Computational problem: Inverse Potts problem

$$H = \sum_{i<j} e_{ij}(A_i, A_j) - \sum_i h_i(A_i)$$

➡ determine model parameters coherent with data
➡ solved via iterative two-step procedure:

(i) given test parameters, estimate two-site distributions (MCMC, message passing)
(ii) update parameters

$$\Delta e_{ij}(A_i, A_j) = \epsilon \left[ P_{ij}(A_i, A_j) - f_{ij}(A_i, A_j) \right]$$

➡ introduce direct information as measure for direct coupling (MI due to single link)
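The two-step procedure can be sketched on a toy system in Python. This is an illustration under stated assumptions, not the implementation used in the work: step (i) uses exact enumeration instead of MCMC or message passing, the fields h_i are fixed to zero (they can be absorbed into the couplings), and the system size, step size `eps`, and iteration count are arbitrary choices.

```python
import itertools
import numpy as np

N, Q = 3, 2  # toy size: 3 sites, 2 states (the slides use 21 states)
CONFIGS = np.array(list(itertools.product(range(Q), repeat=N)))
PAIRS = list(itertools.combinations(range(N), 2))

def pair_marginals(e):
    """Step (i): exact two-site distributions of P ~ exp(-sum_{i<j} e_ij)."""
    log_w = -sum(e[(i, j)][CONFIGS[:, i], CONFIGS[:, j]] for (i, j) in PAIRS)
    weights = np.exp(log_w - log_w.max())
    probs = weights / weights.sum()
    marginals = {}
    for (i, j) in PAIRS:
        P = np.zeros((Q, Q))
        np.add.at(P, (CONFIGS[:, i], CONFIGS[:, j]), probs)
        marginals[(i, j)] = P
    return marginals

def fit_couplings(f, eps=0.5, steps=2000):
    """Iterate the update  e_ij += eps * (P_ij - f_ij)  starting from e = 0."""
    e = {pair: np.zeros((Q, Q)) for pair in PAIRS}
    for _ in range(steps):
        P = pair_marginals(e)                     # step (i)
        for pair in PAIRS:                        # step (ii)
            e[pair] += eps * (P[pair] - f[pair])
    return e

# "Data": pair frequencies generated from a known chain 0 - 1 - 2.
true_e = {(0, 1): -np.eye(Q), (1, 2): -np.eye(Q), (0, 2): np.zeros((Q, Q))}
f = pair_marginals(true_e)

fitted = fit_couplings(f)
P_fit = pair_marginals(fitted)
```

In this toy case the fitted model reproduces all pair frequencies, and the fitted coupling for the pair (0, 2) should come out close to zero even though sites 0 and 2 are correlated in the data. Couplings are only determined up to additive shifts that leave P unchanged, so the fitted values for the other pairs match `true_e` only up to such a shift.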

SLIDE 12

Mutual information vs. direct information

[Figure: scatter plot of MI against DI for inter-protein residue pairs (axis ticks 0.1 to 0.4 and 0.02 to 0.1). Labeled residue pairs, grouped as in the plot:
272,14 291,21 272,18 275,22 271,18 298,14 267,15 268,14 275,21 272,22 268,18
251,22 257,56 251,84 251,56 251,87 251,90 268,22 257,84 264,84 264,90 252,56 257,90 252,90 251,95 251,94 252,84 264,56 257,99
291,22 272,21 294,21 294,14 268,15
252,99 251,99 264,99 268,56
275,18
Structure panel: HK853 and Spo0F with helices α1, α2, α4, N and C termini, and residues 251, 252, 257, 264, 267, 268, 271, 272, 275, 291, 294, 298 and 14, 15, 18, 21, 22, 56, 84, 87, 90, 94, 95, 99 marked.]

high DI = spatial vicinity, defines interaction surface
low DI = far in 3D structure, but important for phosphotransfer

(independent evidence from mutation and NMR studies)
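The slides describe direct information only as "MI due to single link". One way to read this, sketched below in Python, is to keep only the coupling e_ij in a two-site model, choose auxiliary fields so that the single-site marginals equal f_i and f_j, and take the mutual information of that two-site distribution. The fitting of the auxiliary fields by alternating rescaling, the function name, and the toy inputs are assumptions; strictly positive f_i and f_j are required.

```python
import numpy as np

def direct_information(e_ij, f_i, f_j, iters=200):
    """MI of the two-site model P_dir ~ exp(-e_ij(A, B) + ht_i(A) + ht_j(B)).

    The auxiliary fields ht_i, ht_j (held implicitly as u = exp(ht_i),
    v = exp(ht_j)) are fitted so that P_dir has marginals f_i and f_j.
    """
    W = np.exp(-e_ij)
    u = np.ones_like(f_i)
    v = np.ones_like(f_j)
    for _ in range(iters):
        u = f_i / (W @ v)      # match the row marginal f_i
        v = f_j / (W.T @ u)    # match the column marginal f_j
    P_dir = u[:, None] * W * v[None, :]
    P_dir /= P_dir.sum()
    return float(np.sum(P_dir * np.log(P_dir / np.outer(f_i, f_j))))

# Toy inputs (made up): two states, uniform single-site frequencies.
f_uniform = np.array([0.5, 0.5])
di_linked = direct_information(-np.eye(2), f_uniform, f_uniform)
di_unlinked = direct_information(np.zeros((2, 2)), f_uniform, f_uniform)
```

A pair with zero coupling gets zero direct information regardless of how strongly it is correlated through other residues, which is the property the MI-vs-DI comparison relies on.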

SLIDE 13

Direct / mutual information vs. distance

[Figure, panel A: DI (0.03 to 0.12) and MI (0.1 to 0.4) vs. min separation of atoms (Å, 10 to 50). Panel B: sensitivity vs. specificity, curves labeled MI, DI, and rand MI.]

SLIDE 14

Predicting complexed protein structures

Input:

monomer structures of Spo0B, Spo0F (native-structure based model)
contact residue pairs (attractive pair interactions)

Output:

complex structure
3.3 Å mean-square deviation from known Spo0B/0F co-crystal

Spo0B   Spo0F   Native Distance   Simulation Parameter   Distance in Prediction
37      15      7.2 Å             5.5 Å                  5.6 Å
38      14      6.1 Å             5.5 Å                  5.8 Å
41      18      7.1 Å             5.5 Å                  5.8 Å
42      18      6.8 Å             5.5 Å                  9.5 Å
42      14      8.9 Å             5.5 Å                  9.3 Å
45      22      8.5 Å             5.5 Å                  11.2 Å

work in progress with A. Schug (UCSD)

SLIDE 15

Outlook

Statistical-physics challenges:
inverse Ising / Potts model - reconstruct Hamiltonian from microscopic configurations
finite-sample fluctuations
dilution - describe data as well as possible with as few non-zero links as necessary
correlated input sequences (phylogenetic bias)

Biological challenges:
interactome scale: detect computationally efficient signature for domain-domain interactions
protein-family scale: predictions of specific interaction partners in case of amplified proteins
protein scale: DI informs structural prediction
amino-acid scale: molecular recognition code - influence of mutations, physical interaction mechanisms vs. statistical analysis