Structural biomathematics: an overview of molecular simulations and protein structure prediction



SLIDE 1

A Glance at Structural Biology Molecular Simulations Direct-Coupling Analysis for Prediction of Protein Folding

Structural biomathematics: an overview of molecular simulations and protein structure prediction

Bernat Anton

Bernat Anton Structural biomathematics

SLIDE 2

Figure: Parc de Recerca Biomèdica de Barcelona (PRBB).

SLIDE 3

Contents

1. A Glance at Structural Biology
2. Molecular Simulations
3. Direct-Coupling Analysis for Prediction of Protein Folding

SLIDE 4

1. A Glance at Structural Biology

SLIDE 5

All the biological information of the human body is encoded in our DNA.

Human Genome Project: sequencing of the whole human genome, with a draft sequence published in 2001, led by Francis Collins (public project) and Craig Venter (Celera Genomics).
About 3 billion base pairs (A, C, T and G).
An estimated 30,000 genes (around 3,000 bp per gene).
Less than 2% of the genome codes for proteins.
The function of more than half of the discovered genes is unknown!

SLIDE 11

DNA → (transcription) → RNA → (translation) → Protein

[Table not reproduced.] [1]

[1] Table taken from Wikipedia.
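
The two arrows above can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code and a coding-strand DNA input; the names `transcribe`, `translate` and `CODON_TABLE` are hypothetical and do not come from the slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCAAATAA")
print(mrna, translate(mrna))  # AUGGCCAAAUAA MAK
```

Real transcription and translation involve strand selection, splicing and start-codon search, none of which is modelled here.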

SLIDE 12

Protein structure (???) → Protein function → Gene function

SLIDE 13

[Figure not reproduced.] [2]

Primary: amino acid linear sequence.
Secondary: α-helices and β-strands.
Tertiary / domains: functionally independent part of the sequence.
Quaternary: multi-subunit complex of domains or proteins.

[2] Figure taken from C. Branden & J. Tooze, Introduction to Protein Structure.

SLIDE 14

Main question: how can we find the structure of a given protein?
Crystallography.
Nuclear magnetic resonance spectroscopy.
Molecular simulation.
Prediction of structure (structural biology).
NOT AN EASY TASK!


SLIDE 16

2. Molecular Simulations

SLIDE 17

[Two images not reproduced.] [3]

[3] Both images were obtained using VMD software.

SLIDE 18

[Figure: lysine.]

Internal (mechanical) energy of the system: [equation not reproduced]

And these are not the only forces and energies involved in a molecular simulation!

SLIDE 19

PLC-β2 simulation

This simulation lasts around 20 ns, with timesteps of 4 fs [4], using the ACEMD software with the AMBER force field. The simulation has been visualized using VMD software.
The protein has 708 amino acids, for a total of around 150,000 atoms in the simulation (counting water and lipid molecules).
In the simulation, the X/Y linker can be observed folding to cover the hydrophobic active site of the protein.

[4] 1 ns = 10^−9 seconds, 1 fs = 10^−15 seconds.
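
To get a feeling for these numbers, the following sketch counts the integration steps implied by a 20 ns trajectory with a 4 fs timestep. The helper name `number_of_steps` is hypothetical; only the unit conversion from the footnote is used.

```python
FS_PER_NS = 10**6  # 1 ns = 1e-9 s and 1 fs = 1e-15 s, so 1 ns = 10^6 fs


def number_of_steps(total_ns: int, timestep_fs: int) -> int:
    """Number of integration steps needed to cover total_ns with the given timestep."""
    return total_ns * FS_PER_NS // timestep_fs


steps = number_of_steps(20, 4)
print(steps)  # 5000000
```

Each of those five million steps requires evaluating the forces on all of the roughly 150,000 atoms.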

SLIDE 20

Anfinsen's Dogma
The native structure of a protein is unique and is determined only by its amino acid sequence. The folding to its native state is almost spontaneous.

Levinthal's Paradox
Due to the huge number of degrees of freedom in an unfolded protein, the number of possible conformations is astronomically large.

Then... how can proteins fold?
Partially folded transition states.
Funnel-like energy landscapes.
...?
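
A back-of-the-envelope version of Levinthal's argument, as a sketch only. The three conformations per residue, the chain of 100 residues and the sampling rate of 10^13 conformations per second are illustrative assumptions, not figures from the slides.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_estimate(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Return (number of conformations, years needed to visit each one once)."""
    conformations = states_per_residue ** n_residues
    years = conformations / samples_per_second / SECONDS_PER_YEAR
    return conformations, years


conformations, years = levinthal_estimate(100)
print(f"{conformations:.2e} conformations, about {years:.1e} years of random search")
```

Under these assumptions the exhaustive search would take on the order of 10^27 years, while real proteins fold in a tiny fraction of that time, which is the paradox.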


SLIDE 22

3. Direct-Coupling Analysis for Prediction of Protein Folding

SLIDE 23

Let X, Y be two (discrete) random variables.
The (self-)information of X is $I(X) = -\log P(X)$.
The entropy of X is the measure of uncertainty associated with X: $S(X) = E[I(X)] = -\sum_{x} P(x)\log P(x)$.
The mutual information of X and Y (the Kullback-Leibler divergence of the joint distribution from the product of the marginals) is

$$MI(X;Y) = \sum_{x \in X}\sum_{y \in Y} P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}.$$

Maximum Entropy Principle
Given a proposition that expresses testable information, the probability distribution that best represents the current state of knowledge is the one with largest entropy.
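
The definitions above translate directly into code. The sketch below uses natural logarithms and represents a joint distribution as a nested list `joint[x][y]`; the function names are hypothetical.

```python
from math import log


def entropy(p):
    """Shannon entropy S = -sum p ln p of a probability vector (in nats)."""
    return -sum(px * log(px) for px in p if px > 0)


def mutual_information(joint):
    """Mutual information of a joint distribution given as joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * log(pxy / (px[x] * py[y]))
        for x, row in enumerate(joint)
        for y, pxy in enumerate(row)
        if pxy > 0
    )


independent = [[0.25, 0.25], [0.25, 0.25]]
correlated = [[0.5, 0.0], [0.0, 0.5]]
print(entropy([0.5, 0.5]))              # ln 2
print(mutual_information(independent))  # 0.0
print(mutual_information(correlated))   # ln 2
```

For independent variables the joint distribution factorizes and the mutual information vanishes; for perfectly correlated binary variables it equals the entropy of either one.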

SLIDE 24

Figure: Multiple Sequence Alignment (MSA) for aaTHEP1.

SLIDE 25

From the previous MSA let's define:

$f_i(A)$ = frequency of occurrence of amino acid A at position i of the MSA,
$f_{i,j}(A,B)$ = frequency of simultaneous occurrence of amino acids A and B at positions i and j of the MSA, respectively.

$$MI_{i,j} = \sum_{A,B} f_{i,j}(A,B)\,\ln\frac{f_{i,j}(A,B)}{f_i(A)\,f_j(B)}$$

Be careful!
By definition, this mutual information is a local measure: it looks at only two positions of the amino acid chain at a time, so it is contaminated by transitive correlations (if i is coupled to k and k to j, then i and j appear correlated even without a direct coupling).
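
As a minimal sketch, the frequencies and $MI_{i,j}$ can be computed from an alignment stored as a list of equal-length strings. The toy alignment and the function names are hypothetical, and no sequence reweighting or pseudocounts are applied here.

```python
from collections import Counter
from math import log


def site_frequencies(msa, i):
    """f_i(A): fraction of sequences with amino acid A at column i."""
    counts = Counter(seq[i] for seq in msa)
    return {a: n / len(msa) for a, n in counts.items()}


def pair_frequencies(msa, i, j):
    """f_ij(A, B): fraction of sequences with A at column i and B at column j."""
    counts = Counter((seq[i], seq[j]) for seq in msa)
    return {ab: n / len(msa) for ab, n in counts.items()}


def mutual_information(msa, i, j):
    """MI_ij = sum over (A, B) of f_ij ln(f_ij / (f_i f_j))."""
    fi, fj = site_frequencies(msa, i), site_frequencies(msa, j)
    fij = pair_frequencies(msa, i, j)
    return sum(f * log(f / (fi[a] * fj[b])) for (a, b), f in fij.items())


toy_msa = ["AKC", "AKD", "GEC", "GED"]
print(mutual_information(toy_msa, 0, 1))  # ln 2: columns 0 and 1 covary perfectly
print(mutual_information(toy_msa, 0, 2))  # 0.0: columns 0 and 2 are independent
```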


SLIDE 27

We want a general model $P(A_1,\dots,A_L)$ for the probability that a particular amino acid sequence $A_1 \dots A_L$ is a member of the iso-structural family under consideration, such that

$$P_i(A) \approx f_i(A), \qquad P_{i,j}(A,B) \approx f_{i,j}(A,B),$$

where the marginals are sums over all sequences with the given amino acids at the given positions:

$$P_i(A) = \sum_{A_1,\dots,A_L:\ A_i = A} P(A_1,\dots,A_L), \qquad P_{i,j}(A,B) := \sum_{A_1,\dots,A_L:\ A_i = A,\ A_j = B} P(A_1,\dots,A_L).$$

Many distributions satisfy these constraints, so we choose among them with the Maximum Entropy Principle.


SLIDE 29

Optimization problem: maximize

$$S = -\sum_{A_1,\dots,A_L} P(A_1,\dots,A_L)\,\ln P(A_1,\dots,A_L)$$

subject to

$$P_{i,j}(A,B) = f_{i,j}(A,B), \qquad P_i(A) = f_i(A).$$

Solution: disordered Q-state Potts model

$$P(A_1,\dots,A_L) = \frac{1}{Z}\exp\Big\{\sum_{1\le i<j\le L} e_{i,j}(A_i,A_j) + \sum_{1\le i\le L} h_i(A_i)\Big\}$$

where:
the parameters $e_{i,j}(A_i,A_j)$, $h_i(A_i)$ are the Lagrange multipliers of the system,
$Z$ is the normalization constant (partition function).
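
For very small systems the Potts distribution can be evaluated by exhaustive enumeration, which makes the role of $Z$ and of the marginals concrete. The sketch below uses hypothetical toy parameters (three sites, two states, couplings only between neighbouring sites); the value Q = 21 in the last line is also an assumption, chosen as 20 amino acids plus a gap symbol.

```python
from itertools import product
from math import exp


def potts_distribution(e, h, q):
    """Return {sequence: probability} by summing over all q**L sequences.

    e[(i, j)][a][b] are the couplings for i < j, h[i][a] the local fields.
    """
    L = len(h)
    weights = {}
    for seq in product(range(q), repeat=L):
        exponent = sum(h[i][seq[i]] for i in range(L))
        exponent += sum(e[(i, j)][seq[i]][seq[j]] for (i, j) in e)
        weights[seq] = exp(exponent)
    Z = sum(weights.values())  # partition function
    return {seq: w / Z for seq, w in weights.items()}


def site_marginal(P, i, a):
    return sum(p for seq, p in P.items() if seq[i] == a)


def pair_marginal(P, i, j, a, b):
    return sum(p for seq, p in P.items() if seq[i] == a and seq[j] == b)


q, L = 2, 3
h = [[0.0, 0.0] for _ in range(L)]
same_state = [[1.0, 0.0], [0.0, 1.0]]    # toy coupling favouring equal states
e = {(0, 1): same_state, (1, 2): same_state}  # no direct coupling between sites 0 and 2
P = potts_distribution(e, h, q)

print(len(P), site_marginal(P, 0, 0))  # 8 sequences, marginal 0.5 by symmetry
print(pair_marginal(P, 0, 2, 0, 0))    # above 0.25 although e has no (0, 2) term
print(21 ** 100)                       # terms in Z for an assumed L = 100, Q = 21
```

Sites 0 and 2 come out correlated purely through site 1, which is the transitivity effect that mutual information cannot separate from a direct coupling. The last line shows why enumeration is hopeless for real proteins.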


SLIDE 31

Equivalently, this probability distribution is the Boltzmann-Gibbs distribution

$$P(A_1,\dots,A_L) = \frac{1}{Z}\,e^{-H(A_1,\dots,A_L)}, \qquad H(A_1,\dots,A_L) = -\sum_{1\le i<j\le L} e_{i,j}(A_i,A_j) - \sum_{1\le i\le L} h_i(A_i).$$

Formally, the marginals of this distribution are obtained from derivatives of the free energy $F = -\ln Z$:

$$\frac{\partial F}{\partial h_i(A)} = -P_i(A), \qquad \frac{\partial^2 F}{\partial h_i(A)\,\partial h_j(B)} = -P_{i,j}(A,B) + P_i(A)\,P_j(B),$$

but the direct computation is computationally prohibitive, since $Z$ is a sum over $Q^L$ sequences.
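
The relation between the partition function and the marginals can be checked numerically on a model small enough to enumerate. The sketch below takes $P \propto \exp\{e + h\}$ as in the Potts form, so that the derivative of $\ln Z$ with respect to $h_0(A)$ equals $P_0(A)$, and the derivative of $F = -\ln Z$ equals $-P_0(A)$. The two-site model and its parameter values are hypothetical.

```python
from itertools import product
from math import exp, log


def log_Z(e01, h):
    """ln Z of a two-site Potts model with couplings e01[a][b] and fields h[i][a]."""
    q = len(h[0])
    return log(sum(
        exp(e01[a][b] + h[0][a] + h[1][b])
        for a, b in product(range(q), repeat=2)
    ))


def marginal_site0(e01, h, a):
    """P_0(a), obtained by summing the Boltzmann weights over the second site."""
    q = len(h[0])
    Z = exp(log_Z(e01, h))
    return sum(exp(e01[a][b] + h[0][a] + h[1][b]) for b in range(q)) / Z


e01 = [[0.8, -0.3], [0.1, 0.5]]  # hypothetical couplings
h = [[0.2, -0.4], [0.0, 0.3]]    # hypothetical fields

# Central finite difference of ln Z with respect to h_0(0).
eps = 1e-6
h_plus = [[h[0][0] + eps, h[0][1]], h[1]]
h_minus = [[h[0][0] - eps, h[0][1]], h[1]]
dlnZ = (log_Z(e01, h_plus) - log_Z(e01, h_minus)) / (2 * eps)

print(dlnZ, marginal_site0(e01, h, 0))  # the two numbers agree
```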


SLIDE 33

The Lagrange multipliers can be obtained using the mean-field approximation technique [5]:
Introduce a new parameter α in the partition function, via the perturbed Hamiltonian $H(\alpha) = -\alpha\sum_{i<j} e_{i,j}(A_i,A_j) - \sum_i h_i(A_i)$:

$$Z(\alpha) = \sum_{A_1,\dots,A_L} \exp\Big\{\alpha\sum_{1\le i<j\le L} e_{i,j}(A_i,A_j) + \sum_{1\le i\le L} h_i(A_i)\Big\}$$

Consider the Legendre transform of the free energy $F = -\ln Z(\alpha)$ (Gibbs potential):

$$-G(\alpha) = \ln Z(\alpha) - \sum_{i=1}^{L}\sum_{A} h_i(A)\,P_i(A).$$

[5] Plefka, T., Convergence condition of the TAP equation for the infinite-ranged Ising spin glass model (1982).


SLIDE 35

Consider the empirical connected correlation matrix:

$$C_{i,j}(A,B) = f_{i,j}(A,B) - f_i(A)\,f_j(B).$$

As a consequence of the functional form of the Legendre transform,

$$h_i(A) = \frac{\partial G(\alpha)}{\partial P_i(A)}, \qquad (C^{-1})_{i,j}(A,B) = \frac{\partial h_i(A)}{\partial P_j(B)} = \frac{\partial^2 G(\alpha)}{\partial P_i(A)\,\partial P_j(B)}.$$

Expand the Gibbs potential to first order in a Taylor expansion around α = 0:

$$G(\alpha) \approx G(0) + \alpha\,\frac{\partial G(\alpha)}{\partial \alpha}\Big|_{\alpha=0}.$$

SLIDE 36

A computation over the two terms of the Taylor expansion of G leads to an expression that is easy to differentiate. First and second derivatives with respect to the marginal distributions $P_i(A)$ provide self-consistent equations for the local fields, from which we obtain

$$(C^{-1})_{i,j}(A,B)\big|_{\alpha=0} = -e_{i,j}(A,B), \quad \text{for } i \neq j.$$

Finally, the parameters $h_i$ can be estimated by imposing the empirical single-site frequency counts as marginal distributions and considering gauge conditions:

$$f_i(A) = \sum_{B} P_{i,j}(A,B).$$
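
The mean-field result says that the couplings are read off from the inverse of the connected correlation matrix. The NumPy sketch below illustrates that single step under two assumptions that are not stated in the slides: the last state of every site is dropped to fix the gauge, and the frequencies are mixed with a uniform distribution (a pseudocount) so that the matrix is invertible. The function name and the default pseudocount are hypothetical.

```python
import numpy as np


def mean_field_couplings(msa, alphabet, pseudocount=0.5):
    """Return e[i, j, A, B] = -(C^{-1})_ij(A, B) for an MSA given as equal-length strings.

    Only the blocks with i != j are couplings; the last alphabet state is dropped.
    """
    q, L = len(alphabet), len(msa[0])
    index = {a: k for k, a in enumerate(alphabet)}
    X = np.array([[index[a] for a in seq] for seq in msa])  # shape (M, L)
    onehot = np.eye(q)[X]                                   # shape (M, L, q)

    # Empirical frequencies mixed with a uniform distribution (assumed regularization).
    fi = (1 - pseudocount) * onehot.mean(axis=0) + pseudocount / q
    fij = (1 - pseudocount) * np.einsum("mia,mjb->ijab", onehot, onehot) / len(msa)
    fij += pseudocount / q**2
    for i in range(L):  # diagonal blocks must satisfy f_ii(A, B) = delta_AB f_i(A)
        fij[i, i] = np.diag(fi[i])

    # Connected correlations C_ij(A, B) = f_ij(A, B) - f_i(A) f_j(B), reduced gauge.
    C = fij - fi[:, None, :, None] * fi[None, :, None, :]
    C = C[:, :, :-1, :-1].transpose(0, 2, 1, 3).reshape(L * (q - 1), L * (q - 1))

    e = -np.linalg.inv(C)
    return e.reshape(L, q - 1, L, q - 1).transpose(0, 2, 1, 3)


e = mean_field_couplings(["AA", "BB"], alphabet="AB")
print(e[0, 1])  # positive coupling between the co-occurring states
```

In this two-sequence toy alignment the reduced matrix is [[0.25, 0.125], [0.125, 0.25]], so the coupling between the two "A" states comes out as 8/3. Production implementations also reweight similar sequences, which is omitted here.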


SLIDE 38

This leads us to the effective pair probabilities

$$P^{Dir}_{i,j}(A,B) = \frac{1}{Z_{i,j}}\exp\big\{e_{i,j}(A,B) + \tilde h_i(A) + \tilde h_j(B)\big\},$$

from which we can define the Kullback-Leibler divergence from the product of the single-site frequencies, which will be called Direct Information:

$$DI_{i,j} := \sum_{A,B} P^{Dir}_{i,j}(A,B)\,\ln\frac{P^{Dir}_{i,j}(A,B)}{f_i(A)\,f_j(B)}.$$
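
A sketch of how the direct information of one pair could be evaluated. It assumes that the auxiliary fields $\tilde h_i$, $\tilde h_j$ are chosen so that the marginals of $P^{Dir}_{i,j}$ reproduce $f_i$ and $f_j$, and it finds them by alternately rescaling rows and columns (iterative proportional fitting). The solver, the function name and the toy inputs are assumptions, not taken from the slides.

```python
from math import exp, log


def direct_information(e_ij, f_i, f_j, n_iter=200):
    """Return (DI_ij, P_dir) for a coupling matrix e_ij[a][b] and frequencies f_i, f_j."""
    q = len(f_i)
    W = [[exp(e_ij[a][b]) for b in range(q)] for a in range(q)]
    mu_i, mu_j = [1.0 / q] * q, [1.0 / q] * q  # mu = exp(h~), up to normalization
    for _ in range(n_iter):
        # Rescale so that the row sums match f_i, then the column sums match f_j.
        mu_i = [f_i[a] / sum(W[a][b] * mu_j[b] for b in range(q)) for a in range(q)]
        mu_j = [f_j[b] / sum(W[a][b] * mu_i[a] for a in range(q)) for b in range(q)]
    P = [[mu_i[a] * W[a][b] * mu_j[b] for b in range(q)] for a in range(q)]
    Z = sum(map(sum, P))
    P = [[p / Z for p in row] for row in P]
    di = sum(
        P[a][b] * log(P[a][b] / (f_i[a] * f_j[b]))
        for a in range(q) for b in range(q)
        if P[a][b] > 0
    )
    return di, P


f_i, f_j = [0.7, 0.3], [0.4, 0.6]
di_zero, _ = direct_information([[0.0, 0.0], [0.0, 0.0]], f_i, f_j)
di_coupled, P_dir = direct_information([[1.0, 0.0], [0.0, 1.0]], f_i, f_j)
print(di_zero, di_coupled)  # zero without coupling, positive with coupling
```

With vanishing couplings $P^{Dir}_{i,j}$ factorizes into $f_i f_j$ and the direct information is zero, so $DI_{i,j}$ measures only the part of the correlation that comes from the direct coupling.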

SLIDE 39

For each pair of positions in the sequence, we "know" if they are spatially close. And now... what?
Given these predicted contacts, the folding of the protein is not uniquely determined.
We must remove knotted structures (Alexander polynomial or Heegaard Floer homology).
A scoring method over the resulting foldings must be defined, in order to decide which of the structures is best.
A short simulation of the system may be run in order to optimize the energies of the folding.


SLIDE 43

Figure: Chewbacca mounted on a squirrel wants to thank you for your attendance!