Protein Folding Simulation in Concurrent Constraint Programming - - PowerPoint PPT Presentation

protein folding simulation in concurrent constraint
SMART_READER_LITE
LIVE PREVIEW

Protein Folding Simulation in Concurrent Constraint Programming - - PowerPoint PPT Presentation

Protein Folding Simulation in Concurrent Constraint Programming Luca Bortolussi, Alessandro Dal Pal` u, Agostino Dovier DIMI, Univ. of Udine (IT) Federico Fogolari DST, Univ. of Verona (IT) Outline of the talk Introduction Concurrent


slide-1
SLIDE 1

Protein Folding Simulation in Concurrent Constraint Programming

Luca Bortolussi, Alessandro Dal Pal` u, Agostino Dovier DIMI, Univ. of Udine (IT) Federico Fogolari DST, Univ. of Verona (IT)

slide-2
SLIDE 2

Outline of the talk

  • Introduction
  • Concurrent framework
  • Testing model
  • Results
  • Future Work
  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 2/20

slide-3
SLIDE 3

Proteins

  • Proteins are abundant in nature and fundamental to life.
  • The diversity of 3D protein structure underlies the very large

range of their function (Enzymes, Storage, Transport, Mes- sengers, Antibodies, Regulation, mechanical support).

  • A Protein is a polymer chain made of monomers (aminoacids)
  • f 20 different kinds.
  • Aminoacids have a common part (6 atoms) and a distinguish-

ing part (from 1 to 18 atoms).

  • They are typically identified by one letter in {A, . . . , Z}\{B, J, O, U, X, Z}.
  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 3/20

slide-4
SLIDE 4

Proteins

  • The Primary Structure is the sequence of aminoacids consti-

tuting a protein.

  • The Secondary Structures of a Protein are local structures (α-

helices, β-sheets) which formation is caused by local forces.

  • The Tertiary Structure, that determines macroscopic proper-

ties and biological functions, is the 3D conformation of the Protein.

  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 4/20

slide-5
SLIDE 5

Example: Protein 1ENH

  • Primary Structure:

R,P,R,T,A,F,S,S,E,Q, L,A,R,L,K,R,E,F,N,E, N,R,Y,L,T,E,R,R,R,Q, Q,L,S,S,E,L,G,L,N,E, A,Q,I,K,I,W,F,Q,N,K, R,A,K,I

  • Tertiary Structure: All atom Model / Simplified Model
  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 5/20

slide-6
SLIDE 6

The Protein Structure Prediction Problem

  • Proteins fold in a determined environment (e.g. water) to form

a very specific geometric pattern (native state/conformation).

  • The native conformation is relatively stable and unique, and

corresponds to a state which minimizes the global free energy.

  • The Protein Structure Prediction problem (PSP) consists in pre-

dicting the Tertiary Structure of a protein, given its Primary Structure.

  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 6/20

slide-7
SLIDE 7

Approaches to PSP

Several different approaches have been used to tackle PSP:

  • Homology modelling and folding recognition;
  • All atoms simulation using molecular dynamics;
  • Constraint-based approaches in lattices;
  • Ab-initio simulations using simplified models.
  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 7/20

slide-8
SLIDE 8

CCP simulation framework

  • We encoded the PSP problem into a Concurrent Constraint

(Logic) Programming paradigm.

  • Each aminoacid is associated to an independent process.
  • Each process communicates with the others, and reacts to their

changes of the spatial position.

  • The framework is independent from the spatial model of the

protein and from the energy model.

  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 8/20

slide-9
SLIDE 9

Communication Strategy

  • Each process, before performing a move, waits for the commu-

nication of a movement of some other aminoacid;

  • The information of position changes of process Pi is stored in

a list Li of logic terms (leaving the tail variable uninstantiated), thus keeping track of the entire history of the folding known to him;

  • Each process, while moving, uses the most recent information

available to him, i.e. the last ground terms of the lists Li;

  • each process, once it has moved, communicates to all other

processes its new position updating its list.

  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 9/20

slide-10
SLIDE 10

Abstract CCP program

1 simulation(S):- 2 Init=[[I1|_], [I2|_], ..., [In|_]], 3 run(1,S,Init) || 4 run(2,S,Init) || 5 ... || 6 run(n,S,Init). 7 run(ID, S, [P1, P2, ..., Pn]):- 8 getTails([P1, ..., Pn],[T1, ..., Tn]), 9 ask(T1=[_|_]) -> skip + 10 ask(T2=[_|_]) -> skip + 11 ... + 12 ask(Tn=[_|_]) -> skip, 13 getLast([P1, ..., Pn],[L1, ..., Ln]), 14 updatePosition(ID,S,[L1,..,Ln],NP), 15 tell(TID=[NP|_]), 16 run(ID,S,[P1, ..., Pn]).

  • The main clause is simulation.
  • Init is a variable containing n lists which contain the initial

positions Ii.

  • n concurrent calls to run are called, one for each process -

aminoacid.

  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 10/20

slide-11
SLIDE 11

Abstract CCP program

1 simulation(S):- 2 Init=[[I1|_], [I2|_], ..., [In|_]], 3 run(1,S,Init) || 4 run(2,S,Init) || 5 ... || 6 run(n,S,Init). 7 run(ID, S, [P1, P2, ..., Pn]):- 8 getTails([P1, ..., Pn],[T1, ..., Tn]), 9 ask(T1=[_|_]) -> skip + 10 ask(T2=[_|_]) -> skip + 11 ... + 12 ask(Tn=[_|_]) -> skip, 13 getLast([P1, ..., Pn],[L1, ..., Ln]), 14 updatePosition(ID,S,[L1,..,Ln],NP), 15 tell(TID=[NP|_]), 16 run(ID,S,[P1, ..., Pn]).

  • ID is the identification code of the aminoacid.
  • getTails gets the tails of the lists P1,...,Pn and assigns them

to the variables T1,...,Tn.

  • Then the process waits for one of these variables to be instan-

tiated with ask(Ti=[_|_]) -> skip.

  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 11/20

slide-12
SLIDE 12

Abstract CCP program

1 simulation(S):- 2 Init=[[I1|_], [I2|_], ..., [In|_]], 3 run(1,S,Init) || 4 run(2,S,Init) || 5 ... || 6 run(n,S,Init). 7 run(ID, S, [P1, P2, ..., Pn]):- 8 getTails([P1, ..., Pn],[T1, ..., Tn]), 9 ask(T1=[_|_]) -> skip + 10 ask(T2=[_|_]) -> skip + 11 ... + 12 ask(Tn=[_|_]) -> skip, 13 getLast([P1, ..., Pn],[L1, ..., Ln]), 14 updatePosition(ID,S,[L1,..,Ln],NP), 15 tell(TID=[NP|_]), 16 run(ID,S,[P1, ..., Pn]).

  • Once this happens it retrieves the last information with getLast.
  • Then it updates its position with updatePosition and communi-

cates its move to all other processes by means of tell.

  • Finally the run procedure is called recursively.
  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 12/20

slide-13
SLIDE 13

Movement Strategy

The procedure updatePosition works in the following way:

  • The aminoacid randomly chooses a new position, close to the

current one within a given step;

  • Using the most recent information available about the spatial

position of other processes, it computes the energy relative to the choice;

  • It accepts the position using a Montecarlo criterion:
  • If the new energy is lower than the current one, it accepts

the move;

  • If the new energy is greater than the current one, it

accepts the move with probability e−Enew−Ecurrent

T

.

  • This procedure depends on the spatial model adopted.
  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 13/20

slide-14
SLIDE 14

Movement Strategy

The new position is randomly selected in the following way:

  • We calculate the set of points which keep fixed the distance

with the adjacent neighbours of the aminoacid (a circumference

  • r a sphere);
  • We randomly select a point in this set, close to the current

position;

  • We randomly select a small offset from this point.
  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 14/20

slide-15
SLIDE 15

Testing Model

✬ ✫ ✩ ✪ ⑦

side chain Cα H

✘ ✘ ✘

N

❳❳ ❳ ③

H

❳❳ ❳ C′

O

✘✘ ✘ ✿

  • Each aminoacid is represented as a single center of interaction,

which corresponds to the Cα atom.

  • The energy function consists of four terms, which take into

account local and global interactions.

  • This model is very simple, but served as a test for the framework.
  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 15/20

slide-16
SLIDE 16

Energy Function

The Energy Function we use is E( s) = ηb Eb( s) + ηa Ea( s) + ηt Et( s) + ηc Ec( s) Eb( s) is the Bond Distance term Eb( s) =

  • 1≤i≤n−1
  • r(si, si+1) − r0

2

Ea( s) is the Bond Angle Bend term Ea( s) =

n−2

  • i=1

− log

 a1 e

βi−β1

σ1

2

+ a2 e

βi−β2

σ2

2 

1 2 3 0.2 0.4 0.6 0.8 1 Radians

  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 16/20

slide-17
SLIDE 17

Energy Function

The Energy Function we use is E( s) = ηb Eb( s) + ηa Ea( s) + ηt Et( s) + ηc Ec( s) Et( s) is the Torsional Angle term Et( s) =

n−3

  • i=1

− log

  a1 e

(Φi−φ1)2 (σ1+σ0)2 + a2 e (Φi−φ2)2 (σ2+σ0)2

  

Ec( s) is the Contact Interaction term Ec( s) =

n−3

  • i=1

n

  • j=i+3

 |Pot(si, sj)| r0(si, sj)

r(si, sj)

12

+ Pot(si, sj)

r0(si, sj)

r(si, sj)

6 

2 4 6 0.2 0.4 0.6 0.8 1 Radians 0 r 1r 2 r 3 r Potential

  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 17/20

slide-18
SLIDE 18

Implementation

The code is implemented in Mozart. There are two classes:

  • Protein, implements the simulation predicate and coordinates

the process associated with the single aminoacids;

  • Amino, which describes the single aminoacid, and implements all

the methods related to the action, like updatePosition, ask and tell.

  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 18/20

slide-19
SLIDE 19

Results

  • With our simplistic model we are able to obtain an helix from a

poly-alanine sequence in few seconds.

  • Running a simulation for some PDB proteins, the global mini-

mum of these proteins tends to move into another conformation, which has better energy according to our model. This clearly shows that a more robust energy model is required.

  • Comparing the concurrent simulation with a non–concurrent
  • ne, implemented in C, we noted that the folding pathway is dif-
  • ferent. In fact, in Mozart some amino acids compute the energy

with a partial updated information about the 3D conformation, due to concurrent communication and synchronism.

  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 19/20

slide-20
SLIDE 20

Future Work

We want to concentrate our efforts in the concurrent framework, developing:

  • new communication strategy to optimize the quantity of mes-

sages passed;

  • a set of cooperative and adaptive strategies, to model the

dynamic evolution of the system. We also want to implement a more realistic representation of the aminoacids, including the side chain, and a more realistic potential energy.

  • L. Bortolussi, A. Dal Pal`

u, A. Dovier, and F. Fogolari BIOCONCUR 2004 — 20/20