SLIDE 1

Fitting Protein Chains to Lattices.

Ján Maňuch with Daya Gaur (1st part & partially 2nd part), Shirley Huang, Robert Benkoczi (2nd part), Simon Fraser University

Fitting Protein Chains to Lattices. – p. 1/??

SLIDE 2

Proteins

Proteins are polymers constructed from linear sequences (chains) of amino acids. When placed in a solvent, they fold into 3D spatial structures that minimize the total energy.

Problem (Protein Folding Problem). How can we predict the 3D structure of a protein from the linear sequence of its amino acids?

SLIDE 3

Simplified model

  • Too many degrees of freedom: it is impossible to compute the structure precisely for proteins with more than 7 amino acids.
  • Most folding algorithms first fold the protein chain (see the figure on the left).
  • Simplified models assume that the centers of the amino acids (the Cα atoms) are placed at the vertices of a regular lattice whose edge length equals the distance between two consecutive Cα atoms in the protein (around 3.8 Å).
  • Example: folding into the cubic lattice (figure).

SLIDE 5

Protein folding in simple models

The simplest protein folding model was introduced by Dill (1985): not only are the (centers of the) residues placed at the vertices of a lattice, but the energy function is also simplified: instead of considering all the different forces affecting the folding process, only hydrophobic interactions between amino acids that are neighbors in the lattice are considered. The model is called the HP (hydrophobic/polar) model. Protein folding in the HP model was shown to be NP-complete in both the 3D cubic lattice (Berger, Leighton (1998)) and the 2D square lattice (Crescenzi, Goldman, Papadimitriou, Piccolboni, Yannakakis (1998)).
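The simplified HP energy can be illustrated with a short sketch. It assumes the usual convention that every contact between two H residues that are lattice neighbors but not consecutive in the chain contributes −1 (the slides do not state the weight); `hp_energy` is a hypothetical helper, and folds are given as integer coordinate tuples on the square or cubic lattice.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[int, ...]  # lattice vertex, e.g. (x, y) or (x, y, z)


def hp_energy(sequence: str, fold: Sequence[Point]) -> int:
    """HP-model energy of a lattice fold, assuming -1 per H-H contact.

    A contact is a pair of residues that are neighbours in the square/cubic
    lattice (unit distance) but are not consecutive in the chain.
    """
    if len(sequence) != len(fold):
        raise ValueError("sequence and fold must have the same length")
    for a, b in zip(fold, fold[1:]):
        if sum(abs(x - y) for x, y in zip(a, b)) != 1:
            raise ValueError("consecutive residues must occupy adjacent vertices")
    position: Dict[Point, int] = {tuple(q): i for i, q in enumerate(fold)}
    if len(position) != len(fold):
        raise ValueError("fold is not self-avoiding")

    energy = 0
    for i, q in enumerate(fold):
        if sequence[i] != "H":
            continue
        for axis in range(len(q)):  # visit the 2*dim lattice neighbours of q
            for step in (-1, 1):
                nb = list(q)
                nb[axis] += step
                j = position.get(tuple(nb))
                # j > i + 1 skips chain neighbours and counts each pair once
                if j is not None and j > i + 1 and sequence[j] == "H":
                    energy -= 1
    return energy


print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # -1
```

In the usage line, the two H residues at the ends of the U-shaped fold touch in the lattice, giving one contact.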

SLIDE 8

Accuracy of lattice models

Even though protein folding in lattice models is NP-complete, it is computationally more feasible than in the general (off-lattice) model. However, even the optimal fold in a given lattice model could be quite far from the real fold. We would like to identify those lattice models that have the potential to produce folds close to real 3D structures.

  • Question. How can we measure the ability of lattices to represent proteins (or certain types of proteins)?

Take the 3D structures of known proteins (from the PDB) and find their closest representations in a given lattice. Then measure the similarity between the original (PDB) structures and their lattice approximations.

SLIDE 10

Protein chain fitting (PCF) problem

Problem. Instance: An equilateral lattice $L$ with side 1, a sequence of points $p = p_1, \dots, p_n$ such that

(P1) $d(p_i, p_{i+1}) = 1$ for every $1 \le i < n$, and

(P2) $d(p_i, p_j) \ge 1$ for every $n \ge j > i + 1 \ge 2$,

a distance measure $\Delta$ on sequences of $n$ points, and a number $K$.

Question: Is there a path $l = l_1, \dots, l_n$ in $L$ such that $\Delta(p, l) \le K$?

One of the two most common distance measures is the coordinate root mean square deviation (c-RMS):

$$\text{c-RMS}(p, l) = \sqrt{\frac{\sum_{i=1}^{n} d^2(p_i, l_i)}{n}}$$
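The c-RMS measure can be computed directly from its definition. A minimal sketch, assuming points are 3D coordinate tuples in units of the lattice side (the name `c_rms` is illustrative); the chains are compared point by point, without any superposition, as in the problem statement.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def c_rms(p: Sequence[Point], l: Sequence[Point]) -> float:
    """Coordinate RMS deviation between the chain p and its lattice image l.

    The points are compared directly (no rotation or translation is applied).
    """
    if len(p) != len(l) or not p:
        raise ValueError("p and l must be non-empty and of equal length")
    total = sum(math.dist(pi, li) ** 2 for pi, li in zip(p, l))
    return math.sqrt(total / len(p))
```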

SLIDE 11

Protein chain fitting (PCF) problem

The other one is the distance root mean square deviation (d-RMS):

$$\text{d-RMS}(p, l) = \sqrt{\frac{\sum_{1 \le i < j \le n} \left[ d(p_i, p_j) - d(l_i, l_j) \right]^2}{n(n-1)/2}}$$
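A corresponding sketch for d-RMS (the name `d_rms` is hypothetical). Unlike c-RMS, it compares only distances within each chain, so translating or rotating $l$ does not change its value.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def d_rms(p: Sequence[Point], l: Sequence[Point]) -> float:
    """Distance RMS deviation over all pairs i < j of intra-chain distances."""
    n = len(p)
    if n != len(l) or n < 2:
        raise ValueError("p and l must have equal length n >= 2")
    total = sum(
        (math.dist(p[i], p[j]) - math.dist(l[i], l[j])) ** 2
        for i, j in combinations(range(n), 2)  # all pairs with i < j
    )
    return math.sqrt(total / (n * (n - 1) / 2))
```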

SLIDE 12

Protein chain fitting (PCF) problem

Figures: the 1A0M protein fitted to the cubic lattice; the 1GUU protein fitted to the truncated tetrahedron lattice.

SLIDE 13

Applications of PCF problem

  • Measuring the accuracy of lattice models, i.e., their ability to represent protein chains.
  • Genetic protein folding algorithms (into a lattice): the Cartesian combination operator generates a new chain from two existing lattice chains. Since the new chain is most likely an off-lattice chain, a PCF algorithm has to be used. Rabow, Scheraga (1996).

SLIDE 14

Existing algorithms for PCF problem

  • An exponential backtracking algorithm was introduced by Covell, Jernigan (1990).
  • Dynamic programming approximation algorithms were presented in several papers, e.g., Rykunov, Reva, Finkelstein (1995).
  • A greedy approach keeping about 500 “best” lattice folds was used by Park, Levitt (1995).
  • An improved DP algorithm: Rabow, Scheraga (1996).
  • The self-consistent mean field procedure: Koehl, Delarue (1998).

These are mostly approximation algorithms; the complexity of the problem was so far unknown.
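The greedy idea of keeping a fixed number of “best” partial lattice folds can be illustrated by a generic beam search. This is a hedged sketch, not a reproduction of any of the cited algorithms: it assumes the cubic lattice $\mathbb{Z}^3$ with unit edges and the c-RMS measure, and the names `beam_fit` and `beam_width` are hypothetical. Like the approximation algorithms above, it yields an upper bound with no optimality guarantee.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Node = Tuple[int, int, int]

# unit steps of the cubic lattice Z^3
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def beam_fit(p: Sequence[Vec], beam_width: int = 500) -> Tuple[float, List[Node]]:
    """Greedy beam search for a self-avoiding cubic-lattice path close to p.

    Keeps the `beam_width` partial paths with the smallest accumulated squared
    deviation and returns (c-RMS, path). Heuristic: no optimality guarantee.
    """
    start = tuple(round(c) for c in p[0])  # nearest lattice vertex to p_1
    beam = [(math.dist(p[0], start) ** 2, [start])]
    for i in range(1, len(p)):
        candidates = []
        for cost, path in beam:
            last, occupied = path[-1], set(path)
            for dx, dy, dz in STEPS:
                q = (last[0] + dx, last[1] + dy, last[2] + dz)
                if q not in occupied:  # keep the lattice path self-avoiding
                    candidates.append((cost + math.dist(p[i], q) ** 2, path + [q]))
        if not candidates:
            raise RuntimeError("every partial path is trapped")
        beam = heapq.nsmallest(beam_width, candidates, key=lambda c: c[0])
    best_cost, best_path = min(beam, key=lambda c: c[0])
    return math.sqrt(best_cost / len(p)), best_path
```

The sketch fixes $l_1$ at the lattice vertex nearest to $p_1$; this simplification is an assumption of the example only.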

SLIDE 15

Our results

(1) Determine the complexity of the PCF problem: we show that the problem is NP-complete for the c-RMS measure and the cubic lattice.
(2) Use integer programming (the CPLEX package) to solve the PCF problem exactly for known proteins (from the PDB).

SLIDE 16

NP-completeness of PCF problem

We use a reduction from a planar 3-SAT problem that was shown to be NP-complete by Lichtenstein (1982).

  • Problem. Var-linked planar 3-SAT (VLP-3-SAT)

Instance: A formula $\varphi$ with a set $C$ of clauses over a set $X$ of variables in conjunctive normal form such that:

(S1) Every clause contains at most three variables.
(S2) Every variable occurs in exactly three clauses, once negated and twice positive.
(S3) The set $X$ of variables allows a linear ordering, say $x_1, \dots, x_n$, such that the graph
$$G_\varphi = \left(C \cup X,\ \{xc : x \in c \in C \text{ or } \neg x \in c \in C\} \cup \{x_i x_{i+1} : i = 1, \dots, n\}\right)$$
is planar (here $x_{n+1} = x_1$).

Question: Is $\varphi$ satisfiable?
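Conditions (S1) and (S2) can be tested mechanically. The sketch below assumes a DIMACS-style encoding of clauses as lists of signed integers ($k$ for $x_k$, $-k$ for $\neg x_k$), which is not part of the slides; it does not test the planarity condition (S3), and its brute-force satisfiability check is exponential, so it is meant only for tiny instances. All names are hypothetical.

```python
from itertools import product
from typing import List

Clause = List[int]  # DIMACS-style literals: k stands for x_k, -k for "not x_k"


def satisfies_s1_s2(clauses: List[Clause], n_vars: int) -> bool:
    """Check conditions (S1) and (S2); the planarity condition (S3) is not checked."""
    # (S1): every clause contains at most three variables
    if any(len({abs(lit) for lit in c}) > 3 for c in clauses):
        return False
    # (S2): each variable occurs in exactly three clauses, once negated, twice positive
    for x in range(1, n_vars + 1):
        positive = sum(1 for c in clauses if x in c)
        negative = sum(1 for c in clauses if -x in c)
        if (positive, negative) != (2, 1):
            return False
    return True


def is_satisfiable(clauses: List[Clause], n_vars: int) -> bool:
    """Brute force over all 2^n assignments; only for tiny illustrative instances."""
    for values in product((False, True), repeat=n_vars):
        # a CNF formula is true iff every clause has at least one true literal
        if all(any(values[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False


phi = [[1, 2], [1, -2], [-1, 2]]
print(satisfies_s1_s2(phi, 2), is_satisfiable(phi, 2))  # True True
```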

SLIDE 20

Reduction

Set $K$ so that every protein point has to be mapped to one of the closest lattice points. We say that a point is flexible if the set of lattice points closest to it has at least two elements. For instance, centers of faces or cubes are flexible.
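For the cubic lattice $\mathbb{Z}^3$, the closest lattice points can be found coordinate by coordinate, because the squared distance is a sum over coordinates. The sketch below, assuming the c-RMS measure, also computes one natural choice of $K$: since every point contributes at least the squared distance to its nearest vertex, $\text{c-RMS}(p, l) \le K$ with $K = \sqrt{\sum_i d_i^2 / n}$ holds only if every $p_i$ is mapped to one of its closest lattice points. The helper names are hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def closest_lattice_points(x: Sequence[float], tol: float = 1e-9) -> List[Tuple[int, ...]]:
    """All vertices of the lattice Z^d at minimum distance from the point x."""
    # Each coordinate is rounded independently; an exact tie (.5) keeps both
    # neighbouring integers, which is what makes a point flexible.
    options = []
    for c in x:
        lo = math.floor(c)
        gap_lo, gap_hi = c - lo, lo + 1 - c
        if abs(gap_lo - gap_hi) <= tol:
            options.append((lo, lo + 1))
        elif gap_lo < gap_hi:
            options.append((lo,))
        else:
            options.append((lo + 1,))
    return [tuple(q) for q in product(*options)]


def is_flexible(x: Sequence[float]) -> bool:
    """A point is flexible if it has at least two closest lattice points."""
    return len(closest_lattice_points(x)) >= 2


def forcing_bound(p: Sequence[Sequence[float]]) -> float:
    """Lower bound on c-RMS(p, l) over all lattice sequences l.

    With K set to this value, c-RMS(p, l) <= K holds only if every p_i is
    mapped to one of its closest lattice points.
    """
    total = sum(math.dist(x, closest_lattice_points(x)[0]) ** 2 for x in p)
    return math.sqrt(total / len(p))
```

For example, the center of a cube face has 4 closest vertices and the center of a cube has 8, so both are flexible.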

Basic building blocks (schematic drawings): “wire” “flipper”

SLIDE 21

Basic building blocks

Figures: lattice approximations of the “wire”; lattice approximations of the “flipper”.

SLIDE 22

2-clause

Replace every 2-clause of the planar 3-SAT formula with the following gadget (figure):


Remark: Use other planes to connect the subsequences into one sequence.

SLIDE 24

3-clause

We need to use the third dimension. 3D-flipper (figure; points labelled $q_1, \dots, q_8$) and its lattice approximations (figure; points labelled $p$, $q$, $r$).

SLIDE 25

3-clause (continued)

A 3-clause gadget has two parts, a 3D part and a 2D part (figure; points labelled $p$, $q$, $r$, $r_1$, $r_2$).

SLIDE 26

Variable gadget

Figure: variable gadget (terminals labelled $x$, $\neg x$, $\neg x$, $x$).

SLIDE 28

Connecting clause and variable gadgets

Figure: the 2-clause gadget for a clause $c$ connected to the variable gadget for a variable $x$ (points labelled $s_1$, $s_2$, $s_3$, $s_4$).

  • Theorem. The problem of mapping a sequence of points to a lattice path such that each point is mapped to one of its closest lattice points is NP-complete.

  • Corollary. The PCF problem (c-RMS measure, cubic lattice) is NP-complete as well.

SLIDE 31

IP formulation of PCF problem

First approach: Edge Model I. Binary variables: for $i = 1, \dots, n-1$ and $(u, v) \in L$, $X_{i,u,v} = 1$ if the protein edge $(p_i, p_{i+1})$ is mapped to the lattice edge $(u, v)$.

Constraints: (1) each protein edge is mapped to a unique lattice edge:

$$\forall i = 1, \dots, n-1: \quad \sum_{(u,v) \in L} X_{i,u,v} = 1$$

(2) adjacent protein edges are mapped to adjacent lattice edges:

$$\forall i = 1, \dots, n-2,\ \forall (u,v) \in L: \quad X_{i,u,v} + \sum_{(w,z) \in L,\ w \ne v} X_{i+1,w,z} \le 1$$

SLIDE 33

IP formulation of PCF problem

Constraints: (3) (path 1) no two mapped lattice edges have the same terminal vertex:

$$\forall v \in L: \quad \sum_{i = 1, \dots, n-1;\ (u,v) \in L} X_{i,u,v} \le 1$$

(4) (path 2) the initial vertex of the lattice edge mapped from $(p_1, p_2)$ is different from the terminal vertex of any mapped lattice edge:

$$\forall (u,v) \in L: \quad X_{1,u,v} + \sum_{i = 1, \dots, n-1;\ (w,u) \in L} X_{i,w,u} \le 1$$
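The 0/1 solutions of Edge Model I correspond to self-avoiding lattice paths. The sketch below, assuming the cubic lattice $\mathbb{Z}^3$, encodes a lattice path as the set of variables $X_{i,u,v}$ equal to 1 and checks constraints (1)–(4) for such a solution; it does not solve the IP, and all names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

Node = Tuple[int, int, int]
Var = Tuple[int, Node, Node]  # (i, u, v) with X_{i,u,v} = 1; i is 1-based


def is_lattice_edge(u: Node, v: Node) -> bool:
    """Adjacency in the cubic lattice Z^3."""
    return sum(abs(a - b) for a, b in zip(u, v)) == 1


def encode_path(path: Sequence[Node]) -> Set[Var]:
    """Variables set to 1 when protein edge (p_i, p_{i+1}) goes to (l_i, l_{i+1})."""
    return {(i + 1, path[i], path[i + 1]) for i in range(len(path) - 1)}


def satisfies_edge_model_1(ones: Set[Var], n: int) -> bool:
    """Check constraints (1)-(4) for a 0/1 solution given by its 1-variables."""
    if any(not is_lattice_edge(u, v) for _, u, v in ones):
        return False
    per_index = Counter(i for i, _, _ in ones)
    # (1): each protein edge i = 1..n-1 is mapped to exactly one lattice edge
    if set(per_index) != set(range(1, n)) or any(c != 1 for c in per_index.values()):
        return False
    edge = {i: (u, v) for i, u, v in ones}
    # (2): given (1), it says the terminal vertex of edge i starts edge i+1
    if any(edge[i][1] != edge[i + 1][0] for i in range(1, n - 1)):
        return False
    # (3): no lattice vertex is the terminal vertex of two mapped edges
    terminals = Counter(v for _, _, v in ones)
    if any(c > 1 for c in terminals.values()):
        return False
    # (4): the initial vertex of the first mapped edge is not a terminal vertex
    return edge[1][0] not in terminals
```

A path that returns to its start violates (4), and a path that revisits any other vertex violates (3); together they make the chosen lattice edges a self-avoiding path.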

SLIDE 35

IP formulation of PCF problem

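Objective function: minimize

$$\sum_{i = 1, \dots, n-1;\ (u,v) \in L} \alpha_{i,u,v} X_{i,u,v},$$

where

$$\alpha_{1,u,v} = d^2(p_1, u) + \tfrac{1}{2} d^2(p_2, v),$$
$$\alpha_{i,u,v} = \tfrac{1}{2} d^2(p_i, u) + \tfrac{1}{2} d^2(p_{i+1}, v) \quad \text{for } i = 2, \dots, n-2,$$
$$\alpha_{n-1,u,v} = \tfrac{1}{2} d^2(p_{n-1}, u) + d^2(p_n, v).$$

Every inner point $p_i$ is an end point of two protein edges and receives weight $\tfrac{1}{2}$ in each, so if every protein edge $(p_i, p_{i+1})$ is mapped to $(l_i, l_{i+1})$, the objective equals $\sum_{i=1}^{n} d^2(p_i, l_i) = n \cdot \text{c-RMS}(p, l)^2$.

Size of the problem (number of variable occurrences): (1) $O(n|L|)$; (2) $O(n|L|^2)$; (3) $O(n|L|)$; (4) $O(n|L|)$.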
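The role of the weights $\tfrac{1}{2}$ can be checked numerically. The sketch below (hypothetical helpers `alpha` and `objective`) evaluates the objective for a given lattice path and can be compared with $\sum_i d^2(p_i, l_i)$. For $n = 2$ the single edge gets weight 1 at both ends, which the code handles automatically.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def alpha(p: Sequence[Point], i: int, u: Point, v: Point) -> float:
    """Coefficient alpha_{i,u,v} for protein edge i (1-based, 1 <= i <= n-1)."""
    n = len(p)
    w_u = 1.0 if i == 1 else 0.5      # p_1 lies only on the first edge
    w_v = 1.0 if i == n - 1 else 0.5  # p_n lies only on the last edge
    # p[i - 1] is p_i and p[i] is p_{i+1} in the 1-based notation of the slides
    return w_u * math.dist(p[i - 1], u) ** 2 + w_v * math.dist(p[i], v) ** 2


def objective(p: Sequence[Point], path: Sequence[Point]) -> float:
    """IP objective when protein edge i is mapped to (l_i, l_{i+1})."""
    return sum(alpha(p, i, path[i - 1], path[i]) for i in range(1, len(p)))
```

Dividing `objective(p, path)` by `len(p)` and taking the square root then gives c-RMS(p, path).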

SLIDE 38

IP formulation of PCF problem

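Second approach: Edge Model II. (2) adjacent protein edges are mapped to adjacent lattice edges:

$$\forall i = 1, \dots, n-2,\ \forall v \in L: \quad \sum_{(u,v) \in L} X_{i,u,v} = \sum_{(v,w) \in L} X_{i+1,v,w}$$

Size of the problem (number of variable occurrences): (1)–(4) $O(n|L|)$, since each equation of (2) involves only the lattice edges incident to $v$, whose number is bounded by the degree of the lattice.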
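To see the difference in size concretely, one can count the variable occurrences in constraint (2) of both models. A sketch, assuming a finite $k \times k \times k$ box of the cubic lattice so that $|L|$ is finite (the box and the helper names `box_lattice` and `occurrences_constraint_2` are assumptions of the example):

```python
from itertools import product
from typing import Dict, List, Tuple

Node = Tuple[int, int, int]


def box_lattice(k: int) -> Tuple[List[Node], List[Tuple[Node, Node]]]:
    """Vertices and directed unit edges of the k x k x k box of Z^3."""
    nodes = list(product(range(k), repeat=3))
    node_set = set(nodes)
    edges = []
    for u in nodes:
        for axis in range(3):
            for step in (-1, 1):
                v = list(u)
                v[axis] += step
                if tuple(v) in node_set:
                    edges.append((u, tuple(v)))
    return nodes, edges


def occurrences_constraint_2(n: int, k: int) -> Tuple[int, int]:
    """Variable occurrences in constraint (2) of Edge Model I and Edge Model II."""
    nodes, edges = box_lattice(k)
    out_deg: Dict[Node, int] = {v: 0 for v in nodes}
    in_deg: Dict[Node, int] = {v: 0 for v in nodes}
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    m = len(edges)
    # Model I: one inequality per (i, (u, v)); it contains X_{i,u,v} and every
    # X_{i+1,w,z} whose edge does not start at v.
    model_1 = (n - 2) * sum(1 + m - out_deg[v] for _, v in edges)
    # Model II: one equation per (i, v); it contains the edges into and out of v.
    model_2 = (n - 2) * sum(in_deg[v] + out_deg[v] for v in nodes)
    return model_1, model_2
```

In Model II every directed edge is counted once as incoming and once as outgoing, so the count per index $i$ is exactly twice the number of directed edges, i.e., linear in $|L|$.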

SLIDE 41

Decreasing the number of variables

Theoretically, $|L|$ is unbounded, as a lattice $L$ is an infinite structure. For every $i = 1, \dots, n$, we want to construct the smallest possible set $S_i \subseteq V(L)$ such that if the point $p_i$ is mapped to a lattice vertex not in $S_i$, then the objective function is certainly not minimized. Once we have determined sufficiently small sets $S_i$, we consider only the variables $X_{i,u,v}$ such that $u \in S_i$, $v \in S_{i+1}$ and $(u, v) \in L$, i.e., we assume that all other variables have value 0.
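Generating only the retained variables is straightforward once the sets $S_i$ are known. A sketch for the cubic lattice (the function name is hypothetical): instead of testing all pairs $u \in S_i$, $v \in S_{i+1}$, it visits the six lattice neighbors of each $u$.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int, int]
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def restricted_variables(S: List[Set[Node]]) -> List[Tuple[int, Node, Node]]:
    """Triples (i, u, v) with u in S_i, v in S_{i+1} and (u, v) a cubic-lattice edge.

    S[0] holds S_1, so the returned index i is 1-based as in X_{i,u,v}.
    """
    keep = []
    for i in range(len(S) - 1):
        for u in S[i]:
            for dx, dy, dz in STEPS:  # only the 6 lattice neighbours of u
                v = (u[0] + dx, u[1] + dy, u[2] + dz)
                if v in S[i + 1]:
                    keep.append((i + 1, u, v))
    return keep
```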

SLIDE 42

Distance-based method

Determine a distance $D_i$ such that if $p_i$ is mapped to a lattice point $q$ with $d(p_i, q) > D_i$, then the objective function is not minimized. Then set $S_i = \{q \in V(L) : d(p_i, q) \le D_i\}$.

Example.

  • 1. Determine an upper bound $U$ on the objective function (using an existing approximation algorithm, or the IP formulation with small sets $S_i$).
  • 2. Let $d_i$ be the distance of $p_i$ from the closest lattice vertex.
  • 3. Then set $D_i^2 = U - \sum_{j \ne i} d_j^2$: every point $p_j$ contributes at least $d_j^2$ to the objective, so mapping $p_i$ to a point farther than $D_i$ gives a value above $U$.

Alternatively, divide $p$ into small chunks (not containing $p_i$) and use backtracking to determine the minimal cost of fitting each chunk to the lattice; these costs are lower bounds at least as large as the corresponding sums of $d_j^2$. This works for proteins with up to 100 amino acids.
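The example above can be sketched for the cubic lattice $\mathbb{Z}^3$, assuming the objective is $\sum_i d^2(p_i, l_i)$ and that $U$ comes from some approximation algorithm. The function name is hypothetical, and the small tolerance in the distance test is an assumption to absorb floating-point rounding. Each set $S_i$ is enumerated from the integer points of the bounding box of the ball of radius $D_i$ around $p_i$.

```python
import math
from itertools import product
from typing import List, Sequence, Set, Tuple

Node = Tuple[int, int, int]


def candidate_sets(p: Sequence[Sequence[float]], U: float) -> List[Set[Node]]:
    """Distance-based sets S_i for the cubic lattice Z^3.

    U is an upper bound on sum_i d^2(p_i, l_i), e.g. from a heuristic fold.
    """
    # d_i^2: squared distance of p_i to its nearest vertex of Z^3
    d2 = [sum((c - round(c)) ** 2 for c in x) for x in p]
    total = sum(d2)
    if U < total:
        raise ValueError("U is below the lower bound sum_i d_i^2")
    sets = []
    for x, di2 in zip(p, d2):
        Di2 = U - (total - di2)  # D_i^2 = U - sum_{j != i} d_j^2
        Di = math.sqrt(Di2)
        box = [range(math.ceil(c - Di), math.floor(c + Di) + 1) for c in x]
        sets.append({q for q in product(*box) if math.dist(x, q) ** 2 <= Di2 + 1e-12})
    return sets
```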

SLIDE 43

General method

Use the previous method to determine the initial sets $S_i$.

Divide the protein $p$ into chunks. For each chunk $p_i, \dots, p_j$ and for every $q \in S_i$ and $q' \in S_j$, compute the minimum cost of fitting the chunk to the lattice when starting in vertex $q$ and finishing in vertex $q'$. Use dynamic programming to glue the chunks together. Based on these values, recompute the sets $S_i$ for the end points of the chunks. Decrease the size of the other sets $S_i$ by repeatedly applying the formula

$$S_i := \{q \in S_i : \exists\, q' \in S_{i-1},\ q'' \in S_{i+1} \text{ such that } (q', q), (q, q'') \in L\}.$$

This works for all proteins in the PDB (which are up to 1200 amino acids long).
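The shrinking step can be sketched as a fixpoint iteration over the sets $S_i$ of the cubic lattice. This is a hedged sketch: the chunk-based dynamic programming is not shown, the treatment of $S_1$ and $S_n$ (checking only their single neighboring set) is an assumption, and the names are hypothetical. An empty set signals that no lattice path stays inside the current sets.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int, int]
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def neighbours(q: Node) -> List[Node]:
    """The 6 neighbours of q in the cubic lattice Z^3."""
    return [(q[0] + dx, q[1] + dy, q[2] + dz) for dx, dy, dz in STEPS]


def prune(S: List[Set[Node]]) -> List[Set[Node]]:
    """Apply S_i := {q in S_i : q has a lattice neighbour in S_{i-1} and in S_{i+1}}
    until no set changes. For S_1 and S_n only the one existing side is checked."""
    S = [set(s) for s in S]  # work on copies
    changed = True
    while changed:
        changed = False
        for i in range(len(S)):
            keep = {
                q for q in S[i]
                if (i == 0 or any(r in S[i - 1] for r in neighbours(q)))
                and (i == len(S) - 1 or any(r in S[i + 1] for r in neighbours(q)))
            }
            if keep != S[i]:
                S[i], changed = keep, True
    return S
```

Removing a vertex from one set can make vertices in neighboring sets unsupported, which is why the formula is applied repeatedly until nothing changes.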

SLIDE 44

Open problems / Future work

Is the PCF problem NP-complete also in 2D? What about other lattices or the d-RMS measure? Design and optimize an IP formulation for the d-RMS measure. What about allowing rigid transformations of the protein's sequence of 3D points before fitting it to the lattice?

  • Remark. It can be shown that this problem is NP-complete as well: just add a “stabilizer” to one end of the protein sequence.
