
Constraint Programming approaches to the Protein Folding Problem
Agostino Dovier
DIMI, University of Udine (IT)
www.dimi.uniud.it/dovier
www.dimi.uniud.it/dovier/PF

Outline of the talk
• Basic notions on Proteins
• Introduction to Protein Folding/Structure Prediction Problem
• The PFP as a constrained optimization problem (CLP(FD))
  ◦ Abstract modeling (HP) and solutions
  ◦ Realistic modeling and solutions
• Simulation (CCP) approach to the problem
• Other approaches
• Conclusions
Agostino Dovier, CILC’04, Parma, 16 June 2004

Proteins
• Proteins are abundant in all organisms and fundamental to life.
• The diversity of 3D protein structure underlies the very large range of their functions:
  ◦ Enzymes: biological catalysts
  ◦ Storage (e.g. ferritin in liver)
  ◦ Transport (e.g. haemoglobin)
  ◦ Messengers (transmission of nervous impulses, hormones)
  ◦ Antibodies
  ◦ Regulation (during the process of synthesizing proteins)
  ◦ Structural proteins (mechanical support, e.g. hair, bone)

Primary Structure
• A Protein is a polymer chain (a list) made of monomers (aminoacids).
• This list is called the Primary Structure.
• The typical length is 50–500.
• Aminoacids are of twenty types, called Alanine (A), Cysteine (C), Aspartic Acid (D), Glutamic Acid (E), Phenylalanine (F), Glycine (G), Histidine (H), Isoleucine (I), Lysine (K), Leucine (L), Methionine (M), Asparagine (N), Proline (P), Glutamine (Q), Arginine (R), Serine (S), Threonine (T), Valine (V), Tryptophan (W), Tyrosine (Y).
• Summary: The primary structure of a protein is a list of the form $[a_1, \dots, a_n]$ with $a_i \in \{A, \dots, Z\} \setminus \{B, J, O, U, X, Z\}$.
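A minimal sketch of the summary above: a primary structure is represented as a Python string over the twenty one-letter codes. The names `AMINOACIDS` and `is_primary_structure` are illustrative and do not come from the talk.

```python
import string

# The twenty one-letter codes: A..Z without B, J, O, U, X, Z.
AMINOACIDS = frozenset(string.ascii_uppercase) - frozenset("BJOUXZ")


def is_primary_structure(seq):
    """Return True if seq is a non-empty sequence over the 20-letter alphabet."""
    return len(seq) > 0 and all(a in AMINOACIDS for a in seq)


if __name__ == "__main__":
    print(sorted(AMINOACIDS))
    print(is_primary_structure("ACDEFG"))  # every letter is a valid code
    print(is_primary_structure("ABC"))     # B is not an aminoacid code
```

A string is used instead of an explicit list only for brevity; any sequence of one-letter codes works with the same check.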

Aminoacid Structure
[Figure: scheme of an aminoacid, showing the backbone atoms (N, Cα, C′ with their H and O atoms) and the side chain attached to Cα]
• The backbone is the same for all aminoacids.
• The side chain characterizes each aminoacid.
• Side chains contain from 1 (Glycine) to 18 (Tryptophan) atoms.

Example: Glycine and Arginine
• Glycine: C₂H₅NO₂ → 10 atoms
• Arginine: C₆H₁₄N₄O₂ → 26 atoms
• Remember the base scheme (9 atoms).
[Figure: molecular models of Glycine and Arginine next to the base scheme. Colour legend: White = H, Blue = N, Red = O, Grey = C]

Example: Alanine and Tryptophan
• Alanine: C₃H₇NO₂ → 13 atoms
• Tryptophan: C₁₁H₁₂N₂O₂ → 27 atoms
[Figure: molecular models of Alanine and Tryptophan next to the base scheme. Colour legend: White = H, Blue = N, Red = O, Grey = C]
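The atom counts in the two examples can be checked mechanically. The sketch below counts the atoms of a formula written in plain ASCII (e.g. `C2H5NO2`) and subtracts the 9 atoms of the base scheme to obtain the side-chain size. The function names are illustrative, and the parser assumes simple formulas without parentheses.

```python
import re

BASE_SCHEME_ATOMS = 9  # atoms of the base scheme shared by all aminoacids


def atom_count(formula):
    """Total number of atoms in a formula such as 'C6H14N4O2'."""
    total = 0
    # Each match is an element symbol followed by an optional multiplicity.
    for _element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += int(digits) if digits else 1
    return total


def side_chain_atoms(formula):
    """Atoms left once the 9-atom base scheme is removed."""
    return atom_count(formula) - BASE_SCHEME_ATOMS


if __name__ == "__main__":
    for name, formula in [("Glycine", "C2H5NO2"), ("Arginine", "C6H14N4O2"),
                          ("Alanine", "C3H7NO2"), ("Tryptophan", "C11H12N2O2")]:
        print(name, atom_count(formula), side_chain_atoms(formula))
```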

Aminoacid’s size

Name  Chemical formula  Side chain (atoms)    Name  Chemical formula  Side chain (atoms)
A     C₃H₇NO₂           4                     M     C₅H₁₁NO₂S         11
C     C₃H₇NO₂S          5                     N     C₄H₈N₂O₃          8
D     C₄H₇NO₄           7                     P     C₅H₉NO₂           8(∗)
E     C₅H₉NO₄           10                    Q     C₅H₁₀N₂O₃         11
F     C₉H₁₁NO₂          14                    R     C₆H₁₄N₄O₂         17
G     C₂H₅NO₂           1                     S     C₃H₇NO₃           5
H     C₆H₉N₃O₂          11                    T     C₄H₉NO₃           8
I     C₆H₁₃NO₂          13                    Y     C₉H₁₁NO₃          15
K     C₆H₁₄N₂O₂         15                    V     C₅H₁₁NO₂          10
L     C₆H₁₃NO₂          13                    W     C₁₁H₁₂N₂O₂        18

Images from: http://www.chemie.fu-berlin.de/chemistry/bio/amino-acids_en.html

Primary Structure, detailed
• The primary structure is a linked list of aminoacids.
• The terminals H (left) and OH (right) are lost in the linking phase.
[Figure: chain of aminoacids linked through their backbone atoms N, Cα and C′]

The Secondary Structure
• Locally, a protein can assume two particular forms:
  ◦ α-helix
  ◦ β-sheet
• This information is the Secondary Structure of a Protein.

The Tertiary Structure
• The complete 3D conformation of a protein is called the Tertiary Structure.
• Proteins fold in a determined environment (e.g. water) to form a very specific geometric pattern (native state).
• The native conformation is relatively stable and unique and (Anfinsen’s hypothesis) is the state with minimum free energy.
• The tertiary structure determines the function of a Protein.
• ∼26000 structures (most of them redundant) are stored in the PDB.
• The number of possible proteins of length ≤ 500 is $20^1 + 20^2 + \cdots + 20^{500} = O(20^{501}) \sim 10^{651}$.
• The secondary structure is believed to form before the tertiary.
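The size of the sequence space can be reproduced with exact integer arithmetic. This short sketch sums the powers of 20 directly and compares the result with the closed form of the geometric series; the variable names are illustrative.

```python
ALPHABET_SIZE = 20  # number of aminoacid types
MAX_LENGTH = 500    # longest protein considered

# Number of sequences of length 1..500 over a 20-letter alphabet.
total = sum(ALPHABET_SIZE ** k for k in range(1, MAX_LENGTH + 1))

# Closed form of the geometric series: (20^501 - 20) / 19.
closed_form = (ALPHABET_SIZE ** (MAX_LENGTH + 1) - ALPHABET_SIZE) // (ALPHABET_SIZE - 1)

digits = len(str(total))

if __name__ == "__main__":
    print(total == closed_form)
    print(digits)  # number of decimal digits of the total
```

The total is a 651-digit number, in line with the order of magnitude quoted on the slide.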

Example: Tertiary Structure of 1ENH
[Figure: tertiary structure of the protein 1ENH]

The Protein Folding Problem
• The Protein Structure Prediction (PSP) problem consists in predicting the Tertiary Structure of a protein, given its Primary Structure.
• The Protein Folding (PF) Problem consists in predicting the whole folding process that leads to the Tertiary Structure.
• Sometimes the two problems are not distinguished.
• A reliable solution is fundamental for medicine, agriculture, and industry.
• Let us focus on the PSP problem first.

The PSP Problem
• Anfinsen: the native state minimizes the whole protein energy. Two problems emerge.
  1 Energy model:
    ◦ What is the energy function E?
    ◦ What does it depend on?
  2 Spatial model: assuming E is known and depends on the aminoacids $a_1, \dots, a_n$ and on their positions, what is the search space in which to look for the conformation minimizing E?
    ◦ Lattice (discrete) models.
    ◦ Off-lattice (continuous) models.
• Once a choice for (1) and (2) is available, we can try to study and solve the minimization problem.
• If the solution space is finite, a brute-force algorithm can be written.

The PSP as a minimization problem
• We give a general formal definition of the problem, under the assumption that each aminoacid is considered as a whole: a sphere centered in its Cα atom.
[Figure: chain of aminoacids, each enclosed in a sphere centered in its Cα atom]
• It emerges from experiments on the known proteins that the distance between two consecutive Cα atoms is fixed (3.8 Å).
• Let $L$ be the set of admissible points for each aminoacid.
• Given the sequence $a_1 \dots a_n$, a folding is a function $\omega : \{1, \dots, n\} \to L$ such that:
  ◦ $\mathrm{next}(\omega(i), \omega(i+1))$ for $i = 1, \dots, n-1$, and
  ◦ $\omega(i) \neq \omega(j)$ for $i \neq j$.
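The two conditions defining a folding can be checked directly. The sketch below assumes, purely for illustration, that $L$ is an integer lattice and that next holds between points at unit Manhattan distance; the talk leaves both $L$ and next as parameters. A folding is stored as a 0-indexed list of points, and the names `next_to` and `is_folding` are hypothetical.

```python
def next_to(p, q):
    """Assumed 'next' relation: lattice points at unit Manhattan distance."""
    return sum(abs(a - b) for a, b in zip(p, q)) == 1


def is_folding(omega, next_rel=next_to):
    """Check both folding conditions for a list of points omega."""
    # Condition 1: consecutive aminoacids occupy 'next' points.
    consecutive = all(next_rel(omega[i], omega[i + 1])
                      for i in range(len(omega) - 1))
    # Condition 2: no two aminoacids share a point (omega is injective).
    injective = len(set(omega)) == len(omega)
    return consecutive and injective


if __name__ == "__main__":
    print(is_folding([(0, 0), (1, 0), (1, 1), (0, 1)]))  # a valid square walk
    print(is_folding([(0, 0), (1, 0), (0, 0)]))          # revisits a point
```

A different lattice or an off-lattice model only requires passing another `next_rel`.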

Objective function
• Assumption: the energy is the sum of the energy contributions of each pair of non-consecutive aminoacids.
• It depends on their distance and on their type. The contribution is of the form $\mathrm{en\_contrib}(\omega, i, j)$.
• The function to be minimized is therefore:
$$E(\omega) = \sum_{\substack{1 \le i \le n \\ i+2 \le j \le n}} \mathrm{en\_contrib}(\omega, i, j)$$
• It is a constrained minimization problem (recall that $\mathrm{next}(\omega(i), \omega(i+1))$ and $\omega(i) \neq \omega(j)$ must hold).
• It is parametric on $L$, next, and en_contrib.
• next and en_contrib are typically non linear.
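The objective function translates almost literally into code when en_contrib is passed as a parameter. In this sketch the indices are 0-based, so the pairs are those with j ≥ i + 2. The contribution `unit_contact` is a placeholder invented for the example (it ignores the aminoacid type); it is not the energy model of the talk.

```python
def energy(omega, en_contrib):
    """E(omega): sum of en_contrib over all non-consecutive pairs i, j."""
    n = len(omega)
    return sum(en_contrib(omega, i, j)
               for i in range(n)
               for j in range(i + 2, n))


def unit_contact(omega, i, j):
    """Placeholder contribution: -1 if the two points are lattice neighbours."""
    distance = sum(abs(a - b) for a, b in zip(omega[i], omega[j]))
    return -1 if distance == 1 else 0


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    line = [(0, 0), (1, 0), (2, 0), (3, 0)]
    print(energy(square, unit_contact))  # one contact, between the two ends
    print(energy(line, unit_contact))    # no contacts
```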

A first proposal for the Energy: DILL
• The aminoacids Cys (C), Ile (I), Leu (L), Phe (F), Met (M), Val (V), Trp (W), His (H), Tyr (Y), Ala (A) are hydrophobic (H).
• The aminoacids Lys (K), Glu (E), Arg (R), Ser (S), Gln (Q), Asp (D), Asn (N), Thr (T), Pro (P), Gly (G) are polar (P).
• The protein is in water:
  ◦ Hydrophobic elements tend to occupy the center of the protein.
  ◦ Consequently, H aminoacids tend to stay close to each other.
  ◦ Polar elements tend to stay on the outside (the frontier) of the protein.
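The two classes listed above give a direct translation from a primary structure to an HP string. This is a small sketch; the names `HYDROPHOBIC`, `POLAR` and `to_hp` are illustrative.

```python
# One-letter codes of the two classes, as listed on the slide.
HYDROPHOBIC = frozenset("CILFMVWHYA")
POLAR = frozenset("KERSQDNTPG")


def to_hp(seq):
    """Map each aminoacid code to 'H' (hydrophobic) or 'P' (polar)."""
    classes = []
    for a in seq:
        if a in HYDROPHOBIC:
            classes.append("H")
        elif a in POLAR:
            classes.append("P")
        else:
            raise ValueError(f"unknown aminoacid code: {a!r}")
    return "".join(classes)


if __name__ == "__main__":
    print(to_hp("ACKG"))  # A and C are hydrophobic, K and G are polar
```

Note that the letter H denotes Histidine in the input and the hydrophobic class in the output.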

A first proposal for the Energy: DILL
• This fact suggests an energy definition: if two aminoacids of type H are in contact (i.e. no more distant than a certain value) in a folding, they contribute negatively to the energy.
• The aminoacid is considered as a whole: a unique sphere centered in its Cα atom.
• The notion of being in contact is naturally formalized in lattice models: two aminoacids are in contact when their distance is one (or more) lattice units.
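Since the solution space of a lattice model is finite, the HP energy can be minimized by brute force on very short chains. The sketch below assumes a 2D square lattice, a contact threshold of exactly one lattice unit, and a contribution of −1 per H–H contact; these are illustrative choices, and the exhaustive search is not the constraint-based method of the talk.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # neighbours on the square lattice


def hp_energy(hp, omega):
    """-1 for every non-consecutive H-H pair at lattice distance 1."""
    e = 0
    for i in range(len(hp)):
        for j in range(i + 2, len(hp)):
            if hp[i] == "H" and hp[j] == "H":
                (x1, y1), (x2, y2) = omega[i], omega[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    e -= 1
    return e


def best_folding(hp):
    """Enumerate all self-avoiding walks and return (min energy, folding)."""
    if not hp:
        return 0, []
    best_e, best_omega = None, None

    def extend(omega):
        nonlocal best_e, best_omega
        if len(omega) == len(hp):
            e = hp_energy(hp, omega)
            if best_e is None or e < best_e:
                best_e, best_omega = e, list(omega)
            return
        x, y = omega[-1]
        for dx, dy in MOVES:
            p = (x + dx, y + dy)
            if p not in omega:  # keep the walk self-avoiding
                omega.append(p)
                extend(omega)
                omega.pop()

    extend([(0, 0)])  # the first aminoacid is fixed at the origin
    return best_e, best_omega


if __name__ == "__main__":
    print(best_folding("HPPH"))
```

The number of walks grows exponentially with the chain length, so this is only practical for a handful of aminoacids.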
