S.Will, 18.417, Fall 2011
Protein Structure Prediction
- Protein = chain of amino acids (AA)
- AAs connected by peptide bonds
Amino Acids
Levels of structure
Robert F. Service. Problem solved∗ (∗sort of). Science, 2008.
[this and some following slides inspired by Jinbo Xu, Jerome Waldispühl]
[Figure: correctly aligned (%), 0 to 100, versus target difficulty (easy to difficult), CASP1 to CASP7]
after competition
without overlaps: Self-Avoiding Walk
H H H P P P
without overlaps: Self-Avoiding Walk
H H H P P P
HH-contact
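A minimal sketch of the lattice picture above, assuming a 2D square lattice and the usual HP convention that an HH-contact is a pair of H residues that are lattice neighbors but not chain neighbors. The function names and the example sequence are illustrative, not taken from the slides.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Lattice distance between two points of the square lattice."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def is_self_avoiding(walk: List[Point]) -> bool:
    """True if consecutive points are lattice neighbors and no point is visited twice."""
    connected = all(manhattan(a, b) == 1 for a, b in zip(walk, walk[1:]))
    return connected and len(set(walk)) == len(walk)


def hh_contacts(seq: str, walk: List[Point]) -> int:
    """Count H-H pairs that are lattice neighbors but not adjacent in the chain."""
    count = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == "H" and seq[j] == "H" and manhattan(walk[i], walk[j]) == 1:
                count += 1
    return count


# Hypothetical example: a U-shaped fold brings the two H residues together.
seq = "HPPH"
walk = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(is_self_avoiding(walk), hh_contacts(seq, walk))
```

In the HP model the energy of a conformation is commonly taken as minus the number of HH-contacts, so the fold above would have energy -1 under that convention.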
ij)2
ij = ω′(i) − ω′(j)
(H=Hydrophobic, P=Positive, N=Negative, X=Neutral)
exp(−(E(s, ω′) − E(s, ω)) / T)
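A small sketch of how this expression is typically used, assuming the standard Metropolis rule: a move from conformation ω to ω′ is always accepted if it does not raise the energy, and otherwise accepted with probability exp(−(E(s, ω′) − E(s, ω)) / T). The function names are illustrative.

```python
import math
import random


def acceptance_probability(e_old: float, e_new: float, temperature: float) -> float:
    """Metropolis acceptance probability for a move from energy e_old to e_new."""
    if e_new <= e_old:
        return 1.0
    return math.exp(-(e_new - e_old) / temperature)


def metropolis_accept(e_old: float, e_new: float, temperature: float,
                      rng: random.Random) -> bool:
    """Draw a uniform number and accept the move with the Metropolis probability."""
    return rng.random() < acceptance_probability(e_old, e_new, temperature)


rng = random.Random(0)
# Downhill moves are always accepted; uphill moves only sometimes.
print(metropolis_accept(-1.0, -2.0, 1.0, rng))
print(acceptance_probability(-2.0, -1.0, 1.0))
```

Lowering T makes uphill moves rarer, which is the knob that simulated annealing turns during a run.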
(Here: the energy of each offspring is compared to average energy in population.)
smaller than the vibration of the system ⇒ on the order of femtoseconds = 10^−15 seconds!
motifs
structural size
within N- rms deviation structure of the center
clusters and returned to the user as predictions
[Figure, panels a, b, c. Legend: hydrophobic residues; positively charged residues; negatively charged residues; polar residues; hydrogen bonds; nonpolar atoms]
April 22nd, 2009 18.417 Lecture 20: Comparative modeling and side-chain packing 18/49
Backbone Prediction
– Ab initio folding
– Homology modeling
– Protein threading
Modeling
Side-Chain Packing
Structure Refinement
The picture is adapted from http://www.cs.ucdavis.edu/~koehl/ProModel/fillgap.html
– identification of homologous proteins through sequence alignment
– structure prediction through placing residues into “corresponding” positions of homologous structure models
– make structure prediction through identification of “good” sequence-structure fit
[Figure: query sequence and its best match; sequence label DRVYIHPFADRVYIHPFA]
Protein sequence classification database
Backbone Prediction
– Ab initio folding
– Homology modeling
– Protein threading
Modeling
Side-Chain Packing
Structure Refinement
The picture is adapted from http://www.cs.ucdavis.edu/~koehl/ProModel/fillgap.html
– “alignment” quality is measured by some statistics-based scoring function
– best overall “alignment” among all templates may give a structure prediction
[Figure: query sequence and its best match; sequence label DRVYIHPFADRVYIHPFA]
– how well a residue fits a structural environment: E_s (fitness score)
– how preferable it is to put two particular residues nearby: E_p (pairwise potential)
– alignment gap penalty: E_g (gap score)
– sequence similarity between query and template proteins: E_m (mutation score)
– how consistent the secondary structures are: E_ss
E = E_p + E_s + E_m + E_g + E_ss
Minimize E to find a sequence-template alignment
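As a toy illustration of the sum E = E_p + E_s + E_m + E_g + E_ss, the sketch below adds up the five terms for a few candidate alignments and keeps the one with the lowest total. The class name, the template labels, and all numeric values are made up for the example; real threading programs compute each term from the alignment itself.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ThreadingScore:
    """The five energy terms of one candidate sequence-template alignment."""
    e_p: float   # pairwise potential
    e_s: float   # fitness (environment) score
    e_m: float   # mutation score
    e_g: float   # gap score
    e_ss: float  # secondary-structure consistency

    def total(self) -> float:
        """E = E_p + E_s + E_m + E_g + E_ss (unweighted, as on the slide)."""
        return self.e_p + self.e_s + self.e_m + self.e_g + self.e_ss


def best_template(candidates: Dict[str, ThreadingScore]) -> str:
    """Return the label of the candidate with the lowest total energy."""
    return min(candidates, key=lambda name: candidates[name].total())


# Hypothetical scores for two templates.
candidates = {
    "template_A": ThreadingScore(-3.0, -2.0, -1.0, 2.0, -0.5),
    "template_B": ThreadingScore(-1.0, -1.0, -1.0, 0.5, 0.0),
}
print(best_template(candidates))
```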
template secondary structure
Could be based on chemical similarity, etc, etc.
1. Each residue as a vertex
2. One edge between two residues if their spatial distance is within a given cutoff.
3. Cores are the most conserved segments in the template
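Steps 1 and 2 of the contact graph can be sketched as follows, assuming each residue is represented by one 3D coordinate (for example its C-alpha atom) and distances are Euclidean. The function name, the coordinates, and the cutoff value are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import List, Set, Tuple

Coord = Tuple[float, float, float]


def contact_graph(coords: List[Coord], cutoff: float) -> Set[Tuple[int, int]]:
    """Vertices are residue indices; (i, j) with i < j is an edge if the
    two residues lie within the distance cutoff."""
    edges = set()
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            edges.add((i, j))
    return edges


# Hypothetical coordinates for three residues.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(sorted(contact_graph(coords, cutoff=7.5)))
```

This version checks all residue pairs, which is quadratic in the chain length; that is fine for a sketch, while larger structures usually call for a spatial grid or tree.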
– Can be reduced to MAX-CUT
– Interaction-frozen algorithm (A. Godzik et al.)
– Monte Carlo sampling (S.H. Bryant et al.)
– Double dynamic programming (D. Jones et al.)
– Branch-and-bound (R.H. Lathrop and T.F. Smith)
– PROSPECT-I uses divide-and-conquer (Y. Xu et al.)
– Linear programming by RAPTOR (J. Xu et al.)
Linear Program:
maximize z = 6x + 5y (linear function)
subject to 3x + y <= 11 and x, y >= 0 (linear constraints)
Integer Program: additionally x, y integer (integral constraints, nonlinear)
and core j is aligned to position k at the same time.
a: singleton score parameter
b: pairwise score parameter
Each y variable is 1 if and only if its two x variables are 1
Each core has only one alignment position
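The role of the x and y variables can be illustrated with a brute-force sketch. It assumes x[i][l] = 1 means core i is aligned to position l, and that a y variable is the product of its two x variables, so the pairwise parameter b contributes only when both alignments hold at the same time. This is not RAPTOR's solver: it simply enumerates every assignment, ignores the ordering constraints a real threading program needs, and uses made-up score tables.

```python
from itertools import product
from typing import Dict, List, Tuple


def best_alignment(a: List[List[float]],
                   b: Dict[Tuple[int, int, int, int], float]) -> Tuple[float, Tuple[int, ...]]:
    """Minimize sum(a*x) + sum(b*y) over all assignments of cores to positions.

    a[i][l] is the singleton score of aligning core i to position l.
    b[(i, l, j, k)] (with i < j) is the pairwise score paid when core i is at
    position l and core j is at position k at the same time.
    """
    n_cores, n_pos = len(a), len(a[0])
    best = None
    # Each tuple picks exactly one position per core, so the constraint
    # "each core has only one alignment position" holds by construction.
    for assign in product(range(n_pos), repeat=n_cores):
        energy = sum(a[i][assign[i]] for i in range(n_cores))
        for i in range(n_cores):
            for j in range(i + 1, n_cores):
                # y = 1 exactly when both x variables are 1.
                energy += b.get((i, assign[i], j, assign[j]), 0.0)
        if best is None or energy < best[0]:
            best = (energy, assign)
    return best


# Hypothetical scores for 2 cores and 2 positions.
a = [[1.0, 0.0], [0.0, 1.0]]
b = {(0, 1, 1, 0): 5.0, (0, 0, 1, 1): -3.0}
print(best_alignment(a, b))
```

In this toy instance the singleton scores alone would favor the assignment (1, 0), but the pairwise terms make (0, 1) the minimum, which shows why the y variables cannot be dropped.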
http://www.bioinformatics.uwaterloo.ca/~j3xu/raptor/index.php
http://robetta.bakerlab.org/index.html
http://www.sbg.bio.ic.ac.uk/~phyre/
Backbone Prediction
– Ab initio folding
– Homology modeling
– Protein threading
Modeling
Side-Chain Packing
Structure Refinement
The picture is adapted from http://www.cs.ucdavis.edu/~koehl/ProModel/fillgap.html
backbone coordinates of a protein, predict the coordinates of the side-chain atoms
structure is a geometric
features
protein structure into some very small blocks
Each residue has many possible side-chain positions. Each possible position is called a rotamer. Need to avoid atomic clashes.
[Figure: rotamer diagram with a marked clash; numeric labels 0.3, 0.2, 0.1, 0.1, 0.1, 0.3, 0.7, 0.6, 0.4]
Minimize the energy function to obtain the best side-chain packing. Assume rotamer A(i) is assigned to residue i. The side-chain packing quality is measured by
The higher the occurring probability, the smaller the value
[Figure: clash penalty plot; labels: clash penalty, distance between two atoms, atom radii; marked values 0.82, 1, 10]
complete to achieve an approximation ratio O(N)
[Chazelle et al, 2004]
structure [Dunbrack et al., 2003]
– One of the most popular side-chain packing programs
[Eriksson et al, 2001; Kingsford et al, 2004]
– The formulation is similar to that used in RAPTOR
r possible rotamers
Global conformation energy formulated as 2 parts:
$E_{global} = \sum_{i=1}^{N} E(i_r) + \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} E(i_r, j_s)$
(residues i, j, rotamers r, s)
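Eliminate i_r iff:
$E(i_r) + \sum_{j \neq i} \min_t E(i_r, j_t) > E(i_s) + \sum_{j \neq i} \max_t E(i_s, j_t)$
(the max terms bound the interaction between i_s and all other sidechains)
http://www.ch.embnet.org/CoursEMBnet/Pages3D08/slides/SIB-PhD-Day2_p.pdf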
cause new elimination that couldn’t have happened before
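A compact sketch of iterated dead-end elimination, assuming the classic criterion: rotamer r of residue i is eliminated if its best-case energy (self energy plus the minimum pairwise energy with every other residue) is still higher than the worst-case energy of some alternative rotamer s of the same residue. The data layout (`self_e`, `pair_e`) and the example energies are illustrative assumptions.

```python
from typing import Callable, List, Set

PairEnergy = Callable[[int, int, int, int], float]


def dead_end_elimination(self_e: List[List[float]], pair_e: PairEnergy) -> List[Set[int]]:
    """Return, per residue, the rotamers that survive repeated elimination.

    self_e[i][r] is the self energy of rotamer r at residue i;
    pair_e(i, r, j, s) is the pairwise energy of rotamers i_r and j_s.
    """
    n = len(self_e)
    alive = [set(range(len(rots))) for rots in self_e]

    def bound(i: int, r: int, pick) -> float:
        # pick=min gives the best case for i_r, pick=max the worst case.
        return self_e[i][r] + sum(
            pick(pair_e(i, r, j, t) for t in alive[j]) for j in range(n) if j != i
        )

    changed = True
    while changed:  # repeat: one elimination can enable another
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                best_r = bound(i, r, min)
                if any(best_r > bound(i, s, max) for s in alive[i] if s != r):
                    alive[i].discard(r)
                    changed = True
    return alive


# Hypothetical energies: 2 residues with 2 rotamers each.
table = {(0, 0, 1, 0): -1.0, (0, 0, 1, 1): 2.0}


def pair(i: int, r: int, j: int, s: int) -> float:
    if i > j:  # the table stores each pair once, with the smaller residue first
        i, r, j, s = j, s, i, r
    return table.get((i, r, j, s), 0.0)


self_energy = [[0.0, 10.0], [0.0, 0.0]]
print(dead_end_elimination(self_energy, pair))  # prints [{0}, {0}]
```

In this example rotamer 1 of residue 1 cannot be eliminated at first; it only becomes eliminable after rotamer 1 of residue 0 has been removed, which is why the loop runs until nothing changes.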