Protein Structure Prediction 1 Ram Samudrala, University of - - PowerPoint PPT Presentation


SLIDE 1

Ram Samudrala, University of Washington

Protein Structure Prediction

SLIDE 2

Rationale for Understanding Protein Structure and Function

Protein sequence

  • large numbers of sequences, including whole genomes

Protein function

  • rational drug design and treatment of disease
  • protein and genetic engineering
  • build networks to model cellular pathways
  • study organismal function and evolution

(Diagram: protein sequence → ? → protein function, with links labelled structure determination, structure prediction, homology, rational mutagenesis, biochemical analysis, model studies.)

Protein structure

  • three dimensional
  • complicated
  • mediates function
SLIDE 3

Protein Folding

…-L-K-E-G-V-S-K-D-… (protein sequence)
…-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GUU-… (codons; each codon specifies one amino acid)

DNA → protein sequence → unfolded protein → native state

Spontaneous self-organization (~1 second).

  • Unfolded protein: not unique, mobile, inactive, expanded, irregular

SLIDE 4

Protein Folding

…-L-K-E-G-V-S-K-D-… (protein sequence)
…-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GUU-… (codons; each codon specifies one amino acid)

DNA → protein sequence → unfolded protein → native state

Spontaneous self-organisation (~1 second).

  • Unfolded protein: not unique, mobile, inactive, expanded, irregular
  • Native state: unique shape, precisely ordered, stable/functional, globular/compact, helices and sheets

SLIDE 5

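Protein Folding Landscape

Large multi-dimensional space of changing conformations

(Figure: free energy along the folding reaction, from the unfolded state via a molten globule to the native state, with barriers of height ΔG* between states.)

  • molten globule: jump time J = 10⁻⁸ s
  • native state: jump time J = 10⁻³ s

A barrier of height ΔG* is crossed with a probability proportional to $e^{-\Delta G^{*}/RT}$, so the time for a jump grows as $J \propto e^{\Delta G^{*}/RT}$: the higher the barrier, the slower the jump.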
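Under the exponential relation above, a ratio of jump times can be turned into a difference in barrier height. A minimal sketch, assuming both jumps share the same pre-exponential factor (an assumption the slide does not state) and using R in kcal/(mol·K); `barrier_difference` is a hypothetical helper:

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def barrier_difference(t_slow, t_fast, temperature=298.15):
    """Extra barrier height (kcal/mol) implied by the ratio t_slow / t_fast,
    assuming jump time ∝ exp(ΔG*/RT) with the same prefactor for both jumps."""
    return R_KCAL * temperature * math.log(t_slow / t_fast)

# Native (1e-3 s) versus molten-globule (1e-8 s) jump times from the slide
ddg = barrier_difference(1e-3, 1e-8)
print(f"{ddg:.1f} kcal/mol")
```

A factor of 10⁵ in time corresponds to roughly 6.8 kcal/mol of extra barrier at room temperature.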

SLIDE 6

Protein Primary Structure

  • twenty types of amino acids, each with an amino group (N–H), a carboxyl group (C=O, OH) and a side chain R attached to the central Cα atom
  • two amino acids join by forming a peptide bond
  • each residue in the amino acid main chain has two degrees of freedom (φ and ψ)
  • the amino acid side chains can have up to four degrees of freedom (χ1–χ4)
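The φ, ψ and χ degrees of freedom are torsion (dihedral) angles defined by four consecutive atoms; for example, φ uses C(i−1), N(i), Cα(i), C(i). A minimal pure-Python sketch of the standard dihedral calculation (IUPAC sign convention), with hypothetical helper names:

```python
import math

def _sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) about the p1-p2 bond,
    e.g. phi = C(i-1), N(i), CA(i), C(i); psi = N(i), CA(i), C(i), N(i+1)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]               # unit vector along the central bond
    # components of b0 and b2 perpendicular to the central bond
    v = [b0[k] - _dot(b0, b1) * b1[k] for k in range(3)]
    w = [b2[k] - _dot(b2, b1) * b1[k] for k in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Four atoms in a planar zig-zag give ±180° (trans), the eclipsed arrangement gives 0° (cis).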

SLIDE 7

Protein Secondary Structure

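(Figure: Ramachandran plot of ψ against φ, each axis from −180° to +180°, with the β, α and L regions marked.)

many φ,ψ combinations are not possible

(Figures: α helix; β sheet (anti-parallel); β sheet (parallel); strand directions marked N → C.)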

SLIDE 8

Protein Tertiary and Quaternary Structures

Ribonuclease inhibitor (2bnh) Haemoglobin (1hbh) Hemagglutinin (1hgd)

SLIDE 9

Methods for Determining Protein Structure

Protein sequence

  • large numbers of sequences, including whole genomes

Protein function

  • rational drug design and treatment of disease
  • protein and genetic engineering
  • build networks to model cellular pathways
  • study organismal function and evolution

(Diagram: protein sequence → ? → protein function, with links labelled X-ray crystallography, NMR spectroscopy, homology, rational mutagenesis, biochemical analysis, model studies.)

Protein structure

  • three dimensional
  • complicated
  • mediates function

Experimental structure determination (X-ray crystallography, NMR spectroscopy) is expensive and slow.

SLIDE 10

A Naïve Approach

  • Use first principles to produce the native conformation of a protein
  • not only the correct structure, but the entire energy landscape
  • it would explain the dynamic behavior of a protein

Let’s see how this could work…

  • there are only 5 atom types (C, H, O, N, S), so if we could accurately model the interactions between them, we could solve the folding problem

So why is it so complicated?

  • atomic interactions cannot be modeled with sufficient accuracy (plus, proteins are only marginally stable)
  • some phenomena are highly non-linear (for example, van der Waals forces)
  • there is a large number of degrees of freedom, plus water molecules must be modeled

ab initio!!!

SLIDE 11

Predictions Needed NOW!!!

  • A pure ab initio approach will be out of reach for a long time
  • We must adopt a less purist approach

What should we do?

  • use approximations
  • use all available information:
    – vast number of sequences
    – large number of structures
    – functional site information
SLIDE 12

Methods for Predicting Protein Structure

Protein sequence

  • large numbers of sequences, including whole genomes

Protein function

  • rational drug design and treatment of disease
  • protein and genetic engineering
  • build networks to model cellular pathways
  • study organismal function and evolution

(Diagram: protein sequence → ? → protein function, with links labelled comparative modeling, fold recognition, ab initio prediction, homology, rational mutagenesis, biochemical analysis, model studies.)

Protein structure

  • three dimensional
  • complicated
  • mediates function
SLIDE 13

Overall Approach

Protein sequence: database searching, domain assignment, multiple sequence alignment, secondary structure and disorder prediction

  • Homologue in PDB? Yes: comparative modelling → 3-D protein model
  • No homologue: fold recognition. Predicted fold? Yes: sequence-structure alignment → 3-D protein model
  • No predicted fold: ab initio structure prediction → 3-D protein model

modified from http://bioinf.cs.ucl.ac.uk

SLIDE 14

Comparative (Homology) Modeling of Protein Structure

  • Aims to produce protein models with high accuracy
  • Proteins that have similar sequences (i.e., are related by evolution) have similar three-dimensional structures
  • A model of a protein whose structure is not known can be constructed if the structure of a related protein has been determined by experimental methods
  • Similarity must be obvious and significant for good models to be built
  • Need ways to build regions that are not similar between the two related proteins
  • Need ways to move the model closer to the native structure
SLIDE 15

Comparative Modeling of Protein Structure

    KDHPFGFAVPTKNPDGTMNLMNWECAIP
    KDPPAGIGAPQDN----QNIMLWNAVIP
    ** * *   *  *     * * *   **

scan → align → build initial model → construct non-conserved side chains and main chains → refine
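The conservation line under the alignment marks identical aligned residues. A minimal sketch that recomputes it and a percent identity; here identity is counted over aligned (non-gap) positions, which is one of several conventions in use, and `compare_alignment` is a hypothetical helper:

```python
def compare_alignment(target, template):
    """Return (conservation line, identical count, aligned count) for two
    gapped sequences of equal length; '-' marks a gap."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    marks, identical, aligned = [], 0, 0
    for a, b in zip(target, template):
        if a == "-" or b == "-":       # gap: not counted as an aligned pair
            marks.append(" ")
            continue
        aligned += 1
        if a == b:
            identical += 1
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks), identical, aligned

line, same, n = compare_alignment("KDHPFGFAVPTKNPDGTMNLMNWECAIP",
                                  "KDPPAGIGAPQDN----QNIMLWNAVIP")
print(line)
print(f"{same}/{n} aligned positions identical ({100 * same / n:.0f}%)")
```

For this pair, 11 of the 24 aligned positions are identical.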

SLIDE 16

Let’s Look Closer at Steps of Homology Modeling

  • 1. Template recognition and initial alignment
  • 2. Alignment correction
  • 3. Backbone generation
  • 4. Loop modeling
  • 5. Side-chain modeling
  • 6. Model optimization
  • 7. Model validation
SLIDE 19

1. Template Recognition

  • Recognition of similarity between the target and the template
  • Target: protein with unknown structure
  • Template: protein with known structure
  • Main difficulty: deciding which template to pick when there are multiple candidate template structures
  • A template structure can be found by searching for structures in the PDB using sequence-sequence alignment methods
SLIDE 20

Two Zones of Sequence Alignment

(Figure: sequence identity (%) plotted against alignment length (residues), divided into a safe homology modeling zone and a twilight zone.)

SLIDE 21

3. Backbone Generation

1. If the alignment between target and template is ready, copy the backbone coordinates of those template residues that are aligned.
2. If two aligned residues are the same, copy their side chain coordinates as well.
SLIDE 22

4. Loop Modeling

    AHYATPTTT
    AH---TPSS

(Figure labels: insertion, deletion.)

  • Insertions and deletions occur mostly between secondary structures, in the loop regions.
  • Loop conformations are difficult to predict.
  • Approaches to loop modeling:
    – knowledge-based: searches the PDB for loops with known structure
    – energy-based: an energy function is used to evaluate the quality of a loop, which is optimized by energy minimization or Monte Carlo
SLIDE 23

4. Loop Modeling – Database Approach

Scan the database and search for protein fragments with the correct number of residues and the correct end-to-end distance.
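The database search above can be sketched as a filter on fragment length and end-to-end Cα distance. The fragment list, the tolerance value and the function name are hypothetical; a real search would also superimpose candidate ends onto the anchor residues:

```python
import math

def find_loop_candidates(fragments, n_residues, anchor_start, anchor_end, tol=0.5):
    """Keep fragments with the required number of residues whose end-to-end
    CA distance matches the gap between the two anchors within `tol` Å.

    fragments: list of (name, [CA coordinates]) from known structures.
    Returns fragment names, best distance match first.
    """
    target_span = math.dist(anchor_start, anchor_end)
    hits = []
    for name, ca in fragments:
        if len(ca) != n_residues:
            continue
        deviation = abs(math.dist(ca[0], ca[-1]) - target_span)
        if deviation <= tol:
            hits.append((deviation, name))
    return [name for _, name in sorted(hits)]

fragments = [
    ("f1", [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]),
    ("f2", [(0, 0, 0), (2, 0, 0), (4, 0, 0), (5, 0, 0)]),
    ("f3", [(0, 0, 0), (1.5, 0, 0), (3, 0, 0)]),
]
candidates = find_loop_candidates(fragments, 4, (0, 0, 0), (3.2, 0, 0))
```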
SLIDE 24

5. Side-Chain Modeling

  • Side chain conformations are described by rotamers.
  • In similar proteins, side chains have similar conformations.
  • If % identity is high, side chain conformations can be copied from the template to the target.
  • If % identity is not very high, side chains are modeled using libraries of rotamers, and the different rotamers are scored with energy functions.
  • Problem: side chain conformations depend on the backbone conformation, which is predicted, not real.

(Figure: three candidate rotamers with energies E1, E2, E3; the chosen rotamer has E = min(E1, E2, E3).)
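Choosing E = min(E1, E2, E3) for one residue extends naturally to a whole protein. A minimal greedy sketch, assuming hypothetical scoring callbacks for side chain–backbone and side chain–side chain energies; production packers use more thorough searches than this greedy pass:

```python
def pack_side_chains(residues, library, self_energy, pair_energy):
    """Greedy rotamer selection.

    residues: residue identifiers in packing order.
    library: dict residue -> list of candidate rotamers (e.g. chi1 values).
    self_energy(res, rot): energy of a rotamer against the fixed backbone.
    pair_energy(res1, rot1, res2, rot2): side chain-side chain interaction.
    Each residue takes the lowest-energy rotamer given earlier choices.
    """
    chosen = {}
    for res in residues:
        def total(rot):
            return self_energy(res, rot) + sum(
                pair_energy(res, rot, other, other_rot)
                for other, other_rot in chosen.items())
        chosen[res] = min(library[res], key=total)   # E = min(E1, E2, E3, ...)
    return chosen

# Toy example: three chi1 rotamers per residue, made-up energies
library = {1: [-60, 180, 60], 2: [-60, 180, 60]}
self_e = lambda res, rot: {-60: 1.0, 180: 0.0, 60: 2.0}[rot]
pair_e = lambda r1, a, r2, b: 5.0 if a == b else 0.0   # clash if same rotamer
packing = pack_side_chains([1, 2], library, self_e, pair_e)
```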
SLIDE 25

6. Model Optimization

  • Energy optimization of the entire structure.
  • Since the conformation of the backbone depends on the conformations of the side chains and vice versa, an iterative approach is used: predict rotamers, shift the backbone, and repeat.
SLIDE 26

6. Model Optimization???

CASP5 assessors, homology modeling category: “We are forced to draw the disappointing conclusion that, similarly to what observed in previous editions of the experiment, no model resulted to be closer to the target structure than the template to any significant extent.”

The consensus is not to refine the model, as refinement usually pulls the model away from the native structure!!
SLIDE 27

Historical Perspective on Comparative Modeling

Experiment | alignment | side chain | short loops | longer loops
BC | excellent | ~80% | 1.0 Å | 2.0 Å

SLIDE 28

Historical Perspective on Comparative Modeling

Experiment | alignment | side chain | short loops | longer loops
BC | excellent | ~80% | 1.0 Å | 2.0 Å
CASP1 | poor | ~50% | ~3.0 Å | > 5.0 Å

SLIDE 29

Prediction for CASP4 target T128/sodm

Cα RMSD of 1.0 Å for 198 residues (PID 50%)
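Cα RMSD is the root-mean-square distance between corresponding Cα atoms of the model and the experimental structure. A minimal sketch, assuming the two coordinate sets are already optimally superimposed (a real comparison first superimposes them, for example with the Kabsch algorithm); `ca_rmsd` is a hypothetical helper:

```python
import math

def ca_rmsd(model, native):
    """RMSD (Å) between two equally long lists of CA coordinates that are
    assumed to be superimposed already."""
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(model, native))
    return math.sqrt(total / len(model))

# Two residues, each displaced by 1 Å along z
print(ca_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```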

SLIDE 30

Prediction for CASP4 target T122/trpa

Cα RMSD of 2.9 Å for 241 residues (PID 33%)

SLIDE 31

Prediction for CASP4 target T125/sp18

Cα RMSD of 4.4 Å for 137 residues (PID 24%)

SLIDE 32

Prediction for CASP4 target T112/dhso

Cα RMSD of 4.9 Å for 348 residues (PID 24%)

SLIDE 33

Prediction for CASP4 target T92/yeco

Cα RMSD of 5.6 Å for 104 residues (PID 12%)

SLIDE 34

CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50–10% sequence identity

  • T128/sodm – 1.0 Å (198 residues; 50%)
  • T111/eno – 1.7 Å (430 residues; 51%)
  • T122/trpa – 2.9 Å (241 residues; 33%)
  • T125/sp18 – 4.4 Å (137 residues; 24%)
  • T112/dhso – 4.9 Å (348 residues; 24%)
  • T92/yeco – 5.6 Å (104 residues; 12%)

Comparative Modeling at CASP – conclusions

Experiment | alignment | side chain | short loops | longer loops
BC | excellent | ~80% | 1.0 Å | 2.0 Å
CASP1 | poor | ~50% | ~3.0 Å | > 5.0 Å
CASP2 | fair | ~75% | ~1.0 Å | ~3.0 Å
CASP3 | fair | ~75% | ~1.0 Å | ~2.5 Å
CASP4 | fair | ~75% | ~1.0 Å | ~2.0 Å

SLIDE 35

Structural Genomics Project

  • Aim to solve the structure of all proteins: this is too much work experimentally!
  • Solve enough structures so that the remaining structures can be inferred from those experimental structures
  • The number of experimental structures needed depends on our ability to generate models.

SLIDE 36

Structural Genomics Project

(Figure: proteins with known structures among unknown proteins.)

SLIDE 37

Fold Recognition

  • Goal: to find the protein with known structure that best matches a given sequence
  • Since the similarity between the target and its closest template is not high, sequence-sequence alignment methods fail
  • Solution: threading, a sequence-structure alignment method

SLIDE 38

Fold Recognition

  • The number of possible protein structures/folds is limited (a large number of sequences, but few folds)
  • Proteins that do not have similar sequences sometimes have similar three-dimensional structures
  • A sequence whose structure is not known is fitted directly (or “threaded”) onto a known structure, and the “goodness of fit” is evaluated using a discriminatory function
  • Need ways to move the model closer to the native structure

(Example: NK-lysin (1nkl) and bacteriocin T102/as48 (1e68): 3.6 Å, 5% sequence identity.)

SLIDE 39

Fold Recognition

    KDHPFGFAVPTKNPDGTMNLMNWECAIP
    KDPPAGIGAPQDN----QNIMLWNAVIP
    ** * *   *  *     * * *   **

evaluate fit → build initial model → construct non-conserved side chains and main chains → refine

SLIDE 40

Steps in Threading

  • Step 1: Construction of Template Library
  • Step 2: Design of Scoring Function
  • Step 3: Sequence-Structure Alignment
  • Step 4: Template Selection and Model Construction

Only step 1 is relatively easy!

SLIDE 41

Steps in Threading

(Figure: the target sequence threaded onto the template, with α and β structure taken from the template structure.)

SLIDE 42

Threading – Method for Structure Prediction

  • Sequence-structure alignment
    – the target sequence is compared to all structural templates from the database
  • Requires:
    – an alignment method: dynamic programming, Monte Carlo, …
    – a scoring function: yields a relative score for each alternative alignment

SLIDE 43

Template Database

A representative set of protein structures extracted from the PDB database. It satisfies the following conditions:

1. The resolution of each representative structure should be good;
2. A good X-ray structure has higher priority than an NMR structure;
3. The sequence identity between any two representatives should be no more than 30%, in order to save computing time.

Examples:

  • CATH: http://www.biochem.ucl.ac.uk/bsm/cath/
  • SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
  • PDB_SELECT: http://www.cmbi.kun.nl/gv/pdbsel/

SLIDE 44

Scoring Function for Threading

  • A contact-based scoring function depends on the amino acid types of two residues and the distance between them.
  • A sequence-sequence alignment scoring function does not depend on the distance between two residues.
  • If the distance between two non-adjacent residues in the template is less than 8 Å, these residues make a contact.
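The contact definition above can be turned into a contact list for a template. A minimal sketch, assuming each residue is represented by its Cα atom (other reference atoms such as Cβ are also common) and using 0-based positions; `contact_map` is a hypothetical helper:

```python
import math

def contact_map(ca_coords, cutoff=8.0):
    """Pairs (i, j) of non-adjacent residues (j > i + 1) whose CA atoms
    are closer than `cutoff` Å."""
    contacts = []
    for i in range(len(ca_coords)):
        for j in range(i + 2, len(ca_coords)):    # skip self and chain neighbours
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts

# Four CA atoms on a straight line, 3.8 Å apart
line_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
contacts = contact_map(line_coords)
```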

SLIDE 45

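Scoring Function for Threading

$S = \sum_{(i,j)} w(a_i, a_j)$, summed over the N contacts (i, j) of the template; for example, with target residues Ala, Ile, Tyr and Trp placed so that Ala contacts Tyr and Ile contacts Trp, $S = w(\text{Ala}, \text{Tyr}) + w(\text{Ile}, \text{Trp})$

  • w: calculated from the frequency of amino acid contacts in the PDB
  • a_i: amino acid type of the target sequence aligned with position i of the template
  • N: number of contacts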
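Given a threading (which target residue sits at each template position) and the template's contacts, the score S is a sum of pairwise weights. A minimal sketch with hypothetical weights; w is stored once per unordered pair and looked up in either order:

```python
def threading_score(target_at_position, contacts, w):
    """S = sum over template contacts (i, j) of w(a_i, a_j).

    target_at_position: target amino acid aligned to each template position.
    contacts: list of (i, j) template position pairs in contact.
    w: symmetric contact weights, stored as w[(x, y)] or w[(y, x)].
    """
    score = 0.0
    for i, j in contacts:
        a, b = target_at_position[i], target_at_position[j]
        score += w[(a, b)] if (a, b) in w else w[(b, a)]
    return score

# Ala-Tyr and Ile-Trp contacts with made-up weights
S = threading_score("AIYW", [(0, 2), (1, 3)], {("A", "Y"): 0.5, ("W", "I"): -0.25})
```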

SLIDE 46

Class work: calculate the score for target sequence “ATPIIGGLPY” aligned to the template structure, which is defined by the contact matrix, using $S = \sum_{(i,j)} w(a_i, a_j)$.

(Figures: a 10 × 10 contact matrix for template positions 1–10, with contacts marked *; and a table of contact weights w for amino acids A, T, P, Y, I, G, L.)

SLIDE 47

Alignment Algorithms

  • Dynamic programming
    – “frozen approximation”: traceback in the alignment matrix is not possible for interactions between two amino acids, because the score of a target residue would depend on which residue is aligned to its contact partner; the partner is therefore taken from the template, so that:

$S = \sum_{(i,j)} w(a_i, b_j)$

b: amino acid type from the template, not from the target; now the score of each position does not depend on the alignment elsewhere in the sequence.

  • Monte Carlo
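Because the frozen approximation makes each position's score independent of the rest of the alignment, ordinary dynamic programming applies. A minimal global-alignment sketch that maximises the score; the linear gap score, the default weight of 0 for missing pairs, and all names are hypothetical choices for illustration:

```python
def frozen_threading(target, template, contacts, w, gap=-1.0):
    """Align `target` to `template` by DP under the frozen approximation.

    Placing target residue a at template position p scores the sum of
    w(a, b_q) over p's contact partners q, where b_q is the template's own
    residue at q. Returns (best score, {target index: template position}).
    """
    partners = {p: [] for p in range(len(template))}
    for i, j in contacts:
        partners[i].append(j)
        partners[j].append(i)

    def weight(x, y):
        return w.get((x, y), w.get((y, x), 0.0))   # missing pair scores 0 (assumption)

    def site(a, p):
        return sum(weight(a, template[q]) for q in partners[p])

    n, m = len(target), len(template)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + site(target[i - 1], j - 1),
                          S[i - 1][j] + gap, S[i][j - 1] + gap)
    # traceback: record the template position occupied by each target residue
    placement, i, j = {}, n, m
    while i > 0 and j > 0:
        if S[i][j] == S[i - 1][j - 1] + site(target[i - 1], j - 1):
            placement[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return S[n][m], placement

# Toy letters: template positions 0 and 3 are in contact
score, placement = frozen_threading("XY", "ABCD", [(0, 3)],
                                    {("X", "D"): 2.0, ("Y", "A"): 2.0})
```

Here X is placed at position 0 (its partner in the frozen template is D) and Y at position 3 (partner A), paying two gaps.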

SLIDE 48

Pairwise Threading Algorithms

  • Approximation algorithms
    – Interaction-frozen algorithm (A. Godzik et al.)
    – Monte Carlo sampling (S.H. Bryant et al.)
    – Double dynamic programming (D. Jones et al.)
  • Exact algorithms
    – Branch-and-bound (R.H. Lathrop and T.F. Smith)
    – Divide-and-conquer, used by PROSPECT-I (Y. Xu et al.)
    – Linear programming, used by RAPTOR (J. Xu et al.)

SLIDE 49

Non-Pairwise Threading Algorithms

  • Sequence-sequence alignment
  • Sequence-profile alignment
  • Sequence-HMM alignment
    – e.g. SAMT02 (K. Karplus et al.)
  • Profile-sequence alignment
    – e.g. PDB-Blast (A. Godzik et al.)
  • Profile-profile alignment
    – e.g. PROSPECT-II (Y. Xu et al.)
  • Combinations of several alignments
    – e.g. 3DPS (L.A. Kelley et al.), SHGU (D. Fischer)

SLIDE 50

Threading Model Validation

  • Correct bond lengths and bond angles
  • Correct placement of functionally important sites
  • Prediction of global topology, not partial alignment (minimum number of gaps)

(Figure: a gap leaving consecutive Cα atoms much more than 3.8 Å apart.)
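Consecutive Cα atoms in a trans peptide are about 3.8 Å apart, so a gap in a threading alignment that leaves them much further apart shows up as a chain break. A minimal check, with a hypothetical tolerance and function name:

```python
import math

CA_CA = 3.8  # typical distance (Å) between consecutive CA atoms (trans peptide)

def chain_breaks(ca_coords, tolerance=0.5):
    """Indices i where the CA(i) -> CA(i+1) distance deviates from 3.8 Å
    by more than `tolerance` Å."""
    return [i for i in range(len(ca_coords) - 1)
            if abs(math.dist(ca_coords[i], ca_coords[i + 1]) - CA_CA) > tolerance]

model_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
breaks = chain_breaks(model_ca)   # the last pair is 12.4 Å apart
```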

SLIDE 51

Placement of functionally important sites in threading.

Prediction of the structure of methylglyoxal synthase based on the template of carbamoyl phosphate synthase

SLIDE 52

GenThreader

1. Predicts secondary structures for the target sequence
2. Makes sequence profiles (PSSMs) for each template sequence
3. Uses a threading scoring function to find the best matching profile

http://bioinf.cs.ucl.ac.uk/psipred

SLIDE 53

Threading – Conclusions

  • Threading models are generally not suitable for applications such as drug design
  • Function prediction is only possible if the fold family is associated with a single function

SLIDE 54

Overall Approach

Protein sequence: database searching, domain assignment, multiple sequence alignment, secondary structure prediction, disorder prediction

  • Homologue in PDB? Yes: comparative modelling → 3-D protein model
  • No homologue: fold recognition. Predicted fold? Yes: sequence-structure alignment → 3-D protein model
  • No predicted fold: ab initio structure prediction → 3-D protein model

http://bioinf.cs.ucl.ac.uk

SLIDE 55

Ab Initio Methods

SLIDE 56

What is an atom?

  • Classical mechanics: a solid object
  • Defined by its position (x, y, z), its shape (usually a ball) and its mass
  • May carry an electric charge (positive or negative), usually partial (less than the charge of an electron)

SLIDE 57

Atomic interactions

  • Bonds are 2-body
  • Angles are 3-body
  • Torsion angles are 4-body
  • Non-bonded pairs
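The 3-body angles and 4-body torsions follow directly from the 2-body bond list. A minimal sketch; treating only atoms separated by more than two bonds as non-bonded pairs is a common convention, assumed here (many force fields also scale 1-4 pairs):

```python
from itertools import combinations

def bonded_terms(n_atoms, bonds):
    """Derive angles (3-body), torsions (4-body) and non-bonded pairs from bonds."""
    neighbours = {a: set() for a in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # angle i-j-k: two bonds sharing the central atom j
    angles = [(i, j, k) for j in range(n_atoms)
              for i, k in combinations(sorted(neighbours[j]), 2)]
    # torsion i-j-k-l: central bond j-k plus one further neighbour on each side
    torsions = []
    for j, k in bonds:
        for i in neighbours[j] - {k}:
            for l in neighbours[k] - {j}:
                if i != l:                      # skip three-membered rings
                    torsions.append((i, j, k, l))
    # non-bonded: pairs not linked by a bond or an angle (1-2 and 1-3 excluded)
    excluded = {frozenset(b) for b in bonds} | {frozenset((i, k)) for i, _, k in angles}
    nonbonded = [p for p in combinations(range(n_atoms), 2) if frozenset(p) not in excluded]
    return angles, torsions, nonbonded

# A four-atom chain 0-1-2-3 (like the carbon backbone of butane)
angles, torsions, nonbonded = bonded_terms(4, [(0, 1), (1, 2), (2, 3)])
```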

SLIDE 58

Forces between atoms: strong bonded interactions

  • Bonds (all chemical bonds, length b): $U = K(b - b_0)^2$
  • Angles (angle θ between chemical bonds): $U = K(\theta - \theta_0)^2$
  • Torsion angles (φ), with preferred conformations: $U = K\left(1 - \cos(n\phi)\right)$
    – ω angle of the main chain
    – χ angles of the side chains (aromatic, …)
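The three bonded terms translate directly into code. A minimal sketch with hypothetical function names; K, b₀, θ₀ and n are per-term force-field parameters, and angles are in radians:

```python
import math

def bond_energy(b, b0, k):
    """U = K (b - b0)^2 for a bond of length b with equilibrium length b0."""
    return k * (b - b0) ** 2

def angle_energy(theta, theta0, k):
    """U = K (theta - theta0)^2, angles in radians."""
    return k * (theta - theta0) ** 2

def torsion_energy(phi, k, n):
    """U = K (1 - cos(n phi)), phi in radians; n sets the number of minima."""
    return k * (1.0 - math.cos(n * phi))
```

With n = 3 the torsion term has minima at 0°, 120° and 240°, and maxima halfway between them.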

SLIDE 59

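Forces between atoms: van der Waals interactions

Lennard-Jones potential:

$E_{LJ}(r) = \varepsilon_{ij}\left[\left(\frac{R_{ij}}{r}\right)^{12} - 2\left(\frac{R_{ij}}{r}\right)^{6}\right]$, with $R_{ij} = R_i + R_j$ and $\varepsilon_{ij} = \sqrt{\varepsilon_i \varepsilon_j}$

The repulsive 1/r¹² term and the attractive 1/r⁶ term balance at r = R_ij, where the energy reaches its minimum, −ε_ij.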
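A minimal sketch of the Lennard-Jones term in the R_ij (minimum-position) form. The combining rules used here, R_ij = R_i + R_j and ε_ij = √(ε_i ε_j), are an assumption; force fields differ in how they combine per-atom parameters:

```python
import math

def lj_pair_parameters(r_i, r_j, eps_i, eps_j):
    """Combining rules assumed here: R_ij = R_i + R_j, eps_ij = sqrt(eps_i * eps_j)."""
    return r_i + r_j, math.sqrt(eps_i * eps_j)

def lennard_jones(r, r_ij, eps_ij):
    """E_LJ(r) = eps_ij [(R_ij/r)^12 - 2 (R_ij/r)^6]; minimum of -eps_ij at r = R_ij."""
    x = (r_ij / r) ** 6
    return eps_ij * (x * x - 2.0 * x)

r_ij, eps_ij = lj_pair_parameters(1.0, 1.0, 0.1, 0.4)
e_min = lennard_jones(r_ij, r_ij, eps_ij)   # energy at the minimum
```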

SLIDE 60

Forces between atoms: Electrostatic interactions

Coulomb potential between charges q_i and q_j at distance r:

$E(r) = \frac{1}{4\pi\varepsilon_0\varepsilon}\,\frac{q_i q_j}{r}$
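In molecular-modeling units (charges in elementary charges, distances in Å, energies in kcal/mol), the factor 1/(4πε₀) is approximately 332.06. A minimal sketch with a hypothetical function name, where `dielectric` is the relative permittivity ε:

```python
COULOMB_KCAL = 332.0637  # approx. 1/(4*pi*eps0) in kcal*Å/(mol*e^2)

def coulomb_energy(q_i, q_j, r, dielectric=1.0):
    """E(r) = q_i q_j / (4 pi eps0 eps r) in kcal/mol, with charges in units of e
    and r in Å."""
    return COULOMB_KCAL * q_i * q_j / (dielectric * r)

# Opposite unit charges 2 Å apart, in vacuum and in a water-like dielectric
e_vacuum = coulomb_energy(1.0, -1.0, 2.0)
e_water = coulomb_energy(1.0, -1.0, 2.0, dielectric=80.0)
```

The dielectric constant of about 80 for water shows how strongly solvent screens charge-charge interactions.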

SLIDE 61

Some Common Force Fields in Computational Biology

  • ENCAD (Michael Levitt, Stanford)
  • AMBER (Peter Kollman, UCSF; David Case, Scripps)
  • CHARMM (Martin Karplus, Harvard)
  • OPLS (Bill Jorgensen, Yale)
  • MM2/MM3/MM4 (Norman Allinger, U. Georgia)
  • ECEPP (Harold Scheraga, Cornell)
  • GROMOS (Van Gunsteren, ETH, Zurich)

Michael Levitt. The birth of computational structural biology. Nature Structural Biology, 8, 392-393 (2001)

SLIDE 62

Protein Structure Prediction

  • One popular model for protein folding assumes a sequence of events:
    – Hydrophobic collapse
    – Local interactions stabilize secondary structures
    – Secondary structures interact to form motifs
    – Motifs aggregate to form tertiary structure

SLIDE 63

Protein Structure Prediction

A physics-based approach:

  • find the conformation of the protein corresponding to a thermodynamic minimum (free energy minimum)
  • cannot minimize internal energy alone! Needs to include solvent
  • simulate folding… a very long process! Folding times are in the millisecond to second range, while folding simulations at best cover 1 ns in one day…
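The gap between these timescales is easy to quantify. A minimal arithmetic sketch, taking the slide's figure of 1 ns of simulated time per day of computing; `simulation_days` is a hypothetical helper:

```python
def simulation_days(folding_time_s, ns_per_day=1.0):
    """Days of computing needed to simulate `folding_time_s` seconds of folding
    at `ns_per_day` nanoseconds of simulated time per day."""
    return folding_time_s / (ns_per_day * 1e-9)

days = simulation_days(1e-3)   # a fast folder: 1 ms
years = days / 365.25
```

Even a 1 ms folder needs about a million days of computing at this rate, roughly 2,700 years.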

SLIDE 64

What is a molecular dynamics simulation?

  • Simulation that shows how the atoms in the system move with time
  • Typically on the nanosecond timescale
  • Atoms are treated like hard balls, and their motions are described by Newton’s laws.

SLIDE 65

Why MD simulations?

  • Link physics, chemistry and biology
  • Model phenomena that cannot be observed experimentally
  • Understand protein folding…
  • Access to thermodynamic quantities (free energies, binding energies, …)

SLIDE 66

Characteristic protein motions

Type of motion | Amplitude | Timescale
Local: bond stretching | < 1 Å | 0.01 ps
Local: angle bending | < 1 Å | 0.1 ps
Local: methyl rotation | < 1 Å | 1 ps
Medium scale: loop motions, SSE formation | 1–5 Å | ns – μs
Global: protein tumbling (water tumbling) | > 5 Å | 20 ns (20 ps)
Global: protein folding | > 5 Å | ms – hrs

Motions range from periodic (harmonic) to random (stochastic).

SLIDE 67

The Ergodic Hypothesis

  • Time averages = ensemble averages:

$\langle A \rangle_{\text{time}} = \langle A \rangle_{\text{ensemble}}$

SLIDE 68

The Folding @ Home initiative

(Vijay Pande, Stanford University)

http://folding.stanford.edu/

SLIDE 69

The Folding @ Home initiative

SLIDE 70

Folding @ Home: Results

(Figure: predicted versus experimentally measured folding times, both in nanoseconds on logarithmic axes from 1 to 100,000, for PPA, alpha helix, beta hairpin, villin and BBAW.)

Experiments:

  • villin: Raleigh, et al, SUNY, Stony Brook
  • BBAW: Gruebele, et al, UIUC
  • beta hairpin: Eaton, et al, NIH
  • alpha helix: Eaton, et al, NIH
  • PPA: Gruebele, et al, UIUC

http://pande.stanford.edu/

SLIDE 71

Protein Structure Prediction

  • DECOYS: generate a large number of possible shapes
  • DISCRIMINATION: select the correct, native-like fold

Need good decoy structures. Need a good energy function.

SLIDE 72

The CASP experiment

  • CASP = Critical Assessment of protein Structure Prediction
  • Based on an idea from John Moult (Moult, Pedersen, Judson, Fidelis, Proteins, 23:2-5 (1995))
  • First run in 1994; now runs regularly every second year (CASP6 was held last December)

SLIDE 73

The CASP experiment: how it works

1) Sequences of target proteins are made available to CASP participants in June–July of a CASP year
  • the structure of the target protein is known, but not yet released in the PDB, or even accessible
2) CASP participants have between 2 weeks and 2 months over the summer of a CASP year to generate up to 5 models for each of the targets they are interested in
3) Model structures are assessed against the experimental structure
4) CASP participants meet in December to discuss results

SLIDE 74

CASP Statistics

Experiment | # of targets | # of predictors | # of 3D models
CASP1 | 33 | 35 | 100
CASP2 | 42 | 72 | 947
CASP3 | 43 | 61 | 1256
CASP4 | 43 | 111 | 5150
CASP5 | 67 | 175 | 22909
CASP6 | 87 | 166 | 28965

SLIDE 75

CASP

Three categories at CASP

  • Homology (or comparative) modeling
  • Fold recognition
  • Ab initio prediction

CASP dynamics:

  • Real deadlines; pressure: positive, or negative?
  • Competition?
  • Influence on science ?

Venclovas, Zemla, Fidelis, Moult. Assessment of progress over the CASP experiments. Proteins, 53:585-595 (2003)

SLIDE 76

Ab initio prediction of protein structure – concept

  • Go from sequence to structure by sampling the conformational space in a reasonable manner and selecting a native-like conformation using a good discrimination function
  • Problems: the conformational space is astronomical, and it is hard to design functions that are not fooled by non-native conformations (or “decoys”)

SLIDE 77

Ab initio prediction of protein structure

  • sample conformational space such that native-like conformations are found: the number of conformations is astronomically large (5 states per residue for 100 residues gives $5^{100} \approx 10^{70}$)
  • select: it is hard to design functions that are not fooled by non-native conformations (“decoys”)
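The estimate 5¹⁰⁰ ≈ 10⁷⁰ follows from 100 · log₁₀ 5 ≈ 69.9. A quick check with Python's exact integers:

```python
import math

states, residues = 5, 100
count = states ** residues                  # exact integer 5^100
n_digits = len(str(count))                  # number of decimal digits
exponent = residues * math.log10(states)    # about 69.9, so 5^100 ≈ 10^70
```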

SLIDE 78

Sampling conformational space – continuous approaches

  • Most work in the field
  • Molecular dynamics
  • Continuous energy minimisation (follow a valley)
  • Monte Carlo simulation
  • Genetic Algorithms
  • Like real polypeptide folding process
  • Cannot be sure if native-like conformations are sampled


SLIDE 79

Molecular dynamics

  • Force = −dU/dx (slope of potential U); acceleration: m a(t) = force
  • All atoms are moving, so forces between atoms are complicated functions of time
  • Analytical solution for x(t) and v(t) is impossible; numerical solution is trivial
  • Atoms are moved in very short time steps of 10⁻¹⁵ seconds (0.001 picoseconds, ps)

$x(t+\Delta t) = x(t) + v(t)\,\Delta t + \left[4a(t) - a(t-\Delta t)\right]\frac{\Delta t^{2}}{6}$

$v(t+\Delta t) = v(t) + \left[2a(t+\Delta t) + 5a(t) - a(t-\Delta t)\right]\frac{\Delta t}{6}$

$U_{\text{kinetic}} = \tfrac{1}{2}\sum_i m_i v_i(t)^{2} = \tfrac{1}{2}\, n\, k_B T$

  • Total energy (U_potential + U_kinetic) must not change with time
  • n is the number of coordinates (not atoms)
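The position and velocity updates above are Beeman's integration scheme. A minimal sketch on a one-dimensional harmonic oscillator, where total energy should stay nearly constant; bootstrapping the first step with a(t − Δt) = a(t) is an assumption, and `beeman_harmonic` is a hypothetical name:

```python
def beeman_harmonic(x0, v0, k=1.0, m=1.0, dt=0.01, steps=1000):
    """Integrate a 1-D harmonic oscillator (U = k x^2 / 2) with Beeman's updates.
    Returns lists of positions, velocities and total energies."""
    def accel(x):
        return -k * x / m              # a = F/m with F = -dU/dx

    x, v = x0, v0
    a = accel(x)
    a_prev = a                         # bootstrap assumption: a(t - dt) = a(t)
    xs, vs, energies = [x], [v], [0.5 * m * v * v + 0.5 * k * x * x]
    for _ in range(steps):
        x_new = x + v * dt + (4.0 * a - a_prev) * dt * dt / 6.0
        a_new = accel(x_new)
        v_new = v + (2.0 * a_new + 5.0 * a - a_prev) * dt / 6.0
        x, v, a_prev, a = x_new, v_new, a, a_new
        xs.append(x)
        vs.append(v)
        energies.append(0.5 * m * v * v + 0.5 * k * x * x)
    return xs, vs, energies

xs, vs, es = beeman_harmonic(1.0, 0.0)   # integrates to t = 10
```

The exact solution here is x(t) = cos t, so the final position should be close to cos 10 ≈ −0.839, and the energy should stay close to 0.5.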

SLIDE 80

Energy minimisation

  • For a given protein, the energy depends on thousands of x, y, z Cartesian atomic coordinates; reaching a deep minimum is not trivial
  • With convergence, we have an accurate equilibrium conformation and a well-defined energy value

(Figures: energy versus number of steps from a starting conformation toward a deep minimum, comparing steepest descent and conjugate gradient; runs either converge or give up; energy versus RMSD.)
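Steepest descent, one of the two minimisers named in the figure, repeatedly steps against the energy gradient. A minimal sketch on a two-dimensional toy energy (not a protein force field), with a fixed step size and names that are hypothetical:

```python
def steepest_descent(grad, x0, step=0.04, tol=1e-8, max_steps=10_000):
    """Move against the gradient with a fixed step until the gradient norm
    drops below `tol`; returns (final point, number of steps taken)."""
    x = list(x0)
    for n in range(max_steps):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            return x, n                            # converged
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_steps                            # "give up" without converging

# Toy energy E(x, y) = (x - 1)^2 + 10 (y + 2)^2, minimum at (1, -2)
def grad_e(p):
    x, y = p
    return [2.0 * (x - 1.0), 20.0 * (y + 2.0)]

minimum, n_steps = steepest_descent(grad_e, [5.0, 3.0])
```

The progress is set by the shallow x direction, which is why steepest descent slows down in elongated valleys where conjugate gradient typically does better.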

SLIDE 81

Monte Carlo simulation

  • Discrete moves in torsion or Cartesian conformational space
  • Evaluate energy after every move and compare to previous energy (ΔE)
  • Accept conformation based on Boltzmann probability: $P \propto \exp\left(-\frac{\Delta E}{kT}\right)$
  • Many variations, including simulated annealing (starting with a high temperature so more moves are accepted initially, and then cooling)
  • If run for infinite time, the simulation will produce a Boltzmann distribution
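A minimal Metropolis sketch over a few discrete toy "conformations" with made-up energies, showing that the visit frequencies approach the Boltzmann distribution (the time average matches the ensemble average). The symmetric uniform proposal and all names are assumptions for illustration:

```python
import math
import random

def metropolis(energies, kT, steps=100_000, seed=1):
    """Propose a different state uniformly at random and accept it with
    probability min(1, exp(-dE/kT)). Returns the fraction of steps spent in
    each state."""
    rng = random.Random(seed)
    n = len(energies)
    state, visits = 0, [0] * n
    for _ in range(steps):
        new = rng.choice([s for s in range(n) if s != state])
        d_e = energies[new] - energies[state]
        if d_e <= 0 or rng.random() < math.exp(-d_e / kT):
            state = new                              # accept the move
        visits[state] += 1
    return [v / steps for v in visits]

energies, kT = [0.0, 0.5, 1.0], 0.5
sampled = metropolis(energies, kT)
z = sum(math.exp(-e / kT) for e in energies)
boltzmann = [math.exp(-e / kT) / z for e in energies]
```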

SLIDE 82

Genetic Algorithms

  • Generate an initial pool of conformations
  • Perform crossover and mutation operations on this set to generate a much larger pool of conformations
  • Select a subset of the fittest conformations from this large pool
  • Repeat the above two steps until convergence
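A minimal genetic-algorithm sketch following these steps, with conformations encoded as lists of discrete torsion states. The toy fitness (agreement with a hypothetical "native" state vector), one-point crossover, the fixed number of generations standing in for a convergence test, and all parameter values are assumptions:

```python
import random

def genetic_algorithm(fitness, n_genes, n_states, pool_size=20, offspring=100,
                      generations=50, mutation_rate=0.05, seed=0):
    """Returns the fittest conformation and the best fitness per generation."""
    rng = random.Random(seed)
    pool = [[rng.randrange(n_states) for _ in range(n_genes)] for _ in range(pool_size)]
    history = [max(fitness(c) for c in pool)]
    for _ in range(generations):
        children = []
        for _ in range(offspring):                 # build a much larger pool
            p1, p2 = rng.sample(pool, 2)
            cut = rng.randrange(1, n_genes)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            for g in range(n_genes):               # point mutations
                if rng.random() < mutation_rate:
                    child[g] = rng.randrange(n_states)
            children.append(child)
        # keep the fittest from parents + children, so the best never gets worse
        pool = sorted(pool + children, key=fitness, reverse=True)[:pool_size]
        history.append(fitness(pool[0]))
    return pool[0], history

native = [0, 1, 2] * 4                             # hypothetical target states
fitness = lambda c: sum(1 for g, t in zip(c, native) if g == t)
best, history = genetic_algorithm(fitness, n_genes=12, n_states=3)
```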
SLIDE 83

Sampling conformational space – exhaustive approaches

  • enumerate all possible conformations
  • view the entire space (perfect partition function)
  • computationally intractable: 5 states per residue for 100 residues gives $5^{100} \approx 10^{70}$ possible conformations
  • must use discrete state models to minimise the number of conformations explored

SLIDE 84

Scoring/energy functions

  • Need a way to select native-like conformations from non-native ones
  • Physics-based functions: electrostatics, van der Waals, solvation, bond/angle terms
  • Knowledge-based scoring functions: derive information about atomic properties from a database of experimentally determined conformations; common parameters include pairwise atomic distances and amino acid burial/exposure.
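One common way to build a knowledge-based term is the inverse-Boltzmann relation E = −kT ln(P_obs / P_ref), comparing how often a feature (for example, a residue-pair distance bin) occurs in known structures with a reference state. A minimal sketch; the reference counts, the pseudocount of 1 and the function name are assumptions:

```python
import math

def knowledge_based_energies(observed, reference, kT=0.593, pseudocount=1.0):
    """E(bin) = -kT ln(P_obs(bin) / P_ref(bin)) from per-bin counts.
    kT defaults to about 0.593 kcal/mol (roughly 298 K); the pseudocount
    avoids taking the log of zero for empty bins."""
    obs = [o + pseudocount for o in observed]
    ref = [r + pseudocount for r in reference]
    n_obs, n_ref = sum(obs), sum(ref)
    return [-kT * math.log((o / n_obs) / (r / n_ref)) for o, r in zip(obs, ref)]

# Two distance bins: the first is seen more often than the reference expects
energies = knowledge_based_energies([30, 10], [20, 20])
```

Features seen more often than expected receive negative (favourable) energies.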

SLIDE 85

Requirements for sampling methods and scoring functions

  • Sampling methods must produce good decoy sets that are comprehensive and include several native-like structures
  • Scoring function scores must correlate well with the RMSD of conformations (the better the score/energy, the lower the RMSD)
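How well a scoring function meets this requirement can be measured by the correlation between decoy energies and their RMSDs to the native structure. A minimal Pearson-correlation sketch on a hypothetical decoy set (made-up numbers); rank correlations such as Spearman's are a common alternative:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical decoys: energies (lower is better) and CA RMSD to native (Å)
energies = [-120.0, -110.0, -95.0, -90.0, -70.0]
rmsds = [1.5, 2.8, 4.0, 6.5, 9.1]
r = pearson(energies, rmsds)   # close to +1: better energy, lower RMSD
```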

SLIDE 86

Protein Structure

  • Primary (sequence)
  • Secondary (helix/strand/coil) and lack of structure (disorder)
  • Domain and tertiary (fold)
  • Quaternary (complexes)

IVGGYTCAANSIPYQ VSLNSGSHFCGGSLI NSQWVVSAAHCYKSR IQVRLGEHNIDVLEG NEQFINAAKIITHPN FNGNTL...

http://bioinf.cs.ucl.ac.uk

SLIDE 87

Computational Aspects of Structural Genomics

  • A. sequence space
  • B. comparative modeling
  • C. fold recognition
  • D. ab initio prediction
  • E. target selection
  • F. analysis

(Figure idea by Steve Brenner.)