Protein Structure Analysis with Sequential Monte Carlo Method
Jinfeng Zhang
Computational Biology Lab, Department of
Introduction
- Structure Function & Interaction
– The Protein Structure Initiative (PSI) is speeding up the information flow from sequence to structures.
– Information does not readily flow from structures to structures.
– Neither does it readily flow from structures to applications.
- What are the bottlenecks?
– Sampling method.
– Potential function.
Sampling Methods: Folding & Growth
[Figure: Growth Method and Folding Method]
From http://www.bioinformatics.buffalo.edu/
Sequential Monte Carlo (SMC): Step by Step
- Each sample has a weight!
- Resampling
SMC: Summary
- Short chains:
– Exhaustive enumeration, useful for evaluation of SMC performance.
- Long chains:
– Sequential Monte Carlo, estimating interesting properties.
- The main ingredients of SMC are:
– Sequence of distributions “approaching” the target distribution π(x_1, …, x_n).
– Sampling distribution g_{t+1}(x_{t+1} | x_1, …, x_t).
– Resampling scheme.
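The three ingredients above can be sketched on a toy problem. The example below is an illustration only, not the discrete-state protein model of these slides: it assumes a 2D square lattice, takes the target distribution to be uniform over self-avoiding walks, uses a sampling distribution that picks uniformly among free neighbouring sites, and resamples when the effective sample size drops. The function names, the lattice, and the `ess_fraction` threshold are all assumptions made for this sketch.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # 2D square lattice (assumed toy model)

def free_neighbors(walk):
    """Lattice sites next to the walk's end that are not yet occupied."""
    occupied = set(walk)
    x, y = walk[-1]
    return [(x + dx, y + dy) for dx, dy in MOVES if (x + dx, y + dy) not in occupied]

def smc_count_saws(n_steps, n_samples=2000, ess_fraction=0.5, seed=0):
    """Estimate the number of self-avoiding walks with n_steps steps by chain growth."""
    rng = random.Random(seed)
    walks = [[(0, 0)] for _ in range(n_samples)]
    weights = [1.0] * n_samples
    for _ in range(n_steps):
        # Growth: extend each live chain by one site, chosen uniformly among free sites.
        for i, walk in enumerate(walks):
            if weights[i] == 0.0:
                continue  # dead chain, kept only so the average stays unbiased
            options = free_neighbors(walk)
            if not options:
                weights[i] = 0.0  # trapped: no self-avoiding extension exists
                continue
            walk.append(rng.choice(options))
            weights[i] *= len(options)  # incremental weight = 1 / g(x_{t+1} | x_1..x_t)
        total = sum(weights)
        if total == 0.0:
            return 0.0
        # Resampling: triggered when the effective sample size becomes small.
        ess = total * total / sum(w * w for w in weights)
        if ess < ess_fraction * n_samples:
            picks = rng.choices(range(n_samples), weights=weights, k=n_samples)
            walks = [list(walks[j]) for j in picks]
            weights = [total / n_samples] * n_samples  # equal weights, same mean
    return sum(weights) / n_samples  # mean weight estimates the number of walks

if __name__ == "__main__":
    print(smc_count_saws(10))  # exact enumeration on this lattice gives 44100
```

For short chains the estimate can be checked against exhaustive enumeration, which mirrors the short-chain and long-chain split described above.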
References for SMC
- J.S. Liu and R. Chen (1998). SMC for dynamic systems. J Amer Statist Assoc 93, 1032-45.
- J.S. Liu (2001). Monte Carlo Strategies in Scientific Computing. Springer-Verlag.
- J. Liang, J. Zhang, R. Chen (2002). J. Chem. Phys. 117:7, 3511-3521.
- J. Zhang, R. Chen, C. Tang, and J. Liang (2003). J. Chem. Phys. 118:12, 6102-6109.
- J. Zhang, Y. Chen, R. Chen, and J. Liang (2004). J. Chem. Phys. 121:1, 592-603.
Near Native Structures of Proteins
Native State is an Ensemble of Structures
[Figure: Ca2+ ATPase pump, Lac repressor, 2BBN]
- Protein functions and interactions are determined by the near native structures (NNS).
Biological Problems
- Stability
– Probability of NNS under Boltzmann distribution.
- Function
– Analysis of NNS to detect correlated structural changes.
- Interaction
– Near native structures with diversified interfaces.
- Difficulty of protein structure prediction
– Probability of NNS under uniform distribution.
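The two probabilities named above (NNS under the uniform distribution and under the Boltzmann distribution) can both be estimated from one set of weighted samples. The sketch below assumes the samples carry importance weights that target the uniform distribution over conformations, and that an energy is available for each sample; the function name `nns_probability` and its arguments are hypothetical, and the slides do not specify the estimator actually used.

```python
import math

def nns_probability(weights, is_nns, energies=None, kT=1.0):
    """Self-normalized importance sampling estimate of P(NNS).

    weights  : importance weights targeting the uniform distribution (assumed)
    is_nns   : booleans, True when the sample is a near native structure
    energies : optional energies; when given, samples are reweighted by the
               Boltzmann factor exp(-E / kT), otherwise the uniform case is used
    """
    if energies is None:
        factors = [1.0] * len(weights)
    else:
        e_min = min(energies)  # shift energies for numerical stability
        factors = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(w * f for w, f in zip(weights, factors))
    hit = sum(w * f for w, f, flag in zip(weights, factors, is_nns) if flag)
    return hit / total

if __name__ == "__main__":
    w = [1.0, 1.0, 2.0]
    flags = [True, False, False]
    print(nns_probability(w, flags))                      # uniform case
    print(nns_probability(w, flags, [-1.0, 0.0, 0.0]))    # Boltzmann case
```

A potential function that stabilizes the native state should make the Boltzmann estimate much larger than the uniform one.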
Methods for Studying NNS
- Experimental method, such as NMR
– Study one protein at a time. Limited to certain protein types.
- MD simulation
– Computationally expensive. Applicable to small proteins.
- MCMC
– Folding around the constrained native structure template is not efficient.
- NMR combined with MD
– Vendruscolo M, et al. Nature (2005), 433:128-32
Near Native Structures: Connecting Experimental Structures and Applications
SMC
Representation of Protein Structures
- Optimized discrete state model (ODSM).
[Figure: ODSM geometry with angles α_i and τ_i, positions C_{i-2}, C_{i-1}, C_i, C_{i+1}, and side chains SC_{i-1}, SC_i]
- Accuracy of ODSM.
[Figure: cRMSD (1.5 to 3.0) vs. Discrete State (3 to 10) for ALA, PRO, GLY, HIS]
Sequential Monte Carlo for Sampling NNS
[Figure labels: Native structure, SMC, Near Native Structures]
- Definition of NNS:
– Structures with RMSD < 3 Å to native structure.
– Other similarity measures are possible.
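As an illustration of the "other similarity measures" mentioned above, the sketch below classifies a structure as near native using distance RMSD (dRMSD), which compares internal pairwise distances and therefore needs no superposition. This is a swapped-in measure, not the RMSD used in the slides, and reusing the 3 Å cutoff for it is an assumption; the function names are hypothetical, and one coordinate per residue is assumed.

```python
import math
from itertools import combinations

def drmsd(coords_a, coords_b):
    """Distance RMSD between two structures with the same number of points."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("structures must have the same length (at least 2 points)")
    squared_sum = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])  # internal distance in structure A
        d_b = math.dist(coords_b[i], coords_b[j])  # same pair in structure B
        squared_sum += (d_a - d_b) ** 2
        n_pairs += 1
    return math.sqrt(squared_sum / n_pairs)

def is_near_native(candidate, native, cutoff=3.0):
    """True when the candidate is within the cutoff (in Å, assumed) of the native."""
    return drmsd(candidate, native) < cutoff

if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
    print(drmsd(native, bent), is_near_native(bent, native))
```

Because dRMSD depends only on internal distances, a rotated or translated copy of the native structure scores zero, but so does its mirror image, which is one reason superposition-based RMSD is often preferred.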
Comparison with Enumeration I: Estimation of Number of Conformations
[Figure: ln(Number of Conformations) vs. Length (10 to 15) for 1ail; series: 5 State Enum., 5 State SMC]
Sample size: 10,000. Values shown: 1.042×10^9 and 1.039×10^9.
Comparison with Enumeration II: Estimation of NNS
[Figure: ln(Probability) vs. RMSD Bin (1 to 4); series: L 15 Enum., L 15 SMC]
RMSD Bin: 1: 1.0 Å - 1.5 Å; 2: 1.5 Å - 2.0 Å; 3: 2.0 Å - 2.5 Å; 4: 2.5 Å - 3.0 Å.
Sample size: 10,000. Values shown: 5.94×10^-8 and 5.60×10^-8.
Comparison with Enumeration III: Estimation of Native Contacts
[Figure: Probability vs. Native Contact; series: Enum., SMC. Panels a and b: 1nkd, RMSD Bin-2 (1.5 Å - 2.0 Å) and 1nkd, RMSD Bin-4 (2.5 Å - 3.0 Å)]
Probability of NNS: How Difficult is Protein Structure Prediction?
Probability of NNS for 70 non-homologous proteins grouped by their length with 5 residues per interval.
[Figure: log10(Probability) vs. Length (60 to 140) for RMSD < 3 Å, RMSD < 4 Å, and RMSD < 5 Å]
Probability of NNS: Effect of Model Complexity
Average probability of NNS for 8 proteins at partial length and full length.
[Figure, panel a: log10(N) vs. Length (20 to 50). Panel b: log10(P) vs. Length (20 to 80). Series: 4-state, 5-state, 6-state, 8-state]
- The 4-, 5-, 6-, and 8-state models all have the same probability of NNS.
Probability Under Boltzmann Distribution: Contact Potentials
Piotr Pokarowski et al., PROTEINS, 59:49–57 (2005)
Probability of NNS Under Boltzmann Distributions
- Probability of NNS for 32 proteins with length from 31 to 90.
[Figure: log10(Probability) vs. Length (30 to 90); series: uniform distribution of 5-state model, Boltzmann distribution of 5-state model, Boltzmann distribution of 6-state model]
- Pairwise contact potential functions stabilize NNS poorly.
Summary for NNS
- Sequential Monte Carlo (SMC) for studying near native structures (NNS).
- Probability of NNS is estimated for proteins up to length 150.
- Models with different complexities have the same probability of NNS.
- Rigorous evaluation criterion for potential functions. Contact potentials do not stabilize native structures.
Side Chain Modeling
Introduction
- Side chain modeling is important for protein structure prediction, protein interaction, and protein design.
- Most current methods look for a single conformation with minimum potential energy.
- In structure prediction, the energy of a conformation is normally calculated ignoring the side chain conformational entropy.
Questions
- Do structures with similar compactness have similar side chain conformational entropy?
- Do structures with similar fold have similar side chain conformational entropy?
- Do native structures have higher side chain entropy than random structures with similar compactness or similar fold?
We address these questions with our new side chain modeling method.
SMC for Side Chain Modeling
- Number of side chain conformations, Nsc.
- Side chain conformational entropy: Ssc = kB ln(Nsc).
- Stability.
- Folding and Packing.
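The relation Ssc = kB ln(Nsc) can be turned into a rough energy scale. The sketch below works per mole (using the gas constant R in place of kB) and takes log10(Nsc) as input, since that is how Nsc is plotted in these slides. The temperature of 300 K and the ratio of 10^5 in the usage lines are illustrative assumptions, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K); per-mole counterpart of kB

def side_chain_entropy(log10_nsc):
    """Molar side chain entropy in kcal/(mol K), from log10 of the conformation count."""
    return R_KCAL * log10_nsc * math.log(10.0)  # R * ln(Nsc)

def entropic_free_energy_gap(log10_ratio, temperature=300.0):
    """-T * (S_a - S_b) in kcal/mol when Nsc_a / Nsc_b = 10**log10_ratio."""
    return -temperature * side_chain_entropy(log10_ratio)

if __name__ == "__main__":
    # Illustrative: structure a has 10^5 times more side chain conformations than b.
    print(f"{entropic_free_energy_gap(5.0):.2f} kcal/mol")  # about -6.86 at 300 K
```

Under these assumptions, a five orders of magnitude difference in Nsc corresponds to roughly 7 kcal/mol of entropic stabilization, which is comparable to typical protein folding free energies.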
Validation of SMC: Comparison with Enumeration
[Figure: log(Nsc) vs. Length (8 to 18) for 2ovo and 3ebx; series: Enumeration, SMC]
The total number of self-avoiding walk (SAW) side chain conformations for a fragment of 3ebx, residues 1-17, is 396,325,923,840 (3.96×10^11). The estimated number is 4.01×10^11 with a sample size of 1,000 for 10 runs.
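A quick check of how close the SMC estimate is to enumeration, using the two numbers quoted above for the 3ebx fragment (exact count 396,325,923,840 and an estimate of about 4.01×10^11). The helper name `relative_error` is hypothetical.

```python
def relative_error(estimate, exact):
    """Absolute difference between estimate and exact value, relative to the exact value."""
    return abs(estimate - exact) / exact

exact_nsc = 396_325_923_840   # enumeration result quoted for 3ebx, residues 1-17
smc_nsc = 4.01e11             # SMC estimate quoted on the same slide

err = relative_error(smc_nsc, exact_nsc)
print(f"relative error: {err:.2%}")  # about 1.18%
```

An error of roughly 1% from samples of only 1,000 is small compared with the eleven orders of magnitude spanned by the count itself.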
Do structures with similar compactness have similar side chain conformational entropy?
- Structures satisfying:
– same sequence,
– similar compactness,
– different backbone conformations.
Decoy Structures
- Decoys are generated to fool potential functions.
- 24 decoy proteins are selected from 5 decoy sets in the Decoys ‘R’ Us database.
– 4state_reduced: 7 proteins (about 600 structures per protein).
– fisa: 3 proteins (500 decoys).
– fisa_casp3: 4 proteins (1000-2500 decoys).
– lattice_ssfit: 5 proteins (2000 decoys).
– lmds: 5 proteins (300-500 decoys).
- Compactness is measured by one of two parameters: radius of gyration (Rg) or number of residue contacts (Nc).
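The two compactness measures above can be computed from coordinates as in the sketch below. It assumes one point per residue with equal masses; the 6.5 Å contact cutoff and the minimum sequence separation of 3 are illustrative choices, since the slides do not state how a residue contact is defined. The function names are hypothetical.

```python
import math
from itertools import combinations

def radius_of_gyration(coords):
    """Rg: root mean square distance of the points from their centroid (equal masses)."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in coords) / n)

def contact_count(coords, cutoff=6.5, min_separation=3):
    """Nc: number of residue pairs closer than cutoff and far enough apart in sequence."""
    return sum(
        1
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_separation and math.dist(coords[i], coords[j]) < cutoff
    )

if __name__ == "__main__":
    square = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
    print(radius_of_gyration(square), contact_count(square))
```

A smaller Rg or a larger Nc both indicate a more compact structure, so decoys can be binned by either value before comparing their side chain entropy.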
Side Chain Entropy of Native and Decoy Structures
[Figure: log10(Nsc) vs. Nc (400 to 700) for 1ctf, with the native structure marked]
On average, the native 1ctf structure has 10^5 times more side chain conformations than a decoy structure!
Native vs. Decoys

Protein  Nsc  Type  DecoySet
1ctf     Y    -     4state
1r69     Y    -     4state
1sn3     Y    -     4state
2cro     Y    -     4state
3icb     N    M     4state
4pti     N    S     4state
4rxn     N    M     4state
1fc2     N    I     fisa
1hdd-C   N    I     fisa
4icb     N    M     fisa
1bg8-A   N    S     fisa_casp3
1bl0     Y    -     fisa_casp3
1eh2     N    M     fisa_casp3
smd3     Y    -     fisa_casp3
1beo     N    S     lattice
1dkt-A   N    I     lattice
1fca     Y    -     lattice
1nkl     Y    -     lattice
1pgb     Y    -     lattice
1b0n     N    M     lmds
1bba     N    NMR   lmds
1igd     Y    -     lmds
1shf     Y    -     lmds
2ovo     N    S     lmds

Y: Proteins for which side chain entropy is maximized.
N: Proteins for which side chain entropy is not maximized.
M: Metal binding protein.
S: Disulfide protein.
I: Involved in interaction.
Proteins with Disulfide Bonds
[Figure: 4pti; log10(Nsc) vs. Nc (300 to 500) and log10(Nsc) vs. RMSD (2 to 8), with the native structure marked]
- Structures with similar compactness can have very different side chain conformational entropy.
- Native structures tend to maximize side chain conformational entropy.
Do structures with similar conformation have similar side chain conformational entropy?
- Structures satisfying:
– same sequence,
– similar (but not the same) conformations.
X-ray and NMR Structures
- Experimental X-ray structure vs. NMR structures
– Very similar backbone folds.
– Differ in details, such as packing of loops and contacts.
– Potential derived from X-ray structures fails to recognize NMR structures and vice versa. Why?
Sergiy O. Garbuzynskiy et al., Proteins, 60:139–147 (2005)
Side Chain Entropy of X-ray and NMR Structures
[Figure: ln(Nsc) vs. Rg for two pairs, 1eq0 (NMR) : 1hka (X-ray) and 1bmw (NMR) : 1who (X-ray), with the X-ray structure marked in each panel]
X-ray structures have similar fold and compactness to NMR structures, but higher side chain entropy.
Side Chain Entropy Difference between X-ray and NMR Structures
[Figure: Entropy Difference vs. Protein Length (50 to 250), where the entropy difference is ln[ Nsc(X-ray) / max{Nsc(NMR)} ]]
In general, the X-ray structure has higher side chain entropy than the NMR structures of the same protein.
Two Packing Modes: Balance between Enthalpy and Entropy
[Figure: log(Nsc) vs. Rg for two pairs, 1ah2 (NMR) : 1svn (X-ray) and 1pfl (NMR) : 1fil (X-ray), with the X-ray structure marked in each panel; RMSD values shown: 1.65 Å and 1.76 Å]
- Higher compactness, comparable side chain entropy.
- Lower compactness, much higher side chain entropy.
Summary for Side Chain Modeling
- Protein folding is a subtle balance between enthalpy and entropy, not simply minimizing enthalpy to compensate for the loss of entropy.
- Side chain entropy plays a very important role in protein stability, and can be used to discriminate native from decoy structures, especially similar structures.
- Packing of NMR structures is sub-optimal compared to X-ray structures.
Acknowledgement
- Prof. Jun Liu
Computational Biology Lab, Department of Statistics, Harvard University
- Prof. Jie Liang
Bioengineering Department, University of Illinois at Chicago
- Prof. Rong Chen
Department of Information and Decision Science, University of Illinois at Chicago
- Dr. Ming Lin