Protein Structure Analysis with Sequential Monte Carlo Method



SLIDE 1

Protein Structure Analysis with Sequential Monte Carlo Method

Jinfeng Zhang, Computational Biology Lab, Department of Statistics, Harvard University

SLIDE 2

Introduction

  • Structure Function & Interaction

– The Protein Structure Initiative (PSI) is speeding up the information flow from sequence to structures.
– Information does not readily flow from structures to structures.
– Neither does it readily flow from structures to applications.

  • What are the bottlenecks?

– Sampling method.
– Potential function.

SLIDE 3

Sampling Methods - Folding & Growth

[Figure panels: Growth Method; Folding Method. From http://www.bioinformatics.buffalo.edu/]

SLIDE 4

Sequential Monte Carlo (SMC) - Step by Step

[Figure: samples grown step by step.]

Each sample has a weight!

Resampling
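The grow, weight, and resample cycle on this slide can be sketched on a toy problem. The example below is a minimal sketch, assuming a 2D square-lattice self-avoiding walk in place of the protein model used in the talk. The function names, the effective-sample-size trigger, and the multinomial resampling rule are illustrative choices, not the author's implementation.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def free_neighbors(chain):
    """Lattice sites next to the chain end that the chain has not visited."""
    occupied = set(chain)
    x, y = chain[-1]
    return [(x + dx, y + dy) for dx, dy in MOVES if (x + dx, y + dy) not in occupied]

def smc_count_saws(n_steps, n_samples=2000, ess_threshold=0.5, seed=0):
    """Estimate the number of n_steps-step self-avoiding walks (toy stand-in model)."""
    rng = random.Random(seed)
    chains = [[(0, 0)] for _ in range(n_samples)]
    weights = [1.0] * n_samples
    scale = 1.0  # product of the mean weights factored out at each resampling
    for _ in range(n_steps):
        # Growth: extend each live chain by one uniformly chosen free site.
        for i, chain in enumerate(chains):
            if weights[i] == 0.0:
                continue
            options = free_neighbors(chain)
            if not options:
                weights[i] = 0.0  # dead end: the chain cannot grow further
                continue
            chain.append(rng.choice(options))
            weights[i] *= len(options)  # incremental weight = 1 / sampling probability
        total = sum(weights)
        if total == 0.0:
            return 0.0
        # Resampling: triggered when the effective sample size drops too low.
        ess = total ** 2 / sum(w * w for w in weights)
        if ess < ess_threshold * n_samples:
            scale *= total / n_samples
            picks = rng.choices(range(n_samples), weights=weights, k=n_samples)
            chains = [list(chains[j]) for j in picks]
            weights = [1.0] * n_samples
    return scale * sum(weights) / n_samples

print(smc_count_saws(10))
```

On the square lattice the exact number of 10-step self-avoiding walks is 44,100, so the printed estimate can be checked directly.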

SLIDE 5

SMC - Summary

  • Short chains:

– Exhaustive enumeration, useful for evaluation of SMC performance.

  • Long chains:

– Sequential Monte Carlo, estimating interesting properties.

  • The main ingredients of SMC are:

– Sequence of distributions “approaching” the target distribution π(x_1, …, x_n).
– Sampling distribution g_{t+1}(x_{t+1} | x_1, …, x_t).
– Resampling scheme.
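The three ingredients combine in the standard SMC weight recursion, written here in the general form used in the SMC literature. The intermediate distributions π_t and the effective sample size (ESS) are notation assumed for this sketch; the slide names only the target π and the sampling distribution g. Using ESS as the resampling trigger is one common choice, not necessarily the one used in this work.

```latex
w_{t+1} = w_t \cdot
  \frac{\pi_{t+1}(x_1,\dots,x_{t+1})}
       {\pi_t(x_1,\dots,x_t)\, g_{t+1}(x_{t+1} \mid x_1,\dots,x_t)},
\qquad
\mathrm{ESS}_t = \frac{\bigl(\sum_j w_t^{(j)}\bigr)^2}{\sum_j \bigl(w_t^{(j)}\bigr)^2}
```

Here j indexes the samples. Resampling is typically applied when the ESS falls below a chosen fraction of the sample size.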

SLIDE 6

Reference for SMC

  • J.S. Liu and R. Chen (1998). SMC for dynamic systems. J. Amer. Statist. Assoc. 93, 1032-45.
  • J.S. Liu (2001). Monte Carlo Strategies in Scientific Computing. Springer-Verlag.
  • J. Liang, J. Zhang, R. Chen (2002). J. Chem. Phys. 117:7, 3511-3521.
  • J. Zhang, R. Chen, C. Tang, and J. Liang (2003). J. Chem. Phys. 118:12, 6102-6109.
  • J. Zhang, Y. Chen, R. Chen, and J. Liang (2004). J. Chem. Phys. 121:1, 592-603.

SLIDE 7

Near Native Structures of Proteins

SLIDE 8

Native State is an Ensemble of Structures

[Figure labels: Ca2+ ATPase pump; Lac repressor; 2BBN]

  • Protein functions and interactions are determined by the near native structures (NNS).

SLIDE 9

Biological Problems

  • Stability

– Probability of NNS under Boltzmann distribution.

  • Function

– Analysis of NNS to detect correlated structural changes.

  • Interaction

– Near native structures with diversified interfaces.

  • Difficulty of protein structure prediction

– Probability of NNS under uniform distribution.

SLIDE 10

Methods for Studying NNS

  • Experimental method, such as NMR

– Study one protein at a time. Limited to certain protein types.

  • MD simulation

– Computationally expensive. Applicable only to small proteins.

  • MCMC

– Folding around the constrained native structure template is not efficient.

  • NMR combined with MD

– Vendruscolo M, et al. Nature (2005), 433:128-32

SLIDE 11

Near Native Structures - Connecting Experimental Structures and Applications

[Figure label: SMC]

SLIDE 12

Representation of Protein Structures

  • Optimized discrete state model (ODSM).

[Figure labels: α_i, τ_i, C_{i+1}, C_i, C_{i-1}, C_{i-2}, SC_i, SC_{i-1}]

  • Accuracy of ODSM.

[Figure: cRMSD (1.5 to 3.0) vs. Discrete State (3 to 10) for ALA, PRO, GLY, HIS.]

SLIDE 13

Sequential Monte Carlo for Sampling NNS

[Figure labels: Native structure; SMC; Near Native Structures]

  • Definition of NNS:

– Structures with RMSD < 3 Å to native structure.
– Other similarity measures are possible.
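The RMSD criterion above can be illustrated with a short sketch. It assumes that RMSD means coordinate RMSD after optimal rigid superposition, computed here with the Kabsch algorithm using NumPy. The function names and the choice of atoms are hypothetical; the talk does not specify how its RMSD is computed.

```python
import numpy as np

def superposed_rmsd(coords_a, coords_b):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    # Remove the translation by centering both structures.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Singular values of the covariance matrix give the best achievable overlap.
    u, s, vt = np.linalg.svd(a.T @ b)
    # If the best orthogonal map is a reflection, flip the smallest singular value.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * s.sum()) / len(a)
    return float(np.sqrt(max(msd, 0.0)))

def is_near_native(coords, native, cutoff=3.0):
    """Apply the RMSD < 3 Angstrom definition of a near native structure."""
    return superposed_rmsd(coords, native) < cutoff
```

A structure that differs from the native one only by a rotation and a translation has an RMSD of essentially zero under this definition.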

SLIDE 14

Comparison with Enumeration I. - Estimation of Number of Conformations

Sample size: 10,000. Annotated values: 1.042×10^9 and 1.039×10^9.

[Figure: ln(Number of Conformations) vs. Length (10 to 15), 5-state enumeration vs. 5-state SMC; protein 1ail.]
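For short chains, an exact count by exhaustive enumeration gives the reference against which an SMC estimate is checked. The sketch below assumes a 2D square-lattice self-avoiding walk as a stand-in for the 5-state model on this slide; the function name is illustrative.

```python
import math

def enumerate_saws(n_steps):
    """Count n_steps-step self-avoiding walks on the square lattice by depth-first search."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in moves:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)  # backtrack before trying the next move
        return total

    return extend((0, 0), {(0, 0)}, n_steps)

# Tabulate ln(count) against length, the same quantity plotted on the slide.
for length in range(1, 9):
    count = enumerate_saws(length)
    print(length, count, round(math.log(count), 3))
```

The cost grows exponentially with length, which is why enumeration serves only as a check on short chains.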

SLIDE 15

Comparison with Enumeration II. - Estimation of NNS

[Figure: ln(Probability) vs. RMSD Bin (1 to 4) at length 15, enumeration vs. SMC.]

RMSD Bin: 1: 1.0 Å - 1.5 Å; 2: 1.5 Å - 2.0 Å; 3: 2.0 Å - 2.5 Å; 4: 2.5 Å - 3.0 Å.

Sample size: 10,000. Annotated values: 5.94 × 10^-8 and 5.60 × 10^-8.
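The per-bin probabilities can be estimated from weighted samples as the fraction of total weight falling in each RMSD bin. This is a minimal sketch, assuming each SMC sample carries an importance weight targeting the uniform distribution over conformations. The function name and the input lists are hypothetical.

```python
import math

def bin_log_probabilities(rmsds, weights, edges=(1.0, 1.5, 2.0, 2.5, 3.0)):
    """Return ln P(bin) for each RMSD bin from importance-weighted samples."""
    total = sum(weights)
    mass = [0.0] * (len(edges) - 1)
    for r, w in zip(rmsds, weights):
        for k in range(len(mass)):
            if edges[k] <= r < edges[k + 1]:
                mass[k] += w  # accumulate the weight of samples landing in bin k
                break
    # Bins that received no sample are reported as -inf rather than guessed.
    return [math.log(m / total) if m > 0 else float("-inf") for m in mass]

# Toy input: four samples, one of which lies outside all four bins.
print(bin_log_probabilities([1.2, 1.7, 5.0, 2.6], [1.0, 1.0, 6.0, 2.0]))
```

Samples outside the binned range still count toward the normalizing total, so each bin value is a probability over all conformations.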

SLIDE 16

Comparison with Enumeration III. - Estimation of Native Contacts

[Figure: Probability vs. Native Contact (5 to 35), panels a and b, enumeration vs. SMC.]

1nkd, RMSD Bin-2: 1.5 Å - 2.0 Å; 1nkd, RMSD Bin-4: 2.5 Å - 3.0 Å.

SLIDE 17

Probability of NNS - How Difficult Is Protein Structure Prediction?

Probability of NNS for 70 non-homologous proteins grouped by their length with 5 residues per interval.

[Figure: log10(Probability) vs. Length (60 to 140) for RMSD < 3 Å, RMSD < 4 Å, and RMSD < 5 Å.]

SLIDE 18

Probability of NNS - Effect of Model Complexity

Average probability of NNS for 8 proteins at partial length and full length.

[Figure: panel a, log10(N) vs. Length (20 to 50); panel b, log10(P) vs. Length (20 to 80); 4-state, 5-state, 6-state, and 8-state models.]

  • 4-, 5-, 6-, and 8-state models all have the same probability of NNS.

SLIDE 19

Probability Under Boltzmann Distribution - Contact Potentials

Piotr Pokarowski et al., PROTEINS, 59:49–57 (2005)

SLIDE 20

Probability of NNS Under Boltzmann Distributions

  • Probability of NNS for 32 proteins with length from 31 to 90.

[Figure: log10(Probability) vs. Length Bin (30 to 90) for the uniform distribution of the 5-state model, the Boltzmann distribution of the 5-state model, and the Boltzmann distribution of the 6-state model.]

  • Pair-wise contact potential functions stabilize NNS poorly.
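One way to move from the uniform to the Boltzmann distribution is to reweight each sample by exp(-E/kT). The sketch below assumes samples that already carry weights for the uniform distribution and an energy from some contact potential. The function name, its inputs, and the reweighting approach itself are assumptions for illustration; the talk does not state how its Boltzmann probabilities were computed.

```python
import math

def boltzmann_probability(energies, weights, in_nns, kT=1.0):
    """Self-normalized estimate of P(NNS) under exp(-E/kT) from uniformly weighted samples."""
    # Work with log(w_i * exp(-E_i / kT)); samples with zero weight contribute nothing.
    logs = [(math.log(w) - e / kT, flag)
            for e, w, flag in zip(energies, weights, in_nns) if w > 0]
    shift = max(value for value, _ in logs)  # log-sum-exp shift for numerical stability
    denom = sum(math.exp(value - shift) for value, _ in logs)
    numer = sum(math.exp(value - shift) for value, flag in logs if flag)
    return numer / denom

# Toy input: with equal energies the estimate reduces to the uniform-weight fraction.
print(boltzmann_probability([0.0, 0.0], [1.0, 3.0], [True, False]))
```

A potential that stabilizes near native structures well would raise this probability far above the uniform value. The slide reports that pair-wise contact potentials do so only poorly.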
SLIDE 21

Summary for NNS

  • Sequential Monte Carlo (SMC) for studying near native structures (NNS).
  • Probability of NNS is estimated for proteins up to length 150.
  • Models with different complexities have the same probability of NNS.
  • Rigorous evaluation criterion for potential functions. Contact potentials do not stabilize native structures.

SLIDE 22

Side Chain Modeling

SLIDE 23

Introduction

  • Side chain modeling is important for protein structure prediction, protein interaction, and protein design.
  • Most current methods look for a single conformation with minimum potential energy.
  • In structure prediction, the energy of a conformation is normally calculated ignoring the side chain conformational entropy.

SLIDE 24

Questions

  • Do structures with similar compactness have similar side chain conformational entropy?
  • Do structures with similar fold have similar side chain conformational entropy?
  • Do native structures have higher side chain entropy than random structures with similar compactness or similar fold?

We address these questions with our new side chain modeling method.

SLIDE 25

SMC for Side Chain Modeling

  • Number of side chain conformations, N_sc.
  • Side chain conformational entropy: S_sc = k_B ln(N_sc).
  • Stability.
  • Folding and Packing.
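The relation S_sc = k_B ln(N_sc) turns a ratio of conformation counts into an entropy difference and a free energy contribution. The sketch below assumes molar units (the gas constant in kcal/(mol·K)) and a temperature of 298.15 K. Both choices, the function names, and the 10^5-fold ratio are illustrative.

```python
import math

K_B = 1.987204e-3  # Boltzmann constant in molar units, kcal/(mol*K) (assumed unit choice)

def side_chain_entropy(log10_nsc):
    """S_sc = k_B ln(N_sc), taking log10(N_sc) as input to avoid huge numbers."""
    return K_B * log10_nsc * math.log(10.0)

def entropic_free_energy(log10_ratio, temperature=298.15):
    """T * delta S (kcal/mol) for a given log10 ratio of conformation counts."""
    return temperature * side_chain_entropy(log10_ratio)

# Illustrative input: one structure has 10^5 times more side chain conformations.
print(round(entropic_free_energy(5.0), 2), "kcal/mol")
```

Working from log10(N_sc) keeps the arithmetic stable when N_sc itself is too large to handle comfortably as a plain number.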
SLIDE 26

Validation of SMC - Comparison with Enumeration

[Figure: log(N_sc) vs. Length (8 to 18), enumeration vs. SMC, for 2ovo and 3ebx.]

The total number of SAW side chain conformations for a fragment of 3ebx, residues 1-17, is 396,325,923,840 (3.96×10^11). The estimated number is 4.01×10^11 with a sample size of 1,000 for 10 runs.

SLIDE 27

Do structures with similar compactness have similar side chain conformational entropy?

  • Structures satisfying:

– same sequence,
– similar compactness,
– different backbone conformations.

SLIDE 28

Decoy Structures

  • Decoys are generated to fool potential functions.
  • 24 decoy proteins are selected from 5 decoy sets in the Decoys ‘R’ Us database.

– 4state_reduced: 7 proteins (about 600 structures per protein).
– fisa: 3 proteins (500 decoys).
– fisa_casp3: 4 proteins (1000-2500 decoys).
– lattice_ssfit: 5 proteins (2000 decoys).
– lmds: 5 proteins (300-500 decoys).

  • Compactness is measured by one of two parameters: radius of gyration (Rg) or number of residue contacts (Nc).
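The two compactness measures can be computed from one representative coordinate per residue. This is a minimal sketch, assuming an unweighted radius of gyration, a 6.5 Å distance cutoff, and a minimum sequence separation of 3 for contacts. The talk does not give its cutoff or separation, so these values and the function names are placeholders.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid (equal masses assumed)."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in coords) / n)

def contact_count(coords, cutoff=6.5, min_separation=3):
    """Number of residue pairs at least min_separation apart in sequence and within cutoff."""
    count = 0
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                count += 1
    return count

# Toy input: four points on a line, 3 Angstrom apart.
line = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(radius_of_gyration(line), contact_count(line))
```

A smaller Rg or a larger Nc indicates a more compact structure, so decoys can be grouped by either value before their side chain entropies are compared.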

SLIDE 29

Side Chain Entropy of Native and Decoy Structures

[Figure: log10(N_sc) (25 to 35) vs. Nc (400 to 700) for 1ctf, with the native structure marked.]

On average, the number of side chain conformations for native 1ctf is 10^5 times more than that of a decoy structure!

SLIDE 30

Native vs. Decoys

Protein   Nsc   Type   DecoySet
1ctf      Y            4state
1r69      Y            4state
1sn3      Y            4state
2cro      Y            4state
3icb      N     M      4state
4pti      N     S      4state
4rxn      N     M      4state
1fc2      N     I      fisa
1hdd-C    N     I      fisa
4icb      N     M      fisa
1bg8-A    N     S      fisa_casp3
1bl0      Y            fisa_casp3
1eh2      N     M      fisa_casp3
smd3      Y            fisa_casp3
1beo      N     S      lattice
1dkt-A    N     I      lattice
1fca      Y            lattice
1nkl      Y            lattice
1pgb      Y            lattice
1b0n      N     M      lmds
1bba      N     NMR    lmds
1igd      Y            lmds
1shf      Y            lmds
2ovo      N     S      lmds

Y: Proteins for which side chain entropy is maximized.
N: Proteins for which side chain entropy is not maximized.
M: Metal binding protein.
S: Disulfide protein.
I: Involved in interaction.

SLIDE 31

Proteins with Disulfide Bonds

[Figure: 4pti. log10(N_sc) (20 to 32) vs. Nc (300 to 500), and log10(N_sc) (20 to 32) vs. RMSD (2 to 8), with the native structure marked.]

SLIDE 32

Structures with similar compactness can have very different side chain conformational entropy. Native structures tend to maximize side chain conformational entropy.

SLIDE 33

Do structures with similar conformation have similar side chain conformational entropy?

  • Structures satisfying:

– same sequence,
– similar (but not the same) conformations.

SLIDE 34

X-ray and NMR Structures

  • Experimental X-ray structure vs. NMR structures

– Very similar backbone folds.
– Differ in details, such as packing of loops and contacts.
– Potentials derived from X-ray structures fail to recognize NMR structures and vice versa. Why?

Sergiy O. Garbuzynskiy et al., Proteins, 60:139–147 (2005)

SLIDE 35

Side Chain Entropy of X-ray and NMR Structures

[Figure: ln(N_sc) vs. Rg for two NMR/X-ray pairs, 1eq0 (NMR) : 1hka (X-ray) and 1bmw (NMR) : 1who (X-ray), with the X-ray structure marked in each panel.]

X-ray structures have similar fold and compactness to NMR structures, but higher side chain entropy.

SLIDE 36

Side Chain Entropy Difference between X-ray and NMR Structures

[Figure: Entropy Difference vs. Protein Length (50 to 250), where the entropy difference is ln( N_sc(X-ray) / max{N_sc(NMR)} ).]

In general, the X-ray structure has higher side chain entropy than the NMR structures of the same protein.

SLIDE 37

Two Packing Modes - Balance between Enthalpy and Entropy

[Figure: log(N_sc) vs. Rg for two NMR/X-ray pairs, 1ah2 (NMR) : 1svn (X-ray) and 1pfl (NMR) : 1fil (X-ray), with the X-ray structure marked in each panel. RMSD labels: 1.65 Å and 1.76 Å.]

Higher compactness, comparable side chain entropy. Lower compactness, much higher side chain entropy.

SLIDE 38

Summary for Side Chain Modeling

  • Protein folding is a subtle balance between enthalpy and entropy, not simply minimizing enthalpy to compensate for the loss of entropy.
  • Side chain entropy plays a very important role in protein stability, and can be used to discriminate native from decoy structures, especially similar structures.
  • Packing of NMR structures is sub-optimal compared to X-ray structures.

SLIDE 39

Acknowledgement

  • Prof. Jun Liu, Computational Biology Lab, Department of Statistics, Harvard University
  • Prof. Jie Liang, Bioengineering Department, University of Illinois at Chicago
  • Prof. Rong Chen, Department of Information and Decision Science, University of Illinois at Chicago
  • Dr. Ming Lin, Department of IDS, UIC

NIH