[PPT] - Protein Structure Analysis with Sequential Monte Carlo Method

SLIDE 1

Protein Structure Analysis with Sequential Monte Carlo Method

Jinfeng Zhang
Computational Biology Lab, Department of Statistics, Harvard University

SLIDE 2

Introduction

Structure Function & Interaction

– The Protein Structure Initiative (PSI) is speeding up the information flow from sequence to structures.
– Information does not readily flow from structures to structures.
– Neither does it readily flow from structures to applications.

What are the bottlenecks?

– Sampling method.
– Potential function.

SLIDE 3

Sampling Methods

- Folding & Growth


[Figure: the growth method compared with the folding method. From http://www.bioinformatics.buffalo.edu/]

SLIDE 4

Sequential Monte Carlo (SMC)

- Step by Step


Each sample has a weight!

Resampling
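The resampling step can be illustrated with a short sketch. The slide does not say which resampling scheme is used, so this example assumes plain multinomial resampling; the function name `resample` and the toy samples are hypothetical.

```python
import random

def resample(samples, weights, rng):
    """Multinomial resampling (an assumed scheme, not necessarily the slide's).

    Draw len(samples) items with probability proportional to weight, then
    assign every survivor the mean of the old weights so that the total
    weight is preserved.
    """
    n = len(samples)
    new_samples = rng.choices(samples, weights=weights, k=n)
    mean_weight = sum(weights) / n
    new_weights = [mean_weight] * n
    return new_samples, new_weights

rng = random.Random(0)
samples = ["a", "b", "c", "d"]   # stand-ins for partial chains
weights = [0.0, 1.0, 1.0, 2.0]   # made-up weights; "a" is a dead sample
new_samples, new_weights = resample(samples, weights, rng)
```

Samples with zero weight are never drawn, and heavily weighted samples tend to be duplicated, which is what keeps the population focused on promising partial chains.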

SLIDE 5

SMC

- Summary


Short chains:

– Exhaustive enumeration, useful for evaluation of SMC performance.

Long chains:

– Sequential Monte Carlo, estimating interesting properties.

The main ingredients of SMC are:

– Sequence of distributions “approaching” the target distribution π(x_1, …, x_n).
– Sampling distribution g_{t+1}(x_{t+1} | x_1, …, x_t).
– Resampling scheme.
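In the general SMC framework, these ingredients are tied together by an incremental weight update. As a sketch, write π_t for the intermediate distribution at step t (a symbol introduced here for illustration; the slide names only the final target π) and w_t for the weight of a partial sample:

```latex
w_{t+1} \;=\; w_t \cdot
\frac{\pi_{t+1}(x_1, \ldots, x_{t+1})}
     {\pi_t(x_1, \ldots, x_t)\; g_{t+1}(x_{t+1} \mid x_1, \ldots, x_t)}
```

Resampling is typically triggered when the weights become too uneven, and properties of interest are then estimated as weighted averages over the final samples.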

SLIDE 6

References for SMC

J.S. Liu and R. Chen (1998). SMC for dynamic systems. J. Amer. Statist. Assoc. 93, 1032-45.
J.S. Liu (2001). Monte Carlo Strategies in Scientific Computing. Springer-Verlag.
J. Liang, J. Zhang, and R. Chen (2002). J. Chem. Phys. 117:7, 3511-3521.
J. Zhang, R. Chen, C. Tang, and J. Liang (2003). J. Chem. Phys. 118:12, 6102-6109.
J. Zhang, Y. Chen, R. Chen, and J. Liang (2004). J. Chem. Phys. 121:1, 592-603.

SLIDE 7

Near Native Structures of Proteins

SLIDE 8

Native State is an Ensemble of Structures

[Figure: example structures labeled Ca2+ ATPase pump, Lac repressor, and 2BBN.]

Protein functions and interactions are determined by the near native structures.

SLIDE 9

Biological Problems

Stability

– Probability of near native structures (NNS) under the Boltzmann distribution.

Function

– Analysis of NNS to detect correlated structural changes.

Interaction

– Near native structures with diversified interfaces.

Difficulty of protein structure prediction

– Probability of NNS under uniform distribution.
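The two probabilities named on this slide differ only in how conformations are weighted. The sketch below assumes a small, fully enumerated conformation set with known energies; the function name, the energies, the NNS flags, and the choice kT = 1 are all made up for illustration.

```python
import math

def nns_probabilities(energies, is_nns, kT=1.0):
    """Return (P_boltzmann, P_uniform) for the NNS subset of an enumerated set.

    Boltzmann: each conformation is weighted by exp(-E / kT).
    Uniform:   each conformation is weighted equally.
    """
    e_min = min(energies)  # shift energies for numerical stability
    boltz = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(boltz)         # partition function (up to the shift)
    p_boltz = sum(b for b, flag in zip(boltz, is_nns) if flag) / z
    p_unif = sum(1 for flag in is_nns if flag) / len(energies)
    return p_boltz, p_unif

# Toy data: two low-energy near native conformations out of six.
energies = [-3.0, -2.5, 0.0, 0.0, 1.0, 2.0]
is_nns = [True, True, False, False, False, False]
p_boltz, p_unif = nns_probabilities(energies, is_nns)
```

With a potential function that favors the native region, the Boltzmann probability (stability) exceeds the uniform probability (how hard it is to hit the native region by unbiased sampling).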

SLIDE 10

Methods for Studying NNS

Experimental method, such as NMR

– Studies one protein at a time. Limited to certain protein types.

MD simulation

– Computationally expensive. Applicable to small proteins.

MCMC

– Folding around the constrained native structure template is not efficient.

NMR combined with MD

– Vendruscolo M, et al. Nature (2005), 433:128-32.

SLIDE 11

Near Native Structures

- Connecting Experimental Structures and Applications


SMC

SLIDE 12

Representation of Protein Structures

Optimized discrete state model (ODSM).

[Figure: ODSM geometry, with labels α_i, τ_i, C_{i+1}, C_i, C_{i-1}, C_{i-2}, SC_i, SC_{i-1}.]

Accuracy of ODSM.

[Plot: cRMSD (axis from 1.5 to 3.0) versus number of discrete states (3 to 10) for ALA, PRO, GLY, and HIS.]

SLIDE 13

Sequential Monte Carlo for Sampling NNS

Near Native Structures SMC

Native structure

Definition of NNS:

– Structures with RMSD < 3 Å to the native structure.
– Other similarity measures are possible.
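The RMSD criterion can be made concrete with a small sketch. The slides do not say how the RMSD is computed; this example assumes the common convention of measuring RMSD after optimal rigid superposition (the Kabsch algorithm) and uses NumPy. The function names and test coordinates are hypothetical.

```python
import numpy as np

def rmsd_after_superposition(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid
    superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)            # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                       # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])        # forbid reflections
    R = Vt.T @ D @ U.T                # proper rotation mapping P onto Q
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def is_near_native(coords, native, cutoff=3.0):
    """NNS test from the slide: RMSD to the native structure below 3 Å."""
    return rmsd_after_superposition(coords, native) < cutoff

# Toy check: a rotated and translated copy has RMSD close to zero.
native = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0],
                   [3.8, 3.8, 0.0], [3.8, 3.8, 3.8]])
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = native @ Rz.T + np.array([1.0, -2.0, 0.5])
rmsd_moved = rmsd_after_superposition(moved, native)
```

Any other similarity measure could be substituted by replacing `rmsd_after_superposition` inside `is_near_native`.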

SLIDE 14

Comparison with Enumeration I.

- Estimation of Number of Conformations


[Plot: ln(number of conformations) versus chain length (10 to 15) for 1ail with the 5-state model, exhaustive enumeration compared with SMC. Sample size: 10,000. Values shown on the slide: 1.042×10^9 and 1.039×10^9.]
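The idea of checking a sequential estimate against exhaustive enumeration can be tried on a toy problem. The sketch below is not the 5-state protein model from the slides: it swaps in self-avoiding walks on a 2D square lattice, and it uses plain sequential importance sampling (chain growth with weights, no resampling). All names are hypothetical.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def count_by_enumeration(n_steps, path=((0, 0),)):
    """Exact number of self-avoiding walks with n_steps steps (depth-first)."""
    if n_steps == 0:
        return 1
    x, y = path[-1]
    total = 0
    for dx, dy in MOVES:
        nxt = (x + dx, y + dy)
        if nxt not in path:
            total += count_by_enumeration(n_steps - 1, path + (nxt,))
    return total

def estimate_by_growth(n_steps, n_samples, rng):
    """Estimate the same count by growing chains one step at a time.

    Each step picks uniformly among the free neighbours; the chain's weight
    is the product of the numbers of free neighbours, so the mean weight is
    an unbiased estimate of the number of walks.
    """
    total_weight = 0.0
    for _ in range(n_samples):
        last = (0, 0)
        occupied = {last}
        weight = 1.0
        for _step in range(n_steps):
            x, y = last
            free = [(x + dx, y + dy) for dx, dy in MOVES
                    if (x + dx, y + dy) not in occupied]
            if not free:          # dead end: this chain contributes zero
                weight = 0.0
                break
            weight *= len(free)
            last = rng.choice(free)
            occupied.add(last)
        total_weight += weight
    return total_weight / n_samples

exact = count_by_enumeration(8)
estimate = estimate_by_growth(8, 10_000, random.Random(0))
```

Enumeration cost grows exponentially with chain length, while the growth estimate costs only `n_samples * n_steps` moves, which is why enumeration serves as the check for short chains and sampling takes over for long ones.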

SLIDE 15

Comparison with Enumeration II.

- Estimation of NNS


[Plot: ln(probability) versus RMSD bin for chain length 15, exhaustive enumeration compared with SMC. Sample size: 10,000. Values shown on the slide: 5.94 × 10^-8 and 5.60 × 10^-8.]

RMSD bins: 1: 1.0 Å - 1.5 Å; 2: 1.5 Å - 2.0 Å; 3: 2.0 Å - 2.5 Å; 4: 2.5 Å - 3.0 Å.
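A per-bin probability of this kind can be computed from weighted samples as a weighted fraction. The sketch below uses the four bin edges listed on the slide; the RMSD values, the weights, and the function name are made up for illustration.

```python
def bin_probabilities(rmsds, weights, edges):
    """Weighted fraction of samples falling in each bin [edges[k], edges[k+1])."""
    total = sum(weights)
    probs = [0.0] * (len(edges) - 1)
    for r, w in zip(rmsds, weights):
        for k in range(len(edges) - 1):
            if edges[k] <= r < edges[k + 1]:
                probs[k] += w / total
                break                 # samples outside every bin are skipped
    return probs

edges = [1.0, 1.5, 2.0, 2.5, 3.0]      # RMSD bins 1 to 4, in Å
rmsds = [1.2, 1.7, 2.6, 4.0, 2.9]      # made-up RMSDs of five samples
weights = [1.0, 2.0, 1.0, 5.0, 1.0]    # made-up importance weights
probs = bin_probabilities(rmsds, weights, edges)
```

The probabilities do not sum to one here because the sample at 4.0 Å lies outside all four bins but still counts toward the total weight.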

SLIDE 16

Comparison with Enumeration III.

- Estimation of Native Contacts


[Plots, panels a and b: probability (0.0 to 1.0) of each native contact (contacts 5 to 35 on the axis), exhaustive enumeration compared with SMC, for 1nkd in RMSD bin 2 (1.5 Å - 2.0 Å) and RMSD bin 4 (2.5 Å - 3.0 Å).]

SLIDE 17

Probability of NNS

- How Difficult Is Protein Structure Prediction?


Probability of NNS for 70 non-homologous proteins, grouped by length in intervals of 5 residues.

[Plot: log10(probability) (axis from −70 to −10) versus length (60 to 140) for RMSD < 3 Å, RMSD < 4 Å, and RMSD < 5 Å.]

SLIDE 18

Probability of NNS

- Effect of Model Complexity
