Toward automated structure determination from near-atomic resolution data
Frank DiMaio
University of Washington, Institute for Protein Design
November 2014
Accurate structure determination with RosettaEM
All-atom refinement
- B factor fitting
- Cross validation
de novo model building
Homology modelling
- Template identification
- Multi-model docking
Model extension
Lack of sidechain detail makes identifying sequence difficult
4.8 Å reconstruction of the 20S proteasome (courtesy of Yifan Cheng & Xueming Li)
Backbone tracing
Sequence registration
Crystallographic “autotracing”:
Searching density for local backbone conformations
Local sequence restricts local structure
6-dimensional search; sidechain building & refinement (Ray Wang, in review)
…CVKVTKPLVARAKL…
Selecting a maximally consistent set of fragments
Idea: The correct placements must all be consistent
- adjacent fragments must assign the same residue to the same location
- residues close in sequence must be close in space
- no two residues can occupy the same space
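\[
\mathrm{score}(F) = \sum_{f_i \in F} sc_{\mathrm{dens}}(f_i) + \sum_{f_i, f_j \in F} sc_{\mathrm{overlap}}(f_i, f_j) + \sum_{f_i, f_j \in F} sc_{\mathrm{close}}(f_i, f_j) + \sum_{f_i, f_j \in F} sc_{\mathrm{clash}}(f_i, f_j)
\]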
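The fragment-set score and a Monte Carlo search over candidate placements can be sketched as follows. This is a toy illustration, not the RosettaEM implementation: the `Fragment` class, the weights, the distance tolerances, and the convention that lower scores are better are all assumptions made for the example.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    start: int     # sequence index of the first residue (hypothetical layout)
    coords: tuple  # one (x, y, z) C-alpha position per residue
    dens: float    # precomputed density-fit score; lower is better (assumed)


def pair_score(fi, fj, w_overlap=-1.0, w_close=1.0, w_clash=10.0,
               same_tol=1.5, ca_step=3.8, slack=0.5, clash_tol=2.0):
    """Toy sc_overlap + sc_close + sc_clash for one pair of fragments."""
    total = 0.0
    for a, pa in enumerate(fi.coords):
        for b, pb in enumerate(fj.coords):
            ra, rb = fi.start + a, fj.start + b
            d = math.dist(pa, pb)
            if ra == rb:
                # Same residue: reward agreement, penalise disagreement.
                total += w_overlap if d <= same_tol else w_clash
            else:
                # Residues |ra - rb| apart cannot be further than a stretched chain.
                if d > ca_step * abs(ra - rb) + slack:
                    total += w_close
                # Different residues must not occupy the same space.
                if d < clash_tol:
                    total += w_clash
    return total


def score(frags):
    """score(F): single-fragment density terms plus all pairwise terms."""
    total = sum(f.dens for f in frags)
    for i in range(len(frags)):
        for j in range(i + 1, len(frags)):
            total += pair_score(frags[i], frags[j])
    return total


def monte_carlo(candidates, steps=2000, kT=1.0, seed=0):
    """Metropolis search over subsets: each move toggles one candidate."""
    rng = random.Random(seed)
    chosen, current = set(), 0.0
    best, best_score = set(), 0.0
    for _ in range(steps):
        k = rng.randrange(len(candidates))
        trial = chosen ^ {k}
        s = score([candidates[i] for i in sorted(trial)])
        if s <= current or rng.random() < math.exp((current - s) / kT):
            chosen, current = trial, s
            if s < best_score:
                best, best_score = set(trial), s
    return best, best_score
```

Recomputing the full score at every move keeps the sketch short; a practical implementation would cache pairwise terms and update only those involving the toggled fragment.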
Monte Carlo sampling correctly identifies sequence
[Figure: panels per round (Round 1 to Round 3): density map, fragment placement, Monte Carlo sampling, partial model. Axes: score (×10^3) versus number of fragments assigned; accuracy (%) versus residue; RMSD scale from 0.5 to 45; secondary-structure elements H1 to H6 and S1 to S10.]
Multiple rounds of sampling complete the model
[Figure: cycle of density map, partial model, fragment placement, and Monte Carlo.]
20S proteasome α-subunit at 4.8 Å
[Figure: density; final partial model; overlay of the full-length model (red) on the native (blue). Values shown: 1.28 Å, 196/213 residues, 1.19 Å.]
Automatic structure determination is accurate in 6 of 9 cases
| Target | PDB ID (chain) | EMDB ID | Reported resolution (Å) | Length (aa) | Partial model Cα RMSd [Å] (%) | Full-length model Cα RMSd [Å] |
|---|---|---|---|---|---|---|
| TMV | 3j06 (A) | 5185 | 3.3 | 155 | 1.3 (81) | 1.7 |
| TRPV1 | 3j5q (A) | 5778 | 3.4 | 310 | 1.1 (76) | 1.4 |
| FrhA | 4ci0 (A) | 2513 | 3.4 | 385 | 2.3 (91) | 1.3 |
| FrhB | 4ci0 (C) | 2513 | 3.4 | 280 | 1.4 (85) | 1.7 |
| FrhG | 4ci0 (B) | 2513 | 3.4 | 228 | 1.6 (73) | 2.2 |
| BPP1 | 3j4u (A) | 5764 | 3.5 | 327 | 17.2 (42) | - |
| VP6 | 1qhd (A) | 1461 | 3.8 | 397 | 1.6 (52) | - |
| 20S-α | 1pma (A) | TBD | 4.8 | 221 | 1.3 (88) | 1.2 |
| STIV | 3j31 (A) | 5584 | 3.9 | 344 | 21.9 (26) | |
[Figure: TRPV1 (3.4 Å) and FrhB (3.4 Å). Panels: density; final partial model; overlay of the full-length model (red) on the native (blue). TRPV1 values: 1.26 Å, 74.9% (236/315 residues), 1.43 Å. FrhB values: 1.40 Å, 85.1% (239/281 residues), 1.62 Å.]
Crystallographic chain tracing is generally unable to register sequence
Using Buccaneer:
| Target | PDB ID (chain) | Length (aa) | Cα atoms placed | Sequence registered | Correctly registered |
|---|---|---|---|---|---|
| TMV | 3j06 (A) | 155 | 145 | 56 | |
| TRPV1 | 3j5q (A) | 315 | 257 | 190 | |
| FrhA | 4ci0 (A) | 386 | 382 | 367 | 185 (48%) |
| FrhB | 4ci0 (C) | 281 | 192 | 186 | 126 (45%) |
| FrhG | 4ci0 (B) | 228 | 242 | 190 | 63 (27%) |
| BPP1 | 3j4u (A) | 327 | 339 | 162 | |
| VP6 | 1qhd (A) | 397 | 405 | 155 | |
| 20S-α | 1pma (A) | 221 | 224 | 135 | 7 (3%) |
| STIV | 3j31 (A) | 345 | 553 | 259 | |
Failures are primarily in sheets
[Figure: density, native, and partial model for two targets. Rotavirus VP6 (3.8 Å): 1.62 Å, 52.1% (207/397 residues). STIV (3.9 Å): 2.46 Å, 20.0% (69/345 residues).]
VipAB structure determination
with Misha Kudryashev, Marek Basler, Ed Egelman (in review)
VipA: 168 residues
VipB: 492 residues
446/660 residues
[Figure: manual model; automated model.]
Our method corrects errors from the manually traced model
[Figure: per-residue Edens and Egeom for residues GLU_88 to GLU_121, automatic versus manual model.]
Refinement against EM density
- Refinement
  - identify (and correct) errors in the initial model
  - improve fit to data
  - improve model geometry
Refinement at low resolution requires a better geometry potential
Refinement: find atom positions optimizing:
\[ E = E_{\mathrm{geom}} + w \cdot E_{\mathrm{data}} \]
[Figure: Egeom and Edata compared at high resolution and at low resolution.]
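The trade-off controlled by the weight w can be illustrated with a deliberately simple one-dimensional toy. Everything here is an assumption made for the example: atoms sit on a line, Egeom is a quadratic penalty on deviations from an ideal spacing `d0`, and Edata is the squared distance to noisy target positions standing in for density peaks. Because both terms are quadratic, the minimum of E = Egeom + w·Edata has a closed form.

```python
import numpy as np


def refine(targets, w, d0=3.8):
    """Minimise sum((x[i+1]-x[i]-d0)^2) + w * sum((x-t)^2) exactly."""
    t = np.asarray(targets, dtype=float)
    n = t.size
    D = np.diff(np.eye(n), axis=0)          # (n-1, n) difference operator
    A = D.T @ D + w * np.eye(n)             # positive definite for w > 0
    b = d0 * (D.T @ np.ones(n - 1)) + w * t
    return np.linalg.solve(A, b)


def e_geom(x, d0=3.8):
    """Toy geometry term: squared deviation of spacings from d0."""
    return float(np.sum((np.diff(x) - d0) ** 2))


def e_data(x, targets):
    """Toy data term: squared distance to the target positions."""
    return float(np.sum((np.asarray(x) - np.asarray(targets)) ** 2))


if __name__ == "__main__":
    noisy = [0.0, 3.0, 8.5, 11.0, 15.9]     # made-up positions
    for w in (0.1, 20.0):
        x = refine(noisy, w)
        print(w, round(e_geom(x), 3), round(e_data(x, noisy), 3))
```

A small w keeps the spacing close to ideal at the cost of data fit; a large w follows the noisy targets and lets the geometry degrade. A real force field is not quadratic, so actual refinement needs iterative minimization rather than a linear solve.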
Rosetta forcefield disambiguates low-resolution solutions
- Core packing
- Electrostatics
- Hydrogen bonding
- Torsional probabilities
- Rotamer probabilities
Information from known structures reduces conformational space
+ tools for improved optimization (discrete sidechain optimization, torsion and Cartesian space minimization, dynamics)
Our approach improves refinement against low-resolution crystallographic data
[Figure: histogram of number of structures versus RMS to the deposited structure (bins 0-1 to 6-7), comparing start, phenix, DEN, Refmac, and Rosetta.]
Key components for refinement against cryoEM
- Model validation
  - Independent map agreement over high-resolution shells
- Variations in local resolution
  - Atomic B factors describing how spread out the density is around each atom
- Small radius of convergence
  - Discrete backbone optimization in refinement
Independent validation
Refine models into reconstruction 1 (the “train map”)
Evaluate models against reconstruction 2 (the “test map”)
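Agreement between two maps (for example a model-derived map and the test map) is commonly measured by Fourier shell correlation. The following is a minimal NumPy sketch of that calculation, assuming two maps of identical shape on the same grid; the function name `fsc`, the equal-width shells, and the `voxel_size` parameter are choices made for this example, not part of any specific package.

```python
import numpy as np


def fsc(map1, map2, voxel_size=1.0, n_shells=10):
    """Fourier shell correlation between two 3D maps of equal shape.

    Returns shell-centre spatial frequencies (1/Å) and the FSC per shell.
    """
    f1 = np.fft.fftn(map1)
    f2 = np.fft.fftn(map2)
    freqs = [np.fft.fftfreq(n, d=voxel_size) for n in map1.shape]
    gx, gy, gz = np.meshgrid(*freqs, indexing="ij")
    s = np.sqrt(gx ** 2 + gy ** 2 + gz ** 2)      # |s| for every voxel
    edges = np.linspace(0.0, 0.5 / voxel_size, n_shells + 1)
    shell = np.digitize(s, edges) - 1             # shell index per voxel
    centres, values = [], []
    for k in range(n_shells):
        m = shell == k
        num = np.sum(f1[m] * np.conj(f2[m])).real
        den = np.sqrt(np.sum(np.abs(f1[m]) ** 2) * np.sum(np.abs(f2[m]) ** 2))
        centres.append(0.5 * (edges[k] + edges[k + 1]))
        values.append(num / den if den > 0 else 0.0)
    return np.array(centres), np.array(values)
```

Frequencies at or beyond the Nyquist limit along an axis are left out of every shell. Comparing the curve against the train map with the curve against the test map over the high-resolution shells is one way to detect overfitting.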
[Figure: Fourier shell correlation versus 1/resolution, S (1/Å), with 10 Å and 6 Å marked. Curves: training versus test; training versus w=0.1 model; testing versus w=0.1 model; training versus w=20 model; testing versus w=20 model.]
[Figure: FSC correlation (12-6 Å) against the training map and the testing map, and Rosetta energy (×10^5), as a function of density weight, log(wa).]
Fitting atomic B factors
- In addition to refining atomic coords, refine per-atom B factors (in real space)
- Alternate coordinate refinement and B factor refinement
- Constraint function keeps B factors of nearby atoms close
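One way to picture a constraint that keeps B factors of nearby atoms close is a quadratic smoothness penalty over atom pairs within a distance cutoff. The sketch below assumes that form; the cutoff, the weight `lam`, and the unrestrained per-atom estimates `b_free` are hypothetical inputs, and the actual constraint function used in the method may differ.

```python
import numpy as np


def neighbor_pairs(coords, cutoff=4.0):
    """Index pairs (i, j), i < j, of atoms closer than cutoff (Å)."""
    xyz = np.asarray(coords, dtype=float)
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    i, j = np.where(np.triu(d < cutoff, k=1))
    return list(zip(i.tolist(), j.tolist()))


def restrained_b(b_free, pairs, lam=1.0):
    """Minimise sum((B - b_free)^2) + lam * sum over pairs of (B_i - B_j)^2.

    Setting the gradient to zero gives (I + lam * L) B = b_free, where L is
    the graph Laplacian of the neighbour pairs.
    """
    b = np.asarray(b_free, dtype=float)
    n = b.size
    lap = np.zeros((n, n))
    for i, j in pairs:
        lap[i, i] += 1.0
        lap[j, j] += 1.0
        lap[i, j] -= 1.0
        lap[j, i] -= 1.0
    return np.linalg.solve(np.eye(n) + lam * lap, b)
```

In an alternating scheme, a step like this would be interleaved with coordinate refinement: coordinates are held fixed while B factors are updated, then the reverse.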
Model B factors agree well with crystallographic B factors
[Figure: cryoEM map with real-space B factors; deposited crystal structure (1pma).]
Iterative density-guided conformational sampling
- find allowed backbone conformations
- optimize into density with minimal forcefield
Assessing the role of starting-model quality on structure determination
| Template | Sequence ID |
|---|---|
| 1yar | 100% |
| 3h4p | 50% |
| 3nzj | 32% |
| 1iru | 30% |
| 1ryp | 30% |
| 1q5q | 26% |
| 3unf | 25% |
| 1m4y | 20% |
| 2x3b/2z3b | 19% |
| 4hnz | 17% |
| 1g3k | 17% |
| 1g0u | 17% |
20S proteasome at 3.3 Å resolution (with Yifan Cheng, Xueming Li)
[Figure: fraction of residues within 1 Å for each starting model (sorted by difficulty); series: input, MDFF, Rosetta.]
We can accurately determine structures to atomic resolution at 4.4 Å or better
[Figure: the same plot for maps at 6.0 Å (1k particles), 4.4 Å (3k particles), and 4.1 Å (5k particles); series: input, MDFF, Rosetta.]
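The accuracy measure on these plots, the fraction of residues within 1 Å of the reference, can be computed as below. This is a minimal sketch that assumes both structures are already in the same coordinate frame and are matched residue by residue by their Cα positions; the function name is illustrative.

```python
import numpy as np


def fraction_within(model_ca, native_ca, cutoff=1.0):
    """Fraction of residues whose C-alpha lies within cutoff (Å) of the native."""
    model = np.asarray(model_ca, dtype=float)
    native = np.asarray(native_ca, dtype=float)
    if model.shape != native.shape:
        raise ValueError("model and native must have matching residues")
    deviation = np.linalg.norm(model - native, axis=1)
    return float(np.mean(deviation <= cutoff))
```

Unlike a global RMSD, this measure is not dominated by a few badly placed residues, which makes it easier to compare starting models of very different quality.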
Model convergence is an indicator of accuracy
[Figure: 1g0u (3.3 Å) and 1g0u (6.0 Å).]
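Convergence can be quantified as the per-residue spread across independently refined models. The sketch below is one plausible way to do that, assuming an ensemble of Cα coordinates in a common frame; it is an illustration and not necessarily how convergence was assessed for these slides.

```python
import numpy as np


def per_residue_spread(ensemble):
    """RMS deviation of each residue from its ensemble-mean position.

    ensemble: array of shape (n_models, n_residues, 3), same frame for all.
    """
    xyz = np.asarray(ensemble, dtype=float)
    mean = xyz.mean(axis=0)
    sq_dev = ((xyz - mean) ** 2).sum(axis=2)   # (n_models, n_residues)
    return np.sqrt(sq_dev.mean(axis=0))
```

Residues with a small spread are those where independent trajectories agree; a large spread flags regions where the data do not pin down the conformation.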
Independent FSC is an indicator of accuracy (though not absolute)
[Figure: fraction of residues within 1 Å versus FSC.]
Model strain can also indicate errors
[Figure: angle violations (energy units) by residue.]
Legend: bond lengths; bond angles; dihedral angles; sidechain rotamer outliers; Cβ deviations; Ramachandran angles; no violations
Refinement of TRPV1: Deposited structure
Local strain reveals errors
[Figures: legend: bond lengths; bond angles; dihedral angles; sidechain rotamer outliers; Cβ deviations; Ramachandran angles; no violations.]
Final refined model
Cross-validation – low/no overfitting
[Figure: model-map FSC, deposited versus refined. Fourier shell correlation versus resolution (1/Å) on the testing map for the deposited model and the Rosetta-refined model; resolution limit (3.4 Å) marked.]
[Figure: model-map FSC, train versus test. Fourier shell correlation versus resolution (1/Å) for the training map and the testing (cross-validation) map; resolution limit (3.4 Å) marked.]
Conclusions
- Atomic accuracy is possible from near-atomic resolution (up to 4.5 Å) data
- Have we solved it? Do we have…
  - Good fit to independent data (locally and globally)?
  - No model strain / MolProbity outliers?
  - Well converged ensemble of solutions satisfying the above two?
Method availability
All-atom refinement
- B factor fitting
- Cross validation
de novo model building
Homology modelling
- Template identification
- (Multi-model) docking
Model extension
Available currently
- demos, documentation
Available currently (as RosettaCM)
- demos, documentation
- alternate methods under development
Available ~Dec 2014
Acknowledgements
- Collaborators
  - Wah Chiu (Baylor), Junjie Zhang (Texas A&M)
  - Tom Marlovits (IMBA, Austria)
  - Ed Egelman (U. Virginia)
  - Misha Kudryashev, Marek Basler (U. Basel)
  - Xueming Li, Yifan Cheng (UCSF)
- Students & Postdocs
  - Ray Wang
  - Patrick Conway
  - Brandon Frenz
  - Zibo Chen
  - Ryan Pavlovicz