Toward automated structure determination from near-atomic resolution data

slide-1
SLIDE 1

Toward automated structure determination from near-atomic resolution data

Frank DiMaio
University of Washington, Institute for Protein Design
November 2014

slide-2
SLIDE 2

Accurate structure determination with RosettaEM

All-atom refinement

  • B factor fitting
  • Cross validation

de novo model building

Homology modelling

  • Template identification
  • Multi-model docking

Model extension

slide-3
SLIDE 3

Accurate structure determination with RosettaEM

All-atom refinement

  • B factor fitting
  • Cross validation

de novo model building

Homology modelling

  • Template identification
  • Multi-model docking

Model extension

slide-4
SLIDE 4

Lack of sidechain detail makes identifying sequence difficult

4.8 Å reconstruction of the 20S proteasome (courtesy of Yifan Cheng & Xueming Li)

Crystallographic “autotracing”:

  • Backbone tracing
  • Sequence registration

slide-5
SLIDE 5

Searching density for local backbone conformations

Local sequence restricts local structure

…CVKVTKPLVARAKL…

  • 6-dimensional search
  • sidechain building & refinement

Ray Wang (in review)

slide-6
SLIDE 6

Selecting a maximally consistent set of fragments

Idea: The correct placements must all be consistent

  • adjacent fragments must assign the same residue to the same location
  • residues close in sequence must be close in space
  • no two residues can occupy the same space

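$$\mathrm{score}(F) = \sum_{f_i \in F} sc_{\mathrm{dens}}(f_i) + \sum_{f_i, f_j \in F} sc_{\mathrm{overlap}}(f_i, f_j) + \sum_{f_i, f_j \in F} sc_{\mathrm{close}}(f_i, f_j) + \sum_{f_i, f_j \in F} sc_{\mathrm{clash}}(f_i, f_j)$$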
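The consistency score above can be sketched in Python. This is a minimal illustration under stated assumptions, not the RosettaEM implementation: the `Placement` class, the thresholds (`tol`, `clash_dist`, `slack`), the unit weights, and the convention that lower scores are better are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Placement:
    """Hypothetical container for one fragment placed in the density map."""
    start: int     # sequence index of the fragment's first residue
    coords: tuple  # one (x, y, z) C-alpha position per residue, in Angstroms
    dens: float    # precomputed density-fit score (assumed: lower is better)


def pair_terms(a, b, tol=2.0, clash_dist=3.0, ca_step=3.8, slack=0.5):
    """Return (overlap, close, clash) terms for two placements."""
    overlap = close = clash = 0.0
    for k, xa in enumerate(a.coords):
        for m, xb in enumerate(b.coords):
            i, j = a.start + k, b.start + m
            d = math.dist(xa, xb)
            if i == j:
                # Same residue in both fragments: reward agreement in space.
                overlap += -1.0 if d <= tol else 1.0
            else:
                # Residues |i - j| apart cannot be farther than the chain allows.
                if d > abs(i - j) * ca_step + slack:
                    close += 1.0
                # Two different residues cannot occupy the same space.
                if d < clash_dist:
                    clash += 1.0
    return overlap, close, clash


def score(fragments, w_overlap=1.0, w_close=1.0, w_clash=1.0):
    """Sum the per-fragment density term and the three pairwise terms."""
    total = sum(f.dens for f in fragments)
    for a, b in combinations(fragments, 2):
        o, c, x = pair_terms(a, b)
        total += w_overlap * o + w_close * c + w_clash * x
    return total


# Toy data: fragment b continues fragment a; b_bad is shifted 5 A sideways.
a = Placement(0, ((0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)), -5.0)
b = Placement(2, ((7.6, 0.0, 0.0), (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)), -5.0)
b_bad = Placement(2, ((7.6, 5.0, 0.0), (11.4, 5.0, 0.0), (15.2, 5.0, 0.0)), -5.0)

consistent = score([a, b])
inconsistent = score([a, b_bad])
```

In this toy case the consistent pair scores lower than the shifted pair, because the shared residue agrees in space and no chain-distance limit is violated.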

slide-7
SLIDE 7

Monte Carlo sampling correctly identifies sequence

[Figure: round 1 of the cycle density map, fragment placement, Monte Carlo sampling, partial model. Plot: score (×10³) and accuracy versus number of fragments assigned (50 to 64). RMSD scale 0.5 to 45 over secondary-structure elements H1, S1, S2, S3, H2, H3, S5, S6, S7, S8, H4, H5, S9, S10, H6.]
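The Monte Carlo step named on this slide can be sketched as a Metropolis search over subsets of candidate placements. This is a generic illustration, assuming a toggle move and a fixed temperature; the slides do not specify the move set or acceptance rule, and `monte_carlo_select`, `toy_score`, and the toy candidates are hypothetical names.

```python
import math
import random


def monte_carlo_select(candidates, score, n_steps=5000, kT=1.0, seed=0):
    """Metropolis sampling over subsets of candidates; lower score is better."""
    rng = random.Random(seed)
    current = set()
    cur_score = score(current)
    best, best_score = set(current), cur_score
    for _ in range(n_steps):
        # Trial move: add or remove one randomly chosen candidate.
        trial = set(current)
        trial ^= {rng.choice(candidates)}
        s = score(trial)
        # Always accept improvements; accept worse states with Boltzmann odds.
        if s <= cur_score or rng.random() < math.exp((cur_score - s) / kT):
            current, cur_score = trial, s
            if s < best_score:
                best, best_score = set(trial), s
    return best, best_score


# Toy problem: four placements with density scores; A and C are incompatible.
dens = {"A": -3.0, "B": -2.0, "C": -2.5, "D": -1.0}
conflicts = {frozenset(("A", "C"))}


def toy_score(selected):
    total = sum(dens[f] for f in selected)
    for x in selected:
        for y in selected:
            if x < y and frozenset((x, y)) in conflicts:
                total += 10.0  # heavy penalty for an inconsistent pair
    return total


best, best_score = monte_carlo_select(sorted(dens), toy_score)
```

In a real run the `score` callable would be the fragment-consistency score, and the accepted set would become the partial model passed to the next round.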

slide-8
SLIDE 8

Multiple rounds of sampling complete the model

[Figure: rounds 2 and 3 repeat the cycle density map, fragment placement, Monte Carlo sampling, partial model. Plots: score (×10³) versus number of fragments assigned (64 to 78 and 86 to 96); accuracy (%) per residue, with an RMSD scale of 0.5 to 45 over residues 25 to 221.]

slide-9
SLIDE 9

20S proteasome α-subunit at 4.8 Å

[Figure panels: density; final partial model (1.28 Å over 196/213 residues); overlay of the full-length model (red) on the native (blue) (1.19 Å)]

slide-10
SLIDE 10

Automatic structure determination is accurate in 6 of 9 cases

| Target | PDB ID (chain) | EMDB ID | Reported resolution (Å) | Length (aa) | Partial model Cα RMSd [Å] (%) | Full-length model Cα RMSd [Å] |
|---|---|---|---|---|---|---|
| TMV | 3j06 (A) | 5185 | 3.3 | 155 | 1.3 (81) | 1.7 |
| TRPV1 | 3j5q (A) | 5778 | 3.4 | 310 | 1.1 (76) | 1.4 |
| FrhA | 4ci0 (A) | 2513 | 3.4 | 385 | 2.3 (91) | 1.3 |
| FrhB | 4ci0 (C) | 2513 | 3.4 | 280 | 1.4 (85) | 1.7 |
| FrhG | 4ci0 (B) | 2513 | 3.4 | 228 | 1.6 (73) | 2.2 |
| BPP1 | 3j4u (A) | 5764 | 3.5 | 327 | 17.2 (42) | - |
| VP6 | 1qhd (A) | 1461 | 3.8 | 397 | 1.6 (52) | - |
| 20S-α | 1pma (A) | TBD | 4.8 | 221 | 1.3 (88) | 1.2 |
| STIV | 3j31 (A) | 5584 | 3.9 | 344 | 21.9 (26) | |
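The Cα RMSd values reported here require a superposition of the model on the native structure. A common way to compute this is the Kabsch algorithm; the sketch below assumes the two coordinate arrays are already matched residue by residue, and the function name `kabsch_rmsd` is illustrative. The slides do not say which superposition procedure was used.

```python
import numpy as np


def kabsch_rmsd(model, native):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(model, dtype=float)
    Q = np.asarray(native, dtype=float)
    if P.shape != Q.shape:
        raise ValueError("coordinate sets must have the same shape")
    # Remove the translation by centering both sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Toy check: a rotated and translated copy superimposes with zero RMSD.
rng = np.random.default_rng(1)
pts = rng.normal(size=(10, 3))
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = pts @ rot_z.T + np.array([1.0, 2.0, 3.0])
rmsd_same = kabsch_rmsd(pts, moved)
```

For a partial model, the same function would be applied only to the residues that were actually built.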

slide-11
SLIDE 11

Automatic structure determination is accurate in 6 of 9 cases

[Figure panels for each target: density; final partial model; overlay of the full-length model (red) on the native (blue)]

  • TRPV1 (3.4 Å): partial model 1.26 Å over 74.9% (236/315 residues); full-length model 1.43 Å
  • FrhB (3.4 Å): partial model 1.40 Å over 85.1% (239/281 residues); full-length model 1.62 Å

slide-12
SLIDE 12

Crystallographic chain tracing is generally unable to register sequence

Using Buccaneer:

| Target | PDB ID (chain) | Length (aa) | Cα atoms placed | Sequence registered | Correctly registered |
|---|---|---|---|---|---|
| TMV | 3j06 (A) | 155 | 145 | 56 | |
| TRPV1 | 3j5q (A) | 315 | 257 | 190 | |
| FrhA | 4ci0 (A) | 386 | 382 | 367 | 185 (48%) |
| FrhB | 4ci0 (C) | 281 | 192 | 186 | 126 (45%) |
| FrhG | 4ci0 (B) | 228 | 242 | 190 | 63 (27%) |
| BPP1 | 3j4u (A) | 327 | 339 | 162 | |
| VP6 | 1qhd (A) | 397 | 405 | 155 | |
| 20S-α | 1pma (A) | 221 | 224 | 135 | 7 (3%) |
| STIV | 3j31 (A) | 345 | 553 | 259 | |

slide-13
SLIDE 13

Failures are primarily in sheets

[Figure panels: density; native; partial model]

  • Rotavirus VP6 (3.8 Å): 1.62 Å over 52.1% (207/397 residues)
  • STIV (3.9 Å): 2.46 Å over 20.0% (69/345 residues)

slide-14
SLIDE 14

VipAB structure determination

with Misha Kudryashev, Marek Basler, Ed Egelman (in review)

VipA: 168 residues
VipB: 492 residues

slide-15
SLIDE 15

446/660 residues

VipAB structure determination

slide-16
SLIDE 16

Our method corrects errors from the manually traced model

[Figure panels: manual model; automated model]

slide-17
SLIDE 17

Our method corrects errors from the manually traced model

[Figure: per-residue Edens and Egeom for residues GLU_88 through GLU_121, automatic versus manual model]

slide-18
SLIDE 18

Accurate structure determination with RosettaEM

All-atom refinement

  • B factor fitting
  • Cross validation

Model extension

de novo model building

Homology modelling

  • Template identification
  • Multi-model docking
slide-19
SLIDE 19

Refinement against EM density

  • Refinement
  • identify (and correct) errors in the initial model
  • improve fit to data
  • improve model geometry

slide-20
SLIDE 20

Refinement at low resolution requires a better geometry potential

Refinement: find atom positions optimizing

$$E = E_{\mathrm{geom}} + w \cdot E_{\mathrm{data}}$$

[Figure: Egeom and Edata compared at high resolution and at low resolution]
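The effect of the weight w can be seen in a one-dimensional toy problem. The sketch below assumes both terms are harmonic, with geometry preferring `x_ideal` and the data preferring `x_obs`; these functional forms, the function name, and the step size are illustrative assumptions, not the Rosetta energy function.

```python
def minimize_total(x_ideal, x_obs, w, steps=2000, lr=0.01):
    """Gradient descent on E(x) = (x - x_ideal)**2 + w * (x - x_obs)**2."""
    x = x_ideal
    for _ in range(steps):
        grad_geom = 2.0 * (x - x_ideal)  # pull toward ideal geometry
        grad_data = 2.0 * (x - x_obs)    # pull toward the observed density
        x -= lr * (grad_geom + w * grad_data)
    return x


# Closed form for this toy energy: x* = (x_ideal + w * x_obs) / (1 + w).
x_low_w = minimize_total(0.0, 1.0, w=0.1)    # stays near the geometric ideal
x_high_w = minimize_total(0.0, 1.0, w=20.0)  # moves almost fully onto the data
```

With a small weight the solution is dominated by the geometry term, and with a large weight by the data term. At low resolution the data term is less informative, which is why the geometry term has to carry more of the burden.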

slide-21
SLIDE 21

Rosetta forcefield disambiguates low-resolution solutions

  • Core packing
  • Electrostatics
  • Hydrogen bonding
  • Torsional probabilities
  • Rotamer probabilities

Information from known structures reduces conformational space

+ tools for improved optimization (discrete sidechain optimization, torsion and Cartesian space minimization, dynamics)

slide-22
SLIDE 22

Our approach improves refinement against low-resolution crystallographic data

[Figure: histogram of number of structures versus RMS to deposited structure (bins 0-1 through 6-7), comparing start, phenix, DEN, Refmac, and Rosetta]

slide-23
SLIDE 23

Key components for refinement against cryoEM

  • Model validation
  • Independent map agreement over high-resolution shells
  • Variations in local resolution
  • Atomic B factors describing how spread out the density is around each atom

  • Small radius of convergence
  • Discrete backbone optimization in refinement

slide-24
SLIDE 24

Independent validation

  • Refine models into reconstruction 1 (“train map”)
  • Evaluate models against reconstruction 2 (“test map”)

slide-25
SLIDE 25

Independent validation

[Figure: Fourier shell correlation versus 1/resolution (1/Å), with 10 Å and 6 Å marked. Curves: training versus test; training versus w=0.1 model; testing versus w=0.1 model; training versus w=20 model; testing versus w=20 model.]
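A Fourier shell correlation curve of the kind used on this slide can be computed with NumPy as sketched below. The sketch assumes two maps on the same grid; for a model-versus-map comparison, one of them would be a density map simulated from the model. The function name `fsc` and the shell binning are illustrative choices, and frequencies are returned in cycles per voxel (divide by the voxel size in Å to get 1/Å).

```python
import numpy as np


def fsc(map1, map2, n_shells=None):
    """Fourier shell correlation between two maps of identical shape."""
    if map1.shape != map2.shape:
        raise ValueError("maps must have the same shape")
    f1 = np.fft.fftn(map1)
    f2 = np.fft.fftn(map2)
    # Radial spatial frequency of every Fourier voxel, in cycles per voxel.
    axes = [np.fft.fftfreq(n) for n in map1.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grids)).ravel()
    if n_shells is None:
        n_shells = min(map1.shape) // 2
    edges = np.linspace(0.0, 0.5, n_shells + 1)
    shell = np.digitize(radius, edges) - 1  # shell index per voxel
    cross = (f1 * np.conj(f2)).real.ravel()
    power1 = (np.abs(f1) ** 2).ravel()
    power2 = (np.abs(f2) ** 2).ravel()
    curve = np.zeros(n_shells)
    for s in range(n_shells):
        mask = shell == s
        denom = np.sqrt(power1[mask].sum() * power2[mask].sum())
        curve[s] = cross[mask].sum() / denom if denom > 0 else 0.0
    return edges[1:], curve


# Toy check: a map correlates perfectly with itself, not with unrelated noise.
rng = np.random.default_rng(0)
vol = rng.normal(size=(16, 16, 16))
noise = rng.normal(size=(16, 16, 16))
freqs, curve_same = fsc(vol, vol)
_, curve_noise = fsc(vol, noise)
```

A model that has been overfit to the training map would show a high FSC against that map but a lower FSC against the independent test map in the high-resolution shells.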

slide-26
SLIDE 26

Independent validation

[Figure: Rosetta energy (×10⁵) and FSC (12 to 6 Å) against the training map and the testing map, plotted versus density weight, log(wa)]

slide-27
SLIDE 27

Fitting atomic B factors

  • In addition to refining atomic coordinates, refine per-atom B factors (in real space)

  • Alternate coordinate refinement and B factor refinement
  • Constraint function keeps B factors of nearby atoms close
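Two pieces of the B factor step can be sketched in Python. The first uses the standard relation B = 8π²⟨u²⟩ to convert a B factor into the width of the Gaussian blur around an atom. The second is one possible form of the restraint that keeps B factors of nearby atoms close: a quadratic penalty within a distance cutoff. The quadratic form, the 4 Å cutoff, and both function names are assumptions; the slides do not give the actual constraint function.

```python
import numpy as np


def b_to_sigma(b):
    """Per-axis Gaussian width (Angstroms) implied by an isotropic B factor."""
    return np.sqrt(np.asarray(b, dtype=float) / (8.0 * np.pi ** 2))


def b_smoothness_penalty(coords, b, cutoff=4.0):
    """Sum of squared B factor differences over atom pairs closer than cutoff."""
    coords = np.asarray(coords, dtype=float)
    b = np.asarray(b, dtype=float)
    # All pairwise distances, then keep each close pair once (upper triangle).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.where(np.triu(dist < cutoff, k=1))
    return float(np.sum((b[i] - b[j]) ** 2))


# Toy example: atoms 0 and 1 are neighbours, atom 2 is far from both.
xyz = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
bfac = [20.0, 30.0, 100.0]
penalty = b_smoothness_penalty(xyz, bfac)
sigma = b_to_sigma(bfac)
```

In an alternating scheme, a term like this would be added to the density-fit objective during the B factor pass and left out during the coordinate pass.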

slide-28
SLIDE 28

Model B factors agree well with crystallographic B factors

[Figure panels: cryoEM map with real-space B factors; deposited crystal structure (1pma)]

slide-29
SLIDE 29

Iterative density-guided conformational sampling

  • find allowed backbone conformations
  • optimize into density with minimal forcefield

slide-30
SLIDE 30

Assessing the role of starting-model quality on structure determination

20S proteasome at 3.3 Å resolution (with Yifan Cheng, Xueming Li)

| Template | Sequence ID |
|---|---|
| 1yar | 100% |
| 3h4p | 50% |
| 3nzj | 32% |
| 1iru | 30% |
| 1ryp | 30% |
| 1q5q | 26% |
| 3unf | 25% |
| 1m4y | 20% |
| 2x3b/2z3b | 19% |
| 4hnz | 17% |
| 1g3k | 17% |
| 1g0u | 17% |

[Figure: fraction of residues within 1 Å for each starting model (sorted by difficulty), comparing input, MDFF, and Rosetta]
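The accuracy measure plotted here, the fraction of residues within 1 Å, can be sketched as follows. The sketch assumes one representative atom per residue (for example Cα), coordinates that are already in the same frame as the reference, and a one-to-one residue correspondence; the slides do not state which atoms were compared, and `fraction_within` is an illustrative name.

```python
import numpy as np


def fraction_within(model_xyz, reference_xyz, cutoff=1.0):
    """Fraction of matched residues whose positions deviate by at most cutoff (A)."""
    model = np.asarray(model_xyz, dtype=float)
    reference = np.asarray(reference_xyz, dtype=float)
    if model.shape != reference.shape:
        raise ValueError("model and reference must list the same residues")
    deviation = np.linalg.norm(model - reference, axis=1)
    return float(np.mean(deviation <= cutoff))


# Toy example: per-residue deviations of 0.2, 0.9, 1.5, and 3.0 Angstroms.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
mod = [(0.2, 0.0, 0.0), (1.9, 0.0, 0.0), (3.5, 0.0, 0.0), (6.0, 0.0, 0.0)]
frac = fraction_within(mod, ref)
```

Unlike a global RMSD, this measure is not dominated by a few badly placed residues, which makes it convenient for comparing starting models of very different quality.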

slide-31
SLIDE 31

[Figure: fraction of residues within 1 Å for each starting model (sorted by difficulty), comparing input, MDFF, and Rosetta, in three panels: 6.0 Å (1k particles), 4.4 Å (3k particles), 4.1 Å (5k particles)]

We can determine structures to atomic accuracy from maps at 4.4 Å resolution or better

slide-32
SLIDE 32

Model convergence is an indicator of accuracy

[Figure panels: 1gou (3.3 Å); 1gou (6.0 Å)]

slide-33
SLIDE 33

Independent FSC is an indicator of accuracy (though not absolute)

[Figure: fraction of residues within 1 Å versus FSC]

slide-34
SLIDE 34

Model strain also can indicate errors

[Figure: angle violations (energy units) per residue]

slide-35
SLIDE 35

Refinement of TRPV1: Deposited structure

[Figure legend: bond lengths, bond angles, dihedral angles, sidechain rotamer outliers, Cβ deviations, Ramachandran angles, no violations]

slide-36
SLIDE 36

Local strain reveals errors

[Figure legend: bond lengths, bond angles, dihedral angles, sidechain rotamer outliers, Cβ deviations, Ramachandran angles, no violations]

slide-37
SLIDE 37

Local strain reveals errors

[Figure legend: bond lengths, bond angles, dihedral angles, sidechain rotamer outliers, Cβ deviations, Ramachandran angles, no violations]

slide-38
SLIDE 38

Final refined model

[Figure legend: bond lengths, bond angles, dihedral angles, sidechain rotamer outliers, Cβ deviations, Ramachandran angles, no violations]

slide-39
SLIDE 39

Cross-validation – low/no overfitting

[Figure: model-map FSC, deposited versus refined. Fourier shell correlation on the testing map versus resolution (1/Å) for the deposited model and the Rosetta-refined model; resolution limit (3.4 Å) marked.]

[Figure: model-map FSC, train versus test. Fourier shell correlation versus resolution (1/Å) for the training map and the testing (cross-validation) map; resolution limit (3.4 Å) marked.]

slide-40
SLIDE 40

Conclusions

  • Atomic accuracy is possible from near-atomic resolution (up to 4.5 Å) data

  • Have we solved it? Do we have…
  • Good fit to independent data (locally and globally)?
  • No model strain / molprobity outliers?
  • Well converged ensemble of solutions satisfying the above two?

slide-41
SLIDE 41

Method availability

All-atom refinement

  • B factor fitting
  • Cross validation

de novo model building

Homology modelling

  • Template identification
  • (Multi-model) docking

Model extension

Available currently

  • demos, documentation

Available currently (as RosettaCM)

  • demos, documentation
  • alternate methods under development

Available ~Dec 2014

slide-42
SLIDE 42

Acknowledgements

  • Collaborators
  • Wah Chiu (Baylor), Junjie Zhang (Texas A&M)
  • Tom Marlovits (IMBA, Austria)
  • Ed Egelman (U. Virginia)
  • Misha Kudryashev, Marek Basler (U. Basel)
  • Xueming Li, Yifan Cheng (UCSF)
  • Students & Postdocs
  • Ray Wang
  • Patrick Conway
  • Brandon Frenz
  • Zibo Chen
  • Ryan Pavlovicz
