Global Molecular Replacement for Protein Structure Determination



SLIDE 1

Global Molecular Replacement for Protein Structure Determination

Ian Stokes-Rees SBGrid - Harvard Medical School

SLIDE 2

Rice University

  • E. Nikonowicz
  • Y. Shamoo
  • Y.J. Tao

CalTech

  • P. Bjorkman
  • W. Clemons
  • G. Jensen
  • D. Rees

Stanford

  • A. Brunger
  • K. Garcia
  • T. Jardetzky

UCSF

  • JJ Miranda
  • Y. Cheng

UC Davis

  • H. Stahlberg

UCSD

  • T. Nakagawa
  • H. Viadiu

WesternU

  • M. Swairjo

U. Washington

  • T. Gonen

Washington U. School of Med.

  • T. Ellenberger
  • D. Fremont

Vanderbilt Center for Structural Biology

Rosalind Franklin

  • D. Harrison
  • A. Leschziner
  • K. Miller
  • A. Rao
  • T. Rapoport
  • M. Samso
  • P. Sliz
  • T. Springer
  • G. Verdine
  • G. Wagner
  • L. Walensky

  • S. Walker
  • T. Walz

  • J. Wang
  • S. Wong
  • N. Beglova
  • S. Blacklow
  • B. Chen
  • J. Chou
  • J. Clardy
  • M. Eck
  • B. Furie
  • R. Gaudet
  • M. Grant

  • S.C. Harrison

  • J. Hogle
  • D. Jeruzalmi
  • D. Kahne
  • T. Kirchhausen

Harvard and Affiliates

NE-CAT

  • R. Oswald
  • C. Parrish
  • H. Sondermann
  • R. Cerione
  • B. Crane
  • S. Ealick
  • M. Jin
  • A. Ke

Cornell U.

Brandeis U.

  • N. Grigorieff

Tufts U.

  • K. Heldwein

UMass Medical

  • W. Royer

NIH

  • M. Mayer

U. Maryland

  • E. Toth
  • K. Reinisch
  • J. Schlessinger
  • F. Sigworth
  • F. Zhou
  • T. Boggon
  • D. Braddock
  • Y. Ha
  • E. Lolis

Yale U.

  • C. Sanders
  • B. Spiller
  • M. Stone
  • M. Waterman
  • W. Chazin
  • B. Eichman
  • M. Egli
  • B. Lacy

Columbia U.

  • Q. Fan

Rockefeller U.

  • R. MacKinnon

Thomas Jefferson

  • J. Williams

Not Pictured: University of Toronto: L. Howell, E. Pai, F. Sicheri; NHRI (Taiwan): G. Liou; Trinity College, Dublin: Amir Khan

SBGrid and NEBioGrid

SLIDE 3

Primary thesis: Molecular replacement, used to solve over 60% of known structures, can benefit from novel computationally intensive techniques to identify search models, including those with low sequence identity or a lack of previous association with the unknown structure.

Expected benefits:

  • identify search models which would otherwise be missed
  • faster bootstrapping of MR search model selection
  • broaden range of structures amenable to MR, avoiding more costly phasing techniques
  • allow greater parameter tuning of MR stage

Transferable infrastructure: the framework developed to support a 20,000 CPU-hour computation with 10 GB of data, 100,000 invocations of a scientific application, and the consequent results filtering, aggregation, and analysis can be re-used for other applications.

SLIDE 4

Traditional Molecular Replacement

  • One, or maybe more, carefully selected search models

Target Data

0.1 CPUh

10-20 Solutions

Internal Validation

Hit

+ Refinement Validation


SLIDE 5

Global Molecular Replacement

95,000 carefully edited search models

Target Data

~50K Solutions

External Validation

Hit

+ Refinement Validation

9500 CPUh

Score Individual Models
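The 9500 CPUh figure is consistent with scaling the roughly 0.1 CPUh of a single traditional run up to 95,000 search models. The sketch below works through that arithmetic. It assumes, which the slide does not state, that the total is a simple product of those two numbers. The 5 minute timeout and 2000 cores are the values quoted for the 2VLJ search elsewhere in the deck, and all function names are illustrative.

```python
# Back-of-envelope compute budget for a global MR search.
N_MODELS = 95_000          # search models (slide figure)
CPU_HOURS_PER_RUN = 0.1    # one traditional MR run (slide figure)
TIMEOUT_MINUTES = 5        # per-job cap quoted for the 2VLJ search
N_CORES = 2000             # OSG cores quoted for the 2VLJ search


def total_cpu_hours(n_models: int, hours_per_run: float) -> float:
    """Total CPU-hours if every model costs the same as one typical run."""
    return n_models * hours_per_run


def capped_cpu_hours(n_models: int, timeout_minutes: float) -> float:
    """Upper bound on CPU-hours when each job is stopped at the timeout."""
    return n_models * timeout_minutes / 60.0


def ideal_wall_hours(cpu_hours: float, n_cores: int) -> float:
    """Wall-clock hours with perfect parallelism and no queueing."""
    return cpu_hours / n_cores


if __name__ == "__main__":
    typical = total_cpu_hours(N_MODELS, CPU_HOURS_PER_RUN)
    capped = capped_cpu_hours(N_MODELS, TIMEOUT_MINUTES)
    print(f"typical estimate: {typical:.0f} CPUh")
    print(f"timeout-capped bound: {capped:.0f} CPUh")
    print(f"ideal wall time: {ideal_wall_hours(typical, N_CORES):.2f} h")
```

The ideal wall time comes out well below the quoted 24h. The sketch ignores queueing, eviction, and file staging on a shared grid, which may account for the difference.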

SLIDE 6

Small Physical Differences, Big Impact On Results

Panels: TARGET, MODEL A, MODEL B, MODEL C

differences in loops, and shifts of the secondary structure elements degrade results

SLIDE 7
  • Would global search work? What are the boundaries of the global search method?
  • What is the best scoring function?
  • Is MR score related to RMSD/sequence identity of the target molecule?
  • Real-life example
SLIDE 8

Target I: 2VLJ

α12 β2m α3 Vα Vβ

influenza-virus matrix peptide

presentation of the peptide by the major Histocompatibility Complex (MHC) molecule

(2 Immunoglobulin Domains + peptide binding domain)

T cell receptor

(4 Immunoglobulin Domains)

SLIDE 9

α12 β2m α3 Vα Vβ

  • SCOP d.19.1.1 - MHC antigen recognition domain, 568 domains
  • SCOP b.1.1.1 - antibody variable domain-like, 2001 domains
  • SCOP b.1.1.2 - antibody constant domain-like, 2535 domains

1 x MHC domain (~22% by MW) + 6 x Ig domain (~12.5% by MW each)

Molecular Weight of the complex: 94.495 kDa

SLIDE 10

α12 β2m α3 Vα Vβ

Search with 95K SCOP models

  • 5 min timeout
  • 2000 CPU cores on OSG
  • 24h

Selection Criteria:

  • a multidomain protein
  • wide range of models

Phaser - round I

Bjorkman et al. Structure of the human class I histocompatibility antigen, HLA-A2. Nature (1987) vol. 329 (6139) pp. 506-12

Garboczi et al. Structure of the complex between human T-cell receptor, viral peptide and HLA-A2. Nature (1996) vol. 384 (6605) pp. 134-41

SLIDE 11

2vlj

Top Scoring Solution: 1im3a2 (100%, 181 aa; TFZ=13, LLG=92)

2D representation of MR results

  • R factor (weak predictor)
  • TFZ (good predictor)
  • LLG (strongest predictor)

α12 domains SCOP class: d.19.1.1

negative positive

SLIDE 12

α12 β2m α3 Vα Vβ

Phaser - round II

Repeat the MR search with the 95K SCOP dataset, keeping the α12 domain fixed.

  • 5 min timeout
  • 2000 CPU cores on OSG
  • 24h

13% 14% 31% 18% 22%
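The round structure (search everything, fix the best placement, search again) can be sketched as a greedy loop. This is an illustration only, not the actual SBGrid workflow. `run_mr` is a hypothetical stand-in for one Phaser invocation, the toy scores are invented, and ranking by LLG alone and excluding already-placed models are both simplifying assumptions. In the slides, each round is also followed by refinement and validation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (TFZ, LLG) pair returned by one MR run; a hypothetical container.
Score = Tuple[float, float]
# run_mr(model, fixed) -> score; stands in for a real MR program call.
RunMR = Callable[[str, Tuple[str, ...]], Score]


def place_domains(models: Sequence[str], run_mr: RunMR, n_rounds: int) -> List[str]:
    """Greedy multi-round search: fix the best-LLG model, then repeat."""
    fixed: List[str] = []
    for _ in range(n_rounds):
        candidates = [m for m in models if m not in fixed]
        if not candidates:
            break
        scores: Dict[str, Score] = {m: run_mr(m, tuple(fixed)) for m in candidates}
        best = max(scores, key=lambda m: scores[m][1])  # rank by LLG
        fixed.append(best)
    return fixed


def toy_run_mr(model: str, fixed: Tuple[str, ...]) -> Score:
    """Toy scorer with invented numbers; LLG rises as domains are fixed."""
    base = {"model_a": (12.0, 90.0), "model_b": (6.0, 50.0), "model_c": (3.0, 15.0)}
    tfz, llg = base[model]
    return tfz, llg + 10.0 * len(fixed)


if __name__ == "__main__":
    print(place_domains(["model_a", "model_b", "model_c"], toy_run_mr, n_rounds=2))
```

In a real search each call to `run_mr` would be an independent grid job, so every round is embarrassingly parallel across the 95K models.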

SLIDE 13

RFZ LLG

Two solutions for Ig domains from TCR:

  • 1ogad1 (7.9, 43)
  • 1ogae1 (6.8, 37)

False positive:

  • 1g3iv_ (5.4, 46), HslUV protease-chaperone complex

Quick Refinement: R factor above 55

A B C A B C

SLIDE 14

A B E D

Domain A12 placed, searching for next domain

  • 1kgce2 (19.2, 220): 99.2%, 129 aa
  • 1ogad1: 100%, 115 aa
  • 1ogae1: 100%, 114 aa

SLIDE 15

Refinement 3 cycles of Rigid Body

rigid

three domains added 42.26/43.74 40.78/42.75

SLIDE 16

4 domains placed, searching for 3 remaining domains

b.1.1.2 b.1.1.1

  • #1: 1agdb - 100% B2M, 99 aa
  • #2: 2bnra1 - 100% A3, 95 aa
  • #3: 1kgcd2 - 100% D2, 89 aa

  • Top 280 solutions with B2M SCOP domains
  • Highest Scoring TCR D2 ranks as #345

SLIDE 17

Refinement 3 cycles of Rigid Body

rigid

three domains added 42.26/43.74 40.78/42.75 32.23/34.95

Solved!

SLIDE 18
  • Would global search work? What are the boundaries of the global search method?
  • What is the best MR scoring function?
  • Is MR score related to RMSD/sequence identity of the target molecule?
  • Real-life example
SLIDE 19

Common approach to molecular replacement: Least Squares match

Least Squares: commonly used as the model quality measure for molecular replacement. Select the model with minimum error between observed amplitudes |FO| and calculated amplitudes |FC|.

Problem: implicitly biased towards the model, because h (structure parameters) is selected based on model phasing.

Equation annotations: observations; parametric model to fit to observations; difference between scalar amplitudes; magnitude of vector difference.

Iterative Convergence: rotate the search model (3D RF), then translate (3D TF), to find the best (lowest) least squares fit.

real-space equivalent

Solution Quality: typically measured by a heuristic score or by a residual factor (a measure of agreement between the solution and the experimental observations).
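To make the amplitude-based measures concrete, the sketch below computes a least-squares residual and the conventional crystallographic residual factor, R = Σ| |FO| - k|FC| | / Σ|FO|, from two lists of amplitudes. The equations on the slide were not preserved, so the exact form used by the author is an assumption. The single overall scale factor `k` and the function names are illustrative choices.

```python
from typing import Optional, Sequence


def scale_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Scale k that minimises sum((|Fo| - k*|Fc|)**2)."""
    num = sum(o * c for o, c in zip(f_obs, f_calc))
    den = sum(c * c for c in f_calc)
    return num / den


def least_squares_residual(
    f_obs: Sequence[float], f_calc: Sequence[float], k: Optional[float] = None
) -> float:
    """Sum of squared differences between scalar amplitudes."""
    if k is None:
        k = scale_factor(f_obs, f_calc)
    return sum((o - k * c) ** 2 for o, c in zip(f_obs, f_calc))


def r_factor(
    f_obs: Sequence[float], f_calc: Sequence[float], k: Optional[float] = None
) -> float:
    """Residual factor as a fraction (multiply by 100 for percent)."""
    if k is None:
        k = scale_factor(f_obs, f_calc)
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0]
    good_model = [5.0, 10.0, 15.0]   # proportional to the observations
    poor_model = [5.0, 10.0, 20.0]   # one amplitude is off
    print(r_factor(observed, good_model))
    print(r_factor(observed, poor_model))
```

Both measures use amplitudes only, so they say nothing directly about phase error. That is the bias noted above.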

SLIDE 20

Phaser (maximum likelihood)

Clear separation between two populations (positive, negative)!

Molrep (Crowther rotation + FFT in reciprocal space)

Axes: TFZ, LLG

Phaser performs better (although more CPU demanding)

Fast and slow searches return comparable results

SLIDE 21

TFZ / LLG: Extended range of correct solutions! (α12)

  • traditional TFZ region: TFZ > 7
  • extended TFZ/LLG region: TFZ > 4

  • 2ak4f2: 80%
  • 2mhac2: 72%
  • 1mhca2: 60%, B=24
  • 2nx5q2: 60%, B=44
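The two TFZ cutoffs can be applied as a simple triage over (TFZ, LLG) pairs. In this sketch only the TFZ thresholds (7 and 4) come from the slide. The LLG cutoff `MIN_LLG` is a hypothetical value chosen for illustration, and requiring it only in the extended region is an assumption about how the two scores are combined. The example inputs are score pairs quoted elsewhere in the deck.

```python
TRADITIONAL_TFZ = 7.0   # from the slide: traditional region, TFZ > 7
EXTENDED_TFZ = 4.0      # from the slide: extended region, TFZ > 4
MIN_LLG = 40.0          # HYPOTHETICAL cutoff, not taken from the slides


def classify(tfz: float, llg: float, min_llg: float = MIN_LLG) -> str:
    """Label one MR solution by the score region it falls into."""
    if tfz > TRADITIONAL_TFZ:
        return "traditional"
    if tfz > EXTENDED_TFZ and llg >= min_llg:
        return "extended"
    return "rejected"


if __name__ == "__main__":
    # (TFZ, LLG) pairs as quoted for the alpha-1/2 domain searches.
    solutions = {
        "1im3a2": (13.0, 92.0),
        "1mhca2": (6.0, 49.0),
        "1zagb2": (4.8, 31.0),
        "d2fsea2": (3.1, 14.0),
    }
    for name, (tfz, llg) in solutions.items():
        print(name, classify(tfz, llg))
```

Solutions in the extended region would still need external validation, for example a refinement run, before being accepted.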

SLIDE 22

Rotation Function Score

LLG heat

MHC molecules

SLIDE 23
  • Would global search work? What are the boundaries of the global search method?
  • What is the best MR scoring function?
  • Is MR score related to RMSD/sequence identity of the target molecule?
  • Real-life example
SLIDE 24

Search for the first molecule:

MHC

Ig Ig

MHC

With a small fraction of the target (~22%), sequence identity > 60% (rmsd < 1.5) is required. For Ig domains (~12%), even 100% is barely sufficient.

Seq ID heat

SLIDE 25

2vlj

Differences between A12 solutions (A B C D)

SCOP ID (TFZ, LLG):

  • 1mhca2 (6, 49)
  • d2fsea2 (3.1, 14)
  • 1zagb2 (4.8, 31)
  • 1im3a2 (13, 92)

100% 64%, C

2nx5q2

(3.6,51) 84.8%

37.1%, W 14.7%

SLIDE 26

Structure Superimposition

Panels: TARGET, MODEL A, MODEL B, MODEL C

differences in loops, and shifts of the secondary structure elements degrade results

SLIDE 27

Ig Domains: variable and constant

Plots: LLG vs. Seq ID; LLG vs. RMSD

SLIDE 28
  • Would global search work? What are the boundaries of the global search method?
  • What is the best MR scoring function?
  • Is MR score related to RMSD/sequence identity of the target molecule?
  • Real-life example
SLIDE 29

72% Solvent

SLIDE 30

Sequence identity < 20%. Three cycles of refinement in Phenix shift the secondary structure elements and lower Rfac to 43%.

SLIDE 31
SLIDE 32
  • NEBioGrid Django Portal: interactive dynamic web portal for workflow definition, submission, monitoring, and access control
  • NEBioGrid Web Portal: GridSite based web portal for file-system level access (raw job output), meta-data tagging, X.509 access control/sharing, CGI
  • PyCCP4: Python wrappers around CCP4 structural biology applications
  • PyCondor: Python wrappers around common Condor operations; enhanced Condor log analysis
  • PyOSG: Python wrappers around common OSG operations
  • PyGACL: Python representation of GACL model and API to work with GACL files
  • osg_wrap: Swiss army knife OSG wrapper script to handle file staging, parameter sweep, DAG, results aggregation, monitoring
  • sbanalysis: data analysis and graphing tools for structural biology data sets
  • osg.monitoring: tools to enhance monitoring of job set and remote OSG site status
  • shex: write bash scripts in Python: replicate commands, syntax, behavior
  • xconfig: universal configuration

SLIDE 33

Example Job Set

Sites (figure): UNL, FNAL, MIT, HMS, Caltech, UCR, Purdue, Buffalo, Cornell, ND, SPRACE, UWisc, RENCI

  • 10k grid jobs
  • approx 30k CPU hours
  • 99.7% success rate
  • 24 wall clock hours

Job status plot (10,000 jobs, 24 hours): held - orange, evicted - red, completed - green, running, remote queue, local queue

SLIDE 34

Job Lifelines

SLIDE 35

Typical Layered Environment

  • Command line application (e.g. Fortran)
  • Friendly application API wrapper
  • Batch execution wrapper for N-iterations
  • Results extraction and aggregation
  • Grid job management wrapper
  • Web interface: forms, views, static HTML results
  • GOAL: eliminate shell scripts, often found as “glue” language between layers

Layers: Python API, Fortran bin, Multi-exec wrapper, Result aggregator, Grid management, Web interface

Map-Reduce
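The lower layers of this stack can be sketched in plain Python with no shell glue. This is a minimal illustration, not the PyCCP4 or osg_wrap API. The "command line application" is a stand-in one-line Python program that prints a fake score, and `run_app`, `parse_score`, and `map_reduce` are hypothetical names. A local thread pool takes the place of grid job management.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence, Tuple

# Stand-in for a real command-line program: prints "<model> LLG <score>".
FAKE_APP = "import sys; m = sys.argv[1]; print(m, 'LLG', len(m) * 10)"


def run_app(model_id: str, timeout_s: float = 300.0) -> Optional[str]:
    """API wrapper layer: run one job, return stdout or None on failure."""
    cmd = [sys.executable, "-c", FAKE_APP, model_id]
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=True
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return None
    return proc.stdout


def parse_score(stdout: str) -> Tuple[str, float]:
    """Results extraction layer: pull (model, score) out of raw output."""
    name, _label, value = stdout.split()
    return name, float(value)


def map_reduce(model_ids: Sequence[str], workers: int = 4) -> List[Tuple[str, float]]:
    """Batch layer: map run_app over all models, reduce to a ranked list."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(run_app, model_ids))
    scores = [parse_score(out) for out in outputs if out is not None]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in map_reduce(["a", "abcd", "ab"]):
        print(name, score)
```

Keeping each layer as a small function means the same parsing and ranking code can be reused whether the jobs run locally or are dispatched to a grid.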

SLIDE 36

Acknowledgements

  • Piotr Sliz: PI and SBGrid team leader
  • Peter Doherty: Grid Administrator
  • Ian Levesque: Systems Architect
  • Ben Eisenbraun: Software Curator
  • Steve Jahl: System Administrator

http://abitibi.sbgrid.org

http://www.nebiogrid.org

SLIDE 37

The End

SLIDE 38

for later

SLIDE 39

Structure Determination Strategy:

  • data file + low-scoring hits for domains 1 .. n: Molecular Replacement, Refinement in Phenix, Final Model LS
  • data file + high-scoring hits for domains 1 .. n: Molecular Replacement, Refinement in Phenix, Final Model HS

SLIDE 40

2vlj

Top Scoring Solution: 1im3a2

color all Ig domains

2D representation of MR results

  • R factor (bad predictor)
  • TFZ (good predictor)
  • LLG (good predictor)

α12 domains SCOP: d.19.1.1

  • wide TFZ range of solutions (from 3.5 to 14), which overlaps with missed searches
  • LLG score does not overlap with failed searches
  • both TFZ and LLG scores predict the most likely MR candidate

negative positive

2nx5q2

(3.6,51) 80.7%

SLIDE 41

  • 1qsee1 (19.8, 209)
  • 1vgkb1 (16.40, 193)
  • 1e4xl1 (9.0, 147)
  • 4lvea (4.2, 146)

Rfac=49.94, Rfac=49.09, Rfac=50.82

*Refined Rfac, Phenix

A B E D

77% 100% 19% 27%

*Pairwise Identity, Geneious 4.8.0

Domains A12 placed, searching for next domain

Rfac=50.66

SLIDE 42

Translation Function Z Score A B C D

3 cycles of refinement in Phenix Rigid Body + ADP

SLIDE 43

Discriminating Solutions RFZ TFZ