SASBDB Small Angle Scattering Biological Data Bank, Erica Valentini


slide-1
SLIDE 1

SASBDB Small Angle Scattering Biological Data Bank

Erica Valentini

Dmitri Svergun group Solution Scattering from biological macromolecules EMBO course 2014

slide-2
SLIDE 2

Index

  • 1. Introduction:
      – What is SAS?
      – Do we need a SAS database?
  • 2. SASBDB:
      – Features
      – Usage
      – Quality check
      – Missing
  • 3. Conclusions

2 SAS EMBO Course 2014 11/2/2014

slide-3
SLIDE 3

slide-4
SLIDE 4

What is SAS?

SAS Experiment

|s| = 4π sin θ / λ

  • s: scattering vector
  • 2θ: scattering angle
  • λ: wavelength
  • I(s): intensity

[Figure labels: X-ray/neutron beam; scattering intensity, log I(s); low resolution model; ATSAS]
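
As a quick numerical illustration of the definition above, the following minimal sketch computes |s| from the scattering angle 2θ and the wavelength λ. The function name and the example values are illustrative assumptions and do not come from the slides.

```python
import math

def scattering_vector_magnitude(two_theta_deg: float, wavelength: float) -> float:
    """Return |s| = 4*pi*sin(theta)/lambda.

    two_theta_deg is the full scattering angle (2 theta) in degrees; the
    result has the inverse unit of `wavelength` (nm^-1 for nm input).
    """
    if wavelength <= 0:
        raise ValueError("wavelength must be positive")
    theta = math.radians(two_theta_deg) / 2.0  # half of the scattering angle
    return 4.0 * math.pi * math.sin(theta) / wavelength

# Hypothetical example: 2 theta = 1 degree at a wavelength of 0.15 nm
s = scattering_vector_magnitude(1.0, 0.15)
print(f"|s| = {s:.4f} nm^-1")
```

Passing the angle as 2θ rather than θ mirrors how the slide labels the scattering angle.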


slide-5
SLIDE 5

What is SAS?

ATSAS Package

  • Rg
  • MM
  • Dmax
  • Volume
  • Shape
  • Rigid body modelling
  • Missing fragments
  • Oligomeric mixtures
  • Flexible systems

slide-6
SLIDE 6

Do we need a SAS DB?

SA(X)S advantages

Increasing popularity of SAXS

  • Samples measured in solution.
  • Broad size range: from a few kDa to GDa.
  • New developments in software and hardware.
  • Fast experiments: micro- or milliseconds.
  • Small amount of sample: 5-30 μl.
  • Monitoring of alterations in environmental conditions.

slide-7
SLIDE 7

Do we need a SAS DB?

SAS database motivations

  • Increasing number of publications about SAS and the ATSAS package.
  • Increasing amount of data collected with a single experiment.
  • Importance of making the data underlying scientific publications available for the community.

[Figure: number of publications referring to biological SAS per year, 2002-2012; series: ATSAS, bioSAS]

Graewert, M.A. and Svergun, D.I. (2013) Impact and progress in small and wide angle X-ray scattering (SAXS and WAXS). Curr. Opin. Struct. Biol., 23, 748–54.
Franke, D., Kikhney, A.G. and Svergun, D.I. (2012) Automated acquisition and analysis of small angle X-ray scattering data. Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip., 689, 52–59.
Collins, F.S. and Tabak, L.A. (2014) Policy: NIH plans to enhance reproducibility. Nature, 505, 612–3.

slide-8
SLIDE 8

Do we need a SAS DB?

wwPDB SAS task force

“…a global repository is needed that holds standard format X-ray and neutron SAS data that is searchable and freely accessible for download”

Database and small angle scattering experts

SASBDB

Trewhella, J., Hendrickson, W.A., Kleywegt, G.J., Sali, A., Sato, M., Schwede, T., Svergun, D.I., Tainer, J.A., Westbrook, J. and Berman, H.M. (2013) Report of the wwPDB Small-Angle Scattering Task Force: Data Requirements for Biomolecular Modeling and the PDB. Structure, 21, 875–881.

slide-9
SLIDE 9

Do we need a SAS DB? Existing DB including SAS data

Database | SAS data included | Missing
wwPDB (Berman et al., 2007) | 47 models where SAS was used for refinement | Primary data used to calculate the models
DARA (Sokolova et al., 2003) | Scattering curves from 20,000 PDB structures | Models and possibility to deposit SAS data
Hura et al. (2009) | SAXS data and models | Complete search, cross-references to other databases, quality check on data
pE-DB (Varadi et al., 2014) | Scattering curves and ensemble models from disordered proteins | SAS data and models from “not disordered proteins”

slide-10
SLIDE 10

Do we need a SAS DB? Existing DB including SAS data

Berman, H., Henrick, K., Nakamura, H. and Markley, J.L. (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res., 35, D301–3.

slide-11
SLIDE 11

Do we need a SAS DB? Existing DB including SAS data

dara.embl-hamburg.de

Sokolova, A.V., Volkov, V. and Svergun, D.I. (2003) Prototype of a database for rapid protein classification based on solution scattering data. J. Appl. Crystallogr., 36, 865–868.

slide-12
SLIDE 12

Do we need a SAS DB? Existing DB including SAS data

Hura, G.L., Menon, A.L., Hammel, M., Rambo, R.P., Poole, F.L., Tsutakawa, S.E., Jenney, F.E., Classen, S., Frankel, K.A., Hopkins, R.C., et al. (2009) Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS). Nat. Methods, 6, 606–12.

slide-13
SLIDE 13

Do we need a SAS DB? Existing DB including SAS data

Varadi, M., Kosol, S., Lebrun, P., Valentini, E., Blackledge, M., Dunker, A.K., Felli, I.C., Forman-Kay, J.D., Kriwacki, R.W., Pierattelli, R., et al. (2014) pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins. Nucleic Acids Res., 42, D326–35.

slide-14
SLIDE 14


slide-15
SLIDE 15

SASBDB features:

  • 1. Entries
  • 2. Cross links
  • 3. Searching
  • 4. Browsing
  • 5. Benchmark
  • 6. Plots
  • 7. Interactivity
  • 8. Availability


slide-16
SLIDE 16
  • 1. Entries

www.sasbdb.org

slide-17
SLIDE 17
  • 1. Entries


slide-18
SLIDE 18
  • 2. Cross links


slide-19
SLIDE 19
  • 3. Searching
  • 1. Simple search:


slide-20
SLIDE 20
  • 3. Searching
  • 1. Simple search:

  • 2. Advanced search:

slide-21
SLIDE 21
  • 3. Searching

Browsing unit

slide-22
SLIDE 22
  • 4. Browsing

  • Unique code format: SASXXXN
  • Scattering curve
  • Model
  • Kratky plot
  • Experiment information
  • Publication
  • Structural parameters
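
The unique code format could be checked programmatically along the following lines. This is only a sketch: the slide gives the pattern SASXXXN without defining the placeholders, so the reading used here (each X an uppercase letter or digit, N a digit) is an assumption, as are the function name and the made-up sample codes.

```python
import re

# Assumed reading of "SASXXXN": literal "SAS", three uppercase
# alphanumeric characters (X), then one trailing digit (N).
ENTRY_CODE = re.compile(r"SAS[A-Z0-9]{3}[0-9]")

def looks_like_entry_code(code: str) -> bool:
    """Return True if `code` matches the assumed SASXXXN layout exactly."""
    return ENTRY_CODE.fullmatch(code) is not None

# Made-up candidates, not real database entries
for candidate in ("SASABC1", "SASABCD", "sasabc1", "SASABC12"):
    print(candidate, looks_like_entry_code(candidate))
```

Using `fullmatch` rather than `match` rejects codes with trailing characters, which matters if the check is used to validate user input.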


slide-23
SLIDE 23
  • 4. Browsing

Chronological order

Browse according to the selected field

slide-24
SLIDE 24
  • 5. Benchmark

Benchmark

slide-25
SLIDE 25
  • 5. Benchmark

  • 17 entries from a set of 14 “standard proteins”
  • SAXS and WAXS data
  • Extra purification steps
  • Benchmark for algorithm testing purposes
  • Dissemination

slide-26
SLIDE 26
  • 6. Plots

  • Scattering plot
  • Guinier region
  • Kratky plot
  • P(r) distribution
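
For readers unfamiliar with the Guinier region named above, the sketch below estimates Rg and I(0) from the standard Guinier approximation, ln I(s) ≈ ln I(0) − s²Rg²/3, by fitting a straight line to ln I(s) against s². It is a generic illustration on synthetic data, not the procedure used by SASBDB or ATSAS; the function name and all values are assumptions.

```python
import math

def guinier_fit(s_values, intensities):
    """Least-squares line through (s^2, ln I); returns (Rg, I0)."""
    x = [s * s for s in s_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # equals -Rg^2 / 3
    intercept = mean_y - slope * mean_x  # equals ln I(0)
    if slope >= 0:
        raise ValueError("non-negative slope: no valid Guinier region")
    return math.sqrt(-3.0 * slope), math.exp(intercept)

# Synthetic, noise-free data for Rg = 2.0 nm and I(0) = 100
true_rg, true_i0 = 2.0, 100.0
s = [0.05 + 0.01 * k for k in range(20)]
i_of_s = [true_i0 * math.exp(-((sv * true_rg) ** 2) / 3.0) for sv in s]
rg, i0 = guinier_fit(s, i_of_s)
print(f"Rg = {rg:.3f} nm, I(0) = {i0:.1f}")
```

The synthetic s range keeps s·Rg below about 0.5, well inside the low-angle range where the approximation is normally applied.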


slide-27
SLIDE 27

  • 6. Plots

[Plots: radius of gyration; maximum distance; MWs & Porod volume]

slide-28
SLIDE 28

  • 7. Interactivity

[Figure labels: Fitting 1, Model 1; Fitting 2, Model 2]

slide-29
SLIDE 29

  • 7. Interactivity

[Figure labels: Fitting 3; Model 1, Model 2, Model 3, Model 4]

slide-30
SLIDE 30

  • 7. Interactivity

[Figure labels: Experimental details; Molecule details]

slide-31
SLIDE 31
  • 8. Availability


slide-32
SLIDE 32
  • 8. Availability


slide-33
SLIDE 33
  • 8. Availability
  • Possibility to log in using ATSAS account
  • Submission form
  • Users can choose between:
      – “on hold”
      – “public”

slide-34
SLIDE 34


slide-35
SLIDE 35

SASBDB Usage

More than 500 users since August 2014.

We are also currently monitoring search items and the number of downloads.

slide-36
SLIDE 36

SASBDB Usage: use cases

  • SAS user
  • SAS novice
  • Article referee

slide-37
SLIDE 37

SASBDB Usage: use cases

slide-38
SLIDE 38

SASBDB Usage: use cases

slide-39
SLIDE 39

SASBDB Usage: use cases

slide-40
SLIDE 40

SASBDB Usage: use cases

slide-41
SLIDE 41

SASBDB Usage: use cases

slide-42
SLIDE 42

SASBDB Usage: use cases

slide-43
SLIDE 43

SASBDB Usage: use cases

slide-44
SLIDE 44

SASBDB Usage: use cases

slide-45
SLIDE 45

SASBDB Usage: use cases

slide-46
SLIDE 46

SASBDB Usage: use cases

slide-47
SLIDE 47


slide-48
SLIDE 48

SASBDB Quality check:

Difference Rg (Guinier) and Rg (p(r))


slide-49
SLIDE 49

SASBDB Quality check:

Difference Rg (Guinier) and Rg (p(r))


slide-50
SLIDE 50

SASBDB Quality check:

Difference MW (expected) and MW (experimental)


slide-51
SLIDE 51

SASBDB Quality check:

Quality p(r) distribution


slide-52
SLIDE 52

SASBDB Quality check:

Quality Guinier region


slide-53
SLIDE 53

SASBDB Quality check:

Quality of the fit


slide-54
SLIDE 54

SASBDB Quality check:

Quality of the data


slide-55
SLIDE 55

SASBDB Quality check:

Quality of the data


slide-56
SLIDE 56

SASBDB Quality check:

  • Difference between structural parameters
  • Quality of the Guinier region
  • Quality of the p(r) distribution
  • Discrepancy between expected and experimental MW
  • Overall quality of the data
  • Goodness of fit of the model

Quality score based on the comparison between the selected entry and all the other entries.
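
Two of the checks listed above compare pairs of values: Rg from the Guinier region against Rg from p(r), and expected against experimental MW. The slides do not say how SASBDB computes these discrepancies or the final score, so the sketch below is only a hypothetical illustration using a symmetric relative difference and a simple rank against other entries. The function names, the formula and all numbers are assumptions.

```python
from typing import Sequence

def relative_difference(a: float, b: float) -> float:
    """Symmetric relative difference |a - b| / mean(a, b), for positive values."""
    if a <= 0 or b <= 0:
        raise ValueError("values must be positive")
    return abs(a - b) / ((a + b) / 2.0)

def fraction_not_better(value: float, others: Sequence[float]) -> float:
    """Fraction of other entries whose discrepancy is at least as large.

    1.0 means the selected entry agrees at least as well as every other
    entry; 0.0 means every other entry agrees better.
    """
    if not others:
        raise ValueError("need at least one other entry to compare against")
    return sum(1 for o in others if o >= value) / len(others)

# Made-up values for one entry: Rg in nm, MW in kDa
rg_discrepancy = relative_difference(3.10, 3.25)  # Guinier vs p(r)
mw_discrepancy = relative_difference(66.0, 72.0)  # expected vs experimental

# Made-up Rg discrepancies of the other entries in the database
other_rg_discrepancies = [0.01, 0.03, 0.08, 0.12, 0.20]
rank = fraction_not_better(rg_discrepancy, other_rg_discrepancies)
print(f"Rg discrepancy: {rg_discrepancy:.3f}")
print(f"MW discrepancy: {mw_discrepancy:.3f}")
print(f"Rank among other entries: {rank:.2f}")
```

Ranking against other entries, rather than applying a fixed threshold, follows the slide's description of a score based on comparison with all the other entries.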

slide-57
SLIDE 57


slide-58
SLIDE 58

SASBDB: missing

  • Network of SAS databases
  • Validation/Quality check
      – Pipeline to compare values
      – Assessment of the angular range
      – Difference between curves
      – Validation of models
  • Standard format
      – sasCIF
  • Submission interface
      – Automatic

slide-59
SLIDE 59

SASBDB: missing

Berman, H., Henrick, K., Nakamura, H. and Markley, J.L. (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res., 35, D301–3.

slide-60
SLIDE 60

SASBDB: missing

Read, R.J., Adams, P.D., Arendall, W.B., III, Brunger, A.T., Emsley, P., Joosten, R.P., Kleywegt, G.J., Krissinel, E.B., Lutteke, T., Otwinowski, Z., Perrakis, A., Richardson, J.S., Sheffler, W.H., Smith, J.L., Tickle, I.J., Vriend, G. and Zwart, P.H. (2011) A new generation of crystallographic validation tools for the Protein Data Bank. Structure, 19, 1395–1412.

slide-61
SLIDE 61

SASBDB: missing

Franke, D., Kikhney, A.G. and Svergun, D.I. (2012) Automated acquisition and analysis of small angle X-ray scattering data. Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip., 689, 52–59.

slide-62
SLIDE 62

SASBDB: missing

Konarev, P. and Svergun, D.I. (2014) Submitted.

slide-63
SLIDE 63

SASBDB: missing

Franke, D., Jeffries, C.M. and Svergun, D.I. (2014) Submitted.

slide-64
SLIDE 64

SASBDB: missing

Tuukkanen, A. and Svergun, D.I. (2015) In preparation.

slide-65
SLIDE 65

SASBDB: missing

Malfois, M. and Svergun, D.I. (2000) sasCIF: an extension of core Crystallographic Information File for SAS. J. Appl. Crystallogr., 33, 812–816.

slide-66
SLIDE 66

SASBDB: missing

Yang, H., Guranovic, V., Dutta, S., Feng, Z., Berman, H.M. and Westbrook, J.D. (2004) Automated and accurate deposition of structures solved by X-ray diffraction to the Protein Data Bank. Acta Cryst. D, 60, 1833–1839.

slide-67
SLIDE 67


slide-68
SLIDE 68

SASBDB: Conclusions

  • With 100 entries and 163 models, SASBDB is currently the largest repository of SAS data available.
  • Entirely browsable according to different criteria.
  • Highly flexible search.
  • Embedded JavaScript to display interactive 3D models.
  • Set of SAXS and WAXS data from “standard proteins”.
  • Cross links to other biological databases.
  • Aimed at different types of users.
  • Several validation methods under development.
  • Development of the standard format: sasCIF.
  • Network of interconnected SAS databases.
  • Paper about SASBDB in the N.A.R. 2015 Database issue.

slide-69
SLIDE 69

Thanks for your attention!
