A system for automated data analysis and interpretation for biological solution SAXS




SLIDE 1

A system for automated data analysis and interpretation for biological solution SAXS

Maxim Petoukhov EMBL, Hamburg Outstation

SLIDE 2

Outline

  • Introduction
  • Concept of the integrated system
  • Input & output
  • Examples
  • Summary & Outlook
SLIDE 3

SAXS: State of the Art

  • Brilliant sources for rapid data collection and novel methods for data analysis
  • A rapid increase in the biological user community; active training
  • Automation, remote access, high-throughput data reduction
  • Active use in multidisciplinary projects
  • IT: on-line services, pipelines, databases

Hardware-independent analysis block

[Figure: Proposals/Groups, X33, EMBL Hamburg. Counts by year, 1999 to 2009; series: total N proposals, biomolecular solutions, number of groups]
SLIDE 4

Automated SAXS pipeline

  • Data acquisition robot, data normalization, reduction and XML log file generation
  • Data processing, computation of overall parameters
  • Database search, ab initio model building and XML summary file generation
  • Advanced 3D modelling?

Hardware-independent analysis block

SLIDE 5

Data Analysis Expert System for Small Angles: DANESSA

  • Bioinformatics Web Services
  • Software Tools for SAS Data Analysis
  • Set of Processed (and Optionally Ranked) Scattering Curves

Facilitates routine tasks and enables high-throughput SAXS studies

SLIDE 6

Employed external services

Protein Data Bank

  • Primary sequences of macromolecules
  • and their atomic coordinates

EMBL-EBI

  • Sequence alignment
  • Annotation by structure
  • Macromolecular interfaces
SLIDE 7

Integrated ATSAS software components

  • DATPOROD – automated calculation of the excluded volume and molecular mass estimate
  • DAMMIF – ab initio shape determination by simulated annealing using a bead model
  • DAMCLUST – clustering of multiple 3D models (assessment of multimodality) based on discrepancies between the models

[Figure: the optimal threshold as a compromise between the number of clusters and the averaged spread within a cluster]
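The threshold trade-off can be illustrated with a toy sketch. It is not DAMCLUST's algorithm: it assumes a plain single-linkage grouping over a symmetric discrepancy matrix and simply reports, for a few thresholds, the number of clusters and the mean discrepancy inside clusters. The function names and the matrix values are hypothetical.

```python
from itertools import combinations


def cluster_at(dist, threshold):
    """Single-linkage grouping: models within `threshold` of each other share a cluster."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def mean_spread(dist, clusters):
    """Mean pairwise discrepancy over all within-cluster pairs (0.0 if only singletons)."""
    pairs = [dist[i][j] for c in clusters for i, j in combinations(c, 2)]
    return sum(pairs) / len(pairs) if pairs else 0.0


# Hypothetical discrepancies between four models (illustrative values only).
DIST = [
    [0.0, 0.5, 1.6, 1.7],
    [0.5, 0.0, 1.5, 1.8],
    [1.6, 1.5, 0.0, 0.6],
    [1.7, 1.8, 0.6, 0.0],
]

for t in (0.4, 1.0, 2.0):
    groups = cluster_at(DIST, t)
    print(t, len(groups), round(mean_spread(DIST, groups), 3))
```

A low threshold gives many tight clusters, a high one gives few loose clusters; the intermediate value separates the two obvious groups in this example.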

SLIDE 8

Integrated ATSAS software components

[Figure: scattering curves, lg I (relative) versus s, nm^-1]

  • CRYSOL – Evaluation of X-ray Solution Scattering Curves from Atomic Models
  • SASREF – Rigid body modelling of multi-component particles against solution scattering data
  • BUNCH – Modelling of multidomain proteins and their deletion mutants
  • OLIGOMER – Quantitative analysis of equilibrium mixtures

$$I(s) = \sum_k v_k I_k(s)$$
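The mixture relation, in which the measured intensity is a weighted sum of component intensities, can be sketched for two components. This is a minimal unweighted least-squares illustration, not OLIGOMER itself: it ignores experimental errors and does not enforce non-negative fractions. The function name and the synthetic intensities are assumptions.

```python
def fit_two_components(i_exp, i_1, i_2):
    """Least-squares v1, v2 minimising sum((v1*I1 + v2*I2 - Iexp)**2)."""
    # Normal equations for the 2x2 system.
    a11 = sum(x * x for x in i_1)
    a12 = sum(x * y for x, y in zip(i_1, i_2))
    a22 = sum(y * y for y in i_2)
    b1 = sum(x * e for x, e in zip(i_1, i_exp))
    b2 = sum(y * e for y, e in zip(i_2, i_exp))
    det = a11 * a22 - a12 * a12
    if abs(det) <= 1e-12 * a11 * a22:
        raise ValueError("component curves are (nearly) proportional")
    # Cramer's rule.
    v1 = (b1 * a22 - b2 * a12) / det
    v2 = (a11 * b2 - a12 * b1) / det
    return v1, v2


# Synthetic component intensities on a common s grid (illustrative values only).
I_1 = [10.0, 8.0, 5.0, 2.0]
I_2 = [20.0, 12.0, 4.0, 1.0]
I_MIX = [0.3 * a + 0.7 * b for a, b in zip(I_1, I_2)]

v1, v2 = fit_two_components(I_MIX, I_1, I_2)
print(round(v1, 6), round(v2, 6))
```

With noise-free synthetic data the fit recovers the fractions used to build the mixture; real data would call for error weighting and a non-negativity constraint.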

SLIDE 9

Concepts

  • Object
    – Individual sequence contributing to one or several samples
    – generally ≠ sample, ≠ individual atomic model
  • Project
    – contains a number of curves (samples)
    – and the set of corresponding objects

Generic project

  • Samples
    – A
    – B
    – A1
    – A+B
    – A1+B
    – A+2B
    – A1+2B
  • Objects
    – A
    – B
    – A1

[Figure labels: A1, B, A2, A, B]

SLIDE 10

Minimalistic Input

  • List of Objects (sequences)
    – Sequence A
    – …
    – Sequence K
  • List of Scattering Profiles
    – Curve 1
    – …
    – Curve N
  • Cross-correlation table with molar ratios

Example sequence:

GSGVPSRVIH IRKLPIDVTE GEVISLGLPF GKVTNLLMLK GKNQAFIEMN TEEAANTMV YYTSVTPVLR GQPIYIQFSN HKELKTDSSP NQARAQAALQ AVNSVQSGNL ALAASAAAVD

Example scattering profile:

4.138455E-02 5.904029 1.555333E-01
4.371607E-02 5.652469 1.527037E-01
4.604759E-02 5.533381 1.521723E-01
4.837912E-02 5.547052 1.474577E-01
5.071064E-02 5.296281 1.436712E-01
…

Example cross-correlation table (columns: Object A, Object B, Object C, Object D, Object E):

Curve 1: 1 1 1
Curve 2: 1 1
Curve 3: 1 1 1 2 2
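The three inputs (objects, scattering profiles, molar-ratio table) could be held in a structure like the one sketched below. This is a hypothetical layout, not DANESSA's actual input format: the class and function names are invented, the sequences are placeholders, and pairing the example profile rows with Curve 3 is for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    objects: dict = field(default_factory=dict)  # object name -> sequence
    curves: dict = field(default_factory=dict)   # curve name -> list of numeric rows
    ratios: dict = field(default_factory=dict)   # curve name -> {object name: molar ratio}

    def missing_objects(self):
        """Object names used in the ratio table but absent from the object list."""
        used = {name for row in self.ratios.values() for name in row}
        return sorted(used - set(self.objects))


def parse_curve(text):
    """Parse whitespace-separated numeric rows; blank or non-numeric lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            rows.append(tuple(float(f) for f in fields))
        except ValueError:
            continue  # e.g. a header line or a trailing "…"
    return rows


project = Project()
# Placeholder sequences; real input would carry the full primary sequences.
for name in "ABCDE":
    project.objects[name] = "<sequence " + name + ">"
# Two rows of the example profile, attached to Curve 3 only for illustration.
project.curves["Curve 3"] = parse_curve(
    "4.138455E-02 5.904029 1.555333E-01\n"
    "4.371607E-02 5.652469 1.527037E-01\n"
    "…\n"
)
# Molar ratios for Curve 3, the only fully populated row of the table.
project.ratios["Curve 3"] = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2}
print(len(project.curves["Curve 3"]), project.missing_objects())
```

Keeping the ratio table keyed by curve name makes it easy to check that every curve refers only to declared objects before any modelling starts.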

SLIDE 11

Case-independent actions: bioinformatics analysis

MSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGMSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELG
EAIAACGDVPEIMVIGGGRVYEQFLPKAQKLYLTHIDAEVEGDTHFPDYEPDDWESVFSEFHDADAQNSHSYCFEILERR
MSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGD
HIDAEVEGDTHFPDYEPDDWESVFSEFHDADAQNSHSYCFEILERR
KIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELG

SLIDE 12

Case-independent actions: oligomerization assessment

MSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGMSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELG
MSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGMSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELG
MSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGMSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELG

[Figure labels: MWexpected, MWexperimental; A, B, A:B 2:1]
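Comparing the expected and experimental molecular masses can be sketched as a simple ratio. This is an illustration under stated assumptions, not the system's actual procedure: the expected mass uses the rough rule of thumb of about 110 Da per residue, and the function names and numeric inputs are hypothetical.

```python
AVERAGE_RESIDUE_MASS_KDA = 0.110  # rough average for proteins; an approximation


def expected_mw_kda(sequence, copies=1):
    """Approximate mass of `copies` chains of `sequence`, in kDa."""
    return len(sequence) * AVERAGE_RESIDUE_MASS_KDA * copies


def oligomeric_state(mw_experimental_kda, mw_expected_kda):
    """Nearest whole number of copies consistent with the measured mass (at least 1)."""
    return max(1, round(mw_experimental_kda / mw_expected_kda))


# Hypothetical values: a 25 kDa monomer measured at 52 kDa suggests a dimer.
print(oligomeric_state(52.0, 25.0))
```

Mass estimates from SAXS carry sizeable uncertainty, so a ratio far from any integer is better treated as a hint of a mixture than rounded blindly.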

SLIDE 13

Case-independent actions: ab initio modelling

  • Multiple bead modelling runs for each sample in P1
  • For non-monomeric states additional reconstructions with appropriate symmetries (e.g. P222 and P4 for tetramers)
  • Clustering of independent reconstructions
  • Averaged volumes are determined
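The choice of symmetries for a given oligomeric state can be sketched as a small lookup. The rule below is an assumption that generalizes the tetramer example (P4 and P222): it proposes the cyclic group Pn and, for even n of at least 4, the dihedral group written as P(n/2)2, with P222 for n = 4. The function name is hypothetical, and the system's real selection logic may differ.

```python
def candidate_symmetries(n_subunits):
    """Point-group symmetries to try for an oligomer of `n_subunits` identical chains."""
    if n_subunits < 1:
        raise ValueError("n_subunits must be a positive integer")
    symmetries = ["P1"]  # always reconstruct without symmetry
    if n_subunits > 1:
        symmetries.append(f"P{n_subunits}")  # cyclic arrangement
        if n_subunits % 2 == 0 and n_subunits >= 4:
            half = n_subunits // 2
            # Dihedral arrangement; the tetramer case is conventionally written P222.
            symmetries.append("P222" if half == 2 else f"P{half}2")
    return symmetries


print(candidate_symmetries(4))
```

For a tetramer this yields P1, P4 and P222, matching the example in the bullet list.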
SLIDE 14

Selecting Scenario

  • Bound vs Dissociated
  • Modular protein vs Assembly with no flexible parts
  • Deletion mutants vs single curve fitting

[Figure: scattering curves, lg I (relative) versus s, nm^-1]

SLIDE 15

Scenario-based modelling

  • Dissociation
    – Composition analysis with OLIGOMER
    – Other curves or PDB files to evaluate form factors
  • Proteins with linkers
    – Modelling with BUNCH
    – Combining the curves from the same family (simultaneous fitting of deletion mutants)
  • Single domain (possibly in various conditions)
    – Validation/Identification of biologically active oligomers by CRYSOL
    – Modelling of quaternary structure by SASREF with symmetry restraints
  • Multisubunit complex(es) / multidomain proteins with no gaps
    – Global rigid body modelling with SASREF
    – Accounting for assembly predictions of individual subunits (PISA)
    – Combining multiple curves where applicable
  • Switching / mixing of scenarios possible
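The scenario-to-tool mapping above can be written down as a lookup table. The tool names come from the list; the dictionary keys, the function name and the idea of merging tool lists when scenarios are mixed are illustrative assumptions, not a description of the system's internals.

```python
# Scenario labels are hypothetical shorthand for the cases listed above.
SCENARIO_TOOLS = {
    "dissociation": ["OLIGOMER"],
    "proteins with linkers": ["BUNCH"],
    "single domain": ["CRYSOL", "SASREF"],
    "multisubunit complex": ["SASREF", "PISA"],
}


def plan_tools(scenarios):
    """Tools for one or several scenarios, in order, without repeats."""
    tools = []
    for scenario in scenarios:
        for tool in SCENARIO_TOOLS[scenario]:
            if tool not in tools:
                tools.append(tool)
    return tools


# Mixing two scenarios: SASREF appears once even though both use it.
print(plan_tools(["single domain", "multisubunit complex"]))
```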
SLIDE 16

Results output

  • All 3D modelling attempts are performed multiple times
  • Non-uniqueness for each type of reconstruction is assessed by clustering
  • The results are stored in an SQLite database for easy retrieval
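Storing and retrieving results in SQLite can be sketched with Python's standard `sqlite3` module. The table name and schema are assumptions modelled on the column headings of the PTB results table in the examples (Data, Tool, Multifit, Symmetry, Identifier, Chi); the three rows are the ptb123c.dat entries from that table. The real database layout is not given in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute(
    """CREATE TABLE results (
           data TEXT, tool TEXT, multifit TEXT,
           symmetry TEXT, identifier TEXT, chi REAL
       )"""
)

# Rows for ptb123c.dat, taken from the PTB results table.
rows = [
    ("ptb123c.dat", "bunch", "single", "P1", "ptb123/bunch_01_P1-123m/bun-01", 0.99),
    ("ptb123c.dat", "bunch", "single", "P2", "ptb123/bunch_02_P2-123m/bun-02", 1.08),
    ("ptb123c.dat", "bunch", "multi", "P1", "ptb123-multi/bunch_01_P1-123m/bun-08", 1.05),
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)", rows)

# Retrieve the best-fitting model (lowest Chi) for one data set.
best = conn.execute(
    "SELECT identifier, chi FROM results WHERE data = ? ORDER BY chi LIMIT 1",
    ("ptb123c.dat",),
).fetchone()
print(best)
```

A single flat table keeps queries such as "best fit per data set" or "all P2 runs" to one line each.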

SLIDE 17

Case studies

  • Binary complex
  • Dissociation
  • Oligomeric equilibrium
  • Modular protein
  • Quaternary structure of multimer
SLIDE 18

Examples: binary complex

Internalin (Listeria monocytogenes) / E-cadherin (human)

Collaboration: H. Niemann (Braunschweig)

SLIDE 19

Examples: two-component mixture

Internalin + Met receptor (semaphorin domain)

Mixture

  • Fit by two scattering intensities from atomic models
  • Fit by linear combination of two experimental profiles

Collaboration: H. Niemann (Braunschweig) and E. Gherardi (Cambridge)

SLIDE 20

Examples: oligomeric equilibrium

H(C) fragment of Tetanus toxin

[Figure labels: dimer, monomer, mixture]

O. Qazi, B. Bolgiano, D. Crane, D.I. Svergun, P.V. Konarev, Z.-P. Yao, C.V. Robinson, K.A. Brown and N. Fairweather (2007). JMB 365, 123-34.

SLIDE 21

Examples: modular protein & deletion mutants

Polypyrimidine tract binding protein (PTB)

Overlap of the typical ab initio and rigid body models

Petoukhov, M. V., Monie, T. P., Allain, F. H., Matthews, S., Curry, S., and Svergun, D. I. (2006). Structure 14, 1021-1027.

SLIDE 22

Examples: modular protein & deletion mutants

Single vs multiple curve fitting

SLIDE 23

Examples: modular protein & deletion mutants

Data              Tool   Multifit?  Symmetry  PISA?  Identifier                                 Chi
ptb_ab_a31c.dat   bunch  single     P1               ptb_ab_a31/bunch_01_P1-12/bun-10           1.27
ptb_bc_a54c.dat   bunch  single     P1               ptb_bc_a54/bunch_01_P1-23/bun-08           1.02
ptb_cd_a18c.dat   bunch  single     P1               ptb_cd_a18/bunch_01_P1-34/bun-08           0.99
ptb123c.dat       bunch  single     P1               ptb123/bunch_01_P1-123m/bun-01             0.99
ptb123c.dat       bunch  single     P2               ptb123/bunch_02_P2-123m/bun-02             1.08
ptb_bcd_a13c.dat  bunch  single     P1               ptb_bcd_a13/bunch_01_P1-234/bun-10         1.02
ptb_del_a39c.dat  bunch  single     P1               ptb_del_a39/bunch_01_P1-delm/bun-02        1.1
ptb123c.dat       bunch  multi      P1               ptb123-multi/bunch_01_P1-123m/bun-08       1.05
ptb_bcd_a13c.dat  bunch  multi      P1               ptb_bcd_a13-multi/bunch_01_P1-234/bun-04   1.07
ptb_del_a39c.dat  bunch  multi      P1               ptb_del_a39-multi/bunch_01_P1-delm/bun-02  1.25

SLIDE 24

Examples: multimer

Tetrameric glucose isomerase

E. Mylonas, EMBL-HH

SLIDE 25

Conclusions

  • A working prototype of the integrated system for automated SAXS data analysis and 3D model building has been created
  • An ATSAS-online Web portal for remote access is provided
  • Minimal information is required from the user
  • Up-to-date programs from the ATSAS package and Web-based bioinformatics tools are employed
  • DANESSA liberates one from routine work but not (yet) from the need for thinking

SLIDE 26

Acknowledgements

  • Course Organizers
  • BioSAXS Group @EMBL-Hamburg
  • Collaborations:
    – S. Curry (Imperial College, London, UK)
    – E. Gherardi (Medical Research Council Centre, Cambridge, UK)
    – H. Niemann (GBF, Braunschweig, Germany)
    – K. Brown (Imperial College, London)