A system for automated data analysis and interpretation for - - PowerPoint PPT Presentation
A system for automated data analysis and interpretation for biological solution SAXS
Maxim Petoukhov, EMBL, Hamburg Outstation
Outline
- Introduction
- Concept of the integrated system
- Input & output
- Examples
- Summary & Outlook
SAXS: State of the Art
- Brilliant sources for rapid data collection and novel methods for data analysis
- Rapid growth of the biological user community; active training
- Automation, remote access, high-throughput data reduction
- Active use in multidisciplinary projects
- IT: on-line services, pipelines, databases
[Figure: proposals and user groups at beamline X33, EMBL Hamburg, 1999 to 2009 (total number of proposals, biomolecular solutions, number of groups)]
Automated SAXS pipeline
- Data acquisition robot, data normalization, reduction and XML log file generation
- Data processing, computation of overall parameters
- Database search, ab initio model building and XML summary file generation
- Advanced 3D modelling?
(Diagram label: hardware-independent analysis block)
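The computation of overall parameters can be illustrated with a Guinier fit, the standard route to the radius of gyration Rg and the forward scattering I(0): at small angles ln I(s) ≈ ln I(0) − (Rg²/3)s², valid roughly for s·Rg < 1.3. This is a minimal sketch, not the pipeline's own code (ATSAS provides AUTORG for this task); the function name `guinier_fit` and the fixed number of fitted points are hypothetical.

```python
import numpy as np

def guinier_fit(s, intensity, n_points=20):
    """Fit ln I(s) = ln I(0) - (Rg^2 / 3) * s^2 over the first n_points.

    Hypothetical helper: a real pipeline would choose the fitting range
    automatically so that s_max * Rg stays below about 1.3.
    """
    s = np.asarray(s, dtype=float)[:n_points]
    log_i = np.log(np.asarray(intensity, dtype=float)[:n_points])
    # Straight line in (s^2, ln I): slope = -Rg^2 / 3, intercept = ln I(0)
    slope, intercept = np.polyfit(s**2, log_i, 1)
    rg = np.sqrt(-3.0 * slope)
    i0 = np.exp(intercept)
    return rg, i0

# Usage on a synthetic Guinier curve with Rg = 3 nm and I(0) = 100
s = np.linspace(0.01, 0.4, 50)          # nm^-1
intensity = 100.0 * np.exp(-(3.0**2) * s**2 / 3.0)
rg, i0 = guinier_fit(s, intensity)
```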
Data Analysis Expert System for Small Angles: DANESSA
[Diagram: bioinformatics Web services; software tools for SAS data analysis; set of processed (and optionally ranked) scattering curves]
Facilitates routine tasks and enables high-throughput SAXS studies
Employed external services
Protein Data Bank
- Primary sequences of macromolecules and their atomic coordinates
EMBL-EBI
- Sequence alignment
- Annotation by structure
- Macromolecular interfaces
The optimal clustering threshold is a compromise between the number of clusters and the average spread within a cluster
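The threshold trade-off can be made concrete with a small sketch: given a matrix of pairwise discrepancies between models, cluster them at a threshold and report the number of clusters and the mean spread inside them. Single-linkage clustering is a swapped-in technique chosen for simplicity; DAMCLUST's actual criterion is not described here, and the helper names are hypothetical.

```python
def cluster_at_threshold(dist, threshold):
    """Single-linkage clustering: models i and j share a cluster when they are
    connected by a chain of pairwise distances <= threshold."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

def mean_spread(dist, clusters):
    """Average pairwise distance inside each cluster, averaged over clusters.
    Singleton clusters contribute a spread of 0."""
    spreads = []
    for members in clusters:
        pairs = [dist[i][j] for k, i in enumerate(members) for j in members[k + 1:]]
        spreads.append(sum(pairs) / len(pairs) if pairs else 0.0)
    return sum(spreads) / len(spreads)
```

Scanning thresholds exposes the compromise: a low threshold gives many tight clusters, a high one gives a single broad cluster.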
Integrated ATSAS software components
- DATPOROD – automated calculation of the excluded volume and a molecular mass estimate
- DAMMIF – ab initio shape determination by simulated annealing using a bead model
- DAMCLUST – clustering of multiple 3D models (assessment of multimodality) based on discrepancies between the models
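The volume and mass estimate attributed to DATPOROD can be illustrated with the textbook Porod relation V = 2π²I(0)/Q, where Q = ∫ s²I(s) ds is the Porod invariant. This sketch assumes the curve already extends to s = 0 and decays to zero at its end, so no extrapolation is done; DATPOROD's own handling of the data is not reproduced. The divisor 1.7 (Porod volume in nm³ being roughly 1.7 times the mass in kDa for globular proteins) is a rule-of-thumb assumption, and both function names are hypothetical.

```python
import numpy as np

def porod_volume(s, intensity, i0):
    """Porod volume V = 2 * pi^2 * I(0) / Q, with Q = integral of s^2 * I(s) ds.

    Assumes s = 4*pi*sin(theta)/lambda in nm^-1 and a curve that starts at
    s = 0 and has decayed to ~0 at s_max (no extrapolation is performed).
    """
    s = np.asarray(s, dtype=float)
    y = s**2 * np.asarray(intensity, dtype=float)
    q = 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(s))  # trapezoidal rule
    return 2.0 * np.pi**2 * i0 / q

def mass_from_porod_volume(v_nm3, ratio=1.7):
    """Rule-of-thumb estimate for globular proteins: mass [kDa] ~ V [nm^3] / ratio."""
    return v_nm3 / ratio
```

For a Gaussian test curve I(s) = exp(−s²) the invariant is √π/4 analytically, so the function should return 8π^1.5 ≈ 44.5.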
[Figure: scattering curve, lg I (relative) vs s, nm⁻¹]
Integrated ATSAS software components
- CRYSOL – evaluation of X-ray solution scattering curves from atomic models
- SASREF – rigid body modelling of multi-component particles against solution scattering data
- BUNCH – modelling of multidomain proteins and their deletion mutants
- OLIGOMER – quantitative analysis of equilibrium mixtures
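$$I(s) = \sum_k v_k I_k(s)$$
where v_k are the volume fractions and I_k(s) the scattering intensities of the mixture components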
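In OLIGOMER-style analysis the mixture intensity is modelled as a weighted sum of component intensities with non-negative weights. A minimal sketch of such a fit, assuming unweighted residuals (OLIGOMER itself accounts for experimental errors) and solving the non-negative least squares problem by exhaustive search over component subsets, which is exact but only practical for the few components of a typical mixture. The function name is hypothetical.

```python
from itertools import combinations

import numpy as np

def fit_volume_fractions(components, target):
    """Non-negative least squares fit of target(s) ~ sum_k v_k * components[k](s).

    The optimum is the unconstrained least squares solution on some subset of
    components, so every subset is tried and infeasible (negative) ones are skipped.
    """
    a = np.asarray(components, dtype=float).T  # shape (n_points, n_components)
    b = np.asarray(target, dtype=float)
    n = a.shape[1]
    best_v, best_res = np.zeros(n), float(np.sum(b**2))  # start from v = 0
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            cols = list(subset)
            coef, *_ = np.linalg.lstsq(a[:, cols], b, rcond=None)
            if np.any(coef < 0):
                continue  # violates v_k >= 0
            res = float(np.sum((a[:, cols] @ coef - b) ** 2))
            if res < best_res:
                best_v = np.zeros(n)
                best_v[cols] = coef
                best_res = res
    return best_v
```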
Concepts
- Object
- Individual sequence contributing to one or several samples
- generally ≠ sample, ≠ individual atomic model
- Project
- contains a number of curves (samples) and the set of corresponding objects
- Samples
- A
- B
- A1
- A+B
- A1+B
- A+2B
- A1+2B
- Objects
- A
- B
- A1
[Figure: generic project]
Minimal Input
- List of objects (sequences): Sequence A, …, Sequence K
- List of scattering profiles: Curve 1, …, Curve N
- Cross-correlation table with molar ratios
Example sequence:
GSGVPSRVIHIRKLPIDVTEGEVISLGLPFGKVTNLLMLKGKNQAFIEMNTEEAANTMVYYTSVTPVLRGQPIYIQFSNHKELKTDSSPNQARAQAALQAVNSVQSGNLALAASAAAVD
Example scattering curve (columns: s, I(s), error):
4.138455E-02 5.904029 1.555333E-01
4.371607E-02 5.652469 1.527037E-01
4.604759E-02 5.533381 1.521723E-01
4.837912E-02 5.547052 1.474577E-01
5.071064E-02 5.296281 1.436712E-01
…
Cross-correlation table (columns: Object A to Object E; entries are molar ratios):
Curve 1: 1 1 1
Curve 2: 1 1
Curve 3: 1 1 1 2 2
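The minimal input (objects, curves, and a table of molar ratios per curve) maps naturally onto a small data model. The sketch below is purely illustrative: the class names, fields and the plain three-column parser (s, I, error, data lines only, no header) are assumptions, not the system's actual input format.

```python
from dataclasses import dataclass, field

@dataclass
class Curve:
    name: str
    s: list    # momentum transfer, nm^-1
    i: list    # intensity
    err: list  # experimental error

@dataclass
class Project:
    objects: dict = field(default_factory=dict)      # object name -> sequence
    curves: dict = field(default_factory=dict)       # curve name -> Curve
    composition: dict = field(default_factory=dict)  # curve name -> {object: molar ratio}

    def add_curve(self, name, text, ratios):
        """Parse three-column text (s, I, error) and record the curve's molar ratios."""
        unknown = set(ratios) - set(self.objects)
        if unknown:
            raise ValueError(f"curve {name}: unknown objects {sorted(unknown)}")
        rows = [tuple(map(float, line.split()[:3]))
                for line in text.splitlines() if line.strip()]
        s, i, err = (list(col) for col in zip(*rows))
        self.curves[name] = Curve(name, s, i, err)
        self.composition[name] = dict(ratios)
```

Usage with placeholder sequences (hypothetical): `Project(objects={"A": "SEQA", "B": "SEQB"}).add_curve("Curve 3", data_text, {"A": 1, "B": 2})`.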
Case-independent actions: bioinformatics analysis
[Figure: example protein sequence and sequence fragments]
Case-independent actions: oligomerization assessment
[Figure: sequences of objects A and B forming an A:B = 2:1 complex]
The expected molecular mass (MWexpected, from the sequences and molar ratios) is compared with the experimental one (MWexperimental).
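The comparison of expected and experimental molecular mass can be sketched as follows: compute the mass of each chain from standard average residue masses, combine chains by their molar ratios, and take the ratio to the experimental value as a hint of the oligomeric state. The function names are hypothetical, and real pipelines also have to handle tags, modifications and the uncertainty of the experimental estimate.

```python
# Average residue masses (Da) of the 20 standard amino acids (amino acid minus H2O)
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.0153  # one water per chain for the free termini

def sequence_mass(seq):
    """Average molecular mass (Da) of one polypeptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER

def expected_mass(sequences, ratios):
    """Mass (Da) of a complex, e.g. ratios={"A": 2, "B": 1} for A:B = 2:1."""
    return sum(n * sequence_mass(sequences[name]) for name, n in ratios.items())

def oligomer_ratio(mw_experimental, mw_expected):
    """Nearest integer multiple of the expected mass, plus the raw ratio."""
    ratio = mw_experimental / mw_expected
    return round(ratio), ratio
```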
Case-independent actions: ab initio modelling
- Multiple bead modelling runs for each sample in P1
- For non-monomeric states, additional reconstructions with appropriate symmetries (e.g. P222 and P4 for tetramers)
- Clustering of independent reconstructions
- Averaged volumes are determined
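Choosing the symmetries for additional reconstructions can be sketched as a small rule: always run P1, add the cyclic group Pn for an n-mer, and for even n of at least 4 add the dihedral group. The naming follows the slide's example (P222 for a tetramer) and writes other dihedral groups as P(n/2)2; this rule set and the function name are illustrative assumptions.

```python
def candidate_symmetries(n_subunits):
    """Point groups to try for an n-mer: P1, cyclic Pn, and dihedral for even n >= 4.

    Illustrative naming: the tetramer dihedral group is written P222 (as in the
    example above), other dihedral groups as P(n/2)2.
    """
    if n_subunits < 1:
        raise ValueError("number of subunits must be positive")
    groups = ["P1"]  # unrestricted runs are always performed
    if n_subunits == 1:
        return groups
    groups.append(f"P{n_subunits}")
    if n_subunits % 2 == 0 and n_subunits >= 4:
        half = n_subunits // 2
        groups.append("P222" if half == 2 else f"P{half}2")
    return groups
```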
Selecting Scenario
- Bound vs dissociated
- Modular protein vs assembly with no flexible parts
- Deletion mutants vs single-curve fitting
Scenario-based modelling
- Dissociation
- Composition analysis with OLIGOMER
- Other curves or PDB files to evaluate form factors
- Proteins with linkers
- Modelling with BUNCH
- Combining the curves from the same family (simultaneous fitting of deletion mutants)
- Single domain (possibly in various conditions)
- Validation/Identification of biologically active oligomers by CRYSOL
- Modelling of quaternary structure by SASREF with symmetry restraints
- Multisubunit complex(es) / multidomain proteins with no gaps
- Global rigid body modelling with SASREF
- Accounting for assembly predictions of individual subunits (PISA)
- Combining multiple curves where applicable
- Switching / mixing of scenarios possible
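The scenario list reads like a set of expert-system rules. A hypothetical sketch of how project features could be mapped to modelling steps follows; the feature flags and the rule order are assumptions made for illustration, not DANESSA's internal logic. Several matching rules yield a mixed scenario.

```python
from dataclasses import dataclass

@dataclass
class ProjectFeatures:
    # Hypothetical flags; in practice they would come from the case-independent analysis.
    has_mixture: bool = False           # dissociation or mixtures suspected
    has_linkers: bool = False           # domains connected by flexible linkers
    has_deletion_mutants: bool = False  # several curves from the same family
    single_domain: bool = False
    multisubunit_no_gaps: bool = False

def select_scenarios(f):
    """Map project features to modelling steps, following the scenarios listed above."""
    steps = []
    if f.has_mixture:
        steps.append("OLIGOMER: composition analysis")
    if f.has_linkers:
        steps.append("BUNCH: modelling with flexible linkers"
                     + (" (simultaneous fit of deletion mutants)" if f.has_deletion_mutants else ""))
    if f.single_domain:
        steps.append("CRYSOL: validate oligomers; SASREF with symmetry restraints")
    if f.multisubunit_no_gaps:
        steps.append("SASREF: global rigid body modelling (PISA assembly predictions)")
    return steps  # more than one entry means the scenarios are mixed
```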
Results output
- All 3D modelling attempts are performed multiple times
- Non-uniqueness for each type of reconstruction is assessed by clustering
- The results are stored in an SQLite database for easy retrieval
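Storing results in SQLite can be sketched with Python's built-in `sqlite3` module. The schema is hypothetical: its columns mirror the PTB results table in this deck (data file, tool, single or multiple curve fit, symmetry, model identifier, Chi), with the PISA column omitted. The query relies on SQLite's documented behaviour that bare columns in a `MIN()` aggregate query take their values from the row holding the minimum.

```python
import sqlite3

def open_results(path=":memory:"):
    """Open (or create) a results database with a hypothetical schema."""
    con = sqlite3.connect(path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS results (
               data TEXT, tool TEXT, multifit TEXT, symmetry TEXT,
               identifier TEXT PRIMARY KEY, chi REAL)"""
    )
    return con

def best_models(con):
    """Lowest-Chi model for each data file and fitting mode (single or multi)."""
    return con.execute(
        """SELECT data, multifit, identifier, MIN(chi)
           FROM results GROUP BY data, multifit ORDER BY data, multifit"""
    ).fetchall()
```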
Case studies
- Binary complex
- Dissociation
- Oligomeric equilibrium
- Modular protein
- Quaternary structure of multimer
Examples: binary complex
Internalin (Listeria monocytogenes) / E-cadherin (human)
Collaboration: H. Niemann (Braunschweig)
Examples: two-component mixture
Internalin + Met receptor (semaphorin domain): mixture
Panels: fit by two scattering intensities computed from atomic models; fit by a linear combination of two experimental profiles
Collaboration: H. Niemann (Braunschweig) and E. Gherardi (Cambridge)
Examples: oligomeric equilibrium
[Figure: dimer, monomer and mixture]
H(C) fragment of tetanus toxin
- O. Qazi, B. Bolgiano, D. Crane, D.I. Svergun, P.V. Konarev, Z.-P. Yao, C.V. Robinson, K.A. Brown and N. Fairweather (2007). JMB 365, 123-34.
Overlap of the typical ab initio and rigid body models
Petoukhov, M. V., Monie, T. P., Allain, F. H., Matthews, S., Curry, S., and Svergun, D. I. (2006). Structure 14, 1021-1027.
Examples: modular protein & deletion mutants
Polypyrimidine tract binding protein (PTB)
Single- vs multiple-curve fitting
| Data | Tool | Multifit? | Symmetry | PISA? | Identifier | Chi |
|---|---|---|---|---|---|---|
| ptb_ab_a31c.dat | bunch | single | P1 | | ptb_ab_a31/bunch_01_P1-12/bun-10 | 1.27 |
| ptb_bc_a54c.dat | bunch | single | P1 | | ptb_bc_a54/bunch_01_P1-23/bun-08 | 1.02 |
| ptb_cd_a18c.dat | bunch | single | P1 | | ptb_cd_a18/bunch_01_P1-34/bun-08 | 0.99 |
| ptb123c.dat | bunch | single | P1 | | ptb123/bunch_01_P1-123m/bun-01 | 0.99 |
| ptb123c.dat | bunch | single | P2 | | ptb123/bunch_02_P2-123m/bun-02 | 1.08 |
| ptb_bcd_a13c.dat | bunch | single | P1 | | ptb_bcd_a13/bunch_01_P1-234/bun-10 | 1.02 |
| ptb_del_a39c.dat | bunch | single | P1 | | ptb_del_a39/bunch_01_P1-delm/bun-02 | 1.1 |
| ptb123c.dat | bunch | multi | P1 | | ptb123-multi/bunch_01_P1-123m/bun-08 | 1.05 |
| ptb_bcd_a13c.dat | bunch | multi | P1 | | ptb_bcd_a13-multi/bunch_01_P1-234/bun-04 | 1.07 |
| ptb_del_a39c.dat | bunch | multi | P1 | | ptb_del_a39-multi/bunch_01_P1-delm/bun-02 | 1.25 |
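The Chi column measures the discrepancy between a model curve and the data. A sketch of the definition commonly used by ATSAS programs such as CRYSOL, χ² = 1/(N−1) Σ [(I_exp(s_j) − c·I_calc(s_j))/σ(s_j)]², with the scale factor c chosen to minimise χ², is given below; the function name is hypothetical and individual programs may differ in details.

```python
import numpy as np

def chi_with_scale(i_exp, i_calc, sigma):
    """Return (chi, c): chi^2 = sum(((I_exp - c * I_calc) / sigma)^2) / (N - 1),
    where the scale c minimises chi^2 (closed-form weighted least squares)."""
    i_exp, i_calc, sigma = (np.asarray(x, dtype=float) for x in (i_exp, i_calc, sigma))
    w = 1.0 / sigma**2
    c = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc**2)
    chi2 = np.sum(w * (i_exp - c * i_calc) ** 2) / (len(i_exp) - 1)
    return float(np.sqrt(chi2)), float(c)
```

Values close to 1, as in the table, indicate that the model fits the data within the experimental errors.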
Examples: multimer
Tetrameric glucose isomerase
E. Mylonas, EMBL-HH
Conclusions
- A working prototype of the integrated system for automated SAXS data analysis and 3D model building has been created
- An Atsas-online Web portal for remote access is provided
- Minimal information is required from the user
- Up-to-date programs from the ATSAS package and Web-based bioinformatics tools are employed
- DANESSA frees the user from routine work, but not (yet) from the need to think
Acknowledgements
- Course Organizers
- BioSAXS Group @EMBL-Hamburg
- Collaborations:
- S. Curry (Imperial College, London, UK)
- E. Gherardi (Medical Research Council Centre, Cambridge, UK)
- H. Niemann (GBF, Braunschweig, Germany)
- K. Brown (Imperial College, London, UK)