
Within Structural Bioinformatics

Plant Bioinformatics, Systems and Synthetic Biology Summer School
University of Nottingham, UK, July 2009

Eran Eyal
Cancer Research Center, Sheba Medical Center, Tel Hashomer, Israel

Figure: from sequence to structure, and from structure to function, dynamics and drugs.
Sequence: VAL LEU SER PRO ALA ASP LYS THR ASN VAL LYS ALA ALA TRP GLY LYS VAL GLY ALA HIS ALA GLY GLU TYR GLY ALA GLU ALA LEU GLU ARG MET PHE LEU SER PHE PRO THR THR LYS THR TYR PHE PRO HIS PHE ASP LEU SER HIS GLY SER ALA GLN VAL LYS GLY HIS GLY LYS LYS VAL ALA ASP ALA LEU THR ASN ALA VAL ALA HIS VAL ASP ASP MET PRO ASN ALA LEU SER ALA LEU SER ASP LEU HIS ALA HIS LYS LEU
Sequence → Structure → Function, Dynamics, Drug

Structural Bioinformatics
• Databases of 3D structures of macromolecules
• Structural alignment
• Structural classification
• Secondary structure prediction
• Tertiary structure prediction
• Molecular docking
• Dynamics

Databases
The structural data: where and what
• Description of the databases
• How to explore and query the data
• Source of the data
• Quality of the data

Protein Data Bank (PDB): http://www.wwpdb.org// and http://www.rcsb.org/
The PDB is the main repository for processing and distributing 3D structures of biological macromolecules.
Sources of data:
• Crystal structures
• NMR models
• Other

Data source: X-ray crystallography
Clone/Express/Purify → Crystallize → X-ray diffraction data → Solve phase problem → Interpret electron density map → Coordinates of atoms in protein molecule

Data source: NMR spectroscopy
NOESY experiment → information about spatially close atoms → list of distance constraints
J-couplings → dihedral angles
Distance constraints + dihedral angles + … → multiple models of protein structure → coordinates of atoms in protein molecule

Human thioredoxin structure determined by X-ray and NMR
Figure: superimposition of the NMR structure (PDB 3trx) and the X-ray structure (PDB 1ert).

X-ray crystallography vs. NMR
Property            X-ray crystallography   NMR
Atomic resolution   Good                    Reasonable
Hydrogen atoms      Rarely determined       Determined
Molecule size       No restriction          Small proteins
Dynamics            Snapshot                Multiple models
Membrane proteins   Problematic             Problematic
Procedure           Long                    Long
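Because an NMR entry contains several models rather than a single snapshot, a simple way to see where the models disagree is to measure, for each atom, how far its positions scatter around their mean. A minimal sketch, assuming the models are already superimposed and given as plain coordinate lists (the function name and data layout are illustrative, not from the slides):

```python
import math

def ensemble_spread(models):
    """Per-atom RMS distance from the mean position across an ensemble of models.

    models: list of models; each model is a list of (x, y, z) tuples with the
    same atoms in the same order in every model. The models are assumed to be
    superimposed already.
    """
    n_models = len(models)
    n_atoms = len(models[0])
    if any(len(m) != n_atoms for m in models):
        raise ValueError("all models must contain the same atoms")
    spreads = []
    for i in range(n_atoms):
        # Mean position of atom i over all models.
        cx = sum(m[i][0] for m in models) / n_models
        cy = sum(m[i][1] for m in models) / n_models
        cz = sum(m[i][2] for m in models) / n_models
        # Mean squared distance of atom i from that mean position.
        msd = sum((m[i][0] - cx) ** 2 + (m[i][1] - cy) ** 2 + (m[i][2] - cz) ** 2
                  for m in models) / n_models
        spreads.append(math.sqrt(msd))
    return spreads
```

For two toy models in which atom 0 sits at (0, 0, 0) and (2, 0, 0) while atom 1 does not move, the spread is 1.0 Å for atom 0 and 0.0 Å for atom 1.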

PDB file format
A PDB file consists of a header section followed by a coordinate section.

How to search the PDB?
The OCA browser, developed at the Weizmann Institute of Science (WIS) by Jaime Prilusky, is one of the best interfaces to the PDB. Entries can be retrieved by a variety of criteria: http://bip.weizmann.ac.il/oca-bin/ocamain
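Records in the coordinate section have a fixed-column layout, so an ATOM or HETATM line can be read by slicing. A minimal sketch based on the column positions of the PDB format (1-based: atom name 13-16, residue name 18-20, chain 22, residue number 23-26, insertion code 27, x/y/z 31-54, B-factor 61-66); it assumes complete, well-formed lines and ignores alternate locations and other record types:

```python
def parse_atom_line(line):
    """Parse one ATOM/HETATM record using PDB fixed columns (0-based slices)."""
    return {
        "record": line[0:6].strip(),
        "atom": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "icode": line[26].strip(),      # insertion code, often blank
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "b_factor": float(line[60:66]),
    }

def read_atoms(lines):
    """Keep only coordinate records (header lines and others are skipped)."""
    return [parse_atom_line(l) for l in lines if l.startswith(("ATOM  ", "HETATM"))]

# An illustrative, made-up ATOM record:
example = "ATOM      1  N   VAL A   1      11.104   6.134  -6.504  1.00 21.89"
print(parse_atom_line(example)["res_name"], parse_atom_line(example)["b_factor"])
```

For real work, an established parser (for example the one in Biopython) handles the many edge cases this sketch skips.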

Problems in the PDB database
• Missing data
• Quality of data
• Data are often not independent
• Format problems: residue numbering
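One consequence of the residue-numbering problem is that a PDB residue number cannot be used directly as a position in the chain: numbering may start anywhere, may skip values, and may use insertion codes (for example 52, 52A, 53). A minimal sketch, assuming residues are given in file order as hypothetical (res_seq, icode) tuples, that assigns consecutive indices and flags numbering jumps; a jump can indicate residues missing from the model, though it need not:

```python
def residue_index(residue_ids):
    """Map (res_seq, icode) identifiers, in file order, to consecutive 0-based indices."""
    index = {}
    for rid in residue_ids:
        if rid not in index:
            index[rid] = len(index)
    return index

def numbering_gaps(residue_ids):
    """Return (previous, next) residue numbers wherever the numbering jumps by more than 1."""
    gaps = []
    for (a, _), (b, _) in zip(residue_ids, residue_ids[1:]):
        if b - a > 1:
            gaps.append((a, b))
    return gaps

ids = [(51, ""), (52, ""), (52, "A"), (53, ""), (57, "")]
print(residue_index(ids))    # residue 52A gets its own index, 2
print(numbering_gaps(ids))   # [(53, 57)]
```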

Resolution
Figure: a diffraction pattern and the separation between Bragg planes. Poor resolution: 3 Å. Good resolution: 2 Å (a smaller value means finer detail can be resolved).

R-factor
Figure: the original diffraction pattern compared with the diffraction pattern calculated from the model.
The R-factor measures how much the original diffraction map differs from a map recalculated from a putative model.
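The conventional R-factor compares observed and model-calculated structure-factor amplitudes over all reflections: R = Σ | |F_obs| − |F_calc| | / Σ |F_obs|, so 0 would mean the model reproduces the data exactly. A minimal sketch, assuming the two amplitude lists are already matched reflection by reflection (the numbers are toy values, not real data):

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over matched reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need one calculated amplitude per observed reflection")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

print(r_factor([100, 50, 30, 20], [90, 55, 30, 25]))  # 20 / 200
```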

B-factor
A measure of the uncertainty in the position of individual atoms.
Figure: tyrosine-protein kinase colored by B-factor.

Structural alignment
Structures are more conserved through evolution than sequences. Two homologous proteins have the same overall structure, and two proteins without detectable sequence similarity may still share the same structure. In the twilight zone of sequence similarity, structural alignment can help determine correctly how two proteins are related. Structural similarity is a more sensitive method than sequence alignment for inferring protein function.
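For an isotropic B-factor, the standard relation is B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement of the atom along a single direction. Inverting it turns a B-factor in Å² into a positional uncertainty in Å, which is easier to interpret. A minimal sketch that ignores anisotropic B-factors:

```python
import math

def rms_displacement(b_factor):
    """Isotropic B-factor (A^2) -> RMS displacement along one direction (A).

    Uses B = 8 * pi^2 * <u^2>, with <u^2> the mean-square displacement
    along a single direction.
    """
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8 * math.pi ** 2))

print(round(rms_displacement(20.0), 2))  # about 0.5 A
```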

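Example (Kuttner et al., 2003): one sequence aligned against a second sequence, with the gap of the second sequence placed in two different positions.
Position  1   2   3   4   5   6   7   8   9   10  11  12  13  14
          PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
          PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
          PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS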

Superposition vs. structural alignment
There are two types of problems related to structural comparison:
• The superposition problem
• The structural alignment problem
In the superposition problem, we know in advance the correspondence between the points of the two structures we want to align; in the structural alignment problem, this correspondence must be found.

What properties of a protein can be used to detect structural similarity to other proteins?
• Sequence
• Type and number of secondary structures (sheets, helices)
• Spatial arrangement of the secondary structures
• Structural attributes of individual amino acids
• Distances between amino acids in the protein
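When the correspondence is known (the superposition problem), the rotation and translation that minimize RMSD can be found in closed form. The Kabsch algorithm is one standard way to do this; the slides do not name a specific method, so this is a swapped-in illustration. A minimal sketch, assuming NumPy is available and that row i of P corresponds to row i of Q:

```python
import numpy as np

def kabsch_superpose(P, Q):
    """Superimpose point set P onto Q when the correspondence is known.

    P, Q: (N, 3) array-likes; row i of P corresponds to row i of Q.
    Returns (R, t, rmsd) such that P @ R.T + t is the least-squares fit to Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)   # centroids
    A, B = P - p0, Q - q0                     # centred coordinates
    H = A.T @ B                               # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1),
    # not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q0 - p0 @ R.T
    residual = P @ R.T + t - Q
    rmsd = float(np.sqrt((residual ** 2).sum() / len(P)))
    return R, t, rmsd
```

If Q is an exact rotated and translated copy of P, the returned RMSD is essentially zero; for real structures it measures the remaining positional deviation after the best fit.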

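The standard way to quantify the similarity between molecules is to measure the positional deviation of the atoms, the root mean square deviation (RMSD):

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(X_{i1}-X_{i2})^2 + (Y_{i1}-Y_{i2})^2 + (Z_{i1}-Z_{i2})^2\right]}$$

where N is the number of compared atoms and (X_{i1}, Y_{i1}, Z_{i1}) and (X_{i2}, Y_{i2}, Z_{i2}) are the coordinates of atom i in structures 1 and 2. Because deviations are squared, this measure amplifies large deviations in a local region of the protein.

SSAP
http://www.biochem.ucl.ac.uk/cgi-bin/cath/GetSsapRasmol.pl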
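The RMSD formula translates directly into code. A minimal sketch, assuming the two coordinate lists are already superimposed and matched atom by atom; the toy example shows the amplification effect: the same total absolute deviation (4 Å) gives twice the RMSD when it is concentrated in one atom.

```python
import math

def rmsd(coords1, coords2):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes atom i of coords1 corresponds to atom i of coords2 and that the
    structures have already been superimposed.
    """
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords1, coords2):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords1))

ref = [(0.0, 0.0, 0.0)] * 4
spread_out = [(1.0, 0.0, 0.0)] * 4                      # every atom off by 1 A
local = [(4.0, 0.0, 0.0)] + [(0.0, 0.0, 0.0)] * 3       # one atom off by 4 A
print(rmsd(ref, spread_out), rmsd(ref, local))          # 1.0 2.0
```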

The method uses two steps of dynamic programming: an initial step that obtains the score between each pair of amino acids, and a second step that determines the best overall alignment of the proteins.

SARF2
http://123d.ncifcrf.gov/sarf2.html
http://carten.gmd.de/ToPign.html
An algorithm that finds structural similarity by comparing secondary structures. As such, it can be used only to compare proteins, and only proteins with a minimal content of defined secondary structure.
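The second SSAP step, finding the best overall alignment from a matrix of residue-pair scores, is a classic global dynamic-programming problem. The sketch below shows only that generic step (Needleman-Wunsch style); it does not reproduce SSAP's structural scoring, the score matrix is simply an input, and the linear gap penalty of -1 is an assumption:

```python
def global_align_scores(score, gap=-1.0):
    """Global alignment maximising the summed pair scores.

    score[i][j]: precomputed similarity of residue i (protein A) and residue j
    (protein B), e.g. the output of a first, residue-level step.
    Returns (best_total, aligned (i, j) pairs in order).
    """
    n, m = len(score), len(score[0])
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score[i - 1][j - 1],  # align i with j
                          F[i - 1][j] + gap,                      # gap in B
                          F[i][j - 1] + gap)                      # gap in A
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs

print(global_align_scores([[5, 0], [0, 0], [0, 5]]))  # (9.0, [(0, 0), (2, 1)])
```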

Every secondary structure element (SSE) is represented by a vector. A single SSE gives no information about the structure of the protein, so two or more SSEs are required.

DALI
Searches for common 3D patterns in Cα distance maps.
Figure: pairwise 3D alignment of a three-helix bundle.
http://www.ebi.ac.uk/dali/
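A Cα distance map is the matrix of distances between every pair of Cα atoms in one protein. Because it does not change under rotation or translation, two structures can be compared through their maps without superimposing them first. A minimal sketch of building such a map and checking that invariance; DALI's actual similarity score over sub-matrices is not reproduced here:

```python
import math

def distance_map(ca_coords):
    """N x N matrix of distances between all pairs of C-alpha atoms."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]

def max_map_difference(map1, map2):
    """Largest element-wise difference between two equally sized distance maps."""
    return max(abs(x - y) for row1, row2 in zip(map1, map2)
               for x, y in zip(row1, row2))

ca = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]   # toy coordinates
rotated = [(-y, x, z) for x, y, z in ca]                   # 90 degrees about z
print(distance_map(ca)[1][2])                              # 13.0
print(max_map_difference(distance_map(ca), distance_map(rotated)))
```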

Structural classification
• Using structural alignment, it is feasible to construct a classification system.
• Classification helps us understand the relations between remote proteins.
• Convergent evolution of structures can often hint at the function of the protein.

Classification databases
FSSP http://www2.ebi.ac.uk/dali/fssp/
CATH http://www.biochem.ucl.ac.uk/bsm/cath_new/index.html
SCOP http://scop.mrc-lmb.cam.ac.uk/scop/
HOMSTRAD http://www-cryst.bioc.cam.ac.uk/data/align/
MMDB http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
3Dee http://www.compbio.dundee.ac.uk/3Dee/
CE http://cl.sdsc.edu/ce.html
VAST http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml
SARF http://www-lmmb.ncifcrf.gov/~nicka/sarf2.html/

CATH
http://www.cathdb.info/
A semi-automatic classification:
• Class: secondary-structure (2D) composition, assigned automatically. Four classes: α, β, αβ, and FSS (few secondary structures).
• Architecture: assigned manually; the overall shape created by the orientation of the secondary-structure units.
• Topology: connectivity of the secondary structures.
• Homologous superfamily: high structural and functional similarity.
• Sequence similarity.
Figure: CATH classification of 1hho.

SCOP: Structural Classification of Proteins
Manual inspection of automatic output:
1. 2D content (class)
2. Structural similarity (fold)
3. Remote homology (superfamily)
4. Close homology (family)

Structure prediction
Secondary structure prediction:
A-C-H-Y-T-T-E-K-R-G-G-S-G-T-K-K-R-E-A
H-H-H-H-H-H-H-H-O-O-O-O-O-S-S-S-S-S-S
Tertiary structure prediction
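A per-residue secondary-structure string like the predicted one on this slide (dashes removed) can be summarized as the fraction of residues in each state and as a list of contiguous segments, which is the kind of 2D-composition information the class level of CATH and SCOP is based on. A minimal sketch; reading H as helix, S as strand and O as other is our assumption about the slide's labels:

```python
from collections import Counter
from itertools import groupby

def ss_composition(ss):
    """Fraction of residues in each secondary-structure state."""
    counts = Counter(ss)
    return {state: n / len(ss) for state, n in counts.items()}

def ss_segments(ss):
    """Contiguous runs of one state as (state, start, end), 0-based and inclusive."""
    segments, pos = [], 0
    for state, run in groupby(ss):
        length = len(list(run))
        segments.append((state, pos, pos + length - 1))
        pos += length
    return segments

predicted = "HHHHHHHHOOOOOSSSSSS"   # the slide's prediction, dashes removed
print(ss_segments(predicted))       # [('H', 0, 7), ('O', 8, 12), ('S', 13, 18)]
```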

http://www.predictprotein.org/

Why make a structural model for your protein?
• The structure can provide clues to the function.
• With a structure, it is easier to guess the location of functional sites and to learn about the function.
• With a structure, we can plan more precise experiments in the lab.
• We can do docking experiments (both with other proteins and with small molecules).

Building by homology (homology modeling)
Alignment with proteins of known structure:
M A A G Y A Y G V L S
- A T G F D - - V I D
- A S G F E - - V V E
- A K A Y L - - V L S
→ structural model
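A common first look at such an alignment is the percent sequence identity between the query and each candidate template. A minimal sketch, assuming the slide's alignment reads as four rows of 11 columns (query MAAGYAYGVLS over three templates; this row split is our reconstruction) and that identity is counted only over columns where neither sequence has a gap. The template names are hypothetical:

```python
def percent_identity(query, template):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(query) != len(template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(query, template) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    same = sum(a == b for a, b in pairs)
    return 100.0 * same / len(pairs)

query = "MAAGYAYGVLS"
templates = {"template_1": "-ATGFD--VID",   # hypothetical names
             "template_2": "-ASGFE--VVE",
             "template_3": "-AKAYL--VLS"}
for name, seq in templates.items():
    print(name, percent_identity(query, seq))   # 37.5, 37.5, 62.5
best = max(templates, key=lambda name: percent_identity(query, templates[name]))
```

Identity is only a rough guide to template choice; alignment coverage and structure quality matter as well.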

Fold recognition (threading)
Sequence (M A A G Y A V L S) + known protein folds → structural model

Ab initio
Sequence (M A A G Y A V L S) → structural model
