ELIXIR SCOP (Murzin) ~3000 domain structure families CATH - PDF document

23/07/2014

Genome3D: Annotating Genomes with Structures and Functions
BBSRC funding from 2011, established July 2012
http://www.genome3d.eu

Groups involved in Genome3D
• Tom Blundell – Cambridge University
• Julian Gough – Bristol University
• David Jones – UCL
• Alexey Murzin – LMB (Cambridge)
• Christine Orengo – UCL
• Michael Sternberg – Imperial, London

SAB: Prof Geoff Barton – Dundee University, Prof Chas Bountra – Structural Genomics Consortium, Prof Torsten Schwede – Swiss Institute of Bioinformatics

Resources
Domain Structure Classification (~3000 domain structure families):
• SCOP (Murzin)
• CATH (Orengo)
Predicted Domain Annotations for >30 million sequences in UniProt, ~70% of domains in completed genomes:
• SUPERFAMILY (Gough)
• Gene3D (Orengo)
Predicted domain annotations and 3D models for selected organisms:
• FUGUE (Blundell)
• pDomTHREADER (Jones)
• PHYRE (Sternberg)

ELIXIR
ELIXIR unites Europe’s leading life science organisations in safeguarding the biological data generated every day in publicly funded research. Learn more at www.elixir-europe.org

[Diagram labels: SUPERFAMILY; Gene3D; PHYRE; pDomTHREADER; FUGUE; SCOP; CATH; Consensus; Associated functional information from > 10 public sources (KEGG terms, GO terms, other ...)]

[Figure: Domain Structure Annotation (Per UniProt Sequence). Y axis: number of UniProt sequences, 0 to 35000. Legend: 6 groups, 5 groups, 4 groups, 3 groups, 2 groups, 1 group, No groups.]

Genome3D Applications

[Figure: Predicted 3D Models (Per UniProt Sequence). Y axis: number of UniProt sequences, 0 to 35000. Legend: 4 groups, 3 groups, 2 groups, 1 group, No groups. Other labels: SCOP, CATH, Consensus.]

ELIXIR-UK
• UK node’s initial focus will be exclusively on training
• The UK node will develop training infrastructure and focus on:
  – training needs analysis and trainer workshops
  – e-support service platform (TeSS)

Genome3D Resources at Biochemistry, Cambridge
Tom L Blundell, Bernardo Ochoa M, James Smith
Department of Biochemistry, University of Cambridge

Databases in Biochemistry, Cambridge (Open Source): Organisation of Information
• TOCCATA: family-based structural alignments
• PICCOLO: protein-protein interactions
• REQUIEM: comparative modelling
• TIMBAL: inhibitors of protein-protein interactions
• BIPA: protein-nucleic acid interactions
• CREDO: protein-small molecule interactions
[Other diagram labels: GLORIA; Genome3D; PDB; UniProt mapping; residue annotation; nsSNP mapping; nsSNP impact; Data from papers]
People named on the slide: Richard Bickerton, Bernardo Ochoa Montano, Alicia Higueruelo, Semin Lee, Sung Sam Gong, Harry Jubb, Adrian Schreyer

CREDO (Adrian Schreyer)
Database of protein-ligand interactions:
♦ represents contacts as structural interaction fingerprints,
♦ sequence-to-structure mapping,
♦ molecular shape descriptors with Ultrafast Shape Recognition (USR),
♦ fragmentation of ligands in PDB,
♦ identification of approved drugs,
♦ completely scriptable through application programming interface.

Extending Knowledge of the Proteome
• Knowledge-based prediction of protein structure. Blundell, Sibanda, Sternberg, Thornton. Nature 326 (1987). 675 citations
• Assembly of fragments: program Composer (1987). Sutcliffe et al., Protein Engineering, 1, 377-384. 389 citations
• Satisfaction of spatial restraints: Modeller (1993). Sali and Blundell. J. Mol. Biol. 234: 779-815. 6413 citations
• Sequence structure homology recognition: Fugue (2001). Shi, Blundell, Mizuguchi. JMB 310 (1), 243-257. 1054 citations
• Discrete sampling for ensembles consistent with spatial restraints of empirical data: RAPPER (2006). De Bakker, DePristo, Blundell. Nature SMB 13, 184-185

FUGUE
• Sequence-structure homology recognition
• Defining characteristics:
  – Use of Environment-Specific Substitution Tables (ESSTs) in structural profiles
  – Automatic alignment algorithm selection with structure-dependent gap penalties
Shi J, Blundell TL and Mizuguchi K. Journal of Molecular Biology 310, no. 1 (June 29, 2001): 243–57. PMID: 11419950
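The CREDO slide mentions molecular shape descriptors computed with Ultrafast Shape Recognition (USR). The sketch below illustrates the general USR idea in plain Python: distances from every atom to four reference points are summarised by three moments each, giving a 12-number descriptor. It is not CREDO's code; the function names are hypothetical, and the exact moment definitions (standard deviation, cube root of the third central moment) are an assumption, since implementations differ on this point.

```python
import math

def _moments(dists):
    """Mean, standard deviation and cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    # Signed cube root keeps the asymmetry direction of the distribution.
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), skew]

def usr_descriptor(coords):
    """12-value USR-style descriptor for a list of (x, y, z) atom coordinates."""
    n = len(coords)
    # Reference point 1: molecular centroid.
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [math.dist(c, ctd) for c in coords]
    # Reference points 2 and 3: atoms closest to and farthest from the centroid.
    cst = coords[d_ctd.index(min(d_ctd))]
    fct = coords[d_ctd.index(max(d_ctd))]
    # Reference point 4: atom farthest from reference point 3.
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor

def usr_similarity(desc_a, desc_b):
    """Similarity in (0, 1]: inverse of 1 + mean absolute descriptor difference."""
    diff = sum(abs(a - b) for a, b in zip(desc_a, desc_b)) / len(desc_a)
    return 1.0 / (1.0 + diff)

# Toy coordinates (made up for illustration, not a real ligand).
toy = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.2, 0.0),
       (0.3, 0.4, 3.1), (-1.2, 0.7, 0.5)]
shifted = [(x + 10.0, y - 4.0, z + 2.5) for x, y, z in toy]
```

Because the descriptor is built only from interatomic distances, `usr_similarity(usr_descriptor(toy), usr_descriptor(shifted))` should stay at 1 up to rounding, which is what makes the comparison alignment-free.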

Structural Environments
• Residues exist in a variety of environments in protein structures,
• this affects their conservation in evolution.
• Examples of environments:
  – secondary structure,
  – solvent exposure,
  – hydrogen bonding of main or side chain,
  – atypical dihedral angles.
• BLOSUM-like substitution tables can be derived for each combination of environments (currently 64), improving the detection of remote homology and alignment quality.

TOCCATA
• Substitutes original HOMSTRAD database as source of profiles for FUGUE.
• Constructed from a consensus of SCOP 1.75(b) families and CATH 3.5 superfamilies, including multi-domain patterns (not used on G3D).
• Goal was to group domains/structures in minimal number of categories, not analysis.
• Each structure annotated according to conformation (ligand binding, oligomeric state).
http://structure.bioc.cam.ac.uk/toccata/

TOCCATA in numbers
• 57,880 PDB IDs
• 135,894 PDB chains
• 228,014 domains
  – 114,647 with non-trivial ligands
  – 148,605 as part of complexes
• 8151 profiles
  – 6238 single domain families (2263 consensus)
  – 1519 multi-domain profiles
  – 394 repeated domain profiles

VIVACE Pipeline
Stages named in the diagram:
• Sequence pre-segmentation (HMMER+PFAM)
• Sequence enrichment (PSI-BLAST)
• Domain assignment (FUGUE)
• Template selection
• Template alignment (BATON, FUGUE)
• Modelling (MODELLER)
• Alignment + model annotation (JOY/XSuLT)
[Other diagram labels: Genomic sequences; Local Web Interface; Genome3D; TOCCATA]

Joy / XSuLT
• Encodes structural environment information (e.g. that used by FUGUE)
• XSuLT expands the original JOY to include other features:
  – inter-residue contacts, residue depth, interface & ligand binding residues,
  – predictions & custom per-residue annotations, among others.
• JOY: Mizuguchi K, Deane CM, Blundell TL, Johnson MS and Overington JS. Bioinformatics 14, no. 7 (January 1, 1998): 617–623. PMID: 9730927
• XSuLT: In preparation. Soon at http://structure.bioc.cam.ac.uk/xsult

Genome3D Stats
Genome                          FUGUE (SCOP)   FUGUE (CATH)   VIVACE (models)
E. coli                         3,709          3,642          N/A
S. cerevisiae (baker’s yeast)   5,499          5,430          N/A
H. sapiens (human)              15,620         14,967         15,133
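The Structural Environments slide states that substitution tables are derived for each combination of environments, currently 64. A minimal sketch of how such a count can arise is given below. The particular breakdown (four main-chain conformation classes, two solvent accessibility classes, and three independent yes/no hydrogen-bond features) is an assumption based on the usual JOY/FUGUE description and is not spelled out on the slide; all names in the code are illustrative.

```python
from itertools import product

# Assumed feature classes; the slide only lists the feature types and the total of 64.
MAIN_CHAIN = ("helix", "strand", "coil", "positive_phi")
ACCESSIBILITY = ("accessible", "inaccessible")
HBOND_FEATURES = ("sidechain_to_mainchain_CO",
                  "sidechain_to_mainchain_NH",
                  "sidechain_to_sidechain")

def enumerate_environments():
    """Return every combination of the assumed structural features."""
    environments = []
    hbond_states = product((False, True), repeat=len(HBOND_FEATURES))
    for conformation, access, hbonds in product(MAIN_CHAIN, ACCESSIBILITY,
                                                list(hbond_states)):
        environments.append({
            "conformation": conformation,
            "accessibility": access,
            "hbonds": dict(zip(HBOND_FEATURES, hbonds)),
        })
    return environments

ENVIRONMENTS = enumerate_environments()
# 4 conformations x 2 accessibility classes x 2**3 hydrogen-bond states
print(len(ENVIRONMENTS))
```

Each entry would index one environment-specific substitution table, so a residue's observed environment selects which table scores its substitutions.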

Human Genomes & Mutations
Can chemistry, structure and genomics information help identify mutations that cause disease?
Which are the “drivers”, and which are the “passengers”?
[Diagram labels: Genome Sequences; Mutations & Disease; Mendelian Inherited Diseases; Single Gene Syndrome; BHD; Polygenic Disorders; Early Onset & Complex Diseases; Breast Cancer; BRCA2; Cancer Somatic Mutations; residues Asp187, Ser1528, Gly1529, Thr1526]

SDM: Stability score calculation
[Equation, not fully recoverable from the extraction: the folded-state score s^F_jk is the logarithm of a ratio built from the substitution probabilities P(r_k/R_j, wt), P(r_k/R_k, mut), P(r_j/R_j, wt) and P(r_j/R_k, mut), evaluated in the wild-type (wt) and mutant (mut) environments. A second expression combines the unfolded-state and folded-state terms Δs^U_jk and Δs^F_jk into a stability difference score.]
Unfolded state represented by substitutions occurring outside of regular secondary structure, solvent exposed and non-hydrogen bonded.
Current work on SDM: Catherine Worth & (name missing)
Worth CL, Preissner R & Blundell TL (2011) SDM: a server for predicting effects of mutations on protein stability and malfunction. Nucleic Acids Research 39(Web Server issue):W215-W222
Topham, C.M., Srinivasan, N. and Blundell, T.L. (1997) Protein Engineering 10: 7-21.

mCSM
Predicting the effect of mutations in proteins using graph-based signatures
http://bleoberis.bioc.cam.ac.uk/mcsm/
Douglas E. V. Pires
Pires DEV, Ascher DB, Blundell TL (2013) mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics 30(3):335-342

Genome3D Resources at Biochemistry, Cambridge
Tom L Blundell, Bernardo Ochoa M, James Smith
Department of Biochemistry, University of Cambridge

SUPERFAMILY
http://supfam.org
[Diagram labels: PDB files; SCOP domains, classification; A Hidden Markov model profile; SUPERFAMILY profile HMMs; SUPERFAMILY genome annotation; Example of an assignment]

Structural assignments to genomes
• ~5,000 genomes
• ~125 million sequences
• GO annotation of domains and sequences
• Phylogenetic reconstruction
• Comparative genomics/enrichment tools

Website Walkthrough
http://www.genome3d.eu/uniprot/id/Q01860/annotations
[Screenshots of the annotations page, with callouts: Uniprot name; Annotations from resources; Structural overlay interface]
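The walkthrough URL follows a visible pattern: the UniProt accession sits between `/uniprot/id/` and `/annotations`. A small sketch of building such a link for another accession is shown below. The helper name is hypothetical, the pattern is inferred only from the single URL on the slide (the live site may have changed since 2014), and the accession check uses the regular expression published in the UniProt help pages.

```python
import re

# Base address as shown on the slide.
GENOME3D_BASE = "http://www.genome3d.eu"

# UniProt accession pattern as documented by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

def annotations_url(accession):
    """Build an annotations-page URL of the form seen in the walkthrough."""
    accession = accession.strip().upper()
    if not UNIPROT_ACCESSION.fullmatch(accession):
        raise ValueError(f"not a UniProt accession: {accession!r}")
    return f"{GENOME3D_BASE}/uniprot/id/{accession}/annotations"

print(annotations_url("Q01860"))
```

Validating the accession first keeps malformed input out of the path rather than relying on the server to reject it.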
