
Life Science Database Integration Based on Semantic Web Technology



  1. Life Science Database Integration Based on Semantic Web Technology
Susumu Goto
Database Center for Life Science (DBCLS)
Joint Support-Center for Data Science Research (DS)
Research Organization of Information and Systems (ROIS)
International Workshop on Sharing, Citation and Publication of Scientific Data across Disciplines
December 5, 2017 @ Tachikawa, Tokyo, Japan

  2. Self Introduction
1994-2016: Institute for Chemical Research, Kyoto University
• GenomeNet: bioinformatics tools and databases
• KEGG (Kyoto Encyclopedia of Genes and Genomes): database of gene and chemical networks
• LinkDB: database of link information
2017-: Database Center for Life Science, DS, ROIS
• Database integration based on semantic web technology
• Application of the integrated database

  3. Database Center for Life Science
[Diagram: DBCLS (Database Center for Life Science) within DS (Joint Support-Center for Data Science Research) of ROIS (Research Organization of Information and Systems); DDBJ (DNA Data Bank of Japan) at NIG (National Institute of Genetics); NBDC (National Bioscience Database Center) of JST (Japan Science and Technology Agency); links labeled "Cooperation" and "Collaborative".]
• 2008-: Database integration based on web applications
• 2011-: Funded by JST/NBDC for database integration following the FAIR principles, using semantic web technology (RDF, Linked Open Data)
• Services: Integbio DB Catalog, LSDB Cross Search, Life Science DB Archive, NBDC RDF Portal

  4. Linked Data
• Give a unique ID to every object -> URI (Uniform Resource Identifier)
• Define relationships between objects -> RDF (Resource Description Framework)
• Use common ontologies for describing objects
5 ★ Linked Open Data (Tim Berners-Lee; http://5stardata.info/en/):
★★★★★ LD: Linked Data (linked RDF)
★★★★ URI: Uniform Resource Identifier
★★★ OF: Open Format & Editable
★★ RE: Readable
★ OL: Open License
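To make "a unique ID (URI) for every object" and "common ontologies" concrete, here is a minimal Python sketch that turns one database record into N-Triples, the line-based RDF serialization. The http://example.org/ namespace, the class name, and the record values are hypothetical placeholders; only rdf:type and rdfs:label are standard W3C vocabulary terms.

```python
# Minimal sketch: serialize one record as RDF N-Triples.
# The example.org namespace and the record values are hypothetical.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/"  # hypothetical base for minting URIs


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping characters N-Triples reserves."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def record_to_ntriples(record_id: str, record_class: str, label: str) -> str:
    """Mint a URI for the record, type it with a shared class, and label it."""
    subject = iri(f"{EX}gene/{record_id}")
    triples = [
        (subject, iri(RDF_TYPE), iri(f"{EX}ontology/{record_class}")),
        (subject, iri(RDFS_LABEL), literal(label)),
    ]
    # Each N-Triples statement is "subject predicate object ." on one line.
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(record_to_ntriples("geneA", "Gene", 'hypothetical "A" gene'))
```

Because the subject is a URI rather than a database-local row number, any other data set that reuses the same URI is talking about the same object.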

  5. Database Integration @ DBCLS
• Highly heterogeneous databases, each using their own terms and formats
• RDF (Resource Description Framework): triples consisting of Subject, Predicate and Object
- Subject: ID (URI) for an object
- Predicate: attribute (URI) defined by an ontology
- Object: ID (URI) for another object, or a value (literal)
• Database integration for seamless access and knowledge mining
[Diagram: many DBs converted to RDF and connected through an ontology]
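A toy illustration of why shared URIs make integration seamless: two hypothetical sources, converted to triples independently, are merged by a simple set union and then queried across the join. All URIs and predicates below are placeholders, and this plain-Python triple pattern matching is only a sketch of the idea, not how a real RDF store or SPARQL engine works internally.

```python
# Minimal sketch: RDF-style integration of two heterogeneous sources.
# All URIs and predicates below are hypothetical placeholders.
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/"

# Source 1: a hypothetical gene database, already converted to triples.
gene_db: List[Triple] = [
    (EX + "gene/geneA", EX + "ontology/encodes", EX + "protein/P1"),
    (EX + "gene/geneB", EX + "ontology/encodes", EX + "protein/P2"),
]

# Source 2: a hypothetical protein structure database.
structure_db: List[Triple] = [
    (EX + "protein/P1", EX + "ontology/hasStructure", EX + "structure/S1"),
    (EX + "protein/P1", EX + "ontology/hasStructure", EX + "structure/S2"),
]


def match(graph: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Integration step: both sources use the same URI for each protein,
# so merging them is just a set union of their triples.
graph = set(gene_db) | set(structure_db)


def structures_for_gene(gene: str) -> List[str]:
    """Follow gene -> protein -> structure links across both sources."""
    results = []
    for _, _, protein in match(graph, s=gene, p=EX + "ontology/encodes"):
        for _, _, structure in match(graph, s=protein,
                                     p=EX + "ontology/hasStructure"):
            results.append(structure)
    return sorted(results)


print(structures_for_gene(EX + "gene/geneA"))
# ['http://example.org/structure/S1', 'http://example.org/structure/S2']
```

In SPARQL the same join would be written as two triple patterns sharing a ?protein variable.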

  6. NBDC RDF Portal
• Portal site for RDF data from research groups in Japan (http://integbio.jp/rdf/)
• 20 data sets, including nine from NBDC-funded databases, comprising 45 billion triples (as of Nov. 2017)
• Microbial genomes, protein 3D structures, glycan structures, …
• RDF file download, SPARQL endpoints, statistics, metadata, …
• Two important topics: the RDFizing database guideline, and BioHackathons and SPARQLthons
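The portal's SPARQL endpoints are accessed over HTTP. As a hedged sketch, the standard-library Python code below follows the W3C SPARQL 1.1 Protocol (query sent in a `query` parameter of a GET request) and parses the SPARQL 1.1 JSON results format. The endpoint URL is a hypothetical placeholder, not an actual RDF Portal address, and the sample response is illustrative only.

```python
# Minimal sketch of SPARQL 1.1 Protocol access using only the standard library.
# ENDPOINT is a hypothetical placeholder, not an actual RDF Portal endpoint.
import json
import urllib.parse
import urllib.request
from typing import Dict, List

ENDPOINT = "https://example.org/sparql"  # hypothetical


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(body: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    data = json.loads(body)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str,
              timeout: float = 30.0) -> List[Dict[str, str]]:
    """Send the query and parse the response (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query),
                                timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))


# Offline check of the parser with an illustrative response body.
sample = ('{"head": {"vars": ["s"]}, "results": {"bindings": '
          '[{"s": {"type": "uri", "value": "http://example.org/a"}}]}}')
print(parse_bindings(sample))  # [{'s': 'http://example.org/a'}]
```

Keeping request construction and result parsing separate from the network call makes each piece testable without contacting a live endpoint.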

  7. SPARQLthon
• Two-day hackathon held every month since October 2012
• Theme: life science database integration using semantic web technologies
• More than 60 events in total, with 1,328 participants (138 unique) from 45 institutes (15 universities, 13 research institutes, 17 private companies)
• Since 2014, researchers from the integrated database projects funded by NBDC have attended and collaborated on creating RDF data and ontologies

  8. BioHackathon
• International hackathon hosted by DBCLS/NBDC once a year in Japan since 2008
• Discuss and develop up-to-date technologies and systems for database integration and its applications
• One week of intensive development through international collaboration
• Summary papers have been published
• The FAIR principles paper acknowledges the BioHackathon

  9. Currently Available RDF Data (type: RDF data sets)
• Gene: DDBJ
• Genome: Ensembl
• Metagenome: MicrobeDB.jp
• Ortholog: MBGD, PGDBj Orthology
• Genome variation: Linked ICGC, ClinVar, ExAC
• Epigenome: KERO, ChIP-Atlas, iMETHYL
• Transcriptome: Expression Atlas, RefEx, KERO, Open TG-GATEs
• Protein: UniProt
• Proteome: neXtProt, The Human Protein Atlas, jPOSTdb
• Protein structure: wwPDB, BMRB, FAMSBASE
• Protein interaction: IntAct, Instruct, HINT
• Pathway: REACTOME, WikiPathways
• Systems biology: BioModels, SSBD
• Glycan: GlyTouCan, GlycoEpitope, WURCS
• Chemical compound: PubChem, Nikkaji
• Bioassay: ChEMBL, PubChem
• Metabolome: MassBank, metabolonote
• Disease: MedGen, PAConto, GGDonto, DisGeNet, ClinVar
• Sample: BioSamples, JCM
• Dictionary: MeSH, Allie, LSD
• Ontology: BioPortal, OLS
• Metadata: Quanto, Integbio DB Catalog, Colil, First Authors
(Slide legend: red = available in the RDF Portal; gray = ongoing.)

  10. Application: TogoGenome
• Genome database based on semantic web technology
• Unique in being implemented solely on RDF data stores
• More than 10,000 species, including 360 eukaryotes; more than 1 billion triples
• Genes and genomes, environmental and growth conditions, links to other DBs

  11. Middleware: Accessing SPARQL Endpoints
• TogoStanza: generic web framework for reusable web components
• SPARQList: API for accessing SPARQL endpoints
• SPARQL support: SPARQL Builder, a web interface to support building SPARQL queries
• YummyData: listing and monitoring SPARQL endpoints
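Middleware such as SPARQList puts an API in front of SPARQL endpoints so that users call a named query with parameters instead of writing SPARQL. Without claiming to reproduce SPARQList's actual mechanism, the Python sketch below shows the general idea: a stored query template whose parameters are validated as IRIs before substitution, so a caller cannot inject arbitrary SPARQL. The template, the `gene` parameter, and the validation rule are hypothetical.

```python
# Hypothetical sketch of a parameterized-query layer over SPARQL;
# this is NOT SPARQList's actual implementation.
import re
from string import Template

# Characters the SPARQL IRIREF grammar excludes (plus control chars and space).
_FORBIDDEN_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`\\]')

# A stored, hypothetical query template with one parameter: $gene.
GENE_LABEL_QUERY = Template("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <$gene> rdfs:label ?label .
}
""")


def safe_iri(value: str) -> str:
    """Reject values that could break out of an <IRI> in the query text."""
    if (not value.startswith(("http://", "https://"))
            or _FORBIDDEN_IRI_CHARS.search(value)):
        raise ValueError(f"not an acceptable IRI: {value!r}")
    return value


def render(template: Template, **params: str) -> str:
    """Fill a stored query template after validating every parameter."""
    return template.substitute({k: safe_iri(v) for k, v in params.items()})


print(render(GENE_LABEL_QUERY, gene="http://example.org/gene/geneA"))
```

The rendered query string would then be sent to an endpoint like any other SPARQL query; restricting parameters to validated IRIs is one simple design choice, and a real service might also accept typed literals.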

  12. Application: Easy access to omics data

  13. Application: Natural language Q&A http://lodqa.org

  14. Portal site: Japan alliance for Bioscience Information (JBI)
[Diagram: JBI members DBCLS (Database Center for Life Science), DDBJ (DNA Data Bank of Japan), NBDC (National Bioscience Database Center) and PDBj (Protein Data Bank Japan, Osaka Univ.), with JST (Japan Science and Technology Agency) and ROIS (Research Organization of Information and Systems); hands-on seminars]

  15. Summary
• Database integration via semantic web technology: RDF, Linked Open Data
• RDF Portal and conversion tools
• Tools to utilize the integrated database (http://dbcls.jp/services)
• Community for development and utilization: BioHackathon, SPARQLthon
• Lecture series, and TogoTV for lecture videos

  16. Acknowledgements
NAITO, Yuki; OKAMOTO, Shinobu; KAWAMOTO, Shoko; OKUBO, Kousaku; WANG, Yue; FUJIWARA, Toyofumi; OHTA, Tazro; IIDA, Keisuke; MORIYA, Yuki; NAKAZATO, Takeru; CHIBA, Hirokazu; KAWASHIMA, Shuichi; KATAYAMA, Toshiaki; ONO, Hiromasa; YAMAMOTO, Yasunori; YAMAGUCHI, Atsuko; MINOWA, Mari; BONO, Hidemasa; KIM, Jin-Dong; KAWANO, Shin; KOHARA, Yuji (Director)
