Ontologies and data integration in biomedicine



  1. Data Integration in the Life Sciences, Evry, France, June 26, 2008
     Ontologies and data integration in biomedicine: Success stories and challenging issues
     Olivier Bodenreider, Lister Hill National Center for Biomedical Communications, Bethesda, Maryland, USA

  2. Why integrate data?

  3. Integration yields nice pictures! [Yildirim, 2007]

  4. Motivation: Translational research
     - “Bench to Bedside”
     - Integration of clinical and research activities and results
     - Supported by research programs
       - NIH Roadmap
       - Clinical and Translational Science Awards (CTSA)
     - Requires the effective integration and exchange of information between
       - Basic research
       - Clinical research

  5. Translational research NIH Roadmap

  6. Motivation: Translational research
     [Figure: Basic Research linked with Clinical Research and Practice]

  7. Why ontologies?

  8. Terminology and translational research
     [Figure: cancer basic research, described with NCI Thesaurus, alongside EHR data on cancer patients, described with SNOMED CT]

  9. Approaches to data integration
     - Mediation
       - Local schema (of the sources)
       - Global schema (in reference to which the queries are made)
     - Warehousing
       - Sources to be integrated are transformed into a common format and converted to a common vocabulary
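The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sources, their field names, and the species vocabulary are hypothetical and do not come from the talk. Warehousing converts every record up front; mediation leaves the sources in place and translates a query against the global schema into each local schema.

```python
# Two hypothetical sources with different local schemas and vocabularies.
SOURCE_A = [{"gene_symbol": "ALDH2", "organism": "human"}]
SOURCE_B = [{"sym": "Aldh2", "species": "Mus musculus"}]

# Hypothetical common vocabulary for species names.
SPECIES_VOCAB = {"human": "Homo sapiens", "Mus musculus": "Mus musculus"}


def warehouse():
    """Warehousing: transform all records into one common format and vocabulary."""
    rows = [
        {"gene": r["gene_symbol"], "species": SPECIES_VOCAB[r["organism"]]}
        for r in SOURCE_A
    ]
    rows += [
        {"gene": r["sym"], "species": SPECIES_VOCAB[r["species"]]}
        for r in SOURCE_B
    ]
    return rows


# Mediation: each source keeps its local schema; a mapping relates
# global field names (keys) to local field names (values).
MAPPINGS = [
    (SOURCE_A, {"gene": "gene_symbol", "species": "organism"}),
    (SOURCE_B, {"gene": "sym", "species": "species"}),
]


def mediated_query(species):
    """Answer a query posed against the global schema by consulting each source in place."""
    hits = []
    for source, mapping in MAPPINGS:
        for record in source:
            if SPECIES_VOCAB[record[mapping["species"]]] == species:
                hits.append(record[mapping["gene"]])
    return hits
```

In this sketch the warehouse pays the conversion cost once, whereas the mediator repeats the mapping on every query but always reads the current source data.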

  10. Ontologies and warehousing
      - Role
        - Provide a conceptualization of the domain
        - Help define the schema
        - Information model vs. ontology
        - Provide value sets for data elements
        - Enable standardization and sharing of data
      - Examples
        - Annotations to the Gene Ontology
        - Repositories for translational research (CTSA)
        - Clinical information systems

  11. Ontologies and mediation
      - Role
        - Reference for defining the global schema
        - Map between local and global schemas
      - Examples
        - TAMBIS
        - BioMediator
        - OntoFusion

  12. Success stories: Gene Ontology http://www.geneontology.org/

  13. Annotating data
      - Gene Ontology
        - Functional annotation of gene products in several dozen model organisms
        - Various communities use the same controlled vocabularies
        - Enabling comparisons across model organisms
      - Annotations
        - Assigned manually by curators
        - Inferred automatically (e.g., from sequence similarity)

  14. GO Annotations for Aldh2 (mouse) http://www.informatics.jax.org/

  15. GO ALD4 in Yeast http://db.yeastgenome.org/

  16. GO Annotations for ALDH2 (Human) http://www.ebi.ac.uk/GOA/

  17. Integration applications
      - Based on shared annotations
        - Enrichment analysis (within/across species)
        - Clustering (co-clustering with gene expression data)
      - Based on the structure of GO
        - Closely related annotations
        - Semantic similarity [Lord, PSB 2003]
      - Based on associations between gene products and annotations [Bodenreider, PSB 2005]
      - Leveraging reasoning [Sahoo, Medinfo 2007]
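Enrichment analysis over shared annotations is often computed as a hypergeometric tail probability: given a population of genes, some of which carry a GO annotation, how likely is it that a gene list of a given size contains at least the observed number of annotated genes? The slide does not name a statistic, so the choice of test and the function name below are assumptions; this is a minimal sketch, not the method of any cited tool.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, annotated_in_sample):
    """Hypergeometric P(X >= annotated_in_sample).

    population: total number of genes considered
    annotated: genes in the population carrying the GO term
    sample: size of the gene list under study
    annotated_in_sample: genes in the list carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return tail / total
```

A small p-value suggests the term is over-represented in the list; in practice a correction for testing many GO terms at once would also be needed.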

  18. Integration: Entrez Gene + GO [Sahoo, Medinfo 2007]
      [Figure labels: Gene Ontology; Entrez Gene; Gene name; Glycosyltransferase; Interactions; GO; gene; Sequence; PubMed; OMIM; Congenital muscular dystrophy]

  19. From glycosyltransferase to congenital muscular dystrophy
      - GO terms connected by isa: glycosyltransferase (GO:0016757); GO:0008194; GO:0016758; acetylglucosaminyltransferase (GO:0008375)
      - LARGE (EG:9215) has_molecular_function acetylglucosaminyltransferase (GO:0008375)
      - LARGE (EG:9215) has_associated_phenotype Muscular dystrophy, congenital, type 1D (MIM:608840)
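The path on this slide can be followed programmatically: start from a GO term, collect the genes whose molecular function is that term or one of its isa descendants, then read off their associated phenotypes. The sketch below uses the identifiers shown on the slide; the exact arrangement of isa edges is an assumption reconstructed from the flattened figure, and the function names are illustrative only.

```python
# isa edges, child -> parents. The layout is assumed from the slide's figure.
ISA = {
    "GO:0008375": ["GO:0008194", "GO:0016758"],
    "GO:0008194": ["GO:0016757"],
    "GO:0016758": ["GO:0016757"],
}
HAS_MOLECULAR_FUNCTION = {"EG:9215": ["GO:0008375"]}  # LARGE
HAS_ASSOCIATED_PHENOTYPE = {"EG:9215": ["MIM:608840"]}


def ancestors(term):
    """Return every term reachable from `term` by following isa edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in ISA.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def phenotypes_for_function(go_term):
    """Phenotypes of genes annotated with `go_term` or any of its descendants."""
    result = set()
    for gene, functions in HAS_MOLECULAR_FUNCTION.items():
        if any(f == go_term or go_term in ancestors(f) for f in functions):
            result.update(HAS_ASSOCIATED_PHENOTYPE.get(gene, []))
    return result
```

Querying with the general term GO:0016757 reaches MIM:608840 only because the isa hierarchy is traversed; an exact-match lookup on the annotation would miss it.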

  20. Success stories: caBIG http://cabig.nci.nih.gov/

  21. Cancer Biomedical Informatics Grid
      - US National Cancer Institute
      - Common infrastructure used to share data and applications across institutions to support cancer research efforts in a grid environment
      - Data and application services available on the grid
      - Supported by ontological resources

  22. caBIG services
      - caArray
        - Microarray data repository
      - caTissue
        - Biospecimen repository
      - caFE (Cancer Function Express)
        - Annotations on microarray data
      - …
      - caTRIP
        - Cancer Translational Research Informatics Platform
        - Integrates data services

  23. Ontological resources
      - NCI Thesaurus
        - Reference terminology for the cancer domain
        - ~ 60,000 concepts
        - OWL Lite
      - Cancer Data Standards Repository (caDSR)
        - Metadata repository
        - Used to bridge across UML models through Common Data Elements
        - Links to concepts in ontologies

  24. Success stories: Semantic Web for Health Care and Life Sciences http://www.w3.org/2001/sw/hcls/

  25. W3C Health Care and Life Sciences IG

  26. Biomedical Semantic Web [Ruttenberg, 2007]
      - Integration
        - Data/Information
        - E.g., translational research
      - Hypothesis generation
      - Knowledge discovery

  27. HCLS mashup of biomedical sources
      [Figure: sources shown are PDSPki, NeuronDB, Reactome, Gene Ontology, BAMS, Allen Brain Atlas, BrainPharm, Antibodies, Entrez Gene, MeSH, NC Annotations, PubChem, Mammalian Phenotype, SWAN, AlzGene, Homologene, Publications]
      http://esw.w3.org/topic/HCLS/HCLSIG_DemoHomePage_HCLSIG_Demo

  28. Shared identifiers: example (GO)

  29. HCLS mashup
      [Figure: the sources NeuronDB, PDSPki, GO, Reactome, BAMS, BrainPharm, Allen Brain Atlas, Entrez Gene, Antibodies, MeSH, NC Annotations, PubChem, Mammalian Phenotype, Homologene, SWAN and AlzGene, each labeled with the kinds of entities it holds (e.g., genes, proteins, neurotransmitters, neuroanatomy, drugs, diseases, phenotypes, PubMedID). The assignment of labels to individual sources is not recoverable from the extracted text.]

  30. HCLS mashup
      [Figure: same diagram and labels as on the previous slide]

  31. HCLS mashups
      - Based on RDF/OWL
      - Based on shared identifiers
      - “Recombinant data” (E. Neumann)
      - Ontologies used in some cases
      - Support applications (SWAN, SenseLab, etc.)
      - Journal of Biomedical Informatics special issue on Semantic Bio-mashups (forthcoming)
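Integration through shared identifiers can be illustrated without any RDF library by treating each source as a set of (subject, predicate, object) triples. Merging is then a set union, and two sources become connected wherever they use the same identifier. Everything in this sketch is hypothetical: the `ex:` identifiers, the predicate name, and the helper functions are made up for illustration and are not taken from the HCLS demo.

```python
# Hypothetical triples from two independent sources.
SOURCE_ONE = {
    ("ex:gene/G1", "ex:mentionedIn", "ex:pmid/P1"),
}
SOURCE_TWO = {
    ("ex:drug/D1", "ex:mentionedIn", "ex:pmid/P1"),
    ("ex:drug/D2", "ex:mentionedIn", "ex:pmid/P2"),
}


def merge(*graphs):
    """Merge triple sets; identical identifiers line up without extra mapping."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def co_mentioned(graph, predicate="ex:mentionedIn"):
    """Group subjects by shared object, keeping objects linked to more than one subject."""
    by_object = {}
    for subject, pred, obj in graph:
        if pred == predicate:
            by_object.setdefault(obj, set()).add(subject)
    return {obj: subs for obj, subs in by_object.items() if len(subs) > 1}
```

Here the gene and the drug are linked only because both sources use the same publication identifier; had one source used a different identifier scheme, the union would leave them disconnected.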

  32. Challenging issues: Bridges across ontologies

  33. Trans-namespace integration
      - Clinical repositories
        - SNOMED CT: Addison's disease (363732003)
        - ICD 10: Primary adrenocortical insufficiency (E27.1)
      - Biomedical literature
        - MeSH: Addison Disease (D000224)
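One common way to bridge namespaces is to group codes that denote the same thing under a shared concept and translate through that hub. The sketch below uses the three codes from the slide; the concept key `concept-1` and the `translate` function are hypothetical, and treating the three codes as exactly equivalent is a simplifying assumption.

```python
# Codes from the slide, grouped under one hypothetical shared concept.
CONCEPT_MEMBERS = {
    "concept-1": {
        ("SNOMED CT", "363732003"),  # Addison's disease
        ("ICD 10", "E27.1"),         # Primary adrenocortical insufficiency
        ("MeSH", "D000224"),         # Addison Disease
    },
}


def translate(namespace, code, target_namespace):
    """Return the codes in `target_namespace` that share a concept with the given code."""
    for members in CONCEPT_MEMBERS.values():
        if (namespace, code) in members:
            return sorted(c for ns, c in members if ns == target_namespace)
    return []
```

With such a table, a record coded in a clinical repository can be linked to literature indexed with MeSH, for example `translate("SNOMED CT", "363732003", "MeSH")`.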
