Prototyping a Biomedical Ontology Recommender Service Clement - PowerPoint PPT Presentation


  1. Prototyping a Biomedical Ontology Recommender Service
     Clement Jonquet, Nigam H. Shah, Mark A. Musen
     jonquet@stanford.edu

  2. Ontologies & data & annotations (1/2)
     • It is hard for biomedical researchers to find the data they need
     • Data integration problem
     • Translational discoveries are prevented
     • Annotating data with biomedical ontologies is a solution
     • Annotations describe data with ontology concepts
     • Semantic annotations (GO annotations, MeSH in PubMed)
     • Ontologies play a common denominator role
     • For example:
       • A researcher wants to integrate data from different gene expression dataset repositories
       • A curator wants to triage articles for better information retrieval
     BioOntologies 2009 ‐ Prototyping a Biomedical Recommender Service ‐ June 28th, 2009

  3. Ontologies & data & annotations (2/2)
     • Annotation of PMID 19550360 with Human Disease
     • Annotation of PMID 19550360 with FMA
     • Annotation of PMID 19550360 with MeSH
     • Annotation of PMID 19550360 with SNOMED‐CT
     • 1 match, 17 matches, 33 matches, 5 matches

  4. Which ontology to use?
     • Large number of ontologies
     • Different versions, platforms, formats, etc.
     • Which ontology is relevant? Accurate?
     • What’s the risk of a bad choice?
       • Miss a relevant ontology
       • Miss possible reuse and start a new ontology
       • Miss connection/integration with other data that use the right ontologies

  5. The recommender service
     • Given representative textual metadata (text description, keywords, etc.)
     • Use a method based on semantic annotations generated by the NCBO Annotator
     • Use 206 ontologies from UMLS & NCBO BioPortal
     • Recommend and score the appropriate ontologies to annotate the data

  6. NCBO Annotator workflow [Jonquet et al, AMIA STB 2009]
     • Extract annotations from text by concept recognition
     • Expand annotations using the knowledge represented in ontologies
     • Score annotations according to their context and return them to the user

  7. Concept recognition (step 1)
     • Use a dictionary: a list of strings that identifies ontology concepts
     • 206 ontologies, ~3.5M concepts & ~7M terms
     • Use NCIBI Mgrep, a syntactic concept recognizer
     • High degree of accuracy
     • Fast, scalable
     • Domain independent
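
As a rough illustration of dictionary-based concept recognition, the Python sketch below matches the tokens of a text against a small term dictionary. It is not Mgrep: the `recognize` function, the toy `DICTIONARY`, and the naive plural handling are assumptions made for illustration, and the concept identifiers are borrowed from the example later in this presentation.

```python
import re

# Toy dictionary (assumption): term string -> list of (ontology, concept id).
# The real dictionary described on the slide holds about 7M terms.
DICTIONARY = {
    "melanoma": [
        ("NCI Thesaurus", "NCI/C0025202"),
        ("Human Disease", "39228/DOID:1909"),
    ],
    "melanocyte": [("NCI Thesaurus", "NCI/C0025201")],
    "tumor": [("NCI Thesaurus", "NCI/C0027651")],
}


def recognize(text, dictionary):
    """Return (term, ontology, concept_id, offset) for every dictionary hit."""
    annotations = []
    for match in re.finditer(r"[A-Za-z]+", text):
        token = match.group().lower()
        # Naive plural handling (assumption): also try the token without a final "s".
        candidates = [token]
        if token.endswith("s"):
            candidates.append(token[:-1])
        for term in candidates:
            if term in dictionary:
                for ontology, concept_id in dictionary[term]:
                    annotations.append((term, ontology, concept_id, match.start()))
                break
    return annotations


TEXT = (
    "Melanoma is a malignant tumor of melanocytes which are found "
    "predominantly in skin but also in the bowel and the eye"
)

for annotation in recognize(TEXT, DICTIONARY):
    print(annotation)
```

A token-by-token lookup like this only finds single-word terms; a production recognizer would also need to handle multi-word terms and richer lexical variation.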

  8. Semantic expansion (step 2)
     • Use is_a hierarchies defined by original ontologies
     • Use mappings in UMLS Metathesaurus and NCBO BioPortal
     • Use semantic‐similarity algorithms based on the is_a graph (ongoing work)
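
The is_a expansion can be pictured as a walk up the hierarchy from each directly recognized concept. The sketch below is a minimal version under stated assumptions: `is_a_closure` is a hypothetical helper name, `PARENTS` is a toy child-to-parents map built from the Human Disease identifiers in the example slide, and the presentation does not say how the real service stores hierarchies.

```python
from collections import deque

# Toy is_a edges (assumption): child concept -> list of direct parents.
PARENTS = {
    "39228/DOID:1909": ["39228/DOID:191"],     # Melanoma -> Melanocytic neoplasm
    "39228/DOID:191": ["39228/DOID:0000818"],  # -> cell proliferation disease
}


def is_a_closure(concept, parents):
    """Map each ancestor of `concept` to its distance (1 = direct parent)."""
    levels = {}
    queue = deque([(concept, 0)])
    while queue:
        node, level = queue.popleft()
        for parent in parents.get(node, []):
            # Breadth-first order keeps the shortest distance for each ancestor.
            if parent != concept and parent not in levels:
                levels[parent] = level + 1
                queue.append((parent, level + 1))
    return levels


print(is_a_closure("39228/DOID:1909", PARENTS))
```

Recording the level alongside each ancestor matters because the example slide gives more distant ancestors lower scores than direct matches.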

  9. An example
     • “Melanoma is a malignant tumor of melanocytes which are found predominantly in skin but also in the bowel and the eye.”
     • NCI/C0025201, Melanocyte in NCI Thesaurus {score: 10}
     • NCI/C0025202, Melanoma in NCI Thesaurus {score: 10}
     • 39228/DOID:1909, Melanoma in Human Disease {score: 10}
     • NCI/C0027651, Neoplasm (synonym of Tumor) in NCI Thesaurus {score: 8}
     • Is_a closure expansion
       • 39228/DOID:191, Melanocytic neoplasm, direct parent of Melanoma in Human Disease {score: 8}
       • 39228/DOID:0000818, cell proliferation disease, grandparent of Melanoma in Human Disease {score: 8}
       • NCI/C0027651, Neoplasm in NCI Thesaurus, great‐grandparent of Melanoma in NCI Thesaurus {score: 7}
     • Mapping expansion
       • FMA/C0025201, Melanocyte in Foundational Model of Anatomy, concept mapped to NCI/C0025201 in UMLS {score: 7}

  10. Ontology scoring method
     • Each annotation computed by the NCBO Annotator has a score that depends on the context of the annotation (e.g., direct, expanded, etc.)
     • Ontologies are sorted by the sum of the scores of the annotations they have generated
     • In the previous example: NCI/C0025201 {score: 10} + NCI/C0025202 {score: 10} + NCI/C0027651 {score: 8} + NCI/C0027651 {score: 7} = 35
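
The sum-of-scores ranking can be reproduced on the annotations from the example slide. This is a minimal sketch: the `rank_ontologies` name and the tuple layout are assumptions, while the identifiers and scores are copied from the example.

```python
from collections import defaultdict

# (ontology, concept id, score) triples copied from the example slide.
ANNOTATIONS = [
    ("NCI Thesaurus", "NCI/C0025201", 10),
    ("NCI Thesaurus", "NCI/C0025202", 10),
    ("Human Disease", "39228/DOID:1909", 10),
    ("NCI Thesaurus", "NCI/C0027651", 8),
    ("Human Disease", "39228/DOID:191", 8),
    ("Human Disease", "39228/DOID:0000818", 8),
    ("NCI Thesaurus", "NCI/C0027651", 7),
    ("Foundational Model of Anatomy", "FMA/C0025201", 7),
]


def rank_ontologies(annotations):
    """Sum annotation scores per ontology and sort by descending total."""
    totals = defaultdict(int)
    for ontology, _concept_id, score in annotations:
        totals[ontology] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for ontology, total in rank_ontologies(ANNOTATIONS):
    print(ontology, total)
```

NCI Thesaurus totals 35, as on the slide. Applying the same rule to the remaining example annotations gives 26 for Human Disease and 7 for the Foundational Model of Anatomy.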

  13. Recommendation & datasets
     • We have compared results for 3 types of datasets about “Melanoma”
       • PubMed article citations
       • Clinicaltrials.gov trials
       • Gene Expression Omnibus datasets
     • Recommendation is data dependent
     • Different types of data will require different types of ontologies for annotations

  14. Ontologies identified for each resource
     [Figure: ontologies grouped by resource (PubMed, GEO, ClinicalTrials.gov). Ontologies shown: Phenotypic quality, Human phenotype, RadLex, NCI Thesaurus, Mouse pathology, Human disease, Galen, Experimental Factor, FMA, Xenopus anatomy, Mouse adult gross anatomy, Zebrafish anatomy, Human developmental anatomy, NCI anatomy, Mosquito gross anatomy, Medaka fish anatomy]

  15. Results and analysis
     • High score for big ontologies
     • Key ontologies identified
     • Regardless of the dataset, some ontologies are always present
     • Some ontologies appear only with a specific type of data
     • Importance of appropriate recommendation
     • Score does not scale linearly with the number of annotations
     • Importance of scoring as well as of the annotation context weights

  16. Future work
     • Enhance backend annotation workflow
       • Concept recognition
       • Semantic expansion
     • Different scoring methods that will support different kinds of recommendation scenarios
       • Use the size of the ontologies to normalize the score
       • Prefer “key” ontologies over other ontologies
     • Parameterized scoring methods
       • Customization of weights for specific contexts
       • Predefined domain‐specific preferences
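
One way the size normalization mentioned above might look is to divide each summed score by the logarithm of the ontology size. The presentation does not specify a formula, so the sketch below is purely an assumption: `normalize_by_size` is a hypothetical helper, the base-10 logarithm is one arbitrary choice among many, and the ontology names and concept counts are made-up placeholders.

```python
import math


def normalize_by_size(total_score, concept_count):
    """Divide a summed annotation score by log10 of the ontology size."""
    if concept_count < 2:
        raise ValueError("concept_count must be at least 2")
    return total_score / math.log10(concept_count)


# Placeholder totals and sizes (assumptions, not measured values).
CANDIDATES = {
    "Ontology A": (35, 100_000),  # large ontology, higher raw score
    "Ontology B": (26, 1_000),    # small ontology, lower raw score
}

for name, (total, size) in CANDIDATES.items():
    print(name, round(normalize_by_size(total, size), 2))
```

With these placeholder numbers the smaller ontology ends up ahead after normalization, which is the kind of correction for the "high score for big ontologies" effect that the slide hints at.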

  17. Conclusion
     • Enabling data integration and translational discoveries requires scalable data annotation using ontologies
     • It may be hard for a scientist to know which ontology to (re)use in an annotation task
     • We prototyped an ontology recommender service, which, given sample textual metadata, will recommend appropriate ontologies to use
     • Please try it and join us!

  18. Thank you
     http://obs.bioontology.org
     http://www.bioontology.org

  19. Why use ontologies?

  20. Biomedical resources indexed with ontologies
     • We have used the annotation workflow to index several important biomedical resources with ontology concepts
     • The index can be used to enhance search and data integration

  21. Resources tab in BioPortal
     [Screenshot callouts: Example of resource available; Number of annotations in the OBR index; Ontology concept browsed; Link to the original element; ID of an element; Annotation context]

  22. Good use of the semantics (1/2)

  23. Good use of the semantics (2/2)
