SLIDE 1

Prototyping a Biomedical Ontology Recommender Service

Clement Jonquet Nigam H. Shah Mark A. Musen

jonquet@stanford.edu


SLIDE 2

Ontologies & data & annotations (1/2)

• Hard for biomedical researchers to find the data they need
  • Data integration problem
  • Translational discoveries are prevented

• Annotating data with biomedical ontologies is a solution
  • Annotations describe data with ontology concepts
  • Semantic annotations (GO annotations, MeSH in PubMed)
  • Ontologies play the role of a common denominator

• For example:
  • A researcher wants to integrate data from different gene expression dataset repositories
  • A curator wants to triage articles for better information retrieval

2 BioOntologies 2009 ‐ Prototyping a Biomedical Recommender Service ‐ June 28th, 2009

SLIDE 3

Ontologies & data & annotations (2/2)


• Annotation of PMID 19550360 with Human Disease: 5 matches
• Annotation of PMID 19550360 with FMA: 1 match
• Annotation of PMID 19550360 with MeSH: 17 matches
• Annotation of PMID 19550360 with SNOMED-CT: 33 matches

SLIDE 4

Which ontology to use?

• Large number of ontologies
  • Different versions, platforms, formats, etc.

• Which ontology is relevant? Which is accurate?

• What's the risk of a bad choice?
  • Miss a relevant ontology
  • Miss possible reuse and start a new ontology
  • Miss connection/integration with other data that use the right ontologies

SLIDE 5

The recommender service


Given representative textual metadata (text description, keywords, etc.), the service:
• Uses a method based on semantic annotations generated by the NCBO Annotator
• Uses 206 ontologies from UMLS & NCBO BioPortal
• Recommends and scores the appropriate ontologies to annotate the data


SLIDE 6

NCBO Annotator workflow [Jonquet et al., AMIA STB 2009]

• Extract annotations from text by concept recognition
• Expand annotations using the knowledge represented in ontologies
• Score annotations according to their context and return them to the user
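The three workflow steps can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the NCBO Annotator implementation: the naive substring lookup stands in for Mgrep, and the names `recognize`, `expand`, `score`, `annotate`, and the `CONTEXT_SCORES` values are hypothetical, with toy data loosely modeled on the melanoma example of slide 9.

```python
from dataclasses import dataclass

# Hypothetical context weights, loosely modeled on the melanoma example scores.
CONTEXT_SCORES = {"direct": 10, "is_a": 8, "mapping": 7}


@dataclass(frozen=True)
class Annotation:
    concept: str  # concept ID, e.g. "NCI/C0025202"
    context: str  # "direct", "is_a" or "mapping"


def recognize(text, dictionary):
    """Step 1: naive case-insensitive substring lookup (a stand-in for Mgrep)."""
    lowered = text.lower()
    return [Annotation(concept, "direct")
            for term, concept in dictionary if term in lowered]


def expand(annotations, parents, mappings):
    """Step 2: add the direct is_a parents and mapped concepts of each annotation."""
    expanded = list(annotations)
    for ann in annotations:
        expanded += [Annotation(p, "is_a") for p in parents.get(ann.concept, [])]
        expanded += [Annotation(m, "mapping") for m in mappings.get(ann.concept, [])]
    return expanded


def score(annotations):
    """Step 3: attach a context-dependent score to each annotation."""
    return [(ann.concept, ann.context, CONTEXT_SCORES[ann.context]) for ann in annotations]


def annotate(text, dictionary, parents, mappings):
    return score(expand(recognize(text, dictionary), parents, mappings))


# Toy inputs: a hypothetical, tiny subset of the real dictionary and ontology links.
DICTIONARY = [("melanocyte", "NCI/C0025201"), ("melanoma", "NCI/C0025202"),
              ("melanoma", "39228/DOID:1909")]
PARENTS = {"39228/DOID:1909": ["39228/DOID:191"]}
MAPPINGS = {"NCI/C0025201": ["FMA/C0025201"]}

results = annotate("Melanoma is a malignant tumor of melanocytes",
                   DICTIONARY, PARENTS, MAPPINGS)
for row in results:
    print(row)
```

Keeping each step in its own function mirrors the three-step decomposition, so the recognizer or the expansion strategy can be replaced without touching the scoring step.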

SLIDE 7

Concept recognition (step 1)

• Use NCIBI Mgrep, a syntactic concept recognizer
  • High degree of accuracy
  • Fast, scalable, domain independent

• Use a dictionary: a list of strings that identify ontology concepts
  • 206 ontologies, ~3.5M concepts & ~7M terms
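Dictionary-based recognition can be sketched as a greedy longest-match scan over word tokens. This is an illustrative stand-in, not Mgrep's actual algorithm: the tokenization rule and the helper names `build_dictionary` and `recognize` are assumptions, and the entries are a tiny hypothetical subset of the term list. Because there are more terms than concepts, and one term (Melanoma) can name concepts in several ontologies, each term maps to a set of concept IDs.

```python
import re


def tokenize(text):
    """Lowercase word tokens; this tokenization rule is an assumption."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(entries):
    """Map each term (as a tuple of tokens) to the set of concept IDs it names."""
    dictionary = {}
    for term, concept_id in entries:
        dictionary.setdefault(tuple(tokenize(term)), set()).add(concept_id)
    return dictionary


def recognize(text, dictionary):
    """Greedy longest-match scan; returns (matched term, sorted concept IDs)."""
    tokens = tokenize(text)
    max_len = max((len(key) for key in dictionary), default=0)
    matches, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in dictionary:
                matches.append((" ".join(key), sorted(dictionary[key])))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return matches


ENTRIES = [("Melanoma", "NCI/C0025202"), ("Melanoma", "39228/DOID:1909"),
           ("Melanocytic neoplasm", "39228/DOID:191"), ("Tumor", "NCI/C0027651")]
DICTIONARY = build_dictionary(ENTRIES)

print(recognize("Melanoma is a malignant tumor of melanocytes", DICTIONARY))
print(recognize("A melanocytic neoplasm", DICTIONARY))
```

With exact token matching, "melanocytes" is not recognized unless the dictionary also lists that inflected form; a production recognizer would have to handle such variants.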

SLIDE 8
Semantic expansion (step 2)

• Use is_a hierarchies defined by the original ontologies
• Use mappings in UMLS Metathesaurus and NCBO BioPortal
• Use semantic-similarity algorithms based on the is_a graph (ongoing work)
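The is_a closure and mapping expansions can be sketched as a breadth-first walk over parent links plus a lookup in a list of mapping pairs. The function names and data layout are hypothetical; the parent chain and the mapping in the usage example come from the melanoma example on slide 9, with the assumption that the grandparent is reached through the direct parent.

```python
from collections import deque


def is_a_closure(concept, parents):
    """Return {ancestor: distance} for every ancestor reachable via is_a links."""
    distances = {}
    queue = deque([(concept, 0)])
    while queue:
        current, dist = queue.popleft()
        for parent in parents.get(current, []):
            # BFS reaches each ancestor first at its shortest distance.
            if parent != concept and parent not in distances:
                distances[parent] = dist + 1
                queue.append((parent, dist + 1))
    return distances


def mapping_expansion(concept, mapping_pairs):
    """Return concepts mapped to `concept`, treating each pair as symmetric."""
    mapped = set()
    for a, b in mapping_pairs:
        if a == concept:
            mapped.add(b)
        elif b == concept:
            mapped.add(a)
    return sorted(mapped)


PARENTS = {
    "39228/DOID:1909": ["39228/DOID:191"],     # Melanoma -> Melanocytic neoplasm
    "39228/DOID:191": ["39228/DOID:0000818"],  # assumed path to cell proliferation disease
}
MAPPINGS = [("NCI/C0025201", "FMA/C0025201")]  # Melanocyte: NCI Thesaurus <-> FMA

print(is_a_closure("39228/DOID:1909", PARENTS))
print(mapping_expansion("NCI/C0025201", MAPPINGS))
```

Recording the distance lets a scoring step treat a direct parent differently from a more distant ancestor.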
SLIDE 9

An example

• “Melanoma is a malignant tumor of melanocytes which are found predominantly in skin but also in the bowel and the eye.”
  • NCI/C0025201, Melanocyte in NCI Thesaurus {score: 10}
  • NCI/C0025202, Melanoma in NCI Thesaurus {score: 10}
  • 39228/DOID:1909, Melanoma in Human Disease {score: 10}
  • NCI/C0027651, Neoplasm (synonym of Tumor) in NCI Thesaurus {score: 8}

• Is_a closure expansion
  • 39228/DOID:191, Melanocytic neoplasm, direct parent of Melanoma in Human Disease {score: 8}
  • 39228/DOID:0000818, cell proliferation disease, grandparent of Melanoma in Human Disease {score: 8}
  • NCI/C0027651, Neoplasm in NCI Thesaurus, great-grandparent of Melanoma in NCI Thesaurus {score: 7}

• Mapping expansion
  • FMA/C0025201, Melanocyte in Foundational Model of Anatomy, concept mapped to NCI/C0025201 in UMLS {score: 7}

SLIDE 10

Ontology scoring method

• Each annotation computed by the NCBO Annotator has a score that depends on the context of the annotation (e.g., direct, expanded, etc.)

• Ontologies are sorted by the sum of the scores of the annotations they have generated

• In the previous example, the score of NCI Thesaurus is:
  NCI/C0025201 {score: 10} + NCI/C0025202 {score: 10} + NCI/C0027651 {score: 8} + NCI/C0027651 {score: 7} = 35
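The summation can be reproduced from the eight annotations listed in the melanoma example. In this sketch, the ontology is taken to be the prefix of each concept ID before the slash, which is an assumption about the ID format (39228 is the prefix used for Human Disease in the example); `rank_ontologies` is a hypothetical name.

```python
from collections import defaultdict

# (concept ID, score) pairs from the melanoma example, direct and expanded.
EXAMPLE = [
    ("NCI/C0025201", 10), ("NCI/C0025202", 10), ("39228/DOID:1909", 10),
    ("NCI/C0027651", 8),                               # direct (synonym "Tumor")
    ("39228/DOID:191", 8), ("39228/DOID:0000818", 8),  # is_a closure
    ("NCI/C0027651", 7),                               # is_a closure
    ("FMA/C0025201", 7),                               # mapping
]


def rank_ontologies(annotations):
    """Sum annotation scores per ontology and sort by decreasing total."""
    totals = defaultdict(int)
    for concept_id, score in annotations:
        ontology = concept_id.split("/", 1)[0]
        totals[ontology] += score
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


print(rank_ontologies(EXAMPLE))  # [('NCI', 35), ('39228', 26), ('FMA', 7)]
```

Because NCI/C0027651 is reached both directly and through the is_a closure, it contributes twice, as in the sum above.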


SLIDE 11


SLIDE 12


SLIDE 13

Recommendation & datasets

• We have compared results for 3 types of datasets about “Melanoma”
  • PubMed article citations
  • ClinicalTrials.gov trials
  • Gene Expression Omnibus (GEO) datasets

• Recommendation is data dependent
  • Different types of data will require different ontologies for annotation


SLIDE 14

Ontologies identified for each resource


Figure: ontologies identified for ClinicalTrials.gov, PubMed, and GEO (NCI Thesaurus, Human Disease, Galen, Experimental Factor, Human Phenotype, RadLex, Mouse Pathology, Phenotypic Quality, FMA, Xenopus Anatomy, Mouse Adult Gross Anatomy, NCI Anatomy, Medaka Fish Anatomy, Human Developmental Anatomy, Zebrafish Anatomy, Mosquito Gross Anatomy)

SLIDE 15

Results and analysis

• High scores for big ontologies

• Key ontologies identified
  • Regardless of the dataset, some ontologies are always present
  • Some ontologies appear only with a specific type of data
  • Hence the importance of an appropriate recommendation

• The score does not grow linearly with the number of annotations
  • Hence the importance of the scoring method and of the annotation context weights


SLIDE 16

Future work

• Enhance the backend annotation workflow
  • Concept recognition
  • Semantic expansion

• Different scoring methods that will support different kinds of recommendation scenarios
  • Use the size of the ontologies to normalize the score
  • Prefer “key” ontologies over other ontologies

• Parameterized scoring methods
  • Customization of weights for specific contexts
  • Predefined domain-specific preferences
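A parameterized scoring method along these lines might look like the sketch below. Everything here is hypothetical: the slide names the goals (size normalization, preference for key ontologies, customizable context weights) but not a formula, so dividing by log(1 + size) and multiplying by a `key_bonus` are assumptions chosen only for illustration, and the ontologies "Big" and "Small" and their sizes are toy data.

```python
import math


def parameterized_scores(annotations, context_weights, ontology_sizes,
                         key_ontologies=frozenset(), key_bonus=1.0):
    """Hypothetical scoring: weighted sum per ontology, normalized by ontology size.

    annotations: iterable of (ontology, context) pairs.
    context_weights: customizable weight for each annotation context.
    ontology_sizes: number of concepts per ontology (must be > 0); dividing by
        log(1 + size) is an assumed normalization that dampens big ontologies.
    key_ontologies / key_bonus: optional multiplier for preferred "key" ontologies.
    """
    totals = {}
    for ontology, context in annotations:
        totals[ontology] = totals.get(ontology, 0.0) + context_weights[context]
    scores = {}
    for ontology, total in totals.items():
        score = total / math.log1p(ontology_sizes[ontology])
        if ontology in key_ontologies:
            score *= key_bonus
        scores[ontology] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: ontology names and sizes are hypothetical.
ANNOTATIONS = [("Big", "direct")] * 3 + [("Small", "direct")] * 2
WEIGHTS = {"direct": 10, "is_a": 8, "mapping": 7}
SIZES = {"Big": 1_000_000, "Small": 1_000}

print(parameterized_scores(ANNOTATIONS, WEIGHTS, SIZES)[0][0])
print(parameterized_scores(ANNOTATIONS, WEIGHTS, SIZES, {"Big"}, key_bonus=2.0)[0][0])
```

With a plain sum, "Big" would rank first (30 vs 20); size normalization lets the smaller ontology win, and marking "Big" as a key ontology restores its lead.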


SLIDE 17

Conclusion

• Enabling data integration and translational discoveries requires scalable data annotation using ontologies

• It may be hard for a scientist to know which ontology to (re)use in an annotation task

• We prototyped an ontology recommender service which, given sample textual metadata, recommends appropriate ontologies to use

• Please try it and join us!


SLIDE 18

Thank you

http://obs.bioontology.org
http://www.bioontology.org


SLIDE 19

Why use ontologies?


SLIDE 20

Biomedical resources indexed with ontologies

  • We have used the annotation workflow to index several important biomedical resources with ontology concepts
  • The index can be used to enhance search and data integration
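One way to picture such an index is an inverted index from concept IDs to annotated resource elements. This is a hypothetical sketch, not the OBR data model: `build_index` and `search` are illustrative names, and apart from PMID 19550360 (from the annotation example on slide 3) the element IDs are placeholders.

```python
from collections import defaultdict


def build_index(annotated_elements):
    """Inverted index: concept ID -> list of (resource, element ID, context)."""
    index = defaultdict(list)
    for resource, element_id, concept_id, context in annotated_elements:
        index[concept_id].append((resource, element_id, context))
    return index


def search(index, concept_ids):
    """Return elements annotated with any of the concepts, deduplicated, first-seen order."""
    seen, results = set(), []
    for concept_id in concept_ids:
        for resource, element_id, _context in index.get(concept_id, []):
            if (resource, element_id) not in seen:
                seen.add((resource, element_id))
                results.append((resource, element_id))
    return results


ELEMENTS = [
    ("PubMed", "19550360", "NCI/C0025202", "direct"),
    ("GEO", "example-dataset-1", "39228/DOID:1909", "direct"),      # placeholder ID
    ("PubMed", "19550360", "39228/DOID:1909", "direct"),
    ("ClinicalTrials.gov", "example-trial-1", "39228/DOID:191", "is_a"),  # placeholder ID
]
INDEX = build_index(ELEMENTS)

print(search(INDEX, ["NCI/C0025202", "39228/DOID:1909"]))
```

Storing the annotation context alongside each hit keeps the option of ranking direct matches above expanded ones when results are returned.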


SLIDE 21

Resources tab in BioPortal

Screenshot callouts: ontology concept browsed; example of resource available; number of annotations in the OBR index; ID of an element; annotation context; link to the original element
SLIDE 22

Good use of the semantics (1/2)


SLIDE 23

Good use of the semantics (2/2)
