Prototyping a Biomedical Ontology Recommender Service
Clement Jonquet Nigam H. Shah Mark A. Musen
jonquet@stanford.edu
Hard for biomedical researchers to find the data they need
Data integration problem
Translational discoveries are prevented
Annotating data with biomedical ontologies is a solution
Annotations describe data with ontology concepts
Semantic annotations (GO annotations, MeSH in PubMed)
Ontologies play a common denominator role
For example:
A researcher wants to integrate data from different gene expression dataset repositories
A curator wants to triage articles for better information retrieval
2 BioOntologies 2009 ‐ Prototyping a Biomedical Recommender Service ‐ June 28th, 2009
Annotation of PMID 19550360 with Human: 5 matches
Annotation of PMID 19550360 with FMA: 1 match
Annotation of PMID 19550360 with MeSH: 17 matches
Annotation of PMID 19550360 with SNOMED‐CT: 33 matches
Large number of ontologies
Different versions, platforms, formats, etc.
Which ontology is relevant? Accurate?
What’s the risk of a bad choice?
Miss a relevant ontology
Miss possible reuse and start a new ontology
Miss connection/integration with other data that use the right
Given representative textual metadata (text description, keywords, etc.)
Use a method based on semantic annotations generated by the NCBO Annotator
Use 206 ontologies from UMLS & NCBO BioPortal
Recommend and score the appropriate ontologies to annotate the data
Extract annotations from text by concept recognition
Expand annotations using the knowledge represented in
Score annotations according to their context and return them to the user
Use NCIBI Mgrep, a
High degree of accuracy
Fast, scalable, domain independent
Use a dictionary: a list of strings that identifies
206 ontologies, ~3.5M concepts & ~7M terms
similarity algorithms based on the is_a graph (ongoing work)
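To make the dictionary idea concrete, here is a minimal sketch of dictionary-based concept recognition. It is not Mgrep and does not reproduce its algorithm; it is a naive regular-expression matcher used as a stand-in. The three-entry dictionary, the `Match` type, the `recognize` function, and the optional trailing "s" for plurals are all assumptions made for illustration. The concept identifiers are taken from the melanoma example in these slides.

```python
import re
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    concept_id: str
    term: str
    start: int


# Hypothetical mini-dictionary: term string -> concept identifier.
# A real dictionary would hold millions of terms.
DICTIONARY: Dict[str, str] = {
    "melanoma": "NCI/C0025202",
    "melanocyte": "NCI/C0025201",
    "tumor": "NCI/C0027651",  # treated here as a synonym of Neoplasm
}

TEXT = (
    "Melanoma is a malignant tumor of melanocytes which are found "
    "predominantly in skin but also in the bowel and the eye"
)


def recognize(text: str, dictionary: Dict[str, str]) -> List[Match]:
    """Return every dictionary term found in text, ordered by position."""
    matches = []
    for term, concept_id in dictionary.items():
        # Whole-word, case-insensitive match; the optional "s" is a crude
        # plural rule assumed for this sketch only.
        pattern = re.compile(r"\b" + re.escape(term) + r"s?\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            matches.append(Match(concept_id, term, m.start()))
    return sorted(matches, key=lambda match: match.start)


if __name__ == "__main__":
    for match in recognize(TEXT, DICTIONARY):
        print(match.concept_id, match.term, match.start)
```

Looping over every term is fine for a toy dictionary but would not scale to millions of terms, which is one reason a dedicated tool is used in practice.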
“Melanoma is a malignant tumor of melanocytes which
are found predominantly in skin but also in the bowel and the eye”.
NCI/C0025201, Melanocyte in NCI Thesaurus {score: 10}
NCI/C0025202, Melanoma in NCI Thesaurus {score: 10}
39228/DOID:1909, Melanoma in Human Disease {score: 10}
NCI/C0027651, Neoplasm (synonym of Tumor) in NCI Thesaurus {score: 8}
Is_a closure expansion
39228/DOID:191, Melanocytic neoplasm, direct parent of Melanoma in Human Disease {score: 8}
39228/DOID:0000818, cell proliferation disease, grand parent of Melanoma in Human Disease {score: 8}
NCI/C0027651, Neoplasm in NCI Thesaurus, grand‐grand parent of Melanoma in NCI Thesaurus {score: 7}
Mapping expansion
FMA/C0025201, Melanocyte in Foundational Model of Anatomy, concept mapped to NCI/C0025201 in UMLS {score: 7}
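The is_a closure expansion in this example can be sketched as a walk up the parent links of a directly matched concept. This is an illustration only: the `IS_A` table holds just the two Human Disease links named in the example, it assumes a single parent per concept (real ontologies may have several), and the `LEVEL_SCORE` table simply copies the scores shown on this slide, reading "grand‐grand parent" as level 3. It is not a general weighting formula, and `expand_is_a` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Toy is_a graph (child -> parent) limited to the links in the example.
IS_A: Dict[str, str] = {
    "DOID:1909": "DOID:191",      # Melanoma -> Melanocytic neoplasm
    "DOID:191": "DOID:0000818",   # Melanocytic neoplasm -> cell proliferation disease
}

# Scores per ancestor level as they appear in the example (assumption:
# level 1 = parent, level 2 = grand parent, level 3 = grand-grand parent).
LEVEL_SCORE: Dict[int, int] = {1: 8, 2: 8, 3: 7}


def expand_is_a(
    concept: str, is_a: Dict[str, str], level_score: Dict[int, int]
) -> List[Tuple[str, int, int]]:
    """Return (ancestor, level, score) for each ancestor within the scored levels."""
    expanded = []
    level = 1
    parent = is_a.get(concept)
    # Stop at the root or once we pass the deepest level that has a score.
    while parent is not None and level in level_score:
        expanded.append((parent, level, level_score[level]))
        parent = is_a.get(parent)
        level += 1
    return expanded


if __name__ == "__main__":
    for ancestor, level, score in expand_is_a("DOID:1909", IS_A, LEVEL_SCORE):
        print(ancestor, level, score)
```

Bounding the walk by the levels present in `LEVEL_SCORE` also keeps the loop finite if the toy graph ever contained a cycle.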
Each annotation computed by the NCBO
Ontologies are sorted by the sum of the scores of
In the previous example:
NCI/C0025201 {score: 10} + NCI/C0025202 {score: 10} + NCI/C0027651 {score: 8} + NCI/C0027651 {score: 7} = 35
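The sum-of-scores ranking can be sketched in a few lines. The annotation tuples below are the eight annotations listed in the melanoma example; the tuple layout and the `rank_ontologies` name are assumptions made for this illustration, not the service's actual data model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (ontology, concept identifier, annotation score) from the melanoma example.
ANNOTATIONS: List[Tuple[str, str, int]] = [
    ("NCI Thesaurus", "NCI/C0025201", 10),
    ("NCI Thesaurus", "NCI/C0025202", 10),
    ("Human Disease", "39228/DOID:1909", 10),
    ("NCI Thesaurus", "NCI/C0027651", 8),
    ("Human Disease", "39228/DOID:191", 8),
    ("Human Disease", "39228/DOID:0000818", 8),
    ("NCI Thesaurus", "NCI/C0027651", 7),
    ("Foundational Model of Anatomy", "FMA/C0025201", 7),
]


def rank_ontologies(annotations: List[Tuple[str, str, int]]) -> List[Tuple[str, int]]:
    """Sum annotation scores per ontology and sort from highest to lowest."""
    totals: Dict[str, int] = defaultdict(int)
    for ontology, _concept, score in annotations:
        totals[ontology] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for ontology, total in rank_ontologies(ANNOTATIONS):
        print(ontology, total)
```

With these inputs NCI Thesaurus totals 35, matching the sum on the slide, followed by Human Disease with 26 and the Foundational Model of Anatomy with 7.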
We have compared results for 3 types of datasets
PubMed article citations
Clinicaltrials.gov trials
Gene Expression Omnibus datasets
Recommendation is data dependent
Different types of data will require different types of
[Figure: ontologies recommended for the ClinicalTrials.gov, PubMed, and GEO datasets. Ontologies shown: NCI Thesaurus, Human disease, Galen, Experimental Factor, Human phenotype, RadLex, Mouse pathology, Phenotypic quality, FMA, Xenopus anatomy, Mouse adult gross anatomy, NCI anatomy, Medaka fish anatomy, Human developmental anatomy, Zebrafish anatomy, Mosquito gross anatomy]
High score for big ontologies
Key ontologies identified
Regardless of the dataset, some ontologies are always present
Some ontologies appear only with a specific type of data
Importance of appropriate recommendation
Score does not follow linearly the number of
Importance of scoring as well as the annotation context weights
Enhance backend annotation workflow
Concept recognition
Semantic expansion
Different scoring methods that will support
Use the size of the ontologies to normalize the score
Prefer “key” ontologies over other ontologies
Parameterized scoring methods
Customization of weights for specific contexts
Predefined domain‐specific preferences
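As one possible reading of "use the size of the ontologies to normalize the score", the sketch below divides each raw score by the base-10 logarithm of the ontology's concept count. The slides do not specify a formula, so the choice of logarithm, the `normalize_by_size` name, and the ontology names and sizes in the usage example are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def normalize_by_size(
    raw_scores: Dict[str, float], sizes: Dict[str, int]
) -> List[Tuple[str, float]]:
    """Divide each raw score by log10(ontology size) and re-rank.

    Assumption: size is the number of concepts; it must be at least 2
    so that the logarithm is positive.
    """
    normalized = []
    for ontology, score in raw_scores.items():
        size = sizes[ontology]
        if size < 2:
            raise ValueError(f"ontology size must be >= 2, got {size}")
        normalized.append((ontology, score / math.log10(size)))
    return sorted(normalized, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical ontologies and sizes, not measurements.
    raw = {"BigOntology": 30.0, "SmallOntology": 20.0}
    sizes = {"BigOntology": 1_000_000, "SmallOntology": 100}
    for ontology, value in normalize_by_size(raw, sizes):
        print(ontology, round(value, 2))
```

In this made-up case the smaller ontology ranks first after normalization even though its raw score is lower, which is the effect a size correction is meant to have on the "high score for big ontologies" observation.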
Enabling data integration and translational discoveries requires scalable data annotation using
It may be hard for a scientist to know which ontology to (re)use in an annotation task
We prototyped an ontology recommender service, which, given sample textual metadata, will recommend appropriate ontologies to use
Please try it and join us!
important biomedical resources with ontology concepts
[Screenshot callouts: Ontology concept browsed; Example of resource available; Number of annotations in the OBR index; ID of an element; Annotation context; Link to the]