Content-based Ontology Ranking
9th Intl. Protégé Conference - July 23-26, 2006 - Stanford, California
Mathew Jones & Harith Alani
Ontology Ranking
– Is crucial for ontology search and reuse!
– Especially when there is a large number of them available online
Ranking can be based on many criteria:
– Philosophical soundness (e.g. OntoClean)
– General properties such as metadata, documentation (e.g. OntoMetric)
– User ratings
– Authority of source
– Popularity (e.g. Swoogle)
– Coverage
– Consistency
– Accuracy
– Fit for purpose
– …
Swoogle ranks ontologies using a variation of PageRank
– The more links an ontology receives from other ontologies, the higher its rank
This is sometimes insufficient
– Many ontologies are not connected to others
– Ontology popularity gives no guarantees on the quality of specific concepts' representation
– There is a need to extend this ranking to take other criteria into account
Ontologies are searched by concept names
– Searching for Education will also find partial matches, e.g. education engineer
– Same as when searching with Swoogle
– Not hard-wired into any specific ontology search tool
Ontologies are ranked with respect to specific characteristics:
– Class Match Measure
– Density Measure
– Semantic Similarity Measure
– Betweenness Measure
– The measure values are aggregated into a total score, taking into account their weight factors (see the sketch below)
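A sketch of the aggregation, assuming each measure is normalised and the weights w_i are tunable (the exact normalisation is not given on the slides):

```latex
\mathrm{Score}(O) \;=\; \sum_{i=1}^{4} w_i\, M_i(O),
\qquad M \in \{\mathrm{CMM},\ \mathrm{DEM},\ \mathrm{SSM},\ \mathrm{BEM}\}
```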
[Figure: Class Match Measure – O1 contains an exact match for the query term, O2 only partial matches, so CMM(O1) > CMM(O2)]
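A minimal sketch of how such a class-match score could be computed, assuming exact label matches outweigh partial (substring) matches; the 0.6/0.4 weights and all names here are illustrative, not the authors' settings:

```python
def cmm(class_labels, query_terms, w_exact=0.6, w_partial=0.4):
    """Class Match Measure sketch: exact label matches count more
    than partial (substring) matches. Weights are illustrative."""
    exact = sum(1 for l in class_labels for t in query_terms
                if l.lower() == t.lower())
    partial = sum(1 for l in class_labels for t in query_terms
                  if t.lower() in l.lower() and l.lower() != t.lower())
    return w_exact * exact + w_partial * partial

# O1 matches "student" exactly, O2 only partially -> CMM(O1) > CMM(O2)
print(cmm(["Student", "Course"], ["student"]))     # 0.6
print(cmm(["PhDStudent", "Course"], ["student"]))  # 0.4
```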
[Figure: Density Measure example – DEM(O2) > DEM(O1)]
[Figure: Semantic Similarity Measure – the matched classes are 5 links apart in univ.owl (O1) but only 1 link apart in aargh.owl (O2), so SSM(O2) > SSM(O1)]
[Figure: Betweenness Measure in univ.owl – BEM(University) = 0.0, BEM(Student) = 0.004, BEM(Organization) = 0.02]
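Betweenness values like these could be reproduced with a standard betweenness-centrality computation, e.g. via NetworkX; the class graph below is a made-up fragment, so the numbers illustrate only the ordering, not the slide's exact values:

```python
import networkx as nx

# Hypothetical fragment of univ.owl's class graph
g = nx.Graph([
    ("Thing", "Organization"), ("Organization", "University"),
    ("Thing", "Person"), ("Person", "Student"),
    ("Organization", "Department"), ("Department", "Student"),
])

# Classes lying on many shortest paths score higher:
# Organization > Student > University (a leaf class scores 0)
bem = nx.betweenness_centrality(g)
for cls in ("University", "Student", "Organization"):
    print(cls, round(bem[cls], 3))
```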
Pos. | Ontology URL
a | http://www.csd.abdn.ac.uk/~cmckenzi/playpen/rdf/akt_ontology_LITE.owl
b | http://protege.stanford.edu/plugins/owl/owl-library/koala.owl
c | http://protege.stanford.edu/plugins/owl/owl-library/ka.owl
d | http://reliant.teknowledge.com/DAML/Mid-level-ontology.owl
e | http://www.mindswap.org/2004/SSSW04/aktive-portal-ontology-latest.owl
f | http://www.mondeca.com/owl/moses/univ2.owl
g | http://www.mondeca.com/owl/moses/univ.owl
h | http://www.lri.jur.uva.nl/~rinke/aargh.owl
i | http://www.mondeca.com/owl/moses/ita.owl
j | http://triplestore.aktors.org/data/portal.owl
k | http://annotation.semanticweb.org/ontologies/iswc.owl
l | http://ontoware.org/frs/download.php/18/semiport.owl
[Chart: Measure Values – CMM, DEM, SSM and BEM scores (0.000–3.000) for ontologies a–l]
– Get a query from the user (e.g. Cancer)
– Expand the query with WordNet
– Retrieve a corpus from the Web that covers this domain
– Analyse the corpus to get a set of terms that strongly relate to this domain
– Get a list of potentially relevant ontologies from Google (or Swoogle)
– Calculate the frequency with which those terms appear in the ontology (in concept labels and comments)
– First rank is awarded to the ontology with the best coverage of the "domain terms" (sketched below)
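A rough, runnable sketch of this pipeline, with the Google and WordNet steps replaced by toy in-memory stand-ins (all names and data here are hypothetical):

```python
from collections import Counter
import re

def domain_terms(corpus_docs, top_n=20):
    """Stand-in for the corpus-analysis step: take the most frequent
    words in the retrieved documents as the 'domain terms'
    (the talk uses tf-idf; plain frequency keeps the sketch short)."""
    words = re.findall(r"[a-z]+", " ".join(corpus_docs).lower())
    return [w for w, _ in Counter(words).most_common(top_n)]

def rank_ontologies(ontologies, terms):
    """Score each candidate by how often the domain terms appear in
    its concept labels and comments, then rank by coverage."""
    def coverage(o):
        text = " ".join(o["labels"] + o["comments"]).lower()
        return sum(text.count(t) for t in terms)
    return sorted(ontologies, key=coverage, reverse=True)

# Toy stand-ins for the Google/WordNet steps; all names hypothetical
corpus = ["cancer tumour oncology treatment", "tumour cell cancer therapy"]
candidates = [
    {"url": "onto1.owl", "labels": ["Cancer", "Tumour"], "comments": ["oncology"]},
    {"url": "onto2.owl", "labels": ["Vehicle"], "comments": ["transport"]},
]
ranked = rank_ontologies(candidates, domain_terms(corpus))
print([o["url"] for o in ranked])  # ['onto1.owl', 'onto2.owl']
```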
– The expanded query is sent to Google to retrieve documents about the domain the user is looking for (more on this later)
– Those documents are downloaded and treated as a domain corpus
– Terms are weighted using standard measures, such as tf-idf (term frequency – inverse document frequency; see the formula below)
– Ontologies that contain those terms are given higher ranks than others
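For reference, the usual tf-idf weight of a term t in document d of corpus D (the slides name the measure but not a specific variant):

```latex
\mathrm{tfidf}(t,d,D) \;=\; \mathrm{tf}(t,d)\cdot\log\frac{|D|}{|\{d' \in D : t \in d'\}|}
```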
[Figure: retrieved corpus results (a) using basic Google search, (b) using WordNet-expanded Google search]
– Each ontology will be scored based on how well it covers the given terms
– So each word is given an importance value
– This needs to be considered when assessing the ontologies
– E.g. an ontology with concepts whose labels match the top ten tf-idf words would outrank an ontology matching only the second ten words (illustrated in the sketch below)
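One way to realise this, as a sketch: sum the tf-idf weights of the terms an ontology's labels cover, so high-importance terms dominate (the weights and labels below are hypothetical):

```python
def weighted_coverage(ontology_labels, term_weights):
    """Score an ontology by the summed tf-idf weights of the domain
    terms its labels cover (illustrative, not the talk's formula)."""
    labels = {l.lower() for l in ontology_labels}
    return sum(w for term, w in term_weights.items() if term in labels)

# 'cancer' carries more tf-idf weight than 'ward'
weights = {"cancer": 1.0, "tumour": 0.8, "clinic": 0.25, "ward": 0.05}
print(round(weighted_coverage(["Cancer", "Tumour"], weights), 2))  # 1.8
print(round(weighted_coverage(["Clinic", "Ward"], weights), 2))    # 0.3

# 1.8 > 0.3: matching the top terms outranks matching lesser ones
```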
– Class Match Score (CMS): to match with concept labels
– Literal Match Score (LMS): to match with comments and other text
– α and β are weights to control the two scoring formulas
– E.g. 1 for a full match, 0.4 for a partial match, 0 for no match (combined as sketched below)
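Reconstructing from the slides, the combined score presumably takes the form below, where each matched term contributes its match value (1 full, 0.4 partial, 0 none) to the respective score; this is a sketch, not the paper's exact notation:

```latex
\mathrm{Score}(O) \;=\; \alpha\cdot \mathrm{CMS}(O) \;+\; \beta\cdot \mathrm{LMS}(O)
```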
– This helps to find out which settings produce the best results
ID | URL
1 | http://semweb.mcdonaldbradley.com/OWL/Cyc/FreeToGov/060704/FreeToGovCyc.owl
2 | http://www.inf.fu-berlin.de/inst/agnbi/research/swpatho/owldata/swpatho1/swpatho1.owl
3 | http://www.mindswap.org/2003/CancerOntology/nciOncology.owl
4 | http://sweet.jpl.nasa.gov/ontology/data_center.owl
5 | http://compbio.uchsc.edu/Hunter_lab/McGoldrick/DataFed_OWL.owl
6 | http://www.cs.umbc.edu/~aks1/ontosem.owl
7 | http://homepages.cs.ncl.ac.uk/phillip.lord/download/knowledge/ontologyontology.owl
8 | http://www.daml.org/2004/05/unspsc/unspsc.owl
9 | http://envgen.nox.ac.uk/miame/MGEDOntology_env_final.owl
10 | http://www.fruitfly.org/~cjm/obo-download/obo-all/mesh/mesh.owl
Experiment | Exact Match | Partial Match
a | 1 | 0.4
b | 1 | 0
c | 1 | 1
Experiment | Class Match | Text Match
a | 1 | 0.25
b | 1 | 0
c | 1 | 1
– α is the weight for class labels
– β is the weight for comments
– Taking comments into account helped when given a low weight (0.25), but not much more than that!
– They contained many of the terms found in the corpus, but with minimum detail
– Overall focus of the ontologies was not on the chosen domain
– Perhaps an ontology should be penalised if it has many terms that are definitely not related to the domain
– Adding extra tests, such as density and betweenness, might also help to filter out such ontologies
– No statistical significance can be claimed
– Difficult for people to assess an ontology
– Some domains might not be well covered in Wikipedia
– Of course, finding a good corpus on the Web cannot be guaranteed either
– But WordNet might not cover the given term
– Cost of an additional layer of user interaction