Auditing Redundant Import in Reuse of a Top Level Ontology for the - - PowerPoint PPT Presentation



SLIDE 1

Auditing Redundant Import in Reuse of a Top Level Ontology for the Drug Discovery Investigations Ontology (DDI)

Zhe He (1), Christopher Ochs (1), Larisa Soldatova (2), Yehoshua Perl (1), Sivaram Arabandi (3), James Geller (1)

(1) New Jersey Institute of Technology, (2) Brunel University, (3) Ontopro LLC

ICBO 2013 Workshop on Vaccine and Drug Ontology Studies

SLIDE 2

Outline

  • Introduction
    – Environment
    – Motivation
    – Ontology for Drug Discovery Investigations (DDI)
    – Abstraction Networks & Partial Area Taxonomy
  • Algorithm Hide
    – Hiding redundant BFO (Basic Formal Ontology) classes from DDI
  • Future work
  • Conclusions

SLIDE 3

Environment

  • BioPortal: a large repository of over 340 biomedical ontologies covering a wide range of domains.
  • Many ontologies in BioPortal are released in OWL or OBO format.
  • OWL (Web Ontology Language): based on Description Logic, maintained by a working group of W3C.
  • OBO (Open Biological and Biomedical Ontologies) Foundry: a collaborative experiment involving developers of ontologies who are establishing a set of principles for ontology development.

SLIDE 4

Motivation

  • Using a top-level ontology as a template for a domain ontology is recommended.
  • OBO Foundry recommends importing BFO (Basic Formal Ontology).
  • The top-domain ontologies OGMS (Ontology for General Medical Science) and BioTop (Beisswanger et al. 2008) reuse BFO.
  • Some domain ontologies reuse OGMS, thereby indirectly reusing BFO.

SLIDE 5

Motivation (cont.)

  • Ontologies need to go through Quality Assurance before being put to use.
    – Discovering modeling errors and inconsistencies in the design
    – Unused imported top-level classes diminish the usability of the ontology.
    – Currently, there is no mechanism to remove unused imported classes.
    – Redundant imported top-level classes should be hidden.

SLIDE 6

Ontology for Drug Discovery Investigations

  • DDI was developed to support automatic drug discovery investigations run by a Robot Scientist “Eve” (Qi et al. 2010).
  • DDI is used for reasoning with data about the biological activity of compounds with regard to various drug targets.
  • DDI uses BFO (Basic Formal Ontology) and RO (Relations Ontology) as design templates and extends BFO and OBI (Ontology for Biomedical Investigations).
  • Some imported BFO classes were left unused in DDI, for example:
    – connected_temporal_region
    – temporal_instant
    – temporal_interval

SLIDE 7

Abstraction Networks

  • An abstraction network is a secondary network that provides a compact view of the structure and content of the primary ontology.
  • Abstraction of an ontology is the process by which subsets of classes are each replaced by a higher-level conceptual entity (node).

Figure labels: Ontology; Abstraction Network; Subset of classes modeled by a node

SLIDE 8

Partial Area Taxonomy

  • Partial area taxonomy is an abstraction network developed by our research group that summarizes sets of structurally and semantically similar classes.
  • Partial area taxonomies have been derived for
    – SNOMED CT (Wang et al. 2007)
    – Ontology of Clinical Research (OCRe) (Ochs et al. 2012)
    – Sleep Domain Ontology (SDO) (Ochs et al. 2013)
    – Cancer Chemoprevention Ontology (CanCo) (He et al. 2013)
    – etc.

SLIDE 9

Area Taxonomy

Area: Set of all classes that are explicitly defined or inferred as being in exactly the domain of a given set of object properties.
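
The area definition can be illustrated with a minimal sketch: classes are grouped by the exact set of object properties whose domain they belong to. The class and property names below are hypothetical toy data, not taken from DDI, and the input is assumed to already contain inferred as well as explicit domain memberships.

```python
from collections import defaultdict


def group_into_areas(class_properties):
    """Group classes by the exact set of object properties they are in the domain of."""
    areas = defaultdict(set)
    for cls, props in class_properties.items():
        # frozenset makes the property set hashable so it can key an area.
        areas[frozenset(props)].add(cls)
    return dict(areas)


# Hypothetical toy input: class name -> object properties whose domain
# (explicitly or by inference) contains the class.
toy = {
    "entity": set(),
    "continuant": set(),
    "compound": {"has_target"},
    "drug": {"has_target"},
    "assay": {"has_target", "has_output"},
}

areas = group_into_areas(toy)
for props, members in sorted(areas.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(props), sorted(members))
```

In this toy input the classes with no object properties form one area, which corresponds to the root area that the later slides focus on.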

SLIDE 10

Partial Area Taxonomy

Root: Class with no superclasses in area

Partial area: Root + all descendants in area
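
A minimal sketch of how one area might be split into partial areas under these two definitions. The dictionary representation and the class names are assumptions made for illustration only.

```python
def partial_areas(area, parents):
    """Split one area into partial areas: each root plus its descendants in the area.

    area    -- set of class names belonging to a single area
    parents -- dict mapping class name -> set of direct superclasses
    """
    # A root is a class none of whose superclasses lie in the same area.
    roots = {c for c in area if not (parents.get(c, set()) & area)}

    # Invert the parent relation, restricted to classes inside the area.
    children = {c: set() for c in area}
    for c in area:
        for p in parents.get(c, set()) & area:
            children[p].add(c)

    result = {}
    for root in roots:
        members, stack = set(), [root]
        while stack:  # iterative depth-first walk from the root
            node = stack.pop()
            if node not in members:
                members.add(node)
                stack.extend(children[node])
        result[root] = members
    return result


# Hypothetical toy area: "entity" lies outside it, so "compound" and
# "target" have no superclass inside the area and are therefore roots.
toy_area = {"compound", "drug", "inhibitor", "target"}
toy_parents = {
    "compound": {"entity"},
    "drug": {"compound"},
    "inhibitor": {"drug"},
    "target": {"entity"},
}
toy_partial_areas = partial_areas(toy_area, toy_parents)
```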

SLIDE 11

Algorithm Hide

  • Hide is a post-order recursive algorithm requiring linear time.
  • Hide identifies imported classes that are not used in the domain ontology.
  • Applicability:
    – Ontologies in OWL or OBO format
    – Both domain ontology and top-level ontology are trees.
    – Top-level ontology does not have object properties.
  • A class is redundant if it is:
    – Imported from the top-level ontology AND
    – In the root partial area of the taxonomy AND
    – A leaf in the domain ontology (at some stage of the algorithm) AND
    – Not used as range of an object property

SLIDE 12

Partial Area Taxonomy for DDI

SLIDE 13

Entity Node of DDI Taxonomy

  • The Entity root partial area of the DDI taxonomy contains 81 classes.
  • BFO has 38 classes.
  • 32 of the 81 classes are imported from BFO.
  • The other 6 BFO classes (38 − 32) are used as domains of object properties.
  • Hence, we reviewed 32 classes for redundancy.

SLIDE 14

BFO Classes in Entity Node Before Hiding

Entity (2 children)
  continuant (3 children)
    dependent_continuant (2 children)
    independent_continuant (3 children)
      material_entity (10 children)
        fiat_object_part
        object
        object_aggregate
      object_boundary
      site (3 children)
    spatial_region (4 children)
      one_dimensional_region
      two_dimensional_region
      three_dimensional_region
      zero_dimensional_region
  occurrent (3 children)
    processual_entity (6 children)
      fiat_process_part
      process (2 children)
      process_aggregate
      process_boundary
      processual_context
    spatiotemporal_region (2 children)
      connected_spatiotemporal_region (2 children)
        spatiotemporal_instant
        spatiotemporal_interval
      scattered_spatiotemporal_region
    temporal_region (2 children)
      connected_temporal_region (2 children)
        temporal_instant
        temporal_interval
      scattered_temporal_region

Legend (color coding not preserved): Leaf; Parent of classes that are all leaves; Grandparent of grandchildren that are all leaves

SLIDE 15

BFO Classes in Entity Partial Area After Hiding

  • 18 unused BFO classes are hidden.
  • That is, 18/32 = 56% of the BFO classes in the Entity partial area are hidden.

Entity (2 children)
  continuant (3 children)
    dependent_continuant (2 children)
    independent_continuant (3 children)
      material_entity (10 children)
      site (3 children)
    spatial_region (4 children)
      one_dimensional_region
      two_dimensional_region
      three_dimensional_region
      zero_dimensional_region
  occurrent (3 children)
    processual_entity (6 children)
      process (2 children)

SLIDE 16

Future Work

  • As many as 35 out of 186 ontologies we investigated in BioPortal reuse BFO classes.
  • Some ontologies have a Directed Acyclic Graph (DAG) hierarchy, e.g. SDO (Sleep Domain Ontology) (Arabandi 2010).
  • Need to consider cases where both top-level and domain ontologies are DAG hierarchies.
  • Some top-domain ontologies have object properties, e.g. BioTop.
  • Need to design an algorithm to deal with issues regarding redundant import of relationships in the reuse of top-domain ontologies.

SLIDE 17

Conclusions

  • We described a recursive linear-time algorithm for hiding unused imported top-level ontology classes of an OWL-based ontology.
  • The algorithm was demonstrated by hiding 18 of 32 (56%) imported BFO classes from the DDI.
  • Hiding of unused imported top-level classes should be part of the Quality Assurance process of OWL-based ontologies.

SLIDE 18

References

  • Qi, D., R. D. King, et al. (2010). "An ontology for description of drug discovery investigations." J Integr Bioinform 7(3).
  • Arabandi, S. (2010). "Developing a Sleep Domain Ontology." AMIA TBI/CRI Summit. San Francisco, CA.
  • Beisswanger, E., S. Schulz, et al. (2008). "BioTop: An Upper Domain Ontology for the Life Sciences." Appl Ontology 3(4): 205-212.
  • Wang, Y., et al. (2007). "Structural methodologies for auditing SNOMED." J Biomed Inform 40(5): 561-581.
  • Ochs, C., A. Agrawal, et al. (2012). "Deriving an Abstraction Network to Support Quality Assurance in OCRe." AMIA Annu Symp Proc: 681-689.
  • Ochs, C., Z. He, et al. (2013). "Choosing the Granularity of Abstraction Networks for Orientation and Quality Assurance of the Sleep Domain Ontology." The 4th International Conference on Biomedical Ontology Proc.
  • He, Z., C. Ochs, et al. (2013). "A Family-based Framework for Supporting Quality Assurance of Biomedical Ontologies in BioPortal." To appear in AMIA Annu Symp Proc.

SLIDE 19

Thank you! Any Questions?

SLIDE 20

Algorithm of Hide

Algorithm Hide(R, O, T, v)
    IF isInternal(O, v) THEN
        FOR EACH Class w IN subclasses(R, v) {
            Hide(R, O, T, w)
        }
    END IF
    IF NOT(isInternal(O, v)) THEN
        IF isClassFrom(v, O, T) AND NOT(in_op_range(v, O)) THEN
            hide(v, O)
        END IF
    END IF
    RETURN

Main Program
    // Initially, call Hide on the root class r of the root partial area R.
    Hide(R, O, T, r)

Function Name: Function Description

  • isInternal(O, v): Boolean function that returns true if class v has any subclasses in ontology O.
  • subclasses(R, v): Returns an iterator over the set of subclasses of class v in root partial area R.
  • isClassFrom(v, O, T): Boolean function that returns true if the class v in ontology O is imported from top-level ontology T.
  • in_op_range(v, O): Boolean function that returns true if class v is in the range of an object property of ontology O.
  • hide(v, O): Hides class v from ontology O and therefore also removes all subclass relationships from v.

Notation: O = domain ontology; T = top-level ontology; R = root partial area of O; v = class in O
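
A runnable sketch of the same post-order idea, under simplifying assumptions: the ontology is a tree held in a plain dictionary, and the imported classes and object-property ranges are given as sets. The function name `hide_redundant`, its parameters, and the class `ddi_assay` are hypothetical; this is an illustration, not the authors' implementation.

```python
def hide_redundant(area, children, imported, op_range, v, parent=None, hidden=None):
    """Post-order sketch of Hide; returns the set of hidden classes.

    area     -- classes in the root partial area R
    children -- dict: class -> set of direct subclasses in ontology O (mutated)
    imported -- classes imported from the top-level ontology T
    op_range -- classes used as the range of some object property of O
    v        -- class currently being visited; parent is its superclass
    """
    if hidden is None:
        hidden = set()

    # Visit the subclasses that lie in R first (post-order).
    for w in sorted(children.get(v, set()) & area):
        hide_redundant(area, children, imported, op_range, w, v, hidden)

    # v is a leaf now if it never had subclasses or all of them were hidden.
    if not children.get(v) and v in imported and v not in op_range:
        hidden.add(v)
        children.pop(v, None)
        if parent is not None:
            children[parent].discard(v)  # remove the subclass link to v
    return hidden


# Toy fragment using BFO class names from the slides; "ddi_assay" is a
# hypothetical domain class that keeps "process" in use.
children = {
    "occurrent": {"processual_entity", "temporal_region"},
    "processual_entity": {"process", "fiat_process_part"},
    "process": {"ddi_assay"},
    "temporal_region": {"connected_temporal_region", "scattered_temporal_region"},
    "connected_temporal_region": {"temporal_instant", "temporal_interval"},
}
area = {"occurrent", "ddi_assay"} | set().union(*children.values())
imported = area - {"ddi_assay"}
hidden = hide_redundant(area, children, imported, set(), "occurrent")
print(sorted(hidden))
```

Each class is visited once and detached through its single parent, which matches the tree assumption stated for the algorithm; a DAG hierarchy would need a different bookkeeping scheme.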
