SLIDE 1

Data Integration with Ontologies

Sebastian Brandt brandt@cs.manchester.ac.uk

(slides by Bijan Parsia bparsia@cs.man.ac.uk)

Friday, 2 May 2014

SLIDE 2

Ontology Based Data Access (OBDA)

  • Ontology at run time?

– More, ontology for the end user!??!

  • By end user, I mean, “someone writing queries”
  • Familiar

– Controlled vocabulary
– Query by example

  • New

– “Better” queries
– Integrated views of data

SLIDE 3

“Better” queries

  • Better how?

– Consider a simple schema
– What does the logical schema look like?
– Lots of variants

  • Sane queries

– SELECT hasAge FROM employee WHERE hasSalary >= 50000;
– SELECT hasAge FROM student WHERE hasSalary >= 50000;
– What about Persons?

  • Union query?
  • Rather write

– SELECT hasAge FROM Person WHERE hasSalary >= 50000;
– no matter what kind of persons there are

[Diagram: classes Person, Student, Employee; attributes hasAge, hasSalary]

create table employee (id number(4), hasAge number(3), hasSalary number(6));
create table student (id number(4), hasAge number(3), hasSalary number(5));
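The "Union query?" option can be made concrete with a small sketch. It assumes an in-memory SQLite database standing in for the real RDBMS (so `INTEGER` replaces `number(n)`), and the sample rows are invented purely for illustration.

```python
import sqlite3

# In-memory stand-in for the two tables on the slide; rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER, hasAge INTEGER, hasSalary INTEGER);
    CREATE TABLE student  (id INTEGER, hasAge INTEGER, hasSalary INTEGER);
    INSERT INTO employee VALUES (1, 45, 60000), (2, 30, 40000);
    INSERT INTO student  VALUES (3, 22, 55000), (4, 19, 1000);
""")

# There is no Person table, so "all persons" has to be spelled out
# table by table. Every new kind of person means editing this query.
union_query = """
    SELECT hasAge FROM employee WHERE hasSalary >= 50000
    UNION ALL
    SELECT hasAge FROM student WHERE hasSalary >= 50000
"""
ages = sorted(row[0] for row in conn.execute(union_query))
print(ages)
```

The query works, but it hard-codes knowledge of which tables hold persons, which is exactly what the user would rather not have to know.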

SLIDE 4

What do we want?

  • We want to be able to query our data

– in the same way

  • no matter how the underlying structure changes

– in a “natural” way

  • so that I get the answers I need

– effectively

  • no waiting until the end of time

– unobtrusively

  • i.e., without too much disruption to my information systems
  • Often the people using the data

– are not the same as the people who

  • collect the data
  • curate the data
  • manage the data
  • build apps using the data
  • Opportunities for impedance mismatch

SLIDE 5

Bioinformatics case

  • Thousands (if not 100s of 1000s) of data sources

– Not all are databases!

  • or SQL databases!
  • Sources largely cover the same or related data
  • Domain knowledge is widely shared

– Biologists know what they are talking about

  • genes, proteins, trees, etc.
  • Data structure knowledge not widely shared

– Consequence of the first point!

  • What must they do to get an answer?

SLIDE 6

Workflow

  • 1. Discover (all!) relevant sources
  • 2. Assimilate their structure and content
  • 3. Formulate query fragments

– Each source might have its own!
– The user must understand how things come together

  • 4. Dispatch the queries
  • Need to understand the interfaces!
  • 5. Synthesize the results

[Figure: fragmentary queries ("select ? ????", "//protein/[@?dfl]") scattered around the request "gene, that, I, want"; image credit below]

http://www.publicdomainpictures.net/view-image.php?image=21541&picture=trassliga-kablar-pa-pole

SLIDE 7

The hope

  • An ontology

– representing domain knowledge
– in a reasonably familiar way
– would provide easier access

  • For example:

– http://www.cs.man.ac.uk/~stevensr/tambis/video/Tut-Tao-query.avi

SLIDE 8

Two Basic Strategies

  • In general:

– TBox = Schema; ABox = Data

  • ETL

– Convert the databases into an ABox

  • Federation

– Split, dispatch, and splice queries on the fly

https://babbage.inf.unibz.it/trac/obdapublic/wiki/ObdalibQuestIntro

SLIDE 9

We always need mappings!

  • We need to map the data structure

– into the common schema/TBox
– no matter what
– no free lunch
– but we saw how to do that!

  • ETL is a development time thing

– Develop the mapping
– Run the conversion
– Mappings inactive at runtime
– What are the pros/cons?

  • Federation leaves the data in situ

– But has to exploit the mappings at query time
– Pros/cons?
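One way to explore the pros and cons is a toy comparison of the two strategies. The sketch below is an assumption-laden illustration, not how any particular OBDA system works: `MAPPINGS`, `materialize`, and `answer_live` are hypothetical names, the mapping format (class name to SQL query) is invented, and SQLite stands in for the source database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER, hasAge INTEGER, hasSalary INTEGER);
    INSERT INTO student VALUES (3, 22, 55000);
""")

# Hypothetical mapping: ontology class name -> SQL query returning its members.
MAPPINGS = {"Student": "SELECT id FROM student"}

def materialize(conn, mappings):
    """ETL: run every mapping once and keep the results as ABox assertions."""
    return {(cls, row[0]) for cls, sql in mappings.items()
            for row in conn.execute(sql)}

def answer_live(conn, mappings, cls):
    """Federation: consult the mapping at query time, against the live data."""
    return {(cls, row[0]) for row in conn.execute(mappings[cls])}

abox = materialize(conn, MAPPINGS)                         # development-time conversion
conn.execute("INSERT INTO student VALUES (4, 19, 1000)")   # the source changes later

etl_answer = {a for a in abox if a[0] == "Student"}        # mappings no longer consulted
live_answer = answer_live(conn, MAPPINGS, "Student")       # mappings used at query time
print(etl_answer)
print(live_answer)
```

The materialized ABox misses the later insert until the conversion is rerun, while the federated answer sees it but pays for running the mapping on every query.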

SLIDE 10

Issues

  • We have a non-standard query language

– OWL or SPARQL (a SQL-like conjunctive query language)

  • We have to do “extra” work

– Build the common ontology
– Create the mappings

  • We have computational issues

– Data complexity of OWL is very high (NP-Complete)

SLIDE 11

Trade expressivity for performance

  • We want an ontology language

– which is expressive enough to represent DB schemas
– with good data complexity (at least)
– sound and complete algorithms for federated query answering

  • Answer: OWL QL

– http://www.w3.org/TR/owl2-profiles/#OWL_2_QL
– A restriction of OWL

  • “The OWL 2 QL profile is designed so that sound and complete query answering is in LOGSPACE (more precisely, in AC⁰) with respect to the size of the data (assertions), while providing many of the main features necessary to express conceptual models such as UML class diagrams and ER diagrams.”
  • “...data (assertions) that is stored in a standard relational database system can be queried through an ontology via a simple rewriting mechanism, i.e., by rewriting the query into an SQL query that is then answered by the RDBMS system, without any changes to the data.”

  • (Based on the DL Lite family of DLs.)
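The "simple rewriting mechanism" in the quote can be illustrated for the easiest case. This is a minimal sketch that handles only atomic subclass axioms; real OWL 2 QL rewriting also has to deal with existentials, inverse roles, and role axioms. `TBOX`, `subsumees`, and `rewrite` are hypothetical names, and the axioms are the ones from the Person/Student/Employee example in this deck.

```python
# Atomic subclass axioms as (sub, sup) pairs.
TBOX = [("Student", "Person"), ("Employee", "Person")]

def subsumees(cls, tbox):
    """All classes whose instances must also be instances of cls (including cls)."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in tbox:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found

def rewrite(cls, tbox):
    """Rewrite the query 'x is a cls' into a union of atomic class queries."""
    return sorted(subsumees(cls, tbox))

rewritten = rewrite("Person", TBOX)
print(rewritten)  # ['Employee', 'Person', 'Student']
```

The ontology is compiled into the query, so the data itself never has to be touched or extended; each class in the result can then be answered by the database through its mapping.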

SLIDE 12

Several important moves

  • Restrict the expression language

– B ::= A | ∃R | ∃R−
– C ::= B | ¬B | C1 ⊓ C2

  • Odd axiom shapes

– B ⊑ C
– (funct R), (funct R−)

  • No negations on LHS
  • No conjunctions on LHS

– Are conjunctions meaningful here?

Only unqualified existentials! No “hasFinger some Finger”.
SubClass axioms are asymmetric.
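The grammar is small enough to write down as data types. The following sketch encodes B and C as Python dataclasses and checks the axiom shape B ⊑ C; all class and function names are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Atomic:                 # A
    name: str

@dataclass(frozen=True)
class Exists:                 # ∃R, or ∃R− when inverse=True (unqualified only)
    role: str
    inverse: bool = False

Basic = Union[Atomic, Exists]             # B ::= A | ∃R | ∃R−

@dataclass(frozen=True)
class Not:                    # ¬B
    arg: Basic

@dataclass(frozen=True)
class And:                    # C1 ⊓ C2
    left: "General"
    right: "General"

General = Union[Atomic, Exists, Not, And]  # C ::= B | ¬B | C1 ⊓ C2

def is_basic(c) -> bool:
    return isinstance(c, (Atomic, Exists))

def is_general(c) -> bool:
    if is_basic(c):
        return True
    if isinstance(c, Not):
        return is_basic(c.arg)          # negation applies to basic concepts only
    if isinstance(c, And):
        return is_general(c.left) and is_general(c.right)
    return False

def is_valid_inclusion(lhs, rhs) -> bool:
    """B ⊑ C: the left side must be basic, the right side general."""
    return is_basic(lhs) and is_general(rhs)

isa = is_valid_inclusion(Atomic("A1"), Atomic("A2"))             # A1 ⊑ A2
disjoint = is_valid_inclusion(Atomic("A1"), Not(Atomic("A2")))   # A1 ⊑ ¬A2
negated_lhs = is_valid_inclusion(Not(Atomic("A1")), Atomic("A2"))
print(isa, disjoint, negated_lhs)  # True True False
```

`Exists` has no filler argument, which is what "unqualified" means here: there is simply no way to build "hasFinger some Finger" from these constructors.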

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.1525&rep=rep1&type=pdf

SLIDE 13

We can express

  • ISA

– using A1 ⊑ A2 ;

  • disjointness

– A1 ⊑ ¬A2

  • role-typing

– ∃R ⊑ A1 (or ∃R− ⊑ A2);

  • participation constraints,

– A ⊑ ∃R (resp., A ⊑ ∃R−);

  • non-participation constraints

– using A ⊑ ¬∃R and A ⊑ ¬∃R−;

  • functionality restrictions

– using (funct R) and (funct R−)
– but no other counting

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.1525&rep=rep1&type=pdf

SLIDE 14

Recall our example

  • Want to write

– SELECT hasAge FROM Person WHERE hasSalary >= 50000;
– and get the right answers

  • We build an ontology

– Student SubClassOf: Person
– Employee SubClassOf: Person
– Etc.

  • We build mappings
  • Our query now works!

– No change to database

[Diagram: classes Person, Student, Employee; attributes hasAge, hasSalary]

create table employee (id number(4), hasAge number(3), hasSalary number(6));
create table student (id number(4), hasAge number(3), hasSalary number(5));

https://babbage.inf.unibz.it/trac/obdapublic/wiki/SimpleHelloWorldTutorial
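Putting the pieces together, the Person query can be answered without touching the database schema. The sketch below is a toy under stated assumptions, not the mechanism of the tool linked above: SQLite stands in for the RDBMS, the rows are invented, and `SUBCLASS_OF`, `TABLE_FOR`, `tables_for`, and `ages_with_min_salary` are hypothetical names for the ontology, the mappings, and the rewriting step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER, hasAge INTEGER, hasSalary INTEGER);
    CREATE TABLE student  (id INTEGER, hasAge INTEGER, hasSalary INTEGER);
    INSERT INTO employee VALUES (1, 45, 60000), (2, 30, 40000);
    INSERT INTO student  VALUES (3, 22, 55000), (4, 19, 1000);
""")

SUBCLASS_OF = {"Student": "Person", "Employee": "Person"}    # the ontology
TABLE_FOR = {"Student": "student", "Employee": "employee"}   # hypothetical mappings

def tables_for(cls):
    """Tables holding instances of cls, directly or through a subclass."""
    tables = []
    for mapped in sorted(TABLE_FOR):
        ancestor = mapped
        # Walk up the subclass chain until we reach cls or run out of parents.
        while ancestor is not None and ancestor != cls:
            ancestor = SUBCLASS_OF.get(ancestor)
        if ancestor == cls:
            tables.append(TABLE_FOR[mapped])
    return tables

def ages_with_min_salary(cls, min_salary):
    """Answer 'SELECT hasAge FROM <cls> WHERE hasSalary >= ?' via the ontology."""
    # Table names come from the trusted TABLE_FOR dict; the value is a bound parameter.
    parts = [f"SELECT hasAge FROM {t} WHERE hasSalary >= ?" for t in tables_for(cls)]
    sql = " UNION ALL ".join(parts)
    return sorted(row[0] for row in conn.execute(sql, [min_salary] * len(parts)))

person_ages = ages_with_min_salary("Person", 50000)
student_ages = ages_with_min_salary("Student", 50000)
print(person_ages, student_ages)
```

The user asks about Person; the union over `employee` and `student` is generated from the ontology and mappings rather than written by hand. A class with no mapped tables would produce an empty SQL string here, so a fuller version would need to handle that case.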
