Islands of tractability in ontology-based data access


SLIDE 1

Islands of tractability in ontology-based data access

Michael Zakharyaschev

Department of Computer Science and Information Systems, Birkbeck, University of London http://www.dcs.bbk.ac.uk/~michael

supported by EPSRC grants ExODA EP/H05099X and iTract EP/M012670

SLIDE 2

Data access in industry

(from Norwegian Petroleum Directorate’s FactPages)

Show me the wellbores completed before 2008 where Statoil as a drilling operator sampled less than 10 meters of cores.

5 days later:

    SELECT DISTINCT cores.wlbName, cores.lenghtM,
           wellbore.wlbDrillingOperator, wellbore.wlbCompletionYear
    FROM ( (SELECT wlbName, wlbNpdidWellbore,
                   (wlbTotalCoreLength * 0.3048) AS lenghtM
            FROM wellbore_core
            WHERE wlbCoreIntervalUom = '[ft ]')
           UNION
           (SELECT wlbName, wlbNpdidWellbore,
                   wlbTotalCoreLength AS lenghtM
            FROM wellbore_core
            WHERE wlbCoreIntervalUom = '[m ]')
         ) AS cores,
         ( (SELECT wlbNpdidWellbore, wlbDrillingOperator, wlbCompletionYear
            FROM wellbore_development_all)
           UNION
           (SELECT wlbNpdidWellbore, wlbDrillingOperator, wlbCompletionYear
            FROM wellbore_exploration_all)
           UNION
           (SELECT wlbNpdidWellbore, wlbDrillingOperator, wlbCompletionYear
            FROM wellbore_shallow_all)
         ) AS wellbore
    WHERE wellbore.wlbNpdidWellbore = cores.wlbNpdidWellbore

...

In STATOIL: 1,000 TB of relational data; 2,000 tables; different schemas

30–70% of time on data gathering

UCL 16.11.15

SLIDE 3

Ontology-based data access (OBDA)

(the Romans ≈ 2007)

query

    SELECT DISTINCT ?unit ?well
    WHERE {
      [] npdv:stratumForWellbore ?wellboreURI ;
         npdv:inLithostratigraphicUnit [ npdv:name ?unit ] .
      ?wellboreURI npdv:name ?well .
      ?core a npdv:WellboreCore ;
            npdv:coreForWellbore ?wellboreURI .
    }

mappings

    [] rdf:type rr:TriplesMap;
       rr:logicalTable "select * from wellbore_core";
       rr:subjectMap [ a rr:TermMap;
                       rr:template "&npd-v2;wellbore/{wlbNpdidWellbore}/"; ];
       rr:propertyObjectMap [ rr:property npdv:coreIntervalBottom;
                              rr:column "wlbCoreIntervalBottom" ];
       ...

ontology

    [Diagram: ontology fragment with classes ProductionWellbore, Wellbore, WellboreStratum, WellboreCore and properties stratumForWellbore, coreForWellbore]

data sources

    CREATE TABLE wellbore_core (
      wlbName varchar(60) NOT NULL,
      wlbCoreNumber int(11) NOT NULL,
      wlbCoreIntervalTop decimal(13,6),
      ...
    )

    [Figure: spreadsheet]

Ontology

– gives a high-level conceptual view of the data
– provides a convenient & natural vocabulary for user queries
– enriches incomplete data with background knowledge

SLIDE 4

OBDA via FO-rewriting

[Diagram: the database (n-ary relations) is turned by the mapping into a virtual ABox A (triples); A together with the ontology T determines the canonical model (derived triples). The query q is rewritten to q′, which is then unfolded via the mapping and evaluated over the database.]

mapping

    npdv:MoveableFacility(URI("&npdv;facility/{}", t7)) :- facility_moveable(t1, ..., t6, t7, t8, ..., t10)
    ...

ontology T

    npdv:MoveableFacility ⊑ npdv:Facility
    ...

\[ \text{for all } \mathcal{A} \text{ and } \vec{a}: \qquad \mathcal{T}, \mathcal{A} \models q(\vec{a}) \iff \mathcal{I}_{\mathcal{A}} \models q'(\vec{a}) \]

reduction to DB query evaluation

SLIDE 5

OWL 2 QL profile of OWL 2

(W3C 2012)

Roles
    ϱ(x, y) ::= ⊤ | P(x, y) | P(y, x)                            R ::= ⊤ | P | P⁻

Basic concepts
    τ(x) ::= ⊤ | A(x) | ∃y ϱ(x, y)                               B ::= ⊤ | A | ∃R

TBoxes
    ∀x (τ(x) → τ′(x))                                            B ⊑ B′
    ∀x, y (ϱ(x, y) → ϱ′(x, y))                                   R ⊑ R′
    ∀x ϱ(x, x)                                                   R is reflexive
    ∀x (τ(x) ∧ τ′(x) → ⊥)                                        B ⊓ B′ ⊑ ⊥
    ∀x, y (ϱ(x, y) ∧ ϱ′(x, y) → ⊥)                               R ⊓ R′ ⊑ ⊥
    ∀x (ϱ(x, x) → ⊥)                                             R is irreflexive

Sugar
    ∀x (τ(x) → ∃y (ϱ1(x, y) ∧ · · · ∧ ϱk(x, y) ∧ τ′(y)))         B ⊑ ∃R.B′
    (expressible via additional role inclusions)

ABoxes
    {A(a), P(a, b), ...}

based on the ‘DL-Lite family’ designed by the Romans (≈ 2005) and extended by Artale, Calvanese, Kontchakov & Z (2007–9)

SLIDE 6

Example

Staff ontology T

    ∀x (ProjectManager(x) → ∃y (isAssistedBy(x, y) ∧ PA(y)))
    ∀x (∃y managesProject(x, y) → ProjectManager(x))
    ∀x (ProjectManager(x) → Staff(x))
    ∀x (PA(x) → Secretary(x))

User query q: find the staff assisted by secretaries

    q(x) = ∃y (Staff(x) ∧ isAssistedBy(x, y) ∧ Secretary(y))

PE-rewriting of ontology-mediated query (T, q)

    q′(x) = ∃y (Staff(x) ∧ isAssistedBy(x, y) ∧ (Secretary(y) ∨ PA(y)))
            ∨ ProjectManager(x) ∨ ∃z managesProject(x, z)
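A minimal Python sketch of what this rewriting buys: evaluating q′ directly over the data should give the same answers as first saturating the data with the four axioms and then evaluating q. The dictionary-of-sets encoding of the ABox, the individual names and the labelled-null convention are assumptions made for illustration only.

```python
NAMES = ("Staff", "ProjectManager", "PA", "Secretary", "isAssistedBy", "managesProject")

def rewriting_answers(abox):
    """Evaluate the PE-rewriting q'(x) directly over the data, ignoring the ontology."""
    staff = abox.get("Staff", set())
    pa = abox.get("PA", set())
    secretary = abox.get("Secretary", set())
    first = {x for (x, y) in abox.get("isAssistedBy", set())
             if x in staff and (y in secretary or y in pa)}
    managers = set(abox.get("ProjectManager", set()))
    managing = {x for (x, _) in abox.get("managesProject", set())}
    return first | managers | managing

def chase_answers(abox):
    """Saturate the data with the four axioms of T, then evaluate the original q."""
    facts = {name: set(abox.get(name, ())) for name in NAMES}
    facts["ProjectManager"] |= {x for (x, _) in facts["managesProject"]}
    facts["Staff"] |= facts["ProjectManager"]
    for x in list(facts["ProjectManager"]):
        null = ("null", x)  # fresh anonymous assistant, one per project manager
        facts["isAssistedBy"].add((x, null))
        facts["PA"].add(null)
    facts["Secretary"] |= facts["PA"]
    # Only ABox individuals occur in the first position of isAssistedBy,
    # so no anonymous element is ever returned as an answer.
    return {x for (x, y) in facts["isAssistedBy"]
            if x in facts["Staff"] and y in facts["Secretary"]}

# Hypothetical data, for illustration only.
abox = {
    "Staff": {"ann", "bob"},
    "isAssistedBy": {("ann", "carl"), ("bob", "dan")},
    "PA": {"carl"},
    "ProjectManager": {"fay"},
    "managesProject": {("eve", "p1")},
}
print(sorted(rewriting_answers(abox)))
```

Here ann is found through the explicit PA fact for carl, while eve and fay are returned because the ontology guarantees each project manager an (anonymous) assistant.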

SLIDE 7

Why are OWL 2 QL OMQs FO-rewritable?

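Canonical model (chase) C_{T,A} of a given consistent (T, A)

homomorphically embeddable into every model of (T, A)

\[ \mathcal{T}, \mathcal{A} \models q \iff \mathcal{C}_{\mathcal{T},\mathcal{A}} \models q \qquad \text{for any CQ } q \]

Example:

\[ \mathcal{T} = \{A \sqsubseteq \exists R^{-}.\exists R.B,\ \ B \sqsubseteq \exists S.B\}, \qquad \mathcal{A} = \{A(a)\} \]

[Figure: step-by-step construction of C_{T,A}: the individual a labelled A; an anonymous element with R-edges to a and to a B-element; then a growing chain of S-successors labelled B]

all Horn DLs have canonical models

but OMQ ({∃R.A ⊑ A}, A(x)) is not FO-rewritable (recursive datalog needed)

Bounded depth derivation property: there is a function f such that

\[ \mathcal{T}, \mathcal{A} \models q \iff \mathcal{C}^{N}_{\mathcal{T},\mathcal{A}} \models q, \qquad \text{with } \mathcal{C}^{N}_{\mathcal{T},\mathcal{A}} \text{ constructed in } N = f(|\mathcal{T}|, |q|) \text{ steps} \]

⇔ FO-rewritability

f is polynomial for OWL 2 QL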
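The step-by-step construction in the example can be sketched in Python as a chase that stops after a fixed number of rounds. This is only an illustration under simplifying assumptions: axioms are restricted to the shape "concept name ⊑ ∃role.concept name", so the first axiom of the example is split in two with a fresh concept name X, and anonymous elements are named w1, w2, ... Both X and the w-names are introduced here and are not part of the slides.

```python
def bounded_chase(axioms, abox_concepts, depth):
    """Build the first `depth` levels of the canonical model.

    axioms: list of (lhs_concept, role, inverse, rhs_concept), read as
            lhs ⊑ ∃role.rhs, or lhs ⊑ ∃role⁻.rhs when inverse is True.
    abox_concepts: dict mapping each ABox individual to its set of concept names.
    Returns (labels, edges), where edges are triples (role, source, target).
    """
    labels = {ind: set(cs) for ind, cs in abox_concepts.items()}
    edges = set()
    frontier = list(labels)
    fresh = 0
    for _ in range(depth):
        next_frontier = []
        for elem in frontier:
            for lhs, role, inverse, rhs in axioms:
                if lhs in labels[elem]:
                    fresh += 1
                    witness = f"w{fresh}"  # new anonymous element
                    labels[witness] = {rhs}
                    edges.add((role, witness, elem) if inverse else (role, elem, witness))
                    next_frontier.append(witness)
        frontier = next_frontier
    return labels, edges

# The example TBox, with A ⊑ ∃R⁻.∃R.B split into A ⊑ ∃R⁻.X and X ⊑ ∃R.B.
AXIOMS = [
    ("A", "R", True, "X"),
    ("X", "R", False, "B"),
    ("B", "S", False, "B"),
]

labels, edges = bounded_chase(AXIOMS, {"a": {"A"}}, depth=4)
for edge in sorted(edges):
    print(edge)
```

Because B ⊑ ∃S.B keeps firing, the full canonical model is infinite; the bounded depth derivation property is what makes a finite prefix sufficient for answering a fixed query.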

SLIDE 8

What is the price of OBDA?

– reduction to DB query evaluation could be too expensive, in which case OBDA would not be viable

1. What is the size of rewritings?

   – depending on the type of OMQs
   – depending on the type of rewritings

   new research (succinctness) problem

2. What is the combined complexity of OMQ answering?

   – depending on the type of OMQs

   well-known problem in DB theory

it may turn out that reduction to DB query evaluation is not the optimal way of OMQ answering

SLIDE 9

Tree-witness rewriting of OMQ Q = (T , q)

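[Figure: a CQ q with subqueries q_t1 and q_t2, each mapped by a homomorphism h into an anonymous part of the canonical model C_{T,A}, namely C_T^{τ1(a1)} and C_T^{τ2(a2)}]

\[ q_{\mathsf{tw}}(\vec{x}) \;=\; \bigvee_{\substack{\Theta \text{ independent set} \\ \text{of tree witnesses}}} \exists \vec{y}\, \Big( \bigwedge_{S(\vec{z}) \in q \setminus q_{\Theta}} S(\vec{z}) \;\wedge\; \bigwedge_{t \in \Theta} \mathsf{tw}_{t} \Big) \]

Θ is independent if q_t ∩ q_t′ = ∅, for any distinct t, t′ ∈ Θ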

SLIDE 10

The number of tree witnesses

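[Figure: the CQ q(x1, x2, x3) with answer variables x1, x2, x3, and the canonical model C_{T,{A(a)}} with labels A and B]

exponentially-many tree witnesses

huge tw-rewriting

however, it can be simplified to a polynomial-size PE-rewriting:

\[ q(x_1, x_2, x_3) \;\vee\; \exists z\, \Big( A(z) \wedge \bigwedge_{i=1}^{n} \big( (x_i = z) \vee \exists y\, (R(y, x_i) \wedge R(y, z)) \big) \Big) \]

can we always do this?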

SLIDE 11

Circuit complexity

P/poly: the class of problems decidable by polynomial-size circuit families

P ⊆ P/poly;  if NP ⊄ P/poly then P ≠ NP

– almost all Boolean functions with n inputs require circuits of size Θ(2^n/n) (Shannon 1949)

are there complex Boolean functions f_n in NP? (known lower bound: 5n − o(n))

nobody knows, but ...
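The lower-bound half of Shannon's statement follows from a counting argument, sketched below in LaTeX. The gate basis, the constant c and the factor 3 are assumptions chosen to make the arithmetic simple; they are not taken from the slides.

```latex
% Each of the s gates picks one of constantly many operations and two
% predecessors among the n inputs and the s gates.
\begin{align*}
\#\{\text{circuits with } s \text{ gates on } n \text{ inputs}\}
  &\le \bigl(c\,(s+n)^{2}\bigr)^{s}
   = 2^{\,s\,(2\log_2 (s+n) + \log_2 c)}, \\
\text{so for } s \le \frac{2^{n}}{3n}: \qquad
\#\{\text{circuits}\}
  &\le 2^{\,(2/3 + o(1))\,2^{n}}
   = o\bigl(2^{2^{n}}\bigr).
\end{align*}
% There are 2^{2^n} Boolean functions of n variables, so all but a vanishing
% fraction of them have no circuit with at most 2^n/(3n) gates.
```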

SLIDE 12

Monotone circuit complexity

(Razborov, Raz, et al. 1985)

Boolean variables e_ij give graph G = (V, E): V = {1, . . . , n}, E = { {i, j} | e_ij = 1 }

– CLIQUE_{n,k}(e) = 1 iff G contains a k-clique (e.g., for k ≤ n^{1/4})

    monotone circuits: exp (2^{ε√k})
    monotone formulas: exp
    formulas with ¬: superpoly unless NP ⊆ P/poly

– MATCHING_n(e) = 1 iff the bipartite graph e with n vertices in each part has a perfect matching (subset of edges containing every node once)

    monotone formulas: exp
    formulas with ¬: poly
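To make the two functions concrete, here is a brute-force Python sketch of their definitions. It says nothing about circuit size; it only spells out what is being computed. The edge-list encoding (pairs over 1..n for CLIQUE, pairs (left, right) over 0..n-1 for MATCHING) is an assumption made for illustration.

```python
from itertools import combinations, permutations

def clique(n, k, edges):
    """CLIQUE_{n,k}: 1 iff the graph on vertices 1..n with these edges has a k-clique."""
    edge_set = {frozenset(e) for e in edges}
    return int(any(
        all(frozenset(pair) in edge_set for pair in combinations(candidate, 2))
        for candidate in combinations(range(1, n + 1), k)
    ))

def matching(n, edges):
    """MATCHING_n: 1 iff the bipartite graph with parts 0..n-1 has a perfect matching.

    An edge (i, j) joins left vertex i to right vertex j.
    """
    edge_set = set(edges)
    return int(any(
        all((i, pi[i]) in edge_set for i in range(n))
        for pi in permutations(range(n))
    ))

print(clique(4, 3, [(1, 2), (2, 3), (1, 3)]))
print(matching(2, [(0, 0), (1, 0)]))
```

Both functions are monotone: switching an edge variable from 0 to 1 can create a clique or a matching but never destroy one.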

SLIDE 13

Tree-witness rewriting as a Boolean function

OMQ Q = (T, q) ⟹ a hypergraph H_Q = (V, E), where

    vertices V = atoms of q
    hyperedges E = tree witnesses q_t

monotone Boolean hypergraph function for Q (or hypergraph H_Q)

\[ f_Q \;=\; \bigvee_{E' \subseteq E \text{ independent}} \Big( \bigwedge_{v \in V \setminus V_{E'}} p_v \;\wedge\; \bigwedge_{e \in E'} p_e \Big) \]

(some tweaks required in case of exponentially-many tree witnesses)

– Boolean formula ϕ for f_Q ⟹ FO-rewriting of size O(|ϕ| · |Q|)
– monotone Boolean formula ϕ for f_Q ⟹ PE-rewriting
– monotone Boolean circuit ϕ for f_Q ⟹ NDL-rewriting (nonrecursive datalog)

tool for obtaining upper succinctness and complexity bounds using classical circuit complexity
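A brute-force Python sketch of the hypergraph function, read directly off its definition: try every independent set E′ of hyperedges and check that all chosen hyperedge variables and all uncovered vertex variables are 1. The dictionary encoding and the small example hypergraph are assumptions for illustration; the enumeration is exponential and is not meant as an efficient procedure.

```python
from itertools import combinations

def hypergraph_function(vertices, hyperedges, p_v, p_e):
    """Evaluate f_H under the assignment (p_v, p_e).

    vertices: iterable of vertex names.
    hyperedges: dict mapping hyperedge name to its set of vertices.
    p_v, p_e: dicts giving 0/1 values for vertex and hyperedge variables.
    """
    names = list(hyperedges)
    for size in range(len(names) + 1):
        for chosen in combinations(names, size):
            # E' must be independent: chosen hyperedges are pairwise disjoint.
            if any(hyperedges[a] & hyperedges[b] for a, b in combinations(chosen, 2)):
                continue
            covered = set().union(*(hyperedges[e] for e in chosen))
            if all(p_e[e] for e in chosen) and all(p_v[v] for v in vertices if v not in covered):
                return 1
    return 0

# Hypothetical hypergraph: three atoms, two overlapping tree witnesses.
V = [1, 2, 3]
E = {"e1": {1, 2}, "e2": {2, 3}}
print(hypergraph_function(V, E, {1: 0, 2: 0, 3: 1}, {"e1": 1, "e2": 0}))
```

In the example, choosing E′ = {e1} covers atoms 1 and 2, and the remaining atom 3 has p_3 = 1, so the function evaluates to 1.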

SLIDE 14

Tool for lower bounds

For any OMQ Q = (T, q) and assignment α: predicates(q) → {0, 1},

\[ \mathcal{A}_{\alpha} \;=\; \{A(a) \mid \alpha(A) = 1\} \;\cup\; \{P(a, a) \mid \alpha(P) = 1\} \]

ABox with a single individual a

Primitive evaluation function:

\[ g_Q(\alpha) = 1 \iff \mathcal{T}, \mathcal{A}_{\alpha} \models q(\vec{a}) \]

– FO-rewriting q′ of Q ⟹ Boolean formula for g_Q of size O(|q′|)
– PE-rewriting q′ of Q ⟹ monotone Boolean formula for g_Q
– NDL-rewriting q′ of Q ⟹ monotone Boolean circuit for g_Q

(proof by quantifier elimination)

tool for obtaining lower succinctness bounds using classical circuit complexity
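As a small illustration of the quantifier-elimination step, the Python sketch below takes the PE-rewriting q′ of the staff example from SLIDE 6 and evaluates it over an ABox with the single individual a. Over a one-element domain every quantified variable can only be a, so q′ collapses to a monotone Boolean formula in the predicate variables. The encoding of α as a dictionary is an assumption made here.

```python
from itertools import product

# Predicates occurring in q; alpha assigns 0/1 to these only.
Q_PREDICATES = ("Staff", "isAssistedBy", "Secretary")

def rewriting_on_single_individual(holds):
    """q' with every variable replaced by the single individual a.

    holds(P) tells whether the fact P(a) or P(a, a) is in the ABox.
    """
    first = holds("Staff") and holds("isAssistedBy") and (holds("Secretary") or holds("PA"))
    return bool(first or holds("ProjectManager") or holds("managesProject"))

def g_Q(alpha):
    """Primitive evaluation function of the staff OMQ, computed via its rewriting."""
    # Predicates outside predicates(q) have no facts in the ABox A_alpha.
    holds = lambda pred: bool(alpha.get(pred, 0))
    return int(rewriting_on_single_individual(holds))

TRUTH_TABLE = {
    bits: g_Q(dict(zip(Q_PREDICATES, bits)))
    for bits in product((0, 1), repeat=len(Q_PREDICATES))
}
for bits, value in sorted(TRUTH_TABLE.items()):
    print(bits, value)
```

For this particular OMQ the function is just the conjunction of the three variables, since ProjectManager, PA and managesProject do not occur in q and therefore never appear in the ABox.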

SLIDE 15

Case study: OMQs with ontologies of depth 1

no axioms such as A ⊑ ∃P, ∃P⁻ ⊑ ∃R

[Figure: canonical models of depth 1 and of depth 2 over individuals a and b, with a labelled A]

Q = (T, q) with T of depth 1 ⟹ hypergraph H_Q is of degree ≤ 2 (each vertex belongs to ≤ 2 hyperedges)

hypergraph H of degree ≤ 2 ⟹ ∃ OMQ Q_H with T of depth 1 and H ≅ H_{Q_H}

What can hypergraph functions of degree 2 compute?

SLIDE 16

Hypergraph programs (HGPs)

An HGP is a hypergraph H = (V, E) with every vertex labelled by 0, 1, p_i or ¬p_i

computes f: f(α) = 1 ⇔ there is an independent E′ ⊆ E covering all zeros (contains all vertices whose labels evaluate to 0 under α)

monotone if no ¬p_i among the labels

Any monotone HGP based on H computes a sub-function of f_H

HGPs based on hypergraphs of degree ≤ 2 are polynomially equivalent to nondeterministic branching programs (NBPs)

[Figure: example with nodes s, t and labels e, f, d, p1, p6, p3 ∧ p4, 1, 1]

HGP2 = NBP = NL/poly

functions computable by NLogSpace TMs with polynomial advice functions (non-uniform analogue of NLOGSPACE)
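The acceptance condition of an HGP can be sketched in Python as follows: evaluate the vertex labels under α, collect the vertices that evaluate to 0, and search for pairwise disjoint hyperedges that together contain all of them. The label encoding (0, 1, a variable name, or a pair ("not", name)) and the three-vertex example are assumptions for illustration, and the search is plain brute force.

```python
from itertools import combinations

def eval_label(label, alpha):
    """Value of a vertex label under alpha; labels are 0, 1, 'p' or ('not', 'p')."""
    if isinstance(label, tuple):
        return 1 - alpha[label[1]]
    if isinstance(label, str):
        return alpha[label]
    return label

def hgp(labels, hyperedges, alpha):
    """Return 1 iff some independent set of hyperedges covers all zero-labelled vertices."""
    zeros = {v for v, lab in labels.items() if eval_label(lab, alpha) == 0}
    for size in range(len(hyperedges) + 1):
        for chosen in combinations(hyperedges, size):
            if any(a & b for a, b in combinations(chosen, 2)):
                continue  # not independent
            if zeros <= set().union(*chosen):
                return 1
    return 0

# Hypothetical monotone HGP of degree 2: vertex w lies in both hyperedges.
LABELS = {"u": "p1", "v": "p2", "w": 1}
HYPEREDGES = [{"u", "w"}, {"v", "w"}]
for p1 in (0, 1):
    for p2 in (0, 1):
        print(p1, p2, hgp(LABELS, HYPEREDGES, {"p1": p1, "p2": p2}))
```

The two hyperedges share w, so at most one of them can be used; the program therefore accepts exactly when at most one of u, v is zero, that is, it computes p1 ∨ p2.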

SLIDE 17

Rewritings for OMQs with ontologies of depth 1

HGP2 = NL/poly ⊆ P/poly (for monotone functions)

    ⟹ polynomial-size NDL-rewritings

there is a monotone f computable by a polynomial-size NBP, but any monotone Boolean formula computing f is of size n^{Ω(log n)}

    ⟹ ∃ OMQ with superpolynomial PE-rewritings only

all OMQs have polynomial FO-rewritings iff NC1 = NL/poly

all OMQs with CQs of bounded treewidth have polynomial PE-rewritings

all tree-shaped OMQs have polynomial-size Π4-rewritings (∧∨∧∨)

(SPARQL queries under OWL 2 QL entailment regime)

SLIDE 18

Succinctness landscape for OMQ rewritings

[Figure: succinctness landscape. Axes: TBox depth (1, 2, 3, ..., d, arb); number of leaves (2, ..., ℓ, trees); treewidth (tw 2, ..., btw, arb). Region labels:]

– poly NDL, but no poly PE; poly FO iff NL/poly ⊆ NC1
– poly NDL, but no poly PE; poly FO iff LOGCFL/poly ⊆ NC1
– no poly FO unless NP/poly ⊆ NC1
– no poly PE or NDL; poly FO iff NP/poly ⊆ NC1
– poly PE, NDL, & FO
– poly NDL, but no poly PE
– poly NDL, but no poly PE; poly FO iff NL/poly ⊆ NC1

Bienvenu, Kikot, Kontchakov, Podolskii, Z 2012–15

SLIDE 19

Combined complexity landscape

[Figure: combined complexity landscape. Axes: TBox depth (1, 2, 3, ..., d, arb); number of leaves (2, ..., ℓ, trees); treewidth (tw 2, ..., btw, arb). Region labels: NL-complete, LOGCFL-complete, NP-complete, NP-complete, LOGCFL-complete]

– CQ evaluation over databases is NP-complete
– bounded treewidth CQ evaluation is LOGCFL-complete (logspace reducible to a CFL), Gottlob et al. 2001

L ⊆ NL ⊆ LOGCFL ⊆ NC2 ⊆ P ⊆ NP
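For orientation, evaluating a Boolean CQ over a database amounts to searching for a homomorphism from the query atoms into the data, which is where the NP upper bound comes from. A minimal backtracking sketch in Python follows; the encoding of atoms and relations is an assumption made here, all query arguments are treated as variables, and no optimisation (such as exploiting bounded treewidth) is attempted.

```python
def evaluate_cq(atoms, db):
    """Return True iff the Boolean CQ given by `atoms` has a match in `db`.

    atoms: list of (predicate, tuple_of_variables).
    db: dict mapping predicate name to a set of tuples of constants.
    """
    def extend(index, assignment):
        if index == len(atoms):
            return True  # every atom has been matched
        predicate, variables = atoms[index]
        for row in db.get(predicate, ()):
            candidate = dict(assignment)
            # Bind each variable to the row's constant, rejecting conflicts.
            if len(row) == len(variables) and all(
                candidate.setdefault(var, const) == const
                for var, const in zip(variables, row)
            ):
                if extend(index + 1, candidate):
                    return True
        return False

    return extend(0, {})

# Triangle query: E(x, y), E(y, z), E(z, x).
TRIANGLE = [("E", ("x", "y")), ("E", ("y", "z")), ("E", ("z", "x"))]
print(evaluate_cq(TRIANGLE, {"E": {(1, 2), (2, 3), (3, 1)}}))
print(evaluate_cq(TRIANGLE, {"E": {(1, 2), (2, 3)}}))
```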

SLIDE 20

[Diagram: the OBDA pipeline of SLIDE 4 (database, mapping, virtual ABox, ontology, canonical model; query, rewriting, unfolding), now annotated with: compile, integrity constraints, ABox constraints, + SQO]

Rodriguez-Muro, Calvanese, Kontchakov, Rezk, Xiao, Z 2010–15

SLIDE 21

Ontop in practice

T-mappings compile (big parts of) OWL 2 QL ontologies into mappings (domain and range constraints, concept and role hierarchies)

    can be optimised offline

few tree witnesses in real-world OBDA

    ⟹ polynomial-size rewritings

database constraints and SQO significantly simplify T-mappings

    ⟹ efficient SQL queries over the data

✗ some important conceptual modelling constructs are missing in OWL 2 QL:

    A ⊑ B ⊔ C,   ∃R.A ⊑ B,   owl:sameAs

? islands of OMQ rewritability & succinctness for expressive languages
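The idea of compiling a concept hierarchy into the mapping can be sketched in Python as a simple closure: every data source mapped to a subclass is also attached to its superclasses, so the hierarchy no longer has to be consulted at query time. This is only a toy illustration, not Ontop's implementation; the function name, the source name facility_fixed and the representation of sources as plain strings are hypothetical.

```python
def t_mapping(mapping, subclass_of):
    """Close a class mapping under a concept hierarchy.

    mapping: dict from class name to a list of data sources (e.g. SQL queries).
    subclass_of: list of pairs (sub, sup), read as sub ⊑ sup.
    Returns a new dict in which each class also lists the sources of its subclasses.
    """
    result = {cls: list(sources) for cls, sources in mapping.items()}
    changed = True
    while changed:  # repeat until the hierarchy has been fully propagated
        changed = False
        for sub, sup in subclass_of:
            for source in result.get(sub, []):
                if source not in result.setdefault(sup, []):
                    result[sup].append(source)
                    changed = True
    return result

# npdv:MoveableFacility ⊑ npdv:Facility is the axiom shown on SLIDE 4;
# the source "facility_fixed" is a hypothetical placeholder.
MAPPING = {
    "npdv:MoveableFacility": ["facility_moveable"],
    "npdv:Facility": ["facility_fixed"],
}
HIERARCHY = [("npdv:MoveableFacility", "npdv:Facility")]
print(t_mapping(MAPPING, HIERARCHY)["npdv:Facility"])
```

With such a compiled mapping, a query for npdv:Facility can be unfolded directly into a union over both sources without a separate rewriting step for the hierarchy.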

SLIDE 22

iTract: Islands of Tractability in Ontology-Based Data Access

EPSRC UK project:

(i) establish a novel, OMQ-centric approach to OBDA aiming to identify islands of tractable OMQs in rich ontology and query languages
(ii) develop uniformly efficient OMQ answering techniques for the identified islands
(iii) implement and test these techniques in practice, using state-of-the-art OBDA systems

Team:

– London: MZ (PI), S Kikot (RA), R Kontchakov (co-I), I Razgon (co-I)
– Liverpool: F Wolter (PI), F Papacchini (RA), A Hernich, B Konev

Project partners:

– University of Bozen-Bolzano (Diego Calvanese)
– University of Bremen (Carsten Lutz)
– University of Oslo (Arild Waaler)
– IBM Watson, New York (Mariano Rodriguez-Muro)
