(More on) Islands of Tractability in Ontology-Based Data Access




slide-1
SLIDE 1

Carsten Lutz, University of Bremen

(More on) Islands of Tractability in Ontology-Based Data Access

slide-2
SLIDE 2

Scientists vs. Users

"We need scalability"
"Use these simple ontology languages"
"We need WAY more expressivity"
"Then no scalability, look at this proof"
"OUR ontologies do not look like that"

slide-3
SLIDE 3

Scientists vs. Users

"We care about logical languages"
"We care about actual ontologies"

Observations:

  • users insist on using expressive languages with many features
  • concrete ontologies from applications tend to have simple structure

slide-4
SLIDE 4

Islands of Tractability

Expressive ontology language, coNP data complexity

  • PTime data complexity
  • Datalog rewritable
  • FO rewritable
  • Parallelizable

slide-6
SLIDE 6

Basic Setup

Ontology-mediated query (OMQ): triple (T, Σ, q) where

  • T is TBox (= ontology)
  • Σ is schema for data (subset of schema of T)
  • q is query, e.g. atomic query (AQ, takes form A(x) ≈ tree-shaped CQ) / conjunctive query (CQ) / UCQ

OMQ language: pair (L, Q) with L DL (TBox language) and Q query language, for example (EL, AQ), (ALC, UCQ), etc.

slide-7
SLIDE 7

Part I: Horn DLs

slide-8
SLIDE 8

Horn DLs

Horn DLs fit into the Horn fragment of FO / admit a chase procedure

Two basic Horn DLs: EL and ELI

Concept formation rule: C, D ::= A | ⊤ | C ⊓ D | ∃r.C | ∃r⁻.C

  • A: monadic relation (concept name)
  • ⊤: true
  • C ⊓ D: conjunction (∧)
  • ∃r.C: ∃y r(x, y) ∧ C(y)
  • ∃r⁻.C: ∃y r(y, x) ∧ C(y) (only ELI)

TBoxes: finite sets of inclusions C ⊑ D

Example:

  ∃manages.Project ⊑ ProjectManager
  ProjectManager ⊑ ∃assistedBy.PersonalAssistant

This is roughly: Datalog with arity ≤ 2 and tree-shaped rule bodies plus existential quantification in rule heads (underlies OWL2 EL profile)
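A minimal Python sketch of the chase on the two example inclusions (existence of manages-successor in Project implies ProjectManager; ProjectManager implies an assistedBy-successor in PersonalAssistant). The function name `chase`, the individuals `alice` and `p1`, and the null naming scheme `_n0` are assumptions made for illustration only; fresh nulls stand in for existentially quantified individuals.

```python
from itertools import count

def chase(concepts, roles):
    """Apply the two example inclusions until nothing changes.

    concepts: set of (concept_name, individual)
    roles:    set of (role_name, subject, object)
    """
    concepts, roles = set(concepts), set(roles)
    fresh = count()  # supplies names for fresh nulls (hypothetical scheme)
    changed = True
    while changed:
        changed = False
        # exists manages.Project  is included in  ProjectManager
        for (r, x, y) in list(roles):
            if r == "manages" and ("Project", y) in concepts:
                if ("ProjectManager", x) not in concepts:
                    concepts.add(("ProjectManager", x))
                    changed = True
        # ProjectManager  is included in  exists assistedBy.PersonalAssistant
        for (c, x) in list(concepts):
            if c != "ProjectManager":
                continue
            has_witness = any(
                r == "assistedBy" and s == x and ("PersonalAssistant", o) in concepts
                for (r, s, o) in roles
            )
            if not has_witness:
                null = f"_n{next(fresh)}"
                roles.add(("assistedBy", x, null))
                concepts.add(("PersonalAssistant", null))
                changed = True
    return concepts, roles

facts_c = {("Project", "p1")}
facts_r = {("manages", "alice", "p1")}
derived_c, derived_r = chase(facts_c, facts_r)
```

The existential in the rule head is only fired when no witness exists yet, which is what makes this small chase terminate.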

slide-9
SLIDE 9

Horn DLs, FO, Datalog

OMQs in Horn DLs can be rewritten into monadic datalog program (though with exponential blowup)

Exploited in practice: systems such as Clipper, Rapid, Requiem

Most interesting island of tractability is FO-rewritability

In Datalog, FO-rewritability coincides with boundedness

Theorem [BenediktTenCateColcombetVandenBoomLICS15] Monadic datalog boundedness is 2ExpTime-complete (assuming an unpublished result on cost automata).

We thus obtain only a 3ExpTime upper bound, no practical algorithms

CHECK: 2ExpTime because of bounded arity?

slide-10
SLIDE 10

FO-rewritability

Paradigmatic OMQ in (EL, AQ) that is not FO-rewritable:

TBox: ∃r.A ⊑ A

Query: A(x)

ABox: [Figure: r-chain starting at a and ending in a node labeled A; A propagates back along the chain, giving answer A(a)]

Non-locality comes from cycles via existentials on the left-hand side.

So non-FO-rewritability = existence of certain syntactic cycles?
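The non-locality can be made concrete with a short Python sketch, under the assumption that the ABox is an r-chain of length n whose last element is labeled A (the function name `rounds_to_derive` is hypothetical). The number of rule applications needed to reach A(a) grows with n, so no fixed-depth first-order query can capture it.

```python
def rounds_to_derive(n):
    """r-chain 0 -> 1 -> ... -> n with A(n); TBox: exists r.A included in A.

    Returns the number of fixpoint rounds before A(0) is derived.
    """
    r = {(i, i + 1) for i in range(n)}
    a = {n}
    rounds = 0
    while 0 not in a:
        # one round of the rule A(x) <- r(x, y), A(y)
        a |= {x for (x, y) in r if y in a}
        rounds += 1
    return rounds

depths = [rounds_to_derive(n) for n in (1, 5, 50)]
```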

slide-11
SLIDE 11

FO-rewritability

TBox: ∃r.A ⊑ A, ∃r.⊤ ⊑ A

Query: A(x)

[Figure: r-chain starting at a; every node with an r-successor is labeled A]

FO-rewriting exists since ∃r.⊤ ⊑ A cancels non-locality: A(x) ∨ ∃y r(x, y)

Cancelation is main source of complexity:

  • finding cycles in TBox is trivial (pure syntax)
  • cycle cancelations can still occur after exponentially many steps

On these steps, one can simulate a Turing machine
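A small Python sanity check of the cancelation, as a sketch: it compares the fixpoint semantics of the two inclusions (exists r.A included in A, exists r.⊤ included in A) with direct evaluation of the rewriting A(x) ∨ ∃y r(x, y) on random ABoxes. The function names and the random-data setup are assumptions for illustration.

```python
import random

def certain_answers(a_facts, r_facts):
    """Fixpoint for the two inclusions and query A(x)."""
    a = set(a_facts)
    while True:
        new = {x for (x, y) in r_facts}              # exists r.T included in A
        new |= {x for (x, y) in r_facts if y in a}   # exists r.A included in A
        if new <= a:
            return a
        a |= new

def fo_rewriting(a_facts, r_facts):
    """Evaluate A(x) or (exists y. r(x, y)) directly on the data."""
    return set(a_facts) | {x for (x, y) in r_facts}

random.seed(0)
agree = True
for _ in range(200):
    nodes = range(6)
    r_facts = {(random.choice(nodes), random.choice(nodes)) for _ in range(5)}
    a_facts = {x for x in nodes if random.random() < 0.3}
    agree = agree and certain_answers(a_facts, r_facts) == fo_rewriting(a_facts, r_facts)
```

Agreement on random data is of course only evidence, not a proof that the rewriting is correct.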

slide-12
SLIDE 12

Unraveling Tolerance

[Figure: ABox 𝒜 with individuals a, b and role edges r, s, t, next to its unraveling 𝒜^u_a into a tree rooted at a]

OMQ (T, Σ, A(x)) is unraveling tolerant if for every Σ-ABox 𝒜:

  𝒜, T ⊨ A[a] iff 𝒜^u_a, T ⊨ A[a]

Theorem [L_WolterKR12] Every OMQ from (ELI, AQ) is unraveling tolerant.
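A rough Python sketch of unraveling an ABox into a tree, under simplifying assumptions: the unraveling is cut off at a given depth (the full one is infinite for cyclic ABoxes) and only forward role edges are followed, which ignores the inverse roles of ELI. The function name `unravel` and the example ABox are hypothetical.

```python
def unravel(concepts, roles, root, depth):
    """Depth-bounded tree unraveling of an ABox from `root`.

    Nodes of the result are paths (tuples) starting at `root`; only
    forward role edges are followed, which is a simplification.
    """
    tree_concepts, tree_roles = set(), set()
    frontier = [(root,)]
    for level in range(depth + 1):
        next_frontier = []
        for path in frontier:
            last = path[-1]
            # a path inherits the concept labels of its last individual
            tree_concepts |= {(c, path) for (c, x) in concepts if x == last}
            if level == depth:
                continue
            for (r, x, y) in roles:
                if x == last:
                    child = path + (r, y)
                    tree_roles.add((r, path, child))
                    next_frontier.append(child)
        frontier = next_frontier
    return tree_concepts, tree_roles

# Cyclic ABox: r(a, b), s(b, a), A(b)
tc, tr = unravel({("A", "b")}, {("r", "a", "b"), ("s", "b", "a")}, "a", 3)
```

Each visit to an individual along a different path yields a separate copy, so the cycle between a and b is unfolded into a chain.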

slide-13
SLIDE 13

Characterizing Non-Rewritability

Unraveling tolerance enables characterization of FO-rewritability in Horn DLs.

Theorem [BienvenuL_WolterIJCAI13] OMQ (T, Σ, A(x)) in (ELI, AQ) is not FO-rewritable iff there are Σ-ABoxes 𝒜1, 𝒜2, 𝒜3, 𝒜4, · · · such that for all i ≥ 1:

  T, 𝒜i ⊨ A(a0), but T, 𝒜′i ⊭ A(a0)

[Figure: ABoxes 𝒜1 to 𝒜4, labeled 1 to 4, each shown with its counterpart 𝒜′1 to 𝒜′4]

slide-14
SLIDE 14

Complexity

Theorem [BienvenuL_WolterIJCAI13] Deciding FO-rewritability is

  • PSPACE-complete in (EL, AQ) with full ABox signature
  • EXPTIME-complete in (EL, AQ) with unrestricted ABox signature
  • EXPTIME-complete in (ELI, AQ) (with full and unrestricted ABox signature)

Via a pumping argument, we can bound the depth of the ABoxes to look at

Worst case optimal algorithms for deciding FO-rewritability can then be found via automata techniques

Does not suggest practical approach to construct rewritings

slide-15
SLIDE 15

Constructing FO-Rewritings: Preliminary

Most OMQs Q preserved under homomorphisms on ABoxes: if A1 ⊨ Q[ā] and h : A1 → A2 homomorphism, then A2 ⊨ Q[h(ā)]

Theorem [RossmanJACM08] If an FO-query is preserved under homomorphisms on finite structures, then it is equivalent to a UCQ.

Corollary In (FO-without-equality, UCQ), every FO-rewritable OMQ has a UCQ-rewriting.

slide-16
SLIDE 16

Constructing FO-Rewritings: Backwards Chaining

Proposed in [KönigLeclereMugnierThomazoRR12] for existential rules, here adapted to (EL, AQ):

TBox: ∃r.A ⊓ ∃r.B ⊑ A0, ∃r.∃s.⊤ ⊑ A0, ∃s.B ⊑ B

Query: A0(x)

[Figure: tree-shaped queries produced by backwards chaining, with r-edges to nodes labeled A and B and s-edges below]

Termination for positive cases guaranteed, general termination achievable via tree characterization [HansenL_SeylanWolterIJCAI15]

Problem: UCQ representation of rewriting quickly grows out of bounds

slide-17
SLIDE 17

Constructing FO-Rewritings II

[HansenL_SeylanWolterIJCAI15] Backwards chaining can be realized in decomposed calculus so that

  • structure sharing helps to avoid thrashing
  • a (succinct) non-recursive datalog rewriting is produced
  • optimal ExpTime complexity is achieved

TBox: ∃r.A ⊓ ∃r.B ⊑ A0, ∃s.B ⊑ B

Query: A0(x)

[Figure: decomposed representation as pairs (A0, ...) and (B, ...), each pairing a concept name with a tree-shaped query over r, s, A, B]

slide-18
SLIDE 18

Experiments

The actual rewritings are small (≤ 10 rules) in almost all cases

Confirms that almost all OMQs from practice fall within island!

CQs can be handled similarly, but complexity goes up (sometimes)

slide-19
SLIDE 19

Part II: Non-Horn DLs

slide-20
SLIDE 20

Expressive DLs

Two basic expressive DLs: ALC and ALCI

Concept formation rule: C, D ::= A | ⊤ | ¬C | C ⊓ D | ∃r.C | ∃r⁻.C (∃r⁻.C only ALCI)

Standard first-order semantics of negation

Can also express:

  • disjunction C ⊔ D
  • universal restriction ∀r.C: ∀y (r(x, y) → C(y))
  • universal restriction ∀r⁻.C: ∀y (r(y, x) → C(y)) (only ALCI)

This is roughly: traditional modal logic or a slight restriction of the two-variable guarded fragment (core of OWL2 DL profile)

slide-21
SLIDE 21

Expressive DLs: Example

Schema for data: single binary relation r (data = graphs)

Ontology:

  ⊤ ⊑ R ⊔ G ⊔ B
  R ⊓ ∃r.R ⊑ D
  G ⊓ ∃r.G ⊑ D
  B ⊓ ∃r.B ⊑ D
  R ⊓ G ⊑ D
  R ⊓ B ⊑ D
  G ⊓ B ⊑ D

Query: q() = ∃x D(x)

Expresses non-3-colorability, thus coNP-hard and provably not Datalog-rewritable [AfratiEtAl91]

Relevant islands of tractability include FO- and Datalog-rewritability
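A brute-force Python sketch of why this OMQ expresses non-3-colorability: a model that avoids D must give every node exactly one of R, G, B and never the same color to both ends of an r-edge, so such a countermodel exists exactly when the graph has a proper 3-coloring. The function name `omq_answer` and the example graphs are assumptions for illustration; the search is exponential and only meant for tiny inputs.

```python
from itertools import product

def omq_answer(nodes, edges):
    """True iff D is nonempty in every model, i.e. no proper 3-coloring exists."""
    nodes = sorted(nodes)
    for colors in product("RGB", repeat=len(nodes)):
        coloring = dict(zip(nodes, colors))
        # a countermodel avoids D: no r-edge joins two nodes of the same color
        if all(coloring[x] != coloring[y] for (x, y) in edges):
            return False
    return True

triangle = {(0, 1), (1, 2), (2, 0)}
k4 = {(i, j) for i in range(4) for j in range(4) if i < j}
```

On the triangle the answer is false (it is 3-colorable), while on the complete graph with four nodes the answer is true.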

slide-22
SLIDE 22

No Unraveling Tolerance

Non-Horn DLs are NOT unraveling tolerant:

TBox: ∃x.∃y.P ⊓ ∃y.∃x.P ⊑ A0, ∃x.∃y.¬P ⊓ ∃y.∃x.¬P ⊑ A0

Query: A0(x)

[Figure: an ABox over roles x and y whose shared node is either P or ¬P (P? ¬P?), next to its unraveling with separate nodes labeled P and ¬P]

Tree-based approaches not likely to be successful. What can we do?

Valuable resource: CSP-connection

slide-23
SLIDE 23

OBDA and CSP

A template is a finite relational structure T. CSP(T) is:

  Given: finite relational structure S
  Question: T ← S?

We concentrate on binary CSPs: only unary and binary relations

Theorem [BienvenuTenCateL_WolterPODS13] Every OMQ from (ALCI, BAQ) is equivalent to the complement of a CSP and vice versa.

BAQs: Boolean atomic queries ∃x A(x)
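The question T ← S asks for a homomorphism from S to T. A brute-force Python sketch for structures with a single binary relation follows; the function name `has_homomorphism` and the example structures are assumptions for illustration, and the search is exponential in the size of S.

```python
from itertools import product

def has_homomorphism(s_nodes, s_edges, t_nodes, t_edges):
    """Brute-force test for a homomorphism S -> T (single binary relation)."""
    s_nodes, t_nodes = sorted(s_nodes), sorted(t_nodes)
    for image in product(t_nodes, repeat=len(s_nodes)):
        h = dict(zip(s_nodes, image))
        # every edge of S must be mapped to an edge of T
        if all((h[x], h[y]) in t_edges for (x, y) in s_edges):
            return True
    return False

# Template K3 (symmetric triangle): CSP(K3) is 3-colorability
k3 = {(i, j) for i in range(3) for j in range(3) if i != j}
cycle5 = {(i, (i + 1) % 5) for i in range(5)}
k4 = {(i, j) for i in range(4) for j in range(4) if i != j}
```

With K3 as template, the 5-cycle maps into it while the complete graph on four nodes does not.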

slide-24
SLIDE 24

More On Expressive Power

[BienvenuTenCateL_WolterPODS13]

  • (ALC, BAQ) | coCSP | Boolean MDDLog w. single EDB in rule body
  • (ALC, AQ) | multi-template coCSP w. single constant | MDDLog w. single EDB in rule body
  • (ALC, UCQ) | coMMSNP w free FO-variables | MDDLog
  • (GF, UCQ) | coGMSNP w free FO-variables | Frontier-guarded disjunctive Datalog

Translation costs annotated in the diagram: poly, 1exp / 2exp, poly

slide-25
SLIDE 25

On Complexity / Rewritings

Thus studying islands of tractability for OMQs and CSPs is equivalent

For example, (ALC, AQ) has dichotomy between PTime and coNP iff the Feder-Vardi conjecture holds (a problem for algebraists, it seems)

Two caveats:

  • For every CSP, there is a binary CSP of the same complexity, up to polytime reductions. But classification below PTime not known to be equivalent!
  • There are important OMQ languages such as (ALCF, AQ) for which CSP connection breaks (ALCF: counting quantifiers)

Theorem [L_WolterKR12] (ALCF, AQ) contains queries that are coNP-intermediate (unless P=NP)

slide-26
SLIDE 26

Rewritings: Decidability

Theorem

  • 1. FO-definability of coCSPs is NP-complete. [LaroseLotenTardiffLMCS07]
  • 2. Datalog-definability of coCSPs is NP-complete. [BartoKozikFOCS09, KozikKrokhinValerioteWillardAU14]

Can be lifted to multi-template CSPs with single constant

Exponential blowup in translation OMQ => CSP “materializes”

Theorem [BienvenuTenCateL_WolterPODS13] FO-rewritability and Datalog-rewritability in (ALCI, BAQ) and (ALCI, AQ) is NEXPTIME-complete.

slide-27
SLIDE 27

Constructing Rewritings (in Theory)

FO-Rewritings:

  • From CSP-connection and results on homomorphism dualities: if there is an FO-rewriting, then there is a tree-UCQ-rewriting
  • Pumping argument: depth and outdegree of tree-CQs can be bounded double exponentially
  • Enumerate all CQs of these dimensions, check whether they are rewriting (red. to query answering)

Datalog-Rewritings:

  • If there is a rewriting, then there is one of width at most three [BartoKozikEtAl]
  • Canonical width-3 Datalog program of Feder and Vardi is a rewriting iff there is one [SiamJComp98]

More practical / pragmatic approaches (even incomplete) needed!

slide-28
SLIDE 28

Thank You!

This and related research carried out under ERC Consolidator Grant CODA (Custom-Made Ontology Based Data Access), August 2015 - July 2020, University of Bremen