(More on) Islands of Tractability in Ontology-Based Data Access

Sep 30, 2022


SLIDE 1

(More on) Islands of Tractability in Ontology-Based Data Access

Carsten Lutz, University of Bremen

SLIDE 2

Scientists vs. Users

Scientists: We need scalability. Use these simple ontology languages!

Users: We need WAY more expressivity!

Scientists: Then no scalability, look at this proof!

Users: OUR ontologies do not look like that!

SLIDE 3

Scientists vs. Users

Scientists: We care about logical languages.
Users: We care about actual ontologies.

Observations:
- users insist on using expressive languages with many features
- concrete ontologies from applications tend to have simple structure

SLIDE 4

Islands of Tractability

[Figure: islands of tractability inside an expressive ontology language with coNP data complexity: PTime data complexity, Datalog-rewritable, FO-rewritable, parallelizable]


SLIDE 6

Basic Setup

Ontology-mediated query (OMQ): triple (T, Σ, q) where
- T is a TBox (= ontology)
- Σ is the schema for the data (a subset of the schema of T)
- q is the query, e.g. an atomic query (AQ), conjunctive query (CQ) or UCQ; an AQ takes the form A(x) (≈ tree-shaped CQ)

OMQ language: pair (L, Q) with L a DL (TBox language) and Q a query language, for example (EL, AQ), (ALC, UCQ), etc.

SLIDE 7

Part I: Horn DLs

SLIDE 8

Horn DLs

Horn DLs fit into the Horn fragment of FO / admit a chase procedure.

Two basic Horn DLs: EL and ELI

Concept formation rule:
C, D ::= A | ⊤ | C ⊓ D | ∃r.C | ∃r⁻.C (∃r⁻.C only in ELI)

where A is a concept name (monadic relation), ⊤ is true, ⊓ should be read as ∧, ∃r.C stands for ∃y (r(x, y) ∧ C(y)), and ∃r⁻.C stands for ∃y (r(y, x) ∧ C(y)).

TBoxes: finite sets of inclusions C ⊑ D

Example:
∃manages.Project ⊑ ProjectManager
ProjectManager ⊑ ∃assistedBy.PersonalAssistant

This is roughly: Datalog with arity ≤ 2 and tree-shaped rule bodies plus existential quantification in rule heads (underlies the OWL 2 EL profile).
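To make the correspondence between concept constructors and FO formulas concrete, here is a minimal Python sketch that translates EL/ELI concepts and inclusions into FO strings. The classes `Name`, `Top`, `And`, `Exists` and the functions `to_fo` and `inclusion_to_fo` are hypothetical names chosen for illustration; the translation itself follows the readings given above.

```python
from dataclasses import dataclass
from itertools import count
from typing import Union


@dataclass(frozen=True)
class Name:          # concept name A, a monadic relation
    name: str


@dataclass(frozen=True)
class Top:           # ⊤, always true
    pass


@dataclass(frozen=True)
class And:           # C ⊓ D
    left: "Concept"
    right: "Concept"


@dataclass(frozen=True)
class Exists:        # ∃r.C, or ∃r⁻.C when inverse=True (ELI only)
    role: str
    filler: "Concept"
    inverse: bool = False


Concept = Union[Name, Top, And, Exists]


def to_fo(c: "Concept", var: str = "x", fresh=None) -> str:
    """Translate an EL/ELI concept into an FO formula with free variable `var`."""
    if fresh is None:
        fresh = count(1)  # shared counter, so nested quantifiers get distinct variables
    if isinstance(c, Name):
        return f"{c.name}({var})"
    if isinstance(c, Top):
        return "true"
    if isinstance(c, And):
        return f"({to_fo(c.left, var, fresh)} ∧ {to_fo(c.right, var, fresh)})"
    if isinstance(c, Exists):
        y = f"y{next(fresh)}"
        atom = f"{c.role}({y}, {var})" if c.inverse else f"{c.role}({var}, {y})"
        return f"∃{y} ({atom} ∧ {to_fo(c.filler, y, fresh)})"
    raise TypeError(f"not an EL/ELI concept: {c!r}")


def inclusion_to_fo(lhs: "Concept", rhs: "Concept") -> str:
    """C ⊑ D becomes ∀x (C(x) → D(x))."""
    fresh = count(1)
    return f"∀x ({to_fo(lhs, 'x', fresh)} → {to_fo(rhs, 'x', fresh)})"


print(inclusion_to_fo(Exists("manages", Name("Project")), Name("ProjectManager")))
print(inclusion_to_fo(Name("ProjectManager"), Exists("assistedBy", Name("PersonalAssistant"))))
```

Inclusions with an existential on the right-hand side, as in the second example, are exactly the "existential quantification in rule heads" mentioned above.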

SLIDE 9

Horn DLs, FO, Datalog

OMQs in Horn DLs can be rewritten into a monadic datalog program (though with exponential blowup). This is exploited in practice by systems such as Clipper, Rapid and Requiem.

The most interesting island of tractability is FO-rewritability. In Datalog, FO-rewritability coincides with boundedness.

Theorem [BenediktTenCateColcombetVandenBoomLICS15] Monadic datalog boundedness is 2ExpTime-complete (assuming an unpublished result on cost automata).

Combined with the exponential blowup of the rewriting, we thus obtain only a 3ExpTime upper bound, and no practical algorithms.

CHECK: 2ExpTime because of bounded arity?

SLIDE 10

FO-rewritability

Paradigmatic OMQ in (EL, AQ) that is not FO-rewritable:

TBox: ∃r.A ⊑ A
Query: A(x)
ABox: an r-path starting at a whose last element is labeled A; the label A propagates back along the path, so A(a) is an answer

Non-locality comes from cycles via existentials on the left-hand side. So is non-FO-rewritability the same as the existence of certain syntactic cycles?
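The non-locality can be observed by forward-chaining the single inclusion ∃r.A ⊑ A on r-paths of growing length. The sketch below (hypothetical function names `saturate` and `chain_abox`; the start individual a0 plays the role of a) counts how many rounds are needed before A(a0) is derived. The count grows with the path length, which illustrates, but does not prove, why no FO query with a fixed quantifier depth can capture the answers.

```python
def saturate(roles, concepts):
    """Forward-chain ∃r.A ⊑ A on an ABox.

    roles: set of pairs (x, y) meaning r(x, y); concepts: individuals asserted to be A.
    Returns the derived A-individuals and the number of rounds that added something.
    """
    derived = set(concepts)
    rounds = 0
    while True:
        # ∃r.A ⊑ A: x becomes A as soon as some r-successor of x is A
        new = {x for (x, y) in roles if y in derived and x not in derived}
        if not new:
            return derived, rounds
        derived |= new
        rounds += 1


def chain_abox(n):
    """r-path a0 -> a1 -> ... -> an with A asserted only at the end, A(an)."""
    roles = {(f"a{i}", f"a{i + 1}") for i in range(n)}
    return roles, {f"a{n}"}


for n in (1, 5, 20):
    derived, rounds = saturate(*chain_abox(n))
    print(n, "a0" in derived, rounds)
```

For every path length n, A(a0) is derived, but only after exactly n rounds: the evidence for the answer lies arbitrarily far away from a0.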

SLIDE 11

FO-rewritability

[Figure: ABox a →r · →r · →r ·, with the last element labeled A]

TBox: ∃r.A ⊑ A, ∃r.⊤ ⊑ A
Query: A(x)

An FO-rewriting exists, namely A(x) ∨ ∃y r(x, y), since ∃r.⊤ ⊑ A cancels the non-locality.

Cancelation is the main source of complexity:
- finding cycles in the TBox is trivial (pure syntax)
- cycle cancelations can still occur after exponentially many steps; on these steps, one can simulate a Turing machine
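A quick sanity check of the cancelation example: the sketch below computes certain answers to A(x) under {∃r.A ⊑ A, ∃r.⊤ ⊑ A} by saturation and compares them with a direct evaluation of the FO-rewriting A(x) ∨ ∃y r(x, y) on randomly generated ABoxes. The functions `certain_A`, `fo_rewriting` and `random_abox` are hypothetical helpers; random testing only gives evidence of agreement, not a proof.

```python
import random


def certain_A(individuals, roles, concepts):
    """Certain answers to A(x) under the TBox {∃r.A ⊑ A, ∃r.⊤ ⊑ A}, by saturation."""
    derived = set(concepts)
    changed = True
    while changed:
        changed = False
        for x, y in roles:
            via_top = True          # ∃r.⊤ ⊑ A: any r-successor suffices
            via_A = y in derived    # ∃r.A ⊑ A: the r-successor must already be A
            # via_top already fires on every r-edge, so via_A never adds anything
            # new: this is exactly the cancelation of the non-local cycle
            if (via_top or via_A) and x not in derived:
                derived.add(x)
                changed = True
    return derived & set(individuals)


def fo_rewriting(individuals, roles, concepts):
    """Evaluate A(x) ∨ ∃y r(x, y) directly on the ABox, without TBox reasoning."""
    has_r_successor = {x for x, _ in roles}
    return {a for a in individuals if a in concepts or a in has_r_successor}


def random_abox(rng, n=6, p=0.2):
    individuals = [f"a{i}" for i in range(n)]
    roles = {(x, y) for x in individuals for y in individuals if rng.random() < p}
    concepts = {x for x in individuals if rng.random() < p}
    return individuals, roles, concepts


rng = random.Random(0)
for _ in range(200):
    abox = random_abox(rng)
    assert certain_A(*abox) == fo_rewriting(*abox)
print("rewriting agrees with saturation on 200 random ABoxes")
```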

SLIDE 12

Unraveling Tolerance

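[Figure: an ABox with r-, s- and t-edges around individual a, next to its (infinite) tree unraveling at a]

OMQ (T, Σ, A(x)) is unraveling tolerant if for every Σ-ABox A:
A, T ⊨ A[a] iff A^u_a, T ⊨ A[a]
where A^u_a is the tree-shaped unraveling of A at a.

Theorem [L_WolterKR12] Every OMQ from (ELI, AQ) is unraveling tolerant.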
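Since the unraveling A^u_a is infinite whenever A has a cycle, any concrete construction has to be cut off at some depth. The following sketch (hypothetical names `tail` and `unravel`) builds the depth-bounded unraveling: its nodes are paths starting at a, edges may be followed in both directions as needed for ELI, but a path never walks straight back along the edge it just used, and each copy of an individual inherits that individual's concept names.

```python
def tail(node):
    """Individual at the end of a path."""
    return node[0] if len(node) == 1 else node[-1][2]


def unravel(roles, concepts, root, depth):
    """Depth-bounded tree unraveling of an ABox at `root` (ELI-style).

    roles: set of (r, x, y) for r(x, y); concepts: set of (A, x) for A(x).
    Nodes are paths (root, step_1, ..., step_k) with steps (r, direction, individual).
    Returns the role and concept assertions of the unraveling.
    """
    tree_roles, tree_concepts = set(), set()
    level = [(root,)]
    for d in range(depth + 1):
        next_level = []
        for node in level:
            a = tail(node)
            # every copy of an individual inherits that individual's concept names
            tree_concepts |= {(c, node) for (c, b) in concepts if b == a}
            if d == depth:
                continue  # cut-off: nodes at the maximal depth are not expanded
            for (r, x, y) in roles:
                for direction, src, dst in (("fwd", x, y), ("bwd", y, x)):
                    if src != a:
                        continue
                    if len(node) > 1:
                        prev_role, prev_dir, _ = node[-1]
                        came_from = tail(node[:-1])
                        if prev_role == r and prev_dir != direction and dst == came_from:
                            continue  # would walk back along the same edge
                    child = node + ((r, direction, dst),)
                    next_level.append(child)
                    if direction == "fwd":
                        tree_roles.add((r, node, child))
                    else:
                        tree_roles.add((r, child, node))
        level = next_level
    return tree_roles, tree_concepts


# A directed r-triangle a -> b -> c -> a with A(a)
triangle = {("r", "a", "b"), ("r", "b", "c"), ("r", "c", "a")}
edges, labels = unravel(triangle, {("A", "a")}, "a", 3)
print(len(edges), len(labels))
```

In the triangle, every individual has exactly one forward and one backward neighbour, so the depth-3 unraveling at a consists of two paths of length 3 (6 edges), and A is copied to the root and to the two depth-3 copies of a.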

SLIDE 13

Characterizing Non-Rewritability

Unraveling tolerance enables a characterization of FO-rewritability in Horn DLs.

[Figure: the ABoxes A_1, A_2, A_3, A_4, ... and A_2^0, A_3^0, A_4^0, ...]

Theorem [BienvenuL_WolterIJCAI13] An OMQ (T, Σ, A(x)) in (ELI, AQ) is not FO-rewritable iff there are Σ-ABoxes A_1, A_2, A_3, ... such that for all i ≥ 1: T, A_i ⊨ A(a_0), but T, A_i^0 ⊭ A(a_0).

SLIDE 14

Complexity

Theorem [BienvenuL_WolterIJCAI13] Deciding FO-rewritability is
- PSPACE-complete in (EL, AQ) with full ABox signature
- EXPTIME-complete in (EL, AQ) with unrestricted ABox signature
- EXPTIME-complete in (ELI, AQ) (with full and unrestricted ABox signature)

Via a pumping argument, we can bound the depth of the ABoxes to look at. Worst-case optimal algorithms for deciding FO-rewritability can then be found via automata techniques. This does not suggest a practical approach to constructing rewritings.

SLIDE 15

Constructing FO-Rewritings: Preliminary

Theorem [RossmanJACM08] If an FO-query is preserved under homomorphisms on finite structures, then it is equivalent to a UCQ.

Most OMQs Q are preserved under homomorphisms on ABoxes: if A_1 ⊨ Q[ā] and h : A_1 → A_2 is a homomorphism, then A_2 ⊨ Q[h(ā)].

Corollary In (FO-without-equality, UCQ), every FO-rewritable OMQ has a UCQ-rewriting.
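The preservation property is easy to observe for plain CQs, whose answers are themselves given by homomorphisms from the query into the data. The brute-force sketch below (hypothetical helpers `homomorphisms` and `cq_answers`) evaluates q(x) = ∃y r(x, y) ∧ A(y) on two small ABoxes and checks that every homomorphism A_1 → A_2 maps answers to answers. It covers only CQs without a TBox; preservation for OMQs is the claim of the slide, not something this code checks.

```python
from itertools import product


def homomorphisms(src, tgt):
    """Yield every homomorphism from fact set `src` to fact set `tgt`.

    Facts are tuples (predicate, term, ...). Brute force over all maps of the
    terms of `src`, so only suitable for tiny examples.
    """
    src_terms = sorted({t for f in src for t in f[1:]})
    tgt_terms = sorted({t for f in tgt for t in f[1:]})
    tgt_facts = set(tgt)
    for image in product(tgt_terms, repeat=len(src_terms)):
        h = dict(zip(src_terms, image))
        if all((f[0], *(h[t] for t in f[1:])) in tgt_facts for f in src):
            yield h


def cq_answers(query, answer_var, abox):
    """Answers to a unary CQ: images of the answer variable under homomorphisms."""
    return {h[answer_var] for h in homomorphisms(query, abox)}


# q(x) = ∃y r(x, y) ∧ A(y)
q = [("r", "x", "y"), ("A", "y")]
A1 = [("r", "a", "b"), ("A", "b")]
A2 = [("r", "c", "d"), ("A", "d"), ("r", "d", "d")]

for h in homomorphisms(A1, A2):
    # preservation: every answer over A1 is mapped to an answer over A2
    assert {h[a] for a in cq_answers(q, "x", A1)} <= cq_answers(q, "x", A2)
print(cq_answers(q, "x", A1), cq_answers(q, "x", A2))
```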

SLIDE 16

Constructing FO-Rewritings: Backwards Chaining

Proposed in [KönigLeclereMugnierThomazoRR12] for existential rules, here adapted to (EL, AQ):
- termination for positive cases is guaranteed
- general termination is achievable via tree characterization [HansenL_SeylanWolterIJCAI15]
- problem: the UCQ representation of the rewriting quickly grows out of bounds

TBox: ∃r.A ⊓ ∃r.B ⊑ A0, ∃r.∃s.⊤ ⊑ A0, ∃s.B ⊑ B
Query: A0(x)

[Figure: backward-chaining steps drawn as tree-shaped CQs with r- and s-edges and labels A, B]
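Below is a deliberately simplified sketch of backwards chaining on the example, not the algorithm of [KönigLeclereMugnierThomazoRR12]. Assumptions: every inclusion has a concept name on the right-hand side (as in the example TBox), so existentials in rule heads are not handled; tree-shaped CQs are represented as `(names, edges)` pairs; a new disjunct is dropped if an existing disjunct maps into it (it is subsumed); and `max_rounds` guards against non-termination. The names `tree`, `hom`, `rewrite_once` and `backward_chain` are hypothetical.

```python
def tree(names=(), edges=()):
    """Tree-shaped CQ / EL concept: (concept names at the root, {(role, subtree)})."""
    return (frozenset(names), frozenset(edges))


TOP = tree()


def hom(t1, t2):
    """True iff t1 maps homomorphically into t2, root to root (so t2 implies t1)."""
    names1, edges1 = t1
    names2, edges2 = t2
    return names1 <= names2 and all(
        any(r2 == r1 and hom(c1, c2) for r2, c2 in edges2) for r1, c1 in edges1
    )


def rewrite_once(t, tbox):
    """All one-step rewritings: replace some name A at some node by C, where C ⊑ A."""
    names, edges = t
    out = []
    for a in names:
        for lhs, rhs in tbox:
            if rhs == a:
                # conjoin the rest of the node with the left-hand side C
                out.append(((names - {a}) | lhs[0], edges | lhs[1]))
    for r, child in edges:
        for new_child in rewrite_once(child, tbox):
            out.append((names, (edges - {(r, child)}) | {(r, new_child)}))
    return out


def backward_chain(query, tbox, max_rounds=20):
    """Compute a UCQ rewriting (list of tree CQs) by exhaustive backward chaining."""
    ucq, frontier = [query], [query]
    for _ in range(max_rounds):
        fresh = []
        for t in frontier:
            for t2 in rewrite_once(t, tbox):
                if any(hom(u, t2) for u in ucq):
                    continue  # subsumed by a disjunct we already have
                ucq = [u for u in ucq if not hom(t2, u)] + [t2]
                fresh.append(t2)
        frontier = [t for t in fresh if t in ucq]
        if not frontier:
            return ucq
    raise RuntimeError("no fixpoint within max_rounds")


tbox = [
    (tree(edges=[("r", tree(["A"])), ("r", tree(["B"]))]), "A0"),  # ∃r.A ⊓ ∃r.B ⊑ A0
    (tree(edges=[("r", tree(edges=[("s", TOP)]))]), "A0"),         # ∃r.∃s.⊤ ⊑ A0
    (tree(edges=[("s", tree(["B"]))]), "B"),                       # ∃s.B ⊑ B
]
ucq = backward_chain(tree(["A0"]), tbox)
print(len(ucq))
```

On this TBox the chaining stops after two rounds: rewriting B via ∃s.B ⊑ B yields a CQ that is subsumed by the disjunct for ∃r.∃s.⊤, so the result has three disjuncts (A0(x), the ∃r.A ⊓ ∃r.B tree and the ∃r.∃s.⊤ tree). On larger TBoxes the number of disjuncts can explode, which is the problem mentioned above.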

SLIDE 17

Constructing FO-Rewritings II

[HansenL_SeylanWolterIJCAI15] Backwards chaining can be realized in a decomposed calculus so that
- a (succinct) non-recursive datalog rewriting is produced
- structure sharing helps to avoid thrashing
- optimal ExpTime complexity is achieved

TBox: ∃r.A ⊓ ∃r.B ⊑ A0, ∃s.B ⊑ B
Query: A0(x)

[Figure: decomposed rewriting with separate entries (A0, ·) and (B, ·), built from r- and s-edges and labels A, B]

SLIDE 18

Experiments

The actual rewritings are small (≤ 10 rules) in almost all cases. This confirms that almost all OMQs from practice fall within the island! CQs can be handled similarly, but the complexity (sometimes) goes up.

SLIDE 19

Part II: Non-Horn DLs

SLIDE 20

Expressive DLs

Two basic expressive DLs: ALC and ALCI

Concept formation rule:
C, D ::= A | ⊤ | ¬C | C ⊓ D | ∃r.C | ∃r⁻.C (∃r⁻.C only in ALCI)

Standard first-order semantics of negation. Can also express:
- disjunction C ⊔ D
- universal restrictions ∀r.C and ∀r⁻.C (∀r⁻.C only in ALCI), i.e. ∀y (r(x, y) → C(y)) and ∀y (r(y, x) → C(y))

This is roughly: traditional modal logic or a slight restriction of the two-variable guarded fragment (core of the OWL 2 DL profile).
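The claim that disjunction and universal restrictions come for free once negation is available can be checked by evaluating concepts in a finite interpretation. In this sketch (hypothetical names `ext`, `Or`, `Forall`; concepts are nested tuples), C ⊔ D is defined as ¬(¬C ⊓ ¬D) and ∀r.C as ¬∃r.¬C, exactly following the standard first-order semantics.

```python
def ext(c, dom, concepts, roles):
    """Extension of an ALC/ALCI concept in a finite interpretation.

    Concepts: ("name", A), ("top",), ("not", C), ("and", C, D),
    ("exists", r, C), ("exists_inv", r, C).
    concepts: dict name -> set of elements; roles: dict role -> set of pairs.
    """
    op = c[0]
    if op == "name":
        return set(concepts.get(c[1], set()))
    if op == "top":
        return set(dom)
    if op == "not":
        return set(dom) - ext(c[1], dom, concepts, roles)
    if op == "and":
        return ext(c[1], dom, concepts, roles) & ext(c[2], dom, concepts, roles)
    if op in ("exists", "exists_inv"):
        filler = ext(c[2], dom, concepts, roles)
        pairs = roles.get(c[1], set())
        if op == "exists":
            return {x for (x, y) in pairs if y in filler}  # ∃y (r(x, y) ∧ C(y))
        return {x for (y, x) in pairs if y in filler}      # ∃y (r(y, x) ∧ C(y))
    raise ValueError(f"unknown constructor: {op}")


def Or(c, d):
    """C ⊔ D as ¬(¬C ⊓ ¬D)."""
    return ("not", ("and", ("not", c), ("not", d)))


def Forall(r, c):
    """∀r.C as ¬∃r.¬C."""
    return ("not", ("exists", r, ("not", c)))


dom = {1, 2, 3}
concepts = {"A": {3}}
roles = {"r": {(1, 2), (1, 3), (2, 3)}}
A = ("name", "A")
print(ext(Forall("r", A), dom, concepts, roles))
```

Element 3 has no r-successors and so satisfies ∀r.A vacuously, element 2 only has the A-successor 3, and element 1 fails because of its successor 2.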

SLIDE 21

Expressive DLs: Example

Ontology:
⊤ ⊑ R ⊔ G ⊔ B
R ⊓ ∃r.R ⊑ D    G ⊓ ∃r.G ⊑ D    B ⊓ ∃r.B ⊑ D
R ⊓ G ⊑ D    R ⊓ B ⊑ D    G ⊓ B ⊑ D

Schema for data: single binary relation r (data = graphs)

Query: q() = ∃x D(x)

Expresses non-3-colorability, thus coNP-hard and provably not Datalog-rewritable [AfratiEtAl91].

Relevant islands of tractability include FO- and Datalog-rewritability.
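Why this OMQ expresses non-3-colorability can be made explicit with a brute-force certain-answer check. A minimal sketch, assuming the data is given as a list of nodes and r-edges (the names `entails_exists_D` and `clique` are hypothetical): every model must give each node at least one colour, giving each node exactly one colour is the best way to avoid D, and D can then be avoided iff no r-edge joins two nodes of the same colour. Because the TBox has no existentials on the right-hand side, no anonymous elements need to be considered.

```python
from itertools import combinations, product


def entails_exists_D(nodes, edges):
    """Certain answer to q() = ∃x D(x) for the 3-colouring ontology on a graph."""
    for colours in product("RGB", repeat=len(nodes)):
        col = dict(zip(nodes, colours))
        if all(col[u] != col[v] for u, v in edges):
            return False  # a model in which D is empty exists
    return True


def clique(n):
    """Complete graph on n nodes, one r-edge per unordered pair."""
    return list(range(n)), list(combinations(range(n), 2))


print(entails_exists_D(*clique(3)), entails_exists_D(*clique(4)))
```

The triangle is 3-colourable, so q is not entailed; K4 is not, so every model contains a D-element. The exponential enumeration is, of course, only for illustration.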

SLIDE 22

No Unraveling Tolerance

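Non-Horn DLs are NOT unraveling tolerant:

TBox (x and y are role names here):
∃x.∃y.P ⊓ ∃y.∃x.P ⊑ A0
∃x.∃y.¬P ⊓ ∃y.∃x.¬P ⊑ A0
Query: A0(x)

[Figure: an ABox with x- and y-edges whose meeting point is labeled "P? ¬P?" and which entails A0, next to its unraveling, in which the two copies of that point can be labeled P and ¬P respectively]

Tree-based approaches are not likely to be successful. What can we do? Valuable resource: the CSP connection.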
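The failure of unraveling tolerance can be verified by enumerating all choices of P. The sketch assumes a "diamond" ABox in which an x-edge followed by a y-edge and a y-edge followed by an x-edge both lead from a to the same individual c, and compares it with its forward unraveling to depth 2, where c is split into two copies c1 and c2. The helper names are hypothetical; since the TBox has no existentials on the right-hand side, A0(a) is certain iff every choice of P satisfies one of the two left-hand sides at a.

```python
from itertools import product


def two_step(a, first, second):
    """Individuals reachable from a by a `first`-edge followed by a `second`-edge."""
    mids = {v for (u, v) in first if u == a}
    return {w for (v, w) in second if v in mids}


def entails_A0(dom, x_edges, y_edges, a):
    xy = two_step(a, x_edges, y_edges)
    yx = two_step(a, y_edges, x_edges)
    for bits in product([False, True], repeat=len(dom)):
        P = {d for d, b in zip(dom, bits) if b}
        not_P = set(dom) - P
        lhs1 = bool(xy & P) and bool(yx & P)          # ∃x.∃y.P ⊓ ∃y.∃x.P
        lhs2 = bool(xy & not_P) and bool(yx & not_P)  # ∃x.∃y.¬P ⊓ ∃y.∃x.¬P
        if not (lhs1 or lhs2):
            return False  # a model in which A0(a) is false
    return True


# Diamond: a -x-> b -y-> c and a -y-> d -x-> c
diamond = (["a", "b", "c", "d"], {("a", "b"), ("d", "c")}, {("b", "c"), ("a", "d")})
# Its unraveling at a: the two paths end in different copies c1 and c2
unraveled = (["a", "b", "c1", "d", "c2"], {("a", "b"), ("d", "c2")}, {("b", "c1"), ("a", "d")})
print(entails_A0(*diamond, "a"), entails_A0(*unraveled, "a"))
```

In the diamond, c is either P or ¬P and either way one inclusion fires; in the unraveling, setting P(c1) and ¬P(c2) falsifies both left-hand sides.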

SLIDE 23

OBDA and CSP

A template is a finite relational structure T. CSP(T) is the following problem:
Given: a finite relational structure S
Question: S → T? (is there a homomorphism from S to T?)

We concentrate on binary CSPs: only unary and binary relations.

Theorem [BienvenuTenCateL_WolterPODS13] Every OMQ from (ALCI, BAQ) is equivalent to the complement of a CSP and vice versa.

BAQs: Boolean atomic queries ∃x A(x)
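A brute-force decision procedure for CSP(T) makes the definition concrete. In the sketch below (the name `hom_exists` is hypothetical; relations are given as dicts from relation names to sets of tuples), the template K3 consists of three colours with r relating any two distinct colours, so CSP(K3) is 3-colorability; its complement is what the non-3-colorability OMQ from the expressive DL example computes.

```python
from itertools import product


def hom_exists(S_dom, S_rel, T_dom, T_rel):
    """Decide S → T (membership of S in CSP(T)) by trying every map S_dom -> T_dom.

    Relations: dict relation name -> set of tuples (unary or binary here).
    """
    for image in product(T_dom, repeat=len(S_dom)):
        h = dict(zip(S_dom, image))
        if all(tuple(h[e] for e in t) in T_rel.get(name, set())
               for name, tuples in S_rel.items() for t in tuples):
            return True
    return False


# Template K3: three colours, r relates any two distinct colours
K3_dom = ["R", "G", "B"]
K3_rel = {"r": {(u, v) for u in K3_dom for v in K3_dom if u != v}}

C5 = (list(range(5)), {"r": {(i, (i + 1) % 5) for i in range(5)}})
K4 = (list(range(4)), {"r": {(i, j) for i in range(4) for j in range(4) if i != j}})
print(hom_exists(*C5, K3_dom, K3_rel), hom_exists(*K4, K3_dom, K3_rel))
```

The 5-cycle maps into K3 (it is 3-colourable) while K4 does not, so K4 is a "yes" instance of the complement coCSP(K3).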

SLIDE 24

More On Expressive Power

(ALC, BAQ) ≡ coCSP ≡ Boolean MDDLog w. single EDB in rule body
(ALC, AQ) ≡ multi-template coCSP w. single constant ≡ MDDLog w. single EDB in rule body
(ALC, UCQ) ≡ MDDLog ≡ coMMSNP w. free FO-variables
(GF, UCQ) ≡ frontier-guarded disjunctive Datalog ≡ coGMSNP w. free FO-variables

[BienvenuTenCateL_WolterPODS13]

(Translation blowups annotated on the slide: poly, 1exp / 2exp, poly)

SLIDE 25

On Complexity / Rewritings

For every CSP, there is a binary CSP of the same complexity, up to polytime reductions. Thus studying islands of tractability for OMQs and CSPs is equivalent. For example, (ALC, AQ) has a dichotomy between PTime and coNP iff the Feder-Vardi conjecture holds (a problem for algebraists, it seems).

Two caveats:
1. Classification below PTime is not known to be equivalent!
2. There are important OMQ languages, such as (ALCF, AQ) (counting quantifiers), for which the CSP connection breaks.

Theorem [L_WolterKR12] (ALCF, AQ) contains queries that are coNP-intermediate (unless P = NP).

SLIDE 26

Rewritings: Decidability

Theorem
1. FO-definability of coCSPs is NP-complete. [LaroseLotenTardiffLMCS07]
2. Datalog-definability of coCSPs is NP-complete. [BartoKozikFOCS09, KozikKrokhinValerioteWillardAU14]

Theorem [BienvenuTenCateL_WolterPODS13] FO-rewritability and Datalog-rewritability in (ALCI, BAQ) and (ALCI, AQ) are NEXPTIME-complete.
- The exponential blowup in the translation OMQ ⇒ CSP "materializes"
- Can be lifted to multi-template CSPs with a single constant

SLIDE 27

Constructing Rewritings (in Theory)

FO-Rewritings:
- From the CSP connection and results on homomorphism dualities: if there is an FO-rewriting, then there is a tree-UCQ-rewriting
- Pumping argument: the depth and outdegree of the tree-CQs can be bounded double exponentially
- Enumerate all CQs of these dimensions and check whether they are a rewriting (reduction to query answering)

Datalog-Rewritings:
- If there is a rewriting, then there is one of width at most three [BartoKozikEtAl]
- The canonical width-3 Datalog program of Feder and Vardi is a rewriting iff there is one [SiamJComp98]

More practical / pragmatic approaches (even incomplete ones) are needed!

SLIDE 28

Thank You!

This and related research carried out under ERC Consolidator Grant CODA - Custom-Made Ontology Based Data Access August 2015 - July 2020, University of Bremen