Carsten Lutz, University of Bremen
(More on) Islands of Tractability in Ontology-Based Data Access
Scientists vs. Users
"We need scalability"
"Use these simple ontology languages"
"We need WAY more expressivity"
"Then no scalability, look at this proof"
"OUR ontologies do not look like that"
Scientists vs. Users
Observations:
- users insist on using expressive languages with many features
- concrete ontologies from applications tend to have simple structure
We care about logical languages
We care about actual ontologies
Islands of Tractability
Expressive ontology language, coNP data complexity
Islands within it:
- PTime data complexity
- Datalog rewritable
- FO rewritable
- Parallelizable
Basic Setup
Ontology-mediated query (OMQ): triple (T, Σ, q) where
- T is TBox (= ontology)
- Σ is schema for data (subset of schema of T)
- q is query, e.g. atomic query (AQ) / conjunctive query (CQ) / UCQ
  (an AQ takes the form A(x), ≈ tree-shaped CQ)
OMQ language: pair (L, Q) with L DL (TBox language) and Q query language,
for example (EL, AQ), (ALC, UCQ), etc.
Part I: Horn DLs
Horn DLs
Horn DLs fit into the Horn fragment of FO / admit a chase procedure
Two basic Horn DLs: EL and ELI (these underlie the OWL2 EL profile)
Concept formation rule:
C, D ::= A | ⊤ | C ⊓ D | ∃r.C | ∃r⁻.C
- A: monadic relation (concept name)
- ⊤: true
- ⊓: should be ∧
- ∃r.C: ∃y r(x, y) ∧ C(y)
- ∃r⁻.C: ∃y r(y, x) ∧ C(y) (only ELI)
TBoxes: finite sets of inclusions C ⊑ D
Example:
∃manages.Project ⊑ ProjectManager
ProjectManager ⊑ ∃assistedBy.PersonalAssistant
This is roughly: Datalog with arity ≤ 2 and tree-shaped rule bodies plus existential quantification in rule heads
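To make the "chase procedure" remark concrete, here is a minimal sketch that chases the two example inclusions only. The function name `chase`, the fact encoding as tuples, and the null names `_n0`, `_n1`, ... are assumptions made for illustration; this is not a general EL reasoner.

```python
from itertools import count


def chase(concepts, roles):
    """Chase for the two example inclusions only (illustrative sketch).

    concepts: set of (concept_name, individual)
    roles:    set of (role_name, source, target)
    """
    concepts, roles = set(concepts), set(roles)
    fresh = count()  # supplies names for labelled nulls (hypothetical naming)
    changed = True
    while changed:
        changed = False
        # ∃manages.Project ⊑ ProjectManager: a plain Datalog rule
        for (r, x, y) in list(roles):
            if (r == "manages" and ("Project", y) in concepts
                    and ("ProjectManager", x) not in concepts):
                concepts.add(("ProjectManager", x))
                changed = True
        # ProjectManager ⊑ ∃assistedBy.PersonalAssistant: existential head
        for (c, x) in list(concepts):
            if c != "ProjectManager":
                continue
            witnessed = any(r == "assistedBy" and s == x
                            and ("PersonalAssistant", t) in concepts
                            for (r, s, t) in roles)
            if not witnessed:
                null = f"_n{next(fresh)}"
                roles.add(("assistedBy", x, null))
                concepts.add(("PersonalAssistant", null))
                changed = True
    return concepts, roles
```

The first rule only adds facts about existing individuals, while the second introduces a fresh element; this is the "existential quantification in rule heads" that separates Horn DLs from plain Datalog.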
Horn DLs, FO, Datalog
OMQs in Horn DLs can be rewritten into monadic datalog programs (though with exponential blowup)
Exploited in practice: systems such as Clipper, Rapid, Requiem
Most interesting island of tractability is FO-rewritability
In Datalog, FO-rewritability coincides with boundedness
Theorem [BenediktTenCateColcombetVandenBoomLICS15]
Monadic datalog boundedness is 2ExpTime-complete (assuming an unpublished result on cost automata).
We thus obtain only a 3ExpTime upper bound, no practical algorithms
CHECK: 2ExpTime because of bounded arity?
FO-rewritability
Paradigmatic OMQ in (EL, AQ) that is not FO-rewritable:
TBox: ∃r.A ⊑ A
Query: A(x)
ABox: [Figure: an r-chain starting at a and ending in an element labeled A; A is derived at every element of the chain, giving the answer A(a)]
Non-locality comes from cycles via existentials on the left-hand side.
So non-FO-rewritability = existence of certain syntactic cycles?
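The non-locality can be observed by evaluating the TBox ∃r.A ⊑ A as the Datalog rule A(x) ← r(x, y), A(y). The sketch below (function names `saturate` and `chain` are illustrative assumptions) counts how many rounds the fixpoint needs on an r-chain: the number grows with the length of the chain, so no fixed-radius inspection around a can decide A(a).

```python
def saturate(a_facts, r_edges):
    """Least fixpoint of the rule A(x) <- r(x, y), A(y).

    Returns the derived A-set and the number of rounds that added facts.
    """
    derived = set(a_facts)
    rounds = 0
    while True:
        new = {x for (x, y) in r_edges if y in derived} - derived
        if not new:
            return derived, rounds
        derived |= new
        rounds += 1


def chain(n):
    """r-chain 0 -> 1 -> ... -> n with A asserted only at the last element."""
    return {n}, {(i, i + 1) for i in range(n)}
```

On `chain(n)` the answer A(0) is obtained only after n rounds, and dropping the single A-fact at the far end removes it again.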
FO-rewritability
TBox: ∃r.A ⊑ A, ∃r.⊤ ⊑ A
Query: A(x)
[Figure: the same r-chain starting at a, with A derived at every element that has an r-successor]
FO-rewriting exists since ∃r.⊤ ⊑ A cancels non-locality:
A(x) ∨ ∃y r(x, y)
Cancelation is main source of complexity:
- finding cycles in TBox is trivial (pure syntax)
- cycle cancelations can still occur after exponentially many steps
- on these steps, one can simulate a Turing machine
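As a sanity check of the rewriting A(x) ∨ ∃y r(x, y) for the TBox {∃r.A ⊑ A, ∃r.⊤ ⊑ A}, the following sketch compares it against forward saturation on every ABox over a few individuals. All function names are assumptions for illustration, and the exhaustive comparison is evidence on small instances, not a proof.

```python
from itertools import combinations, product


def certain_answers(a_facts, r_edges):
    """Least fixpoint for the TBox {∃r.A ⊑ A, ∃r.⊤ ⊑ A}."""
    derived = set(a_facts) | {x for (x, _) in r_edges}      # ∃r.⊤ ⊑ A
    while True:
        new = {x for (x, y) in r_edges if y in derived} - derived  # ∃r.A ⊑ A
        if not new:
            return derived
        derived |= new


def rewriting(a_facts, r_edges):
    """Evaluate A(x) ∨ ∃y r(x, y) directly on the ABox."""
    return set(a_facts) | {x for (x, _) in r_edges}


def all_aboxes(n):
    """Every ABox over individuals 0..n-1 with one concept A and one role r."""
    inds = range(n)
    pairs = list(product(inds, inds))
    for k in range(n + 1):
        for a_facts in combinations(inds, k):
            for mask in range(2 ** len(pairs)):
                edges = {p for i, p in enumerate(pairs) if mask >> i & 1}
                yield set(a_facts), edges
```

The recursive rule never fires after the first step: any x it could add already has an r-successor and is therefore covered by ∃r.⊤ ⊑ A.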
Unraveling Tolerance
[Figure: an ABox A with individuals a and b and role edges r, s, t, next to its tree-shaped unraveling A^u_a at a]
OMQ (T, Σ, A(x)) is unraveling tolerant if for every Σ-ABox A:
A, T ⊨ A[a] iff A^u_a, T ⊨ A[a]
Theorem [L_WolterKR12]
Every OMQ from (ELI, AQ) is unraveling tolerant.
Characterizing Non-Rewritability
Unraveling tolerance enables characterization of FO-rewritability in Horn DLs.
Theorem [BienvenuL_WolterIJCAI13]
OMQ (T, Σ, A(x)) in (ELI, AQ) is not FO-rewritable iff there are Σ-ABoxes A1, A2, A3, A4, ...
such that for all i ≥ 1: T, Ai ⊨ A(a0), but T, A′i ⊭ A(a0)
[Figure: the ABoxes A1, A2, A3, A4, ... with depths marked 1, 2, 3, 4, and the corresponding ABoxes A′1, A′2, A′3, A′4]
Complexity
Theorem [BienvenuL_WolterIJCAI13]
Deciding FO-rewritability is
- PSPACE-complete in (EL, AQ) with full ABox signature
- EXPTIME-complete in (EL, AQ) with unrestricted ABox signature
- EXPTIME-complete in (ELI, AQ) (with full and unrestricted ABox signature)
Via a pumping argument, we can bound the depth of the ABoxes to look at
Worst case optimal algorithms for deciding FO-rewritability can then be found via automata techniques
Does not suggest practical approach to construct rewritings
Constructing FO-Rewritings: Preliminary
Most OMQs Q preserved under homomorphisms on ABoxes:
if A1 ⊨ Q[ā] and h : A1 → A2 is a homomorphism, then A2 ⊨ Q[h(ā)]
Theorem [RossmanJACM08]
If an FO-query is preserved under homomorphisms on finite structures, then it is equivalent to a UCQ.
Corollary
In (FO-without-equality, UCQ), every FO-rewritable OMQ has a UCQ-rewriting.
Constructing FO-Rewritings: Backwards Chaining
Proposed in [KönigLeclereMugnierThomazoRR12] for existential rules, here adapted to (EL, AQ):
TBox: ∃r.A ⊓ ∃r.B ⊑ A0, ∃r.∃s.⊤ ⊑ A0, ∃s.B ⊑ B
Query: A0(x)
[Figure: tree-shaped CQs produced by backwards chaining from A0, with r-edges to nodes labeled A and B and s-edges below B]
Termination for positive cases guaranteed, general termination achievable via tree characterization [HansenL_SeylanWolterIJCAI15]
Problem: UCQ representation of rewriting quickly grows out of bounds
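For the TBox {∃r.A ⊓ ∃r.B ⊑ A0, ∃r.∃s.⊤ ⊑ A0, ∃s.B ⊑ B}, backwards chaining from A0(x) unfolds ∃s.B ⊑ B into ever longer s-chains, but each of the resulting CQs contains r(x, y) ∧ s(y, z) and is therefore subsumed by the CQ obtained from ∃r.∃s.⊤ ⊑ A0. The sketch below checks a hand-derived candidate UCQ (not taken from the slides) against forward saturation on all ABoxes over a small domain; function names and the fact encoding are assumptions.

```python
from itertools import product


def certain_a0(A, B, A0, r, s):
    """Answers to A0(x) under the three inclusions, by forward saturation."""
    b = set(B)
    while True:                                   # ∃s.B ⊑ B
        new = {x for (x, y) in s if y in b} - b
        if not new:
            break
        b |= new
    has_s = {x for (x, _) in s}
    a0 = set(A0)
    a0 |= {x for (x, y) in r if y in has_s}       # ∃r.∃s.⊤ ⊑ A0
    r_to_a = {x for (x, y) in r if y in A}
    r_to_b = {x for (x, y) in r if y in b}
    a0 |= r_to_a & r_to_b                         # ∃r.A ⊓ ∃r.B ⊑ A0
    return a0


def ucq(A, B, A0, r, s):
    """Candidate UCQ with three disjuncts:
    A0(x), r(x,y) ∧ A(y) ∧ r(x,z) ∧ B(z), and r(x,y) ∧ s(y,z)."""
    has_s = {x for (x, _) in s}
    q2 = {x for (x, y) in r if y in A} & {x for (x, z) in r if z in B}
    q3 = {x for (x, y) in r if y in has_s}
    return set(A0) | q2 | q3


def powerset(items):
    items = list(items)
    for mask in range(2 ** len(items)):
        yield {x for i, x in enumerate(items) if mask >> i & 1}


def agree_on_all_aboxes(n):
    """Compare saturation and UCQ on every ABox over individuals 0..n-1."""
    inds = range(n)
    pairs = list(product(inds, inds))
    for A, B, A0 in product(list(powerset(inds)), repeat=3):
        for r in powerset(pairs):
            for s in powerset(pairs):
                if certain_a0(A, B, A0, r, s) != ucq(A, B, A0, r, s):
                    return False
    return True
```

The second disjunct only needs the asserted B-facts: whenever B(z) is merely derived, z has an s-successor and the third disjunct already applies.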
Constructing FO-Rewritings II
[HansenL_SeylanWolterIJCAI15]
Backwards chaining can be realized in decomposed calculus so that
- structure sharing helps to avoid thrashing
- a (succinct) non-recursive datalog rewriting is produced
- optimal ExpTime complexity is achieved
TBox: ∃r.A ⊓ ∃r.B ⊑ A0, ∃s.B ⊑ B
Query: A0(x)
[Figure: the pairs (A0, tree with two r-edges to nodes labeled A and B) and (B, tree with an s-edge to a node labeled B)]
Experiments
The actual rewritings are small (≤ 10 rules) in almost all cases
Confirms that almost all OMQs from practice fall within island!
CQs can be handled similarly, but complexity goes up (sometimes)
Part II: Non-Horn DLs
Expressive DLs
Two basic expressive DLs: ALC and ALCI (core of OWL2 DL profile)
Concept formation rule:
C, D ::= A | ⊤ | ¬A | C ⊓ D | ∃r.C | ∃r⁻.C (∃r⁻.C only ALCI)
Standard first-order semantics of negation
Can also express:
- disjunction C ⊔ D
- universal restriction ∀r.C and ∀r⁻.C (only ALCI):
  ∀y (r(x, y) → C(y)) and ∀y (r(y, x) → C(y))
This is roughly: traditional modal logic or a slight restriction of the two-variable guarded fragment
Expressive DLs: Example
Schema for data: single binary relation r (data = graphs)
Ontology:
⊤ ⊑ R ⊔ G ⊔ B
R ⊓ ∃r.R ⊑ D    G ⊓ ∃r.G ⊑ D    B ⊓ ∃r.B ⊑ D
R ⊓ G ⊑ D    R ⊓ B ⊑ D    G ⊓ B ⊑ D
Query: q() = ∃x D(x)
Expresses non-3-colorability, thus coNP-hard and provably not Datalog-rewritable [AfratiEtAl91]
Relevant islands of tractability include FO- and Datalog-rewritability
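The correspondence with 3-colorability can be made explicit by searching for a counter-model: the query ∃x D(x) fails to be entailed exactly when the nodes can be labeled with R, G, B so that no inclusion forces D. The brute-force sketch below (function name `entails_D` and the graph encoding are assumptions) is exponential and only meant to illustrate the reduction.

```python
from itertools import product


def entails_D(nodes, edges):
    """True iff every model of the ontology and the data satisfies ∃x D(x)."""
    nodes = list(nodes)
    for colours in product("RGB", repeat=len(nodes)):
        colour = dict(zip(nodes, colours))
        # A model avoiding D needs exactly one colour per node and no
        # r-edge between equally coloured nodes.
        if all(colour[u] != colour[v] for (u, v) in edges):
            return False      # counter-model found: D is not entailed
    return True
```

For a triangle a proper 3-coloring exists, so the query is not entailed; for the complete graph on four nodes none exists, so it is.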
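No Unraveling Tolerance
Non-Horn DLs are NOT unraveling tolerant:
TBox: ∃x.∃y.P ⊓ ∃y.∃x.P ⊑ A0, ∃x.∃y.¬P ⊓ ∃y.∃x.¬P ⊑ A0
Query: A0(x)
[Figure: an ABox with x- and y-edges and its unraveling, with the query node marked A0; the shared end element is labeled "P? ¬P?" in the ABox, while the unraveling has separate end elements labeled P and ¬P]
Tree-based approaches not likely to be successful. What can we do?
Valuable resource: CSP-connection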
OBDA and CSP
A template is a finite relational structure T. CSP(T) is:
Given: finite relational structure S
Question: T ← S? (is there a homomorphism from S to T?)
We concentrate on binary CSPs: only unary and binary relations
Theorem [BienvenuTenCateL_WolterPODS13]
Every OMQ from (ALCI, BAQ) is equivalent to the complement of a CSP and vice versa.
BAQs: Boolean atomic queries ∃x A(x)
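The decision problem CSP(T) asks for a homomorphism from the input structure S to the template T. A brute-force sketch follows, assuming a structure is encoded as a pair (domain, relations) where relations maps each relation name to a set of tuples; the encoding and the name `has_homomorphism` are illustrative choices.

```python
from itertools import product


def has_homomorphism(S, T):
    """Brute-force test for a homomorphism from structure S to template T."""
    s_dom, s_rel = S
    t_dom, t_rel = T
    s_dom, t_dom = list(s_dom), list(t_dom)
    for image in product(t_dom, repeat=len(s_dom)):
        h = dict(zip(s_dom, image))
        # h is a homomorphism if it maps every tuple of S into the
        # relation of the same name in T.
        if all(tuple(h[x] for x in tup) in t_rel.get(name, set())
               for name, tuples in s_rel.items() for tup in tuples):
            return True
    return False
```

With the template K3 (three elements, r holding between all distinct pairs), CSP(T) is 3-colorability, whose complement corresponds to the non-3-colorability OMQ mentioned earlier in the deck.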
More On Expressive Power
[BienvenuTenCateL_WolterPODS13]
(ALC, BAQ) = coCSP = Boolean MDDLog w. single EDB in rule body
(ALC, AQ) = multi-template coCSP w. single constant = MDDLog w. single EDB in rule body
(ALC, UCQ) = coMMSNP w free FO-variables = MDDLog
(GF, UCQ) = coGMSNP w free FO-variables = Frontier-guarded disjunctive Datalog
Translation costs annotated in the diagram: poly, 1exp / 2exp, poly
On Complexity / Rewritings
Thus studying islands of tractability for OMQs and CSPs is equivalent
For example, (ALC, AQ) has dichotomy between PTime and coNP iff the Feder-Vardi conjecture holds (a problem for algebraists, it seems)
Two caveats:
- For every CSP, there is a binary CSP of the same complexity, up to polytime reductions.
  But classification below PTime not known to be equivalent!
- There are important OMQ languages such as (ALCF, AQ) (counting quantifiers) for which CSP connection breaks
Theorem [L_WolterKR12]
(ALCF, AQ) contains queries that are coNP-intermediate (unless P=NP)
Rewritings: Decidability
Theorem
- 1. FO-definability of coCSPs is NP-complete. [LaroseLotenTardiffLMCS07]
- 2. Datalog-definability of coCSPs is NP-complete. [BartoKozikFOCS09, KozikKrokhinValerioteWillardAU14]
Theorem [BienvenuTenCateL_WolterPODS13]
FO-rewritability and Datalog-rewritability in (ALCI, BAQ) and (ALCI, AQ) are NEXPTIME-complete.
Exponential blowup in translation OMQ => CSP “materializes”
Can be lifted to multi-template CSPs with single constant
Constructing Rewritings (in Theory)
FO-Rewritings:
- From CSP-connection and results on homomorphism dualities: if there is an FO-rewriting, then there is a tree-UCQ-rewriting
- Pumping argument: depth and outdegree of tree-CQs can be bounded double exponentially
- Enumerate all CQs of these dimensions, check whether they are rewriting (red. to query answering)
Datalog-Rewritings:
- If there is a rewriting, then there is one of width at most three [BartoKozikEtAl]
- Canonical width-3 Datalog program of Feder and Vardi is a rewriting iff there is one [SiamJComp98]
More practical / pragmatic approaches (even incomplete) needed!
Thank You!
This and related research carried out under ERC Consolidator Grant CODA - Custom-Made Ontology Based Data Access
August 2015 - July 2020, University of Bremen