Inconsistency-Tolerant Query Rewriting for Linear Datalog+/-



SLIDE 1

Inconsistency-Tolerant Query Rewriting for Linear Datalog+/-

Thomas Lukasiewicz, Maria Vanina Martinez, and Gerardo I. Simari
Department of Computer Science, University of Oxford

2nd WORKSHOP ON THE RESURGENCE OF DATALOG IN ACADEMIA AND INDUSTRY
September 2012, Vienna, Austria

SLIDE 2

Motivation

  • Inconsistency in data management is an issue that cannot be ignored, and sometimes it is necessary to live with conflicting information.
  • The focus is now on reasoning with data coming from the Web (or made accessible through the Web).
  • Challenge: to make sense of constantly increasing amounts of heterogeneous (dynamic) data coming from very disparate sources and domains.
  • Goal: deal with inconsistency using reasonable semantics and efficient methods of computation.

SLIDE 3

Overview

  • In this talk, we focus on Linear Datalog+/-:
      – It generalizes the DL-Lite family of tractable description logics (DLs).
      – Answering conjunctive queries (CQs) over Linear Datalog+/- is FO rewritable.
  • We analyze CQ answering under the Intersection Semantics (Lembo et al., RR 2010):
      – An inconsistency-tolerant semantics for query answering.
      – A sound approximation of consistent answers.
  • We show that CQ answering for Linear Datalog+/- is FO rewritable under this semantics.

SLIDE 4

Datalog+/-

  • We assume:
      – An infinite universe of data constants Δ
      – An infinite set of labeled nulls Δ_N
      – An infinite set of variables V
      – A relational schema R, which is a finite set of relation names (or predicate symbols).
  • Different constants represent different values, but different nulls may represent the same value.
  • We use X to denote a sequence X1, …, Xn, with n ≥ 0.
  • A database (instance) D over R is a set of atoms with predicates from R and arguments from Δ.

SLIDE 5

Datalog+/-

  • A conjunctive query (CQ) over R has the form Q(X) = ∃Y Φ(X,Y), where Φ is a conjunction of atoms.
  • A Boolean conjunctive query (BCQ) over R has the form Q() = ∃Y Φ(X,Y), where Φ is a conjunction of atoms.
  • Answers to queries are defined via homomorphisms, which are mappings μ: Δ ∪ Δ_N ∪ V → Δ ∪ Δ_N ∪ V s.t.:
      – c ∈ Δ implies μ(c) = c
      – c ∈ Δ_N implies μ(c) ∈ Δ ∪ Δ_N
      – μ is extended to atoms, sets of atoms, and conjunctions.
  • The set of answers Q(D) is the set of tuples t over Δ s.t. ∃ μ: X ∪ Y → Δ ∪ Δ_N with μ(Φ(X,Y)) ⊆ D and μ(X) = t.
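The homomorphism-based definition of answers can be sketched in a few lines of Python. This is only an illustration under assumed conventions that are not part of the talk: atoms are tuples `(predicate, arg1, ...)`, variables start with an uppercase letter, constants are lowercase, and labeled nulls start with `_`. The helper names (`match_atom`, `homomorphisms`, `answers`) are hypothetical.

```python
def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()

def match_atom(atom, fact, subst):
    """Extend subst so that atom maps onto fact; return None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if subst.setdefault(term, value) != value:
                return None
        elif term != value:  # constants must map to themselves
            return None
    return subst

def homomorphisms(atoms, db, subst=None):
    """Yield every mapping of the conjunction `atoms` into the set of facts `db`."""
    subst = {} if subst is None else subst
    if not atoms:
        yield subst
        return
    for fact in db:
        extended = match_atom(atoms[0], fact, subst)
        if extended is not None:
            yield from homomorphisms(atoms[1:], db, extended)

def answers(answer_vars, body, db):
    """Q(D): images of the answer variables, keeping tuples over constants only."""
    result = set()
    for h in homomorphisms(body, db):
        row = tuple(h[v] for v in answer_vars)
        if not any(value.startswith("_") for value in row):
            result.add(row)
    return result

# Database taken from the talk's running example.
D = {
    ("directs", "john", "sales"), ("directs", "anna", "sales"),
    ("directs", "john", "finance"), ("supervises", "anna", "john"),
    ("works_in", "john", "sales"), ("works_in", "anna", "sales"),
}

# Q(X) = ∃Dept directs(X, Dept) ∧ works_in(X, Dept)
print(answers(["X"], [("directs", "X", "Dept"), ("works_in", "X", "Dept")], D))
```

A BCQ is the special case with no answer variables: it holds exactly when the result is `{()}`.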

SLIDE 6

Datalog+/-

  • Tuple-generating dependencies (TGDs) are constraints of the form ∀X ∀Y Φ(X,Y) → ∃Z Ψ(X,Z), where Φ and Ψ are conjunctions of atoms over R.
  • Given a DB D and a set Σ of TGDs, the set of models mods(D, Σ) is the set of all B s.t.:
      – D ⊆ B
      – every σ ∈ Σ is satisfied in B.
  • The set of answers for a CQ Q to D and Σ, ans(Q, D, Σ), is the set of all tuples a s.t. a ∈ Q(B) for all B ∈ mods(D, Σ).
  • A TGD is guarded if there exists an atom in its body that contains all the variables appearing in the body. A TGD is linear if it has only one atom in its body.
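The guarded and linear conditions are purely syntactic, so they can be checked directly on a TGD body. The sketch below assumes bodies are lists of atom tuples with uppercase variables (a convention chosen for illustration, not the authors' notation). The function names are hypothetical.

```python
def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()

def variables(atoms):
    """All variables occurring in a list of atoms."""
    return {t for atom in atoms for t in atom[1:] if is_var(t)}

def is_linear(body):
    """A TGD is linear if its body consists of a single atom."""
    return len(body) == 1

def is_guarded(body):
    """A TGD is guarded if one body atom contains every body variable."""
    body_vars = variables(body)
    return any(body_vars <= variables([atom]) for atom in body)

# Bodies of two TGDs from the talk's running example.
body_emp = [("works_in", "X", "D")]
body_works = [("supervises", "X", "Y"), ("directs", "X", "D")]
# A hypothetical guarded but non-linear body, added for contrast.
body_guarded = [("r", "X", "Y", "Z"), ("s", "X")]

for body in (body_emp, body_works, body_guarded):
    print(body, "linear:", is_linear(body), "guarded:", is_guarded(body))
```

Every linear TGD is trivially guarded, since its only body atom contains all body variables.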

SLIDE 7

The Chase

  • The chase is a procedure for repairing a DB relative to a set of dependencies.
  • (Informal) TGD chase rule:
      – a TGD σ is applicable in a DB D if body(σ) maps to atoms in D
      – if not already in D, the application of σ on D adds an atom with “fresh” nulls corresponding to each existentially quantified variable in head(σ).
  • The (possibly infinite) chase is a universal model: there exists a homomorphism from chase(D, Σ) onto every B ∈ mods(D, Σ).
  • Therefore, D ∪ Σ ⊨ Q iff chase(D, Σ) ⊨ Q.
  • If Σ consists of guarded TGDs, CQs can be evaluated on a fragment of the chase of constant depth k ⋅ |Q|, which is in PTIME in data complexity.
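The informal chase rule can be illustrated with a small bounded chase. This is a sketch under assumptions, not the procedure from the talk: TGDs are `(body, head)` pairs of atom lists, variables are uppercase, fresh nulls are named `_n0`, `_n1`, ..., and the number of rounds is capped by a hypothetical `max_rounds` parameter because the real chase may be infinite. The fact `manager(bob)` is added only to show a null being created.

```python
from itertools import count

def is_var(term):
    return term[:1].isupper()

def match_atom(atom, fact, subst):
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if subst.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return subst

def homomorphisms(atoms, db, subst=None):
    subst = {} if subst is None else subst
    if not atoms:
        yield subst
        return
    for fact in db:
        extended = match_atom(atoms[0], fact, subst)
        if extended is not None:
            yield from homomorphisms(atoms[1:], db, extended)

def chase(db, tgds, max_rounds=10):
    """Apply TGDs until nothing new is added or max_rounds is reached."""
    db = set(db)
    fresh = count()
    for _ in range(max_rounds):
        added = False
        for body, head in tgds:
            for h in list(homomorphisms(body, db)):
                # "If not already in D": skip when the head is already satisfied.
                if next(homomorphisms(head, db, h), None) is not None:
                    continue
                ext = dict(h)
                for atom in head:
                    for t in atom[1:]:
                        if is_var(t) and t not in ext:
                            ext[t] = f"_n{next(fresh)}"  # fresh labeled null
                db |= {(a[0], *(ext.get(t, t) for t in a[1:])) for a in head}
                added = True
        if not added:
            break
    return db

D = {
    ("directs", "john", "sales"), ("directs", "anna", "sales"),
    ("directs", "john", "finance"), ("supervises", "anna", "john"),
    ("works_in", "john", "sales"), ("works_in", "anna", "sales"),
    ("manager", "bob"),  # hypothetical extra fact, not in the talk's example
}
SIGMA_T = [
    ([("works_in", "X", "D")], [("emp", "X")]),
    ([("manager", "X")], [("supervises", "X", "Y")]),
    ([("supervises", "X", "Y"), ("directs", "X", "D")], [("works_in", "Y", "D")]),
]
result = chase(D, SIGMA_T)
print(sorted(result - D))
```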

SLIDE 8

Negative Constraints and EGDs

  • Negative constraints (NCs) are formulas of the form ∀X Φ(X) → ⊥, where Φ(X) is a conjunction of atoms.
  • NCs are easy to check, since we can simply verify that the CQ Φ(X) has an empty set of answers.
  • Equality-generating dependencies (EGDs) are of the form ∀X Φ(X) → Xi = Xj, where Φ is a conjunction of atoms and Xi, Xj are variables from X.
  • Here, we assume that EGDs are separable, which intuitively means that EGDs and TGDs are independent of each other.
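Checking an NC amounts to asking whether its body, read as a BCQ, has a match. A minimal sketch follows, using the same assumed tuple encoding as elsewhere (uppercase variables, lowercase constants). For simplicity it evaluates the bodies directly on D rather than on the chase, which is an assumption that only suffices when no TGD contributes to the violation. The variable `D2` stands in for D′.

```python
def is_var(term):
    return term[:1].isupper()

def match_atom(atom, fact, subst):
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if subst.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return subst

def homomorphisms(atoms, db, subst=None):
    subst = {} if subst is None else subst
    if not atoms:
        yield subst
        return
    for fact in db:
        extended = match_atom(atoms[0], fact, subst)
        if extended is not None:
            yield from homomorphisms(atoms[1:], db, extended)

def violates_nc(body, db):
    """An NC body → ⊥ is violated if the body has at least one match."""
    return next(homomorphisms(body, db), None) is not None

def violates_egd(body, left, right, db):
    """An EGD body → left = right is violated by a match with different images."""
    return any(h[left] != h[right] for h in homomorphisms(body, db))

D = {
    ("directs", "john", "sales"), ("directs", "anna", "sales"),
    ("directs", "john", "finance"), ("supervises", "anna", "john"),
    ("works_in", "john", "sales"), ("works_in", "anna", "sales"),
}
nc1 = [("supervises", "X", "Y"), ("manager", "Y")]
nc2 = [("supervises", "X", "Y"), ("works_in", "X", "D"), ("directs", "Y", "D")]
egd_body = [("directs", "X", "D"), ("directs", "X", "D2")]

print(violates_nc(nc1, D), violates_nc(nc2, D), violates_egd(egd_body, "D", "D2", D))
```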

SLIDE 9

Example

D = {directs(john, sales), directs(anna, sales), directs(john, finance), supervises(anna, john), works_in(john, sales), works_in(anna, sales)}

Σ_T = {works_in(X,D) → emp(X),
       manager(X) → ∃Y supervises(X,Y),
       supervises(X,Y) ∧ directs(X,D) → works_in(Y,D)}

Σ_NC = {supervises(X,Y) ∧ manager(Y) → ⊥,
        supervises(X,Y) ∧ works_in(X,D) ∧ directs(Y,D) → ⊥}

Σ_E = {directs(X,D) ∧ directs(X,D′) → D = D′}

SLIDE 10

Consistent Query Answering

  • Inconsistency arises whenever chase(D, Σ) ⊨ body(n) for some n ∈ Σ_E ∪ Σ_NC.
  • Data repair for an ontology KB = (D, Σ): a database D′ such that (1) D′ ⊆ D, (2) mods(D′, Σ) ≠ ∅, and (3) there is no D″ ⊆ D such that D′ ⊂ D″ and mods(D″, Σ) ≠ ∅. DRep(KB) denotes the set of all data repairs of KB.
  • Consistent Query Answering: given KB = (D, Σ) and a CQ Q, KB ⊨_Cons Q iff (R, Σ) ⊨ Q for every R ∈ DRep(KB).
  • Intersection Semantics: given KB = (D, Σ) and a CQ Q, KB ⊨_ICons Q iff (⋂_{R ∈ DRep(KB)} R, Σ) ⊨ Q.
  • Equivalently, KB ⊨_ICons Q iff (D − ⋃_{c ∈ culprits(KB)} c, Σ) ⊨ Q, where culprits(KB) is the set of minimal subsets of D that are inconsistent with Σ.

SLIDE 11

FO Rewritable TGDs

[Diagram: Q and Σ_T are compiled into the first-order rewriting Q_Σ (Q*), which is evaluated over D as SQL.]

∀D: (D ∪ Σ ⊨ Q) ⇔ D ⊨ Q*

Query answering is in AC0 in data complexity.

SLIDE 12

FO Query Rewriting: Intersection Semantics

[Diagram: Q and Σ are compiled into the first-order rewriting Q_Σ (Q*), which is evaluated over D as SQL.]

∀D: (D ∪ Σ ⊨_ICons Q) ⇔ D ⊨ Q*

Σ = Σ_T ∪ Σ_NC

SLIDE 13

FO Query Rewriting ICons: TGD-free case

  • To rewrite a query under the intersection semantics, we need to enforce the negative constraints in the rewriting.
  • We need to establish a correspondence between the minimization of negative constraints in the rewriting of Q and the minimization inherently encoded in culprits.

Σ_NC = {u1: p(U,U) → ⊥, u2: p(X,Y) ∧ q(X) → ⊥}
Q: ∃X q(X)

D1 = {p(a,a), q(a)}    culprits(KB) = {p(a,a)}
D2 = {p(a,b), q(a)}    culprits(KB) = {p(a,b), q(a)}
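The two databases above can be checked by brute force. The sketch below treats culprits as the minimal subsets of D that violate some NC, which is sufficient here because there are no TGDs. It then evaluates Q on D minus all culprit atoms. It enumerates subsets by increasing size and is exponential in |D|, so it is only meant to confirm the example, and all function names are hypothetical.

```python
from itertools import combinations

def is_var(term):
    return term[:1].isupper()

def match_atom(atom, fact, subst):
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if subst.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return subst

def homomorphisms(atoms, db, subst=None):
    subst = {} if subst is None else subst
    if not atoms:
        yield subst
        return
    for fact in db:
        extended = match_atom(atoms[0], fact, subst)
        if extended is not None:
            yield from homomorphisms(atoms[1:], db, extended)

def consistent(db, ncs):
    """TGD-free case: consistent iff no NC body has a match."""
    return all(next(homomorphisms(body, db), None) is None for body in ncs)

def culprits(db, ncs):
    """Minimal inconsistent subsets of db, found smallest-first."""
    found = []
    facts = sorted(db)
    for size in range(1, len(facts) + 1):
        for subset in combinations(facts, size):
            candidate = set(subset)
            if any(c <= candidate for c in found):
                continue  # contains a smaller culprit, so not minimal
            if not consistent(candidate, ncs):
                found.append(candidate)
    return found

def icons_entails(query, db, ncs):
    """Evaluate a BCQ on D minus the union of all culprits."""
    safe = set(db) - set().union(*culprits(db, ncs))
    return next(homomorphisms(query, safe), None) is not None

SIGMA_NC = [[("p", "U", "U")], [("p", "X", "Y"), ("q", "X")]]
Q = [("q", "X")]
D1 = {("p", "a", "a"), ("q", "a")}
D2 = {("p", "a", "b"), ("q", "a")}

for db in (D1, D2):
    print(culprits(db, SIGMA_NC), icons_entails(Q, db, SIGMA_NC))
```

In D1 the atom q(a) survives because {p(a,a)} alone already violates u1, so the larger set {p(a,a), q(a)} is not minimal.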

SLIDE 14

FO Query Rewriting ICons: 1 - Normalization of NCs

  • Def: Let u ∈ Σ_NC and Q a BCQ; then, ~u is an equivalence relation on the arguments of the body of u and the constants in u and Q such that every equivalence class contains at most one constant.

u1: p(U,U) → ⊥
u2: p(X,Y) ∧ q(X) → ⊥
Q: q(a)

~u1: {{U, a}}, {{U}, {a}}
~u2: {{X,Y,a}}, {{X}, {Y}, {a}}, {{X,Y}, {a}}, {{X,a}, {Y}}, {{X}, {Y,a}}
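The admissible equivalence relations are the set partitions of the terms in which no block holds two constants. A short sketch that enumerates them is given below, assuming uppercase names for variables and lowercase names for constants. The functions `partitions` and `equivalence_relations` are hypothetical helpers.

```python
def is_var(term):
    return term[:1].isupper()

def partitions(items):
    """Yield every set partition of a list, as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + part

def equivalence_relations(terms):
    """Partitions in which every block contains at most one constant."""
    for part in partitions(list(terms)):
        if all(sum(not is_var(t) for t in block) <= 1 for block in part):
            yield part

for terms in (["U", "a"], ["X", "Y", "a"]):
    print(terms, list(equivalence_relations(terms)))
```

For u1 with Q: q(a) this yields 2 relations, and for u2 it yields 5, matching the lists above. With two constants, as in the terms X, a, b, the partitions that merge a and b are excluded.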

SLIDE 15

FO Query Rewriting ICons: 1 - Normalization of NCs

  • Def: Let u ∈ Σ_NC and Q a BCQ; the normalization of u w.r.t. ~u is obtained by replacing every argument in the body of u by a representative of its equivalence class (a constant if the equivalence class contains a constant) and adding to the body the conjunction of all s ≠ t for any two different representatives s and t such that s is a variable occurring in the instance, and t is either a variable occurring in the instance or a constant in Σ_NC and Q.
  • The normalization of u, N(u, Q), is the set of all instances of u subject to all equivalence relations ~u, and N(Σ_NC, Q) = ⋃_{u ∈ Σ_NC} N(u, Q).

SLIDE 16

NCs Normalization

Σ_NC = {u1: p(U,U) → ⊥, u2: p(X,Y) ∧ q(X) → ⊥}
Q: ∃X q(X)

N(Σ_NC) = {u1′: p(U,U) → ⊥,
           u2′: p(X,X) ∧ q(X) → ⊥,
           u3′: p(X,Y) ∧ q(X) ∧ X ≠ Y → ⊥}
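The normalization step can be sketched by combining the partition enumeration with the representative and inequality rules from the definition. The code below is an illustrative reading of that definition, not the authors' implementation: an NC body is a list of atom tuples, variables are uppercase, and each normalized instance is returned as a pair `(atoms, inequalities)`. The `constants` parameter, which carries the constants of Σ_NC and Q, is a hypothetical name.

```python
def is_var(term):
    return term[:1].isupper()

def partitions(items):
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def equivalence_relations(terms):
    for part in partitions(list(terms)):
        if all(sum(not is_var(t) for t in block) <= 1 for block in part):
            yield part

def normalize(body, constants=()):
    """All normalized instances of an NC body, one per equivalence relation."""
    terms = sorted({t for atom in body for t in atom[1:]} | set(constants))
    instances = []
    for part in equivalence_relations(terms):
        rep = {}
        for block in part:
            consts = [t for t in block if not is_var(t)]
            chosen = consts[0] if consts else min(block)  # prefer the constant
            for t in block:
                rep[t] = chosen
        atoms = [(a[0], *(rep[t] for t in a[1:])) for a in body]
        used = {t for a in atoms for t in a[1:]}
        reps = sorted(set(rep.values()))
        neqs = set()
        for s in reps:
            if not (is_var(s) and s in used):
                continue  # s must be a variable occurring in the instance
            for t in reps:
                if t != s and (not is_var(t) or t in used):
                    neqs.add(tuple(sorted((s, t))))
        instances.append((atoms, sorted(neqs)))
    return instances

u1 = [("p", "U", "U")]
u2 = [("p", "X", "Y"), ("q", "X")]
for body in (u1, u2):
    for atoms, neqs in normalize(body):
        print(atoms, neqs)
```

With Q: ∃X q(X), which has no constants, this produces the three instances shown above. Passing `constants=("a",)` corresponds to the earlier query Q: q(a).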

SLIDE 17

FO Query Rewriting ICons: 2 – Enforcement of NCs

  • Identify the set of constraints that need to be enforced in the rewriting.
  • Def: Given a BCQ Q and a set Σ_NC, u ∈ N(Σ_NC, Q) needs to be enforced iff there exists C ⊆ Q, C ≠ ∅, such that C unifies with B ⊆ body(u), and there is no u′ such that body(u′) homomorphically maps to B′ ⊆ body(u).

N(Σ_NC) = {u1′: p(U,U) → ⊥,
           u2′: p(X,X) ∧ q(X) → ⊥,
           u3′: p(X,Y) ∧ q(X) ∧ X ≠ Y → ⊥}
Q: ∃X q(X)

u1′ and u2′ do not need to be enforced, but u3′ does.

SLIDE 18

FO Query Rewriting ICons: 2 – Enforcement of NCs

u3′: p(X1,Y) ∧ q(X1) ∧ X1 ≠ Y → ⊥
Q: ∃X q(X)

F = q(X) ∧ ¬∃Y (p(X,Y) ∧ q(X) ∧ X ≠ Y)
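To see that the enforced formula behaves as intended, it can be evaluated directly over the two databases from the TGD-free example. The function below is a hand translation of F for this one query, with atoms encoded as tuples. It is a sanity check under those assumptions, not a general enforcement procedure.

```python
def enforced_query_holds(db):
    """∃X [ q(X) ∧ ¬∃Y (p(X,Y) ∧ q(X) ∧ X ≠ Y) ] evaluated over a set of facts."""
    for fact in db:
        if fact[0] != "q":
            continue
        x = fact[1]
        # Look for a witness Y with p(x, Y) and Y different from x.
        blocked = any(g[0] == "p" and g[1] == x and g[2] != x for g in db)
        if not blocked:
            return True
    return False

D1 = {("p", "a", "a"), ("q", "a")}
D2 = {("p", "a", "b"), ("q", "a")}
print(enforced_query_holds(D1), enforced_query_holds(D2))
```

The results agree with the culprit-based reading: in D1 only p(a,a) is a culprit, so q(a) still supports Q, while in D2 the atom q(a) belongs to the culprit {p(a,b), q(a)}.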

SLIDE 19

FO Query Rewriting ICons: 2 – Enforcement of NCs

  • Proposition: Let KB = (D, Σ_NC), Q a CQ, and Σ_Q ⊆ N(Σ_NC, Q) be the set of constraints that need to be enforced in Q. Then, KB ⊨_ICons Q iff (D, Σ_Q) ⊨_ICons Q.
  • Theorem: KB ⊨_ICons Q iff D ⊨ enforcement(Q, N(Σ_NC, Q)).

SLIDE 20

FO Query Rewriting ICons: General Case

  • Rewriting of Q under the intersection semantics when Σ = Σ_T ∪ Σ_NC.
  • It is possible to first rewrite the bodies of the negative constraints relative to Σ_T and then enforce the new set of negative constraints (containing all possible rewritings of the negative constraints) in Q.
  • Several works address FO rewritability of different fragments of Datalog+/-; in this work, we assume such an algorithm for Linear Datalog+/-.

SLIDE 21

FO Query Rewriting ICons: General Case

  • Proposition: Let KB = (D, Σ) with Σ = Σ_T ∪ Σ_NC, Q a CQ, and Σ_Rew = {Φ → ⊥ | Φ ∈ TGDrewrite(body(u), Σ_T) with u ∈ Σ_NC}. Then, culprits(KB) = culprits(KB′), with KB′ = (D, Σ_Rew).
  • Proposition: KB ⊨_ICons Q iff (D, Σ_Rew ∪ Σ_T) ⊨_ICons Q.
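The construction of Σ_Rew can be illustrated with a deliberately simplified stand-in for TGDrewrite. The sketch assumes single-atom TGDs without existential variables whose head variables are pairwise distinct and cover the body variables, so a backward step just replaces a matching atom by the TGD body. The talk assumes a full rewriting algorithm for linear Datalog+/-, which this sketch does not attempt to reproduce. The constraints u1 and u2 and the TGD s(Z) → q(Z) are taken from the talk's later example.

```python
def rewrite_bodies(nc_bodies, tgds):
    """Close a list of NC bodies under backward application of simple TGDs.

    Each TGD is a pair (body_atom, head_atom), e.g. s(Z) → q(Z).
    """
    result = [tuple(body) for body in nc_bodies]
    frontier = list(result)
    while frontier:
        body = frontier.pop()
        for i, atom in enumerate(body):
            for tgd_body, tgd_head in tgds:
                if tgd_head[0] != atom[0] or len(tgd_head) != len(atom):
                    continue
                # Map each head variable to the corresponding term of the atom.
                subst = {}
                if not all(subst.setdefault(v, t) == t
                           for v, t in zip(tgd_head[1:], atom[1:])):
                    continue
                new_atom = (tgd_body[0], *(subst[v] for v in tgd_body[1:]))
                new_body = body[:i] + (new_atom,) + body[i + 1:]
                if new_body not in result:
                    result.append(new_body)
                    frontier.append(new_body)
    return result

u1 = [("p", "U", "U")]
u2 = [("p", "X", "Y"), ("q", "X")]
tgds = [(("s", "Z"), ("q", "Z"))]  # s(Z) → q(Z)

for body in rewrite_bodies([u1, u2], tgds):
    print(" ∧ ".join(f"{a[0]}({','.join(a[1:])})" for a in body), "→ ⊥")
```

On this input the result contains the two original bodies plus p(X,Y) ∧ s(X), since any s-fact would produce a q-fact in the chase.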

SLIDE 22

FO Query Rewriting ICons: General Case

  • Theorem: Let KB = (D, Σ) with (linear) Σ = Σ_T ∪ Σ_NC, and let Q be a BCQ. Then, KB ⊨_ICons Q iff D ⊨ rewriteICons(Q, Σ).

SLIDE 23

rewriteICons: Example

Σ_T = {s(X) → q(X), t(X,Y) → ∃Z p(Z,X)}
Σ_NC = {u1: p(U,U) → ⊥, u2: p(X,Y) ∧ q(X) → ⊥}
Q: ∃X q(X)

1 - Rewrite Σ_NC relative to Σ_T (using the TGD s(Z) → q(Z)):
Σ_Rew = {u1: p(U,U) → ⊥, u2: p(X,Y) ∧ q(X) → ⊥, u3: p(X,Y) ∧ s(X) → ⊥}

2 – Normalize Σ_Rew:
N(Σ_Rew) = {u1′: p(U,U) → ⊥,
            u2′: p(X,X) ∧ q(X) → ⊥,
            u3′: p(X,X) ∧ s(X) → ⊥,
            u4′: p(X,Y) ∧ q(X) ∧ X ≠ Y → ⊥,
            u5′: p(X,Y) ∧ s(X) ∧ X ≠ Y → ⊥}

SLIDE 24

rewriteICons: Example

N(Σ_Rew) = {u1′: p(U,U) → ⊥,
            u2′: p(X,X) ∧ q(X) → ⊥,
            u3′: p(X,X) ∧ s(X) → ⊥,
            u4′: p(X,Y) ∧ q(X) ∧ X ≠ Y → ⊥,
            u5′: p(X,Y) ∧ s(X) ∧ X ≠ Y → ⊥}

3 - Rewrite Q relative to Σ_T:
Q_rew = {∃X1 q(X1), ∃X2 s(X2)}

4 – Enforce N(Σ_Rew) in Q_rew:
F1 = q(X1) ∧ ¬∃Y (p(X1,Y) ∧ q(X1) ∧ X1 ≠ Y)
F2 = s(X) ∧ ¬∃Y (p(X,Y) ∧ s(X) ∧ X ≠ Y)

Final rewriting:
∃X [q(X) ∧ ¬∃Y (p(X,Y) ∧ q(X) ∧ X ≠ Y)] ∨ ∃X [s(X) ∧ ¬∃Y (p(X,Y) ∧ s(X) ∧ X ≠ Y)]
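Since the final rewriting is a first-order query, it can be handed to a relational engine. The sketch below runs it with Python's built-in `sqlite3` module. The table layout (`p(x, y)`, `q(x)`, `s(x)`) and the function name `icons_holds` are assumptions made for illustration. The two branches of the `UNION` are the q-disjunct and the s-disjunct produced by rewriting Q relative to s(X) → q(X), each guarded by a `NOT EXISTS` that enforces the corresponding normalized constraint.

```python
import sqlite3

# The inner conjunct q(X) (resp. s(X)) is implied by the outer FROM clause,
# so only p(X,Y) ∧ X ≠ Y needs to be tested inside NOT EXISTS.
REWRITTEN_QUERY = """
SELECT 1 FROM q
WHERE NOT EXISTS (SELECT 1 FROM p WHERE p.x = q.x AND p.x <> p.y)
UNION
SELECT 1 FROM s
WHERE NOT EXISTS (SELECT 1 FROM p WHERE p.x = s.x AND p.x <> p.y)
"""

def icons_holds(p_rows=(), q_rows=(), s_rows=()):
    """Load a small database into SQLite and evaluate the rewritten BCQ."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE p (x TEXT, y TEXT)")
    con.execute("CREATE TABLE q (x TEXT)")
    con.execute("CREATE TABLE s (x TEXT)")
    con.executemany("INSERT INTO p VALUES (?, ?)", p_rows)
    con.executemany("INSERT INTO q VALUES (?)", q_rows)
    con.executemany("INSERT INTO s VALUES (?)", s_rows)
    row = con.execute(REWRITTEN_QUERY).fetchone()
    con.close()
    return row is not None

print(icons_holds(s_rows=[("a",)]))                        # D = {s(a)}
print(icons_holds(p_rows=[("a", "b")], s_rows=[("a",)]))   # D = {p(a,b), s(a)}
```

For D = {s(a)} the query holds through the s-branch. For D = {p(a,b), s(a)} it does not, because s(a) derives q(a) and the pair then violates u2, making both atoms part of a culprit.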

SLIDE 25

Conclusions

  • We developed an algorithm for FO query rewriting of linear Datalog+/- ontologies under the Intersection Semantics.
  • We build on existing work on FO query rewriting for linear Datalog+/- under the standard (non-inconsistency-tolerant) semantics.
  • The algorithm and results hold for every fragment of Datalog+/- that is FO rewritable.

SLIDE 26

Thank You!