
Optimizing Query Answering under Ontological Constraints Giorgio Orsi 1,2 and Andreas Pieris 2 1 Institute for the Future of Computing Oxford Martin School University of Oxford 2 Department of Computer Science University of Oxford VLDB 2011


slide-1
SLIDE 1

Optimizing Query Answering under Ontological Constraints

Giorgio Orsi (1,2) and Andreas Pieris (2)

(1) Institute for the Future of Computing, Oxford Martin School, University of Oxford

(2) Department of Computer Science, University of Oxford

VLDB 2011

slide-2
SLIDE 2

Ontological Databases

Ontological Reasoning DB Constraints Ontological DB

slide-3
SLIDE 3

Ontological Databases

D (ABox), Σ (TBox)

Ontological Reasoning DB Constraints Ontological DB

slide-4
SLIDE 4

Ontological Databases

D (ABox), Σ (TBox)

Q(X) ← ∃Y φ(X,Y)

Ontological Reasoning DB Constraints Ontological DB

slide-5
SLIDE 5

Ontological Databases

D (ABox), Σ (TBox)

Q(X) ← ∃Y φ(X,Y)

{ t | D ∪ Σ ⊨ ∃u φ(t,u) }

Ontological Reasoning DB Constraints Ontological DB

slide-6
SLIDE 6

Ontological Constraints (examples)

  • Concept Inclusions: ∀X emp(X) → person(X)
  • (Inverse) Relation Inclusion: ∀X∀Y manages(X,Y) → isManaged(Y,X)
  • Relation Transitivity: ∀X∀Y∀Z mgs(X,Y), mgs(Y,Z) → mgs(X,Z)
  • Participation: ∀X emp(X) → ∃Y report(X,Y)
  • Disjointness: ∀X emp(X), customer(X) → ⊥
  • Functionality: ∀X∀Y∀Z reports(X,Y), reports(X,Z) → Y = Z

slide-7
SLIDE 7

Datalog± [Calì et al., PODS 09]

• Datalog variant (Datalog+) allowing in the head:

  • ∃-variables → TGDs: ∀X∀Y φ(X,Y) → ∃Z ψ(X,Z)
  • Equality atoms → EGDs: ∀X φ(X) → Xi = Xj
  • Constant false (⊥) → NCs: ∀X φ(X) → ⊥

slide-8
SLIDE 8

Datalog± [Calì et al., PODS 09]

• Datalog variant (Datalog+) allowing in the head:

  • ∃-variables → TGDs: ∀X∀Y φ(X,Y) → ∃Z ψ(X,Z)
  • Equality atoms → EGDs: ∀X φ(X) → Xi = Xj
  • Constant false (⊥) → NCs: ∀X φ(X) → ⊥

• But query answering under Datalog+ is undecidable.

slide-9
SLIDE 9

Datalog± [Calì et al., PODS 09]

• Datalog variant (Datalog+) allowing in the head:

  • ∃-variables → TGDs: ∀X∀Y φ(X,Y) → ∃Z ψ(X,Z)
  • Equality atoms → EGDs: ∀X φ(X) → Xi = Xj
  • Constant false (⊥) → NCs: ∀X φ(X) → ⊥

• But query answering under Datalog+ is undecidable.
• Datalog+ is syntactically restricted → Datalog±

slide-10
SLIDE 10

Datalog± [Calì et al., PODS 09]

• Datalog variant (Datalog+) allowing in the head:

  • ∃-variables → TGDs: ∀X∀Y φ(X,Y) → ∃Z ψ(X,Z)
  • Equality atoms → EGDs: ∀X φ(X) → Xi = Xj
  • Constant false (⊥) → NCs: ∀X φ(X) → ⊥

• But query answering under Datalog+ is undecidable.
• Datalog+ is syntactically restricted → Datalog±
• TGDs are more expressive than inclusion dependencies:
  ∀D∀P∀A runs(D,P), area(P,A) → ∃E employee(E,D,A)

slide-11
SLIDE 11

The Chase Procedure

Input: Database D, set of TGDs Σ
Output: A model of D ∪ Σ

D:
  person(john)

Σ:
  ∀X person(X) → ∃Y father(Y,X)
  ∀X∀Y father(X,Y) → person(X)

chase(D,Σ) = D ∪ ?

slide-12
SLIDE 12

The Chase Procedure

Input: Database D, set of TGDs Σ
Output: A model of D ∪ Σ

D:
  person(john)

Σ:
  ∀X person(X) → ∃Y father(Y,X)
  ∀X∀Y father(X,Y) → person(X)

chase(D,Σ) = D ∪ {father(z1,john)

slide-13
SLIDE 13

The Chase Procedure

Input: Database D, set of TGDs Σ
Output: A model of D ∪ Σ

D:
  person(john)

Σ:
  ∀X person(X) → ∃Y father(Y,X)
  ∀X∀Y father(X,Y) → person(X)

chase(D,Σ) = D ∪ {father(z1,john), person(z1)

slide-14
SLIDE 14

The Chase Procedure

Input: Database D, set of TGDs Σ
Output: A model of D ∪ Σ

D:
  person(john)

Σ:
  ∀X person(X) → ∃Y father(Y,X)
  ∀X∀Y father(X,Y) → person(X)

chase(D,Σ) = D ∪ {father(z1,john), person(z1), father(z2,z1)

slide-15
SLIDE 15

The Chase Procedure

Input: Database D, set of TGDs Σ
Output: A model of D ∪ Σ

D:
  person(john)

Σ:
  ∀X person(X) → ∃Y father(Y,X)
  ∀X∀Y father(X,Y) → person(X)

chase(D,Σ) = D ∪ {father(z1,john), person(z1), father(z2,z1), …}
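The derivation on the chase slides can be reproduced with a small sketch. This is an illustration only, not the authors' implementation: the two TGDs of the example are hard-coded, facts are modelled as tuples, labelled nulls are named z1, z2, … as on the slides, and the `rounds` bound is an assumption added here because the chase of this example never terminates.

```python
from itertools import count

# A fact is a tuple: (predicate, arg1, arg2, ...).
D = {("person", "john")}

def chase(db, rounds):
    """Apply the two example TGDs for a bounded number of rounds (hypothetical helper)."""
    facts = set(db)
    fresh = count(1)  # source of labelled nulls z1, z2, ...
    for _ in range(rounds):
        new = set()
        for f in facts:
            # person(X) -> exists Y father(Y, X): fire only if X has no witness yet
            if f[0] == "person" and not any(
                g[0] == "father" and g[2] == f[1] for g in facts | new
            ):
                new.add(("father", f"z{next(fresh)}", f[1]))
            # father(X, Y) -> person(X)
            if f[0] == "father":
                new.add(("person", f[1]))
        if new <= facts:
            break  # fixpoint reached (never happens for this example)
        facts |= new
    return facts

three_rounds = chase(D, 3)
```

After three rounds the result contains person(john), father(z1,john), person(z1) and father(z2,z1), which is the state shown on slide 14; every further round adds one more fact.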

slide-16
SLIDE 16

Query Answering via Chase

[see, e.g., Deutsch, Nash & Remmel, PODS 08]

D ∪ Σ ⊨ Q ⇔ chase(D,Σ) ⊨ Q

[Figure: C = chase(D,Σ) with homomorphisms h1, h2 into the models M1, M2, … (images h1(C), h2(C)), and a homomorphism h from Q]

slide-17
SLIDE 17

Query Answering via Rewriting

[Diagram: inputs Σ and Q]

slide-18
SLIDE 18

Query Answering via Rewriting

[Diagram: Σ and Q are compiled into the rewritten query QΣ]

slide-19
SLIDE 19

Query Answering via Rewriting

[Diagram: Σ and Q are compiled into QΣ; QΣ is then evaluated over D]

slide-20
SLIDE 20

Chase vs Rewriting

slide-21
SLIDE 21

Linear TGDs

∀X∀Y r(X,Y) → ∃Z ψ(X,Z)   (single body atom)

• Properly generalize inclusion dependencies.
• Enjoy the bounded-derivation depth property.
• FO-rewritable ⇒ query answering in AC0 (data complexity).

slide-22
SLIDE 22

FO-rewritability: example [Gottlob et al., ICDE 11]

Σ:
  promoter(X) → ∃Y promotesTo(X,Y)
  promotesTo(X,Y) → customer(Y)

Q:
  q ← promotesTo(A,B), customer(B)

QΣ:
  q ← promotesTo(A,B), customer(B)   (original query)

slide-23
SLIDE 23

FO-rewritability: example [Gottlob et al., ICDE 11]

Σ:
  promoter(X) → ∃Y promotesTo(X,Y)
  promotesTo(X,Y) → customer(Y)

Q:
  q ← promotesTo(A,B), customer(B)

QΣ:
  q ← promotesTo(A,B), customer(B)

Rewriting step with promotesTo(X,Y) → customer(Y), { Y = B } (V0 is fresh):
  q ← promotesTo(A,B), customer(B)
  ⇒ q ← promotesTo(A,B), promotesTo(V0,B)

slide-24
SLIDE 24

FO-rewritability: example [Gottlob et al., ICDE 11]

Σ:
  promoter(X) → ∃Y promotesTo(X,Y)
  promotesTo(X,Y) → customer(Y)

Q:
  q ← promotesTo(A,B), customer(B)

QΣ:
  q ← promotesTo(A,B), customer(B)

Factorization, { A = V0 }:
  q ← promotesTo(A,B), promotesTo(V0,B)
  ⇒ q ← promotesTo(A,B)

slide-25
SLIDE 25

FO-rewritability: example [Gottlob et al., ICDE 11]

Σ:
  promoter(X) → ∃Y promotesTo(X,Y)
  promotesTo(X,Y) → customer(Y)

Q:
  q ← promotesTo(A,B), customer(B)

QΣ:
  q ← promotesTo(A,B), customer(B)
  q ← promotesTo(A,B)

Rewriting step with promoter(X) → ∃Y promotesTo(X,Y), { X = A, Y = B }:
  q ← promotesTo(A,B)
  ⇒ q ← promoter(A)

slide-26
SLIDE 26

FO-rewritability: example [Gottlob et al., ICDE 11]

Σ:
  promoter(X) → ∃Y promotesTo(X,Y)
  promotesTo(X,Y) → customer(Y)

Q:
  q ← promotesTo(A,B), customer(B)

QΣ, the UCQ rewriting (first-order):
  q ← promoter(A)
  q ← promotesTo(A,B), customer(B)
  q ← promotesTo(A,B)
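To see why the rewriting matters, the three conjunctive queries of QΣ can be evaluated directly over a database, without touching Σ. The sketch below is a toy evaluator written for this example: the database content (a single promoter named ann) and the helper `matches` are assumptions made for illustration, and all query arguments are treated as variables.

```python
# Toy database: facts are (predicate, args...) tuples. Only a promoter is stored.
D = {("promoter", "ann")}

def matches(body, db, binding=None):
    """Return True if the conjunction `body` has a homomorphism into `db`."""
    binding = binding or {}
    if not body:
        return True
    (pred, *args), rest = body[0], body[1:]
    for fact in db:
        if fact[0] != pred or len(fact) != len(args) + 1:
            continue
        b = dict(binding)
        # Every argument is a variable: bind it, or check it agrees with the binding.
        if all(b.setdefault(a, v) == v for a, v in zip(args, fact[1:])):
            if matches(rest, db, b):
                return True
    return False

Q = [("promotesTo", "A", "B"), ("customer", "B")]
Q_SIGMA = [
    [("promoter", "A")],
    [("promotesTo", "A", "B"), ("customer", "B")],
    [("promotesTo", "A", "B")],
]

original = matches(Q, D)                            # Q alone, ignoring the TGDs
rewritten = any(matches(cq, D) for cq in Q_SIGMA)   # union of the three CQs
```

On this database the original query finds no match, while the UCQ succeeds through its first disjunct, which agrees with what the chase of D under Σ would entail.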

slide-27
SLIDE 27

FO-rewritability

• Desirable properties of an FO-rewriting:
  • independent of the DB
  • executable by any DBMS
  • easy to compute (e.g., polynomial time)
  • small size (e.g., polynomial size)

slide-28
SLIDE 28

FO-rewritability

• Desirable properties of an FO-rewriting:
  • independent of the DB
  • executable by any DBMS
  • easy to compute (e.g., polynomial time)
  • small size (e.g., polynomial size)

• Unions of Conjunctive Queries (UCQs)
  • executable by any DBMS
  • DB independent
  • easy to optimize and distribute
  • worst-case exponential size in Q and Σ

Calvanese et al., JAR 07; Pérez-Urbina et al., JAL 09; Calì et al., PODS 09; Gottlob et al., ICDE 11; and others…

slide-29
SLIDE 29

FO-rewritability

• Combined and hybrid FO-rewriting
  • good computational properties (e.g., polynomial in size)
  • requires access to the DB

Pérez-Urbina et al., JAL 09; Kontchakov et al., KR 10; Gottlob and Schwentick, DL 11

slide-30
SLIDE 30

FO-rewritability

• Purely intensional Datalog rewriting
  • very compressed representation
  • purely intensional
  • requires view-creation or Datalog engine

• Combined and hybrid FO-rewriting
  • good computational properties (e.g., polynomial in size)
  • requires access to the DB

Pérez-Urbina et al., JAL 09; Kontchakov et al., KR 10; Gottlob and Schwentick, DL 11

Pérez-Urbina et al., JAL 09; Rosati and Almatelli, KR 10

slide-31
SLIDE 31

Datalog Rewriting: Keep it First-Order!

• A Datalog query is (in general) not a first-order query
  • a non-recursive Datalog query is a first-order query
  • a bounded Datalog query is a first-order query

slide-32
SLIDE 32

Datalog Rewriting: Keep it First-Order!

• A Datalog query is (in general) not a first-order query
  • a non-recursive Datalog query is a first-order query
  • a bounded Datalog query is a first-order query

• Input:
  • a (w.l.o.g. boolean) conjunctive query Q = <q,ρ>
    Q : q(X) ← p(X), s(X,Y) ⇒ <q, q(X) ← p(X), s(X,Y)>
  • a set of linear TGDs Σ

• Output:
  • a bounded Datalog query QΣ = <q,π>

slide-33
SLIDE 33

Datalog Rewriting: skolemization (and renaming)

Σ:
  r(X,Y) → ∃Z s(Y,Z)
  s(X,Y) → ∃Z p(Y,Y,Z)
  p(X,Y,Z) → t(Z)

slide-34
SLIDE 34

Datalog Rewriting: Skolemization (and renaming)

Σ:
  r(X,Y) → ∃Z s(Y,Z)
  s(X,Y) → ∃Z p(Y,Y,Z)
  p(X,Y,Z) → t(Z)

Σf:
  r(X1,Y1) → s(Y1,f1(Y1))
  s(X2,Y2) → p(Y2,Y2,f2(Y2))
  p(X3,Y3,Z3) → t(Z3)

slide-35
SLIDE 35

Datalog Rewriting: Skolemization (and renaming)

Σ:
  r(X,Y) → ∃Z s(Y,Z)
  s(X,Y) → ∃Z p(Y,Y,Z)
  p(X,Y,Z) → t(Z)

Σf:
  r(X1,Y1) → s(Y1,f1(Y1))
  s(X2,Y2) → p(Y2,Y2,f2(Y2))
  p(X3,Y3,Z3) → t(Z3)

• Σf and Σ are equisatisfiable (not equivalent)
• Introduce one Skolem function for each existential variable
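The Skolemization and renaming step can be sketched in a few lines. The representation below (atoms as a predicate plus a list of variable names, rules printed as strings) is an assumption chosen for readability, and the Skolem terms take the variables shared by body and head as arguments, which is what the slide's f1(Y1) and f2(Y2) suggest; other formulations pass all body variables.

```python
# A linear TGD is (body_atom, head_atom); atoms are (predicate, [variables]).
# Head variables that do not occur in the body are existentially quantified.
SIGMA = [
    (("r", ["X", "Y"]), ("s", ["Y", "Z"])),
    (("s", ["X", "Y"]), ("p", ["Y", "Y", "Z"])),
    (("p", ["X", "Y", "Z"]), ("t", ["Z"])),
]

def skolemize(tgds):
    """Rename variables apart and replace existential variables by Skolem terms."""
    out, fn = [], 0
    for i, ((bp, bargs), (hp, hargs)) in enumerate(tgds, start=1):
        ren = {v: f"{v}{i}" for v in bargs}  # rule-local renaming X -> Xi
        # Frontier: body variables that also occur in the head, without repeats.
        frontier = list(dict.fromkeys(ren[v] for v in bargs if v in hargs))
        sk = {}
        for v in hargs:
            if v not in ren and v not in sk:  # existential variable
                fn += 1
                sk[v] = f"f{fn}({','.join(frontier)})"
        body = f"{bp}({','.join(ren[v] for v in bargs)})"
        head = f"{hp}({','.join(ren.get(v) or sk[v] for v in hargs)})"
        out.append(f"{body} -> {head}")
    return out

sigma_f = skolemize(SIGMA)
```

For the three TGDs of the slide this yields the rules listed under Σf.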

slide-36
SLIDE 36

Datalog Rewriting: Rule Saturation

• Apply resolution inference rule to rules in Σf
  • at least one of the rules contains Skolem terms

Σf:
  δ1 : r(X1,Y1) → s(Y1,f1(Y1))
  δ2 : s(X2,Y2) → p(Y2,Y2,f2(Y2))
  δ3 : p(X3,Y3,Z3) → t(Z3)

slide-37
SLIDE 37

Datalog Rewriting: Rule Saturation

• Apply resolution inference rule to rules in Σf
  • at least one of the rules contains Skolem terms

Σf:
  δ1 : r(X1,Y1) → s(Y1,f1(Y1))
  δ2 : s(X2,Y2) → p(Y2,Y2,f2(Y2))
  δ3 : p(X3,Y3,Z3) → t(Z3)

[Σf]:
  …
  r(X1,Y1) → p(f1(Y1),f1(Y1),f2(f1(Y1)))
  …
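The single resolution step shown above, δ1 resolved with δ2, can be checked mechanically. The following is a minimal sketch using textbook syntactic unification; the term encoding (strings are variables, tuples are atoms or Skolem terms) and the function names are assumptions made for this illustration, not taken from the paper.

```python
# Variables are strings; atoms and Skolem terms are tuples (symbol, arg1, ...).
def is_var(t):
    return isinstance(t, str)

def walk(t, s):
    """Apply substitution s to term t, recursively."""
    if is_var(t):
        return walk(s[t], s) if t in s else t
    return (t[0],) + tuple(walk(a, s) for a in t[1:])

def occurs(v, t, s):
    """Occurs check: does variable v appear in t under substitution s?"""
    t = walk(t, s)
    if is_var(t):
        return v == t
    return any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    """Extend s so that a and b become equal; return None if impossible."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def resolve(rule1, rule2):
    """Resolve the head of rule1 against the single body atom of rule2."""
    (body1, head1), (body2, head2) = rule1, rule2
    s = unify(body2, head1, {})
    if s is None:
        return None
    return walk(body1, s), walk(head2, s)

d1 = (("r", "X1", "Y1"), ("s", "Y1", ("f1", "Y1")))
d2 = (("s", "X2", "Y2"), ("p", "Y2", "Y2", ("f2", "Y2")))
d3 = (("p", "X3", "Y3", "Z3"), ("t", "Z3"))

d12 = resolve(d1, d2)
```

Here `d12` encodes r(X1,Y1) → p(f1(Y1),f1(Y1),f2(f1(Y1))), the rule displayed in [Σf], while `resolve(d1, d3)` returns `None` because the predicates s and p do not unify.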

slide-38
SLIDE 38

Datalog Rewriting: Properties of Rule Saturation

• [Σf] mimics the chase derivations.

slide-39
SLIDE 39

Datalog Rewriting: Properties of Rule Saturation

• [Σf] mimics the chase derivations.

Σf:
  δ1 : r(X1,Y1) → s(Y1,f1(Y1))
  δ2 : s(X2,Y2) → p(Y2,Y2,f2(Y2))
  δ3 : p(X3,Y3,Z3) → t(Z3)

slide-40
SLIDE 40

Datalog Rewriting: Properties of Rule Saturation

• [Σf] mimics the chase derivations.
• [Σf] depends only on Σ.
• [Σf] is possibly infinite
  • linear TGDs have BDDP: suffices to construct it up to k steps [Σf]k.

Σf:
  δ1 : r(X1,Y1) → s(Y1,f1(Y1))
  δ2 : s(X2,Y2) → p(Y2,Y2,f2(Y2))
  δ3 : p(X3,Y3,Z3) → t(Z3)

slide-41
SLIDE 41

Datalog Rewriting: Query Saturation

• resolve [Σf] with the query Q.
  • use only rules with Skolem terms.

slide-42
SLIDE 42

Datalog Rewriting: Query Saturation

• resolve [Σf] with the query Q.
  • use only rules with Skolem terms.

[Σf]:
  …
  δ1 : r(X1,Y1) → s(Y1,f1(Y1))
  δ2 : s(X2,Y2) → p(Y2,Y2,f2(Y2))
  δ3 : p(X3,Y3,Z3) → t(Z3)
  …
  [δ1,δ2] : r(X1,Y1) → p(f1(Y1),f1(Y1),f2(f1(Y1)))
  …

Q:
  Q ← s(A,B), p(B,B,C)

[Q,Σf]:
  …
  Q ← r(X1,Y1), p(f1(Y1),f1(Y1),C)
  …

slide-43
SLIDE 43

Datalog Rewriting: Query Saturation

• bypasses chase derivations with function symbols

[Σf]:
  …
  δ1 : r(X1,Y1) → s(Y1,f1(Y1))
  δ2 : s(X2,Y2) → p(Y2,Y2,f2(Y2))
  δ3 : p(X3,Y3,Z3) → t(Z3)
  …
  [δ1,δ2] : r(X1,Y1) → p(f1(Y1),f1(Y1),f2(f1(Y1)))
  …

Q:
  Q ← s(A,B), p(B,B,C)

slide-44
SLIDE 44

Datalog Rewriting: Finalization

• keep only the function-free rules from [Σf] ∪ [Q,Σf]
• derivations producing certain answers are captured by function-symbol-free rules.
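The finalization filter is simple enough to state as code. In this sketch only the rules that appear on the preceding slides are listed (the full saturation contains more), rules are written as (head, body atoms), and Skolem terms are nested tuples; all of these encoding choices are assumptions for illustration.

```python
# Variables are strings; Skolem terms are tuples such as ("f1", "Y1").
F = ("f1", "Y1")

d1 = (("s", "Y1", F), [("r", "X1", "Y1")])
d2 = (("p", "Y2", "Y2", ("f2", "Y2")), [("s", "X2", "Y2")])
d3 = (("t", "Z3"), [("p", "X3", "Y3", "Z3")])
d12 = (("p", F, F, ("f2", F)), [("r", "X1", "Y1")])
q0 = (("Q",), [("s", "A", "B"), ("p", "B", "B", "C")])
q1 = (("Q",), [("r", "X1", "Y1"), ("p", F, F, "C")])

def is_function_free(rule):
    """A rule is kept only if no argument of any of its atoms is a Skolem term."""
    head, body = rule
    return not any(isinstance(t, tuple) for atom in [head, *body] for t in atom[1:])

kept = [rule for rule in [d1, d2, d3, d12, q0, q1] if is_function_free(rule)]
```

Of the six rules listed, only δ3 and the original query rule survive the filter.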

slide-45
SLIDE 45

Optimizations: Pruning

• use the predicate graph to reduce the number of rules in Σf

Σf:
  δ1 : r(X1,Y1) → s(Y1,f1(Y1))
  δ2 : s(X2,Y2) → p(Y2,Y2,f2(Y2))
  δ3 : p(X3,Y3,Z3) → t(Z3)

Q:
  Q ← s(A,B), p(B,B,C)

slide-47
SLIDE 47

Optimizations: Pruning

• use the predicate graph to reduce the number of rules in Σf

Σf:
  δ1 : r(X1,Y1) → s(Y1,f1(Y1))
  δ2 : s(X2,Y2) → p(Y2,Y2,f2(Y2))
  δ3 : p(X3,Y3,Z3) → t(Z3)

Q:
  Q ← s(A,B), p(B,B,C)

• we are no longer independent of Q!
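One plausible reading of the pruning step is a backward reachability test on the predicate graph: a rule is useful only if its head predicate can lead to a predicate of the query. The slides do not spell out the exact criterion, so the sketch below is an assumption, with rules reduced to (body predicate, head predicate) pairs.

```python
# Each linear rule contributes one edge body_predicate -> head_predicate.
RULES = {"delta1": ("r", "s"), "delta2": ("s", "p"), "delta3": ("p", "t")}
QUERY_PREDICATES = {"s", "p"}  # predicates of Q <- s(A,B), p(B,B,C)

def prune(rules, query_preds):
    """Keep the rules whose head predicate can reach a query predicate."""
    relevant = set(query_preds)
    changed = True
    while changed:
        changed = False
        for body, head in rules.values():
            # If the head is relevant, facts over the body predicate matter too.
            if head in relevant and body not in relevant:
                relevant.add(body)
                changed = True
    return {name for name, (_, head) in rules.items() if head in relevant}

useful = prune(RULES, QUERY_PREDICATES)
```

Under this reading δ3 is dropped, because t neither occurs in Q nor leads to a predicate that does. The set of kept rules now depends on the query, which is the point of the last bullet.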

slide-48
SLIDE 48

Optimizations: Query Elimination

• eliminate implied atoms during query saturation

Σf:
  δ1 : r(X1,Y1) → s(Y1,f1(Y1))
  δ2 : s(X2,Y2) → p(Y2,Y2,f2(Y2))

Q:
  Q ← s(A,B), p(B,B,C)

slide-49
SLIDE 49

Optimizations: Query Elimination

• eliminate implied atoms during query saturation

Σf:
  δ1 : r(X1,Y1) → s(Y1,f1(Y1))
  δ2 : s(X2,Y2) → p(Y2,Y2,f2(Y2))

Q:
  Q ← s(A,B), p(B,B,C)

Atom coverage: s(A,B) ⊨ p(B,B,C)

slide-50
SLIDE 50

Optimizations: Query Elimination

• eliminate implied atoms during query saturation

Σf:
  δ1 : r(X1,Y1) → s(Y1,f1(Y1))
  δ2 : s(X2,Y2) → p(Y2,Y2,f2(Y2))

Q:
  Q ← s(A,B), p(B,B,C)

Atom coverage: s(A,B) ⊨ p(B,B,C)

Q ← s(A,B), p(B,B,C) ≡ Q ← s(A,B)
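The coverage test in this example can be approximated by a one-step check against a single TGD. The sketch below is a sufficient condition only and is not the polynomial-time procedure from the paper: it assumes a boolean query without constants, and the function name `covers` and the rule encoding (body atom, head atom, set of existential variables) are hypothetical.

```python
def covers(a, b, tgd, query):
    """True if atom a implies atom b in `query` by one application of `tgd`."""
    body, head, existential = tgd
    if (body[0], len(body)) != (a[0], len(a)) or (head[0], len(head)) != (b[0], len(b)):
        return False
    theta = {}
    for v, t in zip(body[1:], a[1:]):      # match the TGD body against atom a
        if theta.setdefault(v, t) != t:
            return False
    # Terms occurring in any other atom of the query (a included).
    elsewhere = {t for atom in query if atom is not b for t in atom[1:]}
    owner = {}
    for v, t in zip(head[1:], b[1:]):
        if v in existential:
            # t must be private to b and tied to a single existential variable.
            if t in elsewhere or owner.setdefault(t, v) != v:
                return False
        elif theta.get(v) != t:            # frontier position must agree with a
            return False
    return True

delta2 = (("s", "X", "Y"), ("p", "Y", "Y", "Z"), {"Z"})
query = [("s", "A", "B"), ("p", "B", "B", "C")]
implied = covers(query[0], query[1], delta2, query)
```

For the query of the slide the check succeeds, so p(B,B,C) can be dropped; it fails as soon as the last argument of p also occurs in another atom.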

slide-51
SLIDE 51

Optimizations: Query Elimination

• unique elimination strategy (w.r.t. the final size of the rewriting)
  • see paper.
• given m = |body(ρ)| and n = |Σ|
  • worst-case size of [Σf] ∪ [Q,Σf] is O((n∙m)^m)
  • worst-case size of QΣ = <q,π> is O(n+m)^m
• atom coverage under linear TGDs can be checked in polynomial time
  • see paper.

slide-52
SLIDE 52

Experimental Results

slide-53
SLIDE 53

Discussion

• Datalog rewriting is substantially more compact than UCQ rewriting.
• It is unclear whether this always leads to an increase in performance.
• Extend the procedure to larger classes of TGDs:
  • guarded TGDs [Calì et al., PODS 09] (not FO-rewritable)
  • sticky-join TGDs [Calì et al., VLDB 10]

slide-54
SLIDE 54

The Datalog Family

Thomas Lukasiewicz, Georg Gottlob, Andreas Pieris, Andrea Calì, Giorgio Orsi

Thank you!