SLIDE 1

First-Order Under-Approximations of Consistent Query Answers

DBDBD 2015, Amsterdam

Floris Geerts, Fabian Pijcke, Jef Wijsen

  • Dept. of Computer Science — University of Mons
  • Dept. of Mathematics and Computer Science — University of Antwerp
SLIDE 2

Uncertain Database

Definition (Uncertain Database and Repair)

An uncertain database is a database in which primary keys can be violated. A repair of an uncertain database is any maximal consistent subset.

Example

ManagedBy
  Dept   Mgr      Budget
  CIA    Barack   60M
  MI6    James    15M

WorksFor
  Agent   Dept
  James   CIA
  James   MI6

The uncertainty about James’ department gives rise to two repairs: one with WorksFor(James, CIA), another with WorksFor(James, MI6).
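The definition of a repair can be sketched in a few lines of Python. This is only an illustration: it assumes that Dept is the primary key of ManagedBy and Agent is the primary key of WorksFor (the first column in both cases), and the names `db`, `blocks`, and `repairs` are hypothetical helpers, not part of the talk.

```python
from itertools import product

# Each relation maps to (number of leading key columns, set of tuples).
# Assumption: Dept is the key of ManagedBy, Agent is the key of WorksFor.
db = {
    "ManagedBy": (1, {("CIA", "Barack", "60M"), ("MI6", "James", "15M")}),
    "WorksFor": (1, {("James", "CIA"), ("James", "MI6")}),
}

def blocks(key_len, tuples):
    """Group the tuples that agree on their primary-key prefix."""
    groups = {}
    for t in sorted(tuples):
        groups.setdefault(t[:key_len], []).append(t)
    return list(groups.values())

def repairs(db):
    """Yield every repair: keep exactly one tuple from each block."""
    all_blocks = [
        (name, block)
        for name, (key_len, tuples) in sorted(db.items())
        for block in blocks(key_len, tuples)
    ]
    for choice in product(*(block for _, block in all_blocks)):
        rep = {name: set() for name in db}
        for (name, _), t in zip(all_blocks, choice):
            rep[name].add(t)
        yield rep

all_repairs = list(repairs(db))
for rep in all_repairs:
    print(sorted(rep["WorksFor"]))
```

Keeping one tuple per block is what makes each subset both consistent and maximal. The number of repairs is the product of the block sizes, so this enumeration is exponential in general and is meant only to make the definition concrete.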

SLIDE 3

Certain Query Answering

Definition

The certain answer to a query q on an uncertain database db is defined by:

⋂ {q(rep) | rep is a repair of db}.

Intuitively, an answer is certain if it holds true in every repair. We write ⌊q⌋ for the query that takes in an uncertain database db, and returns the certain answer, i.e.,

⌊q⌋(db) := ⋂ {q(rep) | rep is a repair of db}.

SLIDE 5

Certain Query Answering: Example

Let db be the following uncertain database:

ManagedBy
  Dept   Mgr      Budget
  CIA    Barack   60M
  MI6    James    15M

WorksFor
  Agent   Dept
  James   CIA
  James   MI6

Let rep1 be the repair with WorksFor(James, CIA), and rep2 be the repair with WorksFor(James, MI6).

Let q0 be the query “Which departments are self-managed, i.e., managed by one of its agents?”

q0 = {d | ∃m∃b (ManagedBy(d, m, b) ∧ WorksFor(m, d))}.

⌊q0⌋(db) = q0(rep1) ∩ q0(rep2) = {} ∩ {MI6} = {}.
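The computation above can be replayed with a small brute-force sketch. It assumes the key of each relation is its first column, and the helper names (`repairs_of`, `q0`, `certain_q0`) are made up for this illustration; it enumerates repairs explicitly rather than using any first-order rewriting.

```python
from itertools import product

# Assumption: the primary key is the first column of each relation.
managed_by = {("CIA", "Barack", "60M"), ("MI6", "James", "15M")}
works_for = {("James", "CIA"), ("James", "MI6")}

def repairs_of(relation):
    """All ways to keep exactly one tuple per key value (first column)."""
    by_key = {}
    for t in sorted(relation):
        by_key.setdefault(t[0], []).append(t)
    return [set(choice) for choice in product(*by_key.values())]

def q0(mb, wf):
    """Self-managed departments: {d | exists m, b: ManagedBy(d, m, b) and WorksFor(m, d)}."""
    return {d for (d, m, b) in mb if (m, d) in wf}

def certain_q0(mb, wf):
    """Intersect q0(rep) over all repairs rep of the database."""
    answers = [q0(r_mb, r_wf) for r_mb in repairs_of(mb) for r_wf in repairs_of(wf)]
    return set.intersection(*answers)

print(certain_q0(managed_by, works_for))  # set()
```

Because primary keys are declared per relation, the repairs of the database are exactly the combinations of one repair per relation, which is why the two list comprehensions can be nested independently.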

SLIDE 10

Data Complexity

The focus of this paper is on computing certain answers to self-join-free conjunctive queries q, for which three possibilities can occur:

A. ⌊q⌋ can be expressed in relational calculus (the “ideal” case);
B. ⌊q⌋ cannot be expressed in relational calculus, but can be computed by a polynomial-time algorithm; or
C. ⌊q⌋ cannot even be computed by a polynomial-time algorithm (unless P = NP).

Recall: a self-join-free conjunctive query q is a relational calculus query of the form

{x⃗ | ∃y⃗ (R1(z⃗1) ∧ ··· ∧ Rℓ(z⃗ℓ))},

in which i ≠ j implies Ri ≠ Rj.
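As a small illustration of the self-join-free condition, the following sketch checks that no relation name occurs in more than one atom. The list-of-atoms encoding and the function name `is_self_join_free` are assumptions made for this example only.

```python
from collections import Counter

# A conjunctive query body as a list of atoms: (relation name, tuple of terms).
q_body = [
    ("ManagedBy", ("d", "m", "b")),
    ("WorksFor", ("m", "d")),
]

def is_self_join_free(atoms):
    """True when no relation name occurs in more than one atom."""
    counts = Counter(name for name, _ in atoms)
    return all(n == 1 for n in counts.values())

print(is_self_join_free(q_body))                                # True
print(is_self_join_free(q_body + [("WorksFor", ("m", "e"))]))   # False
```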

SLIDE 13

Examples

Case A: ⌊q⌋ in relational calculus

“Who is the manager of CIA?”:

q0 = {m | ∃b (ManagedBy(CIA, m, b))}.

⌊q0⌋ can be expressed in relational calculus, as follows:

⌊q0⌋ = {m | ∃b (ManagedBy(CIA, m, b) ∧ ∀m′∀b′ (ManagedBy(CIA, m′, b′) → m′ = m))}

Case B: ⌊q⌋ in P, but not expressible in relational calculus

“Get budgets of self-managed departments”:

q0 = {b | ∃d∃m (ManagedBy(d, m, b) ∧ WorksFor(m, d))}.

Case C: ⌊q⌋ is coNP-hard

Example in the paper.
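For Case A, the relational calculus formula translates directly into SQL. The sketch below runs one possible rendering through Python's standard `sqlite3` module. The table layout, the column names, and the extra MI6 tuple with manager Ian are assumptions added for illustration; the department is passed as a parameter instead of the constant CIA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY constraint is declared, so key violations can be stored.
conn.execute("CREATE TABLE ManagedBy (dept TEXT, mgr TEXT, budget TEXT)")
conn.executemany(
    "INSERT INTO ManagedBy VALUES (?, ?, ?)",
    [
        ("CIA", "Barack", "60M"),
        ("MI6", "James", "15M"),
        ("MI6", "Ian", "15M"),  # hypothetical tuple that violates the key on MI6
    ],
)

# SQL rendering of
#   {m | exists b (ManagedBy(dept, m, b) and
#                  forall m', b' (ManagedBy(dept, m', b') -> m' = m))}
CERTAIN_MANAGER = """
SELECT DISTINCT t.mgr
FROM ManagedBy AS t
WHERE t.dept = ?
  AND NOT EXISTS (
        SELECT 1 FROM ManagedBy AS u
        WHERE u.dept = t.dept AND u.mgr <> t.mgr
  )
"""

def certain_manager(dept):
    """Managers of dept that appear in every repair's answer."""
    return [row[0] for row in conn.execute(CERTAIN_MANAGER, (dept,))]

print(certain_manager("CIA"))  # ['Barack']
print(certain_manager("MI6"))  # []
```

The universal quantifier becomes `NOT EXISTS` over a conflicting tuple, so the query runs on the inconsistent table as stored, without enumerating repairs.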

SLIDE 15

Research Question

Since RDBMSs cope well with relational calculus (in the form of SQL), it is easy to handle the case where ⌊q⌋ is expressible in relational calculus (case A). But what if ⌊q⌋ is not expressible in relational calculus (cases B and C)?

Find a relational calculus query ϕ (the greater with respect to ⊆, the better) such that

  • Under-Approximation: ϕ ⊆ ⌊q⌋; and
  • First-Order Postprocessing: ϕ is a first-order combination (using ∧, ∨, ¬, ∃, ∀) of queries of the form ⌊qi⌋, where qi is self-join-free conjunctive and ⌊qi⌋ can be expressed in relational calculus (as in case A).

Such a query ϕ is called a strategy for ⌊q⌋.

SLIDE 16

Practical Setting

1. Restricted query interface to an inconsistent database db: You can only ask self-join-free conjunctive queries q!

2. Moreover, the interface only returns consistent answers computable in relational calculus:
  • If ⌊q⌋ cannot be expressed in relational calculus, then your query q is rejected;
  • otherwise the answer ⌊q⌋(db) will be returned.

3. Assume that your query q is rejected. How will you proceed? Find queries q1, ..., qℓ, each accepted by the interface, and a relational calculus query ϕ such that ϕ(⌊q1⌋(db), ..., ⌊qℓ⌋(db)) is a “large” subset of ⌊q⌋(db). Intuitively, the strategy ϕ does some first-order postprocessing on answers obtained from the interface.

SLIDE 17

Optimality of Strategies

Let q be a self-join-free conjunctive query such that ⌊q⌋ is not expressible in relational calculus.

  • Obviously, there exists no strategy ϕ such that ϕ ≡ ⌊q⌋, because ϕ is a relational calculus query, but ⌊q⌋ cannot be expressed in relational calculus.
  • Obviously, strategies are closed under union: if ϕ1 and ϕ2 are strategies, then ϕ1 ∪ ϕ2 is a strategy. If neither of ϕ1 and ϕ2 is contained in the other, then ϕ1 ∪ ϕ2 is a better strategy than ϕ1 (and than ϕ2).
  • A strategy ϕ for ⌊q⌋ is called optimal if for every other strategy ϕ′ for ⌊q⌋, we have ϕ′ ⊆ ϕ ⊆ ⌊q⌋.

SLIDE 18

Example

1. “Get budgets of self-managed departments”:

   q0 = {b | ∃d∃m (ManagedBy(d, m, b) ∧ WorksFor(m, d))}.

   ⌊q0⌋ cannot be expressed in relational calculus!

2. “Get budgets of self-managed departments managed by Barack (or James)”:

   q1 = {b | ∃d (ManagedBy(d, ‘Barack’, b) ∧ WorksFor(‘Barack’, d))}
   q2 = {b | ∃d (ManagedBy(d, ‘James’, b) ∧ WorksFor(‘James’, d))}

   ⌊q1⌋ and ⌊q2⌋ can be expressed in relational calculus!

3. Then, the following query is a strategy for ⌊q0⌋:

   ⌊q1⌋ ∪ ⌊q2⌋.

   This strategy is not optimal (since we can add a query for, e.g., ‘Sherlock’).
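The following sketch evaluates the strategy ⌊q1⌋ ∪ ⌊q2⌋ on a small database and compares it with ⌊q0⌋ computed by brute force. The data (including the MI5 and Sherlock tuples) are hypothetical, the key of each relation is assumed to be its first column, and certain answers are obtained by enumerating repairs, not by the relational calculus rewritings an actual interface would use.

```python
from itertools import product

# Hypothetical data; the key is assumed to be the first column of each relation.
managed_by = {("CIA", "Barack", "60M"), ("MI6", "James", "15M"), ("MI5", "Sherlock", "20M")}
works_for = {("James", "CIA"), ("James", "MI6"), ("Barack", "CIA"), ("Sherlock", "MI5")}

def repairs_of(relation):
    """All ways to keep exactly one tuple per key value (first column)."""
    by_key = {}
    for t in sorted(relation):
        by_key.setdefault(t[0], []).append(t)
    return [set(choice) for choice in product(*by_key.values())]

def certain(query):
    """Brute-force certain answer: intersect the answers over all repairs."""
    answers = [query(mb, wf) for mb in repairs_of(managed_by) for wf in repairs_of(works_for)]
    return set.intersection(*answers)

def q0(mb, wf):
    """Budgets of self-managed departments."""
    return {b for (d, m, b) in mb if (m, d) in wf}

def q_for(manager):
    """q1, q2, ...: q0 with the manager fixed to a constant."""
    return lambda mb, wf: {b for (d, m, b) in mb if m == manager and (m, d) in wf}

strategy = certain(q_for("Barack")) | certain(q_for("James"))
extended = strategy | certain(q_for("Sherlock"))
exact = certain(q0)

print(strategy, extended, exact)
```

On this data the two-constant strategy returns only 60M, while the certain answer to q0 also contains 20M. Adding the query for ‘Sherlock’ recovers it, which is the sense in which the strategy on this slide can always be improved by one more constant.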

SLIDE 19

Example (Continued)

q0 = “Get budgets of self-managed departments”:

1. “Get self-managed departments together with their budget”:

   q3 = {d, b | ∃m (ManagedBy(d, m, b) ∧ WorksFor(m, d))}.

   “Get manager and budget of self-managed departments”:

   q4 = {m, b | ∃d (ManagedBy(d, m, b) ∧ WorksFor(m, d))}.

   ⌊q3⌋ and ⌊q4⌋ can be expressed in relational calculus!

2. Then, the following query is a strategy for ⌊q0⌋:

   ∃d (⌊q3(d, b)⌋) ∪ ∃m (⌊q4(m, b)⌋).

   This strategy is strictly better than the strategy on the previous slide (it does not rely on constants like Barack, James, Sherlock).
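A brute-force sketch of this strategy is given below, on two hypothetical databases that are not from the talk. As before, the key of each relation is assumed to be its first column and certain answers are computed by enumerating repairs; the point is only to compare the strategy's output with ⌊q0⌋.

```python
from itertools import product

def repairs_of(relation):
    """Keep exactly one tuple per key value (key = first column, an assumption)."""
    by_key = {}
    for t in sorted(relation):
        by_key.setdefault(t[0], []).append(t)
    return [set(choice) for choice in product(*by_key.values())]

def certain(query, managed_by, works_for):
    """Brute-force certain answer: intersect the answers over all repairs."""
    answers = [query(mb, wf) for mb in repairs_of(managed_by) for wf in repairs_of(works_for)]
    return set.intersection(*answers)

def q0(mb, wf):
    return {b for (d, m, b) in mb if (m, d) in wf}

def q3(mb, wf):
    return {(d, b) for (d, m, b) in mb if (m, d) in wf}

def q4(mb, wf):
    return {(m, b) for (d, m, b) in mb if (m, d) in wf}

def strategy(managed_by, works_for):
    """Project the certain answers of q3 and q4 onto b, then take the union."""
    from_q3 = {b for (_, b) in certain(q3, managed_by, works_for)}
    from_q4 = {b for (_, b) in certain(q4, managed_by, works_for)}
    return from_q3 | from_q4

# Hypothetical database 1: Sherlock certainly manages his own department.
mb1 = {("MI6", "James", "15M"), ("MI5", "Sherlock", "20M")}
wf1 = {("James", "CIA"), ("James", "MI6"), ("Sherlock", "MI5")}

# Hypothetical database 2: every repair has a self-managed department with
# budget 10M, but no single department or manager witnesses it in all repairs.
mb2 = {("D1", "A", "10M"), ("D1", "B", "10M"), ("D2", "B", "10M")}
wf2 = {("A", "D1"), ("B", "D1"), ("B", "D2")}

print(strategy(mb1, wf1), certain(q0, mb1, wf1))
print(strategy(mb2, wf2), certain(q0, mb2, wf2))
```

On the first database the strategy finds 20M without naming Sherlock and coincides with ⌊q0⌋. On the second it returns the empty set although 10M is a certain answer to q0, which illustrates that a strategy is an under-approximation and need not equal ⌊q0⌋.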

SLIDE 22

Contribution

1. We show how to build, given a self-join-free conjunctive query q, a strategy ϕ for ⌊q⌋ of the syntactic form

   ∃x⃗1 (⌊q1⌋) ∪ ··· ∪ ∃x⃗ℓ (⌊qℓ⌋).

   That is, postprocessing is limited to ∃ and ∪.

2. We show that our strategy is optimal in some weak sense: for every other strategy ϕ′ of the same syntactic form, we have ϕ′ ⊆ ϕ ⊆ ⌊q⌋.

3. Open question: Is it possible to improve strategies by using negation in postprocessing?

SLIDE 28

Conclusion

1. We have proposed a new framework for divulging an inconsistent database to end users, which adopts two postulates:
  • never divulge inconsistencies to end users; and
  • the data complexity of queries must remain tractable (and even within relational calculus in this paper).

2. The notion of strategy captures how end users can obtain certain answers under such access postulates.

3. We show how to build strategies of a syntactically restricted form.

4. Challenging open question: Is it possible to find better strategies of a more general syntactic form?
