1
Database Theory: Conjunctive Queries & Static Analysis CS 645 - - PowerPoint PPT Presentation
Database Theory: Conjunctive Queries & Static Analysis CS 645 - - PowerPoint PPT Presentation
Database Theory: Conjunctive Queries & Static Analysis CS 645 Feb 20, 2006 1 Life of a database theoretician Expressiveness of query languages Any query in L1 can be expressed in L2 Query q cannot be expressed in L
Life of a database theoretician
- Expressiveness of query languages
– Any query in L1 can be expressed in L2 – Query q cannot be expressed in L
- Complexity of languages
– Bounds on resources required to evaluate any query in language L
- Static analysis of queries (for optimization)
– Given q in L: is it minimal? – Given q1 and q2 in L: are they equivalent?
- Views
2
Coming lectures
- TODAY:
– Overview of languages – Conjunctive queries (CQs) – Properties of CQs – Containment/equivalence for CQs
- Next Week
– Adding recursion – Reasoning about views
3
Query languages
- So far weʼve seen:
– Relational algebra – Relational calculus – SQL
4
Review: relational algebra
- Five operators:
– Union: ∪ – Difference: - – Selection: σ – Projection: Π – Cartesian Product: ×
- Derived or auxiliary operators:
– Intersection, complement – Joins (natural,equi-join, theta join) – Renaming: ρ
Review: relational calculus
Name and sid of students who are taking the course “DB” English:
{xname, xsid | ∃xcid∃xterm Students(xsid,xname) ∧ Takes(xsid,xcid) ∧ Course(xcid,”DB”, xterm) }
RC: RA:
Πname,sid (Students Takes σname=”DB” (Course)
Review: SQL
Basic form: SELECT attributes FROM relations (possibly multiple, joined) WHERE conditions (selections)
Query language classes
FO queries Expressiveness Logic Algebra SQL RA (safe) RC SFW +
UNION EXCEPT
single datalog rule
Conjunctive Queries
Recursive Queries
9
Conjunctive Queries
- A subset of FO queries (i.e. less expressive)
- Many queries in practice are conjunctive
- Some optimizers handle only conjunctive
queries - break larger queries into many CQs
- CQʼs have “better” theoretical properties than
arbitrary queries
abbreviated: CQ
- R: Extensional database (EDB) - stored
- P: Intentional database (IDB) - computed
Conjunctive Queries
10
in rule-based (datalog) notation
P(x,z) :- R(x,y) & R(y,z)
Body Head Variables Subgoals “IF” Conjunction ← Implicit ∃
- More formally:
- Consider all possible substitutions:
assignments of the variables in the body
Conjunctive Queries
11
P(x,z) :- R(x,y) & R(y,z)
Intuitively: when facts in the body are true of stored relations, then we infer the fact in the head
12
Examples
A(x) :- ManagedBy(“Smith”,y) & ManagedBy(x,y)
ManagedBy(x,y) Sam Frank Smith Joan Sally Joan Smith Bob A(x) Sally Smith
substitution y = Joan x = Sally substitution y = Joan x = Smith substitution y = Bob x = ?
All employees having the same manager as “Smith” EDB Relation: ManagedBy(emp,mgr)
Defining answers to CQ
- A substitution v is a function from variables into the
- domain. e.g. x → a, y → a, z → b, u → c
- Let I be an instance, i.e. relations I(R1) ... I(Rn)
- A tuple t is in the answer q(I) if there is a
substitution v s.t:
– v(u1) ∈ I(R1) for each i, and – t = v(u)
13
ans(u) :- R1(u1) & ... & Rn(un)
e.g. ui = (x,y,z)
general form
- f a CQ q
v(ui) = (a,a,b)
14
Examples
- Find all employees having the same director
as Smith: A(x) :- ManagedBy(“Smith”,y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z)
EDB Relation: ManagedBy(emp,mgr)
(Your director is your managerʼs manager)
Query language classes
FO queries Expressiveness RA (safe) RC Logic Algebra SQL SFW +
UNION EXCEPT
RA: σ,π,×
single datalog rule
Conjunctive Queries
Recursive Queries
16
CQ and RA
Relational Algebra:
- CQ correspond precisely to σC, ΠA, ×
(missing: ∪, –)
Π$2.name
σname=“Smith”
ManagedBy ManagedBy
$1.manager=$2.manager
A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)
Query language classes
FO queries Expressiveness RA (safe) RC Logic Algebra SQL SFW +
UNION EXCEPT
SdFW RA: σ,π,×
single datalog rule
Conjunctive Queries
Recursive Queries
18
CQ and SQL
select distinct m2.name from ManagedBy m1, ManagedBy m2 where m1.name=“Smith” AND m1.manager=m2.manager A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)
Rule-based: SQL:
Notice “distinct”
Boolean queries
19
A() :- ManagedBy(“Smith”, x), ManagedBy(“Sally”, x) Is there someone who manages both Smith and Sally?
- Returns:
– relation {〈〉} if the answer is yes – relation { } if the answer is no
empty relation non-empty relation
Properties of Conjunctive Queries
- Satisfiability
– A query q is satisfiable if there exists some input relation I such that q(I) is non-empty. – FACT: Every CQ is satisfiable.
- Monotonicity
– A query q is monotonic if for each instance I,J
- ver schema, I ⊆ J implies q(I) ⊆ q(J).
– FACT: Every CQ is monotonic.
20
Satisfiability of CQs
21
S(x,y,z) :- P(x,w) & R(w,y,v) & P(v,z)
a b d e
P
b c d
R We can always generate satisfying EDB relations from the body of the rule.
a c e
S a b b c d d e
Monotonicity of CQs
- Consider two databases I, J s.t. I ⊆ J.
- let t ∈ q(I).
– Then for some substitution v:
- v(ui) ∈ I(Ri) for each i.
- t = v(u)
– Since I ⊆ J, v(ui) ∈ J(Ri) for each i
– So t ∈ q(J) ans(u) :- R1(u1) & ... & Rn(un)
e.g. ui = (x,y,z)
general form
- f a CQ q
Consequence of monotonicity
SELECT name FROM Product WHERE price > ALL (SELECT price FROM Purchase WHERE maker=ʻGizmo-Worksʼ) Product ( pname, price, category, maker) Find products that are more expensive than all those produced By “Gizmo-Works”
- This query is NOT monotone.
- Therefore, it is not in the class of conjunctive queries.
- It cannot be expressed as a simple SFW query.
Extensions of CQs
24
Query language classes
FO queries Expressiveness RA (safe) RC Logic Algebra SQL SFW +
UNION EXCEPT
SdFW RA: σ,π,×
single datalog rule
Conjunctive Queries
Recursive Queries
26
Extensions of CQ: disequality
CQ≠
A(y) :- ManagedBy(x,y), ManagedBy(z,y), x≠z
Find managers that manage at least 2 employees
Extensions of CQ: inequality
CQ<
A(y) :- ManagedBy(x,y), Salary(x,u), Salary(y,v), u > v
Find employees earning more than their manager
Additional EDB Relation: Salary(emp,money)
28
Extensions of CQ: negation
CQ¬ A(y) :- Office(“Alice”,u), Office(y,u), ManagedBy(“Alice”,x), ¬ManagedBy(y,x)
Find people sharing the same office with Alice, but with a different manager
Additional EDB Relation: Office(emp,officenum)
29
Extensions of CQ: union
UCQ A(name) :- Employee(name, dept, age, salary), age > 50 A(name) :- RetiredEmployee(name, address)
Unions of conjunctive queries Datalog notation is very convenient for expressing unions (no need for ∨ ) Rule-based:
Query language classes
FO queries Expressiveness RA (safe) RC Logic Algebra SQL SFW +
UNION EXCEPT
SdFW RA: σ,π,×
single datalog rule
UCQ CQ< CQ≠ CQ¬
Conjunctive Queries
Recursive Queries
31
Extensions of CQ
- If we extend too much, we capture FO
– Namely: CQs + Union, Negation
- Theoreticians need to be careful: small
extensions may make a huge difference
- n certain theoretical properties of CQ
Query language classes
FO queries Expressiveness RA (safe) RC Logic Algebra SQL SFW +
UNION EXCEPT
Conjunctive Queries
Recursive Queries UCQ¬ SdFW RA: σ,π,×
single datalog rule
UCQ CQ< CQ≠ CQ¬
33
Query Equivalence and Containment
- One kind of static analysis
- Useful for query optimization
- Intensively studied since 1977
34
Query Equivalence
SELECT x.name, x.manager FROM Employee x, Employee y WHERE x.dept = ʻSalesʼ and x.office = y.office and x.floor = 5 and y.dept = ʻSalesʼ
Hmmmm…. Is there a simpler way to write that ?
35
Query Equivalence
- Queries q1 and q2 are equivalent if for
every database D, q1(D) = q2(D).
- Notation: q1 ≡ q2
relations equal
- Query q1 is contained in q2 if for every
database D, q1(D) ⊆ q2(D).
- Notation: q1 ⊆ q2
- Obviously: q1 ⊆ q2 and q2 ⊆ q1 iff q1 ≡ q2
- Conversely: q1∧ q2 ≡ q2 iff q1 ⊆ q2
Query Containment
We will study the containment problem only.
Sidenote: containment for Boolean queries
- Recall: q1 is contained in q2 if for every database D,
q1(D) ⊆ q2(D).
– if q1, q2 are boolean they return {〈〉} or { } – containment says: – whenever q1(D) = {〈〉} then q2(D) = {〈〉}.
- Containment is implication: q1 → q2
37
38
Examples of Query Containments
q1(x) :- R(x,y), R(y,z), R(z,w) q2(x) :- R(x,y), R(y,z) Is q1 ⊆ q2 ?
39
Examples of Query Containments
q1(x) :- R(x,y), R(y,z), R(z,x) q2(x) :- R(x,y), R(y,x) Is q1 ⊆ q2 ?
A B C
Counter-example
40
Examples of Query Containments
q1(x) :- R(x,u), R(u,u) q2(x) :- R(x,u), R(u,v), R(v,w) Is q1 ⊆ q2 ?
A B
Example
41
Examples of Query Containments
q1(x) :- R(x,u), R(u,”Smith”) q2(x) :- R(x,u), R(u,v) Is q1 ⊆ q2 ?
42
Query Containment
- Theorem Query containment for CQ is
decidable and NP-complete.
(query complexity)
Checking containment
- 1. “Freeze” q1
- Replace variables by unique constants
- x --> ax, u --> ay
- this is called canonical database of q1
- 2. Evaluate q2 on frozen body of q1
- 3. If frozen head is derived, then
q1 ⊆ q2
q1(x) :- R(x,u), R(u,u) q2(x) :- R(x,u), R(u,v), R(v,w)
ax ay ay ay
body
(ax)
head
substitution: x = ax, u= ay, v= ay, w= ay
Containment!
Why does this test work?
- If the test is negative, the canonical database
constructed is a counterexample to containment.
- If the test is positive:
– substitution v: var(q2) --> “canonical domain” – this implies f: var(q2) --> var(q1) ∪ const(q1) – Now suppose t ∈ q1(I) for any instance I
- there is substitution w: var(q1) --> domain
– such that t is derived.
- then f followed-by w is a substitution showing
that t will be in q2(I).
44
Query Homomorphisms
- A homomorphism f : q2 → q1 is a function
f: var(q2) → var(q1) ∪ const(q1) such that:
– f(body(q2)) ⊆ body(q1) – f(tq1) = tq2
The Homomorphism Theorem q1 ⊆ q2 iff there exists a homomorphism f : q2 → q1
Chandra & Merlin 1977
q1(x) :- R(x,u), R(u,u) q2(xʼ) :- R(xʼ,uʼ), R(uʼ,vʼ), R(vʼ,wʼ)
homomorphism f: xʼ → x uʼ → u vʼ → u wʼ → u
46
The Homeomorphism Theorem
- Theorem Conjunctive query containment is:
- 1. decidable (why ?)
- 2. in NP (why ?)
- 3. NP-hard
- In short: containment for CQs is NP-complete
Query Minimization
Definition A conjunctive query q is minimal if for every other conjunctive query qʼ s.t. q ≡ qʼ, qʼ has at least as many predicates (ʻsubgoalsʼ) as q
q(x) :- R(x,y), R(y,z), R(x,x) q(x) :- R(x,y), R(y,z), R(x,ʼAliceʼ)
Are these queries minimal ?
Query Minimization
- Query minimization algorithm
Choose a subgoal g of q Remove g: let qʼ be the new query We already know q ⊆ qʼ (why ?) If qʼ ⊆ q then permanently remove g
- Notice: the order in which we inspect
subgoals doesnʼt matter
Other containment problems
- Extensions of CQs:
– Unions of CQs – CQs with inequality
- FO queries
- Containment under constraints
- What about bags?
– strange things happen
49
Containment under constraints
- Recall: query q1 is contained in q2 if for
every database D, q1(D) ⊆ q2(D).
- What if we know more about our input
databases?
- Replace “every database D”, with:
– “every database satisfying constraint C” Containment under FD is NP-complete
Containment for FO queries
- Theorem Satisfiability for FO queries is
undecidable
- Lemma Query containment/equivalence
for FO is undecidable
- if we had an algorithm for equivalence, we
could use it to decide satisfiability of q:
- check: q ≡ false
Consequence: we cannot do global query
- ptimization for first-order queries.
Review
- CQs are an important fragment of FO
– Equivalences: RA: σ,π,× SQL: SdFW – Properties: satisfiable, monotonic – containment/equivalence decidable, NPc
- Expressiveness
– CQs strictly less expressive than FO
- Hardness of static optimization
52