Database Theory: Conjunctive Queries & Static Analysis CS 645 - - PowerPoint PPT Presentation

database theory conjunctive queries static analysis
SMART_READER_LITE
LIVE PREVIEW

Database Theory: Conjunctive Queries & Static Analysis CS 645 - - PowerPoint PPT Presentation

Database Theory: Conjunctive Queries & Static Analysis CS 645 Feb 20, 2006 1 Life of a database theoretician Expressiveness of query languages Any query in L1 can be expressed in L2 Query q cannot be expressed in L


slide-1
SLIDE 1

1

Database Theory: Conjunctive Queries & Static Analysis

CS 645

Feb 20, 2006

slide-2
SLIDE 2

Life of a database theoretician

  • Expressiveness of query languages

– Any query in L1 can be expressed in L2 – Query q cannot be expressed in L

  • Complexity of languages

– Bounds on resources required to evaluate any query in language L

  • Static analysis of queries (for optimization)

– Given q in L: is it minimal? – Given q1 and q2 in L: are they equivalent?

  • Views

2

slide-3
SLIDE 3

Coming lectures

  • TODAY:

– Overview of languages – Conjunctive queries (CQs) – Properties of CQs – Containment/equivalence for CQs

  • Next Week

– Adding recursion – Reasoning about views

3

slide-4
SLIDE 4

Query languages

  • So far weʼve seen:

– Relational algebra – Relational calculus – SQL

4

slide-5
SLIDE 5

Review: relational algebra

  • Five operators:

– Union: ∪ – Difference: - – Selection: σ – Projection: Π – Cartesian Product: ×

  • Derived or auxiliary operators:

– Intersection, complement – Joins (natural,equi-join, theta join) – Renaming: ρ

slide-6
SLIDE 6

Review: relational calculus

Name and sid of students who are taking the course “DB” English:

{xname, xsid | ∃xcid∃xterm Students(xsid,xname) ∧ Takes(xsid,xcid) ∧ Course(xcid,”DB”, xterm) }

RC: RA:

Πname,sid (Students Takes σname=”DB” (Course)

slide-7
SLIDE 7

Review: SQL

Basic form: SELECT attributes FROM relations (possibly multiple, joined) WHERE conditions (selections)

slide-8
SLIDE 8

Query language classes

FO queries Expressiveness Logic Algebra SQL RA (safe) RC SFW +

UNION EXCEPT

single datalog rule

Conjunctive Queries

Recursive Queries

slide-9
SLIDE 9

9

Conjunctive Queries

  • A subset of FO queries (i.e. less expressive)
  • Many queries in practice are conjunctive
  • Some optimizers handle only conjunctive

queries - break larger queries into many CQs

  • CQʼs have “better” theoretical properties than

arbitrary queries

abbreviated: CQ

slide-10
SLIDE 10
  • R: Extensional database (EDB) - stored
  • P: Intentional database (IDB) - computed

Conjunctive Queries

10

in rule-based (datalog) notation

P(x,z) :- R(x,y) & R(y,z)

Body Head Variables Subgoals “IF” Conjunction ← Implicit ∃

slide-11
SLIDE 11
  • More formally:
  • Consider all possible substitutions:

assignments of the variables in the body

Conjunctive Queries

11

P(x,z) :- R(x,y) & R(y,z)

Intuitively: when facts in the body are true of stored relations, then we infer the fact in the head

slide-12
SLIDE 12

12

Examples

A(x) :- ManagedBy(“Smith”,y) & ManagedBy(x,y)

ManagedBy(x,y) Sam Frank Smith Joan Sally Joan Smith Bob A(x) Sally Smith

substitution y = Joan x = Sally substitution y = Joan x = Smith substitution y = Bob x = ?

All employees having the same manager as “Smith” EDB Relation: ManagedBy(emp,mgr)

slide-13
SLIDE 13

Defining answers to CQ

  • A substitution v is a function from variables into the
  • domain. e.g. x → a, y → a, z → b, u → c
  • Let I be an instance, i.e. relations I(R1) ... I(Rn)
  • A tuple t is in the answer q(I) if there is a

substitution v s.t:

– v(u1) ∈ I(R1) for each i, and – t = v(u)

13

ans(u) :- R1(u1) & ... & Rn(un)

e.g. ui = (x,y,z)

general form

  • f a CQ q

v(ui) = (a,a,b)

slide-14
SLIDE 14

14

Examples

  • Find all employees having the same director

as Smith: A(x) :- ManagedBy(“Smith”,y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z)

EDB Relation: ManagedBy(emp,mgr)

(Your director is your managerʼs manager)

slide-15
SLIDE 15

Query language classes

FO queries Expressiveness RA (safe) RC Logic Algebra SQL SFW +

UNION EXCEPT

RA: σ,π,×

single datalog rule

Conjunctive Queries

Recursive Queries

slide-16
SLIDE 16

16

CQ and RA

Relational Algebra:

  • CQ correspond precisely to σC, ΠA, ×

(missing: ∪, –)

Π$2.name

σname=“Smith”

ManagedBy ManagedBy

$1.manager=$2.manager

A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)

slide-17
SLIDE 17

Query language classes

FO queries Expressiveness RA (safe) RC Logic Algebra SQL SFW +

UNION EXCEPT

SdFW RA: σ,π,×

single datalog rule

Conjunctive Queries

Recursive Queries

slide-18
SLIDE 18

18

CQ and SQL

select distinct m2.name from ManagedBy m1, ManagedBy m2 where m1.name=“Smith” AND m1.manager=m2.manager A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)

Rule-based: SQL:

Notice “distinct”

slide-19
SLIDE 19

Boolean queries

19

A() :- ManagedBy(“Smith”, x), ManagedBy(“Sally”, x) Is there someone who manages both Smith and Sally?

  • Returns:

– relation {〈〉} if the answer is yes – relation { } if the answer is no

empty relation non-empty relation

slide-20
SLIDE 20

Properties of Conjunctive Queries

  • Satisfiability

– A query q is satisfiable if there exists some input relation I such that q(I) is non-empty. – FACT: Every CQ is satisfiable.

  • Monotonicity

– A query q is monotonic if for each instance I,J

  • ver schema, I ⊆ J implies q(I) ⊆ q(J).

– FACT: Every CQ is monotonic.

20

slide-21
SLIDE 21

Satisfiability of CQs

21

S(x,y,z) :- P(x,w) & R(w,y,v) & P(v,z)

a b d e

P

b c d

R We can always generate satisfying EDB relations from the body of the rule.

a c e

S a b b c d d e

slide-22
SLIDE 22

Monotonicity of CQs

  • Consider two databases I, J s.t. I ⊆ J.
  • let t ∈ q(I).

– Then for some substitution v:

  • v(ui) ∈ I(Ri) for each i.
  • t = v(u)

– Since I ⊆ J, v(ui) ∈ J(Ri) for each i

– So t ∈ q(J) ans(u) :- R1(u1) & ... & Rn(un)

e.g. ui = (x,y,z)

general form

  • f a CQ q
slide-23
SLIDE 23

Consequence of monotonicity

SELECT name FROM Product WHERE price > ALL (SELECT price FROM Purchase WHERE maker=ʻGizmo-Worksʼ) Product ( pname, price, category, maker) Find products that are more expensive than all those produced By “Gizmo-Works”

  • This query is NOT monotone.
  • Therefore, it is not in the class of conjunctive queries.
  • It cannot be expressed as a simple SFW query.
slide-24
SLIDE 24

Extensions of CQs

24

slide-25
SLIDE 25

Query language classes

FO queries Expressiveness RA (safe) RC Logic Algebra SQL SFW +

UNION EXCEPT

SdFW RA: σ,π,×

single datalog rule

Conjunctive Queries

Recursive Queries

slide-26
SLIDE 26

26

Extensions of CQ: disequality

CQ≠

A(y) :- ManagedBy(x,y), ManagedBy(z,y), x≠z

Find managers that manage at least 2 employees

slide-27
SLIDE 27

Extensions of CQ: inequality

CQ<

A(y) :- ManagedBy(x,y), Salary(x,u), Salary(y,v), u > v

Find employees earning more than their manager

Additional EDB Relation: Salary(emp,money)

slide-28
SLIDE 28

28

Extensions of CQ: negation

CQ¬ A(y) :- Office(“Alice”,u), Office(y,u), ManagedBy(“Alice”,x), ¬ManagedBy(y,x)

Find people sharing the same office with Alice, but with a different manager

Additional EDB Relation: Office(emp,officenum)

slide-29
SLIDE 29

29

Extensions of CQ: union

UCQ A(name) :- Employee(name, dept, age, salary), age > 50 A(name) :- RetiredEmployee(name, address)

Unions of conjunctive queries Datalog notation is very convenient for expressing unions (no need for ∨ ) Rule-based:

slide-30
SLIDE 30

Query language classes

FO queries Expressiveness RA (safe) RC Logic Algebra SQL SFW +

UNION EXCEPT

SdFW RA: σ,π,×

single datalog rule

UCQ CQ< CQ≠ CQ¬

Conjunctive Queries

Recursive Queries

slide-31
SLIDE 31

31

Extensions of CQ

  • If we extend too much, we capture FO

– Namely: CQs + Union, Negation

  • Theoreticians need to be careful: small

extensions may make a huge difference

  • n certain theoretical properties of CQ
slide-32
SLIDE 32

Query language classes

FO queries Expressiveness RA (safe) RC Logic Algebra SQL SFW +

UNION EXCEPT

Conjunctive Queries

Recursive Queries UCQ¬ SdFW RA: σ,π,×

single datalog rule

UCQ CQ< CQ≠ CQ¬

slide-33
SLIDE 33

33

Query Equivalence and Containment

  • One kind of static analysis
  • Useful for query optimization
  • Intensively studied since 1977
slide-34
SLIDE 34

34

Query Equivalence

SELECT x.name, x.manager FROM Employee x, Employee y WHERE x.dept = ʻSalesʼ and x.office = y.office and x.floor = 5 and y.dept = ʻSalesʼ

Hmmmm…. Is there a simpler way to write that ?

slide-35
SLIDE 35

35

Query Equivalence

  • Queries q1 and q2 are equivalent if for

every database D, q1(D) = q2(D).

  • Notation: q1 ≡ q2

relations equal

slide-36
SLIDE 36
  • Query q1 is contained in q2 if for every

database D, q1(D) ⊆ q2(D).

  • Notation: q1 ⊆ q2
  • Obviously: q1 ⊆ q2 and q2 ⊆ q1 iff q1 ≡ q2
  • Conversely: q1∧ q2 ≡ q2 iff q1 ⊆ q2

Query Containment

We will study the containment problem only.

slide-37
SLIDE 37

Sidenote: containment for Boolean queries

  • Recall: q1 is contained in q2 if for every database D,

q1(D) ⊆ q2(D).

– if q1, q2 are boolean they return {〈〉} or { } – containment says: – whenever q1(D) = {〈〉} then q2(D) = {〈〉}.

  • Containment is implication: q1 → q2

37

slide-38
SLIDE 38

38

Examples of Query Containments

q1(x) :- R(x,y), R(y,z), R(z,w) q2(x) :- R(x,y), R(y,z) Is q1 ⊆ q2 ?

slide-39
SLIDE 39

39

Examples of Query Containments

q1(x) :- R(x,y), R(y,z), R(z,x) q2(x) :- R(x,y), R(y,x) Is q1 ⊆ q2 ?

A B C

Counter-example

slide-40
SLIDE 40

40

Examples of Query Containments

q1(x) :- R(x,u), R(u,u) q2(x) :- R(x,u), R(u,v), R(v,w) Is q1 ⊆ q2 ?

A B

Example

slide-41
SLIDE 41

41

Examples of Query Containments

q1(x) :- R(x,u), R(u,”Smith”) q2(x) :- R(x,u), R(u,v) Is q1 ⊆ q2 ?

slide-42
SLIDE 42

42

Query Containment

  • Theorem Query containment for CQ is

decidable and NP-complete.

(query complexity)

slide-43
SLIDE 43

Checking containment

  • 1. “Freeze” q1
  • Replace variables by unique constants
  • x --> ax, u --> ay
  • this is called canonical database of q1
  • 2. Evaluate q2 on frozen body of q1
  • 3. If frozen head is derived, then

q1 ⊆ q2

q1(x) :- R(x,u), R(u,u) q2(x) :- R(x,u), R(u,v), R(v,w)

ax ay ay ay

body

(ax)

head

substitution: x = ax, u= ay, v= ay, w= ay

Containment!

slide-44
SLIDE 44

Why does this test work?

  • If the test is negative, the canonical database

constructed is a counterexample to containment.

  • If the test is positive:

– substitution v: var(q2) --> “canonical domain” – this implies f: var(q2) --> var(q1) ∪ const(q1) – Now suppose t ∈ q1(I) for any instance I

  • there is substitution w: var(q1) --> domain

– such that t is derived.

  • then f followed-by w is a substitution showing

that t will be in q2(I).

44

slide-45
SLIDE 45

Query Homomorphisms

  • A homomorphism f : q2 → q1 is a function

f: var(q2) → var(q1) ∪ const(q1) such that:

– f(body(q2)) ⊆ body(q1) – f(tq1) = tq2

The Homomorphism Theorem q1 ⊆ q2 iff there exists a homomorphism f : q2 → q1

Chandra & Merlin 1977

q1(x) :- R(x,u), R(u,u) q2(xʼ) :- R(xʼ,uʼ), R(uʼ,vʼ), R(vʼ,wʼ)

homomorphism f: xʼ → x uʼ → u vʼ → u wʼ → u

slide-46
SLIDE 46

46

The Homeomorphism Theorem

  • Theorem Conjunctive query containment is:
  • 1. decidable (why ?)
  • 2. in NP (why ?)
  • 3. NP-hard
  • In short: containment for CQs is NP-complete
slide-47
SLIDE 47

Query Minimization

Definition A conjunctive query q is minimal if for every other conjunctive query qʼ s.t. q ≡ qʼ, qʼ has at least as many predicates (ʻsubgoalsʼ) as q

q(x) :- R(x,y), R(y,z), R(x,x) q(x) :- R(x,y), R(y,z), R(x,ʼAliceʼ)

Are these queries minimal ?

slide-48
SLIDE 48

Query Minimization

  • Query minimization algorithm

Choose a subgoal g of q Remove g: let qʼ be the new query We already know q ⊆ qʼ (why ?) If qʼ ⊆ q then permanently remove g

  • Notice: the order in which we inspect

subgoals doesnʼt matter

slide-49
SLIDE 49

Other containment problems

  • Extensions of CQs:

– Unions of CQs – CQs with inequality

  • FO queries
  • Containment under constraints
  • What about bags?

– strange things happen

49

slide-50
SLIDE 50

Containment under constraints

  • Recall: query q1 is contained in q2 if for

every database D, q1(D) ⊆ q2(D).

  • What if we know more about our input

databases?

  • Replace “every database D”, with:

– “every database satisfying constraint C” Containment under FD is NP-complete

slide-51
SLIDE 51

Containment for FO queries

  • Theorem Satisfiability for FO queries is

undecidable

  • Lemma Query containment/equivalence

for FO is undecidable

  • if we had an algorithm for equivalence, we

could use it to decide satisfiability of q:

  • check: q ≡ false

Consequence: we cannot do global query

  • ptimization for first-order queries.
slide-52
SLIDE 52

Review

  • CQs are an important fragment of FO

– Equivalences: RA: σ,π,× SQL: SdFW – Properties: satisfiable, monotonic – containment/equivalence decidable, NPc

  • Expressiveness

– CQs strictly less expressive than FO

  • Hardness of static optimization

52