BCNF revisited: 40 Years Normal Forms Part III : BCNF for SQL After - - PowerPoint PPT Presentation

bcnf revisited 40 years normal forms
SMART_READER_LITE
LIVE PREVIEW

BCNF revisited: 40 Years Normal Forms Part III : BCNF for SQL After - - PowerPoint PPT Presentation

Full set of slides A. Skrobov BCNF revisited: 40 Years Normal Forms Part III : BCNF for SQL After F. Ferrarotti, S. Hartmann, H. K ohler, S. Link, M. Vincent , The Boyce-Codd-Heath Normal Form for SQL , to appear in 18th Workshop on Logic,


slide-1
SLIDE 1

Full set of slides

  • A. Skrobov

BCNF revisited: 40 Years Normal Forms

Part III: BCNF for SQL

After F. Ferrarotti, S. Hartmann, H. K¨

  • hler, S. Link, M. Vincent,

The Boyce-Codd-Heath Normal Form for SQL, to appear in 18th Workshop

  • n Logic, Language, Information and Computation (WoLLIC), 2011

Presentation prepared by Artyom Skrobov (329000723), tyomitch@cs.technion.ac.il Faculty of Computer Science Technion - IIT, Haifa 2011

1

slide-2
SLIDE 2

Full set of slides

  • A. Skrobov

Overview

Part I

  • Normal forms and functional dependencies
  • BCNF and redundancy
  • BCNF and update anomalies

Part II

  • BCNF and storage saving
  • Achieving BCNF
  • Other normal forms

Part III

  • BCNF for SQL

2

slide-3
SLIDE 3

Full set of slides

  • A. Skrobov

Part III: BCNF for SQL

SQL supports data model that diverges from the conventional relational model.

  • Bag semantics: duplicate rows allowed in a table.
  • Incomplete data: NULL values are allowed, and treated specially.

Nevertheless, syntactic tools are needed to reason about dependencies and their inference in SQL data model. Such tools can be applied to reason about non-redundancy of the data, and thus to improve the DBMS performance, while maintaining data integrity.

3

slide-4
SLIDE 4

BCNF for SQL

  • A. Skrobov

Real-world databases

To be practical, the relational model needs adaptation.

  • Bag (multiset) semantics: removing duplicate tuples from the relation

after each operation is prohibitively expensive; thus, deduplication is only performed when requested explicitly.

  • NULL values:

alleviate insertion and deletion anomalies in “kind-of”- denormalized schemata. Example: Title Rating Director Apocalypse Now 8.6 Coppola Blade Runner 8.3 Scott The Birds NULL Hitchcock [Title,Rating,Director] with {Title→Rating,Director} is in BCNF, but in the relational model we can’t represent a movie title without a rating; we can’t add a new title to the database until we know both its rating and the director; and we can’t delete an expired rating from the database without deleting the movie title as well.

4

slide-5
SLIDE 5

BCNF for SQL

  • A. Skrobov

Real-world databases

The relational solution is to use two relations having a common key: Title Rating Apocalypse Now 8.6 Blade Runner 8.3 Title Director Apocalypse Now Coppola Blade Runner Scott The Birds Hitchcock Obviously it’s a waste of space if most rows are total (include all the data), since the decomposed schema stores each key multiple times.

5

slide-6
SLIDE 6

BCNF for SQL

  • A. Skrobov

Bag schemata

  • bag schema is a finite, non-empty set B of attributes. Each attribute A

is associated with a domain of possible values, denoted dom(A)

  • bag tuple is a function t : B → ∪

A∈B dom(A) such that

∀A ∈ B : t(A) ∈ dom(A)

  • bag (instance) is a finite multiset b of tuples.

Example: Title Rating Director Apocalypse Now 8.6 Coppola Blade Runner 8.3 Scott Blade Runner 8.3 Scott Note that though this bag has duplicate rows, it still satisfies the classical definition of functional dependency Title→Rating,Director: for each movie title, there is only a single rating and only a single director.

6

slide-7
SLIDE 7

Bag schemata

  • A. Skrobov

FD’s vs. keys in bags

  • The classical definition of functional dependency (FD) still applies:

for X, Y ⊆ B, a bag b is said to satisfy the FD X → Y iff ∀t1, t2 ∈ b : t1[X] = t2[X] → t1[Y ] = t2[Y ]; that is, if tuples having the same value for X must also have the same value for Y .

  • In a relation R, a set of attributes X ⊆ R was called a (super)key if

X → R; that is, if no two distinct tuples can have the same value for X. This definition needs generalization for bags. A bag b is said to satisfy the key constraint (a.k.a. unique constraint) key(X) iff ∀t1, t2 ∈ b : t1[X] = t2[X] → t1 = t2. Obviously, key(X) | = X → B; but the reverse doesn’t hold: in the pre- vious example, | =b Title → B but | =b key(Title) because of the duplicate

  • rows. A bag having duplicate rows violates every key.

7

slide-8
SLIDE 8

Bag schemata

  • A. Skrobov

Axiomatization for total bags

The following set of inference rules is sound and complete for bags:

  • key(X)

X→Y

(demotion)

  • key(Y ), X→Y

key(X)

(pullback) – These two rules specify the equivalence between key(X) and X → B. The superkey rule

key(X) key(XY ) is a special case of the pullback rule.

  • XY →X (reflexivity)
  • X→Y

XZ→Y Z (augmentation)

  • X→Y, Y →Z

X→Z

(transitivity) – These three rules are unchanged from the axiomatization for FD’s on relations.

Dubious: what is cited in place of the classic augmentation doesn’t look equivalent! 8

slide-9
SLIDE 9

Bag schemata

  • A. Skrobov

BCNF for total bags

The semantics of BCNF remains unchanged: “no fact is stored twice”. The last example satisfies the relational definition of BCNF (for each non- trivial X → Y ∈ Σ∗, there must be X → R ∈ Σ∗) but is blatantly redundant because of the duplicate row. For a tuple t ∈ b and an attribute A ∈ B, a replacement of t[A] is defined as a tuple t′ such that t′[B \ A] = t[B \ A], t′[A] = t[A]. Given a set Σ of keys and FD’s, the data value t[A] is called redundant iff every replacement t′ of t[A] results in a bag b′ = (b \ {t}) ∪ {t′} that violates Σ. In the example, the name “Scott” in the last row is redundant, because any its replacement would violate Title → Director. BCNF is now defined for bags as: for each non-trivial X → Y ∈ Σ∗, there must be key(X) ∈ Σ∗(generalization of the original formulation, using the notion of key constraints)

9

slide-10
SLIDE 10

Bag schemata

  • A. Skrobov

BCNF for total bags

An equivalent definition of BCNF: for each non-trivial X → Y ∈ Σ, there must be X → B ∈ Σ∗, and additionally, if Σ has non-trivial FD’s, then it must also have a key. Intuition: for BCNF, the schema must have a key, and therefore, must be a relation. Theorem: B, Σ is in BCNF iff no bag satisfying Σ has redundant data values. The implication of FD’s and keys on (total) bags is decidable in linear time; therefore, a schema can be checked for BCNF compliance in O(|ΣFD||×Σ), where ΣFD is the set of functional dependencies defined over the schema.

10

slide-11
SLIDE 11

Partial schemata

  • A. Skrobov

Incomplete data

  • Incomplete data is represented as partial relations: some of the attributes

in a tuple may not have a value. – E.g., in SQL, some columns are defined as nullable, and the others are constrained as NOT NULL.

  • A tuple that has values assigned to attributes X ⊆ R is called an X-total

tuple, and a relation R consisting only of R-total tuples is a total relation.

  • A set X of non-nullable attributes is called a null-free subschema (NFS);

the NFS is satisfied by a relation if all its tuples are X-total. NFS of a relation R is normally denoted as Rs.

11

slide-12
SLIDE 12

Partial schemata

  • A. Skrobov

Incomplete data

NULL value is included in every domain, and it’s special with regard to dealing with the equality NULL

?

=NULL. It may be interpreted as “missing value” (not participating in any constraints) or as “existing but unknown value” (there must exist a matching total relation that satisfies the constraints). The prohibition of duplicate tuples is generalized for partial relations: a rela- tion R must not have tuples t1, t2 that subsume one another, i.e., ∀A ∈ R : (t1[A] = t2[A]) ∨ (t2[A] = NULL) Partial bags are defined identically, but without the restriction of non-subsumption.

12

slide-13
SLIDE 13

Partial schemata

  • A. Skrobov

NULL consistency

In the “missing value” interpretation, only X-total tuples are considered in joins R ⊲ ⊳X=R∩S S, dependencies X → Y , X ։ Y , and constraints key(X). Example: r Article Supplier Location Cost t1 Kiwi G6Kiwi NULL 1.50 t2 Kiwi G6Kiwi NULL 2.50 The set of dependencies Σ = {Article → Supplier; Article, Location → Cost; Supplier ։ Location} would in a null-free schema imply Article → Cost and Article ։ Location. One can see that | =r Σ; specifically, | =r Article → Supplier and | =r Supplier → Location in the normal sense, and | =r Location → Cost because there are no Location-total tuples in r. Nevertheless, | =r Article → Cost, indi- cating that the null-free inference rules aren’t sound for the partial schema. As for the “unknown value” interpretation, note that Article, Location → Cost requires t1[Location] = t2[Location], while at the same time, Supplier ։ Location requires t1[Location] = t2[Location]. Therefore, | =r Σ.

13

slide-14
SLIDE 14

Partial schemata

  • A. Skrobov

Logic of paradox

  • NULL-consistency substitutes the classical propositional logic with logic
  • f paradox (LP) based on three boolean values: T (true), F (false), and

P (paradoxical)

– The latter is represented in SQL as boolean NULL – Equality of NULL value to any data value is P

  • A logic formula is valid if it evaluates to T or P.

Then, it’s possible to map FD’s, MVD’s, and key constraints to logic for- mulae, and partial bags – to LP interpretations, so that the constructed interpretation violates the formulae iff the original bag violates the dependen- cies; in other words, LP is sufficient to reason about dependency implications. NFS is accounted for by restricting the interpretation to not assign P to the variables corresponding to non-nullable attributes.

14

slide-15
SLIDE 15

Partial schemata

  • A. Skrobov

Logic of paradox

Construction of the formula for a dependency:

  • Φ(key(X)) = ∨

A∈X ¬A′

  • Φ(X → Y ) = ∨

A∈X ¬A′ ∨ ∧ B∈Y B′

  • Φ(X ։ Y ) = ∨

A∈X ¬A′ ∨ ∧ B∈Y \X B′ ∨ ∧ C∈B\XY C′

The interpretation is constructed from two-tuple counterexample subbag: it holds that whenever Σ | =b ϕ, there exists b′ = {b1, b2} ⊆ b such that Σ | =b′ ϕ. Therefore, Σ | = ϕ iff any two-tuple bag that satisfies Σ satisfies ϕ. Then, each A′ in the formula is assigned the truth value b1[A] = b2[A]. The final result is that Σ | = ϕ iff Φ(Σ) | = Φ(ϕ).

15

slide-16
SLIDE 16

Partial schemata

  • A. Skrobov

Logic of paradox (example)

In the previous example, | =r Article → Location and | =r Location → Cost, but | =r Article → Cost: the transitivity of FD’s doesn’t hold because Location is

  • nullable. We map Article → Location to ¬A ∨ L, Location → Cost to ¬L ∨ C,

Article → Cost to ¬A ∨ C, and r to ωr = A S L C

T T P F ;

and indeed, | =ωr {¬A ∨ L, ¬L ∨ C}, but | =ωr ¬A ∨ C: the extension of ωr to LP formulae evaluates ¬A ∨ L to P, ¬L ∨ C to P, and ¬A ∨ C to F. This illustrates that the transitivity of logic implication doesn’t hold in LP; neither does modus ponens. Intuitively, we have constructed ωr by evaluating t1[Article] = t2[Article] to

T, t1[Location] = t2[Location] to P, and t1[Cost] = t2[Cost] to F; therefore, in

terms of LP, (t1[Article] = t2[Article]) → (t1[Location] = t2[Location]) is P (and valid), (t1[Location] = t2[Location]) → (t1[Cost] = t2[Cost]) is P (and valid), and (t1[Article] = t2[Article]) → (t1[Cost] = t2[Cost]) is F (and not valid).

16

slide-17
SLIDE 17

Partial schemata

  • A. Skrobov

S-3 logic

S-3 logic, an alternative NULL-consistent logic, deals with the two classic boolean values T and F. The distinctive feature of S-3 is that it allows for a nullable logic variable A that both A and ¬A be assigned the value T. This logic is equivalent to LP: an LP interpretation assigning P to A corre- sponds to an S-3 interpretation assigning T to both A and ¬A. In the last example, we may have used S-3 for our reasoning; Location is nullable, so we can assign T both to t1[Location] = t2[Location] and to t1[Location] = t2[Location], thereby satisfying both Article → Location and Location → Cost. Since S-3 is equivalent to LP, the transitivity still doesn’t hold.

17

slide-18
SLIDE 18

Partial schemata

  • A. Skrobov

Axiomatization for partial relations

The following set of inference rules is sound and complete for a partial relation R & NFS Rs:

  • XY →X (reflexivity)
  • X→Y Z

X→Y

(decomposition)

  • X→Y, X→Z

X→Y Z

(FD union)

  • X։Y

X։R\Y (complementation)

  • X։Y, X։Z

X։Y Z

(MVD union)

  • X։W, Y ։Z, Y ⊆X∪(W∩Rs)

X։Z\W

(transitivity)

  • X→Y

X։Y (conversion)

  • X։W, Y →Z, Y ⊆X∪(W∩Rs)

X→Z\W

(interaction)

18

slide-19
SLIDE 19

Partial schemata

  • A. Skrobov

Axiomatization for partial relations

The transitivity and interaction rules are reformulated for NULL-consistency, and the rest of the rules are unchanged from the axiomatization for FD’s and MVD’s on total relations. The implication of FD’s and MVD’s on partial relations is decidable in time O (|Σ| + min(kΣ, log pΣ) |Σ|), where Σ is the set of dependencies, consisting

  • f kΣ MVD’s, and pΣ is the size of its dependency basis. This result is known

as almost linear decidability. This bound may be slightly tightened further, by considering Σ[XRs] in place of Σ when deciding the implication Σ

?

| = ϕ for ϕ : X → Y or ϕ : X ։ Y . Another result for the complexity of deciding an implication is O ( |Σ| × |ϕ| × 2|Rs|) , illustrating that declaring less attributes as NOT NULL improves the efficiency

  • f the deciding, thus contributing to the DBMS performance. On the other

hand, for each Rs ⊆ R′

s ⊆ R, if Σ |

=Rs ϕ, then Σ | =R′

s ϕ: by declaring more

attributes as NOT NULL, the DBA enforces at least all of the previously enforced data dependencies, thus contributing to the schema expressiveness.

19

slide-20
SLIDE 20

Partial schemata

  • A. Skrobov

Axiomatization for partial relations (example)

r Article Supplier Location Cost t1 Kiwi G6Kiwi NULL 1.50 t2 Kiwi G6Kiwi NULL 2.50 In our example, Σ = {Article → Supplier; Article, Location → Cost; Supplier ։ Location}, and the axiomata show that the implication Σ | = Article ։ Location holds

  • nly when Supplier ∈ Rs, and the implication Σ |

= Article → Cost holds only when {Supplier, Location} ⊆ Rs.

20

slide-21
SLIDE 21

BCNF for SQL

  • A. Skrobov

Axiomatization for partial bags

The following set of inference rules is sound and complete for a partial bag B with NFS Bs:

  • key(X)

X→Y

(demotion)

  • key(Y ), X→Y

key(X)

(pullback)

– These two rules are unchanged from the axiomatization for FD’s on total bags.

  • XY →X (reflexivity)
  • X→Y Z

X→Y

(decomposition)

  • X→Y, X→Z

X→Y Z

(union)

  • X→Y, Y →Z, Y ⊆XBs

X→Z

(transitivity)

– Except for the NFS requirement in the transitivity rule, these four rules are the classic inference rules for FD’s on relations. 21

slide-22
SLIDE 22

BCNF for SQL

  • A. Skrobov

Axiomatization for partial bags

Note that in a total bag, the augmentation rule is equivalent to the union rule, and the decomposition rule follows from the reflexivity and the transitivity. In a partial bag, these need to be postulated explicitly, because the transitivity doesn’t always hold. The implication of FD’s and keys on partial bags is decidable in linear time. No axiomatization is suggested for the combined class of FD’s, MVD’s, and key constraints on partial bags.

22

slide-23
SLIDE 23

BCNF for SQL

  • A. Skrobov

BCNF for partial bags

With the same definition of redundancy as for total bags (every replacement

  • f a data value results in a dependency violation) and the same definition of

BCNF (for each non-trivial X → Y ∈ Σ∗, there must be key(X) ∈ Σ∗), the same theorem holds: B, Σ is in BCNF iff no bag satisfying Σ has redundant data values. Additional equivalent definitions of BCNF for partial bags:

  • for each non-trivial X → Y ∈ Σ, there must be key(X) ∈ Σ∗
  • for each non-trivial X → Y ∈ Σ, there must be

X → B ∈ Σ∗, key(Z) ∈ Σ, Z ⊆ XBs Intuition: for BCNF, the schema must have a non-nullable key, and therefore, must be a relation. Keys are a stronger constraint than FD’s: projection of a bag preserves its FD’s, but may not preserve its keys. Moreover, join doesn’t preserve keys, thus dependency-preserving decomposition is not well defined for bags. Dealing with partial bags adds the restriction that the join attribute must be non-nullable for the decomposition to be lossless.

23

slide-24
SLIDE 24

BCNF for SQL

  • A. Skrobov

Additional properties of BCNF

  • Equivalent definition: Σ∗

key = Σ∗, where

Σkey = {key(X) ∈ Σ} ∪ {X → Y ∈ Σ | key(X) ∈ Σ∗} (set of key constraints, and FD’s induced by the keys) – Difference w.r.t. the relational model: key constraints need to be added explicitly into Σkey

  • Σ-redundancy: equivalent to the aforementioned replacement-redundancy

– a data value is redundant iff it appears in a projection upon a FD at- tributes set, and the projection has duplicate tuples.

  • Update anomalies: we validate Σkey on updates. The equality Σ∗

key = Σ∗

ensures that the updated bag would satisfy Σ.

  • Storage minimization: a schema is in BCNF iff it has no storage-saving

lossless decomposition (w.r.t. the total size in tuples of all subbags)

24