1
Schema Refinement and Normal Forms
UMass Amherst Feb 14, 2007
Slides Courtesy of R. Ramakrishnan and J. Gehrke, Dan Suciu
Schema Refinement and Normal Forms UMass Amherst Feb 14, 2007 - - PowerPoint PPT Presentation
Schema Refinement and Normal Forms UMass Amherst Feb 14, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke, Dan Suciu 1 Relational Schema Design name Conceptual ER Model Person buys Product Design price name ssn Relational
1
Slides Courtesy of R. Ramakrishnan and J. Gehrke, Dan Suciu
2
Person buys Product name price name ssn
Relational Schema plus Integrity Constraints
Normalized schema
3
Redundancy is at the root of several problems
Integrity constraints, in particular functional
4
Main refinement technique: decomposition
Decomposition should be used judiciously:
5
ssn name lot rating wages hours 123-22-3666 Attishoo 48 8 10 40 231-31-5386 Smiley 22 8 10 30 131-24-3650 Smethurst 35 5 7 30 434-26-3751 Guldu 35 5 7 32 612-67-4134 Madayan 35 8 10 40
Notation: We will denote this relation schema by listing
This is really the set of attributes {S,N,L,R,W,H}.
Some FDs on Hourly_Emps:
S SNLRWH rating determines hrly_wages: R W
7
8
A functional dependency X Y holds over relation R
An FD is a statement about all allowable relations.
A superkey is a set of attributes K such that K B
A key is a minimal superkey
Problems due to R W :
we change W in just the 1st tuple of SNLRWH?
want to insert an employee and don’t know the hourly wage for his rating?
all employees with rating 5, we lose the information about the wage for rating 5!
S N L R W H 123-22-36 Attishoo 48 8 10 40 231-31-53 Smiley 22 8 10 30 131-24-36 Smethurst 35 5 7 30 434-26-37 Guldu 35 5 7 32 612-67-41 Madayan 35 8 10 40
R W 8 10 5 7 S N L R H 123-22-366 Attishoo 48 8 40 231-31-538 Smiley 22 8 30 131-24-365 Smethurst 35 5 30 434-26-375 Guldu 35 5 32 612-67-413 Madayan 35 8 40
10
Given some FDs, we can usually infer additional FDs:
An FD f is implied by a set of FDs F, if f holds for every
Armstrong’s Axioms (X, Y, Z are sets of attributes):
11
Additional rules (that follow from AA):
These are sound and complete inference rules for FDs!
13
Computing the closure F+ can be expensive: computes
Typically, we just want to check if a given FD X Y is
Does F = {A B, B C, C D E } imply A E?
Compute X+ for all sets X If X+ = all attributes, then X is a superkey Consider only the minimal superkeys
15
Returning to the issue of schema refinement, the first
If a relation is in a certain normal form (BCNF, 3NF
Role of FDs in detecting redundancy:
16
Given a relation R, and set of FD’s F on R R is in BCNF if:
“The only non-trivial FDs that hold over R are key
Is the following table in BCNF?
Key is BC, so B is not a superkey Not in BCNF
17
18
BCNF ensures that no redundancy can be detected using FDs!
R is in 3NF if:
Minimality of a key is crucial in third condition above! If R is in BCNF, obviously in 3NF. If R is in 3NF, some redundancy is possible.
20
1st normal form 1NF: no set-valued attributes. 2nd normal form 2NF: [historical interest only] 3rd normal form 3NF Boyce-Codd normal form BCNF: 3NF, and no
4th normal form 4NF: BCNF and no multi-valued
21
A decomposition of R replaces R by two or more
As a result, we will store instances of the relation
22
23
Decompositions should be used only when needed.
If we just store the projections of SNLRWH tuples
24
25
Decomposition of R into X and Y is lossless-join w.r.t. a
It is always true that r (r) (r)
Definition extended to decomposition into 3 or more
It is essential that all decompositions used to deal with
26
The decomposition of R into
In particular, if U V holds
27
Consider CSJDPQV, C is key, JP C and SD P.
Dependency preserving decomposition:
Projection of set of FDs F:
28
Formally, decomposition of R into X and Y is
Important to consider F +, not F, in this definition:
Dependency preserving does not imply lossless join:
And vice-versa! (Example?)
29
Consider relation R with FDs F. If X Y violates
Several dependencies may cause violation of BCNF.
30
In general, there may not be a dependency preserving
Similarly, decomposition of CSJDQV into SDP, JS and
31
3NF decomposition v.s. BCNF decomposition:
Tradeoffs
32
If a relation is in BCNF, it is free of redundancies that
If a relation is not in BCNF, we can try to decompose
33