CSCI1270 Introduction to Database Systems Normalization CSCI1270: - - PowerPoint PPT Presentation

csci1270 introduction to database systems
SMART_READER_LITE
LIVE PREVIEW

CSCI1270 Introduction to Database Systems Normalization CSCI1270: - - PowerPoint PPT Presentation

CSCI1270 Introduction to Database Systems Normalization CSCI1270: Introduction to Database Systems Another Use for FD s: Schema Design Schema Design: Approach #1 1. Construct E/R diagram 2. Translate into tables Subjective: How do we know


slide-1
SLIDE 1

CSCI1270: Introduction to Database Systems

CSCI1270 Introduction to Database Systems

Normalization

slide-2
SLIDE 2

CSCI1270: Introduction to Database Systems

Another Use for FD’s: Schema Design

Schema Design: Approach #1 Schema Design: Approach #2 Schema Design: Approach #3

  • 1. Construct E/R diagram
  • 2. Translate into tables

Subjective: How do we know if any good?

  • 1. Start with universal relation
  • 2. Determine FD’s
  • 3. “Decompose” UR using FD’s as guide
  • 1. Construct E/R diagram to come up with 1st cut design
  • 2. Use FD’s to verify or refine
slide-3
SLIDE 3

CSCI1270: Introduction to Database Systems

Decomposition

  • 1. Decomposing the Schema

R = (bname, bcity, assets, cname, lno, amt) R1 = (bname, bcity, assets, cname) R2 = (cname, lno, amt)

Notation:

R = R1 ∪ R2

slide-4
SLIDE 4

CSCI1270: Introduction to Database Systems

Decomposition

1000 L-17 Hayes 9M Bkln Dntn 500 L-93 Jones 1.7M Hnck Mianus 2000 L-23 Johnson 9M Bkln Dntn 1000 L-17 Jones 9M Bkln Dntn amt lno cname assets bcity bname Hayes 9M Bkln Dntn Jones 1.7M Hnck Mianus Johnson 9M Bkln Dntn Jones 9M Bkln Dntn cname assets bcity bname 1000 L-17 Hayes 500 L-93 Jones 2000 L-23 Johnson 1000 L-17 Jones amt lno cname

  • 2. Decomposing the Instance

R1 = R = R2 =

BTW: Not a Good Decomposition

slide-5
SLIDE 5

CSCI1270: Introduction to Database Systems

Want to be able to reconstruct big relation by joining smaller ones (Natural join) (i.e.: R1 R2 = R?)

Goals of Decomposition

  • 1. Lossless Joins
  • 2. Dependency Preservation
  • 3. Redundancy Avoidance

Want to minimize the cost of global integrity constraints based

  • n FD’s (i.e.: Avoid big joins in assertions)

Summary: LJ: Information loss DP: Efficiency (time) RA: Efficiency (space), update anomalies Avoid unnecessary data dupl. (motivation for decomposition)

slide-6
SLIDE 6

CSCI1270: Introduction to Database Systems

Another Use for FD’s: Schema Design

1000 L-17 Hayes 9M Bkln Dntn 500 L-93 Jones 1.7M Hnck Mianus 2000 L-23 Johnson 9M Bkln Dntn 1000 L-17 Jones 9M Bkln Dntn amt lno cname assets bcity bname

Example: R: “Universal Relation” Design:

R = Tuple meaning: Jones has a loan (L-17) for $1000 taken out

  • f the Dntn branch in Bkln which has assets of $9M

pro : Fast queries (No need for joins!) con : Redundancy, update anomalies, deletion anomalies

slide-7
SLIDE 7

CSCI1270: Introduction to Database Systems

Decomposition Goal #1: Lossless Joins

Hayes 9M Bkln Dntn Jones 1.7M Hnck Mianus Johnson 9M Bkln Dntn Jones 9M Bkln Dntn cname assets bcity bname 1000 L-17 Hayes 500 L-93 Jones 2000 L-23 Johnson 1000 L-17 Jones amt lno cname

=

A Bad Decomposition

1000 L-17 Hayes 9M Bkln Dntn 500 L-93 Jones 1.7M Hnck Mianus 1000 L-17 Jones 1.7M Hnck Mianus 3000 L-23 Johnson 9M Bkln Dntn 500 L-93 Jones 9M Bkln Dntn 1000 L-17 Jones 9M Bkln Dntn amt lno cname assets bcity bname

slide-8
SLIDE 8

CSCI1270: Introduction to Database Systems

Decomposition Goal #1: Lossless Joins

1000 L-17 Hayes 9M Bkln Dntn 500 L-93 Jones 1.7M Hnck Mianus 1000 L-17 Jones 1.7M Hnck Mianus 3000 L-23 Johnson 9M Bkln Dntn 500 L-93 Jones 9M Bkln Dntn 1000 L-17 Jones 9M Bkln Dntn amt lno cname assets bcity bname

→ → = A Bad Decomposition Problem: “Lossy join”: By adding noise, have lost meaningful information

as a result of decomposition adds meaningless tuples

slide-9
SLIDE 9

CSCI1270: Introduction to Database Systems

(R1 R2 has 7 tuples, whereas R has 4) A: Lossy. R1 R2 includes:

Lossless Joins

Hayes 9M Bkln Dntn Jones 1.7M Hnck Mianus Johnson 9M Bkln Dntn Jones 9M Bkln Dntn cname assets bcity bname 500 L-93 Mianus 2000 L-23 Dntn 1000 L-17 Dntn amt lno bname … 2000 L-23 Hayes 9M Bkln Dntn 1000 L-17 Johnson 9M Bkln Dntn 2000 L-23 Jones 9M Bkln Dntn … amt lno cname assets bcity bname

Is the Following Decomposition Lossless or Lossy?

R1 = R2 =

slide-10
SLIDE 10

CSCI1270: Introduction to Database Systems

Is the Following Decomposition Lossless or Lossy?

A: Lossless. R1 R2 has 4 tuples

Lossless Joins

L-17 Hayes 9M Dntn L-93 Jones 1.7M Mianus L-23 Johnson 9M Dntn L-17 Jones 9M Dntn lno cname assets bname 500 Hnck L-93 2000 Bkln L-23 1000 Bkln L-17 amt bcity lno

R1 = R2 =

slide-11
SLIDE 11

CSCI1270: Introduction to Database Systems

A: Lossless. R1 R2 has 4 tuples

Lossless Joins

Hayes 1000 L-17 Dntn Jones 500 L-93 Mianus Johnson 2000 L-23 Dntn Jones 1000 L-17 Dntn cname amt lno bname 1.7M Bkln Mianus 9M Bkln Dntn assets bcity bname

Lossless or Lossy? Q: When is decomposition lossless?

R1 = R2 =

slide-12
SLIDE 12

CSCI1270: Introduction to Database Systems

▪ A a key ⇒ |R2| ≤ n, & Relationship with R1 is n:1

Ensuring Lossless Joins

  • ▪ A not a key ⇒ |R1| = n

∴ n tuples in result

A Decomposition of R, R = R1 ∪ R2 is Lossless iff Intuition: Original relation R has n tuples

R1 R2 A A …    R1 ∩ R2 → R1 or R1 ∩ R2 → R2

(i.e.: Intersecting atts must form a super key for one of the resulting smaller relations)

slide-13
SLIDE 13

CSCI1270: Introduction to Database Systems

Decomposition Goal #2: Dependency Preservation

Goal: Efficient integrity checks of FD’s An Example With No Dependency Preservation: Decomposition: R = R1 ∪ R2 Lossless, but Not DP. Why?

R = (bname, bcity, assets, cname, lno, amt) bname → bcity assets lno → amt bname R1 = (bname, assets, cname, lno) R2 = (lno, bcity, amt)

slide-14
SLIDE 14

CSCI1270: Introduction to Database Systems

Decomposition Goal #2: Dependency Preservation (cont.)

CREATE ASSERTION bname-bcity CHECK NOT EXISTS (SELECT * FROM R1 AS x1, R2 AS y1,R1 AS x2, R2 AS y2 WHERE x1.lno = y1.lno AND x2.lno = y2.lno AND x1.lno = x2.lno AND x1.bname = x2.bname AND y1.bcity <> y2.bcity)

Decomposition (cont.): R = R1 ∪ R2 Lossless, but Not DP. Why?

R1 = (bname, assets, cname, lno) R2 = (lno, bcity, amt)

A: bname → bcity crosses 2 tables

slide-15
SLIDE 15

CSCI1270: Introduction to Database Systems

Decomposition Goal #2: Dependency Preservation

… A1 … An B1 … Bm …

To Ensure Best Possible Efficiency of FD Checks Above: Ri “covers” the FD, A1, …, An → B1, …, Bm To Test if Decomposition R = R1 ∪ … Rn is DP,

Ensure that only a SINGLE table be examined for each FD i.e.: Ensure that A1, …, An → B1, …, Bm can be checked by examining one table as in:

  • 1. See which FD’s of R are covered by R1, …, Rn
  • 2. Compare closure of (1) to closure of FD’s of R

Ri =

slide-16
SLIDE 16

CSCI1270: Introduction to Database Systems

Decomposition Goal #2: Dependency Preservation

More Formally:

To test if R = R1 ∪ … ∪ Rn is dependency preserving wrt R’s FD set, F:

  • 1. Compute F+
  • 2. Compute G

G ← ∅ For i ← 1 to n DO Add to G those FD’s in F+ covered by Ri

  • 3. Compute G+
  • 4. If F+ = G+: Decomposition is DP

If F+ ≠ G+: Decomposition is not DP

slide-17
SLIDE 17

CSCI1270: Introduction to Database Systems

Decomposition Goal #2: Dependency Preservation (cont.)

More Formally (cont.): Example:

To test if R = R1 ∪ … ∪ Rn is dependency preserving wrt R’s FD set, F:

  • 1. Compute F+
  • 2. Compute G
  • 3. Compute G+
  • 4. Compute F+ - G+

F = {A → B, AB → D, C → D} R1 = (A, B, C); R2 = (C, D)

Is this decomposition of (A, B, C, D) DP?

slide-18
SLIDE 18

CSCI1270: Introduction to Database Systems

Aside: Computing F+

Many Algorithms Call For It

If you know Armstrong’s Axioms cold, can generate lazily:

  • 1. Compute Fc
  • 2. Use Armstrong’s Axioms to derive (X → Y) ∈ Fc

+ as

needed

slide-19
SLIDE 19

CSCI1270: Introduction to Database Systems

Decomposition Goal #2: Dependency Preservation

Example:

F = {A → B, AB → D, C → D} R1 = (A, B, C); R2 = (C, D) Is R = R1 ∪ R2 DP? A: 1. F+ = {A → B, AB → D, C → D}+ Note: (A → D) ∈ F+

  • 2. G = ∅ ∪ {A → B, …} ∪ {C → D, …}

Note: (A → D) ∉ G

  • 3. G+ = {…}

Note: (A → D) ∉ G+

  • 4. F+ ≠ G+ because (A → D) ∈ (F+ - G+)

∴ Decomposition is not DP

slide-20
SLIDE 20

CSCI1270: Introduction to Database Systems

Decomposition Goal #2: Dependency Preservation

Example: Q: Does it satisfy lossless joins?

F = {A → B, AB → D, C → D} What is a DP decomposition of F? A: R = R1 ∪ R2 s.t. R1 = (A, B, D); R2 = (C, D)

  • 1. F+ = {A → B, AB → D,C → D}+
  • 2. G+ = {A → B,AB → D, C → D}+
  • 3. F+ = G+

Note: G+ cannot introduce FD’s not in F+ ∴ Decomposition is DP A: No

slide-21
SLIDE 21

CSCI1270: Introduction to Database Systems

Decomposition Goals Summary

Lossless Joins Dependency Preservation

Motivation: Efficient FD assertions Idea: No gic’s require joins of more than 1 table with itself Test: R = R1 ∪ … ∪ Rn is DP if closure of FD’s covered by each Ri = closure of FD’s covered by R = F+ Ensured for: 3NF Motivation: Avoid information loss Idea: No noise introduced when reconstitution universal relation via joins Test: At each decomposition test: R = R1 ∪ R2

(R1 ∩ R2) → R1 or (R1 ∩ R2) → R2

Ensured for: BCNF, 3NF

slide-22
SLIDE 22

CSCI1270: Introduction to Database Systems

Decomposition Goal #3 Redundancy Avoidance

Redundancy:

  • 1. Name FD of this relation?

Ans: B → C

  • 2. Name the super keys of this relation

A: All sets of atts that include A

  • 3. When do we have redundancy?

A: When ∃ some FD, X → Y covered by relation & X not a super key

1 z p 1 z n 2 y m 2 y h 2 y g 1 x e 1 x a C B A

slide-23
SLIDE 23

CSCI1270: Introduction to Database Systems

Redundancy Avoidance

Decomposition Goals Summary (cont.)

Motivation: Avoid update, deletion anomalies Idea: Avoid update anomalies, wasted space Test: For any X → Y covered by Ri, X should be a superkey of Ri Ensured for: BCNF

slide-24
SLIDE 24

CSCI1270: Introduction to Database Systems

Boyce-Codd Normal Form

What is a Normal Form? BCNF:

Characterization of schema decomposition in terms of properties it satisfies Guarantees no redundancy and lossless joins (Not DP!) Defined: Relation schema R, with FD set F, is in BCNF if: For all nontrivial X → Y in F+: X → R (i.e.: X is a super key)

slide-25
SLIDE 25

CSCI1270: Introduction to Database Systems

  • 1. A → B, A → R (A is a key)
  • 2. A → C, A → R (A is a key)
  • 3. B → C, B → A (B is not a key)

BCNF

Example:

R = (A, B, C) F = {A → B, B → C} A: Consider the nontrivial dependencies in F+: Therefore, R not in BCNF Is R in BCNF?

slide-26
SLIDE 26

CSCI1270: Introduction to Database Systems

BCNF

Example:

R = R1 ∪ R2 R1 = (A, B); R2 = (B, C) F = {A → B, B → C} A: 1. Test R1: A → B covered, A → R1 (all other FD’s covered trivial)

  • 2. Test R2:

B → C covered, B → R2 (all other FD’s covered trivial) ∴ R1, R2 in BCNF Are R1, R2 in BCNF?

slide-27
SLIDE 27

CSCI1270: Introduction to Database Systems

  • a. Choose (X → Y) ∈ F+ s.t.

→ (X → Y) covered by Ri → X → Ri ALGORITHM BCNF (R: Relation, F: FD set) BEGIN

BCNF

Decomposition Algorithm

  • 1. Compute F+
  • 2. Result ← {R}
  • 3. While some Ri ∈ Result not in BCNF, DO
  • b. Decompose Ri on (X → Y)

Ri1 ← X ∪ Y Ri2 ← Ri – Y

  • c. Result ← Result – {Ri} ∪ {Ri1,Ri2}
  • 4. Return result

END

slide-28
SLIDE 28

CSCI1270: Introduction to Database Systems

Ri = {A, B, C, D, E) (B → CD) ∈ F+, B → Ri

BCNF

Decomposition Algorithm

Each Step: Decompose Ri that is not in BCNF Ri

1 = (B, C, D)

Note: B → CD Covered, and

B → Ri

1

Ri

2 = (A, B, E)

Progress!

slide-29
SLIDE 29

CSCI1270: Introduction to Database Systems

BCNF

Decomposition Algorithm (cont.) Example:

R = (A, B, C, D) F = {A → B, AB → D, B → C}

  • 1. Compute F+:

F+ = {A → B, AB → D, B → C, A → C, A → D, AB → C, AC → D, AD → C, ABC → D, ABD → C} + all trivial dep’s

Decompose R into BCNF?

slide-30
SLIDE 30

CSCI1270: Introduction to Database Systems

Ri = {A, B, C, D) B → C covered, B → Ri

BCNF

Decomposition Algorithm (cont.)

R1 = (B, C) In BCNF

  • 1. B → C &

B → R1 R2 = (A, B, D) In BCNF

  • 2. A → B,
  • 3. AB → D,
  • 4. A → D

covered & A → R2, AB → R2

∴ Solution is R = R1 ∪ R2

slide-31
SLIDE 31

CSCI1270: Introduction to Database Systems

BCNF

Note: This will suffice! Find 2 decompositions, 1 DP and 1 not DP

R = (A, B, C, D, E, H) F = {A → BC, E → HA}

Decompose R into BCNF:

F+ = {A → B, A → C, A → BC E → H, E → A, E → HA E → B, E → C, E → BC E → HB, E → HC, E → AB E → AC, AE → …, ABE → …, ACE → …, ADE → …, …} + all trivial dep’s

slide-32
SLIDE 32

CSCI1270: Introduction to Database Systems

BCNF Decomposition

R = (A, B, C, D, E, H) F = {A → BC, E → HA}

Decomposition #1: R = R1 ∪ R3 ∪ R4 Q: Is this DP?

A: Yes. All Fc covered by R1, R3, R4. Therefore F+ covered

R = (A, B, C, D, E, H) Decompose on A → BC R1 = (A, B, C) R2 = (A, D, E, H) Decompose on E → HA R4 = (D, E) R3 = (A, E, H)

slide-33
SLIDE 33

CSCI1270: Introduction to Database Systems

R4 = (C, D, E) Decompose on E → C

BCNF Decomposition (cont.)

R = (A, B, C, D, E, H) F = {A → BC, E → HA} (Note: Fc = F)

Decomposition #2: R = R1 ∪ R3 ∪ R5 ∪ R6 Q: Not DP. Why?

A: A → C not covered by R1, R3, R5 , R6.

R = (A, B, C, D, E, H) Decompose on A → B R1 = (A, B) R2 = (A, C, D, E, H) Decompose on E → HA R3 = (A, E, H) R5 = (C, E) R6 = (E, D)

slide-34
SLIDE 34

CSCI1270: Introduction to Database Systems

More BCNF (cont.)

Q: Can we decompose on FD’s in Fc to get a DP BCNF decomposition?

A: Sometimes, BCNF + DP not possible Decomposition #1: Decomposition #2:

R1 = (L, K) R2 = (J, L) R1 = (J, K, L) R2 = (J, L)

Not DP: JK → L not covered Still not in BCNF

(L not a superkey)

R = (J, K, L) Decompose on L → K R = (J, K, L) Decompose on JK → L R = (J, K, L) F = {JK → L, L → K}

slide-35
SLIDE 35

CSCI1270: Introduction to Database Systems

Aside

Is This a Realistic Example?

JK → L L → K

A: BankerName → BranchName

BranchName CustomerName → BankerName Every banker works at one branch A customer works with the same banker at a given branch

slide-36
SLIDE 36

CSCI1270: Introduction to Database Systems

Testing for FDs Across Relations

  • Decomposition not dependency preserving => an extra

materialized view (MV) for each dependency α →β in Fc that is not preserved in the decomposition

  • The MV is a projection on α β of the join of the

relations in the decomposition

  • DBMS maintains MV when the relations are updated.

è No extra coding effort for programmer.

  • Space overhead: storing MV
  • Time overhead: keeping MV up to date
slide-37
SLIDE 37

CSCI1270: Introduction to Database Systems

Multivalued Dependencies

  • There are database schemas in BCNF that do not

seem to be sufficiently normalized

  • Consider a database

classes(course, teacher, book)


  • The database lists for each course the set of

teachers any one of which can be the course’s instructor, and the set of books, all of which are required for the course (no matter who teaches it).

slide-38
SLIDE 38

CSCI1270: Introduction to Database Systems

(course, teacher, book) is the only key, and therefore the relation is in BCNF Insertion anomalies – i.e., if Sara is a new teacher that can teach database, two tuples need to be inserted (database, Sara, DB Concepts)
 (database, Sara, Ullman)

course teacher book database database database database database database

  • perating systems
  • perating systems
  • perating systems
  • perating systems

Avi Avi Hank Hank Sudarshan Sudarshan Avi Avi Jim Jim DB Concepts Ullman DB Concepts Ullman DB Concepts Ullman OS Concepts Shaw OS Concepts Shaw classes

slide-39
SLIDE 39

CSCI1270: Introduction to Database Systems

Therefore, it is better to decompose classes into:

course teacher database database database

  • perating systems
  • perating systems

Avi Hank Sudarshan Avi Jim teaches course book database database

  • perating systems
  • perating systems

DB Concepts Ullman OS Concepts Shaw text

We shall see that these two relations are in Fourth Normal Form (4NF)

slide-40
SLIDE 40

CSCI1270: Introduction to Database Systems

Multivalued Dependencies (MVDs)

Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency α →→ →→ β holds on R if in any legal relation r(R), for all pairs for tuples t1 and t2 in r such that t1[α] = t2 [α], there exist tuples t3 and t4 in r such that: t1[α] = t2 [α] = t3 [α] = t4 [α] 
 t3[β] = t1 [β] 
 t3[R – β] = t2[R – β] 
 t4 β] = t2[β] 
 t4[R – β] = t1[R – β] 


slide-41
SLIDE 41

CSCI1270: Introduction to Database Systems

MVD (Cont.)

Tabular representation of α →→

→→ β

slide-42
SLIDE 42

CSCI1270: Introduction to Database Systems

Example

  • Let R be a relation schema with a set of attributes

that are partitioned into 3 nonempty subsets. Y, Z, W

  • We say that Y →→

→→ Z (Y multidetermines Z)


if and only if for all possible relations r(R) < y1, z1, w1 > ∈ r and < y1, z2, w2 > ∈ r implies < y1, z1, w2 > ∈ r and < y1, z2, w1 > ∈ r

  • Note that since the behavior of Z and W are

identical it follows that Y →→

→→ Z if Y →→ →→ W

slide-43
SLIDE 43

CSCI1270: Introduction to Database Systems

Example (Cont.)

  • In our example:

course →→

→→ teacher 


course →→

→→ book

  • The above formalizes the notion that a

particular value of Y (course) has associated with it a set of values of Z (teacher) and a set of values of W (book), and these two sets are in some sense independent of each other. Note:

If Y → Z then Y →→

→→ Z

Indeed we have (in above notation) Z1 = Z2
 The claim follows.

slide-44
SLIDE 44

CSCI1270: Introduction to Database Systems

Use of Multivalued Dependencies

  • We use multivalued dependencies in two ways:
  • 1. To test relations to determine whether they are legal

under a given set of functional and multivalued dependencies

  • 2. To specify constraints on the set of legal relations.

We shall thus concern ourselves only with relations that satisfy a given set of functional and multivalued dependencies.

  • If a relation r fails to satisfy a given multivalued

dependency, we can construct a relation rʹ that does satisfy the multivalued dependency by adding tuples to r.

slide-45
SLIDE 45

CSCI1270: Introduction to Database Systems

Fourth Normal Form

  • A relation schema R is in 4NF with respect to

a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form α →→ →→ β, where α ⊆ R and β ⊆ R, at least one of the following hold: α →→ →→ β is trivial (i.e., β ⊆ α or α ∪ β = R) α is a superkey for schema R

  • If a relation is in 4NF it is in BCNF