RECAP But So far we have seen: How do we make sure the - - PDF document

recap but
SMART_READER_LITE
LIVE PREVIEW

RECAP But So far we have seen: How do we make sure the - - PDF document

RECAP But So far we have seen: How do we make sure the decompositions are lossless (equivalence preserving)? How to use functional dependencies to guide the design of relations Are some decompositions better than others? How


slide-1
SLIDE 1

1

RECAP

So far we have seen:

  • How to use functional dependencies to

guide the design of relations

  • How to modify/decompose relations to

achieve 1NF, 2NF and 3NF relations

2

But

  • How do we make sure the decompositions

are lossless (equivalence preserving)?

  • Are some decompositions better than
  • thers?
  • What if there are multiple candidate keys?

3

Why is losslessness important?

  • No information is lost or added implicitly by

mistake.

  • Any information that can be derived from the
  • riginal relation can also be derived from the

relations that result from the decomposition and vice versa.

  • In other words, you get the same answers to your

queries.

4

An Instance of Relation NEWS

S# SNAME STATUS CITY S1 Smith 20 London S2 Jones 10 Paris S3 Blake 10 Paris S4 Clark 20 London S5 Adams 30 Athens

5

Suppose we decompose NEWS into:

NEWS (S#, SNAME, STATUS, CITY)

CITY → → → → STATUS S# → → → → SNAME, STATUS, CITY

R1(S#, Sname) R2(City, Status)

So no attributes are lost.

6

R1 R2 S# Sname City Status S1 Smith London 20 S2 Jones Paris 10 S3 Blake Athens 30 What is the status or city of Smith?

slide-2
SLIDE 2

7

Would it be good enough if the 2 relations were to have at least one attribute in common?

8

NEWS (S#, SNAME, STATUS, CITY)

CITY → → → → STATUS S# → → → → SNAME, STATUS, CITY

R1(S#, Sname, Status) R2(City, Status) So no attributes are lost and the R1 and R2 have an attribute in common. ?????? Do you see any problems ???????

9

R1 R2 S# Sname Status City Status S1 Smith 20 London 20 S2 Jones 10 Paris 10 S3 Blake 30 Athens 30

10

R1 R2 S# Sname Status City Status S1 Smith 20 London 20 S2 Jones 10 Paris 10 S3 Blake 30 Athens 30 Rome 20

11

R1 R2 S# Sname Status City Status S1 Smith 20 London 20 S2 Jones 10 Paris 10 S3 Blake 30 Athens 30 Rome 20 What is the city of Smith? London or Rome?

12

So still not good enough. R R1 R2

  • Do not loose any attributes.
  • Make sure R1 and R2 have some

attribute(s) in common.

  • Some extra condition on the shared

attribute(s) to ensure losslessness.

slide-3
SLIDE 3

13

Losslessness

Definition: Lossless decomposition A decomposition of a relation R into relations R1, ..., Rn is lossless (nonloss) if and only if for every instance of R and the Ri the natural join of R1, ..., Rn gives the relation R.

14

Natural Join – very briefly

Teaches Class Lecturer Course Course class fs logic logic msc jm ai ai msc sd C++ ai meng2

15

Teaches Class Lecturer Course Course Class fs logic logic msc jm ai ai msc sd C++ ai meng2

16

Teaches JOIN Class

Lecturer Course Class fs logic msc jm ai msc jm ai meng2

17

Another Example of JOIN

Teaches Class-number Lecturer Course Number Class fs logic 60 msc jm ai 100 meng2 Teaches JOIN Class-number Lecturer Course Number Class fs logic 60 msc fs logic 100 meng2 jm ai 60 msc jm ai 100 meng2

18

A Lossy Decomposition

R A B C a1 b1 c1 a2 b1 c2 a2 b2 c2

slide-4
SLIDE 4

19

Suppose we decomposed R into R1 and R2. R1 R2 A B B C a1 b1 b1 c1 a2 b1 b1 c2 a2 b2 b2 c2

20

Now consider the JOIN of R1 and R2. R1 R2 A B B C a1 b1 b1 c1 a2 b1 b1 c2 a2 b2 b2 c2

21

R1 JOIN R2 versus R

R1 JOIN R2 R

A B C A B C a1 b1 c1 a1 b1 c1 a1 b1 c2 a2 b1 c2 a2 b1 c1 a2 b2 c2 a2 b1 c2 a2 b2 c2

22

Theorem: sufficient condition for losslessness

Suppose R is a relation scheme and F is a set of functional dependencies on R. Let R1 and R2 be projections of R such that the union of the sets of attributes of R1 and R2 is equal to the set of attributes of R. This decomposition of R is lossless if at least one of the following fds is in F+:

  • R1 ∩

∩ ∩ ∩ R2 → → → → R1

  • R1 ∩

∩ ∩ ∩ R2 → → → → R2

23

Example: Lecturer DB

Lecturer Course Number Class fs logic 60 msc fs logic 100 meng2 jm ai 60 msc jm ai 100 meng2 Class → Number Only key: (Lecturer,Course,Class)

24

Example cntd.

What normal form is Lecturer DB in?

Lecturer(Lecturer,Course,Number,Class)

Decompose to: Degree(Class,Number) Teaches(Lecturer,Course,Class) Degree ∩ ∩ ∩ ∩ Teaches = Class Class → → → → Degree

slide-5
SLIDE 5

25

So far we have done:

How do we make sure the decompositions are lossless (equivalence preserving)?

  • Are some decompositions better than
  • thers?
  • What if there are multiple candidate keys?

26

Dependency Preservation

It is often possible to decompose a relation in different ways. Amongst the lossless decompositions some may be better than

  • thers.

27

Example NEWS (S #, SNAME, STATUS, CITY) S# → → → → SNAME, STATUS, CITY CITY → → → → STATUS

28

Here is an instance of relation NEWS.

S# SNAME STATUS CITY S1 Smith 20 London S2 Jones 10 Paris S3 Blake 10 Paris S4 Clark 20 London S5 Adams 30 Athens NEWS is in 2NF, but not in 3NF. It can be transformed to 3NF by two alternative decompositions.

29

A: Supplier (S#, SNAME, CITY) City-info (CITY, STATUS) B: Supplier (S#, SNAME, CITY) Status-info (S#, STATUS) Both decompositions are lossless. All resulting relations are in 3NF.

30

Instance of A

Supplier City-info S# SNAME CITY CITY STATUS S1 Smith London London 20 S2 Jones Paris Paris 10 S3 Blake Paris Athens 30 S4 Clark London Rome 50 S5 Adams Athens

slide-6
SLIDE 6

31

Instance of B

Supplier Status-info S# SNAME CITY S# STATUS S1 Smith London S1 20 S2 Jones Paris S2 10 S3 Blake Paris S3 10 S4 Clark London S4 20 S5 Adams Athens S5 30

32

Which one is better? A or B?

Example: Consider the update “Change CITY of supplier S1 from London to Paris.” What does this involve in A? What does this involve in B?

33

In A: All we have to do is change the relevant value in relation Supplier. In B: We have to update both relations Supplier and Status-info to ensure that the functional dependency CITY → STATUS is maintained.

34

In A

In Supplier: Change <S1, Smith, London> to <S1, Smith, Paris> .

35

Supplier City-info S# SNAME CITY CITY STATUS S1 Smith London London 20 S2 Jones Paris Paris 10 S3 Blake Paris Athens 30 S4 Clark London Rome 50 S5 Adams Athens

36

In B

  • In Supplier:

Change <S1, Smith, London> to <S1, Smith, Paris> .

  • In Supplier find a row with City=Paris,

and read its S#.

  • In Status-info find the Status of that S#.
  • In Status-info change the value of the

Status of S1 to this new Status.

slide-7
SLIDE 7

37

Supplier Status-info S# SNAME CITY S# STATUS S1 Smith London S1 20 S2 Jones Paris S2 10 S3 Blake Paris S3 10 S4 Clark London S4 20 S5 Adams Athens S5 30

38

Another problem with B is that:

(Assuming the Entity Integrity Rule) We cannot insert in B the information that a given city has a given status, unless some supplier is located in that city.

39

In A, it is the transitive dependency S # → → → → STATUS which is an inter-relational constraint. This constraint is maintained automatically as long as the constraints S # → → → → CITY CITY → → → → STATUS are maintained in each relation, and these are just primary key constraints in each relation of A.

40

In A

A: Supplier (S#, SNAME, CITY) City-info (CITY, STATUS) S# → → → → SNAME S# → → → → CITY S# → → → → STATUS CITY → → → → STATUS

41

In B

B: Supplier (S#, SNAME, CITY) Status-info (S#, STATUS) S# → → → → SNAME S# → → → → CITY S# → → → → STATUS CITY → → → → STATUS The problem with B is that the dependency CITY → → → → STATUS has become an inter-relational constraint.

42

Formalisation/Generalisation of This Intuition

R (F) R1 R2 …… …. Rn F1 F2 …… …. Fn

slide-8
SLIDE 8

43

Let F´ = F1∪ F2 ∪ ….. ∪ Fn In general F ≠ F´. But if F+ = F´+, then to check F we only need to check F´.

44

Definition: Dependency-Preserving

A decomposition R1, ..., Rn of R is dependency- preserving if and only if F+ = F´+, where F and F´ are defined as above. Example: In the NEWS example A is dependency preserving, but B is not.

45

Example:

NEWS (S #, SNAME, STATUS, CITY) S# → → → → SNAME, STATUS, CITY CITY → → → → STATUS A: Supplier (S#, SNAME, CITY) City-info (CITY, STATUS) B: Supplier (S#, SNAME, CITY) Status-info (S#, STATUS)

46

In A

Fsupplier={S# → → → → SNAME, CITY} Fcity-info = { CITY → → → → STATUS} So F’ = Fsupplier ∪ ∪ ∪ ∪ Fcity-info. So clearly F+=F’+.

47

In B

Fsupplier={S#→ → → →SNAME, CITY} FStatus-info = { S#→ → → →STATUS} So F’ = Fsupplier ∪ ∪ ∪ ∪ FStatus-info. CITY→ → → →STATUS is in F+ but not in F’+. So F+ ≠ ≠ ≠ ≠ F’+.

48

So far we have done:

How do we make sure the decompositions are lossless (equivalence preserving)? Are some decompositions better than

  • thers?
  • What if there are multiple candidate keys?
slide-9
SLIDE 9

49

Generalising 2NF and 3NF: Boyce-Codd Normal Form (BCNF)

50

For 2NF – Get rid of:

D E F

A B C

51

For 3NF – Get rid of :

D F

A B C D E

52

What we want:

D E F G H

A B C

53

With 2NF and 3NF: assumed that the relation has one candidate key. Now we generalise to cater for multiple candidate keys. This more general normal form is called the Boyce- Codd Normal Form (BCNF).

54

CK1

CK3 CK2 attribute1 attribute1 attribute2 attribute2 attribute3 attribute3 attribute4

slide-10
SLIDE 10

55

Definition: Determinant

A determinant is any attribute, or set of attributes, on which some other attribute is fully functionally dependent. Example: R(A,B,C,D,E) AB→C A → B C → D DE →A Here A and C are determinants. There are 2 others. Can you see what they are?

56

Definition: BCNF

A relation is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a candidate key. Any relation can be nonloss decomposed into a collection of BCNF relations.

57

Example: Enrols(Student #, Subject, Teacher) Teacher → → → → Subject (student #, subject) → → → → Teacher

58

An Instance of the Relation Scheme Enrols

Student # Subject Teacher 100 maths smith 101 maths jones 102 maths smith 103 maths smith 104 physics brown 101 physics brown 100 physics green

59

  • Each student is taught by several teachers.
  • Each teacher teaches only one subject.
  • Each student takes several subjects and has only
  • ne teacher for a given subject.

60

  • What are the candidate keys of Enrols?
  • What normal form is Enrols in?
  • What problems do you see in the design of

Enrols?

slide-11
SLIDE 11

61

  • Candidate keys of Enrols are

(Teacher, Student#) (Subject, Student#)

62

Teacher Student# Subject Student# Teacher Subject

63

Problems with Enrols

  • We cannot insert the fact that a teacher teaches a

certain subject until at least one student enrols for that subject.

  • The fact that a teacher teaches a certain subject is

recorded with a lot of redundancy, for every student to whom he teaches that subject.

64

Teacher Student# Subject Student# Teacher Subject

65

Solution

Decompose Enrols into Courses(Teacher, Subject) Students(Student #,Teacher)

66

In Enrols: Teacher is a determinant, but not a candidate key.

In Courses(Teacher, Subject) : The only dependency is Teacher → Subject. So Teacher is the only determinant. It is also the only candidate key.

slide-12
SLIDE 12

67

Exercise

In Students(Student#,Teacher) Teacher → → → → Subject (student #, subject) → → → → Teacher What are the candidate keys and the determinants of Students?

68

No determinant. So BCNF. WHY?? Only Candidate Key: (Student#,Teacher)

69

Exercise

Is the decomposition lossless? Is the decomposition dependency preserving? Enrols (Student #, Subject, Teacher) Courses(Teacher, Subject) Students(Student#,Teacher) Teacher → → → → Subject (student #, subject) → → → → Teacher

70

Exercise

S (S#, Sname, Status, City) with FDs S # → → → → Status, City, Sname Sname → → → → City, Status, S# Is S in BCNF?

71

In S:

  • Determinants:

S# and Sname

  • Candidate keys:

S# and Sname

  • So all determinants are candidate keys.
  • So S is in BCNF.

72

Exercise

SSP (S#, Sname, P#, Qty) with FDs S# → → → → Sname Sname → → → → S# S#, P# → → → → Qty Sname, P# → → → → Qty Is SSP in BCNF?

slide-13
SLIDE 13

73

In SSP:

  • Determinants:

S# and Sname

  • Candidate keys: (S#, P#) and (Sname, P#)
  • So there are determinants that are not candidate

keys.

  • So S is not in BCNF.

74

Decomposing SSP to BCNF relations

SSP Lossless? S1(S#, Sname) Dependency S2(S#,P#,Qty) Preserving? SSP Lossless? R1(S#, Sname) Dependency R2(Sname,P#,Qty) Preserving?

75

An Algorithm for BCNF Decomposition

  • Input:

A relation R, the closure, F+, of the set of functional dependencies on R.

  • Output (result):

A set of relations Ri, such that each Ri is in BCNF and the decomposition of R into the Ri is lossless.

76

begin result : = { R } ; done : = false ; while (not done) do if (there is a scheme Ri in result that is not in BCNF) then begin let A → B be a nontrivial ffd that holds

  • n Ri, such that A is not a candidate key of Ri;

result : = (result - Ri) ∪ (Ri - B) ∪ (A, B); end else done : = true ; end;

77

Same algorithm we have been using for 2NF and 3NF

begin result : = { R } ; done : = false ; while (not done) do if (there is a scheme Ri in result that is not in the required normal form) then begin let A → B be an fd that holds on Ri, that shows Ri is not in the required normal form; result : = (result - Ri) ∪ (Ri - B) ∪ (A, B); end else done : = true ; end;

78

Normalisation - Conclusion

Objectives of normalisation:

  • Eliminate redundancy
  • Avoid update anomalies
  • (From 5NF upwards) Simplify the

enforcement of certain integrity constraints

slide-14
SLIDE 14

79

Some Limitations of Normalisation

  • Full normalisation not always desirable.

Example: Customer(Name,Street,City, Postcode) Postcode → City, Street So Customer is not in 3NF.

80

  • Normalisation often facilitates update, but

tends to have an adverse effect on query

  • evaluation. Related data which may have

been retrievable from one relation in an unnormalised schema may have to be retrieved from several relations in the normalised form.

81

  • Decomposition into normal forms is not

always unique. But there is not much guidance which decomposition to choose.