Normalisation Why normalise? To improve (simplify) database design - - PowerPoint PPT Presentation

normalisation
SMART_READER_LITE
LIVE PREVIEW

Normalisation Why normalise? To improve (simplify) database design - - PowerPoint PPT Presentation

Normalisation Why normalise? To improve (simplify) database design in order to Avoid update problems Avoid redundancy Simplify update operations 1 Example ( the practical difference between a first normal form relation and an


slide-1
SLIDE 1

1

Normalisation

Why normalise? To improve (simplify) database design in order to

  • Avoid update problems
  • Avoid redundancy
  • Simplify update operations
slide-2
SLIDE 2

2

Example ( the practical difference between a first normal form relation and an unnormalised table)

EMPLOYEE-PROJECT EMP* EMP-NAME PROJECT E1 Smith CS 101 CS 203 EE 121 E2 Jones CS 202 CS 101 E3 Lee EE 410

slide-3
SLIDE 3

3

Consider 2 transactions on EMPLOYEE-PROJECT

T1: Insert the fact that Hanks works on project EE 202. T2: Insert the fact that Smith works on project EE 202.

slide-4
SLIDE 4

4

No difference between the two transactions with EMPLOYEE-PROJECT*

EMPLOYEE-PROJECT* EMP# EMP-NAME PROJECT E1 Smith CS 101 E1 Smith CS 203 E1 Smith EE 121 E2 Jones CS 202 E2 Jones CS 101 E3 Lee EE 410

slide-5
SLIDE 5

5

Normalisation theory allows us to detect such cases and shows how relations can be converted to more suitable forms.

slide-6
SLIDE 6

6

Numerous normal forms have been defined:

  • 1NF: first normal form
  • 2NF: second normal form
  • 3NF: third normal form
  • BCNF: Boyce-Codd normal form
  • 4NF: fourth normal form
  • 5NF: fifth normal form
slide-7
SLIDE 7

7

Each succeeding normal form improves on the previous one by specifying further constraints

  • n the relations.
slide-8
SLIDE 8

8

Definition: 1NF

1NF: A relation is in first normal form if and only if it contains atomic values only. 1NF relations can still have undesirable features.

slide-9
SLIDE 9

9

Recall the S-P-SP Database

Suppose we design the S-P-SP database differently: Instead of S(S#, SNAME, STATUS,CITY) SP(S#, P#, QTY) we have a single relation SSP (S#, SNAME, STATUS,CITY, P#, QTY). (leaving the P relation as it is).

slide-10
SLIDE 10

10

S

S# SNAME STATUS CITY S1 Smith 20 London S2 Jones 10 Paris S3 Blake 30 Paris S4 Clark 20 London S5 Adams 30 Athens

slide-11
SLIDE 11

11

SP

S# P# QTY S# P# QTY S1 P1 300 S4 P2 200 S1 P2 200 S4 P4 300 S1 P3 400 S4 P5 400 S1 P4 200 S1 P5 100 S1 P6 100 S2 P1 300 S2 P2 400 S3 P2 200

slide-12
SLIDE 12

12

An instance of the SSP

S#SNAME STATUS CITY P# QTY S1Smith 20 London P1 300 S1Smith 20 London P2 200 S1Smith 20 London P3 400 S1Smith 20 London P4 200 S1Smith 20 London P5 100 S1Smith 20 London P6 100 S2Jones 10 Paris P1 300 S2Jones 10 Paris P2 400 S3Blake 30 Paris P2 200 S4Clark 20 London P2 200 S4Clark 20 London P4 300 S4Clark 20 London P5 400

slide-13
SLIDE 13

13

Do you see any problems with SSP?

  • Redundancies

e.g. every tuple for supplier S1 shows SNAME to be Smith, and CITY to be London.

  • Update problems

Suppose supplier S1 moves from London to Paris.

slide-14
SLIDE 14

14

Compare With The Original Table S

S# SNAME STATUS CITY S1 Smith 20 London S2 Jones 10 Paris S3 Blake 30 Paris S4 Clark 20 London S5 Adams 30 Athens

slide-15
SLIDE 15

15

There are other problems associated with the design

  • f the SSP relation which we will discuss later.

To be able to identify all these problems and solutions in general, we have to know about functional dependencies.

slide-16
SLIDE 16

16

Functional Dependencies

Given a relation R, and X, Y subsets of the set of attributes of R, Y is functionally dependent on X if and only if each X-value in R has associated with it at most one Y-value in R. In other words, whenever two tuples of R agree on their X-value, they must also agree on their Y-value. The functional dependency of Y on X is expressed by X → → → → Y (read as "X functionally determines Y".)

slide-17
SLIDE 17

17

Examples

S.S# → → → → S.SNAME S.S# → → → → S.STATUS S.S# → → → → S.CITY

  • r more succinctly

S.S# → → → → S.(SNAME, STATUS, CITY)

  • r

S# → → → → SNAME, STATUS, CITY if the context of relation S is understood.

slide-18
SLIDE 18

18

In relation SP: S#, P# → → → → QTY Note: Dependencies are a matter of the semantics of the data, not merely a matter of the data values that happen to appear in a relation at some particular time.

slide-19
SLIDE 19

19

Exercise: Functional Dependencies

Find the functional dependencies amongst the following attributes: Snumber LectureCourse (LC) Name LectureCourseGrade(LCG) TutorName LectureRoom# (LR) TutorRoom# LectureTime (LT) Degree LectureRoomCapacity (LRC)

slide-20
SLIDE 20

20

Which of these hold?

Snumber → → → →Name, TutorName, Degree TutorName → → → →TutorRoom# Snumber → → → → TutorRoom# Snumber → → → → LectureCourse LectureCourse → → → → LectureRoom# LectureRoom#→ → → →LectureRoomCapacity LectureCourse → → → → LectureTime

slide-21
SLIDE 21

21

Which of these hold?

Snumber,Name → → → → TutorName LectureCourse,LectureTime→ → → → LectureRoom# TutorName → → → → LectureCourse

slide-22
SLIDE 22

22

Inference Axioms for functional dependencies

Given a set of functional dependencies we can derive others using the following inference axioms. In these axioms: X, Y, Z, W denote sets of attributes, XY is a shorthand for X∪Y.

slide-23
SLIDE 23

23

LHS |= RHS If you know the items given on the LHS then you can infer the items on the RHS.

slide-24
SLIDE 24

24

Axioms for functional dependencies

  • A1:Reflexivity

|= X → → → → X

  • A2:Augmentation

X → → → → Y |= XZ → → → →Y X → → → →Y |= XZ → → → → YZ

  • A3:Transitivity

(X → → → → Y) ∧ ∧ ∧ ∧ (Y → → → →Z) |= X → → → →Z The axioms A1-A3 are called Armstrong's axioms.

slide-25
SLIDE 25

25

Example of Augmentation: Snumber → → → → TutorName Snumber, SName → → → → TutorName

slide-26
SLIDE 26

26

Axioms cntd.

  • A4: Additivity

(X → → → → Y) ∧ ∧ ∧ ∧ (X → → → → Z) |= X → → → →YZ

  • A5: Projectivity

X → → → → YZ |= X → → → → Y X → → → → YZ |= X → → → → Z

  • A6:Pseudotransitivity

(X → → → → Y) ∧ ∧ ∧ ∧ (YZ → → → →W) |= XZ → → → →W A4-A6 can be derived from A1 - A3.

slide-27
SLIDE 27

27

Exercise

Derive A4 from A1 - A3.

slide-28
SLIDE 28

28

X→Y given X →Z given ? ? ? X →YZ

slide-29
SLIDE 29

29

Some Useful Definitions

  • A functional dependency of the form X→Y is trivial

iff Y ⊆ X. (E.g AB →A)

  • Let F be a set of fds. The closure of F, denoted F+, is

the set of all fds logically implied by F.

  • Let X be a set of attributes. The set of all attributes

functionally determined by X under a set F of fds is called the closure of X under F, and is denoted X+F.

slide-30
SLIDE 30

30

Example: F: B→ C C→D A→E CE →F Then B+

F = {B, C, D}

AB +

F = ???

slide-31
SLIDE 31

31

An Algorithm for computing X+

F

result : = X; while (changes to result) do for each fd B→C in F do begin if B ⊆ result then result:= result ∪ C end;

slide-32
SLIDE 32

32

Example: Find AG+

F where F is:

F: A → B CG → I C → D H → C B → H result = {A,G} result = {A,G,B,H} result = {A, G,B,H,C,I,D} result = {A,G,B,H,C,I,D} AG+F = {A,B,C,D,G,H,I}

slide-33
SLIDE 33

33

The speed of the algorithm is dependent on the size of

  • F. Also these functional dependencies are part of

the integrity constraints of the data stored in the

  • database. These constraints have to be checked when

updating the database and maintained. So it can pay to reduce the size of F without changing its closure.

slide-34
SLIDE 34

34

Integrity Constraints

Name Degree Length of Degree Smith A. BEng 3 Smith B. MEng 4 Fran C. MSc 1 Jones D. MSc 1 Name → Degree Degree → Length of Degree Update: Change Length of Degree of Fran C. to 2.

slide-35
SLIDE 35

35

Definition : equivalent sets of FDs

Two sets of functional dependencies S1 and S2 are equivalent if and only if S1+=S2+, i.e. S1 implies all the fds in S2, and vice versa. Informally: S1 and S2 are equivalent if they contain exactly the same information.

slide-36
SLIDE 36

36

Definition: Irreducible/Canonical Cover

An irreducible cover for a set F of fds, denoted Fc, is a set of fds that satisfies the following four conditions: 1. F and Fc are equivalent. 2.The right-hand side of every dependency in Fc involves just one attribute.

slide-37
SLIDE 37

37

3.The left-hand side of every dependency in Fc is irreducible, i.e. no attribute in any left- hand side can be discarded without changing the closure.

  • 4. No fd in Fc can be discarded without

changing the closure.

slide-38
SLIDE 38

38

For each set of functional dependencies there exists at least one irreducible cover, but each set does not necessarily have a unique irreducible cover.

slide-39
SLIDE 39

39

Example: Irreducible Cover

F: A → → → → BC B → → → → C A → → → →B AB → → → → C AC → → → →D

slide-40
SLIDE 40

40

A → → → → BC A → → → → B B → → → → C A → → → → C A → → → →B AB → → → → C AC → → → →D

slide-41
SLIDE 41

41

First rewrite F as: A → → → → B A → → → → C B → → → → C A → → → →B AB → → → → C AC → → → →D This set is equivalent to F by additivity and projectivity.

slide-42
SLIDE 42

42

A → → → → B A → → → → C B → → → →C A → → → → B AB → → → → C AC → → → → D

A → → → → B

slide-43
SLIDE 43

43

Delete the duplication of A→ B: A → → → → B A → → → → C B → → → →C AB → → → → C AC → → → → D

slide-44
SLIDE 44

44

AB→ → → →C can be deleted without changing the closure, because it is implied from A→ → → →C by augmentation. A → → → →B A → → → → C B → → → → C AC → → → → D

AB → → → → C

slide-45
SLIDE 45

45

A → → → →B A → → → → C B → → → → C AC → → → → D

slide-46
SLIDE 46

46

A → → → → B A → → → → C B → → → → C A → → → → D C

slide-47
SLIDE 47

47

C can be deleted from AC→ → → →D, because A→ → → →D is implied from A→ → → →C and AC→ → → →D by augmentation and transitivity: A → → → → B A → → → → C B → → → → C A → → → → D

slide-48
SLIDE 48

48

A → → → → B B → → → → C A → → → → D A → → → → C

slide-49
SLIDE 49

49

Finally A→ → → →C can be deleted because it is implied by A→ → → →B and B→ → → →C by transitivity: A → → → → B B → → → → C A → → → → D This final set is an irreducible cover for F.

slide-50
SLIDE 50

50

Definitions: Candidate, Primary and Alternate Keys

A set of attributes K of a relation R is a candidate key of R iff K satisfies the following two conditions: 1) K→ → → →R This is a shorthand for saying that K functionally determines all attributes of R. 2) ¬ ∃K' ( K'⊂K and K'→ → → →R) i.e. there is no proper subset of K that functionally determines all attributes of R.

slide-51
SLIDE 51

51

Every relation has at least one candidate key. Some relations may have exactly one, but it is possible that some may have two or more. Historically, in the relational model, for any given relation, one of the candidate keys is chosen as the primary key, and then the remainder (if any) are called alternate keys.

slide-52
SLIDE 52

52

Example

Suppose in relation S, S# and SNAME both uniquely identify each supplier. Then S would have two candidate keys: S# and SNAME. We may choose S# as the primary key. Then SNAME becomes an alternate key.

slide-53
SLIDE 53

53

The Entity Integrity Rule

In the relational model, the primary key is constrained by the following integrity rule. Entity Integrity Rule: No attribute participating in the primary key of a relation is allowed to accept NULL values. ( NULL values represent unknown or non-existent values.)

slide-54
SLIDE 54

54

Name Postcode Telephone Smith SW7 2BZ 02075948822 Jones SW5 5AT NULL Pitt NULL 02086662222 NULL = DO not know / Does not exist

slide-55
SLIDE 55

55

Phew….. After all this work on FDs we will now use them to analyse the design of relations and improving the design through normalisation.