Database Normalization Asst. Prof. Dr. Kanda Runapongsa Saikaew - - PDF document

database normalization
SMART_READER_LITE
LIVE PREVIEW

Database Normalization Asst. Prof. Dr. Kanda Runapongsa Saikaew - - PDF document

Electricite Du Laos (EDL) Database Normalization Asst. Prof. Dr. Kanda Runapongsa Saikaew (krunapon@kku.ac.th) Department of Computer Engineering Khon Kaen University 1 Overview What and why normalization Background to normalization


slide-1
SLIDE 1

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

1

1

  • Asst. Prof. Dr. Kanda

Runapongsa Saikaew (krunapon@kku.ac.th) Department of Computer Engineering Khon Kaen University

Database Normalization

2

Overview

What and why normalization Background to normalization

Functional dependency Superkey Candidate key Primary key

Normal forms and their purposes

1NF 2NF 3NF

3

What and why normalization

 Normalization is a systematic way of ensuring that a

database structure is suitable for general-purpose querying and free of certain undesirable characteristics —insertion, update, and deletion anomalies—that could lead to a loss of data integrity

 Integrity constraints, in particular functional

dependencies, can be used to identify schemas with such problems and to suggest refinements.

slide-2
SLIDE 2

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

2

Normalized tables

Informally, a relational database table

is often described as "normalized" if it is in the third normal form (3NF)

Most 3NF tables are free of insertion,

update, and deletion anomalies, i.e. in most cases 3NF tables adhere to BCNF, 4NF, and 5NF (but typically not 6NF)

4

When we apply normalization

A standard piece of database design

guidance is that the designer should begin by fully normalizing the design

Selectively denormalize only in places

where doing so is absolutely necessary to address performance issues

Some modeling disciplines, such as the

dimensional modeling approach to data warehouse design, explicitly recommend non-normalized designs

5 6

Functional dependency

 Functional dependency: Attribute B has a

functional dependency on attribute B (i.e., A -> B)

 If, for each value of attribute A, there is exactly one

value of attribute B

 If value of A is repeating in tuples then value of B

will also repeat In our example, branch address and

branch phone number has a functional dependency on branch no.

slide-3
SLIDE 3

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

3

7

All instance that satisfies AB  C

A B C D a1 b1 c1 d1 a1 b1 c1 d2 a1 b2 c2 d1 a2 b1 c3 d1

8

Trivial functional dependency

 Trivial functional dependency A trivial functional dependency is a

functional dependency of attribute of a superset of itself

Examples

{Branch No, Branch Address} -> {Branch

Address}

 {Branch Address} -> {Branch Address}

Full functional dependency

 An attribute is fully functional dependency

  • n a set of attributes X if it is

Functionally dependent on X, and Not functionally dependent on any

proper subset of X

 Example: {Employee Address} has a functional

dependency on {Employee ID, Skill}, but not a full functional dependency because it is also dependent on {Employee ID}

9

slide-4
SLIDE 4

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

4

Transitive dependency

Transitive dependency is an indirect

functional dependency, one in which X→Z only by virtue

  • f X→Y and Y→Z

Example:

Employee ID -> Employee office Employee office -> Employee phone Employee ID -> Employee phone

10

Superkey

Superkey

A superkey is an attribute or set of

attributes that uniquely identifies rows within a table

Two distinct rows are always guaranteed to

have distinct superkeys

Example

 {Employee ID, Salary, Year} would be a

superkey for the {Employee ID, Salary, Bonus, Year} table

{Employee ID, Year} would also be a

superkey

11

Candidate key

Candidate key is a minimal superkey,

that is, a superkey for which we can say that no proper subset of it is also a superkey

Example:

{Employee ID, Year} would be a

candidate key for the table {Employee ID, Salary, Bonus, Year} table.

12

slide-5
SLIDE 5

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

5

Non-prime attribute

A non-prime attribute is an attribute

that does not occur in any candidate key

Bonus would be a non-prime attribute

in the "Employees' Bonus" table.

13

Primary key

Most DBMSs require a table to be

defined as having a single unique key, rather than a number of possible unique keys

A primary key is a candidate key

which the database designer has designated for this purpose

14 15

Example: Constraints on Entity Set

 Consider relation obtained from Hourly_Emps:

 Hourly_Emps (ssn, name, lot, rating, hrly_wages,

hrs_worked)

 Notation: We will denote this relation schema by

listing the attributes: SNLRWH

 This is really the set of attributes {S,N,L,R,W,H}.  Sometimes, we will refer to all attributes of a relation by

using the relation name. (e.g., Hourly_Emps for SNLRWH)

 Some FDs on Hourly_Emps:

 ssn is the key:

S  SNLRWH

 rating determines hrly_wages:

R  W

slide-6
SLIDE 6

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

6

16

Data redundancy example1 (1/3)

ssn name lot rating Hr_Wages Hr_Worked 1 A 48 8 10 40 2 B 22 8 10 30 3 C 35 5 7 30 4 D 35 5 7 32 5 E 35 8 10 40

17

Data redundancy example1 (2/3)

Redu

dunda ndant nt Stora rage ge: The rating value 8 corresponds to the hourly wage 10, and this association is repeated three times

Upda

date e Anoma malie lies: s: The hourly_wages in the first tuple could be updated without making a similar change in the second tuple

18

Data redundancy example1 (3/3)

 Inserti

rtion Anomalies: We cannot insert a tuple for an employee unless we know the hourly wage for the employee‟s rating value

 Deletion A

Anomalies: If we delete all tuples with a given rating value (e.g., we delete the tuples for C and D) we lose the association between the rating value and its hourly_wage value

slide-7
SLIDE 7

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

7

19

Decompositions (1/2)

Redundancy arises when a relational

schema forces an association between attributes that is not natural.

Functional dependencies can be used

to identify such situations and suggest refinement to the schema

20

Decompositions (2/2)

 The essential idea is that many problems

arising from redundancy can be addressed by replacing a relation with a collection of „smaller‟ relations

 R consists of replacing the relation

schema by two (or more) relation schemas that each contain a subset of attributes of R and together include all attributes in R

21

Examples

We can decompose Hourly_Emps

into two relations:

Hourly_Emps2(ssn, name, lot, rating,

hours_worked)

Wages(rating, hourly_wages)

This is more efficient than updating

several tuples (as in the original design), and it eliminates the potential for inconsistency

slide-8
SLIDE 8

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

8

Data redundancy example2

22

Redundant Storage: the branchNo B001 and B002 corresponds to their branchAddress and telNo are repeated

Example2 Problem

 Update Anomalies: If we change address of

Tom Daniels, we also need to change address of Sally Adams

 Insertion Anomalies: We cannot insert a

tuple for a staff unless we know the address and phone number of the branch of that stuff

 Deletion Anomalies: If we delete all tuples

with a given branch value (e.g., we delete the tuples for branch B002) we lose the association between the branch and its address and phone number

23

Example2 Solution

24

slide-9
SLIDE 9

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

9

25

Normal Forms (1/2)

Given a relation schema, we need to

decide whether it is a good design or we need to decompose it into smaller relations

To provide such guidance, several

normal forms have been proposed

26

Normal Forms (2/2)

The normal forms based on FDs are

first normal form (1NF), second normal form (2NF), and third normal form (3NF)

Every relation in 3NF is also in 2NF Every relation in 2NF is also in 1NF

First Normal Form (1NF)

A relation is in first normal form if

every field contains only atomic values, that is, no lists or sets

This requirement is implicit in our

definition of the relation model

Table faithfully represents

a relation and has no "repeating groups"

27

slide-10
SLIDE 10

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

10

Branch Table is not in 1NF

28

Converting Branch Table into 1NF

29

Second normal form (2NF)

2NF only applies to tables with

composite primary keys.

No non-prime attribute in the table

is functionally dependent on a part (proper subset) of a candidate key

30

slide-11
SLIDE 11

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

11

TempStaffAllocation Table is not in 2NF

31

Formal Definition of 2NF

Formal definition of 2NF is a table

that is in 1NF and every non-prime attribute is fully functional dependent

  • n the primary key

Full functional dependency indicates

that if A and B are columns of a table, B is fully dependent on A if B is functionally dependent on A but not

  • n any proper subset of A

32

Converting TempStaffAllocation table to 2NF

33

slide-12
SLIDE 12

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

12

Third Normal Form (3NF)

A table that is in 1NF and 2NF and in

which all non-prime attributes can be derived from only the primary key column(s) and no other columns

The formal definition of 3NF is a table

that is in 1NF and 2NF and in which no non-prime attributes is transitively dependent on the primary key

34

StaffBranch table is not in 3NF

35

Converting StaffBranch table to 3NF

36

slide-13
SLIDE 13

Electricite Du Laos (EDL)

  • Dr. Kanda Runapongsa Saikaew, Computer Engineering, KKU

13

Summary (1/2)

Norma

rmaliz lizati ation

  • n is a systematic way of

ensuring that a database structure is suitable for general-purpose querying

Free of certain undesirable

characteristics—insertion, update, and deletion anomalies—that could lead to a loss of data integrity

Informally, a relational database table

is often described as "normalized" if it is in the 3NF

37

Summary (2/2)

1NF: Table faithfully represents

a relation and has no "repeating groups"

2NF: No non-prime attribute in the

table is functionally dependent on a part (proper subset) of a candidate key

3NF: Every non-prime attribute is

non-transitively dependent on every key of the table

38

References

Connolly and Begg, “Database

Systems: A Practical Approach to Design, Implementation and Management”, Pearson, 2004

http://en.wikipedia.org/wiki/Database_

normalization

Ramakrishnan and Gehrke,

“Database Management Systems”, McGraw-Hill Science/Engineering/Math, 2003

39