Normalization
Cs386 - Introduction to Database Systems Jay Urbain, PhD
Credits: Data Management, Richard Watson Database Management Systems, Raghu Ramkrishnan, Johannes Gehrke
Normalization Cs386 - Introduction to Database Systems Jay Urbain, - - PowerPoint PPT Presentation
Normalization Cs386 - Introduction to Database Systems Jay Urbain, PhD Credits: Data Management, Richard Watson Database Management Systems, Raghu Ramkrishnan, Johannes Gehrke Normalization n An alternative/adjunct database design tool to
Cs386 - Introduction to Database Systems Jay Urbain, PhD
Credits: Data Management, Richard Watson Database Management Systems, Raghu Ramkrishnan, Johannes Gehrke
n An alternative/adjunct database design tool to data
modeling.
n Schema refinement. n A theoretical foundation for the relational model. n Application of a series of rules that gradually improve the
design.
n A relationship between the attributes of an entity where:
¨ One or more attributes determine the value of another attribute
n An identifier can functionally determine all of the
attributes of an entity
¨ stock code -> firm name, stock price, stock quantity, stock
dividend
¨ If we know stock code, we know the value of firm name, etc.
n Multi-valued dependency
¨ Formula
n (stock dividend, stock price) -> yield
n Yield is fully functionally dependent on stock dividend
and stock price because both of these attributes are required to determine the value of yield
¨ (stock dividend, stock price) -> yield
n Determinant
¨ An attribute that fully functionally determines another
attribute
n e.g., stock code determines stock PE
n A given value can determine multiple values
¨ A multidetermines B ¨ A -> B ¨ E.g., Department multidetermines course
n One-to-one
¨ A value of an attribute determines the value of
another attribute and vice versa
¨ A -> B and B -> A ¨ e.g.:
n CH -> Switzerland n Switzerland -> CH
n One-to-many
¨ A value of one attribute determines the value of
another attribute but not vice versa
¨ A -> B ¨ e.g.:
n country name -> currency unit n currency unit NOT-> country name
n Many-to-many
¨ Neither attribute determines the other ¨ A not -> B ¨ B not -> A
n country name not -> language n language not -> country name
¨ French and Flemish is spoken in Belgium ¨ French is spoken in many countries
n A classification of relations n Innermost is first normal form
¨ I.e. each normal form is a prerequisite for the next higher
normal form.
n Only scalar attributes
¨ All rows must have the same number of columns ¨ Single valued attributes only ¨ "A relation is in first normal form if it has the
property that none of its domains has elements which are themselves sets.”
n E. F. Codd (Oct 1972). "Further normalization of the
database relational model”.
n All columns are fully functionally dependent on the
primary key
¨ Violated when a nonkey column is a fact about part of the
primary key
¨ customer-credit in this case is not dependent on itemno
itemno customerid quantity customer-credit 12 57 25 OK 34 679 3 POOR
n No Transitive dependencies
¨ Violated when a nonkey column is a fact about another nonkey
column
¨ A column is not fully functionally dependent on the primary key ¨ exchange rate in this case is a fact about nation
stock stock code nation exchange rate MG USA 0.67 IR AUS 0.46
n Violation of BCNF arises when a table
¨ has multiple candidate keys ¨ the candidate keys are composite ¨ the candidate keys overlap
advisor client probtype consultant Alpha Marketing Gomez Alpha Production Raginiski
n A row should not contain two or more multivalued
independent facts
student studentid sport subject … 50 Football English … 50 Football Music … 50 Tennis Botany … 50 Karate Botany …
n 5NF if decomposition would not remove any redundancies. n Not 5NF if a table can be reconstructed from other tables
¨ There exists some rule that enables a relation to be inferred
n Base case
¨ Consultants provide skills to one or more firms, and firms can use many
consultants; a consultant has many skills and a skill can be used by many firms; and a firm can have a need for many skills and the same skill can be required by many firms
advisor client probtype consultant Alpha Marketing Gomez Alpha Production Raginiski
n Designed to reduce redundancy in databases recording multi-valued
facts by isolating semantically related multiple relationships.
n The rule - If a consultant has a certain skill (e.g., database) and has a
contract with the firm that requires that skill (e.g., IBM), then the consultant advises the firm on that skill (i.e., he advises IBM on database).
n Every constraint on the table is a logical consequence of
the table's domain constraints and key constraints.
n Key: unique identifier n Constraint: rule governing attribute values n Domain: set of values of the same data type
n Property of data which, when satisfied, requires every
value of one attribute of a relation to exist as a value of another attribute in a different (or the same) relation (table).
n Foreign Key constraint
n Trick: look for invalid relationships:
¨ Expressed as 1:1 or single entity, when relationship is 1:m ¨ 1:m when relationship is m:m
n Data modeling is often an easier path to good database
design
n A high-fidelity data model will be of high normal form n 5NF is likely to create the most problems
¨ Check for special rules
n Normalization is one approach to data modeling. n Based on eliminating redundancies.
¨ Note: eliminating redundancies can increase the number of table
joins which can reduce performance.
n Use normalization to complement relational modeling. n A data model can have multiple representations. n Learning to model is difficult. n Learning to represent a model is relatively easy.