CS 61: Database Systems
Normalization
Adapted from Coronel and Morris unless otherwise noted
CS 61: Database Systems Normalization Adapted from Coronel and - - PowerPoint PPT Presentation
CS 61: Database Systems Normalization Adapted from Coronel and Morris unless otherwise noted Objective: create well-formed relations Tables are the building blocks of a relational database Previously we created tables for entities
Normalization
Adapted from Coronel and Morris unless otherwise noted
2
understanding the business rules
Understand business rules Identify entities and their attributes Model relationships between entities Apply constraints
3
4
Normalization is the process of evaluating and correcting poor table structure by following a few rules:
in more than one table
primary key
deletion anomalies Eliminate data anomalies by removing unnecessary
redundancies
5
First (1NF), Second (2NF) and Third (3NF) normal form characteristics
First Normal Form (1NF)
dependencies identified Second Normal Form (2NF)
dependencies Third Normal Form (3NF)
dependencies
Higher forms mainly of academic interest only
set of smaller relations as needed moving from 1NF to 3NF
6
7
PlayerID Name Team TeamPhone Position1 Position2 Position3 1Pessi Argentina 54-11-1000-1000 Striker Forward 2Ricardo Portugal 351-2-7777-7777 Right Midfield Defending Midfielder 3Neumann Brazil 55-21-4040-2020 Forward Left Fullback Right Fullback 4Baily Wales 44-29-1876-1876 Defending Midfielder Striker 5Marioso Argentina 54-11-1000-1000 Sweeper Defending Midfielder Striker 6Pare Brazil 55-21-4040-2020 Goalkeeper
Soccer player database Business rules
Based on Prof Charles Palmer lecture notes
8
PlayerID Name Team TeamPhone Position1 Position2 Position3 1Pessi Argentina 54-11-1000-1000 Striker Forward 2Ricardo Portugal 351-2-7777-7777 Right Midfield Defending Midfielder 3Neumann Brazil 55-21-4040-2020 Forward Left Fullback Right Fullback 4Baily Wales 44-29-1876-1876 Defending Midfielder Striker 5Marioso Argentina 54-11-1000-1000 Sweeper Defending Midfielder Striker 6Pare Brazil 55-21-4040-2020 Goalkeeper ∅ ∅ Iceland 54-12-5432-2345 ∅ ∅ ∅
Soccer player database Insert anomaly:
because PlayerID is Primary Key
Based on Prof Charles Palmer lecture notes
9
PlayerID Name Team TeamPhone Position1 Position2 Position3 1Pessi Argentina 54-11-1000-1000 Striker Forward 2Ricardo Portugal 351-2-7777-7777 Right Midfield Defending Midfielder 3Neumann Brazil 55-21-4040-2020 Forward Left Fullback Right Fullback 4Baily Wales 44-29-1876-1876 Defending Midfielder Striker 5Marioso Argentina 54-11-1000-1000 Sweeper Defending Midfielder Striker 6Pare Brazil 55-21-4040-2020 Goalkeeper ∅ ∅ Iceland 54-12-5432-2345 ∅ ∅ ∅
Soccer player database Update anomaly:
not all
Based on Prof Charles Palmer lecture notes
10
PlayerID Name Team TeamPhone Position1 Position2 Position3 1Pessi Argentina 54-11-1000-1000 Striker Forward 2Ricardo Portugal 351-2-7777-7777 Right Midfield Defending Midfielder 3Neumann Brazil 55-21-4040-2020 Forward Left Fullback Right Fullback 4Baily Wales 44-29-1876-1876 Defending Midfielder Striker 5Marioso Argentina 54-11-1000-1000 Sweeper Defending Midfielder Striker 6Pare Brazil 55-21-4040-2020 Goalkeeper ∅ ∅ Iceland 54-12-5432-2345 ∅ ∅ ∅
Soccer player database Delete anomaly:
How does this apply to our inspection database?
Based on Prof Charles Palmer lecture notes
11
Inspection table: compound Primary Key uniquely identifies row: RestaurantID, InspectionDate, InspectionType
Version 1
While a restaurant may receive more than one inspection on a given day (e.g. cycle and trans fat initial inspection), it will not receive more than one of the same type
Each violation has a description (concatenated here)
12
Each inspection can result in multiple violations CriticalFlag set to ‘Y’ if any violation is critical,
Version 1
Is this a good design? Let’s take a closer look
13
Version 1
If a new type
code were created, it will not exist in the database until a restaurant gets this type
Also no consistency in values Could enter a violation code that does not exist or a violation description that does not match the violation code
14
Version 1
If we change the description for a violation code, we must update the description in all rows with that description Hard to find violation codes and descriptions as there are multiple entries in these columns CriticalFlag also depends on all values in the ViolationCode column
15
Version 1
If only one inspection found a particular violation code, and we delete that inspection, we would loose the violation code How do we fix this situation? Normalization!
16
17
Normalization
fewer attributes
produce data
database design
18
Key type Definition Superkey An attribute or combination of attributes that uniquely identifies each row in a table Candidate key A minimal (irreducible) superkey; a superkey that does not contain a subset of attributes that is itself a superkey Primary key A candidate key selected to uniquely identify all other attribute values in any given row; cannot contain null entries Foreign key An attribute or combination of attributes in one table whose values must either match the primary key in another table or be null Composite key A key comprised of multiple attributes (sometimes called a compound key)
19
Functional dependence (FD)
attributes that can be looked up given key A Can look up B, if given A
20
Full functional dependence
subset of the key (e.g., all attributes in key are required) Ex: RestaurantID, InspectionDate, InspectionType ➝ Score All three attributes are required to uniquely identify the score Partial dependence
Ex: RestaurantID, InspectionDate, InspectionType➝InspectionDescription
Transitive dependence
Ex: ViolationCode ➝ ViolationDescription, CriticalFlag
21
First (1NF), Second (2NF) and Third (3NF) normal form characteristics
First Normal Form (1NF)
dependencies identified
22
Step 1: Put in table form
Inspections table
23
Repeating groups Step 1: Put in table form Step 2: Eliminate repeating groups
ViolationDescription attributes because each inspection may result in multiple violations
violation its own row
Inspections table
24
Now each violation on its own row Step 1: Put in table form Step 2: Eliminate repeating groups
ViolationDescription attributes because each inspection may result in multiple violations
violation its own row
Inspections table
25
Step 1: Put in table form Step 2: Eliminate repeating groups Step 3: Identify primary key
uniquely identify rows
InspectionType, ViolationCode
assigned by the system (auto increment)
Inspections table
26
Step 1: Put in table form Step 2: Eliminate repeating groups Step 3: Identify primary key
uniquely identify rows
InspectionType, ViolationCode
system (auto increment)
Inspections table
27
Step 1: Put in table form Step 2: Eliminate repeating groups Step 3: Identify primary key Step 4: Identify dependencies
Now in 1NF, move on to 2NF
Inspections table
28
First (1NF), Second (2NF) and Third (3NF) normal form characteristics
First Normal Form (1NF)
dependencies identified Second Normal Form (2NF)
dependencies
29
Step 1: Make new tables to eliminate partial dependencies
Inspections table
30
Step 1: Make new tables to eliminate partial dependencies
InspectionDate, InspectionType, ViolationCode instead of using surrogate key then:
dependencies (depend only on ViolationCode)
new table and reassigning dependent attributes
Inspections table
31
First (1NF), Second (2NF) and Third (3NF) normal form characteristics
First Normal Form (1NF)
dependencies identified Second Normal Form (2NF)
dependencies Third Normal Form (3NF)
dependencies
32
Step 1: Make new tables to eliminate transitive dependencies
PK and ViolationDescription and CriticalFlag as attributes
ViolationCode
Inspections table
33
Inspections table
Move dependent attributes into their own table (we would have done this with partial dependencies also) Use PK in new table as FK in Inspections
34
Inspections table
Move dependent attributes into their own table (we would have done this with partial dependencies also) Use PK in new table as FK in Inspections
35
Inspections table
Move dependent attributes into their own table (we would have done this with partial dependencies also) Use PK in new table as FK in Inspections Problem! There is a M:N relationship between Inspections and InspectionCodes
36
Inspections table
Use a joining table Inspection table get smaller
moved into new tables
37
Other steps to consider after reaching 3NF
anything?)
attribute like first name if storing name as first and last name)
from a set of known values
38
39
Restaurants table
Can you identify any dependencies in the Restaurants table?
here?
dependencies?
40
Restaurants table
Can you identify any dependencies in the Restaurants table?
here? No: key is single attribute
dependencies?
41
Restaurants table
Can you identify any dependencies in the Restaurants table?
here? No: key is single attribute
dependencies? Yes: ZipCode gives Boro Could make a table with ZipCode as key and Boro as attribute and look up Boro with JOIN as needed
42
Restaurants table
Can you identify any dependencies in the Restaurants table?
here? No: key is single attribute
dependencies? Yes: ZipCode gives Boro For simplicity we choose to keep this table denormalized (2NF, has transitive dependency) But, want Cuisine to be selected from a small number of options, so make a Cuisine table and use Cuisine as FK in Restaurants table
43
44
45
Normal Form Characteristic First normal form (1NF) Table format, no repeating groups, PK and dependencies identified Second normal form (2NF) 1NF and no partial dependencies Third formal form (3NF) 2NF and no transitive dependencies Boyce-Codd normal form (BCNF) Every determinant is a candidate key (special case of 3NF) Fourth normal form (4NF) 3NF and no independent multivalued dependencies Fifth normal form (5NF) and more Mainly of academic interest only
Normalization: evaluating and correcting table structures to minimize data redundancies
Normally good enough
46