Homework review Below is a table containing information about people - - PowerPoint PPT Presentation
Homework review Below is a table containing information about people - - PowerPoint PPT Presentation
Normalization (part 2) Professor Larry Heimann Carnegie Mellon University Information Systems Program Homework review Below is a table containing information about people who registered for events at a community center: (first_name, last_name,
Homework review
Below is a table containing information about people who registered for events at a community center:
(first_name, last_name, email, home_telephone, work_telephone, event_name, event_date, event_price, spouse_name, child_name_1, child_name_2, child_name_3)
a) Identify all the functional dependencies b) Identify the table's normal form c) Convert the table into a database with all tables in 3NF
Homework review
Below is a table containing information about people who registered for events at a community center:
(first_name, last_name, email, home_telephone, work_telephone, event_name, event_date, event_price, spouse_name, child_name_1, child_name_2, child_name_3)
a) Identify all the functional dependencies F = {(first_name, last_name) → email, home_telephone, work_telephone, spouse_name, child_name_i; event_name → event_date, event_price}
Homework review
Below is a table containing information about people who registered for events at a community center:
(first_name, last_name, email, home_telephone, work_telephone, event_name, event_date, event_price, spouse_name, child_name_1, child_name_2, child_name_3)
b) Identify the table's normal form The repeated attributes child_name_(1,2,3) make it clear that this is 0NF . One could also argue that lacking a legitimate primary key also places this as 0NF , but we’ll accept (first_name, last_name) as a primary key in this case.
Homework review
Below is a table containing information about people who registered for events at a community center:
(first_name, last_name, email, home_telephone, work_telephone, event_name, event_date, event_price, spouse_name, child_name_1, child_name_2, child_name_3)
c) Convert the table into a database with all tables in 3NF
people(person_id, first_name, last_name, email, home_telephone, work_telephone, spouse_name) children(child_id, person_id, child_name) events(event_id, event_name, event_date, event_price) person_events(event_id, person_id)
Homework review
Below is a table containing information about people who registered for events at a community center:
(first_name, last_name, email, home_telephone, work_telephone, event_name, event_date, event_price, spouse_name, child_name_1, child_name_2, child_name_3)
c) Convert the table into a database with all tables in 3NF
people(person_id, first_name, last_name, email, home_telephone, work_telephone, spouse_name) children(child_id, person_id, child_name) events(event_id, event_name, event_date, event_price) person_events(event_id, person_id)
Fact: just over half of adults today are married Implication: lots of nulls in the people table
Homework review
Below is a table containing information about people who registered for events at a community center:
(first_name, last_name, email, home_telephone, work_telephone, event_name, event_date, event_price, spouse_name, child_name_1, child_name_2, child_name_3)
c) Convert the table into a database with all tables in 3NF (revised)
people(person_id, first_name, last_name, email, home_telephone, work_telephone) relatives(relative_id, person_id, name, relation) events(event_id, event_name, event_date, event_price) person_events(event_id, person_id)
Asides
Some issues dealing with households:
- Different kinds of relations
- People switching households over time
Patterns for dealing with household membership Candidate keys and primary keys — some better than others
Homework review
Below is a table that a dentist users to keep track of her work on her patients' teeth:
(first_name, last_name, insurance_company_name, patient_insurance_number, insurance_company_phone_number, visit_date, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed, amount_currently_owed)
a) Identify all the functional dependencies b) Identify the table's normal form c) Convert the table into a database with all tables in 3NF
Homework review
Below is a table that a dentist users to keep track of her work on her patients' teeth:
(first_name, last_name, insurance_company_name, patient_insurance_number, insurance_company_phone_number, visit_date, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed, amount_currently_owed)
a) Identify all the functional dependencies F = { (first_name, last_name) → patient_insurance_number,
amount_currently_owed; insurance_company_name → insurance_company_phone_number; (first_name, last_name), visit_date → which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed }
Homework review
Below is a table that a dentist users to keep track of her work on her patients' teeth:
(first_name, last_name, insurance_company_name, patient_insurance_number, insurance_company_phone_number, visit_date, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed, amount_currently_owed)
b) Identify the table's normal form As in the first problem, we could pick on the (first_name, last_name) primary key and say it’s 0NF , but we won’t. No repeated elements and no partial dependencies, so we are at least in 2NF . However, there are transitive dependencies, so can’t move onto 3NF .
Homework review
Below is a table that a dentist users to keep track of her work on her patients' teeth:
(first_name, last_name, insurance_company_name, patient_insurance_number, insurance_company_phone_number, visit_date, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed, amount_currently_owed)
c) Convert the table into a database with all tables in 3NF
patients(patient_id, first_name, last_name, insurance_id, patient_insurance_number, amount_currently_owed) insurances(insurance_id, company_name, company_phone_number) visits(visit_id, patient_id, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed)
Homework review
Below is a table that a dentist users to keep track of her work on her patients' teeth:
(first_name, last_name, insurance_company_name, patient_insurance_number, insurance_company_phone_number, visit_date, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed, amount_currently_owed)
c) Convert the table into a database with all tables in 3NF (revised)
patients(patient_id, first_name, last_name, amount_currently_owed) patient_insurances(patient_id, insurance_id, patient_insurance_number) insurances(insurance_id, company_name, company_phone_number) visits(visit_id, patient_id, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed)
Homework review
Given a relation (table) R = (A, B, C, D, E) with F = {A → B, E → C, DE → A} a) what are all the candidate keys in this case?
- - not A because it is determined by {C E}, not B because it is
determined by A, not C because it is determined by E
- - not D alone cannot be the key because you need E, as well
to determine A ( and the same goes for why not E alone)
- - only candidate key is {D E} in this case
Class exercise
Given a relation (table) R = (A, B, C, D, E) with F = {A → B, E → C, DE → A} c) decompose the relation into additional relations until it is in 3NF
- - Relation is 1NF because there is a composite primary key {D E}
but there is also a partial dependency {E → C}
- - remove the partial dependency by decomposing to
R1 = (C, E) ; R2 = (A, B, D, E)
- - remove the transitive dependency by decomposing further to
R1 = (C, E) ; R21 = (A, D, E) ; R22 = (A, B)
Homework review
Given a relation (table) R = (A, B, C, D, E) with F = {BC → ADE, D → B} a) Both {BC} and {CD} are candidate keys b) The relation R has no partial or transitive dependencies, so it is 3NF
- c) Trick question; not automatically BCNF because two candidate keys,
but they are overlapping composite keys. This is a rare edge-case that I personally have never seen in real life and would never ask on an exam.
Dealing with World Bank data
- Looking at the raw data…
- Lessons
- Practical database design is not just looking functional dependencies of
the fields, but looking at the data that would fill the fields.
- Are repeating elements? Don’t forget that repeats can be the result of
time.
- Are there null values that we could prudently avoid? (Remember that some
nulls are normal and inevitable — we can’t fear them — but if the design could minimize them, all the better)
- Do we have primary keys that make sense and easy to identify?
Moving on …
Beyond 3NF
- Introducing multi-valued dependencies
In BCNF and below, everything revolved around functional dependencies A multi-valued dependency is one where certain tuples are required to exist in a relation; sometimes referred to as tuple-generating dependencies.
- Fourth Normal Form (4NF)
Multi-valued dependencies must be transformed into functional dependencies; implies that
- ne value and not multiple values are dependent on a primary key
Eliminate multiple sets of multiple valued or multi-valued dependencies, sometimes described as non-trivial multi-valued dependencies
- Fifth Normal Form (5NF)
Cyclical dependencies are removed; a cyclic dependency is simply something that depends
- n one thing, where that one thing is either directly in indirectly dependent on itself
4NF Example
applications(ssn, college, hobby)
Functional dependencies? Candidate keys? BCNF? Good design?
1 person applies to 5 colleges and has 6 hobbies, we’d need 30 tuples to capture all that information… not efficient
4NF Example
applications(ssn, college, hobby)
M = {ssn college, ssn hobby}
ssn college hobby t 123456789 CMU chess u 123456789 MIT fishing
4NF Example
applications(ssn, college, hobby)
M = {ssn college, ssn hobby}
ssn college hobby t 123456789 CMU chess u 123456789 MIT fishing v 123456789 CMU fishing w 123456789 MIT chess
4NF Example
applications(ssn, college, hobby)
M = {ssn college, ssn hobby}
student_colleges(ssn, college) student_hobbies(ssn, hobby)
Relation R with MVDs is in 4NF if it is in BCNF and if for each nontrivial A B, A is a key.
4NF Example 2
applications(ssn, college, date, major, hobby)
M = {ssn, college, date major} F = {ssn, college → date}
Reveal hobbies selectively to colleges Apply once to each college May apply to multiple majors (different days)
4NF Example 2
applications(ssn, college, date, major, hobby)
M = {ssn, college, date major} F = {ssn, college → date}
a1(ssn, college, date, major) a2(ssn, college, date, hobby)
4NF Example 2
applications(ssn, college, date, major, hobby)
M = {ssn, college, date major} F = {ssn, college → date}
a1(ssn, college, date, major) a2(ssn, college, date, hobby) a11(ssn, college, date) a12(ssn, college, major) a22(ssn, college, hobby)
Worshipping at the altar of XNF
- Those who prescribe that all good database designs must be in at least XNF
- Normalization is a principle, not a law
“Normalization theory is a useful aid in the process, but it is not a panacea; anyone designing a database is certainly advised to be familiar with the basic techniques of normalization...but we do not mean to suggest that the design should necessarily be based on normalization principles alone.”
(C.J. Date, 1990)
- How do we go about sensibly denormalizing?
Denormalizing with compound fields
- Some fields are routinely combined (e.g., first_name, last_name;
city, state)
- Some designers choose to create additional field like ‘full_name’ or
‘city_state’ in these cases
- Sorting on two indexed fields can take up to 20x longer than sorting on one
indexed field
- Can also be done for ease of reporting / simpler queries
Denormalizing with summary fields
- Summary field is a field in a ‘one’ table record whose value is based on data
in ‘related-many’ table records
- Example:
items (item_id, name, unit_price)
- rders (order_id, customer_id, date)
- rder_items (order_id, item_id, quantity)
Denormalizing with summary fields
- Summary field is a field in a ‘one’ table record whose value is based on data
in ‘related-many’ table records
- Example:
items (item_id, name, unit_price)
- rders (order_id, customer_id, date, total_sale)
- rder_items (order_id, item_id, quantity)
- Benefits
- Summary tables in some cases where there is lots of data to be summarized
Handling denormalization responsibly
- Benefits should be tangible and significant
- Good documentation on any decisions to denormalize
- Use triggers and constraints to help protect the database internally