Homework review Below is a table containing information about people - - PowerPoint PPT Presentation

homework review
SMART_READER_LITE
LIVE PREVIEW

Homework review Below is a table containing information about people - - PowerPoint PPT Presentation

Normalization (part 2) Professor Larry Heimann Carnegie Mellon University Information Systems Program Homework review Below is a table containing information about people who registered for events at a community center: (first_name, last_name,


slide-1
SLIDE 1

Normalization (part 2)

Professor Larry Heimann Carnegie Mellon University Information Systems Program

slide-2
SLIDE 2

Homework review

Below is a table containing information about people who registered for events at a community center:

(first_name, last_name, email, home_telephone, work_telephone, event_name, event_date, event_price, spouse_name, child_name_1, child_name_2, child_name_3)

a) Identify all the functional dependencies b) Identify the table's normal form c) Convert the table into a database with all tables in 3NF

slide-3
SLIDE 3

Homework review

Below is a table containing information about people who registered for events at a community center:

(first_name, last_name, email, home_telephone, work_telephone, event_name, event_date, event_price, spouse_name, child_name_1, child_name_2, child_name_3)

a) Identify all the functional dependencies F = {(first_name, last_name) → email, home_telephone, work_telephone, spouse_name, child_name_i; event_name → event_date, event_price}

slide-4
SLIDE 4

Homework review

Below is a table containing information about people who registered for events at a community center:

(first_name, last_name, email, home_telephone, work_telephone, event_name, event_date, event_price, spouse_name, child_name_1, child_name_2, child_name_3)

b) Identify the table's normal form The repeated attributes child_name_(1,2,3) make it clear that this is 0NF . One could also argue that lacking a legitimate primary key also places this as 0NF , but we’ll accept (first_name, last_name) as a primary key in this case.

slide-5
SLIDE 5

Homework review

Below is a table containing information about people who registered for events at a community center:

(first_name, last_name, email, home_telephone, work_telephone, event_name, event_date, event_price, spouse_name, child_name_1, child_name_2, child_name_3)

c) Convert the table into a database with all tables in 3NF

people(person_id, first_name, last_name, email, home_telephone, work_telephone, spouse_name) children(child_id, person_id, child_name) events(event_id, event_name, event_date, event_price) person_events(event_id, person_id)

slide-6
SLIDE 6

Homework review

Below is a table containing information about people who registered for events at a community center:

(first_name, last_name, email, home_telephone, work_telephone, event_name, event_date, event_price, spouse_name, child_name_1, child_name_2, child_name_3)

c) Convert the table into a database with all tables in 3NF

people(person_id, first_name, last_name, email, home_telephone, work_telephone, spouse_name) children(child_id, person_id, child_name) events(event_id, event_name, event_date, event_price) person_events(event_id, person_id)

slide-7
SLIDE 7

Fact: just over half of adults today are married Implication: lots of nulls in the people table

slide-8
SLIDE 8

Homework review

Below is a table containing information about people who registered for events at a community center:

(first_name, last_name, email, home_telephone, work_telephone, event_name, event_date, event_price, spouse_name, child_name_1, child_name_2, child_name_3)

c) Convert the table into a database with all tables in 3NF (revised)

people(person_id, first_name, last_name, email, home_telephone, work_telephone) relatives(relative_id, person_id, name, relation) events(event_id, event_name, event_date, event_price) person_events(event_id, person_id)

slide-9
SLIDE 9

Asides

Some issues dealing with households:

  • Different kinds of relations
  • People switching households over time

Patterns for dealing with household membership Candidate keys and primary keys — some better than others

slide-10
SLIDE 10

Homework review

Below is a table that a dentist users to keep track of her work on her patients' teeth:

(first_name, last_name, insurance_company_name, patient_insurance_number, insurance_company_phone_number, visit_date, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed, amount_currently_owed)

a) Identify all the functional dependencies b) Identify the table's normal form c) Convert the table into a database with all tables in 3NF

slide-11
SLIDE 11

Homework review

Below is a table that a dentist users to keep track of her work on her patients' teeth:

(first_name, last_name, insurance_company_name, patient_insurance_number, insurance_company_phone_number, visit_date, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed, amount_currently_owed)

a) Identify all the functional dependencies F = { (first_name, last_name) → patient_insurance_number,

amount_currently_owed; insurance_company_name → insurance_company_phone_number; (first_name, last_name), visit_date → which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed }

slide-12
SLIDE 12

Homework review

Below is a table that a dentist users to keep track of her work on her patients' teeth:

(first_name, last_name, insurance_company_name, patient_insurance_number, insurance_company_phone_number, visit_date, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed, amount_currently_owed)

b) Identify the table's normal form As in the first problem, we could pick on the (first_name, last_name) primary key and say it’s 0NF , but we won’t. No repeated elements and no partial dependencies, so we are at least in 2NF . However, there are transitive dependencies, so can’t move onto 3NF .

slide-13
SLIDE 13

Homework review

Below is a table that a dentist users to keep track of her work on her patients' teeth:

(first_name, last_name, insurance_company_name, patient_insurance_number, insurance_company_phone_number, visit_date, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed, amount_currently_owed)

c) Convert the table into a database with all tables in 3NF

patients(patient_id, first_name, last_name, insurance_id, patient_insurance_number, amount_currently_owed) insurances(insurance_id, company_name, company_phone_number) visits(visit_id, patient_id, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed)

slide-14
SLIDE 14
slide-15
SLIDE 15

Homework review

Below is a table that a dentist users to keep track of her work on her patients' teeth:

(first_name, last_name, insurance_company_name, patient_insurance_number, insurance_company_phone_number, visit_date, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed, amount_currently_owed)

c) Convert the table into a database with all tables in 3NF (revised)

patients(patient_id, first_name, last_name, amount_currently_owed) patient_insurances(patient_id, insurance_id, patient_insurance_number) insurances(insurance_id, company_name, company_phone_number) visits(visit_id, patient_id, which_tooth, symptoms, diagnosis, treatment_notes, material_used, fee, amount_billed)

slide-16
SLIDE 16

Homework review

Given a relation (table) R = (A, B, C, D, E) with F = {A → B, E → C, DE → A} a) what are all the candidate keys in this case?

  • - not A because it is determined by {C E}, not B because it is

determined by A, not C because it is determined by E

  • - not D alone cannot be the key because you need E, as well

to determine A ( and the same goes for why not E alone)

  • - only candidate key is {D E} in this case
slide-17
SLIDE 17

Class exercise

Given a relation (table) R = (A, B, C, D, E) with F = {A → B, E → C, DE → A} c) decompose the relation into additional relations until it is in 3NF

  • - Relation is 1NF because there is a composite primary key {D E}

but there is also a partial dependency {E → C}

  • - remove the partial dependency by decomposing to

R1 = (C, E) ; R2 = (A, B, D, E)

  • - remove the transitive dependency by decomposing further to

R1 = (C, E) ; R21 = (A, D, E) ; R22 = (A, B)

slide-18
SLIDE 18

Homework review

Given a relation (table) R = (A, B, C, D, E) with F = {BC → ADE, D → B} a) Both {BC} and {CD} are candidate keys b) The relation R has no partial or transitive dependencies, so it is 3NF

  • c) Trick question; not automatically BCNF because two candidate keys,

but they are overlapping composite keys. This is a rare edge-case that I personally have never seen in real life and would never ask on an exam.

slide-19
SLIDE 19

Dealing with World Bank data

  • Looking at the raw data…
  • Lessons
  • Practical database design is not just looking functional dependencies of

the fields, but looking at the data that would fill the fields.

  • Are repeating elements? Don’t forget that repeats can be the result of

time.

  • Are there null values that we could prudently avoid? (Remember that some

nulls are normal and inevitable — we can’t fear them — but if the design could minimize them, all the better)

  • Do we have primary keys that make sense and easy to identify?
slide-20
SLIDE 20

Moving on …

slide-21
SLIDE 21

Beyond 3NF

  • Introducing multi-valued dependencies

In BCNF and below, everything revolved around functional dependencies A multi-valued dependency is one where certain tuples are required to exist in a relation; sometimes referred to as tuple-generating dependencies.

  • Fourth Normal Form (4NF)

Multi-valued dependencies must be transformed into functional dependencies; implies that

  • ne value and not multiple values are dependent on a primary key

Eliminate multiple sets of multiple valued or multi-valued dependencies, sometimes described as non-trivial multi-valued dependencies

  • Fifth Normal Form (5NF)

Cyclical dependencies are removed; a cyclic dependency is simply something that depends

  • n one thing, where that one thing is either directly in indirectly dependent on itself
slide-22
SLIDE 22

4NF Example

applications(ssn, college, hobby)

Functional dependencies? Candidate keys? BCNF? Good design?

1 person applies to 5 colleges and has 6 hobbies, we’d need 30 tuples to capture all that information… not efficient

slide-23
SLIDE 23

4NF Example

applications(ssn, college, hobby)

M = {ssn college, ssn hobby}

ssn college hobby t 123456789 CMU chess u 123456789 MIT fishing

slide-24
SLIDE 24

4NF Example

applications(ssn, college, hobby)

M = {ssn college, ssn hobby}

ssn college hobby t 123456789 CMU chess u 123456789 MIT fishing v 123456789 CMU fishing w 123456789 MIT chess

slide-25
SLIDE 25

4NF Example

applications(ssn, college, hobby)

M = {ssn college, ssn hobby}

student_colleges(ssn, college) student_hobbies(ssn, hobby)

Relation R with MVDs is in 4NF if it is in BCNF and if for each nontrivial A B, A is a key.

slide-26
SLIDE 26

4NF Example 2

applications(ssn, college, date, major, hobby)

M = {ssn, college, date major} F = {ssn, college → date}

Reveal hobbies selectively to colleges Apply once to each college May apply to multiple majors (different days)

slide-27
SLIDE 27

4NF Example 2

applications(ssn, college, date, major, hobby)

M = {ssn, college, date major} F = {ssn, college → date}

a1(ssn, college, date, major) a2(ssn, college, date, hobby)

slide-28
SLIDE 28

4NF Example 2

applications(ssn, college, date, major, hobby)

M = {ssn, college, date major} F = {ssn, college → date}

a1(ssn, college, date, major) a2(ssn, college, date, hobby) a11(ssn, college, date) a12(ssn, college, major) a22(ssn, college, hobby)

slide-29
SLIDE 29

Worshipping at the altar of XNF

  • Those who prescribe that all good database designs must be in at least XNF
  • Normalization is a principle, not a law

“Normalization theory is a useful aid in the process, but it is not a panacea; anyone designing a database is certainly advised to be familiar with the basic techniques of normalization...but we do not mean to suggest that the design should necessarily be based on normalization principles alone.”

(C.J. Date, 1990)

  • How do we go about sensibly denormalizing?
slide-30
SLIDE 30

Denormalizing with compound fields

  • Some fields are routinely combined (e.g., first_name, last_name;

city, state)

  • Some designers choose to create additional field like ‘full_name’ or

‘city_state’ in these cases

  • Sorting on two indexed fields can take up to 20x longer than sorting on one

indexed field

  • Can also be done for ease of reporting / simpler queries
slide-31
SLIDE 31

Denormalizing with summary fields

  • Summary field is a field in a ‘one’ table record whose value is based on data

in ‘related-many’ table records

  • Example:

items (item_id, name, unit_price)

  • rders (order_id, customer_id, date)
  • rder_items (order_id, item_id, quantity)
slide-32
SLIDE 32

Denormalizing with summary fields

  • Summary field is a field in a ‘one’ table record whose value is based on data

in ‘related-many’ table records

  • Example:

items (item_id, name, unit_price)

  • rders (order_id, customer_id, date, total_sale)
  • rder_items (order_id, item_id, quantity)
  • Benefits
  • Summary tables in some cases where there is lots of data to be summarized
slide-33
SLIDE 33

Handling denormalization responsibly

  • Benefits should be tangible and significant
  • Good documentation on any decisions to denormalize
  • Use triggers and constraints to help protect the database internally