Why Is This Important? Many ways to model a given scenario in a - PDF document

Why Is This Important?  Many ways to model a given scenario in a database Schema Refinement and  How do we find the best one?  We will discuss objective criteria for evaluating Normal Forms database design quality  Formally define desired properties Chapter 19  Algorithms for determining if a database has these properties  Algorithms for fixing problems 1 2 The Evils of Redundancy Functional Dependencies (FDs)  A functional dependency X  Y holds over relation R  Redundancy is at the root of several problems associated with relational schemas: if, for every allowable instance r of R:  Redundant storage  t1  r, t2  r,  X (t1) =  X (t2) implies  Y (t1) =  Y (t2)  Insert, delete, update anomalies  I.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.)  Integrity constraints can be used to identify schemas  An FD is a statement about all allowable relations. with such problems and to suggest refinements.  Must be identified based on semantics of application.  Main refinement technique: decomposition  Given some allowable instance r1 of R, we can check if it  Replacing ABCD with, say, AB and BCD, or ACD and ABD. violates some FD f, but we cannot tell if f holds over R.  Decomposition should be used judiciously:  K is a candidate key for R means that K  R  Is there reason to decompose a relation?  However, K  R does not require K to be minimal.  What problems (if any) does the decomposition cause? 3 4 Wages R W 8 10 Example: Constraints on Entity Set Example (Contd.) 5 7 Hourly_Emps2 S N L R H Are the two smaller  Consider a relation obtained from Hourly_Emps: 123-22-3666 Attishoo 48 8 40 tables better?  Hourly_Emps (ssn, name, lot, rating, hrly_wages, 231-31-5368 Smiley 22 8 30  Problems in single “wide” hrs_worked) table due to R  W: 131-24-3650 Smethurst 35 5 30  Notation: We will denote this relation schema by  Update anomaly: Can we 434-26-3751 Guldu 35 5 32 listing the attributes: SNLRWH change W in just the first 612-67-4134 Madayan 35 8 40 tuple of SNLRWH?  This is really the set of attributes {S,N,L,R,W,H}.  Insertion anomaly: What S N L R W H if we want to insert an  Sometimes, we will refer to all attributes of a relation by employee and don’t know 123-22-3666 Attishoo 48 8 10 40 using the relation name. (e.g., Hourly_Emps for SNLRWH) the hourly wage for his rating? 231-31-5368 Smiley 22 8 10 30  Some FDs on Hourly_Emps:  Deletion anomaly: If we 131-24-3650 Smethurst 35 5 7 30 delete all employees with  ssn is the key: S  SNLRWH rating 5, we lose the 434-26-3751 Guldu 35 5 7 32  rating determines hrly_wages: R  W information about the wage for rating 5. 612-67-4134 Madayan 35 8 10 40 5 6

Reasoning About FDs Reasoning About FDs (Contd.)  Given some FDs, we can infer additional FDs:  Additional rules (that follow from the AA):  ssn  did, did  lot implies ssn  lot  Union: If X  Y and X  Z, then X  YZ  Decomposition: If X  YZ, then X  Y and X  Z  An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.  Example: Contracts(cid, sid, jid, did, pid, qty, value) and:  F + = closure of F; is the set of all FDs that are implied by F.  C is the key: C  CSJDPQV  Project purchases each part using single contract: JP  C  Armstrong’s Axioms (X, Y, Z are sets of attributes):  Dept purchases at most one part from a supplier: SD  P  Reflexivity: If X  Y, then Y  X.  JP  C, C  CSJDPQV imply JP  CSJDPQV  Augmentation: If X  Y, then XZ  YZ for any Z.  Transitivity: If X  Y and Y  Z, then X  Z.  SD  P implies SDJ  JP  These are sound (generate only FDs in F + ) and complete  SDJ  JP, JP  CSJDPQV imply SDJ  CSJDPQV (generate all FDs in F + ) inference rules for FDs. 7 8 Reasoning About FDs (Contd.) So, What Do We Do Now With FDs?  Computing the closure of a set of FDs can be expensive.  Essential for identifying problems in a database  Size of closure is exponential in # attributes design  Typically, we just want to check if a given FD X  Y is in  Provide a way for “fixing” the problem the closure of a set of FDs F. An efficient algorithm:  Compute attribute closure of X (denoted X + ) wrt F:  Key concept: normal forms • Set of all attributes A such that X  A is in F+  A relation that is in a certain normal form has certain • There is a linear time algorithm to compute this.  Check if Y is in X + desirable properties  Does F = {A  B, B  C, CD  E} imply A  E?  I.e, is A  E in the closure F + ? Equivalently, is E in A+? 9 10 Normal Forms Boyce-Codd Normal Form (BCNF)  Reln R with FDs F is in BCNF if, for all X  A in F +  Returning to the issue of schema refinement, the first  A  X (called a trivial FD), or question to ask is whether any refinement is needed.  X is a superkey for R.  If a relation is in a certain normal form (BCNF, 3NF etc.),  In other words, R is in BCNF if the only non-trivial FDs it is known that certain kinds of problems are avoided or that hold over R are key constraints. minimized.  R is free of any redundancy caused by FDs alone.  Helps deciding whether decomposing the relation will help. • No field of any tuple can be inferred (using only FDs) from the values in the other fields in the relation instance  Role of FDs in detecting redundancy: • For X  A, consider two tuples with the same X Y A  Consider a relation R with three attributes, ABC. X value. x y1 a • No FDs hold: There is no redundancy here. • They should have the same A value. Redundancy? • Given A  B: Several tuples could have the same A value, and if so, • No. Since R is in BCNF, X is a superkey and hence x y2 ? they all have the same B value. the “two” tuples must be identical. 11 12

Problems Prevented By BCNF Third Normal Form (3NF)  If BCNF is violated by (non-trivial) FD X  A, one of the  Reln R with FDs F is in 3NF if, for all X  A in F + following holds:  A  X (called a trivial FD), or  X is a subset of some key K.  X is a superkey for R, or • We store (X, A) pairs redundantly.  A is part of some key for R. • E.g., Reserves(S, B, D, C) with SBD as only key and FD S  C  Minimality of a key is crucial in third condition above. • Credit card number of a sailor stored for each reservation  If R is in BCNF, is it automatically in 3NF? What about the  X is not a proper subset of any key. other direction? • Redundant storage of (X, A) pairs as above • And there is a chain of FDs K  X  A, which means that we cannot  If R is in 3NF, some redundancy is possible. associate an X value with a K value unless we also associate an A value  3NF is a compromise, used when BCNF is not achievable (e.g., with an X value. no ``good’’ decomposition, or performance considerations). • E.g., Hourly_Emps(S, N, L, R, W, H) with S as only key and FD R  W  Lossless-join, dependency-preserving decomposition of R into a • Have chain S  R  W, hence cannot record the fact that employee S has rating R without knowing the hourly wage for that rating collection of 3NF relations is always possible. (covered soon) 13 14 What Does 3NF Achieve? Footnote About Other Normal Forms  Prevents same problems as BCNF, except for FDs where  1NF: every field contains only atomic values, i.e., no A is part of some key lists or sets  Consider FD X  A where X is no superkey, but A is part of some  2NF: 1NF, and all attributes that are not part of any key  E.g., Reserves(S, B, D, C) with only key SBD and FDs S  C and candidate key are functionally dependent on the C  S is in 3NF whole of every candidate key • Notice: same example as before, but adding C  S made it 3NF  3NF implies 2NF • Why? Since C  S and SBD is a key, CBD is also a key. Hence for S  C, C is part of a key  4NF: prevents redundancy from multi-valued • Redundancy problem: for each reservation of sailor S, same (S, C) pair is stored. dependencies (see book)  BCNF did not suffer from this redundancy problem.  5NF: addresses redundancy based on join  So, why do we need 3NF? Let’s look at decompositions dependencies, which generalize multi-valued first. dependencies (see book) 15 16 Example Decomposition Decomposition of a Relation Schema  Decompositions should be used only when needed.  Suppose relation R contains attributes A1,..., An. A  Let SNLRWH have FDs S  SNLRWH and R  W decomposition of R replaces R by two or more  Second FD causes violation of 3NF relations such that: • W values repeatedly associated with R values.  Each new relation schema contains a subset of the  Easiest fix: create a relation RW to store these associations attributes of R (and no attributes that do not appear in R), and remove W from the main schema: and • I.e., we decompose SNLRWH into SNLRH and RW  Every attribute of R appears as an attribute of at least one  Each SNLRWH tuple will now be projected into two of the new relations. tuples, SNLRH and RW, each stored in the  Intuition: decomposing R means we will store corresponding relation instances of the relation schemes produced by the  Are there any potential problems with this approach? decomposition, instead of instances of R. 17 18

Why Is This Important? Many ways to model a given scenario in a - PDF document

Why Is This Important? Many ways to model a given scenario in a database Schema Refinement and How do we find the best one? We will discuss objective criteria for evaluating Normal Forms database design quality Formally define

Why Im NOT Why Im NOT Why Im NOT Why Im NOT a Hindu Why Im NOT a Hindu

Why is this important? Why is this important? Anyone know what song this is? Yes thats right,

Why is shipping important? CARGO TRINITY HOUSE // KEY STAGE 1 WHY IS SHIPPING IMPORTANT?

The Worlds First SMART LASER The Platform That Redefines the Therapy Laser Industry. Why is

Smart City Wonderful Life Guo

Why is there a price to pay? Why is there a price to pay? Why cant God just

Crises Important and urgent Goals and planning Important but not urgent Assess tasks using

Occupational Hygiene Why is it important? And why ignoring it is bad business? Julian Dowson

WHAT IS HEATING VALUE WHY IS IT IMPORTANT WHAT IS HEATING VALUE WHY IS IT IMPORTANT

Deborah Grove, Principal Grove-Associates Why Green IT is Important Why Green IT is Important

WHY IS MIGRATION SO IMPORTANT? Why migration? German National Team 2014 Why migration?

G 98 7 Artist Workshop Spring 2014 Why We Do What We Do What? Why? The Reason Why what we

Property-Based Testing Matt Bachmann @mattbachmann Testing is Important Testing is Important

Important Concepts Important Concepts Some important concepts in financial and derivative

WELCOME Building Envelope Problems AGENDA WHY IMPORTANT CONDENSATION STONE

WELCOME VENTED and UNVENTED Roof/Deck Systems AGENDA WHY IMPORTANT CODE THEORY

Central Meson Production in ALICE ALICE detector Selection of central diffractive

PDG Java API Chuck McParland Computational Research Division Lawrence Berkeley National

Schema Refinement and Normal Forms [R&G] Chapter 19 CS4320 1 The Evils of Redundancy

On the Computation of Approximate Slow Invariant Manifolds Samuel Paolucci, Joseph M. Powers, and

Z c (3900) from lattice QCD based on Y. Ikeda et al., (HAL QCD), arXiv.1602.03465(hep-lat).

Privacy & Fairness in Data Science CS848 Fall 2019 2 Instructor Xi He: Research

Levels in triangulated categories Srikanth Iyengar University of Nebraska, Lincoln Leeds, 18th

C C Charmonium: present and future ICPPA, October, 10, 2016, Moscow Charmonium in standard

Why Is This Important? Many ways to model a given scenario in a - PDF document

Why Is This Important? Many ways to model a given scenario in a database Schema Refinement and How do we find the best one? We will discuss objective criteria for evaluating Normal Forms database design quality Formally define

Why Im NOT Why Im NOT Why Im NOT Why Im NOT a Hindu Why Im NOT a Hindu

Why is this important? Why is this important? Anyone know what song this is? Yes thats right,

Why is shipping important? CARGO TRINITY HOUSE // KEY STAGE 1 WHY IS SHIPPING IMPORTANT?

The Worlds First SMART LASER The Platform That Redefines the Therapy Laser Industry. Why is

Smart City Wonderful Life Guo

Why is there a price to pay? Why is there a price to pay? Why cant God just

Crises Important and urgent Goals and planning Important but not urgent Assess tasks using

Occupational Hygiene Why is it important? And why ignoring it is bad business? Julian Dowson

WHAT IS HEATING VALUE WHY IS IT IMPORTANT WHAT IS HEATING VALUE WHY IS IT IMPORTANT

Deborah Grove, Principal Grove-Associates Why Green IT is Important Why Green IT is Important

WHY IS MIGRATION SO IMPORTANT? Why migration? German National Team 2014 Why migration?

G 98 7 Artist Workshop Spring 2014 Why We Do What We Do What? Why? The Reason Why what we

Property-Based Testing Matt Bachmann @mattbachmann Testing is Important Testing is Important

Important Concepts Important Concepts Some important concepts in financial and derivative

WELCOME Building Envelope Problems AGENDA WHY IMPORTANT CONDENSATION STONE

WELCOME VENTED and UNVENTED Roof/Deck Systems AGENDA WHY IMPORTANT CODE THEORY

Central Meson Production in ALICE ALICE detector Selection of central diffractive

PDG Java API Chuck McParland Computational Research Division Lawrence Berkeley National

Schema Refinement and Normal Forms [R&amp;G] Chapter 19 CS4320 1 The Evils of Redundancy

On the Computation of Approximate Slow Invariant Manifolds Samuel Paolucci, Joseph M. Powers, and

Z c (3900) from lattice QCD based on Y. Ikeda et al., (HAL QCD), arXiv.1602.03465(hep-lat).

Privacy &amp; Fairness in Data Science CS848 Fall 2019 2 Instructor Xi He: Research

Levels in triangulated categories Srikanth Iyengar University of Nebraska, Lincoln Leeds, 18th

C C Charmonium: present and future ICPPA, October, 10, 2016, Moscow Charmonium in standard

Schema Refinement and Normal Forms [R&G] Chapter 19 CS4320 1 The Evils of Redundancy

Privacy & Fairness in Data Science CS848 Fall 2019 2 Instructor Xi He: Research