The Relational Data Model Lecture 6 1 Outline Relational Data - - PDF document

the relational data model
SMART_READER_LITE
LIVE PREVIEW

The Relational Data Model Lecture 6 1 Outline Relational Data - - PDF document

The Relational Data Model Lecture 6 1 Outline Relational Data Model Functional Dependencies Logical Schema Design Reading Chapter 8 2 1 The Relational Data Model Relational Physical Data Schema storage Modeling Have


slide-1
SLIDE 1

1

1

The Relational Data Model

Lecture 6

2

Outline

  • Relational Data Model
  • Functional Dependencies
  • Logical Schema Design

Reading Chapter 8

slide-2
SLIDE 2

2

3

The Relational Data Model

Data Modeling Relational Schema Physical storage E/R diagrams Tables: column names: attributes rows: tuples Complex file organization and index structures.

Have seen this in SQL Have seen this too Discuss next

4

Terminology

Name Price Category Manufacturer gizmo $19.99 gadgets GizmoWorks Power gizmo $29.99 gadgets GizmoWorks SingleTouch $149.99 photography Canon MultiTouch $203.99 household Hitachi Tuples or rows or records Attribute names Table name or relation name Products:

slide-3
SLIDE 3

3

5

Schemas

Relational Schema: – Relation name plus attribute names – E.g. Product(Name, Price, Category, Manufacturer) – In practice we add the domain for each attribute Database Schema – Set of relational schemas – E.g. Product(Name, Price, Category, Manufacturer), Company(Name, Address, Phone), . . . . . . .

6

Instances

  • Relational schema = R(A1,…,Ak)

Instance = relation with k attributes (of “type” R)

– values of corresponding domains

  • Database schema = R1(…), R2(…), …, Rn(…)

Instance = n relations, of types R1, R2, ..., Rn

This is all mathematics, not to be confused with SQL tables ! (What's a difference?)

slide-4
SLIDE 4

4

7

Example

Name Price Category Manufacturer gizmo $19.99 gadgets GizmoWorks Power gizmo $29.99 gadgets GizmoWorks SingleTouch $149.99 photography Canon MultiTouch $203.99 household Hitachi Relational schema:Product(Name, Price, Category, Manufacturer) Instance:

8

First Normal Form (1NF)

  • A database schema is in First Normal

Form if all tables are flat

3.9 Carol 3.7 Bob 3.8 Alice Courses GPA Name

OS DB Math OS DB OS Math

Student

3.9 Carol 3.7 Bob 3.8 Alice GPA Name

Student

Course OS DB Math OS Carol OS Alice DB Bob Alice Carol Alice Student Course DB Math Math

Takes Course

May need to add keys

slide-5
SLIDE 5

5

9

Functional Dependencies

  • A form of constraint

– hence, part of the schema

  • Finding them is part of the database design
  • Also used in normalizing the relations
  • Warning: this is the most abstract, and

“hardest” part of the course.

10

Functional Dependencies

Definition: If two tuples agree on the attributes then they must also agree on the attributes Formally: A1, A2, …, An  B1, B2, …, Bm A1, A2, …, An B1, B2, …, Bm

slide-6
SLIDE 6

6

11

Examples

  • EmpID  Name, Phone, Position
  • Position  Phone
  • but Phone  Position

EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer

12

In General

  • To check A  B, erase all other columns
  • check if the remaining relation is many-one

(called functional in mathematics)

… A … B X1 Y1 X2 Y2 … …

slide-7
SLIDE 7

7

13

Example

EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer

Position  Phone

14

Typical Examples of FDs

Product: name  price, manufacturer Person: ssn  name, age Company: name  stockprice, president

slide-8
SLIDE 8

8

15

Example

Product(name, category, color, department, price) name  color category  department color, category  price Consider these FDs: What do they say ?

16

Example

FD’s are constraints on relations:

  • On some instances they hold
  • On others they don’t

99 Toys Green Gadget Tweaker 49 Toys Green Gadget Gizmo price department color category name

Does this instance satisfy all the FDs ? name  color category  department color, category  price

slide-9
SLIDE 9

9

17

Example

59 Office-supp. Green Stationary Gizmo 99 Toys Black Gadget Tweaker 49 Toys Green Gadget Gizmo price department color category name

What about this one ? name  color category  department color, category  price

18

Example

If some FDs are satisfied, then

  • thers are satisfied too

If all these FDs are true: name  color category  department color, category  price Then this FD also holds: name, category  price Why ??

slide-10
SLIDE 10

10

19

Inference Rules for FD’s

Is equivalent to

Splitting rule and Combining rule

Bm ... B1 Am ... A1

A1, A2, …, An  B1, B2, …, Bm A1, A2, …, An  B1 A1, A2, …, An  B2 . . . . . A1, A2, …, An  Bm

20

Inference Rules for FD’s (continued)

Trivial Rule Why ?

Am … A1

where i = 1, 2, ..., n A1, A2, …, An  Ai

slide-11
SLIDE 11

11

21

Inference Rules for FD’s (continued)

Transitive Closure Rule

If and then Why ? A1, A2, …, An  B1, B2, …, Bm B1, B2, …, Bm  C1, C2, …, Cp A1, A2, …, An  C1, C2, …, Cp

22 ... C1 Cp Bm … B1 Am … A1

slide-12
SLIDE 12

12

23

Example (continued)

Start from the following FDs: Infer the following FDs:

  • 1. name  color
  • 2. category  department
  • 3. color, category  price
  • 8. name, category  price
  • 7. name, category  color, category
  • 6. name, category  category
  • 5. name, category  color
  • 4. name, category  name

Which Rule did we apply ?

Inferred FD

24

Example (continued)

Answers:

Transitivity on 3, 7

  • 8. name, category  price

Split/combine on 5, 6

  • 7. name, category  color, category

Trivial rule

  • 6. name, category  category

Transitivity on 4, 1

  • 5. name, category  color

Trivial rule

  • 4. name, category  name

Which Rule did we apply ? Inferred FD

  • 1. name  color
  • 2. category  department
  • 3. color, category  price
slide-13
SLIDE 13

13

25

Another Example

  • Enrollment(student, major, course, room, time)

student  major major, course  room course  time What else can we infer ?

26

Another Rule

If then Augmentation follows from trivial rules and transitivity How ? A1, A2, …, An  B A1, A2, …, An , C1, C2, …, Cp  B

Augmentation

slide-14
SLIDE 14

14

27

Problem: infer ALL FDs

Given a set of FDs, infer all possible FDs How to proceed ?

  • Try all possible FDs, apply all 3 rules

– E.g. R(A, B, C, D): how many FDs are possible ?

  • Drop trivial FDs, drop augmented FDs

– Still way too many

  • Better: use the Closure Algorithm (next)

28

Closure of a set of Attributes

Given a set of attributes A1, …, An The closure, {A1, …, An}+ , is the set of attributes B s.t. A1, …, An  B name  color category  department color, category  price Example: Closures: name+ = {name, color} {name, category}+ = {name, category, color, department, price} color+ = {color}

slide-15
SLIDE 15

15

29

Closure Algorithm

Start with X={A1, …, An}. Repeat until X doesn’t change do: if B1, …, Bn  C is a FD and B1, …, Bn are all in X then add C to X. {name, category}+ = {name, category, color, department, price} name  color category  department color, category  price Example:

30

Example

Compute {A,B}+ X = {A, B, } Compute {A, F}+ X = {A, F, } R(A,B,C,D,E,F) A, B  C A, D  E B  D A, F  B

slide-16
SLIDE 16

16

31

Using Closure to Infer ALL FDs

A, B  C A, D  B B  D Example: Step 1: Compute X+, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ = ABCD, AC+ = AC, AD+ = ABCD ABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?) BCD+ = BCD, ABCD+ = ABCD Step 2: Enumerate all FD’s X  Y, s.t. Y ⊆ X+ and X∩Y = ∅: AB  CD, ADBC, ABC  D, ABD  C, ACD  B

32

Problem: Finding FDs

  • Approach 1: During Database Design

– Designer derives them from real-world knowledge of users – Problem: knowledge might not be available

  • Approach 2: From a Database Instance

– Analyze given database instance and find all FD’s satisfied by that instance – Useful if designers don’t get enough information from users – Problem: FDs might be artifical for the given instance

slide-17
SLIDE 17

17

33

Find All FDs

020 Circuits EE Frank 045 DB CSE Elsa 050 Java CSE Dan 045 DB CSE Carol 040 HW EE Alice 020 C++ CSE Bob 020 C++ CSE Alice Room Course Dept Student

Do all FDs make sense in practice ?

34

Answer

Course  Dept, Room Dept, Room  Course Student, Dept  Course, Room Student, Course  Dept, Room Student, Room  Dept, Course Do all FDs make sense in practice ?

slide-18
SLIDE 18

18

35

Keys

  • A key is a set of attributes A1, ..., An s.t. for

any other attribute B, we have A1, ..., An  B

  • A minimal key is a set of attributes which is

a key and for which no subset is a key

  • Note: book calls them superkey and key

36

Computing Keys

  • Compute X+ for all sets X
  • If X+ = all attributes, then X is a key
  • List only the minimal keys

Note: there can be many minimal keys !

  • Example: R(A,B,C), ABC, BCA

Minimal keys: AB and BC

slide-19
SLIDE 19

19

37

Examples of Keys

  • Product(name, price, category, color)

name, category  price category  color Keys are: {name, category} and all supersets

  • Enrollment(student, address, course, room, time)

student  address room, time  course student, course  room, time

38

Relational Schema Design (or Logical Schema Design)

Main idea:

  • Start with some relational schema
  • Find out its FD’s
  • Use them to design a better relational

schema

slide-20
SLIDE 20

20

39

Data Anomalies

When a database is poorly designed we get anomalies: Redundancy: data is repeated Update anomalies: need to change in several places Delete anomalies: may lose data when we don’t want

40

Relational Schema Design

Anomalies:

  • Redundancy = repeat data
  • Update anomalies = Fred moves to “Bellevue”
  • Deletion anomalies = Joe deletes his phone number:

what is his city ? Example: Persons with several phones SSN  Name, City

Westfield 908-555-2121 987-65-4321 Joe Seattle 206-555-6543 123-45-6789 Fred Seattle 206-555-1234 123-45-6789 Fred City PhoneNumber SSN Name

but not SSN  PhoneNumber

slide-21
SLIDE 21

21

41

Relation Decomposition

Break the relation into two:

Westfield 987-65-4321 Joe Seattle 123-45-6789 Fred City SSN Name 908-555-2121 987-65-4321 206-555-6543 123-45-6789 206-555-1234 123-45-6789 PhoneNumber SSN

Anomalies have gone:

  • No more repeated data
  • Easy to move Fred to “Bellevue” (how ?)
  • Easy to delete all Joe’s phone number (how ?)

Westfield 908-555-2121 987-65-4321 Joe Seattle 206-555-6543 123-45-6789 Fred Seattle 206-555-1234 123-45-6789 Fred City PhoneNumber SSN Name

42

Relational Schema Design

Person buys Product name price name ssn

Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies

slide-22
SLIDE 22

22

43

Decompositions in General

R1 = projection of R on A1, ..., An, B1, ..., Bm R2 = projection of R on A1, ..., An, C1, ..., Cp R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)

44

Decomposition

  • Sometimes it is correct:

Camera 19.99 Gizmo Camera 24.99 OneClick Gadget 19.99 Gizmo Category Price Name 19.99 Gizmo 24.99 OneClick 19.99 Gizmo Price Name Camera Gizmo Camera OneClick Gadget Gizmo Category Name

Lossless decomposition

slide-23
SLIDE 23

23

45

Incorrect Decomposition

  • Sometimes it is not:

Camera 19.99 Gizmo Camera 24.99 OneClick Gadget 19.99 Gizmo Category Price Name Camera Gizmo Camera OneClick Gadget Gizmo Category Name Camera 19.99 Camera 24.99 Gadget 19.99 Category Price

What’s incorrect ?? Lossy decomposition

46

Decompositions in General

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) If A1, ..., An  B1, ..., Bm Then the decomposition is lossless R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp) Example: name  price, hence the first decomposition is lossless Note: don’t need necessarily A1, ..., An  C1, ..., Cp

slide-24
SLIDE 24

24

47

Normal Forms

First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Third Normal Form (3NF) = this lecture Boyce Codd Normal Form (BCNF) = this lecture Others...

48

Boyce-Codd Normal Form

A simple condition for removing anomalies from relations: In English (though a bit vague): Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R. A relation R is in BCNF if: If A1, ..., An  B is a non-trivial dependency in R , then {A1, ..., An} is a key for R

slide-25
SLIDE 25

25

49

BCNF Decomposition Algorithm

A’s Others B’s R1 Is there a 2-attribute relation that is not in BCNF ? Repeat choose A1, …, Am  B1, …, Bn that violates the BNCF condition split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2 Until no more violations R2

50

Example

What are the dependencies? SSN  Name, City What are the keys? {SSN, PhoneNumber} Is it in BCNF?

Westfield 908-555-1234 987-65-4321 Joe Westfield 908-555-2121 987-65-4321 Joe Seattle 206-555-6543 123-45-6789 Fred Seattle 206-555-1234 123-45-6789 Fred City PhoneNumber SSN Name

slide-26
SLIDE 26

26

51

Decompose it into BCNF

Westfield 987-65-4321 Joe Seattle 123-45-6789 Fred City SSN Name 908-555-1234 987-65-4321 908-555-2121 987-65-4321 206-555-6543 123-45-6789 206-555-1234 123-45-6789 PhoneNumber SSN

SSN  Name, City Let’s check anomalies:

  • Redundancy ?
  • Update ?
  • Delete ?

52

Summary of BCNF Decomposition

Find a dependency that violates the BCNF condition: A’s Others B’s R1 R2 Heuristics: choose B , B , … B “as large as possible”

1 2 m

Decompose: 2-attribute relations are BCNF Continue until there are no BCNF violations left. A1, A2, …, An  B1, B2, …, Bm

slide-27
SLIDE 27

27

53

Example Decomposition

Person(name, SSN, age, hairColor, phoneNumber) SSN  name, age age  hairColor Decompose in BCNF (in class): Step 1: find all keys (How ? Compute S+, for various sets S) Step 2: now decompose

54

Other Example

  • R(A,B,C,D) A  B, B  C
  • Key: AD
  • Violations of BCNF: A  B, A C, ABC
  • Pick A BC: split into R1(A,BC) R2(A,D)
  • What happens if we pick A  B first ?
slide-28
SLIDE 28

28

55

Lossless Decompositions

A decomposition is lossless if we can recover: R(A,B,C) R1(A,B) R2(A,C) R’(A,B,C) should be the same as R(A,B,C)

R’ is in general larger than R. Must ensure R’ = R Decompose Recover

56

Lossless Decompositions

  • Given R(A,B,C) s.t. AB, the

decomposition into R1(A,B), R2(A,C) is lossless

slide-29
SLIDE 29

29

57

3NF: A Problem with BCNF

Unit Company Product Unit Company Unit Product FD’s: Unit → Company; Company, Product → Unit So, there is a BCNF violation, and we decompose. Unit → Company No FDs Notice: we loose the FD: Company, Product  Unit

58

So What’s the Problem?

Unit Company Product Unit Company Unit Product Galaga99 UW Galaga99 databases Bingo UW Bingo databases No problem so far. All local FD’s are satisfied. Let’s put all the data back into a single table again (anomalies?): Galaga99 UW databases Bingo UW databases Violates the dependency: company, product -> unit!

slide-30
SLIDE 30

30

59

Solution: 3rd Normal Form (3NF)

A simple condition for removing anomalies from relations: A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A1, A2, ..., An → B for R , then {A1, A2, ..., An } is a key for R,

  • r B is part of a key.

Tradeoff: BCNF = no anomalies, but may lose some FDs 3NF = keeps all FDs, but may have some anomalies