Relational Model Gaps between Data and DB Design Conceptually, - - PDF document

relational model gaps between data and db design
SMART_READER_LITE
LIVE PREVIEW

Relational Model Gaps between Data and DB Design Conceptually, - - PDF document

Relational Model Gaps between Data and DB Design Conceptually, what is a relational database? How to rewrite a query so that it can be evaluated correctly and efficiently? How can an ER design be reduced to a relational


slide-1
SLIDE 1

Relational Model

slide-2
SLIDE 2

CMPT 354: Database I -- Relational Model 2

Gaps between Data and DB Design

  • Conceptually, what is a relational database?

– How to rewrite a query so that it can be evaluated correctly and efficiently?

  • How can an ER design be reduced to a

relational implementation?

– In ER design, we have entity sets and relationship sets – In a relational database, we have only tables – How to fill the gap?

slide-3
SLIDE 3

CMPT 354: Database I -- Relational Model 3

Example of a Relation

Attribute Tuple/record

slide-4
SLIDE 4

CMPT 354: Database I -- Relational Model 4

Relation as a Math Structure

  • Formally, given sets D1, D2, …. Dn, a relation r is a

subset of D1 x D2 x … x Dn

– A relation is a set of n-tuples {(a1, a2, …, an)} where each ai ∈ Di customer_name = {Jones, Smith, Curry, Lindsay} customer_street = {Main, North, Park} customer_city = {Harrison, Rye, Pittsfield} r = { (Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield) } is a relation over customer_name x customer_street x customer_city

slide-5
SLIDE 5

CMPT 354: Database I -- Relational Model 5

Attribute Types

  • Each attribute of a relation has a name
  • The set of allowed values for each attribute is

called the domain of the attribute

  • Attribute values are (normally) required to be

atomic; that is, indivisible

– Multivalued attribute values and composite attribute values are not atomic

  • The special value null is a member of every

domain

– We will ignore the effect of null values in our main presentation and consider their effect later

slide-6
SLIDE 6

CMPT 354: Database I -- Relational Model 6

Relation Schema

  • A1, A2, …, An are attributes, R = (A1, A2, …,

An) is a relation schema

– Customer_schema = (customer_name, customer_street, customer_city)

  • r(R) is a relation on the relation schema R

– customer (Customer_schema)

slide-7
SLIDE 7

CMPT 354: Database I -- Relational Model 7

Relation Instance

  • The current values (relation instance) of a

relation are specified by a table

  • An element t of r is a tuple, represented by a

row in a table

Jones Smith Curry Lindsay customer_name Main North North Park customer_street Harrison Rye Rye Pittsfield customer_city customer attributes (or columns) tuples (or rows)

slide-8
SLIDE 8

CMPT 354: Database I -- Relational Model 8

Relations are Unordered

  • Order of tuples is irrelevant (tuples may be

stored in an arbitrary order)

– account relation with unordered tuples

slide-9
SLIDE 9

CMPT 354: Database I -- Relational Model 9

Relational Databases

  • A database consists of multiple relations
  • Information about an enterprise is broken down

into parts, with each relation storing one piece of the information

– account: stores information about accounts – depositor: stores information about which customer

  • wns which account

– customer: stores information about customers

slide-10
SLIDE 10

CMPT 354: Database I -- Relational Model 10

Universal Table

  • Storing all information as a single relation such as

bank(account_number, balance, customer_name, …)

  • Advantages: an easy starting point
  • Disadvantages:

– Repetition/redundancy of information (e.g., if two customers share an account, the balance is repeated in the two tuples) – The need for null values (e.g., to represent a customer without an account)

slide-11
SLIDE 11

CMPT 354: Database I -- Relational Model 11

Relation Examples

slide-12
SLIDE 12

CMPT 354: Database I -- Relational Model 12

Keys

  • Let K ⊆ R. K is a superkey of R if values for K are

sufficient to identify a unique tuple of each possible relation r(R)

– {customer_name, customer_street} and {customer_name} are both superkeys of Customer, if no two customers can possibly have the same name

  • K is a candidate key if K is minimal

– {customer_name} is a candidate key for Customer

  • Primary Key: a candidate key chosen by the

database designer as the principal means of identifying tuples within a relation

slide-13
SLIDE 13

CMPT 354: Database I -- Relational Model 13

Schema Diagram

  • Foreigh key: reference to the primary key of

another table

slide-14
SLIDE 14

CMPT 354: Database I -- Relational Model 14

Query Languages

  • Languages in which users request information

from the database

  • Categories of languages

– Procedural: specifying how to find the answers – Non-procedural, or declarative: only specifying what should be the answers, but not how to find them

  • “Pure” languages: form underlying basis of query

languages that commercial systems use

– Relational algebra (procedural) – Tuple relational calculus (non-procedural) – Domain relational calculus (non-procedural)

slide-15
SLIDE 15

CMPT 354: Database I -- Relational Model 15

Relational Algebra

  • A procedural language
  • Six basic operators

– select: σ – project: ∏ – union: ∪ – set difference: – – Cartesian product: x – rename: ρ

  • The operators take one or two relations as inputs

and produce a new relation as the result

slide-16
SLIDE 16

CMPT 354: Database I -- Relational Model 16

Select Operation – Example

  • σA=B ^ D > 5 (r)

A B C D α α β β α β β β 1 5 12 23 7 7 3 10 A B C D α β α β 1 23 7 10

slide-17
SLIDE 17

CMPT 354: Database I -- Relational Model 17

Select Operation – Definition

  • σp(r), p is called the selection predicate

– σp(r) = {t | t ∈ r and p(t)} – p is a formula in propositional calculus consisting of terms connected by : ∧ (and), ∨ (or), ¬ (not) – Each term is one of: <attribute> op <attribute>

  • r <constant>, where op is one of: =, ≠, >, ≥. <.

  • Example: σbranch_name=“Perryridge”(account)
slide-18
SLIDE 18

CMPT 354: Database I -- Relational Model 18

Project Operation – Example

  • ∏A,C(r)

A B C α α β β 10 20 30 40 1 1 1 2 A C α α β β 1 1 1 2 = A C α β β 1 1 2

slide-19
SLIDE 19

CMPT 354: Database I -- Relational Model 19

Project Operation – Definition

  • , where A1, A2 are attribute names

and r is a relation name

– The result is defined as the relation of k columns obtained by erasing the columns that are not listed – Duplicate rows removed from result, since relations are sets

  • Example: To eliminate the branch_name

attribute of account

– ∏account_number, balance (account)

) (

, , ,

2 1

r

k

A A A K

slide-20
SLIDE 20

CMPT 354: Database I -- Relational Model 20

Union Operation – Example

A B α α β 1 2 1 r A B α β 2 3 s A B α α β β 1 2 1 3 r ∪ s

slide-21
SLIDE 21

CMPT 354: Database I -- Relational Model 21

Union Operation – Definition

  • r ∪ s = {t | t ∈ r or t ∈ s}

– r and s must have the same arity (same number of attributes) – The attribute domains must be compatible (e.g., the 2nd column of r deals with the same type of values as does the 2nd column of s)

  • Example: to find all customers with either an

account or a loan

– ∏customer_name(depositor) ∪ ∏customer_name(borrower)

slide-22
SLIDE 22

CMPT 354: Database I -- Relational Model 22

Set Difference Operation – Example

A B α α β 1 2 1 r A B α β 2 3 s A B α β 1 1 r – s

slide-23
SLIDE 23

CMPT 354: Database I -- Relational Model 23

Set Difference Operation – Definition

  • r – s = {t | t ∈ r and t ∉ s}
  • Set differences must be taken between

compatible relations

– r and s must have the same arity – Attribute domains of r and s must be compatible

slide-24
SLIDE 24

CMPT 354: Database I -- Relational Model 24

Cartesian-Product Operation – Example

A B α β 1 2 r C D α β β γ 10 10 20 10 E a a b b s A B α α α α β β β β 1 1 1 1 2 2 2 2 C D α β β γ α β β γ 10 10 20 10 10 10 20 10 E a a b b a a b b r x s

slide-25
SLIDE 25

CMPT 354: Database I -- Relational Model 25

Cartesian-Product Operation – Definition

  • r x s = {t q | t ∈ r and q ∈ s}

– Attributes of r(R) and s(S) are disjoint. (That is, R ∩ S = ∅)

  • If attributes of r(R) and s(S) are not disjoint,

then renaming must be used

slide-26
SLIDE 26

CMPT 354: Database I -- Relational Model 26

Summary

  • Relational model

– Representing data in relations

  • Relational algebra – mathematical

foundation for SQL

  • Basic operations in relational algebra
slide-27
SLIDE 27

CMPT 354: Database I -- Relational Model 27

To-Do List

  • Can you translate the relational algebra

examples into SQL? What can you

  • bserve?