Relational Model Gaps between Data and DB Design Conceptually, - - PDF document
Relational Model Gaps between Data and DB Design Conceptually, - - PDF document
Relational Model Gaps between Data and DB Design Conceptually, what is a relational database? How to rewrite a query so that it can be evaluated correctly and efficiently? How can an ER design be reduced to a relational
CMPT 354: Database I -- Relational Model 2
Gaps between Data and DB Design
- Conceptually, what is a relational database?
– How to rewrite a query so that it can be evaluated correctly and efficiently?
- How can an ER design be reduced to a
relational implementation?
– In ER design, we have entity sets and relationship sets – In a relational database, we have only tables – How to fill the gap?
CMPT 354: Database I -- Relational Model 3
Example of a Relation
Attribute Tuple/record
CMPT 354: Database I -- Relational Model 4
Relation as a Math Structure
- Formally, given sets D1, D2, …. Dn, a relation r is a
subset of D1 x D2 x … x Dn
– A relation is a set of n-tuples {(a1, a2, …, an)} where each ai ∈ Di customer_name = {Jones, Smith, Curry, Lindsay} customer_street = {Main, North, Park} customer_city = {Harrison, Rye, Pittsfield} r = { (Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield) } is a relation over customer_name x customer_street x customer_city
CMPT 354: Database I -- Relational Model 5
Attribute Types
- Each attribute of a relation has a name
- The set of allowed values for each attribute is
called the domain of the attribute
- Attribute values are (normally) required to be
atomic; that is, indivisible
– Multivalued attribute values and composite attribute values are not atomic
- The special value null is a member of every
domain
– We will ignore the effect of null values in our main presentation and consider their effect later
CMPT 354: Database I -- Relational Model 6
Relation Schema
- A1, A2, …, An are attributes, R = (A1, A2, …,
An) is a relation schema
– Customer_schema = (customer_name, customer_street, customer_city)
- r(R) is a relation on the relation schema R
– customer (Customer_schema)
CMPT 354: Database I -- Relational Model 7
Relation Instance
- The current values (relation instance) of a
relation are specified by a table
- An element t of r is a tuple, represented by a
row in a table
Jones Smith Curry Lindsay customer_name Main North North Park customer_street Harrison Rye Rye Pittsfield customer_city customer attributes (or columns) tuples (or rows)
CMPT 354: Database I -- Relational Model 8
Relations are Unordered
- Order of tuples is irrelevant (tuples may be
stored in an arbitrary order)
– account relation with unordered tuples
CMPT 354: Database I -- Relational Model 9
Relational Databases
- A database consists of multiple relations
- Information about an enterprise is broken down
into parts, with each relation storing one piece of the information
– account: stores information about accounts – depositor: stores information about which customer
- wns which account
– customer: stores information about customers
CMPT 354: Database I -- Relational Model 10
Universal Table
- Storing all information as a single relation such as
bank(account_number, balance, customer_name, …)
- Advantages: an easy starting point
- Disadvantages:
– Repetition/redundancy of information (e.g., if two customers share an account, the balance is repeated in the two tuples) – The need for null values (e.g., to represent a customer without an account)
CMPT 354: Database I -- Relational Model 11
Relation Examples
CMPT 354: Database I -- Relational Model 12
Keys
- Let K ⊆ R. K is a superkey of R if values for K are
sufficient to identify a unique tuple of each possible relation r(R)
– {customer_name, customer_street} and {customer_name} are both superkeys of Customer, if no two customers can possibly have the same name
- K is a candidate key if K is minimal
– {customer_name} is a candidate key for Customer
- Primary Key: a candidate key chosen by the
database designer as the principal means of identifying tuples within a relation
CMPT 354: Database I -- Relational Model 13
Schema Diagram
- Foreigh key: reference to the primary key of
another table
CMPT 354: Database I -- Relational Model 14
Query Languages
- Languages in which users request information
from the database
- Categories of languages
– Procedural: specifying how to find the answers – Non-procedural, or declarative: only specifying what should be the answers, but not how to find them
- “Pure” languages: form underlying basis of query
languages that commercial systems use
– Relational algebra (procedural) – Tuple relational calculus (non-procedural) – Domain relational calculus (non-procedural)
CMPT 354: Database I -- Relational Model 15
Relational Algebra
- A procedural language
- Six basic operators
– select: σ – project: ∏ – union: ∪ – set difference: – – Cartesian product: x – rename: ρ
- The operators take one or two relations as inputs
and produce a new relation as the result
CMPT 354: Database I -- Relational Model 16
Select Operation – Example
- σA=B ^ D > 5 (r)
A B C D α α β β α β β β 1 5 12 23 7 7 3 10 A B C D α β α β 1 23 7 10
CMPT 354: Database I -- Relational Model 17
Select Operation – Definition
- σp(r), p is called the selection predicate
– σp(r) = {t | t ∈ r and p(t)} – p is a formula in propositional calculus consisting of terms connected by : ∧ (and), ∨ (or), ¬ (not) – Each term is one of: <attribute> op <attribute>
- r <constant>, where op is one of: =, ≠, >, ≥. <.
≤
- Example: σbranch_name=“Perryridge”(account)
CMPT 354: Database I -- Relational Model 18
Project Operation – Example
- ∏A,C(r)
A B C α α β β 10 20 30 40 1 1 1 2 A C α α β β 1 1 1 2 = A C α β β 1 1 2
CMPT 354: Database I -- Relational Model 19
Project Operation – Definition
- , where A1, A2 are attribute names
and r is a relation name
– The result is defined as the relation of k columns obtained by erasing the columns that are not listed – Duplicate rows removed from result, since relations are sets
- Example: To eliminate the branch_name
attribute of account
– ∏account_number, balance (account)
) (
, , ,
2 1
r
k
A A A K
∏
CMPT 354: Database I -- Relational Model 20
Union Operation – Example
A B α α β 1 2 1 r A B α β 2 3 s A B α α β β 1 2 1 3 r ∪ s
CMPT 354: Database I -- Relational Model 21
Union Operation – Definition
- r ∪ s = {t | t ∈ r or t ∈ s}
– r and s must have the same arity (same number of attributes) – The attribute domains must be compatible (e.g., the 2nd column of r deals with the same type of values as does the 2nd column of s)
- Example: to find all customers with either an
account or a loan
– ∏customer_name(depositor) ∪ ∏customer_name(borrower)
CMPT 354: Database I -- Relational Model 22
Set Difference Operation – Example
A B α α β 1 2 1 r A B α β 2 3 s A B α β 1 1 r – s
CMPT 354: Database I -- Relational Model 23
Set Difference Operation – Definition
- r – s = {t | t ∈ r and t ∉ s}
- Set differences must be taken between
compatible relations
– r and s must have the same arity – Attribute domains of r and s must be compatible
CMPT 354: Database I -- Relational Model 24
Cartesian-Product Operation – Example
A B α β 1 2 r C D α β β γ 10 10 20 10 E a a b b s A B α α α α β β β β 1 1 1 1 2 2 2 2 C D α β β γ α β β γ 10 10 20 10 10 10 20 10 E a a b b a a b b r x s
CMPT 354: Database I -- Relational Model 25
Cartesian-Product Operation – Definition
- r x s = {t q | t ∈ r and q ∈ s}
– Attributes of r(R) and s(S) are disjoint. (That is, R ∩ S = ∅)
- If attributes of r(R) and s(S) are not disjoint,
then renaming must be used
CMPT 354: Database I -- Relational Model 26
Summary
- Relational model
– Representing data in relations
- Relational algebra – mathematical
foundation for SQL
- Basic operations in relational algebra
CMPT 354: Database I -- Relational Model 27
To-Do List
- Can you translate the relational algebra
examples into SQL? What can you
- bserve?