Review of the Relational Model 5DV120 Database System Principles - - PowerPoint PPT Presentation

review of the relational model
SMART_READER_LITE
LIVE PREVIEW

Review of the Relational Model 5DV120 Database System Principles - - PowerPoint PPT Presentation

Review of the Relational Model 5DV120 Database System Principles Ume a University Department of Computing Science Stephen J. Hegner hegner@cs.umu.se http://www.cs.umu.se/~hegner Review of the Relational Model 20130410 Slide 1 of 22


slide-1
SLIDE 1

Review of the Relational Model

5DV120 — Database System Principles Ume˚ a University Department of Computing Science Stephen J. Hegner hegner@cs.umu.se http://www.cs.umu.se/~hegner

Review of the Relational Model 20130410 Slide 1 of 22

slide-2
SLIDE 2

About these Slides

  • These slides are mostly adapted from those for 5DV119 by the same

instructor.

  • The adaptations use the schema of the Silberschatz-Korth-Sudarsham

textbook, rather than that of the Elmasri-Navathe textbook.

  • These slides will be used as a refresher/review, and will be covered much

more rapidly than in the introductory course.

  • They are intended for those who already know/knew SQL, but need a

quick refresher.

Review of the Relational Model 20130410 Slide 2 of 22

slide-3
SLIDE 3

Three Types of Information Systems

Information-retrieval systems (IR):

  • Search large bodies of information which are not specifically

formatted as formal data bases.

  • Web search engine
  • Keyword search of a text base
  • Typically read-only

Database management systems (DBMS):

  • Relatively small schema
  • Large body of homogeneous data
  • Minor or no deductive capability
  • Extensive formal update capability
  • Shared use for both read and write

Knowledge-base systems (KBS):

  • Relatively small body of heterogeneous information
  • Significant deductive capability
  • Typical use: support of an intelligent application

Review of the Relational Model 20130410 Slide 3 of 22

slide-4
SLIDE 4

The Evolution of Data Models

Model Development Use Properties Analogy File-management 1950’s - 1970’s 1950’s - Low-level interaction. No data independence. Assembly language Navigational 1950’s - 1960’s 1960’s - Some data independence, but the model invites dependence. Requires procedural queries. Procedural languages Relational 1970’s - Late 1980’s - Simple, easy to use for non-experts. Strong data independence. Standard nonprocedural query language (SQL). Excellent implementations exist. Limited expressive capability. Declarative languages Object-oriented 1980’s - 1990’s - Powerful expressive capability, but require substantial expertise for use. Popular in niche applications. Standardization not imminent. Object-oriented languages Object-relational 1980’s 1990’s - Attempt to integrate the simplicity of the relational model with the advanced features of the object-oriented approach. The most recent SQL standard, as well as many commercial systems, embody such features. ? Semi-structured 1990’s 2000’s - Attempt to integrate data management with markup languages, principally via XML. ? Review of the Relational Model 20130410 Slide 4 of 22

slide-5
SLIDE 5

Foci of the Course

Relational Model: 90%

  • Selected advanced topics in SQL
  • Systems aspects, including internal data models, query processing,

transaction models and implementation, recovery from failure Rationale: The relational model is still, by far, the dominant one used in practice. XML: 10%

  • The principles of semistructured data
  • Query languages for semistructured data, including XPath and

XQuery. ➳ The emphasis will be upon principles. ➳ Application programming will not be covered in detail.

Review of the Relational Model 20130410 Slide 5 of 22

slide-6
SLIDE 6

Multi-User Relational Database Systems

Open-Source Systems: PostgreSQL: The most comprehensive open-source relational DBMS. MySQL: A popular relational DBMS for small systems.

  • Widely used to support Web-based applications.

HyperSQL: An efficient DBMS written in Java.

  • The default DBMS bundled with OpenOffice.org

SQLite: A compact DBMS written in C.

  • The default DBMS bundled with LibreOffice.

The “big three” commercial relational DBMSs: Oracle Database: IBM DB2: Microsoft SQL Server: (Windows only!) Another commercial relational DBMS of interest: Mimer SQL: Oriented towards embedded systems; based in Uppsala.

  • All of the commercial systems listed above have “free” versions with

limited functionality.

  • All except SQL Server run on many platforms, including Linux.

Review of the Relational Model 20130410 Slide 6 of 22

slide-7
SLIDE 7

Single-User Relational Database Systems

Microsoft Access: The original PC DBMS for Windows.

  • Part of the Microsoft Office bundle.
  • It will cost you ✩✩✩, ➾➾➾, or SeKSeKSeK.
  • Even if you have a Dreamspark Premium account (formerly

MSDNAA), you cannot get MS Access for free.

  • Runs only under MS Windows, of course.
  • Support for SQL is not as extensive as in multi-user systems.
  • No real support for transactions.

Kexi: “Microsoft Access for Linux”

  • Built-in SQLite-based DB server.
  • Can also use other servers such as PostgreSQL and MySQL.
  • Not as mature a product as MS Access.
  • ... but it is open source and free (LGPL).
  • Can also be compiled for other systems.
  • A link to the Kexi Web site may be found on the course Web page.

Review of the Relational Model 20130410 Slide 7 of 22

slide-8
SLIDE 8

Database Systems to be Used in this Course

  • PostgreSQL will be used as the main relational system.
  • Both versions 8.4 and 9.1 will be available for use.
  • Students will receive at least one database for each.
  • If you have your own computer, it is easy to install under Linux, MS

Windows, and Mac OS.

  • Some pointers for installation under Linux will be given later in the

course.

  • You are free to (and encouraged to) try other relational DBMSs as well,

but they will not be used in the course.

  • The XML-based system exist-db will also be used.
  • More later.

Review of the Relational Model 20130410 Slide 8 of 22

slide-9
SLIDE 9

Some Remarks on Terminology and Pronunciation

Database vs. Database (Management) System:

  • A database is a (usually structured) collection of data.
  • A database system or database management system (DBMS) is a

system for managing databases.

  • In the popular literature, the word database is sometimes used as a

synonym for DBMS. ❉ Even the textbook does this!

  • This terminology is confusing and its use is to be discouraged.
  • MySQL and PostgreSQL are DBMSs, not databases!
  • Calling a DBMS a database is akin to calling JDK a Java program.

Pronunciation of SQL:

  • In research circles, it is usually pronounced as the three letters S-Q-L.
  • In trade groups, SQL is sometimes pronounced as See-Quel.
  • This can lead to confusion with the older language SEQUEL,

which also has that pronunciation.

Review of the Relational Model 20130410 Slide 9 of 22

slide-10
SLIDE 10

The Structure of the Relational Model

  • In the relational model, the data are stored in relations, or tables.
  • An example from the textbook is shown on the next three slides.
  • The relation names are shown with a blue background.
  • The attribute names are shown with a lime background.
  • The tuples are shown with a bisque background.
  • The primary keys are underlined.

Databases are sets: For each relation, the set of rows consists of distinct elements.

  • Duplicate rows are not allowed.
  • The order in which the rows are displayed is of no formal

consequence.

Review of the Relational Model 20130410 Slide 10 of 22

slide-11
SLIDE 11

The University Schema with a Small Database Instance – 1

dept name building budget Biology Watson 90000

  • Comp. Sci.

Taylor 100000

  • Elec. Eng.

Taylor 85000 Finance Painter 120000 History Painter 50000 Music Packard 80000 Physics Watson 70000

department

building room number capacity Packard 101 500 Painter 514 10 Taylor 3128 70 Watson 100 30 Watson 120 50

classroom

course id title dept name credits BIO-101

  • Intro. to Biology

Biology 4 BIO-301 Genetics Biology 4 BIO-399 Computational Biology Biology 3 CS-101

  • Intro. to Computer Science
  • Comp. Sci.

4 CS-190 Game Design

  • Comp. Sci.

4 CS-315 Robotics

  • Comp. Sci.

3 CS-319 Image Processing

  • Comp. Sci.

3 CS-347 Database System Concepts

  • Comp. Sci.

3 EE-181

  • Intro. to Digital Systems
  • Elec. Eng.

3 FIN-201 Investment Banking Finance 3 HIS-351 World History History 3 MU-199 Music Video Production Music 3 PHY-101 Physical Principles Physics 4

course

id name dept name tot cred 00128 Zhang

  • Comp. Sci.

102 12345 Shankar

  • Comp. Sci.

32 19991 Brandt History 80 23121 Chavez Finance 110 44553 Peltier Physics 56 45678 Levy Physics 46 54321 Williams

  • Comp. Sci.

54 55739 Sanchez Music 38 70557 Snow Physics 76543 Brown

  • Comp. Sci.

58 76653 Aoi

  • Elec. Eng.

60 98765 Bourikas

  • Elec. Eng.

98 98988 Tanaka Biology 120

student

id name dept name salary 10101 Srinivasan

  • Comp. Sci.

65000 12121 Wu Finance 90000 15151 Mozart Music 40000 22222 Einstein Physics 95000 32343 El Said History 60000 33456 Gold Physics 87000 45565 Katz

  • Comp. Sci.

75000 58583 Califieri History 62000 76543 Singh Finance 80000 76766 Crick Biology 72000 83821 Brandt

  • Comp. Sci.

92000 98345 Kim

  • Elec. Eng.

80000

instructor

s id i id 00128 45565 12345 10101 23121 76543 44553 22222 45678 22222 76543 45565 76653 98345 98765 98345 98988 76766

advisor

course id prereq id BIO-301 BIO-101 BIO-399 BIO-101 CS-190 CS-101 CS-315 CS-101 CS-319 CS-101 CS-347 CS-101 EE-181 PHY-101

prerequisite

Review of the Relational Model 20130410 Slide 11 of 22

slide-12
SLIDE 12

The University Schema with a Small Database Instance – 2

id course id sec id semester year grade 00128 CS-101 1 Fall 2009 A 00128 CS-347 1 Fall 2009 A- 12345 CS-101 1 Fall 2009 C 12345 CS-190 2 Spring 2009 A 12345 CS-315 1 Spring 2010 A 12345 CS-347 1 Fall 2009 A 19991 HIS-351 1 Spring 2010 B 23121 FIN-201 1 Spring 2010 C+ 44553 PHY-101 1 Fall 2009 B- 45678 CS-101 1 Fall 2009 F 45678 CS-101 1 Spring 2010 B+ 45678 CS-319 1 Spring 2010 B 54321 CS-101 1 Fall 2009 A- 54321 CS-190 2 Spring 2009 B+ 55739 MU-199 1 Spring 2010 A- 76543 CS-101 1 Fall 2009 A 76543 CS-319 2 Spring 2010 A 76653 EE-181 1 Spring 2009 C 98765 CS-101 1 Fall 2009 C- 98765 CS-315 1 Spring 2010 B 98988 BIO-101 1 Summer 2009 A 98988 BIO-301 1 Summer 2010 null

takes

id course id sec id semester year 10101 CS-101 1 Fall 2009 10101 CS-315 1 Spring 2010 10101 CS-347 1 Fall 2009 12121 FIN-201 1 Spring 2010 15151 MU-199 1 Spring 2010 22222 PHY-101 1 Fall 2009 32343 HIS-351 1 Spring 2010 45565 CS-101 1 Spring 2010 45565 CS-319 1 Spring 2010 76766 BIO-101 1 Summer 2009 76766 BIO-301 1 Summer 2010 83821 CS-190 1 Spring 2009 83821 CS-190 2 Spring 2009 83821 CS-319 2 Spring 2010 98345 EE-181 1 Spring 2009

teaches

Review of the Relational Model 20130410 Slide 12 of 22

slide-13
SLIDE 13

The University Schema with a Small Database Instance – 3

course id sec id semester year building room number time slot id BIO-101 1 Summer 2009 Painter 514 B BIO-301 1 Summer 2010 Painter 514 A CS-101 1 Fall 2009 Packard 101 H CS-101 1 Spring 2010 Packard 101 F CS-190 1 Spring 2009 Taylor 3128 E CS-190 2 Spring 2009 Taylor 3128 A CS-315 1 Spring 2010 Watson 120 D CS-319 1 Spring 2010 Watson 100 B CS-319 2 Spring 2010 Taylor 3128 C CS-347 1 Fall 2009 Taylor 3128 A EE-181 1 Spring 2009 Taylor 3128 C FIN-201 1 Spring 2010 Packard 101 B HIS-351 1 Spring 2010 Painter 514 C MU-199 1 Spring 2010 Packard 101 D PHY-101 1 Fall 2009 Watson 100 A

section

time slot id day start time end time A M 08:00 08:50 A W 08:00 08:50 A F 08:00 08:50 B M 09:00 09:50 B W 09:00 09:50 B F 09:00 09:50 C M 11:00 11:50 C W 11:00 11:50 C F 11:00 11:50 D M 13:00 13:50 D W 13:00 13:50 D F 13:00 13:50 E T 10:30 11:45 E R 10:30 11:45 F T 14:30 15:45 F R 14:30 15:45 G M 16:00 16:50 G W 16:00 16:50 G F 16:00 16:50 H W 10:00 12:30

timeslot

Review of the Relational Model 20130410 Slide 13 of 22

slide-14
SLIDE 14

Attributes and Domains

  • The label of a column is called an attribute.
  • With each attribute A is associated a set Dom(A) of domain values for A.

Example: Dom(dept name) might be the set of all strings of characters of length at most 20.

  • The field identified by attribute A of a tuple must have values taken from

Dom(A).

  • The only exception is that in some cases the special value NULL may be

used.

  • More later on null values.

id name dept name salary 10101 Srinivasan

  • Comp. Sci.

65000 12121 Wu Finance 90000 15151 Mozart Music 40000 22222 Einstein Physics 95000 32343 El Said History 60000 33456 Gold Physics 87000 45565 Katz

  • Comp. Sci.

75000 58583 Califieri History 62000 76543 Singh Finance 80000 76766 Crick Biology 72000 83821 Brandt

  • Comp. Sci.

92000 98345 Kim

  • Elec. Eng.

80000

instructor

Review of the Relational Model 20130410 Slide 14 of 22

slide-15
SLIDE 15

Relation Schemes

  • In conceptual work, the actual tuples in the database are of secondary

importance.

  • A relation scheme (or relation schema) is a a type definition for a table.
  • Formally, it is a (usually nonempty) finite list of attributes.
  • In the classical literature, it is often taken to be a set of attributes.
  • The order of the attributes is of no conceptual importance.
  • The relation name is the same as the scheme name.
  • The relation scheme for the instructor relation is shown below.

id name dept name salary

instructor

  • All eleven relation schemes for the university schema are shown on the

next slide.

Review of the Relational Model 20130410 Slide 15 of 22

slide-16
SLIDE 16

The Schemes of the University Schema

dept name building budget

department

course id title dept name credits

course

id name dept name tot cred

student

id name dept name salary

instructor

s id i id

advisor

course id prereq id

prerequisite

building room number capacity

classroom

course id sec id semester year building room number time slot id

section

id course id sec id semester year

teaches

id course id sec id semester year grade

takes

time slot id day start time end time

timeslot

Review of the Relational Model 20130410 Slide 16 of 22

slide-17
SLIDE 17

Integrity Constraints

  • The allowable databases must satisfy certain integrity constraints.

Examples: (Primary-)key dependency: Example: No two distinct departments may have the same name. Example: No two distinct instructors may have the same ID.

  • The attributes which constitute the primary key are underlined.

Foreign-key dependency: The dept name of an instructor must be the name

  • f a department (or NULL).
  • This is illustrated by an arrow from the foreign key of the referring

relation to the primary key of the referenced relation.

  • Foreign key constraints are a type of referential integrity constraints.

dept name building budget

department

id name dept name salary

instructor

  • All of the primary- and foreign-key constraints of the university schema

are shown on the next slide.

Review of the Relational Model 20130410 Slide 17 of 22

slide-18
SLIDE 18

The University Schema with Dependencies

dept name building budget department course id title dept name credits course id name dept name tot cred student id name dept name salary instructor s id i id advisor course id prereq id prerequisite building room number capacity classroom course id sec id semester year building room number time slot id section id course id sec id semester year teaches id course id sec id semester year grade takes time slot id day start time end time timeslot

  • The dashed line indicates a referential integrity constraint which is not a

true foreign-key dependency.

  • This will be discussed more fully in conjunction with triggers.

Review of the Relational Model 20130410 Slide 18 of 22

slide-19
SLIDE 19

Flavors of Key Constraints

  • A relation scheme may have several candidate keys.

Example: In the database in which the grades for this course are maintained, there is a relation which contains identification information about students. Ident PersonNr Name Student Ident: The user-id of the student on the departmental computing systems. PersonNr: The Swedish identification number of the student.

  • Both Ident and PersonNr are candidate keys.
  • Ident is chosen as the primary key.
  • A superkey is any (not necessarily proper) superset of a candidate key.

Example: {PersonNr, Name} is a superkey.

Review of the Relational Model 20130410 Slide 19 of 22

slide-20
SLIDE 20

Tuples

Context: R = (A1, A2, . . . , Ak) a relation scheme. Tuple: A row of data for R is called a tuple for R: t = (t1, t2, . . . , tk) with ti ∈ Dom(Ai) ∪ {NULL} for each i. Example: tAl = (’22222’,’Einstein’,’Physics’,95000) Projection of a tuple: For A = (Ai1, Ai2, . . . , Aiℓ) a nonempty subsequence of R and t = (t1, t2, . . . , tk) a tuple over R, define the projection of t onto A to be t[A] = (ti1, ti2, . . . , tiℓ) Example: tAl[Name,Salary] = (’Einstein’,95000) Notation: When abstract schemes are considered for example, A = (Ai1, Ai2, . . . , Aiℓ), commas may be dropped in attribute lists. Example: t[A1, A3, A7] = t[A1A3A7]. Notational convention: For a subset S = {Ai1, Ai2, . . . , Aiℓ} ⊆ {A1, A2, . . . , An}, write t[S] for t[Ai1, Ai2, . . . , Aℓ].

Review of the Relational Model 20130410 Slide 20 of 22

slide-21
SLIDE 21

Null Values

  • A special value NULL is allowed instead of a normal domain value in

certain circumstances.

  • The semantics of NULL are unfortunately not specified.
  • The semantics must be defined by usage convention.

Example: Suppose that the attribute Telephone is added to the employee relation.

  • There are at least three possible interpretations of a null value:

Value not known: The employee has a telephone but the number is not in the database. No value: The employee has no telephone (but could have one). Inappropriate attribute: The position of the employee (e.g., custodial) does not involve having a telephone.

  • Thus, the meaning of null values can be ambiguous.

Question: What does a null value for the grade in the takes relation of the university schema mean?

Review of the Relational Model 20130410 Slide 21 of 22

slide-22
SLIDE 22

Update Operations

  • There are three main classes of update operations:

Insert: Add some new tuples to the database. Delete: Remove some tuples from the database. Modify: Change some fields of existing tuples.

  • A principal aspect of support for update operation is to ensure that the

integrity constraints remain satisfied.

  • There are two main strategies:

Restrict: If a proposed update violates the integrity constraints, reject it. Cascade: If a proposed update violates the integrity constraints, make additional updates automatically to satisfy them. Example of cascading: If an employee is deleted, remove also all associated entries in the Works On and Dependent relations.

  • It is probably not a good idea to cascade to the Department or

Employee relations!! Remark on terminology: Modify is called Update in SQL.

  • This leads to terminology overload which can be confusing and must

be resolved carefully.

Review of the Relational Model 20130410 Slide 22 of 22