Relational Algebra, Spring 2012. Instructor: Hassan Khosravi


SLIDE 1

Relational Algebra

Spring 2012 Instructor: Hassan Khosravi

SLIDE 2

Querying relational databases

Lecture given by Dr. Widom on querying Relational Models

SLIDE 3

2.1 An Overview of Data Models

2.1.1 What is a Data Model?

2.1.2 Important Data Models

2.1.3 The Relational Model in Brief

2.1.4 The Semi-structured Model in Brief

2.1.5 Other Data Models

2.1.6 Comparison of Modeling Approaches

SLIDE 4

2.1.1 What is a Data Model?

A data model is a notation for describing data or information. It serves as a mathematical model of the real world and consists of three parts:

1. Structure of the data (for example, tuples)

2. Operations on the data: queries to retrieve and modify information

3. Constraints on the data: for example, year must be an integer and name must be a string

Important data models

• The relational model

• The semi-structured data model (XML)

SLIDE 5

Relational Model in Brief

The relational model is based on tables.

Operations: query, modify

Constraints: for example, year is an integer between 1930 and 2012

The structure may appear to resemble an array of structs in C, where the column headers are the field names and each row represents the values of one struct in the array. The distinction is one of scale:

• Relations are not normally implemented as main-memory structures.

• An implementation must take into account that relations are accessed on disk.

Title                Year   Length   Genre
Gone with the Wind   1939   231      Drama
Star Wars            1977   124      SciFi
Wayne's World        1992   95       Comedy

SLIDE 6

The Semi-structured Model in Brief

Semi-structured data resembles trees or graphs rather than tables or arrays.

Operations usually involve following paths in the tree.

• Example: find the movies with the comedy genre.

Constraints often involve the data types of values associated with a tag.

• Example: values associated with the Length tag are integers.

<Movies>
  <Movie title="Gone with the Wind">
    <Year>1939</Year>
    <Length>231</Length>
    <Genre>drama</Genre>
  </Movie>
  <Movie title="Star Wars">
    <Year>1977</Year>
    <Length>124</Length>
    <Genre>sciFi</Genre>
  </Movie>
  <Movie title="Wayne's World">
    <Year>1992</Year>
    <Length>95</Length>
    <Genre>comedy</Genre>
  </Movie>
</Movies>
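As a rough illustration of "following paths in the tree", the comedy query can be sketched with Python's standard `xml.etree.ElementTree` module. The document string is a cleaned-up copy of the slide's XML; the variable names are illustrative and not part of the lecture.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the slide's example document.
MOVIES_XML = """
<Movies>
  <Movie title="Gone with the Wind">
    <Year>1939</Year><Length>231</Length><Genre>drama</Genre>
  </Movie>
  <Movie title="Star Wars">
    <Year>1977</Year><Length>124</Length><Genre>sciFi</Genre>
  </Movie>
  <Movie title="Wayne's World">
    <Year>1992</Year><Length>95</Length><Genre>comedy</Genre>
  </Movie>
</Movies>
"""

root = ET.fromstring(MOVIES_XML)

# Follow the path Movies -> Movie -> Genre and keep the matching titles.
comedies = [
    movie.get("title")
    for movie in root.findall("Movie")
    if movie.findtext("Genre") == "comedy"
]

# Constraint check: every Length value must parse as an integer
# (int() raises ValueError otherwise).
lengths = [int(movie.findtext("Length")) for movie in root.findall("Movie")]

print(comedies)
print(lengths)
```

The query walks the tree rather than scanning rows, which is the main contrast with the table-based model of the previous slide.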

SLIDE 7

Comparison of Modeling Approaches

Semi-structured models are more flexible than relations. However, the relational model is still preferred in DBMSs, for two reasons:

1. Efficiency of access to data and efficiency of modifications to that data are more important than flexibility.

2. Ease of use is more important than flexibility.

SQL enables programmers to express their wishes at a very high level. The strongly limited set of operations can be optimized to run very fast.

SLIDE 8

Basics of the Relational model

Attributes: the columns of a relation are named by attributes.

Schema: the name of the relation together with its set of attributes

• Movies(title, year, length, genre)

Tuples: the rows of a relation, other than the header row

Domains: each attribute has a domain of values, and the value of each attribute must be atomic (it cannot be a structure).

Title                Year   Length   Genre
Gone with the Wind   1939   231      Drama
Star Wars            1977   124      SciFi
Wayne's World        1992   95       Comedy

SLIDE 9

Equivalent Representations of a Relation

Relations are sets of tuples, not lists of tuples, so the order of the tuples does not matter. The attributes can be reordered too. How many different ways can we present the given relation?

Year   Genre    Title                Length
1977   SciFi    Star Wars            124
1992   Comedy   Wayne's World        95
1939   Drama    Gone with the Wind   231

Title                Year   Length   Genre
Gone with the Wind   1939   231      Drama
Star Wars            1977   124      SciFi
Wayne's World        1992   95       Comedy
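One way to work out the slide's question: the three rows can be listed in 3! orders and the four columns in 4! orders, giving 3! · 4! = 144 presentations of the same relation. The sketch below enumerates them and checks that each one denotes the same set of tuples; the helper `as_relation` is an illustrative name, not something from the lecture.

```python
from itertools import permutations
from math import factorial

attributes = ("title", "year", "length", "genre")
rows = [
    ("Gone with the Wind", 1939, 231, "Drama"),
    ("Star Wars", 1977, 124, "SciFi"),
    ("Wayne's World", 1992, 95, "Comedy"),
]

def as_relation(attrs, tuples):
    """Order-free view: a set of tuples, each a set of (attribute, value) pairs."""
    return frozenset(frozenset(zip(attrs, t)) for t in tuples)

canonical = as_relation(attributes, rows)

presentations = 0
for col_order in permutations(range(len(attributes))):
    attrs = tuple(attributes[i] for i in col_order)
    for row_order in permutations(rows):
        table = [tuple(r[i] for i in col_order) for r in row_order]
        # Every reordering is still the same relation.
        assert as_relation(attrs, table) == canonical
        presentations += 1

print(presentations)  # 144 = 3! * 4!
```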

SLIDE 10

Relation Instances and Keys

• A set of tuples for a given relation is called an instance of that relation. The instance of a relation is expected to change over time.

  – New movies are added to the table.

• It is less common for the schema of a relation to change. If a new attribute is added to the schema, it is hard to supply a value for it in all the current tuples.

Keys of relations

• Key constraint: a set of attributes forms a key if we do not allow two tuples in a relation instance to have the same values in all the attributes of the key.

• We indicate the attributes that form a key by underlining them. In the schema below, title and year together form the key.

  – Movies(title, year, length, genre)

• A key must hold for all possible instances of a relation, not just for a specific instance.

  – Genre is not a key, even though no two tuples of the small example instance share a genre.

• What if our data does not have a key?

  – Generate an artificial ID, such as a student number.
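The point that a key must hold for every possible instance can be made concrete with a small check. The function below is only a sketch with an illustrative name (`violates_key`); it can refute a proposed key on a given instance but can never prove that a key holds in general.

```python
def violates_key(instance, key_attrs):
    """Return True if two tuples of the instance agree on all key attributes."""
    seen = set()
    for row in instance:
        key_value = tuple(row[a] for a in key_attrs)
        if key_value in seen:
            return True
        seen.add(key_value)
    return False

movies = [
    {"title": "Gone with the Wind", "year": 1939, "length": 231, "genre": "Drama"},
    {"title": "Star Wars", "year": 1977, "length": 124, "genre": "SciFi"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "genre": "Comedy"},
]

# genre happens to be unique in this particular instance ...
genre_ok_on_small_instance = not violates_key(movies, ["genre"])

# ... but a perfectly legal new movie breaks it, so genre is not a key.
movies.append(
    {"title": "Galaxy Quest", "year": 1999, "length": 104, "genre": "Comedy"}
)
genre_violated = violates_key(movies, ["genre"])
title_year_violated = violates_key(movies, ["title", "year"])

print(genre_ok_on_small_instance, genre_violated, title_year_violated)
```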

SLIDE 11

Database Schema about Movies

Movies(title: string, year: integer, length: integer, genre: string, studioName: string, producerC#: integer)

MovieStar(name: string, address: string, gender: char, birthdate: date)

StarsIn(movieTitle: string, movieYear: integer, starName: string)

MovieExec(name: string, address: string, cert#: integer, netWorth: integer)

Studio(name: string, address: string, presC#: integer)

SLIDE 12

Defining a Relation Schema in SQL

2.3.1 Relations in SQL

2.3.2 Data Types

2.3.3 Simple Table Declarations

2.3.4 Modifying Relation Schemas

2.3.5 Default Values

2.3.6 Declaring Keys

2.3.7 Exercises for Section 2.3

SLIDE 13

2.3.1 Relations in SQL

SQL (pronounced "sequel") is the principal language used to describe and manipulate relational databases.

SQL makes a distinction between three kinds of relations:

• Stored relations (tables): relations that exist in the database and that we can query and modify.

• Views: relations defined by a computation. They are not stored but constructed when needed; we only query them (Chapter 8).

• Temporary tables: constructed by the SQL language processor during optimization. They are neither stored nor seen by the user.

SLIDE 14

Data Types

Char(n): a fixed-length string of n characters; shorter values are padded with blanks.

• 'foo' stored as Char(5) becomes 'foo  '

Varchar(n): a variable-length string of up to n characters.

• 'foo' stored as Varchar(5) stays 'foo'

Bit(n), Varbit(n): fixed-length and variable-length strings of up to n bits.

Boolean: TRUE, FALSE and, although it would surprise George Boole, UNKNOWN.

Int or Integer: typical integer values.

Float or Real: typical real (floating-point) values.

Decimal(n,d): n decimal digits, d of them after the decimal point; a Decimal(6,2) value could be 0123.45.

Date and Time: essentially character strings with constraints on their form.

SLIDE 15

2.3.3 Simple Table Declarations

CREATE TABLE Movie (
    title       VARCHAR(255),
    year        INTEGER,
    length      INTEGER,
    inColor     CHAR(1),
    studioName  CHAR(50),
    producerC#  INTEGER
);

CREATE TABLE MOVIESTAR (
    NAME       CHAR(30),
    ADDRESS    VARCHAR2(50),
    GENDER     CHAR(6),
    BIRTHDATE  DATE
);

The declarations correspond to these schemas:

Movies(title: string, year: integer, length: integer, genre: string, studioName: string, producerC#: integer)

MovieStar(name: string, address: string, gender: char, birthdate: date)

SLIDE 16

Modifying Relation Schemas

We can delete a table R with the SQL command

• DROP TABLE R;

We can modify a table with commands such as

• ALTER TABLE MovieStar ADD phone CHAR(16);

• ALTER TABLE MovieStar DROP birthdate;

Default values

• Use the character '?' as the default for an unknown gender.

• Use the earliest possible date, DATE '0000-00-00', for an unknown birthdate.

gender CHAR(1) DEFAULT '?',
birthdate DATE DEFAULT DATE '0000-00-00',

ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted';

SLIDE 17

2.3.6 Declaring Keys

There are two ways to declare keys in a CREATE TABLE statement: attach the declaration to the attribute itself, or list the key attributes in a separate element of the declaration.

• PRIMARY KEY attributes cannot be NULL.

• UNIQUE attributes can be NULL.

• Replace PRIMARY KEY with UNIQUE in the examples to get the corresponding UNIQUE declarations.

CREATE TABLE MovieStar (
    name       CHAR(30) PRIMARY KEY,
    address    VARCHAR(255),
    gender     CHAR(1),
    birthdate  DATE
);

CREATE TABLE MovieStar (
    name       CHAR(30),
    address    VARCHAR(255),
    gender     CHAR(1),
    birthdate  DATE,
    PRIMARY KEY (name)
);

SLIDE 18

Example 2.7

The relation Movies, whose key is the pair of attributes title and year, must be declared with a separate PRIMARY KEY element, because the key has more than one attribute:

CREATE TABLE Movies (
    title       CHAR(100),
    year        INTEGER,
    length      INTEGER,
    genre       CHAR(10),
    studioName  CHAR(30),
    producerC#  INTEGER,
    PRIMARY KEY (title, year)
);
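To see the two-attribute key in action, here is a minimal sketch that runs the declaration from Example 2.7 in an in-memory SQLite database through Python's `sqlite3` module. SQLite is an assumption made for convenience, not the system used in the course; the column producerC# is quoted because of the '#' character, and the second inserted row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movies (
        title        CHAR(100),
        year         INTEGER,
        length       INTEGER,
        genre        CHAR(10),
        studioName   CHAR(30),
        "producerC#" INTEGER,
        PRIMARY KEY (title, year)
    )
""")

insert = "INSERT INTO Movies VALUES (?, ?, ?, ?, ?, ?)"
star_wars = ("Star Wars", 1977, 124, "sciFi", "Fox", 12345)
conn.execute(insert, star_wars)

# Hypothetical row: same title, different year. Allowed, because the key
# is the pair (title, year), not title alone.
conn.execute(insert, ("Star Wars", 2077, 100, "sciFi", "Fox", 12345))

# Same (title, year) pair again: rejected by the key constraint.
try:
    conn.execute(insert, star_wars)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Movies").fetchone()[0]
print(duplicate_rejected, row_count)
```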

SLIDE 19

Quick summary

Lecture given by Dr. Widom on Relational Model definition

SLIDE 20

2.4 An Algebraic Query Language

2.4.1 Why Do We Need a Special Query Language?

2.4.2 What is an Algebra?

2.4.3 Overview of Relational Algebra

2.4.4 Set Operations on Relations

2.4.5 Projection

2.4.6 Selection

2.4.7 Cartesian Product

2.4.8 Natural Joins

2.4.9 Theta-Joins

2.4.10 Combining Operations to Form Queries

2.4.11 Naming and Renaming

2.4.12 Relationships Among Operations

2.4.13 A Linear Notation for Algebraic Expressions

2.4.14 Exercises for Section 2.4

SLIDE 21

Why Do We Need a Special Query Language?

Why not just use C or Java instead of introducing relational algebra?

• Relational algebra is useful because it is less powerful than C and Java. This is one of the few areas where non-Turing-complete languages make sense.

• For example, relational algebra cannot determine whether the number of tuples in a relation is odd or even.

• Being less powerful is helpful because it brings

  – ease of programming

  – ease of compilation

  – ease of optimization

SLIDE 22

Projection

The projection operator, applied to a relation R, produces a new relation with a subset of R's columns.

Duplicate tuples are eliminated.

Relation Movies

Title           Year   Length   Genre    StudioName   producerC#
Star Wars       1977   124      SciFi    Fox          12345
Galaxy Quest    1999   104      Comedy   DreamWorks   67890
Wayne's World   1992   95       Comedy   Paramount    99999

∏title, year, length (Movies)

Title           Year   Length
Star Wars       1977   124
Galaxy Quest    1999   104
Wayne's World   1992   95

∏genre (Movies)

Genre
SciFi
Comedy
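A minimal sketch of projection over the slide's Movies instance, assuming a relation is represented as a list of dictionaries (an illustrative choice). Building the result as a Python set is what eliminates the duplicate Comedy tuple.

```python
def project(relation, attrs):
    """Keep only the given attributes; the set removes duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in relation}

movies = [
    {"title": "Star Wars", "year": 1977, "length": 124, "genre": "SciFi",
     "studioName": "Fox", "producerC#": 12345},
    {"title": "Galaxy Quest", "year": 1999, "length": 104, "genre": "Comedy",
     "studioName": "DreamWorks", "producerC#": 67890},
    {"title": "Wayne's World", "year": 1992, "length": 95, "genre": "Comedy",
     "studioName": "Paramount", "producerC#": 99999},
]

title_year_length = project(movies, ["title", "year", "length"])
genres = project(movies, ["genre"])

print(len(title_year_length))  # 3 tuples
print(len(genres))             # 2 tuples: the duplicate Comedy is dropped
```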

SLIDE 23

Selection and Projection

Lecture given by Dr. Widom on selection and projection

SLIDE 24

2.4.6 Selection

The selection operator, applied to a relation R, produces a new relation with a subset of R's tuples.

Relation Movies

Title           Year   Length   Genre    StudioName   producerC#
Star Wars       1977   124      SciFi    Fox          12345
Galaxy Quest    1999   104      Comedy   DreamWorks   67890
Wayne's World   1992   95       Comedy   Paramount    99999

σ length >= 100 (Movies)

Title           Year   Length   Genre    StudioName   producerC#
Star Wars       1977   124      SciFi    Fox          12345
Galaxy Quest    1999   104      Comedy   DreamWorks   67890

SLIDE 25

Example for Selection

The set of tuples in the relation Movies that represent Fox movies at least 100 minutes long:

Relation Movies

Title           Year   Length   Genre    StudioName   producerC#
Star Wars       1977   124      SciFi    Fox          12345
Galaxy Quest    1999   104      Comedy   DreamWorks   67890
Wayne's World   1992   95       Comedy   Paramount    99999

σ length >= 100 AND studioName = 'Fox' (Movies)

Title           Year   Length   Genre    StudioName   producerC#
Star Wars       1977   124      SciFi    Fox          12345
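The two selections above can be sketched as a filter over the tuples, with the condition passed in as a Python function. The representation (list of dictionaries) and the name `select` are illustrative assumptions.

```python
def select(relation, condition):
    """Keep the tuples for which the condition is true; columns are unchanged."""
    return [row for row in relation if condition(row)]

movies = [
    {"title": "Star Wars", "year": 1977, "length": 124, "genre": "SciFi",
     "studioName": "Fox", "producerC#": 12345},
    {"title": "Galaxy Quest", "year": 1999, "length": 104, "genre": "Comedy",
     "studioName": "DreamWorks", "producerC#": 67890},
    {"title": "Wayne's World", "year": 1992, "length": 95, "genre": "Comedy",
     "studioName": "Paramount", "producerC#": 99999},
]

# sigma length >= 100 (Movies)
long_movies = select(movies, lambda m: m["length"] >= 100)

# sigma length >= 100 AND studioName = 'Fox' (Movies)
long_fox_movies = select(
    movies, lambda m: m["length"] >= 100 and m["studioName"] == "Fox"
)

print([m["title"] for m in long_movies])
print([m["title"] for m in long_fox_movies])
```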

SLIDE 26

2.4.7 Cartesian Product

The Cartesian product of two sets R and S is the set of pairs that can be formed by choosing the first element from R and the second from S. For relations, each pair is combined into one longer tuple.

If R and S have attributes in common, we need to invent new names for the identical attributes, for example R.B and S.B.

Relation R

A   B
1   2
3   4

Relation S

B   C    D
2   5    6
4   7    8
9   10   11

Relation R × S

A   R.B   S.B   C    D
1   2     2     5    6
1   2     4     7    8
1   2     9     10   11
3   4     2     5    6
3   4     4     7    8
3   4     9     10   11
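The product table can be reproduced with `itertools.product`. This is a sketch in which each relation is a list of plain tuples and the schema is tracked separately; that representation is an assumption made for brevity.

```python
from itertools import product

R = [(1, 2), (3, 4)]                      # schema (A, B)
S = [(2, 5, 6), (4, 7, 8), (9, 10, 11)]   # schema (B, C, D)

# The shared attribute name B is disambiguated as R.B and S.B.
schema = ("A", "R.B", "S.B", "C", "D")

# Every tuple of R is concatenated with every tuple of S.
RxS = [r + s for r, s in product(R, S)]

print(schema)
for row in RxS:
    print(row)
```

The result has |R| · |S| = 2 · 3 = 6 tuples, which is why products are usually followed by a selection in practice.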

SLIDE 27

Cartesian Product

Lecture given by Dr. Widom on duplicates , cross product


SLIDE 29

2.4.8 Natural Joins

The natural join of two relations R and S, written R ⋈ S, pairs only those tuples from R and S that agree on whatever attributes are common to the schemas of R and S.

• Let A1, A2, …, An be the attributes in both R and S. A tuple r from R and a tuple s from S are successfully paired if and only if r and s agree on each of A1, A2, …, An.

Relation R

A   B
1   2
3   4

Relation S

B   C    D
2   5    6
4   7    8
9   10   11

Relation R ⋈ S

A   B   C   D
1   2   5   6
3   4   7   8

SLIDE 30

Example for Natural Join

A more complicated example for natural join

Relation U

A   B   C
1   2   3
6   7   8
9   7   8

Relation V

B   C   D
2   3   4
2   3   5
7   8   10

Result U ⋈ V

A   B   C   D
1   2   3   4
1   2   3   5
6   7   8   10
9   7   8   10
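The pairing rule for natural join can be sketched directly: find the attributes common to both schemas, keep the pairs that agree on all of them, and emit the common attributes only once. The function name and the (schema, rows) representation are illustrative assumptions.

```python
def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """Pair tuples that agree on every attribute common to both schemas."""
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in common]
    out_attrs = list(r_attrs) + s_only
    out_rows = set()
    for r in r_rows:
        r_map = dict(zip(r_attrs, r))
        for s in s_rows:
            s_map = dict(zip(s_attrs, s))
            if all(r_map[a] == s_map[a] for a in common):
                # Common attributes appear once, taken from the R tuple.
                out_rows.add(r + tuple(s_map[a] for a in s_only))
    return out_attrs, out_rows

U = (("A", "B", "C"), [(1, 2, 3), (6, 7, 8), (9, 7, 8)])
V = (("B", "C", "D"), [(2, 3, 4), (2, 3, 5), (7, 8, 10)])

attrs, rows = natural_join(*U, *V)
print(attrs)
print(sorted(rows))
```

Note that one tuple may pair with several partners: (1, 2, 3) joins with two tuples of V, and (7, 8, 10) joins with two tuples of U.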

SLIDE 31

Lecture given by Dr. Widom on Natural Join


SLIDE 34

Theta-Joins

It is sometimes desirable to pair tuples on a condition other than equality of all the common attributes. The notation for a theta-join of relations R and S based on a condition C is R ⋈C S.

• The result is constructed as follows:

  – Take the product of R and S.

  – Select the tuples that satisfy C.

Relation U

A   B   C
1   2   3
6   7   8
9   7   8

Relation V

B   C   D
2   3   4
2   3   5
7   8   10

U ⋈ A < D V

A   U.B   U.C   V.B   V.C   D
1   2     3     2     3     4
1   2     3     2     3     5
1   2     3     7     8     10
6   7     8     7     8     10
9   7     8     7     8     10

SLIDE 35

Example on Theta-Joins

A theta-join of U and V with a more complex condition:

• For a successful pairing we require not only that the A component of the U-tuple be less than the D component of the V-tuple, but also that the two tuples disagree on their respective B components.

Relation U

A   B   C
1   2   3
6   7   8
9   7   8

Relation V

B   C   D
2   3   4
2   3   5
7   8   10

U ⋈ A < D AND U.B <> V.B V

A   U.B   U.C   V.B   V.C   D
1   2     3     7     8     10
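Both theta-join examples follow the "product, then select" recipe, which can be sketched as below. Tuples are addressed by position, as noted in the comments; the function name `theta_join` is illustrative.

```python
from itertools import product

U = [(1, 2, 3), (6, 7, 8), (9, 7, 8)]     # schema (A, B, C)
V = [(2, 3, 4), (2, 3, 5), (7, 8, 10)]    # schema (B, C, D)

def theta_join(r_rows, s_rows, condition):
    """Take the product, then keep the pairs that satisfy the condition."""
    return [r + s for r, s in product(r_rows, s_rows) if condition(r, s)]

# Positions: u = (A, U.B, U.C) and v = (V.B, V.C, D).
a_less_d = theta_join(U, V, lambda u, v: u[0] < v[2])
a_less_d_and_b_differs = theta_join(
    U, V, lambda u, v: u[0] < v[2] and u[1] != v[0]
)

print(a_less_d)
print(a_less_d_and_b_differs)
```

Unlike the natural join, the result keeps both copies of B and C, because the condition does not force them to be equal.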

SLIDE 36

Combining Operations to Form Queries

Example: "What are the titles and years of movies made by Fox that are at least 100 minutes long?"

Expression tree:

              ∏ title, year
                    |
                    ∩
                 /     \
  σ length >= 100       σ studioName = 'Fox'
         |                      |
      Movies                 Movies

The same query in linear notation:

∏title, year (σ length >= 100 (Movies) ∩ σ studioName = 'Fox' (Movies))

An equivalent expression with a single selection:

∏title, year (σ length >= 100 AND studioName = 'Fox' (Movies))
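The equivalence of the two expressions can be checked on the Movies instance used on the earlier slides. This is a sketch: `select` and `project` are illustrative helpers, and the namedtuple field is called `producerC` because '#' is not allowed in a Python identifier.

```python
from collections import namedtuple

Movie = namedtuple("Movie", "title year length genre studioName producerC")

movies = {
    Movie("Star Wars", 1977, 124, "SciFi", "Fox", 12345),
    Movie("Galaxy Quest", 1999, 104, "Comedy", "DreamWorks", 67890),
    Movie("Wayne's World", 1992, 95, "Comedy", "Paramount", 99999),
}

def select(relation, condition):
    return {row for row in relation if condition(row)}

def project(relation, attrs):
    return {tuple(getattr(row, a) for a in attrs) for row in relation}

# Intersection of two selections, then projection.
long_movies = select(movies, lambda m: m.length >= 100)
fox_movies = select(movies, lambda m: m.studioName == "Fox")
via_intersection = project(long_movies & fox_movies, ["title", "year"])

# One selection with an AND condition, then projection.
via_single_selection = project(
    select(movies, lambda m: m.length >= 100 and m.studioName == "Fox"),
    ["title", "year"],
)

print(via_intersection)
print(via_single_selection)
```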

SLIDE 37

Relational algebra

An algebra in general consists of operators and atomic operands.

• In the algebra of arithmetic, the operands are variables and constants, and the operators are +, -, * and /.

Any algebra allows us to build expressions by applying operators to operands and to other expressions.

• Example: (x + y) / z

The following two relations serve as operands in the examples of the set operations.

Relation R

Name            Address                       Gender   Birthdate
Carrie Fisher   123 Maple St., Hollywood      F        9/9/99
Mark Hamill     456 Oak Rd., Brentwood        M        8/8/88

Relation S

Name            Address                       Gender   Birthdate
Carrie Fisher   123 Maple St., Hollywood      F        9/9/99
Harrison Ford   789 Palm Dr., Beverly Hills   M        7/7/77

SLIDE 38

Operations of relational algebra

Union (R ∪ S): the set of elements that are in R, or in S, or in both. An element appears only once in the union, even if it is in both R and S.

R ∪ S, for the relations R and S of slide 37:

Name            Address                       Gender   Birthdate
Carrie Fisher   123 Maple St., Hollywood      F        9/9/99
Mark Hamill     456 Oak Rd., Brentwood        M        8/8/88
Harrison Ford   789 Palm Dr., Beverly Hills   M        7/7/77

SLIDE 39

Operations of relational algebra

Intersection (R ∩ S): the set of elements that are in both R and S. An element appears only once in the intersection.

R ∩ S, for the relations R and S of slide 37:

Name            Address                       Gender   Birthdate
Carrie Fisher   123 Maple St., Hollywood      F        9/9/99

SLIDE 40

Operations of relational algebra

Difference (R − S): the set of elements that are in R but not in S. An element appears only once in the difference. Note that R − S is in general different from S − R.

R − S, for the relations R and S of slide 37:

Name            Address                       Gender   Birthdate
Mark Hamill     456 Oak Rd., Brentwood        M        8/8/88
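Because relations are sets of tuples, the three set operations map directly onto Python's built-in set operators. The sketch below uses the slide's relations R and S, with tuples in the order (name, address, gender, birthdate).

```python
R = {
    ("Carrie Fisher", "123 Maple St., Hollywood", "F", "9/9/99"),
    ("Mark Hamill", "456 Oak Rd., Brentwood", "M", "8/8/88"),
}
S = {
    ("Carrie Fisher", "123 Maple St., Hollywood", "F", "9/9/99"),
    ("Harrison Ford", "789 Palm Dr., Beverly Hills", "M", "7/7/77"),
}

union = R | S           # Carrie Fisher appears only once
intersection = R & S
difference = R - S      # in R but not in S

print(sorted(t[0] for t in union))
print(sorted(t[0] for t in intersection))
print(sorted(t[0] for t in difference))
```

The operations are only meaningful when R and S have the same schema, with attributes in the same order, as they do here.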

SLIDE 41

Lecture given by Dr. Widom on union, difference, intersection


SLIDE 43

2.4.11 Naming and Renaming

The renaming operator ρ explicitly renames the attributes of a relation.

ρ S(A1, A2, …, An) (R) results in a relation named S that has exactly the same tuples as R, but whose attributes are named A1, A2, …, An, in order from the leftmost attribute.

Relation R

A   B
1   2
3   4

Relation S

B   C    D
2   5    6
4   7    8
9   10   11

R × ρ S(X, C, D) (S)

A   B   X   C    D
1   2   2   5    6
1   2   4   7    8
1   2   9   10   11
3   4   2   5    6
3   4   4   7    8
3   4   9   10   11

SLIDE 44

Lecture given by Dr. Widom on Renaming

SLIDE 45

Relationships Among Operations

Intersection can be expressed using difference.

• R ∩ S = R − (R − S)

• See video

Theta-join can be expressed by product and selection.

• R ⋈C S = σC (R × S)

Natural join can be expressed by product, selection and projection.

• Example: U ⋈ V = ∏A, U.B, U.C, D (σ U.B = V.B AND U.C = V.C (U × V))

These are the only redundancies: the remaining six operations (union, difference, selection, projection, product and renaming) form an independent set.

Relation U

A   B   C
1   2   3
6   7   8
9   7   8

Relation V

B   C   D
2   3   4
2   3   5
7   8   10

Result U ⋈ V

A   B   C   D
1   2   3   4
1   2   3   5
6   7   8   10
9   7   8   10
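Two of these identities can be checked on small instances. In the sketch below, U and V are the slide's relations, while R and S are made-up two-tuple sets used only to exercise the intersection identity.

```python
from itertools import product

# Intersection via difference, on two small illustrative sets.
R = {(1, 2), (3, 4)}
S = {(3, 4), (5, 6)}
intersection_via_difference = R - (R - S)

U = [(1, 2, 3), (6, 7, 8), (9, 7, 8)]     # schema (A, B, C)
V = [(2, 3, 4), (2, 3, 5), (7, 8, 10)]    # schema (B, C, D)

# U x V has schema (A, U.B, U.C, V.B, V.C, D).
UxV = [u + v for u, v in product(U, V)]

# Selection: U.B = V.B AND U.C = V.C.
selected = [t for t in UxV if t[1] == t[3] and t[2] == t[4]]

# Projection onto A, U.B, U.C, D.
natural_join = {(t[0], t[1], t[2], t[5]) for t in selected}

print(intersection_via_difference == R & S)
print(sorted(natural_join))
```

A check on one instance is of course not a proof of the identity; it only shows the rewriting at work.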

SLIDE 46

2.5 Constraints on Relations

2.5.1 Relational Algebra as a Constraint Language

2.5.2 Referential Integrity Constraints

2.5.3 Key Constraints

2.5.4 Additional Constraint Examples

2.5.5 Exercises for Section 2.5

2.6 Summary of Chapter 2

2.7 References for Chapter 2

SLIDE 47

Referential Integrity Constraints

• A value appearing in one context also appears in another, related context.

Example: every movie mentioned in StarsIn must appear in Movies.

• StarsIn(movieTitle, movieYear, starName)

• Movies(title, year, length, genre, studioName, producerC#)

• ∏movieTitle, movieYear (StarsIn) ⊆ ∏title, year (Movies)

Example: the producer of every movie must appear in MovieExec.

• Movies(title, year, length, genre, studioName, producerC#)

• MovieExec(name, address, cert#, netWorth)

• ∏producerC# (Movies) ⊆ ∏cert# (MovieExec)
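The first containment can be checked as a subset test between two projections. This is a sketch: the StarsIn tuples are illustrative sample data, not taken from the lecture, and the last appended tuple is deliberately dangling.

```python
stars_in = [  # (movieTitle, movieYear, starName); illustrative sample data
    ("Star Wars", 1977, "Carrie Fisher"),
    ("Star Wars", 1977, "Mark Hamill"),
]
movies = [  # (title, year, length, genre, studioName, producerC#)
    ("Star Wars", 1977, 124, "SciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "Comedy", "DreamWorks", 67890),
]

def referenced_movies_exist(stars_in, movies):
    """pi movieTitle,movieYear (StarsIn) is a subset of pi title,year (Movies)."""
    referenced = {(title, year) for title, year, _ in stars_in}
    existing = {(m[0], m[1]) for m in movies}
    return referenced <= existing

ok_before = referenced_movies_exist(stars_in, movies)

# Hypothetical dangling tuple: its movie is missing from Movies.
stars_in.append(("Wayne's World", 1992, "Mike Myers"))
ok_after = referenced_movies_exist(stars_in, movies)

print(ok_before, ok_after)
```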

SLIDE 48

Key Constraints

Recall that name is the key for the relation

• MovieStar(name, address, gender, birthdate)

The requirement that no two tuples agree on name yet differ on address can be expressed by the algebraic expression

• σ MS1.name = MS2.name AND MS1.address ≠ MS2.address (MS1 × MS2) = ∅

MS1 in the product MS1 × MS2 is shorthand for the renaming

• ρ MS1(name, address, gender, birthdate) (MovieStar)

and MS2 is defined in the same way.

SLIDE 49

Example 2.24

The only legal values for the gender attribute are 'F' and 'M'. We can express this constraint on the gender attribute of MovieStar algebraically by:

• σ gender ≠ 'F' AND gender ≠ 'M' (MovieStar) = ∅

SLIDE 50

Example 2.25

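Suppose one must have a net worth of at least $100,000,000 to be the president of a movie studio. The relevant relations are

• MovieExec(name, address, cert#, netWorth)

• Studio(name, address, presC#)

First way: perform a theta-join of the two relations and require that no resulting tuple has a net worth below the threshold.

• σ netWorth < 100000000 (Studio ⋈ presC# = cert# MovieExec) = ∅

Second way: require that every studio president's certificate number appears among the executives whose net worth is at least the threshold.

• ∏presC# (Studio) ⊆ ∏cert# (σ netWorth ≥ 100000000 (MovieExec))

Which one is more efficient?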
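Both formulations of the constraint can be sketched as checks over small instances. All names and numbers below are made up for illustration. The two checks agree here, assuming every presC# also appears as a cert# in MovieExec; without that assumption the join form would silently ignore a president who is missing from MovieExec.

```python
THRESHOLD = 100_000_000

# Illustrative instances; names and numbers are made up.
movie_exec = [  # (name, address, cert#, netWorth)
    ("Exec A", "Addr A", 1, 250_000_000),
    ("Exec B", "Addr B", 2, 50_000_000),
]
studio = [  # (name, address, presC#)
    ("Studio X", "Addr X", 1),
]

def holds_via_join(studio, movie_exec):
    """sigma netWorth < threshold (Studio join MovieExec on presC# = cert#) is empty."""
    joined = [s + e for s in studio for e in movie_exec if s[2] == e[2]]
    # In a joined tuple, netWorth sits at position 6 (3 Studio fields first).
    return not [t for t in joined if t[6] < THRESHOLD]

def holds_via_containment(studio, movie_exec):
    """pi presC# (Studio) is a subset of pi cert# (sigma netWorth >= threshold (MovieExec))."""
    rich_certs = {e[2] for e in movie_exec if e[3] >= THRESHOLD}
    return {s[2] for s in studio} <= rich_certs

ok_join = holds_via_join(studio, movie_exec)
ok_containment = holds_via_containment(studio, movie_exec)

# A second studio whose president (cert# 2) is below the threshold.
studio_bad = studio + [("Studio Y", "Addr Y", 2)]
bad_join = holds_via_join(studio_bad, movie_exec)
bad_containment = holds_via_containment(studio_bad, movie_exec)

print(ok_join, ok_containment, bad_join, bad_containment)
```

The containment form avoids building the joined tuples, but which form runs faster in a real DBMS depends on how its optimizer evaluates each expression.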

SLIDE 51

Summary of Relational Algebra

Lecture given by Dr. Widom on Relational Model