Relational Algebra Lecture 7 1 Outline Relational Algebra - - PDF document

relational algebra
SMART_READER_LITE
LIVE PREVIEW

Relational Algebra Lecture 7 1 Outline Relational Algebra - - PDF document

Relational Algebra Lecture 7 1 Outline Relational Algebra (Section 6.1) 2 1 Relational Algebra Formalism for creating new relations from existing ones Its place in the big picture: Declarative query Algebra Implementation


slide-1
SLIDE 1

1

1

Relational Algebra

Lecture 7

2

Outline

  • Relational Algebra (Section 6.1)
slide-2
SLIDE 2

2

3

Relational Algebra

  • Formalism for creating new relations from

existing ones

  • Its place in the big picture:

Declarative query language Algebra Implementation SQL, relational calculus Relational algebra

4

Relational Algebra

  • Five operators:

– Union: ∪ – Difference: - – Selection: s – Projection: P – Cartesian Product: ×

  • Derived or auxiliary operators:

– Intersection, complement – Joins (natural,equi-join, theta join, semi-join) – Renaming: r

slide-3
SLIDE 3

3

5

  • 1. Union and 2. Difference
  • R1 ∪ R2

Example:

– ActiveEmployees ∪ RetiredEmployees

  • R1 – R2

Example:

– AllEmployees − RetiredEmployees

6

What about Intersection ?

  • It is a derived operator

R1 ∩ R2 = R1 – (R1 – R2)

  • Also expressed as a join (will see later)

Example

– UnionizedEmployees ∩ RetiredEmployees

slide-4
SLIDE 4

4

7

  • 3. Selection
  • Returns all tuples which satisfy a condition
  • Notation: sc(R)
  • Examples

– s Salary > 40000 (Employee) – s name = “Smith” (Employee)

  • The condition c can be =, <, ≤, >, ≥, <>

[in SQL: SELECT * FROM Employee WHERE Salary > 40000]

8

Selection Example Employee SSN Name DepartmentID Salary 999999999 John 1 30,000 777777777 Tony 1 32,000 888888888 Alice 2 45,000 SSN Name DepartmentID Salary 888888888 Alice 2 45,000 Find all employees with salary more than $40,000. s Salary > 40000 (Employee)

slide-5
SLIDE 5

5

9

  • 4. Projection
  • Eliminates columns, then removes duplicates
  • Notation: P A1,…,An (R)
  • Example: project to social-security number and

names:

– P SSN, Name (Employee) – Output schema: Answer(SSN, Name)

[In SQL: SELECT DISTINCT SSN, Name FROM Employee]

10

Projection Example Employee SSN Name DepartmentID Salary 999999999 John 1 30,000 777777777 Tony 1 32,000 888888888 Alice 2 45,000 SSN Name 999999999 John 777777777 Tony 888888888 Alice

P SSN, Name (Employee)

slide-6
SLIDE 6

6

11

  • 5. Cartesian Product
  • Combine each tuple in R1 with each tuple in R2
  • Notation: R1 × R2
  • Example:

– Employee × Dependents

  • Very rare in practice; mainly used to express joins

[In SQL: SELECT * FROM R1, R2]

12

Cartesian Product Example Employee Name SSN John 999999999 Tony 777777777 Dependents EmployeeSSN Dname 999999999 Emily 777777777 Joe Employee x Dependents Name SSN EmployeeSSN Dname John 999999999 999999999 Emily John 999999999 777777777 Joe Tony 777777777 999999999 Emily Tony 777777777 777777777 Joe

slide-7
SLIDE 7

7

13

Relational Algebra

  • Five operators:

– Union: ∪ – Difference: - – Selection: s – Projection: P – Cartesian Product: ×

  • Derived or auxiliary operators:

– Intersection, complement – Joins (natural,equi-join, theta join, semi-join) – Renaming: r

14

Renaming

  • Changes the schema, not the instance
  • Schema: R(A1, …, An )
  • Notation: r B1,…,Bn (R)
  • Example:

– r LastName, SocSocNo (Employee) – Output schema: Answer(LastName, SocSocNo)

[in SQL: SELECT Name AS LastName, SSN AS SocSocNo FROM Employee]

slide-8
SLIDE 8

8

15

Renaming Example

Employee Name SSN John 999999999 Tony 777777777 LastName SocSocNo John 999999999 Tony 777777777

r LastName, SocSocNo (Employee)

16

Natural Join

  • Notation: R1 R2
  • Meaning: R1 R2 = PA(s C(R1 × R2))
  • Where:

– The selection sC checks equality of all common attributes – The projection eliminates the duplicate common attributes

[in SQL: SELECT DISTINCT R1.A, R1. B, R2.C FROM R1, R2 WHERE R1.B = R2.B Schema: R1(A,B), R2(B,C)]

slide-9
SLIDE 9

9

17

Natural Join Example Employee Name SSN John 999999999 Tony 777777777 Dependents SSN Dname 999999999 Emily 777777777 Joe Name SSN Dname John 999999999 Emily Tony 777777777 Joe Employee Dependents = PName, SSN, Dname(s SSN=SSN2(Employee x r SSN2, Dname(Dependents))

18

Natural Join

  • R= S=
  • R S=

V Z Z Y Z X Y X B A V Z W V U Z C B

W V Z V Z Y U Z Y V Z X U Z X C B A

slide-10
SLIDE 10

10

19

Natural Join

  • Given the schemas R(A, B, C, D), S(A, C, E),

what is the schema of R S ?

  • Given R(A, B, C), S(D, E), what is R S ?
  • Given R(A, B), S(A, B), what is R S ?

20

Theta Join

  • A join that involves a predicate
  • R1 q R2 = s q (R1 × R2)
  • Here q can be any condition
slide-11
SLIDE 11

11

21

Eq-join

  • A theta join where q is an equality

R1 A=B R2 = s A=B (R1 × R2)

  • Example:

– Employee SSN=SSN Dependents

  • Most useful join in practice

(difference to natural join?)

22

Semijoin

  • R S = P A1,…,An (R S)
  • Where A1, …, An are the attributes in R
  • Example:

– Employee Dependents

slide-12
SLIDE 12

12

23

Semijoins in Distributed Databases

  • Semijoins are used in distributed databases

. . . . . . Name SSN Dname . . . . . . Age SSN

Employee Dependents network

Employee ssn=ssn (s age>71 (Dependents))

T = P SSN s age>71 (Dependents) R = Employee T Answer = R Dependents

24

Complex RA Expressions

Person Purchase Person Product

s name=fred s name=gizmo P pid P ssn

seller-ssn=ssn pid=pid buyer-ssn=ssn

P name

slide-13
SLIDE 13

13

25

Application: Query Rewriting for Optimization

Reserves Sailors

sid=sid bid=100 rating > 5 sname

Reserves Sailors

sid=sid bid=100 sname rating > 5

(Scan; write to temp T1) (Scan; write to temp T2)

The earlier we process selections, less tuples we need to manipulate higher up in the tree (predicate pushdown) Disadvantages?

26

Algebraic Laws (Examples)

  • Commutative and Associative Laws

– R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T – R S = S R, R (S T) = (R S) T

  • Laws involving selection

– s C AND C’(R) = s C(s C’(R)) = s C(R) ∩ s C’(R) – s C (R S) = s C (R) S

  • When C involves only attributes of R
  • Laws involving projections

– PM(PN(R)) = PM,N(R)

slide-14
SLIDE 14

14

27

Operations on Bags

A bag = a set with repeated elements All operations need to be defined carefully on bags

  • {a,b,b,c}∪{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}
  • {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b}
  • sC(R): preserve the number of occurrences
  • PA(R): no duplicate elimination
  • Cartesian product, join: no duplicate elimination

Important ! Relational Engines work on bags, not sets !

28

Finally: RA has Limitations !

  • Cannot compute “transitive closure”
  • Find all direct and indirect relatives of Fred
  • Cannot express in RA !!! Need to write C program

Sister Lou Nancy Spouse Bill Mary Cousin Joe Mary Father Mary Fred Relationship Name2 Name1