Relational Algebra Lecture 7 1 Outline Relational Algebra - - PDF document

relational algebra
SMART_READER_LITE
LIVE PREVIEW

Relational Algebra Lecture 7 1 Outline Relational Algebra - - PDF document

Relational Algebra Lecture 7 1 Outline Relational Algebra (Section 6.1) 2 Relational Algebra Formalism for creating new relations from existing ones Its place in the big picture: Declarative query Algebra Implementation


slide-1
SLIDE 1

1

Relational Algebra

Lecture 7

2

Outline

  • Relational Algebra (Section 6.1)
slide-2
SLIDE 2

3

Relational Algebra

  • Formalism for creating new relations

from existing ones

  • Its place in the big picture:

Declarative query language Algebra Implementation SQL, relational calculus Relational algebra

4

Relational Algebra

  • Five operators:

– Union: – Difference: - – Selection: s – Projection: P – Cartesian Product:

  • Derived or auxiliary operators:

– Intersection, complement – Joins (natural,equi-join, theta join, semi-join) – Renaming: r

slide-3
SLIDE 3

5

  • 1. Union and 2. Difference
  • R1 R2

Example:

– ActiveEmployees RetiredEmployees

  • R1 – R2

Example:

– AllEmployees RetiredEmployees

6

What about Intersection?

  • It is a derived operator

R1 R2 = R1 – (R1 – R2)

  • Also expressed as a join (will see later)

Example

– UnionizedEmployees RetiredEmployees

slide-4
SLIDE 4

7

  • 3. Selection
  • Returns all tuples which satisfy a condition
  • Notation: sc(R) (sometimes also )
  • Examples

– s Salary > 40000 (Employee) – s name = “Smith” (Employee)

  • The condition c can be =, <, , >, , <>

[in SQL: SELECT * FROM Employee WHERE Salary > 40000]

condition

8

Selection Example Employee SSN Name DepartmentID Salary 999999999 John 1 30,000 777777777 Tony 1 32,000 888888888 Alice 2 45,000 SSN Name DepartmentID Salary 888888888 Alice 2 45,000 Find all employees with salary more than $40,000. s Salary > 40000 (Employee)

slide-5
SLIDE 5

9

  • 4. Projection
  • Eliminates columns, then removes duplicates
  • Notation: P A1,…,An (R) (sometimes )
  • Example:

– project to social-security number and names: – P SSN, Name (Employee) – Output schema: Answer(SSN, Name)

[In SQL: SELECT DISTINCT SSN, Name FROM Employee]

attributes

10

Projection Example Employee SSN Name DepartmentID Salary 999999999 John 1 30,000 777777777 Tony 1 32,000 888888888 Alice 2 45,000 SSN Name 999999999 John 777777777 Tony 888888888 Alice

P SSN, Name (Employee)

slide-6
SLIDE 6

11

  • 5. Cartesian Product
  • Combine each tuple in R1 with each tuple in

R2

  • Notation: R1 R2
  • Example:

– Employee Dependents

  • Very rare in practice; mainly used to express

joins

[In SQL: SELECT * FROM R1, R2]

12

Cartesian Product Example Employee Name SSN John 999999999 Tony 777777777 Dependents EmployeeSSN Dname 999999999 Emily 777777777 Joe Employee x Dependents Name SSN EmployeeSSN Dname John 999999999 999999999 Emily John 999999999 777777777 Joe Tony 777777777 999999999 Emily Tony 777777777 777777777 Joe

Note the output

slide-7
SLIDE 7

13

Cartesian Product and SQL

select * from Employee, Dependents; +------+-----------+-------------+-------+ | Name | SSN | EmployeeSSN | Dname | +------+-----------+-------------+-------+ | John | 999999999 | 999999999 | Emily | | Tony | 777777777 | 999999999 | Emily | | John | 999999999 | 777777777 | Joe | | Tony | 777777777 | 777777777 | Joe | +------+-----------+-------------+-------+ select * from Employee, Dependents where SSN=EmpoyeeSSN; +------+-----------+-------------+-------+ | Name | SSN | EmployeeSSN | Dname | +------+-----------+--------------+-------+ | John | 999999999 | 999999999 | Emily | | Tony | 777777777 | 777777777 | Joe | +------+-----------+-------------+-------+

14

Relational Algebra

  • Five operators:

– Union: – Difference: - – Selection: s – Projection: P – Cartesian Product:

  • Derived or auxiliary operators:

– Intersection, complement – Joins (natural,equi-join, theta join, semi-join) – Renaming: r

slide-8
SLIDE 8

15

Renaming

  • Changes the schema, not the instance
  • Schema: R(A1, …, An )
  • Notation: r B1,…,Bn (R)
  • Example:

– r LastName, SocSocNo (Employee) – Output schema: Answer(LastName, SocSocNo)

[in SQL: SELECT Name AS LastName, SSN AS SocSocNo FROM Employee]

16

Renaming Example

Employee Name SSN John 999999999 Tony 777777777 LastName SocSocNo John 999999999 Tony 777777777

r LastName, SocSocNo (Employee)

slide-9
SLIDE 9

17

Natural Join

  • Notation: R1 ! R2
  • Meaning: R1 ! R2 = PA(sC(R1 R2))

– Equality on common attributes names

  • Where:

– The selection sC checks equality of all common attributes – The projection eliminates the duplicate common attributes

[in SQL: SELECT DISTINCT R1.A, R1. B, R2.C FROM R1, R2 WHERE R1.B = R2.B Schema: R1(A,B), R2(B,C)]

18

Natural Join Example Employee Name SSN John 999999999 Tony 777777777 Dependents SSN Dname 999999999 Emily 777777777 Joe Name SSN Dname John 999999999 Emily Tony 777777777 Joe Employee Dependents = PName, SSN, Dname(s SSN=SSN2(Employee x r SSN2, Dname(Dependents))

slide-10
SLIDE 10

19

Natural Join

  • R= S=
  • R ! S=

V Z Z Y Z X Y X B A V Z W V U Z C B

W V Z V Z Y U Z Y V Z X U Z X C B A

20

Natural Join

  • Given the schemas R(A, B, C, D), S(A, C,

E), what is the schema of R ! S ?

  • Given R(A, B, C), S(D, E), what is R ! S ?
  • Given R(A, B), S(A, B), what is R ! S ?
slide-11
SLIDE 11

21

Theta Join

  • A join that involves a predicate
  • R1 ! q R2 = sq (R1 R2)
  • Here q can be any condition

22

Eq-join

  • A theta join where q is an equality

R1 !A=B R2 = s A=B (R1 R2) Equality on two different attributes

  • Example:

– Employee !SSN=SSN Dependents

  • Most useful join in practice

(difference to natural join?)

slide-12
SLIDE 12

23

Semijoin

  • R " S = P A1,…,An (R ! S)
  • Where A1, …, An are the attributes in R
  • Example:

– Employee " Dependents

24

Employee Name EmpId DeptName Harry 3415 Finance Sally 2241 Sales George 3401 Finance Harriet 2202 Sales Dept DeptName Manager Sales Harriet Production Charles

Semijoin - Example

Employee ! Dept Name EmpId DeptName Sally 2241 Sales Harriet 2202 Sales

Only attributes of Employee are selected

slide-13
SLIDE 13

25

Semijoins in Distributed Databases

  • Semijoins are used in distributed databases

. . . . . . Name SSN Dname . . . . . . Age SSN

Employee Dependents network

Employee !ssn=ssn (s age>71 (Dependents))

T = P SSN s age>71 (Dependents) R = Employee "!T Answer = R ! Dependents

26

Complex RA Expressions

Person Purchase Person Product

sname=fred sname=gizmo P pid P ssn

seller-ssn=ssn pid=pid buyer-ssn=ssn

P name

slide-14
SLIDE 14

27

Application: Query Rewriting for Optimization

Reserves Sailors

sid=sid bid=100 rating > 5 sname

Reserves Sailors

sid=sid bid=100 sname rating > 5

(Scan; write to temp T1) (Scan; write to temp T2)

The earlier we process selections, the fewer tuples we need to manipulate higher up in the tree (predicate pushdown) Disadvantages?

28

Algebraic Laws (Examples)

  • Commutative and Associative Laws

– R S = S R, R (S T) = (R S) T – R S = S R, R (S T) = (R S) T

  • Laws involving selection

– s C AND C(R) = s C(s C(R)) = s C(R) s C(R) – s C (R S) = s C (R) S

  • When C involves only attributes of R
  • Laws involving projections

– PM(PN(R)) = PM,N(R)

! ! ! ! ! ! ! !

slide-15
SLIDE 15

29

Operations on Bags

A bag = a set with repeated elements All operations need to be defined carefully on bags

  • {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}
  • {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b}
  • sC(R): preserve the number of occurrences
  • PA(R): no duplicate elimination
  • Cartesian product, join: no duplicate elimination

Important: Relational Engines work on bags, not sets!

30

Finally: RA has Limitations !

  • Cannot compute “transitive closure”
  • Find all direct and indirect relatives of Fred
  • Cannot express in RA !!! Need to write C

program

Sister Lou Nancy Spouse Bill Mary Cousin Joe Mary Father Mary Fred Relationship Name2 Name1

slide-16
SLIDE 16

31

Conclusion

  • Relational algebra is important for SQL
  • We mainly need projects, selections,

joins

  • Joins are typically time consuming for

large databases