cse 344
play

CSE 344 SECTION 4 RELATIONAL ALGEBRA ! - PowerPoint PPT Presentation

CSE 344 SECTION 4 RELATIONAL ALGEBRA ! Formalism)for)describing)queries) Why RA? ! Basis)of)rela4onal)databases) ! Will)make)you)a)SQL)wizard!) Notes on RA ! Mul4ple)possible)query)plans) ! Logical)vs.)Physical)query)plans) ) Example:


  1. CSE 344 SECTION 4 – RELATIONAL ALGEBRA

  2. ! Formalism)for)describing)queries) Why RA? ! Basis)of)rela4onal)databases) ! Will)make)you)a)SQL)wizard!)

  3. Notes on RA ! Mul4ple)possible)query)plans) ! Logical)vs.)Physical)query)plans) )

  4. Example: RA-to-SQL Person(id,)name,)countryid)) Country(id,)name,)con4nent)) ) SELECT C.id ) ) FROM Person P, Country C ) ) WHERE P.countryid = C.id ) ) AND C.continent=‘Africa’ ) ) GROUP BY C.id ) ) HAVING COUNT(*) > 10000000 ) Can)we)make)a)more)efficient)plan?)

  5. Demo in Azure!

  6. RA Reference Sheet

  7. From Logical Plans to Physical Plans CSE 344 - Winter 2015 1

  8. Query Evaluation Steps SQL query Parse & Check Query Check syntax, Translate query access control, string into internal table names, etc. representation Decide how best to answer query: query optimization Logical plan ! physical plan Query Query Execution Evaluation Return Results CSE 344 - Winter 2015 2

  9. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Example SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = � Seattle � and x.sstate = � WA � Give a relational algebra expression for this query CSE 344 - Winter 2015 3

  10. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Relational Algebra SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = � Seattle � and x.sstate = � WA � π sname ( σ scity= � Seattle � ∧ sstate= � WA � ∧ pno=2 ( Supplier sid = sid Supply )) CSE 344 - Winter 2015 4

  11. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Relational Algebra SELECT sname π sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 σ scity= � Seattle �� ∧ sstate= � WA �� ∧ pno=2 and x.scity = � Seattle � and x.sstate = � WA � sid = sid Relational algebra expression is also called the “logical query Supplier Supply plan” CSE 344 - Winter 2015 5

  12. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 1 (On the fly) π sname A physical query plan is a logical query plan annotated with (On the fly) physical implementation details σ scity= � Seattle �� ∧ sstate= � WA �� ∧ pno=2 SELECT sname FROM Supplier x, Supply y (Block-nested loop) WHERE x.sid = y.sid sid = sid and y.pno = 2 and x.scity = � Seattle � and x.sstate = � WA � Supplier Supply (File scan) (File scan) CSE 344 - Winter 2015 6

  13. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 2 Different but equivalent logical query plan; different physical plan ( d ) (On the fly) π sname SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid ( c ) (Sort-merge join) and y.pno = 2 sid = sid and x.scity = � Seattle � (Scan and x.sstate = � WA � (Scan write to T1) write to T2) ( a ) σ scity= � Seattle �� ∧ sstate= � WA � ( b ) σ pno=2 Supplier Supply (File scan) (File scan) CSE 344 - Winter 2015 7

  14. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 3 (On the fly) ( d ) π sname Another logical plan that (On the fly) produces the same result and ( c ) σ scity= � Seattle �� ∧ sstate= � WA � is implemented with a different physical plan ( b ) (Index nested loop) sid = sid SELECT sname (Use index) FROM Supplier x, Supply y ( a ) σ pno=2 WHERE x.sid = y.sid and y.pno = 2 and x.scity = � Seattle � Supplier and x.sstate = � WA � Supply (Index lookup on pno ) (Index lookup on sid) 8 Assume: clustered Doesn � t matter if clustered or not

  15. Physical Data Independence • Means that applications are insulated from changes in physical storage details – E.g., can add/remove indexes without changing apps – Can do other physical tunings for performance • SQL and relational algebra facilitate physical data independence because both languages are “set-at-a-time”: Relations as input and output CSE 344 - Winter 2015 9

  16. Index • An additional file, that allows fast access to records in the data file given a search key CSE 344 - Winter 2015 1

  17. Index • An additional file, that allows fast access to records in the data file given a search key • The index contains (key, value) pairs: – The key = an attribute value (e.g., student ID or name) – The value = a pointer to the record CSE 344 - Winter 2015 2

  18. Index • An additional file, that allows fast access to records in the data file given a search key • The index contains (key, value) pairs: – The key = an attribute value (e.g., student ID or name) – The value = a pointer to the record • Could have many indexes for one table Key = means here search key CSE 344 - Winter 2015 3

  19. This Is Not A Key Different keys: • Primary key – uniquely identifies a tuple • Key of the sequential file – how the datafile is sorted, if at all • Index key – how the index is organized CSE 344 - Winter 2015 4

  20. Student Example 1: ID# fName# lName# 10# Tom# Hanks# Index on ID 20# Amy# Hanks# Data File Student …# Index Student_ID on Student.ID 10 Tom Hanks 10 20 Amy Hanks 20 50 50 … … 200 200 … 220 220 240 420 240 800 420 950 800 … CSE 344 - Winter 2015 5

  21. Student Example 2: ID# fName# lName# 10# Tom# Hanks# Index on fName 20# Amy# Hanks# Index Student_fName Data File Student …# on Student.fName 10 Tom Hanks Amy 20 Amy Hanks Ann Bob 50 … … Cho 200 … … 220 … … 240 … 420 … 800 … Tom CSE 344 - Winter 2015 6

  22. Index Organization Several index organizations: • Hash table • B+ trees – most popular – They are search trees, but they are not binary instead have higher fanout – will discuss them briefly next • Specialized indexes: bit maps, R-trees, inverted index CSE 344 - Winter 2015 7

  23. B+ Tree Index by Example d = 2 Find the key 40 80 40 ≤ 80 20 60 100 120 140 20 < 40 ≤ 60 10 15 18 20 30 40 50 60 65 80 85 90 30 < 40 ≤ 40 10 15 18 20 30 40 50 60 65 80 85 90 CSE 344 - Winter 2015 8

  24. Clustered vs Unclustered B+ Tree B+ Tree Data entries Data entries ( Index File ) ( Data file ) Data Records Data Records CLUSTERED UNCLUSTERED Every table can have only one clustered and many unclustered indexes CSE 344 - Winter 2015 9

  25. Getting Practical: Creating Indexes in SQL CREATE##TABLE####V(M#int,###N#varchar(20),####P#int);# CREATE##INDEX#V1#ON#V(N)# CREATE##INDEX#V2#ON#V(P,#M)# CREATE##INDEX#V3#ON#V(M,#N)# CREATE#UNIQUE#INDEX#V4#ON#V(N)# Not#supported#in# SQLite# CREATE#CLUSTERED#INDEX#V5#ON#V(N)# CSE 344 - Winter 2015 10

  26. Student ID# fName# lName# Which Indexes? 10# Tom# Hanks# 20# Amy# Hanks# …# • How many indexes could we create? • Which indexes should we create? CSE 344 - Winter 2015 11

  27. Student ID# fName# lName# Which Indexes? 10# Tom# Hanks# 20# Amy# Hanks# …# • How many indexes could we create? • Which indexes should we create? In general this is a very hard problem 12

  28. Student ID# fName# lName# Which Indexes? 10# Tom# Hanks# 20# Amy# Hanks# …# • The index selection problem – Given a table, and a “workload” (big Java application with lots of SQL queries), decide which indexes to create (and which ones NOT to create!) • Who does index selection: – The database administrator DBA – Semi-automatically, using a database administration tool CSE 344 - Winter 2015 13

  29. Student ID# fName# lName# Which Indexes? 10# Tom# Hanks# 20# Amy# Hanks# …# • The index selection problem – Given a table, and a “workload” (big Java application with lots of SQL queries), decide which indexes to create (and which ones NOT to create!) • Who does index selection: – The database administrator DBA – Semi-automatically, using a database administration tool CSE 344 - Winter 2015 14

  30. Index Selection: Which Search Key • Make some attribute K a search key if the WHERE clause contains: – An exact match on K – A range predicate on K – A join on K CSE 344 - Winter 2015 15

  31. The Index Selection Problem 1 V(M, N, P); Your workload is this 100000 queries: 100 queries: SELECT * SELECT * FROM V FROM V WHERE N=? WHERE P=? What indexes ? CSE 344 - Winter 2015 16

  32. The Index Selection Problem 1 V(M, N, P); Your workload is this 100000 queries: 100 queries: SELECT * SELECT * FROM V FROM V WHERE N=? WHERE P=? A: V(N) and V(P) (hash tables or B-trees) CSE 344 - Winter 2015 17

  33. The Index Selection Problem 2 V(M, N, P); Your workload is this 100000 queries: 100 queries: 100000 queries: SELECT * INSERT INTO V SELECT * FROM V VALUES (?, ?, ?) FROM V WHERE N>? and N<? WHERE P=? What indexes ? CSE 344 - Winter 2015 18

  34. The Index Selection Problem 2 V(M, N, P); Your workload is this 100000 queries: 100 queries: 100000 queries: SELECT * INSERT INTO V SELECT * FROM V VALUES (?, ?, ?) FROM V WHERE N>? and N<? WHERE P=? A: definitely V(N) (must B-tree); unsure about V(P) CSE 344 - Winter 2015 19

  35. The Index Selection Problem 3 V(M, N, P); Your workload is this 100000 queries: 1000000 queries: 100000 queries: SELECT * INSERT INTO V SELECT * FROM V VALUES (?, ?, ?) FROM V WHERE N=? WHERE N=? and P>? What indexes ? CSE 344 - Winter 2015 20

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend