CSE 344 SECTION 4 RELATIONAL ALGEBRA ! - - PowerPoint PPT Presentation

cse 344
SMART_READER_LITE
LIVE PREVIEW

CSE 344 SECTION 4 RELATIONAL ALGEBRA ! - - PowerPoint PPT Presentation

CSE 344 SECTION 4 RELATIONAL ALGEBRA ! Formalism)for)describing)queries) Why RA? ! Basis)of)rela4onal)databases) ! Will)make)you)a)SQL)wizard!) Notes on RA ! Mul4ple)possible)query)plans) ! Logical)vs.)Physical)query)plans) ) Example:


slide-1
SLIDE 1

CSE 344

SECTION 4 – RELATIONAL ALGEBRA

slide-2
SLIDE 2

Why RA?

! Formalism)for)describing)queries) ! Basis)of)rela4onal)databases) ! Will)make)you)a)SQL)wizard!)

slide-3
SLIDE 3

Notes on RA

! Mul4ple)possible)query)plans) ! Logical)vs.)Physical)query)plans) )

slide-4
SLIDE 4

Example: RA-to-SQL

) SELECT C.id) ) FROM Person P, Country C) ) WHERE P.countryid = C.id) ) AND C.continent=‘Africa’) ) GROUP BY C.id) ) HAVING COUNT(*) > 10000000)

Can)we)make)a)more)efficient)plan?) Person(id,)name,)countryid)) Country(id,)name,)con4nent))

slide-5
SLIDE 5

Demo in Azure!

slide-6
SLIDE 6

RA Reference Sheet

slide-7
SLIDE 7

CSE 344 - Winter 2015 1

From Logical Plans to Physical Plans

slide-8
SLIDE 8

Query Evaluation Steps

Parse & Check Query Decide how best to answer query: query

  • ptimization

Query Execution SQL query Return Results

Translate query string into internal representation Check syntax, access control, table names, etc.

Query Evaluation

CSE 344 - Winter 2015 2

Logical plan ! physical plan

slide-9
SLIDE 9

Example

CSE 344 - Winter 2015 3

SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = Seattle and x.sstate = WA Give a relational algebra expression for this query

Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity)

slide-10
SLIDE 10

Relational Algebra

CSE 344 - Winter 2015 4

π sname(σ scity=Seattle∧ sstate=WA∧ pno=2 (Supplier sid = sid Supply))

Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity)

SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = Seattle and x.sstate = WA

slide-11
SLIDE 11

5

Supplier Supply

sid = sid

σ scity=Seattle∧ sstate=WA∧ pno=2 π sname

Relational Algebra

CSE 344 - Winter 2015

Relational algebra expression is also called the “logical query plan” Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity)

SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = Seattle and x.sstate = WA

slide-12
SLIDE 12

6

Physical Query Plan 1

Supplier Supply

sid = sid

σ scity=Seattle∧sstate=WA∧ pno=2 π sname (File scan) (File scan) (Block-nested loop) (On the fly) (On the fly)

CSE 344 - Winter 2015

A physical query plan is a logical query plan annotated with physical implementation details Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity)

SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = Seattle and x.sstate = WA

slide-13
SLIDE 13

7

Supplier Supply

sid = sid

(a) σ scity=Seattle∧sstate=WA π sname (File scan) (File scan) (Sort-merge join) (Scan write to T2) (On the fly) (b) σ pno=2 (Scan write to T1)

Physical Query Plan 2

(c) (d)

CSE 344 - Winter 2015

Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Different but equivalent logical query plan; different physical plan

SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = Seattle and x.sstate = WA

slide-14
SLIDE 14

Supply Supplier

sid = sid

σ scity=Seattle∧sstate=WA π sname (Index nested loop) (Index lookup on sid) Doesnt matter if clustered or not (On the fly) (a) σ pno=2 (Index lookup on pno ) Assume: clustered

Physical Query Plan 3

(Use index) (b) (c) (d) (On the fly)

8

Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Another logical plan that produces the same result and is implemented with a different physical plan

SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = Seattle and x.sstate = WA

slide-15
SLIDE 15

Physical Data Independence

  • Means that applications are insulated from

changes in physical storage details

– E.g., can add/remove indexes without changing apps – Can do other physical tunings for performance

  • SQL and relational algebra facilitate physical

data independence because both languages are “set-at-a-time”: Relations as input and output

CSE 344 - Winter 2015 9

slide-16
SLIDE 16

Index

  • An additional file, that allows fast access to

records in the data file given a search key

1 CSE 344 - Winter 2015

slide-17
SLIDE 17

Index

  • An additional file, that allows fast access to

records in the data file given a search key

  • The index contains (key, value) pairs:

– The key = an attribute value (e.g., student ID or name) – The value = a pointer to the record

2 CSE 344 - Winter 2015

slide-18
SLIDE 18

Index

  • An additional file, that allows fast access to

records in the data file given a search key

  • The index contains (key, value) pairs:

– The key = an attribute value (e.g., student ID or name) – The value = a pointer to the record

  • Could have many indexes for one table

3

Key = means here search key

CSE 344 - Winter 2015

slide-19
SLIDE 19

This Is Not A Key

Different keys:

  • Primary key – uniquely identifies a tuple
  • Key of the sequential file – how the datafile is

sorted, if at all

  • Index key – how the index is organized

CSE 344 - Winter 2015 4

slide-20
SLIDE 20

5

Example 1: Index on ID

10 20 50 200 220 240 420 800

CSE 344 - Winter 2015

Data File Student

Student

ID# fName# lName#

10# Tom# Hanks# 20# Amy# Hanks# …#

10 Tom Hanks 20 Amy Hanks 50 … … 200 … 220 240 420 800

950 …

Index Student_ID on Student.ID

slide-21
SLIDE 21

6

Example 2: Index on fName

CSE 344 - Winter 2015

Index Student_fName

  • n Student.fName

Student

ID# fName# lName#

10# Tom# Hanks# 20# Amy# Hanks# …#

Amy Ann Bob Cho … … … … … … Tom

10 Tom Hanks 20 Amy Hanks 50 … … 200 … 220 240 420 800

Data File Student

slide-22
SLIDE 22

Index Organization

Several index organizations:

  • Hash table
  • B+ trees – most popular

– They are search trees, but they are not binary instead have higher fanout – will discuss them briefly next

  • Specialized indexes: bit maps, R-trees,

inverted index

CSE 344 - Winter 2015 7

slide-23
SLIDE 23

8

B+ Tree Index by Example

80 20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90 10 15 18 20 30 40 50 60 65 80 85 90

d = 2

Find the key 40 40 ≤ 80 20 < 40 ≤ 60 30 < 40 ≤ 40

CSE 344 - Winter 2015

slide-24
SLIDE 24

Clustered vs Unclustered

Data entries

(Index File) (Data file)

Data Records Data entries Data Records

CLUSTERED UNCLUSTERED B+ Tree B+ Tree

9 CSE 344 - Winter 2015

Every table can have only one clustered and many unclustered indexes

slide-25
SLIDE 25

Getting Practical: Creating Indexes in SQL

10

CREATE##INDEX#V1#ON#V(N)# CREATE##TABLE####V(M#int,###N#varchar(20),####P#int);# CREATE##INDEX#V2#ON#V(P,#M)# CREATE##INDEX#V3#ON#V(M,#N)# CREATE#CLUSTERED#INDEX#V5#ON#V(N)#

CSE 344 - Winter 2015

CREATE#UNIQUE#INDEX#V4#ON#V(N)#

Not#supported#in# SQLite#

slide-26
SLIDE 26

Which Indexes?

  • How many indexes could we create?
  • Which indexes should we create?

Student

ID# fName# lName#

10# Tom# Hanks# 20# Amy# Hanks# …#

CSE 344 - Winter 2015 11

slide-27
SLIDE 27

Which Indexes?

  • How many indexes could we create?
  • Which indexes should we create?

In general this is a very hard problem

Student

ID# fName# lName#

10# Tom# Hanks# 20# Amy# Hanks# …#

12

slide-28
SLIDE 28

Which Indexes?

  • The index selection problem

– Given a table, and a “workload” (big Java application with lots of SQL queries), decide which indexes to create (and which ones NOT to create!)

  • Who does index selection:

– The database administrator DBA – Semi-automatically, using a database administration tool

13 CSE 344 - Winter 2015

Student

ID# fName# lName#

10# Tom# Hanks# 20# Amy# Hanks# …#

slide-29
SLIDE 29

Which Indexes?

  • The index selection problem

– Given a table, and a “workload” (big Java application with lots of SQL queries), decide which indexes to create (and which ones NOT to create!)

  • Who does index selection:

– The database administrator DBA – Semi-automatically, using a database administration tool

14 CSE 344 - Winter 2015

Student

ID# fName# lName#

10# Tom# Hanks# 20# Amy# Hanks# …#

slide-30
SLIDE 30

Index Selection: Which Search Key

  • Make some attribute K a search key if the

WHERE clause contains:

– An exact match on K – A range predicate on K – A join on K

15 CSE 344 - Winter 2015

slide-31
SLIDE 31

The Index Selection Problem 1

16

V(M, N, P); SELECT * FROM V WHERE N=? SELECT * FROM V WHERE P=? 100000 queries: 100 queries: Your workload is this

What indexes ?

CSE 344 - Winter 2015

slide-32
SLIDE 32

The Index Selection Problem 1

17

V(M, N, P); SELECT * FROM V WHERE N=? SELECT * FROM V WHERE P=? 100000 queries: 100 queries: Your workload is this

A: V(N) and V(P) (hash tables or B-trees)

CSE 344 - Winter 2015

slide-33
SLIDE 33

The Index Selection Problem 2

18

V(M, N, P); SELECT * FROM V WHERE N>? and N<? SELECT * FROM V WHERE P=? 100000 queries: 100 queries: Your workload is this

What indexes ?

INSERT INTO V VALUES (?, ?, ?) 100000 queries:

CSE 344 - Winter 2015

slide-34
SLIDE 34

The Index Selection Problem 2

19

V(M, N, P); SELECT * FROM V WHERE P=? 100000 queries: 100 queries: Your workload is this INSERT INTO V VALUES (?, ?, ?) 100000 queries:

A: definitely V(N) (must B-tree); unsure about V(P)

SELECT * FROM V WHERE N>? and N<?

CSE 344 - Winter 2015

slide-35
SLIDE 35

The Index Selection Problem 3

20

V(M, N, P); SELECT * FROM V WHERE N=? SELECT * FROM V WHERE N=? and P>? 100000 queries: 1000000 queries: Your workload is this

What indexes ?

INSERT INTO V VALUES (?, ?, ?) 100000 queries:

CSE 344 - Winter 2015

slide-36
SLIDE 36

The Index Selection Problem 3

21

V(M, N, P); SELECT * FROM V WHERE N=? SELECT * FROM V WHERE N=? and P>? 100000 queries: 1000000 queries: Your workload is this

A: V(N, P)

INSERT INTO V VALUES (?, ?, ?) 100000 queries:

CSE 344 - Winter 2015

How does this index differ from:

  • 1. Two indexes V(N) and V(P)?
  • 2. An index V(P, N)?
slide-37
SLIDE 37

Basic Index Selection Guidelines

  • Consider queries in workload in order of importance
  • Consider relations accessed by query

– No point indexing other relations

  • Look at WHERE clause for possible search key
  • Try to choose indexes that speed-up multiple queries
  • And then consider the following…

CSE 344 - Winter 2015 22

slide-38
SLIDE 38

Index Selection: Multi-attribute Keys

Consider creating a multi-attribute key on K1, K2, … if

  • WHERE clause has matches on K1, K2, …

– But also consider separate indexes

  • SELECT clause contains only K1, K2, ..

– A covering index is one that can be used exclusively to answer a query, e.g. index R(K1,K2) covers the query:

23

SELECT K2 FROM R WHERE K1=55

CSE 344 - Winter 2015

slide-39
SLIDE 39

To Cluster or Not

  • Range queries benefit mostly from clustering
  • Covering indexes do not need to be

clustered: they work equally well unclustered

24 CSE 344 - Winter 2015

slide-40
SLIDE 40

25

Percentage tuples retrieved Cost 100 SELECT * FROM R WHERE K>? and K<?

CSE 344 - Winter 2015

slide-41
SLIDE 41

26

Percentage tuples retrieved Cost 100

Sequential scan

SELECT * FROM R WHERE K>? and K<?

CSE 344 - Winter 2015

slide-42
SLIDE 42

27

Percentage tuples retrieved Cost 100

Sequential scan

SELECT * FROM R WHERE K>? and K<?

CSE 344 - Winter 2015

slide-43
SLIDE 43

28

Percentage tuples retrieved Cost 100

Sequential scan

SELECT * FROM R WHERE K>? and K<?

CSE 344 - Winter 2015