Project 4 Review 1 .sql Select CreateTable Saved Schema x - - PowerPoint PPT Presentation

project 4 review
SMART_READER_LITE
LIVE PREVIEW

Project 4 Review 1 .sql Select CreateTable Saved Schema x - - PowerPoint PPT Presentation

Project 4 Review 1 .sql Select CreateTable Saved Schema x Optimizer R S Iterator ORDERS.dat CUSTOMER.dat LINEITEM.dat (Output) .sql Select CreateTable Load Phase Saved schemas.sql Schema CreateTable x


slide-1
SLIDE 1

Project 4 Review

1

slide-2
SLIDE 2

.sql

Select CreateTable

Saved Schema

(Output)

π σ x R S

Iterator

Optimizer ORDERS.dat CUSTOMER.dat LINEITEM.dat

slide-3
SLIDE 3

.sql

Select CreateTable

Saved Schema

(Output)

π σ x R S

Iterator

Optimizer schemas.sql CreateTable ORDERS.dat CUSTOMER.dat LINEITEM.dat ORDERS.dat CUSTOMER.dat LINEITEM.dat

Load Phase

slide-4
SLIDE 4

.sql

Select CreateTable

Saved Schema

(Output)

π σ x R S

Iterator

Optimizer schemas.sql CreateTable ORDERS.dat CUSTOMER.dat LINEITEM.dat ORDERS.dat CUSTOMER.dat LINEITEM.dat

Schema & Statistics

Load Phase

slide-5
SLIDE 5

.sql

Select

(Output)

π σ x R S

Iterator

Optimizer schemas.sql CreateTable ORDERS.dat CUSTOMER.dat LINEITEM.dat ORDERS.dat CUSTOMER.dat LINEITEM.dat

Schema & Statistics

Load Phase

slide-6
SLIDE 6

.sql

Select

(Output)

π σ x R S

Iterator

Optimizer schemas.sql CreateTable ORDERS.dat CUSTOMER.dat LINEITEM.dat ORDERS.dat CUSTOMER.dat LINEITEM.dat

Schema & Statistics

Load Phase

Indexes

slide-7
SLIDE 7

Load Phase You will be given n number of CreateTable statements (number will be announced) Do not print prompt until you process the data You have 5 minutes with the data Finally, print the next prompt

slide-8
SLIDE 8

Query Phase As before Needs to be faster Needs to run with a very limited memory available Hint: External Sort & Indexing & Buckets

slide-9
SLIDE 9

.sql

Select CreateTable

Saved Schema

(Output)

π σ x R S

Iterator

Optimizer schemas.sql CreateTable ORDERS.dat CUSTOMER.dat LINEITEM.dat ORDERS.dat CUSTOMER.dat LINEITEM.dat

Statistics Indexes

Load Phase

slide-10
SLIDE 10

Serializing Records

10

Option 1: Object{In|Out}putStream

Faster! (Smaller data, Object serialization better than Strings)

public class Tuple implements Serializable { … } Tuple t = …; ByteArrayOutputStream out = new ByteArrayOutputStream(); ObjectOutputStream objOut = new ObjectOutputStream(out);

  • bjOut.writeObject(t);

byte[] tupleData = out.toByteArray(); … proceed as before …

slide-11
SLIDE 11

Serializing Records

11

Option 1: ObjectOutputStream

Faster! (Smaller data, Object serialization better than Strings)

… get tupleData byte array as before … ByteArrayInputStream in = new ByteArrayInputStream(tupleData); ObjectInputStream objIn = new ObjectInputStream(in); Tuple t = objIn.readObject(t);

slide-12
SLIDE 12

Serializing Records

12

Option 2: Data{In|Out}putStream

Fastest! (Tiny data, No Reflection overheads)

Tuple t = …; ByteArrayOutputStream out = new ByteArrayOutputStream(); DataOutputStream dataOut = new DataOutputStream(out); // dataOut.writeDouble(d); // dataOut.writeLong(l); // dataOut.writeUTF(s); … get bytes as before …

slide-13
SLIDE 13

.sql

Select CreateTable

Saved Schema

(Output)

π σ x R S

Iterator

Optimizer schemas.sql CreateTable ORDERS.dat CUSTOMER.dat LINEITEM.dat ORDERS.dat CUSTOMER.dat LINEITEM.dat

Statistics Indexes

Load Phase

slide-14
SLIDE 14

Cost-Based Estimation

14

Opportunity 1: Which index do I use? Opportunity 2: Which join order do I use? (What’s the most selective predicate) (Which order creates the fewest intermediate tuples)

If you get this right… Oracle/MS/Google has a job for you.

slide-15
SLIDE 15

Cost-Based Estimation

15

Opportunity 1: Which index do I use? (What’s the most selective predicate) # of distinct values Upper/Lower Bounds Histograms