Project 4 Review
1
Project 4 Review 1 .sql Select CreateTable Saved Schema x - - PowerPoint PPT Presentation
Project 4 Review 1 .sql Select CreateTable Saved Schema x Optimizer R S Iterator ORDERS.dat CUSTOMER.dat LINEITEM.dat (Output) .sql Select CreateTable Load Phase Saved schemas.sql Schema CreateTable x
1
.sql
Select CreateTable
Saved Schema
(Output)
π σ x R S
Iterator
Optimizer ORDERS.dat CUSTOMER.dat LINEITEM.dat
.sql
Select CreateTable
Saved Schema
(Output)
π σ x R S
Iterator
Optimizer schemas.sql CreateTable ORDERS.dat CUSTOMER.dat LINEITEM.dat ORDERS.dat CUSTOMER.dat LINEITEM.dat
Load Phase
.sql
Select CreateTable
Saved Schema
(Output)
π σ x R S
Iterator
Optimizer schemas.sql CreateTable ORDERS.dat CUSTOMER.dat LINEITEM.dat ORDERS.dat CUSTOMER.dat LINEITEM.dat
Schema & Statistics
Load Phase
.sql
Select
(Output)
π σ x R S
Iterator
Optimizer schemas.sql CreateTable ORDERS.dat CUSTOMER.dat LINEITEM.dat ORDERS.dat CUSTOMER.dat LINEITEM.dat
Schema & Statistics
Load Phase
.sql
Select
(Output)
π σ x R S
Iterator
Optimizer schemas.sql CreateTable ORDERS.dat CUSTOMER.dat LINEITEM.dat ORDERS.dat CUSTOMER.dat LINEITEM.dat
Schema & Statistics
Load Phase
Indexes
Load Phase You will be given n number of CreateTable statements (number will be announced) Do not print prompt until you process the data You have 5 minutes with the data Finally, print the next prompt
Query Phase As before Needs to be faster Needs to run with a very limited memory available Hint: External Sort & Indexing & Buckets
.sql
Select CreateTable
Saved Schema
(Output)
π σ x R S
Iterator
Optimizer schemas.sql CreateTable ORDERS.dat CUSTOMER.dat LINEITEM.dat ORDERS.dat CUSTOMER.dat LINEITEM.dat
Statistics Indexes
Load Phase
10
Option 1: Object{In|Out}putStream
Faster! (Smaller data, Object serialization better than Strings)
public class Tuple implements Serializable { … } Tuple t = …; ByteArrayOutputStream out = new ByteArrayOutputStream(); ObjectOutputStream objOut = new ObjectOutputStream(out);
byte[] tupleData = out.toByteArray(); … proceed as before …
11
Option 1: ObjectOutputStream
Faster! (Smaller data, Object serialization better than Strings)
… get tupleData byte array as before … ByteArrayInputStream in = new ByteArrayInputStream(tupleData); ObjectInputStream objIn = new ObjectInputStream(in); Tuple t = objIn.readObject(t);
12
Option 2: Data{In|Out}putStream
Fastest! (Tiny data, No Reflection overheads)
Tuple t = …; ByteArrayOutputStream out = new ByteArrayOutputStream(); DataOutputStream dataOut = new DataOutputStream(out); // dataOut.writeDouble(d); // dataOut.writeLong(l); // dataOut.writeUTF(s); … get bytes as before …
.sql
Select CreateTable
Saved Schema
(Output)
π σ x R S
Iterator
Optimizer schemas.sql CreateTable ORDERS.dat CUSTOMER.dat LINEITEM.dat ORDERS.dat CUSTOMER.dat LINEITEM.dat
Statistics Indexes
Load Phase
14
Opportunity 1: Which index do I use? Opportunity 2: Which join order do I use? (What’s the most selective predicate) (Which order creates the fewest intermediate tuples)
If you get this right… Oracle/MS/Google has a job for you.
15
Opportunity 1: Which index do I use? (What’s the most selective predicate) # of distinct values Upper/Lower Bounds Histograms