Project 4 Review


  1. Project 4 Review

  2. [Diagram. Labels: .sql, Select, CreateTable, Saved Schema, π, σ, x, Optimizer, R, S, Iterator, ORDERS.dat, CUSTOMER.dat, LINEITEM.dat, (Output).]

  3. [Same diagram, adding the labels Load Phase, schemas.sql, a second CreateTable, and a second copy of ORDERS.dat, CUSTOMER.dat, LINEITEM.dat.]

  4. [Same diagram, adding the label Schema & Statistics next to Load Phase.]

  5. [Same diagram, dropping one CreateTable label and the Saved Schema label.]

  6. [Same diagram, adding the label Indexes next to Iterator.]

  7. Load Phase: You will be given n CreateTable statements (the number will be announced). Do not print the prompt until you have processed the data. You have 5 minutes with the data. Finally, print the next prompt.

  8. Query Phase: As before, but it needs to be faster and needs to run with very limited memory available. Hint: External Sort, Indexing, and Buckets.

  9. [Diagram repeated. Labels: .sql, Select, CreateTable, Load Phase, Statistics, Saved Schema, schemas.sql, CreateTable, π, σ, x, Optimizer, R, S, Iterator, Indexes, ORDERS.dat, CUSTOMER.dat, LINEITEM.dat (shown twice), (Output).]

  10. Serializing Records, Option 1: Object{In|Out}putStream. Faster! (Smaller data; Object serialization is better than Strings.) Writing: public class Tuple implements Serializable { … } Tuple t = …; ByteArrayOutputStream out = new ByteArrayOutputStream(); ObjectOutputStream objOut = new ObjectOutputStream(out); objOut.writeObject(t); byte[] tupleData = out.toByteArray(); … proceed as before …

  11. Serializing Records, Option 1: Object{In|Out}putStream. Faster! (Smaller data; Object serialization is better than Strings.) Reading: … get tupleData byte array as before … ByteArrayInputStream in = new ByteArrayInputStream(tupleData); ObjectInputStream objIn = new ObjectInputStream(in); Tuple t = (Tuple) objIn.readObject();

  12. Serializing Records, Option 2: Data{In|Out}putStream. Fastest! (Tiny data, no Reflection overheads.) Tuple t = …; ByteArrayOutputStream out = new ByteArrayOutputStream(); DataOutputStream dataOut = new DataOutputStream(out); then write each field with the call that matches its type, for example dataOut.writeDouble(d); dataOut.writeLong(l); dataOut.writeUTF(s); … get bytes as before …

  13. [Diagram repeated. Labels: .sql, Select, CreateTable, Load Phase, Statistics, Saved Schema, schemas.sql, CreateTable, π, σ, x, Optimizer, R, S, Iterator, Indexes, ORDERS.dat, CUSTOMER.dat, LINEITEM.dat (shown twice), (Output).]

  14. Cost-Based Estimation. Opportunity 1: Which index do I use? (What’s the most selective predicate?) Opportunity 2: Which join order do I use? (Which order creates the fewest intermediate tuples?) If you get this right… Oracle/MS/Google has a job for you.

  15. Cost-Based Estimation. Opportunity 1: Which index do I use? (What’s the most selective predicate?) Statistics: # of distinct values, Upper/Lower Bounds, Histograms.
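
For the index question on slide 15, one way to turn distinct-value counts and upper/lower bounds into a selectivity estimate is sketched below. It assumes values are uniformly distributed, a common simplification when no histogram is available. The class and method names (SelectivityEstimate, equality, lessThan, estimatedRows) are hypothetical and not part of the project code.

```java
// Hypothetical helper for illustration; not part of the project skeleton.
final class SelectivityEstimate {
    private SelectivityEstimate() {}

    /** Equality predicate (col = c): assume rows are spread evenly over the distinct values. */
    static double equality(long distinctValues) {
        return distinctValues <= 0 ? 1.0 : 1.0 / distinctValues;
    }

    /** Range predicate (col < c): assume values are spread evenly between min and max. */
    static double lessThan(double c, double min, double max) {
        if (max <= min) {
            return 1.0; // no usable bounds: assume nothing is filtered out
        }
        double fraction = (c - min) / (max - min);
        return Math.max(0.0, Math.min(1.0, fraction)); // clamp to [0, 1]
    }

    /** Expected number of rows that survive a predicate with the given selectivity. */
    static long estimatedRows(long tableRows, double selectivity) {
        return Math.round(tableRows * selectivity);
    }

    public static void main(String[] args) {
        long rows = 1_000_000; // made-up table size
        double byKey = equality(50_000);              // col = c with 50,000 distinct values
        double byRange = lessThan(25.0, 0.0, 100.0);  // col < 25 with bounds [0, 100]
        System.out.println("equality predicate: " + estimatedRows(rows, byKey) + " rows");
        System.out.println("range predicate: " + estimatedRows(rows, byRange) + " rows");
    }
}
```

The predicate with the smallest estimated row count is the most selective, so its index is the better candidate. A histogram would replace the uniform assumption in lessThan with per-bucket counts.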

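Option 2 on slide 12 stops at "get bytes as before". A minimal round-trip sketch follows, assuming a hypothetical three-field record: OrderRow and its fields are illustrative, since the slides do not show the fields of the real Tuple class. Fields must be read back in exactly the order they were written, because Data{In|Out}putStream stores no field names or type tags.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type for illustration; the real Tuple class is not shown on the slides.
final class OrderRow {
    final long orderKey;
    final double totalPrice;
    final String comment;

    OrderRow(long orderKey, double totalPrice, String comment) {
        this.orderKey = orderKey;
        this.totalPrice = totalPrice;
        this.comment = comment;
    }

    /** Writes the fields in a fixed order: long, double, then UTF string. */
    byte[] toBytes() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream dataOut = new DataOutputStream(out)) {
            dataOut.writeLong(orderKey);
            dataOut.writeDouble(totalPrice);
            dataOut.writeUTF(comment); // 2-byte length prefix, so at most 65535 encoded bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reads the fields back in the same order they were written. */
    static OrderRow fromBytes(byte[] data) {
        try (DataInputStream dataIn = new DataInputStream(new ByteArrayInputStream(data))) {
            long key = dataIn.readLong();
            double price = dataIn.readDouble();
            String comment = dataIn.readUTF();
            return new OrderRow(key, price, comment);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as OrderRow.fromBytes(new OrderRow(42L, 199.5, "rush").toBytes()) returns a row with the same three field values. The encoded form carries only the raw field bytes plus the string length prefix, which is where the "tiny data" claim comes from.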