CSE 4/562 Database Systems Practicum Component Project Outline - PowerPoint PPT Presentation

CSE 4/562 Database Systems Practicum Component

Project Outline Parser & SQL Query Relational Algebra Translator A relational query processor Optimizer Statistics Query Evaluation Execution Plan Result Engine

Project Outline Parser & SQL Query Relational Algebra Translator Checkpoint 3 Optimization Optimizer Statistics Query Evaluation Execution Plan Result Engine

Checkpoint 3 How do make your system faster? • Programming efficiency? • Choosing a strategy? • More efficient operators? • How can you deal with aggregation? •

Checkpoint 3 • 3 Main Components • New functions (Distinct, Group-By, and functions) • Optimization (Projection/Selection Pushdown) • Join Algorithms

New Functions • GROUP BY – HAVING • COUNT() • AVG() • SUM() • MIN() – MAX() • DISTINCT

Aggregations • Hash aggregation algorithm • Requires more memory • Stream aggregation algorithm • Requires data to be sorted first

Hash Aggregation for each input row begin calculate hash value on group by column(s) check for a matching row in the hash table if we find a match update the matching row with the input row else insert a new row into the hash table end output all rows in the hash table

Stream Aggregation For each input row begin if the input row does not match the current columns begin output the aggregate results clear the current aggregate results set the current group-by columns to the input row end update the aggregate results with the input row end

Optimization • You already implemented Selection pushdown • You need to implement Projection pushdown

Selection Pushdown • Helps you filter out unnecessary tuples early on • Provides both memory and CPU time profits • Saves you memory because join and cross product algorithms use memory based on their input size • Saves you CPU time because other operators do not need to deal with unnecessary tuples

Projection Pushdown • Helps you filter out unnecessary attributes early on • Provides both memory and CPU time profits • Saves memory because you don’t carry unnecessary data between operators • Saves CPU time because you don’t copy unnecessary data when you modify the tuple schema and need to copy data to the new tuple

Join Algorithms • Nested-Loop-Join and Block-Nested-Loop-Join are slow… • Try other join algorithms

Classic Hash Join Works when the smaller relation R fits in memory. Build a in-memory hash table for the smaller 1. relation; For each record in the larger relation, probe the 2. hash table. If the smaller relation does not fit in memory, partition into smaller buckets!

Simple Hash Join 1. for each logical bucket j for each record r in R 2. if r is in bucket j then 3. insert r into the hash table; 4. for each record s in S 5. if s is in bucket j then 6. probe the hash table; 7. • Classic hash join is a special case, with one bucket; • Optimization: write the tuples not in bucket j to disk; • Works good when memory is large (nearly as large as | R |).

GRACE Hash Join partition R into n buckets so that each bucket fits 1. in memory; partition S into n buckets; 2. for each bucket j do 3. for each record r in Rj do 4. insert into a hash table; 5. for each record s in Sj do 6. probe the hash table. 7. • Works good when memory is small.

Hybrid Hash Join • Hybrid of simple hash join and GRACE; • When partitioning R , keep the records of the first bucket in memory as a hash table; Typically this means that the first bucket uses more pages in • memory (all other partitions are 1 page each) • When partitioning S , for records of the first bucket, probe the hash table directly; • Saving: no need to write R 1 and S 1 to disk or read back to memory. • Works good for large and small memory.

Handle Partition Overflow • Case 1, overflow on disk: an R partition is larger than memory size (note: don’t care about the size of S partitions). • Solution (a) small partitions first and combine before join; • Solution (b) recursive partition. • Case 2, overflow in memory: the in-memory hash table of R becomes too large. • Solution: revise the partitioning scheme and keep a smaller partition in memory.

Questions?

CSE 4/562 Database Systems Practicum Component Project Outline - PowerPoint PPT Presentation

CSE 4/562 Database Systems Practicum Component Project Outline Parser & SQL Query Relational Algebra Translator A relational query processor Optimizer Statistics Query Evaluation Execution Plan Result Engine Project Outline

Genome 562 February 2015 Week 6 Genome 562 p.1/13 Julian Huxley (1887-1975) Oxford

Genome 562 January 2015 Week 1 Genome 562 p.1/6 Early workers in theoretical population

Genome 562 February 2015 Week 7 Genome 562 p.1/5 East and Jones, 1918 Edward M. East

ME 562: Sustainable Energy an Exergy Analysis Spring 2014 Mechanical Engineering ME462/562

Absorption cooling Mechanical Engineering ME462/562 Sustainable Energy: an Exergy Analysis

Psychrometric processes Mechanical Engineering ME462/562 Sustainable Energy: an Exergy

Heat pumps Mechanical Engineering ME462/562 Sustainable Energy: an Exergy Analysis Heat pump

CSE 132B CSE 132B Database Systems Applications Database Systems Applications Alin Deutsch

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Welcome to CSE 506 Introduc/on & Review Don Porter 1 2 CSE 506: Opera.ng Systems CSE 506:

CSE 132B CSE 132B Database Systems Applications Database Systems Applications SQL as Query

CSE 562: Mobile Systems & Applications Quals Course Systems Area Shyam Gollakota First

CSE 3401 Functional and Logic Programming York University CSE 3401 Vida Movahedi 1 York University

by Gamma and Fluorescence Sarl au capital de 15 000 RCS PARIS 432 607 562 Siret 432

Thermal Energy Storage Mechanical Engineering ME462/562 Sustainable Energy: an Exergy

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

First things first Project 1 Hashing Still working on grading Definitely by

HMEH: write-optimal extendible hashing for hybrid DRAM-NVM memory Xiaomin Zou 1 , Fang Wang 1 *,

Intro to AWS and Boto3 IN TRODUCTION TO AW S BOTO IN P YTH ON Maksim Pecherskiy Data

OpenGL and GLSL Steve Marschner CS4620 Cornell University Cornell CS4620 Fall 2020 Steve

Querying Heterogeneous Information Sources Using Source Descriptions & Data Integration: The

So You Want To Start A Green Business Program? Green Business Engagement National Network (GBENN)

Gamification: Behind the buzz Unity Studios GOTO, October 2013 Unity Studios GOTO,

What is the buzz term in the current field of computer science? A. Cloud Computing B. Grid

CSE 4/562 Database Systems Practicum Component Project Outline - PowerPoint PPT Presentation

CSE 4/562 Database Systems Practicum Component Project Outline Parser & SQL Query Relational Algebra Translator A relational query processor Optimizer Statistics Query Evaluation Execution Plan Result Engine Project Outline

Genome 562 February 2015 Week 6 Genome 562 p.1/13 Julian Huxley (1887-1975) Oxford

Genome 562 January 2015 Week 1 Genome 562 p.1/6 Early workers in theoretical population

Genome 562 February 2015 Week 7 Genome 562 p.1/5 East and Jones, 1918 Edward M. East

ME 562: Sustainable Energy an Exergy Analysis Spring 2014 Mechanical Engineering ME462/562

Absorption cooling Mechanical Engineering ME462/562 Sustainable Energy: an Exergy Analysis

Psychrometric processes Mechanical Engineering ME462/562 Sustainable Energy: an Exergy

Heat pumps Mechanical Engineering ME462/562 Sustainable Energy: an Exergy Analysis Heat pump

CSE 132B CSE 132B Database Systems Applications Database Systems Applications Alin Deutsch

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Welcome to CSE 506 Introduc/on &amp; Review Don Porter 1 2 CSE 506: Opera.ng Systems CSE 506:

CSE 132B CSE 132B Database Systems Applications Database Systems Applications SQL as Query

CSE 562: Mobile Systems &amp; Applications Quals Course Systems Area Shyam Gollakota First

CSE 3401 Functional and Logic Programming York University CSE 3401 Vida Movahedi 1 York University

by Gamma and Fluorescence Sarl au capital de 15 000 RCS PARIS 432 607 562 Siret 432

Thermal Energy Storage Mechanical Engineering ME462/562 Sustainable Energy: an Exergy

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

First things first Project 1 Hashing Still working on grading Definitely by

HMEH: write-optimal extendible hashing for hybrid DRAM-NVM memory Xiaomin Zou 1 , Fang Wang 1 *,

Intro to AWS and Boto3 IN TRODUCTION TO AW S BOTO IN P YTH ON Maksim Pecherskiy Data

OpenGL and GLSL Steve Marschner CS4620 Cornell University Cornell CS4620 Fall 2020 Steve

Querying Heterogeneous Information Sources Using Source Descriptions &amp; Data Integration: The

So You Want To Start A Green Business Program? Green Business Engagement National Network (GBENN)

Gamification: Behind the buzz Unity Studios GOTO, October 2013 Unity Studios GOTO,

What is the buzz term in the current field of computer science? A. Cloud Computing B. Grid

Welcome to CSE 506 Introduc/on & Review Don Porter 1 2 CSE 506: Opera.ng Systems CSE 506:

CSE 562: Mobile Systems & Applications Quals Course Systems Area Shyam Gollakota First

Querying Heterogeneous Information Sources Using Source Descriptions & Data Integration: The