Last Class Carnegie Mellon Univ. Catalog Dept. of Computer Science - PDF document

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS CMU SCS Last Class Carnegie Mellon Univ. • Catalog Dept. of Computer Science • Intro to Operator Evaluation 15-415/615 - DB Applications • Typical Query Optimizer • Projection/Aggregation C. Faloutsos – A. Pavlo Lecture#14: Implementation of Relational Operations Faloutsos/Pavlo CMU SCS 15-415/615 2 CMU SCS CMU SCS Today‟s Class Access Paths • More on Indexes • How the DBMS retrieves tuples from a • Explain table for a query plan. – File Scan (aka Sequential Scan) • Joins – Index Scan (Tree, Hash, List, …) • Mid-term Review (Christos) • Selectivity of an access path: – % of pages we retrieve – e.g., Selectivity of a hash index, on range query: 100% (no reduction!) Faloutsos/Pavlo CMU SCS 15-415/615 3 Faloutsos/Pavlo CMU SCS 15-415/615 4 1

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS CMU SCS Selection Conditions B+Tree Prefix Search • A B-tree index matches (a conjunction of) Key = xy terms that involve only attributes in a prefix Key = _ y of the search key. yz – Index on <a,b,c> matches (a=5 AND b=3) , but ? not b=3 . • For Hash index, we must have all attributes xx xy zy zz in search key. Faloutsos/Pavlo CMU SCS 15-415/615 5 Faloutsos/Pavlo CMU SCS 15-415/615 6 CMU SCS CMU SCS Partial Indexes Covering Indexes • Create an index on a subset of the entire • If all of the fields needed to process the table. This potentially reduces its size and query are available in an index, then the the amount of overhead to maintain it. DBMS does not need to retrieve the tuple. CREATE INDEX idx_foo CREATE INDEX idx_foo ON foo (a, b) ON foo (a, b) WHERE c = ‘ WuTang ’ SELECT b FROM foo SELECT b FROM foo WHERE a = 123 WHERE a = 123 AND c = ‘ WuTang ’ Faloutsos/Pavlo CMU SCS 15-415/615 7 Faloutsos/Pavlo CMU SCS 15-415/615 8 2

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS CMU SCS Today‟s Class Index Include Columns • Embed additional columns in indexes to • More on Indexes support index-only queries. • Explain • Not part of the search key. • Joins CREATE INDEX idx_foo • Mid-term Review (Christos) ON foo (a, b) INCLUDE (c) SELECT b FROM foo WHERE a = 123 AND c = ‘ WuTang ’ Faloutsos/Pavlo CMU SCS 15-415/615 9 Faloutsos/Pavlo CMU SCS 15-415/615 10 CMU SCS CMU SCS EXPLAIN EXPLAIN SELECT bid, COUNT( * ) AS cnt • When you precede a SELECT statement FROM Reserves with the keyword EXPLAIN , the DBMS GROUP BY bid ORDER BY cnt displays information from the optimizer about the statement execution plan. Pseudo Query Plan: • The system “explains” how it would SORT process the query, including how tables are COUNT joined and in which order. GROUP BY p bid RESERVES Faloutsos/Pavlo CMU SCS 15-415/615 11 Faloutsos/Pavlo CMU SCS 15-415/615 12 3

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS CMU SCS EXPLAIN EXPLAIN EXPLAIN SELECT bid, COUNT( * ) AS cnt EXPLAIN SELECT bid, COUNT( * ) AS cnt FROM Reserves FROM Reserves GROUP BY bid GROUP BY bid ORDER BY cnt ORDER BY cnt Postgres v9.1 MySQL v5.5 Faloutsos/Pavlo CMU SCS 15-415/615 13 Faloutsos/Pavlo CMU SCS 15-415/615 14 CMU SCS CMU SCS EXPLAIN ANALYZE EXPLAIN ANALYZE EXPLAIN ANALYZE • ANALYZE option causes the statement to be SELECT bid, COUNT( * ) AS cnt actually executed. FROM Reserves GROUP BY bid • The actual runtime statistics are displayed. ORDER BY cnt • This is useful for seeing whether the planner's estimates are close to reality. • Note that ANALYZE is a Postgres idiom. Postgres v9.1 Faloutsos/Pavlo CMU SCS 15-415/615 15 Faloutsos/Pavlo CMU SCS 15-415/615 16 4

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS CMU SCS Today‟s Class EXPLAIN ANALYZE • Works on any type of query. • More on Indexes • Since ANALYZE actually executes a query, • Explain if you use it with a query that modifies the • Joins table, that modification will be made. • Mid-term Review (Christos) Faloutsos/Pavlo CMU SCS 15-415/615 17 Faloutsos/Pavlo CMU SCS 15-415/615 18 CMU SCS CMU SCS Cost-based Query Sub-System Sample Database Select * Queries From Blah B Where B.blah = blah SAILORS RESERVES sid sname rating age sid bid day rname 1 Christos 999 45.0 6 103 2014-02-01 matlock Query Parser 3 Obama 50 52.0 1 102 2014-02-02 macgyver 2 Tupac 32 26.0 2 101 2014-02-02 a-team 6 Bieber 10 19.0 1 101 2014-02-01 dallas Query Optimizer Sailors ( sid : int, sname : varchar, rating : int, age : real) Plan Plan Cost Catalog Manager Reserves ( sid : int, bid : int, day : date, rname : varchar) Generator Estimator Schema Statistics Query Plan Evaluator 19 Faloutsos/Pavlo CMU SCS 15-415/615 20 5

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS CMU SCS Sample Database Joins SAILORS RESERVES • R ⨝ S is very common and thus must be sid sname rating age sid bid day rname carefully optimized. 1 Christos 999 45.0 6 103 2014-02-01 matlock 3 Obama 50 52.0 1 102 2014-02-02 macgyver • R × S followed by a selection is inefficient 2 Tupac 32 26.0 2 101 2014-02-02 a-team 6 Bieber 10 19.0 1 101 2014-02-01 dallas because cross-product is large. Each tuple is 50 bytes Each tuple is 40 bytes • There are many approaches to reduce join 80 tuples per page 100 tuples per page cost, but no one works best for all cases. 500 pages total 1000 pages total N =500, p S =80 M =1000, p R =100 • Remember, join is associative and commutative. Faloutsos/Pavlo CMU SCS 15-415/615 21 Faloutsos/Pavlo CMU SCS 15-415/615 22 CMU SCS CMU SCS Joins • Join techniques we will cover: – Nested Loop Joins – Index Nested Loop Joins – Sort-Merge Joins – Hash Joins Faloutsos/Pavlo CMU SCS 15-415/615 23 Faloutsos/Pavlo CMU SCS 15-415/615 24 6

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS CMU SCS Joins First Example • Assume: SELECT * FROM Reserves R, Sailors S – M pages in R, pR tuples per page, m tuples total WHERE R.sid = S.sid – N pages in S, pS tuples per page, n tuples total – In our examples, R is Reserves and S is Sailors. • Assume that we don‟t know anything about • We will consider more complex join the tables and we don‟t have any indexes. conditions later. • Cost metric: # of I/Os We will ignore output costs Faloutsos/Pavlo CMU SCS 15-415/615 25 Faloutsos/Pavlo CMU SCS 15-415/615 26 CMU SCS CMU SCS Simple Nested Loop Join Simple Nested Loop Join • Algorithm #0: Simple Nested Loop Join • Algorithm #0: Simple Nested Loop Join outer relation foreach tuple r of R foreach tuple r of R foreach tuple s of S foreach tuple s of S output, if they match output, if they match inner relation R(A,..) R(A,..) S(A, ......) S(A, ......) Faloutsos/Pavlo CMU SCS 15-415/615 27 Faloutsos/Pavlo CMU SCS 15-415/615 28 7

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS CMU SCS Simple Nested Loop Join Simple Nested Loop Join • Algorithm #0: Why is it bad? • Actual number: • How many disk accesses („ M ‟ and „ N ‟ are – M + (pR ∙ M) ∙ N = 1000 + 100 ∙ 1000 ∙ 500 the number of blocks for „ R ‟ and „ S ‟)? = 50,001,000 I/Os – At 10ms/IO, Total time ≈ 5.7 days – Cost: M + (pR ∙ M) ∙ N • What if smaller relation (S) was outer? R(A,..) – Slightly better… • What assumptions are being made here? M pages, S(A, ......) N pages, m tuples – 1 buffer for each table (and 1 for output) n tuples Faloutsos/Pavlo CMU SCS 15-415/615 29 Faloutsos/Pavlo CMU SCS 15-415/615 30 CMU SCS CMU SCS Simple Nested Loop Join Simple Nested Loop Join • Actual number: • Actual number: SSD ≈ 1.3 hours SSD ≈ 1.3 hours – M + (pR ∙ M) ∙ N = 1000 + 100 ∙ 1000 ∙ 500 – M + (pR ∙ M) ∙ N = 1000 + 100 ∙ 1000 ∙ 5000 at 0.1ms/IO at 0.1ms/IO = 50,001,000 I/Os = 50,001,000 I/Os – At 10ms/IO, Total time ≈ 5.7 days – At 10ms/IO, Total time ≈ 5.7 days • What if smaller relation (S) was outer? • What if smaller relation (S) was outer? – Slightly better… – Slightly better… • What assumptions are being made here? • What assumptions are being made here? – 1 buffer for each table (and 1 for output) – 1 buffer for each table (and 1 for output) Faloutsos/Pavlo CMU SCS 15-415/615 31 Faloutsos/Pavlo CMU SCS 15-415/615 32 8

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS CMU SCS Simple Nested Loop Join Block Nested Loop Join • Actual number: • Algorithm #1: Block Nested Loop Join SSD ≈ 1.3 hours – M + (pR ∙ M) ∙ N = 1000 + 100 ∙ 1000 ∙ 5000 read block from R at 0.1ms/IO = 50,001,000 I/Os read block from S – At 10ms/IO, Total time ≈ 5.7 days output, if tuples match • What if smaller relation (S) was outer? R(A,..) – Slightly better… • What assumptions are being made here? M pages, S(A, ......) N pages, m tuples – 1 buffer for each table (and 1 for output) n tuples Faloutsos/Pavlo CMU SCS 15-415/615 33 Faloutsos/Pavlo CMU SCS 15-415/615 34 CMU SCS CMU SCS Block Nested Loop Join Block Nested Loop Join • Algorithm #1: Things are better. • Algorithm #1: Optimizations • How many disk accesses („ M ‟ and „ N ‟ are • Which one should be the outer relation? the number of blocks for „ R ‟ and „ S ‟)? – The smallest (in terms of # of pages) – Cost: M + (M∙N) R(A,..) R(A,..) M pages, M pages, S(A, ......) S(A, ......) N pages, N pages, m tuples m tuples n tuples n tuples Faloutsos/Pavlo CMU SCS 15-415/615 35 Faloutsos/Pavlo CMU SCS 15-415/615 36 9

Last Class Carnegie Mellon Univ. Catalog Dept. of Computer Science - PDF document

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS CMU SCS Last Class Carnegie Mellon Univ. Catalog Dept. of Computer Science Intro to Operator Evaluation 15-415/615 - DB Applications Typical Query Optimizer Projection/Aggregation C.

Carnegie Mellon University Search TRECVID 2004 Workshop November 2004 Mike Christel, Jun

Brendan Meeder Carnegie Mellon University Christos Faloutsos Carnegie Mellon University Given a

A First Look Franz Franchetti Carnegie Mellon University in collaboration with Daniele G.

Running Incomplete Programs Ian Voysey Carnegie Mellon University Cyrus Omar Carnegie Mellon

HookFinder: Identifying and Understanding Malware Hooking Behaviors Heng Yin Zhenkai Liang Dawn

Kensuke Kakiuchi (Nagoya Univ./ The Univ. of Tokyo) Collaborators: Takeru K. Suzuki (The Univ. of

Cache Lab Implementation and Blocking Slides courtesy of: Aditya Shah, CMU 1 Carnegie Mellon

From Carnegie Mellon to Kyoto: How Far Can We Go? Project Courses at Carnegie Mellon Involve

Feature Selection Matters for Anchor-Free Object Detection Chenchen Zhu Carnegie Mellon

SPIRAL, FFTX, and the Path to SpectralPACK Franz Franchetti Carnegie Mellon University

15-213 Recitation: Attack Lab Jenna MacCarley 28 Sep 2015 Carnegie Mellon Reminder Bomb lab

15-213 Recitation: Bomb Lab 21 Sep 2015 Monil Shah, Shelton DSouza Carnegie Mellon Agenda

for BlueGene/P Franz Franchetti 1 , Yevgen Voronenko 2 , Gheorghe Almasi 3 1 Carnegie Mellon

More is Less? Non-parametric Language Models and Efficiency Graham Neubig Carnegie Mellon

current interactions on iron in the NINJA experiment Contents Toho Univ. , Nagoya Univ., Kobe

A First Look Franz Franchetti, Daniele G. Spampinato, Anuva Kulkarni, Tze Meng Low Carnegie

3. Java - Language Constructs I them properly You know how values are defined in the source code (

persistent memory semantics in programming languages overview of the ongoing research Piotr

Peter Rutgers, Claudio Martella, Spyros Voulgaris , Peter Boncz VU University Amsterdam Spyros

Welcome to CS50 section! This is Week 8. Please open your CS50 IDE and run this in your console:

Elasticsearch for dummies Jakub Kluvnek jakub.kluvanek@biddingfox.com @kluvi, @biddingfox_devs

Multiple Instance Detection Network with Online Instance Classifier Refinement Peng Tang

A Static Verification Framework for Message Passing in Go using Behavioural Types Julien Lange 1 ,

SEO TOOLS KEEPING A CHECK ON THE LATEST & GREATEST SEO TOOLS IN THE MARKET Introduction

Last Class Carnegie Mellon Univ. Catalog Dept. of Computer Science - PDF document

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS CMU SCS Last Class Carnegie Mellon Univ. Catalog Dept. of Computer Science Intro to Operator Evaluation 15-415/615 - DB Applications Typical Query Optimizer Projection/Aggregation C.

Carnegie Mellon University Search TRECVID 2004 Workshop November 2004 Mike Christel, Jun

Brendan Meeder Carnegie Mellon University Christos Faloutsos Carnegie Mellon University Given a

A First Look Franz Franchetti Carnegie Mellon University in collaboration with Daniele G.

Running Incomplete Programs Ian Voysey Carnegie Mellon University Cyrus Omar Carnegie Mellon

HookFinder: Identifying and Understanding Malware Hooking Behaviors Heng Yin Zhenkai Liang Dawn

Kensuke Kakiuchi (Nagoya Univ./ The Univ. of Tokyo) Collaborators: Takeru K. Suzuki (The Univ. of

Cache Lab Implementation and Blocking Slides courtesy of: Aditya Shah, CMU 1 Carnegie Mellon

From Carnegie Mellon to Kyoto: How Far Can We Go? Project Courses at Carnegie Mellon Involve

Feature Selection Matters for Anchor-Free Object Detection Chenchen Zhu Carnegie Mellon

SPIRAL, FFTX, and the Path to SpectralPACK Franz Franchetti Carnegie Mellon University

15-213 Recitation: Attack Lab Jenna MacCarley 28 Sep 2015 Carnegie Mellon Reminder Bomb lab

15-213 Recitation: Bomb Lab 21 Sep 2015 Monil Shah, Shelton DSouza Carnegie Mellon Agenda

for BlueGene/P Franz Franchetti 1 , Yevgen Voronenko 2 , Gheorghe Almasi 3 1 Carnegie Mellon

More is Less? Non-parametric Language Models and Efficiency Graham Neubig Carnegie Mellon

current interactions on iron in the NINJA experiment Contents Toho Univ. , Nagoya Univ., Kobe

A First Look Franz Franchetti, Daniele G. Spampinato, Anuva Kulkarni, Tze Meng Low Carnegie

3. Java - Language Constructs I them properly You know how values are defined in the source code (

persistent memory semantics in programming languages overview of the ongoing research Piotr

Peter Rutgers, Claudio Martella, Spyros Voulgaris , Peter Boncz VU University Amsterdam Spyros

Welcome to CS50 section! This is Week 8. Please open your CS50 IDE and run this in your console:

Elasticsearch for dummies Jakub Kluvnek jakub.kluvanek@biddingfox.com @kluvi, @biddingfox_devs

Multiple Instance Detection Network with Online Instance Classifier Refinement Peng Tang

A Static Verification Framework for Message Passing in Go using Behavioural Types Julien Lange 1 ,

SEO TOOLS KEEPING A CHECK ON THE LATEST &amp; GREATEST SEO TOOLS IN THE MARKET Introduction

SEO TOOLS KEEPING A CHECK ON THE LATEST & GREATEST SEO TOOLS IN THE MARKET Introduction