Cost-based Query Sub-System Select * Queries From Blah B Where - PDF document

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos – A. Pavlo Lecture#13: Query Evaluation CMU SCS Today's Class • Catalog (12.1) • Intro to Operator Evaluation (12.2-3) • Typical Query Optimizer (12.6) • Projection: Sorting vs. Hashing (14.3.2) Faloutsos/Pavlo CMU SCS 15-415/615 2 CMU SCS Cost-based Query Sub-System Select * Queries From Blah B Where B.blah = blah Query Parser Query Optimizer Plan Plan Cost Catalog Manager Generator Estimator Schema Statistics Query Plan Evaluator 15-415/615 Faloutsos 3 1

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS Cost-based Query Sub-System Select * Queries From Blah B Where B.blah = blah Query Parser Query Optimizer Plan Plan Cost Catalog Manager Generator Estimator Schema Statistics Query Plan Evaluator 15-415/615 Faloutsos 4 CMU SCS Catalog: Schema • What would you store? – Info about tables, attributes, indices, users • How? – In tables! Attribute_Cat (attr_name: string , rel_name: string ; type: string ; position: integer ) Faloutsos/Pavlo CMU SCS 15-415/615 5 CMU SCS Catalog: Schema • What would you store? – Info about tables, attributes, indices, users • How? – In tables! Attribute_Cat (attr_name: string , rel_name: string ; type: string ; position: integer ) See INFORMATION_SCHEMA discussion from Lecture #7 Faloutsos/Pavlo CMU SCS 15-415/615 6 2

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS Catalog: Statistics • Why do we need them? – To estimate cost of query plans • What would you store? – NTuples(R): # records for table R – NPages(R): # pages for R – NKeys(I): # distinct key values for index I – INPages(I): # pages for index I – IHeight(I): # levels for I – ILow(I), IHigh(I) : range of values for I Faloutsos/Pavlo CMU SCS 15-415/615 7 CMU SCS Catalog: Statistics • Why do we need them? – To estimate cost of query plans • What would you store? – NTuples(R): # records for table R – NPages(R): # pages for R – NKeys(I): # distinct key values for index I – INPages(I): # pages for index I – IHeight(I): # levels for I – ILow(I), IHigh(I) : range of values for I Faloutsos/Pavlo CMU SCS 15-415/615 8 CMU SCS Today's Class • Catalog (12.1) • Intro to Operator Evaluation (12.2-3) • Typical Query Optimizer (12.6) • Projection: Sorting vs. Hashing (14.3.2) Faloutsos/Pavlo CMU SCS 15-415/615 9 3

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS Query Plan Example SELECT cname, amt FROM customer, account WHERE customer.acctno = account.acctno AND account.amt > 1000 Relational Algebra: p cname, amt ( s amt>1000 (customer ⋈ account )) Faloutsos/Pavlo CMU SCS 15-415/615 10 CMU SCS Query Plan Example SELECT cname, amt FROM customer, account WHERE customer.acctno = account.acctno AND account.amt > 1000 p cname, amt s amt>1000 ⨝ acctno=acctno CUSTOMER ACCOUNT Faloutsos/Pavlo CMU SCS 15-415/615 11 CMU SCS Query Plan Example SELECT cname, amt FROM customer, account WHERE customer.acctno = account.acctno AND account.amt > 1000 “On -the- fly” p cname, amt “On -the- fly” s amt>1000 Nested Loop ⨝ acctno=acctno File Scan File Scan CUSTOMER ACCOUNT Faloutsos/Pavlo CMU SCS 15-415/615 12 4

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS Query Plan Example SELECT cname, amt FROM customer, account WHERE customer.acctno = account.acctno AND account.amt > 1000 p s Each operator iterates The output of each over its input and operator is the input performs some task. to the next operator. ⨝ CUSTOMER ACCOUNT Faloutsos/Pavlo CMU SCS 15-415/615 13 CMU SCS Operator Evaluation • Several algorithms are available for different relational operators. • Each has its own performance trade-offs. • The goal of the query optimizer is to choose the one that has the lowest “cost”. Next Class: How the DBMS finds the best plan. Faloutsos/Pavlo CMU SCS 15-415/615 14 CMU SCS Operator Execution Strategies • Indexing • Iteration (= seq. scanning) • Partitioning (sorting and hashing) Faloutsos/Pavlo CMU SCS 15-415/615 15 5

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS Access Paths • How the DBMS retrieves tuples from a table for a query plan. – File Scan (aka Sequential Scan) – Index Scan (Tree, Hash, List, …) • Selectivity of an access path: – % of pages we retrieve – e.g., Selectivity of a hash index, on range query: 100% (no reduction!) Faloutsos/Pavlo CMU SCS 15-415/615 16 CMU SCS Operator Algorithms • Selection: • Projection: • Join: • Group By: • Order By: Faloutsos/Pavlo CMU SCS 15-415/615 17 CMU SCS Operator Algorithms • Selection: file scan; index scan • Projection: hashing; sorting • Join: • Group By: • Order By: Faloutsos/Pavlo CMU SCS 15-415/615 18 6

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS Operator Algorithms • Selection: file scan; index scan • Projection: hashing; sorting • Join: many ways (loops, sort-merge, etc) • Group By: • Order By: Faloutsos/Pavlo CMU SCS 15-415/615 19 CMU SCS Operator Algorithms • Selection: file scan; index scan • Projection: hashing; sorting • Join: many ways (loops, sort-merge, etc) • Group By: hashing; sorting • Order By: sorting Faloutsos/Pavlo CMU SCS 15-415/615 20 CMU SCS Operator Algorithms Next Class • Selection: file scan; index scan Today • Projection: hashing; sorting Next Class • Join: many ways (loops, sort-merge, etc) Today • Group By: hashing; sorting Today • Order By: sorting Faloutsos/Pavlo CMU SCS 15-415/615 21 7

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS Today's Class • Catalog (12.1) • Intro to Operator Evaluation (12.2-3) • Typical Query Optimizer (12.6) • Projection: Sorting vs. Hashing (14.3.2) Faloutsos/Pavlo CMU SCS 15-415/615 22 CMU SCS Query Optimization • Bring query in internal form (eg., parse tree) • … into “canonical form” (syntactic q -opt) • Generate alternative plans. • Estimate cost for each plan. • Pick the best one. Faloutsos/Pavlo CMU SCS 15-415/615 23 CMU SCS Query Plan Example SELECT cname, amt FROM customer, account WHERE customer.acctno = account.acctno AND account.amt > 1000 p cname, amt s amt>1000 ⨝ acctno=acctno CUSTOMER ACCOUNT Faloutsos/Pavlo CMU SCS 15-415/615 24 8

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS Query Plan Example SELECT cname, amt FROM customer, account WHERE customer.acctno = account.acctno AND account.amt > 1000 p p cname, amt cname, amt s ⨝ acctno=acctno amt>1000 ⨝ s acctno=acctno amt>1000 CUSTOMER ACCOUNT CUSTOMER ACCOUNT Faloutsos/Pavlo CMU SCS 15-415/615 25 CMU SCS Today's Class • Catalog (12.1) • Intro to Operator Evaluation (12.2,3) • Typical Query Optimizer (12.6) • Projection: Sorting vs. Hashing (14.3.2) Faloutsos/Pavlo CMU SCS 15-415/615 26 CMU SCS Duplicate Elimination SELECT DISTINCT bname FROM account WHERE amt > 1000 • What does it do, in English? • How to execute it? p DISTINCT bname ( s amt>1000 (account )) Not technically correct because RA doesn’t have “DISTINCT” Faloutsos/Pavlo CMU SCS 15-415/615 27 9

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS Duplicate Elimination SELECT DISTINCT bname FROM account WHERE amt > 1000 p DISTINCT bname s Two Choices: amt>1000 • Sorting ACCOUNT • Hashing Faloutsos/Pavlo CMU SCS 15-415/615 28 CMU SCS Sorting Projection p acctno bname amt DISTINCT bname A-123 Redwood 1800 s A-789 Downtown 2000 amt>1000 A-123 Perry 1500 A-456 Downtown 1300 ACCOUNT acctno bname amt bname bname A-123 Redwood 1800 Redwood Downtown X A-789 Downtown 2000 Downtown Downtown A-123 Perry 1500 Perry Perry Filter Sort Remove A-456 Downtown 1300 Downtown Redwood Columns Eliminate Dupes Faloutsos/Pavlo CMU SCS 15-415/615 29 CMU SCS Alternative to Sorting: Hashing! • What if we don’ t need the order of the sorted data? – Forming groups in GROUP BY – Removing duplicates in DISTINCT • Hashing does this! – And may be cheaper than sorting! (why?) – But what if table doesn’ t fit in memory? Faloutsos/Pavlo CMU SCS 15-415/615 30 10

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS Hashing Projection • Populate an ephemeral hash table as we iterate over a table. • For each record, check whether there is already an entry in the hash table: – DISTINCT : Discard duplicate. – GROUP BY : Perform aggregate computation. • Two phase approach. Faloutsos/Pavlo CMU SCS 15-415/615 31 CMU SCS Phase 1: Partition • Use a hash function h 1 to split tuples into partitions on disk. – We know that all matches live in the same partition. – Partitions are “ spilled ” to disk via output buffers. • Assume that we have B buffers. Faloutsos/Pavlo CMU SCS 15-415/615 32 CMU SCS Phase 1: Partition p acctno bname amt DISTINCT bname A-123 Redwood 1800 s A-789 Downtown 2000 amt>1000 A-123 Perry 1500 A-456 Downtown 1300 ACCOUNT B-1 partitions Redwood acctno bname amt bname A-123 Redwood 1800 Redwood Downtown h 1 A-789 Downtown 2000 Downtown Downtown A-123 Perry 1500 Perry ⋮ Filter Remove Hash A-456 Downtown 1300 Downtown Columns Perry Faloutsos/Pavlo CMU SCS 15-415/615 33 11

Cost-based Query Sub-System Select * Queries From Blah B Where - PDF document

Faloutsos/Pavlo CMU - 15-415/615 CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#13: Query Evaluation CMU SCS Today's Class Catalog (12.1) Intro to Operator

Visualizing Model Architecture john.sekar@mssm.edu SASB `17 Kinetics ~ Reaction Rules Enz Sub

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Cost-based Query Optimization Christoph Koch Computing the cost of a query plan We know how

Cost-Based Plan Selection Enumerate, Estimate, Select 304 Cost-Based Plan Selection Execution

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

A Generic Mapping-based Query Translation A Generic Mapping-based Query Translation from SPARQL

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

TUTORIAL - TUTORIAL -ABC ABC TOTAL COST for a COST OBJECT TOTAL COST for a COST OBJECT

Query by Humming System Query by Humming System Dong In Lee Dong In Lee MA/MST 07 07

Cost Report Capital Cost Operating Cost (Up front cost) (Annual cost over time) Utilities

Cost Allocation Plans and Indirect Cost Rates Cost Allocation Plans and Indirect Cost Rates

Lessons Learned from Measuring Student Behavior During Class PERRY SAMSON UNIVERSITY OF

MORE THAN THE RIDE 2020 IMPACT REPORT STATE 101 - FUNDING OPERATING BUDGET (FY2020) CAPITAL

Characterizing Individual Behavior from Interaction History Patrick Perry NYU Stern Case Study:

The Globalization of the Cyber Market Sponsored By: The Globalization of the Cyber Market Visit

Renewables Opportunities Rob Glenn, John Randolph, Kim Briele,

#getmoving2020 getmoving2020.org Local Investment Team feedback Regional Transportation

Gaussian Process Regression with Perry Groot , Peter Lucas Censored Data Using Expectation

Emacs: The Editor for the Next Forty Years or... Some M-x info on Future-Proofing Emacs Perry E.