Introduction Query Execution Engine Implements a set of physical - PDF document

Overview of Query Optimization in Relational Systems • An overview of current query optimization techniques Overview of Query Optimization • Provides fundamentals of query in Relational Systems optimization Presenter: Albert Wong Discussion: Stephen Ingram Introduction Query Execution Engine • Implements a set of physical operators • 2 key components of query evaluation • A physical operator takes as input one or more component of a SQL database system data streams and produces an output data – Query optimizer stream – Query execution engine – Ex. (external) sort, sequential scan, index scan, nested loop join, sort-merge join – pieces of code that are used as building blocks to make possible the execution of SQL queries – responsible for the execution of the operator tree (execution plan) that results in generating answers to the query Example Operator Tree Query Optimizer • Responsible for generating the input for the execution engine • Takes a parsed representation of a SQL query as input • Responsible for generating an efficient execution plan for the given SQL query from the space of possible execution plans 1

The Key Idea: Query Optimization Goals of an Optimizer as a Search Problem • To solve problem, we need to provide: • Search space includes plans that have low cost – Search space – Cost estimation technique so that a cost may • Costing technique is accurate be assigned to each plan in the search space • Enumeration algorithm is efficient – Enumeration algorithm that can search through the execution space Search Space Commuting Between Operators • Generalized Join Sequencing • Depends on the set of algebraic transformations that preserve equivalence • Outer Join and Join and the set of physical operators – Join(R, S LOJ T) = Join(R, S) LOJ T supported in an optimizer • Group-By and Join • Transformations do not necessarily reduce cost and therefore must be applied in a cost-based manner by the enumeration algorithm to ensure a positive benefit Linear and Bushy Joins A Hairy Discussion • Does the formulation of a query affect the execution of that query? Can users optimize their queries' execution through better syntax? • Bushy Joins: Is it naive to just leave them out of the search? Why do we always only consider linear joins? When would this cause problems? How could we incorporate bushy joins into our search? 2

Multi-Block Query to Single-Block Statistics and Cost Estimation • Merging Views • Cost estimation must be accurate because optimization is only as good as its cost estimates – Q = Join(R,V) – View V = Join(S,T) • Must be efficient as it is repeatedly invoked by – Q = Join(R,Join(S,T) the optimizer • Merging Nested • Basic estimation framework Subqueries – collect statistical summaries of data stored – given an operator and statistical summaries of its input streams, determine • statistical summary of output data stream • estimated cost of executing the operation Statistical Summaries of Data A Statistical Discussion • Ex. number of tuples in table, number of physical • Some estimated statistics are provably pages used by table, statistical information on erroneous. Is it then worth estimating? If columns such as histograms so, what sort of strategy should we adopt • Can use sampling to determine histograms that when using estimates with known are accurate for a large class of queries problems? – estimating distinct values is provably error prone • Statistics must be propagated from base data to be useful – Can be difficult as assumptions must be made when propagating statistical summaries Cost Computation Enumeration Architectures • Enumeration algorithm must pick an inexpensive • CPU, I/O, (parallel or distributed systems) execution plan for a given query by exploring the communication costs are all factors in cost search space estimation • Enumerators tend to concentrate on linear join sequences rather than bushy join sequences • Difficult to determine best cost estimator due to the size of the search space including bushy join sequences • Statistical summary propagation and • Want to build enumerator that can adapt to accurate cost estimation are difficult open changes in search space issues in query optimization – New transformations – Addition of new physical operators – Changes in cost estimation techniques 3

Extensible Optimizers Materialized Views • Use of generalized cost functions and • Views cached by database system physical properties with operator nodes • Query can take advantage of materialized views to reduce the cost of executing the query • Use of a rule engine that allows • Problems transformations to modify the query – Reformulating query to take advantage of expression or the operator trees materialized views (general problem is undecidable) • Exposed “knobs” to tune behavior of – Determining effective sufficient conditions is nontrivial system • Ex. Starburst and Volcano/Cascades Summary of Chaudhuri’s Paper Ending Discussions • Most of us have decided that the Relational Model is the • Query optimization considered a search way to go. These papers, however, show that under the problem whose solution requires a search hood are some scary problems and black magic. Is this surprising? Why (not)? space, cost estimation technique, and an • This paper is from 98, System R paper is from ~ 20 enumeration algorithm years earlier. How has query optimization changed in that span of time? Is the amount/direction of progress • Query optimization can be considered an surprising? Do you expect much change in the years art since the printing of this paper? • What other areas of Computer Science is query • No one knows what the best execution optimization like? Could it benefit from ideas from other plan for a given query is areas? How? 4

Introduction Query Execution Engine Implements a set of physical - PDF document

Overview of Query Optimization in Relational Systems An overview of current query optimization techniques Overview of Query Optimization Provides fundamentals of query in Relational Systems optimization Presenter: Albert Wong

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Hadoop Map Reduce 1 MapReduce 2-in-1 A programming paradigm A query execution engine A kind

Physical Operators Scanning, sorting, merging, hashing 193 Physical Operators Execution Query

Parallel Query Execution in POLARDB for MySQL ystein Grvlen Benny Wang Alibaba Cloud Agenda

MASTERING STRATEGY EXECUTION 18 BEST PRACTICES FOR STRATEGY EXECUTION STRATEGY EXECUTION AS

Module 13: Optimizing Query Performance Overview Introduction to the Query Optimizer

Search Engine Optimization What is Search Engine Optimization Search Engine Optimization is the

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

Cost-Based Plan Selection Enumerate, Estimate, Select 304 Cost-Based Plan Selection Execution

Hadoop Map Reduce 01/18/2018 1 MapReduce 2-in-1 A programming paradigm A query execution

DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ LECTURE #17: QUERY

The isomorphism and bi-embeddability relations for finitely generated groups Simon Thomas

Topological aspects of enumeration degrees Takayuki Kihara Department of Mathematics, University

Point degree spectra of represented spaces Arno Pauly Swansea University Joint work with

The Presentaion-Based Paper The Paper A Top-Down Row Enumeration Approach of Top-Down

Learning Equivalence Structures Luca San Mauro (Vienna University of Technology) Logic Colloquium

Inference in Bayesian Networks CE417: Introduction to Artificial Intelligence Sharif University

G22.2390-001 Logic in Computer Science Fall 2009 Lecture 3 1 Review Last week

Optimizing Procedure Calls Inlining Procedure calls can be costly (A.k.a. procedure integration,