Database Systems II Query Compiler CMPT 454, Simon Fraser - PDF document

Database Systems II Query Compiler CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 183 Introduction The Query Compiler translates an SQL query into a physical query plan, which can be executed, in three steps: The query is parsed and represented as a parse tree . The parse tree is converted into a relational algebra expression tree ( logical query plan ). The logical query plan is refined into a physical query plan , which also specifies the algorithms used in each step and the way in which data is obtained. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 184

Introduction SQL query parse parse tree convert answer logical query plan execute query rewrite Pi “improved” l.q.p pick best estimate result sizes {(P1,C1),(P2,C2)...} l.q.p. +sizes estimate costs consider physical plans {P1,P2,…..} CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 185 Introduction Example SELECT B,D FROM R,S WHERE R.A = “c” S.E = 2 R.C=S.C Conceptual evaluation strategy: Perform cartesian product, Apply selection, and Project to specified attributes. Use as starting point for optimization. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 186

Introduction Example B,D R.A =“c” S.E=2 R.C=S.C X R S B,D [ R.C = S.C (RXS)] R.A=“c” S.E=2 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 187 Introduction Example B,D natural join R.A = “c” S.E = 2 R S This logical query plan is equivalent. It is more efficient, since it reduces the sizes of the intermediate tables. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 188

Introduction Example Needs to be refined into physical query plan. E.g., use R.A and S.C indexes as follows: (1) Use R.A index to select R tuples with R.A = “c” (2) For each R.C value found, use S.C index to find matching tuples (3) Eliminate S tuples S.E 2 (4) Join matching R,S tuples, project B,D attributes and place in result CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 189 Parsing Parse Trees Nodes correspond to either atoms (terminal symbols) or syntactic categories (non-terminal symbols). An atom is a lexical element such as a keyord, name of an attribute or relation, constant, operator, parenthesis. A syntactic category denotes a family of query subparts that all play the same role within a query, e.g. Condition. Syntactic categories are enclosed in triangular brackets, e.g. <Condition>. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 190

Parsing Example SELECT title FROM StarsIn WHERE starName IN (SELECT name FROM MovieStar WHERE birthdate LIKE „%1960‟); CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 191 Parsing <Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Tuple> IN <Query> title StarsIn <Attribute> ( <Query> ) starName <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Attribute> LIKE <Pattern> name MovieStar birthDate „%1960‟ CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 192

Parsing Grammar for SQL The following grammar describes a simple subset of SQL. Queries <Query>::= SELECT <SelList> FROM <FromList> WHERE <Condition> ; Selection lists <SelList>::= <Attribute>, <SelList> <SelList>::= <Attribute> From lists <FromList>::= <Relation>, <FromList> <FromList>::= <Relation> CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 193 Parsing Grammar for SQL Conditions <Condition>::= <Condition> AND <Condition> <Condition>::= <Attribute> IN (<Query>) <Condition>::= <Attribute> = <Attribute> <Condition>::= <Attribute> LIKE <Pattern> Syntactic categories Relation and Attribute are not defined by grammar rules, but by the database schema. Syntactic category Pattern defined as some regular expression. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 194

Conversion to Query Plan How to convert a parse tree into a logical query plan, i.e. a relational algebra expression? Queries with conditions without subqueries are easy: Form Cartesian product of all relations in <FromList>. Apply a selection c where C is given by <Condition>. Finally apply a projection L where L is the list of attributes in <SelList>. Queries involving subqueries are more difficult. Remove subqueries from conditions and represent them by a two-argument selection in the logical query plan. See the textbook for details. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 195 Algebraic Laws for Query Plans Introduction Algebraic laws allow us to transform a Relational Algebra (RA) expression into an equivalent one. Two RA expressions are equivalent if, for all database instances, they produce the same answer. The resulting expression may have a more efficient physical query plan. Algebraic laws are used in the query rewrite phase. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 196

Algebraic Laws for Query Plans Introduction Commutative law : Order of arguments does not matter. x + y = y + x Associative law : May group two uses of the operator either from the left or the right. (x + y) + z = x + (y + z) Operators that are commutative and associative can be grouped and ordered arbitrarily. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 197 Algebraic Laws for Query Plans Natural Join, Cartesian Product and Union R S = S R (R S) T = R (S T) R x S = S x R (R x S) x T = R x (S x T) R U S = S U R R U (S U T) = (R U S) U T CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 198

Algebraic Laws for Query Plans Natural Join, Cartesian Product and Union R S = S R To prove this law, need to show that any tuple resulting from the left side expression is also produced by the right side expression, and vice versa. Suppose tuple t is in R S. There must be tuples r in R and s in S that agree with t on all shared attributes. If we evaluate S R, tuples s and r will again result in t. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 199 Algebraic Laws for Query Plans Natural Join, Cartesian Product and Union R S = S R Note that the order of attributes within a tuple does not matter (carry attribute names along). Relation as bag of tuples According to the same reasoning, the number of copies of t must be identical on both sides. The other direction of the proof is essentially the same, given the symmetry of S and R. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 200

Algebraic Laws for Query Plans Selection p1 p2 (R) = p1 [ p2 (R)] p1vp2 (R) = [ p1 (R)] U [ p2 (R)] p1 [ p2 (R)] = p2 [ p1 (R)] Simple conditions p1 or p2 may be pushed down further than the complex condition. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 201 Algebraic Laws for Query Plans Bag Union What about the union of relations with duplicates (bags)? R = {a,a,b,b,b,c} S = {b,b,c,c,d} R U S = ? Number of occurrences either SUM or MAX of occurrences in the imput relations. SUM: R U S = {a,a,b,b,b,b,b,c,c,c,d} MAX: R U S = {a,a,b,b,b,c,c,d} CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 202

Algebraic Laws for Query Plans Selection p1 v p2 (R) = p1 (R) U p2 (R) MAX implementation of union makes rule work. R={a,a,b,b,b,c} p1 satisfied by a,b, p2 satisfied by b,c p1 v p2 (R) = {a,a,b,b,b,c} p1 (R) = {a,a,b,b,b} p2 (R) = {b,b,b,c} p1 (R) U p2 (R) = {a,a,b,b,b,c} CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 203 Algebraic Laws for Query Plans Selection p1 v p2 (R) = p1 (R) U p2 (R) SUM implementation of union makes more sense. Senators (……) Reps (……) T1 = yr,state Senators, T2 = yr,state Reps T1 Yr State T2 Yr State 97 CA 99 CA 99 CA 99 CA 98 AZ 98 CA Union? Use SUM implementation, but then some laws do not hold. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 204

Algebraic Laws for Query Plans Selection and Set Operations p (R U S) = p (R) U p (S) p (R - S) = p (R) - S = p (R) - p (S) CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 205 Algebraic Laws for Query Plans Selection and Join p: predicate with only R attributes q: predicate with only S attributes m: predicate with attributes from R and S [ p (R)] S p (R S) = R [ q (S)] q (R S) = CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 206

Algebraic Laws for Query Plans Selection and Join p q (R S) = [ p (R)] [ q (S)] p q m (R S) = m [ ( q S) ] p R) ( pvq (R S) = [ ( p R) S ] U [ R q S) ] ( CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 207 Algebraic Laws for Query Plans Selection and Join p q (R S) = p [ q (R S) ] = p [ R q (S) ] = [ p (R)] [ q (S)] CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 208

Database Systems II Query Compiler CMPT 454, Simon Fraser - PDF document

Database Systems II Query Compiler CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 183 Introduction The Query Compiler translates an SQL query into a physical query plan, which can be executed, in three steps: The query is parsed

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

Database Systems Database Systems 1 Creating a Database System Design Construction

Overview of Database Systems CS3860 - Jay Urbain, PhD Introduction to Database Systems 1 2 UFR

National Address Database National Address Database What is a National Address Database?

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

Introduction to Database Systems Database Systems Lecture 1 Natasha Alechina www.cs.nott.ac.uk/

Advanced Database Management Systems Database Management Systems Alvaro A A Fernandes School of

DATABASE SYSTEMS Introduction to MySQL Database System Course, 2016 AGENDA FOR TODAY

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

CSE 132B CSE 132B Database Systems Applications Database Systems Applications Alin Deutsch

Week 7 Binary Operations Discrete Math Marie Demlov http://math.feld.cvut.cz/demlova April

Then the left side is clearly \an y sequence of 0's and 1's. 1 The righ t

A sequent calculus for a semi-associative law 1 Noam Zeilberger University of Birmingham

tr rtt t

Lecture 2 Boolean Algebra and Circuits CS 230 - Spring 2020 1-1 Boolean Algebra Algebra

A Proof of the Group Properties of an Elliptic Curve David M. Russinoff ACL2 Workshop 2017 May

Real Numbers and their Properties Types of Numbers Z + Natural numbers - counting numbers - 1

Lecture 11: Hardware for Arithmetic Todays topics: Logic for common operations