CS 61: Database Systems Query optimization Adapted from - PowerPoint PPT Presentation

CS 61: Database Systems Query optimization Adapted from Silberschatz, Korth, and Sundarshan unless otherwise noted

Agenda 1. Query processing 2. Tips for fast queries 3. Explain (yourself) 2

Three typical non-database bottlenecks to performance: CPU, Ram, network I/O RAM: as much as possible CPU: as fast as possible Cache queries and data • in memory Less query processing • Network: and paging to disk You don’t want this • 3

Three typical non-database bottlenecks to performance: CPU, Ram, network I/O RAM: as much as possible CPU: as fast as possible Cache queries and data • in memory Less query processing • Network: and paging to disk Fast network • Fast disk (SAN) • 4

The query optimizer chooses the best execution plan for a given query High-level overview of SQL execution process Parse/Compile/ Replace Cache Execute Optimize placeholders Parse Cache Replace placeholders Execute Check syntax Store optimized Prepared statement Query is executed • • • • Check table and query plan in are not complete Data is fetched from • • columns exist cache statements disk and returned to If command Have placeholders user • • Compile submitted for some values Convert query • again, skip prior But, format of • to machine steps (already command is set now code done) Placeholders filled • Optimize with literal values Choose optimal • Place holder data • execution plan doesn’t change command format 5 Adapted from: http://javabypatel.blogspot.com/2015/09/how-prepared-statement-in-java-prevents-sql-injection.html

The query optimizer chooses the best execution plan for a given query High-level overview of SQL execution process Parse/Compile/ Replace Cache Execute Optimize placeholders Parse Cache Replace placeholders Execute Check syntax Store optimized Prepared statement Query is executed • • • • Check table and query plan in are not complete Data is fetched from • • columns exist cache statements disk and returned to If command Have placeholders user • • Compile submitted for some values Convert query • again, skip prior But, format of • to machine steps (already command is set now code done) Placeholders filled • Optimize with literal values Choose optimal • Place holder data • execution plan doesn’t change command format 6 Adapted from: http://javabypatel.blogspot.com/2015/09/how-prepared-statement-in-java-prevents-sql-injection.html

The database keep statistics to help the optimizer make smart decisions Database Tables Indexes Environment • Number of rows/disk Number and name Logical and • • blocks used of columns in the physical disk • Number of columns index key block size in each row Number of distinct Location and • • • Min/Max value in key values in the size of data files each column index key CPU speed • • Which columns have Histogram of key Disk • • indexes values in an index throughput Number of disk speed • blocks used by the RAM available • index 7 Adapted from Coronel and Morris

Optimizer approaches: rule-based and cost-based Rule-based optimizer: Cost-based optimizer: Uses preset rules and cost points Uses algorithms based on statistics • • to determine the best approach about objects being accessed to to execute a query determine the best approach to Rules assign a fixed cost to each execute a query • SQL operation Adds up the total SQL operation cost • I/O costs • Processing costs • Resource costs (RAM and • temporary space) 8

Cost-based example: multiple ways to execute the same query Want data from both tables, so will require a JOIN Optimizer Products table Vendors table Knows 7,000 rows 300 rows Estimates NH products: 1,000 NH vendors: 10 Optimizer tries to determine best way to execute query • Book gives detailed analysis of cost • I will focus on I/O operations 9 • Two ways this query could be executed Adapted from Coronel and Morris

Two options to execute the query Two options 1) JOIN first, then SELECT NH 2) SELECT NH first, then JOIN Optimizer 𝜏 VendorState=‘NH’ ⋈ p.VendorID = v.VendorID must choose which / \ | approach is Products 𝜏 VendorState=‘NH’ (Vendors) ⋈ p.VendorID = v.VendorID better / \ Products Vendors 10 Adapted from Coronel and Morris

Option 1: JOIN first, then SELECT 1) JOIN first, then SELECT NH Products: 7,000 rows NH products: 1,000 Vendors: 300 rows 𝜏 VendorState=‘NH’ ( 𝜏 p.VendorID = v.VendorID (Products X Vendors)) NH vendors: 10 Remember Step Operations Read I/O Write I/O Total I/O from Relational Ops Ops Ops Algebra, a JOIN 1 Cartesian Product (Product 7,000 + 300 2,100,000 2,107,300 is a Cartesian x Vendor) = 7,300 Product 2 Select rows from Step 1 2,100,000 7,000 2,107,000 followed by a with same vendor codes SELECT 3 Select rows from Step 2 7,000 1,000 8,000 with State = NH Total 2,114,300 2,108,000 4,222,300 11 Adapted from Coronel and Morris

Option 2: SELECT first, then JOIN 2) SELECT NH first, then JOIN Products: 7,000 rows This example considers NH products: 1,000 only I/O cost, the book is more precise Vendors: 300 rows 𝜏 p.VendorID = v.VendorID (Products X 𝜏 VendorState=‘NH’ (Vendors)) NH vendors: 10 Step Operations Read I/O Write I/O Total I/O Option 1: Ops Ops Ops 4,222,300 1 Select rows in Vendor with 300 10 310 Option 2: State = ‘NH’ 148,320 2 Cartesian product Products 7,000 + 10 70,000 77,010 x Step 1 = 7,010 Optimizer picks 3 Select rows in Step 2 with 70,000 1,000 71,000 Option 2 as same vendor codes execution plan (28 times Total 77,310 71,010 148,320 smaller) 12 Adapted from Coronel and Morris

Majority of performance problems are related to poorly written SQL code A carefully written query almost always outperforms a poorly written query • When possible, use simple columns or literals as operands; try to avoid using conditional expressions with functions • Numeric field comparisons are faster than character, date, and NULL comparisons • Equality comparisons are faster than inequality comparisons • When using multiple AND conditions, write the condition most likely to be false first (take advantage of short circuiting) • When using multiple OR conditions, write the condition most likely to be true first (short circuiting again) • Avoid the use of NOT logical operator (NOT Price>10 becomes Price <= 10) • For text matching, use ‘A%’ not ‘%A%’ if possible • Consider your index use! 14 Adapted from Coronel and Morris

Consider your index use Index considerations Indices speed up reads, but slow down writes Reads need only scan rows meeting criteria, not full table scan • Writes must update tables as well as (possibly) index • Impractical to put index on every attribute Take up too much memory • Performance hit • Considerations for indices: Use when attribute used in WHERE, HAVING, ORDER BY, or GROUP BY • clauses of frequently run queries Do not use on small tables • Do not use with low cardinality (small number of unique values) • Declare PK and FK so optimizer can use indexes on JOINs • (automatically done by MySQL) Declare indices for non-prime attributes used in JOINs • 15 Drop infrequently used indices •

Often indexes can increase SQL read performance significantly Show indexes YES if column can be NULL Key name Primary key always called PRIMARY Can the optimizer Cardinality: Column name use this Estimated number Table name index? of unique values Type of index: Collation: how 1 if can contain duplicates BTREE (default) sorted 0 otherwise Null, entire HASH A = ascending Column sequence column indexed D = descending number in index (first otherwise starts at 1 for multi- number of column indexes) indexed characters 17 Adapted from: https://dev.mysql.com/doc/mysql-infoschema-excerpt/5.7/en/statistics-table.html

Indices can be created on multiple attributes Optimizer can use index on left most prefix • Can use on Boro • Can use on Boro and ZipCode Cannot use on just ZipCode • (left most not met) 18

EXPLAIN tells you how MySQL is using EXPLAIN tells you how MySQL is using indices indices Possible indices Indices used There are 26,573 rows in Restaurants table Using index, execution plan only estimates scanning 13,279 rows; does not do a full table scan But there are only 10,649 rows in Manhattan MySQL uses estimates from table statistics to guess how many rows it will need to process 19 17

EXPLAIN tells you how MySQL is using indices Full table scan if only use ZipCode ZipCode is the second index, not part of the left most Remember unique rows (the Cardinality) is MySQL’s estimate, may not be exact Can use ANALYZE TABLE <name> to get updated key distribution and cardinality statistics from random sample (just an estimate, not an exact count) Optimizer may use selectivity and cardinality to determine where to use index on 20 JOIN operations

EXPLAIN tells you how MySQL is using indices Using both index attributes scans of only 201 rows 21

CS 61: Database Systems Query optimization Adapted from - PowerPoint PPT Presentation

CS 61: Database Systems Query optimization Adapted from Silberschatz, Korth, and Sundarshan unless otherwise noted Agenda 1. Query processing 2. Tips for fast queries 3. Explain (yourself) 2 Three typical non-database bottlenecks to

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

Database Systems Database Systems 1 Creating a Database System Design Construction

Overview of Database Systems CS3860 - Jay Urbain, PhD Introduction to Database Systems 1 2 UFR

National Address Database National Address Database What is a National Address Database?

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

Introduction to Database Systems Database Systems Lecture 1 Natasha Alechina www.cs.nott.ac.uk/

Advanced Database Management Systems Database Management Systems Alvaro A A Fernandes School of

DATABASE SYSTEMS Introduction to MySQL Database System Course, 2016 AGENDA FOR TODAY

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

CSE 132B CSE 132B Database Systems Applications Database Systems Applications Alin Deutsch

Chapter 19 Linear Programming II NEW CS 473: Theory II, Fall 2015 October 29, 2015 19.1 The

An inequality between the edge-Wiener index and the Wiener index of a graph A. Tepeh joint work

N=1 Duality and the Superconformal Index Jennifer Lin University of Chicago Based on work with

CS 473: Algorithms Chandra Chekuri Ruta Mehta University of Illinois, Urbana-Champaign Fall

Graphical-model based estimation and inference for differential privacy Ryan McKenna 1 , Daniel

Search and Inference in AI Planning Hctor Geffner ICREA & Universitat Pompeu Fabra

TYPE INFERENCE Franois Pottier The Programming Languages Mentoring Workshop @ ICFP August 30,

http://cs224w.stanford.edu We are more influenced by our friends than strangers 68% of

CS 61: Database Systems Query optimization Adapted from - PowerPoint PPT Presentation

CS 61: Database Systems Query optimization Adapted from Silberschatz, Korth, and Sundarshan unless otherwise noted Agenda 1. Query processing 2. Tips for fast queries 3. Explain (yourself) 2 Three typical non-database bottlenecks to

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

Database Systems Database Systems 1 Creating a Database System Design Construction

Overview of Database Systems CS3860 - Jay Urbain, PhD Introduction to Database Systems 1 2 UFR

National Address Database National Address Database What is a National Address Database?

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

Introduction to Database Systems Database Systems Lecture 1 Natasha Alechina www.cs.nott.ac.uk/

Advanced Database Management Systems Database Management Systems Alvaro A A Fernandes School of

DATABASE SYSTEMS Introduction to MySQL Database System Course, 2016 AGENDA FOR TODAY

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

CSE 132B CSE 132B Database Systems Applications Database Systems Applications Alin Deutsch

Chapter 19 Linear Programming II NEW CS 473: Theory II, Fall 2015 October 29, 2015 19.1 The

An inequality between the edge-Wiener index and the Wiener index of a graph A. Tepeh joint work

N=1 Duality and the Superconformal Index Jennifer Lin University of Chicago Based on work with

CS 473: Algorithms Chandra Chekuri Ruta Mehta University of Illinois, Urbana-Champaign Fall

Graphical-model based estimation and inference for differential privacy Ryan McKenna 1 , Daniel

Search and Inference in AI Planning Hctor Geffner ICREA &amp; Universitat Pompeu Fabra

TYPE INFERENCE Franois Pottier The Programming Languages Mentoring Workshop @ ICFP August 30,

http://cs224w.stanford.edu We are more influenced by our friends than strangers 68% of

Search and Inference in AI Planning Hctor Geffner ICREA & Universitat Pompeu Fabra