Query Optimization Through the Looking Glass Some Lessons From Building an LLVM-Based Query Compiler
Viktor Leis
Technische Universität München
[Figure: query optimization overview. A SQL query (SELECT ... FROM R,S,T WHERE ...) is turned into an execution plan combining hash joins (HJ) and index nested-loop joins (INL) over R, S, and T. The three optimizer components: cardinality estimation, cost model, plan space enumeration.]
◮ How good are cardinality estimators?
◮ When do bad estimates lead to slow queries?
◮ How important is an accurate cost model for the overall query optimization process?
◮ (How large does the enumerated plan space need to be?)
◮ Internet Movie Data Base data set (imdb.com)
◮ around 4 GB, 21 relations
◮ information about movies and related facts about actors, directors, production companies, etc.
◮ publicly available for non-commercial use
◮ like all real-world data sets, full of join-crossing correlations
SELECT cn.name, mi.info AS rating, miidx.info AS reldate
FROM company_name cn, company_type ct, info_type it, info_type it2, title t,
     kind_type kt, movie_info mi, movie_companies mc, movie_info_idx miidx
WHERE cn.country_code = '[us]' AND it.info = 'rating'
  AND ct.kind = 'production companies' AND kt.kind = 'movie'
  AND it2.info = 'release dates' AND ...
[Figure: join graph over title, movie_info_idx, movie_companies, movie_info, kind_type, company_type, company_name, and info_type.]
◮ cardinality extraction: run the queries through each system (and PostgreSQL) to obtain cardinality estimates
◮ cardinality injection into PostgreSQL
◮ q-error: max(e/r, r/e) (over-/underestimation factor)
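The q-error is symmetric in over- and underestimation; a small Python sketch (with hypothetical numbers) makes this concrete:

```python
def q_error(estimate: float, true_card: float) -> float:
    """q-error: max(e/r, r/e), the factor by which an estimate is off,
    regardless of direction."""
    return max(estimate / true_card, true_card / estimate)

# Hypothetical example: estimating 100 tuples when 1000 are produced
print(q_error(100, 1000))  # 10.0, a 10x underestimate
print(q_error(1000, 100))  # 10.0, a 10x overestimate scores the same
```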
q-error percentiles per system:

             median    90th    95th      max
PostgreSQL     1.00    2.08    6.10      207
DBMS A         1.01    1.33    1.98     43.4
DBMS B         1.00    6.03    30.2   104000
DBMS C         1.06    1677    5367    20471
HyPer          1.02    4.47    8.00     2084
[Figure: box plots (5th/25th/median/75th/95th percentiles) of join-estimate q-errors by number of joins (1-6), log scale, underestimation vs. overestimation; first for PostgreSQL alone, then for PostgreSQL, DBMS A, DBMS B, DBMS C, and HyPer.]
◮ performance with true cardinalities vs. PostgreSQL's estimates
[Figure: share of queries (0-60%) by slowdown factor relative to the true-cardinality plan, buckets [0.3,0.9), [0.9,1.1), [1.1,2), [2,10), [10,100), >100; panels: default, + no nested-loop join, + rehashing.]
[Figure: same histogram, comparing PK indexes vs. PK + FK indexes.]
[Figure: predicted cost vs. runtime [ms] (both log scale) for the standard, tuned, and simple cost models, each with PostgreSQL estimates and with true cardinalities.]
◮ cardinality estimates are quite bad for all tested systems
◮ nested-loop joins are dangerous
◮ the more indexes are available, the more difficult it becomes to find the optimal plan
◮ the cost model is much less important than cardinality estimates
The paper (“How Good Are Query Optimizers, Really?”) is available at: www.vldb.org/pvldb/vol9/p204-leis.pdf
◮ research project started at TU Munich in 2010 by Thomas Neumann and Alfons Kemper
◮ relational main-memory DBMS
◮ SQL dialect is (mostly) PostgreSQL compatible, follows the PostgreSQL server protocol
◮ goal: similar performance as hand-written C programs
◮ startup acquired by Tableau Software in 2016
select * from R1, R3,
  (select R2.z, count(*) from R2 where R2.y = 3 group by R2.z) R2
where R1.x = 7 and R1.a = R3.b and R2.z = R3.c
[Figure: operator tree. σ_x=7(R1) builds hash join ⋈_a=b; Γ_z;count(*)(σ_y=3(R2)) builds hash join ⋈_z=c; R3 probes ⋈_z=c, whose output probes ⋈_a=b.]
initialize memory of ⋈_a=b, ⋈_z=c, and Γ_z
for each tuple t in R1
  if t.x = 7
    materialize t in hash table of ⋈_a=b
for each tuple t in R2
  if t.y = 3
    aggregate t in hash table of Γ_z
for each tuple t in Γ_z
  materialize t in hash table of ⋈_z=c
for each tuple t3 in R3
  for each match t2 in ⋈_z=c[t3.c]
    for each match t1 in ⋈_a=b[t3.b]
      output t1 ◦ t2 ◦ t3
◮ each operator implements the following interface:
  ◮ produce(): generate code for that operator (and its children)
  ◮ consume(attributes, source): generate code that receives a tuple from its input
scan.produce():
  print "for each tuple in relation"
  scan.parent.consume(attributes, scan)

σ.produce():
  σ.input.produce()
σ.consume(a, s):
  print "if " + σ.condition
  σ.parent.consume(attr, σ)

⋈.produce():
  ⋈.left.produce()
  ⋈.right.produce()
⋈.consume(a, s):
  if (s == ⋈.left)
    print "materialize tuple in hash table"
  else
    print "for each match in hashtable[" + a.joinattr + "]"
    ⋈.parent.consume(a + new attributes)
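The produce/consume interface above can be sketched as a small runnable Python toy that emits pseudocode lines instead of LLVM IR. The operator and method names follow the slides; the `code` list, the `Print` root operator, and the concrete plan are illustrative additions:

```python
code = []  # the "generated program", one pseudocode line per entry

class Scan:
    def __init__(self, relation):
        self.relation, self.parent = relation, None
    def produce(self):
        # a scan opens the pipeline loop, then pushes tuples upward
        code.append(f"for each tuple t in {self.relation}")
        self.parent.consume(self)

class Select:
    def __init__(self, input_op, condition):
        self.input, self.condition, self.parent = input_op, condition, None
        input_op.parent = self
    def produce(self):
        self.input.produce()  # selections emit no loop of their own
    def consume(self, source):
        code.append(f"  if {self.condition}")
        self.parent.consume(self)

class HashJoin:
    def __init__(self, left, right, attr):
        self.left, self.right, self.attr = left, right, attr
        self.parent = None
        left.parent = right.parent = self
    def produce(self):
        self.left.produce()   # build-side pipeline
        self.right.produce()  # probe-side pipeline
    def consume(self, source):
        if source is self.left:
            code.append("  materialize tuple in hash table")  # pipeline breaker
        else:
            code.append(f"  for each match in hashtable[{self.attr}]")
            self.parent.consume(self)

class Print:  # illustrative root operator
    def __init__(self, input_op):
        input_op.parent = self
    def consume(self, source):
        code.append("    output tuple")

# plan: sigma_{x=7}(R1) joins R3 on a=b
join = HashJoin(Select(Scan("R1"), "t.x = 7"), Scan("R3"), "a=b")
Print(join)
join.produce()
print("\n".join(code))
```

Note how tuples never hit memory between pipeline breakers: the scan, filter, and hash-table insert of the build side fuse into one loop, as do the probe-side scan, probe, and output.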
◮ compilation to LLVM Intermediate Representation (IR) ("machine-independent assembler") using the C++ API
◮ attributes are kept in CPU registers as far as possible
◮ generating code can be tedious, but many compile-time abstractions help, e.g.:
  ◮ high-level control flow constructs (if, for)
  ◮ SQL value abstraction that handles null semantics, overflow checking, etc.
◮ these abstractions do not cost any runtime
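As a hedged illustration of such a SQL value abstraction: HyPer realizes it as zero-overhead C++ compile-time constructs, while this Python class (with the made-up name `SqlInt`) only mimics the null-propagation semantics at runtime:

```python
class SqlInt:
    """Illustrative nullable integer: any arithmetic involving NULL
    yields NULL, as SQL requires. (Overflow checking and three-valued
    comparison logic are elided for brevity.)"""
    def __init__(self, value=None):
        self.value = value  # None encodes SQL NULL

    @property
    def is_null(self):
        return self.value is None

    def __add__(self, other):
        # NULL propagates through arithmetic
        if self.is_null or other.is_null:
            return SqlInt(None)
        return SqlInt(self.value + other.value)

    def __repr__(self):
        return "NULL" if self.is_null else str(self.value)

print(SqlInt(1) + SqlInt(2))     # 3
print(SqlInt(1) + SqlInt(None))  # NULL
```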
◮ for pragmatic and code-size reasons, some operators are partially/mostly implemented in C++
TPC-H: PostgreSQL 9.6 vs. HyPer compilation (comp.) and execution (exec.) times at optimization levels O0 and O1:

               O0              O1
#  PG 9.6   comp.  exec.   comp.  exec.
1    4908       6    161      42     77
2     254      23     13     149      8
3    1258      10    104      69     80
4     193       7     67      47     45
5     516      15     60     104     37
◮ LLVM IR as a target language is great:
  ◮ stable
  ◮ portable
  ◮ generates efficient machine code
◮ for large queries, compilation times can become a problem
“Efficiently Compiling Efficient Query Plans”:
www.vldb.org/pvldb/vol4/p539-neumann.pdf
“Compiling Database Queries into Machine Code”:
sites.computer.org/debull/A14mar/p3.pdf
all HyPer papers: www.hyper-db.com
interested in doing a PhD at TU Munich? contact us (leis@in.tum.de)