Lect ure # 03 ADVANCED DATABASE SYSTEMS Query Compilation @ - PowerPoint PPT Presentation

Lect ure # 03 ADVANCED DATABASE SYSTEMS Query Compilation @ Andy_Pavlo // 15- 721 // Spring 2018

2 Background Code Generation / Transpilation JIT Compilation (LLVM) Real-world Implementations CMU 15-721 (Spring 2018)

3 H EKATO N REM ARK After switching to an in-memory DBMS, the only way to increase throughput is to reduce the number of instructions executed. → To go 10x faster, the DBMS must execute 90% fewer instructions… → To go 100x faster, the DBMS must execute 99% fewer instructions… COMPILATION IN THE MICROSOFT SQL SERVER HEKATON ENGINE IEEE Data Engineering Bulletin 2011 CMU 15-721 (Spring 2018)

4 O BSERVATIO N One way to achieve such a reduction in instructions is through code specialization . This means generating code that is specific to a particular task in the DBMS. Most code is written to make it easy for humans to understand rather than performance… CMU 15-721 (Spring 2018)

5 EXAM PLE DATABASE CREATE TABLE A ( CREATE TABLE B ( id INT PRIMARY KEY , id INT PRIMARY KEY, val INT val INT ); ); CREATE TABLE C ( a_id INT REFERENCES A( id ), b_id INT REFERENCES B( id ), PRIMARY KEY ( a_id , b_id ) ); CMU 15-721 (Spring 2018)

6 Q UERY PRO CESSIN G SELECT A.id, B.val Tuple-at-a-time FROM A, B WHERE A.id = B.id → Each operator calls next on their child to get AND B.val > 100 p the next tuple to process. A.id, B.val Operator-at-a-time → Each operator materializes their entire output ⨝ A.id=B.id for their parent operator. s Vector-at-a-time val>100 → Each operator calls next on their child to get A B the next chunk of data to process. CMU 15-721 (Spring 2018)

7 Q UERY IN TERPRETATIO N ⨝ A.id=C.a_id SELECT * FROM A, C, ( SELECT B.id, COUNT (*) FROM B σ A.val=123 ⨝ B.id=C.b_id WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 AND A.id = C.a_id A Γ B.id, COUNT(*) AND B.id = C.b_id σ B.val= ? +1 B C CMU 15-721 (Spring 2018)

7 Q UERY IN TERPRETATIO N ⨝ A.id=C.a_id for t 1 in left.next() : SELECT * ⨝ buildHashTable (t 1 ) FROM A, C, for t 2 in right.next() : ( SELECT B.id, COUNT (*) if probe (t 2 ): emit (t 1 ⨝ t 2 ) FROM B σ A.val=123 σ for t 1 in left.next() : ⨝ B.id=C.b_id WHERE B.val = ? + 1 ⨝ for t in child.next() : buildHashTable (t 1 ) GROUP BY B.id) AS B if evalPred (t): emit (t) for t 2 in right.next() : WHERE A.val = 123 if probe (t 2 ): emit (t 1 ⨝ t 2 ) AND A.id = C.a_id A Γ Γ B.id, COUNT(*) A for t in child.next() : AND B.id = C.b_id for t in A : buildAggregateTable (t) emit (t) for t in aggregateTable : emit (t) σ B.val= ? +1 σ for t in child.next() : if evalPred (t): emit (t) B C B C for t in B : for t in C : emit (t) emit (t) CMU 15-721 (Spring 2018)

8 PREDICATE IN TERPRETATIO N Execution Context SELECT * FROM A, C, ( SELECT B.id, COUNT (*) Current Tuple Query Parameters Table Schema FROM B (123, 1000) (int:999) B→( int:id, int:val) WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 = AND A.id = C.a_id AND B.id = C.b_id TupleAttribute(B.val) + Parameter(0) Constant(1) CMU 15-721 (Spring 2018)

8 PREDICATE IN TERPRETATIO N Execution Context SELECT * FROM A, C, ( SELECT B.id, COUNT (*) Current Tuple Query Parameters Table Schema FROM B (123, 1000) (int:999) B→( int:id, int:val) WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 = AND A.id = C.a_id AND B.id = C.b_id TupleAttribute(B.val) + 1000 Parameter(0) Constant(1) CMU 15-721 (Spring 2018)

8 PREDICATE IN TERPRETATIO N Execution Context SELECT * FROM A, C, ( SELECT B.id, COUNT (*) Current Tuple Query Parameters Table Schema FROM B (123, 1000) (int:999) B→( int:id, int:val) WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 = AND A.id = C.a_id AND B.id = C.b_id TupleAttribute(B.val) + 1000 Parameter(0) Constant(1) 999 CMU 15-721 (Spring 2018)

8 PREDICATE IN TERPRETATIO N Execution Context SELECT * FROM A, C, ( SELECT B.id, COUNT (*) Current Tuple Query Parameters Table Schema FROM B (123, 1000) (int:999) B→( int:id, int:val) WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 = AND A.id = C.a_id AND B.id = C.b_id TupleAttribute(B.val) + 1000 Parameter(0) Constant(1) 999 1 CMU 15-721 (Spring 2018)

8 PREDICATE IN TERPRETATIO N Execution Context SELECT * FROM A, C, ( SELECT B.id, COUNT (*) Current Tuple Query Parameters Table Schema FROM B (123, 1000) (int:999) B→( int:id, int:val) WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 = AND A.id = C.a_id AND B.id = C.b_id true TupleAttribute(B.val) + 1000 1000 Parameter(0) Constant(1) 999 1 CMU 15-721 (Spring 2018)

9 CO DE SPECIALIZATIO N Any CPU intensive entity of database can be natively compiled if they have a similar execution pattern on different inputs. → Access Methods → Stored Procedures → Operator Execution → Predicate Evaluation → Logging Operations CMU 15-721 (Spring 2018)

10 BEN EFITS Attribute types are known a priori . → Data access function calls can be converted to inline pointer casting. Predicates are known a priori . → They can be evaluated using primitive data comparisons. No function calls in loops → Allows the compiler to efficiently distribute data to registers and increase cache reuse. CMU 15-721 (Spring 2018)

11 ARCH ITECTURE OVERVIEW Cost Estimates System Catalog SQL Query Optimizer Annotated AST Physical Plan Parser Binder Compiler Abstract Syntax Tree Native Code CMU 15-721 (Spring 2018)

12 CO DE GEN ERATIO N Approach #1: Transpilation → Write code that converts a relational query plan into C/C++ and then run it through a conventional compiler to generate native code. Approach #2: JIT Compilation → Generate an intermediate representation (IR) of the query that can be quickly compiled into native code . CMU 15-721 (Spring 2018)

13 H IQ UE CO DE GEN ERATIO N For a given query plan, create a C/C++ program that implements that query’s execution. → Bake in all the predicates and type conversions. Use an off-shelf compiler to convert the code into a shared object, link it to the DBMS process, and then invoke the exec function. GENERATING CODE FOR HOLISTIC QUERY EVALUATION ICDE 2010 CMU 15-721 (Spring 2018)

14 O PERATO R TEM PLATES SELECT * FROM A WHERE A.val = ? + 1 CMU 15-721 (Spring 2018)

14 O PERATO R TEM PLATES Interpreted Plan for t in range (table.num_tuples): tuple = get_tuple(table, t) if eval(predicate, tuple, params): emit(tuple) CMU 15-721 (Spring 2018)

14 O PERATO R TEM PLATES Interpreted Plan for t in range (table.num_tuples): tuple = get_tuple(table, t) if eval(predicate, tuple, params): emit(tuple) 1. Get schema in catalog for table. 2. Calculate offset based on tuple size. 3. Return pointer to tuple. CMU 15-721 (Spring 2018)

14 O PERATO R TEM PLATES Interpreted Plan for t in range (table.num_tuples): tuple = get_tuple(table, t) if eval(predicate, tuple, params): emit(tuple) 1. Get schema in catalog for table. 2. Calculate offset based on tuple size. 3. Return pointer to tuple. 1. Traverse predicate tree and pull values up. 2. If tuple value, calculate the offset of the target attribute. 3. Perform casting as needed for comparison operators. 4. Return true / false. CMU 15-721 (Spring 2018)

14 O PERATO R TEM PLATES Interpreted Plan Templated Plan for t in range (table.num_tuples): tuple_size = ### tuple = get_tuple(table, t) predicate_offset = ### if eval(predicate, tuple, params): parameter_value = ### emit(tuple) for t in range (table.num_tuples): 1. Get schema in catalog for table. tuple = table.data + t ∗ tuple_size 2. Calculate offset based on tuple size. val = (tuple+predicate_offset) + 1 3. Return pointer to tuple. if (val == parameter_value): emit(tuple) 1. Traverse predicate tree and pull values up. 2. If tuple value, calculate the offset of the target attribute. 3. Perform casting as needed for comparison operators. 4. Return true / false. CMU 15-721 (Spring 2018)

Lect ure # 03 ADVANCED DATABASE SYSTEMS Query Compilation @ - PowerPoint PPT Presentation

Lect ure # 03 ADVANCED DATABASE SYSTEMS Query Compilation @ Andy_Pavlo // 15- 721 // Spring 2018 2 Background Code Generation / Transpilation JIT Compilation (LLVM) Real-world Implementations CMU 15-721 (Spring 2018) 3 H EKATO N REM

Lect 12a - Delaunay Triangulations Lect 12b - Delaunay Triangulations Lect 12c - Delaunay

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction & History of Database Systems

Lect ure # 13 ADVANCED DATABASE SYSTEMS Checkpoint Protocols @ Andy_Pavlo // 15- 721 //

Lect ure # 14 ADVANCED DATABASE SYSTEMS Networking @ Andy_Pavlo // 15- 721 // Spring 2018 2

Lect ure # 06 ADVANCED DATABASE SYSTEMS Multi-Version Concurrency Control (Part II) @

Lect ure # 24 ADVANCED DATABASE SYSTEMS Databases on New Hardware @ Andy_Pavlo // 15- 721 //

Lect ure # 10 ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ Andy_Pavlo // 15-

Lect ure # 22 ADVANCED DATABASE SYSTEMS Vectorized Execution (Part II) @ Andy_Pavlo // 15-

Lect 14a - Line Arrangements: Definitions and Zone Theorem Lect 14b - Line Arrangements:

ADVANCED DATABASE SYSTEMS Database Compression @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

ADVANCED DATABASE SYSTEMS Recovery Protocols @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

ADVANCED DATABASE SYSTEMS Self-Driving Database Management Systems @ Andy_Pavlo // 15- 721 //

Being Bayesian About Being Bayesian About Net work St ruct ure Net work St ruct ure A Bayesian

Interpreters and virtual machines Interpreters Michel Schinz 20070323 Interpreters Why

Fundamentals of Computer Programming How class is conducted Learn basic programming concepts

301AA - Advanced Programming Lecturer: Andrea Corradini andrea@di.unipi.it

CSE 341 Programming Languages Implementing PLs Zach Tatlock Spring 2014 Typical workflow

C Language Introduction Declaring variables Types Structure of a main file

Python Tutorial Dr. Xudong Liu Assistant Professor School of Computing University of North

Welcome to CS 61A About the Course Parts of the Course Lecture : Videos posted to cs61a.org

Welcome to CS 61A About the Course Parts of the Course 4 Parts of the Course Lecture : Videos

Sambuz

Useful Links

Newsletter

Mail Us

Lect ure # 03 ADVANCED DATABASE SYSTEMS Query Compilation @ - PowerPoint PPT Presentation

Lect ure # 03 ADVANCED DATABASE SYSTEMS Query Compilation @ Andy_Pavlo // 15- 721 // Spring 2018 2 Background Code Generation / Transpilation JIT Compilation (LLVM) Real-world Implementations CMU 15-721 (Spring 2018) 3 H EKATO N REM

Lect 12a - Delaunay Triangulations Lect 12b - Delaunay Triangulations Lect 12c - Delaunay

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction &amp; History of Database Systems

Lect ure # 13 ADVANCED DATABASE SYSTEMS Checkpoint Protocols @ Andy_Pavlo // 15- 721 //

Lect ure # 14 ADVANCED DATABASE SYSTEMS Networking @ Andy_Pavlo // 15- 721 // Spring 2018 2

Lect ure # 06 ADVANCED DATABASE SYSTEMS Multi-Version Concurrency Control (Part II) @

Lect ure # 24 ADVANCED DATABASE SYSTEMS Databases on New Hardware @ Andy_Pavlo // 15- 721 //

Lect ure # 10 ADVANCED DATABASE SYSTEMS Storage Models &amp; Data Layout @ Andy_Pavlo // 15-

Lect ure # 22 ADVANCED DATABASE SYSTEMS Vectorized Execution (Part II) @ Andy_Pavlo // 15-

Lect 14a - Line Arrangements: Definitions and Zone Theorem Lect 14b - Line Arrangements:

ADVANCED DATABASE SYSTEMS Database Compression @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

ADVANCED DATABASE SYSTEMS Recovery Protocols @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

ADVANCED DATABASE SYSTEMS Self-Driving Database Management Systems @ Andy_Pavlo // 15- 721 //

Being Bayesian About Being Bayesian About Net work St ruct ure Net work St ruct ure A Bayesian

Interpreters and virtual machines Interpreters Michel Schinz 20070323 Interpreters Why

Fundamentals of Computer Programming How class is conducted Learn basic programming concepts

301AA - Advanced Programming Lecturer: Andrea Corradini andrea@di.unipi.it

CSE 341 Programming Languages Implementing PLs Zach Tatlock Spring 2014 Typical workflow

C Language Introduction Declaring variables Types Structure of a main file

Python Tutorial Dr. Xudong Liu Assistant Professor School of Computing University of North

Welcome to CS 61A About the Course Parts of the Course Lecture : Videos posted to cs61a.org

Welcome to CS 61A About the Course Parts of the Course 4 Parts of the Course Lecture : Videos

Sambuz

Useful Links

Newsletter

Mail Us

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction & History of Database Systems

Lect ure # 10 ADVANCED DATABASE SYSTEMS Storage Models & Data Layout @ Andy_Pavlo // 15-