ADVANCED DATABASE SYSTEMS Query Compilation & Code Generation - PowerPoint PPT Presentation

Lect ure # 14 ADVANCED DATABASE SYSTEMS Query Compilation & Code Generation @ Andy_Pavlo // 15- 721 // Spring 2020

2 ADM INISTRIVIA Project #2 Checkpoint : Sunday March 8 th Project #2 Final : Sunday March 15 th Project #3 will be announced next class. 15-721 (Spring 2020)

3 Background Code Generation / Transpilation JIT Compilation (LLVM) Real-world Implementations 15-721 (Spring 2020)

4 HEKATO N REM ARK After switching to an in-memory DBMS, the only way to increase throughput is to reduce the number of instructions executed. → To go 10x faster, the DBMS must execute 90% fewer instructions… → To go 100x faster, the DBMS must execute 99% fewer instructions… COMPILATION IN THE MICROSOFT SQL SERVER HEKATON ENGINE IEEE D DATA ENGINEERING B BULLETIN 2 2011 15-721 (Spring 2020)

5 OBSERVATION One way to achieve such a reduction in instructions is through code specialization . This means generating code that is specific to a task in the DBMS (e.g., one query). Most code is written to make it easy for humans to understand rather than performance… 15-721 (Spring 2020)

6 EXAM PLE DATABASE CREATE TABLE A ( CREATE TABLE B ( id INT PRIMARY KEY , id INT PRIMARY KEY, val INT val INT ); ); CREATE TABLE C ( a_id INT REFERENCES A( id ), b_id INT REFERENCES B( id ), PRIMARY KEY ( a_id , b_id ) ); 15-721 (Spring 2020)

8 Q UERY INTERPRETATIO N ⨝ A.id=C.a_id SELECT * FROM A, C, ( SELECT B.id, COUNT (*) FROM B σ A.val=123 ⨝ B.id=C.b_id WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 AND A.id = C.a_id A Γ B.id, COUNT(*) AND B.id = C.b_id σ B.val= ? +1 B C 15-721 (Spring 2020)

8 Q UERY INTERPRETATIO N ⨝ A.id=C.a_id for t 1 in left.next() : SELECT * ⨝ buildHashTable (t 1 ) FROM A, C, for t 2 in right.next() : ( SELECT B.id, COUNT (*) if probe (t 2 ): emit (t 1 ⨝ t 2 ) FROM B σ A.val=123 σ for t 1 in left.next() : ⨝ B.id=C.b_id WHERE B.val = ? + 1 ⨝ for t in child.next() : buildHashTable (t 1 ) GROUP BY B.id) AS B if evalPred (t): emit (t) for t 2 in right.next() : WHERE A.val = 123 if probe (t 2 ): emit (t 1 ⨝ t 2 ) AND A.id = C.a_id A Γ for t in child.next() : Γ B.id, COUNT(*) AND B.id = C.b_id A for t in A : buildAggregateTable (t) emit (t) for t in aggregateTable : emit (t) σ B.val= ? +1 σ for t in child.next() : if evalPred (t): emit (t) B C B C for t in B : for t in C : emit (t) emit (t) 15-721 (Spring 2020)

9 PREDICATE INTERPRETATIO N Execution Context SELECT * FROM A, C, ( SELECT B.id, COUNT (*) Current Tuple Query Parameters Table Schema FROM B (123, 1000) (int:999) B→( int:id, int:val) WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 = AND A.id = C.a_id AND B.id = C.b_id TupleAttribute(B.val) + Parameter(0) Constant(1) 15-721 (Spring 2020)

9 PREDICATE INTERPRETATIO N Execution Context SELECT * FROM A, C, ( SELECT B.id, COUNT (*) Current Tuple Query Parameters Table Schema FROM B (123, 1000) (int:999) B→( int:id, int:val) WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 = AND A.id = C.a_id AND B.id = C.b_id TupleAttribute(B.val) + 1000 Parameter(0) Constant(1) 15-721 (Spring 2020)

9 PREDICATE INTERPRETATIO N Execution Context SELECT * FROM A, C, ( SELECT B.id, COUNT (*) Current Tuple Query Parameters Table Schema FROM B (123, 1000) (int:999) B→( int:id, int:val) WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 = AND A.id = C.a_id AND B.id = C.b_id TupleAttribute(B.val) + 1000 Parameter(0) Constant(1) 999 15-721 (Spring 2020)

9 PREDICATE INTERPRETATIO N Execution Context SELECT * FROM A, C, ( SELECT B.id, COUNT (*) Current Tuple Query Parameters Table Schema FROM B (123, 1000) (int:999) B→( int:id, int:val) WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 = AND A.id = C.a_id AND B.id = C.b_id TupleAttribute(B.val) + 1000 Parameter(0) Constant(1) 999 1 15-721 (Spring 2020)

9 PREDICATE INTERPRETATIO N Execution Context SELECT * FROM A, C, ( SELECT B.id, COUNT (*) Current Tuple Query Parameters Table Schema FROM B (123, 1000) (int:999) B→( int:id, int:val) WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 = AND A.id = C.a_id AND B.id = C.b_id true TupleAttribute(B.val) + 1000 1000 Parameter(0) Constant(1) 999 1 15-721 (Spring 2020)

10 CODE SPECIALIZATIO N Any CPU intensive entity of database can be natively compiled if they have a similar execution pattern on different inputs. → Access Methods → Stored Procedures → Operator Execution → Predicate Evaluation → Logging Operations 15-721 (Spring 2020)

11 BENEFITS Attribute types are known a priori . → Data access function calls can be converted to inline pointer casting. Predicates are known a priori . → They can be evaluated using primitive data comparisons. No function calls in loops → Allows the compiler to efficiently distribute data to registers and increase cache reuse. 15-721 (Spring 2020)

12 ARCHITECTURE OVERVIEW Cost Estimates System Catalog SQL Query Optimizer Annotated AST Physical Plan Parser Binder Compiler Abstract Syntax Tree Native Code 15-721 (Spring 2020)

13 CODE GENERATIO N Approach #1: Transpilation → Write code that converts a relational query plan into imperative language source code and then run it through a conventional compiler to generate native code. Approach #2: JIT Compilation → Generate an intermediate representation (IR) of the query that the DBMS then compiles into native code . 15-721 (Spring 2020)

14 HIQ UE CODE GENERATIO N For a given query plan, create a C/C++ program that implements that query’s execution. → Bake in all the predicates and type conversions. Use an off-shelf compiler to convert the code into a shared object, link it to the DBMS process, and then invoke the exec function. GENERATING CODE FOR HOLISTIC QUERY EVALUATION ICDE 2010 15-721 (Spring 2020)

15 OPERATO R TEM PLATES SELECT * FROM A WHERE A.val = ? + 1 15-721 (Spring 2020)

15 OPERATO R TEM PLATES Interpreted Plan for t in range (table.num_tuples): tuple = get_tuple (table, t) if eval (predicate, tuple, params): emit (tuple) 15-721 (Spring 2020)

15 OPERATO R TEM PLATES Interpreted Plan for t in range (table.num_tuples): tuple = get_tuple (table, t) if eval (predicate, tuple, params): emit (tuple) 1. Get schema in catalog for table. 2. Calculate offset based on tuple size. 3. Return pointer to tuple. 15-721 (Spring 2020)

15 OPERATO R TEM PLATES Interpreted Plan for t in range (table.num_tuples): tuple = get_tuple (table, t) if eval (predicate, tuple, params): emit (tuple) 1. Get schema in catalog for table. 2. Calculate offset based on tuple size. 3. Return pointer to tuple. 1. Traverse predicate tree and pull values up. 2. If tuple value, calculate the offset of the target attribute. 3. Perform casting as needed for comparison operators. 4. Return true / false. 15-721 (Spring 2020)

15 OPERATO R TEM PLATES Interpreted Plan Templated Plan tuple_size = ### for t in range (table.num_tuples): predicate_offset = ### tuple = get_tuple (table, t) parameter_value = ### if eval (predicate, tuple, params): emit (tuple) for t in range (table.num_tuples): 1. Get schema in catalog for table. tuple = table.data + t ∗ tuple_size 2. Calculate offset based on tuple size. val = (tuple+predicate_offset) 3. Return pointer to tuple. if (val == parameter_value + 1): emit (tuple) 1. Traverse predicate tree and pull values up. 2. If tuple value, calculate the offset of the target attribute. 3. Perform casting as needed for comparison operators. 4. Return true / false. 15-721 (Spring 2020)

16 DBM S INTEGRATIO N The generated query code can invoke any other function in the DBMS. This allows it to use all the same components as interpreted queries. → Concurrency Control → Logging / Checkpoints → Indexes 15-721 (Spring 2020)

17 EVALUATION Generic Iterators → Canonical model with generic predicate evaluation. Optimized Iterators → Type-specific iterators with inline predicates. Generic Hardcoded → Handwritten code with generic iterators/predicates. Optimized Hardcoded → Direct tuple access with pointer arithmetic. HIQUE → Query-specific specialized code. 15-721 (Spring 2020)

18 Q UERY COM PILATION EVALUATIO N Intel Core 2 Duo 6300 @ 1.86GHz Join Query: 10k ⨝ 10k→10m L2-cache Miss Memory Stall Instruction Exec. 250 Execution Time (ms) 200 150 100 50 0 Generic Optimized Generic Optimized HIQUE Iterators Iterators Hardcoded Hardcoded Source: Konstantinos Krikellas 15-721 (Spring 2020)

19 Q UERY COM PILATION COST Intel Core 2 Duo 6300 @ 1.86GHz TPC-H Queries Compile (-O0) Compile (-O2) 800 619 Compilation Time (ms) 600 403 400 274 213 160 200 121 0 Q1 Q2 Q3 Source: Konstantinos Krikellas 15-721 (Spring 2020)

20 OBSERVATION Relational operators are a useful way to reason about a query but are not the most efficient way to execute it. It takes a (relatively) long time to compile a C/C++ source file into executable code. HIQUE does not support for full pipelining… 15-721 (Spring 2020)

21 PIPELIN ED OPERATO RS ⨝ A.id=C.a_id SELECT * FROM A, C, ( SELECT B.id, COUNT (*) FROM B σ A.val=123 ⨝ B.id=C.b_id WHERE B.val = ? + 1 GROUP BY B.id) AS B WHERE A.val = 123 AND A.id = C.a_id A Γ B.id,COUNT(*) AND B.id = C.b_id σ B.val= ? +1 B C 15-721 (Spring 2020)

ADVANCED DATABASE SYSTEMS Query Compilation & Code Generation - PowerPoint PPT Presentation

Lect ure # 14 ADVANCED DATABASE SYSTEMS Query Compilation & Code Generation @ Andy_Pavlo // 15- 721 // Spring 2020 2 ADM INISTRIVIA Project #2 Checkpoint : Sunday March 8 th Project #2 Final : Sunday March 15 th Project #3 will be

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

Advanced Database Management Systems Database Management Systems Alvaro A A Fernandes School of

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

Database Systems Database Systems 1 Creating a Database System Design Construction

Overview of Database Systems CS3860 - Jay Urbain, PhD Introduction to Database Systems 1 2 UFR

ADVANCED DATABASE SYSTEMS Database Compression @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

National Address Database National Address Database What is a National Address Database?

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction & History of Database Systems

ADVANCED DATABASE SYSTEMS Self-Driving Database Management Systems @ Andy_Pavlo // 15- 721 //

CMU-SMU@TRECVID 2015: Video Hyperlinking Zhiyong Cheng 1 , Xuanchong Li 2 , Jialie Shen 1 ,

Hyper: Make VM Runs Like Container Xu Wang <xu@hyper.sh> Hyper HQ Agenda Lesson

Cutuing Edge TensorFlow Keras Tuner: hyperuuning for humans Elie Bursztein Google, @elie ? ?

Meditation for a Theorem Prover Reasoning and Consciousness Teaching a Theorem Prover to let

Hyper-Vacancy in a census tract More than 10 percent of housing units in this category Cuyahoga

3.2 Hypergeometric Distribution 3.5, 3.9 Mean and Variance Prof. Tesler Math 186 Winter 2017

Recurrent Pixel Embedding for Grouping Shu Kong CS, ICS, UCI Outline 1. Problem Statement --

Hyper-local sustainable assortment planning Nupur Aggarwal, Abhishek Bansal, Kushagra Manglik,

ADVANCED DATABASE SYSTEMS Query Compilation & Code Generation - PowerPoint PPT Presentation

Lect ure # 14 ADVANCED DATABASE SYSTEMS Query Compilation & Code Generation @ Andy_Pavlo // 15- 721 // Spring 2020 2 ADM INISTRIVIA Project #2 Checkpoint : Sunday March 8 th Project #2 Final : Sunday March 15 th Project #3 will be

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

Advanced Database Management Systems Database Management Systems Alvaro A A Fernandes School of

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

Database Systems Database Systems 1 Creating a Database System Design Construction

Overview of Database Systems CS3860 - Jay Urbain, PhD Introduction to Database Systems 1 2 UFR

ADVANCED DATABASE SYSTEMS Database Compression @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

National Address Database National Address Database What is a National Address Database?

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction &amp; History of Database Systems

ADVANCED DATABASE SYSTEMS Self-Driving Database Management Systems @ Andy_Pavlo // 15- 721 //

CMU-SMU@TRECVID 2015: Video Hyperlinking Zhiyong Cheng 1 , Xuanchong Li 2 , Jialie Shen 1 ,

Hyper: Make VM Runs Like Container Xu Wang &lt;xu@hyper.sh&gt; Hyper HQ Agenda Lesson

Cutuing Edge TensorFlow Keras Tuner: hyperuuning for humans Elie Bursztein Google, @elie ? ?

Meditation for a Theorem Prover Reasoning and Consciousness Teaching a Theorem Prover to let

Hyper-Vacancy in a census tract More than 10 percent of housing units in this category Cuyahoga

3.2 Hypergeometric Distribution 3.5, 3.9 Mean and Variance Prof. Tesler Math 186 Winter 2017

Recurrent Pixel Embedding for Grouping Shu Kong CS, ICS, UCI Outline 1. Problem Statement --

Hyper-local sustainable assortment planning Nupur Aggarwal, Abhishek Bansal, Kushagra Manglik,

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction & History of Database Systems

Hyper: Make VM Runs Like Container Xu Wang <xu@hyper.sh> Hyper HQ Agenda Lesson