DATABASE SYSTEM IMPLEMENTATION // GT 4420/6422 // SPRING 2019

SLIDE 1

DATABASE SYSTEM IMPLEMENTATION

GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ LECTURE #13: QUERY COMPILATION

SLIDE 2

TODAY’S AGENDA

→ Background
→ Code Generation / Transpilation
→ JIT Compilation (LLVM)
→ Real-world Implementations

SLIDE 3

ANATOMY OF A DATABASE SYSTEM

→ Connection Manager + Admission Control
→ Query Parser
→ Query Optimizer
→ Query Executor
→ Lock Manager (Concurrency Control)
→ Access Methods (or Indexes)
→ Buffer Pool Manager
→ Log Manager
→ Memory Manager + Disk Manager
→ Networking Manager

[Diagram groups these components into: Process Manager, Query Processor, Transactional Storage Manager, Shared Utilities]

Source: Anatomy of a Database System

SLIDE 4

HEKATON REMARK

After switching to an in-memory DBMS, the only way to increase throughput is to reduce the number of instructions executed.

→ To go 10x faster, the DBMS must execute 90% fewer instructions…
→ To go 100x faster, the DBMS must execute 99% fewer instructions…

COMPILATION IN THE MICROSOFT SQL SERVER HEKATON ENGINE IEEE Data Engineering Bulletin 2011

SLIDE 5

OBSERVATION

One way to achieve such a reduction in instructions is through code specialization: generating code that is specific to a particular query in the DBMS.

→ Encode everything known about the data (e.g., types) and the query (e.g., WHERE clause)

Most code is written to be easy for humans to understand rather than for performance…

→ Interpretation vs. Compilation

SLIDE 6

EXAMPLE DATABASE

CREATE TABLE A (
  id INT PRIMARY KEY,
  val INT
);
CREATE TABLE B (
  id INT PRIMARY KEY,
  val INT
);
CREATE TABLE C (
  a_id INT REFERENCES A(id),
  b_id INT REFERENCES B(id),
  PRIMARY KEY (a_id, b_id)
);

SLIDE 7

QUERY PROCESSING

SELECT A.id, B.val
FROM A, B
WHERE A.id = B.id AND B.val > 100

[Query plan: π(A.id, B.val) over σ(val>100) over ⨝(A.id=B.id) over scans of A and B]
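The example schema from slide 6 and this join query can be run end to end with Python's built-in sqlite3 module; the inserted rows below are invented sample data, not from the slides.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE A (id INT PRIMARY KEY, val INT);
CREATE TABLE B (id INT PRIMARY KEY, val INT);
CREATE TABLE C (a_id INT REFERENCES A(id),
                b_id INT REFERENCES B(id),
                PRIMARY KEY (a_id, b_id));
""")
# toy data so the join query has something to return
db.executemany("INSERT INTO A VALUES (?, ?)", [(1, 10), (2, 20)])
db.executemany("INSERT INTO B VALUES (?, ?)", [(1, 50), (2, 150)])
rows = db.execute(
    "SELECT A.id, B.val FROM A, B WHERE A.id = B.id AND B.val > 100"
).fetchall()
print(rows)  # [(2, 150)]
```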

SLIDE 8

QUERY PROCESSING

Tuple-at-a-time
→ Each operator calls next on its child to get the next tuple to process.

Vector-at-a-time
→ Each operator calls next on its child to get the next chunk of data to process.

Operator-at-a-time
→ Each operator materializes its entire output for its parent operator.

SELECT A.id, B.val
FROM A, B
WHERE A.id = B.id AND B.val > 100

[Query plan: π(A.id, B.val) over σ(val>100) over ⨝(A.id=B.id) over scans of A and B]
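A minimal sketch (not from the slides) of the three processing models on a toy scan → select pipeline; plain Python lists stand in for tables, and generators stand in for operator next() calls.

```python
table = [(1, 50), (2, 150), (3, 200)]  # (id, val)

# Tuple-at-a-time (iterator/Volcano model): each next() yields one tuple.
def scan():
    for t in table:
        yield t

def select(child, pred):
    for t in child:
        if pred(t):
            yield t

# Vector-at-a-time: each next() yields a chunk (vector) of tuples.
def scan_vec(chunk=2):
    for i in range(0, len(table), chunk):
        yield table[i:i + chunk]

def select_vec(child, pred):
    for vec in child:
        out = [t for t in vec if pred(t)]
        if out:
            yield out

# Operator-at-a-time: each operator materializes its entire output.
def select_mat(rows, pred):
    return [t for t in rows if pred(t)]

tup = list(select(scan(), lambda t: t[1] > 100))
vec = [t for v in select_vec(scan_vec(), lambda t: t[1] > 100) for t in v]
mat = select_mat(list(scan()), lambda t: t[1] > 100)
print(tup == vec == mat)  # True: all three produce the same result set
```

All three models compute the same answer; they differ only in how many times control crosses operator boundaries per tuple.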


SLIDES 10-11

QUERY INTERPRETATION

SELECT * FROM A, C,
  (SELECT B.id, COUNT(*) FROM B
   WHERE B.val = ? + 1 GROUP BY B.id) AS B
WHERE A.val = 123
  AND A.id = C.a_id
  AND B.id = C.b_id

Query plan: ⨝(A.id=C.a_id) joins σ(A.val=123) over A with ⨝(B.id=C.b_id), which joins Γ(B.id, COUNT(*)) over σ(B.val=?+1) over B with the scan of C.

Each operator runs generic, interpreted logic:

⨝ (hash join):
for t1 in left.next():
    buildHashTable(t1)
for t2 in right.next():
    if probe(t2):
        emit(t1⨝t2)

σ (selection):
for t in child.next():
    if evalPred(t):
        emit(t)

Scan (A, B, C):
for t in A:
    emit(t)

Γ (aggregation):
for t in child.next():
    buildAggregateTable(t)
for t in aggregateTable:
    emit(t)

SLIDES 12-23

PREDICATE INTERPRETATION

SELECT * FROM A, C,
  (SELECT B.id, COUNT(*) FROM B
   WHERE B.val = ? + 1 GROUP BY B.id) AS B
WHERE A.val = 123
  AND A.id = C.a_id
  AND B.id = C.b_id

Execution Context:
→ Current Tuple: (123, 1000)
→ Query Parameters: (int:999)
→ Table Schema: B→(int:id, int:val)

Predicate expression tree for B.val = ? + 1:

=( TupleAttribute(B.val), +( Parameter(0), Constant(1) ) )

The interpreter walks this tree for every tuple: TupleAttribute(B.val) reads 1000 from the current tuple, Parameter(0) resolves to 999, Constant(1) yields 1, the addition produces 1000, and the comparison 1000 = 1000 evaluates to true.
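The tree walk the slides animate can be sketched as a tiny expression interpreter; the class names (Constant, Parameter, TupleAttribute, Add, Eq) are illustrative, not from any real DBMS.

```python
# Each node evaluates itself against an execution context holding the
# current tuple, the query parameters, and the table schema.

class Constant:
    def __init__(self, v): self.v = v
    def eval(self, ctx): return self.v

class Parameter:
    def __init__(self, i): self.i = i
    def eval(self, ctx): return ctx["params"][self.i]

class TupleAttribute:
    def __init__(self, name): self.name = name
    def eval(self, ctx):
        # look up the attribute's position in the schema, then read the tuple
        return ctx["tuple"][ctx["schema"].index(self.name)]

class Add:
    def __init__(self, l, r): self.l, self.r = l, r
    def eval(self, ctx): return self.l.eval(ctx) + self.r.eval(ctx)

class Eq:
    def __init__(self, l, r): self.l, self.r = l, r
    def eval(self, ctx): return self.l.eval(ctx) == self.r.eval(ctx)

# B.val = ? + 1, with current tuple (123, 1000) and parameter 999
pred = Eq(TupleAttribute("val"), Add(Parameter(0), Constant(1)))
ctx = {"tuple": (123, 1000), "params": (999,), "schema": ["id", "val"]}
print(pred.eval(ctx))  # True: 1000 == 999 + 1
```

Note how much work happens per tuple: virtual dispatch per node, a schema lookup per attribute read. This is exactly the overhead that code specialization removes.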

SLIDE 24

CODE SPECIALIZATION

Any CPU-intensive component of the DBMS can be natively compiled if it has a similar execution pattern on different inputs:

→ Access Methods
→ Stored Procedures
→ Operator Execution
→ Predicate Evaluation
→ Logging Operations

SLIDE 25

BENEFITS

Attribute types are known a priori.
→ Data access function calls can be converted to inline pointer casting.

Predicates are known a priori.
→ They can be evaluated using primitive data comparisons.

No function calls in loops.
→ Allows the compiler to efficiently distribute data to registers and increase cache reuse.
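A sketch of what "inline pointer casting" buys: when the schema is known a priori, an attribute read collapses to fixed-offset arithmetic over raw bytes. Python's struct module stands in for C pointer casts here; the layout below is invented for illustration.

```python
import struct

# Two 4-byte little-endian ints per tuple, (id, val), packed back to back.
data = struct.pack("<4i", 1, 1000, 2, 5)
TUPLE_SIZE, VAL_OFFSET = 8, 4  # known a priori from the schema

def read_val(buf, i):
    # a compiled query hard-codes tuple_size and the attribute offset,
    # so the generic "data access function call" becomes offset arithmetic
    return struct.unpack_from("<i", buf, i * TUPLE_SIZE + VAL_OFFSET)[0]

print(read_val(data, 0), read_val(data, 1))  # 1000 5
```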

SLIDE 26

ARCHITECTURE OVERVIEW

SQL Query → Parser → Abstract Syntax Tree → Binder (consults the System Catalog) → Annotated AST → Optimizer (uses Cost Estimates) → Physical Plan → Compiler → Native Code

SLIDE 27

CODE GENERATION

Approach #1: Transpilation
→ Write code that converts a relational query plan into C/C++ and then run it through a conventional compiler to generate native code.

Approach #2: JIT Compilation
→ Generate an intermediate representation (IR) of the query that can be quickly compiled into native code.

SLIDE 28

HIQUE – CODE GENERATION

For a given query plan, create a C/C++ program that implements that query’s execution.
→ Bake in all the predicates and type conversions.

Use an off-the-shelf compiler to convert the code into a shared object, link it to the DBMS process, and then invoke the exec function.

GENERATING CODE FOR HOLISTIC QUERY EVALUATION ICDE 2010

SLIDES 29-36

OPERATOR TEMPLATES

SELECT * FROM A WHERE A.val = ? + 1

Interpreted Plan:

for t in range(table.num_tuples):
    tuple = get_tuple(table, t)
    if eval(predicate, tuple, params):
        emit(tuple)

get_tuple must:
1. Get the schema from the catalog for the table.
2. Calculate the offset based on the tuple size.
3. Return a pointer to the tuple.

eval must:
1. Traverse the predicate tree and pull values up.
2. For a tuple value, calculate the offset of the target attribute.
3. Perform casting as needed for comparison operators.
4. Return true / false.

Templated Plan (constants baked in at code-generation time):

tuple_size = ###
predicate_offset = ###
parameter_value = ###
for t in range(table.num_tuples):
    tuple = table.data + t * tuple_size
    val = *(tuple + predicate_offset)
    if val == parameter_value + 1:
        emit(tuple)
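A miniature of the HIQUE-style transpilation step: generate query-specific source with the parameter baked in, compile it once, and invoke the resulting function. Python's exec/compile stand in for emitting C and invoking gcc; the generated function and its shape are illustrative.

```python
def codegen_scan_filter(param_value):
    # generate query-specific source: the predicate "val == ? + 1" has the
    # parameter baked in, so no predicate tree is walked at execution time
    src = (
        "def run(table, emit):\n"
        "    for tup in table:\n"
        f"        if tup[1] == {param_value} + 1:\n"
        "            emit(tup)\n"
    )
    ns = {}
    exec(compile(src, "<generated>", "exec"), ns)  # "off-the-shelf compiler"
    return ns["run"]

run = codegen_scan_filter(999)
out = []
run([(1, 1000), (2, 5), (3, 1000)], out.append)
print(out)  # [(1, 1000), (3, 1000)]
```

The generated function plays the role of the shared object's exec entry point: built once per query, then invoked like any other function.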

SLIDE 37

DBMS INTEGRATION

The generated query code can invoke any other function in the DBMS. This allows it to use all the same components as interpreted queries:

→ Concurrency Control
→ Logging / Checkpoints
→ Indexes

SLIDE 38

EVALUATION

Generic Iterators
→ Canonical model with generic predicate evaluation.

Optimized Iterators
→ Type-specific iterators with inline predicates.

Generic Hardcoded
→ Handwritten code with generic iterators/predicates.

Optimized Hardcoded
→ Direct tuple access with pointer arithmetic.

HIQUE
→ Query-specific specialized code.

SLIDES 39-40

QUERY COMPILATION EVALUATION

[Chart: execution time (ms, 0-250) broken down into L2-cache misses, memory stalls, and instruction execution for Generic Iterators, Optimized Iterators, Generic Hardcoded, Optimized Hardcoded, and HIQUE. Intel Core 2 Duo 6300 @ 1.86GHz; join query 10k ⨝ 10k → 10m.]

Source: Konstantinos Krikellas

SLIDE 41

QUERY COMPILATION COST

[Chart: compilation time (ms) for TPC-H queries Q1-Q3. Compile (-O0): 121 / 160 / 213 ms; Compile (-O2): 274 / 403 / 619 ms. Intel Core 2 Duo 6300 @ 1.86GHz.]

Source: Konstantinos Krikellas

SLIDE 42

OBSERVATION

Relational operators are a useful way to reason about a query but are not the most efficient way to execute it.

It takes a (relatively) long time to compile a C/C++ source file into executable code.

HIQUE does not allow for full pipelining…

SLIDES 43-44

PIPELINED OPERATORS

SELECT * FROM A, C,
  (SELECT B.id, COUNT(*) FROM B
   WHERE B.val = ? + 1 GROUP BY B.id) AS B
WHERE A.val = 123
  AND A.id = C.a_id
  AND B.id = C.b_id

Query plan: ⨝(A.id=C.a_id) joins σ(A.val=123) over A with ⨝(B.id=C.b_id), which joins Γ(B.id, COUNT(*)) over σ(B.val=?+1) over B with the scan of C.

Pipeline boundaries split this plan into four pipelines (#1-#4), each ending at a materialization point (a hash table build or an aggregation).

SLIDE 45

HYPER – JIT QUERY COMPILATION

Compile queries in-memory into native code using the LLVM toolkit.

Organizes query processing in a way that keeps a tuple in CPU registers for as long as possible.
→ Push-based vs. Pull-based
→ Data-Centric vs. Operator-Centric

EFFICIENTLY COMPILING EFFICIENT QUERY PLANS FOR MODERN HARDWARE VLDB 2011

SLIDE 46

LLVM

A collection of modular and reusable compiler and toolchain technologies.

The core component is a low-level programming language (IR) that is similar to assembly.

Not all of the DBMS components need to be written in LLVM IR.
→ LLVM code can make calls to C++ code.

SLIDE 47

LLVM

[Diagram: frontends (C, Fortran, Ada) emit a common Intermediate Representation (IR), which passes through a common optimizer and then to backends (x86, PowerPC, ARM).]

Source: The Architecture of Open Source Applications

SLIDES 48-49

PUSH-BASED EXECUTION

SELECT * FROM A, C,
  (SELECT B.id, COUNT(*) FROM B
   WHERE B.val = ? + 1 GROUP BY B.id) AS B
WHERE A.val = 123
  AND A.id = C.a_id
  AND B.id = C.b_id

Generated Query Plan (each loop nest is one of the four pipelines):

for t in A:
    if t.val == 123:
        Materialize t in HashTable ⨝(A.id=C.a_id)

for t in B:
    if t.val == <param> + 1:
        Aggregate t in HashTable Γ(B.id)

for t in Γ(B.id):
    Materialize t in HashTable ⨝(B.id=C.b_id)

for t3 in C:
    for t2 in ⨝(B.id=C.b_id):
        for t1 in ⨝(A.id=C.a_id):
            emit(t1⨝t2⨝t3)
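The generated plan above can be sketched as ordinary code: four tight loops, each pushing tuples into the next materialization point. The data layout (dicts as hash tables, dict rows as tuples) and sample data are invented for illustration.

```python
def run_query(A, B, C, param):
    ht_a, agg_b, ht_b, results = {}, {}, {}, []

    # Pipeline 1: scan A, filter A.val = 123, build join hash table on A.id
    for t in A:
        if t["val"] == 123:
            ht_a.setdefault(t["id"], []).append(t)

    # Pipeline 2: scan B, filter B.val = param + 1, aggregate COUNT(*) by B.id
    for t in B:
        if t["val"] == param + 1:
            agg_b[t["id"]] = agg_b.get(t["id"], 0) + 1

    # Pipeline 3: scan the aggregate table, build join hash table on B.id
    for bid, cnt in agg_b.items():
        ht_b[bid] = cnt

    # Pipeline 4: scan C, probe both hash tables, emit joined tuples
    for t in C:
        if t["b_id"] in ht_b:
            for a in ht_a.get(t["a_id"], []):
                results.append((a["id"], t["b_id"], ht_b[t["b_id"]]))
    return results

A = [{"id": 1, "val": 123}, {"id": 2, "val": 7}]
B = [{"id": 5, "val": 1000}, {"id": 6, "val": 1000}, {"id": 7, "val": 2}]
C = [{"a_id": 1, "b_id": 5}, {"a_id": 2, "b_id": 5}]
print(run_query(A, B, C, 999))  # [(1, 5, 1)]
```

Within each pipeline a tuple flows straight from the scan through the filter into the materialization, with no operator-boundary crossings in between.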

SLIDE 50

QUERY COMPILATION EVALUATION

[Chart: execution time (ms, log scale from 1 to 100000) on TPC-H queries Q1-Q5 for HyPer (LLVM), HyPer (C++), VectorWise, MonetDB, and ???. Dual-socket Intel Xeon X5770 @ 2.93GHz.]

Source: Thomas Neumann

SLIDE 51

QUERY COMPILATION COST

[Chart: compilation time (ms) for three TPC-H queries. HIQUE (-O2): 274 / 403 / 619 ms; HyPer: 13 / 37 / 15 ms.]

Source: Konstantinos Krikellas

SLIDE 52

QUERY COMPILATION COST

LLVM's compilation time grows super-linearly with the query size:
→ # of joins
→ # of predicates
→ # of aggregations

Not a big issue for OLTP applications, but a major problem for OLAP workloads.

SLIDE 53

HYPER – ADAPTIVE EXECUTION

First generate the LLVM IR for the query, then execute that IR in an interpreter while compiling the query in the background. When the compiled query is ready, seamlessly replace the interpreted execution.

ADAPTIVE EXECUTION OF COMPILED QUERIES ICDE 2018

SLIDES 54-62

HYPER – ADAPTIVE EXECUTION

SQL Query → Optimizer (0.2 ms) → Query Plan → Code Generator (0.7 ms) → LLVM IR

From the LLVM IR, HyPer produces progressively faster executables:
→ Byte Code Compiler (0.4 ms) → Byte Code, executed in an interpreter
→ Unoptimized LLVM Compiler (6 ms) → x86 Code
→ LLVM Passes (25 ms) + Optimized LLVM Compiler (17 ms) → x86 Code
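The adaptive switch-over can be sketched in miniature: start interpreting a tiny opcode program immediately, "compile" it in a background thread, and swap the executor when the compiled version is ready. The opcodes and the Python-source "compiler" are toy stand-ins for LLVM IR and the LLVM backends.

```python
import threading

# toy "IR": f(x) = x + 1 as invented opcodes
PROGRAM = [("load",), ("add_const", 1), ("ret",)]

def interpret(program, x):
    acc = None
    for op, *arg in program:
        if op == "load":
            acc = x
        elif op == "add_const":
            acc += arg[0]
        elif op == "ret":
            return acc

def compile_program(program):
    # stand-in for the LLVM compiler: emit Python source, compile it once
    lines = ["def f(x):"]
    for op, *arg in program:
        if op == "load":
            lines.append("    acc = x")
        elif op == "add_const":
            lines.append(f"    acc += {arg[0]}")
        elif op == "ret":
            lines.append("    return acc")
    ns = {}
    exec("\n".join(lines), ns)
    return ns["f"]

executor = {"fn": lambda x: interpret(PROGRAM, x)}  # start interpreting now

def background_compile():
    executor["fn"] = compile_program(PROGRAM)  # seamless swap when ready

t = threading.Thread(target=background_compile)
t.start()
first = executor["fn"](41)   # may still be the interpreter
t.join()
second = executor["fn"](41)  # compiled version after the swap
print(first, second)  # 42 42
```

Callers never notice the swap: both paths compute the same result, only the cost per call changes.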

SLIDE 63

REAL-WORLD IMPLEMENTATIONS

→ IBM System R
→ Oracle
→ Microsoft Hekaton
→ Cloudera Impala
→ Actian Vector
→ MemSQL
→ VitesseDB
→ Apache Spark
→ Peloton

SLIDE 64

IBM SYSTEM R

A primitive form of code generation and query compilation was used by IBM in the 1970s.
→ Compiled SQL statements into assembly code by selecting code templates for each operator.

The technique was abandoned when IBM built DB2:
→ High cost of external function calls
→ Poor portability
→ Software engineering complications

A HISTORY AND EVALUATION OF SYSTEM R COMMUNICATIONS OF THE ACM 1981

SLIDE 65

ORACLE

Converts PL/SQL stored procedures into Pro*C code, which is then compiled into native C/C++ code.

They also put Oracle-specific operations directly in the SPARC chips as co-processors:
→ Memory Scans
→ Bit-pattern Dictionary Compression
→ Vectorized instructions designed for DBMSs
→ Security / encryption

SLIDE 66

MICROSOFT HEKATON

Can compile both procedures and SQL.
→ Non-Hekaton queries can access Hekaton tables through compiled inter-operators.

Generates C code from an imperative syntax tree, compiles it into a DLL, and links it at runtime.

Employs safety measures to prevent somebody from injecting malicious code into a query.

COMPILATION IN THE MICROSOFT SQL SERVER HEKATON ENGINE IEEE DATA ENGINEERING BULLETIN 2011

SLIDE 67

CLOUDERA IMPALA

LLVM JIT compilation for predicate evaluation and record parsing.
→ Not clear whether they also do operator compilation.

Optimized record parsing is important for Impala because it must handle multiple data formats stored on HDFS.

IMPALA: A MODERN, OPEN-SOURCE SQL ENGINE FOR HADOOP CIDR 2015

SLIDES 68-72

MEMSQL (PRE–2016)

Performs the same C/C++ code generation as HIQUE and then invokes gcc.

Converts all queries into a parameterized form and caches the compiled query plan.

Example: both of the following queries map to the same cached parameterized plan, SELECT * FROM A WHERE A.id = ?

SELECT * FROM A WHERE A.id = 123
SELECT * FROM A WHERE A.id = 456
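The parameterize-and-cache idea can be sketched as follows; the regex-based normalizer and the string "plan" are toy stand-ins (a real system parameterizes via the parser and caches actual compiled code).

```python
import re

plan_cache = {}

def normalize(sql):
    # toy normalizer: replace integer literals with '?'
    return re.sub(r"\b\d+\b", "?", sql)

def get_plan(sql):
    key = normalize(sql)
    if key not in plan_cache:
        plan_cache[key] = ("compiled", key)  # stand-in for codegen + gcc
    return plan_cache[key]

p1 = get_plan("SELECT * FROM A WHERE A.id = 123")
p2 = get_plan("SELECT * FROM A WHERE A.id = 456")
print(p1 is p2, len(plan_cache))  # True 1
```

Queries that differ only in literals hit the same cache entry, so the expensive compile step is paid once per query shape, not once per query.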

SLIDE 73

MEMSQL (2016–PRESENT)

A query plan is converted into an imperative plan expressed in a high-level imperative DSL:
→ MemSQL Programming Language (MPL)
→ Think of this as a C++ dialect.

The DSL then gets converted into a second language of opcodes:
→ MemSQL Bit Code (MBC)
→ Think of this as JVM byte code.

Finally, the DBMS compiles the opcodes into LLVM IR and then into native code.

Source: Drew Paroski

SLIDE 74

APACHE SPARK

Introduced in the new Tungsten engine in 2015.

The system converts a query's WHERE clause expression trees into ASTs, then compiles these ASTs into JVM bytecode, which is executed natively.

SPARK SQL: RELATIONAL DATA PROCESSING IN SPARK SIGMOD 2015

SLIDE 75

PELOTON

Full compilation of the entire query plan.

Relaxes HyPer's pipeline breakers to create mini-batches for operators that can be vectorized.

Uses software prefetching to hide memory stalls.

RELAXED OPERATOR FUSION FOR IN-MEMORY DATABASES: MAKING COMPILATION, VECTORIZATION, AND PREFETCHING WORK TOGETHER AT LAST VLDB 2017

SLIDE 76

PELOTON

[Chart: execution time (ms, log scale) on TPC-H queries Q1 / Q3 / Q13 / Q14 / Q19. Interpreted: 88147 / 26350 / 87473 / 9960 / 21500 ms; LLVM: 901 / 1396 / 2641 / 383 / 540 ms; LLVM + ROF: 892 / 846 / 1763 / 191 / 220 ms. Dual-socket Intel Xeon E5-2630v4 @ 2.20GHz, TPC-H 10 GB database.]

Source: Prashanth Menon

SLIDE 77

PARTING THOUGHTS

Query compilation makes a difference but is non-trivial to implement.

The 2016 version of MemSQL is the best query compilation implementation out there. Hekaton is very good too.

Any new DBMS that wants to compete has to implement query compilation.

SLIDE 78

NEXT CLASS

Query Optimization