Building a Modern Database Using LLVM
Skye Wanderman-Milne, Cloudera (skye@cloudera.com)
LLVM Developers’ Meeting, Nov. 6-7

Overview
● What is Cloudera Impala?
● Why code generation?
● Writing IR vs. cross compilation
● Results

What is Cloudera Impala?
● High-performance distributed SQL engine for Hadoop
  ○ Similar to Google’s Dremel
  ○ Designed for analytic workloads
● Reads/writes data from HDFS and HBase
  ○ Schema on read
  ○ Queries data directly in supported formats: text (CSV), Avro, Parquet, and more
● Open source (Apache licensed)

What is Cloudera Impala?
● Primary goal: SPEED!
● Uses LLVM to JIT-compile query-specific functions

Why code generation?
Code generation (codegen) lets us use query-specific information to do less work:
● Remove conditionals
● Propagate constant offsets, pointers, etc.
● Inline virtual function calls

Interpreted:

```cpp
void MaterializeTuple(char* tuple) {
  for (int i = 0; i < num_slots_; ++i) {
    char* slot = tuple + offsets_[i];
    switch (types_[i]) {
      case BOOLEAN:
        *slot = ParseBoolean();
        break;
      case INT:
        *slot = ParseInt();
        break;
      case FLOAT: …
      case STRING: …
      // etc.
    }
  }
}
```

Codegen’d:

```cpp
void MaterializeTuple(char* tuple) {
  *(tuple + 0) = ParseInt();      // i = 0
  *(tuple + 4) = ParseBoolean();  // i = 1
  *(tuple + 5) = ParseInt();      // i = 2
}
```
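To make the interpreted vs. codegen’d contrast concrete, here is a minimal runnable C++ sketch. `RowParser`, `Materializer`, and `MaterializeTupleSpecialized` are hypothetical names, not Impala’s. The specialized function is written by hand to show the kind of code a JIT would produce for one fixed layout (INT at offset 0, BOOLEAN at 4, INT at 5). LLVM did not generate it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Assumption: bool occupies one byte, matching the slide's layout
// (BOOLEAN at offset 4, next INT at offset 5).
static_assert(sizeof(bool) == 1, "layout below assumes a 1-byte bool");

// Hypothetical slot types; the slide's FLOAT, STRING, etc. are omitted.
enum SlotType { BOOLEAN, INT };

// Hypothetical parser over a comma-separated text row, standing in for the
// slide's ParseInt()/ParseBoolean(). Each call consumes one field.
struct RowParser {
  const char* p;

  std::int32_t ParseInt() {
    bool neg = (*p == '-');
    if (neg) ++p;
    std::int32_t v = 0;
    while (*p >= '0' && *p <= '9') v = v * 10 + (*p++ - '0');
    if (*p == ',') ++p;  // skip the field separator
    return neg ? -v : v;
  }

  bool ParseBoolean() {
    bool v = (*p == 't');  // fields starting with 't' are true
    while (*p != '\0' && *p != ',') ++p;
    if (*p == ',') ++p;
    return v;
  }
};

// Interpreted version: consults offsets_ and types_ for every slot of every row.
class Materializer {
 public:
  Materializer(std::vector<int> offsets, std::vector<SlotType> types)
      : offsets_(std::move(offsets)), types_(std::move(types)) {}

  void MaterializeTuple(RowParser& in, char* tuple) const {
    for (std::size_t i = 0; i < types_.size(); ++i) {
      char* slot = tuple + offsets_[i];
      switch (types_[i]) {  // branch evaluated per slot, per row
        case BOOLEAN: {
          bool b = in.ParseBoolean();
          std::memcpy(slot, &b, sizeof b);
          break;
        }
        case INT: {
          std::int32_t v = in.ParseInt();
          std::memcpy(slot, &v, sizeof v);  // memcpy: offset 5 is unaligned
          break;
        }
      }
    }
  }

 private:
  std::vector<int> offsets_;
  std::vector<SlotType> types_;
};

// Hand-written equivalent of what codegen would emit for the layout
// {INT@0, BOOLEAN@4, INT@5}: no loop, no switch, constant offsets.
void MaterializeTupleSpecialized(RowParser& in, char* tuple) {
  std::int32_t a = in.ParseInt();
  std::memcpy(tuple + 0, &a, sizeof a);
  bool b = in.ParseBoolean();
  std::memcpy(tuple + 4, &b, sizeof b);
  std::int32_t c = in.ParseInt();
  std::memcpy(tuple + 5, &c, sizeof c);
}
```

Usage: construct `Materializer({0, 4, 5}, {INT, BOOLEAN, INT})` and feed both functions the row `"42,true,-7"`. The two resulting 9-byte tuples should be identical. The specialized function hard-codes offsets and types, so it is only valid for one layout. That is the trade-off of generating it per query.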

User-Defined Functions (UDFs)
● Allow users to extend Impala’s functionality by writing their own functions, e.g. SELECT my_func(c1) FROM table;
● Defined as C++ functions
● UDFs can be compiled to IR (vs. native code) with Clang ⇒ UDFs can be inlined into the generated query code

```cpp
IntVal my_func(const IntVal& v1, const IntVal& v2) {
  return IntVal(v1.val * 7 / v2.val);
}
```

SELECT my_func(col1 + 10, col2) FROM ...

Interpreted: the expression tree (my_func with children "+" and col2; "+" with children col1 and 10) is evaluated node by node through function pointers.
Codegen’d: the whole tree is inlined into a single expression, (col1 + 10) * 7 / col2.
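The sketch below shows the same contrast for the UDF example. It is runnable and deliberately minimal. `IntVal` here is a hypothetical stand-in with NULL handling omitted. `Expr`, `SlotRef`, `Literal`, `Add`, and `UdfCall` are hypothetical names, not Impala classes. `EvalInlined` is written by hand to show what inlining produces; LLVM did not emit it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Minimal hypothetical stand-in for the slide's IntVal (NULL handling omitted).
struct IntVal {
  std::int32_t val;
  explicit IntVal(std::int32_t v) : val(v) {}
};

// The UDF from the slide.
IntVal my_func(const IntVal& v1, const IntVal& v2) {
  return IntVal(v1.val * 7 / v2.val);
}

using Row = std::vector<std::int32_t>;  // hypothetical row: column values by index

// Interpreted evaluation: every node is reached through a virtual call.
struct Expr {
  virtual ~Expr() = default;
  virtual IntVal Eval(const Row& row) const = 0;
};

struct SlotRef : Expr {
  std::size_t col;
  explicit SlotRef(std::size_t c) : col(c) {}
  IntVal Eval(const Row& row) const override { return IntVal(row[col]); }
};

struct Literal : Expr {
  std::int32_t value;
  explicit Literal(std::int32_t v) : value(v) {}
  IntVal Eval(const Row&) const override { return IntVal(value); }
};

struct Add : Expr {
  std::unique_ptr<Expr> lhs, rhs;
  Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
      : lhs(std::move(l)), rhs(std::move(r)) {}
  IntVal Eval(const Row& row) const override {
    return IntVal(lhs->Eval(row).val + rhs->Eval(row).val);
  }
};

// Calls the UDF through a function pointer, as an interpreter must.
struct UdfCall : Expr {
  using Fn = IntVal (*)(const IntVal&, const IntVal&);
  Fn fn;
  std::unique_ptr<Expr> arg0, arg1;
  UdfCall(Fn f, std::unique_ptr<Expr> a, std::unique_ptr<Expr> b)
      : fn(f), arg0(std::move(a)), arg1(std::move(b)) {}
  IntVal Eval(const Row& row) const override {
    return fn(arg0->Eval(row), arg1->Eval(row));
  }
};

// Tree for my_func(col1 + 10, col2), with col1 at index 0 and col2 at index 1.
std::unique_ptr<Expr> BuildTree() {
  return std::make_unique<UdfCall>(
      &my_func,
      std::make_unique<Add>(std::make_unique<SlotRef>(0),
                            std::make_unique<Literal>(10)),
      std::make_unique<SlotRef>(1));
}

// What inlining the UDF and the tree collapses to: (col1 + 10) * 7 / col2.
std::int32_t EvalInlined(const Row& row) { return (row[0] + 10) * 7 / row[1]; }
```

Both paths compute the same value. For the row {32, 6}, both return 49. The difference is cost: the interpreted path makes five virtual calls and one function-pointer call per row. Compiling the UDF to IR and inlining it is meant to remove exactly that overhead.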

User-Defined Functions (UDFs)
Future work: UDFs in other languages with LLVM frontends

Two choices for code generation:
● Use the LLVM C++ API to handcraft IR
● Compile C++ to IR (cross compilation)


Native interpreted function:

```cpp
void HdfsAvroScanner::MaterializeTuple(MemPool* pool, uint8_t** data, Tuple* tuple) {
  BOOST_FOREACH(const SchemaElement& element, avro_header_->schema) {
    const SlotDescriptor* slot_desc = element.slot_desc;
    bool write_slot = false;
    void* slot = NULL;
    PrimitiveType slot_type = INVALID_TYPE;
    if (slot_desc != NULL) {
      write_slot = true;
      slot = tuple->GetSlot(slot_desc->tuple_offset());
      slot_type = slot_desc->type();
    }

    avro_type_t type = element.type;
    if (element.null_union_position != -1 &&
        !ReadUnionType(element.null_union_position, data)) {
      type = AVRO_NULL;
    }

    switch (type) {
      case AVRO_NULL:
        if (slot_desc != NULL) tuple->SetNull(slot_desc->null_indicator_offset());
        break;
      case AVRO_BOOLEAN:
        ReadAvroBoolean(slot_type, data, write_slot, slot, pool);
        break;
      case AVRO_INT32:
        ReadAvroInt32(slot_type, data, write_slot, slot, pool);
        break;
      case AVRO_INT64:
        ReadAvroInt64(slot_type, data, write_slot, slot, pool);
        break;
      case AVRO_FLOAT:
        ReadAvroFloat(slot_type, data, write_slot, slot, pool);
        break;
      case AVRO_DOUBLE:
        ReadAvroDouble(slot_type, data, write_slot, slot, pool);
        break;
      case AVRO_STRING:
      case AVRO_BYTES:
        ReadAvroString(slot_type, data, write_slot, slot, pool);
        break;
      default:
        DCHECK(false) << "Unsupported SchemaElement: " << type;
    }
  }
}
```
