Designing an adaptive VM that combines vectorized and JIT execution

Designing an adaptive VM that combines vectorized and JIT execution on heterogeneous hardware
Tim Gubner
ICDE PhD Symposium, 2018

Modern hardware vs. data processing systems
[Figure: CPU, GPU, FPGA, ASIC; Dark Silicon]

State of the art

System      CPU  GPU  FPGA  ASIC
MonetDB     ✓
doppioDB    ✓         ✓
Ocelot      ✓    ✓
HyPer       ✓
MapD        ✓    ✓
CoGaDB      ✓    ✓
TensorFlow  ✓    ✓    ?     ✓

Goal
One system to rule them all
One system to bring them, and in Dark Silicon bind them

Idea
Domain-specific language → Adaptive virtual machine → CPU, GPU, FPGA, ASIC

Virtual Machine

Compile or not to compile
• Compilation is time consuming (≥ 20 ms [1])
• Also noticeable in HyPer [2]
• Compilers make assumptions. Resulting code is either:
  • Static and concise
  • Dynamic and bulky (code explosion)
Why would we ALWAYS want to compile EVERYTHING?
[1] Using LLVM C++ API and optimization passes
[2] Kohn et al. "Adaptive Execution of Compiled Queries", ICDE 2018
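The question on this slide can be read as a break-even calculation: compiling only pays off if the time saved on the remaining work exceeds the compilation time. The following is a minimal sketch under stated assumptions. The function name and the per-tuple costs are hypothetical; only the 20 ms default comes from the slide, where it is given as a lower bound.

```python
def should_compile(remaining_tuples, interp_ns_per_tuple,
                   compiled_ns_per_tuple, compile_ms=20.0):
    """Return True if compiling is expected to pay off.

    All inputs are illustrative assumptions. Only the 20 ms default is
    taken from the slide (lower bound for LLVM-based compilation).
    """
    compile_ns = compile_ms * 1e6
    # Time saved by running the remaining tuples through compiled code.
    saved_ns = remaining_tuples * (interp_ns_per_tuple - compiled_ns_per_tuple)
    return saved_ns > compile_ns
```

With a hypothetical saving of 1 ns per tuple, one million remaining tuples save about 1 ms, which does not cover a 20 ms compilation, while one hundred million tuples save about 100 ms. As the takeaways note, such cost models are hard to get right in practice, so this should be read as an intuition aid only.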

(Real) JIT-compilation
Cycle: Interpret → Profile → Optimize → Compile → back to Interpret
• Interpret → Profile: collect runtime information & traces
• Profile → Optimize: select worthy sub-program(s)
• Optimize → Compile: create specialised program (& guards)
• Compile → Interpret: install new kernel (& guards)
Properties
• Adaptive by design
• Low compilation effort
• Ability to exploit multiple hardware architectures
• Aggressive workload-driven optimizations
• Mixed execution
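The cycle on this slide can be sketched as a toy loop that interprets first, counts how often each operation runs, and installs a specialised kernel with a guard once an operation becomes hot. This is an illustration, not the author's design: the class `AdaptiveVM`, the operations `double` and `negate`, the call-count threshold, and the type-based guard are all hypothetical, and "compilation" is simulated by building a Python closure.

```python
from collections import Counter


class AdaptiveVM:
    """Toy interpret/profile/compile loop. All names are hypothetical."""

    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold
        self.calls = Counter()   # profile: interpreted runs per operation
        self.kernels = {}        # installed (guard, kernel) pairs
        self.log = []            # which path served each call

    def interpret(self, op, vec):
        # Generic path: dispatch on the operation name at run time.
        if op == "double":
            return [2 * x for x in vec]
        if op == "negate":
            return [-x for x in vec]
        raise ValueError(op)

    def compile(self, op, sample):
        # Stand-in for real code generation: pick a specialised closure
        # and a guard that checks the assumption it was built under
        # (here: every element has the element type seen in the sample).
        elem_type = type(sample[0])
        kernel = {
            "double": lambda v: [x + x for x in v],
            "negate": lambda v: [-x for x in v],
        }[op]

        def guard(v):
            return all(type(x) is elem_type for x in v)

        return guard, kernel

    def run(self, op, vec):
        installed = self.kernels.get(op)
        if installed is not None:
            guard, kernel = installed
            if guard(vec):
                self.log.append("compiled")
                return kernel(vec)
            # Guard failed: fall through to the interpreter.
        self.calls[op] += 1
        self.log.append("interpreted")
        result = self.interpret(op, vec)
        if (self.calls[op] >= self.hot_threshold
                and op not in self.kernels and vec):
            self.kernels[op] = self.compile(op, vec)
        return result
```

Because the interpreter always remains available, a failed guard simply routes the call back to the generic path, which is one way to read "mixed execution" on the slide.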

Domain-Specific Language

The search for the right level of abstraction
Low enough
• Micro-adaptivity [3]
• Efficient interpretation
• JIT / incremental compilation
High enough
• Efficient execution on multiple devices
• Macro-adaptivity: e.g. reorder operations
Goal
Relational algebra → ? → Assembly, OpenCL ...
[3] Răducanu et al. "Micro adaptivity in Vectorwise", SIGMOD 2013

Why (another) DSL?
• Relational algebra: too high-level
• (Scalar) Monad/Monoid comprehension (Weld [4], MRQL [5]): high-level, but per-tuple transformations lose information
[4] S. Palkar et al. "Weld: A Common Runtime for High Performance Data Analytics", CIDR 2017
[5] Fegaras, L. "An Algebra for Distributed Big Data Analytics", 2016

Why (another) DSL?
• C-alikes (OpenCL, CUDA, Intel SPMD (ispc) ...): too low-level
• MonetDB assembly language: heavily data-parallel, too low-level

Our vision
Data-parallelism as first-class citizen
• Data-parallel skeletons/patterns: specialized operations on chunks of data, for example map, filter, scatter, gather ...
• Lambda functions
• Immutable variables for intermediates (static single assignment form)
• Mutable variables for remaining state
• Partially typed (a ∈ DECIMAL(6,2) instead of a ∈ int64_t)
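To make the skeleton idea concrete, the following sketch models four of the named skeletons as plain Python functions over a chunk (a list). The signatures are assumptions, since the slides do not define them. In particular, `filter_` is assumed to return a selection vector of positions instead of the selected values, and the trailing underscores only avoid shadowing Python built-ins.

```python
def map_(fn, chunk):
    """Apply fn to every element of the chunk."""
    return [fn(x) for x in chunk]


def filter_(pred, chunk):
    """Return the positions whose elements satisfy pred (selection vector)."""
    return [i for i, x in enumerate(chunk) if pred(x)]


def gather(chunk, positions):
    """Read chunk[p] for every position p, in order."""
    return [chunk[p] for p in positions]


def scatter(values, positions, size, fill=None):
    """Write values[j] to output slot positions[j]; other slots keep fill."""
    out = [fill] * size
    for value, p in zip(values, positions):
        out[p] = value
    return out


chunk = [3, -1, 4, -5]
doubled = map_(lambda x: 2 * x, chunk)         # intermediate, never mutated
selected = filter_(lambda x: x > 0, doubled)   # positions of positive values
compact = gather(doubled, selected)            # dense vector of survivors
restored = scatter(compact, selected, len(chunk), fill=0)
```

Each intermediate is assigned once and never modified, which mirrors the static single assignment style listed above. Each skeleton operates on a whole chunk at a time, so an implementation is free to interpret it, vectorize it, or compile it for another device.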

Skeletons
[Table: skeletons used by each operator. Columns: map, filter, scatter, gather, ht_ins, merge. Rows: π, σ, ⋈ Hash, G Hash, ∪ Hash, ⋈ Merge, Sort. The per-cell check marks did not survive extraction.]
Skeletons themselves do not need to be implemented data-parallel (e.g. ht_ins)...

Example

    mut i
    mut k
    i := 0
    k := 0
    loop
      let input = read i some_data in
      let a = map (\x -> 2*x) input in
      let t = filter (\x -> x>0) a in
      let b = condense t
      write x i a
      write y k b
      i := i + len(a)
      k := k + len(b)
      if i >= 4096 then
        break
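One possible reading of the example, written out in Python, is shown below. Several details are assumptions, because the slide does not define them: `read` is taken to fetch the next chunk starting at position `i`, `filter` is taken to produce a selection vector, `condense` is taken to materialize the selected values densely, and the chunk size and the early exit on exhausted input are additions for this sketch.

```python
CHUNK = 1024  # assumed vector size; the slide does not state one


def run_example(some_data, limit=4096, chunk=CHUNK):
    """Python reading of the DSL example; semantics are assumed, see above."""
    x, y = [], []   # targets of `write x ...` and `write y ...`
    i = 0           # mut i: read/write position for x
    k = 0           # mut k: write position for y
    while True:                                        # loop
        inp = some_data[i:i + chunk]                   # read i some_data
        if not inp:
            break   # added for this sketch: stop when input is exhausted
        a = [2 * v for v in inp]                       # map (\x -> 2*x)
        t = [p for p, v in enumerate(a) if v > 0]      # filter (\x -> x>0)
        b = [a[p] for p in t]                          # condense t
        x[i:i + len(a)] = a                            # write x i a
        y[k:k + len(b)] = b                            # write y k b
        i += len(a)                                    # i := i + len(a)
        k += len(b)                                    # k := k + len(b)
        if i >= limit:                                 # if i >= 4096 then break
            break
    return x, y
```

The two mutable counters carry state across iterations, while `inp`, `a`, `t`, and `b` are fresh immutable intermediates in every iteration, matching the split between mutable and immutable variables described in the vision slide.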

Plan
• Base framework: DSL, vectorized interpreter
• Dynamic VM: workload-specific optimizations
• Multiple target architectures: GPUs, potentially FPGAs

Takeaways
DSL
• Abstract enough for:
  • Efficient portability
  • Adaptive optimizations
  • Efficient interpretation
• State of the art does not fit!
• Data parallelism as first-class citizen
VM
• Interpret first, maybe compile later
• Cost-models are hard to get right!
• Adaptive by design
• Aggressive workload-driven optimizations
• Mixed execution
[Figure: domain-specific language, adaptive virtual machine, CPU, GPU, FPGA, ASIC]
