Instrew: Leveraging LLVM for High Performance Dynamic Binary Instrumentation
Alexis Engelke, Martin Schulz
Chair of Computer Architecture and Parallel Systems
TUM Department of Informatics, Technical University of Munich
VEE 2020, virtual

Outline: Introduction | Lifting x86-64 to LLVM-IR | Instrumentation Framework | Evaluation

Program Instrumentation
◮ Enhance program with additional code
◮ Use-cases: analysis, debugging, optimization, portability
◮ Dynamic Binary Instrumentation (DBI)
  ◮ Binary code instrumented/modified at run-time
  ◮ Works without recompiling program and libraries
◮ Very popular approach ⇒ many frameworks available

DBI Frameworks
◮ Most popular framework: Valgrind
  ◮ Program behavior can be extended and modified
  ◮ Allows for extensive code transformations
◮ Usual focus: low rewriting time, not overall performance
  ◮ Few optimizations, instrumented code has low quality
Solution: use standard compiler back-end

LLVM for DBI
◮ LLVM features high quality optimizer/code generator
  ◮ Built-in JIT-compiler allows use at run-time
◮ DBILL uses LLVM JIT-compiler for code generation
  ◮ Machine code → TCG IR → LLVM-IR
  + Easy to support several architectures
  − No (efficient) floating-point/SIMD support
  − Optimizations limited to basic blocks
Solution: lift machine code directly to LLVM-IR

Classical DBI Architecture
[Diagram: inside the instrumenter process, the Execution Manager runs the main loop over these stages: Guest Code, Decode, Lift to IR, (Instrument Code), Optimize IR, Code Gen., Code Cache]
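A minimal sketch of such a main loop, assuming a toy guest program instead of machine code. The names `GUEST`, `translate` and `main_loop` are hypothetical and not taken from any framework; `translate` stands in for the whole decode, lift, instrument, optimize and code generation pipeline.

```python
from typing import Callable, Dict, Optional

# Hypothetical guest "program": address -> (operation, target address).
# A real DBI tool would decode machine code here instead.
GUEST = {
    0x1000: ("inc", 0x1004),
    0x1004: ("loop", 0x1000),
}

State = Dict[str, int]
Translated = Callable[[State], Optional[int]]


def translate(addr: int) -> Translated:
    """Stand-in for decode -> lift -> instrument -> optimize -> code gen."""
    op, target = GUEST[addr]

    def run(state: State) -> Optional[int]:
        state["executed"] += 1            # instrumentation: count executed chunks
        if op == "inc":
            state["acc"] += 1
            return target
        # "loop": jump back while acc < 3, otherwise leave the guest
        return target if state["acc"] < 3 else None

    return run


def main_loop(entry: int) -> State:
    code_cache: Dict[int, Translated] = {}
    state: State = {"acc": 0, "executed": 0, "translations": 0}
    addr: Optional[int] = entry
    while addr is not None:
        if addr not in code_cache:        # miss: translate once, then reuse
            code_cache[addr] = translate(addr)
            state["translations"] += 1
        addr = code_cache[addr](state)    # run cached code, get next address
    return state
```

Each chunk is translated once and then served from the code cache, so translation cost is paid per address while the translated code runs on every iteration.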

Architecture Using LLVM-IR
[Diagram: inside the instrumenter process, the Execution Manager runs the main loop over these stages: Guest Code, Decode, Lift to LLVM-IR, Opt. LLVM-IR, LLVM JIT, Code Cache]

Lifting x86-64 Code to LLVM-IR
◮ Focus on most common x86-64 architecture
◮ Requirements:
  1. LLVM-IR must be handled well by optimizer/code gen. ⇒ run-time performance
  2. Avoid unnecessary transformations ⇒ reduced rewriting time
  3. Only use architecture-independent LLVM-IR constructs ⇒ retargetability (assuming same pointer size)
Implemented in our lifting library: Rellume

Lifting Stages
1. Decode & Recover Control Flow
  ◮ Decode machine code, following jump targets
  ◮ Stops on indirect branches, calls, returns
  ◮ Split into basic blocks
2. Lift Instructions Individually
  ◮ Create skeleton LLVM-IR function
  ◮ Generate LLVM-IR for each instruction
3. Create Epilogue & Fixup Branches
  ◮ Add branches between basic blocks, map data flow
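Stage 1 can be sketched as a worklist algorithm. This is an illustration on a hypothetical pre-decoded instruction table (`CODE`), not Rellume's decoder; mnemonics, sizes and addresses are made up.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical pre-decoded instructions: address -> (mnemonic, size, direct target).
Insn = Tuple[str, int, Optional[int]]
CODE: Dict[int, Insn] = {
    0x00: ("cmp", 3, None),
    0x03: ("jcc", 2, 0x09),   # conditional direct jump
    0x05: ("add", 2, None),
    0x07: ("jmp", 2, 0x0B),   # unconditional direct jump
    0x09: ("sub", 2, None),
    0x0B: ("ret", 1, None),   # decoding stops here
}

# Decoding does not continue past these.
TERMINATORS = {"ret", "call", "jmp_indirect"}


def recover_blocks(entry: int) -> Dict[int, List[int]]:
    """Return basic blocks as {leader address: [instruction addresses]}."""
    seen: Set[int] = set()
    leaders: Set[int] = {entry}
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        while addr in CODE and addr not in seen:
            seen.add(addr)
            name, size, target = CODE[addr]
            if name in TERMINATORS:
                break
            if target is not None:            # direct branch: follow the target
                leaders.add(target)
                worklist.append(target)
                if name == "jmp":
                    break                     # no fall-through path
                leaders.add(addr + size)      # fall-through starts a new block
            addr += size

    blocks: Dict[int, List[int]] = {}
    current: Optional[int] = None
    for addr in sorted(seen):
        if current is None or addr in leaders:
            current = addr
            blocks[current] = []
        blocks[current].append(addr)
        name, _, target = CODE[addr]
        if name in TERMINATORS or target is not None:
            current = None                    # a branch always ends its block
    return blocks
```

For the table above, `recover_blocks(0x00)` yields four blocks starting at 0x00, 0x05, 0x09 and 0x0B.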

Register Facets
◮ Facet: typed view on a register (part)
◮ Store and propagate multiple facets for registers
  ◮ Relevant for partial access and different data types
  ◮ Avoids many insert/extract/cast ops ⇒ better code
  ◮ Benefit: better optimizations across basic blocks
◮ General Purpose registers: scalar facets only
    rax   64-bit int
    eax   32-bit int
    ax    16-bit int
    ah    8-bit int (high)
    . . .
◮ Vector registers: scalar and vector facets
    4 × 32-bit float
    8 × 16-bit int
    . . .
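A rough sketch of the facet idea for one general-purpose register, assuming facets are cached views derived from the full 64-bit value. The class `FacetedRegister` and the facet names are hypothetical; the actual lifter keeps facets as LLVM-IR values, not Python integers.

```python
from typing import Dict

# Hypothetical facet table: facet name -> (bit offset, bit width)
FACETS = {
    "i64": (0, 64),   # rax
    "i32": (0, 32),   # eax
    "i16": (0, 16),   # ax
    "i8h": (8, 8),    # ah
}
MASK64 = (1 << 64) - 1


class FacetedRegister:
    """Keeps the full value plus already-derived views of it."""

    def __init__(self, value: int) -> None:
        self.cache: Dict[str, int] = {"i64": value & MASK64}
        self.derivations = 0                  # how many extract ops were needed

    def get(self, facet: str) -> int:
        if facet not in self.cache:           # derive once, then reuse
            offset, width = FACETS[facet]
            full = self.cache["i64"]
            self.cache[facet] = (full >> offset) & ((1 << width) - 1)
            self.derivations += 1
        return self.cache[facet]

    def set64(self, value: int) -> None:
        # A full write makes every narrower view stale.
        self.cache = {"i64": value & MASK64}
```

Repeated reads of the same facet reuse the cached view, which mirrors how propagating facets avoids repeated extract/cast operations.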

Example
Single parameter: CPU struct
◮ Instruction Ptr.
◮ Registers
◮ Status Flags
◮ . . .

    define void @func_40061e(i8* %cpu) {
    prologue:
      ; ...
    bb_40061e:
      ; ...
    epilogue:
      ; ...
    }

Example

    define void @func_40061e(i8* %cpu) {
    prologue:
      ; Construct ptrs. into CPU struct
      %rip_p_i8 = getelementptr i8, i8* %cpu, i64 0
      %rip_p = bitcast i8* %rip_p_i8 to i64*
      %rsp_p_i8 = getelementptr i8, i8* %cpu, i64 40
      %rsp_p = bitcast i8* %rsp_p_i8 to i64*
      ; Load registers into SSA variables
      %rsp = load i64, i64* %rsp_p
      ; ... load other registers ...
      br label %bb_40061e
    bb_40061e:
      ; ...
    epilogue:
      ; ...
    }

Example

    define void @func_40061e(i8* %cpu) {
    prologue:
      ; ...
    bb_40061e:
      %rsp_2 = phi i64 [%rsp, %prologue]
      ; Lift instruction semantics
      ; sub rsp, 176
      %rsp_3 = sub i64 %rsp_2, 176
      ; ... compute flags ...
      br label %epilogue
    epilogue:
      ; ...
    }

Example

    define void @func_40061e(i8* %cpu) {
    prologue:
      ; ...
    bb_40061e:
      ; ...
    epilogue:
      %rsp_4 = phi i64 [%rsp_3, %bb_40061e]
      ; Store new values
      store i64 %rsp_4, i64* %rsp_p
      ; ... store flags ...
      ; Store new RIP (0x400625)
      store i64 4195877, i64* %rip_p
      ret void
    }
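What the lifted function does to the CPU struct can be mimicked in a few lines. This is a sketch only: the offsets 0 (RIP) and 40 (RSP) come from the example, while the little-endian layout, the 48-byte struct size in the usage below and the omission of flags are assumptions for illustration.

```python
import struct

RIP_OFFSET = 0     # offsets as used in the example's pointer arithmetic
RSP_OFFSET = 40
MASK64 = (1 << 64) - 1


def func_40061e(cpu: bytearray) -> None:
    """Emulates the lifted chunk for `sub rsp, 176` on a raw CPU struct."""
    # prologue: load the register out of the CPU struct
    (rsp,) = struct.unpack_from("<Q", cpu, RSP_OFFSET)
    # bb_40061e: sub rsp, 176 (flag computation omitted in this sketch)
    rsp = (rsp - 176) & MASK64
    # epilogue: write back the new register value and the next RIP
    struct.pack_into("<Q", cpu, RSP_OFFSET, rsp)
    struct.pack_into("<Q", cpu, RIP_OFFSET, 0x400625)
```

Usage, with an assumed struct size large enough to hold both fields:

```python
cpu = bytearray(48)
struct.pack_into("<Q", cpu, RSP_OFFSET, 0x7FFD00001000)
func_40061e(cpu)
```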

Instrew Architecture
[Diagram: the pipeline is split across a Client Process and a Server Process. Components shown: Guest Code, Execution Manager (main loop), Decode, Rellume, Opt. LLVM-IR, LLVM JIT, Code Cache]

Client-Server Architecture
◮ Instrew Server
  ◮ Rewrites code chunks on client request
  ◮ Returns an ELF object file containing rewritten code
◮ Instrew Client
  ◮ Manages execution and local code cache
  ◮ Sends request with program code to server process
  ◮ Relocates and links ELF files
◮ Communication: custom IPC protocol
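The division of work between client and server can be sketched as follows. The slides do not describe the wire format, so the length-prefixed framing (`HEADER`), the function names and the placeholder payload are all hypothetical; a real server would return an ELF object file and a real client would relocate and link it.

```python
import struct
from typing import Callable, Dict, Tuple

HEADER = struct.Struct("<QI")   # hypothetical framing: guest address, payload length


def pack_message(addr: int, payload: bytes) -> bytes:
    return HEADER.pack(addr, len(payload)) + payload


def unpack_message(msg: bytes) -> Tuple[int, bytes]:
    addr, length = HEADER.unpack_from(msg)
    payload = msg[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return addr, payload


def server_rewrite(request: bytes) -> bytes:
    """Stand-in for the server: tags the code instead of producing an ELF object."""
    addr, code = unpack_message(request)
    return pack_message(addr, b"REWRITTEN:" + code)


class Client:
    def __init__(self, send: Callable[[bytes], bytes]) -> None:
        self.send = send                       # transport to the server process
        self.code_cache: Dict[int, bytes] = {}
        self.requests = 0

    def get_code(self, addr: int, guest_code: bytes) -> bytes:
        if addr not in self.code_cache:        # only ask the server on a miss
            self.requests += 1
            reply = self.send(pack_message(addr, guest_code))
            _, obj = unpack_message(reply)
            self.code_cache[addr] = obj        # real client: relocate and link here
        return self.code_cache[addr]
```

Usage, with the server called in-process instead of over IPC:

```python
client = Client(server_rewrite)
sub_rsp_176 = bytes.fromhex("4881ecb0000000")  # x86-64 encoding of sub rsp, 176
client.get_code(0x40061E, sub_rsp_176)
client.get_code(0x40061E, sub_rsp_176)         # served from the local code cache
```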

Translation Details
◮ Translate code chunks with function granularity
  ◮ Decode until call/ret/indirect jump
  ◮ Enables power of LLVM's whole-function optimizations
  ◮ Reduces number of rewrite requests
◮ Use special calling convention
  ◮ Reduces number of memory accesses to CPU structure
◮ Don't compute flags before call / ret
  ◮ Flags extremely rarely used to pass args/return vals

Evaluation
◮ Run on SPEC CPU2017 benchmarks
◮ Comparison with Valgrind
  ◮ Most popular tool with similar set of use-cases
  ◮ No comparison with DBILL (no sources) and Pin (different scope of code modifications)

System: 2 × Intel Xeon CPU E5-2697 v3 (Haswell) @ 2.6 GHz (3.6 GHz Turbo), 17 MiB L3 cache; 64 GiB main memory; SUSE Linux 12; Linux kernel 4.12.14-95.32; 64-bit mode.
Compiler: GCC 9.2.0 with -O3 -march=x86-64, implies SSE/SSE2 but no SSE3+/AVX.
Libraries: glibc 2.22; LLVM 9.0.
Benchmarks: SPEC CPU2017 intspeed+fpspeed, ref workload, single thread.
Comparison: Valgrind 3.15.0.

SPEC CPU2017 Results
[Figure: bar chart of normalized run-time (y-axis, 0 to 14) per SPEC CPU2017 benchmark (x-axis) for Native, Valgrind and Instrew]

SPEC CPU2017 Results: Overhead 1/5 of Valgrind
[Figure: bar chart of normalized run-time (y-axis, 0 to 14) per SPEC CPU2017 benchmark (x-axis) for Native, Valgrind and Instrew]
◮ Instrew: 1.7x (72% overhead)
◮ Valgrind: 4.7x (367% overhead)
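The headline numbers are consistent with each other, as a quick check shows. The sketch assumes normalized run-time equals 1 plus the overhead fraction and uses only the percentages quoted on the slide.

```python
def slowdown(overhead_percent: float) -> float:
    """Normalized run-time relative to native for a given overhead."""
    return 1.0 + overhead_percent / 100.0


instrew = slowdown(72)        # 1.72, shown rounded as 1.7x
valgrind = slowdown(367)      # 4.67, shown rounded as 4.7x
overhead_ratio = 72 / 367     # about 0.196, i.e. roughly 1/5 of Valgrind's overhead
```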

SPEC CPU2017 Results: Instrew Best Case
[Figure: bar chart of normalized run-time (y-axis, 0 to 14) per SPEC CPU2017 benchmark (x-axis) for Native, Valgrind and Instrew]
◮ Instrew: 1.1x; Valgrind: 3.0x
