Instrew: Leveraging LLVM for High Performance Dynamic Binary Instrumentation Alexis Engelke Martin Schulz Chair of Computer Architecture and Parallel Systems TUM Department of Informatics Technical University of Munich VEE 2020, virtual


SLIDE 1

Instrew: Leveraging LLVM for High Performance Dynamic Binary Instrumentation

Alexis Engelke Martin Schulz

Chair of Computer Architecture and Parallel Systems TUM Department of Informatics Technical University of Munich

VEE 2020, virtual

SLIDE 2

2 Alexis Engelke 2020

Program Instrumentation

◮ Enhance program with additional code
◮ Use-cases: analysis, debugging, optimization, portability
◮ Dynamic Binary Instrumentation (DBI)
  ◮ Binary code instrumented/modified at run-time
  ◮ Works without recompiling program and libraries
  ◮ Very popular approach ⇒ many frameworks available

Outline: Introduction · Lifting x86-64 to LLVM-IR · Instrumentation Framework · Evaluation

SLIDE 3

DBI Frameworks

◮ Most popular framework: Valgrind
  ◮ Program behavior can be extended and modified
  ◮ Allows for extensive code transformations
◮ Usual focus: low rewriting time, not overall performance
  ◮ Few optimizations, instrumented code has low quality

Solution: use standard compiler back-end

SLIDE 4

LLVM for DBI

◮ LLVM features high quality optimizer/code generator

◮ Built-in JIT-compiler allows use at run-time

◮ DBILL uses LLVM JIT-compiler for code generation
  ◮ Machine code → TCG IR → LLVM-IR
  + Easy to support several architectures
  − No (efficient) floating-point/SIMD support
  − Optimizations limited to basic blocks

Solution: lift machine code directly to LLVM-IR

SLIDE 5

Classical DBI Architecture

[Diagram: instrumenter process containing guest code, code cache, and execution manager. Main loop: Decode → Lift to IR → (Instrument Code) → Optimize IR → Code Gen.]
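The main loop in this architecture can be pictured with a small sketch. It is only an illustration under stated assumptions: `run`, `toy_translate`, and the two-block toy program are hypothetical names, and the whole decode/lift/instrument/optimize/code-gen pipeline is hidden behind a single callback.

```python
from typing import Callable, Dict, Optional

# Hypothetical model: a translated block mutates the guest state and
# returns the address of the next guest block (None to stop).
State = Dict[str, int]
Block = Callable[[State], Optional[int]]


def run(entry: int, translate: Callable[[int], Block], state: State) -> int:
    """Execution-manager main loop: look up, translate on miss, execute."""
    code_cache: Dict[int, Block] = {}
    translations = 0
    pc: Optional[int] = entry
    while pc is not None:
        block = code_cache.get(pc)
        if block is None:
            # Cache miss: decode -> lift -> (instrument) -> optimize -> code gen.,
            # all represented by the translate() callback in this sketch.
            block = translate(pc)
            code_cache[pc] = block
            translations += 1
        pc = block(state)
    return translations


def toy_translate(addr: int) -> Block:
    """Stand-in for the rewriting pipeline on a two-block toy program."""
    def loop_body(state: State) -> Optional[int]:
        state["counter"] -= 1
        return 0x1000 if state["counter"] > 0 else 0x2000

    def exit_block(state: State) -> Optional[int]:
        state["done"] = 1
        return None

    return {0x1000: loop_body, 0x2000: exit_block}[addr]


guest_state = {"counter": 5, "done": 0}
translation_count = run(0x1000, toy_translate, guest_state)
```

The loop body executes five times but is translated only once, which is why rewriting cost can amortize over a long-running program.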

SLIDE 6

Architecture Using LLVM-IR

[Diagram: instrumenter process containing guest code, code cache, and execution manager. Main loop: Decode → Lift to LLVM-IR → Opt. LLVM-IR → LLVM JIT]

SLIDE 7

Lifting x86-64 Code to LLVM-IR

◮ Focus on x86-64, the most common architecture
◮ Requirements:
  1. LLVM-IR must be handled well by optimizer/code gen. ⇒ run-time performance
  2. Avoid unnecessary transformations ⇒ reduced rewriting time
  3. Only use architecture-independent LLVM-IR constructs ⇒ retargetability (assuming same pointer size)

Implemented in our lifting library: Rellume

SLIDE 8

Lifting Stages

1. Decode & Recover Control Flow
  ◮ Decode machine code, following jump targets
  ◮ Stops on indirect branches, calls, returns
  ◮ Split into basic blocks
2. Lift Instructions Individually
  ◮ Create skeleton LLVM-IR function
  ◮ Generate LLVM-IR for each instruction
3. Create Epilogue & Fixup Branches
  ◮ Add branches between basic blocks, map data flow
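Stage 1 can be sketched on a toy instruction stream. This is not Rellume's decoder: the one-address-per-instruction encoding, the instruction kinds (`op`, `jmp`, `jcc`, `call`, `ret`, `ijmp`), and `recover_blocks` are assumptions made purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Toy program: address -> (kind, direct target or None).
# Every instruction occupies exactly one address in this sketch.
Program = Dict[int, Tuple[str, Optional[int]]]
STOP_KINDS = {"ret", "call", "ijmp"}  # decoding stops here


def recover_blocks(program: Program, entry: int) -> Dict[int, List[int]]:
    # Pass 1: follow direct jump targets, collecting reachable
    # instructions and the addresses that start a basic block.
    leaders = {entry}
    seen = set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        while addr in program and addr not in seen:
            seen.add(addr)
            kind, target = program[addr]
            if kind in STOP_KINDS:
                break
            if kind == "jmp":
                leaders.add(target)
                worklist.append(target)
                break
            if kind == "jcc":
                leaders.update({target, addr + 1})
                worklist.append(target)
            addr += 1

    # Pass 2: split the reachable instructions at the leaders.
    blocks: Dict[int, List[int]] = {}
    for leader in sorted(leaders & seen):
        addr, body = leader, []
        while addr in seen:
            body.append(addr)
            kind, _ = program[addr]
            if kind in STOP_KINDS or kind in ("jmp", "jcc") or addr + 1 in leaders:
                break
            addr += 1
        blocks[leader] = body
    return blocks


toy_program: Program = {
    0: ("op", None), 1: ("jcc", 4), 2: ("op", None), 3: ("jmp", 5),
    4: ("op", None), 5: ("op", None), 6: ("ret", None),
    7: ("op", None),  # never reached, so never decoded
}
blocks = recover_blocks(toy_program, 0)
```

In the toy program, address 5 becomes a block leader because it is a jump target, so the block starting at 4 ends early and falls through into it.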

SLIDE 9

Register Facets

◮ Facet: typed view on a register (part)
◮ Store and propagate multiple facets for registers
  ◮ Relevant for partial access and different data types
  ◮ Avoids many insert/extract/cast ops ⇒ better code
  ◮ Benefit: better optimizations across basic blocks
◮ General-purpose registers: scalar facets only
  rax: 64-bit int; eax: 32-bit int; ax: 16-bit int; ah: 8-bit int (high); ...
◮ Vector registers: scalar and vector facets
  4×32-bit float; 8×16-bit int; ...
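The facet idea can be illustrated with a toy cache of typed views for one general-purpose register. The slides do not show how Rellume stores facets, so `GpRegister`, `GP_FACETS`, and the facet names below are hypothetical; the sketch only shows why remembering a view avoids repeated extract operations.

```python
MASK64 = (1 << 64) - 1

# Facet name -> (bit offset, bit width). Names are made up for this sketch.
GP_FACETS = {
    "i64": (0, 64),  # rax
    "i32": (0, 32),  # eax
    "i16": (0, 16),  # ax
    "i8": (0, 8),    # al
    "i8h": (8, 8),   # ah
}


class GpRegister:
    def __init__(self, value: int = 0) -> None:
        self.facets = {"i64": value & MASK64}
        self.extracts = 0  # counts derived views (extract/truncate ops)

    def read(self, facet: str) -> int:
        # Derive the view from the 64-bit facet only if it is not cached yet.
        if facet not in self.facets:
            offset, width = GP_FACETS[facet]
            self.facets[facet] = (self.facets["i64"] >> offset) & ((1 << width) - 1)
            self.extracts += 1
        return self.facets[facet]

    def write32(self, value: int) -> None:
        # A 32-bit write on x86-64 zero-extends into the full register,
        # so both the i64 and i32 facets are known without extra work.
        value &= 0xFFFFFFFF
        self.facets = {"i64": value, "i32": value}


rax = GpRegister(0x1122334455667788)
low32_first = rax.read("i32")
low32_again = rax.read("i32")  # served from the cached facet
high8 = rax.read("i8h")
```

After a `write32`, reading either the 64-bit or the 32-bit view needs no further extraction, which mirrors the claim that facets avoid many insert/extract/cast operations.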

SLIDE 10

Example

define void @func_40061e(i8* %cpu) {
prologue:
  ; ...
bb_40061e:
  ; ...
epilogue:
  ; ...
}

Single parameter: CPU struct
◮ Instruction Ptr.
◮ Registers
◮ Status Flags
◮ ...

SLIDE 11

Example

define void @func_40061e(i8* %cpu) {
prologue:
  %rip_p_i8 = getelementptr i8, i8* %cpu, i64 0
  %rip_p = bitcast i8* %rip_p_i8 to i64*
  %rsp_p_i8 = getelementptr i8, i8* %cpu, i64 40
  %rsp_p = bitcast i8* %rsp_p_i8 to i64*
  %rsp = load i64, i64* %rsp_p
  ; ... load other registers ...
  br label %bb_40061e
bb_40061e:
  ; ...
epilogue:
  ; ...
}

Construct ptrs. into CPU struct
Load registers into SSA variables
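The byte offsets 0 (RIP) and 40 (RSP) are consistent with one plausible layout: RIP first, followed by the 16 general-purpose registers in x86-64 encoding order, 8 bytes each. The slides do not state the layout, so the sketch below treats it as an assumption; `OFFSETS`, `load_reg`, and `store_reg` are hypothetical helpers.

```python
import struct

# ASSUMED layout (not given in the slides): rip, then the general-purpose
# registers in x86-64 encoding order, each stored as a 64-bit value.
GPRS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
FIELDS = ["rip"] + GPRS
OFFSETS = {name: 8 * index for index, name in enumerate(FIELDS)}


def load_reg(cpu: bytearray, name: str) -> int:
    """Counterpart of the getelementptr + bitcast + load in the prologue."""
    return struct.unpack_from("<Q", cpu, OFFSETS[name])[0]


def store_reg(cpu: bytearray, name: str, value: int) -> None:
    """Counterpart of the stores performed in the epilogue."""
    struct.pack_into("<Q", cpu, OFFSETS[name], value & (2**64 - 1))


cpu = bytearray(8 * len(FIELDS))
store_reg(cpu, "rsp", 0x7FFF0000)
```

Under this assumed layout, `OFFSETS["rsp"]` works out to 40, matching the constant used in the lifted prologue.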

SLIDE 12

Example

define void @func_40061e(i8* %cpu) {
prologue:
  ; ...
bb_40061e:
  %rsp_2 = phi i64 [%rsp, %prologue]
  ; sub rsp, 176
  %rsp_3 = sub i64 %rsp_2, 176
  ; ... compute flags ...
  br label %epilogue
epilogue:
  ; ...
}

Lift instruction semantics
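The "compute flags" step elided in the listing can be sketched for a 64-bit `sub`. This follows the usual x86 definitions of ZF, SF, CF, and OF; PF and AF are left out for brevity, and `sub64_with_flags` is a hypothetical helper, not the IR that the lifter emits.

```python
MASK64 = (1 << 64) - 1


def sub64_with_flags(a: int, b: int):
    """Return (a - b) mod 2**64 together with the main status flags."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    flags = {
        "ZF": result == 0,                            # result is zero
        "SF": bool(result >> 63),                     # sign bit of result
        "CF": a < b,                                  # unsigned borrow
        "OF": bool(((a ^ b) & (a ^ result)) >> 63),   # signed overflow
    }
    return result, flags


# The instruction from the slide: sub rsp, 176 (stack pointer value made up).
new_rsp, rsp_flags = sub64_with_flags(0x7FFE0000, 176)
```

Each flag is a separate value, so flags that are never read afterwards are simply dead code that an optimizer can drop.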

SLIDE 13

Example

define void @func_40061e(i8* %cpu) {
prologue:
  ; ...
bb_40061e:
  ; ...
epilogue:
  %rsp_4 = phi i64 [%rsp_3, %bb_40061e]
  store i64 %rsp_4, i64* %rsp_p
  ; ... store flags ...
  store i64 4195877, i64* %rip_p ; 0x400625
  ret void
}

Store new values
Store new RIP

SLIDE 14

Instrew Architecture

[Diagram: client process (guest code, code cache, execution manager with main loop) and server process (Decode → Rellume → Opt. LLVM-IR → LLVM JIT)]

SLIDE 15

Client-Server Architecture

◮ Instrew Server
  ◮ Rewrites code chunks on client request
  ◮ Returns an ELF object file containing rewritten code
◮ Instrew Client
  ◮ Manages execution and local code cache
  ◮ Sends request with program code to server process
  ◮ Relocates and links ELF files
◮ Communication: custom IPC protocol
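The request/response flow can be sketched with a made-up message format. The actual Instrew protocol is not described on the slides; the header layout, `pack_message`, `toy_server`, and `Client` below are all hypothetical, and the "object file" is a placeholder byte string rather than real ELF data.

```python
import struct
from typing import Callable, Dict, Tuple

# Hypothetical framing: 64-bit guest address, 32-bit payload length.
HEADER = struct.Struct("<QI")


def pack_message(addr: int, payload: bytes) -> bytes:
    return HEADER.pack(addr, len(payload)) + payload


def unpack_message(data: bytes) -> Tuple[int, bytes]:
    addr, length = HEADER.unpack_from(data)
    return addr, data[HEADER.size:HEADER.size + length]


def toy_server(request: bytes) -> bytes:
    """Stand-in for the server: 'rewrites' code and returns an object blob."""
    addr, code = unpack_message(request)
    fake_object = b"OBJ:" + code[::-1]  # placeholder, not a real ELF file
    return pack_message(addr, fake_object)


class Client:
    def __init__(self, server: Callable[[bytes], bytes]) -> None:
        self.server = server
        self.code_cache: Dict[int, bytes] = {}
        self.requests = 0

    def lookup(self, addr: int, guest_code: bytes) -> bytes:
        if addr not in self.code_cache:
            self.requests += 1
            reply_addr, obj = unpack_message(self.server(pack_message(addr, guest_code)))
            if reply_addr != addr:
                raise ValueError("reply does not match request")
            # A real client would relocate and link the ELF object here.
            self.code_cache[addr] = obj
        return self.code_cache[addr]


client = Client(toy_server)
guest_bytes = bytes.fromhex("4881ecb0000000")  # encoding of: sub rsp, 176
first = client.lookup(0x40061E, guest_bytes)
second = client.lookup(0x40061E, guest_bytes)  # answered from the local cache
```

Only the first lookup reaches the server; the second one is served by the client's local code cache.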

SLIDE 16

Translation Details

◮ Translate code chunks with function granularity
  ◮ Decode until call/ret/indirect jump
  ◮ Enables power of LLVM’s whole-function optimizations
  ◮ Reduces number of rewrite requests
◮ Use special calling convention
  ◮ Reduces number of memory accesses to CPU structure
◮ Don’t compute flags before call/ret
  ◮ Flags extremely rarely used to pass args/return vals
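One way to picture the flag rule is a backward liveness pass that treats the status flags as dead at every call/ret. This is a generic dead-code argument under that assumption, not Instrew's implementation; the instruction tuples and `needed_flag_writers` are hypothetical.

```python
from typing import List, Tuple

# Toy instruction: (mnemonic, reads_flags, writes_flags)
Instr = Tuple[str, bool, bool]


def needed_flag_writers(block: List[Instr]) -> List[int]:
    """Indices of instructions whose flag results are actually consumed.

    Assumes the block ends in call/ret and that flags are dead there.
    """
    live = False  # flags are not live after the final call/ret
    needed = []
    for index in range(len(block) - 1, -1, -1):
        _, reads, writes = block[index]
        if writes:
            if live:
                needed.append(index)
            live = False  # this write overwrites all earlier flag values
        if reads:
            live = True   # an earlier writer must provide the flags
    return sorted(needed)


toy_block: List[Instr] = [
    ("sub", False, True),    # flags overwritten by cmp before any use
    ("cmp", False, True),    # consumed by the following setz
    ("setz", True, False),
    ("add", False, True),    # flags would only be visible after ret
    ("ret", False, False),
]
needed = needed_flag_writers(toy_block)
```

In the toy block only the `cmp` needs its flags computed; the flag results of `sub` and `add` are never observed.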

SLIDE 17

Evaluation

◮ Run on SPEC CPU2017 benchmarks
◮ Comparison with Valgrind
  ◮ Most popular tool with similar set of use-cases
  ◮ No comparison with DBILL (no sources) and Pin (different scope of code modifications)

System: 2×Intel Xeon CPU E5-2697 v3 (Haswell) @ 2.6 GHz (3.6 GHz Turbo), 17 MiB L3 cache; 64 GiB main memory; SUSE Linux 12; Linux kernel 4.12.14-95.32; 64-bit mode.
Compiler: GCC 9.2.0 with -O3 -march=x86-64, implies SSE/SSE2 but no SSE3+/AVX.
Libraries: glibc 2.22; LLVM 9.0.
Benchmarks: SPEC CPU2017 intspeed+fpspeed, ref workload, single thread.
Comparison: Valgrind 3.15.0.

SLIDE 18

SPEC CPU2017 Results

[Figure: bar chart of normalized run-time (y-axis from 2 to 14) for Native, Valgrind, and Instrew. Benchmarks on the x-axis: 602.gcc, 605.mcf, 620.omnetpp, 623.xalancbmk, 625.x264, 631.deepsjeng, 641.leela, 648.exchange2, 657.xz, 603.bwaves, 607.cactuBSSN, 619.lbm, 621.wrf, 627.cam4, 628.pop2, 638.imagick, 644.nab, 649.fotonik3d, 654.roms, and the geometric mean]

SLIDE 19

SPEC CPU2017 Results

[Figure: bar chart of normalized run-time (Native, Valgrind, Instrew) for the SPEC CPU2017 benchmarks and their geometric mean; y-axis from 2 to 14]

Overhead 1/5 of Valgrind

Instrew: 1.7x (72% overhead)
Valgrind: 4.7x (367% overhead)
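The "1/5" headline follows directly from the two overhead figures. A small check of the arithmetic, using only the numbers on this slide (the helper name is made up):

```python
def slowdown_from_overhead(overhead_pct: float) -> float:
    """Normalized run-time: 0% overhead corresponds to 1.0x (native)."""
    return 1.0 + overhead_pct / 100.0


instrew_overhead = 72.0    # percent, from the slide
valgrind_overhead = 367.0  # percent, from the slide

instrew_slowdown = slowdown_from_overhead(instrew_overhead)    # 1.72x
valgrind_slowdown = slowdown_from_overhead(valgrind_overhead)  # 4.67x

# Ratio of overheads (not of slowdowns): roughly one fifth.
overhead_ratio = instrew_overhead / valgrind_overhead
# Equivalent statement: overhead reduced by roughly 80%.
overhead_reduction_pct = (1.0 - overhead_ratio) * 100.0
```

Note that the comparison is between overheads (72% vs. 367%), not between the normalized run-times themselves (1.7x vs. 4.7x).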

SLIDE 20

SPEC CPU2017 Results

[Figure: bar chart of normalized run-time (Native, Valgrind, Instrew) for the SPEC CPU2017 benchmarks and their geometric mean; y-axis from 2 to 14]

Instrew Best Case

Instrew: 1.1x; Valgrind: 3.0x

SLIDE 21

SPEC CPU2017 Results

[Figure: bar chart of normalized run-time (Native, Valgrind, Instrew) for the SPEC CPU2017 benchmarks and their geometric mean; y-axis from 2 to 14]

Some Benchmarks are Slow

Instrew: 3.1x; Valgrind: 3.4x
Many function calls

SLIDE 22

SPEC CPU2017 Results

[Figure: bar chart of normalized run-time (Native, Valgrind, Instrew) for the SPEC CPU2017 benchmarks and their geometric mean; y-axis from 2 to 14]

Rewriting Time Matters

Instrew: 4.0x; Valgrind: 4.5x
High rewriting time: 31%

SLIDE 23

Rewriting Overhead

◮ Mean Rewriting Time: 0.94%
  ◮ Notable exception: 602.gcc with 31%
◮ Mean Rewriting Time Breakdown: Lift (12%), Optimize (22%), Code Gen. (65%), Link (<1%)
  ◮ Most time spent for machine code generation
  ◮ SelectionDAG instruction selector known to be slow
  ◮ Replacement GlobalISel not yet ready
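To put the breakdown into perspective, the percentages can be combined in a short calculation. It assumes that the 0.94% mean is relative to total run-time, and the 2x code-generation speedup is a purely hypothetical input, not a measured or announced figure.

```python
MEAN_REWRITE_SHARE_PCT = 0.94  # mean rewriting time, assumed % of run-time
BREAKDOWN_PCT = {"lift": 12, "optimize": 22, "codegen": 65}  # Link is <1%

# Share of total run-time attributable to each stage, in percent.
share_of_runtime_pct = {
    stage: MEAN_REWRITE_SHARE_PCT * pct / 100.0
    for stage, pct in BREAKDOWN_PCT.items()
}


def relative_rewrite_time(stage: str, speedup: float) -> float:
    """Amdahl-style estimate: rewriting time relative to today
    if a single stage became `speedup` times faster."""
    fraction = BREAKDOWN_PCT[stage] / 100.0
    return (1.0 - fraction) + fraction / speedup


# Hypothetical scenario: a code generator twice as fast as today.
with_faster_codegen = relative_rewrite_time("codegen", 2.0)
```

Because code generation accounts for about two thirds of the rewriting time, speeding up the other stages alone could shave off at most roughly a third.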

SLIDE 24

Discussion

◮ Clear performance improvement over Valgrind
  ◮ More and better optimizations
  ◮ High-quality code generator
◮ Expected to be faster than DBILL
  ◮ Instrew: 109% overhead on SPEC CPU2017 INT
  ◮ DBILL: 240% overhead on SPEC CINT2006
◮ Biggest drawback: rewriting time
  ◮ Must be amortized over the run of the program
  ◮ Ongoing developments in LLVM will reduce this issue

SLIDE 25

Instrew: LLVM-based DBI

◮ Dynamic Binary Instrumentation based on LLVM
◮ First to lift whole functions directly to LLVM-IR and use LLVM’s high-quality optimizer/JIT code generator
◮ Client-server approach enabling further optimizations
◮ Reduction of overhead by 80% compared to Valgrind

Instrew is Free Software!

https://github.com/aengelke/instrew
