Valgrind register allocator overhaul
Ivo Raisr, FOSDEM 2018

Ivo Raisr 39.6 GNU Toolchain

Why?
Valgrind master:
- If-Then-Else support into IR
- VEX register allocator v3

VEX operation

Pipeline: assembly -> IR -> (optimize, instrument) -> isel -> vcode -> register allocate -> rcode -> emit -> assembly

Guest assembly:
    0x4001CA3: movq %rdx,(%rsi,%rax,8)

IR:
    ------ IMark(0x4001CA3, 4, 0) ------
    t0 = Add64(GET:I64(64),Shl64(GET:I64(16),0x3:I8))
    STle(t0) = GET:I64(32)
    PUT(184) = 0x4001CA7:I64

IR handed to isel:
    ------ IMark(0x4001CA3, 4, 0) ------
    t12 = GET:I64(32)
    STle(Add64(GET:I64(64),Shl64(GET:I64(16),0x3:I8))) = t12

vcode (virtual registers):
    -- t12 = GET:I64(32)
    movq 0x20(%rbp),%vR12
    -- STle(Add64(GET:I64(64),Shl64(GET:I64(16),0x3:I8))) = t12
    movq 0x40(%rbp),%vR24
    movq 0x10(%rbp),%vR25
    movq %vR12,0x0(%vR24,%vR25,8)

rcode (real registers):
    movq 0x20(%rbp),%r10
    movq 0x40(%rbp),%r9
    movq 0x10(%rbp),%r8
    movq %r10,0x0(%r9,%r8,8)

VEX register allocator

vcode:
    0  (evCheck) decl 0x8(%rbp); jns nofail; jmp *(%rbp); nofail:
    1  movq 0x40(%rbp),%vR65
    2  movq 0x10(%rbp),%vR66
    3  leaq 0x0(%vR65,%vR66,8),%vR8
    4  movq 0x3C0(%rbp),%vR35
    5  movq 0x20(%rbp),%vR12
    6  movq 0x3E0(%rbp),%vR67
    7  movq 0x3B0(%rbp),%vR69
    8  movq %vR69,%vR68
    9  shlq $3,%vR68
    10 movq %vR67,%vR70
    11 orq %vR68,%vR70
    12 callnz[0,RLPri_None] 0x58024160
    13 movq %vR8,%rdi
    14 movq %vR35,%rsi
    15 call[2,RLPri_None] 0x58023660
    16 movq %vR12,(%vR8)
    17 movq %vR35,%vR75
    18 notq %vR75
    19 movq %vR12,%vR74
    ...

rcode:
    0  (evCheck) decl 0x8(%rbp); jns nofail; jmp *(%rbp); nofail:
    1  movq 0x40(%rbp),%r10
    2  movq 0x10(%rbp),%r9
    3  leaq 0x0(%r10,%r9,8),%rbx
    4  movq 0x3C0(%rbp),%r15
    5  movq 0x20(%rbp),%r14
    6  movq 0x3E0(%rbp),%r10
    7  movq 0x3B0(%rbp),%r9
    8  shlq $3,%r9
    9  orq %r9,%r10
    10 callnz[0,RLPri_None] 0x58024160
    11 movq %rbx,%rdi
    12 movq %r15,%rsi
    13 call[2,RLPri_None] 0x58023660
    14 movq %r14,(%rbx)
    15 movq %r15,%r10
    16 notq %r10
    17 movq %r14,%r9
    ...

RegAlloc Terminology

vcode excerpt:
    1  movq 0x40(%rbp), %vR65
    2  movq 0x10(%rbp), %vR66
    ...
    8  movq %vR69, %vR68
    9  shlq $3, %vR68
    10 movq %vR67, %vR70
    11 orq %vR68, %vR70
    12 callnz[0, RLPri_None] <addr>
    13 movq %vR8, %rdi
    14 movq %vR35, %rsi
    15 call[2, RLPri_None] <addr>
    ...

RegAlloc v3 Passes

1. scan insns (... %vR69, %rdi)
2. coalescing (%vR67 -> %vR70 -> %vR9)
3. spill slots
4. process insns (%vR68 ..., %vR69 ..., %vR70 ...; %rdi, %rax, %r9)

vcode excerpt:
    ...
    8  movq %vR69, %vR68
    9  shlq $3, %vR68
    10 movq %vR67, %vR70
    11 orq %vR68, %vR70
    12 callnz[0, RLPri_None] <addr>
    13 movq %vR8, %rdi
    14 movq %vR35, %rsi
    15 call[2, RLPri_None] <addr>
    ...
    21 movq %vR70, %vR9

RegAlloc v3 State

vreg state:
    vreg     [live after, dead before)   real reg   spill slot
    %vR68    [8, 12)                     %rdx       [12]
    %vR69    [7, 9)                      ---        [10]
    %vR70    [10, 12)                    %r9        [5]

Coalescing chain: %vR67 -> %vR70 -> %vR9

vcode excerpt:
    ...
    8  movq %vR69, %vR68
    9  shlq $3, %vR68
    10 movq %vR67, %vR70
    11 orq %vR68, %vR70
    12 callnz[0, RLPri_None] <addr>
    13 movq %vR8, %rdi
    14 movq %vR35, %rsi
    15 call[2, RLPri_None] <addr>
    ...
    21 movq %vR70, %vR9
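The live ranges in the vreg state table can be reproduced with a single forward scan over the vcode. The following is a minimal sketch in Python, not Valgrind's C implementation: the `VCODE` list is hand-annotated from the excerpt, and the name `live_ranges` is made up for this illustration. A real allocator would ask the backend which registers each instruction reads and writes.

```python
# Each entry: (insn number, vregs read, vregs written).
# Hand-annotated from the vcode excerpt (assumption: a real allocator
# obtains this from the backend rather than from a literal table).
VCODE = [
    (6,  [],               ["vR67"]),  # movq 0x3E0(%rbp),%vR67
    (7,  [],               ["vR69"]),  # movq 0x3B0(%rbp),%vR69
    (8,  ["vR69"],         ["vR68"]),  # movq %vR69,%vR68
    (9,  ["vR68"],         ["vR68"]),  # shlq $3,%vR68
    (10, ["vR67"],         ["vR70"]),  # movq %vR67,%vR70
    (11, ["vR68", "vR70"], ["vR70"]),  # orq %vR68,%vR70
]


def live_ranges(insns):
    """Return {vreg: (live_after, dead_before)} as half-open ranges."""
    ranges = {}
    for num, reads, writes in insns:
        for vreg in reads:
            if vreg not in ranges:
                raise ValueError(f"{vreg} read before being written at insn {num}")
            ranges[vreg][1] = num + 1      # still needed by this insn
        for vreg in writes:
            if vreg not in ranges:
                ranges[vreg] = [num, num + 1]  # first write starts the range
            else:
                ranges[vreg][1] = num + 1
    return {vreg: tuple(r) for vreg, r in ranges.items()}


if __name__ == "__main__":
    for vreg, (start, end) in sorted(live_ranges(VCODE).items()):
        print(f"%{vreg}: [{start}, {end})")
```

Over this excerpt the scan yields [8, 12) for %vR68, [7, 9) for %vR69 and [10, 12) for %vR70, matching the table.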

RegAlloc v3 State II.

rreg state:
    %rdx ... %vR68
    %rcx ... ---
    %rdi ... [reserved]

rreg universe (HRcInt64):
    %r12, %r13, %r14, %r15, %rbx, %rsi, %rdi, %r8, %r9, %r10

vcode excerpt:
    ...
    8  movq %vR69, %vR68
    9  shlq $3, %vR68
    10 movq %vR67, %vR70
    11 orq %vR68, %vR70
    12 callnz[0, RLPri_None] <addr>
    13 movq %vR8, %rdi
    14 movq %vR35, %rsi
    15 call[2, RLPri_None] <addr>
    ...
    21 movq %vR70, %vR9

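Processing insn (simple cases)

Case 1, a vreg gets a free rreg:
    vcode:       movq 0x40(%rbp), %vR68
    vreg state:  %vR68 ... %r10      %vR70 ... ---
    rreg state:  %r9 ... ---         %r10 ... %vR68
    rcode:       movq 0x40(%rbp), %r10

Case 2, both vregs are already assigned:
    vcode:       orq %vR68, %vR70
    vreg state:  %vR68 ... %r10      %vR70 ... %r9
    rreg state:  %r9 ... %vR70       %r10 ... %vR68
    rcode:       orq %r10, %r9

Case 3, move into a reserved rreg before a call:
    vcode:       movq %vR70, %rsi
                 call[2, RLPri_None] <addr>
    vreg state:  %vR68 ... %r10      %vR70 ... %r9
    rreg state:  %rsi ... reserved   %r9 ... %vR70   %r10 ... %vR68
    rcode:       movq %r9, %rsi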
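The simple cases amount to replacing each vreg operand with the rreg it is assigned to, taking a free rreg the first time a vreg appears. A rough Python sketch of that idea follows; it works on instruction text purely for illustration, and `process_insn` and its arguments are hypothetical names, not Valgrind's API. Spilling is deliberately not handled here.

```python
import re

# Matches virtual registers such as %vR68; real registers like %rsi are untouched.
VREG = re.compile(r"%(vR\d+)")


def process_insn(insn, vreg_to_rreg, free_rregs):
    """Rewrite one vcode instruction (as text) into rcode.

    vreg_to_rreg: dict updated in place with new assignments.
    free_rregs:   list of unassigned rreg names; the last one is taken first.
    Simple cases only: raises IndexError when no rreg is free (no spilling).
    """
    def assign(match):
        vreg = match.group(1)
        if vreg not in vreg_to_rreg:
            vreg_to_rreg[vreg] = free_rregs.pop()
        return "%" + vreg_to_rreg[vreg]

    return VREG.sub(assign, insn)


if __name__ == "__main__":
    state, free = {}, ["r9", "r10"]
    for insn in ["movq 0x40(%rbp), %vR68",
                 "orq %vR68, %vR70",
                 "movq %vR70, %rsi"]:
        print(process_insn(insn, state, free))
```

With `%r10` and `%r9` free, the three instructions come out as in the slide: `%vR68` lands in `%r10`, `%vR70` in `%r9`, and the move into the reserved `%rsi` only has its source rewritten.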

Processing insn (spill)

    vcode:       movq 0x40(%rbp), %vR15
    vreg state:  %vR15 ... ---       %vR68 ... %r10   %vR70 ... %r9
    rreg state:  %r9 ... %vR70       %r10 ... %vR68   ... (all assigned)

All rregs are taken, what to do? Spill: %vR70 moves from %r9 to its spill slot, and %r9 is given to %vR15.

    rcode:       movq %r9, 0xC0A(%rbp)
                 movq 0x40(%rbp), %r9
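The spill step can be sketched as follows. This is an illustrative Python model only: the function name `ensure_rreg` is invented here, and the victim choice (whichever vreg holds the first rreg in `rreg_order`) is a placeholder assumption, since the slide does not spell out the actual spilling criteria.

```python
def ensure_rreg(vreg, vreg_to_rreg, rreg_order, spill_slot_of, spilled):
    """Give `vreg` a real register, spilling another vreg if none is free.

    Returns (rreg, extra_insns), where extra_insns are the spill stores
    to emit before the instruction being processed.
    """
    if vreg in vreg_to_rreg:
        return vreg_to_rreg[vreg], []

    in_use = set(vreg_to_rreg.values())
    for rreg in rreg_order:
        if rreg not in in_use:
            vreg_to_rreg[vreg] = rreg
            return rreg, []

    # All rregs are taken. Placeholder victim choice (assumption): the vreg
    # currently sitting in the first rreg of rreg_order.
    victim_rreg = rreg_order[0]
    victim = next(v for v, r in vreg_to_rreg.items() if r == victim_rreg)
    slot = spill_slot_of[victim]

    del vreg_to_rreg[victim]       # victim no longer lives in a register
    spilled[victim] = slot         # remember where its value went
    vreg_to_rreg[vreg] = victim_rreg
    return victim_rreg, [f"movq %{victim_rreg}, {slot}"]


if __name__ == "__main__":
    assigned = {"vR68": "r10", "vR70": "r9"}
    spilled = {}
    rreg, extra = ensure_rreg("vR15", assigned, ["r9", "r10"],
                              {"vR70": "0xC0A(%rbp)"}, spilled)
    for insn in extra + [f"movq 0x40(%rbp), %{rreg}"]:
        print(insn)
```

Run on the slide's state, this emits the store of `%r9` to `0xC0A(%rbp)` followed by the load into `%r9`.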

Optimizations

1. MOV vregs coalescing
2. reusing spill slots
3. vreg spilling criteria
4. avoid spilling if rreg == spill slot
5. rreg allocation strategy
6. direct reload

5. rreg allocation strategy

amd64 rreg universe for HRcInt64:
    callee save: %r12 %r13 %r14 %r15 %rbx
    caller save: %rsi %rdi %r8 %r9 %r10

6. direct reload from a spill slot

    vcode:          addq %vR68, $0x9823, %vR15     (%vR68 ... spilled)

    standard way:   movq 0xC0A(%rbp), %r9
                    addq %r9, $0x9823, %r10

    direct reload:  addq 0xC0A(%rbp), $0x9823, %r10
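The difference between the two lowerings can be shown with a toy Python helper. It is only a sketch under assumptions: `lower_spilled_use` is a hypothetical name, the three-operand text form simply mirrors the slide's notation, and the decision whether an instruction can take a memory operand is reduced to a boolean flag.

```python
def lower_spilled_use(opcode, imm, dst_rreg, spill_slot, scratch_rreg, direct_reload):
    """Lower `opcode <spilled vreg>, imm, dst` when the source vreg is spilled.

    direct_reload=True  : use the spill slot as a memory operand.
    direct_reload=False : reload into a scratch rreg first (standard way).
    """
    if direct_reload:
        return [f"{opcode} {spill_slot}, {imm}, %{dst_rreg}"]
    return [
        f"movq {spill_slot}, %{scratch_rreg}",
        f"{opcode} %{scratch_rreg}, {imm}, %{dst_rreg}",
    ]


if __name__ == "__main__":
    for mode in (False, True):
        print("direct reload" if mode else "standard way")
        for insn in lower_spilled_use("addq", "$0x9823", "r10",
                                      "0xC0A(%rbp)", "r9", mode):
            print("   ", insn)
```

Direct reload saves one instruction and leaves the scratch register free for another vreg.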

Benchmarks

Memcheck on perf/bz2, amd64:
                      v2         v3
    total insns       4,170 M    4,102 M
    ratio             16.0       15.8
    regalloc insns    167 M      148 M
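To put the benchmark figures in relative terms, the reductions can be worked out directly. The short Python calculation below assumes the slide's numbers pair up as v2 then v3 (4,170 M vs 4,102 M total, 167 M vs 148 M in the register allocator); `pct_drop` is just a helper name for this example.

```python
def pct_drop(before, after):
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100.0


# Instruction counts in millions, read off the slide (v2, v3).
TOTAL_INSNS = (4170, 4102)
REGALLOC_INSNS = (167, 148)

if __name__ == "__main__":
    print(f"total insns:    {pct_drop(*TOTAL_INSNS):.1f}% fewer")
    print(f"regalloc insns: {pct_drop(*REGALLOC_INSNS):.1f}% fewer")
```

Read this way, v3 executes about 1.6% fewer instructions overall and about 11.4% fewer inside the register allocator itself.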

VEX register allocator v3 is now the default. The old implementation is available with: --vex-regalloc-version=2
