Valgrind register allocator overhaul, Ivo Raisr, FOSDEM 2018



  1. Valgrind register allocator overhaul Ivo Raisr FOSDEM 2018

  2. Ivo Raisr 39.6 GNU Toolchain

  3. Why?
     Valgrind master
     If-Then-Else support into IR
     VEX register allocator v3

  4. VEX operation
     [Pipeline diagram; legible stage labels: assembly, IR, instrument, isel, vcode, rcode, emit, assembly]

     Guest instruction (assembly):
         0x4001CA3: movq %rdx,(%rsi,%rax,8)

     IR:
         ------ IMark(0x4001CA3, 4, 0) ------
         t0 = Add64(GET:I64(64),Shl64(GET:I64(16),0x3:I8))
         STle(t0) = GET:I64(32)
         PUT(184) = 0x4001CA7:I64

     IR (the form quoted in the vcode comments):
         ------ IMark(0x4001CA3, 4, 0) ------
         t12 = GET:I64(32)
         STle(Add64(GET:I64(64),Shl64(GET:I64(16),0x3:I8))) = t12

     vcode (after isel, virtual registers):
         -- t12 = GET:I64(32)
         movq 0x20(%rbp),%vR12
         -- STle(Add64(GET:I64(64),Shl64(GET:I64(16),0x3:I8))) = t12
         movq 0x40(%rbp),%vR24
         movq 0x10(%rbp),%vR25
         movq %vR12,0x0(%vR24,%vR25,8)

     rcode (real registers):
         movq 0x20(%rbp),%r10
         movq 0x40(%rbp),%r9
         movq 0x10(%rbp),%r8
         movq %r10,0x0(%r9,%r8,8)

  5. VEX register allocator
     vcode (input):
         0  (evCheck) decl 0x8(%rbp); jns nofail; jmp *(%rbp); nofail:
         1  movq 0x40(%rbp),%vR65
         2  movq 0x10(%rbp),%vR66
         3  leaq 0x0(%vR65,%vR66,8),%vR8
         4  movq 0x3C0(%rbp),%vR35
         5  movq 0x20(%rbp),%vR12
         6  movq 0x3E0(%rbp),%vR67
         7  movq 0x3B0(%rbp),%vR69
         8  movq %vR69,%vR68
         9  shlq $3,%vR68
         10 movq %vR67,%vR70
         11 orq %vR68,%vR70
         12 callnz[0,RLPri_None] 0x58024160
         13 movq %vR8,%rdi
         14 movq %vR35,%rsi
         15 call[2,RLPri_None] 0x58023660
         16 movq %vR12,(%vR8)
         17 movq %vR35,%vR75
         18 notq %vR75
         19 movq %vR12,%vR74
         ...

     rcode (output):
         0  (evCheck) decl 0x8(%rbp); jns nofail; jmp *(%rbp); nofail:
         1  movq 0x40(%rbp),%r10
         2  movq 0x10(%rbp),%r9
         3  leaq 0x0(%r10,%r9,8),%rbx
         4  movq 0x3C0(%rbp),%r15
         5  movq 0x20(%rbp),%r14
         6  movq 0x3E0(%rbp),%r10
         7  movq 0x3B0(%rbp),%r9
         8  shlq $3,%r9
         9  orq %r9,%r10
         10 callnz[0,RLPri_None] 0x58024160
         11 movq %rbx,%rdi
         12 movq %r15,%rsi
         13 call[2,RLPri_None] 0x58023660
         14 movq %r14,(%rbx)
         15 movq %r15,%r10
         16 notq %r10
         17 movq %r14,%r9
         ...

  6. RegAlloc Terminology
     [Annotated vcode listing (instructions 0 to 19, as on slide 5); the annotation labels are not recoverable. Enlarged instructions:]
         1  movq 0x40(%rbp), %vR65
         2  movq 0x10(%rbp), %vR66
         ...
         8  movq %vR69, %vR68
         9  shlq $3, %vR68
         10 movq %vR67, %vR70
         11 orq %vR68, %vR70
         12 callnz[0, RLPri_None] <addr>
         13 movq %vR8, %rdi
         14 movq %vR35, %rsi
         15 call[2, RLPri_None] <addr>
         ...

  7. RegAlloc v3 Passes
     1. scan insns (%vR69, %rdi)
     2. coalescing (%vR67 -> %vR70 -> %vR9)
     3. spill slots
     4. process insns

     vcode excerpt:
         ...
         8  movq %vR69, %vR68
         9  shlq $3, %vR68
         10 movq %vR67, %vR70
         11 orq %vR68, %vR70
         12 callnz[0, RLPri_None] <addr>
         13 movq %vR8, %rdi
         14 movq %vR35, %rsi
         15 call[2, RLPri_None] <addr>
         ...
         21 movq %vR70, %vR9

     vreg to rreg, as annotated next to "process insns":
         %vR68 ... %rdi
         %vR69 ... %rax
         %vR70 ... %r9
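The coalescing pass can be illustrated with a small Python sketch. It assumes one plausible rule, which the slide does not spell out: a register-to-register MOV is coalescable when the source vreg dies at that instruction and the destination vreg is born there. The live ranges below are hypothetical values chosen to reproduce the chain from the slide; `coalesce_chains` is an illustrative name, not a VEX function.

```python
def coalesce_chains(moves, ranges):
    """Group vregs linked by coalescable MOVs into chains.

    moves:  list of (insn_index, src_vreg, dst_vreg) for reg-to-reg MOVs
    ranges: {vreg: (live_after, dead_before)}
    Assumed rule: src must die at the MOV and dst must be born at the MOV.
    """
    next_of = {}
    has_pred = set()
    for idx, src, dst in moves:
        src_dies_here = ranges[src][1] == idx + 1
        dst_born_here = ranges[dst][0] == idx
        if src_dies_here and dst_born_here:
            next_of[src] = dst
            has_pred.add(dst)

    chains = []
    for head in next_of:
        if head in has_pred:
            continue  # not the start of a chain
        chain = [head]
        while chain[-1] in next_of:
            chain.append(next_of[chain[-1]])
        chains.append(chain)
    return chains


# Hypothetical live ranges for the two MOVs at instructions 10 and 21.
RANGES = {"vR67": (6, 11), "vR70": (10, 22), "vR9": (21, 23)}
MOVES = [(10, "vR67", "vR70"), (21, "vR70", "vR9")]

chains = coalesce_chains(MOVES, RANGES)
print(" / ".join(" -> ".join("%" + v for v in c) for c in chains))
```

All vregs in one chain can then share a single real register, which turns the MOVs between them into no-ops.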

  8. RegAlloc v3 State
     vreg state:
         vreg     [live after, dead before)   real reg   spill slot
         %vR68    [8, 12)                     %rdx       [12]
         %vR69    [7, 9)                      ---        [10]
         %vR70    [10, 12)                    %r9        [5]

     Coalescing chain: %vR67 -> %vR70 -> %vR9
     (vcode excerpt as on slide 7: instructions 8 to 15 and 21)
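The live ranges in the vreg state can be derived with one forward scan over the vcode. The following Python sketch is a minimal illustration under assumed conventions: each instruction is a hypothetical `(index, written vregs, read vregs)` tuple (not VEX's actual register-usage structure), `live_after` is the index of the first write, and `dead_before` is one past the last mention.

```python
def live_ranges(insns):
    """Return {vreg: (live_after, dead_before)} from one forward scan."""
    ranges = {}
    for idx, writes, reads in insns:
        for vreg in writes:
            # First write opens the range.
            ranges.setdefault(vreg, (idx, idx + 1))
        for vreg in writes + reads:
            # Every mention extends the range to just past this instruction.
            live_after, _ = ranges[vreg]
            ranges[vreg] = (live_after, idx + 1)
    return ranges


# Instructions 6 to 11 of the vcode listing: (index, writes, reads).
VCODE = [
    (6, ["vR67"], []),                 # movq 0x3E0(%rbp),%vR67
    (7, ["vR69"], []),                 # movq 0x3B0(%rbp),%vR69
    (8, ["vR68"], ["vR69"]),           # movq %vR69,%vR68
    (9, ["vR68"], ["vR68"]),           # shlq $3,%vR68 (modify)
    (10, ["vR70"], ["vR67"]),          # movq %vR67,%vR70
    (11, ["vR70"], ["vR68", "vR70"]),  # orq %vR68,%vR70 (modify)
]

ranges = live_ranges(VCODE)
for vreg in sorted(ranges):
    print(f"%{vreg}: [{ranges[vreg][0]}, {ranges[vreg][1]})")
```

On these six instructions the sketch yields [8, 12) for %vR68, [7, 9) for %vR69 and [10, 12) for %vR70, the same ranges as in the vreg state shown on the slide.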

  9. RegAlloc v3 State II.
     rreg state:
         %rdx ... %vR68
         %rcx ... ---
         %rdi ... [reserved]

     rreg universe (HRcInt64):
         %r12, %r13, %r14, %r15, %rbx, %rsi, %rdi, %r8, %r9, %r10

     (vcode excerpt as on slide 7: instructions 8 to 15 and 21)

  10. Processing insn (simple cases)
      Case 1:
          vcode:      movq 0x40(%rbp), %vR68
          rcode:      movq 0x40(%rbp), %r10
          vreg state: %vR68 ... %r10,  %vR70 ... ---
          rreg state: %r9 ... ---,  %r10 ... %vR68

      Case 2:
          vcode:      orq %vR68, %vR70
          rcode:      orq %r10, %r9
          vreg state: %vR68 ... %r10,  %vR70 ... %r9
          rreg state: %r9 ... %vR70,  %r10 ... %vR68

      Case 3:
          vcode:      movq %vR70, %rsi
                      call[2, RLPri_None] <addr>
          rcode:      movq %r9, %rsi
          vreg state: %vR68 ... %r10,  %vR70 ... %r9
          rreg state: %rsi ... reserved,  %r9 ... %vR70,  %r10 ... %vR68
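In the simple cases, processing an instruction amounts to substituting each vreg with the real register recorded in the vreg state. A minimal Python sketch of that substitution follows; it works on instruction text purely for illustration (VEX operates on structured instructions, not strings), and `rewrite` is a hypothetical helper name.

```python
import re


def rewrite(insn, vreg_to_rreg):
    """Replace every %vRn operand with its assigned real register."""
    return re.sub(
        r"%(vR\d+)",
        lambda m: "%" + vreg_to_rreg[m.group(1)],
        insn,
    )


# vreg state after both vregs have been assigned.
VREG_STATE = {"vR68": "r10", "vR70": "r9"}

print(rewrite("movq 0x40(%rbp), %vR68", VREG_STATE))
print(rewrite("orq %vR68, %vR70", VREG_STATE))
print(rewrite("movq %vR70, %rsi", VREG_STATE))
```

Real registers that already appear in the vcode, such as %rbp and %rsi, are left untouched because the pattern only matches the `vR` prefix.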

  11. Processing insn (spill)
      vcode:      movq 0x40(%rbp), %vR15
      vreg state: %vR15 ... ---,  %vR68 ... %r10,  %vR70 ... %r9
      rreg state: %r9 ... %vR70,  %r10 ... %vR68,  ... (all assigned)

      All rregs are taken, what to do? Spill one vreg to its spill slot and reuse its rreg:
          movq %r9, 0xC0A(%rbp)        (store %vR70 to its spill slot)
          movq 0x40(%rbp), %r9
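The spill case can be sketched in Python as follows. This is an illustration under stated assumptions, not the VEX implementation: the victim is taken to be the vreg whose next use is farthest away (the slide does not state the criterion), and the `next_use` values and the spill slot for %vR68 are invented for the example.

```python
def alloc_rreg(vreg, vreg_state, rreg_state, spill_slots, next_use, out):
    """Assign a real register to vreg, spilling a victim if none is free.

    Emitted spill stores are appended to `out` as text.
    """
    for rreg, owner in rreg_state.items():
        if owner is None:
            break  # found a free real register
    else:
        # All rregs taken: evict the vreg used farthest in the future
        # (assumed criterion).
        rreg = max(rreg_state, key=lambda r: next_use[rreg_state[r]])
        victim = rreg_state[rreg]
        out.append(f"movq %{rreg}, {spill_slots[victim]}")
        vreg_state[victim] = None
    rreg_state[rreg] = vreg
    vreg_state[vreg] = rreg
    return rreg


rreg_state = {"r9": "vR70", "r10": "vR68"}
vreg_state = {"vR15": None, "vR68": "r10", "vR70": "r9"}
spill_slots = {"vR70": "0xC0A(%rbp)", "vR68": "0xC12(%rbp)"}  # vR68 slot is made up
next_use = {"vR70": 40, "vR68": 12}                           # hypothetical

out = []
rreg = alloc_rreg("vR15", vreg_state, rreg_state, spill_slots, next_use, out)
out.append(f"movq 0x40(%rbp), %{rreg}")
print("\n".join(out))
```

With these inputs the sketch emits the same two instructions as the slide: a store of %r9 to 0xC0A(%rbp), then the load into %r9.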

  12. Optimizations
      1. MOV vregs coalescing
      2. reusing spill slots
      3. vreg spilling criteria
      4. avoid spilling if rreg == spill slot
      5. rreg allocation strategy
      6. direct reload

  13. 5. rreg allocation strategy
      amd64 rreg universe for HRcInt64:
          callee save: %r12 %r13 %r14 %r15 %rbx
          caller save: %rsi %rdi %r8 %r9 %r10
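One way to use this split is sketched below in Python. The policy is an assumption inferred from the rcode listing on slide 5, not a statement from the slides: vregs whose live range spans a helper call prefer callee-save registers, other vregs prefer caller-save registers, and each group is searched from the end of the universe. For brevity the sketch never returns registers to the free set when a vreg dies.

```python
CALLEE_SAVE = ["r12", "r13", "r14", "r15", "rbx"]
CALLER_SAVE = ["rsi", "rdi", "r8", "r9", "r10"]


def spans_call(live_range, call_indexes):
    """True if the value must survive a call inside [live_after, dead_before)."""
    live_after, dead_before = live_range
    return any(live_after < c < dead_before - 1 for c in call_indexes)


def pick_rreg(live_range, call_indexes, free):
    """Pick and reserve a free rreg, preferring the group that suits the vreg."""
    if spans_call(live_range, call_indexes):
        order = CALLEE_SAVE[::-1] + CALLER_SAVE[::-1]
    else:
        order = CALLER_SAVE[::-1] + CALLEE_SAVE[::-1]
    for rreg in order:
        if rreg in free:
            free.remove(rreg)
            return rreg
    return None  # nothing free: the caller would have to spill


free = set(CALLEE_SAVE + CALLER_SAVE)
calls = [12, 15]  # callnz and call in the vcode listing
picked = {
    "vR65": pick_rreg((1, 4), calls, free),
    "vR66": pick_rreg((2, 4), calls, free),
    "vR8": pick_rreg((3, 17), calls, free),
    "vR35": pick_rreg((4, 18), calls, free),
    "vR12": pick_rreg((5, 20), calls, free),
}
print(picked)
```

For these five vregs the sketch reproduces the assignments visible in the rcode listing: %vR65 and %vR66 go to %r10 and %r9, while %vR8, %vR35 and %vR12, which stay live across the calls, go to %rbx, %r15 and %r14.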

  14. 6. direct reload from a spill slot
      vcode (%vR68 is spilled):
          addq %vR68, $0x9823, %vR15
      standard way:
          movq 0xC0A(%rbp), %r9
          addq %r9, $0x9823, %r10
      direct reload:
          addq 0xC0A(%rbp), $0x9823, %r10
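The difference between the two code shapes can be sketched in Python. This is a text-level illustration only; `emit_addq`, its parameters and the choice of `%r9` as scratch register are hypothetical, and a real allocator would also need to check that the instruction accepts a memory operand in that position.

```python
def emit_addq(src, imm, dst_rreg, vreg_rreg, spill_slot,
              scratch="r9", direct_reload=True):
    """Emit rcode for `addq %src, $imm, %dst` where src may be spilled."""
    if src in vreg_rreg:
        # src already lives in a real register: nothing to reload.
        return [f"addq %{vreg_rreg[src]}, ${imm:#x}, %{dst_rreg}"]
    slot = spill_slot[src]
    if direct_reload:
        # Fold the spill slot into the instruction as a memory operand.
        return [f"addq {slot}, ${imm:#x}, %{dst_rreg}"]
    # Standard way: reload into a scratch register first.
    return [
        f"movq {slot}, %{scratch}",
        f"addq %{scratch}, ${imm:#x}, %{dst_rreg}",
    ]


SPILL_SLOT = {"vR68": "0xC0A(%rbp)"}

standard = emit_addq("vR68", 0x9823, "r10", {}, SPILL_SLOT, direct_reload=False)
direct = emit_addq("vR68", 0x9823, "r10", {}, SPILL_SLOT)
print("\n".join(standard))
print("\n".join(direct))
```

Direct reload saves one instruction and leaves the scratch register free for another vreg.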

  15. Benchmarks
      Memcheck on perf/bz2, amd64

                          v2         v3
      total insns         4,170 M    4,102 M
      ratio               16.0       15.8
      regalloc insns      167 M      148 M

  16. VEX register allocator v3 is now the default. The old implementation is available with: --vex-regalloc-version=2
