
Bungee Jumps: Accelerating Indirect Branches Through - PowerPoint PPT Presentation

Bungee Jumps: Accelerating Indirect Branches Through Hardware/Software Co-Design. Daniel S. McFarlin, Craig Zilles.


  1. Bungee Jumps: Accelerating Indirect Branches Through Hardware/Software Co-Design. Daniel S. McFarlin, Craig Zilles.

  2. Indirect Branches Are Increasingly Predictable. [Chart: mispredicts per kilo-instruction (lower is better) for Nehalem, Sandy Bridge, Haswell, and TAGE, across benchmarks 0 to 28 and their average.]

  3. Indirect Branches Are Increasingly Predictable, And Unbiased. [Chart: mispredicts per kilo-instruction (lower is better) for Nehalem, Sandy Bridge, Haswell, and TAGE, across benchmarks 0 to 28 and their average.] [Chart: predictability and bias, on a scale from 0 to 1, for meteor, raytrace, btree, fannkuch, fasta, richards, nqueens, revcomp, float, specnorm, regexdna, knuke, mandelbrot, and the geomean.]

  4. In-Order Machines Specialize Based on Branch Bias or Eliminate Branch Prediction Altogether. [Diagram: a Shape object's VTable slot for area() leading to three area() implementations, annotated with 30, 60, and 10.]

  5. In-Order Machines Specialize Based on Branch Bias or Eliminate Branch Prediction Altogether. [Diagram: a Shape object's VTable slot for area() leading to three area() implementations, annotated with 30, 60, and 10.] RCPO: if s->type == Circle, area(); else if s->type == Rect, area(); else if s->type == Square, area(); else area().

  6. In-Order Machines Specialize Based on Branch Bias or Eliminate Branch Prediction Altogether. [Diagram: a Shape object's VTable slot for area() leading to three area() implementations, annotated with 30, 60, and 10.] RCPO: if s->type == Circle, area(); else if s->type == Rect, area(); else if s->type == Square, area(); else area(). Predication: p0 = (obj is type B); p1 = (obj is type C); p2 = (obj is type D); p0 : r = B::func(); p1 : r = C::func(); p2 : r = D::func(); if (!(p0 | p1 | p2)) r = obj->func();

  7. Challenge: Non-Reconvergence & Large Number of Targets. [Figure: control-flow graph of sjeng's f_in_check, with basic blocks labelled A through M. An indirect jump (jmp r9, with r9 loaded from [rax*8+0x94]) fans out to blocks made up of ld, movsxd, lea, cmp, and test instructions that end in jz or jnz; the edges carry weights such as 25, 24, 19, 14, 11, 99, 96, 95, 5, 4, and 1.]

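
The two transformations named on the "In-Order Machines" slides can be sketched in C++ as follows. This is a minimal illustration, not the authors' implementation: the Shape hierarchy, the Triangle fallback class, the member names, and the use of typeid for the type test are all assumptions made for the example. The first function is the ordinary virtual call, the second replaces it with a chain of type tests and direct calls, and the third mirrors the slide's p0/p1/p2 pseudocode.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical class hierarchy; only area() and the names Shape, Circle,
// Rect, and Square come from the slides.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};
struct Circle : Shape {
    double r;
    explicit Circle(double r) : r(r) {}
    double area() const override { return 3.14159 * r * r; }
};
struct Rect : Shape {
    double w, h;
    Rect(double w, double h) : w(w), h(h) {}
    double area() const override { return w * h; }
};
struct Square : Shape {
    double s;
    explicit Square(double s) : s(s) {}
    double area() const override { return s * s; }
};
// Assumed extra type that none of the tests below cover (the "else" case).
struct Triangle : Shape {
    double b, h;
    Triangle(double b, double h) : b(b), h(h) {}
    double area() const override { return 0.5 * b * h; }
};

// Baseline: one indirect branch through the vtable.
double area_virtual(const Shape& s) { return s.area(); }

// Type-test chain: compare against expected types and call each target
// directly (the qualified call bypasses the vtable); fall back to the
// indirect call for any other type.
double area_type_chain(const Shape& s) {
    if (typeid(s) == typeid(Circle))
        return static_cast<const Circle&>(s).Circle::area();
    else if (typeid(s) == typeid(Rect))
        return static_cast<const Rect&>(s).Rect::area();
    else if (typeid(s) == typeid(Square))
        return static_cast<const Square&>(s).Square::area();
    else
        return s.area();
}

// Predicated form, following the slide's pseudocode. In C++ each guard is
// still an if; on a predicated ISA the guarded calls would be instructions
// qualified by p0, p1, and p2 rather than branches.
double area_predicated(const Shape& s) {
    const bool p0 = typeid(s) == typeid(Circle);
    const bool p1 = typeid(s) == typeid(Rect);
    const bool p2 = typeid(s) == typeid(Square);
    assert(p0 + p1 + p2 <= 1);  // an object has exactly one dynamic type
    double r = 0.0;
    if (p0) r = static_cast<const Circle&>(s).Circle::area();
    if (p1) r = static_cast<const Rect&>(s).Rect::area();
    if (p2) r = static_cast<const Square&>(s).Square::area();
    if (!(p0 | p1 | p2)) r = s.area();  // unexpected type: real virtual call
    return r;
}
```

All three functions return the same value for any Shape; they differ only in how control reaches the callee. The chain trades one indirect branch for several direct conditional branches, which pays off when a few receiver types dominate and becomes harder to justify as the number of likely targets grows.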
