
AArch64 performance analysis and resulted enhancements on GCC
Feng Xue, Jiangning Liu
November 23, 2019

Agenda

• Loop split on semi-invariant conditional statement
• IPA constant propagation and recursive function versioning
• Some issues in current register allocator
• Trapless conditional selection instruction generation

Loop conditional statement elimination

• Loop Split

    for (i = 0; i < 100; i++) {
      if (i < 40)
        S1;
      else
        S2;
    }

  is transformed into

    for (i = 0; i < 40; i++)
      S1;
    for (i = 40; i < 100; i++)
      S2;

• Loop Unswitch

    for (i = 0; i < 100; i++) {
      if (a != b)
        S1;
      else
        S2;
    }

  is transformed into

    if (a != b) {
      for (i = 0; i < 100; i++)
        S1;
    } else {
      for (i = 0; i < 100; i++)
        S2;
    }
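The two transformations can be checked with a small C sketch. S1 and S2 are modelled as hypothetical helper functions (s1, s2) that update an accumulator differently; these names and the accumulator are assumptions for illustration, not part of the slides.

```c
/* Hypothetical stand-ins for statements S1 and S2. */
long s1(long acc, int i) { return acc + i; }
long s2(long acc, int i) { return acc - 2L * i; }

/* Loop split, original form: the condition depends only on i. */
long split_before(void) {
    long acc = 0;
    for (int i = 0; i < 100; i++) {
        if (i < 40)
            acc = s1(acc, i);
        else
            acc = s2(acc, i);
    }
    return acc;
}

/* Loop split, transformed form: two loops over disjoint ranges. */
long split_after(void) {
    long acc = 0;
    for (int i = 0; i < 40; i++)
        acc = s1(acc, i);
    for (int i = 40; i < 100; i++)
        acc = s2(acc, i);
    return acc;
}

/* Loop unswitch, original form: (a != b) is loop invariant. */
long unswitch_before(int a, int b) {
    long acc = 0;
    for (int i = 0; i < 100; i++) {
        if (a != b)
            acc = s1(acc, i);
        else
            acc = s2(acc, i);
    }
    return acc;
}

/* Loop unswitch, transformed form: the test is hoisted out of the loop. */
long unswitch_after(int a, int b) {
    long acc = 0;
    if (a != b) {
        for (int i = 0; i < 100; i++)
            acc = s1(acc, i);
    } else {
        for (int i = 0; i < 100; i++)
            acc = s2(acc, i);
    }
    return acc;
}
```

Calling split_before() and split_after(), or unswitch_before(a, b) and unswitch_after(a, b) with the same arguments, should return identical values.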

Loop semi-invariant conditional statement

• Loop invariant condition?

    extern int flag;

    for (i = 0; i < 100; i++) {
      if (flag)
        printf (…);
    }

• Simple semi-invariant pattern

    a = ...
    for (i = 0; i < 100; i++) {
      if (a < 10)
        a = new_value ();
    }

  [Diagram labels: "f(a)?", "...", "No change to a"]

How to eliminate semi-invariant condition?

• Loop Unswitch

    if (flag) {
      for (i = 0; i < 100; i++) {
        if (flag)
          printf (…);
        S1;
      }
    } else {
      for (i = 0; i < 100; i++) {
        S1;
      }
    }

• Loop Split

    for (i = 0; i < 100; i++) {
      if (flag)
        printf (…);
      else {
        S1;
        i++;
        break;
      }
    }
    for (; i < 100; i++)
      S1;
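A minimal C sketch of the loop-split form for a semi-invariant condition follows. It assumes the original loop body is "if (flag) call; else S1;", and it replaces printf with a hypothetical opaque_call that clears flag on iteration 9; the counters and helper names are illustrative only.

```c
int flag;        /* condition tested by the loop */
int true_calls;  /* how many times the true branch ran */
int s1_calls;    /* how many times S1 ran */

void reset(int initial_flag) {
    flag = initial_flag;
    true_calls = 0;
    s1_calls = 0;
}

/* Hypothetical stand-in for printf (...): an opaque call that may change flag.
   Assumption: it clears flag on iteration 9. */
void opaque_call(int i) {
    true_calls++;
    if (i == 9)
        flag = 0;
}

void s1(void) { s1_calls++; }

/* Original loop: flag can only change inside the true branch. */
void loop_before(void) {
    for (int i = 0; i < 100; i++) {
        if (flag)
            opaque_call(i);
        else
            s1();
    }
}

/* Split form: once the condition is false it stays false, so the
   remaining iterations run in a second loop without the test. */
void loop_after(void) {
    int i;
    for (i = 0; i < 100; i++) {
        if (flag)
            opaque_call(i);
        else {
            s1();
            i++;
            break;
        }
    }
    for (; i < 100; i++)
        s1();
}
```

After reset(1), both loop_before() and loop_after() should leave the same values in true_calls and s1_calls; the same holds after reset(0).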

Identify semi-invariant condition

• Conditional expression tree evaluation
  ▪ Normal value operation
  ▪ SSA-PHI merge operation

    foo(int p, int q, int r) {
      a = r;
      for (i = 0; i < 100; i++) {
        if (a)
          b = q;
        else
          b = p;
        if (b * b < 10)
          a = new_value();
      }
    }

  SSA form of the loop body:

    A_1 = PHI(...)
    if(A_1)
      B_1 = ...
    else
      B_2 = ...
    B_3 = PHI(B_1, B_2)
    cond = (B_3 * B_3 < 10)

  Both the value expression and the condition that it control-depends on must be semi-invariant.

Identify semi-invariant condition

• Semi-invariant loop iteration value

    V_1 = PHI(init, V_5)
    if(cond)
      V_4 = ...
    V_3 = PHI(V_1, V_4)
    V_5 = V_3

IPA constant propagation

• Jump function

    f(int a, int b) {
      g(b, 3, -a, a + 1);
    }

    JF{f->g}[0] = param#1
    JF{f->g}[1] = 3
    JF{f->g}[2] = -param#0
    JF{f->g}[3] = param#0 + 1

• In-memory constant

    f() {
      int a = 1;
      struct {f0, f1} b = {2, 3};
      g(&a, b);
    }

    JF_agg{f->g}[0, @0] = 1
    JF_agg{f->g}[1, @0] = 2
    JF_agg{f->g}[1, @4] = 3
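The scalar jump functions for g(b, 3, -a, a + 1) can be modelled with a small C sketch. The enum, struct, and function names are hypothetical and do not reflect GCC's internal representation; the sketch only shows how each callee argument is derived from the caller's parameters.

```c
/* Kinds of jump function used in this sketch (hypothetical names). */
enum jf_kind {
    JF_CONST,        /* argument is a literal constant       */
    JF_PASS_THROUGH, /* argument is a caller parameter as-is */
    JF_NEGATE,       /* argument is -param                   */
    JF_PLUS_CONST    /* argument is param + constant         */
};

struct jump_func {
    enum jf_kind kind;
    int param;  /* index of the caller parameter, if any */
    int value;  /* constant, or addend for JF_PLUS_CONST */
};

/* Evaluate one jump function against known caller parameter values. */
int jf_eval(struct jump_func jf, const int *caller_params) {
    switch (jf.kind) {
    case JF_CONST:        return jf.value;
    case JF_PASS_THROUGH: return caller_params[jf.param];
    case JF_NEGATE:       return -caller_params[jf.param];
    case JF_PLUS_CONST:   return caller_params[jf.param] + jf.value;
    }
    return 0; /* not reached for valid kinds */
}

/* Jump functions for the call g(b, 3, -a, a + 1) inside f(int a, int b). */
const struct jump_func f_to_g[4] = {
    { JF_PASS_THROUGH, 1, 0 },  /* JF{f->g}[0] = param#1     */
    { JF_CONST,        0, 3 },  /* JF{f->g}[1] = 3           */
    { JF_NEGATE,       0, 0 },  /* JF{f->g}[2] = -param#0    */
    { JF_PLUS_CONST,   0, 1 },  /* JF{f->g}[3] = param#0 + 1 */
};

/* Compute the values g receives when f is called with constants a and b. */
void propagate_f_to_g(int a, int b, int g_args[4]) {
    const int caller_params[2] = { a, b };
    for (int i = 0; i < 4; i++)
        g_args[i] = jf_eval(f_to_g[i], caller_params);
}
```

If f is known to be called with a = 5 and b = 7, propagate_f_to_g fills g_args with the constants that g would receive.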

IPA constant propagation

• Parameter passing in FORTRAN

    subroutine f(a)
      integer, intent(in) :: a
      call g(a + 1)
    end subroutine

  corresponds to the C form

    f(int *a) {
      int t = *a + 1;
      g(&t);
    }

• Enhanced in-memory constant propagation
  ▪ JF_agg[i, @offset] = constant
  ▪ JF_agg[i, @offset] = param#j OP constant
  ▪ JF_agg[i, @offset] = *(param#j + offset2) OP constant

Recursive function optimizations

• Recursive tail call transformation
• Recursive inlining
• Recursive versioning

    f(int i) {
      if (i == 4) {
        do_work();
        return;
      }
      do_prepare();
      f(i + 1);
      do_post();
    }

    main() {
      f(1);
    }

  Call chain after versioning:

    main() -> f<i=1>() -> f<i=2>() -> f<i=3>() -> f<i=4>()
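The effect of recursive versioning on this example can be sketched by hand in C. The clones f_1 to f_4 below are written manually to mirror the call chain; the event log and the record helper are assumptions added so the two variants can be compared, and a compiler's actual clones may look different.

```c
long event_log;  /* events encoded as decimal digits, in call order */

void record(int event) { event_log = event_log * 10 + event; }
void do_prepare(void) { record(1); }
void do_work(void)    { record(2); }
void do_post(void)    { record(3); }

/* Original self-recursive function. */
void f(int i) {
    if (i == 4) {
        do_work();
        return;
    }
    do_prepare();
    f(i + 1);
    do_post();
}

/* Hand-written clones, one per known value of i.
   In each clone the test (i == 4) is resolved at compile time. */
void f_4(void) { do_work(); }
void f_3(void) { do_prepare(); f_4(); do_post(); }
void f_2(void) { do_prepare(); f_3(); do_post(); }
void f_1(void) { do_prepare(); f_2(); do_post(); }
```

Resetting event_log to 0 and calling f(1), then resetting it again and calling f_1(), should produce the same log, since the clones only remove the run-time test on i.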

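Recursive function versioning

• Only for self-recursive function
• New option for recursive versioning depth
• Recursive constant propagation strategy

    f(int i) {
      g(i);
      f(i + 1);
    }

    B() { f(1); }
    C() { f(6); }
    D() { g(0); }

  The versioning depth is assumed to be 4.

  [Call graph: B() calls f(i) with 1, C() calls f(i) with 6, and D() calls g(i) with 0; the self-recursive edge of f(i) carries 2,3,4 and 7,8,9. Edge labels in the diagram, in extraction order: 1, 6, 6, 2,3,4, 0, 7,8,9, 1.]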

IPA constant propagation TODOs

• Global variable value propagation

    int CST;

    init() { CST = 4; }

    calc(int i) { return i / CST; }

    main() {
      init();
      ... = calc(100);
    }

    calc(100) -> calc(100, CST)

• Extend jump function

    f(int a, int b) {
      g(1 - a, b ? 1 : 2, a + b);
    }

    JF{f->g}[0] = 1 - param#0
    JF{f->g}[1] = param#1 ? 1 : 2
    JF{f->g}[2] = param#0 + param#1

Issues in register allocator

• Context sensitive

    f1() {
      S1
    }

    f2() {
      if (cond)
        S1
      else
        S2      <- irrelevant code
    }

  Different allocation result for S1 in f1() and f2().

  ▪ Code generation instability impacts inlining
  ▪ Hard to do code and performance comparison

• Root cause
  ▪ Execution profile normalization error

    f1() {
      BB1 (30)   -> 30/10 = 3
      BB2 (1000) -> 1000/10 = 100
    }

    f2() {
      if (cond)
        BB1 (3)   -> 3/10 = 0.3 ≈ 1
        BB2 (100) -> 100/10 = 10
    }
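The arithmetic on this slide can be reproduced with a short C sketch. The function names are hypothetical, and the clamp to a minimum of 1 is an assumption read off "3/10 = 0.3 ≈ 1"; this is not the allocator's actual code. It shows how the same raw ratio between BB2 and BB1 yields different normalized ratios in f1 and f2.

```c
/* Scale a raw execution count down by `divisor`, never returning less than 1.
   Assumption: the clamp mirrors "3/10 = 0.3 ≈ 1" on the slide. */
int normalize_count(int count, int divisor) {
    int scaled = count / divisor;
    return scaled < 1 ? 1 : scaled;
}

/* Weight of BB2 relative to BB1 after normalization (integer ratio). */
int relative_weight(int bb1_count, int bb2_count, int divisor) {
    return normalize_count(bb2_count, divisor)
         / normalize_count(bb1_count, divisor);
}
```

With the slide's numbers, relative_weight(30, 1000, 10) covers f1 and relative_weight(3, 100, 10) covers f2. The raw counts have the same ratio in both functions, but the normalized ratio differs once the small count is rounded up to 1.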

Issues in register allocator

• Top-down allocation order
  ▪ Local information influences global allocation decisions at too early a stage

• Possible solutions
  ▪ Use live range split to replace spilling
  ▪ Do post refinement on outside region

  [Diagram: Region 1 contains the definition "v1 = ..." and Region 2 contains the use "... = v1"; the regions are annotated with mem and reg placements and with spill and reload points.]

Trapless conditional selection instruction generation

    int f(int k, int b) {
      int a[2];
      if (b < a[k]) {
        a[k] = b;
      }
      return a[0]+a[2];
    }

  Generated code with a conditional branch:

        sub   sp, sp, #16
        uxtw  x0, w0
        add   x2, sp, 8
        ldr   w3, [x2, x0, lsl 2]
        cmp   w3, w1
        bls   .L2
        str   w1, [x2, x0, lsl 2]
    .L2:
        ldr   w1, [sp, 8]
        ldr   w0, [sp, 16]
        add   sp, sp, 16
        add   w0, w1, w0
        ret

  Generated code with conditional selection:

        uxtw  x2, w0
        add   x3, sp, 8
        ldr   w5, [sp, 16]
        ldr   w4, [x3, x2, lsl 2]
        cmp   w4, w1
        csel  w1, w1, w4, hi
        str   w1, [x3, x2, lsl 2]
        ldr   w0, [sp, 8]
        add   sp, sp, 16
        add   w0, w0, w5
        ret

  ▪ Because “a” is a local variable and always writable, introducing an extra write to “a” will not cause a trap.
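At source level, the difference between the two code sequences can be sketched in C as a conditional store versus an unconditional store of a selected value. The function names are hypothetical, and whether a compiler emits csel for the second form depends on the target and optimization level; the second form is only safe when a[k] is known to be writable.

```c
/* Conditional store: memory is written only when the condition holds. */
void store_min_branch(int *a, unsigned k, int b) {
    if (b < a[k])
        a[k] = b;
}

/* Select-then-store: memory is always written, with either the new
   value or the old one. Safe only if a[k] is known to be writable. */
void store_min_select(int *a, unsigned k, int b) {
    int old = a[k];
    a[k] = (b < old) ? b : old;
}
```

Both functions leave the array in the same state; the second simply writes back the old value when the condition is false.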

Build something with us. Create the future together with us!
http://developer.amperecomputing.com

Thanks
