32-bit to 64-bit. Matthew Gretton-Dann, Technical Lead, Toolchain Working Group.



SLIDE 1

Linaro Connect, Dublin July 2013

Matthew Gretton-Dann Technical Lead - Toolchain Working Group

Porting & Optimising Code 32-bit to 64-bit

SLIDE 2

www.linaro.org

  • Register Files
  • Structure Layout & Data Models
  • Atomics
  • Vectorisation & Neon Intrinsics

A Presentation of Four Parts

SLIDE 3

Simplification

  • View for those writing apps
  • No complicated kernel stuff
  • Little Endian
SLIDE 4

Bias Warning

Assembler Compiler

SLIDE 5

Why 64-bit?

Memory: 32-bit pointers limit a process to a 4 GB address space

SLIDE 6

General Purpose Registers – 32-bit ARM

r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 (SP) r14 (LR) r15 (PC)

SLIDE 7

General Purpose Registers

r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 (SP) r14 (LR) r15 (PC)

SLIDE 8

General Purpose Registers

r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 (SP) r14 (LR) r15 (PC) r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30

SLIDE 9

General Purpose Registers – 64-bit ARM

r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 (LR) SP PC

SLIDE 10

General Purpose Registers

rN denotes either view of a general-purpose register: wN (bits 0–31) or xN (the full register, bit 0 to bit 63).

SLIDE 11

  • Easier to do 64-bit arithmetic!
  • Less need to spill to the stack
  • Spare registers to keep more temporaries

General Purpose Registers – Consequences
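The arithmetic point can be made concrete in C; a minimal sketch (not from the slides, function names are illustrative). On AArch64 each of these compiles to a single instruction, while on 32-bit ARM the compiler must split each 64-bit value across two registers:

```c
#include <stdint.h>

/* One "add x0, x0, x1" on AArch64; an "adds"/"adc" pair on 32-bit ARM. */
uint64_t add64(uint64_t a, uint64_t b)
{
    return a + b;
}

/* One "mul" on AArch64; a umull/mla sequence on 32-bit ARM. */
uint64_t mul64(uint64_t a, uint64_t b)
{
    return a * b;
}
```

With 31 general-purpose registers the compiler also has far more room for such split-free temporaries before anything spills to the stack.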

SLIDE 12

Structure Layout – 32-bit

struct foo { int32_t a; void* p; int32_t x; };

Offsets (ILP32): 0: a, 4: p, 8: x; sizeof(struct foo) = 12

SLIDE 13

Structure Layout – 64-bit

struct foo { int32_t a; void* p; int32_t x; };

Offsets (LP64): 0: a, 4: <hole>, 8: p, 16: x, 20: <tail padding>; sizeof(struct foo) = 24

SLIDE 14

Structure Layout – 64-bit

struct foo { void* p; int32_t a; int32_t x; };

Offsets (LP64): 0: p, 8: a, 12: x; sizeof(struct foo) = 16
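The two layouts can be checked with sizeof and offsetof; a minimal sketch (not from the slides; struct names are illustrative), assuming an LP64 target such as AArch64 or x86_64 Linux:

```c
#include <stdint.h>
#include <stddef.h>

/* Field order from the previous slide: 4-byte member, pointer, 4-byte member. */
struct foo_unordered { int32_t a; void *p; int32_t x; };

/* Reordered: pointer first, then the two 32-bit members pack together. */
struct foo_ordered { void *p; int32_t a; int32_t x; };

/* On LP64 the first layout is a@0, hole@4, p@8, x@16 plus tail padding
   (size 24); the second is p@0, a@8, x@12 with no padding (size 16). */
size_t wasted_bytes(void)
{
    return sizeof(struct foo_unordered) - sizeof(struct foo_ordered);
}
```

Simply moving the pointer first recovers a third of the structure's size on a 64-bit target, while leaving the 32-bit layout no worse.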

SLIDE 15

  • API: Application Programming Interface
    • Defines the interfaces a programmer may use
    • High level
  • ABI: Application Binary Interface
    • Defines how to call functions, lay out memory, &c.
    • Low level

Brief Aside

SLIDE 16

Data Models

ILP32: int = 32 bits, long = 32, long long = 64, pointer = 32

SLIDE 17

Data Models

ILP32: int = 32 bits, long = 32, long long = 64, pointer = 32

LP64: int = 32 bits, long = 64, long long = 64, pointer = 64

SLIDE 18

Data Models

ILP32: int = 32 bits, long = 32, long long = 64, pointer = 32

LP64: int = 32 bits, long = 64, long long = 64, pointer = 64

LLP64: int = 32 bits, long = 32, long long = 64, pointer = 64

SLIDE 19

Data Models

struct foo { int a; long l; int x; };

ILP32: 0: a, 4: l, 8: x; size 12
LP64: 0: a, 4: <hole>, 8: l, 16: x, 20: <hole>; size 24
LLP64: 0: a, 4: l, 8: x; size 12
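Which model a toolchain uses can be probed directly from the type sizes; a minimal sketch (not from the slides; the function name is illustrative):

```c
#include <stdint.h>

/* Classify the data model from the fundamental type sizes.
   ILP32: int/long/pointer all 32-bit (32-bit ARM, x86).
   LP64:  long and pointer 64-bit (AArch64/x86_64 Linux).
   LLP64: only long long and pointer 64-bit (64-bit Windows). */
const char *data_model(void)
{
    if (sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(long) == 8)
        return "LP64";
    return "LLP64";
}
```

The practical consequence: code that stows a pointer in a long works on ILP32 and LP64 but truncates on LLP64; intptr_t from <stdint.h> is the portable choice.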

SLIDE 20

That’s It...

SLIDE 21

One more thing...

SLIDE 22

  • Remove conditionalisation

One more thing...

SLIDE 23

  • Remove conditionalisation
  • Add some new load/store semantics

Two more things...

SLIDE 24

  • Remove conditionalisation
  • Add some new load/store semantics
  • Change the register layout for the floating-point/SIMD registers

Three more things...

Figure: AArch32 FP/SIMD register overlap: q0 aliases d0–d1, which alias s0–s3; q1 aliases d2–d3, which alias s4–s7.

SLIDE 25

  • Remove conditionalisation
  • Add some new load/store semantics
  • Change the register layout for the floating-point/SIMD registers

Three more things...

Figure: AArch64 layout: s0 and d0 are the low bits of v0, s1 and d1 the low bits of v1; the registers no longer overlap.

SLIDE 26

  • Remove conditionalisation
  • Add some new load/store semantics
  • Change the register layout for the floating-point/SIMD registers
  • Add some more SIMD instructions

Four more things...

SLIDE 27

  • Remove conditionalisation
  • Add some new load/store semantics
  • Change the register layout for the floating-point/SIMD registers
  • Add some more SIMD instructions
  • ...

Many more things...

SLIDE 28

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
int AtomicAdd(volatile int* ptr, int increment)
{
  int temp = increment;
  __asm__ __volatile__("lock; xaddl %0,%1"
                       : "+r" (temp), "+m" (*ptr)
                       : : "memory");
  return temp + increment;
}
#else
int AtomicAdd(volatile int* ptr, int increment)
{
  *ptr += increment;   /* fallback: not actually atomic */
  return *ptr;
}
#endif

Atomics

SLIDE 29

type __atomic_add_fetch (type *ptr, type val, int memmodel)

These built-in functions perform the operation suggested by the name, and return the result of the operation. That is: { *ptr op= val; return *ptr; } All memory models are valid.

Atomics

SLIDE 30

int AtomicAdd(volatile int* ptr, int increment)
{
  return __atomic_add_fetch (ptr, increment, memmodel);
}

Atomics

SLIDE 31

  • There are basically three types of memory model defined by C++11, which GCC’s support is based upon:

  • Sequentially Consistent
  • Acquire/Release
  • Relaxed

Atomics

SLIDE 32

Atomics – Sequentially Consistent

a = 1;
x.store(20);

if (x.load() == 20)
  assert (a == 1);

SLIDE 33

Atomics – Relaxed

a = 1;
x.store(20, memory_order_relaxed);

if (x.load(memory_order_relaxed) == 20)
  assert (a == 1);   // may fail: relaxed ordering gives no guarantee

SLIDE 34

Atomics – Acquire/Release

x.store (10, memory_order_release);
y.store (20, memory_order_release);

assert (y.load (memory_order_acquire) == 20 && x.load (memory_order_acquire) == 0);
assert (y.load (memory_order_acquire) == 0 && x.load (memory_order_acquire) == 10);

SLIDE 35

Atomics – Sequentially Consistent

x.store (10);
y.store (20);

assert (y.load () == 20 && x.load () == 0);
assert (y.load () == 0 && x.load () == 10);

SLIDE 36

Atomics – Sequentially Consistent

x.store (10);
y.store (20);

assert (y.load () == 20 && x.load () == 0);
assert (y.load () == 0 && x.load () == 10);

SLIDE 37

Atomics – Acquire/Release

a = 1;
x.store(20, memory_order_release);

if (x.load(memory_order_acquire) == 20)
  assert (a == 1);   // guaranteed: the release store synchronises with the acquire load
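The release/acquire guarantee on this slide can be exercised from C11 as well as C++11; a minimal sketch using <stdatomic.h> and pthreads (not from the slides; all names are illustrative), assuming a POSIX system:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int a;            /* plain data, published through x */
static atomic_int x;

static void *writer(void *arg)
{
    (void)arg;
    a = 1;                                                /* 1: write the data   */
    atomic_store_explicit(&x, 20, memory_order_release);  /* 2: publish the flag */
    return 0;
}

static void *reader(void *arg)
{
    (void)arg;
    /* Spin until the flag is seen; the acquire pairs with the release above. */
    while (atomic_load_explicit(&x, memory_order_acquire) != 20)
        ;
    assert(a == 1);  /* guaranteed: the write to a happens-before this read */
    return 0;
}

int run_demo(void)
{
    pthread_t tw, tr;
    a = 0;
    atomic_store(&x, 0);
    pthread_create(&tw, 0, writer, 0);
    pthread_create(&tr, 0, reader, 0);
    pthread_join(tw, 0);
    pthread_join(tr, 0);
    return a;
}
```

Swapping both orderings for memory_order_relaxed turns the assert in reader into the "may fail" case from the earlier slide.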

SLIDE 38

int AtomicAdd(volatile int* ptr, int increment)
{
  return __atomic_add_fetch (ptr, increment, __ATOMIC_SEQ_CST);
}

Atomics
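Since C11 the same thing can also be written with standard <stdatomic.h> instead of the GCC-specific builtin; a sketch (not from the slides; the function name is illustrative):

```c
#include <stdatomic.h>

/* C11 equivalent of the slide's AtomicAdd.  atomic_fetch_add returns the
   OLD value (sequentially consistent by default), so add the increment
   back to match __atomic_add_fetch's add-then-fetch result. */
int AtomicAddC11(atomic_int *ptr, int increment)
{
    return atomic_fetch_add(ptr, increment) + increment;
}
```

atomic_fetch_add_explicit takes a memory_order argument when something weaker than seq_cst is wanted.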

SLIDE 39

add:
        vld1.32 {q9}, [r1]!
        vld1.32 {q8}, [r2]!
        vadd.i32 q8, q9, q8
        subs r3, r3, #4
        vst1.32 {q8}, [r0]!
        bne add
        bx lr

And Now For Something Completely Different...


SLIDE 40

void add(int *a, const int *b, const int *c, unsigned n)
{
  unsigned i;
  for (i = 0; i < n; ++i)
    a[i] = b[i] + c[i];
}

Autovectorisation

SLIDE 41

.cpu generic .file "t.c" .text .align 2 .global add .type add, %function add: cbz w3, .L1 add x4, x0, 16 cmp x1, x4 add x5, x1, 16 cset w8, cs cmp x0, x5 cset w7, cs add x5, x2, 16 cmp x2, x4 cset w6, cs cmp x0, x5 cset w4, cs

orr w5, w8, w7 orr w4, w6, w4

tst w5, w4 beq .L3 cmp w3, 5 bls .L3 lsr w7, w3, 2 mov x4, 0 lsl w6, w7, 2 mov w5, w4 .L9: add x8, x2, x4 add x9, x1, x4 ld1 {v0.4s}, [x8] ld1 {v1.4s}, [x9] add x8, x0, x4 add v0.4s, v1.4s, v0.4s add w5, w5, 1 st1 {v0.4s}, [x8] cmp w5, w7 add x4, x4, 16 bcc .L9 cmp w3, w6 beq .L1 uxtw x5, w6 lsl x5, x5, 2 ldr w8, [x1,x5] ldr w7, [x2,x5] add w4, w6, 1 add w7, w8, w7 str w7, [x0,x5] cmp w3, w4 bls .L1 ubfiz x4, x4, 2, 32 ldr w7, [x1,x4] ldr w5, [x2,x4] add w6, w6, 2 add w5, w7, w5 str w5, [x0,x4] cmp w3, w6 bls .L1 uxtw x6, w6 lsl x6, x6, 2 ldr w3, [x1,x6] ldr w1, [x2,x6] add w1, w3, w1 str w1, [x0,x6] .L1: ret .L3: sub w6, w3, #1 add x6, x6, 1 lsl x6, x6, 2 mov x3, 0 .L11: ldr w5, [x1,x3] ldr w4, [x2,x3] add w4, w5, w4 str w4, [x0,x3] add x3, x3, 4 cmp x3, x6 bne .L11 ret .size add, .-add .ident "GCC: (GNU) 4.9.0 20130416 (experimental)"

Autovectorisation

SLIDE 42

Annotations: Header (do the arrays overlap?), Vector Loop, Vector Tidy-up, 1-by-1 loop (used when the arrays overlap), Footer.


Autovectorisation

SLIDE 43

void add(int * restrict a, const int * restrict b,
         const int * restrict c, unsigned n)
{
  unsigned i;
  for (i = 0; i < n; ++i)
    a[i] = b[i] + c[i];
}

Autovectorisation

SLIDE 44

.cpu generic .file "t.c" .text .align 2 .global add .type add, %function add: cbz w3, .L1 ubfx x4, x1, 2, 2 neg x4, x4 and w4, w4, 3 cmp w4, w3 csel w6, w4, w3, ls cmp w3, 4 mov w4, w3 bhi .L25 .L3: ldr w6, [x1] ldr w5, [x2] cmp w4, 1 add w5, w6, w5 str w5, [x0] bls .L16 ldr w6, [x1,4] ldr w5, [x2,4] cmp w4, 2 add w5, w6, w5 str w5, [x0,4] bls .L17 ldr w6, [x1,8] ldr w5, [x2,8] cmp w4, 3 add w5, w6, w5 str w5, [x0,8] bls .L18 ldr w7, [x1,12] ldr w6, [x2,12] mov w5, 4 add w6, w7, w6 str w6, [x0,12] .L5: cmp w3, w4 beq .L1 .L4: sub w11, w3, w4 lsr w9, w11, 2 lsl w10, w9, 2 cbz w10, .L7 ubfiz x4, x4, 2, 32 add x8, x1, x4 add x7, x2, x4 mov w6, 0 add x4, x0, x4 .L13: ld1 {v1.4s}, [x7],16 ld1 {v0.4s}, [x8],16 add v0.4s, v1.4s, v0.4s add w6, w6, 1 st1 {v0.4s}, [x4],16 cmp w9, w6 bhi .L13 cmp w11, w10 add w5, w5, w10 beq .L1 .L7: ubfiz x6, x5, 2, 32 ldr w8, [x1,x6] ldr w7, [x2,x6] add w4, w5, 1 add w7, w8, w7 str w7, [x0,x6] cmp w3, w4 bls .L1 ubfiz x4, x4, 2, 32 ldr w7, [x1,x4] ldr w6, [x2,x4] add w5, w5, 2 add w6, w7, w6 str w6, [x0,x4] cmp w3, w5 bls .L1 ubfiz x5, x5, 2, 32 ldr w3, [x1,x5] ldr w1, [x2,x5] add w1, w3, w1 str w1, [x0,x5] .L1: ret .L25: mov w4, 0 mov w5, w4 cbz w6, .L4 mov w4, w6 b .L3 .L18: mov w5, 3 b .L5 .L16: mov w5, 1 b .L5 .L17: mov w5, 2 b .L5 .size add, .-add .ident "GCC: (msgd) 4.8.2 20130805 (prerelease)"

Autovectorisation

SLIDE 45

Annotations: Header, Peeling for alignment, Vector Loop, Vector Tidy-up, Footer.


Autovectorisation

SLIDE 46

void add(int * restrict a, const int * restrict b,
         const int * restrict c, unsigned n)
{
  unsigned i;
  a = __builtin_assume_aligned (a, 32);
  b = __builtin_assume_aligned (b, 32);
  c = __builtin_assume_aligned (c, 32);
  for (i = 0; i < n; ++i)
    a[i] = b[i] + c[i];
}

Autovectorisation

SLIDE 47

.cpu generic .file "t.c" .text .align 2 .global add .type add, %function add: cbz w3, .L1 lsr w6, w3, 2 lsl w4, w6, 2 cbz w4, .L10 cmp w3, 3 bls .L10 mov x5, 0 mov w7, w5 .L9: add x8, x2, x5 add x9, x1, x5 ld1 {v0.4s}, [x8] ld1 {v1.4s}, [x9] add x8, x0, x5 add v0.4s, v1.4s, v0.4s add w7, w7, 1 st1 {v0.4s}, [x8] cmp w6, w7 add x5, x5, 16 bhi .L9 cmp w3, w4 beq .L1 .L3: uxtw x6, w4 lsl x6, x6, 2 ldr w8, [x1,x6] ldr w7, [x2,x6] add w5, w4, 1 add w7, w8, w7 str w7, [x0,x6] cmp w3, w5 bls .L1 ubfiz x5, x5, 2, 32 ldr w7, [x1,x5] ldr w6, [x2,x5] add w4, w4, 2 add w6, w7, w6 str w6, [x0,x5] cmp w3, w4 bls .L1 ubfiz x4, x4, 2, 32 ldr w3, [x1,x4] ldr w1, [x2,x4] add w1, w3, w1 str w1, [x0,x4] .L1: ret .L10: mov w4, 0 b .L3 .size add, .-add .ident "GCC: (msgd) 4.8.2 20130805 (prerelease)"

Autovectorisation

SLIDE 48

Annotations: Function Setup, Header, Vector Loop, Vector Tidy-up, Footer.


Autovectorisation

SLIDE 49

void add(int * restrict a, const int * restrict b,
         const int * restrict c, unsigned n)
{
  unsigned i;
  assert (n % 4 == 0);
  a = __builtin_assume_aligned (a, 32);
  b = __builtin_assume_aligned (b, 32);
  c = __builtin_assume_aligned (c, 32);
  for (i = 0; i < n; ++i)
    a[i] = b[i] + c[i];
}

Autovectorisation

SLIDE 50

#include <arm_neon.h>

void add (int * __restrict a, const int * __restrict b,
          const int * __restrict c, unsigned n)
{
  unsigned i;
  int32x4_t* va = (int32x4_t*)a;
  const int32x4_t* vb = (const int32x4_t*)b;
  const int32x4_t* vc = (const int32x4_t*)c;
  for (i = 0; i < n; i += 4)
    *(va++) = vaddq_s32 (*(vb++), *(vc++));
}

Neon Intrinsics
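The same four-lane add can be written without <arm_neon.h> using GCC's generic vector extensions, which compile for AArch64, 32-bit ARM, and other targets alike; a sketch (not from the slides; names are illustrative), under the same assumption that n is a multiple of 4:

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));  /* four int lanes */

void add_vec(int *a, const int *b, const int *c, unsigned n)
{
    unsigned i;
    for (i = 0; i < n; i += 4) {
        v4si vb, vc;
        /* memcpy sidesteps alignment assumptions about the int arrays */
        memcpy(&vb, b + i, sizeof vb);
        memcpy(&vc, c + i, sizeof vc);
        v4si va = vb + vc;          /* element-wise add, like vaddq_s32 */
        memcpy(a + i, &va, sizeof va);
    }
}
```

The trade-off: the generic form is portable, but arch-specific intrinsics expose operations (saturating arithmetic, lane permutes) that the generic vectors cannot express.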

SLIDE 51

        .cpu generic
        .file "t.c"
        .text
        .align 2
        .global add
        .type add, %function
add:
        cbz w3, .L1
        mov w4, 0
.L3:
        ld1 {v1.4s}, [x1],16
        ld1 {v0.4s}, [x2],16
        add v0.4s, v1.4s, v0.4s
        add w4, w4, 4
        st1 {v0.4s}, [x0],16
        cmp w3, w4
        bhi .L3
.L1:
        ret
        .size add, .-add
        .ident "GCC: (GNU) 4.9.0 20130902 (experimental)"

Neon Intrinsics

SLIDE 52

Annotations: Function Setup, Header, Vector Loop, Footer.

Neon Intrinsics

SLIDE 53

#include <arm_neon.h>

void add (int * restrict a, const int * restrict b,
          const int * restrict c, unsigned n)
{
  int32x4_t* va = (int32x4_t*)a;
  const int32x4_t* vb = (const int32x4_t*)b;
  const int32x4_t* vc = (const int32x4_t*)c;
  for ( ; n != 0; n -= 4)
    *(va++) = vaddq_s32 (*(vb++), *(vc++));
}

Neon Intrinsics

SLIDE 54

        .cpu generic
        .file "t.c"
        .text
        .align 2
        .global add
        .type add, %function
add:
        cbz w3, .L1
.L3:
        ld1 {v1.4s}, [x1],16
        ld1 {v0.4s}, [x2],16
        add v0.4s, v1.4s, v0.4s
        st1 {v0.4s}, [x0],16
        subs w3, w3, #4
        bne .L3
.L1:
        ret
        .size add, .-add
        .ident "GCC: (GNU) 4.9.0 20130902 (experimental)"

Neon Intrinsics

SLIDE 55

Annotations: Function Setup, Header, Vector Loop, Footer.

Neon Intrinsics

SLIDE 56

        .cpu cortex-a9
        .eabi_attribute 27, 3
        .eabi_attribute 28, 1
        .fpu neon
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 1
        .eabi_attribute 30, 2
        .eabi_attribute 34, 1
        .eabi_attribute 18, 4
        .file "t.c"
        .text
        .align 2
        .global add
        .type add, %function
add:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        cmp r3, #0
        bxeq lr
.L3:
        subs r3, r3, #4
        vld1.64 {d18-d19}, [r1:64]!
        vld1.64 {d16-d17}, [r2:64]!
        vadd.i32 q8, q9, q8
        vst1.64 {d16-d17}, [r0:64]!
        bne .L3
        bx lr
        .size add, .-add
        .ident "GCC: (GNU) 4.9.0 20130902 (experimental)"

Neon Intrinsics

SLIDE 57

Annotations: Function Setup, Header, Vector Loop, Vector Tidy-up, Footer.

Autovectorisation


SLIDE 58


Neon Intrinsics

SLIDE 59

More about Linaro: http://www.linaro.org/about/
More about Linaro engineering: http://www.linaro.org/engineering/
How to join: http://www.linaro.org/about/how-to-join
Linaro members: www.linaro.org/members