Hardware-accelerated Galois Field Arithmetic on the ARMv8 Architecture

SLIDE 1

Hardware-accelerated Galois Field Arithmetic on the ARMv8 Architecture

Markus Ongyerth

Chair for Network Architectures and Services
Department for Computer Science
Technische Universität München

September 30, 2014

Markus Ongyerth: Hardware-accelerated Galois Field Arithmetic on the ARMv8 Architecture 1

SLIDE 2

Outline

1. Motivation (Network Coding, ARMv8)
2. Algorithms
3. Results
4. Contributions

SLIDE 3

Network Coding

Figure: Node composition of a butterfly network (nodes A, B, C, D, E, F)

SLIDE 4

Network Coding

Figure: Message passing on normal routed network (nodes A, B, C, D, E, F)

SLIDE 5

Network Coding

Figure: Message passing with network coding (nodes A, B, C, D, E, F)
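
The slides do not spell out which packets travel over which link, so the following is only a minimal sketch of the textbook butterfly example: two source packets a and b are combined into a single coded packet a XOR b on the shared link, and each sink recovers the packet it is missing from the one it already holds. The function names `packet_xor` and `sink_decode` are hypothetical and not taken from libmoepgf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR two equal-length packets: dst[i] = x[i] ^ y[i] (addition in GF(2)). */
static void packet_xor(uint8_t *dst, const uint8_t *x, const uint8_t *y,
                       size_t len)
{
    assert(dst != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < len; i++)
        dst[i] = x[i] ^ y[i];
}

/*
 * Hypothetical sink-side helper: a sink that already holds one source
 * packet recovers the other one from the coded packet (a XOR b),
 * because known ^ (a ^ b) yields the packet that is not `known`.
 */
static void sink_decode(uint8_t *missing, const uint8_t *known,
                        const uint8_t *coded, size_t len)
{
    packet_xor(missing, known, coded, len);
}
```

The shared link carries one coded packet instead of two plain ones, which is where the gain over plain routing in this example comes from.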

SLIDE 6

ARMv8 - Apple A7

64 bit
1.3 GHz
64/64 KiB L1 cache
1 MiB L2 cache

SLIDE 7

imul and shuffle

imul
  Possible on GPRs
  Benefits naturally from bigger registers
  Can benefit from SIMD

shuffle
  Uses precomputed values
  Only usable with SIMD extensions (table lookup)
  Currently not supported by Apple-LLVM
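
To make the two approaches concrete, here is a scalar sketch in C, under stated assumptions. It reads "imul" as a shift-and-XOR polynomial multiplication and "shuffle" as a lookup in two precomputed 16-entry tables, one per nibble; both readings are assumptions based on the bullet points above. The reduction polynomial 0x11D and the names `gf256_mul`, `region_madd_imul` and `region_madd_shuffle` are illustrative choices, not taken from libmoepgf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed reduction polynomial: x^8 + x^4 + x^3 + x^2 + 1 (0x11D). */
#define GF256_POLY 0x1D

/* Shift-and-XOR multiplication of two GF(256) elements. */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;                       /* add a * x^k if bit k of b is set */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? GF256_POLY : 0));
        b >>= 1;
    }
    return p;
}

/* "imul"-style region operation: dst[i] ^= c * src[i], computed on the fly. */
static void region_madd_imul(uint8_t *dst, const uint8_t *src, uint8_t c,
                             size_t len)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++)
        dst[i] ^= gf256_mul(src[i], c);
}

/*
 * "shuffle"-style region operation, emulated in scalar code. Since
 * multiplication by c is linear over GF(2), c * s equals
 * c * (low nibble of s) XOR c * (high nibble of s), so two 16-entry
 * tables suffice. A SIMD table-lookup instruction would perform the
 * two lookups for 16 bytes at once.
 */
static void region_madd_shuffle(uint8_t *dst, const uint8_t *src, uint8_t c,
                                size_t len)
{
    uint8_t lo[16], hi[16];

    assert(dst != NULL && src != NULL);
    for (int n = 0; n < 16; n++) {
        lo[n] = gf256_mul((uint8_t)n, c);
        hi[n] = gf256_mul((uint8_t)(n << 4), c);
    }
    for (size_t i = 0; i < len; i++)
        dst[i] ^= lo[src[i] & 0x0F] ^ hi[src[i] >> 4];
}
```

Both functions compute the same result. The difference is that the first does arithmetic per byte, while the second trades it for precomputed values, which is why it depends on a table-lookup instruction to be fast.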

SLIDE 8

Benchmark

Generation of 16
128 B to 8 KiB
Results in Gbps
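
For reference, a throughput figure in Gbps can be derived from a measured byte count and elapsed time as sketched below. The slides do not say how the benchmark computes its numbers; the helper name `throughput_gbps` and the decimal convention (1 Gbit = 10^9 bit) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: throughput in Gbit/s for `bytes` processed in
 * `seconds`, assuming 1 Gbit = 1e9 bit.
 */
static double throughput_gbps(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes * 8.0 / seconds / 1e9;
}
```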

SLIDE 9

Result for iPad

Figure: GF(2) - base performance (plot not reproduced; legend: L1, L2, XOR NEON, XOR 32 bit, XOR 64 bit; x-axis ticks 0.25 KiB to 4096, y-axis ticks 1.0 to 11.0)
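
In GF(2), addition is a plain XOR, so the variants compared here differ mainly in how many bits are combined per operation. The following sketch shows 32-bit and 64-bit versions on general-purpose registers. The function names are hypothetical, and the requirement that the length be a multiple of the word size is a simplifying assumption. A NEON version would follow the same pattern with 128-bit vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* dst ^= src, 32 bits at a time; memcpy keeps unaligned access well defined. */
static void xor_region_u32(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert(len % sizeof(uint32_t) == 0);
    for (size_t i = 0; i < len; i += sizeof(uint32_t)) {
        uint32_t d, s;
        memcpy(&d, dst + i, sizeof d);
        memcpy(&s, src + i, sizeof s);
        d ^= s;
        memcpy(dst + i, &d, sizeof d);
    }
}

/* dst ^= src, 64 bits at a time: half as many loop iterations as above. */
static void xor_region_u64(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert(len % sizeof(uint64_t) == 0);
    for (size_t i = 0; i < len; i += sizeof(uint64_t)) {
        uint64_t d, s;
        memcpy(&d, dst + i, sizeof d);
        memcpy(&s, src + i, sizeof s);
        d ^= s;
        memcpy(dst + i, &d, sizeof d);
    }
}
```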

SLIDE 10

Results in GF(2)

Figure: GF(2) - base performance (plot not reproduced; legend: XOR 32 bit A7, XOR 32 bit Exynos5; x-axis ticks 0.25 KiB to 4096, y-axis ticks 1.0 to 11.0)

SLIDE 11

Results in GF(2)

Figure: GF(2) - 64bit GPR (plot not reproduced; legend: XOR 32 bit A7, XOR 32 bit Exynos5, XOR 64 bit A7, XOR 64 bit Exynos5; x-axis ticks 0.25 KiB to 4096, y-axis ticks 1.0 to 11.0)

SLIDE 12

Results in GF(2)

Figure: GF(2) - NEON (plot not reproduced; legend: XOR 32 bit A7, XOR 128 bit A7, XOR 128 bit Exynos5; x-axis ticks 0.25 KiB to 4096, y-axis ticks 1.0 to 11.0)

SLIDE 13

Results in GF(4)

Figure: GF(4) - base performance (plot not reproduced; legend: imul NEON iPad, imul NEON Exynos5, table lookup iPad, table lookup Exynos5; x-axis ticks 0.25 KiB to 4096, y-axis ticks 0.4 to 4.0)
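
As an illustration of the table-lookup variant in GF(4), the sketch below precomputes, for a fixed coefficient c, the product of every possible byte value with c, so that the region operation becomes one lookup and one XOR per byte. The packing of four 2-bit elements per byte and all function names are assumptions for illustration. The reduction polynomial x^2 + x + 1 is the only irreducible polynomial of degree 2 over GF(2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two GF(4) elements (2 bits each) modulo x^2 + x + 1. */
static uint8_t gf4_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    if (b & 1)
        p ^= a;
    if (b & 2)
        p ^= (uint8_t)(a << 1);
    if (p & 4)
        p ^= 7;                 /* reduce: x^2 = x + 1 */
    return p & 3;
}

/* Build a 256-entry table: table[v] = c * v for four packed GF(4) elements. */
static void gf4_build_table(uint8_t table[256], uint8_t c)
{
    assert(c < 4);
    for (int v = 0; v < 256; v++) {
        uint8_t r = 0;
        for (int k = 0; k < 8; k += 2)
            r |= (uint8_t)(gf4_mul((uint8_t)((v >> k) & 3), c) << k);
        table[v] = r;
    }
}

/* Region operation via table lookup: dst[i] ^= c * src[i]. */
static void gf4_region_madd_table(uint8_t *dst, const uint8_t *src, uint8_t c,
                                  size_t len)
{
    uint8_t table[256];

    gf4_build_table(table, c);
    for (size_t i = 0; i < len; i++)
        dst[i] ^= table[src[i]];
}
```

A real implementation would presumably build the tables for all coefficients once rather than on every call; this version rebuilds it only to stay self-contained.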

SLIDE 14

Results in GF(4)

Figure: GF(4) - performance with shuffle (plot not reproduced; legend: shuffle, imul NEON iPad, imul NEON Exynos5; x-axis ticks 0.25 KiB to 4096, y-axis ticks 0.4 to 4.0)

SLIDE 15

Results in GF(16)

Figure: GF(16) - base performance (plot not reproduced; legend: imul NEON iPad, imul NEON Exynos5, table lookup iPad, table lookup Exynos5; x-axis ticks 0.25 KiB to 4096, y-axis ticks 0.2 to 2.0)

SLIDE 16

Results in GF(16)

Figure: GF(16) - performance with shuffle (plot not reproduced; legend: shuffle, imul NEON iPad, imul NEON Exynos5; x-axis ticks 0.25 KiB to 4096, y-axis ticks 0.2 to 2.0)

SLIDE 17

Results in GF(256)

Figure: GF(256) - base performance (plot not reproduced; legend: imul NEON iPad, imul NEON Exynos5, table lookup iPad, table lookup Exynos5; x-axis ticks 0.25 KiB to 4096, y-axis ticks 0.2 to 2.0)

SLIDE 18

Results in GF(256)

Figure: GF(256) - performance with shuffle (plot not reproduced; legend: shuffle, imul NEON iPad, imul NEON Exynos5; x-axis ticks 0.25 KiB to 4096, y-axis ticks 0.2 to 2.0)

SLIDE 19

What I did

Deactivated unsupported parts
GUI for the libmoepgf benchmark on iOS
Benchmark on ARMv8
Expected results
Shuffle performance remains to be seen

SLIDE 20

Selected references

The libmoepgf library is available for download at: http://moep80211.net/plink/netcod2014

Stephan Günther: Efficient GF Arithmetic for Linear Network Coding using Hardware SIMD Extensions (2014)

Shuo-Yen Robert Li, Raymond W. Yeung, and Ning Cai: Linear Network Coding, IEEE Transactions on Information Theory, vol. 49, no. 2, February 2003
