
Hardware-accelerated Galois Field Arithmetic on the ARMv8 Architecture



  1. Hardware-accelerated Galois Field Arithmetic on the ARMv8 Architecture. Markus Ongyerth, Chair for Network Architectures and Services, Department for Computer Science, Technische Universität München. September 30, 2014.

  2. Outline: 1 Motivation (Network Coding, ARMv8); 2 Algorithms; 3 Results; 4 Contributions.

  3. Network Coding. Figure: Node composition of a butterfly network (nodes A to F).

  4. Network Coding. Figure: Message passing on a normally routed network (nodes A to F).

  5. Network Coding. Figure: Message passing with network coding (nodes A to F).
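
The gain from network coding in a butterfly network can be illustrated with a minimal sketch. It assumes the textbook setup, which the figure residue here does not spell out: two source packets a and b, a shared bottleneck link that forwards a XOR b, and two sinks that each receive one original packet directly plus the coded one. The function name and the payloads are hypothetical.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Add two equal-length packets in GF(2), i.e. XOR them bytewise."""
    if len(x) != len(y):
        raise ValueError("packets must have equal length")
    return bytes(p ^ q for p, q in zip(x, y))


# Two source packets (illustrative payloads, not from the slides).
packet_a = b"hello wo"
packet_b = b"rld!1234"

# The bottleneck link carries a single coded packet instead of two.
coded = xor_bytes(packet_a, packet_b)

# Sink 1 received packet_a directly and recovers packet_b from the coded one.
recovered_b = xor_bytes(coded, packet_a)
# Sink 2 received packet_b directly and recovers packet_a.
recovered_a = xor_bytes(coded, packet_b)

print(recovered_a == packet_a, recovered_b == packet_b)
```

With plain routing the bottleneck link would have to carry a and b one after the other; here one transmission serves both sinks.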

  6. ARMv8: Apple A7. 64-bit; 1.3 GHz; 64/64 KiB L1 cache; 1 MiB L2 cache.

  7. imul and shuffle. imul: possible on GPRs; benefits naturally from bigger registers; can benefit from SIMD. shuffle: uses precomputed values; only usable with SIMD extensions (table lookup); currently not supported by Apple-LLVM.
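
A minimal sketch of the two approaches, written in scalar Python rather than SIMD code. It assumes that imul refers to shift-and-add multiplication and that shuffle refers to lookups in small precomputed tables indexed by 4-bit nibbles, as SIMD table-lookup instructions allow; the slides do not define either term. The reduction polynomial 0x11D for GF(256) is also an assumption, and all function names are hypothetical.

```python
POLY = 0x11D  # assumed reduction polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf256_mul_shift_add(a: int, b: int) -> int:
    """Multiply in GF(256) bit by bit: conditional XOR, then shift and reduce."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return result


def build_nibble_tables(c: int) -> tuple[list[int], list[int]]:
    """Precompute c*x for every low nibble x and every high nibble x << 4."""
    low = [gf256_mul_shift_add(c, x) for x in range(16)]
    high = [gf256_mul_shift_add(c, x << 4) for x in range(16)]
    return low, high


def gf256_mul_region_tables(c: int, data: bytes) -> bytes:
    """Multiply every byte of data by the constant c with two 16-entry tables."""
    low, high = build_nibble_tables(c)
    return bytes(low[d & 0x0F] ^ high[d >> 4] for d in data)


data = bytes(range(256))
by_tables = gf256_mul_region_tables(0x53, data)
by_shift_add = bytes(gf256_mul_shift_add(0x53, d) for d in data)
print(by_tables == by_shift_add)
```

The table variant works because multiplication by a constant is linear over GF(2), so the products of the low and high nibble can be looked up separately and XORed. Two 16-entry tables are small enough to fit a 128-bit register, which is what makes a SIMD byte-shuffle instruction applicable.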

  8. Benchmark: generation of 16; 128 B to 8 KiB; results in Gbps.

  9. Result for iPad. Figure: GF(2) - base performance. Series: XOR NEON, XOR 32 bit, XOR 64 bit; L1 and L2 marked on the plot. Axes: y from 0 to 11.0, x from 0.25 KiB to 4096.

  10. Results in GF(2). Figure: GF(2) - base performance. Series: XOR 32 bit A7, XOR 32 bit Exynos5. Axes: y from 0 to 11.0, x from 0.25 KiB to 4096.

  11. Results in GF(2). Figure: GF(2) - 64bit GPR. Series: XOR 32 bit A7, XOR 32 bit Exynos5, XOR 64 bit A7, XOR 64 bit Exynos5. Axes: y from 0 to 11.0, x from 0.25 KiB to 4096.

  12. Results in GF(2). Figure: GF(2) - NEON. Series: XOR 32 bit A7, XOR 128 bit A7, XOR 128 bit Exynos5. Axes: y from 0 to 11.0, x from 0.25 KiB to 4096.
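
In GF(2), addition is XOR and the only nonzero coefficient is 1, so combining two packets is a plain XOR over the buffers; the register width only changes how many bytes one operation covers. A minimal sketch of that idea follows, using Python integers to stand in for 32-, 64-, and 128-bit registers. The function name and the word-size parameter are hypothetical and do not reflect the libmoepgf API.

```python
def xor_region(dst: bytes, src: bytes, word_bytes: int) -> bytes:
    """XOR src into dst, processing word_bytes bytes per step."""
    if len(dst) != len(src) or len(dst) % word_bytes != 0:
        raise ValueError("buffers must match and be a multiple of the word size")
    out = bytearray()
    for i in range(0, len(dst), word_bytes):
        d = int.from_bytes(dst[i:i + word_bytes], "little")
        s = int.from_bytes(src[i:i + word_bytes], "little")
        out += (d ^ s).to_bytes(word_bytes, "little")
    return bytes(out)


a = bytes(range(128))            # 128 B, the smallest packet size in the benchmark
b = bytes(reversed(range(128)))

results = {w: xor_region(a, b, w) for w in (4, 8, 16)}  # 32, 64, 128 bit
print(results[4] == results[8] == results[16])
```

All widths produce the same bytes; wider words only reduce the number of loop iterations, which is the effect the 32 bit, 64 bit, and NEON curves compare on real hardware.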

  13. Results in GF(4). Figure: GF(4) - base performance. Series: imul NEON iPad, imul NEON Exynos5, table lookup iPad, table lookup Exynos5. Axes: y from 0 to 4.0, x from 0.25 KiB to 4096.

  14. Results in GF(4). Figure: GF(4) - performance with shuffle. Series: shuffle, imul NEON iPad, imul NEON Exynos5. Axes: y from 0 to 4.0, x from 0.25 KiB to 4096.

  15. Results in GF(16). Figure: GF(16) - base performance. Series: imul NEON iPad, imul NEON Exynos5, table lookup iPad, table lookup Exynos5. Axes: y from 0 to 2.0, x from 0.25 KiB to 4096.

  16. Results in GF(16). Figure: GF(16) - performance with shuffle. Series: shuffle, imul NEON iPad, imul NEON Exynos5. Axes: y from 0 to 2.0, x from 0.25 KiB to 4096.

  17. Results in GF(256). Figure: GF(256) - base performance. Series: imul NEON iPad, imul NEON Exynos5, table lookup iPad, table lookup Exynos5. Axes: y from 0 to 2.0, x from 0.25 KiB to 4096.

  18. Results in GF(256). Figure: GF(256) - performance with shuffle. Series: shuffle, imul NEON iPad, imul NEON Exynos5. Axes: y from 0 to 2.0, x from 0.25 KiB to 4096.

  19. What I did: deactivated unsupported parts; GUI for the libmoepgf benchmark on iOS; benchmark on ARMv8. Expected results: shuffle is still to be seen.

  20. Selected references. The libmoepgf library is available for download at: http://moep80211.net/plink/netcod2014. Stephan Günther: Efficient GF Arithmetic for Linear Network Coding using Hardware SIMD Extensions (2014). Shuo-Yen Robert Li, Raymond W. Yeung, and Ning Cai: Linear Network Coding, IEEE Transactions on Information Theory, vol. 49, no. 2, February 2003.
