
Hardware Acceleration of a Software-based VPN


  1. Hardware Acceleration of a Software-based VPN
     Furkan Turan, Ruan de Clercq, Pieter Maene, Oscar Reparaz, Ingrid Verbauwhede
     KU Leuven - COSIC

  2. VPN Introduction
     A VPN (Virtual Private Network) encrypts the communication between two parties.

  3. VPN Device Introduction
     Goal:
     • Start with a VPN application,
     • Convert it into a 2-port VPN device,
     • Accelerate it with a cryptographic coprocessor.
     [Diagram: two VPN devices.]

  4. Software-based VPN
     How a software-based VPN application works:
     [Diagram labels: Application, Virtual Network Interface, VPN Application, Physical Network Interface.]
     SigmaVPN: light-weight, secure and modular software-based VPN.

  5. 2-Port VPN Device with Hardware Accelerator
     The new Private Comm. module uses a physical network interface. It can capture even broadcast messages.
     [Diagram labels: Coprocessor, Linux, Private Comm., SigmaVPN, Public Comm.]

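  6. NaCl’s CryptoBox
     Alice:
       $K_S \leftarrow \mathrm{ECDH}(K_{SEC,A}, K_{PUB,B})$
       $K_D \leftarrow \mathrm{HSalsa20}(K_S, N_1)$
       $S \leftarrow \mathrm{Salsa20}(K_D, N_2 \,\|\, CTR)$
       $CT \leftarrow S \oplus MSG$
       $MAC_A \leftarrow \mathrm{Poly1305}(CT, S)$
     Alice sends $CT$, $MAC_A$, $N_{1,2}$ to Bob.
     Bob:
       $K_S \leftarrow \mathrm{ECDH}(K_{SEC,B}, K_{PUB,A})$
       $K_D \leftarrow \mathrm{HSalsa20}(K_S, N_1)$
       $S \leftarrow \mathrm{Salsa20}(K_D, N_2 \,\|\, CTR)$
       $MAC_B \leftarrow \mathrm{Poly1305}(CT, S)$
       $\mathrm{Compare}(MAC_A, MAC_B)$
       $MSG \leftarrow S \oplus CT$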
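
The data flow on this slide (derive a key, generate a keystream, XOR, authenticate the ciphertext, then verify before decrypting) can be sketched with the Python standard library alone. This is a structural illustration only: SHA-256 stands in for HSalsa20, SHAKE-256 for Salsa20 and truncated HMAC-SHA256 for Poly1305, so it is neither NaCl-compatible nor a vetted construction. The ECDH step is skipped and the shared key $K_S$ is taken as given. Taking the MAC key from the first 32 keystream bytes is one reading of "Poly1305(CT, S)"; all function names are hypothetical.

```python
import hashlib
import hmac


def derive_key(k_s: bytes, n1: bytes) -> bytes:
    # Stand-in for K_D <- HSalsa20(K_S, N1); NOT the real HSalsa20.
    return hashlib.sha256(k_s + n1).digest()


def keystream(k_d: bytes, n2: bytes, length: int) -> bytes:
    # Stand-in for S <- Salsa20(K_D, N2 || CTR); NOT the real Salsa20.
    return hashlib.shake_256(k_d + n2).digest(length)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(k_s: bytes, n1: bytes, n2: bytes, msg: bytes):
    """Alice's side: returns (CT, MAC_A)."""
    k_d = derive_key(k_s, n1)
    s = keystream(k_d, n2, 32 + len(msg))
    mac_key, pad = s[:32], s[32:]          # assumption: MAC key taken from S
    ct = xor(msg, pad)                     # CT <- S xor MSG
    # Stand-in for MAC_A <- Poly1305(CT, S)
    mac_a = hmac.new(mac_key, ct, hashlib.sha256).digest()[:16]
    return ct, mac_a


def open_box(k_s: bytes, n1: bytes, n2: bytes, ct: bytes, mac_a: bytes) -> bytes:
    """Bob's side: verifies MAC_A before decrypting."""
    k_d = derive_key(k_s, n1)
    s = keystream(k_d, n2, 32 + len(ct))
    mac_key, pad = s[:32], s[32:]
    mac_b = hmac.new(mac_key, ct, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(mac_a, mac_b):   # Compare(MAC_A, MAC_B)
        raise ValueError("authentication failed")
    return xor(ct, pad)                    # MSG <- S xor CT
```

Because Bob recomputes the tag over the received ciphertext, a modified frame is rejected before any plaintext is produced.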

  7. One-time Authenticator: Poly1305
     An update operation is performed for each 128-bit block of the message.
     The operation implements a modular multiplication modulo $2^{130} - 5$.
     [Diagram: for each 128-bit message block, $Acc \leftarrow ((Acc + Msg_i) \times R) \bmod (2^{130} - 5)$; finally $MAC \leftarrow (Acc + S) \bmod 2^{128}$, with 128-bit $R$ and $S$.]
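
The per-block update on this slide can be sketched with Python's arbitrary-precision integers. The clamping of R and the extra 1 bit appended to every block follow the standard Poly1305 definition (RFC 8439), not anything shown in the slides; the function name is illustrative.

```python
P = (1 << 130) - 5  # the Poly1305 prime modulus


def poly1305_mac(msg: bytes, key: bytes) -> bytes:
    """Compute a 16-byte Poly1305 tag from a 32-byte one-time key (R || S)."""
    if len(key) != 32:
        raise ValueError("key must be 32 bytes")
    # R is clamped as required by the Poly1305 definition.
    r = int.from_bytes(key[:16], "little") & 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF
    s = int.from_bytes(key[16:], "little")

    acc = 0
    for i in range(0, len(msg), 16):
        block = msg[i:i + 16]
        # Append a 1 bit just above the block's most significant byte.
        n = int.from_bytes(block, "little") + (1 << (8 * len(block)))
        # Update: Acc <- ((Acc + block) * R) mod (2^130 - 5)
        acc = ((acc + n) * r) % P

    # Final step: MAC <- (Acc + S) mod 2^128
    return ((acc + s) % (1 << 128)).to_bytes(16, "little")
```

In software the multiplication is a single big-integer operation; in hardware it is the expensive step, which is why the following slides concentrate on how the multiplier is built.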

  8. Poly1305’s Implementation
     Implemented using a schoolbook multiplication:
     • The big multiplication is divided into smaller blocks
     • Followed by propagation of the results
     Each small block multiplication is handled by the single-cycle multipliers of the Zynq’s DSP48 slices.
     To boost the performance:
     • Parallel execution of the smaller-block multiplications
     • Parallel propagation of the results

  9. Poly1305’s Implementation
     [Diagram: blocks of Operand 1 multiplied by blocks of Operand 2, with x5 and + stages.]
     • A datapath for each column handles the smaller block multiplications.
     • The result of a column is propagated to the next.
     • The multipliers set the critical path.
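
A minimal software sketch of a column-wise schoolbook multiplication modulo $2^{130} - 5$ follows. The slides do not state the block width used in hardware, so the 5 x 26-bit limb split is an assumption borrowed from common software implementations. The multiplication by 5 uses the identity $2^{130} \equiv 5 \pmod{2^{130} - 5}$, which is presumably what the "x5" labels in the diagrams refer to. The final reduction uses Python's `%` for brevity.

```python
LIMB_BITS = 26               # assumed block width, not taken from the slides
NUM_LIMBS = 5                # 5 * 26 = 130 bits
MASK = (1 << LIMB_BITS) - 1
P = (1 << 130) - 5


def to_limbs(x: int) -> list:
    """Split a value below 2^130 into five 26-bit blocks, least significant first."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(NUM_LIMBS)]


def from_limbs(limbs: list) -> int:
    return sum(v << (LIMB_BITS * i) for i, v in enumerate(limbs))


def mulmod_schoolbook(a: int, b: int) -> int:
    """Return (a * b) mod (2^130 - 5) for a, b below 2^130."""
    al, bl = to_limbs(a), to_limbs(b)
    cols = [0] * NUM_LIMBS
    # Small block multiplications, accumulated per column.
    for i in range(NUM_LIMBS):
        for j in range(NUM_LIMBS):
            prod = al[i] * bl[j]
            k = i + j
            if k >= NUM_LIMBS:
                # Weight 2^(26k) = 2^130 * 2^(26(k-5)), and 2^130 = 5 (mod P):
                # fold the product back into a low column, times 5.
                cols[k - NUM_LIMBS] += 5 * prod
            else:
                cols[k] += prod
    # Propagate each column's overflow into the next column.
    carry = 0
    for k in range(NUM_LIMBS):
        cols[k] += carry
        carry = cols[k] >> LIMB_BITS
        cols[k] &= MASK
    # The carry out of the top column has weight 2^130, so it is folded with x5.
    return (from_limbs(cols) + 5 * carry) % P
```

All 25 block products are independent of one another, so a hardware design can compute them in parallel; only the carry propagation is sequential in this sketch.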

  10. Hardware Implementation
      • The Processing System runs Linux and SigmaVPN.
      • DMA transfers data between the coprocessor and RAM.
      [Diagram labels: ZYNQ; Processing System (PS) with ARM cores; Programmable Logic (PL) with DMA and Cryptographic Coprocessor; AXI4 Lite, AXI4 Stream and AXI4 Full interfaces; CTRL, Dest Add., Length.]

  11. Coprocessor's Datapath
      [Diagram; only the bus-width labels survive: 512, 512, 128, 512, 512.]

  12. Scheduling
      • Operation is divided into time slots
      • A time slot is the time to process a 512-bit message block
      • Each hardware module is active in each time slot
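
One plausible reading of this slide is a pipeline in which every module works on a different 512-bit block during the same time slot. The sketch below models that reading only. The stage names are hypothetical (the slides do not list the modules), and the real coprocessor's schedule may differ.

```python
def pipeline_schedule(num_blocks: int, stages: list) -> list:
    """Return, for each time slot, a dict mapping stage name -> block index.

    Block b enters the first stage in slot b and advances one stage per slot.
    """
    total_slots = num_blocks + len(stages) - 1
    schedule = []
    for slot in range(total_slots):
        active = {}
        for depth, stage in enumerate(stages):
            block = slot - depth
            if 0 <= block < num_blocks:
                active[stage] = block
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    # Hypothetical stage names, chosen only for illustration.
    stages = ["Salsa20", "XOR", "Poly1305"]
    for slot, active in enumerate(pipeline_schedule(4, stages)):
        print(slot, active)
```

Once the pipeline is full, every stage appears in every slot, which matches the statement that each hardware module is active in each time slot.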

  13. Hardware Utilization
      Single instance of processing blocks:
      • Resource utilization: 53.67%
      • Max clock freq.: 92.85 MHz
      • Processes a 512-bit block in a time slot
      Duplicated processing blocks:
      • Resource utilization: 97.25%
      • Max clock freq.: 81.25 MHz
      • Processes a 1024-bit block in a time slot
      The ZYBO board comes with the Zynq Z-7010 SoC:
      • The smallest Zynq device
      • Has limited resources

  14. Communication between HW & SW
      Configuring the DMA to transfer buffers requires:
      • Access to physical addresses
      • Coherent memory accesses
      A Linux kernel-space module (device file) was created for this.
      Problem: overhead of context switches
      • Going to kernel space costs ~800 cycles.
      • Transferring the frame between user and kernel space costs ~740 cycles.

  15. Improvements to Cryptographic Operations
      • Encrypted and decrypted many test vectors with both the SW-only and the SW+HW implementations.
      • Compared the results for accuracy and execution times.
      Improvement in encryption: min 4.9, max 15.1.
      Improvement in decryption: min 9.1, max 16.2.
      [Chart: improvement (factor) versus message length (16 to 1024 bytes), for encryption and decryption.]

  16. Improvements to VPN Bandwidth
      Test network structure: [Diagram: two VPN devices.]
      Bandwidth tests were run using the Iperf network bandwidth measurement tool.

  17. Improvements to VPN Bandwidth
      TCP bandwidth increase:
      • 2.9 times for 128-byte frames,
      • 4.36 times for 1024-byte frames.
      UDP bandwidth increase:
      • 2 times for 128-byte frames,
      • 5.36 times for 1024-byte frames.
      [Chart: bandwidth (Mbps) for communication with 1024-byte ETH frames, TCP and UDP; series: No VPN, VPN without Crypto, VPN with SW Crypto, VPN with HW+SW Crypto.]

  18. Functionality Test
      • The designed VPN device can still establish secure communication with the original SigmaVPN application.
        o A VPN device on a low-cost dev board, providing confidential communication between a whole home/business network and a remote server.

  19. Conclusion
      • A cryptographic hardware accelerator is presented for NaCl's CryptoBox, specifically for SigmaVPN.
      • Encrypting a 1024-byte message takes 94% less time than with the SW-only implementation.
      • Integrating our HW-SW codesign into SigmaVPN offers up to 6 times more communication bandwidth.
      • Xilinx Open HW Design Contest Finalist: http://www.openhw.eu/2016-finalists.html
      • It’s available open source: https://github.com/furkanturan/Hardware-Accelerated-SigmaVPN
