Lightweight Implementations of SHA-3 Candidates on FPGAs


  1. Lightweight Implementations of SHA-3 Candidates on FPGAs. Jens-Peter Kaps, Panasayya Yalla, Kishore Kumar Surapathi, Bilal Habib, Susheel Vadlamudi, Smriti Gurung, John Pham. Cryptographic Engineering Research Group (CERG), http://cryptography.gmu.edu, Department of ECE, Volgenau School of Engineering, George Mason University, Fairfax, VA, USA. 12th International Conference on Cryptology in India, Indocrypt 2011.
     Indocrypt 2011, J.-P. Kaps, Smriti Gurung, et al., Lightweight Implementations of SHA-3 on FPGAs, 1 / 27

  2. Outline: 1 Introduction, 2 Methodology, 3 Implementations, 4 Results.

  3. Hash Function Competition. A hash algorithm reads a message of arbitrary length and produces a fixed-length bit string called the hash value or message digest. Main applications: digital signatures, Message Authentication Codes (MAC), Universally Unique Identifiers (UUID/GUID), password tables, and many more. NIST competition for a new secure hash algorithm, SHA-3: announced in Nov 2007; 64 entries submitted; 14 selected for Round 2. Currently in Round 3, with 5 finalists. NIST's selection criteria: security, hardware/software speed, scalability.
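The fixed-length-output property can be illustrated with a minimal Python sketch that uses the standard library's `hashlib.sha3_256`. This is the final FIPS 202 SHA-3, which differs in padding (domain separation) from the Keccak Round 3 candidate discussed in these slides. The helper name `digest_len_bits` is hypothetical.

```python
import hashlib

def digest_len_bits(message: bytes) -> int:
    """Return the length in bits of the SHA3-256 digest of `message`."""
    return len(hashlib.sha3_256(message).digest()) * 8

# Messages of very different lengths all map to a 256-bit digest.
for msg in (b"", b"abc", b"x" * 10_000):
    print(len(msg), digest_len_bits(msg))
```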

  4. Hash Function Competition (continued). Motivation: analyze the performance of the candidates in a constrained FPGA environment ⇒ determine their scalability on FPGAs.

  5. Previous Work on SHA-3 Candidates. Several throughput/area-optimized implementations on FPGAs have been published: Gaj et al. [CHES 2010], Matsuo et al. [SHA-3 Conference 2010], Baldwin et al. [SHA-3 Conference 2010]. Only two specifically target low-area implementations of the SHA-3 finalists: Kerckhof et al. [HASH 2011], Jungk et al. [Reconfig 2011].

  6. Previous Work on SHA-3 Candidates (continued). Problem: rating algorithm performance is difficult when the implementations are on different devices, are made with different implementation goals and features, vary in both area and throughput, and support different I/O interface widths.

  7. Our Goal: a comprehensive set of lightweight implementations of all Round 2 SHA-3 candidates (except SIMD) and all SHA-3 finalists. All are optimized for the same target, the maximum throughput-to-area ratio for a given area budget, and all use the same standardized interface. They are implemented on different FPGA families for a fair comparison with other reported results. Target details: Xilinx Spartan 3, a low-cost FPGA family; a budget of 400-600 slices and 1 Block RAM (BRAM). Only the 256-bit digest versions are implemented.

  8. Assumptions. Implementing for minimum area alone can lead to unrealistic run times. ⇒ Target: achieve the maximum throughput/area ratio for a given area budget. Realistic scenarios: in a System on Chip, only a certain area is available. In a standalone design, a smaller chip means lower cost, but the limit is the smallest chip available, e.g. 768 slices on the smallest Spartan 3 FPGA. A common area budget makes a fair comparison of lightweight implementations possible.
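The selection criterion can be sketched in Python. The sketch assumes the common throughput definition TP = block size × f_clk / cycles per block. The `Design` records and all their numbers are hypothetical illustration values, not results from this work.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Design:
    name: str
    area_slices: int        # FPGA slices used
    f_clk_mhz: float        # clock frequency in MHz
    block_bits: int         # message block size in bits
    cycles_per_block: int   # clock cycles needed to absorb one block

    def throughput_mbps(self) -> float:
        # bits per block * MHz / cycles per block = Mbit/s
        return self.block_bits * self.f_clk_mhz / self.cycles_per_block

    def tp_per_area(self) -> float:
        return self.throughput_mbps() / self.area_slices

def best_under_budget(designs: List[Design], max_slices: int) -> Optional[Design]:
    """Return the design with the highest TP/A that fits the area budget, or None."""
    feasible = [d for d in designs if d.area_slices <= max_slices]
    if not feasible:
        return None
    return max(feasible, key=lambda d: d.tp_per_area())

# Hypothetical design points (NOT measured results from the slides).
candidates = [
    Design("tiny", 300, 100.0, 512, 2000),
    Design("folded", 550, 100.0, 512, 300),
    Design("wide", 900, 100.0, 512, 60),
]
print(best_under_budget(candidates, max_slices=600).name)  # folded
```

With a 600-slice budget, the folded design wins even though the tiny one is smaller. This illustrates why the target is TP/A under a budget rather than minimum area alone.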

  9. Interface and Protocol. Based on the interface and I/O protocol from Gaj et al. [CHES 2010]. msg_len_ap, seq_len_ap: lengths after padding, in 32-bit words. msg_len_bp, seq_len_bp: lengths before padding, in bits. For a message sent as n segments seq_0 ... seq_{n-1}:
     $\mathrm{msg\_len\_bp} = \sum_{i=0}^{n-2} \mathrm{seq\_len\_ap}_i \cdot 32 + \mathrm{seq\_len\_bp}_{n-1}$
     $\mathrm{msg\_len\_ap} = \sum_{i=0}^{n-1} \mathrm{seq\_len\_ap}_i \cdot 32$
     w = 16 bits (width of din and dout).
     [Figure: a) SHA Interface: SHA Core with clk, rst, w-bit din and dout, src_ready, src_read, dst_ready, dst_write. b) SHA Protocol: message header (msg_len_ap, msg_len_bp), segment length fields (seq_len_ap_0 ... seq_len_ap_{n-1}, seq_len_bp_{n-1}) and message segments seq_0 ... seq_{n-1}.]
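A short Python sketch of the two length formulas, taken literally from the slide (including the factor 32). The function name `message_lengths` and the example segment sizes are hypothetical.

```python
from typing import List, Tuple

def message_lengths(seq_len_ap: List[int], last_seq_len_bp: int) -> Tuple[int, int]:
    """Compute (msg_len_ap, msg_len_bp) from per-segment lengths.

    seq_len_ap: lengths after padding of segments 0..n-1, in 32-bit words.
    last_seq_len_bp: length before padding of the last segment n-1, in bits.
    """
    if not seq_len_ap:
        raise ValueError("a message needs at least one segment")
    n = len(seq_len_ap)
    # All n segments contribute their padded length.
    msg_len_ap = sum(seq_len_ap[i] * 32 for i in range(n))
    # Only the last segment can be partial, so it uses its unpadded length.
    msg_len_bp = sum(seq_len_ap[i] * 32 for i in range(n - 1)) + last_seq_len_bp
    return msg_len_ap, msg_len_bp

print(message_lengths([16, 16, 16], 200))  # (1536, 1224)
```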

  10. BLAKE-256 Algorithm. Key features: Salt value: a user-dependent 128-bit constant, set to all zeros. 14 rounds, each with 8 G functions built from XOR, addition, and rotation. P1, P2: permutations. BLAKE scales very well: it can be folded up to 4 times vertically and 4 times horizontally.
     [Figure: BLAKE-256 compression function (initialization with IV, salt and counter Ti; 14 rounds of 8 G functions with permutations P1, P2 and constant/message words CM; output H) and the G function on 32-bit words A, B, C, D with rotations by 16, 12, 8 and 7.]
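For reference, here is a Python sketch of one BLAKE-256 G function on 32-bit words, following the BLAKE specification, which rotates right by 16, 12, 8 and 7. To keep it short, the two message/constant words are passed in already selected by the round permutation and already XORed together. `blake256_g` and its argument names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x: int, r: int) -> int:
    """Rotate a 32-bit word right by r bits (0 < r < 32)."""
    return ((x >> r) | (x << (32 - r))) & MASK32

def blake256_g(a: int, b: int, c: int, d: int, x: int, y: int):
    """One G function.

    x, y: pre-combined words, i.e. m[sigma(2i)] ^ c[sigma(2i+1)] and
    m[sigma(2i+1)] ^ c[sigma(2i)] in the specification's notation.
    """
    a = (a + b + x) & MASK32   # modular addition
    d = rotr32(d ^ a, 16)      # XOR then rotate
    c = (c + d) & MASK32
    b = rotr32(b ^ c, 12)
    a = (a + b + y) & MASK32
    d = rotr32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotr32(b ^ c, 7)
    return a, b, c, d
```

Each round applies G to the four columns and then to the four diagonals of the 4×4 state, giving the 8 G functions per round.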

  11. BLAKE-256 Implementation. State: DRAM. Salt: BRAM. Quasi-pipelined half G function; registers reduce the critical path. The permutation causes a large controller with 210 addresses. The BRAM contains the constants, message, IV, and intermediate hash. Scalability: unfolding leads to a worse TP/A. Improvement: rescheduling the G functions results in 290 clock cycles per block instead of 350.
     [Figure: BLAKE-256 datapath with state words A, B, C, D in DRAM, registers REG_A, REG_B1, REG_B2, REG_C, REG_1, REG_2, rotators R1-R4, a 16-bit input register on din, 32-bit dout, and a dual-port BRAM (Port-A, Port-B).]
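One reading of "half G function" is as follows. G splits into two identical add/XOR/rotate passes that differ only in the message word and the rotation amounts. Under that reading, a datapath for one half can be reused twice per G. A Python sketch under that assumption (all names hypothetical):

```python
MASK32 = 0xFFFFFFFF

def rotr32(x: int, r: int) -> int:
    """Rotate a 32-bit word right by r bits (0 < r < 32)."""
    return ((x >> r) | (x << (32 - r))) & MASK32

def half_g(a: int, b: int, c: int, d: int, msg_word: int, r1: int, r2: int):
    """One add/XOR/rotate pass: the shared half of a BLAKE-256 G function."""
    a = (a + b + msg_word) & MASK32
    d = rotr32(d ^ a, r1)
    c = (c + d) & MASK32
    b = rotr32(b ^ c, r2)
    return a, b, c, d

def g_from_halves(a: int, b: int, c: int, d: int, x: int, y: int):
    """Full G = half with rotations (16, 12), then half with rotations (8, 7)."""
    a, b, c, d = half_g(a, b, c, d, x, 16, 12)
    return half_g(a, b, c, d, y, 8, 7)
```

On the scheduling improvement: at a fixed clock frequency, going from 350 to 290 clock cycles per 512-bit block raises throughput by a factor of 350/290 ≈ 1.21. That is roughly 1.46 to 1.77 message bits per clock cycle.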

  12. Grøstl Algorithm. Key features: based on an AES-like architecture with S-box, ShiftRows and MixColumns. Grøstl scales well, like AES: it can be folded up to 8 times vertically. Small storage requirements. Uses many narrow memory accesses in parallel (8 per column).
     [Figure: Grøstl compression function: chaining value H_{i-1} (initially IV) and 512-bit message block M feed permutations P and Q, each 10 rounds of AddConstant (Addp, Addq), S-Box, ShiftRows and MixColumns, producing H_i; the 256-bit hash H is taken from the final state.]
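For reference, the compression and output functions from the Grøstl specification are written out below. For Grøstl-256 the state is 512 bits, and P and Q have 10 rounds each. Here h_0 is the IV, m_i is the i-th message block and t is the number of blocks; the notation is ours.

```latex
h_i = P(h_{i-1} \oplus m_i) \oplus Q(m_i) \oplus h_{i-1}, \qquad
H = \mathrm{trunc}_{256}\bigl(P(h_t) \oplus h_t\bigr)
```

The output transformation needs one extra application of P. This is consistent with finalization taking about as long as one block.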

  13. Grøstl Implementation. State of P and Q: DRAM. ShiftRows: implemented through the order in which data is accessed from DRAM. MixColumns: GF multiplier (half multiplier). Finalization takes as many clock cycles as one block. The BRAM stores only the intermediate hash and the IV. One new column every 3 clock cycles, with P and Q interleaved. Scalability: the number of clock cycles per column can be reduced by adding S-boxes and/or GF multipliers.
     [Figure: Grøstl datapath with a 16-bit input register on din, four 4xDRAM banks for the state, four S-boxes, Add Constant, GFMul, 32-bit dout, and a dual-port BRAM (Port-A, Port-B).]
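ShiftRows (called ShiftBytes in the Grøstl specification) only moves bytes within rows, so it can be realized purely by address generation when the state is read. The following Python sketch assumes the state is an 8×8 byte matrix `state[row][col]` and that row i is rotated left by `sigma[i]`. The shift vector `SIGMA_P` = 0..7 for P in Grøstl-256 is an assumption to check against the specification, and the function names are hypothetical.

```python
from typing import List, Sequence

# Assumed shift vector for permutation P in Grøstl-256 (verify against the spec).
SIGMA_P = [0, 1, 2, 3, 4, 5, 6, 7]

def shift_bytes_src_col(row: int, col: int, sigma: Sequence[int]) -> int:
    """Column to read so that output row `row` is rotated left by sigma[row]."""
    return (col + sigma[row]) % 8

def shift_bytes(state: List[List[int]], sigma: Sequence[int]) -> List[List[int]]:
    """ShiftBytes on an 8x8 byte matrix, done only by computing read addresses."""
    return [[state[r][shift_bytes_src_col(r, c, sigma)] for c in range(8)]
            for r in range(8)]

state = [[8 * r + c for c in range(8)] for r in range(8)]
print(shift_bytes(state, SIGMA_P)[1])  # [9, 10, 11, 12, 13, 14, 15, 8]
```

In hardware, `shift_bytes_src_col` corresponds to the controller producing a different DRAM read address per row. No bytes are physically moved.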

  14. JH Algorithm. Key features: Grouping: reordering of the 1024-bit state. S-boxes: two 4-bit S-boxes, S0 and S1. Linear transformation L: rotation and XOR. De-grouping: the inverse of grouping. The permutation, grouping, and de-grouping make scaling difficult: folding increases size.
     [Figure: JH compression function: the 512-bit message block M is XORed into the first half of the 1024-bit state. Then E8 is applied (grouping, 42 rounds of round function R8 with S-boxes, L and permutation P, round constants generated by R6, de-grouping). Finally M is XORed into the second half. The 256-bit hash H is taken from bits 768-1023.]
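For reference, the JH compression function from the JH specification is written out below. M^{(i)} is the 512-bit message block and H^{(i)} is the 1024-bit state; the notation is ours.

```latex
H^{(i)} = E_8\bigl(H^{(i-1)} \oplus (M^{(i)} \,\|\, 0^{512})\bigr) \oplus (0^{512} \,\|\, M^{(i)})
```

For JH-256, the digest is the last 256 bits of the final state H^{(N)}.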
