Hardware Architectures for HECC
Gabriel GALLIN and Arnaud TISSERAND
CNRS – Lab-STICC – IRISA
HAH Project
CryptArchi, June 2017

Summary
1. Context & Motivations
2. HECC Operations
3. Efficient Multiplier
4. Architectures and Tools for HECC
5. Conclusion

G. Gallin - A. Tisserand, Hardware Architectures for HECC, CryptArchi 2017

Public-Key Cryptography (PKC)
Provides cryptographic primitives such as digital signatures, key exchange and specific encryption schemes
First PKC standard: RSA
- Keys of ≥ 2000 bits recommended today
- Too costly for embedded applications
Elliptic Curve Cryptography (ECC):
- Better performance and lower cost than RSA
- Allows more advanced schemes
Hyper-Elliptic Curve Cryptography (HECC):
- Evolution of ECC based on a larger family of curves
- Expected to have a lower cost than ECC

Operations Hierarchy in (H)ECC
Figure: operation hierarchy, from [Software] to [Hardware]: Protocols → Scalar Multiplication $[k]P_b$ → Curve-Level Operations ADD$(P, Q)$ and DBL$(P) = P + P$ → GF($p$)/GF($2^m$) Operations $x \pm y$, $x \times y$
ADD and DBL are built using $\mathbb{F}_P$ operations
Modular arithmetic in $\mathbb{F}_P$:
- Elements of 100 to 200 bits for HECC
- Operations involve modular reduction
- Choice of $P$:
  – Generic $P$: more flexible but slower
  – Specific $P$ (e.g. pseudo-Mersenne): faster but restricted to special primes
Modular multiplication (M) and square (S):
- Most common and costly operations
- Efficient dedicated hardware units
Main metric: number of M and S in $\mathbb{F}_P$
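To illustrate why a specific $P$ is faster, here is a minimal sketch of reduction modulo a pseudo-Mersenne prime $P = 2^k - c$ with small $c$. Since $2^k \equiv c \pmod P$, the bits above position $k$ can be folded back with a shift, a mask and one small multiplication instead of a full division. The function name `reduce_pseudo_mersenne` and its parameters are hypothetical, chosen for illustration only.

```python
def reduce_pseudo_mersenne(x: int, k: int, c: int) -> int:
    """Hypothetical helper: reduce x >= 0 modulo P = 2**k - c, 0 < c < 2**(k-1).

    Since 2**k = c (mod P), writing x = hi * 2**k + lo lets us replace x by
    hi * c + lo: a shift, a mask and a small multiplication, no division.
    """
    p = (1 << k) - c
    mask = (1 << k) - 1
    while x >> k:                       # fold until x fits in k bits
        x = (x >> k) * c + (x & mask)
    if x >= p:                          # now x < 2**k < 2P: one subtraction suffices
        x -= p
    return x
```

For example, with $P = 2^{127} - 1$ ($k = 127$, $c = 1$) each folding step reduces to adding the high and low halves, which is why such primes are attractive. A generic $P$ gives up this shortcut, which motivates the Montgomery multiplication discussed later.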

ECC, HECC, Kummer-HECC
         $\mathbb{F}_P$ element size              ADD           DBL           Source
ECC      $\ell_{ECC}$                             12 M + 2 S    7 M + 3 S     [Bernstein and Lange]
HECC     $\ell_{HECC} \approx \ell_{ECC}/2$       40 M + 4 S    38 M + 6 S    [Lange, 2005]
Kummer   $\ell_{HECC}$                            19 M + 12 S (xDBLADD)       [Renes et al., 2016]
ECC:
- $\mathbb{F}_P$ elements 2× larger
- Simpler ADD and DBL operations
HECC:
- Smaller $\mathbb{F}_P$ elements
- More $\mathbb{F}_P$ operations for ADD / DBL
Kummer-HECC is more efficient than ECC [Renes et al., 2016]:
- ARM Cortex M0: up to 75% clock-cycle reduction for signatures
- AVR ATmega: up to 32% cycle reduction for Diffie-Hellman
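One rough way to read the table is to convert every count into "ECC-sized multiplications". The sketch below rests on stated assumptions, not on the source: S costs the same as M (parameter `s_over_m`), the cost of one $\mathbb{F}_P$ multiplication grows quadratically with element size (so a half-size multiplication costs about a quarter), ECC and HECC spend one ADD plus one DBL per scalar bit, and Kummer spends one xDBLADD. Additions and curve-specific tricks are ignored. The names `COUNTS`, `SIZE_RATIO` and `ecc_equivalent_cost` are hypothetical.

```python
from fractions import Fraction

# (number of M, number of S) per scalar bit, from the table above.
# Assumption: ECC and HECC use one ADD plus one DBL; Kummer uses one xDBLADD.
COUNTS = {
    "ECC": (12 + 7, 2 + 3),
    "HECC": (40 + 38, 4 + 6),
    "Kummer": (19, 12),
}
# F_P element size relative to ECC (l_HECC is about l_ECC / 2).
SIZE_RATIO = {"ECC": Fraction(1), "HECC": Fraction(1, 2), "Kummer": Fraction(1, 2)}


def ecc_equivalent_cost(curve: str, s_over_m: Fraction = Fraction(1),
                        exponent: int = 2) -> Fraction:
    """Hypothetical per-bit cost in units of one ECC-size M.

    Assumes the cost of one multiplication scales as size**exponent.
    """
    m, s = COUNTS[curve]
    return (m + s * s_over_m) * SIZE_RATIO[curve] ** exponent


for curve in COUNTS:
    print(curve, ecc_equivalent_cost(curve))
```

Under these assumptions the loop prints 24 for ECC, 22 for HECC and 31/4 for Kummer. Changing `s_over_m` or `exponent` shifts the numbers, so this is only a back-of-the-envelope reading of the operation counts, not a performance prediction.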

Curve-Level Operations in Kummer
No ADD operation, but DBL is still available
Differential addition: xADD$(\pm P, \pm Q, \pm(P - Q)) \to \pm(P + Q)$
xADD and DBL can be combined: xDBLADD$(\pm P, \pm Q, \pm(P - Q)) \to (\pm[2]P, \pm(P + Q))$
For details see [Renes et al., 2016], [Gaudry, 2007] and [Bos et al., 2016]

xDBLADD $\mathbb{F}_P$ Operations
Figure: data-flow graph of the $\mathbb{F}_P$ operations in xDBLADD: additions (a), subtractions (s), multiplications (M) and squarings (S), combined with constant (cst) and variable (var) operands, leading to the outputs (OUT).

Scalar Multiplication
Montgomery-ladder-based crypto_scalarmult [Renes et al., 2016]:
Require: $m$-bit scalar $k = \sum_{i=0}^{m-1} 2^i k_i$, point $P_b$, $cst \in \mathbb{F}_P^4$
Ensure: $V_1 = [k]P_b$, $V_2 = [k+1]P_b$
  $V_1 \leftarrow cst$
  $V_2 \leftarrow P_b$
  for $i = m-1$ downto 0 do
    $(V_1, V_2) \leftarrow$ CSWAP$(k_i, (V_1, V_2))$
    $(V_1, V_2) \leftarrow$ xDBLADD$(V_1, V_2, P_b)$
    $(V_1, V_2) \leftarrow$ CSWAP$(k_i, (V_1, V_2))$
  end for
  return $(V_1, V_2)$
CSWAP$(k_i, (X, Y))$ returns $(X, Y)$ if $k_i = 0$, else $(Y, X)$
- Constant time, uniform operations (independent of the key bits)
- Some parallelism between the internal $\mathbb{F}_P$ operations of xDBLADD
- CSWAP: very simple but involves secret bits (must be protected)
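The ladder logic can be checked on a toy group. This sketch is an illustration under strong assumptions, not the Kummer arithmetic: points are replaced by elements of $\mathbb{Z}/n\mathbb{Z}$ written additively, `cst` becomes 0, and a hypothetical `toy_xdbladd` returns $([2]V_1, V_1 + V_2)$. Since $V_2 - V_1 = P_b$ holds after every step, $V_1$ ends at $[k]P_b$. The masked `cswap` mirrors the branch-free idea of CSWAP, although Python itself gives no constant-time guarantee. All function names are hypothetical.

```python
def cswap(bit: int, x: int, y: int) -> tuple[int, int]:
    """Hypothetical conditional swap: (x, y) if bit == 0, else (y, x).

    The mask is all ones when bit == 1 and zero when bit == 0, so the same
    operations run for both values of the secret bit (no timing guarantee
    in Python; this only mirrors the masking idea). Assumes x, y >= 0.
    """
    mask = -bit                      # 0 -> 0, 1 -> -1 (all ones)
    d = (x ^ y) & mask
    return x ^ d, y ^ d


def toy_xdbladd(v1: int, v2: int, diff: int, n: int) -> tuple[int, int]:
    """Hypothetical stand-in for xDBLADD in Z/nZ: returns ([2]v1, v1 + v2).

    In Z/nZ the sum does not need the difference 'diff' (= v2 - v1);
    it is kept only to match the xDBLADD(V1, V2, Pb) signature.
    """
    return (2 * v1) % n, (v1 + v2) % n


def ladder(k: int, m: int, pb: int, n: int) -> tuple[int, int]:
    """Montgomery ladder over m key bits: returns ([k]pb, [k+1]pb) mod n."""
    v1, v2 = 0, pb % n               # 0 plays the role of cst (neutral element)
    for i in range(m - 1, -1, -1):
        ki = (k >> i) & 1
        v1, v2 = cswap(ki, v1, v2)
        v1, v2 = toy_xdbladd(v1, v2, pb, n)
        v1, v2 = cswap(ki, v1, v2)
    return v1, v2
```

For instance, `ladder(5, 8, 7, 1009)` yields `(35, 42)`, i.e. $[5]P_b$ and $[6]P_b$ with $P_b = 7$. Every iteration runs the same swap, xDBLADD, swap sequence whatever the value of $k_i$.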

Montgomery Modular Multiplication (MMM)
Objective: $A \times B \bmod P$ (MMM actually returns $A \times B \times 2^{-n} \bmod P$, so operands are kept in Montgomery representation)
Proposed in [Montgomery, 1985]
Variants are the current state of the art for $\mathbb{F}_P$ multiplication (with generic $P$)
Data flow:
- $R = A \times B$ ($n \times n \to 2n$ bits)
- $q = (R \times (-P^{-1})) \bmod 2^n$ ($n \times n \to n$ bits)
- $qP = q \times P$ ($n \times n \to 2n$ bits)
- $S = (R + qP) / 2^n$: the final reduction step discards the $n$ LSBs (followed by a subtraction of $P$ if $S \geq P$)
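The four steps above map directly onto a few lines of integer arithmetic. This is a minimal, non-interleaved sketch assuming $P$ odd, $P < 2^n$ and inputs already reduced modulo $P$; the function name `montgomery_mul` is hypothetical. Because $q \equiv -R P^{-1} \pmod{2^n}$, the sum $R + qP$ is divisible by $2^n$, so the shift is exact.

```python
def montgomery_mul(a: int, b: int, p: int, n: int) -> int:
    """Hypothetical helper: Montgomery multiplication, returns a*b*2**(-n) mod p.

    Assumes p odd, p < 2**n, and 0 <= a, b < p.
    """
    mask = (1 << n) - 1
    p_neg_inv = (-pow(p, -1, 1 << n)) & mask   # -p^{-1} mod 2^n
    r = a * b                                  # n x n -> 2n bits
    q = ((r & mask) * p_neg_inv) & mask        # n x n -> n bits (only LSBs of r matter)
    s = (r + q * p) >> n                       # exact: r + q*p = 0 mod 2^n
    return s - p if s >= p else s              # s < 2p, one subtraction is enough
```

To multiply ordinary residues, convert them first with $\tilde{a} = a \cdot 2^n \bmod P$; then `montgomery_mul(ã, b̃)` is the Montgomery form of $ab$, and one more `montgomery_mul(·, 1)` converts the result back.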

Modular Multiplication: Dependencies Problem
In practice, MMM is interleaved:
- Operands are split into $s$ words of $w$ bits such that $n = s \times w$
- Iterations over partial products and reductions on words
- Coarsely Integrated Operand Scanning (CIOS) from [Koç et al., 1996]
Impact on hardware implementation:
- Dependencies → latencies between internal iterations
- The hardware pipeline in DSP slices cannot be filled efficiently
Proposed solution: Hyper-Threaded Modular Multiplier (HTMM)
- Based on the simple CIOS algorithm
- Uses idle stages to compute other independent MMMs in parallel
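A word-level version of the same computation shows where the dependency comes from. The sketch below follows the CIOS loop structure of [Koç et al., 1996] in plain Python (the name `cios_mul` is hypothetical; it assumes $P$ odd and $P < 2^{s \cdot w}$). In each outer iteration the reduction word `q` can only be computed once `t[0]` of the freshly accumulated partial product is known, and the next iteration needs the shifted result. Consecutive iterations of a single multiplication therefore cannot overlap, which is the latency the slide refers to.

```python
def cios_mul(a: int, b: int, p: int, w: int, s: int) -> int:
    """Hypothetical helper: word-level CIOS Montgomery multiplication.

    Operands are split into s words of w bits (n = s * w). Returns
    a * b * 2**(-n) mod p. Assumes p odd, p < 2**n and 0 <= a, b < p.
    """
    base = 1 << w
    mask = base - 1

    def words(x: int) -> list[int]:
        return [(x >> (w * j)) & mask for j in range(s)]

    A, B, P = words(a), words(b), words(p)
    p_inv = (-pow(P[0], -1, base)) & mask     # -P^{-1} mod 2^w
    t = [0] * (s + 2)                         # accumulator, s + 2 words
    for i in range(s):
        # Partial product: t += A * B[i], propagating carries word by word.
        c = 0
        for j in range(s):
            c, t[j] = divmod(t[j] + A[j] * B[i] + c, base)
        c, t[s] = divmod(t[s] + c, base)
        t[s + 1] = c
        # Reduction: q needs the fresh t[0] (the dependency between iterations).
        q = (t[0] * p_inv) & mask
        c = (t[0] + q * P[0]) >> w            # low word cancels by construction
        for j in range(1, s):
            c, t[j - 1] = divmod(t[j] + q * P[j] + c, base)
        c, t[s - 1] = divmod(t[s] + c, base)
        t[s] = t[s + 1] + c                   # t is now (t + q * p) / 2**w
    res = sum(t[j] << (w * j) for j in range(s + 1))
    return res - p if res >= p else res
```

With the HTMM parameters given later ($w = 34$, $s = 4$), `cios_mul` runs four outer iterations on 34-bit words and agrees with a direct computation of $A \times B \times 2^{-136} \bmod P$.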

HTMM Internal Architecture
HTMM architecture: 3 hardware stages
- Stages are fully pipelined (several clock cycles per stage)
- 3 to 4 DSP slices in each stage
Figure: HTMM datapath. STAGE 1 computes $t = A_i \times B + S$ from the operand word $A_i$ and $B$; STAGE 2 computes the reduction word $q_i$ from $t_0$, the least significant word of $t$; STAGE 3 updates $S$ from $t$ and $q_i$.

Figure: HTMM pipeline timeline. Three independent multiplications (operand pairs A(0)/B(0), A(1)/B(1), A(2)/B(2)) are interleaved round-robin (0 1 2 0 1 2 ...) through STAGE 1, STAGE 2 and STAGE 3, each stage lagging the previous one by one slot, and produce the results P(0), P(1), P(2); the next operands A(3) to A(5) and B(3) to B(5) enter afterwards.
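The benefit of interleaving can be seen with a deliberately simplified model, which is an assumption and not the actual HTMM timing: each stage handles one slot per step, and a multiplication cannot start its next CIOS iteration before the previous one has left STAGE 3. With a single multiplication each stage is busy only one slot in three; issuing three independent multiplications round-robin fills the gaps and reproduces the 0 1 2 0 1 2 pattern of the timeline. The names `htmm_schedule` and `utilization` are hypothetical.

```python
# Toy model (assumption): one slot per stage per step; a multiplication's
# next CIOS iteration may only enter STAGE 1 once its previous iteration
# has left the last stage. Function names are hypothetical.

def htmm_schedule(threads: int, iterations: int, stages: int = 3) -> list[list]:
    """Return one row per stage giving which multiplication occupies each slot."""
    period = max(threads, stages)             # spacing between iterations of one thread
    length = (iterations - 1) * period + threads + stages - 1
    rows = [[None] * length for _ in range(stages)]
    for j in range(iterations):
        for x in range(threads):
            issue = j * period + x            # slot where iteration j of thread x enters STAGE 1
            for k in range(stages):
                rows[k][issue + k] = x        # each later stage lags by one slot
    return rows


def utilization(rows: list[list]) -> float:
    """Fraction of slots in which STAGE 1 is busy."""
    busy = sum(slot is not None for slot in rows[0])
    return busy / len(rows[0])


for k, row in enumerate(htmm_schedule(3, 2)):
    print(f"STAGE {k + 1}:", " ".join("." if x is None else str(x) for x in row))
```

The loop prints `STAGE 1: 0 1 2 0 1 2 . .`, `STAGE 2: . 0 1 2 0 1 2 .` and `STAGE 3: . . 0 1 2 0 1 2`. In this model, four iterations of one multiplication keep STAGE 1 busy in 4 of 12 slots, whereas three interleaved multiplications use 12 of 14.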

HTMM Implementations
Xilinx FPGAs:
- Virtex 4 XC4VLX100 (V4)
- Virtex 5 XC5VLX110T (V5)
- Spartan 6 XC6SLX75 (S6)
Comparison with the fastest MMM implementation in the literature:
- Design presented in [Ma et al., 2013]
- Implemented on the same FPGAs for a fair comparison
2 versions of HTMM:
- HTMM DRAM: operands stored in FPGA slices (LUTs)
- HTMM BRAM: operands stored in FPGA BRAMs
Parameters for HTMM:
- $P$: 128 bits
- $w = 34$ bits, $s = 4$
- Operand size $n = s \times w = 136$ bits

HTMM Implementations Results
Results for 3 independent multiplications:

Version            FPGA  DSP  BRAM     FF    LUT   Slices  Freq.  Nb.     Time
                               18K/9K                      (MHz)  cycles  (ns)
[Ma et al., 2013]  V4    21   6/0      1311  1201  879     252    65      258
                   V5    21   6/0      1310  1027  406     296    65      220
                   S6    21   0/6      1280  1600  540     210    65      309
HTMM DRAM          V4    11   0/0      1638  1128  1346    330    79      239
                   V5    11   0/0      1616  652   517     400    79      198
                   S6    11   0/0      1631  1344  483     302    79      261
HTMM BRAM          V4    11   2/0      615   364   449     328    79      241
                   V5    11   2/0      593   371   249     357    79      221
                   S6    11   0/2      587   359   180     304    79      260

S6 (HTMM BRAM vs [Ma et al., 2013]): -47% DSPs, -66% BRAMs, -66% slices, -15% duration
For a single M, HTMM is less efficient (69 cycles against 25)
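The time column and the S6 percentages can be recomputed from the other columns: time (ns) = cycles × 1000 / frequency (MHz), and the relative changes compare the HTMM BRAM S6 row with the [Ma et al., 2013] S6 row. The sketch below only reuses values from the table; the helper names are hypothetical. Recomputed times match the table to within 1 ns, and truncating the percentages toward zero reproduces the figures on the slide.

```python
# Values copied from the results table: (freq. MHz, nb. cycles, time ns).
RESULTS = {
    ("Ma et al., 2013", "V4"): (252, 65, 258),
    ("Ma et al., 2013", "V5"): (296, 65, 220),
    ("Ma et al., 2013", "S6"): (210, 65, 309),
    ("HTMM DRAM", "V4"): (330, 79, 239),
    ("HTMM DRAM", "V5"): (400, 79, 198),
    ("HTMM DRAM", "S6"): (302, 79, 261),
    ("HTMM BRAM", "V4"): (328, 79, 241),
    ("HTMM BRAM", "V5"): (357, 79, 221),
    ("HTMM BRAM", "S6"): (304, 79, 260),
}


def time_ns(freq_mhz: float, cycles: int) -> float:
    """Hypothetical helper: duration of `cycles` clock cycles at `freq_mhz` MHz, in ns."""
    return cycles * 1000 / freq_mhz


def change_percent(new: float, old: float) -> int:
    """Hypothetical helper: relative change in percent, truncated toward zero."""
    return int((new - old) / old * 100)


# S6 resources and duration: [Ma et al., 2013] vs HTMM BRAM.
MA_S6 = {"DSP": 21, "BRAM 9K": 6, "Slices": 540, "Time (ns)": 309}
HTMM_BRAM_S6 = {"DSP": 11, "BRAM 9K": 2, "Slices": 180, "Time (ns)": 260}

for (version, fpga), (freq, cycles, table_time) in RESULTS.items():
    print(f"{version:16} {fpga}: {time_ns(freq, cycles):6.1f} ns (table: {table_time})")
for key in MA_S6:
    print(f"{key}: {change_percent(HTMM_BRAM_S6[key], MA_S6[key])}%")
```

The last loop prints -47% for DSP, -66% for BRAM 9K, -66% for slices and -15% for time, matching the summary line of the slide.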
