fast endomorphisms in hardware

  1. FAST ENDOMORPHISMS IN HARDWARE
     Kimmo Järvinen (1, 2)
     (1) University of Helsinki, Computer Science, Helsinki, Finland, kimmo.u.jarvinen@helsinki.fi
     (2) Xiphera Ltd., Espoo, Finland, kimmo.jarvinen@xiphera.com
     The 21st Workshop on Elliptic Curve Cryptography, Nijmegen, the Netherlands, Nov. 13–15, 2017
     ECC’17 November 15, 2017 1/36

  2. INTRODUCTION
     ◮ This talk surveys my work on hardware implementations of ECC with fast endomorphisms
     ◮ Particularly: Koblitz curves, FourQ, and GLV/GLS curves
     ◮ In software, fast endomorphisms reduce the number of operations and lead to significant speedups
     ◮ In hardware, simplicity is often the key to efficiency, and the feasibility of fast endomorphisms is less clear

  3. PRELIMINARIES

  4. SCALAR MULTIPLICATION
     ◮ Let $E$ be an elliptic curve defined over a finite field $\mathbb{F}_q$
     ◮ Points on $E$ (together with the point at infinity $\mathcal{O}$) form an additive Abelian group
     ◮ Let $k$ be an integer and $P$ be a point on $E$; then, scalar multiplication is the following operation:
       $[k]P = \underbrace{P + P + \dots + P}_{k \text{ times}}$
     ◮ Scalar multiplication is the central operation of ECC and largely determines the efficiency of the cryptosystem
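The definition above can be made concrete with a small sketch. The following Python code is an illustration only, not the implementation discussed in the talk: it uses a toy curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$ (an assumption chosen for readability, far too small for real use) and computes $[k]P$ with the textbook left-to-right double-and-add method. All names (`P_MOD`, `add`, `scalar_mul`, `naive_mul`) are hypothetical.

```python
# Toy parameters (assumptions for illustration only, not a secure curve).
P_MOD = 97           # field prime q
A, B = 2, 3          # curve y^2 = x^3 + A*x + B over F_97
INF = None           # the point at infinity O

def add(p, q):
    """Affine point addition (also handles doubling and O)."""
    if p is INF:
        return q
    if q is INF:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                       # P + (-P) = O
    if p == q:                           # tangent slope for doubling
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:                                # chord slope for addition
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def scalar_mul(k, p):
    """Left-to-right double-and-add: one doubling per bit of k."""
    q = INF
    for bit in bin(k)[2:]:
        q = add(q, q)
        if bit == "1":
            q = add(q, p)
    return q

def naive_mul(k, p):
    """[k]P literally as P + P + ... + P (k times), for comparison."""
    q = INF
    for _ in range(k):
        q = add(q, p)
    return q

BASE = (3, 6)        # 6^2 = 36 = 3^3 + 2*3 + 3, so this point is on the toy curve
```

The double-and-add loop needs about one doubling per bit of $k$ instead of $k$ additions, which is why the per-bit cost of the loop dominates the cost of the whole operation.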

  5. ECC HIERARCHY
     SCALAR MULTIPLICATION
     POINT ADDITION | POINT DOUBLING
     FIELD ADD/SUB | FIELD MULT | FIELD INV

  8. ANATOMY OF ECC HW
     [Block diagram, built up in three steps:
      (1) ALU with Mult logic, Add logic, and Other logic;
      (2) FAU (field arithmetic unit) with FAU ctrl, Local regs, and the ALU;
      (3) ECC Co-Processor with ECC ctrl, Key storage, Main memory, and the FAU, attached to a Host Processor]

  11. FAST ENDOMORPHISMS
      ◮ GLV/GLS curves have an efficiently computable endomorphism $\phi(P)$ such that $\phi(P) = [\lambda]P$
        Then, scalar multiplication can be computed as:
        $[k]P = [k_0]P + [k_1]\phi(P)$ where $k_0 + k_1\lambda \equiv k \pmod{n}$ and $n$ is the order of $P$
        If $k_0, k_1$ are of the same size, Shamir’s trick for double scalar multiplication saves about half of the point doublings
      ◮ Koblitz curves are curves over $\mathbb{F}_{2^m}$ for which $\phi(x, y) = (x^2, y^2)$ is an endomorphism
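To see where the saving in doublings comes from, here is a minimal Python sketch of Shamir's trick. It is a model under stated assumptions, not a curve implementation: the group is replaced by the integers modulo a hypothetical order `N`, a "point" is just an integer, and the endomorphism is multiplication by a hypothetical eigenvalue `LAM`. The scalar is split trivially as `k0 = k % LAM`, `k1 = k // LAM`; a real GLV implementation would instead use an eigenvalue tied to the curve and a lattice-based decomposition.

```python
# Stand-in group (assumption): integers mod N under addition, "points" are ints.
N = 2**61 - 1        # hypothetical group order
LAM = 2**30          # hypothetical eigenvalue, phi(P) = [LAM]P

def phi(p):
    """Model of the fast endomorphism."""
    return (LAM * p) % N

def double_and_add(k, p):
    """Plain scalar multiplication; returns (result, number of doublings)."""
    q, doublings = 0, 0
    for bit in bin(k)[2:]:
        q = (2 * q) % N
        doublings += 1
        if bit == "1":
            q = (q + p) % N
    return q, doublings

def shamir(k0, p0, k1, p1):
    """[k0]p0 + [k1]p1 with a single shared doubling per bit position."""
    both = (p0 + p1) % N                 # precomputed p0 + p1
    q, doublings = 0, 0
    for i in reversed(range(max(k0.bit_length(), k1.bit_length()))):
        q = (2 * q) % N
        doublings += 1
        b0, b1 = (k0 >> i) & 1, (k1 >> i) & 1
        if b0 and b1:
            q = (q + both) % N
        elif b0:
            q = (q + p0) % N
        elif b1:
            q = (q + p1) % N
    return q, doublings

def glv_mul(k, p):
    """[k]P = [k0]P + [k1]phi(P) with the trivial split k0 + k1*LAM == k."""
    k0, k1 = k % LAM, k // LAM
    return shamir(k0, p, k1, phi(p))
```

With a 60-bit scalar, `double_and_add` performs 60 doublings in this model while `glv_mul` performs 30, because both half-length scalars share the same doubling chain.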

  12. OVERVIEW OF CHALLENGES
      ◮ Fast endomorphisms require recoding of the scalars (e.g., find $k_0, k_1$) ⇒ Logic must be added (either a separate converter or FAU instruction set extension)
      ◮ The size of the overhead depends on the curve and implementation architecture
      ◮ For binary curves, FAU supports arithmetic over $\mathbb{F}_{2^m}$ but conversions require operations over $\mathbb{Z}$
      ◮ For prime curves, FAU supports arithmetic over $\mathbb{Z}$ but FAU is typically highly optimized for mod $p$ arithmetic

  14. SOFTWARE VS. HARDWARE
      Software
      +++ Faster scalar multiplications
      - Slightly larger program memory and data memory requirements
      ⇒ Advantages bigger than disadvantages (almost always)
      Hardware
      ++(+) Faster scalar multiplications (almost surely)
      - - More complex control logic
      - (-) New instructions needed in FAU
      - (- -) More memory/registers needed
      ⇒ ???

  15. PIPELINING
      [Timing diagram, built up in three steps:
       (1) one scalar multiplication executed sequentially in time $t_1$: Scalar recoding, Precomputation, Main for-loop, Main for-loop, ..., Inversion;
       (2) the same steps arranged as separate pipeline stages, labelled $\geq t_1$;
       (3) Precomputation, Main for-loop, Inversion, and Scalar recoding of different operations overlapped in time, labelled $\geq t_2$ s.t. $t_2 < t_1$]

  18. PARALLELISM
      ◮ Stages should be balanced because throughput is determined by the slowest stage
      ◮ For-loop is by far the slowest stage
      ◮ Solutions:
        (a) Make for-loop faster by using more area (or make other parts slower and save area)
        (b) Use parallel for-loop units
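The balancing argument can be illustrated with a few lines of arithmetic. The stage names and timings below are hypothetical numbers chosen for the example, not measurements from the talk, and parallel for-loop units are modelled simply by dividing the stage time by the number of units.

```python
# Hypothetical per-operation stage times in microseconds (assumptions).
STAGES_US = {
    "recoding": 2.0,
    "precomputation": 1.0,
    "for_loop": 10.0,
    "inversion": 3.0,
}

def throughput_ops_per_s(stage_us, loop_units=1):
    """Pipeline throughput is set by the slowest stage.

    With `loop_units` parallel for-loop units, the for-loop stage can accept
    a new operation every for_loop / loop_units microseconds.
    Returns (operations per second, name of the bottleneck stage).
    """
    effective = dict(stage_us)
    effective["for_loop"] = stage_us["for_loop"] / loop_units
    bottleneck = max(effective, key=effective.get)
    return 1e6 / effective[bottleneck], bottleneck

for units in (1, 2, 4):
    rate, stage = throughput_ops_per_s(STAGES_US, units)
    print(f"{units} for-loop unit(s): {rate:,.0f} op/s, bottleneck: {stage}")
```

In this model, adding for-loop units helps only until another stage (here the inversion) becomes the bottleneck, which is the reason for balancing the stages.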

  19. KOBLITZ CURVES (Joint work with J. Adikari, B.B. Brumley, V. Dimitrov, S. Sinha Roy, J. Skyttä, and I. Verbauwhede)

  22. KOBLITZ CURVES
      ◮ Binary curves introduced by N. Koblitz already in 1991 and included in many standards (e.g., NIST)
      ◮ Cheap Frobenius maps $\phi : (x, y) \mapsto (x^2, y^2)$ can be used instead of point doublings
      ◮ ... but first the integer $k$ needs to be given as a $\tau$-adic expansion $k = \sum_{i=0}^{\ell-1} k_i \tau^i$ where $\tau = (\mu + \sqrt{-7})/2 \in \mathbb{C}$
      [Figure: the operation sequence "··· add dbl dbl add dbl add dbl dbl add dbl add ···" becomes, after a conversion step, "add add add add"; the regions are labelled $\mathbb{F}_{2^m}$ and $\mathbb{Z}$]
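As an illustration of what such an expansion looks like, the sketch below computes a $\tau$-adic non-adjacent form with digits in $\{0, \pm 1\}$ by repeated division by $\tau$, following the well-known approach of Solinas. It is a software sketch, not the hardware converter from the talk, and it omits the modular reduction that shortens the expansion. Elements $a + b\tau$ are stored as integer pairs `(a, b)`, using $\tau^2 = \mu\tau - 2$; the function names are hypothetical.

```python
def tau_naf(k, mu):
    """Digits (least significant first) of a tau-adic NAF of the integer k.

    mu is +1 or -1 and selects the curve; tau satisfies tau^2 = mu*tau - 2.
    """
    a, b = k, 0                          # current element a + b*tau
    digits = []
    while a != 0 or b != 0:
        if a % 2:                        # not divisible by tau: pick digit +-1
            u = 2 - ((a - 2 * b) % 4)    # makes (a - u) + b*tau divisible by tau^2
            a -= u
        else:
            u = 0
        digits.append(u)
        # Exact division by tau: (a + b*tau) / tau = (b + mu*a/2) - (a/2)*tau
        a, b = b + mu * (a // 2), -(a // 2)
    return digits

def evaluate(digits, mu):
    """Horner evaluation of sum(d_i * tau^i); returns the pair (a, b)."""
    a, b = 0, 0
    for d in reversed(digits):
        a, b = -2 * b + d, a + mu * b    # multiply by tau, then add the digit
    return a, b

print(tau_naf(9, 1))
```

Evaluating the digits again with `evaluate` gives back `(k, 0)`, which is a convenient self-check. Without reduction, the expansion is roughly twice as long as the binary expansion of $k$.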

  24. SCALAR CONVERSIONS
      ◮ Many cryptosystems (e.g., signature schemes) require $k$ also as an integer
        (a) Select a random integer and find its $\tau$-adic expansion
        (b) Select a random $\tau$-adic expansion and find its integer equivalent
      ◮ Option (a)
        ◮ Base-$\tau$ expansions can be found analogously to finding binary expansions except with divisions by $\tau$ instead of 2
        ◮ Straightforward $\tau$-adic expansion of $k$ is twice as long as $k$
        ◮ Meier and Staffelbach: Because $P = \phi^m(P)$, then $\alpha P = \beta P$ if $\alpha \equiv \beta \pmod{\tau^m - 1}$
        ◮ Solinas: Reduction modulo $(\tau^m - 1)/(\tau - 1)$ gives an expansion of length $m + a$ where $a \in \{0, 1\}$
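Option (b) can be sketched in the same spirit. On the subgroup of order $n$, the Frobenius map acts as multiplication by an integer $s$ that is a root of $x^2 - \mu x + 2$ modulo $n$, so the integer equivalent of an expansion is $\sum_i k_i s^i \bmod n$, which Horner's rule evaluates with one multiplication per digit. The values `N_TOY = 11`, `S_TOY = 7`, and `MU = 1` below are toy assumptions picked so that the root condition holds; they are not parameters of any standardized curve.

```python
# Toy parameters (assumptions): S_TOY is a root of x^2 - MU*x + 2 modulo N_TOY.
MU = 1
N_TOY = 11
S_TOY = 7
assert (S_TOY * S_TOY - MU * S_TOY + 2) % N_TOY == 0

def tau_adic_to_int(digits, s, n):
    """Integer equivalent of sum(d_i * tau^i), digits least significant first.

    Horner's rule: process the most significant digit first and
    multiply the accumulator by s (the integer standing in for tau) each step.
    """
    acc = 0
    for d in reversed(digits):
        acc = (acc * s + d) % n
    return acc

# 1 - tau^3 + tau^5 equals 9 when tau^2 = tau - 2, so the result is 9 mod 11.
print(tau_adic_to_int([1, 0, 0, -1, 0, 1], S_TOY, N_TOY))
```

In a real system the multiplications are modulo a large prime group order, which is why this direction also needs integer arithmetic that a binary-field arithmetic unit does not provide by default.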

  25. SCALAR CONVERSIONS
      ◮ Both require complex operations (e.g., divisions, large multiplications)
      ◮ High-speed implementations: Avoid conversions from becoming the bottleneck ⇒ HW acceleration
      ◮ Lightweight implementations: Conversions done over $\mathbb{Z}$ ⇒ How to combine efficiently with $\mathbb{F}_{2^m}$?
      ◮ Lazy reduction (repeated divisions by $\tau$) and its many variations (pipelined, word-wise, ...) are commonly used and lead to fast conversions, but at the expense of area

  26. HIGH-SPEED IMPLEMENTATION
      ◮ The key to high speed is to accelerate the main for-loop; other parts can be separated to different pipeline stages
      ◮ For-loop consists of point additions and Frobenius maps
      ◮ Point additions are dominated by field multiplications (in $\mathbb{F}_{2^m}$)
      ◮ Point addition with López-Dahab formulas (SAC’98)
      ◮ Frobenius maps $\phi(Q) = (X^2, Y^2, Z^2)$ are cheap and can be computed independently for all coordinates
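Why the Frobenius map is cheap can be seen from how squaring works in a binary field. The sketch below assumes a polynomial-basis representation of $\mathbb{F}_{2^{163}}$ with the NIST reduction polynomial $x^{163} + x^7 + x^6 + x^3 + 1$; the basis used in the hardware of the talk is not stated here, so this choice is an assumption (in a normal basis, squaring is a cyclic shift). Field elements are Python integers whose bits are polynomial coefficients, and the function names are hypothetical.

```python
M = 163
# NIST reduction polynomial for the 163-bit binary field: x^163 + x^7 + x^6 + x^3 + 1
F_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def reduce_poly(a):
    """Reduce a binary polynomial modulo F_POLY (clear bits of degree >= M)."""
    while a.bit_length() > M:
        a ^= F_POLY << (a.bit_length() - 1 - M)
    return a

def mul(a, b):
    """General field multiplication: shift-and-XOR product, then reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return reduce_poly(r)

def square(a):
    """Squaring is linear in characteristic 2: spread the bits, then reduce."""
    r, i = 0, 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)            # coefficient of x^i moves to x^(2i)
        a >>= 1
        i += 1
    return reduce_poly(r)

def frobenius(point):
    """phi(X, Y, Z) = (X^2, Y^2, Z^2); each coordinate is squared independently."""
    return tuple(square(c) for c in point)
```

Squaring needs no partial products, only bit spreading and a fixed reduction, so it maps to a small amount of wiring and XOR gates, whereas `mul` corresponds to a full field multiplier. Applying `frobenius` $m = 163$ times returns the original coordinates, in line with $P = \phi^m(P)$.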

  27. HIGH-SPEED IMPLEMENTATION
      Point addition: $Q \leftarrow Q + P = (X, Y, Z) + (x, y)$
      Frobenius: $Q \leftarrow \phi(Q) = (X^2, Y^2, Z^2)$
      [Figure: schedule of operations labelled X1, X2, Z1, Z2, Y1, Y2, Y3, Y4; later builds of the slide repeat the schedule two and three times side by side]

  31. HIGH-SPEED RESULTS
      ◮ The above technique computes the for-loop in less than 5 µs on K-163 or 12 µs on K-283 in a Stratix II FPGA (old)
      ◮ One core performs over 200,000 op/s with a delay of 11.7 µs
      ◮ Multiple cores fit in an FPGA and one device can reach throughputs of several million op/s
      ◮ Delay is not spectacular compared to modern SW but throughput is

  32. COMPACT IMPLEMENTATION
      ◮ Koblitz curve K-283
      ◮ 16-bit ALU for binary polynomial arithmetic extended with a 16-bit integer adder/subtractor
