
FourQ on FPGA: New Hardware Speed Records for Elliptic Curve Cryptography over Large Prime Characteristic Fields



  1. FourQ on FPGA: New Hardware Speed Records for Elliptic Curve Cryptography over Large Prime Characteristic Fields
     K. Järvinen (Aalto University), A. Miele (Intel Corporation), R. Azarderakhsh (Rochester Institute of Technology), and P. Longa (Microsoft Research)
     Contact: kimmo.jarvinen@aalto.fi, plonga@microsoft.com
     CHES 2016, Santa Barbara, CA, USA, August 17–19, 2016

  2. Introduction
     FourQ:
     ◮ FourQ is a high-performance elliptic curve with very good SW performance (2–3× faster than Curve25519)
     ◮ FourQ has been shown to offer the fastest scalar multiplications on a wide range of software platforms:
        ◮ On several 32-bit ARM microarchitectures (SAC 2016)
        ◮ On several 64-bit Intel/AMD processors, low and high-end (ASIACRYPT 2015)
     ◮ FourQ employs four-dimensional scalar decompositions and requires extensive precomputation, complex control, etc.
     ⇒ Not clear how well it suits HW implementation
     FourQ on FPGA CHES 2016 2/17

  3. Introduction
     Contributions:
     ◮ The first FPGA-based implementations of FourQ
     ◮ FourQ offers 2–2.5× faster performance than Curve25519
     ◮ Speed-area tradeoff is the primary optimization goal
     ◮ Protected against timing and SPA attacks
     ◮ We present three implementations: single-core, multi-core, and Montgomery ladder variant


  5. FourQ (Costello, Longa, ASIACRYPT'15)
     $E/\mathbb{F}_{p^2}: -x^2 + y^2 = 1 + d x^2 y^2$
     ◮ Twisted Edwards curve with $\#E(\mathbb{F}_{p^2}) = 392 \cdot \xi$ where $\xi$ is a 246-bit prime
     ◮ Defined over $\mathbb{F}_{p^2}$ with the Mersenne prime $p = 2^{127} - 1$
     ◮ Complete addition formulas over extended twisted Edwards coordinates (Hisil et al. ASIACRYPT'08)
     ◮ Two efficiently-computable endomorphisms $\psi$ and $\phi$
     ◮ Four-dimensional decomposition for the 256-bit scalar $m$ with $(a_1, a_2, a_3, a_4)$ such that $a_i \in [0, 2^{64})$:
        $[m]P = [a_1]P + [a_2]\psi(P) + [a_3]\phi(P) + [a_4]\psi(\phi(P))$
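The sizes quoted on this slide can be cross-checked with a few lines of integer arithmetic. The sketch below is not from the talk; it only assumes the Hasse bound for a curve over a field with $p^2$ elements and shows that any group order of the form $392 \cdot \xi$ in that range forces $\xi$ to be 246 bits long.

```python
# Sanity check of the sizes quoted on the slide (illustrative, not the authors' code).
p = 2**127 - 1        # Mersenne prime from the slide
q = p * p             # number of elements of F_{p^2}
cofactor = 392        # from #E(F_{p^2}) = 392 * xi

# Hasse bound: the group order lies in [q + 1 - 2*sqrt(q), q + 1 + 2*sqrt(q)],
# and sqrt(q) is exactly p because q = p^2.
lowest_order = q + 1 - 2 * p
highest_order = q + 1 + 2 * p

xi_bits_low = (lowest_order // cofactor).bit_length()
xi_bits_high = (highest_order // cofactor).bit_length()
print(xi_bits_low, xi_bits_high)   # both 246
```

Four sub-scalars below $2^{64}$ together carry 256 bits, which matches the 256-bit scalar $m$ on the slide.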

  6. Scalar Multiplication
     Input: Point $P$, integer $m \in [0, 2^{256})$
     Output: $[m]P$
     Decompose and recode $m$
     Precompute lookup table $T$
     $Q \leftarrow T[v_{64}]$
     for $i = 63$ to $0$ do
         $Q \leftarrow [2]Q$
         $Q \leftarrow Q + m_i T[v_i]$

  7. Scalar Multiplication: Scalar decompose and recode
     ◮ Decompose $m$ to a multi-scalar $(a_1, a_2, a_3, a_4)$
     ◮ Sign-aligned so that $a_1[j] \in \{\pm 1\}$ and $a_i[j] \in \{0, a_1[j]\}$ for $2 \le i \le 4$
     ◮ Recode to signs $m_i \in \{-1, 1\}$ and values $v_i \in [0, 7]$ (point index)
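A minimal sketch of a sign-aligned recoding with these properties is given below. It is an illustration, not the hardware unit from the talk: it assumes that $a_1$ is odd, that all four sub-scalars are below $2^{64}$, and that bit $k$ of the index $v_j$ corresponds to sub-scalar $a_{k+2}$. The function names `recode` and `rebuild` are hypothetical.

```python
def recode(a1, a2, a3, a4, digits=65):
    """Return (signs, index): signs[j] in {-1, 1}, index[j] in [0, 7]."""
    assert a1 & 1, "a1 is assumed to be odd"
    assert all(0 <= a < 2**64 for a in (a1, a2, a3, a4))
    rest = [a2, a3, a4]
    signs, index = [], []
    for j in range(digits - 1):
        # Digit j of a1 is +1 when bit j+1 of a1 is set, otherwise -1.
        s = 1 if (a1 >> (j + 1)) & 1 else -1
        v = 0
        for k in range(3):
            bit = rest[k] & 1              # digit of a_(k+2) is 0 or s
            v |= bit << k
            rest[k] = (rest[k] - s * bit) >> 1
        signs.append(s)
        index.append(v)
    # The top digit of a1 is always +1; the remainders are now 0 or 1.
    signs.append(1)
    index.append(rest[0] | (rest[1] << 1) | (rest[2] << 2))
    return signs, index


def rebuild(signs, index):
    """Recombine the digits into (a1, a2, a3, a4) to check the recoding."""
    a = [0, 0, 0, 0]
    for j, (s, v) in enumerate(zip(signs, index)):
        a[0] += s * (1 << j)
        for k in range(3):
            a[k + 1] += s * ((v >> k) & 1) * (1 << j)
    return tuple(a)


signs, index = recode(0x123456789ABCDEF1, 5, 2**64 - 1, 0)
print(rebuild(signs, index) == (0x123456789ABCDEF1, 5, 2**64 - 1, 0))
```

Every digit position gets a non-zero sign and one table index, which is what makes the later loop regular: there is never a position where the addition can be skipped.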

  8. Scalar Multiplication: Precomputation
     ◮ Precompute 8 points: $T[u] = P + [u_0]\phi(P) + [u_1]\psi(P) + [u_2]\psi(\phi(P))$ for $u = (u_2, u_1, u_0) \in [0, 7]$
     ◮ Store them with 5 coordinates $(X+Y, Y-X, 2Z, 2dT, -2dT)$ ⇒
        $+T[u]$: $(X+Y, Y-X, 2Z, 2dT)$
        $-T[u]$: $(Y-X, X+Y, 2Z, -2dT)$
     ◮ 68M + 27S and several additions
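The reason for storing five coordinates is that the negative of a table point can then be read out by reordering alone. The sketch below illustrates this under simplifying assumptions: coordinates are plain integers modulo $p$ rather than $\mathbb{F}_{p^2}$ elements, the constant `D` is a made-up value, and the names `to_table_entry` and `select` are hypothetical.

```python
P_MOD = 2**127 - 1   # stand-in modulus; FourQ coordinates really live in F_{p^2}
D = 5                # hypothetical curve constant, for illustration only


def to_table_entry(X, Y, Z, T):
    """Five stored coordinates (X+Y, Y-X, 2Z, 2dT, -2dT) of one table point."""
    dt2 = 2 * D * T % P_MOD
    return ((X + Y) % P_MOD, (Y - X) % P_MOD, 2 * Z % P_MOD, dt2, -dt2 % P_MOD)


def select(entry, sign):
    """Four coordinates of +T[u] or -T[u], obtained without field arithmetic."""
    xpy, ymx, z2, dt2, neg_dt2 = entry
    if sign == 1:
        return (xpy, ymx, z2, dt2)
    # Negation on a twisted Edwards curve maps (X, Y, Z, T) to (-X, Y, Z, -T),
    # which swaps X+Y with Y-X and replaces 2dT by -2dT.
    return (ymx, xpy, z2, neg_dt2)


entry = to_table_entry(3, 7, 1, 21)
print(select(entry, 1))
print(select(entry, -1))
```

The Python branch on `sign` is only for readability; a constant-time implementation would select between the two orderings without a data-dependent branch.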

  9. Scalar Multiplication: Main for-loop
     ◮ Fully regular and constant-time
     ◮ Only 64 double-and-adds
     ◮ Doubling: $(X, Y, Z, T_a, T_b) \leftarrow (X, Y, Z)$
     ◮ Addition: $(X, Y, Z, T_a, T_b) \leftarrow (X, Y, Z, T_a, T_b) \times (X+Y, Y-X, 2Z, 2dT)$
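The structure of the loop can be tried out with ordinary integers standing in for curve points, so that doubling is multiplication by 2 and point addition is integer addition. This is only a sketch of the control flow: the four base values play the role of $P$ and its three endomorphism images, the digits are a short hand-picked example rather than a real 65-digit recoding, and the function names are hypothetical.

```python
def build_table(base):
    """T[u] = B0 + u0*B1 + u1*B2 + u2*B3 for u in [0, 7]; integers stand in for points."""
    b0, b1, b2, b3 = base
    return [b0 + (u & 1) * b1 + ((u >> 1) & 1) * b2 + ((u >> 2) & 1) * b3
            for u in range(8)]


def multi_scalar(signs, index, table):
    """Regular loop: exactly one doubling and one signed table addition per digit."""
    top = len(signs) - 1
    q = table[index[top]]                    # Q <- T[v_top], top sign assumed +1
    for i in range(top - 1, -1, -1):
        q = 2 * q                            # Q <- [2]Q
        q = q + signs[i] * table[index[i]]   # Q <- Q + m_i T[v_i]
    return q


table = build_table((1, 1000, 10**6, 10**9))
signs = [1, -1, 1, 1]      # least significant digit first, top sign is +1
index = [3, 0, 5, 6]
direct = sum(s * 2**j * table[v] for j, (s, v) in enumerate(zip(signs, index)))
print(multi_scalar(signs, index, table) == direct)
```

With the 65 digits from the slides, the same loop body runs 64 times regardless of the scalar, which is where the "fully regular" property comes from.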

  10. General Architecture
      Scalar Decomposition and Recoding Unit
      ◮ Decomposes and recodes the scalar
      ◮ Mainly multiplications with constants
      Field Arithmetic Unit ("the core")
      ◮ Precomputation and the main for-loop
      ◮ Highly optimized for $\mathbb{F}_p$ with the Mersenne prime

  11. Scalar Unit
      ◮ Decomposition is computed with a truncated multiplier (mainly multiplications with constants)
      ◮ The main component is a 17 × 264-bit row multiplier built by using 11 DSPs
      ◮ Recoding is bit manipulations and 64-bit additions
      ◮ Outputs $(m_0, v_0)$ first, scalar multiplication begins with $(m_{64}, v_{64})$ ⇒ Store in a LIFO buffer
      [Figure: block diagram of the scalar unit with inputs X and Y, a 17 × 264-bit multiplier, an adder, an FSM, and outputs Z_H and Z_L]
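One way to picture a 17 × 264-bit row multiplier is as a schoolbook multiplication that consumes one operand 17 bits at a time. The sketch below is an assumption-laden model, not the design from the talk: it assumes each of the 11 DSPs handles a 17 × 24-bit partial product (11 × 24 = 264), it computes the full product rather than a truncated one, and the names `row_product` and `multiply_by_rows` are hypothetical.

```python
ROW_BITS = 17       # width of one row, from the slide
LIMB_BITS = 24      # assumed operand width per DSP (11 * 24 = 264)
WIDE_BITS = 264     # width of the wide operand, from the slide


def row_product(digit, wide):
    """One 17 x 264-bit row, built from eleven 17 x 24-bit partial products."""
    assert 0 <= digit < 1 << ROW_BITS and 0 <= wide < 1 << WIDE_BITS
    acc = 0
    for k in range(WIDE_BITS // LIMB_BITS):
        limb = (wide >> (k * LIMB_BITS)) & ((1 << LIMB_BITS) - 1)
        acc += (digit * limb) << (k * LIMB_BITS)
    return acc


def multiply_by_rows(narrow, wide):
    """Accumulate shifted rows, one 17-bit digit of `narrow` per iteration."""
    acc = 0
    shift = 0
    while narrow:
        acc += row_product(narrow & ((1 << ROW_BITS) - 1), wide) << shift
        narrow >>= ROW_BITS
        shift += ROW_BITS
    return acc


x = 2**263 + 12345
y = 2**100 + 67890
print(multiply_by_rows(y, x) == x * y)
```

The LIFO buffer mentioned on the slide follows from the order of operations: the recoding produces digit 0 first, while the main loop consumes digit 64 first, so the digits have to be read back in reverse.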

  12. Field Arithmetic Unit
      [Figure: block diagram with interface logic (commands, responses, 64-bit di and do ports), dual-port RAM, control, and datapath]
      ◮ Dual-port RAM: 256 × 127-bit RAM (128 $\mathbb{F}_{p^2}$ elements), 4 BRAMs
      ◮ Datapath: 127-bit datapath, optimized for $p = 2^{127} - 1$
      ◮ Control: FSM + Program ROM (6 BRAMs)

  16. Field Arithmetic Unit: Datapath
      [Figure: datapath with inputs a and b, consisting of a multiplier path built around a pipelined 64 × 64-bit multiplier and an adder path built around a 127-bit adder/subtractor (+/−)]
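To see why a 64 × 64-bit multiplier is enough for arithmetic modulo $p = 2^{127} - 1$, the following sketch splits each 127-bit operand into a 64-bit low half and a 63-bit high half and reduces with the Mersenne folding rule $2^{127} \equiv 1 \pmod p$. It is a functional model only: the use of four partial products and the helper names are assumptions, and the actual schedule and pipelining of the hardware are not reproduced.

```python
P = 2**127 - 1
MASK64 = 2**64 - 1


def mul64(x, y):
    """Model of the 64 x 64-bit multiplier: both operands must fit in 64 bits."""
    assert 0 <= x <= MASK64 and 0 <= y <= MASK64
    return x * y


def reduce_mersenne(x):
    """Reduce x modulo 2^127 - 1 by folding the high bits onto the low bits."""
    while x >> 127:
        x = (x & P) + (x >> 127)
    return 0 if x == P else x


def fp_mul(a, b):
    """Multiply two elements of F_p using only 64 x 64-bit products."""
    a_lo, a_hi = a & MASK64, a >> 64    # 64-bit and 63-bit halves
    b_lo, b_hi = b & MASK64, b >> 64
    lo = mul64(a_lo, b_lo)
    mid = mul64(a_lo, b_hi) + mul64(a_hi, b_lo)
    hi = mul64(a_hi, b_hi)
    return reduce_mersenne(lo + (mid << 64) + (hi << 128))


print(fp_mul(P - 1, P - 1))   # (-1) * (-1) = 1
```

The reduction needs only shifts, masks, and additions, which is the main reason a Mersenne prime is attractive for a datapath like this.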

  19. Example: Multiplication in $\mathbb{F}_{p^2}$
      3 multiplications, 2 additions and 3 subtractions in $\mathbb{F}_p$:
      $a \times b = (a_0, a_1) \times (b_0, b_1) = (a_0 \cdot b_0 - a_1 \cdot b_1,\ (a_0 + a_1) \cdot (b_0 + b_1) - a_0 \cdot b_0 - a_1 \cdot b_1)$
      [Animated figure: schedule of the operation over steps 1 to 9, with rows for the dual-port RAM, input registers, multiplier pipeline, and adders]
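The operation count on this slide can be checked directly. The sketch below implements the formula with exactly 3 multiplications, 2 additions, and 3 subtractions in $\mathbb{F}_p$ and compares it with the schoolbook product. It assumes $\mathbb{F}_{p^2} = \mathbb{F}_p(i)$ with $i^2 = -1$, which is consistent with the real part $a_0 b_0 - a_1 b_1$ in the formula; the function names are hypothetical.

```python
P = 2**127 - 1


def fp2_mul(a, b):
    """(a0 + a1*i) * (b0 + b1*i) with 3 multiplications in F_p."""
    a0, a1 = a
    b0, b1 = b
    t0 = a0 * b0 % P        # multiplication 1
    t1 = a1 * b1 % P        # multiplication 2
    s0 = (a0 + a1) % P      # addition 1
    s1 = (b0 + b1) % P      # addition 2
    t2 = s0 * s1 % P        # multiplication 3
    c0 = (t0 - t1) % P      # subtraction 1: real part
    c1 = (t2 - t0) % P      # subtraction 2
    c1 = (c1 - t1) % P      # subtraction 3: imaginary part
    return (c0, c1)


def fp2_mul_schoolbook(a, b):
    """Reference version with 4 multiplications, used only for comparison."""
    a0, a1 = a
    b0, b1 = b
    return ((a0 * b0 - a1 * b1) % P, (a0 * b1 + a1 * b0) % P)


x = (3, 4)
y = (5, 6)
print(fp2_mul(x, y) == fp2_mul_schoolbook(x, y))
```

Trading one multiplication for extra additions and subtractions pays off when the multiplier is the scarce resource, as it is in a datapath with a single multiplier pipeline.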
