Efficient Software Implementation of Binary Field Arithmetic Using Vector Instruction Sets


  1. Efficient Software Implementation of Binary Field Arithmetic Using Vector Instruction Sets

     Diego F. Aranha, Department of Computer Science, University of Brasília

     Joint work with Julio López, Darrel Hankerson, and Francisco Rodríguez-Henríquez

     Aranha, López, Hankerson, Rodríguez-Henríquez: Efficient Binary Field Arithmetic Using Vector Instruction Sets

  2. Introduction

     Binary fields ($\mathbb{F}_{2^m}$) are omnipresent in cryptography:
     - Efficient curve-based cryptography (ECC, PBC)
     - Post-quantum cryptography
     - Symmetric ciphers

     Many algorithms and optimizations are already described in the literature:
     - Is it possible to unify the fastest ones in a simple formulation?
     - Can such a formulation reflect the state of the art and provide new ideas?

  3. Objective

     Contributions:
     - Formulation of state-of-the-art binary field arithmetic using vector instructions
     - New strategy for the implementation of multiplication
     - Side-channel resistance
     - Time-memory trade-offs to compensate for native multiplier
     - Experimental results

  4. Arsenal

     Intel Core architecture:
     - 128-bit Streaming SIMD Extensions instruction set
     - Super shuffle engine introduced in 45 nm series

     Relevant vector instructions:

     | Instruction | Description | Cost | Mnemonic |
     |---|---|---|---|
     | MOVDQA | Memory load/store | 2.5 | ← |
     | PSLLQ, PSRLQ | 64-bit bitwise shifts | 1 | ≪∤8, ≫∤8 |
     | PXOR, PAND, POR | Bitwise XOR, AND, OR | 1 | ⊕, ∧, ∨ |
     | PUNPCKLBW/HBW | Byte interleaving | 3 | interlo/hi |
     | PSLLDQ, PSRLDQ | 128-bit bytewise shift | 2 (1) | ≪8, ≫8 |
     | PSHUFB | Byte shuffling | 3 (1) | shuffle, lookup |
     | PALIGNR | Memory alignment | 2 (1) | ⊳ |

  5. New SSSE3 instructions

     PSHUFB instruction (`_mm_shuffle_epi8`).

     Real power: any function on 4-bit inputs can be evaluated in parallel, as a single 16-entry table lookup applied to all 16 bytes of a register.
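
The lookup idea on this slide can be sketched in plain Python. This is only an emulation of the documented PSHUFB lane semantics on 16-byte values, not vector code; the function name `pshufb` and the example table `POPCNT4` (population count of a 4-bit value) are hypothetical and chosen purely for illustration.

```python
def pshufb(table, idx):
    """Emulate PSHUFB on two 16-byte vectors given as bytes objects.

    For each lane i, the result is 0 when the top bit of idx[i] is set,
    and table[idx[i] & 0x0F] otherwise.
    """
    assert len(table) == 16 and len(idx) == 16
    return bytes(0 if (j & 0x80) else table[j & 0x0F] for j in idx)


# Hypothetical example function on 4-bit inputs: number of set bits.
POPCNT4 = bytes(bin(u).count("1") for u in range(16))

# Sixteen 4-bit inputs, one per byte lane.
data = bytes([0x0, 0x1, 0x3, 0x7, 0xF, 0x8, 0xA, 0x5] * 2)

# One "instruction" evaluates the function on all 16 lanes at once.
counts = pshufb(POPCNT4, data)
```

Any other 16-entry byte table can be substituted for `POPCNT4`, which is what makes the instruction usable as a general 4-bit function evaluator.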

  6. New SSSE3 instructions

     Example: bit manipulation

  7. New SSSE3 instructions

     Example: bit manipulation

  8. New SSSE3 instructions

     PALIGNR instruction (`_mm_alignr_epi8`).
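
As a rough illustration of what PALIGNR computes, the sketch below emulates it on 128-bit values stored in Python integers: the two operands are concatenated, shifted right by a whole number of bytes, and the low 128 bits are kept. The function name `palignr` and the argument order (high half first) are assumptions made for this example.

```python
MASK128 = (1 << 128) - 1


def palignr(hi, lo, nbytes):
    """Emulate PALIGNR on 128-bit values held in Python ints.

    The 256-bit concatenation hi:lo is shifted right by nbytes bytes and
    truncated to its low 128 bits.
    """
    assert 0 <= hi <= MASK128 and 0 <= lo <= MASK128 and nbytes >= 0
    return (((hi << 128) | lo) >> (8 * nbytes)) & MASK128


# Two registers holding the byte values 0..15 and 16..31 (little-endian).
lo = int.from_bytes(bytes(range(16)), "little")
hi = int.from_bytes(bytes(range(16, 32)), "little")

# Extract the 16-byte window that starts 4 bytes into the pair.
window = palignr(hi, lo, 4).to_bytes(16, "little")
```

Because the shift crosses the register boundary, a multi-register value can be moved by whole bytes with one such operation per register.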

  9. Binary field $\mathbb{F}_{2^m}$

     Irreducible polynomial: $f(z)$ (trinomial or pentanomial).

     Polynomial basis: $a(z) \in \mathbb{F}_{2^m}$, $a(z) = \sum_{i=0}^{m-1} a_i z^i$.

     Software representation: vector of $n = \lceil m/64 \rceil$ words.

     Graphical representation: [figure]

  10. Proposed representation

      To employ 4-bit granular arithmetic, convert to split form:

      $a_L = \sum_{\substack{0 \le i < m \\ 0 \le i \bmod 8 \le 3}} a_i z^i, \qquad a_H = \sum_{\substack{0 \le i < m \\ 4 \le i \bmod 8 \le 7}} a_i z^{i-4}$

      [Figure: each register $A_i$ is split into $A_L$ and $A_H$.]

  11. Proposed representation

      Easy to convert to split form:

      $A_L = A_i \wedge \texttt{0x0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F}$

      $A_H = (A_i \wedge \texttt{0xF0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0}) \gg 4$

      Easy to convert back: $a(z) = a_H(z) z^4 + a_L(z)$.
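
The conversion above can be tried out on a single 128-bit value. The following is a minimal sketch using Python integers in place of vector registers; the helper names `to_split` and `from_split` are hypothetical.

```python
# Masks selecting the low and high nibble of every byte of a 128-bit value.
M_LO = int.from_bytes(b"\x0f" * 16, "little")
M_HI = int.from_bytes(b"\xf0" * 16, "little")


def to_split(A):
    """Return (A_L, A_H): the low and high nibbles of each byte of A."""
    a_lo = A & M_LO
    a_hi = (A & M_HI) >> 4
    return a_lo, a_hi


def from_split(a_lo, a_hi):
    """Recombine: a(z) = a_H(z) * z^4 + a_L(z), with addition being XOR."""
    return (a_hi << 4) ^ a_lo


A = 0x0123456789ABCDEF_FEDCBA9876543210
a_lo, a_hi = to_split(A)
restored = from_split(a_lo, a_hi)
```

After the split, every byte of `a_lo` and `a_hi` holds a value below 16, which is the form needed to use the bytes as lookup indices.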

  12. Squaring in $\mathbb{F}_{2^m}$

      $a(z) = \sum_{i=0}^{m-1} a_i z^i = a_{m-1} z^{m-1} + \cdots + a_2 z^2 + a_1 z + a_0$

      $a(z)^2 = \sum_{i=0}^{m-1} a_i z^{2i} = a_{m-1} z^{2m-2} + \cdots + a_2 z^4 + a_1 z^2 + a_0$

      Example:

      $a(z) = (a_{m-1}, a_{m-2}, \ldots, a_2, a_1, a_0)$

      $a(z)^2 = (a_{m-1}, 0, a_{m-2}, 0, \ldots, 0, a_2, 0, a_1, 0, a_0)$

  13. Squaring in $\mathbb{F}_{2^m}$

      Since squaring is a linear operation: $a(z)^2 = a_H(z)^2 \cdot z^8 + a_L(z)^2$.

  14. Squaring in $\mathbb{F}_{2^m}$

      Since squaring is a linear operation: $a(z)^2 = a_H(z)^2 \cdot z^8 + a_L(z)^2$.

      We can compute $a_L(z)^2$ and $a_H(z)^2$ with a lookup table.

      For $u = (u_3, u_2, u_1, u_0)$, use $\mathrm{table}(u) = (0, u_3, 0, u_2, 0, u_1, 0, u_0)$.

  15. Proposed squaring in $\mathbb{F}_{2^m}$

      [Figure: $A_i$ is split into $A_L$ and $A_H$; each half goes through a lookup in the table with entries 00000000, 00000001, 00000100, 00000101, 00010000, 00010001, ..., 01010101; the two results are combined with interlo and interhi to produce $T_{2i}$ and $T_{2i+1}$.]

      $a(z)^2 = a_L(z)^2 + a_H(z)^2 \cdot z^8$.
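
The table-based squaring can be checked with a short byte-level sketch. It assumes the polynomial is stored as little-endian bytes, emulates the two lookups and the interleaving with ordinary Python operations, and omits modular reduction; the names `SQR_TABLE`, `square_poly` and `square_reference` are hypothetical.

```python
# table(u) = (0, u3, 0, u2, 0, u1, 0, u0) for every 4-bit u.
SQR_TABLE = bytes(
    (u & 1) | ((u & 2) << 1) | ((u & 4) << 2) | ((u & 8) << 3)
    for u in range(16)
)


def square_poly(a_bytes):
    """Square a binary polynomial given as little-endian bytes (no reduction)."""
    lo = bytes(SQR_TABLE[x & 0x0F] for x in a_bytes)  # lookup on A_L
    hi = bytes(SQR_TABLE[x >> 4] for x in a_bytes)    # lookup on A_H
    out = bytearray(2 * len(a_bytes))
    out[0::2] = lo  # squared low nibbles land in the even output bytes
    out[1::2] = hi  # squared high nibbles land in the odd output bytes
    return bytes(out)


def square_reference(a):
    """Bit-by-bit reference: coefficient i of a moves to position 2i."""
    r, i = 0, 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)
        a >>= 1
        i += 1
    return r


a = 0x0123456789ABCDEF_FEDCBA9876543210
fast = int.from_bytes(square_poly(a.to_bytes(16, "little")), "little")
```

Placing the high-nibble result one byte above the low-nibble result is exactly the multiplication by $z^8$ in the formula, so no bit shifts are needed.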

  16. Square root extraction in $\mathbb{F}_{2^m}$

      Algorithm by Fong et al.: $\sqrt{a(z)} = a_{\mathrm{even}}(z) + \sqrt{z} \cdot a_{\mathrm{odd}}(z)$

  17. Square root extraction in $\mathbb{F}_{2^m}$

      Algorithm by Fong et al.: $\sqrt{a(z)} = a_{\mathrm{even}}(z) + \sqrt{z} \cdot a_{\mathrm{odd}}(z)$

      Since square root is also a linear operation:

      $\sqrt{a(z)} = \sqrt{a_H(z) z^4 + a_L(z)}$

      $\phantom{\sqrt{a(z)}} = \sqrt{a_H(z)}\, z^2 + \sqrt{a_L(z)}$

      $\phantom{\sqrt{a(z)}} = \sqrt{z} \cdot \left(a_{L_{\mathrm{odd}}}(z) + a_{H_{\mathrm{odd}}}(z) z^2\right) + a_{L_{\mathrm{even}}}(z) + a_{H_{\mathrm{even}}}(z) z^2$

      Note: Multiplication by $\sqrt{z}$ ideally requires shifted additions only. If that is not possible, precompute the product by $\sqrt{z}$.

  18. Proposed square root in $\mathbb{F}_{2^m}$

      [Figure: $A_i$ is shuffled and split into $A_L$ and $A_H$ using the masks 00110011... and 11001100...; lookups in two tables, one of them scaled by $z^2$ (visible entries 00000000, 00000001, 00000100), produce $A_{\mathrm{even}}$ and $A_{\mathrm{odd}}$.]

      $\sqrt{a(z)} = \sqrt{z} \cdot \left(a_{L_{\mathrm{odd}}}(z) + a_{H_{\mathrm{odd}}}(z) z^2\right) + a_{L_{\mathrm{even}}}(z) + a_{H_{\mathrm{even}}}(z) z^2$

  19. Multiplication in $\mathbb{F}_{2^m}$

      Three strategies:
      1. López-Dahab comb method
      2. Shuffle-based multiplication
      3. Native multiplication

  20. López-Dahab multiplication in $\mathbb{F}_{2^m}$

      We can compute $u \cdot b(z)$ using shifts and additions.

      If $a(z)$ is divided into 4-bit polynomials, compute $a(z) \cdot b(z)$ by: [figure]

  21. López-Dahab multiplication in $\mathbb{F}_{2^m}$

      If the multiplier is represented in split form:

      $a(z) \cdot b(z) = b(z) \cdot \left(a_H(z) z^4 + a_L(z)\right) = b(z) z^4\, a_H(z) + b(z)\, a_L(z)$

      This is a well-known technique for removing expensive 4-bit shifts!

      Note: The core operation is accumulating $u \times$ dense $b(z)$.

  22. López-Dahab multiplication in $\mathbb{F}_{2^m}$

      Algorithm 1: LD multiplication implemented with $n$ 128-bit registers.

      Input: $a(z) = a[0..n-1]$, $b(z) = b[0..n-1]$.

      Output: $c(z) = c[0..n-1]$.

      Note: $m_i$ denotes the vector of $n/2$ 128-bit registers $(r_{i-1+n/2}, \ldots, r_i)$.

          Compute T0(u) = u(z) · b(z), T1(u) = u(z) · (b(z) z^4) for all u(z) of degree < 4.
          (r_{n−1}, ..., r_0) ← 0
          for k ← 56 downto 0 by 8 do
              for j ← 1 to n − 1 by 2 do
                  Let u = (u3, u2, u1, u0), where u_t is bit (k + t) of a[j].
                  Let v = (v3, v2, v1, v0), where v_t is bit (k + t + 4) of a[j].
                  m_{(j−1)/2} ← m_{(j−1)/2} ⊕ T0(u), m_{(j−1)/2} ← m_{(j−1)/2} ⊕ T1(v)
              end for
              (r_{n−1}, ..., r_0) ← (r_{n−1}, ..., r_0) ⊳ 8
          end for
          for k ← 56 downto 0 by 8 do
              for j ← 0 to n − 2 by 2 do
                  Let u = (u3, u2, u1, u0), where u_t is bit (k + t) of a[j].
                  Let v = (v3, v2, v1, v0), where v_t is bit (k + t + 4) of a[j].
                  m_{j/2} ← m_{j/2} ⊕ T0(u), m_{j/2} ← m_{j/2} ⊕ T1(v)
              end for
              if k > 0 then (r_{n−1}, ..., r_0) ← (r_{n−1}, ..., r_0) ⊳ 8
          end for
          return c = (r_{n−1}, ..., r_0) mod f(z)
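
The two-table idea behind Algorithm 1 can be illustrated with a simplified word-level sketch. It is not the register-level algorithm above: it uses a single pass over all 64-bit words of $a$, keeps the accumulator in one Python integer, and leaves out the final reduction modulo $f(z)$. The names `clmul` and `ld_mul_split` are hypothetical.

```python
W = 64  # word size in bits


def clmul(a, b):
    """Schoolbook carry-less (polynomial) multiplication over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def ld_mul_split(a, b, n):
    """Comb multiplication of a (n words of 64 bits) by b, without reduction.

    T0[u] = u * b and T1[u] = u * b * z^4 are precomputed, so the main loop
    only needs byte-granular shifts of the accumulator.
    """
    T0 = [clmul(u, b) for u in range(16)]
    T1 = [t << 4 for t in T0]
    c = 0
    for k in range(56, -1, -8):
        for j in range(n):
            word = (a >> (W * j)) & ((1 << W) - 1)
            u = (word >> k) & 0xF        # low nibble of the byte at bit k
            v = (word >> (k + 4)) & 0xF  # high nibble of the same byte
            c ^= (T0[u] ^ T1[v]) << (W * j)
        if k > 0:
            c <<= 8  # byte-wise shift of the whole accumulator
    return c


a = 0x1F3A5C7E9B2D4F6081A3C5E7092B4D6F8A1C3E5F7092B4D6E8F0A1B2C3
b = 0x0D2C4B6A798F1E3D5C7B9A0F2E4D6C8B1A3F5E7D9C0B2A4F6E8D1C3B5A
product = ld_mul_split(a, b, 4)
```

The second table absorbs the 4-bit offset of the high nibbles, which is why the accumulator is only ever shifted by 8 bits.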

  23. Shuffle-based multiplication in $\mathbb{F}_{2^m}$

      If both multiplicand and multiplier are represented in split form:

      $a(z) \cdot b(z) = \left(b_H(z) z^4 + b_L(z)\right) \cdot \left(a_H(z) z^4 + a_L(z)\right)$

      Using the Karatsuba formula, we can reduce it to 3 multiplications:

      $a(z) \cdot b(z) = a_H b_H z^8 + \left[(a_H + a_L)(b_H + b_L) + a_H b_H + a_L b_L\right] z^4 + a_L b_L$

      Note: The core operation is accumulating $u \times$ sparse $b_{L,H}(z)$.

      [Figure: bytes $B_{n-1}, \ldots, B_1, B_0$.]
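
The Karatsuba identity on split operands can be verified with a few lines of integer arithmetic. This sketch only checks the algebra: the three inner products use a generic carry-less multiplication instead of shuffle-based lookups, and the names `clmul`, `split` and `karatsuba_split` are hypothetical.

```python
def clmul(a, b):
    """Schoolbook carry-less (polynomial) multiplication over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def split(a, nbytes):
    """Return (a_L, a_H): low and high nibbles of each of the nbytes bytes."""
    mask = int.from_bytes(b"\x0f" * nbytes, "little")
    return a & mask, (a >> 4) & mask


def karatsuba_split(a, b, nbytes):
    """Multiply a and b from their split forms using three products."""
    a_lo, a_hi = split(a, nbytes)
    b_lo, b_hi = split(b, nbytes)
    lo = clmul(a_lo, b_lo)
    hi = clmul(a_hi, b_hi)
    # Addition in a binary field is XOR, so the middle term needs one product.
    mid = clmul(a_lo ^ a_hi, b_lo ^ b_hi) ^ lo ^ hi
    return (hi << 8) ^ (mid << 4) ^ lo


a = 0x0123456789ABCDEF_FEDCBA9876543210
b = 0x0F1E2D3C4B5A6978_8796A5B4C3D2E1F0
product = karatsuba_split(a, b, 16)
```

All three inner products take operands whose bytes are below 16, which matches the "sparse" operand shape named in the note above.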
