
High-speed parallel software implementation of the $\eta_T$ pairing

Diego F. Aranha, Institute of Computing, UNICAMP. Joint work with Julio López and Darrel Hankerson.


  1. High-speed parallel software implementation of the $\eta_T$ pairing
     Diego F. Aranha, Institute of Computing, UNICAMP.
     Joint work with Julio López and Darrel Hankerson.

  2. Introduction
     Pairing computation is the most expensive operation in pairing-based cryptography.
     Parallelism is increasingly being introduced in modern architectures.

  3. Objective
     Explore two types of parallelism in software to reduce pairing computation latency:
     - Vector instructions;
     - Multiprocessing.
     Applications: real-time services (DNS?), embedded devices.
     Contributions:
     - Novel ways of implementing binary field arithmetic;
     - Parallelization of Miller's Algorithm;
     - Static load balancing technique;
     - Experimental results.

  4. Arsenal
     Intel Core architecture:
     - 128-bit Streaming SIMD Extensions instruction set;
     - Multiprocessing with overheads of around 10 microseconds;
     - Super shuffle engine introduced in the 45 nm series.
     Relevant vector instructions:

     | Instruction      | Description            | Cost    | Mnemonic                              |
     |------------------|------------------------|---------|---------------------------------------|
     | MOVDQA           | Memory load/store      | 2.5     | $\leftarrow$                          |
     | PSLLQ, PSRLQ     | 64-bit bitwise shifts  | 1       | $\ll_{\nmid 8}$, $\gg_{\nmid 8}$      |
     | PXOR, PAND, POR  | Bitwise XOR, AND, OR   | 1       | $\oplus$, $\wedge$, $\vee$            |
     | PUNPCKLBW/HBW    | Byte interleaving      | 3       | interlo/hi                            |
     | PSLLDQ, PSRLDQ   | 128-bit bytewise shift | 2 (1)   | $\ll_8$, $\gg_8$                      |
     | PSHUFB           | Byte shuffling         | 3 (1)   | shuffle, lookup                       |
     | PALIGNR          | Memory alignment       | 2 (1)   | $\rhd$                                |

  5. New SSSE3 instructions
     PSHUFB instruction (_mm_shuffle_epi8).
     Real power: we can implement any function in parallel: [figure]
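The slide's figure is not recoverable, so the following is only a minimal sketch of the idea, assuming that "any function" means any function from a 4-bit input to an 8-bit output stored as a 16-entry table. The Python function `pshufb` emulates the byte-shuffle semantics of PSHUFB; `REV4` and `reverse_nibbles` are hypothetical names chosen for illustration and do not come from the talk.

```python
def pshufb(table: bytes, idx: bytes) -> bytes:
    """Emulate PSHUFB (_mm_shuffle_epi8) on two 16-byte values."""
    assert len(table) == 16 and len(idx) == 16
    # A set top bit in an index byte zeroes that lane; otherwise the low
    # 4 bits of the index select one of the 16 table bytes.
    return bytes(0 if i & 0x80 else table[i & 0x0F] for i in idx)

# Hypothetical example function: reverse the 4 bits of a nibble.
REV4 = bytes(int(f"{u:04b}"[::-1], 2) for u in range(16))

def reverse_nibbles(x: bytes) -> bytes:
    """Apply the table to the low nibble of all 16 bytes at once."""
    return pshufb(REV4, bytes(b & 0x0F for b in x))
```

One instruction thus evaluates the table on 16 independent inputs, which is what makes table-driven bit manipulation attractive on this architecture.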

  6. New SSSE3 instructions
     Example: bit manipulation [figure]


  8. New SSSE3 instructions
     PALIGNR instruction (_mm_alignr_epi8): [figure]
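As a rough illustration of what PALIGNR does, the sketch below emulates it on Python integers: the two 128-bit operands are concatenated, shifted right by a byte count, and the low 128 bits are kept. The helper `shift_left_one_byte` is a hypothetical use, assumed here to resemble a multi-register shift by 8 bits; it is not taken from the slides.

```python
MASK128 = (1 << 128) - 1

def palignr(hi: int, lo: int, n: int) -> int:
    """Emulate PALIGNR (_mm_alignr_epi8(hi, lo, n)) on 128-bit integers."""
    # Concatenate hi:lo, shift right by n bytes, keep the low 128 bits.
    return (((hi << 128) | lo) >> (8 * n)) & MASK128

def shift_left_one_byte(regs):
    """Shift a little-endian list of 128-bit registers left by 8 bits."""
    out = []
    prev = 0  # register below the current one; zero below register 0
    for r in regs:
        # The top byte of the lower register moves into the bottom of r.
        out.append(palignr(r, prev, 15))
        prev = r
    return out
```

The byte that falls off the top of the last register is discarded, as it would be with a fixed number of registers.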

  9. Binary field $\mathbb{F}_{2^m}$
     Irreducible polynomial: $f(z)$ (trinomial or pentanomial).
     Polynomial basis: $a(z) \in \mathbb{F}_{2^m} = \sum_{i=0}^{m-1} a_i z^i$.
     Software representation: vector of $n = \lceil m/64 \rceil$ words.
     Notation: A is a 64-bit variable, A is a 128-bit variable (the slide distinguishes the two by typeface).
     Graphical representation: [figure]

  10. Squaring in $\mathbb{F}_{2^m}$
      $$a(z) = \sum_{i=0}^{m-1} a_i z^i = a_{m-1} z^{m-1} + \cdots + a_2 z^2 + a_1 z + a_0$$
      $$a(z)^2 = \sum_{i=0}^{m-1} a_i z^{2i} = a_{m-1} z^{2m-2} + \cdots + a_2 z^4 + a_1 z^2 + a_0$$
      Example:
      $a(z) = (a_{m-1}, a_{m-2}, \ldots, a_2, a_1, a_0)$
      $a(z)^2 = (a_{m-1}, 0, a_{m-2}, 0, \ldots, 0, a_2, 0, a_1, 0, a_0)$

  11. Squaring in $\mathbb{F}_{2^m}$
      We can write: $a(z) = a_L(z) + a_H(z) \cdot z^4$.
      Since squaring is a linear operation: $a(z)^2 = a_L(z)^2 + a_H(z)^2 \cdot z^8$.
      Polynomials $a_L(z)$ and $a_H(z)$ are easy to compute:
      $A_L = A_i \wedge$ 0x0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F
      $A_H = A_i \wedge$ 0xF0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0

  12. Squaring in $\mathbb{F}_{2^m}$
      We can compute $a_L(z)^2$ and $a_H(z)^2$ with a lookup table.
      For $u = (u_3, u_2, u_1, u_0)$ we use $\text{table}(u) = (0, u_3, 0, u_2, 0, u_1, 0, u_0)$: [figure]

  13. Proposed squaring in $\mathbb{F}_{2^m}$
      $$t(z) = a_L(z)^2 + a_H(z)^2 \cdot z^8$$
      [figure]
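The squaring slides can be sketched in scalar Python as follows. This is an illustration under stated assumptions, not the vectorized routine from the talk: bytes are processed one at a time rather than 16 per instruction, the result is left unreduced, and the high nibble is shifted down by 4 so that it can index the 16-entry table (the slides only show the masking step). The names `TABLE` and `square_unreduced` are hypothetical.

```python
# table(u) = (0, u3, 0, u2, 0, u1, 0, u0): spread the 4 bits of u over a byte.
TABLE = bytes(sum(((u >> t) & 1) << (2 * t) for t in range(4)) for u in range(16))

def square_unreduced(a: bytes) -> bytes:
    """Square a polynomial over GF(2) stored as little-endian bytes.

    The result has twice as many bytes and is not reduced modulo f(z).
    """
    out = bytearray(2 * len(a))
    for i, byte in enumerate(a):
        lo = byte & 0x0F           # a_L: low nibble of this byte
        hi = (byte & 0xF0) >> 4    # a_H: high nibble, moved down to index the table
        out[2 * i] = TABLE[lo]     # a_L(z)^2 lands in the low output byte
        out[2 * i + 1] = TABLE[hi] # a_H(z)^2 * z^8 lands in the next byte
    return bytes(out)
```

For instance, the byte 0x03 (the polynomial $z + 1$) squares to 0x05 ($z^2 + 1$). With PSHUFB, both table lookups would be done for 16 bytes at once, and the interleaving of low and high results corresponds to the byte-interleaving instructions listed on the Arsenal slide.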

  14. Square root extraction in $\mathbb{F}_{2^m}$
      $$\sqrt{a} = a^{2^{m-1}} = \left( \sum_{i=0}^{m-1} a_i z^i \right)^{2^{m-1}} = \sum_{i=0}^{m-1} a_i \left( z^{2^{m-1}} \right)^i$$
      $$= \sum_{i \text{ even}} a_i z^{i/2} + \sqrt{z} \sum_{i \text{ odd}} a_i z^{(i-1)/2}$$
      $$= a_{\text{even}} + \sqrt{z} \cdot a_{\text{odd}}$$
      For $f(z) = z^{1223} + z^{255} + 1$ in $\mathbb{F}_{2^{1223}}$, we have $\sqrt{z} = z^{612} + z^{128}$.
      Important: multiplication by $\sqrt{z}$ requires shifts and additions only.
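The derivation above can be checked numerically with a small sketch. It assumes field elements are stored as Python integers whose bit $i$ is the coefficient $a_i$; the helper names (`sqrt_f2m`, `clmul`, `poly_mod`, `square_mod`) are hypothetical, and the bit-by-bit even/odd split stands in for whatever vectorized extraction the talk proposes.

```python
M = 1223
F = (1 << 1223) | (1 << 255) | 1  # f(z) = z^1223 + z^255 + 1

def sqrt_f2m(a: int) -> int:
    """Square root in F_2^1223 as a_even + sqrt(z) * a_odd."""
    even = odd = 0
    for i in range(M):
        if (a >> i) & 1:
            if i % 2 == 0:
                even |= 1 << (i // 2)
            else:
                odd |= 1 << ((i - 1) // 2)
    # sqrt(z) = z^612 + z^128, so the product is two shifts and two XORs.
    # a_odd has degree at most 610, so no modular reduction is needed.
    return even ^ (odd << 612) ^ (odd << 128)

def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) multiplication over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def poly_mod(c: int, f: int) -> int:
    """Reduce the polynomial c modulo f by schoolbook long division."""
    df = f.bit_length() - 1
    while c.bit_length() - 1 >= df:
        c ^= f << (c.bit_length() - 1 - df)
    return c

def square_mod(a: int) -> int:
    return poly_mod(clmul(a, a), F)
```

Squaring the output of `sqrt_f2m` with `square_mod` should return the original element, which is a convenient sanity check for the value of $\sqrt{z}$.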

  15. Proposed square root in $\mathbb{F}_{2^m}$
      [figure]

  16. Multiplication in $\mathbb{F}_{2^m}$
      1. Multi-precision multiplication:
         - An instance of Karatsuba;
         - López-Dahab comb method;
      2. Modular reduction.

  17. Karatsuba multiplication in $\mathbb{F}_{2^m}$
      With $a(z) = A_1 z^{\lceil m/2 \rceil} + A_0$ and $b(z) = B_1 z^{\lceil m/2 \rceil} + B_0$:
      $$c(z) = a(z) \cdot b(z) = A_1 B_1 z^{2 \lceil m/2 \rceil} + \left[ (A_1 + A_0)(B_1 + B_0) + A_1 B_1 + A_0 B_0 \right] z^{\lceil m/2 \rceil} + A_0 B_0$$
      (The exponent $2 \lceil m/2 \rceil$ equals $m$ when $m$ is even.)
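A minimal sketch of one Karatsuba level over GF(2), assuming polynomials are stored as Python integers and split at $\lceil m/2 \rceil$ bits. The three half-size products use a plain shift-and-XOR multiplier here; the function names `karatsuba` and `clmul` are hypothetical and the talk's actual half-size multiplier is the López-Dahab method described later.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) multiplication over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def karatsuba(a: int, b: int, m: int) -> int:
    """One Karatsuba step for polynomials of degree below m over GF(2)."""
    h = (m + 1) // 2                 # split point, ceil(m / 2)
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = clmul(a0, b0)               # A0 * B0
    hi = clmul(a1, b1)               # A1 * B1
    # Addition and subtraction are both XOR in characteristic 2.
    mid = clmul(a0 ^ a1, b0 ^ b1) ^ lo ^ hi
    return (hi << (2 * h)) ^ (mid << h) ^ lo
```

Three half-size multiplications replace four, at the cost of a few extra XORs.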


  19. López-Dahab multiplication in $\mathbb{F}_{2^m}$
      We can compute $u \cdot b(z)$ using shifts and additions.
      If $a(z)$ is divided into 4-bit polynomials, compute $a(z) \cdot b(z)$ by: [figure]

  20. Proposed multiplication in $\mathbb{F}_{2^m}$
      Algorithm 1: LD multiplication implemented with $n$ 128-bit registers.
      Input: $a(z) = a[0..n-1]$, $b(z) = b[0..n-1]$.
      Output: $c(z) = c[0..n-1]$.
      Note: $m_i$ denotes the vector of $n/2$ 128-bit registers $(r_{(i-1+n/2)}, \ldots, r_i)$.
      Compute $T_0(u) = u(z) \cdot b(z)$, $T_1(u) = u(z) \cdot (b(z) \ll 4)$ for all $u(z)$ of degree lower than 4.
      $(r_{n-1}, \ldots, r_0) \leftarrow 0$
      for $k \leftarrow 56$ downto 0 by 8 do
          for $j \leftarrow 1$ to $n-1$ by 2 do
              Let $u = (u_3, u_2, u_1, u_0)$, where $u_t$ is bit $(k+t)$ of $a[j]$.
              Let $v = (v_3, v_2, v_1, v_0)$, where $v_t$ is bit $(k+t+4)$ of $a[j]$.
              $m_{(j-1)/2} \leftarrow m_{(j-1)/2} \oplus T_0(u)$, $m_{(j-1)/2} \leftarrow m_{(j-1)/2} \oplus T_1(v)$
          end for
          $(r_{n-1}, \ldots, r_0) \leftarrow (r_{n-1}, \ldots, r_0) \rhd 8$
      end for
      for $k \leftarrow 56$ downto 0 by 8 do
          for $j \leftarrow 0$ to $n-2$ by 2 do
              Let $u = (u_3, u_2, u_1, u_0)$, where $u_t$ is bit $(k+t)$ of $a[j]$.
              Let $v = (v_3, v_2, v_1, v_0)$, where $v_t$ is bit $(k+t+4)$ of $a[j]$.
              $m_{j/2} \leftarrow m_{j/2} \oplus T_0(u)$, $m_{j/2} \leftarrow m_{j/2} \oplus T_1(v)$
          end for
          if $k > 0$ then $(r_{n-1}, \ldots, r_0) \leftarrow (r_{n-1}, \ldots, r_0) \rhd 8$
      end for
      return $c = (r_{n-1}, \ldots, r_0) \bmod f(z)$
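To make the control flow of Algorithm 1 concrete, here is a sketch that models the whole register file $(r_{n-1}, \ldots, r_0)$ as one Python integer. It rests on several assumptions: $\rhd 8$ is read as a left shift of the full accumulator by 8 bits, $m_i$ is modeled as an XOR at bit offset $128 \cdot i$, $n$ is even, and the final reduction modulo $f(z)$ is left out. The names `ld_multiply`, `clmul` and `to_words` are hypothetical.

```python
W = 64
MASK = (1 << W) - 1

def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) multiplication over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def to_words(x: int, n: int):
    """Split an integer into n little-endian 64-bit words."""
    return [(x >> (W * i)) & MASK for i in range(n)]

def ld_multiply(a_words, b_words) -> int:
    """Unreduced product following the two-pass structure of Algorithm 1."""
    n = len(a_words)
    assert n % 2 == 0 and len(b_words) == n
    b = sum(w << (W * i) for i, w in enumerate(b_words))
    t0 = [clmul(u, b) for u in range(16)]   # T0(u) = u(z) * b(z)
    t1 = [t << 4 for t in t0]               # T1(u) = u(z) * (b(z) << 4)
    r = 0
    # First pass: odd-indexed words of a; the 8 shifts add up to 64 bits.
    for k in range(56, -1, -8):
        for j in range(1, n, 2):
            u = (a_words[j] >> k) & 0xF
            v = (a_words[j] >> (k + 4)) & 0xF
            r ^= (t0[u] ^ t1[v]) << (128 * ((j - 1) // 2))
        r <<= 8
    # Second pass: even-indexed words of a; no shift after the last byte.
    for k in range(56, -1, -8):
        for j in range(0, n, 2):
            u = (a_words[j] >> k) & 0xF
            v = (a_words[j] >> (k + 4)) & 0xF
            r ^= (t0[u] ^ t1[v]) << (128 * (j // 2))
        if k > 0:
            r <<= 8
    return r
```

Handling the odd words first means every table entry is added at a 128-bit aligned position: the 64-bit offset that odd words need comes for free from the shifts of the first pass.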

  21. Modular reduction (64-bit mode)
      Algorithm 2: Fast modular reduction by $f(z) = z^{1223} + z^{255} + 1$.
      Input: $c(z) = c[0..2n-1]$.
      Output: $c(z) \bmod f(z) = c[0..n-1]$.
      for $i \leftarrow 2n-1$ downto $n$ do
          $t \leftarrow c[i]$
          $c[i-15] \leftarrow c[i-15] \oplus (t \gg 8)$
          $c[i-16] \leftarrow c[i-16] \oplus (t \ll 56)$
          $c[i-19] \leftarrow c[i-19] \oplus (t \gg 7)$
          $c[i-20] \leftarrow c[i-20] \oplus (t \ll 57)$
      end for
      $t \leftarrow c[19] \gg 7$, $c[0] \leftarrow c[0] \oplus t$, $t \leftarrow t \ll 7$
      $c[3] \leftarrow c[3] \oplus (t \ll 56)$
      $c[4] \leftarrow c[4] \oplus (t \gg 8)$
      $c[19] \leftarrow (c[19] \oplus t) \wedge$ 0x7F
      return $c$
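Algorithm 2 translates almost line by line into Python. The sketch below assumes $n = \lceil 1223/64 \rceil = 20$ and masks every left shift to 64 bits, since Python integers do not wrap; `poly_mod` is a slow schoolbook reduction included only as a reference to compare against. The function names are hypothetical.

```python
W = 64
N = 20                       # ceil(1223 / 64)
MASK = (1 << W) - 1
F = (1 << 1223) | (1 << 255) | 1

def reduce_1223(c):
    """Reduce 2N little-endian 64-bit words modulo z^1223 + z^255 + 1."""
    c = list(c)
    assert len(c) == 2 * N
    for i in range(2 * N - 1, N - 1, -1):
        t = c[i]
        c[i - 15] ^= t >> 8
        c[i - 16] ^= (t << 56) & MASK
        c[i - 19] ^= t >> 7
        c[i - 20] ^= (t << 57) & MASK
    # Fold the bits of word 19 that lie at or above position 1223.
    t = c[19] >> 7
    c[0] ^= t
    t = (t << 7) & MASK
    c[3] ^= (t << 56) & MASK
    c[4] ^= t >> 8
    c[19] = (c[19] ^ t) & 0x7F
    return c[:N]

def poly_mod(c: int, f: int) -> int:
    """Reference reduction by schoolbook long division over GF(2)."""
    df = f.bit_length() - 1
    while c.bit_length() - 1 >= df:
        c ^= f << (c.bit_length() - 1 - df)
    return c

def to_words(x: int, n: int):
    return [(x >> (W * i)) & MASK for i in range(n)]

def from_words(words) -> int:
    return sum(w << (W * i) for i, w in enumerate(words))
```

The shift amounts follow from $64 \cdot 20 - 1223 = 57$ for the constant term and $255 + 57 = 312 = 4 \cdot 64 + 56$ for the $z^{255}$ term, which is why each high word touches exactly four lower words.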
