How Fast Can Higher-Order Masking Be in Software?

Dahmun Goudarzi and Matthieu Rivain, EUROCRYPT 2017, Paris


  1. How Fast Can Higher-Order Masking Be in Software? Dahmun Goudarzi and Matthieu Rivain EUROCRYPT 2017, Paris

  2. Outline: (1) Introduction; (2) Field Multiplications; (3) Non-Linear Operations; (4) Generic Polynomial Methods; (5) Polynomial Methods for AES; (6) The Bitslice Strategy

  6. Higher-Order Masking: $x = x_1 + x_2 + \cdots + x_d$
     - Linear operations: $O(d)$
     - Non-linear operations: $O(d^2)$
     - Challenge for blockciphers: S-boxes
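
The sharing on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming 8-bit values for which the addition is XOR; the helper names `share`, `unmask` and `masked_xor` are hypothetical and do not come from the slides.

```python
import secrets

def share(x, d, nbits=8):
    """Split x into d shares whose XOR equals x (hypothetical helper)."""
    shares = [secrets.randbits(nbits) for _ in range(d - 1)]
    last = x
    for s in shares:
        last ^= s                      # last share makes the XOR of all shares equal x
    return shares + [last]

def unmask(shares):
    """Recombine the shares by XOR-ing them together."""
    acc = 0
    for s in shares:
        acc ^= s
    return acc

def masked_xor(xs, ys):
    """A linear operation is applied share by share: d XORs, hence O(d)."""
    return [a ^ b for a, b in zip(xs, ys)]
```

A non-linear operation cannot be applied share by share in this way, which is where the $O(d^2)$ cost comes from.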

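  7. Ishai-Sahai-Wagner Multiplication

     $$\sum_i c_i = \Big(\sum_i a_i\Big) \times \Big(\sum_i b_i\Big) = \sum_{i,j} a_i \times b_j$$

     $$\begin{pmatrix} a_1 b_1 & a_1 b_2 & \cdots & a_1 b_d \\ 0 & a_2 b_2 & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_d b_d \end{pmatrix} + \begin{pmatrix} 0 & 0 & \cdots & 0 \\ a_2 b_1 & 0 & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ a_d b_1 & a_d b_2 & \cdots & 0 \end{pmatrix} + \begin{pmatrix} 0 & r_{1,2} & \cdots & r_{1,d} \\ r_{1,2} & 0 & \cdots & \vdots \\ \vdots & \vdots & \ddots & r_{d,d-1} \\ r_{1,d} & \cdots & r_{d,d-1} & 0 \end{pmatrix}$$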
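
The identity above can be checked with a short Python sketch of the ISW multiplication as it is commonly written. The field is assumed to be GF(2^8) with the AES polynomial 0x11B, a choice the slide does not fix, and the names `gf256_mul`, `isw_mul` and `xor_all` are hypothetical.

```python
import secrets

def gf256_mul(a, b, poly=0x11B):
    """Shift-and-add multiplication in GF(2^8); the polynomial is an assumption."""
    acc = 0
    for _ in range(8):
        if b & 1:
            acc ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return acc

def xor_all(values):
    acc = 0
    for v in values:
        acc ^= v
    return acc

def isw_mul(a_sh, b_sh):
    """ISW multiplication of two d-sharings; O(d^2) field multiplications."""
    d = len(a_sh)
    c = [gf256_mul(a_sh[i], b_sh[i]) for i in range(d)]       # diagonal terms a_i b_i
    for i in range(d):
        for j in range(i + 1, d):
            r_ij = secrets.randbits(8)                        # fresh random r_{i,j}
            # The bracket order matters: the random value is added first.
            r_ji = (r_ij ^ gf256_mul(a_sh[i], b_sh[j])) ^ gf256_mul(a_sh[j], b_sh[i])
            c[i] ^= r_ij
            c[j] ^= r_ji
    return c
```

Each random value is added twice and cancels, so the XOR of the output shares is the sum of all cross products $a_i \times b_j$.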

  10. The Polynomial Methods
      - S-box seen as a polynomial over $\mathrm{GF}(2^n)$: $S(x) = \sum_{i=0}^{2^n-1} a_i x^i$
      - Generic methods: $S(x) = \sum_i (p_i \star q_i)(x)$
        - CRV decomposition, $\star = \times$ (CHES 2014)
        - Algebraic decomposition, $\star = \circ$ (CRYPTO 2015)
      - AES-specific methods: $S_{\mathrm{AES}}(x) = \mathrm{Aff}(x^{254})$
        - RP multiplication chain (CHES 2010)
        - KHL multiplication chain (CHES 2011)

  11. Our results
      - Optimized implementations of state-of-the-art higher-order masking techniques
      - Bottom-up approach:
        - base field multiplication
        - ISW/CPRR
        - polynomial methods
      - Finely tuned ARM assembly (parallelization)
      - Alternative strategy: bitslice method (new AES and PRESENT speed records)

  12. ARM
      - 32-bit architecture with 16 registers (13 user-accessible registers)
      - Barrel shifter: shifts and rotates are virtually free
      - Example: multiply-by-$x$ and add on $\mathrm{GF}(2)[x]$ in 1 cycle: EOR $acc, $var, $acc, LSL #1
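
To illustrate why this single instruction is useful, the sketch below multiplies two polynomials over GF(2) with an MSB-first Horner scheme, where each step has the same shape as the EOR with a shifted operand. It is written in Python rather than ARM assembly, and `clmul` is a hypothetical name.

```python
def clmul(a, b, nbits=8):
    """Carry-less product of two polynomials over GF(2), without reduction."""
    acc = 0
    for i in reversed(range(nbits)):
        var = a if (b >> i) & 1 else 0   # operand selected by the current bit of b
        acc = var ^ (acc << 1)           # same shape as: EOR acc, var, acc, LSL #1
    return acc
```

A field multiplication would additionally reduce the result modulo the irreducible polynomial, which this sketch leaves out.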

  14. Field Multiplication
      - Goal: efficient implementation of multiplication over $\mathrm{GF}(2^n)$
      - Fastest method: precomputed look-up table
      - Limitation: constrained memory on embedded systems

      | n          | 4        | 5     | 6     | 7      | 8      | 9       | 10       |
      |------------|----------|-------|-------|--------|--------|---------|----------|
      | Table size | 0.25 kiB | 1 kiB | 4 kiB | 16 kiB | 64 kiB | 512 kiB | 2048 kiB |
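
The table sizes above are consistent with a full multiplication table holding one entry per pair of operands. The sketch below reproduces them under two assumptions that the slide does not state: each entry is stored in whole bytes, so one byte for n up to 8 and two bytes beyond. The function name `full_table_kib` is hypothetical.

```python
def full_table_kib(n):
    """Size in kiB of a full multiplication table over GF(2^n) (assumed layout)."""
    entries = 1 << (2 * n)                  # one entry per operand pair (a, b)
    bytes_per_entry = 1 if n <= 8 else 2    # assumption: results stored in whole bytes
    return entries * bytes_per_entry / 1024

if __name__ == "__main__":
    for n in range(4, 11):
        print(n, full_table_kib(n), "kiB")
```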

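  15. Field Multiplication

      |                   | bin mult v1 | bin mult v2    | exp-log v1     | exp-log v2         | kara.              | half-tab          | full-tab      |
      |-------------------|-------------|----------------|----------------|--------------------|--------------------|-------------------|---------------|
      | clock cycles      | $10n + 3$   | $7n + 3$       | 18             | 16                 | 19                 | 10                | 4             |
      | registers         | 5           | 5              | 5              | 5                  | 6                  | 5                 | 5             |
      | code size (bytes) | 52          | $2^{n-1} + 48$ | $2^{n+1} + 48$ | $3 \cdot 2^n + 40$ | $3 \cdot 2^n + 42$ | $2^{3n/2+1} + 24$ | $2^{2n} + 12$ |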

  16. Field Multiplication (same cost table as slide 15), Karatsuba:
      - $a \times b = (a_h x^{n/2} + a_\ell) \times (b_h x^{n/2} + b_\ell)$
      - Karatsuba: $a \times b = T_1[a_h \,|\, b_h] + T_2[a_\ell \,|\, b_\ell] + T_3[a_h + a_\ell \,|\, b_h + b_\ell]$

  17. Field Multiplication (same cost table as slide 15), half table:
      - $a \times b = (a_h x^{n/2} + a_\ell) \times (b_h x^{n/2} + b_\ell)$
      - Half table: $a \times b = T_1[a_h \,|\, a_\ell \,|\, b_h] + T_2[a_h \,|\, a_\ell \,|\, b_\ell]$
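
The half-table formula can be sketched as follows for n = 8. The AES polynomial 0x11B, the index layout `(a << 4) | half` and the names `gf_mul`, `T1`, `T2` and `half_table_mul` are assumptions made for illustration; the slide only gives the formula.

```python
N, H = 8, 4                      # field size n and half size n/2

def gf_mul(a, b, n=N, poly=0x11B):
    """Reference shift-and-add multiplication in GF(2^n) (assumed polynomial)."""
    acc = 0
    for _ in range(n):
        if b & 1:
            acc ^= a
        b >>= 1
        a <<= 1
        if a >> n:
            a ^= poly
    return acc

# T1[a | b_h] = a * (b_h * x^(n/2)) and T2[a | b_l] = a * b_l, each with 2^(3n/2) entries.
T1 = [gf_mul(idx >> H, (idx & 0xF) << H) for idx in range(1 << (N + H))]
T2 = [gf_mul(idx >> H, idx & 0xF) for idx in range(1 << (N + H))]

def half_table_mul(a, b):
    """Two lookups and one XOR replace the full 2^(2n)-entry table."""
    b_h, b_l = b >> H, b & 0xF
    return T1[(a << H) | b_h] ^ T2[(a << H) | b_l]
```

The two tables together hold $2^{3n/2+1}$ one-byte entries, which is the size term that appears in the half-tab code-size formula.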

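  18. Field Multiplication, code size for n = 4

      |              | bin mult v1 | bin mult v2 | exp-log v1 | exp-log v2 | kara. | half-tab | full-tab |
      |--------------|-------------|-------------|------------|------------|-------|----------|----------|
      | clock cycles | $10n + 3$   | $7n + 3$    | 18         | 16         | 19    | 10       | 4        |
      | registers    | 5           | 5           | 5          | 5          | 6     | 5        | 5        |
      | code size    | 52 B        | 56 B        | 80 B       | 88 B       | 90 B  | 152 B    | 268 B    |

      - For n = 4: full table
        - Fastest multiplication: 4 clock cycles
        - Low code size: 268 B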

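  19. Field Multiplication, code size for n = 8

      |              | bin mult v1 | bin mult v2 | exp-log v1 | exp-log v2 | kara. | half-tab | full-tab |
      |--------------|-------------|-------------|------------|------------|-------|----------|----------|
      | clock cycles | $10n + 3$   | $7n + 3$    | 18         | 16         | 19    | 10       | 4        |
      | registers    | 5           | 5           | 5          | 5          | 6     | 5        | 5        |
      | code size    | 52 B        | 176 B       | 560 B      | 808 B      | 810 B | 8216 B   | 64 kiB   |

      - For n = 8: exp-log or half-tab
        - Tradeoff between clock cycles and code size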
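
For the exp-log option, a minimal Python sketch is given below. It assumes the AES polynomial 0x11B with generator 0x03 and a doubled exp table that avoids the reduction modulo 255; the slides do not describe the table layout, and the names `build_exp_log` and `exp_log_mul` are hypothetical.

```python
def gf_mul(a, b, poly=0x11B):
    """Reference shift-and-add multiplication in GF(2^8) (assumed polynomial)."""
    acc = 0
    for _ in range(8):
        if b & 1:
            acc ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return acc

def build_exp_log(gen=0x03):
    """exp[i] = gen^i (stored twice in a row), log[gen^i] = i."""
    exp, log = [0] * 510, [0] * 256
    x = 1
    for i in range(255):
        exp[i], exp[i + 255] = x, x
        log[x] = i
        x = gf_mul(x, gen)
    return exp, log

EXP, LOG = build_exp_log()

def exp_log_mul(a, b):
    """a * b = exp[log a + log b], with zero handled separately."""
    if a == 0 or b == 0:            # a masked implementation would avoid this branch
        return 0
    return EXP[LOG[a] + LOG[b]]
```

With this layout the tables take $3 \cdot 2^n$ bytes, which matches the leading term of the exp-log v2 code size, although that match is only suggestive.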

  21. Quadratic Operations
      - ISW
        - Secure GF-mult of 2 operands
        - Might need refreshing (see paper for details)
      - CPRR
        - Evaluation of quadratic functions in 1 operand
        - Similar to ISW: GF-mult → lookup tables
        - Requires twice as much randomness
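
A sketch of the CPRR evaluation is given below for the quadratic function $h(x) = x^3$ over GF(2^8). The formula follows the usual description of the scheme and is not spelled out on the slide, so the details, the AES polynomial 0x11B and the names `cube`, `H3` and `cprr_eval` should be read as assumptions.

```python
import secrets

def gf256_mul(a, b, poly=0x11B):
    """Shift-and-add multiplication in GF(2^8) (assumed polynomial)."""
    acc = 0
    for _ in range(8):
        if b & 1:
            acc ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return acc

def cube(x):
    return gf256_mul(x, gf256_mul(x, x))

H3 = [cube(x) for x in range(256)]          # lookup table for h(x) = x^3

def cprr_eval(h, x_sh):
    """Evaluate a function h of algebraic degree 2 on a d-sharing of x."""
    d = len(x_sh)
    y = [h(x) for x in x_sh]
    for i in range(d):
        for j in range(i + 1, d):
            r = secrets.randbits(8)         # two random values per pair of shares,
            rp = secrets.randbits(8)        # against one for ISW
            t = (r ^ h(x_sh[i] ^ rp) ^ h(x_sh[j] ^ rp)
                 ^ h(x_sh[i] ^ rp ^ x_sh[j]) ^ h(rp))
            y[i] ^= r
            y[j] ^= t
    if d % 2 == 0:
        y[0] ^= h(0)                        # constant-term correction for even d
    return y
```

Here `h` would typically be a table lookup such as `H3.__getitem__`, which is how the field multiplications of ISW are replaced.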

  22. Performance Comparisons
      - [Figure: clock cycles (axis 0 to 3,500) of ISW-FT, ISW-HT, ISW-EL and CPRR for d = 3, 5, 10]
      - ISW < CPRR when the table is too large
      - Asymptotic comparison: 1 CPRR ≈ 1.16 ISW-FT, 0.88 ISW-HT, 0.75 ISW-EL

  23. Parallelization
      - 32-bit register filled with only $n$-bit elements
      - Perform several ISW/CPRR in parallel:
        - n = 4 → 8 elements/register
        - n = 8 → 4 elements/register
      - Consequence:
        - Parallel: load, store, xor, loops
        - Sequential: GF mult, CPRR lookups
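
The packing idea can be sketched as follows: several $n$-bit elements share one 32-bit word, so a single XOR acts on all of them at once. The helper names `pack` and `unpack` are hypothetical, and Python integers stand in for 32-bit registers.

```python
MASK32 = 0xFFFFFFFF

def pack(elems, nbits):
    """Pack 32 // nbits elements of nbits bits each into one 32-bit word."""
    word = 0
    for k, e in enumerate(elems):
        word |= (e & ((1 << nbits) - 1)) << (k * nbits)
    return word & MASK32

def unpack(word, nbits):
    """Split a 32-bit word back into its 32 // nbits elements."""
    return [(word >> (k * nbits)) & ((1 << nbits) - 1) for k in range(32 // nbits)]

# One XOR on packed words adds four GF(2^8) elements at once.
a = pack([0x01, 0x02, 0x03, 0x04], 8)
b = pack([0x10, 0x20, 0x30, 0x40], 8)
summed = unpack(a ^ b, 8)
```

Field multiplications and table lookups still have to be done element by element, which is why only part of the computation benefits.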

  24. Performance Gain of Parallelization
      - [Figure: two panels, n = 8 (4 elements) and n = 4 (8 elements); clock cycles (axis 0 to 15,000) of sequential vs. parallel implementations for d = 3, 5, 10; legend: ISW-HT, ISW-FT, ISW-EL, CPRR]
      - Asymptotic ratio: CPRR 54%
      - Asymptotic ratio: ISW 42%

  29. Polynomial Decomposition: $S(x) = \sum_i q_i(x) \star p_i(x)$
      - $q_i$: random linear combinations from a basis $B$
      - find $p_i$ by solving a linear system
      - CRV vs AD:
        - CRV [CRV14]: $\star$ = GF-multiplication → ISW multiplication
        - AD [CPRR15]: $\star$ = composition → CPRR evaluation

  30. CRV Improvement
      - Use CPRR for the basis computation
      - Example for n = 8:

      | This paper (6 CPRR)      | CRV (5 ISW)                  |
      |--------------------------|------------------------------|
      | $x^3 = x^3$              | $x^3 = x \cdot x^2$          |
      | $x^9 = (x^3)^3$          | $x^7 = x \cdot (x^3)^2$      |
      | $x^5 = x^5$              | $x^{29} = x \cdot (x^7)^4$   |
      | $x^{25} = (x^5)^5$       | $x^{87}$                     |
      | $x^{125} = (x^{25})^5$   | $x^{251}$                    |
      | $x^{115} = (x^{125})^5$  |                              |
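
The exponents in the CPRR-based chain can be checked with a few lines of arithmetic. The sketch below works on exponents modulo 255, since $x^{255} = 1$ for non-zero $x$ in GF(2^8), and checks that each step raises to a power of Hamming weight 2, which is what makes the step a quadratic function. The names `chain`, `results` and `hamming_weight` are hypothetical.

```python
def hamming_weight(e):
    return bin(e).count("1")

# (exponent of the input, power applied by one CPRR evaluation)
chain = [(1, 3), (3, 3), (1, 5), (5, 5), (25, 5), (125, 5)]

# Exponents are taken modulo 2^8 - 1 = 255.
results = [(e * p) % 255 for e, p in chain]

# x -> x^3 and x -> x^5 have exponents of Hamming weight 2 (3 = 2 + 1, 5 = 4 + 1).
all_quadratic = all(hamming_weight(p) == 2 for _, p in chain)
```

The last step gives $125 \times 5 = 625 \equiv 115 \pmod{255}$, in line with the $x^{115}$ entry.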

  31. Implementation Results
      - [Figure: two panels, n = 4 (8 s-boxes in parallel) and n = 8 (4 s-boxes in parallel); clock cycles for d = 3, 5, 10; legend: Alge. dec., CRV-FT, CRV-HT, CRV-EL]

  33. Polynomial Methods for AES
      - Based on the specific algebraic structure of the AES: $S(x) = \mathrm{Aff}(x^{254})$
      - RP10 method: 4 ISW mult
        - Security flaw due to refreshing
        - Patch [CPRR13]: 1 CPRR + 3 ISW
        - Improvement [GPS14]: 3 CPRR + 1 ISW
      - KHL11 method: 5 ISW mult on GF(16)
        - Patch [this paper]: 1 CPRR + 4 ISW
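
As an unmasked illustration of the "4 mult" count, the sketch below computes $x^{254}$ with a chain of four multiplications plus squarings, which are linear over GF(2). The chain is written from the commonly cited form of the RP10 method and is not taken from the slide; the affine map Aff is left out, and the function names are hypothetical.

```python
def gf256_mul(a, b, poly=0x11B):
    """Shift-and-add multiplication in GF(2^8) with the AES polynomial."""
    acc = 0
    for _ in range(8):
        if b & 1:
            acc ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return acc

def pow2k(x, k):
    """x^(2^k): k squarings, linear over GF(2), so cheap once masked."""
    for _ in range(k):
        x = gf256_mul(x, x)
    return x

def pow254(x):
    z = pow2k(x, 1)          # x^2    (linear)
    y = gf256_mul(z, x)      # x^3    (multiplication 1)
    w = pow2k(y, 2)          # x^12   (linear)
    y = gf256_mul(y, w)      # x^15   (multiplication 2)
    y = pow2k(y, 4)          # x^240  (linear)
    y = gf256_mul(y, w)      # x^252  (multiplication 3)
    return gf256_mul(y, z)   # x^254  (multiplication 4)
```

In a masked implementation each of the four multiplications becomes an ISW multiplication, or a CPRR evaluation where the step is a quadratic function of a single operand.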

  34. Implementation Results
      - 16 s-boxes in parallel
      - [Figure: clock cycles ($\times 10^3$, axis 0 to 100) of KHL, RP-HT and RP-EL for d = 3, 5, 10]
      - KHL < RP-∗: smaller elements → higher parallelization degree

  36. Bitslice for the AES
      - S-box seen as a Boolean circuit
      - [Diagram: input bits $x_1, x_2, \ldots, x_n$ stored in words $X_1, X_2, \ldots, X_n$; each XOR gate is evaluated with a CPU XOR instruction and each AND gate with a CPU AND instruction]
      - 16 S-boxes in parallel

  37. Application for AES S-boxes
      - Circuit for the AES S-box [BMP13]
        - 83 XOR gates
        - 32 AND gates
      - Bitslice (16 s-boxes)
        - 83 XOR instructions
        - 32 AND instructions
      - Masking at the order $d$:
        - $83 \times d$ XOR instructions
        - 32 ISW-AND
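
The two masked gate types can be sketched on 16-bit words, where bit $k$ of each word belongs to the $k$-th S-box. This is an illustration only: the names `masked_xor`, `isw_and` and `xor_all` are hypothetical, and the ISW-AND is written in its commonly stated form.

```python
import secrets

def xor_all(values):
    acc = 0
    for v in values:
        acc ^= v
    return acc

def masked_xor(a_sh, b_sh):
    """Masked XOR gate: d XOR instructions, one per share."""
    return [x ^ y for x, y in zip(a_sh, b_sh)]

def isw_and(a_sh, b_sh, width=16):
    """Masked AND gate: ISW scheme with the field multiplication replaced by AND."""
    d = len(a_sh)
    c = [a_sh[i] & b_sh[i] for i in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            r_ij = secrets.randbits(width)
            r_ji = (r_ij ^ (a_sh[i] & b_sh[j])) ^ (a_sh[j] & b_sh[i])
            c[i] ^= r_ij
            c[j] ^= r_ji
    return c
```

Each call processes the same gate of all 16 S-boxes at once, since every bit position of the words is handled independently.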

  38. Improvement
      - 2 16-bit ISW-AND → 1 32-bit ISW-AND
      - Goal: grouping AND gates in pairs
      - Validation on BMP circuit
      - 16 s-boxes = 16 ISW-AND → 1 ISW-AND per s-box
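
The pairing can be illustrated by placing the operands of two independent 16-bit AND gates in the low and high halves of 32-bit words and running a single 32-bit ISW-AND. The sketch below assumes the two gates do not depend on each other; the names `isw_and32`, `pair` and `split` are hypothetical.

```python
import secrets

def isw_and32(a_sh, b_sh):
    """ISW-AND on 32-bit words (commonly stated form of the scheme)."""
    d = len(a_sh)
    c = [a_sh[i] & b_sh[i] for i in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            r_ij = secrets.randbits(32)
            r_ji = (r_ij ^ (a_sh[i] & b_sh[j])) ^ (a_sh[j] & b_sh[i])
            c[i] ^= r_ij
            c[j] ^= r_ji
    return c

def pair(lo_sh, hi_sh):
    """Put the shares of two 16-bit operands into the halves of 32-bit words."""
    return [lo | (hi << 16) for lo, hi in zip(lo_sh, hi_sh)]

def split(sh):
    """Recover the two 16-bit sharings from the 32-bit words."""
    return [w & 0xFFFF for w in sh], [w >> 16 for w in sh]

def paired_and(a1, b1, a2, b2):
    """Two masked 16-bit AND gates for the cost of one 32-bit ISW-AND."""
    return split(isw_and32(pair(a1, a2), pair(b1, b2)))
```

With the 32 AND gates of the circuit grouped this way, 16 ISW-AND calls cover 16 S-boxes, which is the "1 ISW-AND per s-box" figure.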

  39. Performance Comparison of ISW
      - [Figure: clock cycles (axis 0 to 8,000) for d = 3, 5, 10 of ISW-AND (32 ANDs in parallel), ISW-FT (8 GF(16) multiplications in parallel) and ISW-HT (4 GF(256) multiplications in parallel)]
