Gimli: A Cross-Platform Permutation. Daniel J. Bernstein, Stefan Kölbl, Stefan Lucks, Pedro Maat Costa Massolino, Florian Mendel, Kashif Nawaz, Tobias Schneider, Peter Schwabe, François-Xavier Standaert, Yosuke Todo, Benoît Viguier


  1. Gimli: A Cross-Platform Permutation. Daniel J. Bernstein, Stefan Kölbl, Stefan Lucks, Pedro Maat Costa Massolino, Florian Mendel, Kashif Nawaz, Tobias Schneider, Peter Schwabe, François-Xavier Standaert, Yosuke Todo, Benoît Viguier. Advances in permutation-based cryptography, Milan, October 10, 2018

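  5. What is a Permutation?
     Definition: A permutation is a keyless block cipher.
     Figure: Even-Mansour construction. The message M is XORed with the key k0, passed through the permutation f, and XORed with the key k1 to give the ciphertext C.
     Figure: Sponge construction. Message blocks m0, m1, m2 are XORed into the r-bit part of the state during the absorbing phase, with f applied between blocks; outputs (labeled z0, z2 in the figure) are read from the same r bits during the squeezing phase. The remaining c bits of the state are never output directly.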
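The Even-Mansour construction on this slide can be written as C = f(M ⊕ k0) ⊕ k1. The following is a minimal C sketch of that formula, not code from the talk. It assumes a toy 32-bit permutation (`toy_f`, a hypothetical stand-in that is not secure) so that the inverse is easy to write and the round trip can be checked; `em_encrypt` and `em_decrypt` are likewise illustrative names.

```c
#include <stdint.h>

/* Toy 32-bit permutation (hypothetical stand-in for f, NOT secure):
 * rotate left by 7, then add a constant modulo 2^32. */
static uint32_t toy_f(uint32_t x) {
    return ((x << 7) | (x >> 25)) + 0x9e3779b9u;
}

/* Inverse of toy_f: subtract the constant, then rotate right by 7. */
static uint32_t toy_f_inv(uint32_t y) {
    y -= 0x9e3779b9u;
    return (y >> 7) | (y << 25);
}

/* Even-Mansour encryption: C = f(M ^ k0) ^ k1 */
uint32_t em_encrypt(uint32_t m, uint32_t k0, uint32_t k1) {
    return toy_f(m ^ k0) ^ k1;
}

/* Even-Mansour decryption: M = f^-1(C ^ k1) ^ k0 */
uint32_t em_decrypt(uint32_t c, uint32_t k0, uint32_t k1) {
    return toy_f_inv(c ^ k1) ^ k0;
}
```

The keys only enter through XORs before and after f; all the mixing comes from the public permutation, which is why a "keyless block cipher" is enough to build a keyed one.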

  6. Why Gimli? Currently we have:

     Permutation          Width in bits          Benefits
     AES                  128                    very fast if the instruction is available
     Chaskey              128                    lightning fast on Cortex-M0/M3/M4
     Keccak-f             200, 400, 800, 1600    low-cost masking
     Salsa20, ChaCha20    512                    very fast on CPUs with vector units

  8. Why Gimli? Currently we have:

     Permutation          Hindrance
     AES                  not that fast without hardware support
     Chaskey              low security margin, slow with side-channel protection
     Keccak-f             huge state (800, 1600)
     Salsa20, ChaCha20    horrible in hardware

     Can we have a permutation that is neither too big nor too small, and good in all these areas?

  9. Yes! (Image source: Wikipedia, Fair Use)

  11. What is Gimli? Gimli is:
      ◮ a 384-bit permutation (just the right size)
        • Sponge with c = 256, r = 128 ⇒ 128 bits of security
        • Cortex-M3/M4: full state in registers
        • AVR, Cortex-M0: 192 bits (half state) fit in registers
      ◮ with high cross-platform performance
      ◮ designed for:
        • energy-efficient hardware
        • side-channel-protected hardware
        • microcontrollers
        • compactness
        • vectorization
        • short messages
        • high security level
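With r = 128 and c = 256, a sponge over the 384-bit state absorbs 16 bytes per call to the permutation, and the generic security level is c/2 = 128 bits. The following is a rough C sketch of absorbing and squeezing at that rate. It assumes little-endian byte order within each 32-bit word and a simplified padding rule; `toy_sponge_hash`, `xor_byte`, and `get_byte` are hypothetical names, and the sketch is not claimed to match any standardized Gimli-based hash.

```c
#include <stddef.h>
#include <stdint.h>

#define RATE 16 /* r = 128 bits; the other 32 bytes are the capacity (c = 256) */

static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Gimli permutation, following the C listing shown later in the slides. */
static void gimli(uint32_t *s) {
    uint32_t round, col, x, y, z;
    for (round = 24; round > 0; --round) {
        for (col = 0; col < 4; ++col) {
            x = rotl32(s[col], 24);
            y = rotl32(s[4 + col], 9);
            z = s[8 + col];
            s[8 + col] = x ^ (z << 1) ^ ((y & z) << 2);
            s[4 + col] = y ^ x ^ ((x | z) << 1);
            s[col]     = z ^ y ^ ((x & y) << 3);
        }
        if ((round & 3) == 0) { /* small swap */
            x = s[0]; s[0] = s[1]; s[1] = x;
            x = s[2]; s[2] = s[3]; s[3] = x;
        }
        if ((round & 3) == 2) { /* big swap */
            x = s[0]; s[0] = s[2]; s[2] = x;
            x = s[1]; s[1] = s[3]; s[3] = x;
        }
        if ((round & 3) == 0) s[0] ^= (0x9e377900u | round);
    }
}

/* XOR one byte into the state at byte offset i (little-endian words: an assumption). */
static void xor_byte(uint32_t *s, size_t i, uint8_t b) {
    s[i / 4] ^= (uint32_t)b << (8 * (i % 4));
}

/* Read the byte at offset i of the state (same byte-order assumption). */
static uint8_t get_byte(const uint32_t *s, size_t i) {
    return (uint8_t)(s[i / 4] >> (8 * (i % 4)));
}

void toy_sponge_hash(const uint8_t *msg, size_t len, uint8_t *out, size_t outlen) {
    uint32_t s[12] = {0};
    size_t i, pos = 0;

    /* Absorbing phase: XOR message bytes into the first RATE bytes only. */
    for (i = 0; i < len; ++i) {
        xor_byte(s, pos, msg[i]);
        if (++pos == RATE) { gimli(s); pos = 0; }
    }

    /* Simplified padding (illustrative assumption): mark the end of the
     * message and the end of the rate part, then permute. */
    xor_byte(s, pos, 0x01);
    xor_byte(s, RATE - 1, 0x80);
    gimli(s);

    /* Squeezing phase: output RATE bytes, permute, repeat. */
    pos = 0;
    for (i = 0; i < outlen; ++i) {
        if (pos == RATE) { gimli(s); pos = 0; }
        out[i] = get_byte(s, pos++);
    }
}
```

Only words 0 to 3 of the state (the first 16 bytes) are ever touched by input or output; words 4 to 11 play the role of the 256-bit capacity.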

  12. Specifications: State
      384 bits represented as:
      ◮ a parallelepiped with dimensions 3 × 4 × 32 (Keccak-like)
      ◮ or, as a 3 × 4 matrix of 32-bit words.
      Figure: State representation (axes labeled i and j)

  13. Specifications: Non-linear layer (⋘ denotes rotation, ≪ denotes shift)
      In parallel:
        x ← x ⋘ 24
        y ← y ⋘ 9
      In parallel:
        x ← x ⊕ (z ≪ 1) ⊕ ((y ∧ z) ≪ 2)
        y ← y ⊕ x ⊕ ((x ∨ z) ≪ 1)
        z ← z ⊕ y ⊕ ((x ∧ y) ≪ 3)
      In parallel:
        x ← z
        z ← x
      Figure: The bit-sliced 9-to-3-bit SP-box applied to a column
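The three "in parallel" steps of the SP-box can be sketched for a single column as follows. This is an illustrative C version: `sp_box` and `rotl32` are names chosen here, and the temporaries a, b, c hold the rotated inputs so that all three new words are computed from the same old values, as "in parallel" requires.

```c
#include <stdint.h>

/* 32-bit left rotation, for 0 < n < 32. */
static uint32_t rotl32(uint32_t v, int n) { return (v << n) | (v >> (32 - n)); }

/* Apply the SP-box to one column (x, y, z) in place. */
void sp_box(uint32_t *x, uint32_t *y, uint32_t *z) {
    /* Step 1: rotate x by 24 and y by 9; z is left as is. */
    uint32_t a = rotl32(*x, 24);
    uint32_t b = rotl32(*y, 9);
    uint32_t c = *z;

    /* Step 2: the three non-linear updates, all from the same inputs. */
    uint32_t nx = a ^ (c << 1) ^ ((b & c) << 2);
    uint32_t ny = b ^ a ^ ((a | c) << 1);
    uint32_t nz = c ^ b ^ ((a & b) << 3);

    /* Step 3: swap x and z. */
    *x = nz;
    *y = ny;
    *z = nx;
}
```

For example, the input (x, y, z) = (1, 0, 0) maps to (0, 0x03000000, 0x01000000): the rotation moves the single bit to position 24, and the shifted OR term in the y update spreads it to position 25.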

  14. Specifications: Linear layer
      Figure: The linear layer (Small Swap and Big Swap)
      Figure: Constant addition (⊕ 0x9e3779??)
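The two swaps and the constant addition act only on the first row of the state (words 0 to 3). Below is a small C sketch, using the same conditions on the round number as the C listing in the slides; `linear_layer` and `swap_words` are hypothetical helper names, and the low byte of the constant (the "??" in 0x9e3779??) is filled with the round number.

```c
#include <stdint.h>

static void swap_words(uint32_t *a, uint32_t *b) {
    uint32_t t = *a;
    *a = *b;
    *b = t;
}

/* Linear layer plus constant addition for one round (round counts 24 down to 1).
 * state[0..3] is the first row of the 3 x 4 word matrix. */
void linear_layer(uint32_t state[12], uint32_t round) {
    if ((round & 3) == 0) {
        /* Small swap: exchange neighbouring words (0,1) and (2,3). */
        swap_words(&state[0], &state[1]);
        swap_words(&state[2], &state[3]);
    }
    if ((round & 3) == 2) {
        /* Big swap: exchange the two halves of the row, (0,2) and (1,3). */
        swap_words(&state[0], &state[2]);
        swap_words(&state[1], &state[3]);
    }
    if ((round & 3) == 0) {
        /* Constant addition: XOR 0x9e3779?? into word 0, with ?? = round. */
        state[0] ^= (0x9e377900u | round);
    }
}
```

In round 24, for instance, the constant is 0x9e377918, since 24 is 0x18.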

  15. Gimli in C

      #include <stdint.h>

      /* 32-bit left rotation. The slide does not show this helper; this
         definition is an assumption and expects 0 < bits < 32. */
      static uint32_t rotate(uint32_t x, int bits)
      {
        return (x << bits) | (x >> (32 - bits));
      }

      extern void Gimli(uint32_t *state)
      {
        uint32_t round, column, x, y, z;

        for (round = 24; round > 0; --round) {
          /* non-linear layer: SP-box on each of the 4 columns */
          for (column = 0; column < 4; ++column) {
            x = rotate(state[    column], 24);  // x <<< 24
            y = rotate(state[4 + column],  9);  // y <<< 9
            z =        state[8 + column];

            state[8 + column] = x ^ (z << 1) ^ ((y & z) << 2);
            state[4 + column] = y ^ x        ^ ((x | z) << 1);
            state[    column] = z ^ y        ^ ((x & y) << 3);
          }

          if ((round & 3) == 0) { // small swap: pattern s...s...s... etc.
            x = state[0]; state[0] = state[1]; state[1] = x;
            x = state[2]; state[2] = state[3]; state[3] = x;
          }
          if ((round & 3) == 2) { // big swap: pattern ..S...S...S. etc.
            x = state[0]; state[0] = state[2]; state[2] = x;
            x = state[1]; state[1] = state[3]; state[3] = x;
          }
          if ((round & 3) == 0) { // add constant: pattern c...c...c... etc.
            state[0] ^= (0x9e377900 | round);
          }
        }
      }

  16. Specifications: Rounds
      Round 24: Non-linear layer; Small Swap & Round constant addition
      Round 23: Non-linear layer
      Round 22: Non-linear layer; Big Swap
      Round 21: Non-linear layer
      Round 20: Non-linear layer; Small Swap & Round constant addition
      Round 19: Non-linear layer
      Round 18: Non-linear layer; Big Swap
      ...
      Figure: The first 7 rounds of Gimli
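The round pattern repeats every four rounds and depends only on the round number modulo 4. As a minimal sketch, the following C code derives the schedule from the same `round & 3` tests used in the C listing; the flag names, `round_flags`, and `print_schedule` are illustrative choices, not part of the specification.

```c
#include <stdio.h>

/* Operations that follow the non-linear layer in a given round. */
enum {
    SMALL_SWAP = 1,
    BIG_SWAP   = 2,
    ADD_CONST  = 4
};

/* Every round applies the non-linear layer; this returns what comes after it. */
unsigned round_flags(unsigned round) {
    unsigned flags = 0;
    if ((round & 3) == 0) flags |= SMALL_SWAP | ADD_CONST;
    if ((round & 3) == 2) flags |= BIG_SWAP;
    return flags;
}

/* Print the schedule for all 24 rounds, counting down as the permutation does. */
void print_schedule(void) {
    for (unsigned r = 24; r > 0; --r) {
        unsigned f = round_flags(r);
        printf("Round %2u: non-linear layer%s%s%s\n", r,
               (f & SMALL_SWAP) ? ", small swap" : "",
               (f & BIG_SWAP)   ? ", big swap" : "",
               (f & ADD_CONST)  ? ", round constant" : "");
    }
}
```

Calling `print_schedule()` lists rounds 24 down to 1; rounds 24, 20, 16, ... get the small swap and the constant, rounds 22, 18, 14, ... get the big swap, and odd rounds get only the non-linear layer.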

  17. Unrolled AVR & Cortex-M0
      Figure: Computation order on AVR & Cortex-M0. Each entry is the step at which that SP-box is computed (one row per round, one entry per column):

        Round 24:    1    2    7    8
        Round 23:    5    3   11    9
        Round 22:    6    4   12   10
        Round 21:   21   23   13   15
        Round 20:   22   24   14   16
        Round 19:   27   25   19   17
        Round 18:   28   26   20   18
        ...

      Steps:
         1. SP-box col. 0
         2. SP-box col. 1
            swap word s0,0 and s0,1
         3. SP-box col. 1
         4. SP-box col. 1
         5. SP-box col. 0
         6. SP-box col. 0
            store columns 0,1; load columns 2,3
         7. SP-box col. 2
         8. SP-box col. 3
            swap word s0,2 and s0,3
         9. SP-box col. 3
        10. SP-box col. 3
        11. SP-box col. 2
        12. SP-box col. 2
            push word s0,2, s0,3; load word s0,0, s0,1
        13. SP-box col. 2
        14. SP-box col. 2
        15. SP-box col. 3
        16. SP-box col. 3
            swap word s0,2 and s0,3
        17. SP-box col. 3
        18. SP-box col. 3
        19. SP-box col. 2
        20. SP-box col. 2
            store columns 2,3; load columns 0,1
        ...

  18. Implementation in Assembly (⋘ denotes rotation, ≪ denotes shift)

      # Rotate        # Compute x      # Compute y      # Compute z
      x ← x ⋘ 24      v ← z ≪ 1        v ← y            u ← u ∧ v
      y ← y ⋘ 9       x ← z ∧ y        y ← u ∨ z        u ← u ≪ 3
      u ← x           x ← x ≪ 2        y ← y ≪ 1        z ← z ⊕ v
      .               x ← u ⊕ x        y ← u ⊕ y        z ← z ⊕ u
      .               x ← x ⊕ v        y ← y ⊕ v        .

      The SP-box requires only 2 additional registers, u and v.
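Reading the four columns of the table left to right, and each column top to bottom, gives a straight-line sequence that uses only the temporaries u and v. The following C sketch mirrors that sequence one operation per line; `sp_box_two_temps` is a hypothetical name. As on the slide, the final exchange of x and z is not included, so the results are the values before the swap.

```c
#include <stdint.h>

/* 32-bit left rotation, for 0 < n < 32. */
static uint32_t rotl32(uint32_t v, int n) { return (v << n) | (v >> (32 - n)); }

/* SP-box without the final x/z swap, using two extra "registers" u and v. */
void sp_box_two_temps(uint32_t *px, uint32_t *py, uint32_t *pz) {
    uint32_t x = *px, y = *py, z = *pz, u, v;

    /* Rotate */
    x = rotl32(x, 24);
    y = rotl32(y, 9);
    u = x;          /* keep the rotated x; x itself is overwritten next */

    /* Compute x = u ^ (z << 1) ^ ((y & z) << 2) */
    v = z << 1;
    x = z & y;
    x = x << 2;
    x = u ^ x;
    x = x ^ v;

    /* Compute y = y ^ u ^ ((u | z) << 1) */
    v = y;          /* keep the rotated y; y itself is overwritten next */
    y = u | z;
    y = y << 1;
    y = u ^ y;
    y = y ^ v;

    /* Compute z = z ^ v ^ ((u & v) << 3), with u = old x and v = old y */
    u = u & v;
    u = u << 3;
    z = z ^ v;
    z = z ^ u;

    *px = x;
    *py = y;
    *pz = z;
}
```

The order of the columns matters: z is read by the x and y computations before it is updated, and v is reused once its previous value is no longer needed.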

  19. Rotate for free on Cortex-M3/M4

      # Rotate        # Compute x           # Compute y           # Compute z
      x ← x ⋘ 24      v ← z ≪ 1             v ← y                 u ← u ∧ (v ⋘ 9)
      .               x ← z ∧ (y ⋘ 9)       y ← u ∨ z             u ← u ≪ 3
      u ← x           x ← x ≪ 2             y ← y ≪ 1             z ← z ⊕ (v ⋘ 9)
      .               x ← u ⊕ x             y ← u ⊕ y             z ← z ⊕ u
      .               x ← x ⊕ v             y ← y ⊕ (v ⋘ 9)       .

      Remove y ⋘ 9.

  20. Shift for free on Cortex-M3/M4

      # Rotate        # Compute x           # Compute y           # Compute z
      x ← x ⋘ 24      .                     v ← y                 u ← u ∧ (v ⋘ 9)
      .               x ← z ∧ (y ⋘ 9)       y ← u ∨ z             .
      u ← x           .                     .                     z ← z ⊕ (v ⋘ 9)
      .               x ← u ⊕ (x ≪ 2)       y ← u ⊕ (y ≪ 1)       z ← z ⊕ (u ≪ 3)
      .               x ← x ⊕ (z ≪ 1)       y ← y ⊕ (v ⋘ 9)       .

      Get rid of the other shifts.

  21. Free mov on Cortex-M3/M4

      # Rotate        # Compute x           # Compute y           # Compute z
      x ← x ⋘ 24      .                     v ← y                 x ← x ∧ (v ⋘ 9)
      .               u ← z ∧ (y ⋘ 9)       y ← x ∨ z             .
      .               .                     .                     z ← z ⊕ (v ⋘ 9)
      .               u ← x ⊕ (u ≪ 2)       y ← x ⊕ (y ≪ 1)       z ← z ⊕ (x ≪ 3)
      .               u ← u ⊕ (z ≪ 1)       y ← y ⊕ (v ⋘ 9)       .

      Remove the last mov:
      u contains the new value of x
      y contains the new value of y
      z contains the new value of z

  22. Free mov on Cortex-M3/M4

      # Rotate        # Compute x           # Compute y           # Compute z
      x ← x ⋘ 24      .                     .                     x ← x ∧ (y ⋘ 9)
      .               u ← z ∧ (y ⋘ 9)       v ← x ∨ z             .
      .               .                     .                     z ← z ⊕ (y ⋘ 9)
      .               u ← x ⊕ (u ≪ 2)       v ← x ⊕ (v ≪ 1)       z ← z ⊕ (x ≪ 3)
      .               u ← u ⊕ (z ≪ 1)       v ← v ⊕ (y ⋘ 9)       .

      Remove the last mov:
      u contains the new value of x
      v contains the new value of y
      z contains the new value of z

  23. Free swap on Cortex-M3/M4

      # Rotate        # Compute x           # Compute y           # Compute z
      x ← x ⋘ 24      u ← z ∧ (y ⋘ 9)       v ← x ∨ z             x ← x ∧ (y ⋘ 9)
      .               u ← x ⊕ (u ≪ 2)       v ← x ⊕ (v ≪ 1)       z ← z ⊕ (y ⋘ 9)
      .               u ← u ⊕ (z ≪ 1)       v ← v ⊕ (y ⋘ 9)       z ← z ⊕ (x ≪ 3)

      Swap x and z:
      u contains the new value of z
      v contains the new value of y
      z contains the new value of x

      The SP-box requires a total of 10 instructions.
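The final 10-operation sequence can be checked against the SP-box definition. The C sketch below writes one statement per instruction in the table, on the assumption that each parenthesized rotation or shift is folded into the instruction's second operand on Cortex-M3/M4, as the slide titles suggest. `sp_box_10ops` is a hypothetical name; the last three assignments only rename registers (u becomes z, v becomes y, z becomes x) and would cost nothing in unrolled assembly.

```c
#include <stdint.h>

/* 32-bit left rotation, for 0 < n < 32. */
static uint32_t rotl32(uint32_t v, int n) { return (v << n) | (v >> (32 - n)); }

/* Full SP-box (including the x/z swap) as the 10-operation sequence. */
void sp_box_10ops(uint32_t *px, uint32_t *py, uint32_t *pz) {
    uint32_t x = *px, y = *py, z = *pz, u, v;

    x = rotl32(x, 24);       /*  1: the only explicit rotation           */
    u = z & rotl32(y, 9);    /*  2: y is rotated as part of the operand  */
    u = x ^ (u << 2);        /*  3                                       */
    u = u ^ (z << 1);        /*  4: u now holds the new value of z       */
    v = x | z;               /*  5                                       */
    v = x ^ (v << 1);        /*  6                                       */
    v = v ^ rotl32(y, 9);    /*  7: v now holds the new value of y       */
    x = x & rotl32(y, 9);    /*  8: x is free to overwrite from here on  */
    z = z ^ rotl32(y, 9);    /*  9                                       */
    z = z ^ (x << 3);        /* 10: z now holds the new value of x       */

    /* Register renaming: no instructions needed in unrolled code. */
    *px = z;
    *py = v;
    *pz = u;
}
```

Because y is never written, its unrotated value stays available for all four uses, which is what lets the separate y ⋘ 9 instruction disappear.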

  28. How fast is Gimli? (Software)
      Figure: bar charts of cycles/byte (lower is better), one panel per platform.
      Legend: Chaskey, Gimli, Salsa20, ChaCha20, AES-128, NORX-32-4-1, Keccak-f[400,12], Keccak-f[800,12]
      Bar values per panel (the mapping of bars to algorithms is not recoverable from the extracted text):
        AVR ATmega:      413, 216, 213, 171, 151 (some bars labeled "small" or "fast")
        Cortex-M0:       49, 40, 9.8
        Cortex-M3/M4:    63, 34, 21, 13, 7
        Cortex-A8:       19.3, 16.9, 8.73, 6.25, 5.48 (bars labeled "1 block" or "x blocks")
        Intel Haswell:   6.76, 4.46, 2.84, 2.33, 1.77, 1.38, 1.2, 0.85 (bars labeled "1 block", "2 blocks", "4 blocks", "8 blocks", or "x blocks")
