Gimli: A Cross-Platform Permutation
Daniel J. Bernstein, Stefan Kölbl, Stefan Lucks, Pedro Maat Costa Massolino, Florian Mendel, Kashif Nawaz, Tobias Schneider, Peter Schwabe, François-Xavier Standaert, Yosuke Todo, Benoît Viguier
Advances in permutation-based cryptography, Milan, October 10, 2018

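What is a Permutation?
Definition: A permutation is a keyless block cipher.
Figure: Even-Mansour construction (message M, keys k_0 and k_1, permutation f, ciphertext C)
Figure: Sponge construction (message blocks m_0, m_1, m_2 enter the r-bit part of the state during the absorbing phase; output blocks z_i leave it during the squeezing phase; the remaining c bits are never exposed directly; f is applied between blocks)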
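The Even-Mansour construction can be sketched in a few lines of C. This is only an illustration under stated assumptions: the 32-bit block size, the `perm32_fn` type and the `toy_perm` / `toy_perm_inv` functions are hypothetical and are not part of the slides; a real instance would use a wide, well-analysed permutation such as Gimli.

```c
#include <stdint.h>

/* Hypothetical type: any keyless permutation on 32-bit blocks. */
typedef uint32_t (*perm32_fn)(uint32_t);

/* Even-Mansour encryption: C = f(M xor k0) xor k1. */
static uint32_t even_mansour_encrypt(perm32_fn f, uint32_t k0, uint32_t k1,
                                     uint32_t m)
{
    return f(m ^ k0) ^ k1;
}

/* Decryption needs the inverse permutation: M = f^-1(C xor k1) xor k0. */
static uint32_t even_mansour_decrypt(perm32_fn f_inv, uint32_t k0, uint32_t k1,
                                     uint32_t c)
{
    return f_inv(c ^ k1) ^ k0;
}

/* Toy permutation for illustration only (NOT secure):
   rotate left by 5 bits, then xor a constant. */
static uint32_t toy_perm(uint32_t v)
{
    return ((v << 5) | (v >> 27)) ^ 0x9e3779b9u;
}

/* Inverse of toy_perm: undo the xor, then rotate right by 5 bits. */
static uint32_t toy_perm_inv(uint32_t v)
{
    v ^= 0x9e3779b9u;
    return (v >> 5) | (v << 27);
}
```

The permutation itself takes no key; all secrecy comes from the two key additions around it.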

Why Gimli?
Currently we have:
Permutation          Width in bits         Benefits
AES                  128                   very fast if the instruction is available
Chaskey              128                   lightning fast on Cortex-M0/M3/M4
Keccak-f             200, 400, 800, 1600   low-cost masking
Salsa20, ChaCha20    512                   very fast on CPUs with vector units

Why Gimli?
Currently we have:
Permutation          Hindrance
AES                  not that fast without HW
Chaskey              low security margin, slow with side-channel protection
Keccak-f             huge state (800, 1600)
Salsa20, ChaCha20    horrible on HW
Can we have a permutation that is neither too big nor too small, and good in all these areas?

Yes!
Image source: Wikipedia, fair use.

What is Gimli?
Gimli is:
◮ a 384-bit permutation (just the right size)
  • Sponge with c = 256, r = 128 ⇒ 128 bits of security
  • Cortex-M3/M4: full state in registers
  • AVR, Cortex-M0: 192 bits (half state) fit in registers
◮ with high cross-platform performance
◮ designed for:
  • energy-efficient hardware
  • side-channel-protected hardware
  • microcontrollers
  • compactness
  • vectorization
  • short messages
  • high security level
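The sponge parameters above (r = 128, c = 256, so r + c = 384 and the generic security level is c/2 = 128 bits) can be illustrated with a minimal absorbing loop. This is a sketch under assumptions, not the authors' code: the byte-oriented state, the choice of the first 16 bytes as the rate, the `perm_fn` type and the placeholder `toy_rotate_bytes` are all hypothetical, and padding of a final partial block is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    STATE_BYTES = 48, /* 384-bit state */
    RATE_BYTES  = 16  /* r = 128 bits; the other 32 bytes are the capacity */
};

/* Hypothetical type for a permutation acting on the whole state. */
typedef void (*perm_fn)(uint8_t state[STATE_BYTES]);

/* Absorb `nblocks` full 16-byte blocks: xor each block into the rate part
   of the state, then apply the permutation. No padding is performed. */
static void sponge_absorb_blocks(uint8_t state[STATE_BYTES], const uint8_t *msg,
                                 size_t nblocks, perm_fn f)
{
    for (size_t b = 0; b < nblocks; ++b) {
        for (size_t i = 0; i < RATE_BYTES; ++i)
            state[i] ^= msg[b * RATE_BYTES + i];
        f(state);
    }
}

/* Placeholder permutation for illustration only (NOT secure):
   rotates the state by one byte position. */
static void toy_rotate_bytes(uint8_t state[STATE_BYTES])
{
    uint8_t first = state[0];
    for (size_t i = 0; i + 1 < STATE_BYTES; ++i)
        state[i] = state[i + 1];
    state[STATE_BYTES - 1] = first;
}
```

Only the rate bytes are ever touched by message data; the capacity bytes change solely through the permutation, which is what the c/2 security bound relies on.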

Specifications: State
Figure: State representation (axes i and j)
384 bits represented as:
◮ a parallelepiped with dimensions 3 × 4 × 32 (Keccak-like)
◮ or, as a 3 × 4 matrix of 32-bit words.
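One way to picture the two views of the state is a flat array of twelve 32-bit words, indexed row-major as in the reference C code (`state[column]`, `state[4 + column]`, `state[8 + column]`). The helper names `gimli_word` and `gimli_bit` below are hypothetical and only serve the illustration.

```c
#include <stdint.h>

enum { GIMLI_ROWS = 3, GIMLI_COLS = 4, GIMLI_WORD_BITS = 32 };

/* 3 x 4 matrix view: pointer to the word at (row, col) in a flat array. */
static inline uint32_t *gimli_word(uint32_t state[12], unsigned row, unsigned col)
{
    return &state[GIMLI_COLS * row + col];
}

/* 3 x 4 x 32 view: bit k (0 = least significant) of the word at (row, col). */
static inline unsigned gimli_bit(const uint32_t state[12], unsigned row,
                                 unsigned col, unsigned k)
{
    return (state[GIMLI_COLS * row + col] >> k) & 1u;
}
```

For example, the word at row 2, column 1 is `state[9]`, and 3 × 4 × 32 gives the full 384 bits.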

Specifications: Non-linear layer
In parallel:
  x ← x ⋘ 24
  y ← y ⋘ 9
In parallel:
  x ← x ⊕ (z ≪ 1) ⊕ ((y ∧ z) ≪ 2)
  y ← y ⊕ x ⊕ ((x ∨ z) ≪ 1)
  z ← z ⊕ y ⊕ ((x ∧ y) ≪ 3)
In parallel:
  x ← z
  z ← x
Here ⋘ is a rotation and ≪ a shift.
Figure: The bit-sliced 9-to-3-bit SP-box applied to a column

Specifications: Linear layer
Figure: The linear layer (Small Swap, Big Swap)
Figure: Constant addition 0x9e3779??

Gimli in C

#include <stdint.h>

// Rotate a 32-bit word left by `bits` positions.
static uint32_t rotate(uint32_t x, int bits)
{
  if (bits == 0) return x;
  return (x << bits) | (x >> (32 - bits));
}

extern void Gimli(uint32_t *state)
{
  uint32_t round, column, x, y, z;

  for (round = 24; round > 0; --round) {
    // non-linear layer: SP-box on each of the 4 columns
    for (column = 0; column < 4; ++column) {
      x = rotate(state[    column], 24);  // x <<< 24
      y = rotate(state[4 + column],  9);  // y <<< 9
      z =        state[8 + column];

      state[8 + column] = x ^ (z << 1) ^ ((y & z) << 2);
      state[4 + column] = y ^ x        ^ ((x | z) << 1);
      state[column]     = z ^ y        ^ ((x & y) << 3);
    }

    if ((round & 3) == 0) {  // small swap: pattern s...s...s... etc.
      x = state[0]; state[0] = state[1]; state[1] = x;
      x = state[2]; state[2] = state[3]; state[3] = x;
    }
    if ((round & 3) == 2) {  // big swap: pattern ..S...S...S. etc.
      x = state[0]; state[0] = state[2]; state[2] = x;
      x = state[1]; state[1] = state[3]; state[3] = x;
    }
    if ((round & 3) == 0) {  // add constant: pattern c...c...c... etc.
      state[0] ^= (0x9e377900 | round);
    }
  }
}

Specifications: Rounds
Round 24: Non-linear layer, Small Swap & Round constant addition
Round 23: Non-linear layer
Round 22: Non-linear layer, Big Swap
Round 21: Non-linear layer
Round 20: Non-linear layer, Small Swap & Round constant addition
Round 19: Non-linear layer
Round 18: Non-linear layer, Big Swap
...
Figure: The first 7 rounds of Gimli
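The round pattern can also be expressed as a small lookup on the round counter, mirroring the `(round & 3)` tests in the C code. The function and flag names below are hypothetical, chosen only for this sketch.

```c
#include <stdint.h>

/* Flags for the operations that follow the non-linear layer in a round. */
enum { OP_SMALL_SWAP = 1, OP_BIG_SWAP = 2, OP_CONSTANT = 4 };

/* Rounds are numbered from 24 down to 1. Every round applies the
   non-linear layer; this returns what comes after it. */
static unsigned gimli_round_ops(unsigned round)
{
    unsigned ops = 0;
    if ((round & 3) == 0) ops |= OP_SMALL_SWAP | OP_CONSTANT;
    if ((round & 3) == 2) ops |= OP_BIG_SWAP;
    return ops;
}

/* Round constant xored into state[0]: 0x9e3779?? with ?? = round number. */
static uint32_t gimli_round_constant(unsigned round)
{
    return 0x9e377900u | round;
}
```

With this numbering, rounds 24, 20, 16, ... get the small swap and the constant, rounds 22, 18, 14, ... get the big swap, and odd rounds consist of the non-linear layer alone.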

Unrolled AVR & Cortex-M0
1. SP-box col. 0
2. SP-box col. 1
   swap word s_{0,0} and s_{0,1}
3. SP-box col. 1
4. SP-box col. 1
5. SP-box col. 0
6. SP-box col. 0
   store columns 0,1; load columns 2,3
7. SP-box col. 2
8. SP-box col. 3
   swap word s_{0,2} and s_{0,3}
9. SP-box col. 3
10. SP-box col. 3
11. SP-box col. 2
12. SP-box col. 2
    push word s_{0,2}, s_{0,3}; load word s_{0,0}, s_{0,1}
13. SP-box col. 2
14. SP-box col. 2
15. SP-box col. 3
16. SP-box col. 3
    swap word s_{0,2} and s_{0,3}
17. SP-box col. 3
18. SP-box col. 3
19. SP-box col. 2
20. SP-box col. 2
    store columns 2,3; load columns 0,1
...
Step number at which each SP-box is computed:
            col. 0   col. 1   col. 2   col. 3
Round 24       1        2        7        8
Round 23       5        3       11        9
Round 22       6        4       12       10
Round 21      21       23       13       15
Round 20      22       24       14       16
Round 19      27       25       19       17
Round 18      28       26       20       18
...
Figure: Computation order on AVR & Cortex-M0

Implementation in Assembly
# Rotate        # Compute x      # Compute y      # Compute z
x ← x ⋘ 24      v ← z ≪ 1        v ← y            u ← u ∧ v
y ← y ⋘ 9       x ← z ∧ y        y ← u ∨ z        u ← u ≪ 3
u ← x           x ← x ≪ 2        y ← y ≪ 1        z ← z ⊕ v
                x ← u ⊕ x        y ← u ⊕ y        z ← z ⊕ u
                x ← x ⊕ v        y ← y ⊕ v
Instructions are executed column by column, from left to right (⋘ is a rotation, ≪ a shift).
The SP-box requires only 2 additional registers u and v.

Rotate for free on Cortex-M3/M4
# Rotate        # Compute x          # Compute y          # Compute z
x ← x ⋘ 24      v ← z ≪ 1            v ← y                u ← u ∧ (v ⋘ 9)
                x ← z ∧ (y ⋘ 9)      y ← u ∨ z            u ← u ≪ 3
u ← x           x ← x ≪ 2            y ← y ≪ 1            z ← z ⊕ (v ⋘ 9)
                x ← u ⊕ x            y ← u ⊕ y            z ← z ⊕ u
                x ← x ⊕ v            y ← y ⊕ (v ⋘ 9)
Remove y ⋘ 9.

Shift for free on Cortex-M3/M4
# Rotate        # Compute x          # Compute y          # Compute z
x ← x ⋘ 24                           v ← y                u ← u ∧ (v ⋘ 9)
                x ← z ∧ (y ⋘ 9)      y ← u ∨ z
u ← x                                                     z ← z ⊕ (v ⋘ 9)
                x ← u ⊕ (x ≪ 2)      y ← u ⊕ (y ≪ 1)      z ← z ⊕ (u ≪ 3)
                x ← x ⊕ (z ≪ 1)      y ← y ⊕ (v ⋘ 9)
Get rid of the other shifts.

Free mov on Cortex-M3/M4
# Rotate        # Compute x          # Compute y          # Compute z
x ← x ⋘ 24                           v ← y                x ← x ∧ (v ⋘ 9)
                u ← z ∧ (y ⋘ 9)      y ← x ∨ z
                                                          z ← z ⊕ (v ⋘ 9)
                u ← x ⊕ (u ≪ 2)      y ← x ⊕ (y ≪ 1)      z ← z ⊕ (x ≪ 3)
                u ← u ⊕ (z ≪ 1)      y ← y ⊕ (v ⋘ 9)
Remove the last mov:
u contains the new value of x
y contains the new value of y
z contains the new value of z

Free mov on Cortex-M3/M4
# Rotate        # Compute x          # Compute y          # Compute z
x ← x ⋘ 24                                                x ← x ∧ (y ⋘ 9)
                u ← z ∧ (y ⋘ 9)      v ← x ∨ z
                                                          z ← z ⊕ (y ⋘ 9)
                u ← x ⊕ (u ≪ 2)      v ← x ⊕ (v ≪ 1)      z ← z ⊕ (x ≪ 3)
                u ← u ⊕ (z ≪ 1)      v ← v ⊕ (y ⋘ 9)
Remove the last mov:
u contains the new value of x
v contains the new value of y
z contains the new value of z

Free swap on Cortex-M3/M4
# Rotate        # Compute x          # Compute y          # Compute z
x ← x ⋘ 24      u ← z ∧ (y ⋘ 9)      v ← x ∨ z            x ← x ∧ (y ⋘ 9)
                u ← x ⊕ (u ≪ 2)      v ← x ⊕ (v ≪ 1)      z ← z ⊕ (y ⋘ 9)
                u ← u ⊕ (z ≪ 1)      v ← v ⊕ (y ⋘ 9)      z ← z ⊕ (x ≪ 3)
Swap x and z:
u contains the new value of z
v contains the new value of y
z contains the new value of x
SP-box requires a total of 10 instructions.
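The final 10-instruction schedule can be checked against the straightforward SP-box in portable C. The sketch below transcribes the instructions column by column; each C statement stands for one Cortex-M3/M4 instruction whose second operand is shifted or rotated for free. The names `column_t`, `rotl32`, `spbox_reference` and `spbox_ten_ops` are hypothetical and not taken from the slides.

```c
#include <stdint.h>

typedef struct { uint32_t x, y, z; } column_t;

/* Rotate left; n must be in 1..31. */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* Straightforward SP-box, including the final swap of x and z,
   following the inner loop of the C code for Gimli. */
static column_t spbox_reference(column_t in)
{
    uint32_t x = rotl32(in.x, 24);
    uint32_t y = rotl32(in.y, 9);
    uint32_t z = in.z;
    column_t out;
    out.z = x ^ (z << 1) ^ ((y & z) << 2);
    out.y = y ^ x ^ ((x | z) << 1);
    out.x = z ^ y ^ ((x & y) << 3);
    return out;
}

/* The 10-instruction schedule, one statement per instruction. */
static column_t spbox_ten_ops(column_t in)
{
    uint32_t x = in.x, y = in.y, z = in.z, u, v;

    x = rotl32(x, 24);        /*  1: x <- x <<< 24         */
    u = z & rotl32(y, 9);     /*  2: u <- z & (y <<< 9)    */
    u = x ^ (u << 2);         /*  3: u <- x ^ (u << 2)     */
    u = u ^ (z << 1);         /*  4: u <- u ^ (z << 1)     */
    v = x | z;                /*  5: v <- x | z            */
    v = x ^ (v << 1);         /*  6: v <- x ^ (v << 1)     */
    v = v ^ rotl32(y, 9);     /*  7: v <- v ^ (y <<< 9)    */
    x = x & rotl32(y, 9);     /*  8: x <- x & (y <<< 9)    */
    z = z ^ rotl32(y, 9);     /*  9: z <- z ^ (y <<< 9)    */
    z = z ^ (x << 3);         /* 10: z <- z ^ (x << 3)     */

    /* After the swap: new x is in z, new y in v, new z in u. */
    column_t out = { z, v, u };
    return out;
}
```

Because the results land in u, v and z rather than in the original registers, the swap of x and z costs nothing: the caller simply treats the registers under their new roles.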

How fast is Gimli? (Software)
Figure: Cycles/byte (lower is better) on AVR ATmega, Cortex-M0, Cortex-M3/M4, Cortex-A8 and Intel Haswell, comparing Chaskey, Gimli, Salsa20, ChaCha20, AES-128, NORX-32-4-1, Keccak-f[400,12] and Keccak-f[800,12].
