slide-1
SLIDE 1

Innovations in permutation-based crypto

Joan Daemen¹,², based on joint work with Guido Bertoni³, Seth Hoffert, Michaël Peeters¹, Gilles Van Assche¹ and Ronny Van Keer¹

¹STMicroelectronics ²Radboud University ³Security Pattern

ECC, Nijmegen, November 14, 2017

1 / 35

slide-2
SLIDE 2

Pseudo-random function (PRF): input → F_K → output …

2 / 35

slide-3
SLIDE 3

Stream encryption: ciphertext = plaintext + F_K(nonce)

3 / 35
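A minimal sketch of stream encryption with a PRF, C = P + F_K(N), where + is bitwise XOR and the keystream is truncated to |P|. As a stand-in for F_K we assume SHAKE256 over a fixed-length key followed by the input, i.e. a keyed sponge; this is an illustration, not Kravatte or any other construction from these slides.

```python
import hashlib

KEY_LEN = 32  # fixed key length, so that key || data is unambiguous


def prf(key: bytes, data: bytes, n: int) -> bytes:
    """Stand-in for F_K: n bytes of SHAKE256 over key || data."""
    if len(key) != KEY_LEN:
        raise ValueError("key must be KEY_LEN bytes")
    return hashlib.shake_256(key + data).digest(n) if n else b""


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def stream_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # C = P + F_K(N), keystream truncated to |P|
    return xor(plaintext, prf(key, nonce, len(plaintext)))


# Decryption adds the same keystream again.
stream_decrypt = stream_encrypt
```

The nonce must not repeat under one key: two plaintexts encrypted with the same N are added to the same keystream.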

slide-4
SLIDE 4

Message authentication (MAC): the plaintext is sent together with the tag F_K(plaintext)

4 / 35
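A sketch of a PRF-based MAC, taking the tag as the first t bytes of F_K(P) (the 0^t + F_K(…) notation used later in the deck). The SHAKE256-keyed stand-in for F_K and the 16-byte tag length are assumptions for illustration.

```python
import hashlib
import hmac

KEY_LEN = 32
TAG_LEN = 16  # t = 128 bits, an illustrative choice


def prf(key: bytes, data: bytes, n: int) -> bytes:
    """Stand-in for F_K: SHAKE256 over the fixed-length key followed by data."""
    if len(key) != KEY_LEN:
        raise ValueError("key must be KEY_LEN bytes")
    return hashlib.shake_256(key + data).digest(n) if n else b""


def mac(key: bytes, message: bytes, t: int = TAG_LEN) -> bytes:
    # T = 0^t + F_K(P): the first t bytes of the PRF output
    return prf(key, message, t)


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # Constant-time comparison: do not leak how many tag bytes matched
    return hmac.compare_digest(mac(key, message, len(tag)), tag)
```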

slide-5
SLIDE 5

Authenticated encryption: stream encryption (ciphertext = plaintext + F_K(nonce)) combined with a tag computed by F_K

5 / 35

slide-6
SLIDE 6

String sequence input and incrementality

packet #1 → F_K(P(1))

6 / 35

slide-7
SLIDE 7

String sequence input and incrementality

packets #1, #2 → F_K(P(2) ◦ P(1))

6 / 35

slide-8
SLIDE 8

String sequence input and incrementality

packets #1, #2, #3 → F_K(P(3) ◦ P(2) ◦ P(1))

6 / 35
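For string-sequence input, F_K needs an injective encoding of the sequence, so that P(2) ◦ P(1) never collides with a different split of the same bytes. A minimal sketch, assuming an 8-byte length prefix per packet (not necessarily Farfalle's own encoding) and SHAKE256 as a stand-in PRF: absorbing the oldest packet first and keeping the absorbed state lets F_K(P(n) ◦ … ◦ P(1)) be produced after each packet without re-reading earlier ones.

```python
import hashlib

KEY_LEN = 32


def encode(packet: bytes) -> bytes:
    # Length prefix: a concatenation of encoded packets is uniquely decodable
    return len(packet).to_bytes(8, "big") + packet


def prf_sequence(key: bytes, packets: list, n: int) -> bytes:
    """F_K(P(m) ◦ ... ◦ P(1)) from scratch; packets[0] is P(1), the oldest."""
    h = hashlib.shake_256(key)
    for p in packets:
        h.update(encode(p))
    return h.digest(n)


class IncrementalPRF:
    """Keeps the absorbed state, so each new packet costs only its own length."""

    def __init__(self, key: bytes):
        if len(key) != KEY_LEN:
            raise ValueError("key must be KEY_LEN bytes")
        self._h = hashlib.shake_256(key)

    def append(self, packet: bytes, n: int) -> bytes:
        self._h.update(encode(packet))
        # digest() does not finalize the hash object, so more packets can follow
        return self._h.digest(n)
```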

slide-9
SLIDE 9

Session authenticated encryption (SAE) [KT, SAC 2011]

[Figure: a session keyed with K and nonce N returns T(0); each later call takes (A(i), P(i)) and returns C(i) and T(i), for i = 1, 2, 3]

Initialization, taking nonce N:
  T ← 0^t + F_K(N)
  history ← N
  return tag T of length t
Wrap, taking metadata A and plaintext P:
  C ← P + F_K(A ◦ history)
  T ← 0^t + F_K(C ◦ A ◦ history)
  history ← C ◦ A ◦ history
  return ciphertext C of length |P| and tag T of length t

7 / 35
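A sketch of the SAE pseudocode above, with an unwrap side added so the session can be checked end to end. Assumptions: SHAKE256 keyed by prefix stands in for F_K, string sequences are injectively encoded with length prefixes, and sequences are kept newest-first as tuples to mirror the A ◦ history notation (a real implementation would absorb incrementally instead of re-hashing the whole history).

```python
import hashlib
import hmac

KEY_LEN, TAG_LEN = 32, 16  # t = 128 bits, illustrative


def encode(seq: tuple) -> bytes:
    # Injective encoding of a string sequence via 8-byte length prefixes
    return b"".join(len(s).to_bytes(8, "big") + s for s in seq)


def prf(key: bytes, seq: tuple, n: int) -> bytes:
    # Stand-in for F_K on string sequences
    return hashlib.shake_256(key + encode(seq)).digest(n) if n else b""


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class SAESession:
    def __init__(self, key: bytes, nonce: bytes):
        if len(key) != KEY_LEN:
            raise ValueError("key must be KEY_LEN bytes")
        self.key = key
        self.history = (nonce,)                       # history <- N
        self.tag0 = prf(key, self.history, TAG_LEN)   # T <- 0^t + F_K(N)

    def wrap(self, a: bytes, p: bytes):
        c = xor(p, prf(self.key, (a,) + self.history, len(p)))  # C <- P + F_K(A ◦ history)
        self.history = (c, a) + self.history                    # history <- C ◦ A ◦ history
        return c, prf(self.key, self.history, TAG_LEN)          # T <- 0^t + F_K(C ◦ A ◦ history)

    def unwrap(self, a: bytes, c: bytes, tag: bytes) -> bytes:
        new_history = (c, a) + self.history
        if not hmac.compare_digest(prf(self.key, new_history, TAG_LEN), tag):
            raise ValueError("tag mismatch")                    # session state left unchanged
        p = xor(c, prf(self.key, (a,) + self.history, len(c)))
        self.history = new_history
        return p
```

Because each tag covers the whole history, a receiver that accepts T(i) has also authenticated every earlier message in the session, in order.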

slide-10
SLIDE 10

Session authenticated encryption (SAE) [KT, SAC 2011]

K, N 1 T(0) A(1) P(1) C(1) T(1) A(2) P(2) C(2) T(3) A(3) P(3) C(3) T(2)

Initialization taking nonce N T ← 0t + FK (N) history ← N return tag T of length t Wrap taking metadata A and plaintext P C ← P + FK (A ◦ history) T ← 0t + FK (C ◦ A ◦ history) history ← C ◦ A ◦ history return ciphertext C of length |P| and tag T of length t

7 / 35

slide-11
SLIDE 11

Synthetic initialization value (SIV) of [KT, eprint 2016/1188]

[Figure: wrapping; F_K turns A and P into the tag T, and F_K(T ◦ A) is added to P to give C]

Unwrap, taking metadata A, ciphertext C and tag T:
  P ← C + F_K(T ◦ A)
  τ ← 0^t + F_K(P ◦ A)
  if τ ≠ T then return error!
  else return plaintext P of length |C|

Variant of SIV of [Rogaway & Shrimpton, EC 2006]

8 / 35
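A sketch of SIV wrap and unwrap. Unwrap follows the pseudocode on the slide; wrap is its inverse (T ← 0^t + F_K(P ◦ A), then C ← P + F_K(T ◦ A)). The SHAKE256 stand-in for F_K and the length-prefix sequence encoding are assumptions.

```python
import hashlib
import hmac

KEY_LEN, TAG_LEN = 32, 16


def encode(seq: tuple) -> bytes:
    return b"".join(len(s).to_bytes(8, "big") + s for s in seq)


def prf(key: bytes, seq: tuple, n: int) -> bytes:
    # Stand-in for F_K on string sequences (SHAKE256 keyed by prefix)
    if len(key) != KEY_LEN:
        raise ValueError("key must be KEY_LEN bytes")
    return hashlib.shake_256(key + encode(seq)).digest(n) if n else b""


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def siv_wrap(key: bytes, a: bytes, p: bytes):
    t = prf(key, (p, a), TAG_LEN)          # T <- 0^t + F_K(P ◦ A)
    c = xor(p, prf(key, (t, a), len(p)))   # C <- P + F_K(T ◦ A)
    return c, t


def siv_unwrap(key: bytes, a: bytes, c: bytes, t: bytes) -> bytes:
    p = xor(c, prf(key, (t, a), len(c)))   # P <- C + F_K(T ◦ A)
    tau = prf(key, (p, a), TAG_LEN)        # tau <- 0^t + F_K(P ◦ A)
    if not hmac.compare_digest(tau, t):
        raise ValueError("tag mismatch")   # never release P on failure
    return p
```

There is no nonce: wrapping the same (A, P) twice gives the same (C, T), so an observer learns only whether two messages are equal.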

slide-12
SLIDE 12

Wide block cipher (WBC), as in [KT, eprint 2016/1188]

Encipher P with key K and tweak W:
  (L, R) ← split(P)
  R0 ← R0 + H_K(L ◦ 0)
  L ← L + G_K(R ◦ W ◦ 1)
  R ← R + G_K(L ◦ W ◦ 0)
  L0 ← L0 + H_K(R ◦ 1)
  C ← L ∥ R
  return ciphertext C of length |P|

[Figure: four-round Feistel network from (P_left, P_right) to (C_left, C_right), with round functions H_K(… ◦ 0), G_K(… ◦ 1), G_K(… ◦ 0), H_K(… ◦ 1); the tweak W enters the two G rounds]

Inspired by HHFHFH of [Bernstein, Nandi & Sarkar, Dagstuhl 2016]

9 / 35
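A sketch of the wide block cipher pseudocode, with deciphering obtained by undoing the four Feistel steps in reverse order. Several choices here are assumptions, not taken from the slide: split() cuts P in half, R0 and L0 are read as the first H_BYTES bytes (H_BYTES is a hypothetical parameter), the trailing 0/1 are encoded as one-byte strings, and g_k and h_k are SHAKE256-based stand-ins separated by a label.

```python
import hashlib

H_BYTES = 16  # hypothetical: number of leading bytes meant by "R0" and "L0"


def encode(seq: tuple) -> bytes:
    return b"".join(len(s).to_bytes(8, "big") + s for s in seq)


def g_k(key: bytes, seq: tuple, n: int) -> bytes:
    return hashlib.shake_256(b"G" + key + encode(seq)).digest(n)


def h_k(key: bytes, seq: tuple, n: int) -> bytes:
    return hashlib.shake_256(b"H" + key + encode(seq)).digest(n)


def add_prefix(x: bytes, pad: bytes) -> bytes:
    # Add pad to the first len(pad) bytes of x; the rest of x is unchanged
    return bytes(a ^ b for a, b in zip(x, pad)) + x[len(pad):]


def split(x: bytes):
    h = len(x) // 2
    return x[:h], x[h:]


def encipher(key: bytes, w: bytes, p: bytes) -> bytes:
    if len(p) < 2 * H_BYTES:
        raise ValueError("input too short for this sketch")
    l, r = split(p)
    r = add_prefix(r, h_k(key, (l, b"\x00"), H_BYTES))     # R0 <- R0 + H_K(L ◦ 0)
    l = add_prefix(l, g_k(key, (r, w, b"\x01"), len(l)))   # L <- L + G_K(R ◦ W ◦ 1)
    r = add_prefix(r, g_k(key, (l, w, b"\x00"), len(r)))   # R <- R + G_K(L ◦ W ◦ 0)
    l = add_prefix(l, h_k(key, (r, b"\x01"), H_BYTES))     # L0 <- L0 + H_K(R ◦ 1)
    return l + r


def decipher(key: bytes, w: bytes, c: bytes) -> bytes:
    if len(c) < 2 * H_BYTES:
        raise ValueError("input too short for this sketch")
    l, r = split(c)
    # Undo the four steps of encipher in reverse order
    l = add_prefix(l, h_k(key, (r, b"\x01"), H_BYTES))
    r = add_prefix(r, g_k(key, (l, w, b"\x00"), len(r)))
    l = add_prefix(l, g_k(key, (r, w, b"\x01"), len(l)))
    r = add_prefix(r, h_k(key, (l, b"\x00"), H_BYTES))
    return l + r
```

Each step only modifies one half using the other, which is why the same round functions serve for both directions.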

slide-14
SLIDE 14

How to build a PRF?

By icelight (flickr.com)

10 / 35

slide-15
SLIDE 15

Sponge [Keccak Team, Ecrypt 2008]

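[Figure: sponge construction; input blocks are added to the outer part (r bits) of the state with f applied in between (absorbing), then output blocks are read from the outer part with f applied in between (squeezing); the inner part (c bits) is never directly read or written]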

Taking K as first part of input gives a PRF

11 / 35
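A sketch of the sponge construction and of the keyed-sponge PRF obtained by taking K as the first part of the input. The permutation toy_f is a hypothetical, deliberately simple bijection on 384 bits so the example runs standalone; it is NOT cryptographic and stands in for a real f such as Keccak-p or Xoodoo. Padding is pad10*1 at byte granularity; r = 16 bytes is an illustrative choice.

```python
WIDTH = 48   # b = 384 bits (state size in bytes)
RATE = 16    # r = 128 bits, so c = 256 bits (illustrative)
MASK = (1 << (8 * WIDTH)) - 1
ODD = 0x9E3779B97F4A7C15F39CC0605CEDC835  # any odd constant


def toy_f(state: bytes) -> bytes:
    """Toy bijection on 384 bits. NOT a cryptographic permutation."""
    x = int.from_bytes(state, "little")
    for i in range(8):
        x = (x * ODD) & MASK   # multiplication by an odd number mod 2^384 is invertible
        x ^= x >> 157          # xorshift is invertible
        x = (x + i + 1) & MASK
    return x.to_bytes(WIDTH, "little")


def pad(msg: bytes, r: int) -> bytes:
    # pad10*1 at byte level: 0x01, zero bytes, 0x80 (0x81 if only one byte fits)
    tail = bytearray(r - len(msg) % r)
    tail[0] |= 0x01
    tail[-1] |= 0x80
    return msg + bytes(tail)


def sponge(f, data: bytes, out_len: int, r: int = RATE) -> bytes:
    state = bytes(WIDTH)
    m = pad(data, r)
    for i in range(0, len(m), r):                 # absorbing
        outer = bytes(a ^ b for a, b in zip(state[:r], m[i:i + r]))
        state = f(outer + state[r:])             # inner part never touched by input
    out = bytearray()
    while True:                                  # squeezing
        out += state[:r]
        if len(out) >= out_len:
            return bytes(out[:out_len])
        state = f(state)


def keyed_sponge_prf(key: bytes, data: bytes, out_len: int) -> bytes:
    # K as first part of the input; a fixed key length keeps key || data unambiguous
    if len(key) != 32:
        raise ValueError("sketch assumes 32-byte keys")
    return sponge(toy_f, key + data, out_len)
```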

slide-16
SLIDE 16

More efficient: donkeySponge [Keccak Team, DIAC 2012]

12 / 35

slide-17
SLIDE 17

Incrementality: duplex [Keccak Team, SAC 2011]

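[Figure: duplex construction with outer part (r bits) and inner part (c bits); after initialization, each duplexing call pads its input σi, adds it to the outer part, applies f and returns Zi, a truncation of the outer part: σ0 → Z0, σ1 → Z1, σ2 → Z2, …]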

13 / 35
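A sketch of the duplex object: each duplexing call pads σ into one r-byte block, adds it to the outer part, applies f and returns a truncation of the outer part. As before, toy_f is a hypothetical, NOT cryptographic bijection used only so the code runs standalone; keying by a first call with σ0 = K is one possible use, shown in the test.

```python
WIDTH, RATE = 48, 16
MASK = (1 << (8 * WIDTH)) - 1
ODD = 0x9E3779B97F4A7C15F39CC0605CEDC835


def toy_f(state: bytes) -> bytes:
    """Toy bijection on 384 bits. NOT a cryptographic permutation."""
    x = int.from_bytes(state, "little")
    for i in range(8):
        x = (x * ODD) & MASK
        x ^= x >> 157
        x = (x + i + 1) & MASK
    return x.to_bytes(WIDTH, "little")


def pad(msg: bytes, r: int) -> bytes:
    # pad10*1 at byte level
    tail = bytearray(r - len(msg) % r)
    tail[0] |= 0x01
    tail[-1] |= 0x80
    return msg + bytes(tail)


class Duplex:
    def __init__(self, f, r: int = RATE):
        self.f, self.r = f, r
        self.state = bytes(WIDTH)                        # initialize

    def duplexing(self, sigma: bytes, out_len: int) -> bytes:
        if len(sigma) >= self.r or out_len > self.r:
            raise ValueError("sigma must fit one padded block, Z the outer part")
        block = pad(sigma, self.r)                       # exactly r bytes
        outer = bytes(a ^ b for a, b in zip(self.state[:self.r], block))
        self.state = self.f(outer + self.state[self.r:])
        return self.state[:out_len]                      # trunc
```

Unlike a sponge, the state carries over between calls, so Z1 depends on both σ0 and σ1.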

slide-18
SLIDE 18

More efficient: MonkeyDuplex [Keccak Team, DIAC 2012]

Instances: Ketje [Keccak Team, now extended with Ronny Van Keer, CAESAR 2014] + half a dozen other CAESAR submissions

14 / 35

slide-19
SLIDE 19

Consolidation: Full-state keyed duplex

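[Figure: full-state keyed duplex; the state is initialized with K and iv, and each call applies f, returns Z and adds σ over the full state]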

[Mennink, Reyhanitabar, & Vizar, Asiacrypt 2015] [Daemen, Mennink & Van Assche, Asiacrypt 2017]

15 / 35

slide-20
SLIDE 20

SAE with full-state keyed duplex: Motorist [KT, Keyak 2015]

[Figure: Motorist session started from SUV returns T(0); then (A(1), P(1)) → (C(1), T(1)), P(2) → (C(2), T(2)), A(3) → T(3)]

16 / 35

slide-21
SLIDE 21

How to build a parallelizable PRF?

by Peter Miller (flickr.com)

17 / 35

slide-22
SLIDE 22

How to build a parallelizable PRF?

by Barilla Food Service

17 / 35

slide-23
SLIDE 23

Farfalle: early attempt [KT 2014-2016]

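[Figure: input blocks M0, M1, …, Mi, each masked with a value derived from k (index 1, …, i) and passed through f, then accumulated; output blocks Z0, Z1, …, Zj obtained from the accumulator with f and k (index 1, …, j)]

Similar to Protected Counter Sums [Bernstein, “stretch”, JOC 1999]
Problem: collisions with higher-order differentials if f has low degree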

18 / 35

slide-25
SLIDE 25

Farfalle now [Keccak Team + Seth Hoffert, ToSC 2017]

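[Figure: Farfalle. K∥10* through pb gives the mask k. Compression: each block m0, m1, …, mi is masked with k rolled i times (roll_c), passed through pc, and the results are summed. The sum goes through pd. Expansion: the resulting state, rolled j times (roll_e), goes through pe and is masked with k′ to give output block zj; k′ is k rolled i + 2 times]

  • Input-mask rolling and pc: against accumulator collisions
  • State rolling, pe and output mask: against state retrieval at the output
  • Middle pd: against higher-order differential cryptanalysis (DC)
  • Input-output attacks have to deal with pe ◦ pd ◦ pc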

19 / 35
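A sketch of the Farfalle data flow described above, with all components passed in as parameters. Assumptions: the toy permutations and the one-bit-rotation rolling function are hypothetical placeholders (NOT cryptographic; a real rolling function has a huge order, such as 2^320 − 1 for Kravatte), padding is 10* at byte granularity, and k′ = roll_c^(i+2)(k) follows the i+2 annotation in the figure. The compression loop is parallelizable: each pc call depends only on its block and mask, and the sum is XOR.

```python
B = 48                                   # permutation width in bytes (384 bits)
MASK = (1 << (8 * B)) - 1
ODD = 0x9E3779B97F4A7C15F39CC0605CEDC835


def toy_perm(c: int):
    """Return a toy bijection on 384-bit ints, varied by c. NOT cryptographic."""
    def p(x: int) -> int:
        for i in range(6):
            x = (x * ODD) & MASK
            x ^= x >> 157
            x = (x + c + i) & MASK
        return x
    return p


def toy_roll(x: int) -> int:
    # Toy linear rolling: rotate left by one bit (order 384, far too small for real use)
    return ((x << 1) | (x >> (8 * B - 1))) & MASK


def pad10(data: bytes) -> list:
    data = data + b"\x01"                        # 10* padding at byte level
    data += bytes(-len(data) % B)
    return [int.from_bytes(data[i:i + B], "little") for i in range(0, len(data), B)]


def farfalle(key, msg, n, pb, pc, pd, pe, roll_c, roll_e) -> bytes:
    if len(key) >= B:
        raise ValueError("key must fit in one padded block")
    k = pb(pad10(key)[0])                        # k = pb(K || 10*)
    x, mask = 0, k
    for m in pad10(msg):                         # compression layer
        x ^= pc(m ^ mask)                        # sum of pc(m_i + roll_c^i(k))
        mask = roll_c(mask)
    k_out = roll_c(mask)                         # k' = roll_c^(i+2)(k), i = last block index
    y = pd(x)                                    # middle permutation
    out = bytearray()
    while len(out) < n:                          # expansion layer
        out += (pe(y) ^ k_out).to_bytes(B, "little")  # z_j = pe(roll_e^j(y)) + k'
        y = roll_e(y)
    return bytes(out[:n])
```

Usage, with four differently parameterized toy permutations: `farfalle(b"secret key", b"input", 100, toy_perm(1), toy_perm(2), toy_perm(3), toy_perm(4), toy_roll, toy_roll)`.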

slide-26
SLIDE 26

Kravatte = Farfalle with Keccak-p as in eprint 2016/1188

[Figure: the Farfalle structure of the previous slide, instantiated with Keccak-p]

  • Target security: 128 bits, including multi-target
  • pi = Keccak-p[1600], with 6, 6, 4 and 4 rounds in pb, pc, pd and pe respectively
  • Rolling function as in [Granger, Jovanovic, Mennink & Neves, EC 2016], linear with order 2^320 − 1

20 / 35

slide-29
SLIDE 29

Kravatte as in ToSC 2018

[Figure: the Farfalle structure with every permutation drawn as f]

Due to a theoretical attack reversing the last rounds, the number of rounds was increased:
  • pi = Keccak-p[1600] with 6 rounds in each of pb, pc, pd and pe: the "6666" or Achouffe configuration
Disadvantage of Kravatte: 200-byte granularity

21 / 35

slide-32
SLIDE 32

by Perrie Nicholas Smith (perriesmith.deviantart.com)

22 / 35

slide-33
SLIDE 33

Gimli [Bernstein, Kölbl, Lucks, Massolino, Mendel, Nawaz, Schneider, Schwabe, Standaert, Todo, Viguier, CHES 2017]

Gimli has ideal size and shape:
  • 48 bytes in 12 words of 32 bits
  • fits in the registers of ARM Cortex M3/M4
  • suitable for SIMD
For low-end platforms, Gimli has locality of operations:
  • minimizes swapping on AVR, M0, etc.
  • but limits diffusion, see e.g. [Mike Hamburg, 2017]
  • no problem for the nominal number of rounds (24)
  • not clear how many rounds are needed in Farfalle

23 / 35

slide-40
SLIDE 40

Xoodoo · [noun, mythical] · /zu: du:/ · Alpine mammal that lives in compact herds, can survive avalanches and is appreciated for the wide trails it creates in the landscape. Despite its fluffy appearance it is very robust and does not get distracted by side channels.

24 / 35

slide-41
SLIDE 41

Xoodoo [Keccak team with Seth Hoffert and Johan De Meulder]

https://github.com/XoodooTeam/Xoodoo

384-bit permutation

Main purpose: usage in Farfalle: XooPRF
  • Achouffe configuration
  • linear full-state rolling function of order 2^384 − 1
  • efficient on a wide range of platforms

But also for
  • small-state authenticated encryption, Ketje style
  • sponge-based hashing
  • …

Keccak-p philosophy ported to Gimli dimensions 3 × 4 × 32!

25 / 35

slide-47
SLIDE 47

Xoodoo state

[Figure: the 3 × 4 × 32 state with axes x, y, z, highlighting in turn the whole state, a plane, a lane and a column]

State: 3 horizontal planes, each consisting of 4 lanes of 32 bits

26 / 35
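A sketch of one possible in-memory representation of the state: state[y][x] is the 32-bit lane at (x, y), and bit z of that lane is the bit at (x, y, z). The plane shift A ≪ (t, v), used throughout the round function, moves the bit at (x, z) to (x + t mod 4, z + v mod 32): a lane permutation combined with a lane rotation. The mapping of lanes to bytes is left out here.

```python
M32 = 0xFFFFFFFF


def rotl32(w: int, v: int) -> int:
    v %= 32
    return ((w << v) | (w >> (32 - v))) & M32


def shift_plane(plane: list, t: int, v: int) -> list:
    """plane ≪ (t, v): bit (x, z) moves to (x + t mod 4, z + v mod 32)."""
    return [rotl32(plane[(x - t) % 4], v) for x in range(4)]


def get_bit(state: list, x: int, y: int, z: int) -> int:
    # state[y][x] is the lane at (x, y); bit z of it is the bit at (x, y, z)
    return (state[y][x] >> z) & 1


def column(state: list, x: int, z: int) -> list:
    # The 3 bits sharing (x, z), one per plane
    return [get_bit(state, x, y, z) for y in range(3)]
```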

slide-51
SLIDE 51

Xoodoo round function

[Figure: round function as the sequence θ, ρwest, χ, ρeast]

Iterated: nr rounds that differ only in the round constant

27 / 35

slide-52
SLIDE 52

Nonlinear mapping χ

[Figure: effect on one plane: the complement of plane 1, ANDed with plane 2, is added to it]

χ as in Keccak-p, but operating on 3-bit columns
  • involution
  • same propagation differentially and linearly

28 / 35
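A sketch of χ on the plane representation state[y][x] (assumed, as in a 32-bit-lane layout): B_y = ¬A_{y+1} · A_{y+2}, all computed from the input planes, then A_y ← A_y + B_y. Working lane-wise applies the 3-bit column map to all 128 columns at once; the test checks the involution property stated on the slide.

```python
M32 = 0xFFFFFFFF


def chi(state: list) -> list:
    """χ: A_y <- A_y + (¬A_{y+1}) · A_{y+2}, with + = XOR and · = AND."""
    a0, a1, a2 = state
    b0 = [~a1[x] & a2[x] for x in range(4)]   # B0 <- ¬A1 · A2
    b1 = [~a2[x] & a0[x] for x in range(4)]   # B1 <- ¬A2 · A0
    b2 = [~a0[x] & a1[x] for x in range(4)]   # B2 <- ¬A0 · A1
    return [
        [(a0[x] ^ b0[x]) & M32 for x in range(4)],
        [(a1[x] ^ b1[x]) & M32 for x in range(4)],
        [(a2[x] ^ b2[x]) & M32 for x in range(4)],
    ]
```

For example, a single 1 bit in plane 0 becomes a column (1, 1, 0): `chi([[1, 0, 0, 0], [0] * 4, [0] * 4])` gives `[[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]`.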

slide-53
SLIDE 53

Mixing layer θ

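[Figure: the column parity is folded into the θ-effect, which is added to the state]

Column parity mixer: compute the parity, fold it and add it to the state
  • good average diffusion
  • identity for states in the kernel (all columns of even parity)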

29 / 35
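A sketch of θ following the pseudocode on the Xoodoo pseudocode slide: P ← A0 + A1 + A2, E ← P ≪ (1, 5) + P ≪ (1, 14), then E is added to every plane. The state layout state[y][x] is an assumption. The test checks the two properties from this slide: states in the kernel are left unchanged, and a single active bit spreads to two columns (7 active bits in total).

```python
M32 = 0xFFFFFFFF


def rotl32(w: int, v: int) -> int:
    v %= 32
    return ((w << v) | (w >> (32 - v))) & M32


def theta(state: list) -> list:
    """Column parity mixer: A_y <- A_y + E for y in {0, 1, 2}."""
    p = [state[0][x] ^ state[1][x] ^ state[2][x] for x in range(4)]  # column parity
    # θ-effect: E = P ≪ (1, 5) + P ≪ (1, 14); lane x of E comes from lane x - 1 of P
    e = [rotl32(p[(x - 1) % 4], 5) ^ rotl32(p[(x - 1) % 4], 14) for x in range(4)]
    return [[state[y][x] ^ e[x] for x in range(4)] for y in range(3)]
```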

slide-57
SLIDE 57

Plane shift ρeast

[Figure: plane 1 shifted by (0, 1), plane 2 shifted by (2, 8)]

After χ and before θ: shifts planes y = 1 and y = 2 in different directions

30 / 35

slide-58
SLIDE 58

Plane shift ρwest

[Figure: plane 1 shifted by (1, 0), plane 2 shifted by (0, 11)]

After θ and before χ: shifts planes y = 1 and y = 2 in different directions

31 / 35

slide-59
SLIDE 59

Xoodoo pseudocode

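nr rounds, from i = 1 − nr to 0, with a 5-step round function:
  θ:     P ← A0 + A1 + A2
         E ← P ≪ (1, 5) + P ≪ (1, 14)
         Ay ← Ay + E for y ∈ {0, 1, 2}
  ρwest: A1 ← A1 ≪ (1, 0)
         A2 ← A2 ≪ (0, 11)
  ι:     A0,0 ← A0,0 + rc_i
  χ:     B0 ← ¬A1 · A2
         B1 ← ¬A2 · A0
         B2 ← ¬A0 · A1
         Ay ← Ay + By for y ∈ {0, 1, 2}
  ρeast: A1 ← A1 ≪ (0, 1)
         A2 ← A2 ≪ (2, 8)

Here + is bitwise XOR, · is bitwise AND, ¬ is bitwise complement, Ay is plane y, A0,0 is lane x = 0 of plane 0, and A ≪ (t, v) cyclically shifts plane A, moving the bit at (x, z) to (x + t, z + v).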

32 / 35
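A direct Python transcription of the pseudocode above, as a readable sketch rather than an optimized or reference implementation. Assumptions: the state is given as state[y][x] lanes, and the round constants are the 12 values we recall from the Xoodoo reference code (repository URL on the Xoodoo slide); check them against the specification before relying on them. Fewer rounds use the last nr constants, matching i = 1 − nr, …, 0.

```python
M32 = 0xFFFFFFFF
# Round constants rc_i for i = -11, ..., 0 (as recalled from the reference code)
RC = [0x00000058, 0x00000038, 0x000003C0, 0x000000D0, 0x00000120, 0x00000014,
      0x00000060, 0x0000002C, 0x00000380, 0x000000F0, 0x000001A0, 0x00000012]


def rotl32(w: int, v: int) -> int:
    v %= 32
    return ((w << v) | (w >> (32 - v))) & M32


def shift(plane: list, t: int, v: int) -> list:
    # plane ≪ (t, v): bit (x, z) moves to (x + t, z + v)
    return [rotl32(plane[(x - t) % 4], v) for x in range(4)]


def xoodoo(state: list, nr: int = 12) -> list:
    """Apply nr rounds to a state given as 3 planes of 4 32-bit lanes."""
    if not 0 <= nr <= 12:
        raise ValueError("this sketch only has constants for up to 12 rounds")
    a = [list(plane) for plane in state]
    for rc in RC[12 - nr:]:
        # θ
        p = [a[0][x] ^ a[1][x] ^ a[2][x] for x in range(4)]
        e = [rotl32(p[(x - 1) % 4], 5) ^ rotl32(p[(x - 1) % 4], 14) for x in range(4)]
        a = [[a[y][x] ^ e[x] for x in range(4)] for y in range(3)]
        # ρwest
        a[1], a[2] = shift(a[1], 1, 0), shift(a[2], 0, 11)
        # ι
        a[0][0] ^= rc
        # χ: B_y = ¬A_{y+1} · A_{y+2}, computed before any plane is updated
        b = [[~a[(y + 1) % 3][x] & a[(y + 2) % 3][x] for x in range(4)] for y in range(3)]
        a = [[a[y][x] ^ b[y][x] for x in range(4)] for y in range(3)]
        # ρeast
        a[1], a[2] = shift(a[1], 0, 1), shift(a[2], 2, 8)
    return a
```

A hand-checkable case: on the all-zero state with nr = 1, θ and ρwest do nothing, ι sets lane (0, 0) to 0x12, χ copies it into plane 1 and ρeast rotates that copy by one bit, so the result is `[[0x12, 0, 0, 0], [0x24, 0, 0, 0], [0, 0, 0, 0]]`.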

slide-60
SLIDE 60

Xoodoo software performance

                 width     cycles/byte per round
                 (bytes)   ARM Cortex M3   Intel Skylake
Keccak-p[1600]   200       2.44            0.080
ChaCha           64        0.69            0.059
Gimli            48        0.91            0.074∗
Xoodoo           48        1.20            0.083

∗ on Intel Haswell

33 / 35

slide-61
SLIDE 61

Xoodoo diffusion and confusion

Trail bounds, using [Mella, Daemen, Van Assche, ToSC 2016]: minimum trail weights

# rounds   differential   linear
1          2              2
2          8              8
3          36             36
6          ≥ 100          ≥ 100

Strict Avalanche Criterion (SAC) [Webster, Tavares, Crypto ’85]: a mapping satisfies SAC if flipping an input bit makes each output bit flip with probability close to 1/2

Xoodoo satisfies SAC
  • after 3 rounds in the forward direction
  • after 2 rounds in the backward direction

34 / 35
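A sketch of how SAC can be measured empirically: for random inputs x, flip each input bit i and count how often each output bit j flips, then report the worst deviation from 1/2. The function f, the bit widths, the sample count and the SHA-256-based demo mapping are all illustrative assumptions; the slide does not say how its SAC results were obtained. For a 384-bit permutation in pure Python this brute-force loop is slow, so small samples are advisable.

```python
import hashlib
import random


def sac_deviation(f, n_in: int, n_out: int, samples: int = 200, seed: int = 1) -> float:
    """Largest |Pr[output bit j flips | input bit i flipped] - 1/2| over all (i, j),
    estimated from random inputs; values near 0 mean f behaves as SAC requires."""
    rng = random.Random(seed)
    flips = [[0] * n_out for _ in range(n_in)]
    for _ in range(samples):
        x = rng.getrandbits(n_in)
        y = f(x)
        for i in range(n_in):
            d = y ^ f(x ^ (1 << i))          # output difference for a 1-bit input flip
            for j in range(n_out):
                flips[i][j] += (d >> j) & 1
    return max(abs(c / samples - 0.5) for row in flips for c in row)


def hashed16(x: int) -> int:
    # Demo mapping on 16 bits built from SHA-256, for illustration only
    return int.from_bytes(hashlib.sha256(x.to_bytes(2, "big")).digest()[:2], "big")
```

The identity map is the extreme failure case: each input flip changes exactly one output bit, so the deviation is 0.5, while `hashed16` stays far closer to 0.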

slide-62
SLIDE 62

Xoodoo diffusion and confusion

Trail bounds, using [Mella, Daemen, Van Assche, ToSC 2016]:

  • min. trail weights

# rounds diff. linear 1 2 2 2 8 8 3 36 36 6 ≥ 100 ≥ 100 Strict Avalanche Criterion (SAC) [Webster, Tavares, Crypto ’85] A mapping satisfies SAC if flipping an input bit will make each output bit flip with probability close to 1/2 Xoodoo satisfies SAC after 3 rounds in forward direction after 2 rounds in backward direction

34 / 35

slide-63
SLIDE 63

Xoodoo diffusion and confusion

Trail bounds, using [Mella, Daemen, Van Assche, ToSC 2016]:

  • min. trail weights

# rounds diff. linear 1 2 2 2 8 8 3 36 36 6 ≥ 100 ≥ 100 Strict Avalanche Criterion (SAC) [Webster, Tavares, Crypto ’85] A mapping satisfies SAC if flipping an input bit will make each output bit flip with probability close to 1/2 Xoodoo satisfies SAC after 3 rounds in forward direction after 2 rounds in backward direction

34 / 35

slide-64
SLIDE 64

Xoodoo diffusion and confusion

Trail bounds, using [Mella, Daemen, Van Assche, ToSC 2016]:

  • min. trail weights

# rounds diff. linear 1 2 2 2 8 8 3 36 36 6 ≥ 100 ≥ 100 Strict Avalanche Criterion (SAC) [Webster, Tavares, Crypto ’85] A mapping satisfies SAC if flipping an input bit will make each output bit flip with probability close to 1/2 Xoodoo satisfies SAC after 3 rounds in forward direction after 2 rounds in backward direction

34 / 35

slide-65
SLIDE 65

Thanks for your attention!

θ ρwest χ ρeast

35 / 35