Putting wings on SPHINCS (PQCRYPTO Conference, Stefan Kölbl, April 10th, 2018)

SLIDE 1

Putting wings on SPHINCS

PQCRYPTO Conference

Stefan Kölbl

April 10th, 2018

Technical University of Denmark, Cybercrypt

SLIDE 2

SPHINCS

  • Hash-based signature scheme
  • Stateless
  • 128-bit post-quantum security
  • Sizes:
    • Public key: 1 KB
    • Secret key: 1 KB
    • Signature: 41 KB

https://sphincs.cr.yp.to/

SLIDE 3

How to instantiate SPHINCS?

SLIDE 4

SPHINCS

Main components:

  • One-time Signature (WOTS)
  • Few-time Signature (HORST)
  • Merkle-Tree

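The Merkle trees that tie these components together can be sketched in a few lines. This is a simplified illustration, not the SPHINCS code: it assumes plain SHA-256 as the node hash H (SPHINCS additionally XORs bitmasks into the node inputs), and the helper names `merkle_root`, `auth_path`, and `root_from_path` are hypothetical.

```python
import hashlib


def H(left, right):
    """Node hash: two 256-bit children (512 bits) -> 256-bit parent."""
    return hashlib.sha256(left + right).digest()


def next_level(level):
    """Hash pairs of adjacent nodes to obtain the level above."""
    return [H(level[i], level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Root of a complete binary tree; len(leaves) must be a power of two."""
    level = leaves
    while len(level) > 1:
        level = next_level(level)
    return level[0]


def auth_path(leaves, index):
    """Siblings of the nodes on the path from leaf `index` up to the root."""
    path, level = [], leaves
    while len(level) > 1:
        path.append(level[index ^ 1])
        level = next_level(level)
        index //= 2
    return path


def root_from_path(leaf, index, path):
    """What a verifier does: rebuild the root from one leaf and its path."""
    node = leaf
    for sibling in path:
        node = H(node, sibling) if index % 2 == 0 else H(sibling, node)
        index //= 2
    return node


# 32 leaves, the size of one subtree in SPHINCS-256 (2^5 leaves)
leaves = [hashlib.sha256(bytes([i])).digest() for i in range(32)]
root = merkle_root(leaves)
path = auth_path(leaves, 5)
print(len(path))                                   # 5
print(root_from_path(leaves[5], 5, path) == root)  # True
```

A signature only has to carry the five siblings, not the whole tree, while the signer has to compute every node of the tree to obtain them.
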
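SLIDE 5

SPHINCS

[Figure: the SPHINCS hypertree. The message is signed with HORST at the bottom; above it, 12 levels of Merkle trees (Level 1 to Level 12) are stacked.]

SLIDE 6

SPHINCS

[Figure: the leaves of each tree are WOTS public keys (pk); WOTS signatures (OTSsign) link each tree to the next level.]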

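The one-time signatures at the leaves are hash chains. The following is a minimal Winternitz-style sketch, assuming w = 16 and 256-bit hashes, which gives 67 chains (64 message digits plus 3 checksum digits). Unlike SPHINCS it uses bare SHA-256 for F without bitmasks and derives the secret chain starts with SHA-256 instead of a PRF; `keygen`, `sign`, and `verify` are hypothetical names.

```python
import hashlib

W = 16              # Winternitz parameter
LEN1 = 64           # a 256-bit digest gives 64 base-16 digits
LEN2 = 3            # checksum <= 64 * 15 = 960 < 16**3, so 3 digits
LEN = LEN1 + LEN2   # 67 hash chains


def F(x):
    """Chain step F: 256 bits -> 256 bits (bare SHA-256, no bitmasks)."""
    return hashlib.sha256(x).digest()


def chain(x, steps):
    for _ in range(steps):
        x = F(x)
    return x


def digits(message):
    """Base-16 digits of the message digest followed by the checksum digits."""
    digest = hashlib.sha256(message).digest()
    d = [nibble for byte in digest for nibble in (byte >> 4, byte & 0x0F)]
    csum = sum(W - 1 - v for v in d)
    return d + [(csum >> 8) & 0xF, (csum >> 4) & 0xF, csum & 0xF]


def keygen(seed):
    """Secret chain starts derived from a seed (SHA-256 here instead of a PRF)."""
    sk = [hashlib.sha256(seed + i.to_bytes(4, "big")).digest() for i in range(LEN)]
    pk = [chain(s, W - 1) for s in sk]   # LEN * (W - 1) = 1005 calls to F
    return sk, pk


def sign(sk, message):
    return [chain(s, b) for s, b in zip(sk, digits(message))]


def verify(pk, message, sig):
    """Finish each chain from the signature value and compare with the public key."""
    return all(chain(s, W - 1 - b) == p for s, b, p in zip(sig, digits(message), pk))


sk, pk = keygen(b"example seed")
sig = sign(sk, b"hello")
print(verify(pk, b"hello", sig))   # True
print(verify(pk, b"hellp", sig))   # False
```

Generating a single key pair already costs 67 · 15 = 1005 calls to F, and every leaf of every tree on the signing path needs one.
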
SLIDE 7

SPHINCS

What is computed?

  • Many calls to a hash function...
  • ...but only on short inputs.

[Figure: a chain of repeated calls to a function f, each on a short input.]

SLIDE 8

SPHINCS

For one signature:

  • ≈450,000 calls to F
  • ≈90,000 calls to H

F: {0,1}^256 → {0,1}^256
H: {0,1}^512 → {0,1}^256

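A back-of-the-envelope model reproduces these counts from the SPHINCS-256 parameters (h = 60, d = 12, w = 16, 67 WOTS chains, t = 2^16). It assumes that signing recomputes all leaves of the d subtrees on the path plus the full HORST tree, and it ignores the comparatively few extra calls for the WOTS signature values and the PRF calls for key derivation.

```python
# SPHINCS-256 parameters (from the SPHINCS paper)
h, d = 60, 12      # hypertree height and number of layers
w, ell = 16, 67    # Winternitz parameter and number of WOTS chains (n = 256)
t = 2 ** 16        # number of HORST secret values (leaves of the HORST tree)

leaves = 2 ** (h // d)   # 32 WOTS key pairs per subtree

# F: each WOTS chain is walked to its end (w - 1 steps) to obtain the public
# keys of all leaves of the d subtrees, and each HORST secret value is hashed once.
f_calls = d * leaves * ell * (w - 1) + t

# H: each WOTS public key is compressed by an L-tree (ell - 1 calls), each
# subtree has leaves - 1 inner nodes, and the HORST tree has t - 1 inner nodes.
h_calls = d * leaves * (ell - 1) + d * (leaves - 1) + (t - 1)

print(f_calls, h_calls)  # 451456 91251
```

Under this model, about 385,000 of the F calls come from the WOTS chains and about 65,500 of the H calls from the HORST tree.
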
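SLIDE 9

Cryptographic Hash Functions

Which hash function could we use?

  • Standards
    • SHA-256
    • SHA-3
  • ChaCha12 permutation
  • Keccak
  • Haraka
  • Simpira

SLIDE 10

Cryptographic Hash Functions

SHA-2 (FIPS PUB 180-4)

  • 512-bit message blocks
  • Padding...

[Figure: Merkle-Damgård iteration: the compression function f is applied to the IV and the first message block M1, then to each further block M2, ..., Mn in turn; the final chaining value is the hash.]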

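Padding is what makes SHA-256 awkward for H: a 32-byte input fits into one 512-bit block together with its padding, but a 64-byte input no longer leaves room for it and needs a second compression call. The sketch counts blocks after FIPS 180-4 padding; `sha256_pad` and `compression_calls` are illustrative helpers, not library functions.

```python
def sha256_pad(msg):
    """FIPS 180-4 padding: a 0x80 byte, zero bytes, then the 64-bit message bit length."""
    zeros = (55 - len(msg)) % 64
    return msg + b"\x80" + b"\x00" * zeros + (8 * len(msg)).to_bytes(8, "big")


def compression_calls(n_bytes):
    """Number of 512-bit blocks after padding = calls to the compression function f."""
    return len(sha256_pad(bytes(n_bytes))) // 64


for n in (32, 55, 56, 64):
    print(n, compression_calls(n))
# 32 1 / 55 1 / 56 2 / 64 2
```
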
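SLIDE 11

Cryptographic Hash Functions

SHA3-256 (FIPS PUB 202)

  • 1600-bit permutation
  • 1088-bit message blocks

[Figure: sponge construction with rate r and capacity c: message blocks M0, M1, M2 are absorbed with calls to the permutation π, then output blocks h0, h1 are squeezed to form h.]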

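With SHA3-256 the problem is reversed: the 1088-bit rate covers both input sizes with a single call to the 1600-bit permutation, but most of the absorbed block is padding. A short sketch of the block count; the helper name `permutation_calls` is hypothetical.

```python
STATE_BITS = 1600
RATE_BITS = 1088                        # SHA3-256 rate r
CAPACITY_BITS = STATE_BITS - RATE_BITS  # capacity c = 512
RATE_BYTES = RATE_BITS // 8             # 136


def permutation_calls(n_bytes):
    """Calls to Keccak-f[1600] to absorb n_bytes plus at least one padding byte.
    The 256-bit output fits into one rate block, so squeezing adds no call."""
    return -(-(n_bytes + 1) // RATE_BYTES)   # ceiling division


for n in (32, 64):
    print(n, permutation_calls(n), f"{n / RATE_BYTES:.0%} of the rate used")
# 32 1 24% of the rate used
# 64 1 47% of the rate used
```
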
SLIDE 12

Cryptographic Hash Functions

Other Keccak variants:

  • Use the 800-bit permutation?
  • Use fewer rounds (Kangaroo12¹).
  • Best preimage attack: 4 rounds².

¹ see https://eprint.iacr.org/2016/770
² Linear Structures: Applications to Cryptanalysis of Round-Reduced Keccak, Asiacrypt 2016

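FIPS 202 defines Keccak-f[b] for b = 25 · 2^l with lanes of b/25 bits and 12 + 2l rounds. The small helper below (hypothetical name `keccak_params`) computes both values: Keccak-f[800] works on 32-bit words and has 22 rounds, against 64-bit lanes and 24 rounds for Keccak-f[1600]. Kangaroo12 takes the other option from the slide and keeps the 1600-bit permutation but reduces it to 12 rounds.

```python
def keccak_params(b):
    """Lane width (bits) and round count of Keccak-f[b], with b = 25 * 2**l."""
    lane_bits = b // 25
    l = lane_bits.bit_length() - 1   # lane_bits == 2**l
    return lane_bits, 12 + 2 * l


for b in (1600, 800):
    print(b, keccak_params(b))
# 1600 (64, 24)
# 800 (32, 22)
```
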
SLIDE 13

Cryptographic Hash Functions

ChaCha12

  • Suggested in the SPHINCS paper.
  • Use the ChaCha12 permutation in a sponge.
  • Great software performance with vectorization.

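The sketch below implements the ChaCha quarter round and the 12-round permutation (alternating column and diagonal rounds as in RFC 8439, without the final addition of the input that the stream cipher uses) and wraps it in a toy sponge. The 32-byte rate, the initial state, and the missing padding are assumptions for illustration; this is not necessarily the exact construction from the SPHINCS paper.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(s, a, b, c, d):
    """ChaCha quarter round on words a, b, c, d of state s (in place)."""
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl32(s[b] ^ s[c], 7)


def chacha_permute(state, rounds=12):
    """ChaCha permutation on 16 words: rounds/2 double rounds, no feed-forward."""
    s = list(state)
    for _ in range(rounds // 2):
        for a, b, c, d in ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)):
            quarter_round(s, a, b, c, d)   # column round
        for a, b, c, d in ((0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)):
            quarter_round(s, a, b, c, d)   # diagonal round
    return s


# Hypothetical initial state: the ChaCha constants in the capacity part,
# so that the all-zero input does not stay the all-zero state.
IV = [0] * 8 + [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574] + [0] * 4


def toy_sponge_hash(msg):
    """Toy sponge: 32-byte rate (words 0-7), 32-byte capacity, 32-byte output.
    Only for fixed-length inputs that are a multiple of 32 bytes (no padding)."""
    if len(msg) % 32:
        raise ValueError("input length must be a multiple of 32 bytes")
    state = list(IV)
    for off in range(0, len(msg), 32):
        for i in range(8):
            state[i] ^= int.from_bytes(msg[off + 4 * i: off + 4 * i + 4], "little")
        state = chacha_permute(state)
    return b"".join(word.to_bytes(4, "little") for word in state[:8])


print(len(toy_sponge_hash(bytes(32))), len(toy_sponge_hash(bytes(64))))  # 32 32
```

In this toy version F (32-byte input) costs one permutation call and H (64-byte input) two; all operations are 32-bit additions, XORs, and rotations, which map directly onto vector instructions.
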
SLIDE 14

Cryptographic Hash Functions

Haraka: A short-input hash function³

  • Permutation based on AES rounds.
  • SPN construction.
  • 256- and 512-bit permutation.

[Figure: the input x is processed by the permutation π and truncated (trunc) to give H(x).]

³ https://eprint.iacr.org/2016/098

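Haraka-512 v2 is defined with a feed-forward before truncation, H(x) = Trunc(π(x) ⊕ x). The sketch contrasts this with plain truncation, using a deliberately insecure toy permutation (`toy_perm` is hypothetical and stands in for the AES-based π): without the feed-forward, anyone who can invert π obtains preimages directly. The prefix truncation is an assumption and need not match Haraka's actual Trunc.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def feed_forward_hash(x, perm, out_len=32):
    """H(x) = trunc(perm(x) XOR x): permutation, feed-forward, then truncation.
    Truncation here simply keeps the first out_len bytes (an assumption)."""
    return xor_bytes(perm(x), x)[:out_len]


def plain_trunc_hash(x, perm, out_len=32):
    """Same construction WITHOUT feed-forward: H(x) = trunc(perm(x))."""
    return perm(x)[:out_len]


# Toy stand-in for an AES-based permutation: rotate the bytes and XOR a
# position-dependent constant. It is a bijection but has no security at all.
def toy_perm(x):
    n = len(x)
    return bytes(x[(i + 1) % n] ^ ((37 * i) & 0xFF) for i in range(n))


def toy_perm_inv(y):
    n = len(y)
    return bytes(y[(j - 1) % n] ^ ((37 * ((j - 1) % n)) & 0xFF) for j in range(n))


# Without feed-forward, a preimage is found by inverting the permutation
# on the target output padded with arbitrary bytes.
target = bytes(range(32))
x = toy_perm_inv(target + bytes(32))
print(plain_trunc_hash(x, toy_perm) == target)   # True
print(len(feed_forward_hash(x, toy_perm)))       # 32
```
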
SLIDE 15

Cryptographic Hash Functions

Simpira⁴

  • Permutation based on AES rounds.
  • Feistel construction.
  • 256- and 512-bit permutation.

[Figure: the input x is processed by the permutation π and truncated (trunc) to give H(x).]

⁴ https://eprint.iacr.org/2016/122

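The point of a Feistel construction is that it is a permutation for any round function, even a non-invertible one. A minimal two-branch sketch, assuming a truncated SHA-256 as a hypothetical stand-in for Simpira's AES-based round function and an arbitrarily chosen number of rounds:

```python
import hashlib

HALF = 16  # bytes per branch: a 256-bit state split into two 128-bit halves


def round_fn(branch, i):
    """Hypothetical stand-in for Simpira's AES-based round function:
    truncated SHA-256 of the round index and the branch (not invertible)."""
    return hashlib.sha256(i.to_bytes(4, "big") + branch).digest()[:HALF]


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def feistel(state, rounds=8):
    """Two-branch Feistel network: (L, R) -> (R, L XOR F_i(R)) per round."""
    left, right = state[:HALF], state[HALF:]
    for i in range(rounds):
        left, right = right, xor_bytes(left, round_fn(right, i))
    return left + right


def feistel_inverse(state, rounds=8):
    """Undo the rounds in reverse order; only forward calls of round_fn are needed."""
    left, right = state[:HALF], state[HALF:]
    for i in reversed(range(rounds)):
        left, right = xor_bytes(right, round_fn(left, i)), left
    return left + right


x = bytes(range(32))
y = feistel(x)
print(feistel_inverse(y) == x)  # True
```
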
SLIDE 16

Microarchitectures

SPHINCS is not well suited for small devices⁵

  • Signature size is larger than the RAM of some devices.
  • Computational cost of signing is high...
  • ... but verification is cheap.

Focus on high-end platforms:

  • Intel Haswell/Skylake, AMD Ryzen
  • ARM Cortex A57/A72

⁵ see https://eprint.iacr.org/2015/1042

SLIDE 17

Microarchitectures

How to get a fast implementation?

  • Vectorization (AVX2, NEON, AVX-512)
  • Hardware support (AES, SHA-2, SHA-3)
  • Utilize the pipeline

SLIDE 18

Microarchitectures

Vector instructions

X0 ⊕ Y0 = Z0
X1 ⊕ Y1 = Z1
X2 ⊕ Y2 = Z2
X3 ⊕ Y3 = Z3
X4 ⊕ Y4 = Z4
X5 ⊕ Y5 = Z5
X6 ⊕ Y6 = Z6
X7 ⊕ Y7 = Z7

  • Apply the same operation to all elements of the vector.
  • Use independent inputs.

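The eight XORs above can be mimicked in Python by packing eight 32-bit lanes into one 256-bit integer, the software analogue of a single `_mm256_xor_si256` on an AVX2 `__m256i` register. The `pack` and `unpack` helpers are illustrative only; real implementations use the intrinsics directly.

```python
LANES, WIDTH = 8, 32          # 8 x 32-bit lanes = one 256-bit AVX2 register
MASK = (1 << WIDTH) - 1


def pack(words):
    """Place word i in bits [32*i, 32*i + 32) of one 256-bit integer."""
    v = 0
    for i, word in enumerate(words):
        v |= (word & MASK) << (WIDTH * i)
    return v


def unpack(v):
    return [(v >> (WIDTH * i)) & MASK for i in range(LANES)]


X = [0x11111111 * i for i in range(LANES)]
Y = [0x01020304 + i for i in range(LANES)]

# One XOR on the packed value computes all eight Zi = Xi XOR Yi at once,
# because XOR never carries bits from one lane into the next.
Z = unpack(pack(X) ^ pack(Y))
print(Z == [x ^ y for x, y in zip(X, Y)])  # True
```

The lanes must hold independent data: eight unrelated hash computations (for example eight WOTS chains) fill a vector, whereas one sequential chain cannot.
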
SLIDES 19-21

Microarchitectures

Pipelining

  • Latency
  • Inverse throughput

[Figure: timeline in cycles of aesenc instructions. A dependent aesenc has to wait the full latency L_aesenc; independent aesenc instructions can start every T^-1 cycles (inverse throughput) and overlap in the pipeline.]

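An idealized timing model makes the distinction concrete: a chain of dependent aesenc instructions pays the full latency each time, while independent ones overlap. The latency of 4 cycles and inverse throughput of 1 cycle below are hypothetical example values, and the model ignores port conflicts and other pipeline effects.

```python
def dependent_cycles(n, latency):
    """n aesenc, each consuming the previous result: nothing overlaps."""
    return n * latency


def independent_cycles(n, latency, inv_throughput):
    """n independent aesenc: one can start every inv_throughput cycles,
    and the last one completes `latency` cycles after it started."""
    return (n - 1) * inv_throughput + latency


# Hypothetical example values: latency 4 cycles, inverse throughput 1 cycle.
L, T_inv = 4, 1
for n in (1, 4, 8):
    print(n, dependent_cycles(n, L), independent_cycles(n, L, T_inv))
# 1 4 4
# 4 16 7
# 8 32 11
```

With these numbers, eight independent AES rounds finish in 11 cycles instead of 32, which is why interleaving many independent hash computations pays off.
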
SLIDE 22

Platforms

Performance varies a lot depending on the platform:

Platform     Instruction       Latency (cycles)   Inv. throughput (cycles)
Skylake      vectorized XOR    1                  0.33
Ryzen        vectorized XOR    1                  0.5
Cortex A57   vectorized XOR    3                  2

SLIDES 23-28

Implementations

How to implement those functions efficiently?

  • SHA-2
    • 32-bit word oriented
    • Vectorize
    • Hardware support
  • Keccak[b = 800]
    • 32-bit word oriented
    • Vectorize
  • ChaCha12
    • 32-bit word oriented
    • Vectorize
  • Haraka
    • AES + permute
  • Simpira
    • AES

SLIDE 29

Tour de SPHINCS

SLIDES 30-36

Tour de SPHINCS

Intel Skylake

  • AVX2 (256-bit vectors)
  • AES-NI

Signing (million cycles)

Design      Skylake
ChaCha12      43.49
Haraka        20.78
Keccak       108.62
SHA-256      142.06
Simpira       28.40

SLIDES 37-43

Tour de SPHINCS

AMD Ryzen

  • AVX2 (256-bit vectors)
  • AES-NI (2 ports)
  • SHA-256 instructions

Signing (million cycles)

Design      Ryzen
ChaCha12    63.42
Haraka      15.54
Keccak     189.98
SHA-256     53.33
Simpira     20.43

SLIDES 44-50

Tour de SPHINCS

ARM Cortex A57

  • NEON (128-bit vectors)
  • AES
  • SHA-256 support

Signing (million cycles)

Design      Cortex A57
ChaCha12        193.51
Haraka           47.10
Keccak          376.90
SHA-256          92.08
Simpira          63.48

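The three signing tables can be summarized with a few lines of arithmetic. The numbers are copied from the tables; `summarize` is an illustrative helper. Note that cycle counts on different CPUs do not translate directly into wall-clock time, since clock frequencies differ.

```python
SIGNING_MCYCLES = {
    "Skylake": {"ChaCha12": 43.49, "Haraka": 20.78, "Keccak": 108.62, "SHA-256": 142.06, "Simpira": 28.40},
    "Ryzen": {"ChaCha12": 63.42, "Haraka": 15.54, "Keccak": 189.98, "SHA-256": 53.33, "Simpira": 20.43},
    "Cortex A57": {"ChaCha12": 193.51, "Haraka": 47.10, "Keccak": 376.90, "SHA-256": 92.08, "Simpira": 63.48},
}


def summarize(table):
    """Per platform: fastest design, slowest design, and slowest/fastest ratio."""
    summary = {}
    for platform, results in table.items():
        fastest = min(results, key=results.get)
        slowest = max(results, key=results.get)
        summary[platform] = (fastest, slowest, results[slowest] / results[fastest])
    return summary


for platform, (fastest, slowest, gap) in summarize(SIGNING_MCYCLES).items():
    print(f"{platform}: fastest {fastest}, slowest {slowest}, gap {gap:.1f}x")
# Skylake: fastest Haraka, slowest SHA-256, gap 6.8x
# Ryzen: fastest Haraka, slowest Keccak, gap 12.2x
# Cortex A57: fastest Haraka, slowest Keccak, gap 8.0x
```
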
SLIDE 51

Formula SPHINCS

Hash performance for F (cycles per byte)

Design      Skylake   Ryzen   Cortex-A57
ChaCha         1.71    2.73        7.3
Haraka         0.63    0.39        1.08
Keccak         4.11    6.94       16.71
SHA256         5.52    2.44        3.91
Simpira        0.94    0.49        1.85

SLIDE 52

Formula SPHINCS

Hash performance for H (cycles per byte)

Design      Skylake   Ryzen   Cortex-A57
ChaCha         1.71    2.73        7.15
Haraka         0.72    0.48        1.44
Keccak         2.20    3.55        8.68
SHA256         2.58    1.13        1.82
Simpira        0.94    0.49        1.51

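Combining the per-byte figures with the call counts gives a rough estimate of how much of a signature is spent inside F and H. This is a sketch under two assumptions: the cycles-per-byte values refer to the input length (32 bytes for F, 64 for H), and the rounded counts of about 450,000 F and 90,000 H calls per signature hold. The remainder of the measured signing time would then come from work outside F and H or from the model's simplifications.

```python
# Cycles per byte (Skylake, Ryzen, Cortex-A57), copied from the two charts.
CPB_F = {
    "ChaCha": (1.71, 2.73, 7.3), "Haraka": (0.63, 0.39, 1.08), "Keccak": (4.11, 6.94, 16.71),
    "SHA256": (5.52, 2.44, 3.91), "Simpira": (0.94, 0.49, 1.85),
}
CPB_H = {
    "ChaCha": (1.71, 2.73, 7.15), "Haraka": (0.72, 0.48, 1.44), "Keccak": (2.20, 3.55, 8.68),
    "SHA256": (2.58, 1.13, 1.82), "Simpira": (0.94, 0.49, 1.51),
}
PLATFORMS = ("Skylake", "Ryzen", "Cortex-A57")

F_CALLS, H_CALLS = 450_000, 90_000   # per signature, rounded as on the slides
F_BYTES, H_BYTES = 32, 64            # assumed input lengths of F and H


def hashing_mcycles(design, platform):
    """Rough hashing cost of one signature in million cycles,
    assuming cycles per byte refer to the input length."""
    p = PLATFORMS.index(platform)
    f = F_CALLS * F_BYTES * CPB_F[design][p]
    h = H_CALLS * H_BYTES * CPB_H[design][p]
    return (f + h) / 1e6


print(f"{hashing_mcycles('Haraka', 'Skylake'):.2f}")  # 13.22
```

For comparison, the measured signing time for Haraka on Skylake is 20.78 million cycles.
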
SLIDE 53

NIST PQ Competition

Two variants of SPHINCS in the NIST PQ competition:

  • Gravity-SPHINCS
    • Results apply directly.
    • Already uses Haraka.
  • SPHINCS+
    • Tweakable hash.
    • Needs to process slightly larger inputs.

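SPHINCS+ replaces F and H by tweakable hash functions that additionally take a public seed and an address identifying each call. A minimal sketch in the spirit of a simple SHA-256 instantiation; the 32-byte seed, the hypothetical `address` layout, and plain concatenation are assumptions for illustration, not the SPHINCS+ specification. It shows where the larger inputs come from.

```python
import hashlib
import struct

N = 32  # hash output and seed length in bytes (256-bit setting, assumed)


def address(layer, tree, kind, keypair, chain, position):
    """Hypothetical 32-byte address of one hash call (not the SPHINCS+ ADRS layout)."""
    return struct.pack(">IQIIII4x", layer, tree, kind, keypair, chain, position)


def tweakable_hash(pub_seed, adrs, msg):
    """T(seed, adrs, msg) = SHA-256(seed || adrs || msg): every call gets its own tweak."""
    return hashlib.sha256(pub_seed + adrs + msg).digest()


def sha256_blocks(n_bytes):
    """Compression-function calls for an n-byte input after padding."""
    return (n_bytes + 9 + 63) // 64


seed = bytes(N)
adrs = address(layer=0, tree=3, kind=0, keypair=7, chain=2, position=5)
out = tweakable_hash(seed, adrs, bytes(N))

# A chain step (F) now hashes 96 bytes instead of 32; H hashes 128 instead of 64.
print(len(adrs), sha256_blocks(32), sha256_blocks(N + len(adrs) + N))  # 32 1 2
```
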
SLIDE 54

Conclusion

Summary:

  • Gap between the fastest and slowest design: up to 10x.
  • Verification can be really fast, e.g. 258,660 cycles on Ryzen.

Future platforms:

  • Larger vectors (AVX-512)
  • Vectorized AES (Intel Ice Lake)
  • SHA-3 instructions (ARMv8.4-A)

SLIDE 55

Questions?

https://github.com/kste/sphincs