Part I: Introduction to Post Quantum Cryptography, Tutorial@CHES 2017



slide-1
SLIDE 1

Part I: Introduction to Post Quantum Cryptography

Tutorial@CHES 2017 - Taipei Tim Güneysu Ruhr-Universität Bochum & DFKI 04.10.2017

slide-2
SLIDE 2
Overview

  • Goals

– Provide a high-level introduction to Post-Quantum Cryptography (PQC)
– Introduce selected implementation details (HW/SW) for some PQC classes (focus: encryption)
– Highlight open challenges for PQC schemes

  • Topics/Parts
  • 1. Introduction to PQC
  • 2. Hardware Implementation of PQC
  • 3. (Embedded) Software Implementation of PQC

slide-3
SLIDE 3

Tutorial Outline – Part I

  • Introduction
  • Classes of Post-Quantum Cryptography (PQC)

– Code-Based Cryptography
– Lattice-Based Cryptography
– Hash-Based Cryptography

  • Lessons Learned
slide-4
SLIDE 4

Long-Term Security in Embedded Devices

  • For many of today's applications and systems, long-term security is an essential requirement
  • Many processing platforms have tight constraints on their computational resources

[Figure labels (lifetimes of the example devices shown): 10-30 years, > 15 years, 10 years, 5-25 years]

slide-5
SLIDE 5

Security of Practical Cryptographic Primitives

  • Cryptosystems must combine security and efficiency
  • Embedded devices mostly deploy standardized cryptography

– Symmetric encryption: Advanced Encryption Standard (AES)
– Asymmetric encryption: RSA (factorization problem), ElGamal or Elliptic Curve Cryptography (DLOG problem)

  • No "hard" security guarantees are available for these real-world cryptosystems
  • Common practice: parameters are chosen to resist the best known (cryptanalytic) attack

slide-6
SLIDE 6

Best Attacks on Cryptosystems

  • Attacks on symmetric cryptosystems

– Modern symmetric ciphers follow well-understood principles
– For "solid" ciphers the best attack is exhaustive key search
– Key sizes are scaled to achieve long-term security

  • Attacks on asymmetric cryptosystems

– Virtually all asymmetric cryptosystems in use are based on the factorization or DLOG problem
– The best attacks have subexponential complexity

  • General Number Field Sieve (on RSA)
  • Index Calculus (on DLOG)
slide-7
SLIDE 7

Key Size Recommendations

  • Security parameters assuming today's algorithmic knowledge and the computing capabilities of a powerful attacker (e.g. NSA)

Source: ECRYPT II Yearly Key Size Report

[Table not preserved. Row labels: short-term security (days to months), mid-term security (years to decades), long-term security (many years). Surviving column label: (symmetric)]

slide-8
SLIDE 8
Public-Key Cryptography and Long-Term Security

  • General problem: RSA and DLOG-based cryptosystems are closely related
  • A breakthrough in classical cryptanalysis is likely to affect both PKC classes
  • Further problem: powerful quantum computers

slide-9
SLIDE 9

Alternatives for Public-Key Cryptography

  • Research on alternative public-key cryptosystems is required → NIST Call for PQC (Nov 30)
  • Foundation on NP-hard problems?
  • No polynomial-time attacks (such as Grover's/Shor's alg.) with quantum computers
  • Efficiency in implementations comparable to currently employed cryptosystems

slide-10
SLIDE 10

Post-Quantum Cryptography

  • Definition

– Class of cryptographic schemes based on the classical computing paradigm
– Designed to provide security in the era of powerful quantum computers

  • Important:

– PQC ≠ quantum cryptography!

slide-11
SLIDE 11

Post-Quantum Cryptography - Categories

  • Five main branches of post-quantum crypto:

– Code-based
– Lattice-based
– Hash-based
– Multivariate-quadratic
– Supersingular isogenies

  • Should support public-key encryption and/or digital signatures

slide-12
SLIDE 12
CHES History in PQC

  • CHES has a long tradition of implementing PQC cryptosystems:

– CHES 2001: Bailey et al.: NTRU in Small Devices
– CHES 2004: Yang et al.: TTS on SmartCards
– CHES 2008: Bogdanov et al.: MQ-Cryptosystems in HW
– CHES 2009: Eisenbarth et al.: MicroEliece
– CHES 2011: Session on Lattice-based attacks (3 papers)
– CHES 2012: High-Performance McEliece+MQ+Lattices; GLS-Cryptosystem
– CHES 2013: McBits + QC-MDPC McEliece Implementations
– CHES 2014: RingLWE + Lattice-based Signature Implementations
– CHES 2015: Session on Lattice crypto (2 papers), Homomorphic Encryption
– CHES 2016: QcBits, Fault-Attack on BLISS signature scheme
– CHES 2017: Tomorrow's session on PQC (3 papers)

slide-13
SLIDE 13

Research Directions in PQC

  • Propose novel robust and failure-proof cryptographic constructions
  • Efficient constant-time implementation techniques and algorithmic tweaks
  • Physical resistance against side-channel analysis and fault-injection attacks
  • Improve cryptanalysis to foster confidence considering potential attacks
  • Identify secure parameters against attacks from quantum computers
  • Compatible implementations for IoT devices, Internet infrastructures and Cloud services

Implementation aspects of alternative asymmetric cryptosystems

ICT-644729

(2015-2019) (2015-2019) (2012-2017)

+ more

slide-14
SLIDE 14

Outline

  • Introduction
  • Classes of Post-Quantum Cryptography (PQC)

– Code-Based Cryptography
– Lattice-Based Cryptography
– Hash-Based Cryptography

  • Lessons Learned
slide-15
SLIDE 15

Introduction to Code-based Cryptography

  • Error-correcting codes are well known in a large variety of applications
  • Detection/correction of errors in noisy channels by adding redundancy
  • Observation: Some problems in coding theory are NP-complete → possible foundation for Code-Based Cryptosystems (CBC)

slide-16
SLIDE 16

Linear Codes and Cryptography

  • Linear codes: error-correcting codes for which redundancy depends linearly on the information
  • Generator and parity check matrices for encoding and decoding
  • Rows of G form a basis for the code C[n, k, d] of length n with dimension k and minimum distance d
  • Matrices can be in systematic form, minimizing time/storage

Matrix size of G: k × n

slide-17
SLIDE 17

Linear Codes and Cryptography

  • Parity check matrix H is an (n-k) × n matrix orthogonal to G
  • Defines the dual code C⊥ of the code C via the scalar product
  • A codeword c ∈ C if and only if Hc = 0
  • For a received word c' = c + e, the term s = Hc' = Hc + He = He is the syndrome of the error e
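
The syndrome relation can be checked on a small example. The sketch below uses a systematic [7,4] Hamming code; the matrices `G` and `H` are illustrative choices and do not come from the slides.

```python
# Toy [7,4] Hamming code in systematic form: G = [I | A], H = [A^T | I].
# Both matrices are assumptions chosen for illustration.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def mat_vec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in M]

def encode(m):
    """Codeword c = m * G over GF(2) (row-vector convention)."""
    return [sum(m[i] & G[i][j] for i in range(len(G))) % 2
            for j in range(len(G[0]))]

m = [1, 0, 1, 1]
c = encode(m)                            # valid codeword
e = [0, 0, 0, 0, 0, 1, 0]                # single-bit error at position 5
c_noisy = [a ^ b for a, b in zip(c, e)]  # received word c' = c + e

s_codeword = mat_vec(H, c)               # H c  = 0
s_noisy = mat_vec(H, c_noisy)            # H c' = H e
s_error = mat_vec(H, e)                  # equals column 5 of H
```

Because the syndrome depends only on the error, a decoder can work on `s_noisy` without knowing the codeword.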
slide-18
SLIDE 18

Syndrome Decoding Problem

  • Input given

– H : parity check matrix of size (n - k) × n
– s : vector of GF(2)^(n-k)
– t : positive integer (defined by error correction capability)

  • Problem: Is there a vector e in GF(2)^n of weight w(e) ≤ t s.t. H · e^T = s?
  • Syndrome decoding problem is NP-complete

– E.R. Berlekamp, R.J. McEliece and H.C. van Tilborg: On the inherent intractability of certain coding problems. IEEE Transactions on Information Theory, 24(3), May 1978.

slide-19
SLIDE 19

Niederreiter Encryption Scheme [1986]

Key Generation: Given a code C[n, k, d] with parity check matrix $H$ and error correcting capability $t$
Private Key: $(S, H, P)$, where $S$ is a scrambling and $P$ a permutation matrix
Public Key: $H' = S \cdot H \cdot P$

Encryption: Encode the message $m$ into an error vector $e \in_R \mathbb{F}_2^n$, $\mathrm{wt}(e) \le t$
$x \leftarrow H' \cdot e^T$

Decryption: Let $\Psi_H$ be a $t$-error-correcting decoding algorithm.
$P \cdot e^T \leftarrow \Psi_H(S^{-1} \cdot x)$
Recover $e^T$ by computing $P^{-1} \cdot P \cdot e^T$, transpose, and extract $m$ from $e$.

slide-20
SLIDE 20

McEliece Encryption Scheme [1978]

Key Generation: Given a code C[n, k, d] with generator matrix $G$ and error correcting capability $t$
Private Key: $(S, G, P)$, where $S$ is a scrambling and $P$ a permutation matrix
Public Key: $G' = S \cdot G \cdot P$

Encryption: Message $m \in \mathbb{F}_2^{n-r}$ (i.e. $k = n - r$ message bits), error vector $e \in_R \mathbb{F}_2^n$, $\mathrm{wt}(e) \le t$
$x \leftarrow m G' + e$

Decryption: Let $\Psi_H$ be a $t$-error-correcting decoding algorithm.
$m \cdot S \leftarrow \Psi_H(x \cdot P^{-1})$, removing the error $e$
Extract $m$ by computing $(m \cdot S) \cdot S^{-1}$
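
To make the three steps concrete, here is a toy sketch of McEliece over a [7,4] Hamming code with $t = 1$. The code choice, the helper names and the syndrome-lookup decoder are assumptions for illustration; real instantiations use much larger codes (e.g. binary Goppa or QC-MDPC) and are not secure at this size.

```python
import random

# Systematic [7,4] Hamming code (illustrative choice, corrects t = 1 error).
G = [[1,0,0,0,1,1,0], [0,1,0,0,1,0,1], [0,0,1,0,0,1,1], [0,0,0,1,1,1,1]]
H = [[1,1,0,1,1,0,0], [1,0,1,1,0,1,0], [0,1,1,1,0,0,1]]

def mat_mul(A, B):
    """Matrix product over GF(2)."""
    return [[sum(A[i][k] & B[k][j] for k in range(len(B))) % 2
             for j in range(len(B[0]))] for i in range(len(A))]

def gf2_inverse(M):
    """Gauss-Jordan inversion over GF(2); returns None if M is singular."""
    n = len(M)
    A = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col]), None)
        if pivot is None:
            return None
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col]:
                A[r] = [x ^ y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

def keygen(rng):
    k, n = len(G), len(G[0])
    while True:                                   # random invertible scrambler S
        S = [[rng.randrange(2) for _ in range(k)] for _ in range(k)]
        S_inv = gf2_inverse(S)
        if S_inv is not None:
            break
    perm = list(range(n))
    rng.shuffle(perm)
    P = [[int(perm[i] == j) for j in range(n)] for i in range(n)]
    return mat_mul(mat_mul(S, G), P), (S_inv, perm)   # G' = S*G*P, private key

def encrypt(G_pub, m, rng):
    x = mat_mul([m], G_pub)[0]                    # m * G'
    x[rng.randrange(len(x))] ^= 1                 # add an error of weight 1
    return x

def decrypt(sk, x):
    S_inv, perm = sk
    y = [x[perm[i]] for i in range(len(x))]       # y = x * P^-1
    s = [sum(h & b for h, b in zip(row, y)) % 2 for row in H]
    if any(s):                                    # error sits at the H column equal to s
        cols = [[row[j] for row in H] for j in range(len(y))]
        y[cols.index(s)] ^= 1
    mS = y[:len(S_inv)]                           # systematic G: first k bits are m*S
    return mat_mul([mS], S_inv)[0]                # m = (m*S) * S^-1

rng = random.Random(1)
G_pub, sk = keygen(rng)
```

Decryption works because $x \cdot P^{-1} = (mS)G + eP^{-1}$ is a codeword of the original code plus a permuted error of the same weight.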

slide-21
SLIDE 21

Code-based Encryption Schemes*

Taxonomy of Code-Based Encryption Schemes

  • Schemes: McEliece [M78], Niederreiter [N86]
  • Code families: Generalized Reed-Solomon, Goppa, Reed-Muller, Concatenated, LRPC/LDPC/MDPC, Srivastava, Elliptic, Rank-Metric

* This is a selection based on the presenter's choice.

slide-23
SLIDE 23

Key Aspects of Code-Based Cryptography

  • Focus on encryption; signature schemes are inefficient
  • Selection of the employed code is a highly critical issue

– Properties of the code determine key size; matrices are often large
– Structures in codes reduce key size, but might enable attacks
– Encoding is fast on most platforms (matrix multiplication)
– Decoding requires efficient techniques in terms of time and memory

  • Basic McEliece is only CPA-secure; conversion required
  • Protection against side-channel and fault-injection attacks

[Figure: Encrypt computes y = Mx + e from x using Kpub = M (matrix); Decrypt recovers x = Ψ(y, Kpriv) from y]

slide-24
SLIDE 24

Outline

  • Introduction
  • Classes of Post-Quantum Cryptography (PQC)

– Code-Based Cryptography
– Lattice-Based Cryptography
– Hash-Based Cryptography

  • Conclusions
slide-25
SLIDE 25
Lattice-based Cryptography – Basics

  • Hard problem: Shortest/Closest Vector Problem (SVP/CVP) in the worst case
  • Typically thought to be

– Impractical but provably secure
– Practical but without proof (GGH/NTRU)
– Lately: ideal lattices can potentially combine both

  • More constructions feasible beyond classical PKC: hash functions, PRFs, identity-based encryption, homomorphic encryption

slide-26
SLIDE 26

Learning with Errors

Solving a system of linear equations

[Figure: a given 7×4 matrix over ℤ13 times a secret 4×1 vector over ℤ13 equals a given 7×1 vector over ℤ13]

The matrix and the result are given (blue); find (learn) the secret (red) → solve a linear system

slide-27
SLIDE 27

Learning with Errors

Solving a system of linear equations

[Figure: a given 7×4 matrix over ℤ13 times a secret 4×1 vector over ℤ13, plus a random 7×1 vector of small noise, equals a given 7×1 vector over ℤ13 that looks random]

The matrix and the result are given (blue); find the secret (red) → Learning with Errors (LWE) problem
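
A minimal sketch of how such an LWE instance is generated, using the slide's dimensions (7×4 over ℤ13). Drawing the noise from {-1, 0, 1} and the function names are assumptions for illustration; the concrete numbers on the slide are not reproduced.

```python
import random

Q = 13  # modulus of the slide's toy example (Z_13)

def lwe_sample(rng, rows=7, cols=4):
    """Return (A, b, s, e) with b = A*s + e mod Q and small noise e."""
    A = [[rng.randrange(Q) for _ in range(cols)] for _ in range(rows)]
    s = [rng.randrange(Q) for _ in range(cols)]          # secret vector
    e = [rng.choice([-1, 0, 1]) for _ in range(rows)]    # assumed noise distribution
    b = [(sum(a * x for a, x in zip(row, s)) + err) % Q
         for row, err in zip(A, e)]
    return A, b, s, e

def residual(A, b, s):
    """Centered value of b - A*s mod Q per row; small only for the right secret."""
    out = []
    for row, bi in zip(A, b):
        r = (bi - sum(a * x for a, x in zip(row, s))) % Q
        out.append(r - Q if r > Q // 2 else r)
    return out

rng = random.Random(0)
A, b, s, e = lwe_sample(rng)
noise_seen = residual(A, b, s)   # recovers e when the true secret is used
```

Without `e`, the secret follows from Gaussian elimination; with it, an attacker only sees `A` and `b` and must find an `s` whose residual is small in every row.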

slide-28
SLIDE 28
Key Aspects of Lattice-based Systems

  • Encryption and signature systems are both feasible (and secure)

– Significant ciphertext expansion for (R-)LWE encryption
– Decryption error probability with (R-)LWE encryption

  • Random sampling not only from uniform but also from discrete Gaussian distributions (not a trivial task!)
  • Most operations are efficient and parallelizable

– (Ideal lattices) Make use of FFT for polynomial multiplication
– (Standard lattices) Matrix-vector arithmetic

  • Reasonably large public and private keys

– Given for encryption/signature constructions
– Unclear for advanced services such as functional encryption (e.g., FHE)

slide-29
SLIDE 29

Outline

  • Introduction
  • Classes of Post-Quantum Cryptography (PQC)

– Code-Based Cryptography
– Lattice-Based Cryptography
– Hash-Based Cryptography

  • Lessons Learned
slide-30
SLIDE 30

Hash-based Cryptography: Lamport-Diffie One-Time Signatures (LD-OTS, 1979)

  • Definition: Given a security parameter $n$, the set of $n$-bit vectors $U_n = \{0,1\}^n$ and a one-way function $f: U_n \to U_n$
  • Secret key: Generate $2n$ vectors of $n$ bits each, $X = (x_{0,0}, x_{0,1}, x_{1,0}, x_{1,1}, \ldots, x_{n-1,1})$
  • Public key: Compute $Y = (y_{0,0}, \ldots, y_{n-1,1})$ with $y_{i,j} = f(x_{i,j})$ for all $i, j$
  • Publish public key $Y$

[Figure: every secret value in X (pairs x0, x1) is mapped through the one-way function to the corresponding public value in Y (pairs y0, y1)]

slide-31
SLIDE 31

Hash-based Cryptography: Lamport-Diffie One-Time Signatures (LD-OTS, 1979)

  • Definition: Given a published public key $Y$ and an $n$-bit message $M = (m_0, \ldots, m_{n-1})$ to sign
  • Sign: Generate signature $\sigma = (x_{0,m_0}, \ldots, x_{n-1,m_{n-1}})$ by revealing the corresponding secret values $x_{i,m_i}$
  • Verify: Check that $f(\sigma_i) = y_{i,m_i}$ for all $i \in [0, n-1]$

[Figure: the message bits m0 … mn-1 select one secret value per pair to form σ; the verifier applies the one-way function to each revealed value and compares the result with the matching entry of Y]
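
Key generation, signing and verification of the one-time scheme fit in a few lines. This is a sketch under stated assumptions: SHA-256 stands in for the one-way function, $n = 256$, and the message is hashed first so that inputs of any length map to $n$ bits (the slides sign an $n$-bit message directly).

```python
import hashlib
import secrets

N = 256  # security parameter n in bits (assumed: SHA-256 output size)

def f(x: bytes) -> bytes:
    """One-way function; SHA-256 is an assumed instantiation."""
    return hashlib.sha256(x).digest()

def keygen():
    """Secret key X: 2n random n-bit values; public key Y: their images under f."""
    X = [[secrets.token_bytes(N // 8) for _ in range(2)] for _ in range(N)]
    Y = [[f(x) for x in pair] for pair in X]
    return X, Y

def message_bits(msg: bytes):
    """Hash the message to N bits (assumption, see lead-in)."""
    digest = hashlib.sha256(msg).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(N)]

def sign(X, msg: bytes):
    """Reveal x_{i, m_i} for every message bit m_i."""
    return [X[i][bit] for i, bit in enumerate(message_bits(msg))]

def verify(Y, msg: bytes, sig) -> bool:
    """Check f(sigma_i) == y_{i, m_i} for all i."""
    bits = message_bits(msg)
    return len(sig) == N and all(f(sig[i]) == Y[i][bit]
                                 for i, bit in enumerate(bits))

X, Y = keygen()
sig = sign(X, b"CHES 2017")
```

Each key pair must sign only one message: a second signature reveals further secret values and lets an attacker mix the two.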

slide-32
SLIDE 32

Extension for Multiple Use: Merkle‘s Signature Scheme

  • Idea by R. Merkle [1979]: reduce the validity of many OTS verification keys to a single verification key using a binary tree
  • Properties and Requirements

– Max. signature count determined by height H of the tree (fixed at setup)
– Needs to keep track of already used signatures in the tree → stateful signature scheme
– Can be used with any one-time signature scheme and (collision-resistant) cryptographic hash function

[Figure: binary tree of height 3 with root PK = V3[0] (public MSS key), inner nodes V2[0], V2[1] and V1[0] … V1[3], and leaves V0[j] = g(Y_j) for the public OTS keys Y_0 … Y_7]

slide-33
SLIDE 33

Merkle Signature Scheme Principle

  • Let $g: \{0,1\}^* \to \{0,1\}^n$ be a hash function with security parameter $n$
  • Fix height $H$ and generate $2^H$ LD-OTS key pairs $(X_i, Y_i)$ with $0 \le i < 2^H$
  • Notation: $V_i[j]$ with $0 \le i \le H$ and $0 \le j < 2^{H-i}$
  • Leaves: $V_0[j] = g(Y_j)$
  • Computation rule for inner nodes: $V_i[j] = g(V_{i-1}[2j] \,\|\, V_{i-1}[2j+1])$ with $0 < i \le H$ and $0 \le j < 2^{H-i}$

Example: $H = 3$

[Figure: tree with root PK = V3[0], inner nodes V2[0], V2[1] and V1[0] … V1[3], and leaves V0[j] = g(Y_j) computed from the key pairs (X_j, Y_j), j = 0 … 7]
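
The node rule can be turned into a short tree builder plus authentication-path check. This is a sketch, assuming SHA-256 for the hash function and byte strings as stand-ins for the OTS public keys; the function names are hypothetical, and the number of leaves must be a power of two.

```python
import hashlib

def g(data: bytes) -> bytes:
    """Hash function g; SHA-256 is an assumed instantiation."""
    return hashlib.sha256(data).digest()

def build_tree(ots_public_keys):
    """Return levels V[0..H]: V[0][j] = g(Y_j), V[i][j] = g(V[i-1][2j] || V[i-1][2j+1])."""
    level = [g(Y) for Y in ots_public_keys]
    levels = [level]
    while len(level) > 1:
        level = [g(level[2 * j] + level[2 * j + 1]) for j in range(len(level) // 2)]
        levels.append(level)
    return levels

def auth_path(levels, leaf):
    """Sibling nodes needed to recompute the root from leaf index `leaf`."""
    path = []
    for level in levels[:-1]:
        path.append(level[leaf ^ 1])   # sibling on the current level
        leaf //= 2
    return path

def root_from_path(Y, leaf, path):
    """Recompute the root from one OTS public key and its authentication path."""
    node = g(Y)
    for sibling in path:
        node = g(node + sibling) if leaf % 2 == 0 else g(sibling + node)
        leaf //= 2
    return node

# Height H = 3: eight placeholder OTS public keys (illustrative byte strings).
keys = [bytes([j]) * 32 for j in range(8)]
levels = build_tree(keys)
root = levels[-1][0]                   # public MSS key PK = V_3[0]
```

A full signature would contain the OTS signature, the OTS public key, the leaf index and the authentication path; the verifier accepts if the recomputed root equals PK.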

slide-34
SLIDE 34
Key Aspects of Hash-based Cryptographic Systems

  • Only signature schemes available, no encryption
  • Moderate requirements for implementations

– Second-preimage (older schemes: collision) resistant hash function
– Pseudorandom functions for OTS (XMSS)

  • Hard limitation on the number of signatures per tree

– Height of the tree determines the max. number of signatures (issue with DoS attacks for real-world systems)
– Requires a record of signatures already used (critical in untrusted environments!)
– Increasing tree height increases memory requirements and computational complexity

slide-35
SLIDE 35

Outline

  • Introduction
  • Classes of Post-Quantum Cryptography (PQC)

– Code-Based Cryptography
– Lattice-Based Cryptography
– Hash-Based Cryptography

  • Lessons Learned
slide-36
SLIDE 36

Lessons Learned

  • Post-Quantum Cryptography is essential for long-term security

– Code-based encryption schemes are the most mature candidates
– Digital signatures from hash-based cryptography offer high confidence with respect to security and are under standardization
– Lattice-based cryptography has high potential and extremely high versatility

  • Next topics in this tutorial (selection due to time constraints)

– Efficient implementation strategies for Code-Based Cryptosystems
– Efficient implementation of Lattice-Based Cryptosystems

ICT-644729

slide-37
SLIDE 37


Thank you! Questions?

slide-38
SLIDE 38

Part II: Hardware Architectures for Post Quantum Cryptography

Tutorial@CHES 2017 - Taipei Tim Güneysu Ruhr-Universität Bochum & DFKI 04.10.2017

including slides by Ingo von Maurich and Thomas Pöppelmann


slide-39
SLIDE 39

Tutorial Outline – Part II

  • Code-based Cryptography
  • Efficient Code-based Implementations
  • Lattice-based Cryptography
  • Efficient Lattice-based Implementations
  • Lessons Learned

slide-40
SLIDE 40

Recall: McEliece Encryption Scheme [1978]

Key Generation: Given an $[n, k]$-code $C$ with generator matrix $G$ and error correcting capability $t$
Private Key: $(S, G, P)$, where $S$ is a scrambling and $P$ is a permutation matrix
Public Key: $G' = S \cdot G \cdot P$

Encryption: Message $m \in \mathbb{F}_2^k$, error vector $e \in_R \mathbb{F}_2^n$, $\mathrm{wt}(e) \le t$
$x \leftarrow mG' + e$

Decryption: Let $\Psi_H$ be a $t$-error-correcting decoding algorithm.
$m \cdot S \leftarrow \Psi_H(x \cdot P^{-1})$, removes the error $e \cdot P^{-1}$
Extract $m$ by computing $m \cdot S \cdot S^{-1}$

slide-41
SLIDE 41
Security Parameters (Binary Goppa Codes)

  • Original proposal: McEliece with binary Goppa codes
  • Code properties determine key size; matrices are often large
  • Code parameters revisited by Bernstein, Lange and Peters
  • Public key is a $k \times (n - k)$ bit matrix (redundant part only)

slide-42
SLIDE 42
Code-based Cryptography for Embedded Devices

  • Selection of the employed code is a highly critical issue

– Properties of the code determine key size; short keys are essential
– Structures in codes reduce key size, but can enable attacks
– Encoding is a fast operation on all platforms (matrix multiplication)
– Decoding requires efficient techniques in terms of time and memory

  • Basic McEliece is only CPA-secure; conversion required
  • Protection against side-channel and fault-injection attacks

[Figure: Encrypt computes y = Mx + e from x using Kpub = M (matrix); Decrypt recovers x = Ψ(y, Kpriv) from y]

slide-43
SLIDE 43
Quasi-Cyclic Moderate Density Parity-Check Codes (QC-MDPC)

  • $t$-error correcting $(n, r, w)$-QC-MDPC code of length $n = n_0 r$
  • Parity-check matrix $H$ consists of $n_0$ blocks with fixed row weight $w$

Code/Key Generation
1. Generate $n_0$ first rows of the parity-check matrix blocks $H_i$: $h_i \in_R \mathbb{F}_2^r$ of weight $w_i$, with $w = \sum_{i=0}^{n_0-1} w_i$
2. Obtain remaining rows by $r - 1$ quasi-cyclic shifts of $h_i$
3. $H = [H_0 | H_1 | \ldots | H_{n_0-1}]$
4. Generator matrix of systematic form $G = [I_k | Q]$ with
$Q = \begin{pmatrix} (H_{n_0-1}^{-1} \cdot H_0)^T \\ (H_{n_0-1}^{-1} \cdot H_1)^T \\ \vdots \\ (H_{n_0-1}^{-1} \cdot H_{n_0-2})^T \end{pmatrix}$

slide-44
SLIDE 44

Background on QC-MDPC Codes

[Figure: structure of the generator matrix $G$ (identity block I plus redundant part) and of the parity check matrix $H = [H_0 | H_1]$ for $n_0 = 2$]

slide-45
SLIDE 45

(QC-)MDPC McEliece

Encryption: Message $m \in \mathbb{F}_2^k$, error vector $e \in_R \mathbb{F}_2^n$, $\mathrm{wt}(e) \le t$
$x \leftarrow mG + e$

Decryption: Let $\Psi_H$ be a $t$-error-correcting (QC-)MDPC decoding algorithm.
$mG \leftarrow \Psi_H(mG + e)$
Extract $m$ from the first $k$ positions.

Parameters for 80-bit equivalent symmetric security [MTSB13]: $n_0 = 2$, $n = 9602$, $r = 4801$, $w = 90$, $t = 84$

slide-46
SLIDE 46

Tutorial Outline – Part II

  • Code-based Cryptography
  • Efficient Code-based Implementations
  • Lattice-based Cryptography
  • Efficient Lattice-based Implementations
  • Lessons Learned

slide-47
SLIDE 47
Hardware Implementation of Building Blocks for McEliece/Niederreiter

  • Two operations

– Encryption/Encoding:

  • Matrix-vector multiplication (with large matrices, either to be stored or to be generated on-the-fly)
  • TRNG for error generation

– Decryption/Decoding:

  • Code-specific syndrome decoding; hard-decision decoding with simple (bitwise) operations preferred
  • Inverse-matrix-vector multiplication

[Figure: the message is multiplied by G to obtain the codeword and, from it, the ciphertext]

slide-48
SLIDE 48

Efficient Decoding of MDPC Codes

Decoders for LDPC/MDPC codes: bit flipping and belief propagation

"Bit-Flipping" Decoder
1. Compute syndrome $s$ of the ciphertext
2. Count unsatisfied parity-check equations $\#_{upc}$ for each ciphertext bit
3. Flip ciphertext bits that violate $\ge b$ equations
4. Recompute syndrome
5. Repeat until $s = 0$ or reaching max. iterations (decoding failure)

  • How to determine threshold $b$?
  • Precompute $b_i$ for each iteration [Gal62]
  • $b = \max_{upc}$ [HP03]
  • $b = \max_{upc} - \delta$ [MTSB13]
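
The five decoder steps can be sketched as follows, with the threshold set to the maximum number of unsatisfied parity checks in each iteration (the [HP03] rule listed above). The 7×7 parity-check matrix is a toy built from the Fano plane and is an assumption for illustration; it is not an MDPC code with the parameters used later in the slides.

```python
# Toy parity-check matrix: rows are the 7 lines of the Fano plane. Every column
# has weight 3 and two columns share exactly one row, so a single error is
# always the unique bit with the maximum number of unsatisfied checks.
LINES = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
H = [[int(j in line) for j in range(7)] for line in LINES]

def syndrome(H, y):
    """Syndrome s = H * y^T over GF(2)."""
    return [sum(h & b for h, b in zip(row, y)) % 2 for row in H]

def bit_flip_decode(H, y, max_iter=10):
    """Bit-flipping decoder; returns the corrected word or None on failure."""
    y = list(y)
    for _ in range(max_iter):
        s = syndrome(H, y)                       # steps 1 and 4
        if not any(s):
            return y                             # step 5: syndrome is zero
        # step 2: unsatisfied parity checks per bit
        upc = [sum(s[i] & H[i][j] for i in range(len(H))) for j in range(len(y))]
        b = max(upc)                             # threshold b = max_upc
        for j, count in enumerate(upc):          # step 3: flip bits with >= b
            if count >= b:
                y[j] ^= 1
    return y if not any(syndrome(H, y)) else None

codeword = [0, 0, 0, 1, 1, 1, 1]                 # satisfies all seven checks
received = list(codeword)
received[2] ^= 1                                 # inject one error
decoded = bit_flip_decode(H, received)
```

For real QC-MDPC parameters the decoder is probabilistic, so `None` (decoding failure) is a possible outcome that callers have to handle.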
slide-49
SLIDE 49

FPGA Low-Resource Encryption

Target: Xilinx Spartan-6 FPGA
Scheme: QC-MDPC Encryption

  • Given first 4801-bit row $g$ of $G$ and message $m$, compute $x = mG + e$
  • Storage requirements
  • One 18 kBit BRAM is sufficient to store message $m$, row $g$ and the redundant part (3 × 4801-bit vectors)
  • But only two data ports are available
  • Read out 32 bits of the message and store them in a separate register
  • Error addition
  • Instead of starting with an all-zero redundant part, we preload it with the second half of the error vector

[Figure: BRAM holding m, g and the redundant part, connected to a control + XOR unit and 32 flip-flops for the current message word]

slide-50
SLIDE 50

QC-MDPC Decryption

  • Secret key and ciphertext consist of two blocks
  • Iterative vs. parallel design
  • Decoding is a complex task → parallel processing
  • BRAM-based implementation: storage requirements
  • Secret key (2x4801 bit)
  • Ciphertext (2x4801 bit)
  • Syndrome (4801 bit)
  • In total 3 BRAMs due to memory and port access requirements

FPGA Low-Resource Decryption

slide-51
SLIDE 51

QC-MDPC Decryption

  • Syndrome computation $s = Hx^T$
  • Similar technique as for encoding
  • Compare $s = 0$?
  • Compute binary OR of all 32-bit blocks of the syndrome
  • Count $\#_{upc}$
  • Hamming weight of syndrome AND $h_0/h_1$ (32 bits at a time)
  • Accumulate Hamming weight
  • Bit-flipping
  • If $\#_{upc} \ge b_i$, invert ciphertext bit(s) and XOR $h_0/h_1$ to the syndrome while rotating both

FPGA Low-Resource Decryption

slide-52
SLIDE 52
  • Post-PAR for Xilinx Spartan-6 XC6SLX4 & Virtex-6 XC6VLX240T
  • Encryption takes 735,000 cycles
  • Decryption takes 4,274,000 cycles on average

Lightweight FPGA Results

slide-53
SLIDE 53
  • Realistic public key size (0.6 kByte vs. 50-100 kByte)
  • Smallest McEliece FPGA implementation
  • Sufficient performance for many applications

Lightweight FPGA Comparison

slide-54
SLIDE 54

Tutorial Outline – Part II

  • Code-based Cryptography
  • Efficient Code-based Implementations
  • Lattice-based Cryptography
  • Efficient Lattice-based Implementations
  • Lessons Learned

slide-55
SLIDE 55
Lattice-Based Cryptography

  • Recall: Benefits of Lattice-Based Cryptography

– We can get signatures and public-key encryption from lattices, and also more advanced services (IBE, FHE)
– A lot of development on the theory side; schemes are improving
– Implementation of lattice-based cryptography is a young field; only done for a few years (except maybe for NTRU)

slide-56
SLIDE 56
To be Ideal or not Ideal?

Two important lines of research: random lattices and ideal lattices

  • Major impact on implementation (theory not that much)
  • Security for random lattices is better understood (ideal lattices are more structured)

Random Lattices

  • Operations on large matrices (e.g., 532x840)
  • Mostly matrix-vector multiplication modulo $q < 2^{32}$
  • Large public keys (e.g., 532x840 matrix)

Ideal Lattices

  • Operations on polynomials with 256 or 512 coefficients
  • Mostly polynomial multiplication modulo $q < 2^{32}$
  • Public keys are one (or two) polynomials with 256 or 512 coefficients

slide-57
SLIDE 57

Learning with Errors

Solving a system of linear equations

[Figure: a given 7×4 matrix over ℤ13 times a secret 4×1 vector over ℤ13 equals a given 7×1 vector over ℤ13]

The matrix and the result are given (blue); find (learn) the secret (red) → solve a linear system

slide-58
SLIDE 58

Learning with Errors

Solving a system of linear equations

[Figure: a given 7×4 matrix over ℤ13 times a secret 4×1 vector over ℤ13, plus a random 7×1 vector of small noise, equals a given 7×1 vector over ℤ13 that looks random]

The matrix and the result are given (blue); find the secret (red) → Learning with Errors

slide-59
SLIDE 59

(Ring) Learning with Errors

From learning with errors to ring-learning with errors

Matrix over ℤ13 (7×4):

 4  1 11 10
 3  4  1 11
 2  3  4  1
12  2  3  4
 9 12  2  3
10  9 12  2
11 10  9 12

  • Each line is the previous line shifted by one position
  • Rule: the entry that wraps around is negated (e.g., 10 ⇒ −10 ≡ 3 mod 13)

Only one line, (4, 1, 11, 10), has to be stored
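
The shift-and-negate rule can be checked directly against the matrix on the slide. A small sketch; the function name is hypothetical.

```python
Q = 13  # modulus from the slide

def negacyclic_rows(first_row, count, q=Q):
    """Generate `count` rows: each row is the previous one shifted right by one
    position, with the wrapped-around entry negated modulo q."""
    rows = [list(first_row)]
    for _ in range(count - 1):
        prev = rows[-1]
        rows.append([(-prev[-1]) % q] + prev[:-1])
    return rows

# Only the first line (4, 1, 11, 10) is stored; the rest is regenerated.
rows = negacyclic_rows([4, 1, 11, 10], 7)
```

Storing one line instead of the full matrix is what shrinks ring-LWE public keys compared to plain LWE.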

slide-60
SLIDE 60

Ring Learning with Errors: Principle

  • Ideal lattices correspond to ideals in the ring $R = \mathbb{Z}_q[x]/(x^n + 1)$
  • A Ring Learning With Errors (RLWE) sample is $t = a \cdot s + e \in R$ for uniform $a \in R$ and small discrete Gaussian distributed $s, e \leftarrow D_\sigma$

– Search-RLWE: Find $s$ when given $t$ and $a$
– Decision-RLWE: Distinguish $t$ from uniform when given $t$ and $a$

[Figure: random $a$ times small Gaussian secret $s$ plus small Gaussian error $e$ equals $t$, which looks random]

slide-61
SLIDE 61

Example: Polynomial Addition in $R = \mathbb{Z}_q[x]/(x^n + 1)$

  • Assume ring $R = \mathbb{Z}_q[x]/(x^n + 1)$
  • Assume parameters $q = 5$ and $n = 4$
  • $v = 4x^3 + 2x^2 + 0x^1 + 1 = (4, 2, 0, 1)$
  • $k = 2x^3 + 1x^2 + 4x^1 + 0 = (2, 1, 4, 0)$
  • $s = v + k = (4 + 2 \bmod 5,\ 2 + 1,\ 0 + 4,\ 1 + 0) = (1, 3, 4, 1)$

slide-62
SLIDE 62

Example: Polynomial Multiplication in $R = \mathbb{Z}_q[x]/(x^n + 1)$

  • $k = (2, 1, 4, 0)$
  • $s = (1, 3, 4, 1)$
  • Task: $z = s \cdot k = (3, 0, 2, 0)$
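
Both examples can be reproduced with a schoolbook implementation of arithmetic in $\mathbb{Z}_5[x]/(x^4 + 1)$. A sketch; the helper names are hypothetical, and coefficients are stored lowest degree first, while the slides list the highest degree first.

```python
Q, N = 5, 4  # parameters q and n from the example

def poly_add(a, b, q=Q):
    """Coefficient-wise addition modulo q."""
    return [(x + y) % q for x, y in zip(a, b)]

def poly_mul(a, b, q=Q):
    """Schoolbook product in Z_q[x]/(x^n + 1); coefficients are low degree first."""
    n = len(a)
    res = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                res[i + j] = (res[i + j] + ai * bj) % q
            else:
                # x^n = -1, so terms that wrap around are subtracted
                res[i + j - n] = (res[i + j - n] - ai * bj) % q
    return res

def from_slide(coeffs):
    """Convert the slide notation (highest degree first) to low-degree-first."""
    return list(reversed(coeffs))

v = from_slide((4, 2, 0, 1))
k = from_slide((2, 1, 4, 0))
s = poly_add(v, k)                    # slide notation: (1, 3, 4, 1)
z = poly_mul(s, k)                    # slide notation: (3, 0, 2, 0)
s_slide = list(reversed(s))
z_slide = list(reversed(z))
```

The double loop performs $n^2$ coefficient multiplications, which is the cost that the NTT-based approach later reduces.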
slide-63
SLIDE 63

Discrete Gaussian Distribution

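  • $D_\sigma$ is defined by assigning weight proportional to $\rho_\sigma(x) = \exp\left(\frac{-x^2}{2\sigma^2}\right)$

$R = \mathbb{Z}_{4093}[x]/(x^{256} + 1)$

[Figure: a uniformly distributed polynomial $a$ with large coefficients next to a Gaussian-distributed polynomial $e$ with small coefficients]

Remark on arithmetic of x-distributed values:
  • Uniform * Gaussian = Uniform
  • Gaussian * Gaussian = larger Gaussian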

slide-64
SLIDE 64

Gaussian Sampling: Options

  • Rejection Sampling
  • Bernoulli Sampling
  • Knuth-Yao Sampling
  • Cumulative Distribution Table (CDT) Sampling

[DG14] Dwarakanath and Galbraith: Efficient sampling from discrete Gaussians for lattice-based cryptography on a constrained device. Applicable Algebra in Engineering, Communication and Computing, 2014
[DDLL14] Léo Ducas, Alain Durmus, Tancrède Lepoint and Vadim Lyubashevsky: Lattice Signatures and Bimodal Gaussians. CRYPTO '13

slide-65
SLIDE 65

Ring-LWE Encryption Scheme [LP11/LPR10]

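Gen: Choose $a \leftarrow R$ and $r_1, r_2 \leftarrow D_\sigma$; pk: $p = r_1 - a \cdot r_2 \in R$; sk: $r_2$

Enc($a, p, m \in \{0,1\}^n$): $e_1, e_2, e_3 \leftarrow D_\sigma$; $\bar{m} = \mathrm{encode}(m)$. Ciphertext: $[c_1 = a \cdot e_1 + e_2,\ c_2 = p \cdot e_1 + e_3 + \bar{m}]$

Dec($c = [c_1, c_2], r_2$): Output $\mathrm{decode}(c_1 \cdot r_2 + c_2)$

Correctness:
$c_1 r_2 + c_2 = (a e_1 + e_2) r_2 + p e_1 + e_3 + \bar{m} = r_2 a e_1 + r_2 e_2 + r_1 e_1 - r_2 a e_1 + e_3 + \bar{m} = \bar{m} + r_2 e_2 + r_1 e_1 + e_3$
where $\bar{m}$ is large and the error term $r_2 e_2 + r_1 e_1 + e_3$ is small

[Figure: block diagrams of Enc ($a$ and $p$ multiplied by a $D_\sigma$ sample, further $D_\sigma$ samples and encode($m$) added to give $c_1$, $c_2$) and Dec ($c_1$ multiplied by $r_2$, $c_2$ added, then decode to $m$)]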

slide-66
SLIDE 66

Ring-LWE Encryption: Parameters

Error correction

  • Encode(m)

– Return $m \cdot q/2$

  • Decode(x)

– If $q/4 < x < 3q/4$: return 1
– Else: return 0

$R = \mathbb{Z}_{4093}[x]/(x^{256} + 1)$

[Figure: an $n$-bit message $m$ is encoded coefficient-wise to $\bar{m}$ (coefficients 0 or 2046); the noisy value $\bar{m} + r_2 e_2 + r_1 e_1 + e_3$ is decoded back to $m$]
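
Key generation, encryption, decryption and the threshold encoding can be combined into a small end-to-end sketch. Assumptions, all for illustration: a toy dimension $n = 16$ with $q = 7681$, and ternary noise from {-1, 0, 1} in place of the discrete Gaussian $D_\sigma$. With that noise the error term is bounded by $2n + 1 < q/4$, so decryption never fails here, unlike with real parameters.

```python
import random

N, Q = 16, 7681   # toy dimension (assumption); q as in one of the slide parameter sets

def poly_add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]

def poly_sub(a, b):
    return [(x - y) % Q for x, y in zip(a, b)]

def poly_mul(a, b):
    """Schoolbook product in Z_Q[x]/(x^N + 1)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] = (res[i + j] + ai * bj) % Q
            else:
                res[i + j - N] = (res[i + j - N] - ai * bj) % Q
    return res

def small(rng):
    """Small polynomial; ternary noise is an assumed stand-in for D_sigma."""
    return [rng.choice([-1, 0, 1]) for _ in range(N)]

def keygen(rng):
    a = [rng.randrange(Q) for _ in range(N)]
    r1, r2 = small(rng), small(rng)
    p = poly_sub(r1, poly_mul(a, r2))          # p = r1 - a*r2
    return (a, p), r2                          # public key, secret key

def encode(bits):
    return [b * (Q // 2) for b in bits]        # m * q/2 per coefficient

def decode(poly):
    return [1 if Q / 4 < c < 3 * Q / 4 else 0 for c in poly]

def encrypt(pk, bits, rng):
    a, p = pk
    e1, e2, e3 = small(rng), small(rng), small(rng)
    c1 = poly_add(poly_mul(a, e1), e2)                          # c1 = a*e1 + e2
    c2 = poly_add(poly_add(poly_mul(p, e1), e3), encode(bits))  # c2 = p*e1 + e3 + m_bar
    return c1, c2

def decrypt(r2, ct):
    c1, c2 = ct
    return decode(poly_add(poly_mul(c1, r2), c2))               # decode(c1*r2 + c2)

rng = random.Random(7)
pk, sk = keygen(rng)
message = [rng.randrange(2) for _ in range(N)]
recovered = decrypt(sk, encrypt(pk, message, rng))
```

With Gaussian noise and the slide parameters the error term can occasionally exceed $q/4$, which is the decryption error probability mentioned for (R-)LWE encryption.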

slide-67
SLIDE 67

Ring-LWE Encryption: Parameters

  • Message and ciphertext:

– Message space: $n$ bits
– Expansion: $2 \cdot \log_2 q$
– Two large polynomials $(c_1, c_2)$

  • Public key: one or two large polynomials $(a, p)$
  • Secret key: small polynomial $(r_2)$

Parameter sets                   n     q       σ      |c1, c2|   |sk|    |pk|     security
(256, 4093, 8.35) [LP11]         256   4093    ~4.5   6,144      1,792   6,144    ~106 bits
(256, 7681, 11.32) [GFSBH12]     256   7681    ~4.8   6,656      1,792   6,656    ~106 bits
(512, 12289, 12.18) [GFSBH12]    512   12289   ~4.9   14,336     3,584   14,336   ~256 bits

slide-68
SLIDE 68

Tutorial Outline – Part II

  • Code-based Cryptography
  • Efficient Code-based Implementations
  • Lattice-based Cryptography
  • Efficient Lattice-based Implementations
  • Lessons Learned

slide-69
SLIDE 69
Hardware Implementation Building Blocks for R-LWE

  • Two main components

– Polynomial multiplier for $n = \{256, 512, 1024\}$ over specific rings, with coefficients of $\log_2(q) < 24$ bits
– Discrete Gaussian sampler with precisely defined precision for a given $\sigma$

slide-70
SLIDE 70

Hardware Implementation: Low-Cost Design for Xilinx Spartan-6

  • Row-wise polynomial multiplication ($a e_1$ / $p e_1$)

– Simple address generation
– Sample coefficient of $e_1$, add row of $c_1$ then add row of $c_2$, add coefficient of $e_2$ and $e_3$

  • Key and ciphertext are stored in block memory
  • DSP block for arithmetic ($q \times q$-bit multiplier)

[Figure: datapath with multiplication (DSP) and modular reduction (power of two possible)]
slide-71
SLIDE 71

Post-place-and-route performance on a Spartan-6 LX9 FPGA.

Hardware Implementation: Low Area

  • Usage of $q = 4096$ leads to area improvement and higher clock frequency
  • Performance is still very good
  • Area consumption is low, especially for decryption

Area savings by power of two modulus

slide-72
SLIDE 72

Ring-LWE: Can we do better?

  • Schoolbook polynomial multiplication is simple and independent of parameters
  • Performance is reasonable but can still be improved
  • Remember: with schoolbook multiplication, we need $n^2$ multiplications modulo $q$ for one polynomial multiplication

– $128^2 = 16384$
– $256^2 = 65536$
– $512^2 = 262144$
– $1024^2 = 1048576$

Can we do better?

slide-73
SLIDE 73

Optimization: Polynomial Multiplication based on NTT

  • Include algorithmic tweaks for fast polynomial multiplication
  • The Number Theoretic Transform (NTT) is a discrete Fourier transform (DFT) defined over a finite field or ring. For a given primitive $n$-th root of unity $\omega$ the NTT is defined as:

– Forward transformation (NTT): $A[i] = \sum_{j=0}^{n-1} a[j]\,\omega^{ij}$, $i = 0, 1, \ldots, n-1$
– Inverse transformation (INTT): $a[i] = n^{-1} \sum_{j=0}^{n-1} A[j]\,\omega^{-ij}$, $i = 0, 1, \ldots, n-1$

  • NTT exists if $q$ is a prime, $n$ a power of two and $q \equiv 1 \bmod 2n$
  • Example: Ring-LWE encryption: $7681 \bmod (2 \cdot 256) = 1$
slide-74
SLIDE 74

NTT for Lattice Cryptography: Convolution Theorem

  • With the convolution theorem we can multiply two vectors/polynomials with the help of the NTT

– $c = \mathrm{INTT}(\mathrm{NTT}(a) \circ \mathrm{NTT}(b))$
– Efficient algorithms are known for the conversion in both directions

  • Negative Wrapped Convolution:

– Polynomial multiplication in $\mathbb{Z}_q[x]/(x^n + 1)$
– Runtime $O(n \log n)$
– No appending of zeros required (as for regular convolution)
– Implicit polynomial reduction by $x^n + 1$

[Figure: $a$ and $b$ each pass through an NTT, are multiplied pointwise, and an INTT yields $c$]
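
The negative wrapped convolution can be sketched with a deliberately naive $O(n^2)$ transform, which keeps the code short and makes the pre- and post-scaling by powers of a $2n$-th root of unity visible. Assumptions for illustration: $q = 7681$, $n = 256$, hypothetical helper names, and the symbol `psi` for the $2n$-th root (the slides do not name it); a real implementation would use the $O(n \log n)$ butterfly network. `pow(x, -1, q)` requires Python 3.8 or newer.

```python
Q, N = 7681, 256   # 7681 = 15 * 512 + 1, so q = 1 mod 2n

def find_psi(q=Q, n=N):
    """Smallest psi with psi^n = -1 mod q, i.e. a primitive 2n-th root of unity."""
    return next(x for x in range(2, q) if pow(x, n, q) == q - 1)

def ntt(a, omega, q=Q):
    """Naive O(n^2) transform: A[i] = sum_j a[j] * omega^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def intt(A, omega, q=Q):
    """Inverse transform: a[i] = n^-1 * sum_j A[j] * omega^(-i*j) mod q."""
    n_inv = pow(len(A), -1, q)
    return [n_inv * x % q for x in ntt(A, pow(omega, -1, q), q)]

def ntt_mul(a, b, q=Q):
    """Product in Z_q[x]/(x^n + 1) via negative wrapped convolution."""
    n = len(a)
    psi = find_psi(q, n)
    omega = psi * psi % q                      # primitive n-th root of unity
    psi_inv = pow(psi, -1, q)
    a_hat = [x * pow(psi, i, q) % q for i, x in enumerate(a)]   # scale by psi^i
    b_hat = [x * pow(psi, i, q) % q for i, x in enumerate(b)]
    C = [x * y % q for x, y in zip(ntt(a_hat, omega, q), ntt(b_hat, omega, q))]
    c_hat = intt(C, omega, q)
    return [x * pow(psi_inv, i, q) % q for i, x in enumerate(c_hat)]  # undo scaling

def schoolbook_mul(a, b, q=Q):
    """Reference O(n^2) product with explicit reduction by x^n + 1."""
    n = len(a)
    res = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                res[i + j] = (res[i + j] + ai * bj) % q
            else:
                res[i + j - n] = (res[i + j - n] - ai * bj) % q
    return res

a = [(3 * i + 1) % Q for i in range(N)]        # arbitrary test polynomials
b = [(7 * i * i + 2) % Q for i in range(N)]
c = ntt_mul(a, b)
```

Scaling by powers of `psi` turns the cyclic convolution computed by the NTT into the negacyclic one, which is why no zero padding and no separate reduction by $x^n + 1$ are needed.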

slide-75
SLIDE 75

Efficient Computation of the NTT (Cooley-Tukey)

  • Bit reversal required ($\mathrm{NTT}_{no \to bo}$)
  • Precomputation of powers of $\omega$ possible
  • Arithmetic is basically multiplication and reduction modulo $q$ ($\frac{n}{2} \log_2(n)$ times)
  • Further optimizations still possible

[Figure: butterfly network; annotations: multiplication by $\omega^0 = 1$, twiddle factors]

slide-76
SLIDE 76

Ring-LWE Encryption on FPGA

NTT is very fast but still quite small

Lots of improvement since [GFS+12]

slide-77
SLIDE 77

Tutorial Outline – Part II

  • Code-based Cryptography
  • Efficient Code-based Implementations
  • Lattice-based Cryptography
  • Efficient Lattice-based Implementations
  • Lessons Learned

slide-78
SLIDE 78
Lessons Learned

  • Efficient McEliece implementations with practical key sizes
  • QC-MDPC codes are an efficient alternative to binary Goppa codes
  • Note: consider attacks on decryption failure rate (ASIACRYPT 2016)
  • Low-cost FPGA implementation practical for key agreement scheme (in prep)
  • R-LWE encryption schemes are extremely efficient
  • R-LWE (and variants) also allow signature + advanced schemes
  • FPGA implementations more efficient than RSA, on par with ECC
  • Papers and source code available at http://www.seceng.rub.de/research/projects/pqc/
  • For more papers and code, see the project websites of

ICT-644729

slide-79
SLIDE 79


Thank you! Questions?


slide-80
SLIDE 80

Part III: Post Quantum Cryptography in Embedded Software

Tutorial@CHES 2017 - Taipei Tim Güneysu Ruhr-Universität Bochum & DFKI 04.10.2017

including slides by Ingo von Maurich and Thomas Pöppelmann

slide-81
SLIDE 81

Tutorial Outline – Part III

  • Code-based Cryptography
  • Efficient Code-based Implementations
  • Lattice-based Cryptography
  • Efficient Lattice-based Implementations
  • Lessons Learned

slide-82
SLIDE 82

Recall: McEliece Encryption Scheme [1978]

Key Generation: Given an $[n, k]$-code $C$ with generator matrix $G$ and error correcting capability $t$
Private Key: $(S, G, P)$, where $S$ is a scrambling and $P$ is a permutation matrix
Public Key: $G' = S \cdot G \cdot P$

Encryption: Message $m \in \mathbb{F}_2^k$, error vector $e \in_R \mathbb{F}_2^n$, $\mathrm{wt}(e) \le t$
$x \leftarrow mG' + e$

Decryption: Let $\Psi_H$ be a $t$-error-correcting decoding algorithm.
$m \cdot S \leftarrow \Psi_H(x \cdot P^{-1})$, removes the error $e \cdot P^{-1}$
Extract $m$ by computing $m \cdot S \cdot S^{-1}$

slide-83
SLIDE 83

(QC-)MDPC McEliece

Encryption: Message $m \in \mathbb{F}_2^k$, error vector $e \in_R \mathbb{F}_2^n$, $\mathrm{wt}(e) \le t$
$x \leftarrow mG + e$

Decryption: Let $\Psi_H$ be a $t$-error-correcting (QC-)MDPC decoding algorithm.
$mG \leftarrow \Psi_H(mG + e)$
Extract $m$ from the first $k$ positions.

Parameters for 80-bit equivalent symmetric security [MTSB13]: $n_0 = 2$, $n = 9602$, $r = 4801$, $w = 90$, $t = 84$

slide-84
SLIDE 84

Tutorial Outline – Part III

  • Code-based Cryptography
  • Efficient Code-based Implementations
  • Lattice-based Cryptography
  • Efficient Lattice-based Implementations
  • Lessons Learned

slide-85
SLIDE 85

32-bit ARM Microcontroller

ARM-based 32-bit Microcontroller

  • STM32F407@168MHz
  • 32-bit ARM Cortex-M4
  • 1 Mbyte flash, 192 kbyte SRAM
  • Crypto functions: TRNG, 3DES, AES, SHA-1/-256, HMAC co-processor
  • Costs: roughly US$ 10

AVR-based 8-bit Microcontroller

  • ATXMega128A1@32MHz
  • 8-bit AVR Xmega Family
  • 256 Kbyte flash, 8 Kbyte SRAM
  • Crypto functions: DES, AES
  • Costs: roughly US$ 10
slide-86
SLIDE 86

Implementing Key Generation

  • Memory is a scarce resource on microcontrollers
  • Generate and store random sparse vectors of length 4801 with 45 bits set → store set bit locations only

Generating secret key $H = [H_0 | H_1]$

  • Generate first row of $H_1$, repeat if not invertible
  • Generate first row of $H_0$
  • Convert to sparse representation → 90 counters

Computing public key $G = [I | Q]$

  • Compute $Q$ from first row of $H_1^{-1}$ and $H_0$
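
The sparse representation can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `dense_to_sparse` and `sparse_next_row` are hypothetical, rows are held as one byte per bit for readability, and the rotation direction (one position to the right) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a dense row (one byte per bit) into the list of its set-bit
   positions. Returns the number of positions written (at most max_w). */
size_t dense_to_sparse(const uint8_t *row, size_t len,
                       uint16_t *pos, size_t max_w)
{
    size_t w = 0;
    for (size_t i = 0; i < len && w < max_w; i++)
        if (row[i])
            pos[w++] = (uint16_t)i;
    return w;
}

/* Derive the next row of a circulant block in place: every stored position
   acts as a counter that is incremented and wraps at the block length r. */
void sparse_next_row(uint16_t *pos, size_t w, uint16_t r)
{
    for (size_t k = 0; k < w; k++) {
        pos[k]++;
        if (pos[k] == r)   /* overflow check */
            pos[k] = 0;
    }
}
```

With 45 set bits per block and 16-bit positions, one block row takes 90 bytes in this form, compared with 601 bytes for a dense 4801-bit row.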

slide-87
SLIDE 87

Implementing (Plain) Encryption

  • Same operating principle as for the low-cost hardware implementation
  • All processing is based on 32-bit operations
  • Set bits in message $m$ select rows of the public key $G$
  • Parse $m$ bit-by-bit, XOR current row of $G$ if bit is set
  • Error addition for encryption
  • Use TRNG to provide random bits to add $t$ errors
  • Obtain individual error indices by rejection sampling from $\lceil \log_2 n \rceil = 14$ random bits
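
The two steps above (row selection by XOR and rejection sampling of error positions) might look roughly like this in C. It is a sketch under stated assumptions: `trng_stub` is a small LCG standing in for the hardware TRNG and is not cryptographically secure, all function names are hypothetical, and vectors use one byte per bit instead of packed 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

#define N_BITS 9602u   /* code length n of the 80-bit parameter set */

/* Stand-in for the hardware TRNG: a small LCG, NOT cryptographically secure. */
static uint32_t lcg_state = 1u;
static uint32_t trng_stub(void)
{
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return lcg_state >> 16;
}

/* Draw 14 random bits and reject values outside [0, n). */
uint16_t sample_error_index(void)
{
    uint32_t v;
    do {
        v = trng_stub() & 0x3FFFu;   /* 14 bits: 0 .. 16383 */
    } while (v >= N_BITS);
    return (uint16_t)v;
}

/* parity = m * Q for one circulant block whose first row is q_row. */
void qc_encode(const uint8_t *m, const uint8_t *q_row,
               uint8_t *parity, size_t r)
{
    for (size_t j = 0; j < r; j++)
        parity[j] = 0;
    for (size_t i = 0; i < r; i++) {
        if (!m[i])
            continue;                          /* bit not set: skip row */
        for (size_t j = 0; j < r; j++)
            parity[(i + j) % r] ^= q_row[j];   /* row i = first row rotated by i */
    }
}
```

Since 9602 of the 16384 possible 14-bit values are accepted, a little under 60% of the draws yield a usable index. A full implementation would also have to discard repeated indices so that exactly $t$ distinct positions are flipped.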

slide-88
SLIDE 88

Implementing (Plain) Decryption

Recall: syndrome computation with the parity-check matrix in sparse representation

  • Parse ciphertext bit-by-bit
  • XOR row of the secret key if corresponding ciphertext bit is set

Decoding iteration

  • Count #bits that are set in both the syndrome and the current row of the parity-check matrix blocks → use 90 counters

  • Compare #bits to decoding threshold
  • Invert current ciphertext bit if #bits above threshold
  • Add current row to syndrome
  • Generate next row → increment counters (check overflows)
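
One step of the decoding iteration can be sketched as below. This is an illustrative fragment, not the measured implementation: `bit_flip_step` is a hypothetical name, the syndrome is stored as one byte per bit, `pos` holds the sparse positions (the counters) for the current ciphertext bit, and treating "above threshold" as `count >= threshold` is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Process one ciphertext bit.
   syndrome  : one byte per syndrome bit
   pos       : syndrome positions touched by this ciphertext bit
   Returns 1 if the bit was flipped, 0 otherwise. */
int bit_flip_step(uint8_t *syndrome, uint8_t *ct_bit,
                  const uint16_t *pos, size_t w, unsigned threshold)
{
    unsigned count = 0;
    for (size_t k = 0; k < w; k++)
        count += syndrome[pos[k]];     /* unsatisfied parity checks */

    if (count < threshold)
        return 0;                      /* keep the bit as it is */

    *ct_bit ^= 1;                      /* invert the ciphertext bit */
    for (size_t k = 0; k < w; k++)
        syndrome[pos[k]] ^= 1;         /* add the row to the syndrome */
    return 1;
}
```

For a single bit error, the syndrome equals the set of positions of the erroneous bit, so one call with those positions flips the bit and clears the syndrome.
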
slide-89
SLIDE 89

Implementation Results

Scheme             Platform    Cycles/Op    Time
McE MDPC (keygen)  STM32F407   148,576,008  884 ms
McE MDPC (enc)     STM32F407   16,771,239   100 ms
McE MDPC (dec)     STM32F407   37,171,833   221 ms
McE MDPC (enc)     ATxmega256  26,767,463   836 ms
McE MDPC (dec)     ATxmega256  86,874,388   2.71 s

  • 8-Bit AVR platform too slow for real-world deployment
  • Key generation excessive, decryption roughly 3 seconds
  • 32-bit ARM is a suitable platform and provides built-in TRNG
  • Improved QcBits software for Cortex-M4 by Chou (CHES 2016)
slide-90
SLIDE 90
Further Implementation Remarks and Requirements

  • CCA2-Security for McEliece Encryption:

– Additional conversion (e.g., via Fujisaki-Okamoto; this requires a hash function and re-encryption)

  • Side-Channel Attacks:

– Masking schemes (SCA) for McEliece by Eisenbarth et al. [SAC15]; these do not include CCA2 security

  • Decryption Failure Rate Attacks:

– Guo et al. [ASIACRYPT16] identify a correlation between decoding failures in iterative decoders (bit-flipping decoding)

slide-92
SLIDE 92

Ring-LWE Encryption Scheme [LP11/LPR10]

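Gen: Choose $\mathbf{a} \leftarrow R$ and $\mathbf{r}_1, \mathbf{r}_2 \leftarrow D_\sigma$; pk: $\mathbf{p} = \mathbf{r}_1 - \mathbf{a} \cdot \mathbf{r}_2 \in R$; sk: $\mathbf{r}_2$

Enc($\mathbf{a}, \mathbf{p}, m \in \{0,1\}^n$): $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3 \leftarrow D_\sigma$. $\mathbf{m} = \mathrm{encode}(m)$. Ciphertext: $[\mathbf{c}_1 = \mathbf{a} \cdot \mathbf{e}_1 + \mathbf{e}_2,\ \mathbf{c}_2 = \mathbf{p} \cdot \mathbf{e}_1 + \mathbf{e}_3 + \mathbf{m}]$

Dec($c = [\mathbf{c}_1, \mathbf{c}_2], \mathbf{r}_2$): Output $\mathrm{decode}(\mathbf{c}_1 \cdot \mathbf{r}_2 + \mathbf{c}_2)$

[Figure: block diagrams of the encryption and decryption data flow]

Correctness:

$\mathbf{c}_1 \mathbf{r}_2 + \mathbf{c}_2 = (\mathbf{a}\mathbf{e}_1 + \mathbf{e}_2)\mathbf{r}_2 + \mathbf{p}\mathbf{e}_1 + \mathbf{e}_3 + \mathbf{m}$
$= \mathbf{r}_2\mathbf{a}\mathbf{e}_1 + \mathbf{r}_2\mathbf{e}_2 + \mathbf{r}_1\mathbf{e}_1 - \mathbf{r}_2\mathbf{a}\mathbf{e}_1 + \mathbf{e}_3 + \mathbf{m}$
$= \mathbf{m} + \mathbf{r}_2\mathbf{e}_2 + \mathbf{r}_1\mathbf{e}_1 + \mathbf{e}_3$

Here $\mathbf{m}$ is large, while $\mathbf{r}_2\mathbf{e}_2 + \mathbf{r}_1\mathbf{e}_1 + \mathbf{e}_3$ is small.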
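
The correctness argument relies on the encoded message being large and the noise being small. A minimal per-coefficient sketch of encode and decode is given below. The encoding of a set bit as $q/2$ matches the deck's own C code on slide 95; the decoding thresholds $q/4$ and $3q/4$, the function names, and the choice $q = 7681$ are assumptions made for illustration.

```c
#include <stdint.h>

#define Q 7681u   /* modulus of the (256, 7681, 11.32) parameter set */

/* Map one message bit to a coefficient: 0 -> 0, 1 -> floor(q/2). */
uint32_t encode_bit(unsigned bit)
{
    return bit ? Q / 2 : 0;
}

/* Threshold decoding of a coefficient in [0, q): values nearer to q/2
   than to 0 (modulo q) are read as a 1 bit. */
unsigned decode_coeff(uint32_t c)
{
    return (c >= Q / 4 && c < 3 * Q / 4) ? 1u : 0u;
}
```

Decoding succeeds as long as the accumulated noise term stays below $q/4$ in absolute value; larger noise produces the decryption errors mentioned later in the deck.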

slide-93
SLIDE 93

Ring-LWE Encryption: Parameters

  • Message and ciphertext:

– Message space: $n$ bits
– Expansion $2 \cdot \log_2 q$
– Two large polynomials ($\mathbf{c}_1, \mathbf{c}_2$)

  • Public key: one or two large polynomials ($\mathbf{a}, \mathbf{p}$)
  • Secret key: small polynomial ($\mathbf{r}_2$)

Sizes of $(\mathbf{c}_1, \mathbf{c}_2)$, sk and pk are given in bits.

Parameter set                  n    q      σ     |c1, c2|  |sk|   |pk|    Security
(256, 4093, 8.35) [LP11]       256  4093   ~4.5  6,144     1,792  6,144   ~106 bits
(256, 7681, 11.32) [GFSBH12]   256  7681   ~4.8  6,656     1,792  6,656   ~106 bits
(512, 12289, 12.18) [GFSBH12]  512  12289  ~4.9  14,336    3,584  14,336  ~256 bits
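
The ciphertext and public-key sizes in the table are consistent with storing two polynomials of $n$ coefficients at $\lceil \log_2 q \rceil$ bits each. The helper names below are hypothetical and the snippet is only a way to reproduce that arithmetic.

```c
#include <stdint.h>

/* Number of bits needed to store one coefficient in [0, q). */
unsigned coeff_bits(uint32_t q)
{
    unsigned bits = 0;
    for (uint32_t v = q - 1; v != 0; v >>= 1)
        bits++;
    return bits;
}

/* Size in bits of two polynomials with n coefficients each
   (the ciphertext (c1, c2) or the public key (a, p)). */
uint32_t two_poly_bits(uint32_t n, uint32_t q)
{
    return 2u * n * coeff_bits(q);
}
```

For example, $n = 256$ and $q = 7681$ need 13 bits per coefficient, which gives $2 \cdot 256 \cdot 13 = 6656$ bits.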

slide-95
SLIDE 95

Simple Implementation of RLWE-Encryption

void encrypt(poly a, poly p, unsigned char *plaintext, poly c1, poly c2)
{
    int i, j;
    poly e1, e2, e3;

    gauss_poly(e1);                  // sample the three error polynomials
    gauss_poly(e2);
    gauss_poly(e3);
    poly_init(c1, 0, n);             // init with 0
    poly_init(c2, 0, n);             // init with 0

    for (i = 0; i < n; i++) {        // multiplication loops: this has to be fast
        for (j = 0; j < n; j++) {
            // schoolbook product modulo x^n + 1: wrapped-around terms change sign
            c1[(i + j) % n] = modq(c1[(i + j) % n] + (a[i] * e1[j] * (i + j >= n ? -1 : 1)));
            c2[(i + j) % n] = modq(c2[(i + j) % n] + (p[i] * e1[j] * (i + j >= n ? -1 : 1)));
        }
        c1[i] = modq(c1[i] + e2[i]);                 // c1 = a*e1 + e2
        // c2 = p*e1 + e3 + encode(m): a set message bit adds q/2
        c2[i] = (plaintext[i >> 3] & (1 << (i % 8))) ? modq(c2[i] + e3[i] + q / 2)
                                                     : modq(c2[i] + e3[i]);
    }
}

slide-96
SLIDE 96
Software Implementation Main Functions for R-LWE

  • Two main components

– Polynomial multiplier for $n \in \{256, 512, 1024\}$ over specific rings with coefficients of fewer than 24 bits ($\log_2 q < 24$)
– Discrete Gaussian sampler with precisely defined precision, parameter $\sigma$, and tail cut $\tau$

slide-97
SLIDE 97

Intermediate Results

  • Implementation of RLWE-Encryption on the AVR 8-bit ATxmega processor running at 32 MHz

  • Schoolbook multiplication (SchoolMul)
  • Encryption is two multiplications and decryption one
slide-98
SLIDE 98

Recall Improvement: Polynomial Multiplication with NTT

  • The Number Theoretic Transform (NTT) is a discrete Fourier transform (DFT) defined over a finite field or ring. For a given primitive $n$-th root of unity $\omega$, the NTT is defined as:

– Forward transformation (NTT): $\mathbf{A}[i] = \sum_{j=0}^{n-1} \mathbf{a}[j]\,\omega^{ij}$, $i = 0, 1, \dots, n-1$
– Inverse transformation (INTT): $\mathbf{a}[i] = n^{-1} \sum_{j=0}^{n-1} \mathbf{A}[j]\,\omega^{-ij}$, $i = 0, 1, \dots, n-1$

  • NTT exists if $q$ is a prime, $n$ a power of two and if $q \equiv 1 \bmod 2n$
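
The two definitions can be checked directly with a quadratic-time implementation. The sketch below is for illustration only (hypothetical names, no attempt at speed) and assumes $q$ is a prime below $2^{31}$ and that input coefficients are already reduced modulo $q$.

```c
#include <stddef.h>
#include <stdint.h>

/* Modular exponentiation: base^exp mod q, using 64-bit intermediates. */
static uint32_t pow_mod(uint32_t base, uint32_t exp, uint32_t q)
{
    uint64_t result = 1, b = base % q;
    while (exp) {
        if (exp & 1)
            result = result * b % q;
        b = b * b % q;
        exp >>= 1;
    }
    return (uint32_t)result;
}

/* A[i] = sum_j a[j] * w^(i*j) mod q, evaluated directly in O(n^2). */
void ntt_naive(const uint32_t *a, uint32_t *A, size_t n, uint32_t w, uint32_t q)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t acc = 0;
        for (size_t j = 0; j < n; j++)
            acc = (acc + (uint64_t)a[j] * pow_mod(w, (uint32_t)((i * j) % n), q)) % q;
        A[i] = (uint32_t)acc;
    }
}

/* a[i] = n^-1 * sum_j A[j] * w^(-i*j) mod q. A and a must not overlap. */
void intt_naive(const uint32_t *A, uint32_t *a, size_t n, uint32_t w, uint32_t q)
{
    uint32_t w_inv = pow_mod(w, (uint32_t)(n - 1), q);  /* w^n = 1, so w^-1 = w^(n-1) */
    uint32_t n_inv = pow_mod((uint32_t)n, q - 2, q);    /* Fermat's little theorem */
    ntt_naive(A, a, n, w_inv, q);
    for (size_t i = 0; i < n; i++)
        a[i] = (uint32_t)((uint64_t)a[i] * n_inv % q);
}
```

A toy instance that satisfies the existence condition is $n = 4$, $q = 17$ (since $17 \equiv 1 \bmod 8$) with $\omega = 4$: the input $(1, 2, 3, 4)$ transforms to $(10, 7, 15, 6)$ and back.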

slide-99
SLIDE 99

Efficient Computation of the NTT (Textbook)

  • Bitreversal required ($\mathrm{NTT}_{no \to bo}$)
  • Precomputation of powers of $\omega$ possible
  • Arithmetic is basically multiplication and reduction modulo $q$ ($\frac{n}{2} \log_2(n)$ times)

[Figure: butterfly network of the textbook NTT; labels: twiddle factors, multiplication by $\omega^0 = 1$]

slide-100
SLIDE 100

Optimization of NTT Computation

Removal of expensive “helper” functions

  • Problem: Permutation (Bitrev) of polynomial is expensive

– “Standard” $\mathrm{NTT}_{bo \to no}$ requires bitreversed input and produces naturally ordered output
– Bitreversal before each forward or inverse NTT

  • Solution: NTT algorithm can be written as

– Natural to bitreversed for forward: $\mathrm{NTT}_{no \to bo}$
– Bitreversed to natural for inverse: $\mathrm{INTT}_{bo \to no}$
– No bitreversal necessary anymore: $\mathrm{INTT}_{bo \to no}(\mathrm{NTT}_{no \to bo}(\mathbf{a}) \circ \mathrm{NTT}_{no \to bo}(\mathbf{b}))$
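
The pairing of a natural-to-bitreversed forward transform with a bitreversed-to-natural inverse can be sketched as follows. Several things here are assumptions for illustration: the function names are hypothetical, the forward transform uses Gentleman-Sande butterflies and the inverse uses Cooley-Tukey butterflies (the classic textbook pairing, which is the opposite assignment to the one the deck uses later for merging the $\psi$ powers), twiddle factors are recomputed instead of being read from a table, and the product is taken modulo $x^n - 1$ rather than $x^n + 1$.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t pow_mod(uint32_t base, uint32_t exp, uint32_t q)
{
    uint64_t result = 1, b = base % q;
    while (exp) {
        if (exp & 1)
            result = result * b % q;
        b = b * b % q;
        exp >>= 1;
    }
    return (uint32_t)result;
}

/* Forward NTT in place: natural-order input, bit-reversed output. */
void ntt_no_to_bo(uint32_t *a, size_t n, uint32_t w, uint32_t q)
{
    for (size_t len = n / 2; len >= 1; len /= 2) {
        size_t step = n / (2 * len);
        for (size_t start = 0; start < n; start += 2 * len) {
            for (size_t j = 0; j < len; j++) {
                uint32_t tw = pow_mod(w, (uint32_t)(j * step), q);
                uint32_t u = a[start + j], v = a[start + j + len];
                a[start + j] = (u + v) % q;
                a[start + j + len] = (uint32_t)((uint64_t)((u + q - v) % q) * tw % q);
            }
        }
    }
}

/* Inverse NTT in place: bit-reversed input, natural-order output,
   including the final scaling by n^-1 (q must be prime). */
void intt_bo_to_no(uint32_t *a, size_t n, uint32_t w, uint32_t q)
{
    uint32_t w_inv = pow_mod(w, (uint32_t)(n - 1), q);
    uint32_t n_inv = pow_mod((uint32_t)n, q - 2, q);
    for (size_t len = 1; len < n; len *= 2) {
        size_t step = n / (2 * len);
        for (size_t start = 0; start < n; start += 2 * len) {
            for (size_t j = 0; j < len; j++) {
                uint32_t tw = pow_mod(w_inv, (uint32_t)(j * step), q);
                uint32_t u = a[start + j];
                uint32_t v = (uint32_t)((uint64_t)a[start + j + len] * tw % q);
                a[start + j] = (u + v) % q;
                a[start + j + len] = (u + q - v) % q;
            }
        }
    }
    for (size_t i = 0; i < n; i++)
        a[i] = (uint32_t)((uint64_t)a[i] * n_inv % q);
}

/* c = a * b mod (x^n - 1) without any bit-reversal step. Overwrites a and b. */
void poly_mul_cyclic(uint32_t *a, uint32_t *b, uint32_t *c,
                     size_t n, uint32_t w, uint32_t q)
{
    ntt_no_to_bo(a, n, w, q);
    ntt_no_to_bo(b, n, w, q);
    for (size_t i = 0; i < n; i++)
        c[i] = (uint32_t)((uint64_t)a[i] * b[i] % q);   /* pointwise product */
    intt_bo_to_no(c, n, w, q);
}
```

The pointwise product does not care about coefficient order as long as both operands use the same one, which is why the permutation can be dropped entirely. With $n = 4$, $q = 17$, $\omega = 4$, multiplying $1 + 2x + 3x^2 + 4x^3$ by $x$ yields $4 + x + 2x^2 + 3x^3$.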
slide-101
SLIDE 101

Optimization of NTT Computation

Removal of expensive “helper” functions

  • Problem: Multiplication by scalar $n^{-1}$ in inverse transformation is expensive
  • Solution: In lattice-based crypto we usually multiply by pretransformed constants (e.g., $\mathbf{a}$, $\mathbf{p}$, or $\mathbf{r}_2$)

– Put $n^{-1}$ into these constants
– Multiplication by scalar does not change much as $x \cdot \mathrm{NTT}(\mathbf{a}) = \mathrm{NTT}(x \cdot \mathbf{a})$
– Store $\mathbf{a}' = n^{-1} \mathbf{a}$

slide-102
SLIDE 102

Optimization of NTT Computation

Removal of expensive “helper” functions

  • Problem: Multiplication by powers of $\psi$ and $\psi^{-1}$ (PowMul) is expensive
  • Solution: Merge powers of $\psi$ into twiddle factors

– Only possible with forward transformation and current butterfly (see next picture)
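
The merge can be written out explicitly. The derivation below assumes the usual setup for multiplication modulo $x^n + 1$, namely that $\psi$ is a primitive $2n$-th root of unity with $\psi^2 = \omega$ and that the forward PowMul step scales coefficient $j$ by $\psi^j$; this reading of the slide is an assumption.

```latex
\mathbf{A}[i]
  = \sum_{j=0}^{n-1} \bigl(\psi^{j}\,\mathbf{a}[j]\bigr)\,\omega^{ij}
  = \sum_{j=0}^{n-1} \mathbf{a}[j]\,\psi^{j}\,\psi^{2ij}
  = \sum_{j=0}^{n-1} \mathbf{a}[j]\,\psi^{(2i+1)j}
```

Scaling by $\psi^j$ followed by a transform with twiddle factors $\omega^{ij}$ is therefore the same as a single transform whose twiddle factors are powers of $\psi$, so the separate scaling pass disappears.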

slide-103
SLIDE 103

Optimization of NTT Computation

  • Combines all tricks for forward transformation
  • We cannot merge powers of $\psi^{-1}$; we have to multiply after the transformation is finished

slide-104
SLIDE 104

Optimization of NTT Computation

  • Usage of Gentleman-Sande (GS) butterfly instead of Cooley-Tukey (CT) allows merging of inverse multiplication by powers of $\psi^{-1}$

– CT: $a + \omega b$ and $a - \omega b$
– GS: $a + b$ and $(a - b)\omega$
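
The two butterflies differ only in where the twiddle multiplication sits. A minimal sketch in C, with hypothetical function names and the assumption that inputs are reduced modulo $q$ and $q < 2^{31}$:

```c
#include <stdint.h>

/* Cooley-Tukey butterfly: (a, b) -> (a + w*b, a - w*b) mod q.
   The multiplication happens before the add/subtract. */
void ct_butterfly(uint32_t *a, uint32_t *b, uint32_t w, uint32_t q)
{
    uint32_t t = (uint32_t)((uint64_t)w * *b % q);
    *b = (*a + q - t) % q;
    *a = (*a + t) % q;
}

/* Gentleman-Sande butterfly: (a, b) -> (a + b, (a - b)*w) mod q.
   The multiplication happens after the subtract. */
void gs_butterfly(uint32_t *a, uint32_t *b, uint32_t w, uint32_t q)
{
    uint32_t d = (*a + q - *b) % q;
    *a = (*a + *b) % q;
    *b = (uint32_t)((uint64_t)d * w % q);
}
```

Because the GS butterfly multiplies on the output side, a final scaling of the outputs can be folded into its twiddle factors, which is what the bullet above exploits for the inverse transform.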

slide-105
SLIDE 105

Optimization of NTT Computation

  • We save several steps compared to straightforward approach
  • Almost no additional costs (if we store twiddle factors)

– No multiplication by one in first stage anymore
– Can be mitigated by using lookup tables if coefficients for e are small

[Figure: computation flow of the textbook approach versus the optimized (*) approach]

(*) FFT people probably know most of these tricks

slide-106
SLIDE 106

Optimization of NTT Computation

How to accelerate the multiplication core operation

  • Address generation for NTT is cheap and well researched (see FFT)
  • The only expensive computation is the butterfly, which boils down to

– a $\log_2 q \times \log_2 q$ multiplication
– a reduction modulo $q$
– two additions or subtractions modulo $q$

  • Implementation of the butterfly depends on target architecture

– General methods like Montgomery or Barrett reduction
– Reductions that depend on special primes like Solinas primes
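
As one example of a general method, Barrett reduction replaces the division by a multiplication with a precomputed constant. The sketch below is illustrative rather than taken from the measured code: the names are hypothetical, $q = 12289$ is picked from the parameter table, and a 32x32-to-64-bit multiplication is assumed to be available.

```c
#include <stdint.h>

#define Q 12289u
/* floor(2^32 / Q), evaluated at compile time */
#define BARRETT_M ((uint32_t)(((uint64_t)1 << 32) / Q))

/* Reduce x modulo Q without a division; valid for any 32-bit x. */
uint32_t barrett_reduce(uint32_t x)
{
    /* t is either floor(x / Q) or one less than that */
    uint32_t t = (uint32_t)(((uint64_t)x * BARRETT_M) >> 32);
    uint32_t r = x - t * Q;          /* 0 <= r < 2Q */
    if (r >= Q)                      /* a single conditional correction suffices */
        r -= Q;
    return r;
}
```

A product of two reduced coefficients is below $12289^2 < 2^{28}$, so it fits the 32-bit input range. Note that the conditional subtraction as written may not run in constant time on every compiler and target.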

slide-107
SLIDE 107

Ring-LWE Encryption on ATXmega (ATXMega128A1)

  • Moderate performance impact of larger parameter set
  • Very fast decryption
  • Some pitfalls in practice (only CPA security and decryption errors)

slide-108
SLIDE 108

Ring-LWE Encryption on ATXmega Family

Schoolbook was 12 million

[POG15] High-Performance Ideal Lattice-Based Cryptography on 8-bit ATxmega Microcontrollers, Thomas Pöppelmann, Tobias Oder, and Tim Güneysu, Latincrypt’15

Code size is not significantly increased

Sampler is the bottleneck

slide-109
SLIDE 109

Ring-LWE Encryption on Other Platforms [CRV+15]

Table from [CRV+15]: Ruan de Clercq, Sujoy Sinha Roy, Frederik Vercauteren, Ingrid Verbauwhede: Efficient software implementation of ring-LWE encryption. DATE 2015: 339-344

slide-110
SLIDE 110
Further Implementation Remarks and Requirements

  • CCA2-Security:

– Additional conversion (e.g., via Fujisaki-Okamoto; this requires a hash function and re-encryption)

  • Side-Channel Attacks:

– Masking schemes (SCA) by Reparaz et al. [CHES15, PQCRYPTO16]; these do not include CCA2 security

  • Fault-Injection Attacks:

– Loop-abort attacks by Espitau et al. [ePrint 16]
– Fault sensitivity by Bindel et al. [FDTC16]

slide-112
SLIDE 112
  • Efficient McEliece implementations with practical key sizes
  • QC-MDPC codes are an efficient alternative also in software
  • Note: consider reported issues with decryption error (ASIACRYPT 2016)
  • Physical attacks are more challenging to counter with probabilistic decoding
  • R-LWE encryption schemes are extremely efficient
  • R-LWE (and variants) also allow signature + advanced schemes
  • Software implementations very efficient compared to ECC and RSA
  • Papers and source code available at

http://www.seceng.rub.de/research/projects/pqc/

  • For more papers and codes, see project websites of

Lessons Learned

ICT-644729

slide-113
SLIDE 113

Part III: Post Quantum Cryptography in Embedded Software

Tutorial@CHES 2017 - Taipei Tim Güneysu Ruhr-Universität Bochum & DFKI 04.10.2017

Thank you! Questions?