SLIDE 1

NewHope for ARM Cortex-M

Erdem Alkim¹, Philipp Jakubeit², Peter Schwabe²

erdemalkim@gmail.com, phil.jakubeit@gmail.com, peter@cryptojedi.org

¹ Ege University, Izmir, Turkey
² Radboud University, Nijmegen, The Netherlands

SPACE 2016

SLIDE 10

NewHope Efficient Implementation

Post-Quantum Cryptography

Shor’s algorithm in 1994:

◮ Factorization problem – polynomial time
◮ Discrete logarithm problem – polynomial time
◮ Quantum computers are in reach: IBM estimates ≈15 years

Threat:

◮ Record encrypted messages today
◮ Break encryption with quantum computers

Alternatives:

◮ Problems which are not broken by quantum algorithms (yet)
◮ Lattice-based cryptography
◮ Ring-learning-with-errors problem

Steps taken:

◮ Tor considering (ECC+RLWE)
◮ Google experimented (ECC+RLWE)
  ◮ Slowest 5% increased by 20 ms
  ◮ Slowest 1% increased by 150 ms

Alkim, Jakubeit, Schwabe 2016, A new hope on ARM Cortex-M

SLIDE 13

Ring-Learning-With-Errors Problem

$R_q = \mathbb{Z}_q[X]/(X^n + 1)$, $\chi$ – an error distribution on $R_q$

Search version:

Given: $(a_i, b_i)$ for $a_i \in R_q$ and $b_i = s \cdot a_i + e_i$ for $e_i \xleftarrow{\$} \chi$
Wanted: $s$

$a_1 \in R_q,\quad b_1 = s \cdot a_1 + e_1$
$a_2 \in R_q,\quad b_2 = s \cdot a_2 + e_2$
$\vdots$
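
As a minimal sketch of what a single RLWE sample looks like, the following C code computes $b = s \cdot a + e$ in a toy ring with $n = 4$ and $q = 17$. The toy parameters, the function names, and the schoolbook multiplication are assumptions made for illustration only; they are far too small to be secure and are not taken from the talk.

```c
#include <stdint.h>

enum { TOY_N = 4, TOY_Q = 17 };  /* illustrative toy parameters, not NewHope's */

/* c = a * b in Z_q[X]/(X^n + 1), schoolbook method.
 * Because X^n = -1, a product term of degree k >= n wraps around
 * to degree k - n with its sign flipped. */
void ring_mul(const uint16_t a[TOY_N], const uint16_t b[TOY_N], uint16_t c[TOY_N])
{
    uint32_t acc[TOY_N] = {0};
    for (int i = 0; i < TOY_N; i++) {
        for (int j = 0; j < TOY_N; j++) {
            uint32_t prod = (uint32_t)a[i] * b[j] % TOY_Q;
            int k = i + j;
            if (k < TOY_N)
                acc[k] = (acc[k] + prod) % TOY_Q;
            else
                acc[k - TOY_N] = (acc[k - TOY_N] + TOY_Q - prod) % TOY_Q;
        }
    }
    for (int k = 0; k < TOY_N; k++)
        c[k] = (uint16_t)acc[k];
}

/* One RLWE sample: b = s * a + e, with all coefficients in [0, q). */
void rlwe_sample(const uint16_t a[TOY_N], const uint16_t s[TOY_N],
                 const uint16_t e[TOY_N], uint16_t b[TOY_N])
{
    ring_mul(s, a, b);
    for (int k = 0; k < TOY_N; k++)
        b[k] = (uint16_t)((b[k] + e[k]) % TOY_Q);
}
```

The search problem asks for `s` given many pairs `(a, b)` produced this way; without the small error `e`, each pair would reveal `s` by plain linear algebra.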

SLIDE 23

Post-Quantum Key Exchange

Use encryption scheme to send a chosen key

1998 Hoffstein, Pipher, Silverman: NTRU cryptosystem
2005 Regev: LWE
2010 Lyubashevsky, Peikert, Regev: RLWE

Lattice-based key exchange

2010 Gaborit: Noisy Diffie-Hellman
2011 Lindner, Peikert: (Approximate) Key Agreement
2012 Ding: Reconciliation-based Key Exchange
2014 Peikert: Tweak to obtain unbiased keys
2015 Bos, Costello, Naehrig, Stebila: Instantiate, implement, and integrate into OpenSSL

SLIDE 31

NewHope – The Protocol

Parameters: $q = 12289 < 2^{14}$, $n = 1024$
Error distribution: $\psi_{16}^n$

Alice (server):
    $seed \xleftarrow{\$} \{0, \dots, 255\}^{32}$
    $a \leftarrow \mathrm{Parse}(\text{SHAKE-128}(seed))$
    $s, e \xleftarrow{\$} \psi_{16}^n$
    $b \leftarrow as + e$

Alice → Bob: $(seed, b)$, 1824 bytes

Bob (client):
    $s', e', e'' \xleftarrow{\$} \psi_{16}^n$
    $a \leftarrow \mathrm{Parse}(\text{SHAKE-128}(seed))$
    $u \leftarrow as' + e'$
    $v \leftarrow bs' + e''$
    $r \xleftarrow{\$} \mathrm{HelpRec}(v)$

Bob → Alice: $(u, r)$, 2048 bytes

Alice (server):
    $v' \leftarrow us$
    $\nu \leftarrow \mathrm{Rec}(v', r)$
    $\mu \leftarrow \text{SHA3-256}(\nu)$

Bob (client):
    $\nu \leftarrow \mathrm{Rec}(v, r)$
    $\mu \leftarrow \text{SHA3-256}(\nu)$
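
Why both sides can derive the same $\nu$ follows from expanding the two noisy values with the definitions on the slide. The second block checks the message sizes under the assumption that each coefficient of $b$ and $u$ is packed into 14 bits and each coefficient of $r$ into 2 bits; that packing is not stated on the slide, but it is consistent with the byte counts shown.

```latex
\begin{aligned}
v      &= bs' + e'' = (as + e)s' + e'' = ass' + es' + e'' \\
v'     &= us = (as' + e')s = ass' + e's \\
v - v' &= es' + e'' - e's
\end{aligned}

\begin{aligned}
|(seed, b)| &= 32 + \tfrac{1024 \cdot 14}{8} = 32 + 1792 = 1824 \text{ bytes} \\
|(u, r)|    &= \tfrac{1024 \cdot 14}{8} + \tfrac{1024 \cdot 2}{8} = 1792 + 256 = 2048 \text{ bytes}
\end{aligned}
```

Since $s, s', e, e', e''$ all have small coefficients, $v - v'$ is small as well, and the hint $r$ lets $\mathrm{Rec}$ map $v$ and $v'$ to the same $\nu$.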

SLIDE 39

Relevant Building Blocks

◮ Error distribution
  ◮ Centered binomial distribution $\psi_{16}$
  ◮ $\mu = 0$ and $\sigma^2 = 8$
  ◮ 32-byte seed
  ◮ ChaCha20 stream cipher
  ◮ RNG (internal)

◮ Fast polynomial multiplication
  ◮ Number theoretic transform (NTT)
  ◮ $c = a \cdot b$
  ◮ Evaluate $\mathrm{NTT}(a)$ and $\mathrm{NTT}(b)$
  ◮ Multiply evaluations $\mathrm{NTT}(a) \circ \mathrm{NTT}(b)$
  ◮ De-evaluate $\mathrm{NTT}^{-1}(\mathrm{NTT}(a) \circ \mathrm{NTT}(b))$
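
One common way to sample the centered binomial distribution $\psi_{16}$ is to subtract the number of set bits in one 16-bit string from the number in another; each bit pair contributes variance 1/2, which gives $\sigma^2 = 8$. The sketch below assumes the caller already has 32 uniformly random bits (in the talk these come from ChaCha20 or the internal RNG); the function names are hypothetical.

```c
#include <stdint.h>

/* Count the set bits in the low 16 bits of x (portable, no compiler builtins). */
static int popcount16(uint32_t x)
{
    int count = 0;
    for (int i = 0; i < 16; i++)
        count += (int)((x >> i) & 1u);
    return count;
}

/* psi_16 sample from 32 random bits: sum over 16 pairs of (b_i - b'_i).
 * The low half of r supplies the b_i, the high half the b'_i.
 * The result lies in [-16, 16] and has mean 0. */
int sample_psi16(uint32_t r)
{
    return popcount16(r & 0xFFFFu) - popcount16(r >> 16);
}

/* Same sample, mapped to a coefficient in [0, q) for q = 12289. */
uint16_t sample_psi16_mod_q(uint32_t r)
{
    int v = sample_psi16(r);
    return (uint16_t)(v < 0 ? v + 12289 : v);
}
```

Because the sample is a difference of two bit counts, it needs no table lookups or rejection loop, which is convenient on small cores.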

SLIDE 43

Number Theoretic Transform (NTT)

◮ Fast Fourier Transform defined over finite fields
◮ $b_i = \sum_{j=0}^{n-1} \omega^{ij} a_j$ for $0 \le i \le n - 1$, with $\omega$ a primitive $n$-th root of unity

$\log n$ levels

$\frac{n}{2}$ butterfly operations per level
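
The level structure can be sketched with a small in-place transform. The code below assumes toy values $n = 8$, $q = 17$, $\omega = 2$ (2 has order 8 modulo 17) and uses plain `%` reductions; it is not the talk's implementation. It uses a decimation-in-frequency layout, so the fast version returns its output in bit-reversed order, and `ntt_naive` evaluates the defining sum directly for comparison.

```c
#include <stdint.h>

enum { NTT_N = 8, NTT_Q = 17, NTT_OMEGA = 2 };  /* toy values, assumed for illustration */

static uint32_t pow_mod(uint32_t base, uint32_t exp)
{
    uint32_t r = 1;
    base %= NTT_Q;
    while (exp) {
        if (exp & 1u)
            r = r * base % NTT_Q;
        base = base * base % NTT_Q;
        exp >>= 1;
    }
    return r;
}

/* Direct evaluation of the definition: out[i] = sum_j omega^(i*j) * in[j]. */
void ntt_naive(const uint32_t in[NTT_N], uint32_t out[NTT_N])
{
    for (uint32_t i = 0; i < NTT_N; i++) {
        uint32_t acc = 0;
        for (uint32_t j = 0; j < NTT_N; j++)
            acc = (acc + pow_mod(NTT_OMEGA, i * j) * in[j]) % NTT_Q;
        out[i] = acc;
    }
}

/* In-place transform: log2(n) levels, n/2 butterflies per level.
 * Input in natural order with entries in [0, q); output in bit-reversed order. */
void ntt_fast(uint32_t x[NTT_N])
{
    for (uint32_t dist = NTT_N / 2; dist >= 1; dist /= 2) {
        uint32_t w_step = pow_mod(NTT_OMEGA, NTT_N / (2 * dist));
        for (uint32_t start = 0; start < NTT_N; start += 2 * dist) {
            uint32_t w = 1;
            for (uint32_t j = start; j < start + dist; j++) {
                uint32_t tmp = x[j];
                x[j] = (tmp + x[j + dist]) % NTT_Q;
                x[j + dist] = (tmp + NTT_Q - x[j + dist]) % NTT_Q * w % NTT_Q;
                w = w * w_step % NTT_Q;
            }
        }
    }
}

/* Reverse the low 3 bits of i (log2(NTT_N) = 3). */
uint32_t bitrev3(uint32_t i)
{
    return ((i & 1u) << 2) | (i & 2u) | ((i >> 2) & 1u);
}
```

For any input, `ntt_fast` leaves the value that `ntt_naive` stores at index `i` at index `bitrev3(i)`, using 3 levels of 4 butterflies instead of 64 multiplications.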

SLIDE 46

Butterfly Operations

$x_j \mapsto (x_j + x_{j+d})$
$x_{j+d} \mapsto (x_j - x_{j+d})\,\omega^j$

    W = omega[(1024 + j)/(2*dist)];
    tmp = x[j];
    x[j] = Barrett(tmp + x[j + dist]);
    x[j + dist] = Montgomery(W * (tmp - x[j + dist]));

Barrett reduction: $r = x \bmod m$
    Precompute $\mu = \lfloor b^{2k} / m \rfloor$
    Replaces division by multiplication
    Reduces 16-bit values

Montgomery reduction: $T \bmod m$
    $R > m$, $\gcd(m, R) = 1$
    Computes $T R^{-1} \bmod m$
    Reduces 32-bit values

SLIDE 47

Algorithmic Optimization Techniques

◮ Using Montgomery arithmetic
◮ Using short Barrett reductions

Montgomery reduction ($R = 2^{18}$)

    montgomery_reduce, rm:
        MUL rt, rm, #12287    // -inv(q) mod R
        AND rt, rt, #262143   // mask R - 1
        MUL rt, rt, #12289    // q
        ADD rm, rm, rt
        SHR rm, rm, #18

Short Barrett reduction

    barrett_reduce, rb:
        MUL rt, rb, #5
        SHR rt, rt, #16
        MUL rt, rt, #12289
        SUB rb, rb, rt
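
The two instruction sequences above translate almost line by line into C. The following is a sketch under stated assumptions rather than the authors' code: the macro names are made up here, and the input bound on `montgomery_reduce` is a conservative choice so that the 32-bit addition cannot wrap.

```c
#include <stdint.h>

#define NEWHOPE_Q 12289u
#define MONT_RLOG 18u      /* R = 2^18 */
#define MONT_QINV 12287u   /* -q^{-1} mod R, since 12287 * 12289 = 576 * 2^18 - 1 */

/* Returns a value congruent to a * R^{-1} mod q.
 * Assumes a < 2^29 so that a + u fits in 32 bits. */
uint16_t montgomery_reduce(uint32_t a)
{
    uint32_t u = a * MONT_QINV;          /* MUL rt, rm, #12287 */
    u &= (1u << MONT_RLOG) - 1u;         /* AND rt, rt, #262143 */
    u *= NEWHOPE_Q;                      /* MUL rt, rt, #12289 */
    a += u;                              /* ADD: the low 18 bits are now zero */
    return (uint16_t)(a >> MONT_RLOG);   /* SHR rm, rm, #18 */
}

/* Short Barrett reduction: 5 / 2^16 approximates 1/q from below, so the
 * quotient estimate never overshoots. The result is congruent to a mod q
 * and below 2^14, but not necessarily below q. */
uint16_t barrett_reduce(uint16_t a)
{
    uint32_t u = ((uint32_t)a * 5u) >> 16;   /* MUL #5, SHR #16 */
    u *= NEWHOPE_Q;                          /* MUL #12289 */
    return (uint16_t)(a - u);                /* SUB */
}
```

Neither function produces a fully reduced result in every case (for example, `barrett_reduce(12289)` returns 12289), so a final correction to $[0, q)$ is still needed wherever a canonical value is required.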

SLIDE 48

Algorithmic Optimization Techniques

◮ Using Montgomery arithmetic
◮ Using short Barrett reductions
◮ Lazy reduction

    W = omega[(1024 + j)/(2*dist)];
    tmp = x[j];
    x[j] = Barrett(tmp + x[j + dist]);
    x[j + dist] = Montgomery(W * (tmp - x[j + dist]));

SLIDE 49

Algorithmic Optimization Techniques

◮ Using Montgomery arithmetic
◮ Using short Barrett reductions
◮ Lazy reduction
◮ Negative-wrapped convolution

$c = (n\psi)^{-1}\,\mathrm{NTT}^{-1}(\mathrm{NTT}(\psi a) \circ \mathrm{NTT}(\psi b))$
$a, b, c \in R_q$
$\psi = \{\sqrt{\omega}^{\,0}, \dots, \sqrt{\omega}^{\,n-1}\}$
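
The negative-wrapped convolution can be checked on a toy ring. The sketch below assumes $n = 4$, $q = 17$, and $\psi = 2$ (so $\omega = \psi^2 = 4$ and $\psi^n = -1$); all names and parameters are illustrative, and the transform is the naive quadratic one to keep the example short. It multiplies the inputs by powers of $\psi$, transforms, multiplies pointwise, transforms back, and undoes the scaling, then compares against a schoolbook product modulo $X^n + 1$.

```c
#include <stdint.h>

#define CONV_N   4u
#define CONV_Q   17u
#define CONV_PSI 2u   /* toy primitive 2n-th root of unity mod q (assumed) */

static uint32_t pow_mod(uint32_t base, uint32_t exp)
{
    uint32_t r = 1;
    base %= CONV_Q;
    while (exp) {
        if (exp & 1u)
            r = r * base % CONV_Q;
        base = base * base % CONV_Q;
        exp >>= 1;
    }
    return r;
}

/* Modular inverse via Fermat's little theorem (q is prime). */
static uint32_t inv_mod(uint32_t x) { return pow_mod(x, CONV_Q - 2u); }

/* Naive transform with root w: out[i] = sum_j w^(i*j) * in[j]. */
static void transform(const uint32_t in[CONV_N], uint32_t out[CONV_N], uint32_t w)
{
    for (uint32_t i = 0; i < CONV_N; i++) {
        uint32_t acc = 0;
        for (uint32_t j = 0; j < CONV_N; j++)
            acc = (acc + pow_mod(w, i * j) * in[j]) % CONV_Q;
        out[i] = acc;
    }
}

/* c = a * b in Z_q[X]/(X^n + 1) using the psi-twisted transform. */
void mul_ntt(const uint32_t a[CONV_N], const uint32_t b[CONV_N], uint32_t c[CONV_N])
{
    const uint32_t omega = CONV_PSI * CONV_PSI % CONV_Q;
    const uint32_t psi_inv = inv_mod(CONV_PSI);
    const uint32_t n_inv = inv_mod(CONV_N);
    uint32_t ta[CONV_N], tb[CONV_N], fa[CONV_N], fb[CONV_N], fc[CONV_N], tc[CONV_N];

    for (uint32_t i = 0; i < CONV_N; i++) {      /* multiply by psi^i */
        ta[i] = a[i] * pow_mod(CONV_PSI, i) % CONV_Q;
        tb[i] = b[i] * pow_mod(CONV_PSI, i) % CONV_Q;
    }
    transform(ta, fa, omega);
    transform(tb, fb, omega);
    for (uint32_t i = 0; i < CONV_N; i++)        /* pointwise product */
        fc[i] = fa[i] * fb[i] % CONV_Q;
    transform(fc, tc, inv_mod(omega));           /* inverse transform, unscaled */
    for (uint32_t i = 0; i < CONV_N; i++)        /* scale by n^-1 * psi^-i */
        c[i] = tc[i] * n_inv % CONV_Q * pow_mod(psi_inv, i) % CONV_Q;
}

/* Reference: schoolbook product with X^n = -1. */
void mul_schoolbook(const uint32_t a[CONV_N], const uint32_t b[CONV_N], uint32_t c[CONV_N])
{
    for (uint32_t k = 0; k < CONV_N; k++)
        c[k] = 0;
    for (uint32_t i = 0; i < CONV_N; i++) {
        for (uint32_t j = 0; j < CONV_N; j++) {
            uint32_t prod = a[i] * b[j] % CONV_Q;
            uint32_t k = i + j;
            if (k < CONV_N)
                c[k] = (c[k] + prod) % CONV_Q;
            else
                c[k - CONV_N] = (c[k - CONV_N] + CONV_Q - prod) % CONV_Q;
        }
    }
}
```

The twist by $\psi$ is what turns the cyclic convolution computed by a plain NTT into reduction modulo $X^n + 1$, without padding the inputs to length $2n$.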

SLIDE 50

Algorithmic Optimization Techniques

◮ Using Montgomery arithmetic
◮ Using short Barrett reductions
◮ Lazy reduction
◮ Negative-wrapped convolution
◮ Precomputed constants

$c = (n\psi)^{-1}\,\mathrm{NTT}^{-1}(\mathrm{NTT}(\psi a) \circ \mathrm{NTT}(\psi b))$
$\omega = \{\omega^0, \omega^1, \dots, \omega^{\frac{n}{2}-1}\}$
$\psi = \{\omega^0, \omega^0 \cdot \psi, \dots, \omega^{\frac{n}{2}-1}, \omega^{\frac{n}{2}-1} \cdot \psi\}$, for $\psi = 7$
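
A table with the interleaved layout above could be generated as follows. This is only a sketch: the function and macro names are hypothetical, and the talk's implementation may store the constants in a different order or representation (for example bit-reversed or in Montgomery form), which the slide does not specify. With $\omega = \psi^2$, entry $i$ of the table is simply $\psi^i \bmod q$.

```c
#include <stdint.h>

#define PARAM_N   1024u
#define PARAM_Q   12289u
#define PARAM_PSI 7u     /* value given on the slide */

/* Fill table[2k] = omega^k and table[2k+1] = omega^k * psi,
 * where omega = psi^2. Equivalently, table[i] = psi^i mod q. */
void build_psi_table(uint16_t table[PARAM_N])
{
    const uint32_t omega = PARAM_PSI * PARAM_PSI % PARAM_Q;
    uint32_t omega_k = 1;
    for (uint32_t k = 0; k < PARAM_N / 2u; k++) {
        table[2u * k]      = (uint16_t)omega_k;
        table[2u * k + 1u] = (uint16_t)(omega_k * PARAM_PSI % PARAM_Q);
        omega_k = omega_k * omega % PARAM_Q;
    }
}
```

Computing the table once trades 2 KB of storage for the modular exponentiations that would otherwise be repeated in every transform.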

SLIDE 51

Cortex-M Family

                            Cortex-M0                    Cortex-M4
Board                       STM32F0 Discovery            STM32F4 Discovery
RAM                         8 KB                         192 KB
Flash                       64 KB                        1 MB
Word size                   32-bit                       32-bit
Instruction set             Thumb + subset of Thumb-2    Full Thumb-2
General-purpose registers   8 (plus 5 high registers)    13
Reserved registers          3 (SP, LR, PC)               3 (SP, LR, PC)

SLIDE 55

Architecture Specific Optimization Techniques

◮ Unrolled NTT
  ◮ Code size increases
  ◮ Cycle count decreases

◮ Adapted to word size
  ◮ Coefficients: 14-bit
  ◮ Word size: 32-bit
  ◮ Load/store 2 coefficients per memory operation
  ◮ NTT, addition, multiplication

◮ Merged levels
  ◮ 2 levels on the Cortex-M0
  ◮ 3 (4) levels on the Cortex-M4

◮ Minimized register reordering
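
The word-size adaptation can be illustrated in C, although the talk's version is hand-written assembly that is not shown on the slide. The sketch below assumes each coefficient occupies a 16-bit slot, so two of them share one 32-bit word; the helper names are made up for this example. Because 14-bit coefficients leave two spare bits per slot, one 32-bit addition adds both coefficient pairs without a carry crossing between the slots.

```c
#include <stdint.h>

/* Pack two 16-bit coefficient slots into one 32-bit word (low slot first). */
static inline uint32_t pack2(uint16_t lo, uint16_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

static inline uint16_t unpack_lo(uint32_t w) { return (uint16_t)(w & 0xFFFFu); }
static inline uint16_t unpack_hi(uint32_t w) { return (uint16_t)(w >> 16); }

/* Add coefficient vectors stored two per word: one load, one add, and one
 * store per pair of coefficients. Valid as long as every per-slot sum stays
 * below 2^16, which holds for two 14-bit inputs. */
void add_packed(const uint32_t *x, const uint32_t *y, uint32_t *out, unsigned n_words)
{
    for (unsigned i = 0; i < n_words; i++)
        out[i] = x[i] + y[i];
}
```

A polynomial with 1024 coefficients then needs 512 loads instead of 1024, which matters on cores where memory operations cost more than arithmetic.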

SLIDE 58

Results: Comparison

Lattice-based cryptography on Cortex-M4F:

1 Efficient software implementation of ring-LWE encryption. (de Clercq, Roy, Vercauteren, and Verbauwhede)
2 Beyond ECDSA and RSA: Lattice-based digital signatures on constrained devices. (Oder, Pöppelmann, and Güneysu)

Relevant subroutines:

◮ Sampling noise
◮ NTT – scale by $2 \cdot \frac{10}{9}$

SLIDE 59

Results: Comparison

[Bar chart. Vertical axis: cycle counts (·10^5). Groups: Noise Sampling, NTT Multiplication, NTT. Legend: Cortex-M0, Cortex-M4, RNG, Cortex-M4F. Bar labels: 7.15x, 1.83x, 1.02x, 0.59x, 0.94x, 0.55x.]

Efficient software implementation of ring-LWE encryption. (de Clercq, Roy, Vercauteren, and Verbauwhede)

SLIDE 60

Results: Comparison

[Bar chart. Vertical axis: cycle counts (·10^6). Groups: Noise Sampling, NTT Multiplication, NTT. Legend: Cortex-M4 (ours), Cortex-M0 (ours), Cortex-M4F. Bar labels: 0.48x, 0.28x, 0.11x, 0.03x, 0.55x, 0.32x.]

Lattice-based digital signatures on constrained devices. (Oder, Pöppelmann, and Güneysu)

SLIDE 62

Overview

Operation           Cycle counts   Time at 48 MHz
NTT on M0           148517         3.1 ms
NTT on M4           86769          1.81 ms

Operation           Cycle counts   Time at 48 MHz
NewHope on M0       3228606        67.26 ms
Curve25519 on M0    3513628        73.02 ms
NewHope on M4       1816908        37.85 ms
Curve25519 on M4    1607860        ≈33.50 ms
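
The time column follows from dividing the cycle count by the clock frequency. A small helper, written here purely for illustration (the name is hypothetical), does the conversion in integer microseconds:

```c
#include <stdint.h>

/* Convert a cycle count to microseconds at the given clock frequency in Hz,
 * rounding down. 64-bit arithmetic keeps cycles * 10^6 from overflowing. */
uint64_t cycles_to_us(uint64_t cycles, uint64_t clock_hz)
{
    return cycles * 1000000u / clock_hz;
}
```

For example, 148517 cycles at 48 MHz give 3094 µs, which matches the 3.1 ms listed for the NTT on the M0.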

SLIDE 63

https://eprint.iacr.org/2016/758
https://github.com/newhopearm/newhopearm