NewHope for ARM Cortex-M
Erdem Alkim1, Philipp Jakubeit2, Peter Schwabe2
erdemalkim@gmail.com, phil.jakubeit@gmail.com, peter@cryptojedi.org
1Ege University, Izmir, Turkey 2Radboud University, Nijmegen, The Netherlands
SPACE 2016
NewHope Efficient Implementation
◮ Factorization problem: polynomial time on a quantum computer
◮ Discrete logarithm problem: polynomial time on a quantum computer
◮ Quantum computers are in reach: IBM estimates ≈15 years

◮ Record encrypted messages today
◮ Break encryption with quantum computers

◮ Problems which are not broken by quantum algorithms (yet)
  ◮ Lattice-based cryptography
  ◮ Ring-learning-with-errors problem

◮ Tor is considering (ECC+RLWE)
◮ Google experimented with (ECC+RLWE)
  ◮ Slowest 5%: latency increased by 20 ms
  ◮ Slowest 1%: latency increased by 150 ms
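Parameters: $q = 12289 < 2^{14}$, $n = 1024$
Error distribution: $\psi_{16}^n$

Alice (server):
    $seed \xleftarrow{\$} \{0, \dots, 255\}^{32}$
    $a \leftarrow \mathrm{Parse}(\text{SHAKE-128}(seed))$
    $s, e \xleftarrow{\$} \psi_{16}^n$
    $b \leftarrow as + e$
Alice → Bob: $(seed, b)$, 1824 bytes
Bob (client):
    $s', e', e'' \xleftarrow{\$} \psi_{16}^n$
    $a \leftarrow \mathrm{Parse}(\text{SHAKE-128}(seed))$
    $u \leftarrow as' + e'$
    $v \leftarrow bs' + e''$
    $r \xleftarrow{\$} \mathrm{HelpRec}(v)$
Bob → Alice: $(u, r)$, 2048 bytes
Alice (server):
    $v' \leftarrow us$
    $\nu \leftarrow \mathrm{Rec}(v', r)$
    $\mu \leftarrow \text{SHA3-256}(\nu)$
Bob (client):
    $\nu \leftarrow \mathrm{Rec}(v, r)$
    $\mu \leftarrow \text{SHA3-256}(\nu)$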
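The two message sizes in the protocol can be checked with a little arithmetic. The sketch below assumes that each coefficient of $b$ and $u$ is packed into 14 bits (since $q < 2^{14}$) and that the reconciliation vector $r$ carries 2 bits per coefficient. The second assumption is not stated on the slide, although it is consistent with the 2048-byte figure. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

enum {
    NEWHOPE_N  = 1024,  /* ring dimension n */
    COEFF_BITS = 14,    /* q = 12289 < 2^14 */
    SEED_BYTES = 32,    /* seed is drawn from {0,...,255}^32 */
    REC_BITS   = 2      /* assumption: 2 bits of r per coefficient */
};

/* Bytes needed to store n values of `bits` bits each, tightly packed. */
static size_t packed_bytes(size_t n, size_t bits)
{
    assert((n * bits) % 8 == 0);  /* this sketch only handles whole bytes */
    return n * bits / 8;
}

/* Size of Alice's message (seed, b). */
static size_t msg_alice_bytes(void)
{
    return SEED_BYTES + packed_bytes(NEWHOPE_N, COEFF_BITS);
}

/* Size of Bob's message (u, r). */
static size_t msg_bob_bytes(void)
{
    return packed_bytes(NEWHOPE_N, COEFF_BITS)
         + packed_bytes(NEWHOPE_N, REC_BITS);
}
```

Under these assumptions a packed polynomial takes 1792 bytes, so the seed adds 32 bytes to Alice's message and $r$ adds 256 bytes to Bob's.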
◮ Centered binomial distribution $\psi_{16}$
  ◮ $\mu = 0$ and $\sigma^2 = 8$
  ◮ 32-byte seed
  ◮ ChaCha20 stream cipher
  ◮ RNG (internal)

◮ Number theoretic transform (NTT)
  ◮ $c = a \cdot b$
  ◮ Evaluate $\mathrm{NTT}(a)$ and $\mathrm{NTT}(b)$
  ◮ Multiply evaluations $\mathrm{NTT}(a) \circ \mathrm{NTT}(b)$
  ◮ Deevaluate $\mathrm{NTT}^{-1}(\mathrm{NTT}(a) \circ \mathrm{NTT}(b))$
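A sample from the centered binomial distribution $\psi_{16}$ can be drawn as the difference of two 16-bit Hamming weights, which gives mean 0 and variance $16 \cdot \tfrac{1}{2} = 8$. The sketch below follows that textbook definition. It is not claimed to be the presenters' code: the function names are hypothetical, and the caller is assumed to supply 32 uniformly random bits (for example from a stream cipher keyed with the seed, or from a hardware RNG).

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits in a 16-bit value. */
static int popcount16(uint16_t x)
{
    int c = 0;
    while (x) {
        c += x & 1;
        x >>= 1;
    }
    return c;
}

/* One sample from psi_16: weight of the low 16 random bits
 * minus weight of the high 16 random bits. */
static int sample_psi16(uint32_t rnd)
{
    int s = popcount16((uint16_t)(rnd & 0xFFFF))
          - popcount16((uint16_t)(rnd >> 16));
    assert(s >= -16 && s <= 16);
    return s;
}

/* Same sample, mapped to a representative in [0, q) for q = 12289. */
static uint16_t sample_psi16_mod_q(uint32_t rnd)
{
    int s = sample_psi16(rnd);
    return (uint16_t)(s < 0 ? s + 12289 : s);
}
```

Each coefficient consumes 4 bytes of randomness in this form, so one polynomial with $n = 1024$ coefficients needs 4096 random bytes.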
◮ Fast Fourier Transform defined over finite fields
◮ $b_i = \sum_{j=0}^{n-1} \omega^{ij} a_j$ for $0 \le i \le n - 1$, with $\omega$ a primitive $n$-th root of unity
◮ $\log n$ levels
◮ $n/2$ butterfly operations per level
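The defining sum can be evaluated directly in $O(n^2)$ operations, which is useful as a reference when testing a fast implementation. The sketch below uses toy parameters chosen for illustration only ($q = 17$, $n = 4$, $\omega = 4$, where $4^2 \equiv -1$ and $4^4 \equiv 1 \pmod{17}$); they are not the NewHope parameters, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* base^exp mod q by square-and-multiply. */
static uint32_t pow_mod(uint32_t base, uint32_t exp, uint32_t q)
{
    assert(q > 1 && q < 65536);  /* keeps all products below 2^32 */
    uint32_t r = 1;
    base %= q;
    while (exp) {
        if (exp & 1)
            r = r * base % q;
        base = base * base % q;
        exp >>= 1;
    }
    return r;
}

/* Forward transform straight from the definition:
 * b[i] = sum_j w^(i*j) * a[j] mod q. */
static void ntt_naive(uint32_t *b, const uint32_t *a,
                      uint32_t n, uint32_t w, uint32_t q)
{
    for (uint32_t i = 0; i < n; i++) {
        uint32_t acc = 0;
        for (uint32_t j = 0; j < n; j++)
            acc = (acc + pow_mod(w, i * j, q) * (a[j] % q)) % q;
        b[i] = acc;
    }
}

/* Inverse transform: same sum with w^-1, then scale by n^-1. */
static void intt_naive(uint32_t *a, const uint32_t *b, uint32_t n,
                       uint32_t w_inv, uint32_t n_inv, uint32_t q)
{
    ntt_naive(a, b, n, w_inv, q);
    for (uint32_t i = 0; i < n; i++)
        a[i] = a[i] * n_inv % q;
}
```

With the toy parameters, both $\omega^{-1}$ and $n^{-1}$ equal 13 modulo 17, so `intt_naive(a, b, 4, 13, 13, 17)` undoes `ntt_naive(b, a, 4, 4, 17)`.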
Butterfly:
    $(x_j,\ x_{j+d}) \mapsto (x_j + x_{j+d},\ (x_j - x_{j+d})\,\omega^j)$

    W = omega[(1024 + j)/(2*dist)];                     // twiddle factor for this butterfly
    tmp = x[j];                                         // save x[j] before it is overwritten
    x[j] = Barrett(tmp + x[j + dist]);                  // sum, Barrett-reduced
    x[j + dist] = Montgomery(W * (tmp - x[j + dist]));  // scaled difference, Montgomery-reduced

Barrett reduction: $r = x \bmod m$
    Precompute $\mu = \lfloor b^{2k} / m \rfloor$
    Replaces division by multiplication
    Reduces 16-bit values

Montgomery reduction: $T \bmod m$
    $R > m$, $\gcd(m, R) = 1$
    Computes $T R^{-1} \bmod m$
    Reduces 32-bit values
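To show how the butterfly builds a full transform, here is a minimal in-place sketch with $\log_2 n$ levels and $n/2$ butterflies per level. It uses plain `%` reduction and toy parameters ($q = 17$) instead of Barrett and Montgomery reduction. The twiddle indexing is this sketch's own choice and differs from the `omega[(1024 + j)/(2*dist)]` lookup on the slide. Inputs are assumed to lie in $[0, q)$, and the output comes out in bit-reversed order.

```c
#include <assert.h>
#include <stdint.h>

enum { TOY_Q = 17 };  /* toy modulus, for illustration only */

/* One butterfly on the pair (x[j], x[j + dist]) with twiddle w:
 * (u, v) -> (u + v, (u - v) * w), all mod TOY_Q. */
static void butterfly(uint32_t *x, uint32_t j, uint32_t dist, uint32_t w)
{
    uint32_t tmp = x[j];
    x[j] = (tmp + x[j + dist]) % TOY_Q;
    /* add TOY_Q before subtracting so the difference stays non-negative */
    x[j + dist] = (w * (tmp + TOY_Q - x[j + dist])) % TOY_Q;
}

/* In-place forward transform of x[0..n-1], n a power of two.
 * omega_pow[k] must hold omega^k mod TOY_Q for k = 0..n/2-1.
 * Result: x[bitrev(i)] = sum_j omega^(i*j) * a[j]. */
static void ntt_inplace(uint32_t *x, uint32_t n, const uint32_t *omega_pow)
{
    assert(n >= 2 && (n & (n - 1)) == 0);
    for (uint32_t dist = n / 2; dist >= 1; dist /= 2) {      /* log2(n) levels */
        uint32_t stride = n / (2 * dist);
        for (uint32_t start = 0; start < n; start += 2 * dist)
            for (uint32_t k = 0; k < dist; k++)              /* n/2 butterflies per level */
                butterfly(x, start + k, dist, omega_pow[k * stride]);
    }
}
```

For $n = 4$ and $\omega = 4$ the table is `{1, 4}`, and the input `{1, 2, 3, 4}` becomes `{10, 15, 7, 6}`, which is the transform `{10, 7, 15, 6}` with indices 1 and 2 swapped by the bit reversal.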
Montgomery reduction ($R = 2^{18}$)
    montgomery_reduce, rm:
        MUL rt, rm, #12287    // -q^(-1) mod R
        AND rt, rt, #262143   // R - 1 (mask: reduce mod R)
        MUL rt, rt, #12289    // q
        ADD rm, rm, rt
        SHR rm, rm, #18       // divide by R

Short Barrett reduction
    barrett_reduce, rb:
        MUL rt, rb, #5
        SHR rt, rt, #16
        MUL rt, rt, #12289    // q
        SUB rb, rb, rt
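The two instruction sequences translate almost line for line into C, which makes the constants easier to check: $12287 \cdot 12289 = 12288^2 - 1 \equiv -1 \pmod{2^{18}}$, so 12287 is $-q^{-1} \bmod R$. The sketch below is a plain C rendering under that reading, not the presenters' assembly. The input bound asserted in `montgomery_reduce` is an assumption added here to rule out 32-bit overflow.

```c
#include <assert.h>
#include <stdint.h>

#define NH_Q     12289u
#define NH_QINV  12287u   /* -q^(-1) mod 2^18 */
#define NH_RLOG  18u      /* R = 2^18 */

/* Returns a value congruent to a * R^(-1) mod q.
 * The result is below 2^14 but not necessarily below q. */
static uint16_t montgomery_reduce(uint32_t a)
{
    /* assumption of this sketch: a + u must not overflow 32 bits */
    assert(a <= UINT32_MAX - ((1u << NH_RLOG) - 1) * NH_Q);
    uint32_t u = a * NH_QINV;      /* only the low 18 bits matter */
    u &= (1u << NH_RLOG) - 1;      /* u = -a * q^(-1) mod R */
    u *= NH_Q;                     /* a + u is now divisible by R */
    return (uint16_t)((a + u) >> NH_RLOG);
}

/* Short Barrett reduction of a 16-bit value: subtracts
 * floor(5a / 2^16) * q, leaving a result congruent to a mod q
 * that fits in 14 bits (not necessarily below q). */
static uint16_t barrett_reduce(uint16_t a)
{
    uint32_t u = ((uint32_t)a * 5) >> 16;
    return (uint16_t)(a - u * NH_Q);
}
```

Neither routine returns a canonical representative in $[0, q)$. Both return a 14-bit value in the right residue class, which is enough to keep intermediate values bounded between butterflies.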
$c = (n\psi)^{-1}\,\mathrm{NTT}^{-1}(\mathrm{NTT}(\psi a) \circ \mathrm{NTT}(\psi b))$, with $a, b, c \in R_q$
$\psi = \{(\sqrt{\omega})^0, \dots, (\sqrt{\omega})^{n-1}\}$

Because $\psi^2 = \omega$, the powers of $\psi$ can be written in terms of the powers of $\omega$:
$\omega = \{\omega^0, \omega^1, \dots, \omega^{n/2-1}\}$
$\psi = \{\omega^0, \omega^0 \cdot \psi, \dots, \omega^{n/2-1}, \omega^{n/2-1} \cdot \psi\}$, for $\psi = 7$
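The formula says that scaling the inputs by powers of $\psi$ before the transform, and by powers of $\psi^{-1}$ (together with $n^{-1}$) afterwards, yields multiplication modulo $x^n + 1$. The sketch below checks this against schoolbook multiplication on toy parameters chosen for illustration ($q = 17$, $n = 4$, $\omega = 4$, $\psi = 2$, so $\psi^2 = \omega$ and $\psi^n \equiv -1$). It uses a naive quadratic transform rather than butterflies, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { NEG_N = 4, NEG_Q = 17, NEG_OMEGA = 4, NEG_PSI = 2 };

static uint32_t pow_mod(uint32_t base, uint32_t exp)
{
    uint32_t r = 1;
    base %= NEG_Q;
    while (exp) {
        if (exp & 1) r = r * base % NEG_Q;
        base = base * base % NEG_Q;
        exp >>= 1;
    }
    return r;
}

/* Modular inverse via Fermat's little theorem (NEG_Q is prime). */
static uint32_t inv_mod(uint32_t a)
{
    assert(a % NEG_Q != 0);
    return pow_mod(a, NEG_Q - 2);
}

/* Naive transform: out[i] = sum_j w^(i*j) * in[j] mod q. */
static void ntt(uint32_t *out, const uint32_t *in, uint32_t w)
{
    for (uint32_t i = 0; i < NEG_N; i++) {
        uint32_t acc = 0;
        for (uint32_t j = 0; j < NEG_N; j++)
            acc = (acc + pow_mod(w, i * j) * in[j]) % NEG_Q;
        out[i] = acc;
    }
}

/* c = a * b mod (x^n + 1, q) via pre- and post-scaling by psi. */
static void negacyclic_mul_ntt(uint32_t *c, const uint32_t *a, const uint32_t *b)
{
    uint32_t ta[NEG_N], tb[NEG_N], fa[NEG_N], fb[NEG_N], tc[NEG_N];
    for (uint32_t i = 0; i < NEG_N; i++) {
        ta[i] = a[i] % NEG_Q * pow_mod(NEG_PSI, i) % NEG_Q;
        tb[i] = b[i] % NEG_Q * pow_mod(NEG_PSI, i) % NEG_Q;
    }
    ntt(fa, ta, NEG_OMEGA);
    ntt(fb, tb, NEG_OMEGA);
    for (uint32_t i = 0; i < NEG_N; i++)
        fa[i] = fa[i] * fb[i] % NEG_Q;          /* pointwise product */
    ntt(tc, fa, inv_mod(NEG_OMEGA));            /* unscaled inverse transform */
    uint32_t n_inv = inv_mod(NEG_N), psi_inv = inv_mod(NEG_PSI);
    for (uint32_t i = 0; i < NEG_N; i++)
        c[i] = tc[i] * n_inv % NEG_Q * pow_mod(psi_inv, i) % NEG_Q;
}

/* Reference: schoolbook product with x^n replaced by -1. */
static void negacyclic_mul_schoolbook(uint32_t *c, const uint32_t *a, const uint32_t *b)
{
    for (uint32_t i = 0; i < NEG_N; i++) c[i] = 0;
    for (uint32_t i = 0; i < NEG_N; i++)
        for (uint32_t j = 0; j < NEG_N; j++) {
            uint32_t t = (a[i] % NEG_Q) * (b[j] % NEG_Q) % NEG_Q;
            uint32_t k = i + j;
            if (k < NEG_N) c[k] = (c[k] + t) % NEG_Q;
            else c[k - NEG_N] = (c[k - NEG_N] + NEG_Q - t) % NEG_Q;
        }
}
```

The interleaved $\psi$ table on the slide corresponds to `pow_mod(NEG_PSI, i)` here: even powers are entries of the $\omega$ table, and odd powers are the same entries times $\psi$.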
STM32F0 Discovery board (Cortex-M0)
    8 KB RAM
    64 KB Flash
    32-bit word size
    Thumb + subset of Thumb-2
    8 general-purpose registers
    5 high registers
    3 reserved registers (SP, LR, PC)

STM32F4 Discovery board (Cortex-M4)
    192 KB RAM
    1 MB Flash
    32-bit word size
    Full Thumb-2
    13 general-purpose registers
    3 reserved registers (SP, LR, PC)
◮ Code size increases
◮ Cycle count decreases

◮ Coefficients: 14 bits
◮ Word size: 32 bits
◮ Load/store 2 coefficients per memory operation
◮ NTT, addition, multiplication

◮ 2 levels on the Cortex-M0
◮ 3 (4) levels on the Cortex-M4
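Because a coefficient needs only 14 bits, two of them fit into one 32-bit word, and a single load or store then moves both. The sketch below illustrates the idea with one halfword per coefficient. The actual memory layout of the presented implementation is not given on the slide, so the layout and the helper names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two coefficients (each below 2^14) into one 32-bit word:
 * `lo` in bits 0..15, `hi` in bits 16..31. */
static uint32_t pack2(uint16_t lo, uint16_t hi)
{
    assert(lo < (1u << 14) && hi < (1u << 14));
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

static uint16_t lane_lo(uint32_t w) { return (uint16_t)(w & 0xFFFF); }
static uint16_t lane_hi(uint32_t w) { return (uint16_t)(w >> 16); }

/* Add two packed words with a single 32-bit addition.
 * Each lane sum is below 2^15, so no carry crosses into the
 * neighbouring lane; reduction mod q is left to the caller. */
static uint32_t add2(uint32_t x, uint32_t y)
{
    return x + y;
}
```

For example, `add2(pack2(12288, 5), pack2(12288, 7))` holds 24576 in the low lane and 12 in the high lane, with both sums still unreduced.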
1 Efficient software implementation of ring-LWE encryption. (de Clercq, Roy, Vercauteren, and Verbauwhede)
2 Beyond ECDSA and RSA: Lattice-based digital signatures on constrained devices. (Oder, Pöppelmann, Güneysu)
[Bar chart: cycle counts (axis in units of $10^5$, ticks 1 to 5) for sampling, NTT, multiplication, and a second NTT category. Legend: Cortex-M0, Cortex-M4, RNG, Cortex-M4F. Ratio labels in order of appearance: 7.15x, 1.83x, 1.02x, 0.59x, 0.94x, 0.55x.]
Efficient software implementation of ring-LWE encryption. (de Clercq, Roy, Vercauteren, and Verbauwhede)
[Bar chart: cycle counts (axis in units of $10^6$, ticks 0.5 to 2) for sampling, NTT, multiplication, and a second NTT category. Legend: Cortex-M4 (ours), Cortex-M0 (ours), Cortex-M4F. Ratio labels in order of appearance: 0.48x, 0.28x, 0.11x, 0.03x, 0.55x, 0.32x.]
Lattice-based digital signatures on constrained devices. (Oder, Pöppelmann, Güneysu)