Public key cryptography on IoT devices
Sujoy Sinha Roy
COSIC, KU Leuven
1
➢ Small area for HW implementations
➢ Small code size for SW implementations
➢ Low power or energy or both
➢ Reasonably fast computation time
2
This talk
3
➢ Over binary field
➢ Over prime field
4
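Generic elliptic curves: $y^2 + xy = x^3 + ax^2 + b$, where $a$ and $b$ are from $\mathbb{F}_{2^m}$

Point addition: $P_3(x_3, y_3) = P_1(x_1, y_1) + P_2(x_2, y_2)$
$\lambda = (y_1 + y_2)/(x_1 + x_2)$
$x_3 = \lambda^2 + \lambda + x_1 + x_2 + a$
$y_3 = \lambda(x_1 + x_3) + x_3 + y_1$

Point doubling: $P_3(x_3, y_3) = 2P_1(x_1, y_1)$
$\lambda = x_1 + y_1/x_1$
$x_3 = \lambda^2 + \lambda + a$
$y_3 = x_1^2 + \lambda x_3 + x_3$

Scalar multiplication: base point $P(x, y)$ on the curve and scalar $n$
$nP = P + P + P + \dots + P$

[Figure: scalar multiplication using the double-and-add algorithm, shown as a sequence of point doublings (PD) with point additions (PA) at the scalar bits equal to 1]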
Finite field operations
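The point formulas and the double-and-add loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy field GF(2^4) with polynomial x^4 + x + 1, the curve parameters a = b = 1, and all function names are chosen here for readability and do not come from the slides.

```python
# Toy parameters (assumptions for illustration, far too small for real use).
M = 4                 # field GF(2^4)
POLY = 0b10011        # irreducible polynomial x^4 + x + 1
A, B = 1, 1           # curve y^2 + xy = x^3 + A*x^2 + B
INF = None            # point at infinity

def fmul(a, b):
    """Multiply two elements of GF(2^M) (shift-and-add with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def finv(a):
    """Invert a nonzero element via a^(2^M - 2)."""
    r = 1
    for _ in range(2**M - 2):
        r = fmul(r, a)
    return r

def on_curve(P):
    if P is INF:
        return True
    x, y = P
    x2 = fmul(x, x)
    return fmul(y, y) ^ fmul(x, y) == fmul(x2, x) ^ fmul(A, x2) ^ B

def point_double(P):
    if P is INF or P[0] == 0:          # points with x = 0 have order 2
        return INF
    x1, y1 = P
    lam = x1 ^ fmul(y1, finv(x1))      # lambda = x1 + y1/x1
    x3 = fmul(lam, lam) ^ lam ^ A
    y3 = fmul(x1, x1) ^ fmul(lam, x3) ^ x3
    return (x3, y3)

def point_add(P, Q):
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:                       # either P = Q or P = -Q
        return point_double(P) if y1 == y2 else INF
    lam = fmul(y1 ^ y2, finv(x1 ^ x2)) # lambda = (y1 + y2)/(x1 + x2)
    x3 = fmul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    y3 = fmul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)

def scalar_mult(n, P):
    """Left-to-right double-and-add: PD for every bit, PA when the bit is 1."""
    R = INF
    for bit in bin(n)[2:]:
        R = point_double(R)
        if bit == "1":
            R = point_add(R, P)
    return R
```

Note that the branch on each scalar bit is exactly the behaviour that leaks information through simple power analysis, so this sketch shows the arithmetic only and is not a protected implementation.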
5
➢ special arithmetic such as endomorphism
➢ sparse irreducible polynomial

➢ Reduces number of field operations
➢ Also number of registers
e.g. Montgomery ladder, special encoding of scalar, etc.

➢ Inversion free

➢ Constant time arithmetic, e.g., Montgomery ladder
➢ Random projective coordinates
➢ Scalar randomization (maybe?)
6
Uses NIST 163-bit ECC over $\mathbb{F}_{2^{163}}$
~80-bit security
“A 5.1μJ per point-multiplication elliptic curve cryptographic processor” by V. Rozic, O. Reparaz, and I. Verbauwhede, published in IJCTA 2016.
7
Uses NIST 283-bit Koblitz curve over $\mathbb{F}_{2^{283}}$
~140-bit security
11
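Generic elliptic curves: $y^2 + xy = x^3 + ax^2 + b$

Point addition: $P_3(x_3, y_3) = P_1(x_1, y_1) + P_2(x_2, y_2)$
$x_3 = \lambda^2 + \lambda + x_1 + x_2 + a$
$y_3 = \lambda(x_1 + x_3) + x_3 + y_1$

Point doubling: $P_3(x_3, y_3) = 2P_1(x_1, y_1)$
$x_3 = \lambda^2 + \lambda + a$
$y_3 = x_1^2 + \lambda x_3 + x_3$

[Figure: scalar multiplication as a sequence of point doublings (PD), with point additions (PA) at the scalar bits marked 1]

Koblitz curves: $y^2 + xy = x^3 + ax^2 + 1$, $a = 0$ or $1$

Point addition: $P_3(x_3, y_3) = P_1(x_1, y_1) + P_2(x_2, y_2)$
$x_3 = \lambda^2 + \lambda + x_1 + x_2 + a$
$y_3 = \lambda(x_1 + x_3) + x_3 + y_1$

Point doubling is replaced by the Frobenius endomorphism: $P_3(x_3, y_3)$ from $P_1(x_1, y_1)$
$x_3 = x_1^2$
$y_3 = y_1^2$

[Figure: scalar multiplication as a sequence of Frobenius endomorphisms (FE), with point additions (PA) at the digits marked 1]
Cheap!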
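Why the Frobenius map can stand in for doubling: on a Koblitz curve, $\tau(x, y) = (x^2, y^2)$ maps curve points to curve points and satisfies $\tau^2 P + 2P = \mu\,\tau P$ with $\mu = (-1)^{1-a}$, so a multiplication by 2 can be traded for squarings. The sketch below checks this on a toy curve; the field GF(2^5) with polynomial x^5 + x^2 + 1 and all function names are assumptions made for illustration, not parameters from the talk.

```python
M, POLY = 5, 0b100101          # toy field GF(2^5), x^5 + x^2 + 1 (assumption)
A = 0                          # Koblitz parameter a (0, as for K-283)
MU = -1 if A == 0 else 1       # mu = (-1)^(1 - a)
INF = None                     # point at infinity

def fmul(a, b):
    """Multiply two elements of GF(2^M)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def finv(a):
    """Invert a nonzero element via a^(2^M - 2)."""
    r = 1
    for _ in range(2**M - 2):
        r = fmul(r, a)
    return r

def on_curve(P):
    x, y = P
    x2 = fmul(x, x)
    return fmul(y, y) ^ fmul(x, y) == fmul(x2, x) ^ fmul(A, x2) ^ 1

def point_neg(P):
    return INF if P is INF else (P[0], P[0] ^ P[1])

def point_double(P):
    if P is INF or P[0] == 0:
        return INF
    x1, y1 = P
    lam = x1 ^ fmul(y1, finv(x1))
    x3 = fmul(lam, lam) ^ lam ^ A
    return (x3, fmul(x1, x1) ^ fmul(lam, x3) ^ x3)

def point_add(P, Q):
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        return point_double(P) if y1 == y2 else INF
    lam = fmul(y1 ^ y2, finv(x1 ^ x2))
    x3 = fmul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    return (x3, fmul(lam, x1 ^ x3) ^ x3 ^ y1)

def frobenius(P):
    """tau(x, y) = (x^2, y^2): two field squarings, no inversion."""
    return INF if P is INF else (fmul(P[0], P[0]), fmul(P[1], P[1]))

def check_identity(P):
    """True when tau^2(P) + 2P equals mu * tau(P)."""
    T = frobenius(P)
    lhs = point_add(frobenius(T), point_double(P))
    return lhs == (T if MU == 1 else point_neg(T))
```

Calling `check_identity` on every affine point of the toy curve is a quick way to convince oneself that the relation holds; in a binary field a squaring is close to free in hardware, which is what makes the FE steps cheap.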
12
[Figure: on a generic elliptic curve the scalar drives the PD/PA sequence directly; on a Koblitz curve the scalar first goes through scalar conversion and then drives the FE/PA sequence]
13
Several implementations of lightweight ECC over $\mathbb{F}_{2^m}$
14
Järvinen
⇒ For Koblitz curve K-283, integer additions/subtractions are 283 bits wide
15
➢ We compute (d0, d1) ← (d0/2 - d1, d0/2)
➢ We compute (a0, a1) ← (2a1, a1 - a0)
➢ We compute (b0, b1) ← (b0/2 - b1, b0/2)
➢ Sign is corrected at the end of the loop
Saves 1/3 of cycles!
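For reference, the textbook form of the conversion loop can be sketched as follows. It uses the standard division by $\tau$, $(d_0, d_1) \leftarrow (d_1 + \mu d_0/2, -d_0/2)$, which for $\mu = -1$ appears to be the negative of the update listed above (hence the later sign correction). This is a plain $\tau$NAF sketch with hypothetical function names; it omits the scalar reduction a practical implementation performs first and is not the coprocessor's optimized loop.

```python
MU = -1   # mu = (-1)^(1 - a); -1 corresponds to a = 0 (as for K-283)

def tau_naf(d0, d1, mu=MU):
    """tau-adic NAF digits (least significant first) of d0 + d1*tau."""
    digits = []
    while d0 != 0 or d1 != 0:
        if d0 & 1:
            # Pick u in {-1, +1} so that the next digit is forced to be 0.
            u = 2 - ((d0 - 2 * d1) % 4)
            d0 -= u
        else:
            u = 0
        digits.append(u)
        # Exact division by tau, using tau^2 = mu*tau - 2.
        d0, d1 = d1 + mu * (d0 // 2), -(d0 // 2)
    return digits

def evaluate(digits, mu=MU):
    """Recombine digits with Horner's rule in Z[tau]; returns (a0, a1)."""
    a0, a1 = 0, 0
    for u in reversed(digits):
        # (a0 + a1*tau) * tau + u = (-2*a1 + u) + (a0 + mu*a1) * tau
        a0, a1 = -2 * a1 + u, a0 + mu * a1
    return a0, a1
```

For example, `tau_naf(2, 0)` returns `[0, 1, 0, 1]`, i.e. $2 = \tau + \tau^3$ when $\mu = -1$, and `evaluate` maps the digits back to `(2, 0)`.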
16
Conditional multi-precision addition reveals information about the secret scalar
0 or 1
17
We generate u ∈ {-1, 1} using the zero-free function Ψ( )
➢ u = -1, then b0 - a0
➢ u = +1, then b0 + a0
Similar operations ⇒ Increased SPA resistance!
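One way to picture "similar operations" is an add/subtract unit that always performs the same addition and only flips its operand. The sketch below does this with a two's complement trick on 283-bit words; the word width, the function name, and the masking approach are assumptions for illustration, and Python integers are not constant time, so this shows the data flow only.

```python
WIDTH = 283                      # operand width in bits (as for K-283)
WORD_MASK = (1 << WIDTH) - 1

def add_or_sub(b0, a0, u):
    """Return (b0 + u*a0) mod 2^WIDTH for u in {-1, +1}, without branching on u."""
    s = (1 - u) >> 1             # 0 when u = +1, 1 when u = -1
    mask = -s & WORD_MASK        # all zeros, or all ones
    # For u = -1 this adds the two's complement of a0: (~a0 + 1).
    return (b0 + (a0 ^ mask) + s) & WORD_MASK
```

Both values of u go through the same XOR and the same addition, so the operation sequence no longer depends on whether a digit triggers an update.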
18
Scalar conversion produces a zero-free representation
$(X; Y; Z) = (xr;\ yr^2;\ r)$, where $r$ is random
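The randomization above can be sketched as follows: the affine point $(x, y)$ is lifted with a fresh nonzero $r$, and the affine values are recovered as $x = X/Z$ and $y = Y/Z^2$. The toy field GF(2^5) with polynomial x^5 + x^2 + 1 and the function names are assumptions for illustration only.

```python
import secrets

M, POLY = 5, 0b100101            # toy field GF(2^5), x^5 + x^2 + 1 (assumption)

def fmul(a, b):
    """Multiply two elements of GF(2^M)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def finv(a):
    """Invert a nonzero element via a^(2^M - 2)."""
    r = 1
    for _ in range(2**M - 2):
        r = fmul(r, a)
    return r

def randomize(x, y, r=None):
    """Lift (x, y) to (X; Y; Z) = (x*r; y*r^2; r) with a random nonzero r."""
    if r is None:
        r = secrets.randbelow(2**M - 1) + 1      # uniform in 1 .. 2^M - 1
    return fmul(x, r), fmul(y, fmul(r, r)), r

def to_affine(X, Y, Z):
    """Recover (x, y) = (X/Z, Y/Z^2)."""
    zi = finv(Z)
    return fmul(X, zi), fmul(Y, fmul(zi, zi))
```

Each run represents the same point with different coordinate values, which is what decorrelates the intermediate data from the secret across executions.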
19
Area: 4.3 KGE (without RAM), ~10 KGE (with RAM)
RAM size: 4032 bits
Time: 1,566,000 cycles, 98 ms (16 MHz)
Energy: 9.6 µJ
Power: 98 µW (1 MHz)
“Lightweight coprocessor for Koblitz curves: 283-bit ECC including scalar conversion with only 4300 gates” by S. S. Roy, K. Järvinen, and I. Verbauwhede in CHES 2015
20
21
E: $y^2 = x^3 + 486662x^2 + x$
128-bit security
22
Montgomery ladder
Combined PA-PD
No need to store the y-coordinate!
4S + 5M + MA + 8A
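A compact sketch of the x-only Montgomery ladder on this curve is given below. It follows the publicly specified X25519 function rather than any hardware design from the talk; the names and structure are chosen here for illustration, and Python big integers are not constant time. Each loop iteration is one combined PA-PD step with 4 squarings, 5 multiplications, 1 multiplication by the constant 121665, and 8 additions/subtractions.

```python
P = 2**255 - 19
A24 = 121665                      # (486662 - 2) / 4

def cswap(swap, a, b):
    """Swap a and b when swap == 1, using a mask instead of a branch."""
    t = -swap & (a ^ b)
    return a ^ t, b ^ t

def x25519(k, u):
    """Return the x-coordinate of k*(u, .) on Curve25519 (k is clamped first)."""
    k = (k & ((1 << 254) - 8)) | (1 << 254)      # clear low 3 bits, set bit 254
    x1 = (u & ((1 << 255) - 1)) % P
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        kt = (k >> t) & 1
        swap ^= kt
        x2, x3 = cswap(swap, x2, x3)
        z2, z3 = cswap(swap, z2, z3)
        swap = kt
        a = (x2 + z2) % P                        # additions
        b = (x2 - z2) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        aa = a * a % P                           # squarings
        bb = b * b % P
        e = (aa - bb) % P
        da = d * a % P                           # multiplications
        cb = c * b % P
        x3 = (da + cb) ** 2 % P                  # differential addition
        z3 = x1 * ((da - cb) ** 2 % P) % P
        x2 = aa * bb % P                         # doubling
        z2 = e * ((aa + A24 * e) % P) % P
    x2, x3 = cswap(swap, x2, x3)
    z2, z3 = cswap(swap, z2, z3)
    return x2 * pow(z2, P - 2, P) % P            # single inversion at the end
```

A Diffie-Hellman style check is that `x25519(a, x25519(b, 9))` equals `x25519(b, x25519(a, 9))` for any two scalars `a` and `b`, where 9 is the x-coordinate of the standard base point.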
23
Modular reduction is easier, since $2^{255} \equiv 19 \pmod p$ for $p = 2^{255} - 19$
$C = AB = C_1 \cdot 2^{255} + C_0$
$C \bmod p = (C_1 \cdot 19 + C_0) \bmod p$
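The folding rule above can be sketched directly: split the product at bit 255, multiply the high part by 19, and add it to the low part. The function names are hypothetical, and the loop plus final subtraction is one straightforward way to finish the reduction, not necessarily how a hardware multiplier schedules it.

```python
P = 2**255 - 19
LOW_MASK = (1 << 255) - 1

def reduce_25519(c):
    """Reduce a non-negative integer modulo 2^255 - 19 by folding."""
    while c >> 255:
        c1, c0 = c >> 255, c & LOW_MASK   # C = C1 * 2^255 + C0
        c = 19 * c1 + c0                  # uses 2^255 = 19 (mod p)
    if c >= P:                            # c < 2^255 here, so one subtraction suffices
        c -= P
    return c

def mul_mod(a, b):
    """Field multiplication built on the folding reduction."""
    return reduce_25519(a * b)
```

For a product of two 255-bit operands the first fold already brings the value below roughly $2^{260}$, so the loop runs only a couple of times.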
24
Modular multiplier for Curve25519
“Efficient Elliptic-Curve Cryptography using Curve25519 on Reconfigurable Devices” by Sasdrich and Güneysu in ARC 2014
Throughput: 25,000 point multiplications per second
Area of point multiplier: 2,783 LUTs, 3,592 FFs, 20 DSP multipliers
Parallel processing for high throughput
25
➢ Speed vs area
“NaCl’s crypto_box in hardware” by M. Hutter, J. Schilling, P. Schwabe, and W. Wieser in CHES 2015. Architecture diagram taken from CHES2015 presentation.
26
Note: Unified implementation of Curve25519, Salsa20 and Poly1305
Smallest configuration:
➢ Area 14,648 GE, power 40 µW (including optimized RAM)
➢ Key exchange takes 3,455,394 cycles
Fastest configuration:
➢ Area 17,966 GE, power 70 µW (including optimized RAM)
➢ Key exchange takes 811,170 cycles
Results
27
Binary extension fields $\mathbb{F}_{2^m}$
faster
security
Prime fields $\mathbb{F}_p$
support efficient computation
general purpose computers, so wider support
43