Public key cryptography on IoT devices, Sujoy Sinha Roy, COSIC, KU Leuven



slide-1
SLIDE 1

1

Sujoy Sinha Roy

COSIC, KU Leuven

Public key cryptography on IoT devices

slide-2
SLIDE 2

2

  • Small area for HW implementations
  • Small code size for SW implementation
  • Low power or energy or both
  • Reasonably fast computation time
slide-3
SLIDE 3

3

This talk

Lightweight hardware implementation of Elliptic Curve Cryptography (ECC)

➢ Over binary field
➢ Over prime field

slide-4
SLIDE 4

4

Elliptic curves over binary field $\mathbb{F}_{2^m}$

Generic elliptic curves: $y^2 + xy = x^3 + ax^2 + b$, where $a$ and $b$ are from $\mathbb{F}_{2^m}$

Point addition: $P_3(x_3, y_3) = P_1(x_1, y_1) + P_2(x_2, y_2)$
$\lambda = (y_1 + y_2)/(x_1 + x_2)$
$x_3 = \lambda^2 + \lambda + x_1 + x_2 + a$
$y_3 = \lambda(x_1 + x_3) + x_3 + y_1$

Point doubling: $P_3(x_3, y_3) = 2P_1(x_1, y_1)$
$\lambda = x_1 + y_1/x_1$
$x_3 = \lambda^2 + \lambda + a$
$y_3 = x_1^2 + \lambda x_3 + x_3$

Scalar multiplication: base point $P(x, y)$ on the curve and scalar $n$
$nP = P + P + P + \dots + P$

Scalar multiplication using the double-and-add algorithm (PD = point doubling, PA = point addition, one PA for each 1 bit of the scalar):
PA PD PD PD PD PD PA PA …

Underlying layer: finite field operations
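The formulas above can be tried out on a toy curve. The following is a minimal sketch, assuming a tiny field $\mathbb{F}_{2^5}$ with reduction polynomial $x^5 + x^2 + 1$ and curve parameters a = b = 1; these parameters, and all function names, are chosen only for illustration and are far too small to be secure. It implements the affine point addition and doubling formulas and the left-to-right double-and-add loop.

```python
# Toy parameters (assumptions for illustration only, not from the talk).
M = 5
POLY = 0b100101            # x^5 + x^2 + 1, irreducible over F_2
A, B = 1, 1                # curve y^2 + xy = x^3 + A x^2 + B

def fmul(u, v):
    """Field multiplication: shift-and-add with reduction modulo POLY."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> M:
            u ^= POLY
    return r

def finv(u):
    """Field inversion via Fermat: u^(2^M - 2)."""
    r = 1
    for _ in range(2**M - 2):
        r = fmul(r, u)
    return r

def on_curve(P):
    if P is None:                      # None represents the point at infinity
        return True
    x, y = P
    x2 = fmul(x, x)
    return fmul(y, y) ^ fmul(x, y) == fmul(x2, x) ^ fmul(A, x2) ^ B

def double(P):
    if P is None or P[0] == 0:         # a point with x = 0 has order 2
        return None
    x1, y1 = P
    lam = x1 ^ fmul(y1, finv(x1))
    x3 = fmul(lam, lam) ^ lam ^ A
    y3 = fmul(x1, x1) ^ fmul(lam, x3) ^ x3
    return (x3, y3)

def add(P, Q):
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:                       # either Q = P or Q = -P = (x1, x1 + y1)
        return double(P) if y1 == y2 else None
    lam = fmul(y1 ^ y2, finv(x1 ^ x2))
    x3 = fmul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    y3 = fmul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)

def scalar_mul(n, P):
    """Left-to-right double-and-add: one PD per bit, one PA per 1 bit."""
    R = None
    for bit in bin(n)[2:]:
        R = double(R)
        if bit == "1":
            R = add(R, P)
    return R

# Pick any affine point with x != 0 as base point (found by brute force).
BASE = next((x, y) for x in range(1, 2**M) for y in range(2**M) if on_curve((x, y)))
```

Because the addition is executed only for 1 bits, the operation sequence depends on the scalar, which is the leakage that the Montgomery ladder and the countermeasures on later slides address.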

slide-5
SLIDE 5

5

Lightweight ECC: common tricks

  • Choice of elliptic curve, finite field etc.

➢ Special arithmetic such as endomorphism
➢ Sparse irreducible polynomial

  • Efficient point multiplication algorithm

➢ Reduces number of field operations
➢ Also number of registers
➢ E.g. Montgomery ladder, special encoding of scalar etc.

  • Projective coordinate system

➢ Inversion free

  • Affordable lightweight countermeasures

➢ Constant-time arithmetic, e.g. Montgomery ladder
➢ Random projective coordinates
➢ Scalar randomization (maybe?)

slide-6
SLIDE 6

6

Uses NIST 163-bit ECC over $\mathbb{F}_{2^{163}}$

~80-bit security

“A 5.1μJ per point-multiplication elliptic curve cryptographic processor” by V. Rozic, O. Reparaz, and I. Verbauwhede, published in IJCTA 2016.

slide-7
SLIDE 7

7

  • Co-processor architecture
  • Components within the dashed rectangle are implemented on the chip


slide-8
SLIDE 8

8

  • Algorithm: Montgomery ladder with projective coordinates
  • Circular register file
  • Digit serial arithmetic unit (MALU)
  • Full custom balanced layout for Register File and MALU


slide-9
SLIDE 9

9

  • UMC 130 nm
  • Core area 0.54 mm²
  • Scalar multiplication 86K cycles (102 ms at 847.5 kHz)
  • Power 50.4 µW at 847.5 kHz
  • Energy per scalar multiplication 5.1µJ


Measurement results
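The measured figures are consistent with each other, which can be checked with two lines of arithmetic: time = cycles / frequency and energy = power × time. A small sketch follows; the variable names are illustrative, and the cycle count uses the rounded value of 86K from the slide.

```python
cycles = 86_000            # rounded cycle count for one scalar multiplication
freq_hz = 847.5e3          # clock frequency, 847.5 kHz
power_w = 50.4e-6          # measured power, 50.4 uW

time_s = cycles / freq_hz          # latency of one scalar multiplication
energy_j = power_w * time_s        # energy of one scalar multiplication

print(f"time   = {time_s * 1e3:.1f} ms")
print(f"energy = {energy_j * 1e6:.2f} uJ")
```

With the rounded cycle count this gives roughly 101 ms and 5.1 µJ, matching the reported values up to rounding.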

slide-10
SLIDE 10

10

Uses NIST 283-bit Koblitz curve over $\mathbb{F}_{2^{283}}$

~140-bit security

slide-11
SLIDE 11

11

Elliptic curves over $\mathbb{F}_{2^m}$

Generic elliptic curves: $y^2 + xy = x^3 + ax^2 + b$

Point addition (PA): $P_3(x_3, y_3) = P_1(x_1, y_1) + P_2(x_2, y_2)$
$x_3 = \lambda^2 + \lambda + x_1 + x_2 + a$
$y_3 = \lambda(x_1 + x_3) + x_3 + y_1$

Point doubling (PD): $P_3(x_3, y_3) = 2P_1(x_1, y_1)$
$x_3 = \lambda^2 + \lambda + a$
$y_3 = x_1^2 + \lambda x_3 + x_3$

Scalar multiplication: PA PD PD PD PD PD PA PA …

Koblitz curves: $y^2 + xy = x^3 + ax^2 + 1$, $a = 0$ or $1$

Point addition (PA): $P_3(x_3, y_3) = P_1(x_1, y_1) + P_2(x_2, y_2)$
$x_3 = \lambda^2 + \lambda + x_1 + x_2 + a$
$y_3 = \lambda(x_1 + x_3) + x_3 + y_1$

Point doubling is replaced by the Frobenius endomorphism (FE): $(x_1, y_1) \mapsto (x_3, y_3)$
$x_3 = x_1^2$
$y_3 = y_1^2$

Scalar multiplication: PA FE FE FE FE FE PA PA … Cheap!
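Why squaring both coordinates can stand in for point doubling is easier to see on a toy example. The sketch below assumes the small field $\mathbb{F}_{2^5}$ (reduction polynomial $x^5 + x^2 + 1$) and the Koblitz curve with a = 1; both are illustrative choices, not parameters from the talk. It enumerates all affine points so that one can check that the Frobenius map τ(x, y) = (x², y²) stays on the curve and satisfies τ²(P) + 2P = μτ(P), with μ = +1 for a = 1. That relation is what allows a scalar to be rewritten in base τ.

```python
M, POLY = 5, 0b100101      # toy field F_{2^5} = F_2[x]/(x^5 + x^2 + 1) (assumption)
A, B = 1, 1                # Koblitz curve y^2 + xy = x^3 + x^2 + 1, so mu = +1

def fmul(u, v):
    """Field multiplication with reduction modulo POLY."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> M:
            u ^= POLY
    return r

def finv(u):
    r = 1
    for _ in range(2**M - 2):          # u^(2^M - 2) = u^(-1)
        r = fmul(r, u)
    return r

def on_curve(P):
    if P is None:                      # None is the point at infinity
        return True
    x, y = P
    x2 = fmul(x, x)
    return fmul(y, y) ^ fmul(x, y) == fmul(x2, x) ^ fmul(A, x2) ^ B

def double(P):
    if P is None or P[0] == 0:
        return None
    x1, y1 = P
    lam = x1 ^ fmul(y1, finv(x1))
    x3 = fmul(lam, lam) ^ lam ^ A
    return (x3, fmul(x1, x1) ^ fmul(lam, x3) ^ x3)

def add(P, Q):
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        return double(P) if y1 == y2 else None
    lam = fmul(y1 ^ y2, finv(x1 ^ x2))
    x3 = fmul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    return (x3, fmul(lam, x1 ^ x3) ^ x3 ^ y1)

def frobenius(P):
    """tau(x, y) = (x^2, y^2): two field squarings, no inversion."""
    return None if P is None else (fmul(P[0], P[0]), fmul(P[1], P[1]))

# All affine points of the toy curve.
points = [(x, y) for x in range(2**M) for y in range(2**M) if on_curve((x, y))]
```

A doubling costs an inversion plus several multiplications in affine coordinates, whereas the Frobenius map costs two squarings, which are close to free in binary-field hardware.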

slide-12
SLIDE 12

12

But there is a catch …

Generic elliptic curve: Scalar → PA PD PD PD PD PD PA PA …
Koblitz curve: Scalar → Scalar conversion → PA FE FE FE FE FE PA PA …

slide-13
SLIDE 13

13

But there is a catch …

Generic elliptic curve: Scalar → PA PD PD PD PD PD PA PA …
Koblitz curve: Scalar → Scalar conversion → PA FE FE FE FE FE PA PA …

Several implementations of lightweight ECC over $\mathbb{F}_{2^m}$

slide-14
SLIDE 14

14

Scalar conversion

  • Step 1: Scalar is reduced using the lazy reduction by Brumley and Järvinen
  • Step 2: Zero-free expansion by Okeya, Takagi, and Vuillaume

⇒ For Koblitz curve K-283, integer additions/subtractions of 283-bit size

slide-15
SLIDE 15

15

Optimization

  • We avoid negations

➢ We compute (d0, d1) ← (d0/2 − d1, d0/2)
➢ We compute (a0, a1) ← (2a1, a1 − a0)
➢ We compute (b0, b1) ← (b0/2 − b1, b0/2)
➢ Sign is corrected at the end of the loop

Saves 1/3 of cycles!
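To make the (d0, d1) update concrete, here is a minimal sketch of a τ-adic scalar conversion. As a simplifying assumption it implements the textbook τ-adic non-adjacent form (digits in {−1, 0, 1}) rather than the zero-free expansion used in the talk, and it skips the lazy reduction step. An element d0 + d1·τ with τ² = μτ − 2 is divided by τ via (d0, d1) ← (d1 + μ·d0/2, −d0/2); for μ = −1 this is the negative of the update on the slide, which appears consistent with the sign being corrected afterwards.

```python
def tnaf(k, mu):
    """tau-adic NAF digits of integer k, least significant digit first.

    mu is +1 or -1 (mu = -1 for Koblitz curves with a = 0, such as K-283).
    """
    d0, d1 = k, 0
    digits = []
    while d0 != 0 or d1 != 0:
        if d0 & 1:
            u = 2 - ((d0 - 2 * d1) % 4)    # u is -1 or +1
            d0 -= u
        else:
            u = 0
        digits.append(u)
        # Exact division of d0 + d1*tau by tau (d0 is even at this point).
        d0, d1 = d1 + mu * (d0 // 2), -(d0 // 2)
    return digits

def evaluate(digits, mu):
    """Evaluate sum(u_i * tau^i) and return it as a pair (x0, x1) = x0 + x1*tau."""
    x0, x1 = 0, 0
    for u in reversed(digits):
        # (x0 + x1*tau) * tau = -2*x1 + (x0 + mu*x1)*tau, then add the digit u.
        x0, x1 = -2 * x1 + u, x0 + mu * x1
    return x0, x1
```

Every step uses only multi-precision additions, subtractions and halvings, which is why the conversion maps onto a small integer datapath.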

slide-16
SLIDE 16

16

SPA resistance

Conditional multi-precision addition reveals info of the secret scalar

0 or 1

slide-17
SLIDE 17

17

SPA resistance

Conditional multi-precision addition reveals info of the secret scalar

0 or 1

We generate u ∈ {-1, 1} using the zero-free function Ψ( )
➢ u = -1: compute b0 - a0
➢ u = +1: compute b0 + a0

Similar operations ⇒ Increased SPA resistance!
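One way to see why the two cases are "similar operations" is that addition and subtraction can share a single adder pass: subtracting a is the same as adding its bitwise complement with a carry-in of 1. The sketch below illustrates that idea under assumptions of my own (a fixed 283-bit width, two's-complement wrap-around, and the hypothetical function name `add_or_sub`); it is not the datapath of the processor, and Python integers are of course not constant-time.

```python
WIDTH = 283                        # operand width in bits (assumption)
MASK = (1 << WIDTH) - 1

def add_or_sub(b, a, u):
    """Return (b + u*a) mod 2^WIDTH for u in {-1, +1} with one adder pass.

    The same sequence of steps runs for both values of u; only the data
    (complement mask and carry-in) differ.
    """
    sub = (1 - u) >> 1             # 0 when u = +1, 1 when u = -1
    operand = a ^ (MASK * sub)     # a itself, or its bitwise complement
    return (b + operand + sub) & MASK
```

In contrast, a conditional addition either runs or is skipped, so an attacker can read the digit from the presence or absence of the operation in a single power trace.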

slide-18
SLIDE 18

18

Scalar multiplication

Scalar conversion produces zero-free representation

  • Zero-free representation is generated in (almost) constant time
  • Conversion is done once per scalar ⇒ attacker has one trace
  • The accumulator point is randomized as shown by Coron:

$(X, Y, Z) = (xr, yr^2, r)$, where $r$ is random
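The randomization works because the triple (xr, yr², r) represents the same affine point for every nonzero r: the affine coordinates are recovered as x = X/Z and y = Y/Z². A minimal sketch follows, assuming a toy field $\mathbb{F}_{2^8}$ with reduction polynomial $x^8 + x^4 + x^3 + x + 1$; the field and the helper names are illustrative only.

```python
import secrets

M, POLY = 8, 0x11B                 # toy field F_{2^8} (assumption for illustration)

def fmul(u, v):
    """Field multiplication with reduction modulo POLY."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> M:
            u ^= POLY
    return r

def finv(u):
    r = 1
    for _ in range(2**M - 2):      # u^(2^M - 2) = u^(-1)
        r = fmul(r, u)
    return r

def randomize(x, y):
    """Map affine (x, y) to a random projective representative (xr, yr^2, r)."""
    r = 1 + secrets.randbelow(2**M - 1)        # uniform nonzero field element
    return fmul(x, r), fmul(y, fmul(r, r)), r

def to_affine(X, Y, Z):
    """Recover x = X/Z and y = Y/Z^2."""
    zi = finv(Z)
    return fmul(X, zi), fmul(Y, fmul(zi, zi))
```

Each scalar multiplication therefore starts from a different bit pattern for the same point, so intermediate values are harder to predict from one execution to the next.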

slide-19
SLIDE 19

19

Lightweight 283-bit Koblitz curve processor

Area: 4.3 KGE (without RAM), ~10 KGE (with RAM)
RAM size: 4032 bits
Time: 1,566,000 cycles, 98 ms (16 MHz)
Energy: 9.6 µJ
Power: 98 µW (1 MHz)

“Lightweight coprocessor for Koblitz curves: 283-bit ECC including scalar conversion with only 4300 gates” by S. S. Roy, K. Järvinen, and I. Verbauwhede in CHES 2015

slide-20
SLIDE 20

20

An implementation over prime field

slide-21
SLIDE 21

21

Curve25519

E: $y^2 = x^3 + 486662x^2 + x$

128-bit security

  • Montgomery curve
  • Efficient prime $p = 2^{255} - 19$
  • Known for fast arithmetic
slide-22
SLIDE 22

22

Curve25519

Montgomery ladder

  • Combined PA-PD
  • No need to store y-coordinate!
  • 4S + 5M + MA + 8A

E: $y^2 = x^3 + 486662x^2 + x$

128-bit security

  • Montgomery curve
  • Efficient prime $p = 2^{255} - 19$
  • Known for fast arithmetic
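The combined addition-and-doubling step of the Montgomery ladder can be sketched in a few lines. The version below follows the well-known x-only ladder step for Curve25519 (as in RFC 7748) but, as simplifying assumptions, omits scalar clamping and byte encoding and uses a plain `if` for the conditional swap, which is not constant-time. Counting the operations in the loop body gives 4 squarings, 5 multiplications, 1 multiplication by the constant 121665 and 8 additions/subtractions, which appears to be what "4S + 5M + MA + 8A" refers to.

```python
P = 2**255 - 19
A24 = 121665                       # (486662 - 2) / 4

def ladder(k, u, bits=255):
    """x-coordinate of k*Q for a point Q with x-coordinate u (x-only ladder)."""
    x1 = u % P
    x2, z2 = 1, 0                  # accumulator R0: point at infinity
    x3, z3 = x1, 1                 # accumulator R1: the input point
    swap = 0
    for t in reversed(range(bits)):
        bit = (k >> t) & 1
        swap ^= bit
        if swap:                   # real designs use a constant-time swap here
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a = (x2 + z2) % P          # 8 additions/subtractions in total
        b = (x2 - z2) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        aa = a * a % P             # squaring 1
        bb = b * b % P             # squaring 2
        e = (aa - bb) % P
        da = d * a % P             # multiplication 1
        cb = c * b % P             # multiplication 2
        x3 = (da + cb) ** 2 % P    # squaring 3
        z3 = x1 * ((da - cb) ** 2 % P) % P          # squaring 4, multiplication 3
        x2 = aa * bb % P                            # multiplication 4
        z2 = e * ((aa + A24 * e) % P) % P           # constant mult., multiplication 5
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return x2 * pow(z2, P - 2, P) % P               # single inversion at the end
```

Only x and z coordinates are carried through the loop, and every scalar bit triggers the same sequence of field operations, which is what makes the ladder attractive for small, side-channel-aware hardware.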
slide-23
SLIDE 23

23

Curve25519

  • Efficient prime $p = 2^{255} - 19$
  • 15 × 17 = 255
  • Special acceleration on HW by processing 17-bit words
  • E.g. Xilinx FPGAs have 25×18 DSP multipliers

Modular reduction is easier:
$C = AB = C_1 \cdot 2^{255} + C_0$
$C \bmod p = (19 \cdot C_1 + C_0) \bmod p$
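The reduction rule follows from $2^{255} \equiv 19 \pmod p$: the high part of a product is folded back by multiplying it with 19. A minimal sketch is given below; the function names and the limb helpers are illustrative, and the loop simply repeats the fold until the value fits in 255 bits.

```python
P = 2**255 - 19
MASK255 = 2**255 - 1

def reduce_mod_p(c):
    """Reduce a non-negative integer modulo p = 2^255 - 19 by folding."""
    while c >> 255:
        c1, c0 = c >> 255, c & MASK255     # c = c1 * 2^255 + c0
        c = 19 * c1 + c0                   # since 2^255 = 19 (mod p)
    return c - P if c >= P else c          # final conditional subtraction

def to_limbs(x):
    """Split a 255-bit value into 15 limbs of 17 bits (15 x 17 = 255)."""
    return [(x >> (17 * i)) & 0x1FFFF for i in range(15)]

def from_limbs(limbs):
    return sum(w << (17 * i) for i, w in enumerate(limbs))
```

For example, `reduce_mod_p(a * b)` agrees with `a * b % P` for field elements a and b. With 17-bit limbs, each limb product fits into one 25×18 DSP multiplier of the kind mentioned above.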

slide-24
SLIDE 24

24

Modular multiplier for Curve25519

“Efficient Elliptic-Curve Cryptography using Curve25519 on Reconfigurable Devices” by Sasdrich and Güneysu in ARC 2014

  • Throughput: 25,000 point multiplications per sec
  • Area of point multiplier: 2,783 LUTs, 3,592 FFs, 20 DSP multipliers
  • Parallel processing for high throughput

slide-25
SLIDE 25

25

Lightweight architecture for Curve25519

  • 32 bit word-serial architecture
  • Single port memory
  • 32-bit multiplier parameterized for digit width w = 2,4,8,12 and 16

➢ Speed vs area

  • ASIP ⇒ programmable

“NaCl’s crypto_box in hardware” by M. Hutter, J. Schilling, P. Schwabe, and W. Wieser in CHES 2015. Architecture diagram taken from CHES2015 presentation.

slide-26
SLIDE 26

26

Lightweight architecture for Curve25519

Note: Unified implementation of Curve25519, Salsa20 and Poly1305

  • Smallest configuration: area 14,648 GE, power 40 µW (including optimized RAM); key exchange takes 3,455,394 cycles
  • Fastest configuration: area 17,966 GE, power 70 µW (including optimized RAM); key exchange takes 811,170 cycles


Results

slide-27
SLIDE 27

27

Of general interest … [product scanning]

  • Classical product scanning example

Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0

slide-28
SLIDE 28

28

Of general interest … [product scanning]

  • Classical product scanning example

Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1

slide-29
SLIDE 29

29

Of general interest … [product scanning]

  • Classical product scanning example

Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1
Column 2: a2b0 + a1b1 + a0b2 → c2

slide-30
SLIDE 30

30

Of general interest … [product scanning]

  • Classical product scanning example

Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1
Column 2: a2b0 + a1b1 + a0b2 → c2
Column 3: a3b0 + a2b1 + a1b2 + a0b3 → c3

slide-31
SLIDE 31

31

Of general interest … [product scanning]

  • Classical product scanning example

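Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1
Column 2: a2b0 + a1b1 + a0b2 → c2
Column 3: a3b0 + a2b1 + a1b2 + a0b3 → c3
Column 4: a3b1 + a2b2 + a1b3 → c4
…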

slide-32
SLIDE 32

32

Of general interest … [product scanning]

  • Classical product scanning example

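Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1
Column 2: a2b0 + a1b1 + a0b2 → c2
Column 3: a3b0 + a2b1 + a1b2 + a0b3 → c3
Column 4: a3b1 + a2b2 + a1b3 → c4
…

[Figure: multiply-accumulate unit (× +) fed with operand words a0 and b0; label c0]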

slide-33
SLIDE 33

33

Of general interest … [product scanning]

  • Classical product scanning example

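Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1
Column 2: a2b0 + a1b1 + a0b2 → c2
Column 3: a3b0 + a2b1 + a1b2 + a0b3 → c3
Column 4: a3b1 + a2b2 + a1b3 → c4
…

[Figure: multiply-accumulate unit (× +) fed with operand words a1 and b0; label a1b0]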

slide-34
SLIDE 34

34

Of general interest … [product scanning]

  • Classical product scanning example

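Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1
Column 2: a2b0 + a1b1 + a0b2 → c2
Column 3: a3b0 + a2b1 + a1b2 + a0b3 → c3
Column 4: a3b1 + a2b2 + a1b3 → c4
…

[Figure: multiply-accumulate unit (× +) fed with operand words a0 and b1; label c0]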

slide-35
SLIDE 35

35

Of general interest … [product scanning]

  • Classical product scanning example

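Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1
Column 2: a2b0 + a1b1 + a0b2 → c2
Column 3: a3b0 + a2b1 + a1b2 + a0b3 → c3
Column 4: a3b1 + a2b2 + a1b3 → c4
…

[Figure: multiply-accumulate unit (× +) fed with operand words a1 and b0; label a1b0]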

Observation: same operand words are fetched several times
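The column-wise scheme and the repeated operand fetches can be made explicit with a short sketch. It assumes 32-bit words and counts two memory loads per partial product, which models a classical product-scanning loop with no operand caching; the function names and the load-counting model are illustrative and not taken from the cited design.

```python
W = 32                             # word size in bits (assumption)
WORD_MASK = (1 << W) - 1

def to_words(x, n):
    """Split x into n words of W bits, least significant word first."""
    return [(x >> (W * i)) & WORD_MASK for i in range(n)]

def from_words(words):
    return sum(w << (W * i) for i, w in enumerate(words))

def product_scanning(a, b):
    """Multiply two n-word operands column by column.

    Returns the 2n-word product and the number of operand-word loads,
    assuming both operands of every partial product are fetched from memory.
    """
    n = len(a)
    c = [0] * (2 * n)
    acc = 0                        # multi-word accumulator
    loads = 0
    for k in range(2 * n - 1):     # column k collects all a[i]*b[j] with i + j = k
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]
            loads += 2
        c[k] = acc & WORD_MASK     # the low word of the accumulator is the result word
        acc >>= W                  # the rest carries into the next column
    c[2 * n - 1] = acc
    return c, loads
```

For 4-word operands this performs 16 multiply-accumulate steps and 32 operand loads, although only 8 distinct operand words exist; reducing that redundancy is the motivation for processing two columns at a time.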

slide-36
SLIDE 36

36

Of general interest … [product scanning]

  • Two column product scanning of “NaCl’s crypto_box in hardware”

Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1
…

[Figure: multiply-accumulate unit (× +)]

slide-37
SLIDE 37

37

Of general interest … [product scanning]

  • Two column product scanning of “NaCl’s crypto_box in hardware”

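Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1
Column 2: a2b0 + a1b1 + a0b2 → c2
Column 3: a3b0 + a2b1 + a1b2 + a0b3 → c3
…

[Figure: multiply-accumulate unit (× +)]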

slide-38
SLIDE 38

38

Of general interest … [product scanning]

  • Two column product scanning of “NaCl’s crypto_box in hardware”

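Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1

[Figure: multiply-accumulate unit (× +) with operand words b0 and a0; label c0]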

slide-39
SLIDE 39

39

Of general interest … [product scanning]

  • Two column product scanning of “NaCl’s crypto_box in hardware”

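Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1

[Figure: multiply-accumulate unit (× +) with operand words b0, a0 and a1]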

slide-40
SLIDE 40

40

Of general interest … [product scanning]

  • Two column product scanning of “NaCl’s crypto_box in hardware”

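Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1

[Figure: multiply-accumulate unit (× +) with operand words b1, a0 and a1; label c1]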

slide-41
SLIDE 41

41

Of general interest … [product scanning]

  • Two column product scanning of “NaCl’s crypto_box in hardware”

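Operands: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)

Column 0: a0b0 → c0
Column 1: a1b0 + a0b1 → c1

[Figure: multiply-accumulate unit (× +) with operand words b1, a0 and a1; label c1]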

Reduces overhead of memory access!

slide-42
SLIDE 42

42

Elliptic curves: two fields

Binary extension fields $\mathbb{F}_{2^m}$

  • Carry-free arithmetic
  • Computationally faster
  • Easy to implement in HW
  • But less confidence about security

Prime fields $\mathbb{F}_p$

  • Multiprecision arithmetic is slower, but recent curves support efficient computation
  • Easy to implement on general-purpose computers, so wider support
  • Stronger security

Both fields can be implemented on IoT devices

slide-43
SLIDE 43

43

Thank You