Lightweight Coprocessor for Koblitz Curves: 283-bit ECC Including Scalar Conversion with only 4300 Gates

SLIDE 1

Lightweight Coprocessor for Koblitz Curves: 283-bit ECC Including Scalar Conversion with only 4300 Gates

S. Sinha Roy, K. Järvinen, I. Verbauwhede

KU Leuven ESAT/COSIC, Leuven, Belgium

K. Järvinen, CHES 2015, Sept. 14, 2015

SLIDE 2

Introduction

We present a lightweight coprocessor for the 283-bit Koblitz curve:

  • The first lightweight implementation of a high-security curve
  • The first to include on-the-fly lightweight conversion
  • One of the smallest ECC coprocessors
  • A large set of side-channel countermeasures

SLIDES 3-9

High-level Architecture

Point multiplication Q = kP

[Figure, built up over seven slides: block diagram of the system. Slides 3-5 show CPU, RAM, ECC, and a separate ECC RAM, labelled with k, P, Q, and intermediate values. Slides 6-9 show CPU, RAM, and ECC without the separate ECC RAM; the labels k, P, then intermediate values, then Q appear one per slide.]

SLIDES 10-13

Koblitz Curves

  • Binary curves which are included in many standards (e.g., NIST)
  • Point doublings can be replaced with cheap Frobenius maps: $\phi : (x, y) \mapsto (x^2, y^2)$
  • ... but first the integer $k$ needs to be converted to a $\tau$-adic expansion $k = \sum_{i=0}^{\ell-1} k_i \tau^i$, where $\tau = (\mu + \sqrt{-7})/2 \in \mathbb{C}$

Example (Point multiplication Q = kP)

  Without Frobenius maps:            add dbl dbl add dbl add dbl dbl ··· add dbl add
  With Frobenius maps and conversion: conversion add add add ··· add

[Figure labels on the final build: $\mathbb{Z}$ and $\mathbb{F}_{2^m}$ beneath the second sequence.]

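
The reason doublings disappear can be written out in one line. The identity below is the standard characteristic equation of the Frobenius map on a Koblitz curve $y^2 + xy = x^3 + ax^2 + 1$ with $\mu = (-1)^{1-a}$; it is not shown on the slides and is added here only as a reminder. The value of $\tau$ given above is a root of the same equation.

```latex
\tau^2 - \mu\tau + 2 = 0
\;\Longrightarrow\;
\phi^2(P) - \mu\,\phi(P) + 2P = \mathcal{O}
\;\Longrightarrow\;
2P = \mu\,\phi(P) - \phi^2(P),
\qquad
kP = \sum_{i=0}^{\ell-1} k_i\,\phi^i(P).
```

With digits $k_i$ from a small set, evaluating the sum needs only Frobenius maps and point additions, which is why the second operation sequence in the example contains no `dbl`.
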
SLIDE 14

Secure Lightweight Conversion

SLIDES 15-16

Conversion Algorithms

Our conversion algorithms are based on:

  (1) the lazy reduction by Brumley and Järvinen
  (2) the zero-free expansion by Okeya, Takagi, and Vuillaume

⇒ Only (multiprecision) additions and subtractions

(1): Integer k to ρ = b0 + b1τ

  (a0, a1) ← (1, 0), (b0, b1) ← (0, 0), (d0, d1) ← (k, 0)
  for i = 0 to m − 1 do
    u ← d0 mod 2
    d0 ← d0 − u
    (b0, b1) ← (b0 + u · a0, b1 + u · a1)
    (d0, d1) ← (d1 − d0/2, −d0/2)
    (a0, a1) ← (−2a1, a0 − a1)
  ρ = (b0, b1) ← (b0 + d0, b1 + d1)

(2): ρ to τ-adic expansion

  i ← 0
  while |b0| ≠ 1 or b1 ≠ 0 do
    u ← Ψ(b0 + b1τ)
    b0 ← b0 − u
    (b0, b1) ← (b1 − b0/2, −b0/2)
    ti ← u
    i ← i + 1
  ti ← b0

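
The two algorithms can be sketched in Python with arbitrary-precision integers. This is a minimal reference sketch, not the coprocessor's 16-bit implementation. Two things are assumptions: `MU = -1` (inferred from the update rule `(d1 − d0/2, −d0/2)`), and the definition of `psi`, which the slides do not give. Here `psi` picks the sign that keeps the next `b0` odd, which is what a zero-free expansion requires. The helper `tau_mul` and the `max_len` guard are illustrative additions.

```python
from typing import List, Tuple

MU = -1  # assumption: mu = -1, matching the update (d1 - d0/2, -d0/2)

ZTau = Tuple[int, int]  # (x0, x1) stands for x0 + x1*tau, with tau^2 = MU*tau - 2


def tau_mul(x: ZTau, y: ZTau) -> ZTau:
    """Exact product of two elements of Z[tau]."""
    x0, x1 = x
    y0, y1 = y
    return (x0 * y0 - 2 * x1 * y1, x0 * y1 + x1 * y0 + MU * x1 * y1)


def lazy_reduce(k: int, m: int) -> ZTau:
    """Algorithm (1): return rho with rho congruent to k modulo (tau^m - 1)."""
    a0, a1 = 1, 0
    b0, b1 = 0, 0
    d0, d1 = k, 0
    for _ in range(m):
        u = d0 % 2
        d0 -= u                                   # d0 is now even
        b0, b1 = b0 + u * a0, b1 + u * a1
        d0, d1 = d1 - d0 // 2, -(d0 // 2)         # exact division by tau
        a0, a1 = -2 * a1, a0 - a1                 # a <- a * tau
    return (b0 + d0, b1 + d1)


def psi(b0: int, b1: int) -> int:
    """Assumed sign rule: return -1 or +1 so that the next b0 stays odd."""
    return ((b0 - 2 * b1) % 4) - 2


def zero_free(rho: ZTau, max_len: int = 10_000) -> List[int]:
    """Algorithm (2): digits t_0, t_1, ... in {-1, +1} with sum(t_i * tau^i) = rho."""
    b0, b1 = rho
    if b0 % 2 == 0:
        raise ValueError("this sketch only handles odd b0")
    digits: List[int] = []
    while abs(b0) != 1 or b1 != 0:
        if len(digits) >= max_len:
            raise RuntimeError("expansion did not terminate")
        u = psi(b0, b1)
        b0 -= u
        b0, b1 = b1 - b0 // 2, -(b0 // 2)
        digits.append(u)
    digits.append(b0)
    return digits
```

`zero_free` refuses an even `b0`; how the authors handle that case is not shown on the slides.
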
SLIDES 17-26

Modifications for Efficiency and Improved Security

[Figure, built up over several slides: an adder/subtractor (±) with inputs a and b and output c. The first slide labels the operands m bits wide; the later slides label them 16 bits wide.]

1. Negations (e.g., −d0/2) take about 1/3 of cycles
   ⇒ We use the modification (d0/2 − d1, d0/2) instead of (d1 − d0/2, −d0/2)
   ⇒ The signs will be incorrect but can be corrected

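
One way to see why the sign error is harmless: the modified update divides by −τ instead of τ, so after i iterations the stored d is (−1)^i times the value the original algorithm would hold. The sketch below tracks that sign in a flag and alternates between adding and subtracting a. This is only one possible correction, written under the assumption µ = −1; the slides do not show how the authors correct the signs. The names `lazy_reduce_no_negation` and `reconstruct` are hypothetical.

```python
from typing import Tuple

ZTau = Tuple[int, int]  # (x0, x1) stands for x0 + x1*tau, tau^2 = -tau - 2 (mu = -1 assumed)


def tau_mul(x: ZTau, y: ZTau) -> ZTau:
    """Exact product in Z[tau] for mu = -1."""
    x0, x1 = x
    y0, y1 = y
    return (x0 * y0 - 2 * x1 * y1, x0 * y1 + x1 * y0 - x1 * y1)


def lazy_reduce_no_negation(k: int, m: int) -> Tuple[ZTau, ZTau, int]:
    """Return (b, d, sign) such that k = b + sign * d * tau^m."""
    a0, a1 = 1, 0
    b0, b1 = 0, 0
    d0, d1 = k, 0
    sign = 1  # stored d equals sign * (remainder of the unmodified algorithm's kind)
    for _ in range(m):
        u = d0 % 2
        d0 -= u
        if sign > 0:                       # depends on the loop index only, not on k
            b0, b1 = b0 + u * a0, b1 + u * a1
        else:
            b0, b1 = b0 - u * a0, b1 - u * a1
        h = d0 // 2
        d0, d1 = h - d1, h                 # modified update: no negation needed
        a0, a1 = -2 * a1, a0 - a1          # a <- a * tau, unchanged
        sign = -sign
    return (b0, b1), (d0, d1), sign


def reconstruct(b: ZTau, d: ZTau, sign: int, m: int) -> ZTau:
    """Evaluate b + sign * d * tau^m exactly, to check the invariant."""
    t = (1, 0)
    for _ in range(m):
        t = tau_mul(t, (0, 1))
    dt = tau_mul(d, t)
    return (b[0] + sign * dt[0], b[1] + sign * dt[1])
```

Since τ^m acts as the identity on the curve's points, the reduced scalar in this sketch would be `b + sign * d`.
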
SLIDES 27-32

Modifications for Efficiency and Improved Security (cont.)

bi + u · ai, where u = d0 mod 2 ∈ {0, 1} (the least significant bit of d0)

  • u = 1 ⇒ b0 + a0 and b1 + a1
  • u = 0 ⇒ do nothing

Bad SPA leakage!

2. We select u ∈ {−1, 1} by using Ψ(d0 + d1τ)

  • u = +1 ⇒ b0 + a0 and b1 + a1
  • u = −1 ⇒ b0 − a0 and b1 − a1

Similar operations ⇒ Improved SPA resistance!

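
The effect of the second modification can be sketched as follows: every iteration performs exactly one multiprecision addition or subtraction on b, so no iteration is skipped. The definition of `psi` is an assumption (the slides do not define Ψ); it picks the sign that keeps the next d0 odd. The sketch also assumes µ = −1 and an odd k, and it uses the unmodified update for d so that it stands on its own. All function names are hypothetical.

```python
from typing import List, Tuple

ZTau = Tuple[int, int]  # (x0, x1) stands for x0 + x1*tau, tau^2 = -tau - 2 (mu = -1 assumed)


def tau_mul(x: ZTau, y: ZTau) -> ZTau:
    """Exact product in Z[tau] for mu = -1."""
    x0, x1 = x
    y0, y1 = y
    return (x0 * y0 - 2 * x1 * y1, x0 * y1 + x1 * y0 - x1 * y1)


def psi(d0: int, d1: int) -> int:
    """Assumed sign rule: -1 or +1, chosen so that the next d0 is odd again."""
    return ((d0 - 2 * d1) % 4) - 2


def lazy_reduce_signed(k: int, m: int) -> Tuple[ZTau, ZTau, List[int]]:
    """Return (b, d, digits) with k = b + d * tau^m and all digits in {-1, +1}."""
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd k")
    a0, a1 = 1, 0
    b0, b1 = 0, 0
    d0, d1 = k, 0
    digits: List[int] = []
    for _ in range(m):
        u = psi(d0, d1)
        d0 -= u
        b0, b1 = b0 + u * a0, b1 + u * a1   # always one add or one subtract
        d0, d1 = d1 - d0 // 2, -(d0 // 2)
        a0, a1 = -2 * a1, a0 - a1
        digits.append(u)
    return (b0, b1), (d0, d1), digits


def reconstruct(b: ZTau, d: ZTau, m: int) -> ZTau:
    """Evaluate b + d * tau^m exactly, to check the invariant."""
    t = (1, 0)
    for _ in range(m):
        t = tau_mul(t, (0, 1))
    dt = tau_mul(d, t)
    return (b[0] + dt[0], b[1] + dt[1])
```
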
SLIDE 33

Point Multiplication

SLIDES 34-41

Point Multiplication with Zero-free Expansions

Zero-free τ-adic expansion [Okeya et al., 2005]

  • A τ-adic representation that represents k with ki ∈ {−1, 1}
  • Combined with w-bit windows and precomputations
    ⇒ Fast point multiplication with only ℓ/w point additions
    ⇒ Constant pattern of point operations

Example

  $1\,\bar{1}\,\bar{1}\,1\,1\,1\,1\,\bar{1}\,1\,1\,1\,\bar{1}\,\bar{1}\,\bar{1}\,\cdots\,1\,\bar{1}\,1\,1$

w = 2: $P_{+1} = \phi(P) + P$, $P_{-1} = \phi(P) - P$

Each pair of digits selects one addition, with a $\phi^2$ map between consecutive additions:

  $+P_{-1}$, $-P_{-1}$, $+P_{+1}$, $+P_{-1}$, $+P_{+1}$, $+P_{-1}$, $-P_{+1}$, $\cdots$, $+P_{-1}$, $+P_{+1}$

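
The mapping from digit pairs to window operations can be sketched as below. A pair (hi, lo) represents hi·τ + lo = hi·(τ + hi·lo), so hi gives the sign and hi·lo selects P+1 or P−1. The digit string in the usage note is a reading of the slide's example (the bars were displaced in extraction), and the function names are hypothetical. `eval_windows` checks the Horner-style evaluation with τ² between additions in Z[τ], assuming µ = −1.

```python
from typing import List, Tuple

ZTau = Tuple[int, int]  # (x0, x1) stands for x0 + x1*tau, tau^2 = -tau - 2 (mu = -1 assumed)


def tau_mul(x: ZTau, y: ZTau) -> ZTau:
    """Exact product in Z[tau] for mu = -1."""
    x0, x1 = x
    y0, y1 = y
    return (x0 * y0 - 2 * x1 * y1, x0 * y1 + x1 * y0 - x1 * y1)


def window_ops(digits_msb_first: List[int]) -> List[str]:
    """Map zero-free digits (most significant first) to w = 2 window operations."""
    if len(digits_msb_first) % 2:
        raise ValueError("this sketch assumes an even number of digits")
    ops: List[str] = []
    for j in range(0, len(digits_msb_first), 2):
        hi, lo = digits_msb_first[j], digits_msb_first[j + 1]
        sign = "+" if hi == 1 else "-"
        which = "P+1" if hi * lo == 1 else "P-1"   # P+1 = phi(P) + P, P-1 = phi(P) - P
        ops.append(sign + which)
    return ops


def eval_digits(digits_msb_first: List[int]) -> ZTau:
    """Evaluate sum(t_i * tau^i) by Horner's rule, one digit at a time."""
    acc = (0, 0)
    for t in digits_msb_first:
        acc = tau_mul(acc, (0, 1))
        acc = (acc[0] + t, acc[1])
    return acc


def eval_windows(ops: List[str]) -> ZTau:
    """Evaluate the window sequence: acc <- tau^2 * acc +/- (tau +/- 1)."""
    acc = (0, 0)
    for op in ops:
        sign = 1 if op[0] == "+" else -1
        c = 1 if op.endswith("+1") else -1
        acc = tau_mul(tau_mul(acc, (0, 1)), (0, 1))
        acc = (acc[0] + sign * c, acc[1] + sign)
    return acc
```

For the first 14 digits of the example, `window_ops([1, -1, -1, 1, 1, 1, 1, -1, 1, 1, 1, -1, -1, -1])` gives `['+P-1', '-P-1', '+P+1', '+P-1', '+P+1', '+P-1', '-P+1']`, the same sequence as the slide's annotations.
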
SLIDES 42-45

Additional Side-channel Countermeasures

  • Point additions and subtractions are computed in two phases:
    (1) To add (x, y), set (xp, yp, ym) ← (x, y, x + y); to subtract (x, y), set (xp, ym, yp) ← (x, y, x + y)
    (2) Add (xp, yp, ym)
  • The accumulator point is randomized as shown by Coron: $(X, Y, Z) = (xr, yr^2, r)$, where r is random
  • The expansion is extended to (almost) constant length
  • The attacker can obtain only a single trace from the conversion

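
The first two countermeasures can be sketched with field elements held as Python integers. On a binary curve −(x, y) = (x, x + y), and field addition is XOR, so adding and subtracting differ only in which of the two destinations receives y and which receives x + y. The sketch assumes the NIST reduction polynomial x^283 + x^12 + x^7 + x^5 + 1 for the 283-bit field, and reads (X, Y, Z) with the convention x = X/Z, y = Y/Z². The function names are hypothetical, and `gf_mul` is a plain reference multiplier, not constant time.

```python
import secrets
from typing import Tuple

M = 283
# Assumed reduction polynomial: x^283 + x^12 + x^7 + x^5 + 1
F = (1 << 283) | (1 << 12) | (1 << 7) | (1 << 5) | 1


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^283); bit i of an integer is the coefficient of x^i."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:        # degree reached 283: reduce
            a ^= F
    return r


def prepare_operand(x: int, y: int, subtract: bool) -> Tuple[int, int, int]:
    """Phase 1: return (xp, yp, ym); the same values are computed either way."""
    s = x ^ y             # x + y in GF(2^m)
    return (x, s, y) if subtract else (x, y, s)


def randomize(x: int, y: int) -> Tuple[int, int, int]:
    """Projective randomization: (X, Y, Z) = (x*r, y*r^2, r) for a random nonzero r."""
    r = 0
    while r == 0:
        r = secrets.randbits(M)
    return gf_mul(x, r), gf_mul(y, gf_mul(r, r)), r
```

Phase 2 then runs the same addition routine on (xp, yp, ym) regardless of the digit's sign.
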
SLIDE 46

Architecture and Results

SLIDE 47

Architecture of the ALU

[Figure: block diagram of the coprocessor datapath. Recoverable labels: a single-port RAM with 16-bit din and dout and an ADDRESS input; an ALU containing a binary adder with carry-in, a 16x16 binary multiplier, a shifter, registers R1 and R2, registers CU and CL, a mask, and the constants $2^5$, $2^7$, $2^{12}$; RAM address generation from a base address plus read and write offsets (RdB1, RdB2, WtB1, WtB2, RdOffset, WtOffset), a T ROM, and a Reduction-ROM; a CONTROL block covering scalar conversion, field addition/squaring/multiplication/inversion, and point arithmetic.]

SLIDE 48

Results and Comparisons

We synthesized the design (coprocessor, not RAM) for UMC 130 nm CMOS with Synopsys Design Compiler:

  • 4,323 GE
  • 1,566,000 clock cycles (incl. conversion)
  • 97.89 ms (@16 MHz)
  • 97.70 µW (@16 MHz)
  • 9.56 µJ (@16 MHz)

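
The latency and energy figures follow from the cycle count, clock frequency, and power by simple arithmetic. The worked check below uses only the numbers on the slide; the small difference from 97.89 ms suggests the cycle count is rounded.

```latex
t = \frac{1{,}566{,}000\ \text{cycles}}{16 \times 10^{6}\ \text{cycles/s}} \approx 97.9\ \text{ms},
\qquad
E = P \cdot t = 97.70\ \mu\text{W} \times 97.89\ \text{ms} \approx 9.56\ \mu\text{J}.
```

The 6.11 µW listed for this design in the comparison table equals 97.70 µW / 16. That would be consistent with a 1 MHz operating point, although the slides do not say so.
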
SLIDE 49

Results and Comparisons (Cont.)

Work             Curve  RAM   Area (GE)  Latency (cycles)  Latency (ms)  Power (µW)
Batina’06        B-163  no    9,926      95,159            190.32        <60
Bock’08          B-163  yes   12,876     –                 95            93
Hein’08          B-163  yes   13,250     296,299           2,792         80.85
Kumar’06         B-163  yes   16,207     376,864           27.90         n/a
Lee’08           B-163  yes   12,506     275,816           244.08        32.42
Wegner’11        B-163  yes   8,958      286,000           2,860         32.34
Wegner’13        B-163  no    4,114      467,370           467.37        66.1
Pessl’14         P-160  yes   12,448     139,930           139.93        42.42
Azarderakhsh’14  K-163  yes   11,571     106,700           7.87          5.7
Our, est.        B-163  no    ≈3,773     ≈485,000          ≈30.31        ≈6.11
Our, est.        K-163  no    ≈4,323     ≈420,900          ≈26.30        ≈6.11
Our, est.        B-283  no    ≈3,773     ≈1,934,000        ≈120.89       ≈6.11
Our, est.        K-283  yes⋆  10,204⋆    1,566,000         97.89         >6.11
Our              K-283  no    4,323      1,566,000         97.89         6.11

⋆ Estimate for a 256 × 16-bit RAM; space is needed for 252 16-bit words (4032 bits)

SLIDES 50-53

Conclusions & Future Work

  • We showed that 283-bit curves are feasible for lightweight implementations
    ⇒ The price to pay comes mainly in latency and memory requirements
  • Koblitz curves are feasible for lightweight implementations
    ⇒ They lead to savings in latency and energy consumption
  • The drop-in concept is very efficient for high-security curves
    ⇒ Area of the memory becomes less of an issue

Future work

  • Careful validation of resistance against side-channel attacks

SLIDE 54

Thank you! Questions?