Hardware-Software Co-Design for Security: ECC Processor Example
Arnaud Tisserand
CNRS, Lab-STICC
SILM Workshop, Nov. 2019
Introduction
Public-key (or asymmetric) cryptography (PKC): RSA, (hyper-)elliptic curve cryptography ((H)ECC)
Design, prototype and evaluate hardware/software (HW/SW) for PKC:
Objectives:
Arnaud Tisserand. CNRS – Lab-STICC 2/20
Elliptic Curve Cryptography (ECC)
Elliptic curve over GF(p): $E : y^2 = x^3 + ax + b$
Curve points representation:
◮ affine coordinates (x, y): many field inversions
◮ projective coordinates: significantly faster (e.g., Jacobian)
Scalar multiplication: $Q = [k]P = P + P + \cdots + P$
where $P \in E$ and $k = (k_{n-1} k_{n-2} \ldots k_1 k_0)_2$
◮ the most time-consuming operation
◮ k has 200–600 bits
[Figure: $y^2 = x^3 + 4x + 20$ over GF(1009)]
Good and complete presentation in [14] and [10]
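To make the cost of the affine representation concrete, here is a minimal Python sketch of point addition and doubling on the example curve $y^2 = x^3 + 4x + 20$ over GF(1009). The function names and the sample point (1, 5) are illustrative choices, not taken from the talk. Each ADD or DBL needs one modular inversion (the `pow(..., -1, p)` call), which is what projective coordinates avoid.

```python
# Affine arithmetic on E: y^2 = x^3 + 4x + 20 over GF(1009).
# None stands for the point at infinity O. Requires Python 3.8+ (modular inverse via pow).
P_MOD, A, B = 1009, 4, 20

def on_curve(pt):
    """Check that pt satisfies the curve equation (O is always on the curve)."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0

def add(p1, p2):
    """Affine point addition; handles O, opposite points and doubling."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # P + (-P) = O
    if p1 == p2:
        # Doubling: slope of the tangent, one field inversion
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Addition: slope of the chord, one field inversion
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

P = (1, 5)            # 1 + 4 + 20 = 25 = 5^2, so P is on the curve
print(add(P, P))      # doubling of P
```

In Jacobian coordinates the same operations use only field multiplications, squarings, additions and subtractions, with a single inversion at the very end of the scalar multiplication.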
Scalar Multiplication
$Q = [k]P = P + P + \cdots + P$
Double-and-add scalar multiplication algorithm:
    Q ← O
    for i from n − 1 to 0 do
        Q ← [2]Q (DBL)
        if k_i = 1 then Q ← Q + P (ADD)
    return Q
Average cost: n DBL and ≈ 0.5n ADD (≈ 0.5n ones in k)
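The double-and-add loop above can be sketched in Python as follows. This is an illustrative, unprotected version on the example curve $y^2 = x^3 + 4x + 20$ over GF(1009), using affine coordinates for brevity. The helper names and the base point (1, 5) are assumptions made for the example.

```python
# Left-to-right double-and-add on E: y^2 = x^3 + 4x + 20 over GF(1009).
# None stands for the point at infinity O. Requires Python 3.8+.
P_MOD, A = 1009, 4

def add(p1, p2):
    """Affine addition/doubling (one inversion per call)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def scalar_mul(k, p):
    """Q = [k]P, scanning the bits of k from k_{n-1} down to k_0."""
    q = None                       # Q <- O
    for bit in bin(k)[2:]:         # most significant bit first
        q = add(q, q)              # DBL at every iteration
        if bit == "1":
            q = add(q, p)          # ADD only when k_i = 1
    return q

P = (1, 5)
print(scalar_mul(13, P))
```

The ADD is executed only for the non-zero bits of k, which is where the ≈ 0.5n average comes from, and also why the sequence of operations depends on the key.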
[Figure: dataflow graphs of the field operations (mul, sqr, add, sub) in the point operations, from the inputs (PX, PY, PZ, QX, QY, QZ, a) through intermediate values (v0, ..., v25) to the outputs (RX, RY, RZ)]
Side Channel Attacks
protocol level: key exchange, signature, etc.
curve level: [k]P, ADD(P, Q), DBL(P)
field level: x ± y, x × y, ...
Scalar multiplication operation:
    for i from 0 to t − 1 do
        if k_i = 1 then Q = ADD(P, Q)
        P = DBL(P)
[Figure: sequence of operations DBL DBL DBL DBL DBL DBL ADD ADD and bits 0 0 0 1 1]
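The leak can be illustrated with a small Python sketch. It assumes an idealized attacker who can perfectly distinguish ADD from DBL in a trace, and it models only the sequence of operations, not any real power measurement. The function names are hypothetical.

```python
def op_trace(k_bits):
    """Operation sequence of the loop above for bits k_0 ... k_{t-1} (LSB first)."""
    trace = []
    for bit in k_bits:
        if bit == 1:
            trace.append("ADD")    # executed only when k_i = 1
        trace.append("DBL")        # executed at every iteration
    return trace

def recover_bits(trace):
    """Rebuild the key bits from the observed ADD/DBL sequence."""
    bits, saw_add = [], False
    for op in trace:
        if op == "ADD":
            saw_add = True
        else:                      # each DBL closes one loop iteration
            bits.append(1 if saw_add else 0)
            saw_add = False
    return bits

key = [0, 0, 0, 1, 1]
trace = op_trace(key)
print(trace)
print(recover_bits(trace) == key)
```

Any countermeasure at the curve level has to break this one-to-one link between the key bits and the visible operation sequence.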
Software vs Hardware Support
[Figure: SW (register file, functional units FU1, FU2, FU3, load/store unit LSU, instruction management and control, instruction (I) and data (D) memory hierarchy) vs. HW (CTRL, registers, memory)]
              SW          HW
FLEXIBILITY   EXCELLENT   limited
SPEED         slow        fast
AREA          large       small
ENERGY        large       small
              moderate    HUGE
Activity in a Processor
Operation to be executed: r ← x + a[i]
[Figure: processor activity over time: signals; data/operations (x, a[i], +, r); instructions (LD R1,R2; ADD R3,R1,R4); AS; processor internal state (PIS)]
predictor, monitoring, etc.
Our Processor Specifications
protocol level: key exchange, signature, etc.
curve level: [k]P, ADD(P, Q), DBL(P), P + P
field level: x ± y, x × y, ...
[Figure: level diagram labelled HW, SW, HW]
⇒ hardware (HW)
◮ dedicated functional units
◮ internal parallelism
◮ reduced silicon area
◮ low energy (& power consumption)
◮ large area used at each clock cycle
⇒ software (SW)
◮ curves, algorithms, representations (points/elements), k recoding, ...
◮ at design time / at run time
⇒ HW
◮ secure units ($\mathbb{F}_{2^m}$, $\mathbb{F}_p$)
◮ secure key storage/management
◮ secure control
Processor Architecture
[Figure: processor block diagram: external interface, interconnect, CTRL, code memory, key management, register file, functional units FU1, FU2, FU3]
Data: w-bit (32, ..., 128) except for k digits; control: a few bits per unit
Protected $\mathbb{F}_{2^m}$ Multipliers
Unprotected
[Figure: activity trace (#transitions vs. cycles), Mastrovito 233]
Protected
Overhead: area/time < 10 %
Protected Processor for $\mathbb{F}_{2^m}$
[Figure: activity traces (#transitions vs. cycles) and current measures (current in mA) for the DBL operation, Mastrovito unprotected and protected; activity trace for the ADD operation, Mastrovito protected]
Key Management Unit
[Figure: key management unit (key mng.): k, key recoding, k_i, CTRL]
(fixed/sliding), double-base [6] and multiple-base [7] number systems (with/without randomization), addition chains [20], other?
Double-Base Number System
Standard radix-2 representation:
$k = \sum_{i=0}^{t-1} k_i 2^i$
t explicit digits: $k_{t-1}\; k_{t-2}\; \ldots\; k_2\; k_1\; k_0$
implicit weights: $2^{t-1}\; 2^{t-2}\; \ldots\; 2^2\; 2^1\; 2^0$
Digits: $k_i \in \{0, 1\}$, typical size: $t \in \{160, \ldots, 600\}$
Double-Base Number System (DBNS):
$k = \sum_{j=0}^{n-1} k_j 2^{a_j} 3^{b_j}$
n (2, 3)-terms: $(k_{n-1}, a_{n-1}, b_{n-1})\; \ldots\; (k_1, a_1, b_1)\; (k_0, a_0, b_0)$, with explicit "digits" $k_j$ and explicit ranks $(a_j, b_j)$
$a_j, b_j \in \mathbb{N}$, $k_j \in \{1\}$ or $k_j \in \{-1, 1\}$, size $n \approx \log t$
DBNS is a very redundant and sparse representation:
$1701 = (11010100101)_2$
$1701 = 243 + 1458 = 2^0 3^5 + 2^1 3^6$ = (1, 0, 5), (1, 1, 6)
$\phantom{1701} = 1728 - 27 = 2^6 3^3 - 2^0 3^3$ = (1, 6, 3), (−1, 0, 3)
$\phantom{1701} = 729 + 972 = 2^0 3^6 + 2^2 3^5$ = (1, 0, 6), (1, 2, 5)
$\phantom{1701} = \ldots$
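The redundancy can be checked with a short Python sketch. `dbns_value` evaluates a list of (k_j, a_j, b_j) terms. `greedy_dbns` is one simple way to obtain an unsigned decomposition (repeatedly subtracting the largest 2^a 3^b that fits). It is included only as an illustration and is an assumption of this example, not the recoding method used in the processor.

```python
def dbns_value(terms):
    """Value of a list of (k_j, a_j, b_j) terms: sum of k_j * 2^a_j * 3^b_j."""
    return sum(k * 2**a * 3**b for k, a, b in terms)

def greedy_dbns(n):
    """Unsigned greedy decomposition: subtract the largest 2^a 3^b <= n until n = 0."""
    terms = []
    while n > 0:
        best = (1, 0, 0)                           # (value, a, b)
        pow3, b = 1, 0
        while pow3 <= n:
            a = (n // pow3).bit_length() - 1       # largest a with 2^a * 3^b <= n
            cand = pow3 << a
            if cand > best[0]:
                best = (cand, a, b)
            pow3, b = pow3 * 3, b + 1
        terms.append((1, best[1], best[2]))
        n -= best[0]
    return terms

# The three representations of 1701 listed above
for terms in ([(1, 0, 5), (1, 1, 6)],
              [(1, 6, 3), (-1, 0, 3)],
              [(1, 0, 6), (1, 2, 5)]):
    print(terms, dbns_value(terms))

print(greedy_dbns(1701))
```

The greedy result is not necessarily the shortest representation. The point is only that many different term lists evaluate to the same scalar.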
Randomized DBNS Recoding of the Scalar k
protocol level: encryption, signature, etc.
curve level: [k]P, ADD(P, Q), DBL(P), TPL(P)
field level: x ± y, x × y, ...
On-the-fly DBNS random recoding for the scalar k
◮ randomly recode windows of the scalar k on-the-fly: $1 + 2 \leftrightarrows 3$, $1 + 3 \leftrightarrows 2^2$, $1 + 2^3 \leftrightarrows 3^2$, ...
◮ control number of reductions (←) and expansions (→)
Point tripling operation Q = TPL(P) = P + P + P
[Figure: over time, a block of digits k_i of k goes through the recoding rules (possible rules, random choice) and yields the recoded k_i (, k_{i+1})]
DBNS is redundant ⇒ security ↗
DBNS is sparse ⇒ 20–30 % speed ↗
Ref: [6]
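A minimal software sketch of the recoding idea follows. It applies the three identities above, scaled by any common factor 2^a 3^b, to a list of unsigned (a, b) terms, and picks the rule and the direction at random. The data structure, the names `merge`/`split` and the 50/50 choice are assumptions made for the illustration. The hardware unit in [6] works on windows of k on the fly, and this sketch does not try to model that.

```python
import random

# Each rule: 1 + 2^la 3^lb = 2^ra 3^rb, stored as ((la, lb), (ra, rb)).
RULES = [
    ((1, 0), (0, 1)),   # 1 + 2   = 3
    ((0, 1), (2, 0)),   # 1 + 3   = 2^2
    ((3, 0), (0, 2)),   # 1 + 2^3 = 3^2
]

def value(terms):
    """Value of a list of unsigned terms (a, b), each worth 2^a 3^b."""
    return sum(2**a * 3**b for a, b in terms)

def recode_step(terms, rng):
    """Apply one random rule in a random direction if it matches; value is preserved."""
    terms = list(terms)
    (la, lb), (ra, rb) = rng.choice(RULES)
    if rng.random() < 0.5:
        # merge: terms t and t * 2^la 3^lb become the single term t * 2^ra 3^rb
        bases = [(a, b) for a, b in terms if (a + la, b + lb) in terms]
        if bases:
            a, b = rng.choice(bases)
            terms.remove((a, b))
            terms.remove((a + la, b + lb))
            terms.append((a + ra, b + rb))
    else:
        # split: the term t * 2^ra 3^rb becomes the two terms t and t * 2^la 3^lb
        cands = [(a, b) for a, b in terms if a >= ra and b >= rb]
        if cands:
            a, b = rng.choice(cands)
            terms.remove((a, b))
            terms += [(a - ra, b - rb), (a - ra + la, b - rb + lb)]
    return terms

rng = random.Random(2019)
terms = [(0, 5), (1, 6)]            # 3^5 + 2 * 3^6 = 1701
for _ in range(10):
    terms = recode_step(terms, rng)
print(terms, value(terms))
```

Each run with a different seed gives a different term list for the same scalar, so the sequence of DBL, TPL and ADD operations changes from one execution to the next. The sketch may produce repeated terms, which a real recoder would have to handle.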
Register File (≈ Dual Port Memory)
[Figure: register file storing field elements x[i], y[i], r[i] (size ≥ m bits) as words (w bits)]
Control signals: addresses (port A, port B), read/write, write enable
Specific addressing model for $\mathbb{F}_q$ elements through an intermediate address table with hardware loop
⇒ HW: loop x[0], x[1], ..., x[ℓ − 1]
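The addressing model can be pictured with a small Python model. Everything here is hypothetical: the class name, the allocation policy and the assumption that an element of at least m bits occupies ℓ = ⌈m/w⌉ consecutive w-bit words are choices made for the sketch, not details given in the talk.

```python
from math import ceil

class RegisterFileModel:
    """Toy model: field elements stored as l consecutive w-bit words."""

    def __init__(self, m, w, n_elements):
        self.w = w
        self.l = ceil(m / w)                  # words per element (assumed l = ceil(m/w))
        self.mem = [0] * (self.l * n_elements)
        self.table = {}                       # intermediate address table: name -> base address

    def alloc(self, name):
        """Assign the next free block of l words to a named element."""
        self.table[name] = len(self.table) * self.l

    def addresses(self, name):
        """Hardware-loop equivalent: addresses of x[0], x[1], ..., x[l-1]."""
        base = self.table[name]
        return range(base, base + self.l)

    def write(self, name, value):
        """Store an integer as little-endian w-bit words."""
        mask = (1 << self.w) - 1
        for i, addr in enumerate(self.addresses(name)):
            self.mem[addr] = (value >> (i * self.w)) & mask

    def read(self, name):
        """Rebuild the integer from its w-bit words."""
        return sum(self.mem[addr] << (i * self.w)
                   for i, addr in enumerate(self.addresses(name)))

rf = RegisterFileModel(m=233, w=32, n_elements=3)
for name in ("x", "y", "r"):
    rf.alloc(name)
rf.write("y", 2**232 + 12345)
print(list(rf.addresses("y")), rf.read("y") == 2**232 + 12345)
```

With the table in between, a program names elements rather than word addresses, and the word-by-word loop is generated by the control logic instead of by instructions.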
Developed Programming Tools
[Figure: timeline (time, now) of tool versions V0, V1, V2. V0: hardware modules, ..., configurations, CAD tools, selection, user, crypto. lib., assembler, binary, implementation. V1 build: small compiler, Python API. V2 build: hardware modules, crypto. lib., small compiler, Sage API]
PAVOIS Integrated Circuit
ECC 256 bits, GF(p) with p configurable
65 nm CMOS, 1.5 mm²
basic layout obfuscation
Cryptoprocessors for HECC
[Figure: cryptoprocessor architectures for HECC, built from data memory, program memory, (global) control, data MUX/DMUX, AddSub, Mult, OReg and CSWAP units]
Our Long Term Objectives
Study the links between:
to ensure
◮ theoretical attacks
◮ physical attacks
area       1    1 + a
delay      1    1 + t
energy     1    1 + e
security   1    ×10    ×100
a, t, e ∈ {0%, 5%, 10%, ..., 100%}
The end, questions?
Contact:
Lab-STICC, Centre Recherche UBS, Rue St Maudé
Thank you
References
[1] Improving modular inversion in RNS using the plus-minus method. In G. Bertoni and J.-S. Coron, editors, Proc. 15th International Workshop on Cryptographic Hardware and Embedded Systems (CHES), volume 8086 of LNCS, pages 233–249, Santa Barbara, CA, USA, August 2013. Springer.
[2] Single base modular multiplication for efficient hardware RNS implementations of ECC. In T. Guneysu and H. Handschuh, editors, Proc. 17th International Workshop on Cryptographic Hardware and Embedded Systems (CHES), volume 9293 of LNCS, pages 123–140, Saint-Malo, France, September 2015. Springer.
[3] Hybrid position-residues number system. In J. Hormigo, S. Oberman, and N. Revol, editors, Proc. 23rd Symposium on Computer Arithmetic (ARITH), pages 126–133, Santa Clara, CA, USA, July 2016. IEEE Computer Society.
[4] SPA resistant elliptic curve cryptosystem using addition chains.
[5] Comparison of simple power analysis attack resistant algorithms for an elliptic curve cryptosystem. Journal of Computers, 2(10):52–62, 2007.
[6] Hardware implementation of DBNS recoding for ECC processor. In Proc. 44th Asilomar Conference on Signals, Systems and Computers, pages 1129–1133, Pacific Grove, California, USA, November 2010. IEEE.
[7] On-the-fly multi-base recoding for ECC scalar multiplication without pre-computations. In A. Nannarelli, P.-M. Seidel, and P. T. P. Tang, editors, Proc. 21st Symposium on Computer Arithmetic (ARITH), pages 219–228, Austin, TX, USA, April 2013. IEEE Computer Society.
[8] Robust sub-powered asynchronous logic. In J. Becker and M. R. Adrover, editors, Proc. 24th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), pages 1–7, Palma de Mallorca, Spain, September 2014. IEEE.
[9] Asynchronous charge sharing power consistent Montgomery multiplier. In J. Sparso and E. Yahya, editors, Proc. 21st IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), pages 132–138, Mountain View, California, USA, May 2015.
[10] Handbook of Elliptic and Hyperelliptic Curve Cryptography. Discrete Mathematics and Its Applications. Chapman & Hall/CRC, July 2005.
[11] Architecture level optimizations for Kummer based HECC on FPGAs. In Arpita Patra and Nigel P. Smart, editors, Proc. 18th International Conference on Cryptology in India (IndoCrypt), volume 10698 of LNCS, pages 44–64, Chennai, India, December 2017. Springer.
[12] Hyper-threaded multiplier for HECC. In Proc. 51st Asilomar Conference on Signals, Systems and Computers, pages 447–451, Pacific Grove, CA, USA, October
[13] Generation of finely-pipelined GF(P) multipliers for flexible curve based cryptography on FPGAs. IEEE Transactions on Computers, 69(11):1612–1622, November 2019.
[14] Guide to Elliptic Curve Cryptography. Springer, 2004.
[15] Microcontroller implementation of simultaneous protections against observation and perturbation attacks for ECC. In Proc. 15th International Conference on Security and Cryptography (SECRYPT), Porto, Portugal, July 2018. Springer.
[16] Arithmetic Operators on GF(2^m) for Cryptographic Applications: Performance - Power Consumption - Security Tradeoffs. PhD thesis, University of Rennes 1 and Silesian University of Technology, December 2012.
[17] Analysis of GF(2^233) multipliers regarding elliptic curve cryptosystem applications. In 11th IFAC/IEEE International Conference on Programmable Devices and Embedded Systems (PDeS), pages 271–276, Brno, Czech Republic, May 2012.
[18] GF(2^m) finite-field multipliers with reduced activity variations. In 4th International Workshop on the Arithmetic of Finite Fields, volume 7369 of LNCS, pages 152–167, Bochum, Germany, July 2012. Springer.
[19] Fast and secure finite field multipliers. In Proc. 18th Euromicro Conference on Digital System Design (DSD), pages 653–660, Madeira, Portugal, August 2015.
[20] Full hardware implementation of short addition chains recoding for ECC scalar multiplication. In Actes Conférence d'informatique en Parallélisme, Architecture et Système (ComPAS), Lille, France, June 2015.
[21] Hardware accelerators for ECC and HECC. In 19th Workshop on Elliptic Curve Cryptography (ECC), Bordeaux, France, September 2015. Invited talk.