9/9/2012 1
Cryptographic hardware: how to make it cool, fast and secure
Junfeng Fan
KULeuven, ESAT/SCD-COSIC CHES 2012
CHES Tutorial
Crypto hardware
CHES Tutorial: Crypto hardware design 3 9/9/2012
Smart card SoC (NXP P60C080)
– Within power, area, timing budgets
– Resistant to attacks
electromagnetic leaks
I. Introduction: ASIC, FPGA, Design flow
II. Building Blocks: AES, RSA/ECC
III. Optimization: Area, Speed, Power
IV. Physical Security: Passive, Active
Chip
[Figure: chip design flow: System Specification → Architectural Design → RTL Design (Verilog, VHDL) → Synthesis → Physical Design → Physical Verification (DRC, LVS, ERC) → Fabrication → Packaging and Testing → Chip]
[Source: Andrew B. Kahng et al.]
INV
IN   OUT
0    1
1    0

NOR
IN1  IN2  OUT
0    0    1
0    1    0
1    0    0
1    1    0

NAND
IN1  IN2  OUT
0    0    1
0    1    1
1    0    1
1    1    0
[Figure: CMOS gate layouts with inputs IN1, IN2 and output OUT, showing the power (Vdd) rail, ground (GND) rail, contacts, diffusion layer, p-type and n-type transistors, metal layer and poly layer]
[Source: Andrew B. Kahng et al.]
[Figure: SRAM array architecture: row decoder (n-k address bits), column decoder (k address bits), bitline conditioning, column circuitry (2^m bits); memory cells: 2^(n-k) rows x 2^(m+k) columns, accessed through wordlines and bitlines]
[Figure: 6T cell with bit, bit_b and word lines]
[Source: Adnan Aziz]
[Figure: pipeline clocked by CLK: D_in → DFF → Combinational Logic 1 (Delay_1) → DFF → Combinational Logic 2 (Delay_2) → DFF → D_out. With unequal stage delays, the Clock Period is set by the longer delay]
[Figure: the same pipeline balanced so that Delay_2 = Delay_1, so neither stage wastes part of the Clock Period]
[Figure: unrolled, pipelined datapath from D_in to D_out (Round 1, Round 2, ..., Round 10, separated by DFFs). Latency: 10, Throughput: 1 Block/Cycle]
[Figure: iterative datapath from D_in to D_out (a single Round with a DFF, reused 10 times). Latency: 10, Throughput: 1/10 Block/Cycle]
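The timing and throughput figures on these slides can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the function names and the example stage delays are made up, and setup time and clock skew are ignored.

```python
def min_clock_period(stage_delays):
    """The clock period must cover the slowest combinational stage."""
    return max(stage_delays)


def latency_and_throughput(rounds, pipelined):
    """Return (latency in cycles, throughput in blocks per cycle).

    Both designs spend one cycle per round. The unrolled pipeline accepts a
    new block every cycle; the iterative design is busy for `rounds` cycles.
    """
    latency = rounds
    throughput = 1.0 if pipelined else 1.0 / rounds
    return latency, throughput


# Hypothetical stage delays in nanoseconds (not taken from the slides).
print(min_clock_period([3.0, 5.0]))   # unbalanced stages
print(min_clock_period([4.0, 4.0]))   # balanced stages, Delay_2 = Delay_1
print(latency_and_throughput(10, pipelined=True))
print(latency_and_throughput(10, pipelined=False))
```

Balancing the two stages lowers the maximum delay for the same total logic depth, which is why the second pipeline figure sets Delay_2 = Delay_1.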
– Limited energy
– Limited power
– Source of information leakage
0-1 transition
IN transition   OUT
0 → 0           -
0 → 1           discharge
1 → 0           charge
1 → 1           -
Optimal Pairing
[Figure: MCU connected over a BUS to a crypto coprocessor (decoder, crypto datapath, register file). Workloads: RSA1024, RSA2048, ECC160p, ECC256p, AES128, DH2048. Datapath functions: ECC, Exp, AES, …]
4-LUT: a 16x1 SRAM addressed by the inputs A, B, C, D, with output Z

AND               XOR
ABCD   Z          ABCD   Z
0000   0          0000   0
0001   0          0001   1
…                 …
1101   0          1101   1
1110   0          1110   1
1111   1          1111   0
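A 4-LUT can be modelled as a 16-entry table indexed by the four inputs. The sketch below uses hypothetical helper names (`make_lut4`, `lut4_read`) to show how the same structure implements AND or XOR purely through its stored contents.

```python
def make_lut4(func):
    """Build the 16x1 configuration memory of a 4-input LUT."""
    table = []
    for addr in range(16):
        # Address bits are ordered A (MSB), B, C, D (LSB), as in the tables above.
        a, b, c, d = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
        table.append(func(a, b, c, d) & 1)
    return table


def lut4_read(table, a, b, c, d):
    """Evaluate the LUT: the inputs ABCD form the SRAM address."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


and_lut = make_lut4(lambda a, b, c, d: a & b & c & d)
xor_lut = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)

print(and_lut)
print(xor_lut)
print(lut4_read(xor_lut, 1, 1, 0, 1))
```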
Virtex-5 SliceL
– I/O Blocks (IOBs)
– Configurable Logic Blocks (CLBs)
– Clock Management (DCMs, BUFGMUXes)
– Block SelectRAM™ resource
– Dedicated multipliers
– Programmable interconnect
– AES 128-bit (CTR)
– RSA 1024-, 2048-, 4096-bit
– ECC 160-, 192-, 256-bit, prime field

– Frequency: 200 MHz
– AES128: 20 Gbit/s
– RSA1024: 2000 signatures per second
– ECC160: 4000 signatures per second

– AES 128-bit (CTR)
– RSA 1024-bit
– ECC 160-bit

– Frequency: 5 MHz
– AES128: 1 Mbit/s
– RSA1024: 5 signatures per second
– ECC160: 10 signatures per second
– Nr = 10, 12, 14
AddRoundKey(RoundKey[0])
for i := 1 to Nr-1 do    (Nr-1 times)
    SubBytes
    ShiftRows
    MixColumns
    AddRoundKey(RoundKey[i])
SubBytes
ShiftRows
AddRoundKey(RoundKey[Nr])
[Figure: SubBytes. Each state byte a_i (a0 ... a15) is replaced by b_i = sbox(a_i)]
[Figure: ShiftRow. The state bytes a0 ... a15 are rearranged into b0 ... b15 by cyclically shifting each row]
– multiply with constant entries
\[
\begin{pmatrix} b_i \\ b_{i+1} \\ b_{i+2} \\ b_{i+3} \end{pmatrix}
=
\begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} a_i \\ a_{i+1} \\ a_{i+2} \\ a_{i+3} \end{pmatrix}
\]

2 x: (b7 b6 b5 b4 b3 b2 b1 b0) = (a6 a5 a4 a3 a2 a1 a0 0) XOR (0 0 0 a7 a7 0 a7 a7)
3 x: (b7 b6 b5 b4 b3 b2 b1 b0) = (a7 a6 a5 a4 a3 a2 a1 a0) XOR (a6 a5 a4 a3 a2 a1 a0 0) XOR (0 0 0 a7 a7 0 a7 a7)
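The two constant multiplications above reduce to a shift and a conditional XOR, which is why MixColumns is cheap in hardware. A minimal software model follows; `xtime`, `mul3` and `mix_column` are illustrative names, and the test column is a commonly used MixColumns example vector.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:          # a7 was set: fold it back in as 0 0 0 a7 a7 0 a7 a7
        a ^= 0x11B
    return a & 0xFF


def mul3(a):
    """Multiply a byte by 3: (2 x a) XOR a."""
    return xtime(a) ^ a


def mix_column(col):
    """Apply the MixColumns matrix (rows 2 3 1 1 / 1 2 3 1 / 1 1 2 3 / 3 1 1 2)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```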
[Figure: AddRoundKey. Each state byte a_i is XORed with the corresponding round-key byte k_i]
Key expansion on the words W[0], W[1], W[2], W[3], ...:
W[i] = W[i-Nk] ^ SubWord(RotWord(W[i-1])) ^ Rcon[i/Nk]    if i mod Nk = 0
W[i] = W[i-Nk] ^ W[i-1]                                   otherwise
[Figure: Sender and Receiver both use AES (ENC), operating on P_i and C_(i-1)]
[Figure: counter mode]
Sender:   Y_i = AES_ENC(Cntr_i), C_i = P_i XOR Y_i
Receiver: Y_i = AES_ENC(Cntr_i), P_i = C_i XOR Y_i
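In counter mode both sides run the block cipher in the encryption direction only, and the counter blocks are independent of each other, which is what makes the mode easy to pipeline. The sketch below illustrates the data flow. `block_enc` is a stand-in built from SHA-256 and is NOT AES; the 8-byte nonce plus 8-byte counter layout is also just an assumption.

```python
import hashlib


def block_enc(key, block):
    """Stand-in for AES (ENC): a keyed 16-byte function, NOT real AES."""
    return hashlib.sha256(key + block).digest()[:16]


def ctr_xcrypt(key, nonce, data):
    """Encrypt or decrypt: the same routine serves sender and receiver."""
    out = bytearray()
    for i in range(0, len(data), 16):
        counter_block = nonce + (i // 16).to_bytes(8, "big")   # Cntr_i
        keystream = block_enc(key, counter_block)              # Y_i
        out.extend(p ^ y for p, y in zip(data[i:i + 16], keystream))
    return bytes(out)


key = b"k" * 16
nonce = b"n" * 8
message = b"counter mode needs only the encryption direction"
ciphertext = ctr_xcrypt(key, nonce, message)
recovered = ctr_xcrypt(key, nonce, ciphertext)
print(recovered == message)
```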
Mode      Sender  Receiver  Pipelining  Notes
ECB       Enc     Dec       Yes         not used
CBC       Enc     Dec       No
CBC-MAC   Enc     Enc       No          Message authentication
CFB       Enc     Enc       No
OFB       Enc     Enc       No
CTR       Enc     Enc       Yes
OCB       Enc     Dec       Yes         Privacy & MAC
CCM       2Enc    2Enc      Only CTR    Privacy & MAC

– 20 Gbit/s / 200 MHz = 100 bits per cycle => say it encrypts one block (128 bits) per cycle
[Figure: two architecture types attached to a bus through I/O, labeled Type-1 and Type-2: several AES cores in parallel, and a single unrolled datapath (AddRK, Round 1, ..., Round 10)]
[Figure: fully unrolled datapath from Plaintext and Key to Ciphertext: Add RoundKey followed by Round 1 to Round 10. Rounds 1 to 9 contain SubBytes, ShiftRow, MixColumn, Add RoundKey and Key expansion (32b); Round 10 has no MixColumn]
– Step 1: Multiplicative inverse in GF(2^8)
– Step 2: Affine transformation

– Using lookup tables (LUT)
– On-the-fly computation
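Both options compute the same function. As a sketch, the table used by the LUT approach can be generated from the two steps above. The inversion here is a naive a^254 by repeated multiplication, chosen for clarity; it is not the composite-field circuit a hardware design would use, and the helper names are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Step 1: multiplicative inverse, computed as a^254 (0 maps to 0)."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def affine(x):
    """Step 2: b_i = x_i ^ x_(i+4) ^ x_(i+5) ^ x_(i+6) ^ x_(i+7), then XOR 0x63."""
    result = 0
    for i in range(8):
        bit = 0
        for offset in (0, 4, 5, 6, 7):
            bit ^= (x >> ((i + offset) % 8)) & 1
        result |= bit << i
    return result ^ 0x63


SBOX = [affine(gf_inv(a)) for a in range(256)]   # the LUT variant stores this table
print(hex(SBOX[0x00]), hex(SBOX[0x53]))
```

The on-the-fly variant trades these 256 stored bytes for the logic that evaluates `gf_inv` and `affine` directly.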
[Figure: S-box Area (gates) versus Delay (nsec) for gf_design without pipeline, gf_design 2-stage pipeline, gf_design 3-stage pipeline and LUT_design without pipeline]
[Figure: composite-field S-box datapath, 8 bits in and 8 bits out: Map, X^2, X^2 × e, X^-1, multipliers (×) and adders (+), Map^-1, Affine; shown without pipelining and with 2-stage and 3-stage pipelining]
[Source: Hodjat et al., 2006]
The S-box area of the two-stage and three-stage composite-field implementations is 23% and 32% smaller, respectively, than that of the LUT design at the same speed.
– (basic) encryption
  Input: Public key of the receiver: {n, e}, where n = pq; Plaintext: 0 ≤ m ≤ n-1.
  Output: Ciphertext: c = m^e mod n.
– (basic) decryption
  Input: Private key: d, where d = e^(-1) mod (p-1)(q-1); Ciphertext: c.
  Output: m = c^d mod n.
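A toy run of basic RSA with deliberately tiny primes makes the two formulas concrete. The parameters below are a textbook-sized illustration, far too small for real use, and the modular inverse via `pow(e, -1, phi)` assumes Python 3.8 or later.

```python
# Toy parameters, for illustration only.
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: d = e^(-1) mod (p-1)(q-1)

m = 65                         # plaintext, 0 <= m <= n-1
c = pow(m, e, n)               # encryption: c = m^e mod n
recovered = pow(c, d, n)       # decryption: m = c^d mod n

print(n, d, c, recovered)
```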
[Figure: modular exponentiation as a sequence of ModSqr and ModMul operations]
[Figure: elliptic-curve point addition, R = P + Q]
[Figure: elliptic-curve point doubling, R = 2Q]
[Figure: point multiplication as a sequence of PointDbl and PointAdd operations]
RSA, ECC
Modular Exponentiation, Point Multiplication
Modular Add/Sub, Modular Mul, Modular Inv

RSA1024 → MM1024
ECC160p → 160 PD + 80 PA → MM160
* Using Jacobian coordinates: PD = 4M + 4S, PA = 12M + 4S
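The operation counts can be turned into a rough number of modular multiplications. The sketch below assumes that a squaring costs the same as a multiplication and that about half the exponent or scalar bits are 1; both are simplifying assumptions, not figures stated on the slide.

```python
def ecc_point_mul_cost(bits, pd_cost=4 + 4, pa_cost=12 + 4):
    """Modular multiplications for one point multiplication (binary method)."""
    doublings = bits            # one PD per scalar bit
    additions = bits // 2       # one PA per 1-bit, about half the bits
    return doublings * pd_cost + additions * pa_cost


def rsa_exp_cost(bits):
    """Modular multiplications for one exponentiation (binary method)."""
    return bits + bits // 2     # one ModSqr per bit, one ModMul per 1-bit


print(ecc_point_mul_cost(160))   # MM160 per ECC160p point multiplication
print(rsa_exp_cost(1024))        # MM1024 per RSA1024 exponentiation
```

With these assumptions ECC160p needs 2560 MM160 and RSA1024 about 1536 MM1024, i.e. roughly 1500.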
– Frequency: 200 MHz
– RSA1024: 2000 signatures per second
– ECC160: 4000 signatures per second
64-bit Mul on a 16-bit CPU
(A3 A2 A1 A0) x (B3 B2 B1 B0) = (C7 C6 … C1 C0)
Complexity: Mul: n^2, Add: O(n^2)

1024-bit Mul on a 16-bit CPU
A = A_H ∙ 2^512 + A_L
Operand splitting: 1024 → 512, 512 → 256, 256, 256, 256 → …
Complexity: Mul: n^(log2 3), Add: O(n^2)
16-bit MUL: Schoolbook 4096 (= 64^2), Karatsuba 729 (= 3^6)
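The multiplication counts for the 1024-bit case can be checked with a small recursive model. This is a sketch under simplifying assumptions: the recursion stops at 16-bit words, and the extra carry bit of the half sums is simply absorbed by Python integers, whereas a real datapath would need a slightly wider multiplier for that product.

```python
def karatsuba(a, b, bits, word_bits, counter):
    """Multiply a and b (nominally `bits` wide), counting word multiplications."""
    if bits <= word_bits:
        counter[0] += 1                      # one word-level multiplication
        return a * b
    half = bits // 2
    mask = (1 << half) - 1
    a_h, a_l = a >> half, a & mask           # A = A_H * 2^half + A_L
    b_h, b_l = b >> half, b & mask
    t2 = karatsuba(a_h, b_h, half, word_bits, counter)
    t0 = karatsuba(a_l, b_l, half, word_bits, counter)
    t1 = karatsuba(a_h + a_l, b_h + b_l, half, word_bits, counter)
    # Three half-size products instead of four.
    return (t2 << (2 * half)) + ((t1 - t2 - t0) << half) + t0


a = (1 << 1024) - 12345
b = (1 << 1023) + 6789
counter = [0]
product = karatsuba(a, b, 1024, 16, counter)
schoolbook_muls = (1024 // 16) ** 2

print(counter[0], schoolbook_muls, product == a * b)
```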
– Step 1: q = floor(c/p)
– Step 2: r = c – qp

– Precomputed: p' = 1/p
– Step 1: q = cp'
– Step 2: r = c – qp
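The second variant is the idea behind Barrett reduction: replace the division by a multiplication with a precomputed reciprocal. In integer arithmetic the reciprocal is usually kept as a scaled value; the scaling by 2^(2k) and the final correction loop below are choices of this sketch, not details given on the slide.

```python
def barrett_setup(p):
    """Precompute a fixed-point approximation of 1/p."""
    k = p.bit_length()
    mu = (1 << (2 * k)) // p          # mu = floor(2^(2k) / p)
    return k, mu


def barrett_reduce(c, p, k, mu):
    """Return c mod p for 0 <= c < p^2, without dividing by p."""
    q = (c * mu) >> (2 * k)           # Step 1: q is floor(c/p) or slightly less
    r = c - q * p                     # Step 2
    while r >= p:                     # the estimate can be a little too small
        r -= p
    return r


p = (1 << 61) - 1
k, mu = barrett_setup(p)
a, b = 1234567890123456789 % p, 987654321098765432 % p
r = barrett_reduce(a * b, p, k, mu)
print(r == (a * b) % p)
```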
Input: a, b < p. Output: c = ab mod p. Parameter: R = 2^w > p.
Step 1: domain conversion:
    a → a' = aR mod p
    b → b' = bR mod p
Step 2: multiplication:
    c' = a'b'R^(-1) mod p = abR mod p
Step 3: domain conversion:
    c' = abR mod p → c = ab mod p

MontM(A, B) := ABR^(-1) mod p
MontM(a, R^2) = aR^2 R^(-1) mod p = aR mod p
MontM(c', 1) = abR R^(-1) mod p = ab mod p
Input: A, B < p. Output: C ≡ ABR^(-1) mod p
Parameter: R = 2^w > p, and p' = -p^(-1) mod R
Step 1: T = AB
Step 2: Q = (T mod R) p' mod R
Step 3: C = (T + Qp) div R
Step 4: C = C – p if C >= p
Return: C
Correctness: Qp mod R ≡ (Tp' mod R) p mod R ≡ Tp'p mod R ≡ (-T) mod R, thus (T + Qp) mod R = 0. Since R = 2^w, "div R" is simply a right-shift.
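The four steps translate almost line by line into code. The sketch below works on whole integers rather than on words, so it shows the arithmetic but not the word-serial scheduling a hardware design would use. The function names are illustrative, p must be odd, and `pow(p, -1, R)` assumes Python 3.8 or later.

```python
def mont_setup(p, w):
    """Return R = 2^w and p' = -p^(-1) mod R (p must be odd and smaller than R)."""
    R = 1 << w
    p_prime = (-pow(p, -1, R)) % R
    return R, p_prime


def mont_mul(A, B, p, w, p_prime):
    """Return A * B * R^(-1) mod p for A, B < p."""
    r_mask = (1 << w) - 1
    T = A * B                                # Step 1
    Q = ((T & r_mask) * p_prime) & r_mask    # Step 2: "mod R" is a bit mask
    C = (T + Q * p) >> w                     # Step 3: "div R" is a right-shift
    if C >= p:                               # Step 4
        C -= p
    return C


p, w = (1 << 127) - 1, 128
R, p_prime = mont_setup(p, w)
R2 = (R * R) % p

a, b = 0x123456789ABCDEF0123456789ABCDEF % p, 0xFEDCBA9876543210FEDCBA987654321 % p
a_m = mont_mul(a, R2, p, w, p_prime)         # a' = aR mod p
b_m = mont_mul(b, R2, p, w, p_prime)         # b' = bR mod p
c_m = mont_mul(a_m, b_m, p, w, p_prime)      # c' = abR mod p
c = mont_mul(c_m, 1, p, w, p_prime)          # c  = ab mod p

print(c == (a * b) % p)
```

The domain conversions cost extra multiplications, so the method pays off when many multiplications are chained, as in an exponentiation.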
Parameter: p, R, where R = 2^w > p, and p' = -p^(-1) mod R
Input: A, B < p. Output: C ≡ ABR^(-1) mod p
Step 1: T = AB
Step 2: Q = Tp' mod R
Step 3: C = (T + Qp) div R
Step 4: C = C – p if C >= p
Return: C
[Figure: the multiplications in each step. Step 1: A x B gives T. Step 2: (T mod R) x p' gives Q. Step 3: Q x p is added to T to give C]
Question: If A and B have n words each, how many multiplications are needed?
Parameter: p = 2^k - c
Input: A, B < p. Output: r ≡ AB mod p
Step 1: T = AB
Step 2: Repeat: T = T_L + T_H c until T < p,
        where T_H = T div 2^k and T_L = T mod 2^k.
// T = T_H 2^k + T_L = T_H (2^k – c) + T_L + T_H c
//   = T_H p + T_L + T_H c
Return: T
[Figure: A x B gives T, which is split into T_H and T_L; T_H x c is added to T_L]
Question: How many iterations are needed?
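One way to explore the question is to count the folding steps in software. The sketch below folds while T has more than k bits and then adds a final conditional subtraction for the case 2^k > T >= p; that last step is an assumption of this sketch and is not spelled out on the slide. The example uses p192 = 2^192 - 2^64 - 1 written as 2^k - c with c = 2^64 + 1.

```python
def pseudo_mersenne_mulmod(A, B, k, c):
    """Return (A*B mod p, number of folds) for p = 2^k - c."""
    p = (1 << k) - c
    mask = (1 << k) - 1
    T = A * B                              # Step 1
    folds = 0
    while T >> k:                          # Step 2: T = T_L + T_H * c
        T = (T & mask) + (T >> k) * c
        folds += 1
    while T >= p:                          # final correction (sketch assumption)
        T -= p
    return T, folds


k, c = 192, (1 << 64) + 1
p = (1 << k) - c
A, B = p - 1, p - 2
r, folds = pseudo_mersenne_mulmod(A, B, k, c)
print(r == (A * B) % p, folds)
```

For this c (about 65 bits) each fold shortens T by roughly k minus 65 bits, so a handful of folds is enough; a smaller c needs fewer.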
[Image source: http://www.clipartmojo.com]
I am trying RSA1024 on my calculator…… gosh my finger hurts!

Algorithm                              Complexity
Integer multiplication, Schoolbook     n^2
Integer multiplication, Karatsuba      n^(log2 3)
Barrett modular multiplication         ≈ 2n^2 + n
Montgomery modular multiplication      2n^2 + n

s: operand size [bits]
w: digit size
n: number of digits [n = s/w]
Note: Other variants of the Montgomery and Barrett reduction algorithms may have different complexity.
– Montgomery method (Schoolbook): 2n^2 + n = 2080.
– We need (at least) 2080/66 multipliers (32-bit unsigned).
– Cycles for data loading, addition, and so on.

– Montgomery method (Schoolbook): 2n^2 + n = 55.
– We need (at least) 55/19 multipliers (32-bit unsigned).
– Cycles for data loading, addition, and so on.
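The multiplier counts follow from dividing the word multiplications per modular multiplication by the available cycle budget. In the sketch below the budgets of 66 and 19 cycles are the divisors used on the slides; rounding up to a whole number of multipliers is an assumption of this sketch.

```python
import math


def montgomery_word_muls(operand_bits, word_bits=32):
    """Word multiplications for one schoolbook Montgomery multiplication."""
    n = operand_bits // word_bits       # number of digits, n = s/w
    return 2 * n * n + n


def multipliers_needed(operand_bits, cycle_budget, word_bits=32):
    """Parallel multipliers needed to finish within the cycle budget."""
    return math.ceil(montgomery_word_muls(operand_bits, word_bits) / cycle_budget)


print(montgomery_word_muls(1024), multipliers_needed(1024, 66))   # RSA1024
print(montgomery_word_muls(160), multipliers_needed(160, 19))     # ECC160
```

This is a lower bound only, since data loading and additions also take cycles.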
[Figure: 1024x32 multiplier (Mul: 1024x32): 32 parallel multipliers MUL0 ... MUL31 take the words A0 ... A31 and the digit Bi, and feed a carry-save adder]
[Source: Mentens et al., GLSVLSI 2007]
[Figure: multi-core architecture with core 0, core 1, ..., core 31, each containing a multiplier, FSM, RAM and I/O]
[Source: Tenca and Koç, IEEE ToC, 2003]
[Figure: Karatsuba multiplier. MUL32a multiplies A_MSB by B_MSB (T2), MUL32b multiplies A_LSB by B_LSB (T0), and the 33-bit MUL33c multiplies the sums of the halves (T1)]
T = (T2 << 64) + ((T1 - T2 - T0) << 32) + T0
– Frequency: 5 MHz
– Area: < 60 kgates
– AES128: 1 Mbit/s
– RSA1024: 5 signatures per second
– ECC160: 10 signatures per second

– AES: 5M / (1M/128) = 640 cycles per block (≈ 64 cycles per round)
– RSA: 5M / (5 * 1500) = 699 cycles per MM1024
– ECC: 5M / (10 * 2560) = 204 cycles per MM160
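The cycle budgets come from dividing the clock frequency by the number of basic operations needed per second. The sketch below evaluates the three expressions exactly as written; `cycles_per_op` is an illustrative helper name.

```python
def cycles_per_op(freq_hz, ops_per_second):
    """Clock cycles available for each basic operation."""
    return freq_hz / ops_per_second


freq = 5e6                                       # 5 MHz
aes_cycles = cycles_per_op(freq, 1e6 / 128)      # blocks per second at 1 Mbit/s
rsa_cycles = cycles_per_op(freq, 5 * 1500)       # MM1024 per second
ecc_cycles = cycles_per_op(freq, 10 * 2560)      # MM160 per second

print(round(aes_cycles), round(rsa_cycles), round(ecc_cycles))
```

Evaluated this way, the RSA and ECC expressions come to about 667 and 195 cycles, a little under the 699 and 204 quoted above; the slide may rest on slightly different operation counts, so the budgets are best read as approximate.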
– How many S-boxes?
– LUT vs. finite-field computation

[Figure: compact AES architecture: State (4x4 Bytes), Key (4x4 Bytes), SubByte, MixColumn, KE, FSM, I/O]
[Source: Moradi et al., CHES 2011]
– Flip-flops: (128+128)*6 = 1536 gates
– SBox + MixColumn = 800 ‡
– FSM + KeyAdd = 300 (roughly)
Author          CMOS [um]   Area [GE]   Freq. [MHz]   Throughput [Mbps]
Hwang et al.    0.18        19300       330           3840
Mangard         0.6         7000        50            98
Satoh           0.11        5400        130           331
Feldhofer       0.35        3400        80            10
Moradi et al.   0.18        2400        0.1           0.057
[Figure: public-key coprocessor on a BUS: FSM (PA_PD, Exp, Inv), ALU, Register File, Runtime Memory, …]
– 699 cycles for MM1024
– 204 cycles for MM160

– Multiplier: 32-bit? 64-bit?
– Memory: single-port? dual-port?

– Storage: 1024*8 = 8 kbit
– ALU: ?
– FSM: ?
Should be enough?
– Consider C = ∑ A_i B_j
– Programmable – Fast to access
– Reduce memory access
[Figure: coprocessor attached to the CPU over a BUS: Decoder, 64-bit MAC, Register File (8x64), InsRom]
– Faster modular +,-,x, /
– Chinese Remainder Theorem (CRT)
– m-ary method

– m-ary method
– x-coordinate only point multiplication
– multiple point multiplication
RSA, ECC
Modular Exponentiation, Point Multiplication
Modular Add/Sub, Modular Mul, Modular Inv
– ECC coprocessor (k163)
[Figure: ECC coprocessor attached to the CPU over a BUS: FSM, digit-serial multiplier, Reg File]
[Figure: plot of Area [kGE], Cycles [x10^4], Freq [x10kHz], Power [uW] and Energy [uJ]]
[Source: Lee et al., IEEE ToC, 2008]
– p192 = 2^192 - 2^64 - 1
– p224 = 2^224 - 2^96 + 1
– p256 = 2^256 - 2^224 + 2^192 + 2^96 - 1
– p384 = 2^384 - 2^128 - 2^96 + 2^32 - 1
– p521 = 2^521 - 1
– Reducing VDD
– Relax critical path (pipelining)

– Minimum device sizes
– Compact and custom layout

– Clock gating
– Eliminate glitches
[Figure: power versus time for Design-I and Design-II]
Side-channel analysis Fault analysis
0-1 transition
IN transition   OUT
0 → 0           -
0 → 1           discharge
1 → 0           charge
1 → 1           -
– Symmetric:
– Asymmetric:
Left-to-right binary method for point multiplication
k = (k_(l-1), k_(l-2), ..., k_0)
R ← O
for i = l-1 downto 0 do
    R ← [2]R
    if k_i = 1 then
        R ← R + P
    end if
end for
[Source: B. Gierlichs]
– Usually a few hundred for software implementations
– Usually a few thousand for hardware implementations
– Can go up to several 100k if implementation is protected
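[Figure: differential side-channel analysis. The device processes the input with the real key and produces the real output and a real side-channel. The attacker feeds the same input and a key hypothesis into a model of the side-channel to obtain a hypothetical output. Statistical analysis compares the two: hypothesis correct?]
[Source: B. Gierlichs]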
– Fixed operation time
– Fixed operation pattern
– Data randomization

– Differential logic
– Masking
Left-to-right binary method
Input: n, m and e. Output: c = m^e mod n.
1. Let e = [e_t, e_(t-1), …, e_1, e_0]_2;
2. c := 1;
3. For i := t downto 0 do
4.     c := c^2 mod n;
5.     if e_i == 1 then
6.         c := cm mod n;
Return c.
What is required to secure it?

Montgomery Powering Ladder
Input: n, m and e. Output: c = m^e mod n.
1. Let e = [1, e_(t-1), …, e_1, e_0]_2;
2. R[0] := m; R[1] := m^2 mod n;
3. For i := t-1 downto 0 do
4.     R[1-e_i] := R[0]R[1] mod n;
5.     R[e_i] := R[e_i]R[e_i] mod n;
Return R[0].
What is required to secure it?
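Both algorithms can be written directly from the pseudocode. The sketch below is a functional model only: Python integers and list indexing are not constant-time, so it shows the operation pattern (a conditional multiplication in the first, one multiplication and one squaring per bit in the second) rather than a side-channel-safe implementation.

```python
def binary_exp(m, e, n):
    """Left-to-right binary method: the multiplication happens only for 1-bits."""
    c = 1
    for i in reversed(range(e.bit_length())):
        c = (c * c) % n
        if (e >> i) & 1:
            c = (c * m) % n
    return c


def montgomery_ladder(m, e, n):
    """Montgomery powering ladder: one multiply and one square for every bit.

    Assumes e >= 1, so the most significant bit of e is 1.
    """
    R = [m % n, (m * m) % n]
    for i in reversed(range(e.bit_length() - 1)):
        bit = (e >> i) & 1
        R[1 - bit] = (R[0] * R[1]) % n
        R[bit] = (R[bit] * R[bit]) % n
    return R[0]


n, m, e = 3233, 65, 2753
print(binary_exp(m, e, n), montgomery_ladder(m, e, n))
```

The ladder keeps the invariant R[1] = R[0] * m mod n, which is why the final R[0] equals m^e mod n.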
Randomized exponentiation
Input: n, m and e. Output: c = m^e mod n.
1. r = Random(); // r < n
2. m_s := rm;
3. v := m_s^e mod n;
4. u := r^e mod n;
5. c := v/u mod n;
Return c.
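A functional model of the randomized exponentiation: the base is multiplied by a random r before the exponentiation and the factor r^e is divided out afterwards. The sketch assumes r must be invertible modulo n and retries otherwise; that retry loop and the use of `secrets` are choices of this example, and `pow(u, -1, n)` assumes Python 3.8 or later.

```python
import math
import secrets


def randomized_exp(m, e, n):
    """Compute m^e mod n via a randomly blinded base."""
    while True:
        r = secrets.randbelow(n - 2) + 2       # random r with 2 <= r < n
        if math.gcd(r, n) == 1:                # r must be invertible mod n
            break
    m_s = (r * m) % n                          # blinded message
    v = pow(m_s, e, n)                         # v = (rm)^e mod n
    u = pow(r, e, n)                           # u = r^e mod n
    return (v * pow(u, -1, n)) % n             # c = v/u mod n


n, m, e = 3233, 65, 17
c = randomized_exp(m, e, n)
print(c == pow(m, e, n))
```

The price is a second exponentiation and a modular inversion per call.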
Randomized scalar
Input: k, P. Output: Q = kP.
1. r = Random(); // r < order(P)
2. k' := k + r * order(P);
3. Q = k'P; // [order(P)]P = O.
Return Q.

Base point blinding
Input: k, P. Output: Q = kP. Precomputed: R, S = kR.
1. T := P + R;
2. Q' = kT;
3. Q = Q' – S;
4. r = Random(); // r < 2^8
5. R = rR, S = rS; // update R, S
Return Q.
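The randomized-scalar idea can be checked on a toy curve. Everything specific below is an assumption made for illustration: the curve y^2 = x^3 + 2x + 3 over GF(97), the point (3, 6), affine formulas, and finding the order of P by brute force. None of it comes from the slides, and the parameters are far too small for real use.

```python
import secrets

P_MOD, A = 97, 2          # toy curve y^2 = x^3 + 2x + 3 over GF(97)
O = None                  # point at infinity


def add(P, Q):
    """Affine point addition and doubling."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O                                       # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def mul(k, P):
    """Left-to-right binary point multiplication."""
    R = O
    for i in reversed(range(k.bit_length())):
        R = add(R, R)
        if (k >> i) & 1:
            R = add(R, P)
    return R


def order(P):
    """Smallest n > 0 with nP = O, by brute force (toy sizes only)."""
    n, R = 1, P
    while R is not None:
        R = add(R, P)
        n += 1
    return n


def randomized_scalar_mul(k, P):
    """Q = k'P with k' = k + r * order(P), so Q still equals kP."""
    n = order(P)
    r = secrets.randbelow(n)
    return mul(k + r * n, P)


G = (3, 6)
print(randomized_scalar_mul(5, G) == mul(5, G))
```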
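[Table: static differential logic, 0-1 transition. The gate has complementary inputs (IN, IN-bar) and complementary outputs (OUT, OUT-bar). On an input transition 0 → 1 one output discharges while the other charges, and on 1 → 0 the roles are swapped]
[courtesy I. Verbauwhede]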
[Table: dynamic logic with a Pr(echarge) phase, an Ev(aluation) phase and a pull-down network (PDN). The outputs are charged during precharge; during evaluation one output discharges for either value of IN]
[courtesy I. Verbauwhede]
[courtesy I. Verbauwhede]
– WDDL
– MCML
– MDPL

– SABL
– DyCML
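[Figure: WDDL AND and OR gates with complementary inputs (A, A-bar, B, B-bar), complementary outputs (Z, Z-bar) and a precharge (prch) signal]
[courtesy I. Verbauwhede]
– input 0 → output 0
– no precharge operator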
[Figure: encryption module built from WDDL AND and OR gates. Registers clocked by clk precharge the module inputs, so the module alternates between eval. and prch. phases]
[Tiri, DATE2004]
OAI221X2:
[Figure: differential compound gates built from standard cells. Inputs A, B: AOI22X1 and OAI22X1, each followed by INVX4, produce Y and Y-bar. Inputs A0, A1, B0, B1, C0: OAI221X1 and AOI221X1, each followed by INVX2, produce Y and Y-bar]
[courtesy I. Verbauwhede]
[Figures: courtesy I. Verbauwhede]
\[ C_A = C_{A'} \]
\[ C_{o,A} + C_{w,A} + C_{i,I_1} + \dots + C_{i,I_k} = C_{o,A'} + C_{w,A'} + C_{i,I_1'} + \dots + C_{i,I_k'} \]
\[ C_{w,A} = C_{w,A'} \]
C_o: intrinsic output capacitance
C_w: interconnect capacitance
C_i: input capacitance
[Figure: a gate driving gate 1 and gate 2 over two complementary wires, modelled by R_w,A, C_w,A and R_w,A', C_w,A']
[courtesy I. Verbauwhede]
[Figure: insecure single-ended routing versus secure WDDL differential routing]
[Hwang, JSSC06]
[Figure: masked AND gate (mAND). Inputs: a XOR m_a, b XOR m_b, m_a, m_b, m_AND. Output: (a.b) XOR m_AND]
[Trichina, 2004]
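A functional model shows how a masked AND can produce (a.b) XOR m_AND without ever combining the unmasked a and b. The expression below is one common way of writing such a gate and is only a logic-level sketch: in real hardware the order of evaluation and glitches matter for security, which this model does not capture.

```python
from itertools import product


def masked_and(a_m, b_m, m_a, m_b, m_and):
    """Return (a AND b) XOR m_and, given a_m = a^m_a and b_m = b^m_b."""
    t = m_and ^ (m_a & m_b)     # start from the fresh output mask
    t ^= m_a & b_m
    t ^= m_b & a_m
    t ^= a_m & b_m
    return t


def check_all():
    """Exhaustively verify the gate over all data and mask values."""
    for a, b, m_a, m_b, m_and in product((0, 1), repeat=5):
        if masked_and(a ^ m_a, b ^ m_b, m_a, m_b, m_and) != (a & b) ^ m_and:
            return False
    return True


print(check_all())
```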
– Vcc
– Glitch
– Clock
– Temperature
– UV
– Light
– X-Rays
[Source: H. Handschuh]
Double-and-add-always method
k = (k_(l-1), k_(l-2), ..., k_0)
R0 ← O
for i = l-1 downto 0 do
    R0 ← [2]R0
    R1 ← R0 + P
    if k_i = 1 then
        R0 ← R1
    else
        R0 ← R0
    end if
end for
[Figure: datapath with IN, OPA, OP0, OP1, OPB and OUT, shown for k_i = 0 and k_i = 1]
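The point of this method is that the sequence of operations no longer depends on the scalar bits. The sketch below records that sequence. To stay short it uses the integers modulo 101 under addition as a stand-in group (so kP is just k*P mod 101); that substitution is an assumption for illustration, not an elliptic curve.

```python
def double_and_add_always(k, P, add, identity, bits):
    """Return (kP, operation trace) for a scalar of fixed bit length."""
    trace = []
    R0 = identity
    for i in reversed(range(bits)):
        R0 = add(R0, R0)                 # doubling, always performed
        trace.append("D")
        R1 = add(R0, P)                  # addition, always performed
        trace.append("A")
        R0 = R1 if (k >> i) & 1 else R0  # keep or discard the addition result
    return R0, "".join(trace)


def add_mod_101(x, y):
    """Stand-in group operation: addition of integers modulo 101."""
    return (x + y) % 101


q1, trace1 = double_and_add_always(0b1011, 7, add_mod_101, 0, bits=4)
q2, trace2 = double_and_add_always(0b1000, 7, add_mod_101, 0, bits=4)
print(q1, q2, trace1, trace2)
```

The dummy addition that hides the key bit is also what a fault attack can target, since a fault in a discarded addition leaves the result unchanged.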
– Input/output validity
– Intermediate check

– Duplicated data-path
– Multiple executions

– Circuit shields
– Sensors (temperature, frequency, voltage, etc.)
– Larger area
– Longer operation time

– Security
– Power
– Area
– Attacks
– Performance
Server Client
root-of-trust
Protocol/Algorithm-level validation
Noncritical software
Matching & Crypto SW
Architecture-level validation
Architecture-level attacks
Matching & Crypto HW
Software driver
Microarchitecture-level validation
Microarchitecture-level attacks DPA-resistant HW Circuit-level attacks
[Source: P. Schaumont]
[Figure: secure design flow: Design Spec → Architecture Design → RTL Design → Logic Synthesis → Netlist → Security Evaluation Phase I → Physical Design → Layout → Security Evaluation Phase II. Secure Algorithms and Secure Logic feed into the flow; the evaluations cover SCA and FA]

– Architecture design
– Low-power design methods
– Physical security

– A better design flow for crypto hardware
– Provable physical security?