CHES Tutorial: Cryptographic hardware: how to make it cool, fast and secure



slide-1
SLIDE 1

9/9/2012 1

Cryptographic hardware: how to make it cool, fast and secure

Junfeng Fan

KULeuven, ESAT/SCD-COSIC CHES 2012

CHES Tutorial

Crypto hardware

CHES Tutorial: Crypto hardware design 3 9/9/2012

slide-2
SLIDE 2

9/9/2012 2

Smart card SoC (NXP P60C080)

9/9/2012 CHES Tutorial: Crypto hardware design 4

Smartphone SoC (Texas Instruments OMAP4470)

9/9/2012 CHES Tutorial: Crypto hardware design 5

slide-3
SLIDE 3

9/9/2012 3

Design target

9/9/2012 CHES Tutorial: Crypto hardware design 6

  • Efficient, lightweight implementation

– Within power, area, timing budgets

  • Public key: 1024-bit RSA on an 8-bit µC
  • Public key on a passive RFID tag
  • Trustworthy implementation

– Resistant to attacks

  • Active attacks: probing, power glitches, JTAG scan chain
  • Passive attacks: side-channel attacks, including power, timing and electromagnetic leaks

Outline

9/9/2012 CHES Tutorial: Crypto hardware design 7

  • I. Introduction
  • II. Building Blocks
  • III. Optimization
  • IV. Physical Security
  • V. Summary

I. Introduction

ASIC FPGA Design flow

II. Building Blocks

AES RSA/ECC

III. Optimization

Area Speed Power

IV. Physical Security

Passive Active

slide-4
SLIDE 4

9/9/2012 4

Part I: Introduction to hardware design

CHES Tutorial: Crypto hardware design 8

  • ASIC
  • FPGA
  • Design Flow


Chip

ASIC Design Flow

9/9/2012 CHES Tutorial: Crypto hardware design 9

Verilog VHDL

DRC LVS ERC

Synthesis RTL Design Physical Design Fabrication System Specification Architectural Design Packaging and Testing

[Source: Andrew B. Kahng et al.]

Chip Physical Verification

slide-5
SLIDE 5

9/9/2012 5

Standard Cells

  • Common Logic Gates

INV
IN OUT
0  1
1  0

NOR
IN1 IN2 OUT
0   0   1
0   1   0
1   0   0
1   1   0

NAND
IN1 IN2 OUT
0   0   1
0   1   1
1   0   1
1   1   0

9/9/2012 CHES Tutorial: Crypto hardware design 11 11

Power (Vdd)-Rail Ground (GND)-Rail

Contact

Vdd GND OUT IN2 IN1 OUT IN2 IN1 OUT IN1 Vdd GND IN2

Diffusion layer p-type transistor n-type transistor Metal layer Poly layer

NAND

[source: Andrew B. Kahng et al.]

slide-6
SLIDE 6

9/9/2012 6

SRAM

9/9/2012 CHES Tutorial: Crypto hardware design 12

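[Figure: SRAM array organization. Row decoder ($n-k$ address bits), column decoder ($k$ address bits), $n$ address bits in total; $2^m$ bits per word; column circuitry; bitline conditioning; memory cells: $2^{n-k}$ rows × $2^{m+k}$ columns; bitlines; wordlines]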

bit bit_b word

[Source: Adnan Aziz]

6T Cell


Critical Path Delay


Combinational Logic 1 Combinational Logic 2 D_in D_out CLK CLK Delay_1 Delay_2 Clock Period DFF DFF DFF

slide-7
SLIDE 7

9/9/2012 7

Register balancing


CLK Delay_1 Delay_2 = Delay_1 Clock Period Combinational Logic 1 Combinational Logic 2 D_in D_out CLK DFF DFF DFF

Latency vs. Throughput

9/9/2012 CHES Tutorial: Crypto hardware design 15

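[Figure: two datapaths from D_in to D_out, clocked by CLK, with DFF registers between stages]

Unrolled and pipelined (Round 1, Round 2, …, Round 10): Latency: 10 cycles, Throughput: 1 Block/Cycle

Iterative (a single Round block, reused): Latency: 10 cycles, Throughput: 1/10 Block/Cycle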

slide-8
SLIDE 8

9/9/2012 8

Power and energy

  • Why is it important?

– Limited energy – Limited power

  • Extremely important for crypto devices.

– Source of information leakage

CMOS dynamic power

9/9/2012 CHES Tutorial: Crypto hardware design 17

0-1 transition


IN OUT 0 0 0 1 discharge 1 0 charge 1 1

slide-9
SLIDE 9

9/9/2012 9

Optimal Pairing

HW/SW codesign

  • Support multiple algorithms and protocols

CHES Tutorial: Crypto hardware design 18 9/9/2012

RSA1024 RSA2048 ECC160p ECC256p AES128 DH2048 MCU

Decoder Crypto Datapath Register File

BUS

ECC Exp AES …


FPGA

9/9/2012 CHES Tutorial: Crypto hardware design 19

4-LUT: the inputs A, B, C, D form the address of a 16x1 SRAM; the addressed bit is the output Z.

AND (Z = A AND B AND C AND D)
ABCD Z
0000 0
0001 0
…
1101 0
1110 0
1111 1

XOR (Z = A XOR B XOR C XOR D)
ABCD Z
0000 0
0001 1
…
1101 1
1110 1
1111 0
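The 4-LUT idea can be sketched in a few lines of Python: the truth table is stored once (the SRAM contents) and evaluation is a single lookup. The helper names `make_lut4` and `lut4_eval` are illustrative assumptions, not part of any FPGA tool.

```python
def make_lut4(func):
    """Build the 16-entry truth table (the LUT's SRAM contents) for func(a, b, c, d)."""
    table = []
    for addr in range(16):
        # Address bits are ordered ABCD, with A as the most significant bit.
        a, b, c, d = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
        table.append(func(a, b, c, d) & 1)
    return table


def lut4_eval(table, a, b, c, d):
    """Evaluate the LUT: the inputs ABCD form the SRAM address."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# The same hardware implements any 4-input function; only the table changes.
AND4 = make_lut4(lambda a, b, c, d: a & b & c & d)
XOR4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Both tables have the same cost in the FPGA fabric, which is why gate-count intuition from ASIC design does not carry over directly.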

slide-10
SLIDE 10

9/9/2012 10

FPGA

9/9/2012 CHES Tutorial: Crypto hardware design 20

Virtex-5 SliceL


Virtex-II architecture


I/O Blocks (IOBs) Configurable Logic Blocks (CLBs) Clock Management (DCMs, BUFGMUXes) Block SelectRAM™ resource Dedicated multipliers Programmable interconnect

slide-11
SLIDE 11

9/9/2012 11

Part II: Building blocks

CHES Tutorial: Crypto hardware design 22

  • AES Core
  • ECC/RSA Core


A simplified bank system

slide-12
SLIDE 12

9/9/2012 12

Server-side specification

  • Platform: Xilinx Virtex-5 FPGA
  • Function

– AES 128-bit (CTR)
– RSA 1024-, 2048-, 4096-bit
– ECC 160-, 192-, 256-bit, prime field

  • Performance

– Frequency: 200 MHz
– AES128: 20 Gbit/s
– RSA1024: 2000 signatures per second
– ECC160: 4000 signatures per second


Card-side specification

  • Platform: 130nm ASIC
  • Function

– AES 128-bit (CTR)
– RSA 1024-bit
– ECC 160-bit

  • Performance

– Frequency: 5 MHz
– AES128: 1 Mbit/s
– RSA1024: 5 signatures per second
– ECC160: 10 signatures per second

9/9/2012 CHES Tutorial: Crypto hardware design 25

  • Area: < 60k GE
  • Power: < 1mW
slide-13
SLIDE 13

9/9/2012 13

Well…


AES - Algorithm

CHES Tutorial: Crypto hardware design 27 9/9/2012

  • International Standard
  • 128/192/256-bit

– Nr = 10, 12, 14

  • Separate key expansion
  • Different Enc/Dec

[Flowchart]
AddRoundKey (RoundKey[0]); i := 1
Repeat Nr-1 times: SubBytes, ShiftRows, MixColumns, AddRoundKey (RoundKey[i]); i++
Final round: SubBytes, ShiftRows, AddRoundKey (RoundKey[Nr])

slide-14
SLIDE 14

9/9/2012 14

AES – SubBytes

  • Byte substitution: each byte is substituted individually
  • 16 identical Sboxes

9/9/2012 CHES Tutorial: Crypto hardware design 28

sbox

a2 a6 a10 a14 a0 a4 a8 a12 a1 a5 a9 a13 a3 a7 a11 a15 b0 b4 b8 b12 b1 b5 b9 b13 b2 b6 b10 b14 b3 b7 b11 b15 ai bi


AES - ShiftRow

  • ShiftRow: circularly rotate each row of state array

9/9/2012 CHES Tutorial: Crypto hardware design 29

a2 a6 a10 a14 a0 a4 a8 a12 a1 a5 a9 a13 a3 a7 a11 a15 b0 b4 b8 b12 b1 b5 b9 b13 b2 b6 b10 b14 b3 b7 b11 b15

ShiftRow

slide-15
SLIDE 15

9/9/2012 15

AES - MixColumn

  • matrix multiplication of state array columns

– multiply with constant entries

9/9/2012 CHES Tutorial: Crypto hardware design 30

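[Figure: each column $(a_i, a_{i+1}, a_{i+2}, a_{i+3})$ of the state array is mapped to a column $(b_i, b_{i+1}, b_{i+2}, b_{i+3})$]

$$\begin{pmatrix} b_i \\ b_{i+1} \\ b_{i+2} \\ b_{i+3} \end{pmatrix} = \begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} a_i \\ a_{i+1} \\ a_{i+2} \\ a_{i+3} \end{pmatrix}$$

2 × a: $(b_7 \dots b_0) = (a_6\, a_5\, a_4\, a_3\, a_2\, a_1\, a_0\, 0) \oplus (0\, 0\, 0\, a_7\, a_7\, 0\, a_7\, a_7)$

3 × a: $(b_7 \dots b_0) = (a_7\, a_6\, a_5\, a_4\, a_3\, a_2\, a_1\, a_0) \oplus (a_6\, a_5\, a_4\, a_3\, a_2\, a_1\, a_0\, 0) \oplus (0\, 0\, 0\, a_7\, a_7\, 0\, a_7\, a_7)$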
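The constant multiplications by 2 and 3 reduce to a shift and a few XORs, as a minimal Python sketch shows. The function names `xtime`, `mul3` and `mix_column` are illustrative choices; the reduction constant 0x1B comes from the AES field polynomial.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8): shift left, XOR 0x1B if bit a7 was set."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B  # clears bit 8 and adds the reduction pattern 0001 1011
    return a & 0xFF


def mul3(a):
    """Multiply a byte by 3: 3*a = 2*a XOR a."""
    return xtime(a) ^ a


def mix_column(col):
    """Apply the MixColumns matrix (rows 2 3 1 1 / 1 2 3 1 / 1 1 2 3 / 3 1 1 2) to one column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]
```

In hardware the same structure is pure wiring plus XOR gates, which is why MixColumns is cheap compared with SubBytes.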


AES - AddRoundKey

  • Add round key

9/9/2012 CHES Tutorial: Crypto hardware design 31

a2 a6 a10 a14 a0 a4 a8 a12 a1 a5 a9 a13 a3 a7 a11 a15 k2 k6 k10 k14 k0 k4 k8 k12 k1 k5 k9 k13 k3 k7 k11 k15 a2 a6 a10 a14 a0 a4 a8 a12 a1 a5 a9 a13 a3 a7 a11 a15

+ =>

slide-16
SLIDE 16

9/9/2012 16

Key expansion

  • Input 128-bit: W[0], W[1], W[2], W[3]

9/9/2012 CHES Tutorial: Crypto hardware design 32 W[0] W[1] W[2] W[3]

W[i] = W[i-Nk] ^ W[i-1]
W[i] = W[i-Nk] ^ SubWord(RotWord(W[i-1])) ^ Rcon[i/Nk]   (when i is a multiple of Nk)

Round keys: 1 2 3 4 … 9 10


CBC-MAC

9/9/2012 CHES Tutorial: Crypto hardware design 33

  • Cipher block chaining – Message Authentication Code
  • Initialization Vector: IV = $C_{-1}$
  • Feedback inhibits pipelining

AES (ENC) Pi Ci-1 AES (ENC) Ci-1 Pi

Sender Receiver

slide-17
SLIDE 17

9/9/2012 17

Counter mode

  • Converts block cipher into stream cipher
  • no feedback: pipelining is possible
  • crucial to choose non-repeating counter functions, e.g. LFSR
  • crucial to choose counter IV’s that are UNIQUE

9/9/2012 CHES Tutorial: Crypto hardware design 34

[Figure: Sender: $Y_i$ = AES(ENC)($Cntr_i$), $C_i = P_i \oplus Y_i$. Receiver: $Y_i$ = AES(ENC)($Cntr_i$), $P_i = C_i \oplus Y_i$.]


Different operation modes

9/9/2012 CHES Tutorial: Crypto hardware design 35

Mode     Sender  Receiver  Pipelining  Notes
ECB      Enc     Dec       Yes         not used
CBC      Enc     Dec       No
CBC-MAC  Enc     Enc       No          Message authentication
CFB      Enc     Enc       No
OFB      Enc     Enc       No
CTR      Enc     Enc       Yes
OCB      Enc     Dec       Yes         Privacy & MAC
CCM      2Enc    2Enc      Only CTR    Privacy & MAC

slide-18
SLIDE 18

9/9/2012 18

Architecture design

  • Constraints

– 20 Gbit/s / 200 MHz = 100 bits per cycle => Let's say it encrypts one block (128 bits) per cycle

9/9/2012 CHES Tutorial: Crypto hardware design 36

AES AES AES

bus I/O

Round 1

I/O

Round 10

AddRK

bus Type-1 Type-2


High level architecture

9/9/2012 CHES Tutorial: Crypto hardware design 37

Add RoundKey Round 1 Round 2 Round 3 Round 4 Round 5 Round 6 Round 7 Round 8 Round 9 Round 10

Key expansion MixColumn SubBytes ShiftRow Add RoundKey

32b

Key expansion SubBytes ShiftRow Add RoundKey

Key Plaintext Ciphertext

slide-19
SLIDE 19

9/9/2012 19

Add pipeline registers

9/9/2012 CHES Tutorial: Crypto hardware design 38

Add RoundKey Round 1 Round 2 Round 9 Round 10

32b Key Plaintext Ciphertext

Key expansion MixColumn SubBytes ShiftRow Add RoundKey


SubBytes

  • The most complex block

– Step 1: Multiplicative inverse in $GF(2^8)$
– Step 2: Affine transformation

  • Design choice

– Using a lookup table (LUT)
– On-the-fly computation
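The two steps can be sketched in Python as a behavioural model of the on-the-fly option. This is a plain $GF(2^8)$ computation, not the composite-field datapath; the names `gf_mul`, `gf_inv` and `sbox` are illustrative, and computing the inverse as $a^{254}$ is an assumption made for brevity rather than the slide's method.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def gf_inv(a):
    """Step 1: multiplicative inverse, computed as a^254 (0 maps to 0)."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r if a else 0


def rotl8(x, s):
    """Rotate a byte left by s positions."""
    return ((x << s) | (x >> (8 - s))) & 0xFF


def sbox(a):
    """Step 2: affine transformation applied to the inverse."""
    x = gf_inv(a)
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63


# The LUT design option simply stores all 256 results.
SBOX_TABLE = [sbox(v) for v in range(256)]
```

The last line is the whole trade-off in miniature: the table costs storage for 256 bytes, while the computed version costs logic depth.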

slide-20
SLIDE 20

9/9/2012 20

SubBytes

9/9/2012 CHES Tutorial: Crypto hardware design 40

[Source: Hodjat et al., 2006]


[Figure: S-box area versus delay; axes: Delay (nsec), Area (gates); designs: gf_design with 2-stage pipeline, gf_design with 3-stage pipeline, gf_design without pipeline, LUT_design without pipeline]

[Figure: composite-field S-box datapath, 8 bits in and 8 bits out: Map, inversion network (X², X² × e, +, ×, X⁻¹, +, ×, ×), Map⁻¹, Affine; drawn three times, once per pipelining option]

The area cost of the Sbox using two-stage and three-stage composite field implementation is 23% and 32% less than the LUT design with the same speed.

Public Key Cryptography

CHES Tutorial: Crypto hardware design 41 9/9/2012

  • RSA

– (basic) encryption
Input: Public key of the receiver: {n, e}, where $n = pq$; Plaintext: $0 \le m \le n-1$.
Output: Ciphertext: $c = m^e \bmod n$.

– (basic) decryption
Input: Private key: d, where $d = e^{-1} \bmod (p-1)(q-1)$; Ciphertext: c.
Output: $m = c^d \bmod n$.

slide-21
SLIDE 21

9/9/2012 21

Modular exponentiation

9/9/2012 CHES Tutorial: Crypto hardware design 42

  • $a^{17} \bmod n = a^{(10001)_2} \bmod n = (((a^2)^2)^2)^2 \cdot a \bmod n$

ModSqr: 4, ModMul: 1

  • Consider 1024-bit RSA: $m^e \bmod n$, where e has 1024 bits.

ModSqr: ?, ModMul: ?

ModSqr: ≈1024, ModMul: ≈512
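The operation counts can be checked with a short Python sketch of left-to-right square-and-multiply that counts ModSqr and ModMul. The function name `modexp_binary` is illustrative, and the sketch assumes e ≥ 1 so that the leading bit is 1.

```python
def modexp_binary(a, e, n):
    """Left-to-right binary exponentiation; returns (a^e mod n, #ModSqr, #ModMul)."""
    bits = bin(e)[2:]          # most significant bit first, always '1' for e >= 1
    result = a % n             # the leading 1 bit is consumed by initialisation
    sqr = mul = 0
    for bit in bits[1:]:
        result = (result * result) % n   # one ModSqr per remaining bit
        sqr += 1
        if bit == "1":
            result = (result * a) % n    # one ModMul per 1 bit
            mul += 1
    return result, sqr, mul
```

For a random 1024-bit exponent about half the bits are 1, which gives the ≈1024 squarings and ≈512 multiplications quoted above, about 1500 modular multiplications in total.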


Elliptic Curve Cryptography

CHES Tutorial: Crypto hardware design 43 9/9/2012

E: $y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6$, where $a_1, a_2, a_3, a_4, a_6$ are from a field K and $\Delta \neq 0$.

P Q R=P+Q

slide-22
SLIDE 22

9/9/2012 22

Elliptic Curve Cryptography

9/9/2012 CHES Tutorial: Crypto hardware design 44

E: $y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6$, where $a_1, a_2, a_3, a_4, a_6$ are from a field K and $\Delta \neq 0$.

Q R=2Q


ECC Computation

9/9/2012 CHES Tutorial: Crypto hardware design 45

  • Point multiplication: [k]P = P + P + … + P (k times)
  • Example:

$[23]P = [(10111)_2]P = [2]([2]([2]([2]P) + P) + P) + P$

PointDbl: 4, PointAdd: 3

  • Consider a 160-bit ECC.

PointDbl: ?, PointAdd: ?


PointDbl: ≈160, PointAdd: ≈80
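A minimal double-and-add sketch in Python reproduces the [23]P count. The toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with base point (5, 1) is an assumption chosen only to keep the numbers small; it is not from the slides and is far too small for real use.

```python
P_MOD, A = 17, 2     # toy curve y^2 = x^3 + 2x + 2 over F_17 (illustration only)
G = (5, 1)           # a point on the toy curve


def point_add(p, q):
    """Affine point addition; None represents the point at infinity O."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # P + (-P) = O
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, p):
    """Left-to-right double-and-add; returns ([k]P, #PointDbl, #PointAdd) for k >= 1."""
    r, dbl, add = p, 0, 0
    for bit in bin(k)[2:][1:]:        # skip the leading 1 bit
        r = point_add(r, r)
        dbl += 1
        if bit == "1":
            r = point_add(r, p)
            add += 1
    return r, dbl, add
```

`scalar_mult(23, G)` performs 4 doublings and 3 additions, matching the example; a random 160-bit scalar needs about 160 doublings and 80 additions.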

slide-23
SLIDE 23

9/9/2012 23

ECC Computation

9/9/2012 CHES Tutorial: Crypto hardware design 46

  • Can we do better?

$[23]P = [(10111)_2]P = [(1\,0\,\bar{1}\,0\,0\,\bar{1})_2]P = [2]([2]([2]([2]([2]P) - P))) - P$

PointDbl: 5, PointAdd: 2

  • 160-bit ECC, using NAF

PointDbl: ?, PointAdd: ?


PointDbl: ≈160, PointAdd: ≈53
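The NAF recoding itself is short. The following Python sketch (function names `naf` and `naf_op_counts` are illustrative) produces signed digits in {-1, 0, 1}, least significant first, and counts the resulting point operations.

```python
def naf(k):
    """Non-adjacent form of k > 0, least significant digit first."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)    # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d             # makes k divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def naf_op_counts(k):
    """(#PointDbl, #PointAdd) for left-to-right evaluation of the NAF of k."""
    d = naf(k)
    doubles = len(d) - 1                            # one doubling per digit below the top
    adds = sum(1 for x in d[:-1] if x != 0)         # additions or subtractions of P
    return doubles, adds
```

On average one digit in three is non-zero, which is where the ≈53 additions for a 160-bit scalar come from. Subtracting a point costs the same as adding one because negation on an elliptic curve is almost free.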

ECC/RSA processor

9/9/2012 CHES Tutorial: Crypto hardware design 47

[Figure: operation hierarchy. RSA, ECC; Modular Exponentiation, Point Multiplication; Modular Add/Sub, Modular Mul, Modular Inv]

RSA1024 ≈ 1500 MM1024
ECC160p ≈ 2560 MM160 * (160 PD + 80 PA)

* Using Jacobian coordinates: PD = 4M+4S, PA = 12M+4S

slide-24
SLIDE 24

9/9/2012 24

How many cycles do we have?

9/9/2012 CHES Tutorial: Crypto hardware design 48

  • Performance

– Frequency: 200 MHz
– RSA1024: 2000 signatures per second
– ECC160: 4000 signatures per second

  • RSA1024: $200 \cdot 10^6 / (2000 \cdot 1500) \approx 66$ cycles per MM1024
  • ECC160: $200 \cdot 10^6 / (4000 \cdot 2560) \approx 19$ cycles per MM160
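The budget arithmetic can be written out as a small Python sketch. The function name `cycles_per_modmul` is illustrative, and counting a squaring as one modular multiplication (so PD = 8 and PA = 16 multiplications in Jacobian coordinates) is the same simplification the 2560 figure relies on.

```python
def cycles_per_modmul(freq_hz, ops_per_second, modmuls_per_op):
    """Clock cycles available for one modular multiplication."""
    return freq_hz / (ops_per_second * modmuls_per_op)


# RSA1024: about 1500 modular multiplications per exponentiation.
rsa_budget = cycles_per_modmul(200e6, 2000, 1500)

# ECC160: 160 point doublings (4M+4S) and 80 point additions (12M+4S).
ecc_modmuls = 160 * (4 + 4) + 80 * (12 + 4)
ecc_budget = cycles_per_modmul(200e6, 4000, ecc_modmuls)
```

Rounding down gives 66 and 19 cycles; rounding up would promise more time than the throughput target allows.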

Long Integer Multiplier

  • Schoolbook

9/9/2012 CHES Tutorial: Crypto hardware design 49

[Figure: $(A_3 A_2 A_1 A_0) \times (B_3 B_2 B_1 B_0) = (C_7 C_6 \dots C_1 C_0)$, a 64-bit Mul on a 16-bit CPU]

Complexity: Mul: $n^2$, Add: $O(n^2)$

slide-25
SLIDE 25

9/9/2012 25

Karatsuba multiplier

9/9/2012 CHES Tutorial: Crypto hardware design 50

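$(ax+b)(cx+d) = acx^2 + (ad+bc)x + bd = acx^2 + ((a+b)(c+d) - ac - bd)x + bd$

Complexity: Mul: $n^{\log_2 3}$, Add: $O(n^2)$

1024-bit Mul on a 16-bit CPU, number of 16-bit MUL: Schoolbook 4096 ($= 64^2$), Karatsuba 729 ($= 64^{\log_2 3} = 3^6$)

[Figure: recursive splitting of the operand: 1024; 512, 512; 256, 256, 256, 256]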


$A = A_H \cdot 2^{512} + A_L$
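The 729 figure can be reproduced with a recursive Python sketch that counts word-level multiplications. The function name `karatsuba` and the counter argument are illustrative; the count is idealised in that the $(a+b)(c+d)$ product, whose operands are one bit longer, is charged like any other half-size product.

```python
def karatsuba(a, b, bits, counter, word=16):
    """Multiply two `bits`-bit integers, counting word-size multiplications in counter[0]."""
    if bits <= word:
        counter[0] += 1                 # one word x word multiplication
        return a * b
    half = bits // 2
    mask = (1 << half) - 1
    a_h, a_l = a >> half, a & mask      # A = A_H * 2^half + A_L
    b_h, b_l = b >> half, b & mask
    hh = karatsuba(a_h, b_h, half, counter, word)
    ll = karatsuba(a_l, b_l, half, counter, word)
    # Third product replaces the two cross products a_h*b_l and a_l*b_h.
    mid = karatsuba(a_h + a_l, b_h + b_l, half, counter, word) - hh - ll
    return (hh << (2 * half)) + (mid << half) + ll
```

Six levels of splitting take 1024 bits down to 16 bits, and each level triples the number of products, giving $3^6 = 729$. The saving is paid for with extra additions and subtractions at every level.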

Reduction

  • r = c mod p
  • Integer division

– Step 1: q = floor(c/p)
– Step 2: r = c – qp

  • (Pseudo-) Barrett reduction

– Precomputed: p' = 1/p
– Step 1: q = cp'
– Step 2: r = c – qp

slide-26
SLIDE 26

9/9/2012 26

Montgomery Reduction

9/9/2012 CHES Tutorial: Crypto hardware design 52

Input: a, b < p. Output: c = ab mod p. Parameter: $R = 2^w > p$.

MontM(A, B) := $ABR^{-1} \bmod p$

Step 1: domain conversion:

$a \to a' = aR \bmod p$, $b \to b' = bR \bmod p$
MontM(a, $R^2$) = $aR^2R^{-1} \bmod p = aR \bmod p$

Step 2: multiplication:

$c' = a'b'R^{-1} \bmod p = abR \bmod p$

Step 3: domain conversion:

$c' = abR \bmod p \to c = ab \bmod p$
MontM(c', 1) = $abRR^{-1} \bmod p = ab \bmod p$



Montgomery Reduction

  • MontM(A,B) := ABR-1 mod p

9/9/2012 CHES Tutorial: Crypto hardware design 53

Input: A, B < p.
Output: $C \equiv ABR^{-1} \bmod p$
Parameter: $R = 2^w > p$, and $p' = -p^{-1} \bmod R$
Step 1: $T = AB$
Step 2: $Q = (T \bmod R)\, p' \bmod R$
Step 3: $C = (T + Qp)$ div $R$
Step 4: $C = C - p$ if $C \ge p$
Return: C

Correctness: $Qp \bmod R \equiv (Tp' \bmod R)\, p \bmod R \equiv Tp'p \bmod R \equiv -T \bmod R$, thus $(T + Qp) \bmod R = 0$. Since $R = 2^w$, "div R" is simply a right-shift.
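Steps 1 to 4 translate almost line by line into Python. The sketch below works on whole integers rather than words, and the names `mont_params`, `montm` and `modmul_via_montgomery` are illustrative; it assumes an odd modulus p and Python 3.8 or later for `pow(p, -1, R)`.

```python
def mont_params(p, w):
    """Return R = 2^w and p' = -p^{-1} mod R for an odd modulus p < R."""
    R = 1 << w
    if p >= R or p % 2 == 0:
        raise ValueError("need odd p with p < 2^w")
    p_prime = (-pow(p, -1, R)) % R
    return R, p_prime


def montm(A, B, p, w, p_prime):
    """MontM(A, B) = A*B*R^{-1} mod p for A, B < p."""
    mask = (1 << w) - 1
    T = A * B                              # Step 1
    Q = ((T & mask) * p_prime) & mask      # Step 2: Q = (T mod R) p' mod R
    C = (T + Q * p) >> w                   # Step 3: T + Qp is divisible by R
    if C >= p:                             # Step 4: single conditional subtraction
        C -= p
    return C


def modmul_via_montgomery(a, b, p, w):
    """Compute ab mod p with conversions into and out of the Montgomery domain."""
    R, p_prime = mont_params(p, w)
    r2 = (R * R) % p                       # precomputed once per modulus
    a_m = montm(a, r2, p, w, p_prime)      # aR mod p
    b_m = montm(b, r2, p, w, p_prime)      # bR mod p
    c_m = montm(a_m, b_m, p, w, p_prime)   # abR mod p
    return montm(c_m, 1, p, w, p_prime)    # ab mod p
```

The conversions only pay off when many multiplications share the same modulus, as in an exponentiation, where operands stay in the Montgomery domain throughout.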

slide-27
SLIDE 27

9/9/2012 27

Montgomery Reduction

9/9/2012 CHES Tutorial: Crypto hardware design 54

Parameter: p, R, where $R = 2^w > p$, and $p' = -p^{-1} \bmod R$
Input: A, B < p.
Output: $C \equiv ABR^{-1} \bmod p$
Step 1: $T = AB$
Step 2: $Q = Tp' \bmod R$
Step 3: $C = (T + Qp)$ div $R$
Step 4: $C = C - p$ if $C \ge p$
Return: C

[Figure: dataflow. Step 1: A × B gives T. Step 2: (T mod R) × p' gives Q. Step 3: (Q mod R) × p gives G, and T + G gives C.]

Question: If A and B have n words each, how many multiplications are needed?


Reduction with special moduli

  • Mersenne prime

– $p = 2^k - 1$
– limited candidates

  • Pseudo-Mersenne prime

– $p = 2^k - c$, where c is small

slide-28
SLIDE 28

9/9/2012 28

Reduction with pseudo-Mersenne prime

9/9/2012 CHES Tutorial: Crypto hardware design 56

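Parameter: $p = 2^k - c$
Input: A, B < p. Output: $r \equiv AB \bmod p$
Step 1: $T = AB$
Step 2: Repeat: $T = T_L + T_H c$ until $T < p$.

// $T = T_H 2^k + T_L = T_H (2^k - c) + T_L + T_H c$
// $\;\; = T_H p + T_L + T_H c$

Return: T

[Figure: dataflow. A × B gives T; $T_H$ = T div $2^k$ and $T_L$ = T mod $2^k$; $T_H$ × c is added to $T_L$ to give the new T.]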

Question: How many iterations are needed?
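One way to explore the question is a Python sketch that counts the folding iterations. The function name `pseudo_mersenne_mulmod` is illustrative. The sketch departs from the slide in one labelled respect: it folds while T has bits above $2^k$ and then finishes with a conditional subtraction, since folding alone cannot reduce a value that lies between p and $2^k$.

```python
def pseudo_mersenne_mulmod(A, B, k, c):
    """Return (A*B mod p, number of folding iterations) for p = 2^k - c."""
    p = (1 << k) - c
    mask = (1 << k) - 1
    T = A * B
    iterations = 0
    while T >> k:                          # T_H != 0
        T = (T & mask) + (T >> k) * c      # T_L + T_H * c, congruent to T mod p
        iterations += 1
    while T >= p:                          # final correction (assumption, see lead-in)
        T -= p
    return T, iterations
```

For a small c such as 19 with k = 255, the first fold leaves at most a few bits above $2^k$, so the loop finishes within three iterations. The smaller c is, the faster the excess shrinks.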


Complexity

Image Source: http://www.clipartmojo.com

[Cartoon: "I am trying RSA1024 on my calculator…… gosh my finger hurts!"]

Algorithm                            Complexity
Integer multiplication (Schoolbook)  $n^2$
Integer multiplication (Karatsuba)   $n^{\log_2 3}$
Barrett modular multiplication       $\approx 2n^2 + n$
Montgomery modular multiplication    $2n^2 + n$

s: operand size [bits]; w: digit size; n: number of digits [n = s/w]


Note: Other variants of Montgomery and Barrett reduction algorithm may have different complexity.

slide-29
SLIDE 29

9/9/2012 29

Architecture for RSA

  • Budget : 66 cycles per MM1024
  • s=1024, w=32, n=32

– Montgomery method (Schoolbook): $2n^2 + n = 2080$.
– We need (at least) 2080/66 ≈ 32 multipliers (32-bit unsigned).
– Cycles for data loading, addition, and so on.

  • How do we organize the multipliers?


Architecture for ECC

  • Budget: 19 cycles per MM160
  • s=160, w=32, n=5

– Montgomery method (Schoolbook): $2n^2 + n = 55$.
– We need (at least) 55/19 ≈ 3 multipliers (32-bit unsigned).
– Cycles for data loading, addition, and so on.

  • How do we organize the multipliers?
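The lower bound on the number of multipliers follows directly from the cycle budget. This Python sketch (the function name `multipliers_needed` is illustrative) ignores cycles for data loading and additions, so the real requirement is higher.

```python
import math


def multipliers_needed(s, w, budget_cycles):
    """Return (n, word multiplications, minimum parallel multipliers) for one MontM."""
    n = s // w                     # number of w-bit digits
    mults = 2 * n * n + n          # schoolbook Montgomery multiplication
    return n, mults, math.ceil(mults / budget_cycles)


rsa = multipliers_needed(1024, 32, 66)   # server-side RSA1024 budget
ecc = multipliers_needed(160, 32, 19)    # server-side ECC160 budget
```

The two results differ by an order of magnitude (32 versus 3 multipliers), which is why a datapath sized for RSA is heavily underused when running ECC.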

slide-30
SLIDE 30

9/9/2012 30

Architecture Type-I

9/9/2012 CHES Tutorial: Crypto hardware design 60

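[Figure: multipliers MUL31, MUL30, …, MUL1, MUL0 take the digits $A_{31}, A_{30}, \dots, A_1, A_0$ and a shared digit $B_i$; their outputs feed a Carry-Save Adder]

  • Basic idea: Multiplier Array + Carry-Save Adder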

[Source: Mentens et al., GLSVLSI 2007]


Mul: 1024x32

Architecture Type-II

  • Basic idea: Processing Elements + data bus

9/9/2012 CHES Tutorial: Crypto hardware design 61

MUL31 FSM

RAM

I/O MUL31 FSM

RAM

I/O MUL31 FSM

RAM

I/O

core 0 core 1 core 31

[Source: Tenca and Koç, IEEE ToC, 2003]

slide-31
SLIDE 31

9/9/2012 31

Architecture Type-III

  • Karatsuba multiplier

9/9/2012 CHES Tutorial: Crypto hardware design 62

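[Figure: A and B are split into MSB and LSB halves. MUL32a: $T_2 = A_{MSB} \cdot B_{MSB}$; MUL32b: $T_0 = A_{LSB} \cdot B_{LSB}$; MUL33c: $T_1$ is the product of the two half sums.]

$T = (T_2 \ll 64) + ((T_1 - T_2 - T_0) \ll 32) + T_0$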


Card-side: Low-area

9/9/2012 CHES Tutorial: Crypto hardware design 63

  • Budget revisited

– Frequency: 5 MHz
– Area: < 60 kgates
– AES128: 1 Mbit/s
– RSA1024: 5 signatures per second
– ECC160: 10 signatures per second

  • Constraints

– AES: 5M / (1M/128) = 640 cycles per block (≈64 cycles per round)
– RSA: 5M / (5 * 1500) = 699 cycles per MM1024
– ECC: 5M / (10 * 2560) = 204 cycles per MM160

(The RSA and ECC figures correspond to M = $2^{20}$; with M = $10^6$ they are 666 and 195 cycles.)

slide-32
SLIDE 32

9/9/2012 32

Architecture design - AES

  • SBox design

– How many SBoxes?
– LUT vs. Finite-field computation

9/9/2012 CHES Tutorial: Crypto hardware design 64

[source: Moradi et al., CHES 2011]


[Figure: State (4x4 Bytes), Key (4x4 Bytes), SubByte, MixColumn, KE, FSM, I/O]

  • Area estimation

– Flip-flops: (128+128)*6 = 1536 gates
– SBox + MixColumn = 800 ‡
– FSM + KeyAdd = 300 (roughly)

AES on ASICs

9/9/2012 CHES Tutorial: Crypto hardware design 65

Author         CMOS [um]  Area [GE]  Freq. [MHz]  Throughput [Mbps]
Hwang et al.   0.18       19300      330          3840
Mangard        0.6        7000       50           98
Satoh          0.11       5400       130          331
Feldhofer      0.35       3400       80           10
Moradi et al.  0.18       2400       0.1          0.057

slide-33
SLIDE 33

9/9/2012 33

Architecture - PKC

9/9/2012 CHES Tutorial: Crypto hardware design 66

PA_PD Exp Inv

CPU

FSM

ALU

Register File Runtime Memory

BUS …


Architecture design – RSA/ECC

  • Budget revisited

– 699 cycles for MM1024
– 204 cycles for MM160

  • Architecture design

– Multiplier: 32-bit? 64-bit?
– Memory: single-port? dual-port?

  • Area estimation

– Storage: 1024*8 = 8 kb
– ALU: ?
– FSM: ?


Should be enough?

slide-34
SLIDE 34

9/9/2012 34

Detailed architecture

  • MAC

– Consider $C = \sum A_i B_j$

  • Local Instruction ROM

– Programmable
– Fast to access

  • Register file

– Reduce memory access

9/9/2012 CHES Tutorial: Crypto hardware design 68

CPU Decoder 64-bit MAC

Register File (8x64)

BUS

InsRom


Part III: Optimization

CHES Tutorial: Crypto hardware design 69

  • Area
  • Speed
  • Power

slide-35
SLIDE 35

9/9/2012 35

Reduce area-delay product

  • Reduce complexity

– Faster modular +, -, x, /

  • RSA

– Chinese Remainder Theorem (CRT)
– m-ary method

  • ECC

– m-ary method
– x-coordinate only point multiplication
– multiple point multiplication


RSA, ECC Modular Exponentiation Point Multiplication Modular Add/Sub Modular Mul Modular Inv

Reduce area-delay product

  • Design space exploration

– ECC coprocessor (k163)

9/9/2012 CHES Tutorial: Crypto hardware design 71

CPU FSM Digit-serial multiplier Reg File

BUS

[Figure: design-space plot; series: Area [kGE], Cycles [x10^4], Freq [x10kHz], Power [uW], Energy [uJ]]

[source: Lee et al., IEEE ToC, 2008]

slide-36
SLIDE 36

9/9/2012 36

Reduce area-delay product

  • Losing flexibility
  • NIST primes

– $p_{192} = 2^{192} - 2^{64} - 1$
– $p_{224} = 2^{224} - 2^{96} + 1$
– $p_{256} = 2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$
– $p_{384} = 2^{384} - 2^{128} - 2^{96} + 2^{32} - 1$
– $p_{521} = 2^{521} - 1$

  • Koblitz curves


Low-power design

9/9/2012 CHES Tutorial: Crypto hardware design 73

  • Reducing clock frequency

– Reducing VDD
– Relax critical path (pipelining)

  • Reducing CL

– Minimum device sizes
– Compact and custom layout

  • Reducing switches

– Clock gating
– Eliminate glitches

slide-37
SLIDE 37

9/9/2012 37

Improve your design


Power Time Power Time

VS.

Design-I Design-II

Part IV: Secure implementation

CHES Tutorial: Crypto hardware design 75

  • Power analysis
  • Fault analysis
  • Countermeasures

slide-38
SLIDE 38

9/9/2012 38

Physical attacks

9/9/2012 CHES Tutorial: Crypto hardware design 76

Side-channel analysis Fault analysis


Power as a side-channel

9/9/2012 CHES Tutorial: Crypto hardware design 77

  • Consumes power when output makes a 0 to 1 transition
  • It tells you what it is doing…

0-1 transition

IN OUT 0 0 0 1 discharge 1 0 charge 1 1

slide-39
SLIDE 39

9/9/2012 39

Simple power analysis

  • Based on one or few measurements
  • Discovery of data-(in)dependent properties

– Symmetric:

  • Number of rounds (resp. key length)
  • Memory accesses (usually higher power consumption)

– Asymmetric:

  • The key (if badly implemented, e.g. RSA / ECC)
  • Key length
  • Search for repetitive patterns
  • Search for conditional operations


Simple power analysis on ECC

9/9/2012 CHES Tutorial: Crypto hardware design 79

Left-to-right binary method for point multiplication

$k = (k_{l-1}, k_{l-2}, \dots, k_0)$
R ← O
for i = l-1 downto 0 do
    R ← [2]R
    if $k_i$ = 1 then
        R ← R + P
    end if
end for
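Why this loop leaks can be shown without any curve arithmetic. The Python sketch below records only the sequence of operations, which is roughly what a power trace reveals when doubling and addition look different; the names `binary_method_trace` and `recover_scalar` are illustrative.

```python
def binary_method_trace(k):
    """Operation sequence of the loop above: 'D' = PointDbl, 'A' = PointAdd."""
    ops = []
    for bit in bin(k)[2:]:         # k_{l-1} down to k_0
        ops.append("D")            # R <- [2]R happens for every bit
        if bit == "1":
            ops.append("A")        # R <- R + P happens only for 1 bits
    return "".join(ops)


def recover_scalar(trace):
    """Attacker's view: a 'D' followed by 'A' is a 1 bit, a lone 'D' is a 0 bit."""
    bits = []
    i = 0
    while i < len(trace):
        if i + 1 < len(trace) and trace[i + 1] == "A":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)
```

A single trace is enough: `recover_scalar(binary_method_trace(k))` returns k for every k, because the conditional addition encodes each key bit directly in the operation pattern.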

slide-40
SLIDE 40

9/9/2012 40

Simple power analysis on AES

9/9/2012 CHES Tutorial: Crypto hardware design 80

  • What is the keylength of this AES implementation?

[Source: B. Gierlichs]


Simple power analysis on AES

9/9/2012 CHES Tutorial: Crypto hardware design 81

  • 10 rounds => AES 128

[Source: B. Gierlichs]

slide-41
SLIDE 41

9/9/2012 41

Differential Power Analysis

9/9/2012 CHES Tutorial: Crypto hardware design 82

  • Recall: CMOS has data-dependent dynamic power dissipation (very small differences)
  • Requires many measurements

– Usually a few hundred for software implementations
– Usually a few thousand for hardware implementations
– Can go up to several 100k if implementation is protected

  • Discovery of data-dependencies by statistical means (uni- and multivariate)

  • Applies to symmetric and asymmetric schemes

Correlation DPA

9/9/2012 CHES Tutorial: Crypto hardware design 83

[Figure: the Input and the Real key produce the Real output and a Real side-channel; the same Input and a Key hypothesis produce a Hypothetical output and a Model of side-channel; Statistical analysis compares the two: Hypothesis correct?]
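The flow in the figure can be tried out on simulated data. The Python sketch below is a toy experiment under stated assumptions, not a measurement setup: the "real side-channel" is simulated as the Hamming weight of an AES S-box output plus Gaussian noise, and all function names (`simulate_traces`, `cpa_attack`, `pearson`) are illustrative.

```python
import random


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def sbox(a):
    """AES S-box: inverse (as a^254) followed by the affine transformation."""
    x = 1
    for _ in range(254):
        x = gf_mul(x, a)
    x = x if a else 0
    rot = lambda v, s: ((v << s) | (v >> (8 - s))) & 0xFF
    return x ^ rot(x, 1) ^ rot(x, 2) ^ rot(x, 3) ^ rot(x, 4) ^ 0x63


SBOX = [sbox(v) for v in range(256)]
HW = [bin(v).count("1") for v in range(256)]      # Hamming-weight power model


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def simulate_traces(key, n, noise, rng):
    """Simulated device: one leakage sample per random input byte."""
    inputs = [rng.randrange(256) for _ in range(n)]
    leaks = [HW[SBOX[p ^ key]] + rng.gauss(0, noise) for p in inputs]
    return inputs, leaks


def cpa_attack(inputs, leaks):
    """Return the key hypothesis whose modelled leakage correlates best."""
    scores = []
    for guess in range(256):
        model = [HW[SBOX[p ^ guess]] for p in inputs]
        scores.append(pearson(model, leaks))
    return max(range(256), key=lambda g: scores[g])
```

With a few hundred simulated traces and moderate noise the correct key byte stands out clearly; raising `noise` shows why protected implementations push the required number of measurements up.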


[Source: B. Gierlichs]

slide-42
SLIDE 42

9/9/2012 42

Countermeasure

  • Algorithmic level

– Fixed operation time
– Fixed operation pattern
– Data randomization

  • Circuit level (generic)

– Differential logic
– Masking


Protect RSA from SPA

9/9/2012 CHES Tutorial: Crypto hardware design 85

Left-to-right binary method

Input: n, m and e. Output: c = m^e mod n.
1. Let e = [e_t, e_{t-1}, …, e_1, e_0]_2;
2. c := 1;
3. For i := t downto 0 do
4.   c := c^2 mod n;
5.   if e_i == 1 then
6.     c := c·m mod n;
Return c.

What is required to secure it?

Montgomery Powering Ladder

Input: n, m and e. Output: c = m^e mod n.
1. Let e = [1, e_{t-1}, …, e_1, e_0]_2;
2. R[0] := m; R[1] := m^2 mod n;
3. For i := t-1 downto 0 do
4.   R[1-e_i] := R[0]·R[1] mod n;
5.   R[e_i] := R[e_i]·R[e_i] mod n;
Return R[0].
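
Both methods can be written out in Python as a rough sketch. The function names and the returned operation log ("S" for a squaring, "M" for a multiplication) are assumptions added for illustration. Python integers are not constant time, so this only shows the operation pattern an SPA attacker would look for, not a secure implementation.

```python
def modexp_binary(m, e, n):
    """Left-to-right binary method; returns (m^e mod n, operation log)."""
    c, ops = 1, []
    for i in range(e.bit_length() - 1, -1, -1):
        c = (c * c) % n
        ops.append("S")
        if (e >> i) & 1:          # multiplication only happens for 1-bits
            c = (c * m) % n
            ops.append("M")
    return c, "".join(ops)


def modexp_ladder(m, e, n):
    """Montgomery powering ladder; returns (m^e mod n, operation log)."""
    if e < 1:
        raise ValueError("the ladder assumes the leading exponent bit is 1")
    r = [m % n, (m * m) % n]
    ops = []
    for i in range(e.bit_length() - 2, -1, -1):
        bit = (e >> i) & 1
        r[1 - bit] = (r[0] * r[1]) % n    # one multiplication per bit
        r[bit] = (r[bit] * r[bit]) % n    # one squaring per bit
        ops.append("MS")
    return r[0], "".join(ops)


if __name__ == "__main__":
    print(modexp_binary(7, 0b1011, 1009))   # log depends on the exponent bits
    print(modexp_ladder(7, 0b1011, 1009))   # log is the same for every exponent of this length
```

In the binary method the log spells out the exponent, while the ladder performs the same sequence for every exponent of a given length.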


Protect RSA from DPA

  • Masking the message (the base of the exponentiation)


Input: n, m and e. Output: c = m^e mod n.
1. r := Random(); // r < n
2. m_s := r·m;
3. v := m_s^e mod n;
4. u := r^e mod n;
5. c := v/u mod n;
Return c.

Randomized exponentiation
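
A minimal Python sketch of these steps follows. It assumes Python 3.8 or later for the modular inverse via `pow(u, -1, n)`, and it adds one condition the slide leaves implicit: r must be invertible modulo n so that the division in step 5 is defined. The function name is hypothetical.

```python
import math
import secrets


def blinded_modexp(m, e, n):
    """Compute m^e mod n with a fresh random mask on the base in every call."""
    while True:
        r = secrets.randbelow(n - 2) + 2      # 2 <= r < n
        if math.gcd(r, n) == 1:               # assumption: r must be invertible mod n
            break
    ms = (r * m) % n                          # masked message
    v = pow(ms, e, n)                         # (r*m)^e mod n
    u = pow(r, e, n)                          # r^e mod n
    return (v * pow(u, -1, n)) % n            # v / u mod n


if __name__ == "__main__":
    n, e, m = 3233, 17, 65                    # toy parameters, far too small for real use
    print(blinded_modexp(m, e, n), pow(m, e, n))
```

The intermediate values depend on r, so they change from run to run even when m and e stay fixed. The price is a second exponentiation and a modular inversion.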


Protect ECC from DPA

  • Randomization


Input: k, P. Output: Q = kP.
1. r := Random(); // r < order(P)
2. k' := k + r·order(P);
3. Q := k'P; // [order(P)]P = O
Return Q.

Randomized scalar


Input: k, P. Output: Q = kP. Precomputed: R, S = kR.
1. T := P + R;
2. Q' := kT;
3. Q := Q' - S;
4. r := Random(); // r < 2^8
5. R := rR, S := rS; // update R, S
Return Q.

Base point blinding
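
Both randomizations can be tried on a toy curve. The sketch below assumes the small textbook curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1) of order 19, uses affine coordinates, and uses hypothetical names throughout. It only checks that the blinded computations return the same Q; it is not a side-channel resistant implementation.

```python
import secrets

P_MOD, A = 17, 2          # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumption)
G, ORDER = (5, 1), 19     # base point and its prime order


def add(p, q):
    """Affine point addition; None represents the point at infinity O."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p == q:
        s = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)


def neg(p):
    return None if p is None else (p[0], -p[1] % P_MOD)


def mul(k, p):
    """Plain double-and-add, used as the underlying scalar multiplication."""
    r = None
    for i in range(k.bit_length() - 1, -1, -1):
        r = add(r, r)
        if (k >> i) & 1:
            r = add(r, p)
    return r


def mul_randomized_scalar(k, p):
    """Randomized scalar: k' = k + r * order(P), so k'P = kP."""
    r = secrets.randbelow(ORDER)
    return mul(k + r * ORDER, p)


class BlindedMultiplier:
    """Base point blinding with precomputed R and S = kR, updated after each use."""

    def __init__(self, k, r_point):
        self.k, self.R, self.S = k, r_point, mul(k, r_point)

    def mul(self, p):
        t = add(p, self.R)                        # T := P + R
        q = add(mul(self.k, t), neg(self.S))      # Q := kT - S
        r = secrets.randbelow(255) + 1            # 1 <= r < 2^8
        self.R, self.S = mul(r, self.R), mul(r, self.S)
        return q


if __name__ == "__main__":
    blinded = BlindedMultiplier(7, mul(3, G))
    print(mul(7, G), mul_randomized_scalar(7, G), blinded.mul(G))
```

On such a tiny group the update r can be a multiple of the order, in which case R and S collapse to O and the blinding disappears. On a curve of cryptographic size an 8-bit r cannot be a multiple of the order.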


Secure logic style

  • Duplicate logic

[Figure: duplicated logic with complementary signals IN, IN', OUT, OUT'. The table lists the input transitions; on a 0-1 or 1-0 transition one output discharges while the other charges.]

[courtesy I. Verbauwhede]

Secure logic style

  • Dynamic logic breaks input sequence

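[Figure: dynamic logic gate with precharge (Pr) and evaluation (Ev) phases and a pull-down network (PDN) between in and out. The table lists IN, the output after precharge (OUT_Pre), the output after evaluation (OUT_Ev) and the resulting charge event; the output is precharged to 1 and can only discharge during evaluation.]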

[courtesy I. Verbauwhede]


Transition independent power consumption

  • …doesn’t create any side channel information
  • When logic values are measured by charging and discharging capacitances, we need to use a fixed amount of energy for every transition

– Switch once every cycle
– Switch a constant load capacitance

[courtesy I. Verbauwhede]

Secure logic style

  • Based on standard cell library

– WDDL
– MCML
– MDPL

  • Based on full custom layout

– SABL
– DyCML


Solution based on standard cells

[Figure: a single-ended gate (inputs A, B, output Z) converted into a differential gate pair with complementary signals and a precharge signal prch]

1. De Morgan's law

  • false output with false inputs

2. AND-ing with precharge signal

  • precharge 1: outputs are 0
  • precharge 0 (evaluation): 1 output is 1

[courtesy I. Verbauwhede]

Wave Dynamic Differential Logic (WDDL)

  • Restrict library to AND, OR gate

– input 0 → output 0
– no precharge operator

[Figure: WDDL AND gate and OR gate; the precharge signal (prch) is applied to the inputs of the encryption module, which sits between registers clocked by clk; the clock phases are eval. and prch.]


[Tiri,DATE2004]
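
The gate behaviour can be modelled at the logic level in a few lines of Python. In this sketch each signal is a (true, false) rail pair and (0, 0) is the precharge value; the helper names are assumptions, and the model ignores timing and load capacitance entirely.

```python
PRECHARGE = (0, 0)   # both rails low during the precharge phase


def encode(bit):
    """Differential encoding of a logic value as (true rail, false rail)."""
    return (bit, 1 - bit)


def wddl_and(a, b):
    # True rail: AND of the true rails. False rail: OR of the false rails (De Morgan).
    return (a[0] & b[0], a[1] | b[1])


def wddl_or(a, b):
    # Dual of the AND gate.
    return (a[0] | b[0], a[1] & b[1])


def wddl_not(a):
    # Inversion is a swap of the two rails, no gate needed.
    return (a[1], a[0])


def rising_edges(gate, a_bit, b_bit):
    """Count output rails that rise when going from precharge to evaluation."""
    pre = gate(PRECHARGE, PRECHARGE)
    ev = gate(encode(a_bit), encode(b_bit))
    return sum(1 for p, e in zip(pre, ev) if p == 0 and e == 1)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, wddl_and(encode(a), encode(b)), rising_edges(wddl_and, a, b))
```

Because AND and OR map all-zero inputs to a zero output, the precharge value propagates through the gates as a wave. In every evaluation phase exactly one rail of each output pair rises, whatever the data.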


WDDL library

  • All functions of and2, or2 operator
  • In addition: inverted input, output signals
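  • XOR2X4: [Figure: built from AOI22X1 and OAI22X1 cells followed by INVX4 inverters; differential inputs A, B and differential output Y]
  • OAI221X2: [Figure: built from OAI221X1 and AOI221X1 cells followed by INVX2 inverters; differential inputs A0, A1, B0, B1, C0 and differential output Y]
  • Our WDDL library: 128 cells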

[courtesy I. Verbauwhede]

Experimental results

  • Measurement results for FPGA test circuit

[Figure: measurement plots of the outputs for the single ended and the WDDL version of the test circuit]

[courtesy I. Verbauwhede]


Unbalanced capacitive loads

  • For constant power consumption: constant load capacitance.

  • Match loads at differential outputs

[courtesy I. Verbauwhede]

Load capacitance breakdown

$C_A = C_{A'}$

$C_{o,A} + C_{w,A} + C_{i,I_1} + \dots + C_{i,I_k} = C_{o,A'} + C_{w,A'} + C_{i,I_1'} + \dots + C_{i,I_k'}$

$C_{w,A} = C_{w,A'}$

$C_o$: intrinsic output capacitance
$C_w$: interconnect capacitance
$C_i$: input capacitance

[Figure: a differential gate drives gate 1 and gate 2; each output rail is loaded by its intrinsic output capacitance, the interconnect ($R_{w,A}$, $C_{w,A}$ and $R_{w,A'}$, $C_{w,A'}$) and the input capacitances of the driven gates]

  • Intrinsic caps.: matched
  • Interconnect: dominant (Moore’s law)
  • Balancing interconnect: crucial

[courtesy I. Verbauwhede]
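
As a small worked example of the balance condition, the Python sketch below sums the three contributions on each rail and reports the relative mismatch. The capacitance values are made up for illustration (arbitrary units) and the function names are hypothetical.

```python
def total_load(c_out, c_wire, c_inputs):
    """C = C_o + C_w + sum of the input capacitances of the driven gates."""
    return c_out + c_wire + sum(c_inputs)


def relative_mismatch(load_a, load_a_prime):
    """Relative difference between the loads on the two differential rails."""
    return abs(load_a - load_a_prime) / max(load_a, load_a_prime)


if __name__ == "__main__":
    # Hypothetical values: intrinsic and input capacitances are matched by design,
    # only the interconnect differs between the two rails.
    load_a = total_load(2.0, 5.0, [1.0, 1.0])
    load_a_prime = total_load(2.0, 6.0, [1.0, 1.0])
    print(load_a, load_a_prime, relative_mismatch(load_a, load_a_prime))
```

With matched cells, any remaining mismatch comes from the wires, which is why the routing has to be balanced.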


AES, controller, fingerprint processor.

[Figure: chip layout with an insecure single-ended core and a secure WDDL core using differential routing]


[Hwang, JSSC06]

DPA attack on AES key bytes (SCMOS)


DPA attack on WDDL


Masking

  • Goal: random masks conceal data
  • (Different from SW or algorithmic masking)
  • Masking for one AND gate


[Trichina,2004]

[Figure: masked AND gate with inputs a ⊕ m_a, b ⊕ m_b, m_a, m_b and m_AND, and output a·b ⊕ m_AND]
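
The masked AND gate can be checked with a short Python model. The sketch assumes the usual expansion a·b = ã·b̃ ⊕ m_a·b̃ ⊕ m_b·ã ⊕ m_a·m_b with ã = a ⊕ m_a and b̃ = b ⊕ m_b, and it starts the XOR chain with the output mask so that the unmasked product never appears as an intermediate value in this model. Names are hypothetical, and the model does not capture hardware effects such as glitches.

```python
import secrets


def masked_and(a_m, b_m, m_a, m_b, m_out):
    """Return (a AND b) XOR m_out from masked inputs a_m = a^m_a, b_m = b^m_b."""
    t = m_out ^ (a_m & b_m)     # start from the fresh output mask
    t ^= m_a & b_m
    t ^= m_b & a_m
    t ^= m_a & m_b
    return t


if __name__ == "__main__":
    a, b = 1, 1
    m_a, m_b, m_out = (secrets.randbits(1) for _ in range(3))
    z_m = masked_and(a ^ m_a, b ^ m_b, m_a, m_b, m_out)
    print("masked output:", z_m, "unmasked:", z_m ^ m_out)
```

The masked output changes with the random masks even for fixed a and b, and removing m_out recovers a·b.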


Fault injection

  • Apply combinations of strange environmental conditions

– Vcc
– Glitch
– Clock
– Temperature
– UV
– Light
– X-Rays

[Figure labels: input, error]

[Source: H. Handschuh]

Fault attack on ECC

  • Safe-error attacks

Double-and-add-always method

k = (k_{l-1}, k_{l-2}, ..., k_0)
R0 ← O
for i = l-1 downto 0 do
  R0 ← [2]R0
  R1 ← R0 + P
  if k_i = 1 then
    R0 ← R1
  else
    R0 ← R0
  end if
end for

[Figure: data flow IN → OPA → OP0 or OP1 → OPB → OUT, where the path taken depends on k_i = 0 or k_i = 1]
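
A safe-error attack on this method can be simulated in Python. The sketch assumes the toy curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1) of order 19, and it models a fault as a corrupted result of the addition in one chosen round. All names and the fault model are assumptions for illustration.

```python
P_MOD, A = 17, 2      # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumption)
G = (5, 1)            # point of prime order 19


def add(p, q):
    """Affine point addition; None represents the point at infinity O."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p == q:
        s = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)


def double_and_add_always(k, point, nbits, fault_round=None):
    """Double-and-add-always; optionally corrupt the addition in one round."""
    r0 = None
    for i in range(nbits - 1, -1, -1):
        r0 = add(r0, r0)
        r1 = add(r0, point)
        if i == fault_round:
            r1 = add(r1, point)       # simulated fault: wrong addition result
        if (k >> i) & 1:
            r0 = r1                   # the addition result is used
        # else: the addition was a dummy operation and its result is discarded
    return r0


def recover_key_by_safe_errors(k_secret, point, nbits):
    """Attacker view: a fault that does not change the output hit a dummy addition."""
    reference = double_and_add_always(k_secret, point, nbits)
    recovered = 0
    for i in range(nbits):
        faulty = double_and_add_always(k_secret, point, nbits, fault_round=i)
        if faulty != reference:
            recovered |= 1 << i
    return recovered


if __name__ == "__main__":
    print(bin(recover_key_by_safe_errors(0b01101, G, 5)))
```

The attacker never needs the faulty value itself, only whether the output is still correct, which is why an output check alone does not stop this attack.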


Fault attack on ECC

  • Weak curve attack

$E: y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6$

$E': y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6'$


Countermeasures

  • Parameter verification

– Input/output validity
– Intermediate check

  • Redundant computation

– Duplicated data-path
– Multiple executions

  • Physical protections

– Circuit shields
– Sensors (temperature, frequency, voltage, etc.)
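
The input/output validity check can be sketched as a point-on-curve test. The example assumes a short Weierstrass curve y^2 = x^3 + ax + b (the special case a_1 = a_2 = a_3 = 0 with a_4 = a and a_6 = b) over the toy field F_17; the names are hypothetical. It also shows how a faulted point silently selects a different constant term, which is the basis of the weak curve attack.

```python
P_MOD, A, B = 17, 2, 2    # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumption)


def is_on_curve(point):
    """Validity check for inputs, outputs and intermediate points."""
    if point is None:                 # point at infinity
        return True
    x, y = point
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def implied_constant_term(point):
    """Constant term b' of the curve y^2 = x^3 + ax + b' that contains the point."""
    x, y = point
    return (y * y - x * x * x - A * x) % P_MOD


def checked(point):
    """Refuse to release a result that is not on the expected curve."""
    if not is_on_curve(point):
        raise ValueError("point is not on the curve, possible fault")
    return point


if __name__ == "__main__":
    good, faulted = (5, 1), (5, 2)    # (5, 2) models a fault in the y-coordinate
    print(is_on_curve(good), implied_constant_term(good))
    print(is_on_curve(faulted), implied_constant_term(faulted))
```

The usual affine addition and doubling formulas do not use the constant term, so without such a check the device would carry on computing on the curve with b' instead of b.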


Security comes at a price

  • Adding countermeasures

– Larger area
– Longer operation time

  • Design puzzle

– Security
– Power
– Area
– Attacks
– Performance


Mapping Algorithms - Security Partitioning

[Figure: security partitioning between server and client around a root-of-trust. Labels shown: protocol/algorithm-level validation; noncritical software; matching & crypto SW; architecture-level validation; architecture-level attacks; matching & crypto HW; software driver; microarchitecture-level validation; microarchitecture-level attacks; DPA-resistant HW; circuit-level attacks]


[Source: P. Schaumont]


Design flow revisited

[Figure: design flow with security steps added. Flow: Design Spec → Architecture Design → RTL Design → Logic Synthesis → Netlist → Physical Design → Layout. Added labels: Secure Algorithms, Secure Logic, Security Evaluation Phase I, Security Evaluation Phase II, SCA, FA]


Summary

  • Hardware design for crypto hardware

– Architecture design
– Low-power design methods
– Physical security

  • Research topics

– A better design flow for crypto hardware
– Provable physical security?
