Good Practices for Designing Cryptographic Primitives in Hardware - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Good Practices for Designing Cryptographic Primitives in Hardware

Miroslav Knežević
NXP Semiconductors
miroslav.knezevic@nxp.com
January 25, 2016
School on Design for a Secure IoT, Tenerife, 2016

slide-2
SLIDE 2

THINGS

slide-3
SLIDE 3

Smart Plugs

January 28, 2016


slide-4
SLIDE 4

Smart Door Locks


slide-5
SLIDE 5

Smart Thermostats


slide-6
SLIDE 6

Smart Smoke Detectors


slide-7
SLIDE 7

Smart Light Bulbs


slide-8
SLIDE 8

Connected Cars


slide-9
SLIDE 9

WHY HARDWARE?

slide-10
SLIDE 10

…to run software, obviously!


slide-11
SLIDE 11

But software alone is…

  • Slow
  • Insecure
  • Energy inefficient

slide-12
SLIDE 12

Super Powers of Embedded HW

  • Performance
  • Power (Energy)
  • Area
  • Security*

* In the original slide deck Superman was held responsible for Security. During the coffee break, students suggested that Batman would be a better representative and I agree he is. So sorry, Mr. Kent!
slide-13
SLIDE 13

Area – Designing the Smallest Block Cipher


slide-14
SLIDE 14

MOSFET Channel Length

[Figure: MOSFET with terminals S (source), D (drain), and G (gate), illustrating the channel length; schematic labels: A, A, VDD]

slide-15
SLIDE 15

NAND gate

  • Smallest logical gate with two inputs.
  • GE (gate equivalence) = physical area of a single NAND gate.
  • (Ab)used for comparing HW designs across different CMOS technologies.
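The GE definition above amounts to a single division. A minimal sketch, assuming the physical areas are given in µm²; the function name and all numbers are hypothetical and only illustrate the arithmetic:

```python
def area_to_ge(area_um2: float, nand2_area_um2: float) -> float:
    """Express a circuit area in gate equivalents (GE).

    One GE is the physical area of a two-input NAND gate in the
    same standard-cell library.
    """
    if nand2_area_um2 <= 0:
        raise ValueError("NAND2 area must be positive")
    return area_um2 / nand2_area_um2


# Hypothetical figures: (design area, NAND2 area), both in um^2.
examples = {
    "library A": (2500.0, 5.0),
    "library B": (540.0, 1.0),
}
for library, (design_area, nand2_area) in examples.items():
    print(library, area_to_ge(design_area, nand2_area), "GE")
```

The two hypothetical libraries give 500 GE and 540 GE for the same design, which is the kind of mismatch that makes cross-technology GE comparisons only approximate.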

slide-16
SLIDE 16

XOR gate


2-3 GE

slide-17
SLIDE 17

Modern Lightweight Ciphers


< 1000 GE

slide-18
SLIDE 18

AES (128-bit key, ENC only)


2500 GE

slide-19
SLIDE 19


Advances in CMOS Technology

140 nm 40 nm 10 nm

slide-20
SLIDE 20


ARM Cortex M0 example (~20 kGE)

16.25 mm CMOS 40 nm CMOS 140 nm

slide-21
SLIDE 21

Block Cipher – HW Perspective

  • Memory
      – Key size ≥ 80 bits
      – Block size ≥ 32 bits
  • Round function + Key schedule + Control logic
  • Minimize!
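To see why memory dominates a small cipher, one can estimate the storage area from the bit counts alone. This is a rough sketch, assuming every key and state bit sits in its own flip-flop; the 6.25 GE per flip-flop is one of the scan flip-flop areas plotted later in the deck, used here only as an example value, and the function name is hypothetical:

```python
def storage_area_ge(key_bits: int, block_bits: int, ge_per_flip_flop: float) -> float:
    """Area of the key and state registers, in GE.

    Assumes one flip-flop per stored bit (no hard-wired key, no RAM).
    """
    return (key_bits + block_bits) * ge_per_flip_flop


# Minimum sizes from the slide: 80-bit key, 32-bit block.
print(storage_area_ge(80, 32, 6.25))  # 700.0
```

Under these assumptions the registers alone cost several hundred GE, so the round function, key schedule, and control logic are the parts left to minimize.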

slide-22
SLIDE 22

KATAN – The Smallest Block Cipher


slide-23
SLIDE 23

KATAN – The Smallest Block Cipher

KATAN32 = 315 GE + 508 bits of ROM

Only 508 bits of expanded key!

slide-24
SLIDE 24

KATAN in numbers

  • It’s (one of) the smallest known cipher(s): < 500 GE
  • But it’s not very fast: 254 clock cycles
  • Still scalable: 3 times faster for negligible area overhead

slide-25
SLIDE 25

KATAN vs Competition

Cipher    Technology                Synthesis tool   Area
SPECK     IBM130                    Synopsys         ≥ 580 GE
KLEIN     TSMC180                   Synopsys         ≥ 1.3 kGE
Piccolo   130nm                     Synopsys         ≥ 700 GE
LED       180nm                     Synopsys         ≥ 700 GE
KATAN     NXP140                    Cadence          ≥ 460 GE
PRESENT   UMC180, IHP250, AMIS350   Synopsys         ~1 kGE
SIMON     IBM130                    Synopsys         ≥ 520 GE

slide-26
SLIDE 26

Memory Elements in different CMOS Technologies

[Bar chart: AREA OF SCAN-FF [GE] for NXP 90NM, UMC 130NM, UMC 180NM, and NANGATE 45NM; plotted values: <5, 6.25, 6.67, 7.67]

slide-27
SLIDE 27

SPONGENT in different CMOS Technologies

AREA [GE] (one row per CMOS technology; the technology labels are not preserved)

VERSION 1   VERSION 2   VERSION 3   VERSION 4   VERSION 5
521         737         918         1192        1340
738         1060        1329        1728        1950
759         1103        1367        1768        2012
868         1256        1571        2071        2323

up to 70% difference!
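The spread between technologies can be recomputed from the plotted values. The sketch below assumes the twenty numbers group into four technologies with five SPONGENT versions each (the grouping in which every row increases monotonically); the variable and function names are hypothetical:

```python
# Rows: one CMOS technology each (labels not recoverable from the slide).
# Columns: SPONGENT versions 1 to 5. Values in GE.
AREAS_GE = [
    [521, 737, 918, 1192, 1340],
    [738, 1060, 1329, 1728, 1950],
    [759, 1103, 1367, 1768, 2012],
    [868, 1256, 1571, 2071, 2323],
]


def spread_percent(areas):
    """Per version: how much larger the biggest result is than the smallest."""
    spreads = []
    for column in zip(*areas):  # iterate over versions
        smallest, largest = min(column), max(column)
        spreads.append(100.0 * (largest - smallest) / smallest)
    return spreads


for version, spread in enumerate(spread_percent(AREAS_GE), start=1):
    print(f"version {version}: {spread:.1f}% difference")
```

With this grouping the per-version difference lies between roughly 67% and 74%, the same ballpark as the figure quoted on the slide.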

slide-28
SLIDE 28

How can we do a fair comparison?

Difficult in practice. But why not at least use an open-core library? http://www.nangate.com/

slide-29
SLIDE 29

Performance – Designing the Fastest Block Cipher


slide-30
SLIDE 30

Latency vs Throughput

Serial processing

  • Latency = 15 s
  • Throughput = 0.067 beer/s

slide-31
SLIDE 31

Latency vs Throughput

Parallel processing!

  • Latency = 15 s
  • Throughput = 0.2 beer/s

slide-32
SLIDE 32

Latency vs Throughput

Pipelining!

  • Latency = 15 s
  • Throughput = 0.2 beer/s

slide-33
SLIDE 33

Latency vs Throughput

bottom-up! Unrolling!

  • Latency = 5 s
  • Throughput = 0.2 beer/s
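The four latency/throughput pairs can be reproduced with a few lines of arithmetic. This sketch assumes a 15 s task, three parallel units, three pipeline stages, and a threefold speed-up from unrolling; those factors are not stated on the slides and are chosen only because they match the quoted numbers:

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    name: str
    latency_s: float   # time for one item, start to finish
    throughput: float  # items finished per second in steady state


def serial(task_s: float) -> Schedule:
    return Schedule("serial", task_s, 1.0 / task_s)


def parallel(task_s: float, units: int) -> Schedule:
    # Each unit is as slow as before, but `units` items finish together.
    return Schedule("parallel", task_s, units / task_s)


def pipelined(task_s: float, stages: int) -> Schedule:
    # One item still needs all stages; a new item finishes every stage time.
    return Schedule("pipelined", task_s, 1.0 / (task_s / stages))


def unrolled(task_s: float, speedup: int) -> Schedule:
    # The task itself completes faster, so latency drops as well.
    latency = task_s / speedup
    return Schedule("unrolled", latency, 1.0 / latency)


for s in (serial(15), parallel(15, 3), pipelined(15, 3), unrolled(15, 3)):
    print(f"{s.name:9s} latency = {s.latency_s:4.1f} s, "
          f"throughput = {s.throughput:.3f} beer/s")
```

Only the last option improves latency; parallelism and pipelining raise throughput while each individual item takes just as long.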

slide-34
SLIDE 34

Latency of Existing Ciphers – Is Lightweight = “Light + Wait”?

CIPHER     BLOCK-SIZE   KEY-SIZE      S-BOX   P-LAYER           K-SCHEDULE
AES        128          128           8       MDS               LIGHT
NOEKEON    128          128           4       BINARY            NO
MINI-AES   64           64            4       MDS               LIGHT
MCRYPTON   64           64, 96, 128   4       BINARY            LIGHT
PRESENT    64           80, 128       4       BIT PERMUTATION   LIGHT
KLEIN      64           64, 80, 96    4       MDS               LIGHT
LED        64           64, 128       4       MDS               NO

slide-35
SLIDE 35

Unrolled HW Architectures


slide-36
SLIDE 36

Results – Latency (CMOS 90 nm)

[Bar chart: LATENCY [NS] of 14 unrolled designs; the bar labels are not preserved]

1-cycle: 17.8 15.3 20.3 25.3 31.2 46.6 9.8 9.8 9.8 9.9 14.8 15.5 14.8 14.7
2-cycle: 20.2 16.4 21.4 26.4 32.8 48.2 10.8 10.8 11 12 17 17.4 16.4 16.6

slide-37
SLIDE 37

Number of Rounds vs Key Size


slide-38
SLIDE 38

Results – Area (CMOS 90 nm)

[Bar chart: AREA [KGE] of 14 unrolled designs; the bar labels are not preserved]

1-cycle: 366.6 48.2 63.7 79.9 128.7 193.1 41.3 40.4 41.4 40 102.5 49.5 72.3 73.8
2-cycle: 191.8 24.9 32.6 41.3 63.5 96 20.9 21.1 21 22 49.6 27.1 37.6 37.1

slide-39
SLIDE 39

Low Latency Encryption – S-box

  • Use small S-boxes (e.g. 5-bit, 4-bit, 3-bit)
  • Almost everything follows the normal distribution. So does the S-box!

[Figure annotation: "choose me!"]

slide credit: Gregor Leander

slide-40
SLIDE 40

Low Latency Encryption – Number of Rounds vs Round Complexity

  • Not too low complexity.
  • Reduce the number of rounds at the cost of a (slightly) heavier round.
slide-41
SLIDE 41

Low Latency Encryption – Key Schedule

  • Number of rounds should be independent of the key schedule.
  • Use constant addition instead of a key schedule (if possible).

slide-42
SLIDE 42

Low Latency Encryption – Encryption vs Decryption

  • Use involution where possible: $f(f(x)) = x$.
  • Make Encryption and Decryption procedures similar.
  • BUT: think application oriented – sometimes it is beneficial to have asymmetric constructions.
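The involution property can be checked mechanically for any lookup table. A minimal sketch, using a hypothetical 4-bit table built from swapped pairs; it is not the S-box of any real cipher and says nothing about cryptographic strength:

```python
def build_from_pairs(pairs, size=16):
    """Build a lookup table that swaps the two values in each pair."""
    table = list(range(size))
    for a, b in pairs:
        table[a], table[b] = b, a
    return table


def is_involution(table) -> bool:
    """True if applying the table twice returns every input: f(f(x)) = x."""
    return all(table[table[x]] == x for x in range(len(table)))


# Hypothetical 4-bit involutive S-box: every value is swapped with a partner.
TOY_SBOX = build_from_pairs(
    [(0, 5), (1, 9), (2, 14), (3, 7), (4, 12), (6, 10), (8, 15), (11, 13)]
)

print(is_involution(TOY_SBOX))                   # True
print(is_involution(list(range(1, 16)) + [0]))   # False: a rotation is not
```

Because such a table is its own inverse, the same hardware block serves both encryption and decryption, which is the saving the first bullet points at.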

slide-43
SLIDE 43

Low Latency Encryption – Meet PRINCE

α-reflection property:

slide-44
SLIDE 44

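Low Latency Encryption – Meet PRINCE

[Bar chart: LATENCY [NS] of the 14 unrolled 1-cycle designs next to PRINCE; the bar labels are not preserved]

Existing ciphers: 17.8 15.3 20.3 25.3 31.2 46.6 9.8 9.8 9.8 9.9 14.8 15.5 14.8 14.7
PRINCE: 8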

slide-45
SLIDE 45

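Low Latency Encryption – Meet PRINCE

[Bar chart: AREA [KGE] of the 14 unrolled 1-cycle designs next to PRINCE; the bar labels are not preserved]

Existing ciphers: 366.6 48.2 63.7 79.9 128.7 193.1 41.3 40.4 41.4 40 102.5 49.5 72.3 73.8
PRINCE: 17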

slide-46
SLIDE 46

Power/Energy – Future of Lightweight Crypto


slide-47
SLIDE 47

History of Lightweight Crypto (Area)

[Scatter plot "Lightweight Block Ciphers": AREA (GE), 500 to 3500, vs YEAR, 1970 to 2020; labeled points: DES, AES, PRESENT, KATAN]

slide-48
SLIDE 48

History of Lightweight Crypto (Latency)

[Scatter plot "Lightweight Block Ciphers": LATENCY (# CLOCK CYCLES), log scale from 1 to 10000, vs YEAR, 1970 to 2020]

slide-49
SLIDE 49

History of Lightweight Crypto (Energy*)

[Scatter plot "Lightweight Block Ciphers": AREA * LATENCY, log scale from 1 to 100000000, vs YEAR, 1970 to 2020]

slide-50
SLIDE 50

Future of Lightweight Crypto (Energy*)

[Scatter plot "Lightweight Block Ciphers": AREA * LATENCY, log scale from 1 to 100000000, vs YEAR, 1970 to 2020; annotations: PRINCE, "Future of LWC"]

slide-51
SLIDE 51

Energy


=

slide-52
SLIDE 52

Power


=

slide-53
SLIDE 53

Passive RFID: Low Power Applications

[Block diagram: RFID tag and Reader linked over RF; labeled blocks: CONTROL, MEMORY, CRYPTO, RF, NETWORK INTERFACE, RF]

slide-54
SLIDE 54

Anything Battery Powered: Low Energy Applications


slide-55
SLIDE 55

Every mW matters!

  • Total number of mobile devices in 2015 = 9.5 billion*
  • Average (regular) power consumption of a smartphone = 160 mW**
  • Total energy spent = €2.8 billion*** a year!

* Mobile Statistics Report 2015-2019, The Radicati Group Inc.
** An Analysis of Power Consumption in a Smartphone, A. Carroll, G. Heiser, USENIX 2010.
*** Average electricity price in the EU in 2014 was €0.208 per kWh.
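The €2.8 billion figure follows from the three numbers on the slide. The sketch below assumes every device draws the average power continuously for a full year of 365 days; the function name is hypothetical:

```python
HOURS_PER_YEAR = 365 * 24  # assumption: continuous draw all year


def yearly_energy_cost_eur(devices: float, avg_power_w: float,
                           price_eur_per_kwh: float) -> float:
    """Yearly electricity cost of a device population, in euros."""
    total_power_kw = devices * avg_power_w / 1000.0
    energy_kwh = total_power_kw * HOURS_PER_YEAR
    return energy_kwh * price_eur_per_kwh


cost = yearly_energy_cost_eur(9.5e9, 0.160, 0.208)
print(f"{cost / 1e9:.1f} billion EUR per year")  # 2.8 billion EUR per year
```

At this scale each milliwatt saved per device is worth roughly 9.5e9 × 0.001 W × 8760 h × €0.208/kWh, about €17 million a year under the same assumptions.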

slide-56
SLIDE 56

Cisco estimates…


… 50 billion* connected devices by 2020

* http://www.cisco.com/web/solutions/trends/iot/portfolio.html

slide-57
SLIDE 57

Intel says…


… 200 billion* smart devices by 2020

* http://www.intel.com/content/www/us/en/internet-of-things/infographics/guide-to-iot.html

slide-58
SLIDE 58

Moving Bits vs Moving People & Things


* http://www.tech-pundit.com/wp-content/uploads/2013/07/Cloud_Begins_With_Coal.pdf

slide-59
SLIDE 59

World’s ICT Energy Consumption


* http://www.tech-pundit.com/wp-content/uploads/2013/07/Cloud_Begins_With_Coal.pdf

The ICT ecosystem uses about 1500 TWh of electricity annually and approaches 10% of world electricity generation!

slide-60
SLIDE 60

What can Crypto do about it?


Become Lightweight Crypto!

slide-61
SLIDE 61

Power Consumption in (Crypto) HW

$P_{tot} = P_{switching} + P_{leakage}$

$P_{switching} \approx C_{eff} \cdot V_{DD}^2 \cdot f_{clk} \cdot sw$

$P_{switching} \gg P_{leakage}$

slide-62
SLIDE 62

Energy Consumption in (Crypto) HW

$E = P \cdot t = P \cdot \frac{N}{f_{clk}}$

$E \approx C_{eff} \cdot V_{DD}^2 \cdot N \cdot sw$
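The relation between the power and energy formulas can be checked numerically. In this sketch N is read as the number of clock cycles per operation and sw as the switching activity factor; the component values are hypothetical and leakage is ignored, as on the slide:

```python
import math


def switching_power_w(c_eff_f: float, v_dd_v: float, sw: float,
                      f_clk_hz: float) -> float:
    """P ~ C_eff * V_DD^2 * f_clk * sw (leakage neglected)."""
    return c_eff_f * v_dd_v ** 2 * f_clk_hz * sw


def energy_j(c_eff_f: float, v_dd_v: float, sw: float, n_cycles: int) -> float:
    """E ~ C_eff * V_DD^2 * N * sw for an operation taking N clock cycles."""
    return c_eff_f * v_dd_v ** 2 * n_cycles * sw


# Hypothetical circuit: 1 pF effective capacitance, 1.2 V supply,
# 10% switching activity, 254 cycles per encryption.
C_EFF, V_DD, SW, N = 1e-12, 1.2, 0.1, 254

for f_clk in (100e3, 10e6):
    power = switching_power_w(C_EFF, V_DD, SW, f_clk)
    energy_via_power = power * N / f_clk  # E = P * t with t = N / f_clk
    print(f"f_clk = {f_clk:>10.0f} Hz: P = {power:.3e} W, "
          f"E = {energy_via_power:.3e} J")
    assert math.isclose(energy_via_power, energy_j(C_EFF, V_DD, SW, N))
```

Raising the clock by a factor of 100 raises power by the same factor, while the energy per operation stays the same: in this model the clock frequency is a power knob, not an energy knob.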

slide-63
SLIDE 63

Reducing Power and Energy Consumption

$P \approx C_{eff} \cdot V_{DD}^2 \cdot sw \cdot f_{clk}$

$E \approx C_{eff} \cdot V_{DD}^2 \cdot sw \cdot N$

  • Reduce circuit area (e.g. serializing): $C_{eff} \downarrow$, but $N \uparrow$
  • Reduce switching activity (e.g. clock gating): $sw \downarrow$
  • Move to smaller CMOS technologies: $C_{eff} \downarrow$, $V_{DD} \downarrow$, but $P_{leakage} \uparrow$
  • Reduce the operating clock frequency: $f_{clk} \downarrow$
  • Reduce the latency: $N \downarrow$
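Since effective capacitance grows with area and N is the latency in cycles, the product area × latency is the energy proxy used in the history plots earlier in the deck. The sketch below applies it to two designs, taking figures quoted on earlier slides at face value (about 460 GE and 254 cycles for KATAN, about 17 kGE in a single cycle for unrolled PRINCE); treating them as directly comparable is an assumption:

```python
def area_latency_product(area_ge: float, latency_cycles: int) -> float:
    """Crude energy proxy: E is roughly proportional to area * cycles."""
    return area_ge * latency_cycles


# (area in GE, latency in clock cycles), read off earlier slides.
designs = {
    "KATAN (serialized)": (460, 254),
    "PRINCE (unrolled)": (17_000, 1),
}

for name, (area, cycles) in designs.items():
    print(f"{name:20s} area x latency = {area_latency_product(area, cycles):>9,.0f}")
```

The much larger unrolled design still comes out several times lower on this proxy, which illustrates the first and last bullets: shrinking area by serializing can cost more in N than it saves in capacitance.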

slide-64
SLIDE 64

Security – Know Your Enemy

slide-65
SLIDE 65


slide-66
SLIDE 66

Power (Current) Measurement Setup

[Figure: two measurement setups for a chip containing RAM, CPU, COPROS, and PERIPHERALS: a resistor R in the VDD line carrying the current i, and EM-probes picking up the current i]

slide-67
SLIDE 67

SIMPLE POWER ANALYSIS

slide-68
SLIDE 68

SPA – Public Key Crypto

  • Modular exponentiation: $x = m^k \bmod N$

    x ← m
    for i = n−2 down to 0
        x ← x²
        if (k_i = 1) then
            x ← m · x
    endfor
    return x

Power consumption depends on the value of the secret bit $k_i$!
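The leak in the square-and-multiply loop can be made concrete by logging which operations run. This is an illustrative software model, not a measurement: "S" and "M" stand in for the distinguishable power patterns of a squaring and a multiplication, and the helper names are hypothetical:

```python
def square_and_multiply(m: int, k: int, modulus: int):
    """Left-to-right square-and-multiply.

    Returns (m**k mod modulus, trace), where trace records every
    squaring ("S") and multiplication ("M") in execution order.
    """
    if k <= 0:
        raise ValueError("exponent must be positive")
    bits = bin(k)[2:]            # k_{n-1} ... k_0, with k_{n-1} = 1
    x = m % modulus
    trace = []
    for bit in bits[1:]:         # i = n-2 down to 0
        x = (x * x) % modulus
        trace.append("S")
        if bit == "1":           # the multiplication only runs for 1-bits
            x = (m * x) % modulus
            trace.append("M")
    return x, "".join(trace)


def key_from_trace(trace: str) -> int:
    """Recover the exponent from the operation sequence alone."""
    bits = "1"                   # the top bit is implicit
    i = 0
    while i < len(trace):
        if i + 1 < len(trace) and trace[i + 1] == "M":
            bits += "1"          # "SM" pattern: key bit 1
            i += 2
        else:
            bits += "0"          # lone "S": key bit 0
            i += 1
    return int(bits, 2)


result, trace = square_and_multiply(7, 0b1011, 1_000_003)
print(result == pow(7, 0b1011, 1_000_003), trace, bin(key_from_trace(trace)))
```

An attacker who can tell squarings from multiplications in a single power trace reads the exponent directly, which is why one trace suffices for SPA.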
slide-69
SLIDE 69

SPA – Symmetric Key Crypto (1)

[Figure: chip block diagram with RAM, CPU, COPROS, and PERIPHERALS]

slide-70
SLIDE 70

DIFFERENTIAL POWER ANALYSIS

slide-71
SLIDE 71

I-V characteristic of CMOS inverter

[Figure: CMOS inverter characteristic; labels: Vi, VDD, ID]

slide-72
SLIDE 72

CMOS timing delays ($t_r$, $t_f$, $t_{PHL}$, $t_{PLH}$, $t_{THL}$, $t_{TLH}$)

[Figure: inverter timing waveforms; labels: Vi, Vo, VDD]

slide-73
SLIDE 73

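Masking – XOR gate example

[Figure: XOR gate with inputs x, y and output z, and its masked version with shares x1, y1, x2, y2 producing z1, z2]

$x \oplus y = (x_1 \oplus x_2) \oplus (y_1 \oplus y_2) = (x_1 \oplus y_1) \oplus (x_2 \oplus y_2) = z_1 \oplus z_2 = z$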

slide-74
SLIDE 74

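Masking – AND gate example

[Figure: AND gate with inputs x, y and output z, and its masked version with AND inputs (x1, y1), (x1, y2), (x2, y2), (x2, y1), input z1, and outputs z1, z2]

$x \cdot y = (x_1 \oplus x_2) \cdot (y_1 \oplus y_2) = z_1 \oplus z_1 \oplus (x_1 \cdot y_1) \oplus (x_1 \cdot y_2) \oplus (x_2 \cdot y_1) \oplus (x_2 \cdot y_2) = z_1 \oplus z_2 = z$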
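The two masked-gate identities can be verified exhaustively in software. In this sketch z1 of the AND gate is treated as a fresh random mask bit, which is how the z1 ⊕ z1 term in the equation is read here; the function names are hypothetical, and a functional model like this says nothing about glitches or timing in real hardware:

```python
import itertools
import secrets


def share(bit: int):
    """Split a bit into two shares whose XOR is the original bit."""
    mask = secrets.randbits(1)
    return mask, bit ^ mask


def masked_xor(x1, x2, y1, y2):
    """Share-wise XOR: each output share depends on one share per input."""
    return x1 ^ y1, x2 ^ y2


def masked_and(x1, x2, y1, y2, z1):
    """Masked AND: z1 is a fresh random bit, z2 carries the masked product."""
    z2 = z1 ^ (x1 & y1) ^ (x1 & y2) ^ (x2 & y1) ^ (x2 & y2)
    return z1, z2


# Exhaustive functional check over all share and mask values.
for x1, x2, y1, y2, z1 in itertools.product((0, 1), repeat=5):
    x, y = x1 ^ x2, y1 ^ y2
    a1, a2 = masked_xor(x1, x2, y1, y2)
    assert a1 ^ a2 == x ^ y
    b1, b2 = masked_and(x1, x2, y1, y2, z1)
    assert b1 ^ b2 == x & y

x1, x2 = share(1)
y1, y2 = share(1)
z1, z2 = masked_and(x1, x2, y1, y2, secrets.randbits(1))
print(z1 ^ z2)  # 1
```

The XOR gate never mixes shares of the same variable, whereas the AND gate must combine them, so its intermediate values are where extra care is needed.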

slide-75
SLIDE 75

Masking – AND gate example (delays cause leakage)

[Figure: masked AND gate with AND inputs (x1, y1), (x1, y2), (x2, y2), (x2, y1), input z1, and outputs z1, z2]

[Table with columns y1, y2, y, and switching; the rows are only partly preserved. Surviving entries: 1 AND, 1 XOR; 1 AND, 2 XOR; 2 AND, 2 XOR]

  • y = 0 => 2 AND, 2 XOR gates switching on average
  • y = 1 => 2 AND, 3 XOR gates switching on average

slide-76
SLIDE 76

Masking with Sufficient Noise

slide credit: Marcel Medwed

slide-77
SLIDE 77

CORRELATION POWER ANALYSIS

slide-78
SLIDE 78

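Correlation Power Analysis

[Figure: S-BOX (8-bit) with plaintext ($p$), key ($k$), and ciphertext ($c$)]

Hypothesis: $x_j^i = HW(k_i \oplus p_j)$, with $0 \le i \le 255$ and $0 \le j < n$ ($n$ = # of traces)

Measured current: $y_j = f(HW(K \oplus p_j))$

Pearson correlation:

$\rho_{xy}^i = \frac{\sum_{j=0}^{n-1} (x_j^i - \bar{x}^i)(y_j - \bar{y})}{\sqrt{\sum_{j=0}^{n-1} (x_j^i - \bar{x}^i)^2 \cdot \sum_{j=0}^{n-1} (y_j - \bar{y})^2}}$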

slide-79
SLIDE 79

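Correlation Power Analysis

[Figure: S-BOX (8-bit) with plaintext ($p$), key ($k$), and ciphertext ($c$)]

Hypothesis: $x_j^i = HW(SBOX(k_i \oplus p_j))$, with $0 \le i \le 255$ and $0 \le j < n$ ($n$ = # of traces)

Measured current: $y_j = f(HW(SBOX(K \oplus p_j)))$

Pearson correlation:

$\rho_{xy}^i = \frac{\sum_{j=0}^{n-1} (x_j^i - \bar{x}^i)(y_j - \bar{y})}{\sqrt{\sum_{j=0}^{n-1} (x_j^i - \bar{x}^i)^2 \cdot \sum_{j=0}^{n-1} (y_j - \bar{y})^2}}$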
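The attack can be simulated end to end in a few dozen lines. This is a toy sketch under loud assumptions: the 8-bit S-box is a seeded random permutation (not the S-box of any real cipher), the "measured current" is the Hamming weight of the S-box output plus Gaussian noise, and all function names are hypothetical:

```python
import math
import random


def hamming_weight(value: int) -> int:
    return bin(value).count("1")


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                    * sum((y - mean_y) ** 2 for y in ys))
    return num / den if den else 0.0


def make_toy_sbox(seed: int = 1):
    """Hypothetical 8-bit bijective S-box (a seeded random permutation)."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table


def simulate_traces(sbox, key, n, noise_sigma, seed=2):
    """Model y_j = HW(SBOX(K xor p_j)) + noise for n random plaintexts."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(256) for _ in range(n)]
    currents = [hamming_weight(sbox[key ^ p]) + rng.gauss(0.0, noise_sigma)
                for p in plaintexts]
    return plaintexts, currents


def cpa_recover_key(sbox, plaintexts, currents):
    """Return the key guess whose hypothesis correlates best with the traces."""
    best_guess, best_rho = 0, -2.0
    for guess in range(256):
        hypothesis = [hamming_weight(sbox[guess ^ p]) for p in plaintexts]
        rho = pearson(hypothesis, currents)
        if rho > best_rho:
            best_guess, best_rho = guess, rho
    return best_guess, best_rho


sbox = make_toy_sbox()
plaintexts, currents = simulate_traces(sbox, key=0x3C, n=300, noise_sigma=1.0)
guess, rho = cpa_recover_key(sbox, plaintexts, currents)
print(f"recovered key byte: {guess:#04x}, correlation: {rho:.2f}")
```

Only the correct guess predicts the S-box output for every plaintext, so its correlation stands out while wrong guesses stay near zero; more noise simply means more traces are needed.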

slide-80
SLIDE 80

COST OF COUNTERMEASURES

slide-81
SLIDE 81

Security comes at a Price – Area Overhead

  • Insecure: X GE
  • SCA-secure: 5X GE
  • SCA&FA-secure: 10X GE

slide-82
SLIDE 82

Security comes at a Price – Performance Penalty

  • Insecure: N s
  • SCA-secure: ~3-5N s
  • SCA&FA-secure: ~8-10N s

slide-83
SLIDE 83

Challenges – SCA Countermeasures

INCOMPLETE MODELS

  • Circuit models
  • Adversary models
  • 1st vs higher order DPA
slide-84
SLIDE 84

Challenges – FA Countermeasures

LACK OF CREATIVITY

  • Redundant executions
  • Dummy operations
  • Light sensors

slide-85
SLIDE 85

THANK YOU!


Thanks to the teams of KATAN, SPONGENT, PRINCE, FIDES

slide-86
SLIDE 86

Workshop on Crypto Design for IoT


https://www.cosic.esat.kuleuven.be/ecrypt_net_iot_workshop_2016/