Lightweight Cryptography and RFID Security
Svetla Nikova
COSIC, KULeuven and UTwente
Overview
- Lightweight cryptography - state of the art
- Comparison Standard vs. Lightweight
- SCA countermeasures - TI approach
- TI implementations
- Area, Power or Throughput?
- Conclusions
Lightweight Crypto
Stream ciphers (3): only the eStream finalists:
2005 Grain, Trivium, Mickey
Block ciphers (25):
1977 DES
1989 GOST
1997 XTEA
1998 AES
2005 mCrypton, STEA
2006 Hight, SEA
2007 Clefia, Kasumi, DESL, DESXL, Present
2008 Puffin
2009 Katan, Ktantan, Hummingbird, MIBS
2010 PRINT
2011 Klein, LED, Twine, EPCBC, Vitamin-B, Piccolo
Hash functions (10):
2007 MAME
2008 Squash, DM-Present, H-Present, Keccak
2010 Quark, Armadillo
2011 Spongent, Vitamin-H, Photon
Standard vs. Lightweight
Modules            AES-T    Present
Memory [GE]         2040        887
Mix Column [GE]      306          -
S-box [GE]           408         32
FSM + Rest [GE]      646        192
Total Area [GE]     3400       1111
Cycles              1000        547
Power [μA]           3.0       1.34

AES-T (TUG) 2005: 0.35 µm CMOS technology of Philips, 100 kHz @ 1.5 V, encryption + decryption.
Present (RUB/DTU/ORANGE) 2007: UMC L180 0.18 μm 1P6M, 100 kHz @ 1.8 V, encryption only.

Difficult to compare:
- Different technology: GE differs.
- Power depends even more on the technology used.
- Here 0.35 µm vs. 0.18 μm is a big technology difference!
- Cycles do not depend on technology.
- But AES = 128 bits while Present = 64 bits.
- Some implementations are encryption only; others include decryption too.
- Including decryption in AES adds cost in MixColumn and FSM.
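The block-size caveat can be made concrete by normalizing the cycle counts to bits processed per clock cycle. A minimal sketch, using the cycle counts and block sizes quoted on this slide; the function name is illustrative and not part of the talk:

```python
def bits_per_cycle(block_bits: int, cycles: int) -> float:
    """Throughput normalized to the block size, in bits per clock cycle."""
    return block_bits / cycles

# Cycle counts from the table; block sizes as stated in the bullets.
aes_t = bits_per_cycle(128, 1000)   # AES-T: 128-bit block, 1000 cycles
present = bits_per_cycle(64, 547)   # Present: 64-bit block, 547 cycles

print(f"AES-T:   {aes_t:.3f} bit/cycle")
print(f"Present: {present:.3f} bit/cycle")
```

Per bit the two designs end up close to each other, even though Present needs roughly half as many cycles per block.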
Standard (hits back) vs. Lightweight
Modules            AES-T    AES-B    Present
Memory [GE]         2040     1678        887
Mix Column [GE]      306      373          -
S-box [GE]           408      233         32
FSM + Rest [GE]      646      317        192
Total Area [GE]     3400     2601       1111
Cycles              1000      226        547
Power [μA]           3.0      3.7       1.34

AES-T: 0.35 µm CMOS technology of Philips, 100 kHz @ 1.5 V, encryption + decryption.
Present and AES-B: UMC L180 0.18 μm 1P6M, 100 kHz @ 1.8 V, encryption only.
Fair comparison is now possible.
Standard designs hit back: AES-B (RUB/NTU) 2011
- Memory becomes smaller in GE due to the technology change.
- MixColumns becomes bigger, but this is the trade-off in order to gain more in the FSM.
- Canright's S-box is used, which is smaller, but not by as much as indicated (again because of the technology change).
- It is difficult to compare the FSM since AES-T also contains the decryption; still, the AES-B state machine is smaller.
Standard vs. Lightweight (updated)
Modules            AES-B    Present
Memory [GE]         1678        887
Mix Column [GE]      373          -
S-box [GE]           233         32
FSM + Rest [GE]      317        192
Total Area [GE]     2601       1111
Cycles               226        547
Power [μA]           3.7       1.34

Smaller key and block size
- 128 bit: too much
- 80 bit key and 64 bit data: ok
- 32, 48 bit data might be acceptable?

Key + data [bits]    Relative storage
128 + 128            100 %
80 + 64              56.25 %
80 + 48              50 %
80 + 32              43.75 %

- Memory (share of total area)
  - 65% for AES-B
  - 80% for Present
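The percentages on this slide follow directly from the key and block sizes and from the area table. A short sketch that reproduces them; the helper names are illustrative, and the 128 + 128 bit reference is the AES configuration listed on the slide:

```python
def storage_ratio(key_bits: int, data_bits: int, ref_bits: int = 128 + 128) -> float:
    """State to be stored (key + data) relative to the 128 + 128 bit reference."""
    return (key_bits + data_bits) / ref_bits

def memory_share(memory_ge: int, total_ge: int) -> float:
    """Fraction of the total area that is spent on memory."""
    return memory_ge / total_ge

for key_bits, data_bits in [(128, 128), (80, 64), (80, 48), (80, 32)]:
    print(f"{key_bits} + {data_bits}: {100 * storage_ratio(key_bits, data_bits):.2f} %")

print(f"AES-B memory share:   {100 * memory_share(1678, 2601):.0f} %")
print(f"Present memory share: {100 * memory_share(887, 1111):.0f} %")
```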
- P-layer costs 0 for Present.
- Simple FSM can save a lot.
- An 8x8 S-box costs ~300 GE, or at least 200.
- A 4x4 S-box costs ~50 GE, or at least 30.
- Saving of 6 to 7 times in the S-box.
- Weaker S-box and P-layer compensated by a larger number of rounds: 31 vs. 10.
Still, the lightweight cipher is less than half the size, and its power consumption is ~3 times lower.
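The two ratios behind this statement can be checked from the AES-B and Present columns of the table. A minimal sketch; the dictionary layout is only an illustration:

```python
# Total area [GE] and power [μA] as listed for AES-B and Present.
aes_b = {"area_ge": 2601, "power_ua": 3.7}
present = {"area_ge": 1111, "power_ua": 1.34}

area_ratio = aes_b["area_ge"] / present["area_ge"]
power_ratio = aes_b["power_ua"] / present["power_ua"]

print(f"AES-B / Present area:  {area_ratio:.2f}x")
print(f"AES-B / Present power: {power_ratio:.2f}x")
```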
Side-Channel Attacks
- Device executing the cryptographic algorithm leaks information on internal state
- Instantaneous leakage depends on intermediate variables, which results in equations
  - that have lower nonlinearity
  - that may contain noise
Power consumption depends on:
- Instructions executed
- Data processed
Signal is noisy; multiple measurements needed
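Why multiple measurements are needed can be illustrated with a toy simulation. The sketch below assumes a Hamming-weight leakage model with additive Gaussian noise; that model, the noise level and the example byte are assumptions for illustration and are not taken from the slides:

```python
import random

def hamming_weight(value: int) -> int:
    """Number of set bits in value."""
    return bin(value).count("1")

def measure(value: int, sigma: float, rng: random.Random) -> float:
    """One simulated power sample: Hamming weight plus Gaussian noise (assumed model)."""
    return hamming_weight(value) + rng.gauss(0.0, sigma)

def averaged_measurement(value: int, n: int, sigma: float, rng: random.Random) -> float:
    """Mean of n independent samples; the noise shrinks roughly as sigma / sqrt(n)."""
    return sum(measure(value, sigma, rng) for _ in range(n)) / n

rng = random.Random(1)        # fixed seed so the run is repeatable
secret_byte = 0xA7            # Hamming weight 5
single = measure(secret_byte, 2.0, rng)
average = averaged_measurement(secret_byte, 10_000, 2.0, rng)

print(f"single sample: {single:.2f}")
print(f"mean of 10000: {average:.2f}")
```

A single sample is dominated by noise, while the average settles near the data-dependent value.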
SCA countermeasures at different levels
Hardware logic style
- Relieves cryptographers
- Places burden on hardware designers
Algorithms and implementations
- Probably lowest feasible level
Ciphers and Protocols
- New standards, takes time
Lightweight SCA protection
Simple masking is vulnerable due to glitches.
Private circuits [Ishai et al.]: too expensive, not a realistic model.
Multi-party computation (TI) made practical.
1. Correctness: z1 ⊕ z2 ⊕ z3 = z = f(x)
2. Non-completeness: z1 = f1(x1,x2), z2 = f2(x1,x3), z3 = f3(x2,x3)
3. Independent uniform distribution of input
Power consumption of each fi is independent of the unshared input x, because each fi misses one of the shares x1, x2, x3.
Secure in the presence of glitches (transition count model) against 1st order SCA.
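Property 3 together with non-completeness is what makes each component function harmless: with a uniform sharing, any two of the three shares carry no information about x. A small exhaustive check for a single bit, written as a sketch with illustrative function names:

```python
from collections import Counter
from itertools import combinations, product

def all_sharings(x: int):
    """Every 3-share Boolean sharing (x1, x2, x3) of the bit x, with x1 ^ x2 ^ x3 == x."""
    return [(x1, x2, x1 ^ x2 ^ x) for x1, x2 in product((0, 1), repeat=2)]

def pair_distribution(x: int, i: int, j: int) -> Counter:
    """Joint distribution of shares i and j (0-based) over all sharings of x."""
    return Counter((s[i], s[j]) for s in all_sharings(x))

def two_shares_reveal_nothing() -> bool:
    """True if every pair of shares has the same distribution for x = 0 and x = 1."""
    return all(pair_distribution(0, i, j) == pair_distribution(1, i, j)
               for i, j in combinations(range(3), 2))

print(two_shares_reveal_nothing())
```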
Example: multiplier
- Multiplier = secure AND gate
- 3 shares
- Secure in the presence of glitches
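A 3-share AND gate of this kind can be sketched as follows. The assignment of the nine product terms to f1, f2 and f3 follows the share pattern used earlier in the talk (f1 sees shares 1 and 2, f2 sees 1 and 3, f3 sees 2 and 3); the exact term grouping is one common choice and should be read as an assumption, not as the circuit shown on the slide:

```python
from itertools import product

# Each component receives all shares but reads only two share indices (0-based):
# f1 -> {0, 1}, f2 -> {0, 2}, f3 -> {1, 2}.
def f1(x, y):
    return (x[0] & y[0]) ^ (x[0] & y[1]) ^ (x[1] & y[0])

def f2(x, y):
    return (x[2] & y[2]) ^ (x[0] & y[2]) ^ (x[2] & y[0])

def f3(x, y):
    return (x[1] & y[1]) ^ (x[1] & y[2]) ^ (x[2] & y[1])

# (component function, index of the share it must never depend on)
COMPONENTS = [(f1, 2), (f2, 1), (f3, 0)]
TRIPLES = list(product((0, 1), repeat=3))

def shared_and(x, y):
    """Shared output (z1, z2, z3) for shared inputs x and y."""
    return tuple(f(x, y) for f, _ in COMPONENTS)

def unshare(s):
    return s[0] ^ s[1] ^ s[2]

def flip(s, i):
    """Copy of the share tuple s with share i inverted."""
    t = list(s)
    t[i] ^= 1
    return tuple(t)

def is_correct():
    """Property 1: the output shares always recombine to x AND y."""
    return all(unshare(shared_and(x, y)) == (unshare(x) & unshare(y))
               for x in TRIPLES for y in TRIPLES)

def is_non_complete():
    """Property 2: flipping the excluded share never changes a component's output."""
    for f, missing in COMPONENTS:
        for x in TRIPLES:
            for y in TRIPLES:
                base = f(x, y)
                if f(flip(x, missing), y) != base or f(x, flip(y, missing)) != base:
                    return False
    return True

print(is_correct(), is_non_complete())
```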
Lightweight SCA protection
Protecting Arbitrary Functions:
- Multiplication of n elements needs at least n+1 shares
- Hardware size increases about quadratically with the number of shares
- Can we reduce the number of shares?
- Hence with 3 shares we can protect only the quadratic functions.
Pipelining:
- Registers are insensitive to glitches
- Split functions into parts with less non-linearity
- Use registers between combinatorial parts
Problem:
- Property 3: the inputs of each step need to be independent and uniformly distributed
- Pipelining: output of each step is input of next step
- We need Property 3 for the output as well.
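The need for Property 3 on the output can be seen on a small case. The sketch below takes one common 3-share sharing of the AND gate (an assumption; the slides do not spell out the terms) and counts how often each output sharing occurs when the input sharings are uniform. A uniform output would hit each of the four valid output sharings equally often:

```python
from collections import Counter
from itertools import product

def shared_and(x, y):
    """One common 3-share AND: each output share misses one input share index."""
    z1 = (x[0] & y[0]) ^ (x[0] & y[1]) ^ (x[1] & y[0])
    z2 = (x[2] & y[2]) ^ (x[0] & y[2]) ^ (x[2] & y[0])
    z3 = (x[1] & y[1]) ^ (x[1] & y[2]) ^ (x[2] & y[1])
    return (z1, z2, z3)

def output_counts(x_bit: int, y_bit: int) -> Counter:
    """Histogram of output sharings over all 16 uniform input sharings of (x_bit, y_bit)."""
    counts = Counter()
    for x1, x2, y1, y2 in product((0, 1), repeat=4):
        x = (x1, x2, x1 ^ x2 ^ x_bit)
        y = (y1, y2, y1 ^ y2 ^ y_bit)
        counts[shared_and(x, y)] += 1
    return counts

def is_uniform() -> bool:
    """Uniform means each of the 4 valid output sharings occurs 16 / 4 = 4 times."""
    for x_bit, y_bit in product((0, 1), repeat=2):
        counts = output_counts(x_bit, y_bit)
        if len(counts) != 4 or set(counts.values()) != {4}:
            return False
    return True

print(is_uniform())
print(output_counts(0, 0))
```

For this sharing the output is not uniform, so feeding it straight into the next pipeline stage would violate Property 3 there.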
Lightweight SCA protection
Which functions can we protect?
The number of shares depends on the degree of the function. Hence with 3 shares we can protect only the quadratic functions.
- The multiplications in GF(2) (AND gate) and GF(4).
- The Boolean functions with 2 and 3 inputs.
- Noekeon (KUL) 2000, S-box.
S(x) = NL(L(NL(x))), pipelined implementation
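The link between degree and share count can be sketched by computing the algebraic degree from a truth table and applying the "degree + 1 shares" rule stated in the talk. The Present S-box table below is taken from the cipher's specification, not from these slides, and the helper names are illustrative:

```python
# Present S-box as given in the cipher's specification (not listed on the slides).
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
AND_GATE = [0, 0, 0, 1]  # truth table of a 2-input AND, index = (a << 1) | b

def algebraic_degree(table, out_bits: int) -> int:
    """Maximum algebraic degree over all output bits of a lookup table."""
    degree = 0
    for bit in range(out_bits):
        anf = [(v >> bit) & 1 for v in table]
        step = 1
        while step < len(anf):          # in-place Moebius transform to the ANF
            for i in range(len(anf)):
                if i & step:
                    anf[i] ^= anf[i ^ step]
            step <<= 1
        for monomial, coeff in enumerate(anf):
            if coeff:
                degree = max(degree, bin(monomial).count("1"))
    return degree

def min_shares(degree: int) -> int:
    """Lower bound on the number of shares for a function of the given degree."""
    return degree + 1

print(min_shares(algebraic_degree(AND_GATE, 1)))      # quadratic function
print(min_shares(algebraic_degree(PRESENT_SBOX, 4)))  # cubic S-box, if not decomposed
```

A cubic S-box would need more than 3 shares unless it is first split into quadratic steps, which is the motivation for the decomposition S(x) = NL(L(NL(x))).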
Noekeon Implementation Results
- Implementation using Austria Microsystems Standard Cell Library CMOS 0.35 μm
- S-Box:
  - 54 GE (implementation of 2 quadratic mappings)
  - correlation
- Protected S-Box:
  - 188 GE (excluding 12 bit register)
  - no correlation between shares and unshared values
- Less than 4x increase (actually 3.5x) in size. Note: nonlinear part only.
- A 4x4 S-box costs ~50 GE or at least 30 GE, but the S-box of Noekeon is 54 GE when decomposed in two quadratic mappings.
- Since the shared mappings are less efficient than the originals, we get slightly more than the theoretically expected 3x increase: 3.5x.
Lightweight SCA protection
Which 4x4 S-boxes can we protect?
- A. Poschmann et al. 2010: the Present S-box can also be decomposed. Hence, similar to Noekeon, Present can be shared with 3 shares only.
- There are 302 affine-equivalent classes for the 4 x 4 bijections: 295 cubic classes, 6 quadratic classes and 1 affine class.
- Bijections (permutations) in GF(2)^4 belong to the symmetric group S16.
- Theorem. A 4 x 4 bijection can be decomposed using quadratic bijections if and only if it belongs to the alternating group A16 (151 classes).
Hence there are 302/2 - 6 - 1 = 144 cubic classes in A16 which can be decomposed.
- 30 classes can be decomposed with length 2,
- the remaining 114 classes can be decomposed with length 3.
Thus 144 classes can be masked using only 3 shares.
Decomposable S-boxes: Noekeon; Present; Serpent 0, 1, 2, 6; Khazad P, Q.
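Membership in A16 is simply the parity of the S-box seen as a permutation of 16 values, so the condition in the theorem is easy to test. A sketch using the Present S-box from the cipher's specification (the table is not on the slides; the function name is illustrative):

```python
# Present S-box as given in the cipher's specification (not listed on the slides).
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def is_even_permutation(perm) -> bool:
    """True if perm (mapping i -> perm[i]) lies in the alternating group."""
    seen = [False] * len(perm)
    transpositions = 0
    for start in range(len(perm)):
        length = 0
        i = start
        while not seen[i]:              # walk one cycle of the permutation
            seen[i] = True
            i = perm[i]
            length += 1
        if length:
            transpositions += length - 1   # a cycle of length k is k - 1 transpositions
    return transpositions % 2 == 0

print(is_even_permutation(PRESENT_SBOX))

# Class counts quoted on the slide.
cubic_classes_in_a16 = 302 // 2 - 6 - 1
print(cubic_classes_in_a16, 30 + 114)
```

The even parity agrees with the slide's statement that the Present S-box is decomposable into quadratic bijections.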
PRESENT ‐ Implementation Results
Modules            Present   Present TI   Ratio
Memory [GE]            887         2635    300%
Mix Column [GE]          -            -
S-box [GE]              32          355     11x
FSM + Rest [GE]        192          592    308%
Total Area [GE]       1111         3582    322%
Cycles                 547          578    106%
Power [μA]            1.34         5.02    375%

UMC L180 0.18 μm 1P6M, 100 kHz @ 1.8 V. A. Poschmann et al. 2010.
- Memory: 80+64 bits. Shared: 3x increase.
- Efficient S-box: only 32 GE. Shared: 8.8x increase + 12 bit register (pipelined).
- The FSM increases 3 times. The pipeline increases the cycles and slightly the control.
- In total the increase is ~3x.
- Cycles: small increase only.
- But the power increases ~4x.
Lightweight SCA protection
Can we protect the 8x8 AES S-box, or only 4x4 S-boxes?
- Nonlinear part = inversion over GF(256)
- Tower field approach
- Need to ensure Property 3 in every step
- No efficient method
- Large search space
- Ongoing research to make it efficient.
Lightweight SCA protection
Can we protect the AES S-box?
- Remember we have the sharing of the multiplications in GF(4).
- But this multiplication is the only non-linear operation in the AES (Canright) S-box.
- The S-box is transformed from GF(2^8) to GF(2^8)/GF(2^4)/GF(2^2): tower field approach.
RUB/NTU 2011 [MPLPW2011]
- Recall that our countermeasure requires registers between different stages of shared functions.
- Thus Canright's S-box representation requires in total five pipelining stages.
- This implies that in total one needs to store 174 bits.
AES Implementation Results
Modules            AES-B    AES TI   Ratio
Memory [GE]         1678      5055    300%
Mix Column [GE]      373      1120    300%
S-box [GE]           233      4244     18x
FSM + Rest [GE]      317       695    219%
Total Area [GE]     2601     11114    427%
Cycles               226       266    118%
Power [μA]           3.7      13.4    362%

UMC L180 0.18 μm 1P6M, 100 kHz @ 1.8 V. RUB/NTU 2011 [MPLPW2011].
- Memory: 2x128 bits. Shared: 3x increase.
- Complex S-box: only 233 GE. Shared: 13.7x increase + 174 bit register (pipelined).
- The FSM increases only 2 times.
- In total the increase is ~4x.
- Cycles: small increase only.
- But the power increases ~4x.
Threshold Implementation [MPLPW2011]
- Present TI: first-order DPA fails with 5 million measurements (data masking, key masking, random data and key permutations).
- AES TI: with 5 million traces a correlation collision attack succeeds, because uniformity fails and resharing is required.
- With resharing, 100 million traces are still insufficient for CPA using an HD model and for MIA using an HD model; even third-order CPA with 400 million traces fails.
Threshold Implementation Results
Modules            AES-B    AES TI     Ratio   Present   Present TI     Ratio
Memory [GE]         1678      5055      300%       887         2635      300%
Mix Column [GE]      373      1120      300%         -            -
S-box [GE]           233      4244   1821.5%        32          355   1109.4%
FSM + Rest [GE]      317       695      219%       192          592      308%
Total Area [GE]     2601     11114      427%      1111         3582      322%
Cycles               226       266      118%       547          578      106%
Power [μA]           3.7      13.4      362%      1.34         5.02      375%

UMC L180 0.18 μm 1P6M, 100 kHz @ 1.8 V.
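The ratio column can be recomputed from the GE figures, which also confirms that the per-module areas add up to the totals. A minimal sketch over the numbers in the table; the dictionary layout and function name are illustrative:

```python
# Area per module [GE]: (unprotected, threshold implementation), as listed in the table.
AES = {"memory": (1678, 5055), "mix_column": (373, 1120),
       "sbox": (233, 4244), "fsm_rest": (317, 695)}
PRESENT = {"memory": (887, 2635), "sbox": (32, 355), "fsm_rest": (192, 592)}

def overhead_percent(plain: float, protected: float) -> float:
    """Protected cost expressed as a percentage of the unprotected cost."""
    return 100.0 * protected / plain

def totals(design):
    """Sum of the module areas for the unprotected and the protected design."""
    plain = sum(p for p, _ in design.values())
    protected = sum(t for _, t in design.values())
    return plain, protected

for name, design in (("AES", AES), ("Present", PRESENT)):
    plain, protected = totals(design)
    print(f"{name}: {plain} -> {protected} GE ({overhead_percent(plain, protected):.0f}%)")
    for module, (p, t) in design.items():
        print(f"  {module}: {overhead_percent(p, t):.1f}%")
```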
Area, Power or Throughput
Cipher      Area [GE]       Power [µW]        Throughput
            NXP 0.140 µm    @ 1 MHz, 1.2 V    [bit/cycle]
AES-T       3162             5.95             0.12
Present     1173             3.45             0.12
Present     1598             5.56             2.06
Katan 64     984             7.62             0.50
Katan 64    1102             8.63             0.75
Grain        861             7.40             1.00
Trivium     1298            12.02             1.00
Crypto 1     306             2.57             1.00

[Charts: Area, Power and Throughput for AES (Tina), Present, Present, Katan64, Katan64, Grain, Trivium, Crypto 1]
Power and Throughput
Road tolling example: a car passing at high speed should authenticate with an antenna/reader at a certain (height) distance.
Requirements: Distance < 10-12 m; Time < 10 ms.
Why is power so important?
- In that "extreme" example the power consumption is more important than the area.
- The excess of power can be used to improve the throughput.
Can we do crypto on an RFID tag 12 meters away?
Can we authenticate a tag in such a short time from so far away?
Toll example requirements: Distance < 10-12 m; Time < 10 ms.
So we can not only do crypto, but we can perform a full authentication even with an SCA-protected lightweight implementation.
Distance for Fixed Time 10 ms / Time at Fixed Distance 10 m

Cipher / Authentication    Time [ms] at 10 m    Distance [m] in 10 ms
AES-T                       6                   12
AES TI                     23                    7
Present                     2                   20
Present TI                  8                   11

[Charts: Distance for Fixed Time 10 ms; Time at Fixed Distance 10 m; for AES, AES TI, Present, Present TI]
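The toll requirements can be checked against these figures. The sketch below uses the time and distance values as read from the slide's table and treats the thresholds (10 ms, 10 m) as the lower end of the stated requirement; the data layout and function name are illustrative:

```python
# (time in ms at a fixed distance of 10 m, distance in m for a fixed time of 10 ms)
RESULTS = {
    "AES-T": (6, 12),
    "AES TI": (23, 7),
    "Present": (2, 20),
    "Present TI": (8, 11),
}

def meets_toll_requirements(time_ms_at_10m: float, distance_m_in_10ms: float,
                            max_time_ms: float = 10, min_distance_m: float = 10) -> bool:
    """True if the design authenticates within the time budget at the required distance."""
    return time_ms_at_10m <= max_time_ms and distance_m_in_10ms >= min_distance_m

passing = [name for name, (t, d) in RESULTS.items() if meets_toll_requirements(t, d)]
print(passing)
```

Under these assumptions the SCA-protected Present TI still fits the budget, while AES TI does not.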
Conclusions
- Young and challenging research area
- Already many interesting lightweight designs available
- New lightweight primitives should be designed with SCA protection in mind
- The semiconductor industry shows interest in implementing lightweight primitives with SCA countermeasures
- Research should focus on all parameters, not only on area
Thank you!