Differential Power Analysis (DPA) With Key Ranking and Enumeration



slide-1
SLIDE 1

Differential Power Analysis (DPA) With Key Ranking and Enumeration

Martijn Stam COINS Winterschool in Finse, May 2019

slide-2
SLIDE 2

The Rise of Side-Channels The Ideal World 2

Modern Cryptology

From Katz and Lindell’s Classic Textbook

Three Principles

1 Formal Definitions: “giving a clear description of what threats are in scope and what security guarantees are desired”

2 Precise Assumptions: “that are simpler to state, since [they] are easier to study and (potentially) refute”

3 Proofs of Security: “that a construction satisfies a definition under certain specified assumptions”

slide-3
SLIDE 3

The Rise of Side-Channels The Ideal World 3

FDH-RSA Cryptosystem

Black-box perspective of chosen-message attacks

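Key generation: $(N, d, e) \xleftarrow{\$} \mathsf{Kg}$

Signing query $M$: returns $S \leftarrow H(M)^d \bmod N$

Adversary: is given $(N, e)$, may query messages $M$ to obtain signatures $S$, and outputs $(\hat{M}, \hat{S})$

$\mathrm{Adv}^{\mathrm{euf\text{-}cma}}_{\mathrm{FDH\text{-}RSA}}(\mathcal{A}) = \Pr\left[\hat{S}^e \bmod N = H(\hat{M}) \wedge \hat{M} \text{ fresh}\right]$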
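The black-box view above can be made concrete with a toy sketch. The primes 61 and 53, the exponent 17, and SHA-256 reduced modulo N are illustrative assumptions that do not come from the slides; real parameters are far larger and a real full-domain hash is built differently.

```python
import hashlib

# Toy parameters (assumption, illustration only); requires Python 3.8+ for pow(e, -1, m).
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent


def full_domain_hash(message: bytes) -> int:
    """Stand-in for H: hash the message and reduce it modulo N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """S <- H(M)^d mod N."""
    return pow(full_domain_hash(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Accept iff S^e mod N = H(M)."""
    return pow(signature, E, N) == full_domain_hash(message)
```

In the chosen-message game the adversary only ever sees calls to `sign` and their outputs; nothing about how `pow` executes is visible to it.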

slide-4
SLIDE 4

The Rise of Side-Channels The Real World 4

The Rise of Side-Channels

Paul Kocher’s Revolution

https://www.paulkocher.com/
1996 Timing Attacks
1999 Simple and Differential Power Analysis (DPA) (with Joshua Jaffe & Benjamin Jun)
2016 https://www.youtube.com/watch?v=6lt7ExN6Kw4

Power Analysis
Measuring power consumption over time allows (relatively) easy recovery of secret keys

slide-5
SLIDE 5

The Rise of Side-Channels The Real World 5

SPA: Simple Power Analysis

A Simple Attack Against Unprotected RSA

What is SPA
SPA exploits data-dependent differences in power consumption of a single operation to recover secret information.

$S \leftarrow H(m)^d \bmod N$

s ← 1, x ← H(m)
while d > 0
    if d odd then s ← s · x mod N
    x ← x^2 mod N
    d ← ⌊d/2⌋

Simple Attack
Assume you can tell multiplications and squarings apart.
So you observe something like SMSSMSSM
Corresponds to exponent $(10101)_2 = 21$
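As a minimal sketch of this attack, the loop can be instrumented to record which operation runs, and the exponent can be read back from that record. The function names are hypothetical. Note that the loop as written multiplies before it squares and scans the exponent from its least significant bit, so for d = 21 it yields MSSMSSMS; a left-to-right square-and-multiply variant would yield SMSSMSSM for the same exponent.

```python
def modexp_with_trace(x: int, d: int, n: int):
    """Right-to-left binary exponentiation that also records its operations.

    Returns (x**d mod n, ops), where ops has 'M' for each multiplication and
    'S' for each squaring: an idealised version of what SPA observes.
    """
    s, ops = 1, []
    while d > 0:
        if d & 1:
            s = (s * x) % n
            ops.append("M")
        x = (x * x) % n
        ops.append("S")
        d >>= 1
    return s, "".join(ops)


def exponent_from_ops(ops: str) -> int:
    """Recover d from the M/S sequence produced by modexp_with_trace."""
    d, bit_index, i = 0, 0, 0
    while i < len(ops):
        if ops[i] == "M":          # a multiply precedes the square: bit is 1
            d |= 1 << bit_index
            i += 1
        i += 1                     # skip the squaring that ends this iteration
        bit_index += 1
    return d
```

Each loop iteration contributes either "MS" (bit 1) or "S" (bit 0), which is why the sequence determines the exponent completely.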

slide-6
SLIDE 6

The Rise of Side-Channels The Real World 6

DPA: Differential Power Analysis

The workhorse of side-channel attacks

What is DPA
DPA exploits data-dependent correlation in power consumption over multiple, related operations to recover secret information.

Power of DPA
Any unprotected implementation will eventually be susceptible.

Countermeasures
All implementations will need protection against side channels.

slide-7
SLIDE 7

The Rise of Side-Channels The Real World 7

Power Analysis Attacks

Stefan Mangard, Elisabeth Oswald, and Thomas Popp’s Classic

Revealing the Secrets of Smart Cards
“first comprehensive treatment of power analysis attacks and countermeasures”
Aimed at the practitioner
From 2007 ⇒ no modern ideas and theory

slide-8
SLIDE 8

Outline

1 How Differential Power Attacks Work
    Our Setting
    A Typical Pipeline for Key Recovery
    Profiled Attack Example

2 Key Enumeration and Ranking
    Enumeration
    Ranking

3 Conclusion
    Want to Learn More?

slide-9
SLIDE 9

How Differential Power Attacks Work Our Setting 9

Modern Cryptology

Black-box Blockciphers

What is a Blockcipher
A blockcipher E is a family of keyed permutations $E : \{0,1\}^k \times \{0,1\}^n \to \{0,1\}^n$, where k is the key length and n the block length

Blockcipher Usage
Use a mode-of-operation like GCM to create an encryption scheme
GCM security proof assumes the blockcipher E is a “PRP”
So E is treated as a black box

What happens if you can see it “work”?

slide-10
SLIDE 10

How Differential Power Attacks Work Our Setting 10

Modern Cryptology

AES-128

[Figure: the 16 AES state bytes and the round function f, composed of AddRoundKey (AK), SubBytes (SB, bytewise S-box S), ShiftRows (SR) and MixColumns (MC, C ← M × C), taking w_{i−1} through x_i, y_i, z_i to w_i]

Design
k = 128, n = 128, where 128 = 16 × 8 (16 bytes)
10 rounds of whitened SP network
Non-linearity comes from bytewise S-boxes

Images: TikZ for Cryptographers, Jérémy Jean, www.iacr.org/authors/tikz/

slide-11
SLIDE 11

How Differential Power Attacks Work Our Setting 11

SCALE: A Resource by Dan Page

https://github.com/danpage/scale

Side-Channel Attack Lab. Exercises
Provides a suite of material related to side-channel (and fault) attacks that is low-cost, accessible, relevant, coherent, and effective.

SCALE Data Sets

1 Four platforms: an Atmel atmega328p (an AVR) plus three NXP ARM Cortex-M processors
2 Implementation uses an 8-bit datapath and look-up tables for the S-box and xtime operations (but code not known)
3 2 × 1000 traces of AES-128 each (known vs. unknown key)
4 Traces acquired using a Picoscope 2206B, using triggers for alignment

slide-12
SLIDE 12

How Differential Power Attacks Work Our Setting 12

Plotting a Trace

SCALE’s AES-128 on an Atmel

[Figure: a full power trace, plotted against sample index]

A full trace
k = 2B7E151628AED2A6ABF7158809CF4F3C
Total of 132,292 points
You can see a pattern repeating roughly 10 times

slide-13
SLIDE 13

How Differential Power Attacks Work Our Setting 13

Finding the Rounds

Using crosscorrelation

[Figure: crosscorrelation of a trace with itself, plotted against the shift]

Crosscorrelation of a trace
Compares how well shifts of the trace match the original
$c_i = \sum_j a_j a_{i+j}$
Leads to round duration of 12421
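A minimal sketch of this idea in pure Python follows. The function names and the search window are assumptions; subtracting the mean first is a common preprocessing choice that the slide does not state. For traces of the size shown here a vectorised or FFT-based implementation would be needed in practice.

```python
def autocorrelation(a, max_shift):
    """c[i] = sum_j a[j] * a[i + j] for i = 0..max_shift, on the mean-removed trace."""
    n = len(a)
    mean = sum(a) / n
    centred = [v - mean for v in a]
    return [
        sum(centred[j] * centred[i + j] for j in range(n - i))
        for i in range(max_shift + 1)
    ]


def estimate_period(a, min_shift, max_shift):
    """Return the shift in [min_shift, max_shift] with the largest autocorrelation."""
    c = autocorrelation(a, max_shift)
    return max(range(min_shift, max_shift + 1), key=lambda i: c[i])
```

The lower bound `min_shift` excludes the trivial peak at shift 0 and its immediate neighbours; the first strong peak after that corresponds to one repetition of the pattern, here one round.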

slide-14
SLIDE 14

How Differential Power Attacks Work Our Setting 13

Finding the Rounds

Using crosscorrelation

[Figure: two panels, each showing two consecutive rounds superimposed over one round duration]

Plotting the Rounds Jointly
Left: Rounds 1 and 2 superimposed
Round 1 is building up power
Right: Rounds 5 and 6 superimposed
Peaks and jittery areas match well

slide-15
SLIDE 15

How Differential Power Attacks Work Our Setting 14

Plotting a Trace

SCALE’s AES-128 on an Atmel

[Figure: close-up of the trace, samples 26000 to 38000]

3rd round close up

1 Some peaks, some jitter
2 Hard to really discern much of interest...

slide-16
SLIDE 16

How Differential Power Attacks Work Our Setting 15

Signal versus Noise

What determines the power consumption?

[Figure: close-up of the trace, samples 26000 to 38000]

Engineer’s Perspective (MOP, Ch. 4)
$P_{\text{total}} = P_{\text{op}} + P_{\text{data}} + P_{\text{el. noise}} + P_{\text{const}}$
$P_{\text{op}}$: depends on the operation
$P_{\text{data}}$: depends on the data
$P_{\text{el. noise}}$: electrical noise
$P_{\text{const}}$: constant base

slide-17
SLIDE 17

How Differential Power Attacks Work Our Setting 15

Signal versus Noise

What determines the power consumption?

[Figure: close-up of the trace, samples 26000 to 38000]

Engineer’s Perspective (MOP, Ch. 4)
$P_{\text{op}} + P_{\text{data}} = P_{\text{exp}} + P_{\text{sw. noise}}$
$P_{\text{op}}$: depends on the operation
$P_{\text{data}}$: depends on the data
$P_{\text{exp}}$: exploitable signal
$P_{\text{sw. noise}}$: switching noise

slide-18
SLIDE 18

How Differential Power Attacks Work Our Setting 15

Signal versus Noise

What determines the power consumption?

[Figure: close-up of the trace, samples 26000 to 38000]

Engineer’s Perspective (MOP, Ch. 4)
$P_{\text{total}} = P_{\text{exp}} + P_{\text{sw. noise}} + P_{\text{el. noise}} + P_{\text{const}}$
$P_{\text{el. noise}}$: electrical noise
$P_{\text{const}}$: constant base
$P_{\text{exp}}$: exploitable signal
$P_{\text{sw. noise}}$: switching noise

slide-19
SLIDE 19

How Differential Power Attacks Work Our Setting 15

Signal versus Noise

What determines the power consumption?

[Figure: close-up of the trace, samples 26000 to 38000]

Theoretician’s Perspective
$P_{\text{total}} = f(\text{data}) + \mathcal{N}(0, \sigma)$
$f(\text{data})$ mainly models $P_{\text{exp}}$; the function f incorporates $P_{\text{op}}$ and $P_{\text{const}}$
$\sigma$ depends on $P_{\text{sw. noise}}$ and $P_{\text{el. noise}}$

slide-20
SLIDE 20

How Differential Power Attacks Work Our Setting 15

Signal versus Noise

What determines the power consumption?

[Figure: close-up of the trace, samples 26000 to 38000]

Some Caveats

1 Which operations are performed on which registers can be relevant
2 Looking at multiple points might lead to multivariate dependencies
3 Sometimes noise levels (σ) are data-dependent
4 The function f and noise level σ are unknown

slide-21
SLIDE 21

How Differential Power Attacks Work Our Setting 15

Signal versus Noise

What determines the power consumption?

[Figure: two panels over the full trace length: pointwise sample mean (left) and pointwise sample variance (right)]

Atmel AES, Based on 1000 traces
Assuming no branches in the execution
Left: Pointwise sample mean: $P_{\text{const}} + P_{\text{op}}$
Right: Pointwise sample variance: $P_{\text{data}} + P_{\text{el. noise}}$
Both $P_{\text{exp}}$ and $P_{\text{sw. noise}}$ depend on your target...
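The two curves are straightforward to compute from a set of aligned traces. This is a minimal sketch in which the function name and the list-of-lists trace layout are assumptions; `statistics.variance` is the sample variance (divisor n − 1).

```python
from statistics import fmean, variance


def pointwise_mean_and_variance(traces):
    """traces: list of equal-length sample lists, one list per aligned trace.

    Returns (means, variances), each with one entry per time index.
    """
    columns = list(zip(*traces))              # one tuple of samples per time index
    means = [fmean(col) for col in columns]
    variances = [variance(col) for col in columns]
    return means, variances
```

With all traces running the same instructions, the mean keeps what is common to every execution, while the variance highlights the time indices where the processed data (and the noise) changes the consumption.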

slide-22
SLIDE 22

How Differential Power Attacks Work Our Setting 16

Signal versus Noise

Intermediate values and target selection

[Figure: the 16 AES state bytes and the round function f (AK, SB with S-box S, SR, MC with C ← M × C), taking w_{i−1} through x_i, y_i, z_i to w_i; below it, a close-up of the trace, samples 26000 to 38000]

The Locality of Leakage
Intermediate value: the (few) byte(s) involved in a specific operation
Locality assumption: leakage primarily depends on the intermediate value operated upon

slide-23
SLIDE 23

How Differential Power Attacks Work Our Setting 16

Signal versus Noise

Intermediate values and target selection

[Figure: the 16 AES state bytes and the round function f (AK, SB with S-box S, SR, MC with C ← M × C), taking w_{i−1} through x_i, y_i, z_i to w_i]

The Locality of Leakage
Intermediate value: the (few) byte(s) involved in a specific operation
Locality assumption: leakage primarily depends on the intermediate value operated upon
Target intermediate value captured by $P_{\text{exp}}$
The “rest” contributes to $P_{\text{sw. noise}}$

slide-24
SLIDE 24

How Differential Power Attacks Work Our Setting 16

Signal versus Noise

Intermediate values and target selection

[Figure: two panels, samples 5500 to 8500, each showing average leakage for the byte values 00, 0F, F0 and FF]

First Round, Byte Pos. “0”, keybyte 2B
Left: Average leakage based on select plaintext values
Right: Average leakage based on select sbox inputs
Initial peak correlates more with plaintext
Final peak correlates more with sbox input

slide-25
SLIDE 25

How Differential Power Attacks Work Our Setting 17

SNR: Signal to Noise Ratio

Visualizing “Simple” Leakage

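Mangard’s SNR
Recall we said $P_{\text{total}} = f(\text{data}) + \mathcal{N}(0, \sigma)$
$f(\text{data})$ is called the signal, the other term the noise
$\mathrm{SNR} = \frac{\mathrm{Var}(\text{signal})}{\mathrm{Var}(\text{noise})} = \frac{\mathrm{Var}_{\text{data}}\left(f(\text{data})\right)}{\sigma^2}$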
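A sample estimate of this ratio at a single time index can be sketched as follows. The function name and argument layout are assumptions: traces are grouped by the value of the target intermediate, the class means stand in for f(data), and the average within-class variance stands in for σ².

```python
from collections import defaultdict
from statistics import fmean, pvariance


def estimate_snr(values, samples):
    """Estimate the SNR at one time index.

    values[i]  : target intermediate value for trace i (e.g. an S-box input)
    samples[i] : measured power of trace i at the chosen time index
    """
    groups = defaultdict(list)
    for v, s in zip(values, samples):
        groups[v].append(s)
    class_means = [fmean(g) for g in groups.values()]
    class_vars = [pvariance(g) for g in groups.values()]
    signal = pvariance(class_means)   # variance over data of the estimated f(data)
    noise = fmean(class_vars)         # average within-class variance, estimates sigma^2
    return signal / noise             # undefined (division by zero) if there is no noise
```

Repeating this for every time index gives an SNR curve whose peaks indicate where the chosen intermediate leaks.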

slide-26
SLIDE 26

How Differential Power Attacks Work Our Setting 17

SNR: Signal to Noise Ratio

Visualizing “Simple” Leakage

[Figure: SNR curves over round 1, samples 2000 to 14000]

Round 1 SNRs, sample estimate
16 Sbox inputs as separate targets
Fixed key, so equivalent to 16 plaintext bytes
Clearly see the different bytes leak repeatedly, one after the other

slide-27
SLIDE 27

How Differential Power Attacks Work Our Setting 17

SNR: Signal to Noise Ratio

Visualizing “Simple” Leakage

[Figure: SNR curves, zoomed in to samples 5500 to 8000]

Round 1 SNRs, zoom in
Clearly see the different bytes leak repeatedly, one after the other
Peaks differ in height
At the bases consecutive bytes leak jointly

slide-28
SLIDE 28

How Differential Power Attacks Work Our Setting 18

Hamming Weight and Hamming Distance

Two Common Leakage Models

Hamming Weight
Power consumption is linear in the Hamming weight of the target data
$f(\text{data}) = a \cdot \mathrm{HW}(\text{data}) + b$
Corresponds to a model where power depends primarily on “setting” bits

Hamming Distance
Power consumption is linear in the Hamming distance between the target data input and the output
$f(\text{data}) = a \cdot \mathrm{HD}(\text{data}_{\text{in}}, \text{data}_{\text{out}}) + b$
Corresponds to a model where power depends primarily on “flipping” bits
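Both models are short enough to write out directly. In this sketch the function names and the default values a = 1, b = 0 are assumptions; on a real device a and b are unknown and have to be estimated.

```python
def hamming_weight(x: int) -> int:
    """Number of bits set in x."""
    return bin(x).count("1")


def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ."""
    return hamming_weight(x ^ y)


def hw_leakage(data: int, a: float = 1.0, b: float = 0.0) -> float:
    """f(data) = a * HW(data) + b."""
    return a * hamming_weight(data) + b


def hd_leakage(data_in: int, data_out: int, a: float = 1.0, b: float = 0.0) -> float:
    """f(data) = a * HD(data_in, data_out) + b."""
    return a * hamming_distance(data_in, data_out) + b
```

For example, the key byte 2B has Hamming weight 4, and overwriting 0F with F0 flips all 8 bits.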

slide-29
SLIDE 29

How Differential Power Attacks Work Our Setting 19

SNR: Signal to Noise Ratio

AES Atmel Hamming Leakage

[Figure: two panels, samples 2000 to 14000]

Round 1, “0” byte
SNR vs Sbox input Hamming weight
plaintext Hamming weight (right)
Explains the third SNR peak (left)
No non-linearity ⇒ hard to exploit

slide-30
SLIDE 30

How Differential Power Attacks Work Our Setting 19

SNR: Signal to Noise Ratio

AES Atmel Hamming Leakage

[Figure: two panels, samples 2000 to 14000]

Round 1, “0” byte
SNR vs Sbox output Hamming weight
Sbox input Hamming weight (right)
Explains the final two SNR peaks (left)
Non-linearity ⇒ exploitable for key recovery

slide-31
SLIDE 31

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 20

Kerckhoffs Principle

Known Knowns and Unknown Unknowns

Kerckhoffs’s Principle
A cryptosystem’s security should reside in the secrecy of its keys (known unknown) without any need to keep the cryptosystem secret (known known)

What about Implementations
What device is being used?
Which cryptosystem is implemented how?
Auxiliary inputs (plaintexts/ciphertexts/randomness)?
But what about the leakage such as power consumption?

slide-32
SLIDE 32

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 20

Kerckhoffs Principle

Known Knowns and Unknown Unknowns

Kerckhoffs’s Principle
A cryptosystem’s security should reside in the secrecy of its keys (known unknown) without any need to keep the cryptosystem secret (known known)

What about Power Consumption?
Realistically, even if you know what operations are being performed, how a device leaks is too unpredictable (unknown unknown).

Not-Quite-Kerckhoffs Principle

slide-33
SLIDE 33

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 21

Different Adversarial Scenarios

Not-Quite-Kerckhoffs Principle

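$K^* \xleftarrow{\$} \mathsf{Kg}$
$C \leftarrow E_{K^*}(X)$
$L \xleftarrow{\$} \mathsf{Leak}(K^*, X)$
$L \leftarrow \mathsf{Leak}(K, X)$
Output: $\hat{K}$

The naked guess-the-key game: $\mathrm{Adv}^{\mathrm{kr}}_{E}(\mathcal{A}) = \Pr\left[K^* = \hat{K}\right]$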

slide-34
SLIDE 34

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 21

Different Adversarial Scenarios

Not-Quite-Kerckhoffs Principle

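$K^* \xleftarrow{\$} \mathsf{Kg}$
assert $\mathsf{Leak} \in \mathcal{L}$
$C \leftarrow E_{K^*}(X)$
$L \xleftarrow{\$} \mathsf{Leak}(K^*, X)$
Exchanged with the adversary: $\mathsf{Leak}, X$; $L$
$L \leftarrow \mathsf{Leak}(K, X)$
Output: $\hat{K}$

The adversary selects how the leakage is derived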

slide-35
SLIDE 35

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 21

Different Adversarial Scenarios

Not-Quite-Kerckhoffs Principle

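$K^* \xleftarrow{\$} \mathsf{Kg}$
$\mathsf{Leak} \xleftarrow{\$} \mathcal{L}$
$C \leftarrow E_{K^*}(X)$
$L \xleftarrow{\$} \mathsf{Leak}(K^*, X)$
Exchanged with the adversary: $X$; $L$; $\mathsf{Leak}$
$L \leftarrow \mathsf{Leak}(K, X)$
Output: $\hat{K}$

The adversary knows exactly how to model the leakage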

slide-36
SLIDE 36

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 21

Different Adversarial Scenarios

Not-Quite-Kerckhoffs Principle

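$K^* \xleftarrow{\$} \mathsf{Kg}$
$\mathsf{Leak} \xleftarrow{\$} \mathcal{L}$
$C \leftarrow E_{K^*}(X)$
$L \xleftarrow{\$} \mathsf{Leak}(K^*, X)$
Exchanged with the adversary: $X$; $L$
$L \leftarrow \mathsf{Leak}(K, X)$
Exchanged with the adversary: $K, X$; $L$
Output: $\hat{K}$

The adversary learns what the leakage profile $\hat{\theta}$ looks like: $\mathsf{Leak}(\text{data}) \approx M_{\hat{\theta}}(\text{data})$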

slide-37
SLIDE 37

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 21

Different Adversarial Scenarios

Not-Quite-Kerckhoffs Principle

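$K^* \xleftarrow{\$} \mathsf{Kg}$
$\mathsf{Leak} \xleftarrow{\$} \mathcal{L}$
$C \leftarrow E_{K^*}(X)$
$L \xleftarrow{\$} \mathsf{Leak}(K^*, X)$
Exchanged with the adversary: $X$; $L$
$L \leftarrow \mathsf{Leak}(K, X)$
Output: $\hat{K}$

The adversary infers a leakage model M: $\mathsf{Leak}(\text{data}) \approx M(\text{data})$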

slide-38
SLIDE 38

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 21

Different Adversarial Scenarios

Not-Quite-Kerckhoffs Principle

Different Scenarios

1 The adversary selects how the leakage is derived
    includes leakage-resilience and formal probing models
2 The adversary knows exactly how to model the leakage
    used for simulated leakage models
3 The adversary learns how the leakage profile looks
    captures real-life profiled attacks
4 The adversary infers a leakage model
    captures real-life attacks without profiling

Caveat: “Stronger” models (higher in the list) tend to be relative to less realistic and potentially “weaker” forms of leakage

slide-39
SLIDE 39

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 22

A Typical Side-Channel Attack Pipeline

A Acquire training data
    control over keys and plaintexts
    signal processing to clean up traces
B Build a profile
    1 select features or PoIs
    2 fix model M, estimate parameters $\hat{\theta}$
C Collect target traces
    unknown target key, known plaintexts
    use signal processing as before
D Distinguish
    1 extract features or PoIs
    2 using model M and parameters $\hat{\theta}$, for each key candidate, compute distinguishing score

slide-40
SLIDE 40

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 23

A Typical Side-Channel Attack Pipeline

Acquisition and Collection

Experimental setup
Use oscilloscope to measure power
(Optional) Use triggers to align data
Use signal processing to clean up raw trace ⇐ see the textbook for details!

slide-41
SLIDE 41

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 24

A Typical Side-Channel Attack Pipeline

Feature Extraction and Points of Interest

[Figure: two panels, samples 2000 to 14000]

Where does a target intermediate leak?
An intermediate leaks mostly where it is being used
Good to identify where this is
Then reduce dimension of interesting points if possible
Easiest is to select a point of interest

slide-42
SLIDE 42

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 25

A Typical Side-Channel Attack Pipeline

Build a Model, Profiling

[Figure: two panels, samples 2000 to 14000]

How does a target intermediate leak?
Assume $\mathsf{Leak}(\text{data}) = M_{\theta}(\text{data})$ for known model M
Estimate the “real” parameters $\theta$ by $\hat{\theta}$

slide-43
SLIDE 43

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 26

A Typical Side-Channel Attack Pipeline

Build a Model, Profiling

Lessons from Machine Learning
Suppose the real leakage follows data-dependent distribution Leak(data)

1 Assume that unknown Leak(data) follows known model M with unknown parameters $\theta$: $\mathsf{Leak}(\text{data}) \approx M_{\theta}(\text{data})$
2 Estimate the “real” parameters $\theta$ by $\hat{\theta}$: $M_{\theta}(\text{data}) \approx M_{\hat{\theta}}(\text{data})$

Warning: More complex models have smaller modelling errors (first point) at the expense of larger estimation errors (second point)
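For the Hamming-weight model the parameter vector is just θ = (a, b), so the estimation step reduces to a simple linear regression. The sketch below uses ordinary least squares at a single point of interest; the function name and the choice of least squares are assumptions, and the slides do not prescribe a particular estimator.

```python
def hamming_weight(x: int) -> int:
    """Number of bits set in x."""
    return bin(x).count("1")


def fit_hw_model(data, samples):
    """Least-squares estimate (a_hat, b_hat) for samples ~ a * HW(data) + b.

    data[i]    : known intermediate value for profiling trace i
    samples[i] : measured power of trace i at the chosen point of interest
    """
    hws = [hamming_weight(d) for d in data]
    n = len(hws)
    mean_h = sum(hws) / n
    mean_s = sum(samples) / n
    cov = sum((h - mean_h) * (s - mean_s) for h, s in zip(hws, samples))
    var = sum((h - mean_h) ** 2 for h in hws)
    a_hat = cov / var
    b_hat = mean_s - a_hat * mean_h
    return a_hat, b_hat
```

A model with one free mean per byte value (256 parameters instead of 2) would fit a device that does not leak Hamming weights more faithfully, but each parameter would then be estimated from far fewer profiling traces, which is the trade-off the warning describes.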

slide-44
SLIDE 44

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 27

A Typical Side-Channel Attack Pipeline

Distinguish using Divide-and-Conquer

[Figure: the 16 AES state bytes and the round function f (AK, SB with S-box S, SR, MC with C ← M × C), taking w_{i−1} through x_i, y_i, z_i to w_i]

Divide-and-Conquer
Idea: Recover each subkey byte separately
For all 256 candidates, calculate a distinguishing score
The lowest (or highest) score indicates the likely true subkey byte
Guess all 16 subkey bytes correctly ⇔ guess the AES key correctly

slide-45
SLIDE 45

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 27

A Typical Side-Channel Attack Pipeline

Distinguish using Divide-and-Conquer

k0   score       k1   score       ...   k15  score
0    0.123...    0    0.134...          0    0.184...
1    0.127...    1    0.116...          1    0.167...
...  ...         ...  ...               ...  ...
255  0.238...    255  0.098...          255  0.152...

Divide-and-Conquer
Idea: Recover each subkey byte separately
For all 256 candidates, calculate a distinguishing score
The lowest (or highest) score indicates the likely true subkey byte
Guess all 16 subkey bytes correctly ⇔ guess the AES key correctly
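Given sixteen such score lists, the most likely full key is obtained by taking the best candidate in each list independently. This is a minimal sketch; the function name, the list-of-lists layout and the `higher_is_better` flag are assumptions.

```python
def best_key_guess(scores, higher_is_better=True):
    """Combine per-byte scores into one key guess.

    scores: 16 lists, each holding 256 distinguishing scores for one subkey byte.
    Returns the 16-byte key made of the best-scoring candidate per position.
    """
    pick = max if higher_is_better else min
    return bytes(
        pick(range(256), key=lambda k: byte_scores[k])
        for byte_scores in scores
    )
```

This first guess is correct only if all sixteen bytes are ranked first at once; when some of them are not, the full score lists still carry the information needed to try further keys in order of likelihood.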

slide-46
SLIDE 46

How Differential Power Attacks Work A Typical Pipeline for Key Recovery 27

A Typical Side-Channel Attack Pipeline

Distinguish using Divide-and-Conquer

k0   score       k1   score       ...   k15  score
0    0.123...    0    0.134...          0    0.184...
1    0.127...    1    0.116...          1    0.167...
...  ...         ...  ...               ...  ...
255  0.238...    255  0.098...          255  0.152...

Distinguishing Scores using $\mathsf{Leak}(\text{data}) \approx M_{\hat{\theta}}(\text{data})$

1 Assume data relevant for leakage only depends on one subkey (easiest to attack AES first or last round)
2 For each of the 256 possibilities and each trace, calculate how the modelled leakage would look
3 Compare modelled leakage with observed trace, combine into score
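The three steps can be sketched for one subkey byte and one point of interest. The choices here are assumptions rather than the slides' method: the target is the first-round S-box output, the model is its Hamming weight, and the score is the Pearson correlation between modelled and observed leakage (higher is better). The S-box is computed from its definition to keep the block self-contained.

```python
def _gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def _build_sbox():
    """AES S-box: field inverse (0 maps to 0) followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if _gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):   # XOR with the left-rotations by 1..4 bits
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = _build_sbox()


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def score_candidates(plaintext_bytes, samples):
    """Return 256 scores, one per subkey candidate k.

    For each k the modelled leakage HW(SBOX[p XOR k]) is computed for every
    trace and correlated with the observed samples.
    """
    return [
        pearson([hamming_weight(SBOX[p ^ k]) for p in plaintext_bytes], samples)
        for k in range(256)
    ]
```

Because the S-box is non-linear, a wrong candidate predicts leakage that is nearly uncorrelated with the measurements, so the correct byte tends to stand out once enough traces are available.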