SLIDE 1

A COMPREHENSIVE EVALUATION OF MUTUAL INFORMATION ANALYSIS USING A FAIR EVALUATION FRAMEWORK

Carolyn Whitnall, Elisabeth Oswald

carolyn.whitnall@bris.ac.uk Department of Computer Science, University of Bristol

16th August 2011

  • C. WHITNALL (UNIVERSITY OF BRISTOL)

EVALUATING MIA CRYPTO 2011 1 / 1

SLIDE 2

Algorithm + Device = Measurements! But how to make the most of those measurements?

[Images: photos from Iain Tate, Casey Marshall and Becky Stern on Flickr]

SLIDE 12

WHAT IS A SIDE-CHANNEL DISTINGUISHER?

[Figure: distinguisher value plotted against key hypothesis, also expressed in # standard deviations, with the true key and nearest rival marked.]

SLIDE 13

WHAT MAKES A GOOD DISTINGUISHER?

THE USUAL APPROACH...

  • Desirable metric: “# of trace measurements required for key recovery”
  • Not like-for-like: practical outcomes highly sensitive to estimator choice
  • Not computable: sampling distributions (usually) unknown

OUR CONTRIBUTION

  • ‘True’ distinguishing vectors can be directly computed for well-defined hypothetical scenarios
  • Theoretic advantages ⇏ practical advantages (unequal estimation costs)
  • BUT certain characteristics have a strong bearing on likely practical outcomes
  • What features of the theoretic distinguishing vector contribute most to its estimability?

SLIDE 17

‘A FAIR EVALUATION FRAMEWORK’

[Figure: distinguisher value against key hypothesis, also expressed in # standard deviations.]

  • Correct key ranking in the theoretic vector
    ◮ Distinguisher must isolate key in theory to stand a chance in practice
  • Nearest-rival distinguishing score – # s.d. between correct key value and highest ranked alternative
    ◮ The larger the margin, the fewer the traces needed for estimation!
  • Average minimum support – how large an input support does the distinguisher need?
    ◮ An attack which needs to ‘see more inputs’ will inevitably need more traces

[Figure: theoretic success rate against support size.]
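
The correct key ranking and the nearest-rival distinguishing score can both be read straight off a distinguishing vector. The following is a minimal sketch, assuming the vector is a NumPy array indexed by key hypothesis and that "# s.d." means the population standard deviation of the vector's own entries. The function names and the toy numbers are illustrative and not taken from the talk, and the average minimum support is not covered here.

```python
import numpy as np

def correct_key_rank(d, true_key):
    """Rank of the true key when hypotheses are sorted by descending score (1 = top)."""
    d = np.asarray(d, dtype=float)
    return int(np.sum(d > d[true_key])) + 1

def nearest_rival_score(d, true_key):
    """Gap between the true key and its best rival, in units of the vector's s.d."""
    d = np.asarray(d, dtype=float)
    rivals = np.delete(d, true_key)
    # Assumption: standardise by the population s.d. of all entries of d.
    return float((d[true_key] - rivals.max()) / d.std())

# Toy distinguishing vector over four key hypotheses (made-up numbers).
toy = np.array([0.0, 1.0, 0.0, 3.0])
rank = correct_key_rank(toy, true_key=3)
score = nearest_rival_score(toy, true_key=3)
```

A negative score under this definition would mean that some rival hypothesis outranks the true key, so the first criterion already fails.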

SLIDE 19

THE DISTINGUISHERS AT A GLANCE. . .

MIA: MUTUAL INFORMATION

Defined as:

$$D(k) = I(L_{k^*} + \varepsilon;\, M_k) = H(L_{k^*} + \varepsilon) - H(L_{k^*} + \varepsilon \mid M_k),$$

where H is the differential entropy:

$$H(X) = -\sum_{x \in \mathcal{X}} p_X(x) \log_2\big(p_X(x)\big)$$

Functional of the distribution: estimation problematic

  • DPA outcomes extremely sensitive to estimator choice; no ‘ideal’ exists
  • No general results for the sampling distributions

CPA: PEARSON’S CORRELATION COEFFICIENT

Defined as:

$$D(k) = \rho(L_{k^*} + \varepsilon,\, M_k) = \frac{\mathrm{Cov}(L_{k^*} + \varepsilon,\, M_k)}{\sqrt{\mathrm{Var}(L_{k^*} + \varepsilon)}\,\sqrt{\mathrm{Var}(M_k)}}$$

Function of distributional moments: estimation simple

  • Sample correlation coefficient suits a broad range of assumptions
  • Lots of ‘nice’ results for its sampling distribution
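
To make the contrast concrete, here is a hedged sketch of how the two distinguishers might be estimated from simulated traces. It assumes a toy 4-bit S-box drawn as a random permutation (not a DES S-box), Gaussian noise, and a plain histogram plug-in estimator for the mutual information. That estimator is only one of many possible choices, and all names below are illustrative.

```python
import numpy as np

def hamming_weight(x):
    """Number of set bits in each element of a non-negative integer array."""
    x = np.asarray(x)
    return np.array([bin(int(v)).count("1") for v in x.ravel()]).reshape(x.shape)

def cpa_score(traces, model):
    """Sample Pearson correlation between traces and model predictions."""
    return float(np.corrcoef(traces, model)[0, 1])

def mia_score(traces, model, bins=8):
    """Histogram plug-in estimate of I(traces; model) in bits."""
    edges = np.histogram_bin_edges(traces, bins=bins)
    binned = np.digitize(traces, edges[1:-1])        # bin indices 0..bins-1
    values, m_idx = np.unique(model, return_inverse=True)
    joint = np.zeros((bins, len(values)))
    np.add.at(joint, (binned, m_idx.ravel()), 1.0)   # joint histogram counts
    joint /= joint.sum()
    p_t = joint.sum(axis=1, keepdims=True)           # marginal of binned traces
    p_m = joint.sum(axis=0, keepdims=True)           # marginal of model values
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_t @ p_m)[nz])))

rng = np.random.default_rng(0)
SBOX = rng.permutation(16)                 # toy S-box: an assumption for illustration
true_key = 11
x = rng.integers(0, 16, size=2000)         # known inputs
leak = hamming_weight(SBOX[x ^ true_key])  # noise-free Hamming weight leakage
traces = leak + rng.normal(0.0, 0.5, size=x.size)

# One distinguisher value per key hypothesis k.
cpa = np.array([cpa_score(traces, hamming_weight(SBOX[x ^ k])) for k in range(16)])
mia = np.array([mia_score(traces, hamming_weight(SBOX[x ^ k])) for k in range(16)])
```

The `bins` argument is where the estimator sensitivity shows up: changing it changes every entry of `mia`, whereas `cpa` has no comparable tuning parameter.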

SLIDE 22

WHY ‘MUTUAL INFORMATION ANALYSIS’?

Proposed (Gierlichs et al., 2008) as an enhancement to correlation DPA:

  • Optimal in an information theoretic sense – quantifies total dependence
  • Generic – should work even without a good power model
  • However... correlation DPA frequently performs better in empirical comparisons

What can we learn from a theoretic evaluation?

Distinguisher                 Power model      Abbreviation
Correlation DPA               Hamming weight   CPA(HW)
Mutual Information Analysis   Hamming weight   MIA(HW)
Mutual Information Analysis   Identity         MIA(ID)

SLIDE 23

NOISE-FREE HAMMING WEIGHT LEAKAGE

[Figure, two panels: correlation attack and mutual information attack against the first DES S-Box; distinguisher value (also in # standard deviations) against offset from correct key (k ⊕ k*), with the true key and nearest rival marked.]

                                     CPA(HW)   MIA(HW)   MIA(ID)
Correct key ranking                  1         1         1
Nearest-rival distinguishing score   2.14      5.61      5.08
Average minimum support              6         8         16

SLIDE 24

MIA STRANGELY SENSITIVE TO NOISE

[Figure: nearest-rival distinguishing score against signal-to-noise ratio (0.125 to 128) for MIA(ID), MIA(HW) and CPA(HW).]

Impact of noise on nearest-rival distinguishing score:

  • Constant for correlation-based distinguisher
  • Evidence of stochastic resonance for MI-based distinguishers

(Note: no change in required support sizes throughout tested range)

SLIDE 25

MIA SHOWS PROMISE IN LESS TYPICAL SCENARIOS. . .

Candidate scenario: Hamming distance leakage from reference state 4 (decimal) = 0100 (binary)

                                     |CPA(HW)|   MIA(HW)   MIA(ID)
Correct key ranking                  1           1         1
Nearest-rival distinguishing score   0.86        3.93      4.57
Average minimum support              34          15        17

Question 1: Do these advantages persist in the presence of noise?

Question 2: If so, can they be translated to practical advantages with standard estimation procedures?
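
One way to see why a Hamming weight model struggles in this scenario is to compare it directly with the Hamming distance leakage. The sketch below assumes the leaking intermediate is a uniformly distributed 4-bit word and that the Hamming distance to the reference state equals the Hamming weight of the XOR with it. The variable names are illustrative.

```python
import numpy as np

REF = 0b0100  # reference state 4 (decimal), as in the candidate scenario

def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(int(v)).count("1")

values = np.arange(16)                              # all 4-bit intermediate values
hw_model = np.array([hw(v) for v in values])        # what CPA(HW) predicts
hd_leak = np.array([hw(v ^ REF) for v in values])   # what the device leaks here

# Correlation between the HW prediction and the HD leakage for the correct key.
rho = np.corrcoef(hw_model, hd_leak)[0, 1]
```

Under these assumptions `rho` is 0.5 rather than 1, because one of the four bits is inverted relative to the model. An identity power model is not penalised in the same way, since XOR with a constant only relabels the intermediate values.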

SLIDE 26

. . . STILL LOOKING PROMISING. . .

Question 1: Do the theoretic advantages in the ‘pure signal’ setting persist in the presence of noise?

[Figure, two panels: nearest-rival distinguishing score against signal-to-noise ratio, and average minimum support (input support size) against signal-to-noise ratio, for MIA(ID), MIA(HW) and CPA(HW).]

✗ MIA(HW)

  • Distinguishing score falls below that of CPA(HW)
  • Hefty penalty in terms of required support size

✓ MIA(ID)

  • Maintains substantially larger distinguishing scores
  • Required support size remains constant

SLIDE 27

. . . EXPERIMENTAL RESULTS CONFIRM IT!

Question 2: Can the theoretic advantages be translated to practical advantages with standard estimation procedures?

[Figure: mean number of traces required for key recovery against signal-to-noise ratio, for MIA(ID) (16 bins), MIA(HW) (5 bins) and CPA(HW).]

✗ MIA(HW): least efficient in all but the pure-signal scenario

✓ MIA(ID): comparable to CPA(HW) when SNR ≤ 0.5, but more efficient thereafter

SLIDE 28

BAD NEWS FOR DUAL-RAIL PRECHARGE LOGIC?

  • Unless output capacitances are perfectly balanced, some data-dependent signal will still leak
  • Power consumption when not perfectly balanced can be likened to the HD from a constant reference state:

Reference state ↔ Bit-wise difference in the wire capacitances

  • Confirmed by experimental attacks in Gierlichs et al., 2008
  • MIA can be used to thwart countermeasures which resist correlation DPA!

SLIDE 30

IN CONCLUSION

  • The problem: empirical studies don’t enable concrete, like-for-like comparisons between distinguishers
  • Our solution: a theoretic evaluation which bypasses the practical problems of estimation

Implications for MI-based distinguishers:

  • There are scenarios where MI has a substantial theoretic advantage (e.g. Hamming distance leakage, DRP logic)
  • Such advantages can be translated into practical advantages
  • The (standardised) MI distinguishing vector exhibits a type of stochastic resonance as noise levels vary

Whitnall, C. and Oswald, E.: A Fair Evaluation Framework for Comparing Side-Channel Distinguishers. Journal of Cryptographic Engineering, 2011.

SLIDE 31

THANK YOU FOR LISTENING!

Any questions?
