

slide-1
SLIDE 1

Deep Learning For Embedded Security Evaluation


Emmanuel PROUFF

Joint work with Ryad Benadjila, Eleonora Cagli (CEA LETI), Cécile Dumas (CEA LETI), Houssem Maghrebi (UL), Loïc Masure (CEA LETI), Thibault Portigliatti (ex SAFRAN), Rémi Strullu and Adrian Thillard

ANSSI (French Network and Information Security Agency)

June 17, 2019

June 2019, Summer School, Šibenik, Croatia | E. Prouff

slide-2
SLIDE 2

Learning| Countermeasures| Machine Learning| Building a Community| Conclusions| Illustration| Template Attacks|

Probability density function (pdf) of Electromagnetic Emanations

Cryptographic Processing with a secret k = 1.

1/18 Emmanuel PROUFF - ANSSI / Invited Talk PANDA 2018


slide-4
SLIDE 4

Probability density function (pdf) of Electromagnetic Emanations

Cryptographic Processing with a secret k = 2.

slide-5
SLIDE 5

Probability density function (pdf) of Electromagnetic Emanations

Cryptographic Processing with a secret k = 3.

slide-6
SLIDE 6

Probability density function (pdf) of Electromagnetic Emanations

Cryptographic Processing with a secret k = 4.

slide-7
SLIDE 7

Context:

[Figure: Target Device and Clone Device]

slide-8
SLIDE 8

Context:

[Figure: Target Device and Clone Device]

[On Clone Device] For every k, estimate the pdf of X | K = k (curves for k = 1, 2, 3, 4).


slide-10
SLIDE 10

Context:

[Figure: Target Device and Clone Device]

[On Clone Device] For every k, estimate the pdf of X | K = k (curves for k = 1, 2, 3, 4).
[On Target Device] Estimate the pdf of X (k = ?).

slide-11
SLIDE 11

Context:

[Figure: Target Device and Clone Device]

[On Clone Device] For every k, estimate the pdf of X | K = k (curves for k = 1, 2, 3, 4).
[On Target Device] Estimate the pdf of X (k = ?).
[Key-recovery] Compare the pdf estimations.
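The three steps above ([On Clone Device], [On Target Device], [Key-recovery]) can be sketched end-to-end with a toy univariate Gaussian leakage model; the leakage function, noise level and trace counts below are illustrative assumptions, not actual measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

def leak(k, n):
    # Hypothetical leakage: the trace sample's mean depends on the secret k.
    return rng.normal(loc=k, scale=0.5, size=n)

def fit_gaussian(traces):
    return traces.mean(), traces.std()

# [On Clone Device] For every k, estimate the pdf of X | K = k.
profiles = {k: fit_gaussian(leak(k, 5000)) for k in (1, 2, 3, 4)}

# [On Target Device] Collect traces under the unknown key.
secret_k = 3
target = leak(secret_k, 200)

# [Key-recovery] Compare the estimations: keep the key whose profiled
# pdf explains the target traces best (highest Gaussian log-likelihood).
def loglik(traces, mu, sigma):
    return np.sum(-0.5 * ((traces - mu) / sigma) ** 2 - np.log(sigma))

guess = max(profiles, key=lambda k: loglik(target, *profiles[k]))
```

With per-key means separated by more than the noise level, a couple of hundred target traces are enough for `guess` to match the secret.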

slide-12
SLIDE 12

Side Channel Attacks (Classical Approach)

Notations

  • X: observation of the device behaviour
  • P: public input of the processing
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X


slide-13
SLIDE 13

Side Channel Attacks (Classical Approach)

Notations

  • X: observation of the device behaviour
  • P: public input of the processing
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X, i.e. estimate Pr[Z | X]


slide-14
SLIDE 14

Side Channel Attacks (Classical Approach)

Notations

  • X: observation of the device behaviour
  • P: public input of the processing
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X, i.e. estimate Pr[Z | X]

Template Attacks

Profiling phase (using profiling traces under known Z)
Attack phase (N attack traces x_i, e.g. with known plaintexts p_i)

Log-likelihood score for each key hypothesis k:

d_k = Σ_{i=1}^{N} log Pr[X = x_i | Z = f(p_i, k)]

slide-15
SLIDE 15

Side Channel Attacks (Classical Approach)

Notations

  • X: observation of the device behaviour
  • P: public input of the processing
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X, i.e. estimate Pr[Z | X]

Template Attacks

Profiling phase (using profiling traces under known Z)

◮ estimate Pr[X | Z = z] by simple distributions for each value of z

Attack phase (N attack traces x_i, e.g. with known plaintexts p_i)

Log-likelihood score for each key hypothesis k:

d_k = Σ_{i=1}^{N} log Pr[X = x_i | Z = f(p_i, k)]

slide-16
SLIDE 16

Side Channel Attacks (Classical Approach)

Notations

  • X: observation of the device behaviour
  • P: public input of the processing
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X, i.e. estimate Pr[Z | X]

Template Attacks

Profiling phase (using profiling traces under known Z)

◮ estimate Pr[X | Z = z] for each value of z

Attack phase (N attack traces x_i, e.g. with known plaintexts p_i)

◮ Log-likelihood score for each key hypothesis k:

d_k = Σ_{i=1}^{N} log Pr[X = x_i | Z = f(p_i, k)]

slide-17
SLIDE 17

Side Channel Attacks (Classical Approach)

Notations

  • X: observation of the device behaviour
  • P: public input of the processing
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X, i.e. estimate Pr[Z | X]

Template Attacks

Profiling phase (using profiling traces under known Z)

◮ mandatory dimensionality reduction
◮ estimate Pr[X | Z = z] for each value of z

Attack phase (N attack traces x_i, e.g. with known plaintexts p_i)

◮ Log-likelihood score for each key hypothesis k:

d_k = Σ_{i=1}^{N} log Pr[X = x_i | Z = f(p_i, k)]

slide-18
SLIDE 18

Side Channel Attacks (Classical Approach)

Notations

  • X: observation of the device behaviour
  • P: public input of the processing
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X, i.e. estimate Pr[Z | X]

Template Attacks

Profiling phase (using profiling traces under known Z)

◮ manage de-synchronization problem
◮ mandatory dimensionality reduction
◮ estimate Pr[ε(X̃) | Z = z] for each value of z

Attack phase (N attack traces x_i, e.g. with known plaintexts p_i)

◮ Log-likelihood score for each key hypothesis k:

d_k = Σ_{i=1}^{N} log Pr[ε(X̃) = ε(x̃_i) | Z = f(p_i, k)]
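A complete template attack of this shape can be sketched on simulated traces. The 4-bit S-box, the Hamming-weight leakage model and the noise level are illustrative assumptions, and the resynchronization step ε is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
SBOX = rng.permutation(16)                # hypothetical 4-bit S-box; f(p, k) = SBOX[p ^ k]
hw = lambda z: bin(int(z)).count("1")     # Hamming-weight leakage (a common assumption)

def trace(z):
    return rng.normal(hw(z), 0.3)

# Profiling phase: estimate Pr[X | Z = z] by a Gaussian for each value of z.
templates = {}
for z in range(16):
    xs = np.array([trace(z) for _ in range(2000)])
    templates[z] = (xs.mean(), xs.std())

# Attack phase: N traces with known plaintexts p_i under the unknown key.
true_k = 11
ps = rng.integers(0, 16, size=100)
xs_att = np.array([trace(SBOX[p ^ true_k]) for p in ps])

def log_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)

# d_k = sum over i of log Pr[X = x_i | Z = f(p_i, k)]
d = {k: sum(log_pdf(x, *templates[SBOX[p ^ k]]) for p, x in zip(ps, xs_att))
     for k in range(16)}
best = max(d, key=d.get)
```

The wrong key hypotheses predict the wrong intermediate value on most traces, so their accumulated log-likelihood collapses and `best` recovers the key.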

slide-19
SLIDE 19

Learning| Countermeasures| Machine Learning| Building a Community| Conclusions|

Defensive Mechanisms

Misaligning Countermeasures

• Random Delays, Clock Jittering, ...
• In theory: assumed to be insufficient to provide security on their own
• In practice: one of the main issues for evaluators
⇒ Need for efficient resynchronization techniques

slide-20
SLIDE 20

Defensive Mechanisms

Misaligning Countermeasures

• Random Delays, Clock Jittering, ...
• In theory: assumed to be insufficient to provide security on their own
• In practice: one of the main issues for evaluators
⇒ Need for efficient resynchronization techniques

Masking Countermeasure

• Each key-dependent internal state element is randomly split into 2 shares
• The crypto algorithm is adapted to always manipulate the shares at different times
• The adversary needs to recover information on the two shares to recover K
⇒ Need for efficient methods to recover tuples of leakage samples that jointly depend on the target secret
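The 2-share splitting can be illustrated in a few lines; the sensitive byte value is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

def mask(z):
    # Split the sensitive byte Z into 2 shares with Z = S1 xor S2.
    s1 = int(rng.integers(0, 256))
    return s1, s1 ^ z

z = 0xA7
s1, s2 = mask(z)
assert s1 ^ s2 == z        # the algorithm only ever manipulates s1 and s2

# Each share taken alone is uniformly distributed whatever Z is, so its
# marginal leakage carries no information on Z: the attacker has to combine
# leakage samples of the two shares (a 2nd-order attack).
first_shares = [mask(z)[0] for _ in range(20000)]
counts = np.bincount(first_shares, minlength=256)
```

Here `counts` comes out close to flat over the 256 values, which is the formal reason a single leakage sample is useless and tuples of samples must be exploited jointly.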

slide-21
SLIDE 21

Learning| Countermeasures| Machine Learning| Building a Community| Conclusions| Introduction| Convolutional Neural Networks| Training of Models|

Motivating Conclusions


slide-22
SLIDE 22

Motivating Conclusions

Now:
• preprocessing to prepare data
◮ traces resynchronisation
◮ selection of PoIs
• make strong hypotheses on the statistical dependency
◮ e.g. Gaussian approximation
• characterization to extract information
◮ e.g. Maximum Likelihood

The proposed perspective:
• preprocessing to prepare data
◮ traces resynchronisation
◮ selection of PoIs
• make strong hypotheses on the statistical dependency
◮ e.g. Gaussian approximation
• train algorithms to directly extract information

slide-23
SLIDE 23

Side Channel Attacks

Notations

  • X: side channel trace
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X, i.e. estimate Pr[Z | X]

Template Attacks → Machine Learning Side Channel Attacks

Profiling phase (using profiling traces under known Z)

◮ manage de-synchronization problem
◮ mandatory dimensionality reduction
◮ estimate Pr[X | Z = z] for each value of z

Attack phase (N attack traces, e.g. with known plaintexts p_i)

◮ Log-likelihood score for each key hypothesis k:

d_k = Σ_{i=1}^{N} log Pr[X = x_i | Z = f(p_i, k)]

slide-24
SLIDE 24

Side Channel Attacks with a Classifier

Notations

  • X: side channel trace
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X, i.e. estimate Pr[Z | X]

Template Attacks → Machine Learning Side Channel Attacks

Profiling phase (using profiling traces under known Z)

◮ manage de-synchronization problem
◮ mandatory dimensionality reduction
◮ estimate Pr[X | Z = z] for each value of z

Attack phase (N attack traces, e.g. with known plaintexts p_i)

◮ Log-likelihood score for each key hypothesis k:

d_k = Σ_{i=1}^{N} log Pr[X = x_i | Z = f(p_i, k)]

slide-25
SLIDE 25

Side Channel Attacks with a Classifier

Notations

  • X: side channel trace
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X, i.e. estimate Pr[Z | X]

Template Attacks → Machine Learning Side Channel Attacks

Training phase (using training traces under known Z)

◮ manage de-synchronization problem
◮ mandatory dimensionality reduction
◮ estimate Pr[X | Z = z] for each value of z

Attack phase (N attack traces, e.g. with known plaintexts p_i)

◮ Log-likelihood score for each key hypothesis k:

d_k = Σ_{i=1}^{N} log Pr[X = x_i | Z = f(p_i, k)]

slide-26
SLIDE 26

Side Channel Attacks with a Classifier

Notations

  • X: side channel trace
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X, i.e. estimate Pr[Z | X]

Template Attacks → Machine Learning Side Channel Attacks

Training phase (using training traces under known Z)

◮ manage de-synchronization problem
◮ mandatory dimensionality reduction
◮ construct a classifier F s.t. F(x)[z] = y ≈ Pr[Z = z | X = x]

Attack phase (N attack traces, e.g. with known plaintexts p_i)

◮ Log-likelihood score for each key hypothesis k:

d_k = Σ_{i=1}^{N} log Pr[X = x_i | Z = f(p_i, k)]

slide-27
SLIDE 27

Side Channel Attacks with a Classifier

Notations

  • X: side channel trace
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X, i.e. estimate Pr[Z | X]

Template Attacks → Machine Learning Side Channel Attacks

Training phase (using training traces under known Z)

◮ manage de-synchronization problem
◮ mandatory dimensionality reduction
◮ construct a classifier F s.t. F(x)[z] = y ≈ Pr[Z = z | X = x]

Attack phase (N attack traces, e.g. with known plaintexts p_i)

◮ Log-likelihood score for each key hypothesis k:

d_k = Σ_{i=1}^{N} log F(x_i)[f(p_i, k)]

slide-28
SLIDE 28

Side Channel Attacks with a Classifier

Notations

  • X: side channel trace
  • Z: target (a cryptographic sensitive variable, Z = f(P, K))

Goal: make inference over Z, observing X, i.e. estimate Pr[Z | X]

Template Attacks → Machine Learning Side Channel Attacks

Training phase (using training traces under known Z): an integrated approach

◮ manage de-synchronization problem
◮ mandatory dimensionality reduction
◮ construct a classifier F s.t. F(x)[z] = y ≈ Pr[Z = z | X = x]

Attack phase (N attack traces, e.g. with known plaintexts p_i)

◮ Log-likelihood score for each key hypothesis k:

d_k = Σ_{i=1}^{N} log F(x_i)[f(p_i, k)]
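Swapping the profiled pdf for a classifier output F(x)[z] can be sketched with scikit-learn's logistic regression standing in for the neural network; the leakage simulation, the toy target f(p, k) = p xor k, and all parameters are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

def trace(z):
    # Hypothetical leakage: 4 points of interest carry the bits of z, 4 are noise.
    bits = np.array([(int(z) >> b) & 1 for b in range(4)], dtype=float)
    return np.concatenate([bits + rng.normal(0, 0.4, 4), rng.normal(0, 1, 4)])

# Training phase: construct F s.t. F(x)[z] approximates Pr[Z = z | X = x].
zs = rng.integers(0, 16, size=4000)
X = np.array([trace(z) for z in zs])
F = LogisticRegression(max_iter=1000).fit(X, zs)

# Attack phase: N traces with known plaintexts p_i under the unknown key.
true_k = 6
ps = rng.integers(0, 16, size=50)
Xa = np.array([trace(p ^ true_k) for p in ps])

logp = np.log(F.predict_proba(Xa) + 1e-12)      # columns follow F.classes_
col = {int(z): j for j, z in enumerate(F.classes_)}

# d_k = sum over i of log F(x_i)[f(p_i, k)]
d = {k: sum(logp[i, col[int(p) ^ k]] for i, p in enumerate(ps)) for k in range(16)}
best = max(d, key=d.get)
```

The attack-phase formula is unchanged; only the source of the probabilities differs, which is exactly the substitution the slide describes.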

slide-29
SLIDE 29

Machine Learning Approach

Overview of Machine Learning Methodology

Human effort:
◮ choose a class of algorithms
◮ choose a model to fit + tune hyper-parameters
Automatic training:
◮ automatic tuning of trainable parameters to fit data

slide-30
SLIDE 30

Machine Learning Approach

Overview of Machine Learning Methodology

Human effort:
◮ choose a class of algorithms → Neural Networks
◮ choose a model to fit + tune hyper-parameters
Automatic training:
◮ automatic tuning of trainable parameters to fit data → Stochastic Gradient Descent

slide-31
SLIDE 31

Machine Learning Approach

Overview of Machine Learning Methodology

Human effort:
◮ choose a class of algorithms → Neural Networks
◮ choose a model to fit + tune hyper-parameters → MLP, ConvNet
Automatic training:
◮ automatic tuning of trainable parameters to fit data → Stochastic Gradient Descent


slide-33
SLIDE 33

Convolutional Neural Networks

An answer to translation-invariance

[Figure: a classifier maps an input image to scores for the classes Horse / Dog / Cat]



slide-36
SLIDE 36

Convolutional Neural Networks

An answer to translation-invariance

[Figure: a classifier maps an input image to scores for the classes Horse / Dog / Cat]

It is important to make the data's translation-invariance explicit

slide-37
SLIDE 37

Convolutional Neural Networks

An answer to translation-invariance

[Figure: the same classifier on a translated input; the class scores become uncertain]

It is important to make the data's translation-invariance explicit

slide-38
SLIDE 38

Convolutional Neural Networks

An answer to translation-invariance

[Figure: a classifier outputs P(Z | X = x) for Z = 0 and Z = 1 from a side channel trace x]

It is important to make the data's translation-invariance explicit



slide-41
SLIDE 41

Convolutional Neural Networks

An answer to translation-invariance

[Figure: a classifier outputs P(Z | X = x) for Z = 0 and Z = 1 from a side channel trace x]

It is important to make the data's translation-invariance explicit
Convolutional Neural Networks: share weights across space
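Weight sharing is what buys translation equivariance: sliding one and the same filter over the trace means a shifted input produces a correspondingly shifted feature map, which a subsequent pooling can absorb. A minimal numpy check (the filter values are arbitrary):

```python
import numpy as np

def conv1d(x, w):
    # The SAME weights w are reused at every position: weight sharing.
    W = len(w)
    return np.array([np.dot(x[i:i + W], w) for i in range(len(x) - W + 1)])

x = np.array([0., 0., 1., 3., 1., 0., 0., 0.])   # a pattern somewhere in the trace
w = np.array([1., 2.])

y = conv1d(x, w)
y_shifted = conv1d(np.roll(x, 2), w)             # the same pattern, delayed by 2

# Equivariance: the feature map is shifted by the same amount...
assert np.array_equal(np.roll(y, 2), y_shifted)
# ...so a global max pooling no longer sees the misalignment at all.
assert y.max() == y_shifted.max()
```

This is the mechanism behind using ConvNets against the misalignment countermeasures of the previous section.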


slide-43
SLIDE 43

Basic Example

Figure: Convolutional filtering with W = 2, n_filters = 4, stride = 1, padding = same; max pooling layer with W = stride = 3.

[Figure data: input trace of length 9 (4 2 7 8 2 8 2 4 9), depth 1; 4 convolutional filters of size 2 produce filtered traces of depth 4 and length 9; max pooling with window 3 reduces each to length 3.]
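The layer configuration of the figure can be reproduced on the slide's input trace; the filter weights below are illustrative, since the ones used in the original figure are not fully recoverable:

```python
import numpy as np

x = np.array([4, 2, 7, 8, 2, 8, 2, 4, 9], dtype=float)   # input trace: length 9, depth 1

# 4 convolutional filters of size W = 2 (illustrative weights).
filters = np.array([[1., 2.], [2., 1.], [1., 1.], [2., 2.]])

# stride = 1, padding = "same": pad one zero so every output keeps length 9.
xp = np.append(x, 0.0)
filtered = np.array([[np.dot(xp[i:i + 2], w) for i in range(9)] for w in filters])
# filtered.shape == (4, 9): depth 4, same length as the input

# Max pooling with window = stride = 3: length 9 -> 3, depth unchanged.
pooled = filtered.reshape(4, 3, 3).max(axis=2)
# pooled.shape == (4, 3)
```

Each filter output keeps the input length (thanks to the padding), and pooling divides the length by 3 while leaving the depth at 4, matching the figure.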

slide-44
SLIDE 44

Example: masked manipulation of a sensitive datum Z

Figure: Deep Learning Behaviour Against Masked Datum

[Figure data: input trace containing the two shares Z ⊕ R and R; filtered traces (depth 4) and pooled outputs for this draw of R.]

slide-45
SLIDE 45

Example: masked manipulation of a sensitive datum Z

Figure: Deep Learning Behaviour Against Masked Datum

[Figure data: the same masked manipulation with another random R; the filtered and pooled values differ.]

slide-46
SLIDE 46

Example: masked manipulation of a sensitive datum Z

Figure: Deep Learning Behaviour Against Masked Datum

[Figure data: a third draw of R; again the filtered and pooled values vary with the mask.]

slide-47
SLIDE 47

Training of Neural Networks

Trading Side-Channel Expertise for Deep Learning Expertise ... or huge computational power!

Training

slide-48
SLIDE 48

Training of Neural Networks

Trading Side-Channel Expertise for Deep Learning Expertise ... or huge computational power!

Training

Aims at finding the parameters of the function modelling the dependency between the target value and the leakage.

slide-49
SLIDE 49

Training of Neural Networks

Trading Side-Channel Expertise for Deep Learning Expertise ... or huge computational power!

Training

Aims at finding the parameters of the function modelling the dependency between the target value and the leakage.
The search is done by solving a minimization problem with respect to some metric (aka loss function).

slide-50
SLIDE 50

Training of Neural Networks

Trading Side-Channel Expertise for Deep Learning Expertise ... or huge computational power!

Training

Aims at finding the parameters of the function modelling the dependency between the target value and the leakage.
The search is done by solving a minimization problem with respect to some metric (aka loss function).
The training algorithm has itself some training hyper-parameters:
◮ the number of iterations (aka epochs) of the minimization procedure,
◮ the number of input traces (aka batch size) treated during a single iteration.

slide-51
SLIDE 51

Training of Neural Networks

Trading Side-Channel Expertise for Deep Learning Expertise ... or huge computational power!

Training

Aims at finding the parameters of the function modelling the dependency between the target value and the leakage.
The search is done by solving a minimization problem with respect to some metric (aka loss function).
The training algorithm has itself some training hyper-parameters:
◮ the number of iterations (aka epochs) of the minimization procedure,
◮ the number of input traces (aka batch size) treated during a single iteration.
The trained model has architecture hyper-parameters:
◮ the size of the layers, the nature of the layers, the number of layers, etc.

slide-52
SLIDE 52

Training of Neural Networks

Trading Side-Channel Expertise for Deep Learning Expertise ... or huge computational power!

Training

Aims at finding the parameters of the function modelling the dependency between the target value and the leakage.
The search is done by solving a minimization problem with respect to some metric (aka loss function).
The training algorithm has itself some training hyper-parameters:
◮ the number of iterations (aka epochs) of the minimization procedure,
◮ the number of input traces (aka batch size) treated during a single iteration.
The trained model has architecture hyper-parameters:
◮ the size of the layers, the nature of the layers, the number of layers, etc.

Tricky Points

Finding sound hyper-parameters is the main issue in Deep Learning: it requires a good understanding of the underlying structure of the data and/or access to significant computational power.
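Both families of hyper-parameters are visible in even the smallest training loop. The sketch below trains a single-layer model with plain minibatch gradient descent on synthetic labelled traces; real SCA models are much deeper, and all sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic labelled traces: 1000 traces of 10 samples, binary target.
X = rng.normal(0, 1, (1000, 10))
y = (X[:, 0] + X[:, 3] > 0).astype(float)

epochs, batch_size, lr = 30, 32, 0.5     # training hyper-parameters (chosen by the human)
w, b = np.zeros(10), 0.0                 # trainable parameters (tuned automatically)

for _ in range(epochs):                  # one epoch = one pass over the training set
    order = rng.permutation(len(X))
    for i in range(0, len(X), batch_size):         # one batch per iteration
        idx = order[i:i + batch_size]
        p = 1 / (1 + np.exp(-(X[idx] @ w + b)))    # model output
        g = p - y[idx]                             # gradient of the cross-entropy loss
        w -= lr * X[idx].T @ g / len(idx)
        b -= lr * g.mean()

accuracy = (((X @ w + b) > 0) == y.astype(bool)).mean()
```

The architecture hyper-parameters here reduce to the single layer's input size; adding layers, choosing their nature (dense, convolutional) and sizes is exactly the search the slide calls tricky.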

slide-53
SLIDE 53

Learning| Countermeasures| Machine Learning| Building a Community| Conclusions| ASCAD Open Data-Base| Results|

Creation of an open database for Training and Testing

ANSSI recently published:

• source code of secure implementations of AES128 for public 8-bit architectures (https://github.com/ANSSI-FR/secAES-ATmega8515)
◮ first version: first-order masking + processing in random order
◮ second version: affine masking + processing in random order (plus other minor tricks)
• databases of electromagnetic leakages (https://github.com/ANSSI-FR/ASCAD)
• example scripts for the training and testing of models in SCA contexts

Goal

• Enable fair and easy benchmarking
• Initiate discussions and exchanges on the application of DL to SCA
• Create a community of contributors on this subject

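The ASCAD traces ship as an HDF5 file; the group and dataset names below follow the layout of the ANSSI-FR/ASCAD repository (profiling and attack sets, each with `traces` and `labels`), while the sizes and contents are synthetic stand-ins so the snippet is self-contained:

```python
import numpy as np
import h5py

# Write a miniature file with the same layout as the ASCAD database.
with h5py.File("toy_ascad.h5", "w") as f:
    for grp in ("Profiling_traces", "Attack_traces"):
        g = f.create_group(grp)
        g.create_dataset("traces", data=np.zeros((100, 700), dtype=np.int8))
        g.create_dataset("labels", data=np.zeros(100, dtype=np.uint8))

# Typical read pattern when training and then testing a model:
with h5py.File("toy_ascad.h5", "r") as f:
    X_prof = f["Profiling_traces/traces"][()]
    y_prof = f["Profiling_traces/labels"][()]
    X_att = f["Attack_traces/traces"][()]
```

Keeping the profiling and attack sets in separate groups is what makes the benchmarking fair: models are tuned on one and scored on the other.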

slide-54
SLIDE 54

Comparisons with State-of-the-Art Methods

slide-55
SLIDE 55

Feedback & Open Issues

Feedback

slide-56
SLIDE 56

Feedback & Open Issues

Feedback

• The number of epochs for the training is between 100 and 1000

slide-57
SLIDE 57

Feedback & Open Issues

Feedback

• The number of epochs for the training is between 100 and 1000
• Model architectures are relatively complex (more than 10 layers)

slide-58
SLIDE 58

Feedback & Open Issues

Feedback

• The number of epochs for the training is between 100 and 1000
• Model architectures are relatively complex (more than 10 layers)
• Databases for the training must be large

slide-59
SLIDE 59

Feedback & Open Issues

Feedback

• The number of epochs for the training is between 100 and 1000
• Model architectures are relatively complex (more than 10 layers)
• Databases for the training must be large
• Requires significant processing capacities (several GPUs, large RAM, etc.)

slide-60
SLIDE 60

Feedback & Open Issues

Feedback

• The number of epochs for the training is between 100 and 1000
• Model architectures are relatively complex (more than 10 layers)
• Databases for the training must be large
• Requires significant processing capacities (several GPUs, large RAM, etc.)
• Importance of cross-validation

slide-61
SLIDE 61

Feedback & Open Issues

Feedback

• The number of epochs for the training is between 100 and 1000
• Model architectures are relatively complex (more than 10 layers)
• Databases for the training must be large
• Requires significant processing capacities (several GPUs, large RAM, etc.)
• Importance of cross-validation

Open Issues

slide-62
SLIDE 62

Feedback & Open Issues

Feedback

• The number of epochs for the training is between 100 and 1000
• Model architectures are relatively complex (more than 10 layers)
• Databases for the training must be large
• Requires significant processing capacities (several GPUs, large RAM, etc.)
• Importance of cross-validation

Open Issues

• Models are trained to recover manipulated values (e.g. S-box outputs) but not the key itself

slide-63
SLIDE 63

Learning| Countermeasures| Machine Learning| Building a Community| Conclusions| ASCAD Open Data-Base| Results|

Feedbacks & Open Issues

Feedbacks

The number of epochs for the training is between 100 and 1000 Model architectures are relatively complex (more than 10 layers) Data-bases for the training must be large Require important processing capacities (several GPUs, RAM memory, etc.) Importance of cross-validation

Open Issues

Models are trained to recover manipulated values (e.g. sbox outputs) but not the key itself Current loss functions measure the accuracy of pdf estimations but not the efficiency of the resulting attack

16/18 Emmanuel PROUFF - ANSSI / Invited Talk PANDA 2018

slide-64
SLIDE 64

Learning| Countermeasures| Machine Learning| Building a Community| Conclusions| ASCAD Open Data-Base| Results|

Feedback & Open Issues

Feedback

◮ The number of epochs for training is between 100 and 1,000
◮ Model architectures are relatively complex (more than 10 layers)
◮ Databases for training must be large
◮ Important processing capacities are required (several GPUs, large RAM, etc.)
◮ Cross-validation is important
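The cross-validation point above can be sketched in plain NumPy. This is a minimal illustration, not tooling from the talk; `kfold_indices` and its parameters are assumptions made for the example. Shuffling before splitting matters for side-channel traces, since folds that follow acquisition order can be biased by drift:

```python
import numpy as np

def kfold_indices(n_traces, n_folds, seed=0):
    """Yield (train, validation) index arrays for k-fold cross-validation.

    Indices are shuffled first so that folds do not follow the acquisition
    order of the traces (acquisition drift would otherwise bias the folds).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_traces)
    for fold in np.array_split(idx, n_folds):
        # Training set = everything outside the held-out fold.
        train = np.setdiff1d(idx, fold)
        yield train, fold

# Example: 5-fold split of 1000 profiling traces.
for train, val in kfold_indices(1000, 5):
    assert len(train) == 800 and len(val) == 200
```

Each model configuration is then trained on the `train` indices and scored on the held-out `val` indices, and the per-fold scores are averaged before comparing architectures.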

Open Issues

◮ Models are trained to recover manipulated values (e.g. sbox outputs) but not the key itself
◮ Current loss functions measure the accuracy of pdf estimations but not the efficiency of the resulting attack
◮ Adaptation is needed to obtain (very) efficient key-enumeration algorithms
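The first and last open issues above can be illustrated together: the model only outputs per-trace probabilities for an intermediate value, so an extra step combines them into per-key scores, and sorting those scores is the simplest form of key enumeration. A minimal sketch, assuming a 4-bit toy S-box (the PRESENT S-box, so the table stays small; a real AES attack would use the 256-entry AES S-box) and simulated model outputs; `key_scores` and all data here are illustrative:

```python
import numpy as np

# Toy 4-bit bijective S-box (the PRESENT S-box), standing in for the AES S-box.
SBOX = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])

def key_scores(probs, plaintexts):
    """Combine per-trace model outputs into per-key log-likelihoods.

    probs[i, v] is the model's estimate of Pr[sbox output = v | trace i].
    For each key hypothesis k, sum the log-probabilities of the sbox
    output predicted from the known plaintexts under that hypothesis.
    """
    n = len(plaintexts)
    scores = np.empty(len(SBOX))
    for k in range(len(SBOX)):
        z = SBOX[plaintexts ^ k]  # hypothesis-dependent intermediate value
        scores[k] = np.log(probs[np.arange(n), z] + 1e-40).sum()
    return scores

# Simulated attack: the model is noisy but biased towards the true output.
rng = np.random.default_rng(0)
true_key = 9
pt = rng.integers(0, 16, size=200)
probs = np.full((200, 16), 1 / 32)
probs[np.arange(200), SBOX[pt ^ true_key]] += 0.5  # each row sums to 1

scores = key_scores(probs, pt)
ranking = np.argsort(-scores)  # best hypothesis first: toy key enumeration
assert ranking[0] == true_key
```

The open issue is precisely that this combination step happens *after* training: the loss optimized during training scores `probs` trace by trace, not the margin between the true key and its rivals in `scores`.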

16/18 Emmanuel PROUFF - ANSSI / Invited Talk PANDA 2018

slide-65
SLIDE 65

Learning| Countermeasures| Machine Learning| Building a Community| Conclusions| Recent Results|

Recent Results on DL for SCA Evaluation

◮ DL for leakage identification after successful attack: HettwerGehrerGüneysu2019, PerinEgeWoudenberg, MasureDumasProuff2019

◮ Application to asymmetric cryptography running on defensive hardware: CarboneConinCornelieDassanceDufresneDumasProuffVenelli2019

◮ Various studies to explain the behaviour of deep learning analysis (and/or to improve it) in the context of side-channel attacks: PicekHeuserJovicBhasinRegazzoni19, KimPicekHeuserBhasinHanjalic19, etc.

17/18 Emmanuel PROUFF - ANSSI / Invited Talk PANDA 2018

slide-71
SLIDE 71

Learning| Countermeasures| Machine Learning| Building a Community| Conclusions| Recent Results|

Conclusions

◮ State-of-the-art Template Attacks separate resynchronization/dimensionality reduction from characterization
◮ Deep Learning provides an integrated approach that extracts information directly from raw data (no preprocessing)
◮ Many recent results validate the practical interest of the Machine Learning approach
◮ We are at the very beginning and are still discovering how efficient Deep Learning is
◮ New needs:

◮ big databases for training,
◮ platforms to enable comparisons and benchmarking,
◮ an open "ML for Embedded Security Analysis" community,
◮ exchanges with the Machine Learning community,
◮ a better understanding of the efficiency of the current countermeasures

Thank You! Questions?

18/18 Emmanuel PROUFF - ANSSI / Invited Talk PANDA 2018
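The contrast drawn in the conclusions between classical template attacks and the integrated deep-learning approach can be made concrete with a minimal Gaussian-template sketch: a profiling phase that builds one mean per class plus a pooled noise covariance, then a matching phase that scores an attack trace against each template. All function names and the synthetic data are illustrative assumptions, not code from the talk:

```python
import numpy as np

def build_templates(traces, labels, n_classes):
    """Profiling phase: one mean per class plus a pooled noise covariance."""
    means = np.stack([traces[labels == c].mean(axis=0) for c in range(n_classes)])
    centered = traces - means[labels]
    cov = centered.T @ centered / len(traces)
    # A small ridge keeps the covariance invertible on short profiling sets.
    return means, np.linalg.inv(cov + 1e-6 * np.eye(traces.shape[1]))

def template_scores(trace, means, cov_inv):
    """Matching phase: Gaussian log-likelihood per class, up to a constant."""
    d = means - trace
    return -0.5 * np.einsum('ci,ij,cj->c', d, cov_inv, d)

# Synthetic demo: 4 classes, 8-sample traces, Gaussian noise.
rng = np.random.default_rng(1)
n_classes, dim = 4, 8
centers = rng.normal(size=(n_classes, dim)) * 3.0
labels = rng.integers(0, n_classes, size=400)
traces = centers[labels] + rng.normal(size=(400, dim)) * 0.5

means, cov_inv = build_templates(traces, labels, n_classes)
attack_trace = centers[2] + rng.normal(size=dim) * 0.5
assert int(np.argmax(template_scores(attack_trace, means, cov_inv))) == 2
```

Note what this pipeline takes for granted: `traces` must already be resynchronized and reduced to a few points of interest before the Gaussian model fits; the deep-learning approach discussed in the talk learns directly from the raw traces and folds those preprocessing steps into the model itself.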