

SLIDE 1

Make Some Noise Unleashing the Power of Convolutional Neural Networks for Profiled Side-channel Analysis

Jaehun Kim¹, Stjepan Picek¹, Annelie Heuser², Shivam Bhasin³, Alan Hanjalic¹

¹ Delft University of Technology, Delft, The Netherlands
² Univ Rennes, Inria, CNRS, IRISA, France
³ Physical Analysis and Cryptographic Engineering, Temasek Laboratories at Nanyang Technological University, Singapore

CHES 2019, Atlanta, August 28, 2019

1 / 36

SLIDE 2

Outline

1 Motivation
2 Side-channel Analysis
3 Deep Learning
4 Adding Noise
5 Results
6 Conclusions


SLIDE 3

Outline: Motivation

SLIDE 4

What and Why

• Investigate the limits of CNNs’ performance in side-channel analysis.
• Propose a new CNN instance (and justify the choice of that instance).
• Investigate how additional, non-task-specific noise can significantly improve the performance of CNNs in side-channel analysis.
• Show how small changes in the setup result in big differences in performance.


SLIDE 5

Outline: Side-channel Analysis

SLIDE 6

Profiled Attacks

• Side-channel attacks (SCAs) are passive, non-invasive implementation attacks.
• Profiled attacks hold a prominent place as the most powerful among side-channel attacks.
• In the profiling phase, the adversary estimates leakage models for the targeted intermediate computations; these models are then exploited to extract secret information in the actual attack phase.


SLIDE 7

Profiled Attacks

SLIDE 8

Profiled Attacks

• The Template Attack (TA) is the most powerful attack from the information-theoretic point of view.
• Some machine learning (ML) techniques also belong to the profiled attacks.
• Deep learning has been shown to reach top performance even if the device is protected with countermeasures.


SLIDE 9

Outline: Deep Learning

SLIDE 10

Deep Learning

Let us build a neural network.


SLIDE 11

Deep Learning

Let us continue adding neurons.


SLIDE 12

Multilayer Perceptron - “Many” Hidden Layers

SLIDE 13

Multilayer Perceptron - One Hidden Layer

SLIDE 14

Universal Approximation Theorem

• A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of Rⁿ.
• Given enough hidden units and enough data, multilayer perceptrons can approximate virtually any function to any desired accuracy.
• These results hold only if there is a sufficiently large amount of training data.
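As a toy illustration of the theorem (not from the slides), the numpy sketch below fits a single-hidden-layer network to sin(x) on a compact interval. To keep it short, the hidden weights are drawn at random and only the linear output layer is fitted by least squares (a random-feature shortcut), so this demonstrates representational capacity rather than full MLP training; all sizes and scales are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function on a compact interval.
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

# Single hidden layer: h = tanh(x @ W + b), with random hidden weights.
n_hidden = 50
W = rng.normal(0.0, 2.0, (1, n_hidden))
b = rng.normal(0.0, 2.0, n_hidden)
h = np.tanh(x @ W + b)

# Fit only the linear output layer by least squares.
beta, *_ = np.linalg.lstsq(h, y, rcond=None)
y_hat = h @ beta

max_err = np.abs(y_hat - y).max()
print(f"max approximation error: {max_err:.4f}")
```

Shrinking `n_hidden` typically degrades the fit, which mirrors the "sufficiently many hidden units" condition in the theorem.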


SLIDE 15

Convolutional Neural Networks

• CNNs are a type of neural network first designed for 2-dimensional convolutions.
• They are primarily used for image classification, but lately they have proven to be powerful classifiers in other domains.
• From an operational perspective, CNNs are similar to ordinary neural networks: they consist of a number of layers, where each layer is made up of neurons.
• CNNs use three main types of layers: convolutional layers, pooling layers, and fully-connected layers.
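To make the layer types concrete, here is a minimal numpy sketch (illustrative only, not the paper's implementation) of a 1-D convolution followed by non-overlapping max pooling, as one might apply to a side-channel trace; the trace and kernel values are made up.

```python
import numpy as np

def conv1d(x, kernel):
    # "valid" 1-D convolution (cross-correlation, as used in deep learning)
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def max_pool1d(x, size=2, stride=2):
    # non-overlapping max pooling
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, stride)])

trace = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 1.0, 0.0, 2.0])
kernel = np.array([1.0, 0.0, -1.0])   # a simple difference filter

feat = conv1d(trace, kernel)          # length 8 - 3 + 1 = 6
pooled = max_pool1d(feat)             # length 3
print(feat, pooled)
```

Stacking several such convolution/pooling stages and flattening the result into fully-connected layers gives the overall CNN shape described above.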


SLIDE 16

Convolutional Neural Networks - Convolution Layer

SLIDE 17

Convolutional Neural Networks - Pooling

SLIDE 18

Design Principle - VGG Like CNN

• Small kernel size: 3 × 3 for every layer.
• Max pooling with 2 × 2 windows, with stride 2.
• Increasing number of filters per layer: doubled after every max-pooling layer.
• Convolutional blocks are added until the spatial dimension is reduced to 1.
• The output layer comes after the fully-connected layers.
• The convolutional and fully-connected layers use ReLU activations; the output layer uses Softmax to normalize the predictions.


SLIDE 19

Design Principle - VGG Like CNN

net = fc_{θ,softmax} ◦ ( ◦_{p=1}^{P} fc_{θ_p,ReLU} ) ◦ ( ◦_{q=1}^{Q} [ pool_Max ◦ ( ◦_{r=1}^{R_q} conv_{φ_r,ReLU} ) ] ), (1)

conv_{φ,σ}(X) = σ(φ ∗ X), (2)

fc_{θ,σ}(x) = σ(θ⊺x). (3)
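The composition can be sketched directly in numpy. The toy example below (illustrative only: random weights and made-up sizes) implements conv_{φ,σ} and fc_{θ,σ} as in Eqs. (2) and (3) and chains them as in Eq. (1), with one convolutional block (R_q = 1), one max-pooling step, one hidden fully-connected layer, and a softmax output.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def conv(x, phi, sigma):
    # conv_{phi,sigma}(x) = sigma(phi * x), "valid" cross-correlation  (Eq. 2)
    k = len(phi)
    return sigma(np.array([np.dot(x[i:i + k], phi) for i in range(len(x) - k + 1)]))

def pool_max(x, size=2):
    # non-overlapping max pooling
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, size)])

def fc(x, theta, sigma):
    # fc_{theta,sigma}(x) = sigma(theta^T x)  (Eq. 3)
    return sigma(theta.T @ x)

rng = np.random.default_rng(1)
x = rng.normal(size=16)               # toy 1-D trace

phi = rng.normal(size=3)
h = pool_max(conv(x, phi, relu))      # conv -> ReLU -> max pool
theta1 = rng.normal(size=(len(h), 8))
theta2 = rng.normal(size=(8, 4))
# net = fc_softmax ∘ fc_ReLU ∘ pool_Max ∘ conv_ReLU, as in Eq. (1)
p = fc(fc(h, theta1, relu), theta2, softmax)

print(p)  # probabilities over 4 toy classes, summing to 1
```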


SLIDE 20

Convolutional Neural Networks - Final

Figure: the final CNN, combining convolutional blocks with 8, 16, 32, 64, 128, and 128 filters and fully-connected layers with 512, 256, and 256 units, together with BN, Max Pooling (2), Dropout, and Flatten layers, mapping the input to P(y|x).


SLIDE 21

Outline: Adding Noise

SLIDE 22

Why Adding Noise

• To reduce overfitting of the model, we introduce noise into the training phase.
• Since in our case the input normalization is also learned during training via the BN layer, we add the noise tensor after the first BN:

X∗ = BN₀(X) + Ψ, Ψ ∼ N(0, α). (4)

• The noise tensor follows the normal distribution.
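A minimal numpy sketch of this step (illustrative, not the paper's code): BN is simplified to plain batch standardization without learned scale and shift, and α is taken here as the standard deviation of the Gaussian noise, which is added only at training time.

```python
import numpy as np

def batch_norm(X, eps=1e-5):
    # simplified BN over the batch dimension (learned scale/shift omitted)
    mu = X.mean(axis=0)
    var = X.var(axis=0)
    return (X - mu) / np.sqrt(var + eps)

def add_training_noise(X, alpha, rng):
    # X* = BN0(X) + Psi,  Psi ~ N(0, alpha)  (Eq. 4) -- training time only
    return batch_norm(X) + rng.normal(0.0, alpha, size=X.shape)

rng = np.random.default_rng(0)
X = rng.normal(2.0, 5.0, size=(64, 100))             # 64 toy traces, 100 samples each

X_train = add_training_noise(X, alpha=0.5, rng=rng)  # noisy input for training
X_eval = batch_norm(X)                               # no noise at attack/evaluation time
```

Leaving the noise out at evaluation time is what makes this a regularizer rather than a change to the measurements being attacked.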


SLIDE 23

Data Augmentation

• Is this data augmentation? Data augmentation typically applies domain knowledge to deform the original signal in a more “plausible” way, and uses both the original and the transformed measurements in the training phase.
• Our technique is a regularization technique; it can be viewed either as 1) noisy training or as 2) data augmentation.
• We are closer to noisy training, since we neither 1) add transformed measurements nor 2) use domain-specific knowledge.


SLIDE 24

Underfitting and Overfitting

SLIDE 25

Outline: Results

SLIDE 26

Results

• We consider only the intermediate value leakage model (256 classes).
• We experiment with CNNs, the template attack, and the pooled template attack.
• We use 4 publicly available datasets.


SLIDE 27

Datasets

Figures: Gini Importance over the trace samples for the datasets, including (a) the DPAcontest v4 dataset and (b) the AES HD dataset.


SLIDE 28

Results DPAv4

(a) RD network per fold. (b) RD network averaged.

SLIDE 29

Results AES HD

(a) ASCAD network per fold. (b) ASCAD network averaged.

SLIDE 30

Results AES RD

(a) RD network per fold. (b) RD network averaged.

SLIDE 31

Results ASCAD

(a) ASCAD network per fold. (b) ASCAD network averaged.

SLIDE 32

What Else Do We Demonstrate

• We show that noise addition is quite stable over different levels of noise, numbers of epochs, and profiling set sizes.
• Our results indicate that it is not possible to have a single best CNN architecture, even when considering “only” SCA.
• Attacking a dataset without countermeasures can be more difficult than attacking one with countermeasures.
• What is really a new CNN architecture, and what is simply a new instance adapted to the input data?
• The fewer traces we have in the profiling phase, the more noise we need.


SLIDE 33

Beware of the Choice of the Profiling Set

(a) ASCAD network per fold. (b) RD network per fold.

SLIDE 34

Outline: Conclusions

SLIDE 35

Conclusions

• VGG-like CNNs seem to work very well for SCA.
• Other domains use machine learning/deep learning, and we can learn a lot from them; by adopting such good practices here, we are able to reach top performance in SCA.
• We propose noise addition as a standard technique in the SCA evaluation of deep learning techniques.


SLIDE 36

Questions?

Thanks for your attention!
