 
Make Some Noise: Unleashing the Power of Convolutional Neural Networks for Profiled Side-channel Analysis
Jaehun Kim (1), Stjepan Picek (1), Annelie Heuser (2), Shivam Bhasin (3), Alan Hanjalic (1)
(1) Delft University of Technology, Delft, The Netherlands
(2) Univ Rennes, Inria, CNRS, IRISA, France
(3) Physical Analysis and Cryptographic Engineering, Temasek Laboratories at Nanyang Technological University, Singapore
CHES 2019, Atlanta, August 28, 2019
Outline
1 Motivation
2 Side-channel Analysis
3 Deep Learning
4 Adding Noise
5 Results
6 Conclusions
1 Motivation
What and Why
- Investigate the limits of CNN performance in side-channel analysis.
- Propose a new CNN instance (and justify the choice of that instance).
- Investigate how additional, non-task-specific noise can significantly improve the performance of CNNs in side-channel analysis.
- Show how small changes in the setup result in big differences in performance.
2 Side-channel Analysis
Profiled Attacks
- Side-channel attacks (SCAs) are passive, non-invasive implementation attacks.
- Profiled attacks have a prominent place as the most powerful among side-channel attacks.
- In the profiling phase, the adversary estimates leakage models for the targeted intermediate computations; these models are then exploited to extract secret information in the actual attack phase.
Profiled Attacks [figure]
Profiled Attacks
- The template attack (TA) is the most powerful attack from the information-theoretic point of view.
- Some machine learning (ML) techniques also belong to the class of profiled attacks.
- Deep learning has been shown to reach top performance even if the device is protected with countermeasures.
3 Deep Learning
Deep Learning: Let us build a neural network. [figure]
Deep Learning: Let us continue adding neurons. [figure]
Multilayer Perceptron - “Many” Hidden Layers [figure]
Multilayer Perceptron - One Hidden Layer [figure]
Universal Approximation Theorem
- A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of R^n.
- Given enough hidden units and enough data, multilayer perceptrons can approximate virtually any function to any desired accuracy.
- The results are valid only if a sufficiently large amount of training data is available.
Convolutional Neural Networks
- CNNs are a type of neural network first designed for 2-dimensional convolutions.
- They are primarily used for image classification, but lately they have proven to be powerful classifiers in other domains.
- From the operational perspective, CNNs are similar to ordinary neural networks: they consist of a number of layers, where each layer is made up of neurons.
- CNNs use three main types of layers: convolutional layers, pooling layers, and fully-connected layers.
Convolutional Neural Networks - Convolution Layer [figure]
Convolutional Neural Networks - Pooling [figure]
Design Principle - VGG-like CNN
- Small kernel size: 3 × 3 for every layer.
- Max pooling with 2 × 2 windows, with stride 2.
- Increasing number of filters per layer: doubled after every max pooling layer.
- Convolutional blocks are added until the spatial dimension is reduced to 1.
- The fully connected layers are followed by the output layer.
- The convolutional and fully connected layers use ReLU activations; the output layer uses softmax to normalize the predictions.
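The halving and doubling rules above can be turned into a small piece of arithmetic. The sketch below is only an illustration under stated assumptions: it treats the input as a 1-D trace (so pooling halves a single length), the function name `vgg_like_plan` and the starting filter count of 8 are hypothetical choices, and it applies the doubling rule strictly without any cap on the number of filters.

```python
def vgg_like_plan(input_length, first_filters=8):
    """List (filters, length_after_pooling) for each convolutional block.

    Assumptions (not taken from the slides): 1-D input, one pooling step
    with window 2 and stride 2 per block (floor division), filters doubled
    after every pooling step, blocks added until the length reaches 1.
    """
    plan = []
    length, filters = input_length, first_filters
    while length > 1:
        length //= 2                 # max pooling halves the spatial dimension
        plan.append((filters, length))
        filters *= 2                 # doubled after every max pooling layer
    return plan


plan = vgg_like_plan(700)
for block, (filters, length) in enumerate(plan, start=1):
    print(f"block {block}: {filters} filters, output length {length}")
```

For an input of 700 samples this yields nine blocks, which shows how the input length alone fixes the depth of the convolutional part under these rules.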
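Design Principle - VGG-like CNN
\[
\mathrm{net} = \mathrm{fc}_{\theta,\mathrm{softmax}} \circ \prod_{p=1}^{P} \mathrm{fc}_{\theta^{p},\mathrm{ReLU}} \circ \prod_{q=1}^{Q} \Big[ \mathrm{pool}_{\mathrm{Max}} \circ \prod_{r=1}^{R_q} \mathrm{conv}_{\phi^{r},\mathrm{ReLU}} \Big] \tag{1}
\]
\[
\mathrm{conv}_{\phi,\sigma}(X) = \sigma(\phi * X), \tag{2}
\]
\[
\mathrm{fc}_{\theta,\sigma}(x) = \sigma(\theta^{\intercal} x). \tag{3}
\]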
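To make the composition in Eqs. (1) to (3) concrete, here is a minimal NumPy sketch of a forward pass. It is not the authors' implementation: it assumes 1-D inputs of shape (length, channels), zero "same" padding, no bias terms (as in the equations), and it omits batch normalization and dropout. As in common deep learning frameworks, the "convolution" is computed as a cross-correlation. All shapes and weights in the usage lines are arbitrary illustrative values.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()


def conv(phi, sigma, X):
    """Eq. (2): sigma(phi * X). X: (length, in_ch), phi: (k, in_ch, out_ch)."""
    k = phi.shape[0]
    pad = k // 2
    Xp = np.pad(X, ((pad, pad), (0, 0)))          # zero padding keeps the length
    out = np.stack([
        np.tensordot(Xp[i:i + k], phi, axes=([0, 1], [0, 1]))
        for i in range(X.shape[0])
    ])
    return sigma(out)


def pool_max(X):
    """Max pooling with window 2 and stride 2 along the length axis."""
    n = X.shape[0] // 2
    return X[:2 * n].reshape(n, 2, X.shape[1]).max(axis=1)


def fc(theta, sigma, x):
    """Eq. (3): sigma(theta^T x)."""
    return sigma(theta.T @ x)


def net(X, conv_blocks, fc_hidden, theta_out):
    """Eq. (1): Q conv blocks, then P hidden fc layers, then the softmax output."""
    for block in conv_blocks:        # q = 1..Q
        for phi in block:            # r = 1..R_q
            X = conv(phi, relu, X)
        X = pool_max(X)
    x = X.reshape(-1)                # flatten
    for theta in fc_hidden:          # p = 1..P
        x = fc(theta, relu, x)
    return fc(theta_out, softmax, x)


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 1))                                  # toy trace, 8 samples
conv_blocks = [[0.1 * rng.normal(size=(3, 1, 4))],           # length 8 -> 4
               [0.1 * rng.normal(size=(3, 4, 8))]]           # length 4 -> 2
fc_hidden = [0.1 * rng.normal(size=(16, 10))]
theta_out = 0.1 * rng.normal(size=(10, 256))
probs = net(X, conv_blocks, fc_hidden, theta_out)
```

The result `probs` is a vector of 256 class probabilities, matching the 256-class setting used later in the talk.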
Convolutional Neural Networks - Final
[Figure: final network architecture. Labels recoverable from the diagram: input, BN, max pooling (2), flatten, dropout, output P(y|x); layer sizes shown: 8, 16, 32, 64, 128, 128, 256, 256, 512.]
4 Adding Noise
Why Add Noise
- To reduce overfitting of the model, we introduce noise in the training phase.
- Because the input normalization is itself learned during training via the batch normalization (BN) layer, we add the noise tensor after the first BN layer:
\[
X^{*} = \mathrm{BN}_{0}(X) + \Psi, \qquad \Psi \sim \mathcal{N}(0, \alpha). \tag{4}
\]
- The noise tensor Ψ follows a zero-mean normal distribution whose spread is set by α.
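Eq. (4) can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the authors' code: α is interpreted here as the standard deviation of the noise (the slide does not say whether it is a standard deviation or a variance), the BN layer is reduced to plain per-feature standardization without learned scale and shift, and the noise is applied only when `training` is true. The function names are hypothetical.

```python
import numpy as np


def batch_norm(X, eps=1e-5):
    """Simplified BN_0: standardize each sample point over the batch."""
    return (X - X.mean(axis=0)) / np.sqrt(X.var(axis=0) + eps)


def add_training_noise(X, alpha, rng, training=True):
    """Eq. (4): X* = BN_0(X) + Psi, with Psi drawn from a zero-mean normal.

    Assumption: alpha is used as the standard deviation of Psi.
    """
    Xn = batch_norm(X)
    if not training:
        return Xn                    # no noise outside the training phase
    return Xn + rng.normal(0.0, alpha, size=Xn.shape)


rng = np.random.default_rng(0)
X = rng.normal(5.0, 2.0, size=(2000, 50))        # toy batch: 2000 traces, 50 samples
clean = add_training_noise(X, 0.5, rng, training=False)
noisy = add_training_noise(X, 0.5, rng)
```

Because fresh noise is drawn on every call, the network sees a different perturbation of each trace in every epoch, while the number of training traces stays the same.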
Data Augmentation
- Is this data augmentation?
- Data augmentation typically applies domain knowledge to deform the original signal in a “plausible” way and uses both the original and the transformed measurements in the training phase.
- Our technique is a regularization technique and can be seen as 1) noisy training and 2) data augmentation.
- We are closer to noisy training, as we neither 1) add transformed measurements nor 2) use domain-specific knowledge.
Underfitting and Overfitting [figure]
5 Results
Results
- We consider only the intermediate value model (256 classes).
- We experiment with CNNs, the template attack, and the pooled template attack.
- We use 4 publicly available datasets.
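For readers unfamiliar with the pooled template attack used as a baseline, the following is a minimal sketch of the standard idea: one Gaussian mean per class and a single covariance matrix shared by all classes. It is a generic illustration, not the evaluation code behind the results; the function names are hypothetical, and the synthetic data uses 4 classes and 5 sample points instead of 256 classes to keep the example small.

```python
import numpy as np


def fit_pooled_templates(traces, labels, n_classes):
    """Profiling: per-class means and one covariance pooled over all classes."""
    means = np.stack([traces[labels == c].mean(axis=0) for c in range(n_classes)])
    centered = traces - means[labels]
    cov = centered.T @ centered / (len(traces) - n_classes)
    return means, cov


def log_likelihoods(traces, means, cov):
    """Attack: Gaussian log-likelihood per class, up to a class-independent constant."""
    prec = np.linalg.inv(cov)
    diff = traces[:, None, :] - means[None, :, :]        # (n, classes, d)
    return -0.5 * np.einsum("ncd,de,nce->nc", diff, prec, diff)


# Synthetic stand-in for measurements (purely illustrative).
rng = np.random.default_rng(0)
true_means = 4.0 * np.eye(4, 5)


def simulate(n):
    y = rng.integers(0, 4, size=n)
    return true_means[y] + rng.normal(size=(n, 5)), y


prof_x, prof_y = simulate(2000)          # profiling set
att_x, att_y = simulate(500)             # attack set
means, cov = fit_pooled_templates(prof_x, prof_y, 4)
pred = log_likelihoods(att_x, means, cov).argmax(axis=1)
accuracy = (pred == att_y).mean()
```

Sharing one covariance matrix means the log-determinant term is identical for every class and can be dropped; the non-pooled template attack would instead estimate a separate covariance per class.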
Datasets
[Figure: Gini importance (y-axis) per sample (x-axis), four panels, of which two are captioned: (a) DPAcontest v4 dataset. (b) AES HD dataset.]
Results: DPAv4 [Figure: (a) RD network per fold. (b) RD network averaged.]
Results: AES HD [Figure: (a) ASCAD network per fold. (b) ASCAD network averaged.]
Results: AES RD [Figure: (a) RD network per fold. (b) RD network averaged.]
Results: ASCAD [Figure: (a) ASCAD network per fold. (b) ASCAD network averaged.]
What Else Do We Demonstrate
- We show noise addition to be quite stable over different levels of noise, numbers of epochs, and profiling set sizes.
- Our results indicate that it is not possible to have a single best CNN architecture, even when considering “only” SCA.
- Attacking a dataset without countermeasures could be more difficult than attacking one that has countermeasures.
- What is really a new CNN architecture, and what is simply a new instance adapted to the input data?
- The fewer traces we have in the profiling phase, the more noise we need.
Beware of the Choice of the Profiling Set [Figure: (a) ASCAD network per fold. (b) RD network per fold.]
6 Conclusions
Conclusions
- VGG-like CNNs seem to work very well for SCA.
- There are other domains that use machine learning/deep learning, and we can learn a lot from them.
- Here, by using such good practices, we are able to reach top performance in SCA.
- We propose noise addition as a standard technique in SCA evaluations that use deep learning.
Questions? Thanks for your attention!