

SLIDE 1

Make Some Noise Unleashing the Power of Convolutional Neural Networks for Profiled Side-channel Analysis

Jaehun Kim¹, Stjepan Picek¹, Annelie Heuser², Shivam Bhasin³, Alan Hanjalic¹

¹ Delft University of Technology, Delft, The Netherlands
² Univ Rennes, Inria, CNRS, IRISA, France
³ Physical Analysis and Cryptographic Engineering, Temasek Laboratories at Nanyang Technological University, Singapore

CHES 2019, Atlanta, August 28, 2019

1 / 36

SLIDE 2

Outline

1 Motivation
2 Side-channel Analysis
3 Deep Learning
4 Adding Noise
5 Results
6 Conclusions


SLIDE 3

Outline: Motivation

SLIDE 4

What and Why

• Investigate the limits of CNNs’ performance in side-channel analysis.
• Propose a new CNN instance (and justify the choice of that instance).
• Investigate how additional, non-task-specific noise can significantly improve the performance of CNNs in side-channel analysis.
• Show how small changes in the setup result in big differences in performance.


SLIDE 5

Outline: Side-channel Analysis

SLIDE 6

Profiled Attacks

• Side-channel attacks (SCAs) are passive, non-invasive implementation attacks.
• Profiled attacks hold a prominent place as the most powerful among side-channel attacks.
• In the profiling phase, the adversary estimates leakage models for the targeted intermediate computations; these models are then exploited to extract secret information in the actual attack phase.


SLIDE 7

Profiled Attacks

SLIDE 8

Profiled Attacks

• The Template Attack (TA) is the most powerful attack from the information-theoretic point of view.
• Some machine learning (ML) techniques also belong to the profiled attacks.
• Deep learning has been shown to reach top performance even if the device is protected with countermeasures.


SLIDE 9

Outline: Deep Learning

SLIDE 10

Deep Learning

Let us build a neural network.


SLIDE 11

Deep Learning

Let us continue adding neurons.


SLIDE 12

Multilayer Perceptron - “Many” Hidden Layers

SLIDE 13

Multilayer Perceptron - One Hidden Layer

SLIDE 14

Universal Approximation Theorem

• A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of Rⁿ.
• Given enough hidden units and enough data, multilayer perceptrons can approximate virtually any function to any desired accuracy.
• These results hold only if there is a sufficiently large amount of training data.
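As a toy illustration of the theorem (not from the slides), the numpy sketch below fits a single-hidden-layer network to sin(x) on a compact interval. To keep it short, the hidden weights are drawn at random and only the linear output layer is fitted by least squares (a random-feature shortcut), so this demonstrates representational capacity rather than full MLP training; all sizes and scales are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function on a compact interval.
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

# Single hidden layer: h = tanh(x @ W + b), with random hidden weights.
n_hidden = 50
W = rng.normal(0.0, 2.0, (1, n_hidden))
b = rng.normal(0.0, 2.0, n_hidden)
h = np.tanh(x @ W + b)

# Fit only the linear output layer by least squares.
beta, *_ = np.linalg.lstsq(h, y, rcond=None)
y_hat = h @ beta

max_err = np.abs(y_hat - y).max()
print(f"max approximation error: {max_err:.4f}")
```

Shrinking `n_hidden` typically degrades the fit, which mirrors the "sufficiently many hidden units" condition in the theorem.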


SLIDE 15

Convolutional Neural Networks

• CNNs are a type of neural network first designed for 2-dimensional convolutions.
• They are primarily used for image classification, but lately they have proven to be powerful classifiers in other domains.
• From an operational perspective, CNNs are similar to ordinary neural networks: they consist of a number of layers, where each layer is made up of neurons.
• CNNs use three main types of layers: convolutional layers, pooling layers, and fully-connected layers.
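To make the layer types concrete, here is a minimal numpy sketch (illustrative only, not the paper's implementation) of a 1-D convolution followed by non-overlapping max pooling, as one might apply to a side-channel trace; the trace and kernel values are made up.

```python
import numpy as np

def conv1d(x, kernel):
    # "valid" 1-D convolution (cross-correlation, as used in deep learning)
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def max_pool1d(x, size=2, stride=2):
    # non-overlapping max pooling
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, stride)])

trace = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 1.0, 0.0, 2.0])
kernel = np.array([1.0, 0.0, -1.0])   # a simple difference filter

feat = conv1d(trace, kernel)          # length 8 - 3 + 1 = 6
pooled = max_pool1d(feat)             # length 3
print(feat, pooled)
```

Stacking several such convolution/pooling stages and flattening the result into fully-connected layers gives the overall CNN shape described above.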


SLIDE 16

Convolutional Neural Networks - Convolution Layer

SLIDE 17

Convolutional Neural Networks - Pooling

SLIDE 18

Design Principle - VGG Like CNN

• Small kernel size: 3 × 3 for every layer.
• Max pooling with 2 × 2 windows, with stride 2.
• Increasing number of filters per layer: doubled after every max-pooling layer.
• Convolutional blocks are added until the spatial dimension is reduced to 1.
• The output layer comes after the fully-connected layers.
• The convolutional and fully-connected layers use ReLU activations; the output layer uses Softmax to normalize the predictions.


SLIDE 19

Design Principle - VGG Like CNN

net = fc_{θ,softmax} ◦ ( ◦_{p=1}^{P} fc_{θ_p,ReLU} ) ◦ ( ◦_{q=1}^{Q} [ pool_Max ◦ ( ◦_{r=1}^{R_q} conv_{φ_r,ReLU} ) ] ), (1)

conv_{φ,σ}(X) = σ(φ ∗ X), (2)

fc_{θ,σ}(x) = σ(θ⊺x). (3)
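The composition can be sketched directly in numpy. The toy example below (illustrative only: random weights and made-up sizes) implements conv_{φ,σ} and fc_{θ,σ} as in Eqs. (2) and (3) and chains them as in Eq. (1), with one convolutional block (R_q = 1), one max-pooling step, one hidden fully-connected layer, and a softmax output.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def conv(x, phi, sigma):
    # conv_{phi,sigma}(x) = sigma(phi * x), "valid" cross-correlation  (Eq. 2)
    k = len(phi)
    return sigma(np.array([np.dot(x[i:i + k], phi) for i in range(len(x) - k + 1)]))

def pool_max(x, size=2):
    # non-overlapping max pooling
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, size)])

def fc(x, theta, sigma):
    # fc_{theta,sigma}(x) = sigma(theta^T x)  (Eq. 3)
    return sigma(theta.T @ x)

rng = np.random.default_rng(1)
x = rng.normal(size=16)               # toy 1-D trace

phi = rng.normal(size=3)
h = pool_max(conv(x, phi, relu))      # conv -> ReLU -> max pool
theta1 = rng.normal(size=(len(h), 8))
theta2 = rng.normal(size=(8, 4))
# net = fc_softmax ∘ fc_ReLU ∘ pool_Max ∘ conv_ReLU, as in Eq. (1)
p = fc(fc(h, theta1, relu), theta2, softmax)

print(p)  # probabilities over 4 toy classes, summing to 1
```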


SLIDE 20

Convolutional Neural Networks - Final

Figure: the final CNN, combining convolutional blocks with 8, 16, 32, 64, 128, and 128 filters and fully-connected layers with 512, 256, and 256 units, together with BN, Max Pooling (2), Dropout, and Flatten layers, mapping the input to P(y|x).


SLIDE 21

Outline: Adding Noise

SLIDE 22

Why Adding Noise

• To reduce overfitting of the model, we introduce noise into the training phase.
• Since in our case the input normalization is also learned during training via the BN layer, we add the noise tensor after the first BN:

X∗ = BN₀(X) + Ψ, Ψ ∼ N(0, α). (4)

• The noise tensor follows the normal distribution.
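A minimal numpy sketch of this step (illustrative, not the paper's code): BN is simplified to plain batch standardization without learned scale and shift, and α is taken here as the standard deviation of the Gaussian noise, which is added only at training time.

```python
import numpy as np

def batch_norm(X, eps=1e-5):
    # simplified BN over the batch dimension (learned scale/shift omitted)
    mu = X.mean(axis=0)
    var = X.var(axis=0)
    return (X - mu) / np.sqrt(var + eps)

def add_training_noise(X, alpha, rng):
    # X* = BN0(X) + Psi,  Psi ~ N(0, alpha)  (Eq. 4) -- training time only
    return batch_norm(X) + rng.normal(0.0, alpha, size=X.shape)

rng = np.random.default_rng(0)
X = rng.normal(2.0, 5.0, size=(64, 100))             # 64 toy traces, 100 samples each

X_train = add_training_noise(X, alpha=0.5, rng=rng)  # noisy input for training
X_eval = batch_norm(X)                               # no noise at attack/evaluation time
```

Leaving the noise out at evaluation time is what makes this a regularizer rather than a change to the measurements being attacked.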


SLIDE 23

Data Augmentation

• Is this data augmentation? Data augmentation typically applies domain knowledge to deform the original signal in a more “plausible” way, and uses both the original and the transformed measurements in the training phase.
• Our technique is a regularization technique; it can be viewed either as 1) noisy training or as 2) data augmentation.
• We are closer to noisy training, since we neither 1) add transformed measurements nor 2) use domain-specific knowledge.


SLIDE 24

Underfitting and Overfitting

SLIDE 25

Outline: Results

SLIDE 26

Results

• We consider only the intermediate value leakage model (256 classes).
• We experiment with CNNs, the template attack, and the pooled template attack.
• We use 4 publicly available datasets.


SLIDE 27

Datasets

Figures: Gini Importance over the trace samples for the datasets, including (a) the DPAcontest v4 dataset and (b) the AES HD dataset.


SLIDE 28

Results DPAv4

(a) RD network per fold. (b) RD network averaged.

SLIDE 29

Results AES HD

(a) ASCAD network per fold. (b) ASCAD network averaged.

SLIDE 30

Results AES RD

(a) RD network per fold. (b) RD network averaged.

SLIDE 31

Results ASCAD

(a) ASCAD network per fold. (b) ASCAD network averaged.

SLIDE 32

What Else Do We Demonstrate

• We show that noise addition is quite stable over different levels of noise, numbers of epochs, and profiling set sizes.
• Our results indicate that it is not possible to have a single best CNN architecture, even when considering “only” SCA.
• Attacking a dataset without countermeasures can be more difficult than attacking one with countermeasures.
• What is really a new CNN architecture, and what is simply a new instance adapted to the input data?
• The fewer traces we have in the profiling phase, the more noise we need.


SLIDE 33

Beware of the Choice of the Profiling Set

(a) ASCAD network per fold. (b) RD network per fold.

SLIDE 34

Outline: Conclusions

SLIDE 35

Conclusions

• VGG-like CNNs seem to work very well for SCA.
• Other domains use machine learning/deep learning, and we can learn a lot from them; by adopting such good practices here, we are able to reach top performance in SCA.
• We propose noise addition as a standard technique in the SCA evaluation of deep learning techniques.


SLIDE 36

Questions?

Thanks for your attention!
