CS7015 (Deep Learning) : Lecture 13
Visualizing Convolutional Neural Networks, Guided Backpropagation, Deep Dream, Deep Art, Fooling Convolutional Neural Networks
Mitesh M. Khapra
Department of Computer Science and Engineering, Indian Institute of Technology Madras

Acknowledgements
Andrej Karpathy's video lecture on visualization and Deep Dream*
*Visualization, Deep Dream, Neural Style, Adversarial Examples

Module 13.1: Visualizing patches which maximally activate a neuron

Consider some neurons in a given layer of a CNN.

We can feed images to this CNN and identify the images which cause these neurons to fire.

We can then trace back to the patch in the image which causes these neurons to fire.

Let us look at the result of one such experiment, conducted by Girshick et al., 2014.

[Figure: AlexNet-style architecture — Input, 11 × 11 Convolution (96 filters), MaxPooling, 5 × 5 Convolution (256), MaxPooling, 3 × 3 Convolutions (384, 384, 256), MaxPooling, dense 4096, dense 4096, dense 1000]

They consider 6 neurons in the pool5 layer and find the image patches which cause these neurons to fire:
One neuron fires for faces of people
Another neuron fires for dog faces
Another neuron fires for flowers
Another neuron fires for numbers
Another neuron fires for houses
Another neuron fires for shiny surfaces
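The patch-tracing idea above can be sketched in a few lines. The following is a minimal NumPy illustration (not the original experiment's code): slide a single filter over a toy image, find where it fires most strongly, and cut out the corresponding input patch — the "maximally activating patch". The function name `max_activating_patch` and the toy image are invented for illustration.

```python
import numpy as np

def max_activating_patch(image, filt):
    """Return the input patch where `filt` produces its largest activation,
    plus the (row, col) of that patch's top-left corner."""
    kh, kw = filt.shape
    H, W = image.shape
    acts = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(acts.shape[0]):
        for j in range(acts.shape[1]):
            # cross-correlation: the neuron's pre-activation at (i, j)
            acts[i, j] = np.sum(image[i:i+kh, j:j+kw] * filt)
    i, j = np.unravel_index(np.argmax(acts), acts.shape)
    return image[i:i+kh, j:j+kw], (i, j)

# Toy example: a bright 3x3 blob; an averaging filter fires maximally on it.
img = np.zeros((8, 8))
img[2:5, 3:6] = 1.0
patch, loc = max_activating_patch(img, np.ones((3, 3)))
```

In a real CNN one would instead record the spatial location of the maximal activation in a deep feature map and map it back through the strides and kernel sizes to the receptive field in the input, but the principle is the same.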

Module 13.2: Visualizing filters of a CNN

Recall that we had done something similar while discussing autoencoders (x → h(x) → x̂).

We are interested in finding an input which maximally excites a neuron.

It turns out that the input which will maximally activate a neuron is the solution of

max_x {w_1ᵀx}  s.t.  ||x||₂ = √(xᵀx) = 1

Solution: x = w_1 / √(w_1ᵀw_1)
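The claim above is easy to check numerically. Here is a small NumPy sketch (not part of the lecture): the normalized weight vector attains the score w_1ᵀx = ||w_1||, and no other unit-norm vector does better.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)              # the neuron's weight vector w_1
x_star = w / np.sqrt(w @ w)         # the claimed maximizer, unit norm

# Compare against many random unit-norm inputs: none should beat x_star.
best_random = max((w @ v) / np.linalg.norm(v)
                  for v in rng.normal(size=(1000, 5)))
```

By Cauchy–Schwarz, w ⋅ x ≤ ||w|| ||x|| = ||w|| on the unit sphere, with equality exactly when x points along w — which is what `x_star` is.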

Now recall that we can think of a CNN also as a feed-forward network with sparse connections and weight sharing.

Once again, we are interested in knowing what kind of inputs will cause a given neuron to fire.

The solution would be the same: x = W / √(WᵀW), where W is the filter (2 × 2, in this case).

We can thus think of these filters as pattern detectors.

We can simply plot the K × K weights (filters) as images and visualize them as patterns.

The filters essentially detect these patterns (by causing the neurons to maximally fire).

This is only interpretable for the filters in the first convolution layer. (Why?)

max_x {w_1ᵀx}  s.t.  ||x||₂ = √(xᵀx) = 1
Solution: x = w_1 / √(w_1ᵀw_1)
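The plotting step amounts to normalizing each first-layer filter and tiling the filters into one grid image. A hedged NumPy sketch (the helper `tile_filters` is invented here, not from the lecture); the resulting array can be shown with any image viewer, e.g. matplotlib's `imshow`:

```python
import numpy as np

def tile_filters(filters):
    """Arrange (num, K, K) first-layer filters into one grid image,
    each filter rescaled to [0, 1] so its pattern is visible."""
    n, K, _ = filters.shape
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    grid = np.zeros((rows * K, cols * K))
    for idx in range(n):
        f = filters[idx]
        f = (f - f.min()) / (np.ptp(f) + 1e-8)   # per-filter min-max scaling
        r, c = divmod(idx, cols)
        grid[r*K:(r+1)*K, c*K:(c+1)*K] = f
    return grid
```

For deeper layers this direct plot is uninformative because those filters operate on feature maps, not on pixels — which is the point of the "(Why?)" above.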

Module 13.3: Occlusion experiments

Typically we are interested in understanding which portions of the image are responsible for maximizing the probability of a certain class.

We could occlude (gray out) different patches in the image and see the effect on the predicted probability of the correct class.

For example, this heat map shows that occluding the face of the dog causes a maximum drop in the prediction probability.

Similar observations are made for other images.

[Figure: (a) Input Image (True Label: Pomeranian), (b) Layer 5, strongest feature map; softmax outputs for classes such as pomeranian, wheel, hound]
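The sliding-occluder procedure can be sketched directly. Below, `score` stands in for any function mapping an image to the correct-class probability (with a real network it would be a forward pass); the function name `occlusion_heatmap` is an invented helper, not the paper's code.

```python
import numpy as np

def occlusion_heatmap(image, score, patch=3, fill=0.0):
    """Slide a gray square over the image; record, at each position,
    the drop in the class score. Large drop => important region."""
    H, W = image.shape
    base = score(image)
    heat = np.zeros((H - patch + 1, W - patch + 1))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            occluded = image.copy()
            occluded[i:i+patch, j:j+patch] = fill   # gray out this patch
            heat[i, j] = base - score(occluded)
    return heat

# Toy check: a "network" that only looks at the top-left 3x3 corner.
img = np.ones((8, 8))
heat = occlusion_heatmap(img, lambda im: im[0:3, 0:3].mean())
```

The hottest cell of `heat` is the region the score depends on most — exactly the dog-face effect described above.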

Module 13.4: Finding influence of input pixels using backpropagation

We can think of an image as m × n inputs x_0, x_1, …, x_{mn}.

We are interested in finding the influence of each of these inputs (x_i) on a given neuron (h_j).

If a small change in x_i causes a large change in h_j, then we can say that x_i has a lot of influence on h_j.

In other words, the gradient ∂h_j/∂x_i could tell us about the influence.

∂h_j/∂x_i = 0 → no influence
∂h_j/∂x_i large → high influence
∂h_j/∂x_i small → low influence

We could just compute these partial derivatives w.r.t. all the inputs (∂h_j/∂x_0, ∂h_j/∂x_1, …, ∂h_j/∂x_{mn}), and then visualize this gradient matrix as an image itself.

But how do we compute these gradients?

Recall that we can represent a CNN as a feed-forward neural network.

Then we already know how to compute influences (gradients) using backpropagation.

For example, we know how to backprop the gradients till the first hidden layer.
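For a one-layer network the gradient ∂h_j/∂x can even be written down by hand, which makes the "influence map" concrete. A minimal NumPy sketch (toy network and all names invented for illustration, no autograd involved):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)            # a flattened 4x4 "image"
W1 = rng.normal(size=(8, 16))      # first-layer weights
h_pre = W1 @ x
h = np.maximum(h_pre, 0)           # ReLU hidden layer

j = 3                              # the neuron whose influence map we want
# dh_j/dx: the j-th weight row if the neuron is alive, zero if it is dead
grad = (h_pre[j] > 0) * W1[j]
saliency = np.abs(grad).reshape(4, 4)   # visualize |gradient| as an image
```

With a deep CNN the same quantity is obtained by backpropagating from h_j down to the input; `saliency` is what gets plotted as the gradient image on the next slide.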

This is what we get if we compute the gradients and plot them as an image.

The above procedure does not show very sharp influences.

Springenberg et al. proposed "guided backpropagation", which gives a better idea about the influences.

Module 13.5: Guided Backpropagation

We feed an input to the CNN and do a forward pass.

We consider one neuron in some feature map at some layer.

We are interested in finding the influence of the input on this neuron.

We retain this neuron and set all other neurons in the layer to zero.

We now backpropagate all the way to the inputs.

Recall that during the forward pass, the ReLU activation allows only positive values to pass and clamps negative values to zero:

Forward pass:
 1  -1   5        1   0   5
 2  -5  -7   →    2   0   0
-3   2   4        0   2   4

Similarly, during the backward pass, no gradient passes through the dead ReLU neurons:

Backward pass (backpropagation), incoming gradients on the right:
-2   0  -1        -2   3  -1
 6   0   0   ←     6  -3   1
 0  -1   3         2  -1   3

In guided backpropagation, any negative gradients flowing from the upper layer are also set to 0:

Backward pass (guided backpropagation):
 0   0   0        -2   3  -1
 6   0   0   ←     6  -3   1
 0   0   3         2  -1   3

Intuition: neglect all the negative influences (gradients) and focus only on the positive influences (gradients).

This gives a better picture of the true influence of the input.

[Figure: input-influence visualizations compared — Backpropagation vs. Guided Backpropagation]
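The two backward rules from the numeric example above fit in four lines of NumPy. This reproduces the slide's numbers, assuming the grids are read as forward-pass inputs and incoming gradients:

```python
import numpy as np

forward_in = np.array([[ 1, -1,  5],
                       [ 2, -5, -7],
                       [-3,  2,  4]])          # pre-ReLU forward values
grad_from_above = np.array([[-2,  3, -1],
                            [ 6, -3,  1],
                            [ 2, -1,  3]])     # gradients from the upper layer

relu_mask = forward_in > 0                     # which units were alive
plain_bp  = grad_from_above * relu_mask        # standard ReLU backward rule
guided_bp = plain_bp * (grad_from_above > 0)   # additionally drop -ve gradients
```

In a framework like PyTorch the same effect is obtained by overriding the ReLU backward pass with this extra `grad > 0` mask.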

Module 13.6: Optimization over images

Suppose we want to create an image which looks like a dumbbell (or an ostrich, or a car, or just anything).

In other words, we want to create an image such that, if we pass it through a trained ConvNet, it maximizes the probability of the class dumbbell.

We could pose this as an optimization problem w.r.t. I = (i_0, i_1, …, i_{mn}):

arg max_I ( S_c(I) − λ Ω(I) )

S_c(I) = score for class c before softmax
Ω(I) = some regularizer to ensure that I looks like an image

We can essentially think of the image as a collection of parameters.

Keep the weights of the trained convolutional neural network fixed.

Now adjust these parameters (image pixels) so that the score of a class is maximized.

Let us see how.

[Figure: the trained ConvNet, with the image (initialized to zero) as input and the score for class i as output]

1. Start with a zero image
2. Set the score vector to be [0, 0, …, 1, …, 0] (1 at the position of the desired class c)
3. Compute the gradient ∂S_c(I)/∂i_k
4. Now update the pixels: i_k = i_k + η ∂S_c(I)/∂i_k (gradient ascent, since we are maximizing S_c)
5. Now again do a forward pass through the network
6. Go to step 2
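The loop above can be sketched on a stand-in differentiable score. In this hedged NumPy toy (not the lecture's code), the "network" is just a frozen linear scorer S_c(I) = w ⋅ I, so ∂S_c/∂I is simply w; with a real ConvNet the gradient would come from backpropagation, but the ascent update is identical. The regularizer is taken to be Ω(I) = ||I||², an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64)                 # frozen "network weights"
score = lambda img: w @ img             # S_c(I): pre-softmax class score
lam, eta = 0.01, 0.1                    # regularizer weight, step size

img = np.zeros(64)                      # step 1: start with a zero image
for _ in range(100):
    grad = w - 2 * lam * img            # d/dI [ S_c(I) - lam * ||I||^2 ]
    img = img + eta * grad              # gradient ascent on the objective
```

After the loop, `img` scores far higher than the zero image it started from; with a real network, iterating this produces the dream-like class images discussed next.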
