Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
Presenter: Maulik Shah | Scribe: Yunjia Zhang
1
2
3
A good visual explanation should be class-discriminative, i.e., localize the category in the image
It should also be high-resolution, i.e., capture fine-grained detail
4
5
What is it?
CAM (Class Activation Mapping) produces class-specific localization maps from the convolutional feature maps of the final layer
6
How does it work?
Let $f_k(x, y)$ be the activation of feature map $k$ at spatial location $(x, y)$. Global average pooling gives
$$F_k = \sum_{x,y} f_k(x, y)$$
The score for class $c$ is a weighted sum of the pooled features,
$$S_c = \sum_k w_k^c F_k$$
and the class activation map for class $c$ is
$$M_c(x, y) = \sum_k w_k^c f_k(x, y)$$
7
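To make these equations concrete, here is a minimal NumPy sketch; the feature maps, weights, and sizes are random stand-ins for illustration, not values or code from the paper.

```python
import numpy as np

# Minimal CAM sketch (illustrative stand-in values, not the authors' code).
K, H, W = 512, 14, 14                    # number of feature maps and spatial size
num_classes = 1000

feats = np.random.rand(K, H, W)          # f_k(x, y): last conv-layer activations
w = np.random.rand(num_classes, K)       # w_k^c: weights of the final linear layer

F = feats.sum(axis=(1, 2))               # F_k = sum_{x,y} f_k(x, y)
S = w @ F                                # S_c = sum_k w_k^c F_k (class scores)

c = int(S.argmax())                      # class of interest
M = np.tensordot(w[c], feats, axes=1)    # M_c(x, y) = sum_k w_k^c f_k(x, y)
```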
8
Drawbacks
CAM requires a specific architecture (conv feature maps followed by global average pooling), so other models need modification and re-training
It cannot be applied to networks on other tasks, such as captioning or VQA
9
Overview
Grad-CAM is applicable to any CNN-based network, without requiring architectural changes or re-training
Its explanations help establish trust, and help identify a 'stronger' model from a 'weaker' one even when their outputs are the same
10
Motivation
Convolutional layers retain spatial information that is lost in fully-connected layers
The gradient information flowing into the last convolutional layer can be used to assign the importance of each neuron for a decision of interest
11
How it works
Let $y^c$ be the score for class $c$ and $A^k$ the $k$-th convolutional feature map, so $\partial y^c / \partial A^k$ is the gradient of the class score w.r.t. the feature maps. The neuron-importance weights are obtained by global-average-pooling these gradients,
$$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A^k_{ij}}$$
and the Grad-CAM map is a ReLU over the weighted combination of feature maps,
$$L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)$$
where $Z$ is the number of spatial locations.
12
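A minimal PyTorch sketch of the two equations above, assuming a torchvision VGG-16; the hooked layer `features[28]` (the last conv layer) and the random input are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Grad-CAM sketch (assumptions: torchvision VGG-16, features[28] = last conv).
model = models.vgg16(weights="IMAGENET1K_V1").eval()

store = {}
layer = model.features[28]
layer.register_forward_hook(lambda m, i, o: store.update(A=o))
layer.register_full_backward_hook(lambda m, gi, go: store.update(dA=go[0]))

x = torch.randn(1, 3, 224, 224)       # stand-in for a preprocessed image
scores = model(x)                     # y^c for all classes
c = int(scores[0].argmax())
scores[0, c].backward()               # fills store["dA"] = dy^c / dA^k

alpha = store["dA"].mean(dim=(2, 3), keepdim=True)  # alpha_k^c: GAP of gradients
cam = F.relu((alpha * store["A"]).sum(dim=1))       # ReLU(sum_k alpha_k^c A^k)
cam = F.interpolate(cam[None], size=x.shape[-2:],   # upsample to input size
                    mode="bilinear", align_corners=False)[0, 0]
```

The upsampled `cam` can then be normalized and overlaid on the input as a heatmap.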
How it works
13
Results
14
Motivation
Grad-CAM is class-discriminative but does not capture fine-grained detail
Guided Backpropagation is high-resolution but not class-discriminative: its maps for 'boxer' and 'tiger cat' look nearly identical
Fusing Guided Backpropagation with Grad-CAM visualizations solves the issue
15
How it works
Guided Grad-CAM is the elementwise product of the Guided Backpropagation map with the Grad-CAM heatmap, upsampled to the input resolution
16
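The fusion step is a single broadcasted multiply; in this NumPy sketch both maps are random placeholders, and the Grad-CAM map is assumed to be already upsampled to the input size.

```python
import numpy as np

# Fusion sketch: elementwise product of Guided Backpropagation with Grad-CAM.
H, W = 224, 224
guided_backprop = np.random.rand(H, W, 3)   # stand-in pixel-space gradients
cam = np.random.rand(H, W)                  # stand-in Grad-CAM, already upsampled

guided_grad_cam = guided_backprop * cam[..., None]   # broadcast over channels
```

Because Guided Backpropagation is pixel-dense while Grad-CAM is coarse and class-specific, the product keeps only the fine detail that falls in class-relevant regions.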
Guided Grad-CAM reveals which fine-grained details went into decision making
E.g., it highlights the stripes on the cat, showing why the model predicted 'tiger cat'
17
Results
Localization
18
Localization
19
Class Discrimination
Select images that contain exactly two annotated categories, and create visualizations for each of them
20
Class Discrimination
Workers were shown a visualization and asked which of the two categories it depicts, for 90 image-category pairs
A class-discriminative method should produce distinctive visualizations for each class of interest
21
Class Discrimination

Model                                Accuracy (%)
Deconvolution                        53.33
Deconvolution + Grad-CAM             61.23
Guided Backpropagation               44.44
Guided Backpropagation + Grad-CAM    61.23

22
Trust - Why is it needed?
When two models make the same prediction, how do we tell which one is more trustworthy?
Good explanations help users place trust in the decision!
23
Trust - Experimental Setup
Compare AlexNet against the more accurate VGG-16 using Guided Backpropagation and Guided Grad-CAM visualizations
Only instances where both models predicted the same class as the ground truth were considered
24
Trust - Experimental Setup
Workers were asked to rate the relative reliability of the two models as follows
25
Trust - Result
Workers rated VGG-16 as more reliable despite both models making identical class predictions
Explanations thus let humans identify which model generalizes better, just based on individual predictions
26
Faithfulness vs Interpretability
Faithfulness is the ability of a visualization to accurately explain the function learned by the model
A fully faithful explanation is the model itself, whose complexity can make it not interpretable/easy to visualize
27
Faithfulness vs Interpretability
Occlude patches of the input and measure the change in the class score: the patches that change the score most are also the ones assigned high intensity by Grad-CAM and Guided Grad-CAM
28
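The occlusion test can be sketched as follows; `model`, `x`, and `c` are placeholders for any classifier, preprocessed image, and class index, and the 0.5 patch value is an assumed gray fill.

```python
import torch

# Occlusion-sensitivity sketch (placeholders: any classifier `model`, image `x`).
@torch.no_grad()
def occlusion_map(model, x, c, patch=32, stride=32):
    base = model(x)[0, c].item()                      # unoccluded class score
    _, _, H, W = x.shape
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    heat = torch.zeros(rows, cols)
    for i in range(rows):
        for j in range(cols):
            occ = x.clone()
            occ[:, :, i*stride:i*stride+patch, j*stride:j*stride+patch] = 0.5
            heat[i, j] = base - model(occ)[0, c].item()   # drop in y^c
    return heat   # high values mark patches the prediction depends on
```

Rank-correlating this map with the Grad-CAM map then quantifies faithfulness.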
To understand failures in decision making, first collect the misclassified examples
Visualize the ground-truth class as well as the predicted class
Some failures turn out to stem from ambiguities inherent in the dataset
Seemingly unreasonable predictions often have reasonable explanations
29
“Doctors” vs “Nurses”
A model trained on biased image-search data classified the validation set well (82%) but generalized poorly
Grad-CAM showed it was looking at the person's face/hairstyle to make the predictions, thus learning gender stereotypes
30
Rebalancing the training set with images of female doctors, as well as male nurses, fixed the problem
Explanations can thus help detect and remove biases from the dataset, enabling fair and ethical decisions
31
Image Captioning
The captioning model (neuraltalk2) uses a VGG-16 CNN for images and an LSTM-based language model
Compute the gradient of the caption's log probability with respect to the last convolutional layer of the CNN
32
How it works
33
34
Comparison to Dense Cap
Dense captioning generates captions for regions of the image
DenseCap uses a Fully Convolutional Localization Network (FCLN) and an LSTM-based language model to produce region captions with bounding boxes
For each generated caption, compute Grad-CAM and check whether it localizes inside the bounding box it was generated for
35
Comparison to Dense Cap
36
Comparison to Dense Cap
Mean ratio of average activation inside vs. outside the generated bounding box:

Grad-CAM                  3.27 ± 0.18
Guided Backpropagation    2.32 ± 0.08
Guided Grad-CAM           6.38 ± 0.99

37
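The metric itself is straightforward to state in code; this hypothetical NumPy helper (the name and box convention are mine, not from the paper) computes the ratio for one map and one box.

```python
import numpy as np

# Sketch of the localization metric: mean activation inside the box vs. outside.
# (Hypothetical helper; assumes box = (x0, y0, x1, y1) in map coordinates.)
def inside_outside_ratio(cam, box):
    x0, y0, x1, y1 = box
    mask = np.zeros(cam.shape, dtype=bool)
    mask[y0:y1, x0:x1] = True
    return cam[mask].mean() / cam[~mask].mean()
```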
Visual Question Answering
The VQA model uses a CNN for the image and an LSTM-based language model for questions
Answering is posed as a 1000-way classification problem
Compute the score $y^c$ for the answer and use that to compute Grad-CAM to show image evidence that supports the answer
38
How it works
39
Comparison to Human Attention Maps
Humans annotated the regions they looked at in order to answer a visual question
Grad-CAM maps are compared with these human attention maps on val QI pairs using the rank correlation evaluation protocol
Grad-CAM achieves a rank correlation of 0.136, versus random attention maps (zero correlation)
Even non-attention CNN+LSTM models are surprisingly good at localizing regions required to output a particular answer
40
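The rank-correlation protocol can be sketched with SciPy's Spearman correlation; both maps below are random placeholders standing in for a Grad-CAM map and a human attention map resized to a common grid.

```python
import numpy as np
from scipy.stats import spearmanr

# Rank-correlation sketch (placeholder maps; assumes both on the same grid).
gradcam_map = np.random.rand(14, 14)
human_map = np.random.rand(14, 14)

rho, _ = spearmanr(gradcam_map.ravel(), human_map.ravel())
print(f"rank correlation: {rho:.3f}")   # paper reports 0.136 averaged over pairs
```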
Visualizing ResNet-based VQA model with attention
The model applies a hierarchical attention mechanism on the question and the image
Grad-CAM maps change gradually across ResNet layers, but show larger changes for layers which involve dimensionality reduction
41
42
Conclusion
Grad-CAM explanations help humans discriminate between classes more accurately, better reveal trustworthiness, and help identify biases
43
44
45