Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation - PowerPoint PPT Presentation



SLIDE 1

Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation

Marco Ancona¹, Cengiz Öztireli², Markus Gross¹,²

¹ Department of Computer Science, ETH Zurich, Switzerland

² Disney Research, Zurich, Switzerland

SLIDE 2

[Diagram: an attribution method applied to a pre-trained model, with respect to a TARGET output.]

SLIDE 3

Attribution methods

  • Layer-wise Relevance Propagation (LRP) (Bach et al., 2015)
  • DeepLIFT (Shrikumar et al., 2017)
  • Saliency Maps (Simonyan et al., 2015)
  • Integrated Gradients (Sundararajan et al., 2017)
  • Grad-CAM (Selvaraju et al., 2016)
  • Simple occlusion (Zeiler et al., 2014)
  • LIME (Ribeiro et al., 2016)
  • Guided Backpropagation (Springenberg et al., 2014)
  • Prediction Difference Analysis (Zintgraf et al., 2017)
  • Meaningful Perturbation (Fong et al., 2017)
  • Gradient * Input (Shrikumar et al., 2016)
  • KernelSHAP / DeepSHAP (Lundberg et al., 2017)
  • …

SLIDES 4-6

Evaluating attribution methods

  • No ground-truth explanation → not easy to evaluate empirically
  • Often based on heuristics → not easy to justify theoretically

“Axiomatic approach”

From a set of desired properties to the method definition

SLIDE 7

(Some) desirable properties

Completeness

Attributions should sum up to the output of the function being considered, for comprehensive accounting.

Continuity

Attributions for two nearly identical inputs on a continuous function should be nearly identical.

Linearity

Attributions generated for a linear combination of two models should also be a linear combination of the original attributions.

Symmetry

If two features have exactly the same role in the model, they should receive the same attribution.
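These properties can be checked numerically on toy functions. A minimal illustrative sketch (not from the talk): for a two-feature model, the unique attribution satisfying these axioms (the Shapley value, introduced on the next slides) has a simple closed form, and completeness and symmetry can be verified directly.

```python
# Closed-form Shapley values for a two-feature set function f(S),
# where S is a frozenset of feature indices in {0, 1}.
# Illustrative sketch, not code from the talk.
def shapley2(f):
    e = f(frozenset())             # empty coalition
    a = f(frozenset({0}))          # feature 0 alone
    b = f(frozenset({1}))          # feature 1 alone
    ab = f(frozenset({0, 1}))      # both features
    phi0 = 0.5 * (a - e) + 0.5 * (ab - b)  # average marginal contribution of 0
    phi1 = 0.5 * (b - e) + 0.5 * (ab - a)  # average marginal contribution of 1
    return phi0, phi1

# An AND-like model: output 5.0 only when both features are present,
# so the two features play exactly the same role.
f = lambda S: 5.0 if len(S) == 2 else 0.0
phi0, phi1 = shapley2(f)
assert phi0 + phi1 == f(frozenset({0, 1})) - f(frozenset())  # completeness
assert phi0 == phi1                                          # symmetry
```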

SLIDES 8-19

Shapley Values

Shapley, Lloyd S., 1953

The only attribution method that satisfies all the aforementioned properties.

Let f be the function to analyze (e.g. the map from the input layer to a specific output neuron in a DNN) and P the (input) set of all N features. The Shapley value of feature i is

φ_i(f) = Σ_{S ⊆ P \ {i}} [ |S|! (N − |S| − 1)! / N! ] · ( f(S ∪ {i}) − f(S) )

where the sum runs over all unique subsets S of features taken from the input set P (excluding i), f(S ∪ {i}) − f(S) is the marginal contribution of feature i, and the factorial weights form an average over all subsets:

“The average marginal contribution of a feature with respect to all subsets of other features.”

Issue: testing all subsets is infeasible!
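As a concrete reference point, the definition can be implemented directly by enumerating all subsets; the cost is exponential in the number of features, which is exactly why exhaustive evaluation is infeasible for real inputs. A minimal sketch (not the paper's code):

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, n):
    """Exact Shapley values by enumerating every subset S of the other
    features: phi_i = sum_S |S|!(n-|S|-1)!/n! * (f(S + {i}) - f(S)).
    Requires O(2^n) evaluations of f, so only usable for tiny n."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                S = frozenset(subset)
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (f(S | {i}) - f(S))
    return phi

# Toy "model": a weighted sum of active features; for such a linear game
# the Shapley value of feature i is just its weight.
weights = [1.0, 2.0, 3.0]
f = lambda S: sum(weights[j] for j in S)
print(exact_shapley(f, 3))  # approximately [1.0, 2.0, 3.0]
```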

SLIDES 20-23

Shapley value sampling

Castro et al., 2009

[On the slides: sampled marginal contributions for a feature accumulate across random feature orderings, e.g. 0.16, 0.10, 0.25, −0.35.]
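The estimator can be sketched as follows (an illustrative implementation in the spirit of Castro et al., not the authors' code): draw random feature orderings, add features one at a time, and average each feature's observed marginal contributions, such as the 0.16, 0.10, 0.25, −0.35 shown on the slides.

```python
import random

def sampled_shapley(f, n, num_samples=1000, seed=0):
    """Monte Carlo Shapley estimate: for each random permutation, add
    features one at a time and record each feature's marginal
    contribution; the running average is an unbiased estimator."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(num_samples):
        order = list(range(n))
        rng.shuffle(order)
        coalition = set()
        prev = f(coalition)
        for i in order:
            coalition.add(i)
            cur = f(coalition)
            phi[i] += cur - prev   # marginal contribution of i in this ordering
            prev = cur
    return [p / num_samples for p in phi]

# Each sample costs n + 1 evaluations of f (network forward passes),
# so accurate estimates on high-dimensional inputs can be expensive.
weights = [1.0, 2.0, 3.0]
f = lambda S: sum(weights[j] for j in S)
print(sampled_shapley(f, 3, num_samples=100))  # approximately [1.0, 2.0, 3.0]
```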

SLIDES 24-26

Pros: Shapley value sampling is unbiased.
Cons: it might require a lot of samples (network evaluations) to produce an accurate result.

Can we avoid sampling?

SLIDES 27-29

Shapley value sampling

SLIDES 30-39

Deep Approximate Shapley Propagation

[On the slides: the input is modeled as a distribution with "k out of N features on"; propagating it through a ReLU layer yields a "rectified" Normal distribution.]

To propagate distributions through the network layers we use Lightweight Probabilistic Deep Networks (Gast et al., 2018), which support, among others:

  • Affine transformation
  • Rectified Linear Unit
  • Leaky Rectified Linear Unit
  • Mean pooling
  • Max pooling
  • …

The use of other probabilistic frameworks is also possible.
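The distribution propagation in LPDN can be illustrated by moment matching: each activation is summarized by a mean and a variance, an affine layer transforms them exactly (assuming independent inputs), and a ReLU maps a Gaussian to a rectified Normal distribution. A minimal sketch under those assumptions (not the LPDN/DASP code):

```python
import math

def affine_moments(mu, var, w, b):
    """Mean/variance of y = sum_j w_j * x_j + b for independent inputs x_j."""
    mean = sum(wj * mj for wj, mj in zip(w, mu)) + b
    variance = sum(wj * wj * vj for wj, vj in zip(w, var))
    return mean, variance

def relu_moments(mu, var):
    """Mean/variance of max(0, X) with X ~ N(mu, var): a rectified Gaussian."""
    if var == 0.0:
        return max(mu, 0.0), 0.0
    std = math.sqrt(var)
    a = mu / std
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)  # standard normal pdf at a
    cdf = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))         # standard normal cdf at a
    mean = mu * cdf + std * pdf
    second_moment = (mu * mu + var) * cdf + mu * std * pdf
    return mean, max(second_moment - mean * mean, 0.0)

# Propagate a toy input distribution through affine -> ReLU.
mu, var = affine_moments([1.0, 2.0], [0.5, 0.5], [1.0, -1.0], 0.0)
print(relu_moments(mu, var))  # mean/variance of ReLU applied to N(-1.0, 1.0)
```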

SLIDE 40

DASP vs other methods

Gradient-based methods
  ✓ (Very) fast
  ✗ Poor Shapley Value estimation

Sampling-based methods
  ✓ Unbiased Shapley Value estimator
  ✗ Slow

DASP
SLIDE 41

For details, come to the poster: Pacific Ballroom #63

Thank you

Lightweight Probabilistic Deep Network (Keras)
github.com/marcoancona/LPDN

Deep Approximate Shapley Propagation
github.com/marcoancona/DASP

References

  • Lloyd S. Shapley, A value for n-person games, 1953
  • Castro et al., Polynomial calculation of the Shapley value based on sampling, 2009
  • Fatima et al., A linear approximation method for the Shapley value, 2014
  • Ribeiro et al., “Why Should I Trust You?”: Explaining the Predictions of Any Classifier, 2016
  • Sundararajan et al., Axiomatic attribution for deep networks, 2017
  • Shrikumar et al., Learning important features through propagating activation differences, 2017
  • Lundberg et al., A Unified Approach to Interpreting Model Predictions, 2017
  • Gast et al., Lightweight Probabilistic Deep Networks, 2018