ICASSP 2017 Tutorial on Methods for Interpreting and Understanding Deep Neural Networks - PowerPoint PPT Presentation


  1. ICASSP 2017 Tutorial on Methods for Interpreting and Understanding Deep Neural Networks G. Montavon, W. Samek, K.-R. Müller Part 2: Making Deep Neural Networks Transparent 5 March 2017

  2. Making Deep Neural Nets Transparent. DNN transparency splits into interpreting models (focus on the model) and explaining decisions (focus on the data). Main techniques and representative references: activation maximization (Berkes 2006, Erhan 2010, Simonyan 2013, Nguyen 2015/16), data generation (Hinton 2006, Goodfellow 2014, v. den Oord 2016, Nguyen 2016), sensitivity analysis (Khan 2001, Gevrey 2003, Baehrens 2010, Simonyan 2013), decomposition (Poulin 2006, Landecker 2013, Bach 2015, Montavon 2017).

  3. Making Deep Neural Nets Transparent. Model analysis: visualizing filters; maximizing class activation; including the data distribution (RBM, DGN, etc.). Decision analysis: sensitivity analysis; decomposition.

  4. Interpreting Classes and Outputs. Image classification: GoogleNet, class "motorbike". Question: What does a "motorbike" typically look like? Quantum chemical calculations: GDB-7, "α high". Question: How can "α high" be interpreted in terms of molecular geometry?

  5. The Activation Maximization (AM) Method. Let us interpret a concept predicted by a deep neural net (e.g. a class, or a real-valued quantity). Examples: ◮ Creating a class prototype: max_{x ∈ X} log p(ω_c | x). ◮ Synthesizing an extreme case: max_{x ∈ X} f(x).
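  A minimal sketch of the first variant, assuming a PyTorch classifier `model` that returns class logits; the function name, initialization, and hyperparameters are illustrative and not taken from the tutorial:

    import torch

    def activation_maximization(model, class_idx, input_shape, steps=200, lr=0.1):
        """Gradient ascent on the input x to maximize log p(omega_c | x)."""
        x = torch.zeros(1, *input_shape, requires_grad=True)   # start e.g. from a gray/mean image
        optimizer = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            log_prob = torch.log_softmax(model(x), dim=1)[0, class_idx]
            (-log_prob).backward()                              # maximize the class log-probability
            optimizer.step()
        return x.detach()                                       # the prototype x*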

  6. Interpreting a Handwritten Digits Classifier. [Figure: initial solutions are optimized via max_x p(ω_c | x) into converged solutions x* for each digit class.]

  7. Interpreting a DNN Image Classifier. [Images for the classes "goose" and "ostrich", from Simonyan et al. 2013, "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps".] Observations: ◮ AM builds typical patterns for these classes (e.g. beaks, legs). ◮ Unrelated background objects are not present in the image.

  8. Improving Activation Maximization. Activation maximization produces class-related patterns, but they do not resemble true data points. This can lower the quality of the interpretation for the predicted class ω_c. Idea: ◮ Force the interpretation x* to match the data more closely. This can be achieved by redefining the optimization problem: find the input pattern that maximizes class probability → find the most likely input pattern for a given class.

  9. Improving Activation Maximization. [Figure: find the input pattern that maximizes class probability → find the most likely input pattern for a given class; in each case the optimization moves from the initial point x_0 to the optimum x*.]

  10. Improving Activation Maximization. Find the input pattern that maximizes class probability → find the most likely input pattern for a given class. Nguyen et al. 2016 introduced several enhancements for activation maximization: ◮ Multiplying the objective by an expert p(x): maximize p(x | ω_c) ∝ p(ω_c | x) · p(x), where p(ω_c | x) is the old objective. ◮ Optimization in code space: max_{z ∈ Z} log p(ω_c | g(z)) − λ‖z‖², with the prototype given by x* = g(z*). These two techniques require an unsupervised model of the data, either a density model p(x) or a generator g(z).
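  A minimal sketch of the code-space variant, assuming a pretrained PyTorch `generator` mapping codes z to inputs and a `classifier` returning class logits; names, initialization, and the value of λ are illustrative:

    import torch

    def am_in_code_space(classifier, generator, class_idx, code_dim, steps=200, lr=0.05, lam=1e-3):
        """Maximize log p(omega_c | g(z)) - lam * ||z||^2 over the code z, then decode x* = g(z*)."""
        z = torch.zeros(1, code_dim, requires_grad=True)
        optimizer = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            log_prob = torch.log_softmax(classifier(generator(z)), dim=1)[0, class_idx]
            loss = -log_prob + lam * (z ** 2).sum()   # negative objective plus l2 penalty on the code
            loss.backward()
            optimizer.step()
        return generator(z).detach()                  # x* = g(z*)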

  11. [Diagram: two ways of connecting AM to the data. AM + density model: the objective becomes log p(x | ω_c) = log p(ω_c | x) + log p(x) + const.; the optimum has a clear meaning, but the objective can be hard to optimize. AM + generator: optimize over codes z with x = g(z); more straightforward to optimize, but not optimizing log p(x | ω_c).]
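  A minimal sketch of the density-model route from the diagram, assuming a differentiable `log_density` function (e.g. from a separately trained density estimator) alongside the classifier; all names and hyperparameters are illustrative:

    import torch

    def am_with_density(classifier, log_density, class_idx, input_shape, steps=200, lr=0.05):
        """Maximize log p(omega_c | x) + log p(x), i.e. the class posterior weighted by a data prior."""
        x = torch.zeros(1, *input_shape, requires_grad=True)
        optimizer = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            log_posterior = torch.log_softmax(classifier(x), dim=1)[0, class_idx]
            loss = -(log_posterior + log_density(x))   # maximizing log p(x | omega_c) up to a constant
            loss.backward()
            optimizer.step()
        return x.detach()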

  12. Comparison of Activation Maximization Variants. [Figure: prototypes produced by simple AM (initialized to the data mean), simple AM (initialized to the class means), AM-density (initialized to the class means), and AM-gen (initialized to the class means).] Observation: Connecting to the data leads to sharper prototypes.

  13. Enhanced AM on Natural Images. [Images from Nguyen et al. 2016, "Synthesizing the preferred inputs for neurons in neural networks via deep generator networks".] Observation: Connecting AM to the data distribution leads to more realistic and more interpretable images.

  14. Summary. ◮ Deep neural networks can be interpreted by finding input patterns that maximize a certain output quantity (e.g. class probability). ◮ Connecting to the data (e.g. by adding a generative or density model) improves the interpretability of the solution.

  15. Limitations of Global Interpretations. Question: Below are some images of motorbikes. What would be the best prototype to interpret the class "motorbike"? Observations: ◮ Summarizing a concept or category like "motorbike" in a single image can be difficult (e.g. different views or colors). ◮ A good interpretation would have to grow with the diversity of the concept to interpret.

  16. From Prototypes to Individual Explanations. Finding a prototype: GoogleNet, class "motorbike". Question: What does a "motorbike" typically look like? Individual explanation: GoogleNet, class "motorbike". Question: Why is this example classified as a motorbike?

  17. From Prototypes to Individual Explanations. Finding a prototype: GDB-7, "α high". Question: How can "α high" be interpreted in terms of molecular geometry? Individual explanation: GDB-7, α = ... Question: Why does α have this particular value for this molecule?

  18. From Prototypes to Individual Explanations. Other examples where individual explanations are preferable to global interpretations: ◮ Brain-computer interfaces: analyzing the input data for a given user at a given time in a given environment. ◮ Personalized medicine: extracting the relevant information about a medical condition for a given patient at a given time. Each case is unique and needs its own explanation.

  19. From Prototypes to Individual Explanations. Model analysis: visualizing filters; maximizing class activation; including the data distribution (RBM, DGN, etc.). Decision analysis: sensitivity analysis; decomposition.

  20. Explaining Decisions. Goal: Determine the relevance of each input variable for a given decision f(x_1, x_2, ..., x_d) by assigning to these variables relevance scores R_1, R_2, ..., R_d. [Figure: two data points x and x' in input space, each with its prediction f(·) decomposed into relevance scores R_1, R_2.]

  21. Basic Technique: Sensitivity Analysis. Consider a function f, a data point x = (x_1, ..., x_d), and the prediction f(x_1, ..., x_d). Sensitivity analysis measures the local variation of the function along each input dimension: R_i = (∂f/∂x_i)², evaluated at the data point x. Remarks: ◮ Easy to implement (we only need access to the gradient of the decision function). ◮ But does it really explain the prediction?
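  A minimal sketch of these scores, assuming a PyTorch `model` whose output row contains one score per class; the function name and tensor shapes are illustrative:

    import torch

    def sensitivity_scores(model, x, class_idx):
        """Relevance scores R_i = (df/dx_i)^2, i.e. squared partial derivatives at the data point x."""
        x = x.clone().detach().requires_grad_(True)
        f_x = model(x)[0, class_idx]     # scalar prediction f(x) for the class of interest
        f_x.backward()
        return (x.grad ** 2).squeeze(0)  # one relevance score per input variable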

  22. Explaining by Decomposing. An aggregate quantity f(x) is decomposed into contributions R_1, R_2, R_3, R_4, ... such that Σ_i R_i = f(x). Examples: ◮ Economic activity (e.g. petroleum, cars, medicaments, ...) ◮ Energy production (e.g. coal, nuclear, hydraulic, ...) ◮ Evidence for an object in an image (e.g. pixel 1, pixel 2, pixel 3, ...) ◮ Evidence for meaning in a text (e.g. word 1, word 2, word 3, ...)
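  A toy illustration of the conservation property Σ_i R_i = f(x), using a linear model as an assumed example (not taken from the tutorial), where the natural contributions are R_i = w_i x_i:

    import torch

    w = torch.tensor([0.5, -1.0, 2.0])    # weights of a linear model f(x) = sum_i w_i * x_i
    x = torch.tensor([1.0, 2.0, 0.5])     # a data point

    f_x = torch.dot(w, x)                 # aggregate quantity f(x)
    R = w * x                             # per-variable contributions R_i = w_i * x_i
    assert torch.isclose(R.sum(), f_x)    # the decomposition sums to the function value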

  23. What Does Sensitivity Analysis Decompose? Sensitivity analysis, with R_i = (∂f/∂x_i)² evaluated at the data point x, is a decomposition of the gradient norm: Σ_i R_i = Σ_i (∂f/∂x_i)² = ‖∇_x f‖². Hence sensitivity analysis explains a variation of the function, not the function value itself.

  24. What Does Sensitivity Analysis Decompose? Example: sensitivity for the class "car". [Figure: input image and its sensitivity map.] ◮ Relevant pixels are found both on the cars and on the background. ◮ The analysis explains what reduces/increases the evidence for cars, rather than what the evidence for cars is.
