SLIDE 1

Contrastive Relevance Propagation for Interpreting Predictions by a Single-Shot Object Detector

Hideomi Tsunakawa¹, Yoshitaka Kameya¹, Hanju Lee², Yosuke Shinya², and Naoki Mitsumoto²

¹Department of Information Engineering, Meijo University  ²DENSO CORPORATION

IJCNN-19

SLIDE 2

Outline

  • Background
  • Proposed method: CRP
  • Experiments

SLIDE 4

Background: SSD (1)

  • Object detection is a well-known task in computer vision
  • SSD (Single-Shot MultiBox Detector) [Liu+ ECCV-16]:
    – Known for its high speed and accuracy
    – Outputs:
      • Confidences for classes
      • Location offsets (center on x-axis, center on y-axis, width, height)
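These location offsets are relative to an anchor (default) box. As a minimal sketch (not the paper's or SSD's actual code), decoding the four offsets against an anchor typically looks like this; the variance constants follow a common SSD convention and are an assumption here:

```python
import numpy as np

def decode_box(anchor, offsets, variances=(0.1, 0.1, 0.2, 0.2)):
    """Decode SSD-style offsets against an anchor box. Both boxes are
    (center_x, center_y, width, height); the variance constants follow
    the common SSD convention (an assumption here)."""
    acx, acy, aw, ah = anchor
    tcx, tcy, tw, th = offsets
    vx, vy, vw, vh = variances
    cx = acx + tcx * vx * aw   # shift the anchor center on the x-axis
    cy = acy + tcy * vy * ah   # shift the anchor center on the y-axis
    w = aw * np.exp(tw * vw)   # rescale the anchor width
    h = ah * np.exp(th * vh)   # rescale the anchor height
    return np.array([cx, cy, w, h])

# zero offsets reproduce the anchor unchanged
print(decode_box((0.5, 0.5, 0.2, 0.3), (0.0, 0.0, 0.0, 0.0)))
```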

[Figure: example input image and SSD outputs: classification and localization]

SLIDE 5

Background: SSD (2)

  • SSD:
    – Based on a single (large) convolutional network
    – Layers for classification and layers for localization are attached to several convolutional layers → different resolutions

[Figure: SSD architecture. Input image 300×300 → VGG-16 through the Pool5 layer → Conv4_3 (38×38×512), Conv6 (19×19×1024), Conv7 (19×19×1024), Conv8_2 (10×10×512), Conv9_2 (5×5×256), Conv10_2 (3×3×256), Conv11_2 (1×1×256); classification (Cls4–Cls11) and localization (Loc4–Loc11) layers are attached to each feature map, followed by non-maximum suppression]

SLIDE 7

Background: LRP (1)

  • LRP (Layer-wise Relevance Propagation) [Bach+ 15]:
    – Often used for interpreting predictions of DNNs
    – Propagates relevance backward from the output to the input features
    – Creates a heatmap using the relevance at the input features

[Figure: the SSD architecture as above; relevance to “dog” is propagated backward from the output, producing a heatmap at the input]
SLIDE 8

Background: LRP (2)

  • LRP is equipped with several propagation rules
    – Common to all rules:
      R_j^(l+1): relevance at unit j of layer l+1, distributed to the lower units
      R_ij: relevance passed through the connection between units i and j
      R_i^(l) := Σ_j R_ij

[Figure: layers l and l+1; R_j^(l+1) is split into the R_ij, which are summed into R_i^(l)]

SLIDE 11

Background: LRP (2)

  • LRP is equipped with several propagation rules:

– Common:

Rj

(l + 1): distributed to lower units

Ri

(l) := Sj Rij

Rij: passed through connection

– Simple LRP: –  -LRP: –  -LRP:

IJCNN-19 11

Layer l Layer l + 1

Rj

(l + 1)

Ri

(l)

Rij

SLIDE 12

Background: Indistinguishable Heatmaps (1)

  • Heatmaps are almost invariant even when the target class has been changed
  • Heatmaps obtained with αβ-LRP (α = 1, β = 0):

[Figure: heatmaps for target class “dog” (actually predicted) and target class “cat” (“what-if” analysis)]
SLIDE 13

Background: Indistinguishable Heatmaps (2)

  • Relevance propagated in each layer:

[Plot: relevance propagated in each layer decreases exponentially]
SLIDE 14

Background: Indistinguishable Heatmaps (3)

  • Recent works that seem to support our observation:
    – [Adebayo+ NeurIPS-18]:
      • Uses Inception v3 (a large network)
      • If relevance = gradient ⊙ input (elementwise product), the input part dominates
        → heatmaps will be invariant (since the input is of course fixed)
    – [Ancona+ ICLR-18]:
      • Several methods tend to return similar heatmaps (theoretically or empirically):
        – Gradient ⊙ input
        – DeepLIFT (Rescale)
        – Integrated Gradients
        – Simple LRP

SLIDE 15

Background: Our Motivation

  • We introduce contrastive relevance, which highlights the parts more important to the target class
  • We design the meaning of relevance to be consistent across two heterogeneous tasks in SSD:
    – Classification
    – Localization (regression)

[Figure: contrasting heatmaps for target class “dog” and target class “cat”]
SLIDE 16

Outline

✓ Background

  • Proposed method: CRP
  • Experiments

SLIDE 17

Contrastive Relevance Propagation (CRP)

  • CRP: LRP tailored for SSD
    – Classifies SSD’s layers into 4 types
    – Applies semantically appropriate propagation rules to each layer type
    – In both classification and localization, the meanings of “relevance” are the same

[Figure: SSD architecture with a detected box of interest; relevance to class k flows from a classification layer through a high-level feature layer into the low-level feature layers]

SLIDE 18

Contrastive Relevance Propagation (CRP)

  • CRP: LRP tailored for SSD
    – Classifies SSD’s layers into 4 types
    – Applies semantically appropriate propagation rules to each layer type
    – In both classification and localization, the meanings of “relevance” are the same

[Figure: SSD architecture with a detected box; relevance to “shifting to the right” flows from a localization layer through a high-level feature layer into the low-level feature layers]

SLIDE 19

Contrastive Relevance Propagation (CRP)

  • CRP: LRP tailored for SSD
    – Classifies SSD’s layers into 4 types
    – Applies semantically appropriate propagation rules to each layer type
    – In both classification and localization, the meanings of “relevance” are the same

[Figure: SSD architecture with another detected box of interest; relevance to class k’ flows from another classification layer through its high-level feature layer into the low-level feature layers]

SLIDE 20

CRP: Propagation Rules in Classification

[Figure: classification layer (classes 1 … k … K, with target class k*), high-level feature layer, low-level feature layer]
SLIDE 21

CRP: Propagation Rules in Classification

[Figure: as above; the initial relevance at the target class k* is set to 1]
SLIDE 22

CRP: Propagation Rules in Classification

[Figure: propagation from the classification layer into the high-level feature layer]

We use the w+-rule (αβ-LRP with α = 1, β = 0) to find units that positively contribute to class k*.
SLIDE 23

CRP: Propagation Rules in Classification

[Figure: classification layer and high-level feature layer, as above]

At this moment, we can compute a class-specific relevance R_i[k*] for the target class k* by summing up the passed relevance.
SLIDE 24

CRP: Propagation Rules in Classification

[Figure: classification layer and high-level feature layer, as above]

We compute contrastive relevance, subtracting the “average relevance” over the other classes, to find units that make a significantly positive or a significantly negative contribution to the target class k*.
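The “average relevance over other classes” construction can be sketched as follows (a sketch of the idea, not the paper’s exact formula; names are mine):

```python
import numpy as np

def contrastive_relevance(R_per_class, target):
    """Contrastive relevance at high-level feature units: target-class
    relevance minus the average relevance over the other classes. Large
    positive (negative) values mark units that contribute significantly
    more (less) to the target class than to the rest."""
    R = np.asarray(R_per_class, dtype=float)   # shape (K, n_units)
    others = np.delete(R, target, axis=0)      # drop the target row
    return R[target] - others.mean(axis=0)

R = np.array([[0.9, 0.1],    # class 0
              [0.2, 0.3],    # class 1
              [0.1, 0.5]])   # class 2
print(contrastive_relevance(R, target=0))  # [0.75, -0.3]
```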

SLIDE 25

CRP: Propagation Rules in Classification

[Figure: propagation from the high-level feature layer down to the low-level feature layers]

Until the input layer, we use the w+-rule to distribute the positivity or the negativity of contrastive relevance (activations x_i are non-negative due to ReLU).

SLIDE 27

CRP: Propagation Rules in Localization

[Figure: localization layer (offsets: center on x-axis, center on y-axis (target), width, height), high-level feature layer, low-level feature layer]
SLIDE 28

CRP: Propagation Rules in Localization

[Figure: as above; the initial relevance at the target offset (center on y-axis) is set to 1]
SLIDE 29

CRP: Propagation Rules in Localization

[Figure: propagation from the localization layer into the high-level feature layer]

Sign-based rule switching: we switch between two rules according to the sign of the activation x_j. If x_j is positive, we use the w+-rule (αβ-LRP with α = 1, β = 0) to find units that positively contribute to the center on the y-axis.
SLIDE 30

CRP: Propagation Rules in Localization

[Figure: localization layer and high-level feature layer, as above]

Sign-based rule switching: we switch between two rules according to the sign of the activation x_j. If x_j is negative, we use the w–-rule (αβ-LRP with α = 0, β = 1) to find units that negatively contribute to the center on the y-axis.
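The switching step can be sketched per upper unit for one dense layer (a sketch with invented names, not the paper’s implementation; `preact` holds the pre-activations x_j of the localization layer):

```python
import numpy as np

def lrp_sign_switch(x, W, preact, R_upper, eps=1e-9):
    """For each localization unit j: if its pre-activation x_j is positive,
    redistribute R_j with the w+-rule (positive weights only); if negative,
    with the w--rule (negative weights only)."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    Wsel = np.where(preact >= 0, Wp, Wn)     # pick the rule per column j
    Z = x[:, None] * Wsel                    # selected contributions z_ij
    denom = Z.sum(axis=0)
    denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)
    return (Z * (R_upper / denom)[None, :]).sum(axis=1)

x = np.array([1.0, 2.0])                     # non-negative ReLU activations
W = np.array([[1.0, -1.0], [0.5, 2.0]])
R = lrp_sign_switch(x, W, np.array([1.0, -1.0]), np.array([1.0, 1.0]))
print(R)
```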

SLIDE 31

CRP: Propagation Rules in Localization

[Figure: localization layer and high-level feature layer, as above]

We compute contrastive relevance, subtracting the “overall average” class-specific relevance from the relevance propagated from the localization layer.

SLIDE 32

CRP: Propagation Rules in Localization

[Figure: propagation from the high-level feature layer down to the low-level feature layers]

Until the input layer, we use the w+-rule, as in classification.

SLIDE 34

Outline

✓ Background ✓ Proposed method: CRP

  • Experiments

SLIDE 35

Experimental Settings

  • Dataset: Pascal VOC 2012
  • We ported the TensorFlow implementation of LRP (https://github.com/VigneshSrinivasan10/interprettensor) into a TensorFlow implementation of SSD (https://github.com/balancap/SSD-Tensorflow)
  • The SSD implementation includes a learned model (we conducted no training)
  • We added CRP-specific routines
  • Relevance was normalized before creating heatmaps (see the paper for details)
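The paper specifies the exact normalization; one simple scheme consistent with the symmetric positive/negative heatmaps shown next is max-absolute scaling (my assumption, for illustration only):

```python
import numpy as np

def normalize_relevance(R):
    """Scale relevance into [-1, 1] by its maximum absolute value, so that
    positive and negative relevance can share one diverging colormap."""
    m = np.abs(R).max()
    return R / m if m > 0 else R

print(normalize_relevance(np.array([-2.0, 1.0])))  # scales to [-1.0, 0.5]
```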

SLIDE 36

Numerical Example

  • Relevance is distributed almost symmetrically around zero

[Figure: histogram of relevance values centered at 0; positives and negatives are shown in different colors in the heatmap; target class “dog”]
SLIDE 37

Error Analysis (1)

  • A dog was misclassified as a sheep

SLIDE 38

Error Analysis (2)

  • A dog was misclassified as a sheep

[Figure: heatmaps for target class “dog” and target class “sheep”]
SLIDE 39

Error Analysis (3)

  • A dog was misclassified as a sheep

[Figure: heatmap for target class “sheep”, with values below the 85th percentile masked]
SLIDE 40

Error Analysis (4)

  • Unwanted localizations:
    – Horizontal shift to the left with widening
    – Vertical shift to the top with heightening

[Figure: detected boxes before and after localization]
SLIDE 41

Error Analysis (5)

  • Unwanted localizations:
    – Horizontal shift to the left with widening
    – Vertical shift to the top with heightening

[Figure: heatmaps for target offsets: center on x-axis and center on y-axis]
SLIDE 42

Error Analysis (6)

  • Unwanted localizations:
    – Horizontal shift to the left with widening
    – Vertical shift to the top with heightening

[Figure: heatmaps for target offsets: width and height]
SLIDE 43

Summary

  • CRP (contrastive relevance propagation), an LRP method tailored for SSD:
    – Can highlight only the significantly important features for a target class
    – Can deal with SSD’s heterogeneous outputs (classification and localization)
  • Some error analyses using CRP were conducted

Future work

  • Applying CRP to other object detectors such as YOLO
  • Applying CRP (retrospectively) to standard CNNs

SLIDE 44

Thank you for your attention!
