SLIDE 1

Contrastive Relevance Propagation for Interpreting Predictions by a Single-Shot Object Detector

Hideomi Tsunakawa¹, Yoshitaka Kameya¹, Hanju Lee², Yosuke Shinya², and Naoki Mitsumoto²

¹Department of Information Engineering, Meijo University  ²DENSO CORPORATION

IJCNN-19

SLIDE 2

Outline

  • Background
  • Proposed method: CRP
  • Experiments

SLIDE 4

Background: SSD (1)

  • Object detection is a well-known task in computer vision
  • SSD (Single-Shot MultiBox Detector) [Liu+ ECCV-16]:
    – Known for its high speed and accuracy
    – Outputs:
      • Confidences for classes
      • Location offsets (center on x-axis, center on y-axis, width, height)
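These location offsets are relative to an anchor (default) box. As a minimal sketch (not the paper's or SSD's actual code), decoding the four offsets against an anchor typically looks like this; the variance constants follow a common SSD convention and are an assumption here:

```python
import numpy as np

def decode_box(anchor, offsets, variances=(0.1, 0.1, 0.2, 0.2)):
    """Decode SSD-style offsets against an anchor box. Both boxes are
    (center_x, center_y, width, height); the variance constants follow
    the common SSD convention (an assumption here)."""
    acx, acy, aw, ah = anchor
    tcx, tcy, tw, th = offsets
    vx, vy, vw, vh = variances
    cx = acx + tcx * vx * aw   # shift the anchor center on the x-axis
    cy = acy + tcy * vy * ah   # shift the anchor center on the y-axis
    w = aw * np.exp(tw * vw)   # rescale the anchor width
    h = ah * np.exp(th * vh)   # rescale the anchor height
    return np.array([cx, cy, w, h])

# zero offsets reproduce the anchor unchanged
print(decode_box((0.5, 0.5, 0.2, 0.3), (0.0, 0.0, 0.0, 0.0)))
```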

[Figure: example input image and SSD outputs: classification and localization]

SLIDE 5

Background: SSD (2)

  • SSD:
    – Based on a single (large) convolutional network
    – Layers for classification and layers for localization are attached to several convolutional layers → different resolutions

[Figure: SSD architecture. Input image 300×300 → VGG-16 through the Pool5 layer → Conv4_3 (38×38×512), Conv6 (19×19×1024), Conv7 (19×19×1024), Conv8_2 (10×10×512), Conv9_2 (5×5×256), Conv10_2 (3×3×256), Conv11_2 (1×1×256); classification (Cls4–Cls11) and localization (Loc4–Loc11) layers are attached to each feature map, followed by non-maximum suppression]

SLIDE 7

Background: LRP (1)

  • LRP (Layer-wise Relevance Propagation) [Bach+ 15]:
    – Often used for interpreting predictions of DNNs
    – Propagates relevance backward from the output to the input features
    – Creates a heatmap using the relevance at the input features

[Figure: the SSD architecture as above; relevance to “dog” is propagated backward from the output, producing a heatmap at the input]
SLIDE 8

Background: LRP (2)

  • LRP is equipped with several propagation rules
    – Common to all rules:
      R_j^(l+1): relevance at unit j of layer l+1, distributed to the lower units
      R_ij: relevance passed through the connection between units i and j
      R_i^(l) := Σ_j R_ij

[Figure: layers l and l+1; R_j^(l+1) is split into the R_ij, which are summed into R_i^(l)]

SLIDE 11

Background: LRP (2)

  • LRP is equipped with several propagation rules:

– Common:

Rj

(l + 1): distributed to lower units

Ri

(l) := Sj Rij

Rij: passed through connection

– Simple LRP: –  -LRP: –  -LRP:

IJCNN-19 11

Layer l Layer l + 1

Rj

(l + 1)

Ri

(l)

Rij

SLIDE 12

Background: Indistinguishable Heatmaps (1)

  • Heatmaps are almost invariant even when the target class has been changed
  • Heatmaps obtained with αβ-LRP (α = 1, β = 0):

[Figure: heatmaps for target class “dog” (actually predicted) and target class “cat” (“what-if” analysis)]
SLIDE 13

Background: Indistinguishable Heatmaps (2)

  • Relevance propagated in each layer:

[Plot: relevance propagated in each layer decreases exponentially]
SLIDE 14

Background: Indistinguishable Heatmaps (3)

  • Recent works that seem to support our observation:
    – [Adebayo+ NeurIPS-18]:
      • Uses Inception v3 (a large network)
      • If relevance = gradient ⊙ input (elementwise product), the input part dominates
        → heatmaps will be invariant (since the input is of course fixed)
    – [Ancona+ ICLR-18]:
      • Several methods tend to return similar heatmaps (theoretically or empirically):
        – Gradient ⊙ input
        – DeepLIFT (Rescale)
        – Integrated Gradients
        – Simple LRP

SLIDE 15

Background: Our Motivation

  • We introduce contrastive relevance, which highlights the parts more important to the target class
  • We design the meaning of relevance to be consistent across two heterogeneous tasks in SSD:
    – Classification
    – Localization (regression)

[Figure: contrasting heatmaps for target class “dog” and target class “cat”]
SLIDE 16

Outline

✓ Background

  • Proposed method: CRP
  • Experiments

SLIDE 17

Contrastive Relevance Propagation (CRP)

  • CRP: LRP tailored for SSD
    – Classifies SSD’s layers into 4 types
    – Applies semantically appropriate propagation rules to each layer type
    – In both classification and localization, the meanings of “relevance” are the same

[Figure: SSD architecture with a detected box of interest; relevance to class k flows from a classification layer through a high-level feature layer into the low-level feature layers]

SLIDE 18

Contrastive Relevance Propagation (CRP)

  • CRP: LRP tailored for SSD
    – Classifies SSD’s layers into 4 types
    – Applies semantically appropriate propagation rules to each layer type
    – In both classification and localization, the meanings of “relevance” are the same

[Figure: SSD architecture with a detected box; relevance to “shifting to the right” flows from a localization layer through a high-level feature layer into the low-level feature layers]

SLIDE 19

Contrastive Relevance Propagation (CRP)

  • CRP: LRP tailored for SSD
    – Classifies SSD’s layers into 4 types
    – Applies semantically appropriate propagation rules to each layer type
    – In both classification and localization, the meanings of “relevance” are the same

[Figure: SSD architecture with another detected box of interest; relevance to class k’ flows from another classification layer through its high-level feature layer into the low-level feature layers]

SLIDE 20

CRP: Propagation Rules in Classification

[Figure: classification layer (classes 1 … k … K, with target class k*), high-level feature layer, low-level feature layer]
SLIDE 21

CRP: Propagation Rules in Classification

[Figure: as above; the initial relevance at the target class k* is set to 1]
SLIDE 22

CRP: Propagation Rules in Classification

[Figure: propagation from the classification layer into the high-level feature layer]

We use the w+-rule (αβ-LRP with α = 1, β = 0) to find units that positively contribute to class k*.
SLIDE 23

CRP: Propagation Rules in Classification

[Figure: classification layer and high-level feature layer, as above]

At this moment, we can compute a class-specific relevance R_i[k*] for the target class k* by summing up the passed relevance.
SLIDE 24

CRP: Propagation Rules in Classification

[Figure: classification layer and high-level feature layer, as above]

We compute contrastive relevance, subtracting the “average relevance” over the other classes, to find units that make a significantly positive or a significantly negative contribution to the target class k*.
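The “average relevance over other classes” construction can be sketched as follows (a sketch of the idea, not the paper’s exact formula; names are mine):

```python
import numpy as np

def contrastive_relevance(R_per_class, target):
    """Contrastive relevance at high-level feature units: target-class
    relevance minus the average relevance over the other classes. Large
    positive (negative) values mark units that contribute significantly
    more (less) to the target class than to the rest."""
    R = np.asarray(R_per_class, dtype=float)   # shape (K, n_units)
    others = np.delete(R, target, axis=0)      # drop the target row
    return R[target] - others.mean(axis=0)

R = np.array([[0.9, 0.1],    # class 0
              [0.2, 0.3],    # class 1
              [0.1, 0.5]])   # class 2
print(contrastive_relevance(R, target=0))  # [0.75, -0.3]
```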

SLIDE 25

CRP: Propagation Rules in Classification

[Figure: propagation from the high-level feature layer down to the low-level feature layers]

Until the input layer, we use the w+-rule to distribute the positivity or the negativity of contrastive relevance (activations x_i are non-negative due to ReLU).

SLIDE 27

CRP: Propagation Rules in Localization

[Figure: localization layer (offsets: center on x-axis, center on y-axis (target), width, height), high-level feature layer, low-level feature layer]
SLIDE 28

CRP: Propagation Rules in Localization

[Figure: as above; the initial relevance at the target offset (center on y-axis) is set to 1]
SLIDE 29

CRP: Propagation Rules in Localization

[Figure: propagation from the localization layer into the high-level feature layer]

Sign-based rule switching: we switch between two rules according to the sign of the activation x_j. If x_j is positive, we use the w+-rule (αβ-LRP with α = 1, β = 0) to find units that positively contribute to the center on the y-axis.
SLIDE 30

CRP: Propagation Rules in Localization

[Figure: localization layer and high-level feature layer, as above]

Sign-based rule switching: we switch between two rules according to the sign of the activation x_j. If x_j is negative, we use the w–-rule (αβ-LRP with α = 0, β = 1) to find units that negatively contribute to the center on the y-axis.
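The switching step can be sketched per upper unit for one dense layer (a sketch with invented names, not the paper’s implementation; `preact` holds the pre-activations x_j of the localization layer):

```python
import numpy as np

def lrp_sign_switch(x, W, preact, R_upper, eps=1e-9):
    """For each localization unit j: if its pre-activation x_j is positive,
    redistribute R_j with the w+-rule (positive weights only); if negative,
    with the w--rule (negative weights only)."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    Wsel = np.where(preact >= 0, Wp, Wn)     # pick the rule per column j
    Z = x[:, None] * Wsel                    # selected contributions z_ij
    denom = Z.sum(axis=0)
    denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)
    return (Z * (R_upper / denom)[None, :]).sum(axis=1)

x = np.array([1.0, 2.0])                     # non-negative ReLU activations
W = np.array([[1.0, -1.0], [0.5, 2.0]])
R = lrp_sign_switch(x, W, np.array([1.0, -1.0]), np.array([1.0, 1.0]))
print(R)
```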

SLIDE 31

CRP: Propagation Rules in Localization

[Figure: localization layer and high-level feature layer, as above]

We compute contrastive relevance, subtracting the “overall average” class-specific relevance from the relevance propagated from the localization layer.

SLIDE 32

CRP: Propagation Rules in Localization

[Figure: propagation from the high-level feature layer down to the low-level feature layers]

Until the input layer, we use the w+-rule, as in classification.

SLIDE 34

Outline

✓ Background ✓ Proposed method: CRP

  • Experiments

SLIDE 35

Experimental Settings

  • Dataset: Pascal VOC 2012
  • We ported the TensorFlow implementation of LRP (https://github.com/VigneshSrinivasan10/interprettensor) into a TensorFlow implementation of SSD (https://github.com/balancap/SSD-Tensorflow)
  • The SSD implementation includes a learned model (we conducted no training)
  • We added CRP-specific routines
  • Relevance was normalized before creating heatmaps (see the paper for details)
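The paper specifies the exact normalization; one simple scheme consistent with the symmetric positive/negative heatmaps shown next is max-absolute scaling (my assumption, for illustration only):

```python
import numpy as np

def normalize_relevance(R):
    """Scale relevance into [-1, 1] by its maximum absolute value, so that
    positive and negative relevance can share one diverging colormap."""
    m = np.abs(R).max()
    return R / m if m > 0 else R

print(normalize_relevance(np.array([-2.0, 1.0])))  # scales to [-1.0, 0.5]
```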

SLIDE 36

Numerical Example

  • Relevance is distributed almost symmetrically around zero

[Figure: histogram of relevance values centered at 0; positives and negatives are shown in different colors in the heatmap; target class “dog”]
SLIDE 37

Error Analysis (1)

  • A dog was misclassified as a sheep

SLIDE 38

Error Analysis (2)

  • A dog was misclassified as a sheep

[Figure: heatmaps for target class “dog” and target class “sheep”]
SLIDE 39

Error Analysis (3)

  • A dog was misclassified as a sheep

[Figure: heatmap for target class “sheep”, with values below the 85th percentile masked]
SLIDE 40

Error Analysis (4)

  • Unwanted localizations:
    – Horizontal shift to the left with widening
    – Vertical shift to the top with heightening

[Figure: detected boxes before and after localization]
SLIDE 41

Error Analysis (5)

  • Unwanted localizations:
    – Horizontal shift to the left with widening
    – Vertical shift to the top with heightening

[Figure: heatmaps for target offsets: center on x-axis and center on y-axis]
SLIDE 42

Error Analysis (6)

  • Unwanted localizations:
    – Horizontal shift to the left with widening
    – Vertical shift to the top with heightening

[Figure: heatmaps for target offsets: width and height]
SLIDE 43

Summary

  • CRP (contrastive relevance propagation), an LRP method tailored for SSD:
    – Can highlight only the significantly important features for a target class
    – Can deal with SSD’s heterogeneous outputs (classification and localization)
  • Some error analyses using CRP were conducted

Future work

  • Applying CRP to other object detectors such as YOLO
  • Applying CRP (retrospectively) to standard CNNs

SLIDE 44

Thank you for your attention!
