Revisiting Class Activation Mapping for Learning from Imperfect Data
The 2nd Learning from Imperfect Data (LID) Workshop
Wonho Bae*, Junhyug Noh*, Jinhwan Seo, and Gunhee Kim
Challenge Results
- 1st place, Track 3: Weakly Supervised Object Localization
- 2nd place, Track 1: Weakly Supervised Semantic Segmentation
Weakly Supervised Object Localization
[Figure: an input image and its output localization, labeled "monkey".]
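Class Activation Mapping (CAM)
[Figure: the CAM pipeline. A CNN produces feature maps $F_1, F_2, \ldots, F_K$; global average pooling (GAP) followed by a linear classifier predicts the class (here, "monkey"). The class activation map for class $c$ is $M_c = w_{1,c} F_1 + w_{2,c} F_2 + \cdots + w_{K,c} F_K = \sum_{k=1}^{K} w_{k,c} F_k$, which is thresholded and resized to the input to give the localization result.]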
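A minimal NumPy sketch of the CAM computation ($M_c = \sum_k w_{k,c} F_k$) and of thresholding at a fraction of the map's maximum, assuming feature maps shaped (K, H, W) and classifier weights shaped (K, C). The function names, the threshold fraction `theta`, and the toy data are hypothetical. For brevity the box covers all above-threshold pixels rather than the largest connected component, and resizing to the input resolution is skipped.

```python
import numpy as np


def class_activation_map(features: np.ndarray, weights: np.ndarray, c: int) -> np.ndarray:
    """Return M_c = sum_k w_{k,c} * F_k.

    features: (K, H, W) feature maps F_1..F_K from the last conv layer.
    weights:  (K, C) weights of the linear classifier applied after GAP.
    """
    # Contract the channel axis: (K,) with (K, H, W) -> (H, W).
    return np.tensordot(weights[:, c], features, axes=(0, 0))


def bbox_from_cam(cam: np.ndarray, theta: float) -> tuple[int, int, int, int]:
    """Keep pixels above theta * max(M_c); return (x_min, y_min, x_max, y_max)."""
    peak = float(cam.max())
    if peak <= 0:
        raise ValueError("CAM has no positive activation to threshold")
    ys, xs = np.nonzero(cam > theta * peak)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy example: channel 1 fires strongly on a small part, channel 2 weakly on a wide area.
F = np.zeros((2, 8, 8))
F[0, 2:4, 3:5] = 5.0
F[1, 1:6, 1:7] = 1.0
W = np.array([[1.0, 0.0],
              [0.5, 0.0]])  # shape (K=2, C=2); class 0 is the target
cam = class_activation_map(F, W, c=0)
print(bbox_from_cam(cam, theta=0.2))   # (3, 2, 4, 3): only the strong peak survives
print(bbox_from_cam(cam, theta=0.05))  # (1, 1, 6, 5): the wide region is included
```

The toy data shows why the max-based threshold matters: the same map yields a box around only the small, discriminative part or around the whole region depending on where the threshold falls relative to the peak.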
Class Activation Mapping (CAM) for Track 3
[Figure: the same CAM pipeline, $M_c = \sum_{k=1}^{K} w_{k,c} F_k$, as used for Track 3.]
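How to Capture the Whole Object Region?
- [HaS] Hide-and-Seek: Singh et al., ICCV 2017
- [AE] Adversarial Erasing: Wei et al., CVPR 2017
- [ACoL] Adversarial Complementary Learning: Zhang et al., CVPR 2018
- [ADL] Attention-based Dropout Layer: Choe et al., CVPR 2019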
Our Approach
- Motivation
  - The information needed to capture the whole object region already exists in the feature maps
- Problem
  - The three modules of CAM (M1–M3) do not take into account three phenomena (P1–P3) observed in the feature maps
  - As a result, localization is limited to small, discriminative regions of the object
- Solution
  - Use this information correctly by making simple modifications to the three modules
[Figure: the CAM pipeline annotated with its three modules (M1: Global Average Pooling (GAP), M2: Class Activation Maps (CAM), M3: Thresholding, which yields the resized localization result) and with phenomena P1–P3 observed in the feature map $F$.]
Our Approach (1) Thresholded Average Pooling
- Problem: Global Average Pooling (GAP) under P1
[Figure: the CAM pipeline (M1–M3), shown with phenomenon P1 observed in the feature map $F$.]
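[Figure: classification phase vs. localization phase. Classification phase: $\mathrm{GAP}(F_1) = 2.5$ with weight $w_{1,c} = 0.04$ and $\mathrm{GAP}(F_2) = 9.9$ with weight $w_{2,c} = 0.01$, giving a class score of $0.100 + 0.099 + \cdots$. Localization phase: the weighted maps $w_{1,c} F_1$ and $w_{2,c} F_2$, where $F_1$ has a maximum of 64.7 and $F_2$ a maximum of 59.2.]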
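The numbers in this example can be checked directly. Pairing each GAP value with its weight as the class-score sum indicates (the pairing is read off the figure), the two channels contribute almost equally to classification, but their peaks in the CAM differ sharply:

```latex
\begin{aligned}
\text{classification:}\quad & w_{1,c}\,\mathrm{GAP}(F_1) = 0.04 \times 2.5 = 0.100,
  \qquad w_{2,c}\,\mathrm{GAP}(F_2) = 0.01 \times 9.9 = 0.099 \\
\text{localization:}\quad & \max\bigl(w_{1,c} F_1\bigr) = 0.04 \times 64.7 = 2.588,
  \qquad \max\bigl(w_{2,c} F_2\bigr) = 0.01 \times 59.2 = 0.592
\end{aligned}
```

So the peak of $w_{1,c} F_1$ is about 4.4 times that of $w_{2,c} F_2$, and the CAM is dominated by $F_1$, the channel with the higher peak but much lower average (suggesting a smaller active area).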
- Solution: Thresholded Average Pooling (TAP)
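A sketch contrasting GAP with thresholded average pooling, assuming TAP averages only the positions whose activation exceeds a threshold set relative to the channel's own maximum (`theta_tap * max(F_k)`); that relative form, the function names, and the toy maps are assumptions for illustration. In a network, TAP would take the place of GAP before the classifier.

```python
import numpy as np


def gap(fmap: np.ndarray) -> float:
    """Global average pooling: mean over all H x W positions."""
    return float(fmap.mean())


def tap(fmap: np.ndarray, theta_tap: float) -> float:
    """Thresholded average pooling: mean over positions above theta_tap * max(fmap).

    Assumption: the threshold is relative to the channel's own maximum.
    """
    peak = float(fmap.max())
    active = fmap > theta_tap * peak
    if peak <= 0 or not active.any():  # inactive channel: nothing to average
        return 0.0
    return float(fmap[active].mean())


small = np.zeros((10, 10))
small[:2, :2] = 50.0  # strong activation on a small region
wide = np.zeros((10, 10))
wide[:6, :] = 8.0     # weaker activation on a wide region

print(gap(small), gap(wide))              # 2.0 4.8
print(tap(small, 0.05), tap(wide, 0.05))  # 50.0 8.0
```

Under GAP the small, strongly activated channel pools to a lower value than the wide one, so the classifier can compensate with a larger weight; TAP pools only the active area, so the pooled value reflects activation strength rather than area.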
Our Approach (2) Negative Weight Clamping
- Problem: Class Activation Maps (CAM) under P2
[Figure: the CAM pipeline (M1–M3), shown with phenomenon P2 observed in the feature map $F$.]
[Figure: CAM from positive weights only − CAM from negative weights only = CAM from both.]
[Figure: IoA between the ground-truth boxes and the CAMs, for channels with positive weights and channels with negative weights.]
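The slides do not define IoA, so this sketch assumes it is the fraction of a thresholded CAM region that falls inside the ground-truth box, |CAM region ∩ box| / |CAM region|. The function name, the inclusive box convention, and the toy mask are hypothetical. Under this definition, a high IoA for channels with negative weights would mean that those channels also fire on the object.

```python
import numpy as np


def ioa(cam_mask: np.ndarray, box: tuple[int, int, int, int]) -> float:
    """Intersection over Area of a CAM region with respect to a ground-truth box.

    Assumed definition: |CAM region & box| / |CAM region|.
    cam_mask: boolean (H, W) foreground mask; box: inclusive (x_min, y_min, x_max, y_max).
    """
    x0, y0, x1, y1 = box
    in_box = np.zeros_like(cam_mask, dtype=bool)
    in_box[y0:y1 + 1, x0:x1 + 1] = True
    area = int(cam_mask.sum())
    if area == 0:  # empty CAM region: define IoA as 0
        return 0.0
    return float(np.logical_and(cam_mask, in_box).sum() / area)


mask = np.zeros((6, 6), dtype=bool)
mask[1:3, 1:5] = True            # 8 foreground pixels
print(ioa(mask, (0, 0, 5, 1)))   # 0.5: only row 1 of the region lies in the box
```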
- Solution: Negative Weight Clamping (NWC)
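A sketch of negative weight clamping as a change to M2, assuming the clamp $w_{k,c} \leftarrow \max(w_{k,c}, 0)$ is applied to the classifier weights when the CAM is computed (the slides do not say whether it is also enforced during training). It also splits the ordinary CAM into its positive-weight and negative-weight parts, matching the "positive only − negative only = both" decomposition. Function names and toy data are hypothetical.

```python
import numpy as np


def cam_parts(features: np.ndarray, weights: np.ndarray, c: int):
    """Split M_c into the contributions of positive and negative weights."""
    w = weights[:, c]
    pos = np.tensordot(np.clip(w, 0.0, None), features, axes=(0, 0))
    neg = np.tensordot(np.clip(-w, 0.0, None), features, axes=(0, 0))
    return pos, neg  # the ordinary CAM is pos - neg


def cam_nwc(features: np.ndarray, weights: np.ndarray, c: int) -> np.ndarray:
    """CAM with negative weight clamping: w_{k,c} <- max(w_{k,c}, 0)."""
    return np.tensordot(np.clip(weights[:, c], 0.0, None), features, axes=(0, 0))


F = np.zeros((2, 6, 6))
F[0, 1:5, 1:5] = 1.0            # positive-weight channel covers the object
F[1, 1:3, 1:5] = 2.0            # negative-weight channel also fires on the object
W = np.array([[1.0], [-0.5]])   # shape (K=2, C=1)

pos, neg = cam_parts(F, W, 0)
full = pos - neg
# Without clamping, the negative channel erases half of the object region.
print(int((full > 0).sum()), int((cam_nwc(F, W, 0) > 0).sum()))  # 8 16
```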
Our Approach (3) Percentile as a Thresholding Standard
- Problem: Maximum as a Standard (MaS) under P3
[Figure: the CAM pipeline (M1–M3), shown with phenomenon P3 observed in the feature map $F$.]
[Figure panels: number of channels with activation above a threshold; result with CAM; CAM values in descending order. Axes: threshold, and 100 − percentile (%).]
- Solution: Percentile as a Standard (PaS)
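A sketch comparing the two thresholding standards on a CAM with a single extreme peak. It assumes the localization threshold is a fraction `theta` of the standard: the maximum for MaS, the j-th percentile for PaS (j = 98 as in the experimental setting). The value of `theta`, the function names, and the toy map are hypothetical.

```python
import numpy as np


def threshold_mas(cam: np.ndarray, theta: float) -> float:
    """Maximum as a Standard: tau = theta * max(M_c)."""
    return theta * float(cam.max())


def threshold_pas(cam: np.ndarray, theta: float, j: float) -> float:
    """Percentile as a Standard: tau = theta * (j-th percentile of M_c)."""
    return theta * float(np.percentile(cam, j))


cam = np.zeros(100)
cam[:30] = 1.0   # object region
cam[30] = 20.0   # one extreme activation
cam = cam.reshape(10, 10)

theta = 0.2  # hypothetical threshold fraction
print(int((cam > threshold_mas(cam, theta)).sum()))      # 1: only the extreme pixel
print(int((cam > threshold_pas(cam, theta, 98)).sum()))  # 31: the whole object region
```

Because a percentile ignores the few largest values, one outlier no longer drives the threshold up and shrinks the localized region.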
Experimental Setting
- Backbone: ResNet50-SE
- Batch size: 210
- Input size: 384×384
- Random crop size: 336×336
- TAP threshold (𝜐_tap): 0.05
- PaS percentile (𝑗): 98
Results on Validation Set
- Results with different components
- To preserve mask details, we also applied a fully connected CRF.
- Performance improves steadily as each component is added.
Leaderboard
- Track 3: Weakly Supervised Object Localization
Qualitative Results
[Figure: four qualitative examples comparing CAM with CAM + Ours.]
Expansion to Track 1
Our target!
Class Activation Mapping (CAM) for Track 1
[Figure: the same CAM pipeline, $M_c = \sum_{k=1}^{K} w_{k,c} F_k$, as used for Track 1.]
Leaderboard
- Track 1: Weakly Supervised Semantic Segmentation