
SLIDE 1

Seesaw Loss for Long-Tailed Instance Segmentation

Jiaqi Wang1, Wenwei Zhang2, Yuhang Zang2, Yuhang Cao1, Jiangmiao Pang3, Tao Gong4, Kai Chen1, Ziwei Liu1, Chen Change Loy2, Dahua Lin1

1The Chinese University of Hong Kong 2Nanyang Technological University 3Zhejiang University 4University of Science and Technology of China

Team: MMDet

SLIDE 2

Results

Comparison of our entry with official baseline on LVIS v1 test-dev.

[Bar chart] Mask AP: Baseline 26.8, MMDet 38.9.

SLIDE 3

Results

Comparison of our entry with official baseline on LVIS v1 test-dev.

[Bar chart] Mask AP by category frequency:

       Baseline  MMDet
APr    19.0      29.5
APc    25.2      37.0
APf    32.0      45.4

SLIDE 4

Overview

  • 1. We propose Seesaw Loss that dynamically rebalances the penalty between different categories for long-tailed instance segmentation.

SLIDE 5

Overview

  • 1. We propose Seesaw Loss that dynamically rebalances the penalty between different categories for long-tailed instance segmentation.

  • 2. We propose HTC-Lite, a light-weight version of Hybrid Task Cascade (HTC).


SLIDE 7

Seesaw Loss

Existing object detectors struggle on long-tailed datasets and show unsatisfactory performance on rare classes. The reason is that the overwhelming number of samples from frequent classes severely suppresses the model's confidence on rare classes.

SLIDE 8

Seesaw Loss

To tackle this problem, we propose Seesaw Loss for long-tailed instance segmentation.

  • Dynamic: Seesaw Loss dynamically modifies the penalty according to the relative ratio of instance numbers between each category pair.

  • Smooth: Seesaw Loss smoothly adjusts the punishment on rare classes when the training instances are positive samples of other, relatively frequent categories.

  • Self-calibrated: It directly learns to balance the penalty for each category during training, without relying on a known dataset distribution or a specific data sampler.

SLIDE 9

Seesaw Loss

Seesaw Loss can be derived from the cross-entropy loss.

SLIDE 10

Seesaw Loss

Seesaw Loss accumulates the number of training samples for each category over the training iterations. Given an instance with positive label j, for each other category k, Seesaw Loss dynamically adjusts the penalty on negative label k according to the ratio O_k / O_j.

SLIDE 11

Seesaw Loss

  • When category j is more frequent than category k, Seesaw Loss reduces the penalty on category k by a factor of (O_k / O_j)^q to protect category k.

  • Otherwise, Seesaw Loss keeps the full penalty on negative classes to reduce misclassification.
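In code, this per-pair rule can be sketched as a small helper (hypothetical names; `class_counts` holds the accumulated counts O_c, and the exponent `q` is a tunable hyper-parameter whose default here is chosen only for illustration):

```python
def seesaw_weights(class_counts, pos_label, q=2.0):
    """Penalty scale on each negative class k for a sample whose
    positive label is j = pos_label: (O_k / O_j) ** q when O_k < O_j,
    and 1 otherwise, as described on the slide."""
    o_j = max(class_counts[pos_label], 1)  # guard against zero counts
    weights = []
    for k, o_k in enumerate(class_counts):
        if k == pos_label or o_k >= o_j:
            weights.append(1.0)  # keep full penalty on frequent classes
        else:
            # category k is rarer than j: shrink its penalty to protect it
            weights.append((max(o_k, 1) / o_j) ** q)
    return weights
```

For example, with counts [100, 1] and positive label 0, the penalty on the rare class 1 shrinks to (1/100)^2; with positive label 1, both weights stay at 1.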
SLIDE 12

Seesaw Loss

  • Normalized Linear Layer: we adopt a normalized linear layer to predict the classification activations.
  • Objectness Score: we adopt an objectness branch that predicts objectness scores with a normalized linear layer and cross-entropy loss.
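A normalized linear layer scores a feature against each class weight by cosine similarity instead of a raw dot product. A minimal sketch in plain Python (hypothetical helper; `scale` is an assumed temperature factor, not a value from the slides):

```python
import math

def normalized_linear(x, weights, scale=20.0):
    """Cosine-similarity classifier head: for each class weight w_c,
    logit_c = scale * <x, w_c> / (||x|| * ||w_c||)."""
    x_norm = math.sqrt(sum(v * v for v in x)) or 1.0  # avoid div by zero
    logits = []
    for w in weights:
        w_norm = math.sqrt(sum(v * v for v in w)) or 1.0
        dot = sum(a * b for a, b in zip(x, w))
        logits.append(scale * dot / (x_norm * w_norm))
    return logits
```

Because both the feature and the weights are length-normalized, the magnitude of a class's weight vector (which tends to grow with its training frequency) no longer dominates the logits.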

SLIDE 13

Seesaw Loss

SLIDE 14

Overview

  • 1. We propose Seesaw Loss that dynamically rebalances the penalty between different categories for long-tailed instance segmentation.

  • 2. We propose HTC-Lite, a light-weight version of Hybrid Task Cascade (HTC).

SLIDE 15

HTC-Lite

  • Original HTC

[Diagram: feature F is RoI-pooled into interleaved box heads B1, B2, B3 and mask heads M1, M2, M3, with an additional semantic segmentation branch]

SLIDE 16

HTC-Lite

  • Reduce the number of mask heads

[Diagram: feature F is RoI-pooled into cascaded box heads B1, B2, B3 with a single mask head M3; the semantic segmentation branch is kept]

SLIDE 17

HTC-Lite

  • Use context encoding rather than semantic segmentation head
  • Does not rely on semantic segmentation annotation

[Diagram: feature F is RoI-pooled into cascaded box heads B1, B2, B3 with a single mask head M3; a context encoding branch replaces the semantic segmentation branch]
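The structural change can be sketched as a forward pass (all callables here are hypothetical stand-ins, not MMDetection APIs): three cascaded box heads refine the boxes, a single mask head runs only after the last stage, and a context-encoding module replaces the semantic segmentation branch:

```python
def htc_lite_forward(feature, box_heads, mask_head, context_encoding):
    """Sketch of the HTC-Lite flow described on the slides."""
    ctx = context_encoding(feature)  # global context, no segmentation labels needed
    boxes = None
    for head in box_heads:           # B1 -> B2 -> B3, each refines the boxes
        boxes = head(feature, boxes, ctx)
    masks = mask_head(feature, boxes, ctx)  # single M3, after the final stage
    return boxes, masks
```

Compared with the original HTC, which runs a mask head at every cascade stage, this drops two of the three mask heads and the semantic segmentation branch.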

SLIDE 18

HTC-Lite

Comparison with HTC w/o semantic

Dataset  Method            Bbox AP  Mask AP
COCO     HTC w/o semantic  41.5     36.7
COCO     HTC-Lite          42.5     37.8
LVIS v1  HTC w/o semantic  26.8     24.5
LVIS v1  HTC-Lite          27.2     25.2

SLIDE 19

Experiments

Training/Testing details

  • 1. Training Dataset:
  • Detectors: LVIS v1 training split
  • 2. Training scales
  • long edge: randomly sampled from 768 to 1792 pixels
  • random crop to 1280 × 1280
  • 3. Augmentation
  • Use InstaBoost
  • 4. Test time augmentation
  • Random flip
  • Scales: (1200, 1200), (1400, 1400), (1600, 1600), (1800, 1800), (2000, 2000)

No extra data or annotation is used in our entry.

SLIDE 20

Experiments

Model Modifications

  • Synchronized BN
  • CARAFE
  • HTC-Lite
  • TSD
  • Mask Scoring
  • Better Neck: FPG + CARAFE + DCNv2
  • Better Backbone: ResNeSt-200 + DCNv2
SLIDES 21–32

Experiments

Cumulative ablation on LVIS v1 val (mask AP); each row adds one component:

Component              Mask AP on val
Baseline               18.7
+ SyncBN               18.9 (+0.2)
+ CARAFE Upsample      19.4 (+0.5)
+ HTC-Lite             21.9 (+2.5)
+ TSD                  23.5 (+1.6)
+ Mask Scoring         23.9 (+0.4)
+ Training Time Aug.   26.5 (+2.6)
+ FPG                  27.0 (+0.5)
+ ResNeSt-200 + DCNv2  29.9 (+2.9)
+ Seesaw Loss          36.8 (+6.9)
+ Finetune             37.3 (+0.5)
+ Test Time Aug.       38.8 (+1.5)

The final model reaches 38.92 mask AP on test-dev.

SLIDE 33

Supported Methods

  • RPN
  • Guided Anchoring
  • Fast / Faster R-CNN
  • R-FCN
  • Grid R-CNN
  • Libra R-CNN
  • Mask R-CNN
  • Dynamic R-CNN
  • Mask scoring R-CNN
  • Double Head R-CNN
  • Cascade R-CNN
  • Hybrid Task Cascade
  • DetectoRS

GitHub: MMDet

We recently released MMDetection v2.0 & MMDetection3D.

  • HRNet
  • GCNet
  • NAS-FPN
  • PAFPN
  • FSAF
  • PointRend
  • InstaBoost
  • Mixed Precision Training
  • CARAFE
  • DCN / DCN V2
  • Weight Standardization
  • Generalized Attention
  • Generalized Focal Loss
  • GRoIE
  • DIoU
  • CIoU
  • BIoU
  • RetinaNet
  • ATSS
  • SSD
  • GHM
  • OHEM
  • FCOS
  • NAS-FCOS
  • FoveaBox
  • RepPoints

GitHub: MMDet3D

SLIDE 34

Thank you!