A Good Box is not a Guarantee of a Good Mask


LVIS Challenge 2020


slide-1
SLIDE 1

1

A Good Box is not a Guarantee of a Good Mask

Jingru Tan (1), Gang Zhang (2), Hanming Deng (3), Changbao Wang (3), Lewei Lu (3), Quanquan Li (3)

(1) Tongji University, (2) Tsinghua University, (3) SenseTime Research

LVIS Challenge 2020

slide-2
SLIDE 2

Overview

2

Introduction of LVIS

Long tail distribution
High quality mask annotations

Training Pipeline

Representation learning stage
Fine-tuning stage

Challenges in LVIS

Inconsistent annotations
Objects that are hard to represent with boxes

Our Results

Improvements & tricks

slide-3
SLIDE 3

Overview

3

Introduction of LVIS

Long tail distribution
High quality mask annotations

slide-4
SLIDE 4

Introduction of LVIS

4

Long Tail Distribution

The classifier is heavily biased towards head categories. Tail categories are hard to classify.

slide-5
SLIDE 5

Introduction of LVIS

5

High Quality Mask Annotations

COCO: coarse polygon annotations
LVIS: precise polygon annotations

slide-6
SLIDE 6

Overview

6

Introduction of LVIS

Long tail distribution
High quality mask annotations

Training Pipeline

Representation learning stage
Fine-tuning stage

slide-7
SLIDE 7

Training pipeline

7

Representation Learning: learning a universal representation
Fine-tuning: balancing the classifier (for the long-tail distribution) and paying more attention to mask prediction (for high-quality masks)

slide-8
SLIDE 8

Representation Learning

8

Equalization Loss

Equalization Loss for Long-tailed Object Recognition, CVPR 2020

Mosaic & Rotate & Multi-Scale Training

YOLOv4: Optimal Speed and Accuracy of Object Detection, arXiv preprint

Repeat Factor Sampling

LVIS: A Dataset for Large Vocabulary Instance Segmentation, CVPR 2019
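Repeat Factor Sampling, as described in the cited LVIS paper, oversamples images that contain rare categories. The following is a minimal sketch of that rule, not the authors' training code: the function name `repeat_factors` and the input layout are illustrative, and the default threshold `t = 0.001` is an assumption, since the slides do not state the value used.

```python
import math
from collections import Counter


def repeat_factors(image_categories, t=0.001):
    """Image-level repeat factors for Repeat Factor Sampling.

    image_categories: one collection of category ids per training image.
    t: frequency threshold (assumed value; not given in the slides).
    """
    num_images = len(image_categories)
    # f(c): fraction of training images that contain category c.
    image_count = Counter(c for cats in image_categories for c in set(cats))
    freq = {c: n / num_images for c, n in image_count.items()}
    # r(c) = max(1, sqrt(t / f(c))): only categories rarer than t are repeated.
    cat_rep = {c: max(1.0, math.sqrt(t / f)) for c, f in freq.items()}
    # r(i) = max of r(c) over the categories in image i (1.0 if it has none).
    return [max((cat_rep[c] for c in cats), default=1.0)
            for cats in image_categories]
```

An image is then repeated in each epoch according to its factor, so an image holding a category that appears in a quarter of the data with `t = 1.0` would be seen about twice as often.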

slide-9
SLIDE 9

Representation Learning

9

To further improve model performance, pseudo labels are inferred on LVIS and on external datasets such as Open Images for self-training. During self-training, we ignore all proposals matched with the pseudo boxes.

                     AP@Segm  AP@r  AP@c  AP@f  AP@BBox
Baseline             26.2     17.1  26.2  30.2  27.0
Open Images          26.8     17.5  27.2  30.5  28.1
Ignore LVIS pseudo   26.8     17.0  27.1  30.9  27.8

Self-Training

Pseudo Label
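Ignoring proposals matched with pseudo boxes can be sketched as a simple IoU test. This is an illustration under assumptions, not the authors' implementation: the helper names are hypothetical, and the matching threshold `iou_thr = 0.5` is assumed because the slides do not give one.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def proposals_to_keep(proposals, pseudo_boxes, iou_thr=0.5):
    """Return one flag per proposal: True if it still contributes to the loss.

    A proposal that overlaps any pseudo box by at least iou_thr (assumed
    threshold) is treated as matched and therefore ignored.
    """
    return [all(iou(p, q) < iou_thr for q in pseudo_boxes) for p in proposals]
```

With no pseudo boxes in an image, every proposal is kept, so the rule only changes training where pseudo labels exist.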

slide-10
SLIDE 10

Fine-tuning – BBox Head

10

The classifier is heavily biased towards head categories.

Balanced Group Softmax

Balanced Classifier

Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax, CVPR 2020
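The core idea of Balanced Group Softmax is to group categories with similar numbers of training instances and to normalise scores inside each group, so head categories do not compete directly with tail categories. The sketch below is a simplified illustration: the bin edges in `BIN_EDGES` are an assumption (the slides do not list them), and the per-group "others" class and the background group used by the published method are omitted.

```python
import math
from bisect import bisect_right

# Assumed instance-count bin edges; not stated in the slides.
BIN_EDGES = (10, 100, 1000)


def assign_groups(instance_counts, edges=BIN_EDGES):
    """Map each category to a group index by its training instance count."""
    return [bisect_right(edges, n) for n in instance_counts]


def group_softmax(logits, groups):
    """Softmax computed independently inside each group of categories."""
    probs = [0.0] * len(logits)
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        peak = max(logits[i] for i in idx)  # subtract the max for stability
        exps = {i: math.exp(logits[i] - peak) for i in idx}
        total = sum(exps.values())
        for i in idx:
            probs[i] = exps[i] / total
    return probs
```

Because each group is normalised separately, a tail category only has to beat other categories of similar frequency rather than the whole vocabulary.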

slide-11
SLIDE 11

Fine-tuning – Mask Head

Category          Frequency  BBox AP  Mask AP  Area Mask/BBox  Mask AP - BBox AP
coatrack          common     73.6     10.1     0.29            -63.5
tripod            frequent   40.9      3.8     0.22            -37.1
necklace          frequent   32.8      3.0     0.17            -29.8
ski pole          frequent   36.2      7.5     0.15            -28.7
fork              frequent   47.2     21.9     0.26            -25.3
windshield wiper  frequent   29.6      5.2     0.26            -24.4
giraffe           frequent   79.2     60.7     0.33            -18.5

Mask AP - BBox AP < -5.0: 270 of 1203 categories (~25%)
Mask AP - BBox AP < -5.0 and Area Mask/BBox < 0.5: 168 of 1203 categories

A Good Box is not a Guarantee of a Good Mask

Large BBox, Small Mask
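The gap column and the category filter on this slide are plain arithmetic on per-category AP. A minimal sketch using the rows from the slide's table; the function names are illustrative, and the thresholds -5.0 and 0.5 are the ones quoted on the slide.

```python
# (category, bbox_ap, mask_ap, mask/bbox area ratio), copied from the slide.
ROWS = [
    ("coatrack", 73.6, 10.1, 0.29),
    ("tripod", 40.9, 3.8, 0.22),
    ("necklace", 32.8, 3.0, 0.17),
    ("ski pole", 36.2, 7.5, 0.15),
    ("fork", 47.2, 21.9, 0.26),
    ("windshield wiper", 29.6, 5.2, 0.26),
    ("giraffe", 79.2, 60.7, 0.33),
]


def ap_gap(bbox_ap, mask_ap):
    """Mask AP minus BBox AP, rounded to one decimal like the slide."""
    return round(mask_ap - bbox_ap, 1)


def large_gap_small_ratio(rows, gap_thr=-5.0, ratio_thr=0.5):
    """Categories whose mask AP trails box AP and whose mask fills little of the box."""
    return [name for name, bbox_ap, mask_ap, ratio in rows
            if ap_gap(bbox_ap, mask_ap) < gap_thr and ratio < ratio_thr]
```

All seven listed categories pass both thresholds, which is why they were picked as examples.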

11

slide-12
SLIDE 12

Fine-tuning – Mask Head

12

A Good Box is not a Guarantee of a Good Mask

The smaller the mask/bbox ratio, the larger the gap between mask and bbox AP.
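The mask/bbox ratio can be computed directly from a binary mask. A minimal sketch, assuming the box is the tight axis-aligned box around the mask (the slides do not say exactly how the ratio was measured); the function name is illustrative.

```python
def mask_bbox_area_ratio(mask):
    """Mask area divided by the area of its tight bounding box.

    mask: list of rows, each a list of 0/1 values.
    """
    coords = [(y, x) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not coords:
        return 0.0
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    box_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return len(coords) / box_area
```

A thin diagonal object, such as a ski pole, covers only a small fraction of its box, while a compact object fills most of it.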

slide-13
SLIDE 13

Fine-tuning – Mask Head

13

Some Examples

[Example images: tripod, ski pole, giraffe]

slide-14
SLIDE 14

Fine-tuning – Mask Head

14

New Strategy for Mask Proposal Assignment

               AP@Segm  AP@r  AP@c  AP@f  AP@BBox
Baseline       34.7     26.1  34.9  38.1  37.6
Ratio Assign   35.0     26.4  35.2  38.6  37.6

Solution for Categories with Small Ratio

Feature Pyramid Networks for Object Detection, CVPR 2017
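The slide cites the FPN paper next to the new mask proposal assignment but does not spell out the rule. For orientation only, the sketch below shows the standard FPN heuristic that maps an RoI to a pyramid level by its box size. How Ratio Assign changes this (presumably using the mask/bbox area ratio) is not stated in the slides and is not implemented here. The function name and the level range 2 to 5 are assumptions.

```python
import math


def fpn_level(width, height, k0=4, canonical_size=224, k_min=2, k_max=5):
    """Standard FPN rule: k = floor(k0 + log2(sqrt(w * h) / canonical_size)).

    The result is clamped to the available pyramid levels [k_min, k_max].
    """
    scale = max(math.sqrt(width * height), 1e-6)  # avoid log2(0)
    k = math.floor(k0 + math.log2(scale / canonical_size))
    return min(max(k, k_min), k_max)
```

Under this rule a large box is pooled from a coarse level regardless of how little of the box the object occupies, which is the situation the slide targets for categories with a small ratio.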

slide-15
SLIDE 15

Fine-tuning – Mask Head

15

           AP@Segm  AP@r  AP@c  AP@f  AP@BBox
Baseline   34.7     26.1  34.9  38.1  37.6
+BML       35.0     26.1  35.3  38.5  37.6

Solution for Categories with Small Ratio

Balanced Mask Loss: foreground/background imbalance
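The slides name the problem (foreground/background imbalance inside the box) but do not give the loss formula. One common way to counter such imbalance is to average the binary cross-entropy separately over foreground and background pixels and weight the two halves equally. The sketch below shows that generic idea only; it should not be read as the authors' Balanced Mask Loss, and the function name is hypothetical.

```python
import math


def balanced_bce(probs, targets, eps=1e-7):
    """Binary cross-entropy with equal total weight for foreground and background.

    probs: predicted foreground probabilities, one per pixel.
    targets: ground-truth labels (1 = foreground, 0 = background).
    """
    fg_terms, bg_terms = [], []
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if t:
            fg_terms.append(-math.log(p))
        else:
            bg_terms.append(-math.log(1.0 - p))
    if not fg_terms or not bg_terms:
        # Only one class present: fall back to the plain mean.
        terms = fg_terms + bg_terms
        return sum(terms) / len(terms) if terms else 0.0
    return 0.5 * (sum(fg_terms) / len(fg_terms)
                  + sum(bg_terms) / len(bg_terms))
```

With a plain mean, a mask head that predicts "all background" on a thin object is barely penalised; with equal weighting, the few foreground pixels carry half of the loss.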

slide-16
SLIDE 16

Fine-tuning – Mask Head

16

Category          Frequency  Area Mask/BBox  AP Gap  AP Gap (Ours)  Improvement
coatrack          common     0.29            -63.5   -60.1          +3.4
tripod            frequent   0.22            -37.1   -33.1          +4.0
necklace          frequent   0.17            -29.8   -27.7          +2.1
ski pole          frequent   0.15            -28.7   -24.5          +4.2
fork              frequent   0.26            -25.3   -21.3          +4.0
windshield wiper  frequent   0.26            -24.4   -21.1          +3.3
giraffe           frequent   0.33            -18.5   -14.4          +4.1

However, this is still an open problem. We leave further research to future work.

Ours: with Ratio Assign & Balanced Mask Loss

slide-17
SLIDE 17

Fine-tuning – Mask Head

17

                           AP@Segm  AP@r  AP@c  AP@f  AP@BBox
Baseline                   34.7     26.1  34.9  38.1  37.6
+ Ratio Assign             35.0     26.2  35.2  38.5  37.6
+ Balanced Mask Loss       35.2     26.0  35.4  38.9  37.6
+ Boundary Supervision*    35.6     26.9  35.6  39.3  37.6
+ 7 Convs for Mask Head    35.8     26.8  35.9  39.6  37.6
+ Deformable RoI Pooling   36.1     28.8  35.8  39.8  38.3

Predicting High Quality Mask

Boundary-preserving Mask R-CNN, ECCV 2020

slide-18
SLIDE 18

Overview

18

Introduction of LVIS

Long tail distribution
High quality mask annotations

Training Pipeline

Representation learning stage
Fine-tuning stage

Our Results

Improvements & tricks

slide-19
SLIDE 19

Our Results

19

Baseline

19.2 baseline

slide-20
SLIDE 20

Our Results

20

Data Augmentation (Mosaic, Rotate)

19.2 baseline 20.3 data augmentation

slide-21
SLIDE 21

Our Results

21

Equalization Loss

19.2 baseline 20.3 data augmentation 22.4 EQL

slide-22
SLIDE 22

Our Results

22

Repeat Factor Sampling

19.2 baseline 20.3 data augmentation 22.4 EQL 26.2 RFS

slide-23
SLIDE 23

Our Results

23

HTC w/o Semantic Branch

19.2 baseline 20.3 data augmentation 22.4 EQL 26.2 RFS 28.8 HTC

slide-24
SLIDE 24

Our Results

24

ResNeSt101 + DCN + 400-1400 Multi-Scale training

19.2 baseline 20.3 data augmentation 22.4 EQL 26.2 RFS 28.8 HTC 32.0 S101 & DCN

slide-25
SLIDE 25

Our Results

25

Some Tricks

19.2 baseline 20.3 data augmentation 22.4 EQL 26.2 RFS 28.8 HTC 32.0 S101 & DCN 33.2 tricks

Make sampling probability in mosaic align with RFS
Make rotated boxes align with rotated masks
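The slides do not say how rotated boxes were aligned with rotated masks. One plausible reading, sketched below as an assumption, is to recompute each box from the rotated mask polygon instead of rotating the original box corners, since the envelope of a rotated box is generally looser than the box around the rotated object. The function names are illustrative.

```python
import math


def rotate_points(points, angle_deg, center=(0.0, 0.0)):
    """Rotate (x, y) points by angle_deg around center."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cx, cy = center
    return [(cx + (x - cx) * cos_a - (y - cy) * sin_a,
             cy + (x - cx) * sin_a + (y - cy) * cos_a) for x, y in points]


def bounding_box(points):
    """Tight axis-aligned box (x1, y1, x2, y2) around the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


# A thin diagonal object and the corners of its original box.
polygon = [(0, 0), (1, 0), (10, 9), (10, 10), (9, 10), (0, 1)]
corners = [(0, 0), (10, 0), (10, 10), (0, 10)]

tight = bounding_box(rotate_points(polygon, 45))  # box from the rotated mask
loose = bounding_box(rotate_points(corners, 45))  # box from the rotated box
```

For this thin diagonal shape the box derived from the rotated mask is about ten times smaller in area than the envelope of the rotated box.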

slide-26
SLIDE 26

Our Results

26

Self-Training

19.2 baseline 20.3 data augmentation 22.4 EQL 26.2 RFS 28.8 HTC 32.0 S101 & DCN 33.2 tricks 33.7 self training

slide-27
SLIDE 27

Our Results

27

Mask Scoring + Pseudo Ignore + ResNeSt269

19.2 baseline 20.3 data augmentation 22.4 EQL 26.2 RFS 28.8 HTC 32.0 S101 & DCN 33.2 tricks 33.7 self training 36.5 mask scoring + pseudo ignore + S269

slide-28
SLIDE 28

Our Results

28

Balanced Group Softmax

19.2 baseline 20.3 data augmentation 22.4 EQL 26.2 RFS 28.8 HTC 32.0 S101 & DCN 33.2 tricks 33.7 self training 36.5 mask scoring + pseudo ignore + S269 37.6 balanced group softmax

slide-29
SLIDE 29

Our Results

29

High Quality Mask

19.2 baseline 20.3 data augmentation 22.4 EQL 26.2 RFS 28.8 HTC 32.0 S101 & DCN 33.2 tricks 33.7 self training 36.5 misc 37.6 balanced group softmax 38.8 high quality mask

misc: mask scoring, pseudo ignore, ResNeSt269

slide-30
SLIDE 30

Our Results

30

Test-Time Augmentation

19.2 baseline 20.3 data augmentation 22.4 EQL 26.2 RFS 28.8 HTC 32.0 S101 & DCN 33.2 tricks 33.7 self training 36.5 misc 37.6 balanced group softmax 38.8 high quality mask 41.5 TTA

TTA: (1) multi-scale testing, (2) scale-aware inference, (3) revised Soft-NMS
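The slides do not describe what was revised in Soft-NMS. For reference, the sketch below shows the standard Gaussian Soft-NMS only: instead of deleting boxes that overlap a higher-scoring box, it decays their scores by exp(-IoU^2 / sigma). The function names and the defaults `sigma = 0.5` and `score_thr = 0.001` are assumptions.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thr=0.001):
    """Gaussian Soft-NMS; returns (index, decayed_score) in selection order."""
    remaining = [[i, s] for i, s in enumerate(scores)]
    kept = []
    while remaining:
        best = max(range(len(remaining)), key=lambda k: remaining[k][1])
        idx, score = remaining.pop(best)
        if score < score_thr:
            break  # every remaining score is lower still
        kept.append((idx, score))
        for item in remaining:
            overlap = iou(boxes[idx], boxes[item[0]])
            item[1] *= math.exp(-(overlap ** 2) / sigma)
    return kept
```

A duplicate of the top box keeps a small score rather than being removed, while a box elsewhere in the image is left untouched.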

slide-31
SLIDE 31

Overview

31

Introduction of LVIS

Long tail distribution
High quality mask annotations

Training Pipeline

Representation learning stage
Fine-tuning stage

Challenges in LVIS

Inconsistent annotations
Objects that are hard to represent with boxes

Our Results

Improvements & tricks

slide-32
SLIDE 32

Challenges in LVIS

32

Not-well Boxable Objects

Fire Hose (Mask AP 3.9)
Hose (Mask AP 6.5)

slide-33
SLIDE 33

Challenges in LVIS

33

Categories that are Hard to Detect

Stirrup (Mask AP 1.2)
Hook (Mask AP 7.3)

slide-34
SLIDE 34

Challenges in LVIS

34

Inconsistent Annotations

Crib (Mask AP - BBox AP = -51.6)

slide-35
SLIDE 35

35

Thank you

LVIS Challenge 2020