

SLIDE 1

Carnegie Mellon

Feature Selection Matters for Anchor-Free Object Detection

Chenchen Zhu, Carnegie Mellon University, 04/29/2020

SLIDE 2

Overview

  • Background
  • Motivation
  • Feature Selection in Anchor-Free Detection
  • General concept
  • Network architecture
  • Ground-truth and loss
  • Feature selection
  • Experiments
SLIDE 3

Overview

  • Background
  • Motivation
  • Feature Selection in Anchor-Free Detection
  • General concept
  • Network architecture
  • Ground-truth and loss
  • Feature selection
  • Experiments
SLIDE 4

Background

A long-standing challenge: scale variation

SLIDE 5

Background

Prior methods addressing scale variation

Image pyramid

SLIDE 6

Background

Prior methods addressing scale variation

Anchor boxes [Ren et al, Faster R-CNN]

SLIDE 7

Background

Prior methods addressing scale variation

Pyramidal feature hierarchy, e.g. [Liu et al, SSD]

SLIDE 8

Background

Prior methods addressing scale variation

Feature pyramid network [Lin et al, FPN, RetinaNet]

SLIDE 9

Background

Prior methods addressing scale variation

Feature pyramid augmentation: Balanced FPN [Pang et al., Libra R-CNN], HRNet [Wang et al.], NAS-FPN [Ghiasi et al.], EfficientDet [Tan et al.]

SLIDE 10

Background

Combining feature pyramid with anchor boxes

  • Smaller anchor associated with lower pyramid levels (local

fine-grained information)

  • Larger anchor associated with higher pyramid levels (global

semantic information)

feature pyramid small anchors medium anchors large anchors anchor-based head anchor-based head anchor-based head
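As a concrete illustration of such heuristics, the RoI-to-level rule from the FPN paper maps a box to a pyramid level purely by its size. A minimal sketch (the constants follow the FPN paper; the function name and level range are our own choices for illustration):

```python
import math

def heuristic_level(box_w, box_h, k_min=3, k_max=7, canonical=224, k0=4):
    """FPN-style heuristic: map a box to a pyramid level by size alone.

    Boxes near the canonical ImageNet size (224) land on level k0;
    each doubling of box size moves the box one level up the pyramid.
    """
    k = k0 + math.floor(math.log2(math.sqrt(box_w * box_h) / canonical))
    return max(k_min, min(k_max, k))  # clamp to levels present in the pyramid
```

For example, a 224x224 box maps to level 4 and a 448x448 box to level 5; the rule consults only box size, never the semantic content of the instance.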

SLIDE 11

Overview

  • Background
  • Motivation
  • Feature Selection in Anchor-Free Detection
  • General concept
  • Network architecture
  • Ground-truth and loss
  • Feature selection
  • Experiments
SLIDE 12

Motivation

Implicit feature selection by anchor boxes

  • IoU-based
  • Heuristic guided

[Diagram: 40x40, 50x50, and 60x60 instances assigned to small/medium/large anchor levels by ad-hoc heuristics, each level with an anchor-based head]

SLIDE 13

Motivation

Problem: feature selection by heuristics may not be optimal.

Question: how can we select the feature level based on semantic information rather than just box size?

Answer: remove the anchor-matching mechanism (i.e., go anchor-free) so that each instance can be assigned to any feature level, then select the most suitable level or levels.

SLIDE 14

Overview

  • Background
  • Motivation
  • Feature Selection in Anchor-Free Detection
  • General concept
  • Network architecture
  • Ground-truth and loss
  • Feature selection
  • Experiments
SLIDE 15

Feature Selection in Anchor-Free Detection

The general concept

  • Each instance can be arbitrarily assigned to a single feature level or to multiple feature levels.

[Diagram: an instance is routed by a feature-selection step to one or more pyramid levels, each with an anchor-free head]

SLIDE 16

Feature Selection in Anchor-Free Detection

Instantiation

  • Network architecture
  • Ground-truth and loss
  • Feature selection: heuristic guided vs. semantic guided
SLIDE 17

Feature Selection in Anchor-Free Detection

Network architecture (on RetinaNet)

[Diagram: feature pyramid with class+box subnets attached to every level]

SLIDE 18

Feature Selection in Anchor-Free Detection

Network architecture (on RetinaNet)

[Diagram: anchor-free head. Class subnet: four 3x3 conv layers (WxHx256) followed by a WxHxK classification output. Box subnet: four 3x3 conv layers (WxHx256) followed by a WxHx4 regression output.]

SLIDE 19

Feature Selection in Anchor-Free Detection

SLIDE 20

Feature Selection in Anchor-Free Detection

Ground-truth and loss (similar to DenseBox [Huang et al])

[Diagram: anchor-free head for one feature level. The WxHxK class output is supervised with focal loss (e.g. the "car" class map); the WxHx4 box output is supervised with IoU loss.]
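The two losses named on the slide can be sketched in a few lines of numpy. This is a simplified per-instance version under our own conventions (binary "inside the box" class target, (l, t, r, b) distances to the four box sides as the regression target, in the spirit of DenseBox); it is not the exact training code:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss [Lin et al.] on per-location class probabilities p
    against a binary ground-truth map y (1 inside the instance box)."""
    p = np.clip(p, 1e-6, 1.0 - 1e-6)
    pt = np.where(y == 1, p, 1.0 - p)        # probability of the true class
    a = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.sum(-a * (1.0 - pt) ** gamma * np.log(pt)))

def iou_loss(pred, gt):
    """IoU loss on (l, t, r, b) distances from a location to the four
    box sides; zero when predicted and ground-truth boxes coincide."""
    l, t, r, b = np.minimum(pred, gt)        # intersection extents
    inter = (l + r) * (t + b)
    area = lambda d: (d[0] + d[2]) * (d[1] + d[3])
    union = area(pred) + area(gt) - inter
    return float(-np.log(inter / union))
```

With a perfect box prediction the IoU loss is -log(1) = 0, and the focal loss shrinks as the predicted probability of the true class grows, which is what focuses training on hard locations.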

SLIDE 21

Feature Selection in Anchor-Free Detection

SLIDE 22

Feature Selection in Anchor-Free Detection

Question: what is a good representation of semantic information to guide feature selection?

Our assumption: semantic information is encoded in the network loss.

SLIDE 23

Feature Selection in Anchor-Free Detection

[Diagram: the instance is evaluated on every pyramid level, each level producing a focal loss and an IoU loss]
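Hard semantic-guided selection then reduces to an argmin over levels: during training, each instance is assigned to the pyramid level whose anchor-free head currently incurs the smallest combined loss for it (this mirrors the online selection in the FSAF module; the helper below is our own sketch):

```python
def select_level(per_level_losses):
    """Hard semantic-guided selection: return the index of the pyramid
    level with the smallest combined (focal + IoU) loss for one instance."""
    return min(range(len(per_level_losses)), key=per_level_losses.__getitem__)
```

For example, with per-level losses [2.1, 1.4, 1.9] the middle level wins, even if a size-based heuristic would have routed the instance elsewhere.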

SLIDE 24

Feature Selection in Anchor-Free Detection

Question: is it enough to select just one feature level for each instance?

SLIDE 25

Feature Selection in Anchor-Free Detection

Can we use similar features from multiple levels to further improve the performance?

SLIDE 26

Feature Selection in Anchor-Free Detection

Semantic guided feature selection: soft version

[Diagram: soft selection. Features for instance b are extracted from every pyramid level via RoIAlign, concatenated (C), and fed to a feature-selection net]
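In the soft version, the feature-selection net outputs one score per pyramid level, and those scores are turned into per-level weights that re-weight the instance's loss across levels. A hand-rolled sketch, assuming the selection net ends in a softmax over levels (the actual net and weighting scheme are defined in the paper pipeline):

```python
import math

def soft_selection_weights(level_scores):
    """Softmax over per-level scores from the feature-selection net:
    every level participates, weighted by how suitable it looks."""
    m = max(level_scores)                     # stabilize the exponentials
    exps = [math.exp(s - m) for s in level_scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The weights sum to 1 and the best-scoring level gets the largest weight, but, unlike hard selection, the other levels still contribute.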

SLIDE 27

Feature Selection in Anchor-Free Detection

[Diagram: feature pyramid with a feature-selection net routing each instance to the anchor-free heads]

SLIDE 28

Overview

  • Background
  • Motivation
  • Feature Selection in Anchor-Free Detection
  • General concept
  • Network architecture
  • Ground-truth and loss
  • Feature selection
  • Experiments
SLIDE 29

Experiments

  • Data: COCO dataset; train set: train2017, validation set: val2017, test set: test-dev
  • Ablation study: train on train2017, evaluate on val2017; ResNet-50 backbone
  • Runtime analysis: train on train2017, evaluate on val2017; single 1080Ti with CUDA 10 and cuDNN 7
  • Comparison with the state of the art: train on train2017 with 2x iterations, evaluate on test-dev

SLIDE 30

Experiments

Ablation study: the effect of feature selection

| Method                   | Selection       | AP   | AP50 | AP75 | APS  | APM  | APL  |
|--------------------------|-----------------|------|------|------|------|------|------|
| RetinaNet (anchor-based) | heuristic       | 35.7 | 54.7 | 38.5 | 19.5 | 39.9 | 47.5 |
| Ours (anchor-free)       | heuristic       | 35.9 | 54.8 | 38.1 | 20.2 | 39.7 | 46.5 |
| Ours (anchor-free)       | semantic (hard) | 37.0 | 55.8 | 39.5 | 20.5 | 40.1 | 48.5 |
| Ours (anchor-free)       | semantic (soft) | 38.0 | 56.9 | 40.5 | 21.0 | 41.1 | 50.2 |

SLIDE 31

Experiments

Ablation study: the effect of feature selection

| Method                   | Selection       | AP   | AP50 | AP75 | APS  | APM  | APL  |
|--------------------------|-----------------|------|------|------|------|------|------|
| RetinaNet (anchor-based) | heuristic       | 35.7 | 54.7 | 38.5 | 19.5 | 39.9 | 47.5 |
| Ours (anchor-free)       | heuristic       | 35.9 | 54.8 | 38.1 | 20.2 | 39.7 | 46.5 |
| Ours (anchor-free)       | semantic (hard) | 37.0 | 55.8 | 39.5 | 20.5 | 40.1 | 48.5 |
| Ours (anchor-free)       | semantic (soft) | 38.0 | 56.9 | 40.5 | 21.0 | 41.1 | 50.2 |

Anchor-free branches with heuristic feature selection achieve performance comparable to their anchor-based counterparts.

SLIDE 32

Experiments

Ablation study: the effect of feature selection

| Method                   | Selection       | AP   | AP50 | AP75 | APS  | APM  | APL  |
|--------------------------|-----------------|------|------|------|------|------|------|
| RetinaNet (anchor-based) | heuristic       | 35.7 | 54.7 | 38.5 | 19.5 | 39.9 | 47.5 |
| Ours (anchor-free)       | heuristic       | 35.9 | 54.8 | 38.1 | 20.2 | 39.7 | 46.5 |
| Ours (anchor-free)       | semantic (hard) | 37.0 | 55.8 | 39.5 | 20.5 | 40.1 | 48.5 |
| Ours (anchor-free)       | semantic (soft) | 38.0 | 56.9 | 40.5 | 21.0 | 41.1 | 50.2 |

The hard version of semantic-guided feature selection chooses more suitable feature levels than heuristic-guided selection.

SLIDE 33

Visualization of hard feature selection

SLIDE 34

Experiments

Ablation study: the effect of feature selection

| Method                   | Selection       | AP   | AP50 | AP75 | APS  | APM  | APL  |
|--------------------------|-----------------|------|------|------|------|------|------|
| RetinaNet (anchor-based) | heuristic       | 35.7 | 54.7 | 38.5 | 19.5 | 39.9 | 47.5 |
| Ours (anchor-free)       | heuristic       | 35.9 | 54.8 | 38.1 | 20.2 | 39.7 | 46.5 |
| Ours (anchor-free)       | semantic (hard) | 37.0 | 55.8 | 39.5 | 20.5 | 40.1 | 48.5 |
| Ours (anchor-free)       | semantic (soft) | 38.0 | 56.9 | 40.5 | 21.0 | 41.1 | 50.2 |

Hard selection does not fully exploit the network's potential; weighting similar features from multiple levels is helpful.

SLIDE 35

Visualization of soft feature selection

SLIDE 36

Visualization of soft feature selection

SLIDE 37

Experiments

Ablation study: the effect on different feature pyramids

| Feature pyramid | Selection | AP   | AP50 | AP75 | APS  | APM  | APL  |
|-----------------|-----------|------|------|------|------|------|------|
| FPN             | heuristic | 35.9 | 54.8 | 38.1 | 20.2 | 39.7 | 46.5 |
| FPN             | semantic  | 38.0 | 56.9 | 40.5 | 21.0 | 41.1 | 50.2 |
| BFP             | heuristic | 36.8 | 57.2 | 39.0 | 22.0 | 41.0 | 45.9 |
| BFP             | semantic  | 38.8 | 58.7 | 41.3 | 22.5 | 42.6 | 50.8 |

SLIDE 38

Experiments

Runtime analysis

| Backbone    | Method                   | AP   | AP50 | Runtime (FPS) |
|-------------|--------------------------|------|------|---------------|
| ResNet-50   | RetinaNet (anchor-based) | 35.7 | 54.7 | 11.6          |
| ResNet-50   | Ours (anchor-free)       | 38.8 | 58.7 | 14.9          |
| ResNet-101  | RetinaNet (anchor-based) | 37.7 | 57.2 | 8.0           |
| ResNet-101  | Ours (anchor-free)       | 41.0 | 60.7 | 11.2          |
| ResNeXt-101 | RetinaNet (anchor-based) | 39.8 | 59.5 | 4.5           |
| ResNeXt-101 | Ours (anchor-free)       | 43.1 | 63.7 | 6.1           |

SLIDE 39

Experiments

Comparison with the state of the art

SLIDE 40

References

  • Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015.
  • Liu, Wei, et al. "SSD: Single shot multibox detector." European Conference on Computer Vision. Springer, Cham, 2016.
  • Lin, Tsung-Yi, et al. "Feature pyramid networks for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
  • Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE International Conference on Computer Vision. 2017.
  • Pang, Jiangmiao, et al. "Libra R-CNN: Towards balanced learning for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
  • Wang, Jingdong, et al. "Deep high-resolution representation learning for visual recognition." arXiv preprint arXiv:1908.07919 (2019).
  • Ghiasi, Golnaz, et al. "NAS-FPN: Learning scalable feature pyramid architecture for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
  • Tan, Mingxing, et al. "EfficientDet: Scalable and efficient object detection." arXiv preprint arXiv:1911.09070 (2019).
  • Huang, Lichao, et al. "DenseBox: Unifying landmark localization with end to end object detection." arXiv preprint arXiv:1509.04874 (2015).
  • Zhu, Chenchen, Yihui He, and Marios Savvides. "Feature selective anchor-free module for single-shot object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
  • Zhu, Chenchen, et al. "Soft anchor-point object detection." arXiv preprint arXiv:1911.12448 (2019).

SLIDE 41

Conclusion

Flexible feature selection is one of the major differences between anchor-free and anchor-based methods. Semantic-guided feature selection is the key!

[Diagram: an instance routed by feature selection to anchor-free heads on the feature pyramid]

SLIDE 42

THANKS!