Recent Progress in Object Detection, Jiaqi Wang



slide-1
SLIDE 1

Recent Progress in Object Detection

Jiaqi Wang Multimedia Laboratory The Chinese University of Hong Kong

slide-2
SLIDE 2

Task definition

  • Image classification
  • Object detection
  • Semantic segmentation
  • Instance segmentation

slide-3
SLIDE 3

Task definition

slide-4
SLIDE 4

Progress

[Timeline figure, 2014 to 2018: R-CNN, Fast R-CNN, Faster R-CNN, R-FCN, FPN, Mask R-CNN, SSD, YOLO, RetinaNet, Cascade R-CNN, YOLO v2, Relation Network, CornerNet, MultiBox, SNIP, and other recent work]

slide-5
SLIDE 5

General pipeline

Two-stage detector

  • Feature generation: image -> backbone -> neck
  • Region proposal: dense cls. & reg. over sliding-window anchors -> proposals
  • Region recognition: RoI feature extractor -> RoI features -> task head

slide-6
SLIDE 6

General pipeline

Single-stage detector

  • Feature generation: image -> backbone -> neck
  • Dense cls. & reg. over sliding-window anchors
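Both pipelines place sliding-window anchors on the feature map. A minimal sketch of that step follows; the function name, the default scale of 8 and the three aspect ratios are illustrative assumptions, not values stated on the slide.

```python
import math

def generate_anchors(feat_h, feat_w, stride, scales=(8,), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centred on every feature-map cell.

    Assumed convention: ratio = height / width, and the anchor area
    is (scale * stride) ** 2 for every ratio.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of the cell, in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                area = (scale * stride) ** 2
                for ratio in ratios:
                    w = math.sqrt(area / ratio)
                    h = w * ratio
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = generate_anchors(feat_h=2, feat_w=3, stride=16)
print(len(anchors), anchors[1])
```

Every cell gets the same set of shapes regardless of image content, which is why the dense head has to classify and regress a very large number of mostly background anchors.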

slide-7
SLIDE 7

Faster R-CNN

  • Region Proposal Network (RPN)
  • Training pipeline
slide-8
SLIDE 8

Faster R-CNN

  • RPN
slide-9
SLIDE 9

Faster R-CNN

Training pipeline

  • Joint training: multi-task

backbone -> RPN -> proposals -> Fast R-CNN head (no gradient through the proposals; preferred)

slide-10
SLIDE 10

Feature Pyramid Network (FPN)

  • Top-down pathway
  • Multi-level prediction
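The top-down pathway can be sketched on single-channel maps as below. This is a simplified illustration: the 1x1 lateral and 3x3 output convolutions of FPN are omitted, feature maps are plain nested lists, and the function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def add_maps(a, b):
    """Element-wise sum of two maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def top_down(laterals):
    """laterals: maps ordered fine to coarse, each half the size of the previous.

    Returns one output map per level, so that predictions can be made
    on every level (multi-level prediction).
    """
    outputs = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        # Merge the coarser, semantically stronger map into the finer one.
        outputs.insert(0, add_maps(lateral, upsample2x(outputs[0])))
    return outputs

c3 = [[1.0] * 4 for _ in range(4)]
c4 = [[10.0] * 2 for _ in range(2)]
c5 = [[100.0]]
p3, p4, p5 = top_down([c3, c4, c5])
print(p5, p4, p3[0])
```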
slide-11
SLIDE 11

Feature Pyramid Network (FPN)

slide-12
SLIDE 12

Feature Pyramid Network (FPN)

slide-13
SLIDE 13

Mask R-CNN

  • RoIAlign
  • Mask branch
slide-14
SLIDE 14

Mask R-CNN

RoI Pooling RoI Align
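RoI Pooling rounds RoI and bin boundaries to integer feature-map cells, whereas RoIAlign keeps them as floats and reads values by bilinear interpolation. A minimal single-channel sketch of the RoIAlign side follows; the coordinate convention (cell values sit at integer positions), the 2x2 output and 2x2 sampling points per bin are assumptions for illustration.

```python
import math

def bilinear(fmap, y, x):
    """Bilinearly interpolate a 2-D list at a fractional (y, x) position."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(fmap, roi, out_size=2, samples=2):
    """roi = (x1, y1, x2, y2) in feature-map coordinates, never rounded."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            # Average regularly spaced sampling points inside the bin.
            for sy in range(samples):
                for sx in range(samples):
                    yy = y1 + (i + (sy + 0.5) / samples) * bin_h
                    xx = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(fmap, yy, xx)
            row.append(total / samples ** 2)
        out.append(row)
    return out

ramp = [[float(x) for x in range(4)] for _ in range(4)]
print(roi_align(ramp, (0.5, 0.5, 2.5, 2.5)))
```

On the ramp above, each output bin equals the x coordinate of its centre, so the half-cell offset of the RoI is preserved instead of being rounded away.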

slide-15
SLIDE 15

Mask R-CNN

Mask branch

slide-16
SLIDE 16

Cascade R-CNN

  • Cascade architecture
  • Training distribution
slide-17
SLIDE 17

Cascade R-CNN

Cascade architecture

Faster R-CNN Cascade R-CNN

slide-18
SLIDE 18

Cascade R-CNN

Training distribution

Regressor Detector

slide-19
SLIDE 19

Cascade R-CNN

Training distribution
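The effect of the IoU threshold on the training distribution can be sketched as follows. The thresholds 0.5, 0.6 and 0.7 are the per-stage values used in the Cascade R-CNN paper; the boxes and helper names are made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def positives(proposals, gt, threshold):
    """Proposals that count as positive samples at a given IoU threshold."""
    return [p for p in proposals if iou(p, gt) >= threshold]

gt = (0, 0, 100, 100)
proposals = [(0, 0, 100, 55), (0, 0, 100, 65), (0, 0, 100, 75), (0, 0, 100, 95)]
for threshold in (0.5, 0.6, 0.7):
    print(threshold, len(positives(proposals, gt, threshold)))
```

With a fixed set of proposals, raising the threshold leaves fewer positives. In the cascade, each stage trains on boxes already refined by the previous regressor, so later stages can use a stricter threshold without running out of positive samples.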

slide-20
SLIDE 20

RetinaNet

  • FPN
  • Focal Loss
slide-21
SLIDE 21

RetinaNet

FPN

heavier head than SSD / Faster R-CNN

slide-22
SLIDE 22

RetinaNet

Focal Loss

  • Problem: class imbalance
  • inefficient training
  • loss is overwhelmed by negative samples

Model                 Solution
Two-stage detectors   1) proposal, 2) mini-batch sampling
SSD                   Hard negative mining
RetinaNet             Focal loss

slide-23
SLIDE 23

RetinaNet

Focal Loss

  • Solution: high confidence -> small loss
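The "high confidence -> small loss" idea is the focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). A minimal binary sketch is given below; gamma = 2 and alpha = 0.25 are the defaults reported in the RetinaNet paper, and the function names are illustrative.

```python
import math

def cross_entropy(p, target):
    """Binary cross entropy for one prediction; p is the foreground probability."""
    p_t = p if target == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, target, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross entropy scaled down for well-classified samples."""
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy negative (background predicted with 99% confidence)
# versus a hard negative (background predicted with only 30% confidence).
for p in (0.01, 0.7):
    print(p, cross_entropy(p, 0), focal_loss(p, 0))
```

The easy negative is down-weighted by several orders of magnitude while the hard one keeps a comparable loss, so the many easy background anchors no longer dominate the total.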
slide-24
SLIDE 24

COCO Challenge 2018

Comparison of our approach with the 2017 winning entries on COCO test-dev.

Entry                        Mask AP   Box AP
2017 winner (single model)   0.44      0.505
2017 winner                  0.467     0.526
Ours, single model           0.474     0.541
Ours, final results          0.49      0.56

slide-25
SLIDE 25

COCO Challenge 2018

  • 1. We developed a hybrid task cascade framework for detection and segmentation.

Component: Detection & Segmentation

slide-26
SLIDE 26

COCO Challenge 2018

  • 1. We developed a hybrid task cascade framework for detection and segmentation.
  • 2. We proposed a feature-guided anchoring scheme to improve the average recall (AR) of RPN by 10 points.

Components: Proposal, Detection & Segmentation

slide-27
SLIDE 27

COCO Challenge 2018

  • 1. We developed a hybrid task cascade framework for detection and segmentation.
  • 2. We proposed a feature-guided anchoring scheme to improve the average recall (AR) of RPN by 10 points.
  • 3. We designed a new backbone, FishNet.

Components: Backbone, Proposal, Detection & Segmentation

slide-28
SLIDE 28

COCO Challenge 2018

Hybrid Task Cascade (HTC)

  • Cascade Mask R-CNN (Cascade R-CNN + Mask R-CNN)

[Diagram: backbone features F, RPN, three RoI pooling operations (pool), box heads B1 to B3, mask heads M1 to M3]

Problem: The two branches at each stage are executed in parallel, without interaction.

slide-29
SLIDE 29

COCO Challenge 2018

Hybrid Task Cascade (HTC)

  • Interleaved execution

[Diagram: backbone features F, RPN, four RoI pooling operations (pool), box heads B1 to B3, mask heads M1 to M3]

Problem: No direct information flow between mask branches at different stages.
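The difference between the parallel Cascade Mask R-CNN and interleaved execution is only the order in which the heads run. A schematic sketch, with all heads and the pooling operation passed in as hypothetical callables:

```python
def cascade_parallel(features, proposals, box_heads, mask_heads, pool):
    """Cascade Mask R-CNN: both branches of a stage see the same RoIs."""
    boxes, masks = proposals, []
    for box_head, mask_head in zip(box_heads, mask_heads):
        roi_feats = pool(features, boxes)
        masks.append(mask_head(roi_feats))
        boxes = box_head(roi_feats)
    return boxes, masks

def cascade_interleaved(features, proposals, box_heads, mask_heads, pool):
    """Interleaved execution: the mask branch pools with the refined boxes."""
    boxes, masks = proposals, []
    for box_head, mask_head in zip(box_heads, mask_heads):
        boxes = box_head(pool(features, boxes))
        masks.append(mask_head(pool(features, boxes)))
    return boxes, masks

# Toy stand-ins: a "box" is just its refinement level, and each mask head
# reports the refinement level of the boxes it was given.
toy_pool = lambda features, level: level
toy_box_heads = [lambda level: level + 1] * 3
toy_mask_heads = [lambda level: level] * 3

print(cascade_parallel(None, 0, toy_box_heads, toy_mask_heads, toy_pool))
print(cascade_interleaved(None, 0, toy_box_heads, toy_mask_heads, toy_pool))
```

In the interleaved version every mask head works on boxes that are one refinement step ahead, at the cost of one extra pooling operation.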

slide-30
SLIDE 30

COCO Challenge 2018

Hybrid Task Cascade (HTC)

  • Mask Information Flow

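[Diagram: backbone features F, RPN, four RoI pooling operations (pool), box heads B1 to B3, mask heads M1 to M3 with connections between consecutive mask heads]

Problem: Spatial context is not much explored.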

slide-31
SLIDE 31

COCO Challenge 2018

Hybrid Task Cascade (HTC)

  • Spatial context

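[Diagram: backbone features F, RPN, four RoI pooling operations (pool), box heads B1 to B3, mask heads M1 to M3, and an additional semantic segmentation branch S]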

slide-32
SLIDE 32

COCO Challenge 2018

Hybrid Task Cascade (HTC)
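Putting the three steps together, stage t of the full framework can be summarized roughly as below. The symbols follow the diagrams (F for backbone features, pool for RoI pooling, B_t and M_t for the box and mask heads, S for the semantic branch); r_0 denotes the RPN proposals, m_{t-1}^{-} the intermediate mask features of the previous stage, and "fuse" is a placeholder name for the feature fusion. This is a summary under those notational assumptions, not a verbatim copy of the paper's equations.

```latex
\begin{aligned}
x_t^{\mathrm{box}}  &= \mathrm{pool}(F, r_{t-1}) + \mathrm{pool}(S(F), r_{t-1}), &
r_t &= B_t\big(x_t^{\mathrm{box}}\big), \\
x_t^{\mathrm{mask}} &= \mathrm{pool}(F, r_t) + \mathrm{pool}(S(F), r_t), &
m_t &= M_t\big(\mathrm{fuse}(x_t^{\mathrm{mask}},\, m_{t-1}^{-})\big).
\end{aligned}
```

The mask input uses r_t rather than r_{t-1} (interleaved execution), the fuse term carries mask information flow between stages, and the S(F) terms add spatial context.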

slide-33
SLIDE 33

COCO Challenge 2018

Guided anchoring

  • Feature generation: image -> backbone -> neck
  • Region proposal: dense cls. & reg. over anchors -> proposals; the sliding-window anchors become learnable
  • Region recognition: RoI feature extractor -> RoI features -> task head

slide-34
SLIDE 34

COCO Challenge 2018

Guided anchoring

  • Our goal: anchors that are sparse and of arbitrary shape
  • General rules for anchor design: alignment and consistency
slide-35
SLIDE 35

COCO Challenge 2018

Guided anchoring

Location prediction

slide-36
SLIDE 36

COCO Challenge 2018

Guided anchoring

Shape prediction
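Location and shape prediction together replace the fixed anchor set: locations are kept where the predicted probability is high, and one shape is decoded per kept location. A minimal sketch follows; the decoding w = sigma * s * exp(dw), h = sigma * s * exp(dh) with sigma = 8 follows the Guided Anchoring paper as recalled here, while the threshold value, the cell-centre convention and the function name are assumptions.

```python
import math

def guided_anchors(loc_prob, shape_pred, stride, loc_thr=0.5, sigma=8.0):
    """Build one (x1, y1, x2, y2) anchor per location that passes the threshold.

    loc_prob:   2-D list of objectness probabilities per feature-map cell.
    shape_pred: 2-D list of (dw, dh) pairs per feature-map cell.
    """
    anchors = []
    for row, prob_row in enumerate(loc_prob):
        for col, prob in enumerate(prob_row):
            if prob < loc_thr:
                continue  # sparse: most locations produce no anchor
            dw, dh = shape_pred[row][col]
            # Arbitrary shape: width and height are decoded independently.
            w = sigma * stride * math.exp(dw)
            h = sigma * stride * math.exp(dh)
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

loc_prob = [[0.1, 0.9], [0.2, 0.05]]
shape_pred = [[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
print(guided_anchors(loc_prob, shape_pred, stride=16))
```

Predicting dw and dh in log space keeps the output range small even though object sizes vary over orders of magnitude.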

slide-37
SLIDE 37

COCO Challenge 2018

Guided anchoring

slide-38
SLIDE 38

COCO Challenge 2018

Guided anchoring

Feature adaption

slide-39
SLIDE 39

COCO Challenge 2018

Guided anchoring

[Figure: AR1000 versus runtime on TITAN X (fps) for RPN (ResNet-50), RPN (ResNet-152), RPN (ResNeXt-101), RPN (SENet-154), GA-RPN (ResNet-50) and GA-RPN (SENet-154)]

slide-40
SLIDE 40

COCO Challenge 2018

Guided anchoring

slide-41
SLIDE 41

COCO Challenge 2018

Implementation details

1. Training scales

  • short edge: randomly sampled from 400 ~ 1400
  • long edge: 1600

2. Test scales

  • (600, 900), (800, 1200), (1000, 1500), (1200, 1800), (1400, 2100)

3. Pipeline

  • Joint training
  • Finetune with GA-RPN proposals
  • Test with GA-RPN proposals

4. Resources

  • 32 Tesla V100 GPUs (16GB) for 3 days
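The training scales listed on this slide can be sketched as a per-image sampling step followed by an aspect-preserving resize. The rescale rule used here (the largest factor that keeps the long side within the long edge and the short side within the short edge) is an assumption about how the two limits are applied; the function names are illustrative.

```python
import random

def sample_train_scale(rng, short_range=(400, 1400), long_edge=1600):
    """Draw a (long_edge, short_edge) pair for one training image."""
    return long_edge, rng.randint(*short_range)

def rescale_size(width, height, long_edge, short_edge):
    """Largest aspect-preserving size that fits both edge limits."""
    factor = min(long_edge / max(width, height), short_edge / min(width, height))
    return round(width * factor), round(height * factor), factor

rng = random.Random(0)
long_edge, short_edge = sample_train_scale(rng)
print(short_edge, rescale_size(640, 480, long_edge, short_edge))
```

At test time the same resize would be run once per listed pair, for example short edge 600 with long edge 900, and the predictions merged.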

slide-42
SLIDE 42

COCO Challenge 2018

Implementation details

Backbones

  • SENet-154
  • ResNeXt-101 (64x4d)
  • ResNeXt-101 (32x8d)
  • DPN-107
  • FishNet

comparable ~0.8 points higher

slide-43
SLIDE 43

COCO Challenge 2018

Implementation details

Other tricks

  • w/ SoftNMS
  • w/o OHEM
  • w/o classwise balance sampling
  • w/o voting for bbox or mask
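Soft-NMS, the one trick kept here, decays the scores of overlapping boxes instead of discarding them. A minimal sketch of the Gaussian variant follows; the slide does not say which variant or which parameters were used, so sigma and the score threshold are assumptions.

```python
import math

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def soft_nms(boxes, scores, sigma=0.5, score_thr=0.001):
    """Gaussian Soft-NMS; returns (box, score) pairs in selection order."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        box, score = candidates.pop(best)
        kept.append((box, score))
        decayed = []
        for other, s in candidates:
            # The more a box overlaps the selected one, the more its score drops.
            s *= math.exp(-iou(box, other) ** 2 / sigma)
            if s >= score_thr:
                decayed.append((other, s))
        candidates = decayed
    return kept

boxes = [(0, 0, 10, 10), (0, 0, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
for box, score in soft_nms(boxes, scores):
    print(box, round(score, 3))
```

The duplicate box survives with a much lower score rather than being removed outright, which helps when two real objects overlap heavily.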
slide-44
SLIDE 44

COCO Challenge 2018

  • With bells and whistles (mask AP on test-dev)

Step                                Mask AP   Gain
baseline (R-50 Cascade with mask)   36.9
HTC                                 38.4      +1.5
deformable conv                     39.5      +1.1
synchronized BN                     40.7      +1.2
multi-scale training                42.5      +1.8
better backbone                     44.3      +1.8
GA-RPN finetune                     45.3      +1.0
multi-scale & flip testing          47.4      +2.1
model ensemble                      49.0      +1.6

slide-45
SLIDE 45

mmdetection

GitHub: mmdetection

  • Comprehensive: RPN, Fast/Faster R-CNN, Mask R-CNN, FPN, Cascade R-CNN, RetinaNet, and more
  • High performance: better performance, optimized memory consumption, faster speed
  • Handy to develop: written with PyTorch, modular design

slide-46
SLIDE 46

  • Hybrid Task Cascade for Instance Segmentation (accepted to CVPR 2019), https://arxiv.org/abs/1901.07518
  • Region Proposal by Guided Anchoring (accepted to CVPR 2019), https://arxiv.org/abs/1901.03278
  • FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction (accepted to NIPS 2018), https://arxiv.org/abs/1901.03495