Weakly and deeply supervised visual learning



slide-1
SLIDE 1

Weakly and deeply supervised visual learning

Xinggang Wang (王兴刚), Huazhong University of Science and Technology, www.xinggangw.info

CSIG Young Scientists Forum

slide-2
SLIDE 2

Annotation time of manual supervision

Annotation time: 1, 2.4, 10, 78 seconds per instance

Bearman et al., What’s the Point: Semantic Segmentation with Point Supervision, ECCV 16

Slide credit: Hakan Bilen

slide-3
SLIDE 3
Image labels

  • Supervision: image (category) labels, e.g. "Person, Horse"
  • Target: object detection, semantic segmentation, etc.

[Verbeek CVPR 07, Pandey ICCV 11, Cinbis CVPR 14, Wang ECCV 14, Papandreou ICCV 15, Bilen CVPR 15, Tang CVPR 17, Wei CVPR 17, Singh ICCV 17, Huang CVPR 18, etc.]

slide-4
SLIDE 4

Video labels

  • Supervision: video (category) labels
  • Target: object detection, semantic segmentation, etc.

[Tokmakov ECCV 16] [Papazoglou, ICCV 13]

slide-5
SLIDE 5
Clicks in object

  • Supervision: one point per instance/category
  • Target: object detection, semantic segmentation, etc.

[Bearman ECCV 16]

slide-6
SLIDE 6

Extreme points

  • Supervision: object extreme points
  • Target: instance segmentation

DEXTR [Maninis CVPR 18]

[Papadopoulos ICCV 17, Maninis CVPR 18]
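Note: the four extreme points of an object (left-most, top-most, right-most, bottom-most) also determine its tight bounding box, which is one reason they are a cheap but informative form of supervision. A minimal sketch of that relationship follows; the function name and the click coordinates are hypothetical and not taken from the cited papers.

```python
def bbox_from_extreme_points(points):
    """Return (x_min, y_min, x_max, y_max) of the box enclosing the (x, y) points."""
    if not points:
        raise ValueError("need at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical annotator clicks: left-most, top-most, right-most, bottom-most.
extreme_points = [(12, 40), (55, 8), (97, 45), (60, 90)]
box = bbox_from_extreme_points(extreme_points)
```

The resulting box can then be used to crop the image before a segmentation model is applied.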

slide-7
SLIDE 7
Scribbles in object

  • Supervision: scribbles/lines per instance
  • Target: instance segmentation

[Bearman ECCV 16], MILCut [Wu CVPR 14]

slide-8
SLIDE 8
Object bbox

  • Supervision: object bounding boxes
  • Target: instance segmentation

BoxSup [Dai ICCV 15]

[Rother SIGGRAPH 04, Dai ICCV 15, Khoreva CVPR 17]

slide-9
SLIDE 9

Webly supervision

  • Supervision: keywords & search engines
  • Target: semantic segmentation

[Hou et al., arXiv 18]
slide-10
SLIDE 10

Hashtag

  • Supervision: 3.5 billion images with Instagram tags
  • Target: a good pre-trained model

[Mahajan, ECCV 18]
slide-11
SLIDE 11
Mixing full & weak supervision

  • Supervision: COCO (has bbox) + ImageNet (has image label)
  • Target: object detection for 9000 classes

YOLO9000, CVPR 2017 Best Paper Honorable Mention [Redmon CVPR 17]

Figure legend: blue = COCO class, dark = ImageNet class

slide-12
SLIDE 12
Full + weak supervision + domain adaptation

  • Supervision: bbox in source domain + image label in target domain
  • Target: bbox in target domain

[Inoue, CVPR 18]

slide-13
SLIDE 13
Count of objects

  • Supervision: counts of objects per class
  • Target: object detection

C-WSL [Wang, ECCV 18]

slide-14
SLIDE 14
Only number of classes

  • Supervision: only the number of classes
  • Target: object bbox

bMCL [Zhu, CVPR 12, PAMI 15]

slide-15
SLIDE 15

Polygon-RNN

  • Supervision: bbox + interactive key points
  • Target: object polygon

Polygon-RNN, Best Paper Honorable Mention [Castrejon, CVPR 17]

Polygon-RNN cuts down the number of required annotation clicks by a factor of 4.74.

slide-16
SLIDE 16

From the perspective of machine learning

Figure panels: full supervision, incomplete supervision, inaccurate supervision, inexact supervision (example image labels: "Person, Dog")

WSL [Zhou, 2018, National Science Review]:

  • Incomplete supervision
  • Inaccurate supervision
  • Inexact supervision
slide-17
SLIDE 17
Next

  • Weakly supervised object detection
  • Weakly supervised semantic segmentation

Example image labels: Person, Horse

slide-18
SLIDE 18

Standard MIL pipeline

1. Window space (usually, using object proposals)
2. Initialization
3. Re-localization & re-training

[Chum CVPR 07, Deselaers ECCV 10, Siva ICCV 11, Wang ICCV 15, Bilen CVPR 15]

Slide credit: Vitto Ferrari
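Note: the three steps of the standard MIL pipeline can be illustrated with a toy alternating loop. This is only a sketch under loud assumptions: windows are tiny hand-made feature vectors, the "classifier" is a nearest-prototype rule standing in for the SVMs or CNNs used in the cited papers, and all names (`mil_localize`, `positive_bags`, `negative_windows`) are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def score(window, pos_mean, neg_mean):
    """Higher when the window is closer to the positive prototype than the negative one."""
    d_pos = sum((a - b) ** 2 for a, b in zip(window, pos_mean))
    d_neg = sum((a - b) ** 2 for a, b in zip(window, neg_mean))
    return d_neg - d_pos


def mil_localize(positive_bags, negative_windows, n_rounds=5):
    """Return, for each positive image (bag), the index of the selected window."""
    # Step 2, initialization: start from the first window of every positive image.
    selected = [0] * len(positive_bags)
    neg_mean = mean_vector(negative_windows)
    for _ in range(n_rounds):
        # Step 3a, re-training: refit the toy classifier on the current selections.
        pos_mean = mean_vector([bag[i] for bag, i in zip(positive_bags, selected)])
        # Step 3b, re-localization: pick the best-scoring window in each positive image.
        new_selected = [
            max(range(len(bag)), key=lambda j: score(bag[j], pos_mean, neg_mean))
            for bag in positive_bags
        ]
        if new_selected == selected:
            break  # selections are stable
        selected = new_selected
    return selected


# Step 1, window space: each positive image is a bag of candidate windows.
positive_bags = [
    [[0.1, 0.0], [0.9, 1.0], [0.0, 0.2]],
    [[1.0, 0.9], [0.1, 0.1], [0.2, 0.0]],
    [[0.0, 0.1], [0.2, 0.1], [1.1, 1.0]],
]
negative_windows = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.0]]
selected = mil_localize(positive_bags, negative_windows)
```

In this toy data the object windows lie near (1, 1) and background windows near (0, 0), so the loop settles on the object window in each image even though it starts from an arbitrary initialization.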

slide-19
SLIDE 19

Weakly-supervised deep detection network (WSDDN)

  • (+) End-to-end region CNN for WSOD
  • (-) Normalization over classes hurts performance

[Bilen CVPR 16]

Slide credit: Hakan Bilen
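Note: WSDDN is commonly described as a two-stream head: one stream normalizes proposal scores over classes, the other over proposals, and their element-wise product is summed over proposals to obtain image-level class scores that can be trained with image labels. The sketch below shows only this scoring step with hand-written logits; the function names and the numbers are illustrative assumptions, not the authors' code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def wsddn_scores(cls_logits, det_logits):
    """cls_logits, det_logits: R x C nested lists (R proposals, C classes).

    Returns (region_scores as R x C, image_scores as a list of length C).
    """
    n_regions = len(cls_logits)
    n_classes = len(cls_logits[0])
    # Classification stream: softmax over classes, separately for each proposal.
    cls = [softmax(row) for row in cls_logits]
    # Detection stream: softmax over proposals, separately for each class.
    det = [softmax([det_logits[r][c] for r in range(n_regions)])
           for c in range(n_classes)]
    # Element-wise product of the two streams gives per-proposal class scores.
    region_scores = [[cls[r][c] * det[c][r] for c in range(n_classes)]
                     for r in range(n_regions)]
    # Summing over proposals gives image-level scores in (0, 1).
    image_scores = [sum(region_scores[r][c] for r in range(n_regions))
                    for c in range(n_classes)]
    return region_scores, image_scores


# Three hypothetical proposals, two classes.
cls_logits = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
det_logits = [[3.0, 0.0], [0.0, 3.0], [0.0, 0.0]]
region_scores, image_scores = wsddn_scores(cls_logits, det_logits)
```

The softmax over classes in the first stream is the normalization that the slide lists as a drawback.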

slide-20
SLIDE 20

Online instance classifier refinement (OICR) network

  • Additional blocks (instance classifiers) for score propagation
  • In-network supervision
  • (+) Positive proposals in one image do not share scores
  • (+) Performance improves significantly
  • (-) The instance-level in-network supervision may not be correct

[Tang CVPR 17]
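Note: the in-network supervision in OICR is commonly described as follows: for every class present in the image, the top-scoring proposal from the previous stage, together with the proposals that overlap it strongly, is labeled with that class, and all other proposals become background. The sketch below illustrates that label assignment only. The 0.5 IoU threshold, the function names, and the boxes are assumptions for illustration, and the loss weighting used in the paper is omitted.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def refine_labels(boxes, scores, image_classes, iou_thresh=0.5):
    """Assign each proposal a class index, or -1 for background.

    scores[r][c] is the previous stage's score of proposal r for class c.
    """
    labels = [-1] * len(boxes)
    best_overlap = [0.0] * len(boxes)
    for c in image_classes:
        # Top-scoring proposal for this image-level class.
        top = max(range(len(boxes)), key=lambda r: scores[r][c])
        for r, box in enumerate(boxes):
            overlap = iou(box, boxes[top])
            # Spatially adjacent proposals inherit the label of the top proposal.
            if overlap > iou_thresh and overlap > best_overlap[r]:
                labels[r] = c
                best_overlap[r] = overlap
    return labels


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 4, 4)]
scores = [[0.9, 0.0], [0.2, 0.1], [0.1, 0.8], [0.3, 0.0]]
labels = refine_labels(boxes, scores, image_classes=[0, 1])
```

If the top-scoring proposal covers only an object part, every proposal near it inherits the wrong label, which is the weakness noted in the last bullet of the slide.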

slide-21
SLIDE 21

Proposal cluster learning

Figure panels: WSDDN, OICR, PCL

  • (+) In-network supervision for proposal clusters is more robust
  • (+) MIL in MIL network (bag-in-bag MIL)
  • (-) It still relies on hand-crafted object proposals

[Tang, arXiv:1807.03342v1, under revision at TPAMI]

slide-22
SLIDE 22

Weakly supervised region proposal network

  • (+) Generating object proposals from neural activations
  • (+) Confirming that a CNN contains rich localization information even under weak supervision
  • (+) The first weakly supervised region proposal network (wsRPN)

[Tang ECCV 18]

slide-23
SLIDE 23

Generative adversarial learning

  • (+) Training SSD by WSOD using a GAN loss
  • (+) Fast inference speed using SSD
  • (+) Accurate WSOD by adversarial learning

[Shen CVPR 18]

slide-24
SLIDE 24

Performance

WSOD performance (mAP on PASCAL VOC 2007 test)

Method                     mAP
WSDDN (CVPR16)             39.3
WCCN (CVPR17)              42.8
HCP+ (CVPR17)              43.7
OICR (CVPR17)              47
GAL-fWSD512 (CVPR18)       47.5
WSRPN (ECCV18)             50.4
Faster R-CNN (PAMI17)      69.9

slide-25
SLIDE 25

Class activation maps

  • (+) Finding discriminative regions by global average pooling in a CNN trained using image labels
  • (+) A very insightful work for understanding CNNs

[Zhou CVPR 16]
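Note: with global average pooling followed by a linear classifier, the class activation map for a class is the sum of the last convolutional feature maps weighted by that class's classifier weights. A minimal sketch with plain nested lists follows; the tiny 2 x 2 feature maps and the weights are made-up values, and the function name is hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) using one class's K weights."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


# Two hypothetical 2 x 2 feature maps and the classifier weights of one class.
feature_maps = [
    [[1, 0], [0, 0]],
    [[0, 0], [0, 2]],
]
class_weights = [1.0, -0.5]
cam = class_activation_map(feature_maps, class_weights)
```

High values in `cam` mark the locations that contribute most to the class score; thresholding the map gives a coarse localization.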

slide-26
SLIDE 26

Adversarial erasing network

  • (+) Adversarial erasing finds dense and complete object regions
  • (+) Very impressive WSSS results

[Wei CVPR 17]

slide-27
SLIDE 27

Seed, Expand and Constrain (SEC)

  • (+) Seed with weak localization cues
  • (+) Expand with image labels
  • (+) Constrain to object boundaries using a CRF

[Kolesnikov ECCV 16]
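Note: in the SEC paper the "expand" principle is realized with global weighted rank pooling, which aggregates per-pixel class scores into one image-level score by sorting them and applying geometrically decaying weights. The sketch below assumes that formulation; the function name and the decay values are illustrative.

```python
def global_weighted_rank_pooling(scores, decay):
    """Pool per-pixel scores of one class into a single image-level score.

    decay = 0 behaves like max pooling, decay = 1 like average pooling.
    """
    ranked = sorted(scores, reverse=True)
    weights = [decay ** j for j in range(len(ranked))]
    return sum(w * s for w, s in zip(weights, ranked)) / sum(weights)


pixel_scores = [0.1, 0.9, 0.5]
as_max = global_weighted_rank_pooling(pixel_scores, decay=0.0)
as_mean = global_weighted_rank_pooling(pixel_scores, decay=1.0)
in_between = global_weighted_rank_pooling(pixel_scores, decay=0.5)
```

Intermediate decay values encourage a class to occupy more than a single pixel without requiring it to fill the whole image.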

slide-28
SLIDE 28

Deep seeded region growing (DSRG) network

Figure components: classification network, segmentation network, seeds, seeded region growing, downscale, CRF, seeding loss, boundary loss

  • (+) Region growing for complete and dense object regions
  • (+) A segmentation network generates new pixel labels by itself

[Huang CVPR 18]
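Note: seeded region growing starts from a few confidently labeled pixels and repeatedly adds neighboring pixels that look similar enough. The sketch below uses the segmentation network's class probability as the similarity test. The single threshold, the 4-connectivity, and all names are simplifying assumptions and may differ from the settings in the DSRG paper.

```python
from collections import deque


def grow_regions(seeds, probs, threshold=0.85):
    """Grow seed labels into neighboring pixels.

    seeds: H x W grid of class indices, with -1 for unlabeled pixels.
    probs: probs[c][y][x] is the predicted probability of class c at (y, x).
    """
    height, width = len(seeds), len(seeds[0])
    labels = [row[:] for row in seeds]  # do not modify the input
    queue = deque((y, x) for y in range(height) for x in range(width)
                  if labels[y][x] >= 0)
    while queue:
        y, x = queue.popleft()
        c = labels[y][x]
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            inside = 0 <= ny < height and 0 <= nx < width
            # An unlabeled neighbor joins the region if the network is confident.
            if inside and labels[ny][nx] < 0 and probs[c][ny][nx] > threshold:
                labels[ny][nx] = c
                queue.append((ny, nx))
    return labels


# One-row toy image: class 0 seed on the left, class 1 seed on the right.
seeds = [[0, -1, -1, -1, 1]]
probs = [
    [[0.9, 0.9, 0.6, 0.1, 0.1]],  # class 0
    [[0.1, 0.1, 0.4, 0.9, 0.9]],  # class 1
]
grown = grow_regions(seeds, probs)
```

The middle pixel stays unlabeled because neither class is confident there, so it would simply be ignored by a seeding loss.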

slide-29
SLIDE 29

Iteratively Mining Common Object Features

  • (+) Mining common features between the region (super-pixel)-level classification network and the pixel-level segmentation network

[Wang CVPR 18]

slide-30
SLIDE 30

Performance

WSSS performance (mIoU on PASCAL VOC 2012 test)

Method                  mIoU
MIL-FCN (CVPRW14)       24.9
CCNN (CVPR15)           35.6
EM-Adapt (ICCV15)       39.6
DCSM (ECCV16)           45.1
STC (PAMI16)            51.2
SEC (ECCV16)            51.7
AF-SS (ECCV16)          52.7
AE-PSL (CVPR17)         55.7
DCSP (BMVC17)           59.2
MCOF (CVPR18)           60.3
DSRG (CVPR18)           63.2
FCN (ResNet101)         70
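Note: mIoU averages the per-class intersection over union, TP / (TP + FP + FN), across classes. A minimal sketch computing it from a pixel-level confusion matrix follows; the matrix values are made up for illustration and the function name is hypothetical.

```python
def mean_iou(confusion):
    """confusion[t][p] counts pixels of true class t predicted as class p."""
    n_classes = len(confusion)
    ious = []
    for c in range(n_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n_classes)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes that never occur and are never predicted
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Hypothetical two-class confusion matrix.
confusion = [
    [3, 1],
    [1, 5],
]
miou = mean_iou(confusion)
```

Here the per-class IoUs are 3/5 and 5/7, and `miou` is their average.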

slide-31
SLIDE 31
Takeaways

  • There are many different kinds of weak supervision for visual recognition.
  • Deep learning enables effective weakly supervised visual learning (WSVL); however, performance is still far from that of fully supervised models.
  • WSVL significantly reduces human labeling effort.
  • WSVL is a rising research area; there are lots of interesting ideas to explore.

slide-32
SLIDE 32
Resources

  • CVPR tutorial: Weakly supervised learning for computer vision, by Hakan Bilen, Rodrigo Benenson, Jasper Uijlings, https://hbilen.github.io/wsl-cvpr18.github.io
  • Source code
      • WSDDN: https://github.com/hbilen/WSDDN
      • CAM: http://cnnlocalization.csail.mit.edu
      • OICR/PCL: https://github.com/ppengtang/oicr/tree/pcl
      • SEC: https://github.com/kolesman/SEC
      • DSRG: https://github.com/speedinghzl/DSRG

slide-33
SLIDE 33

Thanks a lot for your attention!

Questions?

www.xinggangw.info