Weakly and Deeply Supervised Visual Learning


  1. Weakly and Deeply Supervised Visual Learning
      Xinggang Wang, Huazhong University of Science and Technology
      CSIG Young Scientists Forum
      www.xinggangw.info

  2. Annotation time of manual supervision
      • Annotation time: roughly 1, 2.4, 10, and 78 seconds per instance for increasingly detailed forms of supervision
      Bearman et al., What's the Point: Semantic Segmentation with Point Supervision, ECCV 16
      Slide credit: Hakan Bilen

  3. Image labels
      Example labels: Person, Horse
      • Supervision: image (category) labels
      • Target: object detection, semantic segmentation, etc.
      [Verbeek CVPR 07, Pandey ICCV 11, Cinbis CVPR 14, Wang ECCV 14, Papandreou ICCV 15, Bilen CVPR 15, Tang CVPR 17, Wei CVPR 17, Singh ICCV 17, Huang CVPR 18, etc.]

  4. Video labels
      • Supervision: video (category) labels
      • Target: object detection, semantic segmentation, etc.
      [Papazoglou ICCV 13, Tokmakov ECCV 16]

  5. Clicks in objects [Bearman ECCV 16]
      • Supervision: one point per instance/category
      • Target: object detection, semantic segmentation, etc.

  6. Extreme points: DEXTR [Maninis CVPR 18]
      • Supervision: object extreme points
      • Target: instance segmentation
      [Papadopoulos ICCV 17, Maninis CVPR 18]

  7. Scribbles in objects: MILCut [Wu CVPR 14], [Bearman ECCV 16]
      • Supervision: scribbles/lines per instance
      • Target: instance segmentation

  8. Object bounding boxes: BoxSup [Dai ICCV 15]
      • Supervision: object bounding boxes
      • Target: instance segmentation
      [Rother SIGGRAPH 04, Dai ICCV 15, Khoreva CVPR 17]

  9. Webly supervision [Hou et al., arXiv 18]
      • Supervision: keywords & search engines
      • Target: semantic segmentation

  10. Hashtags [Mahajan ECCV 18]
      • Supervision: 3.5 billion images with Instagram hashtags
      • Target: a good pre-trained model

  11. Mixing full & weak supervision: YOLO9000 [Redmon CVPR 17]
      • Supervision: COCO (with bounding boxes) + ImageNet (with image labels)
      • Target: object detection for 9000 classes
      • Figure legend: blue = COCO classes, dark = ImageNet classes
      • CVPR 2017 Best Paper Honorable Mention

  12. Full + weak supervision + domain adaptation [Inoue CVPR 18]
      • Supervision: bounding boxes in the source domain + image labels in the target domain
      • Target: bounding boxes in the target domain

  13. Counts of objects: C-WSL [Wang ECCV 18]
      • Supervision: per-class object counts
      • Target: object detection

  14. Only the number of classes: bMCL [Zhu CVPR 12, PAMI 15]
      • Supervision: only the number of classes
      • Target: object bounding boxes

  15. Polygon-RNN [Castrejon CVPR 17], CVPR 2017 Best Paper Honorable Mention
      • Supervision: bounding box + interactive key points
      • Target: object polygon
      • Polygon-RNN cuts down the number of required annotation clicks by a factor of 4.74

  16. From the perspective of machine learning
      WSL [Zhou 2018, National Science Review] distinguishes:
      • Incomplete supervision
      • Inaccurate supervision
      • Inexact supervision
      Figure: full supervision (e.g. "Person, Dog" with exact annotations) contrasted with incomplete, inaccurate, and inexact supervision

  17. Next
      • Weakly supervised object detection
      • Weakly supervised semantic segmentation

  18. Standard MIL pipeline
      1. Window space (usually built from object proposals)
      2. Initialization
      3. Re-localization & re-training
      [Chum CVPR 07, Deselaers ECCV 10, Siva ICCV 11, Wang ICCV 15, Bilen CVPR 15]
      Slide credit: Vittorio Ferrari
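
Steps 2 and 3 form an alternating optimization that is easy to express in code. Below is a minimal sketch, not code from any of the cited papers: per-proposal features are assumed to be precomputed (here they are random placeholders), and a scikit-learn linear SVM stands in for whatever detector is being re-trained.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Toy data: 20 positive and 20 negative images, 50 proposals each, 128-d features.
pos_bags = [rng.normal(size=(50, 128)) for _ in range(20)]
neg_bags = [rng.normal(size=(50, 128)) for _ in range(20)]

# Step 2 (initialization): pick one proposal per positive image,
# e.g. a near-whole-image proposal (here simply the first one).
pos_feats = np.stack([bag[0] for bag in pos_bags])
neg_feats = np.concatenate(neg_bags)  # every proposal of a negative image is negative

for _ in range(5):  # Step 3: alternate re-training and re-localization
    X = np.concatenate([pos_feats, neg_feats])
    y = np.concatenate([np.ones(len(pos_feats)), np.zeros(len(neg_feats))])
    clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)       # re-training
    pos_feats = np.stack([bag[np.argmax(clf.decision_function(bag))]
                          for bag in pos_bags])            # re-localization
```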

  19. Weakly-supervised deep detection network (WSDDN) [Bilen CVPR 16]
      + End-to-end region CNN for WSOD
      − Normalization over classes hurts performance
      Slide credit: Hakan Bilen
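
A minimal PyTorch sketch of the WSDDN two-stream idea; layer sizes, the number of proposals, and the loss setup are illustrative assumptions, and RoI feature extraction is assumed to happen upstream. One stream is normalized over classes, the other over proposals; their product gives per-proposal detection scores, and summing over proposals gives image-level scores trained with image labels only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSDDNHead(nn.Module):
    def __init__(self, feat_dim=4096, num_classes=20):
        super().__init__()
        self.fc_cls = nn.Linear(feat_dim, num_classes)  # classification stream
        self.fc_det = nn.Linear(feat_dim, num_classes)  # detection stream

    def forward(self, roi_feats):                        # roi_feats: (num_proposals, feat_dim)
        cls = F.softmax(self.fc_cls(roi_feats), dim=1)   # softmax over classes
        det = F.softmax(self.fc_det(roi_feats), dim=0)   # softmax over proposals
        proposal_scores = cls * det                      # per-proposal detection scores
        image_scores = proposal_scores.sum(dim=0)        # image-level class scores in [0, 1]
        return proposal_scores, image_scores.clamp(1e-6, 1 - 1e-6)

head = WSDDNHead()
feats = torch.randn(300, 4096)                           # 300 proposals, toy features
_, img_scores = head(feats)
labels = torch.zeros(20); labels[14] = 1                 # image-level label only
loss = F.binary_cross_entropy(img_scores, labels)        # trained without any boxes
```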

  20. Online instance classifier refinement (OICR) network [Tang CVPR 17]
      • Additional blocks (instance classifiers) for score propagation
      • In-network supervision
      + The positive proposals in one image are not sharing scores
      + Performance improves significantly
      − The instance-level in-network supervision may not be correct
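
The in-network supervision can be sketched as follows; the helper names, IoU threshold, and weighting scheme are my simplification, not the authors' code. The top-scoring proposal of the previous branch for each image-level class seeds a pseudo instance label, spatially overlapping proposals inherit it, and the seed confidence acts as a loss weight for the next refinement branch.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def pseudo_labels(prev_scores, boxes, image_labels, iou_thr=0.5):
    """prev_scores: (num_props, num_classes) from the previous classifier branch;
    image_labels: binary vector of classes present in the image."""
    labels = np.zeros(boxes.shape[0], dtype=int)     # 0 = background
    weights = np.zeros(boxes.shape[0])
    for c in np.flatnonzero(image_labels):           # only classes present in the image
        top = np.argmax(prev_scores[:, c])           # seed: top-scoring proposal
        sel = iou(boxes[top], boxes) >= iou_thr      # overlapping proposals inherit it
        labels[sel] = c + 1                          # foreground class c (1-based)
        weights[sel] = prev_scores[top, c]           # loss weight = seed confidence
    return labels, weights
```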

  21. Proposal cluster learning (PCL) [Tang, arXiv:1807.03342v1, under revision for TPAMI]
      + In-network supervision over proposal clusters is more robust
      + MIL within a MIL network ("bag-in-bag" MIL)
      − Still relies on hand-crafted object proposals
      Figure: comparison of WSDDN, OICR, and PCL

  22. Weakly supervised region proposal network (wsRPN) [Tang ECCV 18]
      + Generates object proposals from neural activations
      + Confirms that CNNs contain rich localization information even under weak supervision
      + The first weakly supervised region proposal network
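
To make "generating object proposals from neural activations" concrete, here is a hedged illustration of the underlying intuition only, not the wsRPN architecture: threshold an activation (or CAM-style) response map and take the bounding boxes of its connected components as coarse proposals.

```python
import numpy as np
from scipy import ndimage

def boxes_from_activation(act_map, thr=0.5):
    """act_map: (H, W) activation/response map normalized to [0, 1]."""
    mask = act_map > thr
    labeled, _ = ndimage.label(mask)                 # connected components of the mask
    boxes = []
    for sl in ndimage.find_objects(labeled):
        ys, xs = sl
        boxes.append((xs.start, ys.start, xs.stop, ys.stop))   # (x1, y1, x2, y2)
    return boxes
```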

  23. Generative adversarial learning [Shen CVPR 18]
      + Trains SSD for WSOD using a GAN loss
      + Fast inference speed thanks to SSD
      + Accurate WSOD via adversarial learning

  24. Performance: WSOD (mAP on PASCAL VOC 2007 test)
      Faster R-CNN (PAMI 17)   69.9  (fully supervised reference)
      WSRPN (ECCV 18)          50.4
      GAL-FWSD512 (CVPR 18)    47.5
      OICR (CVPR 17)           47.0
      HCP+ (CVPR 17)           43.7
      WCCN (CVPR 17)           42.8
      WSDDN (CVPR 16)          39.3

  25. Class activation maps (CAM) [Zhou CVPR 16]
      + Finds discriminative regions via global average pooling in a CNN trained with image labels
      + A very insightful work for understanding CNNs
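
The CAM computation itself is just a weighted sum: the classifier weights learned on top of global average pooling re-weight the last convolutional feature maps for the class of interest. A minimal sketch with toy shapes (class index and sizes are placeholders):

```python
import torch

feature_maps = torch.randn(512, 14, 14)   # last conv features of one image (C, H, W)
fc_weights   = torch.randn(20, 512)       # classifier weights after GAP (num_classes, C)

class_idx = 14                            # class of interest (placeholder index)
cam = torch.einsum('c,chw->hw', fc_weights[class_idx], feature_maps)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-6)   # normalize to [0, 1]
# Upsample to the input resolution to obtain the discriminative-region heat map.
cam_full = torch.nn.functional.interpolate(cam[None, None], size=(224, 224),
                                           mode='bilinear', align_corners=False)[0, 0]
```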

  26. Adversarial erasing network [Wei CVPR 17]
      + Adversarial erasing finds dense and complete object regions
      + Very impressive WSSS results
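
The erasing loop can be sketched in a few lines; `compute_cam` is a placeholder for a trained classification network plus CAM, and the threshold and number of steps are illustrative rather than the paper's settings.

```python
import torch

def adversarial_erasing(image, compute_cam, steps=3, thr=0.7):
    """image: (C, H, W) tensor; compute_cam: callable returning an (H, W) map in [0, 1]."""
    mined = torch.zeros(image.shape[-2:], dtype=torch.bool)  # accumulated object regions
    img = image.clone()
    for _ in range(steps):
        cam = compute_cam(img)            # discriminative regions of the current image
        mined |= cam > thr                # add them to the mined object mask
        img = image * (~mined).float()    # erase everything mined so far and repeat
    return mined                          # dense mined regions, used as pseudo supervision
```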

  27. Seed, Expand and Constrain (SEC) [Kolesnikov ECCV 16]
      + Seed with weak localization cues
      + Expand with image labels
      + Constrain to object boundaries using a CRF
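
Of the three terms, the seeding loss is the simplest to write down: the segmentation output should agree with the sparse weak localization cues, and all other pixels are ignored. A toy sketch (shapes and the seed class are placeholders; the expansion and constrain terms are omitted):

```python
import torch
import torch.nn.functional as F

# Toy shapes: 21 classes (20 foreground + background), a 41x41 output grid.
logits = torch.randn(1, 21, 41, 41, requires_grad=True)   # segmentation network output
seeds = torch.full((1, 41, 41), -1, dtype=torch.long)     # -1 = no localization cue
seeds[0, 5:10, 5:10] = 15                                  # a few seed pixels of one class

# Seeding term: cross-entropy only on seed pixels, everything else ignored.
seed_loss = F.cross_entropy(logits, seeds, ignore_index=-1)
seed_loss.backward()
```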

  28. Deep seeded region growing (DSRG) network [Huang CVPR 18]
      + Region growing yields complete and dense object regions
      + The segmentation network generates new pixel labels by itself
      Figure: a classification network produces seeds, seeded region growing expands them, and the segmentation network is trained with a seeding loss and a CRF-based boundary loss
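
A conceptual sketch of seeded region growing, simplified rather than the authors' exact criterion: starting from high-confidence seed pixels, neighbouring pixels are absorbed into the same class region whenever the segmentation network assigns that class a high probability there; the grown labels then supervise the network in turn.

```python
import numpy as np
from collections import deque

def grow_seeds(seeds, probs, thr=0.85):
    """seeds: (H, W) int map with -1 for unlabeled pixels;
    probs: (num_classes, H, W) segmentation network probabilities."""
    labels = seeds.copy()
    H, W = seeds.shape
    queue = deque(zip(*np.nonzero(seeds >= 0)))      # start from all seed pixels
    while queue:
        y, x = queue.popleft()
        c = labels[y, x]
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < H and 0 <= nx < W and labels[ny, nx] < 0
                    and probs[c, ny, nx] > thr):
                labels[ny, nx] = c                   # absorb the neighbour into class c
                queue.append((ny, nx))
    return labels                                    # grown pixel labels for training
```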

  29. Iteratively Mining Common Object Features (MCOF) [Wang CVPR 18]
      + Mines common features between a region (superpixel)-level classification network and a pixel-level segmentation network

  30. Performance: WSSS (mIoU on PASCAL VOC 2012 test)
      FCN (ResNet-101)     70.0  (fully supervised reference)
      DSRG (CVPR 18)       63.2
      MCOF (CVPR 18)       60.3
      DCSP (BMVC 17)       59.2
      AE-PSL (CVPR 17)     55.7
      AF-SS (ECCV 16)      52.7
      SEC (ECCV 16)        51.7
      STC (PAMI 16)        51.2
      DCSM (ECCV 16)       45.1
      EM-ADAPT (ICCV 15)   39.6
      CCNN (CVPR 15)       35.6
      MIL-FCN (CVPRW 14)   24.9

  31. Takeaways
      • There are many different kinds of weak supervision for visual recognition.
      • WSVL significantly reduces human labeling effort.
      • Deep learning enables effective WSVL; however, performance is still far from fully supervised models.
      • WSVL is a rising research area; there are many interesting ideas to explore.

  32. Resources
      • CVPR tutorial: Weakly Supervised Learning for Computer Vision, by Hakan Bilen, Rodrigo Benenson, Jasper Uijlings, https://hbilen.github.io/wsl-cvpr18.github.io
      • Source code:
        - WSDDN: https://github.com/hbilen/WSDDN
        - CAM: http://cnnlocalization.csail.mit.edu
        - OICR/PCL: https://github.com/ppengtang/oicr/tree/pcl
        - SEC: https://github.com/kolesman/SEC
        - DSRG: https://github.com/speedinghzl/DSRG

  33. Questions?
      Thanks a lot for your attention!
      www.xinggangw.info
