Weakly and deeply supervised visual learning
Xinggang Wang, www.xinggangw.info, Huazhong University of Science and Technology
CSIG
Annotation time of manual supervision
Annotation time: 1, 2.4, 10, 78 seconds per instance
Bearman et al., What’s the Point: Semantic Segmentation with Point Supervision, ECCV 16 Slide credit: Hakan Bilen
Person, Horse
[Verbeek CVPR 07, Pandey ICCV 11, Cinbis CVPR 14, Wang ECCV 14, Papandreou ICCV 15, Bilen CVPR 15, Tang CVPR 17, Wei CVPR 17, Singh ICCV 17, Huang CVPR 18, etc.]
[Tokmakov ECCV 16] [Papazoglou, ICCV 13]
[Bearman ECCV 16]
DEXTR [Maninis CVPR 18]
[Papadopoulos ICCV 17, Maninis CVPR 18]
[Bearman ECCV 16] MIL Cut [Wu CVPR 14]
BoxSup [Dai ICCV 15] [Rother SIGGRAPH 04, Dai ICCV 15, Khoreva CVPR 17]
[Hou et al., arXiv 18]
[Mahajan, ECCV 18]
YOLO9000, CVPR 2017 Best Paper Honorable Mention [Redmon CVPR 17]
Blue: COCO class. Dark: ImageNet class
[Inoue, CVPR 18]
C-WSL [Wang, ECCV 18]
bMCL [Zhu, CVPR 12, PAMI 15]
Polygon-RNN, Honorable Mention Best Paper Award [Castrejon, CVPR 17]
Polygon-RNN cuts down the number of required annotation clicks by a factor of 4.74
Full supervision, Incomplete supervision, Inaccurate supervision, Inexact supervision
Person, Dog
WSL [Zhou, 2018, National Science Review]:
Person, Horse
Slide credit: Vitto Ferrari
1. Window space (usually built from object proposals)
2. Initialization
3. Re-localization & Re-training
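The three steps above can be sketched as an alternating loop. This is only a toy illustration under stated assumptions, not the method of any cited paper: each image is a bag of window feature vectors, the "detector" is a nearest-prototype scorer, and the names `mil_localize`, `pos_bags`, and `neg_bags` are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mil_localize(pos_bags, neg_bags, n_rounds=5):
    """pos_bags / neg_bags: list of images, each a list of window feature vectors.

    Returns the index of the selected window for every positive image.
    """
    # 1. Window space: each bag already holds the candidate windows of one image.
    neg_proto = mean([w for bag in neg_bags for w in bag])
    # 2. Initialization: start from the first window of each positive image
    #    (an arbitrary choice made for this sketch).
    selected = [0 for _ in pos_bags]
    for _ in range(n_rounds):
        # 3a. Re-training: refit the toy detector on the currently selected windows.
        pos_proto = mean([bag[i] for bag, i in zip(pos_bags, selected)])

        def score(w):
            # Higher when w is close to the positive prototype and far from the negative one.
            return sq_dist(w, neg_proto) - sq_dist(w, pos_proto)

        # 3b. Re-localization: pick the highest-scoring window in each positive image.
        new_selected = [max(range(len(bag)), key=lambda i: score(bag[i]))
                        for bag in pos_bags]
        if new_selected == selected:
            break  # converged: selections no longer change
        selected = new_selected
    return selected

pos_bags = [[[0, 0], [5, 5]], [[0.2, 0.1], [5.1, 4.9]], [[4.8, 5.2], [0.1, 0.3]]]
neg_bags = [[[0, 0], [0.1, 0.2]], [[0.3, 0.1], [0, 0.2]]]
print(mil_localize(pos_bags, neg_bags))  # [1, 1, 0]
```

In the toy data the object-like windows sit near (5, 5), so the loop recovers them even though the initialization is wrong for two of the three images.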
[Chum CVPR 07, Deselaers ECCV 10, Siva ICCV 11, Wang ICCV 15, Bilen CVPR 15]
Slide credit: Hakan Bilen [Bilen CVPR 16]
(+) End-to-end region CNN for WSOD
(-) Normalization over classes hurts performance
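As a rough sketch of the two-stream scoring used in WSDDN [Bilen CVPR 16]: one stream applies a softmax over classes for each proposal, the other a softmax over proposals for each class, and the image-level score is the sum over proposals of their product. The softmax over classes is presumably the normalization that the slide flags as a weakness. The function name `wsddn_scores` and the nested-list inputs are assumptions made for this illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax of a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def wsddn_scores(cls_logits, det_logits):
    """cls_logits, det_logits: R x C nested lists (R proposals, C classes)."""
    R, C = len(cls_logits), len(cls_logits[0])
    # Classification stream: softmax over classes, separately for each proposal.
    cls = [softmax(row) for row in cls_logits]
    # Detection stream: softmax over proposals, separately for each class.
    det_cols = [softmax([det_logits[r][c] for r in range(R)]) for c in range(C)]
    # Proposal scores: element-wise product of the two streams.
    region = [[cls[r][c] * det_cols[c][r] for c in range(C)] for r in range(R)]
    # Image-level scores: sum over proposals; each one lies in (0, 1),
    # so it can be trained against image labels with a binary loss.
    image = [sum(region[r][c] for r in range(R)) for c in range(C)]
    return region, image

region, image = wsddn_scores([[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
```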
(+) Positive proposals in the same image do not share a score
(+) Performance improves significantly
(-) The instance-level in-network supervision may not be correct
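A minimal sketch of how instance-level in-network supervision can be generated, in the spirit of OICR [Tang CVPR 17]: for each class present in the image, the top-scoring proposal from the previous stage is taken as a seed, and proposals that overlap it strongly inherit its label. The 0.5 IoU threshold and the names `refine_labels` and `iou` are assumptions for illustration. If the seed is wrong, every label derived from it is wrong too, which is the weakness noted above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def refine_labels(boxes, scores, image_classes, iou_thresh=0.5):
    """Return one pseudo label per proposal; None means background.

    scores[r][c] is the score of proposal r for class c from the previous stage.
    """
    labels = [None] * len(boxes)
    best_iou = [0.0] * len(boxes)
    for c in image_classes:
        # Seed: the top-scoring proposal for this image-level class.
        top = max(range(len(boxes)), key=lambda r: scores[r][c])
        for r, box in enumerate(boxes):
            overlap = iou(box, boxes[top])
            # Keep the label of the seed that overlaps this proposal the most.
            if overlap >= iou_thresh and overlap > best_iou[r]:
                labels[r], best_iou[r] = c, overlap
    return labels

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [[0.9], [0.3], [0.1]]
print(refine_labels(boxes, scores, image_classes=[0]))  # [0, 0, None]
```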
[Tang CVPR 17]
WSDDN OICR PCL
(+) In-network supervision for proposal clusters is more robust
(+) MIL in MIL network (bag-in-bag MIL)
(-) It still relies on hand-crafted
[Tang, arXiv:1807.03342v1, under revision at TPAMI]
[Tang ECCV 18]
(+) Generates object proposals from neural activations
(+) Confirms that a CNN contains rich localization information even under weak supervision
(+) The first weakly supervised region proposal network (wsRPN)
[Shen CVPR 18]
(+) Trains SSD by WSOD using a GAN loss
(+) Fast inference speed using SSD
(+) Accurate WSOD by adversarial learning
WSOD performance (mAP on PASCAL VOC 2007 test)
Method                  mAP
WSDDN (CVPR16)          39.3
WCCN (CVPR17)           42.8
HCP+ (CVPR17)           43.7
OICR (CVPR17)           47
GAL-FWSD512 (CVPR18)    47.5
WSRPN (ECCV18)          50.4
Faster R-CNN (PAMI17)   69.9
[Zhou CVPR 16]
(+) Finds discriminative regions by global average pooling in a CNN trained using image labels
(+) A very insightful work for understanding CNNs
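A small sketch of the class activation map idea from [Zhou CVPR 16]: with global average pooling followed by a linear classifier, the class logit equals the spatial mean of a per-class map formed by weighting the last convolutional feature maps with that class's classifier weights. The function names and the nested-list layout are assumptions for illustration, and the bias term is omitted.

```python
def class_activation_map(features, fc_weights, cls):
    """features: K feature maps, each H x W; fc_weights[cls][k]: weight of channel k."""
    K, H, W = len(features), len(features[0]), len(features[0][0])
    w = fc_weights[cls]
    # Weighted sum of the K feature maps at every spatial location.
    return [[sum(w[k] * features[k][y][x] for k in range(K)) for x in range(W)]
            for y in range(H)]

def gap_logit(features, fc_weights, cls):
    """Class logit from global average pooling followed by a linear layer (no bias)."""
    pooled = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
              for fmap in features]
    return sum(wk * pk for wk, pk in zip(fc_weights[cls], pooled))

features = [[[1.0, 0.0], [0.0, 0.0]],   # channel 0 fires at the top-left
            [[0.0, 0.0], [0.0, 2.0]]]   # channel 1 fires at the bottom-right
fc_weights = [[1.0, 0.5]]               # one class, two channels
cam = class_activation_map(features, fc_weights, cls=0)
print(cam)  # [[1.0, 0.0], [0.0, 1.0]]
```

Averaging `cam` over all locations gives the same value as `gap_logit`, which is why the map shows where the evidence for the class comes from.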
[Wei CVPR 17]
(+) Adversarial erasing finds dense and complete object regions
(+) Very impressive WSSS results
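The erasing loop can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the procedure of [Wei CVPR 17]: `cam_fn` is a hypothetical callable standing in for "train a classifier on the erased images and compute its localization map", erased pixels are simply set to zero, and the relative threshold of 0.7 is arbitrary.

```python
def adversarial_erasing(image, cam_fn, n_rounds=3, rel_thresh=0.7):
    """Accumulate an object mask by repeatedly erasing the most discriminative region.

    image: H x W nested list of floats.
    cam_fn: hypothetical callable mapping an image to an H x W heat map.
    """
    H, W = len(image), len(image[0])
    mask = [[False] * W for _ in range(H)]
    current = [row[:] for row in image]  # working copy that gets erased
    for _ in range(n_rounds):
        heat = cam_fn(current)
        peak = max(max(row) for row in heat)
        if peak <= 0:
            break  # nothing discriminative is left
        for y in range(H):
            for x in range(W):
                if heat[y][x] >= rel_thresh * peak:
                    mask[y][x] = True    # add the region to the object mask
                    current[y][x] = 0.0  # erase it so the next round looks elsewhere
    return mask

# Toy stand-in: the "heat map" is just the current image intensity.
image = [[1.0, 0.6, 0.0],
         [0.5, 0.3, 0.0]]
mask = adversarial_erasing(image, cam_fn=lambda img: img)
print(mask)  # [[True, True, False], [True, True, False]]
```

A single round would mark only the strongest pixel; the later rounds are what extend the mask to the less discriminative parts.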
[Kolesnikov ECCV 16]
(+) Seed with weak localization cues
(+) Expand with image labels
(+) Constrain to object boundary using CRF
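To make the "seed" step concrete, here is a minimal sketch of a seeding loss of the kind used in [Kolesnikov ECCV 16]: the segmentation network is penalized only at pixels covered by weak localization cues, using the negative log-probability of the seeded class averaged over all seed pixels. The data layout (`probs[c][y][x]`, `seeds` as a dict of pixel lists) is an assumption for illustration; the expand and constrain terms are not shown.

```python
import math

def seeding_loss(probs, seeds):
    """Average negative log-likelihood at seed pixels.

    probs[c][y][x]: predicted probability of class c at pixel (y, x).
    seeds: dict mapping class index -> list of (y, x) seed pixels.
    Pixels without a seed contribute nothing to the loss.
    """
    total, count = 0.0, 0
    for c, pixels in seeds.items():
        for y, x in pixels:
            total += math.log(probs[c][y][x])
            count += 1
    return -total / count if count else 0.0

# Two classes on a 1 x 2 image, one seed pixel per class.
probs = [[[0.5, 0.5]], [[0.5, 0.5]]]
seeds = {0: [(0, 0)], 1: [(0, 1)]}
loss = seeding_loss(probs, seeds)
```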
[Figure: network diagram. Labels: segmentation network, classification network, seeding loss, boundary loss, downscale, CRF, seed]
[Huang CVPR 18]
[Wang CVPR 18]
WSSS performance (mIoU on PASCAL VOC 2012 test)
Method               mIoU
MIL-FCN (CVPRW14)    24.9
CCNN (CVPR15)        35.6
EM-ADAPT (ICCV15)    39.6
DCSM (ECCV16)        45.1
STC (PAMI16)         51.2
SEC (ECCV16)         51.7
AF-SS (ECCV16)       52.7
AE-PSL (CVPR17)      55.7
DCSP (BMVC17)        59.2
MCOF (CVPR18)        60.3
DSRG (CVPR18)        63.2
FCN (ResNet101)      70
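For readers unfamiliar with the mIoU metric reported above, a minimal sketch of how it can be computed: for each class, the intersection over union between predicted and ground-truth pixels, averaged over classes. This version works on flat lists of class indices and skips classes that appear in neither list; the official PASCAL VOC evaluation accumulates counts over the whole test set and ignores void pixels, which this sketch does not model.

```python
def mean_iou(pred, gt, n_classes):
    """pred, gt: flat lists of class indices of equal length."""
    inter = [0] * n_classes
    pred_count = [0] * n_classes
    gt_count = [0] * n_classes
    for p, g in zip(pred, gt):
        pred_count[p] += 1
        gt_count[g] += 1
        if p == g:
            inter[p] += 1
    ious = []
    for c in range(n_classes):
        union = pred_count[c] + gt_count[c] - inter[c]
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter[c] / union)
    return sum(ious) / len(ious)

# Class 0: IoU 1/2, class 1: IoU 2/3.
miou = mean_iou(pred=[0, 0, 1, 1], gt=[0, 1, 1, 1], n_classes=2)
```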
www.xinggangw.info