Recent Progresses in Visual Segmentation
Yunchao Wei, ReLER Lab, Australian Artificial Intelligence Institute, University of Technology Sydney (VALSE Webinar)


  1. VALSE Webinar: Recent Progresses in Visual Segmentation. Yunchao Wei, ReLER Lab, Australian Artificial Intelligence Institute, University of Technology Sydney

  2. The importance of visual segmentation: medical imaging, agriculture, autonomous vehicles, satellite imagery, video editing, robotics.

  3. Outline. Part I: Semantic Segmentation. Part II: Interactive Image Segmentation. Part III: Video Object Segmentation.

  4. Part I: Semantic Segmentation

  5. Semantic segmentation benchmarks: PASCAL VOC, ADE20K, LIP, Cityscapes.

  6. Context modeling in FCN structures: non-adaptive context modeling [Long et al. CVPR 2015] [Ronneberger et al. MICCAI 2015] [Chen et al. PAMI 2018] [Zhao et al. CVPR 2017] [Chen et al. ECCV 2018].

  7. Graph neural networks: adaptive context modeling, but with high computational complexity [Wang et al. CVPR 2018].

  8. Criss-Cross Attention: the criss-cross attention block, a.k.a. sparsely connected self-attention [Huang et al. ICCV 2019].

  9. Recurrent Criss-Cross Attention: criss-cross attention with R = 2 recurrent steps captures the same full-image dependencies as the non-local network, while reducing time and space complexity from O((H×W)×(H×W)) to O((H×W)×(H+W−1)).
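To make the sparsity concrete, below is a minimal PyTorch sketch of a criss-cross attention block: each pixel attends only to the H + W − 1 positions in its own row and column, and applying the block twice (R = 2) propagates information between every pair of positions, matching the non-local connectivity noted on the slide. This is an illustrative simplification of [Huang et al. ICCV 2019]; the official CCNet additionally masks the doubly counted center position and uses a custom CUDA kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    """Simplified criss-cross attention: every pixel attends to its own
    row and column only, instead of all H*W positions as in non-local."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key   = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)

        # Affinities along each row: (b, h, w, w)
        e_row = torch.matmul(q.permute(0, 2, 3, 1), k.permute(0, 2, 1, 3))
        # Affinities along each column: (b, w, h, h) -> (b, h, w, h)
        e_col = torch.matmul(q.permute(0, 3, 2, 1),
                             k.permute(0, 3, 1, 2)).permute(0, 2, 1, 3)

        # One softmax over the H + W candidate positions of each pixel.
        attn = F.softmax(torch.cat([e_row, e_col], dim=-1), dim=-1)
        a_row, a_col = attn[..., :w], attn[..., w:]

        # Aggregate values along the row and the column.
        out_row = torch.matmul(a_row, v.permute(0, 2, 3, 1))           # (b, h, w, c)
        out_col = torch.matmul(a_col.permute(0, 2, 1, 3),              # (b, w, h, h)
                               v.permute(0, 3, 2, 1)).permute(0, 2, 1, 3)

        return self.gamma * (out_row + out_col).permute(0, 3, 1, 2) + x

cca = CrissCrossAttention(64)
feat = torch.rand(1, 64, 32, 48)
out = cca(cca(feat))   # R = 2: two passes already connect all pixel pairs
print(out.shape)       # torch.Size([1, 64, 32, 48])
```

Stacking two passes is exactly the recurrence on the slide; further passes add no new connectivity, since two already link every pair of positions.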

  10. CCNet: Criss-Cross Network

  11. Results on Cityscapes: more accurate, with only 15% of the FLOPs and 9% of the memory cost of the non-local counterpart.

  12. Results on ADE20K, LIP and COCO: scene parsing on ADE20K, human parsing on LIP, instance segmentation on COCO.

  13. From image to video: CCNet3D, with video semantic segmentation results on CamVid [Huang et al. PAMI 2020].

  14. Visualization of the learned context on Cityscapes (image, ground truth, R = 1, R = 2).

  15. Follow-up works: Axial Attention [Ho et al. arXiv 2019] and Axial-DeepLab [Wang et al. ECCV 2020].

  16. Recent hotspots: boundary modeling for better segmentation [Cheng et al. ECCV 2020] [Cheng et al. CVPR 2020] [Chen et al. CVPR 2020] [Li et al. ECCV 2020] [Kirillov et al. CVPR 2020].

  17. Part II: Interactive Image Segmentation

  18. What is interactive image segmentation? • Semi-automated, class-agnostic segmentation • The target object depends on the user inputs (e.g. points) • Allows iterative refinement until the result is satisfactory. (Figure: target object vs. unrelated region.)

  19. Why should we consider interactive image segmentation? Manual mask annotation costs ≈ 60 s per instance and ≈ 1.5 hours per image: unaffordable at scale!

  20. Why should we consider interactive image segmentation? To obtain masks both accurately and efficiently.

  21. Standard pipeline: the RGB image and the user interactions are used together as the network input, and the model is trained end-to-end with FCNs (e.g. the DeepLab series, PSPNet) against the ground-truth mask [Xu et al. CVPR 2016].
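As a concrete illustration of this pipeline, the sketch below widens the first convolution of an off-the-shelf DeepLab model so it accepts RGB plus two extra interaction channels (e.g. foreground/background click maps in the style of Xu et al.). The helper name widen_input and the two-channel layout are assumptions for illustration, not the authors' exact setup.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

# Hypothetical helper: widen the first conv of a DeepLab backbone so the
# network accepts RGB + K user-interaction channels instead of RGB only.
def widen_input(model, extra_channels):
    old = model.backbone.conv1                      # first conv of the ResNet stem
    new = nn.Conv2d(3 + extra_channels, old.out_channels,
                    kernel_size=old.kernel_size, stride=old.stride,
                    padding=old.padding, bias=old.bias is not None)
    with torch.no_grad():
        new.weight[:, :3] = old.weight              # keep the RGB filters (pretrained if loaded)
        new.weight[:, 3:].zero_()                   # start guidance filters at zero
    model.backbone.conv1 = new
    return model

model = widen_input(deeplabv3_resnet50(num_classes=1), extra_channels=2)
x = torch.cat([torch.rand(1, 3, 512, 512),          # RGB image
               torch.zeros(1, 2, 512, 512)], dim=1) # click maps (fg / bg)
print(model(x)["out"].shape)                        # torch.Size([1, 1, 512, 512])
```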

  22–23. Common types of user interaction: • Sparse clicks • Bounding box • Scribbles

  24. Annotation cost per interaction type: manual polygon annotation ≈ 60 s per instance; sparse clicks ≈ 2 s; bounding box ≈ 7 s; scribbles ≈ 17 s.

  25. Existing state-of-the-art method: DEXTR (Deep Extreme Cut) takes 4 extreme points (the top, bottom, left-most and right-most pixels of the object) as inputs [Maninis et al. CVPR 2018].

  26. DEXTR pipeline: the extreme points are encoded as location cues and fed, together with the cropped image, into the segmentation network [Maninis et al. CVPR 2018].

  27. Problems with DEXTR: multiple extreme points may appear at similar locations, and unrelated objects may lie inside the target object [Maninis et al. CVPR 2018].

  28. Inside-Outside Guidance (IOG) [Zhang et al. CVPR 2020]: • Inside guidance (1 click): an interior point located roughly at the object center, which disambiguates the segmentation target • Outside guidance (2 clicks): two corner clicks of a box enclosing the object, which indicate the background region; the remaining 2 corners can be inferred automatically (see the sketch below).
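A minimal sketch of that corner bookkeeping: given the two user-clicked opposite corners, the enclosing box and the two remaining corners follow directly. The function name infer_box is hypothetical.

```python
# From two opposite corner clicks, recover the box and the two
# corners the user did not click.
def infer_box(corner_a, corner_b):
    (xa, ya), (xb, yb) = corner_a, corner_b
    x0, x1 = min(xa, xb), max(xa, xb)
    y0, y1 = min(ya, yb), max(ya, yb)
    box = (x0, y0, x1, y1)              # enclosing box
    inferred = [(x0, y1), (x1, y0)]     # the other diagonal's corners
    return box, inferred

box, corners = infer_box((30, 40), (210, 180))
print(box, corners)   # (30, 40, 210, 180) [(30, 180), (210, 40)]
```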

  29. Clicking paradigm: click on a corner point, click on the symmetrical (opposite) corner, then click on the object center. Measured annotation time: outside clicks ≈ 6.7 s, inside click ≈ 1.5 s.

  30. Input representation (following the practice of DEXTR): • Enlarge the bounding box by 10 pixels to include context • Crop and resize the inputs to 512×512 • Encode the inside and outside clicks as 2 separate Gaussian heatmaps, fed to the segmentation network together with the RGB image.
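A small NumPy sketch of the click encoding, under the assumption of a DEXTR-style 2-D Gaussian per click (the sigma of 10 pixels is an assumed value, not taken from the slide):

```python
import numpy as np

# Each click becomes a 2-D Gaussian centered at the click position;
# inside and outside clicks go to separate channels.
def gaussian_heatmap(shape, clicks, sigma=10.0):
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape, dtype=np.float32)
    for cx, cy in clicks:
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)   # keep the strongest response per pixel
    return heat

inside  = gaussian_heatmap((512, 512), [(256, 250)])            # 1 center click
outside = gaussian_heatmap((512, 512), [(20, 15), (490, 500)])  # 2 corner clicks
# Stack with the RGB crop -> a 5-channel input for the segmentation network.
```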

  31. Network architecture: segmentation errors mostly occur around the object boundaries.

  32. Network architecture: since errors concentrate at object boundaries, use a coarse-to-fine network structure with (a) CoarseNet and (b) FineNet [Chen et al. CVPR 2018].

  34. Beyond three clicks: • IOG naturally supports interactively adding new clicks • A lightweight branch is added to accept the additional inputs • The model is trained with an iterative training strategy (an optional refinement click is fed to the CoarseNet/FineNet pipeline).
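The slide does not spell out the iterative training strategy, so here is a hedged sketch of one common heuristic for simulating the next refinement click: place it at the point deepest inside the largest mispredicted region. This is a plausible stand-in, not necessarily the exact IOG procedure.

```python
import numpy as np
from scipy import ndimage

def next_click(pred, gt):
    """Simulate a refinement click from the current prediction errors."""
    error = pred != gt                                   # boolean error mask
    if not error.any():
        return None
    labels, n = ndimage.label(error)                     # connected error regions
    sizes = ndimage.sum(error, labels, range(1, n + 1))
    region = labels == (np.argmax(sizes) + 1)            # largest error region
    dist = ndimage.distance_transform_edt(region)        # depth inside the region
    y, x = np.unravel_index(np.argmax(dist), dist.shape)
    is_positive = bool(gt[y, x])                         # click label from GT
    return (x, y), is_positive
```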

  35. IOG vs. extreme clicks: IOG is more effective than extreme points across different backbones.

  36. IOG vs. extreme clicks (cont.): using a coarse-to-fine network structure further improves the performance.

  37. Comparison with SOTA. [Bar chart: mean IoU on GrabCut and PASCAL for Graph cut, Random walker, Geodesic matting, iFCN, RIS-Net, DEXTR, IOG (3 clicks) and IOG (4 clicks); IOG achieves the best results on both benchmarks, with top scores of 96.9 and 96.3.]

  38. Generalization: • IOG performs well even on unseen categories • It transfers across domains even without fine-tuning • It can be further improved by fine-tuning on 10% of the target-domain data. [Charts: seen vs. unseen PASCAL categories for Curve-GCN, DEXTR and IOG; PASCAL→COCO transfer; aerial imagery, medical and autonomous-driving domains with and without fine-tuning.]

  39. Qualitative results: Cityscapes, Agriculture-Vision, Rooftop, ssTEM, general object scenes.

  40. Demo [YouTube] [Bilibili]

  41. Automated mode of IOG. (Pipeline figure: RGB image, inside guidance and outside guidance fed to the segmentation network.)

  42. Automated mode of IOG: without any user interaction, IOG can still harvest high-quality masks from off-the-shelf datasets with box annotations (e.g. ImageNet). Solution, a two-stage training scheme: (S1) train a network that takes the box as input; (S2) infer interior clicks from the masks produced in S1 and apply IOG. On PASCAL this yields 93.2 IoU with human clicks vs. 91.1 IoU without.
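Stage S2 needs an interior click without a human in the loop. One plausible reading, sketched below, picks the point of the stage-1 mask farthest from its boundary via a distance transform; the helper name interior_click is illustrative.

```python
import numpy as np
from scipy import ndimage

def interior_click(mask):
    """Pick the deepest interior point of a binary mask as the inside click."""
    dist = ndimage.distance_transform_edt(mask)   # distance to the background
    y, x = np.unravel_index(np.argmax(dist), dist.shape)
    return x, y

mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 20:44] = True
print(interior_click(mask))                       # roughly the blob center
```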

  43. Pixel-ImageNet (https://github.com/shiyinzhang/Pixel-ImageNet). Characteristics: #Classes: 1,000; #Instances: >600K. Possible applications: image classification, instance segmentation, semantic segmentation, salient object detection, and more.

  44. Failure cases
