SLIDE 1

Recent Progresses in Visual Segmentation

Yunchao Wei ReLER, Australian Artificial Intelligence Institute University of Technology Sydney

VALSE Webinar

SLIDE 2

UTS ReLER Lab · VALSE Webinar

The importance of visual segmentation

Agriculture Robotics Autonomous Vehicle Satellite Imagery Medical Imagery Video Editing

SLIDE 3

Part I: Semantic Segmentation
Part II: Interactive Image Segmentation
Part III: Video Object Segmentation

Outline

SLIDE 4

Part I: Semantic Segmentation

SLIDE 5

Semantic Segmentation

Pascal VOC · LIP · ADE20K · Cityscapes

SLIDE 6

Context Modeling in FCN Structures

[Long et al. CVPR 2015] [Chen et al. PAMI 2018] [Chen et al. ECCV 2018] [Zhao et al. CVPR 2017] [Ronneberger et al. MICCAI 2015]

Non-adaptive context modeling
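Non-adaptive here means the context pattern is fixed by the architecture, e.g. dilated (atrous) convolutions with preset rates. As a rough illustration (mine, not from the slides), the receptive field of stacked stride-1 dilated convolutions grows as:

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 dilated (atrous) convolutions:
    each layer adds (k - 1) * d pixels of context on either side combined."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

receptive_field([3, 3, 3], [1, 2, 4])  # -> 15
```

The pattern (and hence the context each pixel sees) is the same for every input image, which is what the adaptive methods below address.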

SLIDE 7

Graph Neural Networks

[Wang et al. CVPR 2018]

but high computational complexity

Adaptive context modeling

SLIDE 8

Criss-Cross Attention

Criss-cross attention block, i.e., a sparsely connected form of self-attention

[Huang et al. ICCV 2019]
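A minimal NumPy sketch of the idea (my illustration, not the paper's CUDA implementation): each position attends only to positions on its own row and column. For simplicity the center pixel appears in both the row and the column slice, whereas the paper attends over the H + W − 1 distinct criss-cross positions.

```python
import numpy as np

def criss_cross_attention(q, k, v):
    """Sketch of a criss-cross attention block: each position attends only to
    positions on its own row and column (the criss-cross path), instead of
    all H * W positions as in full self-attention.
    q, k, v: (H, W, C) feature maps."""
    H, W, C = q.shape
    out = np.zeros_like(v, dtype=np.float64)
    for i in range(H):
        for j in range(W):
            # keys/values on the criss-cross path of (i, j): row i + column j
            ks = np.concatenate([k[i, :, :], k[:, j, :]], axis=0)  # (H+W, C)
            vs = np.concatenate([v[i, :, :], v[:, j, :]], axis=0)
            logits = ks @ q[i, j] / np.sqrt(C)
            w = np.exp(logits - logits.max())
            w = w / w.sum()                      # softmax over H+W positions
            out[i, j] = w @ vs
    return out
```

Each query touches H + W keys rather than H × W, which is where the complexity saving comes from.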

SLIDE 9

Recurrent Criss-Cross Attention

Criss-cross attention with R = 2 is equivalent to the non-local block; time & space complexity drop from O((H×W)×(H×W)) to O((H×W)×(H+W−1))
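The R = 2 claim can be checked with a tiny reachability experiment (my sketch): one criss-cross pass spreads information along a pixel's row and column, and a second pass relays it from that path to every pixel in the image.

```python
import numpy as np

def criss_cross_reach(mask):
    """One criss-cross pass, viewed as boolean reachability: a position
    receives information if anything on its row or column carries it."""
    H, W = mask.shape
    out = np.zeros_like(mask)
    for i in range(H):
        for j in range(W):
            out[i, j] = mask[i, :].any() or mask[:, j].any()
    return out

m = np.zeros((4, 5), dtype=bool)
m[1, 2] = True                      # information starts at a single pixel
after1 = criss_cross_reach(m)       # reaches row 1 and column 2 only
after2 = criss_cross_reach(after1)  # reaches every pixel: full image context
```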

SLIDE 10

CCNet: Criss-cross Network

SLIDE 11

Results on Cityscapes

More accurate than the non-local block, at ~15% of its FLOPs and ~9% of its GPU memory cost

SLIDE 12

Results on ADE20K, LIP & COCO

Scene parsing results on ADE20K Human parsing results on LIP Instance segmentation results on COCO

SLIDE 13

From Image to Video

Video semantic segmentation results on CamVid (CCNet-3D)

[Huang et al. PAMI 2020]

SLIDE 14

Visualization of the Learned Context on Cityscapes

Image R=1 R=2 Ground Truth

SLIDE 15

Follow-up Works

Axial Attention [Ho et al. arXiv 2019]; Axial-DeepLab [Wang et al. ECCV 2020]

SLIDE 16

Recent Hotspots: Boundary modeling for better segmentation

[Cheng et al. CVPR 2020] [Kirillov et al. CVPR 2020] [Cheng et al. ECCV 2020] [Li et al. ECCV 2020] [Chen et al. CVPR 2020]

SLIDE 17

Part II: Interactive Image Segmentation

SLIDE 18

  • Semi-automated, class-agnostic segmentation
  • Target object depends on the user inputs (e.g. points)
  • Allows iterative refinement until result is satisfactory


What is Interactive Image Segmentation?

Target object Unrelated region

SLIDE 19

Why should we consider interactive image segmentation?

≈ 60s per instance ≈ 1.5 hours per image

Unaffordable!!

SLIDE 20

Why should we consider interactive image segmentation?

Accurately & Efficiently

SLIDE 21

  • RGB image and user interactions are used as the network input
  • Trained end-to-end with FCNs (e.g., the DeepLab series, PSPNet)


Standard pipeline

Image User interactions Ground-truth Fully convolutional network (FCN)

[Xu et al. CVPR 2016]

SLIDE 22

  • Sparse clicks
  • Bounding box
  • Scribbles


Common types of user interaction


SLIDE 24

  • Sparse clicks
  • Bounding box
  • Scribbles


Common types of user interaction

≈ 2 s / ≈ 7 s / ≈ 17 s per instance for the interaction types above, vs. manual annotation at ≈ 60 s per instance

SLIDE 25

  • DEXTR (Deep Extreme Cut)
  • Takes 4 extreme points (the top, bottom, leftmost, and rightmost pixels) as inputs

Existing State-of-the-Art Method: DEXTR

[Maninis et al. CVPR 2018]

SLIDE 26

  • DEXTR (Deep Extreme Cut)
  • Takes 4 extreme points (the top, bottom, leftmost, and rightmost pixels) as inputs

Existing State-of-the-Art Method: DEXTR

Segmentation Network

Cropped image Location cues

[Maninis et al. CVPR 2018]

SLIDE 27

  • DEXTR (Deep Extreme Cut)
  • Takes 4 extreme points (the top, bottom, leftmost, and rightmost pixels) as inputs
  • Problems
  • Multiple extreme points may appear at similar locations
  • An unrelated object may lie inside the target object

Existing State-of-the-Art Method: DEXTR

[Maninis et al. CVPR 2018]

SLIDE 28

  • Inside guidance (1 click)
  • Interior point located roughly at the object center
  • Disambiguate the segmentation target
  • Outside guidance (2 clicks)
  • 2 corner clicks of a box enclosing the object
  • Indicate the background region
  • The remaining 2 corners can be inferred automatically


Inside-Outside Guidance (IOG)

[Zhang et al. CVPR 2020]
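Inferring the remaining two corners is simple coordinate arithmetic; a sketch (the function name is mine):

```python
def infer_remaining_corners(c1, c2):
    """Given two diagonally opposite corner clicks (x, y) of the enclosing
    box, the other two corners of the axis-aligned box follow automatically."""
    (x1, y1), (x2, y2) = c1, c2
    return (x1, y2), (x2, y1)

infer_remaining_corners((10, 20), (110, 220))  # -> ((10, 220), (110, 20))
```

This is why two outside clicks suffice to specify the full bounding box.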

SLIDE 29

  • Click on a corner point
  • Click on the symmetrical corner
  • Click on the object center


Clicking Paradigm

Clicks / time: outside clicks ≈ 6.7 s, inside click ≈ 1.5 s

SLIDE 30

  • Follow the practice of DEXTR
  • Enlarge the bounding box by 10 pixels to include context
  • Crop and resize the inputs to 512x512
  • Input representation
  • 2 separate Gaussian heatmaps for the inside and outside clicks


Input Representation

RGB Image Inside Guidance Outside Guidance Segmentation Network
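Following this practice, the preprocessing can be sketched as follows (function names are mine; the 10-pixel margin and the two Gaussian click channels follow the bullets above):

```python
import numpy as np

def enlarged_crop_box(x1, y1, x2, y2, h, w, margin=10):
    """Enlarge the click-defined box by a fixed margin to keep context,
    clipped to the image bounds (DEXTR-style practice)."""
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(w, x2 + margin), min(h, y2 + margin))

def click_heatmap(h, w, clicks, sigma=10.0):
    """Render clicks (x, y) as a Gaussian heatmap channel in [0, 1]; inside
    and outside clicks each get their own channel, stacked with RGB."""
    ys, xs = np.mgrid[0:h, 0:w]
    hm = np.zeros((h, w), dtype=np.float32)
    for cx, cy in clicks:
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
        hm = np.maximum(hm, g.astype(np.float32))
    return hm

inside = click_heatmap(512, 512, [(256, 256)])             # 1 interior click
outside = click_heatmap(512, 512, [(40, 60), (470, 400)])  # 2 corner clicks
# network input = np.dstack([rgb, inside, outside])  -> 5 channels
```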

SLIDE 31

  • Segmentation errors mostly occur around the object boundaries


Network Architecture

SLIDE 32

  • Segmentation errors mostly occur around the object boundaries
  • Use a coarse-to-fine network structure


Network Architecture

(a) CoarseNet (b) FineNet

[Chen et al. CVPR 2018]


SLIDE 34

  • Our IOG naturally supports interactively adding new clicks
  • Add a lightweight branch to accept the additional inputs
  • Train with an iterative training strategy


Beyond Three Clicks

(a) CoarseNet (b) FineNet Refinement

Optional click for refinement

SLIDE 35

  • Observation
  • IOG is more effective than extreme points across different backbones

IOG vs. Extreme Clicks

SLIDE 36

  • Observation
  • IOG is more effective than extreme points across different backbones
  • Using a coarse-to-fine network structure further improves the performance

IOG vs. Extreme Clicks

SLIDE 37

Comparison with SOTA

IoU (%) on PASCAL and GrabCut:

Method            PASCAL  GrabCut
Graph cut           41.1    59.3
Random walker       55.1    56.9
Geodesic matting    45.9    55.6
iFCN                75.2    84.0
RIS-Net             80.7    85.0
DEXTR               91.5    94.4
IOG (3 clicks)      93.2    96.3
IOG (4 clicks)      94.4    96.9

SLIDE 38

  • Our IOG performs well even on unseen categories
  • Performs well across different domains even without fine-tuning
  • Can be further improved by fine-tuning on 10% of the target-domain data


Generalization

Seen vs. unseen categories (chart values: 80.3%, 79.9%, 82.1%, 81.7%)

Cross-domain results on medical imagery, aerial imagery, and autonomous driving, with and without fine-tuning (chart values: 60.9 / 81.4, 78.2 / 68.3, 92.8 / 90.7)

PASCAL -> COCO: Curve-GCN 79.4, DEXTR 80.2, IOG 83.8

SLIDE 39

Qualitative Results

Cityscapes Agriculture-Vision Rooftop ssTEM

General object scenes

SLIDE 40

Demo

[Youtube] [Bilibili]

SLIDE 41

Automated Mode of IOG

RGB Image Inside Guidance Outside Guidance Segmentation Network

SLIDE 42

Automated Mode of IOG

RGB Image Outside Guidance Segmentation Network

  • Without user interaction, our IOG can still harvest high-quality masks from off-the-shelf datasets with box annotations (e.g., ImageNet)

  • Solution: two-stage training
    (S1) Train a network that takes the box as input
    (S2) Infer interior clicks from the masks produced in S1 and apply IOG

Inputs / IoU (PASCAL): w/ human 93.2; w/o human 91.1
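The two-stage pipeline can be sketched as follows; `box_net` and `iog_net` stand in for the S1 and S2 networks (placeholders passed as callables, not the authors' code):

```python
import numpy as np

def mask_center(mask):
    """Interior click inferred from a binary mask: its center of mass."""
    ys, xs = np.nonzero(mask)
    return int(ys.mean()), int(xs.mean())

def automated_iog(image, box, box_net, iog_net):
    """Two-stage automated mode. S1: a network trained with box-only
    guidance produces a coarse mask; S2: an interior click inferred from
    that mask drives standard IOG inference."""
    coarse = box_net(image, box)      # S1: mask from box guidance only
    click = mask_center(coarse)       # infer the interior click
    return iog_net(image, box, click) # S2: standard IOG inference
```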

SLIDE 43

Pixel-ImageNet

Possible applications: image classification, instance segmentation, semantic segmentation, salient object detection, and more

Characteristics
  • #Classes: 1,000
  • #Instances: >600K

https://github.com/shiyinzhang/Pixel-ImageNet

SLIDE 44

Failure Cases

SLIDE 45

Part III: Video Object Segmentation

SLIDE 46

  • Semi-supervised VOS


What is video object segmentation (VOS)?

Given the object masks in the first frame, predict the object masks in all subsequent video frames
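Predicted masks are typically scored with the region-similarity measure J, the IoU between prediction and ground truth (standard DAVIS evaluation practice, not stated on this slide):

```python
import numpy as np

def region_similarity(pred, gt):
    """Region similarity J used in DAVIS-style VOS evaluation:
    intersection-over-union between predicted and ground-truth masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred, gt).sum() / union
```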

SLIDE 47

Applications

Video Conferencing Video Editing & Fashion

Autonomous Vehicle

SLIDE 48

  • DAVIS 2016 (ETH)
  • A single object as foreground
  • DAVIS 2017 (ETH)
  • Multiple objects as foreground (~120 sequences)
  • YouTube-VOS (UIUC & ByteDance, 2018)
  • Multiple objects as foreground (~4k sequences)


Datasets

SLIDE 49

  • Fine-tuning based solutions
  • Online propagation based solutions
  • Matching the current frame with the reference frame via feature concatenation
  • Matching pixels in the reference and current frames using deep metric learning
  • Matching objects in the reference and current frames using region proposals
  • Matching pixels in the reference and current frames implicitly using self-attention
  • Matching pixels in the reference and current frames using explicit global and local metrics


The roadmap of SOTA VOS methods

(speed axis: very slow → fast)


SLIDE 51

  • Explicit matching with pixel-wise embedding


Previous Solution

[Voigtlaender et al. CVPR 2019]

SLIDE 52

  • Global matching
  • First frame (t = 1) -> current frame (t = T)
  • All pixels
  • Local matching
  • Previous frame (t = T-1) -> current frame (t = T)
  • Pixels within a k-sized window


Previous Solution

[Voigtlaender et al. CVPR 2019]
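A minimal NumPy sketch of this FEELVOS-style matching (my illustration of the idea, not the authors' implementation): each current-frame pixel gets its distance to the nearest foreground embedding, globally from the first frame and locally from the previous frame.

```python
import numpy as np

def global_match(ref_emb, ref_mask, cur_emb):
    """For each current-frame pixel, distance to the nearest foreground
    pixel embedding in the reference (first) frame. All pixels compared."""
    H, W, C = cur_emb.shape
    fg = ref_emb[ref_mask]                          # (N, C) fg embeddings
    cur = cur_emb.reshape(-1, C)
    d = np.linalg.norm(cur[:, None, :] - fg[None, :, :], axis=-1)
    return d.min(axis=1).reshape(H, W)

def local_match(prev_emb, prev_mask, cur_emb, k=2):
    """Same distance, restricted to a (2k+1)-sized window in frame T-1."""
    H, W, C = cur_emb.shape
    out = np.full((H, W), np.inf)
    for i in range(H):
        for j in range(W):
            y0, y1 = max(0, i - k), min(H, i + k + 1)
            x0, x1 = max(0, j - k), min(W, j + k + 1)
            win = prev_emb[y0:y1, x0:x1][prev_mask[y0:y1, x0:x1]]
            if win.size:
                out[i, j] = np.linalg.norm(win - cur_emb[i, j], axis=-1).min()
    return out
```

Low distance suggests foreground; the maps are fed to a segmentation head rather than thresholded directly.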

slide-53
SLIDE 53

UT UTS ReLER ER Lab Lab VALSE We Webinar

53

Previous Solution

[Voigtlaender et al. CVPR 2019]

SLIDE 54

Motivation: Background Matters

Prediction (t=T) Reference (t=1)

[Yang et al. ECCV 2020]
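The intuition can be sketched as matching each pixel against both foreground and background reference embeddings (a simplified illustration of the CFBI idea, not the authors' implementation):

```python
import numpy as np

def min_dist_map(ref_emb, sel_mask, cur_emb):
    """Per-pixel distance from the current frame to the nearest reference
    embedding selected by sel_mask."""
    H, W, C = cur_emb.shape
    sel = ref_emb[sel_mask]
    if sel.size == 0:
        return np.full((H, W), np.inf)
    d = np.linalg.norm(cur_emb.reshape(-1, 1, C) - sel[None], axis=-1)
    return d.min(axis=1).reshape(H, W)

def fg_bg_match(ref_emb, fg_mask, cur_emb):
    """Collaborative matching sketch: match against foreground AND
    background; a pixel leans foreground when it is closer to the
    foreground embeddings than to the background ones."""
    d_fg = min_dist_map(ref_emb, fg_mask, cur_emb)
    d_bg = min_dist_map(ref_emb, ~fg_mask, cur_emb)
    return d_bg - d_fg   # > 0 -> closer to foreground
```

Matching the background explicitly suppresses distractor pixels that happen to resemble the foreground.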

SLIDE 55

Collaborative VOS by Foreground-Background Integration (CFBI)

SLIDE 56

Robust to different moving rates between two consecutive frames

Fast moving rate vs. slow moving rate

SLIDE 57

FG & BG Global & Multi-local Matching

for improving global consistency and robustness to different moving rates
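One way to picture the multi-local part (my sketch; CFBI's actual window sizes and feature design differ): compute local matching maps at several window sizes and hand all of them to the downstream network as channels.

```python
import numpy as np

def local_fg_dist(prev_emb, prev_mask, cur_emb, k):
    """Per-pixel distance to the nearest previous-frame foreground
    embedding within a (2k+1) x (2k+1) window."""
    H, W, C = cur_emb.shape
    out = np.full((H, W), np.inf)
    for i in range(H):
        for j in range(W):
            rows, cols = slice(max(0, i - k), i + k + 1), slice(max(0, j - k), j + k + 1)
            sel = prev_emb[rows, cols][prev_mask[rows, cols]]
            if sel.size:
                out[i, j] = np.linalg.norm(sel - cur_emb[i, j], axis=-1).min()
    return out

def multi_local_match(prev_emb, prev_mask, cur_emb, windows=(2, 4, 8)):
    """Stack per-window distance maps as channels: small windows stay
    precise under slow motion, large windows still find correspondences
    under fast motion, without paying the full global matching cost."""
    return np.stack([local_fg_dist(prev_emb, prev_mask, cur_emb, k)
                     for k in windows], axis=-1)
```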

SLIDE 58

Robust to objects of various scales

SLIDE 59

Instance-level Attention

for perceiving large objects better

SLIDE 60

Collaborative Ensembler

for building large receptive fields to aggregate and process all the information

SLIDE 61

  • Balanced random crop


Training Tricks

SLIDE 62

  • Sequential Training


Training Tricks

SLIDE 63

Ablation Experiments

Single-model ablation study on DAVIS-2017 validation set

SLIDE 64

Comparison with SOTA

Youtube-VOS DAVIS 2016 DAVIS 2017 CFBI+ denotes using a multi-scale and flip strategy in testing.

SLIDE 65

Demo

[Bilibili] [Youtube]

SLIDE 66

More Experiments on GitHub

SLIDE 67

  • Efficiency?
  • High resolution?
  • Boundary?
  • Panoptic interactive segmentation?
  • Long-tailed or few shot?
  • Domain adaptation?
  • Video scene parsing?


Future works

SLIDE 68

Conclusions

  • CCNet for Semantic Segmentation

https://github.com/speedinghzl/CCNet

  • IOG for interactive object segmentation

https://github.com/shiyinzhang/Inside-Outside-Guidance

  • CFBI for video object segmentation

https://github.com/z-x-yang/CFBI

SLIDE 69

Thanks

VALSE Webinar