
SLIDE 1

Seesaw Loss for Long-Tailed Instance Segmentation

Jiaqi Wang1, Wenwei Zhang2, Yuhang Zang2, Yuhang Cao1, Jiangmiao Pang3, Tao Gong4, Kai Chen1, Ziwei Liu1, Chen Change Loy2, Dahua Lin1

1The Chinese University of Hong Kong 2Nanyang Technological University 3Zhejiang University 4University of Science and Technology of China

Team: MMDet

SLIDE 2

Results

Comparison of our entry with official baseline on LVIS v1 test-dev.

[Bar chart] Mask AP: Baseline 26.8, MMDet 38.9.

SLIDE 3

Results

Comparison of our entry with official baseline on LVIS v1 test-dev.

[Bar chart] Mask AP by category frequency:

       Baseline  MMDet
APr    19.0      29.5
APc    25.2      37.0
APf    32.0      45.4

SLIDE 4

Overview

  • 1. We propose Seesaw Loss that dynamically rebalances the penalty between different categories for long-tailed instance segmentation.

SLIDE 5

Overview

  • 1. We propose Seesaw Loss that dynamically rebalances the penalty between different categories for long-tailed instance segmentation.

  • 2. We propose HTC-Lite, a light-weight version of Hybrid Task Cascade (HTC).


SLIDE 7

Seesaw Loss

Existing object detectors struggle on long-tailed datasets and show unsatisfactory performance on rare classes. The reason is that the overwhelming number of samples from frequent classes severely suppresses the model's confidence on rare classes.

SLIDE 8

Seesaw Loss

To tackle this problem, we propose Seesaw Loss for long-tailed instance segmentation.

  • Dynamic: Seesaw Loss dynamically modifies the penalty according to the relative ratio of instance numbers between each category pair.

  • Smooth: Seesaw Loss smoothly adjusts the punishment on rare classes when the training instances are positive samples of other, relatively frequent categories.

  • Self-calibrated: It directly learns to balance the penalty for each category during training, without relying on a known dataset distribution or a specific data sampler.

SLIDE 9

Seesaw Loss

Seesaw Loss can be derived from the cross-entropy loss.

SLIDE 10

Seesaw Loss

Seesaw Loss accumulates the number of training samples for each category over the training iterations. Given an instance with positive label j, for each other category k, Seesaw Loss dynamically adjusts the penalty on negative label k according to the ratio O_k / O_j.

SLIDE 11

Seesaw Loss

  • When category j is more frequent than category k, Seesaw Loss reduces the penalty on category k by a factor of (O_k / O_j)^q to protect category k.

  • Otherwise, Seesaw Loss keeps the full penalty on negative classes to reduce misclassification.
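In code, this per-pair rule can be sketched as a small helper (hypothetical names; `class_counts` holds the accumulated counts O_c, and the exponent `q` is a tunable hyper-parameter whose default here is chosen only for illustration):

```python
def seesaw_weights(class_counts, pos_label, q=2.0):
    """Penalty scale on each negative class k for a sample whose
    positive label is j = pos_label: (O_k / O_j) ** q when O_k < O_j,
    and 1 otherwise, as described on the slide."""
    o_j = max(class_counts[pos_label], 1)  # guard against zero counts
    weights = []
    for k, o_k in enumerate(class_counts):
        if k == pos_label or o_k >= o_j:
            weights.append(1.0)  # keep full penalty on frequent classes
        else:
            # category k is rarer than j: shrink its penalty to protect it
            weights.append((max(o_k, 1) / o_j) ** q)
    return weights
```

For example, with counts [100, 1] and positive label 0, the penalty on the rare class 1 shrinks to (1/100)^2; with positive label 1, both weights stay at 1.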
SLIDE 12

Seesaw Loss

  • Normalized Linear Layer: we adopt a normalized linear layer to predict the classification activations.
  • Objectness Score: we adopt an objectness branch that predicts objectness scores with a normalized linear layer and cross-entropy loss.
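A normalized linear layer scores a feature against each class weight by cosine similarity instead of a raw dot product. A minimal sketch in plain Python (hypothetical helper; `scale` is an assumed temperature factor, not a value from the slides):

```python
import math

def normalized_linear(x, weights, scale=20.0):
    """Cosine-similarity classifier head: for each class weight w_c,
    logit_c = scale * <x, w_c> / (||x|| * ||w_c||)."""
    x_norm = math.sqrt(sum(v * v for v in x)) or 1.0  # avoid div by zero
    logits = []
    for w in weights:
        w_norm = math.sqrt(sum(v * v for v in w)) or 1.0
        dot = sum(a * b for a, b in zip(x, w))
        logits.append(scale * dot / (x_norm * w_norm))
    return logits
```

Because both the feature and the weights are length-normalized, the magnitude of a class's weight vector (which tends to grow with its training frequency) no longer dominates the logits.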

SLIDE 13

Seesaw Loss

SLIDE 14

Overview

  • 1. We propose Seesaw Loss that dynamically rebalances the penalty between different categories for long-tailed instance segmentation.

  • 2. We propose HTC-Lite, a light-weight version of Hybrid Task Cascade (HTC).

SLIDE 15

HTC-Lite

  • Original HTC

[Diagram: feature F is RoI-pooled into interleaved box heads B1, B2, B3 and mask heads M1, M2, M3, with an additional semantic segmentation branch]

SLIDE 16

HTC-Lite

  • Reduce the number of mask heads

[Diagram: feature F is RoI-pooled into cascaded box heads B1, B2, B3 with a single mask head M3; the semantic segmentation branch is kept]

SLIDE 17

HTC-Lite

  • Use context encoding rather than semantic segmentation head
  • Does not rely on semantic segmentation annotation

[Diagram: feature F is RoI-pooled into cascaded box heads B1, B2, B3 with a single mask head M3; a context encoding branch replaces the semantic segmentation branch]
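The structural change can be sketched as a forward pass (all callables here are hypothetical stand-ins, not MMDetection APIs): three cascaded box heads refine the boxes, a single mask head runs only after the last stage, and a context-encoding module replaces the semantic segmentation branch:

```python
def htc_lite_forward(feature, box_heads, mask_head, context_encoding):
    """Sketch of the HTC-Lite flow described on the slides."""
    ctx = context_encoding(feature)  # global context, no segmentation labels needed
    boxes = None
    for head in box_heads:           # B1 -> B2 -> B3, each refines the boxes
        boxes = head(feature, boxes, ctx)
    masks = mask_head(feature, boxes, ctx)  # single M3, after the final stage
    return boxes, masks
```

Compared with the original HTC, which runs a mask head at every cascade stage, this drops two of the three mask heads and the semantic segmentation branch.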

SLIDE 18

HTC-Lite

Comparison with HTC w/o semantic

Dataset  Method            Bbox AP  Mask AP
COCO     HTC w/o semantic  41.5     36.7
COCO     HTC-Lite          42.5     37.8
LVIS v1  HTC w/o semantic  26.8     24.5
LVIS v1  HTC-Lite          27.2     25.2

SLIDE 19

Experiments

Training/Testing details

  • 1. Training Dataset:
  • Detectors: LVIS v1 training split
  • 2. Training scales
  • long edge: randomly sampled from 768 to 1792 pixels
  • random crop to 1280 × 1280
  • 3. Augmentation
  • Use InstaBoost
  • 4. Test time augmentation
  • Random flip
  • Scales: (1200, 1200), (1400, 1400), (1600, 1600), (1800, 1800), (2000, 2000)

No extra data or annotation is used in our entry.

SLIDE 20

Experiments

Model Modifications

  • Synchronized BN
  • CARAFE
  • HTC-Lite
  • TSD
  • Mask Scoring
  • Better Neck: FPG + CARAFE + DCNv2
  • Better Backbone: ResNeSt-200 + DCNv2
SLIDES 21–32

Experiments

Cumulative ablation on LVIS v1 val (mask AP); each row adds one component:

Component              Mask AP on val
Baseline               18.7
+ SyncBN               18.9 (+0.2)
+ CARAFE Upsample      19.4 (+0.5)
+ HTC-Lite             21.9 (+2.5)
+ TSD                  23.5 (+1.6)
+ Mask Scoring         23.9 (+0.4)
+ Training Time Aug.   26.5 (+2.6)
+ FPG                  27.0 (+0.5)
+ ResNeSt-200 + DCNv2  29.9 (+2.9)
+ Seesaw Loss          36.8 (+6.9)
+ Finetune             37.3 (+0.5)
+ Test Time Aug.       38.8 (+1.5)

The final model reaches 38.92 mask AP on test-dev.

SLIDE 33

Supported Methods

  • RPN
  • Guided Anchoring
  • Fast / Faster R-CNN
  • R-FCN
  • Grid R-CNN
  • Libra R-CNN
  • Mask R-CNN
  • Dynamic R-CNN
  • Mask scoring R-CNN
  • Double Head R-CNN
  • Cascade R-CNN
  • Hybrid Task Cascade
  • DetectoRS

GitHub: MMDet

We recently released MMDetection v2.0 & MMDetection3D.

  • HRNet
  • GCNet
  • NAS-FPN
  • PAFPN
  • FSAF
  • PointRend
  • InstaBoost
  • Mixed Precision Training
  • CARAFE
  • DCN / DCN V2
  • Weight Standardization
  • Generalized Attention
  • Generalized Focal Loss
  • GRoIE
  • DIoU
  • CIoU
  • BIoU
  • RetinaNet
  • ATSS
  • SSD
  • GHM
  • OHEM
  • FCOS
  • NAS-FCOS
  • FoveaBox
  • RepPoints

GitHub: MMDet3D

SLIDE 34

Thank you!