

SLIDE 1

ReMOTS: Refining Multi-Object Tracking and Segmentation (1st Place Solution for MOTS 2020 Challenge 1)

Fan Yang1,2, Xin Chang1, Chenyu Dang1, Ziqiang Zheng3, Sakriani Sakti1,2, Satoshi Nakamura1,2, Yang Wu4

1 Nara Institute of Science and Technology, Japan; 2 RIKEN Center for Advanced Intelligence Project, Japan; 3 UISEE Technology (Beijing) Co. Ltd., China; 4 Kyoto University, Japan

SLIDE 2

  • Problem: detect, segment, and track multiple objects in videos.
  • Input: a video sequence containing multiple RGB images.
  • Output: a 2D mask and corresponding track ID at each frame.
  • Application: action recognition, autonomous driving, and others.

Background of Multi-Object Tracking and Segmentation (MOTS)

[Figure: input video data (frames k and k+1) → instance segmentation (detect + segment) → MOTS (detect + segment + track), where each mask keeps its track ID (track_id 1, track_id 2) across frames]

SLIDE 3

Our solution for MOTS

Step 1: instance segmentation — use off-the-shelf models to detect and segment objects in each frame.

[Figure: input video data (frames k and k+1) → instance segmentation (detect + segment)]

SLIDE 4

We use off-the-shelf models: X-101-64x4d-FPN from MMDetection and Mask R-CNN X152 from Detectron2, i.e., publicly available detection and segmentation methods.

Instance Segmentation

But how do we fuse instance masks from different models?

  • Fusing boxes: use NMS.
  • Fusing masks: NMS may also be used, but with IoU replaced by IoM (Intersection over Minimum).

Why IoM? For two overlapping instance masks of similar size, Pixel_IoU = 1/3 = 0.33 while Pixel_IoM = 1/2 = 0.5; for a tiny mask (area 0.01) lying entirely inside a larger mask (area 2), Pixel_IoU = 0.01/2 = 0.005 while Pixel_IoM = 0.01/0.01 = 1. Missing such nested duplicates is acceptable for bounding boxes, but not for masks.
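The mask fusion above can be sketched as greedy NMS with IoM in place of IoU. The function names, the boolean-mask representation, and the 0.5 threshold are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def pixel_iom(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection over Minimum (IoM) of two non-empty boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    return inter / min(mask_a.sum(), mask_b.sum())

def mask_nms_iom(masks, scores, iom_thresh=0.5):
    """Greedy NMS over instance masks, suppressing by IoM instead of IoU."""
    order = np.argsort(scores)[::-1]  # visit highest-scoring masks first
    keep = []
    for i in order:
        # Keep mask i only if it is not a (near-)duplicate of any kept mask.
        if all(pixel_iom(masks[i], masks[j]) < iom_thresh for j in keep):
            keep.append(int(i))
    return keep

# A small mask nested inside a large one has tiny IoU but IoM = 1,
# so IoM-based suppression removes the duplicate while IoU would miss it.
big = np.zeros((10, 10), dtype=bool);   big[2:8, 2:8] = True
small = np.zeros((10, 10), dtype=bool); small[4:6, 4:6] = True
print(mask_nms_iom([big, small], [0.9, 0.8]))  # → [0]
```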

SLIDE 5

We propose an offline method, ReMOTS (Refining Multi-Object Tracking and Segmentation).

Our solution for MOTS

Step 1: Instance Segmentation (detect + segment) on the input video data.
Step 2: MOTS (detect + segment + track), assigning track IDs (track_id 1, track_id 2) consistently across frames k and k+1.

Our main contributions:

  • 1. Refine appearance features
  • 2. Automatically decide the threshold
SLIDE 6

Intra-frame Training and Short-term Tracking

Short-term tracker:
  • For the masks of frame k, consider all masks of frame k+1 with IoU > 0 as matching candidates.
  • An appearance encoder (BoT-ReID, TMM 2020) embeds each mask into appearance features; pairwise cosine distances form the appearance distance matrix, with infeasible pairs set to inf (e.g., [[0.2, 0.4, inf], [inf, inf, 0.3]] for frames k and k+1).
  • Matches are obtained by linear assignment on this matrix, linking masks into short-term tracklets t1 ... t8.

Intra-frame training (of the appearance encoder, on the test set without labels):
  • Intra-frame sampling with augmentation: masks observed in the same frame must belong to different identities, so they yield negative (N) pairs, while augmented copies of the same mask yield positive (P) pairs.
  • Inter-tracklet sampling on ground-truth tracklets (temporally overlapped & non-overlapped) in the training set provides additional P/N pairs; estimated bounding boxes from the test set and ground-truth tracklets form a mini-batch input at the ratio 1:1.

[Figure: raw object-instance segmentation → short-term tracker → short-term tracklets t1–t8, with hypothesis ids matching ground-truth IDs (ID2, ID3)]
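The adjacent-frame matching step can be sketched with SciPy's assignment solver. The gating value `max_dist = 0.7` and the use of a large finite constant for infeasible pairs are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Infeasible pairs (mask IoU == 0) get a large finite cost, which keeps
# the sketch independent of how the solver treats inf entries.
BIG = 1e6

def match_frames(dist: np.ndarray, max_dist: float = 0.7):
    """Return (row, col) pairs of matched masks whose distance is small."""
    rows, cols = linear_sum_assignment(dist)  # minimize total cosine distance
    return [(int(r), int(c)) for r, c in zip(rows, cols) if dist[r, c] < max_dist]

# Distance matrix from the slide: masks of frame k (rows) vs frame k+1 (cols).
dist = np.array([[0.2, 0.4, BIG],
                 [BIG, BIG, 0.3]])
print(match_frames(dist))  # → [(0, 0), (1, 2)]
```

Matched pairs are then chained across frames to grow short-term tracklets.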

SLIDE 7

Inter-short-term-tracklet Training

After short-term tracking, the appearance encoder is trained a second time with tracklet-level pseudo-labels:

  • Inter-short-term-tracklet sampling: temporally overlapped short-term tracklets in the test set cannot share an identity, so observations drawn across them form negative (N) pairs, while observations within one tracklet form positive (P) pairs.
  • As before, inter-tracklet sampling on ground-truth tracklets (temporally overlapped & non-overlapped) from the training set is mixed with the estimated short-term tracklets from the test set to form a mini-batch input at the ratio 1:1.

The refined encoder then re-embeds the short-term tracklets t1 ... t8, and their pairwise cosine similarities define the distance matrix Wlong:

    if same split tracklet ID: set inf
    elif temporally overlapping: set inf
    else: set cosine distance

[Figure: short-term tracklets t1–t8 → appearance encoder → cosine similarity → symmetric 4x4 distance matrix, rows (inf, inf, 0.1, 0.4), (inf, inf, 0.5, 0.2), (0.1, 0.5, inf, inf), (0.4, 0.2, inf, inf)]
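The Wlong construction rule above can be sketched directly. The tracklet fields (`emb`, `start`, `end`, `split_id`) are illustrative assumptions about what a short-term tracklet record carries, not names from the paper:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity of two embedding vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def temporally_overlap(ta, tb):
    """True if the two tracklets' frame intervals intersect."""
    return ta["start"] <= tb["end"] and tb["start"] <= ta["end"]

def build_w_long(tracklets):
    """Pairwise distance matrix following the slide's if/elif/else rule."""
    n = len(tracklets)
    w = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(n):
            ti, tj = tracklets[i], tracklets[j]
            if i == j or ti["split_id"] == tj["split_id"]:
                continue                      # same split tracklet ID: inf
            if temporally_overlap(ti, tj):
                continue                      # temporal overlap: inf
            w[i, j] = cosine_distance(ti["emb"], tj["emb"])
    return w

t1 = {"emb": np.array([1.0, 0.0]), "start": 0, "end": 4, "split_id": 0}
t2 = {"emb": np.array([1.0, 0.0]), "start": 5, "end": 9, "split_id": 1}
t3 = {"emb": np.array([0.0, 1.0]), "start": 3, "end": 7, "split_id": 2}
w = build_w_long([t1, t2, t3])
```

Here t1 and t2 are compatible (distance 0.0, identical embeddings), while t3 overlaps both in time and so can never be merged with either.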

SLIDE 8

What Happened in Each Step of Appearance Training

J(H1, H2) denotes the Jaccard index of two normalized histograms H1 and H2, compared here between the intra-frame and intra-short-tracklet affinity histograms:

  (1) after training on the training set only;
  (2) after intra-frame training on the test set without labels;
  (3) after inter-short-tracklet training on the test set with pseudo-labels.

[Figure: affinity histograms of intra-frame instance masks vs. intra-short-tracklet instance masks at each of the three stages]
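One common bin-wise definition of the Jaccard index for histograms is the sum of minima over the sum of maxima; the slide does not spell out the formula, so this sketch is an assumption:

```python
import numpy as np

def jaccard_histograms(h1: np.ndarray, h2: np.ndarray) -> float:
    """Generalized Jaccard index of two histograms, after normalization.

    Returns 1.0 for identical distributions and 0.0 for disjoint ones,
    so a smaller J means the two affinity distributions overlap less.
    """
    h1 = h1 / h1.sum()                      # normalize to unit mass
    h2 = h2 / h2.sum()
    return np.minimum(h1, h2).sum() / np.maximum(h1, h2).sum()
```

Under this reading, a shrinking J(H1, H2) across steps (1)–(3) would indicate that positive-pair and negative-pair affinities become easier to separate.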

SLIDE 9

Merging Short-term Tracklets

Short-term tracklets t1 ... t8 are merged into long-term tracklets by hierarchical clustering:

  • The refined appearance encoder embeds each short-term tracklet; pairwise cosine similarities give the distance matrix Wlong (if same split tracklet ID: set inf; elif temporally overlapping: set inf; else: set cosine distance).
  • Hierarchical clustering is applied to Wlong, and the dendrogram is cut at an automatically decided threshold: cutting threshold = 2 − θ_app, where θ_app is derived from the statistics of the intra-frame and intra-short-tracklet cosine affinities.
  • Each resulting cluster becomes one long-term tracklet; temporally overlapped short-term tracklets can never fall into the same cluster, since their distance is inf.

[Figure: raw object-instance segmentation → short-term tracker → short-term tracklets t1–t8 → appearance encoder → distance matrix Wlong (e.g., rows (inf, inf, 0.1, 0.4), (inf, inf, 0.5, 0.2), (0.1, 0.5, inf, inf), (0.4, 0.2, inf, inf)) → hierarchical clustering with the cutting threshold → long-term tracklets as clusters over t1–t8]
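The merging step might be sketched with SciPy's hierarchical clustering. Capping inf at a finite `BIG`, the `average` linkage, and the demo cutting threshold of 0.3 are illustrative assumptions, not values from the paper:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

BIG = 10.0  # stand-in for inf; far above any cosine distance

def merge_tracklets(w_long: np.ndarray, cutting_threshold: float):
    """Return one long-term cluster label per short-term tracklet."""
    w = np.minimum(w_long, BIG)              # make forbidden pairs finite
    np.fill_diagonal(w, 0.0)                 # self-distance must be zero
    condensed = squareform(w, checks=False)  # condensed distance vector
    z = linkage(condensed, method="average")
    return fcluster(z, t=cutting_threshold, criterion="distance")

# Distance matrix from the slide (inf replaced by BIG):
w = np.array([[0.0, BIG, 0.1, 0.4],
              [BIG, 0.0, 0.5, 0.2],
              [0.1, 0.5, 0.0, BIG],
              [0.4, 0.2, BIG, 0.0]])
labels = merge_tracklets(w, cutting_threshold=0.3)
```

With this matrix, tracklets 0 and 2 merge (distance 0.1) and tracklets 1 and 3 merge (distance 0.2), while the BIG entries keep the two groups apart below the threshold.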

SLIDE 10

Comparison with others on MOTSChallenge 1

Since our strategy can easily be adapted to other trackers, will other methods also perform better when our appearance encoder and tracklet merging are applied?

[Table: MOTSChallenge 1 results; other methods may benefit from our mask fusion and from our refinement]

SLIDE 11

Limitations of ReMOTS

  1. It is an offline approach. It is worth exploring how to bring it to an online setting.
  2. It is challenging for ReMOTS to handle objects with similar appearance: it works well for persons (who wear different clothes) but is less useful for vehicles (similar textures).
  3. Trajectory is not considered in our short-term tracker, so it may fail to associate fast-moving objects.

[Figure: a slowly moving person with distinctive clothes vs. a fast-moving car with similar appearance]

SLIDE 12

Conclusion

  • Unlabeled target videos can be used to learn better appearance features, but care must be taken not to introduce noise.
  • The suitable hyperparameters for data association may vary from case to case; the statistical information of tracklets can be used to adjust them.
  • It would be worthwhile to bring some insights of ReMOTS to online MOTS.

SLIDE 13

Thanks for listening!