

SLIDE 1: MOTS: Multi-Object Tracking and Segmentation

Visual Computing Institute, Computer Vision

Paul Voigtlaender
RWTH Aachen University

Joint work with M. Krause, A. Ošep, J. Luiten, B. B. G. Sekar, A. Geiger, and B. Leibe

CVPR 2019 main conference poster #122, Wednesday 15:20

SLIDE 2: Motivation

[Figure: the same video sequence over time t, annotated with bounding boxes (top) vs. pixel-level segmentation masks (bottom)]

◮ Many datasets for multi-object tracking are now available
  ◮ MOTChallenges
    ◮ MOT15 [Leal-Taixé et al., 2015]
    ◮ MOT16, MOT17 [Milan et al., 2016]
    ◮ CVPR19 [Dendorfer et al., 2019]
  ◮ KITTI Tracking [Geiger et al., 2012]
  ◮ VisDrone2018 [Zhu et al., 2018]
  ◮ DukeMTMC [Ristani et al., 2016]
  ◮ UA-DETRAC [Wen et al., 2015]

◮ But annotations are only on the bounding box level

SLIDE 3: Motivation

◮ In difficult cases, bounding boxes are a very coarse approximation
◮ Most of the pixels of the bounding box belong to other objects

SLIDE 4: So let there be Annotations

◮ Dense pixel-wise annotations are super expensive...
◮ But we did it!

SLIDE 5: So let there be Annotations

◮ Dense pixel-wise annotations are super expensive...
◮ But we did it!
◮ How?

◮ Semi-automatic annotation procedure

                                      KITTI MOTS           MOTSChallenge
                                      train      val
# Sequences                           12         9         4
# Frames                              5,027      2,981     2,862
# Tracks Pedestrian                   99         68        228
# Masks Pedestrian (total)            8,073      3,347     26,894
# Masks Pedestrian (manually annot.)  1,312      647       3,930
# Tracks Car                          431        151       -
# Masks Car (total)                   18,831     8,068     -
# Masks Car (manually annot.)         1,509      593       -

SLIDE 6: Outline

◮ Semi-automatic Annotation Procedure
◮ Evaluation Measures
◮ TrackR-CNN Baseline Method
◮ Results

SLIDE 7: Semi-automatic Annotation Procedure

◮ Starting point: existing box-level tracking annotations
◮ Fully convolutional network (Box2Seg) converts bounding boxes to segmentation masks

SLIDE 8: Semi-automatic Annotation Procedure

◮ Starting point: dataset with existing box-level tracking annotations

[Figure: annotation workflow. Start: a track with bounding boxes and some polygon annotations; global Box2Seg training on the polygons; for each frame, Box2Seg (eval) segments the bounding boxes into pixel-level object masks; quality assurance picks erroneous masks, for which annotators manually annotate additional polygons; these polygons are used to fine-tune Box2Seg (train), and the loop repeats until the quality standards are reached (End).]

SLIDE 9: Semi-automatic Annotation Procedure

◮ Manual corrections ensure consistent and high quality
◮ Large savings in time
  ◮ KITTI MOTS: only 13% of car boxes / 17% of pedestrian boxes manually annotated
  ◮ MOTSChallenge: 15% of pedestrian boxes manually annotated

SLIDE 10: Evaluation Measures

◮ We consider mask-based variants of the CLEAR MOT metrics [Bernardin and Stiefelhagen, 2008]
◮ Need to establish correspondences between hypothesized and ground-truth objects
  ◮ Box-based tracking: non-trivial, because boxes are allowed to overlap
    ◮ Hungarian matching needed
  ◮ Mask-based: we require disjoint masks!
    ◮ Correspondences are unique and straightforward
    ◮ Hypothesized and ground-truth masks are matched iff mask IoU > 0.5 (see the sketch below)
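Since both the ground-truth and the hypothesized masks are required to be pairwise disjoint, at most one ground-truth mask can overlap a given hypothesis with IoU > 0.5, so a simple greedy scan already yields the unique matching. A minimal sketch (an illustration, not the official evaluation code), assuming boolean NumPy masks of equal shape:

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boolean masks of the same shape."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union > 0 else 0.0

def match_masks(hyps, gts, thresh=0.5):
    """Match hypothesis masks to ground-truth masks at IoU > thresh.

    IoU > 0.5 means the intersection covers more than half of each of
    the two masks, which disjointness makes impossible for two
    ground-truth masks at once, so the first hit is the unique match
    and no Hungarian matching is required.
    """
    matches = {}  # hypothesis index -> ground-truth index
    for i, h in enumerate(hyps):
        for j, g in enumerate(gts):
            if mask_iou(h, g) > thresh:
                matches[i] = j
                break
    return matches
```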

SLIDE 11: Evaluation Measures

◮ MOTSA: Multi-Object Tracking and Segmentation Accuracy

  MOTSA = 1 − (|FN| + |FP| + |IDS|) / |M| = (|TP| − |FP| − |IDS|) / |M|

◮ Like MOTA, but with mask-based IoU instead of box IoU
◮ TP: true positives
◮ FN: false negatives
◮ FP: false positives
◮ IDS: ID switches
◮ M: set of ground-truth segmentation masks
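As an illustration with made-up counts: for |M| = 100 ground-truth masks with |FN| = 10, |FP| = 5, and |IDS| = 2, we have |TP| = |M| − |FN| = 90, and both forms agree: MOTSA = 1 − (10 + 5 + 2)/100 = (90 − 5 − 2)/100 = 0.83.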

SLIDE 12: Evaluation Measures

◮ \widetilde{TP}: soft number of true positives

  \widetilde{TP} = \sum_{h \in TP} IoU(h, c(h))

◮ c: unique mapping from hypotheses to ground truth

SLIDE 13: Evaluation Measures

◮ \widetilde{TP}: soft number of true positives

  \widetilde{TP} = \sum_{h \in TP} IoU(h, c(h))

◮ MOTSP: Multi-Object Tracking and Segmentation Precision

  MOTSP = \widetilde{TP} / |TP|

◮ c: unique mapping from hypotheses to ground truth

SLIDE 14: Evaluation Measures

◮ sMOTSA: Soft Multi-Object Tracking and Segmentation Accuracy

  sMOTSA = (\widetilde{TP} − |FP| − |IDS|) / |M|

◮ Combines tracking and segmentation quality into a single measure (see the sketch below)
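All three measures follow directly from the matched IoU values and the error counts. A hedged sketch (not the official evaluation script), assuming `tp_ious` holds one IoU(h, c(h)) value per true positive:

```python
import numpy as np

def mots_metrics(tp_ious, num_fp, num_ids, num_gt_masks):
    """Compute MOTSA, MOTSP and sMOTSA from per-match IoUs and error counts."""
    num_tp = len(tp_ious)
    soft_tp = float(np.sum(tp_ious))  # soft number of true positives
    motsa = (num_tp - num_fp - num_ids) / num_gt_masks
    motsp = soft_tp / num_tp if num_tp > 0 else 0.0
    smotsa = (soft_tp - num_fp - num_ids) / num_gt_masks
    return {"MOTSA": motsa, "MOTSP": motsp, "sMOTSA": smotsa}
```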

SLIDE 15: Baseline Method: TrackR-CNN

◮ Idea: detection, segmentation, and data association with a single convolutional network
◮ Extend Mask R-CNN by 3D convolutions and an association head
◮ ResNet-101 backbone, Mask R-CNN pre-trained on Mapillary
◮ Speed: ∼2 fps

[Figure: TrackR-CNN architecture. Feature extraction with shared weights on frames t-1, t, t+1; two 3D convolution layers (2x 3D Conv) turn the image features into temporally enhanced image features; a Region Proposal Network feeds bounding box regression, classification + scoring (e.g., CAR: 0.99), mask generation, and 128-D association vectors. During training, losses come from image instance segmentation ground truth and video tracking ground truth; during evaluation, detections are associated online with previously tracked objects via the association embedding.]
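The temporal component on its own is small. A minimal PyTorch sketch of the "2x 3D Conv" block; the channel count and kernel sizes here are illustrative assumptions, not values stated on the slides:

```python
import torch
import torch.nn as nn

class TemporalComponent(nn.Module):
    """Two 3D convolutions over stacked per-frame backbone features."""

    def __init__(self, channels: int = 256):  # channel count is a placeholder
        super().__init__()
        # 3x3x3 kernels with padding 1 keep (time, height, width) unchanged.
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels, time, height, width) image features;
        # output: temporally enhanced features of the same shape.
        return self.relu(self.conv2(self.relu(self.conv1(feats))))
```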

SLIDE 16: TrackR-CNN

[Figure: detail of the temporal component. Feature extraction with shared weights on frames t-1, t, t+1 yields image features; the 2x 3D Conv layers produce temporally enhanced image features, which feed the Region Proposal Network.]

SLIDE 17: TrackR-CNN

[Figure: detail of the heads. On top of the Region Proposal Network sit bounding box regression, classification + scoring (e.g., CAR: 0.99), mask generation, and 128-D association vectors. During training, losses are computed against image instance segmentation ground truth and video tracking ground truth; during evaluation, the association embedding links detections online to previously tracked objects.]

SLIDE 18: Association Head

◮ Predict a Re-ID association vector for each detection
◮ Detections of the same instance should be close in embedding space
◮ Detections of distinct instances should be far away in embedding space
◮ Learned using the batch-hard triplet loss [Hermans et al., 2017]
◮ Associate detections over time using Euclidean distance + Hungarian matching (very simple; see the sketch below)
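A minimal sketch of this association step, assuming one (N, D) array of embedding vectors per frame; the distance threshold `gamma` for rejecting matches (and starting new tracks) is a hypothetical parameter, not a value from the slides:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev_emb: np.ndarray, cur_emb: np.ndarray, gamma: float = 1.0):
    """Match current detections to tracked objects by embedding distance.

    prev_emb: (N, D) association vectors of previously tracked objects.
    cur_emb:  (M, D) association vectors of current detections.
    Returns (prev_idx, cur_idx) pairs; unmatched detections start new tracks.
    """
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(prev_emb[:, None, :] - cur_emb[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(dists)  # Hungarian matching
    return [(r, c) for r, c in zip(rows, cols) if dists[r, c] < gamma]
```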

SLIDE 19: Association Head: Embedding Visualization

SLIDE 20: Qualitative Results: TrackR-CNN

SLIDE 21: Qualitative Comparison

◮ (a), (c): TrackR-CNN trained with boxes only + mask generation by Mask R-CNN
◮ (b), (d): TrackR-CNN trained with segmentation masks on KITTI MOTS

[Figure: qualitative comparison, panels (a)-(d)]

SLIDE 22: Quantitative Results (KITTI MOTS)

                         sMOTSA          MOTSA           MOTSP
                         Car     Ped     Car     Ped     Car     Ped
TrackR-CNN (ours)        76.2    46.8    87.8    65.1    87.2    75.7

SLIDE 23: Quantitative Results (KITTI MOTS)

                         sMOTSA          MOTSA           MOTSP
                         Car     Ped     Car     Ped     Car     Ped
TrackR-CNN (ours)        76.2    46.8    87.8    65.1    87.2    75.7
TrackR-CNN (box) + MG    75.0    41.2    87.0    57.9    86.8    76.3

◮ +MG: mask generation from bounding boxes using Mask R-CNN
◮ TrackR-CNN (box) + MG: training using box-based tracking data with post-hoc mask generation

SLIDE 24: Quantitative Results (KITTI MOTS)

                         sMOTSA          MOTSA           MOTSP
                         Car     Ped     Car     Ped     Car     Ped
TrackR-CNN (ours)        76.2    46.8    87.8    65.1    87.2    75.7
TrackR-CNN (box) + MG    75.0    41.2    87.0    57.9    86.8    76.3
Mask R-CNN + maskprop    75.1    45.0    86.6    63.5    87.1    75.6

◮ +MG: mask generation from bounding boxes using Mask R-CNN
◮ TrackR-CNN (box) + MG: training using box-based tracking data with post-hoc mask generation
◮ Mask R-CNN + maskprop: training using instance segmentation data only

SLIDE 25: Quantitative Results (KITTI MOTS)

                         sMOTSA          MOTSA           MOTSP
                         Car     Ped     Car     Ped     Car     Ped
TrackR-CNN (ours)        76.2    46.8    87.8    65.1    87.2    75.7
TrackR-CNN (box) + MG    75.0    41.2    87.0    57.9    86.8    76.3
Mask R-CNN + maskprop    75.1    45.0    86.6    63.5    87.1    75.6
GT Boxes + MG            82.5    50.0    95.3    71.1    86.9    75.4

◮ +MG: mask generation from bounding boxes using Mask R-CNN
◮ TrackR-CNN (box) + MG: training using box-based tracking data with post-hoc mask generation
◮ Mask R-CNN + maskprop: training using instance segmentation data only
◮ GT Boxes + MG: segment ground-truth bounding boxes

SLIDE 26: Quantitative Results (MOTSChallenge)

                                    sMOTSA   MOTSA   MOTSP
TrackR-CNN (ours)                   52.7     66.9    80.2
MHT-DAM [Kim et al., 2015] + MG     48.0     62.7    79.8
FWT [Henschel et al., 2018] + MG    49.3     64.0    79.7
MOTDT [Long et al., 2018] + MG      47.8     61.1    80.0
jCC [Keuper et al., 2018] + MG      48.3     63.0    79.9
GT Boxes + MG                       55.8     74.5    78.6

◮ +MG: mask generation from bounding boxes using Mask R-CNN
◮ MOTS is hard, even when given perfect ground-truth bounding boxes!
◮ Segmenting pedestrians in a crowd is difficult!

SLIDE 27: TrackR-CNN Ablations: Temporal Component

Temporal component    sMOTSA          MOTSA           MOTSP
                      Car     Ped     Car     Ped     Car     Ped
1x Conv3D             76.1    46.3    87.8    64.5    87.1    75.7
2x Conv3D             76.2    46.8    87.8    65.1    87.2    75.7
1x ConvLSTM           75.7    45.0    87.3    63.4    87.2    75.6
2x ConvLSTM           76.1    44.8    87.9    63.3    87.0    75.2
None                  76.4    44.8    87.9    63.2    87.3    75.5

◮ Conv3D shows improvements for pedestrians
◮ But the overall effect is limited
◮ Need better ways to incorporate temporal context

SLIDE 28: TrackR-CNN Ablations: Association Mechanism

Association mechanism          sMOTSA          MOTSA           MOTSP
                               Car     Ped     Car     Ped     Car     Ped
Association head               76.2    46.8    87.8    65.1    87.2    75.7
Mask IoU                       75.5    46.1    87.1    64.4    87.2    75.7
Mask IoU (train w/o assoc.)    74.9    44.9    86.5    63.3    87.1    75.6
Bbox IoU                       75.4    45.9    87.0    64.3    87.2    75.7
Bbox Center                    74.3    43.3    86.0    61.7    87.2    75.7

◮ Mask IoU: warp mask using optical flow into the next frame and associate based on mask IoU (a sketch of this variant follows)
◮ Bbox IoU: warp bounding box using median optical flow and associate based on box IoU
◮ Bbox Center: associate based on (unwarped) bounding box center distance
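A hedged sketch of the Mask IoU variant (an illustration, not the exact implementation): warp each previous-frame mask into the current frame with a dense optical flow field of shape (H, W, 2), then solve the assignment on mask IoU; `min_iou` is an assumed rejection threshold:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def warp_mask(mask: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Forward-warp a boolean mask with a dense flow field of shape (H, W, 2)."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    xs2 = np.clip(np.round(xs + flow[ys, xs, 0]).astype(int), 0, w - 1)
    ys2 = np.clip(np.round(ys + flow[ys, xs, 1]).astype(int), 0, h - 1)
    warped = np.zeros_like(mask)
    warped[ys2, xs2] = True
    return warped

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union > 0 else 0.0

def associate_by_mask_iou(prev_masks, flow, cur_masks, min_iou=0.5):
    """Hungarian matching on IoU between warped previous and current masks."""
    if not prev_masks or not cur_masks:
        return []
    warped = [warp_mask(m, flow) for m in prev_masks]
    cost = np.array([[1.0 - mask_iou(w, c) for c in cur_masks] for w in warped])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= min_iou]
```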

SLIDE 29: Conclusion

◮ MOTS: new task, new annotations, metrics, and baseline
◮ Training benefits from time-consistent instance segmentations compared to
  ◮ Single-image instance segmentations
  ◮ Box-based tracking data
◮ CVPR 2019 main conference poster #122, Wednesday 15:20
◮ Get the new annotations and our code now at https://www.vision.rwth-aachen.de/page/mots
◮ KITTI MOTS test-set evaluation server coming soon!
◮ MOTSChallenge test-set evaluation server planned!

SLIDE 30: References I

Bernardin, K. and Stiefelhagen, R. (2008). Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP Journal on Image and Video Processing.

Dendorfer, P., Rezatofighi, H., Milan, A., Shi, J., Cremers, D., Reid, I., Roth, S., Schindler, K., and Leal-Taixé, L. (2019). CVPR19 tracking and detection challenge: How crowded can it get? arXiv:1906.04567.

Geiger, A., Lenz, P., and Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR.

SLIDE 31: References II

Henschel, R., Leal-Taixé, L., Cremers, D., and Rosenhahn, B. (2018). Fusion of head and full-body detectors for multi-object tracking. In CVPRW.

Hermans, A., Beyer, L., and Leibe, B. (2017). In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737.

Keuper, M., Tang, S., Andres, B., Brox, T., and Schiele, B. (2018). Motion segmentation & multiple object tracking by correlation co-clustering. PAMI.

SLIDE 32: References III

Kim, C., Li, F., Ciptadi, A., and Rehg, J. M. (2015). Multiple hypothesis tracking revisited. In ICCV.

Leal-Taixé, L., Milan, A., Reid, I., Roth, S., and Schindler, K. (2015). MOTChallenge 2015: Towards a benchmark for multi-target tracking. arXiv preprint arXiv:1504.01942.

Long, C., Haizhou, A., Zijie, Z., and Chong, S. (2018). Real-time multiple people tracking with deeply learned candidate selection and person re-identification. In ICME.

SLIDE 33: References IV

Milan, A., Leal-Taixé, L., Reid, I., Roth, S., and Schindler, K. (2016). MOT16: A benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831.

Ristani, E., Solera, F., Zou, R., Cucchiara, R., and Tomasi, C. (2016). Performance measures and a data set for multi-target, multi-camera tracking. In European Conference on Computer Vision Workshop on Benchmarking Multi-Target Tracking.

Wen, L., Du, D., Cai, Z., Lei, Z., Chang, M., Qi, H., Lim, J., Yang, M., and Lyu, S. (2015). UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking. arXiv preprint arXiv:1511.04136.

SLIDE 34: References V

Zhu, P., Wen, L., Bian, X., Haibin, L., and Hu, Q. (2018). Vision meets drones: A challenge. arXiv preprint arXiv:1804.07437.
