3D Multi-Object Tracking for Autonomous Driving
Xinshuo Weng, Kris Kitani
June 15, 2020

3D multi-object tracking is an important perception task for autonomous driving

Standard 3D MOT Pipeline
Sensor Data (LiDAR, RGB) → 3D Object Detection (detection results) → Data Association (tracking results) → Evaluation
Evaluation (also important!):
1. MOTA: MOT accuracy
2. MOTP: MOT precision
3. IDS: # of identity switches
4. FRAG: # of trajectory fragments
5. ……

What is the state of the art?

State of the Art (3D MOT)
• Better models from better (bigger) data!
• 150x increase!
* Mined trajectory data not counted for the Argo dataset

State of the Art (3D MOT)
• Monocular 3D Detection (KITTI): 15x increase in AP (3 years)
Image credit to Patrick Langechuan Liu, https://towardsdatascience.com/monocular-3d-object-detection-in-autonomous-driving-2476a3c7f57e

State of the Art (3D MOT)
• Lidar-based 3D Detection (KITTI): 27% increase (2 years)

State of the Art (3D MOT)
• 2D MOT (KITTI): 18% increase (4 years)
* 3D methods compared using 2D evaluation on KITTI

State of the Art (3D MOT)
Pipeline: Sensor Data → 3D Object Detection → Feature Extraction → Optimization → Evaluation
• Recent trend: jointly optimized
• D. Frossard, R. Urtasun. End-to-End Learning of Multi-Sensor 3D Tracking by Detection. ICRA 2018.
• Zhang et al. Robust Multi-Modality Multi-Object Tracking. ICCV 2019.

What are open problems in 3D MOT?

Some Open Problems (3D MOT)
• Sensor Data: many large-scale datasets, but sensor suites and annotations are not unified
• 3D Object Detection: 3D detection performance is improving but doesn't take into account sensor physics; it should also take into account sensor optimization and redundancy
• Feature Extraction: detection and tracking should be coupled more tightly; the representation doesn't take into account the context of other objects and the scene (this talk)
• Optimization: doesn't take into account the context of the multi-level optimization problem (sensors, forecasting, control)
• Evaluation: weak 3D MOT evaluation datasets and metrics (this talk)

Recent Work on Evaluation

What are the Issues of Evaluation?
• IoU (intersection over union)
• For the pioneering 3D MOT dataset KITTI, evaluation is done in 2D
  • IoU is computed on the 2D image plane (not in 3D)
• The common practice for evaluating 3D MOT methods is:
  • First project the 3D trajectories onto the image plane
  • Then run the 2D evaluation code provided by KITTI
Figure: IoU in 2D space vs. IoU in 3D space. B_p: the predicted box; B_g: the ground truth box; B_c: the smallest enclosing box; I_2D, I_3D: the intersection. Image credit to Xu et al., 3D-GIoU.
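To make the difference between the two IoU variants concrete, here is a minimal sketch that computes IoU for axis-aligned boxes in either 2D or 3D. It is an illustration only: real KITTI boxes carry a heading angle, which this sketch ignores, and the function names and box values are hypothetical rather than taken from the KITTI evaluation code.

```python
def interval_overlap(a_min, a_max, b_min, b_max):
    """Length of the overlap between two 1D intervals (0 if they are disjoint)."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))


def iou_axis_aligned(box_p, box_g):
    """IoU of two axis-aligned boxes, each given as (mins, maxs).

    Works for 2D boxes (two coordinates) and 3D boxes (three coordinates).
    Assumption: boxes have no rotation, unlike real 3D detections.
    """
    (p_min, p_max), (g_min, g_max) = box_p, box_g
    inter, size_p, size_g = 1.0, 1.0, 1.0
    for d in range(len(p_min)):
        inter *= interval_overlap(p_min[d], p_max[d], g_min[d], g_max[d])
        size_p *= p_max[d] - p_min[d]
        size_g *= g_max[d] - g_min[d]
    union = size_p + size_g - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: the first two coordinates are identical in the 2D and 3D
# cases, but in 3D the ground truth is also offset along the third axis.
iou_2d = iou_axis_aligned(((0, 0), (2, 2)), ((1, 0), (3, 2)))
iou_3d = iou_axis_aligned(((0, 0, 0), (2, 2, 2)), ((1, 0, 1), (3, 2, 3)))
print(iou_2d, iou_3d)
```

With these values the 2D IoU is 1/3 while the 3D IoU drops to 1/7, because the offset along the third axis is only visible to the 3D computation.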

What are the Issues of Evaluation?
• Why is it not good to evaluate 3D MOT methods in 2D space?
  • It cannot demonstrate the strength of 3D MOT methods
    • It throws away the extra information (e.g., depth value, length of the object, heading orientation)
  • It cannot fairly compare 3D MOT methods. Why?
    • A method is not penalized for a wrongly predicted depth value, length, or heading as long as the 2D projection is good
    • Which predicted box is better, blue or green?
• Conclusion: 2D metrics should not be used to evaluate 3D MOT methods
Figure: blue: predicted box 1; green: predicted box 2; red: the ground truth box.
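The point that a wrong depth goes unpenalized in 2D can be checked with a small sketch. It assumes an idealized pinhole camera (hypothetical focal length, principal point at the origin) and axis-aligned boxes in camera coordinates; none of these names or values come from the slides. The prediction is the ground-truth box scaled by 2 about the camera center, so it sits twice as far away and is twice as large.

```python
from itertools import product


def corners(center, size):
    """Eight corners of an axis-aligned box (x right, y down, z forward)."""
    return [tuple(c + sign * s / 2.0 for c, sign, s in zip(center, signs, size))
            for signs in product((-1, 1), repeat=3)]


def project_to_image_box(center, size, focal=1000.0):
    """Pinhole projection of all corners, returned as (u_min, v_min, u_max, v_max)."""
    us = [focal * x / z for x, _, z in corners(center, size)]
    vs = [focal * y / z for _, y, z in corners(center, size)]
    return min(us), min(vs), max(us), max(vs)


def iou_2d(a, b):
    """IoU of two image-plane boxes given as (u_min, v_min, u_max, v_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def iou_3d(center_a, size_a, center_b, size_b):
    """IoU of two axis-aligned 3D boxes given by center and size."""
    inter, vol_a, vol_b = 1.0, 1.0, 1.0
    for ca, sa, cb, sb in zip(center_a, size_a, center_b, size_b):
        lo = max(ca - sa / 2.0, cb - sb / 2.0)
        hi = min(ca + sa / 2.0, cb + sb / 2.0)
        inter *= max(0.0, hi - lo)
        vol_a *= sa
        vol_b *= sb
    return inter / (vol_a + vol_b - inter)


# Hypothetical ground truth 20 m ahead, and a prediction scaled by 2 about the camera.
gt_center, gt_size = (0.0, 0.0, 20.0), (2.0, 1.5, 4.0)
pred_center = tuple(2.0 * c for c in gt_center)
pred_size = tuple(2.0 * s for s in gt_size)

score_2d = iou_2d(project_to_image_box(gt_center, gt_size),
                  project_to_image_box(pred_center, pred_size))
score_3d = iou_3d(gt_center, gt_size, pred_center, pred_size)
print(score_2d, score_3d)
```

Under these assumptions the two boxes project to the same image rectangle, so the 2D IoU is perfect, while the boxes do not overlap at all in 3D.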

Our Solution: Upgrade the Metrics Using 3D IoU
• Replace the 2D IoU used by the metrics in the KITTI evaluation code with 3D IoU
  • https://github.com/xinshuoweng/AB3DMOT (~800 stars)
• Work with nuTonomy collaborators to use our 3D metrics in the nuScenes evaluation
  • https://www.nuscenes.org/
Figures: nuScenes 3D MOT evaluation with our metrics; our released new evaluation code.
X. Weng, K. Kitani. A Baseline for 3D Multi-Object Tracking. arXiv 2019.

What are the Issues of Evaluation?
• Are we done with the evaluation? Can we further improve the current metrics?
• E.g., MOTA (multi-object tracking accuracy)
  • Performance is measured at a single recall point
• Common practice
  • Select a confidence threshold, e.g., 0.9
  • Filter out detections with lower confidence
  • Perform data association on the remaining detections
Figure: MOTA over Recall curve.
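The common practice above can be sketched in a few lines. The detection records, field names, and error counts below are hypothetical, and the MOTA function uses the standard CLEAR MOT definition, 1 - (FP + FN + IDS) / number of ground-truth objects, rather than any code from the talk.

```python
def filter_by_confidence(detections, threshold=0.9):
    """Keep only detections whose confidence score reaches the threshold."""
    return [det for det in detections if det["score"] >= threshold]


def mota(num_false_positives, num_misses, num_id_switches, num_ground_truth):
    """CLEAR MOT accuracy: 1 - (FP + FN + IDS) / number of ground-truth objects."""
    errors = num_false_positives + num_misses + num_id_switches
    return 1.0 - errors / num_ground_truth


# Hypothetical detections for one frame; only "score" matters for the filter.
detections = [
    {"id": "a", "score": 0.95},
    {"id": "b", "score": 0.92},
    {"id": "c", "score": 0.60},
    {"id": "d", "score": 0.30},
]
kept = filter_by_confidence(detections, threshold=0.9)

# Hypothetical error counts accumulated over a sequence after data association.
score = mota(num_false_positives=5, num_misses=10, num_id_switches=1,
             num_ground_truth=100)
print([det["id"] for det in kept], score)
```

Because the threshold fixes which detections survive, it also fixes the recall, which is why a single MOTA number describes only one point on the MOTA over Recall curve.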

What are the Issues of Evaluation?
• Why is it not good to evaluate at a single recall point?
• Consequences
  • The confidence threshold needs to be carefully tuned, which takes non-trivial effort
  • Cannot understand the full spectrum of accuracy and precision of a MOT system
• Which MOT system is better, blue or orange?
  • The orange one has higher MOTA at its best recall point (r = 0.9)
  • The blue one has overall higher MOTA at many recall points
• Ideally, we want high performance at all recall points
Figure: MOTA over Recall curve for 3D MOT system 1 and 3D MOT system 2 (x-axis: Recall, 0 to 1; y-axis: MOTA, 0 to 1).

Our Solution: Integral Metrics
• MOTA does not take the confidence into account
• What do we do to improve the evaluation?
  • Compute integral metrics from the area under the curve, e.g., average MOTA (AMOTA)
  • Analogous to average precision (AP) in object detection
  • Can now model the full spectrum of MOT accuracy
Figure: MOTA over Recall curve, annotated with the area under the curve.
X. Weng, K. Kitani. A Baseline for 3D Multi-Object Tracking. arXiv 2019.
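A minimal sketch of the integral idea, assuming AMOTA is approximated as the mean of MOTA sampled at L evenly spaced recall values (here L = 10, chosen only to keep the example short). The two curves are made-up numbers shaped like the blue and orange systems in the earlier comparison; they are not results from the paper.

```python
def average_mota(mota_by_recall):
    """Mean of MOTA values sampled at evenly spaced recall points.

    This approximates the area under the MOTA over Recall curve.
    """
    return sum(mota_by_recall) / len(mota_by_recall)


# Recall points 0.1, 0.2, ..., 1.0 (assumed sampling, for illustration only).
recalls = [i / 10 for i in range(1, 11)]

# Hypothetical MOTA values at those recall points.
system_blue = [0.10, 0.20, 0.30, 0.40, 0.50, 0.55, 0.60, 0.62, 0.60, 0.50]
system_orange = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.55, 0.70, 0.40]

best_blue, best_orange = max(system_blue), max(system_orange)
amota_blue, amota_orange = average_mota(system_blue), average_mota(system_orange)
print(best_blue, best_orange, amota_blue, amota_orange)
```

In this made-up case the orange system wins on its single best recall point (r = 0.9), while the blue system wins on the averaged metric, which is the kind of difference a single-threshold evaluation hides.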

Recent Work on Improving Feature Learning for 3D MOT

What are the Issues of Feature Learning?
• Goal: learn discriminative features for different objects
• Issues in feature learning?
  • Feature extraction for each object is independent of other objects
    • Why is this not good? There is no communication between objects, so context information is ignored
  • Features come from only one or two modalities
    • E.g., 2D appearance, or 2D motion, or 3D motion, or 3D appearance
    • Why is this not good? Not all of the complementary information is used
Figure: pipeline from prior work. Objects in frame t and objects in frame t+1 each pass through a 2D (or 3D) feature extractor; the resulting features form an affinity matrix, which is passed to the Hungarian algorithm.
X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
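The prior-work pipeline in the figure (independent per-object features, an affinity matrix, then a one-to-one matching) can be sketched as follows. Two assumptions are made for illustration: affinity is taken to be cosine similarity between hypothetical feature vectors, and the matching is found by exhaustive search over permutations instead of the Hungarian algorithm. For small square matrices the exhaustive search returns the same optimal matching; in practice a solver such as `scipy.optimize.linear_sum_assignment` would be used.

```python
import math
from itertools import permutations


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def affinity_matrix(feats_t, feats_t1):
    """Pairwise affinity between objects in frame t (rows) and frame t+1 (columns)."""
    return [[cosine_similarity(f, g) for g in feats_t1] for f in feats_t]


def best_assignment(affinity):
    """One-to-one matching with maximum total affinity, by exhaustive search.

    Assumes the same number of objects in both frames (square matrix).
    """
    n = len(affinity)
    best = max(permutations(range(n)),
               key=lambda perm: sum(affinity[i][perm[i]] for i in range(n)))
    return list(enumerate(best))


# Hypothetical features, each extracted independently per object.
feats_t = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feats_t1 = [[0.1, 0.9], [0.9, 0.8], [0.9, 0.1]]

matches = best_assignment(affinity_matrix(feats_t, feats_t1))
print(matches)
```

Each pair in `matches` links an object index in frame t to an object index in frame t+1. Note that every row of the affinity matrix depends only on that object's own feature, which is the independence issue raised above.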

Improve Feature Learning for 3D MOT
• How can we address these two issues?
• Shouldn't features depend on the context of other objects?
  • Propose a novel feature interaction mechanism
• How can we utilize the information from all the modalities?
  • Extract multi-modal features that are complementary to each other
  • I.e., 2D motion + 2D appearance + 3D motion + 3D appearance
Figure: pipeline from our work. It combines 2D + 3D feature extractors for objects in frame t and frame t+1, feature interaction (applied iteratively), an affinity matrix, and the Hungarian algorithm.
X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
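As a rough illustration of the two ideas on this slide, the sketch below concatenates four per-object feature vectors and then runs one round of a very simple interaction step, in which every object mixes its feature with the mean feature of the objects in the other frame. This mean-aggregation rule, the mixing weight, and all names and values are assumptions made for illustration; it is not the graph neural network used in GNN3DMOT.

```python
def concat_features(appearance_2d, motion_2d, appearance_3d, motion_3d):
    """Join the four modality features of one object into a single vector."""
    return list(appearance_2d) + list(motion_2d) + list(appearance_3d) + list(motion_3d)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def interact(feats_t, feats_t1, weight=0.5):
    """One interaction round across frames (hypothetical mean aggregation).

    Each object keeps (1 - weight) of its own feature and takes `weight`
    of the mean feature of the objects in the other frame.
    """
    mean_t, mean_t1 = mean_vector(feats_t), mean_vector(feats_t1)
    new_t = [[(1 - weight) * a + weight * m for a, m in zip(f, mean_t1)]
             for f in feats_t]
    new_t1 = [[(1 - weight) * a + weight * m for a, m in zip(f, mean_t)]
              for f in feats_t1]
    return new_t, new_t1


# Hypothetical one-dimensional features per modality, two objects in frame t
# and one object in frame t+1.
feats_t = [concat_features([1.0], [0.0], [1.0], [0.0]),
           concat_features([0.0], [1.0], [0.0], [1.0])]
feats_t1 = [concat_features([1.0], [0.0], [1.0], [0.0])]

new_t, new_t1 = interact(feats_t, feats_t1)
print(new_t, new_t1)
```

Calling `interact` repeatedly on its own output corresponds to applying the interaction iteratively; after each round an object's feature reflects the objects in the other frame, rather than being computed in isolation.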

Improve Feature Learning for 3D MOT
• Is encoding the multi-modal features really useful?
  • Answer: Yes
  • We should encode different features so that they can complement each other
Figure: results using features from a single modality vs. features from multiple modalities: performance increased! A: appearance feature, M: motion feature.
X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
