3D Multi-Object Tracking for Autonomous Driving
Xinshuo Weng
Robotics Institute, Carnegie Mellon University
RI PhD Speaking Qualifier
September 24, 2020
Committee Members: Kris Kitani (advisor), Martial Hebert, David Held, Peiyun Hu
1
2
3
3D Object Detection Data Association
Evaluation Sensor Data
4
LiDAR RGB
5
Detection results
6
Tracking results
7
Also important!
Evaluation:
fragments
8
9
10
* Mined trajectory data not counted for the Argo dataset
150x increase!
11
Image credit to Patrick Langechuan Liu, https://towardsdatascience.com/monocular-3d-object-detection-in-autonomous-driving-2476a3c7f57e
AP
15x increase (3 years)
12
27% increase (2 years)
13
18% increase (5 years)
14
3D Object Detection Data Association
Evaluation Sensor Data
Feature Extraction Optimization
Recent trend: feature extraction and optimization are jointly optimized
Frossard and Urtasun. End-to-End Learning of Multi-Sensor 3D Tracking by Detection. ICRA 2018.
Zhang et al. Robust Multi-Modality Multi-Object Tracking. ICCV 2019.
15
3D Object Detection Evaluation
Sensor Data
Feature Extraction Optimization
16
Many large-scale datasets but sensor suite and annotations are not unified
3D detection performance is improving but doesn't take into account sensor physics
Doesn't take into account context of multi-level optimization problem (sensors, forecasting, control)
Representation doesn't take into account context of other objects and the scene
Weak 3D MOT evaluation datasets and metrics
Should also take into account sensor
Detection and tracking should be coupled more tightly
This talk
17
18
IoU in 2D space
Image credit to Xu et al: 3D-GIoU
IoU in 3D space
Bp: the predicted box
Bg: the ground truth box
Bc: the smallest enclosing box
I2D, I3D: the intersection in 2D and in 3D
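With U denoting the union volume of Bp and Bg, the quantities above combine as IoU = I3D / U and GIoU = IoU - (|Bc| - U) / |Bc|. The sketch below is a minimal illustration for axis-aligned 3D boxes only. Real 3D detection boxes are usually rotated about the vertical axis, which needs a polygon intersection step that is omitted here. The box layout (xmin, ymin, zmin, xmax, ymax, zmax) and the function names are assumptions made for this example, not part of the released evaluation code.

```python
def box_volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    dx = max(0.0, box[3] - box[0])
    dy = max(0.0, box[4] - box[1])
    dz = max(0.0, box[5] - box[2])
    return dx * dy * dz


def iou_and_giou_3d(bp, bg):
    """Return (IoU, GIoU) for predicted box bp and ground truth box bg.

    Assumes both boxes are axis-aligned and have positive volume.
    """
    # I3D: product of the per-axis overlaps (zero if the boxes are disjoint).
    inter = 1.0
    for axis in range(3):
        lo = max(bp[axis], bg[axis])
        hi = min(bp[axis + 3], bg[axis + 3])
        inter *= max(0.0, hi - lo)
    union = box_volume(bp) + box_volume(bg) - inter
    # Bc: the smallest axis-aligned box enclosing both bp and bg.
    bc = [min(bp[i], bg[i]) for i in range(3)]
    bc += [max(bp[i], bg[i]) for i in range(3, 6)]
    enclosing = box_volume(bc)
    iou = inter / union
    giou = iou - (enclosing - union) / enclosing
    return iou, giou


ground_truth = (0, 0, 0, 1, 1, 1)
near_miss = (2, 0, 0, 3, 1, 1)   # hypothetical prediction close to the object
far_miss = (5, 0, 0, 6, 1, 1)    # hypothetical prediction far from the object
print(iou_and_giou_3d(near_miss, ground_truth))
print(iou_and_giou_3d(far_miss, ground_truth))
```

Both hypothetical predictions have zero IoU with the ground truth, yet GIoU is lower for the far one, so GIoU still ranks non-overlapping boxes by how far off they are.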
19
Blue: the predicted box 1
Green: the predicted box 2
Red: the ground truth box
20
nuScenes evaluation with the matching criteria of center distance
Our released new evaluation code
nuScenes 3D MOT evaluation with our metrics
improve the current metrics?
21
[Plot: MOTA (y-axis) versus Recall (x-axis), both from 0.1 to 1, for 3D MOT system 1 and 3D MOT system 2]
22
MOTA over Recall curve
curve, e.g., average MOTA (AMOTA)
23
Area under the curve
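One way to read "area under the curve" is to evaluate MOTA at evenly spaced recall values and average the results. The sketch below assumes the standard definition MOTA = 1 - (FP + FN + IDS) / num_gt and approximates the area by the mean over the sampled operating points. The counts are made-up illustration values, and the function names are hypothetical rather than taken from the released evaluation code.

```python
def mota(num_fp, num_fn, num_ids, num_gt):
    """MOTA = 1 - (false positives + misses + identity switches) / num_gt."""
    return 1.0 - (num_fp + num_fn + num_ids) / num_gt


def average_mota(operating_points, num_gt):
    """Mean MOTA over operating points sampled at evenly spaced recall values.

    operating_points: list of (num_fp, num_fn, num_ids), one entry per
    recall value (for example, one per confidence threshold).
    """
    values = [mota(fp, fn, ids, num_gt) for fp, fn, ids in operating_points]
    return sum(values) / len(values)


# Hypothetical counts at three recall levels for a sequence with 100 objects.
points = [(0, 80, 0), (5, 50, 1), (20, 20, 2)]
print(average_mota(points, num_gt=100))
```

A single MOTA value depends on the chosen confidence threshold, while the averaged value summarizes the tracker over the whole recall range.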
24
25
Pipeline from prior work
Objects in frame t, objects in frame t+1
2D (or 3D) feature extractor (one per frame)
Affinity matrix
Hungarian algorithm
3D Object Detector Feature Extractor
Optimizer Evaluation
Sensor Data
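The last step of this pipeline, turning an affinity matrix into one-to-one matches, can be sketched as follows. This is a compact textbook implementation of the Hungarian algorithm in its potentials form, written in plain Python for illustration. The function names, the `min_affinity` gating threshold, and the example matrix are assumptions for this sketch, not the code used in the work presented here.

```python
def hungarian_min_cost(cost):
    """Minimum-cost assignment for a matrix with rows <= columns.

    Returns a sorted list of (row, column) pairs, one per row.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)       # row potentials
    v = [0.0] * (m + 1)       # column potentials
    match = [0] * (m + 1)     # match[j]: 1-based row assigned to column j
    way = [0] * (m + 1)       # back-pointers along the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        # Flip the matches along the augmenting path back to column 0.
        while j0 != 0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    return sorted((match[j] - 1, j - 1) for j in range(1, m + 1) if match[j])


def associate(affinity, min_affinity=0.0):
    """Match objects in frame t (rows) to objects in frame t+1 (columns)."""
    rows, cols = len(affinity), len(affinity[0])
    # Maximizing affinity equals minimizing negated affinity.
    if rows <= cols:
        cost = [[-a for a in row] for row in affinity]
        pairs = hungarian_min_cost(cost)
    else:
        # More rows than columns: solve the transposed problem.
        cost = [[-affinity[r][c] for r in range(rows)] for c in range(cols)]
        pairs = sorted((r, c) for c, r in hungarian_min_cost(cost))
    # Drop weak matches; the threshold is a hypothetical gating value.
    return [(r, c) for r, c in pairs if affinity[r][c] >= min_affinity]


affinity = [[0.90, 0.80, 0.10],
            [0.85, 0.10, 0.10],
            [0.10, 0.10, 0.70]]
print(associate(affinity))
```

In this example a greedy matcher would take the 0.90 entry first and leave the second object with a poor match, whereas the optimal assignment pairs the objects so that the total affinity is highest.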
26
Pipeline from our work
Objects in frame t, objects in frame t+1
2D + 3D feature extractor (one per frame)
Feature interaction (iteratively)
Affinity matrix
Hungarian algorithm
27
28
29
Graph Neural Networks
30
each other
Using features from a single modality
A: appearance feature, M: motion feature
Using features from multiple modalities: performance increased!
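To make the idea of combining modalities concrete, the sketch below fuses an appearance feature (A) and a motion feature (M) for each object and scores every pair of objects across two frames. Concatenation as the fusion step and cosine similarity as the affinity are stand-ins chosen for this example, and the feature values are made up. The talk does not state that the actual model fuses or scores features this way.

```python
import math


def fuse(appearance, motion):
    """Concatenate an appearance feature (A) and a motion feature (M)."""
    return list(appearance) + list(motion)


def cosine_similarity(x, y):
    """Cosine of the angle between two feature vectors (0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0 else 0.0


def affinity_matrix(features_t, features_t1):
    """Rows index objects in frame t, columns index objects in frame t+1."""
    return [[cosine_similarity(f, g) for g in features_t1] for f in features_t]


# Hypothetical features: two objects per frame, 2-D appearance and 2-D motion.
frame_t = [fuse([1.0, 0.0], [0.5, 0.0]), fuse([0.0, 1.0], [0.0, 0.5])]
frame_t1 = [fuse([0.9, 0.1], [0.5, 0.1]), fuse([0.1, 0.9], [0.0, 0.4])]
for row in affinity_matrix(frame_t, frame_t1):
    print([round(value, 3) for value in row])
```

The resulting matrix has one row per object in frame t and one column per object in frame t+1, which is the input format an assignment step expects.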
31
information
Performance increased substantially with 3 GNN layers vs. 0!
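The role of the number of GNN layers can be illustrated with a toy message-passing loop: with 0 layers every object keeps its own feature, and with each added layer a feature is mixed with features of objects in the other frame. The sketch below has no learned weights. The similarity-weighted averaging, the `mix` factor, and the function names are assumptions for illustration only and are much simpler than a trained graph neural network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(node, neighbors):
    """Weighted mean of neighbor features, weighted by dot-product similarity."""
    weights = softmax([sum(x * y for x, y in zip(node, n)) for n in neighbors])
    return [sum(w * n[d] for w, n in zip(weights, neighbors))
            for d in range(len(node))]


def interact(feats_t, feats_t1, num_layers=3, mix=0.5):
    """Run num_layers rounds of feature interaction between two frames."""
    a = [list(f) for f in feats_t]
    b = [list(f) for f in feats_t1]
    for _ in range(num_layers):
        # Each node in one frame receives messages from all nodes in the other.
        new_a = [[(1 - mix) * x + mix * m for x, m in zip(f, aggregate(f, b))]
                 for f in a]
        new_b = [[(1 - mix) * x + mix * m for x, m in zip(f, aggregate(f, a))]
                 for f in b]
        a, b = new_a, new_b
    return a, b


feats_t = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical features in frame t
feats_t1 = [[0.9, 0.1], [0.1, 0.9]]    # hypothetical features in frame t+1
print(interact(feats_t, feats_t1, num_layers=0))
print(interact(feats_t, feats_t1, num_layers=3))
```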
32
33
34
3D Object Detector Feature Extractor 3D detections
Pairwise affinity matrix
Optimizer
3D Object Trajectories Sensor data
Jointly
Trajectory Forecasting
Past object trajectories
35
Pipeline from Prior Work
Detected objects in current frame
Object trajectories in past H frames
Last frame, current frame
3D Multi-Object Tracking: feature extraction (one per input), 3D MOT head, object trajectories up to current frame
Trajectory Forecasting: feature extraction, trajectory forecasting head, predicted trajectories in future T frames
Separate
3D Object Detector Feature Extractor
Optimizer
Trajectory Forecasting
Sensor Data
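The interface of the forecasting stage, past object trajectories in and future positions for T frames out, can be made concrete with a constant-velocity baseline. This baseline is swapped in purely for illustration. It is not the learned trajectory forecasting head discussed in this talk, and the function name and example trajectory are hypothetical.

```python
def forecast_constant_velocity(past, num_future):
    """Extrapolate a trajectory of (x, y, z) positions by its mean velocity.

    past: positions in the past H frames, oldest first (H >= 2).
    num_future: number of future frames T to predict.
    """
    if len(past) < 2:
        raise ValueError("need at least two past positions")
    steps = len(past) - 1
    # Mean per-frame displacement between the first and last observation.
    velocity = [(last - first) / steps
                for first, last in zip(past[0], past[-1])]
    return [tuple(p + v * k for p, v in zip(past[-1], velocity))
            for k in range(1, num_future + 1)]


# Hypothetical tracked object observed over H = 3 frames.
past_trajectory = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, 1.0, 0.0)]
print(forecast_constant_velocity(past_trajectory, num_future=2))
```

Because the input is a tracked trajectory, any error in the tracking stage, such as an identity switch, feeds directly into the forecast when the two stages are run separately.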
36
Joint 3D Tracking and Forecasting
Detected objects in current frame; object trajectories in past H frames
Last frame, current frame
Feature extraction (one per input)
GNN for feature interaction: node features, edge features
3D MOT head
Trajectory forecasting head
Diversity sampling
Predicted trajectories in future T frames
37
3D MOT evaluation without forecasting module
38
Forecasting evaluation without 3D MOT
Performance improved after adding MOT!
39
Edge features Node features
GNN for feature interaction
Trajectories up to the current frames Anchors in current frame Objects trajectories in past H frames
Last frame
Current frame Feature extraction Feature extraction
3D MOT head Object detection head
Joint 3D Tracking and Forecasting
40
Weng et al. Unsupervised Sequence Forecasting of 100,000 Points for Unsupervised Trajectory Forecasting. arXiv 2020
3D Object Detection Trajectory Forecasting
Sensor Data Feature Extraction
Optimization
42
43
3D Object Detection Evaluation
Sensor Data Feature Extraction
Optimization
Many large-scale datasets but sensor suite and annotations are not unified
3D detection performance is improving but doesn't take into account sensor physics
Doesn't take into account context of multi-level optimization problem (sensors, forecasting, control)
Representation doesn't take into account context of other objects, scene and past
Need better 3D MOT evaluation
Detection and tracking should be coupled more tightly
Dynamics models should be customized to object type
Evaluation Metrics. IROS 2020
Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020
Network and Diversity Sampling. arXiv 2020
Neural Networks and Complete Feature Learning. arXiv 2020
44