
3D Multi-Object Tracking for Autonomous Driving

Xinshuo Weng, Kris Kitani. June 15, 2020.


  1. 3D Multi-Object Tracking for Autonomous Driving. Xinshuo Weng, Kris Kitani. June 15, 2020.

  2. 3D multi-object tracking is an important perception task for autonomous driving.

  3. Standard 3D MOT Pipeline: Sensor Data → 3D Object Detection → Data Association → Evaluation

  4. Standard 3D MOT Pipeline: Sensor Data (LiDAR, RGB) → 3D Object Detection → Data Association → Evaluation

  5. Standard 3D MOT Pipeline: Sensor Data → 3D Object Detection (output: detection results) → Data Association → Evaluation

  6. Standard 3D MOT Pipeline: Sensor Data → 3D Object Detection → Data Association (output: tracking results) → Evaluation

  7. Standard 3D MOT Pipeline: Sensor Data → 3D Object Detection → Data Association → Evaluation (also important!)
     Evaluation:
     1. MOTA: MOT accuracy
     2. MOTP: MOT precision
     3. IDS: number of identity switches
     4. FRAG: number of trajectory fragments
     5. ……
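
As a rough illustration of the first metric, the sketch below computes MOTA from error counts using the standard CLEAR MOT definition (one minus the ratio of false negatives, false positives, and identity switches to the number of ground-truth objects). The slide does not spell out this formula; the function name, argument names, and counts are assumptions made for the example.

```python
def mota(num_false_negatives, num_false_positives, num_id_switches, num_gt):
    """MOTA under the standard CLEAR MOT definition (assumed here).

    All counts are totals accumulated over every frame of a sequence.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = num_false_negatives + num_false_positives + num_id_switches
    return 1.0 - errors / num_gt


# Hypothetical counts: 100 ground-truth objects, 10 misses,
# 5 false alarms, 2 identity switches.
example_mota = mota(10, 5, 2, 100)
print(f"MOTA = {example_mota:.2f}")
```

Note that MOTA can be negative when the total number of errors exceeds the number of ground-truth objects.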

  8. Standard 3D MOT Pipeline: Sensor Data → 3D Object Detection → Data Association → Evaluation

  9. What is the state of the art?

  10. State of the Art (3D MOT)
      • Better models from better (bigger) data!
      • 150x increase!*
      * Mined trajectory data not counted for the Argo dataset

  11. State of the Art (3D MOT)
      • Monocular 3D detection (KITTI): 15x increase in AP (3 years)
      Image credit to Patrick Langechuan Liu, https://towardsdatascience.com/monocular-3d-object-detection-in-autonomous-driving-2476a3c7f57e

  12. State of the Art (3D MOT)
      • LiDAR-based 3D detection (KITTI): 27% increase (2 years)

  13. State of the Art (3D MOT)
      • 2D MOT (KITTI): 18% increase (4 years)*
      * 3D methods compared using 2D evaluation on KITTI

  14. State of the Art (3D MOT)
      Pipeline: Sensor Data → 3D Object Detection → Feature Extraction → Optimization → Evaluation
      • Recent trend: feature extraction and optimization (the two parts of data association) are jointly optimized
      D. Frossard, R. Urtasun. End-to-End Learning of Multi-Sensor 3D Tracking by Detection. ICRA 2018.
      Zhang et al. Robust Multi-Modality Multi-Object Tracking. ICCV 2019.

  15. State of the Art (3D MOT): What are open problems in 3D MOT?

  16. Some Open Problems (3D MOT)
      • Many large-scale datasets, but sensor suite and annotations are not unified
      • 3D detection performance is improving but doesn't take sensor physics into account
      • Should also take sensor optimization and redundancy into account
      • Detection and tracking should be coupled more tightly
      • Representation doesn't take into account the context of other objects and the scene
      • Doesn't take into account the context of the multi-level optimization problem (sensors, forecasting, control)
      • Weak 3D MOT evaluation datasets and metrics

  17. Some Open Problems (3D MOT): same list as the previous slide, with two items marked as the focus of this talk
      • Representation doesn't take into account the context of other objects and the scene (this talk)
      • Weak 3D MOT evaluation datasets and metrics (this talk)

  18. Recent Work on Evaluation

  19. What are the Issues of Evaluation?
      • IoU (intersection over union)
      • For the pioneering 3D MOT dataset KITTI, evaluation is done in 2D
        • IoU is computed on the 2D image plane (not 3D)
      • The common practice for evaluating 3D MOT methods is:
        • First project 3D trajectories onto the image plane
        • Run the 2D evaluation code provided by KITTI
      [Figure: IoU in 2D space vs. IoU in 3D space. B_p: the predicted box; B_g: the ground truth box; B_c: the smallest enclosing box; I_2D, I_3D: the intersection. Image credit to Xu et al: 3D-GIoU]

  20. What are the Issues of Evaluation?
      • Why is it not good to evaluate 3D MOT methods in 2D space?
        • Cannot demonstrate the strength of 3D MOT methods
        • Throws away the extra information (e.g., depth value, length of the object, heading orientation)
      • Cannot fairly compare 3D MOT methods. Why?
        • A method is not penalized for a wrong predicted depth value, length, or heading as long as the 2D projection is good
      • Which predicted box is better, blue or green?
      • Conclusion: should not use 2D metrics to evaluate 3D MOT methods
      [Figure: blue: predicted box 1; green: predicted box 2; red: the ground truth box]
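
The point about depth errors going unpenalized can be made concrete with a small sketch. It assumes axis-aligned boxes and stands in for the image-plane projection by simply dropping the depth axis; real evaluation uses oriented boxes and a perspective projection. All names and box coordinates below are hypothetical.

```python
def axis_aligned_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (mins, maxs) tuples.

    Works in any dimension: 2 values per tuple for 2D, 3 for 3D.
    """
    intersection = 1.0
    size_a = 1.0
    size_b = 1.0
    for a_min, a_max, b_min, b_max in zip(box_a[0], box_a[1], box_b[0], box_b[1]):
        overlap = max(0.0, min(a_max, b_max) - max(a_min, b_min))
        intersection *= overlap
        size_a *= a_max - a_min
        size_b *= b_max - b_min
    union = size_a + size_b - intersection
    return intersection / union if union > 0 else 0.0


def drop_depth(box):
    """Simplified stand-in for image-plane projection: keep (x, y), discard depth."""
    return (box[0][:2], box[1][:2])


# Axes are (x, y, depth) in meters. The prediction has the correct x/y extent
# but is placed 2 m too far away from the sensor.
ground_truth = ((0.0, 0.0, 10.0), (2.0, 1.5, 14.0))
prediction = ((0.0, 0.0, 12.0), (2.0, 1.5, 16.0))

iou_2d = axis_aligned_iou(drop_depth(ground_truth), drop_depth(prediction))
iou_3d = axis_aligned_iou(ground_truth, prediction)
print(f"2D IoU = {iou_2d:.2f}, 3D IoU = {iou_3d:.2f}")
```

In this toy case the 2D score is perfect even though the 3D overlap is only one third, which is the failure mode the slide describes.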

  21. Our Solution: Upgrade the Metrics Using 3D IoU
      • Replace the 2D IoU used by the metrics in the KITTI evaluation code with 3D IoU
        • https://github.com/xinshuoweng/AB3DMOT (~800 stars)
      • Work with nuTonomy collaborators and use our 3D metrics in the nuScenes evaluation
        • https://www.nuscenes.org/
      [Figures: nuScenes 3D MOT evaluation with our metrics; our released new evaluation code]
      X. Weng, K. Kitani. A Baseline for 3D Multi-Object Tracking. arXiv 2019.

  22. What are the Issues of Evaluation?
      • Are we done with the evaluation? Can we further improve the current metrics?
      • E.g., MOTA (multi-object tracking accuracy)
        • Performance is measured at a single recall point
      • Common practice
        • Select a confidence threshold, e.g., 0.9
        • Filter out detections with lower confidence
        • Data association is performed on the remaining detections
      [Figure: MOTA over Recall curve]
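
The common practice listed above can be sketched as follows, to show why one threshold corresponds to one recall point. The detection records, the `is_tp` flag (whether a detection matches a ground-truth object), and the helper names are all hypothetical; a real evaluator would establish matches by IoU.

```python
def filter_by_confidence(detections, threshold):
    """Keep detections whose confidence score is at least the threshold."""
    return [d for d in detections if d["score"] >= threshold]


def recall(detections, num_gt):
    """Fraction of ground-truth objects covered by the kept detections."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    true_positives = sum(1 for d in detections if d["is_tp"])
    return true_positives / num_gt


# Hypothetical detections for a scene with 4 ground-truth objects.
detections = [
    {"score": 0.95, "is_tp": True},
    {"score": 0.92, "is_tp": False},
    {"score": 0.85, "is_tp": True},
    {"score": 0.60, "is_tp": True},
    {"score": 0.40, "is_tp": False},
]
NUM_GT = 4

strict = filter_by_confidence(detections, 0.9)  # what the tracker would see at 0.9
loose = filter_by_confidence(detections, 0.5)   # what it would see at 0.5
recall_strict = recall(strict, NUM_GT)
recall_loose = recall(loose, NUM_GT)
print(recall_strict, recall_loose)
```

Each threshold hands a different detection set to data association, so a MOTA value reported at a single threshold describes only one operating point of the system.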

  23. What are the Issues of Evaluation?
      • Why is it not good to evaluate at a single recall point?
      • Consequences
        • The confidence threshold needs to be carefully tuned, which takes non-trivial effort
        • Cannot understand the full spectrum of accuracy and precision of a MOT system
      • Which MOT system is better, blue or orange?
        • The orange one has higher MOTA at its best recall point (r = 0.9)
        • The blue one has overall higher MOTA at many recall points
      • Ideally, we want high performance at all recall points
      [Figure: MOTA over Recall curve for 3D MOT system 1 and 3D MOT system 2; x-axis: Recall (0 to 1), y-axis: MOTA (0 to 1)]

  24. Our Solution: Integral Metrics
      • MOTA does not take the confidence into account
      • What do we do to improve the evaluation?
        • Compute integral metrics through the area under the curve, e.g., average MOTA (AMOTA)
        • Analogous to the average precision (AP) in object detection
        • Can now model the full spectrum of MOT accuracy
      [Figure: MOTA over Recall curve, with the area under the curve shaded]
      X. Weng, K. Kitani. A Baseline for 3D Multi-Object Tracking. arXiv 2019.
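
A minimal sketch of the integral idea: approximate the area under the MOTA over Recall curve by averaging MOTA values sampled at evenly spaced recall points. The number of sample points and the two curves below are made-up values chosen to mirror the blue/orange comparison; they are not results from the paper, and the released evaluation code may differ in detail.

```python
def average_mota(mota_at_recall_points):
    """Approximate the area under the MOTA over Recall curve.

    Expects MOTA values sampled at evenly spaced recall points and
    returns their mean.
    """
    if not mota_at_recall_points:
        raise ValueError("need at least one recall point")
    return sum(mota_at_recall_points) / len(mota_at_recall_points)


# Hypothetical MOTA values at recall = 0.1, 0.2, ..., 1.0.
blue = [0.10, 0.20, 0.30, 0.40, 0.50, 0.58, 0.64, 0.68, 0.70, 0.60]
orange = [0.05, 0.10, 0.15, 0.22, 0.30, 0.40, 0.52, 0.65, 0.75, 0.50]

amota_blue = average_mota(blue)
amota_orange = average_mota(orange)
print(f"best MOTA: blue={max(blue):.2f}, orange={max(orange):.2f}")
print(f"AMOTA:     blue={amota_blue:.3f}, orange={amota_orange:.3f}")
```

With these numbers the orange system wins on its single best recall point, while the blue system wins on the averaged metric, which is the distinction an integral metric is meant to capture.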

  25. Recent Work on Improving Feature Learning for 3D MOT

  26. What are the Issues of Feature Learning?
      • Goal: learn discriminative features for different objects
      • Issues in the feature learning?
        • Feature extraction for each object is independent of other objects
          • Why is this not good? No communication between objects, so the context information is ignored
        • Features come from only one or two modalities
          • E.g., 2D appearance, or 2D motion, or 3D motion, or 3D appearance
          • Why is this not good? It does not utilize all the complementary information
      [Figure: pipeline from prior work. Objects in frame t and objects in frame t+1 each pass through a 2D (or 3D) feature extractor; the features form an affinity matrix, which is solved with the Hungarian algorithm]
      X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
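
The prior-work pipeline in the figure (independent features, affinity matrix, assignment) can be sketched on a toy example. Two assumptions are made here: cosine similarity is used as the affinity, which the slide does not specify, and a brute-force search over permutations stands in for the Hungarian algorithm. The brute-force search finds the same optimal assignment but takes exponential time, so it is only suitable for a handful of objects; in practice a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice. The feature vectors are made up.

```python
import math
from itertools import permutations


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def affinity_matrix(features_t, features_t1):
    """Entry [i][j] scores object i in frame t against object j in frame t+1."""
    return [[cosine_similarity(a, b) for b in features_t1] for a in features_t]


def best_assignment(affinity):
    """Brute-force stand-in for the Hungarian algorithm (square matrix assumed)."""
    n = len(affinity)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(affinity[i][perm[i]] for i in range(n)),
    )
    return list(enumerate(best))


# Hypothetical per-object features, extracted independently in each frame.
features_t = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features_t1 = [[0.1, 0.9], [0.9, 1.1], [1.0, 0.1]]

matches = best_assignment(affinity_matrix(features_t, features_t1))
print(matches)  # pairs of (index in frame t, index in frame t+1)
```

Because each feature vector is computed without looking at any other object, two objects with similar features would produce near-identical rows in the affinity matrix, which is the weakness the slide points out.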

  27. Improve Feature Learning for 3D MOT
      • How can we address these two issues?
      • Shouldn't the features depend on the context of other objects?
        • Propose a novel feature interaction mechanism
      • How can we utilize the information from all the modalities?
        • Extract multi-modal features that are complementary to each other
        • I.e., 2D motion + 2D appearance + 3D motion + 3D appearance
      [Figure: pipeline from our work. Objects in frame t and objects in frame t+1 each pass through a 2D + 3D feature extractor; feature interaction is applied iteratively; the features form an affinity matrix, which is solved with the Hungarian algorithm]
      X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.

  28. Improve Feature Learning for 3D MOT
      • Is encoding the multi-modal features really useful?
        • Answer: Yes
        • We should encode different features so that they can complement each other
      [Tables: using features from a single modality vs. using features from multiple modalities (performance increased). A: appearance feature, M: motion feature]
      X. Weng, Y. Wang, Y. Man, K. Kitani. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with 2D-3D Multi-Feature Learning. CVPR 2020.
