Object Detection and Tracking in 3D World
Xinshuo Weng
3D Object Detection

Goal
Inputs:
○ LiDAR point cloud
○ Monocular Images
○ Stereo images (left and right)
○ Or fusion
○ Eight corners
○ Four corners + height
○ Size (l, w, h) + center (x, y, z) + heading (θ)
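The third parameterization determines the other two: given size, center, and heading, the eight corners follow from a rotation about the vertical axis. A minimal sketch, assuming a z-up frame with heading measured about the z axis (axis conventions vary between datasets, e.g. KITTI uses a y-down camera frame):

```python
import numpy as np

def box3d_corners(center, size, heading):
    """Return the 8 corners of a 3D box given center (x, y, z),
    size (l, w, h), and heading angle about the up (z) axis."""
    x, y, z = center
    l, w, h = size
    # Corner offsets in the box's local frame (before rotation).
    xs = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2.0
    ys = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2.0
    zs = np.array([ h,  h,  h,  h, -h, -h, -h, -h]) / 2.0
    # Rotate about the vertical axis by the heading angle.
    c, s = np.cos(heading), np.sin(heading)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    corners = R @ np.vstack([xs, ys, zs])        # 3 x 8
    return corners.T + np.array([x, y, z])       # 8 x 3

# A 4 m long, 2 m wide, 1.5 m tall box centered at (1, 2, 0), heading 0.
corners = box3d_corners((1.0, 2.0, 0.0), (4.0, 2.0, 1.5), 0.0)
```

With zero heading, the corners span x ∈ [−1, 3], i.e. the center ±l/2, which is a quick sanity check on the convention.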
Shi et al, “PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud”, CVPR, 2019.
Mousavian et al, “3D Bounding Box Estimation Using Deep Learning and Geometry”, CVPR, 2017.
○ The 2D bounding box provides 4 constraints
○ Need at least another three, since the 3D box has 7 parameters (size, center, heading)
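The four constraints come from perspective projection: each side of the 2D box should be touched by the projection of some 3D box corner. A sketch of the projection step only, with illustrative KITTI-like intrinsics (the values are assumptions, not taken from the paper):

```python
import numpy as np

# Illustrative pinhole intrinsics (KITTI-like magnitudes, assumed here).
K = np.array([[721.5,   0.0, 609.6],
              [  0.0, 721.5, 172.9],
              [  0.0,   0.0,   1.0]])

def project(p_cam):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    p = K @ p_cam
    return p[:2] / p[2]

# Two corners of a box 10 m ahead: the left edge of the 2D box is the
# minimum projected u over all corners, the right edge the maximum.
us = [project(np.array([x, 0.5, 10.0]))[0] for x in (-1.0, 1.0)]
```

The left corner (x = −1) projects left of the right corner (x = +1), and equations of this form, one per 2D box side, pin down the translation once size and orientation are regressed.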
Li et al, “Stereo R-CNN based 3D Object Detection for Autonomous Driving”, CVPR, 2019.
Size (l, w, h) + 2D bounding boxes → center and heading (x, y, z, θ)
Matching loss
Qi et al, “Frustum PointNets for 3D Object Detection from RGB-D Data”, CVPR, 2018.
LiDAR-based 3D detection vs. monocular 3D detection
○ Pseudo-LiDAR framework
○ Two observations:
■ Long tail – instance mask proposal
■ Local misalignment – bounding box consistency loss (BBCL) and optimization (BBCO)
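The pseudo-LiDAR framework back-projects an estimated depth map into a 3D point cloud and then applies a LiDAR-based detector to it. A minimal sketch of the back-projection under a standard pinhole model (the intrinsics and depth values below are made up for illustration):

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a per-pixel depth map into a 3D point cloud
    ("pseudo-LiDAR") using the pinhole camera model."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))  # pixel grids
    z = depth
    x = (us - cx) * z / fx   # right, in camera frame
    y = (vs - cy) * z / fy   # down, in camera frame
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy example: a constant 10 m depth map over a 4 x 6 image whose
# principal point is at pixel (u=3, v=2).
points = depth_to_pseudo_lidar(np.full((4, 6), 10.0),
                               fx=700.0, fy=700.0, cx=3.0, cy=2.0)
```

The pixel at the principal point maps to (0, 0, 10), i.e. straight ahead along the optical axis, which is a quick check that the intrinsics are applied correctly.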
monocular methods
[6] R. Urtasun et al. (University of Toronto). Monocular 3D Object Detection for Autonomous Driving. CVPR 2016.
[30] J. Kosecka (George Mason University). 3D Bounding Box Estimation Using Deep Learning and Geometry. CVPR 2017.
[58] Z. Chen et al. (Wuhan University). Multi-Level Fusion based 3D Object Detection from Monocular Images. CVPR 2018.
Inputs:
○ LiDAR point cloud
○ Monocular Images
○ Stereo images, plus video
○ Or fusion
Output:
○ Eight corners
○ Four corners + height
○ Size + center + orientation
○ Identity – association problem
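Identity is what turns detection into tracking: each new detection must be associated with an existing track. A common formulation is bipartite matching on an affinity matrix (e.g. 3D IoU or negative centroid distance), solved with the Hungarian algorithm. A toy sketch using SciPy, where the affinity values and the 0.3 threshold are made-up illustrations:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy affinity matrix: rows = existing tracks, columns = new detections.
# Higher entries mean a better track-detection match.
affinity = np.array([[0.8, 0.1, 0.0],
                     [0.2, 0.7, 0.1]])

# The Hungarian algorithm minimizes total cost, so negate the affinity.
rows, cols = linear_sum_assignment(-affinity)

# Reject weak matches below an (illustrative) affinity threshold.
matches = [(r, c) for r, c in zip(rows, cols) if affinity[r, c] > 0.3]
```

Unmatched detections then spawn new tracks and unmatched tracks age out, which is how identities are born and die in this family of trackers.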
○ Deep motion network
○ Deep association network
○ Deep appearance network
Luo et al, “Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net”, CVPR, 2018.
Baser et al, “FANTrack: 3D Multi-Object Tracking with Feature Association Network”, arXiv, 2019.
SimNet AssocNet
Frossard et al, “End-to-end Learning of Multi-sensor 3D Tracking by Detection”, ICRA, 2018.
○ Detection: state-of-the-art 3D object detector (PointRCNN)
○ Tracking: Kalman filter with a 3D constant velocity model + Hungarian algorithm; no appearance model
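A minimal sketch of such a constant-velocity Kalman filter, assuming a 10-dimensional state (the 7 box parameters plus a 3D velocity); the noise magnitudes here are placeholders, not tuned values:

```python
import numpy as np

dim = 10  # assumed state layout: [x, y, z, theta, l, w, h, vx, vy, vz]
F = np.eye(dim)
F[0, 7] = F[1, 8] = F[2, 9] = 1.0    # position += velocity each frame
H = np.eye(7, dim)                   # we observe the 7 box parameters

x = np.zeros(dim)
P = np.eye(dim) * 10.0               # initial state uncertainty
Q = np.eye(dim) * 0.01               # process noise (placeholder)
R = np.eye(7)                        # measurement noise (placeholder)

def predict(x, P):
    """Propagate the state with the constant-velocity model."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Fuse a detected 3D box z = [x, y, z, theta, l, w, h]."""
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    return x + K @ y, (np.eye(dim) - K @ H) @ P

x, P = predict(x, P)
x, P = update(x, P, np.array([1.0, 2.0, 0.0, 0.1, 4.0, 2.0, 1.5]))
```

With a large initial uncertainty the update pulls the state most of the way toward the detection, which is the desired behavior when a track is first initialized.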
published works
2D tracking results on KITTI test set
3D tracking results on KITTI validation set
[1] R. Urtasun. End-to-End Learning of Multi-Sensor 3D Tracking by Detection. ICRA 2018.
[2] K. Czarnecki (University of Waterloo). FANTrack: 3D Multi-Object Tracking with Feature Association Network. arXiv 2019.
[3] K. Granstrom (Chalmers University of Technology). Mono-Camera 3D Multi-Object Tracking Using Deep Learning Detections and PMBM Filtering. ITSC 2018.
[5] K. Madhava Krishna (IIIT Hyderabad, India). Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking. ICRA 2018.
performance in practice