Deep Hough Voting for 3D Object Detection in Point Clouds
Charles R. Qi, Or Litany, Kaiming He, Leonidas J. Guibas; The IEEE International Conference on Computer Vision (ICCV), 2019
Jan Bayer, Computational Robotics Laboratory, Czech Technical University
Introduction
- Goal: detect object classes and bounding boxes from 3D point clouds
- Input: Uncolored 3D point clouds
– Robust to illumination changes
- Contribution
– A reformulation of Hough voting in the context of deep learning through an end-to-end differentiable architecture
– State-of-the-art 3D object detection performance on SUN RGB-D and ScanNet
– An in-depth analysis of the importance of voting for 3D object detection in point clouds
Deep Hough Voting for 3D Object Detection in Point Clouds, Qi et al. ICCV 2019
3D object detection methods
- Extended 2D-based detectors to 3D
– 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans, Hou et al. CVPR 2019
– Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images, Song et al. CVPR 2016
– 3D CNN detectors → high cost of 3D convolutions
- Projection to 2D bird’s eye view images
– Multi-View 3D Object Detection Network for Autonomous Driving, Chen et al. CVPR 2017
– Designed for outdoor LIDAR data
- 2D-based detectors, projection to point cloud
– Frustum PointNets for 3D Object Detection from RGB-D Data, Qi et al. CVPR 2018
– 2D-Driven 3D Object Detection in RGB-D Images, Lahoud et al. CVPR 2017
– Strictly dependent on the 2D detector
– 2D object detection quickly reduces the search space
Extended 2D-based detectors to 3D
- 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans, Hou et al. CVPR 2019.
– Fuses 2D RGB input features with 3D scan geometry features
Projection to 2D bird’s eye view images
MV3D: Multi-view 3d object detection network for autonomous driving, Chen et al. CVPR 2017
- Data from a front RGB camera and LIDAR → 3 views are used to generate 2D features
- Fused features are used to jointly predict the object class and regress an oriented 3D box
2D-based detectors, projection to point cloud
- F-PointNet: Frustum PointNets for 3D Object Detection from RGB-D Data, Qi et al. CVPR 2018
- 2D CNN object detector to propose 2D regions and classify their content
- Similar architecture to the older 2D-driven approach (Lahoud et al.)
VoteNet
PointNet
- PointNet
– PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, Qi et al. CoRR 2016
– For an input point cloud of size N:
- Generates N local features (one for each input point)
- Generates a single global feature
- Processing the combination of local and global features → classification and 3D scene segmentation
- PointNet++
– PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space, Qi et al. CoRR 2017
– Improves PointNet's recognition of fine-grained patterns and segmentation of complex scenes
– For N input points, generates M feature points for classification; segmentation requires upsampling to provide information for all the input points
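As a toy illustration of the PointNet idea above (a shared per-point MLP followed by a symmetric max pooling), here is a minimal numpy sketch; the layer sizes and random weights are illustrative, not the trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_mlp(points, w1, w2):
    """Apply the same two-layer MLP to every point (weights shared across points)."""
    h = np.maximum(points @ w1, 0.0)   # ReLU hidden layer, shape (N, 64)
    return np.maximum(h @ w2, 0.0)     # per-point local feature, shape (N, C)

N, C = 8, 32
xyz = rng.normal(size=(N, 3))          # toy point cloud
w1 = 0.1 * rng.normal(size=(3, 64))    # random (untrained) weights, for illustration
w2 = 0.1 * rng.normal(size=(64, C))

local_feats = shared_mlp(xyz, w1, w2)  # N local features, one per input point
global_feat = local_feats.max(axis=0)  # symmetric max pooling -> single global feature
# Concatenating local + global features feeds the segmentation-style head
per_point = np.concatenate([local_feats, np.tile(global_feat, (N, 1))], axis=1)

# The global feature is invariant to the ordering of the input points
perm = rng.permutation(N)
assert np.allclose(shared_mlp(xyz[perm], w1, w2).max(axis=0), global_feat)
```

The max pooling is what makes the network a set function: any permutation of the input points yields the same global feature.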
VoteNet – point cloud feature learning backbone
- 4 set abstraction layers
– Subsampling: 2048, 1024, 512, 256 points
- 2 upsampling layers
– Upsampling to 1024 points with C = 256 feature channels
– Features on the input points are interpolated to the output points (weighted average of the 3 nearest input-point features)
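The 3-nearest-neighbour feature interpolation used by the upsampling layers can be sketched in numpy as follows (an approximation with my own function and variable names, not the authors' code):

```python
import numpy as np

def interpolate_features(src_xyz, src_feat, dst_xyz, k=3, eps=1e-8):
    """Inverse-distance weighted average of the k nearest source features."""
    # Pairwise distances between destination and source points, shape (M, N)
    d = np.linalg.norm(dst_xyz[:, None, :] - src_xyz[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]                 # k nearest sources per dst point
    nd = np.take_along_axis(d, idx, axis=1)            # their distances
    w = 1.0 / (nd + eps)                               # closer points weigh more
    w /= w.sum(axis=1, keepdims=True)                  # normalize weights per dst point
    return (src_feat[idx] * w[..., None]).sum(axis=1)  # interpolated features (M, C)

src_xyz = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
src_feat = np.eye(4)                  # one distinctive feature per source point
dst_xyz = np.array([[0., 0., 0.]])    # coincides with the first source point
out = interpolate_features(src_xyz, src_feat, dst_xyz)
# out[0] is dominated by the feature of the coincident source point
```

A destination point that coincides with a source point receives (almost exactly) that point's feature, since its inverse-distance weight dominates.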
Hough voting
- Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation, Kehl et al. ECCV 2016
- Sampling the image generates patches
- A network regresses features used for k-NN search
- A codebook contains pre-computed associations between features and 6D object poses
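The codebook scheme of Kehl et al. can be caricatured as a k-NN lookup over stored feature/pose pairs; this is a hypothetical numpy sketch, not their implementation:

```python
import numpy as np

def codebook_lookup(query_feat, codebook_feats, codebook_poses, k=3):
    """Return the k stored poses whose descriptors are nearest to the query feature."""
    d = np.linalg.norm(codebook_feats - query_feat, axis=1)  # distance to every entry
    nearest = np.argsort(d)[:k]                              # indices of k best matches
    return codebook_poses[nearest]                           # candidate 6D poses

feats = np.eye(3)                         # toy descriptors, one per codebook entry
poses = np.arange(18.0).reshape(3, 6)     # a made-up 6D pose per entry
candidates = codebook_lookup(np.array([0.9, 0.1, 0.0]), feats, poses, k=2)
```

This is the kind of explicit nearest-neighbour search that VoteNet's learned voting replaces.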
VoteNet - voting
– A deep network generates votes directly from the input features
- More efficient than k-NN lookups
- An MLP with fully connected layers
- For each seed, one vote is generated independently of the others
– A vote is the 3D offset of the object center relative to the feature position
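A minimal sketch of such a voting module, assuming a single shared fully connected layer with random (untrained) weights; the names and layer sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def vote(seed_xyz, seed_feat, w1, w2):
    """Each seed independently regresses a 3D offset toward an object center,
    plus a feature residual; the vote position is seed position + offset."""
    h = np.maximum(seed_feat @ w1, 0.0)        # shared fully connected layer + ReLU
    out = h @ w2                               # (M, 3 + C): offset and residual
    offset, feat_res = out[:, :3], out[:, 3:]
    return seed_xyz + offset, seed_feat + feat_res

M, C, H = 6, 16, 32
seed_xyz = rng.normal(size=(M, 3))             # toy seed positions
seed_feat = rng.normal(size=(M, C))            # toy seed features
w1 = 0.1 * rng.normal(size=(C, H))             # random weights, for illustration
w2 = 0.1 * rng.normal(size=(H, 3 + C))
vote_xyz, vote_feat = vote(seed_xyz, seed_feat, w1, w2)

# Votes are per-seed: voting on a subset gives the same result for those seeds
vx_sub, _ = vote(seed_xyz[:3], seed_feat[:3], w1, w2)
assert np.allclose(vx_sub, vote_xyz[:3])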
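```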
VoteNet – object proposal and classification
- Sampling and grouping
– Votes are divided into K clusters by spatial clustering
- Classification, object location, and boundaries
– A PointNet-like network aggregates the votes in order to generate object proposals
– Output: a set of object proposals, each with
- Objectness score
- Bounding box parameters (center, heading, scale)
- Semantic classification score
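The sampling-and-grouping step can be approximated with farthest point sampling plus radius grouping, as in this simplified numpy stand-in for the paper's implementation:

```python
import numpy as np

def farthest_point_sample(xyz, k):
    """Greedily pick k well-spread points to serve as cluster centers."""
    chosen = [0]
    dist = np.full(len(xyz), np.inf)
    for _ in range(k - 1):
        # Track each point's distance to its nearest already-chosen center
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[chosen[-1]], axis=1))
        chosen.append(int(np.argmax(dist)))    # next center: the farthest point
    return np.array(chosen)

def group_votes(votes, center_idx, radius):
    """Collect, for each center, the indices of votes lying within `radius`."""
    centers = votes[center_idx]
    d = np.linalg.norm(votes[None, :, :] - centers[:, None, :], axis=-1)
    return [np.nonzero(row <= radius)[0] for row in d]

# Two toy vote blobs, as if two objects attracted votes to their centers
votes = np.array([[0., 0., 0.], [0.1, 0., 0.], [0., 0.1, 0.],
                  [5., 0., 0.], [5.1, 0., 0.], [5., 0.1, 0.]])
centers = farthest_point_sample(votes, 2)       # one center lands in each blob
clusters = group_votes(votes, centers, radius=0.5)
```

Each cluster then goes to the proposal network, which outputs one object proposal per cluster.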
Indoor evaluation datasets: description
- ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes, Dai et al. CVPR 2017
– Available at: http://www.scan-net.org/ScanNet/
– RGB-D data from real-world environments
– 2.5 million views, 1513 scans, 707 spaces
– Annotated with 3D camera poses, surface reconstructions, CAD models, and instance-level semantic segmentations
- SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite, Song et al. CVPR 2015
– Sensors: Asus Xtion, Intel RealSense, Microsoft Kinect
– Available at: http://rgbd.cs.princeton.edu/
– 10,335 RGB-D images (including NYU, B3DO, SUN3D)
– 64,595 annotated 3D bounding boxes
– 800 object categories, 47 scene categories
– Built using SIFT + RANSAC + point-to-plane ICP
Object detection results on SUN RGB-D set
- Evaluation metric: mean Average Precision (mAP)
– Intersection over Union (IoU) for thresholding correctly matched objects
- 5000 RGB-D training images with amodal oriented 3D bounding boxes for 37 object categories.
- The VoteNet model is 4x smaller than the F-PointNet model
- Evaluated with 3D IoU threshold 0.25
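For intuition, 3D IoU for axis-aligned boxes can be computed as below; note that the paper evaluates oriented boxes, so this numpy sketch is a simplification:

```python
import numpy as np

def iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes, each (xmin, ymin, zmin, xmax, ymax, zmax)."""
    lo = np.maximum(box_a[:3], box_b[:3])          # intersection lower corner
    hi = np.minimum(box_a[3:], box_b[3:])          # intersection upper corner
    inter = np.prod(np.clip(hi - lo, 0.0, None))   # zero if the boxes are disjoint
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)         # intersection / union

a = np.array([0., 0., 0., 2., 2., 2.])
b = np.array([1., 0., 0., 3., 2., 2.])   # half-overlapping copy of a
```

A detection counts as correct when its IoU with a ground-truth box of the same class exceeds the threshold (0.25 here); mAP is then averaged over categories.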
Object detection results - SUN RGB-D dataset
Object detection results on ScanNetV2 set
- 1200 training examples, hundreds of rooms, 18 object categories
- VoteNet used uncolored point clouds, while the other methods also used color
Comparing VoteNet with no-vote baseline
- BoxNet
– VoteNet: bounding boxes are estimated from vote clusters
– BoxNet: proposes boxes directly from seed points on object surfaces
- In sparse 3D point clouds, some points are far from the object centroids
– Direct proposal lowers their confidence
– Voting allows hypotheses generated from these points to be reinforced (if their votes are close to the centroids)
- Seeds that, if sampled, would generate accurate bounding boxes (ScanNet scene)
Voting: further analysis
- Average object-center distance
- Comparing VoteNet with no-vote baseline
- Voting accuracy gain of VoteNet
- ScanNet scene with votes coming from object points