Deep Hough Voting for 3D Object Detection in Point Clouds


SLIDE 1

Deep Hough Voting for 3D Object Detection in Point Clouds

Charles R. Qi, Or Litany, Kaiming He, Leonidas J. Guibas; The IEEE International Conference on Computer Vision (ICCV), 2019 Jan Bayer, Computational Robotics Laboratory, Czech Technical University in Prague

SLIDE 2

Introduction

  • Goal: detect object classes and bounding boxes from 3D point clouds

  • Input: Uncolored 3D point clouds

Robust to illumination changes

  • Contribution

A reformulation of Hough voting in the context of deep learning through an end-to-end differentiable architecture

State-of-the-art 3D object detection performance

On SUN RGB-D and ScanNet

An in-depth analysis of the importance of voting for 3D object detection in point clouds

Deep Hough Voting for 3D Object Detection in Point Clouds, Qi et al. ICCV 2019

SLIDE 3

3D object detection methods

  • Extended 2D-based detectors to 3D

3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans, Hou et al. CVPR 2019.

Deep sliding shapes for amodal 3d object detection in rgb-d images, Song et al. CVPR 2016

3D CNN detectors → high cost of 3D convolutions

  • Projection to 2D bird’s eye view images

Multi-view 3d object detection network for autonomous driving, Chen et al. CVPR 2017

Designed for outdoor LIDAR data

  • 2D-based detectors, projection to point cloud

Frustum pointnets for 3d object detection from rgb-d data, Qi et al. CVPR 2018

2d-driven 3d object detection in rgb-d images, Lahoud et al. CVPR 2017

Strictly dependent on the 2D detector

2D object detection quickly reduces the search space

SLIDE 4

Extended 2D-based detectors to 3D

  • 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans, Hou et al. CVPR 2019.

Fuses 2D RGB image features with 3D scan geometry features

SLIDE 5

3D object detection methods (recap of Slide 3)
SLIDE 6

Projection to 2D bird’s eye view images

MV3D: Multi-view 3d object detection network for autonomous driving, Chen et al. CVPR 2017

  • Data from the front RGB camera and LIDAR → 3 views are used to generate 2D features
  • Fused features are used to jointly predict the object class and regress an oriented 3D box

SLIDE 7

3D object detection methods (recap of Slide 3)
SLIDE 8

2D-based detectors, projection to point cloud

  • F-PointNet: Frustum pointnets for 3d object detection from rgb-d data, Qi et al. CVPR 2018
  • A 2D CNN object detector proposes 2D regions and classifies their content
  • Similar architecture to the older 2D-driven approach (Lahoud et al. CVPR 2017)

SLIDE 9

3D object detection methods (recap of Slide 3)
SLIDE 10

VoteNet


SLIDE 11

Pointnet

  • Pointnet: Deep learning on point sets for 3d classification and segmentation, Qi et al. CoRR 2016

For an input point cloud of size N:

  • Generates N local features (one for each input point)
  • Generates a single global feature
  • Processing the combination of local and global features → classification and 3D scene segmentation
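The per-point and global features described above can be sketched in a few lines of NumPy. This is a toy illustration rather than the paper's network: `W1` and `W2` stand in for the shared MLP weights, and only the forward pass is shown.

```python
import numpy as np

def pointnet_features(points, W1, W2):
    """Toy PointNet forward pass: a shared per-point MLP yields N local
    features; an order-invariant max-pool over points yields one global
    feature; concatenating both supports per-point segmentation."""
    h = np.maximum(points @ W1, 0.0)          # shared MLP layer 1 (ReLU)
    local = np.maximum(h @ W2, 0.0)           # N local features, one per point
    global_feat = local.max(axis=0)           # symmetric max-pool -> global feature
    combined = np.concatenate(
        [local, np.tile(global_feat, (len(points), 1))], axis=1)
    return local, global_feat, combined
```

Because the pooling is a symmetric function, the global feature is invariant to the order of the input points, which is what makes this set-based formulation work on unordered point clouds.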
SLIDE 12

Pointnet++

Pointnet++: Deep hierarchical feature learning on point sets in a metric space, Qi et al. CoRR 2017

Improves Pointnet's recognition of fine-grained patterns and its segmentation of complex scenes

For N input points it generates M feature points for classification; segmentation requires upsampling to provide information for all the input points

SLIDE 13

VoteNet – point cloud feature learning backbone


  • 4 Set abstraction layers

Subsampling to 2048, 1024, 512, 256 points

  • 2 Upsampling layers

Upsampling to 1024 points with C=256

Interpolate the features on input points to output points (weighted average of 3 nearest input point features)
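The interpolation rule above can be sketched as follows (a brute-force NumPy version; inverse-distance weighting over the 3 nearest input points is assumed, matching the "weighted average of 3 nearest input point features" on the slide):

```python
import numpy as np

def interpolate_features(in_xyz, in_feat, out_xyz, k=3, eps=1e-8):
    """Upsampling step: each output point receives an inverse-distance-
    weighted average of the features of its k nearest input points."""
    # Pairwise distances between output points (M) and input points (N)
    d = np.linalg.norm(out_xyz[:, None, :] - in_xyz[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]             # k nearest input points
    nd = np.take_along_axis(d, idx, axis=1)        # their distances
    w = 1.0 / (nd + eps)                           # inverse-distance weights
    w = w / w.sum(axis=1, keepdims=True)           # normalize per output point
    return np.einsum('mk,mkc->mc', w, in_feat[idx])  # weighted feature average
```

An output point that coincides with an input point gets (almost exactly) that input point's feature, since its weight dominates.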

SLIDE 14

VoteNet


SLIDE 15

Hough voting

  • Deep learning of local rgb-d patches for 3d object detection and 6d pose estimation, Kehl et al. ECCV 2016
  • Sampling the image generates patches
  • Network is used to regress features for k-NN search
  • Codebook contains pre-computed associations between features and 6D object poses
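A minimal sketch of that codebook lookup (all names here are illustrative; the method of Kehl et al. stores regressed patch features with associated 6D object poses and retrieves them by k-NN search in feature space):

```python
import numpy as np

def codebook_votes(patch_feat, codebook_feats, codebook_poses, k=3):
    """Match a regressed patch feature to its k nearest codebook features;
    each match casts its pre-computed vote (here, a stored object pose)."""
    d = np.linalg.norm(codebook_feats - patch_feat, axis=1)
    idx = np.argsort(d)[:k]          # k-NN search in feature space
    return codebook_poses[idx]       # the k retrieved votes
```

This lookup-based voting is exactly what VoteNet replaces: instead of searching a fixed codebook, a network regresses the vote directly from the feature.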
SLIDE 16

VoteNet - voting


Deep NN generates votes directly from the input features

  • More efficient than kNN lookups
  • MLP with fully connected layers
  • For each seed, one vote is generated independently of the others
  • The vote is a 3D offset to the object center, relative to the seed position
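The vote generation step can be sketched like this. A single linear layer stands in for the voting MLP, and the weight matrices are illustrative; the key point is that each seed votes independently, and the vote position is the seed position plus a regressed offset toward the object center:

```python
import numpy as np

def cast_votes(seed_xyz, seed_feat, W_xyz, W_feat):
    """Toy voting module: map each seed feature independently to a 3D offset
    plus a feature residual; the vote is seed position + offset."""
    offset = seed_feat @ W_xyz                  # per-seed 3D offset, (N, 3)
    vote_xyz = seed_xyz + offset                # predicted object centers
    vote_feat = seed_feat + seed_feat @ W_feat  # residual feature update
    return vote_xyz, vote_feat
```

With zero weights the votes stay at the seeds; training pushes the offsets so that seeds on the same object vote near a common center.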

SLIDE 17

VoteNet


SLIDE 18

VoteNet – object proposal and classification


  • Sampling and grouping

Votes are divided into K clusters by spatial clustering

  • Classification, object location, and boundaries

A PointNet-like network aggregates the votes in order to generate object proposals

Output – a set of object proposals:

  • Objectness score
  • Bounding box parameters: center, heading, scale
  • Semantic classification score
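The sampling-and-grouping step might be sketched as follows, assuming farthest point sampling for the K cluster centers and a fixed-radius ball query for grouping (a common choice in this family of networks):

```python
import numpy as np

def group_votes(vote_xyz, K, radius):
    """Pick K cluster centers by farthest point sampling over the votes,
    then assign to each center every vote within `radius` of it."""
    centers = [0]                               # seed FPS from the first vote
    d = np.linalg.norm(vote_xyz - vote_xyz[0], axis=1)
    for _ in range(1, K):
        nxt = int(np.argmax(d))                 # farthest remaining vote
        centers.append(nxt)
        d = np.minimum(d, np.linalg.norm(vote_xyz - vote_xyz[nxt], axis=1))
    clusters = [np.flatnonzero(
        np.linalg.norm(vote_xyz - vote_xyz[c], axis=1) <= radius)
        for c in centers]
    return centers, clusters
```

Each cluster of votes is then fed to the proposal network, which outputs one object proposal per cluster.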

SLIDE 19

Indoor evaluation datasets: description

  • ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes, Dai et al. CVPR 2017

Available at: http://www.scan-net.org/ScanNet/

RGB-D data from real-world environments

2.5 million views, 1513 scans, 707 distinct spaces

Annotated with 3D camera poses

Surface reconstructions, CAD models

Instance-level semantic segmentations.

  • SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite, Song et al. CVPR 2015

Sensors: Asus Xtion, Intel RealSense, Microsoft Kinect

Available at: http://rgbd.cs.princeton.edu/

10,335 RGB-D images (including NYU, B3DO, SUN3D)

Annotated 64,595 3D bounding boxes

800 object categories, 47 scene categories

Reconstruction method: SIFT + RANSAC + point-plane ICP

SLIDE 20

Object detection results on SUN RGB-D set

  • Evaluation metric: mean Average Precision (mAP)

Intersection over Union (IoU) for thresholding correctly matched objects

  • 5000 RGB-D training images with amodal oriented 3D bounding boxes for 37 object categories.
  • VoteNet model is 4x smaller than F-PointNet model


  • Evaluated with 3D IoU threshold 0.25
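For intuition, here is the IoU computation for the simplified case of axis-aligned 3D boxes (the benchmark's amodal oriented boxes need a rotated-box intersection, which is omitted here):

```python
import numpy as np

def iou_3d(box_a, box_b):
    """Axis-aligned 3D IoU; boxes are [xmin, ymin, zmin, xmax, ymax, zmax]."""
    lo = np.maximum(box_a[:3], box_b[:3])       # intersection lower corner
    hi = np.minimum(box_a[3:], box_b[3:])       # intersection upper corner
    inter = np.prod(np.clip(hi - lo, 0, None))  # zero if boxes don't overlap
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)
```

A detection then counts as a true positive when its IoU with a same-class ground-truth box reaches the threshold (0.25 here), and mAP averages the resulting per-class precision over recall.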
SLIDE 21

Object detection results - SUN RGB-D dataset


SLIDE 22

Object detection results on ScanNetV2 set

  • 1200 training examples, hundreds of rooms, 18 object categories
  • VoteNet uses uncolored point clouds, while the other methods also use color


SLIDE 23

Object detection results on ScanNetV2 set


SLIDE 24

Comparing VoteNet with no-vote baseline

  • BoxNet

VoteNet: bounding boxes are estimated from vote clusters

BoxNet: proposes boxes directly from seed points on object surfaces

  • For sparse 3D point clouds, some points are far from object centroids

Direct proposal lowers their confidence

Voting allows hypotheses generated from these points to be reinforced (if their votes are close to the centroids)

  • Seeds that, if sampled, would generate accurate bounding boxes (shown on a ScanNet scene)


SLIDE 25

Voting: further analysis

  • Average object-center distance
  • Comparing VoteNet with no-vote baseline


  • Voting accuracy gain of VoteNet
  • ScanNet scene with votes coming from object points