POINT CLOUD DEEP LEARNING Innfarn Yoo, 3/29/2018 1 / 57 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

1 / 57

Innfarn Yoo, 3/29/2018

POINT CLOUD DEEP LEARNING

slide-2
SLIDE 2

2 / 57

AGENDA

  • Introduction
  • Previous Work
  • Method
  • Result
  • Conclusion
slide-3
SLIDE 3

3 / 57

INTRODUCTION

slide-4
SLIDE 4

4 / 57

2D OBJECT CLASSIFICATION

  • Convolutional Neural Networks (CNNs) work very well on 2D images
  • AlexNet, ResNet, & GoogLeNet
  • R-CNN → Fast R-CNN → Faster R-CNN → Mask R-CNN
  • Recent 2D image models can even extract precise object boundaries (FCN → Mask R-CNN)

Deep Learning for 2D Object Classification

[1] He et al., Mask R-CNN (2017)

slide-5
SLIDE 5

5 / 57

3D OBJECT CLASSIFICATION

  • 3D object classification approaches are attracting more attention
  • Collecting 3D point data is easier and cheaper than before (LiDAR & other sensors)
  • Data size is larger than for 2D images
  • The number of open datasets is increasing
  • Recent research approaches human-level detection accuracy
  • MVCNN, ShapeNet, PointNet, VoxNet, VoxelNet, & VRN Ensemble

Deep Learning for 3D Object Classification

[2] Zhou and Tuzel, VoxelNet (2017)

slide-6
SLIDE 6

6 / 57

GOALS

  • Evaluating & comparing different types of neural network models for 3D object classification
  • Providing a generic framework to test multiple 3D neural network models
  • Neural network models that are simple & easy to implement
  • Fast preprocessing (removing the bottleneck of loading, sampling, & jittering 3D data)

The goals of our method

slide-7
SLIDE 7

7 / 57

PREVIOUS WORK

slide-8
SLIDE 8

8 / 57

3D POINT-BASED APPROACHES

  • PointNet
    • First 3D point-based classification
    • Unordered dataset
    • Transform → Multi-Layer Perceptron (MLP) → Max Pool (MP) → Classification

3D Points → Neural Nets

[5] Qi et al., PointNet (2017)
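The reason a max pool suits an unordered point set can be shown with a minimal sketch. This is not PointNet itself: the layer sizes, random weights, and function names below are hypothetical, and the input transform is omitted. It only illustrates that applying the same perceptron to every point and then max-pooling each feature channel gives a result that does not depend on point order.

```python
import random

def shared_mlp(point, weights, biases):
    """Apply the same one-layer perceptron (with ReLU) to a single 3D point."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, point)) + b)
        for row, b in zip(weights, biases)
    ]

def global_feature(points, weights, biases):
    """Max-pool each feature channel over all points (a symmetric function)."""
    per_point = [shared_mlp(p, weights, biases) for p in points]
    return [max(channel) for channel in zip(*per_point)]

rng = random.Random(0)
# Hypothetical sizes: 3 input coordinates -> 8 feature channels, 16 points.
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]
biases = [rng.uniform(-1, 1) for _ in range(8)]
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(16)]

shuffled = points[:]
rng.shuffle(shuffled)

# The pooled feature is identical for the original and the shuffled order.
same = global_feature(points, weights, biases) == global_feature(shuffled, weights, biases)
```

A classifier placed on top of `global_feature` would therefore see the same input no matter how the points were shuffled during preprocessing.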

slide-9
SLIDE 9

9 / 57

PIXEL-BASED APPROACHES

  • Multi-Layer Perceptron (MLP)
  • Convolutional Neural Network (CNN)
  • Multi-View Convolutional Neural Network (MVCNN)

3D → 2D Projections → Neural Nets

[3] Su et al., MVCNN (2015)
[4] Krizhevsky et al., AlexNet (2012)

slide-10
SLIDE 10

10 / 57

VOXEL-BASED APPROACHES

  • VoxNet
  • VRN Ensemble
  • VoxelNet

3D Points → Voxels → Neural Nets

[6] Maturana and Scherer, VoxNet (2015)
[7] Brock et al., VRN Ensemble (2016)
[2] Zhou and Tuzel, VoxelNet (2017)

slide-11
SLIDE 11

11 / 57

METHOD

slide-12
SLIDE 12

12 / 57

PREPROCESSING

  • Loading 3D polygonal objects
  • Required operations on 3D objects
    • Sampling, Shuffling, Jittering, Scaling, & Rotating
    • Projection & Voxelization
  • Python is not well suited to multi-core processing (or multi-threading)
  • The number of objects is notoriously large for single-core processing

Requirement

slide-13
SLIDE 13

13 / 57

FRAMEWORK

Basic Pipeline

nn3d_trainer (C++)

  • Trainer: C++ program
  • Loading 3D Object: Load 3D Objects → Sample Points
  • Sampler Threads: Thread 1, Thread 2, Thread 3, …, Thread N
    • Each thread: 3D Point Sample → Converter (Pixel, Point, Voxel)
  • Call Python NN Model Functions: Train, Test, Eval, Report, & Save
  • Epoch #i → Increase epoch
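The shape of this pipeline can be sketched in a few lines. The original trainer is a C++ program with native sampler threads; the Python below is only a structural illustration, and every function name, the object IDs, and the random stand-in sampler are assumptions. Python threads would not give the same speed-up for CPU-bound sampling, which is the limitation the preprocessing slide points out.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sample_points(obj_id, n_points, seed):
    """Stand-in sampler: returns n_points random 3D points for one object."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random(), rng.random()) for _ in range(n_points)]

def convert(points, representation):
    """Stand-in converter; only the 'point' representation is sketched here."""
    if representation != "point":
        raise NotImplementedError(representation)
    return list(points)

def prepare_epoch(object_ids, epoch, n_points=2048, workers=4):
    """Sample and convert every object in parallel, keeping the input order."""
    def job(obj_id):
        # A per-object, per-epoch seed gives fresh samples every epoch.
        pts = sample_points(obj_id, n_points, seed=f"{obj_id}:{epoch}")
        return convert(pts, "point")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(job, object_ids))

def train_one_epoch(batch):
    """Placeholder for the Python NN model call (train, test, eval, report, save)."""
    return len(batch)

object_ids = ["chair_0001", "desk_0002", "sofa_0003"]  # hypothetical IDs
for epoch in range(2):
    batch = prepare_epoch(object_ids, epoch)
    processed = train_one_epoch(batch)
```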

slide-14
SLIDE 14

14 / 57

3D DATASETS

  • ModelNet10 (Princeton ModelNet Data, http://modelnet.cs.princeton.edu/)
    • 10 Categories
    • 4,930 Objects (2 GB)
    • OFF (CAD) File Format
  • ModelNet40 (Princeton ModelNet Data, http://modelnet.cs.princeton.edu/)
    • 40 Categories
    • 12,431 Objects (10 GB)
    • OFF (CAD) File Format
  • ShapeNet Core v2 (ShapeNet, https://www.shapenet.org/)
    • 55 Categories
    • 51,191 Objects (90 GB)
    • OBJ File Format

slide-15
SLIDE 15

15 / 57

NEURAL NETWORK MODELS

  • Point-Based Models
  • Pixel-Based Models
  • Voxel-Based Models

slide-16
SLIDE 16

16 / 57

POINT-BASED NEURAL NETWORK MODELS

  • Preprocessing:
    • Rotate randomly
    • Scale randomly
    • Uniform sampling on 3D object surfaces
    • Sample 2048 points
    • Shuffle points
  • Tested Models:
    • Multi-Layer Perceptron (MLP)
    • Multi Rotational MLPs
    • Single Orientation CNN
    • Multi Rotational CNNs
    • Multi Rotational Resample & Max Pool Layers
    • ResNet-like

Types of Models
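One common way to do uniform sampling on a mesh surface is sketched below. The slides do not say which sampler is used, so this is an assumption: triangles are picked with probability proportional to their area, a point is drawn inside each picked triangle with uniform barycentric coordinates, and the result is shuffled. The sketch assumes the mesh is already triangulated.

```python
import math
import random

def triangle_area(a, b, c):
    """Area of a 3D triangle from the cross product of two edges."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))

def sample_surface(vertices, faces, n_points, rng):
    """Draw n_points uniformly over the mesh surface, then shuffle them."""
    triangles = [[vertices[i] for i in face] for face in faces]
    areas = [triangle_area(*t) for t in triangles]
    # Pick triangles with probability proportional to their area.
    chosen = rng.choices(triangles, weights=areas, k=n_points)
    points = []
    for a, b, c in chosen:
        # The square root makes the barycentric coordinates uniform over the triangle.
        r1, r2 = math.sqrt(rng.random()), rng.random()
        w = (1 - r1, r1 * (1 - r2), r1 * r2)
        points.append(tuple(w[0] * a[i] + w[1] * b[i] + w[2] * c[i] for i in range(3)))
    rng.shuffle(points)
    return points

# Toy mesh: a unit square in the z = 0 plane, split into two triangles.
vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
faces = [(0, 1, 2), (0, 2, 3)]
cloud = sample_surface(vertices, faces, 2048, random.Random(0))
```

Weighting by area matters because CAD meshes often mix very large and very small triangles; picking faces uniformly would crowd the samples into finely tessellated regions.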
slide-17
SLIDE 17

17 / 57

slide-18
SLIDE 18

18 / 57

POINT-BASED NEURAL NETWORK MODELS

MLP

[Diagram: input 3D points; blocks Flatten Vector, Fully Connected Layer, ReLU + Dropout; loss Softmax Cross Entropy against Class Onehot Vector]
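Every model diagram in this deck ends in a softmax cross entropy against a one-hot class vector. A minimal sketch of that loss, written in plain Python with hypothetical logits for three classes, looks like this (in practice the TensorFlow built-in would be used):

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, onehot):
    """Cross entropy between a one-hot label and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(onehot, probs))

logits = [2.0, 0.5, -1.0]  # hypothetical network outputs for 3 classes
onehot = [1.0, 0.0, 0.0]   # the true class is index 0
loss = softmax_cross_entropy(logits, onehot)
```

With a one-hot label the sum reduces to the negative log-probability of the true class, so the loss is small when the network puts most of its probability on the correct category.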

slide-19
SLIDE 19

19 / 57

POINT-BASED NEURAL NETWORK MODELS

Multi Rotational MLPs

[Diagram: input 3D points; blocks Random 3x3 Rotation, 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout; loss Softmax Cross Entropy against Class Onehot Vector]
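The "Random 3x3 Rotation" block can be illustrated with a short sketch. How the rotations are actually drawn is not stated in the slides; the version below is an assumption that builds a rotation matrix from a random unit axis and a random angle using Rodrigues' formula, then produces 20 rotated copies of a point set (20 matches the "Number of Random Rotations" hyperparameter). Note that a uniform axis with a uniform angle is not uniformly distributed over all rotations, which may or may not matter for training.

```python
import math
import random

def random_rotation(rng):
    """Random 3x3 rotation matrix from a random unit axis and angle."""
    # Normalizing a Gaussian vector gives a uniformly distributed axis.
    axis = [rng.gauss(0, 1) for _ in range(3)]
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)
    theta = rng.uniform(-math.pi, math.pi)
    c, s = math.cos(theta), math.sin(theta)
    t = 1 - c
    # Rodrigues' formula: R = c*I + s*[k]x + (1 - c)*k*k^T
    return [
        [t * x * x + c,     t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c,     t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]

def rotate(points, R):
    """Apply the 3x3 matrix R to every point."""
    return [tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3)) for p in points]

rng = random.Random(0)
points = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.5, 0.5, 0.5)]
views = [rotate(points, random_rotation(rng)) for _ in range(20)]
```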

slide-20
SLIDE 20

20 / 57

POINT-BASED NEURAL NETWORK MODELS

Single Orientation CNN

[Diagram: input 3D points; blocks Random 3x3 Rotation, 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (×2); loss Softmax Cross Entropy against Class Onehot Vector]

slide-21
SLIDE 21

21 / 57

POINT-BASED NEURAL NETWORK MODELS

Multi Rotational CNNs

[Diagram: input 3D points; blocks Random 3x3 Rotation, 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (×2); loss Softmax Cross Entropy against Class Onehot Vector]

slide-22
SLIDE 22

22 / 57

POINT-BASED NEURAL NETWORK MODELS

Multi Rotational Resample & Max Pool Layers

[Diagram: input 3D points; blocks Random 3x3 Rotation, Resample Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (×2); loss Softmax Cross Entropy against Class Onehot Vector]

slide-23
SLIDE 23

23 / 57

POINT-BASED NEURAL NETWORK MODELS

ResNet-like

[Diagram: input 3D points; blocks Random 3x3 Rotation, Resample Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (×3); loss Softmax Cross Entropy against Class Onehot Vector]

slide-24
SLIDE 24

24 / 57

PIXEL-BASED NEURAL NETWORK MODELS

  • Preprocessing:
    • Sample 8192 points
      • Same as point-based models
    • Depth-only orthogonal projection
      • 32x32 or 64x64
    • Generating multiple rotations
      • 64x64x5 & 64x64x10
  • Tested Models:
    • MLP
    • Depth-Only Orthogonal MVCNN

Types of Models
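A depth-only orthogonal projection can be sketched as follows. The viewing direction, the [-1, 1] coordinate range, the depth encoding, and the use of 0 for empty pixels are all assumptions, since the slides only give the image sizes. Each point is dropped straight onto a pixel grid and the pixel keeps the depth of the point closest to the camera.

```python
def depth_image(points, size=32):
    """Orthographic depth map seen from +z; points assumed to lie in [-1, 1]^3."""
    image = [[0.0] * size for _ in range(size)]  # 0.0 marks an empty pixel
    for x, y, z in points:
        # Map x and y from [-1, 1] to pixel indices, clamping at the border.
        col = min(size - 1, max(0, int((x + 1) / 2 * size)))
        row = min(size - 1, max(0, int((y + 1) / 2 * size)))
        depth = (z + 1) / 2  # map z from [-1, 1] to [0, 1]; larger means closer
        image[row][col] = max(image[row][col], depth)
    return image

# Two points fall on the same pixel; the closer one (z = 0.5) wins.
points = [(0.0, 0.0, 0.5), (0.0, 0.0, -0.5), (-1.0, -1.0, 1.0)]
img = depth_image(points, size=32)
```

Stacking the depth images of several rotated copies of the object along a third axis would give multi-view inputs such as 64x64x5.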
slide-25
SLIDE 25

25 / 57

slide-26
SLIDE 26

26 / 57

PIXEL-BASED NEURAL NETWORK MODELS

MLP

[Diagram: input Images (32x32x5); blocks Flatten Vector, Fully Connected Layer; loss Softmax Cross Entropy against Class Onehot Vector]

slide-27
SLIDE 27

27 / 57

PIXEL-BASED NEURAL NETWORK MODELS

Depth-Only Orthogonal MVCNN

[Diagram: input Images (32x32x5); blocks Image Separation, 3D Conv Layer, Max Pooling Layer, Concat, Flatten Vector, Fully Connected Layer; loss Softmax Cross Entropy against Class Onehot Vector]

slide-28
SLIDE 28

28 / 57

VOXEL-BASED NEURAL NETWORK MODELS

  • Preprocessing:
    • Sample 8192 points
      • Same as point-based models
    • Voxelization
      • 3D points → Voxels
      • Each voxel has an intensity from 0.0 to 1.0
        • Based on how many points hit the same voxel
      • 32x32x32 & 64x64x64
  • Tested Models:
    • MLP
    • CNN
    • ResNet-like

Types of Models
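The voxelization step can be sketched as a point count per voxel followed by a normalization to the 0.0 to 1.0 range. The slides do not say how counts become intensities; dividing by the largest count is an assumption made here, as is the [-1, 1] coordinate range.

```python
def voxelize(points, dim=32):
    """Count points per voxel, then normalize the counts to intensities in [0, 1]."""
    counts = {}
    for p in points:
        # Points are assumed to lie in [-1, 1]^3; indices are clamped at the border.
        idx = tuple(min(dim - 1, max(0, int((c + 1) / 2 * dim))) for c in p)
        counts[idx] = counts.get(idx, 0) + 1
    peak = max(counts.values(), default=1)
    grid = [[[0.0] * dim for _ in range(dim)] for _ in range(dim)]
    for (i, j, k), n in counts.items():
        grid[i][j][k] = n / peak  # the fullest voxel gets intensity 1.0
    return grid

# Two points share a voxel, one point sits alone in another.
points = [(0.0, 0.0, 0.0), (0.01, 0.01, 0.01), (0.9, 0.9, 0.9)]
grid = voxelize(points, dim=4)
```

A dense nested list is used only for clarity; at 64x64x64 most voxels are empty, which is the motivation for sparse voxel structures.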
slide-29
SLIDE 29

29 / 57

slide-30
SLIDE 30

30 / 57

VOXEL-BASED NEURAL NETWORK MODELS

MLP

[Diagram: input Images (32x32x5); blocks Flatten Vector, Fully Connected Layer; loss Softmax Cross Entropy against Class Onehot Vector]

slide-31
SLIDE 31

31 / 57

VOXEL-BASED NEURAL NETWORK MODELS

CNN

[Diagram: input Voxels 32x32x32; blocks 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer; loss Softmax Cross Entropy against Class Onehot Vector]

slide-32
SLIDE 32

32 / 57

VOXEL-BASED NEURAL NETWORK MODELS

ResNet-like

[Diagram: input Voxels 32x32x32; blocks 3D Conv Layer, Avg Pooling Layer, Resample Layer, Max Pooling Layer, Concat, Flatten Vector, Fully Connected Layer; loss Softmax Cross Entropy against Class Onehot Vector]

slide-33
SLIDE 33

33 / 57

IMPLEMENTATION

  • System: Ubuntu 16.04, RAM 32 GB & 64 GB, & SSD 512 GB
  • NVIDIA Quadro P6000, Quadro M6000, & GeForce Titan X
  • GCC 5.2.0 for C++11
  • Python 3.5
  • TensorFlow-GPU v1.5.0
  • NumPy 1.0

System Setup

slide-34
SLIDE 34

34 / 57

HYPER PARAMETERS

  • Object Perturbation
    • Random Rotations: -25 to 25 degrees
    • Random Scaling: 0.7 to 1.0
  • Learning Rate: 0.0001
  • Keep Probability (Dropout layer): 0.7
  • Max Epochs: 1000
  • Batch Size: 32
  • Number of Random Rotations: 20
  • Voxel Dim: 32x32x32
  • MVCNN Number of Views: 5
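The object perturbation settings can be turned into a small sketch. The rotation axis is not given in the slides; rotating about the z axis (taken as "up") is an assumption, as are the function and parameter names. The ranges are the ones listed above.

```python
import math
import random

def perturb(points, rng, max_deg=25.0, scale_range=(0.7, 1.0)):
    """Rotate about the z axis by a random angle and apply a random uniform scale."""
    theta = math.radians(rng.uniform(-max_deg, max_deg))
    s = rng.uniform(*scale_range)
    c, sn = math.cos(theta), math.sin(theta)
    return [(s * (c * x - sn * y), s * (sn * x + c * y), s * z) for x, y, z in points]

rng = random.Random(0)
original = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
jittered = perturb(original, rng)
```

Drawing a fresh angle and scale each time an object is loaded means the network rarely sees exactly the same point set twice over the 1000 epochs.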
slide-35
SLIDE 35

35 / 57

RESULT

slide-36
SLIDE 36

36 / 57

MODELNET10 # OF TEST & TRAIN OBJECTS

[Bar chart: # of Test Models and # of Train Models per category (table, toilet, monitor, bathtub, sofa, chair, desk, dresser, night_stand, bed); count axis from 100 to 1000]

slide-37
SLIDE 37

37 / 57

MODELNET10 ACCURACY

Iter: 1000

[Bar chart: Train Accu, Test Accu, and mAP (%) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

slide-38
SLIDE 38

38 / 57

MODELNET40 # OF TEST & TRAIN OBJECTS

[Bar chart: # of Test Models and # of Train Models per category; count axis from 100 to 900]

slide-39
SLIDE 39

39 / 57

MODELNET40 ACCURACY

Iter: 1000

[Bar chart: Train Accu, Test Accu, and mAP (%) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

slide-40
SLIDE 40

40 / 57

MODELNET40 ACCURACY 7 CATEGORIES (# OF TRAIN OBJECTS > 400)

[Bar chart: # of Test Models and # of Train Models per category; count axis from 100 to 900]

slide-41
SLIDE 41

41 / 57

MODELNET40, 7 CATEGORIES ACCURACY

Iter: 1000

[Bar chart: Train Accu, Test Accu, and mAP (%) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

slide-42
SLIDE 42

42 / 57

MODELNET40 ACCURACY 10 CATEGORIES (# OF TRAIN OBJECTS > 300)

[Bar chart: # of Test Models and # of Train Models per category; count axis from 100 to 900]

slide-43
SLIDE 43

43 / 57

MODELNET40, 10 CATEGORIES ACCURACY

[Bar chart: Train Accu, Test Accu, and mAP (%) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

slide-44
SLIDE 44

44 / 57

MODELNET40 ACCURACY 17 CATEGORIES (# OF TRAIN OBJECTS > 200)

[Bar chart: # of Test Models and # of Train Models per category; count axis from 100 to 900]

slide-45
SLIDE 45

45 / 57

MODELNET40, 17 CATEGORIES ACCURACY

Iter: 1000

[Bar chart: Train Accu, Test Accu, and mAP (%) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

slide-46
SLIDE 46

46 / 57

MODELNET10 PERFORMANCE

[Bar chart: Total Training Time in hours (axis 0.00 to 3.00) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

[Bar chart: Inference Time Per Batch in seconds (axis 0.01 to 0.07) for the same models]

slide-47
SLIDE 47

47 / 57

MODELNET40 PERFORMANCE

[Bar chart: Total Training Time in hours (axis 0.00 to 8.00) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

[Bar chart: Inference Time Per Batch in seconds (axis 0.01 to 0.07) for the same models]

slide-48
SLIDE 48

48 / 57

SHAPENET CORE V2 ACCURACY

  • ShapeNet is a fairly large dataset
    • 90 GB of data
  • Only 3 NN models were tested
    • Point MP
    • Depth-Only MVCNN
    • Voxel CNN

Iter: 600

[Bar chart: Train Accu, Test Accu, and mAP (%) for PC MP, PX MVCNN, and VX CNN]

slide-49
SLIDE 49

49 / 57

TRAIN ACCURACY GRAPH

[Line plots, one panel per model: train accuracy (0 to 1) over x-axis ticks 50 to 15410, for PC MLP1, PC MLPs, PC CNN1, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

slide-50
SLIDE 50

50 / 57

TEST ACCURACY GRAPH

[Line plots, one panel per model: test accuracy (0 to 1) over x-axis ticks 1000 to 4000, for PC MLP1, PC MLPs, PC CNN1, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

slide-51
SLIDE 51

51 / 57

CROSS ENTROPY CONVERGENCE GRAPH

[Line plots, one panel per model: cross entropy (0.5 to 2.5) over x-axis ticks 50 to 15410, for PC MLP1, PC MLPs, PC CNN1, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

slide-52
SLIDE 52

52 / 57

CONCLUSION

slide-53
SLIDE 53

53 / 57

CONCLUSION

  • Voxel ResNet and Voxel CNN provide the best results on 3D object classification
  • Unordered vs. Ordered: Unordered data (point clouds) can be learned, but with lower accuracy and higher computation cost than ordered data
  • Projection vs. Voxelization: Voxelization provides better results at a similar computation cost (converges faster & is more precise)
  • Strict comparisons with previous methods have not been done yet
    • Sampling and jittering methods differ (results cannot be compared directly yet)

Models

slide-54
SLIDE 54

54 / 57

CONCLUSION

  • ModelNet10 & ModelNet40
    • Some categories have too few training and testing objects
      • This lowers classification accuracy
  • ModelNet40, 7 categories (# of training objects > 400)
    • Achieved 98% accuracy
    • Partial dataset from ModelNet40 (categories that have more than 400 3D objects)
    • The # of training 3D objects in a category matters
  • ShapeNetCore.v2
    • The # of 3D objects is large enough, but many objects are duplicated

Dataset Evaluation

slide-55
SLIDE 55

55 / 57

FUTURE WORK

  • Running the entire procedure in CUDA
    • Using NVIDIA GVDB → gains the advantages of sparse voxels
  • Strict comparisons with previous methods
    • PointNet, VRN Ensemble, etc.
  • Extending the methods to captured datasets (e.g. KITTI)
    • The training set is too small
    • Need to investigate whether Generative Adversarial Networks (GANs) can alleviate this problem

slide-56
SLIDE 56

56 / 57

REFERENCE

  • [1] He et al., 2017, Mask R-CNN
  • [2] Zhou and Tuzel, 2017, VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
  • [3] Su et al., 2015, Multi-view Convolutional Neural Networks for 3D Shape Recognition
  • [4] Krizhevsky et al., 2012, ImageNet Classification with Deep Convolutional Neural Networks
  • [5] Qi et al., 2017, PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
  • [6] Maturana and Scherer, 2015, VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition
  • [7] Brock et al., 2016, Generative and Discriminative Voxel Modeling with Convolutional Neural Networks
  • [8] He et al., 2016, Deep Residual Learning for Image Recognition
  • [9] Goodfellow et al., 2014, Generative Adversarial Nets
  • [10] Girshick, 2015, Fast R-CNN
  • [11] Ren et al., 2015, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
  • [12] Chang et al., 2015, ShapeNet: An Information-Rich 3D Model Repository
  • [13] Wu et al., 2015, 3D ShapeNets: A Deep Representation for Volumetric Shapes
  • [14] Wu et al., 2014, 3D ShapeNets for 2.5D Object Recognition and Next-Best-View Prediction

slide-57
SLIDE 57

57 / 57