Point-Voxel CNN for Efficient 3D Deep Learning



SLIDE 1

Point-Voxel CNN for Efficient 3D Deep Learning

Zhijian Liu*, Haotian Tang*, Yujun Lin, and Song Han

Hardware, AI and Neural-nets

Project Page: http://pvcnn.mit.edu/

SLIDE 2

3D Deep Learning

3D Semantic Segmentation (for VR/AR Headsets)
3D Object Detection (for Self-Driving Cars)
3D Part Segmentation (for Robotic Systems)

SLIDE 3

Efficient 3D Deep Learning

Off-chip DRAM access is much more expensive than arithmetic operations!
Random memory access is inefficient due to potential bank conflicts!

Figure: Bandwidth (GB/s) of Mult and Add, SRAM Memory, and DRAM Memory (plotted values: 30, 167, 668; annotation: 20x slower)

Random Memory Access: 7 5 2 4 6 1 8 3

Sequential Memory Access: 1 2 3 4 5 6 7 8
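The two access patterns on this slide can be illustrated with a small NumPy sketch. NumPy is an assumption here (the slides show no code), and the helper names `gather` and `time_gather` are hypothetical. The index orders are copied from the slide; the timing helper is only indicative, since the gap between sequential and random reads depends on the hardware and on the array size.

```python
import time
import numpy as np

def gather(data, order):
    """Read `data` in the given (0-based) index order."""
    return data[order]

# The two access orders from the slide, converted to 0-based indices.
random_order = np.array([7, 5, 2, 4, 6, 1, 8, 3]) - 1
sequential_order = np.array([1, 2, 3, 4, 5, 6, 7, 8]) - 1

data = np.arange(10, 90, 10)  # eight values: 10, 20, ..., 80
seq = gather(data, sequential_order)
rnd = gather(data, random_order)

# Both orders touch the same eight elements; only the visiting order differs.
assert sorted(rnd.tolist()) == seq.tolist()

def time_gather(n=1_000_000, seed=0):
    """Return (sequential_seconds, random_seconds) for gathering n floats."""
    rng = np.random.default_rng(seed)
    big = rng.random(n)
    seq_idx = np.arange(n)          # contiguous, cache-friendly reads
    rnd_idx = rng.permutation(n)    # scattered reads
    t0 = time.perf_counter()
    big[seq_idx].sum()
    t1 = time.perf_counter()
    big[rnd_idx].sum()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

if __name__ == "__main__":
    seq_s, rnd_s = time_gather()
    print(f"sequential: {seq_s * 1e3:.2f} ms, random: {rnd_s * 1e3:.2f} ms")
```

On a CPU the difference comes mostly from cache misses rather than DRAM bank conflicts, so this only mirrors the slide's point qualitatively.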

SLIDE 4

Voxel-Based Models: Cubically-Growing Memory

VoxNet [IROS’15], 3D ShapeNets [CVPR’15], 3D U-Net [MICCAI’16]

Figure: GPU Memory (GB) vs. Voxel Resolution
64 x 64 x 64 resolution: 11 GB (Titan XP x 1), 42% information loss
128 x 128 x 128 resolution: 83 GB (Titan XP x 7), 7% information loss
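The cubic growth named in the slide title can be made concrete with a back-of-the-envelope calculation. This is a minimal sketch: the function name `dense_grid_bytes`, the channel count, the batch size, and the float32 storage are all illustrative assumptions, not values from the slides. It counts a single dense feature map only, whereas the GPU memory figures on the slide cover an entire model.

```python
def dense_grid_bytes(resolution, channels=64, batch_size=1, bytes_per_value=4):
    """Bytes for one dense feature grid of shape (B, C, r, r, r).

    channels, batch_size and bytes_per_value are illustrative defaults.
    """
    return batch_size * channels * resolution ** 3 * bytes_per_value

for r in (32, 64, 128):
    gib = dense_grid_bytes(r) / 2 ** 30
    print(f"{r} x {r} x {r} grid: {gib:.4f} GiB per feature map")

# Doubling the resolution multiplies the number of voxels by 2^3 = 8.
growth = dense_grid_bytes(128) / dense_grid_bytes(64)
print(f"64 -> 128 growth factor: {growth:.1f}x")
```

The factor of 8 per doubling is roughly in line with the slide's jump from 11 GB at 64 x 64 x 64 to 83 GB at 128 x 128 x 128.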

SLIDE 5

Point-Based Models: Sparsity Overheads

PointNet [CVPR’17], PointCNN [NeurIPS’18], DGCNN [SIGGRAPH’19]

Runtime (%)

Model       Irregular Access   Dynamic Kernel   Actual Computation
DGCNN       51.8               2.9              45.3
PointCNN    36.3               51.5             12.2
SpiderCNN   57.4               27.0             15.6
Ours        4.9                0.0              95.1
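To show where the "irregular access" overhead of point-based models can come from, here is a minimal brute-force k-nearest-neighbor gather in NumPy. It is an illustrative sketch, not the implementation of any model named on this slide, and the function name `knn_gather` is hypothetical. The point is that the neighbors of a point are not contiguous in memory, so each neighborhood needs a search followed by a scattered read.

```python
import numpy as np

def knn_gather(coords, feats, k):
    """Find the k nearest neighbors of every point and gather their features.

    coords: (N, 3) point coordinates
    feats:  (N, C) per-point features
    Returns neighbor indices (N, k) and gathered features (N, k, C).
    """
    # Pairwise squared distances, shape (N, N): O(N^2) work before any
    # feature computation has happened.
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    # Each row is sorted by distance; the point itself comes first
    # (distance 0), assuming no duplicate points.
    idx = np.argsort(dist2, axis=1)[:, :k]
    # Fancy indexing performs the scattered (irregular) memory reads.
    return idx, feats[idx]

rng = np.random.default_rng(0)
coords = rng.random((50, 3))
feats = rng.random((50, 4))
idx, neighbor_feats = knn_gather(coords, feats, k=8)
print(idx.shape, neighbor_feats.shape)
```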

SLIDE 6

Point-Voxel Convolution (PVConv)

Voxel-based branch: Normalize -> Voxelize -> Convolve -> Devoxelize
Point-based branch: Multi-Layer Perceptron
Fuse

SLIDE 7

Point-Voxel Convolution (PVConv)

Point-Based Feature Transformation (Fine-Grained): Multi-Layer Perceptron

Voxel-based branch: Normalize -> Voxelize -> Convolve -> Devoxelize
Fuse
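The point-based branch can be pictured as a multi-layer perceptron whose weights are shared across all points. The sketch below uses a single linear layer followed by ReLU, which is an assumption made for brevity (the slides only say "Multi-Layer Perceptron"); the name `shared_mlp` and the random weights are placeholders.

```python
import numpy as np

def shared_mlp(feats, weight, bias):
    """Apply the same linear layer + ReLU to every point independently.

    feats:  (N, C_in) per-point features
    weight: (C_in, C_out), bias: (C_out,)
    Returns (N, C_out).
    """
    return np.maximum(feats @ weight + bias, 0.0)

rng = np.random.default_rng(0)
feats = rng.standard_normal((1024, 16))    # 1024 points, 16 channels
weight = rng.standard_normal((16, 32))     # placeholder weights
bias = np.zeros(32)

out = shared_mlp(feats, weight, bias)

# Each point is processed on its own, so shuffling the points simply
# shuffles the output rows: no neighborhood lookup is involved.
perm = rng.permutation(len(feats))
assert np.allclose(shared_mlp(feats[perm], weight, bias), out[perm])
```

Because no neighbor lookup is needed, this branch keeps per-point (fine-grained) resolution while reading memory contiguously.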

SLIDE 8

Point-Voxel Convolution (PVConv)

Voxel-Based Feature Aggregation (Coarse-Grained): Normalize -> Voxelize -> Convolve -> Devoxelize

Point-based branch: Multi-Layer Perceptron
Fuse

SLIDE 9

Point-Voxel Convolution (PVConv)

Voxel-Based Feature Aggregation (Coarse-Grained): Normalize -> Voxelize -> Convolve -> Devoxelize
Point-Based Feature Transformation (Fine-Grained): Multi-Layer Perceptron
Fuse
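The steps named on this slide can be strung together in a short NumPy sketch. It rests on several stated assumptions: the 3D convolution is replaced by a fixed 3x3x3 mean filter, devoxelization uses a nearest-voxel lookup instead of interpolation, the point branch is a single linear layer with ReLU, and the two branches are fused by addition. All function names are hypothetical, and the released PVCNN code may differ in each of these choices.

```python
import numpy as np

def normalize(coords):
    """Center the cloud and scale it into the unit cube [0, 1]^3."""
    centered = coords - coords.mean(axis=0)
    radius = np.linalg.norm(centered, axis=1).max()  # assumes >= 2 distinct points
    return centered / (2.0 * radius) + 0.5

def voxelize(norm_coords, feats, r):
    """Average the features of all points that fall into the same voxel."""
    idx = np.clip((norm_coords * r).astype(int), 0, r - 1)   # (N, 3) voxel indices
    flat = (idx[:, 0] * r + idx[:, 1]) * r + idx[:, 2]       # (N,) flattened index
    grid = np.zeros((r ** 3, feats.shape[1]))
    count = np.zeros(r ** 3)
    np.add.at(grid, flat, feats)     # unbuffered add handles repeated indices
    np.add.at(count, flat, 1.0)
    grid /= np.maximum(count, 1.0)[:, None]
    return grid.reshape(r, r, r, -1), idx

def box_convolve(grid):
    """Stand-in for a learned 3D convolution: 3x3x3 mean filter, zero padding."""
    r = grid.shape[0]
    padded = np.pad(grid, ((1, 1), (1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(grid)
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                out += padded[dx:dx + r, dy:dy + r, dz:dz + r]
    return out / 27.0

def devoxelize(grid, idx):
    """Map voxel features back to points (nearest-voxel lookup)."""
    return grid[idx[:, 0], idx[:, 1], idx[:, 2]]

def pvconv_sketch(coords, feats, weight, r=8):
    """Coarse voxel branch plus fine point branch, fused by addition."""
    grid, idx = voxelize(normalize(coords), feats, r)
    voxel_feats = devoxelize(box_convolve(grid), idx)   # coarse-grained
    point_feats = np.maximum(feats @ weight, 0.0)       # fine-grained
    return voxel_feats + point_feats                    # fuse

rng = np.random.default_rng(0)
coords = rng.standard_normal((2048, 3))
feats = rng.standard_normal((2048, 16))
weight = rng.standard_normal((16, 16))   # placeholder weights, C_in == C_out
fused = pvconv_sketch(coords, feats, weight)
print(fused.shape)
```

The voxel grid here is kept at a low resolution (`r=8`) on purpose: the regular grid supplies neighborhood context cheaply, while the point branch keeps the per-point detail that a coarse grid would lose.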

SLIDE 10

Point-Voxel Convolution (PVConv)

Figure panels: Features from Voxel-Based Branch; Features from Point-Based Branch

SLIDE 11

Results: 3D Part Segmentation (ShapeNet)

Figure: Mean IoU vs. GPU Latency (ms) and Mean IoU vs. GPU Memory (GB)
Models compared: PVCNN, PointNet, PointNet++, 3D-UNet, RSNet, DGCNN, SpiderCNN, PointCNN

SLIDE 12

Results: 3D Part Segmentation (ShapeNet)

Figure: Mean IoU vs. GPU Latency (ms) and Mean IoU vs. GPU Memory (GB)
Models compared: PVCNN, PointNet, PointNet++, 3D-UNet, RSNet, DGCNN, SpiderCNN, PointCNN
Annotations: 2.7x speedup (GPU latency), 1.5x reduction (GPU memory)

SLIDE 13

Results: 3D Part Segmentation (ShapeNet)

Objects per Second

Device              PointNet (83.7 mIoU)   PVCNN (85.2 mIoU)
Jetson Nano         8.2                    19.9
Jetson TX2          20.3                   42.6
Jetson AGX Xavier   76.0                   139.9
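The throughput numbers on this slide translate into per-device speedups with a few lines of arithmetic. The pairing of values to devices and models below is an assumption: the extracted chart lists the six values without labels, and this sketch assigns the larger value of each pair to PVCNN and orders the devices from slowest (Jetson Nano) to fastest (Jetson AGX Xavier).

```python
# Objects per second, as read from the slide's bar chart.
# ASSUMPTION: the value-to-device/model pairing is inferred, not stated.
throughput = {
    "Jetson Nano":       {"PointNet": 8.2,  "PVCNN": 19.9},
    "Jetson TX2":        {"PointNet": 20.3, "PVCNN": 42.6},
    "Jetson AGX Xavier": {"PointNet": 76.0, "PVCNN": 139.9},
}

speedups = {
    device: values["PVCNN"] / values["PointNet"]
    for device, values in throughput.items()
}

for device, ratio in speedups.items():
    print(f"{device}: PVCNN is {ratio:.2f}x faster than PointNet")
```

Under that pairing, PVCNN processes roughly 1.8x to 2.4x as many objects per second as PointNet while reporting a higher mIoU (85.2 vs. 83.7).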

SLIDE 14

Results: 3D Semantic Segmentation (S3DIS)

Figure: Mean IoU vs. GPU Latency (ms) and Mean IoU vs. GPU Memory (GB)
Models compared: PVCNN, PVCNN++, PointNet, 3D-UNet, RSNet, DGCNN, PointCNN

SLIDE 15

Results: 3D Semantic Segmentation (S3DIS)

Figure: Mean IoU vs. GPU Latency (ms) and Mean IoU vs. GPU Memory (GB)
Models compared: PVCNN, PVCNN++, PointNet, 3D-UNet, RSNet, DGCNN, PointCNN
Annotations: 6.9x speedup (GPU latency), 5.7x reduction (GPU memory)

SLIDE 16

Results: 3D Semantic Segmentation (S3DIS)

Figure panels: Input Scene, PointNet, PVCNN (1.8x faster), Ground Truth

SLIDE 17

Results: 3D Object Detection (KITTI)

Model               GPU Latency      GPU Memory      Pedestrian    Cyclist       Car
F-PointNet++        105.2 ms         2.0 GB          61.6          62.4          72.8
PVCNN (efficient)   58.9 ms (1.8x)   1.4 GB (1.4x)   60.7 (-0.9)   63.6 (+1.2)   73.0 (+0.2)
PVCNN (complete)    69.6 ms (1.5x)   1.4 GB (1.4x)   64.9 (+3.3)   65.9 (+3.5)   73.1 (+0.3)

Column annotations: Faster (GPU Latency), Lower (GPU Memory), More Accurate (Pedestrian, Cyclist, Car)
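The ratios and deltas in parentheses follow directly from the raw numbers on this slide. A small sketch that recomputes them (the dictionary layout and the function name `compare` are illustrative; all values are copied from the slide):

```python
baseline = {
    "latency_ms": 105.2, "memory_gb": 2.0,
    "ap": {"Pedestrian": 61.6, "Cyclist": 62.4, "Car": 72.8},
}
models = {
    "PVCNN (efficient)": {
        "latency_ms": 58.9, "memory_gb": 1.4,
        "ap": {"Pedestrian": 60.7, "Cyclist": 63.6, "Car": 73.0},
    },
    "PVCNN (complete)": {
        "latency_ms": 69.6, "memory_gb": 1.4,
        "ap": {"Pedestrian": 64.9, "Cyclist": 65.9, "Car": 73.1},
    },
}

def compare(model, base=baseline):
    """Return speedup, memory reduction, per-class deltas, and frames per second."""
    return {
        "speedup": round(base["latency_ms"] / model["latency_ms"], 1),
        "memory_reduction": round(base["memory_gb"] / model["memory_gb"], 1),
        "ap_delta": {
            cls: round(model["ap"][cls] - base["ap"][cls], 1) for cls in base["ap"]
        },
        "fps": round(1000.0 / model["latency_ms"]),
    }

for name, model in models.items():
    print(name, compare(model))
```

The same conversion (1000 ms divided by the latency) gives about 10 frames per second for F-PointNet++ and about 17 for PVCNN (efficient), which agrees with the frame rates quoted for the KITTI demo in this deck.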

SLIDE 18

Results: 3D Object Detection (KITTI)

Figure panels: F-PointNet++ (10 FPS), PVCNN (17 FPS, 1.8x faster)

SLIDE 19

Point-Voxel CNN for Efficient 3D Deep Learning

3D Part Segmentation (ShapeNet): 2.7x measured speedup, 1.5x memory reduction
3D Semantic Segmentation (S3DIS): 6.9x measured speedup, 5.7x memory reduction
3D Object Detection (KITTI): 1.8x measured speedup, 1.4x memory reduction
Gold Medal in Lyft Challenge on 3D Object Detection for Autonomous Vehicles

GitHub: https://github.com/mit-han-lab/pvcnn
Project Page: http://pvcnn.mit.edu
Poster: 10:45-12:45 PM @ East Exhibition Hall B + C #112