Project Page: http://pvcnn.mit.edu/
Hardware, AI and Neural-nets
Zhijian Liu*, Haotian Tang*, Yujun Lin, and Song Han
Point-Voxel CNN for Efficient 3D Deep Learning
3D Semantic Segmentation (for VR/AR Headsets)
3D Object Detection (for Self-Driving Cars)
3D Part Segmentation (for Robotic Systems)
Off-chip DRAM access is much more expensive than arithmetic operations!
Random memory access is inefficient due to potential bank conflicts!
Figure: Bandwidth (GB/s) of Mult and Add, SRAM Memory, and DRAM Memory. Plotted values: 30, 167, 668 (annotation: 20x slower).
Random Memory Access: 7 5 2 4 6 1 8 3
Sequential Memory Access: 1 2 3 4 5 6 7 8
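To make the contrast between the two access orders concrete, here is a minimal sketch under a toy model. It assumes memory is fetched in blocks of four consecutive addresses and that only the most recently fetched block stays buffered. The function name `count_block_fetches` and the block size are illustrative assumptions, and this is not a model of real DRAM bank behavior.

```python
def count_block_fetches(access_order, block_size=4):
    """Count block loads under a toy one-block buffer model.

    Assumption (not from the slides): memory is fetched in blocks of
    `block_size` consecutive addresses, and only the most recently
    fetched block stays buffered.
    """
    fetches = 0
    buffered_block = None
    for address in access_order:
        block = (address - 1) // block_size  # addresses are 1-based, as on the slide
        if block != buffered_block:
            fetches += 1
            buffered_block = block
    return fetches


random_order = [7, 5, 2, 4, 6, 1, 8, 3]
sequential_order = [1, 2, 3, 4, 5, 6, 7, 8]

print("sequential:", count_block_fetches(sequential_order))
print("random:", count_block_fetches(random_order))
```

Under these assumptions the sequential order needs 2 block fetches and the random order needs 6, even though both touch the same eight addresses.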
VoxNet [IROS’15], 3D ShapeNets [CVPR’15], 3D U-Net [MICCAI’16]
Figure: GPU Memory (GB) vs. Voxel Resolution
64 x 64 x 64 resolution: 11 GB (Titan XP x 1), 42% information loss
128 x 128 x 128 resolution: 83 GB (Titan XP x 7), 7% information loss
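The jump from 11 GB to 83 GB is roughly what cubic growth of a dense grid would predict. The short calculation below compares the growth in cell count with the growth in reported memory. The helper `voxel_count` is an illustrative name, and the memory figures are simply the two values quoted on the slide.

```python
def voxel_count(resolution):
    """Number of cells in a dense cubic grid with `resolution` cells per side."""
    return resolution ** 3


low_res, high_res = 64, 128
low_res_memory_gb = 11.0   # reported on the slide for 64 x 64 x 64
high_res_memory_gb = 83.0  # reported on the slide for 128 x 128 x 128

cell_ratio = voxel_count(high_res) / voxel_count(low_res)
reported_ratio = high_res_memory_gb / low_res_memory_gb

print(f"cells grow {cell_ratio:.1f}x, reported memory grows {reported_ratio:.1f}x")
```

Doubling the resolution multiplies the number of cells by 8, and the reported memory grows by about 7.5x, so the two are close.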
Figure: Runtime (%) split into Irregular Access, Dynamic Kernel, and Actual Computation for DGCNN, PointCNN, SpiderCNN, and Ours.
Plotted values, in extracted order: 95.1, 0.0, 4.9; 15.6, 27.0, 57.4; 12.2, 51.5, 36.3; 45.3, 2.9, 51.8
PointCNN [NeurIPS’18], PointNet [CVPR’17], DGCNN [SIGGRAPH’19]
Diagram: two branches whose outputs are fused
Voxel-Based Feature Aggregation (Coarse-Grained): Normalize, Voxelize, Convolve, Devoxelize
Point-Based Feature Transformation (Fine-Grained): Multi-Layer Perceptron
Fuse
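The stages named in the diagram can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation:

- features are scalars;
- normalization is a min/extent rescale into the unit cube;
- the learned 3D convolution is replaced by a fixed 3x3x3 uniform kernel;
- devoxelization is a nearest-neighbor lookup, chosen for brevity;
- the MLP is a single neuron with ReLU;
- fusion is element-wise addition.

All function names and default parameters are hypothetical.

```python
from collections import defaultdict
from itertools import product


def normalize(points):
    """Shift and scale coordinates into the unit cube (simplified assumption)."""
    mins = [min(p[d] for p in points) for d in range(3)]
    extent = max(max(p[d] for p in points) - mins[d] for d in range(3)) or 1.0
    return [tuple((p[d] - mins[d]) / extent for d in range(3)) for p in points]


def voxel_index(point, resolution):
    """Map a normalized point to integer grid coordinates."""
    return tuple(min(int(c * resolution), resolution - 1) for c in point)


def voxelize(points, features, resolution):
    """Average the features of all points that fall into the same voxel."""
    buckets = defaultdict(list)
    for point, feature in zip(points, features):
        buckets[voxel_index(point, resolution)].append(feature)
    return {idx: sum(vals) / len(vals) for idx, vals in buckets.items()}


def convolve(grid):
    """Stand-in for a learned 3D convolution: 3x3x3 uniform kernel, zero padding."""
    out = {}
    for (x, y, z) in grid:
        total = sum(grid.get((x + dx, y + dy, z + dz), 0.0)
                    for dx, dy, dz in product((-1, 0, 1), repeat=3))
        out[(x, y, z)] = total / 27.0
    return out


def devoxelize(grid, points, resolution):
    """Nearest-neighbor lookup: each point reads the feature of its own voxel."""
    return [grid[voxel_index(p, resolution)] for p in points]


def point_mlp(features, weight=0.5, bias=0.1):
    """One-neuron 'MLP' with ReLU, applied to each point independently."""
    return [max(0.0, weight * f + bias) for f in features]


def pvconv(points, features, resolution=2):
    """Run both branches and fuse them by addition (assumed fusion rule)."""
    normalized = normalize(points)
    grid = convolve(voxelize(normalized, features, resolution))
    voxel_branch = devoxelize(grid, normalized, resolution)  # coarse-grained
    point_branch = point_mlp(features)                       # fine-grained
    return [v + p for v, p in zip(voxel_branch, point_branch)]


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 1.0, 1.0)]
features = [1.0, 3.0, 5.0]
fused = pvconv(points, features)
print(fused)
```

The first two points share a voxel, so they receive the same coarse feature from the voxel branch, while the point branch keeps them distinct. That split is the division of labor the diagram describes.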
Figure panels: Features from Voxel-Based Branch; Features from Point-Based Branch
Figure: Mean IoU vs. GPU Latency (ms) and Mean IoU vs. GPU Memory (GB)
Models: PVCNN, PointCNN, DGCNN, RSNet, 3D-UNet, SpiderCNN, PointNet++, PointNet
PVCNN: 2.7x speedup, 1.5x memory reduction
Objects per Second     Jetson Nano   Jetson TX2   Jetson AGX Xavier
PointNet (83.7 mIoU)   8.2           20.3         76.0
PVCNN (85.2 mIoU)      19.9          42.6         139.9
Figure: Mean IoU vs. GPU Latency (ms) and Mean IoU vs. GPU Memory (GB)
Models: PVCNN, PVCNN++, 3D-UNet, PointCNN, RSNet, DGCNN, PointNet
6.9x speedup, 5.7x memory reduction
Figure panels: Input Scene; PointNet; PVCNN (1.8x faster); Ground Truth
Model               GPU Latency      GPU Memory      Pedestrian    Cyclist       Car
F-PointNet++        105.2 ms         2.0 GB          61.6          62.4          72.8
PVCNN (efficient)   58.9 ms (1.8x)   1.4 GB (1.4x)   60.7 (-0.9)   63.6 (+1.2)   73.0 (+0.2)
PVCNN (complete)    69.6 ms (1.5x)   1.4 GB (1.4x)   64.9 (+3.3)   65.9 (+3.5)   73.1 (+0.3)
F-PointNet++ (10 FPS) PVCNN (17 FPS, 1.8x faster)
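The speedup, memory, and frame-rate figures quoted for F-PointNet++ and PVCNN can be reproduced from the reported latencies and memory footprints. The sketch below assumes that frame rate is simply the reciprocal of per-frame latency (one frame per forward pass). The helper names `ratio` and `frames_per_second` are illustrative.

```python
# Reported values for the F-PointNet++ baseline and the two PVCNN variants.
baseline = {"latency_ms": 105.2, "memory_gb": 2.0}
variants = {
    "PVCNN (efficient)": {"latency_ms": 58.9, "memory_gb": 1.4},
    "PVCNN (complete)": {"latency_ms": 69.6, "memory_gb": 1.4},
}


def ratio(baseline_value, new_value):
    """How many times smaller the new value is than the baseline."""
    return baseline_value / new_value


def frames_per_second(latency_ms):
    """Assumes one frame per forward pass, so FPS = 1000 / latency in ms."""
    return 1000.0 / latency_ms


print(f"F-PointNet++: {frames_per_second(baseline['latency_ms']):.0f} FPS")
for name, stats in variants.items():
    speedup = ratio(baseline["latency_ms"], stats["latency_ms"])
    reduction = ratio(baseline["memory_gb"], stats["memory_gb"])
    fps = frames_per_second(stats["latency_ms"])
    print(f"{name}: {speedup:.1f}x faster, {reduction:.1f}x less memory, {fps:.0f} FPS")
```

With these inputs the efficient variant works out to about 1.8x faster, 1.4x less memory, and 17 FPS against roughly 10 FPS for the baseline, matching the quoted figures.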
GitHub: https://github.com/mit-han-lab/pvcnn
Poster: 10:45-12:45 PM @ East Exhibition Hall B + C #112
Project Page: http://pvcnn.mit.edu
2.7x measured speedup, 1.5x memory reduction
6.9x measured speedup, 5.7x memory reduction
1.8x measured speedup, 1.4x memory reduction
Gold Medal in Lyft Challenge on 3D Object Detection for Autonomous Vehicles