High Dimensional Convolutional Neural Networks for 3D perception
Chris Choy,
Ph.D. candidate @ Stanford Vision and Learning Lab
The Success of Convolutional Networks
AlexNet [Krizhevsky et al.], R-CNN [Girshick et al.], FCNN [Long et al.], GAN [Goodfellow et al.]
Speech Recognition [Abdel-Hamid et al.], Machine Translation, Object Detection, Semantic Segmentation
3D Reconstruction, 3D Object Pose Estimation, 3D Registration, 3D Object Tracking
Nvidia Research, 2019; Microsoft HoloLens; Amazon AR View
3D Perception
○ 3D Reconstruction
○ 3D Semantic Segmentation
○ Perception on a Set of 3D Data
○ 3D Feature Learning
○ 4D Spatio-Temporal Perception
○ 4D and 6D for Registration
○ Supervised Reconstruction
Chris, Danfei, JunYoung, Kevin, Silvio, ECCV’16
JunYoung, Silvio, Manmohan, NIPS’16
Adversarial Constraint, JunYoung, Chris, Manmohan, Animesh, Silvio, 3DV’17
3D Shape Reconstruction from a Single Image, Andrey, Jingwei, Animesh, Viraj, JunYoung, Chris, Silvio, WACV’18
Language by Learning Joint Embeddings, Kevin, Chris, Manolis, Angel, Thomas, Silvio, ACCV’18
Convolutional Neural Networks, Chris, JunYoung, Silvio, CVPR’19
[Longuet-Higgins, Haming et al., Snavely et al., …]
[Eigen et al., Saxena et al., …]
Tomography
Object-centric Reconstruction …
Chris, Danfei, JunYoung, Kevin, Silvio, 3D-Recurrent Reconstruction Neural Networks, ECCV’16
As the number of images grows, the network updates or maintains its prediction (e.g., increasing confidence on the armrests).
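The update/maintain behaviour of a recurrent unit can be illustrated with a GRU-style gated blend over a voxel grid. This is a simplified NumPy sketch, not the 3D-R2N2 code: the function name `gated_update`, the 4×4×4 grid, and the hand-set gate values are hypothetical (in a trained GRU the gates are computed from the current input and the previous state).

```python
import numpy as np

def gated_update(hidden, candidate, update_gate):
    """GRU-style blend of the previous voxel state and a new candidate.

    update_gate == 0 keeps (maintains) the old prediction,
    update_gate == 1 replaces it with the candidate from the new view.
    """
    return (1.0 - update_gate) * hidden + update_gate * candidate

# Hypothetical 4x4x4 occupancy-confidence grid after the first view.
hidden = np.full((4, 4, 4), 0.5)

# Hypothetical candidate from a second view: more confident on one slab of voxels
# (e.g. voxels covering an armrest that the new view reveals).
candidate = hidden.copy()
candidate[0, :, :] = 0.9

# Hand-set gate: open only where the new view is informative.
gate = np.zeros_like(hidden)
gate[0, :, :] = 1.0

new_hidden = gated_update(hidden, candidate, gate)
print(new_hidden[0, 0, 0], new_hidden[3, 3, 3])  # 0.9 0.5
```

Voxels with a closed gate keep their previous value, which is the "maintain" case; voxels with an open gate take the new evidence.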
3D Perception
○ 3D Reconstruction
○ 3D Semantic Segmentation
○ Perception on a Set of 3D Data
○ 3D Feature Learning
○ 4D Spatio-Temporal Perception
○ 4D and 6D for Registration
○ Supervised Reconstruction
3D Point Clouds, Lyne, Chris, Iro, JunYoung, Silvio, 3DV’17
Minkowski Convolutional Neural Networks, Chris, JunYoung, Silvio, CVPR’19
Chris, Jaesik, Vladlen, ICCV’19
Discrete Representation: OctNet and Octree [Riegler et al.], Sparse Tensor [Graham et al., Choy et al.], Points and PointNet [Qi et al.], Continuous Convolution
Continuous Representation: Occupancy Net [Mescheder et al.], Deep SDF [Park et al.], Deep Level Sets [Michalkiewicz et al.], …
Graph Representation: Graph Net [Kipf & Welling], Conv on Graph [Defferrard et al.], …
Hybrid Representation: Continuous + Graph
○ COOrdinate (COO) representation ○ 4 at (0, 0) ○ 1 at (1, 1)
○ 4 at (0, 0, 0) ○ 1 at (1, 1, 0) ○ 9 at (1, 1, 1)
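A minimal NumPy sketch of the COO layout: each non-zero element is stored as one coordinate row plus its feature value. The helper names `dense_to_coo` and `coo_to_dense` are hypothetical; sparse-tensor libraries keep the same information (a coordinate matrix and a feature matrix) behind their own APIs.

```python
import numpy as np

def dense_to_coo(dense):
    """Return (coordinates, features) for the non-zero entries of a dense array."""
    coords = np.argwhere(dense != 0)   # one row per non-zero, one column per axis
    feats = dense[tuple(coords.T)]     # values at those coordinates
    return coords, feats

def coo_to_dense(coords, feats, shape):
    """Scatter COO entries back into a dense array of the given shape."""
    dense = np.zeros(shape, dtype=feats.dtype)
    dense[tuple(coords.T)] = feats
    return dense

# 3D example from the slide: 4 at (0, 0, 0), 1 at (1, 1, 0), 9 at (1, 1, 1).
dense = np.zeros((2, 2, 2), dtype=np.int64)
dense[0, 0, 0], dense[1, 1, 0], dense[1, 1, 1] = 4, 1, 9

coords, feats = dense_to_coo(dense)
print(coords.tolist())  # [[0, 0, 0], [1, 1, 0], [1, 1, 1]]
print(feats.tolist())   # [4, 1, 9]
assert np.array_equal(coo_to_dense(coords, feats, dense.shape), dense)
```

Storage grows with the number of non-zeros rather than with the grid volume, which is what makes high-dimensional, mostly empty spaces tractable.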
[Graham et al., Submanifold Sparse ConvNet, 2017] [Graham and van der Maaten, 3D Sparse ConvNet, 2018]
Sparse Convolution [Graham et al.]: Dense Tensor Kernel, Static Sparsity Pattern; cannot support arbitrary sparsity
Generalized Sparse Convolution [Choy et al.]: Sparse Tensor Kernel, Dynamic Sparsity Pattern; can support arbitrary sparsity
Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19
Sparse Tensor Kernel, Dynamic Sparsity Pattern: can support arbitrary sparsity
○ Sparsity pattern manipulation, e.g., C = A + B, pruning
○ High-dimensional ConvNets: volume of a dense convolution kernel is O(N^D); a sparse convolution kernel is O(D)
○ Generative tasks
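Two of the points above can be checked with small, library-free sketches. Sparse tensors are modeled here as hypothetical Python dicts from integer coordinates to features (an assumption for illustration, not how a sparse-tensor library stores them): adding two of them yields a tensor whose sparsity pattern is the union of both inputs, which a fixed, static pattern could not represent. The second part counts kernel offsets: a dense kernel of width N has N^D offsets, while a cross-shaped kernel (one example of a sparse kernel, assumed here) has 1 + D(N - 1), i.e. O(D) for a fixed N.

```python
from itertools import product

# Hypothetical sparse tensors: dict from integer coordinate tuple to feature value.
A = {(0, 0): 1.0, (1, 1): 2.0}
B = {(1, 1): 3.0, (2, 0): 4.0}

def sparse_add(a, b):
    """C = A + B: the output sparsity pattern is the union of the input patterns."""
    c = dict(a)
    for coord, value in b.items():
        c[coord] = c.get(coord, 0.0) + value
    return c

C = sparse_add(A, B)
print(sorted(C.items()))  # [((0, 0), 1.0), ((1, 1), 5.0), ((2, 0), 4.0)]

def dense_kernel_offsets(size, dim):
    """All offsets of a dense kernel of odd width `size` in `dim` dimensions (size**dim)."""
    r = range(-(size // 2), size // 2 + 1)
    return list(product(r, repeat=dim))

def cross_kernel_offsets(size, dim):
    """Offsets along the coordinate axes only (a cross-shaped sparse kernel, odd `size`)."""
    offsets = {(0,) * dim}
    for axis in range(dim):
        for step in range(-(size // 2), size // 2 + 1):
            o = [0] * dim
            o[axis] = step
            offsets.add(tuple(o))
    return offsets

for d in (3, 4, 6):
    print(d, len(dense_kernel_offsets(3, d)), len(cross_kernel_offsets(3, d)))
# 3 27 7
# 4 81 9
# 6 729 13
```

At D = 6 the dense 3-wide kernel already has 729 weights per channel pair, against 13 for the cross-shaped one.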
Sparse Tensor Kernel, Dynamic Sparsity Pattern, Arbitrary Sparsity
A convolutional neural network library for sparse tensors
○ 42-layer deep neural networks for semantic segmentation ○ 101 layers for classification
ResNet18 vs. 4D MinkNet18
○ Hierarchical map ○ Increases receptive field size exponentially
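A hierarchical coordinate map can be sketched by repeatedly snapping sparse coordinates onto a coarser grid, and the exponential receptive-field growth follows from stacking strided convolutions. Both functions below are hypothetical illustrations: they assume kernel size 3 and stride 2 per level, and do not reproduce any particular library's coordinate manager.

```python
import numpy as np

def downsample_coords(coords, stride):
    """Snap integer coordinates to cells of size `stride` and keep the unique cells.

    Applying this with strides 2, 4, 8, ... gives one coordinate set per level.
    """
    return np.unique((coords // stride) * stride, axis=0)

def receptive_field(num_levels, kernel=3, stride=2):
    """Receptive field (in input voxels) of `num_levels` stacked strided convolutions."""
    rf, jump = 1, 1
    for _ in range(num_levels):
        rf += (kernel - 1) * jump   # each layer adds (kernel - 1) steps of the current spacing
        jump *= stride              # spacing between output cells doubles per level
    return rf

coords = np.array([[0, 0, 0], [1, 0, 0], [2, 3, 1], [3, 3, 1], [7, 7, 7]])
level1 = downsample_coords(coords, 2)   # tensor stride 2
level2 = downsample_coords(level1, 4)   # tensor stride 4, in original units
print(level1.tolist())  # [[0, 0, 0], [2, 2, 0], [6, 6, 6]]
print(level2.tolist())  # [[0, 0, 0], [4, 4, 4]]
print([receptive_field(n) for n in range(1, 6)])  # [3, 7, 15, 31, 63]
```

The receptive field follows 2^(n+1) - 1 for this configuration, while the number of active coordinates shrinks at every level.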
3D Perception
○ 3D Reconstruction
○ 3D Semantic Segmentation
○ Perception on a Set of 3D Data
○ 3D Feature Learning
○ 4D Spatio-Temporal Perception
○ 4D and 6D for Registration
○ Supervised Reconstruction
Chris, JunYoung, Silvio, Manmohan, NIPS’16
Chris, Jaesik, Vladlen, ICCV’19
○ Correspondence, registration, tracking, scene flow, ...
○ Limited context and receptive field ○ Features extracted separately for each crop
○ Normal, Signed Distance Function, curvatures
Choy et al., Fully Convolutional Geometric Features, ICCV’19
Hand-designed features: Spin Image, USC, SHOT, PFH, FPFH
Learned features: 3DMatch, CGF, PointNet, PPF, FoldNet, PPFFold, CapsuleNet, DirectReg, SmoothNet
○ No receptive-field limit imposed by the crop size ○ Efficient reuse of shared computation
Choy et al., Universal Correspondence Network, NIPS’16; Choy et al., Fully Convolutional Geometric Features, ICCV’19
3D Perception
○ 3D Reconstruction
○ 3D Semantic Segmentation
○ Perception on a Set of 3D Data
○ 3D Feature Learning
○ 4D Spatio-Temporal Perception
○ 4D and 6D for Registration
○ Supervised Reconstruction
Memory: O(TN^3), Computation: O(K^4 TN^3)
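The complexity figures can be made concrete with a little arithmetic, assuming T frames of an N×N×N voxel grid and a dense K×K×K×K kernel, counted per input/output channel pair. The frame count, grid size, kernel size, and the 1% occupancy used for the sparse bound are hypothetical values chosen for illustration, not numbers from the talk.

```python
def dense_4d_cost(T, N, K):
    """Grid cells and multiplies for a dense 4D convolution over T frames of N^3 voxels."""
    memory = T * N ** 3          # O(T N^3) cells per channel
    compute = K ** 4 * memory    # O(K^4 T N^3) multiplies per channel pair
    return memory, compute

def sparse_4d_cost(num_occupied, K):
    """Upper bound when only occupied cells are stored and visited."""
    return num_occupied, K ** 4 * num_occupied

mem, ops = dense_4d_cost(T=5, N=128, K=3)
print(mem, ops)  # 10485760 849346560

# Hypothetical occupancy of 1% of the 4D grid, for illustration only.
print(sparse_4d_cost(num_occupied=mem // 100, K=3))
```

Both dense terms scale with the full grid volume, so the cost is paid for empty space as well; the sparse bound scales only with occupied cells.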
Minkowski ConvNet: Sparse Tensor + Generalized Convolution
○ Static objects have the same 3D coordinates across frames ○ GPS, SLAM
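One way to turn a registered sequence into 4D sparse coordinates is to quantize each frame and append the frame index as a fourth coordinate; static structure then reappears at the same (x, y, z) with only t changing. The sketch assumes the frames are already aligned in a common world frame (for example with GPS or SLAM poses, as the slide notes); `frames_to_4d_coords`, the voxel size, and the sample points are hypothetical.

```python
import numpy as np

def frames_to_4d_coords(frames, voxel_size):
    """Quantize each registered frame and append its time index as a 4th coordinate.

    `frames` is a list of (M_t, 3) arrays already expressed in one world frame,
    so a static point keeps the same (x, y, z) voxel in every frame.
    """
    coords = []
    for t, points in enumerate(frames):
        xyz = np.floor(points / voxel_size).astype(np.int64)
        xyz = np.unique(xyz, axis=0)                      # one entry per occupied voxel
        tcol = np.full((len(xyz), 1), t, dtype=np.int64)  # time index column
        coords.append(np.hstack([xyz, tcol]))
    return np.vstack(coords)

static_wall = np.array([[1.12, 0.05, 0.55], [1.14, 0.15, 0.55]])
frame0 = static_wall
frame1 = np.vstack([static_wall, [[3.05, 2.05, 0.05]]])  # a new object appears
coords4d = frames_to_4d_coords([frame0, frame1], voxel_size=0.1)
print(coords4d.tolist())
```

The wall voxels appear as (x, y, z, 0) and (x, y, z, 1), so a 4D kernel sees them as neighbours along t, which is what lets the network exploit temporal consistency.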
○ U-shaped network for 4D semantic segmentation
○ Faster & better ○ Regularized full 4D convolution ○ More effective for small objects
3D ConvNet 4D ConvNet
3D Perception
○ 3D Reconstruction
○ 3D Semantic Segmentation
○ Perception on a Set of 3D Data
○ 3D Feature Learning
○ 4D Spatio-Temporal Perception
○ 4D and 6D for Registration
○ Supervised Reconstruction
Global Consistency
OANet, Zhang et al., 2019; LFGC, Yi et al., 2018; FCGF, Choy et al., 2019; SmoothNet, Gojcic et al., 2019; CapsuleNet, Zhao et al., 2019; PPF, PPF-Fold, Deng et al., 2019, 2018; FoldingNet, Yang et al., 2017; …
OANet, Zhang et al., 2019; LFGC, Yi et al., 2018
Feature Extraction → Nearest Neighbor → ((x, y, z), (x’, y’, z’)) → Approximate P(correspondence correct)
Dimensionless data
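The nearest-neighbour step in this pipeline can be sketched as brute-force matching in feature space. The mutual (two-way) check below is a common filtering choice that is assumed here, not something the slide specifies; the function names and toy features are hypothetical, and large point sets would call for a k-d tree or GPU search instead of the full distance matrix.

```python
import numpy as np

def mutual_nearest_neighbors(feat_a, feat_b):
    """Index pairs (i, j) where a_i and b_j are each other's nearest neighbour
    in feature space (squared Euclidean distance)."""
    d = ((feat_a[:, None, :] - feat_b[None, :, :]) ** 2).sum(-1)  # (Na, Nb) distances
    nn_ab = d.argmin(axis=1)  # best match in B for every point of A
    nn_ba = d.argmin(axis=0)  # best match in A for every point of B
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

def correspondences(xyz_a, xyz_b, feat_a, feat_b):
    """Pair up coordinates ((x, y, z), (x', y', z')) from feature matches."""
    return [(xyz_a[i], xyz_b[j]) for i, j in mutual_nearest_neighbors(feat_a, feat_b)]

feat_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
feat_b = np.array([[0.0, 0.9], [1.0, 0.1]])
print(mutual_nearest_neighbors(feat_a, feat_b))  # [(0, 1), (1, 0)]
```

The third point of A matches B's second point one way only, so the mutual check drops it; the surviving pairs still include outliers in practice, which is what the next stage must score.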
Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020
(x,y,z), (x’, y’, z’)
Inliers: Blue, Outliers: Red
(x, y, z), (x’, y’, z’) → Concatenate → 6D point (x, y, z, x’, y’, z’)
○ Example correspondences: {(1, 5), (3, 7), (4, 8), (5, 9), (2, 9)}
○ Inliers follow the common geometry: (1, 5), (3, 7), (4, 8), (5, 9) all lie on y = x + 4
○ Outliers: (2, 9)
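The 1D example can be checked with an exhaustive hypothesize-and-verify loop: fit y = a*x + b to every pair of correspondences and keep the line with the largest consensus. This is a classical RANSAC-style baseline, swapped in purely to make the example concrete; it is not the convolutional approach the talk proposes, which learns to label inliers directly.

```python
from itertools import combinations

corr = [(1, 5), (3, 7), (4, 8), (5, 9), (2, 9)]

def consensus(pairs, tol=1e-9):
    """Fit y = a*x + b to every pair of correspondences and keep the model
    that the most correspondences agree with (exhaustive hypothesize-and-verify)."""
    best = []
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        if x1 == x2:
            continue  # vertical line: not a function y(x)
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in pairs if abs(y - (a * x + b)) < tol]
        if len(inliers) > len(best):
            best = inliers
    return best

inliers = consensus(corr)
outliers = [c for c in corr if c not in inliers]
print(inliers)   # [(1, 5), (3, 7), (4, 8), (5, 9)]
print(outliers)  # [(2, 9)]
```

Enumerating all pairs is quadratic in the number of correspondences and random sampling only helps when the inlier ratio is high, which motivates learning the inlier/outlier labels instead.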
Label each correspondence as inlier vs. outlier → label each 6D point as inlier vs. outlier (analogous to semantic segmentation: label each 3D point as chair, bed, …)
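Treating correspondences as points in a 6D space amounts to concatenating the two 3D coordinates and quantizing the result onto a sparse grid, after which inlier/outlier labelling is a per-point segmentation problem. The sketch shows only that data-preparation step; the function name and voxel size are assumptions, and the 6D ConvNet that would consume `coords` is not shown.

```python
import numpy as np

def correspondences_to_6d(xyz_src, xyz_tgt, voxel_size):
    """Concatenate each correspondence ((x, y, z), (x', y', z')) into one 6D point
    and quantize it to integer coordinates of a sparse 6D grid."""
    points6d = np.hstack([xyz_src, xyz_tgt])                  # (M, 6)
    coords = np.floor(points6d / voxel_size).astype(np.int64)
    # Several correspondences can fall into one 6D voxel: keep one entry per voxel
    # and remember which voxel each correspondence maps to (to copy labels back).
    unique_coords, inverse = np.unique(coords, axis=0, return_inverse=True)
    return unique_coords, inverse.reshape(-1)

xyz_src = np.array([[0.12, 0.31, 0.05], [0.14, 0.33, 0.06], [0.95, 0.15, 0.45]])
xyz_tgt = np.array([[1.12, 0.31, 0.05], [1.14, 0.33, 0.06], [0.25, 0.85, 0.75]])
coords, inverse = correspondences_to_6d(xyz_src, xyz_tgt, voxel_size=0.1)
print(coords.shape)      # (2, 6)
print(inverse.tolist())  # [0, 0, 1]
```

A per-voxel prediction can be broadcast back to every original correspondence with `labels[inverse]`.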
○ Translation invariance: fragments can be located anywhere in 3D space ○ Multi-resolution (large receptive field, less sparse)
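A short numerical check of why translation invariance matters for the 6D representation: translating fragment 1 by t1 and fragment 2 by t2 moves every 6D point by the same vector (t1, t2), so all relative offsets between 6D points are preserved. Convolutions are translation-equivariant (exactly so for shifts by whole voxels after quantization), so per-point predictions need not depend on where the fragments sit. The random points, the 0.5 offset, and the shifts below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
src = rng.uniform(0, 1, size=(4, 3))          # points in fragment 1
tgt = src + np.array([0.5, 0.0, 0.0])         # their matches in fragment 2
pts6d = np.hstack([src, tgt])

# Move the fragments to arbitrary places: fragment 1 by t1, fragment 2 by t2.
t1, t2 = np.array([10.0, -3.0, 2.0]), np.array([-7.0, 4.0, 1.0])
moved6d = np.hstack([src + t1, tgt + t2])

# Every 6D point shifts by the same vector (t1, t2) ...
shift = moved6d - pts6d
print(np.allclose(shift, np.concatenate([t1, t2])))  # True

# ... so the offsets between 6D points, which a convolution kernel sees, are unchanged.
offsets = pts6d[:, None, :] - pts6d[None, :, :]
moved_offsets = moved6d[:, None, :] - moved6d[None, :, :]
print(np.allclose(offsets, moved_offsets))  # True
```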
3D Fragments → Feature Extraction → Correspondence → Global Registration
6D ConvNet Confidence Filter
Yi et al., Learning to find good correspondences, 2018; Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020
3D Fragments → Feature Extraction → Correspondence (x, y, z), (x’, y’, z’) → Global Registration
Images → Feature Extraction → Correspondence (x, y), (x’, y’) → Global Registration
(x,y), (x’, y’)
Inliers satisfy a second-degree polynomial f(x, y, x’, y’) = 0
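In two-view geometry, the standard example of such a polynomial is the epipolar constraint [x', y', 1] F [x, y, 1]^T = 0, which expands into terms of degree at most two in (x, y, x', y'). The sketch evaluates that residual; the particular F (a rectified stereo pair, where the constraint reduces to y = y') and the sample points are illustrative assumptions.

```python
import numpy as np

def epipolar_residual(F, p, q):
    """Evaluate q^T F p for image points p = (x, y) and q = (x', y').

    Expanding the product gives a polynomial of degree two in (x, y, x', y');
    true correspondences (inliers) make it zero.
    """
    ph = np.array([p[0], p[1], 1.0])
    qh = np.array([q[0], q[1], 1.0])
    return float(qh @ F @ ph)

# Fundamental matrix of a rectified stereo pair (camera translated along x only):
# the constraint reduces to y - y' = 0, i.e. matches lie on the same image row.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

print(epipolar_residual(F, (120.0, 45.0), (98.0, 45.0)))  # 0.0 (inlier)
print(epipolar_residual(F, (120.0, 45.0), (98.0, 60.0)))  # -15.0 (outlier)
```

Because inliers lie on this surface in (x, y, x', y') space, they form a structure a 4D ConvNet can pick out, in the same way as the 6D case for 3D fragments.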
(x,y), (x’, y’)
Yi et al., Learning to find good correspondences, 2018; Zhang et al., Learning Two-View Correspondences and Geometry Using Order-Aware Network, 2019; Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020
3D Perception
○ 3D Reconstruction
○ 3D Semantic Segmentation
○ Perception on a Set of 3D Data
○ 3D Feature Learning
○ 4D Spatio-Temporal Perception
○ 4D and 6D for Registration
○ Supervised Reconstruction
3D Convolutional Networks 4D Convolutional Networks 4D Convolutional Networks 6D Convolutional Networks
7D Convolutional Networks 32D Convolutional Networks
○ Geometric structure
○ 3D object detection ○ Tracking ○ Reconstruction
Vladlen Koltun Jaesik Park JunYoung Gwak Iro Armeni Lyne Tchapmi Manmohan Chandraker Kevin Chen Kuan Fang
Leonidas Guibas Benjamin Van Roy Gordon Wetzstein Tsachy Weissman
Danfei Xu, Yuke Zhu, Animesh Garg, Andrey Kurenkov, Manolis Savva, Angel Chang, Namhoon Lee, Yu Xiang, Junha Lee, Michael Stark