SLIDE 1

High Dimensional Convolutional Neural Networks for 3D perception

Chris Choy,

Ph.D. candidate @ Stanford Vision and Learning Lab

1

SLIDE 2

The Success of Convolutional Networks

5

AlexNet [Krizhevsky et al.], R-CNN [Girshick et al.], FCNN [Long et al.], GAN [Goodfellow et al.]

SLIDE 3

The Success of Convolutional Networks

6

Experience, Versatility, Efficiency

Speech Recognition [Abdel-Hamid et al.], Machine Translation, Object Detection, Semantic Segmentation

SLIDE 4

Examples of 3D Vision Tasks

7

3D Reconstruction, 3D Object Pose Estimation, 3D Registration, 3D Object Tracking

SLIDE 5

3D Vision in Action

8

Nvidia Research, 2019; Microsoft HoloLens; Amazon AR View

SLIDE 6

3D Perception

15

3D Reconstruction, 3D Semantic Segmentation, Perception on a Set of 3D Data, 3D Feature Learning, 4D Spatio-Temporal Perception, 4D and 6D for Registration, Supervised Reconstruction

SLIDE 7

3D Perception

16

3D Reconstruction, 3D Semantic Segmentation, Perception on a Set of 3D Data, 3D Feature Learning, 4D Spatio-Temporal Perception, 4D and 6D for Registration, Supervised Reconstruction

SLIDE 8

3D Reconstruction

  • 3D-Recurrent Reconstruction Neural Networks, Chris, Danfei, JunYoung, Kevin, Silvio, ECCV’16
  • Universal Correspondence Network, Chris, JunYoung, Silvio, Manmohan, NIPS’16
  • Weakly supervised 3D Reconstruction with Adversarial Constraint, JunYoung, Chris, Manmohan, Animesh, Silvio, 3DV’17
  • DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image, Andrey, Jingwei, Animesh, Viraj, JunYoung, Chris, Silvio, WACV’18
  • Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings, Kevin, Chris, Manolis, Angel, Thomas, Silvio, ACCV’18
  • 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, Chris, JunYoung, Silvio, CVPR’19

17

SLIDE 9

3D Reconstruction from Few Images

  • Single or Multi-view images of an object
  • Online retail stores

18

Input Images → 3D Reconstruction

SLIDE 10

3D Reconstruction from Few Images

  • Wide baseline
  • Specular / texture-less region
  • Single view

19

SLIDE 11

3D Reconstruction

20

Observations (Images) → 3D Representation

Algorithms
  • Structure from Motion [Longuet-Higgins, Haming et al., Snavely et al., …]
  • Depth Estimation [Eigen et al., Saxena et al., …]
  • MVS
  • Tomography
  • Object-centric Reconstruction
  • …

SLIDE 12

3D Recurrent Reconstruction Neural Networks

  • End-to-end 3D reconstruction
  • Unified framework
  • Single-view and multi-view reconstruction
  • 3D-Convolutional LSTM
  • Update hidden states

Chris, Danfei, JunYoung, Kevin, Silvio, 3D-Recurrent Reconstruction Neural Networks, ECCV’16

22

SLIDE 13

23

SLIDE 14

24

SLIDE 15

25

SLIDE 16

26

SLIDE 17

27

SLIDE 18

Chris, Danfei, JunYoung, Kevin, Silvio, 3D-Recurrent Reconstruction Neural Networks, ECCV’16

30

Figure: Number of images (Increasing confidence on armrests; Update / maintain prediction)

SLIDE 19

33

Chris, Danfei, JunYoung, Kevin, Silvio, 3D-Recurrent Reconstruction Neural Networks, ECCV’16

Robustness to texture and # views

SLIDE 20

3D Perception

35

3D Reconstruction, 3D Semantic Segmentation, Perception on a Set of 3D Data, 3D Feature Learning, 4D Spatio-Temporal Perception, 4D and 6D for Registration, Supervised Reconstruction

SLIDE 21

3D Perception

  • SegCloud: Semantic Segmentation of 3D Point Clouds, Lyne, Chris, Iro, JunYoung, Silvio, 3DV’17
  • 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, Chris, JunYoung, Silvio, CVPR’19
  • Fully Convolutional Geometric Features, Chris, Jaesik, Vladlen, ICCV’19

36

SLIDE 22

Sparsity of 3D data

37

$O(N^3)$ volume vs. $O(N^2)$ surface
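As a rough numerical check of the surface-versus-volume argument, the sketch below voxelizes a sphere surface inside a unit cube at increasing resolutions N and reports the fraction of voxels the surface touches. The sphere radius (0.4) and the occupancy test (voxel center within half a voxel diagonal of the surface) are illustrative assumptions, not the setup behind the percentages on the following slides. The fraction should roughly halve each time N doubles, i.e. scale like $O(N^2)/O(N^3) = O(1/N)$.

```python
import math

def occupied_fraction(n, radius=0.4):
    """Fraction of an n^3 grid over the unit cube whose voxels touch a sphere surface."""
    h = 1.0 / n                        # voxel edge length
    half_diag = math.sqrt(3) * h / 2   # a voxel may touch the surface if its center is this close
    occupied = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # voxel center, relative to the sphere center at (0.5, 0.5, 0.5)
                x = (i + 0.5) * h - 0.5
                y = (j + 0.5) * h - 0.5
                z = (k + 0.5) * h - 0.5
                if abs(math.sqrt(x * x + y * y + z * z) - radius) <= half_diag:
                    occupied += 1
    return occupied / n ** 3

for n in (8, 16, 32, 64):
    print(n, round(occupied_fraction(n), 4))
```

The test is deliberately conservative (it counts any voxel whose bounding sphere crosses the surface), so the absolute numbers are upper-bound-ish; the point is the trend with N.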

SLIDE 23

38

20cm voxel : 18%

SLIDE 24

39

10cm voxel : 9%

SLIDE 25

40

5cm voxel : 4.5%

SLIDE 26

41

2.5cm voxel : 1.8%

SLIDE 27

Sparse Representations and Convolution

43

Continuous Representation
  • Occupancy Net [Mescheder et al.], Deep SDF [Park et al.], Deep Level Sets [Michalkiewicz et al.], …

Discrete Representation
  • OctNet and Octree [Riegler et al.]
  • Sparse Tensor [Graham et al., Choy et al.]
  • Points and PointNet [Qi et al.]
  • Continuous Convolution
○ PointCNN
○ Monte Carlo Conv
○ Surface / Tangent Conv

Graph Representation
  • Graph Net [Kipf & Welling], Conv on Graph [Defferrard et al.], …

Hybrid Representation
  • Continuous + Graph

SLIDE 28

Sparse Matrix

  • Majority of elements are 0
  • Efficient representation
  • Non-zero elements only
  • Compressed sparse row (CSR)
  • List of lists
  • COOrdinate list
  • Etc.
  • Example: 2x2 matrix

○ COOrdinate (COO) representation
○ 4 at (0, 0)
○ 1 at (1, 1)
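The 2x2 example can be written out directly. The sketch below stores the matrix in COO form (parallel row, column, and value lists holding only the non-zeros) and converts it to a dense matrix and to CSR. The helper names are hypothetical, and duplicate coordinates are assumed not to occur.

```python
# COO: one (row, col) coordinate and one value per non-zero element.
rows = [0, 1]
cols = [0, 1]
vals = [4, 1]
shape = (2, 2)

def coo_to_dense(rows, cols, vals, shape):
    """Materialize the full matrix, filling unstored entries with 0."""
    dense = [[0] * shape[1] for _ in range(shape[0])]
    for r, c, v in zip(rows, cols, vals):
        dense[r][c] = v
    return dense

def coo_to_csr(rows, cols, vals, n_rows):
    """CSR: entries of row r live at positions row_ptr[r] .. row_ptr[r + 1] - 1."""
    order = sorted(range(len(vals)), key=lambda i: (rows[i], cols[i]))
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1          # count entries per row
    for r in range(n_rows):
        row_ptr[r + 1] += row_ptr[r]  # prefix sum gives row start offsets
    col_idx = [cols[i] for i in order]
    data = [vals[i] for i in order]
    return row_ptr, col_idx, data

print(coo_to_dense(rows, cols, vals, shape))   # [[4, 0], [0, 1]]
print(coo_to_csr(rows, cols, vals, shape[0]))  # ([0, 1, 2], [0, 1], [4, 1])
```

COO is the simplest format to build and extends naturally to more than two dimensions, which is why it is the natural starting point for sparse tensors.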

45

SLIDE 29

Sparse Tensor

  • High-dimensional extension
  • COOrdinate representation

○ 4 at (0, 0, 0)
○ 1 at (1, 1, 0)
○ 9 at (1, 1, 1)
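The same COO idea carries over to any number of dimensions: a list of integer coordinate tuples plus one value (or feature vector) per tuple. The sketch below stores the 3D example and adds a dictionary from coordinate to row index so that lookups, which neighborhood operations such as convolution depend on, take constant average time. The dictionary-based map and the `value_at` helper are illustrative assumptions, not the Minkowski Engine's internal data structure.

```python
# Sparse tensor in COO form: one coordinate tuple per non-zero, plus its value.
coords = [(0, 0, 0), (1, 1, 0), (1, 1, 1)]
feats = [4, 1, 9]

# Hash map from coordinate to row index for O(1) average-time lookup.
coord_map = {c: i for i, c in enumerate(coords)}

def value_at(coord):
    """Return the stored value, or 0 for an implicit (unstored) zero."""
    i = coord_map.get(tuple(coord))
    return 0 if i is None else feats[i]

print(value_at((1, 1, 1)))  # 9
print(value_at((0, 1, 0)))  # 0 (not stored)
print(len(coords), "stored of", 2 ** 3, "entries")  # 3 stored of 8 entries
```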

46

SLIDE 30

Convolution on a Sparse Tensor

[Graham et al., Submanifold Sparse ConvNet, 2017] [Graham and van der Maaten, 3D Sparse ConvNet, 2018]

47

Convolution → Sparse Convolution
  • Dense Tensor Kernel
  • Static Sparsity Pattern
  • Cannot support arbitrary sparsity

SLIDE 31

Generalized Convolution

50

  • Sparse Tensor Kernel
  • Dynamic Sparsity Pattern
  • Can support arbitrary sparsity

[Graham et al.] [Choy et al.]

Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

SLIDE 32

Generalized Convolution

51

  • Sparse Tensor Kernel
  • Dynamic Sparsity Pattern
  • Can support arbitrary sparsity
  • Sparsity pattern manipulation
○ Ex) C = A + B
○ Ex) Pruning
  • High-dimensional ConvNet
○ Volume of dense convolution kernel: $O(N^D)$
○ Sparse convolution kernel: $O(D)$
  • Generative Tasks
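One way to read the generalized convolution, following the form in Choy et al., CVPR’19 as we understand it, is $x^{out}_u = \sum_{i \in \mathcal{N}^D(u, C^{in})} W_i\, x^{in}_{u+i}$ for $u \in C^{out}$, where $\mathcal{N}^D(u, C^{in})$ is the set of kernel offsets $i$ for which $u + i$ is an input coordinate. The pure-Python sketch below implements that sum directly. The function and argument names are hypothetical (this is not the Minkowski Engine API): a kernel given as a dict with only some offsets plays the role of a sparse tensor kernel, and `out_coords` is chosen independently of the input, which is what makes the sparsity pattern dynamic.

```python
from itertools import product

def generalized_conv(in_coords, in_feats, out_coords, weights, kernel_size=3):
    """Sketch of x_out[u] = sum over offsets i with (u + i) in C_in of W_i @ x_in[u + i].

    in_coords:  list of integer coordinate tuples (C_in)
    in_feats:   one input feature vector per input coordinate
    out_coords: output coordinates (C_out), chosen freely
    weights:    dict offset -> (out_channels x in_channels) matrix;
                offsets missing from the dict act as a sparse kernel
    """
    dim = len(in_coords[0])
    lookup = {tuple(c): k for k, c in enumerate(in_coords)}
    r = kernel_size // 2
    offsets = list(product(range(-r, r + 1), repeat=dim))
    n_out = len(next(iter(weights.values())))
    result = []
    for u in out_coords:
        acc = [0.0] * n_out
        for off in offsets:
            if off not in weights:
                continue  # offset is not part of the (sparse) kernel
            k = lookup.get(tuple(a + b for a, b in zip(u, off)))
            if k is None:
                continue  # no input at u + off: contributes nothing
            W, x = weights[off], in_feats[k]
            for o in range(n_out):
                acc[o] += sum(W[o][c] * x[c] for c in range(len(x)))
        result.append(acc)
    return result

# Toy 2D example: one channel, cross-shaped (sparse) 3x3 kernel.
in_coords = [(0, 0), (1, 0), (5, 5)]
in_feats = [[1.0], [2.0], [3.0]]
cross = {(0, 0): [[2.0]], (1, 0): [[1.0]], (-1, 0): [[1.0]],
         (0, 1): [[1.0]], (0, -1): [[1.0]]}
# (2, 0) is not an input coordinate: the output pattern need not match the input.
out = generalized_conv(in_coords, in_feats, [(0, 0), (2, 0)], cross)
print(out)  # [[4.0], [2.0]]
```

The cross-shaped kernel has 2D + 1 = 5 entries instead of 3^2 = 9, which is the sense in which a sparse kernel grows linearly rather than exponentially with dimension D.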

SLIDE 33

Generalized Convolution: Special Cases

52

Sparse Tensor Kernel Dynamic Sparsity Pattern

  • Dilated Convolution
  • Separable Convolution
  • Sparse Convolution
  • Octree Generative Networks

Arbitrary sparsity

  • Dense Convolution
SLIDE 34

Minkowski Engine

A convolutional neural network library for sparse tensors

  • Convolution
  • [Max/Avg/Global] Pool
  • Broadcast
  • [Batch/Instance] Normalization
  • Tensor arithmetic
  • Pruning

60

Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

SLIDE 35

Minkowski Network

  • Very deep convolutional neural networks possible in 3D

○ 42-layer deep neural networks for semantic segmentation
○ 101 layers for classification

  • Reuse network architectures from years of research in 2D

61

ResNet18 4D MinkNet18

Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

SLIDE 36

Minkowski Engine for other applications

62

Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

SLIDE 37

Sparsity Pattern Reconstruction

65

Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

SLIDE 38

3D Perception: Semantic Segmentation

  • Partition 3D scans or data into semantic parts
  • Label each voxel or 3D point with one of the semantic labels

66

SLIDE 39

3D Semantic Segmentation on Sparse Tensors

  • Sparse tensors for all input/output feature maps
  • U-shaped network

○ Hierarchical map
○ Increases receptive field size exponentially
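The exponential growth follows from the standard receptive-field recurrence for stacked convolutions: each layer widens the receptive field by (k - 1) times the current "jump" (the product of all earlier strides), and every stride-2 layer doubles that jump. The per-level layout below (one width-3 stride-1 convolution followed by a width-2 stride-2 convolution) is a hypothetical choice for illustration, not the exact architecture used in the paper.

```python
def receptive_fields(levels, k=3):
    """Receptive field width (input voxels per axis) after each encoder level."""
    rf, jump = 1, 1
    widths = []
    for _ in range(levels):
        rf += (k - 1) * jump  # stride-1 convolution with kernel width k
        rf += (2 - 1) * jump  # stride-2 downsampling convolution with kernel width 2
        jump *= 2             # downsampling doubles the spacing between output sites
        widths.append(rf)
    return widths

def flat_receptive_field(layers, k=3):
    """Stride-1 convolutions only (no downsampling): linear growth."""
    return 1 + layers * (k - 1)

print(receptive_fields(5))       # [4, 10, 22, 46, 94]
print(flat_receptive_field(10))  # 21
```

With ten convolutions in both cases, the hierarchical version covers 94 voxels per axis (about $3 \cdot 2^L - 2$ after L levels), while the flat stack covers only 21.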

67

Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

SLIDE 40
SLIDE 41
SLIDE 42

Results: ScanNet

70

Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

SLIDE 43

Results: Stanford 3D

72

Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

SLIDE 44

3D Perception

74

3D Reconstruction, 3D Semantic Segmentation, Perception on a Set of 3D Data, 3D Feature Learning, 4D Spatio-Temporal Perception, 4D and 6D for Registration, Supervised Reconstruction

SLIDE 45

3D Feature Learning

  • Universal Correspondence Network, Chris, JunYoung, Silvio, Manmohan, NIPS’16
  • Fully Convolutional Geometric Features, Chris, Jaesik, Vladlen, ICCV’19

75

SLIDE 46

3D Geometric Feature

  • A vector representation of the local / global 3D geometry

○ Correspondence, registration, tracking, scene flow, ...
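A common way geometric features become correspondences is nearest-neighbor search in feature space: each feature in fragment A is matched to the closest feature in fragment B. The sketch below does a brute-force search with squared Euclidean distance; the function name and the toy 2D features are hypothetical, and practical pipelines typically use an approximate or GPU nearest-neighbor index instead.

```python
def nearest_neighbor_matches(feats_a, feats_b):
    """For each feature in A, return (index in A, index of the closest feature in B)."""
    matches = []
    for i, fa in enumerate(feats_a):
        dists = [sum((a - b) ** 2 for a, b in zip(fa, fb)) for fb in feats_b]
        matches.append((i, min(range(len(feats_b)), key=dists.__getitem__)))
    return matches

feats_a = [[0.0, 1.0], [1.0, 0.0]]
feats_b = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(nearest_neighbor_matches(feats_a, feats_b))  # [(0, 1), (1, 0)]
```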

76

SLIDE 47

Prior works in 3D Geometric Features

  • Extract a small 3D patch

○ Limits context and receptive field
○ Features extracted separately

  • Preprocessing

○ Normal, Signed Distance Function, curvatures

Choy et al., Fully Convolutional Geometric Features, ICCV’19

77

Hand-designed Features: Spin Image, USC, SHOT, PFH, FPFH
Learned Features: 3DMatch, CGF, PointNet, PPF, FoldNet, PPFFold, CapsuleNet, DirectReg, SmoothNet

SLIDE 48

Fully Convolutional Metric Learning

  • No preprocessing, no patch extraction

○ No receptive field limit imposed by crop size
○ Efficient reuse of shared computation

  • Hardest Negative Mining

Choy et al., Universal Correspondence Network, NIPS’16
Choy et al., Fully Convolutional Geometric Features, ICCV’19
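Hardest negative mining can be sketched as a contrastive loss in which each matched pair is pulled together, and the negative term uses only the closest non-matching feature (the "hardest" negative). The version below is a simplified brute-force sketch for illustration: the margins `m_pos = 0.1` and `m_neg = 1.4` and the function names are assumptions, and it is not the exact loss or negative-sampling scheme of the papers cited above.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hardest_contrastive_loss(feats_a, feats_b, positives, m_pos=0.1, m_neg=1.4):
    """positives: list of (i, j) pairs meaning feats_a[i] corresponds to feats_b[j].

    Positive term: penalize matched features farther apart than m_pos.
    Negative term: for each anchor, penalize only its closest non-matching
    feature in B if it is nearer than m_neg (hardest negative).
    """
    pos_loss, neg_loss = 0.0, 0.0
    for i, j in positives:
        pos_loss += max(0.0, dist(feats_a[i], feats_b[j]) - m_pos) ** 2
        hardest = min(dist(feats_a[i], feats_b[k])
                      for k in range(len(feats_b)) if k != j)
        neg_loss += max(0.0, m_neg - hardest) ** 2
    n = len(positives)
    return pos_loss / n, neg_loss / n

# Well-separated features: both terms vanish.
print(hardest_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                               [[1.0, 0.0], [0.0, 1.0]], [(0, 0), (1, 1)]))  # (0.0, 0.0)
```

Because only the hardest negative contributes, easy negatives (already far away) cost nothing, and the gradient concentrates on the confusable ones.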

80

SLIDE 49

Fully Convolutional Geometric Features

Choy et al., Fully Convolutional Geometric Features, ICCV’19

81

SLIDE 50

3D Perception

82

3D Reconstruction, 3D Semantic Segmentation, Perception on a Set of 3D Data, 3D Feature Learning, 4D Spatio-Temporal Perception, 4D and 6D for Registration, Supervised Reconstruction

SLIDE 51

4D Spatio-temporal data (3D Video)

83

SLIDE 52

3D to 4D Spatio-temporal perception

  • 4D Markov Random Fields for Medical Imaging [McInerney & Terzopoulos, 1995]
  • 4D Cardiac Image Segmentation [Lorenzo-Valdés et al., 2014]

84

Advantages of 4D data

  • Temporal consistency
  • Novel viewpoint
  • Dynamics / Action

Challenges of 4D data

  • Weak 3D perception
  • Complexity

Memory: $O(TN^3)$, Computation: $O(K^4 T N^3)$
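To put the dense 4D costs in numbers, read T as the number of frames, N as the number of voxels per spatial axis, and K as the kernel width (our interpretation of the slide's symbols), and take hypothetical values N = 256, T = 10, K = 3. A single dense 4D convolution then touches about $1.7 \times 10^8$ grid cells and needs about $1.4 \times 10^{10}$ multiply-adds per input/output channel pair.

```python
def dense_4d_cost(n, t, k):
    """Grid cells and multiply-adds for one dense K^4 convolution over a T x N^3 grid
    (stride 1, an output at every cell, one input and one output channel)."""
    memory = t * n ** 3
    compute = k ** 4 * memory
    return memory, compute

mem, ops = dense_4d_cost(n=256, t=10, k=3)
print(f"{mem:,} cells, {ops:,} multiply-adds")
# 167,772,160 cells, 13,589,544,960 multiply-adds
```

Real networks multiply this by the number of channel pairs and layers, which is why the dense formulation becomes impractical and motivates the sparse tensor approach.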

SLIDE 53

High Dimensional Spaces and Generalized Convolution

85

Challenges

  • Weak 3D perception
  • Complexity

Memory: $O(TN^3)$, Computation: $O(K^4 T N^3)$

Minkowski ConvNet Sparse Tensor Generalized Convolution

Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

SLIDE 54

4D Spatio-Temporal Semantic Segmentation

  • Spatially aligned 3D video

○ Static objects have the same 3D coordinates
○ GPS, SLAM

  • Synthetic dataset: Synthia
  • Network:

○ U-shaped Net for semantic segmentation, in 4D
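Once the frames are aligned to a common world frame, building 4D input for a sparse ConvNet can be as simple as quantizing each frame's points and appending the frame index as a fourth coordinate. The sketch below assumes the points are already registered (for example from GPS/SLAM poses) and uses a hypothetical 0.5 m voxel size and toy points; the function name is also hypothetical. A static object keeps the same (x, y, z) across time, while a moving one changes.

```python
import math

def to_4d_coords(frames, voxel_size):
    """frames: list of point lists [(x, y, z), ...], one list per time step,
    already expressed in a shared world frame.
    Returns the sorted unique integer (x, y, z, t) voxel coordinates."""
    coords = set()
    for t, points in enumerate(frames):
        for x, y, z in points:
            coords.add((math.floor(x / voxel_size),
                        math.floor(y / voxel_size),
                        math.floor(z / voxel_size),
                        t))
    return sorted(coords)

frames = [
    [(1.02, 2.01, 0.5), (5.0, 0.0, 0.0)],  # t = 0: static pole, moving car
    [(1.02, 2.01, 0.5), (5.6, 0.0, 0.0)],  # t = 1: pole unchanged, car moved
]
print(to_4d_coords(frames, voxel_size=0.5))
# [(2, 4, 1, 0), (2, 4, 1, 1), (10, 0, 0, 0), (11, 0, 0, 1)]
```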

86

Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

SLIDE 55
SLIDE 56

88

SLIDE 57

Results: 4D Synthia Dataset

94

Faster & Better Regularized Full 4D convolution More effective for small objects

Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

SLIDE 58

95

3D ConvNet 4D ConvNet

Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

SLIDE 59

3D Perception

96

3D Reconstruction, 3D Semantic Segmentation, Perception on a Set of 3D Data, 3D Feature Learning, 4D Spatio-Temporal Perception, 4D and 6D for Registration, Supervised Reconstruction

SLIDE 60

3D Reconstruction

98

3D Scans → Preprocessing → Fragments → 3D Pairwise Registration (Fragment A, Fragment B) → Global Consistency

SLIDE 61

3D Pairwise Registration

99

3D Fragments → Feature Extraction → Correspondence → Global Registration

OANet, Zhang et al., 2019; LFGC, Yi et al., 2018; FCGF, Choy et al., 2019; SmoothNet, Gojcic et al., 2019; CapsuleNet, Zhao et al., 2019; PPF, PPF-Fold, Deng et al., 2019, 2018; FoldingNet, Yang et al., 2017; …

SLIDE 62

3D Pairwise Registration

100

3D Fragments → Feature Extraction → Correspondence → Global Registration

OANet, Zhang et al., 2019; LFGC, Yi et al., 2018

Figure labels: ((x,y,z), (x’, y’, z’)); Nearest Neighbor; Feature Extraction; Dimensionless data; Approximate P(correspondence correct)

SLIDE 63

Geometry of 3D Correspondence

101

Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

Fragment A Fragment B

(x,y,z), (x’, y’, z’)

Inliers: Blue, Outliers: Red

SLIDE 64

3D Correspondences and 6D Surface

102

Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

(x,y,z), (x’, y’, z’)

  • (x,y,z) → Fragment A
  • (x’,y’,z’) → Fragment B

Concatenate

  • (x, y, z, x’, y’, z’)
  • First 3 follow A, last 3 follow B
  • Inliers follow the common geometry

6D Hyper Surface
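The concatenation step is short to write down. In the toy sketch below, fragment B is assumed to be fragment A shifted by a pure translation t = (1, 2, 3), a hypothetical special case of a rigid motion. Every inlier 6D point then satisfies (x’, y’, z’) - (x, y, z) = t, so the inliers lie on a single 3D flat inside 6D space, while the outlier falls off it. The helper name and voxel size are assumptions for illustration.

```python
import math

def to_6d(pairs, voxel_size):
    """pairs: [((x, y, z), (x2, y2, z2)), ...] matched points from fragments A and B.
    Concatenate each pair into (x, y, z, x2, y2, z2) and quantize to integers."""
    return [tuple(math.floor(v / voxel_size) for v in (*p, *q)) for p, q in pairs]

pairs = [((0, 0, 0), (1, 2, 3)),  # inlier
         ((1, 0, 0), (2, 2, 3)),  # inlier
         ((0, 1, 0), (4, 4, 4))]  # outlier: does not follow the shift
points6d = to_6d(pairs, voxel_size=1.0)
print(points6d)  # [(0, 0, 0, 1, 2, 3), (1, 0, 0, 2, 2, 3), (0, 1, 0, 4, 4, 4)]

# Last three coordinates minus first three: constant (= t) for inliers only.
offsets = [tuple(b - a for a, b in zip(p[:3], p[3:])) for p in points6d]
print(offsets)  # [(1, 2, 3), (1, 2, 3), (4, 3, 4)]
```

With a general rotation the inlier set is still a 3D surface in 6D (the graph of the rigid transform), just no longer an axis-aligned flat.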

SLIDE 65

Correspondences form high-dimensional geometry

  • X = {1,2,3,4,5}
  • Y = T(X) where T(x) := x + 4
  • Correspondence

○ {(1, 5), (3, 7), (4, 8), (5, 9), (2, 9)}

  • Correct correspondences

○ Follow the common geometry
○ Inliers

  • Incorrect correspondences

○ Outliers
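When T is known, the toy example can be checked directly: a pair (x, y) is an inlier exactly when y = T(x), i.e. when the 2D point (x, y) lies on the line y = x + 4. The 6D ConvNet has to recognize the analogous structure without being given T; the snippet below only reproduces the labels.

```python
X = [1, 2, 3, 4, 5]

def T(x):
    return x + 4

correspondences = [(1, 5), (3, 7), (4, 8), (5, 9), (2, 9)]

inliers = [(x, y) for x, y in correspondences if y == T(x)]
outliers = [(x, y) for x, y in correspondences if y != T(x)]
print(inliers)   # [(1, 5), (3, 7), (4, 8), (5, 9)]
print(outliers)  # [(2, 9)]
```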

103

Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

SLIDE 66

Inlier vs. Outlier

Label each correspondence as Inlier vs. Outlier → Label each 6D point as an Inlier vs. Outlier → Label each 3D point as chair, bed, …

104

Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

SLIDE 67

6D Convolutional Neural Network

105

Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

  • Translation invariance: Fragments can be located anywhere in 3D space
  • Multi-resolution (large receptive field, less sparse)

SLIDE 68

Results: 3D Correspondence Segmentation

106

3D Fragments → Feature Extraction → Correspondence → Global Registration

Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

6D ConvNet Confidence Filter

SLIDE 69

Results: 3D Correspondence Segmentation

107

Yi et al., Learning to find good correspondences, 2018
Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

SLIDE 70

108

SLIDE 71

109

SLIDE 72

3D Correspondences and 6D Geometry

110

Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

Fragment A Fragment B

(x,y,z), (x’, y’, z’)

3D Fragments → Feature Extraction → Correspondence → Global Registration

SLIDE 73

2D Correspondences and 4D Geometry

111

Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

Image A Image B

(x,y), (x’, y’)

Images → Feature Extraction → Correspondence → Global Registration

SLIDE 74

2D Correspondences and 4D Geometry

112

Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

(x,y), (x’, y’)

  • (x,y) → Image A
  • (x’,y’) → Image B
SLIDE 75

2D Correspondences and 4D Geometry

113

Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

Second-degree polynomial: $f(x, y, x', y') = 0$

SLIDE 76

Conic Sections

114

SLIDE 77

4D Hyper Conic Section of 5D Hyper Cones

115

SLIDE 78

2D Correspondences and 4D Geometry

116

Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

(x,y), (x’, y’)

  • (x,y) → Image A
  • (x’,y’) → Image B
  • Second-degree polynomial = 0

4D hyper conic section
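A standard source of such a second-degree constraint is the epipolar constraint. Assuming calibrated cameras with normalized image coordinates and a relative pose $X_B = R X_A + t$, every correct match satisfies $\tilde{x}_B^\top E\, \tilde{x}_A = 0$ with $E = [t]_\times R$ and $\tilde{x} = (x, y, 1)$ (for pixel coordinates the fundamental matrix F takes the place of E). Expanding the product gives terms in $xx'$, $xy'$, $yx'$, $yy'$, $x$, $y$, $x'$, $y'$ and a constant, so it is a polynomial of degree at most 2 in $(x, y, x', y')$. The pose and point values below are hypothetical, and the sketch only checks the constraint; it is not the network from the slides.

```python
import math

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w equals v x w."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

# Hypothetical relative pose X_B = R @ X_A + t (0.1 rad rotation about the y axis).
theta = 0.1
R = [[math.cos(theta), 0.0, math.sin(theta)],
     [0.0, 1.0, 0.0],
     [-math.sin(theta), 0.0, math.cos(theta)]]
t = [0.5, 0.1, 0.0]
E = matmul(skew(t), R)  # essential matrix

def project(X):
    """Normalized homogeneous image coordinates (x, y, 1), identity intrinsics."""
    return [X[0] / X[2], X[1] / X[2], 1.0]

def epipolar_residual(xa, xb):
    """xb^T E xa: a polynomial of degree <= 2 in (x, y, x', y')."""
    Exa = matvec(E, xa)
    return sum(xb[i] * Exa[i] for i in range(3))

X_A = [0.3, -0.2, 4.0]  # a 3D point in camera A's frame
X_B = [r + ti for r, ti in zip(matvec(R, X_A), t)]
xa, xb = project(X_A), project(X_B)
print(abs(epipolar_residual(xa, xb)) < 1e-12)               # True: correct match
print(abs(epipolar_residual(xa, [0.9, 0.4, 1.0])) < 1e-12)  # False: outlier
```

The residual vanishes for the true match because $X_B^\top [t]_\times R X_A = X_B^\top (t \times (X_B - t)) = 0$; the set of all such (x, y, x’, y’) is the 4D hyper conic section described above.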

SLIDE 79

YFCC 100M dataset

117

Yi et al., Learning to find good correspondences, 2018
Zhang et al., Learning Two-View Correspondences and Geometry Using Order-Aware Network, 2019
Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

SLIDE 80

118

Ours Zhang et al. Yi et al.

Yi et al., Learning to find good correspondences, 2018
Zhang et al., Learning Two-View Correspondences and Geometry Using Order-Aware Network, 2019
Choy et al., High-dimensional Convolutional Networks for Geometric Pattern Recognition, 2020

SLIDE 81

3D Perception

119

3D Reconstruction, 3D Semantic Segmentation, Perception on a Set of 3D Data, 3D Feature Learning, 4D Spatio-Temporal Perception, 4D and 6D for Registration, Supervised Reconstruction

3D Convolutional Networks 4D Convolutional Networks 4D Convolutional Networks 6D Convolutional Networks

Conclusions

7D Convolutional Networks 32D Convolutional Networks

SLIDE 82

Conclusions and Future Work

  • Many more high-dimensional problems

○ Geometric structure

  • Expand the high-dimensional pattern recognition problems to

○ 3D object detection
○ Tracking
○ Reconstruction

120

SLIDE 83

Thank you

121

SLIDE 84

Thank you

122

Vladlen Koltun Jaesik Park JunYoung Gwak Iro Armeni Lyne Tchapmi Manmohan Chandraker Kevin Chen Kuan Fang

SLIDE 85

Thank you

123

Leonidas Guibas Benjamin Van Roy Gordon Wetzstein Tsachy Weissman

SLIDE 86

Thank you

Danfei Xu, Yuke Zhu, Animesh Garg, Andrey Kurenkov, Manolis Savva, Angel Chang, Namhoon Lee, Yu Xiang, Junha Lee, Michael Stark

124

SLIDE 87

Thank you for your attention

125