Hao Su
Deep 3D Representation Learning for Visual Computing
July 6, 2017
2
Overview of 3D deep learning 3D deep learning algorithms Conclusion
3
Overview of 3D deep learning 3D deep learning algorithms Conclusion Background 3D deep learning tasks
4
5
Robotics
6
Robotics Augmented Reality
Autonomous driving
8
Robotics Augmented Reality Medical Image Processing
Autonomous driving
9
Robotics Augmented Reality Medical Image Processing
10
Stanford bunny Utah teapot Princeton shape benchmark [Shilane et al. 04] 1800 models in 90 categories
Status as of 2010:
11
Nowadays millions of 3D models in online repositories
12
Growing market of crowd-sourcing for 3D modeling Nowadays millions of 3D models in online repositories
13
Growing market of crowd-sourcing for 3D modeling
An opportunity for data-driven 3D visual computing
Nowadays millions of 3D models in online repositories
14
15
Build 3D knowledge base
Category
…
Functionality Parts Mass Size Material
…
16
Category
…
Functionality Parts Mass Size Material
…
Build 3D knowledge base Design deep learning methods
18
CV CG ML
19
3D geometry analysis 3D synthesis 3D-assisted image analysis
20
3D geometry analysis
Classification Parsing (object/scene) Correspondence
21
3D synthesis
Monocular 3D reconstruction Shape completion Shape modeling
22
3D-assisted image analysis
Query Results
Cross-view image retrieval Intrinsic decomposition
23
3D geometry analysis 3D synthesis 3D-assisted image analysis
24
3D geometry analysis 3D synthesis
25
Overview of 3D deep learning 3D deep learning algorithms Conclusion
3D Representation issue Deep learning on different 3D representations
26
Images: Unique representation with regular data structure
27
3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based CAD models
28
Novel view image synthesis
3D has many representations: multi-view RGB(D) images volumetric polygonal mesh point cloud primitive-based CAD models
33
3D has many representations:
Rasterized form (regular grids): multi-view RGB(D) images, volumetric
Geometric form (irregular): polygonal mesh, point cloud, primitive-based CAD models
34
Multi-view: [Su et al. 2015], [Kalogerakis et al. 2016], …
Volumetric: [Maturana et al. 2015], [Wu et al. 2015] (GAN), [Qi et al. 2016], [Liu et al. 2016], [Wang et al. 2017] (O-CNN), [Tatarchenko et al. 2017] (OGN), …
35
Multi-view: [Su et al. 2015], [Kalogerakis et al. 2016], …
Volumetric: [Maturana et al. 2015], [Wu et al. 2015] (GAN), [Qi et al. 2016], [Liu et al. 2016], [Wang et al. 2017] (O-CNN), [Tatarchenko et al. 2017] (OGN), …
Point cloud: [Qi et al. 2017] (PointNet), [Fan et al. 2017] (PointSetGen)
Mesh (Graph CNN): [Defferrard et al. 2016], [Henaff et al. 2015], [Yi et al. 2017] (SyncSpecCNN), …
Part assembly: [Tulsiani et al. 2017], [Li et al. 2017] (GRASS)
36
3D geometry analysis 3D synthesis
37
Can we directly apply CNN on 3D data?
38
Can we directly apply CNN on 3D data? Convolution needs an underlying structure
39
3D has many representations:
Rasterized form (regular grids): multi-view RGB(D) images, volumetric
40
3D has many representations:
Rasterized form (regular grids): multi-view RGB(D) images, volumetric
Geometric form (irregular): polygonal mesh, point cloud, primitive-based CAD models
Cannot directly apply CNN to the geometric form
42
▪ Leverage the huge CNN literature in image analysis
43
▪ Classification
Pipeline: rendered views → CNN1 (applied to each view) → view pooling → CNN2 (a second ConvNet producing shape descriptors) → softmax
Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller, "Multi-view Convolutional Neural Networks for 3D Shape Recognition", Proceedings of ICCV 2015
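A minimal sketch of the view-pooling step, assuming each rendered view has already been turned into a feature vector by CNN1. The pooling here is an element-wise max across views, as in the cited multi-view CNN paper; the function name `view_pool` and the toy sizes are illustrative, not taken from the slides.

```python
import numpy as np

def view_pool(view_features: np.ndarray) -> np.ndarray:
    """Element-wise max over the view axis.

    view_features: (num_views, feature_dim), one descriptor per rendered view.
    Returns one (feature_dim,) descriptor that does not depend on view order.
    """
    return view_features.max(axis=0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(12, 8))            # 12 views, 8-dim features (toy sizes)
pooled = view_pool(feats)
shuffled = view_pool(feats[rng.permutation(12)])  # same views, different order
```

Because the max is taken independently per feature dimension, reordering the views leaves the pooled descriptor unchanged, which is what lets a single CNN2 consume any number of views.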
44
▪ Novel-view RGB(D) image synthesis (direct prediction)
Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox, “Multi-view 3D Models from Single Images with a Convolutional Network”, ECCV 2016
45
▪ Novel-view RGB(D) image synthesis (flow prediction)
Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros, “View Synthesis by Appearance Flow”, ECCV 2016
46
47
[Tatarchenko et al.]
48
49
3D classification
50
A true 3D representation is more natural for 3D learning
52
[Credit: Su et al.]
3D convolution uses 4D kernels
53
[Credit: Su et al.]
3D convolution uses 4D kernels
High space/time complexity
O(N^3)
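To make the cost concrete, here is a naive sketch of a single-output-channel 3D convolution (cross-correlation, as in most deep learning code). The kernel carries three spatial axes plus a channel axis, hence "4D". The function name `conv3d_valid`, the cubic-volume assumption, and the toy sizes are illustrative only.

```python
import numpy as np

def conv3d_valid(volume: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """volume: (N, N, N, C); kernel: (k, k, k, C) -> output (N-k+1, N-k+1, N-k+1)."""
    n, k = volume.shape[0], kernel.shape[0]
    m = n - k + 1
    out = np.zeros((m, m, m))
    for x in range(m):              # three nested spatial loops: O(N^3) positions
        for y in range(m):
            for z in range(m):
                patch = volume[x:x + k, y:y + k, z:z + k, :]
                out[x, y, z] = np.sum(patch * kernel)
    return out

vol = np.ones((8, 8, 8, 2))
ker = np.ones((3, 3, 3, 2))
out = conv3d_valid(vol, ker)

# Number of voxels for a few grid resolutions
voxels = {n: n ** 3 for n in (32, 64, 128)}
```

Doubling the resolution multiplies the number of voxels, and therefore the number of kernel evaluations, by eight.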
54
AlexNet, 2012: input resolution 224x224; 224x224 = 50176
3DShapeNets, 2015: input resolution 30x30x30; 30x30x30 = 27000
55
Occupancy Grid 30x30x30 Polygon Mesh
Information loss in voxelization
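A small sketch of where the information goes, assuming the shape is given as points sampled in the unit cube (the slide voxelizes a polygon mesh; point samples are used here only to keep the example short). The function name `voxelize` is hypothetical.

```python
import numpy as np

def voxelize(points: np.ndarray, resolution: int) -> np.ndarray:
    """points: (N, 3) in [0, 1)^3 -> boolean occupancy grid of shape (R, R, R)."""
    idx = np.clip((points * resolution).astype(int), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True   # many points may hit the same voxel
    return grid

rng = np.random.default_rng(0)
pts = rng.random((5000, 3))
occupied = {r: int(voxelize(pts, r).sum()) for r in (4, 30)}
```

All points that fall into the same cell collapse into a single occupied voxel, so the grid can never hold more distinct positions than it has cells, and detail below the cell size is lost.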
56
Resolution:
32 64 128
Occupancy:
Yangyan Li, Sören Pirk, Hao Su, Charles R. Qi, Leonidas J. Guibas FPNN: Field Probing Neural Networks for 3D Data NIPS2016
57
Octree: recursively partition the space
Each internal node has exactly eight children
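The recursion can be sketched as follows, assuming a point-sampled surface and a fixed maximum depth. Only cells that contain points are subdivided, so empty space costs a single leaf. The dictionary layout and the names `build_octree` and `count_nodes` are illustrative choices, not the data structure used by the cited papers.

```python
import numpy as np

def build_octree(points, lo, hi, depth, max_depth):
    """Return a nested dict: leaves are 'empty' or 'full', internal nodes hold 8 children."""
    if len(points) == 0:
        return {"leaf": "empty"}
    if depth == max_depth:
        return {"leaf": "full"}
    mid = (lo + hi) / 2.0
    children = []
    for octant in range(8):
        # Bit a of the octant index selects the upper half along axis a
        bits = np.array([(octant >> a) & 1 for a in range(3)], dtype=bool)
        c_lo = np.where(bits, mid, lo)
        c_hi = np.where(bits, hi, mid)
        inside = np.all((points >= c_lo) & (points < c_hi), axis=1)
        children.append(build_octree(points[inside], c_lo, c_hi, depth + 1, max_depth))
    return {"children": children}

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.get("children", []))

rng = np.random.default_rng(0)
# A flat "surface": random x, y and constant z
pts = np.column_stack([rng.random(2000), rng.random(2000), np.full(2000, 0.1)])
tree = build_octree(pts, np.zeros(3), np.ones(3), 0, 4)
n_nodes = count_nodes(tree)
```

For a surface-like input such as this plane, the tree has far fewer nodes than the 16x16x16 dense grid of the same depth, which is the saving that octree-based networks exploit.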
58
Gernot Riegler, Ali Osman Ulusoy, Andreas Geiger, “OctNet: Learning Deep 3D Representations at High Resolutions”, CVPR 2017
Pengshuai Wang, Yang Liu, Yuxiao Guo, Chunyu Sun, Xin Tong, “O-CNN: Octree-based Convolutional Neural Network for Understanding 3D Shapes”, SIGGRAPH 2017
59
Define convolution and pooling along the octree
Challenge: how to implement this efficiently?
Build a hash table to index the neighborhood
Restrict the convolution stride to be 2
60
Christopher B. Choy, Danfei Xu*, JunYoung Gwak*, Kevin Chen, Silvio Savarese, “3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction”, ECCV 2016
61
Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox “Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs” arxiv (March, 2017)
62
63
64
Thin structures are missing because the shape space is improperly structured: it is hard for the network to rotate / deform / interpolate
65
3D has many representations:
Rasterized form (regular grids): multi-view RGB(D) images, volumetric
Geometric form (irregular): polygonal mesh, point cloud, primitive-based CAD models
Cannot directly apply CNN to the geometric form
Warning: math heavy. You can take a break if you do not like math that much.
Directly conduct convolution on graphs Conduct convolution on 2D parameterization of 3D surfaces
67
Directly conduct convolution on graphs Conduct convolution on 2D parameterization of 3D surfaces
68
Spatial construction (Geodesic CNN) Spectral construction (Spectral CNN)
69
3D shape graph social network molecules
70
Convolution along spatial coordinates vs. convolution considering the underlying geometry
Image credit: D. Boscaini et al.
71
from Shuman et al. 2013
72
from Michaël Defferrard et al. 2016
grid structure graph structure
73
from Michaël Defferrard et al. 2016
grid structure graph structure hierarchical graph coarsening?
74
Jonathan Masci et al 2015
75
requiring a triangular mesh.
acquire a topological disk.
receptive field.
76
77
modified from Jonathan Masci et al
Bases on meshes: eigenfunctions of the Laplace-Beltrami operator
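A minimal sketch of the spectral idea, under a loud assumption: the combinatorial graph Laplacian L = D - A of a small graph stands in for the Laplace-Beltrami operator of a mesh. Its eigenvectors play the role of a Fourier basis, and a spectral filter scales the coefficients in that basis. The function names are illustrative.

```python
import numpy as np

def laplacian_basis(adjacency: np.ndarray):
    """Eigen-decompose the combinatorial graph Laplacian L = D - A."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # L is symmetric, so eigh returns real eigenvalues and an orthonormal basis
    return np.linalg.eigh(laplacian)

def spectral_filter(signal, eigvecs, response):
    """Transform to the spectral domain, scale each coefficient, transform back."""
    coeffs = eigvecs.T @ signal
    return eigvecs @ (response * coeffs)

# Toy graph: a cycle with 6 vertices
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

eigvals, U = laplacian_basis(A)
signal = np.arange(n, dtype=float)
identity = spectral_filter(signal, U, np.ones(n))                      # keep everything
lowpass = spectral_filter(signal, U, (eigvals < 1e-9).astype(float))  # keep only the zero frequency
```

Keeping only the zero-eigenvalue component returns the mean of the signal at every vertex, the graph analogue of a DC component. A learned spectral filter replaces the fixed `response` vector with trainable values.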
78
79
Li Yi, Hao Su, Xingwen Guo, Leonidas Guibas “SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation” CVPR2017 (spotlight)
Directly conduct convolution on graphs Conduct convolution on 2D parameterization of 3D surfaces
80
81
Ayan Sinha, Jing Bai, Karthik Ramani, “Deep Learning 3D Shape Surfaces Using Geometry Images”, ECCV 2016
Maron et al., “Convolutional Neural Networks on Surfaces via Seamless Toric Covers”, SIGGRAPH 2017
82
Step 1: use a CNN to predict the parameterization
Step 2: convert it to a 3D mesh
Ayan Sinha, Asim Unmesh, Qixing Huang, Karthik Ramani, “SurfNet: Generating 3D shape surfaces using deep residual networks”, CVPR 2017
83
85
PointNet
Hao Su, Charles Qi, Kaichun Mo, Leonidas Guibas, “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation”, CVPR 2017 (oral)
86
PointNet
Object Classification Part Segmentation Scene Parsing ...
87
Point cloud: N orderless points, each represented by a D-dimensional coordinate
2D array representation: N × D
88
Point cloud: N orderless points, each represented by a D-dimensional coordinate
2D array representation: N × D
Permutation invariance Transformation invariance
89
Permutation invariance
Point cloud: N orderless points, each represented by a D-dimensional coordinate
2D array representation: N × D
An N × D array represents the same set as any reordering of its rows
90
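$f(x_1, x_2, \ldots, x_n) \equiv f(x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n})$, $x_i \in \mathbb{R}^D$
Examples:
$f(x_1, x_2, \ldots, x_n) = \max\{x_1, x_2, \ldots, x_n\}$
$f(x_1, x_2, \ldots, x_n) = x_1 + x_2 + \cdots + x_n$
…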
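The two example functions can be checked numerically. The sketch below brute-forces every ordering of four points (the same four points used as an illustration later in the talk) and compares against a deliberately order-dependent function. The helper names are illustrative, and a check on one input is a sanity test, not a proof.

```python
import numpy as np
from itertools import permutations

def f_max(points):      # element-wise max over the set
    return points.max(axis=0)

def f_sum(points):      # element-wise sum over the set
    return points.sum(axis=0)

def f_first(points):    # depends on the order: NOT symmetric
    return points[0]

pts = np.array([[1., 2., 3.], [1., 1., 1.], [2., 3., 2.], [2., 3., 4.]])

def is_symmetric(f, points):
    """True if f gives the same value for every ordering of the rows."""
    ref = f(points)
    return all(np.allclose(f(points[list(p)]), ref)
               for p in permutations(range(len(points))))

results = {f.__name__: is_symmetric(f, pts) for f in (f_max, f_sum, f_first)}
```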
91
Observe: $f(x_1, x_2, \ldots, x_n) = \gamma \circ g(h(x_1), \ldots, h(x_n))$ is symmetric if $g$ is symmetric
92
Observe: $f(x_1, x_2, \ldots, x_n) = \gamma \circ g(h(x_1), \ldots, h(x_n))$ is symmetric if $g$ is symmetric
[Figure: each input point, (1,2,3), (1,1,1), (2,3,2), (2,3,4), is mapped by $h$]
93
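Observe: $f(x_1, x_2, \ldots, x_n) = \gamma \circ g(h(x_1), \ldots, h(x_n))$ is symmetric if $g$ is symmetric
[Figure: each input point, (1,2,3), (1,1,1), (2,3,2), (2,3,4), is mapped by $h$, then aggregated by $g$, a simple symmetric function]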
94
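Observe: $f(x_1, x_2, \ldots, x_n) = \gamma \circ g(h(x_1), \ldots, h(x_n))$ is symmetric if $g$ is symmetric
[Figure: each input point, (1,2,3), (1,1,1), (2,3,2), (2,3,4), is mapped by $h$, aggregated by $g$, a simple symmetric function, and passed through $\gamma$]
PointNet (vanilla)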
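A toy sketch of the composition, assuming h and γ are small ReLU MLPs with random, untrained weights and g is max pooling over points. The layer sizes, the ReLU after every layer, and the helper names are illustrative assumptions, not the published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(sizes):
    """Random (W, b) pairs for consecutive layer sizes (untrained, for illustration)."""
    return [(rng.normal(size=(i, o)) * 0.5, np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]

def mlp(x, weights):
    """Apply linear + ReLU for every layer; works row-wise on (N, d) or on a (d,) vector."""
    for W, b in weights:
        x = np.maximum(x @ W + b, 0.0)
    return x

h_weights = init([3, 16, 32])       # h: shared per-point feature extractor
gamma_weights = init([32, 16, 4])   # gamma: maps the global feature to 4 outputs

def pointnet_vanilla(points):
    per_point = mlp(points, h_weights)       # h(x_i) for every point, shape (N, 32)
    global_feat = per_point.max(axis=0)      # g: max pooling over points, shape (32,)
    return mlp(global_feat, gamma_weights)   # gamma(g(...)), shape (4,)

pts = np.array([[1., 2., 3.], [1., 1., 1.], [2., 3., 2.], [2., 3., 4.]])
out = pointnet_vanilla(pts)
out_perm = pointnet_vanilla(pts[[2, 0, 3, 1]])   # same set, different row order
```

Since h is applied to each point independently and max ignores order, the output is the same for any row permutation of the input.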
Q: What symmetric functions can be constructed by PointNet?
95
PointNet (vanilla) Symmetric functions
A: Universal approximation to continuous symmetric functions
96
Theorem:
PointNet (vanilla)
A Hausdorff continuous symmetric function $f : 2^{X} \to \mathbb{R}$ can be arbitrarily approximated by PointNet, for point sets $S \subseteq \mathbb{R}^d$.
97
98
Segmentation from partial scans
99
Density variation is a common issue in 3D point clouds
PointNet++: Robust learning under varying sampling density
100
[Figure panels: Original, MRG, MSG]
Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas, “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space”, arXiv
101
Input Reconstructed 3D point cloud
Hao Su, Haoqiang Fan, Leonidas Guibas “A Point Set Generation Network for 3D Object Reconstruction from a Single Image” CVPR2017 (oral)
102
Cons:
▪ Low resolution
▪ Error in structure
Reason:
▪ Geometric transformation is hard for upconv
103
CVPR ’17, Point Set Generation
Transformation friendly for networks
Usable as network output? No prior work in the deep learning community!
Input Reconstructed 3D point cloud
104
The first work to generate a set in deep learning [CVPR’2017(oral)]
105
CVPR ’17, Point Set Generation
[Figure columns: Input | Ours (post-processed to volumetric) | Volumetric upconv (ECCV 2016, 3D-R2N2) | Groundtruth]
Network
106
Loss between sets (L): prediction (N×3 point set) vs. groundtruth point set (N×3)
107
CVPR ’17, Point Set Generation
Differentiable Admit fast computation
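One set-to-set distance with both properties is the Chamfer distance, which the evaluation in this talk reports. The sketch below uses the sum of squared nearest-neighbor distances in both directions. Conventions vary (some implementations average instead of summing), so treat the exact normalization as an assumption.

```python
import numpy as np

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred: (N, 3), gt: (M, 3). Sum of squared nearest-neighbor distances, both directions."""
    diff = pred[:, None, :] - gt[None, :, :]   # pairwise differences, shape (N, M, 3)
    sq = np.sum(diff ** 2, axis=-1)            # pairwise squared distances, shape (N, M)
    return float(sq.min(axis=1).sum() + sq.min(axis=0).sum())

a = np.array([[0., 0., 0.], [1., 0., 0.]])
b = np.array([[0., 0., 0.], [1., 0., 0.], [1., 1., 0.]])

d_same = chamfer_distance(a, a)
d_ab = chamfer_distance(a, b)
d_ab_shuffled = chamfer_distance(a[::-1], b[[2, 0, 1]])   # reorder both sets
```

The distance does not depend on the order of points in either set, and the two sets may have different sizes. It is built from min and sum, so it is differentiable almost everywhere, and the pairwise matrix is easy to parallelize.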
108
[Bar chart: Chamfer distance (error) for Baseline (mean shape), 3D volumetric deconv [Choy et al., ECCV16], and Ours (point cloud)]
63% error reduction!
109
[Bar chart: Chamfer distance (error) for Baseline (mean shape), 3D volumetric deconv [Choy et al., ECCV16], and Ours (point cloud)]
Representation choice matters!
110
111
Out of training categories
112
Need more study on optimal combination strategy.
114
Shubham Tulsiani, Hao Su, Leonidas Guibas, Alexei A. Efros, Jitendra Malik Learning Shape Abstractions by Assembling Volumetric Primitives CVPR 2017
115
We predict primitive parameters: size, rotation, translation of M cuboids. Variable number of parts? We predict “primitive existence probability”
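A small sketch of how a fixed budget of M predicted cuboids can yield a variable number of parts. The quaternion rotation, the 0.5 threshold, and all names are assumptions made for illustration; the slide only states that size, rotation, translation, and an existence probability are predicted.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Cuboid:
    size: np.ndarray         # (3,) extents along the local axes
    rotation: np.ndarray     # (4,) unit quaternion (assumed parameterization)
    translation: np.ndarray  # (3,) center position
    exist_prob: float        # predicted probability that this primitive is used

def select_primitives(cuboids, threshold=0.5):
    """Keep only primitives whose existence probability exceeds the threshold."""
    return [c for c in cuboids if c.exist_prob > threshold]

identity_rot = np.array([1.0, 0.0, 0.0, 0.0])
probs = [0.9, 0.2, 0.7, 0.4]                      # M = 4 predicted primitives
predicted = [Cuboid(np.ones(3), identity_rot, np.full(3, float(i)), p)
             for i, p in enumerate(probs)]
kept = select_primitives(predicted)
```

The network always outputs M parameter sets, and the existence probability decides how many of them appear in the final shape.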
116
Jun Li, Kai Xu, Siddhartha Chaudhuri, Ersin Yumer, Hao Zhang, Leonidas Guibas “GRASS: Generative Recursive Autoencoders for Shape Structures” SIGGRAPH 2017
117
How to introduce other primitive types?
Towards image-based modeling, how to add more operations to edit those primitives?
How to use it for design purposes? For example, with certain structural and functional constraints.
Ultimately, we expect to automate the modeling process from images, as artists do.
118
Overview of 3D deep learning 3D deep learning algorithms Conclusion
119
CV CG ML
120
Course (Machine Learning on 3D data) website: http://graphics.stanford.edu/courses/cs468-17-spring/schedule.html
Tutorial on 3D deep learning at CVPR, see you in Hawaii! http://3ddl.stanford.edu/
Workshop on Learning to See 3D Data at ICCV’17, Venice, Italy
121
Deep learning for computer vision, computer graphics, and robotics
More information on my personal homepage