Deep 3D Representation Learning for Visual Computing
Hao Su, July 6, 2017


slide-1
SLIDE 1

Hao Su

Deep 3D Representation Learning for Visual Computing

July 6, 2017

slide-2
SLIDE 2

Outline

2

Overview of 3D deep learning 3D deep learning algorithms Conclusion

slide-3
SLIDE 3

Outline

3

Overview of 3D deep learning 3D deep learning algorithms Conclusion Background 3D deep learning tasks

slide-4
SLIDE 4

The world around us is composed of 3D geometry

4

slide-5
SLIDE 5

Broad applications of 3D data

5

Robotics

slide-6
SLIDE 6

Broad applications of 3D data

6

Robotics Augmented Reality

slide-7
SLIDE 7

Autonomous driving

Broad applications of 3D data

7

Robotics Augmented Reality

slide-8
SLIDE 8

Autonomous driving

Broad applications of 3D data


8

Robotics Augmented Reality Medical Image Processing

slide-9
SLIDE 9

Autonomous driving

Broad applications of 3D data


9

Robotics Augmented Reality Medical Image Processing

Historically, most 3D visual computing techniques focused on single models and lacked robustness

slide-10
SLIDE 10

Lack of 3D data has been the major bottleneck

10

Status as of 2010:

Stanford bunny, Utah teapot, Princeton Shape Benchmark [Shilane et al. 04]: 1,800 models in 90 categories

slide-11
SLIDE 11

Recent rise of Internet 3D models

11

Nowadays millions of 3D models in online repositories

slide-12
SLIDE 12

Recent rise of Internet 3D models

12

Growing market of crowd-sourcing for 3D modeling Nowadays millions of 3D models in online repositories

slide-13
SLIDE 13

Recent rise of Internet 3D models

13

Growing market of crowd-sourcing for 3D modeling

An opportunity of Data-driven 3D Visual Computing

Nowadays millions of 3D models in online repositories

slide-14
SLIDE 14

Learning for 3D data

14

slide-15
SLIDE 15

Learning for 3D data

15

Build 3D knowledge base

Category

Functionality Parts Mass Size Material

slide-16
SLIDE 16

Learning for 3D data

16

Category

Functionality Parts Mass Size Material

Build 3D knowledge base Design deep learning methods

slide-17
SLIDE 17

> 30,000,000 units

slide-18
SLIDE 18

The surge of 3D deep learning

18

CV CG ML

  • Arguably started in 2015, along with the arrival of big 3D datasets (ShapeNet & ModelNet)
  • Very active due to huge industry interests!
  • Robotics
  • Autonomous driving
  • Virtual/augmented reality
  • Smart manufacturing
slide-19
SLIDE 19

3D deep learning tasks

19

3D geometry analysis 3D synthesis 3D-assisted image analysis

slide-20
SLIDE 20

3D deep learning tasks

20

3D geometry analysis

Classification Parsing (object/scene) Correspondence

slide-21
SLIDE 21

3D deep learning tasks

21

3D synthesis

Monocular 3D reconstruction Shape completion Shape modeling

slide-22
SLIDE 22

3D deep learning tasks

22

3D-assisted image analysis

Query Results

Cross-view image retrieval Intrinsic decomposition

slide-23
SLIDE 23

All about Data and Network

23

3D geometry analysis 3D synthesis 3D-assisted image analysis

slide-24
SLIDE 24

All about Data and Network

24

3D geometry analysis 3D synthesis

slide-25
SLIDE 25

Outline

25

Overview of 3D deep learning 3D deep learning algorithms Conclusion

3D Representation issue Deep learning on different 3D representations

slide-26
SLIDE 26

The representation issue of 3D deep learning

26

Images: Unique representation with regular data structure

slide-27
SLIDE 27

The representation issue of 3D deep learning

27

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based CAD models

slide-28
SLIDE 28

The representation issue of 3D deep learning

28

Novel view image synthesis

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based CAD models

slide-29
SLIDE 29

The representation issue of 3D deep learning

29

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based CAD models

slide-30
SLIDE 30

The representation issue of 3D deep learning

30

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based CAD models

slide-31
SLIDE 31

The representation issue of 3D deep learning

31

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based CAD models

slide-32
SLIDE 32

The representation issue of 3D deep learning

32

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based CAD models

slide-33
SLIDE 33

The representation issue of 3D deep learning

33

Rasterized form (regular grids) Geometric form (irregular)

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based CAD models

slide-34
SLIDE 34

3D deep learning algorithms (by representations)

34

[Su et al. 2015] [Kalogerakis et al. 2016] …

Volumetric Multi-view

[Maturana et al. 2015] [Wu et al. 2015] (GAN) [Qi et al. 2016] [Liu et al. 2016] [Wang et al. 2017] (O-CNN) [Tatarchenko et al. 2017] (OGN) …

slide-35
SLIDE 35

3D deep learning algorithms (by representations)

35

[Defferrard et al. 2016] [Henaff et al. 2015] [Yi et al. 2017] (SyncSpecCNN) …

Volumetric Multi-view

[Qi et al. 2017] (PointNet) [Fan et al. 2017] (PointSetGen)

Point cloud Mesh (Graph CNN) Part assembly

[Tulsiani et al. 2017] [Li et al. 2017] (GRASS) [Su et al. 2015] [Kalogerakis et al. 2016] … [Maturana et al. 2015] [Wu et al. 2015] (GAN) [Qi et al. 2016] [Liu et al. 2016] [Wang et al. 2017] (O-CNN) [Tatarchenko et al. 2017] (OGN) …

slide-36
SLIDE 36

Cartesian product space of “task” and “representation”

36

3D geometry analysis 3D synthesis

slide-37
SLIDE 37

Fundamental challenges of 3D deep learning

37

Can we directly apply CNN on 3D data?

slide-38
SLIDE 38

Fundamental challenges of 3D deep learning

38

Can we directly apply CNN on 3D data? Convolution needs an underlying structure

slide-39
SLIDE 39

Fundamental challenges of 3D deep learning

39

3D has many representations: multi-view RGB(D) images, volumetric

  • Can directly apply CNN
  • But has other challenges

Rasterized form (regular grids)

slide-40
SLIDE 40

Fundamental challenges of 3D deep learning

40

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based CAD models

Rasterized form (regular grids): can directly apply CNN. Geometric form (irregular): cannot directly apply CNN

slide-41
SLIDE 41

Deep learning on multi-view representation

slide-42
SLIDE 42

Multi-view representation as 3D input

42

▪ Leverage the huge CNN literature in image analysis

slide-43
SLIDE 43

Multi-view representation as 3D input

43

▪ Classification

Pipeline: each rendered view passes through CNN1; a view-pooling layer aggregates the per-view features; CNN2, a second ConvNet producing shape descriptors, feeds a softmax classifier

Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller, "Multi-view Convolutional Neural Networks for 3D Shape Recognition", Proceedings of ICCV 2015
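The view-pooling step can be sketched as follows; this is a minimal illustration with made-up feature shapes and a stand-in softmax head, not the paper's actual architecture.

```python
import numpy as np

def view_pool(view_features):
    """Element-wise max across views (the MVCNN view-pooling idea).

    view_features: (num_views, feature_dim) array of per-view descriptors
    from "CNN1". Returns a single (feature_dim,) shape descriptor.
    """
    return view_features.max(axis=0)

def softmax(logits):
    # Numerically stable softmax, standing in for the classifier after "CNN2".
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()
```

The max across views keeps, for each feature dimension, the most strongly activated view, which is what makes the descriptor insensitive to view ordering.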

slide-44
SLIDE 44

Multi-view representation as 3D output

44 Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox, “Multi-view 3D Models from Single Images with a Convolutional Network”, ECCV2016

▪ Novel-view RGB(D) image synthesis (direct prediction)

slide-45
SLIDE 45

Multi-view representation as 3D output

45 Tinghui Zhou, Shubham Tulsiani, Weilun Sun, Jitendra Malik, Alexei A. Efros “View Synthesis by Appearance Flow” ECCV2016

▪ Novel-view RGB(D) image synthesis (flow prediction)

slide-46
SLIDE 46

Key challenges for multi-view representation

46

  • Each view only contains partial information
  • However, not trivial to aggregate information across viewpoints
  • Cannot see through the surface
  • Regular structures in 3D cannot be well captured
  • e.g., symmetry, straightness, planarity
slide-47
SLIDE 47

Key challenges for multi-view representation

47

  • Each view only contains partial information
  • Not trivial to predict across viewpoints
  • Cannot see through the surface
  • Regular structures in 3D cannot be well captured
  • e.g., symmetry, straightness, planarity

[Tatarchenko et al.]

slide-48
SLIDE 48

Key challenges for multi-view representation

48

  • Each view only contains partial information
  • Not trivial to predict across viewpoints
  • Cannot see through the surface
  • Regular structures in 3D cannot be well captured
  • e.g., symmetry, straightness, planarity
slide-49
SLIDE 49
  • Each view only contains partial information
  • However, not trivial to aggregate information across viewpoints
  • Cannot see through the surface
  • Regular structures in 3D cannot be well captured
  • e.g., symmetry, straightness, roundness

Key challenges for multi-view representation

49

3D classification

slide-50
SLIDE 50
  • Each view only contains partial information
  • Not trivial to aggregate information across viewpoints
  • Cannot see through the surface
  • Regular structures in 3D cannot be well captured
  • e.g., symmetry, straightness, roundness

Key challenges for multi-view representation

50

A true 3D representation is more natural for 3D learning

slide-51
SLIDE 51

Deep learning on volumetric representation

slide-52
SLIDE 52

3D CNN on volumetric data

52

[Credit: Su et al.]

3D convolution uses 4D kernels

slide-53
SLIDE 53

Computational complexity issue

53

[Credit: Su et al.]

3D convolution uses 4D kernels

High space/time complexity

O(N³)
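A naive dense 3D convolution makes the O(N³) cost concrete; this sketch (single filter, valid padding, stride 1) is illustrative, not an optimized implementation.

```python
import numpy as np

def conv3d_naive(volume, kernel):
    """Dense 3D convolution with a 4D kernel.

    volume: (C, N, N, N) feature volume; kernel: (C, k, k, k).
    The triple loop visits O(N^3) output cells, and each cell costs
    O(C * k^3), which is why dense volumetric CNNs stay at low
    resolutions such as 30^3.
    """
    C, N, _, _ = volume.shape
    _, k, _, _ = kernel.shape
    out = N - k + 1
    result = np.zeros((out, out, out))
    for x in range(out):
        for y in range(out):
            for z in range(out):
                patch = volume[:, x:x + k, y:y + k, z:z + k]
                result[x, y, z] = np.sum(patch * kernel)
    return result
```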

slide-54
SLIDE 54

Computational complexity issue

54

AlexNet, 2012 3DShapeNets, 2015

Input resolution: 224x224 = 50,176 pixels; Input resolution: 30x30x30 = 27,000 voxels

slide-55
SLIDE 55

Computational complexity issue

55

Occupancy grid (30x30x30) vs. polygon mesh

Information loss in voxelization
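Voxelization itself can be sketched in a few lines; the 30³ resolution matches the slide, while the unit-cube normalization is an assumption. Any surface detail smaller than one voxel disappears, which is the information loss referred to above.

```python
import numpy as np

def voxelize(points, resolution=30):
    """Bin a point cloud into a boolean occupancy grid.

    points: (n, 3) array with coordinates in [0, 1)^3 (assumed
    pre-normalized). Every point falling in the same voxel collapses
    to a single occupied cell, discarding sub-voxel geometry.
    """
    grid = np.zeros((resolution,) * 3, dtype=bool)
    idx = np.clip((points * resolution).astype(int), 0, resolution - 1)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```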

slide-56
SLIDE 56

The sparsity characteristic of 3D data

56

Resolution (32, 64, 128) vs. occupancy: the fraction of occupied voxels shrinks as the resolution grows

Yangyan Li, Sören Pirk, Hao Su, Charles R. Qi, Leonidas J. Guibas FPNN: Field Probing Neural Networks for 3D Data NIPS2016

slide-57
SLIDE 57

Store only the occupied grids

57

Octree: recursively partition the space. Each internal node has exactly eight children
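A toy octree that stores only occupied cells might look like this; it is a plain nested-dict sketch, not the actual OctNet/O-CNN data structure.

```python
def build_octree(voxels, origin=(0, 0, 0), size=8):
    """Recursively partition a cube, keeping only occupied octants.

    voxels: iterable of occupied (x, y, z) integer coordinates inside
    the cube [origin, origin + size)^3, with size a power of two.
    Returns 'leaf' at occupied unit cells, a dict keyed by octant
    index 0-7 at internal nodes, and None for empty space, so storage
    follows the occupied surface rather than the full N^3 grid.
    """
    ox, oy, oz = origin
    occupied = [(x, y, z) for (x, y, z) in voxels
                if ox <= x < ox + size and oy <= y < oy + size and oz <= z < oz + size]
    if not occupied:
        return None                      # empty octant: pruned
    if size == 1:
        return 'leaf'                    # occupied unit cell
    half = size // 2
    node = {}
    for i in range(8):                   # the eight child octants
        child_origin = (ox + (i & 1) * half,
                        oy + ((i >> 1) & 1) * half,
                        oz + ((i >> 2) & 1) * half)
        child = build_octree(occupied, child_origin, half)
        if child is not None:
            node[i] = child
    return node
```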

slide-58
SLIDE 58

Skip the computation of empty cells

58

Gernot Riegler, Ali Osman Ulusoy, Andreas Geiger “OctNet: Learning Deep 3D Representations at High Resolutions” CVPR2017; Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, Xin Tong “O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis” SIGGRAPH2017

slide-59
SLIDE 59

Volumetric representation as input

59

Define convolution and pooling along the octree. Challenge: how to implement efficiently? Build a hash table to index the neighborhood, and restrict the convolution stride to be 2

slide-60
SLIDE 60

Volumetric representation as output

60

Christopher B. Choy, Danfei Xu*, JunYoung Gwak*, Kevin Chen, Silvio Savarese, 3D-R^2N^2: A unified approach for single and multi-view 3D object reconstruction ECCV2016

slide-61
SLIDE 61

Towards higher spatial resolution

61

Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox “Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs” arXiv (March 2017)

slide-62
SLIDE 62

Progressive voxel refinement

62

slide-63
SLIDE 63
  • Computational complexity (seems to have been resolved)
  • Regular structures in 3D cannot be well captured in reconstruction
  • e.g., symmetry, straightness, roundness

Key challenges for volumetric representation

63

slide-64
SLIDE 64

Typical artifacts of volumetric reconstruction

64

Missing thin structures, caused by an improper shape-space structure: it is hard for the network to rotate / deform / interpolate shapes

slide-65
SLIDE 65

How to design neural networks for geometric forms?

65

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based CAD models

Rasterized form (regular grids): can directly apply CNN. Geometric form (irregular): cannot directly apply CNN

slide-66
SLIDE 66

Deep learning on polygonal mesh

(Math heavy! You can take a break if you do not like math that much. Back to normal soon.)
slide-67
SLIDE 67

Directly conduct convolution on graphs
Conduct convolution on a 2D parameterization of 3D surfaces

Two different strategies for deep learning on graphs

67

slide-68
SLIDE 68

Directly conduct convolution on graphs
Conduct convolution on a 2D parameterization of 3D surfaces

Two different strategies for deep learning on meshes

68

Spatial construction (Geodesic CNN) Spectral construction (Spectral CNN)

slide-69
SLIDE 69

Meshes can be represented as graphs

69

3D shape graph social network molecules

slide-70
SLIDE 70

Geometry aware convolution can be important

70

convolution along spatial coordinates vs. convolution considering the underlying geometry

image credit: D. Boscaini, et al.

slide-71
SLIDE 71

How to define convolution kernel on graphs?

71

from Shuman et al. 2013

  • Desired properties:
  • locally supported (w.r.t graph metric)
  • allowing weight sharing across different coordinates
slide-72
SLIDE 72

How to allow multi-scale analysis?

72

from Michaël Defferrard et al. 2016

grid structure graph structure

slide-73
SLIDE 73

How to allow multi-scale analysis?

73

from Michaël Defferrard et al. 2016

grid structure graph structure hierarchical graph coarsening?

slide-74
SLIDE 74
  • Constructing convolution kernels:
  • Local system of geodesic polar coordinate
  • Extract a small patch at each point x

Spatial construction: Geodesic CNN

74

Jonathan Masci et al 2015

slide-75
SLIDE 75

Issues of Geodesic CNN

75

  • The local charting method relies on a fast marching-like procedure requiring a triangular mesh.
  • The radius of the geodesic patches must be sufficiently small to acquire a topological disk.
  • No effective pooling, purely relying on convolutions to increase the receptive field.

slide-76
SLIDE 76

Spectral construction: Spectral CNN

76

Fourier analysis

Convert convolution to multiplication in spectral domain

slide-77
SLIDE 77

Convolution Theorem in non-Euclidean domain

77

modified from Jonathan Masci et al

slide-78
SLIDE 78

Bases on meshes: eigenfunctions of the Laplace-Beltrami operator

78
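The spectral construction can be sketched with a toy graph: transform a signal into the eigenbasis of the graph Laplacian, multiply by a spectral filter, and transform back. The graph and filter below are made up for illustration; on a mesh, the Laplace-Beltrami eigenfunctions play the role of the eigenvectors.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for adjacency matrix W."""
    D = np.diag(W.sum(axis=1))
    return D - W

def spectral_conv(signal, W, spectral_filter):
    """Spectral 'convolution' of a per-vertex signal.

    The Laplacian eigenvectors U act as a graph Fourier basis:
    forward-transform the signal, scale each frequency by the filter,
    then inverse-transform.
    """
    L = graph_laplacian(W)
    _, U = np.linalg.eigh(L)                 # graph Fourier basis
    coeffs = U.T @ signal                    # forward transform
    return U @ (spectral_filter * coeffs)    # filter, inverse transform
```

With an all-ones filter the operation is the identity, which is a quick sanity check that the forward and inverse transforms match.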

slide-79
SLIDE 79

Synchronization of functional space across meshes

79

Functional map

Li Yi, Hao Su, Xingwen Guo, Leonidas Guibas “SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation” CVPR2017 (spotlight)

slide-80
SLIDE 80

Directly conduct convolution on graphs
Conduct convolution on a 2D parameterization of 3D surfaces

Two different strategies for deep learning on meshes

80

slide-81
SLIDE 81

Surface parameterization

81

  • Map curved 3D surfaces to 2D Euclidean plane

Ayan Sinha, Jing Bai, Karthik Ramani “Deep Learning 3D Shape Surfaces Using Geometry Images” ECCV2016 Maron et al. “Convolutional Neural Networks on Surfaces via Seamless Toric Covers” SIGGRAPH2017

slide-82
SLIDE 82

Deep learning on surface parameterization

82

Use CNN to predict the parameterization, then convert to 3D mesh

Step 1 Step 2 Ayan Sinha, Asim Unmesh, Qixing Huang, Karthik Ramani “SurfNet: Generating 3D shape surfaces using deep residual networks” CVPR2017

slide-83
SLIDE 83

Key challenges for mesh representation

83

  • Good progress seems to have been made for meshes as input
  • Mesh as output is very challenging:
  • Need consistent surface parameterization
  • Not clear how to generate shapes with topology variation
slide-84
SLIDE 84

Deep learning on point cloud

slide-85
SLIDE 85

PointNet: Directly process point cloud data

85

PointNet

Hao Su, Charles Qi, Kaichun Mo, Leonidas Guibas PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation CVPR 2017 (oral)

slide-86
SLIDE 86

PointNet: Directly process point cloud data

86

PointNet

Object Classification Part Segmentation Scene Parsing ...

slide-87
SLIDE 87

Properties of a desired neural network on point clouds

87

Point cloud: N orderless points, each represented by a D dim coordinate

2D array representation: an N x D matrix

slide-88
SLIDE 88

Properties of a desired neural network on point clouds

88

Point cloud: N orderless points, each represented by a D dim coordinate

2D array representation: an N x D matrix

Permutation invariance Transformation invariance

slide-89
SLIDE 89

Properties of a desired neural network on point clouds

89

Permutation invariance

Point cloud: N orderless points, each represented by a D dim coordinate

2D array representation: an N x D matrix

Any row permutation of the matrix represents the same set

slide-90
SLIDE 90

Permutation invariance: Symmetric function

90

A function f is symmetric if f(x_1, x_2, …, x_n) ≡ f(x_{π(1)}, x_{π(2)}, …, x_{π(n)}) for every permutation π, with x_i ∈ ℝ^D

Examples:

f(x_1, x_2, …, x_n) = max{x_1, x_2, …, x_n}
f(x_1, x_2, …, x_n) = x_1 + x_2 + … + x_n

slide-91
SLIDE 91

Construct symmetric function family

91

Observe: f(x_1, x_2, …, x_n) = γ ∘ g(h(x_1), …, h(x_n)) is symmetric if g is symmetric

slide-92
SLIDE 92

Construct symmetric function family

92

Observe: f(x_1, x_2, …, x_n) = γ ∘ g(h(x_1), …, h(x_n)) is symmetric if g is symmetric

Example point set: (1,2,3), (1,1,1), (2,3,2), (2,3,4). A shared h is applied to each point

slide-93
SLIDE 93

Construct symmetric function family

93

Observe: f(x_1, x_2, …, x_n) = γ ∘ g(h(x_1), …, h(x_n)) is symmetric if g is symmetric

Example point set: (1,2,3), (1,1,1), (2,3,2), (2,3,4). A shared h per point, then a simple symmetric function g

slide-94
SLIDE 94

Construct symmetric function family

94

Observe: f(x_1, x_2, …, x_n) = γ ∘ g(h(x_1), …, h(x_n)) is symmetric if g is symmetric

Example point set: (1,2,3), (1,1,1), (2,3,2), (2,3,4). A shared h per point, a simple symmetric function g, then γ: together, PointNet (vanilla)
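The construction f = γ ∘ g(h(x_1), …, h(x_n)) with g = max can be sketched directly; the random single-layer weights below stand in for learned MLPs, and only the structure (shared h, symmetric pooling, final γ) is the point.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.standard_normal((3, 16))       # shared per-point "h" (one layer here)
W_gamma = rng.standard_normal((16, 4))   # final "gamma" (one layer here)

def pointnet_vanilla(points):
    """points: (n, 3) array. Returns a permutation-invariant global feature.

    h is applied with shared weights to every point, g = element-wise max
    over points is symmetric, and gamma maps the pooled feature onward,
    so the output cannot depend on point order.
    """
    h = np.maximum(points @ W_h, 0.0)    # shared h with ReLU, per point
    g = h.max(axis=0)                    # symmetric aggregation over points
    return g @ W_gamma                   # gamma on the global feature
```

Feeding the same points in any order yields exactly the same output, which is the permutation invariance the slides require.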

slide-95
SLIDE 95

Q: What symmetric functions can be constructed by PointNet?

95

PointNet (vanilla) Symmetric functions

slide-96
SLIDE 96

A: Universal approximation to continuous symmetric functions

96

Theorem:

A Hausdorff continuous symmetric function f : 2^X → ℝ, over point sets S ⊆ ℝ^d, can be arbitrarily approximated by PointNet (vanilla).

slide-97
SLIDE 97

Robustness to data corruption

97

slide-98
SLIDE 98

Robustness to data corruption

98

Segmentation from partial scans

slide-99
SLIDE 99

Non-uniform Sampling Density

99

Density variation is a common issue of 3D point cloud

  • perspective effect, radial density variation, motion etc.
slide-100
SLIDE 100

PointNet++: Robust learning under varying sampling density

100

Panels: Original, MRG, MSG. Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space” arXiv

slide-101
SLIDE 101

Point cloud as output

101

Input Reconstructed 3D point cloud

Hao Su, Haoqiang Fan, Leonidas Guibas “A Point Set Generation Network for 3D Object Reconstruction from a Single Image” CVPR2017 (oral)

slide-102
SLIDE 102

Volumetric upconvolution?

102

Cons: low resolution; errors in structure

Input image ▪ Reason: geometric transformation is hard for upconvolution

slide-103
SLIDE 103

Another representation possibility: Point clouds

103

CVPR ’17, Point Set Generation

Transformation-friendly for networks. Usable as network output? No prior work in the deep learning community!

slide-104
SLIDE 104

Recent work on 3D prediction by point clouds

Input Reconstructed 3D point cloud

104

The first work to generate a set in deep learning [CVPR’2017(oral)]

slide-105
SLIDE 105

Comparison to direct 3D volumetric upconvolution

105

CVPR ’17, Point Set Generation

Input Ours (post-processed to volumetric) Volumetric upconv (ECCV 2016, 3D-R2N2) Groundtruth

slide-106
SLIDE 106

Network

Pipeline

106

Loss on sets (L)

CVPR ’17, Point Set Generation

The network predicts an N x 3 point set, compared against the N x 3 ground-truth point set by a set loss L

slide-107
SLIDE 107

Loss function: Earth Mover’s Distance (EMD)

107

CVPR ’17, Point Set Generation

  • Given two sets of points, measure their discrepancy:

Differentiable; admits fast computation
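As a concrete example of a differentiable set-to-set loss, here is a sketch of the (symmetric) Chamfer distance used in the evaluation on the next slides; the EMD additionally requires computing an optimal matching between the sets and is omitted here.

```python
import numpy as np

def chamfer_distance(A, B):
    """Symmetric Chamfer distance between point sets A (n, 3) and B (m, 3).

    For each point, find the squared distance to its nearest neighbor in
    the other set; average both directions. Every operation is smooth
    almost everywhere, so the loss is usable for gradient training.
    """
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)  # (n, m) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```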

slide-108
SLIDE 108

Quantitative evaluation

108

Chamfer distance (error) comparison: baseline (mean shape), 3D volumetric deconv [Choy et al., ECCV16], and our point-cloud prediction

CVPR ’17, Point Set Generation

63% error reduction!

slide-109
SLIDE 109

Quantitative evaluation

109

Chamfer distance (error) comparison: baseline (mean shape), 3D volumetric deconv [Choy et al., ECCV16], and our point-cloud prediction

CVPR ’17, Point Set Generation

Representation choice matters!

slide-110
SLIDE 110

Real-world results

110

Input (observed view)

CVPR ’17, Point Set Generation

slide-111
SLIDE 111

Generalization to unseen categories

111

Input (observed view)

CVPR ’17, Point Set Generation

Out of training categories

slide-112
SLIDE 112

Key challenges for point cloud representation

112

  • Point cloud as output is still very challenging:
  • The global structure is reasonable but details are missing
  • Combining with the volumetric representation seems to give better results; the optimal combination strategy needs more study.

slide-113
SLIDE 113

Deep learning on primitives

slide-114
SLIDE 114

Primitive-based assembly

114

Shubham Tulsiani, Hao Su, Leonidas Guibas, Alexei A. Efros, Jitendra Malik Learning Shape Abstractions by Assembling Volumetric Primitives CVPR 2017

slide-115
SLIDE 115

Approach

115

We predict primitive parameters: the size, rotation, and translation of M cuboids. Variable number of parts? We also predict a “primitive existence probability”
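A hypothetical decoding of such a primitive head might look like this; the 11-number layout per cuboid (size, quaternion rotation, translation, existence logit) and the 0.5 threshold are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

M = 4  # assumed number of candidate primitives

def decode_primitives(raw, threshold=0.5):
    """raw: (M, 11) array per cuboid = [size(3), quaternion(4),
    translation(3), existence logit(1)].

    A sigmoid turns the logit into an existence probability; primitives
    below the threshold are dropped, which is how a fixed-size output
    head can represent a variable number of parts.
    """
    out = []
    for row in raw:
        prob = 1.0 / (1.0 + np.exp(-row[10]))   # sigmoid on the logit
        if prob > threshold:
            out.append({'size': row[0:3], 'rotation': row[3:7],
                        'translation': row[7:10], 'prob': prob})
    return out
```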

slide-116
SLIDE 116

GRASS

116

Jun Li, Kai Xu, Siddhartha Chaudhuri, Ersin Yumer, Hao Zhang, Leonidas Guibas “GRASS: Generative Recursive Autoencoders for Shape Structures” SIGGRAPH 2017

slide-117
SLIDE 117

Open problems

117

How to introduce other primitive types? Towards image-based modeling, how to add more operations to edit those primitives?

  • e.g., Deform? Extrude? Loop cut?

How to use it for design purposes, for example under structural and functional constraints? Ultimately, we expect to automate the modeling process from images, as artists do.

slide-118
SLIDE 118

Outline

118

Overview of 3D deep learning 3D deep learning algorithms Conclusion

slide-119
SLIDE 119

The surge of 3D deep learning

119

CV CG ML

  • A field with very short history — arguably started from 2015
  • But very active due to huge industry interests!
  • Robotics
  • Autonomous driving
  • Virtual/augmented reality
  • Smart manufacturing
slide-120
SLIDE 120

Based upon a new course at Stanford

120

Course (Machine Learning on 3D data) website: http://graphics.stanford.edu/courses/cs468-17-spring/schedule.html

Tutorial on 3D deep learning at CVPR (http://3ddl.stanford.edu/), see you in Hawaii!

Workshop on Learning to see 3D data at ICCV’17, Venice, Italy

slide-121
SLIDE 121

Opening for PhD/Postdoc/Visiting Scholar positions

121

Deep learning for computer vision, computer graphics, and robotics. More information on my personal homepage.

slide-122
SLIDE 122

Thank you!