3D Deep Learning: An Overview based on My Work, Hao Su, Feb 23, 2018



slide-1
SLIDE 1

3D Deep Learning: An Overview based on My Work

Hao Su

Feb 23, 2018

slide-2
SLIDE 2

02/23/2018

Our world is 3D

Hao Su

2

slide-3
SLIDE 3

02/23/2018

Broad applications of 3D data

Hao Su

3 Robotics

slide-4
SLIDE 4

02/23/2018

Broad applications of 3D data

Hao Su

4 Robotics Augmented

slide-5
SLIDE 5

02/23/2018 Autonomous

Broad applications of 3D data

Hao Su

5 Robotics Augmented

slide-6
SLIDE 6

02/23/2018 Autonomous

Broad applications of 3D data

Hao Su

6 Robotics Augmented Medical Image Processing

slide-7
SLIDE 7

02/23/2018

3D Understanding Enables Interactions

Hao Su

7 [SIGGRAPH Asia 2016]

Example: 3D understanding for a robot

slide-8
SLIDE 8

02/23/2018

3D Understanding Enables Interactions

Hao Su

8

Shape

slide-9
SLIDE 9

02/23/2018

3D Understanding Enables Interactions

Hao Su

9

Shape Graspable

slide-10
SLIDE 10

02/23/2018

3D Understanding Enables Interactions

Hao Su

10

Shape Mass Graspable

slide-11
SLIDE 11

02/23/2018

3D Understanding Enables Interactions

Hao Su

11

Shape Mass Mobility Graspable

slide-12
SLIDE 12

02/23/2018

AI Perspective of 3D Understanding

Hao Su

12

Sensory: See the world
Cognition: Understand the world
Action: Transform the world

Towards interaction with the physical world, 3D is the key!

slide-13
SLIDE 13

3D Perception requires “Knowledge” of 3D World

slide-14
SLIDE 14

02/23/2018

Traditional 3D Vision

Hao Su

14

Multi-view Geometry: Physics based

slide-15
SLIDE 15

3D Learning: Knowledge Based

slide-16
SLIDE 16

02/23/2018

3D Learning: Knowledge Based

Hao Su

16

slide-17
SLIDE 17

02/23/2018

Acquire Knowledge of 3D World by Learning

Hao Su

17

slide-18
SLIDE 18

02/23/2018

3D Learning Tasks

Hao Su

18

3D Analysis

Classification, Segmentation (object/scene), Correspondence

slide-19
SLIDE 19

02/23/2018

3D Learning Tasks

Hao Su

19

3D Synthesis

Monocular 3D reconstruction, Shape completion, Shape modeling

slide-20
SLIDE 20

02/23/2018

3D Learning Tasks

Hao Su

20

3D-based Knowledge Transportation

slide-21
SLIDE 21

02/23/2018

3D Learning Tasks

Hao Su

21

Intuitive Physics based on 3D Understanding

slide-22
SLIDE 22

02/23/2018

Deep Learning on 3D: A New Rising Field

Hao Su

22

3D Understanding

Computer Vision, Computer Graphics, Robotics, Cognitive Science, Machine Learning, Differential Geometry, Topological Analysis, Functional Analysis

Artificial Intelligence Mathematics

slide-23
SLIDE 23

02/23/2018

Outline

Hao Su

23

Overview of 3D Deep Learning
3D Deep Learning Algorithms

slide-24
SLIDE 24

02/23/2018

The Representation Issue of 3D Deep Learning

Hao Su

24

Images: Unique representation with regular data structure

slide-25
SLIDE 25

02/23/2018

Hao Su

25

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based models

The Representation Issue of 3D Deep Learning

slide-26
SLIDE 26

02/23/2018

Hao Su

26

Novel view image synthesis

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based models

The Representation Issue of 3D Deep Learning

slide-31
SLIDE 31

02/23/2018

Cartesian Product Space of “Task” and “Representation”

Hao Su

31

3D geometry analysis 3D synthesis

slide-32
SLIDE 32

02/23/2018

Fundamental Challenges of 3D Deep Learning

Hao Su

32

Convolution needs an underlying structure
Can we directly apply CNN on 3D data?

slide-33
SLIDE 33

02/23/2018

3D has many representations: multi-view RGB(D) images, volumetric

Rasterized vs Geometric

Hao Su

33

  • Can directly apply CNN
  • But has other challenges

Rasterized form (regular grids)

slide-34
SLIDE 34

02/23/2018

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based models

Fundamental Challenges of 3D Deep Learning

Hao Su

34

Rasterized form (regular grids)
Geometric form (irregular): Cannot directly apply CNN

slide-35
SLIDE 35

02/23/2018

3D Deep Learning Algorithms (by Representations)

Hao Su

35

  • Projection-based

Multi-view: [Su et al. 2015] [Kalogerakis et al. 2016] …

Volumetric: [Maturana et al. 2015] [Wu et al. 2015] (GAN) [Qi et al. 2016] [Liu et al. 2016] [Wang et al. 2017] (O-Net) [Tatarchenko et al. 2017] (OGN) …

slide-36
SLIDE 36

02/23/2018

3D Deep Learning Algorithms (by Representations)

Hao Su

36

  • Projection-based

Multi-view: [Su et al. 2015] [Kalogerakis et al. 2016] …

Volumetric: [Maturana et al. 2015] [Wu et al. 2015] (GAN) [Qi et al. 2016] [Liu et al. 2016] [Wang et al. 2017] (O-Net) [Tatarchenko et al. 2017] (OGN) …

Point cloud: [Qi et al. 2017] (PointNet) [Fan et al. 2017] (PointSetGen)

Mesh (Graph CNN): [Defferard et al. 2016] [Henaff et al. 2015] [Yi et al. 2017] (SyncSpecCNN) …

Part assembly: [Tulsiani et al. 2017] [Li et al. 2017] (GRASS)

slide-37
SLIDE 37

02/23/2018

3D has many representations: multi-view RGB(D) images, volumetric

Fundamental Challenges of 3D Deep Learning

Hao Su

37

  • Can directly apply CNN
  • But has other challenges

Rasterized form (regular grids)

slide-38
SLIDE 38

Deep Learning on Multi-view Representation

slide-39
SLIDE 39

02/23/2018

Multi-view Representation as 3D Input

Hao Su

39

▪ Leverage the huge CNN literature in image analysis

slide-40
SLIDE 40

02/23/2018

Multi-view Representation as 3D Input

Hao Su

40

▪ Classification

Pipeline (figure): views → CNN1 → view pooling → CNN2 (a second ConvNet producing shape descriptors) → softmax

Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller, "Multi-view Convolutional Neural Networks for 3D Shape Recognition", Proceedings of ICCV 2015
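The view-pooling step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the MVCNN implementation: `cnn1` is a hypothetical stand-in (a shared random projection plus ReLU) for the per-view ConvNet, the image size and feature width are arbitrary, and pooling is taken to be an element-wise maximum across views.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for CNN1: one shared random linear map + ReLU.
# A real multi-view CNN would use a full ConvNet with weights shared across views.
W1 = rng.standard_normal((64 * 64, 128))

def cnn1(view):
    """Map one rendered view (64x64 image) to a 128-d feature vector."""
    return np.maximum(view.reshape(-1) @ W1, 0.0)

def view_pool(views):
    """Element-wise max over per-view features, giving one shape descriptor."""
    feats = np.stack([cnn1(v) for v in views])   # (num_views, 128)
    return feats.max(axis=0)                     # (128,)

views = rng.random((12, 64, 64))                 # 12 rendered views of one shape
descriptor = view_pool(views)
shuffled = view_pool(views[::-1])                # same views, reversed order
```

Because the maximum is taken per feature across views, the descriptor does not depend on the order in which the views are presented; a second network (CNN2 on the slide) would consume `descriptor`.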

slide-41
SLIDE 41

2/15/2018

Multi-view Representation as 3D Output

Hao Su

41

▪ The Novel-view Synthesis Problem

slide-42
SLIDE 42

2/15/2018

Fully Convolutional Network (FCN)

Hao Su

42

Segmentation:

Learning Deconvolution Network for Semantic Segmentation

slide-43
SLIDE 43

2/15/2018

Idea 1: Direct Novel-view Synthesis

Hao Su

43

Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox, “Multi-view 3D Models from Single Images with a Convolutional Network”, ECCV2016

slide-44
SLIDE 44

2/15/2018

Results are often Blurry

Hao Su

44

slide-45
SLIDE 45

02/23/2018

Hao Su

45

[Figure: observed view image → novel view feature, combined with weights (0.1, 0.4, 0.3, …)]

Su et al, 3D-Assisted Image Feature Synthesis for Novel Views of an Object, ECCV 2016

Idea 2: Explore Cross-View Relationship

slide-46
SLIDE 46

2/15/2018

Idea 2: Explore Cross-View Relationship

Hao Su

46

Single-view network architecture:

Zhou et al, View Synthesis by Appearance Flow, ECCV 2016

slide-47
SLIDE 47

2/15/2018

Hao Su

47

Idea 2: Explore Cross-View Relationship

slide-48
SLIDE 48

2/15/2018

Combine both ideas

Hao Su

48

  • First, apply flow prediction
  • Second, conduct invisible part hallucination

Park et al, Transformation-Grounded Image Generation Network for Novel 3D View Synthesis, CVPR 2017
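The two steps above can be illustrated with a toy warp. This is only a sketch under assumptions: the flow stores, for each target pixel, the source coordinates to copy from; sampling is nearest-neighbor (chosen here for simplicity); and the names `warp_by_flow`, `flow`, and `visible` are hypothetical. The hallucination step is not implemented; invisible pixels are merely marked so that a completion network could fill them.

```python
import numpy as np

def warp_by_flow(source, flow, visible):
    """Build a target view by copying source pixels.
    flow[y, x] = (sy, sx): source coordinates to sample for target pixel (y, x).
    visible[y, x]: False where the target pixel has no counterpart in the source."""
    sy = np.clip(flow[..., 0], 0, source.shape[0] - 1)
    sx = np.clip(flow[..., 1], 0, source.shape[1] - 1)
    target = source[sy, sx]                       # nearest-neighbor sampling
    return np.where(visible, target, np.nan)      # NaN marks pixels left for hallucination

src = np.arange(12, dtype=float).reshape(3, 4)
yy, xx = np.meshgrid(np.arange(3), np.arange(4), indexing="ij")
flow = np.stack([yy, xx - 1], axis=-1)            # toy flow: shift the image right by one pixel
visible = xx >= 1                                 # the first column is newly exposed
warped = warp_by_flow(src, flow, visible)
```

In this toy case every visible target pixel is an exact copy of a source pixel, which is why flow-based synthesis avoids blur in visible regions, while the first column has nothing to copy from.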

slide-49
SLIDE 49

2/15/2018

Combine both ideas

Hao Su

49

slide-50
SLIDE 50

2/15/2018

Articulated Shapes: Assist Flow Synthesis by Depth Estimation

Hao Su

50

[Figure panels: source image, forward flow, backward flow, target image. Legend: value point to; coordinate registered; visible region; invisible region; flow (red is origin)]

My latest paper accepted by CVPR’18

slide-51
SLIDE 51

2/15/2018

Articulated Shapes: Assist Flow Synthesis by Depth Estimation

Hao Su

51

[Figure: network diagram with depth net, flow net, and mask net; input "View Para". Legend: conv. block, deconv. block, full connection block, projection/transforming layer, residual link, residual conv. block. Data: source image, forward flow, remapped flow, backward flow, target mask, target image, depth image]

My latest paper accepted by CVPR’18

slide-52
SLIDE 52

Deep Learning on Volumetric Representation

slide-53
SLIDE 53

1/30/2018

Popular 3D volumetric data

Hao Su

53

fMRI, Manufacturing (finite-element analysis), Geology, CT

slide-54
SLIDE 54

02/23/2018

Volumetric Representation as 3D Input

Hao Su

54

▪ The main hurdle is Complexity

slide-55
SLIDE 55

1/30/2018

The Sparsity Characteristic of 3D Data

Hao Su

55

Resolution:

32 64 128

Occupancy:

Li et al., FPNN: Field Probing Neural Networks for 3D Data, NIPS 2016
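The occupancy values did not survive in this transcript. The trend can still be illustrated with a small experiment: voxelize a surface at increasing resolution and measure the fraction of occupied cells. The sphere surface, the sample count, and the function name `surface_occupancy` are assumptions made for illustration; the numbers it yields are not the ones from the slide.

```python
import numpy as np

def surface_occupancy(res, n_samples=200_000, seed=0):
    """Fraction of voxels in a res^3 grid touched by points sampled on a sphere surface."""
    rng = np.random.default_rng(seed)
    p = rng.standard_normal((n_samples, 3))
    p /= np.linalg.norm(p, axis=1, keepdims=True)        # points on the unit sphere
    idx = np.floor((p * 0.45 + 0.5) * res).astype(int)   # map into grid indices
    idx = np.clip(idx, 0, res - 1)
    occupied = np.unique(idx, axis=0).shape[0]           # number of distinct occupied voxels
    return occupied / res**3

occ = {r: surface_occupancy(r) for r in (32, 64, 128)}
```

A surface occupies on the order of res² cells out of res³, so the occupied fraction shrinks as resolution grows; this is the sparsity that octree-style methods exploit.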

slide-56
SLIDE 56

02/23/2018

Solution: Octree based CNN (O-CNN)

Hao Su

56

Octree

slide-57
SLIDE 57

02/23/2018

Convolution on Octree

Hao Su

57

  • Neighborhood searching: Hash table

OCTREE FullVoxel

Gernot Riegler, Ali Osman Ulusoy, Andreas Geiger “OctNet: Learning Deep 3D Representations at High Resolutions” CVPR2017
Pengshuai Wang, Yang Liu, Yuxiao Guo, Chunyu Sun, Xin Tong “O-CNN: Octree-based Convolutional Neural Network for Understanding 3D Shapes” SIGGRAPH2017
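Hash-table neighborhood search can be sketched on a single sparse voxel level. This is a simplified illustration, not the data layout of OctNet or O-CNN: occupied voxels are stored in a Python dict keyed by integer coordinates, and the function name `sparse_conv3x3x3` and the box kernel are hypothetical.

```python
from itertools import product

def sparse_conv3x3x3(features, kernel):
    """features: dict {(i, j, k): float} holding occupied voxels only.
    kernel: dict {(di, dj, dk): float} over the 27 neighbor offsets.
    Output is computed only at occupied voxels; empty neighbors contribute 0."""
    out = {}
    for (i, j, k) in features:
        acc = 0.0
        for off in product((-1, 0, 1), repeat=3):
            nb = (i + off[0], j + off[1], k + off[2])
            acc += kernel[off] * features.get(nb, 0.0)   # constant-time hash lookup
        out[(i, j, k)] = acc
    return out

box_kernel = {off: 1.0 for off in product((-1, 0, 1), repeat=3)}
voxels = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (5, 5, 5): 4.0}
result = sparse_conv3x3x3(voxels, box_kernel)
```

The cost scales with the number of occupied voxels rather than with the full grid volume, which is the point of replacing the dense voxel array with a sparse structure.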

slide-58
SLIDE 58

02/23/2018

Volumetric Representation as 3D Input

Hao Su

58

▪ The main hurdle is still Complexity

slide-59
SLIDE 59

1/30/2018

A Straight-forward Implementation

Hao Su

59

Choi et al. ECCV 2016

slide-60
SLIDE 60

1/30/2018

Towards Higher Spatial Resolution

Hao Su

60

Maxim Tatarchenko, Alexey Dosovitskiy, Thomas Brox “Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs” arxiv (March, 2017)

slide-61
SLIDE 61

1/30/2018

Progressive Voxel Refinement

Hao Su

61

slide-62
SLIDE 62

02/23/2018

3D has many representations: multi-view RGB(D) images, volumetric, polygonal mesh, point cloud, primitive-based models

Fundamental Challenges of 3D Deep Learning

Hao Su

62

Rasterized form (regular grids)
Geometric form (irregular): Cannot directly apply CNN

slide-63
SLIDE 63

Deep Learning on Polygonal Meshes

slide-64
SLIDE 64

02/23/2018

Mesh as 3D Input

Hao Su

64

▪ Deep Learning on Graphs

slide-65
SLIDE 65

02/23/2018

Geometry-aware Convolution can be Important

Hao Su

65

convolution along spatial coordinates vs. convolution considering underlying geometry

image credit: D. Boscaini, et al.

slide-66
SLIDE 66

02/23/2018

Meshes can be represented as graphs

Hao Su

66

3D shape graph, social network, molecules

slide-67
SLIDE 67

02/23/2018

How to define convolution kernel on graphs?

Hao Su

67

from Shuman et al. 2013

  • Desired properties:
  • locally supported (w.r.t. graph metric)
  • allowing weight sharing across different coordinates

slide-68
SLIDE 68

02/23/2018

Issues of Geodesic CNN

Hao Su

68

  • The local charting method relies on a fast marching-like procedure requiring a triangular mesh.
  • The radius of the geodesic patches must be sufficiently small to acquire a topological disk.
  • No effective pooling, purely relying on convolutions to increase receptive field.

slide-69
SLIDE 69

02/23/2018

Spectral construction: Spectral CNN

Hao Su

69

Fourier analysis

Convert convolution to multiplication in spectral domain
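A small numerical sketch of this idea, under assumptions: a 5-vertex path graph stands in for a mesh, the combinatorial graph Laplacian stands in for the surface operator, and the filter response `exp(-λ)` is an arbitrary low-pass choice. In a spectral CNN the response would be learned instead.

```python
import numpy as np

# Path graph on 5 vertices as a toy stand-in for a mesh graph.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A             # combinatorial graph Laplacian

lam, U = np.linalg.eigh(L)                 # eigenvalues and orthonormal eigenvectors (Fourier basis)

def spectral_filter(signal, response):
    """Filter a vertex signal by scaling its spectral coefficients."""
    coeffs = U.T @ signal                  # graph Fourier transform
    return U @ (response(lam) * coeffs)    # multiply in the spectral domain, transform back

x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
smoothed = spectral_filter(x, lambda l: np.exp(-l))        # low-pass response
identity = spectral_filter(x, lambda l: np.ones_like(l))   # all-pass response returns x
```

Note that the basis `U` is specific to one graph, which is why transferring spectral filters across different shapes needs extra care.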

slide-70
SLIDE 70

02/23/2018

Bases on meshes: eigenfunctions of the Laplace-Beltrami operator

Hao Su

70

slide-71
SLIDE 71

02/23/2018

Synchronization of functional space across meshes

Hao Su

71

Functional map

Li Yi, Hao Su, Xingwen Guo, Leonidas Guibas “SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation” CVPR2017 (spotlight)

slide-72
SLIDE 72

02/23/2018

Mesh as 3D Output

Hao Su

72

▪ At its heart, a surface parameterization problem

slide-73
SLIDE 73

02/23/2018

Deep learning on surface parameterization

Hao Su

73

Use CNN to predict the parameterization, then convert to 3D mesh

Step 1, Step 2

Ayan Sinha, Asim Unmesh, Qixing Huang, Karthik Ramani “SurfNet: Generating 3D shape surfaces using deep residual networks” CVPR2017

slide-74
SLIDE 74

Deep Learning on Point Cloud Representation

slide-75
SLIDE 75

1/30/2018

Point Cloud: the Most Common Sensor Output

Hao Su

75 Figure from the recent VoxelNet paper from Apple.

slide-76
SLIDE 76

02/23/2018

Point Cloud as 3D Input

Hao Su

76

▪ Deep Learning on Sets (orderless)

slide-77
SLIDE 77

1/30/2018

Properties of a desired neural network on point clouds

Hao Su

77

2D array representation

N × D

Point cloud: N orderless points, each represented by a D-dimensional coordinate

Hao Su*, Charles Qi*, Kaichun Mo, Leonidas Guibas “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation” CVPR2017 (oral)

slide-78
SLIDE 78

1/30/2018

Hao Su

78

2D array representation

N × D

Properties of a desired neural network on point clouds

Point cloud: N orderless points, each represented by a D dim coordinate

slide-79
SLIDE 79

1/30/2018

Hao Su

79

Point cloud: N orderless points, each represented by a D dim coordinate

2D array representation

An N × D array represents the same set as any row permutation of it

Properties of a desired neural network on point clouds

slide-80
SLIDE 80

1/30/2018

Permutation invariance: Symmetric function

Hao Su

80

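$$f(x_1, x_2, \ldots, x_n) \equiv f(x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}), \quad x_i \in \mathbb{R}^D$$

Examples:

$$f(x_1, x_2, \ldots, x_n) = \max\{x_1, x_2, \ldots, x_n\}$$

$$f(x_1, x_2, \ldots, x_n) = x_1 + x_2 + \cdots + x_n$$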
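As a quick numerical check of the definition, the snippet below stores a point cloud as an N × D array and verifies that max and sum are unchanged when the rows are reordered, while a function that depends on position (here, "take the first row") is not. The array size and the reversal used as the permutation are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.standard_normal((6, 3))       # N = 6 points, D = 3
reordered = points[::-1]                   # same set, rows in reverse order

max_same = np.array_equal(points.max(axis=0), reordered.max(axis=0))
sum_same = np.allclose(points.sum(axis=0), reordered.sum(axis=0))

# A non-symmetric function for contrast: it depends on which row comes first.
first_same = np.array_equal(points[0], reordered[0])
```

Here `max_same` and `sum_same` are true, whereas `first_same` is false, so only the first two functions are valid set functions.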

slide-81
SLIDE 81

1/30/2018

Construct symmetric function family

Hao Su

81

Observe:

$f(x_1, x_2, \ldots, x_n) = \gamma \circ g(h(x_1), \ldots, h(x_n))$ is symmetric if $g$ is symmetric

slide-82
SLIDE 82

1/30/2018

Construct symmetric function family

Hao Su

82

(1,2,3) (1,1,1) (2,3,2) (2,3,4)

h

Observe:

$f(x_1, x_2, \ldots, x_n) = \gamma \circ g(h(x_1), \ldots, h(x_n))$ is symmetric if $g$ is symmetric

slide-83
SLIDE 83

1/30/2018

Construct symmetric function family

Hao Su

83

(1,2,3) (1,1,1) (2,3,2) (2,3,4) simple symmetric function

h

g

Observe:

$f(x_1, x_2, \ldots, x_n) = \gamma \circ g(h(x_1), \ldots, h(x_n))$ is symmetric if $g$ is symmetric

slide-84
SLIDE 84

1/30/2018

Construct symmetric function family

Hao Su

84

(1,2,3) (1,1,1) (2,3,2) (2,3,4) simple symmetric function

PointNet (vanilla)

h

g γ

Observe:

$f(x_1, x_2, \ldots, x_n) = \gamma \circ g(h(x_1), \ldots, h(x_n))$ is symmetric if $g$ is symmetric
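The composition of h, g, and γ can be sketched in a few lines. This is a minimal illustration, not the PointNet architecture: h and γ are single random linear layers with hypothetical sizes, g is an element-wise max over points, and the four input points are the ones shown on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.standard_normal((3, 16))   # hypothetical weights of h (one layer)
W_g = rng.standard_normal((16, 4))   # hypothetical weights of gamma (one layer)

def h(points):
    """Per-point feature map, shared across all points: (N, 3) -> (N, 16)."""
    return np.maximum(points @ W_h, 0.0)

def g(features):
    """Simple symmetric function: element-wise max over the point axis."""
    return features.max(axis=0)

def gamma(global_feature):
    """Post-processing of the pooled feature: (16,) -> (4,)."""
    return global_feature @ W_g

def pointnet_vanilla(points):
    return gamma(g(h(points)))

cloud = np.array([[1, 2, 3], [1, 1, 1], [2, 3, 2], [2, 3, 4]], dtype=float)
out = pointnet_vanilla(cloud)
out_permuted = pointnet_vanilla(cloud[[2, 0, 3, 1]])   # same set, different row order
```

Since h acts on each point independently and g ignores order, `out` and `out_permuted` agree regardless of what γ does afterwards.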

slide-85
SLIDE 85

1/30/2018

Q: What symmetric functions can be constructed by PointNet?

Hao Su

85

PointNet (vanilla) Symmetric functions

slide-86
SLIDE 86

1/30/2018

A: Universal approximation to continuous symmetric functions

Hao Su

86

Theorem:

PointNet (vanilla)

A Hausdorff continuous symmetric function $f : 2^{X} \to \mathbb{R}$ can be arbitrarily approximated by PointNet.

$S \subseteq \mathbb{R}^d$

slide-87
SLIDE 87

1/30/2018

PointNet is Light-weight

Hao Su

87

[Figure: space complexity (#params), axis ticks 1M, 10M, 100M, comparing MVCNN (multi-view), Subvolume and VRN (volumetric), and PointNet (point cloud)]

Saves 95% GPU memory

[Su et al. 2015] [Su et al. 2016] [Su et al. 2016] [Su et al. 2017]

slide-88
SLIDE 88

1/30/2018

Robustness to data corruption

Hao Su

88

slide-89
SLIDE 89

1/30/2018

Robustness to data corruption

Hao Su

89

Segmentation from partial scans

slide-90
SLIDE 90

1/30/2018

Visualize what is learned by reconstruction

Hao Su

90

Salient points are discovered!

slide-91
SLIDE 91

1/30/2018

PointNet v2.0: Multi-Scale PointNet

Hao Su

91

N points in (x,y); N1 points in (x,y,f); N2 points in (x,y,f’)

  • 1. Larger receptive field in higher layers
  • 2. Fewer points in higher layers (more scalable)
  • 3. Weight sharing
  • 4. Translation invariance (local coordinates in local regions)

Charles Qi, Hao Su, Li Yi, Leonidas Guibas “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space” NIPS 2017
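One way to obtain fewer points per layer and local coordinates is to sample centroids and group neighbors around them. The sketch below assumes farthest point sampling and a fixed-radius ball query, with hypothetical function names and parameters; the per-group feature network is omitted.

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Greedily pick m indices so that each new centroid is far from those already chosen."""
    chosen = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(m - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def group_local(points, centroid_idx, radius):
    """For each centroid, gather neighbors within `radius`, in centroid-relative coordinates."""
    groups = []
    for c in centroid_idx:
        mask = np.linalg.norm(points - points[c], axis=1) <= radius
        groups.append(points[mask] - points[c])   # local coordinates
    return groups

rng = np.random.default_rng(0)
pts = rng.random((256, 2))                  # N points in (x, y)
idx = farthest_point_sampling(pts, 32)      # N1 = 32 centroids
groups = group_local(pts, idx, radius=0.2)
shifted = group_local(pts + 5.0, idx, radius=0.2)   # translate the whole cloud
```

Because each group is expressed relative to its centroid, translating the whole cloud leaves the grouped inputs unchanged, which is the translation invariance listed above.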

slide-92
SLIDE 92

1/30/2018

Fuse 2D and 3D: Frustum PointNets for 3D Object Detection

Hao Su

92

+ Leveraging mature 2D detectors for region proposal and 3D search space reduction
+ Solving the 3D detection problem with 3D data and 3D deep learning architectures

My latest paper accepted at CVPR 2018
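The search-space reduction can be illustrated by keeping only the 3D points whose image projection falls inside a 2D detection box. This is a sketch under assumptions: a pinhole camera with made-up intrinsics `K`, points already in the camera frame with z pointing forward, and a hypothetical helper `frustum_points`. The 2D detector and the 3D network are not shown.

```python
import numpy as np

def frustum_points(points_cam, K, box2d):
    """Keep 3D points (camera frame, z forward) whose projection falls inside a 2D box.
    K: 3x3 pinhole intrinsics; box2d: (xmin, ymin, xmax, ymax) in pixels."""
    in_front = points_cam[:, 2] > 0
    uvw = points_cam @ K.T
    z = np.where(in_front, uvw[:, 2], 1.0)           # avoid dividing by non-positive depth
    u, v = uvw[:, 0] / z, uvw[:, 1] / z
    xmin, ymin, xmax, ymax = box2d
    inside = (u >= xmin) & (u <= xmax) & (v >= ymin) & (v <= ymax)
    return points_cam[in_front & inside]

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])                       # assumed intrinsics
pts = np.array([[0.0, 0.0, 10.0],     # projects to (320, 240): inside the box
                [5.0, 0.0, 10.0],     # projects to (570, 240): outside the box
                [0.0, 0.0, -10.0]])   # behind the camera
kept = frustum_points(pts, K, box2d=(300, 220, 340, 260))
```

Only the first point survives, so the subsequent 3D stage has to reason about a small frustum instead of the whole scene.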

slide-93
SLIDE 93

1/30/2018

Our method ranks No. 1 on KITTI 3D Object Detection Benchmark

Hao Su

93

We get 5% higher AP than Apple’s recent CVPR submission and more than 10% higher AP than previous SOTA in easy category

...

slide-94
SLIDE 94

1/30/2018

Our method ranks No. 1 on KITTI 3D Object Detection Benchmark

Hao Su

94

We are also 1st place for smaller objects (ped. and cyclist) winning with even bigger margins.

... ...

slide-95
SLIDE 95

1/30/2018

Hao Su

95

Remarkable box estimation accuracy even with only a dozen points or with a very partial point cloud

slide-96
SLIDE 96

1/30/2018

Hao Su

96

slide-97
SLIDE 97

02/23/2018

Point Cloud as 3D Output

Hao Su

97

▪ Deep Learning to Generate Combinatorial Objects

slide-98
SLIDE 98

02/23/2018

Supervision from “Synthesize for Learning”

98

ShapeNet

Renderer

slide-99
SLIDE 99

02/23/2018

3D Representation: Point Cloud

Hao Su

99

Describe shape for the whole object
Usable as network output?
No prior works in the deep learning community!

slide-100
SLIDE 100

02/23/2018

3D Prediction by Point Clouds

Input Reconstructed 3D point cloud

100

Hao Su, Haoqiang Fan, Leonidas Guibas “A Point Set Generation Network for 3D Object Reconstruction from a Single Image” CVPR2017 (oral)

slide-101
SLIDE 101

02/23/2018

3D Prediction by Point Clouds

Input Reconstructed 3D point cloud

101

slide-102
SLIDE 102

02/23/2018

Pipeline

Hao Su

102

CVPR ’17, Point Set Generation

[Figure labels: sample; deep network (f); prediction; loss on sets (L)]

slide-103
SLIDE 103

02/23/2018

Loss function: Earth Mover’s Distance (EMD)

Hao Su

103

  • Given two sets of points, measure their discrepancy:

Differentiable
Admits fast computation
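For two point sets of equal size, the Earth Mover's Distance can be written as the minimum, over all one-to-one matchings, of the summed distances between matched points. The sketch below assumes that equal-size formulation and computes it by brute force over permutations; the factorial cost makes it usable only for tiny sets, and practical training would rely on a faster approximate solver. The name `emd_bruteforce` is hypothetical.

```python
from itertools import permutations
import numpy as np

def emd_bruteforce(S1, S2):
    """Exact EMD between two equal-size point sets: min over bijections of summed distances.
    Factorial cost; for tiny illustrative sets only."""
    assert len(S1) == len(S2)
    cost = np.linalg.norm(S1[:, None, :] - S2[None, :, :], axis=2)   # pairwise distances
    rows = range(len(S1))
    return min(sum(cost[i, p[i]] for i in rows) for p in permutations(rows))

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = A[[2, 0, 1]] + np.array([0.0, 3.0])    # same set, reordered and shifted up by 3
d = emd_bruteforce(A, B)                   # each point moves by 3, so the total is 9
d_same = emd_bruteforce(A, A[[1, 2, 0]])   # same set in a different order
```

The distance ignores the ordering of the points, which is what a loss on sets requires.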

slide-104
SLIDE 104

02/23/2018

Generalization to Unseen Categories

Hao Su

104

input, observed view

input, observed view

Out of training

slide-105
SLIDE 105

Deep Learning on Primitives

slide-106
SLIDE 106

02/23/2018

Describe Shapes by Primitives

Hao Su

▪ What are parts? Reusable substructures!
▪ A Structure Mining Problem
▪ By DL, also a Meta-Learning Problem

106

slide-107
SLIDE 107

02/23/2018

Primitive-based Assembly

Hao Su

107

Shubham Tulsiani, Hao Su, Leonidas Guibas, Alexei A. Efros, Jitendra Malik “Learning Shape Abstractions by Assembling Volumetric Primitives” CVPR 2017

slide-108
SLIDE 108

02/23/2018

Approach

Hao Su

108

We predict primitive parameters: size, rotation, translation of M cuboids. Variable number of parts? We predict “primitive existence probability”
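The predicted quantities can be pictured as a small record per cuboid. The sketch below is an assumption-laden illustration: the `Cuboid` class, the use of a rotation matrix (rather than whatever parameterization the network actually outputs), and the fixed 0.5 threshold on the existence probability are all hypothetical choices made to show how a fixed budget of M primitives can yield a variable number of parts.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Cuboid:
    size: np.ndarray         # half-extents along the local axes, shape (3,)
    rotation: np.ndarray     # 3x3 rotation matrix, local -> world (assumed parameterization)
    translation: np.ndarray  # cuboid center in world coordinates, shape (3,)
    p_exist: float           # predicted "primitive existence probability"

    def contains(self, pts):
        """True for world-space points lying inside this cuboid."""
        local = (pts - self.translation) @ self.rotation   # R^T (p - t) in row-vector form
        return np.all(np.abs(local) <= self.size, axis=1)

def assemble(cuboids, threshold=0.5):
    """Keep primitives whose existence probability exceeds a threshold."""
    return [c for c in cuboids if c.p_exist > threshold]

M = [Cuboid(np.array([1.0, 0.5, 0.5]), np.eye(3), np.zeros(3), 0.9),
     Cuboid(np.array([0.2, 0.2, 0.2]), np.eye(3), np.array([3.0, 0.0, 0.0]), 0.1)]
shape = assemble(M)
inside = shape[0].contains(np.array([[0.9, 0.0, 0.0], [1.5, 0.0, 0.0]]))
```

Here two cuboids are predicted but only one is kept, so the assembled shape has a single part.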

slide-109
SLIDE 109

02/23/2018

Generative Models for Shapes by Reusing Primitives

Hao Su

109

▪ Incremental Assembly-based modeling
▪ “Transfer Learning” in the sense of reusing prior knowledge

slide-110
SLIDE 110

02/23/2018

Primitive Space from ShapeNet Parts

Hao Su

110

slide-111
SLIDE 111

02/23/2018

Markov Modeling Process

Hao Su

111

Part assembly: Markov process – Incrementally assemble parts.

Sung et al, ComplementMe: Weakly-Supervised Component Suggestions for 3D Modeling SIGGRAPH Asia 2017

slide-112
SLIDE 112

02/23/2018

New part proposal by network

Hao Su

112

[Figure labels: Partial Assembly; Proposal Network; Component Embedding Space; Placement Network; Output]

slide-113
SLIDE 113

02/23/2018

Automatic Shape Synthesis

Hao Su

113

slide-114
SLIDE 114

02/23/2018

Automatic Shape Synthesis

Hao Su

114

slide-115
SLIDE 115

02/23/2018

Hao Su

115

slide-116
SLIDE 116

Thank you!