
SLIDE 1

Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild

Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi
Visual Geometry Group, University of Oxford

SLIDE 2

Agenda

• Problem Introduction
• Method Overview
• Results
• Discussions
• Conclusions

SLIDE 3

What is 3D Reconstruction?

Vision (Reconstruction): 2D Observations → 3D Representation
Graphics (Rendering): 3D Representation → 2D Observations

SLIDE 4

Multi-view 3D Reconstruction

Building Rome in a Day. Agarwal et al. ICCV’09

Static scene, but the world is dynamic.

SLIDE 5

Multi-view 3D Reconstruction

Building Rome in a Day. Agarwal et al. ICCV’09
The Relightables: Volumetric Performance Capture of Humans with Realistic Relighting. Guo et al. SIGGRAPH Asia’19

static scene
100 cameras
too expensive for me :(

SLIDE 6

Learning-based Single-view 3D Reconstruction

Neural Network

3D prior learned during training
Need supervision!

SLIDE 7

Supervision during Training

• 3D ground truth or shape models
• keypoints
• silhouettes
• multi-views
• camera viewpoint
• depth maps

SLIDE 8

Unsupervised Learning of 3D Objects

• 3D ground truth or shape models
• keypoints
• silhouettes
• multi-views
• camera viewpoint
• depth maps

SLIDE 9

Unsupervised Learning of 3D Objects

Training Data: single-view images of a category (NO other supervision!)
Unsup3D
Output: instance-specific 3D shapes

SLIDE 10

[Figure: input images and their reconstructions]

SLIDE 12

Symmetries in the World

SLIDE 13

Training Pipeline: Photo-Geometric Autoencoding

SLIDE 14

Photo-Geometric Autoencoding

input I
encoder → view w
encoder-decoder → texture
encoder-decoder → depth d
Renderer → reconstruction Î
Reconstruction Loss (Î vs. I)

SLIDE 15

Photo-Geometric Autoencoding

input I
encoder → view w
encoder-decoder → texture
encoder-decoder → depth d
Renderer → reconstruction Î
Reconstruction Loss (Î vs. I)

Q1: How to avoid degenerate solutions?

SLIDE 16

Photo-Geometric Autoencoding

input I
encoder → view w
encoder-decoder → texture
encoder-decoder → depth d
Renderer → reconstruction Î
Reconstruction Loss (Î vs. I)

Q1: How to avoid degenerate solutions?
A1: Enforce symmetry

SLIDE 17

Photo-Geometric Autoencoding

input I
encoder → view w
encoder-decoder → texture
encoder-decoder → depth d, flipped depth d′
′ : horizontal flip

Q1: How to avoid degenerate solutions?
A1: Enforce symmetry by flipping

SLIDE 18

Photo-Geometric Autoencoding

input I
encoder → view w
encoder-decoder → texture
encoder-decoder → depth d, flipped depth d′
′ : horizontal flip
flip switch
Renderer → reconstruction Î
Reconstruction Loss (Î vs. I)

Q1: How to avoid degenerate solutions?
A1: Enforce symmetry by flipping

SLIDE 21

Photo-Geometric Autoencoding

input I
encoder → view w
encoder-decoder → texture
encoder-decoder → depth d, flipped depth d′
′ : horizontal flip
flip switch
Renderer → reconstruction Î
Reconstruction Loss (Î vs. I)

Q1: How to avoid degenerate solutions?
A1: Enforce symmetry by flipping
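
The flipping step above can be illustrated with a minimal sketch in plain Python. This is not the authors' code: `hflip`, `mean_abs_diff`, and the toy 2×3 depth map are hypothetical names and values chosen for illustration, and the real model works on image tensors.

```python
def hflip(image):
    """Mirror a 2D map (list of rows) about its vertical axis."""
    return [list(reversed(row)) for row in image]


def mean_abs_diff(a, b):
    """Average per-pixel absolute difference between two maps."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Toy canonical depth map d (hypothetical values, not from the slides).
d = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

# d' on the slides: the horizontally flipped depth map.
d_flipped = hflip(d)

# A perfectly symmetric map would give 0 here; this toy map does not.
asymmetry = mean_abs_diff(d, d_flipped)

# A symmetric map is unchanged by the flip.
symmetric = [[1.0, 2.0, 1.0]]
assert hflip(symmetric) == symmetric
```

If the reconstruction has to match the input whether the flip switch selects d or d′, the predicted depth is pushed toward being symmetric, which is one way to read "enforce symmetry by flipping".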

SLIDE 22

Photo-Geometric Autoencoding

input I
encoder → view w
encoder-decoder → texture
encoder-decoder → depth d, flipped depth d′
′ : horizontal flip

Q2: What about non-symmetric lighting?

SLIDE 23

Photo-Geometric Autoencoding

input I
encoder → view w
encoder → light l
encoder-decoder → albedo a, flipped albedo a′
encoder-decoder → depth d, flipped depth d′
′ : horizontal flip
flip switch
shading, canonical view J
Renderer → reconstruction Î
Reconstruction Loss (Î vs. I)

Q2: What about non-symmetric lighting?
A2: Enforce symmetry on albedo
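
Why symmetry on albedo rather than on the image: a minimal sketch, assuming a simple Lambertian model in which a pixel's intensity is albedo times (ambient + diffuse × max(0, ⟨l, n⟩)). The function names and the weights `k_ambient` and `k_diffuse` are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def shade_pixel(albedo, normal, light_dir, k_ambient=0.5, k_diffuse=0.5):
    """Lambertian intensity: albedo * (ambient + diffuse * max(0, <l, n>))."""
    n = normalize(normal)
    l = normalize(light_dir)
    cos_term = max(0.0, sum(a * b for a, b in zip(n, l)))
    return albedo * (k_ambient + k_diffuse * cos_term)


# Two mirror-image surface points with the SAME albedo, lit from one side.
light = [1.0, 0.0, 1.0]
left = shade_pixel(0.8, normal=[-1.0, 0.0, 1.0], light_dir=light)
right = shade_pixel(0.8, normal=[1.0, 0.0, 1.0], light_dir=light)
# left is darker than right although the albedo is identical.
```

The albedo is symmetric, but the observed intensities are not, so a symmetry constraint applied to the shaded image would be violated by lighting alone. Placing the constraint on albedo and letting shading absorb the lighting avoids this.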

SLIDE 24

Photo-Geometric Autoencoding

input I
encoder → view w
encoder → light l
encoder-decoder → albedo a, flipped albedo a′
encoder-decoder → depth d, flipped depth d′
′ : horizontal flip
flip switch
shading, canonical view J
Renderer → reconstruction Î
Reconstruction Loss (Î vs. I)

Q3: Non-symmetric albedo, deformation, etc?

SLIDE 25

Photo-Geometric Autoencoding

input I
encoder → view w
encoder → light l
encoder-decoder → albedo a, flipped albedo a′
encoder-decoder → depth d, flipped depth d′
encoder-decoder → conf. σ, conf. σ′
′ : horizontal flip
flip switch
shading, canonical view J
Renderer → reconstruction Î
Reconstruction Loss (Î vs. I)

Q3: Non-symmetric albedo, deformation, etc?
A3: Predict uncertainty
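
One way a predicted confidence map can enter the reconstruction loss is a per-pixel Laplacian negative log-likelihood, ln(√2·σ) + √2·|Î − I| / σ. The slides do not spell out the loss, so treat the form below, the function name, and the toy values as assumptions made for illustration.

```python
import math


def confidence_weighted_loss(recon, target, sigma):
    """Mean Laplacian negative log-likelihood over pixels (flat lists).

    Each pixel contributes ln(sqrt(2) * sigma) + sqrt(2) * |recon - target| / sigma,
    so a large sigma discounts the error but pays a log penalty.
    """
    total = 0.0
    for r, t, s in zip(recon, target, sigma):
        l1 = abs(r - t)
        total += math.log(math.sqrt(2) * s) + math.sqrt(2) * l1 / s
    return total / len(recon)


# The last pixel is an "asymmetric" region the reconstruction cannot explain.
target = [0.5, 0.5, 0.5, 0.9]
recon = [0.5, 0.5, 0.5, 0.1]

# Same confidence everywhere versus low confidence only on the outlier.
uniform = confidence_weighted_loss(recon, target, [0.2, 0.2, 0.2, 0.2])
adaptive = confidence_weighted_loss(recon, target, [0.2, 0.2, 0.2, 1.0])
# adaptive is lower: raising sigma on the outlier pixel reduces the total loss.
```

Under this form the model can lower its loss by reporting high uncertainty exactly where the symmetric reconstruction fails, which is how a confidence map can end up highlighting asymmetric regions.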

SLIDE 32

Results on human faces

Images taken from CelebA, 3DFAW

SLIDE 33

[Figure: input images and their reconstructions]

SLIDE 34

Results on face paintings

Images taken from [1]

[1] Elliot J. Crowley, Omkar M. Parkhi, and Andrew Zisserman. Face painting: querying art with photos. In Proc. BMVC, 2015.

SLIDE 35

[Figure: input images and their reconstructions]

SLIDE 36

Results on abstract faces

Images taken from [1] and the Internet

[1] Elliot J. Crowley, Omkar M. Parkhi, and Andrew Zisserman. Face painting: querying art with photos. In Proc. BMVC, 2015.

SLIDE 37

[Figure: input images and their reconstructions]

SLIDE 38

Results on video frames

Video clips taken from VoxCeleb2

We do not use videos for training or fine-tuning. These results are obtained by applying our CelebA-trained model to each frame independently.

SLIDE 39

[Figure: four inputs, each shown with recon., new view, and rotated]

SLIDE 40

Relighting effects

Images taken from CelebA

SLIDE 41

[Figure: input images and their reconstructions]

SLIDE 42

Results on cat faces

Images taken from [2] and [3]

[2] Weiwei Zhang, Jian Sun, and Xiaoou Tang. Cat head detection - how to effectively exploit shape and texture features. In Proc. ECCV, 2008.
[3] Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman, and C. V. Jawahar. Cats and dogs. In Proc. CVPR, 2012.

SLIDE 43

[Figure: input images and their reconstructions]

SLIDE 44

Results on synthetic cars

Images rendered using ShapeNet

SLIDE 45

[Figure: input images and their reconstructions]

SLIDE 46

Symmetry Plane Visualization

SLIDE 47

Asymmetry Visualization

SLIDE 48

Discussion: Ablation Studies

SLIDE 49

Ablation – Symmetry

[Figure: columns Input, Depth, Normal, Shading, Albedo, Shaded, Recon.; rows full, w/o albedo flip, w/o depth flip]

Insight #1: Symmetry avoids degeneracy

SLIDE 50

Ablation – Lighting (Shape from Shading)

[Figure: columns Input, Depth, Normal, Shading, Albedo, Shaded, Recon.; rows full, w/o lighting]

Insight #2: Lighting avoids bumpy shapes and provides cues for shape

SLIDE 51

Ablation – Confidence Maps

Asymmetry perturbation
[Figure: input; recon. w/ conf.; recon. w/o conf.; conf. σ; conf. σ′]

Insight #3: Confidence maps allow for asymmetry modelling

SLIDE 52

Discussion: Limitations

SLIDE 53

Limitation #1: Poor side reconstruction

Why? Canonical depth map cannot represent the shape of the side

[Figure: inputs and their reconstructions]
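
The reason given above can be made concrete with a minimal sketch that back-projects a depth map into 3D points. The pinhole camera, the `focal` parameter, and the function name are assumptions made for illustration and are not taken from the slides.

```python
def depth_to_points(depth, focal=1.0):
    """Back-project a depth map (list of rows) into 3D points.

    Assumes a pinhole camera with the principal point at the image centre.
    Every pixel yields exactly one (x, y, z) point.
    """
    height, width = len(depth), len(depth[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    points = []
    for v in range(height):
        for u in range(width):
            z = depth[v][u]
            points.append(((u - cx) * z / focal, (v - cy) * z / focal, z))
    return points


# A flat 2x2 depth map at distance 2 (toy values).
flat_depth = [[2.0, 2.0], [2.0, 2.0]]
points = depth_to_points(flat_depth)
```

Because each pixel stores a single depth value, the surface is a height field seen from the canonical view. A surface that runs nearly parallel to the viewing rays, such as the side of a head, would need many depths along one ray and so cannot be captured well.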

SLIDE 54

Limitation #2: Side pose input

Why?
• There are no/few symmetric correspondences present in the image, which voids the symmetry regularization
• Not enough side images in the training set

[Figure: input and its reconstructions]

SLIDE 55

Limitation #3: Lambertian shading

Why? We assume Lambertian shading with one dominant directional light, and do not model specularity and shadow

[Figure: input and its reconstructions]

SLIDE 56

Limitation #4: Dark texture vs. dark shading

Why? Disentangling dark texture and dark shading is hard. The model may produce dark shading with spiky shape to reconstruct dark texture.

[Figure: input and its reconstructions]
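
The ambiguity described above follows from intensity being a product of albedo and shading. A minimal sketch with made-up numbers (the function name and values are hypothetical):

```python
def render_intensity(albedo, shading):
    """Observed intensity as the product of albedo and shading."""
    return albedo * shading


# Two different explanations of the same dark pixel.
dark_texture = render_intensity(albedo=0.2, shading=1.0)  # dark paint, fully lit
dark_shading = render_intensity(albedo=1.0, shading=0.2)  # bright paint, barely lit
```

Both explanations give the same pixel value, so the image alone cannot tell them apart. Explaining the pixel by shading requires a surface tilted away from the light, which matches the spiky shapes mentioned on the slide.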

SLIDE 57

Conclusions

• We present an unsupervised method for learning deformable 3D objects from only raw single-view images
• Bilateral symmetry in common objects provides a powerful constraint for learning 3D shapes
• Shading provides important geometric cues and helps regularize shape
• By modeling symmetric albedo and non-symmetric shading separately, our model automatically learns intrinsic image decomposition without supervision
• Confidence maps can be used to model asymmetries

SLIDE 58

Key Takeaways

• 3D understanding is possible from only single-view 2D observations; image recognition should go beyond 2D
• Reducing supervision in training is important for 3D understanding in the wild
• Physical cues and visual patterns are useful, such as symmetry, shading, planes, repetitions, etc.

SLIDE 59

Thank you!

Demo: bit.ly/2zBNjXx
Code: github.com/elliottwu/unsup3d