SLIDE 1

GRASS: Generative Recursive Autoencoders for Shape Structures

Jun Li

NUDT

Kai Xu

NUDT, Shenzhen University, Shandong University

Siddhartha Chaudhuri

IIT Bombay

Ersin Yumer

Adobe Research

Hao (Richard) Zhang

Simon Fraser University

Leonidas Guibas

Stanford University

SLIDE 2

?

Shapes have different topologies

SLIDE 3

Ovsjanikov et al. 2011

Shapes have different geometries

SLIDE 4

Wang et al. 2011

Shapes have hierarchical compositionality

SLIDE 5

Motivating Question

How can we capture

  • topological variation
  • geometric variation
  • hierarchical composition

in a single, generative, fixed-dimensional representation?

“Shape DNA”

[Figure: shapes are Encoded into, and Generated from, this code.]

SLIDE 6

Sequences of commands to Maya/AutoCAD

  • Deformable template [Allen03]
  • Posed template [Anguelov05]
  • Parametrized procedure [Weber95]
  • Probabilistic procedure [Talton09]
  • Learned grammar (single exemplar) [Bokeloh10]
  • Learned grammar (multi-exemplar) [Talton12]
  • Probabilistic grammar [Müller06]

SLIDE 7

Structural PGM vs Volumetric DNN

Strongly supervised: Structural PGM [Kalogerakis et al. ’12]
  • Pros: direct model of compositional structure; (relatively) low-dimensional; high-quality output
  • Cons: limited topological variation; no continuous geometric variation (for generation); no hierarchy; huge effort to segment & label training data

Unsupervised: Volumetric DNN [Wu et al. ’15]
  • Pros: arbitrary geometry/topology; unsupervised
  • Cons: low resolution; no explicit separation of structure vs. fine geometry; no guarantee of symmetry/adjacency; no hierarchy; many parameters; lots of training data

SLIDE 8

Structural PGM vs Volumetric DNN (comparison as on the previous slide): what sits between the strongly supervised [Kalogerakis et al. ’12] and unsupervised [Wu et al. ’15] approaches?

SLIDE 9

Structural PGM vs Volumetric DNN (comparison as on the previous slides): GRASS sits between the strongly supervised [Kalogerakis et al. ’12] and unsupervised [Wu et al. ’15] approaches.

SLIDE 10

GRASS: Generative neural networks over unlabeled part layouts

  • GRASS factorizes a shape into a hierarchical layout of simplified parts, plus fine-grained part geometries
  • Weakly supervised: requires pre-segmented parts, but no part labels and no manually-specified “ground truth” hierarchies
  • Structure-aware: learns a generative distribution over richly informative structures

SLIDE 11

Three Challenges

  • Challenge 1: Ingest and generate arbitrary part layouts with a fixed-dimensional network (convolution doesn’t work over arbitrary graphs)
  • Challenge 2: Map a layout invertibly to a fixed-D code (“Shape DNA”) that implicitly captures adjacency, symmetry and hierarchy
  • Challenge 3: Map layout features to fine geometry
SLIDE 12

Li et al. 2008, Wikipedia

Huge variety of (attributed) graphs

  • Arbitrary numbers/types of vertices (parts), arbitrary numbers of connections (adjacencies/symmetries)
  • For linear graphs (chains) of arbitrary length, we can use a recurrent neural network (RNN/LSTM), as in the sketch below
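A minimal sketch of the chain case in PyTorch: an LSTM folds a variable-length sequence of per-part features into one fixed-dimensional code. The 12-D box features and 80-D code size are illustrative assumptions, not the talk’s exact configuration.

    import torch
    import torch.nn as nn

    # A chain of parts (variable length) summarized into a fixed-D code:
    # the LSTM's final hidden state serves as the code.
    lstm = nn.LSTM(input_size=12, hidden_size=80, batch_first=True)

    parts = torch.randn(1, 7, 12)   # a chain of 7 parts, each a 12-D box feature
    _, (h_n, _) = lstm(parts)       # h_n: final hidden state, shape (1, 1, 80)
    chain_code = h_n.squeeze(0)     # fixed 80-D code, regardless of chain length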

SLIDE 13

Key Insight

  • Edges of a graph can be collapsed sequentially to yield a hierarchical structure
  • Looks like a parse tree for a sentence!
  • … and there are unsupervised sentence parsers

SLIDE 14

Socher et al. 2011

Recursive Neural Network (RvNN)

  • Repeatedly merge two nodes into one
  • Each node has an n-D feature vector, computed recursively:

p = f(W [c1; c2] + b)

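As a concrete reading of that formula, a minimal PyTorch sketch; the 80-D code size is an illustrative assumption, and tanh stands in for f as in Socher et al.

    import torch
    import torch.nn as nn

    code_dim = 80                              # illustrative size of each node code
    W = nn.Linear(2 * code_dim, code_dim)      # holds both W and the bias b

    c1 = torch.randn(code_dim)                 # two child codes
    c2 = torch.randn(code_dim)
    p = torch.tanh(W(torch.cat([c1, c2])))     # parent code p = f(W [c1; c2] + b)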
SLIDE 15

Different types of merges, varying cardinalities!

[Figure: four merge types: adjacency, translational symmetry, rotational symmetry, reflectional symmetry.]

  • How to encode them to the same code space?
  • How to decode them appropriately, given just a code?
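One possible answer, sketched below under assumptions: give each merge type its own small encoder, but make all of them output codes of the same dimensionality, so adjacency and symmetry merges land in one shared code space. The layer sizes (80-D codes, 200-D hidden) and module names are illustrative, not the paper’s exact configuration; the symmetry encoder consumes one generator part’s code y plus a small parameter vector p (symmetry type, axis, fold count, etc.).

    import torch
    import torch.nn as nn

    class AdjEncoder(nn.Module):
        """Adjacency merge g_b: two sibling codes -> one parent code."""
        def __init__(self, code_dim=80, hidden=200):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * code_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, code_dim), nn.Tanh())

        def forward(self, y1, y2):
            return self.net(torch.cat([y1, y2], dim=-1))

    class SymEncoder(nn.Module):
        """Symmetry merge g_t: a generator code + symmetry params -> parent code."""
        def __init__(self, code_dim=80, param_dim=8, hidden=200):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(code_dim + param_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, code_dim), nn.Tanh())

        def forward(self, y, p):
            return self.net(torch.cat([y, p], dim=-1))

Because both modules emit codes of the same dimensionality, a parent produced by either merge type can feed into the next merge uniformly.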
SLIDE 16

[Figure: recursively merging parts, bottom-up. Adjacency merges (the adjacency encoder) are encoded as g_b(y1, y2); reflectional-symmetry merges are encoded as g_t(y, p), where y is the generator part’s code and p the symmetry parameters.]

SLIDE 17

[Figure: the same bottom-up merging, continued until a single root code remains. The symmetry encoder g_t(y, p) combines a symmetry generator’s code y with its symmetry parameters p; adjacency merges g_b(y1, y2) combine sibling codes.]

How to determine the merge order?

SLIDE 18

[Figure: the RvNN encoder maps a box structure Y to an n-D root code; the RvNN decoder reconstructs Y′ from it. Reconstruction loss: L = ||Y − Y′||².]

Training with reconstruction loss

  • Learn weights from a variety of randomly sampled merge orders for each box structure (see the training sketch below)
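A minimal training-step sketch under assumptions: random_hierarchy, encode, and decode are hypothetical helpers, not named in the talk, standing in for sampling a random merge order, running the RvNN encoder bottom-up, and running the decoder top-down along the same tree.

    def train_step(boxes, encoder, decoder, optimizer):
        """One step: sample a random merge order, autoencode, minimize ||Y - Y'||^2.
        random_hierarchy / encode / decode are hypothetical helpers."""
        tree = random_hierarchy(boxes)            # random sequence of valid merges
        root = encode(tree, encoder)              # bottom-up RvNN encoding -> root code
        boxes_rec = decode(root, tree, decoder)   # top-down decoding along the same tree
        loss = ((boxes - boxes_rec) ** 2).sum()   # reconstruction loss L = ||Y - Y'||^2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()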

SLIDE 19

In testing

  • Encoding: Given a box structure, determine the merge order as the hierarchy that gives the lowest reconstruction error (selection sketch below)

[Figure: RvNN encoder/decoder round trip used to score candidate hierarchies.]
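A sketch of that selection rule, reusing the hypothetical encode/decode helpers from the training sketch; how the candidate hierarchies are proposed (exhaustively or by sampling) is left open here.

    def best_hierarchy(boxes, candidate_trees, encoder, decoder):
        """Pick the merge order whose autoencoding reconstructs the boxes best."""
        def recon_err(tree):
            root = encode(tree, encoder)              # hypothetical helper
            boxes_rec = decode(root, tree, decoder)   # hypothetical helper
            return ((boxes - boxes_rec) ** 2).sum().item()
        return min(candidate_trees, key=recon_err)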

SLIDE 20

Inferring the symmetry hierarchy via reconstruction loss

[Figure: two candidate hierarchies for the same shape; one yields low reconstruction loss, the other high reconstruction loss.]

SLIDE 21

In testing

  • Encoding: Given a box structure, determine the merge order as the hierarchy that gives the lowest reconstruction error
  • Decoding: Given an arbitrary code, how do we generate the corresponding structure?

[Figure: RvNN decoder maps some code to a box structure.]

SLIDE 22

How do we know which type of encoder to apply at each node?

Adjacency or symmetry? A node classifier decides (sketch below).
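A minimal sketch, assuming a small MLP over node codes; the three node types and all module names (adj_dec, sym_dec, box_dec, apply_symmetry) are illustrative stand-ins, not the talk’s exact interfaces. During decoding, the predicted type selects which module expands the code next.

    import torch
    import torch.nn as nn

    class NodeClassifier(nn.Module):
        """Predict the node type (0 = leaf box, 1 = adjacency, 2 = symmetry)."""
        def __init__(self, code_dim=80, hidden=200, n_types=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(code_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, n_types))

        def forward(self, code):
            return self.net(code)   # logits over node types

    def decode_node(code, classifier, adj_dec, sym_dec, box_dec):
        """Recursively expand a code into a list of boxes (hypothetical modules)."""
        kind = classifier(code).argmax(-1).item()
        if kind == 0:                                # leaf: emit one box
            return [box_dec(code)]
        if kind == 1:                                # adjacency: two child codes
            c1, c2 = adj_dec(code)
            return (decode_node(c1, classifier, adj_dec, sym_dec, box_dec)
                    + decode_node(c2, classifier, adj_dec, sym_dec, box_dec))
        y, p = sym_dec(code)                         # symmetry: generator + params
        child = decode_node(y, classifier, adj_dec, sym_dec, box_dec)
        return apply_symmetry(child, p)              # hypothetical replication step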

SLIDE 23

Making the network generative

  • Variational Auto-Encoder (VAE): Learn a distribution that approximates the data distribution of true 3D structures
  • Marginalize over a latent “DNA” code z:

maximize (over parameters θ) the likelihood  P_θ(Y) = ∫ P_θ(Y | z) P(z) dz

SLIDE 24

Variational Bayes formulation

maximize  E_{z ~ Q(z|Y)} [ log P(Y | z) ]  −  D_KL( Q(z|Y) || P(z) )

  • z should reconstruct Y, given that it was drawn from Q(z|Y)
  • Assuming the z’s follow a normal distribution

SLIDE 25

Variational Autoencoder (VAE)

maximize  E_{z ~ Q(z|Y)} [ log P(Y | z) ]  −  D_KL( Q(z|Y) || P(z) )
          (reconstruction loss)              (KL divergence loss)

[Figure: the encoder models Q(z|Y); the decoder models P(Y|z) and outputs Y′ = g(z; θ); reconstruction loss L = ||Y − Y′||².]
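The two loss terms in a minimal PyTorch sketch; the beta weight and the diagonal-Gaussian posterior are standard VAE assumptions rather than details from the talk.

    import torch

    def reparameterize(mu, logvar):
        """z = mu + sigma * eps, eps ~ N(0, I): keeps sampling differentiable."""
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def vae_loss(Y, Y_rec, mu, logvar, beta=1.0):
        """Reconstruction term plus KL divergence to the unit-Gaussian prior."""
        recon = ((Y - Y_rec) ** 2).sum()
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + beta * kl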

SLIDE 26

Variational Autoencoder (VAE)

[Figure: the encoder produces a mean μ and standard deviation σ; a latent code z ~ N(μ, σ) is sampled and passed to the decoder.]

SLIDE 27

Sampling near μ is robust

[Figure: latent codes z ~ N(μ, σ) sampled near an encoded mean μ decode to plausible structures.]

SLIDE 28

Sampling far away from μ?

[Figure: the same VAE, now decoding a code z_p ~ p(z) drawn from the prior, far from any encoded mean.]

SLIDE 29

Adversarial training: VAE-GAN

  • Reuse of modules!
  • VAE decoder → GAN generator
  • VAE encoder → GAN discriminator

[Figure: the VAE encoder/decoder combined with a GAN. Prior samples z_p ~ p(z) feed the decoder, acting as the generator; a discriminator scores its outputs against real box structures.]
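One adversarial update, as a sketch under assumptions: the discriminator is taken to output a probability in (0, 1), CODE_DIM is an illustrative latent size, and the standard GAN losses below stand in for whatever exact objective the talk used.

    import torch

    CODE_DIM = 80   # illustrative latent dimensionality

    def adversarial_step(Y_real, decoder, discriminator, d_opt, g_opt):
        """One VAE-GAN update: the VAE decoder doubles as the GAN generator."""
        z_p = torch.randn(Y_real.size(0), CODE_DIM)    # sample codes from the prior
        Y_fake = decoder(z_p)

        # Discriminator: push real structures toward 1, generated toward 0.
        d_loss = -(torch.log(discriminator(Y_real) + 1e-8).mean()
                   + torch.log(1 - discriminator(Y_fake.detach()) + 1e-8).mean())
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()

        # Generator: make the discriminator score fakes as real.
        g_loss = -torch.log(discriminator(Y_fake) + 1e-8).mean()
        g_opt.zero_grad(); g_loss.backward(); g_opt.step()
        return d_loss.item(), g_loss.item()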

SLIDE 30

Benefit of adversarial training

[Figure: generated structures from the plain VAE, for comparison.]

SLIDE 31

Part geometry synthesis

[Figure: the concatenated part code (32-D) is decoded into a 32×32×32 output part volume.]
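A sketch of such a part decoder, assuming a standard 3D up-convolutional stack; the layer widths and the occupancy-grid output are illustrative choices, with only the 32-D code and the 32³ volume taken from the slide.

    import torch
    import torch.nn as nn

    class PartVolumeDecoder(nn.Module):
        """Decode a 32-D part code into a 32x32x32 occupancy volume."""
        def __init__(self, code_dim=32):
            super().__init__()
            self.fc = nn.Linear(code_dim, 256 * 4 * 4 * 4)   # seed a 4^3 feature grid
            self.deconv = nn.Sequential(
                nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1), nn.ReLU(),  # 8^3
                nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16^3
                nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1), nn.Sigmoid())  # 32^3

        def forward(self, code):
            x = self.fc(code).view(-1, 256, 4, 4, 4)
            return self.deconv(x)   # (batch, 1, 32, 32, 32) occupancy probabilities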

SLIDE 32

Results: Shape synthesis

SLIDE 33

Results: Inferring consistent hierarchies

SLIDE 34

Results: Shape retrieval

SLIDE 35

Results: Shape retrieval

[Figure: retrieval results using the concatenated part code.]

SLIDE 36

Results: Shape interpolation

[Figure: interpolations across rotational symmetries of different orders: 3-fold, 4-fold, 5-fold and 6-fold.]

SLIDE 37

Results: Shape interpolation

SLIDE 38

Discussion

  • What does our model learn?
  • A hierarchical organization of part structures
  • A reasonable way to generate 3D structure: part by part, bottom-up, hierarchically organized
  • This is typically how a human modeler creates a 3D model, as a hierarchical scene graph

[Figure: an example hierarchical scene graph with reflectional-symmetry nodes.]
SLIDE 39

Discussion

  • A general guideline for 3D shape generation
  • Coarse-to-fine: first generate the coarse structure, then generate the fine details
  • May employ different representations and models
SLIDE 40

Acknowledgement

  • Anonymous reviewers
  • Help on data preparation: Yifei Shi, Min Liu, Chengjie Niu and Yizhi Wang
  • Research grants from NSFC, NSERC, NSF
  • Google Focused Research Award
  • Gifts from Adobe, Qualcomm and Vicarious
  • Jun Li is a visiting PhD student at the University of Bonn, supported by the CSC
SLIDE 41

Thank you!

Code & data available at www.kevinkaixu.net