GRAINS: Generative Recursive Autoencoders for INdoor Scenes

SLIDE 1

GRAINS: Generative Recursive Autoencoders for INdoor Scenes

Manyi Li 1,2, Akshay Gadi Patil 2, Kai Xu 3,4, Siddhartha Chaudhuri 5,6, Owais Khan 6, Ariel Shamir 7, Changhe Tu 1, Baoquan Chen 8, Daniel Cohen-Or 9, Hao (Richard) Zhang 2

1 Shandong University
2 Simon Fraser University
3 National University of Defense Technology
4 AICFVE Beijing Film Academy
5 Adobe Research
6 IIT Bombay
7 The Interdisciplinary Center
8 Peking University
9 Tel-Aviv University

SLIDE 2

Outline

  • Problem & Related work
  • Method
  • Scene representation
  • Network
  • Ablation study
  • Results & Application
SLIDE 3

Scene generation problem

Planner5d

  • Generate plausible room layouts automatically, to replace or reduce human work.
SLIDE 4

Related works: data-driven

  • Graphical model methods

[Fisher et al. SIGA 2012], [Kermani et al. SGP 2016], [Qi et al. CVPR 2018]

[Diagram labels: Object appearance, Object positioning, Learning, Generation]

  • Pro: Respects object-object relations
  • Con: Too many parameters and rules to tune manually
  • Con: Time-consuming

SLIDE 5

Related works: data-driven

  • Graphical model methods
  • Deep neural networks

[Wang et al. SIGGRAPH 2018], [Ritchie et al. CVPR 2019]

[Diagram labels: Indoor scene, Multi-channel image, CNN training]

  • Pro: Easy-to-use model and better plausibility
  • Con: No object-object relations

SLIDE 6

Our method

  • Indoor scene structures are inherently hierarchical.
  • Indoor scenes share some common patterns in the sub-scenes.

SLIDE 7

Our method

  • Indoor scene structures are inherently hierarchical.

Hierarchical scene representation + Recursive neural network - Variational Auto-encoder

[Diagram label: latent code Z]
SLIDE 8

Scene Representation

Step 1: Deciding the merge order
SLIDE 9

Scene Representation

Step 1: Deciding the merge order

Step 2: Construct the nodes of the hierarchy

  • Leaf nodes: objects

Leaf vector: OBB sizes, object label

[Diagram, leaf objects: stand, stand, lamp, lamp, bed, rug, cabinet, cabinet, floor, wall, wall, wall, wall]
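
As a minimal sketch of the leaf vector named on this slide, the Python below concatenates a one-hot object label with the three OBB side lengths. The label vocabulary, the ordering of the two parts, and the function name `leaf_vector` are assumptions made for illustration; the slides only state that a leaf vector combines OBB sizes and an object label.

```python
from typing import List, Sequence

# Hypothetical label vocabulary; the slides do not list the actual categories.
LABELS: List[str] = ["bed", "stand", "lamp", "cabinet", "rug", "floor", "wall"]


def leaf_vector(label: str, obb_sizes: Sequence[float]) -> List[float]:
    """Concatenate a one-hot label with the three OBB side lengths."""
    if len(obb_sizes) != 3:
        raise ValueError("expected three OBB side lengths")
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    one_hot = [1.0 if name == label else 0.0 for name in LABELS]
    return one_hot + [float(s) for s in obb_sizes]


# Example: a bed with made-up side lengths.
bed = leaf_vector("bed", (2.0, 0.5, 1.5))
```

With this vocabulary, every leaf has the same length (seven label slots plus three sizes), which is what lets a single network module consume any object.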
SLIDE 10

Scene Representation

Step 1: Deciding the merge order

Step 2: Construct the nodes of the hierarchy

  • Leaf nodes: objects
  • Internal nodes: groups

Internal node types:

  • Support node (supp node)
  • Surround node (sur node)
  • Co-occur node (co-oc node)
  • Wall node
  • Root node

[Diagram, hierarchy. Leaves: stand, stand, lamp, lamp, bed, rug, cabinet, cabinet, floor, wall ×4. Internal nodes: supp node ×2, sur node, co-oc node ×2, wall node ×2, root node.]
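
The node types on this slide can be sketched as a small tree data structure. The class name `Node`, the helper `leaf`, and the particular grouping built at the end are hypothetical and chosen only to illustrate how leaves and internal nodes nest; they are not taken from the authors' code.

```python
from dataclasses import dataclass, field
from typing import List

# Node kinds named on the slide; "leaf" stands for a single object.
KINDS = {"leaf", "support", "surround", "co-occur", "wall", "root"}


@dataclass
class Node:
    kind: str
    label: str = ""
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown node kind: {self.kind}")

    def leaves(self) -> List[str]:
        """Return the object labels under this node, left to right."""
        if self.kind == "leaf":
            return [self.label]
        return [name for child in self.children for name in child.leaves()]


def leaf(label: str) -> Node:
    return Node("leaf", label)


# A hand-built toy hierarchy (hypothetical grouping, for illustration only).
bedside = Node("support", children=[leaf("stand"), leaf("lamp")])
sleeping = Node("surround", children=[leaf("bed"), bedside])
wall_group = Node("wall", children=[leaf("wall"), sleeping])
root = Node("root", children=[leaf("floor"), wall_group])
```

Calling `root.leaves()` walks the tree depth-first and recovers the flat object list, which is the inverse of the grouping step.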
SLIDE 11

Scene Representation

Step 1: Deciding the merge order

Step 2: Construct the nodes of the hierarchy

  • Leaf nodes: objects
  • Internal nodes: groups

Step 3: Compute relative positions between sibling nodes

[Diagram: the scene hierarchy (leaves, supp, sur, co-oc, wall, and root nodes), with relative positions (rp) attached to sibling nodes]
SLIDE 12

Our method

  • Indoor scene structures are inherently hierarchical.

Hierarchical scene representation + Recursive neural network - Variational Auto-encoder

[Diagram label: latent code Z]
SLIDE 13

Network

  • Network: RvNN-VAE (Recursive Neural Network – Variational Auto-Encoder)

[Diagram label: latent code Z]

SLIDE 14

Encoding Process

Input hierarchy, Encoder module

[Diagram: the scene hierarchy (leaves, supp, sur, co-oc, wall, and root nodes), with relative positions (rp) attached to sibling nodes]
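
The bottom-up encoding of a hierarchy can be sketched as a recursive fold. This is a simplified illustration under several assumptions that are not stated in the slides: merges are binary, one shared layer with random untrained weights is used for every node type, and `CODE_DIM`, `RP_DIM`, `make_weights`, and `encode` are hypothetical names.

```python
import math
import random
from typing import List

CODE_DIM = 4  # hypothetical size of every node code
RP_DIM = 2    # hypothetical size of a relative-position (rp) vector


def make_weights(rng: random.Random) -> List[List[float]]:
    """Random, untrained weights for a single shared merge layer."""
    cols = 2 * CODE_DIM + RP_DIM
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
            for _ in range(CODE_DIM)]


def encode(tree, weights: List[List[float]]) -> List[float]:
    """Fold a hierarchy bottom-up into a single root code.

    A leaf is a list of CODE_DIM floats; an internal node is a tuple
    (left, right, rp) where rp is a list of RP_DIM floats.
    """
    if isinstance(tree, list):  # leaf: already a code vector
        return tree
    left, right, rp = tree
    x = encode(left, weights) + encode(right, weights) + rp
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


weights = make_weights(random.Random(0))
stand = [0.1, 0.2, 0.0, 0.3]
lamp = [0.0, 0.5, 0.1, 0.0]
bed = [0.9, 0.4, 0.2, 0.1]
# Merge stand and lamp first, then merge that group with the bed.
scene = ((stand, lamp, [0.0, 0.0]), bed, [0.3, -0.1])
root_code = encode(scene, weights)
```

Whatever the shape of the input tree, the result is one fixed-length vector, which is the property a variational auto-encoder needs at the root.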

SLIDE 15

Decoding Process

Decoder module, Output hierarchy

[Diagram: the scene hierarchy (leaves, supp, sur, co-oc, wall, and root nodes), with relative positions (rp) attached to sibling nodes]

SLIDE 16

Generation Pipeline

  • The network learns to map a random vector to a plausible indoor scene.

[Figure: generation pipeline]
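
Mapping a random vector to a hierarchy can be sketched as sampling a code and expanding it recursively. The sketch below is a toy under stated assumptions: weights are random and untrained, every split is binary, a single score decides leaf versus internal node, and `MAX_DEPTH` is an artificial cap added so the toy always terminates. None of these names or choices come from the slides.

```python
import math
import random
from typing import List

CODE_DIM = 4   # hypothetical latent and node-code size
MAX_DEPTH = 3  # cap so this untrained toy decoder always terminates


def make_weights(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(weights: List[List[float]], x: List[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def decode(code, split_w, classify_w, depth=0):
    """Expand a code into ("leaf", code) or ("node", left, right)."""
    is_leaf = depth >= MAX_DEPTH or linear(classify_w, code)[0] < 0.0
    if is_leaf:
        return ("leaf", code)
    children = [math.tanh(v) for v in linear(split_w, code)]
    left, right = children[:CODE_DIM], children[CODE_DIM:]
    return ("node",
            decode(left, split_w, classify_w, depth + 1),
            decode(right, split_w, classify_w, depth + 1))


def count_leaves(tree) -> int:
    if tree[0] == "leaf":
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])


def generate(seed: int):
    rng = random.Random(seed)
    split_w = make_weights(2 * CODE_DIM, CODE_DIM, rng)
    classify_w = make_weights(1, CODE_DIM, rng)
    z = [rng.gauss(0.0, 1.0) for _ in range(CODE_DIM)]  # the random vector
    return decode(z, split_w, classify_w)


scene = generate(seed=0)
```

In a trained model the leaf codes would further be decoded into object labels and boxes; here they are left as raw vectors.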

SLIDE 17

Scene representation matters!

  • Indoor scenes are complex and diverse.
  • An appropriate scene representation is the key to learning.

SUNCG dataset [Song et al. 2017]

Our key points: (1) hierarchical structure, (2) relative position format.

SLIDE 18

Key point 1: Hierarchical structure

  • We explicitly define wall nodes and a root node in our hierarchies.
  • Reason: walls should “ground” the placement of objects in a room.

[Diagram: our hierarchical structure. Leaves: table lamp1, stand1, stand2, table lamp1, bed, rug, cabinet1, cabinet2, floor, wall1, wall2, wall3, wall4. Internal nodes: supp node ×2, surr node, co-oc node ×2, wall node ×2, root.]

SLIDE 19

Ablation Studies: Hierarchical structure

[Figure panels: Hierarchy 1, Hierarchy 2, Hierarchy 3]

SLIDE 20

Ablation Studies: Hierarchical structure

[Figure panels: Hierarchy 1, Hierarchy 2, Hierarchy 3; generated scenes]

Hierarchy 1:

  • Con: No “wall nodes”
  • Con: No “root node”

SLIDE 21

Ablation Studies: Hierarchical structure

[Figure panels: Hierarchy 1, Hierarchy 2, Hierarchy 3; generated scenes]

Hierarchy 2:

  • Pro: “wall nodes”
  • Con: No “root node”

SLIDE 22

Ablation Studies: Hierarchical structure

[Figure panels: Hierarchy 1, Hierarchy 2, Hierarchy 3; generated scenes]

Hierarchy 3 (Ours):

  • Pro: “wall nodes”
  • Pro: “root node”

It is important to have floors, wall nodes, and their relative positions in the last merge, to “ground” the object positions.

SLIDE 23

Key point 2: Relative Position format

[Figure panels: Absolute pos, Relative pos 1, Relative pos 2]

SLIDE 24

Ablation Studies: Relative position format

[Figure panels: Absolute pos, Relative pos 1, Relative pos 2; generated scenes]

Object’s absolute position in the leaf nodes

SLIDE 25

Ablation Studies: Relative position format

[Figure panels: Absolute pos, Relative pos 1, Relative pos 2; generated scenes]

Relative position between the object centers

SLIDE 26

Ablation Studies: Relative position format

[Figure panels: Absolute pos, Relative pos 1, Relative pos 2; generated scenes]

Relative position with offsets between closest edges (ours)
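
To make the difference between the center-based and edge-based formats concrete, here is a minimal sketch that computes both for two boxes. It assumes axis-aligned 2D boxes (a simplification of the OBBs used in the slides) and one plausible reading of "offsets between closest edges": per axis, the signed gap between the closest pair of box edges. The names `Box`, `center_offset`, and `closest_edge_offset` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned 2D box (a simplification of the OBBs on the slides)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def center_offset(a: Box, b: Box) -> Tuple[float, float]:
    """Offset from the center of a to the center of b."""
    ax, ay = (a.x_min + a.x_max) / 2, (a.y_min + a.y_max) / 2
    bx, by = (b.x_min + b.x_max) / 2, (b.y_min + b.y_max) / 2
    return (bx - ax, by - ay)


def _closest(a_lo: float, a_hi: float, b_lo: float, b_hi: float) -> float:
    """Signed offset between the closest pair of edges along one axis."""
    candidates = [b_e - a_e for a_e in (a_lo, a_hi) for b_e in (b_lo, b_hi)]
    return min(candidates, key=abs)


def closest_edge_offset(a: Box, b: Box) -> Tuple[float, float]:
    return (_closest(a.x_min, a.x_max, b.x_min, b.x_max),
            _closest(a.y_min, a.y_max, b.y_min, b.y_max))


bed = Box(0.0, 0.0, 2.0, 1.5)
# The stand touches the bed's right edge and is flush with its top edge.
stand = Box(2.0, 1.0, 2.5, 1.5)

by_centers = center_offset(bed, stand)
by_edges = closest_edge_offset(bed, stand)
```

For this pair, `by_edges` is `(0.0, 0.0)` because the boxes touch and align, while `by_centers` is `(1.25, 0.5)` and changes whenever either box is resized. Under this reading, attachment and alignment show up directly as zeros in the edge-based format.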

SLIDE 27

Results

Generated scenes:

  • Plausible
  • Novel
  • Diverse
SLIDE 28

Comparison against a graphical model method

  • For comparison, we select scenes with the same object shapes.

Graphical model method:

  • 3-12 min / scene.
  • No guarantee on the exact alignment.
  • More unreasonable object pairs.

Ours:

  • 0.1027 sec / scene.
  • Relative positions with attachment and alignment information.

SLIDE 29

Comparison: Perceptual studies

Comparisons are done against (1) the training set, (2) [Kermani et al. 2016], (3) [Wang et al. 2018], and (4) [Qi et al. 2018].

We ask users to score or select the scenes based on their plausibility.

SLIDE 30

Applications

  • Data augmentation method for deep learning tasks

Semantic scene segmentation task:

  • Network: PointNet [Qi et al. 2017]
  • Dataset: Indoor scenes as point clouds
  • Results: More relevant training data, better learning performance and generalization.

SLIDE 31

Applications

  • Data augmentation method for deep learning tasks
  • 2D layout guided 3D scene modeling

2D layout guided 3D scene modeling:

  • Goal: 2D box layout to 3D indoor scene
  • Network: RvNN-VAE pre-trained on 3D scenes
  • Result: Transform between multi-modal data which share the same hierarchical structures.

SLIDE 32

Applications

  • Data augmentation method for deep learning tasks
  • 2D layout guided 3D scene modeling
  • Hierarchy-guided scene editing

Hierarchical indoor scene structure helps designers to edit a scene at the sub-scene level.
SLIDE 33

Conclusion

  • We present a generative neural network that lets us generate plausible 3D indoor scenes in large quantities and varieties, easily and highly efficiently.
  • We study the influence of different scene representations on the learning ability of generative RvNNs.
  • We show the applications of our generated scenes with the corresponding hierarchies.

SLIDE 34

Thank you!