slide-1
SLIDE 1

Synthesize for Learning:

Joint analysis of 2D images and 3D shapes

Hao Su

Image world Shape world

slide-2
SLIDE 2

How do humans represent 3D in mind?

slide-3
SLIDE 3

Mental rotation

by Roger N. Shepard, National Medal of Science Laureate, Stanford, and Lynn Cooper, Professor at Columbia University

slide-4
SLIDE 4

Shape constancy

slide-5
SLIDE 5

3D Perception is important for robots

Cosimo Alfredo Pina, “The domestic robots are getting closer”

slide-6
SLIDE 6

3D Perception is important for robots

slide-7
SLIDE 7

3D Perception is important for robots

slide-8
SLIDE 8

3D Perception is important for robots

slide-9
SLIDE 9

3D Perception is important for robots

slide-10
SLIDE 10

Cues: contrast, color, motion, texture, symmetry, category-specific 3D knowledge, part, …

2D-3D lifting by machine learning

slide-11
SLIDE 11

Synthesize for learning: from virtual world to real world

Shape Database

A shape repository with rich annotation

  • First build & learn in a 3D Virtual Environment,
slide-12
SLIDE 12

Synthesize for learning: from virtual world to real world

Simulator

Synthetic sensory data

Shape Database

Object attributes

Class, Viewpoint, Material, Symmetry, …

A shape repository with rich annotation

  • First build & learn in a 3D Virtual Environment,

slide-13
SLIDE 13

Synthesize for learning: from virtual world to real world

Simulator

Training

Shape Database

Object attributes

Class, Viewpoint, Material, Symmetry, …

  • First build & learn in a 3D Virtual Environment,

A shape repository with rich annotation

Synthetic sensory data

slide-14
SLIDE 14

Synthesize for learning: from virtual world to real world

Object attributes

Real data Testing

  • Then adapt to 2D Real World
slide-15
SLIDE 15

Machine learning is data hungry

Review: image classification datasets

[Figure: timeline, 2000 to 2010, of Caltech 101, Caltech 256, LabelMe, CIFAR, and ImageNet, plotted by number of images on a log scale]

slide-16
SLIDE 16

Status review of 3D datasets

  • <= 60 models per class (average)
  • <= 10,000 models in total
  • <= 100 models in total

slide-17
SLIDE 17

Status review of 3D datasets

[Figure: the image dataset timeline (Caltech 101, Caltech 256, LabelMe, CIFAR, ImageNet; axis: # images, log scale) with the state-of-the-art 3D shape dataset marked for comparison]

Limited in

  • scale
  • object classes
  • diversity
slide-18
SLIDE 18

My work: Build large-scale 3D datasets of objects

  • ~3 million models in total
  • ~2,000 classes
  • Rich annotations

(in progress)

slide-19
SLIDE 19

An object-centric 3D knowledge-base

Physical properties, Part decomposition, Symmetry, Affordance, Material, Semantics, Images

slide-20
SLIDE 20

ShapeNet: a large-scale 3D dataset of objects

[Figure: # models vs. # models per class (both on log scales) for SHREC14, TSB, PSB, CCCC, WMB, SHREC12, MSB, BAB, ESB, and ShapeNet]

slide-21
SLIDE 21

My work: Develop data-driven 3D learning algorithms

Simulator

Synthetic sensory data Training

ShapeNet

Object attributes

Class, Viewpoint, Material, Symmetry, …

A shape repository with rich annotation …

slide-22
SLIDE 22

Application 1: 3D viewpoint estimation

car

3D Viewpoint Estimation

azimuth, elevation, in-plane rotation

ICCV 2015 oral: Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views
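
The three viewpoint angles can be turned into a camera pose. The following is a minimal sketch, assuming the object sits at the origin, the world z axis points up, azimuth is measured from the +x axis, and the camera uses an x-right, y-down, z-forward frame. These conventions and the function names are illustrative assumptions, not taken from the talk or from PASCAL3D+.

```python
import numpy as np

def camera_position(azimuth_deg, elevation_deg, distance=1.0):
    """Camera center on a sphere around the object (assumed at the origin, z up)."""
    az, el = np.radians([azimuth_deg, elevation_deg])
    return distance * np.array([np.cos(el) * np.cos(az),
                                np.cos(el) * np.sin(az),
                                np.sin(el)])

def look_at_rotation(cam_pos, tilt_deg=0.0, up=(0.0, 0.0, 1.0)):
    """World-to-camera rotation for a camera at cam_pos looking at the origin,
    then rolled by tilt_deg about its optical axis (the in-plane rotation).
    Degenerate when the viewing direction is parallel to `up` (elevation +/-90)."""
    forward = -cam_pos / np.linalg.norm(cam_pos)   # optical axis, camera +z
    right = np.cross(forward, np.asarray(up, dtype=float))
    right /= np.linalg.norm(right)                 # camera +x
    down = np.cross(forward, right)                # camera +y
    R = np.stack([right, down, forward])           # rows: camera axes in world coords
    t = np.radians(tilt_deg)
    roll = np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0,        0.0,       1.0]])
    return roll @ R

pos = camera_position(azimuth_deg=30, elevation_deg=20, distance=2.0)
R = look_at_rotation(pos, tilt_deg=10)
```

With this parameterization, a rendered image's viewpoint label is simply the (azimuth, elevation, in-plane rotation) triple used to place the camera.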

slide-23
SLIDE 23

PASCAL3D+ dataset [Xiang et al.]

Accurate viewpoint label acquisition is expensive

Annotation takes ~1 min per object

slide-24
SLIDE 24

High-cost label acquisition: 30K images with viewpoint labels in the PASCAL3D+ dataset [Xiang et al.]

High-capacity model: AlexNet [Krizhevsky et al.], 60M parameters

How to get MORE images with ACCURATE viewpoint labels?

slide-25
SLIDE 25

Manual alignment by annotators vs. automatic alignment through rendering

slide-26
SLIDE 26

A “Data Engineering” journey

95% on synthetic val set, but only 47% on real test set

ConvNet: Ah ha, I know! Viewpoint is just the brightness pattern!

slide-27
SLIDE 27

A “Data Engineering” journey

95% on synthetic val set, but only 47% on real test set

ConvNet: Ah ha, I know! Viewpoint is just the brightness pattern!

slide-28
SLIDE 28

A “Data Engineering” journey

Randomize lighting: 47% -> 74%

ConvNet: Hmm... viewpoint is not the brightness pattern. Maybe it’s the contour?
slide-29
SLIDE 29

A “Data Engineering” journey

Randomize lighting: 47% -> 74%

ConvNet: Hmm... viewpoint is not the brightness pattern. Maybe it’s the contour?
slide-30
SLIDE 30

A “Data Engineering” journey

Add backgrounds: 74% -> 86%

ConvNet: It becomes really hard! Let me look more into the picture.

slide-31
SLIDE 31

A “Data Engineering” journey

Bbox crop, texture: 86% -> 93%

slide-32
SLIDE 32

A “Data Engineering” journey

Bbox crop, texture: 86% -> 93%

ConvNet: The mapping becomes hard. I have to learn harder to get it right!

Key lesson: Don’t give a CNN a chance to “cheat”; it’s very good at it. When there is no way to cheat, true learning starts.

slide-33
SLIDE 33

Render for CNN Image Synthesis Pipeline

3D model -> Rendering -> Add background -> Crop

Hyperparameter estimation from real images
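
The post-rendering stages of such a pipeline can be sketched as follows. This is a rough illustration rather than the Render for CNN implementation: the renderer is not included (a toy RGBA image stands in for its output), and the parameter names and sampling ranges in `sample_render_params` are hypothetical, whereas the talk estimates such hyperparameters from real images.

```python
import numpy as np

def sample_render_params(rng):
    """Draw per-image nuisance parameters (hypothetical names and ranges)."""
    return {
        "num_lights": int(rng.integers(1, 5)),
        "light_energy": float(rng.uniform(0.5, 2.0)),
        "azimuth_deg": float(rng.uniform(0.0, 360.0)),
        "elevation_deg": float(rng.uniform(-10.0, 40.0)),
        "tilt_deg": float(rng.normal(0.0, 5.0)),
    }

def composite(render_rgba, background_rgb):
    """Alpha-blend a rendered RGBA image (floats in [0, 1]) over a background."""
    alpha = render_rgba[..., 3:4]
    return alpha * render_rgba[..., :3] + (1.0 - alpha) * background_rgb

def jittered_bbox_crop(image, alpha, rng, jitter=0.1):
    """Crop to the object's bounding box, moving each side by up to
    `jitter` times the box size to imitate imperfect detections."""
    ys, xs = np.nonzero(alpha > 0)
    top, bottom = ys.min(), ys.max() + 1
    left, right = xs.min(), xs.max() + 1
    h, w = bottom - top, right - left
    dt, db = rng.uniform(-jitter, jitter, size=2) * h
    dl, dr = rng.uniform(-jitter, jitter, size=2) * w
    H, W = alpha.shape
    top = int(np.clip(top + dt, 0, H - 1))
    bottom = int(np.clip(bottom + db, top + 1, H))
    left = int(np.clip(left + dl, 0, W - 1))
    right = int(np.clip(right + dr, left + 1, W))
    return image[top:bottom, left:right]

rng = np.random.default_rng(0)
render = np.zeros((64, 64, 4))
render[20:44, 16:48] = [0.8, 0.2, 0.2, 1.0]      # toy stand-in for a rendered object
background = rng.uniform(size=(64, 64, 3))       # toy stand-in for a real photo
params = sample_render_params(rng)               # would be handed to the renderer
image = composite(render, background)
crop = jittered_bbox_crop(image, render[..., 3], rng)
```

Each randomized factor (lighting, background, crop) removes one shortcut the network could otherwise use, which is the point of the key lesson above.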

slide-34
SLIDE 34

2.4M synthesized images for 12 categories

  • High scalability
  • High quality
  • Overfit-resistant
  • Accurate labels
slide-35
SLIDE 35

Metrics: viewpoint accuracy (higher is better) and median angle error (lower is better)

Test set: real test images from the PASCAL3D+ dataset

Our model trained on rendered images outperforms the state-of-the-art model trained on real images in PASCAL3D+.

[Figure: viewpoint median error, Vps&Kps (CVPR15) vs. RenderForCNN (Ours)]
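
Both metrics can be computed from the angle between predicted and ground-truth rotations. A minimal sketch, assuming viewpoints are given as 3x3 rotation matrices and that a prediction counts as correct when its error is below 30 degrees; the threshold and function names are assumptions for illustration.

```python
import numpy as np

def geodesic_deg(R_pred, R_gt):
    """Angle in degrees of the relative rotation between two rotation matrices."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def viewpoint_metrics(preds, gts, thresh_deg=30.0):
    """Return (accuracy at thresh_deg, median angle error in degrees)."""
    errs = np.array([geodesic_deg(p, g) for p, g in zip(preds, gts)])
    return float(np.mean(errs < thresh_deg)), float(np.median(errs))

def rot_z(deg):
    """Rotation about the z axis, used here only to build a toy example."""
    t = np.radians(deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0,        0.0,       1.0]])

gts = [np.eye(3)] * 3
preds = [rot_z(10), rot_z(20), rot_z(90)]        # errors of 10, 20, and 90 degrees
acc, med = viewpoint_metrics(preds, gts)         # 2 of 3 under 30 degrees; median 20
```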

slide-36
SLIDE 36

Results

slide-37
SLIDE 37

Application 2: 3D human pose estimation

3DV 2015 oral: Synthesizing Training Images for Boosting Human 3D Pose Estimation

slide-38
SLIDE 38

Challenge: clothing variation

slide-39
SLIDE 39

Automatic texture transfer from images to shapes

slide-40
SLIDE 40
slide-41
SLIDE 41

Effectiveness of texture augmentation

slide-42
SLIDE 42

Texture transfer for rigid objects

SIGGRAPH Asia 16: Unsupervised Texture Transfer from Images to Model Collections

Product photos Automatically textured shapes

slide-43
SLIDE 43

Domain adaptation between Virtual and Reality

Map features from real and synthetic images to the same domain

3DV 2015 oral: Synthesizing Training Images for Boosting Human 3D Pose Estimation
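
To make "the same domain" concrete, here is a deliberately simple stand-in: matching the per-dimension mean and standard deviation of synthetic features to those of real features. This moment matching is not the method from the talk, which uses a learned adversarial mapping; it only illustrates the goal of making the two feature distributions harder to tell apart. All names and the toy data are assumptions.

```python
import numpy as np

def align_moments(source_feats, target_feats, eps=1e-8):
    """Shift and scale source features so that each dimension matches
    the target's mean and standard deviation."""
    mu_s, sd_s = source_feats.mean(axis=0), source_feats.std(axis=0)
    mu_t, sd_t = target_feats.mean(axis=0), target_feats.std(axis=0)
    return (source_feats - mu_s) / (sd_s + eps) * sd_t + mu_t

rng = np.random.default_rng(0)
synthetic = rng.normal(loc=2.0, scale=0.5, size=(500, 4))   # toy "synthetic" features
real = rng.normal(loc=0.0, scale=1.5, size=(300, 4))        # toy "real" features
aligned = align_moments(synthetic, real)
```

A learned mapping can align far more than the first two moments, which is why an adversarial objective is attractive when the gap between rendered and real images is complex.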

slide-44
SLIDE 44

Adversarial learning based domain adaptation

3DV 2015 oral: Synthesizing Training Images for Boosting Human 3D Pose Estimation

slide-45
SLIDE 45

Domain adaptation between Virtual and Reality

slide-46
SLIDE 46

Results: 3D human pose estimation

slide-47
SLIDE 47

Application 3: Attention-based object identification

SIGGRAPH Asia 2016: 3D Attention-Driven Depth Acquisition for Object Identification

slide-48
SLIDE 48

Background

  • 1. How is the scene composed?
  • 2. What are these?
slide-49
SLIDE 49

Background

Object identification

ShapeNet

slide-50
SLIDE 50

Autonomous object identification

slide-51
SLIDE 51

The main challenge – next-best-view problem

  • Observation is partial and progressive → view planning
  • Assessing views whose observation is unknown

[Figure: one observed view and several unobserved views, each marked “?”]

How can you know which view is better without knowing its observation?

slide-52
SLIDE 52

Simulate for reinforcement learning

  • Train on virtually scanned ShapeNet models using reinforcement learning
  • Test in a real environment
slide-53
SLIDE 53

The general framework

slide-54
SLIDE 54

The general framework

[Diagram: a loop over Goal, Belief, Observe, and Action]

Recognition:

  • Incremental classification based on history

View planning:

  • Evaluate a view based on history
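
The observe, update-belief, act loop can be sketched with a toy model. This is not the learned recurrent attention policy from the talk: it swaps in a classical heuristic (Bayesian belief update plus choosing the view with the lowest expected posterior entropy) and assumes a known table `lik[c, v, o]` giving the probability of discrete observation `o` for class `c` at view `v`. In practice such statistics would have to come from simulation; all names here are hypothetical.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def update_belief(belief, lik, view, obs):
    """Recognition: Bayes update of the class belief after one observation."""
    post = belief * lik[:, view, obs]
    return post / post.sum()

def expected_entropy(belief, lik, view):
    """View planning: score an unvisited view without observing it, by
    averaging the posterior entropy over its possible observations."""
    p_obs = belief @ lik[:, view, :]
    return sum(po * entropy(update_belief(belief, lik, view, o))
               for o, po in enumerate(p_obs) if po > 0)

def identify(lik, true_class, first_view=0, stop=0.95, rng=None):
    """Observe, update belief, then move to the most informative remaining view."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_classes, n_views, n_obs = lik.shape
    belief = np.full(n_classes, 1.0 / n_classes)
    view, visited = first_view, []
    while True:
        obs = rng.choice(n_obs, p=lik[true_class, view])   # simulated sensor reading
        belief = update_belief(belief, lik, view, obs)
        visited.append(view)
        remaining = [v for v in range(n_views) if v not in visited]
        if belief.max() >= stop or not remaining:
            return int(belief.argmax()), visited
        view = min(remaining, key=lambda v: expected_entropy(belief, lik, v))

# Toy table: view 0 is uninformative, view 1 weakly informative, view 2 decisive.
lik = np.array([
    [[0.5, 0.5], [0.7, 0.3], [1.0, 0.0]],   # class 0
    [[0.5, 0.5], [0.3, 0.7], [0.0, 1.0]],   # class 1
])
pred, visited = identify(lik, true_class=1)   # skips view 1 and goes straight to view 2
```

The heuristic answers the question "which view is better without knowing its observation?" by averaging over what might be seen; a learned policy replaces that explicit expectation with experience gathered from many simulated scans.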

slide-55
SLIDE 55

Attention mechanism

  • Goal-oriented and stimulus-driven

[Diagram labels: Glimpse; Task; Perform; Supervision or reward; Internal Representation (stores the info. of history)]

Control of goal oriented and stimulus driven attention mechanisms in the brain, Nature Reviews Neuroscience, 2002

slide-56
SLIDE 56

3D Recurrent Attention Model

[Figure: recurrent network unrolled over three steps, starting from an initial view. Per-step labels: Feature extraction, classify, NBV emission. Layer labels: Discriminative view selection, View aggregation. Symbols: hidden states h, per-step inputs, and per-step view parameter pairs.]

slide-57
SLIDE 57

Reinforcement learning needs LOTS of data to train!

  • Simulate a large number of scan sequences in a virtual environment
slide-58
SLIDE 58

Results

slide-59
SLIDE 59

Results

slide-60
SLIDE 60

Quantitative results

slide-61
SLIDE 61

Reconstructed 3D scene

SIGGRAPH Asia 2016: 3D Attention-Driven Depth Acquisition for Object Identification

slide-62
SLIDE 62

Summary

  • Key theme: learn in a virtual environment of 3D shapes, test in real scenes of 2D RGB(D) images
  • Data: build a large-scale 3D database (ShapeNet) with rich annotations
  • Synthesize training data for deep learning, applicable to many tasks

CV, CG, ML

slide-63
SLIDE 63

Thank you!