3D Shape Attributes David Fouhey, Abhinav Gupta and Andrew Zisserman - - PowerPoint PPT Presentation



slide-1
SLIDE 1

3D Shape Attributes

David Fouhey, Abhinav Gupta and Andrew Zisserman CMU & University of Oxford http://www.robots.ox.ac.uk/~vgg To appear: CVPR 2016

slide-2
SLIDE 2

Motivation

  • How to describe this object?
  • 1. Label: Henry Moore Sculpture, “Oval with Points”
  • 2. Shape description: 3D solid object, smooth for the most part but has pointed/conical parts, has hole, bulbous, rectangular (portrait) aspect ratio, approx. mirror symmetry

slide-3
SLIDE 3

Motivation

  • Objective: represent the shape of 3D objects (in a viewpoint invariant manner)

  • 1. 3D Shape Attributes:
  • Curvature
  • Contact
  • Volumetric
  • 2. Vector (embedding)
  • Address the “open‐world” problem
  • Use sculptures as objects due to their great variety of shape
slide-4
SLIDE 4

Motivation

3D shape from single images:

  • A fundamental goal of computer vision is 3D understanding from images, e.g. Koenderink & Van Doorn, 1971, and work from the 1980s:

  • shape from contour
  • shape from texture
  • shape from specularities
slide-5
SLIDE 5

Motivation

3D shape from single images is somewhat neglected in the ConvNet era, with some exceptions such as:

  • Regressing pixels -> depth map
  • Class‐specific reconstructions, e.g. Kar et al., "Category-specific object reconstruction from a single image", CVPR 2015

Figure: Image, Normals, Depth (Eigen et al. ’15, Wang et al. ’15)

Among many others: Saxena et al. ’07, Barron et al. ’11 – ’15, Karsch ’12, Fouhey ’13, ’14, Eigen ’14, ’15, Ladicky ’14, Liu ’14, Baig ’15, Wang ’15, etc.

slide-6
SLIDE 6

3D Shape Attributes (12 of these)

slide-7
SLIDE 7

Examples

Positives: Has Planar Surfaces

slide-8
SLIDE 8

Examples

Negatives: Has Planar Surfaces

slide-9
SLIDE 9

Examples

Positives: Has Point/Line Contact

slide-10
SLIDE 10

Examples

Negatives: Has Point/Line Contact

slide-11
SLIDE 11

Examples

Positives: Has Thin Structures

slide-12
SLIDE 12

Examples

Negatives: Has Thin Structures

slide-13
SLIDE 13

Examples

Positives: Has Rough Surfaces

slide-14
SLIDE 14

Examples

Negatives: Has Rough Surfaces

slide-15
SLIDE 15

3D Shape Attributes (12 of these)

slide-16
SLIDE 16

Research Question

  • Can ConvNets learn to predict these 3D shape attributes, and a 3D embedding, in a viewpoint invariant manner?
  • and can they also generalize to other (non‐sculpture) classes?

slide-17
SLIDE 17

Data

slide-18
SLIDE 18

Data

Princeton Columbus Toronto Yorkshire Malaga London

slide-19
SLIDE 19

Data

slide-20
SLIDE 20

Data

  • R. Serra
  • H. Moore

  • A. Calder

Two Forms The Arch Knife Edge 5 Swords Gwenfritz Eagle

… … … … …

242 Artists 2187 Works 143K Images in 9352 Viewpoint Clusters

… …

slide-21
SLIDE 21

Data

  • R. Serra
  • H. Moore

  • A. Calder

Two Forms The Arch Knife Edge 5 Swords Gwenfritz Eagle

242 Artists 2187 Works 143K Images in 9352 Viewpoint Clusters

… … … …

slide-22
SLIDE 22

Data Collection

  • R. Serra
  • H. Moore
  • A. Calder
  • B. Hepworth

Two Forms The Arch Knife Edge 5 Swords Eagle Gwenfritz

Artist / Work Vocabulary Construction Viewpoint Clustering + Cleaning + Query expansion

  • R. Serra
  • H. Moore
  • A. Calder
  • B. Hepworth

Two Forms The Arch Knife Edge 5 Swords Eagle Gwenfritz

~250 Artists ~2K Works ~150K Images ~9K Clusters

slide-23
SLIDE 23

Data Statistics

         Artists   Works   Images
Train    122       1196    77K
Val      61        459     31K
Test     59        532     35K
Total    242       2187    143K

slide-24
SLIDE 24

Training Loss Functions

  • Multi‐task learning
  • 1. Attribute classification loss
  • Sum of 12 cross‐entropy losses, one for each attribute
  • 2. Embedding loss
  • Triplet loss to match images of the same work
slide-25
SLIDE 25
Training Loss Functions

  • 1. Attribute classification loss
  • Sum of 12 cross‐entropy losses, one for each attribute

$$L(Y, P) = \sum_{i=1}^{N} \sum_{\substack{l=1 \\ Y_{i,l} \neq \emptyset}}^{L} \left[ Y_{i,l} \log(P_{i,l}) + (1 - Y_{i,l}) \log(1 - P_{i,l}) \right],$$

for image $i$ and label $l$, with labels $Y \in \{0, 1, \emptyset\}^{N \times L}$ and predicted probabilities $P \in [0, 1]^{N \times L}$
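A minimal sketch of the masked attribute loss on this slide, under stated assumptions: the empty label ∅ is encoded as `None`, the sum is negated so that lower values are better (the slide's formula is written without a sign), and the function name and the `eps` clamp are illustrative rather than taken from the talk.

```python
import math

def masked_attribute_loss(labels, probs, eps=1e-7):
    """Negated sum of per-attribute binary cross-entropy terms.

    labels: N x L nested list with entries 0, 1, or None
            (None is an assumed encoding of the empty label).
    probs:  N x L nested list of predicted probabilities in [0, 1].
    """
    total = 0.0
    for label_row, prob_row in zip(labels, probs):
        for y, p in zip(label_row, prob_row):
            if y is None:
                # Unlabeled attribute: excluded from the sum.
                continue
            # Clamp to avoid log(0); eps is an illustrative choice.
            p = min(max(p, eps), 1.0 - eps)
            total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    # Sign flip is an assumption so the quantity can be minimized.
    return -total

# Two images, three attributes, with some attributes unlabeled.
labels = [[1, 0, None], [None, 1, 1]]
probs = [[0.9, 0.2, 0.5], [0.3, 0.8, 0.6]]
loss = masked_attribute_loss(labels, probs)
```

Predictions at unlabeled positions never affect the result, which is what lets partially annotated images contribute to training.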

slide-26
SLIDE 26

Training Loss Functions

  • 2. Embedding loss
  • Triplet loss to match images of the same work

Figure: a CNN encoder φ maps anchor a, positive p, and negative n to φ(a), φ(p), φ(n) in the embedding space R^d; the congruous pair is near and the incongruous pair is far.

Triplet loss as in Schultz and Joachims ’04, Schroff et al. ’14, Wang et al. ’15, Parkhi et al. ’15

slide-27
SLIDE 27

Embedding loss

Figure: the CNN encoder φ maps a, p, n to φ(a), φ(p), φ(n) in the embedding space R^d; the congruous pair is near, the incongruous pair is far, with the distance and margin marked.

slide-28
SLIDE 28

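Embedding loss

$$\|\phi(a) - \phi(p)\|_2 + \alpha \leq \|\phi(a) - \phi(n)\|_2$$

$$\min_{\phi} \sum_{\text{triplets}} \max\left(0,\; \alpha + \|\phi(a) - \phi(p)\|_2 - \|\phi(a) - \phi(n)\|_2\right)$$

Figure: anchor a, positive p, negative n, with the distance and margin marked.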
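The hinge objective on this slide can be sketched for a single triplet as follows. This is an illustration only: it reads the trailing "2" in the slide's norms as the Euclidean (L2) norm, the margin value 0.2 is an arbitrary assumption, and the function names are hypothetical.

```python
import math

def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, margin + d(anchor, positive) - d(anchor, negative)).

    The loss is zero once the negative is farther from the anchor
    than the positive by at least the margin.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)

# Satisfied triplet: the negative is much farther than the positive.
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])

# Violated triplet: the negative is only slightly farther than the positive.
violated = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.1])
```

Summing this value over many sampled triplets gives the quantity minimized over φ.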

slide-29
SLIDE 29

Learning To Predict

Figure (network diagram): Input, VGG‐M Conv. Layers and FC Layers, with two outputs: 12D Shape Attributes and 1024D Shape Embedding

slide-30
SLIDE 30

Goals of Experiments

  • How well can we do?
  • Are we modeling 3D shape?
  • Does this generalize?
slide-31
SLIDE 31

Qualitative Results

Figure: images ranked from most to least for Point/Line Contact and Rough Surface

slide-32
SLIDE 32

Qualitative Results

Figure: images ranked from most to least for Thin Structures

slide-33
SLIDE 33

Quantitative Results

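Group       Attribute      Area Under ROC
Curvature   Planar         82.8
Curvature   Not Planar     77.2
Curvature   Cylinder       56.9
Curvature   Rough          76.0
Contact     Point/Line     74.4
Contact     Multiple       76.4
Occupancy   Empty          87.0
Occupancy   2+ Pieces      60.4
Occupancy   Has Hole       69.3
Occupancy   Is Thin        85.8
Occupancy   Mirror Sym.    60.8
Occupancy   Cubic Ratio    60.3

Mean Area Under ROC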
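For readers unfamiliar with the metric, here is a small sketch of how an area under the ROC curve can be computed for one attribute from classifier scores and binary labels. It uses the pairwise-ranking formulation (the fraction of positive/negative pairs ordered correctly, ties counted as half); the function name and toy data are illustrative, and the talk does not say which implementation was used.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: list of real-valued classifier outputs.
    labels: list of 0/1 ground-truth labels, same length as scores.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # positive ranked above negative
            elif sp == sn:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect ranking; the slide reports the same quantity as a percentage.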

slide-34
SLIDE 34

Learning To Predict

Figure (network diagram): Input, Conv. Layers and FC Layers, with two outputs: 12D Shape Attributes and 1024D Shape Embedding

slide-35
SLIDE 35

Mental Rotation

Are two 3D objects related by a rotation?

Shepard and Metzler 1971, Tarr et al. ’98

slide-36
SLIDE 36

Mental Rotation

Shepard and Metzler 1971, Tarr et al. ‘98

Video credit: Thomas Fulcher

slide-37
SLIDE 37

Mental Rotation

  • Use works from different locations and with different materials
  • Classify using distance between vector descriptors
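A rough sketch of classifying an image pair by the distance between its vector descriptors. The Euclidean distance, the threshold value, and the function names are assumptions for illustration; the slide only states that distance between descriptors is used.

```python
import math

def embedding_distance(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def same_object(u, v, threshold):
    """Declare a pair 'same object' when the descriptors are close enough.

    threshold is a hypothetical operating point; sweeping it over all
    values traces out an ROC curve for the pair-classification task.
    """
    return embedding_distance(u, v) <= threshold

# Toy 2D descriptors (the talk's embedding is 1024-dimensional).
close_pair = same_object([0.0, 0.0], [0.1, 0.0], threshold=0.5)
far_pair = same_object([0.0, 0.0], [1.0, 1.0], threshold=0.5)
```

Because the decision depends on a single scalar, no threshold needs to be fixed in advance when performance is summarized by an ROC curve.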
slide-38
SLIDE 38

Mental Rotation – Classification Results

100 million test image pairs

ROC “Easy”: 0.9% positives
ROC “Hard”: 0.3% positives

slide-39
SLIDE 39

Does it generalize to other classes?

slide-40
SLIDE 40

Synthetic Results – has planar

P(Planar)

slide-41
SLIDE 41

Synthetic Results – non planar

P(Non Planar)

slide-42
SLIDE 42

Synthetic Results – roughness

P(Rough Surface)

slide-43
SLIDE 43

PASCAL VOC Results

Figure: images ranked from most to least for Point/Line Contact and Planarity

slide-44
SLIDE 44

PASCAL VOC Results

Figure: images ranked from most to least for Toroidal Pieces and Thin Structures

slide-45
SLIDE 45

Summary

  • Have learnt to predict 3D shape attributes and a shape embedding

  • Dataset to be released
  • Improvements: binary vs relative attributes