3D Shape Attributes
David Fouhey, Abhinav Gupta and Andrew Zisserman
CMU & University of Oxford
http://www.robots.ox.ac.uk/~vgg
To appear: CVPR 2016
Motivation
- How to describe this object?
- 1. Label: Henry Moore Sculpture, “Oval with Points”
- 2. Shape description: 3D solid object, smooth for the most part but has pointed/conical parts, has hole, bulbous, rectangular (portrait) aspect ratio, approx. mirror symmetry
Motivation
- Objective: represent the shape of 3D objects (in a viewpoint‐invariant manner)
- 1. 3D Shape Attributes:
- Curvature
- Contact
- Volumetric
- …
- 2. Vector (embedding)
- Address the “open‐world” problem
- Use sculptures as objects due to their great variety of shape
Motivation
3D shape from single images:
- A fundamental goal of computer vision is 3D understanding from images, e.g. Koenderink & Van Doorn, 1971, and work from the 1980s:
- shape from contour
- shape from texture
- shape from specularities
- …
Motivation
3D shape from single images is somewhat neglected in the ConvNet era, with some exceptions such as:
- Regressing pixels ‐> depth map
- Class‐specific reconstructions, e.g. Kar et al., "Category specific object reconstruction from a single image", CVPR 2015
[Figure: columns Image, Normals, Depth; Eigen et al. ’15, Wang et al. ’15]
Among many others: Saxena et al. ’07, Barron et al. ’11 – ’15, Karsch ’12, Fouhey ’13, ’14, Eigen ’14, ’15, Ladicky ’14, Liu ’14, Baig ’15, Wang ’15, etc.
3D Shape Attributes (12 of these)
Examples
Positives: Has Planar Surfaces
Examples
Negatives: Has Planar Surfaces
Examples
Positives: Has Point/Line Contact
Examples
Negatives: Has Point/Line Contact
Examples
Positives: Has Thin Structures
Examples
Negatives: Has Thin Structures
Examples
Positives: Has Rough Surfaces
Examples
Negatives: Has Rough Surfaces
Research Question
- Can ConvNets learn to predict these 3D shape attributes, and a 3D embedding, in a viewpoint‐invariant manner?
- And can they also generalize to other (non‐sculpture) classes?
Data
[Figure: panels labelled Princeton, Columbus, Toronto, Yorkshire, Malaga, London]
Data
- Artists: R. Serra, H. Moore, …, A. Calder, …
- Works: Two Forms, The Arch, Knife Edge, 5 Swords, Gwenfritz, Eagle, …
- 242 Artists, 2187 Works, 143K Images in 9352 Viewpoint Clusters
Data Collection
- Artist / Work Vocabulary Construction
- Artists: R. Serra, H. Moore, A. Calder, B. Hepworth
- Works: Two Forms, The Arch, Knife Edge, 5 Swords, Eagle, Gwenfritz
- Viewpoint Clustering + Cleaning + Query expansion
- ~250 Artists, ~2K Works, ~150K Images, ~9K Clusters
Data Statistics
         Artists   Works   Images
Train      122      1196     77K
Val         61       459     31K
Test        59       532     35K
Total      242      2187    143K
Training Loss Functions
- Multi‐task learning
- 1. Attribute classification loss
- Sum of 12 cross‐entropy losses, one for each attribute
- 2. Embedding loss
- Triplet loss to match images of the same work
- 1. Attribute classification loss
- Sum of 12 cross‐entropy losses, one for each attribute
$$L(Y, P) = \sum_{i=1}^{N} \sum_{\substack{l=1 \\ Y_{i,l} \neq \emptyset}}^{L} Y_{i,l} \log(P_{i,l}) + (1 - Y_{i,l}) \log(1 - P_{i,l}),$$
for image $i$ and label $l$, with labels $Y \in \{0, 1, \emptyset\}^{N \times L}$ and predicted probabilities $P \in [0, 1]^{N \times L}$
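A minimal sketch of the attribute classification loss above, in plain Python. It assumes missing labels (∅) are stored as `None`, and it negates the sum so that the result is a quantity to minimize; the slide's formula is written without the leading minus sign, so that negation is an assumption here. The function name and the `eps` clamp are illustrative, not from the talk.

```python
import math

def attribute_loss(Y, P, eps=1e-12):
    """Masked sum of binary cross-entropy terms over images and attributes.

    Y: N x L nested list of labels, each 0, 1, or None (None stands in for ∅).
    P: N x L nested list of predicted probabilities in [0, 1].
    Returns the negated sum from the slide, so lower is better (assumption).
    """
    total = 0.0
    for y_row, p_row in zip(Y, P):
        for y, p in zip(y_row, p_row):
            if y is None:
                # Unlabelled attribute: contributes nothing to the loss.
                continue
            # Clamp to avoid log(0); eps is an illustrative choice.
            p = min(max(p, eps), 1.0 - eps)
            total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total

# Two images, two attributes; the second attribute of image 0 is unlabelled.
Y = [[1, None], [0, 1]]
P = [[0.5, 0.9], [0.5, 0.5]]
loss = attribute_loss(Y, P)
```

Skipping `None` entries is what lets images with partial annotations still contribute to the attributes they do have labels for.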
Training Loss Functions
[Figure: a CNN encoder Φ maps images into an embedding space R^d; congruous pairs lie near each other, incongruous pairs far apart]
- 2. Embedding loss
- Triplet loss to match images of the same work
[Figure: anchor a, positive p, negative n and their embeddings φ(a), φ(p), φ(n)]
Triplet loss as in Schultz and Joachims ’04, Schroff et al. ’14, Wang et al. ’15, Parkhi et al. ’15
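Embedding loss
[Figure: triplet (a, p, n) embedded as φ(a), φ(p), φ(n) in R^d; the distance from anchor to positive should be smaller than the distance from anchor to negative by at least the margin α]
- Constraint for each triplet:
$$\|\phi(a) - \phi(p)\|_2 + \alpha \le \|\phi(a) - \phi(n)\|_2$$
- Objective:
$$\min_{\phi} \sum_{\text{triplets}} \max\left(0,\ \alpha + \|\phi(a) - \phi(p)\|_2 - \|\phi(a) - \phi(n)\|_2\right)$$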
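The hinge form of the embedding loss can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the trailing "2" on the norms is read as the Euclidean (L2) norm, so if squared distances were intended, the two distances should be squared. The margin value `alpha=0.2` and the function names are hypothetical.

```python
import math

def l2_distance(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(triplets, alpha=0.2):
    """Sum of hinge terms max(0, alpha + d(a, p) - d(a, n)) over triplets.

    triplets: iterable of (phi_a, phi_p, phi_n) embedding vectors.
    alpha: margin; 0.2 is an illustrative value, not taken from the slides.
    """
    total = 0.0
    for phi_a, phi_p, phi_n in triplets:
        d_pos = l2_distance(phi_a, phi_p)
        d_neg = l2_distance(phi_a, phi_n)
        # A triplet that already satisfies the margin contributes zero.
        total += max(0.0, alpha + d_pos - d_neg)
    return total

satisfied = ([0.0, 0.0], [1.0, 0.0], [3.0, 4.0])  # d_pos = 1, d_neg = 5
violated = ([0.0, 0.0], [3.0, 4.0], [1.0, 0.0])   # d_pos = 5, d_neg = 1
loss = triplet_loss([satisfied, violated])
```

Only the violated triplet contributes, which is why training schemes built on this loss typically spend effort finding triplets that still break the margin.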
Learning To Predict
[Figure: Input → VGG‐M Conv. Layers → FC Layers → two outputs: 12D Shape Attributes and 1024D Shape Embedding]
Goals of Experiments
- How well can we do?
- Are we modeling 3D shape?
- Does this generalize?
Qualitative Results
[Figure: images ranked from Most to Least for Point/Line Contact and for Rough Surface]
Qualitative Results
[Figure: images ranked from Most to Least for Thin Structures]
Quantitative Results
Mean Area Under ROC, per attribute:
- Curvature: Planar 82.8, Not Planar 77.2, Cylinder 56.9, Rough 76.0
- Contact: Point/Line 74.4, Multiple 76.4
- Occupancy: Empty 87.0, 2+ Pieces 60.4, Has Hole 69.3, Is Thin 85.8, Mirror Sym. 60.8, Cubic Ratio 60.3
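For readers unfamiliar with the metric, area under the ROC curve for one attribute can be computed as the fraction of (positive, negative) image pairs in which the positive receives the higher score. The sketch below uses that rank-based definition; it is a generic illustration with made-up scores, not the evaluation code behind the numbers above.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted scores, higher meaning 'attribute present'.
    labels: ground truth, 1 for positive and 0 for negative.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: one positive (score 0.3) is ranked below one negative (0.8).
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A value of 0.5 corresponds to chance ranking and 1.0 to a perfect ranking. The quadratic loop is fine for a sketch; a sort-based version would be preferable at dataset scale.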
Mental Rotation
- Are two 3D objects related by a rotation?
Shepard and Metzler 1971, Tarr et al. ’98
Video credit: Thomas Fulcher
Mental Rotation
- Use works from different locations and with different materials
- Classify using distance between vector descriptors
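Classifying a pair by the distance between its vector descriptors can be sketched as below. The threshold value, the use of Euclidean distance, and all function names are assumptions for illustration; the slides only state that distance between descriptors is used.

```python
import math

def descriptor_distance(u, v):
    """Euclidean distance between two descriptors of equal length."""
    if len(u) != len(v):
        raise ValueError("descriptors must have the same dimensionality")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def same_object(desc_a, desc_b, threshold):
    """Predict that two images show the same object when their
    descriptors lie within `threshold` of each other (hypothetical rule)."""
    return descriptor_distance(desc_a, desc_b) <= threshold

def pair_score(desc_a, desc_b):
    """Negated distance: higher means 'more likely the same object'.
    Useful when sweeping thresholds to trace an ROC curve."""
    return -descriptor_distance(desc_a, desc_b)

close_pair = same_object([0.0, 0.0], [0.3, 0.4], threshold=0.6)
far_pair = same_object([0.0, 0.0], [3.0, 4.0], threshold=0.6)
```

Using a continuous score rather than a fixed threshold keeps the evaluation independent of any single operating point, which matters when positives are rare.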
Mental Rotation – Classification Results
- 100 million test image pairs
- ROC “Easy”: 0.9% positives
- ROC “Hard”: 0.3% positives
Does it generalize to other classes?
Synthetic Results – has planar
[Plot: P(Planar)]
Synthetic Results – non planar
[Plot: P(Non Planar)]
Synthetic Results – roughness
[Plot: P(Rough Surface)]
PASCAL VOC Results
[Figure: images ranked from Most to Least for Point/Line Contact and for Planarity]
PASCAL VOC Results
[Figure: images ranked from Most to Least for Toroidal Pieces and for Thin Structures]
Summary
- Have learnt to predict 3D shape attributes and a shape embedding
- Dataset to be released
- Improvements: binary vs relative attributes