Image Interpretation Martial Hebert Abhinav Gupta David Fouhey, - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Learning from 3D Data for Image Interpretation

Martial Hebert Abhinav Gupta David Fouhey, Adrien Matricon, Wajahat Hussain

slide-2
SLIDE 2

Slides adapted from David Fouhey

slide-3
SLIDE 3
  • Can mid-level primitives learned from image+3D be used to transfer geometric information?

  • Can geometric reasoning use this local evidence to produce a consistent geometric interpretation?

slide-4
SLIDE 4

Pattern Repetition

Common patterns correspond to common geometric configurations

slide-5
SLIDE 5

Pattern Repetition

slide-6
SLIDE 6

Pattern Repetition

...

slide-7
SLIDE 7

Physical/Geometric Constraints

slide-8
SLIDE 8

Primitives

Visually Discriminative

Image

Geometrically Informative

Surface Normals

Saurabh Singh et al. Discriminative Mid-Level Patches

slide-9
SLIDE 9

Geometric configurations from large-scale RGBD data.

NYU v2 Dataset (Silberman et al., 2012)
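
RGBD data gives per-pixel depth, from which surface normals can be derived. A minimal sketch, assuming a pinhole camera with hypothetical intrinsics `f`, `cx`, `cy` and simple forward differences (the deck does not say how its normals were computed):

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def unit(v):
    n = math.sqrt(sum(x*x for x in v))
    return tuple(x / n for x in v) if n > 0 else (0.0, 0.0, 0.0)

def backproject(depth, f, cx, cy):
    """Depth map -> grid of 3D points (assumed pinhole model)."""
    return [[((u - cx) * z / f, (v - cy) * z / f, z) for u, z in enumerate(row)]
            for v, row in enumerate(depth)]

def normals_from_depth(depth, f=500.0, cx=0.0, cy=0.0):
    """Per-pixel unit normals from forward differences of the 3D points.
    The last row and column are left as zero vectors (no neighbor to difference)."""
    pts = backproject(depth, f, cx, cy)
    h, w = len(pts), len(pts[0])
    out = [[(0.0, 0.0, 0.0)] * w for _ in range(h)]
    for v in range(h - 1):
        for u in range(w - 1):
            du = sub(pts[v][u + 1], pts[v][u])   # step to the right neighbor
            dv = sub(pts[v + 1][u], pts[v][u])   # step to the neighbor below
            n = unit(cross(du, dv))
            if n[2] > 0:                         # flip so the normal faces the camera
                n = tuple(-x for x in n)
            out[v][u] = n
    return out
```

A fronto-parallel plane (constant depth) yields normals pointing straight back at the camera, which is a quick sanity check for the sign convention.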

slide-10
SLIDE 10

Representation

Instances Detector Canonical Form

slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14
slide-15
SLIDE 15

Representation

Instances

w

Detector Canonical Form

8x8

slide-16
SLIDE 16

Representation

Instances Detector Canonical Form

N

10x10

slide-17
SLIDE 17

Representation

Detector Canonical Form Instances

y

slide-18
SLIDE 18

Learning Primitives

Primitive Patch

\[
\min_{y,\,w,\,N} \; R(w) + \sum_i \Big[ c_1\, y_i\, \Delta(N, x_i^G) + c_2\, L(w, N, x_i^A, y_i) \Big]
\]
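
The objective can be evaluated directly once each term is given a concrete form. A sketch under stated assumptions: R(w) is taken to be an L2 regularizer, Δ a mean angular distance between normal patches, and L a hinge loss that ignores N. None of these choices is confirmed by the slide.

```python
import math

def angular_distance(n1, n2):
    """Mean angle (radians) between corresponding unit normals of two patches."""
    total = 0.0
    for a, b in zip(n1, n2):
        dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
        total += math.acos(dot)
    return total / len(n1)

def hinge_loss(w, x, label):
    """Hinge loss of a linear detector; label is +1 for members, -1 otherwise."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, 1.0 - label * score)

def objective(y, w, N, feats, geoms, c1=1.0, c2=1.0):
    """y: 0/1 membership per patch; feats: appearance features x^A;
    geoms: normal patches x^G; N: canonical normal patch."""
    total = 0.5 * sum(wi * wi for wi in w)                    # R(w), assumed L2
    for yi, xa, xg in zip(y, feats, geoms):
        total += c1 * yi * angular_distance(N, xg)            # geometric term
        total += c2 * hinge_loss(w, xa, 1 if yi else -1)      # appearance term
    return total
```

With a detector that separates members from non-members by a margin and members that match N exactly, only the regularizer contributes.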

10x10

slide-19
SLIDE 19

Learning Primitives

Approach: iterative procedure

slide-20
SLIDE 20

Learning Primitives

= Avg

( )

slide-21
SLIDE 21

Learning Primitives

Patches Geometrically Dissimilar to N Cluster Instances

slide-22
SLIDE 22

Learning Primitives

slide-23
SLIDE 23

Learning Primitives

Initialize y by clustering sampled patches
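
The iterative procedure alternates between updating the canonical form N and the instance set y. The sketch below shows only the geometric half of that loop: N is the renormalized average of the current instances, and y is reassigned by angular distance to N. Detector training is omitted, and the threshold `max_angle` is a hypothetical parameter.

```python
import math

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)   # assumes the averaged normals do not cancel to zero

def mean_normals(patches):
    """Per-pixel average of normal patches, renormalized to unit length."""
    k = len(patches[0])
    return [unit(tuple(sum(p[i][c] for p in patches) for c in range(3)))
            for i in range(k)]

def patch_angle(a, b):
    """Mean angle (radians) between corresponding normals of two patches."""
    dots = [max(-1.0, min(1.0, sum(p * q for p, q in zip(x, y))))
            for x, y in zip(a, b)]
    return sum(math.acos(d) for d in dots) / len(dots)

def learn_primitive(geoms, init_members, max_angle=0.3, iters=10):
    """geoms: list of normal patches; init_members: indices from an initial clustering."""
    y = set(init_members)
    for _ in range(iters):
        N = mean_normals([geoms[i] for i in sorted(y)])          # N = Avg(instances)
        new_y = {i for i, g in enumerate(geoms)
                 if patch_angle(N, g) < max_angle}                # keep geometrically similar patches
        if not new_y or new_y == y:                               # converged or degenerate
            break
        y = new_y
    return mean_normals([geoms[i] for i in sorted(y)]), y
```

In the full method the membership update would also depend on detector scores; here it is purely geometric to keep the example short.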

slide-24
SLIDE 24

Inference

Sparse Transfer …

19s

slide-25
SLIDE 25

Inference

Sparse Transfer …

slide-26
SLIDE 26

Inference

Sparse Transfer

slide-27
SLIDE 27

Inference

Dense Transfer
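
One plausible way to go from sparse detections to a dense normal map is to paste each detection's canonical normals into the image and average overlaps, weighted by detector score. This is an assumed scheme for illustration; the slide does not spell out the actual transfer rule.

```python
import math

def dense_transfer(height, width, detections):
    """detections: list of (row, col, score, patch), where patch is a 2D grid
    of unit normals placed with its top-left corner at (row, col).
    Returns a grid of unit normals, or None where no detection fired."""
    acc = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    for r0, c0, score, patch in detections:
        for dr, prow in enumerate(patch):
            for dc, n in enumerate(prow):
                r, c = r0 + dr, c0 + dc
                if 0 <= r < height and 0 <= c < width:
                    for k in range(3):
                        acc[r][c][k] += score * n[k]   # score-weighted vote
    out = []
    for row in acc:
        out_row = []
        for v in row:
            norm = math.sqrt(sum(x * x for x in v))
            out_row.append(tuple(x / norm for x in v) if norm > 0 else None)
        out.append(out_row)
    return out
```

Pixels covered by no detection stay `None`, which mirrors the gap between the sparse and dense outputs.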

slide-28
SLIDE 28

Sample Results – Qualitative

795 /654

slide-29
SLIDE 29

Confidences

Most Confident Result Least Confident Result

rank

slide-30
SLIDE 30

Cross-dataset

PETS B3DO

slide-31
SLIDE 31

Failures

slide-32
SLIDE 32

Summary stats in degrees (lower is better): Mean, Median, RMSE. % good pixels (higher is better): 11.25°, 22.5°, 30°.

Method            Mean   Median   RMSE   11.25°   22.5°   30°
3D Primitives     33.0   28.3     40.0   18.8     40.7    52.4
Karsch et al.     40.8   37.8     46.9    7.9     25.8    38.2
Hoiem et al.      41.2   34.8     49.3    9.0     31.7    43.9
Singh et al.      35.0   32.4     40.6   11.2     32.1    45.8
Saxena et al.     47.1   42.3     56.3   11.2     28.0    37.4
RF + Dense SIFT   36.0   33.4     41.7   11.4     31.1    44.2
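
The metrics in this table are all functions of the per-pixel angular error between predicted and ground-truth normals. A small sketch of how they could be computed; the function names are illustrative, not taken from the authors' evaluation code.

```python
import math

def angular_errors(pred, gt):
    """Angle in degrees between each predicted and ground-truth unit normal."""
    errs = []
    for p, g in zip(pred, gt):
        dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, g))))
        errs.append(math.degrees(math.acos(dot)))
    return errs

def summarize(errs, thresholds=(11.25, 22.5, 30.0)):
    """Mean, median, RMSE, and percentage of pixels with error below each threshold."""
    s = sorted(errs)
    n = len(s)
    median = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    stats = {
        "mean": sum(s) / n,
        "median": median,
        "rmse": math.sqrt(sum(e * e for e in s) / n),
    }
    for t in thresholds:
        stats[f"pct_within_{t}"] = 100.0 * sum(e < t for e in s) / n
    return stats
```

Because RMSE penalizes large errors more heavily than the mean, it is never smaller than the mean on the same set of errors.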

RMSE

slide-33
SLIDE 33

Using geometric and physical constraints

slide-34
SLIDE 34

The Story So Far (Sparse)

slide-35
SLIDE 35

The Story So Far (Dense)

slide-36
SLIDE 36

The Story So Far

slide-37
SLIDE 37

Adding Physical/Geometric Constraints

slide-38
SLIDE 38

Adding Physical/Geometric Constraints

slide-39
SLIDE 39

Past Physical Constraints

Camera-in-a-box Top-down Cuboid

Hedau et al. 2009, Flint et al. 2011, Satkin et al. 2012, Schwing et al. 2012, etc.

Lee et al. 2010, Gupta et al. 2010, Xiao et al. 2012, etc.

slide-40
SLIDE 40

Digression: Inspiration from the past…

Kanade’s Origami World, 1978

slide-41
SLIDE 41

From the past…

  • Kanade’s chair… (Artificial Intelligence, 1981)
slide-42
SLIDE 42

Concave ( - ) Convex ( + )

Edges between surfaces

slide-43
SLIDE 43

Concave ( - ) Convex ( + )

Edges between surfaces
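
Whether an edge between two surfaces is convex or concave can be read off the 3D geometry. A minimal sketch, assuming each surface has a normal pointing toward its visible side and a reference point away from the shared edge (the names `n1`, `c1`, `c2` are illustrative):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def edge_type(n1, c1, c2, eps=1e-9):
    """n1: normal of surface 1 (pointing to its visible side);
    c1, c2: points on surface 1 and surface 2, away from the shared edge.
    If surface 2 falls behind the plane of surface 1 the edge is convex (+);
    if it rises in front of that plane the edge is concave (-)."""
    s = dot(n1, tuple(b - a for a, b in zip(c1, c2)))
    if s < -eps:
        return "convex"
    if s > eps:
        return "concave"
    return "coplanar"
```

For example, the top and side faces of a box seen from outside meet at a convex edge, while a floor and a wall seen from inside a room meet at a concave one.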

slide-44
SLIDE 44

Parameterization

vp1  vp2  vp3

slide-45
SLIDE 45

Parameterization

vp1  vp2  vp3

Schwing 2013, Hedau 2010

slide-46
SLIDE 46

Parameterization

vp1  vp2  vp3
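
Each vanishing point corresponds to a 3D direction in the scene. A minimal sketch, assuming a pinhole camera with hypothetical focal length `f` and principal point `(cx, cy)`: back-project two vanishing points and take their cross product to obtain the third orthogonal direction.

```python
import math

def vp_to_direction(vp, f, cx, cy):
    """Unit 3D direction whose image is the vanishing point vp = (u, v)."""
    d = ((vp[0] - cx) / f, (vp[1] - cy) / f, 1.0)
    n = math.sqrt(sum(x * x for x in d))
    return tuple(x / n for x in d)

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def manhattan_frame(vp1, vp2, f, cx=0.0, cy=0.0):
    """Three directions from two vanishing points; the third is their cross product.
    d1 and d2 are only orthogonal if the vanishing points and intrinsics are consistent."""
    d1 = vp_to_direction(vp1, f, cx, cy)
    d2 = vp_to_direction(vp2, f, cx, cy)
    return d1, d2, cross(d1, d2)
```

With `f = 1` and vanishing points at `(1, 0)` and `(-1, 0)`, the first two directions are orthogonal and the third is vertical, so its own vanishing point lies at infinity.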

slide-47
SLIDE 47

Parameterization

slide-48
SLIDE 48

Parameterization

32/64

slide-49
SLIDE 49

Parameterization

slide-50
SLIDE 50

Parameterization

slide-51
SLIDE 51

Labeling

: is cell i on?

slide-52
SLIDE 52

Formulation

slide-53
SLIDE 53

Variable

: is cell i on?

slide-54
SLIDE 54

Unary Potentials

: should cell i be on?

slide-55
SLIDE 55

Binary Potentials

: should cells i and j both be on?

slide-56
SLIDE 56

Binary Potentials

Convex ( + ) Concave ( - )

slide-57
SLIDE 57

8o7s+UCM

slide-58
SLIDE 58

Binary Potentials

Convex ( + ) Concave ( - )

8o7s

slide-59
SLIDE 59

Constraints

What configurations are forbidden?

Gurobi BB
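
The formulation is a binary program: on/off variables per cell, unary and pairwise scores, and hard constraints on forbidden configurations. The sketch below solves a toy instance by exhaustive search, which stands in for the Gurobi branch-and-bound solver named on the slide and is only practical for a handful of variables. The scores and the form of the constraints are assumptions.

```python
from itertools import product

def solve_labeling(unary, pairwise, forbidden):
    """Maximize sum_i u_i x_i + sum_(i,j) b_ij x_i x_j over x in {0,1}^n.
    unary: list of u_i; pairwise: dict {(i, j): b_ij};
    forbidden: set of (i, j) pairs that may not both be on."""
    n = len(unary)
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=n):
        if any(x[i] and x[j] for i, j in forbidden):   # hard constraint violated
            continue
        val = sum(u * xi for u, xi in zip(unary, x))
        val += sum(b * x[i] * x[j] for (i, j), b in pairwise.items())
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

In the toy call `solve_labeling([1.0, 0.8, -0.5], {(0, 2): 1.0, (1, 2): 0.2}, {(0, 1)})`, cell 2 has a negative unary score but is still switched on because its pairwise agreement with cell 0 outweighs it.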

slide-60
SLIDE 60

Projected 3D Primitives 3D Primitives Proposed Input Ground Truth

slide-61
SLIDE 61

Qualitative Results

Projected 3D Primitives 3D Primitives Proposed Input Ground Truth

slide-62
SLIDE 62

Projected 3D Primitives 3D Primitives Proposed Input Ground Truth

slide-63
SLIDE 63

Random Qualitative Results

Proposed 3D Primitives

slide-64
SLIDE 64

Quantitative Results

Summary stats in degrees (lower is better): Mean, Median, RMSE. % good pixels (higher is better): 11.25°, 22.5°, 30°.

Method          Mean   Median   RMSE   11.25°   22.5°   30°
Proposed        37.5   17.2     53.2   41.9     53.9    58.0
3D Primitives   38.5   19.0     54.2   41.7     52.4    56.3
Hedau et al.    43.2   24.8     59.4   39.1     48.8    52.3
Lee et al.      47.6   43.4     60.6   28.1     39.7    43.9
Karsch et al.   46.6   43.0     53.6    5.4     19.9    31.5
Hoiem et al.    45.6   38.2     55.1    8.6     30.5    41.0

rank

slide-65
SLIDE 65

Style vs. structure?

Tenenbaum & Freeman. Separating Style and Content with Bilinear Models. Neural Computation. 2000.
slide-66
SLIDE 66

Casablanca Hotel, New York

slide-67
SLIDE 67
slide-68
SLIDE 68
slide-69
SLIDE 69

More general environments?

slide-70
SLIDE 70

KITTI Dataset: Geiger, Lenz, Urtasun, ’12

slide-71
SLIDE 71
  • Large regions without surface interpretation
  • Fewer linear/planar structures to anchor
  • Irregular distribution of 3D training data
slide-72
SLIDE 72
slide-73
SLIDE 73
slide-74
SLIDE 74

Discovered Primitives (Examples)

747/203

slide-75
SLIDE 75

Contact points

slide-76
SLIDE 76

Object surfaces + Contact points

slide-77
SLIDE 77

Next:

  • Better reasoning
  • Semantic information
  • Less structured environments
  • Evaluation
  • Applications

Data-Driven 3D Primitives For Single-Image Understanding, Fouhey, Gupta, Hebert, In ICCV 2013.
Unfolding an Indoor Origami World, Fouhey, Gupta, Hebert, In ECCV 2014.

slide-78
SLIDE 78
  • Harvested from tripadvisor.com
slide-79
SLIDE 79

Sheraton Los Angeles, Le Champlain Quebec, Meritan Apartments Sydney

slide-80
SLIDE 80

Project digression…

slide-81
SLIDE 81


slide-82
SLIDE 82

Results – Quantitative

Recall