Existing Context models: Torralba, Sinha (2001) - PowerPoint PPT Presentation


PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding, by Yinda Zhang, Shuran Song, Ping Tan, Jianxiong Xiao. Presented by: William Xie.


slide-1
SLIDE 1

PanoContext

by Yinda Zhang, Shuran Song, Ping Tan, Jianxiong Xiao

A Whole-room 3D Context Model 
 for Panoramic Scene Understanding

Presented by: William Xie

slide-2
SLIDE 2

Existing Context models

Carbonetto, de Freitas & Barnard (2004); Kumar, Hebert (2005); Torralba, Murphy, Freeman (2004); Fink & Perona (2003); Sudderth, Torralba, Willsky, Freeman (2005); Hoiem, Efros, Hebert (2005); Torralba, Sinha (2001); Rabinovich et al. (2007); Heitz and Koller (2008); Desai, Ramanan, and Fowlkes (2009)

DPM on PASCAL VOC [Felzenszwalb et al.]

Improvement on PASCAL <1.5%

Slide credit: Zhang et al.

slide-3
SLIDE 3

What is this object?

Slide credit: Zhang et al.

slide-4
SLIDE 4

What is this object?

Slide credit: Zhang et al.

slide-5
SLIDE 5

What is this object?

Slide credit: Zhang et al.

slide-6
SLIDE 6

What is this object?

Slide credit: Zhang et al.

slide-7
SLIDE 7

What is this object?

Slide credit: Zhang et al.

slide-8
SLIDE 8

What is this object?

Slide credit: Zhang et al.

slide-9
SLIDE 9

Why didn’t context help?

slide-10
SLIDE 10

Why didn’t context help?

Perhaps we are not using the right data

slide-11
SLIDE 11
  • On average: 1.5 object classes and 2.7 object instances per image

  • Average camera field of view: 40° - 60° horizontal

PASCAL VOC

slide-12
SLIDE 12
  • 180° horizontal field of view
  • Ability to see depth
  • Ability to change viewpoint

Human Vision

slide-13
SLIDE 13

Remedy

slide-14
SLIDE 14

PanoContext

Slide credit: Zhang et al.

slide-15
SLIDE 15

Input: Panorama

PanoContext

Slide credit: Zhang et al.

slide-16
SLIDE 16

Input: Panorama Output: 2D projected result Output: 3D model


bed

window mirror painting nightstand sofa desk door painting tv

bed sofa

nightstand painting

tv

mirror door window painting desk chair

PanoContext

Slide credit: Zhang et al.

slide-17
SLIDE 17

Input: Panorama Output: 2D projected result Output: 3D model


bed

window mirror painting nightstand sofa desk door painting tv

bed sofa

nightstand painting

tv

mirror door window painting desk chair

PanoContext

Output: 3D room exploration

Slide credit: Zhang et al.

slide-18
SLIDE 18

Pipeline

slide-19
SLIDE 19

Pipeline

Krizhevsky, Alex, et al. "Imagenet classification with deep convolutional neural networks." NIPS. 2012.
slide-20
SLIDE 20
  • Vanishing point estimation for panoramas
  • Room layout hypothesis generation
  • 3D object hypotheses generation
  • Whole-room scene hypotheses generation
  • Data-driven holistic ranking

Pipeline

slide-21
SLIDE 21
  • Vanishing point estimation for panoramas
  • Room layout hypothesis generation
  • 3D object hypotheses generation
  • Whole-room scene hypotheses generation
  • Data-driven holistic ranking

Pipeline

slide-22
SLIDE 22
  • Vanishing point estimation for panoramas
  • Room layout hypothesis generation
  • 3D object hypotheses generation
  • Whole-room scene hypotheses generation
  • Data-driven holistic ranking

Pipeline

… …

slide-23
SLIDE 23
  • Vanishing point estimation for panoramas
  • Room layout hypothesis generation
  • 3D object hypotheses generation
  • Whole-room scene hypotheses generation
  • Data-driven holistic ranking

Pipeline

… …

✓

slide-24
SLIDE 24

Whole Room

Object Room

Input

Generate a pool of hypotheses

Slide credit: Zhang et al.

slide-25
SLIDE 25

Whole Room

Object Room

Input

Generate a pool of hypotheses

Slide credit: Zhang et al.

slide-26
SLIDE 26

Room layout hypothesis

Slide credit: Zhang et al.

slide-27
SLIDE 27

Room layout hypothesis

Line segment detection algorithm

Slide credit: Zhang et al.

slide-28
SLIDE 28

Room layout hypothesis

Hough transform for vanishing point

Slide credit: Zhang et al.

slide-29
SLIDE 29

Room layout hypothesis

Hough transform for vanishing point

Slide credit: Zhang et al.

Classify a vanishing direction for each line
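
A minimal sketch of this per-line classification, assuming each line segment on the panorama has already been lifted to the unit normal of the great circle it lies on, and that the three vanishing directions are already estimated. A line points toward a vanishing direction when that direction lies in the line's great-circle plane, so the sketch picks the direction with the smallest angle to that plane. The name `classify_line` and the tolerance `tol_deg` are illustrative assumptions, not the authors' implementation.

```python
import math


def classify_line(normal, vanishing_dirs, tol_deg=5.0):
    """Return the index of the best-matching vanishing direction, or None.

    normal: unit normal of the great circle containing the line segment.
    vanishing_dirs: list of unit 3D vectors (assumed already estimated).
    tol_deg: hypothetical tolerance on the angle between a direction
             and the great-circle plane.
    """
    best_idx, best_dev = None, None
    for idx, direction in enumerate(vanishing_dirs):
        dot = sum(n * d for n, d in zip(normal, direction))
        # The angle between a direction and the plane is asin(|n . d|).
        dev = math.degrees(math.asin(min(1.0, abs(dot))))
        if best_dev is None or dev < best_dev:
            best_idx, best_dev = idx, dev
    if best_dev is None or best_dev > tol_deg:
        return None  # the line supports none of the directions
    return best_idx
```

A normal that is itself aligned with one axis is consistent with the two other directions; this sketch then returns the first of them, so a real pipeline would need an extra tie-break.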

slide-30
SLIDE 30

Source: Wikipedia, Emaze
slide-31
SLIDE 31

Sample 5 line segments to generate a room layout

Room layout hypothesis

slide-32
SLIDE 32

Sample 5 line segments to generate a room layout

Room layout hypothesis

slide-33
SLIDE 33

Room layout hypothesis

Sample 5 line segments to generate a room layout

Slide credit: Zhang et al.

slide-34
SLIDE 34

Room layout hypothesis

Sample 5 line segments to generate a room layout

Slide credit: Zhang et al.

slide-35
SLIDE 35

Room layout hypothesis

Sample 5 line segments to generate a room layout

Slide credit: Zhang et al.

slide-36
SLIDE 36

Room layout hypothesis

Sample 5 line segments to generate a room layout

Slide credit: Zhang et al.

slide-37
SLIDE 37

Room layout hypothesis

Pixel-wise surface direction estimation

Slide credit: Zhang et al.

slide-38
SLIDE 38

Line segments

Room layout hypothesis

Slide credit: Zhang et al.

slide-39
SLIDE 39

Line segments

Room layout hypothesis

Slide credit: Zhang et al.

slide-40
SLIDE 40

Surface normal estimation Line segments

Room layout hypothesis

Slide credit: Zhang et al.

slide-41
SLIDE 41

Surface normal estimation Line segments

Room layout hypothesis

Slide credit: Zhang et al.

slide-42
SLIDE 42

Consistency Score: 0.770

Surface normal estimation Line segments

Room layout hypothesis

Slide credit: Zhang et al.

slide-43
SLIDE 43

0.770

Surface normal estimation Line segments

Room layout hypothesis

Consistency Score:

Slide credit: Zhang et al.

slide-44
SLIDE 44

0.770 0.711

Surface normal estimation Line segments

Room layout hypothesis

Consistency Score:

Slide credit: Zhang et al.

slide-45
SLIDE 45

0.770 0.711

Surface normal estimation Line segments

Room layout hypothesis

Consistency Score:

Slide credit: Zhang et al.

slide-46
SLIDE 46

0.770 0.711 0.504

Surface normal estimation Line segments

Room layout hypothesis

Consistency Score:

Slide credit: Zhang et al.

slide-47
SLIDE 47

0.770 0.711

Surface normal estimation Line segments

Room layout hypothesis

Consistency Score: 0.504

Slide credit: Zhang et al.

slide-48
SLIDE 48

0.770 0.711

Surface normal estimation Line segments

Room layout hypothesis

Consistency Score:

Slide credit: Zhang et al.

slide-49
SLIDE 49

0.770 0.711

Surface normal estimation Line segments

Room layout hypothesis

Consistency Score:

Slide credit: Zhang et al.

Top 50 only
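
A rough sketch of scoring and pruning layout hypotheses. It assumes the consistency score is the fraction of pixels whose surface direction implied by a layout agrees with the pixel-wise estimate; the slides do not give the exact formula, so this definition and both function names are assumptions.

```python
def consistency_score(layout_labels, estimated_labels):
    """Fraction of pixels where the layout-implied surface direction
    matches the pixel-wise estimate (assumed definition)."""
    if len(layout_labels) != len(estimated_labels) or not layout_labels:
        raise ValueError("label maps must be non-empty and the same size")
    agree = sum(1 for a, b in zip(layout_labels, estimated_labels) if a == b)
    return agree / len(layout_labels)


def keep_top_k(hypotheses, scores, k=50):
    """Keep the k hypotheses with the highest consistency score."""
    ranked = sorted(zip(scores, hypotheses), key=lambda pair: pair[0], reverse=True)
    return [hypothesis for _, hypothesis in ranked[:k]]
```

With the three scores shown on the slides (0.770, 0.711, 0.504), `keep_top_k` with `k=2` would drop the 0.504 hypothesis.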
slide-50
SLIDE 50

Whole Room

Object Room

Input

Generate a pool of hypotheses

Slide credit: Zhang et al.

slide-51
SLIDE 51

Input: a single-view panorama

Cuboid detection

Slide credit: Zhang et al.

slide-52
SLIDE 52

Input: a single-view panorama

Cuboid detection

Fitted cuboids

Slide credit: Zhang et al.

slide-53
SLIDE 53

Cuboid detection

Selective search DPM-esque

Slide credit: Zhang et al.

6 rays and 3 vanishing points Largest IoU with the segment
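
The "largest IoU with the segment" selection can be sketched as follows, assuming both the segment and each candidate projected shape are available as sets of (row, col) pixel coordinates. How the candidates are generated from the rays and vanishing points is not shown here; `mask_iou` and `best_fit` are hypothetical helper names.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two pixel sets."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def best_fit(segment, candidates):
    """Return the candidate shape with the largest IoU with the segment."""
    return max(candidates, key=lambda candidate: mask_iou(segment, candidate))
```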

slide-54
SLIDE 54

Semantic classification

bed desk sofa … chair

Features → Random forest → Object categories

  • Size
  • Aspect ratio & Area
  • Distance to walls

Slide credit: Zhang et al.
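
A small sketch of the geometric features listed above, assuming an axis-aligned cuboid inside an axis-aligned room, both given by their minimum and maximum corners. The exact feature definitions fed to the random forest are not stated on the slide, so the ones below (footprint area, footprint aspect ratio, nearest-wall distance) are assumptions.

```python
def cuboid_features(cuboid_min, cuboid_max, room_min, room_max):
    """Size, aspect ratio, area, and wall distance of a 3D cuboid.

    All corners are (x, y, z) tuples in room coordinates (assumed).
    """
    dx, dy, dz = (hi - lo for lo, hi in zip(cuboid_min, cuboid_max))
    if min(dx, dy, dz) <= 0:
        raise ValueError("cuboid must have positive extent on every axis")
    # Distance from each vertical cuboid face to the wall it faces.
    wall_distances = [
        cuboid_min[0] - room_min[0],
        room_max[0] - cuboid_max[0],
        cuboid_min[1] - room_min[1],
        room_max[1] - cuboid_max[1],
    ]
    return {
        "size": (dx, dy, dz),
        "aspect_ratio": max(dx, dy) / min(dx, dy),
        "area": dx * dy,
        "min_wall_distance": min(wall_distances),
    }
```

The resulting values would form one row of the feature matrix passed to whatever random forest implementation is used.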

slide-55
SLIDE 55

70% Accuracy

Semantic classification

bed desk sofa … chair

Features → Random forest → Object categories

  • Size
  • Aspect ratio & Area
  • Distance to walls

Slide credit: Zhang et al.

slide-56
SLIDE 56

bed

Slide credit: Zhang et al.

Semantic classification

slide-57
SLIDE 57

nightstand

Slide credit: Zhang et al.

Semantic classification

slide-58
SLIDE 58

painting

Slide credit: Zhang et al.

Semantic classification

slide-59
SLIDE 59

Slide credit: Zhang et al.

Pairwise constraint

slide-60
SLIDE 60

Whole Room

Object Room

Input

Generate a pool of hypotheses

Slide credit: Zhang et al.

slide-61
SLIDE 61

Data-driven sampling

Slide credit: Zhang et al.

Randomly sample a room layout with P(layout) ∝ normal consistency score
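
Sampling a layout with probability proportional to its consistency score can be sketched with the standard library. The function name and the optional `rng` argument are illustrative; the scores are assumed to be non-negative.

```python
import random


def sample_layout(layouts, scores, rng=random):
    """Draw one layout with probability proportional to its score."""
    return rng.choices(layouts, weights=scores, k=1)[0]
```

For example, with scores 0.770, 0.711 and 0.504, the first layout would be drawn with probability 0.770 / 1.985, roughly 39% of the time.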

slide-62
SLIDE 62

Data-driven sampling

Slide credit: Zhang et al.

Randomly sample a room layout with P(layout) ∝ normal consistency score

slide-63
SLIDE 63

Data-driven sampling

Slide credit: Zhang et al.

slide-64
SLIDE 64

Decide the number of objects based on the prior distribution:

painting 2, bed 1, desk 1, nightstand 1, mirror 1, sofa 1, tv 1, window 1

Data-driven sampling

Slide credit: Zhang et al.

slide-65
SLIDE 65

Decide the number of objects based on the prior distribution:

painting 2, bed 1, desk 1, nightstand 1, mirror 1, sofa 1, tv 1, window 1

Decide the object sampling sequence based on bottom-up scores:

bed, nightstand, painting, desk, window, painting, tv, sofa, mirror
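
One way such a sequence could be derived, as a sketch only: give each category as many slots as its sampled count, attach the best bottom-up candidate scores of that category to the slots, and sort all slots by score. This rule is an assumption that reproduces the interleaving seen above (the two paintings are not adjacent); the slides do not spell out the actual procedure.

```python
def sampling_sequence(counts, candidate_scores):
    """Order object slots by descending bottom-up score.

    counts: {category: number of instances to place}.
    candidate_scores: {category: list of bottom-up scores of its candidates}
                      (hypothetical input format).
    """
    slots = []
    for category, n in counts.items():
        top = sorted(candidate_scores.get(category, []), reverse=True)[:n]
        slots.extend((score, category) for score in top)
    slots.sort(key=lambda slot: slot[0], reverse=True)
    return [category for _, category in slots]
```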

Data-driven sampling

Slide credit: Zhang et al.

slide-66
SLIDE 66

Bottom-up score as bed

Sample a bed in empty room first…

[Color scale: bottom-up confidence, High to Low]

Data-driven sampling

Slide credit: Zhang et al.

slide-67
SLIDE 67

Randomly select one according to bottom-up priority

Sample a bed in empty room first…

Data-driven sampling

Slide credit: Zhang et al.

rectangle detection score, semantic classifier score

slide-68
SLIDE 68

Randomly select one according to the bottom-up + pairwise priority

Data-driven sampling

Then, sample a nightstand given a bed

Slide credit: Zhang et al.

mean distance to the K nearest neighbors
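
The K-nearest-neighbor term can be sketched as below, assuming the pairwise cue is the offset of a candidate (for example a nightstand) relative to an already placed object (the bed), compared against offsets observed in training rooms. The function name, the offset representation, and `k=3` are assumptions.

```python
import math


def knn_mean_distance(query, training_points, k=3):
    """Mean Euclidean distance from `query` to its k nearest training points.

    A smaller value means the candidate's relative placement looks more
    like placements seen in the training data.
    """
    if not training_points:
        raise ValueError("need at least one training point")
    distances = sorted(math.dist(query, point) for point in training_points)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)
```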

slide-69
SLIDE 69

Slide credit: Zhang et al.

Pairwise constraint

slide-70
SLIDE 70

Keep on sampling until finishing the list…

List: bed, nightstand, painting, desk, window, painting, TV, sofa, mirror

bed

nightstand

painting

Data-driven sampling

Slide credit: Zhang et al.

slide-71
SLIDE 71

List: bed, nightstand, painting, desk, window, painting, TV, sofa, mirror

Keep on sampling until finishing the list…

bed

nightstand desk

painting

Data-driven sampling

Slide credit: Zhang et al.

slide-72
SLIDE 72

List: bed, nightstand, painting, desk, window, painting, TV, sofa, mirror

Keep on sampling until finishing the list…

bed

nightstand desk

painting

window

Data-driven sampling

Slide credit: Zhang et al.

slide-73
SLIDE 73

List: bed, nightstand, painting, desk, window, painting, TV, sofa, mirror

Keep on sampling until finishing the list…

bed

nightstand desk

painting painting

window

Data-driven sampling

Slide credit: Zhang et al.

slide-74
SLIDE 74

List: bed, nightstand, painting, desk, window, painting, TV, sofa, mirror

Keep on sampling until finishing the list…

bed

nightstand desk

painting painting

tv

window

Data-driven sampling

Slide credit: Zhang et al.

slide-75
SLIDE 75

List: bed, nightstand, painting, desk, window, painting, TV, sofa, mirror

Keep on sampling until finishing the list…

bed

nightstand desk

painting painting

sofa

tv

window

Data-driven sampling

Slide credit: Zhang et al.

slide-76
SLIDE 76

List: bed, nightstand, painting, desk, window, painting, TV, sofa, mirror

Keep on sampling until finishing the list…

bed

nightstand desk

painting

mirror

painting

sofa

tv

window

Data-driven sampling

Slide credit: Zhang et al.

slide-77
SLIDE 77

bed

nightstand desk

painting

mirror

painting

sofa

tv

window

Whole-room sampling is finished.

Keep on sampling until finishing the list…

Data-driven sampling

Slide credit: Zhang et al.

slide-78
SLIDE 78

Holistic ranking

Input: a single-view panorama

Learn a linear SVM for scoring and take the best

[Figure: whole-room hypotheses rendered as 3D box models, each scored by f(panorama, hypothesis)]

Slide credit: Zhang et al.
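
The "score and take the best" step, sketched under the assumption that a weight vector has already been learned by a linear SVM and that each hypothesis comes with a precomputed holistic feature vector. The helper names and the (name, features) pair format are hypothetical.

```python
def linear_score(weights, features):
    """Dot product of the learned weights with a holistic feature vector."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def best_hypothesis(weights, hypotheses):
    """Return the (name, features) pair with the highest linear score."""
    return max(hypotheses, key=lambda hyp: linear_score(weights, hyp[1]))
```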

slide-79
SLIDE 79

Holistic ranking

Input: a single-view panorama

Learn a linear SVM for scoring and take the best

[Figure: whole-room hypotheses rendered as 3D box models, each scored by wᵀ f(panorama, hypothesis)]

Slide credit: Zhang et al.

slide-80
SLIDE 80

Holistic ranking

Input: a single-view panorama

Learn a linear SVM for scoring and take the best

[Figure: whole-room hypotheses rendered as 3D box models, each scored by wᵀ f(panorama, hypothesis); hypotheses marked ✗ ✗ ✓]

Slide credit: Zhang et al.

slide-81
SLIDE 81

Matching cost Binary label

Holistic ranking

slide-82
SLIDE 82

Holistic feature

f(panorama, hypothesis) = bottom-up feature + top-down feature

Slide credit: Zhang et al.

slide-83
SLIDE 83

Hypothesis

Holistic feature

f(panorama, hypothesis) = bottom-up feature + top-down feature

Slide credit: Zhang et al.

slide-84
SLIDE 84

Dataset

Hypothesis Ground Truth 1 Ground Truth 2 Ground Truth N

… …

1.40 0.90 0.20

Holistic feature

f(panorama, hypothesis) = bottom-up feature + top-down feature

Slide credit: Zhang et al.

slide-85
SLIDE 85

Dataset

Hypothesis Ground Truth 1 Ground Truth 2 Ground Truth N

… …

1.40 0.90 0.20

Holistic feature

f(panorama, hypothesis) = bottom-up feature + top-down feature

Slide credit: Zhang et al.

Centroid distance, IoU, semantic type consistency

slide-86
SLIDE 86

Resize X, Fix dist. to wall, Rotation & Scale

Dataset

A ground truth room Hypothesis

Holistic feature

f(panorama, hypothesis) = bottom-up feature + top-down feature

Transformed ground truth

Slide credit: Zhang et al.

slide-87
SLIDE 87

Resize X, Fix dist. to wall, Rotation & Scale

Dataset

A ground truth room Hypothesis

Holistic feature

f(panorama, hypothesis) = bottom-up feature + top-down feature

Transformed ground truth

Slide credit: Zhang et al.

Pick 10 with the lowest cost
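
A sketch of how a matching cost built from centroid distance, IoU, and semantic type consistency could be used to pick the 10 closest ground-truth rooms. The equal weighting of the three terms, the nearest-object matching, the axis-aligned boxes, and the dict layout of an object are all assumptions made for illustration; the slides do not specify them.

```python
import math


def _volume(box):
    return math.prod(hi - lo for lo, hi in zip(box[0], box[1]))


def _center(box):
    return [(lo + hi) / 2 for lo, hi in zip(box[0], box[1])]


def box_iou(a, b):
    """3D IoU of axis-aligned boxes given as (min_xyz, max_xyz)."""
    inter = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a[0], a[1], b[0], b[1]):
        inter *= max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    union = _volume(a) + _volume(b) - inter
    return inter / union if union > 0 else 0.0


def object_cost(obj, gt, type_penalty=1.0):
    """Centroid distance + (1 - IoU) + a penalty for a different category."""
    cost = math.dist(_center(obj["box"]), _center(gt["box"]))
    cost += 1.0 - box_iou(obj["box"], gt["box"])
    if obj["label"] != gt["label"]:
        cost += type_penalty
    return cost


def room_cost(hypothesis, ground_truth):
    """Sum over hypothesis objects of the cost to the closest ground-truth
    object (rooms are assumed to be non-empty lists of objects)."""
    return sum(min(object_cost(o, g) for g in ground_truth) for o in hypothesis)


def closest_rooms(hypothesis, dataset, k=10):
    """Return the k ground-truth rooms with the lowest matching cost."""
    return sorted(dataset, key=lambda room: room_cost(hypothesis, room))[:k]
```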

slide-88
SLIDE 88

2D and 3D boxy representation of the scene

Final outputs

door

tv

mirror

painting painting

sofa

nightstand window

desk

chair

bed

door mirror

tv sofa

painting painting nightstand

bed

desk chair

Slide credit: Zhang et al.

slide-89
SLIDE 89

door

painting

end table

sofa

sofa

coffee table

painting end table door sofa sofa coffee table

2D and 3D boxy representation of the scene

Final outputs

Slide credit: Zhang et al.

slide-90
SLIDE 90

DPM: Wrong relative position Our detection

  • Helps to decide sizes of objects
  • Helps to decide number of objects
  • Helps to constrain relative position

How does 3D context help?

nightstand desk

Slide credit: Zhang et al.

slide-91
SLIDE 91

Context vs. Appearance

[Figure: precision-recall curves for bed, painting, door, desk, chair, and tv; legend: DPM, PanoContext, Context+Detector]

  • Context is as powerful as local appearance for detection
  • Context is complementary with local appearance

Slide credit: Zhang et al.

slide-92
SLIDE 92

Context vs. Appearance

[Figure: precision-recall curves for bed, painting, door, desk, chair, and tv; legend: DPM, PanoContext, Context+Detector]

  • Context is as powerful as local appearance for detection
  • Context is complementary with local appearance

Slide credit: Zhang et al.

slide-93
SLIDE 93

Is larger FOV helpful for room layout estimation?

slide-94
SLIDE 94

Is larger FOV better for context?

slide-95
SLIDE 95

My Take

  • Elements of the ensemble could be valuable
  • Too data driven, hard to generalize
  • Future: relax the cuboid constraints, try other ways to integrate visual recognition in the pipeline

slide-96
SLIDE 96

Discussion

  • How can the model be generalized to other scene categories (e.g. outdoor)?

  • Performance on deformable or non-axis-aligned objects?
  • Chairs and other non-standard layout objects?
  • Indoor understanding and VQA?
slide-97
SLIDE 97

Is context important in sampling and ranking?

slide-98
SLIDE 98
slide-99
SLIDE 99