PanoContext
by Yinda Zhang, Shuran Song, Ping Tan, Jianxiong Xiao
A Whole-room 3D Context Model for Panoramic Scene Understanding
Presented by: William Xie
Existing Context models
Carbonetto, de Freitas & Barnard (2004)
Kumar, Hebert (2005)
Torralba, Murphy, Freeman (2004)
Fink & Perona (2003)
Sudderth, Torralba, Willsky, Freeman (2005)
Hoiem, Efros, Hebert (2005)
Torralba, Sinha (2001)
Rabinovich et al. (2007)
Heitz and Koller (2008)
Desai, Ramanan, and Fowlkes (2009)
DPM on PASCAL VOC [Felzenszwalb et al.]
Improvement on PASCAL <1.5%
Slide credit: Zhang et al.
What is this object?
Why didn't context help?
Perhaps we are not using the right data
[Figure: object instances per image, PASCAL VOC vs. human vision]
Remedy
PanoContext
Input: Panorama
Output: 2D projected result
Output: 3D model
Output: 3D room exploration
[Figure labels: bed, window, mirror, painting, nightstand, sofa, desk, door, tv, chair]
Pipeline
Krizhevsky, Alex, et al. "ImageNet classification with deep convolutional neural networks." NIPS. 2012.
Whole Room
Object Room
Input
Generate a pool of hypotheses
Room layout hypothesis
Line segment detection algorithm
Hough transform for vanishing point
Classify a vanishing direction for each line
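A minimal sketch of the classification step, under assumptions not stated on the slide: each detected segment on the panorama is represented by the unit normal of the great circle it lies on, and the three vanishing directions are already known. A 3D line with direction v projects onto a great circle whose normal is perpendicular to v, so a segment is assigned to the direction that is closest to perpendicular to its normal. The function names and the 10-degree tolerance are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def classify_segment(circle_normal, vanishing_dirs, max_dev_deg=10.0):
    """Return the index of the best-fitting vanishing direction, or None.

    circle_normal: normal of the great circle containing the segment.
    vanishing_dirs: candidate 3D directions (assumed known).
    max_dev_deg: hypothetical tolerance on the deviation from perpendicularity.
    """
    n = normalize(circle_normal)
    best_idx, best_dev = None, None
    for idx, v in enumerate(vanishing_dirs):
        v = normalize(v)
        dot = abs(sum(a * b for a, b in zip(n, v)))
        # Deviation (in degrees) of v from the plane of the great circle.
        dev = math.degrees(math.asin(min(1.0, dot)))
        if best_dev is None or dev < best_dev:
            best_idx, best_dev = idx, dev
    return best_idx if best_dev <= max_dev_deg else None

# Usage: three orthogonal directions and one segment normal.
dirs = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
label = classify_segment((0, 1, 1), dirs)
```

Segments whose normal is far from perpendicular to every direction are left unclassified (None) rather than forced into a class.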
Sample 5 line segments to generate a room layout
Pixel-wise surface direction estimation
Consistency score: agreement between the surface normal estimation and the room layout hypothesis built from line segments
Example scores: 0.770, 0.711, 0.504
Top 50 hypotheses by consistency score
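A minimal sketch of how such a consistency score could be computed and used to keep the top-ranked layouts. It assumes, beyond what the slides state, that both the estimated surface directions and the directions implied by a layout hypothesis are stored as small grids of labels 0, 1, 2 (one per orthogonal direction), with None for pixels without an estimate. The exact definition used by the authors may differ.

```python
def consistency_score(estimated, predicted):
    """Fraction of labeled pixels where the two direction maps agree."""
    agree = 0
    total = 0
    for est_row, pred_row in zip(estimated, predicted):
        for est, pred in zip(est_row, pred_row):
            if est is None:  # no surface direction estimate at this pixel
                continue
            total += 1
            if est == pred:
                agree += 1
    return agree / total if total else 0.0

def top_k_layouts(estimated, layouts, k=50):
    """Return (score, layout_index) pairs for the k most consistent layouts."""
    scored = [(consistency_score(estimated, layout), i)
              for i, layout in enumerate(layouts)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]

# Usage with toy 2x3 label maps (hypothetical data).
estimated = [[0, 0, 1], [2, None, 1]]
layout_a = [[0, 0, 1], [2, 2, 1]]
layout_b = [[0, 1, 1], [2, 0, 0]]
ranked = top_k_layouts(estimated, [layout_a, layout_b], k=50)
```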
Input: a single-view panorama
Cuboid detection
Fitted cuboids
Selective search, DPM-esque
6 rays and 3 vanishing points; largest IoU with the segment
Semantic classification
Features → Random forest → Object categories (bed, desk, sofa, …, chair)
70% accuracy
Classified examples: bed, nightstand, painting
Pairwise constraint
Data-driven sampling
Randomly sample a room layout with P(layout) ∝ normal consistency score
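Sampling a layout with probability proportional to its consistency score can be sketched as weighted random selection. This is only an illustration: the function name is hypothetical, and the scores are the example values shown earlier in the deck plus one zero-score layout added for demonstration.

```python
import random

def sample_layout(consistency_scores, rng=None):
    """Draw index i with probability scores[i] / sum(scores)."""
    rng = rng or random.Random()
    if not consistency_scores or sum(consistency_scores) <= 0:
        raise ValueError("need at least one layout with a positive score")
    indices = range(len(consistency_scores))
    # random.choices performs the proportional (roulette-wheel) draw.
    return rng.choices(indices, weights=consistency_scores, k=1)[0]

# Usage: higher-scoring layouts are picked more often, zero-score ones never.
scores = [0.770, 0.711, 0.504, 0.0]
chosen = sample_layout(scores, random.Random(0))
```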
Decide the number of objects based on the prior distribution:
painting 2, bed 1, desk 1, nightstand 1, mirror 1, sofa 1, tv 1, window 1
Decide the object sampling sequence based on bottom-up scores:
bed, nightstand, painting, desk, window, painting, tv, sofa, mirror
Sample a bed in an empty room first…
Bottom-up score as bed (confidence color scale, high to low)
Randomly select one according to the bottom-up priority
Bottom-up score: rectangle detection score, semantic classifier score
Then, sample a nightstand given a bed
Randomly select one according to the bottom-up + pairwise priority
Pairwise constraint: mean distance to the K nearest neighbors
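A minimal sketch of a pairwise term based on the mean distance to the K nearest neighbors. The assumptions here are not from the slides: training rooms are reduced to 2D offsets of one object type relative to another (for example a nightstand relative to a bed), and a candidate placement is scored by how close its offset is to the nearest training offsets, with a lower value meaning a more typical arrangement. The data and function name are hypothetical.

```python
import math

def knn_mean_distance(query, samples, k=3):
    """Mean Euclidean distance from query to its k nearest samples."""
    if not samples:
        raise ValueError("need at least one training sample")
    distances = sorted(math.dist(query, s) for s in samples)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)

# Hypothetical training offsets of a nightstand relative to a bed (meters).
training_offsets = [(1.0, 0.0), (1.2, 0.1), (-1.1, 0.0), (1.1, -0.1)]

typical = knn_mean_distance((1.1, 0.0), training_offsets)
unusual = knn_mean_distance((3.0, 2.0), training_offsets)
```

The candidate next to the bed receives a much smaller value than the one far across the room, so it would be favored when this term is combined with the bottom-up score.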
Keep on sampling until finishing the list…
List: bed, nightstand, painting, desk, window, painting, TV, sofa, mirror
Whole-room sampling is finished.
Holistic ranking
Input: a single-view panorama
Learn a linear SVM for scoring and take the best
[Figure: each (panorama, whole-room hypothesis) pair is mapped to a feature f( , ) and scored by w^T f( , ); panels labeled N and MN]
Matching cost; binary label
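Once a weight vector w has been learned, ranking reduces to computing a dot product per hypothesis and taking the maximum. The sketch below assumes w and the holistic feature vectors are already available as plain lists of numbers; SVM training is not shown, and all values are hypothetical.

```python
def linear_score(weights, feature):
    """Compute w^T f for one hypothesis."""
    if len(weights) != len(feature):
        raise ValueError("weights and feature must have the same length")
    return sum(w * f for w, f in zip(weights, feature))

def best_hypothesis(weights, features):
    """Return the index of the hypothesis with the highest linear score."""
    return max(range(len(features)),
               key=lambda i: linear_score(weights, features[i]))

# Usage with a hypothetical 3-dimensional feature per hypothesis.
w = [0.5, -1.0, 2.0]
hypothesis_features = [
    [1.0, 0.0, 0.0],  # score 0.5
    [0.0, 1.0, 1.0],  # score 1.0
    [2.0, 2.0, 0.0],  # score -1.0
]
winner = best_hypothesis(w, hypothesis_features)
```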
Holistic feature
f( , ) = bottom-up feature + top-down feature
Dataset: the hypothesis is compared with Ground Truth 1, Ground Truth 2, …, Ground Truth N (values shown: 1.40, 0.90, 0.20)
Comparison terms: centroid distance, IoU, semantic type consistency
[Figure: a ground truth room, the hypothesis, and the transformed ground truth]
Pick 10 with the lowest cost
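A simplified sketch of a matching cost built from the three terms named on the slides (centroid distance, IoU, semantic type consistency), followed by selection of the lowest-cost ground truth rooms. It makes several assumptions that are not from the source: objects are axis-aligned 3D boxes given as (min corner, max corner), the three terms are summed with equal weight, and only a single object pair is compared rather than a full room-to-room matching.

```python
import heapq
import math

def box_volume(box):
    """Volume of an axis-aligned box given as (min_corner, max_corner)."""
    return math.prod(hi - lo for lo, hi in zip(*box))

def box_iou(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a[0], a[1], b[0], b[1]):
        overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
        if overlap <= 0:
            return 0.0
        inter *= overlap
    return inter / (box_volume(a) + box_volume(b) - inter)

def centroid(box):
    return tuple((lo + hi) / 2 for lo, hi in zip(*box))

def object_cost(hyp, gt, type_penalty=1.0):
    """Hypothetical cost: centroid distance + (1 - IoU) + type mismatch."""
    distance = math.dist(centroid(hyp["box"]), centroid(gt["box"]))
    overlap_term = 1.0 - box_iou(hyp["box"], gt["box"])
    mismatch = 0.0 if hyp["type"] == gt["type"] else type_penalty
    return distance + overlap_term + mismatch

def lowest_cost_rooms(costs, k=10):
    """Return the ids of the k ground truth rooms with the lowest cost."""
    return heapq.nsmallest(k, costs, key=costs.get)

# Usage: the example costs from the slide, keyed by hypothetical room ids.
closest = lowest_cost_rooms({"gt_1": 1.40, "gt_2": 0.90, "gt_n": 0.20}, k=10)
```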
Final outputs
2D and 3D boxy representation of the scene
[Figure labels, example 1: door, tv, mirror, painting, sofa, nightstand, window, desk, chair, bed]
[Figure labels, example 2: door, painting, end table, sofa, coffee table]
How does 3D context help?
DPM: wrong relative position
Our detection
[Figure labels: nightstand, desk]
Context vs. Appearance
[Figure: precision-recall curves per category (bed, painting, door, desk, chair, tv) comparing DPM, Context+Detector, and PanoContext]
Is larger FOV helpful for room layout estimation?
Is larger FOV better for context?
My Take
to integrate visual recognition in the pipeline
Discussion
categories (e.g. outdoor)?
Is context important in sampling and ranking?