Rent3D:
Floor-Plan Priors for Monocular Layout Estimation
Chenxi Liu¹,*, Alexander Schwing²,*, Kaustav Kundu², Raquel Urtasun², Sanja Fidler²
¹Tsinghua University, ²University of Toronto
Liu, Schwing, Kundu, Urtasun, Fidler Rent3D 1 / 22
United States: 11.7% per year
Craigslist: 90,000 rental ads per day in New York alone
10 million people visit the website per day
Particularly during a winter in Toronto
Plus some meta information e.g. wall height
Camera localization within apartment
Room layout estimation
⊲ Hedau et al., 2009, 2012
⊲ Lee et al., 2010
⊲ Schwing et al., 2012, 2013
⊲ Del Pero et al., 2011, 2012
⊲ Choi et al., 2013
Virtual tours
⊲ Xiao & Furukawa, 2012
3D indoor reconstruction from large photo collections or video
⊲ Cabral & Furukawa, 2014
⊲ Brualla et al., 2014
Indoor localization (video, depth sensors)
⊲ Project Tango
⊲ SLAM work
[Figures from Lee et al., 2010; Xiao & Furukawa, 2012; Cabral & Furukawa, 2014]
Accurate camera localization:
⊲ Scene cues
⊲ Semantic cues
⊲ Geometric cues by exploiting the dimension information
$r \in \{1, \dots, R\}$: discrete random variable representing the room
$c_r \in \{1, \dots, |C_r|\}$: discrete variable representing which wall of room $r$ the picture is facing ($|C_r|$ is the number of walls in room $r$)
$y$: rays representing a room layout
The front wall is the plane defined by vp0 and vp1.
Typical parametrization for room layout [Hedau et al., 2009]:
⊲ Room is a 3D cuboid
⊲ $y = (y_1, y_2, y_3, y_4)$: 4 rays needed to define it
[Figure: cuboid room layout with vanishing points vp0, vp1, vp2, rays y1 to y4, and labels r1 to r4]
We formulate the problem as inference in a Conditional Random Field with the following energy:
$$E(r, c_r, y) = E_{\text{scene type}}(r) + E_{\text{layout}}(r, c_r, y) + E_{\text{win}}(r, c_r, y)$$
Scene-type term $E_{\text{scene type}}(r)$
Potential: score of a scene classifier predicting the scene type (e.g., bedroom, kitchen, reception)
Layout term $E_{\text{layout}}(r, c_r, y)$
Image cues: Orientation Map [Lee et al., 2009], Geometric Context [Hedau et al., 2009]
Potential: counts of blue, red, etc. orientation-map pixels inside and outside of each wall
Fast computation using integral geometry [Schwing et al., 2012]
The fourth ray is no longer a free variable: $y = (y_1, y_2, y_3)$, with $y_4 = f(r, c_r, y_1, y_2, y_3)$
Additional constraint on $y$: the camera is inside the room
[Figure: cuboid room layout with vanishing points vp0, vp1, vp2, rays y1 to y4, and labels r1 to r4]
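The idea of counting labelled pixels inside a region in constant time can be illustrated with a summed-area table. This is a simplified stand-in and not the integral geometry of [Schwing et al., 2012]: it counts over axis-aligned rectangles rather than regions bounded by rays through vanishing points, and the `blue` mask, the rectangle, and the function names are hypothetical.

```python
import numpy as np


def integral_image(mask: np.ndarray) -> np.ndarray:
    """Summed-area table with an extra zero row and column at the top/left."""
    table = np.zeros((mask.shape[0] + 1, mask.shape[1] + 1), dtype=np.int64)
    table[1:, 1:] = np.cumsum(np.cumsum(mask.astype(np.int64), axis=0), axis=1)
    return table


def count_in_rect(table: np.ndarray, top: int, left: int, bottom: int, right: int) -> int:
    """Number of set pixels in rows [top, bottom) and columns [left, right)."""
    return int(
        table[bottom, right] - table[top, right] - table[bottom, left] + table[top, left]
    )


# Toy orientation label: True where a pixel carries the (hypothetical) "blue" label.
blue = np.zeros((4, 6), dtype=bool)
blue[1:3, 2:5] = True

table = integral_image(blue)
inside = count_in_rect(table, 0, 0, 4, 3)  # "blue" pixels inside a candidate wall region
outside = int(blue.sum()) - inside         # "blue" pixels outside that region
```

After the table is built once per image, each candidate region costs four lookups, which is what makes scoring many layout hypotheses cheap.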
Window term $E_{\text{win}}(r, c_r, y)$
Window-background segmentation
Potential: count window pixels inside and outside the window area
[Figure: window area in the image, with vanishing points vp0, vp1, vp2]
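A minimal sketch of the window counts, assuming a binary window-background segmentation and a binary mask of the expected window area are already available as arrays. Both masks and the function name `window_counts` are hypothetical, and how the counts are weighted into the potential is not shown here.

```python
import numpy as np


def window_counts(window_seg: np.ndarray, window_area: np.ndarray) -> tuple:
    """Return (segmented window pixels inside the area, those outside the area)."""
    seg = window_seg.astype(bool)
    area = window_area.astype(bool)
    inside = int(np.sum(seg & area))
    outside = int(np.sum(seg & ~area))
    return inside, outside


# Toy masks: a segmented window blob and the area where a window is expected.
seg = np.zeros((3, 4), dtype=bool)
seg[0:2, 1:3] = True
area = np.zeros((3, 4), dtype=bool)
area[:, 2:4] = True

win_inside, win_outside = window_counts(seg, area)
```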
We are minimizing the energy:
$$(r^*, c_r^*, y^*) = \operatorname*{argmin}_{r, c_r, y} E(r, c_r, y)$$
⊲ Exhaustive enumeration of $r$ and $c_r$
⊲ Exact branch and bound inference for $y$ [Schwing & Urtasun, 2012]
We use S-SVM for training
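The structure of this inference can be sketched as two enumeration loops around an inner minimization over the layout. The sketch below is an illustration under assumptions: the inner step uses brute force over a small candidate set where the slides use exact branch and bound, and `toy_energy`, `walls_per_room`, and `candidates` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Layout = Tuple[int, ...]


def minimize_energy(
    walls_per_room: Dict[int, int],
    candidate_layouts: Sequence[Layout],
    energy: Callable[[int, int, Layout], float],
) -> Tuple[int, int, Layout, float]:
    """Enumerate rooms r and walls c_r; minimize over y for each pair."""
    best = None
    for r, n_walls in walls_per_room.items():
        for c_r in range(1, n_walls + 1):
            # Inner minimization over y (brute force here, not branch and bound).
            y_best = min(candidate_layouts, key=lambda y: energy(r, c_r, y))
            e = energy(r, c_r, y_best)
            if best is None or e < best[3]:
                best = (r, c_r, y_best, e)
    if best is None:
        raise ValueError("no rooms to enumerate")
    return best


def toy_energy(r: int, c_r: int, y: Layout) -> float:
    """Hypothetical energy whose minimum is at r=2, c_r=3, y=(1, 1, 1)."""
    return abs(r - 2) + abs(c_r - 3) + sum((yi - 1) ** 2 for yi in y)


walls_per_room = {1: 4, 2: 6}                    # room id -> number of walls
candidates = list(product(range(3), repeat=3))   # discretized (y1, y2, y3)
result = minimize_energy(walls_per_room, candidates, toy_energy)
```

Enumerating $r$ and $c_r$ is affordable because both are small discrete sets, while the layout $y$ is the part that benefits from a smarter search.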
We crawled a London apartment rental site:
Apartments: 215
Images: 1570
Indoor images: 1259
Images without GT alignment: 82
Biggest apartment in dataset: 16 rooms, 5 bedrooms, 88 walls
We assume we know which wall the camera is facing
Metrics: pixel accuracy for predicting 5 walls

Method       Layout error   Evaluations   Test time [s]
Schwing’12   13.88          16012.4       0.0208
Ours         11.81          1269.5        0.0019

⊲ About 2% reduction in layout error (13.88 to 11.81)
⊲ Roughly 10 times fewer branching operations
⊲ Roughly 10x speedup at test time
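The three summary claims can be checked directly from the table values. The short calculation below only restates the reported numbers; the dictionary keys are illustrative names.

```python
baseline = {"layout_error": 13.88, "evaluations": 16012.4, "test_time_s": 0.0208}
ours = {"layout_error": 11.81, "evaluations": 1269.5, "test_time_s": 0.0019}

error_drop = baseline["layout_error"] - ours["layout_error"]  # absolute difference
eval_ratio = baseline["evaluations"] / ours["evaluations"]    # fewer evaluations
speedup = baseline["test_time_s"] / ours["test_time_s"]       # faster test time

print(f"layout error drop: {error_drop:.2f}")  # 2.07
print(f"evaluation ratio:  {eval_ratio:.1f}x")  # 12.6x
print(f"speedup:           {speedup:.1f}x")     # 10.9x
```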
Red arrow: ground-truth camera
Green arrow: predicted camera
[Figure: qualitative results; columns: Window+Aspect, +Scene, +Room, Ground-truth]
⊲ We improve layout prediction over past work
⊲ We achieve good localization performance