Rent3D: Floor-Plan Priors for Monocular Layout Estimation
Chenxi Liu¹,*, Alexander Schwing²,*, Kaustav Kundu², Raquel Urtasun², Sanja Fidler²
¹ Tsinghua University, ² University of Toronto
Liu, Schwing, Kundu, Urtasun, Fidler Rent3D 1 / 22

How Many Times Have You Looked for Apartments?
United States: 11.7% per year
Craigslist: 90,000 rental ads per day in New York alone
10 million people visit the website per day

Finding an Apartment/House is a Pain...
Particularly during a winter in Toronto

Renting Apartments

Example Rental Data
Plus some meta information, e.g., wall height

Rent3D: View Rental Ads in 3D
Camera localization within apartment

Related Work
Room layout estimation
⊲ Hedau et al., 2009, 2012
⊲ Lee et al., 2010
⊲ Schwing et al., 2012, 2013
⊲ Del Pero et al., 2011, 2012
⊲ Choi et al., 2013
Virtual tours
⊲ Xiao & Furukawa, 2012
3D indoor reconstruction from large photo collections or video
⊲ Cabral & Furukawa, 2014
⊲ Brualla et al., 2014
Indoor localization (video, depth sensors)
⊲ Project Tango
⊲ SLAM work
Our work: 3D indoor reconstruction and localization using monocular imagery
[Figures: Lee et al., 2010; Xiao & Furukawa, 2012; Cabral & Furukawa, 2014]

Overview
Accurate camera localization:
⊲ Scene cues
⊲ Semantic cues
⊲ Geometric cues by exploiting the dimension information

Formulation
$r \in \{1, \dots, R\}$: discrete random variable representing the room
$c_r \in \{1, \dots, |C_r|\}$: discrete variable representing which wall of room $r$ the picture is facing ($|C_r|$ is the number of walls in the room)
$y$: rays representing a room layout
Front wall is the plane defined by $vp_0$ and $vp_1$
Typical parametrization for room layout [Hedau et al., 2009]:
⊲ Room is a 3D cuboid
⊲ 4 rays needed to define it: $y = (y_1, y_2, y_3, y_4)$
[Figure: cuboid layout with vanishing points $vp_0$, $vp_1$, $vp_2$, rays $r_1, \dots, r_4$, and ray parameters $y_1, \dots, y_4$]
We formulate the problem as inference in a Conditional Random Field with the following energy:
$E(r, c_r, y) = E_{\text{scene type}}(r) + E_{\text{layout}}(r, c_r, y) + E_{\text{win}}(r, c_r, y)$
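
The decomposition of the energy into three terms can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Hypothesis` class, the `total_energy` function, and the callables passed in for the three potentials are hypothetical names, not part of the presented system.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Rays = Tuple[float, ...]  # ray parameters y, e.g. (y1, y2, y3, y4)


@dataclass(frozen=True)
class Hypothesis:
    """One joint assignment (r, c_r, y). Hypothetical container for illustration."""
    r: int    # room index, 1..R
    c_r: int  # index of the wall of room r that the picture is facing
    y: Rays   # rays defining the room layout


def total_energy(
    h: Hypothesis,
    e_scene_type: Callable[[int], float],
    e_layout: Callable[[int, int, Rays], float],
    e_win: Callable[[int, int, Rays], float],
) -> float:
    """E(r, c_r, y) = E_scene_type(r) + E_layout(r, c_r, y) + E_win(r, c_r, y)."""
    return e_scene_type(h.r) + e_layout(h.r, h.c_r, h.y) + e_win(h.r, h.c_r, h.y)


# Toy potentials (placeholders, not learned): each returns a constant.
h = Hypothesis(r=1, c_r=2, y=(0.1, 0.2, 0.3, 0.4))
energy = total_energy(
    h,
    e_scene_type=lambda r: 1.0,
    e_layout=lambda r, c_r, y: 2.0,
    e_win=lambda r, c_r, y: 0.5,
)
```

Only the scene-type term depends on the room alone; the other two terms couple the room, the facing wall, and the layout rays.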

Energy Terms: Scene Type
$E(r, c_r, y) = E_{\text{scene type}}(r) + E_{\text{layout}}(r, c_r, y) + E_{\text{win}}(r, c_r, y)$
Potential: score of a scene classifier predicting scene type (e.g., bedroom, kitchen, reception)

Energy Terms: Layout
$E(r, c_r, y) = E_{\text{scene type}}(r) + E_{\text{layout}}(r, c_r, y) + E_{\text{win}}(r, c_r, y)$
Orientation Map [Lee et al., 2009], Geometric Context [Hedau et al., 2009]
Potential: counts of blue, red, etc., pixels inside and outside of each wall
Fast computation using integral geometry [Schwing et al., 2012]
[Figure: cuboid layout with vanishing points $vp_0$, $vp_1$, $vp_2$, rays $r_1, \dots, r_4$, and ray parameters $y_1, \dots, y_4$]
$y = (y_1, y_2, y_3)$; $y_4$ is no longer free: $y_4 = f(r, c_r, y_1, y_2, y_3)$
Additional constraint on $y$: camera is inside the room
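
To illustrate why precomputed sums make "inside and outside" pixel counts cheap, here is a minimal sketch using a summed-area table over a binary label mask. It is a deliberate simplification: the region is an axis-aligned rectangle, whereas the integral geometry of [Schwing et al., 2012] handles wall faces bounded by rays through the vanishing points. All function names are hypothetical.

```python
from typing import List, Tuple


def summed_area_table(mask: List[List[int]]) -> List[List[int]]:
    """Return sat with sat[i][j] = sum of mask[0:i][0:j] (one extra row and column of zeros)."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_sum = 0
        for j in range(w):
            row_sum += mask[i][j]
            sat[i + 1][j + 1] = sat[i][j + 1] + row_sum
    return sat


def count_in_rect(sat: List[List[int]], top: int, left: int, bottom: int, right: int) -> int:
    """Count labelled pixels in rows [top, bottom) and columns [left, right) with 4 lookups."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


def inside_outside_counts(
    sat: List[List[int]], top: int, left: int, bottom: int, right: int
) -> Tuple[int, int]:
    """Counts of labelled pixels inside and outside the rectangle."""
    inside = count_in_rect(sat, top, left, bottom, right)
    return inside, sat[-1][-1] - inside


# Toy mask: 1 marks pixels carrying one orientation label (e.g. "blue").
mask = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
sat = summed_area_table(mask)
counts = inside_outside_counts(sat, 0, 0, 2, 2)  # top-left 2x2 block
```

Once the table is built, each candidate region costs a constant number of lookups, which matters when many layout hypotheses are scored.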

Energy Terms: Windows
$E(r, c_r, y) = E_{\text{scene type}}(r) + E_{\text{layout}}(r, c_r, y) + E_{\text{win}}(r, c_r, y)$
Window-background segmentation
Potential: count window pixels inside and outside the window area
[Figure: window area shown relative to vanishing points $vp_0$, $vp_1$, $vp_2$]
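
The window potential can be sketched as two counts combined linearly. This is an assumption-laden illustration: the slide only states that window pixels are counted inside and outside the window area, so the linear form, the weights `w_inside` and `w_outside`, and the function names are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]


def window_counts(window_mask: Mask, area_mask: Mask) -> Tuple[int, int]:
    """Count segmented window pixels that fall inside / outside the hypothesized window area.

    window_mask: 1 where the window-background segmentation predicts "window".
    area_mask:   1 where the hypothesis (r, c_r, y) places a window in the image.
    """
    inside = outside = 0
    for seg_row, area_row in zip(window_mask, area_mask):
        for is_window, in_area in zip(seg_row, area_row):
            if is_window:
                if in_area:
                    inside += 1
                else:
                    outside += 1
    return inside, outside


def window_energy(
    window_mask: Mask, area_mask: Mask, w_inside: float = -1.0, w_outside: float = 1.0
) -> float:
    """Assumed linear energy: reward window pixels inside the area, penalize those outside."""
    inside, outside = window_counts(window_mask, area_mask)
    return w_inside * inside + w_outside * outside


window_mask = [[0, 1, 1], [0, 1, 0]]
area_mask = [[0, 1, 0], [0, 1, 0]]
win_counts = window_counts(window_mask, area_mask)
win_energy = window_energy(window_mask, area_mask)
```

In a trained model the two weights would come from learning rather than being fixed by hand as they are here.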

Learning and Inference
We are minimizing the energy:
$(r^*, c_r^*, y^*) = \operatorname*{argmin}_{r, c_r, y} \big[ E_{\text{scene type}}(r) + E_{\text{layout}}(r, c_r, y) + E_{\text{win}}(r, c_r, y) \big]$
Inference:
⊲ Exhaustive enumeration of $r$ and $c_r$
⊲ Exact branch and bound inference for $y$ [Schwing & Urtasun, 2012]
We use a structured SVM (S-SVM) for training
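
The structure of the inference loop can be sketched as below. Note the substitution: the slide uses exact branch and bound for $y$, while this sketch simply scans a small, hypothetical list of candidate layouts. The names `infer`, `walls_per_room`, and `candidate_layouts`, and the toy energy, are assumptions for illustration.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Rays = Tuple[float, ...]
State = Tuple[int, int, Rays]  # (r, c_r, y)


def infer(
    walls_per_room: Dict[int, int],
    candidate_layouts: Sequence[Rays],
    energy: Callable[[int, int, Rays], float],
) -> Tuple[Optional[State], float]:
    """Return the (r, c_r, y) with minimal energy and that energy.

    r and c_r are enumerated exhaustively. For y, a brute-force scan over
    candidate_layouts stands in for branch and bound (a simplification).
    """
    best: Optional[State] = None
    best_energy = float("inf")
    for r, n_walls in walls_per_room.items():
        for c_r in range(1, n_walls + 1):
            for y in candidate_layouts:
                e = energy(r, c_r, y)
                if e < best_energy:
                    best_energy = e
                    best = (r, c_r, y)
    return best, best_energy


# Toy energy with a unique minimum at r=2, c_r=3, y=(0.5, 0.5, 0.5).
def toy_energy(r: int, c_r: int, y: Rays) -> float:
    target = (0.5, 0.5, 0.5)
    return abs(r - 2) + abs(c_r - 3) + sum((a - b) ** 2 for a, b in zip(y, target))


walls_per_room = {1: 4, 2: 4, 3: 6}  # room index -> number of walls |C_r|
candidates = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
best_state, best_value = infer(walls_per_room, candidates, toy_energy)
```

Enumerating $r$ and $c_r$ is feasible because an apartment has only a handful of rooms and walls; the continuous layout $y$ is the part that needs a dedicated search.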

Dataset
We crawled a London apartment rental site

# apartments                      215
# images                          1570
# indoor images                   1259
# images without GT alignment     82
avg. # rooms per apt              6
avg. # walls per apt              31
avg. # windows per apt            6
avg. # doors per apt              9

Apartments in Central London Are Not Small
Biggest apartment in dataset: 16 rooms, 5 bedrooms, 88 walls
