Scene Understanding with 3D Deep Networks
Thomas Funkhouser Princeton University
Disclaimer: I am talking about the work of these people: Shuran Song, Fisher Yu, Yinda Zhang, Andy Zeng, Maciej Halber, Jianxiong Xiao, Angela Dai, Matthias Niessner, Matt Fisher
Understanding indoor scenes observed in RGB-D images
Input: RGB-D image(s) → Output: semantic segmentation
3D scene understanding: understanding indoor scenes observed in RGB-D images, in 3D
Learn ConvNets to recognize patterns in voxels
Three projects, ordered from small to large scale:
- Local shape descriptor
- Amodal object detection
- Semantic scene completion
“3DMatch: Learning Local Geometric Descriptors from 3D Reconstructions,” submitted to CVPR 2017
Goal: train a discriminating 3D local shape descriptor from data
[Figure: two local patches encoded as descriptor vectors (0.58, 0.21, 0.92, 0.67, 0.04, 0.53, …) that are compared to decide a match]
Challenge: where to get training data?
Approach: train on wide-baseline correspondences in RGB-D reconstructions
“Ground truth” match between RGB-D Images from different views
Method: sample true/false correspondences from RGB-D reconstructions, train Siamese network
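As a rough sketch, the Siamese training signal can be illustrated with a contrastive loss over descriptor pairs. The function name, descriptor length, and margin below are illustrative assumptions, not the exact 3DMatch setup:

```python
import numpy as np

def contrastive_loss(desc_a, desc_b, is_match, margin=2.0):
    """Contrastive loss on a descriptor pair (one common choice for
    Siamese training; the exact loss used in 3DMatch may differ)."""
    d = np.linalg.norm(desc_a - desc_b)
    if is_match:
        return 0.5 * d ** 2                      # pull matches together
    return 0.5 * max(0.0, margin - d) ** 2       # push non-matches apart

# toy descriptors for one true and one false correspondence
a = np.array([0.58, 0.21, 0.92, 0.67, 0.04, 0.53])
b = a + 0.01                                      # near-duplicate: match
c = np.array([0.10, 0.90, 0.05, 0.30, 0.80, 0.20])  # non-match

loss_pos = contrastive_loss(a, b, is_match=True)   # near zero
loss_neg = contrastive_loss(a, c, is_match=False)  # nonzero penalty
```

Training alternates true and false correspondences sampled from the reconstructions, so the network learns both invariance (matches) and discrimination (non-matches).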
Result: learns to discriminate local shapes found in real-world data
Result 1: learned feature descriptor predicts RGB-D point correspondences more accurately than hand-tuned descriptors
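To illustrate how such a descriptor is used at test time, here is a minimal nearest-neighbor matching sketch; all names and sizes are hypothetical:

```python
import numpy as np

def match_keypoints(desc_src, desc_dst):
    """Match each source descriptor to its nearest neighbor (L2) in the
    destination set -- the typical way a learned descriptor proposes
    point correspondences between two RGB-D fragments."""
    # pairwise squared distances via broadcasting: (n, 1, d) - (1, m, d)
    diff = desc_src[:, None, :] - desc_dst[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    return dist2.argmin(axis=1)

rng = np.random.default_rng(0)
dst = rng.normal(size=(5, 6))                       # 5 destination descriptors
src = dst[[2, 0, 4]] + 0.01 * rng.normal(size=(3, 6))  # noisy copies

matches = match_keypoints(src, dst)  # recovers rows 2, 0, 4
```

The resulting correspondences would then feed a robust alignment step (e.g. RANSAC) to register the fragments.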
Metrics: match classification error at 95% recall; fragment alignment success rate
Result 2: feature descriptor learned from RGB-D reconstructions provides matching for recognizing poses of small objects in Amazon Picking Challenge
Predicting the pose of a 3D object model in an RGB-D scan; metric: object pose prediction accuracy
Result 3: feature descriptor learned from RGB-D reconstructions provides discriminative matching of semantic correspondences on 3D meshes
Local Shape Descriptor Amodal object detection Semantic scene completion
“Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images,” CVPR 2016
Goal: given an RGB-D image, find objects (labeled 3D amodal bounding boxes)
Input: single RGB-D image → Output: labeled 3D amodal boxes
[CVPR13] Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
[IJCV14] Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and Semantic Segmentation
[ECCV14] Object Detection and Segmentation using Semantically Rich Image and Depth Features
[CVPR15] Aligning 3D Models to RGB-D Images of Cluttered Scenes
[CVPR16] Cross Modal Distillation for Supervision Transfer
Most previous work (2D operations):
Image + Depth Map → Encode Depth Map as Extra Channels → 2D Region Proposal → 2D Object Detection → 2D Instance Segmentation → 2D Contour Detection → Coarse Pose Classification → Point Cloud Alignment → 3D Amodal Detection Result

Approach (3D operations): 3D input → 3D operations → 3D output
Image + Depth Map → 3D Amodal Detection Result
RGB-D Image
Data encoding:
1) Estimate major directions
2) Compute TSDF
Scene volume: 5.2 m × 5.2 m × 2.5 m
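A minimal sketch of computing a truncated distance grid from surface points. It is unsigned for brevity (a real TSDF also signs distances by visibility along the camera ray), and the function name and toy scene are illustrative assumptions:

```python
import numpy as np

def tsdf_from_points(points, grid_min, voxel_size, dims, trunc=0.1):
    """Truncated (unsigned) distance grid from surface points: each
    voxel stores its distance to the nearest surface point, clipped
    at the truncation distance."""
    # voxel centers of the grid
    idx = np.indices(dims).reshape(3, -1).T
    centers = grid_min + (idx + 0.5) * voxel_size
    # distance from each voxel center to the nearest surface point
    d = np.min(np.linalg.norm(centers[:, None, :] - points[None, :, :],
                              axis=-1), axis=1)
    return np.clip(d, 0.0, trunc).reshape(dims)

# toy surface: a few points on the plane z = 0.5 inside a 1 m cube
pts = np.array([[x, y, 0.5] for x in (0.25, 0.5, 0.75)
                            for y in (0.25, 0.5, 0.75)])
tsdf = tsdf_from_points(pts, grid_min=np.zeros(3),
                        voxel_size=0.1, dims=(10, 10, 10))
```

Voxels near the plane get small values while everything farther than the truncation distance saturates, which is what gives the network a smooth, localized encoding of the surface.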
3D region proposal network: anchors are defined by physical size in 3D rather than pixel area in 2D

Multiscale 3D region proposal network:
Input: TSDF → Conv 1 → ReLU + Pool → Conv 2 → ReLU + Pool → Conv 3 → ReLU + Pool → [Conv Class → Softmax | Conv 3D Box → Smooth L1]
Level 1 anchors (receptive field: 0.4 m³): e.g., 0.6 × 0.2 × 0.4 m, 0.5 × 0.5 × 0.2 m
After Conv 3: Conv 4 → ReLU + Pool → [Conv Class → Softmax | Conv 3D Box → Smooth L1]
Level 2 anchors (receptive field: 1 m³)
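The receptive-field numbers above come from standard convolution arithmetic; here is a small sketch, assuming a hypothetical stack of 3×3×3 convolutions with 2× pooling (not the exact layer configuration of the network):

```python
def receptive_field(layers, voxel_size=0.025):
    """Receptive field of a stack of (kernel, stride) layers, using
    the standard recurrence: rf += (kernel - 1) * jump; jump *= stride.
    The voxel size (here 0.025 m) converts voxels to meters."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf, rf * voxel_size

# hypothetical stack: three 3x3x3 convs, each followed by 2x pooling
layers = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1), (2, 2)]
rf_voxels, rf_meters = receptive_field(layers)  # 22 voxels, 0.55 m
```

Adding one more conv/pool level multiplies the jump, which is why the second proposal level sees a much larger physical context than the first.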
RGB-D image: each 3D proposal is projected to 2D to extract an image patch

Joint object recognition network:
- 3D ConvNet on the proposal's TSDF: Conv 1 → ReLU + Pool → Conv 2 → ReLU + Pool → Conv 3 → ReLU → FC 2
- 2D VGG (pretrained on ImageNet) on the image patch
- Concatenation → FC 3 → [FC Class → Softmax | FC 3D Box → Smooth L1]
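A toy sketch of the fusion step, assuming hypothetical 4096-dimensional features from each branch and a made-up classification head (the real layer sizes and weights differ):

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical features: FC 2 output of the 3D ConvNet on the TSDF,
# and the VGG feature of the projected 2D image patch
feat_3d = rng.normal(size=4096)
feat_2d = rng.normal(size=4096)

# "Concatenation": the joint network simply stacks both modalities
fused = np.concatenate([feat_3d, feat_2d])

# toy FC classification head with softmax over object classes
n_classes = 20
w_cls = 0.01 * rng.normal(size=(n_classes, fused.size))
logits = w_cls @ fused
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

A parallel FC head regresses the 3D box refinement from the same fused feature, so color appearance and 3D shape jointly inform both outputs.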
Train and test on amodal boxes provided in SUN RGB-D
Quantitative comparisons: object detection accuracy on the NYU v2 dataset (mAP), versus 2D deep learning, 3D deep learning, and 3D non-deep-learning methods
Qualitative comparisons (Sliding Shapes vs. ours):
- Sliding Shapes: sofa → Ours: bathtub
- Sliding Shapes: chair → Ours: sofa
- Sliding Shapes: table → Ours: bed
- Sliding Shapes: miss → Ours: table and chairs
- Sliding Shapes: toilet → Ours: garbage bin + bed
Local Shape Descriptor Amodal object detection Semantic scene completion
“Semantic Scene Completion from a Single Depth Image,” submitted to CVPR 2017
Input: single-view depth map → Output: semantic scene completion
Goal: given an RGB-D image, label all voxels by semantic class
[Figure: 3D scene with visible surface and free space labeled]
Surface segmentation (Silberman et al.) → scene completion (Firman et al.) → semantic scene completion (this paper)
The occupancy and the object identity are tightly intertwined !
Prior work: segmentation OR completion
Prediction: N+1 classes (N semantic classes plus empty space)
Input: single-view depth map → 3D ConvNet → Output: semantic scene completion
Approach: end-to-end deep network
Encode 3D space using flipped TSDF (voxel size: 0.02 m)
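A sketch of the flipping, assuming distances normalized to [-1, 1]. The paper's exact definition may differ in details, but the idea is to put the largest magnitude near the surface rather than at the truncation boundary:

```python
import numpy as np

def flip_tsdf(tsdf):
    """Flipped TSDF: keep the sign, but make the magnitude largest
    near the surface and zero at the truncation distance, so the
    network's strongest signal sits where the geometry is."""
    return np.sign(tsdf) * (1.0 - np.abs(tsdf))

# normalized TSDF samples: 0 on the surface, +/-1 at truncation
t = np.array([-1.0, -0.5, 0.0, 0.25, 1.0])
flipped = flip_tsdf(t)  # truncation boundary maps to 0, near-surface
                        # values keep their sign but grow in magnitude
```

With a plain TSDF most of the volume carries saturated truncation values; flipping concentrates the dynamic range around the surface.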
Local geometry: receptive field 0.98 m
High-level 3D context via a big receptive field provided by dilated convolution: receptive field 2.26 m
Multi-scale aggregation: receptive fields 0.98 m, 1.62 m, and 2.26 m
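The effect of dilation on the receptive field follows from standard convolution arithmetic; here is a sketch with hypothetical layer stacks (not the exact SSCNet configuration):

```python
def dilated_rf(layers, voxel_size=0.02):
    """Receptive field (in meters) of a conv stack where each layer is
    (kernel, stride, dilation); dilation inflates the effective kernel
    without adding parameters."""
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1          # effective kernel size
        rf += (k_eff - 1) * jump
        jump *= s
    return rf * voxel_size

# hypothetical: four plain 3^3 convs vs. the same stack with the
# last two dilated by 2 -- more context for the same parameter count
plain   = dilated_rf([(3, 1, 1)] * 4)                          # 0.18 m
dilated = dilated_rf([(3, 1, 1), (3, 1, 1), (3, 1, 2), (3, 1, 2)])  # 0.26 m
```

This is why dilated convolutions can supply the multi-meter context that scene completion needs while keeping the voxel resolution fixed.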
Where to get training data?
No dense volumetric ground truth with semantic labels exists for complete scenes: SUN3D has no semantic labels; NYU labels only visible surfaces
SUNCG dataset: synthetic camera views with depth and ground-truth semantic scene completion
Result: better than previous volumetric completion algorithms
Comparison to previous algorithms for volumetric completion
[Figure columns: Color Image | Observed Surface | Zhang et al. | Firman et al. | Ours (SSCNet) | Ground Truth]
Result: better than previous 3D model fitting algorithms
Comparison to previous algorithms for 3D model fitting
[Figure columns: Color Image | Observed Surface | Geiger and Wang | Lin et al. | Ours (SSCNet) | Ground Truth]
Three projects where ConvNets are trained to recognize patterns in voxels with different …
- Acquiring larger data sets
- Leveraging geometric structure
- Leveraging semantic structure
- Better integration of RGB and D
- Better surface parameterizations
- Finer-grained categories
- Higher resolution
- etc.
1,500 surface reconstructions 36,213 labeled objects
“ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes,” submitted to CVPR 2017.
"Fine-to-Coarse Registration of RGB-D Scans," submitted to CVPR 2017
Sleeping Area example: bed, sofa, dresser with mirror, dresser, nightstand, lamp, wall
“DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding,” submitted to CVPR 2017
Shuran Song, Jianxiong Xiao, Fisher Yu, Yinda Zhang, Andy Zeng