
Deep Single-View 3D Object Reconstruction with Visual Hull Embedding - PowerPoint PPT Presentation

Deep Single-View 3D Object Reconstruction with Visual Hull Embedding. Hanqing Wang¹,², Jiaolong Yang², Wei Liang¹, Xin Tong². ¹Beijing Institute of Technology; ²Microsoft Research Asia. Beijing, China


  1. Deep Single-View 3D Object Reconstruction with Visual Hull Embedding. Hanqing Wang¹,², Jiaolong Yang², Wei Liang¹, Xin Tong². ¹Beijing Institute of Technology; ²Microsoft Research Asia. Beijing, China. AAAI 2019

  2. Single-View 3D Reconstruction • Input: a single RGB(D) Image • Output: the corresponding 3D representation

  3. Previous Works • Deep learning based methods: [Choy ECCV’16] • Other works: [Girdhar ECCV’16] [Yan NIPS’16] [Wu NIPS’16] [Tulsiani CVPR’17] [Zhu ICCV’17]

  4. Limitations of Previous Works • Problems of existing deep learning based methods: 1. Arbitrary-view input images vs. canonical-view aligned 3D shapes; 2. Unsatisfactory results: missing shape details, inconsistency with the input

  5. Core Idea • Goal: reconstruct the object precisely from the given image • Idea: explicitly embed the 3D-2D projection geometry into the network • Approach: estimate a single-view visual hull inside the network (figure: multi-view vs. single-view visual hull)

  6. Method Overview (pipeline diagram): the input image is fed to CNNs that predict a coarse shape, a silhouette, and the object pose; the silhouette and pose produce a single-view visual hull, which a final CNN uses to refine the coarse shape into the final shape

  7. Components (network diagram: a 2D encoder + 3D decoder predicts the coarse shape from the input image; a 2D encoder + regressor estimates the pose (R, T); a 2D encoder–decoder predicts the silhouette; a 3D encoder–decoder refines the coarse shape) • V-Net: coarse shape prediction • P-Net: object pose and camera parameter estimation • S-Net: silhouette prediction • PSVH layer: visual hull generation • R-Net: coarse shape refinement
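Conceptually, the PSVH layer turns the predicted silhouette and pose into a probabilistic visual hull by projecting every voxel center into the image and reading off the silhouette probability there. Below is a minimal NumPy sketch of this idea; the simple pinhole model (focal length `f`, pose `R`, `t`) and nearest-neighbor sampling are illustrative assumptions, not the paper's exact camera parameterization.

```python
import numpy as np

def psvh(silhouette, R, t, f, grid_size=32, extent=1.0):
    """Probabilistic single-view visual hull: project each voxel center
    into the image and sample the silhouette probability map."""
    h, w = silhouette.shape
    # Voxel centers on a regular grid spanning [-extent, extent]^3.
    xs = np.linspace(-extent, extent, grid_size)
    X, Y, Z = np.meshgrid(xs, xs, xs, indexing="ij")
    pts = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)        # (N, 3)
    # Rigid transform into camera coordinates, then pinhole projection.
    cam = pts @ R.T + t                                      # (N, 3)
    u = f * cam[:, 0] / cam[:, 2] + w / 2.0
    v = f * cam[:, 1] / cam[:, 2] + h / 2.0
    # Nearest-neighbor sample of the silhouette probability.
    ui = np.clip(np.round(u).astype(int), 0, w - 1)
    vi = np.clip(np.round(v).astype(int), 0, h - 1)
    vh = silhouette[vi, ui]
    # Voxels projecting outside the image are empty.
    outside = (u < 0) | (u >= w) | (v < 0) | (v >= h)
    vh[outside] = 0.0
    return vh.reshape(grid_size, grid_size, grid_size)
```

Because the sampling is a simple gather, the operation is GPU-friendly and lets gradients flow back into the silhouette probabilities, which is what makes end-to-end training possible.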

  8–12. Components (slides 8–12 repeat the component diagram, highlighting in turn V-Net, P-Net, S-Net, the PSVH layer, and R-Net)

  13. Network Architecture • Overview:

  14. Training Details • Loss: We use the binary cross-entropy loss to train V-Net, S-Net, and R-Net. Let $q_o$ be the estimated probability at location $o$ and $q_o^*$ the target probability; the loss is $l = -\frac{1}{O} \sum_o \left( q_o^* \log q_o + (1 - q_o^*) \log(1 - q_o) \right)$ (2). For P-Net, we use the $L_1$ regression loss $l = \sum_{j=1,2,3} \beta\,|\theta_j - \theta_j^*| + \sum_{k=u,v} \gamma\,|t_k - t_k^*| + \delta\,|t_z - t_z^*|$ (3), where we set $\beta = 1$, $\delta = 1$, $\gamma = 0.01$.
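The two losses above can be sketched directly in NumPy. This is a minimal illustration of Eqs. (2) and (3), assuming `theta` holds the three rotation angles and `t` the translation `(t_u, t_v, t_z)`; the function names and argument layout are my own, not the paper's.

```python
import numpy as np

def bce_loss(q, q_star, eps=1e-7):
    """Binary cross-entropy averaged over all locations (Eq. 2)."""
    q = np.clip(q, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(q_star * np.log(q) + (1.0 - q_star) * np.log(1.0 - q))

def pose_loss(theta, theta_star, t, t_star,
              beta=1.0, gamma=0.01, delta=1.0):
    """L1 regression loss for P-Net (Eq. 3): rotation angles theta (3,),
    image-plane translation t[:2], and depth t[2], weighted per term."""
    l_rot = beta * np.abs(theta - theta_star).sum()
    l_uv = gamma * np.abs(t[:2] - t_star[:2]).sum()
    l_z = delta * abs(t[2] - t_star[2])
    return l_rot + l_uv + l_z
```

The small weight on the in-image translation (γ = 0.01) balances its pixel-scale magnitude against the angle and depth terms.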

  15. Training Details • Steps: 1. Train V-Net, S-Net, and P-Net independently. 2. Train R-Net with the coarse shape predicted by V-Net and the ground-truth visual hull. 3. Train the whole network end-to-end.

  16. Implementation Details • Network implemented in TensorFlow • Input image size: 128×128×3 • Output voxel grid size: 32×32×32

  17. Dataset • Object categories: car, airplane, chair, sofa • Datasets: rendered ShapeNet objects (ShapeNet: a large-scale dataset of CAD models); real images (PASCAL 3D+: images manually associated with a limited set of CAD models)

  18. Experiments • Results on the 3D-R2N2 dataset (rendered ShapeNet objects) • Ablation study:

  19. Experiments • Results on the rendered ShapeNet objects

  20. Experiments • Results on the rendered ShapeNet objects

  21. Experiments • Results on the synthetic dataset (rendered ShapeNet objects) • Ablation study:

  22. Experiments • Comparison with MarrNet [Wu et al. 2017] on the synthetic dataset

  23. Experiments • Results on the PASCAL 3D+ dataset (real images)

  24. Experiments • Results on the PASCAL 3D+ dataset (real images) IoU 0.716 IoU 0.793 IoU 0.937
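The per-example IoU numbers above are voxel intersection-over-union between the predicted occupancy grid and the ground truth. A minimal sketch (the 0.5 binarization threshold is an assumption):

```python
import numpy as np

def voxel_iou(pred, gt, threshold=0.5):
    """IoU between a predicted occupancy grid (probabilities)
    and a binary ground-truth voxel grid."""
    p = pred >= threshold
    g = gt.astype(bool)
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / union if union > 0 else 1.0
```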

  25. Running Time • ~18 ms per image (≈55 fps) • (Tested with a batch of 24 images on an NVIDIA Tesla M40 GPU)

  26. Contributions • Embedding domain knowledge (3D-2D perspective geometry) into a DNN • Performing reconstruction jointly with segmentation and pose estimation • A novel, GPU-friendly PSVH (probabilistic single-view visual hull) layer

  27. Thanks for listening! • Questions are welcome! • Email: hanqingwang@bit.edu.cn
