Towards Deep Multi-View Stereo
Silvano Galliani October 2, 2017
1 / 40
Multi-View Stereo
Outline
1 Gipuma: massively parallel multi-view stereo
2 Unsupervised normal prediction for improved multi-view reconstruction
3 Learned multi-patch similarity
4 Conclusions
Gipuma: massively parallel multi-view stereo
1 Accurate multi-view stereo reconstruction
2 Highly efficient open-source GPU implementation: correspondence over ten 2 MPix images in 1.6 sec
Our approach:
1 Estimate depth and fit a patch per view by consecutively treating each view as the reference camera
2 Fuse the depth maps in space to obtain the final reconstruction
Approximate randomized search for the best depth & normal minimizing a local matching error:
Initialize all pixels with a random plane (depth + normal)
Then:
Diffuse planes locally and keep a candidate when the cost decreases
Locally optimize the normal
Repeat (8 iterations are enough)
Similar to belief propagation
1 Red-black diffusion of planes → maximum parallelization on GPU
2 Candidates from a larger neighborhood → faster convergence
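As a minimal sketch of the red-black propagation idea (this simplifies Gipuma: fronto-parallel depths instead of full planes, a 4-neighborhood instead of the larger one, and the cost function is a placeholder the caller supplies):

```python
import numpy as np

def red_black_sweep(depth, cost, match_cost, color):
    """One half-iteration of checkerboard diffusion: every pixel whose
    (row + col) parity equals `color` tries the depth hypotheses of its
    opposite-colour neighbours and keeps one only if the cost decreases.
    On a GPU all pixels of one colour can be updated in parallel."""
    h, w = depth.shape
    for y in range(h):
        for x in range(w):
            if (x + y) % 2 != color:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w:
                    c = match_cost(y, x, depth[ny, nx])
                    if c < cost[y, x]:          # keep only improving candidates
                        depth[y, x] = depth[ny, nx]
                        cost[y, x] = c
```

Running the two colours alternately for a handful of iterations lets a good hypothesis spread across the whole image, which is why 8 iterations suffice on the slide.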
Fusion of depth & normal maps from different views into one 3D point cloud:
Consistency check on depth (fε) and normal (fang) over at least fcon views
Average of reliable points (depth + normal)
Tunable trade-off between a more accurate or a more complete result by adjusting fε, fang and fcon
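A hedged sketch of the consistency test for a single pixel (the function name and the exact agreement criteria are ours, not Gipuma's implementation; fε is treated as a relative depth tolerance and fang as an angle in degrees):

```python
import numpy as np

def fuse_point(depths, normals, f_eps=0.1, f_ang=30.0, f_con=3):
    """Fuse per-view hypotheses for one surface point (illustrative sketch).

    depths  : (V,) depths of the same point as seen by V views,
              already reprojected into the reference view.
    normals : (V, 3) unit surface normals.
    Returns (depth, normal) averaged over consistent views, or None
    when fewer than f_con views agree (point rejected)."""
    ref_d, ref_n = depths[0], normals[0]
    ok_d = np.abs(depths - ref_d) / ref_d < f_eps          # depth check (f_eps)
    ang = np.degrees(np.arccos(np.clip(normals @ ref_n, -1.0, 1.0)))
    ok = ok_d & (ang < f_ang)                              # normal check (f_ang)
    if ok.sum() < f_con:
        return None                                        # too few views agree
    n = normals[ok].mean(axis=0)
    return depths[ok].mean(), n / np.linalg.norm(n)
```

Loosening fε/fang or lowering fcon keeps more points (completeness); tightening them keeps only well-confirmed points (accuracy), which is the trade-off the slide describes.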
Large-scale multi-view dataset: 80 different objects, each covered by 49–64 images of resolution 1600 × 1200 pixels (≈ 2 million pixels)
≈1.6 seconds per depth map with fast settings
≈13 seconds per depth map with accurate settings
              Acc.            Comp.
              Mean    Med.    Mean    Med.
Points
  —           0.273   0.196   0.687   0.260
  —           0.379   0.234   0.400   0.188
  —           0.291   0.208   0.825   0.279
  tola [Tol-10] 0.307 0.198   1.097   0.456
  furu [Fur-10] 0.605 0.321   0.842   0.431
  camp [Cam-08] 0.753 0.480   0.540   0.179
Surfaces
  —           0.363   0.215   0.766   0.329
  —           0.631   0.262   0.519   0.309
  —           0.366   0.223   0.900   0.347
  tola [Tol-10] 0.488 0.244   0.974   0.382
  furu [Fur-10] 1.299 0.534   0.702   0.405
  camp [Cam-08] 1.411 0.579   0.562   0.322
Figure: Ground truth, textured reconstruction, reconstructed triangulation
New (multi-view) stereo and video benchmark on unstructured scenes:
SLR camera images
Multi-field-of-view stereo rig videos and images
Training dataset available
Presented at CVPR 2017: eth3d.net
Unsupervised normal prediction for improved multi-view reconstruction
Common failure modes for MVS are ambiguous matches, caused by:
Occlusions
Lack of texture in homogeneous regions
Dichotomy:
Stereo correspondences: more accurate in textured regions with many large image gradients
Shape-from-shading: typically more robust in flat regions with no albedo variations
Idea
Explicit modeling of surface, light and material properties is an under-constrained problem (light positions, colors and intensities, the reflectance function).
Two observations:
1 Shading affects surface orientation, not depth
2 Specific light interactions can be view-dependent: we rule out edges, etc.
We learn the relation between image and surface normal
We train a single model for each view
We start from a reliable MVS reconstruction with Gipuma
For every image we use it as training data to learn a CNN that predicts the surface normal from the RGB patch around each point
We use a convolutional neural network that minimizes the error between training and predicted normals
Accurate results w.r.t. the training data
Joint training of one model across all views did not work
Mean error: 18° predicted vs. 11° MVS
Mean of median error: 16° predicted vs. 9° MVS
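The angular errors quoted above are simply the mean angle between predicted and reference (MVS) normals; a minimal sketch:

```python
import numpy as np

def mean_angular_error_deg(pred, ref):
    """Mean angle, in degrees, between corresponding unit normals.

    pred, ref : (N, 3) arrays of unit-length surface normals."""
    # Dot product of unit vectors = cosine of the angle; clip for safety
    cos = np.clip(np.sum(pred * ref, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())
```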
Normals are dense but carry no depth information:
1 Integrate the new normals with a masked Poisson equation
2 Fuse all the new dense depth maps to obtain the final point cloud
The vector field g consists of the gradients of both functions: the reliable MVS depth f₀ on A, and the gradients derived from the predicted normals elsewhere (the symbol f₀ is ours):

∀x ∈ Ω : g(x) = ∇f₀(x) if x ∈ A, else the normal-derived gradient   (1)

Find an interpolant f over Ω \ A that minimizes the squared error

min_f ‖∇f − g‖²   (2)

This leads to the Poisson equation

Δf = div g   (3)

solved with Gauss–Seidel + Successive Over-Relaxation (SOR) in a few seconds.
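A hedged sketch of the masked solve (function name and discretization details are ours): the 5-point Laplacian is relaxed only on the free pixels Ω \ A, while pixels in the reliable set A keep their MVS depth fixed, and over-relaxation accelerates plain Gauss–Seidel:

```python
import numpy as np

def poisson_sor(f, div_g, free, omega=1.9, iters=2000):
    """Solve the discrete Poisson equation lap(f) = div_g in place.

    f     : (H, W) depth map; pixels where `free` is False stay fixed (set A).
    free  : (H, W) bool mask of unknowns (Omega \\ A), assumed off the border.
    div_g : (H, W) divergence of the guidance gradient field g."""
    ys, xs = np.nonzero(free)
    for _ in range(iters):
        for y, x in zip(ys, xs):
            # Gauss-Seidel update from the 5-point stencil:
            #   f[y-1,x] + f[y+1,x] + f[y,x-1] + f[y,x+1] - 4 f[y,x] = div_g
            s = f[y - 1, x] + f[y + 1, x] + f[y, x - 1] + f[y, x + 1]
            f_new = (s - div_g[y, x]) / 4.0
            f[y, x] = (1 - omega) * f[y, x] + omega * f_new  # SOR step
    return f
```

With 0 < omega < 2 SOR converges for this problem; omega near 2 is what makes the sequential Gauss–Seidel sweep fast enough to finish in seconds.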
Learned multi-patch similarity
ICCV 2017
A crucial component of stereo reconstruction is the matching function: similarity
In 2-view stereo matching, similarity is uniquely determined: left vs. right
But what about multi-view stereo? No direct solution → it is common and robust to average pairwise scores
Idea
We train a CNN that directly learns a similarity score from multiple patches
Multi-branch Siamese network with shared weights and average aggregation
Cast as a binary classification problem
Figure: network architecture. Per branch (shared weights): conv1 → TanH1 → pool1 → conv2 → TanH2 → pool2; branch outputs averaged (mean); then Convolutional Layers 3–5 with ReLU 3, ReLU 4 and a final Softmax, producing a similarity score in [0, 1].
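The key structural property can be sketched without any deep-learning framework (the function names here are ours; `branch` and `head` stand in for the learned convolutional stages):

```python
import numpy as np

def multi_patch_similarity(patches, branch, head):
    """Order- and count-invariant multi-patch similarity (illustrative sketch).

    patches : list of (h, w) image patches, one per view.
    branch  : shared per-branch feature extractor, patch -> (d,) vector
              (same function for every branch = shared weights).
    head    : shared classifier mapping the mean feature to a score in [0, 1]."""
    feats = np.stack([branch(p) for p in patches])
    return head(feats.mean(axis=0))   # mean aggregation over branches
```

Because the branch outputs are averaged, the score does not depend on the order of the views, and (as the later slide on branch averaging notes) the same network can take a different number of views without retraining.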
We directly extract a set of patches from 3D data points backprojected onto the images:
Positive examples are obtained by cropping a rectangle around the backprojection of the correct 3D depth in the other views
Negative examples are extracted at points far from the real depth, but still on the epipolar lines
Roughly 15 million positive and negative examples are used
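The geometry behind this sampling can be sketched as follows (a simplified pinhole model with our own function name; the paper's actual cropping and sampling code is not shown here): a reference pixel backprojected at the true depth lands on the matching patch in another view, while a wrong depth slides the projection along the same epipolar line, yielding a negative.

```python
import numpy as np

def project_at_depth(u, v, d, K, R, t, K2):
    """Project reference pixel (u, v), hypothesized at depth d, into a
    second view with intrinsics K2 and relative pose (R, t).
    Varying d moves the result along the epipolar line, which is how
    negatives 'far from the real depth but still on the epipolar line' arise."""
    X = d * np.linalg.inv(K) @ np.array([u, v, 1.0])   # backproject to 3D
    x = K2 @ (R @ X + t)                               # project into view 2
    return x[:2] / x[2]                                # dehomogenize
```

A positive patch is cropped around the projection at the true depth; a negative around the projection at a clearly wrong depth.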
To compare our method directly, we modified a standard plane-sweeping algorithm to use our similarity score
For each point, to find the correct depth we test planes at all candidate depth values and pick the one with the highest similarity
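The winner-take-all selection over swept planes reduces to an argmax; a minimal sketch (the scoring function is supplied by the caller, here standing in for the learned multi-patch similarity):

```python
import numpy as np

def plane_sweep_depth(depth_candidates, score_fn):
    """Pick, per pixel, the depth whose swept plane scores highest.

    depth_candidates : (D,) tested depth values.
    score_fn : d -> (H, W) similarity of every pixel when all views are
               warped onto the plane at depth d."""
    scores = np.stack([score_fn(d) for d in depth_candidates])  # (D, H, W)
    return depth_candidates[np.argmax(scores, axis=0)]          # winner take all
```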
Benefits of joint similarity computation:
Direct similarity computation across all the patches at once
The reference camera does not have a privileged role → robust to occlusions w.r.t. the reference view
Benefits of branch averaging Matching different numbers of viewpoints with the similarity network can be done without retraining:
Figure: Input, 3, 5, 9 views
The learned similarity generalizes to a different test environment:
Figure: Fountain from Strecha dataset
             Accuracy         Completeness
             Mean    Median   Mean    Median
BIRD
  SAD        2.452   0.380    4.035   1.105
  ZNCC       1.375   0.365    4.253   1.332
  SIFT       1.594   0.415    5.269   1.845
  LIFT       1.844   0.562    4.387   1.410
  OUR concat 1.605   0.305    4.358   1.133
  OUR        1.881   0.271    4.167   1.044
FLOWER
  SAD        2.537   1.143    2.768   1.407
  ZNCC       2.018   1.106    2.920   1.467
  SIFT       2.795   1.183    4.747   2.480
  LIFT       3.049   1.420    4.224   2.358
  OUR concat 2.033   0.843    2.609   1.267
  OUR        1.973   0.771    2.609   1.208
CAN
  SAD        1.824   0.664    2.283   1.156
  ZNCC       1.187   0.628    2.092   1.098
  SIFT       1.769   0.874    3.067   1.726
  LIFT       2.411   1.207    3.003   1.823
  OUR concat 1.082   0.477    1.896   0.833
  OUR        1.123   0.478    1.982   0.874
BUDDHA
  SAD        0.849   0.250    1.119   0.561
  ZNCC       0.688   0.299    1.208   0.656
  SIFT       0.696   0.263    1.347   0.618
  LIFT       0.688   0.299    1.208   0.656
  OUR concat 0.682   0.231    1.017   0.473
  OUR        0.637   0.206    1.057   0.475
Conclusions
Unsupervised normal estimation works to improve MVS
Dataset-specific models are better than generic ones when we are self-supervised
A similarity score can be trained jointly and proves better than hand-crafted features
A pure end-to-end multi-view stereo network is not there yet
Deep learning applied to 3D reconstruction is an unsolved and open problem
gipuma code at http://github.com/kysucix/gipuma
eth3d benchmark at http://eth3d.net