 
Towards Deep Multi-View Stereo
Silvano Galliani
October 2, 2017

Multi View Stereo

Outline
1. Gipuma: massively parallel multi-view stereo
2. Unsupervised normal prediction for improved multi-view reconstruction
3. Learned multi-patch similarity
4. Conclusions

Gipuma: massively parallel multi-view stereo
Gipuma: Geometry-based multi-view stereo reconstruction
S. Galliani, K. Lasinger, K. Schindler, ICCV 2015

1. Accurate multi-view stereo reconstruction
2. Highly efficient open-source GPU implementation: correspondence over ten 2 MPix images in 1.6 sec (previously 3 sec)

Our approach:
1. Estimate depth and fit a patch per view, treating each view in turn as the reference camera
2. Fuse the depth maps in space to obtain the final reconstruction

Multi-view stereopsis
Approximate randomized search for the depth and normal that minimize a local matching error:
Initialize every pixel with a random normal
Then:
  Propagate planes locally, keeping a propagated plane whenever it lowers the cost
  Locally optimize the normal
  Repeat (8 iterations are enough)
Similar to belief propagation

Why it's fast
1. Red-black diffusion of planes → maximum parallelization on the GPU
2. Candidates from a bigger neighborhood → faster convergence

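The red-black idea can be illustrated with a toy sketch. This is not the Gipuma implementation: it propagates a single scalar depth hypothesis per pixel instead of a full plane, uses only the 4-neighbourhood rather than the bigger neighborhood mentioned on the slide, and runs sequentially on the CPU. The function name red_black_sweeps, the cost callback and the perturbation range are all assumptions made for illustration.

```python
import random


def red_black_sweeps(depth, cost, iterations=8, seed=0):
    """Toy red-black propagation of per-pixel depth hypotheses.

    depth: 2D list with the initial (random) hypothesis of every pixel.
    cost(y, x, d): hypothetical matching cost of hypothesis d at pixel (y, x).
    """
    rng = random.Random(seed)
    height, width = len(depth), len(depth[0])
    depth = [row[:] for row in depth]  # work on a copy
    for _ in range(iterations):
        for colour in (0, 1):  # 0 = "red" pixels, 1 = "black" pixels
            # Pixels of one colour only read pixels of the other colour,
            # so on a GPU this double loop could run fully in parallel.
            for y in range(height):
                for x in range(width):
                    if (x + y) % 2 != colour:
                        continue
                    best = depth[y][x]
                    best_cost = cost(y, x, best)
                    # Propagation: try the hypotheses of the 4 neighbours.
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width:
                            c = cost(y, x, depth[ny][nx])
                            if c < best_cost:
                                best, best_cost = depth[ny][nx], c
                    # Refinement: keep a small random perturbation if it helps.
                    candidate = best + rng.uniform(-0.05, 0.05)
                    if cost(y, x, candidate) < best_cost:
                        best = candidate
                    depth[y][x] = best
    return depth
```

Because a hypothesis is replaced only when the cost strictly decreases, the per-pixel cost never increases from one sweep to the next.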
Depth map fusion
Fusion of depth and normal maps from different views into one 3D point cloud
Consistency check on depth (f_ε) and normal (f_ang) in at least f_con views
Average of reliable points (depth + normal)
Tunable: trade accuracy against completeness by adjusting f_ε, f_ang and f_con

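The consistency check can be sketched for a single reference pixel as follows. This is a minimal sketch under several assumptions: the estimates from the other views are taken to be already reprojected into the reference camera, f_con is interpreted as the number of other views that must agree, and the helper names angle_deg and fuse_point are hypothetical.

```python
import math


def angle_deg(n1, n2):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def fuse_point(ref, others, f_eps, f_ang, f_con):
    """Fuse one reference estimate with the estimates of other views.

    ref and each entry of others are (depth, normal) pairs, assumed to be
    expressed in the reference camera. Returns the averaged (depth, normal),
    or None if fewer than f_con other views are consistent.
    """
    ref_depth, ref_normal = ref
    consistent = [ref]
    for depth, normal in others:
        depth_ok = abs(depth - ref_depth) <= f_eps
        normal_ok = angle_deg(normal, ref_normal) <= f_ang
        if depth_ok and normal_ok:
            consistent.append((depth, normal))
    if len(consistent) - 1 < f_con:
        return None  # not enough agreeing views: discard the point
    mean_depth = sum(d for d, _ in consistent) / len(consistent)
    summed = [sum(n[i] for _, n in consistent) for i in range(3)]
    length = math.sqrt(sum(c * c for c in summed))
    mean_normal = tuple(c / length for c in summed)
    return mean_depth, mean_normal
```

Loosening f_eps and f_ang or lowering f_con keeps more points (more complete), while tightening them keeps only well-supported points (more accurate).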
Results on old Middlebury Benchmark

Current MVS dataset: DTU
Large-scale multi-view dataset
80 different objects, each covered by 49–64 images of resolution 1600 × 1200 pixels (≈ 2 million pixels)
≈ 1.6 seconds per depth map with fast settings (previously ≈ 3)
≈ 13 seconds per depth map with accurate settings (previously ≈ 50)

Results on DTU Dataset

                           Acc.             Comp.
                        Mean    Med.     Mean    Med.
Points
  ours                 0.273   0.196    0.687   0.260
  ours comp            0.379   0.234    0.400   0.188
  ours fast            0.291   0.208    0.825   0.279
  tola [Tol-10]        0.307   0.198    1.097   0.456
  furu [Fur-10]        0.605   0.321    0.842   0.431
  camp [Cam-08]        0.753   0.480    0.540   0.179
Surfaces
  ours                 0.363   0.215    0.766   0.329
  ours comp            0.631   0.262    0.519   0.309
  ours fast            0.366   0.223    0.900   0.347
  tola [Tol-10]        0.488   0.244    0.974   0.382
  furu [Fur-10]        1.299   0.534    0.702   0.405
  camp [Cam-08]        1.411   0.579    0.562   0.322

Figure: Ground truth, textured reconstruction, reconstructed triangulation

New dataset and online benchmark
New (multi-view) stereo and video benchmark on unstructured scenes:
  SLR camera images
  Multi-field-of-view stereo rig video and images
  Training dataset available
Presented at CVPR 2017: eth3d.net

Unsupervised normal prediction for improved multi-view reconstruction
Just Look at the Image: Unsupervised normal prediction for improved multi-view reconstruction
S. Galliani, K. Schindler, CVPR 2016

Multi View Stereo: failure cases
Common failure modes for MVS
Ambiguous matches:
  Occlusions
  Lack of texture in homogeneous regions

Just look at the image
Dichotomy:
  Stereo correspondences: more accurate in textured regions with many large image gradients
  Shape-from-shading: typically more robust in flat regions with no albedo variations

Holy grail of Multi View Stereo
Idea: complement MVS with shading information

Explicit modeling of surface, light and material properties is an under-constrained problem (light position, light color, light intensity, reflectance function)
Discriminative approach

2 Observations:
1. Shading depends on surface orientation, not on depth
2. Specific light interactions can be view-dependent: we rule out viewpoint-based variations such as specularities, occluding edges, etc.
We learn the relation between image and surface normal
We train a separate model for each view

Unsupervised
We start with a reliable MVS reconstruction from Gipuma
For every image, we use that reconstruction as training data for a CNN that predicts the surface normal from the RGB patch around a point

Unsupervised online training for every image
We use a Convolutional Neural Network trained to minimize the error between training and predicted normals
Accurate results w.r.t. the training data
Joint training of one model did not work
Mean error: 18° predicted, 11° MVS
Mean of median error: 16° predicted, 9° MVS

Surface normal integration
Normals are dense but carry no depth information
1. Integrate the new normals with a masked Poisson equation
2. Fuse all the new dense depth maps to obtain the final point cloud

Normal integration
The vector field g consists of the gradients of both functions,

\forall x \in \Omega: \quad g(x) = \begin{cases} \nabla f_{mvs}, & \text{if } x \in A \\ \nabla f, & \text{else} \end{cases} \qquad (1)

Find an interpolant f over Ω \ A that minimizes the squared error

\min_f \iint_{\Omega \setminus A} \| \nabla f - g \|^2 \qquad (2)

This leads to the Poisson equation

\Delta f = \operatorname{div} g, \qquad (3)

solved with Gauss-Seidel + Successive Over-Relaxation (SOR) in a few seconds.

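A Gauss-Seidel solver with SOR for a discrete Poisson equation can be sketched as below. This is a generic textbook sketch, not the authors' code: it assumes unit grid spacing and the standard 5-point Laplacian, takes the right-hand side rhs (standing in for div g) as already computed, treats the image border as fixed, and uses a hypothetical boolean mask fixed to mark pixels whose values are kept (standing in for the region A).

```python
def solve_poisson_sor(f, rhs, fixed, omega=1.5, iterations=200):
    """Solve the 5-point discrete Poisson equation  laplace(f) = rhs.

    f:     2D list with initial values; border and fixed pixels are kept.
    rhs:   2D list with the right-hand side (stand-in for div g).
    fixed: 2D list of booleans marking pixels that must not change.
    omega: over-relaxation factor, 0 < omega < 2 (omega = 1 is Gauss-Seidel).
    """
    height, width = len(f), len(f[0])
    f = [row[:] for row in f]  # work on a copy
    for _ in range(iterations):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                if fixed[y][x]:
                    continue
                # Gauss-Seidel value from the most recent neighbour values.
                gs = (f[y - 1][x] + f[y + 1][x]
                      + f[y][x - 1] + f[y][x + 1] - rhs[y][x]) / 4.0
                # Over-relaxation: step past the Gauss-Seidel value.
                f[y][x] = (1.0 - omega) * f[y][x] + omega * gs
    return f
```

With rhs set to zero and the border set to a linear ramp, the interior converges to the same ramp, which makes a convenient sanity check.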
Results (three figure slides)

Learned multi-patch similarity
W. Hartmann, S. Galliani, M. Havlena, L. Van Gool, K. Schindler, ICCV 2017

Learned Multi-Patch Similarity
A crucial component of stereo reconstruction is the matching function.
Similarity: in 2-view stereo matching, similarity is uniquely determined (left vs. right)
But what about Multi View Stereo?
No direct solution → it is common and robust to average pairwise scores
Idea: learn a similarity score across all the views

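The baseline of averaging pairwise scores can be sketched as follows. The slide does not name the pairwise measure; normalized cross-correlation (NCC) is used here as an assumption, patches are flattened to plain lists of intensities, and the function names are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equally sized, flattened patches."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0  # constant patch: no information


def mean_pairwise_similarity(reference, others):
    """Average the pairwise scores between the reference and every other view."""
    scores = [ncc(reference, patch) for patch in others]
    return sum(scores) / len(scores)
```

Each view contributes only through its pairwise score with the reference; the learned similarity is meant to replace this with one score computed from all patches jointly.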
Learned Multi-Patch Similarity
We train a CNN that directly learns a similarity score from multiple patches
Multi-branch siamese network with shared weights and average aggregation
Cast as a binary classification problem
Figure: network architecture. Five branches, each conv1 → TanH1 → pool1 → conv2 → TanH2 → pool2; mean over the branches; then Convolutional Layer 3 → ReLU 3 → Convolutional Layer 4 → ReLU 4 → Convolutional Layer 5 → Softmax (output in 0..1)

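The effect of shared weights plus mean aggregation can be shown with a toy sketch. It is not the network from the slide: each branch is reduced to one linear layer with tanh instead of convolutions and pooling, the decision head is a single linear layer, and a sigmoid replaces the two-class softmax (the two are equivalent for binary classification). All names and weights are hypothetical.

```python
import math


def branch(patch, branch_weights):
    """One siamese branch: the same weights are applied to every patch."""
    return [math.tanh(sum(w * p for w, p in zip(row, patch)))
            for row in branch_weights]


def aggregate_mean(features):
    """Average the branch outputs feature by feature."""
    count = len(features)
    return [sum(f[i] for f in features) / count
            for i in range(len(features[0]))]


def similarity(patches, branch_weights, head_weights):
    """Similarity score in (0, 1) for a set of flattened patches."""
    features = [branch(p, branch_weights) for p in patches]
    pooled = aggregate_mean(features)
    logit = sum(w * x for w, x in zip(head_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))
```

Because the branches share weights and their outputs are averaged, the score does not depend on the order of the patches, and the same weights can be applied to a different number of views.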
We don't learn SIFT → training data comes from ground truth
We directly extract a set of patches from 3D data points backprojected onto the images:
  Positive examples: rectangles cropped around the backprojection of the correct 3D depth in the other views
  Negative examples: extracted from points far from the real depth but still on the epipolar lines
Roughly 15 million positive and negative examples are used