3D Vision, Andreas Geiger and Torsten Sattler, Spring 2017


SLIDE 1

3D Vision

Andreas Geiger and Torsten Sattler

Spring 2017

SLIDE 2

3D Vision

  • Understanding geometric relations
      • between images and the 3D world
      • between images
  • Obtaining 3D information describing our 3D world
      • from images
      • from dedicated sensors
SLIDE 3

3D Vision

  • Extremely important in robotics and AR / VR
      • Visual navigation
      • Sensing / mapping the environment
      • Obstacle detection, …
  • Many further application areas
  • A few examples …
SLIDE 4

Google Tango

SLIDE 5

Google Tango

SLIDE 6

Image-Based Localization

SLIDE 7

Geo-Tagging Holiday Photos

(Li et al. ECCV 2012)

SLIDE 8

Augmented Reality

(Middelberg et al. ECCV 2014)

SLIDE 9

Video credit: Johannes Schönberger

Large-Scale Structure-from-Motion

SLIDE 10

Virtual Tourism

SLIDE 11

UNC/UKY UrbanScape project

3D Urban Modeling

SLIDE 12

3D Urban Modeling

SLIDE 13

Mobile Phone 3D Scanner

SLIDE 14

Mobile Phone 3D Scanner

SLIDE 15

Self-Driving Cars

SLIDE 16

Self-Driving Cars

SLIDE 17

Self-Driving Cars

SLIDE 18

Micro Aerial Vehicles

SLIDE 19

Microsoft HoloLens

Mixed Reality

SLIDE 20

Virtual Reality

SLIDE 21

Raw Kinect Output: Color + Depth

http://grouplab.cpsc.ucalgary.ca/cookbook/index.php/Technologies/Kinect

SLIDE 22

Human-Machine Interface

SLIDE 23

3D Video with Kinect

SLIDE 24

Autonomous Micro-Helicopter Navigation

Use Kinect to map out obstacles and avoid collisions

SLIDE 25

Dynamic Reconstruction

SLIDE 26

Performance Capture

SLIDE 27

Performance Capture

(Oswald et al. ECCV 14)

SLIDE 28

Performance Capture

SLIDE 29

Motion Capture

SLIDE 30

Interactive 3D Modeling

(Sinha et al. Siggraph Asia 08) collaboration with Microsoft Research (and licensed to

SLIDE 31

Scanning Industrial Sites

As-built 3D model of an off-shore oil platform

SLIDE 32

Scanning Cultural Heritage

SLIDE 33

Cultural Heritage

Stanford’s Digital Michelangelo

Digital archive, art historical studies

SLIDE 34

Accuracy ~1/500 from DV video (i.e., 140 KB JPEGs, 576×720)

Archaeology

SLIDE 35

Forensics

  • Crime scene recording and analysis
SLIDE 36

Forensics

SLIDE 37

Sports

SLIDE 38

Surgery

SLIDE 39

Johannes Schönberger CAB G 85.1

jsch@inf.ethz.ch

Andreas Geiger CNB G105

andreas.geiger@inf.ethz.ch

Torsten Sattler CNB 104

torsten.sattler@inf.ethz.ch

Federico Camposeco CAB G 86.3

federico.camposeco@inf.ethz.ch

Peidong Liu CAB G 84.2

peidong.liu@inf.ethz.ch

Pablo Speciale CAB G 86.3

pablo.speciale@inf.ethz.ch

3D Vision Course Team

SLIDE 40
  • To understand the concepts that relate images to the 3D world and images to other images
  • Explore the state of the art in 3D vision
  • Implement a 3D vision system/algorithm

Course Objectives

SLIDE 41

Learning Approach

  • Introductory lectures:
      • Cover basic 3D vision concepts and approaches.
  • Further lectures:
      • Short introduction to the topic
      • Paper presentations (you): seminal papers and state of the art, related to your projects
  • 3D vision project:
      • Choose topic, define scope (by week 4)
      • Implement algorithm/system
      • Presentation/demo and paper report

Grade distribution

  • Paper presentation & discussions: 25%
  • 3D vision project & report: 75%
SLIDE 42

Slides and more

http://www.cvg.ethz.ch/teaching/3dvision/

Also check out on-line “shape-from-video” tutorial:

http://www.cs.unc.edu/~marc/tutorial.pdf and http://www.cs.unc.edu/~marc/tutorial/

Textbooks:

  • Hartley & Zisserman, Multiple View Geometry
  • Szeliski, Computer Vision: Algorithms and Applications

Materials

SLIDE 43

Feb 20  Introduction
Feb 27  Geometry, Camera Model, Calibration
Mar 6   Features, Tracking / Matching
Mar 13  Project Proposals by Students
Mar 20  Structure from Motion (SfM) + papers
Mar 27  Dense Correspondence (stereo / optical flow) + papers
Apr 3   Bundle Adjustment & SLAM + papers
Apr 10  Student Midterm Presentations
Apr 17  Easter break
Apr 24  Multi-View Stereo & Volumetric Modeling + papers
May 1   Labour Day
May 8   3D Modeling with Depth Sensors + papers
May 15  3D Scene Understanding + papers
May 22  4D Video & Dynamic Scenes + papers
May 29  Student Project Demo Day = Final Presentations

Schedule

SLIDE 44

Fast Forward

  • Quick overview of what is coming…
SLIDE 45

  • Pinhole camera
  • Geometric transformations in 2D and 3D

Camera Models and Geometry

SLIDE 46
  • Given known 2D/3D correspondences, compute the projection matrix
  • Also radial distortion (non-linear)

Camera Calibration
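The step above, estimating the projection matrix from known 2D/3D correspondences, can be sketched with the classic Direct Linear Transform (DLT). This is a generic NumPy illustration, not course-provided code; the camera values below are made up for the sanity check:

```python
import numpy as np

def dlt_projection_matrix(X, x):
    """Estimate the 3x4 projection matrix P from n >= 6 world-image
    correspondences via the Direct Linear Transform (DLT).
    X: (n, 3) world points, x: (n, 2) image points."""
    A = []
    for (Xw, Yw, Zw), (u, v) in zip(X, x):
        Xh = [Xw, Yw, Zw, 1.0]
        # Two linear constraints per correspondence (rows of A @ p = 0).
        A.append([0, 0, 0, 0] + [-c for c in Xh] + [v * c for c in Xh])
        A.append(Xh + [0, 0, 0, 0] + [-u * c for c in Xh])
    # p is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)

# Sanity check: project random points with a known P, then recover it.
rng = np.random.default_rng(0)
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
Rt = np.hstack([np.eye(3), [[0.1], [0.0], [4.0]]])  # assumed camera pose
P_true = K @ Rt
X = rng.uniform(-1, 1, (10, 3))
xh = (P_true @ np.hstack([X, np.ones((10, 1))]).T).T
x = xh[:, :2] / xh[:, 2:]
P_est = dlt_projection_matrix(X, x)
P_est *= P_true[2, 2] / P_est[2, 2]  # fix the arbitrary scale/sign
```

Radial distortion, as the slide notes, makes the problem non-linear; the usual approach is to use a DLT solution like this as initialization for a non-linear refinement.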

SLIDE 47

Harris corners, KLT features, SIFT features

Key concepts: invariance of extraction and descriptors to viewpoint, exposure and illumination changes

Feature Tracking and Matching

SLIDE 48

[Figure: triangulation. Camera centers C1 and C2 observe image points m1 and m2; the back-projected rays L1 and L2 intersect in the 3D point M]

Triangulation

  • calibration
  • correspondences

3D from Images
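The triangulation step sketched above has a compact linear (DLT) solution once calibration and correspondences are known. A minimal NumPy sketch with a made-up two-camera setup:

```python
import numpy as np

def triangulate(P1, P2, m1, m2):
    """Linear triangulation (DLT): recover the 3D point M whose projections
    through cameras P1, P2 (3x4 matrices) are the pixels m1, m2."""
    A = np.vstack([
        m1[0] * P1[2] - P1[0],
        m1[1] * P1[2] - P1[1],
        m2[0] * P2[2] - P2[0],
        m2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    Mh = Vt[-1]                 # homogeneous solution, up to scale
    return Mh[:3] / Mh[3]

# Two synthetic calibrated cameras and a known point for a sanity check.
K = np.array([[700.0, 0, 300], [0, 700, 200], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), [[-0.5], [0.0], [0.0]]])  # baseline along x
M = np.array([0.2, -0.1, 3.0])

def proj(P, X):
    xh = P @ np.append(X, 1.0)
    return xh[:2] / xh[2]

M_hat = triangulate(P1, P2, proj(P1, M), proj(P2, M))
```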

SLIDE 49

  • Fundamental matrix
  • Essential matrix
  • Also: how to robustly compute them from images

Epipolar Geometry
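The fundamental matrix mentioned above can be estimated with the standard normalized 8-point algorithm. A self-contained NumPy sketch on synthetic correspondences (illustrative only; the robust estimation the slide alludes to, e.g. RANSAC, is omitted):

```python
import numpy as np

def eight_point(x1, x2):
    """Normalized 8-point algorithm: estimate F such that
    x2_h^T @ F @ x1_h ~ 0 for correspondences x1 <-> x2, each (n, 2)."""
    def normalize(x):
        c = x.mean(0)
        s = np.sqrt(2) / np.linalg.norm(x - c, axis=1).mean()
        T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1.0]])
        return np.hstack([x, np.ones((len(x), 1))]) @ T.T, T
    x1h, T1 = normalize(x1)
    x2h, T2 = normalize(x2)
    # Each correspondence gives one linear constraint on the 9 entries of F.
    A = (x2h[:, :, None] * x1h[:, None, :]).reshape(len(x1), 9)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)                  # enforce the rank-2 constraint
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = T2.T @ F @ T1                            # undo the normalization
    return F / np.linalg.norm(F)

# Synthetic two-view geometry to check the epipolar constraint.
rng = np.random.default_rng(2)
K = np.array([[600.0, 0, 320], [0, 600, 240], [0, 0, 1]])
Rz = np.array([[np.cos(0.1), -np.sin(0.1), 0],
               [np.sin(0.1), np.cos(0.1), 0], [0, 0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([Rz, [[0.3], [0.05], [0.02]]])
X = np.hstack([rng.uniform(-1, 1, (12, 2)), rng.uniform(3, 6, (12, 1))])

def project(P, X):
    xh = (P @ np.hstack([X, np.ones((len(X), 1))]).T).T
    return xh[:, :2] / xh[:, 2:]

x1, x2 = project(P1, X), project(P2, X)
F = eight_point(x1, x2)
x1h = np.hstack([x1, np.ones((12, 1))])
x2h = np.hstack([x2, np.ones((12, 1))])
residuals = np.abs(np.sum(x2h * (x1h @ F.T), axis=1))
```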

SLIDE 50

  • Initialize motion (P1, P2 compatible with F)
  • Initialize structure (minimize reprojection error)
  • Extend motion (compute pose through matches seen in 2 or more previous views)
  • Extend structure (initialize new structure, refine existing structure)

Structure from Motion

SLIDE 51
  • Visual Simultaneous Localization and Mapping

Visual SLAM

(Clipp et al. ICCV’09)

SLIDE 52

Stereo and Rectification

  • Warp images to simplify epipolar geometry
  • Compute correspondences for all pixels
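After rectification, correspondence search reduces to a 1D scan along image rows. A deliberately naive SSD block-matching sketch for intuition (not the course implementation; real pipelines add aggregation, subpixel refinement, and regularization):

```python
import numpy as np

def ssd_disparity(left, right, max_disp, w=3):
    """Brute-force block matching on a rectified pair: for every left pixel,
    scan disparities 0..max_disp along the same row of the right image and
    keep the one with the lowest sum of squared differences (SSD)."""
    H, W = left.shape
    disp = np.zeros((H, W), dtype=int)
    for y in range(w, H - w):
        for x in range(w + max_disp, W - w):
            patch = left[y - w:y + w + 1, x - w:x + w + 1]
            costs = [np.sum((patch - right[y - w:y + w + 1,
                                          x - d - w:x - d + w + 1]) ** 2)
                     for d in range(max_disp + 1)]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic check: the right image is the left image shifted by 3 pixels,
# so the recovered disparity should be 3 everywhere in the valid region.
rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, -3, axis=1)
disp = ssd_disparity(left, right, max_disp=5)
```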

SLIDE 53

Multi-View Stereo

SLIDE 54

Joint 3D Reconstruction and Class Segmentation (Häne et al., CVPR 13)

joint reconstruction and segmentation

(ground, building, vegetation, stuff)

reconstruction only

(isotropic smoothness prior)

[Legend: Building, Ground, Vegetation, Clutter]

SLIDE 55

Structured Light

  • Projector modeled as an inverse camera
  • Use specific patterns to obtain correspondences

SLIDE 56

Papers and Discussion

  • Will cover recent state of the art
  • Each student team will present a paper (5 min per team member), followed by discussion

  • “Adversary” to lead the discussion
  • Papers will be related to projects/topics
  • Will distribute papers later

(depending on chosen projects)

SLIDE 57

Projects and reports

  • Project on 3D Vision-related topic
  • Implement algorithm / system
  • Evaluate it
  • Write a report about it
  • 3 Presentations / Demos:
  • Project Proposal Presentation (week 4)
  • Midterm Presentation (week 8)
  • Project Demos (week 15)
  • Ideally: Groups of 3 students
SLIDE 58

Course project example: Build your own 3D scanner!

Example: Bouguet ICCV’98

SLIDE 59

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Benchmarking Relative Pose Solvers

Implement two pose estimation methods and find which is the best algorithm for relative motion plus a vertical direction.

Many algorithms for upright relative motion (camera-to-camera motion with a known vertical direction) exist; the most important ones are [1], [2] and [3] (code online). However, these papers compare only superficially with each other, and currently no survey that compares them fairly exists. In this project you will implement [1,2,3] and compare their run time and accuracy using simulated data (provided by the supervisor). You will also compare them against the state of the art in relative motion without known vertical direction [4] (code online).

Federico Camposeco fede@inf.ethz.ch

Required: MATLAB

[1] Kalantari et al., “A new solution to the relative orientation problem using only 3 points and the vertical direction”, J. of Math. Imaging and Vision 2011
[2] Naroditsky et al., “Two efficient solutions for visual odometry using directional correspondence”, PAMI 2012
[3] Sweeney et al., “Efficient Computation of Absolute Pose for Gravity-Aware Augmented Reality”, ISMAR 2015
[4] Hartley & Li, “An efficient hidden variable approach to minimal-case camera motion estimation”, PAMI 2012
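To give a flavor of the simulated evaluation this project asks for, one might generate a ground-truth upright relative pose (rotation about the vertical axis only, unit translation) and score an estimate by its rotation angle error. `R_est` below is a hypothetical stand-in for a solver's output, not one of the cited methods:

```python
import numpy as np

def rot_z(a):
    """Rotation about the vertical (gravity-aligned) z-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

def rotation_error_deg(R_est, R_true):
    """Angle of the relative rotation between estimate and ground truth."""
    cos = (np.trace(R_est.T @ R_true) - 1) / 2
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Ground-truth upright relative pose: yaw-only rotation, unit translation
# (the translation scale is unobservable in relative pose).
rng = np.random.default_rng(1)
R_true = rot_z(rng.uniform(-np.pi, np.pi))
t_true = rng.normal(size=3)
t_true /= np.linalg.norm(t_true)
# A solver under test would be scored like this; here we inject a known
# 0.2 rad yaw error to exercise the metric.
R_est = rot_z(0.2) @ R_true
err = rotation_error_deg(R_est, R_true)
```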

SLIDE 60

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Simulating Structure from Motion

Build a synthetic SfM generator in MATLAB.

Many applications, such as localization, rely on using a Structure from Motion (SfM) dataset as input. The characteristics of the SfM reconstruction affect how the algorithms built to work on it behave. Currently, real-data SfM reconstructions are the only valid way of validating such applications; however, such data is limited and expensive to reproduce. In this project, you will implement a MATLAB simulator that can build a faithful SfM reconstruction.

Federico Camposeco fede@inf.ethz.ch

Required: MATLAB

SLIDE 61

Semantic 3D reconstruction using a label hierarchy

Goal: leverage a semantic label hierarchy in order to accelerate semantic 3D reconstruction and reduce its memory consumption

Description:

Semantic 3D reconstruction is the task of jointly inferring the 3D geometry and the semantic nature of objects in a scene from images. This can be formulated as a volumetric labeling problem as in [1]. However, [1] is limited in the number of labels that it can reconstruct. Solutions like [2] have been proposed to tackle this issue. Exploiting the hierarchy that exists between labels in a coarse-to-fine way could improve the efficiency of [1]. For instance, wall is a coarse label, refined by windows, itself refined by frames… We propose to explore this idea in order to reduce the memory footprint and speed up the algorithm, leading to an increase in the number of labels one can reconstruct at the same time, and to more efficiency.

[1] Häne et al., Joint 3D scene reconstruction and class segmentation, CVPR ’13
[2] Cherabier et al., Multi-Label semantic 3D reconstruction using voxel blocks, 3DV ’16

Requirements / Tools: Required: C++, Linux; Recommended: Convex Optimization
Supervisors: Ian Cherabier (ian.cherabier@inf.ethz.ch), Martin Oswald (martin.oswald@inf.ethz.ch)

[Figure: label hierarchy example: Furniture, Bed, Night table, Pillow, Sheet, Drawer]

3D Vision, Spring Semester 2017

SLIDE 62

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Detection of repeated structures in SfM

  • Detect large repeated structures in urban sparse point clouds obtained through SfM.

Sparse point clouds of urban scenes obtained through Structure-from-Motion are used in a large number of applications, such as virtual tourism, autonomous navigation, urban scene modelling, etc. However, many of these applications require the localization of a new image against this previously computed point cloud. This task becomes very hard in the presence of repetitive structures in the building. In this project, we aim to use the original images and the sparse point cloud in order to detect all of these repetitive structures and identify them on the point cloud, such that subsequent applications can use this information to improve their performance. We will start by using the SIFT matching-based algorithm described in [1] to detect matching 3D points, and then use guided matching and plane extraction in order to find the whole areas of support of each found symmetry/repetition.

[1] Cohen et al., “Discovering and Exploiting 3D Symmetries in Structure from Motion”, CVPR 2012

Andrea Cohen http://people.inf.ethz.ch/acohen/

Required: C++, OpenCV

SLIDE 63

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Multi Camera System Calibration

  • Automatic calibration routine for a multi-camera stereo setup

An FPGA-based board includes 10 cameras to support wide-field-of-view applications. As the camera arrangement is not restricted to a single use case, a calibration routine is needed. The cameras’ intrinsic and extrinsic parameters need to be estimated in order to perform stereo matching. Based on these parameters, the FPGA board performs real-time correction. Tasks:

  • Calibration routine for intrinsic/extrinsic parameters of one camera pair.
  • Extend routine to support multiple camera pairs.
  • Upload parameters to the FPGA board.

Dominik Honegger / Peidong Liu dominik.honegger@inf.ethz.ch peidong.liu@inf.ethz.ch

OpenCV / C++

[Figure: 8-camera system on FPGA board; output: left, right, disparity]

SLIDE 64

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Segmentation of the Heart in 3D MR Images

Use shape priors to improve state-of-the-art cardiac segmentation on 3D MR images

Automated segmentation of cardiac structures in volumetric MR images can be challenging due to poor image contrast. Current approaches [1] may fail to reconstruct anatomically correct shapes. The goal of this project is to incorporate shape priors into the problem formulation, which have been successfully applied to 3D reconstruction of street scenes [2], heads [3], and other objects.

[1] Koch et al. "Multi-atlas segmentation using partially annotated data: Methods and annotation strategies." arXiv preprint arXiv:1605.00029 (2016) [2] Häne et al. ”Joint 3D Scene Reconstruction and Class Segmentation” CVPR 2013 [3] Maninchedda et al. “Semantic 3D Reconstruction of Heads” ECCV 2016

Lisa Koch (l.koch@imperial.ac.uk), Katarina Tóthová (katarina.tothova@inf.ethz.ch)

Required: C++, some experience with image processing; Recommended: Experience with medical images

[Figure: 2D cross-sections at different locations: MRI (left), labels (right)]

SLIDE 65

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Augmented Reality with Real‐time Physics Simulation

  • To build an augmented reality framework

The goal is to build an augmented reality framework, capable of building the model of the environment (Kinect fusion or any alternative with additional regularization to fill the holes), human interaction (pose estimation from Kinect) and physics (fluid, rigid bodies, smoke) simulation using our real-time data-driven method [1], and rendering the simulation result back to the input image from the camera.

[1] Ladický et al., “Data-Driven Fluid Simulation using Regression Forests”, SIGGRAPH Asia 2015

Ľubor Ladický lubor.ladicky@inf.ethz.ch

C++, CUDA

SLIDE 66

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

High performance and tunable stereo reconstruction

Traditional stereo algorithms have focused on reconstruction quality and have largely avoided optimizing for run-time performance. Robots, on the other hand, require quick maneuverability and efficient computation to observe their immediate environment and perform tasks within it. In this project, the students are expected to implement an existing high-performance and tunable stereo disparity estimation method [1]. The pipeline can potentially enable robots to quickly reconstruct their immediate surroundings and maneuver at high speeds. Please find more details in [1].

Peidong Liu peidong.liu@inf.ethz.ch

Required: C++, OpenCV

[1] Sudeep Pillai et al., “High-Performance and Tunable Stereo Reconstruction”, ICRA 2016

SLIDE 67

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

High speed visual odometry with embedded stereo camera

For aggressive maneuvers of mobile robots, fast localization and mapping is required. Many existing pipelines require intense computational power, which limits their applicability to resource-constrained mobile robots, e.g., micro aerial vehicles (MAVs). In this project, the students are expected to explore the possibility of porting one of these pipelines to a resource-constrained MAV by exploiting an embedded stereo camera. In particular, the students are expected to extend an existing semi-dense/dense visual odometry pipeline to an FPGA-accelerated stereo camera, which can provide high-speed dense depth maps.

Peidong Liu peidong.liu@inf.ethz.ch

Required: C++, OpenCV

[1] J. Engel et al., LSD-SLAM: Large-scale direct monocular SLAM, ECCV 2014
[2] D. Honegger et al., Real-time and low latency embedded computer vision hardware based on a combination of FPGA and mobile CPU, IROS 2014

SLIDE 68

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Depth Map and Normal Direction Fusion

Include surface normal direction likelihood estimates to volumetric depth map fusion

Volumetric depth map fusion takes a set of input depth maps and fuses them into a consistent 3D model. The formulation is posed as a segmentation of the voxel space into free space (exterior of the object) and occupied space (interior of the object). Traditionally, the depth maps provide input about the surface position. The goal of this project is to include likelihoods of surface normal direction in the fusion. Code to produce the input data (depth maps, surface direction likelihoods) and a baseline fusion approach will be provided.

Fabio Maninchedda (fabiom@inf.ethz.ch)

Required: C++, Linux; Recommended: Convex Optimization

[Figure: depth map, surface normals, baseline fusion]

SLIDE 69

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Orthogonal / Trifocal Stereo

  • Robust stereo for man-made environments
  • Stereo is a 1D matching algorithm along a line
  • Line-shaped texture and geometry can lead to unresolvable failures
  • Mostly visible in man-made environments (houses, powerlines, fences)
  • The ambiguity in 1D can be resolved with linearly independent matching in orthogonal directions using 3 or more cameras
  • The scope of the project is to use the CVG stereo / SfM benchmark dataset to assess the robustness of an orthogonal stereo matching setup in comparison with traditional 2-camera stereo

Lorenz Meier, lm@inf.ethz.ch http://people.inf.ethz.ch/lomeier/

Required: C++ / Matlab, some experience with image processing; Recommended: Efficient C++ coding, familiarity with stereo or optical flow and structure-from-motion

SLIDE 70

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Real-time 3D Model Difference Reasoning with HoloLens

Interactive visualization of model differences (current scan vs. pre-loaded model) on the HoloLens.

After loading a reference model to the HoloLens device, the internal mapping of the HoloLens can be used to directly compare the currently scanned model against the pre-loaded reference model. In a first step the reference model needs to be aligned with the current scan (using either ICP [1] or global methods [2]). Significant differences between the models shall then be highlighted in real-time with intuitive augmented reality overlays.

References: [1] Besl, Paul J.; N.D. McKay, A Method for Registration of 3-D Shapes, TPAMI, 1992. [2] Zhou QY., Park J., Koltun V., Fast Global Registration. ECCV, 2016.

Martin Oswald <martin.oswald@inf.ethz.ch> Pablo Speciale <pablo@inf.ethz.ch>

Required: C++/Visual Studio on Windows 10 Recommended: Unity3D and graphics experience
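The ICP-based alignment [1] mentioned in the description alternates nearest-neighbour matching with a closed-form rigid fit (the Kabsch/Procrustes step). A generic NumPy sketch on synthetic data, not HoloLens code:

```python
import numpy as np

def kabsch(A, B):
    """Closed-form least-squares rigid transform (R, t) with R @ a + t ~ b
    for paired point sets A, B of shape (n, 3)."""
    ca, cb = A.mean(0), B.mean(0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cb - R @ ca

def icp(src, dst, iters=20):
    """Point-to-point ICP: alternate brute-force nearest-neighbour matching
    and closed-form re-estimation of the rigid transform."""
    cur = src.copy()
    for _ in range(iters):
        nn = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1).argmin(1)
        R, t = kabsch(cur, dst[nn])
        cur = cur @ R.T + t
    return cur

# Synthetic check: recover a small rigid perturbation of a random cloud.
rng = np.random.default_rng(3)
src = rng.uniform(-1, 1, (30, 3))
a = 0.05  # small yaw, so nearest neighbours are mostly correct
R_true = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a), np.cos(a), 0], [0, 0, 1.0]])
dst = src @ R_true.T + np.array([0.02, -0.01, 0.015])
aligned = icp(src, dst)
err_before = np.linalg.norm(src - dst, axis=1).mean()
err_after = np.linalg.norm(aligned - dst, axis=1).mean()
```

ICP needs a reasonable initialization; the global methods [2] cited in the description are exactly what one would use when the initial misalignment is large.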

SLIDE 71

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Fixing Structure‐from‐Motion (SfM) Reconstructions

  • Structure‐from‐Motion (SfM) algorithms such as [1] reconstruct the 3D

structure of a scene from images. Due to ambiguities, caused e.g. by repetitive structures, SfM algorithms might connect unrelated parts, resulting in duplicated or collapsed structures (see images).

  • The goal of this project is to use free‐space constraints [2] and semantic cues

[3] to detect these errors and, if possible, correct them.

  • [1] Schönberger & Frahm, “Structure‐from‐Motion Revisited”, CVPR 2016
  • [2] Cohen et al., “Indoor‐Outdoor 3D Reconstruction Alignment“, ECCV 2016
  • [3] Cohen et al., “Merging the Unmatchable: Stitching Visually Disconnected SfM Models”, ICCV

2015

Torsten Sattler torsten.sattler@inf.ethz.ch CNB G104

Required: C++

Automatically detect (and resolve) errors in SfM models

SLIDE 72

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Detailed Object Reconstruction Using HoloLens

  • Detailed reconstruction of objects on a HoloLens

Microsoft‘s HoloLens is able to track the movement of a user through the scene and can also capture a 3D mesh of the surrounding environment with its cameras. However, the mesh provided by HoloLens is rather coarse and does not capture fine details. As such, it is not sufficient when scanning fine objects. The goal of this project is to build a 3D object scanner using HoloLens that can also capture fine details. Using the coarse mesh provided by HoloLens as initialization, multi-view stereo refinement techniques such as [1] should be employed to refine the 3D model (of a selected subset of the scene).

[1] Vu et al., “High Accuracy and Visibility-Consistent Dense Multiview Stereo”, PAMI 2012

Torsten Sattler torsten.sattler@inf.ethz.ch CNB G104

Required: C++ / C#, Windows 10, experience with Unity

SLIDE 73

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Learning to navigate without a map

  • Learn a reinforcement policy to navigate from the

start to the goal

The goal of the project is to determine how well deep learning is suited for planning under incomplete information. You will use the policy gradients algorithm [1] to learn to navigate a grid world presented in [2] (best paper at NIPS’16). While [2] assumes a fully observable grid, you will work with the case in which you don’t know where the obstacles are in advance and need to discover them. You will compare this partially observed scenario with the fully observed one from [2] to discover how much more difficult it is. You will also compare the deep net performance with the A-star algorithm [3].

[1] http://karpathy.github.io/2016/05/31/rl/
[2] https://papers.nips.cc/paper/6046-value-iteration-networks.pdf
[3] https://en.wikipedia.org/wiki/A*_search_algorithm
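The A-star baseline [3] is compact enough to sketch directly. This is a generic 4-connected grid version with a Manhattan heuristic; the actual grid-world specifics come from [2]:

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid (0 = free, 1 = obstacle) with the Manhattan
    distance heuristic; returns the shortest path length or None."""
    H, W = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start)]   # entries: (f = g + h, g, cell)
    best = {start: 0}
    while open_set:
        f, g, p = heapq.heappop(open_set)
        if p == goal:
            return g
        if g > best.get(p, float("inf")):
            continue                    # stale queue entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            q = (p[0] + dy, p[1] + dx)
            if 0 <= q[0] < H and 0 <= q[1] < W and grid[q[0]][q[1]] == 0:
                if g + 1 < best.get(q, float("inf")):
                    best[q] = g + 1
                    heapq.heappush(open_set, (g + 1 + h(q), g + 1, q))
    return None

# Toy maze: the wall in the middle row forces a detour around the right side.
grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
length = astar(grid, (0, 0), (2, 0))
```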

Nikolay Savinov nikolay.savinov@inf.ethz.ch

Required: python, machine learning knowledge

SLIDE 74

3D Selfie for Game Avatar

Goal: High-quality reconstruction of face and eyes. Eyes are challenging to reconstruct due to their reflectiveness. We seek a high-quality, lightweight eye capture system using eye priors. Challenges:

  • Use weak priors for face reconstruction
  • Detection and tracking of eyes
  • Guided reconstruction based on pre-captured eyes
  • Implementation on mobile device

Reference: P. Bérard et al.

[Figure: mobile phone reconstruction; pre-captured eyes; guided reconstruction]

Supervisor: Olivier Saurer <saurero@inf.ethz.ch>

SLIDE 75

Dense Reconstruction via Mesh Sweeping

Goal: Design a fast stereo reconstruction algorithm that uses a rough mesh as initialization. The mesh is then refined by sweeping along the normals of the individual faces. Challenges:

  • Mesh sweep
  • Photometric refinement
  • GPU implementation

Supervisor: Olivier Saurer <saurero@inf.ethz.ch>

Reference: https://arxiv.org/pdf/1604.06258.pdf

SLIDE 76

Micro Baseline Stereo

Goal: While a user frames an object to capture an image, parallax is created from the small motion. We want to exploit this “accidental” motion to infer depth and create a 3D reconstruction of the scene. Challenges:

  • Rolling shutter imagery
  • Small baseline stereo (high uncertainty in depth estimates)
  • Implementation on mobile phone (optional)

Supervisor: Olivier Saurer <saurero@inf.ethz.ch>

Reference:

  • High Quality Structure from Small Motion for Rolling Shutter Cameras
  • 3D Reconstruction from Accidental Motion

Courtesy: Sunghoon et al. (High Quality Structure from Small Motion for Rolling Shutter Cameras)

SLIDE 77

Efficient Texturing

Goal: Implement an efficient texturing algorithm that can deal with changing lighting conditions. The texture gets artifacts if lighting changes during image capture, for example when the head turns during scanning. Challenges:

  • Color tone mapping
  • Seam-leveling
  • Fast processing on the phone

Pan & Taubin, 2015

Supervisor: Petri Tanskanen <tpetri@inf.ethz.ch>

SLIDE 78

Goal: The existing live Delaunay reconstruction pipeline should be improved towards AR apps by using geometric constraints for man-made scenes (à la Atlanta world) to capture walls and floors. The result could be further refined with a quick (<1 min) photometric optimization.

Delaunay Scene Reconstruction

Supervisor: Petri Tanskanen <tpetri@inf.ethz.ch>

SLIDE 79

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Wide Baseline Matching Using Novel View Synthesis

  • Solve the problem of wide baseline matching using novel view synthesis from deep networks

Establishing feature correspondences between image pairs is an important task in image-based 3D reconstruction. Traditional approaches can robustly match between images with similar viewpoints but fail to robustly match wide baseline image pairs due to the increasing geometric and radiometric distortions. For example, this leads to disconnected 3D models around buildings, drift in SLAM, etc. Recent advances in unsupervised deep learning have led to the development of variational methods that can synthesize novel viewpoints from existing images. This project aims to leverage existing deep neural network architectures for novel view synthesis for robust wide baseline matching. The basic idea is to generate a smooth sequence of images that can be used to track the motion of keypoints from one image to the other. In the first part, the project should use existing state-of-the-art deep networks and tracking algorithms to develop this method. In the second part, the method should be evaluated against traditional wide baseline matching approaches in terms of matching performance.

[1] DeepStereo: Learning to Predict New Views from the World's Imagery, Flynn et al. [2] Detection and Tracking of Point Features, Tomasi and Kanade [3] Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, Matas et al.

Johannes Schönberger jsch@inf.ethz.ch

Required: Python, Matlab, Ability to execute/combine existing code

SLIDE 80

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Omni‐Directional Stereo for 360° 3D Virtual Reality Video

Recently, Google and Facebook presented software and hardware (Google Jump and Facebook 360) to capture and render panoramic (360°) and stereoscopic (3D) videos that can be viewed using virtual reality viewers such as the Oculus Rift or Google Cardboard. The underlying technology is based on a ring of synchronized cameras and 3D computer vision methods, including stereo, optical flow, etc. This project aims to build a video renderer that takes images from a ring of calibrated and synchronized cameras and outputs a 360° 3D video that can be viewed using e.g. Google Cardboard on YouTube VR.

Johannes Schönberger jsch@inf.ethz.ch

Required: Python or Matlab or C++

  • Implement a video renderer for 360° 3D virtual reality video using a camera ring with omni-directional stereo

SLIDE 81

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Construction Progress Tracking

Compute the difference between two stages in the construction process of a building.

A building is 3D-scanned by drones at different times/stages in the construction process. These scans represent the current state of the building construction. The provided data [1] (for weeks 27, 28, 29 and 30) is:

  • CAD models (as planned).
  • Point Cloud (3D reconstruction [2] from images obtained by drones).

The goal is to obtain the difference between two different stages, which will allow tracking the progress of the construction process. Certain assumptions will be made in order to simplify the problem: all the data will be pre-aligned; some code will be provided to obtain a volumetric representation from the 3D reconstruction and from the CAD models.

[1] Schependomlaan Dataset: https://github.com/openBIMstandards/DataSetSchependomlaan
[2] Schönberger & Frahm, Structure-from-Motion Revisited, CVPR 2016. https://github.com/colmap/colmap

Pablo Speciale <pablo@inf.ethz.ch> Martin Oswald <martin.oswald@inf.ethz.ch>

Required: Matlab / C++

SLIDE 82

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Global Model Alignment

Robust 3D Registration between scanned models and CAD models.

For the purpose of construction progress tracking, we want to register 3D model scans with their corresponding CAD models for a comparison of the planned vs. the actual build progress. To this end, we want to be able to align: 1) scanned model vs. scanned model, 2) scanned model vs. CAD model, 3) CAD model vs. CAD model. For robustness we want to apply global alignment methods [1] and study which features [2] give the best performance. Goals:

  1. Obtain a point cloud (PC) from a CAD model (by sampling) and apply PC vs. PC registration [1].
  2. Transform to a volumetric representation. Investigate which features can be applied.
  3. Transform to a mesh. Investigate which features can be applied.

[1] Zhou QY., Park J., Koltun V. (2016) Fast Global Registration. ECCV 2016. [2] Radu B. Rusu et al. 3D is here: Point Cloud Library (PCL: pointclouds.org). ICRA 2011.

Pablo Speciale <pablo@inf.ethz.ch> Martin Oswald <martin.oswald@inf.ethz.ch>

Required: Matlab / C++; Recommended: PCL library (pointclouds.org)
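Goal 1 above, obtaining a point cloud from a CAD model, is typically done by area-weighted uniform sampling of the mesh triangles. A NumPy sketch on a toy two-triangle mesh (illustrative; PCL provides production-grade versions of this):

```python
import numpy as np

def sample_mesh(vertices, faces, n, rng):
    """Draw n points uniformly from the surface of a triangle mesh:
    pick triangles with probability proportional to area, then draw
    uniform barycentric coordinates via the square-root trick."""
    v = vertices[faces]                       # (m, 3, 3): triangle corners
    cross = np.cross(v[:, 1] - v[:, 0], v[:, 2] - v[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    tri = rng.choice(len(faces), size=n, p=areas / areas.sum())
    r1 = np.sqrt(rng.random(n))
    r2 = rng.random(n)
    a, b, c = 1 - r1, r1 * (1 - r2), r1 * r2  # barycentric weights, sum to 1
    return (a[:, None] * v[tri, 0] + b[:, None] * v[tri, 1]
            + c[:, None] * v[tri, 2])

# Unit square split into two triangles: samples must stay inside it.
V = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], float)
F = np.array([[0, 1, 2], [0, 2, 3]])
pts = sample_mesh(V, F, 1000, np.random.default_rng(0))
```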

SLIDE 83

3D Vision, Spring Semester 2017

Goal:

Requirements / Tools: Supervisor:

Description:

Simple AR Game on Mobile Phone

Create a simple AR game that reconstructs the surface of a table as the playfield

In this project you can build on a real-time tracker and depth map computation as the basis for a simple AR game or app. The goal is to use the depth maps to reconstruct the rough geometry of the game scene such that the game can interact with the real world. You will get a sample Android project that demonstrates the tracking and depth map computation. Depending on the scene, a combination of marching cubes or plane-based scene reconstruction will lead to a dense representation.

Petri Tanskanen (tpetri@inf.ethz.ch)

Required: Java and C++, OpenGL Rendering Recommended: Experience with Android app development

SLIDE 84

Your Own Project

  • Learn about the techniques presented in the lecture
  • Choose your own topic!
  • Available hardware: Google Tango Tablets, Microsoft HoloLens, GoPro Cameras, Intel RealSense Sensor; we find one for you

Required: Related to 3D Vision / topics of the lecture

SLIDE 85

Your Next Steps

  • Find a group (ideally: groups of 3)
  • Find a project (one of ours or your own)
  • Let us know about it:
  • Contact us via the lecture Moodle (preferred)
  • Contact Federico per email
  • First come, first served!
  • Do not contact supervisors directly!
  • Talk with your supervisor
  • Write a project proposal
  • Don’t worry: You’ll get reminders!
SLIDE 86

Feb 20  Introduction
Feb 27  Geometry, Camera Model, Calibration
Mar 6   Features, Tracking / Matching
Mar 13  Project Proposals by Students
Mar 20  Structure from Motion (SfM) + papers
Mar 27  Dense Correspondence (stereo / optical flow) + papers
Apr 3   Bundle Adjustment & SLAM + papers
Apr 10  Student Midterm Presentations
Apr 17  Easter break
Apr 24  Multi-View Stereo & Volumetric Modeling + papers
May 1   Labour Day
May 8   3D Modeling with Depth Sensors + papers
May 15  3D Scene Understanding + papers
May 22  4D Video & Dynamic Scenes + papers
May 29  Student Project Demo Day = Final Presentations

Schedule