Wang Zhao, Shaohui Liu, Yezhi Shu, Yong-Jin Liu (Tsinghua University)
Monocular Depth-Pose Prediction

RGB → Depth and Pose [R, t]

PoseNet fails to generalize!
- Depth estimation in indoor environments with complex camera motions and low texture.
- Visual odometry with unseen camera ego-motions: all trajectories drift!
Joint Learning without PoseNet

Built on top of two-frame structure-from-motion:
- FlowNet provides sampled correspondences; a normalized 8-point solver recovers the relative pose [R, t] and an inlier mask.
- Sampling & triangulation produce a sparse triangulated depth, which is scale-aligned with the DepthNet prediction to compute the loss.
Joint Learning without PoseNet

FlowNet → sampled correspondences → normalized 8-point solver → [R, t] and inlier mask
- Correspondences are sampled based on the occlusion mask and the forward-backward consistency score produced by the optical flow network.
- The 8-point algorithm runs inside a RANSAC loop to robustly recover the relative pose.
- The epipolar distance (inlier mask) is computed and used to further filter out incorrect matches and non-rigid objects.
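The pose-recovery step above can be sketched in NumPy: a normalized 8-point algorithm inside a RANSAC loop, with a Sampson (epipolar) distance inlier mask. This is a minimal illustration; the function names, the iteration count, and the 1-pixel threshold are assumptions for the sketch, not the paper's actual implementation:

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalization: zero mean, average distance sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ pts_h.T).T, T

def eight_point(p1, p2):
    """Normalized 8-point: fundamental matrix F with x2^T F x1 = 0."""
    n1, T1 = normalize_points(p1)
    n2, T2 = normalize_points(p2)
    # Each correspondence gives one row of the linear system A f = 0.
    A = np.stack([
        n2[:, 0] * n1[:, 0], n2[:, 0] * n1[:, 1], n2[:, 0],
        n2[:, 1] * n1[:, 0], n2[:, 1] * n1[:, 1], n2[:, 1],
        n1[:, 0], n1[:, 1], np.ones(len(p1))], axis=1)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint on F.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization.
    return T2.T @ F @ T1

def sampson_dist(F, p1, p2):
    """First-order approximation of the squared epipolar distance."""
    p1h = np.hstack([p1, np.ones((len(p1), 1))])
    p2h = np.hstack([p2, np.ones((len(p2), 1))])
    Fp1 = (F @ p1h.T).T
    Ftp2 = (F.T @ p2h.T).T
    num = np.sum(p2h * Fp1, axis=1) ** 2
    den = Fp1[:, 0]**2 + Fp1[:, 1]**2 + Ftp2[:, 0]**2 + Ftp2[:, 1]**2
    return num / den

def ransac_eight_point(p1, p2, iters=200, thresh=1.0,
                       rng=np.random.default_rng(0)):
    """Robust pose recovery: fit F on minimal 8-match samples,
    keep the model with the largest epipolar-distance inlier mask."""
    best_F, best_inliers = None, None
    for _ in range(iters):
        idx = rng.choice(len(p1), 8, replace=False)
        F = eight_point(p1[idx], p2[idx])
        inliers = sampson_dist(F, p1, p2) < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_F, best_inliers = F, inliers
    return best_F, best_inliers
```

Decomposing the recovered matrix into [R, t] (e.g. via SVD of the essential matrix E = K^T F K) is omitted here for brevity.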
Joint Learning without PoseNet
- We sample 6k matches from the flow for triangulation, according to the occlusion mask, the forward-backward score, and the inlier mask.
- We use mid-point triangulation for its convenience and because it is naturally differentiable.
- A match is abandoned if the angle between the two rays is too small.
[Diagram: flow correspondences + relative pose → sample & triangulate → sparse triangulated depth]
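The mid-point triangulation with the ray-angle test can be sketched as follows. This is a minimal NumPy sketch for a single match; the 1-degree angle threshold is an assumed value, and building the rays from matched pixels via the camera intrinsics and relative pose is omitted:

```python
import numpy as np

def midpoint_triangulate(C1, d1, C2, d2, min_angle_deg=1.0):
    """Mid-point triangulation of two viewing rays (origins C1, C2,
    directions d1, d2). Returns None when the ray angle is too small,
    i.e. the match is abandoned as near-degenerate."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    cos_a = np.clip(d1 @ d2, -1.0, 1.0)
    if np.degrees(np.arccos(cos_a)) < min_angle_deg:
        return None  # rays nearly parallel: depth is unstable
    # Closest points on the two rays: solve [d1, -d2] [s, t]^T = C2 - C1
    # in the least-squares sense, then take the midpoint.
    A = np.stack([d1, -d2], axis=1)
    s, t = np.linalg.lstsq(A, C2 - C1, rcond=None)[0]
    P1 = C1 + s * d1
    P2 = C2 + t * d2
    return 0.5 * (P1 + P2)
```

Because every operation here is a differentiable linear-algebra primitive, the same construction carries gradients when written in an autodiff framework, which is the convenience the slide refers to.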
Joint Learning without PoseNet
- The predicted depth is aligned with the sparse triangulated depth map to obtain a consistent scale.
- A triangulation loss, a depth re-projection loss, and a depth smoothness loss are used to supervise the DepthNet.
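The scale-alignment step can be sketched in NumPy. Using the median ratio between triangulated and predicted depths at valid pixels is one robust choice assumed here for illustration; the slides do not specify the exact estimator:

```python
import numpy as np

def align_scale(pred_depth, tri_depth, mask):
    """Rescale the dense predicted depth so it matches the sparse
    triangulated depth at valid (masked) pixels.
    The median ratio is an assumed robust scale estimate."""
    ratios = tri_depth[mask] / pred_depth[mask]
    s = np.median(ratios)
    return s * pred_depth, s
```

After this alignment, the triangulation and re-projection losses compare quantities that live at a single, consistent scale, so the network itself never has to commit to a particular metric scale.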
Scale Disentanglement

1. The translation 𝒖 of the estimated pose [𝑺, 𝒖] from monocular video is up-to-scale!
2. The monocular depth prediction 𝑬 from the network has a learnt scale.
3. Joint training losses require a consistent scale across the learnt depth and pose.
Scale Disentanglement
PoseNet-based learning system:
RGB input → DepthNet → 𝑬; RGB input → PoseNet → [𝑺, 𝒖]; both feed the loss.
PoseNet needs to learn a translation scale consistent with DepthNet.

Our system:
RGB input → DepthNet → 𝑬; FlowNet + solver → [𝑺, 𝒖]; scale alignment yields 𝑬′, which feeds the loss.
No need for the network to learn a translation scale consistent with DepthNet.
Quantitative Results on KITTI dataset
Our method achieves state-of-the-art performance on KITTI depth and optical flow estimation.