
Tsinghua University – Monocular Depth-Pose Prediction



  1. Wang Zhao, Shaohui Liu, Yezhi Shu, Yong-Jin Liu – Tsinghua University

  2. Monocular Depth-Pose Prediction: an RGB sequence is mapped to depth and pose [R, t].

  3. PoseNet Fails to Generalize! All trajectories drift! Visual odometry breaks down under unseen, complex camera ego-motions, and depth estimation degrades in indoor environments with low texture.

  4. Joint Learning without PoseNet: pipeline overview. A FlowNet produces dense correspondences; sampled correspondences feed a normalized 8-point solver with an inlier mask to recover [R, t]; sampling and triangulation then yield a sparse triangulated depth, which is scale-aligned with the DepthNet prediction to form the loss. Built on top of two-frame structure-from-motion.

  5. Joint Learning without PoseNet: from sampled correspondences to [R, t].
     • Correspondences are sampled based on the occlusion mask and the forward-backward consistency score produced by the optical flow network.
     • The 8-point algorithm is run inside a RANSAC loop to robustly recover the relative pose from the normalized correspondences.
     • The epipolar distance (inlier mask) is computed and used to further filter out incorrect matches and non-rigid objects.
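
To make the pose-recovery step concrete, here is a minimal Python/OpenCV sketch under stated assumptions: pixel matches and per-match scores have already been pulled out of the flow, OpenCV's RANSAC fundamental-matrix estimator stands in for the paper's differentiable normalized 8-point solver, and the consistency threshold is an illustrative value, not one from the paper.

    import cv2
    import numpy as np

    def recover_relative_pose(pts1, pts2, K, fb_score, occ_mask, score_thresh=0.9):
        # pts1, pts2: (N, 2) pixel matches sampled from the optical flow.
        # fb_score:   (N,) forward-backward consistency score in [0, 1].
        # occ_mask:   (N,) True where the match is non-occluded.
        # Keep only non-occluded, forward-backward-consistent matches.
        keep = occ_mask & (fb_score > score_thresh)
        p1 = pts1[keep].astype(np.float64)
        p2 = pts2[keep].astype(np.float64)

        # Robust epipolar geometry via RANSAC; the returned mask is the
        # epipolar-distance inlier mask that also rejects non-rigid points.
        F, inliers = cv2.findFundamentalMat(
            p1, p2, cv2.FM_RANSAC, ransacReprojThreshold=1.0, confidence=0.999)
        inliers = inliers.ravel().astype(bool)

        # Decompose the essential matrix into the relative pose [R, t].
        # Note: t is unit-norm, i.e. translation is recovered only up to scale.
        E = K.T @ F @ K
        _, R, t, _ = cv2.recoverPose(E, p1[inliers], p2[inliers], K)
        return R, t, inliers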

  6. Joint Learning without PoseNet: sampling and triangulation. Flow correspondences and the relative pose [R, t] are triangulated into a sparse depth (see the sketch below).
     • We sample 6k matches from the flow to triangulate, according to the occlusion mask, the forward-backward score, and the inlier mask.
     • We use midpoint triangulation because it is convenient and naturally differentiable.
     • A match is abandoned if the angle between the two rays is too small.
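
A minimal NumPy sketch of midpoint triangulation with the ray-angle filter described above. The function name, the convention that [R, t] maps frame-1 points to frame 2, and the 1-degree threshold are illustrative assumptions; the paper's batched, differentiable PyTorch version differs in detail.

    import numpy as np

    def midpoint_triangulate(x1, x2, R, t, K, min_angle_deg=1.0):
        # x1, x2: (N, 2) pixel matches; [R, t] maps frame-1 points to frame 2.
        n = x1.shape[0]
        homo = lambda x: np.hstack([x, np.ones((n, 1))])
        Kinv = np.linalg.inv(K)

        # Unit ray directions, both expressed in frame-1 coordinates.
        d1 = (Kinv @ homo(x1).T).T
        d1 /= np.linalg.norm(d1, axis=1, keepdims=True)
        d2 = (R.T @ Kinv @ homo(x2).T).T
        d2 /= np.linalg.norm(d2, axis=1, keepdims=True)
        o1 = np.zeros(3)                     # camera-1 center
        o2 = (-R.T @ t).ravel()              # camera-2 center in frame 1

        # Abandon near-parallel rays: the midpoint is ill-conditioned there.
        cos_ang = np.clip(np.sum(d1 * d2, axis=1), -1.0, 1.0)
        valid = np.degrees(np.arccos(cos_ang)) > min_angle_deg

        # Closest points on the two rays (least-squares ray parameters).
        b = o2 - o1
        d = np.sum(d1 * d2, axis=1)
        denom = 1.0 - d ** 2 + 1e-12
        s1 = (d1 @ b - d * (d2 @ b)) / denom
        s2 = (d * (d1 @ b) - d2 @ b) / denom

        # Midpoint of the shortest segment between the rays; depth in
        # frame 1 is the z-coordinate of these points.
        pts = 0.5 * ((o1 + s1[:, None] * d1) + (o2 + s2[:, None] * d2))
        return pts, valid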

  7. Joint Learning without PoseNet: scale alignment and depth supervision.
     • The predicted depth is aligned with the sparse triangulated depth map so the two share a consistent scale.
     • The triangulation loss, the depth reprojection loss, and the depth smoothness loss are used to supervise the DepthNet.
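
One simple way to realize the alignment, sketched below under the assumption of a single global scale factor: take the median ratio between the triangulated depth and the prediction over valid pixels (the exact alignment used in the paper may differ). With the prediction rescaled this way, the three losses above all operate at the pose's scale.

    import numpy as np

    def align_scale(pred_depth, tri_depth, tri_mask, eps=1e-8):
        # pred_depth: (H, W) depth from the DepthNet (learnt scale).
        # tri_depth:  (H, W) sparse triangulated depth (pose scale).
        # tri_mask:   (H, W) True where a triangulated value exists.
        ratio = tri_depth[tri_mask] / (pred_depth[tri_mask] + eps)
        scale = np.median(ratio)             # robust to outlier matches
        return scale * pred_depth, scale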

  8. Scale Disentanglement
     1. The translation t of the pose [R, t] estimated from monocular video is only recovered up to scale.
     2. The monocular depth prediction D from the network has a learnt scale.
     3. The joint training losses require a consistent scale across the learnt depth and pose.
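
In symbols (a sketch with assumed notation, writing depth as D(p) for pixel p and projection as π):

    % Up-to-scale ambiguity (any alpha > 0 fits the images equally well):
    %   t -> alpha t,  D -> alpha D  leaves every reprojection unchanged:
    \pi\big(K(R\,D(p)K^{-1}p + t)\big)
      = \pi\big(K(R\,\alpha D(p)K^{-1}p + \alpha t)\big)
    % The losses therefore need depth and translation to share ONE scale.
    % The triangulated depth inherits the scale of t from the solver, so
    % the alignment step (previous slide) transfers that scale to the
    % DepthNet prediction by construction.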

  9. Scale Disentanglement: system comparison.
     • PoseNet-based learning system: RGB input → DepthNet (depth D) and PoseNet (pose [R, t]) → loss. The PoseNet needs to learn a translation scale consistent with the DepthNet.
     • Our system: RGB input → DepthNet (depth D) and FlowNet + solver (pose [R, t]); scale alignment produces D′ before the loss. No need for the network to learn a translation scale consistent with the DepthNet.

  10. Quantitative Results on the KITTI Dataset: our method achieves state-of-the-art performance on KITTI depth and optical flow estimation.

  11. Robustness Improved – KITTI: visual odometry with unseen camera ego-motion (trajectory comparison, PoseNet-based vs. our system).

  12. Robustness Improved – TUM: visual odometry in indoor environments (trajectory comparison, PoseNet-based vs. our system).

  13. Robustness Improved – NYUv2: depth estimation in indoor environments (qualitative comparison: input image, PoseNet-based, our system).

  14. Robustness Improved – NYUv2: depth estimation in indoor environments (quantitative comparison, PoseNet-based vs. our system). Best performance on NYUv2 among unsupervised methods!

  15. Code and models are available at https://github.com/B1ueber2y/TrianFlow. Check our paper for more details!
