LEARNING RIGIDITY IN DYNAMIC SCENES FOR SCENE FLOW ESTIMATION


Aug 24, 2022


SLIDE 1

Kihwan Kim, Senior Research Scientist

Zhaoyang Lv, Kihwan Kim, Alejandro Troccoli, Deqing Sun, James M. Rehg, Jan Kautz

LEARNING RIGIDITY IN DYNAMIC SCENES FOR SCENE FLOW ESTIMATION

SLIDE 2

CORRESPONDENCES IN COMPUTER VISION

Image courtesy Roy Shilkrot

SLIDE 3

OPTICAL FLOW

Brox and Malik 2011 Fan et al. 2014 Castro M. 2017

SLIDE 4

OPTICAL FLOW AND 3D SCENE FLOW

Brox and Malik 2011

Fan et al. 2014 Castro M. 2017

Letouzey et al 2011

SLIDE 5

APPLICATION OF 3D MOTION

[DynamicFusion, R. Newcombe, CVPR 2016]

3D reconstruction of dynamic scene

[Holoportation, Microsoft 2016]

AR and telepresence

SLIDE 6

APPLICATION OF 3D MOTION

[KITTI Dataset, A. Geiger, PAMI 2014]

3D scene understanding for autonomous driving

[SE3-Net, A. Byravan, ICRA 2017]

Robotics interaction

SLIDE 7

2D OPTICAL FLOW VS 3D SCENE FLOW

Why is 3D motion estimation challenging?

SLIDE 8

STATIC SCENE - MOVING CAMERA

[Diagram labels: images I0 and I1; pixels u0 and u0′; x0; Ω0]

SLIDE 9

STATIC SCENE - MOVING CAMERA

[Diagram labels: images I0 and I1; pixels u0 and u0′; x0; Ω0; δu^{sf0}_{0→1}; δu^{cm}_{0→1} (optical flow from camera motion)]

SLIDE 10

STATIC SCENE - MOVING CAMERA

[Diagram labels: images I0 and I1; pixels u0 and u0′; x0; Ω0; δu^{sf0}_{0→1}; δu^{cm}_{0→1} (optical flow from camera motion)]

Structure (3D) from (camera) Motion
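For a static scene seen by a moving camera, the optical flow at each pixel is fixed by the scene depth and the camera motion. The sketch below illustrates that relationship under assumptions that are not on the slides: a pinhole camera with known intrinsics `K`, a per-pixel depth map for the first frame, and a pose `R`, `t` that maps first-frame camera coordinates into second-frame camera coordinates. The function name `ego_motion_flow` is hypothetical.

```python
import numpy as np

def ego_motion_flow(depth, K, R, t):
    """Optical flow induced purely by camera motion in a static scene.

    depth: (H, W) depth of every pixel in frame 0 (assumed positive).
    K:     (3, 3) pinhole intrinsics (assumed known).
    R, t:  rotation (3, 3) and translation (3,) taking frame-0 camera
           coordinates to frame-1 camera coordinates (assumed convention).
    Returns an (H, W, 2) array of pixel displacements.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                    # back-projected rays
    x0 = rays * depth[..., None]                       # 3D points in frame 0
    x0_in_cam1 = x0 @ R.T + t                          # same points in frame 1
    proj = x0_in_cam1 @ K.T                            # re-project
    u1 = proj[..., :2] / proj[..., 2:3]
    return u1 - pix[..., :2]

# Illustrative values only: a fronto-parallel plane 2 units away and a
# small sideways camera translation.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)
flow = ego_motion_flow(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
print(flow[0, 0])
```

With constant depth and a pure sideways translation, every pixel receives the same horizontal displacement; varying depth would produce the parallax that structure-from-motion exploits.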

SLIDE 11

DYNAMIC SCENE - FIXED CAMERA

[Diagram labels: image I0; pixel u0; x0; Ω0]

SLIDE 12

DYNAMIC SCENE - FIXED CAMERA

[Diagram labels: image I0; pixels u0 and u1′; x0 and x1; Ω0 and Ω1; δu^{sf0}_{0→1}; δu^{sf1}_{0→1} (projected scene flow in I1); δx_{0→1} (scene flow)]

SLIDE 13

DYNAMIC SCENE - FIXED CAMERA

[Diagram labels: image I0; pixels u0 and u1′; x0 and x1; Ω0 and Ω1; δx_{0→1} (scene flow)]
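With a fixed camera, the scene flow is the 3D displacement of a point, and the projected scene flow is the 2D displacement of its image. A minimal sketch of the two quantities follows, assuming a pinhole camera with intrinsics `K` (not given on the slides); `project` and `projected_scene_flow` are hypothetical helper names.

```python
import numpy as np

def project(K, x):
    """Pinhole projection of 3D points x (N, 3) to pixel coordinates (N, 2)."""
    p = x @ K.T
    return p[:, :2] / p[:, 2:3]

def projected_scene_flow(K, x0, x1):
    """Return the 3D scene flow and its 2D projection for a fixed camera."""
    scene_flow = x1 - x0                        # 3D motion of each point
    flow_2d = project(K, x1) - project(K, x0)   # motion of its image
    return scene_flow, flow_2d

# Illustrative values only.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
x0 = np.array([[0.0, 0.0, 2.0],    # point A
               [0.0, 0.0, 2.0]])   # point B, same start
x1 = np.array([[0.2, 0.0, 2.0],    # A moves sideways
               [0.0, 0.0, 3.0]])   # B moves along the optical axis
scene_flow, flow_2d = projected_scene_flow(K, x0, x1)
print(scene_flow)
print(flow_2d)
```

Point B has non-zero scene flow but zero projected flow, which is one reason 3D motion cannot be read off from 2D flow alone.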

SLIDE 14

COMMON VIDEOS NOWADAYS

Giphy.com #gopro, #drone, Sondra.T.

SLIDE 15

DYNAMIC SCENE – MOVING CAMERA

[Diagram labels: image I0; pixel u0; x0; Ω0]

SLIDE 16

DYNAMIC SCENE – MOVING CAMERA

[Diagram labels: images I0 and I1; pixels u0, u1, u0′ and u1′; x0 and x1; Ω0 and Ω1; δx_{0→1}]

SLIDE 17

DYNAMIC SCENE – MOVING CAMERA

[Diagram labels: images I0 and I1; pixels u0, u1, u0′ and u1′; x0 and x1; Ω0 and Ω1; δu^{sf0}_{0→1}; δu^{sf1}_{0→1} (projected scene flow in I1); δu^{of}_{0→1} (optical flow); δu^{cm}_{0→1} (optical flow from camera motion); δx_{0→1}]

SLIDE 18

DYNAMIC SCENE – MOVING CAMERA

[Figure panels: input sequence; optical flow; camera ego-motion flow; (projected) scene flow, or 3D motion field]

[Diagram labels: camera ego-motion; camera pose (transform); rigidity; projected scene flow (3D motion field); optical flow]

SLIDE 19

HOW DO OTHER WORKS SOLVE THIS?

Non-rigid or rigid local motions as outliers

Menze and Geiger. CVPR 2015 Yang et al. ICRA 2011

SLIDE 20

HOW DO OTHER FLOW ALGORITHMS SOLVE THIS?

Jaimez et al. ICRA 2017 Vogel et al. ICCV 2013 Quiroga et al. ECCV 2014 Jaimez et al. 3DV 2015 Wulff et al. CVPR 2017

SLIDE 21

OUR PROPOSAL

Learn which parts of the scene are (likely) rigid or non-rigid

SLIDE 22

PIPELINE

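[Pipeline diagram. Inputs: images I0, I1 and depth maps D0, D1. Blocks: flow network (PWC-Net); Rigidity Transform Network (RTN); refinement in 3D; warping; subtraction. Intermediate and output labels: optical flow; rigidity mask; [R|t]; refined [R|t]; ego-motion flow; estimated projected scene flow]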
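The subtraction step of the pipeline can be illustrated with a toy example: once the ego-motion flow is known, removing it from the optical flow leaves the projected scene flow, which is non-zero only where something moved independently of the camera. This is a sketch on synthetic arrays, not the authors' code; `subtract_ego_motion` and all values are hypothetical.

```python
import numpy as np

def subtract_ego_motion(optical_flow, ego_flow):
    """Residual 2D motion after removing the camera-induced flow."""
    return optical_flow - ego_flow

H, W = 4, 6
# Camera-induced flow: every pixel shifts 2 px to the right (toy values).
ego_flow = np.tile(np.array([2.0, 0.0]), (H, W, 1))
# An independently moving object adds 3 px of vertical motion in a 2x2 patch.
object_motion = np.zeros((H, W, 2))
object_motion[1:3, 2:4] = [0.0, 3.0]
optical_flow = ego_flow + object_motion

residual = subtract_ego_motion(optical_flow, ego_flow)
moving = np.linalg.norm(residual, axis=-1) > 1e-6
print(moving.astype(int))
```

In this toy case the non-zero residual marks exactly the patch that moved on its own, the complement of what a rigidity mask would label as rigid.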

SLIDE 23

RIGIDITY TRANSFORM NETWORK (RTN)

[Network diagram. Inputs: I0, D0, I1, D1. Layers: conv 1-6; deconv 1-5. Outputs: pose regressor (R, t); rigidity attention mask]

SLIDE 24

RIGIDITY TRANSFORM NETWORK (RTN)

[Network diagram. Inputs: I0, D0, I1, D1. Layers: conv 1-6; global average pooling; conv-T; conv-R; deconv 1-5. Outputs and losses: translation t and rotation R (Huber loss); rigidity attention mask (binary cross-entropy loss)]
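The two loss types named on this slide can be written out in a few lines. The sketch below is only an illustration: the rotation is assumed to be a 3-vector, and the Huber threshold `delta`, the weight `w_mask`, and the way the terms are summed are all assumptions, since the slide does not specify them.

```python
import numpy as np

def huber(residual, delta=1.0):
    """Element-wise Huber loss: quadratic near zero, linear in the tails."""
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))

def binary_cross_entropy(prob, target, eps=1e-7):
    """Element-wise binary cross-entropy between probabilities and 0/1 labels."""
    p = np.clip(prob, eps, 1.0 - eps)
    return -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

def rtn_loss(t_pred, t_gt, rot_pred, rot_gt, mask_prob, mask_gt, w_mask=1.0):
    """Hypothetical combination: Huber on pose plus weighted BCE on the mask."""
    pose_term = huber(t_pred - t_gt).sum() + huber(rot_pred - rot_gt).sum()
    mask_term = binary_cross_entropy(mask_prob, mask_gt).mean()
    return pose_term + w_mask * mask_term

# Toy values: perfect pose, an uninformative mask prediction of 0.5 everywhere.
t = np.array([0.1, 0.0, 0.2])
rot = np.array([0.0, 0.05, 0.0])
mask_gt = np.array([[1.0, 0.0], [1.0, 1.0]])
print(rtn_loss(t, t, rot, rot, np.full((2, 2), 0.5), mask_gt))
```

The Huber term limits the influence of large pose errors, while the cross-entropy term treats rigidity as a per-pixel binary classification.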

SLIDE 25

2D OPTICAL FLOW: PWC-NET

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume (Sun et al., CVPR 2018)

SLIDE 26

POSE REFINEMENT AND FLOW

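We solve the following objective with an off-the-shelf Gauss-Newton solver from GTSAM:

\[
[R|t]^{\star} = \arg\min \sum_{(u,v)\in\Omega} [B(u,v)=1]\,[O(u,v)=0]\; L\big(R\,V_0(u+\delta u,\; v+\delta v) + t - V_1(u,v)\big)
\]

Term labels: [B(u,v)=1] is the rigidity mask, [O(u,v)=0] is the occlusion mask, and (u+δu, v+δv) are the flow correspondences.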
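To show how the rigidity and occlusion masks gate which correspondences influence the pose, here is a simplified stand-in for the refinement step. It is not the GTSAM Gauss-Newton solve described on the slide: it swaps in the closed-form weighted Kabsch (SVD) solution, which minimises a squared-error version of a masked objective. `p0` and `p1` are hypothetical N×3 arrays of already-matched 3D points, and all data are synthetic.

```python
import numpy as np

def fit_rigid_transform(p0, p1, weights):
    """Weighted least-squares R, t such that R @ p0[i] + t is close to p1[i]."""
    w = weights / weights.sum()
    c0 = (w[:, None] * p0).sum(axis=0)             # weighted centroids
    c1 = (w[:, None] * p1).sum(axis=0)
    H = (w[:, None] * (p0 - c0)).T @ (p1 - c1)     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c1 - R @ c0
    return R, t

# Synthetic correspondences: 50 points under a known camera transform.
rng = np.random.default_rng(0)
p0 = rng.uniform(-1.0, 1.0, size=(50, 3)) + np.array([0.0, 0.0, 3.0])
angle = 0.1
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.05, -0.02, 0.10])
p1 = p0 @ R_true.T + t_true

# The last 10 points belong to a moving object (non-rigid w.r.t. the scene).
rigid = np.ones(50, dtype=bool)
rigid[40:] = False
p1[40:] += np.array([0.5, 0.0, 0.0])
# Pretend the first 5 points are occluded and must be ignored as well.
occluded = np.zeros(50, dtype=bool)
occluded[:5] = True

weights = (rigid & ~occluded).astype(float)        # the two indicator terms
R_est, t_est = fit_rigid_transform(p0, p1, weights)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

Because the moving points receive zero weight, they do not bias the estimate; with all weights set to one, the recovered pose would absorb part of the object's motion.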

SLIDE 27

SUPERVISION NEEDED

Scene-net RGB-D SLAM benchmark

RGB-D dataset | Layout number | Total images | Scenes  | Pose (GT) | Optical flow (GT) | Segmentation (GT) | Photorealistic | Depth realistic
Scene-net     | 47            | 5.1M         | static  | Yes       | Yes (from pose)   | Yes               | No             | Yes
RGB-D SLAM    | 18            | 230K         | static  | Yes       | No                | No                | Yes            | Yes
SINTEL        | 23            | 1018         | dynamic | Yes       | Yes               | Yes               | No             | Yes
FlyingThings  |               |              | dynamic | Yes       | Yes               | Yes               | No             | No
Monkaa        |               |              | dynamic | Yes       | Yes               | Yes               | No             | Yes

SINTEL FlyingThings 3D Monkaa

SLIDE 28

SEMI-SYNTHETIC DYNAMIC SCENE DATASET

SLIDE 29

REFRESH DATASET

SLIDE 30

SLIDE 31

SLIDE 32

SINTEL EVALUATION

Trained on our data, tested on SINTEL data

SLIDE 33

SINTEL EVALUATION (POSE)

SLIDE 34

REAL WORLD DATA EVALUATION

SLIDE 35

SLIDE 36

CONCLUSION

Proposed a learning-based approach to estimate the rigid regions in dynamic scenes observed by a moving camera

• Robust per-pixel “rigidity” of dynamic scenes
• Camera pose refined jointly with 2D optical flow and rigidity/occlusion masks
• Novel semi-synthetic dynamic scene dataset, REFRESH
• Outperforms the state of the art on SINTEL

Future work

• End-to-end framework that learns rigidity as well as correspondences
• Richer content in dynamic scene data to encourage better generalization