Kihwan Kim, Senior Research Scientist
Zhaoyang Lv, Kihwan Kim, Alejandro Troccoli, Deqing Sun, James M. Rehg, Jan Kautz
LEARNING RIGIDITY IN DYNAMIC SCENES FOR SCENE FLOW ESTIMATION
CORRESPONDENCES IN COMPUTER VISION
2
Image courtesy Roy Shilkrot
3
Brox and Malik 2011 Fan et al. 2014 Castro M. 2017
4
Brox and Malik 2011
Fan et al. 2014 Castro M. 2017
Letouzey et al. 2011
5
[DynamicFusion, R. Newcombe, CVPR 2016]
3D reconstruction of dynamic scene
[Holoportation, Microsoft 2016]
AR and telepresence
6
[KITTI Dataset, A. Geiger, PAMI 2014]
3D scene understanding for autonomous driving
[SE3-Net, A. Byravan, ICRA 2017]
Robotics interaction
7
8
9
10
11
12
Scene flow
13
Scene flow
14
Giphy.com #gopro, #drone, Sondra.T.
15
16
17
18
Input sequence
Optical flow
Camera ego-motion flow
Projected scene flow (3D motion field)
Optical flow = camera ego-motion flow + projected scene flow
Camera pose (transform)
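The decomposition on this slide can be illustrated per pixel. The following is a minimal sketch, assuming a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy` and a pose `(R, t)` that maps frame-0 camera coordinates to frame-1 camera coordinates; the function names are illustrative and not taken from the talk.

```python
def ego_motion_flow(u, v, depth, fx, fy, cx, cy, R, t):
    """2D flow at pixel (u, v) caused by camera motion alone (scene assumed static)."""
    # Back-project the pixel to a 3D point in the frame-0 camera.
    X = [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]
    # Move the point into the frame-1 camera with the assumed pose (R, t).
    Y = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    # Project back to the image plane of frame 1.
    u1 = fx * Y[0] / Y[2] + cx
    v1 = fy * Y[1] / Y[2] + cy
    return (u1 - u, v1 - v)


def projected_scene_flow(optical_flow, ego_flow):
    """Part of the optical flow that camera motion does not explain."""
    return (optical_flow[0] - ego_flow[0], optical_flow[1] - ego_flow[1])
```

For a static pixel the optical flow equals the ego-motion flow, so the projected scene flow is zero; a non-zero remainder indicates independent motion at that pixel.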
19
Menze and Geiger, CVPR 2015; Yang et al., ICRA 2011
20
Jaimez et al., ICRA 2017; Vogel et al., ICCV 2013; Quiroga et al., ECCV 2014; Jaimez et al., 3DV 2015; Wulff et al., CVPR 2017
21
22
Input: RGB-D frame pair
Flow network (PWC-Net): optical flow
Rigidity Transform Network (RTN): rigidity mask and camera pose
Refinement in 3D: refined camera pose
Warping: ego-motion flow
Subtraction (optical flow minus ego-motion flow): estimated projected scene flow
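Once a camera pose is available, a 2D correspondence with depth in both frames can be lifted to a 3D motion vector. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes pinhole intrinsics `fx, fy, cx, cy`, a depth value at both ends of the flow vector, and a pose `(R, t)` mapping frame-0 camera coordinates to frame-1 camera coordinates. All names are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with known depth to a 3D point in camera coordinates."""
    return [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]


def scene_flow_3d(u, v, d0, flow, d1, R, t, fx, fy, cx, cy):
    """3D motion of the point seen at (u, v), with camera motion removed."""
    x0 = backproject(u, v, d0, fx, fy, cx, cy)
    x1 = backproject(u + flow[0], v + flow[1], d1, fx, fy, cx, cy)
    # Express the frame-1 point in frame-0 coordinates: R^T (x1 - t).
    diff = [x1[i] - t[i] for i in range(3)]
    x1_in_0 = [sum(R[k][i] * diff[k] for k in range(3)) for i in range(3)]
    return [x1_in_0[i] - x0[i] for i in range(3)]
```

A point on the static background returns a vector close to zero, while a point on a moving object returns its displacement in the frame-0 camera coordinates.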
23
Input: RGB-D frame pair
conv1-6
Pose Regressor: camera pose
Deconv 1-5
Rigidity Attention Mask
24
Input: RGB-D frame pair
conv1, conv2, conv3, conv4, conv5, conv6
Global Average Pooling, conv-T, conv-R
Translation and rotation: Huber loss
Deconv 1-5
Rigidity Attention Mask: binary cross entropy loss
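The two training losses named on this slide can be written out in a few lines. This is a minimal sketch, assuming the rotation is regressed as a 3-vector, a Huber threshold `delta` of 1.0, and equal weighting of all pose components; none of these choices is stated in the slides.

```python
import math


def huber(residual, delta=1.0):
    """Quadratic near zero, linear for large errors (delta is an assumed threshold)."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def pose_loss(pred_t, gt_t, pred_r, gt_r, delta=1.0):
    """Mean Huber loss over translation and rotation components."""
    pred = list(pred_t) + list(pred_r)
    gt = list(gt_t) + list(gt_r)
    terms = [huber(p - g, delta) for p, g in zip(pred, gt)]
    return sum(terms) / len(terms)


def mask_bce(pred_probs, gt_labels, eps=1e-7):
    """Mean binary cross entropy between predicted rigidity probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(pred_probs, gt_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(pred_probs)
```

The Huber loss keeps large pose errors from dominating the gradient, while the cross entropy treats the rigidity mask as a per-pixel binary classification.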
25
26
Rigidity mask, occlusion mask, flow correspondences
We solve this objective function using an off-the-shelf Gauss-Newton solver, GTSAM.
[Equation: objective over pixels (v, w) ∈ Ω; the remaining terms are not recoverable from the extracted text]
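To give a feel for the refinement step, the sketch below runs Gauss-Newton iterations on a camera pose. It does not reproduce the slide's objective and does not use GTSAM: it assumes a simpler point-to-point 3D alignment cost over correspondences flagged as rigid and non-occluded, and all names are hypothetical.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def so3_exp(w):
    """Rodrigues' formula: rotation matrix for the axis-angle vector w."""
    theta = math.sqrt(sum(x * x for x in w))
    K = [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]
    a = math.sin(theta) / theta if theta > 1e-12 else 1.0
    b = (1.0 - math.cos(theta)) / theta ** 2 if theta > 1e-12 else 0.5
    K2 = matmul(K, K)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j] for j in range(3)]
            for i in range(3)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def refine_pose(p0, p1, rigid, iters=10):
    """Gauss-Newton on sum ||R p0 + t - p1||^2 over correspondences with rigid[i] True."""
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    t = [0.0, 0.0, 0.0]
    for _ in range(iters):
        H = [[0.0] * 6 for _ in range(6)]
        g = [0.0] * 6
        for a, b, m in zip(p0, p1, rigid):
            if not m:
                continue  # skip non-rigid or occluded correspondences
            x = [sum(R[i][k] * a[k] for k in range(3)) for i in range(3)]
            r = [x[i] + t[i] - b[i] for i in range(3)]
            # Jacobian of the residual w.r.t. (rotation, translation): [-skew(x) | I]
            J = [[0.0, x[2], -x[1], 1.0, 0.0, 0.0],
                 [-x[2], 0.0, x[0], 0.0, 1.0, 0.0],
                 [x[1], -x[0], 0.0, 0.0, 0.0, 1.0]]
            for i in range(6):
                g[i] += sum(J[k][i] * r[k] for k in range(3))
                for j in range(6):
                    H[i][j] += sum(J[k][i] * J[k][j] for k in range(3))
        d = solve(H, [-v for v in g])  # normal equations: H d = -g
        R = matmul(so3_exp(d[:3]), R)
        t = [t[i] + d[3 + i] for i in range(3)]
    return R, t
```

Excluding the masked-out correspondences is what keeps independently moving objects from biasing the pose estimate; at least three non-collinear rigid points are needed for the 6x6 system to be solvable.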
27
Scene-net RGB-D SLAM benchmark
RGB-D dataset   Layout number   Total images   Scenes    Pose (GT)   Optical flow (GT)   Segmentation (GT)   Photorealistic   Depth realistic
Scene-net       47              5.1M           static    Yes         Yes (from pose)     Yes                 No               Yes
RGB-D SLAM      18              230K           static    Yes         No                  No                  Yes              Yes
SINTEL          23              1018           dynamic   Yes         Yes                 Yes                 No               Yes
FlyingThings                                   dynamic   Yes         Yes                 Yes                 No               No
Monkaa                                         dynamic   Yes         Yes                 Yes                 No               Yes
SINTEL FlyingThings 3D Monkaa
28
29
30
31
32
Trained from
33
34
35
36
• Robust per-pixel "rigidity" estimation for dynamic scenes
• Camera pose refined jointly with 2D optical flow and rigid/occlusion masks
• Novel semi-synthetic dynamic scene dataset, REFRESH
• Our method outperforms the state of the art on SINTEL
End-to-end framework that learns rigidity as well as correspondences
Richer content in dynamic scene data to encourage better generalization