LEARNING RIGIDITY IN DYNAMIC SCENES FOR SCENE FLOW ESTIMATION



SLIDE 1

Kihwan Kim, Senior Research Scientist

Zhaoyang Lv, Kihwan Kim, Alejandro Troccoli, Deqing Sun, James M. Rehg, Jan Kautz

LEARNING RIGIDITY IN DYNAMIC SCENES FOR SCENE FLOW ESTIMATION

SLIDE 2

CORRESPONDENCES IN COMPUTER VISION

Image courtesy Roy Shilkrot

SLIDE 3

OPTICAL FLOW

Brox and Malik 2011 Fan et al. 2014 Castro M. 2017

SLIDE 4

OPTICAL FLOW AND 3D SCENE FLOW

Brox and Malik 2011

Fan et al. 2014 Castro M. 2017

Letouzey et al 2011

SLIDE 5

APPLICATION OF 3D MOTION

[DynamicFusion, R. Newcombe, CVPR 2016]

3D reconstruction of dynamic scene

[Holoportation, Microsoft 2016]

AR and telepresence

SLIDE 6

APPLICATION OF 3D MOTION

[KITTI Dataset, A. Geiger, PAMI 2014]

3D scene understanding for autonomous driving

[SE3-Net, A. Byravan, ICRA 2017]

Robotics interaction

SLIDE 7

2D OPTICAL FLOW VS 3D SCENE FLOW

Why is 3D motion estimation challenging?

SLIDE 8

STATIC SCENE - MOVING CAMERA

[Diagram labels: images I_0 and I_1, 3D point x_0, pixel u and its counterpart u′]

SLIDE 9

STATIC SCENE - MOVING CAMERA

[Diagram labels: images I_0 and I_1, 3D point x_0, pixels u and u′]

δu^cm_{0→1}: optical flow from camera motion

SLIDE 10

STATIC SCENE - MOVING CAMERA

[Diagram labels: images I_0 and I_1, 3D point x_0, pixels u and u′]

δu^cm_{0→1}: optical flow from camera motion

Structure (3D) from (camera) Motion
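For a static scene, the optical flow from camera motion follows from depth and camera pose alone. The sketch below illustrates this under assumptions that are not on the slides: a pinhole camera with a hypothetical intrinsics matrix `K`, a depth map for the first image, and a pose `(R, t)` that maps frame-0 camera coordinates into frame 1. The function names are illustrative only.

```python
import numpy as np

def backproject(depth, K):
    """Lift each pixel (u, v) with depth d to the 3D point d * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)  # (h, w, 3)
    rays = pix @ np.linalg.inv(K).T
    return rays * depth[..., None]

def project(points, K):
    """Pinhole projection of (h, w, 3) camera-frame points to (h, w, 2) pixels."""
    proj = points @ K.T
    return proj[..., :2] / proj[..., 2:3]

def camera_motion_flow(depth0, K, R, t):
    """2D flow induced by camera motion alone, assuming a static scene.

    Assumption: (R, t) maps frame-0 camera coordinates into frame 1.
    """
    h, w = depth0.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix0 = np.stack([u, v], axis=-1).astype(np.float64)
    x0_in_frame1 = backproject(depth0, K) @ R.T + t
    return project(x0_in_frame1, K) - pix0
```

With a focal length of 100 pixels, a constant depth of 2 and a sideways translation of 0.1, every pixel moves by 100 * 0.1 / 2 = 5 pixels, which shows how the same camera motion produces larger flow for closer points.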

SLIDE 11

DYNAMIC SCENE - FIXED CAMERA

[Diagram labels: image I_0, 3D point x_0, pixel u]

SLIDE 12

DYNAMIC SCENE - FIXED CAMERA

[Diagram labels: image I_0, 3D points x_0 and x_1, pixels u and u′]

δx_{0→1}: scene flow

δu^sf_{0→1}: projected scene flow in I_1

SLIDE 13

DYNAMIC SCENE - FIXED CAMERA

[Diagram labels: image I_0, 3D points x_0 and x_1, pixels u and u′]

δx_{0→1}: scene flow
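With a fixed camera, the projected scene flow is the difference between the projections of a 3D point before and after it moves. The following is a minimal sketch, assuming a pinhole camera with a hypothetical intrinsics matrix `K`; the function names are illustrative and do not come from the slides.

```python
import numpy as np

def project(points, K):
    """Pinhole projection of (..., 3) camera-frame points to (..., 2) pixels."""
    proj = points @ K.T
    return proj[..., :2] / proj[..., 2:3]

def projected_scene_flow(x0, scene_flow, K):
    """2D displacement of points x0 that move by scene_flow, seen by a fixed camera."""
    return project(x0 + scene_flow, K) - project(x0, K)

# Assumed intrinsics: focal length 100 px, principal point (2, 2).
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
x0 = np.array([0.0, 0.0, 2.0])                      # point on the optical axis
sideways = projected_scene_flow(x0, np.array([0.2, 0.0, 0.0]), K)
along_axis = projected_scene_flow(x0, np.array([0.0, 0.0, 1.0]), K)
```

Here `sideways` is a 10-pixel shift, while `along_axis` is zero: motion along the viewing ray leaves no trace in the 2D flow, which is one reason 3D motion cannot be read off optical flow alone.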

SLIDE 14

COMMON VIDEOS NOWADAYS

Giphy.com #gopro, #drone, Sondra.T.

SLIDE 15

DYNAMIC SCENE - MOVING CAMERA

[Diagram labels: image I_0, 3D point x_0, pixel u]

SLIDE 16

DYNAMIC SCENE - MOVING CAMERA

[Diagram labels: images I_0 and I_1, 3D points x_0 and x_1, pixels u and u′, 3D displacement δx_{0→1}]

SLIDE 17

DYNAMIC SCENE - MOVING CAMERA

[Diagram labels: images I_0 and I_1, 3D points x_0 and x_1, pixels u and u′, 3D displacement δx_{0→1}]

δu^of_{0→1}: optical flow

δu^cm_{0→1}: optical flow from camera motion

δu^sf_{0→1}: projected scene flow in I_1

SLIDE 18

DYNAMIC SCENE - MOVING CAMERA

[Figure panels: input sequence, optical flow, camera ego-motion flow, (projected) scene flow or 3D motion field]

[Diagram labels: camera ego-motion, projected scene flow (3D motion field), optical flow, camera pose (transform)]

RIGIDITY

SLIDE 19

HOW DO OTHER WORKS SOLVE THIS?

Non-rigid or rigid local motions as outliers

Menze and Geiger, CVPR 2015

Yang et al., ICRA 2011

SLIDE 20

HOW DO OTHER FLOW ALGORITHMS SOLVE THIS?

Jaimez et al., ICRA 2017

Vogel et al., ICCV 2013

Quiroga et al., ECCV 2014

Jaimez et al., 3DV 2015

Wulff et al., CVPR 2017

SLIDE 21

OUR PROPOSAL

Learn which parts of the scene are (likely) rigid or non-rigid

SLIDE 22

PIPELINE

[Pipeline diagram labels. Inputs: I_0, I_1, D_0, D_1. Blocks: flow network (PWC-Net), Rigidity Transform Network (RTN), refinement in 3D, warping, subtraction. Intermediate and final outputs: optical flow, [R|t], rigidity mask, refined [R|t], ego-motion flow, estimated projected scene flow.]
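One reading of the subtraction step is that the estimated projected scene flow is what remains of the optical flow once the ego-motion flow is removed. The sketch below shows that reading only. The function name is hypothetical, and the optional zeroing of pixels marked rigid is an assumption added for illustration, not something the slides state.

```python
import numpy as np

def estimated_projected_scene_flow(optical_flow, ego_motion_flow, rigidity_mask=None):
    """Residual 2D flow after removing the flow explained by camera motion.

    optical_flow, ego_motion_flow: (h, w, 2) arrays in pixels.
    rigidity_mask: optional (h, w) boolean array, True where the scene is rigid.
    Assumption: rigid pixels carry no scene flow, so their residual is set to zero.
    """
    residual = optical_flow - ego_motion_flow
    if rigidity_mask is not None:
        residual = np.where(rigidity_mask[..., None], 0.0, residual)
    return residual
```

In practice the residual in rigid regions is small but not exactly zero, because both the flow and the pose are estimates; whether to suppress it with the mask is a design choice.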

SLIDE 23

RIGIDITY TRANSFORM NETWORK (RTN)

[Architecture diagram labels: inputs I_0, D_0, I_1, D_1; conv1-6; pose regressor with outputs R and t; deconv1-5; rigidity attention mask]

SLIDE 24

RIGIDITY TRANSFORM NETWORK (RTN)

[Architecture diagram labels: inputs I_0, D_0, I_1, D_1; conv1 to conv6; global average pooling, conv-T and conv-R with outputs t and R; deconv1-5; rigidity attention mask]

Translation and rotation: Huber loss

Rigidity attention mask: binary cross entropy loss
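The two loss terms named on this slide can be sketched in a few lines. This is a minimal NumPy illustration, not the training code: the Huber threshold `delta`, the weight `w_mask`, the vector form of the rotation and the function names are all assumptions.

```python
import numpy as np

def huber(residual, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(residual)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

def binary_cross_entropy(prob, target, eps=1e-7):
    """Per-pixel binary cross entropy between predicted probabilities and 0/1 labels."""
    p = np.clip(prob, eps, 1.0 - eps)
    return -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

def rtn_loss(t_pred, t_gt, r_pred, r_gt, mask_prob, mask_gt, w_mask=1.0):
    """Pose terms (Huber) plus rigidity-mask term (binary cross entropy).

    Assumption: rotation is given as a small vector (e.g. Euler angles) so that
    an element-wise residual is meaningful; w_mask is a hypothetical weight.
    """
    pose = huber(t_pred - t_gt).mean() + huber(r_pred - r_gt).mean()
    mask = binary_cross_entropy(mask_prob, mask_gt).mean()
    return pose + w_mask * mask
```

The Huber term limits the influence of large pose errors, while the cross-entropy term treats the mask as a per-pixel rigid versus non-rigid classification.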

SLIDE 25

2D OPTICAL FLOW: PWC-NET

CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

Sun et al., CVPR 2018

SLIDE 26

POSE REFINEMENT AND FLOW

[R|t]^\star = \arg\min \sum_{(u,v) \in \Omega,\; B(u,v) = 1,\; O(u,v) = 0} L\big( R\, V_0(u + \delta u,\, v + \delta v) + t - V_1(u, v) \big)

B: rigidity mask

O: occlusion mask

(δu, δv): flow correspondences

We minimize this objective with the off-the-shelf Gauss-Newton solver in GTSAM.
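To make the structure of this objective concrete, the sketch below fits a rigid transform to the correspondences that are marked rigid and not occluded. It is not the solver used in the talk: it swaps the Gauss-Newton optimization in GTSAM for the closed-form SVD (Kabsch) solution, which is exact only for a squared-error loss. It also assumes that the two maps in the objective are per-pixel 3D point maps and that the first one has already been sampled at the flow-displaced pixels; all function and argument names are hypothetical.

```python
import numpy as np

def fit_rigid_transform(src, dst, weights):
    """Weighted Kabsch: R, t minimizing sum_i w_i * ||R src_i + t - dst_i||^2."""
    w = weights / weights.sum()
    mu_s = w @ src                      # weighted centroid of the source points
    mu_d = w @ dst                      # weighted centroid of the target points
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))             # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

def refine_pose(points0_at_flow, points1, rigid, occluded):
    """Fit [R|t] using only pixels with rigid == 1 and occluded == 0.

    points0_at_flow: (h, w, 3) 3D points of frame 0 sampled at (u + du, v + dv).
    points1:         (h, w, 3) 3D points of frame 1 at (u, v).
    """
    keep = (rigid == 1) & (occluded == 0)
    n = int(keep.sum())
    return fit_rigid_transform(points0_at_flow[keep], points1[keep], np.ones(n))
```

An iterative solver becomes necessary once the loss is robust rather than squared, since the closed form no longer applies; the masking of non-rigid and occluded pixels is the same in both cases.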

SLIDE 27

SUPERVISION NEEDED

[Images: Scene-net, RGB-D SLAM benchmark, SINTEL, FlyingThings 3D, Monkaa]

RGB-D dataset | Layout number | Total images | Scenes  | Pose (GT) | Optical flow (GT) | Segmentation (GT) | Photo realistic | Depth realistic
Scene-net     | 47            | 5.1M         | static  | Yes       | Yes (from pose)   | Yes               | No              | Yes
RGB-D SLAM    | 18            | 230K         | static  | Yes       | No                | No                | Yes             | Yes
SINTEL        | 23            | 1018         | dynamic | Yes       | Yes               | Yes               | No              | Yes
FlyingThings  | -             | 25K          | dynamic | Yes       | Yes               | Yes               | No              | No
Monkaa        | -             | 10K          | dynamic | Yes       | Yes               | Yes               | No              | Yes

SLIDE 28

SEMI-SYNTHETIC DYNAMIC SCENE DATASET

SLIDE 29

REFRESH DATASET

SLIDE 30

SLIDE 31

SLIDE 32

SINTEL EVALUATION

Trained on our data, tested on SINTEL data

SLIDE 33

SINTEL EVALUATION (POSE)

SLIDE 34

REAL WORLD DATA EVALUATION

SLIDE 35

SLIDE 36

CONCLUSION

Proposed a learning-based approach to estimate the rigid regions in dynamic scenes observed by a moving camera

  • Robust per-pixel "rigidity" of dynamic scenes
  • Camera pose refined jointly with 2D optical flow and rigidity/occlusion masks
  • Novel semi-synthetic dynamic scene dataset, REFRESH
  • Our method outperforms the state of the art on SINTEL

Future work

  • End-to-end framework that learns rigidity as well as correspondences
  • Richer content in dynamic scene data to encourage better generalization