SLIDE 1 Learning Optical Flow with Limited Data
Jia Xu
Tencent AI Lab, 2019-03-14
SLIDE 2
Introduction
Optical flow estimates a dense correspondence for every pixel between two frames.
SLIDE 3
Why Optical Flow?
- Optical flow has a wide range of applications: object tracking, autonomous driving, video action recognition, and 3D shape reconstruction.
SLIDE 4
History of Optical Flow Estimation
SLIDE 5 DC Flow
Xu, Ranftl, Koltun. Accurate Optical Flow via Direct Cost Volume Processing. CVPR 2017
SLIDE 6
CNNs for Optical Flow
- Advantage: high accuracy while running in real time.
- Disadvantage: requires a large amount of labeled data, which is difficult to obtain.
FlowNet PWC-Net
Fischer et al. 2015, "FlowNet: Learning Optical Flow with Convolutional Networks" Sun et al. 2018, "PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume"
SLIDE 7
CNNs for Optical Flow
- Advantage: high accuracy while running in real time.
- Disadvantage: requires a large amount of labeled data, which is difficult to obtain.
  - Pre-training on synthetic datasets suffers from a domain gap.
  - Unsupervised learning suffers from a performance gap and cannot predict the flow of occluded pixels.
Meister et al. 2018, "UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss"
SLIDE 8
Unsupervised Learning for Optical Flow
How to learn optical flow of occluded pixels in a totally unsupervised way?
SLIDE 9
Key Observation
- Unsupervised learning: detect occlusion and exclude occluded pixels.
  - The optical flow of non-occluded pixels can be accurately estimated.
  - How do we fully utilize those reliable non-occluded predictions?
  - Data distillation!
Liu, King, Lyu, Xu. DDFlow: Learning Optical Flow with Unlabeled Data Distillation. AAAI 2019
SLIDE 10
Framework
- The teacher model is trained with the photometric loss L_p on non-occluded pixels.
[Framework figure: the teacher model takes images I1 and I2, estimates forward flow w_f and backward flow w_b, derives forward/backward occlusion maps O_f and O_b via a forward-backward consistency check, warps each image toward the other, and applies the photometric loss.]
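The teacher's objective can be sketched as follows: warp I2 toward I1 using the forward flow, then compare the two images on non-occluded pixels only. The function names and the plain L1 penalty below are illustrative assumptions, not the exact loss from the paper:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def backward_warp(image, flow):
    # Sample `image` at positions displaced by `flow` (bilinear interpolation).
    # flow[..., 0] is the horizontal (x) displacement, flow[..., 1] the vertical (y).
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = np.stack([ys + flow[..., 1], xs + flow[..., 0]])
    return map_coordinates(image, coords, order=1, mode='nearest')

def photometric_loss(im1, im2, flow_fw, noc_mask):
    # Mean absolute difference between im1 and im2 warped by the forward flow,
    # averaged over non-occluded pixels only (noc_mask is 1 where non-occluded).
    im2_warped = backward_warp(im2, flow_fw)
    return (np.abs(im1 - im2_warped) * noc_mask).sum() / (noc_mask.sum() + 1e-8)
```

With a perfect flow and no occlusion, the loss goes to zero; excluding occluded pixels is what keeps this objective well-posed, since their true correspondence does not exist in the other frame.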
SLIDE 11
Framework
- The student model has the same network structure as the teacher model.
[Framework figure: the teacher model runs on the full images I1, I2 while the student model runs on cropped patches Ĩ1, Ĩ2; both estimate forward/backward flow and occlusion maps via forward-backward consistency checks, warp the images, and apply the photometric loss.]
SLIDE 12
Framework
- The student model is trained with both L_p on non-occluded pixels and L_o on occluded pixels. Only the student model is needed during testing.
[Framework figure: as in Slide 11, with the teacher's flow predictions supervising the student through a valid mask derived from the cropped occlusion maps.]
- L_o only applies to pixels that are non-occluded in the full images but occluded in the cropped patches.
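A minimal sketch of a loss of this kind: the teacher's flow supervises the student on the valid mask of pixels that are occluded in the cropped patch but non-occluded in the full image. The generalized Charbonnier parameters `eps` and `q` and the function names are assumptions for illustration:

```python
import numpy as np

def loss_occluded(flow_student, flow_teacher, occ_patch, occ_full):
    # Valid mask: occluded in the cropped patch but non-occluded in the
    # full image, so the teacher's prediction there is reliable.
    valid = occ_patch * (1.0 - occ_full)
    # Per-pixel L1 flow difference, passed through a robust penalty.
    diff = np.abs(flow_student - flow_teacher).sum(axis=-1)
    eps, q = 0.01, 0.4   # robust-penalty parameters (assumed values)
    return (((diff + eps) ** q) * valid).sum() / (valid.sum() + 1e-8)
```

Cropping is what manufactures the supervision signal: a pixel that flows out of the patch boundary is "occluded" for the student yet still visible to the teacher, so the teacher's reliable prediction can be distilled onto it.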
SLIDE 13
Loss Functions
- Occlusion estimation: based on the forward-backward consistency prior.
- Photometric loss L_p.
- Loss for occluded pixels L_o.
- Teacher model: L = L_p.
- Student model: L = L_p + L_o.
- No extra hyperparameters!
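The forward-backward consistency prior can be sketched as follows: a pixel is marked occluded when its forward flow and the backward flow sampled at the forward-displaced location fail to cancel. The thresholds `alpha1` and `alpha2` follow the UnFlow-style check and are assumptions here, not necessarily the exact values used:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_flow(flow_b, flow_f):
    # Sample the backward flow at locations displaced by the forward flow.
    h, w, _ = flow_f.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = np.stack([ys + flow_f[..., 1], xs + flow_f[..., 0]])
    return np.stack(
        [map_coordinates(flow_b[..., c], coords, order=1, mode='nearest') for c in range(2)],
        axis=-1)

def occlusion_mask(flow_f, flow_b, alpha1=0.01, alpha2=0.5):
    # Occluded where forward and (warped) backward flow do not cancel out.
    flow_b_w = warp_flow(flow_b, flow_f)
    mismatch = ((flow_f + flow_b_w) ** 2).sum(axis=-1)
    bound = alpha1 * ((flow_f ** 2).sum(axis=-1) + (flow_b_w ** 2).sum(axis=-1)) + alpha2
    return (mismatch > bound).astype(np.float64)
```

Because the check is derived from the two flow fields themselves, it needs no learned parameters, which is consistent with the claim that the method adds no extra hyperparameters.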
SLIDE 14
Evaluation Metrics
- Optical flow
  - EPE: average endpoint error between the predicted flow and the ground-truth flow over all pixels.
  - Fl: percentage of erroneous pixels. A pixel is considered correctly estimated if its endpoint error is < 3 pixels or < 5% of the ground-truth flow magnitude.
- Occlusion estimation
  - F-score: the harmonic mean of precision and recall.
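Both flow metrics are straightforward to compute; a sketch using the KITTI outlier convention (a pixel is an outlier only when its error is both ≥ 3 px and ≥ 5% of the ground-truth magnitude), with Fl reported in percent:

```python
import numpy as np

def epe(flow_pred, flow_gt):
    # Average endpoint error: mean Euclidean distance over all pixels.
    return np.sqrt(((flow_pred - flow_gt) ** 2).sum(axis=-1)).mean()

def fl(flow_pred, flow_gt):
    # Percentage of erroneous pixels: an outlier has endpoint error
    # both >= 3 px and >= 5% of the ground-truth flow magnitude.
    err = np.sqrt(((flow_pred - flow_gt) ** 2).sum(axis=-1))
    mag = np.sqrt((flow_gt ** 2).sum(axis=-1))
    return 100.0 * ((err >= 3.0) & (err >= 0.05 * mag)).mean()
```

The relative 5% term keeps large motions from being unfairly penalized: a 4 px error on a 200 px motion is not counted as an outlier, while the same error on a 10 px motion is.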
SLIDE 15
Quantitative Comparisons
- DDFlow outperforms all existing unsupervised flow learning methods on all datasets.
SLIDE 16
Quantitative Comparisons
- Our model pre-trained on Flying Chairs even outperforms fine-tuned state-of-the-art unsupervised models on the Sintel dataset.
SLIDE 17
Quantitative Comparisons
- 28.6% relative improvement on KITTI 2012 and 37.7% relative improvement on KITTI 2015.
SLIDE 18
Quantitative Comparisons
- 28.6% relative improvement on KITTI 2012 and 37.7% relative improvement on KITTI 2015.
- On KITTI 2012, DDFlow outperforms FlowNet 2.0 on the ranking metric Fl-noc.
SLIDE 19
Quantitative Comparisons
- DDFlow achieves the best occlusion estimation performance on the Sintel Clean and Sintel Final datasets.
- On the KITTI datasets, the ground-truth occlusion masks only contain pixels moving out of the image boundary; under this setting, our method achieves comparable performance.
SLIDE 20
Qualitative Comparisons
- Sample results on the Sintel datasets. The first three rows are from Sintel Clean; the last three are from Sintel Final.
SLIDE 21
Qualitative Comparisons
- Example results on the KITTI datasets. The first three rows are from KITTI 2012; the last three are from KITTI 2015.
- Note that on the KITTI datasets, the occlusion masks are sparse and only contain pixels moving out of the image boundary.
SLIDE 22
Quantitative: Ablation Study
- Comparing rows 1, 2 with rows 3, 4: occlusion handling improves flow estimation performance on all datasets.
SLIDE 23
Quantitative: Ablation Study
- Comparing rows 1, 2 with rows 3, 4: occlusion handling improves flow estimation performance on all datasets.
- Comparing rows 1, 3 with rows 2, 4: the census transform consistently improves performance.
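The census transform replaces raw intensities with a local binary descriptor, making the photometric comparison robust to illumination changes between frames. A 3x3 variant with a per-pixel Hamming distance is sketched below; this is illustrative of the idea, not necessarily the exact variant used in the experiments:

```python
import numpy as np

def census_transform(img, radius=1):
    # 8-bit census descriptor per pixel: one bit per 3x3 neighbor, set when
    # the neighbor is brighter than the center pixel. Invariant to any
    # monotonic (e.g. brightness/contrast) change of the image.
    h, w = img.shape
    pad = np.pad(img, radius, mode='edge')
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neigh = pad[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            bits.append((neigh > img).astype(np.uint8))
    return np.stack(bits, axis=-1)

def census_distance(img1, img2):
    # Per-pixel Hamming distance between census descriptors.
    return (census_transform(img1) != census_transform(img2)).sum(axis=-1)
```

Because only the ordering of neighboring intensities matters, a global brightness shift or contrast change between the two frames leaves the descriptors, and hence the matching cost, unchanged.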
SLIDE 24
Quantitative: Ablation Study
- Comparing rows 1, 2 with rows 3, 4: occlusion handling improves flow estimation performance on all datasets.
- Comparing rows 1, 3 with rows 2, 4: the census transform consistently improves performance.
- Comparing rows 4, 5: data distillation greatly improves performance, especially for occluded pixels, with EPE-OCC decreasing by 18.5% on Sintel Clean, 16.1% on Sintel Final, 58.2% on KITTI 2012, and 42.1% on KITTI 2015.
SLIDE 25
Video Flow Estimation on Sintel Dataset
- The top part is the input frame and the bottom part is the corresponding optical flow estimated by DDFlow.
SLIDE 26
DDFlow Code
- Code and models are available at https://github.com/ppliuboy/DDFlow.
SLIDE 27
What is Next?
SLIDE 28 Motivation
- Can we completely get rid of synthetic data?
- Can we win Sintel back?
Liu, King, Lyu, Xu. SelFlow: Self-Supervised Learning of Optical Flow. CVPR 2019
SLIDE 29
- Initially, p1 and p2 are non-occluded from I_t to I_{t+1}, and p1' and p2' are their corresponding pixels. The NOC model can accurately estimate the flow of p1 and p2 using the photometric loss.
[Figure: the NOC model estimates flow between frames I_t and I_{t+1}; p1, p2 in I_t map to p1', p2' in I_{t+1}.]
SLIDE 30
- We inject random noise into I_{t+1} to obtain Ĩ_{t+1} and let the noise cover p1' and p2'; then p1 and p2 become occluded from I_t to Ĩ_{t+1}. The OCC model cannot accurately estimate the flow of p1 and p2 using the photometric loss.
[Figure: the NOC model runs on (I_t, I_{t+1}) while the OCC model runs on (I_t, Ĩ_{t+1}), the noise-injected frame.]
SLIDE 31
- We distill reliable flow estimates of p1 and p2 from the NOC model to guide the flow learning of the OCC model. The guidance is applied only to pixels that are occluded from I_t to Ĩ_{t+1} but non-occluded from I_t to I_{t+1}, such as p1 and p2 (the self-supervision mask).
[Figure: the NOC model's flow guides the OCC model's flow through the self-supervision mask.]
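This guidance amounts to a masked distillation loss: the self-supervision mask selects pixels that are occluded with respect to the noise-injected frame but non-occluded with respect to the original frame. A sketch with hypothetical function names, where occlusion maps are 1 at occluded pixels:

```python
import numpy as np

def self_supervision_loss(flow_occ, flow_noc, occ_noisy, occ_clean):
    # Self-supervision mask: occluded w.r.t. the noise-injected frame but
    # non-occluded w.r.t. the original frame -- exactly the pixels whose
    # NOC-model flow is reliable and whose OCC-model flow would otherwise
    # receive no useful photometric signal.
    mask = occ_noisy * (1.0 - occ_clean)
    # L1 difference between OCC-model and NOC-model flow on masked pixels.
    diff = np.abs(flow_occ - flow_noc).sum(axis=-1)
    return (diff * mask).sum() / (mask.sum() + 1e-8)
```

The key design choice is that the occlusions are created artificially, so ground-truth-quality targets for them are available for free from the NOC model.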
SLIDE 32
Quantitative Results
SLIDE 33
Our unsupervised results outperform all existing unsupervised results on all datasets by a large margin.
SLIDE 34
Our unsupervised results even outperform several well-known fully supervised methods.
SLIDE 35
Our fine-tuned models achieve state-of-the-art results without using any external labeled data.
SLIDE 36
SLIDE 37
Qualitative Results
SLIDE 38
Effect of Self-supervision
SLIDE 39
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 40
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 41
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 42
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 43
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 44
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 45
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 46
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 47
Compared with PWC-Net, our fine-tuned model estimates optical flow with more accurate details.
SLIDE 48
Reference Image Flow Estimation using PWC-Net Flow Estimation using Our Fine-tuned Model
SLIDE 49
Reference Image Flow Estimation using PWC-Net Flow Estimation using Our Fine-tuned Model
SLIDE 50
Reference Image Flow Estimation using PWC-Net Flow Estimation using Our Fine-tuned Model
SLIDE 51
Reference Image Flow Estimation using PWC-Net Flow Estimation using Our Fine-tuned Model
SLIDE 52
To demonstrate the generalization ability of our model, we further show our flow estimates on real-world videos (from the DAVIS dataset).
SLIDE 53
Reference Image Flow from Our Unsupervised Model Flow from Our Fine-tuned Model
SLIDE 54
Reference Image Flow from Our Unsupervised Model Flow from Our Fine-tuned Model
SLIDE 55 Q & A
Hiring in Vision and Graphics ;) http://pages.cs.wisc.edu/~jiaxu/