SLIDE 1 Learning Optical Flow with Limited Data
Jia Xu
Tencent AI Lab, 2019-03-14
SLIDE 2
Introduction
Optical flow estimates a dense correspondence for every pixel between two frames.
SLIDE 3
Why Optical Flow?
- Optical flow has a wide range of applications: object tracking, autonomous driving, video action recognition, and 3D shape reconstruction.
SLIDE 4
History of Optical Flow Estimation
SLIDE 5 DC Flow
Xu, Ranftl, Koltun. Accurate Optical Flow via Direct Cost Volume Processing. CVPR 2017
SLIDE 6
CNNs for Optical Flow
- Advantage: high accuracy while running in real time.
- Disadvantage: requires a large amount of labeled data, which is difficult to obtain.
FlowNet PWC-Net
Fischer et al. 2015, "FlowNet: Learning Optical Flow with Convolutional Networks" Sun et al. 2018, "PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume"
SLIDE 7
CNNs for Optical Flow
- Advantage: high accuracy while running in real time.
- Disadvantage: requires a large amount of labeled data, which is difficult to obtain.
  - Pre-training on synthetic datasets suffers from a domain gap.
  - Unsupervised learning suffers from a performance gap and cannot predict the flow of occluded pixels.
Meister et al. 2018, "UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss"
SLIDE 8
Unsupervised Learning for Optical Flow
How to learn optical flow of occluded pixels in a totally unsupervised way?
SLIDE 9
Key Observation
- Unsupervised learning: detect occlusion and exclude occluded pixels.
  - The optical flow of non-occluded pixels can be accurately estimated.
  - How do we fully utilize those reliable non-occluded predictions?
  - Data distillation!
Liu, King, Lyu, Xu. DDFlow: Learning Optical Flow with Unlabeled Data Distillation. AAAI 2019
SLIDE 10
Framework
- The teacher model is trained with the photometric loss L_p on non-occluded pixels.
[Framework figure: the teacher model takes images I1 and I2, estimates forward flow w_f and backward flow w_b, derives forward/backward occlusion maps O_f and O_b via a forward-backward consistency check, warps each image toward the other, and applies the photometric loss.]
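The teacher's objective can be sketched as follows: warp I2 toward I1 using the forward flow, then compare the two images on non-occluded pixels only. The function names and the plain L1 penalty below are illustrative assumptions, not the exact loss from the paper:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def backward_warp(image, flow):
    # Sample `image` at positions displaced by `flow` (bilinear interpolation).
    # flow[..., 0] is the horizontal (x) displacement, flow[..., 1] the vertical (y).
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = np.stack([ys + flow[..., 1], xs + flow[..., 0]])
    return map_coordinates(image, coords, order=1, mode='nearest')

def photometric_loss(im1, im2, flow_fw, noc_mask):
    # Mean absolute difference between im1 and im2 warped by the forward flow,
    # averaged over non-occluded pixels only (noc_mask is 1 where non-occluded).
    im2_warped = backward_warp(im2, flow_fw)
    return (np.abs(im1 - im2_warped) * noc_mask).sum() / (noc_mask.sum() + 1e-8)
```

With a perfect flow and no occlusion, the loss goes to zero; excluding occluded pixels is what keeps this objective well-posed, since their true correspondence does not exist in the other frame.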
SLIDE 11
Framework
- The student model has the same network structure as the teacher model.
[Framework figure: the teacher model runs on the full images I1, I2 while the student model runs on cropped patches Ĩ1, Ĩ2; both estimate forward/backward flow and occlusion maps via forward-backward consistency checks, warp the images, and apply the photometric loss.]
SLIDE 12
Framework
- The student model is trained with both L_p on non-occluded pixels and L_o on occluded pixels. Only the student model is needed during testing.
[Framework figure: as in Slide 11, with the teacher's flow predictions supervising the student through a valid mask derived from the cropped occlusion maps.]
- L_o only applies to pixels that are non-occluded in the full images but occluded in the cropped patches.
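A minimal sketch of a loss of this kind: the teacher's flow supervises the student on the valid mask of pixels that are occluded in the cropped patch but non-occluded in the full image. The generalized Charbonnier parameters `eps` and `q` and the function names are assumptions for illustration:

```python
import numpy as np

def loss_occluded(flow_student, flow_teacher, occ_patch, occ_full):
    # Valid mask: occluded in the cropped patch but non-occluded in the
    # full image, so the teacher's prediction there is reliable.
    valid = occ_patch * (1.0 - occ_full)
    # Per-pixel L1 flow difference, passed through a robust penalty.
    diff = np.abs(flow_student - flow_teacher).sum(axis=-1)
    eps, q = 0.01, 0.4   # robust-penalty parameters (assumed values)
    return (((diff + eps) ** q) * valid).sum() / (valid.sum() + 1e-8)
```

Cropping is what manufactures the supervision signal: a pixel that flows out of the patch boundary is "occluded" for the student yet still visible to the teacher, so the teacher's reliable prediction can be distilled onto it.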
SLIDE 13
Loss Functions
- Occlusion estimation: based on the forward-backward consistency prior.
- Photometric loss L_p.
- Loss for occluded pixels L_o.
- Teacher model: L = L_p.
- Student model: L = L_p + L_o.
- No extra hyperparameters!
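The forward-backward consistency prior can be sketched as follows: a pixel is marked occluded when its forward flow and the backward flow sampled at the forward-displaced location fail to cancel. The thresholds `alpha1` and `alpha2` follow the UnFlow-style check and are assumptions here, not necessarily the exact values used:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_flow(flow_b, flow_f):
    # Sample the backward flow at locations displaced by the forward flow.
    h, w, _ = flow_f.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = np.stack([ys + flow_f[..., 1], xs + flow_f[..., 0]])
    return np.stack(
        [map_coordinates(flow_b[..., c], coords, order=1, mode='nearest') for c in range(2)],
        axis=-1)

def occlusion_mask(flow_f, flow_b, alpha1=0.01, alpha2=0.5):
    # Occluded where forward and (warped) backward flow do not cancel out.
    flow_b_w = warp_flow(flow_b, flow_f)
    mismatch = ((flow_f + flow_b_w) ** 2).sum(axis=-1)
    bound = alpha1 * ((flow_f ** 2).sum(axis=-1) + (flow_b_w ** 2).sum(axis=-1)) + alpha2
    return (mismatch > bound).astype(np.float64)
```

Because the check is derived from the two flow fields themselves, it needs no learned parameters, which is consistent with the claim that the method adds no extra hyperparameters.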
SLIDE 14
Evaluation Metrics
- Optical flow
  - EPE: average endpoint error between the predicted flow and the ground-truth flow over all pixels.
  - Fl: percentage of erroneous pixels. A pixel is considered correctly estimated if its endpoint error is < 3 pixels or < 5% of the ground-truth flow magnitude.
- Occlusion estimation
  - F-score: the harmonic mean of precision and recall.
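Both flow metrics are straightforward to compute; a sketch using the KITTI outlier convention (a pixel is an outlier only when its error is both ≥ 3 px and ≥ 5% of the ground-truth magnitude), with Fl reported in percent:

```python
import numpy as np

def epe(flow_pred, flow_gt):
    # Average endpoint error: mean Euclidean distance over all pixels.
    return np.sqrt(((flow_pred - flow_gt) ** 2).sum(axis=-1)).mean()

def fl(flow_pred, flow_gt):
    # Percentage of erroneous pixels: an outlier has endpoint error
    # both >= 3 px and >= 5% of the ground-truth flow magnitude.
    err = np.sqrt(((flow_pred - flow_gt) ** 2).sum(axis=-1))
    mag = np.sqrt((flow_gt ** 2).sum(axis=-1))
    return 100.0 * ((err >= 3.0) & (err >= 0.05 * mag)).mean()
```

The relative 5% term keeps large motions from being unfairly penalized: a 4 px error on a 200 px motion is not counted as an outlier, while the same error on a 10 px motion is.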
SLIDE 15
Quantitative Comparisons
- DDFlow outperforms all existing unsupervised flow learning methods on all datasets.
SLIDE 16
Quantitative Comparisons
- Our model pre-trained on Flying Chairs even outperforms fine-tuned state-of-the-art unsupervised models on the Sintel dataset.
SLIDE 17
Quantitative Comparisons
- 28.6% relative improvement on KITTI 2012 and 37.7% relative improvement on KITTI 2015.
SLIDE 18
Quantitative Comparisons
- 28.6% relative improvement on KITTI 2012 and 37.7% relative improvement on KITTI 2015.
- On KITTI 2012, DDFlow outperforms FlowNet 2.0 on the ranking metric Fl-noc.
SLIDE 19
Quantitative Comparisons
- DDFlow achieves the best occlusion estimation performance on the Sintel Clean and Sintel Final datasets.
- On the KITTI datasets, the ground-truth occlusion masks only contain pixels moving out of the image boundary; under this setting, our method achieves comparable performance.
SLIDE 20
Qualitative Comparisons
- Sample results on the Sintel datasets. The first three rows are from Sintel Clean; the last three are from Sintel Final.
SLIDE 21
Qualitative Comparisons
- Example results on the KITTI datasets. The first three rows are from KITTI 2012; the last three are from KITTI 2015.
- Note that on the KITTI datasets, the occlusion masks are sparse and only contain pixels moving out of the image boundary.
SLIDE 22
Quantitative: Ablation Study
- Comparing rows 1, 2 with rows 3, 4: occlusion handling improves flow estimation performance on all datasets.
SLIDE 23
Quantitative: Ablation Study
- Comparing rows 1, 2 with rows 3, 4: occlusion handling improves flow estimation performance on all datasets.
- Comparing rows 1, 3 with rows 2, 4: the census transform consistently improves performance.
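The census transform replaces raw intensities with a local binary descriptor, making the photometric comparison robust to illumination changes between frames. A 3x3 variant with a per-pixel Hamming distance is sketched below; this is illustrative of the idea, not necessarily the exact variant used in the experiments:

```python
import numpy as np

def census_transform(img, radius=1):
    # 8-bit census descriptor per pixel: one bit per 3x3 neighbor, set when
    # the neighbor is brighter than the center pixel. Invariant to any
    # monotonic (e.g. brightness/contrast) change of the image.
    h, w = img.shape
    pad = np.pad(img, radius, mode='edge')
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neigh = pad[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            bits.append((neigh > img).astype(np.uint8))
    return np.stack(bits, axis=-1)

def census_distance(img1, img2):
    # Per-pixel Hamming distance between census descriptors.
    return (census_transform(img1) != census_transform(img2)).sum(axis=-1)
```

Because only the ordering of neighboring intensities matters, a global brightness shift or contrast change between the two frames leaves the descriptors, and hence the matching cost, unchanged.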
SLIDE 24
Quantitative: Ablation Study
- Comparing rows 1, 2 with rows 3, 4: occlusion handling improves flow estimation performance on all datasets.
- Comparing rows 1, 3 with rows 2, 4: the census transform consistently improves performance.
- Comparing rows 4, 5: data distillation greatly improves performance, especially for occluded pixels, with EPE-OCC decreasing by 18.5% on Sintel Clean, 16.1% on Sintel Final, 58.2% on KITTI 2012, and 42.1% on KITTI 2015.
SLIDE 25
Video Flow Estimation on Sintel Dataset
- The top part is the input frame and the bottom part is the corresponding optical flow estimated by DDFlow.
SLIDE 26
DDFlow Code
- Code and models are available at https://github.com/ppliuboy/DDFlow.
SLIDE 27
What is Next?
SLIDE 28 Motivation
- Can we completely get rid of synthetic data?
- Can we win Sintel back?
Liu, King, Lyu, Xu. SelFlow: Self-Supervised Learning of Optical Flow. CVPR 2019
SLIDE 29
- Initially, p1 and p2 are non-occluded from I_t to I_{t+1}, and p1' and p2' are their corresponding pixels. The NOC model can accurately estimate the flow of p1 and p2 using the photometric loss.
[Figure: the NOC model estimates flow between frames I_t and I_{t+1}; p1, p2 in I_t map to p1', p2' in I_{t+1}.]
SLIDE 30
- We inject random noise into I_{t+1} to obtain Ĩ_{t+1} and let the noise cover p1' and p2'; then p1 and p2 become occluded from I_t to Ĩ_{t+1}. The OCC model cannot accurately estimate the flow of p1 and p2 using the photometric loss.
[Figure: the NOC model runs on (I_t, I_{t+1}) while the OCC model runs on (I_t, Ĩ_{t+1}), the noise-injected frame.]
SLIDE 31
- We distill reliable flow estimates of p1 and p2 from the NOC model to guide the flow learning of the OCC model. The guidance is applied only to pixels that are occluded from I_t to Ĩ_{t+1} but non-occluded from I_t to I_{t+1}, such as p1 and p2 (the self-supervision mask).
[Figure: the NOC model's flow guides the OCC model's flow through the self-supervision mask.]
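This guidance amounts to a masked distillation loss: the self-supervision mask selects pixels that are occluded with respect to the noise-injected frame but non-occluded with respect to the original frame. A sketch with hypothetical function names, where occlusion maps are 1 at occluded pixels:

```python
import numpy as np

def self_supervision_loss(flow_occ, flow_noc, occ_noisy, occ_clean):
    # Self-supervision mask: occluded w.r.t. the noise-injected frame but
    # non-occluded w.r.t. the original frame -- exactly the pixels whose
    # NOC-model flow is reliable and whose OCC-model flow would otherwise
    # receive no useful photometric signal.
    mask = occ_noisy * (1.0 - occ_clean)
    # L1 difference between OCC-model and NOC-model flow on masked pixels.
    diff = np.abs(flow_occ - flow_noc).sum(axis=-1)
    return (diff * mask).sum() / (mask.sum() + 1e-8)
```

The key design choice is that the occlusions are created artificially, so ground-truth-quality targets for them are available for free from the NOC model.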
SLIDE 32
Quantitative Results
SLIDE 33
Our unsupervised results outperform all existing unsupervised results on all datasets by a large margin.
SLIDE 34
Our unsupervised results even outperform several well-known fully supervised methods.
SLIDE 35
Our fine-tuned models achieve state-of-the-art results without using any external labeled data.
SLIDE 36
SLIDE 37
Qualitative Results
SLIDE 38
Effect of Self-supervision
SLIDE 39
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 40
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 41
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 42
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 43
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 44
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 45
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 46
Reference Image Flow Estimation without Self-supervision Flow Estimation with Self-supervision
SLIDE 47
Compared with PWC-Net, our fine-tuned model estimates optical flow with more accurate details.
SLIDE 48
Reference Image Flow Estimation using PWC-Net Flow Estimation using Our Fine-tuned Model
SLIDE 49
Reference Image Flow Estimation using PWC-Net Flow Estimation using Our Fine-tuned Model
SLIDE 50
Reference Image Flow Estimation using PWC-Net Flow Estimation using Our Fine-tuned Model
SLIDE 51
Reference Image Flow Estimation using PWC-Net Flow Estimation using Our Fine-tuned Model
SLIDE 52
To demonstrate the generalization ability of our model, we further show our flow estimates on real-world videos (from the DAVIS dataset).
SLIDE 53
Reference Image Flow from Our Unsupervised Model Flow from Our Fine-tuned Model
SLIDE 54
Reference Image Flow from Our Unsupervised Model Flow from Our Fine-tuned Model
SLIDE 55 Q & A
Hiring in Vision and Graphics ;) http://pages.cs.wisc.edu/~jiaxu/