

SLIDE 1

Depth from a Light Field Image with Learning-based Matching Costs

Finalist of the Depth Estimation Challenge at LF4CV & Submitted to IEEE TPAMI (under review) Hae-Gon Jeon¹, Jaesik Park², Gyeongmin Choe¹, Jinsun Park¹ Yunsu Bok³, Yu-Wing Tai⁴, In So Kweon¹ ¹KAIST ²Intel labs ³ETRI ⁴Tencent

SLIDE 2

Goal of the Proposed Method

Problem 1: Severe vignetting. Problem 2: Severe noise.

  • 1. Hard to find accurate correspondences under radiometric distortion and severe noise

 Use various hand-crafted matching costs

  • 2. Which one is the correct matching cost?

 Predict the correct matching cost using two random forests

  • 3. Does it work well on real-world light-field images?

 Generate a realistic dataset based on the imaging pipeline of the Lytro camera

SLIDE 3

Overview of the Proposed Method

  • 1. Realistic Light-Field Image Generation:

Emulating the imaging pipeline of the Lytro camera

  • 2. Making Cost Volumes using Phase Shift:

Overcoming the inherent degradation of light-field images caused by the lenslet array, using four matching costs (SAD, GRAD, Census, ZNCC)

  • 3. Random Forest 1 - Classification:

Selecting dominant matching costs

  • 4. Random Forest 2 - Regression:

Predicting a disparity value with sub-pixel precision

SLIDE 4

Data Generation - Vignetting Map

[Figure: a white-plane image; the vignetting map from averaged white-plane images; noise-free multi-view images]
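The vignetting-map step can be sketched in a few lines of NumPy (function names are hypothetical; assumes the white-plane captures are already available as float arrays):

```python
import numpy as np

def estimate_vignetting_map(white_images):
    """Average several white-plane captures and normalize by the
    brightest pixel, so the map is 1.0 where there is no falloff."""
    avg = np.mean(np.stack(white_images, axis=0), axis=0)
    return avg / avg.max()

def apply_vignetting(image, vmap):
    """Attenuate a clean (noise-free) view with the vignetting map."""
    return image * vmap
```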

SLIDE 5

Data Generation - Lenslet Image Generation

Apply the vignetting map to each sub-aperture image.

Extract a pixel from each sub-aperture image and aggregate these pixels into a lenslet.
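The pixel rearrangement can be sketched as follows. This is a simplified model with a rectangular U×V angular grid; the real Lytro sensor uses a hexagonal lenslet layout, which is omitted here:

```python
import numpy as np

def subapertures_to_lenslet(views):
    """views: (U, V, H, W) stack of sub-aperture images.
    Returns an (H*U, W*V) lenslet image in which the U x V block at
    spatial position (y, x) collects pixel (y, x) from every view."""
    U, V, H, W = views.shape
    # (U, V, H, W) -> (H, U, W, V): interleave angular and spatial axes,
    # then flatten so each lenslet block is contiguous.
    return views.transpose(2, 0, 3, 1).reshape(H * U, W * V)
```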

SLIDE 6

Data Generation - Add Noise

Noise level estimation for each color channel

[Plots: noise standard deviation vs. intensity for the Red, Green1, Green2, and Blue channels]

Convert the color image to a raw image

  • Y. Schechner et al., “Multiplexing for optimal lighting”, IEEE TPAMI 2007
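A simplified version of the noise step: per-channel Gaussian noise with standard deviations read off the estimated curves. The paper injects noise in the raw (mosaicked) domain, which this sketch omits:

```python
import numpy as np

def add_channel_noise(img, sigmas, seed=None):
    """img: (H, W, 3) float image in [0, 1]; sigmas: one standard
    deviation per color channel. Adds zero-mean Gaussian noise and
    clips back to the valid range."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, 1.0, size=img.shape) * np.asarray(sigmas)
    return np.clip(img + noise, 0.0, 1.0)
```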
SLIDE 7

Data Generation - Realistic Sub-aperture Image Generation

Demosaic the noisy raw image.

Rearrange the pixels of each lenslet into each sub-aperture image.

SLIDE 8

Training Set http://hci-lightfield.iwr.uni-heidelberg.de/

Scenes and disparity ranges: Antinous [-3.3, 2.8], Boardgames [-1.8, 1.6], Dishes [-3.1, 3.5], Greek [-3.5, 3.1], Kitchen [-1.6, 1.8], Medieval2 [-1.7, 2.0], Museum [-1.5, 1.3], Pens [-1.7, 2.0], Pillows [-1.7, 1.8], Platonic [-1.7, 1.5], Rosemary [-1.8, 1.8], Table [-2.0, 1.6], Tomb [-1.5, 1.9], Tower [-3.6, 3.5], Town [-1.6, 1.6], Vinyl [-1.6, 1.2]

SLIDE 9

Cost Volumes - Phase Shift

[Figure: sub-aperture images; flipping adjacent views]

Very narrow baseline: physically 0.45 mm, disparities within 1 px

Averbuch and Keller, “A unified approach to FFT based image registration”, IEEE TIP 2003

Phase shift achieves 1/100-pixel precision

[Figure: crops comparing Original, Bilinear, Bicubic, and Phase-shift interpolation]

SLIDE 10

Cost Volumes - Phase Shift

[Figure: disparity maps vs. GT for Bilinear, Bicubic, and Phase-shift interpolation at 0.2% and 1% noise; error rates: 16.2%, 15.35%, 9.88%, 9.03%, 8.73%, 6.38%]

Jeon et al., “Accurate Depth Map Estimation from a Lenslet Light Field Camera”, CVPR 2015

SLIDE 11

Cost Volumes - Matching Costs

Sum of Absolute Differences (SAD): robust to image noise; acts as an averaging filter
Zero-mean Normalized Cross-Correlation (ZNCC): compensates for differences in both gain and offset
Census Transform (Census): tolerates radiometric distortions
Sum of Gradient Differences (GRAD): synergy with other matching costs by imposing higher weights at edge boundaries

  • H. Hirschmuller and D. Scharstein, “Evaluation of stereo matching costs on images with radiometric differences,” IEEE TPAMI 2009.
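Patch-level versions of the four costs can be sketched as below (window aggregation and the phase-shifted cost volumes are omitted):

```python
import numpy as np

def sad(p, q):                       # Sum of Absolute Differences
    return np.abs(p - q).sum()

def grad(p, q):                      # Sum of Gradient Differences
    gpy, gpx = np.gradient(p)
    gqy, gqx = np.gradient(q)
    return np.abs(gpx - gqx).sum() + np.abs(gpy - gqy).sum()

def zncc(p, q):                      # Zero-mean Normalized Cross-Correlation
    pz, qz = p - p.mean(), q - q.mean()
    denom = np.linalg.norm(pz) * np.linalg.norm(qz) + 1e-12
    return (pz * qz).sum() / denom   # 1.0 = perfect match

def census_cost(p, q):               # Hamming distance of census signatures
    def census(w):
        return (w > w[w.shape[0] // 2, w.shape[1] // 2]).ravel()
    return np.count_nonzero(census(p) != census(q))
```

Note how ZNCC is unchanged under gain/offset transforms and Census under any monotone intensity change, which is exactly why they survive radiometric distortion.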

SLIDE 12

Cost Volumes - Matching Group 1

[Diagram: a matching cost g(reference view, target view) is evaluated over the sub-aperture images at each depth label, producing a cost volume]
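Building one cost-volume slice per depth label can be sketched as follows (integer `np.roll` shifts stand in for the paper's sub-pixel phase shifts, and plain absolute difference stands in for the four matching costs):

```python
import numpy as np

def build_cost_volume(ref, tgt, labels):
    """Per-pixel absolute-difference cost between the reference view
    and the target view shifted by each candidate disparity label.
    Returns an (L, H, W) volume, one slice per label."""
    vol = np.empty((len(labels),) + ref.shape)
    for i, d in enumerate(labels):
        vol[i] = np.abs(ref - np.roll(tgt, d, axis=1))
    return vol
```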

SLIDE 13

Cost Volumes - Matching Group 2

[Diagram: the same matching cost g(reference view, target view) evaluated over a second grouping of sub-aperture images, producing another cost volume]

SLIDE 14

Cost Volumes - Computed Cost Volumes

[Table: one cost volume per (matching group, matching cost) pair: SAD, ZNCC, Census, GRAD]

SLIDE 15

Cost Volumes - Computed Cost Volumes

Disparities from each cost volume via Winner-Takes-All
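Winner-Takes-All is simply a per-pixel argmin over the label axis of a cost volume:

```python
import numpy as np

def winner_takes_all(vol, labels):
    """vol: (L, H, W) cost volume; returns the (H, W) map of the
    disparity label with the lowest cost at each pixel."""
    return np.asarray(labels)[np.argmin(vol, axis=0)]
```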

SLIDE 16

Cost Volumes - Computed Cost Volumes

Vectorizing the estimated depth labels together with the ground-truth depth label

[Figure: per-pixel labels from the fused cost volumes SAD+GRAD, GRAD+Census, and Census+SAD with blending weight β ∈ [0, 1.0], stacked into multiple disparity hypotheses alongside the ground truth]

Campbell et al., “Using multiple hypotheses to improve depth-maps for multi-view stereo”, ECCV 2008
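One way to read this step: fuse pairs of cost volumes with a blending weight β, take the WTA label for each β, and stack the labels into a per-pixel feature vector. A sketch under that reading:

```python
import numpy as np

def hypothesis_vectors(vol_a, vol_b, betas):
    """Blend two (L, H, W) cost volumes (e.g. SAD and GRAD) for each
    beta in [0, 1], take the per-pixel winner-takes-all label index,
    and stack the labels into one feature vector per pixel."""
    feats = [np.argmin((1.0 - b) * vol_a + b * vol_b, axis=0) for b in betas]
    return np.stack(feats, axis=-1)  # shape (H, W, len(betas))
```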

SLIDE 17

Cost Volumes - Computed Cost Volumes

[Figure: a second example of vectorizing the estimated depth labels with the ground-truth label into multiple disparity hypotheses]

SLIDE 18

Training a random forest

[Diagram: the per-pixel hypothesis vectors r are fed into Random Forest 1 for classification]

SLIDE 19

Random Forest1 - Classification

[Bar chart: permutation importance of the matching costs q1–q11, grouped into Matching Groups 1–4]

Retrieving a set of important matching costs using the permutation importance measure

[L. Breiman, “Random forests,” Machine Learning, 2001]

+ Removes unnecessary matching costs
+ Enables a better prediction model
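Breiman's permutation importance is model-agnostic: shuffle one feature column at a time and measure how much the score drops. A dependency-free sketch (the paper measures importance inside a trained random forest; here `score_fn` stands in for any fitted model's accuracy):

```python
import numpy as np

def permutation_importance(score_fn, X, y, n_repeats=5, seed=0):
    """Importance of feature j = average drop in score after randomly
    permuting column j of X, leaving everything else fixed."""
    rng = np.random.default_rng(seed)
    base = score_fn(X, y)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])   # destroy only feature j
            drops.append(base - score_fn(Xp, y))
        imp[j] = np.mean(drops)
    return imp
```

Features whose permutation barely changes the score (importance near zero) are the "unnecessary matching costs" that can be dropped before training the second forest.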
SLIDE 20

Random Forest2 - Regression

[Diagram: the selected matching costs in the feature vector r are the input of Random Forest 2]

Random Forest 2 performs regression, estimating a disparity value with sub-pixel precision.

Results are compared against SAD+GRAD [H.-G. Jeon et al., IEEE CVPR 2015] with a weighted median filter [Z. Ma et al., IEEE ICCV 2013].
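The second forest regresses a continuous disparity from the per-pixel feature vector. As a dependency-free stand-in for a random-forest regressor, a tiny k-NN regressor illustrates the same input/output contract (feature vector in, continuous disparity out):

```python
import numpy as np

def knn_regress(train_X, train_y, query, k=5):
    """Predict a continuous disparity for one feature vector as the
    mean target of its k nearest training vectors."""
    dist = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dist)[:k]
    return float(train_y[nearest].mean())
```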

SLIDE 21

Benchmark: Bad Pixel Ratio (>0.07 px) & Mean Squared Error

[Tables: benchmark rankings by bad pixel ratio and by mean squared error, as of 2017.05.23]
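Both benchmark metrics are one-liners:

```python
import numpy as np

def bad_pixel_ratio(est, gt, thresh=0.07):
    """Fraction of pixels whose absolute disparity error exceeds thresh."""
    return float(np.mean(np.abs(est - gt) > thresh))

def mean_squared_error(est, gt):
    """Mean of squared per-pixel disparity errors."""
    return float(np.mean((est - gt) ** 2))
```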

SLIDE 22

Evaluation Results - Stratified

[Figures: ground truth (GT), estimated disparity, and error map]

SLIDE 23

[Figures: ground truth (GT), estimated disparity, and error map]

Evaluation Results - Training

Most errors occur at depth boundaries

SLIDE 24

[Figures: ground truth (GT), estimated disparity, and error map]

Evaluation Results - Test

SLIDE 25

Real World Examples – Lytro Illum

[Figures comparing: Wanner and Goldluecke, IEEE TPAMI 14; Yu et al., ICCV 13; Jeon et al., CVPR 15; Williem et al., CVPR 16; Wang et al., IEEE TPAMI 16; Tao et al., IEEE TPAMI 17; Proposed]

SLIDE 26

Real World Examples – Lytro Illum

[Figures comparing: Wanner and Goldluecke, IEEE TPAMI 14; Yu et al., ICCV 13; Jeon et al., CVPR 15; Williem et al., CVPR 16; Wang et al., IEEE TPAMI 16; Tao et al., IEEE TPAMI 17; Proposed]

SLIDE 27

Conclusion

Pros:

  • Accurate disparity estimation
  • Handles the narrow-baseline problem
  • Robust to image noise
  • Applicable to real-world light-field images

Cons:

  • Heavy computational burden
  • Disparity errors at depth discontinuities still need to be reduced
  • Textureless regions require better handling

Contributions:

  • Analysis of the problems of depth estimation using light-field cameras
  • Data augmentation that simulates a pipeline of a hand-held light-field camera
  • Pixel-wise disparity value prediction using two random forests
[Figure: a 3D mesh and a 3D-printed object produced from the estimated depth]
SLIDE 28

Data Generation - Add Noise

[Figure: results without an augmented training set, with a training set augmented with Gaussian noise, and with the fully augmented training set]