A Concurrent Deep Learning Model to Remove Reflections - Boxin Shi and Renjie Wan - PowerPoint PPT Presentation



SLIDE 1

A Concurrent Deep Learning Model to Remove Reflections

Collaborators: Ling-Yu Duan, Ah-Hwee Tan, and Alex C. Kot

Boxin Shi and Renjie Wan

shiboxin@pku.edu.cn, wanpeoplejie@gmail.com

SLIDE 2

Outline

• Problem background
• Two-stage framework based methods
  • Low-level image prior based methods (ICIP16, TIP18)
  • Learning based solutions
  • Limitations
• Breaking the limitations of the two-stage framework
  • SIR2 benchmark dataset (ICCV17)
  • CRRN: a deep learning model to remove reflections (CVPR18)

SLIDE 3

Problem background

[Figure: camera capturing a background scene through glass, producing a reflection]

SLIDE 4

Problem background

Images are from “Li et al. Exploiting Reflection Change for Automatic Reflection Removal. ICCV 2013”

SLIDE 5

Problem background

• Difficulties of this problem
  • Estimating two unknown layers (background and reflection) from a single equation
  • The similarity between background and reflection

Mixture image J, background C, reflection S:  J = C + S
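A minimal numpy sketch of why this is ill-posed (illustrative only, not from the slides): with a single observation J, every split of J into two layers satisfies J = C + S, so additional priors are needed to pick the right one.

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.random((4, 4))  # observed mixture image

# Any alpha in [0, 1] gives a valid decomposition J = C + S:
# the single equation cannot distinguish between these candidates.
candidates = []
for alpha in (0.2, 0.5, 0.8):
    C = alpha * J          # candidate background layer
    S = (1.0 - alpha) * J  # candidate reflection layer
    assert np.allclose(C + S, J)
    candidates.append((C, S))
```

Each candidate reproduces J exactly, which is why reflection removal methods add priors (sparsity, smoothness, learned features) to make the separation well-posed.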

SLIDE 6

Related work

• A two-stage framework: Detection and Removal.

[Figure: two-stage pipeline separating reflection and background: Detection → Removal → Results]

AY07: Levin et al. User assisted separation of reflections from a single image using a sparsity prior. TPAMI 2007

Q(M_C, M_S) = Q_1(M_C) · Q_2(M_S)

SLIDE 7

Related work

[Figure: image sequence → background edges and reflection edges (Detection) → Removal → Results]

Li et al. Exploiting Reflection Change for Automatic Reflection Removal. ICCV 2013

Q(M_C, M_S) = Q_1(M_C) · Q_2(M_S)

SLIDE 8

Related work

[Figure: mixture image and DoF confidence map → background edges and reflection edges (Detection) → Removal → Result]

WS16: Wan et al. “Depth of field guided reflection removal”. ICIP 2016

Q(M_C, M_S) = Q_1(M_C) · Q_2(M_S)

SLIDE 9

Related work

• Regional properties of reflections
  • Reflections often cover only a small region of the image

WS18: Wan et al. “Region aware reflection removal with unified content and gradient priors” TIP 2018

SLIDE 10

Related work

• Learning based methods with the two-stage framework
  • Noroozi et al. ConvNet-based Depth Estimation, Reflection Separation and Deblurring of Plenoptic Images. ACCV 2016
  • Fan et al. A Generic Deep Architecture for Single Image Reflection Removal and Image Smoothing. ICCV 2017

[Figure: edge extraction followed by image reconstruction]

SLIDE 11

Related work

• Not dependent on the two-stage framework
  • LB14: Li and Brown. Single Image Layer Separation using Relative Smoothness. CVPR 2014
  • NR17: Arvanitopoulos et al. Single Image Reflection Suppression. CVPR 2017
  • SK15: Shih et al. Reflection Removal using Ghosting Cues. CVPR 2015

[Figure: mixture image separated into a background image and a reflection image]

min_{M₁,M₂} Σ_{j,k} [ ς((M₁)_j) + υ((M₂)_k)² ]
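A toy 1-D numpy sketch of this kind of objective (an IRLS solver written for illustration; `separate` and its parameters are my assumptions, not LB14's actual algorithm): an L1-like penalty lets one layer keep a few sharp edges, while a quadratic penalty forces the other layer, M2 = J - M1, to stay smooth.

```python
import numpy as np

def diff_matrix(n):
    """Forward-difference operator D of shape (n-1, n)."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

def separate(J, lam=100.0, iters=30, eps=1e-4):
    """Minimize sum |D M1| + lam * sum (D (J - M1))^2 over M1 via IRLS.

    The L1 term tolerates a few large jumps in M1 (edge-sparse layer),
    while the quadratic term spreads M2's gradients thin (smooth layer)."""
    n = len(J)
    D = diff_matrix(n)
    M1 = J / 2.0
    for _ in range(iters):
        # IRLS weights: |g| is majorized by g^2 / (2 |g_old|)
        w = 0.5 / (np.abs(D @ M1) + eps)
        A = D.T @ (w[:, None] * D) + lam * (D.T @ D)
        b = lam * (D.T @ (D @ J))
        # A is singular on constant signals; a tiny ridge pins the mean.
        M1 = np.linalg.solve(A + 1e-6 * np.eye(n), b)
    return M1, J - M1

# Toy mixture: a step edge (background-like) plus a smooth ramp (reflection-like).
x = np.linspace(0.0, 1.0, 64)
J = (x > 0.5).astype(float) + 0.5 * x
M1, M2 = separate(J)
```

On this toy signal the recovered M1 keeps most of the step's sharp jump while M2 absorbs the slowly varying ramp; the decomposition is only defined up to a constant offset, which the ridge resolves by giving M1 zero mean.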

SLIDE 12

Related work

[Figure: image smoothing]

SLIDE 13

Related work

[Figure: mixture image and separation result]

SLIDE 14

Limitations

• Limitations of the two-stage framework
  • Highly dependent on specific scenarios
  • Limited ability to describe reflection properties, such as blurring effects or ghosting effects

[Figure: failure case of ghosting effects (mixture image and result by SK15); failure case of blurring effects (mixture image and result by NR17)]

SLIDE 15

Breaking the two-stage limitations

• Benchmark data with ground truth
• A concurrent network

SLIDE 16

SIR2: Motivations

SIngle-image Reflection Removal dataset

[Figure: results of LB14 and SK15]

SLIDE 17

SIR2: Motivations

[Figure: results of LB14 and SK15; ground truth not available]

SLIDE 18

SIR2: Motivations

[Figure: results of LB14 and SK15; ground truth not available and existing images not enough]

SLIDE 19

SIR2: A benchmark dataset

[Figure: capture setup with glass; ground-truth background and reflection images]

SLIDE 20

SIR2: A benchmark dataset

[Figure: capture setup with black paper placed behind the glass to capture the reflection image]

SLIDE 21

SIR2: A benchmark dataset

[Figure: ground-truth background and reflection images]

SLIDE 22

SIR2: Types of reflections

SLIDE 23

SIR2: Images with different reflections

• Different parameters to explore the influence of different settings
  • Seven different aperture sizes and three different thickness settings in the postcard and solid object datasets
  • Different indoor and outdoor scenes in the uncontrolled scene dataset

SLIDE 24

SIR2: Various scenarios

• Image triplets taken in different scenarios
  • The postcard dataset (200 image triplets, 600 images in total)
  • The solid object dataset (200 image triplets, 600 images in total)
  • The wild scene dataset (100 scenes, 300 images in total)

[Figure: mixture image, background, and reflection examples]

SLIDE 25

SIR2: Various scenarios

[Figure: mixture image, background, and reflection examples]

SLIDE 26

SIR2: Various scenarios

[Figure: mixture image, background, and reflection examples]

Accepted by ICCV 2017. More details can be found at https://sir2data.github.io

SLIDE 27

SIR2: Limitations of evaluated methods

• Ignorance of the regional properties of reflections
• High dependence on specific priors
• Ghosting effects and blurring effects

[Figure: failure case of ghosting effects (mixture image and result by SK15); failure case of blurring effects (mixture image and result by NR17)]

SLIDE 28

CRRN: Deep learning based methods

Noroozi et al. ConvNet-based Depth Estimation, Reflection Separation and Deblurring of Plenoptic Images. ACCV 2016 (depth extraction + image reconstruction)

FY17: Fan et al. A Generic Deep Architecture for Single Image Reflection Removal and Image Smoothing. ICCV 2017 (edge extraction + image reconstruction)

SLIDE 29

CRRN: Training data preparation

J = C + S
J = C + S ∗ k  (LB14, WS16, …)
J = C + S ∗ (βδ₁ + γδ₂)  (SK15, FY17)

SLIDE 30

CRRN: Training data preparation

SLIDE 31

CRRN: Training data preparation

SLIDE 32

CRRN: Training data preparation

 3250 reflection images taken from different places

SLIDE 33

CRRN: Network structure

[Figure: CRRN architecture. Two concurrent encoder-decoder branches share information through concat operations and multi-scale guided inference: IiN (image inference network) takes the input image, and GiN (gradient inference network) takes the input gradient. Building blocks: conv layers (stride 1, 2), max-pooling layers, de-conv layers (stride 2), and feature extraction layers from a fine-tuned VGG model. Outputs: estimated C∗, estimated S∗, and the estimated gradient.]
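A shape-level numpy sketch of the multi-scale guided inference idea (function names and the pooling/upsampling choices are illustrative, not the paper's layers): the gradient branch's feature map at each scale is concatenated into the image branch's decoder before every upsampling step.

```python
import numpy as np

def downsample(x):
    """2x average pooling on an (H, W, C) feature map."""
    h, w, c = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    """2x nearest-neighbour upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def encoder(x, depth=3):
    """Stand-in encoder: one feature map per scale (real layers would convolve)."""
    feats = [x]
    for _ in range(depth):
        feats.append(downsample(feats[-1]))
    return feats

def guided_decoder(img_feats, grad_feats):
    """Decode the image branch, concatenating the gradient branch's feature
    map at the matching scale before each upsampling step."""
    x = img_feats[-1]
    for level in range(len(img_feats) - 2, -1, -1):
        x = np.concatenate([x, grad_feats[level + 1]], axis=-1)  # guidance
        x = upsample(x)
    return x

img = np.random.default_rng(0).random((32, 32, 3))
grad = np.abs(np.diff(img, axis=0, prepend=img[:1]))  # crude gradient input
out = guided_decoder(encoder(img), encoder(grad))
```

Running the two encoders concurrently and injecting gradient features at every decoder scale is what lets edge information steer the image reconstruction, instead of detecting edges in a separate first stage.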

SLIDE 34

CRRN: Network structure

SLIDE 35

CRRN: Network structure

SLIDE 36

CRRN: Network structure

SLIDE 37

CRRN: Loss functions

• A perceptually motivated loss function
  • Pixel-wise losses generate blurry artifacts
  • Perceptual losses give better visual quality

SSIM(y, z) = [m(y, z)]^β · [d(y, z)]^γ · [t(y, z)]^δ   and   SI(y, z) = [t(y, z)]^δ
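A minimal single-window sketch of these two terms (global statistics over the whole image rather than the usual sliding window; the constants are standard SSIM choices, and `m`, `d`, `t` follow the slide's naming for luminance, contrast, and structure):

```python
import numpy as np

def ssim_terms(y, z, L=1.0):
    """Global SSIM components: m (luminance), d (contrast), t (structure)."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    C3 = C2 / 2.0
    mu_y, mu_z = y.mean(), z.mean()
    sy, sz = y.std(), z.std()
    cov = ((y - mu_y) * (z - mu_z)).mean()
    m = (2 * mu_y * mu_z + C1) / (mu_y**2 + mu_z**2 + C1)  # luminance
    d = (2 * sy * sz + C2) / (sy**2 + sz**2 + C2)          # contrast
    t = (cov + C3) / (sy * sz + C3)                        # structure
    return m, d, t

def ssim(y, z, beta=1.0, gamma=1.0, delta=1.0):
    """SSIM(y, z) = m^beta * d^gamma * t^delta."""
    m, d, t = ssim_terms(y, z)
    return (m ** beta) * (d ** gamma) * (t ** delta)

def si(y, z, delta=1.0):
    """SI(y, z) = t^delta: the structure term alone."""
    return ssim_terms(y, z)[2] ** delta
```

A training loss would minimize 1 - ssim(...); dropping the luminance and contrast factors in SI makes the loss focus purely on structural agreement between the estimate and the ground truth.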

SLIDE 38

CRRN: Evaluations

• Comparisons with four state-of-the-art methods: LB14, WS16, NR17, and FY17
• Generalization comparison with FY17
  • The wild scene dataset from SIR2 (100 image triplets)
• Visual and quantitative comparisons
  • Global evaluations: SSIM and SI
  • Local evaluations: SSIMr and SIr

SLIDE 39

CRRN: Visual quality evaluations

[Figure: input image, ground truth, and results by Ours, FY17, NR17, WS16, and LB14, with per-result SSIM scores]

SLIDE 40

CRRN: Quantitative evaluations

SLIDE 41

CRRN: Generalization evaluations

SLIDE 42

CRRN: Generalization evaluations

SLIDE 43

CRRN: Generalization evaluations

[Figure: two examples, each showing the mixture image, our result, and FY17's result]

SLIDE 44

THANK YOU

Boxin Shi, Renjie Wan

shiboxin@pku.edu.cn, wanpeoplejie@gmail.com