Recurrent Transformer Networks for Semantic Correspondence (PowerPoint PPT Presentation)



SLIDE 1

Neural Information Processing Systems (NeurIPS) 2018

Recurrent Transformer Networks for Semantic Correspondence

Seungryong Kim1, Stephen Lin2, Sangryul Jeon1, Dongbo Min3, Kwanghoon Sohn1

  • Dec. 05, 2018

1) 2) 3)

SLIDE 2

Semantic Correspondence

  • Establishing dense correspondences between semantically similar images,

i.e., different instances within the same object or scene categories

  • For example, the wheels of two different cars, or the bodies of people or animals


Seungryong Kim et al., Recurrent Transformer Networks for Semantic Correspondence, NeurIPS, 2018

Introduction

SLIDE 3

Challenges in Semantic Correspondence


Introduction

Photometric Deformations

  • Intra-class appearance and attribute variations
  • Etc.

Geometric Deformations

  • Different viewpoint or baseline
  • Non-rigid shape deformations
  • Etc.

Lack of Supervision

  • Labor-intensive annotation
  • Degraded by subjectivity
  • Etc.

SLIDE 4

Objective: how can locally-varying affine transformation fields be estimated without ground-truth supervision?


Problem Formulation

For each pixel $j$, the transformation field $\mathbf{U}_j = [\mathbf{B}_j, \mathbf{g}_j]$ maps $j$ to $j' = \mathbf{T}_j\, j$
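The per-pixel transformation field can be made concrete with a small sketch. This assumes, as the slide's notation suggests but does not spell out, that the field at pixel $j$ packs a 2x2 affine matrix $\mathbf{B}_j$ and a 2D translation $\mathbf{g}_j$; `apply_affine` is a hypothetical helper for illustration, not code from the paper.

```python
# Sketch (assumed convention): the field at pixel j is U_j = [B_j, g_j],
# and a coordinate j is mapped to j' = B_j @ j + g_j.
def apply_affine(B, g, j):
    """Apply a 2x2 affine matrix B and translation g to a 2D coordinate j."""
    x, y = j
    return (B[0][0] * x + B[0][1] * y + g[0],
            B[1][0] * x + B[1][1] * y + g[1])

# Identity B with a pure translation g simply shifts the point by g.
print(apply_affine([[1.0, 0.0], [0.0, 1.0]], [2.0, -1.0], (3.0, 4.0)))  # (5.0, 3.0)
```

In the paper's setting one such $\mathbf{U}_j$ exists at every pixel, which is what "locally-varying" means: neighboring pixels may carry different affine warps.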

SLIDE 5

Methods for Geometric Invariance in Feature Extraction Step

  • UCN [Choy et al., NeurIPS’16]
  • CAT-FCSS [Kim et al., TPAMI’18]
  • Etc.


Background

Spatial Transformer Networks (STN)-based methods [Jaderberg et al., NeurIPS’15]

  • The affine component $\mathbf{B}_j$ is learned without ground truth $\mathbf{B}_j^*$
  • But the translation $\mathbf{g}_j$ is learned with ground truth $\mathbf{g}_j^*$
  • Geometric inference is based on only the source or target image
SLIDE 6

Methods for Geometric Invariance in Regularization Step

  • GMat. [Rocco et al., CVPR’17]
  • GMat. w/Inl. [Rocco et al., CVPR’18]
  • Etc.


Background

  • $\mathbf{U}_j$ is learned without ground truth $\mathbf{U}_j^*$, using self- or meta-supervision
  • Geometric inference uses both the source and target images
  • But only globally-varying geometric inference is performed
  • Only fixed, untransformed versions of the features are matched

SLIDE 7

Networks Configuration

  • Weaves together the advantages of STN-based methods and geometric matching methods by recursively estimating geometric transformation residuals from geometry-aligned feature activations


Recurrent Transformer Networks (RTNs)

SLIDE 8

Feature Extraction Networks

  • Input images $J^s$ and $J^t$ are passed through Siamese convolutional networks with shared parameters $\mathbf{X}_G$, such that $E_j = G(J; \mathbf{X}_G)$
  • Backbones: CAT-FCSS, VGGNet (conv4-4), or ResNet (conv4-23)
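The key point of the Siamese design is that one set of parameters embeds both images. The sketch below is a toy stand-in: `G` here is a single shared scaling weight, not the actual CAT-FCSS/VGG/ResNet backbone, and exists only to show the weight sharing.

```python
# Siamese extraction sketch: the SAME parameters X_G process both images,
# so E^s = G(J^s; X_G) and E^t = G(J^t; X_G) live in the same feature space.
def G(image, X_G):
    # toy "backbone": elementwise scaling by the single shared weight
    return [X_G * v for v in image]

X_G = 0.5               # shared parameters
E_s = G([2.0, 4.0], X_G)  # source features
E_t = G([6.0, 8.0], X_G)  # target features
```

Because the mapping is identical for both inputs, distances between $E^s$ and $E^t$ are meaningful, which is what makes the later correlation volume well-defined.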


Recurrent Transformer Networks (RTNs)

SLIDE 9

Recurrent Geometric Matching Networks

  • Constrained correlation volume construction:

$D(E_j^s, E^t(\mathbf{U}_k)) = \langle E_j^s, E^t(\mathbf{U}_k)\rangle \,/\, \|\langle E_j^s, E^t(\mathbf{U}_k)\rangle\|_2$


Recurrent Transformer Networks (RTNs)

[Figure: source and target images]

SLIDE 10

Recurrent Geometric Matching Networks

  • Recurrent geometric inference

$\mathbf{U}_j^{l} - \mathbf{U}_j^{l-1} = G(D(E_j^s, E^t(\mathbf{U}_j^{l-1}))\,|\,\mathbf{X}_H)$
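The recursion can be sketched abstractly: at each iteration the network predicts a residual that is added to the current estimate. Below, `residual_fn` is a scalar stand-in for the composite $G(D(E^s, E^t(\mathbf{U}^{l-1}))\,|\,\mathbf{X}_H)$; the real field is per-pixel and affine, so this only illustrates the update rule.

```python
# Toy sketch of the recurrent residual update U^l = U^{l-1} + residual(U^{l-1}).
def refine(U0, residual_fn, iters=4):
    U = U0
    history = [U]
    for _ in range(iters):
        U = U + residual_fn(U)  # U^l - U^{l-1} = G(D(...) | X_H)
        history.append(U)
    return history

# A contractive residual drives the estimate toward a fixed point (here 1.0),
# mirroring how the slide's Iter. 1-4 panels show progressively better alignment.
print(refine(0.0, lambda U: 0.5 * (1.0 - U), iters=3))  # [0.0, 0.5, 0.75, 0.875]
```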


Recurrent Transformer Networks (RTNs)

[Figure: source and target images with correspondences refined over iterations 1-4]
SLIDE 11

Weakly-supervised Learning

  • Intuition: the matching score between the source feature $E_j^s$ at each pixel $j$ and the target feature $E^t(\mathbf{U}_j)$ should be maximized while keeping the scores of other candidates low

  • Loss function:

$M(E_j^s, E^t(\mathbf{U})) = -\sum_{k \in N_j} q_k^* \log\big(q(E_j^s, E^t(\mathbf{U}_k))\big)$

where $q(E_j^s, E^t(\mathbf{U}_k))$ is a softmax probability

$q(E_j^s, E^t(\mathbf{U}_k)) = \frac{\exp(D(E_j^s, E^t(\mathbf{U}_k)))}{\sum_{m \in N_j} \exp(D(E_j^s, E^t(\mathbf{U}_m)))}$

and $q_k^*$ denotes a class label defined as 1 if $k = j$, and 0 otherwise
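Since $q_k^*$ is 1 only at $k = j$, the loss collapses to a standard cross-entropy on the "correct candidate" entry. A self-contained sketch (assumed shapes: `scores[k]` holds $D(E_j^s, E^t(\mathbf{U}_k))$ over the candidates in $N_j$, and `j` indexes the pixel's own position in that list):

```python
import math

# Weakly-supervised matching loss at one pixel: softmax over candidate
# scores, then negative log-probability of the pixel's own position j.
def matching_loss(scores, j):
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    q = [e / z for e in exps]      # softmax probabilities q(E_j^s, E^t(U_k))
    return -math.log(q[j])         # q_k^* selects only the k = j term
```

With uniform scores over $n$ candidates the loss is $\log n$; as the score at $j$ dominates, the loss approaches zero, which is exactly the "maximize the match, suppress the rest" intuition above.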


Recurrent Transformer Networks (RTNs)

SLIDE 12

Results on the TSS Benchmark


Experimental Results

[Figure: qualitative results showing source images, target images, SCNet [Han et al., ICCV’17], GMat. w/Inl. [Rocco et al., CVPR’18], and RTNs]

SLIDE 13

Results on the PF-PASCAL Benchmark


Experimental Results

[Figure: qualitative results showing source images, target images, SCNet [Han et al., ICCV’17], GMat. w/Inl. [Rocco et al., CVPR’18], and RTNs]

SLIDE 14

Results on the PF-PASCAL Benchmark


Experimental Results

[Figure: qualitative results showing source images, target images, SCNet [Han et al., ICCV’17], GMat. w/Inl. [Rocco et al., CVPR’18], and RTNs]

SLIDE 15


Concluding Remarks

  • RTNs learn to infer locally-varying geometric fields for semantic

correspondence in an end-to-end and weakly-supervised fashion

  • The key idea is to utilize and iteratively refine the transformations and

convolutional activations through matching between the image pair

  • A technique is presented for weakly-supervised training of RTNs
SLIDE 16

Seungryong Kim, Ph.D.

Digital Image Media Lab. Yonsei University, Seoul, Korea

Tel: +82-2-2123-2879 E-mail: srkim89@yonsei.ac.kr Homepage: http://diml.yonsei.ac.kr/~srkim/

Thank you!

See you at 210 & 230 AB #119