CNN^2: Viewpoint Generalization via a Binocular Vision Wei-Da Chen - - PowerPoint PPT Presentation

SLIDE 1

CNN^2: Viewpoint Generalization via a Binocular Vision

Wei-Da Chen and Shan-Hung Wu CS Department, National Tsing-Hua University Taiwan, R.O.C. wdchen@datalab.cs.nthu.edu.tw, shwu@cs.nthu.edu.tw

SLIDE 2

On Generalizability of CNNs

  • Convolutional Neural Networks (CNNs) have laid the foundation for many techniques in various applications

  • However, the 3D viewpoint generalizability of CNNs still falls far behind the capabilities of human vision

SLIDE 3

3D Viewpoint Generalizability

  • Humans can recognize objects at unseen angles
  • But CNNs cannot

[Figure: Train vs. Test viewpoints]

SLIDE 4

Outline

  • Related work
  • CNN^2
  • Dual feedforward pathways
  • Dual parallax augmentation
  • Concentric Multiscale (CM) pooling
  • Experiments

SLIDE 5

Voxel-Reconstruction Methods

  • E.g., the Perspective Transformer Networks (PTNs) by Yan et al. 16

  • Learn 3D models directly

SLIDE 6

Cons

  • Require either
  • Voxel-level supervision, or
  • Omnidirectional images as input
  • Both are expensive to collect in practice

SLIDE 7

CapsuleNets (Hinton et al. 17, 18)

  • Different capsules are organized in a parse tree where lower-level capsules are dynamically routed to upper-level capsules using an agreement protocol

  • When the viewpoint changes, the “routes” change in a coordinated way
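
To make "routed ... using an agreement protocol" concrete, here is a minimal NumPy sketch of routing-by-agreement in the style of the dynamic routing of Sabour, Frosst and Hinton (2017). The function names, the three routing iterations, and the toy prediction vectors are illustrative assumptions; in a real CapsuleNet the predictions `u_hat` come from learned transformation matrices, which are omitted here.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Scale each vector to a length in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def route_by_agreement(u_hat, num_iters=3):
    """Routing-by-agreement between two capsule layers (sketch).

    u_hat: array of shape (num_lower, num_upper, dim); u_hat[i, j] is lower
           capsule i's prediction of upper capsule j's output vector.
    Returns (v, c): upper-capsule outputs of shape (num_upper, dim) and the
    coupling coefficients of shape (num_lower, num_upper) used to compute them.
    """
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))  # routing logits, start uniform
    for _ in range(num_iters):
        # Softmax over upper capsules: each lower capsule splits its vote.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = np.einsum("ij,ijd->jd", c, u_hat)  # coupling-weighted votes
        v = squash(s)
        # Raise the logit wherever a prediction agrees with the output.
        b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v, c


# Two lower capsules agree about upper capsule 0 but contradict each other
# about upper capsule 1, so routing shifts their votes toward capsule 0.
u_hat = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
])
v, c = route_by_agreement(u_hat)
print(c)
```

The `for` loop runs on every forward pass, which is one source of the extra computational cost of iterative routing.
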

SLIDE 8

But…

  • CapsuleNets have been found to be hard to train
  • Capsules increase the number of model parameters
  • The iterative routing-by-agreement algorithm is time-consuming
  • Does not ensure the emergence of a correct parse tree (Peer et al. 18)
  • Not compatible with CNNs, and therefore cannot benefit from the rich CNN ecosystem

SLIDE 9

Outline

  • Related work
  • CNN^2
  • Dual feedforward pathways
  • Dual parallax augmentation
  • Concentric Multiscale (CM) pooling
  • Experiments

SLIDE 10

Our Goals

  • A new model that
  • has improved 3D viewpoint generalizability
  • does not require expensive input and supervision
  • is CNN compatible

SLIDE 11

Observation: Humans understand the world using two eyes!

SLIDE 12

Binocular Images

  • Today, binocular images can be collected easily
  • Most people use smartphones, which are now usually equipped with two or more lenses
  • One can also extract two nearby frames from online videos to construct a large binocular image dataset

SLIDE 13

Binocular Solution 1 (LeCun et al. 14)

  • Stacks the two binocular images along the channel dimension and then feeds them to a regular CNN
  • But does not model any prior of binocular vision

[Figure: the two images are merged, then passed through three Conv → Pooling stages and a Classifier]
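
A minimal NumPy sketch of this early-fusion input, assuming images stored as `(H, W, C)` arrays; the function name `stack_binocular` is hypothetical.

```python
import numpy as np


def stack_binocular(left, right):
    """Merge a left/right image pair along the channel axis.

    left, right: arrays of shape (H, W, C) showing the same scene.
    Returns an (H, W, 2C) array that an ordinary CNN can consume.
    """
    if left.shape != right.shape:
        raise ValueError("left and right views must have the same shape")
    return np.concatenate([left, right], axis=-1)


# Example: a grayscale (C = 1) pair becomes one 2-channel image.
left = np.zeros((32, 32, 1))
right = np.ones((32, 32, 1))
x = stack_binocular(left, right)
print(x.shape)  # (32, 32, 2)
```

After the merge, the two views are just channels of one tensor, and nothing tells the network that they are two viewpoints of the same scene.
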

SLIDE 14

Binocular Solution 2

  • Sol. 1 + Monodepth (Godard et al. 17)
  • Calculate the depth map explicitly, then add it as additional input channels
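
Solution 2 only changes the input tensor: the depth map becomes extra channels next to the stacked pair. The sketch below assumes the depth map has already been computed by a separate estimator (Monodepth on the slide, not reproduced here); `add_depth_channels` is a hypothetical name and the constant depth values are placeholders.

```python
import numpy as np


def add_depth_channels(stacked, depth):
    """Append a depth map to a stacked binocular input as extra channels.

    stacked: (H, W, 2C) left/right pair already merged on the channel axis.
    depth:   (H, W) or (H, W, D) depth map computed beforehand by a
             separate depth-estimation model (not shown here).
    """
    if depth.ndim == 2:
        depth = depth[..., np.newaxis]  # (H, W) -> (H, W, 1)
    if depth.shape[:2] != stacked.shape[:2]:
        raise ValueError("depth map must match the image height and width")
    return np.concatenate([stacked, depth], axis=-1)


stacked = np.zeros((32, 32, 2))   # a merged grayscale pair
depth = np.full((32, 32), 5.0)    # placeholder depth values
x = add_depth_channels(stacked, depth)
print(x.shape)  # (32, 32, 3)
```
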

SLIDE 15

However…

  • Depth information is only a subset of the knowledge that can be learned from binocular vision
  • Studies in neuroscience have found that the human visual system can detect
  • Stereoscopic edges (Von der Heydt et al. 00)
  • Foreground and background (Qiu and Von der Heydt 05; Maruko et al. 08)
  • Illusory contours of objects (Von der Heydt et al. 84; Anzai et al. 07)

SLIDE 16

Our Solution: CNN^2

  • Dual feedforward pathways
  • Dual parallax augmentation
  • Concentric Multiscale (CM) pooling

[Figure: CNN^2 architecture. Two pathways, each with three CM Pooling → Conv stages, exchange features through "augment" steps; their outputs are merged by "add" and passed to a Classifier]

SLIDE 17

Outline

  • Related work
  • CNN^2
  • Dual feedforward pathways
  • Dual parallax augmentation
  • Concentric Multiscale (CM) pooling
  • Experiments

SLIDE 18

Dual Feedforward Pathways

  • The visual systems in the left and right hemispheres of the human brain are known to have different biases (Gotts et al. 13)
  • Filters/kernels in the left and right pathways can learn different (biased) features

[Figure: the CNN^2 architecture beside the human visual pathway: optic nerve, optic chiasm, lateral geniculate nucleus (LGN), visual cortex]

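
A rough NumPy sketch of the dual-pathway idea: each view goes through its own, separately parameterized stack of layers, so the two sides are free to learn different filters. Simplifying assumptions, not taken from the slides: 1×1 convolutions stand in for spatial convolutions, CM pooling and parallax augmentation are left out, and the merged features go through global average pooling and a linear classifier. The merge by addition mirrors the "add" step before the classifier in the architecture diagram. All names are hypothetical.

```python
import numpy as np


def conv1x1_relu(x, w):
    """1x1 convolution + ReLU: mixes channels at each pixel.

    x: (H, W, C_in); w: (C_in, C_out) -> (H, W, C_out).
    """
    return np.maximum(x @ w, 0.0)


def init_pathway(channels, rng):
    """One pathway = its own weight matrices, one per layer."""
    return [rng.standard_normal((c_in, c_out)) * 0.1
            for c_in, c_out in zip(channels[:-1], channels[1:])]


def run_pathway(x, weights):
    for w in weights:
        x = conv1x1_relu(x, w)
    return x


def dual_pathway_logits(left, right, left_weights, right_weights, w_cls):
    """Run each view through its own pathway, merge by addition, classify."""
    h = run_pathway(left, left_weights) + run_pathway(right, right_weights)
    features = h.mean(axis=(0, 1))  # global average pooling -> (C_last,)
    return features @ w_cls         # (num_classes,)


rng = np.random.default_rng(0)
channels = [1, 8, 16]
left_weights = init_pathway(channels, rng)   # parameters for the left side ...
right_weights = init_pathway(channels, rng)  # ... are not shared with the right
w_cls = rng.standard_normal((channels[-1], 10)) * 0.1
left, right = rng.random((32, 32, 1)), rng.random((32, 32, 1))
logits = dual_pathway_logits(left, right, left_weights, right_weights, w_cls)
print(logits.shape)  # (10,)
```

Sharing one weight list between both calls to `run_pathway` would turn this into a single (Siamese) network; keeping two lists is what lets each side develop its own bias.
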
SLIDE 19

Outline

  • Related work
  • CNN^2
  • Dual feedforward pathways
  • Dual parallax augmentation
  • Concentric Multiscale (CM) pooling
  • Experiments

SLIDE 20

Dual Parallax Augmentation (1/2)

Left path:

$$\tilde{h}_L = \operatorname{concat}\left(h_L,\; h_L - h_R\right)$$

Right path:

$$\tilde{h}_R = \operatorname{concat}\left(h_R,\; h_R - h_L\right)$$

where $h_L, h_R \in \mathbb{R}^{W \times H \times C}$ are the left- and right-pathway feature maps and concat stacks along the channel dimension, so $\tilde{h}_L, \tilde{h}_R \in \mathbb{R}^{W \times H \times 2C}$.

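
The augmentation on this slide can be sketched in NumPy: each pathway keeps its own feature map and appends its difference from the other pathway along the channel axis. The sketch assumes feature maps laid out as `(W, H, C)` to match the slide's notation; `parallax_augment` is a hypothetical name.

```python
import numpy as np


def parallax_augment(h_left, h_right):
    """Dual parallax augmentation (sketch).

    h_left, h_right: feature maps of shape (W, H, C) from the left and right
    pathways. Each side keeps its own features and appends its difference
    from the other side on the channel axis.
    Returns (left_aug, right_aug), each of shape (W, H, 2C).
    """
    if h_left.shape != h_right.shape:
        raise ValueError("pathway feature maps must have the same shape")
    left_aug = np.concatenate([h_left, h_left - h_right], axis=-1)
    right_aug = np.concatenate([h_right, h_right - h_left], axis=-1)
    return left_aug, right_aug


h_left = np.full((4, 4, 2), 3.0)
h_right = np.full((4, 4, 2), 1.0)
left_aug, right_aug = parallax_augment(h_left, h_right)
print(left_aug.shape)   # (4, 4, 4)
print(left_aug[0, 0])   # [3. 3. 2. 2.]
```

When the two views are identical, the appended channels are all zero, so the extra channels carry only the parallax between the pathways.
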
SLIDE 21

Dual Parallax Augmentation (2/2)

  • Allows the filters/kernels in convolutional layers to recursively detect stereoscopic features at different abstraction levels by looking into the parallax
  • Small differences between the two input images at the pixel level and in shallow layers may add up to a large difference in deeper layers

SLIDE 22

Outline

  • Related work
  • CNN^2
  • Dual feedforward pathways
  • Dual parallax augmentation
  • Concentric Multiscale (CM) pooling
  • Experiments

SLIDE 23

Concentric Multiscale (CM) Pooling (1/2)

  • Areas that are out of focus are blurred

SLIDE 24

Concentric Multiscale (CM) Pooling (2/2)

[Figure: scanning the input with average pooling (Avg. Pool)]

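
The slides do not spell out the exact CM pooling operation, so the following NumPy sketch is one plausible reading, assumed rather than taken from the paper: at every position, average pooling over concentric square windows of increasing odd size (stride 1, edge padding), with the sharp and blurred results stacked along the channel axis. The window sizes `(1, 3, 5)` and the name `cm_pool` are hypothetical.

```python
import numpy as np


def cm_pool(x, sizes=(1, 3, 5)):
    """Concentric multiscale average pooling (illustrative sketch).

    x: feature map of shape (H, W, C).
    For every position, average x over concentric k x k windows centered on
    that position (stride 1, edge padding) for each odd k in `sizes`, and
    stack the results on the channel axis: output (H, W, C * len(sizes)).
    Small windows keep the input sharp; larger windows blur it.
    """
    h, w, _ = x.shape
    scales = []
    for k in sizes:
        if k % 2 == 0:
            raise ValueError("window sizes must be odd so windows share a center")
        r = k // 2
        padded = np.pad(x, ((r, r), (r, r), (0, 0)), mode="edge")
        acc = np.zeros(x.shape, dtype=float)
        for dy in range(k):           # sum the k x k shifted copies ...
            for dx in range(k):
                acc += padded[dy:dy + h, dx:dx + w]
        scales.append(acc / (k * k))  # ... and divide to get the average
    return np.concatenate(scales, axis=-1)


x = np.arange(25, dtype=float).reshape(5, 5, 1)
y = cm_pool(x)
print(y.shape)  # (5, 5, 3)
print(y[2, 2])  # [12. 12. 12.]
```

Stacking the scales as channels means a convolution placed right after the pooling sees clear and blurred versions of each location side by side, which is one way its filters could contrast them.
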
SLIDE 25

Placed Before Convolution

  • Allows filters/kernels to contrast blurry features with clear features

SLIDE 26

Outline

  • Related work
  • CNN^2
  • Dual feedforward pathways
  • Dual parallax augmentation
  • Concentric Multiscale (CM) pooling
  • Experiments

SLIDE 27

Datasets

  • ModelNet2D (gray scale)
  • SmallNORB (gray scale)
  • RGBD-Object (RGB)

SLIDE 28

Train/Test Setting

SLIDE 29

3D Viewpoint Generalization

SLIDE 30

Learning Efficiency

SLIDE 31

Backward Compatibility

  • By default, CNN^2 does not generalize to 2D-rotated images
  • But it can be enhanced with existing work on 2D rotation generalizability

SLIDE 32

Takeaways

  • We propose CNN^2, which
  • gives improved 3D viewpoint generalizability
  • does not require expensive input or supervision
  • is compatible with CNNs and can benefit from the rich CNN ecosystem
  • Detects stereoscopic features beyond depth from binocular images via:
  • Dual feedforward pathways
  • Dual parallax augmentation
  • Concentric Multiscale (CM) pooling