CNN^2: Viewpoint Generalization via a Binocular Vision
Wei-Da Chen and Shan-Hung Wu CS Department, National Tsing-Hua University Taiwan, R.O.C. wdchen@datalab.cs.nthu.edu.tw, shwu@cs.nthu.edu.tw
On Generalizability of CNNs
Convolutional Neural Networks (CNNs) have laid the foundation for many techniques in various applications
The generalizability of CNNs is still far behind human visual capabilities
3D Viewpoint Generalizability
Train Test
Outline
Voxel-Reconstruction Methods
Yan et al. 16
Cons
CapsuleNets (Hinton et al. 17, 18)
lower-level capsules are dynamically routed to upper-level capsules using an agreement protocol
coordinate way
But…
consuming
(Peer et al. 18)
Our Goals
Observation: Humans understand the world using two eyes!
Binocular Images
which are now usually equipped with two or more lenses
videos to construct a large binocular image dataset
Binocular Solution 1 (LeCun et al. 14)
dimension and then feeds them to a regular CNN
Figure: the two images are merged, passed through three Conv and Pooling stages, and fed to a Classifier
Binocular Solution 2:
additional input channels
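The two baseline ways of feeding a binocular pair to a regular CNN can be sketched in NumPy. This is only an illustration under stated assumptions: the slides do not say which dimension Solution 1 merges along, so the width axis is assumed here, and the function names and image sizes are hypothetical.

```python
import numpy as np

def merge_along_axis(left, right, axis=1):
    """Solution 1 style: join the two images along one dimension.

    ASSUMPTION: axis=1 (width) is chosen for illustration only.
    """
    return np.concatenate([left, right], axis=axis)

def merge_as_channels(left, right):
    """Solution 2 style: treat the second image as additional input channels."""
    return np.concatenate([left, right], axis=-1)

# Hypothetical 32 x 32 RGB pair in (height, width, channels) layout.
left = np.zeros((32, 32, 3), dtype=np.float32)
right = np.ones((32, 32, 3), dtype=np.float32)

side_by_side = merge_along_axis(left, right)   # one wider image
stacked = merge_as_channels(left, right)       # same size, twice the channels
```

In both cases the merged tensor is handed to an unmodified CNN, so any use of the parallax between the two views has to be discovered by the ordinary convolution filters.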
However…
knowledge that can be learned from binocular vision
human’s visual system can detect
Maruko et al. 08)
Anzai et al. 07)
Our Solution: CNN^2
Figure: CNN^2 architecture. Two pathways, each with three CM Pooling and Conv stages, linked by augment steps; the pathway outputs are added and fed to a Classifier
Dual Feedforward Pathways
are known to have bias (Gotts et al. 13)
different (biased) features
Figure: CNN^2 architecture shown alongside the human visual system (Optic Nerve, Optic Chiasm, Lateral Geniculate Nucleus (LGN), Visual Cortex)
Dual Parallax Augmentation (1/2)
Figure: Left path: from hL and hR (each W x H x C), an augmented map ˜hL of size W x H x 2C is formed by concatenation
Figure: Right path: from hR and hL (each W x H x C), an augmented map ˜hR of size W x H x 2C is formed by concatenation
Dual Parallax Augmentation (2/2)
recursively detect stereoscopic features at different abstraction levels by looking into the parallax
images at the pixel level and at shallow layers may add up to a big difference at a deeper layer
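The augment step, applied to a pair of feature maps of shape W x H x C, can be sketched as follows. This is a minimal sketch, not the authors' implementation: the slides only show that each path concatenates its own map with a second W x H x C map to reach W x H x 2C, so taking that second map to be the element-wise difference between the two paths (the "parallax") is an assumption, as are the function and variable names.

```python
import numpy as np

def parallax_augment(h_left, h_right):
    """Augment each path's feature map with a parallax term.

    ASSUMPTION: the parallax is modelled as the element-wise difference
    between the two paths' feature maps; the slides do not spell this out.
    Inputs have shape (W, H, C); outputs have shape (W, H, 2C).
    """
    if h_left.shape != h_right.shape:
        raise ValueError("left and right feature maps must have the same shape")
    aug_left = np.concatenate([h_left, h_right - h_left], axis=-1)
    aug_right = np.concatenate([h_right, h_left - h_right], axis=-1)
    return aug_left, aug_right

# Hypothetical 8 x 8 feature maps with 4 channels per path.
rng = np.random.default_rng(0)
h_l = rng.standard_normal((8, 8, 4))
h_r = rng.standard_normal((8, 8, 4))
aug_l, aug_r = parallax_augment(h_l, h_r)
```

Repeating this step before every convolution stage is what lets the parallax be examined at each abstraction level rather than only at the pixel level.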
Concentric Multiscale (CM) Pooling (1/2)
Concentric Multiscale (CM) Pooling (2/2)
Scan
Placed Before Convolution
with clear features
Datasets
Train/Test Setting
3D Viewpoint Generalization
Learning Efficiency
Backward Compatibility
not generalize to 2D rotated images
existing works on 2D rotation generalizability
Takeaways
ecosystem
from binocular images