
CNN^2: Viewpoint Generalization via a Binocular Vision, by Wei-Da Chen (PowerPoint PPT Presentation)



  1. CNN^2: Viewpoint Generalization via a Binocular Vision Wei-Da Chen and Shan-Hung Wu, CS Department, National Tsing-Hua University, Taiwan, R.O.C. wdchen@datalab.cs.nthu.edu.tw, shwu@cs.nthu.edu.tw

  2. On Generalizability of CNNs • Convolutional Neural Networks (CNNs) have laid the foundation for many techniques in various applications • However, the 3D viewpoint generalizability of CNNs still lags far behind humans' visual capabilities

  3. 3D Viewpoint Generalizability [Figure: objects seen from training viewpoints vs. unseen test viewpoints] • Humans can recognize objects at unseen angles • But CNNs cannot

  4. Outline • Related work • CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling • Experiments

  5. Voxel-Reconstruction Methods • E.g., the Perspective Transformer Networks (PTNs) by Yan et al. 16 • Learn 3D models directly

  6. Cons • Require either • Voxel-level supervision, or • Omnidirectional images as input • Both are expensive to collect in practice

  7. CapsuleNets (Hinton et al. 17, 18) • Different capsules are organized in a parse tree, where lower-level capsules are dynamically routed to upper-level capsules using an agreement protocol • When the viewpoint changes, the “routes” change in a coordinated way

  8. But… • CapsuleNets have been found hard to train • Capsules increase the number of model parameters • The iterative routing-by-agreement algorithm is time-consuming • It does not ensure the emergence of a correct parse tree (Peer et al. 18) • Not compatible with CNNs, and therefore cannot benefit from the rich CNN ecosystem

  9. Outline • Related work • CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling • Experiments

  10. Our Goals • A new model that • has improved 3D viewpoint generalizability • does not require expensive input or supervision • is CNN-compatible

  11. Observation: Humans understand the world using two eyes!

  12. Binocular Images • Today, binocular images can be easily collected • The majority of people use smartphones, which are now commonly equipped with two or more lenses • One can also extract two nearby frames from online videos to construct a large binocular image dataset

  13. Binocular Solution 1 (LeCun et al. 14) [Diagram: left/right images merged, then a stack of Conv and Pooling layers feeding a Classifier] • Stacks the two binocular images along the channel dimension and then feeds them to a regular CNN • But this does not model any prior of binocular vision
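
A minimal sketch of this channel-stacking baseline. PyTorch, the class name, and the layer sizes are our assumptions for illustration, not the authors' code:

```python
# Binocular Solution 1 (sketch): stack the left/right images along the
# channel dimension and feed them to an ordinary CNN.
import torch
import torch.nn as nn

class StackedBinocularCNN(nn.Module):
    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2 * in_channels, 32, kernel_size=3, padding=1),  # 2x channels: left + right
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )

    def forward(self, left, right):
        x = torch.cat([left, right], dim=1)  # merge along the channel dimension
        return self.classifier(self.features(x))
```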

  14. Binocular Solution 2: Sol. 1 + Monodepth (Godard et al. 17) • Calculate the depth map explicitly, then add it as additional input channels
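
A minimal sketch of this variant under the same assumptions as above. Here `depth_model` is a hypothetical stand-in for any depth estimator, and `cnn` is a network whose input layer accepts the extra channel:

```python
# Binocular Solution 2 (sketch): estimate a depth map explicitly and
# append it as an additional input channel before the channel-stacking CNN.
import torch

def binocular_with_depth(left, right, depth_model, cnn):
    depth = depth_model(left)                   # (N, 1, H, W) depth map, computed explicitly
    x = torch.cat([left, right, depth], dim=1)  # images plus depth as extra channels
    return cnn(x)
```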

  15. However… • Depth information is only a subset of the knowledge that can be learned from binocular vision • Studies in neuroscience have found that the human visual system can detect • Stereoscopic edges (Von der Heydt et al. 00) • Foreground and background (Qiu and Von der Heydt 05; Maruko et al. 08) • Illusory contours of objects (Von der Heydt et al. 84; Anzai et al. 07)

  16. Our Solution: CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling [Diagram: two parallel pathways of augment, CM Pooling, and Conv blocks, whose outputs are added and fed to a Classifier]

  17. Outline • Related work • CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling • Experiments

  18. Dual Feedforward Pathways [Diagram: the human visual pathway (Optic Nerve, Optic Chiasm, Lateral Geniculate Nucleus (LGN), Visual Cortex) shown alongside CNN^2's two pathways of augment, CM Pooling, and Conv blocks, whose outputs are added and fed to the Classifier] • The left and right sides of the human visual system are known to have biases (Gotts et al. 13) • Filters/kernels in the left and right pathways can learn different (biased) features
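
A minimal PyTorch sketch of the dual feedforward pathways alone; parallax augmentation and CM pooling are omitted for brevity, and all names and layer sizes are illustrative assumptions rather than the authors' implementation. The key point is that the two pathways have untied weights, so each can learn its own biased filters, and their outputs are added before the classifier:

```python
# Dual feedforward pathways (sketch): two conv stacks with separate
# parameters, one per eye, summed before the shared classifier.
import torch
import torch.nn as nn

def make_pathway(in_channels):
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.AvgPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AvgPool2d(2),
    )

class DualPathwayNet(nn.Module):
    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        self.left_path = make_pathway(in_channels)    # untied weights, so the
        self.right_path = make_pathway(in_channels)   # pathways can develop different biases
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )

    def forward(self, left, right):
        out = self.left_path(left) + self.right_path(right)  # "add" before the classifier
        return self.classifier(out)
```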

  19. Outline • Related work • CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling • Experiments

  20. Dual Parallax Augmentation (1/2) Left path: $\tilde{h}_L = \mathrm{concat}(h_L,\ h_L - h_R) \in \mathbb{R}^{W \times H \times 2C}$, where $h_L, h_R \in \mathbb{R}^{W \times H \times C}$ are the left and right feature maps. Right path: $\tilde{h}_R = \mathrm{concat}(h_R,\ h_R - h_L) \in \mathbb{R}^{W \times H \times 2C}$.
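
A minimal PyTorch sketch of the augmentation above (the framework and function name are our assumptions): each pathway concatenates its own feature map with its difference from the other pathway's map, doubling the channel count from C to 2C.

```python
# Dual parallax augmentation (sketch): ~h_L = concat(h_L, h_L - h_R),
# ~h_R = concat(h_R, h_R - h_L), applied channel-wise.
import torch

def parallax_augment(h_left, h_right):
    """h_left, h_right: feature maps of shape (N, C, H, W)."""
    h_left_aug = torch.cat([h_left, h_left - h_right], dim=1)    # (N, 2C, H, W)
    h_right_aug = torch.cat([h_right, h_right - h_left], dim=1)  # (N, 2C, H, W)
    return h_left_aug, h_right_aug
```

As the next slide notes, applying this before every convolutional block lets small pixel-level parallax at shallow layers accumulate into larger differences at deeper layers.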

  21. Dual Parallax Augmentation (2/2) • Allows the filters/kernels in the convolutional layers to recursively detect stereoscopic features at different abstraction levels by looking at the parallax • The small differences between the two input images at the pixel level and at shallow layers may add up to a big difference at a deeper layer

  22. Outline • Related work • CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling • Experiments

  23. Concentric Multiscale (CM) Pooling (1/2) • Areas that are out of focus are blurred

  24. Concentric Multiscale (CM) Pooling (2/2) [Diagram: average pooling over concentric windows of increasing size, scanned across the feature map]

  25. Placed Before Convolution • Allows filters/kernels to contrast blurry features with clear features
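
A minimal sketch of one way to realize CM pooling as described on slides 23-25; PyTorch, the function name, and the window sizes are our assumptions, and the paper's exact operator may differ. Each position is average-pooled over concentric windows of several sizes, and the results are stacked channel-wise so the following convolution can contrast the blurry (large-window) and clear (small-window) views:

```python
# Concentric Multiscale (CM) pooling (sketch): multiscale average pooling
# around every spatial position, concatenated along the channel dimension.
import torch
import torch.nn.functional as F

def cm_pool(x, window_sizes=(1, 3, 5, 7)):
    """x: (N, C, H, W) feature map. Returns (N, C * len(window_sizes), H, W)."""
    pooled = []
    for k in window_sizes:
        if k == 1:
            pooled.append(x)  # the in-focus center: no blurring
        else:
            # odd windows with stride 1 and padding k // 2 preserve H and W
            pooled.append(F.avg_pool2d(x, kernel_size=k, stride=1, padding=k // 2))
    return torch.cat(pooled, dim=1)  # multiscale views, stacked channel-wise
```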

  26. Outline • Related work • CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling • Experiments

  27. Datasets • ModelNet2D (grayscale) • SmallNORB (grayscale) • RGBD-Object (RGB)

  28. Train/Test Setting

  29. 3D Viewpoint Generalization

  30. Learning Efficiency

  31. Backward Compatibility • CNN^2, by default, does not generalize to 2D-rotated images • But it can be enhanced by existing work on 2D rotation generalizability

  32. Takeaways • We propose CNN^2, which • gives improved 3D viewpoint generalizability • does not require expensive input or supervision • is compatible with CNNs and can benefit from the rich CNN ecosystem • Detects stereoscopic features beyond depth from binocular images via: • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling
