Towards General Vision Architectures: Attentive Single-Tasking of Multiple Tasks
SLIDE 1

Towards General Vision Architectures: Attentive Single-Tasking of Multiple Tasks

Iasonas Kokkinos, Kevis Maninis, Ilija Radosavovic

Neural Architects - ICCV, 28 October 2019

SLIDE 2

What can we get out of an image?

SLIDE 3

What can we get out of an image?

Object detection

SLIDE 4

What can we get out of an image?

Semantic segmentation

SLIDE 5

What can we get out of an image?

Semantic boundary detection

SLIDE 6

What can we get out of an image?

Part segmentation

SLIDE 7

What can we get out of an image?

Surface normal estimation

SLIDE 8

What can we get out of an image?

Saliency estimation

SLIDE 9

What can we get out of an image?

Boundary detection

SLIDE 10

Can we do it all in one network?

  • I. Kokkinos, UberNet: A Universal Network for Low-, Mid-, and High-Level Vision, CVPR 2017
SLIDE 11

Multi-tasking boosts performance

Detection:
  • Ours, 1-Task: 78.7
  • Ours, Segmentation + Detection: 80.1

SLIDE 12

Multi-tasking boosts performance?

Detection:
  • Ours, 1-Task: 78.7
  • Ours, Segmentation + Detection: 80.1
  • Ours, 7-Task: 77.8

SLIDE 13

Did multi-tasking turn our network into a dilettante?

Detection:
  • Ours, 1-Task: 78.7
  • Ours, Segmentation + Detection: 80.1
  • Ours, 7-Task: 77.8

Semantic Segmentation:
  • Ours, 1-Task: 72.4
  • Ours, Segmentation + Detection: 72.3
  • Ours, 7-Task: 68.7

SLIDE 14

Should we just beef up the task-specific processing?

Memory consumption

Number of parameters

Computation

Effectively no positive transfer across tasks

Mask R-CNN (ICCV 17), PAD-Net (CVPR 18), UberNet (CVPR 17)

SLIDE 15

Multi-tasking can work (sometimes)

  • Mask R-CNN [1]:

multi-task: detection + segmentation

  • Eigen et al. [2], PAD-Net [3]:

multi-task: depth, sem. segmentation

  • Taskonomy [4]

transfer learning among tasks

[1] He et al., "Mask R-CNN", in ICCV 2017
[2] Eigen and Fergus, "Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture", in ICCV 2015
[3] Xu et al., "PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing", in CVPR 2018
[4] Zamir et al., "Taskonomy: Disentangling Task Transfer Learning", in CVPR 2018

SLIDE 16

Unaligned Tasks

Expression recognition vs. identity recognition (MMI Facial Expression Database)

One task's noise is another task's signal. This is not even catastrophic forgetting: it is plain task interference.

Learning Task Grouping and Overlap in Multi-Task Learning

  • A. Kumar, H. Daume, ICML 2012

Learning with Whom to Share in Multi-task Feature Learning

  • Z. Kang, K. Grauman, F. Sha, ICML 2011

Exploiting Unrelated Tasks in Multi-Task Learning,

  • B. Paredes, A. Argyriou, N. Berthouze, M. Pontil, AISTATS 2012

We could even try adversarial training on one task to improve performance on the other (forcing the desired invariance).

SLIDE 17

Count the balls!

SLIDE 18

Solution: give each other space

Task A Task B Shared

SLIDE 19

Perform A Task A Task B Shared

Solution: give each other space

SLIDE 20

Perform A Perform B Task A Task B Shared

Solution: give each other space

Question: how can we enforce and control the modularity of our representation? Less is more: fewer noisy features means an easier job!

SLIDE 21

Learning Modular networks by differentiable block sampling

Blockout: Dynamic Model Selection for Hierarchical Deep Networks,

  • C. Murdock, Z. Li, H. Zhou, T. Duerig, CVPR 2016

Blockout regularizer Blocks & induced architectures

SLIDE 22

Learning Modular networks by differentiable block sampling

MaskConnect: Connectivity Learning by Gradient Descent, Karim Ahmed, Lorenzo Torresani, 2017

SLIDE 23

Learning Modular networks by differentiable block sampling

Convolutional Neural Fabrics, S. Saxena and J. Verbeek, NIPS 2016
Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks, T. Veniat and L. Denoyer, CVPR 2018

SLIDE 24

Modular networks for multi-tasking

PathNet: Evolution Channels Gradient Descent in Super Neural Networks, Fernando et al., 2017

SLIDE 25

Modular networks for multi-tasking

PathNet: Evolution Channels Gradient Descent in Super Neural Networks, Fernando et al., 2017

SLIDE 26

Aim: differentiable & modular multi-task networks

Task A Task B Shared Perform A Perform B

How to avoid combinatorial search over feature-task combinations?

SLIDE 27

Attentive Single-Tasking of Multiple Tasks

  • Approach
  • Network performs one task at a time
  • Accentuate relevant features
  • Suppress irrelevant features

Kevis Maninis, Ilija Radosavovic, I. Kokkinos, “Attentive Single-Tasking of Multiple Tasks”, CVPR 2019

http://www.vision.ee.ethz.ch/~kmaninis/astmt/

SLIDE 28

Multi-Tasking Baseline

Need for a universal representation (shared encoder-decoder)

SLIDE 29

Attention to Task - Ours

Per-task processing

  • Attention to task:

Focus on one task at a time

  • Accentuate relevant features
  • Suppress irrelevant features

Task-specific layers on top of a shared encoder-decoder
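To make the control flow concrete, here is a minimal PyTorch-style sketch, not the authors' released code: a shared backbone whose blocks carry a learned per-task channel gate, and which is run once per task with only that task's gate and head active. `TaskModulatedBlock`, `AttentiveSingleTasker`, and the toy dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TaskModulatedBlock(nn.Module):
    """Shared conv block followed by a learned per-task channel gate."""
    def __init__(self, channels, num_tasks):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)   # shared weights
        self.bn = nn.BatchNorm2d(channels)
        # one gating vector per task: accentuate / suppress channels
        self.task_gates = nn.Parameter(torch.zeros(num_tasks, channels))

    def forward(self, x, task_id):
        y = torch.relu(self.bn(self.conv(x)))
        gate = torch.sigmoid(self.task_gates[task_id]).view(1, -1, 1, 1)
        return gate * y                                           # task-specific emphasis

class AttentiveSingleTasker(nn.Module):
    """Shared encoder with per-task modulation and one light head per task."""
    def __init__(self, channels=64, num_tasks=3, depth=4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList(
            TaskModulatedBlock(channels, num_tasks) for _ in range(depth))
        self.heads = nn.ModuleList(
            nn.Conv2d(channels, 1, 1) for _ in range(num_tasks))  # per-task decoder stub

    def forward(self, x, task_id):
        # one forward pass per task: only this task's gates and head are used
        y = self.stem(x)
        for block in self.blocks:
            y = block(y, task_id)
        return self.heads[task_id](y)

# usage: run the same image once per task
# model = AttentiveSingleTasker()
# pred_task0 = model(torch.randn(1, 3, 64, 64), task_id=0)
```

In the actual approach the per-task modulation is data-dependent (Squeeze-and-Excitation) and complemented by residual adapters, as the next slides describe; this sketch only illustrates the one-task-at-a-time execution.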

SLIDE 30

Continuous search over blocks with attention

A Learned Representation For Artistic Style, V. Dumoulin, J. Shlens, M. Kudlur, ICLR 2017
FiLM: Visual Reasoning with a General Conditioning Layer, E. Perez, F. Strub, H. de Vries, V. Dumoulin, A. Courville, AAAI 2018
Learning Visual Reasoning Without Strong Priors, E. Perez, H. de Vries, F. Strub, V. Dumoulin, A. Courville, 2017
Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization, Xun Huang, Serge Belongie, 2018
A Style-Based Generator Architecture for Generative Adversarial Networks, T. Karras, S. Laine, T. Aila, CVPR 2019

Modularity through Modulation: we can recover any task-specific block by shunning the remaining neurons

SLIDE 31

Modulation: Squeeze and Excitation

SLIDE 32

Squeeze and Excitation (SE)

  • Negligible number of parameters
  • Global feature modulation

Hu et al., "Squeeze and Excitation Networks", in CVPR 2018
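A minimal Squeeze-and-Excitation block in PyTorch, following the structure described by Hu et al.; the reduction ratio of 16 is a common default and is only illustrative here. In attentive single-tasking, one such block per task modulates the shared features.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global pooling + small MLP -> per-channel scaling."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # "squeeze": one value per channel
        self.fc = nn.Sequential(                  # "excitation": bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                              # globally re-weight the feature maps
```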

SLIDE 33

Feature Augmentation: Residual Adapters

SLIDE 34

Residual Adapters (RA)

  • Originally used for domain adaptation
  • Negligible number of parameters
  • In this work: parallel residual adapters

Rebuffi et al., "Learning multiple visual domains with residual adapters", in NIPS 2017
Rebuffi et al., "Efficient parametrization of multi-domain deep neural networks", in CVPR 2018
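A sketch of a parallel residual adapter in the spirit of Rebuffi et al.: a cheap task-specific 1x1 convolution added in parallel to the shared 3x3 convolution. The class name and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ParallelResidualAdapter(nn.Module):
    """Shared 3x3 conv with a small task-specific 1x1 conv added in parallel."""
    def __init__(self, channels, num_tasks):
        super().__init__()
        self.shared = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        # one 1x1 adapter per task: few parameters compared to the shared conv
        self.adapters = nn.ModuleList(
            nn.Conv2d(channels, channels, 1, bias=False) for _ in range(num_tasks))

    def forward(self, x, task_id):
        return self.shared(x) + self.adapters[task_id](x)
```

Since the adapter is a 1x1 convolution, each task adds roughly one ninth of the parameters of the shared 3x3 filter bank, which is why the per-task overhead stays negligible.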

SLIDE 35

Adversarial Task Discriminator

SLIDE 36

Handling Conflicting Gradients: Adversarial Training

Shared encoder-decoder with per-task losses: Loss T1, Loss T2, Loss T3

SLIDE 37

Handling Conflicting Gradients: Adversarial Training

Shared encoder-decoder with per-task losses: Loss T1, Loss T2, Loss T3

Accumulate gradients and update weights
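A rough, self-contained sketch of this training scheme: one batch per task is forwarded and back-propagated in turn, gradients from all tasks accumulate in the shared weights, and a single optimizer step follows. The toy model, random data, and hyper-parameters are placeholders.

```python
import torch
import torch.nn as nn

num_tasks = 3
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())       # shared
heads = nn.ModuleList(nn.Conv2d(8, 1, 1) for _ in range(num_tasks))      # per task
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(heads.parameters()), lr=0.01)
loss_fn = nn.MSELoss()

for step in range(10):
    optimizer.zero_grad()
    for task_id in range(num_tasks):
        images = torch.randn(2, 3, 32, 32)        # one batch per task
        targets = torch.randn(2, 1, 32, 32)
        preds = heads[task_id](backbone(images))  # forward for this task only
        loss_fn(preds, targets).backward()        # gradients accumulate in shared weights
    optimizer.step()                              # one update from all tasks' gradients
```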

SLIDE 38

Handling Conflicting Gradients: Adversarial Training

Shared encoder-decoder with per-task losses (Loss T1, Loss T2, Loss T3) plus a task discriminator D with its own loss (Loss Discr.)

SLIDE 39

Handling Conflicting Gradients: Adversarial Training

Shared encoder-decoder with per-task losses (Loss T1, Loss T2, Loss T3) plus a task discriminator D with its own loss (Loss Discr.)

Accumulate gradients and update weights; the discriminator's gradient is multiplied by (-k), i.e. reversed, before reaching the shared encoder

Ganin and Lempitsky, "Unsupervised Domain Adaptation by Backpropagation", in ICML 2015
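The gradient reversal trick of Ganin and Lempitsky can be written as an autograd function that is the identity in the forward pass and multiplies the incoming gradient by -k in the backward pass; the names below are illustrative, and the discriminator usage is only sketched in comments.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity forward; gradient multiplied by -k on the way back."""
    @staticmethod
    def forward(ctx, x, k=1.0):
        ctx.k = k
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.k * grad_output, None   # reversed gradient flows into the encoder

def grad_reverse(x, k=1.0):
    return GradReverse.apply(x, k)

# usage sketch: shared features pass through grad_reverse before the task
# discriminator, so the discriminator learns to tell tasks apart while the
# reversed gradient pushes the shared features to become task-indistinguishable.
# logits = discriminator(grad_reverse(shared_features, k=0.1))
```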

SLIDE 40

Effect of adversarial training on gradients

t-SNE visualizations of gradients for two tasks, without (w/o) and with (w/) adversarial training

SLIDE 41

Learned task-specific representation

t-SNE visualizations of SE modulations for the first 32 val images at various depths of the network (shallow to deep)

SLIDE 42

Learned task-specific representation

depth

PCA projections into "RGB" space

SLIDE 43

Relative average drop vs. # Parameters

SLIDE 44

Relative average drop vs. FLOPS

SLIDE 45

Qualitative Results: PASCAL

Ours vs. MTL baseline: edge features, edge detections, semantic seg., human part seg., surface normals, saliency

SLIDE 46

Qualitative Results

Ours vs. Baseline

SLIDE 47

Qualitative Results

Ours: sharper edges; Baseline: blurry edges

SLIDE 48

Qualitative Results

Ours: consistent; Baseline: mixing of classes

SLIDE 49

Qualitative Results

Ours: sharper; Baseline: blurry

SLIDE 50

Qualitative Results

Ours: no artifacts; Baseline: checkerboard artifacts

SLIDE 51

More qualitative Results

SLIDE 52

More qualitative Results

SLIDE 53

Big picture: continuous optimization vs search

DARTS: Differentiable Architecture Search, H. Liu, K. Simonyan, Y. Yang

SLIDE 54

Pre-attentive vs. attentive vision

Human factors and behavioral science: Textons, the fundamental elements in preattentive vision and perception of textures, Bela Julesz, James R. Bergen, 1983

SLIDE 55

Human factors and behavioral science: Textons, the fundamental elements in preattentive vision and perception of textures, Bela Julesz, James R. Bergen, 1983

Pre-attentive vs. attentive vision

SLIDE 56

Local attention: Harley et al., ICCV 2017

Segmentation-Aware Networks using Local Attention Masks,

  • A. Harley, K. Derpanis, I. Kokkinos, ICCV 2017
SLIDE 57

Object-level priming

a.k.a. top-down image segmentation

SLIDE 58

AdaptIS: Adaptive Instance Selection Network, Konstantin Sofiiuk, Olga Barinova, Anton Konushin, ICCV 2019

Priming Neural Networks, Amir Rosenfeld, Mahdi Biparva, and John K. Tsotsos, CVPR 2018

Object & position-level priming

SLIDE 59

Task-level priming: count the balls!

SLIDE 60

Attentive Single-Tasking of Multiple Tasks

  • Approach
  • Network performs one task at a time
  • Accentuate relevant features
  • Suppress irrelevant features

Kevis Maninis, Ilija Radosavovic, I. Kokkinos, “Attentive Single-Tasking of Multiple Tasks”, CVPR 2019

http://www.vision.ee.ethz.ch/~kmaninis/astmt/

SLIDE 61

Thank you for your attention.

http://www.vision.ee.ethz.ch/~kmaninis/astmt/

SLIDE 62

Double back-propagation

Harris Drucker, Yann LeCun, “Double Backpropagation Increasing Generalization Performance”, IJCNN 1991
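Double back-propagation penalizes the norm of the gradient of the loss with respect to the input, which requires differentiating through the first backward pass; a minimal PyTorch sketch with a toy model follows (the penalty weight and dimensions are arbitrary).

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
lam = 0.1                                    # weight of the gradient-norm penalty

x = torch.randn(16, 10, requires_grad=True)
y = torch.randn(16, 1)

loss = loss_fn(model(x), y)
# first backward pass: input gradient, kept in the graph so it can be penalized
g, = torch.autograd.grad(loss, x, create_graph=True)
total = loss + lam * g.pow(2).sum()
optimizer.zero_grad()
total.backward()                             # second backward pass through the first
optimizer.step()
```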

SLIDE 63

Double back-propagation

Harris Drucker, Yann LeCun, “Double Backpropagation Increasing Generalization Performance”, IJCNN 1991

SLIDE 64

Adversarial Training using Double Back-Propagation

SLIDE 65

Deeplab v3+: Sanity Check

Benchmarking our re-implementation on popular benchmarks for different single tasks: low-, mid-, and high-level.

* COCO pre-training

Chen et al., "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation", in ECCV 2018

SLIDE 66

Ablation on PASCAL: Modulation

Ablations over the type and location of modulation: attention-to-task almost reaches single-tasking performance

SLIDE 67

Ablation on PASCAL: Adversarial training

Adversarial training helps! The gains are smaller, but come free of additional computation.

SLIDE 68

Experiments on NYUD and FSV

Results equal to or better than the single-tasking baselines

SLIDE 69

Ablation: Different backbones

Results consistent across backbones