Squeeze-and-Excitation Networks. Jie Hu, Li Shen, Gang Sun. PowerPoint PPT Presentation



SLIDE 1

Squeeze-and-Excitation Networks

Jie Hu1,* Li Shen2,* Gang Sun1

1 Momenta

2 Department of Engineering Science,

University of Oxford

SLIDE 2

Large Scale Visual Recognition Challenge

Squeeze-and-Excitation Networks (SENets) formed the foundation of our winning entry in the ILSVRC 2017 classification task.

[Chart: ILSVRC classification results, from feature engineering to convolutional neural networks to SENets. Statistics provided by ILSVRC.]

SLIDE 3

Convolution

A convolutional filter is expected to be an informative combination:

  • Fusing channel-wise and spatial information
  • Within local receptive fields
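As a concrete illustration (a minimal NumPy sketch, not the authors' code; the function name is ours), a single output activation of a convolutional filter sums jointly over the spatial window and all input channels:

```python
import numpy as np

def conv_single_output(patch, filt):
    """One output activation of a convolutional filter.

    patch: local receptive field of shape (k, k, C_in)
    filt:  filter weights of shape (k, k, C_in)

    The sum runs jointly over the spatial window and all input
    channels, so channel-wise and spatial information are fused.
    """
    return float((patch * filt).sum())

# A 3x3 filter over a 2-channel patch.
patch = np.ones((3, 3, 2))
filt = np.full((3, 3, 2), 0.5)
conv_single_output(patch, filt)  # 3*3*2 elements * 0.5 = 9.0
```

Because every input channel contributes to every output, channel dependencies are baked into the filter weights rather than modelled explicitly.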
SLIDE 4

A Simple CNN

SLIDE 5

A Simple CNN

Channel dependencies are:

  • Implicit: entangled with the spatial correlation captured by the filters
  • Local: unable to exploit contextual information outside this region
SLIDE 6

Exploiting Channel Relationships

Can the representational power of a network be enhanced by channel relationships?

Design a new architectural unit:

  • Explicitly model interdependencies between the channels of convolutional features
  • Feature recalibration
    • Selectively emphasise informative features and inhibit less useful ones
    • Use global information

SLIDE 7

Squeeze-and-Excitation Blocks

Given a transformation F_tr: input X → feature maps U

  • Squeeze
  • Excitation
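Written out (following the notation of the SENet paper, where δ denotes the ReLU and σ the sigmoid), the two operations and the final rescaling are:

```latex
% Squeeze: global average pooling per channel
z_c = F_{sq}(\mathbf{u}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)
% Excitation: bottleneck gating
\mathbf{s} = F_{ex}(\mathbf{z}, \mathbf{W}) = \sigma\!\left(\mathbf{W}_2 \, \delta(\mathbf{W}_1 \mathbf{z})\right)
% Recalibration: channel-wise rescaling
\widetilde{\mathbf{x}}_c = F_{scale}(\mathbf{u}_c, s_c) = s_c \, \mathbf{u}_c
```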
SLIDE 8

Squeeze: Global Information Embedding

  • Aggregate feature maps through spatial dimensions using global average pooling
  • Generate channel-wise statistics

U can be interpreted as a collection of local descriptors whose statistics are expressive for the whole image.
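The squeeze step amounts to one reduction per channel; a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def squeeze(U):
    """Global average pooling: aggregate each feature map over its
    spatial dimensions, yielding one descriptor per channel.

    U: feature maps of shape (H, W, C) -> statistics z of shape (C,)
    """
    return U.mean(axis=(0, 1))

# Two 4x4 channels with constant values 1 and 3.
U = np.stack([np.full((4, 4), 1.0), np.full((4, 4), 3.0)], axis=-1)
z = squeeze(U)  # array([1., 3.])
```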

SLIDE 9

Excitation: Adaptive Recalibration

  • Learn a nonlinear and non-mutually-exclusive relationship between channels
  • Employ a self-gating mechanism with a sigmoid function
    • Input: channel-wise statistics
    • Bottleneck configuration with two FC layers around a non-linearity
    • Output: channel-wise activations
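The bottleneck gating can be sketched as follows (a NumPy sketch assuming a reduction ratio r; the weight shapes and names are ours, not the authors' implementation):

```python
import numpy as np

def excitation(z, W1, W2):
    """Self-gating: bottleneck FC -> ReLU -> FC -> sigmoid.

    z:  channel-wise statistics, shape (C,)
    W1: reduction weights, shape (C // r, C)
    W2: expansion weights, shape (C, C // r)
    Returns channel-wise activations in (0, 1).
    """
    h = np.maximum(W1 @ z, 0.0)              # FC + ReLU (bottleneck)
    return 1.0 / (1.0 + np.exp(-(W2 @ h)))   # FC + sigmoid gate

rng = np.random.default_rng(0)
C, r = 8, 4
s = excitation(rng.normal(size=C),
               rng.normal(size=(C // r, C)),
               rng.normal(size=(C, C // r)))
# s has shape (8,), each entry strictly between 0 and 1
```

The sigmoid (rather than a softmax) is what makes the relationship non-mutually-exclusive: several channels can be emphasised at once.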

SLIDE 10

Excitation: Adaptive Recalibration

  • Rescale the feature maps U with the channel activations
    • Act on the channels of U
    • Channel-wise multiplication

SE blocks intrinsically introduce dynamics conditioned on the input.
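The rescaling itself is a single broadcast multiply; a minimal sketch:

```python
import numpy as np

def scale(U, s):
    """Channel-wise multiplication: feature maps U of shape (H, W, C)
    rescaled by activations s of shape (C,) via broadcasting."""
    return U * s

U = np.ones((2, 2, 3))
out = scale(U, np.array([0.0, 0.5, 1.0]))
# channel 0 is suppressed, channel 2 passes through unchanged
```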

SLIDE 11

Example Models

[Diagram: SE-ResNet module vs. plain ResNet module. The residual branch produces feature maps U (H × W × C); the SE path applies global pooling (1 × 1 × C), FC, ReLU, FC, and sigmoid to produce channel activations, and a Scale step rescales U channel-wise before the identity addition.]

[Diagram: SE-Inception module vs. plain Inception module. The Inception block output (H × W × C) passes through the same global pooling → FC → ReLU → FC → sigmoid sequence, and Scale recalibrates the feature maps.]
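Putting the pieces together, the SE-ResNet wiring can be sketched end-to-end (a NumPy sketch; `residual_fn` and the weight names are illustrative stand-ins, not the released code):

```python
import numpy as np

def se_resnet_module(x, residual_fn, W1, W2):
    """SE-ResNet module sketch: the residual branch output is
    recalibrated by the SE block before the identity addition.

    x: input of shape (H, W, C); residual_fn maps x to feature
    maps U of the same shape; W1/W2 are the bottleneck FC weights.
    """
    U = residual_fn(x)                       # residual branch
    z = U.mean(axis=(0, 1))                  # squeeze: global pooling
    h = np.maximum(W1 @ z, 0.0)              # FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(W2 @ h)))      # FC + sigmoid
    return x + U * s                         # scale, then add identity

# With zero weights the gate is sigmoid(0) = 0.5 for every channel,
# so the output is x + 0.5 * U.
x = np.ones((2, 2, 4))
out = se_resnet_module(x, lambda t: t, np.zeros((2, 4)), np.zeros((4, 2)))
# out is 1.5 everywhere
```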

SLIDE 12

Object Classification

Experiments on ImageNet-1k dataset

  • Benefits at different depths
  • Incorporation with modern architectures
SLIDE 13

Benefits at Different Depths

SE blocks consistently improve performance across different depths at minimal additional computational complexity (no more than a 0.26% increase).

  • SE-ResNet-50 exceeds ResNet-50 by 0.86% and approaches the result of ResNet-101.
  • SE-ResNet-101 outperforms ResNet-152.

SLIDE 14

Incorporation with Modern Architectures

SE blocks can boost the performance of a variety of network architectures in both residual and non-residual settings.

SLIDE 15

Beyond Object Classification

  • Places365-Challenge Scene Classification
  • Object Detection on COCO

SE blocks can generalise well on different datasets and tasks.

SLIDE 16

Role of Excitation

The role at different depths adapts to the needs of the network

  • Early layers: excite informative features in a class-agnostic manner

[Plots: excitation activations at SE_2_3 and SE_3_4]

SLIDE 17

Role of Excitation

The role at different depths adapts to the needs of the network

  • Later layers: respond to different inputs in a highly class-specific manner

[Plots: excitation activations at SE_4_6 and SE_5_1]

SLIDE 18

Code and Models: https://github.com/hujie-frank/SENet

Conclusion

  • Designed a novel architectural unit to improve the representational capacity of networks by dynamic channel-wise feature recalibration.
  • Provided insights into the limitations of previous CNN architectures in modelling channel dependencies.
  • Induced feature importance may be helpful to related fields, e.g. network compression.

SLIDE 19

Thank you!