Spatial Transformers in Feed-Forward Networks
Max Jaderberg, Karen Simonyan, Andrew Zisserman and Koray Kavukcuoglu
Google DeepMind and University of Oxford

ConvNets
• Interleaving convolutional layers with max-pooling layers allows translation invariance.
  - Pooling is simplistic.
  - Only small invariances per pooling layer
  - Limited spatial transformation
  - Pools across entire image
  + Exceptionally effective
• Can we do better?

Motivation 1: transformations of input data Rotated MNIST (+/- 90°)

Motivation 2: attention

Conditional Spatial Warping
• Conditional on the input feature map, spatially warp the image.
  + Transforms data to a space expected by subsequent layers
  + Intelligently selects features of interest (attention)
  + Invariant to more generic warping

Conditional Spatial Warping
[Figure labels: input, spatial transformer, network, output]

Spatial Transformer
A differentiable module for spatially transforming data, conditional on the data itself.
[Figure labels: U, localisation net, grid generator, sampler, V]

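Sampling Grid
Warp the regular grid by an affine transformation. Can parameterise, e.g. an affine transformation:

$$
\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix}
= \mathcal{T}_\theta(G_i)
= A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
= \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix}
\begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
$$

[Figure labels: U, V]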
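The affine grid warp on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `affine_grid`, the nested-list layout of `theta`, and the convention that coordinates are normalised to [-1, 1] are all assumptions made for the example.

```python
def affine_grid(theta, height, width):
    """Map each target pixel (x_t, y_t) to a source location (x_s, y_s).

    theta is a 2x3 nested list [[t11, t12, t13], [t21, t22, t23]].
    Coordinates are normalised to [-1, 1] (an assumed convention).
    """
    def linspace(n):
        # n evenly spaced values from -1 to 1; a single pixel sits at 0.
        if n == 1:
            return [0.0]
        return [-1.0 + 2.0 * k / (n - 1) for k in range(n)]

    xs, ys = linspace(width), linspace(height)
    grid = []
    for y_t in ys:
        row = []
        for x_t in xs:
            # (x_s, y_s) = A_theta * (x_t, y_t, 1)
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
grid = affine_grid(identity, height=3, width=3)
shifted = affine_grid([[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]], height=2, width=2)
```

With the identity parameters every target point maps to itself; adding 0.5 to the third entry of the first row shifts every sampled x coordinate by 0.5.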

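Sampling Grid
Warp the regular grid by an affine transformation. Can parameterise attention:

$$
\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix}
= \mathcal{T}_\theta(G_i)
= A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
= \begin{bmatrix} s & 0 & t_x \\ 0 & s & t_y \end{bmatrix}
\begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
$$

[Figure labels: U, V]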
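To see why the constrained matrix acts as attention, it helps to check which part of the source the grid covers. The sketch below assumes coordinates normalised to [-1, 1]; the helper names `attention_theta` and `source_extent` are hypothetical and exist only for this illustration.

```python
def attention_theta(s, t_x, t_y):
    """Build the constrained 2x3 matrix [[s, 0, t_x], [0, s, t_y]]."""
    return [[s, 0.0, t_x], [0.0, s, t_y]]


def source_extent(theta):
    """Source locations reached by the target corners (-1, -1) and (1, 1)."""
    def apply(x_t, y_t):
        return (theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2],
                theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2])
    return apply(-1.0, -1.0), apply(1.0, 1.0)


# Isotropic scale 0.5, shifted by 0.5 along x.
theta = attention_theta(s=0.5, t_x=0.5, t_y=0.0)
corner_min, corner_max = source_extent(theta)
print(corner_min, corner_max)
```

Here the whole output grid samples only x in [0, 1] and y in [-0.5, 0.5] of the source, i.e. a crop a quarter of the original area, selected by three numbers instead of six.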

Sampling Grid
[Figure panels: identity transformation; affine transformation]
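Once the grid gives a source location for each output pixel, the sampler has to read a value at a generally non-integer position. The slides do not name the interpolation kernel; the sketch below assumes bilinear interpolation, coordinates normalised to [-1, 1], and zero contribution from neighbours outside the image. The function name `bilinear_sample` is hypothetical.

```python
import math


def bilinear_sample(image, x_s, y_s):
    """Sample a 2-D list `image` at normalised (x_s, y_s) in [-1, 1].

    Out-of-range neighbours contribute zero (an assumed border rule).
    """
    height, width = len(image), len(image[0])
    # Convert normalised coordinates to pixel coordinates.
    x = (x_s + 1.0) * (width - 1) / 2.0
    y = (y_s + 1.0) * (height - 1) / 2.0
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    # Only the four pixels surrounding (x, y) can have non-zero weight.
    for n in (y0, y0 + 1):
        for m in (x0, x0 + 1):
            if 0 <= n < height and 0 <= m < width:
                weight = max(0.0, 1.0 - abs(x - m)) * max(0.0, 1.0 - abs(y - n))
                value += weight * image[n][m]
    return value


image = [[0.0, 1.0],
         [2.0, 3.0]]
corner = bilinear_sample(image, -1.0, -1.0)   # lands exactly on image[0][0]
centre = bilinear_sample(image, 0.0, 0.0)     # equal mix of all four pixels
```

Sampling at grid points produced by the identity transformation returns the original pixel values, which is why the identity panel leaves the input unchanged.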


Spatial Transformer Networks
• Spatial Transformers are differentiable, and so can be inserted at any point in a feed-forward network and trained by backpropagation.
Example:
• Digit classification, loss: cross-entropy for 10-way classification
[Figure: a CNN with outputs 0 to 9, compared with an ST followed by a CNN with outputs 0 to 9]
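The differentiability claim can be checked numerically on a toy case. The sketch below assumes a bilinear sampling kernel (the slides do not specify one) and works in pixel coordinates; `sample` and `dsample_dx` are hypothetical names. It compares the piecewise (sub-)gradient of a sampled value with respect to the sampling coordinate against a central finite difference.

```python
def sample(image, x, y):
    """Bilinear sample at pixel coordinates (x, y); zero outside the image."""
    total = 0.0
    for n, row in enumerate(image):
        for m, pixel in enumerate(row):
            total += pixel * max(0.0, 1.0 - abs(x - m)) * max(0.0, 1.0 - abs(y - n))
    return total


def dsample_dx(image, x, y):
    """(Sub-)gradient of the sampled value with respect to x."""
    total = 0.0
    for n, row in enumerate(image):
        for m, pixel in enumerate(row):
            if abs(m - x) >= 1.0:
                slope = 0.0          # pixel too far away to contribute
            else:
                slope = 1.0 if m >= x else -1.0
            total += pixel * slope * max(0.0, 1.0 - abs(y - n))
    return total


image = [[0.0, 1.0],
         [2.0, 3.0]]
x, y, eps = 0.3, 0.6, 1e-6
numeric = (sample(image, x + eps, y) - sample(image, x - eps, y)) / (2 * eps)
analytic = dsample_dx(image, x, y)
```

Because the sampled value has a well-defined gradient with respect to the sampling coordinates, and those coordinates are linear in the transformation parameters, the loss gradient can flow back into whatever network predicts the parameters.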

MNIST Digit Classification
Training data: 6000 examples of each digit
Testing data: 10k images
Can achieve testing error of 0.23%

Task: classify MNIST digits
• Training and test images randomly rotated (+/- 90°)
• Fully connected network with affine ST on input
[Figure labels: input, spatial transformer, network, output]
Performance (% error):
• FCN 2.1
• CNN 1.2
• ST-FCN 1.2
• ST-CNN 0.7

Generalization 1: transformations
• Affine transformation: 6 parameters
• Projective transformation: 8 parameters
• Thin plate spline transformation
• Etc.
Any transformation whose parameters can be regressed
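As an illustration of the 8-parameter projective case, the sketch below applies a projective map to a single target point. The parameter ordering and the choice to fix the ninth matrix entry to 1 are assumptions made for the example, and `projective_point` is a hypothetical name.

```python
def projective_point(theta, x_t, y_t):
    """Apply a projective map with 8 free parameters to one target point.

    theta holds [a, b, c, d, e, f, g, h]; the ninth matrix entry is fixed
    to 1 (an assumed normalisation).
    """
    a, b, c, d, e, f, g, h = theta
    # Homogeneous coordinate; dividing by it is what makes the map projective.
    w = g * x_t + h * y_t + 1.0
    return ((a * x_t + b * y_t + c) / w, (d * x_t + e * y_t + f) / w)


identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
same = projective_point(identity, 0.5, -0.25)

# A non-zero g makes the scaling depend on position.
tilted = projective_point([1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0], 1.0, 0.0)
```

Setting g = h = 0 recovers the 6-parameter affine case, so the affine grid is a special case of this map.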

Rotated MNIST
[Figure panels, each showing input, ST and output: ST-FCN Affine; ST-FCN Thin Plate Spline]

Rotated, Translated & Scaled MNIST
[Figure panels, each showing input, ST and output: ST-FCN Projective; ST-FCN Thin Plate Spline]

Translated Cluttered MNIST
[Figure panels, each showing input, ST and output: ST-FCN Affine; ST-CNN Affine]

Results on performance
Legend: R: rotation; P: projective; RTS: rotation, translation, scale; E: elastic

Generalization 2: Multiple spatial transformers
• Spatial Transformers can be inserted before/after conv layers, before/after max-pooling
• Can also have multiple Spatial Transformers at the same level
[Figure labels: ST1, conv1, ST2a, ST2b, conv2, ST3, conv3, outputs 0 to 9]

Task: Add digits in two images MNIST digits under rotation, translation and scale Architecture

Task: Add digits in two images
MNIST 2-channel addition
Input (2 channels): chan1, chan2
Add up the digits. One per channel. Random per-channel rotation, scale and translation.
SpatialTransformer1 automatically specialises to rectify channel 1.
SpatialTransformer2 automatically specialises to rectify channel 2.
[Figure: chan1 and chan2 after SpatialTransformer1 and after SpatialTransformer2]

Task: Add digits in two images MNIST digits under rotation, translation and scale Performance % error

Applications and comparisons with the state of the art

Street View House Numbers (SVHN)
200k real images of house numbers collected from Street View
Between 1 and 5 digits in each number
Architecture: 4 spatial transformer + conv layers, 4 conv layers, 3 fc layers, 5 character output layers
[Figure labels: ST1, conv1, ST2, conv2, …, fc3; per-character outputs include digits and null]

SVHN 64x64
• CNN: 4.0% error (single model), Goodfellow et al 2013
• Attention: 3.9% error (ensemble with MC averaging), Ba et al, ICLR 2015
• ST net: 3.6% error (single model)

SVHN 128x128
• CNN: 5.6% error (single model)
• Attention: 4.5% error (ensemble with MC averaging), Ba et al, ICLR 2015
• ST net: 3.9% error (single model)

Fine Grained Visual Categorization
CUB-200-2011 birds dataset
• 200 species of birds
• 6k training images
• 5.8k test images

Spatial Transformer Network
• Pre-train Inception networks on ImageNet
• Train spatial transformer network on fine-grained multi-way classification

CUB Performance

Summary
● Spatial Transformers allow dynamic, conditional cropping and warping of images/feature maps.
● Can be constrained and used as a very fast attention mechanism.
● Spatial Transformer Networks localise and rectify objects automatically, and achieve state-of-the-art results.
● Can be used as a generic localisation mechanism which can be learnt with backprop.
