Equivariant Transformer Networks (Poster 18), Kai Sheng Tai, Peter Bailis & Gregory Valiant - PowerPoint PPT Presentation





SLIDE 1

Kai Sheng Tai, Peter Bailis & Gregory Valiant

Equivariant Transformer Networks

Stanford University

(Poster 18)

github.com/stanford-futuredata/equivariant-transformers

SLIDE 2

Goal: Transformation-invariant models

  • How can we learn models that are invariant to certain input transformations?
  • Relevant to many application domains:

[Example images: astronomical objects, plankton micrographs, traffic signs]

  • In this work, we explore alternatives to data augmentation
  • How can we build invariances directly into network architectures?

[Group Equivariant CNNs (Cohen+’16, Dieleman+’16), Harmonic Networks (Worrall+’17), etc.]

  • Can we achieve invariance while reusing off-the-shelf architectures?

[Spatial Transformer Networks (Jaderberg+’15)]

SLIDE 3

Equivariant Transformer Layers

  • An Equivariant Transformer (ET) is a differentiable image-to-image mapping
  • Key property (“local invariance”):

⎼ all transformed versions of a base image are mapped to the same output image

  • Requirement:

⎼ the family of transformations forms a Lie group: transformations are invertible and differentiable with respect to a real-valued parameter
⎼ this includes many common families of transformations: translation, rotation, scaling, shear, perspective, etc.
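As a concrete illustration of the Lie group requirement (a minimal sketch, not part of the authors' code), 2D rotations form a one-parameter group: composing two rotations adds their parameters, and the inverse is just the negated parameter, with everything differentiable in 𝜃:

```python
import numpy as np

def R(theta):
    """Element of the one-parameter rotation group SO(2)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

a, b = 0.3, 1.1
# Closure under composition: R(a) @ R(b) == R(a + b)
assert np.allclose(R(a) @ R(b), R(a + b))
# Invertibility: the inverse rotation is R(-a)
assert np.allclose(R(a) @ R(-a), np.eye(2))
```

Translation, scaling, shear, and perspective admit the same one-parameter structure, which is what the canonical-coordinate construction relies on.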

[Figure: Standard CNN vs. ET]


SLIDE 4

Key ideas

1. Standard convolutional layers are translation-equivariant
⎼ i.e., input translated by 𝑡 → output translated by 𝑡
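The translation-equivariance of convolution can be checked directly in a toy 1-D setting (an illustrative sketch with circular boundary conditions, not the slides' code): convolving a shifted signal gives the shifted convolution.

```python
import numpy as np

def conv1d_circular(x, k):
    """Circular 1-D convolution (cross-correlation) of signal x with kernel k."""
    n = len(x)
    return np.array([sum(x[(i + j) % n] * k[j] for j in range(len(k)))
                     for i in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
k = np.array([0.5, 0.25, 0.25])
t = 2  # translation amount

# conv(shift(x)) == shift(conv(x)): convolution commutes with translation
lhs = conv1d_circular(np.roll(x, t), k)
rhs = np.roll(conv1d_circular(x, k), t)
assert np.allclose(lhs, rhs)
```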

SLIDE 5

Key ideas

1. Standard convolutional layers are translation-equivariant
⎼ i.e., input translated by 𝑡 → output translated by 𝑡
2. Specialized coordinates turn smooth transformations into translations
⎼ Example (rotation): in polar coordinates, rotation appears as translation by the angle 𝜃
⎼ This generalizes to other smooth transformations using canonical coordinate systems for Lie groups (Rubinstein+’91)

[Figure: rotation in Cartesian coordinates vs. polar coordinates]
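The rotation-becomes-translation property can be verified numerically (a self-contained sketch with a toy analytic image, not the authors' implementation): sampling a rotated image on a polar grid yields a circular shift along the angular axis of the original's polar representation.

```python
import numpy as np

# A toy image defined as a function of Cartesian coordinates.
def image(x, y):
    return np.cos(3 * np.arctan2(y, x)) * np.exp(-np.hypot(x, y))

def to_polar(f, r, theta):
    """Sample f on a polar grid: out[i, j] = f at radius r[i], angle theta[j]."""
    return f(r[:, None] * np.cos(theta[None, :]),
             r[:, None] * np.sin(theta[None, :]))

r = np.linspace(0.1, 2.0, 16)
theta = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
alpha = theta[5]  # rotate by exactly 5 angular grid steps

# Rotating the image by alpha means evaluating it at coordinates
# rotated back by alpha ...
rotated = lambda x, y: image(np.cos(alpha) * x + np.sin(alpha) * y,
                             -np.sin(alpha) * x + np.cos(alpha) * y)

# ... which is a pure translation along the angular axis in polar coordinates.
P = to_polar(image, r, theta)
P_rot = to_polar(rotated, r, theta)
assert np.allclose(P_rot, np.roll(P, 5, axis=1))
```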

SLIDE 6

ETs are locally invariant by construction

[Diagram: translation-equivariant CNN estimates the transformation parameter from the canonical coordinate representation]

  • Equivariance guarantees that an additional transformation of the input by 𝜃 increases the estimated parameter by 𝜃

  • The output is therefore invariant to transformations of the input
  • We implement transformation with differentiable grid resampling (Jaderberg+’15)

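The invariance-by-construction argument can be sketched in a minimal 1-D analogue (hypothetical code, not the authors' implementation): an equivariant "pose predictor" estimates the translation, and applying the inverse translation canonicalizes the input, so all shifted versions map to the same output.

```python
import numpy as np

def predict_shift(x):
    """Equivariant pose estimate: the position of the signal's peak.
    Shifting x circularly by t shifts this estimate by t (mod len(x))."""
    return int(np.argmax(x))

def et_layer(x):
    """Canonicalize the input by undoing the estimated translation."""
    return np.roll(x, -predict_shift(x))

x = np.array([0.1, 0.3, 1.0, 0.4, 0.2, 0.0])
out = et_layer(x)
out_shifted = et_layer(np.roll(x, 3))  # translated version of the same input
assert np.allclose(out, out_shifted)   # locally invariant by construction
```

In the actual ET layer, the same logic runs in canonical coordinates for the chosen Lie group, and the inverse transformation is applied with differentiable grid resampling.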

SLIDE 7

Compositions of ETs handle more complicated transformations

[Figure: input, x-shear, aspect ratio, x-perspective, y-perspective]

  • Since ETs map images to images, they can be composed sequentially
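Because each ET has an image-to-image interface, chaining them is plain function composition (an illustrative sketch with toy stand-in layers; the layer names are hypothetical, not the library's API):

```python
def compose(*layers):
    """Chain image-to-image layers, applied left to right."""
    def composed(image):
        for layer in layers:
            image = layer(image)
        return image
    return composed

# Toy stand-ins for ET layers, each mapping an "image" to an "image":
double = lambda img: [2 * v for v in img]
shift = lambda img: img[1:] + img[:1]

pipeline = compose(double, shift)
assert pipeline([1, 2, 3]) == [4, 6, 2]

# In the same way, one could chain e.g. a shear ET, an aspect-ratio ET,
# and perspective ETs to canonicalize a richer transformation group.
```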
SLIDE 8

ETs improve generalization

  • Larger improvements when training data is limited

SLIDE 9

Takeaways

  • Equivariant Transformers build transformation invariance into neural network architectures

  • Main ideas:

⎼ Canonical coordinates let us tailor ET layers to specific transformation groups
⎼ The image-to-image interface lets us compose ETs to handle more complicated transformation groups

Poster #18

kst@cs.stanford.edu

Try it yourself!

github.com/stanford-futuredata/equivariant-transformers