
Equivariant Transformer Networks (Poster 18), by Kai Sheng Tai, Peter Bailis & Gregory Valiant (Stanford University). PowerPoint presentation transcript:


  1. Equivariant Transformer Networks (Poster 18). Kai Sheng Tai, Peter Bailis & Gregory Valiant, Stanford University. github.com/stanford-futuredata/equivariant-transformers

  2. Goal: Transformation-invariant models
  ● How can we learn models that are invariant to certain input transformations?
  ● Relevant to many application domains: astronomical objects, plankton micrographs, traffic signs
  ● In this work, we explore alternatives to data augmentation:
  ⎼ How can we build invariances directly into network architectures? [Group Equivariant CNNs (Cohen+ '16, Dieleman+ '16), Harmonic Networks (Worrall+ '17), etc.]
  ⎼ Can we achieve invariance while reusing off-the-shelf architectures? [Spatial Transformer Networks (Jaderberg+ '15)]

  3. Equivariant Transformer Layers
  [Figure: a standard CNN compared with a CNN preceded by ET layers]
  ● An Equivariant Transformer (ET) is a differentiable image-to-image mapping
  ● Key property ("local invariance"): all transformed versions of a base image are mapped to the same output image
  ● Requirement: the family of transformations forms a Lie group
  ⎼ transformations are invertible and differentiable with respect to a real-valued parameter
  ⎼ includes many common families of transformations: translation, rotation, scaling, shear, perspective, etc.

  4-5. Key ideas
  1. Standard convolutional layers are translation-equivariant
  ⎼ i.e., input translated by t → output translated by t
  2. Specialized coordinates turn smooth transformations into translations
  ⎼ Example (rotation): in polar coordinates, a rotation by angle θ appears as a translation by θ along the angle axis
  ⎼ [Figure: the same image shown in Cartesian coordinates and in polar coordinates]
  ⎼ This generalizes to other smooth transformations using canonical coordinate systems for Lie groups (Rubinstein+ '91)
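The polar-coordinate trick in key idea 2 can be sketched numerically: sampling an image function on a polar grid turns a rotation of the image into a cyclic shift along the angle axis. This is an illustrative sketch, not the paper's code; the test function `f` and the grid sizes are arbitrary choices.

```python
import numpy as np

def f(x, y):
    # a smooth "image" defined on the plane (arbitrary example function)
    return np.sin(3 * x) * np.cos(2 * y) + x * y

def polar_sample(func, n_r=32, n_t=64):
    # sample func on a polar grid: rows index radius, columns index angle
    r = np.linspace(0.1, 1.0, n_r)[:, None]
    t = np.linspace(0, 2 * np.pi, n_t, endpoint=False)[None, :]
    return func(r * np.cos(t), r * np.sin(t))

# rotate the input by exactly 3 angular grid steps (grid has 64 columns)
shift = 3
dt = 2 * np.pi * shift / 64
rotated = lambda x, y: f(np.cos(dt) * x + np.sin(dt) * y,
                         -np.sin(dt) * x + np.cos(dt) * y)

P0 = polar_sample(f)
P1 = polar_sample(rotated)

# rotating the image == cyclically shifting its polar representation
print(np.allclose(P1, np.roll(P0, shift, axis=1)))  # True
```

Once rotation looks like translation, an ordinary translation-equivariant CNN applied to the polar representation is rotation-equivariant with respect to the original image.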

  6. ETs are locally invariant by construction
  [Figure: pipeline: input → canonical coordinate representation → translation-equivariant CNN → estimated transformation parameter → inverse transformation → output]
  ● Equivariance guarantees that applying an additional transformation by θ to the input increases the estimated parameter by θ
  ● The output is therefore invariant to transformations of the input
  ● We implement the transformation with differentiable grid resampling (Jaderberg+ '15)
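The invariance-by-construction argument can be sketched on a polar representation P (radius x angle). The paper uses a translation-equivariant CNN and differentiable grid resampling; here, purely for illustration, a circular argmax statistic stands in for the CNN's pose prediction, and the inverse transformation is a cyclic shift along the angle axis.

```python
import numpy as np

def estimate_shift(P):
    # translation-equivariant stand-in for the pose-predicting CNN:
    # shifting P's columns by s shifts this argmax by s (mod width)
    return int(np.argmax(P.sum(axis=0)))

def canonicalize(P):
    # apply the inverse translation so the estimated pose sits at column 0
    return np.roll(P, -estimate_shift(P), axis=1)

rng = np.random.default_rng(0)
P0 = rng.random((16, 32))  # toy polar representation

# any angular translation of the input maps to the same canonical output
outs = [canonicalize(np.roll(P0, s, axis=1)) for s in (0, 5, 17)]
print(all(np.array_equal(outs[0], o) for o in outs[1:]))  # True
```

Equivariance of the estimator is exactly what makes the composed map invariant: the extra shift s added to the input is reproduced in the estimate and then cancelled by the inverse transformation.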

  7. Compositions of ETs handle more complicated transformations
  ● Since ETs map images to images, they can be composed sequentially
  [Figure: an input image passed through x-shear, aspect ratio, x-perspective, and y-perspective ET layers in sequence]
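A toy sketch of this sequential composition, with two argmax-based canonicalizers (hypothetical stand-ins for trained ET layers, not the paper's x-shear or perspective layers) each handling one axis of cyclic translation: because every layer maps images to images, composing them yields invariance to the combined transformation group.

```python
import numpy as np
from functools import reduce

def canon(axis):
    # a toy ET-like layer: estimate the cyclic shift along `axis` with a
    # translation-equivariant argmax, then apply the inverse shift
    def layer(img):
        pose = int(np.argmax(img.sum(axis=1 - axis)))
        return np.roll(img, -pose, axis=axis)
    return layer

def compose(*layers):
    # sequential image-to-image composition
    return lambda img: reduce(lambda x, f: f(x), layers, img)

model = compose(canon(0), canon(1))  # canonicalize rows, then columns

rng = np.random.default_rng(1)
img = rng.random((8, 8))
a = model(img)
b = model(np.roll(np.roll(img, 3, axis=0), 5, axis=1))
print(np.array_equal(a, b))  # True
```

The design point is the interface: since each layer consumes and produces an image, no layer needs to know which transformations the others handle.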

  8. ETs improve generalization
  ● Larger improvements when training data is limited

  9. Takeaways (Poster #18, kst@cs.stanford.edu)
  ● Equivariant Transformers build transformation invariance into neural network architectures
  ● Main ideas:
  ⎼ Canonical coordinates let us tailor ET layers to specific transformation groups
  ⎼ The image-to-image interface lets us compose ETs to handle more complicated transformation groups
  ● Try it yourself! github.com/stanford-futuredata/equivariant-transformers
