

SLIDE 1

Michael Figurnov, Shakir Mohamed, Andriy Mnih Poster: Room 210 #33

Implicit Reparameterization Gradients

SLIDE 2

Implicit Reparameterization Gradients — Michael Figurnov

Reparameterization gradients

Core part of variational autoencoders, automatic variational inference, etc. Backpropagation through computation graphs with continuous random variables.


SLIDE 5


Reparameterization gradients

[Diagram: differentiable objective (ELBO, …) → continuous random variable (Normal, ...) → backpropagation]

Core part of variational autoencoders, automatic variational inference, etc. Backpropagation through computation graphs with continuous random variables. Explicit reparameterization requires a tractable inverse transformation (Normal, Logistic, …). We show how to use implicit differentiation to reparameterize other continuous random variables, such as Gamma and von Mises.
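To make "tractable inverse transformation" concrete: for a Normal distribution, explicit reparameterization writes z = μ + σε with ε ~ N(0, 1), and gradients flow through the transform. A minimal NumPy sketch (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Explicit reparameterization of z ~ Normal(mu, sigma^2):
# z = mu + sigma * eps, with eps ~ Normal(0, 1).
mu, sigma = 1.5, 0.7
eps = rng.standard_normal(100_000)
z = mu + sigma * eps

# Pathwise gradients of the transform:
dz_dmu = np.ones_like(z)   # d(mu + sigma*eps)/d mu    = 1
dz_dsigma = eps            # d(mu + sigma*eps)/d sigma = eps

# Sanity checks: E[z] = mu gives d E[z]/d mu = 1, and
# E[z^2] = mu^2 + sigma^2 gives d E[z^2]/d sigma = 2 * sigma.
print(dz_dmu.mean())               # close to 1
print((2 * z * dz_dsigma).mean())  # close to 2 * sigma = 1.4
```

The catch, as the slide notes, is that this route needs a tractable inverse transformation, which distributions such as Gamma and von Mises lack.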


SLIDE 11


Explicit and implicit reparameterization

Cumulative distribution function: ε = F(z; φ) ~ Uniform(0, 1)

Explicit:
  Sampling (forward pass): z = F⁻¹(ε; φ), ε ~ Uniform(0, 1)
  Gradients (backward pass): ∇φ z = ∇φ F⁻¹(ε; φ)

Implicit:
  Sampling (forward pass): z ~ q(z; φ), using any sampler (e.g., rejection sampling)
  Gradients (backward pass): ∇φ z = −∇φ F(z; φ) / q(z; φ)
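As a sanity check of the implicit column, the formula ∇φ z = −∇φ F(z; φ) / q(z; φ) can be evaluated numerically for the Normal distribution, where the explicit answers (dz/dμ = 1, dz/dσ = ε) are known. A sketch using SciPy, with the CDF's parameter derivatives taken by finite differences purely for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma = 0.3, 2.0
z = rng.normal(mu, sigma, size=5)

# Implicit gradient: dz/dphi = -dF(z; phi)/dphi / q(z; phi).
h = 1e-5
q = norm.pdf(z, loc=mu, scale=sigma)
dF_dmu = (norm.cdf(z, mu + h, sigma) - norm.cdf(z, mu - h, sigma)) / (2 * h)
dF_dsigma = (norm.cdf(z, mu, sigma + h) - norm.cdf(z, mu, sigma - h)) / (2 * h)

dz_dmu = -dF_dmu / q        # matches the explicit gradient: 1
dz_dsigma = -dF_dsigma / q  # matches the explicit gradient: eps = (z - mu)/sigma

print(dz_dmu)                         # all entries near 1
print(dz_dsigma - (z - mu) / sigma)   # all entries near 0
```

Note the sampler and the gradient computation are fully decoupled: z can come from any sampling routine, and only F and q are needed at the sampled point.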

SLIDE 12


Derivation: implicit differentiation

ε = F(z; φ) does not depend on φ, so ∇φ[F(z; φ)] = 0.

  • The gradient of the cumulative distribution function, ∇φ F(z; φ), is often not implemented in numerical libraries
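Written out (a restatement of the step this slide names), implicit differentiation proceeds from the fact that ε = F(z; φ) is noise that does not depend on φ:

```latex
\varepsilon = F(z; \phi)
\;\Longrightarrow\;
0 = \nabla_\phi \big[ F(z; \phi) \big]
  = \frac{\partial F(z; \phi)}{\partial z} \, \nabla_\phi z + \nabla_\phi F(z; \phi)
\;\Longrightarrow\;
\nabla_\phi z = - \frac{\nabla_\phi F(z; \phi)}{q(z; \phi)}
```

using ∂F(z; φ)/∂z = q(z; φ), the density. No inverse CDF is required, only ∇φ F at the sampled point.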


SLIDE 14


How to compute ∇φ F(z; φ)?

Relative metrics (lower is better):

Method                                           Gamma             von Mises
                                                 Error    Time     Error    Time
Automatic differentiation of the CDF code        1x       1x       1x       1x
Finite difference                                832x     2x       514x     1.2x
Jankowiak & Obermeyer (2018), concurrent work;
  closed-form approximation                      18x      5x       —        —
Knowles (2015), approximate explicit
  reparameterization                             2840x    63x      —        —

Knowles, "Stochastic gradient variational Bayes for Gamma approximating distributions." arXiv, 2015
Jankowiak, Obermeyer, "Pathwise Derivatives Beyond the Reparameterization Trick." ICML, 2018
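The finite-difference baseline can be illustrated end to end: for Gamma(α, 1), the implicit gradient is −∇α F(z; α) / q(z; α), with ∇α F approximated by central differences of the SciPy CDF. Since E[z] = α, the gradients should average to d E[z]/d α = 1. A sketch (sample size and step size are arbitrary choices):

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(2)
alpha = 2.5
z = gamma.rvs(alpha, size=50_000, random_state=rng)

# Implicit gradient for Gamma(alpha, 1): dz/dalpha = -dF/dalpha / q(z; alpha).
# SciPy does not expose dF/dalpha, so approximate it with a central finite
# difference of the CDF.
h = 1e-4
dF_dalpha = (gamma.cdf(z, alpha + h) - gamma.cdf(z, alpha - h)) / (2 * h)
dz_dalpha = -dF_dalpha / gamma.pdf(z, alpha)

# E[z] = alpha for Gamma(alpha, 1), so the average pathwise gradient
# should be close to d E[z]/d alpha = 1.
print(dz_dalpha.mean())
```

Automatic differentiation of the CDF code replaces the finite difference with an exact derivative of the same computation, which is where the 1x error/time row comes from.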


SLIDE 17


Variational Autoencoder

2D latent spaces for MNIST: Normal prior and posterior vs. uniform prior with von Mises posterior

[Figure: latent-space visualizations; torus image adapted from https://en.wikipedia.org/wiki/Torus#/media/File:Sphere-like_degenerate_torus.gif]

Also in the paper: Latent Dirichlet Allocation

SLIDE 18


Implicit Reparameterization Gradients

Michael Figurnov, Shakir Mohamed, Andriy Mnih

  • A more general view of reparameterization gradients
    ○ Decouples sampling from gradient estimation
  • Reparameterization gradients for Gamma, von Mises, Beta, Dirichlet, ...
    ○ Faster and more accurate than the alternatives
    ○ Implemented in TensorFlow Probability: tfp.distributions.{Gamma,VonMises,Beta,Dirichlet,...}
  • Move away from making modelling choices for computational convenience
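The same recipe extends to the other distributions listed. A Beta(a, b) check in the same style as the Gamma example above (a sketch assuming SciPy rather than TensorFlow Probability; for Beta, E[z] = a / (a + b), so the average implicit gradient with respect to a should approach b / (a + b)²):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(3)
a, b = 2.0, 3.0
z = beta.rvs(a, b, size=50_000, random_state=rng)

# Implicit gradient w.r.t. the first shape parameter:
# dz/da = -dF(z; a, b)/da / q(z; a, b), with dF/da by finite differences.
h = 1e-4
dF_da = (beta.cdf(z, a + h, b) - beta.cdf(z, a - h, b)) / (2 * h)
dz_da = -dF_da / beta.pdf(z, a, b)

# d E[z]/d a = b / (a + b)^2 = 0.12 for a = 2, b = 3.
print(dz_da.mean())
```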

Poster: Room 210 #33