


SLIDE 1

TensorFlow Workshop 2018

Introduction to Deep Models

Part II: Variational Autoencoders and Latent Spaces Nick Winovich

Department of Mathematics Purdue University

July 2018

SIAM@Purdue 2018 - Nick Winovich Introduction to Deep Models : Part II

SLIDE 2

Outline

1. Variational Autoencoders
   Autoencoder Models · Variational Autoencoders · Reparameterization Trick

2. Latent Representations
   Bayesian Framework · Kullback–Leibler Divergence · Latent Space Traversal


SLIDE 5

Feature Extraction with Autoencoders

As discussed in Part I, manually defining features is typically infeasible for complex datasets. The hidden layers of neural networks naturally define features to a certain degree; however, we may wish to find a collection of features which completely characterizes a given example. To be precise, we must first clarify what it means to “completely characterize” an example. A simple, but natural, definition is to say that a set of features characterizes an example if the full example can be reproduced from those features alone. Although it may sound rather trivial at first, this leads to a natural approach for automating feature extraction: train a neural network to learn the identity mapping, and introduce a bottleneck layer to force a reduction in the data/feature dimensions.
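To make the bottleneck idea concrete, here is a minimal sketch of a linear, untrained autoencoder in NumPy; the layer sizes, weights, and data are illustrative assumptions, not part of the workshop code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Squeeze 8-dimensional inputs through a 2-dimensional bottleneck.
input_dim, bottleneck_dim = 8, 2

# Randomly initialized linear encoder/decoder (untrained; for shape illustration).
W_enc = rng.standard_normal((input_dim, bottleneck_dim))
W_dec = rng.standard_normal((bottleneck_dim, input_dim))

def encode(x):
    return x @ W_enc      # (batch, 8) -> (batch, 2): extracted features

def decode(z):
    return z @ W_dec      # (batch, 2) -> (batch, 8): attempted reconstruction

x = rng.standard_normal((4, input_dim))   # a small batch of examples
z = encode(x)
x_hat = decode(z)

# Training would minimize the reconstruction error, e.g. mean squared error,
# forcing the 2 bottleneck features to characterize each 8-dimensional input.
mse = np.mean((x - x_hat) ** 2)
```

In practice the encoder and decoder are deep nonlinear networks trained jointly on this reconstruction objective.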


SLIDE 6

Autoencoder Model

[Figure: autoencoder architecture with layers Input → Hidden → Encoded → Hidden → Reconstructed]


SLIDE 8

Auto-Encoding Variational Bayes

Kingma, D.P. and Welling, M., 2013. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114.

A particularly effective autoencoding model, introduced by Kingma and Welling in 2013, is the variational autoencoder (VAE). The VAE model is defined in terms of a probabilistic, Bayesian framework. In this framework, the features at the bottleneck of the network are interpreted as unobservable latent variables. To approximate the underlying Bayesian model, VAE networks introduce a sampling procedure in the latent variable space.


SLIDE 9

Variational Autoencoder

[Figure: variational autoencoder architecture with layers Input → Hidden → Latent → Hidden → Reconstructed; the latent layer is sampled using noise ε]

SLIDE 10

Variational Autoencoder Graph [TensorBoard]



SLIDE 12

Sampling Procedure

The encoder produces means {µk} and standard deviations {σk} corresponding to a collection of independent normal distributions for the latent variables. A vector ε is sampled from a normal distribution N(0, I), and the sampled latent vector is defined by:

z = µ + σ ⊙ ε

This substitution, referred to as the “reparameterization trick”, maintains a differentiable relation between the weights of the network and the loss function (since the sample ε is treated as a fixed input at each step). This allows the network to be trained end-to-end using backpropagation.
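The sampling step can be sketched in a few lines of NumPy (shown outside TensorFlow for brevity; the values of µ and σ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder outputs for one example (illustrative values).
mu = np.array([0.5, -1.0, 2.0])
sigma = np.array([0.1, 0.2, 0.3])

# Reparameterization: sample eps ~ N(0, I), then form z = mu + sigma * eps.
# Given eps, z is a deterministic (and differentiable) function of mu and
# sigma; all randomness is isolated in eps.
eps = rng.standard_normal(3)
z = mu + sigma * eps

# With eps fixed at zero, the sample collapses to the mean.
z_at_zero = mu + sigma * np.zeros(3)
```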


SLIDE 13

Sampling Procedure

Practical Implementation

In practice, it is not numerically stable to work with the standard deviations {σk} directly; instead, the network is trained to predict the values {log(σk)} and the latent vector is sampled via:

z = µ + exp(log σ) ⊙ ε

This has the additional benefit of removing the restriction that the network predictions for {σk} must always be positive.
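A NumPy sketch of the log-parameterized sampler (values illustrative) confirms that it produces the same samples as the direct parameterization:

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([0.0, 1.0])
sigma = np.array([0.5, 2.0])
log_sigma = np.log(sigma)          # what the network is trained to predict

eps = rng.standard_normal(2)

# Direct and log-parameterized sampling give identical latent vectors:
z_direct = mu + sigma * eps
z_logged = mu + np.exp(log_sigma) * eps

# exp(log_sigma) is positive for any real log_sigma, so the network's raw
# outputs are unconstrained while the resulting sigmas remain valid.
```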



SLIDE 16

Variational Bayesian Model

The VAE framework aims to approximate the intractable posterior distribution pθ(z | x) in the latent space by a recognition model:

qφ(z | x) ∼ distribution of z given x

where φ denotes the parameters of the encoder component of the network, and θ denotes the parameters of the network’s decoder, which is used to define a generative model:

pθ(x | z) ∼ distribution of x given z


SLIDE 17

Variational Bayesian Model



SLIDE 19

Motivation for Kullback–Leibler Divergence

When using an encoder/decoder model structure, it is helpful to anchor the input values received by the decoder during training (similar to the motivation for batch normalization). For example, the encoder component may learn to produce latent representations distributed according to a normal distribution N(µ, Σ) for some mean vector µ and covariance matrix Σ. However, this latent distribution can be shifted arbitrarily without affecting the theoretically attainable performance of the network. In particular, infinitely many model configurations can achieve the optimal level of performance. This lack of a unique solution can be problematic during training; to address it, we can attempt to bias the encoder toward the distribution N(0, I).


SLIDE 20

Kullback–Leibler Divergence

The KL divergence is introduced to the loss as a regularization term; assuming the prior is taken to be a standard normal N(0, I):

KL( N(µ, Σ) ‖ N(0, I) ) = (1/2) ( tr(Σ) + µᵀµ − N − log det Σ )

Model accuracy is accounted for by the “reconstruction loss” term:

E_{qφ(z|x)} [ log pθ(x | z) ]

The full loss function is then defined to be the negative Evidence Lower Bound (ELBO) which, after some manipulation, is given by:

−ELBO(θ, φ) = KL( qφ(z | x) ‖ N(0, I) ) − E_{qφ(z|x)} [ log pθ(x | z) ]
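The closed-form KL term can be checked numerically; the following NumPy sketch (an illustration, not workshop code) computes KL(N(µ, Σ) ‖ N(0, I)) and verifies that it vanishes exactly when µ = 0 and Σ = I:

```python
import numpy as np

def kl_gaussian_to_std(mu, Sigma):
    """KL( N(mu, Sigma) || N(0, I) ) for an N-dimensional Gaussian."""
    N = mu.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)   # numerically stable log det
    assert sign > 0, "Sigma must be positive definite"
    return 0.5 * (np.trace(Sigma) + mu @ mu - N - logdet)

# Zero exactly when the two distributions coincide:
kl_zero = kl_gaussian_to_std(np.zeros(3), np.eye(3))

# Positive for any other mean/covariance:
kl_pos = kl_gaussian_to_std(np.array([1.0, 0.0, 0.0]), 2.0 * np.eye(3))
```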

SLIDE 21

Kullback–Leibler Divergence

Example

For a diagonal covariance, i.e. independent latent variables, with parameters {µk} and {σk}, the KL divergence reduces to:

KL( qφ(z | x) ‖ N(0, I) ) = (1/2) Σ_{k=1}^N ( σk² + µk² − 1 − log σk² )

In the case of binary data (assuming a Bernoulli likelihood), the reconstruction loss coincides precisely with the negative binary cross-entropy; i.e., setting x̂ = D(z), we have:

E_{qφ(z|x)} [ log pθ(x | z) ] = x · log(x̂) + (1 − x) · log(1 − x̂)

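The two terms above combine into the negative ELBO; the NumPy sketch below implements them under the diagonal-Gaussian and Bernoulli assumptions (function names are illustrative, not from the workshop code):

```python
import numpy as np

def diagonal_kl(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - np.log(sigma**2))

def bernoulli_log_likelihood(x, x_hat, eps=1e-7):
    """log p(x | z) for binary x, with decoder output x_hat in (0, 1)."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)    # avoid log(0)
    return np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))

def negative_elbo(x, x_hat, mu, sigma):
    return diagonal_kl(mu, sigma) - bernoulli_log_likelihood(x, x_hat)

# KL vanishes when the posterior matches the standard normal prior:
kl_std = diagonal_kl(np.zeros(4), np.ones(4))

# A perfect reconstruction of binary data gives a (near-)zero total loss:
x = np.array([0.0, 1.0, 1.0, 0.0])
loss = negative_elbo(x, x, np.zeros(4), np.ones(4))
```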


SLIDE 23

Example: Latent Space Interpolation

Once the VAE model is trained, we can investigate the learned latent representations by decoding points in the latent space. For example, after training a VAE on the MNIST dataset, we can use the encoder (i.e. the recognition model) to retrieve the latent representations of two handwritten digits, e.g. z0 = E(x0) and z1 = E(x1), where x0 is an image of a “3” and x1 is an image of a “7”. Linear interpolation can then be used to visualize the path connecting the two data points:

xθ = D( (1 − θ) · z0 + θ · z1 ),  θ ∈ [0, 1]
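A NumPy sketch of the interpolation (with random stand-in latent codes, since the trained encoder E and decoder D are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in latent codes; in practice z0 = E(x0) and z1 = E(x1).
z0 = rng.standard_normal(2)
z1 = rng.standard_normal(2)

# Linear interpolation: theta = 0 recovers z0, theta = 1 recovers z1.
thetas = np.linspace(0.0, 1.0, 5)
path = np.array([(1.0 - t) * z0 + t * z1 for t in thetas])

# Decoding each point in `path` with D would render the gradual
# transition from one digit to the other.
```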

SLIDE 24

Example: Learned Manifold Structure

Figure from Auto-Encoding Variational Bayes


SLIDE 25

References and Notable Papers

https://github.com/terryum/awesome-deep-learning-papers
