SLIDE 1
TensorFlow Workshop 2018
Introduction to Deep Models, Part II: Variational Autoencoders and Latent Spaces
Nick Winovich, Department of Mathematics, Purdue University
SIAM@Purdue, July 2018
SLIDE 2
SLIDE 3
Outline
1. Variational Autoencoders: Autoencoder Models; Variational Autoencoders; Reparameterization Trick
2. Latent Representations: Bayesian Framework; Kullback–Leibler Divergence; Latent Space Traversal
SIAM@Purdue 2018 - Nick Winovich Introduction to Deep Models : Part II
SLIDE 5
Feature Extraction with Autoencoders
As discussed in Part I, the process of manually defining features is typically infeasible for complex datasets. The hidden layers of neural networks naturally define features to a certain degree; however, we may wish to find a collection of features which completely characterizes a given example.

To be precise, we must first clarify what it means to “completely characterize” an example. A simple, but natural, way to define this concept is to say that a set of features characterizes an example if the full example can be reproduced from those features alone.

Although it may sound rather trivial at first, this leads to a natural approach for automating feature extraction: train a neural network to learn the identity mapping, and introduce a bottleneck layer to force a reduction in the data/feature dimensions.
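The bottleneck idea can be sketched as follows; this is a minimal NumPy illustration with hypothetical dimensions (784-dimensional inputs, a 32-dimensional bottleneck) and single linear layers standing in for the encoder and decoder, not a full training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 784-dimensional inputs (e.g. flattened 28x28
# images) squeezed through a 32-dimensional bottleneck.
input_dim, code_dim = 784, 32

# One linear layer each for the encoder and decoder; a real autoencoder
# would stack several nonlinear layers on each side.
W_enc = rng.normal(0.0, 0.01, size=(input_dim, code_dim))
W_dec = rng.normal(0.0, 0.01, size=(code_dim, input_dim))

def encode(x):
    return x @ W_enc          # features at the bottleneck

def decode(z):
    return z @ W_dec          # attempted reconstruction of the input

x = rng.random(input_dim)     # stand-in for a data example
z = encode(x)                 # compressed representation
x_hat = decode(z)             # reconstruction

# Training would minimize a reconstruction error, e.g. mean squared error:
mse = np.mean((x - x_hat) ** 2)
print(z.shape, x_hat.shape)   # (32,) (784,)
```

Because the code dimension (32) is far smaller than the input dimension (784), the network can only reproduce the input well if the bottleneck features capture the essential structure of the data.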
SLIDE 6
Autoencoder Model
[Figure: autoencoder architecture — Input → Hidden → Encoded → Hidden → Reconstructed]
SLIDE 8
Auto-Encoding Variational Bayes
Kingma, D.P. and Welling, M., 2013. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114.

A particularly effective autoencoding model, introduced by Kingma and Welling in 2013, is the variational autoencoder (VAE). The VAE model is defined in terms of a probabilistic, Bayesian framework. In this framework, the features at the bottleneck of the network are interpreted as unobservable latent variables. To approximate the underlying Bayesian model, VAE networks introduce a sampling procedure in the latent variable space.
SLIDE 9
Variational Autoencoder
[Figure: variational autoencoder — Input → Hidden → Latent (with noise ε) → Hidden → Reconstructed]
SLIDE 10
Variational Autoencoder Graph [TensorBoard]
SLIDE 12
Sampling Procedure
The encoder produces means {µk} and standard deviations {σk} corresponding to a collection of independent normal distributions for the latent variables. A vector ε is sampled from a normal distribution N(0, I) and the sample latent vector is defined by:
z = µ + σ ⊙ ε
The introduction of the standard normal sample ε, referred to as the “reparameterization trick”, is used to maintain a differentiable relation between the weights of the network and the loss function (since the sample ε is fixed at each step). This allows us to train the network end-to-end using the backpropagation method.
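The sampling step above can be sketched in NumPy; the latent dimension and the values of µ and σ below are hypothetical, chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for a 4-dimensional latent space.
mu = np.array([0.5, -1.0, 0.0, 2.0])       # means
sigma = np.array([1.0, 0.5, 2.0, 0.1])     # standard deviations

# Reparameterization trick: sample eps ~ N(0, I) once, then form z
# deterministically from (mu, sigma, eps).  Gradients with respect to
# mu and sigma flow through this expression, since eps is held fixed.
eps = rng.standard_normal(mu.shape)
z = mu + sigma * eps                       # elementwise (Hadamard) product

# With eps = 0 the sample collapses to the mean:
assert np.allclose(mu + sigma * 0.0, mu)
```

The key point is that all randomness is isolated in ε, so z is a smooth function of the network outputs µ and σ.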
SLIDE 13
Sampling Procedure
Practical Implementation
In practice, it is not numerically stable to work with the standard deviations {σk} directly; instead, the network is trained to predict the values {log(σk)} and the latent vector is sampled via:
z = µ + exp(log σ) ⊙ ε
This has the additional benefit of removing the restriction that the network predictions for {σk} must always be positive.
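A brief sketch of this parameterization (with hypothetical values for the unconstrained log σ predictions):

```python
import numpy as np

rng = np.random.default_rng(0)

# The network is free to output any real value for log(sigma);
# exponentiation recovers a strictly positive standard deviation.
mu = np.array([0.0, 1.0, -0.5])
log_sigma = np.array([-3.0, 0.0, 2.0])     # unconstrained predictions

eps = rng.standard_normal(mu.shape)
z = mu + np.exp(log_sigma) * eps

# exp(log_sigma) is positive even for negative predictions:
assert np.all(np.exp(log_sigma) > 0)
```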
SLIDE 16
Variational Bayesian Model
The VAE framework aims to approximate the intractable posterior distribution pθ(z|x) in the latent space by a recognition model:

qφ(z | x) ∼ distribution of z given x

where φ denotes the model parameters of the encoder component of the network, and θ denotes the parameters of the network’s decoder, which is used to define a generative model:

pθ(x | z) ∼ distribution of x given z
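The two roles can be sketched concretely; the layer sizes and the linear maps below are hypothetical stand-ins for the trained encoder and decoder networks, not the actual VAE architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
x_dim, z_dim = 8, 2                        # hypothetical sizes

# Stand-ins for the trained encoder (phi) and decoder (theta) weights.
W_mu = rng.normal(size=(x_dim, z_dim))
W_ls = rng.normal(size=(x_dim, z_dim))
W_dec = rng.normal(size=(z_dim, x_dim))

def recognition(x):
    """q_phi(z|x): parameters of a diagonal Gaussian over z."""
    return x @ W_mu, x @ W_ls              # (mu, log_sigma)

def generative(z):
    """p_theta(x|z): here, Bernoulli means for each coordinate of x."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec)))   # sigmoid outputs in (0, 1)

x = rng.random(x_dim)
mu, log_sigma = recognition(x)
z = mu + np.exp(log_sigma) * rng.standard_normal(z_dim)
x_hat = generative(z)
```

The recognition model outputs distribution parameters rather than a single code, and the generative model maps a latent sample back to a distribution over the data.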
SLIDE 17
Variational Bayesian Model
SLIDE 19
Motivation for Kullback–Leibler Divergence
When using an encoder/decoder model structure, it is helpful to anchor the input values received by the decoder during training (similar to the motivation for batch normalization). For example, the encoder component may learn to produce latent representations distributed according to a normal distribution N(µ, Σ) for some mean vector µ and covariance matrix Σ. However, this latent distribution can be shifted arbitrarily without affecting the theoretically attainable performance of the network; in particular, there are infinitely many model configurations which can achieve the optimal level of performance.

The lack of a unique solution can be problematic during training; to address this, we can attempt to bias the encoder toward the distribution N(0, I).
SLIDE 20
Kullback–Leibler Divergence
The KL-divergence is introduced to the loss as a regularization term; assuming the prior is taken to be a standard normal N(0, I):

KL( N(µ, Σ) ‖ N(0, I) ) = ½ [ tr(Σ) + µᵀµ − N − log det(Σ) ]

Model accuracy is accounted for by the “reconstruction loss” term:

E_{qφ(z|x)}[ log pθ(x | z) ]

The full loss function is then defined to be the negative Evidence Lower Bound (ELBO), which, after some manipulation, is given by:

−ELBO(θ, φ) = KL( qφ(z | x) ‖ N(0, I) ) − E_{qφ(z|x)}[ log pθ(x | z) ]
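The closed-form Gaussian KL term can be checked numerically; this sketch (NumPy, hypothetical dimension N = 3) evaluates KL(N(µ, Σ) ‖ N(0, I)) and confirms that it vanishes when the two distributions coincide.

```python
import numpy as np

def kl_gaussian_to_std(mu, Sigma):
    """KL( N(mu, Sigma) || N(0, I) ) for an N-dimensional Gaussian."""
    N = mu.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)   # stable log-determinant
    return 0.5 * (np.trace(Sigma) + mu @ mu - N - logdet)

# Sanity check: the divergence vanishes when mu = 0 and Sigma = I.
mu = np.zeros(3)
Sigma = np.eye(3)
print(kl_gaussian_to_std(mu, Sigma))   # 0.0
```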
SLIDE 21
Kullback–Leibler Divergence
Example
For a diagonal covariance, i.e. independent latent variables, with parameters {µk} and {σk}, the KL-divergence reduces to:

KL( qφ(z | x) ‖ N(0, I) ) = ½ ∑_{k=1}^{N} ( σk² + µk² − 1 − log σk² )

In the case of binary classification (assuming a Bernoulli output distribution), the reconstruction loss coincides precisely with the negative binary cross entropy; i.e., setting x̂ = D(z), we have:

E_{qφ(z|x)}[ log pθ(x | z) ] = x · log(x̂) + (1 − x) · log(1 − x̂)
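Both terms of the diagonal-Gaussian loss can be written in a few lines; the values of x and x̂ below are hypothetical, purely for illustration.

```python
import numpy as np

def kl_diagonal(mu, sigma):
    """KL divergence for independent latent variables (diagonal covariance)."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - np.log(sigma**2))

def negative_bce(x, x_hat):
    """Reconstruction term for binary data, with x_hat = D(z) in (0, 1)."""
    return np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))

# The KL term vanishes when the approximate posterior matches N(0, I):
print(kl_diagonal(np.zeros(4), np.ones(4)))   # 0.0

# The negative ELBO combines the two terms:
x = np.array([1.0, 0.0, 1.0])
x_hat = np.array([0.9, 0.2, 0.8])
neg_elbo = kl_diagonal(np.zeros(4), np.ones(4)) - negative_bce(x, x_hat)
```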
SLIDE 23
Example: Latent Space Interpolation
Once the VAE model is trained, we can investigate the learned latent representations by decoding points in the latent space. For example, after training a VAE model on the MNIST dataset, we can use the encoder E (i.e. the recognition model) to retrieve the latent representations of two handwritten digits, e.g. z0 = E(x0) and z1 = E(x1), where x0 is an image of a “3” and x1 is an image of a “7”. Linear interpolation can then be used to visualize the path connecting the two data points:
xθ = D( (1 − θ) · z0 + θ · z1 ),  θ ∈ [0, 1]
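The traversal can be sketched as follows; the latent codes z0, z1 and the linear map standing in for the trained decoder D are hypothetical, chosen only to show the interpolation mechanics.

```python
import numpy as np

# Hypothetical latent codes for two digits, e.g. z0 = E(x0), z1 = E(x1);
# a stand-in linear map replaces the trained decoder D for illustration.
z0 = np.array([-1.0, 0.5])
z1 = np.array([2.0, -1.5])

W_dec = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, -0.5]])       # stand-in decoder weights

def D(z):
    return z @ W_dec

# Decode points along the straight line connecting z0 and z1.
for theta in np.linspace(0.0, 1.0, 5):
    z_theta = (1.0 - theta) * z0 + theta * z1
    x_theta = D(z_theta)                   # intermediate "image"

# Endpoints recover the original latent codes:
assert np.allclose((1 - 0.0) * z0 + 0.0 * z1, z0)
assert np.allclose((1 - 1.0) * z0 + 1.0 * z1, z1)
```

With a real decoder, the intermediate outputs show a gradual morph between the two digits, which is what makes latent-space traversal a useful diagnostic for the learned representation.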
SLIDE 24
Example: Learned Manifold Structure
Figure from Auto-Encoding Variational Bayes
SLIDE 25