Advanced Machine Learning: Variational Auto-encoders
Amit Sethi, EE, IITB
Objectives
- Learn how VAEs help in sampling from a data
distribution
- Write the objective function of a VAE
- Derive how VAE objective is adapted for SGD
VAE setup
- We are interested in maximizing the data likelihood
  P(X) = ∫ P(X|z; θ) P(z) dz
- Let P(X|z; θ) be modeled by f(z; θ)
- Further, let us assume that
  P(X|z; θ) = N(X | f(z; θ), σ²I)
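As a sketch, this generative setup is a few lines of Python (the decoder f, the toy parameters W, and all dimensions below are illustrative assumptions, not the slides' specification):

```python
import numpy as np

# Sketch of the generative model: z ~ N(0, I), X ~ N(f(z; theta), sigma^2 I).
# f and W are toy stand-ins; in a real VAE, f is a neural-network decoder.
rng = np.random.default_rng(0)
latent_dim, data_dim, sigma = 2, 5, 0.1
W = rng.normal(size=(data_dim, latent_dim))       # toy "theta"

def f(z):
    """Hypothetical decoder f(z; theta)."""
    return np.tanh(W @ z)

z = rng.standard_normal(latent_dim)               # z ~ N(0, I)
X = f(z) + sigma * rng.standard_normal(data_dim)  # X ~ N(f(z), sigma^2 I)
```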
Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch
We do not care about distribution of z
- Latent variable z is drawn from a standard
normal
- It may represent many different variations of
the data
[Figure: graphical model, z ∼ N(0, I) → decoder with parameters θ → X]
Example of a variable transformation
X = g(z) = z/10 + z/‖z‖, with z ∼ N(0, I)
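This transformation is easy to check numerically; a minimal sketch (sample count and seed are arbitrary):

```python
import numpy as np

# z ~ N(0, I) in 2-D; g(z) = z/10 + z/||z|| pushes the Gaussian blob of
# samples onto a noisy ring of radius ~1.
rng = np.random.default_rng(0)
z = rng.standard_normal((1000, 2))
X = z / 10 + z / np.linalg.norm(z, axis=1, keepdims=True)
print(np.linalg.norm(X, axis=1).round(2))  # radii cluster a little above 1
```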
Because of the Gaussian assumption, the most obvious variation may not be the most likely
- Although the ‘2’ on the right is a better variation of the one on the left, the ‘2’ in the middle is more likely under the pixel-wise Gaussian assumption
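A toy 1-D analogue of this effect (the "digit" array below is made up for illustration): under a pixel-wise Gaussian likelihood, log P(X|z) ∝ −‖X − f(z)‖², so a small shift of a sharp stroke costs more than erasing a piece of it.

```python
import numpy as np

# Toy 1-D "digit": a sharp stroke of four pixels.
original = np.zeros(20); original[8:12] = 1.0
erased  = original.copy(); erased[9] = 0.0  # small piece removed
shifted = np.roll(original, 1)              # perceptually similar shift

sq_err = lambda x: np.sum((x - original) ** 2)
print(sq_err(erased), sq_err(shifted))  # 1.0 vs 2.0
# The pixel-wise Gaussian model assigns the erased version higher likelihood,
# even though the shifted version is the better "variation".
```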
Sampling z from standard normal is problematic
- It may give samples of z that are unlikely to
have produced X
- Can we sample z itself intelligently?
- Enter Q(z|X), used to compute, e.g., Ez∼Q[P(X|z)]
- All we need to do is reduce the KL divergence between Q(z|X) and the true posterior P(z|X), which (next slide) ties Ez∼Q[log P(X|z)] to log P(X)
- Hence, a variational method
VAE Objective Setup
D[Q(z) ‖ P(z|X)] = Ez∼Q[log Q(z) − log P(z|X)]
                 = Ez∼Q[log Q(z) − log P(X|z) − log P(z)] + log P(X)

Rearranging some terms:
log P(X) − D[Q(z) ‖ P(z|X)] = Ez∼Q[log P(X|z)] − D[Q(z) ‖ P(z)]

Introducing dependency of Q on X:
log P(X) − D[Q(z|X) ‖ P(z|X)] = Ez∼Q[log P(X|z)] − D[Q(z|X) ‖ P(z)]
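The identity holds for any Q, which can be sanity-checked on a toy discrete latent; a minimal sketch (the three-state distributions below are arbitrary choices):

```python
import numpy as np

# Toy discrete check of the identity, for one fixed X and arbitrary Q:
#   log P(X) - D[Q(z) || P(z|X)] = E_{z~Q}[log P(X|z)] - D[Q(z) || P(z)]
P_z = np.array([0.5, 0.3, 0.2])          # prior P(z) over 3 latent states
P_X_given_z = np.array([0.1, 0.7, 0.4])  # likelihood P(X|z) for this X
Q = np.array([0.2, 0.6, 0.2])            # any variational distribution Q(z)

P_X = np.sum(P_X_given_z * P_z)          # P(X) = sum_z P(X|z) P(z)
P_z_given_X = P_X_given_z * P_z / P_X    # posterior via Bayes' rule

kl = lambda q, p: np.sum(q * np.log(q / p))
lhs = np.log(P_X) - kl(Q, P_z_given_X)
rhs = np.sum(Q * np.log(P_X_given_z)) - kl(Q, P_z)
assert np.isclose(lhs, rhs)              # holds for any choice of Q
```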
Optimizing the RHS
- Q encodes X into z; P(X|z) decodes z back into X
- Assume Q(z|X) on the LHS is a high-capacity neural network
- For the RHS: Ez∼Q[log P(X|z)] − D[Q(z|X) ‖ P(z)]
- Assume: Q(z|X) = N(z | μ(X; θ), Σ(X; θ))
- Then the KL divergence has a closed form (verified in the sketch after this list):
  D[N(μ(X), Σ(X)) ‖ N(0, I)] = ½ [ tr(Σ(X)) + μ(X)ᵀμ(X) − k − log det(Σ(X)) ]
- In SGD, the objective becomes maximizing:
  EX∼D[log P(X) − D[Q(z|X) ‖ P(z|X)]] = EX∼D[Ez∼Q[log P(X|z)] − D[Q(z|X) ‖ P(z)]]
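A minimal sketch verifying the closed-form KL term against a Monte Carlo estimate (μ and the diagonal Σ values are stand-ins for encoder outputs):

```python
import numpy as np

# Closed-form KL for a diagonal Gaussian vs. N(0, I), checked by Monte Carlo.
mu = np.array([0.5, -1.0, 0.2])   # stand-in for mu(X)
var = np.array([0.8, 1.5, 0.3])   # stand-in for the diagonal of Sigma(X)
k = mu.size

kl_closed = 0.5 * (var.sum() + mu @ mu - k - np.log(var).sum())

rng = np.random.default_rng(0)
z = mu + np.sqrt(var) * rng.standard_normal((200_000, k))
log_q = -0.5 * (((z - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(axis=1)
log_p = -0.5 * (z ** 2 + np.log(2 * np.pi)).sum(axis=1)
print(kl_closed, (log_q - log_p).mean())  # the two estimates agree
```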
Moving the gradient inside the expectation
- We need to compute the gradient of:
  log P(X|z) − D[Q(z|X) ‖ P(z)]
- The first term, as written, does not depend on the parameters of Q, but Ez∼Q[log P(X|z)] does!
- So, we need to generate z that are plausible, i.e., decodable
The actual model that resists backpropagation
- Cannot backpropagate through a stochastic unit
The actual model that resists backpropagation (contd.)
- EX∼D[Eε∼N(0,I)[log P(X | z = μ(X) + Σ^{1/2}(X)·ε)] − D[Q(z|X) ‖ P(z)]]
- Now we can backpropagate end-to-end, because the expectations are no longer taken with respect to distributions that depend on the model
- Reparameterization trick: draw ε ∼ N(0, I) and set z = μ(X) + Σ^{1/2}(X)·ε. This works if Q(z|X) and P(z) are continuous
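A minimal PyTorch sketch of the trick (the encoder outputs μ and log Σ are stand-in tensors here, and the loss is a placeholder for −log P(X|z)):

```python
import torch

# e is sampled outside the graph; z is then a deterministic, differentiable
# function of the encoder outputs, so gradients reach mu and Sigma.
mu = torch.tensor([0.5, -1.0], requires_grad=True)       # stand-in mu(X)
log_var = torch.tensor([0.1, -0.3], requires_grad=True)  # stand-in log Sigma(X)

e = torch.randn(2)                     # e ~ N(0, I), independent of parameters
z = mu + torch.exp(0.5 * log_var) * e  # z = mu(X) + Sigma^{1/2}(X) * e

loss = (z ** 2).sum()                  # placeholder for -log P(X|z)
loss.backward()
print(mu.grad, log_var.grad)           # gradients flow through the sample
```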
Test-time sampling is straightforward
- The encoder pathway, including the multiplication and addition, is discarded
- To estimate the likelihood of a test sample, generate z (e.g., from Q(z|X)) and then compute P(X|z)
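A minimal sketch of test-time generation (the decoder architecture and sizes are assumptions; in practice it is the trained f(z; θ)):

```python
import torch

# decoder is a hypothetical trained f(z; theta); sizes are illustrative.
decoder = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 784), torch.nn.Sigmoid())

z = torch.randn(16, 2)     # z ~ N(0, I); no mu/Sigma path at test time
with torch.no_grad():
    samples = decoder(z)   # 16 generated samples
```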
Conditional VAE
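A minimal sketch of the conditional idea, where both encoder and decoder receive the label y in addition to X or z (all names, dimensions, and the single-linear-layer networks are illustrative assumptions):

```python
import torch

# Both networks see the label y via concatenation; all sizes are illustrative.
x_dim, y_dim, z_dim = 784, 10, 2
encoder = torch.nn.Linear(x_dim + y_dim, 2 * z_dim)  # outputs mu and log-var
decoder = torch.nn.Linear(z_dim + y_dim, x_dim)

x = torch.rand(1, x_dim)
y = torch.zeros(1, y_dim); y[0, 3] = 1.0             # one-hot class label
mu, log_var = encoder(torch.cat([x, y], dim=1)).chunk(2, dim=1)
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
x_recon = decoder(torch.cat([z, y], dim=1))          # decode conditioned on y
```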
Sample results for an MNIST VAE
Sample results for an MNIST CVAE