

SLIDE 1

Advanced Machine Learning Variational Auto-encoders

Amit Sethi, EE, IITB

SLIDE 2

Objectives

  • Learn how VAEs help in sampling from a data distribution
  • Write the objective function of a VAE
  • Derive how the VAE objective is adapted for SGD
SLIDE 3

VAE setup

  • We are interested in maximizing the data likelihood

P(X) = ∫ P(X|z; θ) P(z) dz

  • Let P(X|z; θ) be modeled by f(z; θ)
  • Further, let us assume that

P(X|z; θ) = N(X | f(z; θ), σ²I)
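
As a concrete (if naive) illustration of this integral, here is a minimal numpy sketch that estimates P(X) by Monte Carlo: sample z from the prior and average the Gaussian decoder density. The decoder f and the noise level σ² are placeholder assumptions, not anything specified on the slides.

```python
import numpy as np

def log_gaussian(x, mean, sigma2):
    """Log density of an isotropic Gaussian N(mean, sigma2 * I) at x."""
    k = x.shape[-1]
    return -0.5 * (k * np.log(2 * np.pi * sigma2)
                   + np.sum((x - mean) ** 2, axis=-1) / sigma2)

def mc_log_likelihood(x, f, sigma2=0.1, latent_dim=2, n_samples=10000):
    """Estimate log P(x) = log ∫ P(x|z) P(z) dz by averaging over z ~ N(0, I).

    f is a hypothetical decoder mapping (n_samples, latent_dim) -> (n_samples, k).
    """
    z = np.random.randn(n_samples, latent_dim)      # sample from the prior
    log_px_given_z = log_gaussian(x, f(z), sigma2)  # decode and score each z
    m = log_px_given_z.max()                        # log-mean-exp for stability
    return m + np.log(np.mean(np.exp(log_px_given_z - m)))
```

In high dimensions almost all prior samples contribute essentially nothing to this average, which is what motivates the encoder Q(z|X) introduced in the later slides.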

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 4

We do not care about distribution of z

  • Latent variable z is drawn from a standard normal
  • It may represent many different variations of the data

[Figure: graphical model in which z ~ N(0, I) is decoded, through parameters θ, into X]

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 5

Example of a variable transformation

X = g(z) = z/10 + z/‖z‖
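
To see what this transformation does, here is a short numpy sketch (the sample count is arbitrary): points drawn from a 2-D standard normal get pushed onto a noisy ring of radius about 1, a distinctly non-Gaussian X produced from Gaussian z.

```python
import numpy as np

# Sketch of the transformation g(z) = z/10 + z/||z|| from the slide.
# A simple deterministic map turns N(0, I) into a ring-shaped distribution.
z = np.random.randn(5000, 2)
norms = np.linalg.norm(z, axis=1, keepdims=True)
x = z / 10 + z / norms

print(np.mean(np.linalg.norm(x, axis=1)))  # close to 1: samples cluster near the unit circle
```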

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 6

Because of the Gaussian assumption, the most obvious variation may not be the most likely

  • Although the ‘2’ on the right is a better choice as a variation of the one on the left, the one in the middle is more likely due to the Gaussian assumption

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 7

Sampling z from standard normal is problematic

  • It may give samples of z that are unlikely to have produced X
  • Can we sample z itself intelligently?
  • Enter Q(z|X) to compute, e.g., Ez∼Q[P(X|z)]
  • All we need to do is reduce the KL divergence between P(X) and Ez∼Q[P(X|z)]

  • Hence, a variational method
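
A sketch of what sampling z "intelligently" buys us, with hypothetical encoder functions q_mu and q_sigma and decoder f standing in for real networks: drawing z from Q(z|X) concentrates samples where z could plausibly have produced X, so far fewer samples are wasted than when sampling from the prior.

```python
import numpy as np

def expected_likelihood(x, q_mu, q_sigma, f, sigma2=0.1, n_samples=100):
    """Monte Carlo estimate of E_{z~Q}[P(x|z)], with Q(z|x) = N(mu(x), sd(x)^2 I).

    q_mu, q_sigma, and f are hypothetical encoder/decoder stand-ins.
    """
    mu, sd = q_mu(x), q_sigma(x)                            # encoder output for this x
    z = mu + sd * np.random.randn(n_samples, mu.shape[-1])  # z ~ Q(z|x)
    sq_err = np.sum((x - f(z)) ** 2, axis=-1)               # decoder reconstruction error
    k = x.shape[-1]
    log_px_given_z = -0.5 * (k * np.log(2 * np.pi * sigma2) + sq_err / sigma2)
    return np.mean(np.exp(log_px_given_z))                  # E_{z~Q}[P(x|z)]
```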

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 8

VAE Objective Setup

D[Q(z) ‖ P(z|X)] = Ez∼Q[log Q(z) − log P(z|X)]
                 = Ez∼Q[log Q(z) − log P(X|z) − log P(z)] + log P(X)

Rearranging some terms:

log P(X) − D[Q(z) ‖ P(z|X)] = Ez∼Q[log P(X|z)] − D[Q(z) ‖ P(z)]

Introducing dependency of Q on X:

log P(X) − D[Q(z|X) ‖ P(z|X)] = Ez∼Q[log P(X|z)] − D[Q(z|X) ‖ P(z)]

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 9

Optimizing the RHS

  • Q is encoding X into z; P(X|z) is decoding z
  • Assume Q(z|X) in the LHS is a high-capacity NN
  • For: Ez∼Q[log P(X|z)] − D[Q(z|X) ‖ P(z)]
  • Assume: Q(z|X) = N(z | μ(X;θ), Σ(X;θ))
  • Then the KL divergence is:

D[N(μ(X), Σ(X)) ‖ N(0, I)] = ½ [ tr(Σ(X)) + μ(X)ᵀμ(X) − k − log det Σ(X) ]

  • In SGD, the objective becomes maximizing:

EX∼D[log P(X) − D[Q(z|X) ‖ P(z|X)]] = EX∼D[Ez∼Q[log P(X|z)] − D[Q(z|X) ‖ P(z)]]
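
The KL term above has a simple closed form to implement when Σ(X) is diagonal; the diagonal parameterization is an assumption here, though it is the standard choice in practice. A minimal numpy sketch:

```python
import numpy as np

# Closed-form KL term from the slide, D[N(mu, diag(sigma2)) || N(0, I)]:
#   D = 1/2 [ tr(Sigma) + mu^T mu - k - log det(Sigma) ]
# With Sigma = diag(sigma2), trace and log-det reduce to sums over dimensions.
def kl_to_standard_normal(mu, sigma2):
    k = mu.shape[-1]
    return 0.5 * (np.sum(sigma2, axis=-1)             # tr(Sigma)
                  + np.sum(mu ** 2, axis=-1)          # mu^T mu
                  - k                                 # dimensionality k
                  - np.sum(np.log(sigma2), axis=-1))  # log det(Sigma)

# Sanity check: the KL is zero when Q(z|X) equals the prior N(0, I)
print(kl_to_standard_normal(np.zeros(4), np.ones(4)))  # 0.0
```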

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 10

Moving the gradient inside the expectation

  • We need to compute the gradient of:

log P(X|z) − D[Q(z|X) ‖ P(z)]

  • The first term, log P(X|z), does not itself depend on the parameters of Q, but its expectation Ez∼Q[log P(X|z)] does!
  • So, we need to generate z that are plausible, i.e., decodable

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 11

The actual model that resists backpropagation

  • Cannot backpropagate through a stochastic unit

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 12

The actual model that resists backpropagation

  • EX∼D[Ee∼N(0,I)[log P(X | z = μ(X) + Σ^(1/2)(X)·e)] − D[Q(z|X) ‖ P(z)]]
  • Now we can BP end-to-end, because the expectations are not with respect to distributions dependent on the model

Reparameterization trick: e ∼ N(0, I) and z = μ(X) + Σ^(1/2)(X)·e
This works if Q(z|X) and P(z) are continuous
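
A minimal PyTorch sketch of the reparameterized forward pass (the layer sizes and the squared-error reconstruction term are illustrative assumptions): because z is now a deterministic function of X and the parameter-free noise e, backpropagation reaches the encoder.

```python
import torch

# Illustrative single-layer encoder/decoder; shapes are assumptions.
enc = torch.nn.Linear(784, 2 * 20)   # outputs [mu, log sigma^2], latent dim 20
dec = torch.nn.Linear(20, 784)

x = torch.rand(32, 784)
mu, log_var = enc(x).chunk(2, dim=-1)

e = torch.randn_like(mu)               # e ~ N(0, I), parameter-free noise
z = mu + torch.exp(0.5 * log_var) * e  # z = mu(X) + Sigma^(1/2)(X) * e

# Both loss terms are differentiable in the encoder/decoder parameters:
recon = torch.sum((dec(z) - x) ** 2, dim=-1)                       # proportional to -log P(X|z)
kl = 0.5 * torch.sum(log_var.exp() + mu**2 - 1 - log_var, dim=-1)  # D[Q(z|X) || P(z)]
loss = (recon + kl).mean()
loss.backward()                        # gradients flow into enc through z
```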

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 13

Test-time sampling is straightforward

  • The encoder pathway, including the multiplication and addition, is discarded
  • To estimate the likelihood of a test sample, generate z and then compute P(X|z)
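
Test-time generation in the same sketch, with a fresh layer standing in for the trained decoder: draw z from the prior and decode, with no encoder in the loop.

```python
import torch

# Sketch: sampling uses only the decoder. The encoder, and the mu/Sigma
# multiplication-and-addition used during training, are discarded.
dec = torch.nn.Linear(20, 784)       # the trained decoder would be loaded here

with torch.no_grad():
    z = torch.randn(16, 20)          # z ~ N(0, I), sampled from the prior
    x_new = torch.sigmoid(dec(z))    # 16 generated samples (sigmoid maps to pixel range)
```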

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 14

Conditional VAE

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 15

Sample results for an MNIST VAE

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 16

Sample results for an MNIST CVAE

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch