

  1. Advanced Machine Learning: Variational Auto-encoders. Amit Sethi, EE, IITB

  2. Objectives
  • Learn how VAEs help in sampling from a data distribution
  • Write the objective function of a VAE
  • Derive how the VAE objective is adapted for SGD

  3. VAE setup
  • We are interested in maximizing the data likelihood P(X) = ∫ P(X|z; θ) P(z) dz
  • Let P(X|z; θ) be modeled by f(z; θ)
  • Further, let us assume that P(X|z; θ) = N(X | f(z; θ), σ²I)
  Source: "Auto-Encoding Variational Bayes" by Kingma & Welling; "Tutorial on Variational Autoencoders" by Carl Doersch
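To make this setup concrete, here is a minimal numeric sketch (the toy linear decoder f, the dimensions, and σ² = 1 are illustrative assumptions, not the lecture's model): P(X|z; θ) is a Gaussian centered on f(z; θ), and P(X) is estimated by naive Monte Carlo over z ~ N(0, I).

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 2, 4                       # latent and data dimensions (assumed)
theta = rng.normal(size=(d, k))   # parameters of the toy decoder f

def f(z, theta):
    # Stand-in "decoder": a fixed linear map instead of a neural network
    return theta @ z

def log_gaussian(x, mean, sigma2):
    # log N(x | mean, sigma2 * I)
    return -0.5 * (x.size * np.log(2 * np.pi * sigma2)
                   + np.sum((x - mean) ** 2) / sigma2)

x = rng.normal(size=d)            # one observed data point

# P(X) = ∫ P(X|z) P(z) dz  ≈  (1/N) Σ_i P(X | z_i),  z_i ~ N(0, I)
zs = rng.normal(size=(10_000, k))
log_px_given_z = np.array([log_gaussian(x, f(z, theta), 1.0) for z in zs])
log_px = np.logaddexp.reduce(log_px_given_z) - np.log(len(zs))  # log-mean-exp
print("Monte Carlo estimate of log P(X):", log_px)
```

In realistic dimensions this naive estimator needs an enormous number of samples, because most z drawn from the prior explain X poorly; that is exactly the problem addressed from slide 7 onward.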

  4. We do not care about the distribution of z
  • The latent variable z is drawn from a standard normal: z ~ N(0, I)
  • It may represent many different variations of the data
  [Figure: graphical model with z → X, decoder parameters θ, and a plate over the N data points]

  5. Example of a variable transformation
  • X = g(z) = z/10 + z/‖z‖
  [Figure: samples of z ~ N(0, I) and the transformed samples X = g(z)]
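A short sketch of this transformation in 2-D (the sample count and the printed check are illustrative assumptions): applying g to z ~ N(0, I) concentrates X near a ring of radius about 1, showing how a simple latent distribution can be mapped to a structured X.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 2))                              # z ~ N(0, I) in 2-D
X = z / 10 + z / np.linalg.norm(z, axis=1, keepdims=True)   # X = g(z)

# ‖X‖ concentrates near 1: the Gaussian has been pushed onto a noisy ring
print(np.percentile(np.linalg.norm(X, axis=1), [5, 50, 95]))
```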

  6. Because of the Gaussian assumption, the most obvious variation may not be the most likely
  • Although the '2' on the right is a better choice as a variation of the one on the left, the one in the middle is more likely under the Gaussian likelihood
  [Figure: an original MNIST '2' (left), a slightly corrupted copy (middle), and a shifted copy (right)]
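A tiny numeric illustration of why (the 5×5 toy "digit" is an assumption for illustration): under a pixel-wise Gaussian likelihood, −log P(X|z) grows with squared pixel distance, so a one-pixel shift of a stroke costs more than deleting part of it, even though the shift is perceptually harmless.

```python
import numpy as np

digit = np.zeros((5, 5))
digit[1:4, 2] = 1.0                      # toy "digit": a vertical stroke

shifted = np.roll(digit, 1, axis=1)      # same stroke, shifted 1 pixel right
corrupted = digit.copy()
corrupted[2, 2] = 0.0                    # stroke with its middle deleted

# Squared error ∝ −log of an isotropic Gaussian likelihood (up to constants)
print(np.sum((digit - shifted) ** 2))    # 6.0: "good" variation, low likelihood
print(np.sum((digit - corrupted) ** 2))  # 1.0: "bad" variation, high likelihood
```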

  7. Sampling z from the standard normal is problematic
  • It may give samples of z that are unlikely to have produced X
  • Can we sample z itself intelligently?
  • Enter Q(z|X), so that we can compute, e.g., E z∼Q P(X|z)
  • All we need is to relate E z∼Q P(X|z) to P(X), which the KL divergence between Q(z) and P(z|X) provides
  • Hence, a variational method

  8. VAE Objective Setup
  D[Q(z) ‖ P(z|X)] = E z∼Q [log Q(z) − log P(z|X)]
  Applying Bayes' rule to P(z|X):
  = E z∼Q [log Q(z) − log P(X|z) − log P(z)] + log P(X)
  Rearranging some terms:
  log P(X) − D[Q(z) ‖ P(z|X)] = E z∼Q [log P(X|z)] − D[Q(z) ‖ P(z)]
  Introducing dependency of Q on X:
  log P(X) − D[Q(z|X) ‖ P(z|X)] = E z∼Q [log P(X|z)] − D[Q(z|X) ‖ P(z)]

  9. Optimizing the RHS
  • Q is encoding X into z; P(X|z) is decoding z
  • Assume Q(z|X) on the LHS is a high-capacity neural network
  • Maximize: E z∼Q [log P(X|z)] − D[Q(z|X) ‖ P(z)]
  • Assume: Q(z|X) = N(z | μ(X; θ), Σ(X; θ))
  • Then the KL divergence has a closed form: D[N(μ(X), Σ(X)) ‖ N(0, I)] = 1/2 [tr(Σ(X)) + μ(X)ᵀμ(X) − k − log det(Σ(X))]
  • In SGD, the objective becomes maximizing: E X∼D [log P(X) − D[Q(z|X) ‖ P(z|X)]] = E X∼D [E z∼Q [log P(X|z)] − D[Q(z|X) ‖ P(z)]]
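A sketch of the closed-form KL term for the common diagonal-covariance parameterization (the function and variable names are assumptions):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # D[ N(mu, diag(exp(log_var))) ‖ N(0, I) ]
    #   = 1/2 [ tr(Σ) + muᵀmu − k − log det(Σ) ]
    # With Σ diagonal: tr(Σ) = Σ exp(log_var), log det(Σ) = Σ log_var
    k = mu.size
    return 0.5 * (np.sum(np.exp(log_var)) + mu @ mu - k - np.sum(log_var))

# Sanity check: the KL vanishes when Q(z|X) equals the prior N(0, I)
print(kl_to_standard_normal(np.zeros(3), np.zeros(3)))   # 0.0
print(kl_to_standard_normal(np.ones(3), np.zeros(3)))    # 1.5 = ½·muᵀmu
```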

  10. Moving the gradient inside the expectation
  • We need to compute the gradient of: log P(X|z) − D[Q(z|X) ‖ P(z)]
  • For a fixed z, the first term does not depend on the parameters of Q, but E z∼Q [log P(X|z)] does!
  • So, we need to generate z that are plausible, i.e. decodable

  11. The actual model that resists backpropagation
  • Cannot backpropagate through a stochastic unit

  12. The actual model that resists backpropagation
  • Reparameterization trick: e ∼ N(0, I) and z = μ(X) + Σ^(1/2)(X) ∗ e
  • This works if Q(z|X) and P(z) are continuous
  • Objective: E X∼D [E e∼N(0,I) [log P(X | z = μ(X) + Σ^(1/2)(X) ∗ e)] − D[Q(z|X) ‖ P(z)]]
  • Now we can BP end-to-end, because the expectations are no longer taken with respect to distributions that depend on the model parameters
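A minimal PyTorch sketch of the trick (the linear encoder/decoder, the dimensions, and σ² = 1 in the reconstruction term are assumptions): the noise e is sampled outside the computation graph, so z is a deterministic, differentiable function of μ(X) and Σ(X), and gradients reach the encoder.

```python
import torch

torch.manual_seed(0)
d, k = 8, 2                              # data / latent dimensions (assumed)
enc = torch.nn.Linear(d, 2 * k)          # toy encoder: outputs [mu, log_var]
dec = torch.nn.Linear(k, d)              # toy decoder: mean of P(X|z)
x = torch.randn(4, d)                    # a toy minibatch

mu, log_var = enc(x).chunk(2, dim=1)
e = torch.randn_like(mu)                 # e ~ N(0, I): parameter-free noise
z = mu + torch.exp(0.5 * log_var) * e    # z = mu(X) + Σ^(1/2)(X) * e

recon = dec(z)
rec = 0.5 * ((recon - x) ** 2).sum(dim=1)   # -log P(X|z) + const, with σ² = 1
kl = 0.5 * (log_var.exp().sum(1) + (mu ** 2).sum(1) - k - log_var.sum(1))
loss = (rec + kl).mean()                 # negative of the objective above
loss.backward()
print(enc.weight.grad is not None)       # True: gradients flow end-to-end
```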

  13. Test-time sampling is straightforward
  • The encoder pathway, including the reparameterization multiplication and addition, is discarded
  • Sample z ~ N(0, I) and pass it through the decoder
  • To estimate the likelihood of a test sample X, generate samples of z and average P(X|z)
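Continuing the toy PyTorch sketch above (a sketch, with the same assumed dimensions; a trained decoder would be loaded in place of the fresh one):

```python
import torch

k, d = 2, 8
dec = torch.nn.Linear(k, d)      # stands in for a trained decoder

with torch.no_grad():
    z = torch.randn(16, k)       # sample the prior directly; no encoder needed
    samples = dec(z)             # decoder output is the generated X (its mean)
print(samples.shape)             # torch.Size([16, 8])
```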

  14. Conditional VAE
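The slide shows only a figure; as a hedged sketch of the standard construction, a conditional VAE feeds the condition c (here an assumed one-hot class label) into both networks, so they learn Q(z|X, c) and P(X|z, c):

```python
import torch

n_classes, d, k = 10, 8, 2               # assumed dimensions
enc_c = torch.nn.Linear(d + n_classes, 2 * k)   # encoder sees (X, c)
dec_c = torch.nn.Linear(k + n_classes, d)       # decoder sees (z, c)

x = torch.randn(4, d)
c = torch.nn.functional.one_hot(torch.randint(0, n_classes, (4,)),
                                n_classes).float()

mu, log_var = enc_c(torch.cat([x, c], dim=1)).chunk(2, dim=1)
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
recon = dec_c(torch.cat([z, c], dim=1))  # reconstruction conditioned on c
```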

  15. Sample results for an MNIST VAE

  16. Sample results for an MNIST CVAE
