

SLIDE 1

Advanced Machine Learning Variational Auto-encoders

Amit Sethi, EE, IITB

SLIDE 2

Objectives

  • Learn how VAEs help in sampling from a data distribution
  • Write the objective function of a VAE
  • Derive how the VAE objective is adapted for SGD
SLIDE 3

VAE setup

  • We are interested in maximizing the data likelihood

P(X) = ∫ P(X|z; θ) P(z) dz

  • Let P(X|z; θ) be modeled by f(z; θ)
  • Further, let us assume that

P(X|z; θ) = N(X | f(z; θ), σ²I)
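
As a concrete (if naive) illustration of this integral, here is a minimal numpy sketch that estimates P(X) by Monte Carlo: sample z from the prior and average the Gaussian decoder density. The decoder f and the noise level σ² are placeholder assumptions, not anything specified on the slides.

```python
import numpy as np

def log_gaussian(x, mean, sigma2):
    """Log density of an isotropic Gaussian N(mean, sigma2 * I) at x."""
    k = x.shape[-1]
    return -0.5 * (k * np.log(2 * np.pi * sigma2)
                   + np.sum((x - mean) ** 2, axis=-1) / sigma2)

def mc_log_likelihood(x, f, sigma2=0.1, latent_dim=2, n_samples=10000):
    """Estimate log P(x) = log ∫ P(x|z) P(z) dz by averaging over z ~ N(0, I).

    f is a hypothetical decoder mapping (n_samples, latent_dim) -> (n_samples, k).
    """
    z = np.random.randn(n_samples, latent_dim)      # sample from the prior
    log_px_given_z = log_gaussian(x, f(z), sigma2)  # decode and score each z
    m = log_px_given_z.max()                        # log-mean-exp for stability
    return m + np.log(np.mean(np.exp(log_px_given_z - m)))
```

In high dimensions almost all prior samples contribute essentially nothing to this average, which is what motivates the encoder Q(z|X) introduced in the later slides.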

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 4

We do not care about distribution of z

  • Latent variable z is drawn from a standard normal
  • It may represent many different variations of the data

[Figure: graphical model in which z ~ N(0, I) is decoded, through parameters θ, into X]

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 5

Example of a variable transformation

X = g(z) = z/10 + z/‖z‖
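
To see what this transformation does, here is a short numpy sketch (the sample count is arbitrary): points drawn from a 2-D standard normal get pushed onto a noisy ring of radius about 1, a distinctly non-Gaussian X produced from Gaussian z.

```python
import numpy as np

# Sketch of the transformation g(z) = z/10 + z/||z|| from the slide.
# A simple deterministic map turns N(0, I) into a ring-shaped distribution.
z = np.random.randn(5000, 2)
norms = np.linalg.norm(z, axis=1, keepdims=True)
x = z / 10 + z / norms

print(np.mean(np.linalg.norm(x, axis=1)))  # close to 1: samples cluster near the unit circle
```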

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 6

Because of the Gaussian assumption, the most obvious variation may not be the most likely

  • Although the ‘2’ on the right is a better choice as a variation of the one on the left, the one in the middle is more likely due to the Gaussian assumption

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 7

Sampling z from standard normal is problematic

  • It may give samples of z that are unlikely to have produced X
  • Can we sample z itself intelligently?
  • Enter Q(z|X) to compute, e.g., Ez∼Q[P(X|z)]
  • All we need to do is reduce the KL divergence between P(X) and Ez∼Q[P(X|z)]

  • Hence, a variational method
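
A sketch of what sampling z "intelligently" buys us, with hypothetical encoder functions q_mu and q_sigma and decoder f standing in for real networks: drawing z from Q(z|X) concentrates samples where z could plausibly have produced X, so far fewer samples are wasted than when sampling from the prior.

```python
import numpy as np

def expected_likelihood(x, q_mu, q_sigma, f, sigma2=0.1, n_samples=100):
    """Monte Carlo estimate of E_{z~Q}[P(x|z)], with Q(z|x) = N(mu(x), sd(x)^2 I).

    q_mu, q_sigma, and f are hypothetical encoder/decoder stand-ins.
    """
    mu, sd = q_mu(x), q_sigma(x)                            # encoder output for this x
    z = mu + sd * np.random.randn(n_samples, mu.shape[-1])  # z ~ Q(z|x)
    sq_err = np.sum((x - f(z)) ** 2, axis=-1)               # decoder reconstruction error
    k = x.shape[-1]
    log_px_given_z = -0.5 * (k * np.log(2 * np.pi * sigma2) + sq_err / sigma2)
    return np.mean(np.exp(log_px_given_z))                  # E_{z~Q}[P(x|z)]
```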

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 8

VAE Objective Setup

D[Q(z) ‖ P(z|X)] = Ez∼Q[log Q(z) − log P(z|X)]
                 = Ez∼Q[log Q(z) − log P(X|z) − log P(z)] + log P(X)

Rearranging some terms:

log P(X) − D[Q(z) ‖ P(z|X)] = Ez∼Q[log P(X|z)] − D[Q(z) ‖ P(z)]

Introducing dependency of Q on X:

log P(X) − D[Q(z|X) ‖ P(z|X)] = Ez∼Q[log P(X|z)] − D[Q(z|X) ‖ P(z)]

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 9

Optimizing the RHS

  • Q is encoding X into z; P(X|z) is decoding z
  • Assume Q(z|X) in the LHS is a high-capacity NN
  • For: Ez∼Q[log P(X|z)] − D[Q(z|X) ‖ P(z)]
  • Assume: Q(z|X) = N(z | μ(X;θ), Σ(X;θ))
  • Then the KL divergence is:

D[N(μ(X), Σ(X)) ‖ N(0, I)] = ½ [ tr(Σ(X)) + μ(X)ᵀμ(X) − k − log det Σ(X) ]

  • In SGD, the objective becomes maximizing:

EX∼D[log P(X) − D[Q(z|X) ‖ P(z|X)]] = EX∼D[Ez∼Q[log P(X|z)] − D[Q(z|X) ‖ P(z)]]
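
The KL term above has a simple closed form to implement when Σ(X) is diagonal; the diagonal parameterization is an assumption here, though it is the standard choice in practice. A minimal numpy sketch:

```python
import numpy as np

# Closed-form KL term from the slide, D[N(mu, diag(sigma2)) || N(0, I)]:
#   D = 1/2 [ tr(Sigma) + mu^T mu - k - log det(Sigma) ]
# With Sigma = diag(sigma2), trace and log-det reduce to sums over dimensions.
def kl_to_standard_normal(mu, sigma2):
    k = mu.shape[-1]
    return 0.5 * (np.sum(sigma2, axis=-1)             # tr(Sigma)
                  + np.sum(mu ** 2, axis=-1)          # mu^T mu
                  - k                                 # dimensionality k
                  - np.sum(np.log(sigma2), axis=-1))  # log det(Sigma)

# Sanity check: the KL is zero when Q(z|X) equals the prior N(0, I)
print(kl_to_standard_normal(np.zeros(4), np.ones(4)))  # 0.0
```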

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 10

Moving the gradient inside the expectation

  • We need to compute the gradient of:

log P(X|z) − D[Q(z|X) ‖ P(z)]

  • The first term, log P(X|z), does not itself depend on the parameters of Q, but its expectation Ez∼Q[log P(X|z)] does!
  • So, we need to generate z that are plausible, i.e., decodable

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 11

The actual model that resists backpropagation

  • Cannot backpropagate through a stochastic unit

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 12

The actual model that resists backpropagation

  • EX∼D[Ee∼N(0,I)[log P(X | z = μ(X) + Σ^(1/2)(X)·e)] − D[Q(z|X) ‖ P(z)]]
  • Now we can BP end-to-end, because the expectations are not with respect to distributions dependent on the model

Reparameterization trick: e ∼ N(0, I) and z = μ(X) + Σ^(1/2)(X)·e
This works if Q(z|X) and P(z) are continuous
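
A minimal PyTorch sketch of the reparameterized forward pass (the layer sizes and the squared-error reconstruction term are illustrative assumptions): because z is now a deterministic function of X and the parameter-free noise e, backpropagation reaches the encoder.

```python
import torch

# Illustrative single-layer encoder/decoder; shapes are assumptions.
enc = torch.nn.Linear(784, 2 * 20)   # outputs [mu, log sigma^2], latent dim 20
dec = torch.nn.Linear(20, 784)

x = torch.rand(32, 784)
mu, log_var = enc(x).chunk(2, dim=-1)

e = torch.randn_like(mu)               # e ~ N(0, I), parameter-free noise
z = mu + torch.exp(0.5 * log_var) * e  # z = mu(X) + Sigma^(1/2)(X) * e

# Both loss terms are differentiable in the encoder/decoder parameters:
recon = torch.sum((dec(z) - x) ** 2, dim=-1)                       # proportional to -log P(X|z)
kl = 0.5 * torch.sum(log_var.exp() + mu**2 - 1 - log_var, dim=-1)  # D[Q(z|X) || P(z)]
loss = (recon + kl).mean()
loss.backward()                        # gradients flow into enc through z
```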

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 13

Test-time sampling is straightforward

  • The encoder pathway, including the multiplication and addition, is discarded
  • To estimate the likelihood of a test sample, generate z and then compute P(X|z)
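
Test-time generation in the same sketch, with a fresh layer standing in for the trained decoder: draw z from the prior and decode, with no encoder in the loop.

```python
import torch

# Sketch: sampling uses only the decoder. The encoder, and the mu/Sigma
# multiplication-and-addition used during training, are discarded.
dec = torch.nn.Linear(20, 784)       # the trained decoder would be loaded here

with torch.no_grad():
    z = torch.randn(16, 20)          # z ~ N(0, I), sampled from the prior
    x_new = torch.sigmoid(dec(z))    # 16 generated samples (sigmoid maps to pixel range)
```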

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 14

Conditional VAE

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 15

Sample results for an MNIST VAE

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch

SLIDE 16

Sample results for an MNIST CVAE

Source: VAEs by Kingma, Welling, et al.; “Tutorial on Variational Autoencoders” by Carl Doersch