
SLIDE 1

About generative aspects of Variational Autoencoders

LOD'19 - The Fifth International Conference on Machine Learning, Optimization, and Data Science
September 10-13, 2019, Certosa di Pontignano, Siena, Tuscany, Italy

Andrea Asperti
DISI - Department of Informatics: Science and Engineering
University of Bologna
Mura Anteo Zamboni 7, 40127, Bologna, ITALY
andrea.asperti@unibo.it

SLIDE 2

Generative Models

Generative models are meant to learn rich data distributions, allowing the sampling of new data. There are two main classes of generative models:

  • Generative Adversarial Networks (GANs)
  • Variational Autoencoders (VAEs)

At the current state of the art, GANs give better results. What is the problem with VAEs?

SLIDE 3

Deterministic autoencoder

An autoencoder is a network trained to reconstruct its input data from a learned internal representation (e.g., by minimizing the quadratic distance between input and reconstruction)

[Figure: Encoder DNN → Latent Space → Decoder DNN]
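As an illustration, a minimal sketch of such a network (not code from the talk; the PyTorch framework, layer sizes, and flattened 28x28 inputs are assumptions):

    # Minimal deterministic autoencoder trained with quadratic loss.
    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, input_dim=784, latent_dim=16):
            super().__init__()
            # Encoder DNN: input -> learned internal representation
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 256), nn.ReLU(),
                nn.Linear(256, latent_dim))
            # Decoder DNN: internal representation -> reconstruction
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 256), nn.ReLU(),
                nn.Linear(256, input_dim), nn.Sigmoid())

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = Autoencoder()
    x = torch.rand(32, 784)           # dummy batch of flattened images
    loss = nn.MSELoss()(model(x), x)  # quadratic reconstruction distance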

SLIDE 4

Deterministic autoencoder

An autoencoder is a network trained to reconstruct its input data from a learned internal representation (e.g., by minimizing the quadratic distance between input and reconstruction)

[Figure: Encoder DNN → Latent Space → Decoder DNN]

Can we use the decoder to generate data by sampling in the latent space?

SLIDE 5

Deterministic autoencoder

An autoencoder is a network trained to reconstruct its input data from a learned internal representation (e.g., by minimizing the quadratic distance between input and reconstruction)

[Figure: Encoder DNN → Latent Space → Decoder DNN]

Can we use the decoder to generate data by sampling in the latent space? No, since we do not know the distribution of latent variables.

SLIDE 6

Variational autoencoder

In a Variational Autoencoder (VAE) [9, 10, 6] we try to force latent variables to have a known distribution (e.g. a Normal distribution)

[Figure: Encoder DNN → latent space with z ∼ N(0,1) → Decoder DNN]
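If the latent variables really follow the prior, generation reduces to sampling z and decoding it; a minimal sketch (the decoder interface is an assumption):

    import torch

    def generate(decoder, n_samples=16, latent_dim=16):
        z = torch.randn(n_samples, latent_dim)  # z ~ N(0, 1)
        return decoder(z)                       # decoded new samples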

SLIDE 7

Variational autoencoder

In a Variational Autoencoder (VAE) [9, 10, 6] we try to force latent variables to have a known distribution (e.g. a Normal distribution)

[Figure: Encoder DNN → latent space with z ∼ N(0,1) → Decoder DNN]

How can we do it? Is this actually working?

SLIDE 8

The encoding distribution Q(z|X)

[Figure: a data point X₁ is mapped by the encoder to its encoding distribution Q(z|X₁) in the latent space]

SLIDE 9

Estimate relevant statistics for Q(z|X)

[Figure: the encoding distribution Q(z|X₁) of a data point X₁ in the latent space]

SLIDE 10

Estimate relevant statistics for Q(z|X)

[Figure: the encoder maps X₁ to the statistics µ(X₁) and σ(X₁), defining the Gaussian encoding distribution Q(z|X₁) = G(µ(X₁), σ(X₁))]

SLIDE 11

Estimate relevant statistics for Q(z|X)

[Figure: each data point gets its own Gaussian in the latent space: Q(z|X₁) = G(µ(X₁), σ(X₁)) and Q(z|X₂) = G(µ(X₂), σ(X₂))]

SLIDE 12

Estimate relevant statistics for Q(z|X)

[Figure: the encoding distributions of two data points X₁ and X₂ in the latent space]

We estimate the variance σ(X) around µ(X) by Gaussian sampling at training time.
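This sampling step is usually implemented with the reparameterization trick; a sketch (the log-variance parameterization is a common convention, not something prescribed by the slides):

    import torch

    def sample_latent(mu, log_var):
        # sigma(X) recovered from log(sigma^2(X)); exp keeps it positive
        sigma = torch.exp(0.5 * log_var)
        eps = torch.randn_like(sigma)  # eps ~ N(0, 1)
        return mu + sigma * eps        # z ~ G(mu(X), sigma(X))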

SLIDE 13

Kullback-Leibler regularization

[Figure: the encoding distributions of X₁ and X₂ in the latent space, compared with the prior N(0,1)]

Minimize the Kullback-Leibler divergence between each Q(z|X) and the standard normal distribution: KL(Q(z|X) || N(0,1))
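For Gaussians this divergence has the standard closed form KL(G(µ, σ²) || N(0,1)) = ½ (σ² + µ² − log(σ²) − 1) per latent variable; a sketch of the corresponding loss term (tensor shapes are an assumption):

    import torch

    def kl_to_standard_normal(mu, log_var):
        # 1/2 * (sigma^2 + mu^2 - log(sigma^2) - 1), summed over latent dims
        return 0.5 * torch.sum(log_var.exp() + mu ** 2 - log_var - 1, dim=-1)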

SLIDE 14

The marginal posterior

[Figure: the encoding distributions of X₁ and X₂ in the latent space, compared with the prior N(0,1)]

The actual distribution of latent variables is the marginal (aka aggregate) distribution Q(z), hopefully resembling the prior P(z) = N(0,1):

    Q(z) = E_{X∼P(X)} [ Q(z|X) ] ≈ N(0,1)

SLIDE 15

MNIST case

[Figure: arrangement in the latent space of 100 MNIST digits after 10 epochs of training]

It does indeed have a Gaussian shape... Why?

SLIDE 16

Why is KL-divergence working?

Many different answers... and a relatively complex theory. In this work, we investigate the marginal posterior distribution as a Gaussian Mixture Model (GMM), with one Gaussian for each data point.
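Under this view, Q(z) is a uniform mixture with one Gaussian component per training point: to sample from it, pick a random data point and then sample from its encoding distribution. A sketch (the tensor layout is an assumption):

    import torch

    def sample_marginal_posterior(mu_all, sigma_all, n_samples=1):
        # mu_all, sigma_all: (N, latent_dim) encoder statistics on the dataset
        idx = torch.randint(0, mu_all.shape[0], (n_samples,))  # pick components
        eps = torch.randn(n_samples, mu_all.shape[1])
        return mu_all[idx] + sigma_all[idx] * eps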

SLIDE 17

The normalization idea

  • For a neural network, it is relatively easy to perform an affine transformation of the latent space.
  • The transformation can be compensated in the next layer of the network, keeping the loss invariant (the same idea behind batch-normalization layers).
  • This means we may assume the network is able to keep a fixed ratio ρ between the variance and the mean value of each latent variable.

SLIDE 18

Pushing ρ in KL-divergence

Pushing ρ into the closed form of the KL-divergence (i.e., substituting µ²(X) = σ²(X)/ρ²), we get the expression

    ½ ( σ²(X)·(1 + ρ²)/ρ² − log(σ²(X)) − 1 )

which has a minimum when σ²(X) + µ²(X) = 1
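The minimum can be verified symbolically; a small check with sympy (where s stands for σ²(X)):

    import sympy as sp

    s, rho = sp.symbols('s rho', positive=True)   # s = sigma^2(X)
    kl = sp.Rational(1, 2) * (s * (1 + rho**2) / rho**2 - sp.log(s) - 1)
    s_min = sp.solve(sp.diff(kl, s), s)[0]        # sigma^2(X) = rho^2/(1 + rho^2)
    mu2 = s_min / rho**2                          # mu^2(X) = 1/(1 + rho^2)
    print(sp.simplify(s_min + mu2))               # prints 1: sigma^2 + mu^2 = 1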

SLIDE 19

Corollaries

  • The variance law: averaging over all X, we expect that for each latent variable z

        σ²_z(X) + σ²_z = 1

    (supposing µ_z(X) = 0); an empirical check is sketched below.
  • By effect of the KL divergence, the first two moments of the distribution of each latent variable should agree with those of a normal N(0, 1) distribution.
  • What about the other moments? Hard to guess.
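A sketch of that empirical check on a trained encoder (the function name and tensor shapes are assumptions): the average encoder variance plus the variance of the encoder means should be close to 1 for each latent variable.

    import torch

    def check_variance_law(mu_all, log_var_all):
        # mu_all, log_var_all: (N, latent_dim) encoder outputs on the dataset
        avg_enc_var = log_var_all.exp().mean(dim=0)  # mean of sigma^2_z(X) over X
        var_of_means = mu_all.var(dim=0)             # variance sigma^2_z of mu_z(X)
        return avg_enc_var + var_of_means            # should be close to 1 per z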

SLIDE 20

Conclusion

For several years, the mediocre performance of VAEs has been attributed to the so-called overpruning phenomenon [2, 11, 12]. Recent research suggests that the problem is instead due to the mismatch between the latent distribution and the normal prior [4, 5, 1, 7]. Our contribution: we may reasonably expect the KL-divergence to force the first two moments of the distribution to agree with those of a normal distribution, but we can hardly presume the same for the other moments.

SLIDE 21

Essential bibliography (1)

[1] Andrea Asperti. Variational Autoencoders and the Variable Collapse Phenomenon. Sensors & Transducers, 234(3):1-8, 2018.
[2] Yuri Burda, Roger B. Grosse, and Ruslan Salakhutdinov. Importance weighted autoencoders. CoRR, abs/1509.00519, 2015.
[3] Christopher P. Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. Understanding disentangling in β-VAE. 2018.
[4] Bin Dai, Yu Wang, John Aston, Gang Hua, and David P. Wipf. Connections with robust PCA and the role of emergent sparsity in variational autoencoder models. Journal of Machine Learning Research, 19, 2018.
[5] Bin Dai and David P. Wipf. Diagnosing and enhancing VAE models. In Seventh International Conference on Learning Representations (ICLR 2019), May 6-9, New Orleans, 2019.
[6] Carl Doersch. Tutorial on variational autoencoders. CoRR, abs/1606.05908, 2016.

SLIDE 22

Essential bibliography (2)

[7] Partha Ghosh, Mehdi S. M. Sajjadi, Antonio Vergari, Michael J. Black, and Bernhard Schölkopf. From Variational to Deterministic Autoencoders. CoRR, abs/1903.12436, 2019.
[8] Diederik P. Kingma, Tim Salimans, and Max Welling. Improving variational inference with inverse autoregressive flow. CoRR, abs/1606.04934, 2016.
[9] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. CoRR, abs/1312.6114, 2013.
[10] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning (ICML 2014), Beijing, China, 21-26 June 2014, volume 32 of JMLR Workshop and Conference Proceedings, pages 1278-1286. JMLR.org, 2014.
[11] Serena Yeung, Anitha Kannan, and Yann Dauphin. Epitomic variational autoencoder. 2017.
[12] Serena Yeung, Anitha Kannan, Yann Dauphin, and Li Fei-Fei. Tackling over-pruning in variational autoencoders. CoRR, abs/1706.03643, 2017.