SLIDE 1
TensorFlow Workshop 2018
Introduction to Deep Models, Part II: Variational Autoencoders and Latent Spaces
Nick Winovich, Department of Mathematics, Purdue University
SIAM@Purdue, July 2018
SLIDE 2
SLIDE 3
Outline
1. Variational Autoencoders: Autoencoder Models; Variational Autoencoders; Reparameterization Trick
2. Latent Representations: Bayesian Framework; Kullback–Leibler Divergence; Latent Space Traversal
SIAM@Purdue 2018 - Nick Winovich Introduction to Deep Models : Part II
SLIDE 5
Feature Extraction with Autoencoders
As discussed in Part I, the process of manually defining features is typically infeasible for complex datasets. The hidden layers of neural networks naturally define features to a certain degree; however, we may wish to find a collection of features which completely characterizes a given example.

To be precise, we must first clarify what it means to “completely characterize” an example. A simple, but natural, way to define this concept is to say that a set of features characterizes an example if the full example can be reproduced from those features alone.

Although it may sound rather trivial at first, this leads to a natural approach for automating feature extraction: train a neural network to learn the identity mapping, and introduce a bottleneck layer to force a reduction in the data/feature dimensions.
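The bottleneck idea can be sketched as follows; this is a minimal NumPy illustration with hypothetical dimensions (784-dimensional inputs, a 32-dimensional bottleneck) and single linear layers standing in for the encoder and decoder, not a full training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 784-dimensional inputs (e.g. flattened 28x28
# images) squeezed through a 32-dimensional bottleneck.
input_dim, code_dim = 784, 32

# One linear layer each for the encoder and decoder; a real autoencoder
# would stack several nonlinear layers on each side.
W_enc = rng.normal(0.0, 0.01, size=(input_dim, code_dim))
W_dec = rng.normal(0.0, 0.01, size=(code_dim, input_dim))

def encode(x):
    return x @ W_enc          # features at the bottleneck

def decode(z):
    return z @ W_dec          # attempted reconstruction of the input

x = rng.random(input_dim)     # stand-in for a data example
z = encode(x)                 # compressed representation
x_hat = decode(z)             # reconstruction

# Training would minimize a reconstruction error, e.g. mean squared error:
mse = np.mean((x - x_hat) ** 2)
print(z.shape, x_hat.shape)   # (32,) (784,)
```

Because the code dimension (32) is far smaller than the input dimension (784), the network can only reproduce the input well if the bottleneck features capture the essential structure of the data.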
SLIDE 6
Autoencoder Model
[Figure: autoencoder architecture — Input → Hidden → Encoded → Hidden → Reconstructed]
SLIDE 8
Auto-Encoding Variational Bayes
Kingma, D.P. and Welling, M., 2013. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114.

A particularly effective autoencoding model, introduced by Kingma and Welling in 2013, is the variational autoencoder (VAE). The VAE model is defined in terms of a probabilistic, Bayesian framework. In this framework, the features at the bottleneck of the network are interpreted as unobservable latent variables. To approximate the underlying Bayesian model, VAE networks introduce a sampling procedure in the latent variable space.
SLIDE 9
Variational Autoencoder
[Figure: variational autoencoder — Input → Hidden → Latent (with noise ε) → Hidden → Reconstructed]
SLIDE 10
Variational Autoencoder Graph [TensorBoard]
SLIDE 12
Sampling Procedure
The encoder produces means {µk} and standard deviations {σk} corresponding to a collection of independent normal distributions for the latent variables. A vector ε is sampled from a normal distribution N(0, I) and the sample latent vector is defined by:
z = µ + σ ⊙ ε
The introduction of the standard normal sample ε, referred to as the “reparameterization trick”, is used to maintain a differentiable relation between the weights of the network and the loss function (since the sample ε is fixed at each step). This allows us to train the network end-to-end using the backpropagation method.
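The sampling step above can be sketched in NumPy; the latent dimension and the values of µ and σ below are hypothetical, chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for a 4-dimensional latent space.
mu = np.array([0.5, -1.0, 0.0, 2.0])       # means
sigma = np.array([1.0, 0.5, 2.0, 0.1])     # standard deviations

# Reparameterization trick: sample eps ~ N(0, I) once, then form z
# deterministically from (mu, sigma, eps).  Gradients with respect to
# mu and sigma flow through this expression, since eps is held fixed.
eps = rng.standard_normal(mu.shape)
z = mu + sigma * eps                       # elementwise (Hadamard) product

# With eps = 0 the sample collapses to the mean:
assert np.allclose(mu + sigma * 0.0, mu)
```

The key point is that all randomness is isolated in ε, so z is a smooth function of the network outputs µ and σ.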
SLIDE 13
Sampling Procedure
Practical Implementation
In practice, it is not numerically stable to work with the standard deviations {σk} directly; instead, the network is trained to predict the values {log(σk)} and the latent vector is sampled via:
z = µ + exp(log σ) ⊙ ε
This has the additional benefit of removing the restriction that the network predictions for {σk} must always be positive.
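A brief sketch of this parameterization (with hypothetical values for the unconstrained log σ predictions):

```python
import numpy as np

rng = np.random.default_rng(0)

# The network is free to output any real value for log(sigma);
# exponentiation recovers a strictly positive standard deviation.
mu = np.array([0.0, 1.0, -0.5])
log_sigma = np.array([-3.0, 0.0, 2.0])     # unconstrained predictions

eps = rng.standard_normal(mu.shape)
z = mu + np.exp(log_sigma) * eps

# exp(log_sigma) is positive even for negative predictions:
assert np.all(np.exp(log_sigma) > 0)
```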
SLIDE 16
Variational Bayesian Model
The VAE framework aims to approximate the intractable posterior distribution pθ(z|x) in the latent space by a recognition model:

qφ(z | x) ∼ distribution of z given x

where φ denotes the model parameters of the encoder component of the network, and θ denotes the parameters of the network’s decoder, which is used to define a generative model:

pθ(x | z) ∼ distribution of x given z
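The two roles can be sketched concretely; the layer sizes and the linear maps below are hypothetical stand-ins for the trained encoder and decoder networks, not the actual VAE architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
x_dim, z_dim = 8, 2                        # hypothetical sizes

# Stand-ins for the trained encoder (phi) and decoder (theta) weights.
W_mu = rng.normal(size=(x_dim, z_dim))
W_ls = rng.normal(size=(x_dim, z_dim))
W_dec = rng.normal(size=(z_dim, x_dim))

def recognition(x):
    """q_phi(z|x): parameters of a diagonal Gaussian over z."""
    return x @ W_mu, x @ W_ls              # (mu, log_sigma)

def generative(z):
    """p_theta(x|z): here, Bernoulli means for each coordinate of x."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec)))   # sigmoid outputs in (0, 1)

x = rng.random(x_dim)
mu, log_sigma = recognition(x)
z = mu + np.exp(log_sigma) * rng.standard_normal(z_dim)
x_hat = generative(z)
```

The recognition model outputs distribution parameters rather than a single code, and the generative model maps a latent sample back to a distribution over the data.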
SLIDE 17
Variational Bayesian Model
SLIDE 19
Motivation for Kullback–Leibler Divergence
When using an encoder/decoder model structure, it is helpful to anchor the input values received by the decoder during training (similar to the motivation for batch normalization). For example, the encoder component may learn to produce latent representations distributed according to a normal distribution N(µ, Σ) for some mean vector µ and covariance matrix Σ. However, this latent distribution can be shifted arbitrarily without affecting the theoretically attainable performance of the network; in particular, there are infinitely many model configurations which can achieve the optimal level of performance.

The lack of a unique solution can be problematic during training; to address this, we can attempt to bias the encoder toward the distribution N(0, I).
SLIDE 20
Kullback–Leibler Divergence
The KL-divergence is introduced to the loss as a regularization term; assuming the prior is taken to be a standard normal N(0, I):

KL( N(µ, Σ) ‖ N(0, I) ) = ½ [ tr(Σ) + µᵀµ − N − log det(Σ) ]

Model accuracy is accounted for by the “reconstruction loss” term:

E_{qφ(z|x)}[ log pθ(x | z) ]

The full loss function is then defined to be the negative Evidence Lower Bound (ELBO), which, after some manipulation, is given by:

−ELBO(θ, φ) = KL( qφ(z | x) ‖ N(0, I) ) − E_{qφ(z|x)}[ log pθ(x | z) ]
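The closed-form Gaussian KL term can be checked numerically; this sketch (NumPy, hypothetical dimension N = 3) evaluates KL(N(µ, Σ) ‖ N(0, I)) and confirms that it vanishes when the two distributions coincide.

```python
import numpy as np

def kl_gaussian_to_std(mu, Sigma):
    """KL( N(mu, Sigma) || N(0, I) ) for an N-dimensional Gaussian."""
    N = mu.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)   # stable log-determinant
    return 0.5 * (np.trace(Sigma) + mu @ mu - N - logdet)

# Sanity check: the divergence vanishes when mu = 0 and Sigma = I.
mu = np.zeros(3)
Sigma = np.eye(3)
print(kl_gaussian_to_std(mu, Sigma))   # 0.0
```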
SLIDE 21
Kullback–Leibler Divergence
Example
For a diagonal covariance, i.e. independent latent variables, with parameters {µk} and {σk}, the KL-divergence reduces to:

KL( qφ(z | x) ‖ N(0, I) ) = ½ ∑_{k=1}^{N} ( σk² + µk² − 1 − log σk² )

In the case of binary classification (assuming a Bernoulli output distribution), the reconstruction loss coincides precisely with the negative binary cross entropy; i.e., setting x̂ = D(z), we have:

E_{qφ(z|x)}[ log pθ(x | z) ] = x · log(x̂) + (1 − x) · log(1 − x̂)
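Both terms of the diagonal-Gaussian loss can be written in a few lines; the values of x and x̂ below are hypothetical, purely for illustration.

```python
import numpy as np

def kl_diagonal(mu, sigma):
    """KL divergence for independent latent variables (diagonal covariance)."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - np.log(sigma**2))

def negative_bce(x, x_hat):
    """Reconstruction term for binary data, with x_hat = D(z) in (0, 1)."""
    return np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))

# The KL term vanishes when the approximate posterior matches N(0, I):
print(kl_diagonal(np.zeros(4), np.ones(4)))   # 0.0

# The negative ELBO combines the two terms:
x = np.array([1.0, 0.0, 1.0])
x_hat = np.array([0.9, 0.2, 0.8])
neg_elbo = kl_diagonal(np.zeros(4), np.ones(4)) - negative_bce(x, x_hat)
```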
SLIDE 23
Example: Latent Space Interpolation
Once the VAE model is trained, we can investigate the learned latent representations by decoding points in the latent space. For example, after training a VAE model on the MNIST dataset, we can use the encoder E (i.e. the recognition model) to retrieve the latent representations of two handwritten digits, e.g. z0 = E(x0) and z1 = E(x1), where x0 is an image of a “3” and x1 is an image of a “7”. Linear interpolation can then be used to visualize the path connecting the two data points:
xθ = D( (1 − θ) · z0 + θ · z1 ),  θ ∈ [0, 1]
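The traversal can be sketched as follows; the latent codes z0, z1 and the linear map standing in for the trained decoder D are hypothetical, chosen only to show the interpolation mechanics.

```python
import numpy as np

# Hypothetical latent codes for two digits, e.g. z0 = E(x0), z1 = E(x1);
# a stand-in linear map replaces the trained decoder D for illustration.
z0 = np.array([-1.0, 0.5])
z1 = np.array([2.0, -1.5])

W_dec = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, -0.5]])       # stand-in decoder weights

def D(z):
    return z @ W_dec

# Decode points along the straight line connecting z0 and z1.
for theta in np.linspace(0.0, 1.0, 5):
    z_theta = (1.0 - theta) * z0 + theta * z1
    x_theta = D(z_theta)                   # intermediate "image"

# Endpoints recover the original latent codes:
assert np.allclose((1 - 0.0) * z0 + 0.0 * z1, z0)
assert np.allclose((1 - 1.0) * z0 + 1.0 * z1, z1)
```

With a real decoder, the intermediate outputs show a gradual morph between the two digits, which is what makes latent-space traversal a useful diagnostic for the learned representation.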
SLIDE 24
Example: Learned Manifold Structure
Figure from Auto-Encoding Variational Bayes
SLIDE 25