CS 4803 / 7643: Deep Learning Topics: Variational Auto-Encoders - - PowerPoint PPT Presentation



SLIDE 1

CS 4803 / 7643: Deep Learning

Dhruv Batra Georgia Tech

Topics:

– Variational Auto-Encoders (VAEs)
– Reparameterization trick

SLIDE 2

Administrivia

  • HW4 Grades Released

– Regrade requests close: 12/03, 11:55pm
– Please check solutions first!

  • Grade histogram: 7643

– Max possible: 100 (regular credit) + 40 (extra credit)

(C) Dhruv Batra 2

SLIDE 3

Administrivia

  • HW4 Grades Released

– Regrade requests close: 12/03, 11:55pm
– Please check solutions first!

  • Grade histogram: 4803

– Max possible: 100 (regular credit) + 40 (extra credit)

SLIDE 4

Recap from last time

SLIDE 5

Variational Autoencoders (VAE)

SLIDE 6

So far...

PixelCNNs define a tractable density function and directly optimize the likelihood of the training data. VAEs instead define an intractable density function with latent z, which cannot be optimized directly; we derive and optimize a lower bound on the likelihood instead.

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 7

Variational Auto Encoders

VAEs are a combination of the following ideas:

  • 1. Auto Encoders
  • 2. Variational Approximation
  • Variational Lower Bound / ELBO
  • 3. Amortized Inference Neural Networks
  • 4. “Reparameterization” Trick

SLIDE 8

Autoencoders

[Figure: input data → encoder (4-layer conv) → features → decoder (4-layer upconv) → reconstructed input data]

L2 loss function between the input and its reconstruction. Train such that features can be used to reconstruct the original data. Doesn't use labels!

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
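The pipeline above can be sketched in a few lines. This is a minimal illustrative stand-in, not the slides' model: single random linear layers replace the 4-layer conv/upconv networks, and no training step is shown — only the unsupervised L2 reconstruction loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 8 samples of 16-d input (stand-ins for images).
x = rng.normal(size=(8, 16))

# Linear encoder/decoder as placeholders for the conv/upconv nets.
W_enc = rng.normal(scale=0.1, size=(16, 4))   # 16-d input -> 4-d features
W_dec = rng.normal(scale=0.1, size=(4, 16))   # 4-d features -> 16-d reconstruction

z = x @ W_enc        # features
x_hat = z @ W_dec    # reconstructed input

# L2 reconstruction loss ||x - x_hat||^2, averaged over the batch.
# No labels appear anywhere -- training would be fully unsupervised.
loss = np.mean(np.sum((x - x_hat) ** 2, axis=1))
print(round(float(loss), 3))
```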

SLIDE 9

Autoencoders

[Figure: input data → encoder → features → decoder → reconstructed input data]

Autoencoders can reconstruct data and can learn features to initialize a supervised model. Features capture factors of variation in the training data. Can we generate new images from an autoencoder?

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 10

Variational Autoencoders

Probabilistic spin on autoencoders - will let us sample from the model to generate data!

Image Credit: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

Encoder: q_φ(z|x)    Decoder: p_θ(x|z)

SLIDE 11

Variational Auto Encoders

VAEs are a combination of the following ideas:

  • 1. Auto Encoders
  • 2. Variational Approximation
  • Variational Lower Bound / ELBO
  • 3. Amortized Inference Neural Networks
  • 4. “Reparameterization” Trick

SLIDE 12

Key problem

  • Computing the posterior P(z|x) is intractable

SLIDE 13

What is Variational Inference?

  • Key idea

– Reality is complex
– Can we approximate it with something “simple”?
– Just make sure the simple thing is “close” to the complex thing.

SLIDE 14

Intuition

SLIDE 15
  • Marginal likelihood – x is observed, z is missing:

The general learning problem with missing data


ll(θ : D) = log ∏_{i=1}^{N} P(x_i | θ) = ∑_{i=1}^{N} log P(x_i | θ) = ∑_{i=1}^{N} log ∑_z P(x_i, z | θ)

SLIDE 16

Jensen’s inequality

  • Use: log ∑_z P(z) g(z) ≥ ∑_z P(z) log g(z)

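This form of Jensen's inequality (log is concave, so log E[g] ≥ E[log g]) is easy to sanity-check numerically. A small sketch with a made-up discrete P and positive g (my own illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random distribution P(z) over 5 states and a positive function g(z).
P = rng.random(5)
P /= P.sum()
g = rng.random(5) + 0.1

lhs = np.log(np.sum(P * g))   # log E_P[g(z)]
rhs = np.sum(P * np.log(g))   # E_P[log g(z)]
print(lhs >= rhs)
```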

SLIDE 17

Applying Jensen’s inequality

  • Use: log ∑_z P(z) g(z) ≥ ∑_z P(z) log g(z)

SLIDE 18

Evidence Lower Bound

  • Define the potential function F(θ, Q):


ll(θ : D) ≥ F(θ, Q) = ∑_{i=1}^{N} ∑_z Q_i(z) log [ P(x_i, z | θ) / Q_i(z) ]
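The bound can be verified on a toy discrete model: for any Q it stays below the log-likelihood, and choosing Q equal to the true posterior P(z|x) makes it tight. A small numpy sketch with a made-up joint over 5 latent states (illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy joint P(x, z | theta) for one observation x, latent z in {0,...,4}.
P_xz = rng.random(5) * 0.1 + 0.01
log_lik = np.log(P_xz.sum())          # log P(x) = log sum_z P(x, z)

def elbo(Q):
    # F(theta, Q) = sum_z Q(z) log [P(x, z) / Q(z)]
    return np.sum(Q * (np.log(P_xz) - np.log(Q)))

Q_uniform = np.full(5, 0.2)
Q_post = P_xz / P_xz.sum()            # the true posterior P(z | x)

print(elbo(Q_uniform) <= log_lik)         # bound holds for any Q
print(np.isclose(elbo(Q_post), log_lik))  # tight when Q = posterior
```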

SLIDE 19

ELBO: Factorization #1 (GMMs)


ll(θ : D) ≥ F(θ, Q) = ∑_{i=1}^{N} ∑_z Q_i(z) log [ P(x_i, z | θ) / Q_i(z) ]

SLIDE 20

ELBO: Factorization #2 (VAEs)


ll(θ : D) ≥ F(θ, Q) = ∑_{i=1}^{N} ∑_z Q_i(z) log [ P(x_i, z | θ) / Q_i(z) ]

SLIDE 21

Variational Auto Encoders

VAEs are a combination of the following ideas:

  • 1. Auto Encoders
  • 2. Variational Approximation
  • Variational Lower Bound / ELBO
  • 3. Amortized Inference Neural Networks
  • 4. “Reparameterization” Trick

SLIDE 22

Amortized Inference Neural Networks

SLIDE 23

VAEs


Image Credit: https://www.kaggle.com/rvislaywade/visualizing-mnist-using-a-variational-autoencoder

SLIDE 24

Variational Autoencoders

Probabilistic spin on autoencoders - will let us sample from the model to generate data!

Image Credit: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

Encoder: q_φ(z|x)    Decoder: p_θ(x|z)

SLIDE 25

[Figure: input data → encoder network]

Putting it all together: maximizing the likelihood lower bound

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 26

[Figure: input data → encoder network]

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 27

[Figure: input data → encoder network → sample z]

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 28

[Figure: input data → encoder network → sample z → decoder network]

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 29

[Figure: input data → encoder network → sample z → decoder network → sample x|z]

Putting it all together: maximizing the likelihood lower bound

Make the approximate posterior distribution close to the prior. Maximize the likelihood of the original input being reconstructed.

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
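The two terms of the bound can be sketched numerically. The snippet below is an illustrative stand-in, not the slides' network: the "encoder outputs" and the linear "decoder" are made-up placeholders, L2 reconstruction stands in for the log-likelihood term, and the KL term uses the standard closed form for q(z|x) = N(μ, diag(σ²)) against the N(0, I) prior.

```python
import numpy as np

rng = np.random.default_rng(3)

x = rng.normal(size=16)                  # one input

# Hypothetical encoder outputs for q(z|x) = N(mu, diag(sigma^2)).
mu = rng.normal(scale=0.5, size=4)
log_var = rng.normal(scale=0.5, size=4)
sigma = np.exp(0.5 * log_var)

# Reparameterized sample z = mu + sigma * eps, eps ~ N(0, I).
eps = rng.normal(size=4)
z = mu + sigma * eps

# Hypothetical linear decoder producing x_hat from z.
W_dec = rng.normal(scale=0.1, size=(4, 16))
x_hat = z @ W_dec

# Negative lower bound = reconstruction term + KL(q(z|x) || N(0, I)).
recon = np.sum((x - x_hat) ** 2)
kl = 0.5 * np.sum(mu**2 + sigma**2 - log_var - 1.0)  # closed form for Gaussians
loss = recon + kl

print(kl >= 0.0)
```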

SLIDE 30

[Figure: sample z from prior → decoder network → sample x|z]

Use the decoder network and now sample z from the prior!

[Figure: data manifold for 2-d z; vary z1 / vary z2]

Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014

Variational Auto Encoders: Generating Data

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 31

[Figure: vary z1 (degree of smile) and z2 (head pose)]

Diagonal prior on z => independent latent variables. Different dimensions of z encode interpretable factors of variation.

Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014

Variational Auto Encoders: Generating Data

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 32

Plan for Today

  • VAEs

– Reparameterization trick

SLIDE 33

Variational Auto Encoders

VAEs are a combination of the following ideas:

  • 1. Auto Encoders
  • 2. Variational Approximation
  • Variational Lower Bound / ELBO
  • 3. Amortized Inference Neural Networks
  • 4. “Reparameterization” Trick

SLIDE 34

[Figure: input data → encoder network → sample z → decoder network → sample x|z]

Putting it all together: maximizing the likelihood lower bound

Make the approximate posterior distribution close to the prior. Maximize the likelihood of the original input being reconstructed.

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 35

Putting it all together: maximizing the likelihood lower bound

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 36

Basic Problem


E_{z∼p_θ(z)}[f(z)]

SLIDE 37

Basic Problem

  • Goal


min_θ E_{z∼p_θ(z)}[f(z)]

SLIDE 38

Basic Problem

  • Goal
  • Need to compute:


min_θ E_{z∼p_θ(z)}[f(z)]

∇_θ E_{z∼p_θ(z)}[f(z)]
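The difficulty is that θ parameterizes the sampling distribution itself, so the gradient cannot simply be moved inside the expectation (a short expansion, added here for clarity):

```latex
\nabla_\theta\, \mathbb{E}_{z\sim p_\theta(z)}[f(z)]
  = \nabla_\theta \int p_\theta(z)\, f(z)\, dz
  = \int \nabla_\theta p_\theta(z)\, f(z)\, dz
  \;\neq\; \mathbb{E}_{z\sim p_\theta(z)}\!\left[\nabla_\theta f(z)\right].
```

Since ∇_θ p_θ(z) is not itself a probability density, the last integral is no longer an expectation we can estimate by sampling from p_θ; the two options that follow are two ways of turning it back into one.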

SLIDE 39

Basic Problem

  • Need to compute:


∇_θ E_{z∼p_θ(z)}[f(z)]

SLIDE 40

Example

SLIDE 41

Does this happen in supervised learning?

  • Goal


min_θ E_{z∼p_θ(z)}[f(z)]

SLIDE 42

But what about other kinds of learning?

  • Goal


min_θ E_{z∼p_θ(z)}[f(z)]

SLIDE 43

Two Options

  • Score Function based Gradient Estimator

aka REINFORCE (and variants)

  • Path Derivative Gradient Estimator

aka “reparameterization trick”

SLIDE 44

Option 1

  • Score Function based Gradient Estimator

aka REINFORCE (and variants)

SLIDE 45

Recall: Policy Gradients

∇_θ J(θ) = ∇_θ E_{τ∼p_θ(τ)}[R(τ)]
         = ∇_θ ∫ π_θ(τ) R(τ) dτ                      (expand the expectation)
         = ∫ ∇_θ π_θ(τ) R(τ) dτ                      (exchange integration and differentiation)
         = ∫ π_θ(τ) · (∇_θ π_θ(τ) / π_θ(τ)) · R(τ) dτ
         = ∫ π_θ(τ) ∇_θ log π_θ(τ) R(τ) dτ
         = E_{τ∼p_θ(τ)}[∇_θ log π_θ(τ) R(τ)]

using the identity ∇_θ log π_θ(τ) = ∇_θ π_θ(τ) / π_θ(τ).
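The same log-derivative identity gives a sampling-based score-function (REINFORCE) estimator for gradients of expectations. As a numeric sanity check (my own toy example, not from the slides): for f(z) = z² with z ∼ N(μ, 1), we have E[z²] = μ² + 1, so the true gradient w.r.t. μ is 2μ, and averaging score × f over samples recovers it.

```python
import numpy as np

rng = np.random.default_rng(4)

# Gradient of E_{z ~ N(mu, 1)}[z^2] w.r.t. mu; analytically it equals 2*mu.
mu = 1.0
z = rng.normal(loc=mu, scale=1.0, size=200_000)

f = z ** 2
score = z - mu                   # d/dmu log N(z; mu, 1)
grad_est = np.mean(score * f)    # score-function (REINFORCE) estimate

print(abs(grad_est - 2 * mu) < 0.1)
```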
SLIDE 46

Example

SLIDE 47

Mental Break!

  • VAE Demo

– https://www.siarez.com/projects/variational-autoencoder

SLIDE 48

Two Options

  • Score Function based Gradient Estimator

aka REINFORCE (and variants)


  • Path Derivative Gradient Estimator

aka “reparameterization trick”

SLIDE 49

Option 2


  • Path Derivative Gradient Estimator

aka “reparameterization trick”

SLIDE 50

Option 2


  • Path Derivative Gradient Estimator

aka “reparameterization trick”

SLIDE 51

Reparameterization Intuition


z = μ + σ·ε_i,   ε_i ∼ p(ε)   (so that z ∼ N(μ, σ²))

Figure Credit: http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/
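As a numeric sanity check of the trick (my own toy example, not from the slides): for f(z) = z² with z ∼ N(μ, 1), the true gradient ∇_μ E[z²] = 2μ, and writing z = μ + ε with ε ∼ N(0, 1) lets us differentiate straight through the sample, since d/dμ f(μ + ε) = f′(z) = 2z.

```python
import numpy as np

rng = np.random.default_rng(5)

# Gradient of E_{z ~ N(mu, 1)}[z^2] w.r.t. mu; analytically it equals 2*mu.
# Reparameterize: z = mu + eps, eps ~ N(0, 1), then differentiate the path.
mu = 1.0
eps = rng.normal(size=200_000)
z = mu + eps

grad_samples = 2 * z             # pathwise per-sample gradients d/dmu (z^2)
grad_est = np.mean(grad_samples)

print(abs(grad_est - 2 * mu) < 0.05)
```

In practice this pathwise estimator typically has much lower variance than the score-function estimator on the same problem, which is one reason VAEs use it.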

SLIDE 52

Reparameterization Intuition


Image Credit: https://www.kaggle.com/rvislaywade/visualizing-mnist-using-a-variational-autoencoder

SLIDE 53

Example

SLIDE 54

Two Options

  • Score Function based Gradient Estimator

aka REINFORCE (and variants)


  • Path Derivative Gradient Estimator

aka “reparameterization trick”

SLIDE 55

Example


Figure Credit: http://gokererdogan.github.io/2016/07/01/reparameterization-trick/

SLIDE 56

Example


Figure Credit: http://gokererdogan.github.io/2016/07/01/reparameterization-trick/

SLIDE 57

Variational Auto Encoders

VAEs are a combination of the following ideas:

  • 1. Auto Encoders
  • 2. Variational Approximation
  • Variational Lower Bound / ELBO
  • 3. Amortized Inference Neural Networks
  • 4. “Reparameterization” Trick

SLIDE 58

[Figure: input data → encoder network]

Putting it all together: maximizing the likelihood lower bound

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 59

[Figure: input data → encoder network]

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 60

[Figure: input data → encoder network → sample z]

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 61

[Figure: input data → encoder network → sample z → decoder network]

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n