Lecture 19: Generative Models, Part 1
Justin Johnson, November 11, 2020

Reminder: Assignment 5
A5 has been released and is due Monday November 16, 11:59pm EST. A5 covers object detection:
- Single-stage detectors
- Two-stage detectors


  1. Discriminative vs Generative Models
  Discriminative Model: learn a probability distribution p(y|x).
  Generative Model: learn a probability distribution p(x).
  Conditional Generative Model: learn p(x|y). Each possible label induces a competition among all images.

  2. Discriminative vs Generative Models
  Recall Bayes' Rule: P(y|x) = P(x|y) P(y) / P(x)
  Discriminative Model: learn p(y|x). Generative Model: learn p(x). Conditional Generative Model: learn p(x|y).

  3. Discriminative vs Generative Models
  Recall Bayes' Rule: P(y|x) = P(x|y) P(y) / P(x)
  Labeling each term: P(y|x) is the discriminative model, P(x|y) is the conditional generative model, P(y) is a prior over labels, and P(x) is the (unconditional) generative model.
  We can build a conditional generative model from the other components!

  4. What can we do with a discriminative model?
  - Assign labels to data
  - Feature learning (with labels)

  5. What can we do with a generative model?
  - Detect outliers
  - Feature learning (without labels)
  - Sample to generate new data

  6. What can we do with a conditional generative model?
  - Assign labels, while rejecting outliers!
  - Generate new data conditioned on input labels

  7-12. Taxonomy of Generative Models
  - Explicit density: the model can compute p(x)
    - Tractable density (can compute p(x) exactly): Autoregressive models, NADE / MADE, NICE / RealNVP, Glow, Ffjord
    - Approximate density (can compute an approximation to p(x)):
      - Variational: Variational Autoencoder
      - Markov Chain: Boltzmann Machine
  - Implicit density: the model does not explicitly compute p(x), but can sample from p(x)
    - Direct: Generative Adversarial Networks (GANs)
    - Markov Chain: GSN
  We will talk about autoregressive models, variational autoencoders, and GANs.
  Figure adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.

  13. Autoregressive Models

  14-17. Explicit Density Estimation
  Goal: write down an explicit function for p(x) = f(x, W).
  Given a dataset x^(1), x^(2), ..., x^(N), train the model by solving:
  W* = arg max_W ∏_i p(x^(i))          (maximize probability of training data: maximum likelihood estimation)
     = arg max_W Σ_i log p(x^(i))      (log trick to exchange product for sum)
     = arg max_W Σ_i log f(x^(i), W)   (this will be our loss function! Train with gradient descent)
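  A minimal PyTorch sketch of this recipe (hypothetical names; `model(x)` is assumed to return log f(x, W) = log p(x) for each example in a batch), showing that maximizing the summed log-likelihood is just minimizing a negative-log-likelihood loss with gradient descent:

```python
import torch

def nll_loss(model, x):
    # model(x) is assumed to return log p(x) per example;
    # maximizing sum_i log p(x_i) == minimizing the negative log-likelihood.
    return -model(x).mean()

# Hypothetical training loop for any density model whose forward pass returns log p(x).
def train(model, loader, lr=1e-3, epochs=10):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x in loader:
            loss = nll_loss(model, x)   # -(1/N) * sum_i log f(x_i, W)
            opt.zero_grad()
            loss.backward()             # gradient of the loss w.r.t. W
            opt.step()
    return model
```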

  18-21. Explicit Density: Autoregressive Models
  Goal: write down an explicit function for p(x) = f(x, W).
  Assume x consists of multiple subparts: x = (x_1, x_2, x_3, ..., x_T).
  Break down the probability using the chain rule:
  p(x) = p(x_1, x_2, x_3, ..., x_T)
       = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) ...
       = ∏_{t=1}^T p(x_t | x_1, ..., x_{t-1})
  i.e., the probability of the next subpart given all the previous subparts.
  We've already seen this: language modeling with an RNN! At each step the RNN consumes the previous subpart and its hidden state, and outputs p(x_t).
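  A minimal sketch of that factorization with an RNN (not the lecture's code; the GRU and the start-token convention are illustrative choices): the network reads x_1, ..., x_{t-1}, outputs a categorical distribution over x_t, and log p(x) is the sum of per-step log-probabilities, which plugs directly into the maximum-likelihood loss above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ARModel(nn.Module):
    """p(x) = prod_t p(x_t | x_1, ..., x_{t-1}), with an RNN over discrete subparts."""
    def __init__(self, vocab_size=256, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size + 1, hidden)   # extra index = <start> token
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)            # logits for p(x_t | x_<t)

    def log_prob(self, x):                                    # x: (B, T) integer subparts
        B, T = x.shape
        start = torch.full((B, 1), self.embed.num_embeddings - 1,
                           dtype=torch.long, device=x.device)
        inp = torch.cat([start, x[:, :-1]], dim=1)            # shift right: step t sees x_1..x_{t-1}
        h, _ = self.rnn(self.embed(inp))                      # (B, T, hidden)
        logp = F.log_softmax(self.head(h), dim=-1)            # (B, T, vocab)
        # Pick out log p(x_t | x_<t) at each step and sum over t to get log p(x).
        return logp.gather(-1, x.unsqueeze(-1)).squeeze(-1).sum(dim=1)
```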

  22-31. PixelRNN
  Generate image pixels one at a time, starting at the upper-left corner.
  Compute a hidden state for each pixel that depends on the hidden states and RGB values from the left and from above (LSTM recurrence):
  h_{x,y} = f(h_{x-1,y}, h_{x,y-1}, W)
  At each pixel, predict red, then blue, then green: a softmax over [0, 1, ..., 255].
  Each pixel depends implicitly on all pixels above and to the left.
  Problem: very slow during both training and testing; an N x N image requires 2N-1 sequential steps.
  Van den Oord et al, "Pixel Recurrent Neural Networks", ICML 2016
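  To make the sequential nature concrete, here is an illustrative generation loop (a sketch, not the paper's diagonal-LSTM implementation; `rnn_cell` and `pixel_head` are assumed placeholder modules):

```python
import torch

@torch.no_grad()
def generate(rnn_cell, pixel_head, H, W, hidden_dim, device="cpu"):
    """Raster-order generation: the state at (x, y) depends on the states and pixel
    values from the left and from above, h_{x,y} = f(h_{x-1,y}, h_{x,y-1}, W).
    Written as a plain double loop; pixels on the same anti-diagonal could be
    computed in parallel, which is where the 2N-1 sequential steps come from."""
    img = torch.zeros(H, W, 3, device=device)          # generated RGB values in [0, 1]
    h = torch.zeros(H, W, hidden_dim, device=device)
    zeros_h = torch.zeros(hidden_dim, device=device)
    zeros_px = torch.zeros(3, device=device)
    for y in range(H):
        for x in range(W):
            h_left  = h[y, x - 1] if x > 0 else zeros_h
            h_up    = h[y - 1, x] if y > 0 else zeros_h
            px_left = img[y, x - 1] if x > 0 else zeros_px
            px_up   = img[y - 1, x] if y > 0 else zeros_px
            h[y, x] = rnn_cell(h_left, h_up, px_left, px_up)  # assumed recurrence
            logits = pixel_head(h[y, x])                      # assumed head: (3, 256) logits
            probs = torch.softmax(logits, dim=-1)
            img[y, x] = torch.multinomial(probs, 1).squeeze(-1).float() / 255.0
    return img
```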

  32-34. PixelCNN
  Still generate image pixels starting from the corner, but the dependency on previous pixels is now modeled using a CNN over a context region, with a softmax loss at each pixel.
  Training: maximize the likelihood of the training images. Training is faster than PixelRNN, because the convolutions can be parallelized (the context-region values are known from the training images).
  Generation must still proceed sequentially, so it is still slow.
  Van den Oord et al, "Conditional Image Generation with PixelCNN Decoders", NeurIPS 2016
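  A common way to implement the "CNN over a context region" constraint is a masked convolution that zeroes out kernel weights at and after the current pixel. The sketch below is that common pattern, not the authors' code, and it masks spatially only (ignoring the within-pixel channel ordering):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel is masked so that output (y, x) only depends on
    pixels above and to the left. Mask type 'A' also hides the center pixel
    (used for the first layer); type 'B' keeps it (used for later layers)."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        mask[kH // 2, kW // 2 + (mask_type == "B"):] = 0   # center row: at/right of center
        mask[kH // 2 + 1:, :] = 0                          # all rows below the center
        self.register_buffer("mask", mask[None, None])     # broadcast over (out, in) channels

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

# A tiny PixelCNN-style stack (hypothetical sizes): first layer type 'A', rest 'B',
# ending in 256-way logits per channel for the per-pixel softmax loss.
net = nn.Sequential(
    MaskedConv2d("A", 3, 64, kernel_size=7, padding=3), nn.ReLU(),
    MaskedConv2d("B", 64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3 * 256, kernel_size=1),
)
```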

  35. PixelRNN: Generated Samples
  [Sample grids generated on 32x32 ImageNet and 32x32 CIFAR-10.]
  Van den Oord et al, "Pixel Recurrent Neural Networks", ICML 2016

  36. Autoregressive Models: PixelRNN and PixelCNN
  Pros:
  - Can explicitly compute the likelihood p(x)
  - Explicit likelihood of training data gives a good evaluation metric
  - Good samples
  Con:
  - Sequential generation, so it is slow
  Improving PixelCNN performance: gated convolutional layers, short-cut connections, discretized logistic loss, multi-scale generation, training tricks, etc. See Van den Oord et al, NeurIPS 2016 and Salimans et al, 2017 (PixelCNN++).

  37. Variational Autoencoders

  38. Variational Autoencoders
  PixelRNN / PixelCNN explicitly parameterize the density function with a neural network, so we can train to maximize the likelihood of the training data:
  p(x) = ∏_{t=1}^T p(x_t | x_1, ..., x_{t-1})
  Variational Autoencoders (VAEs) instead define an intractable density, p_θ(x) = ∫ p_θ(x|z) p_θ(z) dz, that we cannot explicitly compute or optimize.
  But we will be able to directly optimize a lower bound on the density.


  40. (Regular, non-variational) Autoencoders
  An unsupervised method for learning feature vectors from raw data x, without any labels.
  The features should extract useful information (maybe object identities, properties, scene type, etc.) that we can use for downstream tasks.
  Encoder: originally linear + nonlinearity (sigmoid); later deep, fully-connected; later ReLU CNN.

  41. (Regular, non-variational) Autoencoders
  Problem: how can we learn this feature transform from raw data? We can't observe the features!

  42. (Regular, non-variational) Autoencoders
  Idea: use the features to reconstruct the input data with a decoder. "Autoencoding" = encoding itself.
  Decoder: originally linear + nonlinearity (sigmoid); later deep, fully-connected; later ReLU CNN (upconv).

  43-45. (Regular, non-variational) Autoencoders
  Loss: L2 distance between the input and the reconstructed data, ||x - x̂||^2. Does not use any labels, just raw data!
  Example architecture: encoder with 4 conv layers, decoder with 4 transposed-conv (tconv) layers.
  The features need to be lower-dimensional than the data.
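  A small sketch of this setup (assumed channel sizes, roughly matching the 4-conv / 4-tconv architecture on the slide), trained with the L2 reconstruction loss and no labels:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: 4 conv layers, each halving spatial size (e.g. 32x32x3 -> 2x2x256).
        chans = [3, 32, 64, 128, 256]
        enc = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            enc += [nn.Conv2d(cin, cout, 4, stride=2, padding=1), nn.ReLU()]
        self.encoder = nn.Sequential(*enc)
        # Decoder: 4 transposed-conv layers mirroring the encoder.
        dec = []
        for cin, cout in zip(chans[:0:-1], chans[-2::-1]):
            dec += [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1), nn.ReLU()]
        dec[-1] = nn.Sigmoid()   # map the final output back to the [0, 1] pixel range
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        z = self.encoder(x)      # low-dimensional features, learned without labels
        return self.decoder(z)

# Loss: L2 distance between input and reconstruction, ||x - x_hat||^2
def ae_loss(model, x):
    return ((model(x) - x) ** 2).mean()
```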

  46-47. (Regular, non-variational) Autoencoders
  After training, throw away the decoder and use the encoder for a downstream task.
  The encoder can be used to initialize a supervised model: attach a classifier (softmax loss, etc.) that predicts labels (bird, plane, truck, dog, deer, ...), fine-tune the encoder jointly with the classifier, and train for the final task (sometimes with small data).

  48. (Regular, non-variational) Autoencoders
  Autoencoders learn latent features for data without any labels! The features can be used to initialize a supervised model.
  But they are not probabilistic: there is no way to sample new data from the learned model.

  49. Variational Autoencoders
  Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014

  50. Variational Autoencoders
  A probabilistic spin on autoencoders:
  1. Learn latent features z from raw data
  2. Sample from the model to generate new data

  51-55. Variational Autoencoders
  Assume the training data {x^(i)}_{i=1}^N is generated from an unobserved (latent) representation z.
  Intuition: x is an image, z is the latent factors used to generate x: attributes, orientation, etc.
  After training, sample new data like this: sample z from the prior, then sample x from the conditional p(x|z).
  Assume a simple prior p(z), e.g. Gaussian. Represent p(x|z) with a neural network (similar to the decoder from an autoencoder).
  The decoder must be probabilistic: it inputs z and outputs a mean μ_{x|z} and (diagonal) covariance Σ_{x|z}; then sample x from a Gaussian with mean μ_{x|z} and (diagonal) covariance Σ_{x|z}.
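  A sketch of such a probabilistic decoder (hypothetical sizes, fully-connected for simplicity): it maps z to the mean and diagonal log-variance of p(x|z), from which x can be sampled.

```python
import torch
import torch.nn as nn

class VAEDecoder(nn.Module):
    """p(x|z) = N(mu_{x|z}, diag(sigma^2_{x|z})); sizes are illustrative."""
    def __init__(self, z_dim=20, x_dim=784, hidden=400):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, x_dim)        # mean of p(x|z)
        self.logvar = nn.Linear(hidden, x_dim)    # log of the diagonal covariance

    def forward(self, z):
        h = self.net(z)
        return self.mu(h), self.logvar(h)

# Sampling new data after training: z from the prior, then x from p(x|z).
def sample(decoder, n, z_dim=20):
    z = torch.randn(n, z_dim)                     # prior p(z) = N(0, I)
    mu, logvar = decoder(z)
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()
```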

  56-60. Variational Autoencoders
  How to train this model? Basic idea: maximize the likelihood of the data.
  If we could observe the z for each x, then we could train a conditional generative model p(x|z).
  But we don't observe z, so we need to marginalize:
  p_θ(x) = ∫ p_θ(x, z) dz = ∫ p_θ(x|z) p_θ(z) dz
  We can compute p_θ(x|z) with the decoder network, and we assumed a Gaussian prior for z.
  Problem: it is impossible to integrate over all z!

  61-66. Variational Autoencoders
  Recall p(x, z) = p(x|z) p(z) = p(z|x) p(x).
  Another idea: try Bayes' Rule:
  p_θ(x) = p_θ(x|z) p_θ(z) / p_θ(z|x)
  We can compute p_θ(x|z) with the decoder network, and we assumed a Gaussian prior p_θ(z). But there is no way to compute p_θ(z|x)!
  Solution: train another network (the encoder) that learns q_φ(z|x) ≈ p_θ(z|x). Then
  p_θ(x) ≈ p_θ(x|z) p_θ(z) / q_φ(z|x)

  67. Variational Autoencoders
  Decoder network: inputs a latent code z, gives a distribution over data x: p_θ(x|z) = N(μ_{x|z}, Σ_{x|z}).
  Encoder network: inputs data x, gives a distribution over latent codes z: q_φ(z|x) = N(μ_{z|x}, Σ_{z|x}).
  If we can ensure that q_φ(z|x) ≈ p_θ(z|x), then we can approximate p_θ(x) ≈ p_θ(x|z) p(z) / q_φ(z|x).
  Idea: jointly train both the encoder and the decoder.
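  A matching encoder sketch (again hypothetical sizes): it maps x to the mean and diagonal log-variance of q_φ(z|x).

```python
import torch.nn as nn

class VAEEncoder(nn.Module):
    """q_phi(z|x) = N(mu_{z|x}, diag(sigma^2_{z|x})); sizes are illustrative."""
    def __init__(self, x_dim=784, z_dim=20, hidden=400):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)        # mean of q(z|x)
        self.logvar = nn.Linear(hidden, z_dim)    # log of the diagonal covariance

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)
```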

  68-76. Variational Autoencoders
  log p_θ(x) = log [ p_θ(x|z) p(z) / p_θ(z|x) ]                           (Bayes' Rule)
             = log [ p_θ(x|z) p(z) q_φ(z|x) / (p_θ(z|x) q_φ(z|x)) ]       (multiply top and bottom by q_φ(z|x))
             = log p_θ(x|z) - log [ q_φ(z|x) / p(z) ] + log [ q_φ(z|x) / p_θ(z|x) ]   (split up using rules for logarithms)
  Since log p_θ(x) does not depend on z, we can wrap it in an expectation over z ~ q_φ(z|x):
  log p_θ(x) = E_{z~q_φ(z|x)} [ log p_θ(x) ]
             = E_z [ log p_θ(x|z) ] - E_z [ log (q_φ(z|x) / p(z)) ] + E_z [ log (q_φ(z|x) / p_θ(z|x)) ]
             = E_{z~q_φ(z|x)} [ log p_θ(x|z) ] - D_KL( q_φ(z|x), p(z) ) + D_KL( q_φ(z|x), p_θ(z|x) )
  The first term is the data reconstruction term; the second is the KL divergence between the prior and samples from the encoder network; the third is the KL divergence between the encoder and the posterior of the decoder.
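  Dropping the third KL term (it is always ≥ 0, which is what makes the remaining two terms a lower bound on log p_θ(x)) gives the training objective. Below is a sketch of that loss, assuming the encoder/decoder sketches above, a unit-Gaussian prior, and one Monte Carlo sample of z drawn with the usual reparameterization z = μ + σ·ε (not covered on these slides):

```python
import torch

def vae_loss(encoder, decoder, x):
    # Encoder: q_phi(z|x) = N(mu_z, diag(exp(logvar_z)))
    mu_z, logvar_z = encoder(x)
    # One Monte Carlo sample z ~ q_phi(z|x) via reparameterization, so gradients flow.
    z = mu_z + torch.randn_like(mu_z) * (0.5 * logvar_z).exp()

    # Term 1: data reconstruction, E_z[log p_theta(x|z)] for a diagonal Gaussian
    # decoder (up to an additive constant).
    mu_x, logvar_x = decoder(z)
    recon = -0.5 * (logvar_x + (x - mu_x) ** 2 / logvar_x.exp()).sum(dim=1)

    # Term 2: KL( q_phi(z|x) || p(z) ) with p(z) = N(0, I), in closed form.
    kl = 0.5 * (mu_z ** 2 + logvar_z.exp() - logvar_z - 1).sum(dim=1)

    # Maximize the lower bound  =>  minimize its negative.
    return (kl - recon).mean()
```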
