Lecture #11 – Variational Autoencoders
Aykut Erdem // Hacettepe University // Spring 2020
CMP784
DEEP LEARNING
latent by Tom White
Previously on CMP784
Representation Learning
2
Artificial faces synthesized by StyleGAN (Nvidia)
Lecture overview
Disclaimer: Much of the material and slides for this lecture were borrowed from
—Pavlos Protopapas, Mark Glickman and Chris Tanner's Harvard CS109B class
—Andrej Risteski's CMU 10707 class
—David McAllester's TTIC 31230 class
3
Lecture overview
Variational Autoencoders (VAEs)
4
Recap: Autoencoders
5
Encoder Decoder
Input Image → Feature Representation. Feed-forward, bottom-up path (encoder); feed-back, generative, top-down path (decoder).
Parameter space of autoencoder
What happens to the different classes? If the AE learned the “essence” of the classes, samples of the same class should be close to each other in the latent space, with clear separation between classes.
6 Image taken from A. Glassner, Deep Learning, Vol. 2: From Basics to Practice
Traversing the latent space
Pick a start point in the latent space and then move to the end of the arrow in 7 steps. At each step, feed the latent point to the decoder to produce an image.
7 Image taken from A. Glassner, Deep Learning, Vol. 2: From Basics to Practice
Problems with Autoencoders
8
Lecture overview
Mechanics of VAEs
9
Generative models
x ∼ p(x) x ∼ N(µ, σ)
Generative models
z ∼ Unif(0, 1)
Generative models
z ∼ Unif(0, 1) x = ln z
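This transformation can be checked numerically; a minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Push a simple random variable through a deterministic map:
# z ~ Unif(0, 1), x = ln(z).  Then -x ~ Exponential(1),
# so E[x] = -1 and Var[x] = 1.
z = 1.0 - rng.random(100_000)  # uniform on (0, 1], avoids log(0)
x = np.log(z)

print(x.mean(), x.var())
```

The sample mean and variance approach -1 and 1, illustrating how a trivial source distribution turns into a very different target distribution under a deterministic map.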
Generative models
We want samples from a complicated distribution, x ∼ p(x), obtained by transforming a simple one:
z ∼ q(z), x = g(z), x ∼ p(x)
where in general g is some complicated function.
Traditional Autoencoders
The encoder and decoder are deterministic; each is just some function mapping.
14
Encoder Decoder
z
! " = $(&) & = ℎ(")
Variational Autoencoders
VAEs add stochasticity: we think of the autoencoder as a probabilistic model.
15
Encoder Decoder
z
Variational Autoencoders
16
Decoder
p(x̂|z)
z
Sample z from p(z), e.g. a standard Gaussian:
z ∼ p(z), x̂ = g(z), x̂ ∼ p(x̂|z)
Variational Autoencoders
17
Encoder
z
Encoder
!" !#
Consider this to be the mean
Consider this to be the std of a normal % Randomly chosen value Latent value, z
Tr Tradit ditiona ional A l AE E Decode Va Variational AE
Variational Autoencoders
18
Variational Autoencoders
19
Variational Autoencoders
20
512 neurons ReLU 512 neurons ReLU 256 neurons ReLU 20 neurons ReLU 256 neurons ReLU 784 neurons ReLU
Centers Spreads Random Variable
Lecture overview
Separability of VAEs
21
Separability in Variational Autoencoders
Beyond separating classes, we also want similar items in the same class to be near each other. For different ways of writing “2”, we want similar styles to end up near each other. This is the magic happening once we add stochasticity in the latent space.
22
Separability in Variational Autoencoders
23
Latent Space
Mean µ
SD σ
ENCODER DECODER
Encode the first sample (a “2”) and find μ1, σ1
Separability in Variational Autoencoders
24
DECODER ENCODER
Latent Space
Mean µ
SD σ
Sample z1 ∼ N(μ1, σ1)
Blending Latent Variables
25
DECODER ENCODER
Latent Space
Mean µ
SD σ
Decode to x̂1
Separability in Variational Autoencoders
26
Latent Space
Mean µ
SD σ
DECODER ENCODER
Encode the second sample (a “3”), find μ2, σ2. Sample z2 ∼ N(μ2, σ2)
Separability in Variational Autoencoders
27
Latent Space
Mean µ
SD σ
DECODER ENCODER
Decode to x̂2
Separability in Variational Autoencoders
28
Latent Space
Mean µ
SD σ
DECODER ENCODER
Train with the first sample (a “2”) again and find μ1, σ1. However, z1 ∼ N(μ1, σ1) will not be the same as before.
Separability in Variational Autoencoders
29
Latent Space
Mean µ
SD σ
DECODER ENCODER
Decode to x̂. Since the decoder only knows how to map from the latent space to x̂ space, it will return a “3”.
Latent space starts to re-organize
Separability in Variational Autoencoders
30
Latent Space
Mean µ
SD σ
Train with 1st sample again
DECODER ENCODER
Separability in Variational Autoencoders
31
Latent Space
Mean µ
SD σ
And again…
3 is pushed away
DECODER ENCODER
Separability in Variational Autoencoders
32
Mean µ
SD σ
Many times…
DECODER ENCODER
Latent Space
Separability in Variational Autoencoders
33
Mean µ
SD σ
Now let's test again
DECODER ENCODER
Latent Space
Separability in Variational Autoencoders
34
Mean µ
SD σ
Training on 3’s again
DECODER ENCODER
Latent Space
Separability in Variational Autoencoders
35
Latent Space
Mean µ
SD σ
Many times…
DECODER ENCODER
Lecture overview
Training of VAEs
36
Training
37
Encoder → (μ, σ) → z → Decoder
Training means learning the encoder parameters and the decoder parameters.
The loss function combines:
—a reconstruction error between the input and its reconstruction, and
—a term forcing the approximate posterior q(z|x) to stay close to the prior distribution p(z), which can be computed by the Kullback–Leibler divergence (KL): KL(q(z|x) || p(z))
Bayesian AE
38
Encoder → (μ, σ) → z → Decoder
Parameters
Bayes rule: the posterior for our parameters z is
p(z|x, x̂) ∝ p(x̂|z, x) p(z)
The posterior predictive, the probability to see x̂ given x; this is INFERENCE:
p(x̂|x) = ∫ p(x̂|z, x) p(z|x) dz
Decoder: NN Posterior
Bayesian AE
39
The posterior, p(z|x, x̂), can be sampled with MCMC, i.e. no minimization of a loss function. How?
—Propose z* near the current z.
—If the ratio p(z*|x, x̂)/p(z|x, x̂) > 1: accept z*.
—If the ratio < 1: throw a random coin and accept/reject z* with probability equal to the ratio.
—The posterior predictive is p(x̂|x) = ∫ p(x̂|z, x) p(z|x) dz (Note: this is easily done with samples from z, re-weighted given the likelihood).
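The accept/reject rule above is the Metropolis algorithm; a generic numpy sketch on a toy one-dimensional target (function names and the step size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis(log_post, z0, n_steps=5000, step=0.5):
    """Generic Metropolis sampler for an unnormalized log-posterior.

    Propose z* near the current z; accept if the posterior ratio
    exceeds 1, otherwise accept with probability equal to the
    ratio (the "random coin" of the slide).
    """
    z, samples = z0, []
    for _ in range(n_steps):
        z_star = z + step * rng.standard_normal()
        # Work in log space: log u < log p(z*) - log p(z)
        if np.log(rng.uniform()) < log_post(z_star) - log_post(z):
            z = z_star
        samples.append(z)
    return np.array(samples)

# Toy target: a standard normal posterior (log density up to a constant).
samples = metropolis(lambda z: -0.5 * z**2, z0=0.0)
```

After a burn-in, the chain's mean and standard deviation approach 0 and 1, the moments of the toy target.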
Variational AE
40
Problem:
The posterior p(z|x) = p(x|z) p(z) / p(x) becomes intractable, since p(x) = ∫ p(x|z) p(z) dz. Instead we turn this into a minimization problem (variational calculus): find a q_φ(z|x) that is similar to p(z|x) by minimizing their difference. After some math, the loss is

−E_{z∼q_φ(z|x)}[ log p_θ(x|z) ] + KL( q_φ(z|x) || p_θ(z) )

The first term is the reconstruction loss; the second term says the proposal distribution should resemble a Gaussian prior. Minimizing this loss maximizes the Evidence Lower BOund (ELBO).
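A minimal numpy sketch of this loss for a Bernoulli decoder and a diagonal-Gaussian encoder (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var):
    """Negative ELBO for a Bernoulli decoder and a diagonal-Gaussian encoder.

    x, x_hat : arrays with entries in (0, 1), shape (batch, dim)
    mu, log_var : encoder outputs, shape (batch, latent_dim)
    """
    eps = 1e-7
    # Reconstruction term: -E_q[log p(x|z)], binary cross-entropy per example.
    recon = -np.sum(
        x * np.log(x_hat + eps) + (1 - x) * np.log(1 - x_hat + eps), axis=1
    )
    # KL( q(z|x) || N(0, I) ) in closed form for a diagonal Gaussian.
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    return np.mean(recon + kl)
```

With mu = 0 and log_var = 0 the KL term vanishes, so only the reconstruction error remains; any other encoder output adds a strictly positive KL penalty.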
Training VAE
Problem: we cannot backpropagate through the random sampling step.
—Solution: move sampling to an input layer, so that the sampling step is independent of the model parameters.
41
Reparametrization Trick
42
Encoder Decoder
z μ σ
Reparametrization Trick
43
Encoder Decoder
z μ σ
z = μ + ε ∘ σ
Reparametrization Trick
44
Encoder Decoder
z μ σ
ε ∼ N(0, I), z = μ + ε ∘ σ
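A minimal numpy sketch of the trick (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, sigma):
    """Reparametrization trick: z = mu + eps * sigma with eps ~ N(0, I).

    The randomness enters only through eps, which is an input rather
    than an internal operation, so gradients can flow through mu and
    sigma during backpropagation.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + eps * sigma

mu = np.array([0.0, 2.0])
sigma = np.array([1.0, 0.5])
z = sample_latent(mu, sigma)
```

Averaged over many draws, z has mean mu and standard deviation sigma, exactly as if z had been drawn from N(mu, sigma) directly.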
Training VAE
45
[Figure: input image, output images, and their differences for a traditional AE vs. a variational AE]
Latent space of VAE
46
Lecture overview
Disentangled representations
47
Desiderata for representations
What do we want out of a representation? Many possible answers here. First, a few uncontroversial desiderata:
—Interpretability: if the features are interpretable by a human, they can be easily evaluated. (e.g. noisy-OR: "features" are diseases a patient has)
—Sparsity of a representation is an important subcase: the "explanatory" features for a sample can be examined if there are a small number of them.
—Usefulness: the features are "useful" for downstream tasks. Some examples: improving label efficiency — if, for a task, a linear (or otherwise "simple") classifier can be trained on the features and it works well, a smaller number of labeled samples is needed.
48
Desiderata for representations
This is a lot more controversial – here we survey some general desiderata, proposed as early as Bengio-Courville-Vincent '14:
—Hierarchical structure: depth helps induce such structure.
—Semantic clustering: semantically related data points (e.g. in the same category) are clustered.
—Linear interpolation: interpolating between latent codes produces meaningful data points (i.e. "latent space is convex"). Sometimes called manifold flattening.
—Disentanglement (Bengio-Courville-Vincent '14): has been very popular in modern unsupervised learning, though many potential issues with it.
49
Semantic clustering
Semantically related data points (e.g. images in the same category) are clustered together.
50
The intuition: if semantic classes are linearly (or via some other simple function) separable in the latent space and labels are scarce, we only need to learn a simple classifier!
t-SNE projection of VAE-learned features of the 10 MNIST classes. Image from https://pyro.ai/examples/vae.html
Semantic clustering
Semantically related data points (e.g. images in the same category) are clustered together.
51
t-SNE projection of word embeddings for artists (clustered by genre). Image from https://medium.com/free-code- camp/learn-tensorflow-the- word2vec-model-and-the-tsne-algorithm-using-rock-bands-97c99b5dcb3a
Linear interpolation
Points along lines between latent codes produce meaningful data points. (i.e. "latent space is convex")
52
Linear interpolation: in representation space, lines between codes decode to meaningful data. The data manifold is complicated/curved; the latent variable manifold is a convex set – moving in straight lines keeps us on it.
Interpolations for a VAE trained on MNIST.
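Latent-space traversal can be sketched as plain linear interpolation between two codes; each intermediate code would then be fed to a decoder (numpy, names illustrative):

```python
import numpy as np

def interpolate(z_start, z_end, steps=7):
    """Linearly interpolate between two latent codes.

    Each intermediate code would be passed through the decoder to
    produce an image; on a well-behaved ("convex") latent space every
    intermediate decoding is a plausible sample.
    """
    ts = np.linspace(0.0, 1.0, steps)
    return np.array([(1 - t) * z_start + t * z_end for t in ts])

path = interpolate(np.zeros(20), np.ones(20))
```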
Linear interpolation
Points along lines between latent codes produce meaningful data points. (i.e. "latent space is convex")
53
Interpolations for a BigGAN, image from https://thegradient.pub/bigganex-a-dive-into- the-latent-space-of-biggan/
Prior disentangling: p(z) is a product distribution, i.e. p(z) = Π_i p(z_i). Classical example: ICA (independent component analysis). Posterior disentangling: fit a variational posterior q_φ(z|x) s.t. it is (on average over x) a product distribution. In other words, the aggregate posterior E_x[q_φ(z|x)] is close to a product distribution.
Disentangled representations
Disentanglement (Bengio-Courville-Vincent '14): consider a generative model with latent variables z, observables x, and joint distribution p_θ(z, x).
54
Disentangled representations
Qualitative evaluation: infer the latent variables for an image, then change a single latent variable gradually and decode.
55
Irina Higgins et al. β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. ICLR 2017.
Prior disentangling
p(z) is a product distribution, i.e. p(z) = Π_i p(z_i). Classical example: ICA (independent component analysis), also called the "cocktail party problem". Assume data is generated as x = Az, with A a mixing matrix and the coordinates of z independent.
56
If z has an independent, non-Gaussian prior, model is identifiable and efficiently learnable. (See, e.g. Frieze-Jerum-Kannan ‘96, Anandkumar et al ’12) Other examples: noisy-OR networks (diseases are independent), general Bayesian nets, viewing top variables as z’s, GANs, …
Regular VAE
E_{z∼q_φ(z|x)}[ log p_θ(x|z) ] − KL( q_φ(z|x) || p(z) )
The KL term implicitly penalizes representations for which KL(q(z) || p(z)) is large – i.e. for which the aggregated posterior q(z) = E_x[q_φ(z|x)] is far from a product distribution.
57
”Regularization towards prior” ”Reconstruction” error
β-VAE
58
The idea of Higgins et al. '17: introduce a "weighting" factor β to put more weight on the "regularization towards prior" term relative to the "reconstruction" error.
β-VAE objective: E_{z∼q_φ(z|x)}[ log p_θ(x|z) ] − β · KL( q_φ(z|x) || p(z) )
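Relative to the plain VAE loss, the change is a single multiplier; a hedged numpy sketch (names illustrative):

```python
import numpy as np

def beta_vae_loss(recon_nll, mu, log_var, beta=4.0):
    """Negative beta-VAE objective: reconstruction + beta * KL.

    recon_nll : per-example reconstruction negative log-likelihood, shape (batch,)
    beta = 1 recovers the standard VAE; beta > 1 puts more weight on the
    "regularization towards prior" term, encouraging disentanglement.
    """
    # Closed-form KL( N(mu, exp(log_var)) || N(0, I) ), per example.
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    return np.mean(recon_nll + beta * kl)
```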
Posterior disentanglement in VAEs
59
Irina Higgins et al. β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. ICLR 2017.
Posterior disentanglement in VAEs
60
Irina Higgins et al. β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. ICLR 2017.
Posterior disentanglement in VAEs
61
Irina Higgins et al. β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. ICLR 2017.
Measuring disentanglement
variation factors.
62
Generate a training set of samples as follows:
—Sample a batch of B samples as follows:
—Pick a ground-truth variation factor k uniformly at random from [K].
—Generate two sets of "ground truth" latent factors v1, v2 ∈ R^K, s.t. (v1)_k = (v2)_k, and the other coords are independently, randomly sampled.
—Generate images x1, x2 from v1, v2.
—Infer latent variables z1, z2 using the model we are evaluating (e.g. the encoder in a VAE).
—Calculate the average z_avg of |z1 − z2| in the batch, and add (z_avg, k) to the training set.
—Train a linear predictor on the training set; evaluate its test performance.
BetaVAE metric: based on "linear separability" of factors
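The sampling procedure above can be sketched in numpy; the toy `encoder` (identity plus noise) stands in for a real model, and image rendering is skipped:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5  # number of ground-truth variation factors

def encoder(v):
    """Stand-in for the model under evaluation (identity plus noise).
    A real evaluation would render images from v and encode them."""
    return v + 0.01 * rng.standard_normal(v.shape)

def make_training_point(batch=64):
    """One (z_avg, k) pair of the BetaVAE-metric training set."""
    k = rng.integers(K)                       # factor held fixed in the pair
    v1 = rng.uniform(size=(batch, K))
    v2 = rng.uniform(size=(batch, K))
    v2[:, k] = v1[:, k]                       # enforce (v1)_k = (v2)_k
    z1, z2 = encoder(v1), encoder(v2)
    z_avg = np.mean(np.abs(z1 - z2), axis=0)  # coordinate k should be smallest
    return z_avg, k

z_avg, k = make_training_point()
```

A linear classifier trained on many such (z_avg, k) pairs can predict k from z_avg when the representation is disentangled, because the held-fixed factor leaves the smallest coordinate difference.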
Measuring disentanglement
For the held-fixed factor k the coordinate difference is smallest, so the linear classifier should “focus” on k.
63
Measuring disentanglement
Locatello et al., "Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations" (Best paper award at ICML'19): a large-scale study of disentanglement measures, as well as generative models.
64
Usefulness of disentanglement?
(w/ multiclass logistic regression)
65
Locatello et al. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations. ICML 2019.
Usefulness of disentanglement?
divided by the average accuracy based on 10,000 samples
66
Locatello et al. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations. ICML 2019.
Issue of ill-posedness?
—Any latent representation can be re-parametrized, s.t. the distribution over the data is unchanged, but it can be arbitrarily more "entangled". Some inductive bias therefore seems necessary.
—Example: with a Gaussian prior, replace z by Uz, for any non-identity orthogonal matrix U; the data distribution is unchanged.
—The new coordinates are entangled with the old ones – small changes of coordinates of z cause global changes in the output.
67
Locatello et al. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations. ICML 2019.
Lecture overview
Vector Quantized Variational Autoencoders (VQ-VAEs)
68
Gaussian VAEs 2013
69
Sample z ∼ N(0, I) and compute yΦ(z).
[Alec Radford]
Vector Quantized VAEs (VQ-VAE) 2019
70
VQ-VAE-2, Razavi et al., NeurIPS 2019
Vector Quantized VAEs (VQ-VAE) 2019
71
VQ-VAE-2, Razavi et al., NeurIPS 2019
Vector Quantized VAEs (VQ-VAE)
—Uses vector quantization to represent vectors by discrete cluster centers.
—The encoder output is passed through a layer of quantization.
—Let s denote images.
72
VQ-VAE Encoder-Decoder
where C[k, I] is the center vector of cluster k in the codebook C[K, I].
73
L[X, Y, I] = EncΦ(s)
z[x, y] = argmin_k ||L[x, y, I] − C[k, I]||
L̂[x, y, I] = C[z[x, y], I]
ŝ = DecΦ(L̂[X, Y, I])
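The quantization step z[x, y] = argmin_k ||L[x, y, I] − C[k, I]|| can be sketched in numpy (shapes and names are illustrative):

```python
import numpy as np

def quantize(L, C):
    """Nearest-center quantization of encoder outputs.

    L : encoder output, shape (X, Y, I)   (spatial grid of I-dim vectors)
    C : codebook,      shape (K, I)       (K center vectors)
    Returns the symbolic image z[x, y] and the quantized tensor L_hat.
    """
    # Squared distances from every L[x, y] to every center C[k].
    d = np.sum((L[:, :, None, :] - C[None, None, :, :]) ** 2, axis=-1)
    z = np.argmin(d, axis=-1)        # z[x, y] = argmin_k ||L[x,y] - C[k]||
    L_hat = C[z]                     # L_hat[x, y] = C[z[x, y]]
    return z, L_hat

rng = np.random.default_rng(0)
L = rng.standard_normal((4, 4, 8))
C = rng.standard_normal((16, 8))
z, L_hat = quantize(L, C)
```

By construction, each quantized vector is at least as close to its cell's encoder output as any other codebook entry.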
VQ-VAE Training Loss
The training loss combines the distortion between s and its reconstruction ŝ with a quantization term.
74
Φ* = argmin_Φ E_s[ β ||L[X, Y, I] − L̂[X, Y, I]||² + ||s − ŝ||² ]
Parameter-Specific Learning Rates
(The updates below are well defined, except at points where the gradients are zero.)
75
||L[X, Y, I] − L̂[X, Y, I]||² = Σ_{x,y} ||L[x, y, I] − C[z[x, y], I]||²
for x, y: L[x, y, I].grad += 2β (L[x, y, I] − C[z[x, y], I])
for x, y: C[z[x, y], I].grad += 2 (C[z[x, y], I] − L[x, y, I])
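The two update rules can be written out directly; a numpy sketch (the loop form mirrors the slide's pseudocode, not an efficient implementation):

```python
import numpy as np

def vq_grads(L, C, z, beta):
    """Gradients of beta * ||L - L_hat||^2 w.r.t. L and the codebook C.

    Note the asymmetry: the encoder side sees the beta-scaled gradient,
    the codebook side the unscaled one -- a parameter-specific learning rate.
    """
    grad_L = np.zeros_like(L)
    grad_C = np.zeros_like(C)
    X, Y, _ = L.shape
    for x in range(X):
        for y in range(Y):
            k = z[x, y]
            grad_L[x, y] += 2 * beta * (L[x, y] - C[k])   # pulls L toward C[k]
            grad_C[k] += 2 * (C[k] - L[x, y])             # pulls C[k] toward L
    return grad_L, grad_C

grad_L, grad_C = vq_grads(
    np.ones((2, 2, 3)), np.zeros((4, 3)), np.zeros((2, 2), dtype=int), beta=0.25
)
```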
The Relationship to K-means
At convergence, each center C[k, I] is the mean of the set of vectors L[x, y, I] with z[x, y] = k (as in K-means), since the update below vanishes exactly at that mean.
76
for x, y: C[z[x, y], I].grad += 2 (C[z[x, y], I] − L[x, y, I])
Straight Through Gradients
—The quantization step has zero gradient almost everywhere, so straight-through gradients must be used.
—The gradients with respect to L̂[X, Y, I] are simply copied to L[X, Y, I].
77
for x, y: L[x, y, I].grad += L̂[x, y, I].grad
Training Phase II
Once trained, we can represent each image by its discrete “symbolic image” z[X, Y].
we can learn an auto- regressive model of these symbolic images using a pixel- CNN.
which provides a tighter upper bound on the rate.
This is something GANs cannot do.
78
Multi-Layer Vector Quantized VAEs
79
Quantitative Evaluation
—VQ-VAE-2 is evaluated on class-conditional image generation.
—Train a classifier on generated samples rather than on the ImageNet training data.
—Then measure its accuracy on the ImageNet test set.
80
Direct Rate-Distortion Evaluation
—Discrete representations support unambiguous rate-distortion evaluation.
—We can directly measure the rate-distortion trade-off.
81
Image Compression
82
Vector Quantization (Emergent Symbols)
—Vector quantization replaces continuous vectors with a discrete set of embedded symbols.
—Discrete symbols support compression.
—This gives a discrete representation of images and hence a measurable image compression rate-distortion trade-off.
83
Symbols: A Better Learning Bias
classification.
84
Symbols: Improved Interpretability
to the emergent symbols.
85
Symbols: Unifying Vision and Language
symbols.
86
Symbols: Addressing the “Forgetting” Problem
—When tasks are learned sequentially, training on the second task degrades performance on the first (the model “forgets”).
—Discrete symbols may be easier to retain while training on a different task.
87
Symbols: Improved Transfer Learning
parameters may improve transfer between domains.
88
89