From Variational to Deterministic Autoencoders, or the joys of density estimation in latent spaces - PowerPoint PPT Presentation


From Variational to Deterministic Autoencoders, or the joys of density estimation in latent spaces. Antonio Vergari. Joint work with: Partha Ghosh, Mehdi S. M. Sajjadi, Bernhard Schölkopf, Michael Black. University of California, Los Angeles


SLIDE 1

From Variational to Deterministic Autoencoders

Antonio Vergari University of California, Los Angeles @tetraduzione

26th August 2020 - UCL - AI Center Seminars

  • or the joys of density estimation in latent spaces

Joint work with: Partha Ghosh, Mehdi S.M. Sajjadi, Bernhard Schölkopf, Michael Black

SLIDE 2

Why?

SLIDE 3

Why?

learning time

SLIDE 4

Why?

learning time

the generative modeling paradigm

inference time

SLIDE 5

Variational Autoencoders

(VAEs)

⇒ Generative modeling [Van Den Oord 2017, Tolstikhin 2019, Razavi 2019, ...]
⇒ Density Estimation [Kingma 2014, Rezende 2014, Burda 2015, ...]
⇒ Disentanglement [Higgins 2016, ...]

Kingma, Diederik P., and Max Welling. "Auto-Encoding Variational Bayes." ICLR 2014

SLIDE 6

Variational Autoencoders

(VAEs)

Regularized Autoencoders

(RAEs)

⇒ Generative modeling [Van Den Oord 2017, Tolstikhin 2019, Razavi 2019, ...]
⇒ Density Estimation [Kingma 2014, Rezende 2014, Burda 2015, ...]
⇒ Disentanglement [Higgins 2016, ...]

a simpler alternative for generative modeling

SLIDE 7

! disclaimer !

SLIDE 8

Variational Autoencoders (VAEs)

SLIDE 9

Variational Autoencoders (VAEs)

SLIDE 10

Variational Autoencoders (VAEs)

SLIDE 11

How to train VAEs?

SLIDE 12

How to train VAEs?

SLIDE 13

How to train VAEs?

SLIDE 14

How to train VAEs?
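As a reference for the "How to train VAEs?" slides (reconstructed from the cited papers, not from the extracted slide content), the standard VAE training objective is the evidence lower bound (ELBO), maximized jointly over the decoder parameters θ and encoder parameters φ:

```latex
\mathcal{L}_{\mathrm{ELBO}}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - \mathrm{KL}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```

The first term rewards reconstruction; the KL term pulls each posterior towards the prior, and balancing the two is exactly the issue the next slides discuss.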

SLIDE 15

Training VAEs: issues

  • Balancing reconstruction quality and compression [Burda et al. 2015, Tolstikhin et al. 2018, ...]
  • Spurious global optima [Dai et al. 2019]
  • Posterior collapse [van den Oord et al. 2018, ...]
  • Prior/aggregate posterior mismatch [Tolstikhin et al. 2018, Dai et al. 2019, ...]

SLIDE 16

Issue #1: balancing training

SLIDE 17

Issue #1: balancing training

  • one-sample approximation!
SLIDE 18

Issue #1: balancing training

Weighting the KL term!
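A minimal sketch of the KL-weighting fix discussed here, in the style of a β-VAE objective. The closed-form Gaussian KL and all function names are my own illustration, not taken from the slides:

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def beta_vae_loss(x, x_rec, mu, log_var, beta=1.0):
    """Reconstruction error plus a beta-weighted KL term (the knob on this slide)."""
    rec = np.sum((x - x_rec) ** 2, axis=-1)  # per-sample reconstruction error
    return np.mean(rec + beta * gaussian_kl(mu, log_var))
```

Setting β < 1 favors reconstruction quality, β > 1 favors compression towards the prior; β = 1 recovers the plain ELBO (up to the likelihood's constant terms).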

SLIDE 19

Sampling VAEs

SLIDE 20

Sampling VAEs

SLIDE 21

Sampling VAEs

SLIDE 22

Sampling VAEs

SLIDE 23

Sampling VAEs

the aggregate posterior should ideally match the prior!
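One way to see this: the aggregate posterior is the mixture q(z) = (1/N) Σᵢ q(z|xᵢ), so sampling it means picking a training point at random and sampling its per-point posterior. A toy numpy sketch (all shapes and values here are hypothetical placeholders for real encoder outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs (means, log-variances) for N training points.
N, d = 1000, 2
mus = rng.normal(size=(N, d))
log_vars = np.full((N, d), -2.0)

def sample_aggregate_posterior(n):
    """q(z) = (1/N) sum_i q(z|x_i): pick a training point, sample its posterior."""
    idx = rng.integers(0, N, size=n)
    std = np.exp(0.5 * log_vars[idx])
    return mus[idx] + std * rng.normal(size=(n, d))

z = sample_aggregate_posterior(5000)
```

Nothing forces this mixture to equal the N(0, I) prior; when the two differ, prior samples land in regions the decoder never saw, which is the mismatch the next slides illustrate.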

SLIDE 24

Issue #2: sampling spurious codes

the prior/aggregate posterior mismatch

SLIDE 25

Issue #2: sampling spurious codes

the decoder has a hard time “imagining”

SLIDE 26

Can we do better?

SLIDE 27

Simpler VAEs?

SLIDE 28

Simpler VAEs?

SLIDE 29

Simpler VAEs?

SLIDE 30

Simpler VAEs?

SLIDE 31

Simpler VAEs?

SLIDE 32

Simpler VAEs?

SLIDE 33

Simpler VAEs?

SLIDE 34

Simpler VAEs?

SLIDE 35

How to have a smooth latent space?

ideally,

SLIDE 36

Regularized Autoencoders (RAEs)!

SLIDE 37

Which regularization for RAEs?

SLIDE 38

Which regularization for RAEs?

Gradient penalization [Gulrajani et al. 2017; Mescheder et al. 2018]

SLIDE 39

Which regularization for RAEs?

Gradient penalization [Gulrajani et al. 2017; Mescheder et al. 2018]
Spectral normalization [Miyato et al. 2018]

SLIDE 40

Which regularization for RAEs?

Gradient penalization [Gulrajani et al. 2017; Mescheder et al. 2018]
Spectral normalization [Miyato et al. 2018]
Weight decay [Bishop et al. 1996]
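Putting the pieces together, a hedged sketch of an RAE-style objective using the simplest of the regularizers above: an L2 penalty on the codes plus decoder weight decay (gradient penalties or spectral normalization are drop-in alternatives). Function and argument names are my own:

```python
import numpy as np

def rae_loss(x, x_rec, z, decoder_weights, lam_z=1e-3, lam_w=1e-4):
    """RAE objective sketch: reconstruction + ||z||^2 penalty + weight decay.

    lam_z keeps the deterministic codes compact; lam_w smooths the decoder,
    replacing the KL term (and its one-sample noise) of a VAE.
    """
    rec = np.mean(np.sum((x - x_rec) ** 2, axis=-1))
    z_reg = np.mean(np.sum(z ** 2, axis=-1))              # compact latent codes
    w_reg = sum(np.sum(w ** 2) for w in decoder_weights)  # smooth decoder
    return rec + lam_z * z_reg + lam_w * w_reg
```

Note there is no sampling and no KL term anywhere: the encoder is deterministic, which is what sidesteps the balancing and collapse issues from the earlier slides.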

SLIDE 41

RAE for image generation

RAEs generate equally good or better samples and interpolations

RAE+L2 VAE

SLIDE 42

RAE for image generation

even when regularization is implicit!

AE RAE+L2 VAE

SLIDE 43

Common image benchmarks: MNIST

SLIDE 44

Common image benchmarks: CIFAR10

SLIDE 45

Common image benchmarks: CelebA

SLIDE 46

How do we sample from RAEs…?

SLIDE 47

Sampling RAEs…?

SLIDE 48

Ex-Post Density Estimation (XPDE)

SLIDE 49

Ex-Post Density Estimation (XPDE)

SLIDE 50

Which density estimator for XPDE?

SLIDE 51

Which density estimator for XPDE?

a SOTA deep generative model, e.g., an autoregressive model or a flow [van den Oord et al. 2019, Razavi et al. 2020]

...or another VAE! But then the VAE training and sampling issues are still there!

SLIDE 52

Which density estimator for XPDE?

striving for simplicity: just Gaussian Mixture Models
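A sketch of ex-post density estimation with a GMM, assuming scikit-learn is available and that `Z` holds the latent codes of the training set produced by a trained encoder (here a random placeholder):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder for latent codes from a trained (deterministic) encoder.
Z = np.random.default_rng(0).normal(size=(500, 8))

# Ex-post density estimation: fit a 10-component full-covariance GMM on the codes.
gmm = GaussianMixture(n_components=10, covariance_type="full", random_state=0).fit(Z)

# Sampling the autoencoder = sample codes from the GMM, then decode them.
z_new, _ = gmm.sample(64)
# x_new = decoder(z_new)  # decoder is whatever autoencoder you trained
```

The density estimator is fit after training, so it never interferes with the reconstruction objective; that is the whole trick.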

SLIDE 53

Can’t we just do XPDE for VAEs?

SLIDE 54

Can’t we just do XPDE for VAEs?

SLIDE 55

Can’t we just do XPDE for VAEs?

SLIDE 56

Ex-Post Density Estimation (XPDE)

XPDE consistently improves sample quality for all VAE variants

SLIDE 57

Why...does it work?

SLIDE 58

Why...does it work?

ConvNets are very, very, very smooth! [LeCun et al. 1994]

SLIDE 59

Why...does it work?

ConvNets are very, very, very smooth! [LeCun et al. 1994] ...and these datasets are full, full, full of regularities!

SLIDE 60

What about more challenging data?

E.g., generating structured objects like molecules

SLIDE 61

VAEs for molecules?

Molecule VAE [Gómez-Bombarelli et al. 2017]

GrammarVAE (GVAE) [Kusner et al. 2017]

Constrained Graph VAE (CGVAE) [Liu et al. 2018, ...]

⇒ ...

SLIDE 62

GRAE: RAEifying the Grammar VAE

More accurate generation than Kusner et al. 2017

SLIDE 63

RAEify your VAEs!

RAE VAE

SLIDE 64

RAEify your VAEs!

RAE VAE

SLIDE 65

RAEify your VAEs!

RAE VAE

SLIDE 66

Is this really simple… and new?

SLIDE 67

AEs for generative modeling

MCMC schemes to sample from Contractive [Rifai et al. 2011] and Denoising Autoencoders [Bengio et al. 2013]

SLIDE 68

Other flavours of XPDE

Two-Stage VAEs [Dai et al. 2019] use another VAE for XPDE ...but then the VAE training and sampling issues are still there!

VQ-VAEs [van den Oord et al. 2019, Razavi et al. 2020] use a PixelCNN over discrete latents ...but VQ-VAEs are RAEs, not VAEs!

SLIDE 69

What did we lose?

SLIDE 70

What did we lose?

Variational Autoencoders

(VAEs)

⇒ Generative modeling ✓
⇒ Density Estimation ✓
⇒ Disentanglement ✓

Regularized Autoencoders

(RAEs)

⇒ Generative modeling ✓
⇒ Density Estimation ?
⇒ Disentanglement ?

SLIDE 71

RAEs for density estimation ?

RAEs (and VQ-VAEs) are like GANs: they are implicit likelihood models!

SLIDE 72

RAEs for density estimation (?)

RAEs (and VQ-VAEs) are like GANs: they are implicit likelihood models! An approximate ELBO can be recovered under some geometric assumptions.

SLIDE 73

RAEs for disentanglement (?)

SLIDE 74

Conclusions

SLIDE 75

aiPhones

⇒ Phone capabilities
⇒ aiCloud, aiWatch, aiTunes, ...
⇒ 4k Video, ...

SLIDE 76

aiPhones RegularPhone

⇒ Phone capabilities
⇒ aiCloud, aiWatch, aiTunes, ...
⇒ 4k Video, ...

what is the simplest model that gets you further?

SLIDE 77

Takeaway #1: RAEify your VAEs!

RAE VAE

SLIDE 78

Takeaway #2: use XPDE!

Boost your VAEs by training a density estimator on the latent codes!

SLIDE 79

Paper: https://openreview.net/forum?id=S1g7tpEYDS
Code: https://github.com/ParthaEth/Regularized_autoencoders-RAE-