
SLIDE 1

About generative aspects of Variational Autoencoders

LOD'19 - The Fifth International Conference on Machine Learning, Optimization, and Data Science
September 10-13, 2019, Certosa di Pontignano, Siena, Tuscany, Italy

Andrea Asperti
DISI - Department of Informatics: Science and Engineering
University of Bologna
Mura Anteo Zamboni 7, 40127, Bologna, ITALY
andrea.asperti@unibo.it

SLIDE 2

Generative Models

Generative models are meant to learn rich data distributions, allowing the sampling of new data. There are two main classes of generative models:

  • Generative Adversarial Networks (GANs)
  • Variational Autoencoders (VAEs)

At the current state of the art, GANs give better results. What is the problem with VAEs?

SLIDE 3

Deterministic autoencoder

An autoencoder is a network trained to reconstruct its input data from a learned internal representation (e.g., by minimizing the quadratic distance between input and reconstruction)

[Figure: Encoder DNN → Latent Space → Decoder DNN]
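As an illustration, a minimal sketch of such a network (not code from the talk; the PyTorch framework, layer sizes, and flattened 28x28 inputs are assumptions):

    # Minimal deterministic autoencoder trained with quadratic loss.
    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, input_dim=784, latent_dim=16):
            super().__init__()
            # Encoder DNN: input -> learned internal representation
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 256), nn.ReLU(),
                nn.Linear(256, latent_dim))
            # Decoder DNN: internal representation -> reconstruction
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 256), nn.ReLU(),
                nn.Linear(256, input_dim), nn.Sigmoid())

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = Autoencoder()
    x = torch.rand(32, 784)           # dummy batch of flattened images
    loss = nn.MSELoss()(model(x), x)  # quadratic reconstruction distance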

SLIDE 4

Deterministic autoencoder

An autoencoder is a network trained to reconstruct its input data from a learned internal representation (e.g., by minimizing the quadratic distance between input and reconstruction)

[Figure: Encoder DNN → Latent Space → Decoder DNN]

Can we use the decoder to generate data by sampling in the latent space?

SLIDE 5

Deterministic autoencoder

An autoencoder is a network trained to reconstruct its input data from a learned internal representation (e.g., by minimizing the quadratic distance between input and reconstruction)

[Figure: Encoder DNN → Latent Space → Decoder DNN]

Can we use the decoder to generate data by sampling in the latent space? No, since we do not know the distribution of latent variables.

SLIDE 6

Variational autoencoder

In a Variational Autoencoder (VAE) [9, 10, 6] we try to force latent variables to have a known distribution (e.g. a Normal distribution)

[Figure: Encoder DNN → latent space with z ∼ N(0,1) → Decoder DNN]
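If the latent variables really follow the prior, generation reduces to sampling z and decoding it; a minimal sketch (the decoder interface is an assumption):

    import torch

    def generate(decoder, n_samples=16, latent_dim=16):
        z = torch.randn(n_samples, latent_dim)  # z ~ N(0, 1)
        return decoder(z)                       # decoded new samples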

SLIDE 7

Variational autoencoder

In a Variational Autoencoder (VAE) [9, 10, 6] we try to force latent variables to have a known distribution (e.g. a Normal distribution)

[Figure: Encoder DNN → latent space with z ∼ N(0,1) → Decoder DNN]

How can we do it? Is this actually working?

SLIDE 8

The encoding distribution Q(z|X)

[Figure: a data point X₁ is mapped by the encoder to its encoding distribution Q(z|X₁) in the latent space]

SLIDE 9

Estimate relevant statistics for Q(z|X)

[Figure: the encoding distribution Q(z|X₁) of a data point X₁ in the latent space]

SLIDE 10

Estimate relevant statistics for Q(z|X)

[Figure: the encoder maps X₁ to the statistics µ(X₁) and σ(X₁), defining the Gaussian encoding distribution Q(z|X₁) = G(µ(X₁), σ(X₁))]

SLIDE 11

Estimate relevant statistics for Q(z|X)

[Figure: each data point gets its own Gaussian in the latent space: Q(z|X₁) = G(µ(X₁), σ(X₁)) and Q(z|X₂) = G(µ(X₂), σ(X₂))]

SLIDE 12

Estimate relevant statistics for Q(z|X)

[Figure: the encoding distributions of two data points X₁ and X₂ in the latent space]

We estimate the variance σ(X) around µ(X) by Gaussian sampling at training time.
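This sampling step is usually implemented with the reparameterization trick; a sketch (the log-variance parameterization is a common convention, not something prescribed by the slides):

    import torch

    def sample_latent(mu, log_var):
        # sigma(X) recovered from log(sigma^2(X)); exp keeps it positive
        sigma = torch.exp(0.5 * log_var)
        eps = torch.randn_like(sigma)  # eps ~ N(0, 1)
        return mu + sigma * eps        # z ~ G(mu(X), sigma(X))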

SLIDE 13

Kullback-Leibler regularization

[Figure: the encoding distributions of X₁ and X₂ in the latent space, compared with the prior N(0,1)]

Minimize the Kullback-Leibler divergence between each Q(z|X) and the standard normal distribution: KL(Q(z|X) || N(0,1))
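For Gaussians this divergence has the standard closed form KL(G(µ, σ²) || N(0,1)) = ½ (σ² + µ² − log(σ²) − 1) per latent variable; a sketch of the corresponding loss term (tensor shapes are an assumption):

    import torch

    def kl_to_standard_normal(mu, log_var):
        # 1/2 * (sigma^2 + mu^2 - log(sigma^2) - 1), summed over latent dims
        return 0.5 * torch.sum(log_var.exp() + mu ** 2 - log_var - 1, dim=-1)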

SLIDE 14

The marginal posterior

[Figure: the encoding distributions of X₁ and X₂ in the latent space, compared with the prior N(0,1)]

The actual distribution of latent variables is the marginal (aka aggregate) distribution Q(z), hopefully resembling the prior P(z) = N(0,1):

    Q(z) = E_{X∼P(X)} [ Q(z|X) ] ≈ N(0,1)

SLIDE 15

MNIST case

[Figure: arrangement in the latent space of 100 MNIST digits after 10 epochs of training]

It does indeed have a Gaussian shape... Why?

SLIDE 16

Why is KL-divergence working?

Many different answers... and a relatively complex theory. In this work, we investigate the marginal posterior distribution as a Gaussian Mixture Model (GMM), with one Gaussian for each data point.
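Under this view, Q(z) is a uniform mixture with one Gaussian component per training point: to sample from it, pick a random data point and then sample from its encoding distribution. A sketch (the tensor layout is an assumption):

    import torch

    def sample_marginal_posterior(mu_all, sigma_all, n_samples=1):
        # mu_all, sigma_all: (N, latent_dim) encoder statistics on the dataset
        idx = torch.randint(0, mu_all.shape[0], (n_samples,))  # pick components
        eps = torch.randn(n_samples, mu_all.shape[1])
        return mu_all[idx] + sigma_all[idx] * eps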

SLIDE 17

The normalization idea

  • For a neural network, it is relatively easy to perform an affine transformation of the latent space.
  • The transformation can be compensated in the next layer of the network, keeping the loss invariant (the same idea behind batch-normalization layers).
  • This means we may assume the network is able to keep a fixed ratio ρ between the variance and the mean value of each latent variable.

SLIDE 18

Pushing ρ in KL-divergence

Pushing ρ into the closed form of the KL-divergence (i.e., substituting µ²(X) = σ²(X)/ρ²), we get the expression

    ½ ( σ²(X)·(1 + ρ²)/ρ² − log(σ²(X)) − 1 )

which has a minimum when σ²(X) + µ²(X) = 1
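The minimum can be verified symbolically; a small check with sympy (where s stands for σ²(X)):

    import sympy as sp

    s, rho = sp.symbols('s rho', positive=True)   # s = sigma^2(X)
    kl = sp.Rational(1, 2) * (s * (1 + rho**2) / rho**2 - sp.log(s) - 1)
    s_min = sp.solve(sp.diff(kl, s), s)[0]        # sigma^2(X) = rho^2/(1 + rho^2)
    mu2 = s_min / rho**2                          # mu^2(X) = 1/(1 + rho^2)
    print(sp.simplify(s_min + mu2))               # prints 1: sigma^2 + mu^2 = 1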

SLIDE 19

Corollaries

  • The variance law: averaging over all X, we expect that for each latent variable z

        σ²_z(X) + σ²_z = 1

    (supposing µ_z(X) = 0); an empirical check is sketched below.
  • By effect of the KL divergence, the first two moments of the distribution of each latent variable should agree with those of a normal N(0, 1) distribution.
  • What about the other moments? Hard to guess.
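A sketch of that empirical check on a trained encoder (the function name and tensor shapes are assumptions): the average encoder variance plus the variance of the encoder means should be close to 1 for each latent variable.

    import torch

    def check_variance_law(mu_all, log_var_all):
        # mu_all, log_var_all: (N, latent_dim) encoder outputs on the dataset
        avg_enc_var = log_var_all.exp().mean(dim=0)  # mean of sigma^2_z(X) over X
        var_of_means = mu_all.var(dim=0)             # variance sigma^2_z of mu_z(X)
        return avg_enc_var + var_of_means            # should be close to 1 per z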

SLIDE 20

Conclusion

For several years, the mediocre performance of VAEs has been attributed to the so-called overpruning phenomenon [2, 11, 12]. Recent research suggests that the problem is instead due to the mismatch between the latent distribution and the normal prior [4, 5, 1, 7]. Our contribution: we may reasonably expect the KL-divergence to force the first two moments of the distribution to agree with those of a normal distribution, but we can hardly presume the same for the other moments.

SLIDE 21

Essential bibliography (1)

[1] Andrea Asperti. Variational Autoencoders and the Variable Collapse Phenomenon. Sensors & Transducers, 234(3):1-8, 2018.
[2] Yuri Burda, Roger B. Grosse, and Ruslan Salakhutdinov. Importance weighted autoencoders. CoRR, abs/1509.00519, 2015.
[3] Christopher P. Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. Understanding disentangling in β-VAE. 2018.
[4] Bin Dai, Yu Wang, John Aston, Gang Hua, and David P. Wipf. Connections with robust PCA and the role of emergent sparsity in variational autoencoder models. Journal of Machine Learning Research, 19, 2018.
[5] Bin Dai and David P. Wipf. Diagnosing and enhancing VAE models. In Seventh International Conference on Learning Representations (ICLR 2019), May 6-9, New Orleans, 2019.
[6] Carl Doersch. Tutorial on variational autoencoders. CoRR, abs/1606.05908, 2016.

SLIDE 22

Essential bibliography (2)

[7] Partha Ghosh, Mehdi S. M. Sajjadi, Antonio Vergari, Michael J. Black, and Bernhard Schölkopf. From Variational to Deterministic Autoencoders. CoRR, abs/1903.12436, 2019.
[8] Diederik P. Kingma, Tim Salimans, and Max Welling. Improving variational inference with inverse autoregressive flow. CoRR, abs/1606.04934, 2016.
[9] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. CoRR, abs/1312.6114, 2013.
[10] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning (ICML 2014), Beijing, China, 21-26 June 2014, volume 32 of JMLR Workshop and Conference Proceedings, pages 1278-1286. JMLR.org, 2014.
[11] Serena Yeung, Anitha Kannan, and Yann Dauphin. Epitomic variational autoencoder. 2017.
[12] Serena Yeung, Anitha Kannan, Yann Dauphin, and Li Fei-Fei. Tackling over-pruning in variational autoencoders. CoRR, abs/1706.03643, 2017.