Disentangling Disentanglement in Variational Autoencoders (ICML 2019)



SLIDE 1

Disentangling Disentanglement in Variational Autoencoders

ICML 2019

Emile Mathieu⋆, Tom Rainforth⋆, N. Siddharth⋆, Yee Whye Teh
June 12, 2019

Departments of Statistics and Engineering Science, University of Oxford

SLIDE 2

Variational Autoencoders

[Figure: VAE graphical model. Latents z1–z4 generate observations x1–x5 via the generative model pθ(x|z); the inference model qφ(z|x) maps observations back to latents. Example latent factors: zl (gender), zm (beard), zn (makeup).]

SLIDE 3

Disentanglement Independence

[Figure: the same VAE graphical model, with latents annotated as meaningful factors: zl (gender), zm (beard), zn (makeup).]

SLIDE 4

Disentanglement = Independence

[Figure: the same VAE graphical model, with latents annotated as independent factors: zl (shape), zm (angle), zn (scale).]

SLIDE 5

Decomposition ∈ {Independence, Clustering, Sparsity, …}

[Figure: the same VAE graphical model, with latents annotated as co-related factors: zl (gender), zm (beard), zn (makeup).]

SLIDE 6

Decomposition: A Generalization of Disentanglement

We characterise decomposition as the fulfilment of two factors: (a) the level of overlap between encodings in the latent space, and (b) the match between the marginal posterior qφ(z) and a structured prior p(z) that encodes the required decomposition.

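Overlap here refers to how much the encodings qφ(z|x) of different datapoints overlap in latent space. The paper develops its own formalisation; as a rough illustration only (my construction, not the authors' definition), the entropy of a diagonal-Gaussian posterior tracks how broad, and hence how overlapping, encodings can be:

```python
import numpy as np

def diag_gaussian_entropy(log_var):
    """Entropy of a diagonal Gaussian N(mu, diag(exp(log_var))).

    H = 0.5 * sum_d (1 + log(2*pi) + log_var_d); it does not depend
    on the mean, only on how broad the posterior is.
    """
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(1.0 + np.log(2.0 * np.pi) + log_var, axis=-1)

# Broader posteriors (larger variances) -> higher entropy -> more
# overlap between the encodings of different datapoints.
narrow = diag_gaussian_entropy(np.full(4, -2.0))  # variances e^-2
broad = diag_gaussian_entropy(np.full(4, 0.0))    # unit variances
```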

SLIDE 7

Decomposition: An Analysis

[Figure: the desired latent structure, specified through the prior p(z).]

SLIDE 8

Decomposition: An Analysis

[Figure: insufficient overlap. Densities shown: pD(x), pθ(x), pθ(x|z), qφ(z|x), qφ(z), p(z).]

SLIDE 9

Decomposition: An Analysis

[Figure: too much overlap. Densities shown: pD(x), pθ(x), pθ(x|z), qφ(z|x), qφ(z), p(z).]

SLIDE 10

Decomposition: An Analysis

[Figure: appropriate overlap. Densities shown: pD(x), pθ(x), pθ(x|z), qφ(z|x), qφ(z), p(z).]

SLIDE 11

Overlap: Deconstructing the β-VAE

Lβ(x) = E_{qφ(z|x)}[log pθ(x|z)] − β · KL(qφ(z|x) ‖ p(z))
      = L^{π}(x; θ, φ)          (ELBO with β-annealed prior π_{θ,β}(z) ∝ p(z)^β)
        + (β − 1) · H[qφ(z|x)]  (maximum-entropy term)
        + log Fβ                (constant normaliser)

Implications: the β-VAE disentangles largely by controlling the level of overlap; it places no direct pressure on the latents to be independent!
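A minimal numpy sketch of the Lβ objective above, assuming a diagonal-Gaussian encoder and a standard-normal prior so the KL term has its usual closed form (the reconstruction term is passed in precomputed):

```python
import numpy as np

def kl_diag_gauss_to_std_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def beta_vae_loss(log_px_given_z, mu, log_var, beta=4.0):
    """Negative beta-VAE objective: -E[log p(x|z)] + beta * KL(q(z|x) || p(z)).

    beta > 1 up-weights the KL, which (per the analysis above) mainly
    tightens the level of overlap rather than enforcing independence.
    """
    return -log_px_given_z + beta * kl_diag_gauss_to_std_normal(mu, log_var)
```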

SLIDE 12

Decomposition: Objective

Lα,β(x) = E_{qφ(z|x)}[log pθ(x|z)]     (reconstruct observations)
          − β · KL(qφ(z|x) ‖ p(z))     (control level of overlap)
          − α · D(qφ(z), p(z))         (impose desired structure)
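The extra term D(qφ(z), p(z)) compares the aggregate posterior to the prior; the slide leaves D abstract. One common sample-based choice, used here purely as an illustration (the authors' estimator may differ), is a squared maximum mean discrepancy with an RBF kernel:

```python
import numpy as np

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased sample estimate of squared MMD between x and y (RBF kernel)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth**2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def decomposition_loss(log_px_given_z, kl_term, z_agg, z_prior, alpha, beta):
    """Negative L_{alpha,beta}: reconstruction + beta*overlap + alpha*structure."""
    return -log_px_given_z + beta * kl_term + alpha * rbf_mmd2(z_agg, z_prior)

rng = np.random.default_rng(0)
z_agg = rng.standard_normal((64, 2))    # samples from q_phi(z)
z_prior = rng.standard_normal((64, 2))  # samples from p(z)
```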

SLIDE 13

Decomposition: Generalising Disentanglement

Independence: p(z) = N(0, σ⋆)

Figure 1: β-VAE trained on 2D Shapes¹, computing disentanglement².

¹ Matthey et al., dSprites: Disentanglement testing Sprites dataset, p. 1.
² Kim and Mnih, "Disentangling by Factorising", p. 2.

SLIDE 14

Decomposition: Generalising Disentanglement

Clustering: p(z) = ∑_k ρk · N(µk, σk)

[Figure 2 panels: top row α = 0 with β ∈ {0.01, 0.5, 1.0, 1.2}; bottom row β = 0 with α ∈ {1, 3, 5, 8}.]

Figure 2: Density of the aggregate posterior qφ(z) with different α, β for the pinwheel dataset.³

³ http://hips.seas.harvard.edu/content/synthetic-pinwheel-data-matlab.
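For reference, the clustering prior above is a mixture of Gaussians; a small numpy sketch of its log-density (isotropic components, evaluated stably via log-sum-exp; parameter names are illustrative):

```python
import numpy as np

def log_gmm_density(z, weights, means, sigmas):
    """log p(z) for p(z) = sum_k rho_k * N(z; mu_k, sigma_k^2 * I)."""
    z = np.asarray(z, dtype=float)
    d = z.shape[-1]
    log_comps = []
    for rho, mu, sigma in zip(weights, means, sigmas):
        sq_dist = ((z - mu) ** 2).sum(-1)
        log_norm = -0.5 * d * np.log(2.0 * np.pi * sigma**2)
        log_comps.append(np.log(rho) + log_norm - sq_dist / (2.0 * sigma**2))
    log_comps = np.array(log_comps)
    m = log_comps.max(axis=0)  # log-sum-exp for numerical stability
    return m + np.log(np.exp(log_comps - m).sum(axis=0))
```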

SLIDE 15

Decomposition: Generalising Disentanglement

Sparsity: p(z) = ∏_d [(1 − γ) · N(zd; 0, 1) + γ · N(zd; 0, σ0²)]

[Plot: average latent magnitude per latent dimension (5–45), for the classes Trouser, Dress, Shirt.]

Figure 3: Sparsity of learnt representations for the Fashion-MNIST⁴ dataset.

⁴ Xiao, Rasul, and Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.
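The sparsity prior above is a per-dimension spike-and-slab mixture: with probability γ a dimension is drawn from a narrow "spike" N(0, σ0²), otherwise from the broad "slab" N(0, 1). A numpy sketch of its log-density (default γ and σ0 chosen for illustration, not taken from the paper):

```python
import numpy as np

def log_sparse_prior(z, gamma=0.8, sigma0=0.05):
    """log p(z) for p(z) = prod_d [(1-gamma)*N(z_d; 0, 1) + gamma*N(z_d; 0, sigma0^2)]."""
    z = np.asarray(z, dtype=float)

    def log_normal_pdf(x, sigma):
        return -0.5 * np.log(2.0 * np.pi * sigma**2) - x**2 / (2.0 * sigma**2)

    slab = np.log(1.0 - gamma) + log_normal_pdf(z, 1.0)  # broad component
    spike = np.log(gamma) + log_normal_pdf(z, sigma0)    # narrow component
    m = np.maximum(slab, spike)  # per-dimension log-sum-exp
    per_dim = m + np.log(np.exp(slab - m) + np.exp(spike - m))
    return per_dim.sum(-1)
```

The spike makes z = 0 far more likely than under a unit Gaussian, which is what pushes inactive dimensions toward zero.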

SLIDE 16

Decomposition: Generalising Disentanglement

Sparsity: p(z) = ∏_d [(1 − γ) · N(zd; 0, 1) + γ · N(zd; 0, σ0²)]

Figure 3: Latent-space traversals for "active" dimensions⁴: (a) d = 49 (leg separation), (b) d = 30 (dress width), (c) d = 19 (shirt fit), (d) d = 40 (sleeve style).

SLIDE 17

Decomposition: Generalising Disentanglement

Sparsity: p(z) = ∏_d [(1 − γ) · N(zd; 0, 1) + γ · N(zd; 0, σ0²)]

[Plot: average normalised sparsity (0.2–0.5) vs. regularisation strength α (200–1000), for γ ∈ {0, 0.8} and β ∈ {0.1, 1, 5}.]

Figure 3: Sparsity vs. regularisation strength α (higher is better)⁴.
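The slide does not spell out how "Avg. Normalised Sparsity" is computed. Hoyer's normalised sparsity is one standard measure with the same higher-is-better reading, shown here as an illustrative stand-in rather than the authors' exact metric:

```python
import numpy as np

def hoyer_sparsity(v, eps=1e-12):
    """Hoyer's normalised sparsity of a vector, in [0, 1].

    Based on the ratio of L1 to L2 norms: 1 for a one-hot vector
    (maximally sparse), 0 for a constant vector (maximally dense).
    """
    v = np.abs(np.asarray(v, dtype=float))
    n = v.size
    ratio = v.sum() / (np.sqrt((v**2).sum()) + eps)
    return (np.sqrt(n) - ratio) / (np.sqrt(n) - 1.0)
```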

SLIDE 18

Recap

We propose and develop:

  • Decomposition: a generalisation of disentanglement involving
    (a) the overlap of latent encodings, and
    (b) the match between qφ(z) and p(z).
  • A theoretical analysis of the β-VAE objective showing it primarily contributes to overlap.
  • An objective that incorporates both factors (a) and (b).
  • Experiments that showcase efficacy at different decompositions: independence, clustering, and sparsity.

SLIDE 19

Emile Mathieu, Tom Rainforth, N. Siddharth, Yee Whye Teh

Code: iffsid/disentangling-disentanglement
Paper: arXiv:1812.02833

Come talk to us at our poster: #5