Disentangling Disentanglement in Variational Autoencoders (ICML 2019)



SLIDE 1

Disentangling Disentanglement in Variational Autoencoders

ICML 2019

Emile Mathieu⋆, Tom Rainforth⋆, N. Siddharth⋆, Yee Whye Teh
June 12, 2019

Departments of Statistics and Engineering Science, University of Oxford

SLIDE 2

Variational Autoencoders

[Figure: VAE graphical model. Latents z1–z4 generate observations x1–x5 via the generative model pθ(x|z); the inference model qφ(z|x) maps observations back to latents. Example latent factors: zl (gender), zm (beard), zn (makeup).]

SLIDE 3

Disentanglement Independence

[Figure: the same VAE graphical model, with latents annotated as meaningful factors: zl (gender), zm (beard), zn (makeup).]

SLIDE 4

Disentanglement = Independence

[Figure: the same VAE graphical model, with latents annotated as independent factors: zl (shape), zm (angle), zn (scale).]

SLIDE 5

Decomposition ∈ {Independence, Clustering, Sparsity, …}

[Figure: the same VAE graphical model, with latents annotated as co-related factors: zl (gender), zm (beard), zn (makeup).]

SLIDE 6

Decomposition: A Generalization of Disentanglement

We characterise decomposition as the fulfilment of two factors: (a) the level of overlap between encodings in the latent space, and (b) the match between the marginal posterior qφ(z) and a structured prior p(z) that encodes the required decomposition.

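Overlap here refers to how much the encodings qφ(z|x) of different datapoints overlap in latent space. The paper develops its own formalisation; as a rough illustration only (my construction, not the authors' definition), the entropy of a diagonal-Gaussian posterior tracks how broad, and hence how overlapping, encodings can be:

```python
import numpy as np

def diag_gaussian_entropy(log_var):
    """Entropy of a diagonal Gaussian N(mu, diag(exp(log_var))).

    H = 0.5 * sum_d (1 + log(2*pi) + log_var_d); it does not depend
    on the mean, only on how broad the posterior is.
    """
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(1.0 + np.log(2.0 * np.pi) + log_var, axis=-1)

# Broader posteriors (larger variances) -> higher entropy -> more
# overlap between the encodings of different datapoints.
narrow = diag_gaussian_entropy(np.full(4, -2.0))  # variances e^-2
broad = diag_gaussian_entropy(np.full(4, 0.0))    # unit variances
```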

SLIDE 7

Decomposition: An Analysis

[Figure: the desired latent structure, specified through the prior p(z).]

SLIDE 8

Decomposition: An Analysis

[Figure: insufficient overlap. Densities shown: pD(x), pθ(x), pθ(x|z), qφ(z|x), qφ(z), p(z).]

SLIDE 9

Decomposition: An Analysis

[Figure: too much overlap. Densities shown: pD(x), pθ(x), pθ(x|z), qφ(z|x), qφ(z), p(z).]

SLIDE 10

Decomposition: An Analysis

[Figure: appropriate overlap. Densities shown: pD(x), pθ(x), pθ(x|z), qφ(z|x), qφ(z), p(z).]

SLIDE 11

Overlap: Deconstructing the β-VAE

Lβ(x) = E_{qφ(z|x)}[log pθ(x|z)] − β · KL(qφ(z|x) ‖ p(z))
      = L^{π}(x; θ, φ)          (ELBO with β-annealed prior π_{θ,β}(z) ∝ p(z)^β)
        + (β − 1) · H[qφ(z|x)]  (maximum-entropy term)
        + log Fβ                (constant normaliser)

Implications: the β-VAE disentangles largely by controlling the level of overlap; it places no direct pressure on the latents to be independent!
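A minimal numpy sketch of the Lβ objective above, assuming a diagonal-Gaussian encoder and a standard-normal prior so the KL term has its usual closed form (the reconstruction term is passed in precomputed):

```python
import numpy as np

def kl_diag_gauss_to_std_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def beta_vae_loss(log_px_given_z, mu, log_var, beta=4.0):
    """Negative beta-VAE objective: -E[log p(x|z)] + beta * KL(q(z|x) || p(z)).

    beta > 1 up-weights the KL, which (per the analysis above) mainly
    tightens the level of overlap rather than enforcing independence.
    """
    return -log_px_given_z + beta * kl_diag_gauss_to_std_normal(mu, log_var)
```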

SLIDE 12

Decomposition: Objective

Lα,β(x) = E_{qφ(z|x)}[log pθ(x|z)]     (reconstruct observations)
          − β · KL(qφ(z|x) ‖ p(z))     (control level of overlap)
          − α · D(qφ(z), p(z))         (impose desired structure)
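The extra term D(qφ(z), p(z)) compares the aggregate posterior to the prior; the slide leaves D abstract. One common sample-based choice, used here purely as an illustration (the authors' estimator may differ), is a squared maximum mean discrepancy with an RBF kernel:

```python
import numpy as np

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased sample estimate of squared MMD between x and y (RBF kernel)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth**2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def decomposition_loss(log_px_given_z, kl_term, z_agg, z_prior, alpha, beta):
    """Negative L_{alpha,beta}: reconstruction + beta*overlap + alpha*structure."""
    return -log_px_given_z + beta * kl_term + alpha * rbf_mmd2(z_agg, z_prior)

rng = np.random.default_rng(0)
z_agg = rng.standard_normal((64, 2))    # samples from q_phi(z)
z_prior = rng.standard_normal((64, 2))  # samples from p(z)
```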

SLIDE 13

Decomposition: Generalising Disentanglement

Independence: p(z) = N(0, σ⋆)

Figure 1: β-VAE trained on 2D Shapes¹, computing disentanglement².

¹ Matthey et al., dSprites: Disentanglement testing Sprites dataset, p. 1.
² Kim and Mnih, "Disentangling by Factorising", p. 2.

SLIDE 14

Decomposition: Generalising Disentanglement

Clustering: p(z) = ∑_k ρk · N(µk, σk)

[Figure 2 panels: top row α = 0 with β ∈ {0.01, 0.5, 1.0, 1.2}; bottom row β = 0 with α ∈ {1, 3, 5, 8}.]

Figure 2: Density of the aggregate posterior qφ(z) with different α, β for the pinwheel dataset.³

³ http://hips.seas.harvard.edu/content/synthetic-pinwheel-data-matlab.
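For reference, the clustering prior above is a mixture of Gaussians; a small numpy sketch of its log-density (isotropic components, evaluated stably via log-sum-exp; parameter names are illustrative):

```python
import numpy as np

def log_gmm_density(z, weights, means, sigmas):
    """log p(z) for p(z) = sum_k rho_k * N(z; mu_k, sigma_k^2 * I)."""
    z = np.asarray(z, dtype=float)
    d = z.shape[-1]
    log_comps = []
    for rho, mu, sigma in zip(weights, means, sigmas):
        sq_dist = ((z - mu) ** 2).sum(-1)
        log_norm = -0.5 * d * np.log(2.0 * np.pi * sigma**2)
        log_comps.append(np.log(rho) + log_norm - sq_dist / (2.0 * sigma**2))
    log_comps = np.array(log_comps)
    m = log_comps.max(axis=0)  # log-sum-exp for numerical stability
    return m + np.log(np.exp(log_comps - m).sum(axis=0))
```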

SLIDE 15

Decomposition: Generalising Disentanglement

Sparsity: p(z) = ∏_d [(1 − γ) · N(zd; 0, 1) + γ · N(zd; 0, σ0²)]

[Plot: average latent magnitude per latent dimension (5–45), for the classes Trouser, Dress, Shirt.]

Figure 3: Sparsity of learnt representations for the Fashion-MNIST⁴ dataset.

⁴ Xiao, Rasul, and Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.
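The sparsity prior above is a per-dimension spike-and-slab mixture: with probability γ a dimension is drawn from a narrow "spike" N(0, σ0²), otherwise from the broad "slab" N(0, 1). A numpy sketch of its log-density (default γ and σ0 chosen for illustration, not taken from the paper):

```python
import numpy as np

def log_sparse_prior(z, gamma=0.8, sigma0=0.05):
    """log p(z) for p(z) = prod_d [(1-gamma)*N(z_d; 0, 1) + gamma*N(z_d; 0, sigma0^2)]."""
    z = np.asarray(z, dtype=float)

    def log_normal_pdf(x, sigma):
        return -0.5 * np.log(2.0 * np.pi * sigma**2) - x**2 / (2.0 * sigma**2)

    slab = np.log(1.0 - gamma) + log_normal_pdf(z, 1.0)  # broad component
    spike = np.log(gamma) + log_normal_pdf(z, sigma0)    # narrow component
    m = np.maximum(slab, spike)  # per-dimension log-sum-exp
    per_dim = m + np.log(np.exp(slab - m) + np.exp(spike - m))
    return per_dim.sum(-1)
```

The spike makes z = 0 far more likely than under a unit Gaussian, which is what pushes inactive dimensions toward zero.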

SLIDE 16

Decomposition: Generalising Disentanglement

Sparsity: p(z) = ∏_d [(1 − γ) · N(zd; 0, 1) + γ · N(zd; 0, σ0²)]

Figure 3: Latent-space traversals for "active" dimensions⁴: (a) d = 49 (leg separation), (b) d = 30 (dress width), (c) d = 19 (shirt fit), (d) d = 40 (sleeve style).

SLIDE 17

Decomposition: Generalising Disentanglement

Sparsity: p(z) = ∏_d [(1 − γ) · N(zd; 0, 1) + γ · N(zd; 0, σ0²)]

[Plot: average normalised sparsity (0.2–0.5) vs. regularisation strength α (200–1000), for γ ∈ {0, 0.8} and β ∈ {0.1, 1, 5}.]

Figure 3: Sparsity vs. regularisation strength α (higher is better)⁴.
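The slide does not spell out how "Avg. Normalised Sparsity" is computed. Hoyer's normalised sparsity is one standard measure with the same higher-is-better reading, shown here as an illustrative stand-in rather than the authors' exact metric:

```python
import numpy as np

def hoyer_sparsity(v, eps=1e-12):
    """Hoyer's normalised sparsity of a vector, in [0, 1].

    Based on the ratio of L1 to L2 norms: 1 for a one-hot vector
    (maximally sparse), 0 for a constant vector (maximally dense).
    """
    v = np.abs(np.asarray(v, dtype=float))
    n = v.size
    ratio = v.sum() / (np.sqrt((v**2).sum()) + eps)
    return (np.sqrt(n) - ratio) / (np.sqrt(n) - 1.0)
```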

SLIDE 18

Recap

We propose and develop:

  • Decomposition: a generalisation of disentanglement involving
    (a) the overlap of latent encodings, and
    (b) the match between qφ(z) and p(z).
  • A theoretical analysis of the β-VAE objective showing it primarily contributes to overlap.
  • An objective that incorporates both factors (a) and (b).
  • Experiments that showcase efficacy at different decompositions: independence, clustering, and sparsity.

SLIDE 19

Emile Mathieu, Tom Rainforth, N. Siddharth, Yee Whye Teh

Code: iffsid/disentangling-disentanglement
Paper: arXiv:1812.02833

Come talk to us at our poster: #5