1 / 53
Understanding and Organising the Latent Space of Autoencoders
Alasdair Newson
Télécom ParisTech alasdair.newson@telecom-paristech.fr
6 February, 2020
2 / 53
This work was carried out in collaboration with the following colleagues
Andrés Almansa (Université Paris Descartes)
Saïd Ladjal (Télécom ParisTech)
Yann Gousseau (Télécom ParisTech)
Chi-Hieu Pham (Télécom ParisTech)
3 / 53
What are autoencoders? Deep neural networks
Cascaded operations: linear transformations, convolutions, non-linearities
Great flexibility: they can approximate a large class of functions
Autoencoder: a neural network designed for compressing and then uncompressing data
[Diagram: Encoder → latent space → Decoder]
The lower-dimensional space in the middle is known as the latent space
4 / 53
What are autoencoders used for?
Synthesis of high-level/abstract images
Autoencoder-type networks designed for synthesis are known as generative models
E.g. Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs)
Density estimation using Real NVP, L. Dinh, J. Sohl-Dickstein, S. Bengio, arXiv 2016
These produce impressive results; however, autoencoder mechanisms and latent spaces are not well understood
Goal of our work: understand the underlying mechanisms, and create interpretable and navigable latent spaces
5 / 53
Understanding and Organising the Latent Space of Autoencoders
[Diagram: Encoder → latent space → Decoder]
Subjects of this talk:
1. Understand how autoencoders can encode/decode basic geometric attributes of images: size and position
2. Propose an autoencoder algorithm which aims to separate different image attributes in the latent space: a PCA-like autoencoder that encourages ordered and decorrelated latent spaces
6 / 53
1. Autoencoding size
2. Autoencoding position
3. PCA-like autoencoder
7 / 53
We are interested in understanding how autoencoders can encode/decode shapes
Example: latent space interpolation in a generative model†
A simple example of such a shape is a disk
How can an autoencoder encode and decode a disk? We present our problem setup now
† Generative Visual Manipulation on the Natural Image Manifold, J-Y. Zhu, P. Krähenbühl, E. Shechtman, A. Efros, ECCV 2016
8 / 53
Autoencoding size
Can AEs encode and decode a disk "optimally", and if so, how?
Training set: images of size 64 × 64, each containing a disk
Blurred slightly to avoid discrete parameterisation
Each image contains one centred disk of random radius r
Optimality means perfect reconstruction, x = D ∘ E(x), with the smallest latent dimension d possible (here d = 1)
E is the encoder, D is the decoder
9 / 53
Disk autoencoder design
Encoder: 6 blocks of [Conv 3×3 + bias + LeakyReLU + subsampling]
Decoder: 6 blocks of [Conv 3×3 + bias + LeakyReLU + upsampling]
Four operations: convolution, sub/up-sampling, additive biases, and the Leaky ReLU:
φ_α(t) = t if t > 0, αt if t ≤ 0
The number of layers is determined by the subsampling factor s = 1/2
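To make the architecture concrete, here is a minimal PyTorch sketch of such a network; the number of feature channels (8), the LeakyReLU slope (0.2) and the use of average pooling for the subsampling are assumptions, not the exact configuration used in the talk.

```python
import torch
import torch.nn as nn

def down_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # Conv 3x3 + bias
        nn.LeakyReLU(0.2),
        nn.AvgPool2d(2),                                   # subsampling, factor s = 1/2
    )

def up_block(c_in, c_out):
    return nn.Sequential(
        nn.Upsample(scale_factor=2),                       # upsampling
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.LeakyReLU(0.2),
    )

# 64x64 input and 6 blocks on each side: the encoder reduces the image to a single
# value (latent dimension d = 1), the decoder mirrors it back to 64x64.
channels = [1, 8, 8, 8, 8, 8, 1]
encoder = nn.Sequential(*[down_block(channels[i], channels[i + 1]) for i in range(6)])
decoder = nn.Sequential(*[up_block(channels[6 - i], channels[5 - i]) for i in range(6)])
```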
10 / 53
Disk autoencoding training minimisation problem
(Θ̂_E, Θ̂_D) = argmin_{Θ_E, Θ_D} Σ_r ‖ D ∘ E(x_r) − x_r ‖₂²    (1)
Θ_E, Θ_D: parameters of the network (weights and biases)
x_r: image containing a disk of radius r
NB: We do not enter into the minimisation details here (Adam optimiser)
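A hedged sketch of this minimisation, reusing the `encoder`/`decoder` modules from the previous sketch; the on-the-fly disk generator, batch size and Adam settings are assumptions, since the talk does not give the exact training details.

```python
import torch

def random_disk_batch(m=16, n=64):
    r = 2.0 + torch.rand(m) * (n / 2 - 4)                   # random radii
    yy, xx = torch.meshgrid(torch.arange(n), torch.arange(n), indexing="ij")
    d2 = (xx - (n - 1) / 2.0) ** 2 + (yy - (n - 1) / 2.0) ** 2
    # binary disks here; the slides blur them slightly
    return (d2[None] <= (r ** 2)[:, None, None]).float().unsqueeze(1)  # (m, 1, n, n)

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for step in range(10_000):
    x_r = random_disk_batch()
    loss = ((decoder(encoder(x_r)) - x_r) ** 2).mean()      # squared L2 error of Eq. (1)
    opt.zero_grad(); loss.backward(); opt.step()
```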
11 / 53
First question: can we compress disks to 1 dimension? Yes!
[Figure: input disks (x) and reconstructed outputs (y)]
Let us try to understand how this works
12 / 53
How does the autoencoder work in the case of disks?
First idea: inspect the network weights
Unfortunately, these are very difficult to interpret
[Figure: example of the learned weights (3 × 3 convolutions)]
13 / 53
How does the encoder work? Inspect the latent space
Encoding is simple to understand: an averaging filter gives the area of the disk∗
How about decoding?
Inspecting the weights and biases is tricky
We can describe the decoding function when we remove the biases (ablation study)
∗In fact, one can show that the optimal encoding is indeed the area, when a contractive loss is used
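A small illustration of why an averaging filter suffices for encoding: the mean of a binary disk image is π r² / n², from which the radius follows. The image size and the rasterisation of the disk are assumptions made for this sketch.

```python
import numpy as np

def disk_image(r, n=64):
    yy, xx = np.mgrid[:n, :n] - (n - 1) / 2.0
    return ((xx ** 2 + yy ** 2) <= r ** 2).astype(float)

for r in (5.0, 10.0, 20.0):
    z = disk_image(r).mean()                   # "encoder": a single averaging filter
    print(r, np.sqrt(z * 64 ** 2 / np.pi))     # approximately recovers the radius r
```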
14 / 53
Ablation study: remove the biases of the network
[Figure: input disks and corresponding outputs of the bias-free network, with 1D profiles comparing the disk profile and the output profile]
15 / 53
Positive Multiplicative Action of the Decoder Without Bias
Consider a decoder without biases, with D_{ℓ+1} = LeakyReLU_α(U(D_ℓ) ∗ w_ℓ), where U is an upsampling operator. In this case, we have
∀z, ∀λ ∈ ℝ₊, D(λz) = λ D(z).    (2)
Proof (for one layer, and hence by composition for the whole decoder):
D(λz) = LeakyReLU_α(U(λz) ∗ w_ℓ)
      = λ max(U(z) ∗ w_ℓ, 0) + λα min(U(z) ∗ w_ℓ, 0)
      = λ LeakyReLU_α(U(z) ∗ w_ℓ)
      = λ D(z)
The output can therefore be written y = h(r) f, with f learned during training
In the case without bias, we can thus rewrite the training problem in a simpler form
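The property of Equation (2) is easy to verify numerically; the sketch below builds an arbitrary bias-free LeakyReLU decoder (the layer sizes are assumptions, only the absence of biases matters) and checks positive homogeneity.

```python
import torch
import torch.nn as nn

bias_free_decoder = nn.Sequential(
    nn.Upsample(scale_factor=2),
    nn.Conv2d(1, 4, kernel_size=3, padding=1, bias=False),
    nn.LeakyReLU(0.2),
    nn.Upsample(scale_factor=2),
    nn.Conv2d(4, 1, kernel_size=3, padding=1, bias=False),
    nn.LeakyReLU(0.2),
)

z = torch.randn(1, 1, 4, 4)
lam = 3.7
# positive homogeneity: D(lam * z) == lam * D(z)
print(torch.allclose(bias_free_decoder(lam * z), lam * bias_free_decoder(z), atol=1e-5))  # True
```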
16 / 53
Disk autoencoding training problem (continuous case), without biases
f̂ = argmax_f ∫_ℝ ⟨f, 𝟙_{B_r}⟩² dr    (3)
Proof: the continuous training minimisation problem can be written as
(f̂, ĥ) = argmin_{f,h} ∫_ℝ ∫ (h(r) f(t) − 𝟙_{B_r}(t))² dt dr    (4)
Also, for a fixed f, the optimal h is given by
ĥ(r) = ⟨f, 𝟙_{B_r}⟩ / ‖f‖₂²    (5)
17 / 53
We insert the optimal ĥ(r) and choose the (arbitrary) normalisation ‖f‖₂² = 1
This gives us the final result:
f̂ = argmin_f ∫_ℝ −⟨f, 𝟙_{B_r}⟩² dr    (6)
  = argmax_f ∫_ℝ ⟨f, 𝟙_{B_r}⟩² dr    (7)
Since the disks are radially symmetric, the integration can be simplified to one dimension
The first variation of the functional in Equation (3) leads to a differential equation of Airy type:
f″(ρ) = −k ρ f(ρ),    (8)
with f(0) = 1, f′(0) = 0
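For illustration, the radial profile can be obtained by integrating this ODE numerically; the constant k, the integration range and the use of `scipy.integrate.solve_ivp` are assumptions made for this sketch.

```python
import numpy as np
from scipy.integrate import solve_ivp

k = 1.0                                                     # assumed constant
sol = solve_ivp(lambda rho, y: [y[1], -k * rho * y[0]],     # y = (f, f')
                t_span=(0.0, 32.0), y0=[1.0, 0.0], dense_output=True)
rho = np.linspace(0.0, 32.0, 200)
f = sol.sol(rho)[0]                                         # radial profile, up to the scaling h(r)
```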
18 / 53
The functional is indeed minimised by the training procedure
[Figure: f(t) as a function of t, comparing the result of the autoencoder, the numerical minimisation of the energy, and Airy's function]
19 / 53
Summary
Encoder: integration (an averaging filter) is sufficient
Decoder: a learned function, scaled and thresholded
The encoder extracts the parameter of the shape (here, the radius)
The decoder contains a "primitive" of the shape
The parametrisation of this shape uses the latent space
20 / 53
Summary
Further work: apply this to the scaling of any shape
Useful for understanding how autoencoders process binary images
Examples: scaled MNIST data, corpus callosum data (MRI images)
21 / 53
1. Autoencoding size
2. Autoencoding position
3. PCA-like autoencoder
22 / 53
The second characteristic we wish to extract is position
In many cases, the objects in images are somewhat centred, but not completely
Autoencoders still need to be able to describe position
23 / 53
Few works concentrate on the positional aspect of autoencoders; one example is "CoordConv"∗
Its solution to the position problem: explicitly add spatial information
However, we wish to understand how an autoencoder can do this without explicit "instructions" (in an unsupervised manner)
∗R. Liu et al, An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution, NIPS, 2018.
24 / 53
We first studied the capacity of an autoencoder to encode position
Consider the 1D case of a one-hot vector δ_a (a Dirac impulse), with a 1 at position a, a = 0, …, n − 1:
δ_a(i) = 1 if i = a, 0 otherwise    (9)
It turns out that extracting the position a from δ_a can be achieved with a simple filter and subsampling, with the filter
ϕ = [1, 2, 1]    (10)
We subsample at the even positions
25 / 53
We denote by u^(ℓ) the output of layer ℓ. A "layer" is one filtering and subsampling operation.
x                          u^(1)          u^(2)    u^(3)
[1, 0, 0, 0, 0, 0, 0, 0]   [2, 0, 0, 0]   [4, 0]   [8]
[0, 1, 0, 0, 0, 0, 0, 0]   [1, 1, 0, 0]   [3, 1]   [7]
[0, 0, 1, 0, 0, 0, 0, 0]   [0, 2, 0, 0]   [2, 2]   [6]
[0, 0, 0, 1, 0, 0, 0, 0]   [0, 1, 1, 0]   [1, 3]   [5]
[0, 0, 0, 0, 1, 0, 0, 0]   [0, 0, 2, 0]   [0, 4]   [4]
[0, 0, 0, 0, 0, 1, 0, 0]   [0, 0, 1, 1]   [0, 3]   [3]
[0, 0, 0, 0, 0, 0, 1, 0]   [0, 0, 0, 2]   [0, 2]   [2]
[0, 0, 0, 0, 0, 0, 0, 1]   [0, 0, 0, 1]   [0, 1]   [1]
Table 1: Results for all possible one-hot vectors of size eight in the simple linear network with filter ϕ
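The table can be reproduced with a few lines of NumPy; the zero-padding at the borders is an assumption about the boundary handling.

```python
import numpy as np

def E_L(x):
    # cascade of filtering by phi = [1, 2, 1] and subsampling by 2, down to one value
    while len(x) > 1:
        padded = np.concatenate(([0.0], x, [0.0]))               # zero-padded borders
        filtered = padded[:-2] + 2 * padded[1:-1] + padded[2:]   # filter [1, 2, 1]
        x = filtered[::2]                                        # keep the even positions
    return x[0]

n = 8                                                            # 2**L with L = 3
print([int(E_L(np.eye(n)[a])) for a in range(n)])                # [8, 7, 6, 5, 4, 3, 2, 1]
```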
26 / 53
Encoding position in an autoencoder
Let E_L refer to the linear network created by a cascade of filterings with ϕ and subsamplings by a factor of 1/2
The network E_L indeed extracts the (inverted) position a from δ_a: E_L(δ_a) = 2^L − a
Proof: induction argument over the number of layers
Hypothesis: E_L contains L hidden layers and extracts the position of δ_a ∈ ℝ^(2^L)
Induction step: after adding a layer to E_L, the position is still correctly extracted
27 / 53
The predicted weights are indeed found during training of an encoder∗
The (normalised) weights are correctly predicted
Note: the 3D representation is for easier viewing; the weights are 1D
[Figure: experimental position encoder weights]
∗ The encoder was explicitly given the position as the label
28 / 53
Decoding position is more difficult to analyse (ongoing work)
Given the position a as an input, it is possible to train a decoder to produce δ_a; however, this does not give reliable results
This is due to the very limited number of (discrete) Dirac impulses
It can be partly addressed by using another approximation of a Dirac, where a is now a continuous parameter:
δ_a(t) = 1 − (a − ⌊a⌋) if t = ⌊a⌋,  1 − (⌈a⌉ − a) if t = ⌈a⌉,  0 otherwise    (11)
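A small sketch of this continuous-position approximation of a Dirac (Eq. (11)); the handling of integer positions, where the two cases coincide, is an assumption.

```python
import numpy as np

def soft_dirac(a, n):
    # linear-interpolation Dirac at continuous position a, as in Eq. (11)
    d = np.zeros(n)
    lo, hi = int(np.floor(a)), int(np.ceil(a))
    if lo == hi:
        d[lo] = 1.0                 # integer position: a single entry equal to 1 (assumed)
    else:
        d[lo] = 1.0 - (a - lo)      # weight on the left neighbour
        d[hi] = 1.0 - (hi - a)      # weight on the right neighbour
    return d

print(soft_dirac(3.25, 8))          # [0, 0, 0, 0.75, 0.25, 0, 0, 0]
```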
29 / 53
Summary
Main takeaway: encoding position is not difficult (in this simplified setting)
It does not even require a non-linearity
However, a crucial point: strided convolution (convolution + subsampling) is necessary
Max pooling makes this behaviour break down
30 / 53
1. Autoencoding size
2. Autoencoding position
3. PCA-like autoencoder
31 / 53
Autoencoders extract the essential information of data and represent it in the latent space
The latent space is often poorly understood and difficult to manipulate
It is not clear what each of the axes of the latent space means
Components can be correlated, which is also known as entanglement (attributes are mixed up)
This is a key issue for generative networks, since many works propose to navigate in the latent space
Disentanglement appears as a natural requirement of generative models
[Figure: an entangled latent space]
32 / 53
Most works use a supervised approach: they have access to the labels which need to be disentangled
"Fader Networks"∗ isolate certain attributes in the latent space
We want an unsupervised autoencoder to discover independent characteristics
We cannot annotate geometry/colour, yet we wish to control them separately
[Figure: result of Zhu et al.∗, interpolation in the latent space]
∗G. Lample et al., Fader Networks: Manipulating Images by Sliding Attributes, NIPS, 2017. ∗Zhu et al., Generative Visual Manipulation on the Natural Image Manifold, ECCV, 2016
33 / 53
We present an unsupervised algorithm to achieve these goals
First remark: the autoencoder greatly resembles Principal Component Analysis (PCA)
Major differences between the autoencoder and PCA:
The autoencoder is a non-linear transformation, whereas PCA is linear
PCA's axes are ordered in decreasing "importance", which increases the interpretability of the latent space
PCA finds statistically decorrelated components
34 / 53
To have the best of both worlds, we need to impose two criteria on the non-linear latent space:
1. Increasing importance of the components
2. Decorrelation of the components
We propose a PCA-like autoencoder which aims to achieve this, with two key choices:
1. Progressively increasing the latent space size, so that the most important variabilities in the data are captured first
2. A covariance loss term to minimise the correlation between latent components
35 / 53
PCA autoencoder architecture
Each encoder E(i) is trained and then fixed
At each iteration, the decoder is thrown away and a new one is trained
36 / 53
We want the components of the latent space to be uncorrelated
Goal: improve interpretability; decorrelated components likely represent different image attributes
We minimise the covariance between latent variables:
Cov(z_1, z_2) = E[z_1 z_2] − E[z_1] E[z_2]
If we denote by m the number of elements in a batch, then we can use the following estimate:
( (1/m) Σ_j z_1^(j) z_2^(j) − (1/m²) Σ_j z_1^(j) Σ_i z_2^(i) )²    (12)
If the latent codes are zero-mean, this can be simplified to
( (1/m) Σ_{j=1}^m z_1^(j) z_2^(j) )²    (13)
37 / 53
We impose E[z] = 0 by adding a Batch Normalisation layer∗ just before the latent space
Therefore, for iteration k, our covariance loss is:
L_cov^(k)(z) = (1/k) Σ_{i=1}^{k−1} ( (1/m) Σ_{j=1}^m z_i^(j) z_k^(j) )²    (14)
The total PCA autoencoder loss is
L^(k)(x) = ‖ x − D ∘ E^(k)(x) ‖₂² + λ L_cov^(k)
The exact architecture depends on the type of data (more or fewer layers, etc.)
∗ Without training the Batch Normalisation parameters
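Below is a hedged sketch of the whole procedure: the covariance loss of Eq. (14) together with the progressive schedule of the previous slides. The toy fully-connected architectures, learning rate, epoch count and the explicit zero-mean normalisation (standing in for the untrained Batch Normalisation layer) are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def cov_loss(z_prev, z_k):
    # z_prev: (m, k-1) frozen components, z_k: (m, 1) new component, assumed zero-mean
    k = z_prev.shape[1] + 1
    cov = (z_prev * z_k).mean(dim=0)          # (1/m) sum_j z_i^(j) z_k^(j), for each i < k
    return (cov ** 2).sum() / k

def train_component(k, frozen_encoders, data, lam=1.0, epochs=500):
    n = data.shape[1]
    enc_k = nn.Sequential(nn.Linear(n, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
    dec = nn.Sequential(nn.Linear(k, 64), nn.LeakyReLU(0.2), nn.Linear(64, n))  # decoder re-trained from scratch
    opt = torch.optim.Adam(list(enc_k.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(epochs):
        with torch.no_grad():                 # previously trained encoders stay fixed
            z_prev = (torch.cat([e(data) for e in frozen_encoders], dim=1)
                      if frozen_encoders else data.new_zeros(len(data), 0))
        z_k = enc_k(data)
        z_k = z_k - z_k.mean(dim=0)           # zero-mean latent component
        x_rec = dec(torch.cat([z_prev, z_k], dim=1))
        loss = ((x_rec - data) ** 2).mean() + lam * cov_loss(z_prev, z_k)
        opt.zero_grad(); loss.backward(); opt.step()
    return enc_k

# encoders = []
# for k in range(1, d_max + 1):               # d_max: desired latent dimension
#     encoders.append(train_component(k, encoders, data))
```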
38 / 53
We show some results on synthetic data: ellipses with three parameters (two axes and a rotation)
[Figure: example images from the ellipse dataset]
39 / 53
Latent space navigation - standard autoencoder
Latent space navigation - PCA autoencoder
40 / 53
The PCA autoencoder allows for meaningful navigation in the latent space of simple geometric shapes
What happens if we try to apply the PCA autoencoder to more complex data?
Face data, from the "CelebA" dataset∗
More than 200,000 images; 10,177 identities; 40 attributes (glasses, moustache, etc.)
∗Liu et al, Large-scale CelebFaces Attributes (CelebA) Dataset, ICCV 2015
41 / 53
If we apply our PCA autoencoder directly to the images of the CelebA database, we get the following results:
Extremely blurry: we tend to extract an average image
It is difficult to create and organise the latent space at the same time
Solution: apply the PCA autoencoder directly to the latent space (of a pre-trained generative model)
42 / 53
A GAN is basically a decoder (called a "generator") with a probability distribution imposed on the latent space
It takes a random vector and decodes it as an image
We used the powerful PGAN† (high-resolution images)
† Progressive Growing of GANs for Improved Quality, Stability, and Variation, Karras, T., Aila, T., Laine, S., and Lehtinen, J., arXiv preprint arXiv:1710.10196, 2017
43 / 53
We found the complete PGAN latent space too complicated to use directly
Thus, we learn a local space around a certain, chosen code z̃
The database consists of z̃ + η, with η ∼ N(0, σ Id)
z̃ is on the unit sphere, since PGAN normalises its input
44 / 53
The PCA autoencoder applied to a pre-trained GAN latent space is trained in the following manner:
[Diagram: PCA autoencoder followed by the (frozen) GAN generator]
We require that the final synthesis result be meaningful; therefore, we change the loss of the PCA autoencoder to
L(η) = ‖ G(η + z̃) − G(D ∘ E(η) + z̃) ‖₂² + L_cov(η)    (16)
where G is the GAN's generator
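A hedged sketch of this loss: `generator` stands for the frozen, pre-trained PGAN generator and `z_tilde` for the chosen anchor code; the exact layout of the covariance term on the latent code is an assumption made to stay consistent with Eq. (14).

```python
import torch

def local_pca_loss(eta, encoder, decoder, generator, z_tilde, lam=1.0):
    z = encoder(eta)                                  # latent code of the perturbation eta
    rec = decoder(z)
    # reconstruction error measured in image space, through the frozen generator
    img_loss = ((generator(eta + z_tilde) - generator(rec + z_tilde)) ** 2).mean()
    # covariance penalty between the newest latent component and the previous ones
    z_c = z - z.mean(dim=0, keepdim=True)
    cov = (z_c[:, :-1] * z_c[:, -1:]).mean(dim=0)
    return img_loss + lam * (cov ** 2).sum()
```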
45 / 53
Firstly, we look at navigation in the original PGAN space, locally around a certain code z0
Facial attributes are mixed up (entangled): hair colour, identity, smile
46 / 53
Our method’s results. We see that the first axis corresponds to hair colour, the second to pose, the third to identity Note, our method is entirely unsupervised : at no point does the algorithm have access to any labels
47 / 53
Some more results: z1 vs z4
48 / 53
The PCA autoencoder allows for meaningful navigation and interpretation of the latent space
We can discover existing independent attributes in the data, in a completely unsupervised manner
However, our approach has certain drawbacks:
There are cases where progressively increasing the latent space might not work
Translation, for example: the autoencoder needs two coordinates together
A possible solution: increase the latent space size in packets of codes
It is likely that the PCA autoencoder works best as a tool for local exploration/organisation of complex latent spaces
49 / 53
50 / 53
Summary
We have investigated how autoencoders process simple geometric shapes
Size can be extracted with a simple averaging filter
Decoding requires learning a shape "primitive", modulated via the latent space
We investigated the encoding of a Dirac impulse
This can be done with a simple filter and subsampling
We proposed an autoencoder methodology which encourages a latent space with two desirable properties
Statistical decorrelation of the latent components (with a covariance loss)
Ordering of the latent components with respect to reconstruction error (progressive increase of the latent space size)
This can be applied a posteriori to organise pretrained complex latent spaces
51 / 53
Further work
We would like to understand how an autoencoder can decode more complex shapes (curves, etc.)
Work remains to understand the autoencoder decoding process for position
How does the autoencoder place an object in an image?
Questions remain with respect to the PCA autoencoder:
Increasing the latent space size by packets
Is the PCA autoencoder too local to apply to the entire space of complex generative models?
Is it preferable/possible to learn first and organise afterwards?
52 / 53
References of this work
Alasdair Newson, Andrés Almansa, Yann Gousseau, Saïd Ladjal, Processing Simple Geometric Attributes with Autoencoders, JMIV, 2019
Saïd Ladjal, Alasdair Newson, Chi-Hieu Pham, A PCA-like Autoencoder, arXiv:1904.01277, 2019
53 / 53
1 / 6
Appendix: proof of the position-encoding property E_L(δ_a) = 2^L − a
2 / 6
Initialisation
In the case of one hidden layer, the property is easy to show. There are two cases:
1. δ_0 = [1, 0]: ϕ ∗ δ_0 = [2, 1], and subsampling keeps the even position ⟹ E_1(δ_0) = 2 = 2¹ − 0
2. δ_1 = [0, 1]: ϕ ∗ δ_1 = [1, 2] ⟹ E_1(δ_1) = 1 = 2¹ − 1
3 / 6
Induction
Suppose that E_L extracts the inverted position of δ_a ∈ ℝ^(2^L), so that: E_L(δ_a) = 2^L − a    (17)
Furthermore:
The output of the network is a fixed positive linear combination of the elements of δ_a
Only one element of δ_a is non-zero
Therefore, we can rewrite the output of the network as:
E_L(δ_a) = Σ_{i=0}^{2^L − 1} (2^L − i) δ_a(i)    (18)
Now, suppose that we add another layer to the network
There are three cases of a to distinguish: a even, a odd, or a at the end of the vector
4 / 6
Induction - case 1
a is an even position, so ∃k ∈ ℕ such that a = 2k. After one layer of filtering and subsampling, u^(1) has a single non-zero entry, u^(1)(k) = 2. Thus we have:
E_{L+1}(δ_a) = Σ_{i=0}^{2^L − 1} (2^L − i) u^(1)(i) = 2 (2^L − k) = 2^{L+1} − 2k = 2^{L+1} − a
The first case (a even) is verified
5 / 6
Induction - case 2
a is an odd position, so ∃k ∈ ℕ such that a = 2k + 1. After one layer, u^(1) has two non-zero entries, u^(1)(k) = u^(1)(k + 1) = 1. Thus we have:
E_{L+1}(δ_a) = Σ_{i=0}^{2^L − 1} (2^L − i) u^(1)(i) = (2^L − k)·1 + (2^L − (k + 1))·1 = 2^{L+1} − (2k + 1) = 2^{L+1} − a
The second case (a odd) is verified
6 / 6
Induction - case 3
Finally, there is a special case where a = 2^{L+1} − 1 = 2k + 1, with k = 2^L − 1 (the 1 is placed at the end of the vector). Here u^(1) has the single non-zero entry u^(1)(k) = 1, so:
E_{L+1}(δ_a) = (2^L − k)·1 = 2^L − (2^L − 1) = 1 = 2^{L+1} − a
Conclusion: the network E_L, a simple filter/subsampling network, extracts the position information from a Dirac impulse
This also works for any ϕ = c[1, 2, 1], with c ≠ 0
The result easily generalises to 2D, since the two directions can be processed independently