Neural Discrete Representation Learning
- A. van den Oord, O. Vinyals, K. Kavukcuoglu
2017
Presented by: Yulia Rubanova and Eddie (Shu Jian) Du
CSC2547/STA4273
Vector quantization variational autoencoder (VQ-VAE): a VAE with discrete latent variables.
Why discrete?
Step I: the input x is encoded into a continuous vector z_e(x).
Step II: z_e(x) is transformed into a discrete variable over K categories.
We define a latent embedding space e ∈ R^(K×D) (D is the dimensionality of each latent embedding vector).
To discretize z_e(x): find its nearest neighbour in the embedding space.
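The nearest-neighbour lookup can be sketched in a few lines of NumPy (a toy sketch: the codebook values, sizes, and variable names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 512, 64                      # K categories, D-dim embeddings
codebook = rng.normal(size=(K, D))  # latent embedding space e in R^(K x D)

def quantize(z_e, codebook):
    """Map each encoder output row to its nearest codebook entry."""
    # Squared L2 distance from every z_e row to every codebook row.
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    k = d.argmin(axis=1)            # discrete code index per input
    return k, codebook[k]           # indices and quantized vectors z_q

z_e = rng.normal(size=(10, D))      # pretend encoder outputs (10 latents)
k, z_q = quantize(z_e, codebook)
print(k.shape, z_q.shape)           # (10,) (10, 64)
```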
The posterior categorical distribution q(z = k | x) is deterministic: all probability mass sits on the nearest-neighbour index k.
Step III: use the selected embedding e_k as input to the decoder.
Reconstruction loss: the model is trained as a VAE, in which we can bound log p(x) with the ELBO.
How can we get a gradient through the (non-differentiable) quantization step? Just copy the gradients from the decoder input to the encoder output (the straight-through estimator).
Main idea: the gradients from the decoder contain the information for how the encoder has to change its output to lower the reconstruction loss.
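A minimal NumPy illustration of this gradient copy, with hand-computed gradients and a stand-in squared-error loss (shapes and values are made up for the sketch, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4
z_e = rng.normal(size=(D,))            # encoder output (continuous)
codebook = rng.normal(size=(8, D))     # 8 embedding vectors
k = ((codebook - z_e) ** 2).sum(-1).argmin()
z_q = codebook[k]                      # quantized latent fed to the decoder

target = np.zeros(D)                   # stand-in reconstruction target
loss = ((z_q - target) ** 2).sum()     # stand-in reconstruction loss
grad_z_q = 2.0 * (z_q - target)        # dL/dz_q from the decoder side
grad_z_e = grad_z_q.copy()             # straight-through: copy it unchanged
```

In an autodiff framework the same trick is usually written as `z_q = z_e + stop_gradient(z_q - z_e)`, which makes the quantization act as the identity in the backward pass.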
Embedding don’t get gradient from reconstruction loss
Embedding don’t get gradient from reconstruction loss Use L2 error to move the embedding vectors towards Embedding loss = sg = stopgradient operator
Discrete z: a field of 32×32 latents (ImageNet), K = 512; one discrete category for each image patch.
128x128x3 images ↔ 32x32x1 discrete latent space (K=512)
(Figure: original vs. reconstruction.)
128×128×3 × (8 bits per pixel) / (32×32 × (9 bits to index a vector)) ≈ 42.7 times compression in bits.
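The compression arithmetic, checked directly:

```python
# RGB pixels at 8 bits each versus a 32x32 grid of code indices at
# log2(512) = 9 bits each.
pixel_bits = 128 * 128 * 3 * 8    # 393,216 bits per image
latent_bits = 32 * 32 * 9         # 9,216 bits per latent map
ratio = pixel_bits / latent_bits  # ~42.7x
```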
Train a PixelCNN on the 32×32×1 discrete latent space. Sample from the PixelCNN, then decode with the VQ-VAE decoder.
(Figure: PixelCNN vs. PixelRNN. Image source: https://towardsdatascience.com/summary-of-pixelrnn-by-google-deepmind-7-min-read-938d9871d6d9)
Learn an autoregressive prior over discrete z
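The sampling procedure is ancestral sampling over the latent grid in raster-scan order. A sketch with a uniform stand-in prior (a real PixelCNN would instead produce logits conditioned on all previously sampled codes via masked convolutions):

```python
import numpy as np

rng = np.random.default_rng(3)
H, W, K = 32, 32, 512
codes = np.zeros((H, W), dtype=np.int64)

def prior_logits(codes, i, j):
    """Stand-in for PixelCNN logits over K codes at position (i, j)."""
    return np.zeros(K)               # uniform; a real model is learned

for i in range(H):
    for j in range(W):               # raster-scan order, one code at a time
        p = np.exp(prior_logits(codes, i, j))
        p /= p.sum()
        codes[i, j] = rng.choice(K, p=p)
# `codes` would then be mapped to embeddings and fed to the VQ-VAE decoder.
```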
(Figure: class-conditional samples: microwave, pickup, tiger beetle, coral reef, brown bear.)
84×84×3 images ↔ 21×21×1 discrete latent space (K = 512) ↔ 3×1 discrete latent space (K = 512). Two VQ-VAE layers! 3×9 = 27 bits in the latent representation. Can't reconstruct exactly, but it does capture the global structure.
Source: https://avdnoord.github.io/homepage/slides/SANE2017.pdf
Original “Reconstruction”
Use WaveNet decoder.
Again, not an exact reconstruction, but it captures the global structure. (More examples at https://avdnoord.github.io/homepage/vqvae/)
It turns out the discrete latent variables roughly correspond to phonemes. Note that the semantics of a discrete code could depend on the previous codes, so it is interesting that individual discrete codes actually hold meaning!
Example
(Audio: original vs. transferred.) ⇒ The discrete latent variables are not speaker-specific!