SLIDE 1

Pixel Recurrent Neural Networks

Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu. Google DeepMind, ICML'16

188 citations

SLIDE 2

Pixel Recurrent Neural Networks

  • 1. What is the task?
  • 2. Other models: GAN, VAE, …
  • 3. PixelCNN model
  • 4. Results
  • 5. Discussion & Conclusion
  • 6. Extensions (preview of next coffeetalk)
SLIDE 3

Goal: learning the distribution of natural images

  • Task:
    • Input: a training set of images
    • Output: a model that estimates p(x) for any image x
  • Evaluation: measure p(x) on a test set; higher p(x) is better (see the sketch after this list).
  • Note: p(x) should be normalized.
  • Why learn p(x)?
    • Image reconstruction / inpainting / denoising: input a corrupted image, output the fixed image
    • Image colorization: input a greyscale image, output a color image
    • Semi-supervised learning (low-density separation)
    • Representation learning (find the manifold of natural images)
    • Dimensionality reduction / finding variations in the data
    • Clustering
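A tiny sketch of this evaluation protocol (not code from the talk; `model.log_prob` and `test_images` are hypothetical placeholders): average the model's log-probability over held-out images, where higher is better.

```python
import numpy as np

def evaluate(model, test_images):
    """Mean log p(x) over a held-out test set (higher is better)."""
    return np.mean([model.log_prob(x) for x in test_images])
```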
SLIDE 4

Other approaches

                                        GAN   VAE   PixelCNN (this talk)   Invertible models (Real NVP)
Compute exact likelihood p(x)            ✗     ✗             ✓                         ✓
Has latent variable z                    ✓     ✓             ✗                         ✓
Compute latent variable z (inference)    ✗     ✓             ✗                         ✓
Stable training? (no mode collapse)      ✗     ✓             ✓                         ?
Sharp images?                            ✓     ✗             ✓                         ?

SLIDE 5

Pixel CNN (1/2)

  • Why is computing q(y) so difficult?
  • This is the reason GANs avoid it and VAEs only approximate it.
  • Answer: the normalization of q(y). We would need to integrate the model output over all possible images, which is intractable.
  • PixelCNN instead computes q(y) using the chain rule of probability (shown here for a 4-pixel image):

q(y) = q(y_4 | y_3, y_2, y_1) · q(y_3 | y_2, y_1) · q(y_2 | y_1) · q(y_1)

  • Each conditional q(y_j | y_{j-1}, …, y_1) is modeled by a CNN.
  • This one-dimensional distribution over a single pixel's value is easy to keep normalized.
  • If every conditional is normalized, then q(y) is properly normalized as well!
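To make the factorization concrete, here is a minimal sketch (not code from the talk) of how the chain rule turns per-pixel conditionals into a properly normalized image likelihood. `conditional_logits` is a hypothetical stand-in for the masked CNN described on the next slide.

```python
import numpy as np

def image_log_prob(pixels, conditional_logits):
    """log q(y) of one image via the chain rule.

    pixels: 1-D integer array, values in {0, ..., 255}, in raster order.
    conditional_logits: hypothetical stand-in for the masked CNN; maps the
        prefix pixels[:j] to 256 unnormalized scores for pixel j.
    """
    total = 0.0
    for j in range(len(pixels)):
        logits = conditional_logits(pixels[:j])
        logits = logits - logits.max()                     # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum())  # normalized softmax
        total += log_probs[pixels[j]]                      # log q(y_j | y_<j)
    return total
```

Because each 256-way softmax sums to one, the product of conditionals (the sum of log-terms above) is normalized by construction, with no intractable integral over all images.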
SLIDE 6

Pixel CNN (2/2)

  • 1. Order the pixels (raster-scan order).
  • 2. Suppose pixels 1-6 have already been generated and we want to predict pixel 7.
  • 3. Mask pixels 7-16 (set them to 0).
  • 4. The CNN outputs a normalized histogram for pixel 7 given pixel values 1-6 (the masked input).

  • Maximize log likelihood w.r.t. CNN parameters

[Figure: a 4×4 image from the training set with pixels numbered 1-16, the masked image fed to the CNN as input, and the CNN's output distribution for the next pixel]
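A minimal sketch of the masking trick, assuming PyTorch (not code from the talk): a convolution whose kernel is zeroed at and after the centre position, so each output pixel only sees pixels that come before it in raster order. The 'A'/'B' mask types follow the paper: type 'A' (first layer) also hides the current pixel, type 'B' (later layers) may use it.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv2d whose kernel only sees pixels before the centre in raster order."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        # zero the centre pixel (type 'A' only) and everything to its right
        mask[kH // 2, kW // 2 + (mask_type == 'B'):] = 0
        mask[kH // 2 + 1:, :] = 0   # zero all rows below the centre
        self.register_buffer('mask', mask)

    def forward(self, x):
        self.weight.data *= self.mask   # re-apply the mask before every conv
        return super().forward(x)

# Tiny single-channel model: 256 logits per pixel. Training with
# cross-entropy on the true pixel values maximizes the log-likelihood.
model = nn.Sequential(
    MaskedConv2d('A', 1, 64, kernel_size=7, padding=3),
    nn.ReLU(),
    MaskedConv2d('B', 64, 256, kernel_size=3, padding=1),
)
```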

SLIDE 7

Results (1/2)

SLIDE 8

Results of generating ‘new’ images

SLIDE 9

Results & Discussion

  • Sampled images:
    • Good local coherence
    • Incoherent global structure
    • Sharp images!
    • SOTA likelihood on CIFAR-10
  • Discussion:
    • Slow generation (sequential, one pixel at a time)
    • No latent representation
    • (Teacher forcing)

[Table: CIFAR-10 results. NLL = negative log-likelihood in bits per dimension (lower is better)]
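For reference, a small sketch of how the bits-per-dimension number is computed, assuming the model reports negative log-likelihood in nats per image (a 32×32 RGB CIFAR-10 image has 3072 dimensions):

```python
import numpy as np

def bits_per_dim(nll_nats_per_image, image_shape=(32, 32, 3)):
    """Convert mean NLL (nats per image) to bits per dimension."""
    num_dims = np.prod(image_shape)                      # 3072 for CIFAR-10
    return nll_nats_per_image / (num_dims * np.log(2))  # nats -> bits, per dim
```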

SLIDE 10

Preview of next coffeetalk

  • PixelCNN++ (faster), Conditional PixelCNN, PixelVAE, …
  • Use a pyramid of PixelCNN models (a rough sketch appears at the end of this slide):
    • Go from low resolution to high resolution
    • Improves the global coherence of generated images
    • Makes the model much faster
    • Decomposes the likelihood (high-level details vs. low-level details)
  • Next coffeetalk: “PixelCNN with Auxiliary Variables for Natural Image Modeling”, C.H. Lampert

  • Want to know more?
    https://www.cs.toronto.edu/~duvenaud/courses/csc2541/index.html
    A good course on Deep Generative Models (GAN, VAE, PixelCNN, Real NVP, …)
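As a preview, a hypothetical sketch of the pyramid idea mentioned above (all names are placeholders, not the actual API of those papers): sample a tiny image with one PixelCNN, then let each later model produce the next resolution conditioned on the coarser one.

```python
def sample_pyramid(models):
    """models[0]() samples a low-resolution image; each models[k](coarse)
    samples the next resolution conditioned on the coarser image."""
    image = models[0]()        # sequential sampling is cheap on a tiny image
    for refine in models[1:]:
        image = refine(image)  # higher resolution, conditioned on the coarse one
    return image               # full-resolution sample, better global coherence
```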