SLIDE 1

Pixel Recurrent Neural Networks

Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu. Google DeepMind, ICML'16

188 citations

SLIDE 2

Pixel Recurrent Neural Networks

  • 1. What is the task?
  • 2. Other models: GAN, VAE, …
  • 3. PixelCNN model
  • 4. Results
  • 5. Discussion & Conclusion
  • 6. Extensions (preview of next coffeetalk)
SLIDE 3

Goal: learning the distribution of natural images

  • Task:
    • Input: a training set of images
    • Output: a model that estimates p(x) for any image x
  • Evaluation: measure p(x) on a test set; higher p(x) is better (see the sketch after this list).
  • Note: p(x) should be normalized.
  • Why learn p(x)?
    • Image reconstruction / inpainting / denoising: input a corrupted image, output the fixed image
    • Image colorization: input a greyscale image, output a color image
    • Semi-supervised learning (low-density separation)
    • Representation learning (find the manifold of natural images)
    • Dimensionality reduction / finding variations in the data
    • Clustering
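A tiny sketch of this evaluation protocol (not code from the talk; `model.log_prob` and `test_images` are hypothetical placeholders): average the model's log-probability over held-out images, where higher is better.

```python
import numpy as np

def evaluate(model, test_images):
    """Mean log p(x) over a held-out test set (higher is better)."""
    return np.mean([model.log_prob(x) for x in test_images])
```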
SLIDE 4

Other approaches

                                        GAN   VAE   PixelCNN (this talk)   Invertible models (Real NVP)
Compute exact likelihood p(x)            ✗     ✗             ✓                         ✓
Has latent variable z                    ✓     ✓             ✗                         ✓
Compute latent variable z (inference)    ✗     ✓             ✗                         ✓
Stable training? (no mode collapse)      ✗     ✓             ✓                         ?
Sharp images?                            ✓     ✗             ✓                         ?

SLIDE 5

Pixel CNN (1/2)

  • Why is computing q(y) so difficult?
  • This is the reason GANs avoid it and VAEs only approximate it.
  • Answer: the normalization of q(y). We would need to integrate the model output over all possible images, which is intractable.
  • PixelCNN instead computes q(y) using the chain rule of probability (shown here for a 4-pixel image):

q(y) = q(y_4 | y_3, y_2, y_1) · q(y_3 | y_2, y_1) · q(y_2 | y_1) · q(y_1)

  • Each conditional q(y_j | y_{j-1}, …, y_1) is modeled by a CNN.
  • This one-dimensional distribution over a single pixel's value is easy to keep normalized.
  • If every conditional is normalized, then q(y) is properly normalized as well!
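To make the factorization concrete, here is a minimal sketch (not code from the talk) of how the chain rule turns per-pixel conditionals into a properly normalized image likelihood. `conditional_logits` is a hypothetical stand-in for the masked CNN described on the next slide.

```python
import numpy as np

def image_log_prob(pixels, conditional_logits):
    """log q(y) of one image via the chain rule.

    pixels: 1-D integer array, values in {0, ..., 255}, in raster order.
    conditional_logits: hypothetical stand-in for the masked CNN; maps the
        prefix pixels[:j] to 256 unnormalized scores for pixel j.
    """
    total = 0.0
    for j in range(len(pixels)):
        logits = conditional_logits(pixels[:j])
        logits = logits - logits.max()                     # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum())  # normalized softmax
        total += log_probs[pixels[j]]                      # log q(y_j | y_<j)
    return total
```

Because each 256-way softmax sums to one, the product of conditionals (the sum of log-terms above) is normalized by construction, with no intractable integral over all images.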
SLIDE 6

Pixel CNN (2/2)

  • 1. Order the pixels (raster-scan order).
  • 2. Suppose pixels 1-6 have already been generated and we want to predict pixel 7.
  • 3. Mask pixels 7-16 (set them to 0).
  • 4. The CNN outputs a normalized histogram for pixel 7 given pixel values 1-6 (the masked input).

  • Maximize log likelihood w.r.t. CNN parameters

[Figure: a 4×4 image from the training set with pixels numbered 1-16, the masked image fed to the CNN as input, and the CNN's output distribution for the next pixel]
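A minimal sketch of the masking trick, assuming PyTorch (not code from the talk): a convolution whose kernel is zeroed at and after the centre position, so each output pixel only sees pixels that come before it in raster order. The 'A'/'B' mask types follow the paper: type 'A' (first layer) also hides the current pixel, type 'B' (later layers) may use it.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv2d whose kernel only sees pixels before the centre in raster order."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        # zero the centre pixel (type 'A' only) and everything to its right
        mask[kH // 2, kW // 2 + (mask_type == 'B'):] = 0
        mask[kH // 2 + 1:, :] = 0   # zero all rows below the centre
        self.register_buffer('mask', mask)

    def forward(self, x):
        self.weight.data *= self.mask   # re-apply the mask before every conv
        return super().forward(x)

# Tiny single-channel model: 256 logits per pixel. Training with
# cross-entropy on the true pixel values maximizes the log-likelihood.
model = nn.Sequential(
    MaskedConv2d('A', 1, 64, kernel_size=7, padding=3),
    nn.ReLU(),
    MaskedConv2d('B', 64, 256, kernel_size=3, padding=1),
)
```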

SLIDE 7

Results (1/2)

SLIDE 8

Results of generating ‘new’ images

SLIDE 9

Results & Discussion

  • Sampled images:
    • Good local coherence
    • Incoherent global structure
    • Sharp images!
    • SOTA likelihood on CIFAR-10
  • Discussion:
    • Slow generation (sequential, one pixel at a time)
    • No latent representation
    • (Teacher forcing)

[Table: CIFAR-10 results. NLL = negative log-likelihood in bits per dimension (lower is better)]
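For reference, a small sketch of how the bits-per-dimension number is computed, assuming the model reports negative log-likelihood in nats per image (a 32×32 RGB CIFAR-10 image has 3072 dimensions):

```python
import numpy as np

def bits_per_dim(nll_nats_per_image, image_shape=(32, 32, 3)):
    """Convert mean NLL (nats per image) to bits per dimension."""
    num_dims = np.prod(image_shape)                      # 3072 for CIFAR-10
    return nll_nats_per_image / (num_dims * np.log(2))  # nats -> bits, per dim
```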

SLIDE 10

Preview of next coffeetalk

  • PixelCNN++ (faster), Conditional PixelCNN, PixelVAE, …
  • Use a pyramid of PixelCNN models (a rough sketch appears at the end of this slide):
    • Go from low resolution to high resolution
    • Improves the global coherence of generated images
    • Makes the model much faster
    • Decomposes the likelihood (high-level details vs. low-level details)
  • Next coffeetalk: “PixelCNN with Auxiliary Variables for Natural Image Modeling”, C.H. Lampert

  • Want to know more?
    https://www.cs.toronto.edu/~duvenaud/courses/csc2541/index.html
    A good course on Deep Generative Models (GAN, VAE, PixelCNN, Real NVP, …)
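As a preview, a hypothetical sketch of the pyramid idea mentioned above (all names are placeholders, not the actual API of those papers): sample a tiny image with one PixelCNN, then let each later model produce the next resolution conditioned on the coarser one.

```python
def sample_pyramid(models):
    """models[0]() samples a low-resolution image; each models[k](coarse)
    samples the next resolution conditioned on the coarser image."""
    image = models[0]()        # sequential sampling is cheap on a tiny image
    for refine in models[1:]:
        image = refine(image)  # higher resolution, conditioned on the coarse one
    return image               # full-resolution sample, better global coherence
```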