PixelCNN Models with Auxiliary Variables for Natural Image Modeling


1. PixelCNN Models with Auxiliary Variables for Natural Image Modeling
   Alexander Kolesnikov*, Christoph H. Lampert*
   *IST Austria
   ICML 2017

2. PixelCNN Models with Auxiliary Variables (outline)
   1. What is the task?
   2. PixelCNN model (recap of the last coffee talk)
   3. Proposed models
      a) Grayscale PixelCNN
      b) Pyramid PixelCNN
   4. Conclusion

3. What is the task? Density estimation
   • Task:
     • Input: a training set of images
     • Output: a model estimating p(x)
     • Evaluation: measure p(x) on a test set; higher p(x) is better.
     • Note: p(x) should be normalized.
   • Why learn p(x)?
     • Representation learning
     • Image reconstruction
     • Deblurring
     • Super-resolution
     • Image compression
     • …
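
For concreteness: image density models are usually compared by converting the test-set negative log-likelihood into bits per dimension. A minimal sketch (the function name and the example numbers are illustrative, not from the slides):

```python
import math

def bits_per_dim(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood measured in nats
    into bits per dimension (per colour channel of every pixel)."""
    return nll_nats / (num_dims * math.log(2))

# A 32x32 RGB CIFAR-10 image has 32 * 32 * 3 = 3072 dimensions,
# so an NLL of ~6300 nats corresponds to roughly 2.96 bits/dim.
print(bits_per_dim(6300.0, 32 * 32 * 3))
```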

4. Recap of PixelCNN
   • PixelCNN is an autoregressive generative model
     • Input: previously generated pixels
     • Output: a pdf (prediction) for the next pixel
   • Pros:
     • Can compute p(x) (unlike GANs)
     • Trained by maximum likelihood
     • Stable training (unlike GANs)
     • Generates sharp images (unlike VAEs)
   • Cons:
     • No latent variables
     • Generation of images is very slow because of the sequential, pixel-by-pixel structure
     • Incoherent global image structure
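
The autoregressive factorization p(x) = ∏_i p(x_i | x_1, …, x_{i-1}) is enforced in PixelCNN with masked convolutions. A minimal PyTorch sketch of the standard masking trick (class name and details are illustrative; the paper builds on van den Oord et al.'s PixelCNN):

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """2-D convolution whose kernel is masked so that each pixel only
    sees pixels above it and to its left (the PixelCNN raster ordering)."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        # hide everything strictly below the centre row
        mask[kH // 2 + 1:, :] = 0
        # hide everything right of centre in the centre row;
        # mask "A" (first layer only) also hides the centre pixel itself
        mask[kH // 2, kW // 2 + (mask_type == "B"):] = 0
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask  # re-apply the mask every step
        return super().forward(x)

# usage: mask "A" for the first layer, mask "B" for all deeper layers
first = MaskedConv2d("A", 3, 64, kernel_size=7, padding=3)
```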

5. First proposed model: Grayscale PixelCNN
   • PixelCNN models low-level features well, but not global structure.
   • Idea: the likelihood is dominated by low-level details, so split the model in two.
   • First PixelCNN, for global structure
     • Output: a grayscale version of the image with 4 bits per pixel.
   • Second PixelCNN, for low-level details
     • Input: the output of the first model (the auxiliary variable), embedded by a deep CNN feature extractor
     • Output: the 24-bit color image.
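
The two-stage model factorizes the likelihood as p(x) = p(g) · p(x | g), where g is the 4-bit grayscale auxiliary image. A minimal sketch of the quantization that produces g (the luma weights are a common choice; the paper's exact conversion may differ):

```python
import numpy as np

def to_grayscale_4bit(rgb):
    """Quantize an HxWx3 uint8 image to a 4-bit (16-level) grayscale
    image, the auxiliary variable of the Grayscale PixelCNN."""
    # standard luma weights; an assumption, not taken from the paper
    gray = rgb.astype(np.float32) @ np.array([0.299, 0.587, 0.114])
    return (gray / 256.0 * 16).astype(np.uint8)  # values in 0..15
```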

6. Grayscale PixelCNN: Results
   • State-of-the-art performance on the test set (CIFAR-10)
   • Samples are highly diverse and have coherent global structure
   • No overfitting (train loss ≈ test loss)
   • Decomposing the likelihood into its two parts confirms that low-level details indeed dominate the likelihood objective.
   • Because the Grayscale PixelCNN consists of two models, the two objectives do not interfere.

7. Pyramid PixelCNN
   • Motivations: (1) asymmetry: the lower-right pixel has access to much more context than the upper-left one; (2) speeding up the model.
   • Idea: model a pyramid of image resolutions, generating each scale conditioned on the previous, coarser one.
   [Figure: pyramid of images P1 ... P5, each scale produced by a very deep CNN from the preceding scale]
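
A minimal sketch of the corresponding image pyramid, assuming simple 2x average-pooling for downsampling (the paper's resampling may differ). The model then factorizes p(x) over scales, p(x) = p(x_S) · ∏_s p(x_s | x_{s+1}), generating coarsest first:

```python
import numpy as np

def image_pyramid(img, num_scales=4):
    """Build a resolution pyramid by repeated 2x downsampling
    (average pooling over 2x2 blocks; assumes even spatial dims)."""
    scales = [img]
    for _ in range(num_scales - 1):
        h, w = scales[-1].shape[:2]
        coarse = scales[-1].reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))
        scales.append(coarse)
    return scales[::-1]  # coarsest first, matching the generation order
```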

8. Pyramid PixelCNN: Results (1/2)
   • Close to state of the art on CIFAR-10
   • Speed-up factor of at least 10x
   • Evaluation on CelebA
   [Figure: MAP samples from the CelebA model]

9. Pyramid PixelCNN: Results (2/2)

10. Conclusions
   • Low-level details distract models from learning high-level structure
   • Use two models (a low-level model and a high-level model)
   • A multiscale architecture can model high-resolution faces
   • Next coffee talk: “Neural Discrete Representation Learning”

  11. Grayscale results

12. Grayscale samples, colored
