PixelCNN Models with Auxiliary Variables for Natural Image Modeling



SLIDE 1

PixelCNN Models with Auxiliary Variables for Natural Image Modeling

Alexander Kolesnikov*, Christoph H. Lampert* (*IST Austria), ICML 2017

SLIDE 2

PixelCNN Models with Auxiliary Variables

  • 1. What is the task?
  • 2. PixelCNN model (recap of the last coffee talk)
  • 3. Proposed models
    a) Grayscale PixelCNN
    b) Pyramid PixelCNN
  • 4. Conclusion
SLIDE 3

What is the task? Density estimation

  • Task:
  • Input: a training set of images
  • Output: a model estimating p(x)
  • Evaluation: measure p(x) on the test set; higher p(x) is better.
  • Note: p(x) must be normalized
  • Why learn p(x)?
  • Representation learning
  • Image reconstruction
  • Deblurring
  • Super resolution
  • Image compression
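As a concrete example of the evaluation above: density models on images are usually compared in bits per dimension, i.e. the average negative log-likelihood normalized by the number of pixel values. A minimal sketch (the function name is ours; the CIFAR-10 shape 32x32x3 is illustrative):

```python
import numpy as np

def bits_per_dim(mean_nll_nats, image_shape=(32, 32, 3)):
    """Convert a mean negative log-likelihood per image (in nats)
    into bits per dimension, the standard density-estimation metric.
    Lower bits/dim corresponds to higher p(x)."""
    num_dims = np.prod(image_shape)          # 3072 for CIFAR-10
    return mean_nll_nats / (num_dims * np.log(2.0))
```

A model assigning, say, an average NLL of 3072 * 3 * ln 2 nats per CIFAR-10 image scores exactly 3.0 bits/dim under this formula.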
SLIDE 4

Recap of PixelCNN

  • PixelCNN is an autoregressive generative model (a CNN with masked convolutions)
  • Input: previously generated pixels
  • Output: a pdf (prediction) for the next pixel
  • Pros
  • Can compute p(x) (unlike GANs)
  • Trained by maximum likelihood
  • Stable training (unlike GANs)
  • Generates sharp images (unlike VAEs)
  • Cons
  • No latent variables
  • Image generation is very slow because of the sequential, pixel-by-pixel sampling
  • Incoherent global image structure
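The conditioning on previously generated pixels is implemented with masked convolutions: the kernel is zeroed at all positions at or after the current pixel in raster order. A minimal sketch of the standard mask construction (the function name is ours; type "A" is used in the first layer so a pixel never sees itself, type "B" in later layers):

```python
import numpy as np

def pixelcnn_mask(kernel_size, mask_type="A"):
    """Binary mask applied to a PixelCNN convolution kernel.
    Keeps positions strictly above the centre row, plus positions
    left of centre in the centre row; type 'A' also zeroes the
    centre position itself."""
    k = kernel_size
    mask = np.ones((k, k), dtype=np.float32)
    centre = k // 2
    mask[centre, centre + 1:] = 0.0   # right of centre, centre row
    mask[centre + 1:, :] = 0.0        # all rows below the centre
    if mask_type == "A":
        mask[centre, centre] = 0.0    # exclude the current pixel
    return mask

print(pixelcnn_mask(3, "A"))
# [[1. 1. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
```

Multiplying the kernel weights by this mask before every convolution is what makes the network's output at each pixel depend only on already-generated pixels.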
SLIDE 5

Proposed Grayscale PixelCNN

  • PixelCNN models low-level features well, but not global structure.
  • Observation: the likelihood is dominated by low-level details.
  • First PixelCNN: global structure
  • Output: a grayscale version of the image with 4 bits per pixel.
  • Second PixelCNN: low-level details
  • Input: output of the first model (the auxiliary variable)
  • Output: the 24-bit color image.

[Diagram: the first PixelCNN produces the 4-bit grayscale image; a deep CNN feature extractor feeds it into the second PixelCNN, which produces the color image.]
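A sketch of the auxiliary variable itself: a 4-bit (16-level) grayscale view of the 24-bit color image. The luma weights below are the common Rec. 601 ones; the paper's exact conversion may differ, so treat this as illustrative:

```python
import numpy as np

def grayscale_4bit(image_rgb):
    """Quantize an RGB image (values in 0..255) to a 16-level
    grayscale image, a candidate auxiliary variable for the
    Grayscale PixelCNN. Weights are Rec. 601 luma coefficients."""
    gray = image_rgb @ np.array([0.299, 0.587, 0.114])    # 0..255
    return np.clip(gray // 16, 0, 15).astype(np.uint8)    # 16 levels

img = np.full((2, 2, 3), 200, dtype=np.float64)
print(grayscale_4bit(img))  # every pixel maps to level 12 (200 // 16)
```

The first PixelCNN only has to model this coarse 4-bit image, so its likelihood objective is not swamped by fine color detail.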

SLIDE 6

Grayscale PixelCNN: Results

  • State-of-the-art performance on the test set (CIFAR-10)
  • Samples are highly diverse and have coherent global structure
  • No overfitting (train loss and test loss are approximately equal)
  • Decomposing the likelihood into its two parts confirms that low-level details dominate the likelihood objective.
  • Because the Grayscale PixelCNN uses two separate models, the two objectives do not interfere.
SLIDE 7

Pyramid PixelCNN

  • Motivations: (1) asymmetry: the lower-right pixel has access to more information than the upper-left one; (2) speeding up the model.
  • Idea: generate the image as a pyramid of scales, each modeled conditioned on the previous (coarser) scale.

[Diagram: pyramid levels P1 through P5, with a very deep CNN between consecutive levels producing the conditioning input for the next, finer scale.]
SLIDE 8

Pyramid PixelCNN: Results (1/2)

  • Close to state of the art on CIFAR-10
  • Speed-up factor of at least 10x
  • Evaluation on CelebA

[Figure: MAP samples on CelebA.]

SLIDE 9

Pyramid PixelCNN: Results (2/2)

SLIDE 10

Conclusions

  • Low-level details distract models from learning high-level structure
  • Solution: use two models (a low-level model and a high-level model)
  • A multiscale architecture can model high-resolution faces
  • Next coffee talk: “Neural Discrete Representation Learning”
SLIDE 11
SLIDE 12

Grayscale results

SLIDE 13

Grayscale colored