Applications of GANs (PowerPoint presentation)




SLIDE 1

Applications of GANs

  • Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
  • Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
  • Generative Adversarial Text to Image Synthesis

SLIDE 2

Using GANs for Single Image Super-Resolution

Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi

SLIDE 3

Problem

How do we get a high-resolution (HR) image from just one low-resolution (LR) image? Answer: we use super-resolution (SR) techniques.

http://www.extremetech.com/wp-content/uploads/2012/07/super-resolution-freckles.jpg
SLIDE 4

Previous Attempts

SLIDE 5

SRGAN

SLIDE 6

SRGAN - Generator

  • G: generator that takes a low-res image I^LR and outputs its high-res counterpart I^SR
  • θG: parameters of G, {W_1:L, b_1:L}, i.e. the weights and biases of an L-layer network
  • l^SR: loss function that measures the difference between the two high-res images
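Putting these pieces together, the training objective for the generator (as formulated in the paper) is to minimize the average SR loss over the N training pairs:

```latex
\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\!\left(G_{\theta_G}(I^{LR}_n),\; I^{HR}_n\right)
```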

SLIDE 7

SRGAN - Discriminator

  • D: discriminator that classifies whether a high-res image is IHR or ISR
  • θD: parameters of D
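G and D are then trained jointly in the standard GAN minimax game, adapted here to super-resolution: D tries to tell real HR images from super-resolved ones, while G tries to fool it:

```latex
\min_{\theta_G} \max_{\theta_D}\;
\mathbb{E}_{I^{HR} \sim p_{\mathrm{train}}}\!\big[\log D_{\theta_D}(I^{HR})\big]
+ \mathbb{E}_{I^{LR} \sim p_{G}}\!\big[\log\!\big(1 - D_{\theta_D}(G_{\theta_G}(I^{LR}))\big)\big]
```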

SLIDE 8

SRGAN - Perceptual Loss Function

Loss is calculated as a weighted combination of:
➔ Content loss
➔ Adversarial loss
➔ Regularization loss
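The weighted combination can be sketched as a plain function. The default weights below are illustrative stand-ins (the paper weights the adversarial and regularization terms far below the content term), not authoritative values:

```python
def perceptual_loss(content, adversarial, regularization,
                    w_adv=1e-3, w_reg=2e-8):
    # Weighted sum of the three loss terms from the slide.
    # w_adv and w_reg are small illustrative weights: the content
    # term dominates, the other two act as correction terms.
    return content + w_adv * adversarial + w_reg * regularization
```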

SLIDE 9

SRGAN - Content Loss

Instead of MSE, use a loss function based on the ReLU activation layers of a pre-trained VGG network. This ensures similarity of content.

  • φ_i,j: feature map of the jth convolution before the ith max-pooling layer
  • W_i,j and H_i,j: dimensions of the feature maps within the VGG network
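In code, the content loss is just an MSE computed in feature space rather than pixel space. This sketch assumes the feature maps have already been extracted by a fixed pre-trained VGG; here they are plain arrays:

```python
import numpy as np

def vgg_content_loss(phi_hr, phi_sr):
    """Content loss sketch: mean squared error between VGG feature
    maps phi_{i,j}(I^HR) and phi_{i,j}(G(I^LR)), normalized by the
    feature-map dimensions W_{i,j} * H_{i,j}. The maps are assumed
    precomputed by a pre-trained VGG; here they are (H, W) arrays."""
    H, W = phi_hr.shape[:2]
    return float(np.sum((phi_hr - phi_sr) ** 2) / (W * H))
```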

SLIDE 10

SRGAN - Adversarial Loss

Encourages network to favour images that reside in manifold of natural images.
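The generator's adversarial term rewards images the discriminator believes are natural. A minimal sketch, taking the discriminator's output probabilities as given:

```python
import numpy as np

def srgan_adversarial_loss(d_probs):
    """Generative adversarial loss sketch: sum of -log D(G(I^LR))
    over the batch, where d_probs holds the discriminator's
    probabilities that the super-resolved images are natural HR
    images. The loss is low when the generator fools D."""
    d_probs = np.asarray(d_probs, dtype=float)
    return float(np.sum(-np.log(d_probs)))
```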

SLIDE 11

SRGAN - Regularization Loss

Encourages spatially coherent solutions based on total variations.
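Total variation can be computed directly as the summed differences between neighbouring pixels; constant (spatially coherent) images score zero and noisy images score high. A minimal anisotropic version:

```python
import numpy as np

def total_variation(img):
    """Anisotropic total-variation regularizer sketch: sum of
    absolute differences between vertically and horizontally
    adjacent pixels. Penalizes high-frequency noise, encouraging
    piecewise-smooth, spatially coherent outputs."""
    img = np.asarray(img, dtype=float)
    dv = np.abs(np.diff(img, axis=0)).sum()  # vertical neighbours
    dh = np.abs(np.diff(img, axis=1)).sum()  # horizontal neighbours
    return float(dv + dh)
```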

SLIDE 12

SRGAN - Examples

SLIDE 13

SRGAN - Examples

SLIDE 14

Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

Work by Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus

SLIDE 15

Short Background

SLIDE 16

Conditional Generative Adversarial Nets (CGAN)

Mirza and Osindero (2014)

[Figure: GAN vs. CGAN architecture diagrams, side by side]

SLIDE 17

Laplacian pyramid

Burt and Adelson (1983)

SLIDE 18

Laplacian pyramid

Burt and Adelson (1983)
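The Burt and Adelson construction can be sketched in a few lines. This uses 2x2 average pooling and nearest-neighbour expansion as simple stand-ins for the Gaussian blur-and-subsample filters of the original paper; the key property, exact reconstruction from residuals plus the coarsest image, still holds:

```python
import numpy as np

def downsample(img):
    # 2x2 average pooling, a stand-in for blur-and-subsample
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # nearest-neighbour expansion, a stand-in for the expand step
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(img, levels):
    """Each level stores the residual between the image and an
    upsampled coarser copy; the final entry is the coarsest image."""
    pyramid = []
    for _ in range(levels):
        coarse = downsample(img)
        pyramid.append(img - upsample(coarse))
        img = coarse
    pyramid.append(img)
    return pyramid

def reconstruct(pyramid):
    """Invert the pyramid: upsample the coarsest image and add the
    stored residuals back, finest level last."""
    img = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        img = upsample(img) + residual
    return img
```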

SLIDE 19

Laplacian Pyramid Generative Adversarial Network (LAPGAN)

SLIDE 20

Image Generation

SLIDE 21

Training

SLIDE 22

Generation: Coarse to fine
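Coarse-to-fine sampling can be sketched as follows. The coarsest generator maps noise to a small image; every later generator is conditioned on the upsampled current image and adds a residual of detail, doubling the resolution at each step. The generator callables here are hypothetical stubs standing in for trained CGANs:

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample(img):
    # nearest-neighbour expansion, a stand-in for the expand step
    return img.repeat(2, axis=0).repeat(2, axis=1)

def sample_lapgan(g0, residual_generators, size0=4):
    """LAPGAN-style coarse-to-fine sampling sketch. g0 maps noise to
    the coarsest image; each generator in residual_generators takes
    (upsampled image, fresh noise) and returns a detail residual
    that is added back in, doubling resolution per step."""
    img = g0(rng.standard_normal((size0, size0)))
    for g in residual_generators:
        coarse = upsample(img)
        img = coarse + g(coarse, rng.standard_normal(coarse.shape))
    return img
```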

SLIDE 23

Different draws, starting from the same initial 4x4 image

SLIDE 24
Some thoughts on the method

  • The Laplacian pyramid framework is independent of the generative model: it is possible to use a completely different model, like PixelRNN, at each level.

SLIDE 25

Some thoughts on the method

  • The Generative Models at each step can be totally different!


SLIDE 26

Some thoughts on the method

  • The Generative Models at each step can be totally different!

[Figure: low-resolution vs. high-resolution generator architectures]

SLIDE 27

Generative Adversarial Text to Image Synthesis

Author’s code available at: https://github.com/reedscot/icml2016

Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee

SLIDE 28

Motivation

Current deep learning models enable us to...

➢ Learn feature representations of images & text
➢ Generate realistic images & text

For example: retrieving images based on captions, generating descriptions based on images, answering questions about image content.

SLIDE 29

Problem - Multimodal distribution

  • Many plausible images can be associated with one single text description
  • A previous attempt used Variational Recurrent Autoencoders to generate images from text captions, but the images were not realistic enough (Mansimov et al. 2016)

SLIDE 30

What GANs can do

  • CGAN: use side information (e.g. classes) to guide the learning process
  • Minimax game: adaptive loss function

➢ Multi-modality is a property very well suited for GANs to learn.

SLIDE 31

The Model - Basic CGAN

A pre-trained char-CNN-RNN learns a compatibility function of images and text, yielding a joint embedding.
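The conditioning itself is simple: the text embedding is compressed to a small vector and concatenated with the noise before entering the generator. This is a sketch with hypothetical dimensions; the random projection matrix stands in for a learned fully-connected layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator_input(text_embedding, z_dim=100, proj_dim=128):
    """Text-conditional GAN input sketch: project the (hypothetical,
    precomputed) char-CNN-RNN embedding phi(t) down to proj_dim,
    apply a nonlinearity, and concatenate with noise z. The random
    matrix W stands in for a learned projection layer."""
    W = rng.standard_normal((proj_dim, text_embedding.size))
    projected = np.maximum(W @ text_embedding, 0.0)  # ReLU stand-in
    z = rng.standard_normal(z_dim)
    return np.concatenate([z, projected])
```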

SLIDE 32

The Model - Variations

GAN-CLS: In order to distinguish different error sources, present the discriminator network with 3 different types of input (instead of 2).
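The three input types are {real image, matching text}, {real image, mismatched text}, and {fake image, matching text}; only the first should score as real. A sketch of the resulting discriminator loss, taking the discriminator's scores as given and averaging the two "fake" sources:

```python
import numpy as np

def gan_cls_discriminator_loss(s_real, s_wrong, s_fake):
    """GAN-CLS discriminator loss sketch. s_real scores
    {real image, matching text}, s_wrong scores
    {real image, mismatched text}, s_fake scores
    {fake image, matching text}; all are probabilities in (0, 1).
    Only the first pairing should be accepted as real."""
    return float(-np.log(s_real)
                 - 0.5 * (np.log(1 - s_wrong) + np.log(1 - s_fake)))
```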

SLIDE 33

The Model - Variations cont.

GAN-INT: In order to generalize the output of G, interpolate between training-set embeddings to generate new text and hence fill the gaps in the image data manifold.

GAN-INT-CLS: combination of both previous variations {fake image, fake text}
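The interpolation is just a convex combination of two text embeddings, producing a synthetic embedding that corresponds to no training caption; a one-line sketch (beta = 0.5 is a reasonable default, not an authoritative value):

```python
import numpy as np

def interpolate_embeddings(t1, t2, beta=0.5):
    """GAN-INT sketch: convex combination of two training-set text
    embeddings, yielding a new embedding for the generator that
    fills gaps in the image data manifold."""
    return beta * np.asarray(t1) + (1 - beta) * np.asarray(t2)
```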

SLIDE 34

Disentangling

❖ Style is background, position & orientation of the object, etc.
❖ Content is shape, size & colour of the object, etc.

  • Introduce S(x), a style encoder with a squared loss function
  • Useful in generalization: encoding style and content separately allows for different new combinations
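The style encoder's squared loss asks S to recover the noise (style) code z that generated image x. A sketch in which the encoder, images, and codes are all hypothetical stand-ins:

```python
import numpy as np

def style_encoder_loss(S, images, z_codes):
    """Squared loss for a style encoder S(x): for each generated
    image x and the noise code z that produced it, penalize the
    squared distance between z and S(x). S, images, and z_codes
    are hypothetical stand-ins for the trained components."""
    return float(np.mean([np.sum((z - S(x)) ** 2)
                          for x, z in zip(images, z_codes)]))
```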

SLIDE 35

Training - Data (separated into class-disjoint train and test sets)

  • Caltech-UCSD Birds
  • Oxford Flowers
  • MS COCO

SLIDE 36

Training – Results: Flower & Bird

SLIDE 37

Training – Results: MS COCO (comparison with Mansimov et al.)

SLIDE 38

Training – Results: Style disentangling

SLIDE 39

Thoughts on the paper

  • Image quality
  • Generalization
  • Future work
