SLIDE 1

Generative adversarial networks


Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

2014 NIPS Workshop on Perturbations, Optimization, and Statistics

SLIDE 2

Discriminative deep learning

  • Recipe for success

SLIDE 3

Discriminative deep learning

  • Recipe for success:
  • Google’s winning entry into the ImageNet 1K competition (with extra data).

SLIDE 4

Discriminative deep learning

  • Recipe for success:
  • Gradient backpropagation
  • Dropout
  • Activation functions:
  • rectified linear
  • maxout
  • Google’s winning entry into the ImageNet 1K competition (with extra data).

SLIDE 5

Generative modeling

  • Have training examples x ~ pdata(x)
  • Want a model that can draw samples: x ~ pmodel(x)
  • Where pmodel ≈ pdata

[Figure: samples x ~ pdata(x) alongside samples x ~ pmodel(x)]

SLIDE 6

Why generative models?

  • Conditional generative models
  • Speech synthesis: Text ⇒ Speech
  • Machine Translation: French ⇒ English
  • French: Si mon tonton tond ton tonton, ton tonton sera tondu.
  • English: If my uncle shaves your uncle, your uncle will be shaved.
  • Image ⇒ Image segmentation
  • Environment simulator
  • Reinforcement learning
  • Planning
  • Leverage unlabeled data

SLIDE 7

Maximum likelihood: the dominant approach

  • ML objective function

$$\theta^* = \max_\theta \frac{1}{m} \sum_{i=1}^{m} \log p\left(x^{(i)}; \theta\right)$$
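As a concrete illustration (mine, not from the slides): a minimal numpy sketch of maximizing this average log-likelihood by gradient ascent, fitting the mean of a unit-variance Gaussian. The data, learning rate, and iteration count are invented for the example.

```python
import numpy as np

# Toy data: m samples from an unknown Gaussian (unit variance assumed).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=1000)

# Model: p(x; theta) = N(x; theta, 1), so the ML objective is
# (1/m) sum_i log p(x_i; theta) and its gradient is (1/m) sum_i (x_i - theta).
theta = 0.0
for _ in range(200):
    grad = np.mean(x - theta)   # gradient of the average log-likelihood
    theta += 0.1 * grad         # gradient ascent step

print(theta)  # converges to the sample mean, the ML estimate
```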
SLIDE 8

Undirected graphical models

  • State-of-the-art general-purpose undirected graphical model: deep Boltzmann machines
  • Several “hidden layers” h

$$p(h, x) = \frac{1}{Z}\,\tilde{p}(h, x), \qquad \tilde{p}(h, x) = \exp(-E(h, x)), \qquad Z = \sum_{h, x} \tilde{p}(h, x)$$

[Figure: graphical model over x and hidden layers h(1), h(2), h(3)]
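To make the role of Z concrete, here is a sketch (mine; a fully visible toy model rather than a DBM, with invented couplings) that computes the partition function by brute force. The sum ranges over every joint configuration, which is exactly why exact likelihoods are intractable at realistic sizes.

```python
import numpy as np
from itertools import product

# Toy fully visible Boltzmann machine over n binary units.
# E(s) = -s^T W s, so exp(-E(s)) = exp(s^T W s); Z sums over all 2^n states.
n = 12                                   # 2^12 = 4096 states; real models are far larger
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(n, n))

def unnormalized(s):
    return np.exp(s @ W @ s)             # exp(-E(s))

Z = sum(unnormalized(np.array(s)) for s in product([0, 1], repeat=n))
print(Z)                                 # each extra unit doubles the work
```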

SLIDE 9

Undirected graphical models: disadvantage

  • ML Learning requires that we draw samples:
  • Common way to do this is via MCMC (Gibbs sampling).

[Figure: graphical model over x and hidden layers h(1), h(2), h(3)]

$$\frac{d}{d\theta_i} \log p(x) = \frac{d}{d\theta_i} \left[ \log \sum_h \tilde{p}(h, x) - \log Z(\theta) \right], \qquad \frac{d}{d\theta_i} \log Z(\theta) = \frac{\frac{d}{d\theta_i} Z(\theta)}{Z(\theta)}$$
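A sketch of the Gibbs sampler in question (mine, written for an RBM, the single-hidden-layer case, to keep the conditionals simple; the sizes and 1000-step budget are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Toy RBM: visible x (n_v binary units), hidden h (n_h binary units).
n_v, n_h = 784, 500
W = 0.01 * rng.normal(size=(n_v, n_h))
b, c = np.zeros(n_v), np.zeros(n_h)

def gibbs_step(x):
    """One block-Gibbs sweep: sample h | x, then x | h."""
    h = (rng.random(n_h) < sigmoid(x @ W + c)).astype(float)
    x = (rng.random(n_v) < sigmoid(h @ W.T + b)).astype(float)
    return x

# Drawing a "model sample" for the ML gradient means running this chain
# until it mixes -- the slow, expensive part the slide is pointing at.
x = rng.integers(0, 2, n_v).astype(float)
for _ in range(1000):
    x = gibbs_step(x)
```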

SLIDE 10

Boltzmann Machines: disadvantage

  • The model is badly parameterized for learning high-quality samples.
  • Why?
  • Learning leads to large values of the model parameters.
  • Large-valued parameters give a peaky distribution.
  • Large-valued parameters mean slow mixing of the sampler.
  • Slow mixing means that the gradient updates are correlated, which leads to divergence of learning.

SLIDE 11

Boltzmann Machines: disadvantage

  • The model is badly parameterized for learning high-quality samples.
  • Why poor mixing?

[Figure: MNIST dataset and 1st-layer features (RBM); sampling requires coordinated flipping of low-level features]

SLIDE 12

Directed graphical models

  • Two problems:
  • 1. Summation over exponentially many states in h
  • 2. Posterior inference, i.e. calculating p(h | x), is intractable.

$$p(x, h) = p(x \mid h^{(1)})\, p(h^{(1)} \mid h^{(2)}) \cdots p(h^{(L-1)} \mid h^{(L)})\, p(h^{(L)})$$

$$\frac{d}{d\theta_i} \log p(x) = \frac{1}{p(x)} \frac{d}{d\theta_i} p(x), \qquad p(x) = \sum_h p(x \mid h)\, p(h)$$

[Figure: directed graphical model over x and hidden layers h(1), h(2), h(3)]

SLIDE 13

Directed graphical models: New approaches

  • The variational autoencoder (VAE) model:
  • Kingma and Welling, Auto-Encoding Variational Bayes, International Conference on Learning Representations (ICLR) 2014.
  • Rezende, Mohamed and Wierstra, Stochastic Back-propagation and Variational Inference in Deep Latent Gaussian Models, arXiv.
  • Use a reparametrization that allows them to train very efficiently with gradient backpropagation.
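A minimal sketch of that reparametrization (mine, assuming PyTorch; the shapes are arbitrary): sampling z ~ N(μ, σ²I) is rewritten as a deterministic function of (μ, log σ²) plus exogenous noise, so gradients reach the encoder parameters by ordinary backpropagation.

```python
import torch

def reparameterize(mu, log_var):
    """z = mu + sigma * eps with eps ~ N(0, I); differentiable in mu, log_var."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

mu = torch.zeros(4, 20, requires_grad=True)
log_var = torch.zeros(4, 20, requires_grad=True)
z = reparameterize(mu, log_var)
z.sum().backward()   # gradients flow into mu and log_var
```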

SLIDE 14

Generative stochastic networks

  • General strategy: do not write a formula for p(x), just learn to sample incrementally.
  • Main issue: subject to some of the same constraints on mixing as undirected graphical models.

[Figure: Markov chain of incremental sampling steps]

SLIDE 15

Generative adversarial networks

  • Don’t write a formula for p(x), just learn to sample directly.
  • No summation over all states.
  • How? By playing a game.

SLIDE 16

Two-player zero-sum game

  • Your winnings + your opponent’s winnings = 0
  • Minimax theorem: a rational strategy exists for all such finite games

SLIDE 17

Two-player zero-sum game

  • Strategy: a specification of which moves you make in which circumstances.
  • Equilibrium: each player’s strategy is the best possible response to their opponent’s strategy.
  • Example: rock-paper-scissors:
  • Mixed-strategy equilibrium
  • Choose your action uniformly at random

Payoff to you (rows: your move; columns: your opponent’s move):

            Rock   Paper   Scissors
Rock          0     −1       +1
Paper        +1      0       −1
Scissors     −1     +1        0
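A quick numerical check of that equilibrium (my illustration, not from the slides): against the uniform mixed strategy, every pure reply has the same expected payoff, so neither player gains by deviating.

```python
import numpy as np

# Payoff to you; rows = your move, columns = opponent's move (R, P, S).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

uniform = np.ones(3) / 3
print(A @ uniform)   # [0. 0. 0.]: every reply to uniform play earns 0
```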

SLIDE 18

Generative modeling with game theory?

  • Can we design a game with a mixed-strategy equilibrium that forces one player to learn to generate from the data distribution?

SLIDE 19

Adversarial nets framework

  • A game between two players:
  • 1. Discriminator D
  • 2. Generator G
  • D tries to discriminate between:
  • A sample from the data distribution.
  • And a sample from the generator G.
  • G tries to “trick” D by generating samples that are hard for D to distinguish from data.

SLIDE 20

Adversarial nets framework

[Figure: input noise z → differentiable function G → x sampled from the model → differentiable function D, which tries to output 0; x sampled from the data → differentiable function D, which tries to output 1]
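A minimal sketch of the two differentiable functions (mine, assuming PyTorch; the 100-d noise and 784-d data sizes are invented, e.g. flattened MNIST):

```python
import torch
import torch.nn as nn

Z_DIM, X_DIM, HIDDEN = 100, 784, 256   # hypothetical sizes

# Differentiable function G: input noise z -> x sampled from the model.
G = nn.Sequential(
    nn.Linear(Z_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, X_DIM), nn.Sigmoid(),   # pixels in [0, 1]
)

# Differentiable function D: x -> probability that x came from the data,
# i.e. D tries to output 1 on data and 0 on samples from G.
D = nn.Sequential(
    nn.Linear(X_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, 1), nn.Sigmoid(),
)

z = torch.randn(64, Z_DIM)   # input noise z
x_fake = G(z)                # x sampled from the model
p_real = D(x_fake)           # D's belief that x_fake is real data
```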

SLIDE 21

Zero-sum game

  • Minimax objective function:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

  • In practice, to estimate G we use:

$$\max_G \; \mathbb{E}_{z \sim p_z(z)}[\log D(G(z))]$$

  • Why? Stronger gradient for G when D is very good.
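One alternating update under these objectives might look as follows (my sketch, reusing G and D from the earlier block and assuming two Adam optimizers; numerical-stability details such as clamping the logs are omitted):

```python
import torch

def gan_step(x_real, G, D, opt_D, opt_G, z_dim=100):
    # Discriminator: ascend log D(x) + log(1 - D(G(z))).
    z = torch.randn(x_real.size(0), z_dim)
    x_fake = G(z).detach()                    # no generator gradient here
    loss_D = -(torch.log(D(x_real)).mean() +
               torch.log(1 - D(x_fake)).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator: the non-saturating objective max log D(G(z)) instead of
    # min log(1 - D(G(z))) -- stronger gradient when D is very good.
    z = torch.randn(x_real.size(0), z_dim)
    loss_G = -torch.log(D(G(z))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```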

SLIDE 22

Discriminator strategy

  • The optimal strategy for any pmodel(x) is always

$$D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{\text{model}}(x)}$$
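The standard one-line derivation behind this (not spelled out on the slide): for a fixed x, D maximizes a pointwise objective of the form $a \log d + b \log(1-d)$ with $a = p_{\text{data}}(x)$ and $b = p_{\text{model}}(x)$:

$$\frac{\partial}{\partial d}\left[a \log d + b \log(1-d)\right] = \frac{a}{d} - \frac{b}{1-d} = 0 \quad\Longrightarrow\quad d^* = \frac{a}{a+b} = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{\text{model}}(x)}.$$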

SLIDE 23

Learning process

[Figure: 1-D illustration of the learning process, animated over slides 23–26: a poorly fit model, after updating D, after updating G, and finally the mixed-strategy equilibrium; curves show the data distribution, the model distribution, and D(x)]


SLIDE 27

Theoretical properties

  • Theoretical properties (assuming infinite data, infinite model capacity, and direct updating of the generator’s distribution):
  • Unique global optimum.
  • The optimum corresponds to the data distribution.
  • Convergence to the optimum is guaranteed.

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

SLIDE 28

Quantitative likelihood results

Model              MNIST        TFD
DBN [3]            138 ± 2      1909 ± 66
Stacked CAE [3]    121 ± 1.6    2110 ± 50
Deep GSN [6]       214 ± 1.1    1890 ± 29
Adversarial nets   225 ± 2      2057 ± 26

  • Parzen window-based log-likelihood estimates.
  • Density estimate with Gaussian kernels centered on the samples drawn from the model.
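A sketch of that estimator (mine; the kernel width sigma would be chosen on a validation set):

```python
import numpy as np
from scipy.special import logsumexp

def parzen_mean_log_likelihood(test_x, samples, sigma):
    """Mean log-likelihood of test points under a Parzen window estimate:
    isotropic Gaussian kernels of width sigma centered on model samples."""
    # Squared distances between every test point and every model sample.
    d2 = ((test_x[:, None, :] - samples[None, :, :]) ** 2).sum(-1)
    dim = test_x.shape[1]
    log_p = (logsumexp(-d2 / (2 * sigma ** 2), axis=1)
             - np.log(len(samples))
             - 0.5 * dim * np.log(2 * np.pi * sigma ** 2))
    return log_p.mean()
```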

SLIDE 29

Visualization of model samples

[Figure: model samples on MNIST, TFD, CIFAR-10 (fully connected), and CIFAR-10 (convolutional)]

SLIDE 30

Learned 2-D manifold of MNIST

SLIDE 31

Visualizing trajectories

  • 1. Draw sample (A)
  • 2. Draw sample (B)
  • 3. Simulate samples along the path between A and B
  • 4. Repeat steps 1-3 as desired.

[Figure: samples A and B with generated images along the path between them]
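A sketch of those steps in code (mine, reusing the generator G from the earlier blocks; linear interpolation in z-space is one simple choice of "path"):

```python
import torch

def interpolate(G, z_a, z_b, steps=10):
    """Generate images along the straight path between noise draws A and B."""
    alphas = torch.linspace(0, 1, steps).view(-1, 1)
    z_path = (1 - alphas) * z_a + alphas * z_b
    return G(z_path)

z_a = torch.randn(1, 100)          # 1. draw sample (A)
z_b = torch.randn(1, 100)          # 2. draw sample (B)
frames = interpolate(G, z_a, z_b)  # 3. simulate samples along the path
```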

SLIDE 32

Visualization of model trajectories

[Figure: model trajectories on the MNIST digit dataset and the Toronto Face Dataset (TFD)]

SLIDE 33

Visualization of model trajectories

[Figure: model trajectories on CIFAR-10 (convolutional)]

SLIDE 34

Extensions

  • Conditional model:
  • Learn p(x | y)
  • Discriminator is trained on (x, y) pairs
  • Generator net gets y and z as input
  • Useful for: translation, speech synthesis, image segmentation.
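A sketch of how those inputs are wired up (mine, assuming PyTorch; the one-hot class labels and sizes below are invented):

```python
import torch
import torch.nn as nn

Z_DIM, Y_DIM, X_DIM, HIDDEN = 100, 10, 784, 256   # hypothetical sizes

# Generator gets y and z as input (here simply concatenated).
G_cond = nn.Sequential(
    nn.Linear(Z_DIM + Y_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, X_DIM), nn.Sigmoid(),
)

# Discriminator is trained on (x, y) pairs.
D_cond = nn.Sequential(
    nn.Linear(X_DIM + Y_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, 1), nn.Sigmoid(),
)

z = torch.randn(64, Z_DIM)
y = torch.eye(Y_DIM)[torch.randint(0, Y_DIM, (64,))]   # one-hot labels
x_fake = G_cond(torch.cat([z, y], dim=1))              # sample from p(x | y)
p = D_cond(torch.cat([x_fake, y], dim=1))
```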

SLIDE 35

Extensions


  • Inference net:
  • Learn a network to model p(z | x)
  • Infinite training set!
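A sketch of why the training set is effectively infinite (mine; E is a hypothetical encoder, and G is a trained generator as in the earlier blocks): every batch of (z, x = G(z)) pairs is freshly generated.

```python
import torch
import torch.nn as nn

E = nn.Sequential(                    # hypothetical inference net for p(z | x)
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 100),
)
opt = torch.optim.Adam(E.parameters())

for _ in range(1000):
    z = torch.randn(64, 100)
    with torch.no_grad():
        x = G(z)                      # G is fixed; fresh data every step
    loss = ((E(x) - z) ** 2).mean()   # regress z from x
    opt.zero_grad(); loss.backward(); opt.step()
```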
SLIDE 36

Extensions

  • Take advantage of large amounts of unlabeled data using the generator.
  • Train G on a large, unlabeled dataset
  • Train G’ to learn p(z | x) on an infinite training set
  • Add a layer on top of G’, train on a small labeled training set

SLIDE 37

Extensions

  • Take advantage of unlabeled data using the discriminator.
  • Train G and D on a large amount of unlabeled data
  • Replace the last layer of D
  • Continue training D on a small amount of labeled data
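A sketch of that recipe (mine, reusing the D defined earlier; `small_labeled_loader` is a hypothetical DataLoader over the labeled set):

```python
import torch.nn as nn
import torch.optim as optim

# Keep D's learned features, drop its final real/fake layer (Linear + Sigmoid),
# and put a 10-way classifier head on top.
features = list(D.children())[:-2]
classifier = nn.Sequential(*features, nn.Linear(256, 10))

opt = optim.Adam(classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
for x, y in small_labeled_loader:     # hypothetical labeled DataLoader
    loss = criterion(classifier(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```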

SLIDE 38

Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

Emily Denton¹∗, Soumith Chintala²∗, Arthur Szlam², Rob Fergus²

¹New York University  ²Facebook AI Research  ∗Denotes equal contribution

December 16, 2015

SLIDE 39

Overview

  • Parametric generative model of natural images
  • Difficult to generate large natural images in one shot, but we can exploit their multi-scale structure
  • We combine the power of generative adversarial networks (GANs) with a multi-scale image representation (the Laplacian pyramid)

SLIDE 40

Generative modelling of natural images

  • Have access to x ∼ pdata(x) through the training set
  • Want to learn a model x ∼ pmodel(x)
  • Want pmodel to be similar to pdata:
  • Samples drawn from pmodel reflect the structure of pdata
  • Samples from the true data distribution have high likelihood under pmodel

SLIDE 41

Why do generative modeling?

  • Unsupervised representation learning
  • Can transfer the learned representation to discriminative tasks, retrieval, clustering, etc.
  • Train a network with both discriminative and generative criteria
  • Very little labeled data: regularization
  • Understand data
  • Density estimation
  • ...

SLIDE 42

CIFAR-10 samples from other models

[Figure: CIFAR-10 samples from Goodfellow et al. (2014), Sohl-Dickstein et al. (2015), and Gregor et al. (2015)]

SLIDE 43

Generative adversarial networks (Goodfellow et al., 2014)

  • Generative model G: captures the data distribution
  • Discriminative model D: trained to distinguish between real and fake samples; defines the loss function for G

SLIDE 44

Generative adversarial networks

  • D is trained to estimate the probability that a sample came from the data distribution rather than from G
  • G is trained to maximize the probability of D making a mistake

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{\text{noise}}(z)}[\log(1 - D(G(z)))]$$

SLIDE 45

Conditional generative adversarial networks (CGAN)

  • Condition generation on additional info y (e.g., a class label or another image)
  • D has to determine whether samples are realistic given y

[Mirza and Osindero (2014); Gauthier (2014)]

SLIDE 46

Laplacian pyramid (Burt & Adelson, 1983)
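The construction itself, as a sketch (mine, assuming OpenCV's pyrDown/pyrUp for the blur-and-resample primitives): each level stores the high-frequency residual between an image and the upsampled version of its blurred, downsampled copy, plus a final low-pass image.

```python
import numpy as np
import cv2

def build_laplacian_pyramid(img, levels=3):
    pyramid, current = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)                        # blur + downsample
        up = cv2.pyrUp(down, dstsize=current.shape[1::-1]) # back to this size
        pyramid.append(current - up)                       # band-pass residual
        current = down
    pyramid.append(current)                                # low-pass residue
    return pyramid

def reconstruct(pyramid):
    """Inverse transform: upsample and add residuals back, coarse to fine."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = cv2.pyrUp(current, dstsize=residual.shape[1::-1]) + residual
    return current
```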

SLIDE 47

Laplacian pyramid (Burt & Adelson, 1983)

SLIDE 48

Training procedure

  • Train a conditional GAN for each level of the Laplacian pyramid
  • G learns to generate high-frequency structure consistent with the low-frequency image

SLIDE 49

Training procedure

Each level of the Laplacian pyramid is trained independently

SLIDE 50

Sampling procedure

[Figure: LAPGAN sampling chain. The coarsest generator G3 maps noise z3 to a small image Ĩ3; at each finer level, Gk takes noise zk together with the upsampled image lk and produces a high-frequency residual h̃k, giving Ĩk = lk + h̃k, down to the full-resolution sample Ĩ0.]
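In code, the chain reads roughly as follows (my sketch; `generators` and their two-argument conditional call are hypothetical stand-ins for the trained per-level models, which are convolutional in the paper):

```python
import torch
import torch.nn.functional as F

def lapgan_sample(generators, sizes, z_dim=100):
    """Coarse-to-fine sampling: generators[0] maps noise to the coarsest
    image; each finer generator turns (noise, upsampled image) into a
    high-frequency residual that is added back in."""
    z = torch.randn(1, z_dim)
    img = generators[0](z)                            # coarsest sample
    for G_k, size in zip(generators[1:], sizes):
        low = F.interpolate(img, size=size, mode='bilinear',
                            align_corners=False)      # l_k: upsampled image
        z = torch.randn(1, z_dim)
        img = low + G_k(z, low)                       # I~_k = l_k + h~_k
    return img
```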

SLIDE 51

CIFAR-10

Small dataset: 32×32 images of objects, 50k images, 10 classes

SLIDE 52

CIFAR-10 ship samples

SLIDE 53

CIFAR-10 horse samples

SLIDE 54

CIFAR-10 nearest neighbours (pixel space)

SLIDE 55

CIFAR-10 nearest neighbours (nn feature space)

SLIDE 56

CIFAR-10 human evaluations

  • Humans were randomly presented with a real or generated image and asked to determine whether it was real or fake
  • Humans think LAPGAN generations are real ∼40% of the time

[Figure: % classified real vs. presentation time (50–2000 ms) for Real, CC-LAPGAN, LAPGAN, and GAN images]

SLIDE 57

LSUN

Large dataset of scenes, ∼10M images, 10 classes.

SLIDE 58

LSUN coarse-to-fine chain

SLIDE 59

LSUN church samples

SLIDE 60

LSUN tower samples

SLIDE 61

LSUN variability

SLIDE 62

Recent developments in GAN training

Radford, Metz and Chintala (2015) propose several tricks to make GAN training more stable

http://arxiv.org/pdf/1511.06434v1.pdf

Future work: apply the same tricks to the training of the LAPGAN model to potentially improve samples and produce higher-resolution images

SLIDE 63

Conclusion

  • Proposed a simple generative model that can produce decent-quality samples of natural images
  • Potential to be used as a decoder in an autoencoder framework for unsupervised learning
  • The GAN framework is difficult to train; there is no clear objective function to track
  • Code & demo: http://soumith.ch/eyescream

SLIDE 64

The End

Code & demo: http://soumith.ch/eyescream
