slide-1
SLIDE 1

Generative adversarial networks

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

slide-2
SLIDE 2

2014 NIPS Workshop on Perturbations, Optimization, and Statistics --- Ian Goodfellow

Discriminative deep learning

  • Recipe for success


slide-3
SLIDE 3

Discriminative deep learning

  • Recipe for success: Google’s winning entry into the ImageNet 1K competition (with extra data).

slide-4
SLIDE 4

Discriminative deep learning

  • Recipe for success:
  • Gradient backpropagation.
  • Dropout.
  • Activation functions:
  • rectified linear
  • maxout
  • Google’s winning entry into the ImageNet 1K competition (with extra data).

slide-5
SLIDE 5


Generative modeling

  • Have training examples x ∼ pdata(x)
  • Want a model that can draw samples: x ∼ pmodel(x)
  • Where pmodel ≈ pdata

slide-6
SLIDE 6


Why generative models?

  • Conditional generative models
  • Speech synthesis: Text ⇒ Speech
  • Machine Translation: French ⇒ English
  • French: Si mon tonton tond ton tonton, ton tonton sera tondu.
  • English: If my uncle shaves your uncle, your uncle will be shaved
  • Image ⇒ Image segmentation
  • Environment simulator
  • Reinforcement learning
  • Planning
  • Leverage unlabeled data


slide-7
SLIDE 7


Maximum likelihood: the dominant approach

  • ML objective function:

θ∗ = argmax_θ (1/m) Σ_{i=1}^{m} log p(x^(i); θ)
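As a concrete sketch (not from the slides), here is the ML objective for a toy Gaussian model, where the closed-form ML parameters maximize the average log-likelihood above:

```python
import numpy as np

# Toy maximum-likelihood fit (illustrative sketch): the model p(x; theta)
# is a Gaussian with theta = (mu, sigma).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=10_000)   # x^(i) ~ p_data

def avg_log_likelihood(x, mu, sigma):
    """(1/m) * sum_i log p(x^(i); mu, sigma) for a Gaussian model."""
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (x - mu) ** 2 / (2 * sigma**2))

# For a Gaussian the ML solution is available in closed form:
mu_ml, sigma_ml = data.mean(), data.std()

# No other parameter setting scores higher on the training data:
best = avg_log_likelihood(data, mu_ml, sigma_ml)
assert best >= avg_log_likelihood(data, 0.0, 1.0)
assert best >= avg_log_likelihood(data, 2.1, 1.5)
```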

slide-8
SLIDE 8


Undirected graphical models

  • State-of-the-art general purpose undirected

graphical model: Deep Boltzmann machines

  • Several “hidden layers” h

p(h, x) = (1/Z) p̃(h, x),   p̃(h, x) = exp(−E(h, x)),   Z = Σ_{h,x} p̃(h, x)

[Figure: DBM with visible units x and hidden layers h(1), h(2), h(3).]
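To make Z concrete, here is a brute-force sketch for a tiny fully visible Boltzmann machine with an assumed toy energy (the DBM above additionally sums over hidden layers h), enumerating all states:

```python
import itertools
import numpy as np

# Brute-force partition function for a tiny Boltzmann machine over x in {0,1}^n
# (toy fully visible case; a DBM would also sum over the hidden layers h).
rng = np.random.default_rng(0)
n = 4
W = rng.normal(size=(n, n))
W = (W + W.T) / 2                # symmetric couplings
np.fill_diagonal(W, 0.0)
b = rng.normal(size=n)

def energy(x):
    return -0.5 * x @ W @ x - b @ x

states = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=n)]
p_tilde = np.array([np.exp(-energy(s)) for s in states])   # unnormalized p~
Z = p_tilde.sum()                                          # partition function
p = p_tilde / Z
assert np.isclose(p.sum(), 1.0)   # dividing by Z yields a proper distribution
```

The enumeration over 2^n states is exactly what makes Z intractable at realistic sizes.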

slide-9
SLIDE 9


Undirected graphical models: disadvantage

  • ML Learning requires that we draw samples:
  • Common way to do this is via MCMC (Gibbs sampling).

d/dθᵢ log p(x) = d/dθᵢ [ log Σ_h p̃(h, x) − log Z(θ) ]

d/dθᵢ log Z(θ) = (d/dθᵢ Z(θ)) / Z(θ)
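A minimal sketch of the Gibbs sampler mentioned above, for a binary RBM with assumed toy weights: alternate h ~ p(h | x) and x ~ p(x | h).

```python
import numpy as np

# Block Gibbs sampling in a binary RBM (illustrative sketch, toy weights):
# alternately sample h ~ p(h | x) and x ~ p(x | h). Chains like this supply
# the model samples needed for the negative phase of the ML gradient.
rng = np.random.default_rng(0)
n_vis, n_hid = 6, 3
W = rng.normal(scale=0.1, size=(n_vis, n_hid))
b = np.zeros(n_vis)      # visible biases
c = np.zeros(n_hid)      # hidden biases

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gibbs_step(x):
    h = (rng.random(n_hid) < sigmoid(x @ W + c)).astype(float)   # h ~ p(h|x)
    x = (rng.random(n_vis) < sigmoid(W @ h + b)).astype(float)   # x ~ p(x|h)
    return x

x = rng.integers(0, 2, size=n_vis).astype(float)
for _ in range(100):     # run the chain; in practice mixing can be very slow
    x = gibbs_step(x)
assert x.shape == (n_vis,) and set(x) <= {0.0, 1.0}
```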

slide-10
SLIDE 10


Boltzmann Machines: disadvantage

  • The model is badly parameterized for learning high-quality samples.
  • Why?
  • Learning leads to large values of the model parameters.
  • Large-valued parameters mean a peaky distribution.
  • Large-valued parameters mean slow mixing of the sampler.
  • Slow mixing means the gradient updates are correlated, which can make learning diverge.

slide-11
SLIDE 11


Boltzmann Machines: disadvantage

  • Model is badly parameterized for learning high

quality samples.

  • Why poor mixing? Coordinated flipping of low-level features is required.

[Figure: MNIST dataset and 1st-layer features learned by an RBM.]

slide-12
SLIDE 12


Directed graphical models

  • Two problems:
  • 1. Summation over exponentially many states in h.
  • 2. Posterior inference, i.e. calculating p(h | x), is intractable.

p(x, h) = p(x | h^(1)) p(h^(1) | h^(2)) · · · p(h^(L−1) | h^(L)) p(h^(L))

p(x) = Σ_h p(x | h) p(h),   d/dθᵢ log p(x) = (1/p(x)) d/dθᵢ p(x)

[Figure: directed model with hidden layers h(1), h(2), h(3) and visible units x.]

slide-13
SLIDE 13


Directed graphical models: New approaches

  • The variational autoencoder model:
  • Kingma and Welling, Auto-Encoding Variational Bayes, International Conference on Learning Representations (ICLR), 2014.
  • Rezende, Mohamed and Wierstra, Stochastic Back-propagation and Variational Inference in Deep Latent Gaussian Models, arXiv, 2014.
  • Both use a reparametrization that allows them to train very efficiently with gradient backpropagation.
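The reparametrization in question can be sketched in a few lines (illustrative, for a Gaussian latent): write z = μ + σ·ε with ε ~ N(0, 1), so the sampling noise is separated from the parameters and gradients can flow through μ and σ.

```python
import numpy as np

# Reparametrization trick (sketch): rather than sampling z ~ N(mu, sigma^2)
# directly, set z = mu + sigma * eps with eps ~ N(0, 1). The randomness no
# longer depends on the parameters, so backpropagation can reach mu and sigma.
rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.5

eps = rng.standard_normal(200_000)   # parameter-free noise
z = mu + sigma * eps                 # differentiable in mu and sigma

assert abs(z.mean() - mu) < 0.01     # same distribution as direct sampling
assert abs(z.std() - sigma) < 0.01
```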

slide-14
SLIDE 14


Generative stochastic networks

  • General strategy: do not write a formula for p(x), just learn to sample incrementally.
  • Main issue: subject to some of the same constraints on mixing as undirected graphical models.

slide-15
SLIDE 15


Generative adversarial networks

  • Don’t write a formula for p(x), just learn to sample directly.
  • No summation over all states.
  • How? By playing a game.

slide-16
SLIDE 16


Two-player zero-sum game

  • Your winnings + your opponent’s winnings = 0
  • Minimax theorem: an optimal (possibly mixed) strategy exists for all such finite games.

slide-17
SLIDE 17


Two-player zero-sum game

  • Strategy: a specification of which moves you make in which circumstances.
  • Equilibrium: each player’s strategy is the best possible response to their opponent’s strategy.
  • Example: rock-paper-scissors:
  • Mixed-strategy equilibrium
  • Choose your action at random

Your payoff (rows: your move; columns: your opponent’s move):

              Rock   Paper   Scissors
  Rock           0      −1        +1
  Paper         +1       0        −1
  Scissors      −1      +1         0
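A quick numerical check of the mixed-strategy equilibrium, using the payoff matrix above:

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player (0 draw, +1 win, -1 loss).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

# The mixed-strategy equilibrium plays each action with probability 1/3.
p = np.full(3, 1 / 3)     # your strategy
q = np.full(3, 1 / 3)     # opponent's strategy

value = p @ A @ q
assert np.isclose(value, 0.0)    # zero-sum: the game's value is 0

# Against the equilibrium, every pure strategy earns the same payoff,
# so neither player can improve by deviating.
assert np.allclose(A @ q, 0.0)
```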

slide-18
SLIDE 18


Generative modeling with game theory?

  • Can we design a game with a mixed-strategy equilibrium that forces one player to learn to generate from the data distribution?

slide-19
SLIDE 19


Adversarial nets framework


  • A game between two players:
  • 1. Discriminator D
  • 2. Generator G
  • D tries to discriminate between:
  • A sample from the data distribution.
  • And a sample from the generator G.
  • G tries to “trick” D by generating samples that are hard for D to distinguish from data.

slide-20
SLIDE 20


Adversarial nets framework

  • Input noise z → differentiable function G → x sampled from the model → differentiable function D; D tries to output 0.
  • x sampled from the data → differentiable function D; D tries to output 1.

slide-21
SLIDE 21


Zero-sum game

  • Minimax objective function:

min_G max_D V(D, G) = E_{x∼pdata(x)}[log D(x)] + E_{z∼pz(z)}[log(1 − D(G(z)))]

  • In practice, to estimate G we use:

max_G E_{z∼pz(z)}[log D(G(z))]

  • Why? Stronger gradient for G when D is very good.
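The “stronger gradient” claim can be checked directly; a sketch, writing D(G(z)) = sigmoid(a) for the discriminator’s logit a:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Let d = D(G(z)) = sigmoid(a), where a is D's logit on a generated sample.
# Gradients of the two generator objectives w.r.t. the logit a:
#   minimax loss log(1 - d):    d/da log(1 - sigmoid(a)) = -d
#   practical objective log d:  d/da log(sigmoid(a))     = 1 - d
a = -7.0               # D is very good: it confidently rejects the fake
d = sigmoid(a)         # d is close to 0

grad_minimax = -d          # vanishes: almost no learning signal for G
grad_practical = 1.0 - d   # near 1: strong signal for G

assert abs(grad_minimax) < 1e-2
assert grad_practical > 0.99
```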

slide-22
SLIDE 22


Discriminator strategy

  • The optimal strategy for any pmodel(x) is always:

D(x) = pdata(x) / (pdata(x) + pmodel(x))
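This can be verified numerically at a single point x (density values below are assumed for illustration):

```python
import numpy as np

# At a point x, D's contribution to V is
#   pdata(x) log D(x) + pmodel(x) log(1 - D(x)).
# Check that d* = pdata / (pdata + pmodel) maximizes it.
p_data, p_model = 0.7, 0.2          # toy density values at some x

def contribution(d):
    return p_data * np.log(d) + p_model * np.log(1.0 - d)

d_star = p_data / (p_data + p_model)             # the slide's formula
grid = np.linspace(1e-4, 1 - 1e-4, 100_001)      # search over D(x) values
d_best = grid[np.argmax(contribution(grid))]
assert abs(d_best - d_star) < 1e-3
```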

slide-23
SLIDE 23


Learning process

[Figure: data distribution, model distribution, and discriminator output pD(data), shown for a poorly fit model, after updating D, after updating G, and at the mixed-strategy equilibrium.]


slide-27
SLIDE 27


Theoretical properties

  • Theoretical properties (assuming infinite data, infinite model capacity, and direct updating of the generator’s distribution):
  • Unique global optimum.
  • The optimum corresponds to the data distribution.
  • Convergence to the optimum is guaranteed.

min_G max_D V(D, G) = E_{x∼pdata(x)}[log D(x)] + E_{z∼pz(z)}[log(1 − D(G(z)))]

slide-28
SLIDE 28


Quantitative likelihood results

  Model              MNIST        TFD
  DBN [3]            138 ± 2      1909 ± 66
  Stacked CAE [3]    121 ± 1.6    2110 ± 50
  Deep GSN [6]       214 ± 1.1    1890 ± 29
  Adversarial nets   225 ± 2      2057 ± 26

  • Parzen window-based log-likelihood estimates.
  • Density estimated with Gaussian kernels centered on the samples drawn from the model.
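A sketch of the Parzen window estimator (1-D for simplicity; real evaluations use high-dimensional image samples):

```python
import numpy as np

# Parzen window log-likelihood estimate: center a Gaussian kernel of width
# sigma on each model sample and score held-out points under the resulting
# kernel density estimate.
def parzen_log_likelihood(samples, test_points, sigma):
    diffs = test_points[:, None] - samples[None, :]          # (n_test, m)
    log_k = (-diffs**2 / (2 * sigma**2)
             - 0.5 * np.log(2 * np.pi * sigma**2))
    mx = log_k.max(axis=1, keepdims=True)                    # stable log-mean-exp
    return mx[:, 0] + np.log(np.exp(log_k - mx).mean(axis=1))

rng = np.random.default_rng(0)
samples = rng.normal(size=5_000)    # stand-in for samples drawn from the model
test = rng.normal(size=1_000)       # stand-in for held-out test data
avg_ll = parzen_log_likelihood(samples, test, sigma=0.2).mean()

# The true average log-density of a standard normal is about -1.42;
# the Parzen estimate should land in that vicinity.
assert -1.7 < avg_ll < -1.2
```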

slide-29
SLIDE 29


Visualization of model samples

[Figure: model samples on MNIST, TFD, CIFAR-10 (fully connected), and CIFAR-10 (convolutional).]

slide-30
SLIDE 30


Learned 2-D manifold of MNIST


slide-31
SLIDE 31


Visualizing trajectories

  • 1. Draw sample A.
  • 2. Draw sample B.
  • 3. Simulate samples along the path between A and B.
  • 4. Repeat steps 1-3 as desired.
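The steps above amount to interpolating in latent space; a sketch with a stand-in for the trained generator:

```python
import numpy as np

# Walk a straight line in latent space between codes z_a and z_b and decode
# every intermediate point. `generator` is a placeholder for a trained G.
rng = np.random.default_rng(0)
dim = 100
z_a = rng.standard_normal(dim)   # latent code for sample A
z_b = rng.standard_normal(dim)   # latent code for sample B

def generator(z):
    return np.tanh(z)    # stand-in for the trained generator network

trajectory = [generator((1 - t) * z_a + t * z_b)
              for t in np.linspace(0.0, 1.0, num=9)]

assert len(trajectory) == 9
assert np.allclose(trajectory[0], generator(z_a))     # endpoint A
assert np.allclose(trajectory[-1], generator(z_b))    # endpoint B
```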

slide-32
SLIDE 32


Visualization of model trajectories

[Figure: trajectories on the MNIST digit dataset and the Toronto Face Dataset (TFD).]

slide-33
SLIDE 33


CIFAR-10 (convolutional)

Visualization of model trajectories

slide-34
SLIDE 34


Extensions

  • Conditional model:
  • Learn p(x | y)
  • Discriminator is trained on (x, y) pairs
  • Generator net gets y and z as input
  • Useful for: translation, speech synthesis, image segmentation.

slide-35
SLIDE 35


Extensions

  • Inference net:
  • Learn a network to model p(z | x)
  • Infinite training set! (G can supply unlimited (x, z) pairs to train on.)
slide-36
SLIDE 36


Extensions

  • Take advantage of large amounts of unlabeled data using the generator.
  • Train G on a large, unlabeled dataset.
  • Train G’ to learn p(z | x) on an infinite training set.
  • Add a layer on top of G’ and train it on a small labeled training set.

slide-37
SLIDE 37


Extensions

  • Take advantage of unlabeled data using the discriminator.
  • Train G and D on a large amount of unlabeled data.
  • Replace the last layer of D.
  • Continue training D on a small amount of labeled data.

slide-38
SLIDE 38

Thank You.


Questions?