SLIDE 1 Some results on GAN dynamics
Ioannis Mitliagkas
SLIDE 2
Game dynamics are weird and fascinating
SLIDE 3
Start with optimization dynamics
SLIDE 4
Optimization
Smooth, differentiable cost function, L → Looking for stationary (fixed) points (gradient is 0) → Gradient descent
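As a concrete sketch (the quadratic cost, step size, and iteration count below are illustrative choices, not from the slides), gradient descent drives the gradient of a smooth cost to zero:

```python
import numpy as np

# Toy smooth cost: L(theta) = 0.5 * theta^T A theta, with A positive definite.
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
grad = lambda theta: A @ theta   # gradient of L

theta = np.array([1.0, -1.0])
eta = 0.1                        # step size
for _ in range(500):
    theta = theta - eta * grad(theta)  # gradient descent step

print(np.linalg.norm(grad(theta)))  # near zero: a stationary (fixed) point
```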
SLIDE 5 Optimization
Conservative vector field → Straightforward dynamics
Ferenc Huszar
SLIDE 6 Gradient descent
Conservative vector field → Straightforward dynamics
Fixed-point analysis: the Jacobian of the operator is determined by the Hessian of the objective, L
SLIDE 7 Local convergence
Jacobian of operator: determined by the Hessian of the objective L, which is symmetric with real eigenvalues
Eigenvalues of op. Jacobian: if ρ(θ*) = max |λ(θ*)| < 1, then fast local convergence
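This check is easy to do numerically. A sketch, assuming the gradient-descent operator F(θ) = θ − η∇L(θ), whose Jacobian at a fixed point is I − ηH; the Hessian H below is an illustrative choice:

```python
import numpy as np

# Gradient-descent operator F(theta) = theta - eta * grad L(theta).
# At a fixed point its Jacobian is I - eta * H, H being the Hessian of L.
H = np.array([[2.0, 0.3],
              [0.3, 1.0]])   # illustrative symmetric Hessian
eta = 0.4
J = np.eye(2) - eta * H

eigs = np.linalg.eigvals(J)   # real, since H is symmetric
rho = np.max(np.abs(eigs))    # spectral radius at the fixed point
print(rho, rho < 1)           # rho < 1  =>  fast local convergence
```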
SLIDE 8
Games
SLIDE 9 Implicit generative models
- Generative moment matching networks [Li et al. 2015]
- Other, domain-specific losses can be used
- Variational AutoEncoders [Kingma, Welling, 2014]
- Autoregressive models (PixelRNN [van den Oord et al., 2016])
SLIDE 10 Generative Adversarial Networks
Generator network, G: given a latent code z, produces a sample G(z)
Discriminator network, D: given a sample x or G(z), estimates the probability that it is real
Both networks are differentiable
SLIDE 11
Generative Adversarial Networks
SLIDE 12
Games
Nash Equilibrium
Smooth, differentiable L → Looking for local Nash equilibria → Gradient descent → Simultaneous → Alternating
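To illustrate the simultaneous vs. alternating distinction on a toy bilinear game (my illustrative example, min_x max_y xy, not one from the slides):

```python
import numpy as np

# Two-player zero-sum game: min_x max_y f(x, y) = x * y.
# Player x descends, player y ascends.
eta = 0.1
f_x = lambda x, y: y   # df/dx
f_y = lambda x, y: x   # df/dy

# Simultaneous updates: both players move from the current iterate.
x, y = 1.0, 1.0
for _ in range(100):
    x, y = x - eta * f_x(x, y), y + eta * f_y(x, y)
sim_dist = np.hypot(x, y)      # distance from the equilibrium (0, 0)

# Alternating updates: player y sees player x's fresh iterate.
x, y = 1.0, 1.0
for _ in range(100):
    x = x - eta * f_x(x, y)
    y = y + eta * f_y(x, y)
alt_dist = np.hypot(x, y)

print(sim_dist, alt_dist)      # simultaneous spirals out; alternating stays bounded
```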
SLIDE 13
Game dynamics
Non-conservative vector field → Rotational dynamics
SLIDE 14 Game dynamics under gradient descent
Jacobian is non-symmetric, with complex eigenvalues → Rotations in decision space
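A minimal numerical illustration, using the toy bilinear game min_x max_y xy (my choice of example): the vector field followed by simultaneous gradient play has a non-symmetric Jacobian whose eigenvalues are purely imaginary, i.e. pure rotation.

```python
import numpy as np

# For min_x max_y f(x, y) = x * y, simultaneous gradient play follows
# the vector field v(x, y) = (df/dx, -df/dy) = (y, -x).
# Its Jacobian is non-symmetric:
J = np.array([[0.0,  1.0],
              [-1.0, 0.0]])

eigs = np.linalg.eigvals(J)
print(eigs)  # purely imaginary eigenvalues -> rotations in decision space
```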
Games demonstrate rotational dynamics.
SLIDE 15
The Numerics of GANs
by Mescheder, Nowozin, Geiger, NeurIPS 2017
SLIDE 16
A word on notation and formulation
Warning: watch the maximization vs. minimization convention, and the sign of the step size
SLIDE 17
Eigen-analysis, gradient descent
SLIDE 18
The Numerics of GANs
SLIDE 19
Make vector field “more conservative”
Idea 1: Minimize the norm of the gradient
SLIDE 20
Idea 1: Minimize vector field norm
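A sketch of the idea on a toy bilinear game (the game, the penalty weight gamma, the step size, and the sign conventions below are my illustrative choices, not the paper's exact formulation): descending the joint vector field plus the gradient of ½‖v‖² pulls the rotating dynamics into the equilibrium.

```python
import numpy as np

# Bilinear game f(x, y) = x * y.  Joint vector field for simultaneous
# descent (on x) / ascent (on y): v(x, y) = (y, -x).
# Regularizer: R = 0.5 * ||v||^2 = 0.5 * (x^2 + y^2), with gradient (x, y).
eta, gamma = 0.1, 1.0
z = np.array([1.0, 1.0])
for _ in range(200):
    x, y = z
    v = np.array([y, -x])
    reg_grad = np.array([x, y])            # gradient of 0.5 * ||v||^2
    z = z - eta * (v + gamma * reg_grad)   # regularized update

print(np.linalg.norm(z))  # spirals in toward the equilibrium (0, 0)
```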
SLIDE 21
Idea 2: use L as regularizer
SLIDE 22
Idea 2: use L as regularizer
SLIDE 23
Idea 2: use L as regularizer
SLIDE 24
Other ways to control these rotations?
SLIDE 25 Momentum (heavy ball, Polyak 1964)
Jacobian of momentum operator
Non-symmetric, with complex eigenvalues → Rotations in augmented state-space
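This can be checked numerically. A sketch, assuming the standard heavy-ball update θ_{t+1} = θ_t − η∇L(θ_t) + β(θ_t − θ_{t−1}) and an illustrative symmetric Hessian:

```python
import numpy as np

# Heavy ball: theta_{t+1} = theta_t - eta * grad L + beta * (theta_t - theta_{t-1}).
# Augmented state (theta_{t+1}, theta_t) has the block Jacobian
#   [[(1 + beta) I - eta H,  -beta I],
#    [          I,                0 ]]
H = np.array([[2.0, 0.0],
              [0.0, 0.5]])   # illustrative symmetric Hessian
eta, beta = 0.1, 0.9
I, Z = np.eye(2), np.zeros((2, 2))
J = np.block([[(1 + beta) * I - eta * H, -beta * I],
              [I, Z]])

eigs = np.linalg.eigvals(J)
print(eigs)  # complex eigenvalues, even though H is symmetric
```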
SLIDE 26
Summary
Positive momentum can be bad for adversarial games, yet using it was very common practice when GANs were first invented. → Recent work reduced the momentum parameter → Not an accident
SLIDE 27 Negative Momentum for Improved Game Dynamics
Gidel, Askari Hemmat, Pezeshki, Huang, Le Priol, Lacoste-Julien, Mitliagkas AISTATS 2019
SLIDE 28
Our results
- Negative momentum is optimal on a simple bilinear game
- Negative momentum is empirically best for certain zero-sum games like “saturating GANs”
- Negative momentum values near 0 are locally preferable on a more general class of games
SLIDE 29 Momentum on games
Recall Polyak’s momentum (on top of simultaneous gradient descent). Its fixed-point operator requires a state augmentation, because the update needs the previous iterate.
SLIDE 30
Bilinear game
SLIDE 31
“Proof by picture”
Gradient descent → Simultaneous → Alternating
Momentum → Positive → Negative
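The picture can be reproduced in a few lines. A sketch on the toy bilinear game f(x, y) = xy with alternating updates (step size, momentum values, and iteration count are illustrative choices):

```python
import numpy as np

# Alternating gradient descent/ascent on f(x, y) = x * y,
# with heavy-ball momentum beta on both players.
def run(beta, eta=0.5, steps=400):
    x, y = 1.0, 1.0
    x_prev, y_prev = x, y
    for _ in range(steps):
        x_new = x - eta * y + beta * (x - x_prev)       # player 1 descends
        y_new = y + eta * x_new + beta * (y - y_prev)   # player 2 ascends, sees x_new
        x_prev, y_prev, x, y = x, y, x_new, y_new
    return np.hypot(x, y)   # distance from the equilibrium (0, 0)

pos = run(beta=0.5)    # positive momentum: spirals out
neg = run(beta=-0.5)   # negative momentum: spirals in
print(pos, neg)
```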
SLIDE 32
General games
SLIDE 33
Eigen-analysis, 0 momentum
SLIDE 34
Zero vs negative momentum
Momentum → Zero → Negative
SLIDE 35
Negative Momentum
SLIDE 36
Empirical results
SLIDE 37 What happens in practice?
Fashion MNIST:
SLIDE 38 What happens in practice?
CIFAR-10:
SLIDE 39 Negative Momentum
To sum up:
- Negative momentum seems to improve the behaviour caused by “bad” eigenvalues
- Optimal for a class of games
- Empirically optimal on “saturating” GANs