SLIDE 1 Some results on GAN dynamics
Ioannis Mitliagkas
SLIDE 2
Game dynamics are weird and fascinating
SLIDE 3
Start with optimization dynamics
SLIDE 4
Optimization
Smooth, differentiable cost function, L → Looking for stationary (fixed) points (gradient is 0) → Gradient descent
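As a concrete sketch (the quadratic cost, step size, and iteration count below are illustrative choices, not from the slides), gradient descent drives the gradient of a smooth cost to zero:

```python
import numpy as np

# Toy smooth cost: L(theta) = 0.5 * theta^T A theta, with A positive definite.
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
grad = lambda theta: A @ theta   # gradient of L

theta = np.array([1.0, -1.0])
eta = 0.1                        # step size
for _ in range(500):
    theta = theta - eta * grad(theta)  # gradient descent step

print(np.linalg.norm(grad(theta)))  # near zero: a stationary (fixed) point
```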
SLIDE 5 Optimization
Conservative vector field → Straightforward dynamics
Ferenc Huszar
SLIDE 6 Gradient descent
Conservative vector field → Straightforward dynamics
Fixed-point analysis: the Jacobian of the operator is determined by the Hessian of the objective, L
SLIDE 7 Local convergence
Jacobian of operator: determined by the Hessian of the objective L, which is symmetric with real eigenvalues
Eigenvalues of op. Jacobian: if ρ(θ*) = max |λ(θ*)| < 1, then fast local convergence
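This check is easy to do numerically. A sketch, assuming the gradient-descent operator F(θ) = θ − η∇L(θ), whose Jacobian at a fixed point is I − ηH; the Hessian H below is an illustrative choice:

```python
import numpy as np

# Gradient-descent operator F(theta) = theta - eta * grad L(theta).
# At a fixed point its Jacobian is I - eta * H, H being the Hessian of L.
H = np.array([[2.0, 0.3],
              [0.3, 1.0]])   # illustrative symmetric Hessian
eta = 0.4
J = np.eye(2) - eta * H

eigs = np.linalg.eigvals(J)   # real, since H is symmetric
rho = np.max(np.abs(eigs))    # spectral radius at the fixed point
print(rho, rho < 1)           # rho < 1  =>  fast local convergence
```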
SLIDE 8
Games
SLIDE 9 Implicit generative models
- Generative moment matching networks [Li et al. 2015]
- Other, domain-specific losses can be used
- Variational AutoEncoders [Kingma, Welling, 2014]
- Autoregressive models (PixelRNN [van den Oord et al., 2016])
SLIDE 10 Generative Adversarial Networks
Generator network, G: given a latent code z, produces a sample G(z)
Discriminator network, D: given a sample x or G(z), estimates the probability that it is real
Both networks are differentiable
SLIDE 11
Generative Adversarial Networks
SLIDE 12
Games
Nash Equilibrium
Smooth, differentiable L → Looking for local Nash equilibria → Gradient descent → Simultaneous → Alternating
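To illustrate the simultaneous vs. alternating distinction on a toy bilinear game (my illustrative example, min_x max_y xy, not one from the slides):

```python
import numpy as np

# Two-player zero-sum game: min_x max_y f(x, y) = x * y.
# Player x descends, player y ascends.
eta = 0.1
f_x = lambda x, y: y   # df/dx
f_y = lambda x, y: x   # df/dy

# Simultaneous updates: both players move from the current iterate.
x, y = 1.0, 1.0
for _ in range(100):
    x, y = x - eta * f_x(x, y), y + eta * f_y(x, y)
sim_dist = np.hypot(x, y)      # distance from the equilibrium (0, 0)

# Alternating updates: player y sees player x's fresh iterate.
x, y = 1.0, 1.0
for _ in range(100):
    x = x - eta * f_x(x, y)
    y = y + eta * f_y(x, y)
alt_dist = np.hypot(x, y)

print(sim_dist, alt_dist)      # simultaneous spirals out; alternating stays bounded
```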
SLIDE 13
Game dynamics
Non-conservative vector field → Rotational dynamics
SLIDE 14 Game dynamics under gradient descent
Jacobian is non-symmetric, with complex eigenvalues → Rotations in decision space
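A minimal numerical illustration, using the toy bilinear game min_x max_y xy (my choice of example): the vector field followed by simultaneous gradient play has a non-symmetric Jacobian whose eigenvalues are purely imaginary, i.e. pure rotation.

```python
import numpy as np

# For min_x max_y f(x, y) = x * y, simultaneous gradient play follows
# the vector field v(x, y) = (df/dx, -df/dy) = (y, -x).
# Its Jacobian is non-symmetric:
J = np.array([[0.0,  1.0],
              [-1.0, 0.0]])

eigs = np.linalg.eigvals(J)
print(eigs)  # purely imaginary eigenvalues -> rotations in decision space
```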
Games demonstrate rotational dynamics.
SLIDE 15
The Numerics of GANs
by Mescheder, Nowozin, Geiger, NeurIPS 2017
SLIDE 16
A word on notation and formulation
Warning: watch the maximization vs. minimization convention, and the sign of the step size
SLIDE 17
Eigen-analysis, gradient descent
SLIDE 18
The Numerics of GANs
SLIDE 19
Make vector field “more conservative”
Idea 1: Minimize the norm of the gradient
SLIDE 20
Idea 1: Minimize vector field norm
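A sketch of the idea on a toy bilinear game (the game, the penalty weight gamma, the step size, and the sign conventions below are my illustrative choices, not the paper's exact formulation): descending the joint vector field plus the gradient of ½‖v‖² pulls the rotating dynamics into the equilibrium.

```python
import numpy as np

# Bilinear game f(x, y) = x * y.  Joint vector field for simultaneous
# descent (on x) / ascent (on y): v(x, y) = (y, -x).
# Regularizer: R = 0.5 * ||v||^2 = 0.5 * (x^2 + y^2), with gradient (x, y).
eta, gamma = 0.1, 1.0
z = np.array([1.0, 1.0])
for _ in range(200):
    x, y = z
    v = np.array([y, -x])
    reg_grad = np.array([x, y])            # gradient of 0.5 * ||v||^2
    z = z - eta * (v + gamma * reg_grad)   # regularized update

print(np.linalg.norm(z))  # spirals in toward the equilibrium (0, 0)
```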
SLIDE 21
Idea 2: use L as regularizer
SLIDE 22
Idea 2: use L as regularizer
SLIDE 23
Idea 2: use L as regularizer
SLIDE 24
Other ways to control these rotations?
SLIDE 25 Momentum (heavy ball, Polyak 1964)
Jacobian of momentum operator
Non-symmetric, with complex eigenvalues → Rotations in augmented state-space
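This can be checked numerically. A sketch, assuming the standard heavy-ball update θ_{t+1} = θ_t − η∇L(θ_t) + β(θ_t − θ_{t−1}) and an illustrative symmetric Hessian:

```python
import numpy as np

# Heavy ball: theta_{t+1} = theta_t - eta * grad L + beta * (theta_t - theta_{t-1}).
# Augmented state (theta_{t+1}, theta_t) has the block Jacobian
#   [[(1 + beta) I - eta H,  -beta I],
#    [          I,                0 ]]
H = np.array([[2.0, 0.0],
              [0.0, 0.5]])   # illustrative symmetric Hessian
eta, beta = 0.1, 0.9
I, Z = np.eye(2), np.zeros((2, 2))
J = np.block([[(1 + beta) * I - eta * H, -beta * I],
              [I, Z]])

eigs = np.linalg.eigvals(J)
print(eigs)  # complex eigenvalues, even though H is symmetric
```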
SLIDE 26
Summary
Positive momentum can be bad for adversarial games, yet using it was very common practice when GANs were first invented. → Recent work reduced the momentum parameter → Not an accident
SLIDE 27 Negative Momentum for Improved Game Dynamics
Gidel, Askari Hemmat, Pezeshki, Huang, Le Priol, Lacoste-Julien, Mitliagkas AISTATS 2019
SLIDE 28
Our results
- Negative momentum is optimal on a simple bilinear game
- Negative momentum is empirically best for certain zero-sum games like “saturating GANs”
- Negative momentum values near 0 are locally preferable on a more general class of games
SLIDE 29 Momentum on games
Recall Polyak’s momentum (on top of simultaneous gradient descent). Its fixed-point operator requires a state augmentation, because the update needs the previous iterate.
SLIDE 30
Bilinear game
SLIDE 31
“Proof by picture”
Gradient descent → Simultaneous → Alternating
Momentum → Positive → Negative
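The picture can be reproduced in a few lines. A sketch on the toy bilinear game f(x, y) = xy with alternating updates (step size, momentum values, and iteration count are illustrative choices):

```python
import numpy as np

# Alternating gradient descent/ascent on f(x, y) = x * y,
# with heavy-ball momentum beta on both players.
def run(beta, eta=0.5, steps=400):
    x, y = 1.0, 1.0
    x_prev, y_prev = x, y
    for _ in range(steps):
        x_new = x - eta * y + beta * (x - x_prev)       # player 1 descends
        y_new = y + eta * x_new + beta * (y - y_prev)   # player 2 ascends, sees x_new
        x_prev, y_prev, x, y = x, y, x_new, y_new
    return np.hypot(x, y)   # distance from the equilibrium (0, 0)

pos = run(beta=0.5)    # positive momentum: spirals out
neg = run(beta=-0.5)   # negative momentum: spirals in
print(pos, neg)
```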
SLIDE 32
General games
SLIDE 33
Eigen-analysis, 0 momentum
SLIDE 34
Zero vs negative momentum
Momentum → Zero → Negative
SLIDE 35
Negative Momentum
SLIDE 36
Empirical results
SLIDE 37 What happens in practice?
Fashion MNIST:
SLIDE 38 What happens in practice?
CIFAR-10:
SLIDE 39 Negative Momentum
To sum up:
- Negative momentum seems to improve the behaviour caused by “bad” eigenvalues
- Optimal for a class of games
- Empirically optimal on “saturating” GANs