

SLIDE 1

Nested Optimization in Games

Rozhina Ghanavi

University of Toronto

November 2, 2019

SLIDE 2

Different types of games

  • Simultaneous games
  • Sequential games (Stackelberg games): consist of a leader and a follower; the follower observes the leader's quantity choice and chooses its action based on that:

\min_{x_1 \in X_1} f_1(x_1, x_2) \quad \text{s.t.} \quad x_2 \in \arg\min_{y \in X_2} f_2(x_1, y) \tag{1}
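Problem (1) can be made concrete with a small numeric sketch: a hypothetical Cournot-style Stackelberg duopoly (inverse demand 12 − q1 − q2, zero costs; the numbers are our own illustration, not from the slides). The inner argmin is approximated by gradient descent on the follower's loss, and the leader minimizes over a grid:

```python
# Stackelberg game as in (1): leader picks q1, follower best-responds with q2.
# Losses are negated profits: f_i = -(12 - q1 - q2) * q_i.

def follower_best_response(q1, lr=0.05, steps=500):
    """Approximate arg min_y f2(q1, y) by gradient descent."""
    q2 = 0.0
    for _ in range(steps):
        grad = -(12 - q1 - 2 * q2)   # d f2 / d q2
        q2 -= lr * grad
    return q2

def leader_loss(q1):
    q2 = follower_best_response(q1)  # the nested (inner) optimization
    return -(12 - q1 - q2) * q1

# Outer optimization: simple grid search over the leader's choice.
grid = [i / 100 for i in range(0, 1201)]
q1_star = min(grid, key=leader_loss)
q2_star = follower_best_response(q1_star)
print(q1_star, q2_star)
```

In this instance the analytic Stackelberg solution is q1 = 6 with follower response q2 = 3, which the nested search recovers.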
SLIDE 3

Motivations

  • Why are we interested in games?

Use cases in ML: GANs, adversarial training, and primal-dual RL.

  • What is the problem?

Simple gradient-based methods do not work, so we look for other optimization methods.

SLIDE 4

GANs

[Figure from Binglin, Shashan, and Bhargav.]

SLIDE 5

GANs

\min_G \max_D V(G, D) \tag{2}

V(G, D) = \int_x p_{\text{data}}(x) \log(D(x))\,dx + \int_z p_z(z) \log(1 - D(G(z)))\,dz \tag{3}

  • The equilibrium no longer consists of a single loss, hence nested optimization.
SLIDE 6

GAN optimization algorithm

  • GAN optimization is based on gradient descent ascent (GDA).
  • Update the discriminator by ascending its gradient:

\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right] \tag{4}

  • Update the generator by descending its gradient:

\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \tag{5}
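Updates (4) and (5) can be sketched on a hypothetical 1-D GAN (our own toy instance, not from the slides: D(x) = sigmoid(a·x + c), G(z) = b + z, data from N(3, 1), with the gradients written out by hand):

```python
import math, random

random.seed(0)
sigmoid = lambda t: 1.0 / (1.0 + math.exp(-t))

# Toy 1-D GAN: D(x) = sigmoid(a*x + c), G(z) = b + z (illustrative assumption).
a, c, b = 0.0, 0.0, 0.0
lr, m = 0.05, 256

def d_objective(a, c, b, xs, zs):
    """(1/m) sum [log D(x) + log(1 - D(G(z)))], the quantity inside (4)."""
    return sum(math.log(sigmoid(a * x + c))
               + math.log(1 - sigmoid(a * (b + z) + c))
               for x, z in zip(xs, zs)) / m

xs = [random.gauss(3.0, 1.0) for _ in range(m)]   # data samples x^(i) ~ p_data
zs = [random.gauss(0.0, 1.0) for _ in range(m)]   # noise samples z^(i) ~ p_z

# (4): ascend the discriminator gradient (derived by hand for this D and G).
grad_a = sum((1 - sigmoid(a * x + c)) * x
             - sigmoid(a * (b + z) + c) * (b + z) for x, z in zip(xs, zs)) / m
grad_c = sum((1 - sigmoid(a * x + c))
             - sigmoid(a * (b + z) + c) for x, z in zip(xs, zs)) / m
before = d_objective(a, c, b, xs, zs)
a, c = a + lr * grad_a, c + lr * grad_c
after = d_objective(a, c, b, xs, zs)   # the ascent step raised the objective

# (5): descend the generator gradient, pushing G's samples toward the data.
grad_b = sum(-sigmoid(a * (b + z) + c) * a for z in zs) / m
b -= lr * grad_b
```

One alternating round of (4) then (5), as in the standard GAN training loop: the discriminator step increases its minibatch objective, and the generator step moves b toward the data mean.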

SLIDE 7

Convergence of Learning Dynamics in Stackelberg Games

  • T. Fiez, B. Chasnov, and L. J. Ratliff
SLIDE 8

Games setting

  • They consider a sequential Stackelberg game (solution concept: pure-strategy Stackelberg equilibrium).
  • The game consists of a leader and a follower.
SLIDE 9

Finite-Time High-Probability Guarantees

The follower converges in the sense that

P\left( \|x_{2,n} - z_n\| \le \varepsilon,\ \forall n \ge \bar{n} \mid x_{2,n_0}, z_{n_0} \in B_{q_0} \right) \to 1 \tag{6}

where z_k = r(x_{1,k}) and r(\cdot) is the implicit (best-response) function.
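Guarantee (6) can be illustrated with our own toy choice f2(x1, y) = (y − x1)^2 (not from the paper), for which the implicit function is r(x1) = x1: a slowly drifting leader and a gradient-descending follower whose tracking error shrinks below any fixed epsilon:

```python
# Toy illustration of (6): f2(x1, y) = (y - x1)^2, so r(x1) = x1.
# The leader moves on a slow timescale; the follower gradient-steps on a
# fast one and tracks z_n = r(x1_n) to within a small error.
x1, x2 = 5.0, 0.0
lr_follower, leader_step = 0.2, 0.001
errors = []
for n in range(2000):
    x1 -= leader_step * x1             # slow leader drift toward 0
    x2 -= lr_follower * 2 * (x2 - x1)  # follower GD on f2(x1, .)
    errors.append(abs(x2 - x1))        # tracking error |x2_n - z_n|
print(errors[0], errors[-1])
```

The timescale separation (leader step much smaller than follower step) is what lets the follower stay near the implicit function, mirroring the two-timescale structure of the analysis.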

SLIDE 10

Finite-Time High-Probability Guarantees

The leader converges in the sense that

P\left( \|x_{1,n} - \hat{x}_{1,t_n}\| \le \varepsilon,\ \forall n \ge \bar{n} \mid x_{n_0} \in B_{q_0} \right) \to 1 \tag{7}

Take-away: we converge to a neighborhood of a Stackelberg equilibrium in finite time, with high probability!

SLIDE 12

Conclusions

  • Shows that there exist stable attractors of simultaneous gradient play that are Stackelberg equilibria but not Nash equilibria.
  • Gives a finite-time, high-probability bound for local convergence to a neighborhood of a stable Stackelberg equilibrium in general-sum games.

SLIDE 13

On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach

Under blind review at ICLR 2020

SLIDE 14

Games Setting

  • Differentiable sequential games,
  • two players,
  • zero-sum, minimax:

\min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^m} f(x, y) \tag{8}

SLIDE 15

How to solve minimax optimization?

  • Gradient descent-ascent (GDA)
  • Problem 1: the goal is to converge to local minimax points, but GDA fails to.
  • Problem 2: strong rotation around fixed points, which requires a small learning rate.
  • Follow-the-Ridge (FR), proposed by this paper, solves both issues.
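Problem 2 shows up already on the classic bilinear example f(x, y) = xy (a standard illustration, not from the paper): simultaneous GDA rotates around the fixed point (0, 0) and drifts outward at any step size:

```python
# Simultaneous GDA on f(x, y) = x * y: grad_x f = y, grad_y f = x.
# The fixed point (0, 0) is never approached; iterates rotate and spiral out,
# since each step multiplies the radius by sqrt(1 + lr^2) > 1.
lr = 0.1
x, y = 1.0, 0.0
r0 = (x * x + y * y) ** 0.5
for _ in range(100):
    x, y = x - lr * y, y + lr * x   # descend in x, ascend in y (simultaneous)
r100 = (x * x + y * y) ** 0.5
print(r0, r100)
```

Shrinking the learning rate slows the outward drift but never removes it, which is why GDA needs extra machinery on rotational games.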
SLIDE 16

Follow the ridge (FR)

  • GDA tends to drift away from the ridge.
  • How to solve it?

By definition, a local minimax has to lie on a ridge. So, follow the ridge!

SLIDE 17

FR algorithm
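The algorithm figure did not survive extraction; as we recall the update from the paper, FR augments GDA with a correction term for the follower, y ← y + η_y ∇_y f + η_x H_yy^{-1} H_yx ∇_x f, which cancels the drift off the ridge caused by the leader's step. A minimal sketch on a hypothetical quadratic f(x, y) = x^2 + 2xy − y^2 (our own example, with local minimax at (0, 0); gradients and Hessian blocks written by hand):

```python
# Follow-the-Ridge sketch on f(x, y) = x^2 + 2*x*y - y^2.
# grad_x f = 2x + 2y, grad_y f = 2x - 2y, H_yy = -2, H_yx = 2.
eta = 0.05
x, y = 1.0, -1.0
for _ in range(500):
    gx = 2 * x + 2 * y
    gy = 2 * x - 2 * y
    correction = (1.0 / -2.0) * 2 * gx   # H_yy^{-1} H_yx grad_x f
    # Leader descends; follower ascends plus the ridge-following correction.
    x, y = x - eta * gx, y + eta * gy + eta * correction
print(x, y)
```

On this example the correction makes the off-ridge coordinate decay geometrically, so the iterates settle at the local minimax instead of circling it.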

SLIDE 19

FR results

[Results figure from the paper.]

SLIDE 22

Conclusion

  • It addresses the rotational behaviour of gradient dynamics and allows a larger learning rate than GDA.
  • Standard acceleration techniques can be added.
  • In general, we embraced gradient descent for neural networks because we knew it converges; this method can be viewed as a similar way to think about GANs.