  1. Nested Optimization in Games Rozhina Ghanavi University of Toronto November 2, 2019

  2. Different types of games • Simultaneous games • Sequential games, e.g. Stackelberg games: a leader and a follower; the follower observes the leader's choice and chooses its action based on that. The leader therefore faces the nested (bilevel) problem

  \min_{x_1 \in X_1} \left\{ f_1(x_1, x_2) \;\middle|\; x_2 \in \arg\min_{y \in X_2} f_2(x_1, y) \right\}   (1)
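
A toy numerical instance of (1), to make the nesting concrete. The quadratic payoffs, the grid search over the leader's choice, and the SciPy inner solver are all illustrative assumptions, not part of the talk.

```python
# Toy bilevel program in the form of (1), with made-up quadratic payoffs:
#   leader:   f1(x1, x2) = (x1 - 2)^2 + x1 * x2
#   follower: f2(x1, x2) = (x2 - x1)^2
# The follower's best response is r(x1) = x1, so the leader in effect
# minimizes (x1 - 2)^2 + x1^2, whose optimum is x1 = 1. We recover this
# numerically: an inner 1-D minimization nested inside an outer grid search.
import numpy as np
from scipy.optimize import minimize_scalar

f1 = lambda x1, x2: (x1 - 2.0) ** 2 + x1 * x2
f2 = lambda x1, x2: (x2 - x1) ** 2

def follower_best_response(x1):
    # inner problem: argmin_y f2(x1, y)
    return minimize_scalar(lambda y: f2(x1, y)).x

X1 = np.linspace(-3.0, 3.0, 601)  # discretized leader set X_1
values = [f1(x1, follower_best_response(x1)) for x1 in X1]
x1_star = X1[int(np.argmin(values))]
print(x1_star, follower_best_response(x1_star))  # ≈ 1.0, 1.0
```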

  3. Motivations • Why are we interested in games? Use cases in ML: GANs, adversarial training, and primal-dual RL. • What is the problem? Simple gradient-based methods do not work, so we look for other optimization methods.

  4. GANs (diagram from Binglin, Shashan, and Bhargav; image not transcribed).

  5. GANs

  \min_G \max_D V(G, D)   (2)

  V(G, D) = \int_x p_{\text{data}}(x) \log D(x)\, dx + \int_z p_z(z) \log\big(1 - D(G(z))\big)\, dz   (3)

• The equilibrium no longer corresponds to minimizing a single loss; hence, nested optimization.
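
Why (2)–(3) is genuinely nested can be seen from the standard argument in the original GAN paper: for a fixed generator with model density p_g, the integrand of (3) is p_{\text{data}}(x) \log D(x) + p_g(x) \log(1 - D(x)) pointwise in x, and setting its derivative with respect to D(x) to zero gives

  D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)},

so the outer problem over G is defined through the solution of the inner problem over D, the same structure as (1).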

  6. GAN optimization algorithm • GAN optimization is based on gradient descent-ascent (GDA). • Update the discriminator by ascending its stochastic gradient:

  \nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D\big(x^{(i)}\big) + \log\Big(1 - D\big(G(z^{(i)})\big)\Big) \right]   (4)

• Update the generator by descending its stochastic gradient:

  \nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\Big(1 - D\big(G(z^{(i)})\big)\Big)   (5)
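
A minimal runnable sketch of the GDA updates (4)–(5) on a 1-D toy GAN. Everything here is an illustrative assumption rather than the talk's setup: the linear generator G(z) = cz + d, the logistic discriminator D(x) = sigmoid(ax + b), the data distribution N(3, 1), and the step size; gradients are written out by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, b = 1.0, 0.0    # discriminator parameters theta_d
c, d = 1.0, 0.0    # generator parameters theta_g
lr, m = 0.05, 128  # step size and minibatch size

for step in range(5000):
    x = rng.normal(3.0, 1.0, m)    # real samples x^(i) ~ p_data
    z = rng.normal(0.0, 1.0, m)    # noise samples z^(i) ~ p_z
    g = c * z + d                  # fake samples G(z^(i))
    u, v = a * x + b, a * g + b    # discriminator logits

    # (4): ascend the gradient of (1/m) sum[log D(x) + log(1 - D(G(z)))]
    da = np.mean(sigmoid(-u) * x - sigmoid(v) * g)
    db = np.mean(sigmoid(-u) - sigmoid(v))
    a, b = a + lr * da, b + lr * db

    # (5): descend the gradient of (1/m) sum[log(1 - D(G(z)))]
    v = a * (c * z + d) + b
    dc = np.mean(-sigmoid(v) * a * z)
    dd = np.mean(-sigmoid(v) * a)
    c, d = c - lr * dc, d - lr * dd

print("generator mean ~", d, "| std ~", abs(c))  # ideally near 3 and 1
```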

  7. Convergence of Learning Dynamics in Stackelberg Games T. Fiez, B. Chasnov, and L. J. Ratliff

  8. Games setting • They consider a sequential Stackelberg game (solution concept: pure-strategy Stackelberg equilibrium). • The game consists of a leader and a follower.

  9. Finite-Time High-Probability Guarantees The follower converges to the leader-induced best response:

  P\left( \|x_{2,n} - z_n\| \le \varepsilon,\ \forall n \ge \bar{n} \;\middle|\; x_{2,n_0}, z_{n_0} \in B_{q_0} \right) \to 1   (6)

where z_k = r(x_{1,k}) and r(\cdot) is the implicit (best-response) function.
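
For reference, here is the standard implicit-function-theorem computation behind r (my addition, using the notation of (1)): locally r(x_1) = \arg\min_y f_2(x_1, y), so the follower's first-order condition \nabla_2 f_2(x_1, r(x_1)) = 0 holds identically; differentiating it in x_1 gives

  \frac{dr}{dx_1} = -\left( \nabla_{22}^2 f_2 \right)^{-1} \nabla_{21}^2 f_2,

and the leader's total gradient along the best response is

  \frac{d}{dx_1} f_1(x_1, r(x_1)) = \nabla_1 f_1 - \left( \nabla_{21}^2 f_2 \right)^{\top} \left( \nabla_{22}^2 f_2 \right)^{-1} \nabla_2 f_1.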

  10. Finite-Time High-Probability Guarantees The leader converges to:

  P\left( \|x_{1,n} - \hat{x}_1\| \le \varepsilon,\ \forall n \ge \bar{n} \;\middle|\; x_{n_0} \in B_{q_0} \right) \to 1   (7)

Takeaway: we converge to a neighborhood of a Stackelberg equilibrium in finite time, with high probability.

  11.–12. Conclusions • Shows that there exist stable attractors of simultaneous gradient play that are Stackelberg equilibria but not Nash equilibria. • Gives a finite-time, high-probability bound for local convergence to a neighborhood of a stable Stackelberg equilibrium in general-sum games.

  13. On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach Under blind review at ICLR 2020

  14. Games Setting • Differentiable sequential games • Two players • Zero-sum, minimax:

  \min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^m} f(x, y)   (8)

  15. How to solve minimax optimization? • Gradient descent-ascent (GDA). • Problem 1: the goal is to converge to local minimax points, but GDA fails to. • Problem 2: strong rotation around fixed points, which forces a small learning rate (see the sketch below). • Follow-the-Ridge (FR), proposed by this paper, solves both issues.
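
A minimal numpy sketch of Problem 2 on the bilinear game f(x, y) = xy (my choice of example, not necessarily the paper's): the unique stationary point is the origin, but simultaneous GDA rotates around it and, for any constant step size, spirals outward.

```python
import numpy as np

eta = 0.1
x, y = 1.0, 0.0
for t in range(200):
    gx, gy = y, x                        # grad_x f = y, grad_y f = x
    x, y = x - eta * gx, y + eta * gy    # simultaneous GDA step
    # each step multiplies ||(x, y)|| by exactly sqrt(1 + eta^2)

print(np.hypot(x, y))  # ≈ 2.7: the iterates spiral away from (0, 0)
```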

  16. Follow the ridge (FR) • GDA tends to drift away from the ridge. • How to fix this? By definition, a local minimax has to lie on a ridge. So, follow the ridge!

  17.–18. FR algorithm (shown as figures in the deck; a code sketch follows below)
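
Since the algorithm slides are images, here is a sketch of the FR update as I read it from the paper: the leader takes a plain gradient step, and the follower's step adds a correction \eta_x (\nabla_{yy}^2 f)^{-1} \nabla_{yx}^2 f \, \nabla_x f that keeps the iterate on the ridge as x moves. The quadratic test function and step sizes below are my own illustrative choices.

```python
# Follow-the-Ridge on the toy quadratic f(x, y) = -3x^2 - y^2 + 4xy.
# (0, 0) is a local minimax: the ridge is y = r(x) = 2x, along which
# f(x, r(x)) = x^2. Plain GDA is unstable at this point, while FR's
# correction term pulls y back toward the ridge as x moves.
import numpy as np

def grads(x, y):
    return -6 * x + 4 * y, -2 * y + 4 * x  # grad_x f, grad_y f

H_yy, H_yx = -2.0, 4.0                     # constant Hessian blocks of f
eta_x, eta_y = 0.05, 0.05

x, y = 2.0, -1.0
for t in range(500):
    gx, gy = grads(x, y)
    x_new = x - eta_x * gx                               # leader: descent
    y_new = y + eta_y * gy + eta_x * (H_yx / H_yy) * gx  # follower: ascent + ridge correction
    x, y = x_new, y_new

print(x, y)  # converges to the local minimax (0, 0)
```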

  19. FR results (figures from the paper; images not transcribed).

  20.–22. Conclusion • FR addresses the rotational behaviour of gradient dynamics and allows a larger learning rate than GDA. • Standard acceleration techniques can be added on top. • Much of the enthusiasm for gradient descent in neural networks came from knowing that it converges; FR can be viewed as a similar way to think about GANs.
