SLIDE 1

Stochastic Hamiltonian Gradient Methods for Smooth Games

Nicolas Loizou

joint work with Hugo Berard, Alexia Jolicoeur-Martineau, Pascal Vincent†, Simon Lacoste-Julien†, Ioannis Mitliagkas†.

ICML 2020

† Canada CIFAR AI Chair

SLIDE 2

Overview

1. Min-max Optimization Problem: Motivation, Related Work, Main Contributions

2. Classes of Stochastic Games and Hamiltonian Viewpoint

3. Stochastic Hamiltonian Gradient Methods: Stochastic Hamiltonian Gradient Descent, Stochastic Variance Reduced Hamiltonian Gradient Method, Convergence Guarantees

4. Numerical Experiments

5. Conclusion & Future Directions of Research

SLIDE 3

The Min-Max Optimization Problem

Problem: Stochastic Smooth Game.

$$\min_{x_1 \in \mathbb{R}^{d_1}} \; \max_{x_2 \in \mathbb{R}^{d_2}} \; g(x_1, x_2) = \frac{1}{n} \sum_{i=1}^{n} g_i(x_1, x_2) \qquad (1)$$

where $g : \mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \to \mathbb{R}$ is a smooth objective.

Goal: Find Min-max solution / Nash Equilibrium.

Find $x^* = (x_1^*, x_2^*) \in \mathbb{R}^d$ such that, for every $x_1 \in \mathbb{R}^{d_1}$ and $x_2 \in \mathbb{R}^{d_2}$,

$$g(x_1^*, x_2) \le g(x_1^*, x_2^*) \le g(x_1, x_2^*).$$

Appears in many applications:

• Domain Generalization (Albuquerque et al., 2019)
• Generative Adversarial Networks (GANs) (Goodfellow et al., 2014)
• Formulations in Reinforcement Learning (Pfau & Vinyals, 2016)

SLIDES 4-6

Related Work

• Deterministic Games: Last-iterate convergence guarantees. Classic results (Korpelevich, 1976; Nemirovski, 2004) and recent results (Mescheder et al., 2017; Daskalakis et al., 2017; Gidel et al., 2018b; Azizian et al., 2019).

• Stochastic Games: Convergent methods rely on iterate averaging over compact domains (Nemirovski, 2004). Palaniappan & Bach (2016) and Chavdarova et al. (2019) proposed methods with last-iterate convergence guarantees over a non-compact domain, but under a strong monotonicity assumption.

• Second-Order Methods: Consensus optimization (Mescheder et al., 2017) and Hamiltonian gradient descent (Balduzzi et al., 2018; Abernethy et al., 2019). No analysis is available for the stochastic problem.

SLIDES 7-11

Main Contributions

1. First global non-asymptotic last-iterate convergence guarantees in the stochastic setting (without assuming strong monotonicity or a bounded domain), including a class of non-convex non-concave games.

2. First convergence analysis of stochastic Hamiltonian methods for solving min-max problems. Existing papers on these methods are empirical (Mescheder et al., 2017; Balduzzi et al., 2018).

3. A novel unbiased estimator of the Hamiltonian gradient, a crucial point for proving convergence of the proposed methods (existing methods use biased estimators).

4. First stochastic Hamiltonian variance-reduced method (linear convergence guarantees).

Hamiltonian Perspective: Popular stochastic optimization algorithms can be used as methods for solving stochastic min-max problems.

SLIDE 12

Smooth Games and Hamiltonian Gradient Descent

$$\min_{x_1 \in \mathbb{R}^{d_1}} \; \max_{x_2 \in \mathbb{R}^{d_2}} \; g(x_1, x_2) \qquad (2)$$

$$x = (x_1, x_2)^\top \in \mathbb{R}^d, \qquad
\xi(x) = \begin{pmatrix} \nabla_{x_1} g \\ -\nabla_{x_2} g \end{pmatrix}, \qquad
J = \nabla \xi = \begin{pmatrix} \nabla^2_{x_1,x_1} g & \nabla^2_{x_1,x_2} g \\ -\nabla^2_{x_2,x_1} g & -\nabla^2_{x_2,x_2} g \end{pmatrix}$$

A vector $x^* \in \mathbb{R}^d$ is a stationary point when $\xi(x^*) = 0$.

Key Assumption: All stationary points of the objective $g$ are global min-max solutions.

Hamiltonian Gradient Descent (HGD) (Balduzzi et al., 2018)

$$\min_{x} \; H(x) = \tfrac{1}{2} \|\xi(x)\|^2. \qquad (3)$$

HGD is gradient descent on $H$ and can be expressed using a Jacobian-vector product:

$$x^{k+1} = x^k - \eta_k \nabla H(x^k) = x^k - \eta_k \left[ J^\top \xi \right](x^k).$$
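As a concrete illustration (not part of the slides), here is a minimal NumPy sketch of deterministic HGD on a toy bilinear game g(x1, x2) = x1^T A x2, for which xi and J are available in closed form; the matrix A, step-size, and iteration count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))          # toy bilinear game g(x1, x2) = x1^T A x2

def xi(x1, x2):
    """Signed gradient field xi(x) = (grad_x1 g, -grad_x2 g)."""
    return np.concatenate([A @ x2, -(A.T @ x1)])

def grad_H(x1, x2):
    """grad H(x) = J^T xi(x), with J = [[0, A], [-A^T, 0]] for this game."""
    v = xi(x1, x2)
    return np.concatenate([-A @ v[d:], A.T @ v[:d]])

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
eta = 1.0 / np.linalg.norm(A, 2) ** 2    # step-size 1/L for the quadratic Hamiltonian
print("||xi|| before:", np.linalg.norm(xi(x1, x2)))
for _ in range(20_000):                  # plain gradient descent on H(x) = 0.5 * ||xi(x)||^2
    g = grad_H(x1, x2)
    x1, x2 = x1 - eta * g[:d], x2 - eta * g[d:]
# the norm should be much smaller; how close it gets to zero depends on the conditioning of A
print("||xi|| after:", np.linalg.norm(xi(x1, x2)))
```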

SLIDE 13

Stochastic Hamiltonian Function

$$\min_{x_1 \in \mathbb{R}^{d_1}} \; \max_{x_2 \in \mathbb{R}^{d_2}} \; g(x_1, x_2) = \frac{1}{n} \sum_{i=1}^{n} g_i(x_1, x_2) \qquad (4)$$

$$\xi_i(x) = \begin{pmatrix} \nabla_{x_1} g_i \\ -\nabla_{x_2} g_i \end{pmatrix}, \qquad
J = \frac{1}{n} \sum_{i=1}^{n} J_i, \quad \text{where} \quad
J_i = \begin{pmatrix} \nabla^2_{x_1,x_1} g_i & \nabla^2_{x_1,x_2} g_i \\ -\nabla^2_{x_2,x_1} g_i & -\nabla^2_{x_2,x_2} g_i \end{pmatrix}.$$

Finite-Sum Structure of the Hamiltonian Function

$$H(x) = \frac{1}{n^2} \sum_{i,j=1}^{n} H_{i,j}(x), \quad \text{where} \quad H_{i,j}(x) = \tfrac{1}{2} \langle \xi_i(x), \xi_j(x) \rangle \qquad (5)$$

The algorithms use the gradient of only one component function $H_{i,j}(x)$:

$$\nabla H_{i,j}(x) = \tfrac{1}{2} \left[ J_i^\top \xi_j + J_j^\top \xi_i \right]. \qquad (6)$$

This is an unbiased estimator of $\nabla H(x)$; that is, $\mathbb{E}_{i,j}\left[\nabla H_{i,j}(x)\right] = \nabla H(x)$.
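As a quick numerical sanity check of this unbiasedness (not part of the slides), the sketch below builds a small stochastic bilinear game with randomly generated b_i, A_i, c_i, evaluates estimator (6) in closed form, and verifies that its average over all pairs (i, j) equals the full gradient J^T xi; all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d1, d2 = 8, 4, 3
A = rng.standard_normal((n, d1, d2))
b = rng.standard_normal((n, d1))
c = rng.standard_normal((n, d2))
x1, x2 = rng.standard_normal(d1), rng.standard_normal(d2)

def xi_i(i):
    """xi_i(x) = (grad_x1 g_i, -grad_x2 g_i) for g_i = x1.b_i + x1.A_i.x2 + c_i.x2."""
    return np.concatenate([b[i] + A[i] @ x2, -(A[i].T @ x1 + c[i])])

def Jt_i(i, v):
    """Apply J_i^T to a vector v, with J_i = [[0, A_i], [-A_i^T, 0]]."""
    return np.concatenate([-A[i] @ v[d1:], A[i].T @ v[:d1]])

def grad_H_ij(i, j):
    """Estimator (6): 0.5 * (J_i^T xi_j + J_j^T xi_i)."""
    return 0.5 * (Jt_i(i, xi_i(j)) + Jt_i(j, xi_i(i)))

# Full gradient grad H = J^T xi, with J = mean_i J_i and xi = mean_i xi_i.
xi_full = np.mean([xi_i(i) for i in range(n)], axis=0)
grad_H = np.mean([Jt_i(i, xi_full) for i in range(n)], axis=0)

# Averaging the component gradients over all (i, j) pairs should match grad_H.
avg = np.mean([grad_H_ij(i, j) for i in range(n) for j in range(n)], axis=0)
print(np.allclose(avg, grad_H))   # True: E_{i,j}[grad H_{i,j}(x)] = grad H(x)
```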

SLIDES 14-16

Classes of Stochastic Smooth Games

Stochastic Bilinear Games.

$$g(x_1, x_2) = \frac{1}{n} \sum_{i=1}^{n} \left( x_1^\top b_i + x_1^\top A_i x_2 + c_i^\top x_2 \right) \qquad (7)$$

Proposition: Stochastic bilinear game (7) ⇒ the stochastic Hamiltonian function (5) is a smooth, quadratic, quasi-strongly convex function.

Stochastic Sufficiently Bilinear Games (Abernethy et al., 2019). Games for which the following condition holds:

$$(\delta^2 + \rho^2)(\delta^2 + \beta^2) - 4 L^2 \Delta^2 > 0, \qquad (8)$$

where $0 < \delta \le \sigma_i\!\left(\nabla^2_{x_1,x_2} g\right) \le \Delta$, $\rho^2 = \min_{x_1,x_2} \lambda_{\min}\!\left[\nabla^2_{x_1,x_1} g(x_1,x_2)\right]^2$, and $\beta^2 = \min_{x_1,x_2} \lambda_{\min}\!\left[\nabla^2_{x_2,x_2} g(x_1,x_2)\right]^2$.

Proposition: Stochastic sufficiently bilinear game ⇒ the stochastic Hamiltonian function (5) is smooth and satisfies the PL condition.

SLIDES 17-18

Stochastic Hamiltonian Gradient Methods

Stochastic Hamiltonian Gradient Descent (SHGD)

1. Generate fresh samples $i \sim \mathcal{D}$ and $j \sim \mathcal{D}$ and evaluate $\nabla H_{i,j}(x^k)$.
2. Set the step-size $\gamma_k$ (constant or decreasing).
3. Set $x^{k+1} = x^k - \gamma_k \nabla H_{i,j}(x^k)$.

Loopless Stochastic Variance Reduced Hamiltonian Gradient (L-SVRHG)

Input: initial points $x^0 = w^0 \in \mathbb{R}^d$ and probability $p \in (0, 1]$.

1. Generate fresh samples $i \sim \mathcal{D}$ and $j \sim \mathcal{D}$ and evaluate $\nabla H_{i,j}(x^k)$.
2. Evaluate $g^k = \nabla H_{i,j}(x^k) - \nabla H_{i,j}(w^k) + \nabla H(w^k)$.
3. Set $x^{k+1} = x^k - \gamma g^k$.
4. Set $w^{k+1} = x^k$ with probability $p$, and $w^{k+1} = w^k$ with probability $1 - p$.
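For concreteness, here is a minimal Python sketch of both loops (an illustration, not the paper's reference implementation). It assumes uniform sampling of i and j from {0, ..., n-1} and takes the oracles grad_H_ij and grad_H as arguments, for example the closed-form ones from the earlier bilinear sketch wrapped to accept x; the step-size and p are left to the caller.

```python
import numpy as np

def shgd(x0, grad_H_ij, n, gamma, steps, rng):
    """Stochastic Hamiltonian Gradient Descent with a constant step-size gamma."""
    x = x0.copy()
    for _ in range(steps):
        i, j = rng.integers(n), rng.integers(n)   # fresh samples i, j
        x = x - gamma * grad_H_ij(i, j, x)
    return x

def l_svrhg(x0, grad_H_ij, grad_H, n, gamma, p, steps, rng):
    """Loopless SVRHG: variance-reduced estimator with a randomly refreshed anchor w."""
    x, w = x0.copy(), x0.copy()
    full = grad_H(w)                              # full Hamiltonian gradient at the anchor
    for _ in range(steps):
        i, j = rng.integers(n), rng.integers(n)
        g = grad_H_ij(i, j, x) - grad_H_ij(i, j, w) + full
        x_next = x - gamma * g
        if rng.random() < p:                      # w^{k+1} = x^k with probability p
            w = x.copy()
            full = grad_H(w)
        x = x_next
    return x
```

With a bilinear-game oracle plugged in, the two loops can be compared directly; the table on the following slide summarizes the rates proved for them.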

SLIDE 19

Convergence Guarantees

| Algorithm | Stochastic Bilinear Game, $\mathbb{E}\left[\|x^k - x^*\|^2\right]$ | Stochastic Sufficiently Bilinear Game, $\mathbb{E}\left[H(x^k)\right]$ | Remarks on Rates (all: global, non-asymptotic) |
|---|---|---|---|
| SHGD, constant step-size | Linear | Linear | Last-iterate convergence to a neighborhood |
| SHGD, decreasing step-size | Sublinear: $O(1/k)$ | Sublinear: $O(1/k)$ | Last-iterate convergence to the min-max solution |
| L-SVRHG, with/without restarts | Linear | Linear | Last-iterate convergence to the min-max solution |

Table: Summary of Convergence Analysis Results

Remark: In our results we do not assume bounded gradients or bounded variance. Instead, we use the recently introduced weak assumptions of expected smoothness and expected residual (Gower et al., 2019, 2020).

SLIDE 20

Numerical Evaluation

• Stochastic Bilinear Games
• Stochastic Sufficiently Bilinear Games
• GANs

SLIDES 21-23

Stochastic Bilinear Game

$$g(x_1, x_2) = \frac{1}{n} \sum_{i=1}^{n} \left( x_1^\top b_i + x_1^\top A_i x_2 + c_i^\top x_2 \right)$$

Setup: $n = d_1 = d_2 = 100$, $[b_i]_k, [c_i]_k \sim N(0, 1/n)$, and $[A_i]_{kl} = 1$ if $i = k = l$.
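A minimal sketch of how this setup could be instantiated and run with constant-step SHGD follows; the problem size is reduced here for a fast run (the slides use n = d1 = d2 = 100), 1/n is interpreted as the variance of the Gaussian entries, and the step-size and iteration count are illustrative guesses rather than the tuned values behind the figures.

```python
import numpy as np

rng = np.random.default_rng(0)
n = d1 = d2 = 10        # the slides use n = d1 = d2 = 100; reduced for a quick run

b = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, d1))   # [b_i]_k ~ N(0, 1/n), variance 1/n (assumption)
c = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, d2))   # [c_i]_k ~ N(0, 1/n)
A = np.zeros((n, d1, d2))
for i in range(n):
    A[i, i, i] = 1.0                                   # [A_i]_{kl} = 1 if i = k = l, else 0

def xi(i, x):
    """xi_i(x) for g_i = x1.b_i + x1.A_i.x2 + c_i.x2."""
    x1, x2 = x[:d1], x[d1:]
    return np.concatenate([b[i] + A[i] @ x2, -(A[i].T @ x1 + c[i])])

def grad_H_ij(i, j, x):
    """Unbiased estimator (6), using J_i = [[0, A_i], [-A_i^T, 0]] in closed form."""
    Jt = lambda k, v: np.concatenate([-A[k] @ v[d1:], A[k].T @ v[:d1]])
    return 0.5 * (Jt(i, xi(j, x)) + Jt(j, xi(i, x)))

def H(x):
    """Full Hamiltonian H(x) = 0.5 * ||mean_i xi_i(x)||^2."""
    return 0.5 * np.linalg.norm(np.mean([xi(i, x) for i in range(n)], axis=0)) ** 2

# SHGD with a constant step-size (illustrative value).
x, gamma = rng.standard_normal(d1 + d2), 0.1
print("H before:", H(x))
for _ in range(5000):
    i, j = rng.integers(n), rng.integers(n)
    x -= gamma * grad_H_ij(i, j, x)
print("H after :", H(x))
```

Because the step-size is constant, the Hamiltonian should settle at a noise-dominated level rather than exactly zero, consistent with the convergence table; a decreasing step-size or L-SVRHG removes that neighborhood.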

Figure: Distance to optimality, $\|x^k - x^*\|^2 / \|x^0 - x^*\|^2$.

Figure: Gradient vector field and trajectory ($x_1$ and $x_2$ are scalars).
SLIDE 24

Take-Away Message

1. First set of global non-asymptotic last-iterate convergence guarantees for stochastic smooth games over a non-compact domain, in the absence of strong monotonicity assumptions.

2. We present the first variance-reduced Hamiltonian method (linear convergence).

3. Hamiltonian Perspective: Popular stochastic optimization algorithms can be used as methods for solving stochastic min-max problems.

Future Extensions

• Hamiltonian-type methods for solving more classes of games.
• Development of efficient accelerated and distributed / decentralized Hamiltonian methods.

SLIDE 25

Thank You! (For questions, you are welcome to visit our virtual poster.)
