SLIDE 1

Stochastic Hamiltonian Gradient Methods for Smooth Games

Nicolas Loizou

joint work with Hugo Berard, Alexia Jolicoeur-Martineau, Pascal Vincent†, Simon Lacoste-Julien†, Ioannis Mitliagkas†.

ICML 2020

† Canada CIFAR AI Chair

SLIDE 2

Overview

1. Min-max Optimization Problem: Motivation, Related Work, Main Contributions

2. Classes of Stochastic Games and Hamiltonian Viewpoint

3. Stochastic Hamiltonian Gradient Methods: Stochastic Hamiltonian Gradient Descent, Stochastic Variance Reduced Hamiltonian Gradient Method, Convergence Guarantees

4. Numerical Experiments

5. Conclusion & Future Directions of Research

SLIDE 3

The Min-Max Optimization Problem

Problem: Stochastic Smooth Game.

$$\min_{x_1 \in \mathbb{R}^{d_1}} \; \max_{x_2 \in \mathbb{R}^{d_2}} \; g(x_1, x_2) = \frac{1}{n} \sum_{i=1}^{n} g_i(x_1, x_2) \qquad (1)$$

where $g : \mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \to \mathbb{R}$ is a smooth objective.

Goal: Find Min-max solution / Nash Equilibrium.

Find $x^* = (x_1^*, x_2^*) \in \mathbb{R}^d$ such that, for every $x_1 \in \mathbb{R}^{d_1}$ and $x_2 \in \mathbb{R}^{d_2}$,

$$g(x_1^*, x_2) \le g(x_1^*, x_2^*) \le g(x_1, x_2^*).$$

Appears in many applications:

• Domain Generalization (Albuquerque et al., 2019)
• Generative Adversarial Networks (GANs) (Goodfellow et al., 2014)
• Formulations in Reinforcement Learning (Pfau & Vinyals, 2016)

SLIDES 4-6

Related Work

• Deterministic Games: Last-iterate convergence guarantees. Classic results (Korpelevich, 1976; Nemirovski, 2004) and recent results (Mescheder et al., 2017; Daskalakis et al., 2017; Gidel et al., 2018b; Azizian et al., 2019).

• Stochastic Games: Convergent methods rely on iterate averaging over compact domains (Nemirovski, 2004). Palaniappan & Bach (2016) and Chavdarova et al. (2019) proposed methods with last-iterate convergence guarantees over a non-compact domain, but under a strong monotonicity assumption.

• Second-Order Methods: Consensus optimization (Mescheder et al., 2017) and Hamiltonian gradient descent (Balduzzi et al., 2018; Abernethy et al., 2019). No analysis is available for the stochastic problem.

SLIDES 7-11

Main Contributions

1. First global non-asymptotic last-iterate convergence guarantees in the stochastic setting (without assuming strong monotonicity or a bounded domain), including a class of non-convex non-concave games.

2. First convergence analysis of stochastic Hamiltonian methods for solving min-max problems. Existing papers on these methods are empirical (Mescheder et al., 2017; Balduzzi et al., 2018).

3. A novel unbiased estimator of the Hamiltonian gradient, a crucial point for proving convergence of the proposed methods (existing methods use biased estimators).

4. First stochastic Hamiltonian variance-reduced method (linear convergence guarantees).

Hamiltonian Perspective: Popular stochastic optimization algorithms can be used as methods for solving stochastic min-max problems.

SLIDE 12

Smooth Games and Hamiltonian Gradient Descent

$$\min_{x_1 \in \mathbb{R}^{d_1}} \; \max_{x_2 \in \mathbb{R}^{d_2}} \; g(x_1, x_2) \qquad (2)$$

$$x = (x_1, x_2)^\top \in \mathbb{R}^d, \qquad
\xi(x) = \begin{pmatrix} \nabla_{x_1} g \\ -\nabla_{x_2} g \end{pmatrix}, \qquad
J = \nabla \xi = \begin{pmatrix} \nabla^2_{x_1,x_1} g & \nabla^2_{x_1,x_2} g \\ -\nabla^2_{x_2,x_1} g & -\nabla^2_{x_2,x_2} g \end{pmatrix}$$

A vector $x^* \in \mathbb{R}^d$ is a stationary point when $\xi(x^*) = 0$.

Key Assumption: All stationary points of the objective $g$ are global min-max solutions.

Hamiltonian Gradient Descent (HGD) (Balduzzi et al., 2018)

$$\min_{x} \; H(x) = \tfrac{1}{2} \|\xi(x)\|^2. \qquad (3)$$

HGD is gradient descent on $H$ and can be expressed using a Jacobian-vector product:

$$x^{k+1} = x^k - \eta_k \nabla H(x^k) = x^k - \eta_k \left[ J^\top \xi \right](x^k).$$
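As a concrete illustration (not part of the slides), here is a minimal NumPy sketch of deterministic HGD on a toy bilinear game g(x1, x2) = x1^T A x2, for which xi and J are available in closed form; the matrix A, step-size, and iteration count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))          # toy bilinear game g(x1, x2) = x1^T A x2

def xi(x1, x2):
    """Signed gradient field xi(x) = (grad_x1 g, -grad_x2 g)."""
    return np.concatenate([A @ x2, -(A.T @ x1)])

def grad_H(x1, x2):
    """grad H(x) = J^T xi(x), with J = [[0, A], [-A^T, 0]] for this game."""
    v = xi(x1, x2)
    return np.concatenate([-A @ v[d:], A.T @ v[:d]])

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
eta = 1.0 / np.linalg.norm(A, 2) ** 2    # step-size 1/L for the quadratic Hamiltonian
print("||xi|| before:", np.linalg.norm(xi(x1, x2)))
for _ in range(20_000):                  # plain gradient descent on H(x) = 0.5 * ||xi(x)||^2
    g = grad_H(x1, x2)
    x1, x2 = x1 - eta * g[:d], x2 - eta * g[d:]
# the norm should be much smaller; how close it gets to zero depends on the conditioning of A
print("||xi|| after:", np.linalg.norm(xi(x1, x2)))
```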

SLIDE 13

Stochastic Hamiltonian Function

$$\min_{x_1 \in \mathbb{R}^{d_1}} \; \max_{x_2 \in \mathbb{R}^{d_2}} \; g(x_1, x_2) = \frac{1}{n} \sum_{i=1}^{n} g_i(x_1, x_2) \qquad (4)$$

$$\xi_i(x) = \begin{pmatrix} \nabla_{x_1} g_i \\ -\nabla_{x_2} g_i \end{pmatrix}, \qquad
J = \frac{1}{n} \sum_{i=1}^{n} J_i, \quad \text{where} \quad
J_i = \begin{pmatrix} \nabla^2_{x_1,x_1} g_i & \nabla^2_{x_1,x_2} g_i \\ -\nabla^2_{x_2,x_1} g_i & -\nabla^2_{x_2,x_2} g_i \end{pmatrix}.$$

Finite-Sum Structure of the Hamiltonian Function

$$H(x) = \frac{1}{n^2} \sum_{i,j=1}^{n} H_{i,j}(x), \quad \text{where} \quad H_{i,j}(x) = \tfrac{1}{2} \langle \xi_i(x), \xi_j(x) \rangle \qquad (5)$$

The algorithms use the gradient of only one component function $H_{i,j}(x)$:

$$\nabla H_{i,j}(x) = \tfrac{1}{2} \left[ J_i^\top \xi_j + J_j^\top \xi_i \right]. \qquad (6)$$

This is an unbiased estimator of $\nabla H(x)$; that is, $\mathbb{E}_{i,j}\left[\nabla H_{i,j}(x)\right] = \nabla H(x)$.
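As a quick numerical sanity check of this unbiasedness (not part of the slides), the sketch below builds a small stochastic bilinear game with randomly generated b_i, A_i, c_i, evaluates estimator (6) in closed form, and verifies that its average over all pairs (i, j) equals the full gradient J^T xi; all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d1, d2 = 8, 4, 3
A = rng.standard_normal((n, d1, d2))
b = rng.standard_normal((n, d1))
c = rng.standard_normal((n, d2))
x1, x2 = rng.standard_normal(d1), rng.standard_normal(d2)

def xi_i(i):
    """xi_i(x) = (grad_x1 g_i, -grad_x2 g_i) for g_i = x1.b_i + x1.A_i.x2 + c_i.x2."""
    return np.concatenate([b[i] + A[i] @ x2, -(A[i].T @ x1 + c[i])])

def Jt_i(i, v):
    """Apply J_i^T to a vector v, with J_i = [[0, A_i], [-A_i^T, 0]]."""
    return np.concatenate([-A[i] @ v[d1:], A[i].T @ v[:d1]])

def grad_H_ij(i, j):
    """Estimator (6): 0.5 * (J_i^T xi_j + J_j^T xi_i)."""
    return 0.5 * (Jt_i(i, xi_i(j)) + Jt_i(j, xi_i(i)))

# Full gradient grad H = J^T xi, with J = mean_i J_i and xi = mean_i xi_i.
xi_full = np.mean([xi_i(i) for i in range(n)], axis=0)
grad_H = np.mean([Jt_i(i, xi_full) for i in range(n)], axis=0)

# Averaging the component gradients over all (i, j) pairs should match grad_H.
avg = np.mean([grad_H_ij(i, j) for i in range(n) for j in range(n)], axis=0)
print(np.allclose(avg, grad_H))   # True: E_{i,j}[grad H_{i,j}(x)] = grad H(x)
```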

SLIDES 14-16

Classes of Stochastic Smooth Games

Stochastic Bilinear Games.

$$g(x_1, x_2) = \frac{1}{n} \sum_{i=1}^{n} \left( x_1^\top b_i + x_1^\top A_i x_2 + c_i^\top x_2 \right) \qquad (7)$$

Proposition: Stochastic bilinear game (7) ⇒ the stochastic Hamiltonian function (5) is a smooth, quadratic, quasi-strongly convex function.

Stochastic Sufficiently Bilinear Games (Abernethy et al., 2019). Games for which the following condition holds:

$$(\delta^2 + \rho^2)(\delta^2 + \beta^2) - 4 L^2 \Delta^2 > 0, \qquad (8)$$

where $0 < \delta \le \sigma_i\!\left(\nabla^2_{x_1,x_2} g\right) \le \Delta$, $\rho^2 = \min_{x_1,x_2} \lambda_{\min}\!\left[\nabla^2_{x_1,x_1} g(x_1,x_2)\right]^2$, and $\beta^2 = \min_{x_1,x_2} \lambda_{\min}\!\left[\nabla^2_{x_2,x_2} g(x_1,x_2)\right]^2$.

Proposition: Stochastic sufficiently bilinear game ⇒ the stochastic Hamiltonian function (5) is smooth and satisfies the PL condition.

SLIDES 17-18

Stochastic Hamiltonian Gradient Methods

Stochastic Hamiltonian Gradient Descent (SHGD)

1. Generate fresh samples $i \sim \mathcal{D}$ and $j \sim \mathcal{D}$ and evaluate $\nabla H_{i,j}(x^k)$.
2. Set the step-size $\gamma_k$ (constant or decreasing).
3. Set $x^{k+1} = x^k - \gamma_k \nabla H_{i,j}(x^k)$.

Loopless Stochastic Variance Reduced Hamiltonian Gradient (L-SVRHG)

Input: initial points $x^0 = w^0 \in \mathbb{R}^d$ and probability $p \in (0, 1]$.

1. Generate fresh samples $i \sim \mathcal{D}$ and $j \sim \mathcal{D}$ and evaluate $\nabla H_{i,j}(x^k)$.
2. Evaluate $g^k = \nabla H_{i,j}(x^k) - \nabla H_{i,j}(w^k) + \nabla H(w^k)$.
3. Set $x^{k+1} = x^k - \gamma g^k$.
4. Set $w^{k+1} = x^k$ with probability $p$, and $w^{k+1} = w^k$ with probability $1 - p$.
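For concreteness, here is a minimal Python sketch of both loops (an illustration, not the paper's reference implementation). It assumes uniform sampling of i and j from {0, ..., n-1} and takes the oracles grad_H_ij and grad_H as arguments, for example the closed-form ones from the earlier bilinear sketch wrapped to accept x; the step-size and p are left to the caller.

```python
import numpy as np

def shgd(x0, grad_H_ij, n, gamma, steps, rng):
    """Stochastic Hamiltonian Gradient Descent with a constant step-size gamma."""
    x = x0.copy()
    for _ in range(steps):
        i, j = rng.integers(n), rng.integers(n)   # fresh samples i, j
        x = x - gamma * grad_H_ij(i, j, x)
    return x

def l_svrhg(x0, grad_H_ij, grad_H, n, gamma, p, steps, rng):
    """Loopless SVRHG: variance-reduced estimator with a randomly refreshed anchor w."""
    x, w = x0.copy(), x0.copy()
    full = grad_H(w)                              # full Hamiltonian gradient at the anchor
    for _ in range(steps):
        i, j = rng.integers(n), rng.integers(n)
        g = grad_H_ij(i, j, x) - grad_H_ij(i, j, w) + full
        x_next = x - gamma * g
        if rng.random() < p:                      # w^{k+1} = x^k with probability p
            w = x.copy()
            full = grad_H(w)
        x = x_next
    return x
```

With a bilinear-game oracle plugged in, the two loops can be compared directly; the table on the following slide summarizes the rates proved for them.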

SLIDE 19

Convergence Guarantees

| Algorithm | Stochastic Bilinear Game, $\mathbb{E}\left[\|x^k - x^*\|^2\right]$ | Stochastic Sufficiently Bilinear Game, $\mathbb{E}\left[H(x^k)\right]$ | Remarks on Rates (all: global, non-asymptotic) |
|---|---|---|---|
| SHGD, constant step-size | Linear | Linear | Last-iterate convergence to a neighborhood |
| SHGD, decreasing step-size | Sublinear: $O(1/k)$ | Sublinear: $O(1/k)$ | Last-iterate convergence to the min-max solution |
| L-SVRHG, with/without restarts | Linear | Linear | Last-iterate convergence to the min-max solution |

Table: Summary of Convergence Analysis Results

Remark: In our results we do not assume bounded gradients or bounded variance. Instead, we use the recently introduced weak assumptions of expected smoothness and expected residual (Gower et al., 2019, 2020).

SLIDE 20

Numerical Evaluation

• Stochastic Bilinear Games
• Stochastic Sufficiently Bilinear Games
• GANs

SLIDES 21-23

Stochastic Bilinear Game

$$g(x_1, x_2) = \frac{1}{n} \sum_{i=1}^{n} \left( x_1^\top b_i + x_1^\top A_i x_2 + c_i^\top x_2 \right)$$

Setup: $n = d_1 = d_2 = 100$, $[b_i]_k, [c_i]_k \sim N(0, 1/n)$, and $[A_i]_{kl} = 1$ if $i = k = l$.
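A minimal sketch of how this setup could be instantiated and run with constant-step SHGD follows; the problem size is reduced here for a fast run (the slides use n = d1 = d2 = 100), 1/n is interpreted as the variance of the Gaussian entries, and the step-size and iteration count are illustrative guesses rather than the tuned values behind the figures.

```python
import numpy as np

rng = np.random.default_rng(0)
n = d1 = d2 = 10        # the slides use n = d1 = d2 = 100; reduced for a quick run

b = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, d1))   # [b_i]_k ~ N(0, 1/n), variance 1/n (assumption)
c = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, d2))   # [c_i]_k ~ N(0, 1/n)
A = np.zeros((n, d1, d2))
for i in range(n):
    A[i, i, i] = 1.0                                   # [A_i]_{kl} = 1 if i = k = l, else 0

def xi(i, x):
    """xi_i(x) for g_i = x1.b_i + x1.A_i.x2 + c_i.x2."""
    x1, x2 = x[:d1], x[d1:]
    return np.concatenate([b[i] + A[i] @ x2, -(A[i].T @ x1 + c[i])])

def grad_H_ij(i, j, x):
    """Unbiased estimator (6), using J_i = [[0, A_i], [-A_i^T, 0]] in closed form."""
    Jt = lambda k, v: np.concatenate([-A[k] @ v[d1:], A[k].T @ v[:d1]])
    return 0.5 * (Jt(i, xi(j, x)) + Jt(j, xi(i, x)))

def H(x):
    """Full Hamiltonian H(x) = 0.5 * ||mean_i xi_i(x)||^2."""
    return 0.5 * np.linalg.norm(np.mean([xi(i, x) for i in range(n)], axis=0)) ** 2

# SHGD with a constant step-size (illustrative value).
x, gamma = rng.standard_normal(d1 + d2), 0.1
print("H before:", H(x))
for _ in range(5000):
    i, j = rng.integers(n), rng.integers(n)
    x -= gamma * grad_H_ij(i, j, x)
print("H after :", H(x))
```

Because the step-size is constant, the Hamiltonian should settle at a noise-dominated level rather than exactly zero, consistent with the convergence table; a decreasing step-size or L-SVRHG removes that neighborhood.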

Figure: Distance to optimality, $\|x^k - x^*\|^2 / \|x^0 - x^*\|^2$.

Figure: Gradient vector field and trajectory ($x_1$ and $x_2$ are scalars).
SLIDE 24

Take-Away Message

1. First set of global non-asymptotic last-iterate convergence guarantees for stochastic smooth games over a non-compact domain, in the absence of strong monotonicity assumptions.

2. We present the first variance-reduced Hamiltonian method (linear convergence).

3. Hamiltonian Perspective: Popular stochastic optimization algorithms can be used as methods for solving stochastic min-max problems.

Future Extensions

• Hamiltonian-type methods for solving more classes of games.
• Development of efficient accelerated and distributed / decentralized Hamiltonian methods.

SLIDE 25

Thank You! (For questions, you are welcome to visit our virtual poster.)
