Announcements
Ø Minbiao’s office hour will be changed to Thursday 1-2 pm, starting from next week, at Rice Hall 442
CS6501: Topics in Learning and Game Theory (Fall 2019)
Introduction to Game Theory (II)
Instructor: Haifeng Xu

Outline
Ø Correlated and Coarse Correlated Equilibrium
Ø Zero-Sum Games
Ø GANs and Equilibrium Analysis
Ø n players, denoted by the set [n] = {1, ⋯ , n}
Ø Player i takes action b_i ∈ B_i
Ø An outcome is the action profile b = (b_1, ⋯ , b_n)
Ø b_{-i} denotes the action profile excluding b_i
Ø Player i receives payoff v_i(b) for any outcome b ∈ Π_{i=1}^n B_i
Ø {B_i, v_i}_{i∈[n]} are public knowledge

A mixed strategy profile x* = (x_1^*, ⋯ , x_n^*) is a Nash equilibrium (NE) if for any i, x_i^* is a best response to x_{-i}^*.
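To make the best-response condition concrete, here is a minimal numeric check on a small game; Matching Pennies and all variable names below are my own additions, not from the lecture.

```python
# Sketch: verify the NE definition on Matching Pennies, where both players
# mixing uniformly is the unique Nash equilibrium (example is mine).
import numpy as np

V1 = np.array([[1, -1],
               [-1, 1]])      # player 1's payoffs; the game is zero-sum
V2 = -V1                      # so player 2's payoffs are the negation
x = np.array([0.5, 0.5])      # candidate mixed strategy for player 1
y = np.array([0.5, 0.5])      # candidate mixed strategy for player 2

# x is a best response to y iff it matches the best pure-action payoff
# against y, and symmetrically for y.
is_ne = (x @ V1 @ y >= (V1 @ y).max() - 1e-9 and
         x @ V2 @ y >= (x @ V2).max() - 1e-9)
print(is_ne)   # True: uniform mixing is a NE of Matching Pennies
```

Checking only pure deviations suffices, since a mixed deviation's payoff is an average of pure-action payoffs.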
Ø NE rests on two key assumptions: (1) players move simultaneously (no player observes others’ actions before the move); (2) players take actions independently
Ø Last lecture: sequential moves result in different player behaviors; the corresponding equilibrium is called the Strong Stackelberg equilibrium
Ø Today: we study what happens if players do not take actions independently but instead are “coordinated” by a central mediator
Ø This results in the study of correlated equilibrium
The Traffic Light Game
Ø There is a mediator – the traffic light – that coordinates cars’ moves
Ø For example, recommend (GO, STOP) for (A, B) with probability 3/5 and (STOP, GO) for (A, B) with probability 2/5
Ø Each player then gets expected utility −6/5, so the policy is “fair”

Payoffs (row player A, column player B):
            STOP        GO
    STOP   (-3, -2)    (-3, 0)
    GO     (0, -2)     (-100, -100)

Well, we did not see many crashes in reality… Why?
This is exactly how traffic lights are designed!
Ø A (randomized) recommendation policy ρ assigns probability ρ(b) to each action profile b ∈ B = Π_{i∈[n]} B_i
Ø Upon receiving a recommendation b_i, player i’s (conditional) expected utility is
    [1 / Pr(b_i)] · Σ_{b_{-i}∈B_{-i}} v_i(b_i, b_{-i}) · ρ(b_i, b_{-i}),  where Pr(b_i) = Σ_{b_{-i}} ρ(b_i, b_{-i})

A recommendation policy ρ is a correlated equilibrium (CE) if
    Σ_{b_{-i}} v_i(b_i, b_{-i}) · ρ(b_i, b_{-i}) ≥ Σ_{b_{-i}} v_i(b'_i, b_{-i}) · ρ(b_i, b_{-i}),  ∀ b_i, b'_i ∈ B_i, ∀ i ∈ [n].

Ø That is, any recommended action to any player is a best response
Ø ρ is assumed to be public knowledge, so every player can calculate her utility
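The CE inequalities are easy to check numerically. A minimal sketch for the traffic-light game's recommendation policy (actions coded 0 = STOP, 1 = GO; the helper name is mine):

```python
# Sketch: check the CE inequalities for the traffic-light policy
# rho(GO, STOP) = 3/5, rho(STOP, GO) = 2/5 (0 = STOP, 1 = GO).
vA = {(0, 0): -3, (0, 1): -3, (1, 0): 0, (1, 1): -100}   # A's payoffs v_A(a, b)
vB = {(0, 0): -2, (0, 1): 0, (1, 0): -2, (1, 1): -100}   # B's payoffs v_B(a, b)
rho = {(0, 0): 0.0, (0, 1): 2/5, (1, 0): 3/5, (1, 1): 0.0}

def is_ce(vA, vB, rho):
    """Check sum_{b_-i} v_i(b_i, b_-i) rho(b) >= sum_{b_-i} v_i(b'_i, b_-i) rho(b)
    for every player i, recommendation b_i, and deviation b'_i."""
    for a in (0, 1):          # player A's recommended action
        for dev in (0, 1):    # A's candidate deviation
            follow = sum(vA[a, b] * rho[a, b] for b in (0, 1))
            deviate = sum(vA[dev, b] * rho[a, b] for b in (0, 1))
            if follow < deviate - 1e-9:
                return False
    for b in (0, 1):          # player B's recommended action
        for dev in (0, 1):    # B's candidate deviation
            follow = sum(vB[a, b] * rho[a, b] for a in (0, 1))
            deviate = sum(vB[a, dev] * rho[a, b] for a in (0, 1))
            if follow < deviate - 1e-9:
                return False
    return True

print(is_ce(vA, vB, rho))   # True: the traffic-light policy is a CE
```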
Fact. Any Nash equilibrium is a correlated equilibrium.
Ø True by definition: a Nash equilibrium can be viewed as an independent action recommendation
Ø As a corollary, a correlated equilibrium always exists

Fact. The set of correlated equilibria is described by a set of linear constraints on the distribution ρ:
    Σ_{b_{-i}} v_i(b_i, b_{-i}) · ρ(b_i, b_{-i}) ≥ Σ_{b_{-i}} v_i(b'_i, b_{-i}) · ρ(b_i, b_{-i}),  ∀ b_i, b'_i ∈ B_i, ∀ i ∈ [n]
Ø This is nice because it allows us to optimize over all CEs
Ø Not true for Nash equilibrium
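Because the CE constraints are linear, optimizing over all CEs is a linear program. A sketch for the traffic-light game, maximizing total welfare; the use of scipy and all names are my own choices, not from the slides:

```python
# Sketch: LP over all correlated equilibria of the traffic-light game,
# maximizing total welfare (0 = STOP, 1 = GO).
import numpy as np
from scipy.optimize import linprog

vA = np.array([[-3, -3], [0, -100]])   # A's payoffs, rows = A's action
vB = np.array([[-2, 0], [-2, -100]])   # B's payoffs, cols = B's action

# Variables: rho[a, b] flattened to length 4. CE constraints as A_ub @ x <= 0:
# sum_b rho(a, b) * (v_i(dev, b) - v_i(a, b)) <= 0 for each recommendation a
# and deviation dev of each player.
A_ub = []
for a in (0, 1):
    dev = 1 - a
    rowA = np.zeros((2, 2))
    rowA[a, :] = vA[dev, :] - vA[a, :]    # A recommended a, deviates to dev
    A_ub.append(rowA.ravel())
    rowB = np.zeros((2, 2))
    rowB[:, a] = vB[:, dev] - vB[:, a]    # B recommended a, deviates to dev
    A_ub.append(rowB.ravel())

res = linprog(
    c=-(vA + vB).ravel(),                 # maximize welfare = minimize its negation
    A_ub=np.array(A_ub), b_ub=np.zeros(4),
    A_eq=np.ones((1, 4)), b_eq=[1.0],     # rho is a probability distribution
    bounds=[(0, None)] * 4,
)
rho = res.x.reshape(2, 2)
print(rho)   # all mass on (GO, STOP): the welfare-maximizing CE here
```

In this game the welfare-maximizing CE happens to be deterministic; adding fairness constraints (another linear condition) would recover randomized policies like the 3/5–2/5 one.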
Ø A weaker notion than correlated equilibrium
Ø Also a recommendation policy ρ, but it only requires that no player has an incentive to opt out of the recommendations

A recommendation policy ρ is a coarse correlated equilibrium (CCE) if
    Σ_{b∈B} v_i(b) · ρ(b) ≥ Σ_{b∈B} v_i(b'_i, b_{-i}) · ρ(b),  ∀ b'_i ∈ B_i, ∀ i ∈ [n].

That is, for any player i, following ρ’s recommendations is better than opting out of the recommendation and “acting on his own”.
Compare to the correlated equilibrium condition:
    Σ_{b_{-i}} v_i(b_i, b_{-i}) · ρ(b_i, b_{-i}) ≥ Σ_{b_{-i}} v_i(b'_i, b_{-i}) · ρ(b_i, b_{-i}),  ∀ b_i, b'_i ∈ B_i, ∀ i ∈ [n].
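The weaker CCE condition compares following all recommendations against committing to one fixed action upfront. A minimal check on the traffic-light policy (a CE is always a CCE; the helper name and encoding 0 = STOP, 1 = GO are mine):

```python
# Sketch: check the CCE condition for the traffic-light policy.
vA = {(0, 0): -3, (0, 1): -3, (1, 0): 0, (1, 1): -100}
vB = {(0, 0): -2, (0, 1): 0, (1, 0): -2, (1, 1): -100}
rho = {(0, 0): 0.0, (0, 1): 2/5, (1, 0): 3/5, (1, 1): 0.0}
profiles = list(rho)

def is_cce(vA, vB, rho):
    """Check sum_b v_i(b) rho(b) >= sum_b v_i(b'_i, b_-i) rho(b): following the
    recommendations beats committing to any fixed action in advance."""
    followA = sum(vA[b] * rho[b] for b in profiles)
    followB = sum(vB[b] * rho[b] for b in profiles)
    for dev in (0, 1):
        # player A ignores recommendations and always plays `dev`
        if followA < sum(vA[dev, b[1]] * rho[b] for b in profiles) - 1e-9:
            return False
        # player B ignores recommendations and always plays `dev`
        if followB < sum(vB[b[0], dev] * rho[b] for b in profiles) - 1e-9:
            return False
    return True

print(is_cce(vA, vB, rho))   # True
```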
Relations among the concepts: Nash Equilibrium (NE) ⊆ Correlated Equilibrium (CE) ⊆ Coarse Correlated Equilibrium (CCE).
There are other equilibrium concepts, but NE and CE are the most widely studied.
Outline
Ø Correlated and Coarse Correlated Equilibrium
Ø Zero-Sum Games
Ø GANs and Equilibrium Analysis
Ø Two players: player 1 takes action i ∈ [m] = {1, ⋯ , m}, player 2 takes action j ∈ [n]
Ø The game is zero-sum if v_1(i, j) + v_2(i, j) = 0, ∀ i ∈ [m], j ∈ [n]
Ø Let v_1(y, z) = Σ_{i∈[m], j∈[n]} v_1(i, j) · y_i · z_j for any mixed strategies y ∈ Δ_m, z ∈ Δ_n
Ø (y*, z*) is a NE of the zero-sum game if: (1) v_1(y*, z*) ≥ v_1(i, z*) for any i ∈ [m]; (2) v_1(y*, z*) ≤ v_1(y*, j) for any j ∈ [n]
Ø Condition v_1(y*, z*) ≤ v_1(y*, j) ⟺ v_2(y*, z*) ≥ v_2(y*, j)
Ø We can “forget” v_2; instead, think of player 2 as minimizing player 1’s utility
Ø The previous observations motivate the following definitions

A strategy y* ∈ argmax_{y∈Δ_m} min_{j∈[n]} v_1(y, j) is called a maximin strategy for player 1. The corresponding utility value max_{y∈Δ_m} min_{j∈[n]} v_1(y, j) is called the maximin value of the game.

Remark: y* is player 1’s best strategy if he were to move first
A strategy z* ∈ argmin_{z∈Δ_n} max_{i∈[m]} v_1(i, z) is called a minimax strategy for player 2. The corresponding utility value min_{z∈Δ_n} max_{i∈[m]} v_1(i, z) is called the minimax value of the game.

Remark: z* is player 2’s best strategy if he were to move first
Fact. max_{y∈Δ_m} min_{j∈[n]} v_1(y, j) ≤ min_{z∈Δ_n} max_{i∈[m]} v_1(i, z).
That is, moving first is no better.

Proof:
Ø Let z* = argmin_{z∈Δ_n} max_{i∈[m]} v_1(i, z), so
    min_{z∈Δ_n} max_{i∈[m]} v_1(i, z) = max_{i∈[m]} v_1(i, z*)
Ø Since min_{j∈[n]} v_1(y, j) ≤ v_1(y, z*) for every y, we have
    max_{y∈Δ_m} min_{j∈[n]} v_1(y, j) ≤ max_{y∈Δ_m} v_1(y, z*) = max_{i∈[m]} v_1(i, z*)
The maximin problem as a linear program:
    max  v
    s.t. v ≤ Σ_{i=1}^m v_1(i, j) · y_i,  ∀ j ∈ [n]
         Σ_{i=1}^m y_i = 1
         y_i ≥ 0,  ∀ i ∈ [m]

The minimax problem as a linear program:
    min  w
    s.t. w ≥ Σ_{j=1}^n v_1(i, j) · z_j,  ∀ i ∈ [m]
         Σ_{j=1}^n z_j = 1
         z_j ≥ 0,  ∀ j ∈ [n]

Theorem (Minimax Theorem). max_{y∈Δ_m} min_{j∈[n]} v_1(y, j) = min_{z∈Δ_n} max_{i∈[m]} v_1(i, z).

Ø Maximin and minimax can both be formulated as linear programs
Ø These turn out to be primal and dual LPs; strong duality yields the equality
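The maximin LP can be handed directly to an LP solver. A sketch using scipy on Rock-Paper-Scissors (the example game and all names are my own; V[i][j] is player 1's payoff):

```python
# Sketch: solve the maximin LP for Rock-Paper-Scissors.
import numpy as np
from scipy.optimize import linprog

V = np.array([[0, -1, 1],     # rock
              [1, 0, -1],     # paper
              [-1, 1, 0]])    # scissors
m, n = V.shape

# Variables: (y_1, ..., y_m, v); linprog minimizes, so use objective -v.
c = np.zeros(m + 1)
c[-1] = -1.0
# Constraint v <= sum_i V[i, j] y_i for every column j, rewritten as
# -sum_i V[i, j] y_i + v <= 0.
A_ub = np.hstack([-V.T, np.ones((n, 1))])
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])   # sum_i y_i = 1
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * m + [(None, None)])  # y >= 0, v free
y, game_value = res.x[:m], res.x[-1]
print(y, game_value)   # uniform (1/3, 1/3, 1/3), game value 0
```

Solving the minimax LP instead would return the dual solution; by strong duality both report the same value.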
Theorem. (y*, z*) is a NE of a zero-sum game if and only if y* and z* are the maximin and minimax strategy, respectively.

Proof of “⇐”: if y* [z*] is the maximin [minimax] strategy, then (y*, z*) is a NE.
Ø Want to prove v_1(y*, z*) ≥ v_1(i, z*), ∀ i ∈ [m]:
    v_1(y*, z*) ≥ min_{j∈[n]} v_1(y*, j)
               = max_{y∈Δ_m} min_{j∈[n]} v_1(y, j)      (y* is maximin)
               = min_{z∈Δ_n} max_{i∈[m]} v_1(i, z)      (minimax theorem)
               = max_{i∈[m]} v_1(i, z*)                  (z* is minimax)
               ≥ v_1(i, z*),  ∀ i
Ø A similar argument shows v_1(y*, z*) ≤ v_1(y*, j), ∀ j ∈ [n]
Ø So (y*, z*) is a NE
Proof of “⇒”: if (y*, z*) is a NE, then y* [z*] is the maximin [minimax] strategy.
Ø Observe the following chain of (in)equalities:
    v_1(y*, z*) = max_{i∈[m]} v_1(i, z*)                 (y* is a best response to z*)
               ≥ min_{z∈Δ_n} max_{i∈[m]} v_1(i, z)
               = max_{y∈Δ_m} min_{j∈[n]} v_1(y, j)      (minimax theorem)
               ≥ min_{j∈[n]} v_1(y*, j)
               = v_1(y*, z*)                             (z* is a best response to y*)
Ø So the two “≥” must both hold with equality, i.e., y* attains the maximin value and z* attains the minimax value
Corollary.
Ø The NE of any 2-player zero-sum game can be computed via linear programming
Ø Players achieve the same utility in any Nash equilibrium
The Collapse of Equilibrium Concepts in Zero-Sum Games

Theorem. In any two-player zero-sum game, each player achieves the same utility in any Nash equilibrium, any correlated equilibrium, any coarse correlated equilibrium, and any Strong Stackelberg equilibrium.

Ø Can be proved using proof techniques similar to those for the previous theorem
Ø The problem of optimizing a player’s utility over equilibria can also be solved easily, as the equilibrium utility is always the same
Outline
Ø Correlated and Coarse Correlated Equilibrium
Ø Zero-Sum Games
Ø GANs and Equilibrium Analysis
Ø Input: data points drawn from a distribution Q_real
Ø Output: data points drawn from a distribution Q_model
Ø Goal: use the data points from Q_real to generate a Q_model that is close to Q_real
Ø Input: images from the true distribution; Output: generated new images, i.e., samples from Q_model
Ø A few other demos:
    https://miro.medium.com/max/928/1*tUhgr3m54Qc80GU2BkaOiQ.gif
    http://ganpaint.io/demo/?project=church
    https://www.youtube.com/watch?v=PCBTZh41Ris&feature=youtu.be
Ø Celeb training data [Karras et al. 2017]
Ø GAN is one particular generative model – a zero-sum game between the Generator and the Discriminator
Ø Generator’s objective: select model parameter u such that the distribution of G_u(z), denoted Q_model, is close to Q_real
Ø Discriminator’s objective: select model parameter w such that D_w(x) is large if x ∼ Q_real and D_w(x) is small if x ∼ Q_model

[Architecture: noise z ∼ N(0,1) → Generator output G_u(z) = x → Discriminator score D_w(x)]
Ø The loss function originally formulated in [Goodfellow et al.’14]:
    L(u, w) = E_{x∼Q_real} log[D_w(x)] + E_{z∼N(0,1)} log[1 − D_w(G_u(z))]
Ø The game: the Discriminator maximizes this loss function whereas the Generator minimizes it:
    min_u max_w L(u, w)
Ø GAN is a large zero-sum game with intricate player payoffs
Ø The Generator strategy G_u and the Discriminator strategy D_w are typically deep neural networks, with parameters u and w
Ø The objective has the following general form, where ϱ is an increasing concave function (e.g., ϱ(x) = log x or ϱ(x) = x):
    E_{x∼Q_real} ϱ(D_w(x)) + E_{z∼N(0,1)} ϱ(1 − D_w(G_u(z)))
Ø GAN research is mainly about modeling and solving this extremely large zero-sum game for various applications
Ø Drawbacks of the log-likelihood loss: unbounded at the boundary, unstable
Ø Wasserstein GAN is a popular variant using a different loss function:
    E_{x∼Q_real} D_w(x) − E_{z∼N(0,1)} D_w(G_u(z))
min_u max_w E_{x∼Q_real} ϱ(D_w(x)) + E_{z∼N(0,1)} ϱ(1 − D_w(G_u(z)))

Ø What is the correct choice of the loss function ϱ?
Ø What neural network structures for G_u and D_w?
Ø Only pure strategies are allowed – an equilibrium may not exist or may not be unique, due to the non-convexity of strategies and loss function
Ø We do not know Q_real exactly but only have samples from it
Ø How to optimize the parameters u, w?
Ø . . .

A Basic Question: even if we computed an equilibrium w.r.t. some loss function, does that really mean we generated a distribution close to Q_real?
A Basic Question: even if we computed an equilibrium w.r.t. some loss function, does that really mean we generated a distribution close to Q_real?
Ø Intuitively, if the discriminator network D_w is strong enough, we should be able to get close to Q_real
Ø Next, we analyze the equilibrium of a stylized example
Ø True data drawn from Q_real = N(β, 1)
Ø Generator G_u(z) = z + u, where z ∼ N(0,1)
Ø Discriminator D_w(x) = w·x

Remarks:
a) Both the Generator and the Discriminator can be deep neural networks in general
b) We picked these particular forms for illustrative purposes and for convenience of theoretical analysis
Ø With Q_real = N(β, 1), Generator G_u(z) = z + u, and Discriminator D_w(x) = w·x, the objective takes the following closed form:
    min_u max_w E_{x∼Q_real}[D_w(x)] + E_{z∼N(0,1)}[1 − D_w(G_u(z))]
    ⇒ min_u max_w E_{x∼N(β,1)}[w·x] + E_{z∼N(0,1)}[1 − w·(z + u)]
    ⇒ min_u max_w (w·β + 1 − w·u)
Ø This minimax problem solves to u* = β: for any u ≠ β, the inner max over w is +∞, whereas u = β makes the objective 1 regardless of w
Ø I.e., WGAN does precisely learn Q_real at equilibrium in this case
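The toy min-max problem can also be solved numerically. Plain simultaneous gradient descent-ascent spirals on this bilinear objective, so the sketch below uses the extragradient method instead; that method choice, the step size, and the value β = 2.0 are my assumptions, not from the slides.

```python
# Sketch: solve min_u max_w (w*beta + 1 - w*u) with extragradient updates.
beta = 2.0          # true mean of Q_real = N(beta, 1)
u, w = 0.0, 1.0     # generator and discriminator parameters
eta = 0.1           # step size

grad_u = lambda u, w: -w         # d/du (w*beta + 1 - w*u)
grad_w = lambda u, w: beta - u   # d/dw (w*beta + 1 - w*u)

for _ in range(2000):
    # extrapolation ("look-ahead") step
    u_h = u - eta * grad_u(u, w)
    w_h = w + eta * grad_w(u, w)
    # actual update uses the gradients at the look-ahead point
    u = u - eta * grad_u(u_h, w_h)
    w = w + eta * grad_w(u_h, w_h)

print(u, w)   # u -> beta = 2.0 and w -> 0: the generator learns the true mean
```

The look-ahead step is what damps the rotation that makes vanilla descent-ascent diverge on bilinear games; this mirrors, in miniature, why GAN training needs more care than ordinary minimization.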
Ø See the paper “Generalization and Equilibrium in GANs” by Arora et al. for formal studies of GANs, e.g., whether they learn a good distribution at equilibrium
Haifeng Xu
University of Virginia hx4ad@virginia.edu