Announcements
Ø Minbiao’s office hour will be changed to Thursday 1-2 pm, starting from next week, at Rice Hall 442
CS6501: Topics in Learning and Game Theory (Fall 2019)
Introduction to Game Theory (II)
Instructor: Haifeng Xu

Outline
Ø Correlated and Coarse Correlated Equilibrium
Ø Zero-Sum Games
Ø GANs and Equilibrium Analysis
Ø n players, denoted by the set [n] = {1, ⋯ , n}
Ø Player i takes action b_i ∈ B_i
Ø An outcome is the action profile b = (b_1, ⋯ , b_n)
Ø b_{-i} denotes the action profile excluding b_i
Ø Player i receives payoff v_i(b) for any outcome b ∈ Π_{i=1}^n B_i
Ø {B_i, v_i}_{i∈[n]} are public knowledge

A mixed strategy profile x* = (x_1^*, ⋯ , x_n^*) is a Nash equilibrium (NE) if for any i, x_i^* is a best response to x_{-i}^*.
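To make the best-response condition concrete, here is a minimal numeric check on a small game; Matching Pennies and all variable names below are my own additions, not from the lecture.

```python
# Sketch: verify the NE definition on Matching Pennies, where both players
# mixing uniformly is the unique Nash equilibrium (example is mine).
import numpy as np

V1 = np.array([[1, -1],
               [-1, 1]])      # player 1's payoffs; the game is zero-sum
V2 = -V1                      # so player 2's payoffs are the negation
x = np.array([0.5, 0.5])      # candidate mixed strategy for player 1
y = np.array([0.5, 0.5])      # candidate mixed strategy for player 2

# x is a best response to y iff it matches the best pure-action payoff
# against y, and symmetrically for y.
is_ne = (x @ V1 @ y >= (V1 @ y).max() - 1e-9 and
         x @ V2 @ y >= (x @ V2).max() - 1e-9)
print(is_ne)   # True: uniform mixing is a NE of Matching Pennies
```

Checking only pure deviations suffices, since a mixed deviation's payoff is an average of pure-action payoffs.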
Ø NE rests on two key assumptions: (1) players move simultaneously (no player observes others’ actions before the move); (2) players take actions independently
Ø Last lecture: sequential moves result in different player behaviors; the corresponding equilibrium is called the Strong Stackelberg equilibrium
Ø Today: we study what happens if players do not take actions independently but instead are “coordinated” by a central mediator
Ø This results in the study of correlated equilibrium
The Traffic Light Game
Ø There is a mediator – the traffic light – that coordinates cars’ moves
Ø For example, recommend (GO, STOP) for (A, B) with probability 3/5 and (STOP, GO) for (A, B) with probability 2/5
Ø Each player then gets expected utility −6/5, so the policy is “fair”

Payoffs (row player A, column player B):
            STOP        GO
    STOP   (-3, -2)    (-3, 0)
    GO     (0, -2)     (-100, -100)

Well, we did not see many crashes in reality… Why?
This is exactly how traffic lights are designed!
Ø A (randomized) recommendation policy ρ assigns probability ρ(b) to each action profile b ∈ B = Π_{i∈[n]} B_i
Ø Upon receiving a recommendation b_i, player i’s (conditional) expected utility is
    [1 / Pr(b_i)] · Σ_{b_{-i}∈B_{-i}} v_i(b_i, b_{-i}) · ρ(b_i, b_{-i}),  where Pr(b_i) = Σ_{b_{-i}} ρ(b_i, b_{-i})

A recommendation policy ρ is a correlated equilibrium (CE) if
    Σ_{b_{-i}} v_i(b_i, b_{-i}) · ρ(b_i, b_{-i}) ≥ Σ_{b_{-i}} v_i(b'_i, b_{-i}) · ρ(b_i, b_{-i}),  ∀ b_i, b'_i ∈ B_i, ∀ i ∈ [n].

Ø That is, any recommended action to any player is a best response
Ø ρ is assumed to be public knowledge, so every player can calculate her utility
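The CE inequalities are easy to check numerically. A minimal sketch for the traffic-light game's recommendation policy (actions coded 0 = STOP, 1 = GO; the helper name is mine):

```python
# Sketch: check the CE inequalities for the traffic-light policy
# rho(GO, STOP) = 3/5, rho(STOP, GO) = 2/5 (0 = STOP, 1 = GO).
vA = {(0, 0): -3, (0, 1): -3, (1, 0): 0, (1, 1): -100}   # A's payoffs v_A(a, b)
vB = {(0, 0): -2, (0, 1): 0, (1, 0): -2, (1, 1): -100}   # B's payoffs v_B(a, b)
rho = {(0, 0): 0.0, (0, 1): 2/5, (1, 0): 3/5, (1, 1): 0.0}

def is_ce(vA, vB, rho):
    """Check sum_{b_-i} v_i(b_i, b_-i) rho(b) >= sum_{b_-i} v_i(b'_i, b_-i) rho(b)
    for every player i, recommendation b_i, and deviation b'_i."""
    for a in (0, 1):          # player A's recommended action
        for dev in (0, 1):    # A's candidate deviation
            follow = sum(vA[a, b] * rho[a, b] for b in (0, 1))
            deviate = sum(vA[dev, b] * rho[a, b] for b in (0, 1))
            if follow < deviate - 1e-9:
                return False
    for b in (0, 1):          # player B's recommended action
        for dev in (0, 1):    # B's candidate deviation
            follow = sum(vB[a, b] * rho[a, b] for a in (0, 1))
            deviate = sum(vB[a, dev] * rho[a, b] for a in (0, 1))
            if follow < deviate - 1e-9:
                return False
    return True

print(is_ce(vA, vB, rho))   # True: the traffic-light policy is a CE
```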
Fact. Any Nash equilibrium is a correlated equilibrium.
Ø True by definition: a Nash equilibrium can be viewed as an independent action recommendation
Ø As a corollary, a correlated equilibrium always exists

Fact. The set of correlated equilibria is described by a set of linear constraints on the distribution ρ:
    Σ_{b_{-i}} v_i(b_i, b_{-i}) · ρ(b_i, b_{-i}) ≥ Σ_{b_{-i}} v_i(b'_i, b_{-i}) · ρ(b_i, b_{-i}),  ∀ b_i, b'_i ∈ B_i, ∀ i ∈ [n]
Ø This is nice because it allows us to optimize over all CEs
Ø Not true for Nash equilibrium
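Because the CE constraints are linear, optimizing over all CEs is a linear program. A sketch for the traffic-light game, maximizing total welfare; the use of scipy and all names are my own choices, not from the slides:

```python
# Sketch: LP over all correlated equilibria of the traffic-light game,
# maximizing total welfare (0 = STOP, 1 = GO).
import numpy as np
from scipy.optimize import linprog

vA = np.array([[-3, -3], [0, -100]])   # A's payoffs, rows = A's action
vB = np.array([[-2, 0], [-2, -100]])   # B's payoffs, cols = B's action

# Variables: rho[a, b] flattened to length 4. CE constraints as A_ub @ x <= 0:
# sum_b rho(a, b) * (v_i(dev, b) - v_i(a, b)) <= 0 for each recommendation a
# and deviation dev of each player.
A_ub = []
for a in (0, 1):
    dev = 1 - a
    rowA = np.zeros((2, 2))
    rowA[a, :] = vA[dev, :] - vA[a, :]    # A recommended a, deviates to dev
    A_ub.append(rowA.ravel())
    rowB = np.zeros((2, 2))
    rowB[:, a] = vB[:, dev] - vB[:, a]    # B recommended a, deviates to dev
    A_ub.append(rowB.ravel())

res = linprog(
    c=-(vA + vB).ravel(),                 # maximize welfare = minimize its negation
    A_ub=np.array(A_ub), b_ub=np.zeros(4),
    A_eq=np.ones((1, 4)), b_eq=[1.0],     # rho is a probability distribution
    bounds=[(0, None)] * 4,
)
rho = res.x.reshape(2, 2)
print(rho)   # all mass on (GO, STOP): the welfare-maximizing CE here
```

In this game the welfare-maximizing CE happens to be deterministic; adding fairness constraints (another linear condition) would recover randomized policies like the 3/5–2/5 one.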
Ø A weaker notion than correlated equilibrium
Ø Also a recommendation policy ρ, but it only requires that no player has an incentive to opt out of the recommendations

A recommendation policy ρ is a coarse correlated equilibrium (CCE) if
    Σ_{b∈B} v_i(b) · ρ(b) ≥ Σ_{b∈B} v_i(b'_i, b_{-i}) · ρ(b),  ∀ b'_i ∈ B_i, ∀ i ∈ [n].

That is, for any player i, following ρ’s recommendations is better than opting out of the recommendation and “acting on his own”.
Compare to the correlated equilibrium condition:
    Σ_{b_{-i}} v_i(b_i, b_{-i}) · ρ(b_i, b_{-i}) ≥ Σ_{b_{-i}} v_i(b'_i, b_{-i}) · ρ(b_i, b_{-i}),  ∀ b_i, b'_i ∈ B_i, ∀ i ∈ [n].
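The weaker CCE condition compares following all recommendations against committing to one fixed action upfront. A minimal check on the traffic-light policy (a CE is always a CCE; the helper name and encoding 0 = STOP, 1 = GO are mine):

```python
# Sketch: check the CCE condition for the traffic-light policy.
vA = {(0, 0): -3, (0, 1): -3, (1, 0): 0, (1, 1): -100}
vB = {(0, 0): -2, (0, 1): 0, (1, 0): -2, (1, 1): -100}
rho = {(0, 0): 0.0, (0, 1): 2/5, (1, 0): 3/5, (1, 1): 0.0}
profiles = list(rho)

def is_cce(vA, vB, rho):
    """Check sum_b v_i(b) rho(b) >= sum_b v_i(b'_i, b_-i) rho(b): following the
    recommendations beats committing to any fixed action in advance."""
    followA = sum(vA[b] * rho[b] for b in profiles)
    followB = sum(vB[b] * rho[b] for b in profiles)
    for dev in (0, 1):
        # player A ignores recommendations and always plays `dev`
        if followA < sum(vA[dev, b[1]] * rho[b] for b in profiles) - 1e-9:
            return False
        # player B ignores recommendations and always plays `dev`
        if followB < sum(vB[b[0], dev] * rho[b] for b in profiles) - 1e-9:
            return False
    return True

print(is_cce(vA, vB, rho))   # True
```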
Relations among the concepts: Nash Equilibrium (NE) ⊆ Correlated Equilibrium (CE) ⊆ Coarse Correlated Equilibrium (CCE).
There are other equilibrium concepts, but NE and CE are the most widely studied.
Outline
Ø Correlated and Coarse Correlated Equilibrium
Ø Zero-Sum Games
Ø GANs and Equilibrium Analysis
Ø Two players: player 1 takes action i ∈ [m] = {1, ⋯ , m}, player 2 takes action j ∈ [n]
Ø The game is zero-sum if v_1(i, j) + v_2(i, j) = 0, ∀ i ∈ [m], j ∈ [n]
Ø Let v_1(y, z) = Σ_{i∈[m], j∈[n]} v_1(i, j) · y_i · z_j for any mixed strategies y ∈ Δ_m, z ∈ Δ_n
Ø (y*, z*) is a NE of the zero-sum game if: (1) v_1(y*, z*) ≥ v_1(i, z*) for any i ∈ [m]; (2) v_1(y*, z*) ≤ v_1(y*, j) for any j ∈ [n]
Ø Condition v_1(y*, z*) ≤ v_1(y*, j) ⟺ v_2(y*, z*) ≥ v_2(y*, j)
Ø We can “forget” v_2; instead, think of player 2 as minimizing player 1’s utility
Ø The previous observations motivate the following definitions

A strategy y* ∈ argmax_{y∈Δ_m} min_{j∈[n]} v_1(y, j) is called a maximin strategy for player 1. The corresponding utility value max_{y∈Δ_m} min_{j∈[n]} v_1(y, j) is called the maximin value of the game.

Remark: y* is player 1’s best strategy if he were to move first
A strategy z* ∈ argmin_{z∈Δ_n} max_{i∈[m]} v_1(i, z) is called a minimax strategy for player 2. The corresponding utility value min_{z∈Δ_n} max_{i∈[m]} v_1(i, z) is called the minimax value of the game.

Remark: z* is player 2’s best strategy if he were to move first
Fact. max_{y∈Δ_m} min_{j∈[n]} v_1(y, j) ≤ min_{z∈Δ_n} max_{i∈[m]} v_1(i, z).
That is, moving first is no better.

Proof:
Ø Let z* = argmin_{z∈Δ_n} max_{i∈[m]} v_1(i, z), so
    min_{z∈Δ_n} max_{i∈[m]} v_1(i, z) = max_{i∈[m]} v_1(i, z*)
Ø Since min_{j∈[n]} v_1(y, j) ≤ v_1(y, z*) for every y, we have
    max_{y∈Δ_m} min_{j∈[n]} v_1(y, j) ≤ max_{y∈Δ_m} v_1(y, z*) = max_{i∈[m]} v_1(i, z*)
The maximin problem as a linear program:
    max  v
    s.t. v ≤ Σ_{i=1}^m v_1(i, j) · y_i,  ∀ j ∈ [n]
         Σ_{i=1}^m y_i = 1
         y_i ≥ 0,  ∀ i ∈ [m]

The minimax problem as a linear program:
    min  w
    s.t. w ≥ Σ_{j=1}^n v_1(i, j) · z_j,  ∀ i ∈ [m]
         Σ_{j=1}^n z_j = 1
         z_j ≥ 0,  ∀ j ∈ [n]

Theorem (Minimax Theorem). max_{y∈Δ_m} min_{j∈[n]} v_1(y, j) = min_{z∈Δ_n} max_{i∈[m]} v_1(i, z).

Ø Maximin and minimax can both be formulated as linear programs
Ø These turn out to be primal and dual LPs; strong duality yields the equality
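The maximin LP can be handed directly to an LP solver. A sketch using scipy on Rock-Paper-Scissors (the example game and all names are my own; V[i][j] is player 1's payoff):

```python
# Sketch: solve the maximin LP for Rock-Paper-Scissors.
import numpy as np
from scipy.optimize import linprog

V = np.array([[0, -1, 1],     # rock
              [1, 0, -1],     # paper
              [-1, 1, 0]])    # scissors
m, n = V.shape

# Variables: (y_1, ..., y_m, v); linprog minimizes, so use objective -v.
c = np.zeros(m + 1)
c[-1] = -1.0
# Constraint v <= sum_i V[i, j] y_i for every column j, rewritten as
# -sum_i V[i, j] y_i + v <= 0.
A_ub = np.hstack([-V.T, np.ones((n, 1))])
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])   # sum_i y_i = 1
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * m + [(None, None)])  # y >= 0, v free
y, game_value = res.x[:m], res.x[-1]
print(y, game_value)   # uniform (1/3, 1/3, 1/3), game value 0
```

Solving the minimax LP instead would return the dual solution; by strong duality both report the same value.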
Theorem. (y*, z*) is a NE of a zero-sum game if and only if y* and z* are the maximin and minimax strategy, respectively.

Proof of “⇐”: if y* [z*] is the maximin [minimax] strategy, then (y*, z*) is a NE.
Ø Want to prove v_1(y*, z*) ≥ v_1(i, z*), ∀ i ∈ [m]:
    v_1(y*, z*) ≥ min_{j∈[n]} v_1(y*, j)
               = max_{y∈Δ_m} min_{j∈[n]} v_1(y, j)      (y* is maximin)
               = min_{z∈Δ_n} max_{i∈[m]} v_1(i, z)      (minimax theorem)
               = max_{i∈[m]} v_1(i, z*)                  (z* is minimax)
               ≥ v_1(i, z*),  ∀ i
Ø A similar argument shows v_1(y*, z*) ≤ v_1(y*, j), ∀ j ∈ [n]
Ø So (y*, z*) is a NE
Proof of “⇒”: if (y*, z*) is a NE, then y* [z*] is the maximin [minimax] strategy.
Ø Observe the following chain of (in)equalities:
    v_1(y*, z*) = max_{i∈[m]} v_1(i, z*)                 (y* is a best response to z*)
               ≥ min_{z∈Δ_n} max_{i∈[m]} v_1(i, z)
               = max_{y∈Δ_m} min_{j∈[n]} v_1(y, j)      (minimax theorem)
               ≥ min_{j∈[n]} v_1(y*, j)
               = v_1(y*, z*)                             (z* is a best response to y*)
Ø So the two “≥” must both hold with equality, i.e., y* attains the maximin value and z* attains the minimax value
Corollary.
Ø The NE of any 2-player zero-sum game can be computed via linear programming
Ø Players achieve the same utility in any Nash equilibrium
The Collapse of Equilibrium Concepts in Zero-Sum Games

Theorem. In any two-player zero-sum game, each player achieves the same utility in any Nash equilibrium, any correlated equilibrium, any coarse correlated equilibrium, and any Strong Stackelberg equilibrium.

Ø Can be proved using proof techniques similar to those for the previous theorem
Ø The problem of optimizing a player’s utility over equilibria can also be solved easily, as the equilibrium utility is always the same
Outline
Ø Correlated and Coarse Correlated Equilibrium
Ø Zero-Sum Games
Ø GANs and Equilibrium Analysis
Ø Input: data points drawn from a distribution Q_real
Ø Output: data points drawn from a distribution Q_model
Ø Goal: use the data points from Q_real to generate a Q_model that is close to Q_real
Ø Input: images from the true distribution; Output: generated new images, i.e., samples from Q_model
Ø A few other demos:
    https://miro.medium.com/max/928/1*tUhgr3m54Qc80GU2BkaOiQ.gif
    http://ganpaint.io/demo/?project=church
    https://www.youtube.com/watch?v=PCBTZh41Ris&feature=youtu.be
Ø Celeb training data [Karras et al. 2017]
Ø GAN is one particular generative model – a zero-sum game between the Generator and the Discriminator
Ø Generator’s objective: select model parameter u such that the distribution of G_u(z), denoted Q_model, is close to Q_real
Ø Discriminator’s objective: select model parameter w such that D_w(x) is large if x ∼ Q_real and D_w(x) is small if x ∼ Q_model

[Architecture: noise z ∼ N(0,1) → Generator output G_u(z) = x → Discriminator score D_w(x)]
Ø The loss function originally formulated in [Goodfellow et al.’14]:
    L(u, w) = E_{x∼Q_real} log[D_w(x)] + E_{z∼N(0,1)} log[1 − D_w(G_u(z))]
Ø The game: the Discriminator maximizes this loss function whereas the Generator minimizes it:
    min_u max_w L(u, w)
Ø GAN is a large zero-sum game with intricate player payoffs
Ø The Generator strategy G_u and the Discriminator strategy D_w are typically deep neural networks, with parameters u and w
Ø The objective has the following general form, where ϱ is an increasing concave function (e.g., ϱ(x) = log x or ϱ(x) = x):
    E_{x∼Q_real} ϱ(D_w(x)) + E_{z∼N(0,1)} ϱ(1 − D_w(G_u(z)))
Ø GAN research is mainly about modeling and solving this extremely large zero-sum game for various applications
Ø Drawbacks of the log-likelihood loss: unbounded at the boundary, unstable
Ø Wasserstein GAN is a popular variant using a different loss function:
    E_{x∼Q_real} D_w(x) − E_{z∼N(0,1)} D_w(G_u(z))
min_u max_w E_{x∼Q_real} ϱ(D_w(x)) + E_{z∼N(0,1)} ϱ(1 − D_w(G_u(z)))

Ø What is the correct choice of the loss function ϱ?
Ø What neural network structures for G_u and D_w?
Ø Only pure strategies are allowed – an equilibrium may not exist or may not be unique, due to the non-convexity of strategies and loss function
Ø We do not know Q_real exactly but only have samples from it
Ø How to optimize the parameters u, w?
Ø . . .

A Basic Question: even if we computed an equilibrium w.r.t. some loss function, does that really mean we generated a distribution close to Q_real?
A Basic Question: even if we computed an equilibrium w.r.t. some loss function, does that really mean we generated a distribution close to Q_real?
Ø Intuitively, if the discriminator network D_w is strong enough, we should be able to get close to Q_real
Ø Next, we analyze the equilibrium of a stylized example
Ø True data drawn from Q_real = N(β, 1)
Ø Generator G_u(z) = z + u, where z ∼ N(0,1)
Ø Discriminator D_w(x) = w·x

Remarks:
a) Both the Generator and the Discriminator can be deep neural networks in general
b) We picked these particular forms for illustrative purposes and for convenience of theoretical analysis
Ø With Q_real = N(β, 1), Generator G_u(z) = z + u, and Discriminator D_w(x) = w·x, the objective takes the following closed form:
    min_u max_w E_{x∼Q_real}[D_w(x)] + E_{z∼N(0,1)}[1 − D_w(G_u(z))]
    ⇒ min_u max_w E_{x∼N(β,1)}[w·x] + E_{z∼N(0,1)}[1 − w·(z + u)]
    ⇒ min_u max_w (w·β + 1 − w·u)
Ø This minimax problem solves to u* = β: for any u ≠ β, the inner max over w is +∞, whereas u = β makes the objective 1 regardless of w
Ø I.e., WGAN does precisely learn Q_real at equilibrium in this case
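The toy min-max problem can also be solved numerically. Plain simultaneous gradient descent-ascent spirals on this bilinear objective, so the sketch below uses the extragradient method instead; that method choice, the step size, and the value β = 2.0 are my assumptions, not from the slides.

```python
# Sketch: solve min_u max_w (w*beta + 1 - w*u) with extragradient updates.
beta = 2.0          # true mean of Q_real = N(beta, 1)
u, w = 0.0, 1.0     # generator and discriminator parameters
eta = 0.1           # step size

grad_u = lambda u, w: -w         # d/du (w*beta + 1 - w*u)
grad_w = lambda u, w: beta - u   # d/dw (w*beta + 1 - w*u)

for _ in range(2000):
    # extrapolation ("look-ahead") step
    u_h = u - eta * grad_u(u, w)
    w_h = w + eta * grad_w(u, w)
    # actual update uses the gradients at the look-ahead point
    u = u - eta * grad_u(u_h, w_h)
    w = w + eta * grad_w(u_h, w_h)

print(u, w)   # u -> beta = 2.0 and w -> 0: the generator learns the true mean
```

The look-ahead step is what damps the rotation that makes vanilla descent-ascent diverge on bilinear games; this mirrors, in miniature, why GAN training needs more care than ordinary minimization.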
Ø See the paper “Generalization and Equilibrium in GANs” by Arora et al. for formal studies of GANs, e.g., whether they learn a good distribution at equilibrium
Haifeng Xu
University of Virginia hx4ad@virginia.edu