SLIDE 1

Announcements

Ø Minbiao's office hour will be changed to Thursday 1-2 pm, starting from next week, at Rice Hall 442

SLIDE 2

CS6501: Topics in Learning and Game Theory (Fall 2019)

Introduction to Game Theory (II)

Instructor: Haifeng Xu

SLIDE 3

Outline

Ø Correlated and Coarse Correlated Equilibrium
Ø Zero-Sum Games
Ø GANs and Equilibrium Analysis

SLIDE 4

Recap: Normal-Form Games

Ø n players, denoted by the set [n] = {1, ⋯ , n}
Ø Player i takes action a_i ∈ A_i
Ø An outcome is the action profile a = (a_1, ⋯ , a_n)
  • As a convention, a_-i = (a_1, ⋯ , a_{i-1}, a_{i+1}, ⋯ , a_n) denotes all actions excluding a_i
Ø Player i receives payoff u_i(a) for any outcome a ∈ Π_{i=1}^n A_i
  • u_i(a) = u_i(a_i, a_-i) depends on other players' actions
Ø {A_i, u_i}_{i∈[n]} are public knowledge

A mixed strategy profile x* = (x_1*, ⋯ , x_n*) is a Nash equilibrium (NE) if for any i, x_i* is a best response to x_-i*.

SLIDE 5

NE Is Not the Only Solution Concept

Ø NE rests on two key assumptions
  1. Players move simultaneously (so they cannot see others' strategies before the move)
  2. Players take actions independently
Ø Last lecture: sequential moves result in different player behaviors
  • The corresponding game is called a Stackelberg game, and its equilibrium is called a Strong Stackelberg equilibrium

Today: we study what happens if players do not take actions independently but instead are "coordinated" by a central mediator
Ø This results in the study of correlated equilibrium

SLIDE 6

An Illustrative Example

Ø There is a mediator – the traffic light – that coordinates cars' moves
Ø For example, recommend (GO, STOP) for (A, B) with probability 3/5 and (STOP, GO) for (A, B) with probability 2/5
  • GO = green light, STOP = red light
  • Following the recommendation is a best response for each player
  • It turns out that this recommendation policy results in equal player utility −6/5 and thus is "fair"

The Traffic Light Game (rows = A's action, columns = B's action):

            STOP          GO
  STOP    (-3, -2)      (-3, 0)
  GO      (0, -2)    (-100, -100)

Well, we do not see many crashes in reality… Why?

This is exactly how traffic lights are designed!
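The numbers on this slide are easy to verify directly. A minimal check in Python (payoffs taken from the matrix above; the dictionary encoding is my own):

```python
# Payoffs of the traffic light game: keys are (A's action, B's action).
uA = {("STOP", "STOP"): -3, ("STOP", "GO"): -3, ("GO", "STOP"): 0, ("GO", "GO"): -100}
uB = {("STOP", "STOP"): -2, ("STOP", "GO"): 0, ("GO", "STOP"): -2, ("GO", "GO"): -100}

# The traffic light's recommendation policy from the slide
rho = {("GO", "STOP"): 3 / 5, ("STOP", "GO"): 2 / 5}

# Obedience: here each recommendation reveals the other car's action exactly,
# so following must beat deviating pointwise.
assert uA[("GO", "STOP")] >= uA[("STOP", "STOP")]  # A told GO (B told STOP)
assert uA[("STOP", "GO")] >= uA[("GO", "GO")]      # A told STOP (B told GO)
assert uB[("GO", "STOP")] >= uB[("GO", "GO")]      # B told STOP (A told GO)
assert uB[("STOP", "GO")] >= uB[("STOP", "STOP")]  # B told GO (A told STOP)

# Expected utilities: both equal -6/5, so the policy is "fair"
EA = sum(p * uA[a] for a, p in rho.items())
EB = sum(p * uB[a] for a, p in rho.items())
print(EA, EB)  # both approximately -1.2
```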

SLIDE 7

Correlated Equilibrium (CE)

Ø A (randomized) recommendation policy ρ assigns probability ρ(a) to each action profile a ∈ A = Π_{i∈[n]} A_i
  • A mediator first samples a ∼ ρ, then recommends a_i to player i privately
Ø Upon receiving a recommendation a_i, player i's expected utility is
  (1/d) ∑_{a_-i ∈ A_-i} u_i(a_i, a_-i) · ρ(a_i, a_-i)
  • d is a normalization term that equals the probability that a_i is recommended, i.e., d = ∑_{a_-i ∈ A_-i} ρ(a_i, a_-i)

A recommendation policy ρ is a correlated equilibrium (CE) if
  ∑_{a_-i} u_i(a_i, a_-i) · ρ(a_i, a_-i) ≥ ∑_{a_-i} u_i(a_i', a_-i) · ρ(a_i, a_-i), ∀ a_i' ∈ A_i, ∀ i ∈ [n].

Ø That is, any recommended action to any player is a best response
  • A CE makes incentive-compatible action recommendations
Ø We assumed ρ is public knowledge, so every player can calculate her utility
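The CE inequalities can be checked mechanically. Below is a minimal sketch for two-player games (the function name and matrix encoding are my own), verifying that the 3/5–2/5 traffic-light policy is a CE while uniform recommendations are not:

```python
import numpy as np

def is_correlated_eq(rho, uA, uB, tol=1e-9):
    """Check the CE inequalities: for each player, each recommended action r,
    and each deviation d, obeying must be at least as good as deviating,
    weighted by the policy mass rho on the recommendation."""
    m, n = rho.shape
    for r in range(m):          # player A recommended row r
        for d in range(m):
            if np.sum(uA[r] * rho[r]) < np.sum(uA[d] * rho[r]) - tol:
                return False
    for r in range(n):          # player B recommended column r
        for d in range(n):
            if np.sum(uB[:, r] * rho[:, r]) < np.sum(uB[:, d] * rho[:, r]) - tol:
                return False
    return True

# Traffic light game, action 0 = STOP, 1 = GO; uA[a, b] is A's payoff
uA = np.array([[-3, -3], [0, -100]])
uB = np.array([[-2, 0], [-2, -100]])

rho = np.array([[0.0, 2 / 5], [3 / 5, 0.0]])   # (STOP,GO) w.p. 2/5, (GO,STOP) w.p. 3/5
print(is_correlated_eq(rho, uA, uB))            # True
print(is_correlated_eq(np.full((2, 2), 0.25), uA, uB))  # False: told GO, deviate to STOP
```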

SLIDE 8-9

Basic Facts about Correlated Equilibrium

Ø In fact, the distributions ρ that are CEs are exactly those satisfying a set of linear constraints:
  ∑_{a_-i} u_i(a_i, a_-i) · ρ(a_i, a_-i) ≥ ∑_{a_-i} u_i(a_i', a_-i) · ρ(a_i, a_-i), ∀ a_i' ∈ A_i, ∀ i ∈ [n]
Ø This is nice because it allows us to optimize over all CEs
Ø Not true for Nash equilibrium

  • Fact. Any Nash equilibrium is also a correlated equilibrium.

Ø True by definition: a Nash equilibrium can be viewed as an independent action recommendation
Ø As a corollary, a correlated equilibrium always exists

  • Fact. The set of correlated equilibria forms a convex set.
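Since the CE set is a polytope, optimizing a linear objective over it is a linear program. A sketch with `scipy.optimize.linprog` (the index encoding 2*a + b for profiles is my own), maximizing expected social welfare over all CEs of the traffic light game:

```python
import numpy as np
from scipy.optimize import linprog

# Traffic light game, action 0 = STOP, 1 = GO; uA[a, b] is A's payoff
uA = np.array([[-3, -3], [0, -100]])
uB = np.array([[-2, 0], [-2, -100]])

# Variables: rho(a, b) flattened to a length-4 vector, index = 2*a + b.
# CE constraints: for each player, recommendation r and deviation d != r,
#   sum over the other action of (u(r, .) - u(d, .)) * rho(r, .) >= 0.
A_ub, b_ub = [], []
for r in range(2):
    for d in range(2):
        if d == r:
            continue
        rowA = np.zeros(4)            # player A recommended r, deviating to d
        for b in range(2):
            rowA[2 * r + b] = uA[r, b] - uA[d, b]
        rowB = np.zeros(4)            # player B recommended r, deviating to d
        for a in range(2):
            rowB[2 * a + r] = uB[a, r] - uB[a, d]
        A_ub += [-rowA, -rowB]        # linprog wants A_ub @ x <= b_ub
        b_ub += [0.0, 0.0]

# Objective: maximize expected social welfare (negate, since linprog minimizes)
welfare = (uA + uB).flatten()
res = linprog(-welfare, A_ub=A_ub, b_ub=b_ub,
              A_eq=[np.ones(4)], b_eq=[1.0], bounds=[(0, 1)] * 4)
print(res.x.round(3), -res.fun)   # all mass on (GO, STOP), welfare -2
```

Here the welfare-optimal CE happens to be the pure Nash equilibrium (GO, STOP); in general the optimal CE can strictly mix.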
SLIDE 10-11

Coarse Correlated Equilibrium (CCE)

Ø A weaker notion than correlated equilibrium
Ø Also a recommendation policy ρ, but it only requires that no player has an incentive to opt out of the recommendations

A recommendation policy ρ is a coarse correlated equilibrium (CCE) if
  ∑_{a∈A} u_i(a) · ρ(a) ≥ ∑_{a∈A} u_i(a_i', a_-i) · ρ(a), ∀ a_i' ∈ A_i, ∀ i ∈ [n].

That is, for any player i, following ρ's recommendations is better than opting out of the recommendation and "acting on his own", i.e., committing to some fixed action a_i' before seeing any recommendation.

Compare to the correlated equilibrium condition:
  ∑_{a_-i} u_i(a_i, a_-i) · ρ(a_i, a_-i) ≥ ∑_{a_-i} u_i(a_i', a_-i) · ρ(a_i, a_-i), ∀ a_i' ∈ A_i, ∀ i ∈ [n].

  • Fact. Any correlated equilibrium is a coarse correlated equilibrium.
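The CCE condition can be checked the same way; opting out means playing a fixed action against the marginal of ρ over the other player. A sketch for two-player games (function name and encoding are mine), reusing the traffic light game:

```python
import numpy as np

def is_coarse_correlated_eq(rho, uA, uB, tol=1e-9):
    """CCE check: no fixed deviation, played against the opponent's marginal
    under rho, may beat simply following rho."""
    m, n = rho.shape
    vA = np.sum(uA * rho)          # A's expected utility when following rho
    vB = np.sum(uB * rho)
    col_marg = rho.sum(axis=0)     # marginal over B's actions
    row_marg = rho.sum(axis=1)     # marginal over A's actions
    okA = all(vA >= uA[d] @ col_marg - tol for d in range(m))
    okB = all(vB >= uB[:, d] @ row_marg - tol for d in range(n))
    return okA and okB

uA = np.array([[-3, -3], [0, -100]])
uB = np.array([[-2, 0], [-2, -100]])
rho = np.array([[0.0, 2 / 5], [3 / 5, 0.0]])
print(is_coarse_correlated_eq(rho, uA, uB))  # True: this CE is also a CCE
```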
SLIDE 12

The Equilibrium Hierarchy

Ø Nash Equilibrium (NE) ⊆ Correlated Equilibrium (CE) ⊆ Coarse Correlated Equilibrium (CCE)
Ø There are other equilibrium concepts, but NE and CE are the most often used; CCE is not used that often.
SLIDE 13

Outline

Ø Correlated and Coarse Correlated Equilibrium
Ø Zero-Sum Games
Ø GANs and Equilibrium Analysis

SLIDE 14

Zero-Sum Games

Ø Two players: player 1 takes action i ∈ [m] = {1, ⋯ , m}, player 2 takes action j ∈ [n]
Ø The game is zero-sum if u_1(i, j) + u_2(i, j) = 0, ∀ i ∈ [m], j ∈ [n]
  • Models strictly competitive scenarios
  • "Zero-sum" almost always means "2-player zero-sum" games
  • n-player games can also be zero-sum, but they are not particularly interesting
Ø Let u_1(x, y) = ∑_{i∈[m], j∈[n]} u_1(i, j) x_i y_j for any x ∈ Δ_m, y ∈ Δ_n
Ø (x*, y*) is a NE of the zero-sum game if: (1) u_1(x*, y*) ≥ u_1(i, y*) for any i ∈ [m]; (2) u_1(x*, y*) ≤ u_1(x*, j) for any j ∈ [n]
Ø Condition (2) u_1(x*, y*) ≤ u_1(x*, j) ⟺ u_2(x*, y*) ≥ u_2(x*, j)
Ø We can "forget" u_2; instead, think of player 2 as minimizing player 1's utility
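To make the NE conditions concrete, here is a small Python check on matching pennies (my own running example, not from the slides), where the uniform strategies form the unique NE:

```python
import numpy as np

# Matching pennies: u1[i, j] is player 1's payoff; player 2 gets -u1[i, j].
u1 = np.array([[1.0, -1.0], [-1.0, 1.0]])

def u1_mixed(x, y):
    """Bilinear extension u_1(x, y) = sum_{i,j} u_1(i, j) x_i y_j."""
    return x @ u1 @ y

x_star = np.array([0.5, 0.5])   # player 1's uniform strategy
y_star = np.array([0.5, 0.5])   # player 2's uniform strategy

v = u1_mixed(x_star, y_star)
# NE conditions: (1) u_1(x*, y*) >= u_1(i, y*) for all pure i,
#                (2) u_1(x*, y*) <= u_1(x*, j) for all pure j
print(v)                                              # 0.0
print(all(v >= u1[i] @ y_star for i in range(2)))     # True
print(all(v <= x_star @ u1[:, j] for j in range(2)))  # True
```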

SLIDE 15-16

Maximin and Minimax Strategy

Ø The previous observations motivate the following definitions

  • Definition. x* ∈ Δ_m is a maximin strategy of player 1 if it solves max_{x∈Δ_m} min_{j∈[n]} u_1(x, j). The corresponding utility value is called the maximin value of the game.

  • Definition. y* ∈ Δ_n is a minimax strategy of player 2 if it solves min_{y∈Δ_n} max_{i∈[m]} u_1(i, y). The corresponding utility value is called the minimax value of the game.

Remarks:
Ø x* is player 1's best strategy if he were to move first
Ø y* is player 2's best strategy if he were to move first

SLIDE 17

Duality of Maximin and Minimax

Fact. max_{x∈Δ_m} min_{j∈[n]} u_1(x, j) ≤ min_{y∈Δ_n} max_{i∈[m]} u_1(i, y).
That is, moving first is no better.

Ø Let y* = argmin_{y∈Δ_n} max_{i∈[m]} u_1(i, y), so
  min_{y∈Δ_n} max_{i∈[m]} u_1(i, y) = max_{i∈[m]} u_1(i, y*)
Ø We have
  max_{x∈Δ_m} min_{j∈[n]} u_1(x, j) ≤ max_{x∈Δ_m} u_1(x, y*) = max_{i∈[m]} u_1(i, y*),
  where the last equality holds because a linear function over Δ_m is maximized at a vertex, i.e., at a pure action

SLIDE 18

Duality of Maximin and Minimax

Theorem. max_{x∈Δ_m} min_{j∈[n]} u_1(x, j) = min_{y∈Δ_n} max_{i∈[m]} u_1(i, y).

Maximin LP:
  max v
  s.t. v ≤ ∑_{i=1}^m u_1(i, j) x_i, ∀ j ∈ [n]
       ∑_{i=1}^m x_i = 1
       x_i ≥ 0, ∀ i ∈ [m]

Minimax LP:
  min w
  s.t. w ≥ ∑_{j=1}^n u_1(i, j) y_j, ∀ i ∈ [m]
       ∑_{j=1}^n y_j = 1
       y_j ≥ 0, ∀ j ∈ [n]

Ø Maximin and minimax can both be formulated as linear programs
Ø These turn out to be a primal-dual LP pair; strong LP duality yields the equation above
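The two LPs on this slide can be handed directly to an off-the-shelf solver. A minimal sketch using `scipy.optimize.linprog` (helper names are mine), tested on rock-paper-scissors where both values are 0:

```python
import numpy as np
from scipy.optimize import linprog

def maximin(U):
    """Maximin LP: max v  s.t.  v <= sum_i U[i,j] x_i for all j, x in simplex."""
    m, n = U.shape
    c = np.concatenate([np.zeros(m), [-1.0]])   # linprog minimizes, so use -v
    A_ub = np.hstack([-U.T, np.ones((n, 1))])   # v - sum_i U[i,j] x_i <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=[np.concatenate([np.ones(m), [0.0]])], b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[m]

def minimax(U):
    """Minimax LP: min w  s.t.  w >= sum_j U[i,j] y_j for all i, y in simplex."""
    m, n = U.shape
    c = np.concatenate([np.zeros(n), [1.0]])
    A_ub = np.hstack([U, -np.ones((m, 1))])     # sum_j U[i,j] y_j - w <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m),
                  A_eq=[np.concatenate([np.ones(n), [0.0]])], b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n], res.x[n]

# Rock-paper-scissors: U[i, j] is player 1's payoff
U = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
x, v1 = maximin(U)
y, v2 = minimax(U)
print(x, y)    # both uniform (1/3, 1/3, 1/3)
print(v1, v2)  # equal game values, here 0, as strong duality promises
```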

SLIDE 19

"Uniqueness" of Nash Equilibrium (NE)

  • Theorem. In 2-player zero-sum games, (x*, y*) is a NE if and only if x* and y* are the maximin and minimax strategies, respectively.

⇐: if x* [y*] is the maximin [minimax] strategy, then (x*, y*) is a NE

Ø Want to prove u_1(x*, y*) ≥ u_1(i, y*), ∀ i ∈ [m]:
  u_1(x*, y*) ≥ min_{j∈[n]} u_1(x*, j)
             = max_{x∈Δ_m} min_{j∈[n]} u_1(x, j)   (x* is the maximin strategy)
             = min_{y∈Δ_n} max_{i∈[m]} u_1(i, y)   (LP duality)
             = max_{i∈[m]} u_1(i, y*)              (y* is the minimax strategy)
             ≥ u_1(i, y*), ∀ i ∈ [m]
Ø A similar argument shows u_1(x*, y*) ≤ u_1(x*, j), ∀ j ∈ [n]
Ø So (x*, y*) is a NE

SLIDE 20

"Uniqueness" of Nash Equilibrium (NE)

⇒: if (x*, y*) is a NE, then x* [y*] is the maximin [minimax] strategy

Ø Observe the following chain of inequalities:
  u_1(x*, y*) = max_{i∈[m]} u_1(i, y*)             (x* best responds to y*)
             ≥ min_{y∈Δ_n} max_{i∈[m]} u_1(i, y)
             = max_{x∈Δ_m} min_{j∈[n]} u_1(x, j)   (LP duality)
             ≥ min_{j∈[n]} u_1(x*, j)
             = u_1(x*, y*)                          (y* best responds to x*)
Ø So the two "≥" must both hold with equality
  • The first equality implies y* is a minimax strategy
  • The second equality implies x* is a maximin strategy

  • Theorem. In 2-player zero-sum games, (x*, y*) is a NE if and only if x* and y* are the maximin and minimax strategies, respectively.

SLIDE 21

"Uniqueness" of Nash Equilibrium (NE)

  • Theorem. In 2-player zero-sum games, (x*, y*) is a NE if and only if x* and y* are the maximin and minimax strategies, respectively.

Corollary.
Ø The NE of any 2-player zero-sum game can be computed by LPs
Ø Players achieve the same utility in any Nash equilibrium
  • Player 1's NE utility always equals the maximin (or minimax) value
  • This utility is also called the game value
SLIDE 22

The Collapse of Equilibrium Concepts in Zero-Sum Games

  • Theorem. In a 2-player zero-sum game, a player achieves the same utility in any Nash equilibrium, any correlated equilibrium, any coarse correlated equilibrium, and any Strong Stackelberg equilibrium.

Ø Can be proved using proof techniques similar to those for the previous theorem
Ø The problem of optimizing a player's utility over equilibria can also be solved easily, since the equilibrium utility is always the same

SLIDE 23

Outline

Ø Correlated and Coarse Correlated Equilibrium
Ø Zero-Sum Games
Ø GANs and Equilibrium Analysis

SLIDE 24

Generative Modeling

Ø Input: data points drawn from a distribution P_real
Ø Output: data points drawn from a distribution P_model
Ø Goal: use the data points from P_real to generate a P_model that is close to P_real

SLIDE 25

Applications

Ø Input: images from the true distribution; output: generated new images, i.e., samples from P_model
Ø Celeb training data [Karras et al. 2017]

A few other demos:
https://miro.medium.com/max/928/1*tUhgr3m54Qc80GU2BkaOiQ.gif
http://ganpaint.io/demo/?project=church
https://www.youtube.com/watch?v=PCBTZh41Ris&feature=youtu.be

SLIDE 26

GANs: Generative Adversarial Networks

Ø A GAN is one particular generative model – a zero-sum game between the Generator and the Discriminator
Ø Generator: a map G_u(z) = x applied to noise z ∼ N(0, 1); objective: select the model parameter u such that the distribution of G_u(z), denoted P_model, is close to P_real
Ø Discriminator: a map D_w(x); objective: select the model parameter w such that D_w(x) is large if x ∼ P_real and small if x ∼ P_model

SLIDE 27

GANs: Generative Adversarial Networks

Ø The loss function originally formulated in [Goodfellow et al.'14]:
  L(u, w) = 𝔼_{x∼P_real} log[D_w(x)] + 𝔼_{z∼N(0,1)} log[1 − D_w(G_u(z))]
  • D_w(x) = probability of classifying x as "Real"
  • L(u, w) is the log-likelihood of the Discriminator being correct
Ø The game: the Discriminator maximizes this loss function whereas the Generator minimizes it
  • This results in the zero-sum game min_u max_w L(u, w)
  • The role of the Discriminator is to improve the training of the Generator
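As a quick numerical illustration of this loss (the Gaussian data, shift generator, and logistic discriminator below are my own toy choices, not from the lecture), a Monte-Carlo estimate in Python:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Toy instantiation (assumed): real data x ~ N(2, 1), generator
# G_u(z) = z + u with z ~ N(0, 1), discriminator D_w(x) = sigmoid(w * (x - 1)).
def D(x, w):
    return 1.0 / (1.0 + np.exp(-w * (x - 1.0)))

def gan_loss(u, w):
    """Monte-Carlo estimate of
    L(u, w) = E_{x~P_real} log D_w(x) + E_{z~N(0,1)} log(1 - D_w(G_u(z)))."""
    x = rng.normal(2.0, 1.0, N)   # samples from P_real
    z = rng.normal(0.0, 1.0, N)   # generator noise
    return np.mean(np.log(D(x, w))) + np.mean(np.log(1.0 - D(z + u, w)))

# When the generator matches P_real exactly (u = 2), no discriminator can do
# better than chance: log d + log(1 - d) <= 2*log(1/2) pointwise, so the
# estimate below stays below 2*log(1/2) ≈ -1.386.
print(gan_loss(2.0, 1.0), 2 * np.log(0.5))
```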

SLIDE 28

GANs: Generative Adversarial Networks

Ø A GAN is a large zero-sum game with intricate player payoffs
Ø The Generator strategy G_u and the Discriminator strategy D_w are typically deep neural networks, with parameters u and w
Ø The objective has the following general form, where ϕ is an increasing concave function (e.g., ϕ(x) = log x or ϕ(x) = x):
  𝔼_{x∼P_real} ϕ(D_w(x)) + 𝔼_{z∼N(0,1)} ϕ(1 − D_w(G_u(z)))
Ø GAN research is mainly about modeling and solving this extremely large zero-sum game for various applications

SLIDE 29

WGAN – A Popular Variant of GAN

Ø Drawbacks of the log-likelihood loss: unbounded at the boundary, unstable training
Ø Wasserstein GAN is a popular variant using a different loss function:
  𝔼_{x∼P_real} D_w(x) − 𝔼_{z∼N(0,1)} D_w(G_u(z))
  • I.e., it substitutes the likelihood itself for the log-likelihood (ϕ(x) = x)
  • Training is typically more stable

SLIDE 30

Research Challenges in GANs

  min_u max_w 𝔼_{x∼P_real} ϕ(D_w(x)) + 𝔼_{z∼N(0,1)} ϕ(1 − D_w(G_u(z)))

Ø What is the correct choice of loss function ϕ?
Ø What neural network structures to use for G_u and D_w?
Ø Only pure strategies are allowed – an equilibrium may not exist or may not be unique, due to the non-convexity of the strategies and the loss function
Ø We do not know P_real exactly but only have samples from it
Ø How to optimize the parameters u, w?
Ø . . .

A Basic Question: even if we computed the equilibrium w.r.t. some loss function, does that really mean we generated a distribution close to P_real?

SLIDE 31

Research Challenges in GANs

A Basic Question: even if we computed the equilibrium w.r.t. some loss function, does that really mean we generated a distribution close to P_real?

Ø Intuitively, if the discriminator network D_w is strong enough, we should be able to get close to P_real
Ø Next, we will analyze the equilibrium of a stylized example

SLIDE 32

(Stylized) WGANs for Learning Mean

Ø True data drawn from P_real = N(θ, 1)
Ø Generator: G_u(z) = z + u, where z ∼ N(0, 1)
Ø Discriminator: D_w(x) = wx

Remarks:
a) Both the Generator and the Discriminator can be deep neural networks in general
b) We picked these particular forms for illustrative purposes and for convenience of theoretical analysis

SLIDE 33

(Stylized) WGANs for Learning Mean

Ø True data drawn from P_real = N(θ, 1)
Ø Generator: G_u(z) = z + u, where z ∼ N(0, 1)
Ø Discriminator: D_w(x) = wx
Ø The WGAN objective then takes the following closed form:

  min_u max_w 𝔼_{x∼P_real}[D_w(x)] + 𝔼_{z∼N(0,1)}[1 − D_w(G_u(z))]
  ⇒ min_u max_w 𝔼_{x∼N(θ,1)}[wx] + 𝔼_{z∼N(0,1)}[1 − w(z + u)]
  ⇒ min_u max_w wθ + [1 − wu]

Ø This minimax problem solves to u* = θ: for any u ≠ θ, the adversary can drive w(θ − u) arbitrarily high
Ø I.e., WGAN does precisely learn P_real at equilibrium in this case
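The algebra above is easy to sanity-check numerically. A small sketch (θ = 2.5 is an arbitrary choice of mine) comparing Monte-Carlo estimates of the stylized WGAN objective against the closed form wθ + 1 − wu:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.5        # true mean (assumed for this sketch)
N = 200_000

def wgan_objective(u, w):
    """Monte-Carlo estimate of E_{x~N(theta,1)}[w*x] + E_{z~N(0,1)}[1 - w*(z+u)]."""
    x = rng.normal(theta, 1.0, N)   # real samples from P_real = N(theta, 1)
    z = rng.normal(0.0, 1.0, N)     # generator noise
    return np.mean(w * x) + np.mean(1 - w * (z + u))

# The estimates match the closed form w*theta + 1 - w*u up to sampling noise
for u, w in [(1.0, 0.5), (2.5, -1.0), (4.0, 2.0)]:
    est = wgan_objective(u, w)
    exact = w * theta + 1 - w * u
    print(f"u={u}, w={w}: estimate {est:.3f} vs closed form {exact:.3f}")

# At u = theta the objective equals 1 regardless of w; for u != theta the
# adversary can make it arbitrarily large, so the generator's optimum is
# u* = theta, exactly as derived on the slide.
```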

SLIDE 34

Ø See the paper "Generalization and Equilibrium in GANs" by Arora et al. (2017) for more analysis of the equilibria of GANs and whether they learn a good distribution at equilibrium

SLIDE 35

Thank You

Haifeng Xu
University of Virginia
hx4ad@virginia.edu