SLIDE 1

LECTURE 28: GAME THEORY 3

INSTRUCTOR: GIANNI A. DI CARO

15-382 COLLECTIVE INTELLIGENCE – S18

SLIDE 2

15781 Fall 2016: Lecture 22

MIXED NASH EQUILIBRIUM

Payoff matrix (row player, column player):

        R       P       S
R      0,0    -1,1     1,-1
P      1,-1    0,0    -1,1
S     -1,1     1,-1    0,0

Mixed NE: ((1/3, 1/3, 1/3), (1/3, 1/3, 1/3))

Finding ME:
§ Let $\rho_R^j, \rho_P^j, \rho_S^j$ be the probabilities of the pure strategies in the mix of player $j = 1,2$, with $\rho_R^j + \rho_P^j + \rho_S^j = 1$
§ A mixed strategy equilibrium needs to make player $j$ indifferent among all three of his strategies (i.e., same expected utility)
§ → Find player $j$'s expected utilities as a function of the parameters of the mixed strategy, and set the parameters so that the previous requirement is satisfied

Payoff matrix of a modified game (row player, column player):

        R       P       S
R      0,0    -2,2     1,-1
P      2,-2    0,0    -1,1
S     -1,1     1,-1    0,0

§ In symmetric zero-sum games, the expected utility of each player at equilibrium is zero → This property can be used to rule out equilibrium candidates (just check whether one player has a positive utility!)
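The indifference requirement can be checked numerically. The following is a minimal sketch, assuming NumPy is available, assuming the two payoff matrices are read row-major with rows and columns ordered R, P, S, and assuming the signs that make each game zero-sum. The function name `indifference_mix` is hypothetical. It solves for the column player's mix that gives the row player the same expected utility for every pure strategy.

```python
import numpy as np

def indifference_mix(A):
    """Return (y, v): the column player's mix y that makes the row player
    indifferent among all pure strategies, and the row player's value v.

    A[i][k] is the row player's payoff (zero-sum game, so the column
    player receives -A[i][k]). Assumes a fully mixed equilibrium exists.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Unknowns: y_1..y_n and v. Equations: (A y)_i - v = 0 for every row i,
    # plus the normalization sum(y) = 1.
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = A
    M[:n, n] = -1.0
    M[n, :n] = 1.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    sol = np.linalg.solve(M, rhs)
    return sol[:n], sol[n]

# Assumed row-player payoffs, rows/columns ordered R, P, S.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
modified = [[0, -2, 1], [2, 0, -1], [-1, 1, 0]]

y_rps, v_rps = indifference_mix(rps)
y_mod, v_mod = indifference_mix(modified)
print(y_rps, v_rps)
print(y_mod, v_mod)
```

Under these assumptions the first game yields the uniform mix and the modified game yields (1/4, 1/4, 1/2); in both cases the value is zero, in line with the zero-utility property of symmetric zero-sum games.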

SLIDE 3

GAME OF CHICKEN

http://youtu.be/u7hZ9jKrwvo

§ Each player, in attempting to secure his best outcome, risks the worst
§ Every player wants to dare, but only if the other chickens out!
§ A mediator would help…

SLIDE 4

GAME OF CHICKEN

§ Social welfare is the sum of utilities
§ Optimal social welfare = 6
§ Pure NE: (C,D) and (D,C), social welfare = 5
§ Mixed NE: both (1/2, 1/2), social welfare = 4
§ Can we do better? Players are independent so far …

Payoffs (row player, column player):

            Dare    Chicken
Dare        0,0     4,1
Chicken     1,4     3,3
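The mixed NE and its social welfare follow from the indifference condition. The sketch below is illustrative only: the function name `symmetric_2x2_mixed_ne` is hypothetical, and it assumes a symmetric 2x2 game in which each player mixes between Dare and Chicken.

```python
from fractions import Fraction

def symmetric_2x2_mixed_ne(u):
    """u[a][b] is a player's payoff for playing a against b (0 = Dare, 1 = Chicken).

    Returns (p, value): the probability p with which each player dares in
    the symmetric mixed NE, and each player's expected utility there.
    """
    dd, dc = u[0]
    cd, cc = u[1]
    # The opponent dares with probability p. Indifference between D and C:
    #   dd*p + dc*(1-p) = cd*p + cc*(1-p)
    p = Fraction(dc - cc, (dc - cc) + (cd - dd))
    value = dd * p + dc * (1 - p)
    return p, value

chicken = [[0, 4], [1, 3]]   # row player's payoffs in the Chicken game of this slide
p, value = symmetric_2x2_mixed_ne(chicken)
print(p, value, 2 * value)   # dare probability, utility per player, social welfare
```

For the payoffs of this slide the sketch gives a dare probability of 1/2, an expected utility of 2 per player, and a social welfare of 4.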

SLIDE 5

CORRELATED EQUILIBRIUM

§ A “trusted” authority / mediator chooses a pair of strategies $(t_1, t_2)$ according to a probability distribution $q$ over $T^2$ (it can be generalized to $o$ players)

§ The mediator “flips a coin”, i.e., draws according to the distribution $q(t_1, t_2)$, and, based on the outcome, tells each player which pure strategy to use

Robert Aumann (Nobel prize, 2005)

SLIDE 6

CORRELATED EQUILIBRIUM

§ → The trusted party only tells each player what to do, but it does not reveal what the other party is supposed to do!
§ The distribution $q$ is known to the players: each player knows the probability of observing a strategy profile and assumes the other player will follow the mediator's instructions
§ → The posterior conditional probability of the other player's instruction given one's own is known, e.g., $\Pr[t_2 \mid t_1]$ for Player 1
§ It is a Correlated Equilibrium (CE) if no player wants to deviate from the trusted party's instructions, such that choices are correlated
§ → Find a distribution $q$ that guarantees a CE

SLIDE 7

CORRELATED EQUILIBRIUM

Payoffs (row player = Player 1, column player = Player 2):

            Dare    Chicken
Dare        0,0     7,2
Chicken     2,7     6,6

§ Common knowledge: Distribution $q$ (is CE)

(D,D): 0
(D,C): 1/3
(C,D): 1/3
(C,C): 1/3

§ If Player 2 is told to play D, then Player 2 knows that the outcome must be (C,D) and that Player 1 will obey the instructions → Player 1 plays C
✓ Based on this, Player 2 has no incentive to change from playing D, as given (deviating to C would yield 6 instead of 7)

SLIDE 8

CORRELATED EQUILIBRIUM

§ Distribution $q$ (is CE)

(D,D): 0
(D,C): 1/3
(C,D): 1/3
(C,C): 1/3

            Dare    Chicken
Dare        0,0     7,2
Chicken     2,7     6,6

§ If Player 2 is told to play C, then Player 2 knows that the outcome must be (D,C) or (C,C) with equal probability.
§ Player 2's expected utility on playing C, conditioned on the fact that he is told to play C (and Player 1 will obey instructions), is:

$$\tfrac{1}{2}\, v_2(D,C) + \tfrac{1}{2}\, v_2(C,C) = \tfrac{1}{2} \cdot 2 + \tfrac{1}{2} \cdot 6 = 4$$

§ If Player 2 deviates from instructions and plays D: $v_2 = \tfrac{1}{2}\, v_2(D,D) + \tfrac{1}{2}\, v_2(C,D) = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 7 = 3.5 < 4$
✓ It's better to follow the instructions!

SLIDE 9

CORRELATED EQUILIBRIUM

§ Distribution $q$ (is CE)

(D,D): 0
(D,C): 1/3
(C,D): 1/3
(C,C): 1/3

            Dare    Chicken
Dare        0,0     7,2
Chicken     2,7     6,6

§ Player 2 does not have an incentive to deviate
§ Since the game is symmetric, Player 1 does not have an incentive to deviate either
§ → Correlated equilibrium!
§ Expected reward per player: (1/3)·7 + (1/3)·2 + (1/3)·6 = 5
§ Mixed strategy NE: expected reward of 4 2/3 (= 14/3) per player, which is < 5
§ Social welfare: 30/3 = 10
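The argument above can be replayed mechanically. The following is a minimal sketch (the function name `is_correlated_equilibrium` and the dictionary layout are assumptions made for illustration) that checks, for every player and every instruction, that obeying is at least as good as deviating, and then computes the expected rewards.

```python
from fractions import Fraction

D, C = "D", "C"
third = Fraction(1, 3)
# q[(t1, t2)]: probability that the mediator draws the pair (t1, t2).
q = {(D, D): Fraction(0), (D, C): third, (C, D): third, (C, C): third}
# v[(t1, t2)] = (payoff of Player 1, payoff of Player 2).
v = {(D, D): (0, 0), (D, C): (7, 2), (C, D): (2, 7), (C, C): (6, 6)}

def is_correlated_equilibrium(q, v, strategies=(D, C)):
    """Check that no player gains by deviating from any instruction."""
    for player in (0, 1):
        for told in strategies:
            for alt in strategies:
                obey = deviate = Fraction(0)
                for other in strategies:
                    # Profile drawn by the mediator, and the profile that
                    # results if `player` switches from `told` to `alt`.
                    rec = (told, other) if player == 0 else (other, told)
                    dev = (alt, other) if player == 0 else (other, alt)
                    obey += q[rec] * v[rec][player]
                    deviate += q[rec] * v[dev][player]
                if deviate > obey:
                    return False
    return True

expected = [sum(q[s] * v[s][i] for s in q) for i in (0, 1)]
print(is_correlated_equilibrium(q, v), expected, sum(expected))
```

The comparison uses the unnormalized sums weighted by $q$; dividing both sides by the probability of the instruction would give the conditional expectations used in the slides without changing the outcome of the test.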

SLIDE 10

CORRELATED EQUILIBRIUM

§ Let $O = \{1,2\}$ for simplicity
§ A mediator chooses a pair of strategies $(t_1, t_2)$ according to a distribution $q$ over $T^2$
§ Reveals $t_1$ to player 1 and $t_2$ to player 2
§ When player 1 gets $t_1 \in T$, he knows that the distribution over strategies of 2 is

$$\Pr[t_2 \mid t_1] = \frac{\Pr[t_1 \wedge t_2]}{\Pr[t_1]} = \frac{q(t_1, t_2)}{\sum_{t_2' \in T} q(t_1, t_2')}$$
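As a worked instance of this posterior, here is a sketch that reuses the Chicken distribution from the earlier slides, $q(D,D) = 0$ and $q(D,C) = q(C,D) = q(C,C) = 1/3$, with Player 1 as the row player:

```latex
\begin{align*}
\Pr[t_2 = D \mid t_1 = C] &= \frac{q(C,D)}{q(C,D) + q(C,C)} = \frac{1/3}{2/3} = \frac{1}{2}, \\
\Pr[t_2 = C \mid t_1 = C] &= \frac{q(C,C)}{q(C,D) + q(C,C)} = \frac{1}{2}, \\
\Pr[t_2 = C \mid t_1 = D] &= \frac{q(D,C)}{q(D,D) + q(D,C)} = \frac{1/3}{1/3} = 1.
\end{align*}
```

So an instruction to play C leaves Player 1 uncertain about the other player's instruction, while an instruction to play D reveals it completely.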

SLIDE 11

COMPUTING CE STRATEGY

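§ Player 1's strategy $t_1$ is a best response if its expected utility cannot be unilaterally improved based on what he knows:

$$\sum_{t_2 \in T} \Pr[t_2 \mid t_1]\, v_1(t_1, t_2) \;\ge\; \sum_{t_2 \in T} \Pr[t_2 \mid t_1]\, v_1(t_1', t_2), \quad \forall t_1' \in T$$

§ Equivalently, replacing the conditional probability using Bayes' rule (the common denominator $\Pr[t_1]$ cancels):

$$\sum_{t_2 \in T} q(t_1, t_2)\, v_1(t_1, t_2) \;\ge\; \sum_{t_2 \in T} q(t_1, t_2)\, v_1(t_1', t_2), \quad \forall t_1' \in T$$

§ $q$ is a correlated equilibrium (CE) if both players are best responding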

SLIDE 12

CE AS LP

§ Can compute CE via linear programming in polynomial time!

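find $q(t_1, t_2)$

s.t.

$$\forall t_1, t_1' \in T: \quad \sum_{t_2 \in T} q(t_1, t_2)\, v_1(t_1, t_2) \;\ge\; \sum_{t_2 \in T} q(t_1, t_2)\, v_1(t_1', t_2)$$

$$\forall t_2, t_2' \in T: \quad \sum_{t_1 \in T} q(t_1, t_2)\, v_2(t_1, t_2) \;\ge\; \sum_{t_1 \in T} q(t_1, t_2)\, v_2(t_1, t_2')$$

$$\sum_{t_1, t_2 \in T} q(t_1, t_2) = 1$$

$$\forall t_1, t_2 \in T: \quad q(t_1, t_2) \in [0,1]$$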
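To make the constraints concrete, here is a sketch that instantiates them for the Chicken game with payoffs (D,D) = 0,0, (D,C) = 7,2, (C,D) = 2,7, (C,C) = 6,6 used in the earlier slides. Each line is one best-response constraint, simplified:

```latex
\begin{align*}
\text{Player 1 told } D: &\quad 7\,q(D,C) \ge 2\,q(D,D) + 6\,q(D,C) &&\Longleftrightarrow\quad q(D,C) \ge 2\,q(D,D) \\
\text{Player 1 told } C: &\quad 2\,q(C,D) + 6\,q(C,C) \ge 7\,q(C,C) &&\Longleftrightarrow\quad 2\,q(C,D) \ge q(C,C) \\
\text{Player 2 told } D: &\quad 7\,q(C,D) \ge 2\,q(D,D) + 6\,q(C,D) &&\Longleftrightarrow\quad q(C,D) \ge 2\,q(D,D) \\
\text{Player 2 told } C: &\quad 2\,q(D,C) + 6\,q(C,C) \ge 7\,q(C,C) &&\Longleftrightarrow\quad 2\,q(D,C) \ge q(C,C)
\end{align*}
```

The distribution $q(D,D) = 0$, $q(D,C) = q(C,D) = q(C,C) = 1/3$ satisfies all four inequalities, together with normalization and the bounds, so it is a feasible point of this LP.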

SLIDE 13

BEST WELFARE CE

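§ By adding a (linear) objective function $f$, the best correlated equilibrium (e.g., max welfare) can be found

$$\max\; f\big(q(t_1, t_2);\, v_1, v_2\big)$$

s.t.

$$\forall t_1, t_1' \in T: \quad \sum_{t_2 \in T} q(t_1, t_2)\, v_1(t_1, t_2) \;\ge\; \sum_{t_2 \in T} q(t_1, t_2)\, v_1(t_1', t_2)$$

$$\forall t_2, t_2' \in T: \quad \sum_{t_1 \in T} q(t_1, t_2)\, v_2(t_1, t_2) \;\ge\; \sum_{t_1 \in T} q(t_1, t_2)\, v_2(t_1, t_2')$$

$$\sum_{t_1, t_2 \in T} q(t_1, t_2) = 1$$

$$\forall t_1, t_2 \in T: \quad q(t_1, t_2) \in [0,1]$$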
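A maximum-welfare CE can be computed with an off-the-shelf LP solver. The following is a minimal sketch, assuming SciPy and NumPy are installed and using the Chicken payoffs (7,2 / 2,7 / 6,6) from the earlier slides; the helper `swap` and the variable ordering are choices made for this illustration, not part of the slides.

```python
import numpy as np
from scipy.optimize import linprog

strategies = ["D", "C"]
profiles = [(a, b) for a in strategies for b in strategies]  # order of the LP variables
v = {("D", "D"): (0, 0), ("D", "C"): (7, 2), ("C", "D"): (2, 7), ("C", "C"): (6, 6)}

def swap(profile, player, alt):
    """Profile obtained when `player` plays `alt` instead of the instruction."""
    return (alt, profile[1]) if player == 0 else (profile[0], alt)

# Best-response constraints rewritten as A_ub @ q <= 0: for each player,
# instruction `told` and deviation `alt`,
#   sum over profiles with t[player] == told of q(t) * (v_i(deviation) - v_i(t)) <= 0
A_ub = []
for player in (0, 1):
    for told in strategies:
        for alt in strategies:
            if alt == told:
                continue
            row = [
                v[swap(t, player, alt)][player] - v[t][player] if t[player] == told else 0.0
                for t in profiles
            ]
            A_ub.append(row)

welfare = np.array([sum(v[t]) for t in profiles], dtype=float)
res = linprog(
    c=-welfare,                        # linprog minimizes, so negate to maximize welfare
    A_ub=np.array(A_ub, dtype=float),
    b_ub=np.zeros(len(A_ub)),
    A_eq=np.ones((1, len(profiles))),  # probabilities sum to 1
    b_eq=[1.0],
    bounds=[(0.0, 1.0)] * len(profiles),
)
best_q = dict(zip(profiles, res.x))
print(best_q, -res.fun)
```

For this game the LP places probability 1/2 on (C,C) and 1/4 on each of (D,C) and (C,D), for a social welfare of 10.5, slightly above the welfare of 10 obtained with the uniform distribution over the three non-crash outcomes. Replacing `c` with a zero vector turns the program into the pure feasibility problem.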

SLIDE 14

IMPLEMENTATION OF CE

§ Instead of a mediator, use a hat!
§ Balls in the hat are labeled with “chicken” or “dare”; each blindfolded player takes a ball

Which balls implement the distribution $q$ from before?

1. 1 chicken, 1 dare
2. 2 chicken, 1 dare
3. 2 chicken, 2 dare
4. 3 chicken, 2 dare

E.g., an automatic trusted authority can be implemented using cryptographic algorithms
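The quiz can be settled by enumeration. The sketch below assumes that the two players draw one ball each, uniformly at random and without replacement, and that the target is the distribution of the earlier Chicken example (1/3 on each of (D,C), (C,D), (C,C) and 0 on (D,D)); the function name `hat_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def hat_distribution(balls):
    """Distribution over (ball of Player 1, ball of Player 2) when the two
    players draw one ball each, without replacement, from the hat."""
    draws = list(permutations(range(len(balls)), 2))  # ordered pairs of distinct balls
    counts = Counter((balls[i], balls[k]) for i, k in draws)
    return {pair: Fraction(n, len(draws)) for pair, n in counts.items()}

options = {
    1: ["C", "D"],
    2: ["C", "C", "D"],
    3: ["C", "C", "D", "D"],
    4: ["C", "C", "C", "D", "D"],
}
target = {("D", "C"): Fraction(1, 3), ("C", "D"): Fraction(1, 3), ("C", "C"): Fraction(1, 3)}
matching = [k for k, balls in options.items() if hat_distribution(balls) == target]
print(matching)
```

Only a hat with a single "dare" ball can rule out (D,D), and among the listed options only one also spreads the probability evenly over the remaining three outcomes.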

SLIDE 15

CE VS. NE

What is the relation between CE and NE?

1. CE ⇒ NE
2. NE ⇒ CE
3. NE ⇔ CE
4. NE ∥ CE

§ For any pure strategy NE, there is a corresponding correlated equilibrium yielding the same outcome.
§ For any mixed strategy NE, there is a corresponding correlated equilibrium yielding the same distribution of outcomes.
§ From Nash's theorem, “all” (finite) games have a mixed-strategy NE. Since a NE implies a CE, a CE always exists
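One way to see why a NE implies a CE, sketched in the notation of the earlier slides (here $\rho^j_{t}$ denotes the probability that player $j$ assigns to pure strategy $t$ in the mixed NE): let the mediator draw the two instructions independently.

```latex
\begin{align*}
q(t_1, t_2) &= \rho^1_{t_1}\, \rho^2_{t_2}
  \quad\Longrightarrow\quad
  \Pr[t_2 \mid t_1] = \frac{\rho^1_{t_1}\, \rho^2_{t_2}}{\sum_{t_2' \in T} \rho^1_{t_1}\, \rho^2_{t_2'}} = \rho^2_{t_2}, \\
\sum_{t_2 \in T} \rho^2_{t_2}\, v_1(t_1, t_2) &\ge \sum_{t_2 \in T} \rho^2_{t_2}\, v_1(t_1', t_2)
  \qquad \forall t_1 \text{ with } \rho^1_{t_1} > 0,\ \forall t_1' \in T.
\end{align*}
```

The instruction carries no information about the other player, and the CE best-response condition reduces to the requirement that every pure strategy played with positive probability is a best response to the other player's mix, which holds in any NE. A pure NE is the special case in which all probability sits on one profile.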

SLIDE 16

A DIFFERENT TYPE OF GAMES: STACKELBERG GAMES

§ Playing Up is a dominant strategy for the row player
§ Row player plays Up
§ Column player would then play Left
§ Therefore, (1,1) is the only Nash equilibrium outcome

Payoffs (row player, column player):

        L       R
U      1,1     3,0
D      0,0     2,1

SLIDE 17

COMMITMENT IS GOOD

§ Suppose the game is played sequentially as follows:

Row player commits to playing a row
Column player observes the commitment and chooses a column

§ Row player can commit to playing Down: Column player will then play Right, and the Row player now gets a better reward (2 instead of 1)!

        L       R
U      1,1     3,0
D      0,0     2,1

SLIDE 18

COMMITMENT TO MIXED STRATEGY

        L       R
U      1,1     3,0
D      0,0     2,1

Row player's commitment: $\rho_U = 0.49$, $\rho_D = 0.51$; column player's reply: $\rho_R = 1$, $\rho_L = 0$

§ By committing to a mixed strategy, the row player can do even better and guarantee a reward of almost 2.5: 0.49×3 + 0.51×2 = 2.49
§ Stackelberg strategy (1934)
§ Rooted in duopoly scenarios
§ Player 1 (Leader) moves at the start of the game. Then use backward induction to find the subgame perfect equilibrium.
§ First, for any output of the leader, find the strategy of the follower that maximizes its payoff (its expected best reply).
§ Next, find the strategy of the leader that maximizes the leader's utility, given the strategy of the follower
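The effect of the commitment can be traced by computing the follower's best reply for a few commitments. This is a minimal sketch for the 2x2 example of these slides; the function name `leader_value` is hypothetical, and ties between the follower's replies are broken in the leader's favor, which is an assumption of this sketch.

```python
# Payoffs (row player, column player); rows U, D and columns L, R.
payoff = {("U", "L"): (1, 1), ("U", "R"): (3, 0), ("D", "L"): (0, 0), ("D", "R"): (2, 1)}

def leader_value(p_up):
    """Leader's expected payoff after committing to play U with probability p_up.

    The follower observes the commitment and best responds; ties are broken
    in the leader's favor (an assumption of this sketch).
    """
    mix = {"U": p_up, "D": 1.0 - p_up}

    def expected(col, player):
        return sum(mix[row] * payoff[(row, col)][player] for row in mix)

    best = max(expected(col, 1) for col in ("L", "R"))
    replies = [col for col in ("L", "R") if abs(expected(col, 1) - best) < 1e-12]
    return max(expected(col, 0) for col in replies)

for p in (1.0, 0.0, 0.49, 0.51):
    print(p, leader_value(p))
```

Committing to U alone gives the leader 1, committing to D alone gives 2, and the mix with $\rho_U = 0.49$ gives 2.49. Pushing $\rho_U$ just above 1/2 makes the follower switch to L and the leader's payoff collapses, which is why the guarantee is "almost" 2.5.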

SLIDE 19

COMPUTING STACKELBERG

§ Theorem [Conitzer and Sandholm, 2006]: In 2-player normal form games, an optimal Stackelberg strategy can be found in polynomial time
§ Theorem: The problem is NP-hard when the number of players is ≥ 3

SLIDE 20

TRACTABILITY: 2 PLAYERS

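§ For each pure strategy $t_2$ of the follower, we compute via the LP below a mixed strategy $y_1$ for the leader such that:

Playing $t_2$ is a best response for the follower
Under this constraint, $y_1$ is optimal (for the leader)

§ Among the solutions obtained for the different $t_2$, choose the $y_1^*$ that maximizes the leader's utility

$$\max\; \sum_{t_1 \in T} y_1(t_1)\, v_1(t_1, t_2)$$

s.t.

$$\forall t_2' \in T: \quad \sum_{t_1 \in T} y_1(t_1)\, v_2(t_1, t_2) \;\ge\; \sum_{t_1 \in T} y_1(t_1)\, v_2(t_1, t_2')$$

$$\sum_{t_1 \in T} y_1(t_1) = 1$$

$$\forall t_1 \in T: \quad y_1(t_1) \in [0,1]$$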
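The procedure of one LP per follower strategy can be sketched as follows, assuming SciPy and NumPy are installed and using the 2x2 example of the earlier slides (rows U, D for the leader; columns L, R for the follower). The function name `optimal_commitment` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# v1[i][k], v2[i][k]: leader / follower payoffs for leader row i (U, D)
# and follower column k (L, R).
v1 = np.array([[1.0, 3.0], [0.0, 2.0]])
v2 = np.array([[1.0, 0.0], [0.0, 1.0]])

def optimal_commitment(v1, v2):
    """Solve one LP per follower pure strategy; return the best (value, y1, t2)."""
    n_rows, n_cols = v1.shape
    best = None
    for t2 in range(n_cols):
        # Follower best-response constraints, one per alternative column t2':
        #   sum over t1 of y1(t1) * (v2[t1, t2'] - v2[t1, t2]) <= 0
        A_ub = np.array([v2[:, alt] - v2[:, t2] for alt in range(n_cols) if alt != t2])
        res = linprog(
            c=-v1[:, t2],                  # linprog minimizes: negate the leader's utility
            A_ub=A_ub,
            b_ub=np.zeros(len(A_ub)),
            A_eq=np.ones((1, n_rows)),     # y1 is a probability distribution
            b_eq=[1.0],
            bounds=[(0.0, 1.0)] * n_rows,
        )
        # An LP may be infeasible if t2 can never be a best response.
        if res.success and (best is None or -res.fun > best[0]):
            best = (-res.fun, res.x, t2)
    return best

value, y1, t2 = optimal_commitment(v1, v2)
print(value, y1, t2)
```

Because the constraints are weak inequalities, the LP for column R returns the boundary commitment (1/2, 1/2) with value 2.5, where the follower is exactly indifferent; the 0.49/0.51 commitment of the earlier slide is the same idea with a strict incentive for the follower.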

SLIDE 21

APPLICATION: SECURITY

§ Airport security: deployed at LAX
§ Federal Air Marshals
§ Coast Guard
§ Idea:

Defender commits to mixed strategy
Attacker observes and best responds

§ Attacker monitors the defender and tries to maximize damage, while the defender deploys resources to minimize damage based on knowledge of what the attacker would like to obtain

SLIDE 22

SECURITY GAMES

§ Set of targets $U = \{1, \ldots, o\}$

E.g., entrance gates of an airport

§ Set $\Omega$ of $s$ security resources available to the defender (leader)

E.g., video cameras looking in specific directions
E.g., air marshals flying on specific flights

§ Set $\Sigma$ of feasible partitions of the target set: $\Sigma \subseteq 2^U$
§ Resource $\omega \in \Omega$ can be assigned to one of the partitions in $B(\omega) \subseteq \Sigma$ → Schedule
§ Defender (leader): how to deploy the resources over the targets → How to assign resources to targets
§ Attacker (follower) chooses targets to attack

[Figure: resources assigned to targets]

SLIDE 23

SECURITY GAMES

§ For each target $u$, there are four numbers defining the payoffs of defender and attacker in case of successful and unsuccessful attack:

$v_e^+(u)$ = Defender's payoff when target $u$ is attacked and the target was covered by at least one resource
$v_e^-(u)$ = Defender's payoff when target $u$ is attacked and the target was not covered
$v_b^-(u)$ = Attacker's payoff when target $u$ is attacked and the target was not covered
$v_b^+(u)$ = Attacker's payoff when target $u$ is attacked and the target was covered by at least one resource

§ $v_e^+(u) \ge v_e^-(u)$, and $v_b^+(u) \le v_b^-(u)$

§ For each target $u$ there is a coverage probability $d_u$ by the defender using the resources in $\Omega$, so that the set of targets defines $\mathbf{d} = (d_1, \ldots, d_o)$, a vector of coverage probabilities

SLIDE 24

SECURITY GAMES

§ The expected utilities to the defender/attacker under coverage $\mathbf{d}$ if target $u$ is attacked are:

$$v_e(u, \mathbf{d}) = v_e^+(u) \cdot d_u + v_e^-(u)\,(1 - d_u)$$

$$v_b(u, \mathbf{d}) = v_b^+(u) \cdot d_u + v_b^-(u)\,(1 - d_u)$$

Payoffs example with two targets:
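Since the example table itself is not available here, the following sketch uses hypothetical payoffs for two targets (they are not taken from the slides) to show how the two expected-utility formulas are evaluated. The function name `expected_utilities` and the data layout are likewise assumptions.

```python
def expected_utilities(target, coverage, defender, attacker):
    """Expected (defender, attacker) utilities when `target` is attacked.

    coverage[u] is the probability d_u that target u is covered;
    defender[u] and attacker[u] are (covered, uncovered) payoff pairs.
    """
    d_u = coverage[target]
    def_cov, def_unc = defender[target]
    att_cov, att_unc = attacker[target]
    return (def_cov * d_u + def_unc * (1 - d_u),
            att_cov * d_u + att_unc * (1 - d_u))

# Hypothetical payoffs for two targets (not taken from the slides).
# The defender prefers covered targets, the attacker prefers uncovered ones.
defender = {1: (1, -1), 2: (2, -3)}
attacker = {1: (-1, 1), 2: (-2, 3)}
coverage = {1: 0.5, 2: 0.5}

for u in (1, 2):
    print(u, expected_utilities(u, coverage, defender, attacker))
```

With equal coverage the attacker of this hypothetical instance prefers target 2, which suggests why the defender would shift coverage toward it.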

SLIDE 25

BAYESIAN SECURITY GAMES

§ There are multiple types of potential attackers, each type with different payoffs
§ The defender knows their payoffs, as well as the distribution of attacker types
§ Bayesian extensions are used to model uncertainty over the payoffs and preferences of the players; more uncertainty can be expressed by increasing the number of types.

Payoffs example with two targets and two types of attackers:

SLIDE 26

SOLVING SECURITY GAMES

§ It is a 2-player Stackelberg game, so we can compute an optimal strategy for the defender in polynomial time…?
§ Consider the case of $\Sigma = U$, i.e., resources are assigned to individual targets (schedules have size 1)
§ Nevertheless, the number of leader strategies is exponential: $2^s$
§ → The representation of the linear program is exponential
§ Theorem [Korzhyk et al. 2010]: The optimal leader strategy can be computed in polynomial time
§ A number of smart algorithms have been developed …

§ Jain M., An B., Tambe M. (2013) Security Games Applied to Real-World: Research Contributions and Challenges. In: Jajodia S., Ghosh A., Subrahmanian V., Swarup V., Wang C., Wang X. (eds) Moving Target Defense II. Advances in Information Security, vol 100. Springer, New York, NY
