15-382 COLLECTIVE INTELLIGENCE S18, LECTURE 28: GAME THEORY 3
INSTRUCTOR: GIANNI A. DI CARO
15781 Fall 2016: Lecture 22
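MIXED NASH EQUILIBRIUM
Rock-Paper-Scissors payoffs (row player, column player):
         R        P        S
R       0,0     -1,1      1,-1
P       1,-1     0,0     -1,1
S      -1,1      1,-1     0,0
Mixed NE: $\left((\tfrac13, \tfrac13, \tfrac13), (\tfrac13, \tfrac13, \tfrac13)\right)$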
Finding ME:
- Let $p_R^i, p_P^i, p_S^i$ be the probabilities of the pure strategies in the mix of player $i = 1,2$, with $p_R^i + p_P^i + p_S^i = 1$
- In a mixed strategy equilibrium, the opponent's mix must make player $i$ indifferent among all three of his strategies (i.e., they all have the same expected utility)
- ⇒ Find player $i$'s expected utilities as a function of the parameters of the mixed strategy, and set the parameters so that the previous requirement is satisfied
Modified payoffs (row player, column player):
         R        P        S
R       0,0     -2,2      1,-1
P       2,-2     0,0     -1,1
S      -1,1      1,-1     0,0
- In symmetric zero-sum games the expected utility of the players at equilibrium is zero ⇒ This property can be used to rule out equilibrium candidates (just check if one player has a positive utility!)

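The indifference requirement is a small linear system, so it can be solved exactly. The following is a minimal sketch, assuming a fully mixed equilibrium exists (every pure strategy is played with positive probability, which should be checked on the result). The helper names `solve_linear` and `indifference_mix` are illustrative and not part of the lecture.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve the square system a x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick any row at or below the diagonal with a nonzero entry as pivot.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def indifference_mix(row_payoff):
    """Opponent mix q and value v such that every row earns exactly v against q."""
    k = len(row_payoff)
    # Unknowns are (q_1, ..., q_k, v): k indifference equations plus sum(q) = 1.
    a = [list(row) + [-1] for row in row_payoff]
    a.append([1] * k + [0])
    b = [0] * k + [1]
    solution = solve_linear(a, b)
    return solution[:k], solution[k]


# Row player's payoffs; rows and columns are ordered R, P, S.
STANDARD = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
MODIFIED = [[0, -2, 1], [2, 0, -1], [-1, 1, 0]]

standard_mix, standard_value = indifference_mix(STANDARD)
modified_mix, modified_value = indifference_mix(MODIFIED)
print(standard_mix, standard_value)
print(modified_mix, modified_value)
```

For the standard game this gives the uniform mix with value 0. For the modified game it gives (1/4, 1/4, 1/2) over (R, P, S), again with value 0, in line with the zero-utility property of symmetric zero-sum games.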
GAME OF CHICKEN
http://youtu.be/u7hZ9jKrwvo
- Each player, in attempting to secure his best outcome, risks the worst
- Every player wants to dare, but only if the other chickens out!
- A mediator would help…

GAME OF CHICKEN
Payoffs (row player, column player):
             Dare     Chicken
Dare         0,0      4,1
Chicken      1,4      3,3
- Social welfare is the sum of utilities
- Optimal social welfare = 6
- Pure NE: (C,D) and (D,C), social welfare = 5
- Mixed NE: both players play $(\tfrac12, \tfrac12)$, social welfare = 4
- Can we do better? Players are independent so far …

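The welfare figures for this game can be reproduced by enumeration. The sketch below assumes the payoff table of this slide ((D,D) = 0,0; (D,C) = 4,1; (C,D) = 1,4; (C,C) = 3,3) and a symmetric 2x2 game; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("D", "C")  # Dare, Chicken
# (row action, column action) -> (u1, u2)
CHICKEN = {
    ("D", "D"): (0, 0), ("D", "C"): (4, 1),
    ("C", "D"): (1, 4), ("C", "C"): (3, 3),
}


def pure_nash(game):
    """All pure profiles where neither player gains by deviating alone."""
    equilibria = []
    for s1, s2 in product(ACTIONS, repeat=2):
        u1, u2 = game[(s1, s2)]
        best1 = all(u1 >= game[(d, s2)][0] for d in ACTIONS)
        best2 = all(u2 >= game[(s1, d)][1] for d in ACTIONS)
        if best1 and best2:
            equilibria.append((s1, s2))
    return equilibria


def mixed_dare_probability(game):
    """Probability q of Dare that makes the opponent indifferent (symmetric 2x2)."""
    dd, dc = game[("D", "D")][0], game[("D", "C")][0]
    cd, cc = game[("C", "D")][0], game[("C", "C")][0]
    # Solve q*dd + (1-q)*dc == q*cd + (1-q)*cc for q.
    return Fraction(dc - cc, (dc - cc) + (cd - dd))


def welfare_independent(game, q):
    """Expected sum of utilities when both players Dare independently w.p. q."""
    mix = {"D": q, "C": 1 - q}
    return sum(mix[a] * mix[b] * sum(game[(a, b)])
               for a, b in product(ACTIONS, repeat=2))


q = mixed_dare_probability(CHICKEN)
print(pure_nash(CHICKEN), q, welfare_independent(CHICKEN, q))
```

The pure equilibria are (D,C) and (C,D), the mixed equilibrium has q = 1/2, and its welfare is 4.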
CORRELATED EQUILIBRIUM
- A "trusted" authority / mediator chooses a pair of strategies $(s_1, s_2)$ according to a probability distribution $p$ over $S^2$ (it can be generalized to $n$ players)
- The mediator "flips a coin" / draws according to the distribution $p(s_1, s_2)$ and, based on the outcome, tells each player which pure strategy to use
Robert Aumann, Nobel prize, 2005

CORRELATED EQUILIBRIUM
- ⇒ The trusted party only tells each player what to do, but it does not reveal what the other party is supposed to do!
- The distribution $p$ is known to the players: each player knows the probability of observing a strategy profile and assumes the other player will follow the mediator's instructions
- ⇒ The posterior conditional probability is known: $\Pr[s_i \mid s_j]$
- It is a Correlated Equilibrium (CE) if no player wants to deviate from the trusted party's instructions, such that choices are correlated
- ⇒ Find a distribution $p$ that guarantees a CE
Payoffs (row player, column player):
             Dare     Chicken
Dare         0,0      7,2
Chicken      2,7      6,6

CORRELATED EQUILIBRIUM
- Common knowledge: distribution $p$ (is CE)
  - (D,D): 0
  - (D,C): 1/3
  - (C,D): 1/3
  - (C,C): 1/3
- If Player 2 is told to play D, then Player 2 knows that the outcome must be (C,D) and that Player 1 will obey the instructions ⇒ Player 1 plays C
- ✓ Based on this, Player 2 has no incentive to change from playing D, as instructed: D yields 7, while deviating to C would yield 6

CORRELATED EQUILIBRIUM
- Distribution $p$ (is CE): (D,D): 0, (D,C): 1/3, (C,D): 1/3, (C,C): 1/3
Payoffs (row player, column player):
             Dare     Chicken
Dare         0,0      7,2
Chicken      2,7      6,6
- If Player 2 is told to play C, then Player 2 knows that the outcome must be (D,C) or (C,C) with equal probability.
- Player 2's expected utility on playing C, conditioned on the fact that he is told to play C (and Player 1 will obey instructions), is:
$$\tfrac12\, u_2(D,C) + \tfrac12\, u_2(C,C) = \tfrac12 \cdot 2 + \tfrac12 \cdot 6 = 4$$
- If Player 2 deviates from instructions and plays D: $u_2 = \tfrac12 \cdot 0 + \tfrac12 \cdot 7 = 3.5 < 4$
- ✓ It's better to follow the instructions!

CORRELATED EQUILIBRIUM
- Distribution $p$ (is CE): (D,D): 0, (D,C): 1/3, (C,D): 1/3, (C,C): 1/3
Payoffs (row player, column player):
             Dare     Chicken
Dare         0,0      7,2
Chicken      2,7      6,6
- Player 2 does not have an incentive to deviate
- Since the game is symmetric, Player 1 does not have an incentive to deviate either
- ⇒ Correlated equilibrium!
- Expected reward per player: (1/3)*7 + (1/3)*2 + (1/3)*6 = 5
- Mixed strategy NE: expected reward per player is 4 2/3 = 14/3 ≈ 4.67, which is < 5
- Social welfare: 30/3 = 10

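The comparison between the correlated distribution and the mixed NE can be checked with exact arithmetic. This is a sketch under the slide's payoffs; the mixed NE of this game has each player daring with probability 1/3, which follows from the indifference condition 7(1-q) = 2q + 6(1-q). The names `expected_utilities` and `independent` are illustrative.

```python
from fractions import Fraction

# (Player 1's action, Player 2's action) -> (u1, u2), from the slide.
U = {("D", "D"): (0, 0), ("D", "C"): (7, 2), ("C", "D"): (2, 7), ("C", "C"): (6, 6)}

third = Fraction(1, 3)
P_CE = {("D", "D"): Fraction(0), ("D", "C"): third, ("C", "D"): third, ("C", "C"): third}


def expected_utilities(dist):
    """Expected (u1, u2) under a distribution over strategy profiles."""
    u1 = sum(prob * U[profile][0] for profile, prob in dist.items())
    u2 = sum(prob * U[profile][1] for profile, prob in dist.items())
    return u1, u2


def independent(q):
    """Product distribution: each player Dares independently with probability q."""
    mix = {"D": q, "C": 1 - q}
    return {(a, b): mix[a] * mix[b] for a in mix for b in mix}


ce_u1, ce_u2 = expected_utilities(P_CE)
ne_u1, ne_u2 = expected_utilities(independent(Fraction(1, 3)))
print(ce_u1, ce_u2, ce_u1 + ce_u2)   # per-player reward and welfare under the CE
print(ne_u1, ne_u2)                  # per-player reward under the mixed NE
```

The correlated distribution gives 5 per player (welfare 10), while the mixed NE gives 14/3 per player.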
CORRELATED EQUILIBRIUM
- Let $N = \{1,2\}$ for simplicity
- A mediator chooses a pair of strategies $(s_1, s_2)$ according to a distribution $p$ over $S^2$
- Reveals $s_1$ to player 1 and $s_2$ to player 2
- When player 1 gets $s_1 \in S$, he knows that the distribution over strategies of 2 is
$$\Pr[s_2 \mid s_1] = \frac{\Pr[s_1 \wedge s_2]}{\Pr[s_1]} = \frac{p(s_1, s_2)}{\sum_{s_2' \in S} p(s_1, s_2')}$$

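The posterior can be computed directly from the joint distribution. The sketch below applies the same formula from Player 2's side (conditioning on $s_2$ instead of $s_1$) to the Chicken game with payoffs 0,0 / 7,2 / 2,7 / 6,6 and the distribution (0, 1/3, 1/3, 1/3); the function names are hypothetical.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Profiles are (Player 1's action, Player 2's action).
P = {("D", "D"): Fraction(0), ("D", "C"): third, ("C", "D"): third, ("C", "C"): third}
U2 = {("D", "D"): 0, ("D", "C"): 2, ("C", "D"): 7, ("C", "C"): 6}  # Player 2's payoffs


def belief_about_player1(p, told):
    """Pr[s1 | s2 = told]: joint probability divided by the marginal of s2."""
    marginal = sum(prob for (_, s2), prob in p.items() if s2 == told)
    return {s1: prob / marginal for (s1, s2), prob in p.items() if s2 == told}


def conditional_utility(p, told, played):
    """Player 2's expected payoff from `played` after being told `told`."""
    belief = belief_about_player1(p, told)
    return sum(prob * U2[(s1, played)] for s1, prob in belief.items())


print(belief_about_player1(P, "C"))
print(conditional_utility(P, "C", "C"), conditional_utility(P, "C", "D"))
print(conditional_utility(P, "D", "D"), conditional_utility(P, "D", "C"))
```

When told C, Player 2 believes Player 1 plays D or C with probability 1/2 each, so following earns 4 and deviating earns 7/2. When told D, following earns 7 and deviating earns 6.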
COMPUTING CE STRATEGY
- Player 1's strategy $s_1$ is a best response if its expected utility cannot be unilaterally improved based on what he knows:
$$\sum_{s_2 \in S} \Pr[s_2 \mid s_1]\, u_1(s_1, s_2) \ge \sum_{s_2 \in S} \Pr[s_2 \mid s_1]\, u_1(s_1', s_2), \quad \forall s_1' \in S$$
- Equivalently, replacing using Bayes' rule (both sides are multiplied by $\Pr[s_1]$):
$$\sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1, s_2) \ge \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1', s_2)$$
- $p$ is a correlated equilibrium (CE) if both players are best responding

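The best-response inequalities translate into a direct check of whether a given distribution is a CE. This is a minimal sketch for the two-action Chicken game (payoffs 0,0 / 7,2 / 2,7 / 6,6); `is_correlated_equilibrium` is an illustrative name, not a library function.

```python
from fractions import Fraction

ACTIONS = ("D", "C")
# Profiles are (Player 1's action, Player 2's action).
U1 = {("D", "D"): 0, ("D", "C"): 7, ("C", "D"): 2, ("C", "C"): 6}
U2 = {("D", "D"): 0, ("D", "C"): 2, ("C", "D"): 7, ("C", "C"): 6}


def is_correlated_equilibrium(p):
    """Check the Bayes-free inequalities for every instruction and deviation."""
    for told in ACTIONS:
        for alt in ACTIONS:
            # Player 1 is told `told` and considers playing `alt` instead.
            follow1 = sum(p[(told, s2)] * U1[(told, s2)] for s2 in ACTIONS)
            deviate1 = sum(p[(told, s2)] * U1[(alt, s2)] for s2 in ACTIONS)
            # Same check for Player 2, summing over Player 1's actions.
            follow2 = sum(p[(s1, told)] * U2[(s1, told)] for s1 in ACTIONS)
            deviate2 = sum(p[(s1, told)] * U2[(s1, alt)] for s1 in ACTIONS)
            if follow1 < deviate1 or follow2 < deviate2:
                return False
    return True


third = Fraction(1, 3)
slide_p = {("D", "D"): 0, ("D", "C"): third, ("C", "D"): third, ("C", "C"): third}
always_cc = {("D", "D"): 0, ("D", "C"): 0, ("C", "D"): 0, ("C", "C"): 1}
pure_ne = {("D", "D"): 0, ("D", "C"): 1, ("C", "D"): 0, ("C", "C"): 0}

print(is_correlated_equilibrium(slide_p))
print(is_correlated_equilibrium(always_cc))
print(is_correlated_equilibrium(pure_ne))
```

The slide's distribution and the pure NE (D,C) pass the check, while always recommending (C,C) fails because a player told C would rather dare (7 > 6).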
CE AS LP
- Can compute CE via linear programming in polynomial time!
find $p(s_1, s_2)$ s.t.
$$\forall s_1, s_1' \in S: \quad \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1, s_2) \ge \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1', s_2)$$
$$\forall s_2, s_2' \in S: \quad \sum_{s_1 \in S} p(s_1, s_2)\, u_2(s_1, s_2) \ge \sum_{s_1 \in S} p(s_1, s_2)\, u_2(s_1, s_2')$$
$$\sum_{s_1, s_2 \in S} p(s_1, s_2) = 1$$
$$\forall s_1, s_2 \in S: \quad p(s_1, s_2) \in [0,1]$$

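As a worked instance, the constraints can be written out for the Chicken game with payoffs 0,0 / 7,2 / 2,7 / 6,6. The shorthand $p_{DC} = p(D, C)$ etc. is introduced here for readability and is not the lecture's notation.

```latex
\begin{align*}
\text{Player 1 told D:}\quad & 0\,p_{DD} + 7\,p_{DC} \ge 2\,p_{DD} + 6\,p_{DC}
  && \Longleftrightarrow\; p_{DC} \ge 2\,p_{DD} \\
\text{Player 1 told C:}\quad & 2\,p_{CD} + 6\,p_{CC} \ge 0\,p_{CD} + 7\,p_{CC}
  && \Longleftrightarrow\; 2\,p_{CD} \ge p_{CC} \\
\text{Player 2 told D:}\quad & 0\,p_{DD} + 7\,p_{CD} \ge 2\,p_{DD} + 6\,p_{CD}
  && \Longleftrightarrow\; p_{CD} \ge 2\,p_{DD} \\
\text{Player 2 told C:}\quad & 2\,p_{DC} + 6\,p_{CC} \ge 0\,p_{DC} + 7\,p_{CC}
  && \Longleftrightarrow\; 2\,p_{DC} \ge p_{CC} \\
& p_{DD} + p_{DC} + p_{CD} + p_{CC} = 1,
  \qquad p_{DD},\, p_{DC},\, p_{CD},\, p_{CC} \ge 0
\end{align*}
```

The distribution $(p_{DD}, p_{DC}, p_{CD}, p_{CC}) = (0, \tfrac13, \tfrac13, \tfrac13)$ satisfies all four inequalities, which is another way to see that it is a CE.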
BEST WELFARE CE
- Adding a linear objective function $f$, the best correlated equilibrium (e.g., max welfare) can be found
$$\max f(p(s_1, s_2);\, u_1, u_2)$$
s.t.
$$\forall s_1, s_1' \in S: \quad \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1, s_2) \ge \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1', s_2)$$
$$\forall s_2, s_2' \in S: \quad \sum_{s_1 \in S} p(s_1, s_2)\, u_2(s_1, s_2) \ge \sum_{s_1 \in S} p(s_1, s_2)\, u_2(s_1, s_2')$$
$$\sum_{s_1, s_2 \in S} p(s_1, s_2) = 1$$
$$\forall s_1, s_2 \in S: \quad p(s_1, s_2) \in [0,1]$$

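To get a feel for the welfare-maximizing CE without an LP solver, the feasible set can be scanned on a grid. Note that this is a brute-force search swapped in for the linear program, not the polynomial-time method itself, and in general it only approximates the optimum. It is shown for the Chicken game with payoffs 0,0 / 7,2 / 2,7 / 6,6; the grid resolution `steps=12` and the function names are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("D", "C")
PROFILES = [("D", "D"), ("D", "C"), ("C", "D"), ("C", "C")]
# (Player 1's action, Player 2's action) -> (u1, u2)
U = {("D", "D"): (0, 0), ("D", "C"): (7, 2), ("C", "D"): (2, 7), ("C", "C"): (6, 6)}


def is_ce(p):
    """Check every (instruction, deviation) pair for both players."""
    for told, alt in product(ACTIONS, repeat=2):
        gain1 = sum(p[(told, s)] * (U[(alt, s)][0] - U[(told, s)][0]) for s in ACTIONS)
        gain2 = sum(p[(s, told)] * (U[(s, alt)][1] - U[(s, told)][1]) for s in ACTIONS)
        if gain1 > 0 or gain2 > 0:
            return False
    return True


def best_welfare_ce(steps=12):
    """Scan distributions with probabilities in multiples of 1/steps."""
    best = None
    for a, b, c in product(range(steps + 1), repeat=3):
        d = steps - a - b - c
        if d < 0:
            continue
        p = dict(zip(PROFILES, (Fraction(x, steps) for x in (a, b, c, d))))
        if not is_ce(p):
            continue
        welfare = sum(prob * sum(U[profile]) for profile, prob in p.items())
        if best is None or welfare > best[0]:
            best = (welfare, p)
    return best


best_welfare, best_p = best_welfare_ce()
print(best_welfare, best_p)
```

On this grid the best distribution found puts 1/4 on (D,C), 1/4 on (C,D) and 1/2 on (C,C), with welfare 21/2, which is slightly better than the welfare of 10 obtained with the uniform (1/3, 1/3, 1/3) distribution.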
IMPLEMENTATION OF CE
- Instead of a mediator, use a hat!
- Balls in the hat are labeled with "chicken" or "dare"; each blindfolded player takes a ball
Which balls implement the distribution $p$ from before?
1. 1 chicken, 1 dare
2. 2 chicken, 1 dare
3. 2 chicken, 2 dare
4. 3 chicken, 2 dare
E.g., an automatic trusted authority can be implemented using cryptographic algorithms

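The hat question can be settled by enumerating the draws. The sketch below assumes Player 1 draws first and Player 2 second, without replacement, and compares each hat against the target distribution (D,D): 0, (D,C): 1/3, (C,D): 1/3, (C,C): 1/3; `hat_distribution` is a hypothetical helper.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

third = Fraction(1, 3)
TARGET = {("D", "D"): Fraction(0), ("D", "C"): third, ("C", "D"): third, ("C", "C"): third}

# The four candidate hats as (number of chicken balls, number of dare balls).
HATS = [(1, 1), (2, 1), (2, 2), (3, 2)]


def hat_distribution(n_chicken, n_dare):
    """Distribution over (Player 1's ball, Player 2's ball) for one hat."""
    balls = ["C"] * n_chicken + ["D"] * n_dare
    # permutations() treats balls as distinct, so every ordered draw is equally likely.
    draws = list(permutations(balls, 2))
    counts = Counter(draws)
    return {profile: Fraction(counts[profile], len(draws)) for profile in TARGET}


matching = [hat for hat in HATS if hat_distribution(*hat) == TARGET]
print(matching)
```

Only the hat with 2 chicken balls and 1 dare ball reproduces the distribution: the two players can never both draw "dare", and the three remaining outcomes are equally likely.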
CE VS. NE
What is the relation between CE and NE?
1. CE ⇒ NE
2. NE ⇒ CE
3. NE ⇔ CE
4. NE ∥ CE
- For any pure strategy NE, there is a corresponding correlated equilibrium yielding the same outcome.
- For any mixed strategy NE, there is a corresponding correlated equilibrium yielding the same distribution of outcomes.
- From Nash's theorem, all finite games have a mixed strategy NE. Since a NE implies a CE, a CE always exists

A DIFFERENT TYPE OF GAMES: STACKELBERG GAMES
Payoffs (row player, column player):
         L        R
U       1,1      3,0
D       0,0      2,1
- Playing Up is a dominant strategy for the row player
- Row player plays Up
- Column player would then play Left
- Therefore, (1,1) is the only Nash equilibrium outcome

COMMITMENT IS GOOD
- Suppose the game is played sequentially as follows:
  - Row player commits to playing a row
  - Column player observes the commitment and chooses a column
- Row player can commit to playing Down: Column player will then play Right, and the Row player now gets a better reward (2 instead of 1)!
Payoffs (row player, column player):
         L        R
U       1,1      3,0
D       0,0      2,1

COMMITMENT TO MIXED STRATEGY
- By committing to a mixed strategy, the row player can do even better and guarantee a reward of almost 2.5: 0.49×3 + 0.51×2 = 2.49
- Leader: $p_U = 0.49$, $p_D = 0.51$; Follower: $p_R = 1$, $p_L = 0$
- Stackelberg strategy (1934)
- Rooted in duopoly scenarios
- Player 1 (Leader) moves at the start of the game. Then use backward induction to find the subgame perfect equilibrium.
- First, for any output of the leader, find the strategy of the Follower that maximizes its payoff (its expected best reply).
- Next, find the strategy of the leader that maximizes the leader's utility, given the strategy of the follower

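The two backward-induction steps can be traced on the 2x2 example (U/D rows, L/R columns, payoffs 1,1 / 3,0 / 0,0 / 2,1). This is a sketch: the tie-breaking rule (an indifferent follower picks the column the leader prefers) is an assumption made here, and the function names are illustrative.

```python
from fractions import Fraction

# (row, column) -> (leader payoff, follower payoff)
GAME = {("U", "L"): (1, 1), ("U", "R"): (3, 0), ("D", "L"): (0, 0), ("D", "R"): (2, 1)}
COLUMNS = ("L", "R")


def expected(p_up, column, player):
    """Expected payoff of `player` (0 = leader, 1 = follower) for a given column."""
    return p_up * GAME[("U", column)][player] + (1 - p_up) * GAME[("D", column)][player]


def follower_best_response(p_up):
    """Step 1: the follower maximizes its own expected payoff.
    Assumption: ties are broken in favour of the leader."""
    return max(COLUMNS, key=lambda c: (expected(p_up, c, 1), expected(p_up, c, 0)))


def leader_payoff(p_up):
    """Step 2 input: the leader's payoff once the follower has best responded."""
    return expected(p_up, follower_best_response(p_up), 0)


for p_up in (1, 0, Fraction(49, 100), Fraction(51, 100)):
    print(p_up, follower_best_response(p_up), leader_payoff(p_up))
```

Committing to Up yields 1, committing to Down yields 2, and committing to Up with probability 0.49 yields 2.49. At 0.51 the follower switches to Left and the leader's payoff drops to 0.51, which is why the leader stays just below 1/2.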
COMPUTING STACKELBERG
- Theorem [Conitzer and Sandholm, 2006]: In 2-player normal form games, an optimal Stackelberg strategy can be found in polynomial time
- Theorem: The problem is NP-hard when the number of players is ≥ 3

TRACTABILITY: 2 PLAYERS
- For each pure strategy $s_2$ of the follower, we compute via the LP below a mixed strategy $x_1$ for the leader such that:
  - Playing $s_2$ is a best response for the follower
  - Under this constraint, $x_1$ is optimal for the leader
- Choose the $x_1^*$ that maximizes the leader's utility over all $s_2$
$$\max \sum_{s_1 \in S} x_1(s_1)\, u_1(s_1, s_2)$$
s.t.
$$\forall s_2' \in S: \quad \sum_{s_1 \in S} x_1(s_1)\, u_2(s_1, s_2) \ge \sum_{s_1 \in S} x_1(s_1)\, u_2(s_1, s_2')$$
$$\sum_{s_1 \in S} x_1(s_1) = 1$$
$$\forall s_1 \in S: \quad x_1(s_1) \in [0,1]$$

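When the leader has only two pure strategies, each of these LPs has a single variable and can be solved exactly by intersecting intervals. The sketch below makes that simplifying assumption (it is not a general LP solver) and applies the one-LP-per-follower-strategy scheme to the U/D, L/R example with payoffs 1,1 / 3,0 / 0,0 / 2,1; the function names are hypothetical.

```python
from fractions import Fraction

# Rows: U, D (leader). Columns: L, R (follower).
U1 = [[1, 3], [0, 2]]  # leader payoffs
U2 = [[1, 0], [0, 1]]  # follower payoffs


def solve_for_column(target, u1, u2):
    """LP for one follower column when x1 = (p, 1 - p).
    Returns (leader value, p), or None if the column can never be a best response."""
    lo, hi = Fraction(0), Fraction(1)
    for other in range(len(u2[0])):
        # Best-response constraint: p*d0 + (1 - p)*d1 >= 0.
        d0 = u2[0][target] - u2[0][other]
        d1 = u2[1][target] - u2[1][other]
        slope = d0 - d1
        if slope > 0:
            lo = max(lo, Fraction(-d1, slope))
        elif slope < 0:
            hi = min(hi, Fraction(-d1, slope))
        elif d1 < 0:
            return None
    if lo > hi:
        return None

    def value(p):
        return p * u1[0][target] + (1 - p) * u1[1][target]

    # The objective is linear in p, so the optimum sits at an interval endpoint.
    return max((value(p), p) for p in (lo, hi))


def optimal_commitment(u1, u2):
    """Solve one LP per follower column and keep the best for the leader."""
    best = None
    for column in range(len(u2[0])):
        solved = solve_for_column(column, u1, u2)
        if solved is not None and (best is None or solved[0] > best[0]):
            best = (solved[0], solved[1], column)
    return best


print(solve_for_column(0, U1, U2), solve_for_column(1, U1, U2))
print(optimal_commitment(U1, U2))
```

Inducing Left is best done with p = 1 (value 1), while inducing Right allows p up to 1/2 (value 5/2), so the optimum is to play Up with probability 1/2. The value 5/2 is attained only if the indifferent follower plays Right, which matches the "almost 2.5" obtained with 0.49 when no such tie-breaking is assumed.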
APPLICATION: SECURITY
- Airport security: deployed at LAX
- Federal Air Marshals
- Coast Guard
- Idea:
  - Defender commits to a mixed strategy
  - Attacker observes and best responds
- The attacker monitors the defender and tries to maximize damage, while the defender deploys resources to minimize damage based on knowledge of what the attacker would like to obtain

SECURITY GAMES
- Set of targets $T = \{1, \dots, n\}$
  - E.g., entrance gates of an airport
- Set $\Omega$ of $m$ security resources available to the defender (leader)
  - E.g., video cameras looking in specific directions
  - E.g., air marshals flying on specific flights
- Set $\Sigma$ of feasible partitions of the target set: $\Sigma \subseteq 2^T$
- Resource $\omega \in \Omega$ can be assigned to one of the partitions in $A(\omega) \subseteq \Sigma$ ⇒ Schedule
- Defender (leader): how to deploy the resources over the targets ⇒ how to assign resources to targets
- Attacker (follower) chooses targets to attack

SECURITY GAMES
- For each target $t$, there are four numbers defining the payoffs of defender and attacker in case of successful and unsuccessful attack:
  - $u_d^+(t)$ = Defender's payoff when target $t$ is attacked and the target was covered by at least one resource
  - $u_d^-(t)$ = Defender's payoff when target $t$ is attacked and the target was not covered
  - $u_a^-(t)$ = Attacker's payoff when target $t$ is attacked and the target was not covered
  - $u_a^+(t)$ = Attacker's payoff when target $t$ is attacked and the target was covered by at least one resource
- $u_d^+(t) \ge u_d^-(t)$, and $u_a^+(t) \le u_a^-(t)$
- For each target $t$ there is a coverage probability by the defender using the resources in $\Omega$, such that the set of targets defines $c = (c_1, \dots, c_n)$, a vector of coverage probabilities

SECURITY GAMES
- The expected utilities to the defender/attacker under coverage $c$ if target $t$ is attacked are:
$$u_d(t, c) = u_d^+(t) \cdot c_t + u_d^-(t)\,(1 - c_t)$$
$$u_a(t, c) = u_a^+(t) \cdot c_t + u_a^-(t)\,(1 - c_t)$$
[Figure: payoffs example with two targets]

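The two expected-utility formulas are easy to evaluate once the four payoffs per target are fixed. The numbers below are hypothetical (the lecture's two-target example table is not reproduced here); they were chosen only to respect the ordering of covered versus uncovered payoffs, and the coverage vector assumes a single resource split 3/5 and 2/5.

```python
from fractions import Fraction

# HYPOTHETICAL payoffs for two targets, not the lecture's example.
# "cov" = target covered when attacked, "unc" = target not covered.
PAYOFFS = {
    1: {"d_cov": 2, "d_unc": -4, "a_cov": -1, "a_unc": 5},
    2: {"d_cov": 1, "d_unc": -2, "a_cov": -2, "a_unc": 3},
}


def defender_utility(t, c):
    """u_d(t, c): covered payoff weighted by c_t, uncovered payoff by 1 - c_t."""
    return PAYOFFS[t]["d_cov"] * c[t] + PAYOFFS[t]["d_unc"] * (1 - c[t])


def attacker_utility(t, c):
    """u_a(t, c), with the same weighting applied to the attacker's payoffs."""
    return PAYOFFS[t]["a_cov"] * c[t] + PAYOFFS[t]["a_unc"] * (1 - c[t])


def attacked_target(c):
    """The follower observes c and attacks the target with the highest u_a."""
    return max(PAYOFFS, key=lambda t: attacker_utility(t, c))


coverage = {1: Fraction(3, 5), 2: Fraction(2, 5)}  # assumed coverage vector
target = attacked_target(coverage)
print({t: attacker_utility(t, coverage) for t in PAYOFFS})
print(target, defender_utility(target, coverage))
```

With these assumed numbers the attacker prefers target 1 (7/5 versus 1), leaving the defender with an expected utility of -2/5. Shifting more coverage to target 1 would change both values, which is the trade-off the defender's optimization has to resolve.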
BAYESIAN SECURITY GAMES
- There are multiple types of potential attackers, each type with different payoffs
- The defender knows their payoffs, as well as the distribution of attacker types
- Bayesian extensions are used to model uncertainty over the payoffs and preferences of the players; more uncertainty can be expressed with an increasing number of types.
[Figure: payoffs example with two targets and two types of attackers]

SOLVING SECURITY GAMES
- It is a 2-player Stackelberg game, so we can compute an optimal strategy for the defender in time polynomial in the size of the normal form
- Consider the case of $\Sigma = T$, i.e., resources are assigned to individual targets, i.e., schedules have size 1
- Nevertheless, the number of leader strategies is exponential: $2^n$
- ⇒ The representation of the linear program is exponential
- Theorem [Korzhyk et al. 2010]: The optimal leader strategy can be computed in polynomial time
- A number of smart algorithms have been developed …
- Jain M., An B., Tambe M. (2013) Security Games Applied to Real-World: Research Contributions and Challenges. In: Jajodia S., Ghosh A., Subrahmanian V., Swarup V., Wang C., Wang X. (eds) Moving Target Defense II. Advances in Information Security, vol 100. Springer, New York, NY