CSC2556 Lecture 11
Noncooperative Games 2: Zero-Sum Games, Stackelberg Games
CSC2556 - Nisarg Shah 1
Zero-Sum Games
➢ Total reward is constant in all outcomes (w.l.o.g. 0)
➢ Focus on two-player zero-sum games (2p-zs)
➢ “The more I win, the more you lose”
➢ Chess, tic-tac-toe, rock-paper-scissor, …
P1 \ P2    Rock       Paper      Scissor
Rock       (0, 0)     (-1, 1)    (1, -1)
Paper      (1, -1)    (0, 0)     (-1, 1)
Scissor    (-1, 1)    (1, -1)    (0, 0)
➢ Only need a single matrix $B$: reward for P1
➢ P1 wants to maximize, P2 wants to minimize
P1 \ P2    Rock    Paper    Scissor
Rock        0      -1        1
Paper       1       0       -1
Scissor    -1       1        0
➢ P1 uses mixed strategy $y_1$
➢ P2 uses mixed strategy $y_2$
➢ Expected reward for P1: $y_1^\top B\, y_2$ (where $y_1$ and $y_2$ are column vectors)
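As a small illustration of the bilinear reward formula, the sketch below evaluates it for rock-paper-scissor using plain Python lists. The function name `expected_reward` and the example strategies are assumptions made for this illustration; they do not come from the slides.

```python
# Reward matrix B for P1 in rock-paper-scissor (rows: P1, columns: P2).
B = [
    [0, -1, 1],   # Rock
    [1, 0, -1],   # Paper
    [-1, 1, 0],   # Scissor
]


def expected_reward(y1, B, y2):
    """Return y1^T B y2, the expected reward for P1 (hypothetical helper)."""
    return sum(
        y1[j] * B[j][k] * y2[k]
        for j in range(len(y1))
        for k in range(len(y2))
    )


uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]
always_paper = [0.0, 1.0, 0.0]

# Both players mixing uniformly: wins and losses cancel out.
reward_uniform = expected_reward(uniform, B, uniform)
# Pure Rock against pure Paper: P1 loses one unit.
reward_rock_vs_paper = expected_reward(always_rock, B, always_paper)
```

A pure strategy is just a mixed strategy that puts probability 1 on a single action, so the same function covers both cases.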
➢ If I commit to $y_1$ first, P2 would choose $y_2$ to minimize my reward
$W_1^* = \max_{y_1} \min_{y_2} y_1^\top B\, y_2$
➢ A maximizer $y_1^*$ is a maximin strategy for P1
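The inner minimization can be made concrete. Because the reward is linear in P2's strategy, the minimum over P2's mixed strategies is attained at one of P2's pure strategies, so it is enough to check each column. The sketch below computes this worst case for a committed y1; the name `guaranteed_reward` is an assumption for illustration only.

```python
def guaranteed_reward(y1, B):
    """Worst-case expected reward for P1 after committing to y1.

    Computes min over P2's pure strategies k of (y1^T B)_k.
    """
    n2 = len(B[0])
    column_rewards = [
        sum(y1[j] * B[j][k] for j in range(len(y1)))
        for k in range(n2)
    ]
    return min(column_rewards)


# Rock-paper-scissor reward matrix for P1.
B = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

# Committing to pure Rock: P2 answers with Paper and P1 loses.
g_rock = guaranteed_reward([1.0, 0.0, 0.0], B)
# Committing to the uniform mix: no reply of P2 pushes P1 below 0.
g_uniform = guaranteed_reward([1 / 3, 1 / 3, 1 / 3], B)
```

A maximin strategy is a y1 that makes this guaranteed reward as large as possible; in this game the uniform mix does at least as well as any pure commitment.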
$W_1^* = \max_{y_1} \min_{y_2} y_1^\top B\, y_2$
$W_2^* = \min_{y_2} \max_{y_1} y_1^\top B\, y_2$
➢ P2’s minimax strategy $y_2^*$ attains the minimum in $W_2^*$
➢ $W_1^* \le W_2^*$ (both play their “safe” strategies together)
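The inequality follows from a sandwich argument: for any pair of strategies, P1's guarantee from y1 is at most the actual expected reward, which in turn is at most the cap that P2 secures with y2. The sketch below checks this numerically on rock-paper-scissor. The helper names and the two example strategies are assumptions chosen for illustration.

```python
def guarantee_for_p1(y1, B):
    """Worst case for P1 after committing to y1: min_k (y1^T B)_k."""
    return min(
        sum(y1[j] * B[j][k] for j in range(len(B)))
        for k in range(len(B[0]))
    )


def cap_for_p2(y2, B):
    """Worst case for P2 after committing to y2: max_j (B y2)_j."""
    return max(
        sum(B[j][k] * y2[k] for k in range(len(B[0])))
        for j in range(len(B))
    )


B = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
y1 = [0.5, 0.5, 0.0]   # illustrative strategy for P1
y2 = [0.2, 0.3, 0.5]   # illustrative strategy for P2

lower = guarantee_for_p1(y1, B)
upper = cap_for_p2(y2, B)
actual = sum(
    y1[j] * B[j][k] * y2[k] for j in range(3) for k in range(3)
)
```

Since the sandwich holds for every pair, it holds in particular for the best guarantee and the best cap, which gives the stated relation between the two values.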
➢ $W_1^* = W_2^* = W^*$ (called the minimax value of the game)
➢ Set of Nash equilibria = $\{(y_1^*, y_2^*) : y_1^* = \text{maximin for P1},\ y_2^* = \text{minimax for P2}\}$
➢ $y_1^*$ is best response to $y_2^*$ and vice-versa.
➢ Trivia: Another notable PPAD-complete problem is finding
➢ Polynomial in #actions of the two players: 𝑛1 and 𝑛2
$(y_1^\top B)_k \ge w,\quad k \in \{1, \dots, n_2\}$
Kicker \ Goalie    L       R
L                  0.58    0.95
R                  0.93    0.70
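When P1 has only two actions, the maximin problem can be worked out without an LP solver, which makes the kicker/goalie table a convenient worked example. The sketch below assumes that the entries are the kicker's reward, that the kicker is the row player and maximizer, and that the goalie minimizes; the function name `solve_two_row_game` is hypothetical.

```python
def solve_two_row_game(B):
    """Maximin strategy for a row player with exactly two actions.

    P1 plays row 0 with probability p. Each column k of P2 gives a line
    f_k(p) = p * B[0][k] + (1 - p) * B[1][k]. P1 maximizes the lower
    envelope min_k f_k(p), whose maximum lies at p = 0, p = 1, or at a
    crossing point of two of the lines.
    """
    n2 = len(B[0])

    def guarantee(p):
        return min(p * B[0][k] + (1 - p) * B[1][k] for k in range(n2))

    candidates = [0.0, 1.0]
    for a in range(n2):
        for b in range(a + 1, n2):
            slope_a = B[0][a] - B[1][a]
            slope_b = B[0][b] - B[1][b]
            if slope_a != slope_b:
                # Solve f_a(p) = f_b(p) for p.
                p = (B[1][b] - B[1][a]) / (slope_a - slope_b)
                if 0.0 <= p <= 1.0:
                    candidates.append(p)
    best_p = max(candidates, key=guarantee)
    return best_p, guarantee(best_p)


# Rows: kicker L, R. Columns: goalie L, R (assumed to be kicker's reward).
KICK = [[0.58, 0.95], [0.93, 0.70]]
p_left, value = solve_two_row_game(KICK)
```

Under these assumptions the kicker mixes so that the goalie is indifferent between L and R: solving 0.58p + 0.93(1 - p) = 0.95p + 0.70(1 - p) gives p = 23/60, roughly 0.383, with a value of roughly 0.796.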
(Photos: John von Neumann and George Dantzig)
George Dantzig loves to tell the story of his meeting with John von Neumann on October 3, 1947 at the Institute for Advanced Study at Princeton. Dantzig went to that meeting with the express purpose of describing the linear programming problem to von Neumann and asking him to suggest a computational procedure. He was actually looking for methods to benchmark the simplex method. Instead, he got a 90-minute lecture on Farkas Lemma and Duality (Dantzig's notes of this session formed the source of the modern perspective on linear programming duality). Not wanting Dantzig to be completely amazed, von Neumann admitted: "I don't want you to think that I am pulling all this out of my sleeve like a magician. I have recently completed a book with Morgenstern on the theory of games. What I am doing is conjecturing that the two problems are equivalent. The theory that I am outlining is an analogue to the one we have developed for games."
➢ We can assume $y_2$ to be a pure strategy w.l.o.g.
➢ We don’t need $y_1$ to be a best response to $y_2$
P1 \ P2    Left      Right
Up         (1, 1)    (3, 0)
Down       (0, 0)    (2, 1)
➢ If P1 commits to playing Up and Down with probabilities
➢ P2 is still better off playing Right than Left, in expectation
➢ Expected reward for P1 increases to ~2.5
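The effect of commitment can be checked numerically on the Up/Down versus Left/Right game. The probabilities used on the slide are not recoverable from this transcript, so the values 0.49 and 0.51 below are an illustrative assumption, as are the helper names.

```python
# Payoffs from the Up/Down vs Left/Right game.
# Rows: Up, Down. Columns: Left, Right.
P1_REWARD = [[1, 3], [0, 2]]
P2_REWARD = [[1, 0], [0, 1]]


def follower_best_response(y1):
    """Return the column P2 prefers once P1 has committed to y1."""
    expected = [
        sum(y1[j] * P2_REWARD[j][k] for j in range(2)) for k in range(2)
    ]
    return max(range(2), key=lambda k: expected[k])


def leader_reward(y1, k):
    """Expected reward for P1 when P2 plays column k."""
    return sum(y1[j] * P1_REWARD[j][k] for j in range(2))


# Pure commitments: Up draws Left (reward 1), Down draws Right (reward 2).
reward_up = leader_reward([1.0, 0.0], follower_best_response([1.0, 0.0]))
reward_down = leader_reward([0.0, 1.0], follower_best_response([0.0, 1.0]))

# Hypothetical mixed commitment that keeps Right slightly better for P2.
y1 = [0.49, 0.51]
reply = follower_best_response(y1)      # 1 means Right
reward_mixed = leader_reward(y1, reply)
```

With these assumed probabilities P2 earns 0.51 from Right versus 0.49 from Left, so P2 plays Right and P1 earns 0.49 · 3 + 0.51 · 2 = 2.49, approaching 2.5 as the split approaches one half each.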
➢ The leader always has the option to commit to a Nash
Maximize $w$
Subject to
$(y_1^\top B)_k \ge w,\quad k \in \{1, \dots, n_2\}$
$y_1(1) + \dots + y_1(n_1) = 1$
$y_1(j) \ge 0,\quad j \in \{1, \dots, n_1\}$
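$\max \sum_{t_1 \in T_1} y_1(t_1) \cdot \rho_1(t_1, t_2^*)$
subject to
$\forall t_2 \in T_2:\ \sum_{t_1 \in T_1} y_1(t_1) \cdot \rho_2(t_1, t_2^*) \ge \sum_{t_1 \in T_1} y_1(t_1) \cdot \rho_2(t_1, t_2)$
$\sum_{t_1 \in T_1} y_1(t_1) = 1$
$\forall t_1 \in T_1:\ y_1(t_1) \ge 0$
➢ One LP for each $t_2^*$, take the maximum
➢ The LP corresponding to $t_2^*$ optimizes over all $y_1$ for which $t_2^*$ is the best response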
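The "one LP per follower action" idea can be sketched without an LP solver in the special case where the leader has exactly two actions. This is an assumption made here to keep the example self-contained. With a single free variable p (the probability of the leader's first action), each best-response constraint becomes a bound on p, and the linear objective is maximized at an endpoint of the resulting interval. The function name is hypothetical.

```python
def stackelberg_two_leader_actions(rho1, rho2):
    """Solve one small LP per follower action t2_star and keep the best.

    rho1[t1][t2] and rho2[t1][t2] are the leader and follower rewards.
    The leader has exactly two actions and plays action 0 with
    probability p. Returns (leader value, p, t2_star).
    """
    n2 = len(rho1[0])
    best = None
    for t2_star in range(n2):
        lo, hi = 0.0, 1.0
        feasible = True
        for t2 in range(n2):
            # Best-response constraint rewritten as a * p + c >= 0.
            d0 = rho2[0][t2_star] - rho2[0][t2]
            d1 = rho2[1][t2_star] - rho2[1][t2]
            a, c = d0 - d1, d1
            if a > 0:
                lo = max(lo, -c / a)
            elif a < 0:
                hi = min(hi, -c / a)
            elif c < 0:
                feasible = False
        if not feasible or lo > hi:
            continue
        # The objective is linear in p, so an interval endpoint is optimal.
        for p in (lo, hi):
            value = p * rho1[0][t2_star] + (1 - p) * rho1[1][t2_star]
            if best is None or value > best[0]:
                best = (value, p, t2_star)
    return best


# Up/Down vs Left/Right game: rows Up, Down; columns Left, Right.
rho1 = [[1, 3], [0, 2]]
rho2 = [[1, 0], [0, 1]]
value, p_up, t2_star = stackelberg_two_leader_actions(rho1, rho2)
```

On this game the sketch returns a leader value of 2.5 with p = 0.5 and the follower playing Right. The weak inequality in the constraint means the follower is indifferent at p = 0.5 and the tie is resolved in the leader's favor, which is why the optimum is exactly 2.5 rather than a value just below it.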
➢ Defender (leader) wants to deploy security resources to protect targets
➢ A resource can protect one of several subsets of targets
➢ Attacker (follower) observes the defender’s strategy, and chooses a target to attack
➢ Both players get a reward/penalty
➢ Must return home
➢ Continuous-time strategies
➢ Bathroom breaks !!!
➢ Poachers are not fully rational