
Game Theory: Lecture #9



  1. Game Theory: Lecture #9
     Outline:
     • Zero-sum games
     • Security strategies and values
     • Value
     • Minimax Theorem

  2. Individual Optimization
     • Previous focus: Single decision-maker $i$
       – A set of actions for the individual, denoted by $A_i$
       – A set of “other things that could happen in the world,” denoted by $A_{-i}$
       – This induces the set of states of the world $A = A_i \times A_{-i}$
       – The individual’s preferences over states are characterized by a function $U_i : A \to \mathbb{R}$
     • Terminology:
       – $U_i(\cdot)$ is referred to as the “payoff,” “utility,” or “reward” function
       – The individual $i$ is referred to as an “agent,” “player,” “decision-maker,” or “user”
     • Player $i$ prefers state $a$ to state $a'$ if and only if $U_i(a) > U_i(a')$. If $U_i(a) = U_i(a')$, player $i$ is “indifferent.”
     • Dominant question: What should the decision-maker do in such scenarios?
     • New focus: What if the “other things” are adversarial?

  3. Zero-sum games
     • Setup: Two-player zero-sum games
       – Set of players, $N = \{1, 2\}$
       – Sets of actions, $A_1$ and $A_2$
       – This induces the set of action profiles $A = A_1 \times A_2$
       – For each player, preferences over action profiles are characterized by a function $U_i : A \to \mathbb{R}$
       – Zero-sum constraint: For any action profile $a \in A$, $U_1(a) + U_2(a) = 0$
     • Matrix form is a convenient representation for two-player strategic games. The first entry is Player 1’s payoff, the second entry is Player 2’s payoff.
     • Example: Matching pennies

              H        T
         H   1, −1   −1, 1
         T   −1, 1    1, −1

     • Example: Rock-paper-scissors

              R        P        S
         R   0, 0    −1, 1    1, −1
         P   1, −1    0, 0   −1, 1
         S   −1, 1    1, −1   0, 0

  4. Zero-sum games
     • Note: Zero-sum distinction is only relevant for two-player games. Why?
     • Convention:
       – Represent game by single matrix
       – View row player as “maximizer”
       – View column player as “minimizer” (rather than maximizer of negative values)
     • Example: Rock-paper-scissors

              R    P    S
         R    0   −1    1
         P    1    0   −1
         S   −1    1    0

     • Player set: $\{\text{row}, \text{col}\}$
     • Action sets: $A_{\text{row}} = \{R, P, S\}$, $A_{\text{col}} = \{R, P, S\}$
     • Action profiles: $A = \{(R, R), (R, P), (R, S), \ldots, (S, R), (S, P), (S, S)\}$
     • Payoff functions:
         $U_{\text{row}}(R, R) = 0$ and $U_{\text{col}}(R, R) = 0$
         $U_{\text{row}}(R, P) = -1$ and $U_{\text{col}}(R, P) = 1$
         ...
         $U_{\text{row}}(S, S) = 0$ and $U_{\text{col}}(S, S) = 0$
     • Interpretation: row tries to maximize the cell number, col tries to minimize the cell number
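
The zero-sum constraint and the single-matrix convention can be checked mechanically. The following is a minimal sketch in Python; the dictionary layout and the helper names `is_zero_sum` and `row_matrix` are illustrative assumptions, not part of the lecture.

```python
# Rock-paper-scissors in two-entry form:
# each cell holds (payoff to row player, payoff to column player).
rps = {
    ("R", "R"): (0, 0),  ("R", "P"): (-1, 1), ("R", "S"): (1, -1),
    ("P", "R"): (1, -1), ("P", "P"): (0, 0),  ("P", "S"): (-1, 1),
    ("S", "R"): (-1, 1), ("S", "P"): (1, -1), ("S", "S"): (0, 0),
}


def is_zero_sum(game):
    """Return True if U_1(a) + U_2(a) = 0 for every action profile a."""
    return all(u1 + u2 == 0 for (u1, u2) in game.values())


def row_matrix(game, rows, cols):
    """Collapse a zero-sum game to the single matrix of row-player payoffs."""
    return [[game[(r, c)][0] for c in cols] for r in rows]


print(is_zero_sum(rps))
print(row_matrix(rps, "RPS", "RPS"))
```

The second call reproduces the single-matrix form of rock-paper-scissors: because the column player's payoff is always the negative of the row player's, nothing is lost by storing only the first entry.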

  5. Worst-case analysis
     • What is a reasonable prediction of behavior in zero-sum games?
     • Worst-case model: “worst for row is best for col”
       – One such model that seems reasonable in zero-sum games
       – Requires analyzing “what if” scenarios, i.e., if I play T, what would the opponent do?
     • Example:

              L    R
         T    3    0
         B    1    2

       – If row plays T: Worst-case outcome is 0
       – If row plays B: Worst-case outcome is 1
       – row’s security strategy: B (worst case is col = L)
       – Likewise, col’s security strategy is R (worst case is row = B)
     • Guaranteed levels:
       – Let $\underline{v}$ denote the guaranteed payoff for row ($= 1$)
       – Let $\overline{v}$ denote the maximum penalty for col ($= 2$)
     • Example:

              L    R
         T    3    0
         B    2    1

       – row: B (worst case is col = R)
       – col: R (worst case is row = B)
       – $\underline{v} = \overline{v} = 1$
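
The worst-case reasoning above is a row-wise minimum followed by a maximum (for row), and a column-wise maximum followed by a minimum (for col). A minimal sketch, assuming the game is stored as a list of rows of row-player payoffs; the function name `pure_security` is hypothetical:

```python
def pure_security(M):
    """Pure security strategies and levels for a single-matrix zero-sum game.

    Returns (row index, lower level, column index, upper level).
    """
    row_worst = [min(row) for row in M]        # worst case for each row
    col_worst = [max(col) for col in zip(*M)]  # worst case for each column
    v_lower = max(row_worst)                   # row's guaranteed payoff
    v_upper = min(col_worst)                   # col's maximum penalty
    return row_worst.index(v_lower), v_lower, col_worst.index(v_upper), v_upper


# Rows are (T, B), columns are (L, R); index 1 means B for row and R for col.
print(pure_security([[3, 0], [1, 2]]))  # (1, 1, 1, 2)
print(pure_security([[3, 0], [2, 1]]))  # (1, 1, 1, 1)
```

In the first game the two levels differ (1 versus 2); in the second they coincide at 1.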

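  6. Maximin and minimax
     • Question: How do $\underline{v}$ and $\overline{v}$ compare in general?
       – Is it always the case that $\underline{v} \le \overline{v}$?
       – When is it the case that $\underline{v} = \overline{v}$?
     • Fact: Computation of $\underline{v}$ and $\overline{v}$ is precisely the same as “maximin” and “minimax” computations
     • Let $F : X \times Y \to \mathbb{R}$
     • Maximin: $\max_{x \in X} \min_{y \in Y} F(x, y)$
       – $x$ commits
       – $y$ minimizes (as a function of $x$)
         $\max_{x \in X} F(x, y_{wc}(x))$ where $F(x, y_{wc}(x)) = \min_{y \in Y} F(x, y)$
     • Minimax: (Similar interpretation) $\min_{y \in Y} \max_{x \in X} F(x, y)$
     • Claim: “Largest minimum is smaller than smallest maximum”
         $\max_{x \in X} \min_{y \in Y} F(x, y) \le \min_{y \in Y} \max_{x \in X} F(x, y)$
     • Proof: Suppose row $x_3$ attains the maximin and column $y_3$ attains the minimax, and let $\alpha = F(x_3, y_3)$ be the entry where they cross.

                y_1                      y_2    y_3
         x_1    ·                        ·      min_y max_x F(x, y)
         x_2    ·                        ·      ·
         x_3    max_x min_y F(x, y)      ·      α

       $\alpha$ is at least the minimum of its row and at most the maximum of its column, so
         $\max_{x \in X} \min_{y \in Y} F(x, y) \le \alpha \le \min_{y \in Y} \max_{x \in X} F(x, y)$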
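
The claim can be sanity-checked numerically on finite tables. This is only an illustration under the assumption that $X$ and $Y$ are finite and $F$ is stored as a list of rows; it is a spot check, not a proof. The names `maximin` and `minimax` are chosen here for illustration.

```python
import random


def maximin(F):
    """max over rows x of (min over columns y of F[x][y])."""
    return max(min(row) for row in F)


def minimax(F):
    """min over columns y of (max over rows x of F[x][y])."""
    return min(max(col) for col in zip(*F))


# Spot check on random 3 x 4 integer tables.
random.seed(0)
for _ in range(1000):
    F = [[random.randint(-9, 9) for _ in range(4)] for _ in range(3)]
    assert maximin(F) <= minimax(F)

# The inequality can be strict, as in matching pennies.
pennies = [[1, -1], [-1, 1]]
print(maximin(pennies), minimax(pennies))  # -1 1
```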

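  7. Security strategies
     • Notation:
       – Set of rows: $\mathcal{I}$
       – Set of columns: $\mathcal{J}$
       – Game matrix elements: $m_{ij}$
     • Define:
         $\underline{v} = \max_{i \in \mathcal{I}} \min_{j \in \mathcal{J}} m_{ij}$
         $\overline{v} = \min_{j \in \mathcal{J}} \max_{i \in \mathcal{I}} m_{ij}$
     • From prior result: $\underline{v} \le \overline{v}$
     • Game has a value of $v^*$ if $\underline{v} = \overline{v} = v^*$
     • $i^*$ is a maximizing security strategy (or maximinimizer) if $\underline{v} \le m_{i^* j}$ for all $j$, i.e., $i^*$ assures a payoff of at least $\underline{v}$
     • $j^*$ is a minimizing security strategy (or minimaximizer) if $m_{i j^*} \le \overline{v}$ for all $i$, i.e., $j^*$ assures a penalty of at most $\overline{v}$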

  8. Mixed strategies
     • Recall previous example (matching pennies):

              H    T
         H    1   −1
         T   −1    1

       – Row player cannot assure a payoff greater than −1 with either H or T
       – What if row player randomizes 50/50? Then can assure a payoff of at least 0
     • Recall previous example:

              L    R
         T    3    0
         B    1    2

       – Col player cannot assure a penalty of less than 2 with either L or R
       – What if col player randomizes $(1/2, 1/2)$? Then can assure a penalty of at most 1.5
     • How do security levels change when using mixed strategies (i.e., probabilistic strategies) as opposed to pure strategies (i.e., non-probabilistic strategies)?

  9. (Mixed) Security strategies
     • Discussion parallels that of pure actions
     • Notation:
       – Mixed strategy of row player: $p \in \Delta$
       – Mixed strategy of column player: $q \in \Delta$
     • Claim: Expected payoff to (maximizing) row player is
         $p^T M q = \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} m_{ij} \, p_i \, q_j$
     • Define:
         $\underline{v} = \max_{p \in \Delta} \min_{q \in \Delta} p^T M q$
         $\overline{v} = \min_{q \in \Delta} \max_{p \in \Delta} p^T M q$
     • As before, $\underline{v} \le \overline{v}$, and the game has a value of $v^*$ if $\underline{v} = \overline{v} = v^*$
     • $p^*$ is a maximizing security strategy (or maximinimizer) if $\underline{v} \le p^{*T} M q$ for all $q$
     • $q^*$ is a minimizing security strategy (or minimaximizer) if $p^T M q^* \le \overline{v}$ for all $p$
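
A minimal sketch of the expected payoff $p^T M q$ and of what a given mixed strategy guarantees, applied to the two 50/50 examples from the mixed-strategies slide. It relies on the fact that $p^T M q$ is linear in the opponent's strategy, so the worst case is attained at a pure action. The function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as Fr


def expected_payoff(p, M, q):
    """p^T M q: sum over i, j of m_ij * p_i * q_j."""
    return sum(p[i] * M[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))


def row_guarantee(p, M):
    """Payoff that mixed strategy p assures row: worst case over pure columns."""
    return min(sum(p[i] * M[i][j] for i in range(len(p)))
               for j in range(len(M[0])))


def col_guarantee(q, M):
    """Penalty that mixed strategy q assures col: worst case over pure rows."""
    return max(sum(M[i][j] * q[j] for j in range(len(q)))
               for i in range(len(M)))


half = [Fr(1, 2), Fr(1, 2)]
pennies = [[1, -1], [-1, 1]]
game = [[3, 0], [1, 2]]

print(row_guarantee(half, pennies))  # 0
print(col_guarantee(half, game))     # 3/2
```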

  10. Value
     • Minimax theorem: With mixed strategies, $\underline{v} = \overline{v} = v^*$
     • Proof: Judicious use of “separating hyperplane” theorem
     • Remarks:
       – Every zero-sum matrix game has a value over mixed strategies
       – Mixed strategies are a reasonable prediction of behavior in zero-sum games
       – Relatively easy to compute security strategies in zero-sum games
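
The slides do not say how security strategies are computed. One standard route, offered here as an assumption rather than as the lecture's method, is a linear program. A sketch of the row player's problem in the notation of the earlier slides, with $v$ an auxiliary scalar variable:

```latex
\begin{aligned}
\max_{p,\, v} \quad & v \\
\text{s.t.} \quad & \sum_{i \in \mathcal{I}} p_i \, m_{ij} \ge v
    \quad \text{for all } j \in \mathcal{J}, \\
& \sum_{i \in \mathcal{I}} p_i = 1, \qquad p_i \ge 0
    \quad \text{for all } i \in \mathcal{I}.
\end{aligned}
```

The constraints say that $p$ earns at least $v$ against every pure column, which is enough because the expected payoff is linear in $q$. The optimal $v$ is then $\underline{v}$ and the optimal $p$ is a maximizing security strategy; the column player's problem is the analogous minimization.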

  11. Example
     • What are the security strategies and value of the following game?

              L    R
         T    3    0
         B    1    2

     • Suppose ROW plays a strategy $(p, 1 - p)$, i.e., plays T with probability $p$
       – ROW’s expected utility if COL plays L: $3p + 1(1 - p) = 2p + 1$
       – ROW’s expected utility if COL plays R: $0 \cdot p + 2(1 - p) = 2 - 2p$
     • Plot and inspect:
       [Plot: expected utility for ROW versus $p \in [0, 1]$, with one line for COL playing L and one line for COL playing R]
     • The $p$ that maximizes the minimum payoff is where the two lines cross, $2p + 1 = 2 - 2p$, i.e., $p = 0.25$, with a security level of $\underline{v} = 1.5$.
     • Similar analysis for COL demonstrates that $(1/2, 1/2)$ is a security strategy with a security level of $\overline{v} = 1.5$.
     • Hence, the game has a value $v^* = \underline{v} = \overline{v} = 1.5$.
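
The "plot and inspect" step can be replaced by a small exact computation. ROW's guaranteed payoff is the minimum of finitely many lines in $p$, so its maximum over $[0, 1]$ lies at an endpoint or at a crossing of two lines. The sketch below assumes a game with exactly two rows and integer entries; the function name `solve_two_row_game` is hypothetical.

```python
from fractions import Fraction as Fr
from itertools import combinations


def solve_two_row_game(M):
    """Maximin mixed strategy (p, 1 - p) for a 2 x n matrix game.

    Returns (p, guaranteed payoff), found by checking p = 0, p = 1,
    and every crossing point of two column lines inside [0, 1].
    """
    top, bottom = M

    def guarantee(p):
        # Worst case over pure columns when the top row has probability p.
        return min(p * t + (1 - p) * b for t, b in zip(top, bottom))

    candidates = [Fr(0), Fr(1)]
    for (t1, b1), (t2, b2) in combinations(zip(top, bottom), 2):
        # Column line j is b_j + p * (t_j - b_j); solve for the crossing.
        denom = (t1 - b1) - (t2 - b2)
        if denom != 0:
            p = Fr(b2 - b1, denom)
            if 0 <= p <= 1:
                candidates.append(p)
    best = max(candidates, key=guarantee)
    return best, guarantee(best)


print(solve_two_row_game([[3, 0], [1, 2]]))
# COL's problem is ROW's problem on the negated transpose.
print(solve_two_row_game([[-3, -1], [0, -2]]))
```

The first call gives $p = 1/4$ with guaranteed payoff $3/2$. The second gives $q = 1/2$ with value $-3/2$ for the negated game, i.e., a penalty of at most $3/2$ for COL, matching the slide.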
