
CSC304 Lecture 6 Game Theory: Minimax Theorem via Expert Learning



1. CSC304 Lecture 6 Game Theory: Minimax Theorem via Expert Learning (Nisarg Shah)

2. 2-Player Zero-Sum Games
• Reward of P2 = −(Reward of P1)
➢ Matrix $A$ s.t. $A_{i,j}$ is the reward to P1 when P1 chooses her $i$-th action and P2 chooses her $j$-th action
➢ Mixed strategy profile $(x_1, x_2)$ → reward to P1 is $x_1^T A x_2$
• Minimax Theorem: For all $A$,
$$\max_{x_1} \min_{x_2} \; x_1^T A x_2 \;=\; \min_{x_2} \max_{x_1} \; x_1^T A x_2$$
➢ Proof through online expert learning!
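
A minimal numeric sketch of these definitions (not from the slides; the payoff matrix below is a hypothetical example, Matching Pennies):

```python
import numpy as np

# Hypothetical example: Matching Pennies from the row player's perspective.
# A[i, j] is the reward to P1 when P1 plays action i and P2 plays action j.
A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])

def reward_to_p1(x1, x2, A):
    """Expected reward to P1 under mixed strategies x1 (rows) and x2 (columns): x1^T A x2."""
    return x1 @ A @ x2

# Uniform mixing by both players gives this game's value (0 for Matching Pennies).
x1 = np.array([0.5, 0.5])
x2 = np.array([0.5, 0.5])
print(reward_to_p1(x1, x2, A))  # 0.0

# Against a fixed mixed strategy, some best response is always a pure action,
# so checking guarantees only requires a max/min over the opponent's actions:
print(min(A.T @ x1))  # P1's guarantee when playing x1: min over P2's columns
print(max(A @ x2))    # the most P1 can get when P2 plays x2: max over P1's rows
```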

3. Online Expert Learning
• Setup:
➢ On each day, we want to predict if a stock price will go up or down
➢ $n$ experts provide their predictions every day
o Each expert says either up or down
➢ Based on their advice, we make a final prediction
➢ At the end of the day, we learn if our prediction was correct (reward = 1) or wrong (reward = 0)
• Goal:
➢ Do almost as well as the best expert in hindsight!

4. Online Expert Learning
• Notation:
➢ $n$ = #experts
➢ Predictions and ground truth: 1 or 0
➢ $m_i^{(T)}$ = #mistakes of expert $i$ in the first $T$ steps
➢ $M^{(T)}$ = #mistakes of the algorithm in the first $T$ steps
• Simplest idea:
➢ Keep a weight for each expert
➢ Use a weighted majority of the experts to make the prediction
➢ Decrease the weight of an expert whenever the expert makes a mistake

5. Online Expert Learning
• Weighted Majority:
➢ Fix $\eta \le 1/2$.
➢ Start with $w_i^{(1)} = 1$.
➢ In time step $t$, predict 1 if the total weight of experts predicting 1 is larger than the total weight of experts predicting 0, and predict 0 otherwise.
➢ At the end of time step $t$, set $w_i^{(t+1)} \leftarrow w_i^{(t)} \cdot (1-\eta)$ for every expert $i$ that made a mistake.
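
A short sketch of the Weighted Majority rule described on this slide (the function name and the input format are illustrative assumptions):

```python
import numpy as np

def weighted_majority(expert_predictions, truths, eta=0.5):
    """Deterministic Weighted Majority, as on this slide (a sketch).

    expert_predictions: T x n array of 0/1 predictions (T days, n experts).
    truths:             length-T array of 0/1 outcomes.
    Returns the number of mistakes made by the algorithm.
    """
    T, n = expert_predictions.shape
    w = np.ones(n)                      # w_i^(1) = 1 for every expert
    mistakes = 0
    for t in range(T):
        preds = expert_predictions[t]
        # Predict by weighted majority: 1 iff the weight on "1" exceeds the weight on "0".
        weight_one = w[preds == 1].sum()
        weight_zero = w[preds == 0].sum()
        guess = 1 if weight_one > weight_zero else 0
        if guess != truths[t]:
            mistakes += 1
        # Multiply the weight of every expert that erred by (1 - eta).
        w[preds != truths[t]] *= (1 - eta)
    return mistakes
```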

6. Online Expert Learning
• Theorem: For every $i$ and $T$,
$$M^{(T)} \le 2(1+\eta)\, m_i^{(T)} + \frac{2\ln n}{\eta}$$
• Proof:
➢ Consider a "potential function" $\Phi^{(t)} = \sum_i w_i^{(t)}$.
➢ If the algorithm makes a mistake in round $t$, at least half of the weight decreases by a factor of $1-\eta$:
$$\Phi^{(t+1)} \;\le\; \Phi^{(t)} \left( \frac{1}{2} + \frac{1}{2}(1-\eta) \right) \;=\; \Phi^{(t)} \left( 1 - \frac{\eta}{2} \right)$$

7. Online Expert Learning
• Theorem: For every $i$ and $T$,
$$M^{(T)} \le 2(1+\eta)\, m_i^{(T)} + \frac{2\ln n}{\eta}$$
• Proof:
➢ $\Phi^{(1)} = n$
➢ Thus: $\Phi^{(T+1)} \le n \left(1 - \frac{\eta}{2}\right)^{M^{(T)}}$
➢ Weight of expert $i$: $w_i^{(T+1)} = (1-\eta)^{m_i^{(T)}}$
➢ Use $\Phi^{(T+1)} \ge w_i^{(T+1)}$ and $-\ln(1-\eta) \le \eta + \eta^2$ (as $\eta \le 1/2$).
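
For completeness, the two facts above combine into the stated bound as follows (this algebra is only implicit in the slides):

```latex
\begin{align*}
(1-\eta)^{m_i^{(T)}} \;\le\; \Phi^{(T+1)} \;\le\; n\left(1-\tfrac{\eta}{2}\right)^{M^{(T)}}
  &\;\Longrightarrow\; m_i^{(T)} \ln(1-\eta) \;\le\; \ln n + M^{(T)} \ln\!\left(1-\tfrac{\eta}{2}\right) \\
  &\;\Longrightarrow\; M^{(T)} \left(-\ln\!\left(1-\tfrac{\eta}{2}\right)\right) \;\le\; \ln n + m_i^{(T)} \left(-\ln(1-\eta)\right).
\end{align*}
% Using -ln(1 - eta/2) >= eta/2 and -ln(1 - eta) <= eta + eta^2 (valid for eta <= 1/2):
\[
M^{(T)} \cdot \frac{\eta}{2} \;\le\; \ln n + m_i^{(T)}\,\eta(1+\eta)
\quad\Longrightarrow\quad
M^{(T)} \;\le\; 2(1+\eta)\, m_i^{(T)} + \frac{2\ln n}{\eta}.
\]
```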

8. Online Expert Learning
• Beautiful!
➢ Comparison to the best expert in hindsight.
➢ At most (roughly) twice as many mistakes, plus a small additive term.
➢ Holds in the worst case over how the experts make mistakes
o No statistical assumptions.
➢ Simple policy to implement.
• It can be shown that this bound is tight for any deterministic algorithm.

9. Randomized Weighted Majority
• Randomization ⇒ we can beat the factor of 2
• Simple change:
➢ At the beginning of round $t$, let
o $\Phi_1^{(t)}$ = total weight of experts predicting 1
o $\Phi_0^{(t)}$ = total weight of experts predicting 0
➢ Deterministic: predict 1 if $\Phi_1^{(t)} > \Phi_0^{(t)}$, and 0 otherwise.
➢ Randomized: predict 1 with probability $\frac{\Phi_1^{(t)}}{\Phi_1^{(t)} + \Phi_0^{(t)}}$, and 0 with the remaining probability.

10. Randomized Weighted Majority
• Equivalently: "Pick an expert with probability proportional to weight, and go with their prediction"
➢ Pr[picking expert $i$ in step $t$] $= p_i^{(t)} = \frac{w_i^{(t)}}{\Phi^{(t)}}$
• Let $c_i^{(t)} = 1$ if expert $i$ makes a mistake in step $t$, and 0 otherwise.
• The algorithm makes a mistake in round $t$ with probability $\sum_i p_i^{(t)} c_i^{(t)} = \mathbf{p}^{(t)} \cdot \mathbf{c}^{(t)}$
• $E[\#\text{mistakes after } T \text{ rounds}] = \sum_{t=1}^{T} \mathbf{p}^{(t)} \cdot \mathbf{c}^{(t)}$
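
A sketch of Randomized Weighted Majority in this expert-sampling form (the function name and the 0/1 cost-matrix input are illustrative assumptions):

```python
import numpy as np

def randomized_weighted_majority(costs, eta, seed=0):
    """Randomized Weighted Majority via expert sampling (a sketch of this slide).

    costs: T x n 0/1 array with c_i^(t) = 1 iff expert i errs in step t.
    Returns (sampled #mistakes, expected #mistakes = sum_t p^(t) . c^(t)).
    """
    rng = np.random.default_rng(seed)
    T, n = costs.shape
    w = np.ones(n)                         # w_i^(1) = 1
    sampled, expected = 0, 0.0
    for t in range(T):
        p = w / w.sum()                    # p_i^(t) = w_i^(t) / Phi^(t)
        expected += p @ costs[t]           # Pr[mistake in round t] = p^(t) . c^(t)
        i = rng.choice(n, p=p)             # pick an expert prop. to weight, follow it
        sampled += costs[t, i]
        w *= 1 - eta * costs[t]            # multiplicative update on every erring expert
    return sampled, expected
```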

11. Randomized Weighted Majority
$$\Phi^{(t+1)} = \sum_i w_i^{(t)} \left(1 - \eta\, c_i^{(t)}\right)
= \Phi^{(t)} - \eta\, \Phi^{(t)} \sum_i p_i^{(t)} c_i^{(t)}
= \Phi^{(t)} \left(1 - \eta\, \mathbf{p}^{(t)} \cdot \mathbf{c}^{(t)}\right)
\le \Phi^{(t)} \exp\!\left(-\eta\, \mathbf{p}^{(t)} \cdot \mathbf{c}^{(t)}\right)$$
• Applying iteratively: $\Phi^{(T+1)} \le n \cdot \exp\!\left(-\eta \cdot E[\#\text{mistakes}]\right)$
• But $\Phi^{(T+1)} \ge w_i^{(T+1)} = (1-\eta)^{m_i^{(T)}}$
• QED!

12. Randomized Weighted Majority
• Theorem: For every $i$ and $T$, the expected number of mistakes of Randomized Weighted Majority in the first $T$ rounds is
$$M^{(T)} \le (1+\eta)\, m_i^{(T)} + \frac{2\ln n}{\eta}$$
• Setting $\eta = \sqrt{\frac{\ln n}{T}}$: $\;\; M^{(T)} \le m_i^{(T)} + O\!\left(\sqrt{T \ln n}\right)$
• We say that the algorithm has $O\!\left(\sqrt{T \ln n}\right)$ regret
• Sublinear regret in $T$
• Regret per round → 0 as $T \to \infty$
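
The theorem follows from the two inequalities on the previous slide; a short derivation (the calculation actually yields an additive $\ln n/\eta$ term, which in particular implies the bound stated above), followed by the choice of $\eta$:

```latex
\begin{align*}
(1-\eta)^{m_i^{(T)}} \;\le\; n\, e^{-\eta\, E[M^{(T)}]}
  &\;\Longrightarrow\; \eta\, E[M^{(T)}] \;\le\; \ln n + m_i^{(T)}\bigl(-\ln(1-\eta)\bigr)
   \;\le\; \ln n + m_i^{(T)}(\eta + \eta^2) \\
  &\;\Longrightarrow\; E[M^{(T)}] \;\le\; (1+\eta)\, m_i^{(T)} + \frac{\ln n}{\eta}.
\end{align*}
% With m_i^{(T)} <= T and eta = sqrt(ln n / T):
\[
E[M^{(T)}] \;\le\; m_i^{(T)} + \eta\, T + \frac{\ln n}{\eta}
         \;=\; m_i^{(T)} + 2\sqrt{T \ln n}
         \;=\; m_i^{(T)} + O\!\left(\sqrt{T \ln n}\right).
\]
```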

13. How is this related to the minimax theorem?!!

14. Minimax via Regret Learning
• Recall:
$$V_R = \max_{x_1} \min_{x_2} \; x_1^T A x_2 \qquad\qquad V_C = \min_{x_2} \max_{x_1} \; x_1^T A x_2$$
• Row player's guarantee: my reward $\ge V_R$
• Column player's guarantee: row player's reward $\le V_C$
• Hence, $V_R \le V_C$ (trivial direction)
• To prove: $V_R = V_C$

15. Minimax via Regret Learning
• Scale the values in $A$ to be in $[0,1]$.
➢ Without loss of generality.
• Suppose for contradiction that $V_R = V_C - \varepsilon$ for some $\varepsilon > 0$.
• Suppose the row player $R$ uses Randomized Weighted Majority (experts = the row player's actions).
➢ In each round, the column player $C$ responds by choosing the action that minimizes the row player's expected reward.

16. Minimax via Regret Learning
• After $T$ iterations, the row player's total reward $V$ satisfies:
➢ $V \le T \cdot V_R$
➢ $V \ge$ "reward of the best action in hindsight" $- O\!\left(\sqrt{T \ln n}\right)$
o Reward of the best action in hindsight $\ge T \cdot V_C$. Why?
o Suppose the column player plays action $j_t$ in round $t$
o This is equivalent to playing the mixed strategy $s$ in each round, where $s$ picks $t \in \{1, \dots, T\}$ uniformly at random and plays $j_t$
o By the definition of $V_C$, $s$ cannot ensure that the row player's reward is less than $V_C$
o Hence, there is an action of the row player with expected reward at least $V_C$ against $s$, and its total reward over the $T$ rounds is at least $T \cdot V_C$

17. Minimax via Regret Learning
• After $T$ iterations, the row player's total reward $V$ satisfies:
➢ $V \le T \cdot V_R$
➢ $V \ge T \cdot V_C - O\!\left(\sqrt{T \ln n}\right)$
➢ Hence $T \cdot V_R = T \cdot (V_C - \varepsilon) \ge T \cdot V_C - O\!\left(\sqrt{T \ln n}\right)$
➢ So $\varepsilon\, T \le O\!\left(\sqrt{T \ln n}\right)$
➢ Contradiction for sufficiently large $T$.
• QED!
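
A small simulation of the dynamics used in this proof, assuming a hypothetical payoff matrix with entries in [0, 1]: the row player runs Randomized Weighted Majority over her actions (treating one minus the received reward as the loss, one common way to instantiate it) while the column player best-responds each round. The average reward per round should approach the value of the game.

```python
import numpy as np

# Hypothetical payoff matrix: A[i, j] = reward to the row player, entries in [0, 1].
A = np.array([[0.7, 0.2, 0.4],
              [0.3, 0.8, 0.5]])

n, T = A.shape[0], 10_000
eta = np.sqrt(np.log(n) / T)
w = np.ones(n)
total_reward, col_counts = 0.0, np.zeros(A.shape[1])

for t in range(T):
    x1 = w / w.sum()                      # row player's mixed strategy this round
    j = np.argmin(x1 @ A)                 # column player minimizes row's expected reward
    total_reward += x1 @ A[:, j]
    col_counts[j] += 1
    w *= 1 - eta * (1 - A[:, j])          # multiplicative update with loss = 1 - reward

x2_avg = col_counts / T                   # column player's empirical mixed strategy
print("average reward per round:", total_reward / T)
print("best row action's reward vs. the column player's empirical mix:", max(A @ x2_avg))
```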

18. Yao's Minimax Principle
• Goal:
➢ Provide a lower bound on the expected running time that any randomized algorithm for a problem can achieve in the worst case over problem instances
• Note:
➢ The expectation (in running time) is over the randomization of the algorithm
➢ The problem instance (worst case) is chosen to maximize this expected running time

19. Yao's Minimax Principle
• Notation:
➢ Capital letters for "randomized", lowercase for "deterministic"
➢ $d$: a deterministic algorithm
➢ $R$: a randomized algorithm
➢ $p$: a problem instance
➢ $P$: a distribution over problem instances
➢ $T(\cdot,\cdot)$: running time
• We are interested in $\min_R \max_p T(R, p)$

20. Yao's Minimax Principle
[Figure: a table of running times, with deterministic algorithms as rows and problem instances as columns]

21. Yao's Minimax Principle
• Minimax Theorem: $\min_R \max_p T(R, p) = \max_P \min_d T(d, P)$, where $T(R, p)$ and $T(d, P)$ denote expected running times
• So:
➢ To lower bound the E[running time] of any randomized algorithm $R$ on its worst-case instance $p$ by a quantity $Q$...
➢ Choose a distribution $P$ over problem instances, and show that every deterministic algorithm $d$ has expected running time at least $Q$ on problems drawn from $P$
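
A small numeric illustration of this recipe (the running-time table below is hypothetical): a randomized algorithm is a distribution over deterministic ones, and its worst-case expected time is lower-bounded by the best deterministic algorithm's expected time against any chosen instance distribution.

```python
import numpy as np

# Hypothetical running times: rows are deterministic algorithms, columns are
# problem instances, entries are T(d, p).
T = np.array([[3.0, 9.0, 4.0],
              [8.0, 2.0, 7.0],
              [5.0, 6.0, 3.0]])

# A randomized algorithm R is a distribution over the deterministic algorithms.
R = np.array([0.2, 0.5, 0.3])
worst_case_expected = max(R @ T)          # max over instances p of E_{d~R}[T(d, p)]

# Yao's lower-bound recipe: pick any distribution P over instances and compute the
# best deterministic algorithm's expected time against it.
P = np.array([0.4, 0.4, 0.2])
lower_bound = min(T @ P)                  # min over algorithms d of E_{p~P}[T(d, p)]

# The principle guarantees worst_case_expected >= lower_bound for every R and P.
print(worst_case_expected, lower_bound)
assert worst_case_expected >= lower_bound - 1e-12
```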
