1. CMU 15-896 Noncooperative games 2: Learning and minimax. Teacher: Ariel Procaccia

2. Reminder: The Minimax Theorem
• Theorem [von Neumann, 1928]: Every 2-player zero-sum game has a unique value v such that:
  o Player 1 can guarantee value at least v
  o Player 2 can guarantee loss at most v
• We will prove the theorem via no-regret learning
15896 Spring 2016: Lecture 18

3. How to reach your spaceship
• Each morning, pick one of n possible routes
• Then find out how long each route took
• Is there a strategy for picking routes that does almost as well as the best fixed route in hindsight?
(Figure: two example routes, taking 53 and 47 minutes)

4. The model
• View the game as a matrix (possibly with an infinite number of columns): rows for the algorithm, columns for the adversary
• The algorithm picks a row, the adversary picks a column
• The algorithm pays the cost of (row, column) and gets the column as feedback
• Assume costs are in [0, 1]

5. The model
• Define average regret in T time steps as (average per-day cost of the algorithm) - (average per-day cost of the best fixed row in hindsight)
• No-regret algorithm: regret → 0 as T → ∞
• We are not competing with an adaptive strategy, just the best fixed row
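The definition above can be sketched in code. This is an illustrative helper (the names `average_regret`, `costs`, and `chosen_rows` are my own, not from the slides): given the full cost matrix revealed over T days and the algorithm's picks, compare the algorithm's total cost to that of the best fixed row in hindsight.

```python
def average_regret(costs, chosen_rows):
    """costs[t][i] = cost of row i on day t; chosen_rows[t] = the algorithm's pick."""
    T = len(costs)
    alg_cost = sum(costs[t][chosen_rows[t]] for t in range(T))
    n_rows = len(costs[0])
    # Best fixed row in hindsight: the row minimizing total cost over all T days.
    best_fixed = min(sum(costs[t][i] for t in range(T)) for i in range(n_rows))
    return (alg_cost - best_fixed) / T

# Example: two routes; route 1 is always free, yet the algorithm alternates.
costs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
picks = [0, 1, 0, 1]
print(average_regret(costs, picks))  # 0.5: alg pays 2 in total, best fixed row pays 0
```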

6. Example
• Algorithm 1: Alternate between U and D
• Adversary's cost matrix:
  U: 1  0
  D: 0  1
• Poll 1: What is algorithm 1's worst-case average regret?
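A quick simulation suggests the answer to poll 1. Because algorithm 1 is deterministic, a hypothetical adaptive adversary (my construction, not the slides') can always play the column that costs the current row 1:

```python
# Adaptive adversary vs. the alternating algorithm on the 2x2 matrix above.
T = 1000
matrix = [[1, 0],   # row U
          [0, 1]]   # row D
alg_cost = 0
col_counts = [0, 0]
for t in range(T):
    row = t % 2          # alternate U, D, U, D, ...
    col = row            # adversary matches the row, forcing cost 1 every day
    alg_cost += matrix[row][col]
    col_counts[col] += 1

# Row U pays 1 only when column 0 comes up; row D only when column 1 does.
best_fixed = min(col_counts[0], col_counts[1])
avg_regret = (alg_cost - best_fixed) / T
print(avg_regret)  # 0.5
```

The same trick works against any deterministic algorithm on this matrix, which motivates the randomized algorithms later in the lecture.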

7. Example
• Algorithm 2: Choose the action that has the lower total cost so far
• Adversary's cost matrix:
  U: 1  0
  D: 0  1
• Poll 2: What is algorithm 2's worst-case average regret?

8. What can we say more generally about deterministic algorithms?

9. Using expert advice
• Want to predict the stock market
• Solicit advice from experts (expert = someone with an opinion)
• (Table: daily up/down predictions of Expert 1, Expert 2, Expert 3, and Charlie against the truth; the prediction symbols were lost in extraction)
• Can we do as well as the best expert in hindsight?

10. Simpler question
• One of the experts never makes a mistake
• We want to find out which one
• Algorithm 3: Take a majority vote over the experts that have been correct so far
• Poll 3: What is algorithm 3's worst-case number of mistakes?
  1. Θ(1)
  2. Θ(log n)
  3. Θ(n)
  4. ∞
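Algorithm 3 can be sketched as follows (the simulation harness, including how imperfect experts behave, is my own assumption; the slides specify only the voting rule). Every mistake means the majority of still-alive experts was wrong, so each mistake at least halves the alive set, bounding the mistakes by log₂(n):

```python
import math
import random

def run_halving(n_experts, T, seed=0):
    """Majority vote over still-correct experts, with one perfect expert."""
    rng = random.Random(seed)
    perfect = rng.randrange(n_experts)        # unknown to the algorithm
    alive = set(range(n_experts))
    mistakes = 0
    for _ in range(T):
        truth = rng.randrange(2)
        preds = {i: (truth if i == perfect else rng.randrange(2)) for i in alive}
        votes = sum(preds[i] for i in alive)
        guess = 1 if 2 * votes > len(alive) else 0   # majority vote (ties -> 0)
        if guess != truth:
            mistakes += 1
        alive = {i for i in alive if preds[i] == truth}  # cross off wrong experts
    return mistakes

n = 64
m = run_halving(n, T=200)
print(m, "<=", int(math.log2(n)))  # at most log2(64) = 6 mistakes
```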

11. What if no expert is perfect?
• Idea: Run algorithm 3 until all experts are crossed off, then repeat
• Makes at most O(log n) mistakes per mistake of the best expert
• But this is wasteful: we keep forgetting what we've learned

12. Weighted Majority
• Intuition: Making a mistake doesn't disqualify an expert, it just lowers its weight
• Weighted Majority Algorithm:
  o Start with all experts having weight 1
  o Predict based on a weighted majority vote
  o Penalize mistakes by cutting weight in half
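The three steps above can be sketched directly (a minimal implementation for binary prediction; the function name and input format are my own):

```python
def weighted_majority(expert_preds, truths):
    """expert_preds[t][i] in {0,1}; returns (algorithm's mistakes, final weights)."""
    n = len(expert_preds[0])
    w = [1.0] * n                                # all experts start at weight 1
    mistakes = 0
    for preds, truth in zip(expert_preds, truths):
        w1 = sum(w[i] for i in range(n) if preds[i] == 1)
        guess = 1 if w1 > sum(w) / 2 else 0      # weighted majority vote
        if guess != truth:
            mistakes += 1
        for i in range(n):
            if preds[i] != truth:
                w[i] /= 2                        # halve the weights of wrong experts
    return mistakes, w

preds = [[1, 0, 1], [0, 0, 1], [1, 1, 1]]
truths = [1, 0, 1]
m, w = weighted_majority(preds, truths)
print(m, w)  # 0 mistakes; final weights [1.0, 0.5, 0.5]
```

Note that wrong experts keep voting, unlike in algorithm 3; they just count for less.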

13. Example run
• (Table: per-round predictions of Expert 1, Expert 2, Expert 3, and Charlie against the algorithm and the truth, with the weighted vote tallies; the prediction symbols were lost in extraction)
• Weights before round 1: 1, 1, 1, 1
• Weights before round 2: 0.5, 1, 1, 1
• Weights before round 3: 0.5, 1, 0.5, 0.5

14. Weighted Majority: Analysis
• Let M = number of mistakes we've made so far, m = number of mistakes of the best expert so far, W = total weight (starts at n)
• For each of our mistakes, W drops by at least 25%, so after M mistakes: W ≤ n(3/4)^M
• Weight of the best expert is (1/2)^m, so (1/2)^m ≤ n(3/4)^M, giving M ≤ 2.4(m + log₂ n)

15. Randomized Weighted Majority
• Randomized Weighted Majority Algorithm:
  o Start with all experts having weight 1
  o Predict proportionally to weights: if the total weight of experts predicting 0 is W₀ and the total weight of experts predicting 1 is W₁, predict 0 with probability W₀/(W₀+W₁) and 1 with probability W₁/(W₀+W₁)
  o Penalize mistakes by removing an ε fraction of weight
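One step of this randomized rule can be sketched as follows (the function name and interface are illustrative assumptions, not from the slides):

```python
import random

def rwm_step(w, preds, truth, eps, rng):
    """w: current weights; preds[i] in {0,1}; returns (our prediction, new weights)."""
    w1 = sum(wi for wi, p in zip(w, preds) if p == 1)
    total = sum(w)
    guess = 1 if rng.random() < w1 / total else 0  # predict 1 w.p. W1/(W0+W1)
    # Remove an eps fraction of weight from every expert that was wrong.
    new_w = [wi * (1 - eps) if p != truth else wi for wi, p in zip(w, preds)]
    return guess, new_w

rng = random.Random(0)
guess, w = rwm_step([1.0, 1.0, 1.0], [1, 0, 1], truth=1, eps=0.5, rng=rng)
print(guess, w)  # the lone wrong expert's weight shrinks to 0.5
```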

16. Randomized Weighted Majority
• Idea: smooth out the worst case
• The worst case is a ∼50-50 weight split: now we have a 50% chance of getting it right
• What about a 90-10 split? We're very likely to agree with the majority

17. Analysis
• At time t, a fraction F_t of the weight is on experts that make a mistake
• So we make a mistake with probability F_t, and remove an εF_t fraction of the total weight
• W_final = n ∏_t (1 − εF_t)
• ln(W_final) = ln n + Σ_t ln(1 − εF_t) ≤ ln n − ε Σ_t F_t (next slide)

18. Analysis
• Using ln(1 − x) ≤ −x: ln(W_final) ≤ ln n − ε Σ_t F_t = ln n − εM, where M = Σ_t F_t is our expected number of mistakes

19. Analysis
• Weight of the best expert is (1 − ε)^m, so W_final ≥ (1 − ε)^m
• Rearranging: M ≤ m(1 + ε) + (ln n)/ε
• By setting ε = √((ln n)/m) and solving, we get M ≤ m + 2√(m ln n)
• Since m ≤ T, M ≤ m + 2√(T ln n)
• Average regret is 2√((ln n)/T) → 0

20. More generally
• Each expert is an action with cost in [0, 1]
• Run Randomized Weighted Majority:
  o Choose expert i with probability p_i = w_i / Σ_j w_j
  o Update weights: w_i ← w_i(1 − ε c_i)
• Same analysis applies:
  o Our expected cost: Σ_i p_i c_i
  o Fraction of weight removed: ε Σ_i p_i c_i
  o So, fraction removed = ε · (our cost)
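The general-cost version above can be sketched and checked against the regret bound from slide 19 (the harness with random costs is my own; the update rule and the choice ε = √((ln n)/T) follow the slides):

```python
import math
import random

def rwm_costs(costs, eps):
    """Expected total cost of RWM with costs[t][i] in [0,1]."""
    n = len(costs[0])
    w = [1.0] * n
    total_cost = 0.0
    for c in costs:
        W = sum(w)
        total_cost += sum(w[i] / W * c[i] for i in range(n))  # expected cost this step
        w = [w[i] * (1 - eps * c[i]) for i in range(n)]       # w_i <- w_i(1 - eps*c_i)
    return total_cost

T, n = 2000, 10
rng = random.Random(42)
costs = [[rng.random() for _ in range(n)] for _ in range(T)]
best_fixed = min(sum(c[i] for c in costs) for i in range(n))
eps = math.sqrt(math.log(n) / T)
avg_regret = (rwm_costs(costs, eps) - best_fixed) / T
print(avg_regret, "<", 2 * math.sqrt(math.log(n) / T))
```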

21. Proof of the minimax thm
• Suppose for contradiction that a zero-sum game has values v₁ > v₂ such that:
  o If the column player commits first, there is a row that guarantees the row player at least v₁
  o If the row player commits first, there is a column that guarantees the row player at most v₂
• Scale the matrix so that payoffs to the row player are in [0, 1], and let δ = v₁ − v₂

22. Proof of the minimax thm
• The row player plays RWM, and the column player responds optimally to the current mixed strategy
• After T steps:
  o ALG ≥ (best row in hindsight) − 2√(T log n)
  o Best row in hindsight ≥ T · v₁
  o ALG ≤ T · v₂
• It follows that T · v₂ ≥ T · v₁ − 2√(T log n), a contradiction for large enough T
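The proof's dynamics double as an algorithm for approximating the game's value. Below is a sketch on matching pennies (my choice of example game; its value is 0.5 when payoffs are in {0,1}): the row player runs RWM on costs 1 − payoff, the column player best-responds each step, and the row player's average payoff is squeezed toward the value from both sides.

```python
import math

A = [[1, 0],   # row player's payoff matrix (matching pennies)
     [0, 1]]
T = 5000
eps = math.sqrt(math.log(2) / T)
w = [1.0, 1.0]
total = 0.0
for _ in range(T):
    W = sum(w)
    p = [wi / W for wi in w]                     # row player's current mixed strategy
    # Column player best-responds: minimize the row player's expected payoff.
    col = min(range(2), key=lambda j: sum(p[i] * A[i][j] for i in range(2)))
    total += sum(p[i] * A[i][col] for i in range(2))
    # RWM update with cost c_i = 1 - A[i][col].
    w = [w[i] * (1 - eps * (1 - A[i][col])) for i in range(2)]

print(total / T)  # approaches the game's value, 0.5
```

Each round's payoff is at most the value (the column player best-responds), while the no-regret guarantee keeps the average within 2√((log 2)/T) of it.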
