CMU 15-896 Noncooperative Games 2: Learning and Minimax (Teacher: Ariel Procaccia)


SLIDE 1

CMU 15-896

Noncooperative games 2: Learning and minimax

Teacher: Ariel Procaccia

SLIDE 2

15896 Spring 2016: Lecture 18

Reminder: The Minimax Theorem

  • Theorem [von Neumann, 1928]: Every 2-player zero-sum game has a unique value $v$ such that:
  • Player 1 can guarantee value at least $v$
  • Player 2 can guarantee loss at most $v$
  • We will prove the theorem via no-regret learning

SLIDE 3

How to reach your spaceship

  • Each morning pick one of $n$ possible routes
  • Then find out how long each route took
  • Is there a strategy for picking routes that does almost as well as the best fixed route in hindsight?

[Figure: routes with travel times, e.g., 53 minutes, 47 minutes, ⋯]

SLIDE 4

The model

  • View as a matrix (possibly with infinitely many columns)
  • Algorithm picks a row, adversary picks a column
  • Algorithm pays the cost of (row, column) and gets the column as feedback
  • Assume costs are in $[0,1]$

[Figure: cost matrix, with the algorithm choosing rows and the adversary choosing columns]

SLIDE 5

The model

  • Define average regret in $T$ time steps as (average per-day cost of the algorithm) $-$ (average per-day cost of the best fixed row in hindsight)
  • No-regret algorithm: regret $\to 0$ as $T \to \infty$
  • We are not competing with an adaptive strategy, just with the best fixed row
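The definition above can be written as a short function. This is a sketch; the cost numbers in the example are made up for illustration:

```python
# Average regret over T days: (algorithm's average per-day cost) minus
# (average per-day cost of the best fixed row in hindsight).
def average_regret(alg_costs, row_costs):
    """alg_costs[t]    = cost the algorithm paid on day t
       row_costs[r][t] = cost fixed row r would have paid on day t"""
    T = len(alg_costs)
    best_fixed = min(sum(row) for row in row_costs)
    return (sum(alg_costs) - best_fixed) / T

# Toy example: 3 days, 2 rows; the best fixed row pays 1 in total,
# the algorithm pays 2, so the average regret is (2 - 1) / 3.
print(average_regret([1.0, 1.0, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]))
```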

SLIDE 6

Example

  • Algorithm 1: Alternate between U and D
  • Poll 1: What is algorithm 1’s worst-case average regret?

[Poll answer options and the 2×2 cost matrix figure not preserved in the transcript]
slide-7
SLIDE 7

15896 Spring 2016: Lecture 18

Example

  • Algorithm 2: Choose action that

has lower cost so far

  • Poll 2: What is algorithm 2’s

worst-case average regret?

1. 2. 3. 4.

7

1

Algorithm Adversary

1
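"Choose the action with the lower cost so far" (follow the leader) fares no better against an adversary who knows the rule. The punish-the-leader cost sequence below is an assumed illustration, not from the slides:

```python
# Worst-case adversary against "follow the leader": pick the action with
# the lower cumulative cost so far (ties broken toward action 0).
# Knowing the rule, the adversary assigns cost 1 to the action the
# algorithm is about to pick and 0 to the other.
T = 1000
cum = [0.0, 0.0]           # cumulative cost of each fixed action
alg_total = 0.0
for t in range(T):
    pick = 0 if cum[0] <= cum[1] else 1
    day_cost = [0.0, 0.0]
    day_cost[pick] = 1.0   # adversary punishes the current leader
    alg_total += day_cost[pick]
    cum[0] += day_cost[0]
    cum[1] += day_cost[1]

best_fixed = min(cum)
avg_regret = (alg_total - best_fixed) / T
print(avg_regret)  # 0.5: the algorithm pays 1 daily, each fixed action ~T/2
```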

SLIDE 8

What can we say more generally about deterministic algorithms?

SLIDE 9

Using expert advice

  • Want to predict the stock market
  • Solicit advice from $n$ experts
  • Expert = someone with an opinion
  • Can we do as well as the best in hindsight?

[Table: day-by-day predictions of Expert 1, Expert 2, Expert 3, and Charlie against the truth]

SLIDE 10

Simpler question

  • One of the $n$ experts never makes a mistake
  • We want to find out which one
  • Algorithm 3: Take majority vote over experts that have been correct so far
  • Poll 3: What is algorithm 3’s worst-case number of mistakes?
    1. $\Theta(1)$
    2. $\Theta(\log n)$
    [remaining options not preserved in the transcript]
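Algorithm 3 is the classic "halving" idea: every mistake the majority makes crosses off at least half of the surviving experts, so with a perfect expert among $n$, the algorithm makes at most $\log_2 n$ mistakes. The binary-prediction setup below is an assumed illustration:

```python
import math
import random

# Majority vote over the experts that have been correct so far.
# One expert is perfect; each algorithm mistake removes at least half
# of the remaining experts, so mistakes <= log2(n).
def run_halving(n, T, seed=0):
    rng = random.Random(seed)
    perfect = rng.randrange(n)
    alive = set(range(n))
    mistakes = 0
    for _ in range(T):
        truth = rng.randrange(2)
        preds = {e: (truth if e == perfect else rng.randrange(2))
                 for e in alive}
        votes = sum(preds[e] for e in alive)       # weight behind "1"
        guess = 1 if 2 * votes > len(alive) else 0
        if guess != truth:
            mistakes += 1
        # cross off every expert that erred this round
        alive = {e for e in alive if preds[e] == truth}
    return mistakes

n = 1024
print(run_halving(n, T=200), math.log2(n))  # mistakes never exceed 10
```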

SLIDE 11

What if no expert is perfect?

  • Idea: Run algorithm 3 until all experts are crossed off, then repeat
  • Makes at most $\log_2 n$ mistakes per mistake of the best expert
  • But this is wasteful: we keep forgetting what we’ve learned

SLIDE 12

Weighted Majority

  • Intuition: Making a mistake doesn’t disqualify an expert, it just lowers its weight
  • Weighted Majority Algorithm:
  • Start with all experts having weight 1
  • Predict based on weighted majority vote
  • Penalize mistakes by cutting weight in half
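The three bullets above translate directly into code. A sketch for binary predictions (the input format is an assumption for illustration):

```python
# Weighted Majority: weights start at 1, predictions follow the weighted
# majority vote, and an erring expert's weight is cut in half.
def weighted_majority(expert_preds, truths):
    """expert_preds[t][i] = expert i's 0/1 prediction on day t;
    returns the number of mistakes the algorithm makes."""
    n = len(expert_preds[0])
    w = [1.0] * n
    mistakes = 0
    for preds, truth in zip(expert_preds, truths):
        vote1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        guess = 1 if 2 * vote1 > sum(w) else 0   # weighted majority vote
        if guess != truth:
            mistakes += 1
        # halve the weight of every expert that erred
        w = [wi / 2 if p != truth else wi for wi, p in zip(w, preds)]
    return mistakes

# 3 experts; expert 0 is always right, so the algorithm quickly
# learns to follow it after one early mistake.
print(weighted_majority([[1, 0, 0], [1, 1, 0], [0, 0, 1]], [1, 1, 0]))
```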

SLIDE 13

[Worked Weighted Majority example (table flattened in extraction): columns Expert 1, Expert 2, Expert 3, Charlie, Alg against the Truth; initial weights 1, 1, 1, 1; after one round the weights are 0.5, 1, 0.5, 0.5; final row: Right, 3; Wrong, 1; Right, 1.5; Wrong, 2]

SLIDE 14

Weighted Majority: Analysis

  • $M$ = #mistakes we’ve made so far
  • $m$ = #mistakes of the best expert so far
  • $W$ = total weight (starts at $n$)
  • For each mistake we make, $W$ drops by at least 25%, so after $M$ mistakes: $W \le n\,(3/4)^M$
  • Weight of best expert is $(1/2)^m$
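Combining the 25% drop with the best expert's weight gives the standard Littlestone-Warmuth mistake bound (a sketch of the algebra):

```latex
\left(\tfrac{1}{2}\right)^m \;\le\; W \;\le\; n\left(\tfrac{3}{4}\right)^M
\;\Longrightarrow\;
M \log_2 \tfrac{4}{3} \;\le\; m + \log_2 n
\;\Longrightarrow\;
M \;\le\; \frac{m + \log_2 n}{\log_2 (4/3)} \;\approx\; 2.41\,(m + \log_2 n).
```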
slide-15
SLIDE 15

15896 Spring 2016: Lecture 18

Randomized Weighted Majority

  • Randomized Weighted Majority

Algorithm:

  • Start with all experts having weight 1
  • Predict proportionally to weights: the total

weight of is

and the total weight of

is

, predict

with probability

  • and

with probability

  • Penalize mistakes by removing

fraction of weight

15
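A minimal sketch of the two RWM steps (the function names and input format are illustrative assumptions):

```python
import random

# Randomized Weighted Majority, binary-prediction version.
# Predict 0 or 1 with probability proportional to the weight behind each
# prediction; multiply an erring expert's weight by (1 - eps).
def rwm_predict(w, preds, rng):
    w1 = sum(wi for wi, p in zip(w, preds) if p == 1)
    return 1 if rng.random() < w1 / sum(w) else 0

def rwm_update(w, preds, truth, eps):
    return [wi * (1 - eps) if p != truth else wi
            for wi, p in zip(w, preds)]

rng = random.Random(0)
w = [1.0, 1.0, 1.0]
p = rwm_predict(w, [1, 1, 0], rng)            # predicts 1 with prob 2/3
w = rwm_update(w, [1, 1, 0], truth=1, eps=0.5)
print(w)  # [1.0, 1.0, 0.5]: only the erring expert is penalized
```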

SLIDE 16

Randomized Weighted Majority

Idea: smooth out the worst case

  • The worst case is ∼50-50: now we have a 50% chance of getting it right
  • What about 90-10? We’re very likely to agree with the majority

SLIDE 17

Analysis

  • At time $t$ we have a fraction $F_t$ of the total weight on experts that made a mistake
  • $F_t$ is also the probability that the algorithm makes a mistake at time $t$; we remove an $\varepsilon F_t$ fraction of the total weight
  • So $W_{\text{final}} = n \prod_t (1 - \varepsilon F_t)$, and $\ln W_{\text{final}} = \ln n + \sum_t \ln(1 - \varepsilon F_t)$ (next slide)
SLIDE 18

Analysis

Using $\ln(1-x) \le -x$: $\ln W_{\text{final}} \le \ln n - \varepsilon \sum_t F_t = \ln n - \varepsilon M$, where $M = \sum_t F_t$ is the expected number of mistakes.

SLIDE 19

Analysis

  • Weight of the best expert is $(1-\varepsilon)^m$, so $m \ln(1-\varepsilon) \le \ln n - \varepsilon M$
  • By setting $\varepsilon = \sqrt{\ln n / T}$ and solving, we get $M \le (1+\varepsilon)m + \frac{\ln n}{\varepsilon}$
  • Since $m \le T$, $M \le m + 2\sqrt{T \ln n}$
  • Average regret is $O\big(\sqrt{\ln n / T}\big) \to 0$
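Putting the pieces together (a sketch; the middle step uses $\ln\frac{1}{1-\varepsilon} \le \varepsilon + \varepsilon^2$ for $\varepsilon \le 1/2$):

```latex
(1-\varepsilon)^m \;\le\; W_{\text{final}} \;\le\; n\, e^{-\varepsilon M}
\;\Longrightarrow\;
M \;\le\; \frac{m \ln\frac{1}{1-\varepsilon} + \ln n}{\varepsilon}
\;\le\; (1+\varepsilon)\, m + \frac{\ln n}{\varepsilon}.
```

With $\varepsilon = \sqrt{\ln n / T}$ and $m \le T$, this gives $M - m \le 2\sqrt{T \ln n}$, i.e., average regret at most $2\sqrt{\ln n / T}$.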

SLIDE 20

More generally

  • Each expert is an action with cost in $[0,1]$
  • Run Randomized Weighted Majority:
  • Choose expert $i$ with probability $w_i / \sum_j w_j$
  • Update weights: $w_i \leftarrow w_i (1 - \varepsilon c_i)$, where $c_i$ is expert $i$’s cost
  • Same analysis applies:
  • Our expected cost: $\sum_i \frac{w_i}{W} c_i$
  • Fraction of weight removed: $\varepsilon \sum_i \frac{w_i}{W} c_i$
  • So, fraction removed $= \varepsilon \cdot$ (our cost)
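One step of the general-cost version can be sketched as follows (the function name and interface are illustrative assumptions):

```python
import random

# One step of RWM with real-valued costs in [0, 1]: sample expert i with
# probability w_i / W, then update every weight via w_i <- w_i * (1 - eps * c_i).
def rwm_step(w, costs, eps, rng):
    total = sum(w)
    r, acc, pick = rng.random() * total, 0.0, 0
    for i, wi in enumerate(w):        # sample proportionally to weights
        acc += wi
        if r <= acc:
            pick = i
            break
    new_w = [wi * (1 - eps * c) for wi, c in zip(w, costs)]
    return pick, new_w

pick, w = rwm_step([1.0, 1.0], [1.0, 0.0], 0.5, random.Random(0))
print(w)  # [0.5, 1.0]: the costly expert loses an eps*c fraction of weight
```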

SLIDE 21

Proof of the minimax thm

  • Suppose for contradiction that a zero-sum game has values $v_C > v_R$ such that:
  • If the column player commits first, there is a row that guarantees the row player at least $v_C$
  • If the row player commits first, there is a column that guarantees the row player at most $v_R$
  • Scale the matrix so that payoffs to the row player are in $[0,1]$, and let $\delta = v_C - v_R$
SLIDE 22

Proof of the minimax thm

  • Row player plays RWM, and the column player responds optimally to the current mixed strategy
  • After $T$ steps:
  • ALG $\ge$ (best row in hindsight) $- \,2\sqrt{T \log n}$
  • Best row in hindsight $\ge v_C \cdot T$
  • ALG $\le v_R \cdot T$
  • It follows that $\delta T \le 2\sqrt{T \log n}$: contradiction for large enough $T$
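The dynamic in the proof can be watched numerically. The sketch below is an illustration with an assumed game (matching pennies, value 0.5): the row player runs RWM on costs $1 - \text{payoff}$ while the column player best-responds each round, and the row player's average payoff approaches the value of the game.

```python
import math

# Row player's payoff matrix for matching pennies; the game value is 0.5.
A = [[1.0, 0.0],
     [0.0, 1.0]]

T = 20000
eps = math.sqrt(math.log(2) / T)      # RWM learning rate for n = 2 rows
w = [1.0, 1.0]
total_payoff = 0.0
for _ in range(T):
    W = sum(w)
    p = [wi / W for wi in w]          # row player's current mixed strategy
    # column player best-responds: minimize the row player's expected payoff
    col = min(range(2), key=lambda j: sum(p[i] * A[i][j] for i in range(2)))
    total_payoff += sum(p[i] * A[i][col] for i in range(2))
    # RWM update with costs 1 - payoff
    w = [wi * (1 - eps * (1 - A[i][col])) for i, wi in enumerate(w)]

avg = total_payoff / T
print(avg)  # close to (and at most) the game value 0.5
```

Because the column player best-responds, the per-round expected payoff never exceeds the value; the no-regret guarantee keeps the average from falling more than $2\sqrt{\log n / T}$ below it.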