CMU 15-896 Noncooperative Games 2: Learning and Minimax (Teacher: Ariel Procaccia)


SLIDE 1

CMU 15-896

Noncooperative games 2: Learning and minimax

Teacher: Ariel Procaccia

SLIDE 2

15896 Spring 2016: Lecture 18

Reminder: The Minimax Theorem

  • Theorem [von Neumann, 1928]: Every 2-player zero-sum game has a unique value $v$ such that:
  • Player 1 can guarantee value at least $v$
  • Player 2 can guarantee loss at most $v$
  • We will prove the theorem via no-regret learning

SLIDE 3

How to reach your spaceship

  • Each morning pick one of $n$ possible routes
  • Then find out how long each route took
  • Is there a strategy for picking routes that does almost as well as the best fixed route in hindsight?

[Figure: routes with travel times, e.g., 53 minutes, 47 minutes, ⋯]

SLIDE 4

The model

  • View as a matrix (possibly with infinitely many columns)
  • Algorithm picks a row, adversary picks a column
  • Algorithm pays the cost of (row, column) and gets the column as feedback
  • Assume costs are in $[0,1]$

[Figure: cost matrix, with the algorithm choosing rows and the adversary choosing columns]

SLIDE 5

The model

  • Define average regret in $T$ time steps as (average per-day cost of the algorithm) $-$ (average per-day cost of the best fixed row in hindsight)
  • No-regret algorithm: regret $\to 0$ as $T \to \infty$
  • We are not competing with an adaptive strategy, just with the best fixed row
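The definition above can be written as a short function. This is a sketch; the cost numbers in the example are made up for illustration:

```python
# Average regret over T days: (algorithm's average per-day cost) minus
# (average per-day cost of the best fixed row in hindsight).
def average_regret(alg_costs, row_costs):
    """alg_costs[t]    = cost the algorithm paid on day t
       row_costs[r][t] = cost fixed row r would have paid on day t"""
    T = len(alg_costs)
    best_fixed = min(sum(row) for row in row_costs)
    return (sum(alg_costs) - best_fixed) / T

# Toy example: 3 days, 2 rows; the best fixed row pays 1 in total,
# the algorithm pays 2, so the average regret is (2 - 1) / 3.
print(average_regret([1.0, 1.0, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]))
```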

SLIDE 6

Example

  • Algorithm 1: Alternate between U and D
  • Poll 1: What is algorithm 1’s worst-case average regret?

[Poll answer options and the 2×2 cost matrix figure not preserved in the transcript]
slide-7
SLIDE 7

15896 Spring 2016: Lecture 18

Example

  • Algorithm 2: Choose action that

has lower cost so far

  • Poll 2: What is algorithm 2’s

worst-case average regret?

1. 2. 3. 4.

7

1

Algorithm Adversary

1
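"Choose the action with the lower cost so far" (follow the leader) fares no better against an adversary who knows the rule. The punish-the-leader cost sequence below is an assumed illustration, not from the slides:

```python
# Worst-case adversary against "follow the leader": pick the action with
# the lower cumulative cost so far (ties broken toward action 0).
# Knowing the rule, the adversary assigns cost 1 to the action the
# algorithm is about to pick and 0 to the other.
T = 1000
cum = [0.0, 0.0]           # cumulative cost of each fixed action
alg_total = 0.0
for t in range(T):
    pick = 0 if cum[0] <= cum[1] else 1
    day_cost = [0.0, 0.0]
    day_cost[pick] = 1.0   # adversary punishes the current leader
    alg_total += day_cost[pick]
    cum[0] += day_cost[0]
    cum[1] += day_cost[1]

best_fixed = min(cum)
avg_regret = (alg_total - best_fixed) / T
print(avg_regret)  # 0.5: the algorithm pays 1 daily, each fixed action ~T/2
```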

SLIDE 8

What can we say more generally about deterministic algorithms?

SLIDE 9

Using expert advice

  • Want to predict the stock market
  • Solicit advice from $n$ experts
  • Expert = someone with an opinion
  • Can we do as well as the best in hindsight?

[Table: day-by-day predictions of Expert 1, Expert 2, Expert 3, and Charlie against the truth]

SLIDE 10

Simpler question

  • One of the $n$ experts never makes a mistake
  • We want to find out which one
  • Algorithm 3: Take majority vote over experts that have been correct so far
  • Poll 3: What is algorithm 3’s worst-case number of mistakes?
    1. $\Theta(1)$
    2. $\Theta(\log n)$
    [remaining options not preserved in the transcript]
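Algorithm 3 is the classic "halving" idea: every mistake the majority makes crosses off at least half of the surviving experts, so with a perfect expert among $n$, the algorithm makes at most $\log_2 n$ mistakes. The binary-prediction setup below is an assumed illustration:

```python
import math
import random

# Majority vote over the experts that have been correct so far.
# One expert is perfect; each algorithm mistake removes at least half
# of the remaining experts, so mistakes <= log2(n).
def run_halving(n, T, seed=0):
    rng = random.Random(seed)
    perfect = rng.randrange(n)
    alive = set(range(n))
    mistakes = 0
    for _ in range(T):
        truth = rng.randrange(2)
        preds = {e: (truth if e == perfect else rng.randrange(2))
                 for e in alive}
        votes = sum(preds[e] for e in alive)       # weight behind "1"
        guess = 1 if 2 * votes > len(alive) else 0
        if guess != truth:
            mistakes += 1
        # cross off every expert that erred this round
        alive = {e for e in alive if preds[e] == truth}
    return mistakes

n = 1024
print(run_halving(n, T=200), math.log2(n))  # mistakes never exceed 10
```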

SLIDE 11

What if no expert is perfect?

  • Idea: Run algorithm 3 until all experts are crossed off, then repeat
  • Makes at most $\log_2 n$ mistakes per mistake of the best expert
  • But this is wasteful: we keep forgetting what we’ve learned

SLIDE 12

Weighted Majority

  • Intuition: Making a mistake doesn’t disqualify an expert, it just lowers its weight
  • Weighted Majority Algorithm:
  • Start with all experts having weight 1
  • Predict based on weighted majority vote
  • Penalize mistakes by cutting weight in half
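The three bullets above translate directly into code. A sketch for binary predictions (the input format is an assumption for illustration):

```python
# Weighted Majority: weights start at 1, predictions follow the weighted
# majority vote, and an erring expert's weight is cut in half.
def weighted_majority(expert_preds, truths):
    """expert_preds[t][i] = expert i's 0/1 prediction on day t;
    returns the number of mistakes the algorithm makes."""
    n = len(expert_preds[0])
    w = [1.0] * n
    mistakes = 0
    for preds, truth in zip(expert_preds, truths):
        vote1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        guess = 1 if 2 * vote1 > sum(w) else 0   # weighted majority vote
        if guess != truth:
            mistakes += 1
        # halve the weight of every expert that erred
        w = [wi / 2 if p != truth else wi for wi, p in zip(w, preds)]
    return mistakes

# 3 experts; expert 0 is always right, so the algorithm quickly
# learns to follow it after one early mistake.
print(weighted_majority([[1, 0, 0], [1, 1, 0], [0, 0, 1]], [1, 1, 0]))
```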

SLIDE 13

[Worked Weighted Majority example (table flattened in extraction): columns Expert 1, Expert 2, Expert 3, Charlie, Alg against the Truth; initial weights 1, 1, 1, 1; after one round the weights are 0.5, 1, 0.5, 0.5; final row: Right, 3; Wrong, 1; Right, 1.5; Wrong, 2]

SLIDE 14

Weighted Majority: Analysis

  • $M$ = #mistakes we’ve made so far
  • $m$ = #mistakes of the best expert so far
  • $W$ = total weight (starts at $n$)
  • For each mistake we make, $W$ drops by at least 25%, so after $M$ mistakes: $W \le n\,(3/4)^M$
  • Weight of best expert is $(1/2)^m$
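Combining the 25% drop with the best expert's weight gives the standard Littlestone-Warmuth mistake bound (a sketch of the algebra):

```latex
\left(\tfrac{1}{2}\right)^m \;\le\; W \;\le\; n\left(\tfrac{3}{4}\right)^M
\;\Longrightarrow\;
M \log_2 \tfrac{4}{3} \;\le\; m + \log_2 n
\;\Longrightarrow\;
M \;\le\; \frac{m + \log_2 n}{\log_2 (4/3)} \;\approx\; 2.41\,(m + \log_2 n).
```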
slide-15
SLIDE 15

15896 Spring 2016: Lecture 18

Randomized Weighted Majority

  • Randomized Weighted Majority

Algorithm:

  • Start with all experts having weight 1
  • Predict proportionally to weights: the total

weight of is

and the total weight of

is

, predict

with probability

  • and

with probability

  • Penalize mistakes by removing

fraction of weight

15
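A minimal sketch of the two RWM steps (the function names and input format are illustrative assumptions):

```python
import random

# Randomized Weighted Majority, binary-prediction version.
# Predict 0 or 1 with probability proportional to the weight behind each
# prediction; multiply an erring expert's weight by (1 - eps).
def rwm_predict(w, preds, rng):
    w1 = sum(wi for wi, p in zip(w, preds) if p == 1)
    return 1 if rng.random() < w1 / sum(w) else 0

def rwm_update(w, preds, truth, eps):
    return [wi * (1 - eps) if p != truth else wi
            for wi, p in zip(w, preds)]

rng = random.Random(0)
w = [1.0, 1.0, 1.0]
p = rwm_predict(w, [1, 1, 0], rng)            # predicts 1 with prob 2/3
w = rwm_update(w, [1, 1, 0], truth=1, eps=0.5)
print(w)  # [1.0, 1.0, 0.5]: only the erring expert is penalized
```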

SLIDE 16

Randomized Weighted Majority

Idea: smooth out the worst case

  • The worst case is ∼50-50: now we have a 50% chance of getting it right
  • What about 90-10? We’re very likely to agree with the majority

SLIDE 17

Analysis

  • At time $t$ we have a fraction $F_t$ of the total weight on experts that made a mistake
  • $F_t$ is also the probability that the algorithm makes a mistake at time $t$; we remove an $\varepsilon F_t$ fraction of the total weight
  • So $W_{\text{final}} = n \prod_t (1 - \varepsilon F_t)$, and $\ln W_{\text{final}} = \ln n + \sum_t \ln(1 - \varepsilon F_t)$ (next slide)
SLIDE 18

Analysis

Using $\ln(1-x) \le -x$: $\ln W_{\text{final}} \le \ln n - \varepsilon \sum_t F_t = \ln n - \varepsilon M$, where $M = \sum_t F_t$ is the expected number of mistakes.

SLIDE 19

Analysis

  • Weight of the best expert is $(1-\varepsilon)^m$, so $m \ln(1-\varepsilon) \le \ln n - \varepsilon M$
  • By setting $\varepsilon = \sqrt{\ln n / T}$ and solving, we get $M \le (1+\varepsilon)m + \frac{\ln n}{\varepsilon}$
  • Since $m \le T$, $M \le m + 2\sqrt{T \ln n}$
  • Average regret is $O\big(\sqrt{\ln n / T}\big) \to 0$
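Putting the pieces together (a sketch; the middle step uses $\ln\frac{1}{1-\varepsilon} \le \varepsilon + \varepsilon^2$ for $\varepsilon \le 1/2$):

```latex
(1-\varepsilon)^m \;\le\; W_{\text{final}} \;\le\; n\, e^{-\varepsilon M}
\;\Longrightarrow\;
M \;\le\; \frac{m \ln\frac{1}{1-\varepsilon} + \ln n}{\varepsilon}
\;\le\; (1+\varepsilon)\, m + \frac{\ln n}{\varepsilon}.
```

With $\varepsilon = \sqrt{\ln n / T}$ and $m \le T$, this gives $M - m \le 2\sqrt{T \ln n}$, i.e., average regret at most $2\sqrt{\ln n / T}$.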

SLIDE 20

More generally

  • Each expert is an action with cost in $[0,1]$
  • Run Randomized Weighted Majority:
  • Choose expert $i$ with probability $w_i / \sum_j w_j$
  • Update weights: $w_i \leftarrow w_i (1 - \varepsilon c_i)$, where $c_i$ is expert $i$’s cost
  • Same analysis applies:
  • Our expected cost: $\sum_i \frac{w_i}{W} c_i$
  • Fraction of weight removed: $\varepsilon \sum_i \frac{w_i}{W} c_i$
  • So, fraction removed $= \varepsilon \cdot$ (our cost)
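One step of the general-cost version can be sketched as follows (the function name and interface are illustrative assumptions):

```python
import random

# One step of RWM with real-valued costs in [0, 1]: sample expert i with
# probability w_i / W, then update every weight via w_i <- w_i * (1 - eps * c_i).
def rwm_step(w, costs, eps, rng):
    total = sum(w)
    r, acc, pick = rng.random() * total, 0.0, 0
    for i, wi in enumerate(w):        # sample proportionally to weights
        acc += wi
        if r <= acc:
            pick = i
            break
    new_w = [wi * (1 - eps * c) for wi, c in zip(w, costs)]
    return pick, new_w

pick, w = rwm_step([1.0, 1.0], [1.0, 0.0], 0.5, random.Random(0))
print(w)  # [0.5, 1.0]: the costly expert loses an eps*c fraction of weight
```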

SLIDE 21

Proof of the minimax thm

  • Suppose for contradiction that a zero-sum game has values $v_C > v_R$ such that:
  • If the column player commits first, there is a row that guarantees the row player at least $v_C$
  • If the row player commits first, there is a column that guarantees the row player at most $v_R$
  • Scale the matrix so that payoffs to the row player are in $[0,1]$, and let $\delta = v_C - v_R$
SLIDE 22

Proof of the minimax thm

  • Row player plays RWM, and the column player responds optimally to the current mixed strategy
  • After $T$ steps:
  • ALG $\ge$ (best row in hindsight) $- \,2\sqrt{T \log n}$
  • Best row in hindsight $\ge v_C \cdot T$
  • ALG $\le v_R \cdot T$
  • It follows that $\delta T \le 2\sqrt{T \log n}$: contradiction for large enough $T$
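The dynamic in the proof can be watched numerically. The sketch below is an illustration with an assumed game (matching pennies, value 0.5): the row player runs RWM on costs $1 - \text{payoff}$ while the column player best-responds each round, and the row player's average payoff approaches the value of the game.

```python
import math

# Row player's payoff matrix for matching pennies; the game value is 0.5.
A = [[1.0, 0.0],
     [0.0, 1.0]]

T = 20000
eps = math.sqrt(math.log(2) / T)      # RWM learning rate for n = 2 rows
w = [1.0, 1.0]
total_payoff = 0.0
for _ in range(T):
    W = sum(w)
    p = [wi / W for wi in w]          # row player's current mixed strategy
    # column player best-responds: minimize the row player's expected payoff
    col = min(range(2), key=lambda j: sum(p[i] * A[i][j] for i in range(2)))
    total_payoff += sum(p[i] * A[i][col] for i in range(2))
    # RWM update with costs 1 - payoff
    w = [wi * (1 - eps * (1 - A[i][col])) for i, wi in enumerate(w)]

avg = total_payoff / T
print(avg)  # close to (and at most) the game value 0.5
```

Because the column player best-responds, the per-round expected payoff never exceeds the value; the no-regret guarantee keeps the average from falling more than $2\sqrt{\log n / T}$ below it.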