CSC304 Lecture 6: Game Theory – Minimax Theorem via Expert Learning


SLIDE 1

CSC304 Lecture 6: Game Theory – Minimax Theorem via Expert Learning


SLIDE 2

2-Player Zero-Sum Games


  • Reward of P2 = − Reward of P1

➢ Matrix A s.t. A_{j,k} is the reward to P1 when P1 chooses her j-th action and P2 chooses her k-th action

➢ Mixed strategy profile (x₁, x₂) → reward to P1 is x₁ᵀ A x₂

  • Minimax Theorem: For all A,

max_{x₁} min_{x₂} x₁ᵀ A x₂ = min_{x₂} max_{x₁} x₁ᵀ A x₂

➢ Proof through online expert learning!

SLIDE 3

Online Expert Learning

CSC2420 – Allan Borodin & Nisarg Shah

  • Setup:

➢ On each day, we want to predict whether a stock price will go up or down

➢ n experts provide their predictions every day

  • Each expert says either up or down

➢ Based on their advice, we make a final prediction

➢ At the end of the day, we learn whether our prediction was correct (reward = 1) or wrong (reward = 0)

  • Goal:

➢ Do almost as well as the best expert in hindsight!

SLIDE 4

Online Expert Learning


  • Notation:

➢ n = #experts

➢ Predictions and ground truth: 1 or 0

➢ m_j^(T) = #mistakes of expert j in the first T steps

➢ M^(T) = #mistakes of the algorithm in the first T steps

  • Simplest idea:

➢ Keep a weight for each expert

➢ Use a weighted majority of the experts to make the prediction

➢ Decrease the weight of an expert whenever the expert makes a mistake

SLIDE 5

Online Expert Learning


  • Weighted Majority:

➢ Fix θ ≤ 1/2.

➢ Start with w_j^(1) = 1 for every expert j.

➢ In time step t, predict 1 if the total weight of experts predicting 1 is larger than the total weight of experts predicting 0, and vice-versa.

➢ At the end of time step t, set w_j^(t+1) ← w_j^(t) · (1 − θ) for every expert j that made a mistake.
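
A minimal Python sketch of this rule (an illustration, not from the slides; expert_preds and truths are hypothetical stand-ins for the experts' daily predictions and the realized outcomes):

```python
def weighted_majority(expert_preds, truths, theta=0.5):
    """expert_preds[t][j] = expert j's 0/1 prediction in round t;
    truths[t] = true outcome in round t. Returns the number of mistakes."""
    n = len(expert_preds[0])
    w = [1.0] * n  # w_j^(1) = 1 for every expert j
    mistakes = 0
    for preds, truth in zip(expert_preds, truths):
        # Predict by weighted majority vote.
        weight_one = sum(wj for wj, p in zip(w, preds) if p == 1)
        weight_zero = sum(wj for wj, p in zip(w, preds) if p == 0)
        guess = 1 if weight_one > weight_zero else 0
        mistakes += (guess != truth)
        # Multiply the weight of every expert that erred by (1 - theta).
        w = [wj * (1 - theta) if p != truth else wj
             for wj, p in zip(w, preds)]
    return mistakes
```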

SLIDE 6

Online Expert Learning


  • Theorem: For every expert j and time T,

M^(T) ≤ 2(1 + θ) · m_j^(T) + (2 ln n) / θ

  • Proof:

➢ Consider the "potential function" Φ^(t) = Σ_j w_j^(t).

➢ If the algorithm makes a mistake in round t, at least half of the weight decreases by a factor of (1 − θ):

Φ^(t+1) ≤ Φ^(t) · (1/2 + (1/2)(1 − θ)) = Φ^(t) · (1 − θ/2)

SLIDE 7

Online Expert Learning


  • Theorem: For every expert j and time T,

M^(T) ≤ 2(1 + θ) · m_j^(T) + (2 ln n) / θ

  • Proof (continued):

➢ Φ^(1) = n

➢ Thus: Φ^(T+1) ≤ n · (1 − θ/2)^(M^(T))

➢ Weight of expert j: w_j^(T+1) = (1 − θ)^(m_j^(T))

➢ Use Φ^(T+1) ≥ w_j^(T+1) and −ln(1 − θ) ≤ θ + θ² (as θ ≤ 1/2).
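
Spelling out the final algebra (omitted on the slide, but it follows directly from the facts above):

```latex
% Combine (1-\theta)^{m_j^{(T)}} = w_j^{(T+1)} \le \Phi^{(T+1)} \le n\,(1-\theta/2)^{M^{(T)}},
% take logarithms, and use \ln(1-\theta/2) \le -\theta/2:
\[
  m_j^{(T)} \ln(1-\theta) \;\le\; \ln n \;-\; M^{(T)}\,\tfrac{\theta}{2}
\]
% Rearrange and use -\ln(1-\theta) \le \theta + \theta^2 (valid for \theta \le 1/2):
\[
  M^{(T)} \;\le\; \tfrac{2}{\theta}\!\left(\ln n + m_j^{(T)}(\theta+\theta^2)\right)
  \;=\; 2(1+\theta)\, m_j^{(T)} \;+\; \frac{2\ln n}{\theta}
\]
```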

SLIDE 8

Online Expert Learning


  • Beautiful!

➢ Comparison is to the best expert in hindsight.

➢ At most (roughly) twice as many mistakes, plus a small additive term.

➢ Holds in the worst case over how the experts make mistakes:

  • No statistical assumptions.

➢ Simple policy to implement.

  • It can be shown that this bound is tight for any deterministic algorithm.

SLIDE 9

Randomized Weighted Majority


  • Randomization ⇒ beat the factor of 2
  • Simple change:

➢ At the beginning of round t, let

  • Φ₁^(t) = total weight of experts predicting 1

  • Φ₀^(t) = total weight of experts predicting 0

➢ Deterministic: predict 1 if Φ₁^(t) > Φ₀^(t), and 0 otherwise.

➢ Randomized: predict 1 with probability Φ₁^(t) / (Φ₁^(t) + Φ₀^(t)), and 0 with the remaining probability.

SLIDE 10

Randomized Weighted Majority


  • Equivalently:

➢ "Pick an expert with probability proportional to weight, and go with their prediction"

➢ Pr[picking expert j in step t] = p_j^(t) = w_j^(t) / Φ^(t)

  • Let ℓ_j^(t) = 1 if expert j makes a mistake in step t, and 0 otherwise.

  • The algorithm makes a mistake in round t with probability

Σ_j p_j^(t) · ℓ_j^(t) = p^(t) · ℓ^(t)

  • E[#mistakes after T rounds] = Σ_{t=1}^{T} p^(t) · ℓ^(t)
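
A matching Python sketch of this sampling view (an illustration with the same hypothetical interface as the earlier sketch):

```python
import random

def randomized_weighted_majority(expert_preds, truths, theta=0.5, seed=0):
    """Follow one expert per round, chosen with probability
    p_j^(t) = w_j^(t) / Phi^(t). Returns the (random) number of mistakes."""
    rng = random.Random(seed)
    n = len(expert_preds[0])
    w = [1.0] * n  # w_j^(1) = 1
    mistakes = 0
    for preds, truth in zip(expert_preds, truths):
        # Sample expert j proportionally to its current weight.
        j = rng.choices(range(n), weights=w, k=1)[0]
        mistakes += (preds[j] != truth)
        # Same multiplicative penalty as before for every expert that erred.
        w = [wj * (1 - theta) if p != truth else wj
             for wj, p in zip(w, preds)]
    return mistakes
```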

SLIDE 11

Randomized Weighted Majority


Φ^(t+1) = Σ_j w_j^(t+1) = Σ_j w_j^(t) · (1 − θ · ℓ_j^(t))

        = Φ^(t) − θ · Φ^(t) · Σ_j p_j^(t) · ℓ_j^(t)

        = Φ^(t) · (1 − θ · p^(t) · ℓ^(t)) ≤ Φ^(t) · exp(−θ · p^(t) · ℓ^(t))

  • Applying this iteratively:

Φ^(T+1) ≤ n · exp(−θ · E[#mistakes])

  • But Φ^(T+1) ≥ w_j^(T+1) = (1 − θ)^(m_j^(T))

  • QED!
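
Taking logarithms in the last two displays gives the bound stated on the next slide (the short algebra, spelled out):

```latex
% n \exp(-\theta\, M^{(T)}) \;\ge\; \Phi^{(T+1)} \;\ge\; (1-\theta)^{m_j^{(T)}}, so:
\[
  \ln n \;-\; \theta\, M^{(T)} \;\ge\; m_j^{(T)} \ln(1-\theta) \;\ge\; -\,m_j^{(T)}(\theta+\theta^2)
  \quad\Longrightarrow\quad
  M^{(T)} \;\le\; (1+\theta)\, m_j^{(T)} + \frac{\ln n}{\theta}
\]
% Here M^{(T)} denotes the expected number of mistakes.
```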
SLIDE 12

Randomized Weighted Majority


  • Theorem: For every expert j and time T, the expected number of mistakes of randomized weighted majority in the first T rounds is

M^(T) ≤ (1 + θ) · m_j^(T) + (ln n) / θ

  • Setting θ = √(ln n / T):

M^(T) ≤ m_j^(T) + 2 · √(T · ln n)

  • We say that the algorithm has O(√(T · ln n)) regret

  • Sublinear regret in T
  • Regret per round → 0 as T → ∞
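
For a sense of scale (illustrative numbers, not from the slides): with n = 10 experts and T = 10,000 rounds, the additive term is 2 · √(T · ln n) ≈ 2 · √(10000 × 2.30) ≈ 303 mistakes in total, i.e., only about 0.03 extra mistakes per round beyond the best expert.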
SLIDE 13


How is this related to the minimax theorem?!!

SLIDE 14

Minimax via Regret Learning


  • Recall:

V_R = max_{x₁} min_{x₂} x₁ᵀ A x₂

V_C = min_{x₂} max_{x₁} x₁ᵀ A x₂

  • Row player's guarantee: my reward ≥ V_R

  • Column player's guarantee: row player's reward ≤ V_C

  • Hence, V_R ≤ V_C (the trivial direction)

  • To prove: V_R = V_C

SLIDE 15

Minimax via Regret Learning


  • Scale the values in A to lie in [0, 1].

➢ Without loss of generality.

  • Suppose for contradiction that V_R = V_C − ε for some ε > 0.

  • Suppose the row player R uses randomized weighted majority (experts = the row player's actions).

➢ In each round, the column player C responds by choosing the action that minimizes the row player's expected reward.

SLIDE 16

Minimax via Regret Learning


  • After π‘ˆ iterations, row player’s reward is:

➒ π‘Š ≀ π‘ˆ β‹… π‘Š

𝑆

➒ π‘Š β‰₯ β€œreward of best action in hindsight” βˆ’ 𝑃

π‘ˆ β‹… ln π‘œ

  • Reward of best action in hindsight β‰₯ π‘ˆ β‹… π‘Š

𝐷.

  • Why?
  • Suppose column player plays action π‘˜π‘’ in round 𝑒
  • Equivalent to playing mixed strategy 𝑑 in each round
  • 𝑑 picks 𝑒 ∈ {1, … , π‘ˆ} at random and plays π‘˜π‘’
  • By definition of π‘Š

𝐷, 𝑑 cannot ensure that row player’s

reward is less than π‘Š

𝐷

  • Then, there is an action of row player with E[reward]

at least π‘Š

𝐷 against 𝑑

SLIDE 17

Minimax via Regret Learning


  • After π‘ˆ iterations, row player’s reward is:

➒ π‘Š ≀ π‘ˆ β‹… π‘Š

𝑆

➒ π‘Š β‰₯ π‘ˆ β‹… π‘Š

𝐷 βˆ’ 𝑃

π‘ˆ β‹… ln π‘œ

➒ π‘ˆ β‹… π‘Š

𝑆 = π‘ˆ β‹… (π‘Š 𝐷 βˆ’ πœ€) β‰₯ π‘ˆ β‹… π‘Š 𝐷 βˆ’ 𝑃

π‘ˆ β‹… ln π‘œ

➒ πœ€ π‘ˆ ≀ 𝑃

π‘ˆ β‹… ln π‘œ

➒ Contradiction for sufficiently large π‘ˆ.

  • QED!
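
As a sanity check (an illustration, not part of the lecture), the dynamic used in this proof can be simulated. The sketch below assumes matching-pennies payoffs scaled to [0, 1] (game value 0.5) and a reward-based multiplicative-weights update for the row player; her average reward approaches the game value as T grows.

```python
import math

# Matching pennies, scaled to [0, 1]: A[j][k] = row player's reward.
A = [[1.0, 0.0],
     [0.0, 1.0]]  # game value V_R = V_C = 0.5

def simulate(T=100000):
    n = len(A)
    theta = math.sqrt(math.log(n) / T)  # learning rate from the regret bound
    w = [1.0] * n
    total_reward = 0.0
    for _ in range(T):
        phi = sum(w)
        x = [wj / phi for wj in w]  # row player's mixed strategy this round
        # Column player best-responds: minimize the row's expected reward.
        k = min(range(len(A[0])),
                key=lambda c: sum(x[j] * A[j][c] for j in range(n)))
        total_reward += sum(x[j] * A[j][k] for j in range(n))
        # Reward-based multiplicative-weights update for the row player.
        w = [w[j] * (1 + theta * A[j][k]) for j in range(n)]
    return total_reward / T

print(simulate())  # prints a value close to 0.5
```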
SLIDE 18

Yao’s Minimax Principle


  • Goal:

➢ Provide a lower bound on the expected running time that any randomized algorithm for a problem can achieve in the worst case over problem instances.

  • Note:

➢ The expectation (of the running time) is over the randomization of the algorithm.

➢ The problem instance (worst case) is chosen to maximize this expected running time.

SLIDE 19

Yao’s Minimax Principle


  • Notation:

➢ Capital letters for "randomized", lowercase for "deterministic"

➢ a : a deterministic algorithm

➢ A : a randomized algorithm (a distribution over deterministic algorithms)

➢ x : a problem instance

➢ X : a distribution over problem instances

➢ T(·, ·) : running time (an expectation whenever either argument is random)

  • We are interested in

min_A max_x T(A, x)

SLIDE 20

Yao’s Minimax Principle


[Figure: a table with deterministic algorithms as rows, problem instances as columns, and running times as entries.]

SLIDE 21

Yao’s Minimax Principle


  • Minimax Theorem:

min_A max_x T(A, x) = max_X min_a T(a, X)

  • So:

➢ To lower bound the expected running time of any randomized algorithm A on its worst-case instance x by a quantity c…

➢ …choose a distribution X over problem instances, and show that every deterministic algorithm a has expected running time at least c on problems drawn from X.
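
A standard illustration of this recipe (a textbook example, not from these slides): finding the unique 1 in an unordered n-bit array. Take X to be uniform over the n instances (one per location of the 1). Any deterministic algorithm probes positions in some fixed order until it hits the 1, so its expected number of probes under X is (1 + 2 + ⋯ + n) / n = (n + 1) / 2. By the theorem above, every randomized algorithm therefore makes at least (n + 1) / 2 probes in expectation on its worst-case instance.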