CSC304 Lecture 6 Game Theory : Minimax Theorem via Expert Learning
CSC304 - Nisarg Shah 1
2-Player Zero-Sum Games
• Reward of P2 = − (Reward of P1)
• Matrix B s.t. B(i,j) is the reward to P1 when P1 chooses her i-th action and P2 chooses her j-th action
• Mixed strategy profile (y1, y2) → reward to P1 is y1^T B y2
• Minimax theorem: max_{y1} min_{y2} y1^T B y2 = min_{y2} max_{y1} y1^T B y2
• Proof through online expert learning!
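As a numerical sanity check of the statement, the sketch below evaluates both sides of the equality on a hypothetical 2x2 example (matching pennies, not taken from the slides). Since the inner optimum of each side is attained at a pure strategy, only the outer mixed strategy needs a grid search.

```python
# Sanity check of the minimax theorem on a hypothetical 2x2 example
# (matching pennies). B[i][j] is the reward to P1.
B = [[1, -1], [-1, 1]]

def reward(p, j):
    # Expected reward to P1 when she plays row 0 w.p. p and P2 plays column j.
    return p * B[0][j] + (1 - p) * B[1][j]

def reward_col(q, i):
    # Expected reward to P1 when P2 plays column 0 w.p. q and P1 plays row i.
    return q * B[i][0] + (1 - q) * B[i][1]

grid = [k / 1000 for k in range(1001)]

# max_{y1} min_{y2}: the inner minimum is attained at a pure column.
v_maxmin = max(min(reward(p, j) for j in (0, 1)) for p in grid)
# min_{y2} max_{y1}: the inner maximum is attained at a pure row.
v_minmax = min(max(reward_col(q, i) for i in (0, 1)) for q in grid)
```

Both quantities come out equal (the value 0 of matching pennies, attained at the uniform mixed strategy), illustrating the equality the proof below establishes in general.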
CSC2420 – Allan Borodin & Nisarg Shah 3
• On each day, we want to predict if a stock price will go up
• n experts provide their predictions every day
• Based on their advice, we make a final prediction
• At the end of the day, we learn if our prediction was correct (reward = 1) or wrong (reward = 0)
• Goal: do almost as well as the best expert in hindsight!
• n = #experts
• Predictions and ground truth: 1 or 0
• m_i(t) = #mistakes of expert i in the first t steps
• M(t) = #mistakes of the algorithm in the first t steps
• Keep a weight for each expert
• Use the weighted majority of the experts' predictions to make our prediction
• Decrease the weight of an expert whenever the expert makes a mistake
• Fix ε ≤ 1/2.
• Start with x_i^(1) = 1 for every expert i.
• In time step t, predict 1 if the total weight of experts predicting 1 is larger than the total weight of experts predicting 0, and vice-versa.
• At the end of time step t, set x_i^(t+1) ← x_i^(t) · (1 − ε) for every expert i that made a mistake.
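The steps above can be sketched as follows (a minimal sketch; the input encoding, a list of per-step expert predictions plus the true outcomes, is an assumption and not from the slides):

```python
def weighted_majority(predictions, truths, eps=0.5):
    """Deterministic weighted majority.

    predictions[t][i] is expert i's 0/1 prediction at step t;
    truths[t] is the actual 0/1 outcome. Returns #mistakes made.
    """
    n = len(predictions[0])
    w = [1.0] * n                      # x_i^(1) = 1 for every expert
    mistakes = 0
    for preds, truth in zip(predictions, truths):
        weight_one = sum(wi for wi, p in zip(w, preds) if p == 1)
        guess = 1 if weight_one > sum(w) - weight_one else 0
        mistakes += (guess != truth)
        # x_i^(t+1) <- x_i^(t) * (1 - eps) for every expert that erred
        w = [wi * (1 - eps) if p != truth else wi for wi, p in zip(w, preds)]
    return mistakes
```

With one perfect expert (m_i(T) = 0), the bound proved on the next slides guarantees at most (2 ln n)/ε mistakes, independent of T.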
Theorem: For every expert i, M(T) ≤ 2(1 + ε) · m_i(T) + (2 ln n)/ε.

• Consider a "potential function" Φ(t) = Σ_i x_i^(t).
• If the algorithm makes a mistake in round t, at least half the total weight is on experts that erred in round t, so
  Φ(t+1) ≤ Φ(t) · (1/2 + (1/2)(1 − ε)) = Φ(t) · (1 − ε/2)
• Φ(1) = n.
• Thus: Φ(T+1) ≤ n · (1 − ε/2)^(M(T)).
• Weight of expert i: x_i^(T+1) = (1 − ε)^(m_i(T)).
• Use Φ(T+1) ≥ x_i^(T+1) and −ln(1 − ε) ≤ ε + ε² (as ε ≤ 1/2).
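Combining the two bounds on Φ(T+1) yields the theorem; spelled out (a routine calculation, also using ln(1 − ε/2) ≤ −ε/2):

```latex
% Combine x_i^{(T+1)} \le \Phi^{(T+1)} with the two bounds above:
(1-\varepsilon)^{m_i(T)} \;\le\; \Phi^{(T+1)} \;\le\; n\Bigl(1-\frac{\varepsilon}{2}\Bigr)^{M(T)}
% Take logarithms, using \ln(1-\varepsilon/2) \le -\varepsilon/2:
m_i(T)\,\ln(1-\varepsilon) \;\le\; \ln n - M(T)\,\frac{\varepsilon}{2}
% Rearrange and use -\ln(1-\varepsilon) \le \varepsilon+\varepsilon^2:
M(T) \;\le\; \frac{2\ln n}{\varepsilon} + \frac{2\,m_i(T)}{\varepsilon}\bigl(\varepsilon+\varepsilon^2\bigr)
     \;=\; 2(1+\varepsilon)\,m_i(T) + \frac{2\ln n}{\varepsilon}
```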
• Comparison is to the best expert in hindsight.
• At most (roughly) twice as many mistakes, plus a small additive term.
• Holds in the worst case over how the experts make mistakes.
• Simple policy to implement.
• At the beginning of round t, let
  Φ_1(t) = total weight of experts predicting 1
  Φ_0(t) = total weight of experts predicting 0
• Deterministic: predict 1 if Φ_1(t) > Φ_0(t), and 0 otherwise.
• Randomized: predict 1 with probability Φ_1(t) / (Φ_1(t) + Φ_0(t)), and 0 with the remaining probability.
• "Pick an expert with probability proportional to weight, and go with their prediction"
• Pr[picking expert i in step t] = p_i^(t) = x_i^(t) / Φ(t)
• Let m_i^(t) = 1 if expert i makes a mistake in step t, and 0 otherwise.
• Expected #mistakes of the algorithm in step t:
  Σ_i p_i^(t) · m_i^(t) = p^(t) · m^(t)
Φ(t+1) = Σ_i x_i^(t+1) = Σ_i x_i^(t) · (1 − ε · m_i^(t))
        = Φ(t) − ε · Φ(t) · Σ_i p_i^(t) · m_i^(t)
        = Φ(t) · (1 − ε · (p^(t) · m^(t)))
        ≤ Φ(t) · exp(−ε · (p^(t) · m^(t)))

• Hence Φ(T+1) ≤ n · exp(−ε · E[#mistakes in the first T rounds])
• Also Φ(T+1) ≥ x_i^(T+1) = (1 − ε)^(m_i(T)) for every expert i.
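Chaining the two bounds on Φ(T+1) gives the expected-mistake guarantee, written out step by step:

```latex
% Combine (1-\varepsilon)^{m_i(T)} \le \Phi^{(T+1)} \le n\exp(-\varepsilon\,\mathbb{E}[M(T)]):
(1-\varepsilon)^{m_i(T)} \;\le\; n\,\exp\bigl(-\varepsilon\,\mathbb{E}[M(T)]\bigr)
% Take logarithms and use -\ln(1-\varepsilon) \le \varepsilon+\varepsilon^2:
\varepsilon\,\mathbb{E}[M(T)] \;\le\; \ln n - m_i(T)\,\ln(1-\varepsilon)
                              \;\le\; \ln n + m_i(T)\bigl(\varepsilon+\varepsilon^2\bigr)
% Divide by \varepsilon:
\mathbb{E}[M(T)] \;\le\; (1+\varepsilon)\,m_i(T) + \frac{\ln n}{\varepsilon}
```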
Theorem: For every expert i, the expected number of mistakes of randomized weighted majority in the first T rounds is at most
  (1 + ε) · m_i(T) + (ln n)/ε.

• Choosing ε = √(ln n / T) gives at most m_i(T) + 2·√(T · ln n) expected mistakes, i.e., O(√(T · ln n)) regret.
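Because the expected number of mistakes is exactly Σ_t p^(t) · m^(t), the guarantee can be checked without any sampling (a sketch; representing the input as a 0/1 mistake matrix is an assumed encoding, not from the slides):

```python
def rwm_expected_mistakes(mistake, eps):
    """Exact expected #mistakes of randomized weighted majority.

    mistake[t][i] = 1 if expert i errs at step t, else 0.
    """
    n = len(mistake[0])
    w = [1.0] * n
    expected = 0.0
    for m in mistake:
        phi = sum(w)
        # E[mistake at step t] = sum_i p_i^(t) * m_i^(t)
        expected += sum(wi * mi for wi, mi in zip(w, m)) / phi
        # x_i^(t+1) = x_i^(t) * (1 - eps * m_i^(t))
        w = [wi * (1 - eps * mi) for wi, mi in zip(w, m)]
    return expected
```

For any mistake pattern, the returned value stays below (1 + ε) · m_i(T) + (ln n)/ε for the best expert i, matching the theorem.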
• v_R = max_{y1} min_{y2} y1^T B y2  (the row player commits to her mixed strategy first)
• v_C = min_{y2} max_{y1} y1^T B y2  (the column player commits first)
• v_R ≤ v_C (trivial direction)
• Minimax theorem: v_R = v_C
• Suppose for contradiction that v_R < v_C; write v_R = v_C − δ with δ > 0.
• The row player runs randomized weighted majority for T rounds, treating her actions as the experts.
• In each round, the column player responds by choosing her best response to the row player's current mixed strategy.
• Let P = the row player's expected total reward over the T rounds.
• Upper bound: the column player best-responds in every round, so the row player's expected reward in each round is at most v_R. Hence P ≤ T · v_R.
• Lower bound: P ≥ (reward of the best action in hindsight) − regret.
• Since v_C = min_{y2} max_{y1} y1^T B y2, the column player cannot hold every row action below v_C: against the average of the column player's strategies, some row action earns at least v_C. Hence the best action in hindsight earns at least T · v_C in total.
• P ≤ T · v_R
• P ≥ T · v_C − O(√(T · ln n))
• Hence T · v_R = T · (v_C − δ) ≥ T · v_C − O(√(T · ln n)), i.e.,
  δ · T ≤ O(√(T · ln n))
• Contradiction for sufficiently large T. ∎
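The proof's dynamics can be simulated directly. The sketch below uses the standard multiplicative-weights update with rewards in [−1, 1] (rather than the 0/1 mistake version above) on the hypothetical matching-pennies matrix; the step size and horizon are arbitrary choices, not from the slides:

```python
import math

# Hypothetical zero-sum game (matching pennies); B[i][j] = row player's reward.
B = [[1, -1], [-1, 1]]
n, T, eps = 2, 2000, 0.05

w = [1.0] * n
total = 0.0             # P = row player's cumulative expected reward
col_counts = [0, 0]     # column player's empirical play
for _ in range(T):
    phi = sum(w)
    p = [wi / phi for wi in w]
    # Column player best-responds: minimize the row player's expected reward.
    j = min((0, 1), key=lambda jj: sum(p[i] * B[i][jj] for i in range(n)))
    col_counts[j] += 1
    total += sum(p[i] * B[i][j] for i in range(n))
    # Multiplicative-weights update with rewards (not mistakes).
    w = [w[i] * (1 + eps * B[i][j]) for i in range(n)]

# Best fixed row action in hindsight against the column player's sequence.
best = max(sum(B[i][jj] * c for jj, c in enumerate(col_counts)) for i in (0, 1))
```

Per the two inequalities in the proof: the best response by the column player forces total ≤ T · v_R = 0, while the no-regret guarantee keeps total within ε·T + (ln n)/ε of the best action in hindsight, which earns at least T · v_C = 0.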
• Goal: lower bound the expected running time that any randomized algorithm for a problem must incur in the worst case over problem instances.
• The expectation (in running time) is over the randomization of the algorithm.
• The problem instance (worst case) is chosen to maximize this expected running time.
• Capital letters for "randomized", small letters for deterministic:
  a : a deterministic algorithm
  A : a randomized algorithm
  x : a problem instance
  X : a distribution over problem instances
  T : running time

[Figure: table of running times, indexed by problem instances and algorithms]
• To lower bound the E[running time] of any randomized algorithm:
• Choose a distribution X over problem instances, and show that every deterministic algorithm a has expected running time at least the desired bound on instances drawn from X.
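A toy instance of this recipe (a hypothetical example, not from the slides): finding the unique 1 in a length-n array by probing cells. A deterministic algorithm is just a fixed probing order, and under the uniform distribution X over instances every order needs (n+1)/2 probes in expectation, so any randomized algorithm needs at least (n+1)/2 expected probes on its worst-case instance:

```python
from itertools import permutations

n = 4  # array length; an instance is the position of the unique 1

def expected_probes(order):
    # Deterministic algorithm a = fixed probing order; instance x = position
    # of the 1. Probes on instance x = 1 + (index of x in the order).
    # Expectation is over the uniform distribution X on instances.
    return sum(order.index(x) + 1 for x in range(n)) / n

# Enumerate every deterministic algorithm (every probing order).
costs = [expected_probes(order) for order in permutations(range(n))]
best_det = min(costs)  # (n + 1) / 2, for every probing order
```

Since even the best deterministic algorithm averages (n+1)/2 probes on X, the lower bound transfers to all randomized algorithms.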