CS6501: Topics in Learning and Game Theory (Fall 2019)
Introduction
Instructor: Haifeng Xu
Outline
- Course Overview
- Administrivia
- An Example
Single-Agent Decision Making
- A decision maker picks an action y ∈ Y, resulting in utility g(y)
- Typically an optimization problem: minimize (or maximize) g(y) subject to y ∈ Y
- Example 1: minimize y², s.t. y ∈ [−1, 1]
- Example 2: pick a road to school
- Example 3: invest in a subset of stocks
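As a sanity check, Example 1 can be solved numerically. The sketch below (plain Python; the grid-search approach and all names are my illustrative choices, and any off-the-shelf optimizer would do) minimizes g(y) = y² over [−1, 1]:

```python
# Single-agent decision making as optimization:
# minimize g(y) = y^2 subject to y in [-1, 1], via a simple grid search.

def minimize_on_interval(g, lo, hi, steps=10_000):
    """Return (y*, g(y*)) approximately minimizing g over [lo, hi]."""
    best_y, best_val = lo, g(lo)
    for i in range(1, steps + 1):
        y = lo + (hi - lo) * i / steps
        val = g(y)
        if val < best_val:
            best_y, best_val = y, val
    return best_y, best_val

y_star, g_star = minimize_on_interval(lambda y: y * y, -1.0, 1.0)
print(y_star, g_star)  # y* near 0, g(y*) near 0
```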
Multi-Agent Decision Making
- Usually, your payoff is affected not only by your own actions, but also by others’
- Agent j’s utility g_j(y_j, y_−j) depends on his own action y_j, as well as the other agents’ actions y_−j
- Is this still an optimization problem? Should each agent j just pick y_j ∈ Y_j to minimize g_j(y_j, y_−j)?
- Examples: stock investment, routing, sales, even taking courses…
Example: Prisoner’s Dilemma
- Two members A, B of a criminal gang are arrested
- They are questioned in two separate rooms
  - No communication between them
Q: How should each prisoner act?
- Both of them betray
- (−1, −1) is the best joint outcome, but it is not a stable status: note the inefficiency
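The conclusion above can be checked mechanically. The sketch below enumerates the pure equilibria of a 2×2 game; the payoff numbers beyond the (−1, −1) for mutual silence are a standard but assumed choice, not from the slides:

```python
# Prisoner's Dilemma as a matrix game. Actions: 0 = stay silent, 1 = betray.
# payoff[(a, b)] = (utility of A, utility of B) when A plays a and B plays b.
payoff = {
    (0, 0): (-1, -1),   # both silent: best joint outcome
    (0, 1): (-3, 0),    # A silent, B betrays (assumed numbers)
    (1, 0): (0, -3),    # A betrays, B silent (assumed numbers)
    (1, 1): (-2, -2),   # both betray (assumed numbers)
}

def pure_nash_equilibria(payoff):
    """Profiles where neither player gains by deviating unilaterally."""
    eqs = []
    for a in (0, 1):
        for b in (0, 1):
            ua, ub = payoff[(a, b)]
            a_ok = all(payoff[(a2, b)][0] <= ua for a2 in (0, 1))
            b_ok = all(payoff[(a, b2)][1] <= ub for b2 in (0, 1))
            if a_ok and b_ok:
                eqs.append((a, b))
    return eqs

print(pure_nash_equilibria(payoff))  # [(1, 1)]: mutual betrayal is the only stable status
```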
If the market has only one book seller…
- Assume people will buy if the book price ≤ $200
- Product cost = $20
Q: What price should this monopoly set?
A: $200!
What if the market has two book sellers…
- Assume people will buy if the book price ≤ $200
- Product cost = $20
Q: What price should each seller set?
The sellers undercut each other: $200 → $199 → $198 → … → $100 → … → $20, $20
At $20, $20:
- The market reaches a “stable status” (a.k.a. an equilibrium)
- Nobody can benefit via unilateral deviation
- Note the inefficiency (to the sellers)
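The undercutting dynamic on these slides can be simulated. The $200 cap and $20 cost are from the slides; the $1 undercutting step and the alternating best-response schedule are illustrative assumptions:

```python
# Price war between two sellers: each round, a seller undercuts the rival
# by $1 when profitable, but never prices below cost.
COST = 20
MAX_PRICE = 200  # buyers purchase only if price <= $200

def best_response(own_price, rival_price):
    """Undercut the rival by $1 if that helps; never go below cost."""
    return max(COST, min(own_price, rival_price - 1))

p1, p2 = MAX_PRICE, MAX_PRICE - 1  # e.g. seller 2 opens by undercutting to $199
while True:
    new_p1 = best_response(p1, p2)
    new_p2 = best_response(p2, new_p1)
    if (new_p1, new_p2) == (p1, p2):
        break  # nobody benefits from further unilateral deviation: equilibrium
    p1, p2 = new_p1, new_p2

print(p1, p2)  # both prices driven down to cost: 20 20
```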
Game theory studies multi-agent decision making in competitive scenarios, where an agent’s payoff depends on other agents’ actions.
- Fundamental concept: equilibrium, a status that is stable against unilateral deviation
- A central theme in game theory is to study the equilibrium, the inefficiency of equilibrium, …
What is learning?
- Difficult to give a universal definition
- At a high level, the task is to learn a function g: Y → Z, where (y, z) ∈ Y × Z is drawn from some distribution D
- Given samples (y_i, z_i), i = 1, 2, ⋯, m, drawn from D, learn a g that predicts well on fresh draws (performance measured by some loss function)
- Examples: reinforcement learning can be thought of as learning with Y = state space, Z = action space
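As a toy instance of this template, the sketch below fits g(y) = a·y to samples (y_i, z_i) by minimizing squared loss in closed form; the linear model, the noise level, and the true slope 2 are all illustrative assumptions:

```python
# Learning g: Y -> Z from samples: fit a line through the origin to
# samples (y_i, z_i) drawn from a distribution where z = 2y + noise.
import random

random.seed(0)
samples = [(y, 2 * y + random.gauss(0, 0.1))
           for y in [random.uniform(-1, 1) for _ in range(200)]]

# Least-squares slope through the origin: a = sum(y*z) / sum(y*y)
num = sum(y * z for y, z in samples)
den = sum(y * y for y, z in samples)
a = num / den
print(round(a, 2))  # close to the true slope 2
```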
Problems at the Interface of Learning and Game Theory
- If a game is unknown or too complex, can players learn to play the game optimally?
- Can game-theoretic models inspire machine learning models?
- Data is the fuel for ML: can we collect high-quality data from the crowd?
- We know how to learn to recognize faces or languages, but can we also learn to design games to achieve some goal?
- Game-theoretic/strategic behaviors in ML? How to handle them? (e.g., admitting a student to UVA based on their features)
- …
First Half: Machine learning for game theory
- No-regret learning and its convergence to equilibrium
- Learning optimal auction mechanisms
Second Half: Game theory for machine learning
- Incentivize high-quality data via information elicitation (a.k.a. crowdsourcing)
- Handle strategic behaviors in machine learning
We only cover the fundamentals of each direction.
- Get familiar with the basics of game theory and learning
- Understand machine learning questions in game-theoretic settings, and how to deal with some of them
- Understand strategic aspects of machine learning tasks, and how to deal with some of them
- Be able to understand cutting-edge research papers in relevant areas
- Anyone planning to do research at the interface of game theory (or algorithm design) and machine learning
- Anyone interested in theoretical ML, game theory, human factors in learning, or AI
  - Game-theoretic reasoning is becoming increasingly important
  - Toolkits for designing and solving games
- Anyone interested in understanding the basics of game theory and learning
- Those who do not satisfy the prerequisites “in practice”
- Those who are looking for a recipe to implement ML/DL algorithms, or want to learn how to use TensorFlow, PyTorch, etc. on practical problems
Outline
- Course Overview
- Administrivia
- An Example
- Course time: Tuesday/Thursday, 3:30 pm – 4:45 pm
- Lecture place: Thornton Hall E303
- Instructor: Haifeng Xu
- TAs
- Depending on demand, we can add more office hours (let us know!)
- Course website: http://www.haifeng-xu.com/cs6501fa19/
- References: linked papers/notes on the website; no official textbook
- Mathematical maturity: be comfortable with proofs
- Sufficient exposure to algorithms/optimization
- 3–4 homeworks, 60% of grade; to be completed independently
- Research project, 40% of grade; project instructions will be posted
- FYI: you should not worry about your grade if you do invest the time
If you have any suggestions/comments/concerns, feel free to email me.
Outline
- Course Overview
- Administrivia
- An Example
An Example: Learning to Price
- You are a product seller facing n unknown buyers
- These buyers all value your product at the same v ∈ [0, 1], which however is unknown to you
- Buyers come in sequence 1, 2, ⋯, n; for each buyer, you can choose a price p and ask whether he is willing to buy the product
- How can you quickly learn the buyers’ value v within precision ε = 1/n?
- Answer: log(n) rounds via binary search
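A sketch of that binary-search argument: each posted price elicits one bit of information (buy iff price ≤ v), so halving the interval reaches width 1/n in about log₂(n) rounds. The function and variable names below are mine:

```python
# Learn the common value v by binary search on buyer responses.

def binary_search_value(buyer_accepts, n):
    """Narrow an interval [lo, hi] containing v to width 1/n."""
    lo, hi = 0.0, 1.0
    rounds = 0
    while hi - lo > 1.0 / n:
        p = (lo + hi) / 2
        if buyer_accepts(p):   # sale: v >= p
            lo = p
        else:                  # no sale: v < p
            hi = p
        rounds += 1
    return lo, hi, rounds

v, n = 0.737, 1024
lo, hi, rounds = binary_search_value(lambda p: p <= v, n)
print(rounds, lo <= v <= hi)  # exactly log2(1024) = 10 rounds, and v is bracketed
```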
Let us move to a natural game-theoretic setup…
- You also have the objective of maximizing your revenue, but do not really care about learning v (though you may have to)
- How much revenue can binary search secure?
  Rev = (first log n rounds) + (remaining rounds) ≥ 0 + (n − log n)(v − 1/n) ≈ vn − v log n − 1
- To measure an algorithm’s performance, we use regret
- Had we known v, we would just price the product at p = v, earning vn
- The regret is then
  Regret(binary search) ≈ vn − (vn − v log n − 1) = v log n + 1
Regret := how much less an algorithm’s utility is compared to the (idealized) case where we know v.
Q: Is this the best (i.e., the smallest) regret?
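The regret calculation can be checked empirically. The sketch below runs midpoint (binary-search) pricing, switches to charging the learned lower bound once the interval is narrow, and compares revenue to the ideal vn; the test values v = 0.8 and n = 10,000 are arbitrary choices:

```python
# Revenue of binary-search pricing against n buyers with common value v.

def binary_search_revenue(v, n):
    """Midpoint probes while the interval is wide, then charge the
    learned lower bound forever after."""
    lo, hi = 0.0, 1.0
    revenue = 0.0
    for _ in range(n):
        exploring = hi - lo > 1.0 / n
        p = (lo + hi) / 2 if exploring else lo
        if p <= v:               # sale at price p
            revenue += p
            if exploring:
                lo = p
        elif exploring:          # no sale: v < p
            hi = p
    return revenue

v, n = 0.8, 10_000
regret = v * n - binary_search_revenue(v, n)
print(round(regret, 2))  # on the order of v * log(n), far below n
```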
Theorem [Kleinberg/Leighton, FOCS’03]: there is an algorithm achieving regret at most 1 + 2 log log n.
Why may binary search be bad?
- For buyer j, binary search maintains an interval [b_j, c_j] and uses the price p_j = (b_j + c_j)/2
- p_j = (b_j + c_j)/2 may be too high/aggressive
- Algorithm idea: use more conservative prices
The algorithm (note v ∈ [0, 1]):
- Maintain an interval [b_j, c_j] and a step size Δ_j
- Offer price p_j = b_j + Δ_j to buyer j
- If j accepts, update b_{j+1} = p_j, c_{j+1} = c_j, Δ_{j+1} = Δ_j
- Otherwise, update b_{j+1} = b_j, c_{j+1} = p_j, Δ_{j+1} = (Δ_j)²
- Start with b_1 = 0, c_1 = 1, Δ_1 = 1/2; once c_j − b_j ≤ 1/n, always use p = b_j afterwards
Remark: search a smaller region with a smaller step size.
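A direct transcription of the algorithm above into Python (the variables b, c, D mirror b_j, c_j, Δ_j; the test values v = 0.8 and n = 10,000 are arbitrary choices):

```python
# Conservative pricing with step-size squaring (per the algorithm above).

def conservative_pricing(v, n):
    """Total revenue from n buyers with common value v in [0, 1]."""
    b, c, D = 0.0, 1.0, 0.5      # interval [b, c] and step size Delta
    revenue = 0.0
    for _ in range(n):
        learning = c - b > 1.0 / n
        p = b + D if learning else b   # conservative probe, then lock in b
        if p <= v:                      # buyer accepts
            revenue += p
            if learning:
                b = p                   # raise the lower bound, keep the step size
        elif learning:                  # buyer rejects
            c = p                       # lower the upper bound...
            D = D * D                   # ...and square the step size

    return revenue

v, n = 0.8, 10_000
regret = v * n - conservative_pricing(v, n)
print(round(regret, 2))  # small: the theorem bounds it by 1 + 2 log log n
```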
Claim 1: The step size Δ_j takes values 2^(−2^k) for k = 0, 1, ⋯. Moreover, whenever Δ_{j+1} = (Δ_j)² happens, c_{j+1} − b_{j+1} = Δ_j = √(Δ_{j+1}).
Proof:
- Recall Δ_1 = 1/2 = 2^(−2^0), and the step-size update is Δ_{j+1} = (Δ_j)²
- If Δ_j = 2^(−2^k), then (Δ_j)² = 2^(−2^k − 2^k) = 2^(−2^(k+1))
- When Δ_{j+1} = (Δ_j)² happens, the rejection sets c_{j+1} = p_j = b_j + Δ_j and b_{j+1} = b_j, so c_{j+1} − b_{j+1} = Δ_j = √(Δ_{j+1})
Algorithm analysis:
- After c_j − b_j ≤ 1/n, the total regret is at most 1, because (1) the locked-in price b_j is within 1/n of v, and (2) there are at most n such rounds
- The main step is to bound the regret before reaching c_j − b_j ≤ 1/n
Algorithm analysis (continued):
- How many step-size value updates are needed to reach c_j − b_j = 1/n? Setting 2^(−2^k) = 1/n gives k = log log n
Claim 2: the total regret from any step-size value Δ is at most 2.
- A no-sale happens only once for any step size (it triggers the update Δ → Δ²), so no-sales contribute regret at most 1 per step-size value
- What about the regret when sales happen? With step size Δ, each sale raises b_j by Δ within an interval of length at most √Δ (Claim 1), so there are at most √Δ/Δ sales; the regret from each sale is at most c_j − b_j ≤ √Δ
- Total regret from sales at step size Δ: at most (√Δ/Δ) · √Δ = 1
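The analysis suggests conservative pricing should beat binary search by an exponential margin in regret (log log n vs. log n). The sketch below runs both policies through one routine to compare them; the value v = 0.8 and the horizons are arbitrary test choices:

```python
# Compare the regret of midpoint (binary-search) pricing against the
# conservative step-size-squaring policy analyzed above.

def run(v, n, conservative):
    """Return the regret v*n - revenue of interval pricing: midpoint
    probes (binary search) or conservative probes with squaring."""
    b, c, D = 0.0, 1.0, 0.5
    revenue = 0.0
    for _ in range(n):
        learning = c - b > 1.0 / n
        if learning:
            p = b + D if conservative else (b + c) / 2
        else:
            p = b                # interval narrow enough: lock in price b
        if p <= v:               # sale
            revenue += p
            if learning:
                b = p
        elif learning:           # no sale
            c = p
            if conservative:
                D = D * D
    return v * n - revenue

v = 0.8
for n in (10**4, 10**6):
    print(n, round(run(v, n, False), 1), round(run(v, n, True), 1))
```

The gap should widen as n grows, mirroring the log n vs. log log n bounds.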
Remarks
- Θ(log log n) is also the order-wise best regret [Kleinberg/Leighton, FOCS’03]
- This is an example of exploration vs. exploitation
- Binary search is best for exploration, but does not balance the two
- The “optimal” algorithm uses fewer step-size updates but more interval updates, trading some learning speed for revenue maximization
- Here, it is crucial that each buyer only shows up once
- What if the same buyer shows up repeatedly?
- How should a repeatedly arriving buyer behave if he knows the seller is learning his value v and will then use it to set his price?
Open research questions:
1. How to design pricing schemes that maximize revenue from a repeatedly arriving buyer who knows you are learning his value?
2. How to generalize to selling multiple products?
Haifeng Xu
University of Virginia hx4ad@virginia.edu