A tutorial on: Dynamic Mechanism Design
Ruggiero Cavallo University of Pennsylvania
Department of Computer and Information Science
July 7, 2009 ACM EC
The setting
Sequence of decisions to be made, impacting the utility experienced by a group of agents.
Agents hold private valuation information. Knowledge of the private information is required to determine the optimal decision at every point in time. New private information potentially arrives after each decision.
2
Example: a resource – say a government-owned one – is allocated repeatedly, for 1-week intervals.
3
Problem: Agents’ goals differ from the center’s goals, but agent cooperation is essential.
Mechanism design: the specification of payment schemes such that cooperative behavior arises in equilibrium. An extension/generalization of “static” mechanism design.
4
completely independent of future decisions.
What value for the resource? What opportunities for reselling the resource?
5
design.
dynamic settings, dynamic equilibrium notions.
6
Just social-welfare maximizing, here
7
have (more complicated) analogs in the dynamic setting.
8
9
10
private information decision, payments
11
A mechanism specifies a decision and a monetary charge/payment imposed on each agent.
Desired properties: social-welfare maximizing (a.k.a. efficient); individually rational (no agent worse off); budget properties.
Dominant strategies: truthful reporting is a utility-maximizing strategy, regardless of what the other agents do.
Ex post: truthful reporting of type is utility-maximizing, whatever the types of the other agents. [same as strategyproof in a private-values setting]
Bayes-Nash: truthful reporting of type is utility-maximizing, in expectation given a distribution over others’ types, assuming the other agents are truthful.
12
The VCG mechanism: choose the decision that is social-welfare maximizing according to agent reports; pay each agent the reported value of all other agents for the chosen decision, minus a charge independent of the agent’s report.
[Vickrey, 1961; Clarke, 1971; Groves 1973]
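As a concrete sketch (an illustrative single-item special case, not from the slides): the efficient decision gives the item to the highest bidder, and the VCG charge reduces to the second-highest bid.

```python
def vcg_single_item(bids):
    """VCG specialized to allocating one item: choose the welfare-maximizing
    decision (give the item to the highest bidder); the winner's charge is
    the externality imposed on the others, i.e. the second-highest bid."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    others = [b for i, b in enumerate(bids) if i != winner]
    payment = max(others) if others else 0.0
    return winner, payment
```

With bids 10, 30, 20, agent 1 wins and pays 20: the familiar Vickrey outcome.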
13
14
The Groves class exactly corresponds to those mechanisms that are efficient in dominant strategies.*
[Green & Laffont, 77], strengthened by [Holmstrom, 79]
Our freedom is limited to defining the agent-independent “charge” term
*For sufficiently rich domains (“for all practical purposes”).
15
Why truthful? Each agent obtains payoff equal to social welfare, minus a payment independent of his behavior.
16
A little more subtle in dynamic setting...
Charge each agent i an amount equal to the value the other agents could have obtained if i’s interests were ignored.
Each agent’s utility equals his contribution to social welfare. Ex post individually rational (if agents have non-negative values for all outcomes). No-deficit... in fact often yields high revenue.
17
Each agent is paid the welfare the others will get given his report, minus some uninfluenceable quantity.
18
weakly efficient and IR, unlike VCG.
Idea: return revenue to the agents – thus improving social welfare – without weakening the equilibrium or running a deficit.
settings.
19
Idea: leverage domain information to obtain “revenue-guarantees”.
For each agent i, compute the minimum revenue Gi that i could cause to result, given the reports of the other agents. Run VCG. Give each agent i a payment of Gi/n.
Applicable to any setting (e.g., combinatorial allocation). In single- item allocation, coincides with [Bailey, 97] mechanism.
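A minimal sketch in the single-item case (my own illustrative encoding): Gi is the second-highest bid among the agents other than i, the revenue guaranteed no matter what i reports.

```python
def cavallo_redistribution(bids):
    """Run a single-item Vickrey (VCG) auction, then rebate to each agent i
    a 1/n share of G_i: the minimum revenue i could cause to result given
    the others' reports (here, the second-highest bid among the others)."""
    n = len(bids)
    winner = max(range(n), key=lambda i: bids[i])
    revenue = sorted(bids)[-2] if n >= 2 else 0.0  # winner's Vickrey payment
    rebates = []
    for i in range(n):
        others = sorted((b for j, b in enumerate(bids) if j != i), reverse=True)
        g_i = others[1] if len(others) >= 2 else 0.0
        rebates.append(g_i / n)
    return winner, revenue, rebates
```

Since each Gi is a lower bound on realized revenue, the rebates sum to at most the revenue collected: no deficit, yet less value is burned than under plain VCG.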
20
21
[Plot: % value retained by agents vs. number of agents (3–10), comparing dynamic-VCG and dynamic-RM]
case optimality.
mechanism.
Money burning when payments are not possible.
22
with, e.g., revenue maximization, maximizing the minimum utility, etc.
23
24
Each agent holds private information (“local state”). The center chooses actions to execute, which generates value (of varying degree) for agents and yields new local states.
depend on state.
Agents discount the future: value x obtained k steps in the future is worth γ^k · x, for some 0 < γ ≤ 1.
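Discounting can be made concrete with a trivial helper (not from the slides):

```python
def discounted_value(rewards, gamma):
    """Present value of a stream of future rewards: a reward r received k
    steps from now is worth gamma**k * r, so sum the discounted stream."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))
```

For example, two rewards of 10 with γ = 0.5 are worth 10 + 5 = 15 today.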
Key variable: local state.
25
An agent’s type determines: a conditional distribution over the value the agent would obtain, for every possible action; and a conditional distribution over future local states, for every possible action.
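One hypothetical way to encode such a type in code (the names and structure are mine, not the tutorial’s):

```python
import random

# state -> action -> list of (probability, immediate value, next state):
# a conditional distribution over value and over successor local states.
example_type = {
    "s0": {"allocate": [(0.5, 10.0, "s1"), (0.5, 0.0, "s0")]},
    "s1": {"allocate": [(1.0, 20.0, "s1")]},
}

def sample_step(agent_type, state, action, rng):
    """Sample (value, next_state) from the agent's conditional distributions."""
    r, cum = rng.random(), 0.0
    for p, value, nxt in agent_type[state][action]:
        cum += p
        if r <= cum:
            return value, nxt
    _, value, nxt = agent_type[state][action][-1]  # guard float rounding
    return value, nxt
```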
26
value obtained and subsequent local state for each agent are independent of other agents’ local states. Dynamic version of private values.
27
When a given action is taken in a given state, what value results?
When a given action is taken in a given state, what new state results?
28
29
[Diagram: an example local MDP – states with values 10, 20, 60, and 100; transition probabilities 1/4, 3/4, and 1/2]
Components: transition dynamics between local states; a value function for state-action pairs; an indicator of the “current” state.
30
Private information is gradually revealed to the agent by nature over time; the local state captures the uncertainty. Without stochastic transitions, we’re in a static MD setting – just decide the policy at time 0.
31
special case of the general sequential decision-making framework.
allocation scenarios.
32
Each agent is described by a Markov chain: no multiplicity of actions.
When an agent’s action is chosen, his state changes; otherwise, it doesn’t.
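This bandit structure is easy to simulate (a sketch with my own encoding of the chains; only the selected agent’s chain advances):

```python
import random

def simulate_mab(chains, starts, policy, horizon, rng):
    """Multi-armed bandit structure: each agent is a Markov chain with no
    choice of actions. chains[i]: state -> (reward, [(prob, next_state), ...]).
    Per period, the policy picks one agent; only that agent's chain moves."""
    states = list(starts)
    total = 0.0
    for _ in range(horizon):
        i = policy(states)                       # which agent to "pull"
        reward, transitions = chains[i][states[i]]
        total += reward
        r, cum = rng.random(), 0.0
        for p, nxt in transitions:
            cum += p
            if r <= cum:
                states[i] = nxt
                break
        # all other agents' states stay frozen
    return total, states
```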
33
34
[Diagram: two agents’ Markov chains – values 30, 30, 30 and 20, 20, 20; transition probabilities 3/4, 1/4, 1/2, 1/2]
35
Allocate to agent 1, who finds no value.
36
takes an action.
a decision policy: a function that maps a joint type to an action.
a transfer function: a function that maps a joint type to a payment for each agent.
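In code, such a mechanism is just a pair of functions (an illustrative static instance; the type encodings are mine):

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

JointType = Tuple[float, ...]  # illustrative: one reported value per agent

@dataclass
class Mechanism:
    policy: Callable[[JointType], int]                 # joint type -> action
    transfers: Callable[[JointType], Sequence[float]]  # joint type -> payments

def _winner(t: JointType) -> int:
    return max(range(len(t)), key=lambda i: t[i])

# Example instance: a one-shot second-price auction.
second_price = Mechanism(
    policy=_winner,
    transfers=lambda t: [-sorted(t)[-2] if i == _winner(t) else 0.0
                         for i in range(len(t))],
)
```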
37
If all other agents play the equilibrium strategy in the future, no agent can benefit from deviating – regardless of what the joint state is and regardless of what came before.
38
If all other agents report types truthfully in the future, no agent can benefit from misreporting type – regardless of what the joint type is and regardless of what came before.
39
No incentive to deviate even if agents know everything
predictions about the future in determining how to maximize utility – and this requires positing some behavior for other agents.
to the agent’s utility, incentives couldn’t possibly be aligned.
40
Given a distribution over other agents’ types, no agent can expect to gain from deviating if the others follow the equilibrium strategy.
Within-period ex post also involves expectation, but expectation is over uncertain type transitions, not current types.
41
achieved in equilibrium.
lose from participating.
Within-period ex post: at every time-step, for every joint type. Ex ante: from beginning of the mechanism, for whatever the joint type is then.
42
A revelation principle holds [Myerson, 1986]: we can restrict attention to truthful revelation mechanisms, without loss of generality.
43
44
[Athey & Segal, 07]
Follows the efficient policy given agent reports. In each period, pays each agent the expected immediate value obtained by the other agents given reported types (a “Groves payment”).
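The per-period payment rule is one line (a sketch; the expected values would come from the center’s model of the reported types):

```python
def team_payments(reported_immediate_values):
    """Per-period Groves payment of the dynamic team mechanism: each agent
    is paid the (expected) immediate value obtained by all other agents,
    computed from reported types (supplied here as plain numbers)."""
    total = sum(reported_immediate_values)
    return [total - v for v in reported_immediate_values]
```

With reports 5, 3, 2 the payments are 5, 7, 8: the center pays out (n−1) times the total value every period, which is exactly the extreme budget imbalance of the team mechanism.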
45
[Diagram: example local MDPs for Agent 1 and Agent 2 – states A–Q, values 20, 60, 100, 200; transition probabilities 1/4, 3/4, 1/2, 9/10, 1/10]
46
* → blue AJ → red or blue BJ → red or blue CK → red CL → blue
47
48
49
Theorem: The dynamic team mechanism is truthful and efficient in within-period ex post Nash equilibrium.
[Athey & Segal, 07]
50
A dynamic-Groves mechanism defines payments such that: each agent’s expected sum of payments when he follows strategy σ equals the expected value obtained by the other agents, minus some quantity independent of σ.
51
52
Theorem: Every dynamic-Groves mechanism is truthful and efficient in within-period ex post Nash equilibrium.
[Cavallo, Parkes, & Singh, 07]
Proof: Each agent obtains social utility (aligns incentives) minus some constant (doesn’t distort).
Theorem: For unrestricted types, the dynamic- Groves class exactly corresponds to the history- independent dynamic mechanisms that are truthful and efficient in within-period ex post Nash equilibrium. [Cavallo, 08]
For within-period ex post efficient (and history- independent) dynamic mechanism design, dynamic-Groves is the only game in town.
53
Generalizes [Green & Laffont, 77] (Groves class unique for static settings). Proof idea: If non-Groves, there is always some type for which incentives are sufficiently distorted from efficiency.
54
If we demand efficiency in the strongest sense, we know what the possibilities are.
Next: desirable budget/participation properties. The basic “team mechanism” won’t fly – extreme budget imbalance; we need to recover payments...
55
Idea: charge agents some quantity computed “ex ante” – independently of anything they report.
56
Choose the efficient decision given reported types. Make Groves payments. Charge each agent a quantity based only on the reported types of the other agents in the first time-step: (1−γ) times the total value the other agents would obtain, in expectation from the beginning of the mechanism, if the policy optimal for them were chosen. [Cavallo, Parkes, & Singh, 06]

T_i(θ^t) = r_{-i}(θ^t_{-i}, π*(θ^t)) − (1 − γ) V_{-i}(θ^0_{-i})
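Numerically the transfer is immediate once the center’s planner supplies the two quantities (a sketch; the argument names are mine):

```python
def eac_transfer(r_others, gamma, v_others_initial):
    """Dynamic-EAC transfer to agent i in one period: the Groves payment
    r_{-i} (others' expected immediate value under the efficient policy)
    minus the ex ante charge (1 - gamma) * V_{-i}(theta^0_{-i})."""
    return r_others - (1.0 - gamma) * v_others_initial
```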
57
[Diagram: dynamic-EAC payments on the Agent 1 / Agent 2 example]
58
Theorem: The dynamic-EAC mechanism is truthful and efficient in within-period ex post Nash equilibrium, ex ante individual rational, and ex ante no-deficit.
[Cavallo, Parkes, & Singh, 06]
59
Agents “sign up” at the beginning of the mechanism, but may wish to back out...
60
Charge each agent i the expected value the other agents would obtain if i were ignored after one step, minus the value they’d obtain if i were always ignored. Each agent has to pay the amount by which he inhibits others’ welfare (now and in the future) via his current report.
61
Charge: the expected value the other agents would obtain if i were ignored after one step, minus the value they’d obtain if i were always ignored.

T_i(θ^t) = r_{-i}(θ^t_{-i}, π*(θ^t)) + γ E[V_{-i}(τ(θ^t_{-i}, π*(θ^t)))] − V_{-i}(θ^t_{-i})
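As with dynamic-EAC, the per-period transfer is trivial given the planner’s quantities (a sketch; argument names are mine):

```python
def dynamic_vcg_transfer(r_others, gamma, ev_others_next, v_others_now):
    """Dynamic-VCG transfer to agent i in one period, per the formula above.
    r_others: others' expected immediate value under the efficient policy;
    ev_others_next: E[V_{-i}] at the sampled next state, ignoring i from then on;
    v_others_now: V_{-i} now, the others' optimal value if i is always ignored."""
    return r_others + gamma * ev_others_next - v_others_now
```

With nonnegative values, V_{-i}(now) is optimal for the others, so r_others + γ·E[V_{-i}(next)] ≤ V_{-i}(now) and the transfer is never positive: no-deficit.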
62
[Diagram: dynamic-VCG payments on the Agent 1 / Agent 2 example]
63
No-deficit: the transfer to each agent, from any joint state, at any time t, is never positive:
64
(NB: assumes no negative values.)

T_i(θ^t) = r_{-i}(θ^t_{-i}, π*(θ^t)) + γ E[V_{-i}(τ(θ^t_{-i}, π*(θ^t)))] − V_{-i}(θ^t_{-i}) ≤ 0,

since V_{-i}(θ^t_{-i}) is the optimal value achievable for the others, so

r_{-i}(θ^t_{-i}, π*(θ^t)) + γ E[V_{-i}(τ(θ^t_{-i}, π*(θ^t)))] ≤ V_{-i}(θ^t_{-i}).

Individual rationality: each agent’s equilibrium payoff is V(θ^t) − V_{-i}(θ^t_{-i}) ≥ 0.
Theorem: The dynamic-VCG mechanism is truthful and efficient in within-period ex post Nash equilibrium, within-period ex post individual rational, and ex post no-deficit.
65
66
[Plot: % value retained by agents vs. number of agents (3–10) under dynamic-VCG]
In a single-item allocation setting, with values normally distributed.
Theorem: Among all history-independent mechanisms that are efficient in within- period ex post Nash equilibrium and within- period ex post individual rational, dynamic- VCG yields the most expected revenue, for every joint type.
[Cavallo, 08]
67
If we want to return revenue to the agents, what do we do?
In static settings, budget balance was achieved by redistribution mechanisms; strong budget-balance by moving to Bayes-Nash equilibrium.
68
Redistribution is subtler in the dynamic setting: a redistribution payment computed in later time periods can potentially be influenced via an agent’s reports in earlier periods... in subtle ways. Focus on worlds representable as multi-armed bandits.
69
70
The allocated agent i pays (1−γ) times the expected value the other agents would get if i were always ignored. The other agents pay nothing.
71
The allocated agent pays (1−γ) times the expected value the other agents would get if he were always ignored.
[Diagram: two-agent bandit example – values 30, 30, 30 and 20, 20; transition probabilities 3/4, 1/4, 1/2, 1/2]
72
T1 = −(1−γ)(10 + γ·10)
73
74
T2 = −(1−γ) · 7.5
75
76
77
Dynamic-RM additionally makes redistribution payments to the agents each period:
For the agent i receiving the item: (1−γ)/n times the expected total discounted revenue that would result if i were ignored going forward. For every other agent j: 1/n times the expected immediate revenue that would have resulted this period if j were ignored.
Lemma: Whatever strategy an agent follows, his expected redistribution payments over time equal: a 1/n share of the expected total (over time) revenue that would result if the agent were not present.
(This is the hard part to prove. Once we have, it follows that dynamic-RM is a dynamic-Groves mechanism, and thus efficient.)
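The per-period redistribution rule above can be sketched directly; the two expected-revenue quantities would be computed by the center’s planner and are supplied here as inputs (the names are mine):

```python
def dynamic_rm_rebates(winner, gamma, total_rev_without, imm_rev_without):
    """Per-period dynamic-RM redistribution payments.
    winner: the agent allocated the item this period.
    total_rev_without[i]: expected total discounted future revenue were i ignored.
    imm_rev_without[j]: expected immediate revenue this period were j ignored."""
    n = len(total_rev_without)
    return [
        (1.0 - gamma) / n * total_rev_without[i] if i == winner
        else imm_rev_without[i] / n
        for i in range(n)
    ]
```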
78
Theorem: Dynamic-RM is efficient in within- period ex post Nash equilibrium, within- period ex post IR, and never runs a deficit.
And yields significantly more value for the agents than dynamic-VCG.
Examples with three or more agents are tough to illustrate, so let’s just look at aggregate results:
79
80
[Plot: % value retained by agents vs. number of agents (3–10), dynamic-RM vs. dynamic-VCG]
81
82
mechanism | efficiency | IR | budget balance
team mechanism | w.p. ex post | w.p. ex post | huge deficit
dynamic-EAC | w.p. ex post | ex ante | ex ante no-deficit
dynamic-VCG | w.p. ex post | w.p. ex post | ex post no-deficit
dynamic-RM (only for MABs) | w.p. ex post | w.p. ex post | ex post no-deficit, much closer to perfect BB
balanced-mechanism | Bayes-Nash | ex ante | perfect
83
84
Agents may – temporarily or permanently – become “inaccessible”, i.e., unable to communicate with the center or make/receive payments.
85
Example: tourists who plan to see multiple shows over a period of days. New tourists are always arriving, others leaving (dynamic population). A tourist may see a show and realize she likes the theater more/less (dynamic types).
86
Prior work: dynamic populations but static types – all private information an agent will ever obtain can be reported in her arrival period.
[Friedman & Parkes, 03] [Parkes & Singh, 03] [Lavi & Nisan, 04] [Porter, 04]
87
Each agent reports her type in her “arrival period”. Within-period ex post efficient. Ex post individually rational. Ex post no-deficit.
88
But only for static types.
The optimal policy must consider accessibility/inaccessibility dynamics. Agents may not be available for payment while still exerting influence on the welfare of other agents.
89
[Diagram (a): Agent 1’s type – states A, B, C over t = 1, 2, 3; values 8 and 2]
[Diagram (b): Agent 2’s type – states D–H over t = 1, 2, 3; transition probabilities 0.2 and 0.8; values 4 and 20]
90
Agent 2 is inaccessible at t = 1 but very likely to become accessible at t = 2.
better off “hiding” to improve social-welfare.
If we make dynamic-VCG payments only to accessible agents, agent 2 can benefit by hiding.
Idea: keep track of the payments dynamic-VCG would impose on the agent; when the agent becomes accessible, execute a “lump sum” payment, appropriately scaled for discounting – the agent “pays it back”.
91
Imagine both agents accessible in all periods. Should agent 2 feign inaccessibility until t = 2?
92
T2 = -6 - 2 = -8, same whether he hides at t = 1 or not.
(The −6 is the difference in optimal value for agent 1 with and without agent 2 present at t = 1; the −2 is the same difference at t = 2.)
Within-period ex post efficiency is recovered if agent arrivals are independent conditioned…
93
maximization.
deadline.
94
95
Computing optimal policies is very hard... but often necessary.
Approximations, yielding approximate equilibria (even this is hard)? Identify tractable special cases. Thankfully, MABs are such a case.
96
Compute a Gittins index for each agent’s Markov chain; allocate to the agent with the highest index. Complexity: Gittins indices are independent across agents, so computation is linear in the number of agents.
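For a chain whose remaining rewards are deterministic, the Gittins index reduces to a maximum over stopping horizons of discounted reward per unit of discounted time; the sketch below (my simplification – general chains need an expectation over stopping times) conveys the idea:

```python
def gittins_index_deterministic(rewards, gamma):
    """Gittins index of the current state of a deterministic reward stream:
    sup over stopping horizons k of (discounted reward) / (discounted time)."""
    best = float("-inf")
    num = den = 0.0
    for t, r in enumerate(rewards):
        num += gamma ** t * r
        den += gamma ** t
        best = max(best, num / den)
    return best

def index_policy(streams, gamma):
    """Allocate to the agent whose chain has the highest index."""
    return max(range(len(streams)),
               key=lambda i: gittins_index_deterministic(streams[i], gamma))
```

A constant stream of reward 5 has index 5; a stream starting 0 then 10 has a lower index at γ = 0.5, so the index policy pulls the first arm.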
97
Coordination of value information acquisition preceding one-time allocation of a single item (“metadeliberation auctions”).
98
Agents have initial valuations for the resource. Valuations can potentially be increased by costly “deliberation” (e.g., researching new ways to use it). How do we maximize social welfare?
99
Dynamic mechanism design can be applied to deal with the incentives. Computing the optimal policy is tractable (a reduction to the multi-armed bandits problem). Even in this classic allocation scenario, a realistic analysis of the problem reveals the need for a dynamic solution.
100
101
infeasible...
strategyproof, yet achieves social-welfare ~90% of optimal.
102
quantity.
restricted setting.
103
104
kind of helps here: ex post payments become natural.
A version of the team mechanism is still within-period ex post efficient. But there is no apparent way to extend dynamic-VCG... Can we achieve no-deficit, IR, and efficiency in interdependent settings?
105
106
W. Vickrey. Counterspeculation, auctions, and competitive sealed tenders. Journal of Finance, 16:8–37, 1961.
E. Clarke. Multipart pricing of public goods. Public Choice, 8:19–33, 1971.
T. Groves. Incentives in teams. Econometrica, 41:617–631, 1973.
B. Holmström. Groves’ scheme on restricted domains. Econometrica, 47(5):1137–1144, 1979.
107
revelation under incomplete information. In M. Boskin, editor, Economics and Human Welfare. Academic Press, 1979.
C. d’Aspremont and L.-A. Gérard-Varet. Incentives and incomplete information. Journal of Public Economics, 11:25–45, 1979.
R. Cavallo. Optimal decision-making with minimal waste: Strategyproof redistribution of VCG payments. In Proceedings of the 5th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS’06), pages 882–889, 2006.
M. Guo and V. Conitzer. Worst-case optimal redistribution of VCG payments. In Proceedings of the 8th ACM Conference on Electronic Commerce (EC’07), 2007.
H. Moulin. Efficient, strategy-proof and almost budget-balanced assignment. Unpublished, 2007.
108
M. Guo and V. Conitzer. Optimal-in-expectation redistribution mechanisms. In Proceedings of the Seventh International Conference on Autonomous Agents and Multiagent Systems (AAMAS-08), 2008.
J. Hartline and T. Roughgarden. Optimal mechanism design and money burning. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC’08), 2008.
G. de Clippel, V. Naroditskiy, and A. Greenwald. Destroy to Save. In Proceedings of the 10th ACM Conference on Electronic Commerce (EC-09), 2009.
R. Myerson. Multistage games with communication. Econometrica, 54(2):323–358, 1986.
R. Cavallo, D. C. Parkes, and S. Singh. Optimal coordinated planning amongst self-interested agents with private state. In Proceedings of the Twenty-second Annual Conference on Uncertainty in Artificial Intelligence (UAI’06), 2006.
109
R. Cavallo. Efficiency and redistribution in dynamic mechanism design. In Proceedings of the 9th ACM Conference on Electronic Commerce (EC-08), 2008.
D. Bergemann and J. Välimäki. Efficient dynamic auctions. Cowles Foundation Discussion Paper 1584, http://cowles.econ.yale.edu/P/cd/d15b/d1584.pdf, 2006.
R. Cavallo, D. C. Parkes, and S. Singh. Efficient mechanisms for periodically inaccessible self-interested agents. In DIMACS Workshop on the Boundary between Economic Theory and Computer Science, 2007.
E. Friedman and D. C. Parkes. Pricing WiFi at Starbucks – issues in online mechanism design. In Proc. Fourth ACM Conference on Electronic Commerce (EC’03), pages 240–241, 2003.
D. C. Parkes and S. Singh. An MDP-based approach to Online Mechanism Design. In Proceedings of the 17th Annual Conference on Neural Information Processing Systems (NIPS’03), 2003.
R. Lavi and N. Nisan. Competitive analysis of incentive compatible on-line auctions. Theoretical Computer Science, 310:159–180, 2004. Earlier version in ACM EC 2000.
110
R. Porter. Mechanism design for online real-time scheduling. In Proceedings of the ACM Conference on Electronic Commerce (EC’04), pages 61–70, 2004.
A. Gershkov and B. Moldovanu. Dynamic Revenue Maximization with Heterogeneous Objects: A Mechanism Design Approach. Forthcoming in American Economic Journal: Microeconomics.
J. C. Gittins and D. M. Jones. A dynamic allocation index for the sequential design of experiments. In Progress in Statistics, pages 241–266. J. Gani et al., editors, 1974.
R. Cavallo and D. C. Parkes. Efficient metadeliberation auctions. In Proceedings of the 26th Annual Conference
F. Constantin and D. C. Parkes. Self-Correcting Sampling-Based Dynamic Multi-Unit Auctions. In the 10th ACM Electronic Commerce Conference (EC’09), 2009.
M. Babaioff, L. Blumrosen, and A. Roth. Auctions with Online Supply. In Fifth Workshop on Ad Auctions, 2009.
111