
A tutorial on: Dynamic Mechanism Design, by Ruggiero Cavallo

A tutorial on: Dynamic Mechanism Design. Ruggiero Cavallo, University of Pennsylvania, Department of Computer and Information Science. July 7, 2009, ACM EC. The setting: Sequence of decisions to be made impacting utility experienced by a


  1. Within-period ex post incentive compatibility: If all other agents report their types truthfully in the future, no agent can benefit from misreporting its type, regardless of what the joint type is and regardless of what came before. There is no incentive to deviate even if agents know everything one can know without being able to see the future.

  2. This is the gold standard
     • In a dynamic setting, agents need to make predictions about the future in determining how to maximize utility, and this requires positing some behavior for other agents.
     • Weaker than dominant strategy.
     • But if others’ future types were irrelevant to the agent’s utility, incentives couldn’t possibly be aligned.

  3. Bayes-Nash equilibrium: Given a distribution over other agents’ types, no agent can expect to gain from deviating if the others don’t. Within-period ex post also involves an expectation, but that expectation is over uncertain type transitions, not over current types.

  4. Mechanism desiderata
     • Efficiency: social-welfare maximizing decisions achieved in equilibrium.
     • Individual rationality: no agent expects to lose from participating.
       Within-period ex post: at every time-step, for every joint type.
       Ex ante: from the beginning of the mechanism, for whatever the joint type is then.
     • Budget-balance / no-deficit.

  5. By the way...
     • A dynamic analog of the revelation principle holds [Myerson, 1986].
     • So we can think only about direct revelation mechanisms, without loss of generality.

  6. Some solutions so far

  7. A basic efficient dynamic mechanism
     • Dynamic team mechanism [Athey & Segal, 07]
       Follows the efficient policy given agent reports.
       In each period, pays each agent the expected immediate value obtained by other agents given reported types (“Groves payment”).

  8. Dynamic team mechanism example
     [Figure: type-transition diagrams for Agent 1 and Agent 2, starting from state *, with transition probabilities and per-state values. The diagram is not recoverable from the extracted text.]

  9. Dynamic team mechanism example: optimal policy (γ close to 1)
     • * → blue
     • AJ → red or blue
     • BJ → red or blue
     • CK → red
     • CL → blue

  10. Dynamic team mechanism example: payments in the start state
     • $T_1(*) = 0$, $T_2(*) = 0$

  11. Dynamic team mechanism example: payments in state CL
     • $T_1(CL) = 100$, $T_2(CL) = 0$
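The Groves payment of the dynamic team mechanism can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `team_payments` and the agent labels are hypothetical, and the per-period values are an assumption chosen to be consistent with the CL payments in the example (agent 2's expected immediate value taken as 100, agent 1's as 0).

```python
from typing import Dict


def team_payments(immediate_values: Dict[str, float]) -> Dict[str, float]:
    """Pay each agent the sum of the OTHER agents' expected immediate values."""
    total = sum(immediate_values.values())
    return {agent: total - value for agent, value in immediate_values.items()}


# Assumed expected immediate values under the efficient action in one period.
values = {"agent1": 0.0, "agent2": 100.0}
payments = team_payments(values)

# Own value plus payment equals the period's total value for every agent.
utilities = {agent: values[agent] + payments[agent] for agent in values}

print(payments)
print(utilities)
```

Because every agent's value plus payment equals the total value of the period, each agent's objective coincides with social welfare. The center pays out in every period and collects nothing, so the same sketch also shows where the budget imbalance comes from.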

  12. Dynamic team mechanism
     Theorem: The dynamic team mechanism is truthful and efficient in within-period ex post Nash equilibrium. [Athey & Segal, 07]

  13. Dynamic-Groves mechanism class
     • Follows the efficient policy given agent reports; defines payments such that each agent’s expected sum of payments when he follows strategy σ equals the expected value other agents obtain when he follows σ, minus some quantity independent of σ.

  14. Dynamic-Groves mechanism class
     Theorem: Every dynamic-Groves mechanism is truthful and efficient in within-period ex post Nash equilibrium. [Cavallo, Parkes, & Singh, 07]
     Proof: Each agent obtains the social utility (which aligns incentives) minus some constant (which doesn’t distort them).
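The one-line proof can be spelled out as follows. The notation is an assumption introduced for illustration: $r_i^t$ is agent i's immediate value in period t, $r_{-i}^t$ is the total immediate value of the other agents, $T_i^t$ is the payment to agent i, sums are discounted by γ, and $C_i$ is the quantity independent of σ from the definition of the class. With the other agents truthful and agent i following strategy σ:

```latex
\begin{align*}
U_i(\sigma)
  &= \mathbb{E}\Big[\sum_{t \ge 0} \gamma^t \big(r_i^t + T_i^t\big) \,\Big|\, \sigma\Big] \\
  &= \mathbb{E}\Big[\sum_{t \ge 0} \gamma^t r_i^t \,\Big|\, \sigma\Big]
   + \mathbb{E}\Big[\sum_{t \ge 0} \gamma^t r_{-i}^t \,\Big|\, \sigma\Big] - C_i \\
  &= \mathbb{E}\Big[\sum_{t \ge 0} \gamma^t \big(r_i^t + r_{-i}^t\big) \,\Big|\, \sigma\Big] - C_i .
\end{align*}
```

The first term is the expected discounted social value under σ. The mechanism follows the policy that maximizes this quantity when reports are truthful, and $C_i$ does not depend on σ, so no deviation from truthful reporting can raise $U_i$.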

  15. Dynamic-Groves: all efficient mechanisms
     Theorem: For unrestricted types, the dynamic-Groves class exactly corresponds to the history-independent dynamic mechanisms that are truthful and efficient in within-period ex post Nash equilibrium. [Cavallo, 08]
     For within-period ex post efficient (and history-independent) dynamic mechanism design, dynamic-Groves is the only game in town.

  16. Dynamic-Groves: all efficient mechanisms (continued)
     • The characterization theorem of [Cavallo, 08] generalizes [Green & Laffont, 77] (the Groves class is unique for static settings).
     • Proof idea: If a mechanism is non-Groves, there is always some type for which incentives are sufficiently distorted from efficiency.

  17. Budget & participation
     • Given the characterization theorem, if we demand efficiency in the strongest sense, we know what the possibilities are.
     • Now pick mechanisms in the class with desirable budget/participation properties.
     • The basic “team mechanism” won’t fly: it has extreme budget imbalance, so we need to recover payments...

  18. Recovering payments: ex ante charge (EAC)
     Charge agents some quantity computed “ex ante” of anything they report.

  19. Recovering payments: ex ante charge (EAC) [Cavallo, Parkes, & Singh, 06]
     • At every time-step:
       Choose the efficient decision given reported types.
       Make Groves payments.
       Charge each agent a quantity based only on the reported types of other agents in the first time-step: (1-γ) times the total value other agents would obtain, in expectation from the beginning of the mechanism, if the policy optimal for them was chosen.
     • $T_i(\theta^t) = r_{-i}(\theta^t_{-i}, \pi^*(\theta^t)) - (1-\gamma)\, V_{-i}(\theta^0_{-i})$

  20. Dynamic-EAC mechanism example
     [Figure: type-transition diagrams for Agent 1 and Agent 2, starting from state *, with transition probabilities and per-state values. The diagram is not recoverable from the extracted text.]
     • $\gamma = 0.9$
     • $V_1(*_{-2}) = 50 + \gamma \cdot 10 = 59$
     • $V_2(*_{-1}) = \gamma \cdot 90 = 81$
     • $T_1(*) = -8.1$, $T_2(*) = -5.9$
     • $T_1(CL) = 100 - 8.1$, $T_2(CL) = -5.9$
     • ...
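The numbers in the dynamic-EAC example can be re-derived from the payment rule. The sketch below is only an arithmetic check under stated assumptions: the function name `eac_payment` is hypothetical, and the immediate values of the other agents (0 in the start state, 100 for agent 1's payment in CL) are read off the slide rather than computed from the diagram.

```python
GAMMA = 0.9


def eac_payment(others_immediate_value, others_ex_ante_value, gamma=GAMMA):
    """Groves payment minus the ex ante charge (1 - gamma) * V_{-i}(theta^0_{-i})."""
    return others_immediate_value - (1.0 - gamma) * others_ex_ante_value


# Ex ante values taken from the slide.
v_without_agent2 = 50 + GAMMA * 10  # value agent 1 obtains without agent 2: 59
v_without_agent1 = GAMMA * 90       # value agent 2 obtains without agent 1: 81

# Start state *: nobody obtains immediate value (assumption read off the slide).
t1_start = eac_payment(0, v_without_agent1)
t2_start = eac_payment(0, v_without_agent2)

# State CL: agent 2's immediate value is 100, agent 1's is 0 (same assumption).
t1_cl = eac_payment(100, v_without_agent1)
t2_cl = eac_payment(0, v_without_agent2)

print(round(t1_start, 1), round(t2_start, 1), round(t1_cl, 1), round(t2_cl, 1))
```

Rounded to one decimal, the four values agree with the slide: -8.1, -5.9, 91.9 (that is, 100 - 8.1) and -5.9. The charge is the same in every period because it depends only on the first-period reports of the other agents.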

  21. Recovering payments: ex ante charge (EAC)
     Theorem: The dynamic-EAC mechanism is truthful and efficient in within-period ex post Nash equilibrium, ex ante individual rational, and ex ante no-deficit. [Cavallo, Parkes, & Singh, 06]

  22. Weak IR and budget-balance properties
     • With the dynamic-EAC scheme agents will “sign up” at the beginning of the mechanism, but may wish to back out...
     • Same for the center.
     • Can we strengthen?

  23. Dynamic-VCG [Bergemann & Valimaki, 08]
     • At each time-step, pay each agent i the expected value other agents would obtain if i were ignored after one step, minus the value they’d obtain if i were always ignored.
     • Each agent has to pay the amount by which his current report inhibits other agents from obtaining value (now and in the future).

  24. Dynamic-VCG [Bergemann & Valimaki, 08], payment rule:
     $T_i(\theta^t) = r_{-i}(\theta^t_{-i}, \pi^*(\theta^t)) + \gamma\, \mathbb{E}[V_{-i}(\tau(\theta^t_{-i}, \pi^*(\theta^t)))] - V_{-i}(\theta^t_{-i})$

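  25. Dynamic-VCG mechanism example
     [Figure: type-transition diagrams for Agent 1 and Agent 2, starting from state *, with transition probabilities and per-state values. The diagram is not recoverable from the extracted text.]
     • $\gamma = 0.9$
     • $T_1(*) = \gamma \cdot 90 - \gamma \cdot 90$
     • $T_2(*) = \gamma \cdot 30 - (50 + \gamma \cdot 10)$
     • $T_1(CL) = 100 - 100$
     • $T_2(CL) = 0 - 30$
     • ...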
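The payments in this example can be evaluated directly from the dynamic-VCG rule. The sketch is an arithmetic check only, and the way each slide expression is split into the three terms of the rule (others' immediate value, others' expected next-period value, others' value with i always ignored) is an assumption, since the diagram itself is not available. The function name `vcg_payment` is hypothetical.

```python
GAMMA = 0.9


def vcg_payment(r_others, next_v_others, v_others_now, gamma=GAMMA):
    """T_i = r_{-i} + gamma * E[V_{-i}(next state)] - V_{-i}(current state)."""
    return r_others + gamma * next_v_others - v_others_now


# Each entry follows one expression on the slide; the term split is assumed.
payments = {
    "T1(*)": vcg_payment(0, 90, GAMMA * 90),       # gamma*90 - gamma*90
    "T2(*)": vcg_payment(0, 30, 50 + GAMMA * 10),  # gamma*30 - (50 + gamma*10)
    "T1(CL)": vcg_payment(100, 0, 100),            # 100 - 100
    "T2(CL)": vcg_payment(0, 0, 30),               # 0 - 30
}

for name, value in payments.items():
    print(name, round(value, 2))
```

Under these assumptions the payments come out to 0, -32, 0 and -30. None of them is positive, which is the no-deficit property claimed for dynamic-VCG.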

  26. Dynamic-VCG [Bergemann & Valimaki, 08]
     $T_i(\theta^t) = r_{-i}(\theta^t_{-i}, \pi^*(\theta^t)) + \gamma\, \mathbb{E}[V_{-i}(\tau(\theta^t_{-i}, \pi^*(\theta^t)))] - V_{-i}(\theta^t_{-i})$
     • No payment to any agent in any period is positive:
       $r_{-i}(\theta^t_{-i}, \pi^*(\theta^t)) + \gamma\, \mathbb{E}[V_{-i}(\tau(\theta^t_{-i}, \pi^*(\theta^t)))] \le V_{-i}(\theta^t_{-i})$
     • Expected future payoff to every agent i, from any joint state, at any time t, is:
       $V(\theta^t) - V_{-i}(\theta^t_{-i}) \ge 0$ (NB: assumes no negative values)
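The expected-payoff expression follows from a telescoping sum. The derivation below is a sketch under assumptions not stated on the slide: all agents report truthfully, values are bounded, γ < 1, and $r_i$ denotes agent i's immediate value. Under truthful play the next-period type of the other agents is distributed according to τ, so the expectation term in the payment rule equals the expected value of $V_{-i}$ at the realized next state.

```latex
\begin{align*}
\mathbb{E}\Big[\sum_{k \ge t} \gamma^{k-t}\big(r_i(\theta^k_i, \pi^*(\theta^k)) + T_i(\theta^k)\big)\Big]
  &= \mathbb{E}\Big[\sum_{k \ge t} \gamma^{k-t}\big(r_i + r_{-i}\big)(\theta^k, \pi^*(\theta^k))\Big] \\
  &\quad + \mathbb{E}\Big[\sum_{k \ge t} \gamma^{k-t}\big(\gamma V_{-i}(\theta^{k+1}_{-i}) - V_{-i}(\theta^k_{-i})\big)\Big] \\
  &= V(\theta^t) - V_{-i}(\theta^t_{-i}).
\end{align*}
```

In the second sum each term $\gamma^{k-t+1} V_{-i}(\theta^{k+1}_{-i})$ cancels against the negative term of the next period, leaving only $-V_{-i}(\theta^t_{-i})$. The difference is non-negative when values are non-negative, because the policy that is optimal for the other agents alone is also available when agent i is present.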

  27. Dynamic-VCG [Bergemann & Valimaki, 08]
     Theorem: The dynamic-VCG mechanism is truthful and efficient in within-period ex post Nash equilibrium, within-period ex post individual rational, and ex post no-deficit.

  28. Dynamic-VCG: good social-welfare?
     [Figure: % value retained by agents (0 to 100) against number of agents (3 to 10) under dynamic-VCG.] In a single-item allocation setting, with values normally distributed.

  29. Dynamic-VCG: good social-welfare?
     Theorem: Among all history-independent mechanisms that are efficient in within-period ex post Nash equilibrium and within-period ex post individual rational, dynamic-VCG yields the most expected revenue, for every joint type. [Cavallo, 08]

  30. Dynamic-VCG: good social-welfare?
     • Since dynamic-VCG can be so bad for the agents, what do we do?
     • Think back to the static setting... better budget balance was achieved by redistribution mechanisms; strong budget-balance by moving to Bayes-Nash equilibrium.

  31. A dynamic redistribution mechanism?
     • Redistribution is much more complicated in the dynamic setting.
     • A redistribution payment computed in later time periods can now potentially be influenced by an agent’s reports in earlier periods... in subtle ways.
     • Focus on worlds representable as multi-armed bandits.

  32. Dynamic-VCG for MABs reduces to:
     • Determine the optimal agent i to activate.
     • i pays (1-γ) times the expected value other agents would get if i were always ignored.
     • Other agents pay nothing.
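The reduction can be checked against the general dynamic-VCG payment rule. The derivation below is a sketch under the standard multi-armed bandit assumptions, which are not spelled out on the slide: only the activated agent obtains value and changes state in a period, and the efficient policy activates the agent with the highest Gittins index. For the activated agent i, the others obtain nothing and their states are unchanged:

```latex
\begin{align*}
T_i(\theta^t)
  &= \underbrace{r_{-i}(\theta^t_{-i}, \pi^*(\theta^t))}_{=\,0}
   + \gamma\, \underbrace{\mathbb{E}[V_{-i}(\tau(\theta^t_{-i}, \pi^*(\theta^t)))]}_{=\,V_{-i}(\theta^t_{-i})}
   - V_{-i}(\theta^t_{-i})
   = -(1-\gamma)\, V_{-i}(\theta^t_{-i}), \\
T_j(\theta^t)
  &= r_{-j}(\theta^t_{-j}, \pi^*(\theta^t))
   + \gamma\, \mathbb{E}[V_{-j}(\tau(\theta^t_{-j}, \pi^*(\theta^t)))]
   - V_{-j}(\theta^t_{-j})
   = 0 \qquad (j \ne i).
\end{align*}
```

The second line holds because agent i, having the highest index among all agents, also has the highest index among the agents other than j. Activating i is therefore optimal for that group too, and the first two terms are exactly the Bellman expansion of $V_{-j}(\theta^t_{-j})$.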

  33-38. Dynamic-VCG for MABs: example, built up over six slides
     [Figure: multi-armed bandit example with transition probabilities (1/2, 1/2, 3/4, 1/4) and rewards (0, 20, 30). The diagram is not recoverable from the extracted text.]
     • Winner pays (1-γ) times the expected value other agents would get if he were always ignored.
     • Other agents pay nothing.
     • Payments shown as the example progresses: $T_1 = -(1-\gamma)(10 + \gamma \cdot 10)$, then $T_2 = -(1-\gamma) \cdot 7.5$, then again $T_2 = -(1-\gamma) \cdot 7.5$.

  39. Dynamic-RM for MABs [Cavallo, 08]
     • Modify dynamic-VCG by adding the following payments to the agents each period:
       For the agent i receiving the item: (1-γ)/n times the expected total discounted revenue that would result if i were ignored going forward.
       For every other agent j: 1/n times the expected immediate revenue that would have resulted this period if j were ignored.
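The per-period redistribution rule can be written down as a small function. This is a sketch of the rule as stated on the slide, not the author's implementation: the function name and arguments are hypothetical, and the two revenue quantities are taken as given inputs (computing them requires solving the bandit problem with an agent removed, which is not shown). The numbers are made up for illustration.

```python
from typing import Dict, List


def rm_redistribution(
    agents: List[str],
    winner: str,
    gamma: float,
    total_revenue_without: Dict[str, float],
    immediate_revenue_without: Dict[str, float],
) -> Dict[str, float]:
    """Extra payment to each agent this period, on top of dynamic-VCG.

    total_revenue_without[i]: expected total discounted revenue if i were
        ignored going forward (needed only for the winner).
    immediate_revenue_without[j]: expected immediate revenue this period if j
        were ignored (needed only for the non-winners).
    """
    n = len(agents)
    bonus = {}
    for agent in agents:
        if agent == winner:
            bonus[agent] = (1.0 - gamma) / n * total_revenue_without[agent]
        else:
            bonus[agent] = immediate_revenue_without[agent] / n
    return bonus


# Made-up inputs for three agents; "a" receives the item this period.
bonus = rm_redistribution(
    agents=["a", "b", "c"],
    winner="a",
    gamma=0.5,
    total_revenue_without={"a": 12.0},
    immediate_revenue_without={"b": 3.0, "c": 6.0},
)
print(bonus)
```

Each agent's bonus depends only on revenue that would arise if that agent were ignored, which is the property the redistribution rule relies on.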

  40. Dynamic-RM for MABs [Cavallo, 08]
     Lemma: Whatever strategy an agent follows, his expected redistribution payments over time equal a 1/n share of the expected total (over time) revenue that would result if the agent were not present.
     (This is the hard part to prove. Once we have it, it follows that dynamic-RM is a dynamic-Groves mechanism, and thus efficient.)

  41. Dynamic-RM for MABs [Cavallo, 08]
     Theorem: Dynamic-RM is efficient in within-period ex post Nash equilibrium, within-period ex post IR, and never runs a deficit.
     It also yields significantly more value for the agents than dynamic-VCG. Examples with three or more agents are tough to illustrate, so let’s just look at aggregate results:

  42. Value retained: normal distribution
     [Figure: % value retained by agents (0 to 100) against number of agents (3 to 10), comparing dynamic-RM and dynamic-VCG.]

  43. Value retained: uniform distribution
     [Figure: % value retained by agents (0 to 100) against number of agents (3 to 10), comparing dynamic-RM and dynamic-VCG.]

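  44. Comparison of the mechanisms (w.p. = within-period):

     | Mechanism                  | Efficiency   | IR           | Budget balance                                |
     |----------------------------|--------------|--------------|-----------------------------------------------|
     | team mechanism             | w.p. ex post | w.p. ex post | huge deficit                                  |
     | dynamic-EAC                | w.p. ex post | ex ante      | ex ante no-deficit                            |
     | dynamic-VCG                | w.p. ex post | w.p. ex post | ex post no-deficit                            |
     | dynamic-RM (only for MABs) | w.p. ex post | w.p. ex post | ex post no-deficit, much closer to perfect BB |
     | balanced mechanism         | Bayes-Nash   | ex ante      | perfect                                       |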

  45. (Balanced team mechanism presented by Susan Athey)

  46. Extensions

  47. Dynamically changing populations of agents
     • What’s new: agents may, either temporarily or permanently, become “inaccessible”, i.e., unable to communicate with the center or make/receive payments.
     • Generalizes arrival/departure dynamics.

  48. For instance:
     • Imagine selling theater tickets to tourists who plan to see multiple shows over a period of days.
       New tourists are always arriving and others leaving (dynamic population).
       A tourist may see a show and realize she likes the theater more or less than she thought (dynamic types).

  49. Related area: online mechanism design
     • Dynamic population (arrivals and departures), but static types: all private information an agent will ever obtain can be reported in the arrival period.
     [Friedman & Parkes, 03] [Parkes & Singh, 03] [Lavi & Nisan, 04] [Porter, 04]

  50. Online-VCG mechanism [Parkes & Singh, 03]
     • Collects a single payment from each agent in her “arrival period”.
       Within-period ex post efficient.
       Ex post individual rational.
       Ex post no-deficit.

  51. Online-VCG mechanism [Parkes & Singh, 03]: the efficiency, individual rationality, and no-deficit properties hold, but only for static types.

  52. Dynamic populations, dynamic types [Cavallo, Parkes, & Singh, 07]
     • Unifies dynamic mechanism design and online mechanism design.
     • The new challenges:
       The optimal policy must consider accessibility/inaccessibility dynamics.
       Agents may not be available for payment while still exerting influence on the welfare of other agents.

  53-56. Example with inaccessible agents, built up over four slides
     [Figure: type-transition diagrams over t = 1, 2, 3. (a) Agent 1’s Type. (b) Agent 2’s Type. The diagram is not recoverable from the extracted text.]
     • Imagine agent 1 is accessible at t = 1, and agent 2 is inaccessible at t = 1 but very likely to become accessible at t = 2.
     • In a “naive” dynamic-VCG mechanism, agent 1 is better off “hiding” to improve social-welfare.
     • In a non-naive mechanism that makes dynamic-VCG payments only to accessible agents, agent 2 can benefit by hiding.

  57. A fix
     • For any inaccessible agent, keep a log of the payments dynamic-VCG would impose on the agent; when the agent becomes accessible, execute a “lump sum” payment, appropriately scaled for discounting.
     • Requires that all agents eventually “come back”.
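One way to read “appropriately scaled for discounting” is that the lump sum should have the same present value as the logged payments would have had if made on time. The sketch below implements that reading only. It is an assumption, not necessarily the rule used in the cited work, and the function name and numbers are hypothetical.

```python
from typing import List, Tuple


def lump_sum(logged: List[Tuple[int, float]], return_period: int, gamma: float) -> float:
    """Single payment at `return_period` equivalent to the logged payments.

    logged: (period, payment) pairs that could not be executed while the agent
        was inaccessible; every period must be <= return_period.
    A payment p due at period k is scaled up by gamma ** -(return_period - k),
    so that its value discounted back to period 0 is unchanged.
    """
    return sum(p / gamma ** (return_period - k) for k, p in logged)


# Hypothetical log: the agent owed 4 at t = 1 and 1 at t = 2, and returns at t = 3.
gamma = 0.5
log = [(1, -4.0), (2, -1.0)]
owed = lump_sum(log, return_period=3, gamma=gamma)

# Present values at period 0 agree, so the deferral does not change incentives.
pv_on_time = sum(gamma ** k * p for k, p in log)
pv_lump = gamma ** 3 * owed
print(owed, pv_on_time, pv_lump)
```

The sum can only be computed once the agent is accessible again, which is why the scheme requires every agent to come back eventually.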

  58-60. Example with the fix, built up over three slides
     [Figure: type-transition diagrams over t = 1, 2, 3. (a) Agent 1’s Type. (b) Agent 2’s Type. The diagram is not recoverable from the extracted text.]
     • Imagine both agents are accessible in all periods. Should agent 2 feign inaccessibility until t = 2?
     • $T_2 = -6 - 2 = -8$, the same whether he hides at t = 1 or not.
     • Each of the two terms is a difference in optimal value for agent 1, with and without agent 2 present: one for t = 1, the other for t = 2.

  61. What if agents don’t always come back?
     • In general, the scheme won’t work.
     • For an arrival/departure model, within-period ex post efficiency is recovered if agent arrivals are independent conditioned on the actions chosen.
