ECE700.07: Game Theory with Engineering Applications
Lecture 6: Repeated Games
Seyed Majid Zahedi
Outline
- Finitely and infinitely repeated games w/ and w/o perfect monitoring
- Trigger strategies
- Folk theorems
- Readings:
- MAS Sec. 6.1, GT Sec. 5.1 and 5.5
Finitely Repeated Games (with Perfect Monitoring)
- In repeated games, stage game G is played by same agents for R rounds
- Agents discount utilities by discount factor 0 ≤ δ ≤ 1
- Game is denoted by G^R(δ)
- At each round, outcomes of all past rounds are observed by all agents
- Agents' overall utility is sum of discounted utilities at each round
- Given sequence of utilities u_i^(1), …, u_i^(R), agent i's utility is

  u_i = Σ_{r=1}^{R} δ^(r-1) u_i^(r)

- In general, strategies at each round could depend on history of play
- Memory-less (also called stationary) strategies are special cases

Example: Finitely-Repeated Prisoners' Dilemma
- Suppose that Prisoners' Dilemma is played in R (< ∞) rounds
- What is SPE of this game?
- We can use backward induction
- Starting from last round, (C, C) is dominant strategy
- Regardless of history, (C, C) is dominant strategy at each round
- There exists unique SPE which is (C, C) at each round
               Prisoner 2
Prisoner 1     Stay Silent   Confess
Stay Silent    (-1, -1)      (-3, 0)
Confess        (0, -3)       (-2, -2)
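The backward-induction argument above can be checked mechanically: Confess strictly dominates Stay Silent in the stage game, so the last round, and then every round, must be (C, C). A minimal sketch (payoffs from the table; S/C are shorthand labels):

```python
# Payoffs from the table: u[(a1, a2)] = (u1, u2).
u = {
    ("S", "S"): (-1, -1), ("S", "C"): (-3, 0),
    ("C", "S"): (0, -3),  ("C", "C"): (-2, -2),
}

def strictly_dominates(a, b, player):
    """True if action a strictly dominates b for `player` (0 or 1)."""
    others = ["S", "C"]
    if player == 0:
        return all(u[(a, o)][0] > u[(b, o)][0] for o in others)
    return all(u[(o, a)][1] > u[(o, b)][1] for o in others)

# C dominates S for both prisoners, so backward induction gives (C, C)
# at the last round and hence at every round, regardless of history.
print(strictly_dominates("C", "S", 0), strictly_dominates("C", "S", 1))
```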
SPE in Finitely Repeated Games
[Theorem]
- If stage game G has unique pure strategy equilibrium s*, then G^R(δ) has unique SPE in which s^(r) = s* for all r = 1, …, R, regardless of history

[Proof]

- By backward induction, at round R, we have s^(R) = s*
- Given this, we have s^(R-1) = s*, and continuing inductively, s^(r) = s* for all r = 1, …, R, regardless of history
Infinitely Repeated Games
- Infinitely repeated play of G with discount factor δ is denoted by G^∞(δ)
- Agents' overall utility is average of discounted utilities at each round
- For δ < 1, given sequence of utilities u_i^(1), u_i^(2), …, agent i's utility is

  u_i = (1 - δ) Σ_{r=1}^{∞} δ^(r-1) u_i^(r)

- For δ = 1, given sequence of utilities u_i^(1), u_i^(2), …, agent i's utility is

  u_i = lim_{R→∞} (1/R) Σ_{r=1}^{R} u_i^(r)
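As a quick sanity check on the δ < 1 formula, the (1 − δ) normalization makes a constant stream worth its per-round value. A small sketch, truncating the infinite sum:

```python
# Normalized discounted utility (1 - δ) Σ_{r>=1} δ^(r-1) u^(r),
# truncated to a finite horizon for computation.
def discounted_avg(utils, delta):
    return (1 - delta) * sum(delta**r * u for r, u in enumerate(utils))

# A constant stream of 1s is worth 1 under this normalization.
print(discounted_avg([1.0] * 10_000, 0.9))  # ≈ 1.0
```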
Trigger Strategies (TS)
- Agents get punished if they deviate from agreed profile
- In non-forgiving TS (or grim TS), punishment continues forever
- Here, s* is agreed profile, and s_i^j is punishment strategy of agent i against agent j
- Single deviation by agent j triggers agent i to switch to s_i^j forever

  s_i^(t) = s_i*   if s^(r) = s* for all r < t
  s_i^(t) = s_i^j  if s_j^(r) ≠ s_j* for some r < t

Trigger Strategies in Repeated Prisoners' Dilemma
- Suppose both agents use following trigger strategy
- Play S unless someone has ever played C in past
- Play C forever if someone has played C in past
- Under what conditions is this SPE?
- We use one-stage deviation principle
- Step 1: (S is best response to S)
- Utility from S: (1-δ)(-1)(1 + δ + δ² + ⋯) = -(1-δ)/(1-δ) = -1
- Utility from C: (1-δ)(0 - 2δ - 2δ² - ⋯) = -2δ(1-δ)/(1-δ) = -2δ
- S is better than C if -1 ≥ -2δ, i.e., δ ≥ 1/2
- Step 2: (C is best response to C)
- Other agents will always play C, thus C is best response
               Prisoner 2
Prisoner 1     Stay Silent   Confess
Stay Silent    (-1, -1)      (-3, 0)
Confess        (0, -3)       (-2, -2)
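The two utility computations in Step 1 can be verified numerically; a small sketch using the closed forms derived above:

```python
# Closed forms under the normalized (1 - δ)Σ convention.
def utility_conform(delta):    # (1-δ)(-1)(1 + δ + δ² + ...) = -1
    return -1.0

def utility_deviate(delta):    # (1-δ)(0 - 2δ - 2δ² - ...) = -2δ
    return -2.0 * delta

for delta in (0.3, 0.5, 0.8):
    print(delta, utility_conform(delta) >= utility_deviate(delta))
# Conforming is (weakly) better exactly when δ >= 1/2.
```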
Remarks
- Cooperation is equilibrium, but so are many other strategy profiles
- If s* is NE of G, then "each agent i plays s_i* at every round" is SPE of G^R(δ)
- Future play of other agents is independent of how each agent plays
- Optimal play is to maximize current utility, i.e., play static best response
- Sets of equilibria for finite and infinite horizon versions can be different
- Multiplicity of equilibria in repeated prisoners' dilemma only occurs at R = ∞
- For any finite R (and thus also in the limit R → ∞), repeated prisoners' dilemma has unique SPE
TS in Finitely Repeated Games
- If G has multiple equilibria, then G^R(δ) does not have unique SPE
- Consider following example
- Stage game has two pure NE: (y, y) and (z, z)
- Socially optimal outcome, (x, x), is not equilibrium
- In twice repeated play, we can support (x, x) in first round
            Agent 2
Agent 1     x         y         z
x           (3, 3)    (0, 4)    (-2, 0)
y           (4, 0)    (1, 1)    (-2, 0)
z           (0, -2)   (0, -2)   (-1, -1)
TS in Finitely Repeated Games (cont.)
- Trigger strategy
- Play x in first round
- Play y in second round if opponent played x; otherwise, play z
- We can use one-shot deviation principle
- For simplicity, suppose δ = 1
- Playing x first and y next leads to utility of 4
- Playing y first triggers opponent to play z next, which leads to utility 3
- Deviation is not profitable!
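The one-shot deviation check can be read straight off the payoff table; a sketch for agent 1 with δ = 1 (total utility over the two rounds):

```python
# Payoff table: u[(a1, a2)] = (u1, u2).
u = {("x", "x"): (3, 3), ("x", "y"): (0, 4), ("x", "z"): (-2, 0),
     ("y", "x"): (4, 0), ("y", "y"): (1, 1), ("y", "z"): (-2, 0),
     ("z", "x"): (0, -2), ("z", "y"): (0, -2), ("z", "z"): (-1, -1)}

# Conform: play x, then both play y in round two.
conform = u[("x", "x")][0] + u[("y", "y")][0]   # 3 + 1 = 4
# Deviate to y in round one: 4 now, but the opponent switches to z,
# against which agent 1's best reply is z as well.
deviate = u[("y", "x")][0] + u[("z", "z")][0]   # 4 + (-1) = 3
print(conform, deviate)                         # 4 3
```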
Repetition Can Lead to Bad Outcomes
- Consider this game
- Strategy x strictly dominates y and z for both agents
- Unique Nash equilibrium of stage game is (x, x)
- If δ ≥ 1/2, this game has SPE in which (y, y) is played in every round
- It is supported by slightly more complicated strategy than grim trigger
- I. Play y in every round unless someone deviates, then go to II
- II. Play z. If no one deviates go to I. If someone deviates stay in II
            Agent 2
Agent 1     x         y         z
x           (2, 2)    (2, 1)    (0, 0)
y           (1, 2)    (1, 1)    (-1, 0)
z           (0, 0)    (0, -1)   (-1, -1)
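The δ ≥ 1/2 claim can be checked with the one-shot deviation principle by computing the values of phases I and II; a minimal sketch for agent 1 (payoffs from the table, normalized-utility convention):

```python
# Phase I: both play y (stage payoff 1). Phase II: both play z (stage
# payoff -1) for one round, then return to I. Deviations to x are the
# most profitable ones in both phases.
def phase_values(delta):
    V1 = 1.0                                  # (y, y) forever from phase I
    V2 = (1 - delta) * (-1) + delta * V1      # one round of (z, z), back to I
    return V1, V2

def is_spe(delta):
    V1, V2 = phase_values(delta)
    dev_I = (1 - delta) * 2 + delta * V2      # play x in I: get 2, trigger II
    dev_II = (1 - delta) * 0 + delta * V2     # play x in II: get 0, stay in II
    return V1 >= dev_I and V2 >= dev_II

print([is_spe(d) for d in (0.4, 0.5, 0.9)])   # [False, True, True]
```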
Feasible and Individually Rational Utilities
- V = Convex hull of {w ∈ ℝ^I : there exists s ∈ S such that u(s) = w}
- Utility in repeated game is just a weighted average of utilities in stage game
- Note that V ⊇ {w ∈ ℝ^I : there exists σ ∈ Σ such that u(σ) = w}
- Recall minmax value of agent i:

  v_i = min_{σ_{-i}} max_{s_i} u_i(s_i, σ_{-i})

- Also recall minmax strategy against i:

  m^i_{-i} = argmin_{σ_{-i}} max_{s_i} u_i(s_i, σ_{-i})

- Utility vector w ∈ ℝ^I is strictly individually rational if w_i > v_i, ∀i

Example
- What is minmax value of agent 1?
- Let q denote probability that agent 2 chooses L
- Agent 1 gets 1 - 3q from U, 3q - 2 from M, and 0 from D, so

  v_1 = min_{0≤q≤1} max{1 - 3q, -2 + 3q, 0} = 0,  with m^1_{-1} = (q, 1-q) for any q ∈ [1/3, 2/3]

- What is minmax value of agent 2?
- Let p and q denote probabilities that agent 1 chooses U and M, respectively
- Agent 2 gets p - 3q + 1 from L and -3p + q + 1 from R, so

  v_2 = min_{0≤p,q≤1} max{p - 3q + 1, -3p + q + 1} = 0,  with m^2_{-2} = (1/2, 1/2, 0)

            Agent 2
Agent 1     L          R
U           (-2, 2)    (1, -2)
M           (1, -2)    (-2, 2)
D           (0, 1)     (0, 1)
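These minmax values can be approximated by brute force; a sketch using grid search over the opponents' mixed strategies (payoffs from the table above):

```python
# Agent 1's payoffs (rows U, M, D) against columns (L, R), and agent 2's.
U1 = [(-2, 1), (1, -2), (0, 0)]
U2 = [(2, -2), (-2, 2), (1, 1)]

steps = [i / 200 for i in range(201)]

# v1: agent 2 mixes q on L to minimize agent 1's best-response payoff.
v1 = min(max(a * q + b * (1 - q) for a, b in U1) for q in steps)

# v2: agent 1 mixes p on U and q on M to minimize agent 2's best response.
v2 = min(
    max(sum(w * row[k] for w, row in zip((p, q, 1 - p - q), U2)) for k in (0, 1))
    for p in steps for q in steps if p + q <= 1
)

print(v1, v2)   # 0.0 0.0
```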
Minmax Utility Lower Bounds
[Theorem]
- If σ* is NE of G, then u_i(σ*) ≥ v_i
- If σ* is NE of G^∞(δ), then u_i(σ*) ≥ v_i

[Proof]

- Agent i can always guarantee herself v_i in stage game and also in each round of repeated game, meaning that she can always achieve at least this utility against even most adversarial opponents
Nash Folk Theorem
[Nash Folk Theorem]
- If w is feasible and strictly individually rational, then there exists δ̲ < 1 such that for all δ > δ̲, G^∞(δ) has NE with utilities w

[Proof]

- Suppose for simplicity that there exists pure strategy profile s* such that u_i(s*) = w_i (otherwise, proof is more involved)
- Consider following grim trigger strategy for agent i
- Play s_i* as long as no one deviates
- If some agent j deviates, then play m_i^j thereafter
- If i plays s*, her utility is w_i
Proof of Nash Folk Theorem
- We can use one-shot deviation principle
- Suppose i deviates from s* in round r
- Define v̄_i = max_{s_i} u_i(s_i, s*_{-i})
- We have

  u_i ≤ (1-δ)(w_i + δ w_i + ⋯ + δ^(r-1) w_i + δ^r v̄_i + δ^(r+1) v_i + δ^(r+2) v_i + ⋯)

- Following s* will be optimal if

  w_i ≥ (1-δ^r) w_i + δ^r (1-δ) v̄_i + δ^(r+1) v_i = w_i - δ^r (w_i - v̄_i + δ(v̄_i - v_i))

- This means, s* is NE of G^∞(δ) if

  δ ≥ δ̲ = max_i (v̄_i - w_i) / (v̄_i - v_i)
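The threshold δ̲ is easy to compute once w, v, and v̄ are known; a sketch, using the repeated prisoners' dilemma from earlier as the example (w = u(S, S) = (-1, -1), minmax v = (-2, -2), best one-shot deviation payoff v̄ = (0, 0)):

```python
# δ̲ = max_i (v̄_i - w_i) / (v̄_i - v_i); requires w_i > v_i for all i.
def delta_bar(w, v, vbar):
    return max((vb - wi) / (vb - vi) for wi, vi, vb in zip(w, v, vbar))

# Repeated prisoners' dilemma with cooperation as the target profile:
print(delta_bar([-1, -1], [-2, -2], [0, 0]))   # 0.5, the earlier threshold
```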
Problems with Nash Folk Theorem
- Any utility can be achieved when agents are patient enough
- NE involves non-forgiving TS which may be costly for punishers
- NE may include non-credible threats
- NE may not be subgame perfect
- Example:
- Unique NE in this game is (D, L)
- Minmax values are given by v_1 = 0 and v_2 = 1
- Minmax strategy against agent 1 requires agent 2 to play R
- R is strictly dominated by L for agent 2
            Agent 2
Agent 1     L         R
U           (6, 6)    (0, -100)
D           (7, 1)    (0, -100)
Subgame Perfect Folk Theorem
[Theorem]
- Let s* be NE of stage game G with utilities w*
- For any feasible utility w with w_i > w_i*, ∀i ∈ I, there exists δ̲ < 1 such that for all δ > δ̲, G^∞(δ) has SPE with utilities w

[Proof]

- Simply construct non-forgiving TS with punishment by static NE
- Punishments are therefore subgame perfect
- For δ sufficiently close to 1, it is better for each agent i to obtain w_i rather than deviate and get w_i* forever thereafter
- This shows that any utility higher than NE utilities can be sustained as SPE
Repeated Games with Imperfect Monitoring
- At each round, all agents observe some public outcome, which is correlated with stage game actions
- Let y ∈ Y denote publicly observed outcome
- Each strategy profile s induces probability distribution over y
- Let π(y, s) denote probability distribution of y under action profile s
- Public information at round t is h^(t) = (y^(1), …, y^(t-1))
- Strategy of agent i is sequence of maps s_i^(t): h^(t) → S_i
Repeated Games with Imperfect Monitoring (cont.)
- Each agent's utility depends only on her own action and public outcome
- Dependence on actions of others is through their effect on distribution of y
- Agent i's realized utility at round r is u_i(s_i^(r), y^(r))
- Given strategy profile s^(r), agent i's expected utility is

  u_i(s^(r)) = Σ_{y∈Y} π(y, s^(r)) u_i(s_i^(r), y)

- Agent i's average discounted utility for sequence {s^(r)} is

  u_i = (1-δ) Σ_{r=1}^{∞} δ^(r-1) u_i(s^(r))
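The expected-utility formula can be illustrated with a toy public outcome; the signal distribution and payoffs below are made up for illustration, not from the slides:

```python
# Hypothetical distribution π(y | s) over a binary public outcome
# y ∈ {"g", "b"}, depending on the (hidden) action profile.
pi = {("C", "C"): {"g": 0.9, "b": 0.1},
      ("C", "D"): {"g": 0.5, "b": 0.5},
      ("D", "C"): {"g": 0.5, "b": 0.5},
      ("D", "D"): {"g": 0.2, "b": 0.8}}

# Agent 1's utility depends only on her own action and the outcome.
u1 = {("C", "g"): 2, ("C", "b"): -1, ("D", "g"): 3, ("D", "b"): 0}

def expected_u1(s):
    return sum(pi[s][y] * u1[(s[0], y)] for y in ("g", "b"))

print(round(expected_u1(("C", "C")), 6))   # 0.9*2 + 0.1*(-1) = 1.7
```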
Example: Cournot Competition with Noisy Demand
[Green and Porter, Noncooperative Collusion under Imperfect Price Information, 1984]
- Firms set production levels q_1^(t), …, q_n^(t) privately at round t
- Firms do not observe each other's output levels
- Market demand is stochastic
- Market price depends on total production and market demand
- Low price could be due to high production or low demand
- Firms' utilities depend on their own production and market price
Simpler Example: Noisy Prisoner's Dilemma
- Prisoners don't observe each other's actions; they only observe signal y
- u_1(S, y) = 1 + y and u_1(C, y) = 4 + y
- u_2(S, y) = 1 + y and u_2(C, y) = 4 + y
- Signal y is defined by continuous random variable X with E[X] = 0
- If s = (S, S), then y = X
- If s = (S, C) or (C, S), then y = X - 2
- If s = (C, C), then y = X - 4
- Normal form stage game is
               Prisoner 2
Prisoner 1     Stay Silent    Confess
Stay Silent    (1+X, 1+X)     (-1+X, 2+X)
Confess        (2+X, -1+X)    (X, X)
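Taking expectations with E[X] = 0 recovers the table entries from the signal structure; a small sketch:

```python
# Expected signal for each profile: E[y] = E[X] + shift = shift.
shift = {("S", "S"): 0, ("S", "C"): -2, ("C", "S"): -2, ("C", "C"): -4}

def stage_u(action, y):            # u_i(S, y) = 1 + y, u_i(C, y) = 4 + y
    return (1 if action == "S" else 4) + y

expected = {s: (stage_u(s[0], d), stage_u(s[1], d)) for s, d in shift.items()}
print(expected[("S", "C")])        # (-1, 2): matches (-1+X, 2+X) in expectation
```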
Trigger-Price Strategy
- Consider following trigger strategy for noisy Prisonerβs Dilemma
- (I) Play (S, S) until y ≤ y*, then go to (II)
- (II) Play (C, C) for R rounds, then go back to (I)
- Note that punishment uses NE of stage game
- We can choose π§β and π such that this strategy profile is SPE
- We use one-shot deviation principle
- Deviation in (II) is obviously not beneficial
- In (I), if agents do not deviate, their expected utility is

  v = (1-δ)(1 + 0) + F(y*) δ^(R+1) v + (1 - F(y*)) δ v

  ⟹ v = (1-δ) / (1 - δ(1 - F(y*)(1 - δ^R)))
Trigger-Price Strategy (cont.)
- If some agent deviates in (I), then her expected utility is

  v^d = (1-δ)(2 + 0) + F(y* + 2) δ^(R+1) v + (1 - F(y* + 2)) δ v

- Deviation provides immediate utility, but increases probability of entering (II)
- To have SPE, we must have v ≥ v^d, which means

  v ≥ 2(1-δ) / (1 - δ(1 - F(y* + 2)(1 - δ^R)))

  ⟹ F(y* + 2) - 2F(y*) ≥ (1-δ) / (δ(1 - δ^R))

- Any R and y* that satisfy this constraint construct SPE
- Best trigger-price strategy could be found by maximizing v subject to this constraint
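A numeric sketch of the constraint, assuming (the slides leave F general) X ~ Normal(0, 1) so that F is the standard normal CDF; the parameter values δ, R, y* below are arbitrary illustrations:

```python
from math import erf, sqrt

def F(x):                        # standard normal CDF (an assumed choice of F)
    return 0.5 * (1 + erf(x / sqrt(2)))

def v(ystar, R, delta):          # collusive value from the formula above
    return (1 - delta) / (1 - delta * (1 - F(ystar) * (1 - delta**R)))

def is_spe(ystar, R, delta):     # F(y*+2) - 2F(y*) >= (1-δ)/(δ(1-δ^R))
    return F(ystar + 2) - 2 * F(ystar) >= (1 - delta) / (delta * (1 - delta**R))

# Patient firms (δ = 0.9), short punishments (R = 3), threshold y* = -2:
print(round(v(-2.0, 3, 0.9), 3), is_spe(-2.0, 3, 0.9))   # ≈ 0.947, True
```

With δ = 0.5 instead, the same (y*, R) pair fails the constraint, matching the intuition that collusion needs patience.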
Questions?
Acknowledgement
- This lecture is a slightly modified version of one prepared by
- Asu Ozdaglar [MIT 6.254]