SLIDE 1

ECE700.07: Game Theory with Engineering Applications

Seyed Majid Zahedi

Lecture 6: Repeated Games

SLIDE 2

Outline

  • Finitely and infinitely repeated games w/ and w/o perfect monitoring
  • Trigger strategies
  • Folk theorems
  • Readings:
  • MAS Sec. 6.1, GT Sec. 5.1 and 5.5
SLIDE 3

Finitely Repeated Games (with Perfect Monitoring)

  • In repeated games, stage game 𝐻 is played by same agents for R rounds
  • Agents discount utilities by discount factor 0 ≀ πœ€ ≀ 1
  • Game is denoted by 𝐻& πœ€
  • At each round, outcomes of all past rounds are observed by all agents
  • Agents’ overall utility is sum of discounted utilities at each round
  • Given sequence of utilities 𝑣(

) , … , 𝑣( &

  • In general, strategies at each round could depend on history of play
  • Memory-less (also called stationary) strategies are special cases

ui =

R

X

r=1

Ξ΄rβˆ’1u(r)

i

<latexit sha1_base64="praSL3IReXNHTQC1lxMvD0NwPxQ=">ACFHicbVDLSgMxFM3UV62vqks3wSJUxDJTxQdYqLhxWcU+oNMOmTRtQzMPkoxQhvkIN/6Cn+DGhSJuBd35A278BRemnSJqPRA4nHMuN/fYPqNC6vqblpiYnJqeSc6m5uYXFpfSysV4QUckzL2mMdrNhKEUZeUJZWM1HxOkGMzUrV7JwO/ekm4oJ57Ifs+aTio49I2xUgqyUpvBRaFBWiKwLFCXjCiZngemS3CJGqGfNuIoAo0wyzfjFJWOqPn9CHgODFGJFM8frn5zH+8l6z0q9nycOAQV2KGhKgbui8bIeKSYkailBkI4iPcQx1SV9RFDhGNcHhUBDeU0oJtj6vnSjhUf06EyBGi79gq6SDZFX+9gfifVw9k+6ARUtcPJHFxvKgdMCg9OGgItignWLK+Ighzqv4KcRdxhKXqMS7hcIC975PHSWfM3Zyu2eqjSMQIwnWwDrIAgPsgyI4BSVQBhcgVtwDx60a+1Oe9Se4mhCG82sgl/Qnr8ACui1Q=</latexit>
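The discounted-sum formula above can be checked numerically; a minimal sketch in Python (the helper name and the payoff stream are illustrative, not from the slides):

```python
# Total discounted utility in the finitely repeated game:
# u_i = sum_{r=1}^{R} delta^(r-1) * u_i^(r)

def discounted_sum(stream, delta):
    """Discount one agent's per-round payoffs back to round 1."""
    return sum(delta ** (r - 1) * u for r, u in enumerate(stream, start=1))

# For a constant stream c, the closed form is c * (1 - delta**R) / (1 - delta).
R, delta, c = 10, 0.9, -1.0
assert abs(discounted_sum([c] * R, delta) - c * (1 - delta ** R) / (1 - delta)) < 1e-12
```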
SLIDE 4

Example: Finitely-Repeated Prisoners’ Dilemma

  • Suppose that Prisoners’ Dilemma is played in R (< ∞) rounds
  • What is SPE of this game?
  • We can use backward induction
  • Starting from last round, (C, C) is dominant strategy
  • Regardless of history, (C, C) is dominant strategy at each round
  • There exists unique SPE which is (C, C) at each round

               Prisoner 2
  Prisoner 1     Stay Silent   Confess
  Stay Silent    (-1, -1)      (-3, 0)
  Confess        (0, -3)       (-2, -2)
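The backward-induction argument can be sketched in Python; the payoffs come from the table above (the dictionary encoding is illustrative):

```python
# In the last round, Confess ('C') is a dominant strategy; given that, the
# same holds at every earlier round, so the unique SPE plays (C, C) throughout.

PAYOFF = {  # (own action, opponent action) -> own payoff; 'S' = Stay Silent
    ('S', 'S'): -1, ('S', 'C'): -3,
    ('C', 'S'): 0,  ('C', 'C'): -2,
}

def best_response(opponent):
    return max('SC', key=lambda a: PAYOFF[(a, opponent)])

# 'C' is a best response to both actions, i.e., dominant ...
assert all(best_response(b) == 'C' for b in 'SC')
# ... so backward induction gives (C, C) in each of the R rounds.
spe_path = [('C', 'C')] * 5
```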

SLIDE 5

SPE in Finitely Repeated Games

[Theorem]

  • If stage game 𝐻 has unique pure strategy equilibrium π‘‘βˆ—, then 𝐻& πœ€ has

unique SPE in which 𝑑 0 = π‘‘βˆ— for all 𝑠 = 1, … , 𝑆, regardless of history [Proof]

  • By backward induction, at round 𝑆, we have 𝑑 & = π‘‘βˆ—
  • Given this, then we have 𝑑 &4) = π‘‘βˆ—, and continuing inductively, 𝑑 0 = π‘‘βˆ—

for all 𝑠 = 1, … , 𝑆, regardless of history

SLIDE 6

Infinitely Repeated Games

  • Infinitely repeated play of 𝐻 with discount factor πœ€ is denoted by 𝐻5 πœ€
  • Agents’ utility is average of discounted utilities at each round
  • For πœ€ < 1, given sequence of utilities 𝑣(

) , … , 𝑣( 5

  • For πœ€ = 1, given sequence of utilities 𝑣(

) , … , 𝑣( 5

ui = lim

Rβ†’βˆž

PR

r=1 u(r) i

R

<latexit sha1_base64="UD7+EuZcQV4tNkSs84mcnMkcsZI=">ACLHicbVBLSwMxGMz6rPV9egl+AC9lF0VrYdCwYvHWqwK3bpk02wbms0uybdKWfYHefGvCOJBEa/evHoSNG1FfA0EJjPfkHzjx4JrsO0Ha2R0bHxiMjeVn56ZnZsvLCye6ChRlNVpJCJ15hPNBJesDhwEO4sVI6Ev2KnfPej7pxdMaR7JY+jFrBmStuQBpwSM5BUOEo/jMnYFD7205ire7gBRKrELpcB9DLsBorQ1NWJGVBlJztPa5kJnacbajPLzCXvFVbtoj0A/kucT7Jacd7fVtZeXqte4dZtRTQJmQqiNYNx46hmRIFnAqW5d1Es5jQLmzhqGShEw308GyGV43SgsHkTJHAh6o3xMpCbXuhb6ZDAl09G+vL/7nNRISs2UyzgBJunwoSARGCLcbw63uGIURM8QhU3f8W0Q0w5YPodlrDfx+7Xyn/JyVbR2S7uHJk2SmiIHFpGK2gDOWgPVdAhqI6ougK3aB79GBdW3fWo/U0HB2xPjNL6Aes5w+Qyq2e</latexit>

ui = (1 βˆ’ Ξ΄)

∞

X

r=1

Ξ΄rβˆ’1u(r)

i

<latexit sha1_base64="HaS4GJK/zsYRK2PrnJxz5C2EcQ8=">ACI3icbVDLSsNAFJ34rPVdelmUIS6sCQqvlBQ3LhUsCo0bZhMJ3XoZBJmboQS8i9uxD9x48IHbly4deMvuHDaiPg6MHA451zu3OPHgmuw7Wer39gcGi4MFIcHRufmCxNTZ/oKFGUVWkInXmE80El6wKHAQ7ixUjoS/Yqd/e7/qnF0xpHslj6MSsHpKW5AGnBIzklbYSj+MdXHaW3CYTQBZdnYReqnacrJG6XAbQyXKnkaolJ8Mm30jLajEreqV5u2L3gP8S5PM7+49XL8v70eqVHtxnRJGQSqCBa1xw7hnpKFHAqWFZ0E81iQtukxWqGShIyXU97N2Z4wShNHETKPAm4p36fSEmodSf0TIkcK5/e13xP6+WQLBRT7mME2CS5ouCRGCIcLcw3OSKURAdQwhV3PwV03OiCAVTa17CZhdrXyf/JSfLFWelsnpk2thGOQpoFs2hMnLQOtpFB+gQVRFl+gG3aF768q6tR6tpzaZ3OzKAfsF4+ADl7qLQ=</latexit>
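The (1 βˆ’ Ξ΄) normalization can be sanity-checked in Python (a sketch; the finite horizon just truncates the infinite sum):

```python
# With the (1 - delta) factor, a constant per-round payoff c is worth exactly c,
# so repeated-game utilities live on the same scale as stage-game utilities.

def avg_discounted(c, delta, horizon=10_000):
    return (1 - delta) * sum(delta ** (r - 1) * c for r in range(1, horizon + 1))

assert abs(avg_discounted(-2.0, 0.9) - (-2.0)) < 1e-9
```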
SLIDE 7

Trigger Strategies (TS)

  • Agents get punished if they deviate from agreed profile
  • In non-forgivingTS (or grim TS), punishment continues forever
  • Here, π‘‘βˆ— is agreed profile, and 𝑑(

6 is punishment strategy of 𝑗 for agent π‘˜

  • Single deviation by π‘˜ trigers agent 𝑗 to switch to 𝑑(

6, forever

s(t)

i

= ( sβˆ—

i

if s(r) = sβˆ—, 8r < t sj

i

if s(r)

j

6= sβˆ—

j, 9r < t

<latexit sha1_base64="J6hXRWqJpSkiJ6eFsJQRNSPICXs=">ACcnicbVFdaxQxFM1Mq9b1a1V8UdCri/aDsy24AdsoeCLjxXctrBZh0z2zjZtJjMkd6TLMC+fd89U/4IuvgpnZRaz1QuDk3HOS3JOk0MpRFH0LwpXVK1evrV3v3Lh56/ad7t17hy4vrcSRzHVujxPhUCuDI1Kk8biwKLJE41Fy9rbpH31C61RuPtC8wEkmZkalSgryVNz94mL1sdqgzXqPJzhTpL+NFd3Gn4LXnDCc6pUCjU4r7ObNex5tLUNPM2t0BrskDhv5acX5fHp0sANtrvGhOd+JgcWhtDYOJrp8sa424v6UVtwGQyWoLe/vlX+X1VH8Tdr3yayzJDQ1IL58aDqKBJSwpqbHu8NJhIeSZmOHYQyMydJOqjayG56Zgh/BL0PQsn87KpE5N8Sr8wEnbh/ew35v964pPT1pFKmKAmNXFyUlhohyZ/mCqLkvTcAyGt8m8FeSKskOR/qdOG8Kapl39GvgwOd/qD3f7ue5/GkC1qjT1iz9gG7BXbJ+9YwdsxCT7ETwIHgdPgp/hw/Bp2FtIw2Dpuc8uVLj9G3Jbvms=</latexit>
SLIDE 8

Trigger Strategies in Repeated Prisoners’ Dilemma

  • Suppose both agents use following trigger strategy
  • Play S unless someone has ever played C in past
  • Play C forever if someone has played C in past
  • Under what conditions is this SPE?
  • We use one-stage deviation principle
  • Step 1: (S is best response to S)
  • Utility from S: βˆ’ 1 βˆ’ πœ€

1 + πœ€ + πœ€; + β‹― = βˆ’ 1 βˆ’ πœ€ / 1 βˆ’ πœ€ = βˆ’1

  • Utility from C: βˆ’ 1 βˆ’ πœ€

0 + 2πœ€ + 2πœ€; + β‹― = βˆ’2πœ€ 1 βˆ’ πœ€ / 1 βˆ’ πœ€ = βˆ’2πœ€

  • S is better than C if πœ€ β‰₯ 1/2
  • Step 2: (C is best response to C)
  • Other agents will always play C, thus C is best response

Prisoner 2 Prisoner 1 Stay Silent Confess Stay Silent (-1, -1) (-3, 0) Confess (0, -3) (-2, -2)
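The two payoff computations in Step 1 can be verified numerically; a sketch (the 2000-round truncation stands in for the infinite sum):

```python
# Grim trigger in the repeated Prisoners' Dilemma: cooperating forever is
# worth -1; a one-shot deviation is worth -2*delta (0 now, then (C, C) forever).

def u_cooperate(delta):
    return (1 - delta) * sum(delta ** r * (-1) for r in range(2000))

def u_deviate(delta):
    return (1 - delta) * sum(delta ** r * (-2) for r in range(1, 2000))

for delta in (0.3, 0.5, 0.8):
    assert abs(u_cooperate(delta) - (-1)) < 1e-6
    assert abs(u_deviate(delta) - (-2 * delta)) < 1e-6
    # deviation is unprofitable exactly when delta >= 1/2
    assert (u_cooperate(delta) >= u_deviate(delta)) == (delta >= 0.5)
```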

SLIDE 9

Remarks

  • Cooperation is equilibrium, but so are many other strategy profiles
  • If π‘‘βˆ— is NE of 𝐻, then β€œeach agent plays 𝑑(

βˆ—β€ is SPE of 𝐻& πœ€

  • Future play of other agents is independent of how each agent plays
  • Optimal play is to maximize current utility, i.e., play static best response
  • Sets of equilibria for finite and infinite horizon versions can be different
  • Multiplicity of equilibria in repeated prisoner’s dilemma only occurs at 𝑆 = ∞
  • For any finite 𝑆 (thus for 𝑆 β†’ ∞), repeated prisoners’ dilemma has unique SPE
SLIDE 10

TS in Finitely Repeated Games

  • If 𝐻 has multiple equilibria, then 𝐻&(πœ€) does not have unique SPE
  • Consider following example
  • Stage game has two pure NE: (y, y) and (z, z)
  • Socially optimal outcome, (x, x), is not equilibrium
  • In twice repeated play, we can support (x, x) in first round

Agent 2 Agent 1 x y z x (3, 3) (0, 4) (-2, 0) y (4, 0) (1, 1) (-2, 0) z (0, -2) (0, -2) (-1, -1)

SLIDE 11

TS in Finitely Repeated Games (cont.)

  • TS strategy
  • Play x in first round
  • Play y in second round if opponent played x; otherwise, play z
  • We can use one-shot deviation principle
  • For simplicity, suppose πœ€ = 1
  • Playing x first and y next leads to utility of 4
  • Playing y first triggers opponent to play z next, which leads to utility 3
  • Deviation is not profitable!
SLIDE 12

Repetition Can Lead to Bad Outcomes

  • Consider this game
  • Strategy x strictly dominates y and z for both agents
  • Unique Nash equilibrium of stage game is (x, x)
  • If πœ€ β‰₯ 1/2, this game has SPE in which (y, y) is played in every round
  • It is supported by slightly more complicated strategy than grim trigger
  • I. Play y in every round unless someone deviates, then go to II
  • II. Play z. If no one deviates go to I. If someone deviates stay in II

            Agent 2
  Agent 1     x         y         z
  x           (2, 2)    (2, 1)    (0, 0)
  y           (1, 2)    (1, 1)    (-1, 0)
  z           (0, 0)    (0, -1)   (-1, -1)
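A one-shot deviation check of phases I and II confirms the Ξ΄ β‰₯ 1/2 condition; a sketch (the value recursion below is the standard one for this two-phase automaton, with phase II read as one round of (z, z) before returning to I):

```python
# Phase values with the table above: v_I = 1 (play (y, y) forever);
# phase II plays (z, z) at -1 for one round, then returns to I.

def phase_values(delta):
    v_I = 1.0
    v_II = (1 - delta) * (-1) + delta * v_I
    return v_I, v_II

for delta in (0.4, 0.5, 0.9):
    v_I, v_II = phase_values(delta)
    dev_I = (1 - delta) * 2 + delta * v_II    # deviate to x against y, then punished
    comply_II = (1 - delta) * (-1) + delta * v_I
    dev_II = (1 - delta) * 0 + delta * v_II   # deviate to x against z, stay in II
    assert (v_I >= dev_I) == (delta >= 0.5)
    assert (comply_II >= dev_II) == (delta >= 0.5)
```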

SLIDE 13

Feasible and Individually Rational Utilities

  • π‘Š = Convex hull of 𝑀 ∈ ℝ ℐ

there exists 𝑑 ∈ 𝑇 such that 𝑣 s = 𝑀

  • Utility in repeated game is just a weighted average of utilities in stage game
  • Note that π‘Š β‰  𝑀 ∈ ℝ ℐ

there exists 𝜏 ∈ Ξ£ such that 𝑣 𝜏 = 𝑀

  • Recall minmax value of agent 𝑗
  • Also recall minmax strategy against 𝑗
  • Utility vector 𝑀 ∈ ℝ ℐ is strictly individually rational if 𝑀( > 𝑀(, βˆ€π‘—

vi = min

Οƒβˆ’i max si

ui(si, Οƒβˆ’i)

<latexit sha1_base64="nKWPzLxWbg6YlOqMulrJ9pCOQ7c=">ACLXicbVDLSgMxFM34tr6qLrsJFkFBy4yCVlAQdOFSwWqhU0Imk9ZgkhmSTLEM8yH+ghu/wr0ILiqiS3/DTCtSHwcCh3Pu4eaeIOZMG9ftOSOjY+MTk1PThZnZufmF4uLShY4SRWiNRDxS9QBrypmkNcMp/VYUSwCTi+D6Pcv+xQpVkz03pk2B25K1GMHGSqh47CcypCqPp50MXgAfcEkSn3N2gKjdJNlmZXwDUo1YhlMEFuzZGPIX0fFsltx+4B/ifdFyofV9vSQz08RcUnP4xIqg0hGOtG54bm2aKlWGE06zgJ5rGmFzjNm1YKrGgupn2r83gqlVC2IqUfdLAvjqcSLHQuisCOymwudK/vVz8z2skplVtpkzGiaGSDBa1Eg5NBPqYMgUJYZ3LcFEMftXSK6wsTYgv9EvZy7Hyf/JdcbFW87cr2mW1jHwBUpgBawBD+yCQ3ACTkENEHAHkEPvDj3zrPz6rwNRkecr8wy+AHn4xP/gaz1</latexit>

mi

βˆ’i = arg min Οƒβˆ’i max si

ui(si, Οƒβˆ’i)

<latexit sha1_base64="IkDCELqS51Mqf/r7UGub8peXdw=">ACLXicbVDLSgMxFM34tr6qLt0Ei6CgZWrBygIunCpYGuhU4c7adqGJpkhyYhlmA/xF9z4Fe5FcKGILv0N06mIrwOBwzncnNPEHGmjes+OUPDI6Nj4xOTuanpmdm5/PxCVYexIrRCQh6qWgCaciZpxTDaS1SFETA6XnQPez75dUaRbKM9OLaENAW7IWI2Cs5OePhJ9sPQiYSnex6oNvYEk37iadYWkJmpleDKT7RvQ7HPVi1Z/+av+fmCW3Qz4L+k9EkKBztv10t3teaJn3/wmiGJBZWGcNC6XnIj0hAGUY4TXNerGkEpAtWrdUgqC6kWTXpnjFKk3cCpV90uBM/T6RgNC6JwKbFGA6+rfXF/z6rFp7TQSJqPYUEkGi1oxybE/epwkylKDO9ZAkQx+1dMOqCAGFtwLitht4+tr5P/kupmsVQulk9tG3togAm0hJbRKiqhbXSAjtEJqiCbtA9ekLPzq3z6Lw4r4PokPM5s4h+wHn/A8IrGw=</latexit>
SLIDE 14

Example

  • What is minimax value of agent 1?
  • Let π‘Ÿ denote probability that agent 2 chooses L
  • What is minimax value of agent 2?
  • Let π‘ž and π‘Ÿ denote probabilities that agent 1 chooses U and M, respectively
  • Agent 2

Agent 1

L R U (-2, 2) (1, -2) M (1, -2) (-2, 2) D (0, 1) (0, 1)

m1

2 ∈ [1/3, 2/3]

<latexit sha1_base64="l9x9P5BZwbW6uZ9IWcvwmWhPos=">AB/HicbVDLSsNAFJ34rPVLbhxM1gEF9ImLdS6K7hxWcE+oI1lMp20QyeTMDMRQqjf4Uo3LhRx64e4E/0YJ2kRtR64cDjnXu69xwkYlco0342FxaXldXMWnZ9Y3NrO7ez25J+KDBpYp/5ouMgSRjlpKmoYqQTCI8h5G2Mz5L/PY1EZL6/FJFAbE9NOTUpRgpLfVzea9fvrJgj3LYtUqV43KpYvdzBbNopoDzxJqRQn0vuLv9/Kg1+rm3sDHoUe4wgxJ2bXMQNkxEopiRibZXihJgPAYDUlXU48Iu04PX4CD7UygK4vdHEFU/XnRIw8KSP0Z0eUiP510vE/7xuqNyaHVMehIpwPF3khgwqHyZJwAEVBCsWaYKwoPpWiEdIKx0Xtk0hNME1e+X50mrXLQqxcqFTqMKpsiAfXAjoAFTkAdnIMGaAIMInAPHsGTcWM8GM/Gy7R1wZjN5MEvGK9fb+GWhg=</latexit>

v2 = min

0≀p,q≀1 max {p βˆ’ 3q + 1, βˆ’3p + q + 1} = 0

<latexit sha1_base64="+0pmj/aIXLiQ575swn642MAKcjw=">ACMHicbVBbSxtBGJ31buol2kf7MHgBQ27WYj6IBUs1EcFkwjZsMxOvuiQ2dnJzKwYlu1j/01f+lP0xUKL+Np/0DdnEyneDgxzOc7zHwnkpxp47q/nLHxicmp6ZnZ0oe5+YXF8tJyQyepolCnCU/UeUQ0cCagbpjhcC4VkDji0Ix6R4XfvAKlWSLOzEBCOyYXgnUZJcZKYflrkIoOqCKeXeVhFR/gIGYizNyA5bfVzcXm5Vco2/BRmWO35/y9vGO7csgQHuc24YXnNrbhD4LfEeyJrh/73f1/WP30+Ccs3QSehaQzCUE60bnmuNO2MKMoh7wUpBokoT1yAS1LBYlBt7PhwjnesEoHdxNljzB4qD5PZCTWehBHdjIm5lK/9grxPa+Vmu5eO2NCpgYEHT3UTk2CS7awx2mgBo+sIRQxexfMb0kilBjOy4NS9gvUPu/8lvSqFY8v+Kf2jZqaIQZtIJW0Sby0C46RMfoBNURT/QLfqN/jg/nTvn3nkYjY45T5mP6AWcv4/tai4</latexit>

m2

1 = (1/2, 1/2, 0)

<latexit sha1_base64="5ercTrqmLlp0bS5/7x25oeOwk/Y=">AB/XicbVDLSsNAFJ3UV62v+MCNm8EiVJCatFDrQi4cVnBPqCNYTKdtEMnD2YmQi3F3Alblwo4tb/cCf6MU6aImo9cOFwzr3ce48TMiqkYbxrqZnZufmF9GJmaXldU1f36iLIOKY1HDAt50kCM+qQmqWSkGXKCPIeRhtM/jf3GFeGCBv6FHITE8lDXpy7FSCrJ1rc827wswBOYMw8LB3EZ+xlbzxp5Yw4TcwJyVa2w7vbz49y1dbf2p0ARx7xJWZIiJZphNIaIi4pZmSUaUeChAj3UZe0FPWR4Q1HF8/gntK6UA34Kp8Ccfqz4kh8oQYeI7q9JDsib9eLP7ntSLplq0h9cNIEh8ni9yIQRnAOArYoZxgyQaKIMypuhXiHuISxVYEsJxjNL3y9OkXsibxXzxXKVRAgnSYAfsghwRGogDNQBTWAwTW4B4/gSbvRHrRn7SVpTWmTmU3wC9rFxXqlZY=</latexit>

v1 = min

0≀q≀1 max {1 βˆ’ 3q, βˆ’2 + 3q, 0} = 0

<latexit sha1_base64="PyJL4QW9G7YivbFjEisW0PojKJM=">ACL3icbVDLSgMxFM34tr5aXboJiCoZal6EYoCOJSwarQKUMmvbWhmcw0yRTLMP6MWzd+g3/gRkQRt+4VXJpXfg6kNzDufeQ3ONHnClt2w/WyOjY+MTk1HRuZnZufiFfWDxRYSwp1GjIQ3nmEwWcCahpjmcRJI4HM49Tt7Wf+0B1KxUBzrfgSNgJwL1mKUaCN5+X03Fk2QmT3pZ6Dd7EbMOEltsBd3F2O6nRyAW+dBPsbJW7m3irtJEVG7upMdi5nJdftYv2APgvcb7IarV0dftReHs/9PJ3bjOkcQBCU06Uqjt2pBsJkZpRDmnOjRVEhHbIOdQNFSQA1UgG+6Z4zShN3AqlOULjgfrdkZBAqX7gm8mA6Lb63cvE/3r1WLd2GgkTUaxB0OFDrZhjHeIsPNxkEqjmfUMIlcz8FdM2kYRqE3EWgvN75b/kpFR0ysXKkUmjgoaYQstoBa0jB2jKjpAh6iGKLpGd+gRPVk31r31bL0MR0esL8S+gHr9RNxQalQ</latexit>
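Both minmax values can be approximated by brute force; a grid-search sketch (the closed-form answers are v̲_1 = v̲_2 = 0):

```python
# Grid search over the punishing mixtures from this slide's bimatrix.

grid = [k / 1000 for k in range(1001)]

# Agent 1's payoffs against q = P(agent 2 plays L): U -> 1 - 3q, M -> 3q - 2, D -> 0
v1 = min(max(1 - 3 * q, 3 * q - 2, 0.0) for q in grid)

# Agent 2's payoffs against (p, q) = (P(U), P(M)): L -> 1 + p - 3q, R -> 1 - 3p + q
v2 = min(max(1 + p - 3 * q, 1 - 3 * p + q)
         for p in grid for q in grid if p + q <= 1)

assert abs(v1) < 1e-9 and abs(v2) < 1e-9
```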
SLIDE 15

Minmax Utility Lower Bounds

[Theorem]

  • If πœβˆ— is NE of 𝐻, then 𝑣( πœβˆ— β‰₯ 𝑀(
  • If πœβˆ— is NE of 𝐻5 πœ€ , then 𝑣( πœβˆ— β‰₯ 𝑀(

[Proof]

  • Agent 𝑗 can always guarantee herself 𝑀( in stage game and also in each

round of repeated game, meaning that she can always achieve at least this utility against even most adversarial opponents

SLIDE 16

Nash Folk Theorem

[Nash Folk Theorem]

  • If 𝑀 is feasible and strictly individually rational, then there exists πœ€ < 1,

such that for all πœ€ > πœ€, 𝐻5 πœ€ has NE with utilities 𝑀 [Proof]

  • Suppose for simplicity that there exists pure strategy profile π‘‘βˆ— such that

𝑣( π‘‘βˆ— = 𝑀( (otherwise, proof is more involved)

  • Consider following grim trigger strategy for agent 𝑗
  • Play 𝑑( as long as no one deviates
  • If some agent π‘˜ deviates, then play 𝑛(

6 thereafter

  • If 𝑗 plays 𝑑, her utility is 𝑀(
SLIDE 17

Proof of Nash Folk Theorem

  • We can use one-shot deviation principle
  • Suppose 𝑗 deviates from 𝑑 in round 𝑠
  • Define Μ…

𝑀( = max

VW 𝑣( 𝑑(, 𝑑4( βˆ—

  • We have
  • Following π‘‘βˆ— will be optimal if
  • This means, π‘‘βˆ— is NE of 𝐻5 πœ€ if

ui ≀ (1 βˆ’ Ξ΄)(vi + Ξ΄vi + Β· Β· Β· + Ξ΄rβˆ’1vi + Ξ΄rΒ― vi + Ξ΄r+1vi + Ξ΄r+2vi + . . . )

<latexit sha1_base64="t7ZPoC2fNr21i8lTgqgbVjGKfM8=">ACg3icbVFdS+NAFJ1EXbW6a9WXBV+GLS6VYk2q+PGwUPTFR4WtCk03TCa3OjiZhJmbQg0Ff4c/w5+ybwv7Y3aSirjWA8OcOede7p17o0wKg573x3Hn5hc+LS4t1ZWP39Zq69vXJk01x6PJWpvomYASkU9FCghJtMA0siCdfR/VnpX49AG5GqnzjOYJCwWyWGgjO0Ulh/ykNBAwm06e8GMUhkO06CkVr+igpbdEgTtFUdyn+KvSuP3m1KknTIGK6GE3eioVu2bgVzHosFZu/OxXVbCesNr+1VoLPEfyGN7teHv8uPz6cXYf23zeR5Agq5ZMb0fS/DQcE0Ci5hUgtyAxnj9+wW+pYqloAZFNUMJ3TbKjEdptoehbRS32YULDFmnEQ2MmF4Z957pfiR189xeDwohMpyBMWnhYa5pJjSciE0Fho4yrEljGthe6X8jmnG0a6tVg3hpMTh65dnyVWn7e+3Dy79RveYTLFEtsg30iQ+OSJdck4uSI9whzjfnT3HcxfcltxD6ahrvOSs0n+g/vjH2ZbwkU=</latexit>

vi β‰₯ (1 βˆ’ Ξ΄r)vi + Ξ΄r(1 βˆ’ Ξ΄)Β― vi + Ξ΄r+1vi = vi βˆ’ Ξ΄r(vi βˆ’ Β― vi + Ξ΄(Β― vi βˆ’ vi))

<latexit sha1_base64="Xn37rhesk6srf/MvyUebuO/pAf8=">ACkHicbVFdT9swFHXCYFC+OvbIyxUVqBVqlQw0OmloTHtBPIFEAakplePcFgvHiWynUhXl9+z/7G3/Zk4oZVCOZOn4nHuvr+8NU8G18by/jrv0YXnl4+pabX1jc2u7/mnRieZYthjiUjUXUg1Ci6xZ7gReJcqpHEo8DZ8/FX6txNUmify2kxTHMR0LPmIM2qsNKz/ngw5HEAwRmj67SBCYei9aln1EJ5vc6MFQUhVPilsztzO1aFfQJDJCFXZRWUHQe0ATqEs3n4JLKD5LC3Uab5I7bfVWtAa1htex6sAi8SfkQaZ4XJY/xNECctilIYJqnXf91IzyKkynAksakGmMaXskY6xb6mkMepBXg20gH2rRDBKlD3SQKX+n5HTWOtpHNrImJoH/dYrxfe8fmZG3UHOZoZlOzpoVEmwCRQbgcirpAZMbWEMsVtr8AeqKLM2B3WqiF8K/F1/uVFcvOl4x91jq+OG2fd2ThWyS7ZI03ikxNyRs7JekR5mw6R85359TdcbvuD/fnU6jrzHI+k1dwL/4BlqHDcA=</latexit>

Ξ΄ β‰₯ Ξ΄ = max

i

Β― vi βˆ’ vi Β― vi βˆ’ vi

<latexit sha1_base64="m7LMG5noPteaUI5TqX5kMr6yQxk=">ACOnicbVDLSsNAFJ34tr6qLkUIiuDGkqpUXQiCG5cKVoWmhJvJTR2cTMLMpFhCfkBwKfgnbvwKdy7cuFDErR/gpBWpjwMD57z8zc4yecKe04j9bA4NDwyOjYeGlicmp6pjw7d6LiVFKs05jH8swHhZwJrGumOZ4lEiHyOZ76F/tF/7SNUrFYHOtOgs0IWoKFjI2klc+cgPkGtwWuqkIUBb3ZD0t3UjuPSYG0qgmeuDzNq5x9baHsv7yj5fUededmpOF3Yf0n1iyzv1W4WO9dXt4de+cENYpGKDTloFSj6iS6mYHUjHLMS26qMAF6AS1sGCogQtXMuqvn9opRAjuMpTlC212135FBpFQn8s1kBPpc/e4V4n+9RqrD7WbGRJqFLT3UJhyW8d2kaMdMIlU84hQCUzf7XpOZiktEm71A1hp0Dte+W/5GS9Ut2obB6ZNLZJD2NkgSyRVIlW2SPHJBDUieU3JEn8kJerXvr2Xqz3nujA9aXZ578gPXxCShAtGo=</latexit>
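Plugging the repeated Prisoners' Dilemma into this threshold recovers the Ξ΄ β‰₯ 1/2 condition from the trigger-strategy slide; a sketch (with target w = (βˆ’1, βˆ’1), best deviation payoff 0, minmax value βˆ’2):

```python
# delta_bar = max_i (vbar_i - w_i) / (vbar_i - vlow_i)

def delta_bar(w, vbar, vlow):
    return max((b - wi) / (b - lo) for wi, b, lo in zip(w, vbar, vlow))

# Prisoners' Dilemma: target (S, S) utilities, deviation payoffs, minmax values
assert delta_bar(w=(-1, -1), vbar=(0, 0), vlow=(-2, -2)) == 0.5
```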
SLIDE 18

Problems with Nash Folk Theorem

  • Any utility can be achieved when agents are patient enough
  • NE involves non-forgiving TS which may be costly for punishers
  • NE may include non-credible threats
  • NE may not be subgame perfect
  • Example:
  • Unique NE in this game is (D, L)
  • Minmax values are given by 𝑀) = 0 and 𝑀; = 1
  • Minmax strategy against agent 1 requires agent 2 to play R
  • R is strictly dominated by L for agent 2

Agent 2 Agent 1 L R U (6, 6) (0, -100) D (7, 1) (0, -100)
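The minmax values above can be computed directly; a pure-strategy sketch (mixing does not help in this game, so pure punishments suffice):

```python
# Punishing agent 1 forces agent 2 into the strictly dominated action R.

U1 = {('U', 'L'): 6, ('U', 'R'): 0, ('D', 'L'): 7, ('D', 'R'): 0}
U2 = {('U', 'L'): 6, ('U', 'R'): -100, ('D', 'L'): 1, ('D', 'R'): -100}

v1 = min(max(U1[(a, b)] for a in 'UD') for b in 'LR')  # agent 2 punishes with a column
v2 = min(max(U2[(a, b)] for b in 'LR') for a in 'UD')  # agent 1 punishes with a row
assert (v1, v2) == (0, 1)

# ... yet L strictly dominates R for agent 2, so the threat is not credible.
assert all(U2[(a, 'L')] > U2[(a, 'R')] for a in 'UD')
```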

SLIDE 19

Subgame Perfect Folk Theorem

[Theorem]

  • Let π‘‘βˆ— be NE of stage game 𝐻 with utilities π‘€βˆ—
  • For any feasible utility 𝑀 with 𝑀( > 𝑀(

βˆ—, βˆ€π‘— ∈ ℐ, there exists πœ€ < 1 such that for all πœ€ > πœ€,

𝐻5 πœ€ has SPE with utilities 𝑀

[Proof]

  • Simply construct non-forgiving TS with punishment by static NE
  • Punishments are therefore subgame perfect
  • For πœ€ sufficiently close to 1, it is better for each agent 𝑗 to obtain 𝑀( rather than

deviate and get 𝑀(

βˆ— forever thereafter

  • This shows that any utility higher than NE utilities can be sustained as SPE
SLIDE 20

Repeated Games with Imperfect Monitoring

  • At each round, all agents observe some public outcome, which is

correlated with stage game actions

  • Let 𝑧 ∈ 𝑍 denote publicly observed outcome
  • Each strategy profile 𝑑 induces probability distribution over 𝑧
  • Let 𝜌 𝑧, 𝑑 denote probability distribution of 𝑧 under action profile 𝑑
  • Public information at round 𝑠 is β„Ž \ = 𝑧 ) , … , 𝑧 \4)
  • Strategy of agent 𝑗 is sequence of maps 𝑑(

0 : β„Ž 0 β†’ 𝑇(

SLIDE 21

Repeated Games with Imperfect Monitoring (cont.)

  • Each agent’s utility depends only on her own action and public outcome
  • Dependence on actions of others is through their effect on distribution of 𝑧
  • Agent 𝑗’s realized utility at round 𝑠 is
  • Given strategy profile 𝑑 0 , agent 𝑗’s expected utility is
  • Agent 𝑗’s average discounted utility for sequence {𝑑_} is

ui(s(r)

i , y(r))

<latexit sha1_base64="/WomN2swvDj5/SQgYoFwOjeZFXs=">AB/3icbVDLSsNAFJ34rPUVFdx0M1iEFqQkWrQuhIblxXsA9oYJtNpO3TyYGYihNiFv+LGhSLd+hFu3Pk3TpIiaj1wuYdz7mXuHCdgVEjD+NQWFpeWV1Zza/n1jc2tbX1ntyX8kGPSxD7zecdBgjDqkakpFOwAlyHUbazvgy8dt3hAvqezcyCojloqFHBxQjqSRb3w9tWhI2vY1LvDw5irJetvWiUTFSwHlizkixXriodt7htGHrH72+j0OXeBIzJETXNAJpxYhLihmZ5HuhIAHCYzQkXU95BJhxen9E3iolD4c+FyVJ2Gq/tyIkStE5Dpq0kVyJP56ifif1w3loGbF1AtCSTycPTQIGZQ+TMKAfcoJlixSBGFO1a0QjxBHWKrI8mkI5wlOv78T1rHFfOkUr1WadRAhwogANQAiY4A3VwBRqgCTC4B4/gGbxoD9qT9qpNs9EFbazB35Be/sCW3+YEg=</latexit>

ui(s(r)) = X

y∈Y

Ο€(y(r), s(r))ui(s(r)

i , y(r))

<latexit sha1_base64="PVGEb3Mtn1chUAMq4swRsXUprB8=">ACL3icbVBbS8MwGE3nbc7b1EdfgkPYQEar4gURJoL4OMG5yVpLmqUaTNOSpEIp+0e+O6v2IuIr76L0y7Kd4OBA7nI8v3/EiRqUyzSejMDY+MTlVnC7NzM7NL5QXl85lGAtMWjhkoeh4SBJGOWkpqhjpRIKgwGOk7d0cZX7lghJQ36mkog4Abri1KcYKS25ePYpV5mVZFrV+DB7aMAzdNbMrhRd+OaDUZWufkTzu0pGafA65YpZN3PAv8QakUrjcP+hYjlR0y0P7F6I4BwhRmSsmuZkXJSJBTFjPRLdixJhPANuiJdTkKiHTS/N4+XNKD/qh0I8rmKvfJ1IUSJkEnk4GSF3L314m/ud1Y+XvOinlUawIx8NFfsygCmFWHuxRQbBiSYIC6r/CvE1EgrXEpL2Evw/bXyX/J+Ubd2qxvneo2dsEQRbACVkEVWGAHNMAJaIWwOAODMAzeDHujUfj1XgbRgvGaGYZ/IDx/gFMGape</latexit>

ui = (1 βˆ’ Ξ΄)

∞

X

r=1

Ξ΄rβˆ’1ui(s(r))

<latexit sha1_base64="BoJdR0BUt5cJ9+1row0IOZpCzk=">ACJXicbVBNSyNBFHwT3V3NfpjVoyDNhoXkDCji6uQgODFg4cIxgiZOjp9GhjT8/Q/UYIw/wZL/4Bz573sgdFBE/+FTuJyO5mCxqKqnq8fhWmUh03SentLD47v2HpeXyx0+fv6xUvq6emCTjHdZIhN9GlLDpVC8iwIlP01p3EoeS+82J/4vUujUjUMY5TPojpmRKRYBStFRaWSBIu+Y1/BGXSOu+yeIg12vGOa+UBGOi5kzHXDK4iN18wr+l6USdBpeo23SnIPFeSXWvBYe3GzekE1Tu/VHCspgrZJIa0/fcFAc51SiY5EXZzwxPKbugZ7xvqaIxN4N8emVBvltlRKJE26eQTNU/J3IaGzOQ5uMKZ6bf72J+D+vn2G0M8iFSjPkis0WRZkmJBJZWQkNGcox5ZQpoX9K2HnVFOGtjytITdCbfTp4nJ5tNb6v548i2sQ0zLME6fIMaePAT9uAOtAFBlfwC+7g3rl2fjsPzuMsWnJeZ9bgLzjPL3dnpvw=</latexit>
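The expected-utility formula can be sketched with a made-up two-signal example (signal names, probabilities, and payoffs below are purely illustrative):

```python
# u_i(s) = sum_{y in Y} pi(y, s) * u_i(s_i, y): others' actions matter
# only through their effect on the signal distribution pi.

def expected_utility(s_i, pi_given_s, u_i):
    return sum(p * u_i(s_i, y) for y, p in pi_given_s.items())

pi = {'high_price': 0.7, 'low_price': 0.3}   # pi(., s) for one fixed profile s
u = lambda s_i, y: (3 if y == 'high_price' else 0) - (1 if s_i == 'flood' else 0)
assert abs(expected_utility('restrain', pi, u) - 2.1) < 1e-12
```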
SLIDE 22

Example: Cournot Competition with Noisy Demand

[Green and Porter, Noncooperative Collusion under Imperfect Price Information, 1984]

  • Firms set production levels π‘Ÿ)

\ , … , π‘Ÿa \ privately at round 𝑠

  • Firms do not observe each other’s output levels
  • Market demand is stochastic
  • Market price depends on total production and market demand
  • Low price could be due to high production or low demand
  • Firms utility depends on their own production and market price
SLIDE 23

Simpler Example: Noisy Prisoner’s Dilemma

  • Prisoners don’t observe each others actions, they only observe signal 𝑧
  • 𝑣) 𝑇, 𝑧 = 1 + 𝑧

𝑣) 𝐷, 𝑧 = 4 + 𝑧

  • 𝑣; 𝑇, 𝑧 = 1 + 𝑧

𝑣; 𝐷, 𝑧 = 4 + 𝑧

  • Signal 𝑧 is defined by continuous random variable π‘Œ with 𝐹 π‘Œ = 0
  • If 𝑑 = 𝑇, 𝑇 , then 𝑧 = π‘Œ
  • If 𝑑 = 𝑇, 𝐷 or 𝐷, 𝑇 , then 𝑧 = π‘Œ βˆ’ 2
  • If 𝑑 = 𝐷, 𝐷 , then 𝑧 = π‘Œ βˆ’ 4
  • Normal form stage game is

Prisoner 2 Prisoner 1 Stay Silent Confess Stay Silent (1+X, 1+X) (-1+X, 2+X) Confess (2+X, -1+X) (X, X)
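Taking expectations of the signal-dependent payoffs confirms the table above; a sketch (only E[X] = 0 is used, so no distribution needs to be specified):

```python
# With E[X] = 0, the expected stage payoffs form an ordinary
# Prisoners' Dilemma matrix.

SHIFT = {('S', 'S'): 0, ('S', 'C'): -2, ('C', 'S'): -2, ('C', 'C'): -4}

def expected_u1(a1, a2):
    # u_1 = (1 if silent else 4) + y, where y = X + shift(s) and E[X] = 0
    return (1 if a1 == 'S' else 4) + SHIFT[(a1, a2)]

table = {s: expected_u1(*s) for s in SHIFT}
assert table == {('S', 'S'): 1, ('S', 'C'): -1, ('C', 'S'): 2, ('C', 'C'): 0}
```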

SLIDE 24

Trigger-Price Strategy

  • Consider following trigger strategy for noisy Prisoner’s Dilemma
  • (I) - Play (S, S) until 𝑧 ≀ π‘§βˆ—, then go to (II)
  • (II) - Play (C, C) for 𝑆 rounds, then go back to (I)
  • Note that punishment uses NE of stage game
  • We can choose π‘§βˆ— and 𝑆 such that this strategy profile is SPE
  • We use one-shot deviation principle
  • Deviation in (II) is obviously not beneficial
  • In (I), if agents do not deviate, their expected utility is

v = (1 βˆ’ Ξ΄) ⇣ (1 + 0) + F(yβˆ—)Ξ΄(R+1)v +

  • 1 βˆ’ F(yβˆ—)
  • Ξ΄v

⌘

<latexit sha1_base64="FpByB7DCrlbsfWkDcWlL6BXGA=">ACPXicbVBLSwMxGMzWV62vqkdBgyLsWqy7KloPgiI4EXF2kK3Ldk0raHZB0m2UJb+sV78D968ieBEa9ezXaL+BoITGbmI/nGCRgV0jQftNTI6Nj4RHoyMzU9MzuXnV+4EX7IMSlin/m87CBGPVIUVLJSDngBLkOIyWnfRL7pQ7hgvretewGpOqilkebFCOpHr2ugMPoW5t2g3CJDLsY9rSdStnGrlTvVvbMBK9FulXOcvowQ7MQdtRGWtz6KuLAZOUcuN5I1Prpl5cwD4l1hDsna0FfRXzpefLurZe7vh49AlnsQMCVGxzEBWI8QlxYz0MnYoSIBwG7VIRVEPuURUo8H2PbiulAZs+lwdT8KB+n0iQq4QXdRSRfJW/Hbi8X/vEom4VqRL0glMTDyUPNkEHpw7hK2KCcYMm6iDMqforxLeIyxV4UkJBzH2vlb+S26289ZOfvdStVEACdJgCawCHVhgHxyBM3ABigCDPngEL+BVu9OetTftPYmtOHMIvgB7eMTfBGrWg=</latexit>

β‡’ v = 1 βˆ’ Ξ΄ 1 βˆ’ Ξ΄(1 βˆ’ Ξ΄)

  • 1 βˆ’ F(yβˆ—)(1 βˆ’ Ξ΄R)
  • <latexit sha1_base64="dlMJWSRY+QSMznUM7cBVIx/QRlw=">ACP3icbZDLSgMxFIYz3q23qgsXboIitIJlRkWrIAiCulSxVejUkzbTBzITlTKcO48w18D1/Aja/gzq0bF4q4dWfavH2Q+DLf84hOb8TCq7ANB+Mnt6+/oHBoeHUyOjY+ER6cqogkhSVqCBCOSpQxQT3GcF4CDYaSgZ8RzBTpznVb9pMGk4oF/DM2QlT1S87nLKQFtVdJF+4jX6kCkDC7wJW7gLWy7ktDYWrKrTABJupT5gqzt8Jq+7WaZ4vZrn0WHyXtUjZJVdLzZs5sC/8F6xPmtzf3bq6uszMHlfS9XQ1o5DEfqCBKlSwzhHJMJHAqWJKyI8VCQs9JjZU0+sRjqhy390/wgnaq2A2kPj7gtvt9IiaeUk3P0Z0egbr6XWuZ/9VKEbj5csz9MALm085DbiQwBLgVJq5ySiIpgZCJd/xbROdHygI+EsNHSWnflv1BczlkrudVDnUYedTSEZtEcyiALraNtI8OUAFRdIse0TN6Me6MJ+PVeOu09hifM9Poh4z3D6B0r98=</latexit>
SLIDE 25

Trigger-Price Strategy (cont.)

  • If some agent deviates in (1), then her expected utility is
  • Deviation provides immediate utility, but increases probability of entering (II)
  • To have SPE, we mush have 𝑀 β‰₯ 𝑀f which means
  • Any 𝑆 and π‘§βˆ— that satisfy this constraint construct SPE
  • Best trigger-price strategy could be found if we maximize 𝑀 subject to this constraint

vd = (1 βˆ’ Ξ΄) ⇣ (2 + 0) + F(yβˆ— + 2)Ξ΄(R+1)v +

  • 1 βˆ’ F(yβˆ— + 2)
  • Ξ΄v

⌘

<latexit sha1_base64="wvuFfCkQlQHlOcDZIyhyhdHW3G4=">ACQ3icbVBLSwMxGMz6tr6qHgUNirDrYt2tovUgFAURvKhYFbu1ZLNpDWYfJNlCWfrHPHnxD3jz6sGLB0W8Cma7Ir4GApOZ+fIYN2JUSMu613p6+/oHBoeGcyOjY+MT+cmpExHGHJMKDlnIz1wkCKMBqUgqGTmLOEG+y8ipe7WT+qctwgUNg2PZjkjNR82ANihGUkn1/Hmr7sEtqNvLjkeYRIazTZu6XjQtw9zV2xdLZtHInItEPzJtowNb0ISOq1L28ldCbQ2Y5ZSfnmHk6vkFq2B1Af8S+5MslFei67n92YeDev7O8UIc+ySQmCEhqrYVyVqCuKSYkU7OiQWJEL5CTVJVNEA+EbWk20EHLirFg42QqxVI2FW/TyTIF6LtuyrpI3kpfnup+J9XjWjVEtoEMWSBDi7qBEzKEOYFgo9ygmWrK0Iwpyqt0J8iTjCUtWelbCZYv3ry3/JSbFgrxbWDlUbJZBhCMyAeaADG2yAMtgDB6ACMLgBj+AZvGi32pP2qr1l0R7tc2Ya/ID2/gE0/K0U</latexit>

v β‰₯ 2(1 βˆ’ Ξ΄) 1 βˆ’ Ξ΄(1 βˆ’ Ξ΄)

  • 1 βˆ’ F(yβˆ— + 2)(1 βˆ’ Ξ΄R)
  • <latexit sha1_base64="FRgkm/e13O/nUwU92LtqPUkP8=">ACOHicbVDJSgNBEO1xN25RDx68NIqQKIaZKG4nQVBvLpgYyCShp1MTG3sWunuEMzRP/A/PHnxM7yJFw+KePUL7CTu8UHB6/eq6KrnhJxJZr3Rk9vX/A4NBwamR0bHwiPTlVlEkKBRowANRcogEznwoKY4lEIBxHM4nDrnOy3/9AKEZIF/opohVDzS8JnLKFaqUPLrDdAGy7gtA4n7GW7TpwRbJ/Em/NdthDf3azTSri0v57JdRjY+TtplNUrX0vJkz28DdxPog89tbe9eXV9mZw1r6zq4HNPLAV5QTKcuWGapKTIRilEOSsiMJIaHnpAFlTX3igazE7cMTvKCVOnYDoctXuK3+nIiJ2XTc3SnR9SZ/Ou1xP+8cqTcjUrM/DBS4NPOR27EsQpwK0VcZwKo4k1NCBVM74rpGdEZKp1J4TNFta+Tu4mxXzOWsmtHuk0NlAHQ2gWzaEMstA62kb76BAVEU36AE9oWfj1ng0XozXTmuP8TEzjX7BeHsHVGesIQ=</latexit>

β‡’ 2F(yβˆ— + 2) βˆ’ F(yβˆ—) ≀ 1 βˆ’ Ξ΄(1 βˆ’ Ξ΄) Ξ΄(1 βˆ’ Ξ΄)(1 βˆ’ Ξ΄R)

<latexit sha1_base64="lK7aAaiHonfkzLe6PLlSfGoCFw=">ACQ3icbZDfShtBFMZnra0xtjG2l94MipC0JOxGieldQBAvU2lUmo1hdnI2GTL7h5mzLemyj9FX8Dm8QW8wV60wtFelvoZGPFP/1g4Mf3ncPMfF4shUbvrIWXiy+fLVUWC6uvH5TWi2vT3SUaI4dHkI3XiMQ1ShNBFgRJOYgUs8CQce5O9WX78FZQWUfgZpzH0AzYKhS84Q2MNyl/cQzEaI1Mq+kYb+5Xp6fsPjWothyp1JVDXV4ynTs0dgkRW+QfVLH3q3NpephVs0F5067buehzcO5gs938YX+/LZ1BuVLdxjxJIAQuWRa9xw7xn7KFAouISu6iYaY8QkbQc9gyALQ/TvIKNbxhlSP1LmhEhz9+FGygKtp4FnJgOGY/0m5n/y3oJ+q1+KsI4Qj5/CI/kRQjOiuUDoUCjnJqgHElzFspHzPTGZrai3kJH2dq3n/5ORw16s52feTaNF5iqQdbJBKsQhu6RNDkiHdAkn5+QnuSY31oX1y7q1fs9HF6y7nXfkaw/fwHBhLGr</latexit>
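The value recursion and the SPE constraint can be evaluated numerically; a sketch assuming X is standard normal (the distribution and the parameter values below are illustrative, not from the slides):

```python
import math

def F(x):
    """CDF of X ~ N(0, 1)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def cooperation_value(delta, R, y_star):
    # v = (1 - delta) / (1 - delta * (1 - F(y*) * (1 - delta**R)))
    return (1 - delta) / (1 - delta * (1 - F(y_star) * (1 - delta ** R)))

delta, R, y_star = 0.95, 5, -2.0
v = cooperation_value(delta, R, y_star)

# Fixed-point check against v = (1-d)(1+0) + F(y*) d^{R+1} v + (1-F(y*)) d v
rhs = (1 - delta) + F(y_star) * delta ** (R + 1) * v + (1 - F(y_star)) * delta * v
assert abs(v - rhs) < 1e-12

# SPE constraint: F(y*+2) - 2 F(y*) >= (1 - delta) / (delta (1 - delta^R))
assert F(y_star + 2) - 2 * F(y_star) >= (1 - delta) / (delta * (1 - delta ** R))
```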
SLIDE 26

Questions?

SLIDE 27

Acknowledgement

  • This lecture is a slightly modified version of one prepared by
  • Asu Ozdaglar [MIT 6.254]