
ECE700.07: Game Theory with Engineering Applications, Lecture 6: Repeated Games

  1. ECE700.07: Game Theory with Engineering Applications • Lecture 6: Repeated Games • Seyed Majid Zahedi

  2. Outline
     • Finitely and infinitely repeated games with and without perfect monitoring
     • Trigger strategies
     • Folk theorems
     • Readings: MAS Sec. 6.1; GT Sec. 5.1 and 5.5

  3. Finitely Repeated Games (with Perfect Monitoring)
     • In repeated games, stage game $G$ is played by same agents for $R$ rounds
     • Agents discount utilities by discount factor $0 \le \delta \le 1$
     • Game is denoted by $G^R(\delta)$
     • At each round, outcomes of all past rounds are observed by all agents
     • Agents' overall utility is sum of discounted utilities at each round
     • Given sequence of utilities $u_i^{(1)}, \dots, u_i^{(R)}$:
       $$u_i = \sum_{r=1}^{R} \delta^{r-1} u_i^{(r)}$$
     • In general, strategies at each round could depend on history of play
     • Memory-less (also called stationary) strategies are special cases
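
The discounted sum above can be sketched in a few lines of Python. The function name `discounted_utility` and the example stream are illustrative assumptions, not part of the lecture.

```python
def discounted_utility(stage_utilities, delta):
    """Sum of delta**(r-1) * u_i^(r) over rounds r = 1..R."""
    if not 0 <= delta <= 1:
        raise ValueError("discount factor must lie in [0, 1]")
    # enumerate starts at 0, so round r is weighted by delta**(r-1)
    return sum(delta ** k * u for k, u in enumerate(stage_utilities))

# Hypothetical stream: utility -2 in each of three rounds, delta = 0.5
total = discounted_utility([-2, -2, -2], 0.5)
print(total)
```

With `delta = 1` the function reduces to the plain sum of stage utilities, and with `delta = 0` only the first round counts.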

  4. Example: Finitely-Repeated Prisoners' Dilemma
     • Suppose that Prisoners' Dilemma is played for $R$ ($< \infty$) rounds

       Prisoner 1 \ Prisoner 2 | Stay Silent | Confess
       Stay Silent             | (-1, -1)    | (-3, 0)
       Confess                 | (0, -3)     | (-2, -2)

     • What is SPE of this game?
     • We can use backward induction
       • Starting from last round, C is dominant strategy for each agent, so (C, C) is played
       • Given this, regardless of history, (C, C) is played at each earlier round
     • There exists unique SPE which is (C, C) at each round
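
The first step of the backward induction relies on Confess being strictly dominant in the stage game. A minimal sketch that checks this from the payoff table follows; the helper name `strictly_dominant_actions` is an assumption made for illustration, and only the row player is checked because the game is symmetric.

```python
# Row player's utilities in the Prisoners' Dilemma from the slide.
# Keys are (own action, opponent action); "S" = stay silent, "C" = confess.
U = {("S", "S"): -1, ("S", "C"): -3, ("C", "S"): 0, ("C", "C"): -2}
ACTIONS = ("S", "C")

def strictly_dominant_actions(u, actions):
    """Actions that do strictly better than every other action
    against every opponent action."""
    return [
        a for a in actions
        if all(u[(a, o)] > u[(b, o)]
               for b in actions if b != a
               for o in actions)
    ]

dominant = strictly_dominant_actions(U, ACTIONS)
print(dominant)
```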

  5. SPE in Finitely Repeated Games
     [Theorem]
     • If stage game $G$ has unique pure strategy equilibrium $s^*$, then $G^R(\delta)$ has unique SPE in which $s^{(r)} = s^*$ for all $r = 1, \dots, R$, regardless of history
     [Proof]
     • By backward induction, at round $R$, we have $s^{(R)} = s^*$
     • Given this, we have $s^{(R-1)} = s^*$, and continuing inductively, $s^{(r)} = s^*$ for all $r = 1, \dots, R$, regardless of history

  6. Infinitely Repeated Games
     • Infinitely repeated play of $G$ with discount factor $\delta$ is denoted by $G^\infty(\delta)$
     • Agents' utility is average of discounted utilities at each round
     • For $\delta < 1$, given sequence of utilities $u_i^{(1)}, u_i^{(2)}, \dots$:
       $$u_i = (1 - \delta) \sum_{r=1}^{\infty} \delta^{r-1} u_i^{(r)}$$
     • For $\delta = 1$, given sequence of utilities $u_i^{(1)}, u_i^{(2)}, \dots$:
       $$u_i = \lim_{R \to \infty} \frac{1}{R} \sum_{r=1}^{R} u_i^{(r)}$$
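
As a quick check on why the factor $(1 - \delta)$ makes the discounted sum an average, consider a constant stream $u_i^{(r)} = c$ in every round (the constant $c$ is introduced here only for illustration). For $\delta < 1$ the geometric series gives:

```latex
u_i = (1 - \delta) \sum_{r=1}^{\infty} \delta^{r-1} c
    = (1 - \delta) \cdot \frac{c}{1 - \delta}
    = c
```

So an agent who receives $c$ in every round has normalized utility exactly $c$, which puts repeated-game utilities on the same scale as stage-game utilities.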

  7. Trigger Strategies (TS)
     • Agents get punished if they deviate from agreed profile
     • In non-forgiving TS (or grim TS), punishment continues forever
       $$s_i^{(t)} = \begin{cases} s_i^* & \text{if } s^{(r)} = s^*, \ \forall r < t \\ s_i^j & \text{if } s_j^{(r)} \neq s_j^*, \ \exists r < t \end{cases}$$
     • Here, $s^*$ is agreed profile, and $s_i^j$ is punishment strategy of $i$ for agent $j$
     • Single deviation by $j$ triggers agent $i$ to switch to $s_i^j$, forever
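
One way to read the grim trigger rule is as a function from the observed history to an action. The sketch below is a hedged illustration: the name `grim_trigger`, the tuple encoding of profiles, and the choice to punish the earliest deviator found are all assumptions. It also treats an agent's own past deviation as a trigger, which the definition above leaves open.

```python
def grim_trigger(history, agreed, punishment, me):
    """Action of agent `me` given the list of past action profiles.

    history:    past profiles, each a tuple with one action per agent
    agreed:     agreed profile s* (one action per agent)
    punishment: punishment[j] is what `me` plays once agent j has deviated
    """
    for profile in history:                  # scan rounds r < t in order
        for j, action in enumerate(profile):
            if action != agreed[j]:
                return punishment[j]         # a deviation occurred: punish forever
    return agreed[me]                        # no deviation so far: follow s*

# Prisoners' Dilemma encoding (assumed): agree on (S, S), punish with C.
agreed, punishment = ("S", "S"), ("C", "C")
print(grim_trigger([], agreed, punishment, me=0))
print(grim_trigger([("S", "S"), ("S", "C"), ("S", "S")], agreed, punishment, me=0))
```

Because the function rescans the whole history each round, a single past deviation keeps returning the punishment action in every later round, which is what makes the strategy non-forgiving.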

  8. Trigger Strategies in Repeated Prisoners' Dilemma
     • Suppose both agents use following trigger strategy
       • Play S unless someone has ever played C in past
       • Play C forever if someone has played C in past

       Prisoner 1 \ Prisoner 2 | Stay Silent | Confess
       Stay Silent             | (-1, -1)    | (-3, 0)
       Confess                 | (0, -3)     | (-2, -2)

     • Under what conditions is this SPE?
     • We use one-stage deviation principle
     • Step 1: (S is best response to S)
       • Utility from S: $-(1 - \delta)(1 + \delta + \delta^2 + \cdots) = -(1 - \delta)/(1 - \delta) = -1$
       • Utility from C: $-(1 - \delta)(0 + 2\delta + 2\delta^2 + \cdots) = -2\delta(1 - \delta)/(1 - \delta) = -2\delta$
       • S is better than C if $\delta \ge 1/2$
     • Step 2: (C is best response to C)
       • Other agents will always play C, thus C is best response
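
The two closed forms in Step 1 can be cross-checked numerically by truncating the infinite sums. This is a sketch under stated assumptions: the function names and the truncation horizon of 5000 rounds are illustrative choices, and the small tolerance only absorbs floating-point error.

```python
def normalized_utility(utility_at, delta, horizon=5000):
    """(1 - delta) * sum_{r=1}^{horizon} delta**(r-1) * utility_at(r), truncated."""
    return (1 - delta) * sum(
        delta ** (r - 1) * utility_at(r) for r in range(1, horizon + 1)
    )

def conform(r):
    """(S, S) in every round: utility -1."""
    return -1

def deviate(r):
    """Confess in round 1 against S (utility 0), then (C, C) forever (utility -2)."""
    return 0 if r == 1 else -2

for delta in (0.4, 0.5, 0.6):
    u_s = normalized_utility(conform, delta)
    u_c = normalized_utility(deviate, delta)
    # S is at least as good as C exactly when -1 >= -2 * delta
    print(delta, round(u_s, 6), round(u_c, 6), u_s >= u_c - 1e-9)
```

The printed comparison should flip from unfavorable to favorable as `delta` crosses 1/2, matching the condition on the slide.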

  9. Remarks
     • Cooperation is equilibrium, but so are many other strategy profiles
     • If $s^*$ is NE of $G$, then "each agent plays $s_i^*$" is SPE of $G^R(\delta)$
       • Future play of other agents is independent of how each agent plays
       • Optimal play is to maximize current utility, i.e., play static best response
     • Sets of equilibria for finite and infinite horizon versions can be different
       • Multiplicity of equilibria in repeated prisoners' dilemma only occurs at $R = \infty$
       • For any finite $R$ (thus for $R \to \infty$), repeated prisoners' dilemma has unique SPE

  10. TS in Finitely Repeated Games
      • If $G$ has multiple equilibria, then $G^R(\delta)$ does not have unique SPE
      • Consider following example

        Agent 1 \ Agent 2 | x       | y       | z
        x                 | (3, 3)  | (0, 4)  | (-2, 0)
        y                 | (4, 0)  | (1, 1)  | (-2, 0)
        z                 | (0, -2) | (0, -2) | (-1, -1)

      • Stage game has two pure NE: (y, y) and (z, z)
      • Socially optimal outcome, (x, x), is not equilibrium
      • In twice repeated play, we can support (x, x) in first round
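
The claim about the stage game's pure equilibria can be checked by brute force over all nine action profiles. The sketch below copies the payoffs from the table; the names `PAYOFF` and `pure_nash_equilibria` are illustrative assumptions.

```python
from itertools import product

ACTIONS = ("x", "y", "z")
# PAYOFF[(a1, a2)] = (utility of agent 1, utility of agent 2), from the slide.
PAYOFF = {
    ("x", "x"): (3, 3),  ("x", "y"): (0, 4),  ("x", "z"): (-2, 0),
    ("y", "x"): (4, 0),  ("y", "y"): (1, 1),  ("y", "z"): (-2, 0),
    ("z", "x"): (0, -2), ("z", "y"): (0, -2), ("z", "z"): (-1, -1),
}

def pure_nash_equilibria(payoff, actions):
    """Profiles where neither agent gains from a unilateral deviation."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = payoff[(a1, a2)]
        agent1_ok = all(payoff[(d, a2)][0] <= u1 for d in actions)
        agent2_ok = all(payoff[(a1, d)][1] <= u2 for d in actions)
        if agent1_ok and agent2_ok:
            equilibria.append((a1, a2))
    return equilibria

equilibria = pure_nash_equilibria(PAYOFF, ACTIONS)
print(equilibria)
```

The profile (x, x) is absent from the result because either agent can switch to y and raise its utility from 3 to 4.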

  11. TS in Finitely Repeated Games (cont.)
      • Trigger strategy
        • Play x in first round
        • Play y in second round if opponent played x; otherwise, play z
      • We can use one-shot deviation principle
        • For simplicity, suppose $\delta = 1$
        • Playing x first and y next leads to utility of 4
        • Playing y first triggers opponent to play z next, which leads to utility 3
        • Deviation is not profitable!
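
The comparison of 4 against 3 can be reproduced with a short calculation. The sketch assumes the opponent follows the trigger strategy and that the agent plays a best reply in the second round (so after deviating it answers z with z); the names `U1`, `opponent_second_round`, and `best_total` are illustrative.

```python
ACTIONS = ("x", "y", "z")
# Agent 1's stage utilities U1[(own action, opponent action)] from the example table.
U1 = {
    ("x", "x"): 3, ("x", "y"): 0, ("x", "z"): -2,
    ("y", "x"): 4, ("y", "y"): 1, ("y", "z"): -2,
    ("z", "x"): 0, ("z", "y"): 0, ("z", "z"): -1,
}

def opponent_second_round(my_first_action):
    """Opponent's trigger rule: y if I played x in round 1, otherwise z."""
    return "y" if my_first_action == "x" else "z"

def best_total(my_first_action, delta=1.0):
    """Utility of a first-round action against x plus a best reply in round 2."""
    first = U1[(my_first_action, "x")]
    opponent_action = opponent_second_round(my_first_action)
    second = max(U1[(a, opponent_action)] for a in ACTIONS)
    return first + delta * second

totals = {a: best_total(a) for a in ACTIONS}
print(totals)
```

Passing a smaller `delta` shrinks the weight on the second round, so the same function can be used to explore how patient the agents must be for the first-round cooperation to survive.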

  12. Repetition Can Lead to Bad Outcomes
      • Consider this game

        Agent 1 \ Agent 2 | x      | y       | z
        x                 | (2, 2) | (2, 1)  | (0, 0)
        y                 | (1, 2) | (1, 1)  | (-1, 0)
        z                 | (0, 0) | (0, -1) | (-1, -1)

      • Strategy x strictly dominates y and z for both agents
      • Unique Nash equilibrium of stage game is (x, x)
      • If $\delta \ge 1/2$, this game has SPE in which (y, y) is played in every round
      • It is supported by slightly more complicated strategy than grim trigger
        • I. Play y in every round unless someone deviates, then go to II
        • II. Play z. If no one deviates go to I. If someone deviates stay in II
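
The threshold $\delta \ge 1/2$ can be checked by applying the one-shot deviation test in each of the two states. The sketch below uses normalized (average discounted) utilities and checks only agent 1, which suffices because the game is symmetric; the names `PRESCRIBED`, `continuation_values`, and `profitable_deviation_exists` are assumptions for illustration.

```python
ACTIONS = ("x", "y", "z")
# Agent 1's stage utilities U1[(own action, opponent action)] from the slide.
U1 = {
    ("x", "x"): 2, ("x", "y"): 2, ("x", "z"): 0,
    ("y", "x"): 1, ("y", "y"): 1, ("y", "z"): -1,
    ("z", "x"): 0, ("z", "y"): 0, ("z", "z"): -1,
}
PRESCRIBED = {"I": "y", "II": "z"}   # action each state tells both agents to play

def continuation_values(delta):
    """Normalized value of following the strategy from each state."""
    v_one = U1[("y", "y")]                                  # (y, y) forever
    v_two = (1 - delta) * U1[("z", "z")] + delta * v_one    # (z, z) once, then state I
    return {"I": v_one, "II": v_two}

def profitable_deviation_exists(delta, tol=1e-12):
    values = continuation_values(delta)
    for state, prescribed in PRESCRIBED.items():
        for action in ACTIONS:
            if action == prescribed:
                continue
            # Any unilateral deviation sends play to state II next round.
            deviation = (1 - delta) * U1[(action, prescribed)] + delta * values["II"]
            if deviation > values[state] + tol:
                return True
    return False

for delta in (0.4, 0.5, 0.6):
    print(delta, profitable_deviation_exists(delta))
```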

  13. Feasible and Individually Rational Utilities
      • $V$ = convex hull of $\{ v \in \mathbb{R}^{\mathcal{I}} \mid \text{there exists } s \in S \text{ such that } u(s) = v \}$
        • Utility in repeated game is just a weighted average of utilities in stage game
      • Note that $V \neq \{ v \in \mathbb{R}^{\mathcal{I}} \mid \text{there exists } \sigma \in \Sigma \text{ such that } u(\sigma) = v \}$
      • Recall minmax value of agent $i$:
        $$\underline{v}_i = \min_{\sigma_{-i}} \max_{s_i} u_i(s_i, \sigma_{-i})$$
      • Also recall minmax strategy against $i$:
        $$m_{-i}^{i} = \arg\min_{\sigma_{-i}} \max_{s_i} u_i(s_i, \sigma_{-i})$$
      • Utility vector $v \in \mathbb{R}^{\mathcal{I}}$ is strictly individually rational if $v_i > \underline{v}_i, \ \forall i$
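
As a small illustration of the minmax value, the sketch below computes it for the Prisoners' Dilemma used earlier in the lecture. It is a simplification: the definition minimizes over mixed strategies $\sigma_{-i}$, while this code searches only pure opponent actions. In this particular game the restriction does not change the value, because the agent's best-response utility is $-2p$ when the opponent confesses with probability $p$, which is smallest at $p = 1$. The name `pure_minmax` is an assumption.

```python
def pure_minmax(u, own_actions, opp_actions):
    """min over opponent actions of (max over own actions of u[(own, opp)]).

    Returns (minmax value, opponent action attaining it). Pure actions only.
    """
    return min(
        (max(u[(a, o)] for a in own_actions), o) for o in opp_actions
    )

# Row player's utilities, keyed by (own action, opponent action).
U = {("S", "S"): -1, ("S", "C"): -3, ("C", "S"): 0, ("C", "C"): -2}
value, punisher_action = pure_minmax(U, ("S", "C"), ("S", "C"))
print(value, punisher_action)
```

Under this calculation the cooperative utility of -1 per round exceeds the minmax value, so the vector (-1, -1) is strictly individually rational in the sense defined above.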
