
Learning and Efficiency in Games (with Dynamic Population), Éva Tardos - PowerPoint PPT Presentation



  1. Learning and Efficiency in Games (with Dynamic Population). Éva Tardos, Cornell. Joint work with Thodoris Lykouris and Vasilis Syrgkanis.

  2. Large population games: traffic routing • Traffic subject to congestion delays • Cars and packets follow shortest paths • Congestion game = cost (delay) depends only on congestion on edges

  3. Example 2: advertising auctions • Advertisers leave and join the system • Changes in system setup • Advertiser values change

  4. Questions + Motivation • Repeated game: How do players behave? • Nash equilibrium? • Today: Machine Learning • With players (or player objectives) changing over time • Efficiency loss due to selfish behavior of players (Price of Anarchy)

  5. Traffic Pattern (optimal): 100 units of traffic travel from A to B over edges A→C (delay x/100 hours), C→B (1 hour), A→D (1 hour), D→B (delay y/100 hours), plus a 0-minute shortcut C→D. Splitting the traffic evenly between A→C→B and A→D→B gives travel time 1.5 hours.

  6. Not Nash equilibrium! In the same network, the 0-minute shortcut C→D lets a player switch to the path A→C→D→B and arrive faster, so the 1.5-hour split is not stable. Nash: stable solution, no incentive to deviate.

  7. Nash equilibrium: every player takes A→C→D→B, so x = y = 100 and the travel time is 2 hours. Nash: stable solution, no incentive to deviate. But how did the players find it?
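For concreteness, here is a small sketch (in Python, not from the talk) that evaluates the network described on slides 5-7, assuming 100 units of traffic and the edge delays shown there; it reproduces the 1.5-hour optimal split and the 2-hour Nash outcome.

```python
# Travel times (in hours) in the routing example from the slides.
# Paths A->B: A->C->B, A->D->B, and the shortcut path A->C->D->B.
# x = traffic on edge A->C, y = traffic on edge D->B, total traffic = 100.

def path_times(x, y):
    """Return the travel time of each A->B path given the edge loads x, y."""
    a_c = x / 100.0      # congestible edge A->C
    d_b = y / 100.0      # congestible edge D->B
    return {
        "A->C->B": a_c + 1.0,            # 1-hour fixed edge C->B
        "A->D->B": 1.0 + d_b,            # 1-hour fixed edge A->D
        "A->C->D->B": a_c + 0.0 + d_b,   # 0-minute shortcut C->D
    }

# Optimal pattern: split 50/50 between A->C->B and A->D->B.
print(path_times(50, 50))    # both used paths take 1.5 hours, shortcut would take 1.0
# Not stable: each player prefers the shortcut path.

# Nash equilibrium: everyone takes the shortcut path, so x = y = 100.
print(path_times(100, 100))  # every path now takes 2 hours
```

The 0-minute shortcut is exactly what destabilizes the efficient split; this is the Braess-paradox flavor of the example.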

  8. Congestion game in Social Science (Kleinberg-Oren STOC'11): Which project should I try? • Each project j has a reward c_j • Each player i has a probability p_ij of solving project j • Fair credit: reward equally shared by discoverers. Uniform players and fair sharing = congestion game; unfair sharing and/or different abilities: Vetta utility game.

  9. Nash as Selfish Outcome? • Can the players find Nash? • Which Nash? Nash exists, but… finding Nash is • PPAD-hard in many games (Daskalakis-Goldberg-Papadimitriou'06) • a coordination problem (multiple Nash).

  10. Repeated games: each player i picks an action a_i^t at every time step t; the outcome at time t is (a_1^t, a_2^t, …, a_n^t). • Assume the same game each period • Player's value/cost is additive over periods.

  11. Learning outcome (actions a_i^t over time): early on, players may not know how to play or who the other players are; by later rounds they have a better idea…

  12. Nash equilibrium: after some point every player repeats the same action, i.e. the profile a is stable, with no regret for any alternate strategy x: cost_i(x, a_{-i}) ≥ cost_i(a).

  13. No-regret without stability: learning. For any fixed action x (with d options): Σ_t cost_i(a^t) ≤ Σ_t cost_i(x, a^t_{-i}) (no-regret). Regret: R_i(x,T) = Σ_t cost_i(a^t) − Σ_t cost_i(x, a^t_{-i}) ≤ o(T). Many simple rules ensure R_i(x,T) ≈ √(T log d) for all x: MWU (Hedge), Regret Matching, etc.
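A minimal sketch of the multiplicative-weights (Hedge) rule named on the slide; the learning-rate choice and the full-information loss feedback are standard textbook assumptions, not specifics of the talk.

```python
import math
import random

def hedge(d, T, loss_fn):
    """Hedge over d actions for T rounds; loss_fn(t, a) is action a's loss in [0, 1].

    With learning rate eta ~ sqrt(log d / T), the regret against any fixed
    action is O(sqrt(T log d)), the bound quoted on the slide.
    """
    eta = math.sqrt(math.log(d) / max(T, 1))
    weights = [1.0] * d
    actions = []
    for t in range(T):
        total = sum(weights)
        probs = [w / total for w in weights]
        actions.append(random.choices(range(d), weights=probs)[0])
        for a in range(d):  # full-information update: every action's loss is observed
            weights[a] *= math.exp(-eta * loss_fn(t, a))
    return actions
```

In the game setting of the slides, player i would run this with loss_fn(t, a) = cost_i(a, a^t_{-i}) rescaled to [0, 1].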

  14. No-regret without stability: learning. For any fixed action x (with d options), approximate no-regret: R_i(x,T) = Σ_t cost_i(a^t) − (1+ε) Σ_t cost_i(x, a^t_{-i}) ≤ o(T). Many simple rules ensure R_i(x,T) ≈ O(log d / ε) for all x: MWU (Hedge), Regret Matching, etc. [Foster, Li, Lykouris, Sridharan, T'16]

  15. Dynamics of rock-paper-scissors (figure: payoff matrix over R, P, S with entries 1, −1, −9, and the trajectory of the learning dynamic). Nash: play (1/3, 1/3, 1/3). The learning dynamic • doesn't converge • correlates play on shared history.
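The non-convergence is easy to see in simulation. The sketch below pits two Hedge learners against each other in plain rock-paper-scissors (win 1, lose −1, tie 0); the slide's matrix has skewed losses of −9, but the qualitative behavior, cycling around the uniform Nash rather than converging to it, is the same. All parameters here are illustrative.

```python
import math
import random

# Row player's payoff for actions (Rock, Paper, Scissors) vs. the column action.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def hedge_probs(weights):
    total = sum(weights)
    return [w / total for w in weights]

w1, w2 = [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]
eta, T = 0.1, 5000
for t in range(T):
    p1, p2 = hedge_probs(w1), hedge_probs(w2)
    a1 = random.choices(range(3), weights=p1)[0]
    a2 = random.choices(range(3), weights=p2)[0]
    # Full-information Hedge updates; payoffs rescaled to losses in [0, 1].
    w1 = [w1[a] * math.exp(-eta * (1 - PAYOFF[a][a2]) / 2) for a in range(3)]
    w2 = [w2[a] * math.exp(-eta * (1 - PAYOFF[a][a1]) / 2) for a in range(3)]
    m1, m2 = max(w1), max(w2)
    w1 = [w / m1 for w in w1]  # renormalize to avoid numerical underflow
    w2 = [w / m2 for w in w2]

print(hedge_probs(w1))  # keeps cycling around (1/3, 1/3, 1/3) instead of settling there
```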

  16. Main Question • Efficiency loss due to selfish behavior of players (Price of Anarchy) • In repeated game settings • With players (or player objectives) changing over time. Examples: internet routing and advertising auctions • Advertisers leave and join the system • Traffic changes over time • Advertiser values change.

  17. Result: routing, limit for very small users. Theorem (Roughgarden-T'02): In any network with continuous, non-decreasing cost functions and small users, the cost of Nash with rates r_i for all i is at most the cost of opt with rates 2r_i for all i. Nash equilibrium: stable solution where no player has an incentive to deviate. Price of Anarchy = cost of worst Nash equilibrium / "socially optimum" cost.
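A quick numeric sanity check of the bicriteria statement on the classic two-link Pigou example (delay x on one link, constant delay 1 on the other); this example is an illustration added here, not the network in the slide's figure.

```python
def pigou_costs(r):
    """Total delay of the Nash flow and of the optimal flow at traffic rate r.

    Two parallel links from s to t: link 1 has delay equal to its own flow x,
    link 2 has constant delay 1.
    """
    nash_x = min(r, 1.0)                 # Nash: use link 1 until its delay reaches 1
    nash = nash_x ** 2 + (r - nash_x) * 1.0
    opt_x = min(r, 0.5)                  # optimum: minimize x^2 + (r - x)
    opt = opt_x ** 2 + (r - opt_x) * 1.0
    return nash, opt

for r in (0.25, 0.5, 1.0, 2.0):
    nash_at_r, _ = pigou_costs(r)
    _, opt_at_2r = pigou_costs(2 * r)
    print(f"r={r}: cost(Nash at r)={nash_at_r:.3f} <= cost(Opt at 2r)={opt_at_2r:.3f}")
```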

  18. Quality of Learning outcomes: Price of Total Anarchy. Bounds average welfare assuming no-regret learners: Price of Total Anarchy = lim_{T→∞} [ (1/T) Σ_{t=1}^T cost(a^t) ] / "socially optimum" cost. [Blum, Hajiaghayi, Ligett, Roth, 2008]
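In code, the definition is just the ratio of the time-averaged social cost of the played sequence to the optimal cost; the small helper below (names and the sample numbers are illustrative) makes that concrete.

```python
def price_of_total_anarchy(social_costs, opt_cost):
    """social_costs[t] = cost(a^t) of the joint play at round t; opt_cost = optimal social cost."""
    return (sum(social_costs) / len(social_costs)) / opt_cost

# e.g. no-regret play in the routing example above, drifting toward the 2-hour Nash:
print(price_of_total_anarchy([1.6, 1.8, 1.9, 2.0, 2.0], 1.5))  # ~1.24, vs. Nash PoA 2/1.5
```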

  19. Result 2: routing with learning players. Theorem (Blum, Even-Dar, Ligett'06; Roughgarden'09): Price of Anarchy bounds developed for Nash equilibria extend to no-regret learning outcomes. Assumes a stable set of participants.

  20. Today: Dynamic Population. Classical model: • Game is repeated identically and nothing changes. Dynamic population model: • At each step t, each player i is replaced with an arbitrary new player with probability p • In a population of N players, Np players are replaced in expectation each step.
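A minimal sketch of the turnover process; the types and their distribution here are illustrative assumptions, since the model only says each player is independently replaced, with an arbitrary new type, with probability p each step.

```python
import random

def simulate_turnover(n, p, steps, new_type=lambda: random.random()):
    """Each step, every one of the n players is independently replaced
    (assigned a fresh type) with probability p."""
    types = [new_type() for _ in range(n)]
    replaced_total = 0
    for _ in range(steps):
        for i in range(n):
            if random.random() < p:
                types[i] = new_type()
                replaced_total += 1
    return replaced_total / steps      # average number of players replaced per step

print(simulate_turnover(n=1000, p=0.01, steps=200))  # concentrates around N*p = 10
```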

  21. Learning players can adapt… Goal: bound average welfare assuming adaptive no-regret learners: PoA = lim_{T→∞} Σ_{t=1}^T cost(a^t, v^t) / Σ_{t=1}^T Opt(v^t), where v^t is the vector of player types at time t, even when the rate of change is high, i.e. a large fraction can turn over at every step.

  22. Need for adaptive learning. Example: routing • Strategy = path • The best "fixed" strategy in hindsight is very weak in a changing environment • Learners can adapt to the changing environment.

  23. Need for adaptive learning. Example 2: matching (project selection) • Strategy = choose a project • The best "fixed" strategy in hindsight is very weak in a changing environment • Learners can adapt to the changing environment.

  24. Adaptive Learning • Adaptive regret [Hazan-Seshadhri'07, Luo-Schapire'15, Blum-Mansour'07, Lehrer'03]: for all players i, strategies x and intervals [τ1, τ2], R_i(x, τ1, τ2) = Σ_{t=τ1}^{τ2} cost_i(a^t; v^t) − Σ_{t=τ1}^{τ2} cost_i(x, a^t_{-i}; v^t) ≤ o(τ2 − τ1), at rates of ~ √(τ2 − τ1) ⇒ regret with respect to a strategy that changes k times is ≤ ~ √(kT).

  25. Adaptive Learning • Adaptive regret [Foster, Li, Lykouris, Sridharan, T'16]: for all players i, strategies x and intervals [τ1, τ2], R_i(x, τ1, τ2) = Σ_{t=τ1}^{τ2} [ cost_i(a^t; v^t) − (1+ε) cost_i(x, a^t_{-i}; v^t) ] ≤ O(k log d / ε), regret with respect to a strategy that changes k times. Using any of MWU (Hedge), Regret Matching, etc., mixed with a bit of "forgetting".
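One simple way to add the "bit of forgetting" mentioned on the slide is to geometrically discount old losses inside Hedge, so the learner can re-adapt after its environment changes; the sketch below is a heuristic illustration of that idea, with illustrative parameters, not the specific construction of Foster et al.

```python
import math
import random

def discounted_hedge(d, T, loss_fn, eta=0.1, discount=0.99):
    """Hedge over d actions whose accumulated losses decay by `discount` each round.

    Discounting bounds the influence of the distant past, giving an effective
    sliding window of roughly 1/(1 - discount) rounds, so the learner can track
    a comparator that changes a bounded number of times.
    """
    cum_loss = [0.0] * d
    actions = []
    for t in range(T):
        scores = [math.exp(-eta * L) for L in cum_loss]
        total = sum(scores)
        probs = [s / total for s in scores]
        actions.append(random.choices(range(d), weights=probs)[0])
        for a in range(d):
            cum_loss[a] = discount * cum_loss[a] + loss_fn(t, a)  # forget old losses
    return actions
```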

  26. Result (Lykouris, Syrgkanis, T'16): Bound average welfare close to the Price of Anarchy for Nash, even when the rate of change is high, p ≈ 1/log n with n players, assuming adaptive no-regret learners. - Worst-case change of player type ⇒ need for adapting to the changing environment - Sudden large change is unlikely.

  27. No-regret and Price of Anarchy. Low regret: R_i(x) = Σ_{t=1}^T cost_i(a^t; v^t) − Σ_{t=1}^T cost_i(x, a^t_{-i}; v^t) ≤ o(T). The best action varies with the choices of others… Consider the optimal solution: let x = a_i* be player i's choice in OPT. No regret for all players i: Σ_t cost_i(a^t) ≤ Σ_t cost_i(a_i*, a^t_{-i}). Players don't have to know a_i*.

  28. Proof Technique: Smoothness (Roughgarden'09). Consider the optimal solution: player i plays action a_i* in the optimum. No regret: Σ_t cost_i(a^t) ≤ Σ_t cost_i(a_i*, a^t_{-i}) (doesn't need to know a_i*). A game is (λ,μ)-smooth (λ > 0; μ < 1) if for all strategy vectors a: Σ_i cost_i(a_i*, a_{-i}) ≤ λ·OPT + μ·cost(a). A Nash equilibrium a has cost(a) ≤ λ/(1−μ) · Opt.
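Written out, the last step of the slide is the standard smoothness argument: combine the per-player Nash (or no-regret) condition with the smoothness inequality and rearrange.

```latex
\[
  \mathrm{cost}(a) \;=\; \sum_i \mathrm{cost}_i(a)
  \;\le\; \sum_i \mathrm{cost}_i(a_i^{*}, a_{-i})
  \;\le\; \lambda\,\mathrm{OPT} + \mu\,\mathrm{cost}(a)
  \quad\Longrightarrow\quad
  \mathrm{cost}(a) \;\le\; \frac{\lambda}{1-\mu}\,\mathrm{OPT}.
\]
```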
