SLIDE 1

Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games

H.L. Prasad†, Prashanth L.A.♯ and Shalabh Bhatnagar♯

†Streamoid Technologies, Inc.    ♯Indian Institute of Science

SLIDE 2

Multi-agent RL setting

[Diagram: agents 1, 2, …, N interacting with a common environment]

At each step, the agents jointly choose action a = ⟨a1, a2, …, aN⟩; the environment returns reward r = ⟨r1, r2, …, rN⟩ and next state y.

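A minimal sketch of this interaction protocol (the environment dynamics and reward rule below are invented placeholders, not from the paper):

```python
import numpy as np

# Toy multi-agent interaction loop. The dynamics and rewards are placeholders.
rng = np.random.default_rng(0)
N, num_states, num_actions = 3, 5, 2

def env_step(state, joint_action):
    """Return per-agent rewards r = <r1,...,rN> and the next state y."""
    y = (state + sum(joint_action)) % num_states                 # toy dynamics
    r = [float(ai == joint_action[0]) for ai in joint_action]    # toy rewards
    return r, y

state = 0
for t in range(3):
    a = [int(rng.integers(num_actions)) for _ in range(N)]       # a = <a1,...,aN>
    r, state = env_step(state, a)                                # r = <r1,...,rN>, next state y
    print(f"t={t} a={a} r={r} y={state}")
```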

SLIDE 3

Problem area

[Diagram: nested problem classes]

  • Stochastic games: (N, S, A, p, r, β), N agents
  • Markov decision processes: (S, A, p, r, β), single agent
  • Normal-form games: (N, A, r), N agents
  • Markov chains: (S, p)

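As one concrete reading of these tuples, here is a sketch of a container for a finite general-sum stochastic game; the field layout and the uniform action count are assumptions, and the other classes fall out as special cases:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class StochasticGame:
    """Container for the tuple (N, S, A, p, r, beta); illustrative layout only."""
    N: int              # number of agents
    num_states: int     # |S|
    num_actions: int    # |A^i(x)|, taken to be the same for every agent and state
    p: np.ndarray       # p[x, a1, ..., aN, y]: transition probabilities
    r: np.ndarray       # r[i, x, a1, ..., aN]: reward to agent i
    beta: float         # discount factor in (0, 1)

# N = 1 recovers an MDP (S, A, p, r, beta); a single state recovers a
# normal-form game (N, A, r); fixing the policy leaves a Markov chain (S, p).
```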

SLIDE 4

Problem area (revisited)

[Diagram: stochastic games and normal-form games, each split into zero-sum and general-sum cases; the target here is the general-sum stochastic-game cell]

Design objective: an online algorithm that converges to a Nash equilibrium1

1If NE is a useful objective for learning in games, then we have a strong contribution!

SLIDE 5

A General Optimization Problem

SLIDE 6

Value function

$$v_\pi(s) = E\Big[\, \sum_{t} \beta^t \sum_{a \in A(s_t)} r(s_t, a)\, \pi(s_t, a) \,\Big|\, s_0 = s \Big]$$

(Here $r$ is the reward and $\pi$ the policy.)

A stationary Markov strategy $\pi^* = \langle \pi^{1*}, \pi^{2*}, \ldots, \pi^{N*} \rangle$ is said to be Nash if

$$v^i_{\pi^*}(s) \ge v^i_{\pi^i, \pi^{-i*}}(s), \quad \forall \pi^i,\ \forall i,\ \forall s \in S$$

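When the model is known and finite, $v_\pi$ can be computed exactly by solving the linear system $v = r_\pi + \beta P_\pi v$. A minimal sketch for a toy two-agent game (the shapes and random model are assumptions for illustration):

```python
import numpy as np

# Exact policy evaluation for agent 0 under a fixed joint policy pi.
rng = np.random.default_rng(1)
S, A, beta = 4, 2, 0.9
p = rng.dirichlet(np.ones(S), size=(S, A, A))   # p[x, a1, a2, :] = next-state distribution
r = rng.standard_normal((2, S, A, A))           # r[i, x, a1, a2]
pi = np.full((2, S, A), 1.0 / A)                # pi[i, x, a^i]: uniform policies

# Marginalize rewards and transitions over the joint policy.
w = np.einsum('xa,xb->xab', pi[0], pi[1])       # w[x, a1, a2] = pi1(x, a1) pi2(x, a2)
r_pi = np.einsum('xab,xab->x', w, r[0])         # expected one-step reward to agent 0
P_pi = np.einsum('xab,xaby->xy', w, p)          # state-transition matrix under pi

# v_pi solves (I - beta * P_pi) v = r_pi, matching v_pi(s) = E[sum_t beta^t r | s0 = s].
v0 = np.linalg.solve(np.eye(S) - beta * P_pi, r_pi)
print(v0)
```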

SLIDE 7

Dynamic Programming Idea

$$v^i_{\pi^*}(x) = \max_{\pi^i(x) \in \Delta(A^i(x))} E_{\pi^i(x)}\big[\, Q^i_{\pi^{-i*}}(x, a^i) \,\big]$$

The left-hand side is the optimal (Nash) value; the right-hand side is the marginal value after fixing $a^i \sim \pi^i$. The Q-value is given by

$$Q^i_{\pi^{-i}}(x, a^i) = E_{\pi^{-i}(x)}\Big[\, r^i(x, a) + \beta \sum_{y \in U(x)} p(y \mid x, a)\, v^i(y) \,\Big]$$

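A sketch of the Q-value computation for one agent of a toy two-agent game (all quantities below are invented for illustration):

```python
import numpy as np

# Q^0_{pi^{-0}}(x, a1): Q-values for agent 0 given a value estimate v and the
# opponent's fixed policy pi2.
rng = np.random.default_rng(1)
S, A, beta = 4, 2, 0.9
p = rng.dirichlet(np.ones(S), size=(S, A, A))   # p[x, a1, a2, :]
r = rng.standard_normal((2, S, A, A))           # r[i, x, a1, a2]
pi2 = np.full((S, A), 1.0 / A)                  # opponent policy pi^{-0}(x, .)
v = np.zeros(S)                                 # current value estimate for agent 0

# Inner bracket: r^0(x, a) + beta * sum_y p(y|x, a) v(y), for every joint action a.
target = r[0] + beta * np.einsum('xaby,y->xab', p, v)
# Average out the opponent's action a2 under pi2 to obtain Q[x, a1].
Q = np.einsum('xab,xb->xa', target, pi2)

# For a fixed Q, the max over distributions pi^0(x) of E[Q(x, a1)] is Q[x, :].max().
print(Q.max(axis=1))
```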

SLIDE 8

Optimization problem in informal terms

Need to solve:

$$v^i_{\pi^*}(x) = \max_{\pi^i(x) \in \Delta(A^i(x))} E_{\pi^i(x)}\big[\, Q^i_{\pi^{-i*}}(x, a^i) \,\big] \qquad (1)$$

Formulation:

  • Objective: minimize the Bellman error $v^i(x) - E_{\pi^i} Q^i_{\pi^{-i}}(x, a^i)$ in every state, for every agent
  • Constraint 1: ensure each policy $\pi^i$ is a probability distribution
  • Constraint 2: $Q^i_{\pi^{-i}}(x, a^i) \le v^i_\pi(x)$, a proxy for the max in (1)

SLIDE 9

Optimization problem in formal terms

$$\min_{v, \pi}\ f(v, \pi) = \sum_{i=1}^{N} \sum_{x \in S} \Big( v^i(x) - E_{\pi^i} Q^i_{\pi^{-i}}(x, a^i) \Big)$$

subject to

$$\pi^i(x, a^i) \ge 0, \quad \forall a^i \in A^i(x),\ x \in S,\ i = 1, 2, \ldots, N,$$

$$\sum_{a^i \in A^i(x)} \pi^i(x, a^i) = 1, \quad \forall x \in S,\ i = 1, 2, \ldots, N,$$

$$Q^i_{\pi^{-i}}(x, a^i) \le v^i(x), \quad \forall a^i \in A^i(x),\ x \in S,\ i = 1, 2, \ldots, N.$$

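A sketch of how the objective $f(v, \pi)$ and the constraint residuals could be evaluated numerically for a toy two-agent game (shapes and helper names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
S, A, beta = 4, 2, 0.9
p = rng.dirichlet(np.ones(S), size=(S, A, A))   # p[x, a1, a2, :]
r = rng.standard_normal((2, S, A, A))           # r[i, x, a1, a2]
pi = np.full((2, S, A), 1.0 / A)                # candidate policies; rows sum to 1
v = rng.standard_normal((2, S))                 # candidate value variables

def q_values(i):
    """Q^i_{pi^{-i}}(x, a^i), with agent i's own action moved to axis 1."""
    ri = r[i] if i == 0 else np.transpose(r[i], (0, 2, 1))
    pp = p if i == 0 else np.transpose(p, (0, 2, 1, 3))
    target = ri + beta * np.einsum('xaby,y->xab', pp, v[i])
    return np.einsum('xab,xb->xa', target, pi[1 - i])

Q = np.stack([q_values(0), q_values(1)])        # Q[i, x, a^i]

# Objective: f(v, pi) = sum_i sum_x ( v^i(x) - E_{pi^i} Q^i(x, a^i) ).
f = sum((v[i] - (pi[i] * Q[i]).sum(axis=1)).sum() for i in range(2))

# Constraints: pi >= 0, each pi^i(x, .) sums to 1, and Q^i(x, a^i) <= v^i(x).
feasible = (pi >= 0).all() and np.allclose(pi.sum(axis=2), 1.0) \
           and (Q <= v[:, :, None] + 1e-12).all()
print(f, feasible)
```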

SLIDE 10

Solution approach

  • Usual approach: apply the KKT conditions to solve the general optimization problem
  • Caveat: this imposes a tricky linear-independence requirement
  • Alternative: use a simpler set of SG-SP conditions

SLIDE 11

A sufficient condition

SG-SP point: A point $(v^*, \pi^*)$ is said to be an SG-SP point if it is feasible and, for all $x \in S$ and $i \in \{1, 2, \ldots, N\}$,

$$\pi^{i*}(x, a^i)\, g^i_{x,a^i}(v^{i*}, \pi^{-i*}(x)) = 0, \quad \forall a^i \in A^i(x),$$

where $g^i_{x,a^i}(v^i, \pi^{-i}(x)) := Q^i_{\pi^{-i}}(x, a^i) - v^i(x)$.

Nash ⇔ SG-SP: a strategy $\pi^*$ is Nash if and only if $(v^*, \pi^*)$ is an SG-SP point.

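A sketch of the SG-SP test for a single agent and state, assuming $v$, $\pi$ and $Q$ are given (the function and tolerances are illustrative):

```python
import numpy as np

def is_sgsp(v, pi, Q, tol=1e-8):
    """Check the SG-SP condition for one agent in one state (illustrative).
    v: scalar v^i(x); pi: distribution pi^i(x, .); Q: vector Q^i_{pi^{-i}}(x, .)."""
    g = Q - v                                       # g^i_{x,a} = Q^i(x, a) - v^i(x)
    feasible = (pi >= -tol).all() and abs(pi.sum() - 1) < tol and (g <= tol).all()
    complementary = np.all(np.abs(pi * g) < tol)    # pi^i(x,a) g^i_{x,a} = 0 for all a
    return feasible and complementary

# Example: mixing only over actions whose Q-value attains v satisfies SG-SP.
Q = np.array([1.0, 1.0, 0.3])
print(is_sgsp(v=1.0, pi=np.array([0.5, 0.5, 0.0]), Q=Q))   # True
print(is_sgsp(v=1.0, pi=np.array([0.2, 0.3, 0.5]), Q=Q))   # False: mass where g < 0
```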

SLIDE 12

An Online Algorithm: ON-SGSP

SLIDE 13

ON-SGSP’s decentralized online learning model

[Diagram: agents 1, 2, …, N each run ON-SGSP locally; agent i sends its action ai to the environment and observes the reward r and next state y]

SLIDE 14

ON-SGSP - operational flow

[Diagram: policy evaluation produces the value $v^i_\pi$, policy improvement produces the policy $\pi^i$, each feeding the other]

  • Policy evaluation: estimate the value function using temporal difference (TD) learning
  • Policy improvement: perform gradient descent on the policy along a descent direction
  • The descent direction ensures convergence to a global minimum of the optimization problem (see the sketch below)
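A schematic of the two-timescale pattern behind this flow, with TD(0) on the faster step size and a policy step on the slower one; this illustrates the coupling only, not the paper's exact ON-SGSP update rules:

```python
import numpy as np

# Two-timescale loop for one agent: the value estimate moves on the faster
# step size b_n, the policy on the slower a_n, with a_n / b_n -> 0 so the
# evaluation effectively sees a quasi-static policy.
rng = np.random.default_rng(3)
S, A, beta = 4, 2, 0.9
v = np.zeros(S)
pi = np.full((S, A), 1.0 / A)

def step_sizes(n):
    return 1.0 / (n + 1), 1.0 / (n + 1) ** 0.6      # a_n (slow), b_n (fast)

x = 0
for n in range(10000):
    a = rng.choice(A, p=pi[x])
    r_n = float(a == x % A)                          # toy reward
    y = int(rng.integers(S))                         # toy transition
    a_n, b_n = step_sizes(n)

    # Faster timescale: TD(0) policy evaluation.
    delta = r_n + beta * v[y] - v[x]
    v[x] += b_n * delta

    # Slower timescale: move the policy along a TD-error-based direction,
    # then project back onto the simplex (illustrative improvement step).
    pi[x, a] += a_n * delta
    pi[x] = np.clip(pi[x], 1e-6, None)
    pi[x] /= pi[x].sum()
    x = y
print(v, pi, sep="\n")
```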

SLIDE 15

More on the descent direction

Descend along

$$\pi^i(x, a^i)\; g^i_{x,a^i}(v^i, \pi^{-i}) \times \overline{\mathrm{sgn}}\!\left(\frac{\partial f(v, \pi)}{\partial \pi^i}\right)$$

  • TD-learning for policy evaluation
  • From Lagrange multiplier and slack-variable theory
  • The solution tracks an ODE whose limit is an SG-SP point

1$\overline{\mathrm{sgn}}$ is a continuous version of sgn
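A sketch of this descent direction for one state, using tanh as the continuous sgn surrogate (one common choice; the paper's exact $\overline{\mathrm{sgn}}$ and the gradient values below are assumptions):

```python
import numpy as np

def sgn_bar(z, eps=1e-2):
    """Continuous surrogate for sgn; tanh is one common choice, the paper's
    exact form may differ."""
    return np.tanh(z / eps)

# Illustrative descent direction for agent i in state x:
# direction_a = pi^i(x, a) * g^i_{x,a}(v^i, pi^{-i}) * sgn_bar(df/dpi^i(x, a))
pi_x = np.array([0.5, 0.3, 0.2])      # pi^i(x, .)
g = np.array([-0.2, 0.1, -0.4])       # g = Q - v, as on the SG-SP slide
grad_f = np.array([0.3, -0.5, 0.1])   # partial of f w.r.t. pi^i(x, .) (assumed given)

direction = pi_x * g * sgn_bar(grad_f)
pi_x_new = pi_x - 0.1 * direction     # small step; a simplex projection would follow
print(direction, pi_x_new)
```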

SLIDE 16

Experiments

SLIDE 17

A single state non-generic 2-player game

Payoff matrix (Player 1 picks the row, Player 2 the column; entries are r1, r2):

            a1      a2      a3
    a1     1, 0    0, 1    1, 0
    a2     0, 1    1, 0    1, 0
    a3     0, 1    0, 1    1, 1

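Both limit points reported on the next slide, ((0.5, 0.5, 0), (0.5, 0.5, 0)) and ((0, 0, 1), (0, 0, 1)), can be verified to be Nash equilibria of this bimatrix game with a short best-response test (the helper below is illustrative):

```python
import numpy as np

R1 = np.array([[1, 0, 1],
               [0, 1, 1],
               [0, 0, 1]], dtype=float)   # payoffs to Player 1
R2 = np.array([[0, 1, 0],
               [1, 0, 0],
               [1, 1, 1]], dtype=float)   # payoffs to Player 2

def is_nash(x, y, tol=1e-9):
    """True if neither player can gain by deviating unilaterally from (x, y)."""
    u1, u2 = x @ R1 @ y, x @ R2 @ y
    return (R1 @ y).max() <= u1 + tol and (x @ R2).max() <= u2 + tol

print(is_nash(np.array([.5, .5, 0.]), np.array([.5, .5, 0.])))   # True
print(is_nash(np.array([0., 0., 1.]), np.array([0., 0., 1.])))   # True
```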

SLIDE 18

A single state non-generic 2-player game

Results from 100 simulation runs:

                                              NashQ    FFQ (Friend Q)    ON-SGSP
  Oscillate or converge to non-Nash strategy   95%          40%             0%
  Converge to (0.5, 0.5, 0)                     2%           0%            99%
  Converge to (0, 0, 1)                         3%          60%             1%

SLIDE 19

Stick-Together Game

Figure: Stick Together Game for M = 3

For M = 30, STG has 810,000 (= 900 × 900) states!

SLIDE 20

Results for STG with M = 30

[Plot: average distance dn vs. number of iterations (up to 5 × 10^7), comparing FFQ, NashQ and ON-SGSP]

ON-SGSP takes the agents to within a 4 × 4 grid, while NashQ/FFQ only reach an 8 × 8 grid. Moreover, FFQ/NashQ have higher per-iteration complexity than ON-SGSP.

SLIDE 21

Thank You!
