Reinforcement learning with restrictions on the action set

Mario Bravo, Universidad de Chile
Joint work with Mathieu Faure (AMSE-GREQAM)

Outline
1. Introduction
2. The Model
3. Main Result
4. Examples
5. Sketch of the Proof

Introduction: Motivation

Most debated and studied learning procedure in game theory: fictitious play [Brown 51].

Example payoff matrix (Rock-Scissors-Paper, entries read as the row player's payoffs):

      R    S    P
R     0    1   -1
S    -1    0    1
P     1   -1    0

Consider an N-player normal form game which is repeated in discrete time. At each time, players compute a best response to the opponents' empirical average play. The idea is to study the asymptotic behavior of the empirical frequency of play of player $i$, $v^i_n$.
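
The procedure described above can be sketched in a few lines of Python for the two-player Rock-Scissors-Paper matrix. This is only an illustrative sketch: the function names, the initial actions and the tie-breaking rule (lowest index) are assumptions, not part of the talk.

```python
# Row player's payoffs in Rock-Scissors-Paper, actions ordered R, S, P.
# The game is zero-sum, so the column player's payoff is the negative.
A = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]

def best_response(payoff, opp_counts):
    """Index of an action maximizing payoff against the opponent's empirical counts.

    Ties are broken by lowest index (an arbitrary choice made for this sketch).
    """
    scores = [sum(payoff[a][b] * opp_counts[b] for b in range(len(opp_counts)))
              for a in range(len(payoff))]
    return scores.index(max(scores))

def fictitious_play(payoff, n_steps, first=(0, 1)):
    """Run two-player fictitious play and return both empirical frequencies."""
    k = len(payoff)
    # Column player's payoffs, indexed [own action][opponent action].
    payoff2 = [[-payoff[a][b] for a in range(k)] for b in range(k)]
    counts = [[0] * k, [0] * k]
    a1, a2 = first  # hypothetical initial actions
    for _ in range(n_steps):
        counts[0][a1] += 1
        counts[1][a2] += 1
        # Each player best-responds to the other's empirical play so far.
        a1, a2 = best_response(payoff, counts[1]), best_response(payoff2, counts[0])
    return [[c / n_steps for c in cs] for cs in counts]

v1, v2 = fictitious_play(A, 20000)
```

Here `v1` and `v2` play the role of the empirical frequencies $v^i_n$; in this zero-sum example they should approach the uniform distribution, in line with the convergence result for zero-sum games.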

Large body of literature devoted to the question of identifying classes of games where the empirical frequencies of play converge to the set of Nash equilibria of the underlying game:
- Zero-sum games [Robinson 51]
- General (non-degenerate) 2 × 2 games [Miyasawa 61]
- Potential games [Monderer and Shapley 96]

Recall that a game $G = (N, (S^i)_{i \in N}, (G^i)_{i \in N})$ is a potential game if there exists a function $\Phi : \prod_{k=1}^{N} S^k \to \mathbb{R}$ such that

$$G^i(s^i, s^{-i}) - G^i(r^i, s^{-i}) = \Phi(s^i, s^{-i}) - \Phi(r^i, s^{-i})$$

for all $s^i, r^i \in S^i$ and $s^{-i} \in S^{-i}$. Primary example: congestion games [Rosenthal 73].
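
As a small illustration of the potential condition, the following sketch checks the identity over all unilateral deviations of a finite game. The function name `is_potential`, the dictionary encoding and the 2 × 2 example (a prisoner's-dilemma-style game with made-up payoffs) are assumptions chosen for this example only.

```python
from itertools import product

def is_potential(action_sets, payoffs, phi):
    """Check G^i(s^i, s^-i) - G^i(r^i, s^-i) == phi(s^i, s^-i) - phi(r^i, s^-i)
    for every player i, every profile s and every unilateral deviation r^i."""
    for s in product(*action_sets):
        for i, actions in enumerate(action_sets):
            for r_i in actions:
                # Profile where only player i deviates from s^i to r^i.
                r = s[:i] + (r_i,) + s[i + 1:]
                if payoffs[i][s] - payoffs[i][r] != phi[s] - phi[r]:
                    return False
    return True

# Hypothetical symmetric 2 x 2 game: g1 is player 1's payoff, g2 is player 2's.
actions = [("C", "D"), ("C", "D")]
g1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
g2 = {(a, b): g1[(b, a)] for (a, b) in g1}
phi = {("C", "C"): 0, ("C", "D"): 2, ("D", "C"): 2, ("D", "D"): 3}
zero = {s: 0 for s in g1}

ok = is_potential(actions, [g1, g2], phi)
not_ok = is_potential(actions, [g1, g2], zero)
```

With these numbers `phi` satisfies the identity for both players, while the constant function `zero` does not.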

Convergence also holds for 2-player games where one of the players has only two actions [Berger 05].

New proofs and generalizations using stochastic approximation techniques [Benaim et al 05, Hofbauer and Sorin 06].

Several variations and applications in multiple domains (transportation, telecommunications, etc.).

Introduction: Problem

Players need a lot of information! Three main assumptions are made here:
(i) Each player knows the structure of the game, i.e. she knows her own payoff function, so she can compute a best response.
(ii) Each player is informed of the action selected by her opponents at each stage; thus she can compute the empirical frequencies.
(iii) Each player is allowed to choose any action at each time, so that she can actually play a best response.

Introduction: Dropping (i) and (ii)

One approach (among many others) is to assume that the agents observe only their realized payoff at each stage; payoff functions are unknown. This is the minimal information framework of the so-called reinforcement learning procedures [Borgers and Sarin 97, Erev and Roth 98].

Most work in this direction proceeds as follows:
a) Construct a sequence of mixed strategies which are updated taking into account the payoffs received (the only information agents have access to).
b) Study the convergence (or non-convergence) of this sequence.

Example: the payoff matrix is unknown, and only the entries reached by play are revealed (shown from the row player's side). After four stages:

      A    B    C    D    E
R     ?    ?    ?    1    ?
S     ?    2   -1    ?    ?
P     ?    ?  -10    ?    ?

                   Row player        Column player
Actions played:    R, S, S, P        D, C, B, C
Payoff received:   1, -1, 2, -10     -1, 1, -2, 10

How do players use the available information? Typically, it is supposed that players are given a rule of behavior (a choice rule) which depends on a state variable constructed by means of the aggregate information they gather.
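
To make the idea of a state variable and a choice rule concrete, here is a minimal sketch. It assumes the state variable is the running average payoff of each action and the choice rule is a logit (softmax) map with parameter `beta`; both are common choices picked for illustration and are not necessarily the ones used in this work. The four-stage history is the row player's history from the example above.

```python
import math

def update_average(avg, count, action, payoff):
    """Update the running average payoff of the action that was just played."""
    count[action] += 1
    avg[action] += (payoff - avg[action]) / count[action]

def logit_choice(avg, beta=1.0):
    """Map the state variable to a mixed action (assumed logit choice rule)."""
    m = max(avg.values())  # subtract the max for numerical stability
    weights = {a: math.exp(beta * (x - m)) for a, x in avg.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# The player only sees her own actions and realized payoffs.
avg = {"R": 0.0, "S": 0.0, "P": 0.0}
count = {"R": 0, "S": 0, "P": 0}
for action, payoff in [("R", 1), ("S", -1), ("S", 2), ("P", -10)]:
    update_average(avg, count, action, payoff)

sigma = logit_choice(avg)  # mixed action used at the next stage
```

Note that nothing in the update uses the opponent's actions or the payoff function, which is exactly the minimal information setting.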

Introduction: Dropping (iii)

Players have restrictions on their action set, due to limited computational capacity or even to physical restrictions. Some hypotheses are needed regarding players' ability to explore their action set.

For example:

      R    S    P
R     0    1   -1
S    -1    0    1
P     1   -1    0

[Figure: diagram on the actions R, S, P]

This kind of restriction was introduced recently by [Benaim and Raimond 10] in the fictitious play information framework.
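
One way to picture such a restriction is as a set of actions reachable from the current one. The sketch below assumes a hypothetical restriction (`ALLOWED`) in which a player may keep her action or move to one specific other action; the actual restriction in the slide's figure is not recoverable from the text, so this structure is an assumption.

```python
# Hypothetical restriction: from each action, the actions available next stage.
ALLOWED = {"R": {"R", "S"}, "S": {"S", "P"}, "P": {"P", "R"}}

def restricted_best_response(expected_payoff, current, allowed):
    """Best action among those reachable from the current action.

    Ties are broken alphabetically (an arbitrary choice made for this sketch).
    """
    candidates = sorted(allowed[current])
    return max(candidates, key=lambda a: expected_payoff[a])

# Expected payoffs of the row player in Rock-Scissors-Paper when the
# opponent is believed to play R: R gives 0, S gives -1, P gives 1.
expected = {"R": 0, "S": -1, "P": 1}

unrestricted = max(sorted(expected), key=lambda a: expected[a])
restricted = restricted_best_response(expected, "R", ALLOWED)
```

In this example the unrestricted best response is P, but a player currently at R can only reach R or S and therefore stays at R, which is why exploration hypotheses are needed.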

Introduction: Our contribution

In this work we drop all three assumptions.

The Model: Setting

Let $G = (N, (S^i)_{i \in N}, (G^i)_{i \in N})$ be a given finite normal form game.

$S = \prod_i S^i$ is the set of action profiles.

$\Delta(S^i)$ is the mixed action set for player $i$, i.e.

$$\Delta(S^i) = \Big\{ \sigma^i \in \mathbb{R}^{|S^i|} : \sum_{s^i \in S^i} \sigma^i(s^i) = 1, \ \sigma^i(s^i) \ge 0, \ \forall s^i \in S^i \Big\},$$

and $\Delta = \prod_i \Delta(S^i)$.

As usual, we use the notation $-i$ to exclude player $i$, namely $S^{-i}$ denotes the set $\prod_{j \ne i} S^j$ and $\Delta^{-i}$ the set $\prod_{j \ne i} \Delta(S^j)$.
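
The notation can be mirrored directly in code. The following sketch assumes mixed actions are stored as dictionaries from actions to probabilities and action profiles as tuples; the helper names `in_simplex` and `minus_i` are hypothetical.

```python
def in_simplex(sigma, tol=1e-9):
    """Check that sigma is a mixed action: nonnegative entries summing to 1."""
    nonnegative = all(p >= -tol for p in sigma.values())
    return nonnegative and abs(sum(sigma.values()) - 1) <= tol

def minus_i(profile, i):
    """Drop player i's component from a profile, giving an element of S^{-i}."""
    return profile[:i] + profile[i + 1:]

uniform = {"R": 1 / 3, "S": 1 / 3, "P": 1 / 3}
bad = {"R": 0.7, "S": 0.5, "P": -0.2}  # sums to 1 but has a negative entry
others = minus_i(("R", "D"), 0)        # the opponents' part of the profile
```

A product of such dictionaries, one per player, represents an element of $\Delta$.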
