Equilibria in large one-arm bandit games
A. Salomon, Université Paris 13, HEC Paris, November 25th, 2008

Outline: Model, Cutoff Strategies, Asymptotically Deterministic Equilibria, Results, An Example

Model: one-arm bandits

There are N one-arm bandit machines. When operated, a machine yields a random payoff.
Nature chooses a state Θ, which can be High ($\Theta = \bar\theta$) or Low ($\Theta = \underline\theta$). This state determines the payoff distribution of all N machines, i.e. the machines are perfectly correlated.
If the state is High, the expected payoff, also denoted $\bar\theta$, is positive. If the state is Low, the expected payoff, denoted $\underline\theta$, is negative.
The machines are operated sequentially and, conditionally on the state, payoffs are i.i.d. across stages and across machines.

Model: progress of the game

Each of the N players operates her machine in discrete time and must decide when to stop. The decision to stop is irreversible and yields an outside payoff normalized to zero.
At any stage n ≥ 1, the following sequence of events unfolds for each player i:
1. She decides to drop out or to stay in.
2. If she stays, she gets and observes a random payoff $X_n^i$.
3. She observes who stayed in.
Payoffs are private information, but decisions are publicly observed.
Players discount payoffs at rate δ: if a player stops at stage τ, her overall payoff is $\sum_{n=1}^{\tau-1} \delta^{n-1} X_n^i$.
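As a small illustration of the payoff formula, the sketch below computes the discounted sum for a player who stops at stage τ. The function name `discounted_payoff` and the sample numbers are illustrative assumptions, not part of the talk.

```python
def discounted_payoff(payoffs, tau, delta):
    """Overall payoff of a player who stops at stage tau.

    payoffs[n - 1] holds X_n, the payoff collected at stage n.
    The player collects X_1, ..., X_{tau-1}, each discounted by delta^(n-1),
    and then takes the outside option, which is worth zero.
    """
    return sum(delta ** (n - 1) * payoffs[n - 1] for n in range(1, tau))


# Hypothetical payoff stream, for illustration only.
stream = [1.0, -2.0, 3.0]
print(discounted_payoff(stream, tau=1, delta=0.9))  # stops immediately: empty sum
print(discounted_payoff(stream, tau=3, delta=0.9))  # 1.0 + 0.9 * (-2.0)
```

A player who stops at stage 1 never operates the machine, so the sum is empty and her payoff equals the outside option.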

Cutoff Strategies

At the beginning, all players have the same prior about the state of the world: $p_0 = P(\Theta = \bar\theta)$.
After each payoff, they can update this prior to get a private belief: $p_n^i = P(\Theta = \bar\theta \mid X_1^i, X_2^i, \dots, X_n^i)$.
They also learn from others' behaviour. At a given stage n, the status of the players can be summed up in a random vector $\vec\alpha_n$, whose coordinate $\alpha_n^i$ records either that player i is still active or, if i exited at some stage k ≤ n, the value k.
A possible strategy for player i is to set a priori a sequence of cutoffs $\pi_n(\vec\alpha)$ and to stop the game as soon as $p_{n-1}^i < \pi_n(\vec\alpha_{n-1})$.

Theorem [D. Rosenberg, E. Solan, N. Vieille]
Assume that the law of $p_1^i$ has a density.
Every best reply is a cutoff strategy.
There exist symmetric equilibria in cutoff strategies.
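The belief update and the stopping rule can be sketched in a few lines. This is only an illustration under assumptions that the talk does not make: payoffs are taken to be Gaussian with unit variance and means +1 (High) and -1 (Low), and the cutoffs are fixed numbers that ignore the status vector of the other players. The names `update_belief` and `stopping_stage` are hypothetical.

```python
import math


def update_belief(p, x, theta_high=1.0, theta_low=-1.0, sigma=1.0):
    """One Bayes step: posterior probability of the High state after payoff x.

    Assumes Gaussian payoffs N(theta, sigma^2); this distribution is an
    illustrative choice, not taken from the talk.
    """
    # Gaussian likelihoods up to a common normalizing constant.
    like_high = math.exp(-((x - theta_high) ** 2) / (2 * sigma ** 2))
    like_low = math.exp(-((x - theta_low) ** 2) / (2 * sigma ** 2))
    weighted_high = p * like_high
    return weighted_high / (weighted_high + (1 - p) * like_low)


def stopping_stage(payoffs, p0, cutoffs):
    """First stage n at which the belief p_{n-1} falls below cutoffs[n - 1].

    Returns None if the player is still active once the lists are exhausted.
    The cutoffs here do not depend on who else is still active, which is a
    simplification of the strategies described above.
    """
    p = p0
    for n, (x, cutoff) in enumerate(zip(payoffs, cutoffs), start=1):
        if p < cutoff:
            return n  # drop out at the start of stage n
        p = update_belief(p, x)  # stay in, observe X_n, update to p_n
    return None


# Hypothetical run: three bad payoffs drive the belief below the cutoff.
print(stopping_stage([-1.0, -1.0, -1.0], p0=0.5, cutoffs=[0.3, 0.3, 0.3]))
```

Gaussian payoffs are convenient here because the resulting private belief after one payoff has a density, as the theorem requires.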

Question

What happens when the number of players N goes to $+\infty$?

A first large game result

Let us define the cutoff $p^*$ by $p^* \frac{\bar\theta}{1-\delta} + (1-p^*)\,\underline\theta = 0$.
This is the cutoff that makes a player indifferent between leaving and staying one more stage and then being told the state.

Theorem [D. Rosenberg, E. Solan, N. Vieille]
Assume that $p_1^i$ has full support.
In equilibrium, as the number of players N gets large, all players tend to play with cutoff $p^*$ after the first payoff: $\sup_{i \in \{1,2,\dots,N\}} |\pi_1^i(\cdot) - p^*| \xrightarrow[N \to +\infty]{} 0$.
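The indifference condition can be solved for $p^*$ in closed form. The short derivation below writes $\bar\theta > 0$ for the expected payoff in the High state and $\underline\theta < 0$ for the expected payoff in the Low state; the numerical values at the end are an illustrative assumption, not figures from the talk.

```latex
\begin{align*}
p^* \frac{\bar\theta}{1-\delta} + (1-p^*)\,\underline\theta &= 0 \\
p^* \left( \frac{\bar\theta}{1-\delta} - \underline\theta \right) &= -\underline\theta \\
p^* &= \frac{-(1-\delta)\,\underline\theta}{\bar\theta - (1-\delta)\,\underline\theta}
\end{align*}
% Illustrative values: \bar\theta = 1, \underline\theta = -1, \delta = 0.9
% give p^* = 0.1 / 1.1 = 1/11, roughly 0.091.
```

Since $\underline\theta < 0$, the numerator and denominator are both positive and $p^*$ lies strictly between 0 and 1. The first term of the condition is the value of staying forever when the state turns out to be High, and the second is the one-stage loss when it turns out to be Low. As δ approaches 1 the cutoff tends to 0, so patient players stay in even at low beliefs.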
