
Equilibria in large one-arm bandit games

A. Salomon, Université Paris 13, HEC Paris, November 25th, 2008


  1. Equilibria in large one-arm bandit games. A. Salomon, Université Paris 13, HEC Paris. November 25th, 2008. Outline: Model; Cutoff Strategies; Asymptotically Deterministic Equilibria; Results; An Example.

  2. Model: one-arm bandits. There are N one-arm bandit machines. When operated, a machine yields a random payoff. Nature chooses a state Θ, which can be High ($\Theta = \bar{\theta}$) or Low ($\Theta = \underline{\theta}$). This state determines the payoff distribution of all N one-arm bandits, i.e. the machines are perfectly correlated. If the state is High, the expected payoff, also denoted $\bar{\theta}$, is positive. If the state is Low, the expected payoff, denoted $\underline{\theta}$, is negative. The machines are operated sequentially and, conditionally on the state, payoffs are i.i.d. across stages and across machines.

  8. Model: progress of the game. Each of the N players operates her machine in discrete time and must decide when to stop. The decision to stop is irreversible and yields an outside payoff normalized to zero. At any stage n ≥ 1, the following sequence of events unfolds for each player i:
     1. She decides to drop out or to stay in.
     2. If she stays, she gets and observes a random payoff $X^i_n$.
     3. She observes who stayed in.
     Payoffs are private information, but decisions are publicly observed. Players discount payoffs at rate δ: if a player stops at stage τ, her overall payoff is $\sum_{n=1}^{\tau-1} \delta^{n-1} X^i_n$.
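
The overall payoff of a player who stops at stage τ can be illustrated with a minimal Python sketch. The function name `discounted_payoff` and the sample numbers are hypothetical and not taken from the slides; the code only evaluates the sum over stages 1 to τ − 1 for a given payoff sequence.

```python
def discounted_payoff(payoffs, tau, delta):
    """Overall payoff of a player who stops at stage tau.

    payoffs[n - 1] holds X_n, the payoff of stage n (stages are 1-indexed).
    Only stages 1..tau-1 are played, so only those payoffs are summed.
    """
    if tau < 1 or tau - 1 > len(payoffs):
        raise ValueError("tau must satisfy 1 <= tau <= len(payoffs) + 1")
    return sum(delta ** (n - 1) * payoffs[n - 1] for n in range(1, tau))


# Hypothetical payoff sequence, for illustration only.
sample_payoffs = [1.0, -2.0, 0.5, 3.0]
total = discounted_payoff(sample_payoffs, tau=4, delta=0.5)
```

Stopping at τ = 1 means the machine is never operated, so the sum is empty and the player receives the outside payoff of zero.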

  13. Cutoff Strategies. At the beginning, all players have the same prior about the state of the world: $p_0 = P(\Theta = \bar{\theta})$. After each payoff, they can update this prior to get a private belief: $p^i_n = P(\Theta = \bar{\theta} \mid X^i_1, X^i_2, \dots, X^i_n)$. They also learn from others' behaviour. At a given stage, the status of the players can be summed up in a random vector $\vec{\alpha}_n$, where $\alpha^i_n$ either marks player i as still active or equals k if i exited at stage k ≤ n. A possible strategy for player i is to set a priori a sequence of cutoffs $\pi_n(\vec{\alpha})$ and to stop the game as soon as $p^i_{n-1} < \pi_n(\vec{\alpha}_{n-1})$.
      Theorem [D. Rosenberg, E. Solan, N. Vieille]. Assume that the law of $p^i_1$ has a density. Every best reply is a cutoff strategy. There exist symmetric equilibria in cutoff strategies.
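
The private belief update and the stopping rule can be sketched in Python as follows. This is only an illustration under assumptions that the slides do not make: payoffs are taken to be Gaussian with mean equal to the state and a known standard deviation `sigma`, and the cutoffs are plain numbers, which ignores their dependence on the status vector of the other players. All function and parameter names are hypothetical.

```python
import math


def update_belief(p, x, theta_high, theta_low, sigma=1.0):
    """One Bayes step: posterior probability of the High state after payoff x.

    Assumption (not from the slides): payoffs are Gaussian with mean
    theta_high or theta_low and standard deviation sigma.
    """
    # Gaussian likelihoods up to a common normalising constant.
    like_high = math.exp(-((x - theta_high) ** 2) / (2 * sigma ** 2))
    like_low = math.exp(-((x - theta_low) ** 2) / (2 * sigma ** 2))
    return p * like_high / (p * like_high + (1 - p) * like_low)


def stopping_stage(payoffs, p0, cutoffs, theta_high, theta_low, sigma=1.0):
    """First stage n with p_{n-1} < cutoffs[n-1], or None if the player never stops.

    Simplification: cutoffs[n-1] is a fixed number standing in for the
    cutoff of stage n, with no dependence on the other players' status.
    """
    p = p0
    for n, (cutoff, x) in enumerate(zip(cutoffs, payoffs), start=1):
        if p < cutoff:  # at this point p is the belief p_{n-1}
            return n
        p = update_belief(p, x, theta_high, theta_low, sigma)
    return None


# Hypothetical run: three bad payoffs against a constant cutoff of 0.3.
stage = stopping_stage([-1.0, -1.0, -1.0], p0=0.5, cutoffs=[0.3, 0.3, 0.3],
                       theta_high=1.0, theta_low=-1.0)
```

In this run the player stays at stage 1 because the prior 0.5 is above the cutoff, sees a payoff of −1, and her belief falls below 0.3, so she stops at stage 2.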

  14. Question. What happens when the number of players N goes to +∞?

  16. A first large game result. Let us define the cutoff $p^*$ by: $p^* \frac{\bar{\theta}}{1-\delta} + (1 - p^*)\,\underline{\theta} = 0$. This is the cutoff that makes a player indifferent between leaving and staying one more stage and then being told the state.
      Theorem [D. Rosenberg, E. Solan, N. Vieille]. Assume that $p^i_1$ has full support. In equilibrium, as the number of players N gets large, all players tend to play with cutoff $p^*$ after the first payoff: $\sup_{i \in \{1,2,\dots,N\}} |\pi^i_1(\cdot) - p^*| \to 0$ as $N \to +\infty$.
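
Solving the indifference condition for the cutoff gives $p^* = \frac{-(1-\delta)\,\underline{\theta}}{\bar{\theta} - (1-\delta)\,\underline{\theta}}$. The Python sketch below evaluates this closed form and checks it against the defining equation. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def cutoff_p_star(theta_high, theta_low, delta):
    """Closed-form solution of p * theta_high / (1 - delta) + (1 - p) * theta_low = 0."""
    if not (theta_high > 0 > theta_low and 0 <= delta < 1):
        raise ValueError("need theta_high > 0 > theta_low and 0 <= delta < 1")
    return -(1 - delta) * theta_low / (theta_high - (1 - delta) * theta_low)


def indifference_gap(p, theta_high, theta_low, delta):
    """Left-hand side of the indifference condition; it is zero at p = p*."""
    return p * theta_high / (1 - delta) + (1 - p) * theta_low


# Hypothetical parameters: symmetric states and a discount factor of one half.
p_star = cutoff_p_star(theta_high=1.0, theta_low=-1.0, delta=0.5)
gap = indifference_gap(p_star, theta_high=1.0, theta_low=-1.0, delta=0.5)
```

With these numbers the cutoff is one third. Setting δ to 0 would give one half, the belief at which the expected payoff of a single stage is zero; a more patient player tolerates a lower belief because learning that the state is High pays off in every later stage.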
