PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits


  1. PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits. Arghya Roy Chaudhuri, under the guidance of Prof. Shivaram Kalyanakrishnan. Indian Institute of Technology Bombay, India.

  2. What Is It All About?

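  8. What Is a Multi-Armed Bandit? Bandits: slot machines. Mean: Pr[Reward = 1]. [Figure: arms with unknown means; values shown: 1.0, 0.9, 0.5, 0.5, 0.2, 0.0.] To identify the best arm: $\mathbb{E}[\mathrm{SC}] = \Omega\left(\frac{n}{\epsilon^2} \log \frac{1}{\delta}\right)$. To identify the best subset of size $m$: $\mathbb{E}[\mathrm{SC}] = \Omega\left(\frac{n}{\epsilon^2} \log \frac{m}{\delta}\right)$. We need an alternative.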

  16. Large Bandit Instances. Difficulty for n ≫ T: $\lim_{n \to \infty} \frac{n}{\epsilon^2} \log \frac{1}{\delta} = \infty$. Workaround: identifying one arm from the best ρ-fraction is possible. Redefine the problem: identify one arm from the best m arms. Defining $\rho = m/n$ generalises the problem. What if n is relatively small?

  21. Finite-Armed Bandit Instances. (k, m, n): identify any k distinct arms from the best m arms in a set of n arms. k = 1: any one arm from the best subset of size m. k = m: best subset identification. k = m = 1: best arm identification. Contributions: LUCB-k-m (fully sequential and adaptive); worst-case upper and lower bounds.

  22. Infinite-Armed Bandit Instances. (k, ρ): identify any k distinct arms from the best ρ-fraction of arms.

  24. Thank You! Poster: #54. Email: arghya@cse.iitb.ac.in
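
Both lower bounds on the "What Is a Multi-Armed Bandit?" slide grow linearly in n, which is why large instances are hard. As a rough illustration, the Python sketch below evaluates only the terms inside Ω(·) and ignores the hidden constant, so the numbers show scaling and are not actual sample counts. The function names `best_arm_term` and `best_subset_term` are hypothetical and do not come from the talk.

```python
import math


def best_arm_term(n: int, eps: float, delta: float) -> float:
    """Term inside Omega(.) for best-arm identification: (n / eps^2) * log(1 / delta)."""
    return n / eps**2 * math.log(1.0 / delta)


def best_subset_term(n: int, m: int, eps: float, delta: float) -> float:
    """Term inside Omega(.) for best-subset identification: (n / eps^2) * log(m / delta)."""
    return n / eps**2 * math.log(m / delta)


if __name__ == "__main__":
    # eps, delta and m are arbitrary example values (assumptions, not from the slides).
    for n in (10, 1_000, 100_000):
        print(n,
              round(best_arm_term(n, eps=0.1, delta=0.05)),
              round(best_subset_term(n, m=5, eps=0.1, delta=0.05)))
```

Doubling n doubles both terms for fixed ε and δ, which matches the limit stated on the "Large Bandit Instances" slide.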
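
The "Large Bandit Instances" slide states that one arm from the best ρ-fraction can be identified even when n is very large. One way to see why this is plausible: a uniformly random sample of arms contains an arm from the top ρ-fraction with a probability that does not depend on n. The sketch below computes the smallest sample size s with 1 − (1 − ρ)^s ≥ 1 − δ, assuming sampling with replacement for simplicity. It illustrates the idea only and is not the algorithm from the talk; `arms_to_sample` is a hypothetical name.

```python
import math


def arms_to_sample(rho: float, delta: float) -> int:
    """Smallest s such that s uniform draws (with replacement) miss the
    top rho-fraction of arms with probability at most delta."""
    if not (0.0 < rho < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("rho and delta must lie strictly between 0 and 1")
    # (1 - rho)^s <= delta  <=>  s >= log(delta) / log(1 - rho)
    return math.ceil(math.log(delta) / math.log(1.0 - rho))


if __name__ == "__main__":
    # Example values are assumptions chosen for illustration.
    print(arms_to_sample(rho=0.05, delta=0.01))
```

The result depends only on ρ and δ, roughly as (1/ρ) log(1/δ) for small ρ, so the subsequent identification step can work on a sample whose size is independent of n.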
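
To make the (k, m, n) problem statement on the "Finite-Armed Bandit Instances" slide concrete, the sketch below checks whether a returned set of arms is a correct answer when the true means are known. It is a hypothetical testing helper, not part of LUCB-k-m, and it ignores the PAC tolerance; treating ties at the m-th best mean as acceptable is an assumption made here.

```python
from typing import Sequence


def is_correct_answer(selected: Sequence[int], means: Sequence[float],
                      k: int, m: int) -> bool:
    """Return True if `selected` holds k distinct arm indices, all among the best m arms."""
    n = len(means)
    if not (1 <= k <= m <= n):
        raise ValueError("need 1 <= k <= m <= n")
    # The answer must consist of exactly k distinct, valid arm indices.
    if len(selected) != k or len(set(selected)) != k:
        return False
    if any(not (0 <= a < n) for a in selected):
        return False
    # Mean of the m-th best arm; arms tied with it count as good (assumption).
    mth_best = sorted(means, reverse=True)[m - 1]
    return all(means[a] >= mth_best for a in selected)


if __name__ == "__main__":
    # Example means, taken from the values shown on the bandit figure.
    means = [1.0, 0.9, 0.5, 0.5, 0.2, 0.0]
    print(is_correct_answer([1], means, k=1, m=2))     # any one of the best two
    print(is_correct_answer([0, 1], means, k=2, m=2))  # best subset of size two
    print(is_correct_answer([0], means, k=1, m=1))     # best arm
```

The three calls correspond to the special cases listed on the slide: k = 1, k = m, and k = m = 1.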
