Adaptations of the Thompson Sampling Algorithm for Multi-Armed Bandits



  1. Adaptations of the Thompson Sampling Algorithm for Multi-Armed Bandits
     Ciara Pike-Burke. Supervisor: David Leslie. 24th April 2015.

  2. Introduction: Motivation
     In many real-life problems there is a trade-off to be made between exploitation and exploration. For example:
     - in clinical trials,
     - in portfolio optimization,
     - in website optimization,
     - when choosing a restaurant.

  3. Introduction: Multi-Armed Bandits
     One of the best ways to model the exploitation vs. exploration trade-off is with multi-armed bandits.
     - Each of k slot machines has an unknown reward distribution.
     - We want to maximize reward, or equivalently minimize regret.
     - Regret is the accumulated difference in expected reward between the optimal arm and the arms we actually played (see the formula below).
     Figure: A multi-armed bandit (image from research.microsoft.com).
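The slide describes regret in words only; a standard formalization consistent with that description is the following (the symbols μ_i for arm i's mean reward, a_t for the arm played at step t, and horizon T are introduced here, not on the slide):

```latex
% Cumulative regret after T plays: the total expected reward lost by
% playing arm a_t instead of the optimal arm at each step.
R(T) \;=\; \sum_{t=1}^{T} \bigl( \mu^{*} - \mu_{a_t} \bigr),
\qquad
\mu^{*} \;=\; \max_{1 \le i \le k} \mu_i .
```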

  4. Algorithms: Thompson Sampling

  5. Thompson Sampling
     For the case of Bernoulli rewards, the Thompson Sampling algorithm is:
     1. Initialize with uniform (Beta(1, 1)) priors on the reward probability of each arm.
     2. At each time step t:
        - Sample θ_i from Beta(s_i(t − 1) + 1, f_i(t − 1) + 1) for each arm i.
        - Play the arm corresponding to the largest θ_i.
        - Update s_i(t) and f_i(t) for all i.
     Here s_i(t) is the number of successes from playing arm i in the first t plays of the algorithm, and f_i(t) is the number of failures. A sketch implementation follows this slide.
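A minimal Python sketch of this Bernoulli Thompson Sampling loop (not the authors' code; the arm probabilities and horizon are illustrative):

```python
import numpy as np

def thompson_sampling(p, T, rng=None):
    """Bernoulli Thompson Sampling on arms with true success probabilities p."""
    rng = rng or np.random.default_rng(0)
    k = len(p)
    successes = np.zeros(k)  # s_i(t): successes observed on each arm
    failures = np.zeros(k)   # f_i(t): failures observed on each arm
    rewards = np.zeros(T)
    for t in range(T):
        # Sample theta_i ~ Beta(s_i + 1, f_i + 1): the posterior under a Beta(1,1) prior.
        theta = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(theta))              # play the arm with the largest sample
        reward = float(rng.random() < p[arm])    # Bernoulli reward from the chosen arm
        successes[arm] += reward
        failures[arm] += 1 - reward
        rewards[t] = reward
    return rewards

# Example: the 2-armed bandit from the sampling-distributions figure, p = (0.35, 0.8).
rewards = thompson_sampling(p=np.array([0.35, 0.8]), T=1000)
print("average reward:", rewards.mean())
```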

  6. Sampling Distributions
     Figure: Thompson Sampling for the 2-armed Bernoulli bandit with p = (0.35, 0.8).

  7. Optimistic Bayesian Sampling: Motivation
     - If the variance of the better arm's posterior is too large, Thompson Sampling will often end up playing the inferior arm.
     - May et al. (2012) propose a new method, Optimistic Bayesian Sampling (OBS), to combat this.

  8. Optimistic Bayesian Sampling: Outline
     - Optimistic Bayesian Sampling is the same as Thompson Sampling except for the decision rule.
     - At each time step t, play the arm that maximizes q_i = max{θ_i, μ_i}, where θ_i ∼ Beta(s_i(t − 1) + 1, f_i(t − 1) + 1) and μ_i is the mean of this distribution.
     Optimistic Bayesian Sampling has been shown, empirically and theoretically, to perform better than Thompson Sampling. A sketch of the modified decision rule follows this slide.
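A sketch of the OBS arm-selection step; per the slide, only the decision rule differs from the Thompson Sampling loop above (the function and variable names are mine):

```python
import numpy as np

def obs_choose_arm(successes, failures, rng):
    """One OBS decision: play argmax_i q_i where q_i = max(theta_i, mu_i)."""
    a = successes + 1            # posterior Beta parameters under a Beta(1,1) prior
    b = failures + 1
    theta = rng.beta(a, b)       # Thompson sample theta_i for each arm
    mu = a / (a + b)             # posterior mean mu_i of Beta(a, b)
    q = np.maximum(theta, mu)    # optimistic score: never below the posterior mean
    return int(np.argmax(q))
```

Flooring each sample at its posterior mean means an arm is never undersold by an unlucky draw, which is the optimism that gives the method its name.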

  9. Optimistic Bayesian Sampling using Rejection Sampling: Motivation
     Figure: Histograms of the sampled values under Thompson Sampling and under Optimistic Bayesian Sampling (x-axis: prob, y-axis: Frequency).

  10. Optimistic Bayesian Sampling using Rejection Sampling
      We can use rejection sampling to obtain samples from the truncated Beta distribution.
      Figure: Histogram of the rejection-sampled values (x-axis: prob, y-axis: Frequency).
      - The algorithm is the same as Thompson Sampling, but samples from the truncated Beta(s_i(t − 1) + 1, f_i(t − 1) + 1).
      - Any proposal distribution can be chosen; the simplest is the Beta distribution itself. A sketch follows this slide.
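A minimal sketch of rejection sampling from a truncated Beta with the untruncated Beta as the proposal. One assumption not stated explicitly on the slide: the truncation point is taken to be the posterior mean, matching the OBS histograms above. With this proposal, rejection sampling reduces to redrawing until a sample lands in the truncated region:

```python
import numpy as np

def truncated_beta_sample(a, b, lower, rng, max_tries=10_000):
    """Rejection sampling from Beta(a, b) truncated to [lower, 1].

    Using the untruncated Beta(a, b) as the proposal, the acceptance
    test is simply whether the draw exceeds the truncation point.
    """
    for _ in range(max_tries):
        theta = rng.beta(a, b)
        if theta >= lower:
            return theta
    return lower  # fallback if the truncated region has negligible mass

rng = np.random.default_rng(0)
a, b = 5, 3                 # e.g. s_i(t-1) + 1 = 5, f_i(t-1) + 1 = 3
mean = a / (a + b)          # truncate below at the posterior mean (assumed)
print(truncated_beta_sample(a, b, lower=mean, rng=rng))
```

With this proposal the acceptance probability equals the posterior mass above the truncation point, so sampling is cheap when that mass is large but can need many redraws otherwise, consistent with the slowness noted in the conclusion.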

  11. Simulation Study
      The three methods, Thompson Sampling, Optimistic Bayesian Sampling, and Optimistic Bayesian Sampling using Rejection Sampling, were tested on four simulations with Bernoulli rewards (a sketch of such a comparison harness follows this slide):
      - Simulation 1: the 2-armed bandit with randomly generated probabilities p = (0.34, 0.92).
      - Simulation 2: the 5-armed bandit with p = (0.45, 0.45, 0.45, 0.55, 0.45).
      - Simulation 3: the 10-armed bandit with p = (0.9, 0.8, ..., 0.8).
      - Simulation 4: the 20-armed bandit with randomly generated probabilities p = (0.56, 0.09, 0.68, 0.69, 0.19, 0.45, 0.77, 0.29, 0.58, 0.11, 0.91, 0.17, 0.29, 0.95, 0.90, 0.39, 0.38, 0.53, 0.84, 0.03).
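A sketch of how such a comparison could be run, measuring the cumulative regret defined earlier. This is illustrative, not the authors' harness; it covers TS and OBS only, and the rejection-sampling variant would slot in as a third policy:

```python
import numpy as np

def ts_arm(s, f, rng):
    """Thompson Sampling decision: argmax of a posterior draw per arm."""
    return int(np.argmax(rng.beta(s + 1, f + 1)))

def obs_arm(s, f, rng):
    """OBS decision: each draw is floored at its posterior mean."""
    a, b = s + 1, f + 1
    return int(np.argmax(np.maximum(rng.beta(a, b), a / (a + b))))

def run(policy, p, T, seed=0):
    """Play one bandit run and return the cumulative regret curve."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p)
    s, f = np.zeros(len(p)), np.zeros(len(p))
    regret = np.zeros(T)
    best = p.max()
    for t in range(T):
        arm = policy(s, f, rng)
        r = float(rng.random() < p[arm])
        s[arm] += r
        f[arm] += 1 - r
        regret[t] = best - p[arm]   # expected regret of this play
    return regret.cumsum()

# Simulation 2 from the slide: the 5-armed bandit.
p = [0.45, 0.45, 0.45, 0.55, 0.45]
for name, policy in [("TS", ts_arm), ("OBS", obs_arm)]:
    print(name, run(policy, p, T=5000)[-1])
```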

  12. Results
      Figure: results of the simulation study.

  13. Conclusion
      - Both adaptations of the Thompson Sampling algorithm appear to perform better than the original in simulations.
      - However, Optimistic Bayesian Sampling using Rejection Sampling can be slow.
      - The theoretical regret bound of OBS is better than that of Thompson Sampling; no regret bound has yet been proved for OBS using Rejection Sampling.

  14. Future Work
      - More careful consideration of the proposal distribution for Optimistic Bayesian Sampling using Rejection Sampling.
      - Theoretical results for OBS using Rejection Sampling.
      - Further simulations with:
        - more arms,
        - more complex reward distributions,
        - contextual bandits,
        - addition or removal of arms midway through the algorithm.

  15. References
      Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, pages 285-294.
      Agrawal, S. and Goyal, N. (2011). Analysis of Thompson Sampling for the multi-armed bandit problem. arXiv preprint arXiv:1111.1797.
      May, B. C., Korda, N., Lee, A., and Leslie, D. S. (2012). Optimistic Bayesian sampling in contextual-bandit problems. The Journal of Machine Learning Research, 13(1):2069-2106.
      Thank you for listening. Any questions?
