
Are sample means in multi-armed bandits positively or negatively biased?
Jaehyeok Shin¹, Aaditya Ramdas¹,² and Alessandro Rinaldo¹
¹Dept. of Statistics and Data Science, ²Machine Learning Dept., CMU
Poster #12 @ Hall B + C

Stochastic multi-armed bandit: $K$ arms with unknown means $\mu_1, \mu_2, \dots, \mu_K$; each pull of an arm returns a random reward $Y$.

Adaptive sampling scheme to maximize rewards or to identify the best arm: at times $t = 1, 2, \dots$ an arm is chosen based on the data seen so far and its reward $Y_t$ is observed, until a stopping time $\mathcal{T}$.
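A minimal sketch of such a scheme, assuming unit-variance Gaussian rewards and a UCB1-style index as the sampling rule (neither is specified on the slide); `ucb_run` and `stop_rule` are hypothetical names. The stopping rule only looks at data collected up to the current time, which is what makes the returned time a stopping time.

```python
import math
import random


def ucb_run(means, max_steps, stop_rule, seed=0):
    """Adaptively sample Gaussian arms (unit variance, assumed) with a
    UCB1-style index. Stops at the first t where stop_rule(counts, sums, t)
    is True, or at max_steps. Returns (counts, sums, stopping_time)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # N_k(t): number of draws from each arm so far
    sums = [0.0] * k    # running sum of rewards from each arm
    t = 0
    while t < max_steps:
        t += 1
        if t <= k:
            arm = t - 1  # initialise: pull every arm once
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        y = rng.gauss(means[arm], 1.0)  # random reward Y_t
        counts[arm] += 1
        sums[arm] += y
        if stop_rule(counts, sums, t):  # decision uses only past data
            break
    return counts, sums, t


# Example: stop as soon as some arm has been sampled 20 times.
counts, sums, T = ucb_run([0.0, 0.5, 1.0], 200, lambda c, s, t: max(c) >= 20)
sample_means = [s / c for s, c in zip(sums, counts)]
```

Any other rule that depends only on past observations (a fixed budget, a confidence-interval test) can be passed as `stop_rule` without changing the loop.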

The collected data can then be used to identify an "interesting" arm...

...and the same data can be used to estimate its mean: the sample mean $\hat{\mu}_\kappa(\mathcal{T})$ of the chosen arm $\kappa$ at the stopping time $\mathcal{T}$.

Q. What is the bias of this sample mean: is $\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa] \le 0$ or $\ge 0$?
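This bias can be estimated by simulation. A minimal sketch, assuming unit-variance Gaussian rewards; `estimate_bias`, `experiment` and `fixed_arm_fixed_time` are hypothetical names, and any combination of sampling, stopping and choosing rules can be plugged in as the experiment. The non-adaptive baseline below (one fixed arm, fixed sample size) should give an estimate close to zero.

```python
import random
import statistics


def estimate_bias(experiment, means, reps=20000, seed=0):
    """Monte Carlo estimate of E[mu_hat_kappa(T) - mu_kappa].

    experiment(means, rng) runs one bandit episode and returns
    (kappa, mu_hat): the chosen arm and its sample mean at the stopping time.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        kappa, mu_hat = experiment(means, rng)
        errors.append(mu_hat - means[kappa])
    return statistics.fmean(errors)


def fixed_arm_fixed_time(means, rng, n=10, arm=0):
    """Baseline: draw n rewards from one fixed arm, no adaptivity."""
    draws = [rng.gauss(means[arm], 1.0) for _ in range(n)]
    return arm, sum(draws) / n


bias = estimate_bias(fixed_arm_fixed_time, [0.0, 0.5])
```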

Nie et al. 2018: for a fixed arm $k$ at a fixed time $t$, the sample mean is negatively biased: $\mathbb{E}[\hat{\mu}_k(t) - \mu_k] \le 0$.
This work: the sample mean of the chosen arm $\kappa$ at a stopping time $\mathcal{T}$, $\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa]$.
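The fixed-arm, fixed-time case can be checked numerically. A minimal sketch, assuming two identical unit-variance Gaussian arms and a greedy sampling rule (picked here only as a simple adaptive rule that favors arms with high observed means; UCB or Thompson sampling would also fit). `greedy_sample_mean` is a hypothetical helper; the estimated bias of arm 0's sample mean at $t = 20$ should come out negative.

```python
import random
import statistics


def greedy_sample_mean(means, horizon, arm, rng):
    """Greedy adaptive sampling for `horizon` steps (unit-variance Gaussian
    rewards, assumed); returns the sample mean of the fixed arm `arm`.

    Every arm is pulled once, then the arm with the highest current sample
    mean is pulled. Arm and time are fixed in advance."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            a = t  # initialise: one pull per arm
        else:
            a = max(range(k), key=lambda j: sums[j] / counts[j])
        counts[a] += 1
        sums[a] += rng.gauss(means[a], 1.0)
    return sums[arm] / counts[arm]


rng = random.Random(1)
means = [0.0, 0.0]  # two identical arms
errors = [greedy_sample_mean(means, 20, 0, rng) - means[0] for _ in range(20000)]
bias = statistics.fmean(errors)
```

The negative sign comes from arms that look bad early: they stop being sampled, so their low sample means are never corrected.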

This work: the sample mean of the chosen arm at the stopping time, $\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa]$, is
(a) negatively biased under ‘optimistic sampling’;
(b) positively biased under ‘optimistic stopping’;
(c) positively biased under ‘optimistic choosing’.
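Cases (b) and (c) can be illustrated by simulation. A minimal sketch, assuming unit-variance Gaussian arms with mean 0; the specific rules are illustrative choices, not the paper's definitions: `optimistic_stopping` stops a single arm once its running mean reaches a threshold, and `optimistic_choosing` samples every arm equally and then picks the largest sample mean. Both helper names and all parameter values are hypothetical; both estimated biases should come out positive.

```python
import random
import statistics


def optimistic_stopping(mu, rng, threshold=0.5, max_n=50):
    """Sample one arm until its running mean reaches `threshold` (or max_n
    draws); return the sample mean at that stopping time."""
    total = 0.0
    n = 0
    while n < max_n:
        n += 1
        total += rng.gauss(mu, 1.0)
        if total / n >= threshold:  # stop when the data look good
            break
    return total / n


def optimistic_choosing(means, rng, n=10):
    """Sample every arm n times (non-adaptively), then choose the arm with
    the largest sample mean; return (kappa, sample mean of kappa)."""
    mu_hats = [sum(rng.gauss(m, 1.0) for _ in range(n)) / n for m in means]
    kappa = max(range(len(means)), key=lambda a: mu_hats[a])
    return kappa, mu_hats[kappa]


rng = random.Random(2)
reps = 10000

# (b) one arm with true mean 0, so the error equals the sample mean itself
stop_bias = statistics.fmean([optimistic_stopping(0.0, rng) for _ in range(reps)])

# (c) three identical arms with mean 0: error of the chosen arm's sample mean
means = [0.0, 0.0, 0.0]
choose_errors = []
for _ in range(reps):
    kappa, mu_hat = optimistic_choosing(means, rng)
    choose_errors.append(mu_hat - means[kappa])
choose_bias = statistics.fmean(choose_errors)
```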

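Monotone effect of a sample
Theorem [Informal]: let $N_k(\mathcal{T})$ be the number of samples drawn from arm $k$ up to the stopping time $\mathcal{T}$, and view $\mathbf{1}(\kappa = k) / N_k(\mathcal{T})$ as a function of a sample from arm $k$. If it is increasing, the sample mean is positively biased; if it is decreasing, the sample mean is negatively biased.
The result is agnostic to the algorithm, includes Nie et al. 2018 as a special case, and gives positive bias under best-arm identification and sequential testing.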
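The monotonicity condition can be made concrete by fixing every reward in advance and raising one of them. A minimal sketch, assuming a "reward table" view in which the i-th pull of arm k always returns `table[k][i]`; `stop_count` and `greedy_count` are hypothetical helpers and the numbers are illustrative. In the example below, a larger sample makes the optimistic stopping rule stop earlier, so $\mathbf{1}(\kappa = k)/N_k(\mathcal{T})$ goes up (positive-bias case), while under greedy sampling at a fixed time a larger sample from arm 0 attracts more pulls, so the ratio goes down (negative-bias case).

```python
def stop_count(rewards, threshold=0.35):
    """N(T) for one arm whose i-th reward is rewards[i]: stop the first time
    the running sample mean reaches `threshold`, else after all draws."""
    total = 0.0
    for n, y in enumerate(rewards, start=1):
        total += y
        if total / n >= threshold:
            return n
    return len(rewards)


def greedy_count(table, horizon, arm):
    """N_arm(horizon) under greedy sampling, where table[k][i] is the reward
    returned by the i-th pull of arm k (fixed in advance)."""
    k = len(table)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        a = t if t < k else max(range(k), key=lambda j: sums[j] / counts[j])
        sums[a] += table[a][counts[a]]
        counts[a] += 1
    return counts[arm]


# Optimistic stopping (one arm, so 1(kappa = k) = 1): raising the second
# reward from 0.2 to 0.8 stops the run earlier, so 1 / N(T) increases.
low = 1 / stop_count([0.0, 0.2, 0.9, 1.0])
high = 1 / stop_count([0.0, 0.8, 0.9, 1.0])

# Greedy sampling at fixed time 4: raising arm 0's first reward from 0.0
# to 1.0 makes arm 0 pulled more often, so 1 / N_0(4) decreases.
before = 1 / greedy_count([[0.0, 0.0, 0.0, 0.0], [0.4, 0.4, 0.4, 0.4]], 4, 0)
after = 1 / greedy_count([[1.0, 0.0, 0.0, 0.0], [0.4, 0.4, 0.4, 0.4]], 4, 0)
```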

