On conditional versus marginal bias in multi-armed bandits

Jaehyeok Shin¹, Aaditya Ramdas¹,² and Alessandro Rinaldo¹. ¹Dept. of Statistics and Data Science, ²Machine Learning Dept., CMU

Stochastic multi-armed bandits (MABs): $K$ arms with means $\mu_1, \mu_2, \dots, \mu_K$. Pulling an arm yields a "random reward" $Y$ drawn from that arm's distribution.

Adaptive sampling scheme to maximize rewards / to identify the best arm. [Figure: arms with means $\mu_1, \mu_2, \dots, \mu_K$ are sampled over time; reward $Y_1$ is observed at $t = 1$, $Y_2$ at $t = 2$, and so on, until a stopping time $\mathcal{T}$.]

Collected data can be used to identify an interesting arm, e.g., $\kappa = \arg\max_k \hat{\mu}_k(\mathcal{T})$.

...and the data can be used to conduct statistical inference, e.g., with the sample mean $\hat{\mu}_\kappa(\mathcal{T})$ at a stopping time $\mathcal{T}$.

Q. Sign of the bias of the sample mean? Is $\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa] \le 0$ or $\ge 0$? Xu et al. [2013]: An informal argument for why the sample mean is negatively biased for "optimistic" algorithms. Villar et al. [2015]: Demonstrate this negative bias in a simulation study motivated by using MABs for clinical trials.

Nie et al. [2018]: The sample mean is negatively biased, $\mathbb{E}[\hat{\mu}_k(t) - \mu_k] \le 0$ (fixed arm $k$, fixed time $t$), for MABs designed to maximize cumulative reward. Shin et al. [2019]: Introduced a "monotonicity property" characterizing the bias of the sample mean, $\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa]$ (chosen arm $\kappa$, stopping time $\mathcal{T}$), for more general classes of MABs.

However, current understanding of bias is limited in two aspects.
1. Existing results concern the bias of the sample mean only. We study the bias of monotone functions of the rewards.
2. Existing guarantees cover only the marginal bias. We extend previous results to cover the conditional bias.

Marginal vs conditional bias. • $K$ prototype items with means $\mu_1, \mu_2, \dots, \mu_K$. Want to screen out some items by testing $H_{0k}: \mu_k \ge c$ vs $H_{1k}: \mu_k < c$ for $k = 1, \dots, K$.

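Marginal vs conditional bias. $H_0: \mu \ge c$ vs $H_1: \mu < c$. [Figure: two sample paths of the running sample mean $\hat{\mu}(t)$. In one, the path runs up to time $T$: "Keep the item" (fail to reject the null). In the other, the path is stopped at a stopping time $\mathcal{T}$: "Screen out the item at $\mathcal{T}$" (reject the null).]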

Marginally, the sample mean is negatively biased: $\mathbb{E}[\hat{\mu}_k - \mu_k] \le 0$ for $k = 1, \dots, K$. "Underestimating the true mean revenue." (e.g., Starr & Woodroofe [1968], Shin et al. [2019]) [Figure: running sample means $\hat{\mu}(t)$ for Item 1, Item 2, ..., Item K, with the time $T$ and the stopping times $\mathcal{T}$ marked.]

...however, we usually do not evaluate the sample mean every time.

Conditioned on the "active" event, the sample mean is positively biased: $\mathbb{E}[\hat{\mu}_k - \mu_k \mid \text{item } k \text{ is active}] \ge 0$ for $k = 1, \dots, K$. "Overestimating the true mean revenue." [Figure: running sample means $\hat{\mu}(t)$ for Item 1, Item 2, ..., Item K, with the time $T$ and the stopping times $\mathcal{T}$ marked.]
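The contrast between the marginal and the conditional bias can be checked numerically. The following is a minimal sketch, not the authors' experiment: it assumes a single item with Gaussian rewards of unit variance, a hypothetical rule that screens the item out the first time its running sample mean drops below a fixed `threshold`, and a horizon of 20 draws. The function name, the defaults, and the stopping rule are all illustrative assumptions.

```python
import numpy as np

def simulate_screening(mu=0.0, threshold=-0.5, horizon=20,
                       n_runs=100_000, seed=0):
    """Monte Carlo estimate of marginal and conditional bias of the sample mean.

    Assumed toy rule (not from the slides): an item is screened out at the
    first time its running sample mean falls below `threshold`; otherwise it
    stays "active" until `horizon`.
    """
    rng = np.random.default_rng(seed)
    rewards = rng.normal(mu, 1.0, size=(n_runs, horizon))

    # Running sample mean after 1, 2, ..., horizon draws in every run.
    running_mean = np.cumsum(rewards, axis=1) / np.arange(1, horizon + 1)

    crossed = running_mean < threshold
    screened = crossed.any(axis=1)

    # Index of the first crossing, or the last column if the run never crossed.
    stop_idx = np.where(screened, crossed.argmax(axis=1), horizon - 1)
    final_mean = running_mean[np.arange(n_runs), stop_idx]

    marginal_bias = final_mean.mean() - mu                 # over all runs
    conditional_bias = final_mean[~screened].mean() - mu   # active runs only
    return marginal_bias, conditional_bias

marginal_bias, conditional_bias = simulate_screening()
print(f"marginal bias: {marginal_bias:+.3f}")
print(f"bias given the item is active: {conditional_bias:+.3f}")
```

Under these assumptions the first value should come out negative and the second positive, in line with the signs stated on the two slides.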

Conditional bias of the empirical cumulative distribution function (CDF). For a fixed $y \in \mathbb{R}$, is $\mathbb{E}[\hat{F}_{k,\mathcal{T}}(y) - F_k(y) \mid C] \le 0$ or $\ge 0$? E.g., $C$ = {Reject the null}, {Chosen as the best arm}, ..., where $\hat{F}_{k,\mathcal{T}}$ is the empirical CDF of arm $k$ at time $\mathcal{T}$ and $F_k$ is the true CDF of arm $k$.

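Tabular model of MABs. For each arm $k$ with mean $\mu_k$, the column $X^*_{1,k}, X^*_{2,k}, \dots$ is drawn i.i.d. from the distribution of arm $k$. The "hypothetical table" is

$$X^*_\infty := \begin{pmatrix} X^*_{1,1} & X^*_{1,2} & \cdots & X^*_{1,K} \\ X^*_{2,1} & X^*_{2,2} & \cdots & X^*_{2,K} \\ \vdots & \vdots & \vdots & \vdots \end{pmatrix} \in \mathbb{R}^{\mathbb{N} \times K}.$$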

Tabular model of MABs. [Figure: the observed rewards are read off the hypothetical table as sampling proceeds. In the illustration, $Y_1$ takes the place of $X^*_{1,2}$ at $t = 1$ and $Y_2$ takes the place of $X^*_{2,1}$ at $t = 2$.]

Hypothetical dataset: $\mathcal{D}^*_\infty = X^*_\infty \cup \{W_t\}_{t=1}^\infty$, i.e., the hypothetical table together with the random seeds. Given $\mathcal{D}^*_\infty$, the event $C$, the stopping time $\mathcal{T}$, and $N_k(t)$ for each $t$ and $k$ can be expressed as some functions of $\mathcal{D}^*_\infty$.
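To make the "function of the hypothetical dataset" idea concrete, here is a small sketch under stated assumptions. It replays an epsilon-greedy rule (a stand-in chosen for brevity, not an algorithm from the slides) using only a pre-drawn reward table and pre-drawn uniform seeds. Reading the reward at time t from row t of the table follows the figure on the slides but is an assumption about the indexing; `run_from_table` and its parameters are hypothetical names.

```python
import numpy as np

def run_from_table(table, seeds, epsilon=0.1):
    """Replay a bandit run as a deterministic function of (table, seeds).

    table: (horizon, n_arms) array of pre-drawn rewards (a finite slice of X*).
    seeds: (horizon, 2) array of uniforms in [0, 1) playing the role of W_t.
    Assumption: the reward at time t is table[t, arm].
    """
    horizon, n_arms = table.shape
    counts = np.zeros(n_arms, dtype=int)   # N_k(t)
    sums = np.zeros(n_arms)
    arms = []
    for t in range(horizon):
        if t < n_arms:
            arm = t                              # pull every arm once
        elif seeds[t, 0] < epsilon:
            arm = int(seeds[t, 1] * n_arms)      # explore using the seed
        else:
            arm = int(np.argmax(sums / counts))  # exploit the best sample mean
        counts[arm] += 1
        sums[arm] += table[t, arm]
        arms.append(arm)
    return arms, counts

rng = np.random.default_rng(1)
table = rng.normal(loc=[0.0, 0.2, 0.5], scale=1.0, size=(200, 3))
seeds = rng.random((200, 2))
arms, counts = run_from_table(table, seeds)
print(counts)  # number of pulls per arm, fully determined by (table, seeds)
```

Once the table and the seeds are fixed, no randomness is left: calling the function again with the same inputs reproduces the same arm sequence and the same counts, which is what lets the counts be studied as functions of individual table entries.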

Monotone effect of a sample. Theorem 1. Suppose arm $k$ has a finite mean. If $\mathbb{1}(C)/N_k(\mathcal{T})$ is an increasing function of each $X^*_{i,k}$ while keeping all other entries in $\mathcal{D}^*_\infty$ fixed, then we have

$$\mathbb{E}[\hat{F}_{k,\mathcal{T}}(y) - F_k(y) \mid C] \le 0 \quad \text{(negative conditional bias of the empirical CDF)},$$

$$\mathbb{E}[\hat{\mu}_k(\mathcal{T}) - \mu_k \mid C] \ge 0 \quad \text{(positive conditional bias of the sample mean)}.$$

Monotone effect of a sample. Theorem 1 (continued). Suppose arm $k$ has a finite mean. If $\mathbb{1}(C)/N_k(\mathcal{T})$ is a decreasing function of each $X^*_{i,k}$ while keeping all other entries in $\mathcal{D}^*_\infty$ fixed, then we have

$$\mathbb{E}[\hat{F}_{k,\mathcal{T}}(y) - F_k(y) \mid C] \ge 0 \quad \text{(positive conditional bias of the empirical CDF)},$$

$$\mathbb{E}[\hat{\mu}_k(\mathcal{T}) - \mu_k \mid C] \le 0 \quad \text{(negative conditional bias of the sample mean)}.$$
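The sign of the conditional bias of the empirical CDF can also be probed by simulation. The sketch below is an illustration under assumptions, not a verification of the theorem: it uses a single arm with Gaussian rewards of unit variance and a hypothetical screening rule (stop the first time the running mean drops below `threshold`), and it conditions on the event that the arm is never screened out before the horizon. On that event the sample size is fixed and raising any single reward can only help the arm stay active, so the increasing case is the relevant one.

```python
import math
import numpy as np

def conditional_ecdf_bias(y=0.0, mu=0.0, threshold=-0.5, horizon=20,
                          n_runs=100_000, seed=0):
    """Estimate E[F_hat(y) - F(y) | arm stays active until the horizon].

    Assumed toy setup: N(mu, 1) rewards and a fixed-threshold screening rule;
    neither is taken from the slides.
    """
    rng = np.random.default_rng(seed)
    rewards = rng.normal(mu, 1.0, size=(n_runs, horizon))
    running_mean = np.cumsum(rewards, axis=1) / np.arange(1, horizon + 1)

    # Conditioning event C: the running mean never drops below the threshold.
    active = ~(running_mean < threshold).any(axis=1)

    # Empirical CDF at y from all `horizon` rewards of each active run.
    ecdf_at_y = (rewards[active] <= y).mean(axis=1)

    # True N(mu, 1) CDF at y, written with the error function.
    true_cdf = 0.5 * (1.0 + math.erf((y - mu) / math.sqrt(2.0)))
    return ecdf_at_y.mean() - true_cdf

print(f"conditional ECDF bias at y=0: {conditional_ecdf_bias():+.3f}")
```

With these defaults the estimate should be negative, consistent with the negative conditional bias of the empirical CDF in the increasing case.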

E.g.: Best arm identification. • $K$ prototype items with means $\mu_1, \mu_2, \dots, \mu_K$. Want to figure out which one has the largest revenue.

E.g.: Best arm identification with the lil' UCB algorithm. At each time $t$, the arm with the largest upper confidence bound is sampled: $A_t = \arg\max_k \hat{\mu}_k(t) + u(N_k(t))$. [Figure: arms with means $\mu_1, \mu_2, \dots, \mu_K$ sampled over time, with $Y_1$ observed at $t = 1$ and $Y_2$ at $t = 2$.]
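The selection rule above can be sketched in a few lines. The bonus used here, $\sqrt{2 \log(1/\delta) / n}$, is a generic stand-in chosen for illustration; it is not the confidence bound $u$ of lil' UCB, and the function names and the `delta` parameter are hypothetical.

```python
import math

def ucb_index(sample_mean, n_pulls, delta=0.05):
    """Sample mean plus an exploration bonus that shrinks with n_pulls.

    The bonus is a simplified placeholder for u(N_k(t)), not the lil' UCB bound.
    """
    return sample_mean + math.sqrt(2.0 * math.log(1.0 / delta) / n_pulls)

def select_arm(sample_means, pull_counts, delta=0.05):
    """Return the index of the arm with the largest upper confidence bound."""
    indices = [ucb_index(m, n, delta)
               for m, n in zip(sample_means, pull_counts)]
    return max(range(len(indices)), key=indices.__getitem__)

# Equal sample means: the less-sampled arm has the larger bonus and is chosen.
print(select_arm([0.5, 0.5], [10, 2]))   # → 1
# Equal counts: the arm with the larger sample mean is chosen.
print(select_arm([1.0, 0.0], [5, 5]))    # → 0
```

Because the bonus decreases in the number of pulls, an arm whose early rewards were low is sampled less often afterwards, which is the kind of dependence between rewards and sample sizes that the monotonicity conditions describe.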
