Are sample means in multi-armed bandits positively or negatively biased? Jaehyeok Shin¹, Aaditya Ramdas¹,² and Alessandro Rinaldo¹ (¹Dept. of Statistics and Data Science, ²Machine Learning Dept., CMU). Poster #12 @ Hall B + C.


slide-1
SLIDE 1

Are sample means in multi-armed bandits positively or negatively biased?

Jaehyeok Shin¹, Aaditya Ramdas¹,² and Alessandro Rinaldo¹

¹Dept. of Statistics and Data Science, ²Machine Learning Dept., CMU

Poster #12 @ Hall B + C

slide-2
SLIDE 2

Stochastic multi-armed bandit

[Figure: K arms with means μ1, μ2, ..., μK. Pulling an arm returns a random reward Y drawn from that arm's distribution.]

slide-3
SLIDE 3

Adaptive sampling scheme to maximize rewards / to identify the best arm

[Figure: arms with means μ1, μ2, ..., μK above a time axis.]

slide-4
SLIDE 4

Adaptive sampling scheme to maximize rewards / to identify the best arm

[Figure: arms with means μ1, μ2, ..., μK above a time axis; time t = 1 is marked.]

slide-6
SLIDE 6

Adaptive sampling scheme to maximize rewards / to identify the best arm

[Figure: arms with means μ1, μ2, ..., μK above a time axis; reward Y1 is observed at t = 1.]

slide-7
SLIDE 7

Adaptive sampling scheme to maximize rewards / to identify the best arm

[Figure: arms with means μ1, μ2, ..., μK above a time axis; reward Y1 is observed at t = 1, and time t = 2 is marked.]

slide-9
SLIDE 9

Adaptive sampling scheme to maximize rewards / to identify the best arm

[Figure: arms with means μ1, μ2, ..., μK above a time axis; rewards Y1 at t = 1 and Y2 at t = 2.]

slide-10
SLIDE 10

Adaptive sampling scheme to maximize rewards / to identify the best arm

[Figure: arms with means μ1, μ2, ..., μK above a time axis; rewards Y1 at t = 1, Y2 at t = 2, and so on.]

slide-11
SLIDE 11

Adaptive sampling scheme to maximize rewards / to identify the best arm

[Figure: arms with means μ1, μ2, ..., μK above a time axis; rewards Y1 at t = 1, Y2 at t = 2, and so on, up to a stopping time $\mathcal{T}$.]

slide-12
SLIDE 12

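Collected data can be used to identify an interesting arm...

[Figure: arms with means μ1, μ2, ..., μK above a time axis; rewards Y1 at t = 1, Y2 at t = 2, and so on, up to the stopping time $\mathcal{T}$, where one arm is flagged "Interesting!".]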

slide-13
SLIDE 13

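...and data can be used to estimate the mean.

[Figure: arms with means μ1, μ2, ..., μK above a time axis; rewards Y1 at t = 1, Y2 at t = 2, and so on, up to the stopping time $\mathcal{T}$.]

Sample mean of chosen arm κ: $\hat{\mu}_\kappa(\mathcal{T})$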
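
For reference, the sample mean on this slide can be written out explicitly. The block below is a sketch in standard bandit notation: the symbol $A_s$ for the arm pulled at time $s$ is an assumption introduced here, while $Y_s$, $N_k$, the chosen arm $\kappa$ and the stopping time (written $\mathcal{T}$ here) come from the slides.

```latex
% Number of pulls of arm k up to time t, and the resulting sample mean
N_k(t) = \sum_{s=1}^{t} \mathbb{1}(A_s = k), \qquad
\hat{\mu}_k(t) = \frac{1}{N_k(t)} \sum_{s=1}^{t} \mathbb{1}(A_s = k)\, Y_s .

% Quantity studied in the talk: plug in the chosen arm and the stopping time
\hat{\mu}_\kappa(\mathcal{T}) = \frac{1}{N_\kappa(\mathcal{T})}
  \sum_{s=1}^{\mathcal{T}} \mathbb{1}(A_s = \kappa)\, Y_s .
```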
slide-14
SLIDE 14

Q. Bias of sample mean?

$\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa] \le 0$ or $\ge 0$?
slide-15
SLIDE 15

Nie et al. 2018: Sample mean is negatively biased.

$\mathbb{E}[\hat{\mu}_k(t) - \mu_k] \le 0$

slide-16
SLIDE 16

Nie et al. 2018: Sample mean is negatively biased.

$\mathbb{E}[\hat{\mu}_k(t) - \mu_k] \le 0$ (fixed arm $k$, fixed time $t$)
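
The fixed-arm, fixed-time case can be checked numerically. The following is a minimal Monte Carlo sketch, not the setup of Nie et al. 2018: it assumes two arms with N(0, 1) rewards, one initial pull per arm, and a greedy rule that always pulls the arm with the larger sample mean. The function name `greedy_bias` and all parameter values are illustrative choices.

```python
import random


def greedy_bias(horizon=10, reps=20000, seed=0):
    """Monte Carlo estimate of E[mu_hat_1(t) - mu_1] at the fixed time t = horizon.

    Assumptions (not from the slides): two arms, both N(0, 1); each arm is
    pulled once to initialise, then the arm with the larger sample mean
    is pulled (ties go to arm 1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sums = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # one pull per arm
        counts = [1, 1]
        for _ in range(horizon - 2):
            arm = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
            sums[arm] += rng.gauss(0.0, 1.0)
            counts[arm] += 1
        # The true mean of arm 1 is 0, so its sample mean is the estimation error.
        total += sums[0] / counts[0]
    return total / reps


if __name__ == "__main__":
    print(f"estimated bias of arm 1 at t = 10: {greedy_bias():+.3f}")
```

Under these assumptions the estimate should come out below zero: an arm that starts with an unlucky draw tends to be abandoned, so its low sample mean is never corrected by further pulls.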

slide-17
SLIDE 17

Nie et al. 2018: Sample mean is negatively biased.

$\mathbb{E}[\hat{\mu}_k(t) - \mu_k] \le 0$ (fixed arm $k$, fixed time $t$)

This work: Sample mean of chosen arm at stopping time

$\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa]$ (chosen arm $\kappa$, stopping time $\mathcal{T}$)

slide-18
SLIDE 18

This work: Sample mean of chosen arm at stopping time is ...

$\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa]$

slide-19
SLIDE 19

This work: Sample mean of chosen arm at stopping time is ...

$\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa]$

(a) negatively biased under ‘optimistic sampling’.

slide-20
SLIDE 20

This work: Sample mean of chosen arm at stopping time is ...

$\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa]$

(a) negatively biased under ‘optimistic sampling’.
(b) positively biased under ‘optimistic stopping’.

slide-21
SLIDE 21

This work: Sample mean of chosen arm at stopping time is ...

$\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa]$

(a) negatively biased under ‘optimistic sampling’.
(b) positively biased under ‘optimistic stopping’.
(c) positively biased under ‘optimistic choosing’.
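
Cases (b) and (c) can be illustrated with a small simulation. This is a hedged sketch under assumptions that are not in the slides: rewards are N(0, 1), "optimistic stopping" is modelled as stopping a single arm once its running mean exceeds a threshold (or at a maximum sample size), and "optimistic choosing" as reporting the larger of two sample means after an equal number of pulls. The function names and default values are hypothetical.

```python
import random


def stopping_bias(threshold=0.5, max_n=20, reps=20000, seed=0):
    """Optimistic stopping (illustrative): one N(0, 1) arm, stop as soon as
    the running mean exceeds `threshold`, or after `max_n` pulls."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        s, n = 0.0, 0
        while n < max_n:
            s += rng.gauss(0.0, 1.0)
            n += 1
            if s / n > threshold:
                break
        total += s / n  # true mean is 0, so this is the estimation error
    return total / reps


def choosing_bias(n=10, reps=20000, seed=1):
    """Optimistic choosing (illustrative): two N(0, 1) arms, n pulls each,
    then report the sample mean of the arm that looks better."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        m1 = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        m2 = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        total += max(m1, m2)  # both true means are 0
    return total / reps


if __name__ == "__main__":
    print(f"optimistic stopping bias: {stopping_bias():+.3f}")
    print(f"optimistic choosing bias: {choosing_bias():+.3f}")
```

Both estimates should be positive in this toy setting: stopping early rewards lucky streaks with a small denominator, and choosing the better-looking arm selects on upward noise.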

slide-22
SLIDE 22

Theorem [Informal]

Monotone effect of a sample: view $\mathbb{1}(\kappa = k) / N_k(\mathcal{T})$ as a function of a sample from arm $k$.

slide-23
SLIDE 23

Theorem [Informal]

Monotone effect of a sample: view $\mathbb{1}(\kappa = k) / N_k(\mathcal{T})$ as a function of a sample from arm $k$.

  • Increasing: positive bias

slide-24
SLIDE 24

Theorem [Informal]

Monotone effect of a sample: view $\mathbb{1}(\kappa = k) / N_k(\mathcal{T})$ as a function of a sample from arm $k$.

  • Increasing: positive bias
  • Decreasing: negative bias

slide-25
SLIDE 25

Theorem [Informal]

Monotone effect of a sample: view $\mathbb{1}(\kappa = k) / N_k(\mathcal{T})$ as a function of a sample from arm $k$.

  • Increasing: positive bias
  • Decreasing: negative bias

Agnostic to algorithm

slide-26
SLIDE 26

Theorem [Informal]

Monotone effect of a sample: view $\mathbb{1}(\kappa = k) / N_k(\mathcal{T})$ as a function of a sample from arm $k$.

  • Increasing: positive bias
  • Decreasing: negative bias

Agnostic to algorithm

Includes Nie et al. 2018 as a special case

slide-27
SLIDE 27

Theorem [Informal]

Monotone effect of a sample: view $\mathbb{1}(\kappa = k) / N_k(\mathcal{T})$ as a function of a sample from arm $k$.

  • Increasing: positive bias
  • Decreasing: negative bias

Agnostic to algorithm

Includes Nie et al. 2018 as a special case

Positive bias under best arm identification, sequential testing
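
The monotonicity condition can be probed directly on toy examples. The sketch below assumes a device that is not spelled out in the slides: each arm has a pre-generated reward "tape", so that one sample can be increased while everything else is held fixed. It then checks that 1(κ = 1) / N_1 does not increase under a two-armed greedy rule at a fixed time (with κ fixed to arm 1), and does not decrease when the arm with the larger sample mean is chosen after equal sampling. All function names are hypothetical.

```python
import random


def greedy_count(tape1, tape2, horizon):
    """N_1(horizon) under greedy sampling when the i-th pull of arm k
    returns tape_k[i] (ties go to arm 1)."""
    n1, n2 = 1, 1
    s1, s2 = tape1[0], tape2[0]
    for _ in range(horizon - 2):
        if s1 / n1 >= s2 / n2:
            s1 += tape1[n1]
            n1 += 1
        else:
            s2 += tape2[n2]
            n2 += 1
    return n1


def chosen_indicator(tape1, tape2, n):
    """1(kappa = 1) when each arm is pulled n times and the arm with the
    larger sample mean is chosen (ties go to arm 1)."""
    return 1 if sum(tape1[:n]) >= sum(tape2[:n]) else 0


def check_monotone(trials=2000, horizon=10, seed=0):
    """Increase one sample of arm 1 and check the direction of the effect."""
    rng = random.Random(seed)
    for _ in range(trials):
        tape1 = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
        tape2 = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
        bumped = list(tape1)
        bumped[rng.randrange(horizon)] += rng.uniform(0.0, 2.0)

        # Greedy sampling, fixed arm and time: 1 / N_1 must not increase.
        before = 1.0 / greedy_count(tape1, tape2, horizon)
        after = 1.0 / greedy_count(bumped, tape2, horizon)
        if after > before:
            return False

        # Choosing the better-looking arm: 1(kappa = 1) / n must not decrease.
        if chosen_indicator(bumped, tape2, horizon) < chosen_indicator(tape1, tape2, horizon):
            return False
    return True


if __name__ == "__main__":
    print("monotonicity holds on all trials:", check_monotone())
```

In this toy setting the greedy rule falls on the "decreasing" side (negative bias) and the choosing rule on the "increasing" side (positive bias), matching the two directions listed above.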

slide-28
SLIDE 28

Are sample means in multi-armed bandits positively or negatively biased?

Jaehyeok Shin, Aaditya Ramdas and Alessandro Rinaldo

Poster #12 @ Hall B + C