
Are sample means in multi-armed bandits positively or negatively biased?

Are sample means in multi-armed bandits positively or negatively biased? Jaehyeok Shin¹, Aaditya Ramdas¹,² and Alessandro Rinaldo¹. ¹Dept. of Statistics and Data Science, ²Machine Learning Dept., CMU. Poster #12 @ Hall B + C.


  1. Are sample means in multi-armed bandits positively or negatively biased? Jaehyeok Shin¹, Aaditya Ramdas¹,² and Alessandro Rinaldo¹. ¹Dept. of Statistics and Data Science, ²Machine Learning Dept., CMU. Poster #12 @ Hall B + C.

  2. Stochastic multi-armed bandit: $K$ arms with means $\mu_1, \mu_2, \dots, \mu_K$. Pulling an arm yields a "random reward" $Y$ drawn from that arm's distribution.

  3-11. Adaptive sampling scheme to maximize rewards / to identify the best arm. [Animated figure: along a time axis, an arm among $\mu_1, \mu_2, \dots, \mu_K$ is sampled at each step $t = 1, 2, \dots$ and a reward $Y_1, Y_2, \dots$ is observed, until a stopping time $\mathcal{T}$.]

  12. Collected data can be used to identify an interesting arm...

  13. ...and data can be used to estimate the mean: the sample mean $\hat{\mu}_\kappa(\mathcal{T})$ of the chosen arm $\kappa$.

  14. Q. Bias of sample mean? Is $\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa] \le 0$ or $\ge 0$?

  15-17. Nie et al. 2018: Sample mean is negatively biased, $\mathbb{E}[\hat{\mu}_k(t) - \mu_k] \le 0$, for a fixed arm $k$ at a fixed time $t$. This work: sample mean of the chosen arm at a stopping time, $\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa]$, for a chosen arm $\kappa$ and a stopping time $\mathcal{T}$.

  18-21. This work: The bias of the sample mean of the chosen arm at the stopping time, $\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa]$, is (a) negative under 'optimistic sampling'; (b) positive under 'optimistic stopping'; (c) positive under 'optimistic choosing'.

  22-27. Monotone effect of a sample. Theorem [Informal]: view $\mathbb{1}(\kappa = k) / N_k(\mathcal{T})$ as a function of a sample from arm $k$. If it is increasing, the sample mean of arm $k$ is positively biased; if it is decreasing, the sample mean is negatively biased. The result is agnostic to the algorithm, includes Nie et al. 2018 as a special case, and implies positive bias under best arm identification and sequential testing.

  28. Poster #12 @ Hall B + C Are sample means in multi-armed bandits positively or negatively biased? Jaehyeok Shin, Aaditya Ramdas and Alessandro Rinaldo
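The three regimes named on the slides (optimistic sampling, optimistic stopping, optimistic choosing) can be illustrated with a small Monte Carlo sketch. This is not the authors' code: the greedy rule, the "stop once the sample mean is positive" rule, the standard normal rewards with every true mean equal to 0, and all function names are assumptions chosen only to make the sign of the bias visible.

```python
import random

def greedy_arm0_mean(horizon, rng):
    """Optimistic sampling (assumed greedy rule): report arm 0 at a fixed time."""
    sums = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # one initial pull per arm
    counts = [1, 1]
    for _ in range(horizon - 2):
        # pull the arm whose current sample mean is larger
        arm = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
        sums[arm] += rng.gauss(0.0, 1.0)
        counts[arm] += 1
    return sums[0] / counts[0]

def stopped_mean(max_pulls, rng):
    """Optimistic stopping (assumed rule): stop once the sample mean is positive."""
    total, n = 0.0, 0
    while n < max_pulls:
        total += rng.gauss(0.0, 1.0)
        n += 1
        if total > 0.0:
            break
    return total / n

def chosen_mean(pulls_per_arm, rng):
    """Optimistic choosing: fixed sampling, then report the arm that looks best."""
    means = []
    for _ in range(2):
        draws = [rng.gauss(0.0, 1.0) for _ in range(pulls_per_arm)]
        means.append(sum(draws) / pulls_per_arm)
    return max(means)

def estimate_bias(run_once, n_runs=20000, seed=0):
    """Average many runs; every true mean is 0, so the average estimates the bias."""
    rng = random.Random(seed)
    return sum(run_once(rng) for _ in range(n_runs)) / n_runs

if __name__ == "__main__":
    print("optimistic sampling:", estimate_bias(lambda r: greedy_arm0_mean(10, r)))
    print("optimistic stopping:", estimate_bias(lambda r: stopped_mean(10, r)))
    print("optimistic choosing:", estimate_bias(lambda r: chosen_mean(5, r)))
```

Under these assumptions the first estimate should come out negative and the other two positive, matching (a), (b) and (c). The smallest cases of this toy setup can also be checked by hand: `greedy_arm0_mean` with `horizon=3` has bias $-1/(4\sqrt{\pi}) \approx -0.141$, `stopped_mean` with `max_pulls=2` has bias $1/(2\sqrt{2\pi}) \approx 0.199$, and `chosen_mean` with 5 pulls per arm has bias $1/\sqrt{5\pi} \approx 0.252$.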
