Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability
Yunzong Xu (MIT)
Joint work with David Simchi-Levi (MIT)
July 18, RealML @ ICML 2020
Stochastic Contextual Bandits
For round $t = 1, \dots, T$:
unknown distribution $D_{\text{context}}$
$r_t(x_t, a_t) \in [0,1]$ according to an unknown distribution $D_{x_t, a_t}$ with (conditional) mean $\mathbb{E}\left[r_t(x_t, a_t) \mid x_t = x, a_t = a\right] = f^*(x, a)$
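To make the protocol concrete, here is a minimal Python simulation sketch. Every specific choice is an assumption for illustration only: contexts drawn from Uniform[0, 1] as a stand-in for $D_{\text{context}}$, $K = 3$ actions, Bernoulli rewards as one possible choice of $D_{x_t, a_t}$, a hypothetical mean function `f_star`, and a uniformly random placeholder policy rather than any learning algorithm.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical mean-reward function f*(x, a); (1 + sin(.)) / 2 keeps it in [0, 1].
def f_star(x: float, a: int) -> float:
    return (1.0 + math.sin(x + a)) / 2.0

# A policy maps a context (and a random source) to an action in {0, ..., K-1}.
Policy = Callable[[float, random.Random], int]

def run_protocol(T: int, policy: Policy, seed: int = 0) -> List[Tuple[float, int, float]]:
    """Simulate T rounds of the stochastic contextual bandit protocol."""
    rng = random.Random(seed)
    history = []
    for _ in range(T):
        # 1. Nature draws x_t i.i.d. from D_context (assumed Uniform[0, 1] here).
        x_t = rng.random()
        # 2. The learner observes x_t and picks an action a_t.
        a_t = policy(x_t, rng)
        # 3. Nature draws r_t(x_t, a_t) in [0, 1] with conditional mean f*(x_t, a_t);
        #    a Bernoulli reward is one such distribution D_{x_t, a_t}.
        r_t = 1.0 if rng.random() < f_star(x_t, a_t) else 0.0
        # 4. Only the reward of the chosen action is revealed (bandit feedback).
        history.append((x_t, a_t, r_t))
    return history

K = 3
# Placeholder learner: ignores the context and picks an action uniformly at random.
uniform_policy: Policy = lambda x, rng: rng.randrange(K)
history = run_protocol(T=5000, policy=uniform_policy)
```

Because the rewards are Bernoulli with mean `f_star(x_t, a_t)`, the average of `r_t - f_star(x_t, a_t)` over `history` should be close to zero, which is a quick sanity check that the simulated rewards have the stated conditional mean.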
models
$f^* \in \mathcal{F}$ (realizability)
general function class $\mathcal{F}$?
efficient?
the above two challenges in practice, as they typically
Thompson Sampling)
bandits)
regression with a general $\mathcal{F}$ are very well studied in ML
least squares regression oracle (ERM with square loss):
$$\min_{f \in \mathcal{F}} \sum_{t=1}^{n} \big(f(x_t, a_t) - r_t(x_t, a_t)\big)^2,$$
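For a finite function class, the oracle can be implemented by brute force, which makes the definition easy to check. The sketch below assumes a hypothetical one-parameter family $f_\theta(x, a) = \theta\,(1 + \sin(x + a))/2$ on a five-point grid of $\theta$ values, with the true function ($\theta = 0.6$) inside the class as realizability requires. Rich classes such as neural networks or trees would instead call a dedicated regression solver; nothing here is taken from the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int, float]          # one observation (x, a, r)
RewardFn = Callable[[float, int], float]   # candidate mean-reward function f(x, a)

def square_loss(f: RewardFn, data: Sequence[Sample]) -> float:
    """Empirical square loss: sum over samples of (f(x, a) - r)^2."""
    return sum((f(x, a) - r) ** 2 for x, a, r in data)

def least_squares_oracle(F: Sequence[RewardFn], data: Sequence[Sample]) -> RewardFn:
    """ERM with square loss over a finite class F, by exhaustive search."""
    return min(F, key=lambda f: square_loss(f, data))

def make_f(theta: float) -> RewardFn:
    # Hypothetical family f_theta(x, a) = theta * (1 + sin(x + a)) / 2, in [0, 1].
    return lambda x, a: theta * (1.0 + math.sin(x + a)) / 2.0

THETAS = [0.2, 0.4, 0.6, 0.8, 1.0]
F = [make_f(th) for th in THETAS]
f_star = F[2]  # realizability: the true function (theta = 0.6) lies in F

def generate_data(n: int, K: int, seed: int = 0) -> List[Sample]:
    """Draw n samples with uniform contexts/actions and Bernoulli(f*(x, a)) rewards."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        a = rng.randrange(K)
        r = 1.0 if rng.random() < f_star(x, a) else 0.0
        data.append((x, a, r))
    return data

data = generate_data(n=20000, K=3)
f_hat = least_squares_oracle(F, data)
theta_hat = THETAS[F.index(f_hat)]
```

Under realizability the population square loss is minimized by $f^*$, since $\mathbb{E}[(f - r)^2] - \mathbb{E}[(f^* - r)^2] = \mathbb{E}[(f - f^*)^2]$ when $\mathbb{E}[r \mid x, a] = f^*(x, a)$; with enough samples the oracle should therefore return the $\theta = 0.6$ member of the class.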
can we design an algorithm that achieves the optimal regret via a few calls to this oracle?
(2018), Foster and Rakhlin (2020)