

  1. Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability
  Yunzong Xu (MIT), joint work with David Simchi-Levi (MIT)
  July 18, RealML @ ICML 2020

  2. Stochastic Contextual Bandits
  • For round t = 1, …, T:
    • Nature generates a random context x_t according to a fixed unknown distribution D_context
    • Learner observes x_t and makes a decision a_t ∈ {1, …, K}
    • Nature generates a random reward r_t(x_t, a_t) ∈ [0, 1] according to an unknown distribution D_{x_t, a_t} with (conditional) mean E[r_t | x_t = x, a_t = a] = f*(x, a)
  • We call f* the ground-truth reward function
  • In statistical learning, people use a function class F to approximate f*. Some examples of F:
    • Linear classes / high-dimensional linear classes / generalized linear models
    • Reproducing kernel Hilbert spaces
    • Lipschitz and Hölder spaces
    • Neural networks
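The interaction protocol on this slide can be sketched as a simple simulation loop. All names below (`run_protocol`, `sample_context`, `f_star`, `policy`) are illustrative choices, not from the talk; rewards are drawn as Bernoulli with the given conditional mean, one simple way to realize a stochastic reward in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

def run_protocol(T, K, sample_context, f_star, policy):
    """Sketch of the stochastic contextual bandit protocol.

    sample_context() draws a context x_t from a fixed distribution;
    f_star(x, a) is the ground-truth mean reward; policy(x) returns
    an action in {0, ..., K-1}. (Illustrative names, not the paper's.)
    """
    total_reward = 0.0
    for t in range(T):
        x = sample_context()        # nature draws context x_t
        a = policy(x)               # learner picks action a_t
        mean = f_star(x, a)         # conditional mean f*(x_t, a_t)
        r = rng.binomial(1, mean)   # stochastic reward in [0, 1]
        total_reward += r
    return total_reward
```

A learner's goal is to maximize the cumulative reward, i.e., to minimize regret against the policy that always plays argmax_a f*(x, a).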

  3. Challenges
  • We are interested in contextual bandits with a general function class F
  • Realizability assumption: f* ∈ F
  • Statistical challenge: how to achieve the minimax-optimal regret for a general function class F?
  • Computational challenge: how to make the algorithm computationally efficient?
  • Existing contextual bandit approaches cannot address both challenges simultaneously in practice, as they typically
    • rely on strong parametric/structural assumptions on F (e.g., UCB variants and Thompson sampling)
    • become computationally intractable for large F (e.g., EXP4)
    • assume computationally expensive or statistically restrictive oracles that are implementable only for specific F (a series of works on oracle-based contextual bandits)

  4. Research Question
  • Observation: the statistical and computational aspects of "offline regression with a general F" are very well studied in ML
  • Can we reduce general contextual bandits to general offline regression?
  • Specifically, for any F, given an offline regression oracle, i.e., a least-squares regression oracle (ERM with square loss)
      min_{f ∈ F} Σ_{t=1}^{T} (f(x_t, a_t) − r_t)²,
    can we design an algorithm that achieves the optimal regret via a few calls to this oracle?
  • An open problem mentioned in Agarwal et al. (2012), Foster et al. (2018), and Foster and Rakhlin (2020)
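For a concrete instance of such an oracle, here is a minimal sketch assuming F is a linear class, where ERM with square loss reduces to ordinary least squares. The function name and the feature-matrix encoding are my own illustration; the reduction itself only assumes access to some such oracle for the chosen F.

```python
import numpy as np

def least_squares_oracle(X, R):
    """Offline least-squares regression oracle for a *linear* class F:
    returns argmin over theta of sum_t (theta . phi_t - R_t)^2,
    where row t of X is the feature vector phi(x_t, a_t) and R_t = r_t.
    (Linear F is just one example; other classes need other solvers.)
    """
    theta, *_ = np.linalg.lstsq(X, R, rcond=None)
    return theta
```

For richer classes (kernels, neural networks), the same interface is served by the corresponding standard regression routine; the point of the reduction is that any such solver plugs in as a black box.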

  5. Our Contributions
  • We provide the first optimal and efficient offline-regression-oracle-based algorithm for general contextual bandits (under realizability)
  • The algorithm is much simpler and faster than existing approaches to general contextual bandits
  • We provide the first universal and optimal black-box reduction from contextual bandits to offline regression
  • Any advance in offline (square-loss) regression immediately translates to contextual bandits, statistically and computationally
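The action-selection rule at the heart of this line of work, inverse gap weighting (used in the authors' algorithm and in Foster–Rakhlin's SquareCB), can be sketched as follows. Given the oracle's predicted rewards for the current context, each suboptimal action is played with probability inversely proportional to its predicted gap; the choice of the scale parameter gamma (tuned across epochs in the full algorithm) is elided here.

```python
import numpy as np

def inverse_gap_weighting(f_hat_values, gamma):
    """Sketch of the inverse-gap-weighting selection rule.

    f_hat_values[a] is the oracle's predicted reward for action a;
    gamma > 0 trades off exploration and exploitation. Each suboptimal
    action a gets p(a) = 1 / (K + gamma * gap(a)), where gap(a) is the
    predicted-reward gap to the greedy action, which receives the
    remaining probability mass.
    """
    f_hat_values = np.asarray(f_hat_values, dtype=float)
    K = len(f_hat_values)
    best = int(np.argmax(f_hat_values))
    gaps = f_hat_values[best] - f_hat_values
    p = 1.0 / (K + gamma * gaps)
    p[best] = 0.0
    p[best] = 1.0 - p.sum()  # greedy action absorbs leftover mass
    return p
```

Larger gamma concentrates more mass on the greedy action; the full algorithm re-fits the oracle and increases gamma over epochs, which is where the regret analysis lives.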
