The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits




  1. The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits Authors: John Langford, Tong Zhang Presented by: Ben Flora

  2. Overview
  • Bandit problem
  • Contextual bandits
  • Epoch-Greedy algorithm

  3. Overview
  • Bandit problem
  • Contextual bandits
  • Epoch-Greedy algorithm

  4. Bandits
  • K arms; each arm i
  – Wins (reward 1) with probability p_i
  – Loses (reward 0) with probability 1 − p_i
  • Exploration vs. exploitation
  – Exploration is unbiased
  – Exploitation is biased only by the exploration that produced it
  • Regret = max return − actual return
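The bandit setup above can be sketched in a few lines of Python. This is a toy model for illustration only: the arm probabilities and the uniform-random player are assumptions, not part of the slides.

```python
import random

class BernoulliBandit:
    """K arms; arm i pays reward 1 with probability p[i], else 0."""
    def __init__(self, p):
        self.p = p  # hypothetical win probabilities, one per arm

    def pull(self, i):
        return 1 if random.random() < self.p[i] else 0

# Regret = max return minus actual return, as defined on the slide.
bandit = BernoulliBandit([0.2, 0.5, 0.8])
T = 1000
# A player who pulls arms uniformly at random, as a baseline.
total = sum(bandit.pull(random.randrange(3)) for _ in range(T))
regret = max(bandit.p) * T - total  # best expected return minus actual
```

A smarter player would shift pulls toward the arm with the highest observed win rate, which is exactly the exploration/exploitation tension the slide names.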

  5. Web Example
  • Some number of ads that can be displayed
  – Each ad translates to an arm
  • Each ad can be clicked on by a user
  – Reward 1 if clicked, 0 if not
  • Want ads clicked as often as possible
  – This makes the most money

  6. Overview
  • Bandit problem
  • Contextual bandits
  • Epoch-Greedy algorithm

  7. Contextual Bandits
  • Add context to the bandit problem
  – Information that aids in choosing an arm
  – Helps identify which arm is best
  • The rest follows the bandit problem
  • Want to find the optimal solution
  • More useful than regular bandits

  8. Web Problem
  • Now we have user information
  – A user profile
  – A search query
  – A user's preferences
  • Use this information to choose an ad
  – Better chance of choosing an ad that gets clicked
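The web problem above amounts to a hypothesis mapping context (the query) to an arm (the ad). A minimal sketch, with keyword-to-ad mappings that are pure assumptions for illustration:

```python
# Hypothetical keyword-to-ad hypothesis; the mappings are made up.
ads = {"golf": "sports", "movie": "movie", "repair": "insurance"}

def choose_ad(query):
    """Pick the ad for the first query word we recognize, else a default."""
    for word in query.lower().split():
        if word in ads:
            return ads[word]
    return "insurance"  # default arm when no keyword matches
```

A real contextual bandit would learn this mapping from click feedback rather than hard-coding it, which is what the Epoch-Greedy slides that follow describe.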

  9. Overview
  • Bandit problem
  • Contextual bandits
  • Epoch-Greedy algorithm

  10. Epoch-Greedy Overview
  • Exploration produces unbiased input
  • A black box transforms that input, together with context, into a hypothesis (the best arm)
  • Similar idea to the papers we saw on Thursday

  11. Exploration
  • Look at a fixed time horizon T
  – The time horizon is the total number of pulls
  • Choose a number of exploration steps n
  – (Figure: the first n steps are exploration, the remaining T − n steps are exploitation)

  12. Minimizing Regret
  • No exploration (all exploitation): regret = T
  • All exploration (no exploitation): regret = T
  • Some minimum lies between those extremes
  – (Figure: regret plotted against the number of exploration steps n, up to horizon T)
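The tradeoff on slide 12 can be made concrete with a simple cost model. The exact decay rate of the exploitation error is an assumption here (a generic 1/sqrt(n) learning-curve shape), chosen only to show that the regret curve has an interior minimum:

```python
# Toy regret model: n exploration steps cost about n (random pulls),
# while the T - n exploitation steps pay an error rate shrinking with n.
def total_regret(n, T, c=1.0):
    explore_cost = n                                    # unbiased random pulls
    exploit_cost = (T - n) * c / max(n, 1) ** 0.5       # assumed ~1/sqrt(n) error
    return explore_cost + exploit_cost

T = 10000
best_n = min(range(1, T), key=lambda n: total_regret(n, T))
# Both extremes (n near 0 and n near T) give regret near T;
# the minimum sits strictly between them.
```

Under this model the optimum lands around n ≈ (T/2)^(2/3), small relative to T, which matches the slide's point that a little exploration goes a long way.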

  13. Creating a Hypothesis
  • Simple two-armed case
  • Remember binary thresholds
  • Want to learn the threshold value t
  – If x < t: pick arm 1; if x > t: pick arm 2

  14. Creating a Hypothesis (Cont.)
  • Want to be within ε of the threshold
  – Need ≈ O(1/ε) samples
  • As the function gets more complex
  – Need ≈ O((1/ε) · C) samples
  – C denotes how complex the function is
  – A quick note for those of you who took 156: C is similar to VC dimension
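The two-armed threshold case from slides 13–14 can be sketched directly. The midpoint rule and the uniform sampling here are illustrative assumptions; the point is that the estimate tightens as the number of exploration samples grows:

```python
import random

def learn_threshold(samples):
    """Fit a threshold t from labeled samples (x, best_arm).
    Assumed setup from slide 13: arm 1 is best when x < t, arm 2 when x > t.
    Requires at least one sample on each side of the threshold."""
    below = [x for x, arm in samples if arm == 1]
    above = [x for x, arm in samples if arm == 2]
    return (max(below) + min(above)) / 2  # midpoint of the observed gap

# Simulated exploration: uniform x values labeled by a hidden threshold.
true_t = 0.6
data = [(x, 1 if x < true_t else 2)
        for x in (random.random() for _ in range(200))]
t_hat = learn_threshold(data)
```

With n uniform samples the gap around t shrinks like 1/n, matching the slide's O(1/ε) sample count for accuracy ε.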

  15. Epoch
  • Don't always know the time horizon
  • Append groupings of known time horizons
  – Repeat until time actually ends
  • This paper chooses a single exploration step at the beginning of each epoch

  16. Epoch-Greedy Algorithm
  • Do a single step of exploration
  – Observe the context, pull a random arm, and record the result
  – This builds an unbiased set of inputs for creating the hypothesis
  • Add the new observation to past exploration data and create a new hypothesis
  – This uses the contextual data and the exploration results
  • For a set number of steps, exploit the hypothesis's arm
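The epoch loop described above can be sketched as follows. The interfaces (`pull`, `learn`, `exploit_steps`) and the two-arm restriction are assumptions for this sketch, not the paper's notation; in the paper the number of exploitation steps per epoch comes from a sample-complexity bound on the hypothesis class:

```python
import random

def epoch_greedy(contexts, pull, learn, n_epochs, exploit_steps):
    """Sketch of Epoch-Greedy with assumed interfaces.
    pull(context, arm) -> reward; learn(data) -> hypothesis: context -> arm."""
    data, rewards = [], 0.0
    for ell in range(1, n_epochs + 1):
        # One exploration step per epoch: uniform random arm (two arms here),
        # giving an unbiased sample of (context, arm, reward).
        x = next(contexts)
        arm = random.randrange(2)
        r = pull(x, arm)
        data.append((x, arm, r))
        rewards += r
        # Learn a fresh hypothesis from all exploration data so far.
        h = learn(data)
        # Exploit that hypothesis for exploit_steps(ell) steps.
        for _ in range(exploit_steps(ell)):
            x = next(contexts)
            rewards += pull(x, h(x))
    return rewards
```

Because only the single random pull per epoch enters `data`, the training set stays unbiased even though most pulls are greedy, which is the key property the slides emphasize.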

  17. Review Using Web Example
  • Have a variety of ads that can be shown
  – Sports
  – Movie
  – Insurance

  18. Review (Cont.)
  • Search query: Golf Club Repair
  – Randomly chose an ad
  – Clicked
  • Search query: Car Body Repair
  – See "Repair" and "Car"
  – Not clicked

  19. Review (Cont.)
  • Search query: Horror Movie
  – Randomly chose an ad
  – Clicked
  • Search query: Sheep Movie
  – See "Sheep" and "Movie"
  – Clicked
