The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits


SLIDE 1

The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits

Authors: John Langford, Tong Zhang
Presented by: Ben Flora

SLIDE 2

Overview

  • Bandit problem
  • Contextual bandits
  • Epoch-Greedy algorithm
SLIDE 3

Overview

  • Bandit problem
  • Contextual bandits
  • Epoch-Greedy algorithm
SLIDE 4

Bandits

  • K arms; each arm i:

– Wins (reward 1) with probability p_i
– Loses (reward 0) with probability 1 - p_i

  • Exploration vs. Exploitation

– Exploration is unbiased
– Exploitation is biased by exploration only

  • Regret

– Regret = max return - actual return
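
The bandit setup and the regret definition above can be made concrete with a short simulation. This is a minimal sketch, not anything taken from the slides: the win probabilities in `probs`, the horizon `T`, and the helper names `pull` and `expected_regret` are all assumptions chosen for illustration. It measures expected regret, that is, the best arm's expected return minus the expected return of the arms actually chosen.

```python
import random

def pull(p, rng):
    """Return reward 1 with probability p, otherwise reward 0."""
    return 1 if rng.random() < p else 0

def expected_regret(probs, choices):
    """Best arm's expected return minus the expected return of the chosen arms."""
    best = max(probs)
    return sum(best - probs[a] for a in choices)

rng = random.Random(0)
probs = [0.2, 0.5, 0.7]   # hypothetical win probabilities p_i, one per arm
T = 1000                  # hypothetical number of pulls

# Pure exploration: every pull picks an arm uniformly at random.
choices = [rng.randrange(len(probs)) for _ in range(T)]
rewards = [pull(probs[a], rng) for a in choices]

print("total reward:", sum(rewards))
print("expected regret:", round(expected_regret(probs, choices), 1))
```

Always pulling the best arm would give zero regret here; the uniformly random policy pays for every pull it spends on a worse arm.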

SLIDE 5

Web Example

  • Some number of ads that can be displayed

– Each ad translates to an arm

  • Each ad can be clicked on by a user

– If clicked, reward 1; if not, reward 0

  • Want to have ads clicked as often as possible

– This will make the most money

SLIDE 6

Overview

  • Bandit problem
  • Contextual bandits
  • Epoch-Greedy algorithm
SLIDE 7

Contextual Bandits

  • Add Context to the bandit problem

– Information that aids in choosing an arm
– Helps identify which arm is best

  • The rest follows the Bandit problem
  • Want to find optimal solution
  • More useful than regular bandits
SLIDE 8

Web Problem

  • Now we have user information

– A user profile
– Search query
– A user's preferences

  • Use this information to choose an ad

– Better chance of choosing an ad that is clicked on

SLIDE 9

Overview

  • Bandit problem
  • Contextual bandits
  • Epoch-Greedy algorithm
SLIDE 10

Epoch-Greedy Overview

[Diagram: Exploration (unbiased input) → Black Box (transforms input into hypotheses) → Hypotheses (best arm), with Context as an additional labeled element]

Similar idea to the papers we saw on Thursday

SLIDE 11

Exploration

  • Look at a fixed time horizon

– Time horizon is the total number of pulls

  • Choose a number of Exploration steps

[Diagram: a timeline of T steps; the first n steps are exploration and the remaining T - n steps are exploitation]

SLIDE 12

Minimizing Regret

  • No exploration (n = 0): regret ≈ T
  • All exploration (n = T): regret ≈ T
  • Some minimum between those points

[Plots: regret (vertical axis, up to T) against the number of exploration steps n (horizontal axis, 0 to T)]
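
The trade-off between the two extremes can be illustrated with a small simulation of n exploration steps followed by T - n exploitation steps. This is only a sketch under assumptions that are not in the slides: two context-free arms with made-up win probabilities, round-robin exploration, and an arbitrary fallback to the first arm when no data has been collected. The function names are hypothetical.

```python
import random

def explore_then_exploit(probs, T, n, rng):
    """Expected regret of n round-robin exploration pulls, then T - n greedy pulls."""
    K = len(probs)
    wins = [0] * K
    pulls = [0] * K
    best = max(probs)
    regret = 0.0
    for t in range(n):                      # exploration: cycle through the arms
        a = t % K
        wins[a] += 1 if rng.random() < probs[a] else 0
        pulls[a] += 1
        regret += best - probs[a]
    # Exploitation: commit to the arm with the highest empirical win rate.
    # With no data at all, this arbitrarily falls back to the first arm.
    means = [wins[a] / pulls[a] if pulls[a] else 0.0 for a in range(K)]
    greedy = means.index(max(means))
    regret += (T - n) * (best - probs[greedy])
    return regret

def average_regret(probs, T, n, runs=200, seed=0):
    """Average the regret over several independent runs."""
    rng = random.Random(seed)
    return sum(explore_then_exploit(probs, T, n, rng) for _ in range(runs)) / runs

probs = [0.3, 0.7]   # hypothetical arms; the second arm is better
T = 1000
for n in (0, 10, 50, 200, 1000):
    print(n, round(average_regret(probs, T, n), 1))
```

In this toy setting the regret grows linearly in T at both ends (n = 0 and n = T) and is much smaller for a moderate number of exploration steps, which is the shape the slide describes.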

SLIDE 13

Creating a Hypothesis

  • Simple two-armed case
  • Remember binary thresholds
  • Want to learn the threshold value

[Diagram: a number line with the threshold t and a margin of ε on either side]

If x < t: pick arm 1
If x > t: pick arm 2
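
The threshold rule can be sketched in a few lines. The example below makes simplifying assumptions that the slides do not state: contexts x are drawn uniformly from [0, 1], and each exploration sample is labeled with the truly better arm (no noise). The names `learn_threshold`, `hypothesis`, and `true_t` are hypothetical.

```python
import random

def learn_threshold(samples):
    """Estimate t from (x, best_arm) pairs, assuming noise-free labels.

    The true threshold lies between the largest x labeled arm 1 and the
    smallest x labeled arm 2, so the midpoint of that gap is returned.
    """
    lo = max((x for x, arm in samples if arm == 1), default=0.0)
    hi = min((x for x, arm in samples if arm == 2), default=1.0)
    return (lo + hi) / 2

def hypothesis(t_hat):
    """Turn an estimated threshold into a rule mapping context x to an arm."""
    return lambda x: 1 if x < t_hat else 2

rng = random.Random(0)
true_t = 0.37   # hypothetical true threshold, unknown to the learner
for n in (10, 100, 1000):
    xs = [rng.random() for _ in range(n)]
    samples = [(x, 1 if x < true_t else 2) for x in xs]
    t_hat = learn_threshold(samples)
    print(n, "samples, error:", abs(t_hat - true_t))
```

The gap around the true threshold shrinks as more samples arrive, so the estimate gets closer to t with more exploration.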

SLIDE 14

Creating a Hypothesis (Cont.)

  • Want to be within ε of the threshold

– Need ≈ O(1/ε) exploration samples

  • As the function gets more complex

– Need ≈ O((1/ε)*C) samples
– C denotes how complex the function is
– A quick note for those of you who took 156: C is similar to the VC dimension
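
Taking the sample count above at face value, one can work out roughly how many exploration steps minimize regret over a fixed horizon T. The following is a back-of-the-envelope derivation, not the bound proved in the paper. It assumes that each exploration step costs at most 1 unit of regret and that each exploitation step with an ε-accurate hypothesis costs about ε.

```latex
\begin{align*}
\varepsilon(n) &\approx \frac{C}{n}
  && \text{(inverting } n \approx C/\varepsilon \text{)} \\
\mathrm{Regret}(n) &\lesssim n + (T - n)\,\varepsilon(n)
  \approx n + \frac{CT}{n} - C \\
\frac{d}{dn}\,\mathrm{Regret}(n) &= 1 - \frac{CT}{n^{2}} = 0
  \quad\Longrightarrow\quad n^{*} = \sqrt{CT} \\
\mathrm{Regret}(n^{*}) &\approx 2\sqrt{CT} - C
\end{align*}
```

Under these assumptions the best number of exploration steps grows like the square root of T, so the regret grows more slowly than T itself.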

SLIDE 15

Epoch

  • Don’t always know the time horizon
  • Append groupings of known time horizons

– Repeat until time actually ends

  • This specific paper has chosen a single exploration step at the beginning of each epoch

SLIDE 16

Epoch-Greedy Algorithm

  • Do a single step of exploration

– Begin creating an unbiased vector of inputs to create the hypothesis
– Observe context information

  • Add the learned information to past exploration and create a new hypothesis

– This uses the contextual data and exploration

  • For a set number of steps, exploit the arm chosen by the hypothesis
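
The three steps above can be sketched as a loop. This is an illustrative sketch, not the paper's exact procedure. The finite set of threshold hypotheses, the simulated environment in `env_reward`, and the exploitation schedule (epoch k exploits for k steps) are all assumptions made for the example; the paper chooses the number of exploitation steps differently.

```python
import random

THRESHOLDS = [i / 20 for i in range(21)]   # hypothetical finite hypothesis class

def policy(t, x):
    """Threshold hypothesis: arm 1 for x < t, arm 2 otherwise."""
    return 1 if x < t else 2

def best_hypothesis(history):
    """Pick the threshold with the highest empirical reward on exploration data.

    Arms were explored uniformly at random, so summing reward over the samples
    where the hypothesis agrees with the explored arm estimates the reward of
    the hypothesis up to a constant factor.
    """
    def score(t):
        return sum(r for x, a, r in history if policy(t, x) == a)
    return max(THRESHOLDS, key=score)

def epoch_greedy(env_reward, total_steps, rng):
    history, total_reward, step, epoch = [], 0, 0, 0
    t_hat = THRESHOLDS[0]
    while step < total_steps:
        epoch += 1
        # 1) One exploration step: observe context, pull a uniformly random arm.
        x = rng.random()
        a = rng.choice((1, 2))
        r = env_reward(x, a, rng)
        history.append((x, a, r))
        total_reward += r
        step += 1
        # 2) Refit the hypothesis on all exploration data collected so far.
        t_hat = best_hypothesis(history)
        # 3) Exploit for a set number of steps (assumed schedule: the epoch index).
        for _ in range(min(epoch, total_steps - step)):
            x = rng.random()
            total_reward += env_reward(x, policy(t_hat, x), rng)
            step += 1
    return t_hat, total_reward

def env_reward(x, a, rng, true_t=0.4):
    """Hypothetical environment: the arm matching the true threshold wins 90% of the time."""
    p = 0.9 if a == policy(true_t, x) else 0.1
    return 1 if rng.random() < p else 0

t_hat, reward = epoch_greedy(env_reward, 5000, random.Random(0))
print("learned threshold:", t_hat, "average reward:", reward / 5000)
```

Only the exploration samples feed the hypothesis, which keeps the training data unbiased; the exploitation steps use the current hypothesis but are never added to `history`.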

SLIDE 17

Review Using Web Example

  • Have a variety of ads that can be shown

– Sports
– Movie
– Insurance

SLIDE 18

Review (Cont.)

  • Search Query

– Golf Club Repair
– Randomly choose
– Clicked

  • Search Query

– Car Body Repair
– Sees "Repair" and "Car" in the query
– Not Clicked

SLIDE 19

Review (Cont.)

  • Search Query

– Horror Movie
– Randomly choose
– Clicked

  • Search Query

– Sheep Movie
– Sees "Sheep" and "Movie" in the query
– Clicked