Meta-Learning Contextual Bandit Exploration
Amr Sharaf, University of Maryland, amr@cs.umd.edu; Hal Daumé III, Microsoft Research & University of Maryland, me@hal3.name
Abstract
Can we learn to explore in contextual bandits?
Goal: Maximize Sum of Rewards
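To make the goal concrete, here is a minimal sketch of a contextual bandit loop. It assumes a simulated setting built from labeled examples (reward 1 for the correct action, 0 otherwise) and uses plain epsilon-greedy exploration with one linear reward estimate per action. `run_bandit`, `epsilon`, and `lr` are hypothetical names, and this is a fixed, hand-designed exploration rule, not a learned one and not Mêlée itself.

```python
import random
from typing import List, Sequence, Tuple


def run_bandit(examples: Sequence[Tuple[List[float], int]], num_actions: int,
               epsilon: float = 0.1, lr: float = 0.1, seed: int = 0) -> float:
    """Simulate epsilon-greedy contextual bandit learning; return the sum of rewards."""
    rng = random.Random(seed)
    dim = len(examples[0][0])
    # One linear reward estimate per action (hypothetical learner).
    weights = [[0.0] * dim for _ in range(num_actions)]
    total_reward = 0.0
    for x, label in examples:
        scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
        greedy = max(range(num_actions), key=lambda a: scores[a])
        # Explore uniformly with probability epsilon, otherwise exploit.
        action = rng.randrange(num_actions) if rng.random() < epsilon else greedy
        # Bandit feedback: only the reward of the chosen action is observed.
        reward = 1.0 if action == label else 0.0
        total_reward += reward
        # Move the chosen action's estimate toward the observed reward.
        error = reward - scores[action]
        weights[action] = [w_i + lr * error * x_i for w_i, x_i in zip(weights[action], x)]
    return total_reward


# Toy data: two contexts, each with a different correct action.
data = [([1.0, 0.0], 0), ([0.0, 1.0], 1)] * 200
print(run_bandit(data, num_actions=2, epsilon=0.0))  # 200.0
print(run_bandit(data, num_actions=2, epsilon=0.1))
```

With `epsilon=0.0` the greedy learner breaks the initial tie toward action 0 and never tries action 1 on the second context, so it collects exactly 200 of the 400 possible rewards; some exploration is what lets it discover the better action.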
[Figure: examples over time. Roll-in with π, then a deviation, then roll-out with π*; the diagram marks loss (explore) and loss (exploit).]
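The roll-in / deviation / roll-out scheme can be sketched generically as below, in the style of imitation-learning (learning-to-search) methods: follow the learned policy π up to step t, try each candidate action at step t, finish with the reference policy π*, and record the total loss of each deviation. Everything here is an assumption for illustration: `deviation_costs`, the state representation (the tuple of past actions), the toy loss, and the toy reference policy are hypothetical, and this is not presented as the exact Mêlée training procedure.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical state: the sequence of actions taken so far.
State = Tuple[str, ...]
Policy = Callable[[State], str]
LossFn = Callable[[State, str], float]


def deviation_costs(horizon: int, t: int, pi: Policy, pi_star: Policy,
                    loss: LossFn, deviations: List[str]) -> Dict[str, float]:
    """Total loss of each one-step deviation at time t.

    Steps before t follow the learned policy pi (roll-in), step t takes the
    deviation being evaluated, and later steps follow pi_star (roll-out).
    """
    costs = {}
    for dev in deviations:
        state: State = ()
        total = 0.0
        for step in range(horizon):
            if step < t:
                action = pi(state)        # roll-in with the learned policy
            elif step == t:
                action = dev              # the deviation under evaluation
            else:
                action = pi_star(state)   # roll-out with the reference policy
            total += loss(state, action)
            state = state + (action,)
        costs[dev] = total
    return costs


# Made-up toy problem: each explore step costs 0.5; exploit costs 1
# until something has been explored, and 0 afterwards.
def toy_loss(state: State, action: str) -> float:
    if action == "explore":
        return 0.5
    return 0.0 if "explore" in state else 1.0


def always_exploit(state: State) -> str:
    return "exploit"


def reference(state: State) -> str:
    # Hypothetical reference policy: explore once, then exploit.
    return "exploit" if "explore" in state else "explore"


costs = deviation_costs(5, 2, always_exploit, reference, toy_loss, ["explore", "exploit"])
print(costs)  # {'explore': 2.5, 'exploit': 3.5}
```

In this toy run, deviating to explore at step 2 leads to lower total loss than exploiting, which is the kind of cost-sensitive signal an imitation learner can use to update π in that state.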
Win statistics: each (row, column) entry shows the number of times the row algorithm won against the column algorithm, minus the number of times it lost.
Win statistics: each (row, column) entry shows the number of times the row algorithm won against the column algorithm, minus the number of times it lost.
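The win statistic can be computed from per-dataset scores as in this minimal sketch. It assumes that "won" means a strictly higher score on a dataset and that ties count as neither a win nor a loss; the slides do not say how wins are decided (for example, whether a significance test is applied), and `win_statistics` is a hypothetical helper.

```python
from typing import Dict, List


def win_statistics(scores: Dict[str, List[float]]) -> Dict[str, Dict[str, int]]:
    """Entry [row][col] = datasets where row beats col minus datasets where it loses."""
    table: Dict[str, Dict[str, int]] = {}
    for row, row_scores in scores.items():
        table[row] = {}
        for col, col_scores in scores.items():
            net = 0
            for r, c in zip(row_scores, col_scores):
                if r > c:
                    net += 1   # row algorithm wins on this dataset
                elif r < c:
                    net -= 1   # row algorithm loses; ties count as neither
            table[row][col] = net
    return table


# Made-up per-dataset scores (higher is better), not results from the paper.
scores = {"A": [0.9, 0.7, 0.8], "B": [0.8, 0.7, 0.85], "C": [0.5, 0.6, 0.9]}
for row, cols in win_statistics(scores).items():
    print(row, cols)
# A {'A': 0, 'B': 0, 'C': 1}
# B {'A': 0, 'B': 0, 'C': 1}
# C {'A': -1, 'B': -1, 'C': 0}
```

By construction the table is antisymmetric (entry [a][b] equals minus entry [b][a]) with zeros on the diagonal, so a row with mostly positive entries indicates an algorithm that tends to beat the others.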
meta-learning setting.
sufficiently quickly, Mêlée will achieve sublinear regret.
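For reference, the usual definition of cumulative regret in contextual bandits is written below in our own notation, since the slides do not define these symbols: x_t is the context at round t, a_t the action the learner takes, r_t the reward function, and Π a class of comparison policies. It is often stated in expectation over the learner's randomness. Regret is sublinear when the average regret per round goes to zero.

```latex
\mathrm{Regret}(T) = \max_{\pi \in \Pi} \sum_{t=1}^{T} r_t\bigl(\pi(x_t)\bigr) - \sum_{t=1}^{T} r_t(a_t),
\qquad
\text{sublinear regret:}\quad \lim_{T \to \infty} \frac{\mathrm{Regret}(T)}{T} = 0 .
```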