Meta-Learning Contextual Bandit Exploration, by Amr Sharaf and Hal Daumé III (PowerPoint presentation)



SLIDE 1

Meta-Learning Contextual Bandit Exploration

Amr Sharaf, University of Maryland, amr@cs.umd.edu
Hal Daumé III, Microsoft Research & University of Maryland, me@hal3.name

Abstract

SLIDE 2

Can we learn to explore in contextual bandits?


SLIDE 3

Contextual Bandits: News Display


SLIDE 4

Contextual Bandits: News Display


SLIDE 5

Contextual Bandits: News Display


SLIDE 6

Contextual Bandits: News Display


Goal: Maximize Sum of Rewards
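The interaction protocol behind these news-display slides can be sketched as a simple loop (a toy sketch: the environment, reward rule, and function names are illustrative, not from the paper):

```python
import random

def run_bandit(policy, rounds, n_actions=3, seed=0):
    """Simulate the contextual bandit protocol: at each round, observe a
    context, choose one action (e.g. which news article to display), and
    observe the reward for that action only (bandit feedback)."""
    rng = random.Random(seed)
    total_reward = 0.0
    for _ in range(rounds):
        context = [rng.random() for _ in range(4)]    # e.g. user features
        action = policy(context, n_actions)
        # Toy reward rule: one context-dependent action pays off.
        best = int(context[0] * n_actions) % n_actions
        total_reward += 1.0 if action == best else 0.0
    return total_reward

# A uniformly random policy: the simplest exploration baseline.
def uniform_policy(context, n_actions):
    return random.randrange(n_actions)
```

The learner only ever sees the reward of the chosen action, which is why some exploration strategy is needed at all.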

SLIDE 7

Training Mêlée by Imitation


Access to the expert policy π* at training time; goal: learn an exploration policy π.

[Diagram: over examples/time, roll in with the learned policy π for steps 1 through t-1, deviate at step t to either explore or exploit, and roll out with π* to estimate the loss of each deviation.]
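The roll-in/roll-out procedure on this slide can be sketched on a toy problem (everything below, including the environment and the per-action deviations, is illustrative; in Mêlée the candidate deviations are "explore" vs. "exploit"):

```python
import copy

GOOD = [1, 0, 1, 0, 1]  # toy data: hidden best action at each step

class ToyEpisode:
    """Minimal stand-in for an episode: reward 1 when the chosen
    action matches the hidden best action for the current step."""
    def __init__(self):
        self.t, self.total = 0, 0.0

    def step(self, action):
        self.total += 1.0 if action == GOOD[self.t] else 0.0
        self.t += 1

def rollout_value(episode, policy):
    """Finish a copy of the episode with `policy`; return its total reward."""
    ep = copy.deepcopy(episode)
    while ep.t < len(GOOD):
        ep.step(policy(ep.t))
    return ep.total

def collect_example(pi, pi_star, t):
    """One cost-sensitive training example: roll in with the learned
    policy pi up to step t, score each one-step deviation by rolling
    out with the expert pi_star, and return loss = -reward for each."""
    ep = ToyEpisode()
    while ep.t < t - 1:                  # roll-in with pi
        ep.step(pi(ep.t))
    costs = {}
    for a in (0, 1):                     # candidate deviations at step t
        branch = copy.deepcopy(ep)
        branch.step(a)
        costs[a] = -rollout_value(branch, pi_star)
    return costs
```

The resulting per-deviation costs are what the imitation learner trains on: the meta-policy learns to pick the deviation the expert roll-out scores best.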

SLIDE 8

Generalization: Meta-Features

  • No direct dependency on the contexts x.
  • Features include:
      • calibrated predicted probability p(a_t | f_t, x_t);
      • entropy of the predicted probability distribution;
      • a one-hot encoding for the predicted action f_t(x_t);
      • the current time step t;
      • average observed rewards for each action.
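The feature list above can be assembled in a few lines (a sketch; the function name and input interface are assumptions, but the features mirror the list):

```python
import math

def meta_features(probs, predicted_action, t, avg_rewards):
    """Build the meta-feature vector: calibrated probability of the
    predicted action, entropy of the distribution, a one-hot encoding
    of the predicted action, the time step, and per-action average
    rewards. Note that nothing here looks at the raw context x."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    one_hot = [1.0 if a == predicted_action else 0.0
               for a in range(len(probs))]
    return [probs[predicted_action], entropy, *one_hot,
            float(t), *avg_rewards]
```

Because the features depend only on the classifier's outputs and running statistics, the same meta-policy can transfer across bandit problems with different context spaces.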


SLIDE 9

A representative learning curve


SLIDE 10

Win / Loss Statistics

Win statistics: each (row, column) entry shows the number of times the row algorithm won against the column, minus the number of losses.


SLIDE 11

Win / Loss Statistics

Win statistics: each (row, column) entry shows the number of times the row algorithm won against the column, minus the number of losses.


SLIDE 12

Theoretical Guarantees

  • The no-regret property of Aggrevate can be leveraged in our meta-learning setting.
  • We relate the regret of the learner to the overall regret of π.
  • This shows that, if the underlying classifier improves sufficiently quickly, Mêlée will achieve sublinear regret.
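"Sublinear regret" here is the standard notion (a sketch of the definition only, not the paper's exact statement or constants): the cumulative reward gap to the best fixed policy in hindsight grows slower than the horizon T.

```latex
\text{Regret}(T) \;=\; \max_{\pi^\dagger \in \Pi} \sum_{t=1}^{T} r_t\bigl(\pi^\dagger(x_t)\bigr) \;-\; \sum_{t=1}^{T} r_t(a_t),
\qquad
\frac{\text{Regret}(T)}{T} \to 0 \ \text{ as } T \to \infty.
```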


SLIDE 13

Conclusion

  • Q: Can we learn to explore in contextual bandits?
  • A: Yes, by imitating an expert exploration policy;
  • Mêlée generalizes across bandit problems using meta-features;
  • it outperforms alternative exploration strategies in most settings;
  • and we provide theoretical guarantees.
