Kernel Methods for Cooperative Contextual Bandits
Dubey and Pentland, ICML 2020


1. Kernel Methods for Cooperative Contextual Bandits
   Abhimanyu Dubey and Alex Pentland
   Media Lab and Institute for Data, Systems, and Society (IDSS)
   Massachusetts Institute of Technology
   dubeya@mit.edu
   ICML 2020
   Outline: Introduction (Motivation, UCB Algorithms, Basic Cooperation, Summary of Contributions) · Our Method (Contextual Bandits, Our Parameterization, Algorithm, Regret Guarantees) · Conclusion

2. Motivation
   ◮ Distributed learning is an increasingly popular paradigm in ML: multiple parties collaborate to train a stronger joint model by sharing data.
   ◮ An alternative is to let the data remain distributed and run one ML algorithm (agent) per data center, i.e., federated learning.
   ◮ Each agent can communicate with other agents to (securely) share relevant information, e.g., over a network.
   ◮ The group of all agents therefore cooperates collectively, each solving its own learning problem.

3. Multi-Armed Bandits
   Figure: Multi-armed bandit (courtesy lilianweng.github.io).

4. The Upper Confidence Bound (UCB) Algorithm
   ◮ "Optimism in the face of uncertainty" strategy, i.e., be optimistic about an arm when we are uncertain of its utility.
   ◮ For each of K arms, we compute
     $$Q_k(t) = \underbrace{\frac{\sum_{i=1}^{n_k(t-1)} r_i^k}{n_k(t-1)}}_{\text{empirical mean}} + \underbrace{\sqrt{\frac{2 \ln(t-1)}{n_k(t-1)}}}_{\text{``uncertainty''}}.$$
   ◮ Choose the arm with the largest Q_k(t) (a code sketch follows below).
   ◮ This general family of algorithms has strong guarantees as well.
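
To make the rule concrete, here is a minimal Python sketch of the Q-value computation above. The function and variable names are illustrative rather than from the paper, and the sketch assumes every arm has already been pulled once (so t ≥ 2 and counts[k] ≥ 1).

```python
import math

def ucb_choose(t, counts, sums):
    """Pick the arm maximizing Q_k(t) = empirical mean + uncertainty bonus.

    counts[k] = n_k(t-1): pulls of arm k so far; sums[k]: total reward from k.
    Assumes t >= 2 and counts[k] >= 1 (initialize by pulling each arm once).
    """
    def q(k):
        mean = sums[k] / counts[k]                          # empirical mean
        bonus = math.sqrt(2 * math.log(t - 1) / counts[k])  # "uncertainty"
        return mean + bonus
    return max(range(len(counts)), key=q)
```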

5. Basic Cooperation: UCB with Naive Averaging
   ◮ Basic idea: use observations from neighbors naively to construct Q-values.
   ◮ Assume 2 agents (A and B):
     $$Q_k^A(t) = \underbrace{\frac{\sum_{i=1}^{n_k^A(t-1)} r_i^{k,A} + \sum_{i=1}^{n_k^B(t-1)} r_i^{k,B}}{n_k^A(t-1) + n_k^B(t-1)}}_{\text{mean of both agents' observations}} + \underbrace{\sqrt{\frac{2 \ln(t-1)}{n_k^A(t-1) + n_k^B(t-1)}}}_{\text{smaller ``uncertainty''}}.$$
   ◮ Works well when each agent faces the same bandit problem.
   ◮ Can trivially be extended to other algorithms (e.g., Thompson Sampling).
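
A hedged sketch of this naive-averaging variant for agent A, pooling both agents' statistics; all names are my own, not the paper's.

```python
import math

def ucb_choose_pooled(t, counts_A, sums_A, counts_B, sums_B):
    """Agent A's Q-value built from both agents' observations (naive averaging)."""
    def q(k):
        n = counts_A[k] + counts_B[k]               # pooled pull count
        mean = (sums_A[k] + sums_B[k]) / n          # mean of both agents' observations
        bonus = math.sqrt(2 * math.log(t - 1) / n)  # smaller "uncertainty" term
        return mean + bonus
    return max(range(len(counts_A)), key=q)
```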

6. Naive combinations aren't always useful
   ◮ Consider two agents A and B, each solving a 2-armed bandit problem.
   ◮ For agent A, let the arms have mean payouts (0.8, 0.2).
   ◮ For agent B, let the arms have mean payouts (0.2, 0.8).
   ◮ If each agent naively incorporated the other agent's observations, each would estimate the arm means as ≈ (0.5, 0.5), making the arms indistinguishable and leading to O(T) regret (see the simulation below).
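
A tiny simulation (my own, not from the paper) illustrating the failure mode: with the flipped payouts above, pooled estimates converge to roughly (0.5, 0.5), so neither agent can tell the arms apart.

```python
import random

random.seed(0)
means = {"A": (0.8, 0.2), "B": (0.2, 0.8)}  # flipped mean payouts
counts, sums = [0, 0], [0.0, 0.0]

for t in range(10000):
    for agent in ("A", "B"):
        k = t % 2  # sample both arms evenly, just to expose the pooled means
        r = 1.0 if random.random() < means[agent][k] else 0.0
        counts[k] += 1
        sums[k] += r

print([round(s / n, 3) for s, n in zip(sums, counts)])  # -> approx [0.5, 0.5]
```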

7. Summary
   ◮ It is clear that instead of naively combining observations from neighbors, agents must intelligently "weigh" external behavior.
   ◮ Intuitively, this weighing factor would be a function of how "similar" the agents' problems are.
   ◮ For each agent, when rewards are drawn from arbitrary distributions, it is unclear how "similarity" between the distributions can be measured.
   ◮ Summary. In this work, we propose a framework based on Reproducing Kernel Hilbert Spaces (RKHS) to measure similarity between agent rewards, and several near-optimal algorithms for the cooperative contextual bandit problem using this framework.

8. The Contextual Bandit Problem
   ◮ At any trial t = 1, 2, ..., each agent v ∈ V is supplied a decision set D_{v,t}.
   ◮ It selects an action x_{v,t} ∈ D_{v,t} and obtains a reward
     $$y_{v,t} = f_v(x_{v,t}) + \varepsilon_{v,t}.$$
   ◮ The objective of the problem is to minimize the group regret:
     $$R_G(T) = \sum_{t=1}^{T} \sum_{v \in V} \Big( f_v(x^*_{v,t}) - f_v(x_{v,t}) \Big), \quad \text{where } x^*_{v,t} = \arg\max_{x \in D_{v,t}} f_v(x).$$
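
For concreteness, a direct transcription of the group-regret definition for finite decision sets; the data layout (dicts of per-trial lists) is my own choice for the sketch.

```python
def group_regret(f, actions, decision_sets):
    """R_G(T): summed per-trial gaps to each agent's best available action.

    f[v](x)             : reward function of agent v
    actions[v][t]       : action x_{v,t} played by agent v at trial t
    decision_sets[v][t] : finite decision set D_{v,t}
    """
    total = 0.0
    for v in f:
        for t, x in enumerate(actions[v]):
            best = max(f[v](x_star) for x_star in decision_sets[v][t])
            total += best - f[v](x)
    return total
```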

9. The Cooperative Contextual Bandit
   ◮ We assume the |V| agents communicate via an undirected, connected graph G = (V, E), where (i, j) ∈ E if agents i and j can communicate.
   ◮ Messages from any agent v are available to agent v′ after d(v, v′) − 1 trials of the bandit, where d is the distance between the agents in G.
   ◮ At every trial, every agent sends the following message m_{v,t} to all its neighbors in G:
     $$m_{v,t} = \langle t, v, x_{v,t}, y_{v,t} \rangle.$$
   ◮ This message is forwarded from agent to agent γ times (taking one trial of the bandit problem between each pair of forwards), after which it is dropped (a flooding sketch follows below).
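
A sketch of the forwarding protocol as I read it: each message carries a hop count, is relayed one edge per trial, and is dropped after γ forwards. All names are illustrative, and de-duplication of copies arriving via multiple paths is omitted.

```python
from collections import namedtuple

Message = namedtuple("Message", "t origin x y hops")

def flood_step(G, outbox, gamma):
    """One trial of message passing on graph G (G[v] = set of neighbors of v).

    outbox[v] holds the messages v relays this trial. Returns what each agent
    receives this trial, plus the outbox for the next trial.
    """
    inbox = {v: [] for v in G}
    next_outbox = {v: [] for v in G}
    for v in G:
        for m in outbox[v]:
            for u in G[v]:
                inbox[u].append(m)
                if m.hops + 1 < gamma:  # forwarded at most gamma times, then dropped
                    next_outbox[u].append(m._replace(hops=m.hops + 1))
    return inbox, next_outbox
```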

10. Parametric Network Contexts
   ◮ We assume that each agent v has an underlying network context, denoted by z_v, and the reward function f_v(·) is parameterized by z_v, i.e., for some unknown but fixed function F,
     $$f_v(x) = F(x, z_v) \quad \forall\, x \in X,\, z_v \in Z.$$
   ◮ We denote the augmented context x̃ = (x, z_v). Furthermore, we assume that F has a bounded norm in some RKHS H with kernel K̃ and feature map φ(·).
   ◮ For a given φ : (X × Z) → R^d and an unknown (but fixed) vector θ ∈ R^d,
     $$F(\tilde{x}) = \phi(\tilde{x})^\top \theta.$$
   ◮ This implies that in some higher-order feature space (kernel space), F is a linear function, and φ can be thought of as a "feature extractor".

11. Kernel Assumption
   ◮ Now, the kernel function is K̃(x̃₁, x̃₂) = φ(x̃₁)^⊤ φ(x̃₂). We assume that this kernel is a composition of two separate kernels (where x̃ᵢ = (xᵢ, zᵢ)):
     $$\tilde{K}(\tilde{x}_1, \tilde{x}_2) = \underbrace{K_z(z_1, z_2)}_{\text{network kernel}} \cdot \underbrace{K_x(x_1, x_2)}_{\text{action kernel}}.$$
   ◮ K_z provides us with a generic framework to measure similarity between agent functions, and can be learnt online when it is unknown.
   ◮ K_x can be any PSD kernel, e.g., Gaussian (RBF), linear, or deep neural network features. K_z can be derived from geographical or demographic constraints (e.g., social networks).
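
A minimal sketch of the composite kernel, with both factors chosen as Gaussian (RBF) kernels purely for illustration; the slide only requires each factor to be PSD. Since a product of PSD kernels is itself PSD, K̃ is a valid kernel.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; used here for both K_z and K_x for simplicity."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))

def k_tilde(x1, z1, x2, z2):
    """Composite kernel on augmented contexts: K_z(z1, z2) * K_x(x1, x2)."""
    return rbf(z1, z2) * rbf(x1, x2)
```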

12. Algorithm
   ◮ We face three challenges in algorithm design:
     ◮ Non-identical reward functions f_v.
     ◮ Communication delays between agents.
     ◮ Heterogeneity (agents possess different information at all times).
   ◮ We modify the Kernel-UCB [Valko13] algorithm as follows (a score sketch follows below):
     ◮ Non-identical rewards: we augment contexts x with network contexts z, and use the augmented kernel K̃ to create the UCB.
     ◮ Delays: we use a subsampling technique similar to [Weinberger02], i.e., each agent runs γ UCB instances in parallel.
     ◮ Heterogeneity: we partition G carefully in terms of cliques to bound heterogeneity, i.e., each agent only accepts messages from a subset of nodes.
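
To ground the first modification, here is a sketch of an optimistic score in the style of Kernel-UCB, computed with the augmented kernel K̃. The regularizer lam and exploration weight beta are illustrative constants, and the paper's delay handling (γ parallel instances) and clique partitioning are omitted.

```python
import numpy as np

def kernel_ucb_score(k_vec, k_xx, K, y, lam=1.0, beta=1.0):
    """Optimistic score for one candidate augmented context.

    k_vec : 1-D array of kernel values between the candidate and past contexts
    k_xx  : kernel value of the candidate with itself
    K     : kernel matrix over past augmented contexts; y : past rewards
    """
    inv = np.linalg.inv(K + lam * np.eye(len(y)))
    mean = k_vec @ inv @ y                        # regularized mean estimate
    var = k_xx - k_vec @ inv @ k_vec              # predictive variance
    return mean + beta * np.sqrt(max(var, 0.0))   # optimism in the face of uncertainty
```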
