Cooperative Multi-Agent Bandits with Heavy Tails - PowerPoint PPT Presentation



SLIDE 1

Cooperative Bandits with Heavy Tails · Dubey and Pentland · ICML 2020

Outline
◮ Introduction: K-Armed Bandits, Cooperation, Summary
◮ Background: K-Armed Bandits, Cooperation, Optimism, Heavy Tails
◮ Method: Message-Passing Algorithm, Regret Guarantees, Optimizations
◮ Conclusion

Cooperative Multi-Agent Bandits with Heavy Tails

Abhimanyu Dubey and Alex Pentland

Media Lab and Institute for Data Systems and Society (IDSS) Massachusetts Institute of Technology dubeya@mit.edu

ICML 2020

SLIDE 2

Multi-Armed Bandits

Figure: Multi-armed bandit (courtesy lilianweng.github.io).

SLIDE 3

Cooperative Bandits

◮ Distributed learning is an increasingly popular paradigm in ML: multiple parties collaborate to train a stronger joint model by sharing data.
◮ An alternative is to let data remain in a distributed setup and run one ML algorithm (agent) per data center, i.e., federated learning.
◮ Each agent can communicate with other agents to (securely) share relevant information, e.g., over a network.
◮ All agents therefore collectively cooperate to solve their own learning problems.

SLIDE 4

Summary of Contributions

◮ In many application areas, observations are heavy-tailed, e.g., in internet traffic analysis and supply chain networks.
◮ Current cooperative bandit algorithms operate largely by distributed consensus, which averages opinions held by agents.
◮ Consensus protocols are inherently not robust to heavy-tailed reward distributions, and have inefficient communication complexity.
◮ Summary: In this paper, we propose algorithms for the heavy-tailed cooperative bandit that use an alternative decentralized communication protocol, resulting in efficient and robust multi-agent bandit learning.

SLIDE 5

Stochastic Multi-Armed Bandits

◮ K actions (“arms”) return rewards r_k sampled i.i.d. from K different distributions, each with mean µ_k.
◮ The problem proceeds in rounds; at each round t, the agent chooses action a_t and obtains a randomly drawn reward r(t) such that E[r(t)] = µ_{a_t}.
◮ The goal is to minimize the regret (with µ* = max_{k∈[K]} µ_k, and n_k(T) the number of pulls of arm k up to time T):

    R(T) = T · µ*  −  Σ_{k∈[K]} µ_k E[n_k(T)]  =  Σ_{k∈[K]} (µ* − µ_k) E[n_k(T)],

i.e., the best possible expected reward minus the expected obtained reward, which equals the expected “loss” from picking suboptimal arms.
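The decomposition above is easy to check numerically. A minimal sketch, where the arm means and pull counts are made-up illustrative values:

```python
def expected_regret(mu, pulls):
    """R(T) = sum over arms k of (mu_star - mu_k) * E[n_k(T)]."""
    mu_star = max(mu)
    return sum((mu_k_ - mu_k) * n_k
               for mu_k, n_k in zip(mu, pulls)
               for mu_k_ in [mu_star])

# Two arms with means 0.5 and 0.3 over T = 10 rounds: pulling the
# suboptimal arm twice costs 2 * (0.5 - 0.3) in expectation.
r = expected_regret([0.5, 0.3], [8, 2])
```

Equivalently, T · µ* − Σ_k µ_k E[n_k(T)] gives the same number, which is the first equality above.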

SLIDE 6

Cooperative Multi-Armed Bandits

◮ M agents each face the same K-armed bandit problem.
◮ Agents are connected by a (connected, undirected) graph G.
◮ The agents must cooperate to collectively minimize the group regret

    R_G(T) = Σ_{m∈G} R_m(T).

SLIDE 7

The Upper Confidence Bound (UCB) Algorithm

◮ “Optimism in the face of uncertainty” strategy, i.e., be optimistic about an arm when we are uncertain of its utility.
◮ For each arm k, compute

    Q_k(t) = ( Σ_{i=1}^{n_k(t−1)} r_i^k ) / n_k(t−1)  +  √( 2 ln(t−1) / n_k(t−1) ),

where the first term is the empirical mean and the second is the upper confidence bound UCB(t).
◮ Choose the arm with the largest Q_k(t).
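This rule can be sketched in a few lines; treating unpulled arms as having an infinite bound is a standard convention not spelled out on the slide:

```python
import math

def ucb_choose(rewards, t):
    """rewards[k] is the list of rewards seen so far for arm k; t is the round.
    Returns the arm maximizing empirical mean + sqrt(2 ln(t-1) / n_k(t-1))."""
    for k, obs in enumerate(rewards):
        if not obs:            # optimism: play each arm at least once
            return k
    def q(k):
        n = len(rewards[k])
        return sum(rewards[k]) / n + math.sqrt(2 * math.log(t - 1) / n)
    return max(range(len(rewards)), key=q)
```

With equal pull counts the bonus cancels and the empirical means decide; with unequal counts the bonus pushes exploration toward under-sampled arms.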

SLIDE 8

Heavy-Tailed Distributions

◮ A random variable X is light-tailed if it admits a finite moment generating function, i.e., there exists u_0 > 0 such that ∀ |u| ≤ u_0, M_X(u) = E[exp(uX)] < ∞. Otherwise X is heavy-tailed.
◮ When rewards are sub-Gaussian, the empirical mean and variance are the obvious estimators for the first two moments.
  ◮ They are asymptotically optimal estimators (rate of concentration).
  ◮ They can be computed in O(1) time in streaming settings.
◮ In the case of heavy-tailed rewards, we require robust estimators to obtain optimal regret.
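One standard robust estimator is the median-of-means. A minimal sketch, where the block count is a tunable assumption and not a value from the paper:

```python
from statistics import median

def median_of_means(samples, num_blocks=8):
    """Split the samples into interleaved blocks, average each block,
    and return the median of the block means; a few extreme draws can
    corrupt only a few block means, so the median remains stable."""
    b = min(num_blocks, len(samples))
    block_means = [sum(samples[i::b]) / len(samples[i::b]) for i in range(b)]
    return median(block_means)
```

On clean data it behaves like the empirical mean; a single huge outlier shifts only one block mean and leaves the median untouched.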
SLIDE 9

Robust Estimators and the Running Consensus

◮ Distributed consensus works by slowly “averaging” opinions between neighboring agents; this repeated averaging causes information to diffuse throughout the network.
◮ Robust mean estimators, however, are fundamentally incompatible with naive averaging, and cannot be updated in O(1) time.
  ◮ The trimmed-mean and Catoni estimators require O(T) consensus algorithms.
  ◮ The median-of-means estimator requires O(log T) consensus algorithms.

SLIDE 10

Message Passing Protocol

◮ Instead of a consensus, each agent communicates its actions and rewards as a tuple (a_t, r_t, d), where d ≤ γ is the life of the message (it is dropped after it has been forwarded γ times).
◮ At each time t, each agent:
  ◮ Gathers all messages M(t) from its neighbors and discards stale messages.
  ◮ Chooses an arm following any algorithm and obtains a reward.
  ◮ Adds the action-reward tuple (a_t, r_t, γ) to M(t).
  ◮ Sends each message in M(t) to all its neighbors.
◮ Since we are working with individual rewards, all robust estimators can be applied on top of this protocol.
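One round of this protocol, from a single agent's point of view, can be sketched as follows; `choose` and `pull` are hypothetical callbacks standing in for the bandit rule and the environment:

```python
def message_passing_round(inbox, gamma, choose, pull):
    """inbox: list of (arm, reward, d) tuples received from neighbors.
    Returns the outbox to be sent to every neighbor."""
    # Drop stale messages (d = 0: already forwarded gamma times) and
    # decay the life of the rest.
    outbox = [(a, r, d - 1) for (a, r, d) in inbox if d > 0]
    arm = choose(outbox)                 # any bandit rule may use the messages
    reward = pull(arm)
    outbox.append((arm, reward, gamma))  # inject own action-reward tuple
    return outbox
```

Whether the life is decremented on receipt or on forwarding is a bookkeeping choice; this sketch decrements on receipt.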

SLIDE 11

Robust Message-Passing UCB

For any time t, each agent m:
◮ Gathers all messages M(t) from its neighbors and discards all messages with d = 0.
◮ Filters all unseen messages by arm k and adds the new rewards to the corresponding sets S_k^m(t).
◮ Computes the mean µ̂_k^m(t) for each arm k from S_k^m(t) using any robust mean estimator.
◮ Chooses the arm that maximizes µ̂_k^m(t) + UCB_k^m(t), and obtains reward r_t.
◮ Adds the action-reward tuple (a_t, r_t, γ) to M(t).
◮ Sends each message in M(t) to all its neighbors.
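The selection step can be sketched as below, with median-of-means as the robust estimator and a simplified exploration bonus standing in for UCB_k^m(t); both simplifications are assumptions of this sketch, not the paper's exact constants:

```python
import math
from statistics import median

def mom(samples, b=4):
    """Median of b interleaved block means (a robust mean estimator)."""
    b = min(b, len(samples))
    return median(sum(samples[i::b]) / len(samples[i::b]) for i in range(b))

def robust_mp_ucb_choose(S, t):
    """S[k]: all rewards (own + received) collected for arm k; t: round."""
    for k, obs in enumerate(S):
        if not obs:                    # play unseen arms first
            return k
    def score(k):
        return mom(S[k]) + math.sqrt(2 * math.log(t) / len(S[k]))
    return max(range(len(S)), key=score)
```

Because rewards are shared individually rather than averaged by consensus, each S_k^m(t) is just a set of raw samples, and any robust estimator can be dropped in for `mom`.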

SLIDE 12

Lower Bounds

Lower Bound for Cooperative Setting

Under suitable assumptions, for any ∆ ∈ (0, 1/4) and ε ∈ (0, 1], there exist K ≥ 2 heavy-tailed distributions such that any consistent algorithm obtains regret of order Ω( K ∆^{−1/ε} ln T ) when run on a connected graph G.

◮ This generalizes the lower bound for multiple arm pulls to account for delayed feedback over connected graphs.
◮ Existing optimality rates compare against a single agent pulling MT arms sequentially, a benchmark we show to be inaccurate by giving upper bounds that match the lower bound above.

SLIDE 13

Regret Guarantees

Regret of Robust MP-UCB

Robust MP-UCB obtains regret

    O( α(G_γ) · Σ_{k=1}^{K} (2∆_k)^{−1/ε} · log T ).

◮ Robust MP-UCB is near-optimal in its dependence on T, K, and ∆_k.
◮ The communication overhead enters through the independence number α(G_γ):
  ◮ G_γ has edge (i, j) iff there is a path of length ≤ γ between i and j in G.
  ◮ If γ = diam(G), then α(G_γ) = 1, matching the lower bound (up to constants).
  ◮ If γ = 0, then α(G_γ) = M, i.e., no communication; each agent acts in isolation.
  ◮ α(G_γ) is monotonically non-increasing in γ, so γ can be used to trade off communication against performance.
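The power graph G_γ is straightforward to build, and a greedy pass gives a maximal (not necessarily maximum) independent set, so its size is only a lower estimate of α(G_γ); a sketch:

```python
from collections import deque

def power_graph(adj, gamma):
    """adj: {node: set of neighbors}. Returns the adjacency of G_gamma,
    where i and j are connected iff dist_G(i, j) <= gamma."""
    out = {}
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            if dist[u] == gamma:
                continue               # stop the BFS at depth gamma
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        out[src] = set(dist) - {src}
    return out

def greedy_independent_set(adj):
    """Greedy maximal independent set, scanning nodes in sorted order."""
    chosen, blocked = set(), set()
    for v in sorted(adj):
        if v not in blocked:
            chosen.add(v)
            blocked |= adj[v] | {v}
    return chosen
```

On a path of 5 agents, γ = 2 already shrinks the independent set from 5 (γ = 0, isolation) toward 1 (γ = diam(G), full diffusion).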

SLIDE 14

Additional Optimizations

In certain settings, we can improve the performance of the algorithm further:
◮ Cheap Communication: When messages can be of size O(M), we achieve the optimal regret O( Σ_{k=1}^{K} (2∆_k)^{−1/ε} log T ) regardless of γ or G.
◮ Streaming Trimmed Mean: We propose an efficient algorithm that computes a streaming trimmed mean in O(log T) time, instead of O(T).
◮ Costly Communication: With sub-Gaussian arms, our algorithm obtains O( Σ_{k=1}^{K} ∆_k^{−1} log^{3/2} T ) regret with O(log T) communication.
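The trimming step can be sketched with a sorted buffer; note that `bisect.insort` makes each insertion O(T) here, whereas the paper's structure achieves O(log T), and the fixed trim fraction is an assumption of this sketch rather than the paper's data-dependent threshold:

```python
import bisect

class StreamingTrimmedMean:
    """Maintains a sorted buffer of observations; the mean is computed
    after discarding the smallest and largest trim-fraction of points."""

    def __init__(self, trim=0.1):
        self.trim = trim
        self.obs = []                # kept sorted at all times

    def add(self, x):
        bisect.insort(self.obs, x)   # O(T) insert; a balanced BST gives O(log T)

    def mean(self):
        n = len(self.obs)
        cut = int(n * self.trim)
        kept = self.obs[cut:n - cut] if n > 2 * cut else self.obs
        return sum(kept) / len(kept)
```

A single extreme observation falls inside the trimmed tail and never reaches the reported mean, which is the robustness property the heavy-tailed setting needs.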
SLIDE 15

Applications and Future Work

Applications of multi-agent cooperative multi-armed bandits include:
◮ Inferring preferences where users are connected via a social network.
◮ Policy estimation in multi-agent systems, e.g., robotics and distributed sensors.

Future work includes:
◮ Generalization to non-identical bandit problems.
◮ Time-varying network analysis.
◮ Private communication protocols.

SLIDE 16

Thank You! Paper ID: 282