

  1. MergeDTS for Large Scale Condorcet Dueling Bandits Chang Li , Ilya Markov, Maarten de Rijke and Masrour Zoghi

  2. What are dueling bandits?
  • The K-armed dueling bandits (Yue et al., COLT 2009): K arms (aka actions).
  • Each time step:
    ➡ the algorithm chooses two arms, l and r (for “left” and “right”);
    ➡ a duel takes place between l and r, and one of them is returned as the winner.
  • Goal: converge to the optimal play for both l and r.
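The interaction protocol above can be sketched with a tiny simulator. All names and preference values here are illustrative, not from the talk; the policy is naive uniform exploration, just to show the duel loop:

```python
import random

random.seed(0)

# Hypothetical 3-arm preference matrix; P[i][j] = Pr(arm i beats arm j).
P = [[0.5, 0.7, 0.8],
     [0.3, 0.5, 0.6],
     [0.2, 0.4, 0.5]]

def duel(l, r):
    """Simulate one duel: return l if it wins, else r."""
    return l if random.random() < P[l][r] else r

# A naive uniform-exploration policy: pick two distinct arms each step.
wins = [0, 0, 0]
for t in range(1000):
    l, r = random.sample(range(3), 2)
    wins[duel(l, r)] += 1
print(wins)  # arm 0, which beats both others, collects the most wins
```

A real dueling-bandit algorithm replaces the uniform choice of (l, r) with a policy that concentrates play on the best arm.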

  3. What is the optimal play?
  • Notation: P := [P_{ij}] is the preference matrix, with P_{ij} = Pr(arm i beats arm j).
  • Assumption: there exists one arm that on average beats all the other arms, called the Condorcet winner: P_{1j} > 0.5 for all j ≠ 1.
  • Regret: the loss incurred by comparing non-Condorcet winners: r_t = 0.5 · (P_{1l} − 0.5) + 0.5 · (P_{1r} − 0.5).
  • Optimal play: only play the Condorcet winner, i.e. choose the Condorcet winner as both l and r.
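These definitions are easy to make concrete. The sketch below uses a small hypothetical preference matrix (the values are illustrative) to find the Condorcet winner and compute the per-step regret formula from the slide:

```python
import numpy as np

# Illustrative 4-arm preference matrix; P[i, j] = Pr(arm i beats arm j),
# so P[i, j] + P[j, i] = 1 and the diagonal is 0.5.
P = np.array([
    [0.5, 0.6, 0.7, 0.8],
    [0.4, 0.5, 0.6, 0.7],
    [0.3, 0.4, 0.5, 0.6],
    [0.2, 0.3, 0.4, 0.5],
])

def condorcet_winner(P):
    """Return the index of the arm that beats all others, or None."""
    K = P.shape[0]
    for i in range(K):
        if all(P[i, j] > 0.5 for j in range(K) if j != i):
            return i
    return None

def regret(P, winner, l, r):
    """Per-step regret of dueling arms l and r against the Condorcet winner."""
    return 0.5 * (P[winner, l] - 0.5) + 0.5 * (P[winner, r] - 0.5)

w = condorcet_winner(P)       # arm 0 beats every other arm
print(regret(P, w, 1, 2))     # ≈ 0.15
print(regret(P, w, 0, 0))     # playing the winner twice gives zero regret
```

Note that regret is zero exactly when both l and r are the Condorcet winner, which is why the optimal play is to duel the winner against itself.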

  4. Related works
  • DTS (Wu et al., NIPS 2016), etc.
    ➡ Limited to small-scale setups, i.e. K is small.
  • Self-Sparring (Sui et al., UAI 2017), etc.
    ➡ Designed under strict assumptions, e.g. no cyclic preference relationships.
  • MergeRUCB (Zoghi et al., WSDM 2015)
    ➡ Designed for large-scale dueling bandits, yet with high cumulative regret.

  5. Merge Double Thompson Sampling
  • Randomly partition the arms into small groups.
  • Each time step:
    1. Sample a tournament inside a small group;
    2. Choose the winner and loser of the tournament as l and r, respectively;
    3. Compare l and r online, and update the statistics;
    4. Eliminate an arm if it is dominated by any other arm with high confidence;
    5. If half of the arms have been eliminated, re-partition the remaining arms.
  • Stop when only one arm is left.
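The partition → duel → eliminate → re-partition skeleton above can be sketched as a runnable toy. This is a much-simplified stand-in, not the actual MergeDTS algorithm: duels are simulated as Bernoulli draws from a hypothetical preference matrix, the Thompson-sampling tournament is replaced by a uniform pick inside the group, and the high-confidence elimination test is replaced by a crude head-to-head win margin:

```python
import random

random.seed(1)

# Hypothetical preference matrix for 8 arms; lower index = stronger arm,
# so arm 0 is the Condorcet winner. P[i][j] = Pr(arm i beats arm j).
K = 8
P = [[0.5 + 0.05 * (j - i) for j in range(K)] for i in range(K)]

wins = [[0] * K for _ in range(K)]  # wins[i][j]: times arm i beat arm j

def duel(l, r):
    """Simulate one duel and record the outcome."""
    winner, loser = (l, r) if random.random() < P[l][r] else (r, l)
    wins[winner][loser] += 1

def dominated(i, group, margin=10):
    """Crude stand-in for MergeDTS's confidence test: arm i counts as
    dominated if some group member leads it by `margin` head-to-head wins."""
    return any(wins[j][i] - wins[i][j] >= margin for j in group if j != i)

arms = list(range(K))
while len(arms) > 1:
    random.shuffle(arms)                                     # random partition
    groups = [arms[i:i + 4] for i in range(0, len(arms), 4)]  # small groups
    target = max(1, len(arms) // 2)                          # re-partition point
    while len(arms) > target:                # duel until half are eliminated
        for g in groups:
            if len(g) < 2:
                continue
            l, r = random.sample(g, 2)       # stand-in for the TS tournament
            duel(l, r)
            for a in list(g):                # eliminate dominated arms
                if dominated(a, g):
                    g.remove(a)
                    arms.remove(a)
print("surviving arm:", arms[0])
```

With a sound confidence test the surviving arm is the Condorcet winner with high probability; the crude margin rule here only illustrates the control flow.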

  6. Experiment: online ranker evaluation
  [Figure: cumulative regret (up to ~25,000) vs. iteration (10^4 to 10^8) on MSLR-Navigational, comparing MergeRUCB (α = 0.86), DTS (α = 0.86), Self-Sparring, and MergeDTS (α = 0.86).]
