On the Complexity of Best Arm Identification in Multi-Armed Bandit - PowerPoint PPT Presentation

On the Complexity of Best Arm Identification in Multi-Armed Bandit Models
Aurélien Garivier
Institut de Mathématiques de Toulouse
Information Theory, Learning and Big Data
Simons Institute, Berkeley, March 2015

Roadmap
1. Simple Multi-Armed Bandit Model
2. Complexity of Best Arm Identification
   - Lower bounds on the complexities
   - Gaussian Feedback
   - Binary Feedback

The (stochastic) Multi-Armed Bandit Model
Environment: $K$ arms with parameters $\theta = (\theta_1, \dots, \theta_K)$ such that, for any possible choice of arm $a_t \in \{1, \dots, K\}$ at time $t$, one receives the reward $X_t = X_{a_t, t}$, where, for any $1 \le a \le K$ and $s \ge 1$, $X_{a,s} \sim \nu_a$, and the $(X_{a,s})_{a,s}$ are independent.
Reward distributions: $\nu_a \in \mathcal{F}$, a parametric family or not: canonical exponential family, general bounded rewards.
Example: Bernoulli rewards, $\theta \in [0,1]^K$, $\nu_a = \mathcal{B}(\theta_a)$.
Strategy: the agent's actions follow a dynamical strategy $\pi = (\pi_1, \pi_2, \dots)$ such that $A_t = \pi_t(X_1, \dots, X_{t-1})$.
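A minimal simulation sketch of this model, assuming Bernoulli rewards as in the example. The reward table is drawn up front so that the reward at time $t$ is literally $X_{a_t, t}$. The function name `run_bandit` and its policy signature (past arms, past rewards) are our own illustrative choices; passing past arms is only a convenience, since for a deterministic strategy they are themselves functions of past rewards. Arms are indexed from 0 in the code.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical policy signature: (past arms, past rewards) -> next arm.
Policy = Callable[[List[int], List[float]], int]


def run_bandit(theta: Sequence[float], policy: Policy, horizon: int,
               seed: int = 0) -> Tuple[List[int], List[float]]:
    """Simulate a Bernoulli bandit with means theta; arms are indexed from 0."""
    rng = random.Random(seed)
    # Reward table X[a][s] ~ B(theta[a]), all entries drawn independently.
    table = [[1.0 if rng.random() < p else 0.0 for _ in range(horizon)]
             for p in theta]
    arms: List[int] = []
    rewards: List[float] = []
    for t in range(horizon):
        a = policy(arms, rewards)     # A_t may only depend on the past
        arms.append(a)
        rewards.append(table[a][t])   # X_t = X_{A_t, t}
    return arms, rewards


# Usage: a round-robin strategy on two arms.
arms, rewards = run_bandit([0.2, 0.8], lambda past, _: len(past) % 2, horizon=6)
print(arms, rewards)
```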

Real challenges
- Randomized clinical trials: the original motivation since the 1930s; dynamic strategies can save resources.
- Recommender systems: advertisement, website optimization, news, blog posts, ...
- Computer experiments: large systems can be simulated in order to optimize some criterion over a set of parameters, but the simulation cost may be high, so that only a few choices are possible for the parameters.
- Games and planning (tree-structured options).

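Performance Evaluation: Cumulated Regret
Cumulated reward: $S_T = \sum_{t=1}^{T} X_t$.
Goal: choose $\pi$ so as to maximize
$$\mathbb{E}[S_T] = \sum_{t=1}^{T} \sum_{a=1}^{K} \mathbb{E}\Big[\mathbb{E}\big[X_t \mathbb{1}\{A_t = a\} \,\big|\, X_1, \dots, X_{t-1}\big]\Big] = \sum_{a=1}^{K} \mu_a \, \mathbb{E}\big[N^\pi_a(T)\big],$$
where the second equality holds because $A_t$ is a function of $X_1, \dots, X_{t-1}$, so the inner conditional expectation equals $\mu_a \mathbb{1}\{A_t = a\}$. Here $N^\pi_a(T) = \sum_{t \le T} \mathbb{1}\{A_t = a\}$ is the number of draws of arm $a$ up to time $T$, and $\mu_a = \mathbb{E}(\nu_a)$.
Regret minimization: maximizing $\mathbb{E}[S_T]$ $\iff$ minimizing
$$R_T = T\mu^* - \mathbb{E}[S_T] = \sum_{a : \mu_a < \mu^*} (\mu^* - \mu_a) \, \mathbb{E}\big[N^\pi_a(T)\big],$$
where $\mu^* = \max\{\mu_a : 1 \le a \le K\}$.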
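A small numerical check of the regret decomposition, with hypothetical means and expected draw counts (not taken from the talk). The two functions compute $R_T$ from its definition and from the sum over suboptimal arms; they agree because $T = \sum_a \mathbb{E}[N_a(T)]$.

```python
from typing import Sequence


def regret_decomposition(mu: Sequence[float], counts: Sequence[float]) -> float:
    """R_T = sum over suboptimal arms of (mu* - mu_a) * E[N_a(T)]."""
    mu_star = max(mu)
    return sum((mu_star - m) * n for m, n in zip(mu, counts) if m < mu_star)


def regret_direct(mu: Sequence[float], counts: Sequence[float]) -> float:
    """R_T = T mu* - E[S_T], with E[S_T] = sum_a mu_a E[N_a(T)]."""
    horizon = sum(counts)  # T = sum_a E[N_a(T)]
    return horizon * max(mu) - sum(m * n for m, n in zip(mu, counts))


# Hypothetical expected draw counts over T = 1000 rounds.
mu = [0.5, 0.4, 0.2]
counts = [900, 60, 40]
print(regret_decomposition(mu, counts), regret_direct(mu, counts))
```

Both calls print approximately 18 (0.1 × 60 + 0.3 × 40), up to floating-point rounding.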

Upper Confidence Bound Strategies
UCB [Lai&Robbins '85; Agrawal '95; Auer&al '02]
Construct an upper confidence bound for the expected reward of each arm:
$$\underbrace{\frac{S_a(t)}{N_a(t)}}_{\text{estimated reward}} + \underbrace{\sqrt{\frac{\log(t)}{2 N_a(t)}}}_{\text{exploration bonus}}$$
where $S_a(t)$ is the sum of the rewards obtained from arm $a$ up to time $t$.
Choose the arm with the highest UCB.
- It is an index strategy [Gittins '79].
- Its behavior is easily interpretable and intuitively appealing.
Listen to Robert Nowak's talk tomorrow!
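A minimal sketch of the index strategy above on Bernoulli arms (an assumption; the slide does not fix the reward family). The index follows the slide's $S_a(t)/N_a(t) + \sqrt{\log(t)/(2N_a(t))}$; drawing each arm once at the start is our own choice so that every index is defined. The names `ucb_index` and `run_ucb` are illustrative.

```python
import math
import random
from typing import List, Sequence


def ucb_index(s_a: float, n_a: int, t: int) -> float:
    """Estimated reward S_a(t)/N_a(t) plus exploration bonus sqrt(log(t) / (2 N_a(t)))."""
    return s_a / n_a + math.sqrt(math.log(t) / (2 * n_a))


def run_ucb(theta: Sequence[float], horizon: int, seed: int = 0) -> List[int]:
    """Run the UCB index strategy on Bernoulli arms; return the counts N_a(horizon)."""
    rng = random.Random(seed)
    k = len(theta)
    sums = [0.0] * k   # S_a(t): cumulated reward of arm a
    counts = [0] * k   # N_a(t): number of draws of arm a
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # draw each arm once so that every index is defined
        else:
            a = max(range(k), key=lambda b: ucb_index(sums[b], counts[b], t))
        sums[a] += 1.0 if rng.random() < theta[a] else 0.0
        counts[a] += 1
    return counts


print(run_ucb([0.1, 0.5, 0.9], horizon=2000))
```

The suboptimal arms should receive only a small, roughly logarithmic number of draws, while the arm with mean 0.9 takes almost all of the budget.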

Optimality?
Generalization of [Lai&Robbins '85].
Theorem [Burnetas and Katehakis '96]: if $\pi$ is a uniformly efficient strategy, then for any $\theta \in [0,1]^K$,
$$\liminf_{T \to \infty} \frac{\mathbb{E}[N_a(T)]}{\log(T)} \ge \frac{1}{K_{\inf}(\nu_a, \mu^*)},$$
where
$$K_{\inf}(\nu_a, \mu^*) = \inf\big\{ K(\nu_a, \nu') : \nu' \in \mathcal{F},\ \mathbb{E}(\nu') \ge \mu^* \big\}.$$
[Figure: $\nu_a$, $\nu^*$ and $K_{\inf}(\nu_a, \mu^*)$ drawn on the simplex with vertices $\delta_0$, $\delta_{1/2}$, $\delta_1$.]
Idea: change of distribution.
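For Bernoulli arms, the infimum defining $K_{\inf}$ is attained at the Bernoulli distribution with mean exactly $\mu^*$ (because $\mathrm{kl}(\mu_a, q)$ increases in $q \ge \mu_a$), so $K_{\inf}(\nu_a, \mu^*) = \mathrm{kl}(\mu_a, \mu^*)$. A sketch computing the resulting lower bounds; the function names and the clipping constant `eps` are our own.

```python
import math
from typing import Sequence


def kl_bernoulli(p: float, q: float, eps: float = 1e-12) -> float:
    """KL divergence between B(p) and B(q); arguments are clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def draws_lower_bound(mu_a: float, mu_star: float, horizon: int) -> float:
    """Asymptotic lower bound log(T) / K_inf(nu_a, mu*) on E[N_a(T)], Bernoulli case."""
    return math.log(horizon) / kl_bernoulli(mu_a, mu_star)


def regret_constant(theta: Sequence[float]) -> float:
    """sum over suboptimal arms of (mu* - mu_a) / K_inf: lower bound on liminf R_T / log(T)."""
    mu_star = max(theta)
    return sum((mu_star - m) / kl_bernoulli(m, mu_star) for m in theta if m < mu_star)


print(regret_constant([0.5, 0.4]), draws_lower_bound(0.4, 0.5, 10_000))
```

Close arms make $\mathrm{kl}(\mu_a, \mu^*)$ small, hence many unavoidable draws of the suboptimal arm.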

Reaching Optimality: Empirical Likelihood
The KL-UCB Algorithm, AoS 2013, joint work with O. Cappé, O.-A. Maillard, R. Munos, G. Stoltz
Parameters: an operator $\Pi_{\mathcal{F}} : \mathcal{M}_1(S) \to \mathcal{F}$; a non-decreasing function $f : \mathbb{N} \to \mathbb{R}$
Initialization: pull each arm of $\{1, \dots, K\}$ once
for $t = K$ to $T - 1$ do
    compute for each arm $a$ the quantity
    $$U_a(t) = \sup\Big\{ \mathbb{E}(\nu) : \nu \in \mathcal{F} \text{ and } \mathrm{KL}\big(\Pi_{\mathcal{F}}(\hat\nu_a(t)), \nu\big) \le \frac{f(t)}{N_a(t)} \Big\}$$
    pick an arm $A_{t+1} \in \arg\max_{a \in \{1, \dots, K\}} U_a(t)$
end for

Regret bound
Theorem: assume that $\mathcal{F}$ is the set of finitely supported probability distributions over $S = [0,1]$, that $\mu_a > 0$ for all arms $a$, and that $\mu^* < 1$. There exists a constant $M(\nu_a, \mu^*) > 0$, depending only on $\nu_a$ and $\mu^*$, such that, with the choice $f(t) = \log(t) + \log\log(t)$ for $t \ge 2$, for all $T \ge 3$:
$$\mathbb{E}[N_a(T)] \le \frac{\log(T)}{K_{\inf}(\nu_a, \mu^*)} + \frac{36}{(\mu^*)^4} \big(\log(T)\big)^{4/5} \log\log(T) + \left(\frac{72}{(\mu^*)^4} + \frac{2\mu^*}{(1-\mu^*)\, K_{\inf}(\nu_a, \mu^*)^2}\right) \big(\log(T)\big)^{4/5} + \frac{(1-\mu^*)^2\, M(\nu_a, \mu^*)}{2(\mu^*)^2} \big(\log(T)\big)^{2/5} + \frac{\log\log(T)}{K_{\inf}(\nu_a, \mu^*)} + \frac{2\mu^*}{(1-\mu^*)\, K_{\inf}(\nu_a, \mu^*)^2} + 4.$$


2. Complexity of Best Arm Identification

Best Arm Identification Strategies
A two-armed bandit model is a pair $\nu = (\nu_1, \nu_2)$ of probability distributions ("arms") with respective means $\mu_1$ and $\mu_2$; $a^* = \arg\max_a \mu_a$ is the (unknown) best arm.
Strategy =
- a sampling rule $(A_t)_{t \in \mathbb{N}}$, where $A_t \in \{1, 2\}$ is the arm chosen at time $t$ (based on past observations), after which a sample $Z_t \sim \nu_{A_t}$ is observed;
- a stopping rule $\tau$ indicating when the agent stops sampling the arms;
- a recommendation rule $\hat a_\tau \in \{1, 2\}$ indicating which arm the agent thinks is best (at the end of the interaction).
In classical A/B testing, the sampling rule $A_t$ is uniform on $\{1, 2\}$ and the stopping rule $\tau = t$ is fixed in advance.
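A sketch of classical A/B testing written as the three components above, assuming Gaussian arms with a known common standard deviation (our assumption, for illustration). We read "uniform" sampling as deterministic alternation between the arms; drawing $A_t$ uniformly at random would be the other reading. The error probability $p_t(\nu)$ is estimated by Monte Carlo; `ab_test` and `error_probability` are hypothetical names.

```python
import random
from typing import Sequence


def ab_test(mu: Sequence[float], t: int, sigma: float, rng: random.Random) -> int:
    """Classical A/B test on two Gaussian arms; returns the recommended arm (1 or 2).

    Sampling rule: alternate between the arms. Stopping rule: tau = t, fixed in
    advance (t >= 2). Recommendation rule: the arm with the larger empirical mean
    (ties go to arm 1).
    """
    sums = [0.0, 0.0]
    counts = [0, 0]
    for s in range(t):
        a = s % 2                              # index 0 is arm 1, index 1 is arm 2
        sums[a] += rng.gauss(mu[a], sigma)     # observe Z_s ~ nu_{A_s}
        counts[a] += 1
    return 1 if sums[0] / counts[0] >= sums[1] / counts[1] else 2


def error_probability(mu: Sequence[float], t: int, sigma: float = 1.0,
                      runs: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of p_t(nu) = P(hat a_t != a*); assumes mu[0] != mu[1]."""
    rng = random.Random(seed)
    best = 1 if mu[0] > mu[1] else 2
    return sum(ab_test(mu, t, sigma, rng) != best for _ in range(runs)) / runs


print(error_probability((0.2, 0.0), t=20), error_probability((0.2, 0.0), t=200))
```

Increasing the fixed budget $t$ from 20 to 200 should visibly reduce the estimated error probability for this small gap.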

Best Arm Identification
Joint work with Emilie Kaufmann and Olivier Cappé (Telecom ParisTech)
Goal: design a strategy $\mathcal{A} = ((A_t), \tau, \hat a_\tau)$ such that:
- Fixed-budget setting: $\tau = t$, and $p_t(\nu) := \mathbb{P}_\nu(\hat a_t \ne a^*)$ is as small as possible.
- Fixed-confidence setting: $\mathbb{P}_\nu(\hat a_\tau \ne a^*) \le \delta$, and $\mathbb{E}_\nu[\tau]$ is as small as possible.
See also: [Mannor&Tsitsiklis '04], [Even-Dar&al. '06], [Audibert&al. '10], [Bubeck&al. '11, '13], [Kalyanakrishnan&al. '12], [Karnin&al. '13], [Jamieson&al. '14], ...

Two possible goals
The fixed-budget and fixed-confidence goals are as defined above. In the particular case of uniform sampling:
- Fixed-budget setting: classical test of $(\mu_1 > \mu_2)$ against $(\mu_1 < \mu_2)$ based on $t$ samples.
- Fixed-confidence setting: sequential test of $(\mu_1 > \mu_2)$ against $(\mu_1 < \mu_2)$ with probability of error uniformly bounded by $\delta$.
[Siegmund '85]: sequential tests can save samples!
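A sketch of a sequential test with uniform sampling, assuming Gaussian arms with known common $\sigma$ (our assumption). After $n$ draws of each arm, the generalized likelihood ratio statistic for $(\mu_1 > \mu_2)$ against $(\mu_1 < \mu_2)$ is $n(\hat\mu_1 - \hat\mu_2)^2/(4\sigma^2)$. The threshold $\log(n(n+1)/\delta)$ is our own conservative choice, not the one from the talk: a wrong answer at round $n$ has probability at most $\delta/(n(n+1))$ by a Gaussian tail bound, and these sum to $\delta$ over all $n$, so this version is $\delta$-PAC by a union bound.

```python
import math
import random
from typing import Tuple


def sequential_test(mu: Tuple[float, float], sigma: float, delta: float,
                    rng: random.Random, max_rounds: int = 10**6) -> Tuple[int, int]:
    """Uniform sampling + GLR stopping rule; returns (recommended arm, tau)."""
    s1 = s2 = 0.0
    for n in range(1, max_rounds + 1):
        # Sampling rule: one draw of each arm per round (uniform sampling).
        s1 += rng.gauss(mu[0], sigma)
        s2 += rng.gauss(mu[1], sigma)
        diff = (s1 - s2) / n
        # GLR statistic for (mu1 > mu2) vs (mu1 < mu2), Gaussian arms, known sigma.
        glr = n * diff * diff / (4 * sigma * sigma)
        # Union-bound threshold: sum over n of delta / (n (n + 1)) equals delta.
        if glr > math.log(n * (n + 1) / delta):
            return (1 if diff > 0 else 2), 2 * n
    raise RuntimeError("no decision within max_rounds")


rng = random.Random(1)
runs = [sequential_test((1.0, 0.0), 1.0, 0.05, rng) for _ in range(200)]
error_rate = sum(arm != 1 for arm, _ in runs) / len(runs)
mean_tau = sum(tau for _, tau in runs) / len(runs)
print(error_rate, mean_tau)
```

Here $\tau$ is random: it adapts to how quickly the evidence accumulates, which is the mechanism behind the remark that sequential tests can save samples.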

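The complexities of best-arm identification
For a class $\mathcal{M}$ of bandit models, an algorithm $\mathcal{A} = ((A_t), \tau, \hat a_\tau)$ is...
- Fixed-budget setting: consistent on $\mathcal{M}$ if $\forall \nu \in \mathcal{M},\ p_t(\nu) = \mathbb{P}_\nu(\hat a_t \ne a^*) \to 0$ as $t \to \infty$.
- Fixed-confidence setting: $\delta$-PAC on $\mathcal{M}$ if $\forall \nu \in \mathcal{M},\ \mathbb{P}_\nu(\hat a_\tau \ne a^*) \le \delta$.
From the literature:
- Fixed budget: $p_t(\nu) \simeq \exp\big(-t / (C H(\nu))\big)$ [Audibert&al. '10], [Bubeck&al. '11], [Bubeck&al. '13], ...
- Fixed confidence: $\mathbb{E}_\nu[\tau] \simeq C' H'(\nu) \log(1/\delta)$ [Mannor&Tsitsiklis '04], [Even-Dar&al. '06], [Kalyanakrishnan&al. '12], ...
$\Rightarrow$ two complexities:
$$\kappa_B(\nu) = \inf_{\mathcal{A} \text{ consistent}} \left( \limsup_{t \to \infty} -\frac{1}{t} \log p_t(\nu) \right)^{-1}, \qquad \kappa_C(\nu) = \inf_{\mathcal{A}\ \delta\text{-PAC}} \limsup_{\delta \to 0} \frac{\mathbb{E}_\nu[\tau]}{\log(1/\delta)}.$$
For a probability of error $\le \delta$: in the fixed-budget setting, a budget $t \simeq \kappa_B(\nu) \log(1/\delta)$ is needed; in the fixed-confidence setting, $\mathbb{E}_\nu[\tau] \simeq \kappa_C(\nu) \log(1/\delta)$.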
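A worked example of the error exponent inside $\kappa_B$, for one specific algorithm under our own assumptions: two Gaussian arms with common known $\sigma$, uniform sampling with $t/2$ draws per arm ($t$ even), and empirical-best recommendation. The difference of empirical means is $\mathcal{N}(\mu_1 - \mu_2, 4\sigma^2/t)$, so $p_t(\nu) = \Phi\big(-|\mu_1 - \mu_2|\sqrt{t}/(2\sigma)\big)$ exactly, and $-\frac{1}{t}\log p_t(\nu) \to (\mu_1 - \mu_2)^2/(8\sigma^2)$. Since $\kappa_B$ is an infimum over consistent algorithms, this gives the upper bound $\kappa_B(\nu) \le 8\sigma^2/(\mu_1 - \mu_2)^2$ for this class.

```python
import math


def uniform_error_prob(gap: float, sigma: float, t: int) -> float:
    """Exact p_t(nu) for uniform sampling (t/2 draws per arm, t even) on Gaussian
    arms N(mu_1, sigma^2), N(mu_2, sigma^2) with gap = |mu_1 - mu_2| > 0."""
    # The difference of empirical means is N(gap, 4 sigma^2 / t); an error occurs
    # when it has the wrong sign, with probability Phi(-gap sqrt(t) / (2 sigma)).
    x = gap * math.sqrt(t) / (2 * sigma)
    return 0.5 * math.erfc(x / math.sqrt(2))


def error_exponent(gap: float, sigma: float, t: int) -> float:
    """-(1/t) log p_t(nu), which tends to gap^2 / (8 sigma^2) as t grows."""
    return -math.log(uniform_error_prob(gap, sigma, t)) / t


for t in (100, 500, 2000):
    print(t, error_exponent(1.0, 1.0, t))
```

With a gap of 1 and $\sigma = 1$, the printed exponents decrease toward $1/8$. Plugging $\kappa = 8$ and $\delta = 10^{-3}$ into $t \simeq \kappa \log(1/\delta)$ suggests a budget of roughly 55 samples for this strategy, as an order of magnitude only.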

Lower bounds on the complexities: changes of distribution
Theorem (how to use, and hide, the change of distribution): let $\nu$ and $\nu'$ be two bandit models with $K$ arms such that, for all $a$, the distributions $\nu_a$ and $\nu'_a$ are mutually absolutely continuous. For any almost-surely finite stopping time $\sigma$ with respect to $(\mathcal{F}_t)$,
$$\sum_{a=1}^{K} \mathbb{E}_\nu[N_a(\sigma)] \, \mathrm{KL}(\nu_a, \nu'_a) \ge \sup_{\mathcal{E} \in \mathcal{F}_\sigma} \mathrm{kl}\big(\mathbb{P}_\nu(\mathcal{E}), \mathbb{P}_{\nu'}(\mathcal{E})\big),$$
where $\mathrm{kl}(x, y) = x \log(x/y) + (1-x) \log\big((1-x)/(1-y)\big)$.
Useful remark:
$$\forall \delta \in [0,1], \quad \mathrm{kl}(\delta, 1-\delta) \ge \log \frac{1}{2.4\,\delta}.$$
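A quick numerical sanity check of the useful remark on a grid of $\delta \in (0, 1)$ (a check, not a proof). One typical way to use it, in our paraphrase: for a $\delta$-PAC algorithm with $\delta \le 1/2$, take $\mathcal{E} = \{\hat a_\tau = 1\}$ and a model $\nu'$ in which arm 2 is best; then $\mathbb{P}_\nu(\mathcal{E}) \ge 1 - \delta$ and $\mathbb{P}_{\nu'}(\mathcal{E}) \le \delta$, so the right-hand side of the theorem is at least $\mathrm{kl}(1-\delta, \delta) = \mathrm{kl}(\delta, 1-\delta) \ge \log(1/(2.4\delta))$. The helper `smallest_gap` is hypothetical.

```python
import math


def kl(x: float, y: float) -> float:
    """Binary relative entropy kl(x, y) for x, y in (0, 1)."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))


def smallest_gap(num: int = 10_000) -> float:
    """min over the grid delta = i / num, 0 < i < num, of kl(delta, 1 - delta) - log(1 / (2.4 delta))."""
    return min(kl(d, 1 - d) - math.log(1 / (2.4 * d))
               for d in (i / num for i in range(1, num)))


print(smallest_gap())   # a small positive number: the remark holds on this grid
```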
