Stochastic Bandits
Kirthevasan Kandasamy, Carnegie Mellon University

Algorithm 1: Upper Confidence Bounds in GP Bandits
Model $f \sim \mathcal{GP}(0, \kappa)$.
Gaussian Process Upper Confidence Bound (GP-UCB) (Srinivas et al. 2010):
◮ Construct the upper confidence bound: $\varphi_t(x) = \mu_{t-1}(x) + \beta_t^{1/2} \sigma_{t-1}(x)$.
◮ Maximise the upper confidence bound to choose the next point $x_t$.

GP-UCB
$x_t = \operatorname{argmax}_x \; \mu_{t-1}(x) + \beta_t^{1/2} \sigma_{t-1}(x)$
◮ $\mu_{t-1}$: Exploitation
◮ $\sigma_{t-1}$: Exploration
◮ $\beta_t$ controls the tradeoff. $\beta_t \asymp \log t$.
GP-UCB, $\kappa$ is an SE kernel (Srinivas et al. 2010): w.h.p.
$S_n = f(x_\star) - \max_{t=1,\dots,n} f(x_t) \lesssim \sqrt{\frac{\log(n)^d \, \mathrm{vol}(\mathcal{X})}{n}}$
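The GP-UCB rule above can be sketched on a one-dimensional grid as follows. This is a minimal illustration, not the implementation from Srinivas et al.: the helper names (`se_kernel`, `gp_posterior`, `gp_ucb`), the bandwidth, the noise level and the choice $\beta_t = 2\log(t+1)$ are all assumptions made here; the slides only state $\beta_t \asymp \log t$.

```python
import numpy as np

def se_kernel(a, b, bandwidth=0.2):
    """Squared-exponential (SE) kernel matrix between 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise_var=1e-4):
    """Posterior mean and std-dev of a zero-mean GP with SE kernel on x_grid."""
    if len(x_obs) == 0:
        # No data yet: the posterior equals the prior (mean 0, std 1).
        return np.zeros(len(x_grid)), np.ones(len(x_grid))
    k_oo = se_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    k_go = se_kernel(x_grid, x_obs)
    mean = k_go @ np.linalg.solve(k_oo, y_obs)
    # The prior variance k(x, x) is 1 for this SE kernel.
    var = 1.0 - np.sum(k_go * np.linalg.solve(k_oo, k_go.T).T, axis=1)
    return mean, np.sqrt(np.maximum(var, 0.0))

def gp_ucb(f, x_grid, n_rounds, noise_std=0.01, seed=0):
    """Run GP-UCB for n_rounds on a finite grid; returns queried points and values."""
    rng = np.random.default_rng(seed)
    x_obs, y_obs = np.empty(0), np.empty(0)
    for t in range(1, n_rounds + 1):
        mean, std = gp_posterior(x_obs, y_obs, x_grid)
        beta_t = 2.0 * np.log(t + 1)         # assumed constant; only the log t rate is from the slides
        ucb = mean + np.sqrt(beta_t) * std   # phi_t(x) = mu_{t-1}(x) + beta_t^{1/2} sigma_{t-1}(x)
        x_t = x_grid[np.argmax(ucb)]         # maximise the upper confidence bound
        y_t = f(x_t) + noise_std * rng.standard_normal()
        x_obs, y_obs = np.append(x_obs, x_t), np.append(y_obs, y_t)
    return x_obs, y_obs
```

Restricting the argmax to a grid sidesteps the inner optimisation problem; in higher dimensions that maximisation would need its own optimiser.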

GP-UCB (Srinivas et al. 2010)
[Figure: GP-UCB run on $f(x)$ over $x$, shown from iteration $t = 1$.]

Algorithm 2: Thompson Sampling in GP Bandits
Model $f \sim \mathcal{GP}(0, \kappa)$.
Thompson Sampling (TS) (Thompson, 1933):
◮ Draw a sample $g$ from the posterior.
◮ Choose $x_t = \operatorname{argmax}_x g(x)$.
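One round of the Thompson Sampling rule above might look like the sketch below, again on a finite grid. The function name `thompson_step`, the SE bandwidth, the noise variance and the jitter term are illustrative assumptions, not part of the slides. The key difference from GP-UCB is that a whole function $g$ is drawn jointly from the posterior, so the full posterior covariance is needed rather than only the pointwise standard deviation.

```python
import numpy as np

def se_kernel(a, b, bandwidth=0.2):
    """Squared-exponential (SE) kernel matrix between 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)

def thompson_step(x_obs, y_obs, x_grid, rng, noise_var=1e-4):
    """One TS round: draw g from the GP posterior on x_grid and return its argmax."""
    prior_cov = se_kernel(x_grid, x_grid)
    if len(x_obs) == 0:
        mean, cov = np.zeros(len(x_grid)), prior_cov
    else:
        k_oo = se_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
        k_go = se_kernel(x_grid, x_obs)
        mean = k_go @ np.linalg.solve(k_oo, y_obs)
        cov = prior_cov - k_go @ np.linalg.solve(k_oo, k_go.T)
    # Small jitter keeps the covariance numerically positive semi-definite.
    cov = cov + 1e-8 * np.eye(len(x_grid))
    g = rng.multivariate_normal(mean, cov)  # joint posterior sample on the grid
    return x_grid[np.argmax(g)], g          # x_t = argmax_x g(x)
```

Exploration comes from the randomness of the draw: where the posterior is uncertain, samples vary more and occasionally peak there.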

Thompson Sampling (TS) in GPs (Thompson, 1933)
[Figure: Thompson Sampling run on $f(x)$ over $x$, shown from iteration $t = 1$.]

Outline
◮ Part I: Stochastic bandits (cont’d)
  1. Gaussian processes for smooth bandits
  2. Algorithms: Upper Confidence Bound (UCB) & Thompson Sampling (TS)
◮ Digression: SL2College Research Collaboration Program
◮ Part II: My research
  1. Multi-fidelity bandits: cheap approximations to an expensive experiment
  2. Parallelising arm pulls

SL2College
www.sl2college.org

SL2College Research Collaboration Program
- Ashwin de Silva
www.sl2college.org/research-collab
research-collab@sl2college.org

SL2College Research Collaboration Program: How it works
◮ We have a pool of doctoral, post-doctoral and professorial mentors (all Sri Lankan).
◮ We connect Sri Lankan undergraduates to mentors, who guide the students on a research project.
◮ Aim: publish a paper (at a good venue) within a 9-15 month time frame.

Application Process
◮ Fill out the application form on our webpage, www.sl2college.org/research-collab, mentioning your areas of interest and preferred mentors.
◮ Email your CV to research-collab@sl2college.org.
◮ If we decide to proceed, we ask you to submit a ∼1 page research statement covering:
  - your research interests & future plans
  - why you are interested in working with the chosen mentor.
◮ We send your CV & statement to the mentor. If he/she is interested, we initiate a collaboration.
◮ You report to us once every 3 months.

SL2College Research Collaboration Team
Ashwin, Nuwan, Rajitha, Umashanthi, Kirthevasan
www.sl2college.org/research-collab
research-collab@sl2college.org

Part 2.1: Multi-fidelity Bandits
Motivating question: What if we have cheap approximations to $f$?
1. Computational astrophysics and other scientific experiments: simulations and numerical computations with less granularity.
   E.g.: [Figure: the Hubble constant and baryonic density feed a cosmological simulator; a likelihood computation against the observation yields a likelihood score.]
2. Hyper-parameter tuning: train & validate with a subset of the data.
3. Robotics & autonomous driving: computer simulation vs real-world experiment.

Multi-fidelity Methods
For specific applications:
◮ Industrial design (Forrester et al. 2007)
◮ Hyper-parameter tuning (Agarwal et al. 2011, Klein et al. 2015, Li et al. 2016)
◮ Active learning (Zhang & Chaudhuri 2015)
◮ Robotics (Cutler et al. 2014)
Multi-fidelity bandits & optimisation (Huang et al. 2006, Forrester et al. 2007, March & Wilcox 2012, Poloczek et al. 2016)
... with theoretical guarantees (Kandasamy et al. NIPS 2016a&b, Kandasamy et al. ICML 2017)

Multi-fidelity Bandits (Kandasamy et al. ICML 2017)
A fidelity space $\mathcal{Z}$ and domain $\mathcal{X}$:
◮ $\mathcal{Z}$ ← all granularity values
◮ $\mathcal{X}$ ← space of cosmological parameters
$g : \mathcal{Z} \times \mathcal{X} \to \mathbb{R}$. $g(z, x)$ ← likelihood score when performing integrations on a grid of size $z$ at cosmological parameters $x$.
Denote $f(x) = g(z_\bullet, x)$ where $z_\bullet \in \mathcal{Z}$. $z_\bullet$ = highest grid size.
End Goal: Find $x_\star = \operatorname{argmax}_x f(x)$.
A cost function $\lambda : \mathcal{Z} \to \mathbb{R}_+$, e.g. $\lambda(z) = O(z^p)$ (say).
[Figure: $g(z, x)$ over $\mathcal{Z} \times \mathcal{X}$ with the slice $f(x)$ at $z_\bullet$ and its maximiser $x_\star$; cost $\lambda(z)$ plotted over $\mathcal{Z}$.]

Multi-fidelity Simple Regret (Kandasamy et al. ICML 2017)
End Goal: Find $x_\star = \operatorname{argmax}_x f(x)$.
Simple regret after capital $\Lambda$: $S(\Lambda) = f(x_\star) - \max_{t \,:\, z_t = z_\bullet} f(x_t)$.
$\Lambda$ ← amount of a resource spent, e.g. computation time or money.
No reward for pulling an arm at low fidelities, but use cheap evaluations at $z \neq z_\bullet$ to speed up the search for $x_\star$.
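To make the definition of $S(\Lambda)$ concrete, the sketch below computes it from a query history. The function name, the argument layout and the convention of returning infinity when no query at $z_\bullet$ fits within the capital are assumptions made for this illustration; the slide only gives the formula.

```python
def multi_fidelity_simple_regret(history, f, f_opt, z_top, cost, capital):
    """Simple regret S(capital) for a sequence of multi-fidelity queries.

    history: list of (z_t, x_t) pairs in the order they were queried.
    f:       the target function f(x) = g(z_top, x).
    f_opt:   the optimal value f(x_star).
    cost:    the cost function lambda(z).
    Only queries made at z_top, and paid for within the capital, count.
    """
    spent, best = 0.0, None
    for z_t, x_t in history:
        spent += cost(z_t)
        if spent > capital:
            break  # this query cannot be afforded, so stop here
        if z_t == z_top:
            value = f(x_t)
            best = value if best is None else max(best, value)
    # Assumed convention: infinite regret if nothing was evaluated at z_top.
    return float("inf") if best is None else f_opt - best
```

Low-fidelity queries still consume capital in this accounting, which is why they are only worthwhile if they steer later high-fidelity queries towards $x_\star$.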

Algorithm: BOCA (Kandasamy et al. ICML 2017)
Model $g \sim \mathcal{GP}(0, \kappa)$ and compute the posterior GP:
◮ mean $\mu_{t-1} : \mathcal{Z} \times \mathcal{X} \to \mathbb{R}$
◮ std-dev $\sigma_{t-1} : \mathcal{Z} \times \mathcal{X} \to \mathbb{R}_+$
(1) $x_t$ ← maximise the upper confidence bound for $f(x) = g(z_\bullet, x)$:
    $x_t = \operatorname{argmax}_{x \in \mathcal{X}} \; \mu_{t-1}(z_\bullet, x) + \beta_t^{1/2} \sigma_{t-1}(z_\bullet, x)$
(2) $\mathcal{Z}_t \approx \{z_\bullet\} \cup \left\{ z : \sigma_{t-1}(z, x_t) \ge \gamma(z) \right\}$, where $\gamma(z) = \xi(z) \left( \frac{\lambda(z)}{\lambda(z_\bullet)} \right)^{q}$
(3) $z_t = \operatorname{argmin}_{z \in \mathcal{Z}_t} \lambda(z)$ (cheapest $z$ in $\mathcal{Z}_t$)
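Steps (1) to (3) can be sketched on finite grids as below. This follows only the approximate rule for $\mathcal{Z}_t$ shown on the slide, not the full algorithm from the paper. The function `boca_select` and its arguments are hypothetical: the posterior quantities are passed in as precomputed arrays, `std_at` is an assumed callback returning $\sigma_{t-1}(z, x)$ for every $z$ on the grid, and `z_grid[-1]` is assumed to be $z_\bullet$.

```python
import numpy as np

def boca_select(mean_top, std_top, std_at, x_grid, z_grid, cost, xi, beta_t, q):
    """One BOCA-style selection round on finite grids (slide-level sketch).

    mean_top, std_top: posterior mean/std of g(z_top, x) for each x in x_grid.
    std_at(x):         array of posterior std of g(z, x) for each z in z_grid.
    cost(z), xi(z):    cost function lambda(z) and the xi(z) term of gamma(z).
    """
    # (1) Maximise the upper confidence bound for f(x) = g(z_top, x).
    ucb = mean_top + np.sqrt(beta_t) * std_top
    x_t = x_grid[np.argmax(ucb)]

    # (2) Z_t: z_top plus every z whose posterior std at x_t is at least gamma(z).
    z_top = z_grid[-1]
    costs = np.array([cost(z) for z in z_grid])
    gamma = np.array([xi(z) for z in z_grid]) * (costs / cost(z_top)) ** q
    in_z_t = std_at(x_t) >= gamma
    in_z_t[-1] = True  # z_top always belongs to Z_t

    # (3) Query the cheapest fidelity in Z_t.
    idx = np.flatnonzero(in_z_t)
    z_t = z_grid[idx[np.argmin(costs[idx])]]
    return x_t, z_t
```

The effect is that a cheap fidelity is used while it is still uncertain at $x_t$; once its posterior std falls below $\gamma(z)$, the rule moves on to more expensive fidelities.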

Theoretical Results for BOCA
[Figure: two panels showing $g(z, x)$ over $\mathcal{Z} \times \mathcal{X}$ with the slice $f(x)$ at $z_\bullet$ and maximiser $x_\star$. Left panel: “good” (large $h_{\mathcal{Z}}$). Right panel: “bad” (small $h_{\mathcal{Z}}$).]
E.g.: For SE kernels, the bandwidth $h_{\mathcal{Z}}$ controls smoothness.

Theoretical Results for BOCA
GP-UCB, SE kernel (Srinivas et al. 2010): w.h.p.
$S(\Lambda) \lesssim \sqrt{\frac{\mathrm{vol}(\mathcal{X})}{\Lambda}}$
BOCA, SE kernel (Kandasamy et al. ICML 2017): $\forall\, \alpha > 0$, w.h.p.
$S(\Lambda) \lesssim \sqrt{\frac{\mathrm{vol}(\mathcal{X}_\alpha)}{\Lambda}} + \sqrt{\frac{\mathrm{vol}(\mathcal{X})}{\Lambda^{2-\alpha}}}$, where $\mathcal{X}_\alpha = \left\{ x \,;\, f(x_\star) - f(x) \lesssim C_\alpha \frac{1}{h_{\mathcal{Z}}} \right\}$
If $h_{\mathcal{Z}}$ is large (good approximations), $\mathrm{vol}(\mathcal{X}_\alpha) \ll \mathrm{vol}(\mathcal{X})$, and BOCA is much better than GP-UCB.
N.B.: Dropping constants and polylog terms.
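A quick numerical reading of the two rates may help. The sketch below evaluates both right-hand sides with constants and polylog terms dropped, as on the slide, so only the relative sizes are meaningful. The function names and the volumes, capital and $\alpha$ used in the example are made up for illustration.

```python
import math

def gp_ucb_rate(vol_x, capital):
    """Right-hand side of the GP-UCB bound: sqrt(vol(X) / capital)."""
    return math.sqrt(vol_x / capital)

def boca_rate(vol_x_alpha, vol_x, capital, alpha):
    """Right-hand side of the BOCA bound for a given alpha > 0."""
    return (math.sqrt(vol_x_alpha / capital)
            + math.sqrt(vol_x / capital ** (2.0 - alpha)))

# Illustrative (made-up) values: X_alpha occupies 1% of the domain.
print(gp_ucb_rate(vol_x=1.0, capital=1e4))
print(boca_rate(vol_x_alpha=0.01, vol_x=1.0, capital=1e4, alpha=0.5))
```

With these values the GP-UCB expression is about 0.01 and the BOCA expression about 0.002: the first BOCA term shrinks with $\mathrm{vol}(\mathcal{X}_\alpha)$, while the second decays faster in $\Lambda$ than the GP-UCB rate whenever $\alpha < 1$.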

Experiment: Cosmological inference on Type Ia supernovae data
Estimate the Hubble constant, dark matter fraction & dark energy fraction by maximising the likelihood on $N_\bullet = 192$ data points.
Requires numerical integration on a grid of size $G_\bullet = 10^6$.
Approximate with $N \in [50, 192]$ or $G \in [10^2, 10^6]$ (2D fidelity space).
[Figure: results plot; axis labels and legend not recoverable.]

Stochastic Bandits Kirthevasan Kandasamy Carnegie Mellon University University of Moratuwa, Sri Lanka August 17, 2017 Slides: www.cs.cmu.edu/~kkandasa/misc/mora-slides.pdf Slides are up on my webpage: www.cs.cmu.edu/~kkandasa On-line
