
Stochastic Bandits. Kirthevasan Kandasamy, Carnegie Mellon University. University of Moratuwa, Sri Lanka, August 17, 2017. Slides: www.cs.cmu.edu/~kkandasa/misc/mora-slides.pdf (also up on my webpage: www.cs.cmu.edu/~kkandasa).


  1. Algorithm 1: Upper Confidence Bounds in GP Bandits
Model f ∼ GP(0, κ). Gaussian Process Upper Confidence Bound (GP-UCB) (Srinivas et al. 2010).
Construct the upper confidence bound: $\varphi_t(x) = \mu_{t-1}(x) + \beta_t^{1/2}\,\sigma_{t-1}(x)$.
[Figure: f(x) and the bound $\varphi_t$ over x.]

  2. Algorithm 1: Upper Confidence Bounds in GP Bandits
Model f ∼ GP(0, κ). GP-UCB (Srinivas et al. 2010).
Maximise the upper confidence bound: the next query $x_t$ is where $\varphi_t$ is largest.
[Figure: f(x), $\varphi_t$, and the chosen point $x_t$.]

  3. GP-UCB
$x_t = \arg\max_{x}\; \mu_{t-1}(x) + \beta_t^{1/2}\,\sigma_{t-1}(x)$
◮ µ_{t−1}: exploitation
◮ σ_{t−1}: exploration
◮ β_t controls the tradeoff; β_t ≍ log t.

  4. GP-UCB
$x_t = \arg\max_{x}\; \mu_{t-1}(x) + \beta_t^{1/2}\,\sigma_{t-1}(x)$
◮ µ_{t−1}: exploitation
◮ σ_{t−1}: exploration
◮ β_t controls the tradeoff; β_t ≍ log t.
GP-UCB, κ an SE kernel (Srinivas et al. 2010): w.h.p.
$S_n = f(x_\star) - \max_{t=1,\dots,n} f(x_t) \;\lesssim\; \sqrt{\frac{\mathrm{vol}(\mathcal{X})\,\log(n)^{d}}{n}}$
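To make the selection rule concrete, here is a minimal GP-UCB sketch in Python on a 1-D domain. It is an illustration under stated assumptions, not the paper's implementation: the RBF length-scale, the noise level `alpha`, the β_t = 2 log(t+1) constant, and the toy objective are all assumed for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_ucb_step(X_obs, y_obs, t, grid):
    """Return x_t = argmax_x mu_{t-1}(x) + sqrt(beta_t) * sigma_{t-1}(x)."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-3)
    gp.fit(X_obs, y_obs)                      # GP posterior given past queries
    mu, sigma = gp.predict(grid, return_std=True)
    beta_t = 2.0 * np.log(t + 1.0)            # beta_t ~ log t (constant assumed)
    ucb = mu + np.sqrt(beta_t) * sigma        # the bound phi_t from the slides
    return grid[np.argmax(ucb)]               # maximise the UCB over the grid

# Toy run: optimise a made-up f over [0, 1] for 25 rounds with noisy evaluations.
f = lambda x: np.sin(5 * x) * (1 - x)
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 500).reshape(-1, 1)
X_obs, y_obs = np.array([[0.5]]), np.array([f(0.5)])
for t in range(1, 26):
    x_t = gp_ucb_step(X_obs, y_obs, t, grid)
    X_obs = np.vstack([X_obs, x_t])
    y_obs = np.append(y_obs, f(x_t[0]) + 0.01 * rng.standard_normal())
```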

  5.–14. GP-UCB (Srinivas et al. 2010)
[Animation frames: the GP posterior and UCB on f(x) over x at iterations t = 1, 2, 3, 4, 5, 6, 7, 11, and 25.]

  15.–17. Algorithm 2: Thompson Sampling in GP Bandits
Model f ∼ GP(0, κ). Thompson Sampling (TS) (Thompson, 1933).
Draw a sample g from the posterior. Choose $x_t = \arg\max_x g(x)$.
[Figure: f(x), a posterior draw g, and the chosen point $x_t$.]
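A TS round differs from GP-UCB only in how the next point is chosen: maximise one posterior draw instead of the confidence bound. A minimal sketch under the same assumed model as the GP-UCB example above (the kernel, length-scale, and noise level remain illustrative choices):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def thompson_step(X_obs, y_obs, grid, seed):
    """One TS round: draw g ~ posterior, return x_t = argmax_x g(x) on the grid."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-3)
    gp.fit(X_obs, y_obs)
    g = gp.sample_y(grid, n_samples=1, random_state=seed).ravel()  # posterior draw g
    return grid[np.argmax(g)]                                      # maximise the draw
```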

  18.–27. Thompson Sampling (TS) in GPs (Thompson, 1933)
[Animation frames: posterior draws and queries on f(x) over x at iterations t = 1, 2, 3, 4, 5, 6, 7, 11, and 25.]

  28. Outline
◮ Part I: Stochastic bandits (cont'd)
  1. Gaussian processes for smooth bandits
  2. Algorithms: Upper Confidence Bound (UCB) & Thompson Sampling (TS)
◮ Digression: SL2College Research Collaboration Program
◮ Part II: My research
  1. Multi-fidelity bandits: cheap approximations to an expensive experiment
  2. Parallelising arm pulls

  29. SL2College
www.sl2college.org

  30. SL2College Research Collaboration Program - Ashwin de Silva
www.sl2college.org/research-collab | research-collab@sl2college.org

  31. SL2College Research Collaboration Program
How it works: We have a pool of doctoral/post-doctoral/professorial mentors (all Sri Lankan). We connect Sri Lankan undergrads to mentors, who will guide the students on a research project. Aim: publish a paper (at a good venue) within a 9–15 month time frame.

  32.–34. Application Process
◮ Fill out the application form on our webpage: www.sl2college.org/research-collab, mentioning your areas of interest and preferred mentors.
◮ ... and email your CV to research-collab@sl2college.org.
◮ If we decide to proceed, we ask you to submit a ~1 page research statement: your research interests & future plans, and why you are interested in working with the aforesaid mentor.
◮ We send your CV & statement to the mentor. If he/she is interested, we initiate a collaboration.
◮ You report to us once every 3 months.

  35. SL2College Research Collaboration Team: Ashwin, Nuwan, Rajitha, Umashanthi, Kirthevasan.
www.sl2college.org/research-collab | research-collab@sl2college.org

  36. Outline
◮ Part I: Stochastic bandits (cont'd)
  1. Gaussian processes for smooth bandits
  2. Algorithms: Upper Confidence Bound (UCB) & Thompson Sampling (TS)
◮ Digression: SL2College Research Collaboration Program
◮ Part II: My research
  1. Multi-fidelity bandits: cheap approximations to an expensive experiment
  2. Parallelising arm pulls

  37.–39. Part 2.1: Multi-fidelity Bandits
Motivating question: What if we have cheap approximations to f?
1. Computational astrophysics and other scientific experiments: simulations and numerical computations with less granularity. [Diagram: a cosmological simulator takes parameters (Hubble constant, baryonic density) and an observation, and outputs a likelihood score.]
2. Hyper-parameter tuning: train & validate with a subset of the data.
3. Robotics & autonomous driving: computer simulation vs. real-world experiment.

  40.–41. Multi-fidelity Methods
For specific applications:
◮ Industrial design (Forrester et al. 2007)
◮ Hyper-parameter tuning (Agarwal et al. 2011, Klein et al. 2015, Li et al. 2016)
◮ Active learning (Zhang & Chaudhuri 2015)
◮ Robotics (Cutler et al. 2014)
Multi-fidelity bandits & optimisation (Huang et al. 2006, Forrester et al. 2007, March & Wilcox 2012, Poloczek et al. 2016)
... with theoretical guarantees (Kandasamy et al. NIPS 2016a&b, Kandasamy et al. ICML 2017).

  42.–46. Multi-fidelity Bandits (Kandasamy et al. ICML 2017)
A fidelity space Z and a domain X.
Z ← all granularity values; X ← space of cosmological parameters.
g : Z × X → R. g(z, x) ← likelihood score when performing integrations on a grid of size z at cosmological parameters x.
Denote f(x) = g(z•, x), where z• ∈ Z is the highest grid size.
End Goal: Find x⋆ = argmax_x f(x).
A cost function λ : Z → R+, e.g. λ(z) = O(z^p) (say).
[Figures: the surface g(z, x) over Z × X with the top-fidelity slice f(x) = g(z•, x) and optimiser x⋆ marked, and the cost curve λ(z).]

  47.–49. Multi-fidelity Simple Regret (Kandasamy et al. ICML 2017)
End Goal: Find x⋆ = argmax_x f(x).
Simple regret after capital Λ:
$S(\Lambda) = f(x_\star) - \max_{t\,:\,z_t = z_\bullet} f(x_t)$
Λ ← amount of a resource spent, e.g. computation time or money.
No reward for pulling an arm at low fidelities, but cheap evaluations at z ≠ z• can be used to speed up the search for x⋆.
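The regret definition translates directly into code. A minimal sketch, assuming a hypothetical `history` list of (z, x, value) query records in the order they were made, and a `cost` callable standing in for λ:

```python
def simple_regret(history, cost, z_star, f_opt, budget):
    """S(Lambda): best top-fidelity value found within the capital budget."""
    best, spent = float("-inf"), 0.0
    for z, _x, f_val in history:
        spent += cost(z)              # every query consumes capital lambda(z)
        if spent > budget:            # capital Lambda exhausted
            break
        if z == z_star:               # reward only counts at the highest fidelity
            best = max(best, f_val)
    return f_opt - best               # S(Lambda) = f(x*) - max_{t: z_t = z*} f(x_t)
```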

  50.–58. Algorithm: BOCA (Kandasamy et al. ICML 2017)
Model g ∼ GP(0, κ) and compute the posterior GP:
mean µ_{t−1} : Z × X → R, std-dev σ_{t−1} : Z × X → R+.
(1) x_t ← maximise the upper confidence bound for f(x) = g(z•, x):
$x_t = \arg\max_{x \in \mathcal{X}}\; \mu_{t-1}(z_\bullet, x) + \beta_t^{1/2}\,\sigma_{t-1}(z_\bullet, x)$
(2) $\mathcal{Z}_t \approx \{z_\bullet\} \cup \{\,z : \sigma_{t-1}(z, x_t) \ge \gamma(z)\,\}$, where $\gamma(z) = \xi(z)\left(\frac{\lambda(z)}{\lambda(z_\bullet)}\right)^{q}$.
(3) $z_t = \arg\min_{z \in \mathcal{Z}_t} \lambda(z)$ (the cheapest z in $\mathcal{Z}_t$).
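Putting steps (1)-(3) together, here is a sketch of one BOCA round. The signatures are hypothetical, not the authors' code: the posterior mean `mu`, std-dev `sigma`, threshold `gamma`, and cost are assumed to be available as callables, and X is discretised to a grid.

```python
import numpy as np

def boca_step(mu, sigma, beta_t, gamma, cost, fidelities, z_star, grid):
    """One BOCA round over a discrete fidelity set and a grid over X."""
    # (1) x_t maximises the UCB for f(x) = g(z_star, x).
    ucb = mu(z_star, grid) + np.sqrt(beta_t) * sigma(z_star, grid)
    x_t = grid[np.argmax(ucb)]
    # (2) candidate fidelities: z_star plus every z still uncertain at x_t,
    #     i.e. sigma_{t-1}(z, x_t) >= gamma(z).
    Z_t = [z for z in fidelities if sigma(z, x_t) >= gamma(z)] + [z_star]
    # (3) query the cheapest candidate fidelity.
    z_t = min(Z_t, key=cost)
    return z_t, x_t
```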

  59.–60. Theoretical Results for BOCA
[Figures: two surfaces g(z, x) with the top-fidelity slice f(x) and x⋆ marked. In the "good" case g varies slowly along Z (large h_Z); in the "bad" case it varies quickly (small h_Z).]
E.g.: for SE kernels, the bandwidth h_Z controls smoothness along Z.

  61.–64. Theoretical Results for BOCA
GP-UCB, SE kernel (Srinivas et al. 2010): w.h.p.
$S(\Lambda) \lesssim \sqrt{\frac{\mathrm{vol}(\mathcal{X})}{\Lambda}}$
BOCA, SE kernel (Kandasamy et al. ICML 2017): ∀ α > 0, w.h.p.
$S(\Lambda) \lesssim \sqrt{\frac{\mathrm{vol}(\mathcal{X}_\alpha)}{\Lambda}} + \sqrt{\frac{\mathrm{vol}(\mathcal{X})}{\Lambda^{2-\alpha}}}$, where $\mathcal{X}_\alpha = \{\,x : f(x_\star) - f(x) \lesssim C_\alpha h_{\mathcal{Z}}\,\}$.
If h_Z is large (good approximations), vol(X_α) ≪ vol(X), and BOCA is much better than GP-UCB.
N.B.: dropping constants and polylog terms.

  65.–66. Experiment: Cosmological inference on Type Ia supernovae data
Estimate the Hubble constant, dark matter fraction & dark energy fraction by maximising the likelihood on N• = 192 data points. Requires numerical integration on a grid of size G• = 10^6. Approximate with N ∈ [50, 192] or G ∈ [10^2, 10^6] (2-D fidelity space).
[Figure: simple regret versus spent capital.]
