

  1. Bayesian optimisation Gilles Louppe April 11, 2016

  2. Problem statement

x* = arg max_x f(x)

Constraints:
• f is a black box for which no closed form is known; gradients df/dx are not available;
• f is expensive to evaluate;
• (optional) observations y_i of f are noisy, e.g. y_i = f(x_i) + ε_i because of Poisson fluctuations.

Goal: find x*, while minimising the number of evaluations of f(x).
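In code, this setting looks like the following minimal sketch. The objective f and the noise model are hypothetical stand-ins for illustration; in the real problem only something like `observe` would be callable, with no closed form available:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical ground truth; the optimiser never sees this closed form.
    return -(x - 0.3) ** 2 + 1.0

def observe(x):
    # Noisy, expensive evaluation y_i = f(x_i) + eps_i; Gaussian noise here
    # stands in for e.g. Poisson fluctuations on measured counts.
    return f(x) + 0.05 * rng.normal()

# All the optimiser may do is call observe() a small number of times.
y_0 = observe(0.0)
```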

  3. Disclaimer

If you do not have these constraints, there is certainly a better optimisation algorithm than Bayesian optimisation (e.g., L-BFGS-B, Powell's method as in Minuit, etc.).
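For instance, when f is cheap and smooth, a quasi-Newton method finishes in a handful of evaluations; a sketch with SciPy on a toy quadratic (the objective here is illustrative, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

g = lambda x: (x[0] - 0.3) ** 2          # cheap, smooth toy objective
res = minimize(g, x0=np.array([2.0]), method="L-BFGS-B")
# res.x ends up close to the true minimiser 0.3 after few evaluations
```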

  4. Bayesian optimisation

For t = 1 : T:
1. Given observations (x_i, y_i) for i = 1 : t, build a probabilistic model for the objective f. Integrate out all possible true functions, using Gaussian process regression.
2. Optimise a cheap utility function u based on the posterior distribution for sampling the next point: x_{t+1} = arg max_x u(x). Exploit uncertainty to balance exploration against exploitation.
3. Sample the next observation y_{t+1} at x_{t+1}.
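The loop above can be sketched end to end with a hand-rolled GP surrogate and a UCB utility. All choices below (RBF kernel, length scale, κ = 1.96, the toy objective, the dense 1-D grid) are illustrative assumptions, not the slides' exact setup:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential covariance: encodes a smoothness prior on f.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    # Standard zero-mean GP regression equations.
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    mu = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)  # k(x, x) = 1
    return mu, np.sqrt(np.clip(var, 0.0, None))

def f(x):
    # Toy stand-in for the expensive black box.
    return np.sin(3.0 * x) - x ** 2

grid = np.linspace(-2.0, 2.0, 401)            # cheap dense grid in 1-D
x_obs = np.array([-1.5, 0.0, 1.5])
y_obs = f(x_obs)

for t in range(20):
    mu, sigma = gp_posterior(x_obs, y_obs, grid)   # 1. model the objective
    ucb = mu + 1.96 * sigma                        # 2. cheap utility u(x)
    x_next = grid[np.argmax(ucb)]
    x_obs = np.append(x_obs, x_next)               # 3. sample y_{t+1}
    y_obs = np.append(y_obs, f(x_next))

x_best = x_obs[np.argmax(y_obs)]                   # incumbent after T steps
```

A library implementation would replace `gp_posterior` with a fitted surrogate and the grid search with a proper inner optimisation, but the structure of the loop is unchanged.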

  5. Where shall we sample next?

[Figure: the true (unknown) function f(x) and a few observations.]

  6. Build a probabilistic model for the objective function

[Figure: true (unknown) function, observations, posterior mean µ_GP(x), and confidence interval.]

This gives a posterior distribution over functions that could have generated the observed data.
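Concretely, under an RBF kernel with unit signal variance (an assumption of this sketch, with toy data), the posterior at a query point is Gaussian with mean and variance given by the standard GP regression equations:

```python
import numpy as np

def k(a, b, ell=0.5):
    # RBF kernel with unit variance; ell controls the assumed smoothness.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

X = np.array([-1.0, 0.0, 1.0])      # observed inputs
y = np.sin(X)                       # observed values (noise-free here)
xq = np.array([0.5])                # query point

K_inv = np.linalg.inv(k(X, X) + 1e-8 * np.eye(len(X)))
mu = (k(X, xq).T @ K_inv @ y)[0]                          # mu_GP(x)
var = (k(xq, xq) - k(X, xq).T @ K_inv @ k(X, xq))[0, 0]   # sigma_GP(x)^2
lo, hi = mu - 1.96 * var ** 0.5, mu + 1.96 * var ** 0.5   # 95% CI band
```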

  7. Acquisition functions

Acquisition functions u(x) specify which sample x should be tried next:
• Upper confidence bound: UCB(x) = µ_GP(x) + κ σ_GP(x);
• Probability of improvement: PI(x) = P(f(x) ≥ f(x_t^+) + κ);
• Expected improvement: EI(x) = E[f(x) − f(x_t^+)];
• ... and many others,
where x_t^+ is the best point observed so far.

In most cases, acquisition functions provide knobs (e.g., κ) for controlling the exploration-exploitation trade-off:
• search in regions where µ_GP(x) is high (exploitation);
• probe regions where uncertainty σ_GP(x) is high (exploration).
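Given the posterior mean µ and standard deviation σ at a candidate x, and f(x_t^+) the best value observed so far, the three utilities have simple closed forms. Note that for EI the usual closed form takes the positive part of the improvement; only `math.erf` is needed, no SciPy:

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def ucb(mu, sigma, kappa=1.96):
    # Upper confidence bound; kappa trades exploration for exploitation.
    return mu + kappa * sigma

def pi(mu, sigma, f_best, kappa=0.0):
    # Probability of improving on the incumbent by at least kappa.
    return norm_cdf((mu - f_best - kappa) / sigma)

def ei(mu, sigma, f_best):
    # Closed-form expected (positive-part) improvement under the posterior.
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm_cdf(z) + sigma * norm_pdf(z)

# Exploration in action: a highly uncertain candidate wins under UCB.
candidates = [(-0.5, 0.2), (0.1, 0.05), (0.0, 1.0)]   # (mu, sigma) pairs
best = max(candidates, key=lambda ms: ucb(*ms))
```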

  8. Plugging everything together (t = 0), incumbent x_t^+ = 0.1000

[Figure: true function, observations, µ_GP(x), confidence interval, and utility u(x).]

x_{t+1} = arg max_x UCB(x)

  9. ... and repeat until convergence (t = 1), x_t^+ = 0.1000
  10. ... and repeat until convergence (t = 2), x_t^+ = 0.1000
  11. ... and repeat until convergence (t = 3), x_t^+ = 0.1000
  12. ... and repeat until convergence (t = 4), x_t^+ = 0.1000
  13. ... and repeat until convergence (t = 5), x_t^+ = 0.2858

[Figures: the GP posterior and utility u(x), updated after each new observation.]

  14. What is Bayesian about Bayesian optimisation?

• The Bayesian strategy treats the unknown objective function as a random function and places a prior over it. The prior captures our beliefs about the behaviour of the function; it is here defined by a Gaussian process whose covariance function captures assumptions about the smoothness of the objective.
• Function evaluations are treated as data. They are used to update the prior to form the posterior distribution over the objective function.
• The posterior distribution, in turn, is used to construct an acquisition function for querying the next point.

  15. Limitations

• Bayesian optimisation has parameters itself!
  - choice of the acquisition function;
  - choice of the kernel (i.e., design of the prior);
  - parameter wrapping;
  - initialisation scheme.
• Gaussian processes usually do not scale well to many observations and to high-dimensional data. Sequential model-based optimisation provides a direct and effective alternative (i.e., replace GPs by a tree-based model).
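The tree-based alternative can be sketched by reading a mean and an uncertainty off the spread of an ensemble. Here scikit-learn's random forest is used as the surrogate; taking the per-tree standard deviation as σ is a common heuristic, not a calibrated posterior, and the data below are synthetic:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(30, 1))          # past observations
y = np.sin(3.0 * X[:, 0]) - X[:, 0] ** 2

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
grid = np.linspace(-2.0, 2.0, 200)[:, None]
per_tree = np.stack([tree.predict(grid) for tree in forest.estimators_])
mu, sigma = per_tree.mean(axis=0), per_tree.std(axis=0)   # surrogate stats
x_next = grid[np.argmax(mu + 1.96 * sigma), 0]            # UCB, as before
```

Unlike a GP, the forest refits in near-linear time in the number of observations and handles high-dimensional or categorical inputs directly.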

  16. Applications

• Bayesian optimisation has been used in many scientific fields, including robotics, machine learning, and the life sciences.
• Use cases for high-energy physics:
  - optimisation of simulation parameters in event generators;
  - optimisation of compiler flags to maximise execution speed;
  - optimisation of hyper-parameters in machine learning for HEP;
  - ... let's discuss further ideas!

  17. Software

• Python:
  - Spearmint: https://github.com/JasperSnoek/spearmint
  - GPyOpt: https://github.com/SheffieldML/GPyOpt
  - RoBO: https://github.com/automl/RoBO
  - scikit-optimize: https://github.com/MechCoder/scikit-optimize (work in progress)
• C++:
  - MOE: https://github.com/yelp/MOE

Check also this GitHub repo for a vanilla implementation reproducing these slides.

  18. Summary

• Bayesian optimisation provides a principled approach for optimising an expensive function f;
• it is often very effective, provided it is itself properly configured;
• it is a hot topic in machine learning research: expect quick improvements!

  19. References

• Brochu, E., Cora, V. M., and de Freitas, N. (2010). A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599.
• Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. (2016). Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148-175.

