SLIDE 1

A Short Introduction to Bayesian Optimization

With applications to parameter tuning on accelerators

Johannes Kirschner, 28th February 2018

ICFA Workshop on Machine Learning for Accelerator Control

SLIDE 2

Solve \( x^* = \arg\max_{x \in \mathcal{X}} f(x) \)

SLIDE 3

Application: Tuning of Accelerators

Example: x = parameter settings on the accelerator, f(x) = pulse energy.

SLIDE 4

Application: Tuning of Accelerators

Example: x = parameter settings on the accelerator, f(x) = pulse energy.

Goal: Find \( x^* = \arg\max_{x \in \mathcal{X}} f(x) \) using only noisy evaluations \( y_t = f(x_t) + \epsilon_t \).
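To make the setup concrete, here is a minimal sketch of the interface Bayesian optimization assumes: a black box that returns one noisy measurement per query. The toy objective f and the noise level are illustrative placeholders, not the actual accelerator readout.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Toy stand-in for the hidden objective (e.g. pulse energy);
    the real f is only accessible through measurements on the machine."""
    return float(np.exp(-np.sum((np.asarray(x) - 0.3) ** 2)))

def evaluate(x, noise_std=0.1):
    """One noisy evaluation y_t = f(x_t) + eps_t."""
    return f(x) + rng.normal(scale=noise_std)
```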

SLIDE 5

Part 1) A flexible & statistically sound model for f: Gaussian Processes

SLIDE 6

From Linear Least Squares to Gaussian Processes

Given: measurements \( (x_1, y_1), \dots, (x_T, y_T) \). Goal: find a statistical estimator \( \hat{f}(x) \) of \( f \).

SLIDE 7

From Linear Least Squares to Gaussian Processes

Regularized linear least squares:

\[ \hat{\theta} = \arg\min_{\theta \in \mathbb{R}^d} \sum_{t=1}^{T} \left( x_t^\top \theta - y_t \right)^2 + \|\theta\|^2 \]
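The estimator above has a closed form, \( \hat{\theta} = (X^\top X + I)^{-1} X^\top y \). A minimal sketch (the weight lam is exposed as a knob; the slide's formula corresponds to lam = 1):

```python
import numpy as np

def ridge_estimate(X, y, lam=1.0):
    """Closed form of the regularized least-squares problem:
    theta_hat = (X^T X + lam * I)^{-1} X^T y.
    X has shape (T, d) with rows x_t; y has shape (T,)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```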

SLIDE 8

From Linear Least Squares to Gaussian Processes

Least squares regression in a Hilbert space \( \mathcal{H} \):

\[ \hat{f} = \arg\min_{f \in \mathcal{H}} \sum_{t=1}^{T} \left( f(x_t) - y_t \right)^2 + \|f\|_{\mathcal{H}}^2 \]

SLIDE 9

From Linear Least Squares to Gaussian Processes

Least squares regression in a Hilbert space \( \mathcal{H} \):

\[ \hat{f} = \arg\min_{f \in \mathcal{H}} \sum_{t=1}^{T} \left( f(x_t) - y_t \right)^2 + \|f\|_{\mathcal{H}}^2 \]

Closed-form solution if \( \mathcal{H} \) is a Reproducing Kernel Hilbert Space, defined by a kernel \( k : \mathcal{X} \times \mathcal{X} \to \mathbb{R} \).

Example: RBF kernel \( k(x, y) = \exp\left( -\frac{\|x - y\|^2}{2\sigma^2} \right) \)

The kernel characterizes the smoothness of the functions in \( \mathcal{H} \).
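By the representer theorem, the minimizer has the closed form \( \hat{f}(x) = k(x, X)(K + I)^{-1} y \), where \( K \) is the kernel matrix of the observed points. A minimal sketch with the RBF kernel, assuming unit regularization as in the formula above:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """RBF kernel matrix: k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma**2))

def krr_fit_predict(X, y, X_new, sigma=1.0, reg=1.0):
    """Kernel ridge regression: f_hat(x) = k(x, X) @ (K + reg * I)^{-1} y."""
    K = rbf_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + reg * np.eye(len(X)), y)
    return rbf_kernel(X_new, X, sigma) @ alpha
```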

SLIDE 10

From Linear Least Squares to Gaussian Processes

\[ \hat{f} = \arg\min_{f \in \mathcal{H}} \sum_{t=1}^{T} \left( f(x_t) - y_t \right)^2 + \|f\|_{\mathcal{H}}^2 \]

SLIDE 13

From Linear Least Squares to Gaussian Processes

Bayesian Interpretation: \( \hat{f} \) is the posterior mean of a Gaussian Process. A Gaussian Process is a distribution over functions, such that

  • any finite collection of function values is jointly multivariate normally distributed,
  • the covariance structure is defined through the kernel.
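Since the GP is the Bayesian view of the kernel ridge estimator, the posterior mean coincides with \( \hat{f} \), and the posterior variance quantifies the uncertainty used for the confidence intervals in Part 2. A minimal sketch, reusing rbf_kernel from above and assuming unit observation noise:

```python
def gp_posterior(X, y, X_new, sigma=1.0, noise=1.0):
    """Posterior mean and variance of a GP with RBF kernel, given
    noisy observations y = f(X) + eps with eps ~ N(0, noise)."""
    K = rbf_kernel(X, X, sigma) + noise * np.eye(len(X))
    K_star = rbf_kernel(X_new, X, sigma)   # cross-covariances
    mean = K_star @ np.linalg.solve(K, y)  # equals the KRR prediction
    var = rbf_kernel(X_new, X_new, sigma).diagonal() \
        - np.einsum("ij,ji->i", K_star, np.linalg.solve(K, K_star.T))
    return mean, var
```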

SLIDE 17

Part 2) Bayesian Optimization Algorithms

SLIDE 18

Bayesian Optimization: Introduction

Idea: Use confidence intervals to efficiently optimize f. Example: plausible maximizers, i.e. points that the current confidence bounds do not yet rule out as the maximizer.

SLIDE 20

Bayesian Optimization: GP-UCB

Idea: Use confidence intervals to efficiently optimize f. Example: GP-UCB (Gaussian Process Upper Confidence Bound), sketched below.
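A minimal sketch of one GP-UCB round over a finite candidate grid, reusing gp_posterior from above. The rule evaluates the point maximizing the upper confidence bound mean + sqrt(beta) * std; keeping beta fixed is a simplifying assumption (in the theory it grows slowly with t):

```python
def gp_ucb_step(X_obs, y_obs, candidates, beta=2.0):
    """One GP-UCB round: pick the candidate maximizing the upper
    confidence bound mu(x) + sqrt(beta) * sigma(x)."""
    mean, var = gp_posterior(np.asarray(X_obs), np.asarray(y_obs), candidates)
    ucb = mean + np.sqrt(beta * np.maximum(var, 0.0))  # clip tiny negative variances
    return candidates[np.argmax(ucb)]

# Hypothetical usage with the toy evaluate() from the beginning:
# x_next = gp_ucb_step(X_obs, y_obs, candidates)
# X_obs = np.vstack([X_obs, x_next])
# y_obs = np.append(y_obs, evaluate(x_next))
```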

SLIDE 29

Bayesian Optimization: GP-UCB

Convergence guarantee: \( f(x_t) \to f(x^*) \) as \( t \to \infty \).

SLIDE 30

Bayesian Optimization: GP-UCB

Convergence guarantee, quantitatively (average regret):

\[ \frac{1}{T} \sum_{t=1}^{T} \left( f(x^*) - f(x_t) \right) \le O\!\left( \frac{1}{\sqrt{T}} \right) \]

SLIDE 31

Extension 1: Safe Bayesian Optimization

Objective: Keep a safety function s(x) below a threshold c:

\[ \max_{x \in \mathcal{X}} f(x) \quad \text{s.t.} \quad s(x) \le c \]

SafeOpt: [Sui et al. (2015); Berkenkamp et al. (2016)]

SLIDE 32

Extension 1: Safe Bayesian Optimization

Safe Tuning of 2 Matching Quadrupoles at SwissFEL.

SLIDE 33

Extension 2: Heteroscedastic Noise

What if the noise variance depends on the evaluation point?

SLIDE 34

Extension 2: Heteroscedastic Noise

What if the noise variance depends on the evaluation point? Standard approaches such as GP-UCB are agnostic to the noise level. Information Directed Sampling performs Bayesian optimization with heteroscedastic noise, including theoretical guarantees [Kirschner and Krause (2018); Russo and Van Roy (2014)]. A sketch of how point-dependent noise enters the model follows below.
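A minimal sketch of the modeling change (not the IDS acquisition rule itself): the constant noise term in the GP posterior is replaced by a diagonal matrix of per-observation noise variances. Reuses rbf_kernel from above; noise_vars is assumed known here:

```python
def gp_posterior_hetero(X, y, X_new, noise_vars, sigma=1.0):
    """GP posterior with heteroscedastic noise: each observation y_t
    has its own noise variance noise_vars[t] on the diagonal."""
    K = rbf_kernel(X, X, sigma) + np.diag(noise_vars)
    K_star = rbf_kernel(X_new, X, sigma)
    mean = K_star @ np.linalg.solve(K, y)
    var = rbf_kernel(X_new, X_new, sigma).diagonal() \
        - np.einsum("ij,ji->i", K_star, np.linalg.solve(K, K_star.T))
    return mean, var
```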

SLIDE 35

Acknowledgments

Experiments at SwissFEL: joint work with Franziska Frei, Nicole Hiller, Rasmus Ischebeck, Andreas Krause, and Mojmir Mutny.
Plots: thanks to Felix Berkenkamp for sharing his Python notebooks.
Pictures: accelerator structure by Franziska Frei.

SLIDE 36

References

  • F. Berkenkamp, A. P. Schoellig, and A. Krause. Safe Controller Optimization for Quadrotors with Gaussian Processes. ICRA, 2016.
  • J. Kirschner and A. Krause. Information Directed Sampling and Bandits with Heteroscedastic Noise. arXiv preprint, 2018.
  • D. Russo and B. Van Roy. Learning to Optimize via Information-Directed Sampling. NIPS, 2014.
  • Y. Sui, A. Gotovos, J. W. Burdick, and A. Krause. Safe Exploration for Optimization with Gaussian Processes. ICML, 2015.
