BayesOpt: hot topics and current challenges, Javier González

BayesOpt: hot topics and current challenges
Javier González
Masterclass, 7 February 2017, Lancaster University

Agenda of the day
◮ 9:00-11:00, Introduction to Bayesian Optimization:
  ◮ What is BayesOpt and why does it work?
  ◮ Relevant things to know.
◮ 11:30-13:00, Connections, extensions and applications:
  ◮ Extensions to multi-task problems, constrained domains, early stopping and high dimensions.
  ◮ Connections to multi-armed bandits and ABC.
  ◮ An application in genetics.
◮ 14:00-16:00, GPyOpt LAB!: Bring your own problem!
◮ 16:30-15:30, Hot topics and current challenges:
  ◮ Parallelization.
  ◮ Non-myopic methods.
  ◮ Interactive Bayesian Optimization.

Section III: Hot topics and challenges
◮ Parallel Bayesian Optimization.
◮ Non-myopic methods.
◮ Interactive Bayesian Optimization.

Scalable BO: Parallel/batch BO
Avoiding the bottleneck of evaluating f
◮ Cost of $f(x_n)$ = cost of $\{f(x_{n,1}), \dots, f(x_{n,n_b})\}$.
◮ Many cores available, simultaneous lab experiments, etc.

Considerations when designing a batch
◮ Available pairs $\{(x_i, y_i)\}_{i=1}^{n}$ are augmented with the evaluations of f on $B_t^{n_b} = \{x_{t,1}, \dots, x_{t,n_b}\}$.
◮ Goal: design $B_1^{n_b}, \dots, B_m^{n_b}$.
Notation:
◮ $\mathcal{I}_n$: represents the available data set $\mathcal{D}_n$ and the GP structure when n data points are available ($\mathcal{I}_{t,k}$ in the batch context).
◮ $\alpha(x; \mathcal{I}_n)$: generic acquisition function given $\mathcal{I}_n$.

Optimal greedy batch design
Sequential policy: maximize $\alpha(x; \mathcal{I}_{t,0})$.
Greedy batch policy, 1st element of the t-th batch: maximize $\alpha(x; \mathcal{I}_{t,0})$.

Optimal greedy batch design
Sequential policy: maximize $\alpha(x; \mathcal{I}_{t,0})$.
Greedy batch policy, 2nd element of the t-th batch: maximize
$$\int \alpha(x; \mathcal{I}_{t,1})\, p(y_{t,1} \mid x_{t,1}, \mathcal{I}_{t,0})\, p(x_{t,1} \mid \mathcal{I}_{t,0})\, dx_{t,1}\, dy_{t,1}$$
◮ $p(y_{t,1} \mid x_{t,1}, \mathcal{I}_{t,0})$: predictive distribution of the GP.
◮ $p(x_{t,1} \mid \mathcal{I}_{t,0}) = \delta(x_{t,1} - \arg\max_{x \in \mathcal{X}} \alpha(x; \mathcal{I}_{t,0}))$.

Optimal greedy batch design
Sequential policy: maximize $\alpha(x; \mathcal{I}_{t,k-1})$.
Greedy batch policy, k-th element of the t-th batch: maximize
$$\int \alpha(x; \mathcal{I}_{t,k-1}) \prod_{j=1}^{k-1} p(y_{t,j} \mid x_{t,j}, \mathcal{I}_{t,j-1})\, p(x_{t,j} \mid \mathcal{I}_{t,j-1})\, dx_{t,j}\, dy_{t,j}$$
◮ $p(y_{t,j} \mid x_{t,j}, \mathcal{I}_{t,j-1})$: predictive distribution of the GP.
◮ $p(x_{t,j} \mid \mathcal{I}_{t,j-1}) = \delta(x_{t,j} - \arg\max_{x \in \mathcal{X}} \alpha(x; \mathcal{I}_{t,j-1}))$.

Available approaches
[Azimi et al., 2010; Desautels et al., 2012; Chevalier et al., 2013; Contal et al., 2013]
◮ Exploratory approaches, reduction in system uncertainty.
◮ Generate 'fake' observations of f using $p(y_{t,j} \mid x_{t,j}, \mathcal{I}_{t,j-1})$.
◮ Simultaneously optimize the elements of the batch using the joint distribution of $y_{t,1}, \dots, y_{t,n_b}$.
Bottleneck: all these methods require iteratively updating $p(y_{t,j} \mid x_{t,j}, \mathcal{I}_{t,j-1})$ to model the interaction between the elements in the batch: $O(n^3)$.
How to design batches while reducing this cost? Local penalization.

Goal: eliminate the marginalization step
"To develop a heuristic approximating the 'optimal batch design strategy' at lower computational cost, while incorporating information about global properties of f from the GP model into the batch design"
Lipschitz continuity: $|f(x_1) - f(x_2)| \le L \|x_1 - x_2\|_p$.

Interpretation of the Lipschitz continuity of f
$M = \max_{x \in \mathcal{X}} f(x)$ and $B_{r_{x_j}}(x_j) = \{x \in \mathcal{X} : \|x - x_j\| \le r_{x_j}\}$, where $r_{x_j} = \frac{M - f(x_j)}{L}$.
[Figure: f(x) against x, showing the true function, samples, exclusion cones and active regions.]
$x_M \notin B_{r_{x_j}}(x_j)$; otherwise, the Lipschitz condition is violated.

Probabilistic version of $B_{r_x}(x)$
We can do this because $f(x) \sim \mathcal{GP}(\mu(x), k(x, x'))$.
◮ $r_{x_j}$ is Gaussian with $\mu(r_{x_j}) = \frac{M - \mu(x_j)}{L}$ and $\sigma^2(r_{x_j}) = \frac{\sigma^2(x_j)}{L^2}$.
Local penalizers: $\varphi(x; x_j) = p(x \notin B_{r_{x_j}}(x_j))$
$$\varphi(x; x_j) = p(r_{x_j} < \|x - x_j\|) = 0.5\,\mathrm{erfc}(-z),$$
where $z = \frac{1}{\sqrt{2\sigma_n^2(x_j)}} \left( L\|x_j - x\| - M + \mu_n(x_j) \right)$.
◮ Reflects the size of the 'Lipschitz' exclusion areas.
◮ Approaches 1 when x is far from $x_j$ and decreases otherwise.
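The penalizer formula on this slide can be evaluated directly. The following is a minimal Python sketch, assuming a Euclidean norm and that the posterior mean `mu_j` and variance `var_j` at x_j come from whatever GP model is in use. The function and argument names are illustrative and are not taken from GPyOpt.

```python
import math

def local_penalizer(x, x_j, mu_j, var_j, L, M):
    """phi(x; x_j) = 0.5 * erfc(-z): probability that x lies outside the
    Lipschitz exclusion ball centred at x_j.

    x, x_j : sequences of floats (points in the domain)
    mu_j   : GP posterior mean at x_j (assumed given)
    var_j  : GP posterior variance at x_j (assumed given, > 0)
    L      : Lipschitz constant
    M      : (estimate of) the maximum of f
    """
    dist = math.dist(x, x_j)  # Euclidean distance ||x - x_j||
    z = (L * dist - M + mu_j) / math.sqrt(2.0 * var_j)
    return 0.5 * math.erfc(-z)

# Illustrative values: the mean radius of the ball is (M - mu_j) / L = 1.
near = local_penalizer((0.0,), (0.0,), mu_j=1.0, var_j=0.25, L=2.0, M=3.0)
edge = local_penalizer((1.0,), (0.0,), mu_j=1.0, var_j=0.25, L=2.0, M=3.0)
far = local_penalizer((10.0,), (0.0,), mu_j=1.0, var_j=0.25, L=2.0, M=3.0)
```

With these values the penalizer is close to 0 at x_j itself, equals 0.5 at the mean radius of the ball, and tends to 1 far away, which matches the two bullet points of the slide.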

Idea to collect the batches
Without explicitly using the model.
Optimal batch: maximization-marginalization
$$\int \alpha(x; \mathcal{I}_{t,k-1}) \prod_{j=1}^{k-1} p(y_{t,j} \mid x_{t,j}, \mathcal{I}_{t,j-1})\, p(x_{t,j} \mid \mathcal{I}_{t,j-1})\, dx_{t,j}\, dy_{t,j}$$
Proposal: maximization-penalization. Use the $\varphi(x; x_j)$ to penalize the acquisition and predict the expected change in $\alpha(x; \mathcal{I}_{t,k-1})$.

Local penalization strategy [González, Dai, Hennig, Lawrence, 2016]
[Figure: three panels (1st, 2nd and 3rd batch element) plotting value against x, showing $\alpha(x)$, the penalizers $\varphi_1(x)$ and $\varphi_2(x)$, and the penalized acquisitions $\alpha(x)\varphi_1(x)$ and $\alpha(x)\varphi_1(x)\varphi_2(x)$.]
The maximization-penalization strategy selects $x_{t,k}$ as
$$x_{t,k} = \arg\max_{x \in \mathcal{X}} \left\{ g(\alpha(x; \mathcal{I}_{t,0})) \prod_{j=1}^{k-1} \varphi(x; x_{t,j}) \right\},$$
where g is a transformation of $\alpha(x; \mathcal{I}_{t,0})$ that makes it always positive.
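The maximization-penalization rule can be sketched on a one-dimensional grid. This is a toy illustration under several assumptions: the acquisition `acq`, the posterior mean `mean` and variance `var` are hypothetical callables standing in for a fitted GP, the arg max is taken by exhaustive search over the grid, and g is taken to be a softplus, which is one possible positive transformation (the slide does not fix g).

```python
import math

def softplus(a):
    """Numerically stable log(1 + exp(a)); one possible choice for g."""
    return math.log1p(math.exp(-abs(a))) + max(a, 0.0)

def penalizer(x, x_j, mu_j, var_j, L, M):
    """phi(x; x_j) = 0.5 * erfc(-z) for scalar inputs."""
    z = (L * abs(x - x_j) - M + mu_j) / math.sqrt(2.0 * var_j)
    return 0.5 * math.erfc(-z)

def select_batch(grid, acq, mean, var, L, M, batch_size):
    """Pick batch points one at a time by maximizing g(alpha) * prod_j phi_j.

    The acquisition is evaluated once (on I_{t,0}); the model is never updated.
    """
    base = [softplus(acq(x)) for x in grid]
    batch = []
    for _ in range(batch_size):
        scores = []
        for x, score in zip(grid, base):
            for x_j in batch:  # penalize around points already in the batch
                score *= penalizer(x, x_j, mean(x_j), var(x_j), L, M)
            scores.append(score)
        best = max(range(len(grid)), key=scores.__getitem__)
        batch.append(grid[best])
    return batch

# Toy setting (all values are illustrative assumptions).
grid = [i * 0.1 - 10.0 for i in range(201)]
acq = lambda x: math.exp(-(x - 2.0) ** 2) + 0.8 * math.exp(-(x + 3.0) ** 2)
batch = select_batch(grid, acq, mean=lambda x: 0.0, var=lambda x: 1.0,
                     L=1.0, M=1.0, batch_size=3)
```

In this toy run the first point lands on the highest acquisition peak and the second on the other peak, because the penalizer suppresses the neighbourhood of the first pick. A larger L shrinks the exclusion zones, so batch points may cluster more closely.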

Example for L = 50 L controls the exploration-exploitation balance within the batch.

Finding a unique Lipschitz constant
Let $f: \mathcal{X} \to \mathbb{R}$ be an L-Lipschitz continuous function defined on a compact subset $\mathcal{X} \subseteq \mathbb{R}^D$. Then
$$L_p = \max_{x \in \mathcal{X}} \|\nabla f(x)\|_p$$
is a valid Lipschitz constant.
The gradient of f at $x_*$ is distributed as a multivariate Gaussian:
$$\nabla f(x_*) \mid X, y, x_* \sim \mathcal{N}(\mu_\nabla(x_*), \Sigma^2_\nabla(x_*))$$
We choose:
$$\hat{L} = \max_{x_* \in \mathcal{X}} \|\mu_\nabla(x_*)\|$$
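As a rough illustration of this estimate, the sketch below computes the gradient of a GP posterior mean analytically and takes the largest gradient norm over a set of candidate points. It assumes a zero-mean GP with a squared-exponential kernel and fixed, hand-picked hyperparameters (`ell`, `sf2`, `noise`), and it replaces the maximization over the whole domain with a search over a finite grid. All function names are hypothetical.

```python
import numpy as np

def rbf(A, B, ell, sf2):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sf2 * np.exp(-0.5 * d2 / ell**2)

def fit_weights(X, y, ell=0.3, sf2=1.0, noise=1e-6):
    """Weights w = (K + noise * I)^{-1} y of the posterior mean."""
    K = rbf(X, X, ell, sf2) + noise * np.eye(len(X))
    return np.linalg.solve(K, y)

def posterior_mean(Xs, X, w, ell=0.3, sf2=1.0):
    """mu(x_*) = k(x_*, X) w."""
    return rbf(Xs, X, ell, sf2) @ w

def posterior_mean_grad(Xs, X, w, ell=0.3, sf2=1.0):
    """Gradient of mu: sum_i w_i k(x_*, x_i) (x_i - x_*) / ell^2."""
    K = rbf(Xs, X, ell, sf2)                  # shape (m, n)
    diff = X[None, :, :] - Xs[:, None, :]     # shape (m, n, d)
    return np.einsum("mn,mnd,n->md", K, diff, w) / ell**2

def lipschitz_estimate(candidates, X, w, ell=0.3, sf2=1.0):
    """Largest Euclidean norm of the mean gradient over the candidates."""
    grads = posterior_mean_grad(candidates, X, w, ell, sf2)
    return float(np.linalg.norm(grads, axis=1).max())

# Toy data (illustrative): eight noiseless observations of sin(3x) on [0, 1].
X = np.linspace(0.0, 1.0, 8)[:, None]
y = np.sin(3.0 * X[:, 0])
w = fit_weights(X, y)
grid = np.linspace(0.0, 1.0, 201)[:, None]
L_hat = lipschitz_estimate(grid, X, w)
```

In practice the maximization would typically be done with a continuous optimizer rather than a grid, and the kernel hyperparameters would be those of the fitted model.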

Experiments: Sobol function Best (average) result for some given time budget.

2D experiment with 'large domain'
Comparison in terms of wall-clock time.
[Figure: best found value against time (seconds) for EI, UCB, Rand-EI, Rand-UCB, SM-UCB, B-UCB, PE-UCB, Pred-EI, Pred-UCB, qEI, LP-EI and LP-UCB.]

Myopia of optimisation techniques
◮ Most global optimisation techniques are myopic: they consider no more than a single step into the future.
◮ Relieving this myopia requires solving the multi-step lookahead problem.
Figure: Two evaluations; if the first evaluation is made myopically, the second must be sub-optimal.

Non-myopic thinking
Thinking non-myopically is important: it integrates into our decisions the information about the (limited) resources available to solve a given problem.

Acquisition function: expected loss [Osborne, 2010]
Loss of evaluating f at $x_*$, assuming it returns $y_*$:
$$\lambda(y_*) \triangleq \begin{cases} y_*, & \text{if } y_* \le \eta \\ \eta, & \text{if } y_* > \eta, \end{cases}$$
where $\eta = \min\{\mathbf{y}_0\}$ is the current best found value.
The loss expectation is:
$$\Lambda_1(x_* \mid \mathcal{I}_0) \triangleq \mathbb{E}[\min(y_*, \eta)] = \int \lambda(y_*)\, p(y_* \mid x_*, \mathcal{I}_0)\, dy_*$$
$\mathcal{I}_0$ is the current information: $\mathcal{D}$, $\theta$ and likelihood type.
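When the predictive distribution $p(y_* \mid x_*, \mathcal{I}_0)$ is Gaussian with mean μ and standard deviation σ, the one-step expected loss has the closed form $\eta + (\mu - \eta)\Phi(u) - \sigma\phi(u)$ with $u = (\eta - \mu)/\sigma$. The sketch below evaluates it, assuming `mu` and `sigma` are supplied by the GP at the candidate point. The function name is illustrative.

```python
import math

def expected_loss(mu, sigma, eta):
    """One-step expected loss E[min(y, eta)] for y ~ N(mu, sigma^2).

    mu, sigma : predictive mean and standard deviation at x_* (assumed given)
    eta       : current best (lowest) observed value
    """
    u = (eta - mu) / sigma
    cdf = 0.5 * math.erfc(-u / math.sqrt(2.0))            # Phi(u)
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)  # phi(u)
    return eta + (mu - eta) * cdf - sigma * pdf

# Illustrative values.
at_best = expected_loss(mu=0.0, sigma=1.0, eta=0.0)      # equals -phi(0)
hopeless = expected_loss(mu=5.0, sigma=1.0, eta=0.0)     # barely below eta
certain = expected_loss(mu=-1.0, sigma=1e-6, eta=0.0)    # close to mu
```

The expected loss never exceeds η, and it is lowest where the model predicts either a low mean or a large uncertainty, which is the usual exploration-exploitation trade-off of expected improvement written as a loss.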

The expected loss (improvement) is myopic
◮ It selects the next evaluation as if it were the last one.
◮ The remaining available budget is not taken into account when deciding where to evaluate.
How can the effect of future evaluations be taken into account in the decision?

Expected loss with n steps ahead
Intractable even for a handful of steps ahead:
$$\Lambda_n(x_* \mid \mathcal{I}_0) = \int \lambda(y_n) \prod_{j=1}^{n} p(y_j \mid x_j, \mathcal{I}_{j-1})\, p(x_j \mid \mathcal{I}_{j-1})\, dy_* \dots dy_n\, dx_2 \dots dx_n$$
◮ $p(y_j \mid x_j, \mathcal{I}_{j-1})$: predictive distribution of the GP at $x_j$.
◮ $p(x_j \mid \mathcal{I}_{j-1})$: optimisation step.

Relieving the myopia of Bayesian optimisation
We present... GLASSES!
Global optimisation with Look-Ahead through Stochastic Simulation and Expected-loss Search

GLASSES: Rendering the approximation sparse
Idea: jointly model the epistemic uncertainty about the steps ahead using a point process.
$$\Gamma_n(x_* \mid \mathcal{I}_0) = \int \lambda(y_n)\, p(\mathbf{y} \mid X, \mathcal{I}_0, x_*)\, p(X \mid \mathcal{I}_0, x_*)\, d\mathbf{y}\, dX$$

GLASSES: Technical details
Selecting a good $p(X \mid \mathcal{I}_0, x_*)$ is complicated.
◮ Replace integrating over $p(X \mid \mathcal{I}_0, x_*)$ by conditioning on an oracle predictor $\mathcal{F}_n(x_*)$ of the n future locations.
◮ $\mathbf{y} = (y_*, \dots, y_n)^T$: Gaussian outputs of f at $\mathcal{F}_n(x_*)$.
◮ $\Lambda_n(x_* \mid \mathcal{I}_0, \mathcal{F}_n(x_*)) = \Gamma_n(x_* \mid \mathcal{I}_0, \mathcal{F}_n(x_*)) = \mathbb{E}[\min(\mathbf{y}, \eta)]$.
◮ $\mathbb{E}[\min(\mathbf{y}, \eta)]$ is computed using Expectation Propagation.
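To get a feel for the quantity $\mathbb{E}[\min(\mathbf{y}, \eta)]$, the sketch below estimates it by plain Monte Carlo sampling from the joint Gaussian of the outputs at the predicted locations. This is not the Expectation Propagation computation named on the slide; it is a simpler substitute for illustration only. It assumes that the joint mean and covariance are available from the GP, and it reads $\min(\mathbf{y}, \eta)$ as the minimum over all elements of $\mathbf{y}$ together with η. The function name is hypothetical.

```python
import numpy as np

def mc_expected_min(mean, cov, eta, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[min(y_*, ..., y_n, eta)] for y ~ N(mean, cov).

    mean, cov : joint GP predictive mean and covariance at the future
                locations (assumed given)
    eta       : current best (lowest) observed value
    """
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)  # (n_samples, n)
    return float(np.minimum(samples.min(axis=1), eta).mean())

# Illustrative values: one step ahead versus two independent steps ahead.
one_step = mc_expected_min([0.0], [[1.0]], eta=0.0)
two_steps = mc_expected_min([0.0, 0.0], np.eye(2), eta=0.0)
```

With one step the estimate should agree with the closed-form one-step expected loss up to sampling error, and adding a second step lowers the expected loss because the additional evaluation can only improve the best value found.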
