ATM Algorithm on the BBOB 2009 Noiseless Function Testbed - PowerPoint PPT Presentation
Benchmarking the ATM Algorithm on the BBOB 2009 Noiseless Function Testbed. Benjamin Bodner, Brown University, Providence, RI, USA. BBOB Workshop, GECCO 2019, Prague.


SLIDE 1

4/1/2020 Benchmarking the ATM Algorithm on the BBOB 2009 Noiseless Function Testbed 1 of 34

Benchmarking the ATM Algorithm on the BBOB 2009 Noiseless Function Testbed

Benjamin Bodner

Brown University, Providence, RI, USA

BBOB Workshop GECCO 2019 Prague

SLIDE 2

Content

01 Introduction: Motivation, Intuition
02 Main Components: Parameters & main equations, Parameter adaptation, Resource allocation
03 Results: BBOB Noiseless, BBOB Large-scale, Internal runtime
04 Summary: Recent progress, Goals moving forward, Conclusions


SLIDE 3

Motivation

  • Growing need for optimization methods for very high-dimensional settings

Image from: https://towardsdatascience.com/why- deep-learning-is-needed-over-traditional- machine-learning-1b6a99177063

Optimization Algorithms

  • Problems commonly have 10^5 to 10^8 optimizable variables [Devlin et al. 2019]

Deep Learning Physical Sciences

Image from GOMC: https://gomc-wsu.github.io/Manual/index.html

SLIDE 4

Motivation

Deep Learning

  • Gradient-based optimization methods can create many difficulties [Shalev-Shwartz et al. 2017]

Difficulties include: vanishing gradients, getting stuck in local minima, noise, hyperparameter tuning.

Current ways of mitigating these issues: architecture design, regularization [Sutskever 2013]. These do not always work.

Image from: https://towardsdatascience.com/gradient-descent-algorithm-and-its-variants-10f652806a3
Image from [He et al. 2015]
Image from: Srivastava, Nitish, et al. 2014

SLIDE 5

Motivation

Image by Thomas Splettstoesser: https://www.behance.net/gallery/10952399/Protein- Folding-Funnel Image from GOMC: https://gomc-wsu.github.io/Manual/index.html

  • Functions are non-convex
  • Notoriously have large numbers of local minima [Nichita 2002]

Image from: https://en.wikibooks.org/wiki/Structural_Biochemistry/Proteins/Protein_Folding_Problem

Physical Sciences

  • Simulated annealing and quasi-Newton methods can be slow
  • Do not always converge to the global minimum [Hao et al. 2015]


Interacting Particles Protein Folding

SLIDE 6

Motivation

Existing algorithms have been highly successful in these settings. These characteristics were intentionally designed into the BBOB function testbeds.

[BIPOP CMA-ES, Hansen 2009]

However, covariance matrices and Hessians limit their scalability: key components and operations are usually of order D^2.

Images from: Finck, Hansen, Ros, Auger 2015

SLIDE 7

Proposal

Eliminate the use of D^2 objects and operations.

The Adaptive Two Mode (ATM) Algorithm: a black-box optimization algorithm which only maintains objects and executes operations of order D.


SLIDE 8

The Adaptive Two Mode Algorithm

ATM uses a combination of two kinds of search distributions, or "modes":

  • Directional distribution (exploitation)
  • Isotropic distribution (exploration)

The two modes complement each other, and ATM uses a set of rules to control the amplitudes and interactions between the modes.


SLIDE 9

ATM Algorithm

1. Start from an isotropic distribution.
2. If a sample leads to improvement: suggest samples in that direction.
3. If new samples also lead to improvement: sample in the same direction at exponentially increasing amplitude.
4. Once no more "good" samples are found: start over with the isotropic search (using an evolution strategy).


Figure legend: best sample, best sample from last step, regular sample.


Repeat
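The loop above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the unit-direction bookkeeping, the doubling factor, and the restart behavior are assumptions made for clarity.

```python
import numpy as np

def atm_sketch(f, x0, r_max=1.0, budget=2000, seed=0):
    """Minimal sketch of the two-mode loop: isotropic exploration,
    then directional exploitation at growing amplitude, then restart."""
    rng = np.random.default_rng(seed)
    x_best, y_best = np.asarray(x0, dtype=float), f(x0)
    direction, amp = None, r_max
    for _ in range(budget):
        if direction is None:
            step = amp * rng.standard_normal(x_best.size)  # isotropic mode
        else:
            step = amp * direction                         # directional mode
        x = x_best + step
        y = f(x)
        if y < y_best:
            # improvement: lock onto this direction, grow the amplitude
            direction = step / np.linalg.norm(step)
            x_best, y_best = x, y
            amp *= 2.0
        else:
            # no "good" sample: restart the isotropic search
            direction, amp = None, r_max
    return x_best, y_best
```

On a convex function such as the sphere, the directional doublings give rapid progress until they overshoot, at which point the isotropic restarts take over.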

SLIDE 10

Parameters of the Algorithm

There are (currently) 8 parameters which play several roles in the ATM algorithm:

  • Controlling the growth factors of the modes
  • Controlling the amplitudes of the modes


if (X_best,t − X_best,t−1)^2 > ΔX_min^2:  d += 1, r = 0
else:  r += 1, d = 0

R = R_max · exp(G_r · sin(mod(π·r / 2T_r, π/2)) − 1)

D = R_max · exp(G_d·d − D_d·r)
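The update rules on this slide can be sketched as below. The symbol names (G_r, G_d, D_d, T_r, R_max, dX_min) are decoded from a garbled font mapping in the extracted slide, so treat both the names and the exact grouping of the amplitude formulas as assumptions.

```python
import numpy as np

def update_mode_amplitudes(x_best_t, x_best_prev, state, p):
    """Hedged reconstruction of the counter/amplitude rules.

    state: dict with counters d (directional) and r (isotropic).
    p: dict of ATM parameters (names are best-effort decodings).
    Returns (R, D): isotropic and directional mode amplitudes.
    """
    if (x_best_t - x_best_prev) ** 2 > p["dX_min"] ** 2:
        state["d"] += 1          # progress: grow the directional mode
        state["r"] = 0
    else:
        state["r"] += 1          # stagnation: grow the isotropic mode
        state["d"] = 0
    # amplitude of the isotropic (exploration) mode
    R = p["R_max"] * np.exp(
        p["G_r"] * np.sin(np.mod(np.pi * state["r"] / (2 * p["T_r"]), np.pi / 2)) - 1
    )
    # amplitude of the directional (exploitation) mode
    D = p["R_max"] * np.exp(p["G_d"] * state["d"] - p["D_d"] * state["r"])
    return R, D
```

After a significant improvement the directional amplitude D grows exponentially in d, while stagnation drives D down and cycles the isotropic amplitude R instead.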

SLIDE 11

Parameters of the Algorithm


  • Controlling the search distribution along different axes:

S = γ·S + (1 − γ) · mean[(O − O_Gbest) · (X − X_Gbest,t)^2]

A = β1·S + β2
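The per-axis update can be sketched as follows. The variable names (S, A, X, O, gamma, beta1, beta2) follow the decoded slide notation, and the exact weighting is a best-effort reconstruction, not the author's confirmed formula. The key property it illustrates is that everything stays O(D): only a length-D vector is maintained, never a DxD matrix.

```python
import numpy as np

def update_axis_scales(S, X, O, x_gbest, o_gbest, gamma=0.9, beta1=1.0, beta2=1e-3):
    """Hedged sketch of the per-axis scale update.

    S: (D,) running per-axis statistic.
    X: (n_samples, D) recent samples; O: (n_samples,) their objective values.
    x_gbest, o_gbest: global best sample and its objective value.
    """
    # weight squared per-axis displacements by the objective difference,
    # then smooth with an exponential moving average
    weighted = (O - o_gbest)[:, None] * (X - x_gbest) ** 2
    S = gamma * S + (1 - gamma) * weighted.mean(axis=0)
    A = beta1 * S + beta2          # per-axis amplitude, O(D) storage only
    return S, A
```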

SLIDE 12

Online Parameter Tuning

OP = (mean(ΔOP_best) + min(ΔOP_best)) / 2

ΔOP_best = best change in the true objective function, found by the parameter set
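The tuning score decoded above can be computed directly. The function name and the idea of passing a recent window of ΔOP_best values are assumptions for illustration; the slide only gives the mean/min combination itself.

```python
import numpy as np

def parameter_set_score(best_deltas):
    """Score OP for one parameter set: the average of the mean and the
    minimum of the best objective changes (ΔOP_best) it achieved.
    Mixing in the minimum rewards sets that found large improvements,
    not just consistent small ones."""
    best_deltas = np.asarray(best_deltas, dtype=float)
    return 0.5 * (best_deltas.mean() + best_deltas.min())
```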

Different functions, and changing characteristics at different stages of the search, create a need for online parameter tuning.

  • 4 intertwined parameter sets
  • Parameter sets are optimized by another Two-Mode algorithm
  • Objective function designed to reflect the "success" at the task of minimizing the true objective function

How to do this?



SLIDE 13

Problem with Online Tuning


New parameter sets in a changing local search space mean a good chance of unsuitable parameter sets.

Proposal: allocate fewer resources to "bad" parameter sets and more resources to better ones.

Figure: resources allocated to each parameter set vs. the performance of that parameter set.

SLIDE 14

Parallel Optimization with Resource Allocation

  • Given a fixed number of samples N_tot, distributed among m parameter sets
  • Change the allocation of samples to reflect their performance


N_{t+1} = N_t − K·M⁻¹·ΔOP_best,t − K₀·M⁻¹·(N_t − N₀)

N_t = resource allocation vector at iteration t

SLIDE 15

Parallel Optimization with Resource Allocation – Choice of Matrices


K = L ·
⎡ m−1   −1   ⋯   −1 ⎤
⎢  −1  m−1   ⋯   −1 ⎥
⎢   ⋮    ⋮    ⋱    ⋮ ⎥
⎣  −1   −1   ⋯  m−1 ⎦

K₀ = k₀·I,   M = ν·I

  • Conserves the total number of samples
  • Merit-based allocation system

N_{t+1} = N_t − K·M⁻¹·ΔOP_best,t − K₀·M⁻¹·(N_t − N₀)
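The allocation update and matrix choices above can be implemented in a few lines. The symbol names (K, K₀, M, L, k₀, ν) are decoded from the garbled slide and the parameter values below are illustrative; the sketch mainly demonstrates the conservation property: because every row of K sums to zero, the merit-driven term redistributes samples without changing their total.

```python
import numpy as np

def allocate_resources(N, N0, delta_op_best, m, L=0.1, k0=0.05, nu=1.0):
    """Sketch of N_{t+1} = N_t - K M^-1 dOP_best - K0 M^-1 (N_t - N0),
    with K = L*(m*I - ones), K0 = k0*I, M = nu*I."""
    I = np.eye(m)
    K = L * (m * I - np.ones((m, m)))   # rows sum to zero -> conservation
    K0 = k0 * I
    M_inv = I / nu
    # merit term shifts samples toward sets with better (more negative)
    # objective changes; the K0 term pulls allocations back toward N0
    return N - K @ M_inv @ delta_op_best - K0 @ M_inv @ (N - N0)
```

With equal starting allocations and ΔOP_best = (−1, 0, 1), the set with the largest improvement gains samples at the expense of the worst one, while the total stays fixed.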

SLIDE 16


Information Flow Throughout ATM Components

Diagram: parameter sets 1-4 produce suggestions for samples; the samples are evaluated; values of the objective function feed the resource allocation, which updates the parameter sets; repeat.

SLIDE 17

ATM Optimization Process


Convergence plots: Rotated Ellipse (f10), Sharp Ridge (f13), Sum of Different Powers (f14).

SLIDE 18

Results on BBOB Testbed - Overview

Succeeds at solving:

  • 23/24 functions in 2D
  • 8/24 functions in 40D (+ large budget)

  • One of the best optimizers for the separable functions subset (f1-f5)
  • Underperforms on non-separable functions, especially if ill-conditioned and/or noisy


SLIDE 19

Results on BBOB Testbed - Successes


  • Very effective at optimizing separable functions
  • Capable at optimizing functions with "large" convex regions around the global minima ("large" = comparable to R_max)

SLIDE 20

Results on BBOB Testbed - Underperformance


  • Poor performance on rotated and ill-conditioned functions
  • Poor performance on rotated and noisy/multimodal functions

SLIDE 21

Results from BBOB Large-scale


Budget = 3000D

SLIDE 22

Ability to Scale to Large Search Spaces


Internal runtime of the ATM algorithm scales linearly as a function of the number of variables in the search space

Results from timing experiment:

  • Internal runtime = total runtime - evaluation time
  • 128 function evaluations, averaged over 3 runs
  • f1 sphere function
  • Measured: number of variables needed to pass 1.0 sec of internal runtime
  • Google Colab GPU

Figure: internal runtime (seconds) as a function of the number of variables in the search space, comparing ATM, CMA (pip install cma), Nelder-Mead (scipy.optimize), BFGS (scipy.optimize), and L-BFGS-B (scipy.optimize).
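The timing protocol described above (internal runtime = total runtime minus objective-evaluation time) can be sketched as below. The `optimizer_step` interface is a hypothetical stand-in for one ATM sampling iteration, not the author's API.

```python
import time
import numpy as np

def internal_runtime(optimizer_step, f, dim, n_evals=128):
    """Measure time spent in the optimizer itself, excluding the time
    spent evaluating the objective function f."""
    x, y_best = np.zeros(dim), float("inf")
    eval_time = 0.0
    t0 = time.perf_counter()
    for _ in range(n_evals):
        cand = optimizer_step(x)
        te = time.perf_counter()
        y = f(cand)                          # excluded from internal time
        eval_time += time.perf_counter() - te
        if y < y_best:
            x, y_best = cand, y
    return (time.perf_counter() - t0) - eval_time
```

Running this for a range of `dim` values and plotting the result is how one would verify the linear-scaling claim on this slide.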

SLIDE 23

Recent Progress

  • Introduced a primary axis, updated by a moving-average rule
  • Performance of the ATM is improved on rotated functions


Figure: convergence plots (log10 Δg vs. iterations) for Rotated Ellipse (f10) and Sharp Ridge (f13), comparing the original ATM with the new version.

SLIDE 24

Goals Moving Forward

  • Improve performance on rotated and ill-conditioned functions (without using DxD objects)
  • Increase performance in noisy environments, using averaging and a moving mean
  • Add a second population with weak restart conditions, for multimodal functions
  • Make the ATM more user friendly and customizable

For more information see: https://github.com/BjBodner/ATM-optimization-algorithm


SLIDE 25

Conclusions

  • Scales linearly with the size of the search space: no DxD objects
  • Very efficient at optimizing separable functions
  • Underperforms on rotated functions, e.g., ill-conditioned and/or noisy functions
  • Good candidate for optimizing very high-dimensional problems
  • More research is needed


The ATM Algorithm

SLIDE 26

Acknowledgements

Dr. Brenda Rubenstein, Brown University, Providence, RI, USA, for her guidance in developing this algorithm. Her contributions and encouragement were essential in advancing this project and getting it to its current form.

Dr. Eran Treister, Ben-Gurion University, Beersheva, Israel, for his ongoing collaboration. Working with him is significantly helping improve the performance of the algorithm.


SLIDE 27

References


  • Nikolaus Hansen. Benchmarking a BI-Population CMA-ES on the BBOB-2009 Function Testbed. GECCO '09: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers, pages 2389-2396.
  • Bin Qian, Angel R. Ortiz, David Baker. Improvement of comparative model accuracy by free-energy optimization along principal components of natural structural variation. PNAS, October 26, 2004, vol. 101, no. 43, 1534.
  • Dan Vladimir Nichita, Susana Gomez, Eduardo Luna. Multiphase equilibria calculation by direct minimization of Gibbs free energy with a global optimization method. Computers and Chemical Engineering 26 (2002), 1703-1724.

SLIDE 28

References

  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805.
  • Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah. Failures of Gradient-Based Deep Learning. ICML '17: Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 3067-3075, 2017.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. arXiv:1512.03385v1 [cs.CV], 10 Dec 2015.
  • Sutskever, I., Martens, J., Dahl, G., Hinton, G. On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning, Volume 28, ICML '13, III-1139-III-1147 (JMLR.org, 2013).


SLIDE 29

Thank you! Questions?

Email: benjamin_bodner@brown.edu
For more information see: https://github.com/BjBodner/ATM-optimization-algorithm