Performance Assessment in Optimization
Anne Auger, CMAP & Inria
Visualization and presentation of single runs
Displaying 3 runs (three trials)
Displaying 51 runs
Which Statistics?
More problems with averages / expectations
from Hansen GECCO 2019 Experimentation tutorial
Which Statistics?
Implications
from Hansen GECCO 2019 Experimentation tutorial
Benchmarking Black-Box Optimizers
Benchmarking: running an algorithm on several test functions in order to evaluate the performance of the algorithm
Why Numerical Benchmarking?
- Evaluate the performance of optimization algorithms
- Compare the performance of different algorithms
- Understand strengths and weaknesses of algorithms
- Help in the design of new algorithms
On performance measures …
Performance measure - What to measure?
CPU time (to reach a given target)
Drawbacks:
- depends on the implementation, on the language, on the machine
- time is spent on code optimization instead of science
Testing heuristics: we have it all wrong, J.N. Hooker, Journal of Heuristics, 1995
Prefer an “absolute” value: the number of function evaluations to reach a given target.
Assumptions: the internal cost of the algorithm is negligible, or measured independently.
On performance measures - Requirements
Quantitative measures, e.g. “Algorithm A is 10/100 times faster than Algorithm B to solve this type of problems”
As opposed to:
- displayed: mean f-value after 3·10^5 f-evaluations (51 runs)
- bold: statistically significant
- concluded: “EFWA significantly better than EFWA-NG”
Source: Dynamic search in fireworks algorithm, Shaoqiu Zheng, Andreas Janecek, Junzhi Li and Ying Tan, CEC 2014
On performance measures - Requirements
A performance measure should be:
- quantitative, with a ratio scale
- well-interpretable, with a meaning relevant in the “real world”
- simple
Fixed Cost versus Fixed Budget - Collecting Data
Collect, for a given target (or several targets), the number of function evaluations needed to reach the target.
Repeat several times (a minimal data-collection sketch follows below):
- if the algorithm is stochastic, never draw a conclusion from a single run
- if the algorithm is deterministic, repeat by changing (randomly) the initial conditions
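As an illustration of this data collection, here is a minimal Python sketch. The optimizer inside `run_once` is only a random-search placeholder, and `run_once`, `collect_runtimes`, and the sphere function are hypothetical names introduced here, not part of any benchmarking tool.

```python
import numpy as np

def run_once(f, x0, target, max_evals, rng):
    """One (placeholder) run: return the number of evaluations needed to reach
    f(x) <= target, or np.inf if the target is not reached within the budget."""
    best = f(x0)
    evals = 1
    while best > target and evals < max_evals:
        # pure random search around x0; stands in for the algorithm under study
        best = min(best, f(x0 + rng.standard_normal(len(x0))))
        evals += 1
    return evals if best <= target else np.inf

def collect_runtimes(f, dim, target, n_runs=15, max_evals=10**4, seed=0):
    """Repeat runs with randomized initial conditions and record one runtime per run."""
    rng = np.random.default_rng(seed)
    return np.array([run_once(f, rng.uniform(-5, 5, dim), target, max_evals, rng)
                     for _ in range(n_runs)])

sphere = lambda x: float(np.sum(x ** 2))              # simple test function
runtimes = collect_runtimes(sphere, dim=5, target=1.0)
```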
ECDF: Empirical Cumulative Distribution Function of the Runtime
Definition of an ECDF
Let $X_1, \ldots, X_n$ be real random variables. Then the empirical cumulative distribution function (ECDF) is defined as
$$\hat{F}(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{X_i \le t\}}$$
We display the ECDF of the runtime to reach target function values (see next slides for illustrations)
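A minimal sketch of how such an ECDF of runtimes can be computed and displayed with NumPy and matplotlib; the runtime values below are purely illustrative numbers, not measured results.

```python
import numpy as np
import matplotlib.pyplot as plt

# First-hitting times of 15 runs for one target (np.inf = target not reached).
# Illustrative numbers only.
runtimes = np.array([1200, 1800, 2100, 2300, 2600, 2900, 3100, 3400, 3600, 3800,
                     3900, 5200, np.inf, np.inf, np.inf])

solved = np.sort(runtimes[np.isfinite(runtimes)])
fraction = np.arange(1, solved.size + 1) / runtimes.size   # unsuccessful runs never add a step

plt.step(solved, fraction, where="post")
plt.xscale("log")
plt.ylim(0, 1)
plt.xlabel("number of function evaluations")
plt.ylabel("fraction of runs that reached the target")
plt.show()
```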
A Convergence Graph
First Hitting Time is Monotone
15 Runs (the target value is marked on the convergence graphs)
15 Runs ≤ 15 Runtime Data Points
Empirical CDF
The ECDF of run lengths to reach the target:
- has for each data point a vertical step of constant size
- displays for each x-value (budget) the count of observations to the left (first hitting times)
Empirical Cumulative Distribution
Empirical CDF
Interpretations possible:
- 80% of the runs reached the target
- e.g. 60% of the runs need between 2000 and 4000 evaluations
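The same readings can be obtained directly from the runtime data; a small sketch with the same illustrative numbers as above:

```python
import numpy as np

runtimes = np.array([1200, 1800, 2100, 2300, 2600, 2900, 3100, 3400, 3600, 3800,
                     3900, 5200, np.inf, np.inf, np.inf])   # illustrative data as above

print(np.isfinite(runtimes).mean())                     # fraction of runs that reached the target (0.8 here)
print(np.mean((runtimes > 2000) & (runtimes <= 4000)))  # fraction needing between 2000 and 4000 evaluations (0.6 here)
```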
Empirical Cumulative Distribution
Aggregation
- 15 runs
- 15 runs, 50 targets
- 15 runs, 50 targets: ECDF with 750 steps
Aggregation
We can aggregate over:
- different targets
- different functions and targets
We should not aggregate over dimension, as functions of different dimensions typically have very different runtimes.
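A sketch of the aggregation over targets described above, using hypothetical runtime data (a 15 runs × 50 targets array of first-hitting times); each (run, target) pair contributes at most one step to the aggregated ECDF.

```python
import numpy as np

n_runs, n_targets = 15, 50
rng = np.random.default_rng(1)

# Hypothetical data: runtimes[i, j] = evaluations run i needed to reach target j,
# np.inf if that target was never reached. Replace with measured data in practice.
runtimes = rng.lognormal(mean=7.0, sigma=1.0, size=(n_runs, n_targets))
runtimes[rng.random(runtimes.shape) < 0.1] = np.inf

pooled = runtimes.ravel()                               # 750 (run, target) pairs
solved = np.sort(pooled[np.isfinite(pooled)])
fraction = np.arange(1, solved.size + 1) / pooled.size  # aggregated ECDF: up to 750 steps
```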
ECDF aggregated over targets - single functions
ECDF for 3 different algorithms
ECDF aggregated over targets - single function
ECDF for a single algorithm, different dimensions
ERT/ART: Average Runtime
Which performance measure?
Expected Running Time (restart algorithm)
$$\mathrm{ERT} = \mathbb{E}[RT^r] = \frac{1 - p_s}{p_s}\,\mathbb{E}[RT_{\text{unsuccessful}}] + \mathbb{E}[RT_{\text{successful}}]$$
where $p_s$ is the probability of success of a single run.
Estimator for ERT
$$\hat{p}_s = \frac{\#\text{succ}}{\#\text{Runs}}$$
$$\widehat{RT}_{\text{unsucc}} = \text{average evaluations of unsuccessful runs}, \qquad \widehat{RT}_{\text{succ}} = \text{average evaluations of successful runs}$$
$$\mathrm{ART} = \frac{\#\text{Evals}}{\#\text{success}}$$
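A minimal sketch of this ART estimator, assuming unsuccessful runs consumed their full evaluation budget (if the actual evaluation counts of unsuccessful runs are recorded, those should be used instead):

```python
import numpy as np

def average_runtime(runtimes, max_evals):
    """ART = (total evaluations over all runs) / (number of successful runs).

    runtimes : first-hitting times, np.inf for unsuccessful runs.
    max_evals: budget per run; assumed here to be what unsuccessful runs used.
    """
    runtimes = np.asarray(runtimes, dtype=float)
    n_succ = np.isfinite(runtimes).sum()
    if n_succ == 0:
        return np.inf
    total_evals = runtimes[np.isfinite(runtimes)].sum() + np.isinf(runtimes).sum() * max_evals
    return total_evals / n_succ

runtimes = [1200, 1800, 2100, 2300, 2600, 2900, 3100, 3400, 3600, 3800,
            3900, 5200, np.inf, np.inf, np.inf]          # illustrative data as above
print(average_runtime(runtimes, max_evals=10**4))
```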
Example: scaling behavior of ART
On Test functions
Test functions
A function testbed (a set of test functions) should “reflect reality”: it should model the typical difficulties one is willing to solve.
Example: the BBOB testbed, implemented in the COCO platform (see the sketch below):
- the test functions are mainly non-convex and non-separable
- scalable with the search space dimension
- not too easy to solve, but yet comprehensible
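A minimal sketch of looping over the bbob suite with the `cocoex` Python module of the COCO platform; attribute names and installation details may vary between COCO versions, and the “algorithm” below is only a placeholder that evaluates a single point.

```python
import numpy as np
import cocoex  # Python module of the COCO platform

suite = cocoex.Suite("bbob", "", "")   # the 24 noiseless bbob functions, all instances and dimensions
for problem in suite:                  # each problem is a callable f: R^d -> R
    x0 = np.zeros(problem.dimension)   # placeholder "algorithm": evaluate one point only
    f_value = problem(x0)
    # a real experiment would run the optimizer here, within the problem's bounds
    # (problem.lower_bounds, problem.upper_bounds), and let COCO log the results
```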
Test functions (cont.)
If aggregating results over all functions from a testbed (through ECDF):
- one needs to be careful that some difficulties are not over-represented or that not too many easy functions are present
The bbob Testbed
- 24 functions in 5 groups
- 6 dimensions: 2, 3, 5, 10, 20 (40 optional)
BBOB testbed
Black-Box Optimization Benchmarking test suite: noiseless and noisy testbeds
http://coco.gforge.inria.fr/doku.php?id=start