2. Empirical analysis and comparisons of stochastic optimization algorithms



slide-1
SLIDE 1

CZECH TECHNICAL UNIVERSITY IN PRAGUE

Faculty of Electrical Engineering, Department of Cybernetics

2. Empirical analysis and comparisons of stochastic optimization algorithms

Petr Pošík

Substantial part of this material is based on slides provided with the book 'Stochastic Local Search: Foundations and Applications' by Holger H. Hoos and Thomas Stützle (Morgan Kaufmann, 2004). See www.sls-book.net for further information.

P. Pošík © 2014 A6M33SSL: Statistika a spolehlivost v lékařství – 1 / 30

slide-2
SLIDE 2

Contents

  • Motivation
  • Empirical Algorithm Comparison
  • Analysis based on runtime distribution
  • Summary

■ No-Free-Lunch Theorem
■ What is so hard about the comparison of stochastic methods?
■ Simple statistical comparisons
■ Comparisons based on running length distributions

slide-3
SLIDE 3

Motivation


slide-11
SLIDE 11

No-Free-Lunch Theorem

Motivation

  • No-Free-Lunch Theorem
  • Monte Carlo vs. Las Vegas Algorithms
  • Las Vegas algorithms
  • Runtime Behaviour for Decision Problems
  • Runtime Behaviour for Optimization Problems
  • Some Tweaks
  • Theoretical vs. Empirical Analysis of LVAs
  • Application Scenarios and Evaluation Criteria

Empirical Algorithm Comparison

Analysis based on runtime distribution

Summary

“There is no such thing as a free lunch.”

■ Refers to the nineteenth-century practice in American bars of offering a “free lunch” with drinks.
■ The meaning of the adage: It is impossible to get something for nothing.
■ If something appears to be free, there is always a cost to the person or to society as a whole, even though that cost may be hidden or distributed.

No-Free-Lunch theorem in search and optimization [WM97]

■ Informally, for discrete spaces: “Any two algorithms are equivalent when their performance is averaged across all possible problems.”
■ For a particular problem (or a particular class of problems), different search algorithms may obtain different results.
■ If an algorithm achieves superior results on some problems, it must pay with inferiority on other problems.

It makes sense to study which algorithms are suitable for which kinds of problems!

[WM97] D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Trans. on Evolutionary Computation, 1(1):67–82, 1997.
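The informal No-Free-Lunch statement can be checked by brute force on a tiny discrete search space. The sketch below is only an illustration and not part of the original slides: it assumes three search points, objective values in {0, 1}, and two hypothetical searchers that visit the points in different fixed orders. The theorem itself also covers adaptive algorithms, which this sketch does not model. The performance measure (best value seen after m evaluations, lower is better) is likewise an assumption.

```python
from itertools import product

def best_after(order, f, m):
    """Best (lowest) value seen after evaluating the first m points of a fixed visiting order."""
    return min(f[x] for x in order[:m])

def average_performance(order, values, m):
    """Average of best_after over every possible function from the search points to `values`."""
    n_points = len(order)
    # Each tuple is the value table of one objective function.
    functions = list(product(values, repeat=n_points))
    return sum(best_after(order, f, m) for f in functions) / len(functions)

# Two hypothetical searchers that differ only in the order in which they visit the points.
avg_a = average_performance((0, 1, 2), (0, 1), m=2)
avg_b = average_performance((2, 0, 1), (0, 1), m=2)
print(avg_a, avg_b)
```

Averaged over all eight functions the two visiting orders perform identically, although on any single function one of them may find the minimum sooner than the other.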

slide-16
SLIDE 16

Monte Carlo vs. Las Vegas Algorithms


EOA belong to the class of Monte Carlo or Las Vegas algorithms (LVAs):

■ Monte Carlo algorithm (MCA): It always stops and provides a solution, but the solution may not be correct. The solution quality is a random variable.
■ Las Vegas algorithm (LVA): It always produces a correct solution, but needs an a priori unknown time to find it. The running time is a random variable.
■ An LVA can be turned into an MCA by bounding the allowed running time.
■ An MCA can be turned into an LVA by restarting the algorithm from randomly chosen states.
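The two conversions can be sketched on a toy task: finding the index of a 1 in a list of bits. This is a minimal sketch under stated assumptions, not the slides' own example: the task, the function names, and the `max_steps` parameter are all hypothetical, and the restart wrapper relies on the answer being cheap to verify.

```python
import random

def las_vegas_find(bits, rng):
    """Probe random positions until a 1 is found: always correct, random runtime."""
    steps = 0
    while True:
        steps += 1
        i = rng.randrange(len(bits))
        if bits[i] == 1:
            return i, steps

def monte_carlo_find(bits, rng, max_steps):
    """The same search with a bounded runtime: always stops, but may return None."""
    for _ in range(max_steps):
        i = rng.randrange(len(bits))
        if bits[i] == 1:
            return i
    return None

def restarted_find(bits, rng, max_steps):
    """Restart the bounded search until it succeeds: correct again, random runtime."""
    restarts = 0
    while True:
        i = monte_carlo_find(bits, rng, max_steps)
        if i is not None:
            return i, restarts
        restarts += 1

bits = [0, 1, 0, 1]
rng = random.Random(0)  # fixed seed only to make the demonstration repeatable
index, steps = las_vegas_find(bits, rng)
bounded = monte_carlo_find(bits, rng, max_steps=3)
index_again, restarts = restarted_find(bits, rng, max_steps=1)
```

Here `bounded` is either a valid index or `None`, which is the Monte Carlo trade-off: a guaranteed stopping time in exchange for a possibly missing answer.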

slide-19
SLIDE 19

Las Vegas algorithms


Las Vegas algorithms:

■ An algorithm $A$ for a decision problem class $\Pi$ is a Las Vegas algorithm iff it has the following properties:
    ■ If $A$ terminates for a certain $\pi \in \Pi$ and returns a solution $s$, then $s$ is guaranteed to be a correct solution of $\pi$.
    ■ For any given instance $\pi \in \Pi$, the runtime of $A$ applied to $\pi$, $RT_{A,\pi}$, is a random variable.
■ An algorithm $A$ for an optimization problem class $\Pi$ is an optimization Las Vegas algorithm iff it has the following properties:
    ■ For any given instance $\pi \in \Pi$, the runtime of $A$ applied to $\pi$ needed to find a solution with a certain quality $q$, $RT_{A,\pi}(q)$, is a random variable.
    ■ For any given instance $\pi \in \Pi$, the solution quality achieved by $A$ applied to $\pi$ after a certain time $t$, $SQ_{A,\pi}(t)$, is a random variable.
■ LVAs are typically incomplete or at most asymptotically complete.
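To make the two random variables concrete, the following sketch draws samples of $RT_{A,\pi}(q)$ and $SQ_{A,\pi}(t)$ from repeated runs. Everything in it is an assumption chosen for illustration: the toy problem (minimize the number of zero bits in a bit string), the simple bit-flip hill climber standing in for $A$, and time measured in steps.

```python
import random

def run_trace(rng, n_bits=20, max_steps=2000):
    """One run of a bit-flip hill climber; returns the best-so-far quality after each step."""
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    quality = n_bits - sum(x)  # number of zero bits; 0 is optimal (lower is better)
    trace = [quality]
    for _ in range(max_steps):
        i = rng.randrange(n_bits)
        x[i] ^= 1  # flip one randomly chosen bit
        new_quality = n_bits - sum(x)
        if new_quality <= quality:
            quality = new_quality  # keep the improving move
        else:
            x[i] ^= 1  # undo the worsening move
        trace.append(quality)
    return trace

def runtime_to_quality(trace, q):
    """Sample of RT(q): first step at which quality q is reached, or None if never."""
    for t, value in enumerate(trace):
        if value <= q:
            return t
    return None

def quality_at_time(trace, t):
    """Sample of SQ(t): best-so-far quality after t steps."""
    return trace[min(t, len(trace) - 1)]

rng = random.Random(1)
traces = [run_trace(rng) for _ in range(10)]
rt_samples = [runtime_to_quality(trace, 0) for trace in traces]
sq_samples = [quality_at_time(trace, 30) for trace in traces]
```

Each entry of `rt_samples` and `sq_samples` comes from a different run, so the two lists are empirical samples of the distributions that the definitions refer to.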

slide-23
SLIDE 23

Runtime Behaviour for Decision Problems


Definitions:

$A$ is an algorithm for a class $\Pi$ of decision problems.

$P_s(RT_{A,\pi} \le t)$ is the probability that $A$ finds a solution for a problem instance $\pi \in \Pi$ in time less than or equal to $t$.

Complete algorithm $A$ can provably solve any solvable decision problem instance $\pi \in \Pi$ after a finite time, i.e. $A$ is complete if and only if

$$\forall \pi \in \Pi, \exists t_{\max} : P_s(RT_{A,\pi} \le t_{\max}) = 1. \tag{1}$$

Asymptotically complete algorithm $A$ can solve any solvable problem instance $\pi \in \Pi$ with arbitrarily high probability when allowed to run long enough, i.e. $A$ is asymptotically complete if and only if

$$\forall \pi \in \Pi : \lim_{t \to \infty} P_s(RT_{A,\pi} \le t) = 1. \tag{2}$$

Incomplete algorithm $A$ cannot be guaranteed to find the solution even if allowed to run indefinitely long, i.e. it is not asymptotically complete; $A$ is incomplete if and only if

$$\exists \text{ solvable } \pi \in \Pi : \lim_{t \to \infty} P_s(RT_{A,\pi} \le t) < 1. \tag{3}$$
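A small worked example, added here as an illustration and not taken from the slides, may help separate the three notions. Assume an algorithm that samples candidate solutions independently and hits a solution with a fixed probability $p \in (0, 1)$ in every step; both the independence and the constant $p$ are assumptions. Then

$$P_s(RT_{A,\pi} \le t) = 1 - (1 - p)^t .$$

For every finite $t$ this value is strictly below 1, so no $t_{\max}$ satisfies condition (1), yet

$$\lim_{t \to \infty} \left( 1 - (1 - p)^t \right) = 1 ,$$

so condition (2) holds: the algorithm is asymptotically complete but not complete. If instead a search gets trapped forever with some probability $c > 0$ on a solvable instance, then $\lim_{t \to \infty} P_s(RT_{A,\pi} \le t) \le 1 - c < 1$, which is condition (3).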

slide-27
SLIDE 27

Runtime Behaviour for Optimization Problems


Simple generalization based on transforming the optimization problem into a related decision problem by setting the solution quality bound to $q = r \cdot q^*(\pi)$:

$A$ is an algorithm for a class $\Pi$ of optimization problems.

$P_s(RT_{A,\pi} \le t, SQ_{A,\pi} \le q)$ is the probability that $A$ finds a solution of quality better than or equal to $q$ for a solvable problem instance $\pi \in \Pi$ in time less than or equal to $t$.

$q^*(\pi)$ is the quality of the optimal solution to problem $\pi$.

$r \ge 1$, $q > 0$.

Algorithm $A$ is $r$-complete if and only if

$$\forall \pi \in \Pi, \exists t_{\max} : P_s(RT_{A,\pi} \le t_{\max}, SQ_{A,\pi} \le r \cdot q^*(\pi)) = 1. \tag{4}$$

Algorithm $A$ is asymptotically $r$-complete if and only if

$$\forall \pi \in \Pi : \lim_{t \to \infty} P_s(RT_{A,\pi} \le t, SQ_{A,\pi} \le r \cdot q^*(\pi)) = 1. \tag{5}$$

Algorithm $A$ is $r$-incomplete if and only if

$$\exists \text{ solvable } \pi \in \Pi : \lim_{t \to \infty} P_s(RT_{A,\pi} \le t, SQ_{A,\pi} \le r \cdot q^*(\pi)) < 1. \tag{6}$$
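In practice the joint probability in (4) to (6) is usually not known and can only be estimated from a finite number of runs. The sketch below shows one way such an estimate might be computed. It assumes minimization, time measured in steps, and that each run is stored as a list of best-so-far qualities; the four traces are made-up numbers, not experimental results.

```python
def solved_within(trace, t, bound):
    """True if the best-so-far quality reaches `bound` at or before step t (minimization)."""
    return any(value <= bound for value in trace[: t + 1])

def empirical_success_probability(traces, t, r, q_opt):
    """Estimate P_s(RT <= t, SQ <= r * q_opt) as the fraction of successful runs."""
    bound = r * q_opt
    hits = sum(solved_within(trace, t, bound) for trace in traces)
    return hits / len(traces)

# Hypothetical best-so-far traces of four independent runs; the optimum quality is assumed to be 4.0.
traces = [
    [9.0, 6.0, 4.0, 4.0],
    [8.0, 8.0, 5.0, 5.0],
    [7.0, 4.5, 4.5, 4.2],
    [9.5, 9.0, 9.0, 9.0],
]
p_exact = empirical_success_probability(traces, t=3, r=1.0, q_opt=4.0)
p_relaxed = empirical_success_probability(traces, t=3, r=1.25, q_opt=4.0)
```

Relaxing the bound from `r=1.0` to `r=1.25` raises the estimate from 0.25 to 0.75 on this data, which is why an algorithm can look incomplete for the optimum while still behaving well for a slightly weaker quality target.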

slide-28
SLIDE 28

Some Tweaks


■ Incompleteness of many LVAs is typically caused by their inability to escape from attractive local minima regions of the search space.
    ■ Remedy: use diversification mechanisms such as random restart, random walk, tabu, ...
    ■ In many cases, these can render algorithms provably asymptotically complete, but effectiveness in practice can vary widely.
■ Completeness can be achieved by restarting an incomplete method from a solution generated by a complete (exhaustive) algorithm.
    ■ Typically very ineffective due to the large size of the search space.
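Random restart, the first diversification mechanism listed above, can be sketched as follows. The one-dimensional cost landscape, the greedy descent, and the `max_restarts` cap are all hypothetical choices made for this illustration. A plain descent gets stuck in whichever local minimum is nearest, while restarting from random positions eventually lands in the basin of the global minimum.

```python
import random

# Hypothetical cost landscape with several local minima; the global minimum 0 is at index 9.
LANDSCAPE = [5, 3, 4, 6, 2, 1, 2, 7, 4, 0, 3]

def hill_climb(start):
    """Greedy descent to the nearest local minimum; incomplete on its own."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if 0 <= n < len(LANDSCAPE)]
        best = min(neighbours, key=lambda n: LANDSCAPE[n])
        if LANDSCAPE[best] >= LANDSCAPE[x]:
            return x  # no neighbour improves: local minimum reached
        x = best

def restart_hill_climb(rng, target=0, max_restarts=1000):
    """Restart the descent from random positions until the target cost is reached."""
    for restarts in range(max_restarts):
        x = hill_climb(rng.randrange(len(LANDSCAPE)))
        if LANDSCAPE[x] <= target:
            return x, restarts
    return None, max_restarts

stuck_at = hill_climb(0)
found, restarts_used = restart_hill_climb(random.Random(0))
```

Starting at index 0 the descent stops in the local minimum at index 1, whereas the restarted version reaches index 9. How many restarts are needed depends on the size of the global basin, which matches the remark that practical effectiveness can vary widely.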

slide-33
SLIDE 33

Theoretical vs. Empirical Analysis of LVAs


■ Practically relevant Las Vegas algorithms are typically difficult to analyse theoretically. (Algorithms are often non-deterministic.)
■ Cases in which theoretical results are available are often of limited practical relevance, because they
    ■ rely on idealised assumptions that do not apply to practical situations,
    ■ apply to worst-case or highly idealised average-case behaviour only, or
    ■ capture only asymptotic behaviour and do not reflect actual behaviour with sufficient accuracy.

Therefore, analyse the behaviour of LVAs using empirical methodology, ideally based on the scientific method:

■ make observations
■ formulate hypothesis/hypotheses (model)
■ While not satisfied with model (and deadline not exceeded):
    1. design computational experiment to test model
    2. conduct computational experiment
    3. analyse experimental results
    4. revise model based on results
slide-34
SLIDE 34

Application Scenarios and Evaluation Criteria

Type 1: Hard time limit t_max for finding a solution; solutions found later are useless (real-time environments with strict deadlines, e.g., dynamic task scheduling or on-line robot control).

⇒ Evaluation criterion:

  • dec. prob.: solution probability at time t_max, Ps(RT ≤ t_max)
  • opt. prob.: expected quality of the solution found at time t_max, E(SQ(t_max))

[Figure: objective function vs. time, with avg(f_tmax) and var(f_tmax) marked at time t_max.]

Possible problem: What does “The expected solution quality of algorithm A is 2 times better than for algorithm B” actually mean?
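
A minimal sketch of how the two Type 1 criteria could be estimated from N independent runs. The data layout is an assumption made for illustration only: for a decision problem each run is summarised by the time at which it succeeded (or `None`), and for an optimization problem by the solution quality it holds at t_max.

```python
from statistics import mean, variance

def solution_probability(runtimes, t_max):
    """Estimate Ps(RT <= t_max) as the fraction of runs that succeeded in time.

    `runtimes` holds one entry per run: the time of success, or None if the
    run never found a solution (an assumed encoding, not from the slides).
    """
    successes = sum(1 for rt in runtimes if rt is not None and rt <= t_max)
    return successes / len(runtimes)

def quality_at_deadline(qualities):
    """Return (avg, var) of the solution qualities recorded at t_max."""
    return mean(qualities), variance(qualities)

# Hypothetical data: five runs of a decision-problem solver ...
runtimes = [12.0, None, 48.0, 30.0, 75.0]
print(solution_probability(runtimes, t_max=50.0))  # 3 of 5 runs succeed in time

# ... and five runs of an optimizer, objective value held at t_max.
qualities = [0.8, 1.1, 0.9, 1.4, 0.8]
print(quality_at_deadline(qualities))
```

Both estimates are themselves random quantities; they become more reliable as the number of runs grows.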

slide-35
SLIDE 35

Application Scenarios and Evaluation Criteria (cont.)

Type 2: No time limits given, algorithm can be run until a solution is found (off-line computations, non-realtime environments, e.g., configuration of production facility).

⇒ Evaluation criterion:

  • dec. prob.: expected runtime to solve a problem
  • opt. prob.: expected runtime to reach a solution of certain quality

[Figure: objective function vs. time, with avg(t_ftarget) and var(t_ftarget) marked at the level f_target.]

Is there any problem with “The expected runtime of algorithm A is 2 times larger than for algorithm B”?
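
As an illustration of the Type 2 criterion, the sketch below reads the time to reach f_target off each run and then averages it. The trace format (a list of `(time, best objective value)` pairs per run) and the minimization convention are assumptions, not something the slides prescribe.

```python
from statistics import mean, variance

def time_to_target(trace, f_target):
    """Return the first time at which the run reached f_target, else None.

    `trace` is a list of (time, best_objective_value) pairs in increasing
    time order; minimization is assumed.
    """
    for t, f in trace:
        if f <= f_target:
            return t
    return None

# Hypothetical traces of three runs (time in function evaluations).
traces = [
    [(10, 5.0), (40, 1.2), (90, 0.4)],
    [(15, 3.0), (60, 0.9), (200, 0.1)],
    [(5, 4.0), (30, 2.0), (120, 0.5)],
]
f_target = 1.0
hits = [time_to_target(tr, f_target) for tr in traces]
print(hits)

# The plain average is only defined when every run reached the target.
if all(h is not None for h in hits):
    print(mean(hits), variance(hits))
```

If some runs never reach the target, the average runtime is undefined, which is one reason a comparison of expected runtimes can be problematic.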

slide-38
SLIDE 38

Application Scenarios and Evaluation Criteria (cont.)

Type 3: Utility of solutions depends in more complex ways on the time required to find them; characterised by a utility function U:

  • dec. prob.: U : R+ → [0, 1], where U(t) = utility of solution found at time t
  • opt. prob.: U : R+ × R+ → [0, 1], where U(t, q) = utility of solution with quality q found at time t

Example: The direct benefit of a solution is invariant over time, but the cost of computing time diminishes the final payoff according to U(t) = max{u_0 − c · t, 0} (constant discounting).

⇒ Evaluation criterion: utility-weighted solution probability

  • dec. prob.: U(t) · Ps(RT ≤ t), or
  • opt. prob.: U(t, q) · Ps(RT ≤ t, SQ ≤ q)

This requires detailed knowledge of Ps(...) for arbitrary t (and arbitrary q).
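
The utility-weighted solution probability for a decision problem can be sketched as follows, using the constant-discounting utility from the example. The parameter values `u0 = 1.0` and `c = 0.01` and the run data are hypothetical, chosen only so that the numbers are easy to follow.

```python
def utility(t, u0=1.0, c=0.01):
    """Constant discounting from the example: U(t) = max(u0 - c*t, 0)."""
    return max(u0 - c * t, 0.0)

def solution_probability(runtimes, t):
    """Empirical Ps(RT <= t); None marks a run that never succeeded."""
    return sum(1 for rt in runtimes if rt is not None and rt <= t) / len(runtimes)

def utility_weighted_probability(runtimes, t):
    """U(t) * Ps(RT <= t) for a decision problem."""
    return utility(t) * solution_probability(runtimes, t)

runtimes = [12.0, None, 48.0, 30.0, 75.0]  # hypothetical run data
for t in (20.0, 50.0, 80.0):
    print(t, utility_weighted_probability(runtimes, t))
```

With this data the criterion is largest at the intermediate time: waiting longer raises the solution probability, but the falling utility outweighs the gain.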

slide-39
SLIDE 39

Empirical Algorithm Comparison

slide-40
SLIDE 40

CPU Runtime vs Operation Counts

Remark: Is it better to measure the time in seconds or e.g. in function evaluations?

■ Results of experiments should be comparable.
■ Wall-clock time depends on the machine configuration, the programming language, and the operating system used to run the experiments.
■ Since evaluating the objective function is often the most time-consuming operation in the optimization cycle, many authors use the number of objective function evaluations as the primary measure of “time”.
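
One simple way to count objective function evaluations is to wrap the objective in a counting object. This is only a sketch; the class name `CountingObjective` and the toy `sphere` objective are made up for illustration.

```python
class CountingObjective:
    """Wrap an objective function and count how often it is evaluated."""

    def __init__(self, func):
        self.func = func
        self.evaluations = 0

    def __call__(self, x):
        self.evaluations += 1
        return self.func(x)

def sphere(x):
    """Toy objective (an assumption for illustration): sum of squares."""
    return sum(xi * xi for xi in x)

f = CountingObjective(sphere)
candidates = [[1.0, 2.0], [0.5, 0.5], [0.0, 0.1]]
best = min(candidates, key=f)   # every candidate costs one evaluation
print(best, f.evaluations)      # [0.0, 0.1] 3
```

Because the counter lives in the wrapper, the optimizer itself needs no changes, and the measured "time" does not depend on the machine.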

slide-44
SLIDE 44

Scenario 1: Limited time

■ Let the algorithms run for a certain time t_max and compare the average quality of the returned solution, ave(SQ).

[Figure: objective function vs. time for a blue and a red algorithm, with the time limits t_max,1 and t_max,2 marked.]

■ For t_max,1, the blue algorithm is better than the red one.
■ For t_max,2, the blue algorithm is worse than the red one.
■ WARNING! The picture can change when t_max changes!
■ Can our claims be false? What is the probability that our claims are wrong?

slide-45
SLIDE 45

Student’s t-test

Independent two-sample t-test:

■ Statistical method used to test whether the means of 2 normally distributed populations are equal.
■ The larger the difference between the sample means, the stronger the evidence that the population means differ.
■ The lower the variance inside the populations, the stronger the evidence that the population means differ.
■ For details, see e.g. [Luk09, sec. 11.1.2].
■ Implemented in most mathematical and statistical software, e.g. in MATLAB.
■ Can be easily implemented in any language.
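
A minimal sketch of the test statistic, assuming the classical pooled-variance (equal-variance) form of Student's test. It computes only t and the degrees of freedom; turning t into a p-value needs the t distribution, which the Python standard library does not provide (a statistics package such as SciPy, with `scipy.stats.ttest_ind`, does). The sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(a, b):
    """Return (t, degrees_of_freedom) of Student's pooled-variance t-test."""
    na, nb = len(a), len(b)
    # Pooled estimate of the common population variance.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical final solution qualities of two algorithms (5 runs each).
quality_a = [1.0, 2.0, 3.0, 4.0, 5.0]
quality_b = [3.0, 4.0, 5.0, 6.0, 7.0]
t, df = two_sample_t(quality_a, quality_b)
print(t, df)
```

The formula shows both effects named above: |t| grows with the difference between the sample means and shrinks as the within-sample variance grows.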

Assumptions:

■ Both populations should have normal distribution.
■ Almost never fulfilled with the populations of solution qualities.

Remedy: a non-parametric test!

[Luk09] Sean Luke. Essentials of Metaheuristics. 2009. Available at http://cs.gmu.edu/~sean/book/metaheuristics/.

slide-46
SLIDE 46

Mann-Whitney-Wilcoxon rank-sum test

Non-parametric test assessing whether two independent samples of observations have equally large values.

■ Virtually identical to:
  ■ combine both samples (for each observation, remember its original group),
  ■ sort the values,
  ■ replace the values by ranks,
  ■ use the ranks with the ordinary parametric two-sample t-test.
■ The measurements must be at least ordinal:
  ■ We must be able to sort them.
  ■ This allows us to merge results from runs which reached the target level with the results of runs which did not.
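
The ranking steps above can be sketched as follows. Note that instead of feeding the ranks to a t-test, this sketch computes the usual U statistic from the rank sum and a two-sided p-value from the normal approximation without tie correction; that choice is an assumption made to stay within the standard library, and it is rough for small samples. The sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1..j+1
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value by normal approximation)."""
    na, nb = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))   # combine, sort, rank
    u_a = sum(ranks[:na]) - na * (na + 1) / 2  # rank sum of a, shifted
    mu = na * nb / 2
    sigma = sqrt(na * nb * (na + nb + 1) / 12)
    z = (u_a - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p

a = [1.1, 2.3, 0.7, 1.9, 1.4]
b = [2.8, 3.5, 2.1, 4.0, 3.1]
print(mann_whitney_u(a, b))
```

Only the order of the observations enters the computation, which is why ordinal measurements are sufficient.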

slide-50
SLIDE 50

Scenario 2: Prescribed target level

■ Let the algorithms run until they find a solution of certain quality f_target and compare the average runtime, ave(RT).

[Figure: objective function vs. time for a blue and a red algorithm, with the target levels f_target,1 and f_target,2 marked.]

■ For f_target,1, the blue algorithm is better than the red one.
■ For f_target,2, the blue algorithm still seems to be better than the red one (if it finds the solution, it finds it faster), but 2 blue runs did not reach the target level yet, i.e. we are much less sure that blue is better.
■ WARNING! The picture can change when f_target changes!
■ The same statistical tests as for scenario 1 can be used.

slide-53
SLIDE 53

Scenarios 1 and 2 combined

■ Let the algorithms run until they find a solution of certain quality f_target or until they use all the allowed time t_max.

[Figure: objective function vs. time, with the target level f_target and the time limit t_max marked.]

RT is measured in seconds or function evaluations, SQ is measured in something different; now, how can we test if one algorithm is better than the other?

■ The situation when the algorithm reaches f_target is better than when it reaches t_max. We can still sort the values.
■ We can use the Mann-Whitney U-test.
■ WARNING! Again, if we change f_target and/or t_max, the picture can change!
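
The ordering of mixed results can be sketched with a sort key: every run that reached f_target ranks ahead of every run that hit t_max; successful runs are ordered by runtime, unsuccessful ones by final quality. The tuple layout `(reached_target, runtime, final_quality)`, the minimization convention, and the data are assumptions for illustration.

```python
def run_key(run):
    """Sort key: successful runs (by runtime) before unsuccessful ones (by quality).

    `run` is a hypothetical (reached_target, runtime, final_quality) tuple;
    minimization is assumed, so a lower final quality value is better.
    """
    reached, runtime, quality = run
    return (0, runtime) if reached else (1, quality)

runs_blue = [(True, 120, 0.0), (True, 300, 0.0), (False, 1000, 0.7)]
runs_red = [(True, 450, 0.0), (False, 1000, 0.2), (False, 1000, 1.5)]

labelled = [("blue", r) for r in runs_blue] + [("red", r) for r in runs_red]
labelled.sort(key=lambda item: run_key(item[1]))
ranking = [name for name, _ in labelled]
print(ranking)  # best run first

# Rank sum of the blue algorithm (rank 1 = best); no ties in this example.
rank_sum_blue = sum(i + 1 for i, name in enumerate(ranking) if name == "blue")
print(rank_sum_blue)
```

Once every run has a rank, a rank-based test needs nothing else, so runtimes and solution qualities never have to be expressed in a common unit.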

slide-54
SLIDE 54

Analysis based on runtime distribution

slide-55
SLIDE 55

Runtime distributions

LVAs are often designed and evaluated without a priori knowledge of the application scenario:

■ Assume the most general scenario: type 3 with a utility function (which is often, however, unknown as well).
■ Evaluate based on solution probabilities Ps(RT ≤ t, SQ ≤ q) for arbitrary runtimes t and solution qualities q.

Study distributions of random variables characterising runtime and solution quality of an algorithm for the given problem instance.

slide-59
SLIDE 59

RTD definition

Given a Las Vegas alg. A for optimization problem Π:

■ The success probability Ps(RT_A,π ≤ t, SQ_A,π ≤ q) is the probability that A finds a solution of quality ≤ q in time ≤ t for a solvable instance π ∈ Π.
■ The run-time distribution (RTD) of A on π is the probability distribution of the bivariate random variable (RT_A,π, SQ_A,π).
■ The runtime distribution function rtd : R+ × R+ → [0, 1], defined as rtd(t, q) = Ps(RT_A,π ≤ t, SQ_A,π ≤ q), completely characterises the RTD of A on π.

slide-60
SLIDE 60

RTD cross-sections

We can study the RTD using cross-sections:

slide-61
SLIDE 61

RTD cross-sections (cont.)

We can study the RTD using cross-sections:

Horizontal cross-sections reveal the dependence of SQ on RT:

■ The lines represent various quantiles; e.g. for the 75%-quantile we can expect that 75% of runs will return a better combination of SQ and RT.

slide-62
SLIDE 62

Empirical measurement of RTDs

Empirical estimation of Ps(RT ≤ t, SQ ≤ q):

■ Perform N independent runs of A on problem π.
■ For the nth run, n ∈ {1, ..., N}, store the so-called solution quality trace, i.e. t_n,i and q_n,i each time the quality is improved.

Ps(t, q) = n_S(t, q) / N,

where n_S(t, q) is the number of runs which provided at least one solution with t_i ≤ t and q_i ≤ q.

Empirical RTDs are approximations of an algorithm’s true RTD:

■ The larger the N, the better the approximation.
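
The estimate n_S(t, q) / N can be computed directly from the stored traces. In this sketch each trace is assumed to be a list of `(t_i, q_i)` pairs recorded whenever the best quality improved, with lower q meaning better quality; the data are hypothetical.

```python
def empirical_rtd(traces, t, q):
    """Estimate Ps(RT <= t, SQ <= q) = n_S(t, q) / N from quality traces.

    `traces[n]` is the trace of run n: a list of (t_i, q_i) pairs recorded
    each time the best quality improved (minimization assumed).
    """
    n_success = sum(
        1 for trace in traces
        if any(t_i <= t and q_i <= q for t_i, q_i in trace)
    )
    return n_success / len(traces)

traces = [
    [(10, 5.0), (40, 1.2), (90, 0.4)],
    [(15, 3.0), (60, 0.9), (200, 0.1)],
    [(5, 4.0), (30, 2.0), (120, 0.5)],
    [(20, 6.0), (80, 2.5)],
]
print(empirical_rtd(traces, t=100, q=1.0))  # 2 of 4 runs qualify
print(empirical_rtd(traces, t=100, q=3.0))
```

Storing the whole trace, rather than only the final result, is what allows the estimate to be evaluated for any (t, q) after the experiment.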

slide-66
SLIDE 66

RTD based algorithm comparisons

E.g. type 2 application scenario: set f_target and compare RTDs of the algorithms . . . and add another f_target level . . .

[Figure: objective function vs. time with the target levels f_target,1 and f_target,2, and the resulting RTD plot of Ps (0.0 to 1.0) vs. time.]

This way we can aggregate RTDs of an algorithm A not only

■ over various f_target levels, but also
■ over different problems π ∈ Π (!!!), of course with certain loss of information.
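
One possible way to aggregate over several target levels is to treat every (run, target) pair as a separate trial and report the fraction of pairs solved within time t. This is a sketch of that idea under assumed data structures (traces as `(time, best value)` pairs, minimization); it is not necessarily the exact procedure behind the figures.

```python
def time_to_target(trace, f_target):
    """First time the trace reaches f_target (minimization), else None."""
    for t, f in trace:
        if f <= f_target:
            return t
    return None

def aggregated_rtd(traces, targets, t):
    """Fraction of (run, target) pairs solved within time t."""
    hits = [time_to_target(tr, ft) for tr in traces for ft in targets]
    solved = sum(1 for h in hits if h is not None and h <= t)
    return solved / len(hits)

traces = [
    [(10, 5.0), (40, 1.2), (90, 0.4)],
    [(15, 3.0), (60, 0.9), (200, 0.1)],
]
targets = [1.0, 0.1]  # hypothetical f_target,1 and f_target,2
for t in (50, 100, 250):
    print(t, aggregated_rtd(traces, targets, t))
```

The same pooling extends to several problems by adding their traces to the list; the aggregated curve then no longer shows which target or problem caused a failure, which is the loss of information mentioned above.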

slide-67
SLIDE 67

Example of comparison

Workshop on black-box optimization benchmarking (BBOB) at GECCO conference:

[Figure: six panels, each plotting the proportion of functions (0.0 to 1.0) against log10 of (ERT / dimension) (1 to 8) for DIRECT, MCS, GLOBAL, NEWUOA, BIPOP-CMA-ES and best 2009. Panels: all (f1-24); unimodal, low cond. (f6-9); unimodal, high cond. (f10-14); separable (f1-5); multimodal, structured (f15-19); multimodal, weak structure (f20-24).]

slide-69
SLIDE 69

Summary

■ No-free-lunch: all algorithms behave equally on average (over all possible problems).
■ Comparison of optimization algorithms
  ■ makes sense only on a well-defined class of problems,
  ■ is not easy since the chosen measures of algorithm quality are often random variables,
  ■ is often inconclusive unless the application scenario (utility function) is known.
■ The most common scenario is
  ■ fix available runtime t_max,
  ■ perform several runs and measure the solution quality at the end of each,
  ■ compare the algorithms based on median (or average) solution quality returned, and
  ■ assess statistical significance of the difference using the Mann-Whitney U test.
■ All measures for comparison can be derived from rtd(t, q).