Bayesian Analysis for Algorithm Performance Comparison



slide-1
SLIDE 1

Josu Ceberio

Bayesian Analysis for Algorithm Performance Comparison

Is it possible to compare optimization algorithms without hypothesis testing?

slide-2
SLIDE 2

Is there a reproducibility crisis?

Source: Monya Baker (2016). Is there a reproducibility crisis? Nature, 533, 452-454.

slide-3
SLIDE 3

Hypothesis

Idea for solving a set of problems more efficiently.

Questions

Is my algorithm better than the state of the art? On which problems is my algorithm better? Why is my algorithm better (or worse)?

Experimentation

Compare the performance of my algorithm with the state of the art on some benchmark of problems. The analysis of the results should take into account the associated uncertainty.

Conclusions

What conclusions do we draw from the experimentation? How do we answer the formulated questions?

Is there a reproducibility crisis?

slide-4
SLIDE 4

The Questions

How likely is my proposal to be the best algorithm to solve a problem? How likely is my proposal to be the best algorithm among the compared ones?
slide-5
SLIDE 5

The Point

STATISTICAL ANALYSIS OF EXPERIMENTAL RESULTS

NULL HYPOTHESIS STATISTICAL TESTING

WHAT NHST COMPUTES

$$p(t(x) > \tau \mid H_0)$$

[Figure: Unknown Behaviour and Observed Sample]

slide-6
SLIDE 6

The controversy with NHST

slide-7
SLIDE 7

The controversy with NHST

We assume the null hypothesis: the average performance of the compared methods is the same. Then the observed difference is computed from the data, and the probability of observing such a difference (or a bigger one) is estimated: the p-value.

The p-value is commonly taken to refer to the probability of erroneously assuming that there are differences when actually there are none. It is also commonly used as a measure of the magnitude of the difference, as it decreases when the difference increases, although it also depends on the sample size.

WHAT NHST COMPUTES

$$p(t(x) > \tau \mid H_0)$$

$$1 - p(t(x) > \tau \mid H_0) = p(t(x) < \tau \mid H_0)$$

WHAT WE WOULD LIKE TO KNOW

$$p(H_0 \mid x)$$

$$1 - p(H_0 \mid x) = p(H_1 \mid x)$$
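
To make the quantity p(t(x) > τ | H0) concrete, here is a minimal Python sketch of an exact sign-flip permutation test on paired results. The function name, the choice of the sum of differences as the statistic t, and the toy data are illustrative assumptions and do not come from the talk.

```python
from itertools import product


def sign_flip_p_value(differences):
    """One-sided p-value: P(t >= t_observed | H0) under random sign flips.

    H0 (assumed here): both algorithms perform the same, so each paired
    difference is equally likely to be positive or negative.
    """
    observed = sum(differences)
    magnitudes = [abs(d) for d in differences]
    hits = 0
    total = 0
    # Enumerate every sign assignment; feasible only for small samples.
    for signs in product((1, -1), repeat=len(magnitudes)):
        statistic = sum(s * m for s, m in zip(signs, magnitudes))
        hits += statistic >= observed
        total += 1
    return hits / total


# Hypothetical paired differences (algorithm B minus algorithm A) on 5 problems.
p = sign_flip_p_value([1, 2, 3, 4, 5])
print(p)  # 0.03125: only 1 of the 32 sign patterns reaches the observed sum
```

The number returned is a statement about the data under H0. It is not the probability that H0 holds given the data, which is the quantity the slide says we would like to know.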
slide-8
SLIDE 8

The Point

[Figure: Unknown Behaviour and Observed Sample]

Many alternatives to handle uncertainty associated with empirical results:

Statistical Analysis Handbook: A Comprehensive Handbook of Statistical Concepts, Techniques and Software Tools. Dr Michael J de Smith.

slide-9
SLIDE 9

The Point

STATISTICAL ANALYSIS OF EXPERIMENTAL RESULTS

NULL HYPOTHESIS STATISTICAL TESTING

WHAT NHST COMPUTES

$$p(t(x) > \tau \mid H_0)$$

BAYESIAN STATISTICAL ANALYSIS

[Figure: Unknown Behaviour and Observed Sample]

slide-10
SLIDE 10

The Bayesian Approach

The method focuses on estimating relevant information about the underlying parametric performance distribution, which is represented by a set of parameters θ. It assesses the distribution of θ conditioned on a sample s drawn from the performance distribution. Instead of using a single probability distribution to model the underlying performance, Bayesian statistics considers all possible distributions and assigns a probability to each.

$$P(\theta \mid s) \propto P(s \mid \theta) \, P(\theta)$$

Here P(θ|s) is the posterior distribution of the parameters, P(s|θ) is the likelihood function, and P(θ) is the prior distribution of the parameters.
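
As a minimal illustration of P(θ|s) ∝ P(s|θ)P(θ), the Python sketch below approximates a posterior on a grid. The setting is an assumption made only for illustration: θ is the probability that one algorithm beats another on a random problem, the sample s is a count of wins, and the prior is uniform. The function name and the numbers are hypothetical.

```python
def grid_posterior(wins, trials, grid_size=101):
    """Posterior over theta on an evenly spaced grid in [0, 1]."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 for _ in thetas]  # uniform prior P(theta)
    # Likelihood P(s | theta) of observing `wins` successes in `trials`.
    likelihood = [t ** wins * (1 - t) ** (trials - wins) for t in thetas]
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnormalized)
    # Normalize so that the grid probabilities sum to 1.
    return thetas, [u / total for u in unnormalized]


# Hypothetical sample: the first algorithm wins 8 of 11 comparisons.
thetas, posterior = grid_posterior(wins=8, trials=11)

# Posterior probability that the first algorithm wins more often than not.
prob_better = sum(p for t, p in zip(thetas, posterior) if t > 0.5)
print(round(prob_better, 3))
```

Unlike a p-value, `prob_better` is a direct probability statement about the parameter given the observed sample.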

HOW DO WE COMPARE MULTIPLE ALGORITHMS?

slide-11
SLIDE 11

From Results to Rankings

Minimizing a given instance of a problem
Minimizing some instances of a problem

Observed sample (objective values, lower is better):

Algorithm    f1    f2    f3    f4    f5
GA          100   130    37   566   256
PSO          90    80   352   756   125
ILP         135   135    19   101    89
SA          105    30   100    56   369
GP           95   300    10    57    36
...         ...   ...   ...   ...   ...

Rankings (permutations). Each σk lists the ranks of GA, PSO, ILP, SA and GP on fk:

σ1 = 3 1 5 4 2 ...
σ2 = 3 2 4 1 5 ...
σ3 = 3 5 2 4 1 ...
σ4 = 4 5 3 1 2 ...
σ5 = 4 3 2 5 1 ...
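
The step from results to rankings can be sketched in a few lines of Python. The values are the ones shown on the slide. The function name `rank_results` is hypothetical, and the sketch assumes minimization and no ties.

```python
def rank_results(results):
    """Map {algorithm: objective value} to {algorithm: rank}, rank 1 = smallest.

    Assumes minimization and no ties (ties would need an explicit policy).
    """
    ordered = sorted(results, key=results.get)
    return {name: position + 1 for position, name in enumerate(ordered)}


algorithms = ["GA", "PSO", "ILP", "SA", "GP"]
observed = {
    "f1": [100, 90, 135, 105, 95],
    "f2": [130, 80, 135, 30, 300],
    "f3": [37, 352, 19, 100, 10],
    "f4": [566, 756, 101, 56, 57],
    "f5": [256, 125, 89, 369, 36],
}

rankings = {}
for function, values in observed.items():
    ranks = rank_results(dict(zip(algorithms, values)))
    # Keep the ranks in the fixed algorithm order GA, PSO, ILP, SA, GP.
    rankings[function] = [ranks[name] for name in algorithms]

print(rankings["f1"])  # [3, 1, 5, 4, 2]
```

Each list reproduces the corresponding permutation σ1 to σ5 from the slide.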

slide-12
SLIDE 12
Plackett-Luce Model

  • Each algorithm in the comparison has an associated weight.
  • The weights sum to 1.
  • The weight associated with an algorithm represents its probability of appearing at the first rank.

$$P(\sigma) = \prod_{i=1}^{n} \left( \frac{w_{\sigma_i}}{\sum_{j=i}^{n} w_{\sigma_j}} \right)$$
slide-13
SLIDE 13

Plackett-Luce Model

w1 = 0.3, w2 = 0.03, w3 = 0.17, w4 = 0.6

[Figure: items #1 to #4 with their weights; item #4 is drawn first]
slide-14
SLIDE 14

Plackett-Luce Model

w1 = 0.3, w2 = 0.03, w3 = 0.17

[Figure: remaining items #1, #2 and #3; item #1 is drawn second]

slide-15
SLIDE 15

Plackett-Luce Model

w2 = 0.03, w3 = 0.17

[Figure: remaining items #2 and #3 are drawn in the order #3, #2]

$$P(\sigma) = \frac{w_4}{w_1 + w_2 + w_3 + w_4} \cdot \frac{w_1}{w_1 + w_2 + w_3} \cdot \frac{w_3}{w_2 + w_3} \cdot \frac{w_2}{w_2}$$
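
The sequential draw above can be sketched in Python as follows. The function names are hypothetical. The weights are taken as transcribed from the slides; they add up to 1.10 rather than 1, which does not affect the computation because the Plackett-Luce probability is unchanged when all weights are rescaled by the same factor.

```python
import random


def plackett_luce_probability(ordering, weights):
    """Probability of an ordering (best first) under Plackett-Luce weights."""
    probability = 1.0
    for position, item in enumerate(ordering):
        # Normalize over the items that have not been placed yet.
        remaining = sum(weights[other] for other in ordering[position:])
        probability *= weights[item] / remaining
    return probability


def sample_ordering(weights, rng=random):
    """Draw items one at a time, with probability proportional to their weight."""
    remaining = dict(weights)
    ordering = []
    while remaining:
        items = list(remaining)
        chosen = rng.choices(items, weights=[remaining[i] for i in items])[0]
        ordering.append(chosen)
        del remaining[chosen]
    return ordering


weights = {1: 0.3, 2: 0.03, 3: 0.17, 4: 0.6}  # as transcribed from the slides
p = plackett_luce_probability([4, 1, 3, 2], weights)
print(round(p, 3))  # 0.278
print(sample_ordering(weights, random.Random(0)))
```

The first call evaluates the four-factor product shown above for the ordering 4, 1, 3, 2; the second draws one random ordering from the same model.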
slide-16
SLIDE 16

The Bayesian model

Posterior distribution of the weights ∝ likelihood of the sample × prior distribution of the weights:

$$P(w \mid R) \propto \prod_{k=1}^{N} \prod_{i=1}^{n} \left( \frac{w_{\sigma^{(k)}_i}}{\sum_{j=i}^{n} w_{\sigma^{(k)}_j}} \right) \cdot \frac{1}{B} \prod_{i=1}^{n} w_i^{\alpha_i - 1}$$

where the sample of rankings is

$$R = \{\sigma^{(1)}, \ldots, \sigma^{(N)}\}$$

and the normalizing constant of the (Dirichlet) prior is

$$B = \frac{\prod_{i=1}^{n} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{n} \alpha_i\right)}$$

No way to sample the posterior distribution exactly → MCMC
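
The slides do not say which MCMC sampler is used, so the following Python sketch is only one possible choice: a plain random-walk Metropolis sampler, picked for brevity. It assumes a reparametrization that is not in the talk: unnormalized weights v_i with independent Gamma(α_i, 1) priors, sampled on the log scale, so that w = v / Σv has the Dirichlet(α) prior of the model. All function names, the step size and the chain length are hypothetical. The example orderings are the five rankings from the "From Results to Rankings" slide, written best first with indices GA=0, PSO=1, ILP=2, SA=3, GP=4.

```python
import math
import random


def log_posterior(x, orderings, alpha):
    """Unnormalized log-posterior in x = log v, where w = v / sum(v)."""
    # Gamma(alpha_i, 1) priors on v_i are equivalent to a Dirichlet(alpha)
    # prior on w; alpha_i * x_i includes the Jacobian of v_i = exp(x_i).
    value = sum(a * xi - math.exp(xi) for a, xi in zip(alpha, x))
    for ordering in orderings:  # each ordering lists item indices, best first
        remaining = 0.0
        # Walk from the last position so the stage denominators accumulate.
        for item in reversed(ordering):
            remaining += math.exp(x[item])
            value += x[item] - math.log(remaining)
    return value


def sample_weights(orderings, alpha, n_samples=2000, burn_in=500, step=0.3, seed=0):
    """Random-walk Metropolis; returns a list of normalized weight vectors."""
    rng = random.Random(seed)
    x = [0.0] * len(alpha)
    current = log_posterior(x, orderings, alpha)
    samples = []
    for iteration in range(burn_in + n_samples):
        proposal = [xi + rng.gauss(0.0, step) for xi in x]
        candidate = log_posterior(proposal, orderings, alpha)
        # Metropolis acceptance for a symmetric proposal.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            x, current = proposal, candidate
        if iteration >= burn_in:
            total = sum(math.exp(xi) for xi in x)
            samples.append([math.exp(xi) / total for xi in x])
    return samples


orderings = [
    [1, 4, 0, 3, 2],  # f1: PSO, GP, GA, SA, ILP
    [3, 1, 0, 2, 4],  # f2: SA, PSO, GA, ILP, GP
    [4, 2, 0, 3, 1],  # f3: GP, ILP, GA, SA, PSO
    [3, 4, 2, 0, 1],  # f4: SA, GP, ILP, GA, PSO
    [4, 2, 1, 0, 3],  # f5: GP, ILP, PSO, GA, SA
]
samples = sample_weights(orderings, alpha=[1.0] * 5)  # uniform Dirichlet prior
```

A production analysis would normally rely on an established sampler and check convergence; this sketch only shows how the likelihood and the prior from the formula combine into a target density for MCMC.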

slide-17
SLIDE 17

Bayesian inference for algorithm ranking analysis

[Figure: inference pipeline. Run the algorithms Alg1, ..., Algn on instances #1, ..., #m to obtain the performance matrix; rank the algorithms to obtain the ranking matrix; run MCMC sampling to obtain a sample of weight vectors (w1, ..., wn); query the posterior.]

slide-18
SLIDE 18

The Case Study

23 FUNCTIONS TO OPTIMIZE:

  • OneMax (F1) and W-model extensions (F4-F10)
  • LeadingOnes (F2) and W-model extensions (F11-F17)
  • Harmonic (F3)
  • LABS: Low Autocorrelation Binary Sequences (F18)
  • Ising-Ring (F19)
  • Ising-Torus (F20)
  • Ising-Triangular (F21)
  • MIVS: Maximum Independent Vertex Set (F22)
  • NQP: N-Queens problem (F23)

Problem size: $n \in \{16, 64, 100, 625\}$

11 metaheuristic algorithms:

  • greedy Hill Climber (gHC)
  • Randomized Local Search (RLS)
  • (1+1) EA
  • fast Genetic Algorithm (fGA)
  • (1+10) EA
  • (1+10) EA_r/2,2r
  • (1+10) EA_norm
  • (1+10) EA_var
  • (1+10) EA_log-n
  • (1+(λ,λ)) GA
  • “vanilla” GA (vGA)

Results of 11,132 runs are collected (23 x 4 x 11 x 11)

  • Aggregation of performances across 11 instances.
  • Median performance across 11 repetitions.

Estimate the probability of each algorithm being top-ranked

  • as its expected weight in the posterior distribution of weights

Analyze the uncertainty about the probabilities

  • By estimating the 90% credible intervals of the posterior distribution of weights (5% and 95% quantiles)
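
The two queries above, expected weight and 90% credible interval, can be sketched in Python as follows. The function name is hypothetical, the quantiles use a simple nearest-rank rule, and the posterior draws are synthetic (generated from an arbitrary Dirichlet distribution) only so that the example has something to summarize.

```python
import random


def summarize_posterior(samples, lower=0.05, upper=0.95):
    """Return (expected weight, lower quantile, upper quantile) per algorithm."""
    summary = []
    for i in range(len(samples[0])):
        column = sorted(sample[i] for sample in samples)
        mean = sum(column) / len(column)
        # Nearest-rank quantiles; an interpolating estimator would also work.
        low = column[round(lower * (len(column) - 1))]
        high = column[round(upper * (len(column) - 1))]
        summary.append((mean, low, high))
    return summary


# Synthetic posterior draws for three hypothetical algorithms A, B and C,
# generated from a Dirichlet(6, 3, 1) distribution via normalized gammas.
rng = random.Random(1)
draws = []
for _ in range(1000):
    gammas = [rng.gammavariate(a, 1.0) for a in (6.0, 3.0, 1.0)]
    total = sum(gammas)
    draws.append([g / total for g in gammas])

for name, (mean, low, high) in zip(["A", "B", "C"], summarize_posterior(draws)):
    print(f"{name}: P(top-ranked) ~ {mean:.2f}, 90% interval [{low:.2f}, {high:.2f}]")
```

In a real analysis, `draws` would be the weight vectors produced by the MCMC step rather than synthetic data.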
slide-19
SLIDE 19

Inference analyses & results

QUALITATIVE SUMMARY

Similar performance: (1+(λ,λ)) GA, (1+1) EA, (1+10) EA_var, (1+10) EA_log-n, (1+10) EA_norm, (1+10) EA_r/2,2r and fGA.

Extreme performance: vGA and gHC.

Easily treated instances are F1-F6, F8, F11-F13 and F15-F16.

Best solutions found for n=625.

slide-20
SLIDE 20

Inference analyses & results

Fixed-target perspective – Record running time

[Figure: probability of winning for each algorithm, with credible intervals and annotations marking the expected probability and a case of high uncertainty. Panels: F17, n=625, φ=625 and F19, n=100, φ=100.]

Credible intervals

  • Only 11 samples to do inference → high uncertainty is expected!
  • The more samples, the lower the uncertainty → credible intervals are tighter!

INTERPRETABILITY

slide-21
SLIDE 21

Inference analyses & results

Fixed-target perspective – Record running time – Set of easy functions

[Figure: probability of winning for each algorithm, with credible intervals. Panels: n=625, all runs and n=625, median.]

Credible intervals

  • Set of functions, two paths → (1) take all the runs, (2) take the median of the runs on each instance.
  • gHC is the best in both cases → with more samples the uncertainty is lower.

slide-22
SLIDE 22

Inference analyses & results

Fixed-target perspective – Record running time – Set of non-easy functions

[Figure: probability of winning for each algorithm, with credible intervals. Panel: n=625, all runs.]

Credible intervals

  • Good estimations → credible intervals narrower than 0.05.
  • Probabilities are similar → due to overlapping.
  • Uncertainty about which is the best → not due to limited data, but due to the equivalence of the algorithms.

slide-23
SLIDE 23

Inference analyses & results

Fixed-budget perspective – Evolution of the winning probability – 90% credible intervals

[Figure: winning probability of each algorithm as a function of the budget. Panel: F21, n=100.]

gHC is the best, but its probability decreases while the rest improve. gHC becomes better, as the budget increases.

Algorithms ranked with average data. Wilcoxon test for pairwise comparisons, and Shaffer’s method for p-value correction.

BAYESIAN ANALYSIS: ESTIMATED PROBABILITY AND NOTION OF UNCERTAINTY IN THE FORM OF CREDIBLE INTERVAL

slide-24
SLIDE 24

Inference analyses & results

Impact of the prior distribution – Comparison of three different priors

[Figure: winning probability of each algorithm under three priors: Uniform, Empirical and Deceptive. Panel: F9, n=100, φ=100.]

Empirical data favours the best performing algorithms

Negligible effect (even when median values are considered)

slide-25
SLIDE 25

Discussion

Bayesian inference using Plackett-Luce for the analysis of algorithms’ performance rankings

Include it in the practical EC performance comparison toolset → IOHProfiler

STRONG POINTS

  • Ability to handle multiple algorithms
  • Interpretability
  • Exact description of the uncertainty

WEAKNESSES

  • By aggregating performances into rankings, we lose information about the magnitude of the differences
  • Limitations of the Plackett-Luce model → from n! to n parameters. How do we deal with ties?

slide-26
SLIDE 26

scmamp: Statistical Comparison of Multiple Algorithms in Multiple Problems

slide-27
SLIDE 27

Josu Ceberio

Bayesian Analysis for Algorithm Performance Comparison

Thank you very much for your attention!