Josu Ceberio
Bayesian Analysis for Algorithm Performance Comparison
Is it possible to compare optimization algorithms without hypothesis testing?
Is there a reproducibility crisis?
Source: Monya Baker (2016), Nature, 533, 452-454
Idea for solving a set of problems efficiently.
Is my algorithm better than the state-of-the-art? On which problems is my algorithm better? Why is my algorithm better (or worse)?
Compare the performance against the state-of-the-art on some benchmark of problems. The analysis of the results should take into account the associated uncertainty.
What conclusions do we draw from the experimentation? How do we answer the formulated questions?
How likely is my proposal to be the best algorithm to solve a problem? How likely is my proposal to be the best algorithm from the compared ones?
STATISTICAL ANALYSIS OF EXPERIMENTAL RESULTS: NULL HYPOTHESIS STATISTICAL TESTING
WHAT NHST COMPUTES
Unknown Behaviour Observed Sample
We assume the null hypothesis: the average performance of the compared methods is the same. Then, the observed difference is computed from the data, and the probability of observing such a difference (or a bigger one) under the null is estimated: the p-value. The p-value refers to the probability of erroneously assuming that there are differences when actually there are none. It is used to measure the magnitude of the difference, as it decreases when the difference increases.
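The NHST workflow described above can be sketched in a few lines; the performance numbers below are hypothetical, and the Wilcoxon signed-rank test stands in for whatever paired test a study would use:

```python
# A minimal sketch of the NHST workflow, assuming paired results of two
# algorithms on the same 10 instances (hypothetical numbers).
from scipy.stats import wilcoxon

alg_a = [100, 95, 102, 98, 97, 101, 99, 96, 103, 94]
alg_b = [105, 99, 104, 101, 100, 106, 102, 98, 107, 97]

# Null hypothesis: the paired differences are symmetric around zero.
stat, p_value = wilcoxon(alg_a, alg_b)
print(f"p-value: {p_value:.4g}")
# A small p-value says the observed difference is unlikely under the
# null; it is NOT the probability that one algorithm beats the other.
```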
WHAT WE WOULD LIKE TO KNOW
Unknown Behaviour Observed Sample
Many alternatives exist to handle the uncertainty associated with empirical results:
Statistical Analysis Handbook: A Comprehensive Handbook of Statistical Concepts, Techniques and Software Tools. Dr Michael J de Smith.
BAYESIAN STATISTICAL ANALYSIS
Unknown Behaviour Observed Sample
The method focuses on estimating relevant information about the underlying performance distribution, a parametric distribution represented by a set of parameters θ. It assesses the distribution of θ conditioned on a sample s drawn from the performance distribution. Instead of having a single probability distribution to model the underlying performance, Bayesian statistics considers all possible distributions and assigns a probability to each.
Posterior distribution ∝ Likelihood function × Prior distribution:
P(θ | s) = P(s | θ) P(θ) / P(s)
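As a minimal illustration of "posterior ∝ likelihood × prior", here is a conjugate Beta-Bernoulli model (not the Plackett-Luce model used later, and with hypothetical data): θ is the unknown probability that algorithm A beats algorithm B in a single run.

```python
# Conjugate Beta-Bernoulli illustration of Bayes' rule (hypothetical data):
# posterior over theta = P(A beats B in one run).
from scipy.stats import beta

wins, losses = 7, 3        # hypothetical head-to-head outcomes
a0, b0 = 1.0, 1.0          # uniform Beta(1, 1) prior over theta
posterior = beta(a0 + wins, b0 + losses)  # conjugate update

print("posterior mean:", posterior.mean())        # (1+7)/(2+10) = 2/3
print("P(theta > 0.5):", 1 - posterior.cdf(0.5))
```

Unlike a p-value, the posterior directly answers the question posed earlier: how likely is my proposal to be the better algorithm.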
HOW DO WE COMPARE MULTIPLE ALGORITHMS?
Minimizing some instances of a problem / minimizing a given instance of a problem.
Performance (minimization) of each algorithm on instances f1–f5, and the ranking σi derived from each column:

Algorithm   f1    f2    f3    f4    f5
GA         100   130    37   566   256
PSO         90    80   352   756   125
ILP        135   135    19   101    89
SA         105    30   100    56   369
GP          95   300    10    57    36
...        ...   ...   ...   ...   ...

σ1 = (3, 1, 5, 4, 2)
σ2 = (3, 2, 4, 1, 5)
σ3 = (3, 5, 2, 4, 1)
σ4 = (4, 5, 3, 1, 2)
σ5 = (4, 3, 2, 5, 1)

The observed sample is the set of rankings (permutations) σ1, …, σ5.
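The step from performance columns to rankings can be sketched as follows; the numbers reuse the f1 and f2 columns above, and `scipy.stats.rankdata` is one possible (assumed) implementation choice:

```python
import numpy as np
from scipy.stats import rankdata

# Rows = instances, columns = algorithms (GA, PSO, ILP, SA, GP);
# lower is better, since we are minimizing.
perf = np.array([[100,  90, 135, 105,  95],   # instance f1
                 [130,  80, 135,  30, 300]])  # instance f2

# Rank 1 = best algorithm on that instance (no ties in this toy data),
# reproducing sigma_1 = (3,1,5,4,2) and sigma_2 = (3,2,4,1,5).
ranks = rankdata(perf, axis=1).astype(int)
print(ranks)
```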
The Plackett-Luce model assigns to a ranking σ (best first) the probability

P(σ | w) = ∏_{i=1}^{n} w_{σ(i)} / (∑_{j=i}^{n} w_{σ(j)})

Posterior distribution of the weights ∝ Likelihood of the sample × Prior distribution of the weights.

Likelihood of a sample of N rankings σ^(1), …, σ^(N):

P(σ^(1), …, σ^(N) | w) = ∏_{k=1}^{N} ∏_{i=1}^{n} w_{σ^(k)(i)} / (∑_{j=i}^{n} w_{σ^(k)(j)})

Dirichlet prior over the weight vector w:

P(w | α) = (Γ(∑_{i=1}^{n} α_i) / ∏_{i=1}^{n} Γ(α_i)) · ∏_{i=1}^{n} w_i^{α_i − 1}

There is no way to sample the posterior distribution exactly → MCMC.
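A direct evaluation of the Plackett-Luce probability of a single ranking can be sketched as follows (a minimal implementation, not the one used in the presentation):

```python
import numpy as np

def plackett_luce_prob(sigma, w):
    """P(sigma | w) = prod_i w[sigma[i]] / sum_{j >= i} w[sigma[j]],
    where sigma lists item indices from best to worst."""
    sigma = np.asarray(sigma)
    w = np.asarray(w, dtype=float)
    prob = 1.0
    for i in range(len(sigma)):
        # Probability that sigma[i] wins among the items not yet ranked.
        prob *= w[sigma[i]] / w[sigma[i:]].sum()
    return prob

# Two algorithms with equal weights: either order has probability 1/2.
print(plackett_luce_prob([0, 1], [0.5, 0.5]))  # 0.5
```

Summed over all n! permutations the probabilities add to 1, which is what makes this n-parameter model a proper distribution over rankings.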
Pipeline:
Run the algorithms Alg1, …, Algn on instances #1, …, #m → Performance Matrix.
Rank the algorithms on each instance → Ranking Matrix.
MCMC sampling of the weight vector (w1, …, wn).
Query the posterior.
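The MCMC step can be sketched with a toy random-walk Metropolis sampler; this is an illustrative stand-in (softmax parameterization, fixed step size, no burn-in handling), not the sampler used in the actual study:

```python
import numpy as np

rng = np.random.default_rng(0)

def pl_loglik(rankings, w):
    """Plackett-Luce log-likelihood of a list of rankings (best first)."""
    ll = 0.0
    for sigma in rankings:
        for i in range(len(sigma)):
            ll += np.log(w[sigma[i]]) - np.log(w[sigma[i:]].sum())
    return ll

def sample_posterior(rankings, n_algs, alpha=1.0, n_samples=2000):
    """Toy random-walk Metropolis over log-weights with a Dirichlet-style
    prior term; a rough stand-in for the MCMC step of the pipeline."""
    def log_post(theta):
        w = np.exp(theta - theta.max())
        w /= w.sum()                              # project onto the simplex
        return (alpha - 1.0) * np.log(w).sum() + pl_loglik(rankings, w)

    theta = np.zeros(n_algs)                      # start from uniform weights
    lp = log_post(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + 0.3 * rng.standard_normal(n_algs)
        lp_prop = log_post(proposal)
        if np.log(rng.random()) < lp_prop - lp:   # Metropolis acceptance
            theta, lp = proposal, lp_prop
        w = np.exp(theta - theta.max())
        samples.append(w / w.sum())
    return np.array(samples)

# Five hypothetical rankings in which algorithm 0 always wins: its
# posterior mean weight (probability of being top-ranked) comes out largest.
rankings = [np.array([0, 1, 2])] * 5
post = sample_posterior(rankings, n_algs=3)
print(post.mean(axis=0))
```

Querying the posterior then amounts to summarizing these samples, e.g. the mean weight of each algorithm and percentile-based credible intervals.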
23 FUNCTIONS TO OPTIMIZE
Problem sizes: 4. Metaheuristic algorithms: 11.
Results of 11,132 runs are collected (23 × 4 × 11 × 11).
Estimate the probability of each algorithm being top-ranked.
Analyze the uncertainty about the probabilities.
QUALITATIVE SUMMARY
Similar performance: (1+(λ,λ)) GA, (1+1) EA, (1+1) EA_var., (1+10) EA_log-n., (1+10) EA_norm., (1+10) EA_r/2,2r and fGA.
Extreme performance: vGA and gHC.
Easily treated instances are F1–F6, F8, F11–F13 and F15–F16.
Best solutions found for n=625.
Fixed-target perspective – Record running-time
[Figure: probability of winning for the 11 algorithms ((1+(λ,λ)) GA, (1+1) EA, gHC, (1+10) EA_r/2,2r, (1+10) EA, (1+10) EA_log-n., (1+10) EA_norm., (1+1) EA_var., fGA, vGA, RLS); panels F17, n=625, φ=625 and F19, n=100, φ=100.]
Credible Intervals: only 11 samples to do inference → high uncertainty is expected! The more samples, the lower the uncertainty → credible intervals are tighter!
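The "more samples → tighter credible intervals" point can be illustrated with stand-in posterior samples (Beta draws chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in posterior samples of one algorithm's winning probability:
# a wide posterior (little data) vs a concentrated one (more data).
wide = rng.beta(3, 8, size=2000)
narrow = rng.beta(30, 80, size=2000)

for name, s in [("little data", wide), ("more data", narrow)]:
    lo, hi = np.percentile(s, [5, 95])   # 90% credible interval
    print(f"{name}: [{lo:.2f}, {hi:.2f}], width {hi - lo:.2f}")
```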
INTERPRETABILITY
Fixed-target perspective – Record running-time – Set of easy functions
[Figure: probability of winning for the 11 algorithms; left panel: n=625, all runs; right panel: n=625, median.]
Credible Intervals: set of functions, two paths → (1) take all the runs, (2) take the median of the runs on each instance. gHC is the best in both cases → with more samples the uncertainty is lower.
Fixed-target perspective – Record running-time – Set of non-easy functions
Credible Intervals: good estimations → credible intervals smaller than 0.05. Probabilities are similar → due to overlapping. Uncertainty about which is the best → not due to a limitation of the data, but due to equivalence among the algorithms.
[Figure: probability of winning for the 11 algorithms; n=625, all runs.]
Fixed-budget perspective – Evolution of the winning probability – 90% credibility intervals
[Figure: evolution of the winning probability of the 11 algorithms as the budget grows from 300 to 900 evaluations; F21, n=100.]
gHC is the best, but its winning probability decreases while the rest improve as the budget increases.
Algorithms ranked with average data; Wilcoxon test for pairwise comparisons, and Shaffer's method for p-value correction.
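The frequentist baseline on this slide can be sketched as follows; the data are hypothetical, and since Shaffer's correction is not available in SciPy, the simpler Holm step-down correction is hand-coded as a stand-in:

```python
from itertools import combinations
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(2)
# Hypothetical paired results of three algorithms on 20 instances.
results = {"A": rng.normal(100, 5, 20),
           "B": rng.normal(103, 5, 20),
           "C": rng.normal(110, 5, 20)}

pairs = list(combinations(results, 2))
pvals = [wilcoxon(results[a], results[b]).pvalue for a, b in pairs]

# Holm step-down correction (Shaffer's method additionally exploits the
# logical relations among hypotheses; Holm is a conservative stand-in).
order = np.argsort(pvals)
m = len(pvals)
adjusted, running_max = {}, 0.0
for rank, idx in enumerate(order):
    running_max = max(running_max, min(1.0, (m - rank) * pvals[idx]))
    adjusted[pairs[idx]] = running_max

for pair, p in adjusted.items():
    print(pair, f"adjusted p = {p:.4f}")
```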
BAYESIAN ANALYSIS: ESTIMATED PROBABILITY AND A NOTION OF UNCERTAINTY IN THE FORM OF CREDIBLE INTERVALS
Impact of the prior distribution – Comparison of three different priors
[Figure: winning probability of each algorithm under three priors (Uniform, Empirical, Deceptive); F9, n=100, φ=100.]
Empirical data favours the best-performing algorithms. Negligible effect (even when median values are considered).
Bayesian inference using Plackett-Luce for the analysis of algorithms' performance rankings.
Include it in the practical EC performance-comparison tool set → IOHProfiler.
Strong points: ability to handle multiple algorithms; interpretability; exact description of the uncertainty.
Weaknesses: by aggregating performances into rankings we lose information about the magnitude of the differences; limitations of the Plackett-Luce model → from n! to n parameters; how do we deal with ties?
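On the open question of ties: when two algorithms obtain identical performance on an instance, the raw ranks are no longer a strict permutation, which the Plackett-Luce model requires. A quick sketch of the problem, using `scipy.stats.rankdata` tie-breaking options as an assumed preprocessing choice:

```python
from scipy.stats import rankdata

# Algorithms 0 and 2 tie: the resulting ranks are not a permutation,
# so ties must be broken (e.g. at random) or modeled explicitly before
# fitting a Plackett-Luce model.
perf = [100, 95, 100, 90]
print(rankdata(perf, method="average"))  # ranks: 3.5, 2.0, 3.5, 1.0
print(rankdata(perf, method="min"))      # ranks: 3, 2, 3, 1
```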
scmamp: Statistical Comparison of Multiple Algorithms in Multiple Problems
Josu Ceberio
Thank you very much for your attention!