

  1. Free Lunch for Optimisation under the Universal Distribution
Tom Everitt (Stockholm University, Stockholm, Sweden), Tor Lattimore (University of Alberta, Edmonton, Canada), Marcus Hutter (Australian National University, Canberra, Australia)
July 7, 2014
Everitt, Lattimore, Hutter: Free Lunch for Optimisation, July 7, 2014

  2. Outline
Are universal optimisation algorithms possible?
Background: Finite Black-box Optimisation (FBBO) and the NFL theorems
The Universal Distribution
Our results
Conclusions and Outlook

  3. Finite Black-box Optimisation
FBBO is a formal setting for Simulated Annealing, Genetic Algorithms, etc. It is characterised by:
A finite search space X, a finite range Y, and an unknown f : X → Y.
An optimisation algorithm repeatedly chooses points x_i ∈ X to evaluate.
Goal: minimise probes-till-max (the Optimisation Time).
A distribution P over the finite set { f : X → Y } = Y^X.
P-expected Optimisation Time: Perf_P(a) = E_P[probes-till-max(a)].
P determines the achievable bounds on optimisation performance.
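The FBBO setting above can be sketched in a few lines. This is a minimal illustration, not from the paper; the names `probes_till_max` and `random_search` are hypothetical, and random search stands in for any black-box optimiser:

```python
import random

def probes_till_max(algorithm, f, X):
    """Optimisation Time: count queries until the algorithm first hits a maximiser of f."""
    fmax = max(f.values())
    for t, x in enumerate(algorithm(f, X), start=1):
        if f[x] == fmax:
            return t
    return len(X)  # only reached if the algorithm never probes a maximiser

def random_search(f, X):
    """A black-box algorithm: query distinct points of X in uniformly random order."""
    order = list(X)
    random.shuffle(order)
    yield from order

X = range(6)
f = {x: x % 3 for x in X}   # toy unknown objective with range Y = {0, 1, 2}
t = probes_till_max(random_search, f, X)
print(t)                    # probes until a maximiser (f = 2) is found
```

Perf_P(a) would then be the average of `probes_till_max` over functions f drawn from P.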

  4. The NFL (No Free Lunch) theorems
Definition: There is NFL for P if Perf_P(a) = Perf_P(b) for all optimisation algorithms a and b.
Theorem (Original NFL; Wolpert & Macready, 1997): P uniform ⟹ NFL for P.
So is universal optimisation impossible? Is uniform really the same as unbiased? A function drawn from the uniform distribution is pure random noise.
(Plot: a uniformly random function, resembling noise.)
Our suggestion to avoid NFL: the Universal Distribution (not a new idea).
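The uniform-P case of the NFL theorem can be checked exhaustively for a tiny search space. The sketch below (an illustration, not from the slides) averages the optimisation time of two different non-repeating query orders over all functions f : X → Y and finds them identical:

```python
from itertools import product

X, Y = range(3), range(2)

def optimisation_time(order, f):
    """Probes until the fixed query order first hits a maximum of f (given as a value tuple)."""
    fmax = max(f)
    return next(t for t, x in enumerate(order, start=1) if f[x] == fmax)

def perf_uniform(order):
    """Average optimisation time over ALL functions f: X -> Y, i.e. uniform P."""
    fs = list(product(Y, repeat=len(X)))
    return sum(optimisation_time(order, f) for f in fs) / len(fs)

# Two different non-repeating query orders perform identically on average:
print(perf_uniform((0, 1, 2)), perf_uniform((2, 0, 1)))  # prints 1.5 1.5
```

Averaged over all 8 functions, every order needs the same 1.5 expected probes, which is the NFL equality Perf_P(a) = Perf_P(b) in miniature.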

  5. The Universal Distribution – Background
Kolmogorov complexity: K(x) := min_p { ℓ(p) : p prints x }.
Universal distribution: m(x) := 2^(−K(x)).
Example: 000000000 has low K and hence high m; 0101001101 has high K and hence low m.
Agrees with Occam's razor via its simplicity bias.
Dominates all (semi-)computable (semi-)measures.
Essentially regrouping invariant.
Offers a mathematical solution to the induction problem (Solomonoff induction).
Successfully used in Reinforcement Learning (Hutter, 2005) and for a general clustering algorithm (Cilibrasi & Vitanyi, 2003).
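K(x) itself is incomputable, but the simplicity bias can be illustrated with a standard proxy: the length of a string under a real compressor upper-bounds K(x) up to a constant. The sketch below (an illustration with hypothetical names, not from the slides) shows a regular string compressing far better than a random-looking one, mirroring low K / high m versus high K / low m:

```python
import random
import zlib

def compressed_len(s: str) -> int:
    """Crude upper-bound proxy for K(s): byte length of the zlib-compressed string."""
    return len(zlib.compress(s.encode()))

regular = "01" * 500                 # highly regular: low K, so high m(x) = 2^(-K(x))
random.seed(0)
noisy = "".join(random.choice("01") for _ in range(1000))  # incompressible-looking

print(compressed_len(regular), compressed_len(noisy))
# the regular string compresses far better, mirroring the simplicity bias
```

Cilibrasi & Vitanyi's clustering-by-compression builds on exactly this substitution of compressed length for K.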

  6. The Universal Distribution in FBBO
It may equivalently be defined in two ways:
m_XY(f) := 2^(−K(f | X, Y))   (1)
≈ "the probability that a 'random' program acts like f"   (2)
Definition (1) shows the bias towards simplicity.
(Plot: a simple, structured function f over X.)

  7. The Universal Distribution in FBBO
It may equivalently be defined in two ways:
m_XY(f) := 2^(−K(f | X, Y))   (1)
≈ "the probability that a 'random' program acts like f"   (2)
Definition (2) shows the wide applicability of the universal distribution.
(Diagram: the unknown mapping f : X → Y under the uniform distribution versus the universal distribution.)
The uncertainty pertains to the system behind the mapping.

  8. Results – Good News
The universal distribution permits free lunch.
Theorem (Universal Free Lunch): There is free lunch under the universal distribution for all sufficiently large search spaces.
This follows from the simplicity bias.
(Plots: two functions of different complexity; the simpler one receives higher probability.)

  9. Results – Bad News
Unfortunately, the universal distribution does not permit sublinear maximum finding.
Theorem (Asymptotic bounds): Expected optimisation time increases linearly with the size of the search space.
Optimisation is a hard problem: degenerate functions impede performance (needle-in-a-haystack (NIAH) functions and "adversarial" functions).
(Plot: a needle-in-a-haystack function, constant except at a single point.)
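Why NIAH functions force linear time can be seen in a small simulation (an illustration under simple assumptions, not from the paper): if the needle is placed uniformly at random, any non-repeating scan needs about (n + 1) / 2 probes on average, which grows linearly with the search space size n:

```python
import random

def niah(n, needle):
    """Needle-in-a-haystack function: 0 everywhere except a single 1 at `needle`."""
    return {x: int(x == needle) for x in range(n)}

def probes_to_find_needle(n, rng):
    """Scan the points in a random order until the needle is found; return the probe count."""
    f = niah(n, rng.randrange(n))
    order = list(f)
    rng.shuffle(order)
    return next(t for t, x in enumerate(order, start=1) if f[x] == 1)

rng = random.Random(0)
for n in (10, 100, 1000):
    avg = sum(probes_to_find_needle(n, rng) for _ in range(2000)) / 2000
    print(n, round(avg, 1))   # averages hover near (n + 1) / 2: linear in n
```

No simplicity bias helps here: every needle position yields an equally simple function, so no algorithm can do better than a linear scan on average.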

  10. Conclusions and Outlook
The universal distribution is a philosophically justified prior for finite black-box optimisation.
It offers free lunch, but not sublinear maximum finding.
So meta-heuristics with different universal performance exist, but the difference is limited.
Future research: the minimal condition enabling sublinear maximum finding.

  11. References
Rudi Cilibrasi and Paul M. B. Vitanyi. Clustering by compression. IEEE Transactions on Information Theory, 51(4), 2003.
Marcus Hutter. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, 2005.
David H. Wolpert and William G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.
