

SLIDE 1

Free Lunch for Optimisation under the Universal Distribution

Tom Everitt^1, Tor Lattimore^2, Marcus Hutter^3

^1 Stockholm University, Stockholm, Sweden; ^2 University of Alberta, Edmonton, Canada; ^3 Australian National University, Canberra, Australia

July 7, 2014

Everitt, Lattimore, Hutter Free Lunch for Optimisation July 7, 2014 1 / 11

SLIDE 2

Outline

Are universal optimisation algorithms possible?
Background: finite black-box optimisation (FBBO) and the NFL theorems
The universal distribution
Our results
Conclusions and outlook


SLIDE 3

Finite Black-box Optimisation

FBBO is a formal setting for simulated annealing, genetic algorithms, etc. It is characterised by:
A finite search space X, a finite range Y, and an unknown function f : X → Y.
An optimisation algorithm repeatedly chooses points x_i ∈ X to evaluate.
Goal: minimise probes-till-max (the optimisation time).
A distribution P over the finite set {f : X → Y} = Y^X.
P-expected optimisation time: Perf_P(a) = E_P[probes-till-max(a)].
The choice of P determines what bounds on optimisation performance are possible.
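The setting above can be sketched in a few lines of Python. This is my own illustration, not code from the talk; `random_search` is just one baseline black-box optimiser, standing in for any algorithm a.

```python
import random

def optimisation_time(algorithm, f, X):
    """Probes-till-max: count evaluations until `algorithm` first hits
    a maximum of f. `algorithm` maps the probe history to a new point."""
    f_max = max(f[x] for x in X)
    history = []
    for t in range(1, len(X) + 1):
        x = algorithm(history, X)
        history.append((x, f[x]))
        if f[x] == f_max:
            return t  # found a maximum after t probes
    return len(X)

def random_search(history, X):
    # Baseline optimiser: probe a not-yet-visited point uniformly at random.
    seen = {x for x, _ in history}
    return random.choice([x for x in X if x not in seen])
```

Perf_P(a) would then be the expectation of `optimisation_time(a, f, X)` with f drawn from P.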


SLIDE 4

The NFL (No Free Lunch) theorems

Definition

There is NFL for P if Perf_P(a) = Perf_P(b) for all optimisation algorithms a and b.

Theorem (Original NFL (Wolpert&Macready, 1997))

P uniform ⟹ NFL for P ⟹ so no universal optimisation?
Is uniform the same as unbiased? No: a uniform P means the function values are pure random noise.
[Figure: two sample functions drawn from the uniform distribution]
Our suggestion for avoiding NFL: the universal distribution (not itself a new idea).
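The original NFL theorem can be checked exhaustively on a toy instance. The tiny search space below is my own illustration, not from the slides:

```python
from itertools import product

X, Y = (0, 1, 2), (0, 1)

def opt_time(order, f):
    # Probes-till-max for a fixed, non-repeating probe order;
    # f is a tuple (f(0), f(1), f(2)).
    f_max = max(f)
    for t, x in enumerate(order, start=1):
        if f[x] == f_max:
            return t

def perf_uniform(order):
    # P-expected optimisation time under the uniform P over all
    # |Y|^|X| = 2^3 = 8 functions f : X -> Y.
    fs = list(product(Y, repeat=len(X)))
    return sum(opt_time(order, f) for f in fs) / len(fs)
```

Averaged over all eight functions, every probe order performs identically, exactly as the theorem asserts.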


SLIDE 5

The Universal Distribution – Background

Kolmogorov complexity: K(x) := min_p { ℓ(p) : p prints x }
Universal distribution: m(x) := 2^(−K(x))
Example:

  x    | 000000000 | 0101001101
  K(x) | low       | high
  m(x) | high      | low

Agrees with Occam's razor: a "simplicity bias".
Dominates all (semi-)computable (semi-)measures.
Essentially regrouping invariant.
Offers a mathematical solution to the induction problem (Solomonoff induction).
Successfully used in reinforcement learning (Hutter, 2005) and in a general clustering algorithm (Cilibrasi & Vitányi, 2003).
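K(x) itself is incomputable, but its flavour can be demonstrated with an off-the-shelf compressor as a crude upper-bound proxy, the same idea that underlies Cilibrasi & Vitányi's clustering by compression. This is a sketch of the intuition, not a way to compute m(x):

```python
import random
import zlib

def K_proxy(s: bytes) -> int:
    # Computable stand-in for K(s): the length of a zlib encoding.
    # This only *upper-bounds* K, up to compressor overhead.
    return len(zlib.compress(s, 9))

random.seed(0)
regular = b"0" * 1000  # describable as "print '0' 1000 times": K low, m high
noisy = bytes(random.getrandbits(8) for _ in range(1000))  # K high, m low
```

The regular string compresses to a few bytes while the noisy one barely compresses at all, mirroring the K-low/m-high vs K-high/m-low pattern in the table above.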


SLIDE 6

The Universal Distribution in FBBO

The universal distribution may equivalently be defined in two ways:
m_XY(f) := 2^(−K(f | X, Y))  (1)
≈ "the probability that a 'random' program acts like f"  (2)
Definition (1) shows the bias towards simplicity.
[Figure: a simple example function plotted over X with values in Y]


SLIDE 7

The Universal Distribution in FBBO

The universal distribution may equivalently be defined in two ways:
m_XY(f) := 2^(−K(f | X, Y))  (1)
≈ "the probability that a 'random' program acts like f"  (2)
Definition (2) shows the wide applicability of the universal distribution.
[Figure: under the uniform distribution, the mapping from X to Y is itself the unknown; under the universal distribution, the unknown is the system f that generates the mapping]
The uncertainty pertains to the system behind the mapping.


SLIDE 8

Results – Good News

The universal distribution permits free lunch

Theorem (Universal Free Lunch)

There is free lunch under the universal distribution for all sufficiently large search spaces.
This follows from the simplicity bias.
[Figure: two example functions on X = {1, …, 5}]


SLIDE 9

Results – Bad News

Unfortunately, the universal distribution does not permit sublinear maximum finding

Theorem (Asymptotic bounds)

Expected optimisation time increases linearly with the size of the search space.
Optimisation is a hard problem.
Degenerate functions impede performance: needle-in-a-haystack (NIAH) functions and "adversarial" functions.
[Figure: a needle-in-a-haystack function on X = {1, …, 6}, flat except for a single spike]
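Why NIAH functions force linear time can be seen with a small illustrative calculation of my own (the theorem's actual bound concerns the universal distribution, but the mechanism is the same): until the needle is hit, every probe returns the same value, so no algorithm can exploit its observations, and the expected number of probes grows as (n + 1)/2.

```python
def probes_to_find(n, needle):
    # A NIAH function is constant everywhere except a single unknown peak,
    # so observations carry no information until the peak itself is probed;
    # a fixed scan order is then as good as any strategy.
    for t, x in enumerate(range(n), start=1):
        if x == needle:
            return t

def expected_probes(n):
    # Average over a uniform needle position: (1 + 2 + ... + n)/n = (n+1)/2,
    # i.e. linear in the size n of the search space.
    return sum(probes_to_find(n, k) for k in range(n)) / n
```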


SLIDE 10

Conclusions and Outlook

The universal distribution is a philosophically justified prior for finite black-box optimisation.
It offers free lunch, but not sublinear maximum finding.
So meta-heuristics with genuinely different performance under the universal distribution exist, but the difference between them is limited.
Future research: find the minimal condition on P that enables sublinear maximum finding.


SLIDE 11

References

Rudi Cilibrasi and Paul M. B. Vitányi. Clustering by compression. IEEE Transactions on Information Theory, 51(4), 2003.
Marcus Hutter. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Lecture Notes in Artificial Intelligence (LNAI 2167). Springer, 2005.
David H. Wolpert and William G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.

Everitt, Lattimore, Hutter Free Lunch for Optimisation July 7, 2014 11 / 11