SLIDE 1

Introduction to Global Optimization

Fabio Schoen 2008

http://gol.dsi.unifi.it/users/schoen

SLIDE 2

Global Optimization Problems

min_{x ∈ S ⊆ R^n} f(x)

What is meant by global optimization? Of course we would like to find

f* = min_{x ∈ S ⊆ R^n} f(x)

and x* = arg min_{x ∈ S} f(x), i.e. f(x*) ≤ f(x) ∀ x ∈ S

SLIDE 3

This definition is unsatisfactory: the problem is "ill posed" in x (two objective functions which differ only slightly might have global optima which are arbitrarily far apart); it is however well posed in the optimal values: ||f − g|| ≤ δ ⇒ |f* − g*| ≤ ε

SLIDE 4

Quite often we are satisfied with looking for f* and searching for one or more feasible solutions x̄ such that f(x̄) ≤ f(x*) + ε. Frequently, however, even this is too ambitious a task!

SLIDE 5

Research in Global Optimization

• the problem is highly relevant, especially in applications
• the problem is very hard (perhaps too hard) to solve
• there are plenty of publications on global optimization algorithms for specific problem classes
• there are only relatively few papers with relevant theoretical content
• often, from elegant theories weak algorithms have been produced and, vice versa, the best computational methods often lack a sound theoretical support

SLIDE 6

Many global optimization papers get published in applied research journals. In Bazaraa, Sherali, Shetty, "Nonlinear Programming: Theory and Algorithms", 1993, the words "global optimum" appear for the first time on page 99, the second time on page 132, then on page 247:

"A desirable property of an algorithm for solving [an optimization] problem is that it generates a sequence of points converging to a global optimal solution. In many cases, however, we may have to be satisfied with less favorable outcomes."

After this (in 638 pages) the term never appears again. "Global optimization" is never cited.

SLIDE 7

A similar situation in Bertsekas, "Nonlinear Programming" (1999): 777 pages, but only the definition of global minima and maxima is given! Nocedal & Wright, "Numerical Optimization", 2nd edition, 2006: "Global solutions are needed in some applications, but for many problems they are difficult to recognize and even more difficult to locate . . . many successful global optimization algorithms require the solution of many local optimization problems, to which the algorithms described in this book can be applied."

SLIDE 8

Complexity

Global optimization is "hopeless": without "global" information no algorithm will find a certifiable global optimum unless it generates a dense sample. There exists a rigorous definition of "global" information – some examples:
• the number of local optima
• the global optimum value
• for global optimization problems over a box, (an upper bound on) the Lipschitz constant: |f(y) − f(x)| ≤ L ||x − y|| ∀ x, y
• concavity of the objective function + convexity of the feasible region
• an explicit representation of the objective function as the difference between two convex functions (+ convexity of the feasible region)

SLIDE 9

Complexity

Global optimization is computationally intractable also according to classical complexity theory. Special cases: Quadratic Programming

min_{l ≤ Ax ≤ u} (1/2) x^T Q x + c^T x

is NP-hard [Sahni, 1974] and, when considered as a decision problem, NP-complete [Vavasis, 1990].

SLIDE 10

Many special cases are still NP-hard:
• norm maximization over a parallelotope: max ||x|| s.t. b ≤ Ax ≤ c
• quadratic optimization over a hyper-rectangle (A = I) when even only one eigenvalue of Q is negative
• quadratic minimization over the simplex:

min_{x ≥ 0, Σ_j xj = 1} (1/2) x^T Q x + c^T x

• even checking that a point is a local optimum is NP-hard

SLIDE 11

Applications of global optimization

• concave minimization - quantity discounts, scale economies, fixed charges
• combinatorial optimization - binary linear programming:

min c^T x + K x^T (1 − x)
s.t. Ax = b, x ∈ [0, 1]^n

or:

min c^T x
s.t. Ax = b, x ∈ [0, 1]^n, x^T (1 − x) = 0

SLIDE 12

Minimization of cost functions which are neither convex nor concave. E.g.: finding the minimum-energy conformation of complex molecules - Lennard-Jones micro-clusters, protein folding, protein-ligand docking. Example: the Lennard-Jones pair potential due to two atoms at X1, X2 ∈ R^3 is

v(r) = 1/r^12 − 2/r^6,   where r = ||X1 − X2||

The total energy of a cluster of N atoms located at X1, . . . , XN ∈ R^3 is defined as

Σ_{i=1,...,N} Σ_{j<i} v(||Xi − Xj||)

This function has a number of local (non-global) minima which grows like exp(N).
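As a concrete illustration, a minimal NumPy sketch of this cluster energy; the function and variable names are my own, not from the slides.

import numpy as np

def lj_energy(X):
    # Total Lennard-Jones energy of a cluster; X is an (N, 3) array of
    # atom positions, using the pair potential v(r) = 1/r^12 - 2/r^6.
    N = X.shape[0]
    E = 0.0
    for i in range(N):
        for j in range(i):
            r = np.linalg.norm(X[i] - X[j])
            E += 1.0 / r**12 - 2.0 / r**6
    return E

# Two atoms at unit distance sit exactly at the pair minimum: v(1) = -1
X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(lj_energy(X))  # -1.0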

SLIDE 13

Lennard-Jones potential

[Figure: plot of the attractive, repulsive and total Lennard-Jones pair potentials]

SLIDE 14

Protein folding and docking

Potential energy model: E = El + Ea + Ed + Ev + Ee, where:

El = Σ_{i∈L} (1/2) K^b_i (ri − r0_i)²   (contribution of pairs of bonded atoms)

Ea = Σ_{i∈A} (1/2) K^θ_i (θi − θ0_i)²   (angle between 3 bonded atoms)

Ed = Σ_{i∈T} (1/2) K^φ_i [1 + cos(nφi − γ)]   (dihedrals)

SLIDE 15

Ev = Σ_{(i,j)∈C} [ Aij/R^12_ij − Bij/R^6_ij ]   (van der Waals)

Ee = (1/2) Σ_{(i,j)∈C} qi qj / (ε Rij)   (Coulomb interaction)

SLIDE 16

Docking

Given two macro-molecules M1, M2, find their minimal-energy coupling. If no bonds are changed ⇒ to find the optimal docking it is sufficient to minimize:

Ev + Ee = Σ_{i∈M1, j∈M2} [ Aij/R^12_ij − Bij/R^6_ij ] + (1/2) Σ_{i∈M1, j∈M2} qi qj / (ε Rij)

SLIDE 17

Main algorithmic strategies

Two main families:

  • 1. with global information ("structured problems")
  • 2. without global information ("unstructured problems")

Structured problems ⇒ stochastic and deterministic methods. Unstructured problems ⇒ typically stochastic algorithms. Every global optimization method should try to find a balance between exploration of the feasible region and approximation of the optimum.

SLIDE 18

Example: Lennard-Jones

LJN = min LJ(X) = min Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} [ 1/||Xi − Xj||^12 − 2/||Xi − Xj||^6 ]

This is a highly structured problem. But is it easy/convenient to use its structure? And how?

SLIDE 19

LJ

The map F1 : R^{3N} → R^{N(N−1)/2}_+,

F1(X1, . . . , XN) = ( ||X1 − X2||², . . . , ||XN−1 − XN||² )

is convex, and the function F2 : R^{N(N−1)/2}_+ → R,

F2(r12, . . . , rN−1,N) = Σ_{i<j} ( 1/r^6_ij − 2/r^3_ij )

is the difference between two convex functions. Thus LJ(X) can be seen as the difference between two convex functions (a d.c. programming problem).

SLIDE 20

NB: every C² function is d.c., but often its d.c. decomposition is not known. D.C. optimization is very elegant and there exists a nice duality theory, but algorithms are typically very inefficient.

SLIDE 21

A primal method for d.c. optimization

A "cutting plane" method (just an example, not particularly efficient, useless for high-dimensional problems). Any unconstrained d.c. problem can be represented as an equivalent problem with a linear objective, a convex constraint and a reverse convex constraint. If g, h are convex, then min g(x) − h(x) is equivalent to

min z   s.t. g(x) − h(x) ≤ z

which is equivalent to

min z   s.t. g(x) ≤ w,  h(x) + z ≥ w

SLIDE 22

D.C. canonical form

min c^T x   s.t. g(x) ≤ 0,  h(x) ≥ 0

where h, g are convex. Let Ω = {x : g(x) ≤ 0}, C = {x : h(x) ≤ 0}. Hypotheses: 0 ∈ int Ω ∩ int C, and c^T x > 0 ∀ x ∈ Ω \ int C. Fundamental property: if a d.c. problem admits an optimum, at least one optimum belongs to ∂Ω ∩ ∂C.

SLIDE 23

Discussion of the assumptions

g(0) < 0, h(0) < 0, c^T x > 0 ∀ feasible x. Let x̄ be a solution to the convex problem min{c^T x : g(x) ≤ 0}. If h(x̄) ≥ 0 then x̄ solves the d.c. problem. Otherwise c^T x > c^T x̄ for all feasible x. Coordinate transformation y = x − x̄:

min c^T y   s.t. ĝ(y) ≤ 0,  ĥ(y) ≥ 0

where ĝ(y) = g(y + x̄). Then c^T y > 0 for all feasible solutions and ĥ(0) > 0; by continuity it is possible to choose x̄ so that ĝ(0) < 0.

SLIDE 24

[Figure: the sets Ω and C and the line c^T x = 0]

SLIDE 25

Let x̄ be the best known solution and let D(x̄) = {x ∈ Ω : c^T x ≤ c^T x̄}. If D(x̄) ⊆ C then x̄ is optimal. Check: a polytope P (with known vertices) is built which contains D(x̄). If all vertices of P are in C ⇒ x̄ is optimal. Otherwise let v be the best vertex; the intersection of the segment [0, v] with ∂C (if feasible) is an improving point x. Otherwise a cut tangent to Ω at x is introduced in P.

SLIDE 26

[Figure: Ω, C, the line c^T x = 0, the point x̄ and the region D(x̄) = {x ∈ Ω : c^T x ≤ c^T x̄}]

SLIDE 27

Initialization

Given a feasible solution x̄, take a polytope P such that P ⊇ D(x̄), i.e.

c^T y ≤ c^T x̄, y feasible ⇒ y ∈ P

If P ⊂ C, i.e. if y ∈ P ⇒ h(y) ≤ 0, then x̄ is optimal. Checking this is easy if we know the vertices of P.

SLIDE 28

[Figure: polytope P ⊇ D(x̄) with vertices V1, . . . , Vk; V⋆ := arg max_j h(Vj)]

SLIDE 29

Step 1

Let V⋆ be the vertex of P with largest h() value. Surely h(V⋆) > 0 (otherwise we stop with an optimal solution). Moreover h(0) < 0 (0 is in the interior of C). Thus the segment from V⋆ to 0 must intersect the boundary of C. Let xk be the intersection point. It might be feasible (⇒ improving) or not.

SLIDE 30

[Figure: xk = ∂C ∩ [V⋆, 0]]

SLIDE 31

[Figure: if xk ∈ Ω, set x̄ := xk]

SLIDE 32

[Figure: otherwise, if xk ∉ Ω, the polytope is divided]

SLIDE 33

[Figure: otherwise, if xk ∉ Ω, the polytope is divided (continued)]

SLIDE 34

Duality for d.c. problems

min_{x∈S} g(x) − h(x)

where g, h are convex. Let

h⋆(u) := sup{u^T x − h(x) : x ∈ R^n}
g⋆(u) := sup{u^T x − g(x) : x ∈ R^n}

be the conjugate functions of h and g. The problem

inf{ h⋆(u) − g⋆(u) : u such that h⋆(u) < +∞ }

is the Fenchel-Rockafellar dual. If min g(x) − h(x) admits an optimum, then the Fenchel dual is a strong dual.

SLIDE 35

If x⋆ ∈ arg min g(x) − h(x), then u⋆ ∈ ∂h(x⋆) (∂ denotes the subdifferential) is dual optimal; and if u⋆ ∈ arg min h⋆(u) − g⋆(u), then x⋆ ∈ ∂g⋆(u⋆) is an optimal primal solution.

SLIDE 36

A primal/dual algorithm

Pk : min_x g(x) − ( h(xk) + (x − xk)^T yk )

and

Dk : min_y h⋆(y) − ( g⋆(yk−1) + xk^T (y − yk−1) )
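A minimal sketch of how the primal iteration Pk can be realized, on a toy 1-D d.c. function; the example function and the use of scipy are my own assumptions, not from the slides.

from scipy.optimize import minimize_scalar

# Toy d.c. decomposition f = g - h with g, h convex: f(x) = x^4 - 2x^2
g = lambda x: x**4
h = lambda x: 2.0 * x**2
dh = lambda x: 4.0 * x   # gradient of h, used as yk

x = 3.0  # starting point
for k in range(50):
    yk = dh(x)
    # Pk: minimize g minus the linearization of h around the current iterate
    x = minimize_scalar(lambda z: g(z) - (h(x) + (z - x) * yk)).x
print(x)  # approaches a critical point of x^4 - 2x^2, here x = 1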

SLIDE 37

Exact Global Optimization

SLIDE 38

GlobOpt - relaxations

Consider the global optimization problem

(P)  min f(x), x ∈ X

and assume the min exists and is finite, and that we can use a relaxation

(R)  min g(y), y ∈ Y

Usually both X and Y are subsets of the same space R^n. Recall: (R) is a relaxation of (P) iff:
• X ⊆ Y
• g(x) ≤ f(x) for all x ∈ X

SLIDE 39

Branch and Bound

  • 1. Solve the relaxation (R) and let L be its (global) optimum value (assume it is attained)
  • 2. (Heuristically) solve the original problem (P) (or, more generally, find a "good" feasible solution to (P) in X). Let U be the best feasible function value known
  • 3. If U − L ≤ ε then stop: U is a certified ε-optimum for (P)
  • 4. Otherwise split X and Y into two parts and apply the same method to each of them
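As an illustration, a minimal sketch of this scheme for a one-dimensional box-constrained problem; the caller supplies a valid lower-bounding function (e.g. from a convex under-estimator). All names and the bisection rule are my own choices.

import math

def branch_and_bound(f, lower_bound, lo, hi, eps=1e-6):
    # Certified eps-optimum of f on [lo, hi]; lower_bound(a, b) must be a
    # valid lower bound for f on [a, b] (step 1: solve the relaxation).
    x_best = 0.5 * (lo + hi)
    U = f(x_best)                                 # step 2: incumbent
    boxes = [(lower_bound(lo, hi), lo, hi)]
    while boxes:
        L, a, b = min(boxes, key=lambda t: t[0])  # box with lowest bound
        boxes.remove((L, a, b))
        if U - L <= eps:                          # step 3: certified
            break
        m = 0.5 * (a + b)                         # step 4: split
        if f(m) < U:
            U, x_best = f(m), m
        for a2, b2 in ((a, m), (m, b)):
            L2 = lower_bound(a2, b2)
            if L2 < U - eps:                      # prune hopeless boxes
                boxes.append((L2, a2, b2))
    return x_best, U

# Example: min sin(x) on [0, 6]; |sin'| <= 1 gives a simple valid bound
f = math.sin
lb = lambda a, b: min(f(a), f(b)) - 0.5 * (b - a)
print(branch_and_bound(f, lb, 0.0, 6.0))  # about (4.712, -1.0)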

SLIDE 40

Tools

• "good relaxations": easy yet accurate
• good upper bounding, i.e., good heuristics for (P)

Good relaxations can be obtained, e.g., through:
• convex relaxations
• domain reduction

SLIDE 41

Convex relaxations

Assume X is convex and Y = X. If g is the convex envelope of f on X, then solving the convex relaxation (R) gives, in one step, the certified global optimum of (P).

g(x) is a convex under-estimator of f on X if:
• g(x) is convex
• g(x) ≤ f(x) ∀ x ∈ X

g is the convex envelope of f on X if:
• g is a convex under-estimator of f
• g(x) ≥ h(x) ∀ x ∈ X, for every convex under-estimator h of f

SLIDE 42

A 1-D example

SLIDE 43

Convex under-estimator

SLIDE 44

Branching

SLIDE 45

Bounding

[Figure: branch-and-bound tree with an upper bound, lower bounds and fathomed nodes]

SLIDE 46

Relaxation of the feasible domain

Let

min_{x∈S} f(x)

be a GlobOpt problem where f is convex while S is non-convex. A relaxation (outer approximation) is obtained by replacing S with a larger set Q. If Q is convex ⇒ a convex optimization problem. If the optimal solution of

min_{x∈Q} f(x)

belongs to S ⇒ it is an optimal solution of the original problem.

SLIDE 47

Example

min_{x∈[0,5], y∈[0,3]} −x − 2y
s.t. xy ≤ 3

[Figure: the feasible region]

SLIDE 48

Relaxation

min_{x∈[0,5], y∈[0,3]} −x − 2y
s.t. xy ≤ 3

We know that (x + y)² = x² + y² + 2xy, thus xy = ((x + y)² − x² − y²)/2 and, as x and y are non-negative, x² ≤ 5x and y² ≤ 3y; thus a (convex) relaxation of xy ≤ 3 is

(x + y)² − 5x − 3y ≤ 6
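A quick numeric check of this relaxation; the slides only state the result (next slide), the scipy usage is my own.

import numpy as np
from scipy.optimize import minimize

# Relaxed convex problem: min -x - 2y on [0,5] x [0,3]
# subject to (x + y)^2 - 5x - 3y <= 6
res = minimize(
    lambda v: -v[0] - 2 * v[1],
    x0=np.array([1.0, 1.0]),
    bounds=[(0, 5), (0, 3)],
    constraints=[{"type": "ineq",
                  "fun": lambda v: 6 - (v[0] + v[1]) ** 2 + 5 * v[0] + 3 * v[1]}],
)
print(res.x, res.fun)  # approximately (2, 3), value -8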

SLIDE 49

Relaxation

[Figure: the relaxed feasible region]

Optimal solution of the relaxed convex problem: (2, 3) (value: −8)

SLIDE 50

Stronger Relaxation

min_{x∈[0,5], y∈[0,3]} −x − 2y
s.t. xy ≤ 3

From the bounds: (5 − x)(3 − y) ≥ 0 ⇒ 15 − 3x − 5y + xy ≥ 0 ⇒ xy ≥ 3x + 5y − 15. Thus a (convex) relaxation of xy ≤ 3 is 3x + 5y − 15 ≤ 3, i.e.:

3x + 5y ≤ 18

SLIDE 51

Relaxation

[Figure: the linear relaxation of the feasible region]

The optimal solution of the convex (linear) relaxation is (1, 3), which is feasible ⇒ optimal for the original problem

SLIDE 52

Convex (concave) envelopes

How to build convex envelopes of a function, or how to relax a non-convex constraint?
• Convex envelopes ⇒ lower bounds
• Convex envelopes of −f(x) ⇒ upper bounds
• Constraint g(x) ≤ 0 ⇒ if h(x) is a convex under-estimator of g, then h(x) ≤ 0 is a convex relaxation
• Constraint g(x) ≥ 0 ⇒ if h(x) is concave and h(x) ≥ g(x), then h(x) ≥ 0 is a "convex" constraint

SLIDE 53

Convex envelopes

Definition: a function is polyhedral if it is the pointwise maximum of a finite number of linear (affine) functions. (NB: in general, the convex envelope is the pointwise supremum of affine minorants.)

The generating set X of a function f over a convex set P is

X = {x ∈ R^n : (x, f(x)) is a vertex of epi(convP(f))}

I.e., given f we first build its convex envelope on P and then take the epigraph of the envelope, {(x, y) : x ∈ P, y ≥ convP f(x)}. This is a convex set whose extreme points can be denoted by V; X is the set of x-coordinates of V.

SLIDE 54

Generating sets

[Figure: examples of generating sets]

SLIDE 55

[Figure: further generating-set examples]

SLIDE 56

Characterization

Let f(x) be continuously differentiable on a polytope P. The convex envelope of f on P is polyhedral if and only if X(f) = Vert(P) (the generating set is the vertex set of P).

Corollary: let f1, . . . , fm ∈ C¹(P) possess polyhedral convex envelopes on P. Then

Conv( Σ_i fi(x) ) = Σ_i Conv fi(x)

iff the generating set of Σ_i Conv(fi(x)) is Vert(P)

SLIDE 57

Characterization

If f(x) is such that Conv f(x) is polyhedral, then an affine function h(x) such that:

  • 1. h(x) ≤ f(x) for all x ∈ Vert(P)
  • 2. there exist n + 1 affinely independent vertices of P, V1, . . . , Vn+1, such that f(Vi) = h(Vi), i = 1, . . . , n + 1

belongs to the polyhedral description of Conv f(x), and h(x) = Conv f(x) for any x ∈ conv(V1, . . . , Vn+1).

SLIDE 58

Characterization

The condition may be reversed: given m affine functions h1, . . . , hm such that, for each of them:

  • 1. hj(x) ≤ f(x) for all x ∈ Vert(P)
  • 2. there exist n + 1 affinely independent vertices of P, V1, . . . , Vn+1, such that f(Vi) = hj(Vi), i = 1, . . . , n + 1

then the function ψ(x) = max_j hj(x) is the (polyhedral) convex envelope of f iff:
• the generating set of ψ is Vert(P)
• for every vertex Vi we have ψ(Vi) = f(Vi)

SLIDE 59

Sufficient condition

If f(x) is lower semi-continuous on P and for every x ∈ P \ Vert(P) there exists a line ℓx such that x is in the interior of P ∩ ℓx and f is concave in a neighborhood of x on ℓx, then Conv f(x) is polyhedral.

Application: let f(x) = Σ_{i,j} αij xi xj. The sufficient condition holds for f on [0, 1]^n ⇒ bilinear forms are polyhedral on a hypercube.

SLIDE 60

Application: a bilinear term

(Al-Khayyal, Falk (1983)): let x ∈ [ℓx, ux], y ∈ [ℓy, uy]. Then the convex envelope of xy on [ℓx, ux] × [ℓy, uy] is

φ(x, y) = max{ ℓy x + ℓx y − ℓx ℓy ;  uy x + ux y − ux uy }

In fact, φ(x, y) is an under-estimate of xy:

(x − ℓx)(y − ℓy) ≥ 0 ⇒ xy ≥ ℓy x + ℓx y − ℓx ℓy

and analogously for xy ≥ uy x + ux y − ux uy.
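A minimal sketch of this envelope with a random spot-check of the under-estimation property; the names are mine.

import random

def bilinear_envelope(x, y, lx, ux, ly, uy):
    # Al-Khayyal/Falk convex envelope of xy on [lx, ux] x [ly, uy]
    return max(ly * x + lx * y - lx * ly,
               uy * x + ux * y - ux * uy)

lx, ux, ly, uy = 0.0, 5.0, 0.0, 3.0
for _ in range(1000):
    x, y = random.uniform(lx, ux), random.uniform(ly, uy)
    assert bilinear_envelope(x, y, lx, ux, ly, uy) <= x * y + 1e-12
# exact at the vertices, e.g. (ux, uy):
assert bilinear_envelope(ux, uy, lx, ux, ly, uy) == ux * uy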

SLIDE 61

Bilinear terms

xy ≥ φ(x, y) = max{ ℓy x + ℓx y − ℓx ℓy ;  uy x + ux y − ux uy }

No other (polyhedral) function under-estimating xy is tighter. In fact ℓy x + ℓx y − ℓx ℓy belongs to the convex envelope: it under-estimates xy and coincides with it at 3 vertices ((ℓx, ℓy), (ℓx, uy), (ux, ℓy)). Analogously for the other affine function. All vertices are interpolated by these 2 under-estimating hyperplanes ⇒ they form the convex envelope of xy.

SLIDE 62

All easy then?

Of course not! Many things can go wrong . . .
• It is true that, on the hypercube, a bilinear form Σ_{i<j} αij xi xj is polyhedral (easy to see), but we cannot guarantee in general that the generating set of the envelope is the set of vertices of the hypercube! (in particular, if the α's have opposite signs)
• If the set is not a hypercube, even a bilinear term might be non-polyhedral: e.g. xy on the triangle {0 ≤ x ≤ y ≤ 1}
• Finding the (polyhedral) convex envelope of a bilinear form on a generic polytope P is NP-hard!

SLIDE 63

Fractional terms

A convex under-estimate of a fractional term x/y over a box can be obtained through:

w ≥ ℓx/y + x/uy − ℓx/uy              if ℓx ≥ 0
w ≥ x/uy − ℓx y/(ℓy uy) + ℓx/ℓy      if ℓx < 0
w ≥ ux/y + x/ℓy − ux/ℓy              if ℓx ≥ 0
w ≥ x/ℓy − ux y/(ℓy uy) + ux/uy      if ℓx < 0

(a better under-estimate exists)

SLIDE 64

Univariate concave terms

If f(x), x ∈ [ℓx, ux], is concave, then its convex envelope is simply the linear interpolation at the extremes of the interval:

f(ℓx) + [ (f(ux) − f(ℓx)) / (ux − ℓx) ] (x − ℓx)

SLIDE 65

Underestimating a general nonconvex function

Let f(x) ∈ C² be a general non-convex function. Then a convex under-estimate on a box can be defined as

φ(x) = f(x) − Σ_{i=1}^n αi (xi − ℓi)(ui − xi)

where the αi > 0 are parameters. The Hessian of φ is

∇²φ(x) = ∇²f(x) + 2 diag(α)

and φ is convex iff ∇²φ(x) is positive semi-definite.

SLIDE 66

How to choose the αi's? One possibility: the uniform choice αi = α. In this case convexity of φ is obtained iff

α ≥ max{ 0, −(1/2) min_{x∈[ℓ,u]} λmin(x) }

where λmin(x) is the minimum eigenvalue of ∇²f(x).
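A minimal 1-D sketch of this uniform choice; here α comes from the known bound f'' = −sin ≥ −1, all other details are my own.

import numpy as np

def alpha_bb(f, alpha, l, u):
    # Uniform alpha-BB convex under-estimator of f on [l, u] (1-D case)
    return lambda x: f(x) - alpha * (x - l) * (u - x)

# f(x) = sin(x) on [0, 2*pi]: f''(x) >= -1, so alpha = 1/2 makes phi convex
f = np.sin
phi = alpha_bb(f, alpha=0.5, l=0.0, u=2 * np.pi)

x = np.linspace(0.0, 2 * np.pi, 201)
assert np.all(phi(x) <= f(x) + 1e-12)       # under-estimation
assert np.all(np.diff(phi(x), 2) >= -1e-9)  # discrete convexity check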

SLIDE 67

Key properties

• φ(x) ≤ f(x)
• φ interpolates f at all vertices of [ℓ, u]
• φ is convex
• Maximum separation:

max (f(x) − φ(x)) = (α/4) Σ_i (ui − ℓi)²

Thus the error in under-estimation decreases when the box is split.

SLIDE 68

Estimation of α

Compute an interval Hessian [H] on [ℓ, u]: [H(x)]ij = [h^L_ij(x), h^U_ij(x)]. Find α such that [H] + 2 diag(α) ⪰ 0. Gerschgorin theorem for real matrices:

λmin ≥ min_i ( hii − Σ_{j≠i} |hij| )

Extension to interval matrices:

λmin ≥ min_i ( h^L_ii − Σ_{j≠i} max{|h^L_ij|, |h^U_ij|} (uj − ℓj)/(ui − ℓi) )

SLIDE 69

Improvements

• New relaxation functions (other than quadratic). Example:

Φ(x; γ) = − Σ_{i=1}^n (1 − e^{γi(xi−ℓi)})(1 − e^{γi(ui−xi)})

gives a tighter under-estimate than the quadratic function.
• Partitioning: partition the domain into a small number of regions (hyper-rectangles); evaluate a convex under-estimator in each region; join the under-estimators to form a single convex function on the whole domain.

SLIDE 70

Domain (range) reduction

Techniques for cutting the feasible region without cutting the global optimum solution. Simplest approaches: feasibility-based and optimality-based range reduction (RR). Let the problem be

min_{x∈S} f(x)

Feasibility-based RR asks for solving

ℓi = min{xi : x ∈ S}     ui = max{xi : x ∈ S}

for all i ∈ 1, . . . , n, and then adding the constraints x ∈ [ℓ, u] to the problem (or to the sub-problems generated during Branch & Bound).

SLIDE 71

Feasibility-Based RR

If S is a polyhedron, RR requires the solution of LPs:

[ℓj̄, uj̄] = min / max { xj̄ : Ax ≤ b, x ∈ [L, U] }

"Poor man's" LP-based RR: from every constraint Σ_j aij xj ≤ bi in which aij̄ > 0 we get

xj̄ ≤ (1/aij̄) ( bi − Σ_{j≠j̄} aij xj ) ≤ (1/aij̄) ( bi − Σ_{j≠j̄} min{aij Lj, aij Uj} )
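A minimal sketch of this bound-tightening rule for a single linear constraint; the names are mine.

def poor_mans_rr(a, b, L, U, jbar):
    # Upper bound on x[jbar] implied by sum_j a[j]*x[j] <= b with x in [L, U];
    # requires a[jbar] > 0, the other variables take their most favorable bound.
    assert a[jbar] > 0
    slack = sum(min(a[j] * L[j], a[j] * U[j])
                for j in range(len(a)) if j != jbar)
    return (b - slack) / a[jbar]

# Example: 2*x0 + x1 <= 8 with x1 in [-1, 3]  =>  x0 <= (8 - (-1)) / 2 = 4.5
print(poor_mans_rr([2.0, 1.0], 8.0, [0.0, -1.0], [4.0, 3.0], 0))  # 4.5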

SLIDE 72

Optimality-Based RR

Given an incumbent solution x̄ ∈ S, ranges are updated by solving the sequence

ℓi = min{xi : f̃(x) ≤ f(x̄), x ∈ S}     ui = max{xi : f̃(x) ≤ f(x̄), x ∈ S}

where f̃(x) is a convex under-estimate of f on the current domain. RR can be applied iteratively (i.e., at the end of a complete RR sequence, we might start a new one using the new bounds).

SLIDE 73

Generalization

Let

(P)  min_{x∈X} f(x)  s.t. g(x) ≤ 0

be a (non-convex) problem, and let

(R)  min_{x∈X̄} f̃(x)  s.t. g(x) ≤ 0

be a convex relaxation of (P):

{x ∈ X : g(x) ≤ 0} ⊆ {x ∈ X̄ : g(x) ≤ 0}

and

x ∈ X, g(x) ≤ 0 ⇒ f̃(x) ≤ f(x)

SLIDE 74

R.H.S. perturbation

Let

(Ry)  φ(y) = min_{x∈X̄} f̃(x)  s.t. g(x) ≤ y

be a perturbation of (R). (R) convex ⇒ (Ry) convex for any y. Let x̄ be an optimal solution of (R) and assume that the i-th constraint is active: gi(x̄) = 0. Then, if x̄y is an optimal solution of (Ry), the constraint gi(x) ≤ yi is active at x̄y whenever yi ≤ 0.

SLIDE 75

Duality

Assume (R) has a finite optimum at x̄ with value φ(0) and Lagrange multipliers µ. Then the hyperplane H(y) = φ(0) − µ^T y is a supporting hyperplane of the graph of φ(y) at y = 0, i.e.

φ(y) ≥ φ(0) − µ^T y   ∀ y ∈ R^m

SLIDE 76

Main result

If (R) is convex with optimum value φ(0), constraint i is active at the optimum and its Lagrange multiplier is µi > 0, then, if U is an upper bound for the original problem (P), the constraint

gi(x) ≥ −(U − L)/µi

(where L = φ(0)) is valid for the original problem (P), i.e. it does not exclude any feasible solution with value better than U.

SLIDE 77

Proof

Problem (Ry) can be seen as a convex relaxation of the perturbed non-convex problem

Φ(y) = min_{x∈X} f(x)  s.t. g(x) ≤ y

and thus φ(y) ≤ Φ(y): underestimating (Ry) produces an under-estimate of Φ(y). Let y := ei yi. From duality: L − µi yi ≤ φ(ei yi) ≤ Φ(ei yi). If yi < 0 then U is an upper bound also for Φ(ei yi), thus L − µi yi ≤ U. But if yi < 0 then constraint i is active. For any feasible x there exists a yi < 0 such that gi(x) ≤ yi is active ⇒ we may substitute yi with gi(x) and deduce L − µi gi(x) ≤ U.

SLIDE 78

Applications

Range reduction: let x ∈ [ℓ, u] in the convex relaxed problem. If variable xi is at its upper bound in the optimal solution, then we can deduce

xi ≥ max{ℓi, ui − (U − L)/λi}

where λi is the optimal multiplier associated with the i-th upper bound. Analogously for active lower bounds:

xi ≤ min{ui, ℓi + (U − L)/λi}

SLIDE 79

Let the constraint ai^T x ≤ bi be active in an optimal solution of the convex relaxation (R). Then we can deduce the valid inequality

ai^T x ≥ bi − (U − L)/µi

SLIDE 80

Methods based on "merit functions"

Bayesian algorithms: the objective function is considered as a realization of a stochastic process f(x) = F(x; ω). A loss function is defined, e.g.:

L(x1, . . . , xn; ω) = min_{i=1,...,n} F(xi; ω) − min_x F(x; ω)

and the next point to sample is placed so as to minimize the expected loss (or risk):

xn+1 = arg min E( L(x1, . . . , xn, xn+1) | x1, . . . , xn )

SLIDE 81

Radial basis method

Given k observations (x1, f1), . . . , (xk, fk), an interpolant is built:

s(x) = Σ_{i=1}^k λi Φ(||x − xi||) + p(x)

where p is a polynomial of a (prefixed) small degree m and Φ is a radial function like, e.g.:

Φ(r) = r            linear
Φ(r) = r³           cubic
Φ(r) = r² log r     thin plate spline
Φ(r) = e^{−γr²}     gaussian

The polynomial p is necessary to guarantee the existence of a unique interpolant (i.e. when the matrix {Φij = Φ(||xi − xj||)} is singular).
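A minimal sketch of building such an interpolant with the cubic radial function and a linear polynomial tail; the augmented linear system is the standard way to do this, but the details here are my own.

import numpy as np

def rbf_interpolant(X, f):
    # Cubic RBF interpolant s(x) = sum_i lam[i] * ||x - X[i]||^3 + a^T x + b,
    # solved from the interpolation conditions plus orthogonality P^T lam = 0.
    k, n = X.shape
    Phi = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2) ** 3
    P = np.hstack([X, np.ones((k, 1))])          # linear polynomial tail
    A = np.block([[Phi, P], [P.T, np.zeros((n + 1, n + 1))]])
    coef = np.linalg.solve(A, np.concatenate([f, np.zeros(n + 1)]))
    lam, poly = coef[:k], coef[k:]
    def s(x):
        r = np.linalg.norm(X - x, axis=1) ** 3
        return r @ lam + x @ poly[:-1] + poly[-1]
    return s

# Interpolation check on a few scattered 2-D points
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.3]])
f = np.array([0.0, 1.0, 2.0, 0.5, 1.7])
s = rbf_interpolant(X, f)
print([round(s(x), 6) for x in X])  # reproduces f at the data points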

SLIDE 82

"Bumpiness"

Let f⋆_k be an estimate of the value of the global optimum after k observations. Let s^y_k be the (unique) interpolant of the data points (xi, fi), i = 1, . . . , k, together with (y, f⋆_k). Idea: the most likely location of y is such that the resulting interpolant has minimum "bumpiness". Bumpiness measure:

σ(s^y_k) = (−1)^{m+1} Σ_i λi s^y_k(xi)
Introduction to Global Optimization – p. 8

slide-83
SLIDE 83

TO BE DONE

SLIDE 84

Stochastic methods

• Pure Random Search: uniform random sampling over the feasible region
• Best start: like Pure Random Search, but a local search is started from the best observation
• Multistart: local searches started from randomly generated starting points
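A minimal Multistart sketch using scipy local searches over a box; all concrete choices (sampler, local solver, test function) are mine.

import numpy as np
from scipy.optimize import minimize

def multistart(f, lo, hi, n_starts=20, seed=0):
    # Multistart: local searches from random uniform points in the box [lo, hi]
    rng = np.random.default_rng(seed)
    best_x, best_f = None, np.inf
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)                  # random starting point
        res = minimize(f, x0, bounds=list(zip(lo, hi)))
        if res.fun < best_f:
            best_x, best_f = res.x, res.fun
    return best_x, best_f

# A multimodal test function (2-D Rastrigin)
f = lambda x: np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10)
print(multistart(f, np.array([-5.0, -5.0]), np.array([5.0, 5.0])))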

SLIDE 85

[Figure: randomly sampled points and local minima]

SLIDE 86

[Figure: randomly sampled points and local minima (continued)]

SLIDE 87

Clustering methods

• Given a uniform sample, evaluate the objective function
• Sample transformation (or concentration): either a fraction of "worst" points is discarded, or a few steps of a gradient method are performed
• The remaining points are clustered
• From the best point in each cluster a single local search is started

SLIDE 88

Uniform sample

[Figure: a uniform sample of points]

SLIDE 89

Sample concentration

[Figure: the sample after concentration]

SLIDE 90

Clustering

[Figure: the clustered sample]

SLIDE 91

Local optimization

[Figure: local searches started from the best point of each cluster]

SLIDE 92

Clustering: MLSL

Sampling proceeds in batches of N points. Given sample points X1, . . . , Xk ∈ [0, 1]^n, label Xj as "clustered" iff ∃ Y ∈ {X1, . . . , Xk}:

||Xj − Y|| ≤ Δk := (1/√π) [ Γ(1 + n/2) · σ (log k)/k ]^{1/n}

and f(Y) ≤ f(Xj)
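A minimal sketch of this labeling rule; in MLSL a local search would then be started only from non-clustered points. The names and the σ value are mine.

import numpy as np
from scipy.special import gamma

def mlsl_clustered(X, fvals, sigma=2.0):
    # Label points as "clustered" per the MLSL rule on the slide;
    # X: (k, n) sample in [0,1]^n, fvals: objective values at the points.
    k, n = X.shape
    delta_k = (gamma(1 + n / 2) * sigma * np.log(k) / k) ** (1.0 / n) / np.sqrt(np.pi)
    clustered = np.zeros(k, dtype=bool)
    for j in range(k):
        for i in range(k):
            if i != j and fvals[i] <= fvals[j] \
               and np.linalg.norm(X[j] - X[i]) <= delta_k:
                clustered[j] = True
                break
    return clustered

rng = np.random.default_rng(1)
X = rng.random((50, 2))
f = lambda x: np.sum((x - 0.5) ** 2, axis=-1)
print(mlsl_clustered(X, f(X)).sum(), "of 50 points labeled clustered")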

SLIDE 93

Simple Linkage

A sequential sample is generated (batches consist of a single observation). A local search is started only from the last sampled point (i.e. there is no "recall"), unless there exists a sufficiently near sampled point with a better function value.

SLIDE 94

Smoothing methods

Given f : R^n → R, the Gaussian transform is defined as

fλ(x) = 1/(π^{n/2} λ^n) ∫_{R^n} f(y) exp(−||y − x||²/λ²) dy

When λ is sufficiently large ⇒ fλ is convex. Idea: starting with a large enough λ, minimize the smoothed function and slowly decrease λ towards 0.
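A minimal Monte Carlo sketch of this transform: with the normalization above the kernel is the density of N(x, (λ²/2)·I), so fλ(x) is just an expectation. Implementation choices are mine.

import numpy as np

def gaussian_transform(f, x, lam, n_samples=4000, seed=0):
    # Monte Carlo estimate of f_lambda(x): average f over y ~ N(x, (lam^2/2) I)
    rng = np.random.default_rng(seed)
    y = x + (lam / np.sqrt(2)) * rng.standard_normal((n_samples, x.size))
    return np.mean([f(yi) for yi in y])

# A wiggly 1-D function: smoothing washes out the oscillations
f = lambda x: float(x @ x + np.sin(10 * x).sum())
x = np.array([0.3])
for lam in (0.1, 1.0, 3.0):
    print(lam, round(gaussian_transform(f, x, lam), 3))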

SLIDE 95

Smoothing methods

[Figure: a multimodal function and its Gaussian transform]

SLIDE 96

[Figure: the Gaussian transform as λ decreases]

SLIDE 97

[Figure: the Gaussian transform as λ decreases]

SLIDE 98

[Figure: the Gaussian transform as λ decreases]

SLIDE 99

[Figure: the Gaussian transform as λ decreases]

SLIDE 100

Transformed function landscape

Elementary idea: local optimization smooths out many “high frequency” oscillations

SLIDE 101

[Figure: function landscape transformed by local optimization]

SLIDE 102

[Figure: function landscape transformed by local optimization]

SLIDE 103

[Figure: function landscape transformed by local optimization]

SLIDE 104

Monotonic Basin-Hopping

k := 0; f⋆ := +∞
while k < MaxIter do
    Xk := random initial solution
    X⋆k := arg min f(x; Xk)    (local minimization started at Xk)
    fk := f(X⋆k)
    if fk < f⋆ then f⋆ := fk
    NoImprove := 0
    while NoImprove < MaxImprove do
        X := random perturbation of Xk
        Y := arg min f(x; X)    (local minimization started at X)
        if f(Y) < f⋆ then Xk := Y; NoImprove := 0; f⋆ := f(Y)
        otherwise NoImprove++
    end while
end while
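A minimal Python sketch of this pseudocode; the scipy local searches, normal perturbations and all parameter values are my own choices.

import numpy as np
from scipy.optimize import minimize

def mbh(f, dim, lo, hi, max_iter=10, max_no_improve=50, step=0.3, seed=0):
    # Monotonic Basin-Hopping as in the pseudocode above
    rng = np.random.default_rng(seed)
    f_star, x_star = np.inf, None
    for _ in range(max_iter):
        xk = minimize(f, rng.uniform(lo, hi, dim)).x  # local min from random start
        if f(xk) < f_star:
            f_star, x_star = f(xk), xk
        no_improve = 0
        while no_improve < max_no_improve:
            y = minimize(f, xk + step * rng.standard_normal(dim)).x
            if f(y) < f_star:                         # monotonic acceptance
                xk, f_star, x_star, no_improve = y, f(y), y, 0
            else:
                no_improve += 1
    return x_star, f_star

f = lambda x: np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10)  # Rastrigin
print(mbh(f, dim=2, lo=-5.0, hi=5.0))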

SLIDE 105

[Figure: Monotonic Basin-Hopping iterations]

SLIDE 106

[Figure: Monotonic Basin-Hopping iterations]

SLIDE 107

[Figure: Monotonic Basin-Hopping iterations]

SLIDE 108

[Figure: Monotonic Basin-Hopping iterations]

SLIDE 109

[Figure: Monotonic Basin-Hopping iterations]