SLIDE 1

Algorithms for constrained local optimization

Fabio Schoen 2008

http://gol.dsi.unifi.it/users/schoen

SLIDE 2

Feasible direction methods

SLIDE 3

Frank–Wolfe method

Let $X$ be a convex set. Consider the problem
$$\min_{x \in X} f(x)$$
Let $x_k \in X$. Choosing a feasible direction $d_k$ corresponds to choosing a point $x \in X$: $d_k = x - x_k$. The "steepest descent" choice solves
$$\min_{x \in X} \nabla^T f(x_k)\,(x - x_k)$$
(a linear objective with convex constraints, usually easy to solve). Let $\hat{x}_k$ be an optimal solution of this problem.

SLIDE 4

Frank–Wolfe

If $\nabla^T f(x_k)(\hat{x}_k - x_k) = 0$ then $\nabla^T f(x_k)\, d \geq 0$ for every feasible direction $d$, so the first order necessary conditions hold. Otherwise, letting $d_k = \hat{x}_k - x_k$, this is a descent direction along which a step $\alpha_k \in (0, 1]$ might be chosen according to Armijo's rule.

SLIDE 5

Convergence of Frank-Wolfe method

Under mild conditions the method converges to a point satisfying first order necessary conditions. However, it is usually extremely slow (convergence may be sub-linear). It might find applications in very large scale problems in which solving the sub-problem for direction determination is very easy (e.g. when $X$ is a polytope).
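A minimal sketch of the iteration in Python, assuming SciPy is available, for a bounded polytope $X = \{x : Ax \leq b\}$; the problem data, tolerance, and Armijo constants are placeholders, not part of the slides:

```python
import numpy as np
from scipy.optimize import linprog

def frank_wolfe(f, grad, A_ub, b_ub, x0, max_iter=200, tol=1e-8):
    x = np.asarray(x0, float)
    for _ in range(max_iter):
        g = grad(x)
        # Direction subproblem: minimize the linear model over X (an LP here).
        res = linprog(c=g, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
        d = res.x - x                     # d_k = x_hat_k - x_k
        if g @ d > -tol:                  # first order conditions (approximately) hold
            break
        # Armijo backtracking on the step alpha in (0, 1].
        alpha, beta, sigma = 1.0, 0.5, 1e-4
        while f(x + alpha * d) > f(x) + sigma * alpha * (g @ d):
            alpha *= beta
        x = x + alpha * d
    return x
```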

SLIDE 6

Gradient Projection methods

Generic iteration:
$$x_{k+1} = x_k + \alpha_k (\bar{x}_k - x_k)$$
where the direction $d_k = \bar{x}_k - x_k$ is obtained by finding
$$\bar{x}_k = [x_k - s_k \nabla f(x_k)]^+$$
where $s_k \in \mathbb{R}_+$ and $[\cdot]^+$ denotes projection onto the feasible set.

SLIDE 7

The method is slightly faster than Frank-Wolfe, with a linear convergence rate similar to that of (unconstrained) steepest descent. It might be applied when projection is relatively cheap, e.g. when the feasible set is a box. A point $x_k$ satisfies the first order necessary conditions ($d^T \nabla f(x_k) \geq 0$ for every feasible direction $d$) iff $x_k = [x_k - s_k \nabla f(x_k)]^+$.
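For the box case the projection is componentwise clipping; a minimal sketch (the fixed step $s$ and the unit step $\alpha_k = 1$ are placeholder choices):

```python
import numpy as np

def projected_gradient(grad, lo, hi, x0, s=0.1, max_iter=500, tol=1e-8):
    """Gradient projection on the box lo <= x <= hi (elementwise)."""
    x = np.clip(np.asarray(x0, float), lo, hi)
    for _ in range(max_iter):
        x_bar = np.clip(x - s * grad(x), lo, hi)   # [x_k - s_k grad f(x_k)]^+
        if np.linalg.norm(x_bar - x) < tol:        # fixed point <=> first order conditions
            break
        x = x_bar                                  # full step alpha_k = 1
    return x
```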

SLIDE 8

Lagrange Multiplier Algorithms

SLIDE 9

Barrier Methods

$$\min f(x) \quad \text{s.t. } g_j(x) \leq 0, \; j = 1, \ldots, r$$
A barrier is a continuous function which tends to $+\infty$ whenever $x$ approaches the boundary of the feasible region. Examples of barrier functions:
$$B(x) = -\sum_j \log(-g_j(x)) \qquad \text{(logarithmic barrier)}$$
$$B(x) = -\sum_j \frac{1}{g_j(x)} \qquad \text{(inverse barrier)}$$

SLIDE 10

Barrier Method

Let $\varepsilon_k \downarrow 0$ and $x_0$ be strictly feasible, i.e. $g_j(x_0) < 0 \;\forall j$. Then let
$$x_k = \arg\min_{x \in \mathbb{R}^n} \left( f(x) + \varepsilon_k B(x) \right)$$
Proposition: every limit point of $\{x_k\}$ is a global minimum of the constrained optimization problem.

SLIDE 11

Analysis of Barrier methods

Special case: a single constraint (might be generalized). Let $\bar{x}$ be a limit point of $\{x_k\}$ (a global minimum). If the KKT conditions hold, then there exists a unique $\lambda \geq 0$ such that $\nabla f(\bar{x}) + \lambda \nabla g(\bar{x}) = 0$ (with $\lambda g(\bar{x}) = 0$). The point $x_k$, solution of the barrier problem
$$\min f(x) + \varepsilon_k B(x) \quad \text{s.t. } g(x) < 0,$$
satisfies $\nabla f(x_k) + \varepsilon_k \nabla B(x_k) = 0$.

SLIDE 12

. . .

If $B(x) = \varphi(g(x))$, then
$$\nabla f(x_k) + \varepsilon_k \varphi'(g(x_k)) \nabla g(x_k) = 0$$
In the limit, for $k \to \infty$:
$$\lim_k \varepsilon_k \varphi'(g(x_k)) \nabla g(x_k) = \lambda \nabla g(\bar{x})$$
If $\lim_k g(x_k) < 0$, then $\varphi'(g(x_k)) \nabla g(x_k) \to K$ (finite) and $K \varepsilon_k \to 0$, so $\lambda = 0$. If $\lim_k g(x_k) = 0$, then (thanks to the uniqueness of the Lagrange multiplier)
$$\lambda = \lim_k \varepsilon_k \varphi'(g(x_k))$$

SLIDE 13

Difficulties in Barrier Methods

- strong numerical instability: the condition number of the Hessian matrix grows as $\varepsilon_k \to 0$
- need for an initial strictly feasible point $x_0$
- (partial) remedy: $\varepsilon_k$ is decreased very slowly and the solution of the $(k+1)$-th problem is obtained by starting an unconstrained optimization from $x_k$

SLIDE 14

Example

$$\min (x-1)^2 + (y-1)^2 \quad \text{s.t. } x + y \leq 1$$
Logarithmic barrier problem:
$$\min (x-1)^2 + (y-1)^2 - \varepsilon_k \log(1 - x - y), \qquad x + y - 1 < 0$$
Gradient:
$$\begin{pmatrix} 2(x-1) + \frac{\varepsilon_k}{1-x-y} \\[4pt] 2(y-1) + \frac{\varepsilon_k}{1-x-y} \end{pmatrix}$$
Stationary points:
$$x = y = \frac{3}{4} \pm \frac{\sqrt{1 + 4\varepsilon_k}}{4}$$
(only the "$-$" solution is acceptable).
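A quick numerical check of the example, assuming SciPy (the starting point and value of $\varepsilon$ are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

eps = 0.01
def barrier_obj(v):
    x, y = v
    if x + y >= 1:                 # outside the barrier's domain
        return np.inf
    return (x - 1)**2 + (y - 1)**2 - eps * np.log(1 - x - y)

sol = minimize(barrier_obj, x0=[0.25, 0.25], method="Nelder-Mead").x
print(sol)                                  # approx [0.495, 0.495]
print((3 - np.sqrt(1 + 4 * eps)) / 4)       # closed-form "-" root: approx 0.495
```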

SLIDE 15

Barrier methods and LP

$$\min c^T x \quad \text{s.t. } Ax = b, \; x \geq 0$$
Logarithmic barrier on $x \geq 0$:
$$\min c^T x - \varepsilon \sum_j \log x_j \quad \text{s.t. } Ax = b, \; x > 0$$

SLIDE 16

The central path

The starting point is usually associated with $\varepsilon = \infty$ and is the unique solution of
$$\min -\sum_j \log x_j \quad \text{s.t. } Ax = b, \; x > 0$$
The trajectory $x(\varepsilon)$ of solutions to the barrier problem is called the central path and leads to an optimal solution of the LP.
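As an illustration (a toy example of mine, not from the slides), the central path of the LP $\min x_1$ s.t. $x_1 + x_2 = 1$, $x \geq 0$ can be traced by eliminating $x_2 = 1 - x_1$ and solving a scalar barrier problem for each $\varepsilon$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def central_path_point(eps):
    # minimize x1 - eps*(log x1 + log(1 - x1)) over 0 < x1 < 1
    obj = lambda x1: x1 - eps * (np.log(x1) + np.log(1 - x1))
    x1 = minimize_scalar(obj, bounds=(1e-9, 1 - 1e-9), method="bounded").x
    return np.array([x1, 1 - x1])

for eps in [10.0, 1.0, 0.1, 0.01]:
    print(eps, central_path_point(eps))   # x(eps) -> optimal vertex (0, 1) as eps -> 0
```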

SLIDE 17

Penalty Methods

Penalized problem:
$$\min f(x) + \rho P(x)$$
where $\rho > 0$ and $P(x) \geq 0$, with $P(x) = 0$ if $x$ is feasible. Example: for
$$\min f(x) \quad \text{s.t. } h_i(x) = 0, \; i = 1, \ldots, m$$
a penalized problem might be:
$$\min f(x) + \rho \sum_i h_i(x)^2$$

SLIDE 18

Convergence of the quadratic penalty method

(for equality constrained problems): let
$$P(x; \rho) = f(x) + \rho \sum_i h_i(x)^2$$
Given $\rho_0 > 0$, $x_0 \in \mathbb{R}^n$, $k = 0$, let $x_{k+1} = \arg\min P(x; \rho_k)$ (found with an iterative method initialized at $x_k$); let $\rho_{k+1} > \rho_k$, $k := k + 1$. If $x_{k+1}$ is a global minimizer of $P$ and $\rho_k \to \infty$, then every limit point of $\{x_k\}$ is a global optimum of the constrained problem.
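A minimal sketch of the scheme, assuming SciPy; the growth factor for $\rho_k$, the inner solver, and the worked example are placeholders of mine:

```python
import numpy as np
from scipy.optimize import minimize

def quadratic_penalty(f, hs, x0, rho0=1.0, grow=10.0, n_outer=8):
    x, rho = np.asarray(x0, float), rho0
    for _ in range(n_outer):
        penalized = lambda x, rho=rho: f(x) + rho * sum(h(x)**2 for h in hs)
        x = minimize(penalized, x).x     # inner solve, warm started at previous x_k
        rho *= grow                      # increase the penalty parameter
    return x

# Example: min x0 + x1  s.t.  x0^2 + x1^2 = 2   (optimum at (-1, -1))
f = lambda x: x[0] + x[1]
hs = [lambda x: x[0]**2 + x[1]**2 - 2]
print(quadratic_penalty(f, hs, [0.5, 0.0]))   # approx (-1, -1)
```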

SLIDE 19

Exact penalties

Exact penalties: there exists a penalty parameter value such that the optimal solution of the penalized problem is the optimal solution of the original one.
$\ell_1$ penalty function:
$$P_1(x; \rho) = f(x) + \rho \sum_i |h_i(x)|$$

SLIDE 20

Exact penalties

For inequality constrained problems
$$\min f(x) \quad \text{s.t. } h_i(x) = 0, \; g_j(x) \leq 0$$
the penalized problem is
$$P_1(x; \rho) = f(x) + \rho \sum_i |h_i(x)| + \rho \sum_j \max(0, g_j(x))$$

SLIDE 21

Augmented Lagrangian method

Given an equality constrained problem, reformulate it as:
$$\min f(x) + \tfrac{1}{2} \rho \|h(x)\|^2 \quad \text{s.t. } h(x) = 0$$
The Lagrange function of this problem is called the Augmented Lagrangian:
$$L_\rho(x; \lambda) = f(x) + \tfrac{1}{2} \rho \|h(x)\|^2 + \lambda^T h(x)$$

SLIDE 22

Motivation

$$\min_x f(x) + \tfrac{1}{2} \rho \|h(x)\|^2 + \lambda^T h(x)$$
$$\nabla_x L_\rho(x, \lambda) = \nabla f(x) + \sum_i \lambda_i \nabla h_i(x) + \rho \sum_i h_i(x) \nabla h_i(x) = \nabla_x L(x, \lambda) + \rho \sum_i h_i(x) \nabla h_i(x)$$
$$\nabla^2_{xx} L_\rho(x, \lambda) = \nabla^2 f(x) + \sum_i \lambda_i \nabla^2 h_i(x) + \rho \sum_i h_i(x) \nabla^2 h_i(x) + \rho \sum_i \nabla h_i(x) \nabla^T h_i(x)$$
$$= \nabla^2_{xx} L(x, \lambda) + \rho \sum_i h_i(x) \nabla^2 h_i(x) + \rho \sum_i \nabla h_i(x) \nabla^T h_i(x)$$

SLIDE 23

motivation . . .

Let $(x^\star, \lambda^\star)$ be an optimal (primal and dual) solution. Necessarily $\nabla_x L(x^\star, \lambda^\star) = 0$; moreover $h(x^\star) = 0$, thus
$$\nabla_x L_\rho(x^\star, \lambda^\star) = \nabla_x L(x^\star, \lambda^\star) + \rho \sum_i h_i(x^\star) \nabla h_i(x^\star) = 0$$
$\Rightarrow (x^\star, \lambda^\star)$ is a stationary point of the augmented Lagrangian.

SLIDE 24

motivation . . .

Observe that at $x^\star$ (where $h(x^\star) = 0$):
$$\nabla^2_{xx} L_\rho(x^\star, \lambda^\star) = \nabla^2_{xx} L(x^\star, \lambda^\star) + \rho \sum_i h_i(x^\star) \nabla^2 h_i(x^\star) + \rho \sum_i \nabla h_i(x^\star) \nabla^T h_i(x^\star)$$
$$= \nabla^2_{xx} L(x^\star, \lambda^\star) + \rho \sum_i \nabla h_i(x^\star) \nabla^T h_i(x^\star)$$
Assume that the sufficient optimality conditions hold:
$$v^T \nabla^2_{xx} L(x^\star, \lambda^\star) v > 0 \qquad \forall\, v \neq 0 : v^T \nabla h(x^\star) = 0$$
SLIDE 25

. . .

Let $v \neq 0$ with $v^T \nabla h(x^\star) = 0$. Then
$$v^T \nabla^2_{xx} L_\rho(x^\star, \lambda^\star) v = v^T \nabla^2_{xx} L(x^\star, \lambda^\star) v + \rho \left( v^T \nabla h(x^\star) \right)^2 = v^T \nabla^2_{xx} L(x^\star, \lambda^\star) v > 0$$
SLIDE 26

. . .

Let $v \neq 0$ with $v^T \nabla h(x^\star) \neq 0$. Then
$$v^T \nabla^2_{xx} L_\rho(x^\star, \lambda^\star) v = v^T \nabla^2_{xx} L(x^\star, \lambda^\star) v + \rho \left( v^T \nabla h(x^\star) \right)^2$$
where the first term might be negative. However, there exists $\bar\rho > 0$ such that if $\rho \geq \bar\rho$ then $v^T \nabla^2_{xx} L_\rho(x^\star, \lambda^\star) v > 0$.
Thus, if $\rho$ is large enough, the Hessian of the augmented Lagrangian is positive definite and $x^\star$ is a (strict) local minimum of $L_\rho(\cdot, \lambda^\star)$.

SLIDE 27

Inequality constraints

$$\min f(x) \quad \text{s.t. } g(x) \leq 0$$
Nonlinear transformation of inequalities into equalities:
$$\min_{x,s} f(x) \quad \text{s.t. } g_j(x) + s_j^2 = 0, \; j = 1, \ldots, p$$
SLIDE 28

Given the problem
$$\min f(x) \quad \text{s.t. } h_i(x) = 0, \; i = 1, \ldots, m, \quad g_j(x) \leq 0, \; j = 1, \ldots, p$$
an Augmented Lagrangian problem might be defined as
$$\min_{x,z} L_\rho(x, z; \lambda, \mu) = \min_{x,z} f(x) + \lambda^T h(x) + \tfrac{1}{2} \rho \|h(x)\|^2 + \sum_j \mu_j \left( g_j(x) + z_j^2 \right) + \tfrac{1}{2} \rho \sum_j \left( g_j(x) + z_j^2 \right)^2$$

SLIDE 29

. . .

Consider minimization with respect to the $z$ variables:
$$\min_z \sum_j \left[ \mu_j \left( g_j(x) + z_j^2 \right) + \tfrac{1}{2} \rho \left( g_j(x) + z_j^2 \right)^2 \right] = \min_{u \geq 0} \sum_j \left[ \mu_j \left( g_j(x) + u_j \right) + \tfrac{1}{2} \rho \left( g_j(x) + u_j \right)^2 \right]$$
(quadratic minimization over the nonnegative orthant). Solution: $u_j^\star = \max\{0, \bar{u}_j\}$, where $\bar{u}$ is the unconstrained optimum:
$$\bar{u} : \quad \mu_j + \rho \left( g_j(x) + \bar{u}_j \right) = 0$$

SLIDE 30

. . .

Thus $u_j^\star = \max\{0, -\frac{\mu_j}{\rho} - g_j(x)\}$. Substituting:
$$L_\rho(x; \lambda, \mu) = f(x) + \lambda^T h(x) + \tfrac{1}{2} \rho \|h(x)\|^2 + \frac{1}{2\rho} \sum_j \left( \max\{0, \mu_j + \rho g_j(x)\}^2 - \mu_j^2 \right)$$
This is an Augmented Lagrangian for inequality constrained problems.
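For concreteness, a small sketch that evaluates this function (the callables and shapes are conventions of mine: h and g return 1-D arrays, lam and mu are matching multiplier vectors):

```python
import numpy as np

def aug_lagrangian_ineq(f, h, g, x, lam, mu, rho):
    """L_rho(x; lam, mu) for  h(x) = 0  and  g(x) <= 0, after eliminating z."""
    hx, gx = np.atleast_1d(h(x)), np.atleast_1d(g(x))
    eq_part = lam @ hx + 0.5 * rho * (hx @ hx)
    ineq_part = np.sum(np.maximum(0.0, mu + rho * gx)**2 - mu**2) / (2 * rho)
    return f(x) + eq_part + ineq_part
```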

SLIDE 31

Sequential Quadratic Programming

$$\min f(x) \quad \text{s.t. } h_i(x) = 0$$
Idea: apply Newton's method to solve the KKT equations. Lagrangian function:
$$L(x; \lambda) = f(x) + \sum_i \lambda_i h_i(x)$$
Let $H(x) = [h_i(x)]$ and $\nabla H(x) = [\nabla h_i(x)]$. KKT conditions:
$$F(x; \lambda) = \begin{pmatrix} \nabla f(x) + \nabla H^T(x) \lambda \\ H(x) \end{pmatrix} = 0$$

SLIDE 32

Newton step for SQP

Jacobian of the KKT system:
$$F'(x, \lambda) = \begin{pmatrix} \nabla^2_{xx} L(x; \lambda) & \nabla H^T(x) \\ \nabla H(x) & 0 \end{pmatrix}$$
Newton step:
$$\begin{pmatrix} x_{k+1} \\ \lambda_{k+1} \end{pmatrix} = \begin{pmatrix} x_k \\ \lambda_k \end{pmatrix} + \begin{pmatrix} d_k \\ \Delta_k \end{pmatrix}$$
where
$$\begin{pmatrix} \nabla^2_{xx} L(x_k; \lambda_k) & \nabla H^T(x_k) \\ \nabla H(x_k) & 0 \end{pmatrix} \begin{pmatrix} d_k \\ \Delta_k \end{pmatrix} = \begin{pmatrix} -\nabla f(x_k) - \nabla H^T(x_k) \lambda_k \\ -H(x_k) \end{pmatrix}$$

SLIDE 33

existence

The Newton step exists if:
- the Jacobian of the constraints $\nabla H(x_k)$ has full row rank;
- the Hessian $\nabla^2_{xx} L(x_k; \lambda_k)$ is positive definite.
In this case the Newton step is the unique solution of
$$\nabla^2_{xx} L(x_k; \lambda_k) d_k + \nabla H^T(x_k) \Delta_k + \nabla f(x_k) + \nabla H^T(x_k) \lambda_k = 0$$
$$\nabla H(x_k) d_k + H(x_k) = 0$$
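A minimal sketch of one Newton step on this system, assuming NumPy; the callables and the worked example (min $x_0 + x_1$ s.t. $x_0^2 + x_1^2 = 2$) are placeholders of mine:

```python
import numpy as np

def sqp_newton_step(grad_f, hess_L, H, jac_H, x, lam):
    A = np.atleast_2d(jac_H(x))            # m x n constraint Jacobian, grad H(x)
    W = hess_L(x, lam)                     # Hessian of the Lagrangian
    m, n = A.shape
    KKT = np.block([[W, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-grad_f(x) - A.T @ lam, -np.atleast_1d(H(x))])
    step = np.linalg.solve(KKT, rhs)       # (d_k, Delta_k)
    return x + step[:n], lam + step[n:]

x, lam = np.array([-1.5, -0.5]), np.array([0.3])
for _ in range(8):
    x, lam = sqp_newton_step(lambda x: np.ones(2),                  # grad f
                             lambda x, lam: 2 * lam[0] * np.eye(2), # hess of L
                             lambda x: x @ x - 2,                   # H(x)
                             lambda x: 2 * x,                       # grad H
                             x, lam)
print(x, lam)   # approx (-1, -1) and lambda approx 0.5
```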

SLIDE 34

Alternative view: SQP

$$\min_d f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2} d^T \nabla^2_{xx} L(x_k; \lambda_k) d \quad \text{s.t. } \nabla H(x_k) d + H(x_k) = 0$$
KKT conditions:
$$\nabla^2_{xx} L(x_k; \lambda_k) d + \nabla f(x_k) + \nabla H^T(x_k) \Lambda = 0$$
Under the same conditions as before, this QP has a unique solution $d_k$, with Lagrange multipliers $\Lambda_k = \lambda_{k+1}$.

SLIDE 35

Alternative view: SQP

$$\min_d L(x_k, \lambda_k) + \nabla_x^T L(x_k, \lambda_k) d + \tfrac{1}{2} d^T \nabla^2_{xx} L(x_k; \lambda_k) d \quad \text{s.t. } \nabla H(x_k) d + H(x_k) = 0$$
KKT conditions:
$$\nabla^2_{xx} L(x_k; \lambda_k) d + \nabla f(x_k) + \nabla H^T(x_k) \lambda_k + \nabla H^T(x_k) \Lambda = 0$$
Under the same conditions as before, this QP has a unique solution $d_k$, with Lagrange multipliers $\Lambda_k = \Delta_k$ (so that $\lambda_{k+1} = \lambda_k + \Lambda_k$).

SLIDE 36

Thus SQP can be seen as a method which minimizes a quadratic approximation to the Lagrangian subject to a first order approximation of the constraints.

SLIDE 37

Inequalities

If the original problem is
$$\min f(x) \quad \text{s.t. } h_i(x) = 0, \; g_j(x) \leq 0$$
then the SQP iteration solves
$$\min_d f_k + \nabla f(x_k)^T d + \tfrac{1}{2} d^T \nabla^2_{xx} L(x_k, \lambda_k) d$$
$$\text{s.t. } \nabla^T h_i(x_k) d + h_i(x_k) = 0, \qquad \nabla^T g_j(x_k) d + g_j(x_k) \leq 0$$

SLIDE 38

Filter Methods

Basic idea: $\min f(x)$ s.t. $g(x) \leq 0$ can be considered as a problem with two objectives: minimize $f(x)$ and minimize the violation of $g(x) \leq 0$ (the second objective has priority over the first).

SLIDE 39

Filter

Given the problem
$$\min f(x) \quad \text{s.t. } g_j(x) \leq 0, \; j = 1, \ldots, r$$
let us consider the bi-criteria optimization problem
$$\min f(x), \qquad \min h(x), \quad \text{where } h(x) = \sum_j \max\{g_j(x), 0\}$$

SLIDE 40

Let $\{(f_k, h_k),\; k = 1, 2, \ldots\}$ be the observed values of $f$ and $h$ at points $x_1, x_2, \ldots$. A pair $(f_k, h_k)$ dominates a pair $(f_\ell, h_\ell)$ iff
$$f_k \leq f_\ell \quad \text{and} \quad h_k \leq h_\ell$$
A filter is a list of pairs none of which is dominated by any of the others.
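A minimal sketch of the filter bookkeeping (the function names are mine):

```python
def dominates(a, b):
    """(f_a, h_a) dominates (f_b, h_b) iff it is no worse in both criteria."""
    return a[0] <= b[0] and a[1] <= b[1]

def acceptable(filt, pair):
    """A trial pair is acceptable iff no stored pair dominates it."""
    return not any(dominates(p, pair) for p in filt)

def add_to_filter(filt, pair):
    """Insert an acceptable pair, dropping stored entries it dominates."""
    if acceptable(filt, pair):
        filt[:] = [p for p in filt if not dominates(pair, p)] + [pair]
    return filt
```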

SLIDE 41

[Figure: filter entries plotted in the $(h(x), f(x))$ plane.]

SLIDE 42

Trust region SQP

Consider a trust-region SQP method:
$$\min_d f_k + \nabla L(x_k; \lambda_k)^T d + \tfrac{1}{2} d^T \nabla^2_{xx} L(x_k; \lambda_k) d$$
$$\text{s.t. } \nabla^T g_j(x_k) d + g_j(x_k) \leq 0, \qquad \|d\|_\infty \leq \rho$$
(the $\infty$-norm is used here in order to keep the problem a QP). Traditional (unconstrained) trust region methods: if the current step is a failure, reduce the trust region; eventually the step will become a pure gradient step, hence convergence!

SLIDE 43

Trust region SQP

Here, diminishing the trust region radius might lead to infeasible QPs:
$$g_j(x) \leq 0, \qquad \nabla^T g_j(x_k) d + g_j(x_k) \leq 0$$
[Figure: an infeasible point $x_k$ whose linearized constraint cannot be satisfied within a small trust region.]

SLIDE 44

Filter methods

Data: $x_0$ (starting point), $\rho$; $k = 0$
while convergence criterion not satisfied do
    if QP is infeasible then
        find $x_{k+1}$ minimizing the constraint violation
    else
        solve the QP and get a step $d_k$; try setting $x_{k+1} = x_k + d_k$
        if $(f_{k+1}, h_{k+1})$ is acceptable to the filter then
            accept $x_{k+1}$ and add $(f_{k+1}, h_{k+1})$ to the filter;
            remove dominated pairs from the filter; possibly increase $\rho$
        else
            reject the step; reduce $\rho$
        end
    end
    set $k = k + 1$
end

SLIDE 45

Comparison with other methods

[Figure: in the $(h(x), f(x))$ plane — steps acceptable to the filter, steps acceptable to a "classical" method, and rejected filter steps.]