slide-1
SLIDE 1

Reduced-Hessian Methods for Constrained Optimization

Philip E. Gill

University of California, San Diego Joint work with: Michael Ferry & Elizabeth Wong

11th US & Mexico Workshop on Optimization and its Applications Huatulco, Mexico, January 8–12, 2018.

UC San Diego | Center for Computational Mathematics 1/45

slide-2
SLIDE 2

Our honoree . . .

UC San Diego | Center for Computational Mathematics 2/45

slide-3
SLIDE 3

Outline

1. Reduced-Hessian Methods for Unconstrained Optimization
2. Bound-Constrained Optimization
3. Quasi-Wolfe Line Search
4. Reduced-Hessian Methods for Bound-Constrained Optimization
5. Some Numerical Results

UC San Diego | Center for Computational Mathematics 3/45

slide-4
SLIDE 4

Reduced-Hessian Methods for Unconstrained Optimization

UC San Diego | Center for Computational Mathematics 4/45

slide-5
SLIDE 5

Definitions

Minimize $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^2$, with a quasi-Newton line-search method. Given $x_k$, let $f_k = f(x_k)$, $g_k = \nabla f(x_k)$, and $H_k \approx \nabla^2 f(x_k)$. Choose $p_k$ such that $x_k + p_k$ minimizes the quadratic model

$$q_k(x) = f_k + g_k^T (x - x_k) + \tfrac{1}{2} (x - x_k)^T H_k (x - x_k).$$

If $H_k$ is positive definite, then $p_k$ satisfies $H_k p_k = -g_k$ (qN step).

UC San Diego | Center for Computational Mathematics 5/45

slide-6
SLIDE 6

Definitions

Define $x_{k+1} = x_k + \alpha_k p_k$, where $\alpha_k$ is obtained from a line search on $\varphi_k(\alpha) = f(x_k + \alpha p_k)$.

  • Armijo condition:

    $$\varphi_k(\alpha) < \varphi_k(0) + \eta_A \alpha \varphi_k'(0), \qquad \eta_A \in (0, \tfrac{1}{2})$$

  • (strong) Wolfe conditions:

    $$\varphi_k(\alpha) < \varphi_k(0) + \eta_A \alpha \varphi_k'(0), \qquad \eta_A \in (0, \tfrac{1}{2})$$

    $$|\varphi_k'(\alpha)| \le \eta_W |\varphi_k'(0)|, \qquad \eta_W \in (\eta_A, 1)$$

UC San Diego | Center for Computational Mathematics 6/45
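For concreteness, a minimal Python sketch of the strong Wolfe test on $\varphi_k$ (the function handles and the values of $\eta_A$ and $\eta_W$ are illustrative assumptions):

```python
def satisfies_strong_wolfe(phi, dphi, alpha, eta_A=1e-4, eta_W=0.9):
    """Strong Wolfe test: sufficient decrease plus the curvature condition."""
    phi0, dphi0 = phi(0.0), dphi(0.0)
    armijo = phi(alpha) < phi0 + eta_A * alpha * dphi0       # Armijo condition
    curvature = abs(dphi(alpha)) <= eta_W * abs(dphi0)       # |phi'(alpha)| <= eta_W |phi'(0)|
    return armijo and curvature

# Example with f(x) = x^2, x_k = 1, p_k = -1, so phi(alpha) = (1 - alpha)^2:
phi  = lambda a: (1.0 - a) ** 2
dphi = lambda a: -2.0 * (1.0 - a)
print(satisfies_strong_wolfe(phi, dphi, alpha=0.9))          # True
```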

slide-7
SLIDE 7

[Figure] The Wolfe conditions on $f(x_k + \alpha p_k)$ as a function of $\alpha$.

slide-8
SLIDE 8

Quasi-Newton Methods

Updating $H_k$:

  • $H_0 = \sigma I_n$ where $\sigma > 0$.
  • Compute $H_{k+1}$ as the BFGS update to $H_k$, i.e.,

    $$H_{k+1} = H_k - \frac{1}{s_k^T H_k s_k} H_k s_k s_k^T H_k + \frac{1}{y_k^T s_k} y_k y_k^T,$$

    where $s_k = x_{k+1} - x_k$, $y_k = g_{k+1} - g_k$, and $y_k^T s_k$ approximates the curvature of $f$ along $p_k$.
  • The Wolfe condition guarantees that $H_k$ can be updated.

One option to calculate $p_k$:

  • Store the upper-triangular Cholesky factor $R_k$ where $R_k^T R_k = H_k$.

UC San Diego | Center for Computational Mathematics 8/45
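For reference, a dense-matrix sketch of the update above (illustrative only: a practical code maintains the Cholesky factor $R_k$ rather than $H_k$ itself, and the skipping tolerance here is an assumption):

```python
import numpy as np

def bfgs_update(H, s, y):
    """BFGS update of H with s = x_{k+1} - x_k and y = g_{k+1} - g_k."""
    Hs = H @ s
    ys = float(y @ s)                       # curvature along p_k; positive after a Wolfe step
    if ys <= 1e-12 * np.linalg.norm(s) * np.linalg.norm(y):
        return H                            # skip the update if the curvature is not usable
    return H - np.outer(Hs, Hs) / float(s @ Hs) + np.outer(y, y) / ys

# Start from H_0 = sigma*I and apply one update.
sigma, n = 1.0, 3
H = sigma * np.eye(n)
s = np.array([1.0, 0.0, 0.5])
y = np.array([2.0, -0.2, 1.0])
H = bfgs_update(H, s, y)
print(np.allclose(H, H.T), np.all(np.linalg.eigvalsh(H) > 0))   # stays symmetric positive definite
```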

slide-10
SLIDE 10

Reduced-Hessian Methods

(Fenelon, 1981 and Siegel, 1992)

Let $\mathcal{G}_k = \mathrm{span}(g_0, g_1, \ldots, g_k)$ and let $\mathcal{G}_k^\perp$ be the orthogonal complement of $\mathcal{G}_k$ in $\mathbb{R}^n$.

Result

Consider a quasi-Newton method with BFGS update applied to a general nonlinear function. If $H_0 = \sigma I$ ($\sigma > 0$), then:

  • $p_k \in \mathcal{G}_k$ for all $k$.
  • If $z \in \mathcal{G}_k$ and $w \in \mathcal{G}_k^\perp$, then $H_k z \in \mathcal{G}_k$ and $H_k w = \sigma w$.

UC San Diego | Center for Computational Mathematics 9/45

slide-14
SLIDE 14

Reduced-Hessian Methods

Significance of $p_k \in \mathcal{G}_k$:

  • No need to minimize the quadratic model over the full space.
  • Search directions lie in an expanding sequence of subspaces.

Significance of $H_k z \in \mathcal{G}_k$ and $H_k w = \sigma w$:

  • The curvature stored in $H_k$ along any unit vector in $\mathcal{G}_k^\perp$ is $\sigma$.
  • All nontrivial curvature information in $H_k$ can be stored in a smaller $r_k \times r_k$ matrix, where $r_k = \dim(\mathcal{G}_k)$.

UC San Diego | Center for Computational Mathematics 10/45

slide-16
SLIDE 16

Reduced-Hessian Methods

Given a matrix $B_k \in \mathbb{R}^{n \times r_k}$ whose columns span $\mathcal{G}_k$, let

  • $B_k = Z_k T_k$ be the QR decomposition of $B_k$;
  • $W_k$ be a matrix whose orthonormal columns span $\mathcal{G}_k^\perp$;
  • $Q_k = \begin{pmatrix} Z_k & W_k \end{pmatrix}$.

Then $H_k p_k = -g_k \iff (Q_k^T H_k Q_k)\, Q_k^T p_k = -Q_k^T g_k$, where

$$Q_k^T H_k Q_k = \begin{pmatrix} Z_k^T H_k Z_k & Z_k^T H_k W_k \\ W_k^T H_k Z_k & W_k^T H_k W_k \end{pmatrix} = \begin{pmatrix} Z_k^T H_k Z_k & 0 \\ 0 & \sigma I_{n-r_k} \end{pmatrix}, \qquad Q_k^T g_k = \begin{pmatrix} Z_k^T g_k \\ 0 \end{pmatrix}.$$

UC San Diego | Center for Computational Mathematics 11/45

slide-19
SLIDE 19

Reduced-Hessian Methods

A reduced-Hessian (RH) method obtains $p_k$ from $p_k = Z_k q_k$, where $q_k$ solves

$$Z_k^T H_k Z_k \, q_k = -Z_k^T g_k, \qquad \text{(RH step)}$$

which is equivalent to (qN step). In practice, we use a Cholesky factorization $R_k^T R_k = Z_k^T H_k Z_k$.

  • The new gradient $g_{k+1}$ is accepted iff $\|(I - Z_k Z_k^T)\, g_{k+1}\| > \epsilon$.
  • Store and update $Z_k$, $R_k$, $Z_k^T p_k$, $Z_k^T g_k$, and $Z_k^T g_{k+1}$.

UC San Diego | Center for Computational Mathematics 12/45
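A dense Python sketch of the RH step (illustrative assumptions: $Z_k$ is recomputed here by a thin QR factorization of the stored basis, and the reduced system is solved directly rather than via the updated factor $R_k$; the tolerance `eps` is also an assumption):

```python
import numpy as np

def rh_step(B, H, g, eps=1e-8):
    """p = Z q with (Z'HZ) q = -Z'g, where Z is an orthonormal basis for range(B)."""
    Z, _ = np.linalg.qr(B)                          # B_k = Z_k T_k (thin QR)
    q = np.linalg.solve(Z.T @ H @ Z, -(Z.T @ g))    # in practice via R_k' R_k = Z_k' H_k Z_k
    accept = lambda g_new: np.linalg.norm(g_new - Z @ (Z.T @ g_new)) > eps  # ||(I - ZZ')g_new|| > eps
    return Z @ q, accept

# Check equivalence with the full qN step when H has the structure Z(Z'HZ)Z' + sigma*(I - ZZ').
n, sigma = 5, 1.0
rng = np.random.default_rng(0)
B = rng.standard_normal((n, 2))                     # columns spanning G_k (e.g. g_0, g_1)
Z0, _ = np.linalg.qr(B)
H = Z0 @ np.diag([3.0, 2.0]) @ Z0.T + sigma * (np.eye(n) - Z0 @ Z0.T)
g = B @ np.array([1.0, -0.5])                       # a gradient lying in G_k
p, accept = rh_step(B, H, g)
print(np.allclose(p, np.linalg.solve(H, -g)))       # True: the RH step equals the qN step
```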

slide-21
SLIDE 21

$$H_k = Q_k Q_k^T H_k Q_k Q_k^T = \begin{pmatrix} Z_k & W_k \end{pmatrix} \begin{pmatrix} Z_k^T H_k Z_k & 0 \\ 0 & \sigma I_{n-r_k} \end{pmatrix} \begin{pmatrix} Z_k^T \\ W_k^T \end{pmatrix} = Z_k (Z_k^T H_k Z_k) Z_k^T + \sigma (I - Z_k Z_k^T).$$

$\Rightarrow$ any $z$ such that $Z_k^T z = 0$ satisfies $H_k z = \sigma z$.

UC San Diego | Center for Computational Mathematics 13/45

slide-22
SLIDE 22

Reduced-Hessian Method Variants

Reinitialization: if $g_{k+1} \notin \mathcal{G}_k$ (so the subspace expands), the Cholesky factor $R_k$ is updated as

$$R_{k+1} \leftarrow \begin{pmatrix} R_k & \\ & \sqrt{\sigma_{k+1}} \end{pmatrix},$$

where $\sigma_{k+1}$ is based on the latest estimate of the curvature, e.g., $\sigma_{k+1} = \dfrac{y_k^T s_k}{s_k^T s_k}$.

Lingering: restrict the search direction to a smaller subspace and allow the subspace to expand only when $f$ is suitably minimized on that subspace.

UC San Diego | Center for Computational Mathematics 14/45
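A small sketch of the expansion-plus-reinitialization step (assumptions: the subspace grows by one dimension, and because $H_k$ acts as $\sigma I$ on $\mathcal{G}_k^\perp$ the new off-diagonal block of the reduced factor is zero; the function name is hypothetical):

```python
import numpy as np

def expand_and_reinitialize(R, s, y):
    """Append a row/column to the reduced Cholesky factor when the subspace expands,
    setting the new diagonal entry to sqrt(sigma_{k+1}) with sigma_{k+1} = (y's)/(s's)."""
    sigma_next = float(y @ s) / float(s @ s)        # latest curvature estimate
    r = R.shape[0]
    R_next = np.zeros((r + 1, r + 1))
    R_next[:r, :r] = R                              # existing reduced factor
    R_next[r, r] = np.sqrt(sigma_next)              # reinitialized curvature for the new direction
    return R_next
```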

slide-24
SLIDE 24

Reduced-Hessian Method Variants

Limited-memory: instead of storing the full approximate Hessian, keep information from only the last $m$ steps ($m \ll n$).

Key differences:

  • Form Zk from search directions instead of the gradients.
  • Must store Tk from Bk = ZkTk to update most quantities.
  • Drop columns from Bk when necessary.

UC San Diego | Center for Computational Mathematics 15/45

slide-26
SLIDE 26

Main idea: maintain $B_k = Z_k T_k$, where $B_k$ is the $n \times m$ matrix

$$B_k = \begin{pmatrix} p_{k-m+1} & p_{k-m+2} & \cdots & p_{k-1} & p_k \end{pmatrix}$$

with $m \ll n$ (e.g., $m = 5$).

$B_k = Z_k T_k$ is not available until $p_k$ has been computed. However, if the columns of $Z_k$ form a basis for $\mathrm{span}(p_{k-m+1}, \ldots, p_{k-1}, p_k)$, then they also form a basis for $\mathrm{span}(p_{k-m+1}, \ldots, p_{k-1}, g_k)$; i.e., only the triangular factor associated with the (common) basis for each of these subspaces differs.

UC San Diego | Center for Computational Mathematics 16/45

slide-28
SLIDE 28

At the start of iteration $k$, suppose we have the factors of

$$B_k^{(g)} = \begin{pmatrix} p_{k-m+1} & p_{k-m+2} & \cdots & p_{k-1} & g_k \end{pmatrix}.$$

  • Compute $p_k$.
  • Swap $p_k$ with $g_k$ to give the factors of

    $$B_k = \begin{pmatrix} p_{k-m+1} & p_{k-m+2} & \cdots & p_{k-1} & p_k \end{pmatrix}.$$

  • Compute $x_{k+1}$, $g_{k+1}$, etc., using a Wolfe line search.
  • Update and reinitialize the factor $R_k$.
  • If $g_{k+1}$ is accepted, compute the factors of

    $$B_{k+1}^{(g)} = \begin{pmatrix} p_{k-m+2} & p_{k-m+3} & \cdots & p_k & g_{k+1} \end{pmatrix}.$$

UC San Diego | Center for Computational Mathematics 17/45
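The column bookkeeping for one iteration can be sketched as follows (illustrative assumptions: the factors are recomputed from scratch by a thin QR, the step is computed with $H_k = I$, and the gradients are random placeholders; the real method updates $Z_k$, $T_k$, and $R_k$ without refactorizing):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 3
p_old = [rng.standard_normal(n) for _ in range(m - 1)]      # p_{k-2}, p_{k-1}
g_k = rng.standard_normal(n)

B_g = np.column_stack(p_old + [g_k])        # B_k^(g) = (p_{k-2}, p_{k-1}, g_k)
Z, T = np.linalg.qr(B_g)                    # factors used to form the RH step

p_k = -Z @ (Z.T @ g_k)                      # placeholder RH step (H_k = I, so q = -Z'g_k)
B_k = np.column_stack(p_old + [p_k])        # swap p_k with g_k: (p_{k-2}, p_{k-1}, p_k)

g_next = rng.standard_normal(n)             # gradient returned by the Wolfe line search
if np.linalg.norm(g_next - Z @ (Z.T @ g_next)) > 1e-8:       # g_{k+1} accepted
    B_g_next = np.column_stack([B_k[:, 1:], g_next])         # drop the oldest column, append g_{k+1}
    Z, T = np.linalg.qr(B_g_next)           # factors of B_{k+1}^(g)
```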

slide-34
SLIDE 34

Summary of key differences:

  • Form Zk from search directions instead of gradients.
  • Must store Tk from Bk = ZkTk to update most quantities.
  • Can store Bk instead of Zk.
  • Drop columns from Bk when necessary.
  • Half the storage of conventional limited-memory approaches.

Retains finite termination property for quadratic f (both with and without reinitialization).

UC San Diego | Center for Computational Mathematics 18/45

slide-36
SLIDE 36

Bound-Constrained Optimization

UC San Diego | Center for Computational Mathematics 19/45

slide-37
SLIDE 37

Bound Constraints

Given $\ell, u \in \mathbb{R}^n$, solve

$$\underset{x \in \mathbb{R}^n}{\text{minimize}} \; f(x) \quad \text{subject to} \quad \ell \le x \le u.$$

Focus on line-search methods that use the BFGS method:

  • Projected-gradient [Byrd, Lu, Nocedal, Zhu (1995)]
  • Projected-search [Bertsekas (1982)]

These methods are designed to move on and off constraints rapidly and identify the active set after a finite number of iterations.

UC San Diego | Center for Computational Mathematics 20/45

slide-38
SLIDE 38

Projected-Gradient Methods

UC San Diego | Center for Computational Mathematics 21/45

slide-39
SLIDE 39

Definitions

$\mathcal{A}(x) = \{\, i : x_i = \ell_i \text{ or } x_i = u_i \,\}$

Define the projection $P(x)$ componentwise, where

$$[P(x)]_i = \begin{cases} \ell_i & \text{if } x_i < \ell_i, \\ u_i & \text{if } x_i > u_i, \\ x_i & \text{otherwise.} \end{cases}$$

Given an iterate $x_k$, define the piecewise-linear paths $x_{-g_k}(\alpha) = P(x_k - \alpha g_k)$ and $x_{p_k}(\alpha) = P(x_k + \alpha p_k)$.

UC San Diego | Center for Computational Mathematics 22/45
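A minimal sketch of the projection and the projected paths (the bounds and points here are illustrative):

```python
import numpy as np

def project(x, lower, upper):
    """Componentwise projection P(x) onto the box [lower, upper]."""
    return np.minimum(np.maximum(x, lower), upper)

def projected_path(x, d, lower, upper):
    """Piecewise-linear path alpha -> P(x + alpha*d), e.g. d = -g_k or d = p_k."""
    return lambda alpha: project(x + alpha * d, lower, upper)

lower, upper = np.zeros(2), np.ones(2)
x = np.array([0.9, 0.5])
g = np.array([-1.0, 0.2])                   # steepest descent pushes x_1 toward its upper bound
path = projected_path(x, -g, lower, upper)
print(path(0.05), path(0.5))                # [0.95 0.49], then [1. 0.4]: x_1 is fixed at its bound
```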

slide-40
SLIDE 40

Algorithm L-BFGS-B

Given $x_k$ and $q_k(x)$, a typical iteration of L-BFGS-B looks like:

[Figure] Move along the projected path $x_{-g_k}(\alpha)$.

UC San Diego | Center for Computational Mathematics 23/45

slide-41
SLIDE 41

Algorithm L-BFGS-B

[Figure] Find $x_k^c$, the first point that minimizes $q_k(x)$ along $x_{-g_k}(\alpha)$.

UC San Diego | Center for Computational Mathematics 23/45

slide-42
SLIDE 42

Algorithm L-BFGS-B

[Figure] Find $x$, the minimizer of $q_k(x)$ with $x_i$ fixed for every $i \in \mathcal{A}(x_k^c)$.

UC San Diego | Center for Computational Mathematics 23/45

slide-43
SLIDE 43

Algorithm L-BFGS-B

[Figure] Find $\bar{x}$ by projecting $x$ onto the feasible region.

UC San Diego | Center for Computational Mathematics 23/45

slide-44
SLIDE 44

Algorithm L-BFGS-B

[Figure] Wolfe line search along $p_k$ with $\alpha_{\max} = 1$ to ensure feasibility.

UC San Diego | Center for Computational Mathematics 23/45
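The projected path is piecewise linear, so it can be examined segment by segment; a sketch of the breakpoint computation (the minimization of $q_k$ over each segment, which yields $x_k^c$, is omitted):

```python
import numpy as np

def breakpoints(x, d, lower, upper):
    """Positive steps alpha at which P(x + alpha*d) bends, i.e. a component reaches a bound."""
    t = np.full_like(x, np.inf)
    neg, pos = d < 0, d > 0
    t[neg] = (lower[neg] - x[neg]) / d[neg]
    t[pos] = (upper[pos] - x[pos]) / d[pos]
    return np.unique(t[np.isfinite(t)])      # sorted, duplicate-free breakpoints

lower, upper = np.zeros(3), np.ones(3)
x = np.array([0.9, 0.5, 0.1])
g = np.array([-1.0, 0.2, 0.3])
print(breakpoints(x, -g, lower, upper))      # [0.1, 0.333..., 2.5]
```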

slide-45
SLIDE 45

Projected-Search Methods

UC San Diego | Center for Computational Mathematics 24/45

slide-46
SLIDE 46

Definitions

$\mathcal{A}(x) = \{\, i : x_i = \ell_i \text{ or } x_i = u_i \,\}$

$\mathcal{W}(x) = \{\, i : (x_i = \ell_i \text{ and } g_i > 0) \text{ or } (x_i = u_i \text{ and } g_i < 0) \,\}$

Given a point $x$ and a vector $p$, the projected vector of $p$ at $x$, $P_x(p)$, is defined componentwise, where

$$[P_x(p)]_i = \begin{cases} 0 & \text{if } i \in \mathcal{W}(x), \\ p_i & \text{otherwise.} \end{cases}$$

Given a subspace $\mathcal{S}$, the projected subspace of $\mathcal{S}$ at $x$ is defined as $P_x(\mathcal{S}) = \{\, P_x(p) : p \in \mathcal{S} \,\}$.

UC San Diego | Center for Computational Mathematics 25/45
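A minimal sketch of the working set and the projected vector (the tolerance for detecting an active bound is an assumption):

```python
import numpy as np

def working_set(x, g, lower, upper, tol=1e-10):
    """W(x): components at a bound where the gradient points out of the feasible box."""
    at_lower = (np.abs(x - lower) <= tol) & (g > 0)
    at_upper = (np.abs(x - upper) <= tol) & (g < 0)
    return at_lower | at_upper               # boolean mask

def project_vector(p, W):
    """P_x(p): zero the components of p whose indices lie in the working set W."""
    q = p.copy()
    q[W] = 0.0
    return q

x = np.array([0.0, 0.3, 1.0])
g = np.array([2.0, -1.0, -0.5])
lower, upper = np.zeros(3), np.ones(3)
W = working_set(x, g, lower, upper)
print(W, project_vector(-g, W))              # [ True False  True] [0. 1. 0.]
```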

slide-48
SLIDE 48

Projected-Search Methods

A typical projected-search method updates $x_k$ as follows:

  • Compute $\mathcal{W}(x_k)$.
  • Calculate $p_k$ as the solution of
    $$\min_{p \in P_{x_k}(\mathbb{R}^n)} q_k(x_k + p).$$
  • Obtain $x_{k+1}$ from an Armijo-like line search on $\psi(\alpha) = f(x_{p_k}(\alpha))$.

UC San Diego | Center for Computational Mathematics 26/45
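A minimal backtracking version of the Armijo-like search on $\psi(\alpha) = f(x_{p_k}(\alpha))$ (a sketch with assumed constants; the RH-B method described later uses the quasi-Wolfe search instead):

```python
import numpy as np

def projected_armijo_search(f, x, p, g, lower, upper, eta_A=1e-4, shrink=0.5, alpha=1.0):
    """Backtrack on psi(alpha) = f(P(x + alpha*p)) until an Armijo-like decrease holds."""
    project = lambda z: np.minimum(np.maximum(z, lower), upper)
    f0 = f(x)
    while alpha > 1e-12:
        x_trial = project(x + alpha * p)
        if f(x_trial) < f0 + eta_A * float(g @ (x_trial - x)):   # decrease measured along the projected move
            return x_trial, alpha
        alpha *= shrink
    return x, 0.0

# Example: f(x) = ||x - c||^2 with c outside the box [0, 1]^2.
c = np.array([2.0, -1.0])
f = lambda x: float((x - c) @ (x - c))
x = np.array([0.5, 0.5]); g = 2.0 * (x - c)
lower, upper = np.zeros(2), np.ones(2)
print(projected_armijo_search(f, x, -g, g, lower, upper))        # accepts the corner [1, 0] at alpha = 1
```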

slide-49
SLIDE 49

Projected-Search Methods

Given $x_k$ and $f(x)$, a typical iteration looks like:

[Figure] The search direction $p_k$ at $x_k$.

UC San Diego | Center for Computational Mathematics 27/45

slide-51
SLIDE 51

Projected-Search Methods

[Figure] Armijo-like line search along $P(x_k + \alpha p_k)$.

UC San Diego | Center for Computational Mathematics 27/45

slide-52
SLIDE 52

Quasi-Wolfe Line Search

UC San Diego | Center for Computational Mathematics 28/45

slide-53
SLIDE 53

Can’t use the Wolfe conditions: $\psi(\alpha)$ is a continuous, piecewise-differentiable function with cusps where $x_{p_k}(\alpha)$ changes direction. As $\psi(\alpha) = f(x_{p_k}(\alpha))$ is only piecewise differentiable, it is not possible to know when an interval contains a Wolfe step.

UC San Diego | Center for Computational Mathematics 29/45

slide-54
SLIDE 54

Definition

Let $\psi'_-(\alpha)$ and $\psi'_+(\alpha)$ denote the left- and right-derivatives of $\psi(\alpha)$.

Define a [strong] quasi-Wolfe step to be an Armijo-like step that satisfies at least one of the following conditions:

  • $|\psi'_-(\alpha)| \le \eta_W\,|\psi'_+(0)|$
  • $|\psi'_+(\alpha)| \le \eta_W\,|\psi'_+(0)|$
  • $\psi'_-(\alpha) \le 0 \le \psi'_+(\alpha)$

UC San Diego | Center for Computational Mathematics 30/45
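A sketch of the quasi-Wolfe test using one-sided difference estimates of $\psi'_-$ and $\psi'_+$ (illustrative only; an implementation evaluates the one-sided derivatives from $\nabla f$ and the active segment of the path, and the Armijo-like decrease condition is checked separately):

```python
def quasi_wolfe_curvature_ok(psi, alpha, eta_W=0.9, h=1e-6):
    """Quasi-Wolfe curvature test at alpha, using one-sided difference estimates."""
    d_minus = (psi(alpha) - psi(alpha - h)) / h      # approximates psi'_-(alpha)
    d_plus  = (psi(alpha + h) - psi(alpha)) / h      # approximates psi'_+(alpha)
    d0_plus = (psi(h) - psi(0.0)) / h                # approximates psi'_+(0)
    return (abs(d_minus) <= eta_W * abs(d0_plus)
            or abs(d_plus) <= eta_W * abs(d0_plus)
            or d_minus <= 0.0 <= d_plus)             # alpha sits at a cusp that is a local minimizer

# psi with a cusp at alpha = 1, where a bound becomes active:
psi = lambda a: (1.0 - a) ** 2 if a < 1.0 else 0.25 * (a - 1.0)
print(quasi_wolfe_curvature_ok(psi, 1.0))            # True
```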

slide-58
SLIDE 58

Theory

  • Analogous conditions for the existence of the Wolfe step imply the existence of the quasi-Wolfe step.
  • Quasi-Wolfe line searches are ideal for projected-search methods.
  • If $\psi$ is differentiable, the quasi-Wolfe and Wolfe conditions are identical.
  • In rare cases, $H_k$ cannot be updated after a quasi-Wolfe step.

UC San Diego | Center for Computational Mathematics 31/45

slide-59
SLIDE 59

Reduced-Hessian Methods for Bound-Constrained Optimization

UC San Diego | Center for Computational Mathematics 32/45

slide-60
SLIDE 60

Motivation

Projected-search methods typically calculate $p_k$ as the solution of

$$\min_{p \in P_{x_k}(\mathbb{R}^n)} q_k(x_k + p).$$

If $N_k$ has orthonormal columns that span $P_{x_k}(\mathbb{R}^n)$: $p_k = N_k q_k$, where $q_k$ solves $(N_k^T H_k N_k)\, q_k = -N_k^T g_k$.

An RH method for bound-constrained optimization (RH-B) solves $p_k = Z_k q_k$, where $q_k$ solves $(Z_k^T H_k Z_k)\, q_k = -Z_k^T g_k$

and the orthonormal columns of $Z_k$ span $P_{x_k}(\mathcal{G}_k)$.

UC San Diego | Center for Computational Mathematics 33/45

slide-62
SLIDE 62

Details

Builds on the limited-memory RH method framework, plus:

  • Update values dependent on $B_k$ when $\mathcal{W}_k \ne \mathcal{W}_{k-1}$.
  • Use projected-search trappings (working set, projections, ...).
  • Use a line search compatible with projected-search methods.
  • Update $H_k$ with curvature in a direction restricted to $\mathrm{range}(Z_{k+1})$.

UC San Diego | Center for Computational Mathematics 34/45

slide-63
SLIDE 63

Cost of Changing the Working Set

An RH-B method can be explicit or implicit. An explicit method stores and uses $Z_k$; an implicit method stores and uses $B_k$. When $\mathcal{W}_k \ne \mathcal{W}_{k-1}$, all quantities dependent on $B_k$ are updated:

  • Dropping $n_d$ indices: $\approx 21 m^2 n_d$ flops if implicit
  • Adding $n_a$ indices: $\approx 24 m^2 n_a$ flops if implicit
  • Using an explicit method: $+\,6 m n (n_a + n_d)$ flops to update $Z_k$

UC San Diego | Center for Computational Mathematics 35/45

slide-64
SLIDE 64

Some Numerical Results

UC San Diego | Center for Computational Mathematics 36/45

slide-65
SLIDE 65

Results

LRH-B (v1.0) and LBFGS-B (v3.0). LRH-B is implemented in Fortran 2003; LBFGS-B in Fortran 77.

373 problems from the CUTEst test set, with $n$ between 1 and 192,627.

Termination: $\|P_{x_k}(g_k)\|_\infty < 10^{-5}$, or 300K iterations, or 1000 CPU seconds.

UC San Diego | Center for Computational Mathematics 37/45

slide-66
SLIDE 66

Implementation

Algorithm LRH-B (v1.0):

  • Limited-memory: restricted to m preceding search directions.
  • Reinitialization: included if n > min(6, m).
  • Lingering: excluded.
  • Method type: implicit.
  • Updates: $B_k$, $T_k$, $R_k$ updated for all $\mathcal{W}_k \ne \mathcal{W}_{k-1}$.

UC San Diego | Center for Computational Mathematics 38/45

slide-67
SLIDE 67

Default parameters

[Figure] Performance profiles of function evaluations and times for 374 UC/BC problems: LBFGS-B ($m = 5$) vs. LRH-B ($m = 10$).

Name      m    Failed
LBFGS-B   5    78
LRH-B     10   49

UC San Diego | Center for Computational Mathematics 39/45

slide-68
SLIDE 68

[Figure] Performance profiles of function evaluations and times for 374 UC/BC problems: LBFGS-B ($m = 5$) vs. LRH-B ($m = 5$).

Name      m    Failed
LBFGS-B   5    78
LRH-B     5    50

UC San Diego | Center for Computational Mathematics 40/45

slide-69
SLIDE 69

[Figure] Performance profiles of function evaluations and times for 374 UC/BC problems: LBFGS-B ($m = 10$) vs. LRH-B ($m = 10$).

Name      m    Failed
LBFGS-B   10   78
LRH-B     10   49

UC San Diego | Center for Computational Mathematics 41/45

slide-70
SLIDE 70

[Figure] Performance profiles of function evaluations and times for 374 UC/BC problems: LBFGS-B ($m = 5, 10$) and LRH-B ($m = 5, 10$).

UC San Diego | Center for Computational Mathematics 42/45

slide-71
SLIDE 71

Outstanding Issues

  • Projected-search methods: when to update/factor?
    Better to refactor when there are lots of changes to $\mathcal{A}$.
  • Implement plane rotations via level-two BLAS.
  • How complex should we make the quasi-Wolfe search?

UC San Diego | Center for Computational Mathematics 43/45

slide-72
SLIDE 72

Thank you!

UC San Diego | Center for Computational Mathematics 44/45

slide-73
SLIDE 73

References

  • D. P. Bertsekas, Projected Newton methods for optimization problems with simple constraints, SIAM J. Control Optim., 20:221–246, 1982.
  • R. Byrd, P. Lu, J. Nocedal and C. Zhu, A limited-memory algorithm for bound constrained optimization, SIAM J. Sci. Comput., 16:1190–1208, 1995.
  • M. C. Fenelon, Preconditioned conjugate-gradient-type methods for large-scale unconstrained optimization, Ph.D. thesis, Department of Operations Research, Stanford University, Stanford, CA, 1981.
  • M. W. Ferry, P. E. Gill and E. Wong, A limited-memory reduced Hessian method for bound-constrained optimization, Report CCoM 18-01, Center for Computational Mathematics, University of California, San Diego, 2018 (to appear).
  • P. E. Gill and M. W. Leonard, Reduced-Hessian quasi-Newton methods for unconstrained optimization, SIAM J. Optim., 12:209–237, 2001.
  • P. E. Gill and M. W. Leonard, Limited-memory reduced-Hessian methods for unconstrained optimization, SIAM J. Optim., 14:380–401, 2003.
  • J. Morales and J. Nocedal, Remark on “Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound constrained optimization”, ACM Trans. Math. Softw., 38(1):7:1–7:4, 2011.
  • D. Siegel, Modifying the BFGS update by a new column scaling technique, Math. Program., 66:45–78, 1994.

UC San Diego | Center for Computational Mathematics 45/45