Adaptive Low Complexity Algorithms for Unconstrained Minimization


SLIDE 1

Adaptive Low Complexity Algorithms for Unconstrained Minimization

Carmine Di Fiore, Stefano Fanelli, Paolo Zellini (difiore@mat.uniroma2.it), Cortona, September 2004

SLIDE 2

1. The minimization problem and classical solvers
2. Previous contribution: LQN descent methods
3. New contribution: Adaptive LQN descent methods

SLIDE 3

The minimization problem and classical solvers

f(x∗) = min_{x∈R^n} f(x), find x∗

SLIDE 4

The minimization problem and classical solvers

Descent methods generate a minimizing sequence {x_k}_{k=0}^{+∞} by the iterative scheme:

x_0 ∈ R^n, g_0 = ∇f(x_0), d_0 = −g_0
For k = 0, 1, . . .
    x_{k+1} = x_k + λ_k d_k,  λ_k > 0
    g_{k+1} = ∇f(x_{k+1})
    B_{k+1} = n × n positive definite (pd) matrix
    d_{k+1} = −B_{k+1}^{−1} g_{k+1}  ← descent direction
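The scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the talk's method: B_k is taken to be the identity (plain steepest descent) and λ_k is found by naive backtracking, both assumptions of this sketch.

```python
import numpy as np

def descent(f, grad, x0, tol=1e-8, max_iter=500):
    # Generic descent scheme; B_k = I (an assumption of this sketch,
    # i.e. steepest descent) and lambda_k chosen by naive backtracking.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)                       # g_k = grad f(x_k)
        if np.linalg.norm(g) < tol:
            break
        d = -g                            # d_k = -B_k^{-1} g_k with B_k = I
        lam = 1.0
        while lam > 1e-12 and f(x + lam * d) >= f(x):
            lam *= 0.5                    # shrink until lambda_k > 0 decreases f
        x = x + lam * d                   # x_{k+1} = x_k + lambda_k d_k
    return x

# Usage: minimize f(x) = ||x||^2, whose minimizer is x* = 0
x_star = descent(lambda x: x @ x, lambda x: 2 * x, [3.0, -2.0])
```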

SLIDE 5

The Newton descent method
  • B_{k+1} = ∇²f(x_{k+1})
  • Quadratic rate of convergence
  • O(n³) arithmetic operations to compute x_{k+1} from x_k

SLIDE 6

Quasi-Newton (QN) descent methods
  • B_{k+1} defined in terms of ∇f
  • Superlinear rate of convergence
  • Convergence under weak analytical assumptions
  • O(n²) arithmetic operations to compute x_{k+1} from x_k
  • O(n²) memory allocations for implementation

SLIDE 7

Main example: the BFGS method (Broyden et al., 1970)

SLIDE 8

BFGS

x_0 ∈ R^n, d_0 = −g_0
For k = 0, 1, . . .
    x_{k+1} = x_k + λ_k d_k,  λ_k such that s_k^T y_k > 0
    B_{k+1} = ϕ(B_k, x_{k+1} − x_k (= s_k), g_{k+1} − g_k (= y_k))
    d_{k+1} = −B_{k+1}^{−1} g_{k+1}

SLIDE 9

BFGS

x0 ∈ Rn, d0 = −g0 For k = 0, 1, . . .        xk+1 = xk + λkdk λk | sT

k yk > 0

Bk+1 = ϕ (Bk, xk+1 − xk

  • sk

, gk+1 − gk

  • yk

) dk+1 = −B−1

k+1gk+1

ϕ properties ⇒

  • Bk+1 inherites positive definiteness from Bk

Proof: B pd & sTy > 0 ⇒ ϕ(B, s, y) pd

  • Bk+1(xk+1 − xk) = gk+1 − gk

Proof: ϕ(B, s, y)s = y

SLIDE 10

The updating function ϕ in B_{k+1} = ϕ(B_k, s_k, y_k) is

ϕ(B, s, y) = B + (1/(y^T s)) y y^T − (1/(s^T B s)) B s s^T B

⇒ BFGS is a secant method: B_{k+1}(x_{k+1} − x_k (= s_k)) = g_{k+1} − g_k (= y_k)  (secant equation)

Proof (independent of B):
ϕ(B, s, y) s = [B + (1/(y^T s)) y y^T − (1/(s^T B s)) B s s^T B] s
             = B s + (1/(y^T s)) y (y^T s) − (1/(s^T B s)) B s (s^T B s)
             = B s + y − B s = y
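The two properties of ϕ can be checked numerically. A short sketch with numpy; the random data below are illustrative only:

```python
import numpy as np

def phi(B, s, y):
    # BFGS updating function: phi(B,s,y) = B + yy^T/(y^T s) - Bss^TB/(s^T Bs)
    Bs = B @ s
    return B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)               # a positive definite B
s = rng.standard_normal(n)
y = 2 * s + 0.1 * rng.standard_normal(n)  # makes s^T y > 0
assert s @ y > 0                          # the line-search condition
B1 = phi(B, s, y)
assert np.allclose(B1 @ s, y)             # secant equation: phi(B,s,y)s = y
assert np.all(np.linalg.eigvalsh(B1) > 0) # positive definiteness is inherited
```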

SLIDE 11

Quasi-Newton (QN) descent methods for large scale problems
  • B_{k+1} defined in terms of ∇f
  • Fast rate of convergence
  • Convergence under weak analytical assumptions
  • Fewer than O(n²) arithmetic operations to compute x_{k+1} from x_k
  • Fewer than O(n²) memory allocations for implementation
Classical example: the Limited-memory BFGS (L-BFGS) method (Nocedal et al., 1980)

SLIDE 12

A recent proposal: the LQN method (Di Fiore, Fanelli, Zellini et al., 2000)

SLIDE 13

Previous contribution: LQN descent methods

Replace the matrix Bk in Bk+1 = ϕ(Bk, sk, yk) with a matrix Ak of a low complexity space L

SLIDE 14

Choice of L
B_k ∈ sd U for some unitary matrix U, where

sd U = { U d(z) U∗ : z ∈ C^n },  d(z) = diag(z_1, z_2, . . . , z_n)

⇒ choose L = sd U, U = fast unitary transform (U = Fourier, Hartley, . . .)

SLIDE 15

Choice of A_k in L
A_k = the best least squares fit to B_k in L = sd U, i.e. A_k = L_{B_k} where ‖L_{B_k} − B_k‖_F = min_{X∈L} ‖X − B_k‖_F

The LQN algorithm
x_0 ∈ R^n, d_0 = −g_0
For k = 0, 1, . . .
    x_{k+1} = x_k + λ_k d_k,  λ_k such that s_k^T y_k > 0
    B_{k+1} = ϕ(L_{B_k}, x_{k+1} − x_k (= s_k), g_{k+1} − g_k (= y_k))
    d_{k+1} = −B_{k+1}^{−1} g_{k+1}
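Since U is unitary, ‖U d(z) U∗ − B‖_F = ‖d(z) − U∗ B U‖_F, so the best fit L_B is obtained by keeping only the diagonal of U∗ B U. A numpy sketch; the unitary Fourier matrix is one of the fast transforms mentioned above, though forming U explicitly is for illustration only (the real algorithm never builds an n × n matrix):

```python
import numpy as np

def project_sdU(B, U):
    # Best least squares fit of B in sd U: zero out the off-diagonal
    # of U* B U, i.e. take z = diag(U* B U).
    z = np.diag(U.conj().T @ B @ U)
    return U @ np.diag(z) @ U.conj().T

n = 8
U = np.fft.fft(np.eye(n), norm="ortho")   # unitary Fourier matrix
rng = np.random.default_rng(1)
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)               # positive definite B
LB = project_sdU(B, U)
# Eigenvalues z_i of L_B equal u_i* B u_i > 0, so B pd => L_B pd
assert np.all(np.diag(U.conj().T @ B @ U).real > 0)
# L_B is at least as close to B as other elements of sd U
for _ in range(100):
    X = U @ np.diag(rng.standard_normal(n)) @ U.conj().T
    assert np.linalg.norm(LB - B) <= np.linalg.norm(X - B) + 1e-9
```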

SLIDE 16

B_{k+1} = ϕ(L_{B_k}, s_k, y_k)
  • B_{k+1} inherits positive definiteness from B_k
    Proof: B pd ⇒ L_B pd

SLIDE 17

  • B_{k+1} s_k = y_k, i.e. LQN is a secant method

SLIDE 18

  • B_{k+1} projected on L gives rise to the Eigenvalue Updating Formula

z_{k+1} = z_k + (1/(s_k^T y_k)) |U∗ y_k|² − (1/(z_k^T |U∗ s_k|²)) d(z_k)² |U∗ s_k|²   (EUF)

where L_{B_k} = U d(z_k) U∗. (EUF) and the Sherman-Morrison formula imply that each step of LQN can be performed via two matrix-vector products U · z and some inner products.
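(EUF) can be checked against a direct computation: apply ϕ to U d(z_k) U∗ and project the result back on L. A numpy sketch; for simplicity a random real orthogonal matrix stands in for the fast transform U (an assumption of this sketch):

```python
import numpy as np

def euf(z, s, y, U):
    # Eigenvalue Updating Formula:
    # z_{k+1} = z_k + |U*y|^2/(s^T y) - d(z)^2 |U*s|^2 / (z^T |U*s|^2)
    Us2 = np.abs(U.conj().T @ s) ** 2
    Uy2 = np.abs(U.conj().T @ y) ** 2
    return z + Uy2 / (s @ y) - z**2 * Us2 / (z @ Us2)

rng = np.random.default_rng(2)
n = 6
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal stand-in for U
z = rng.uniform(1.0, 2.0, n)                      # positive eigenvalues: L_B pd
s = rng.standard_normal(n)
y = 2 * s + 0.1 * rng.standard_normal(n)          # s^T y > 0
A = U @ np.diag(z) @ U.T                          # L_B = U d(z) U*
As = A @ s
phiA = A + np.outer(y, y) / (y @ s) - np.outer(As, As) / (s @ As)
z_direct = np.diag(U.T @ phiA @ U)   # eigenvalues of phi(L_B,s,y) projected on L
assert np.allclose(euf(z, s, y, U), z_direct)
```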

SLIDE 19

Main result: U = fast transform ⇒
Space complexity: O(n) = memory allocations for U
Time complexity (per step): O(n log n) = cost of U · z

SLIDE 20

LQN rate of convergence
Theory: linear rate of convergence

SLIDE 21

Experiments: fast rate of convergence, competitive with L-BFGS

SLIDE 22

  • The Ionosphere data set (n = 1408)

[Figure: error function vs. seconds, curves labeled L-B 90, L-B 30, L-B 13 and HQN]
Figure: LQN and L-BFGS applied to a function of 1408 variables

SLIDE 23

New contribution: Adaptive LQN descent methods

In the updating formula Bk+1 = ϕ(LBk, sk, yk) adapt the space L to the current iteration

SLIDE 24

The adaptive criterion
An LQN drawback with respect to BFGS is that the updated matrix L_{B_k} does not solve the previous secant equation X s_{k−1} = y_{k−1}.

Let L_{sy} be the matrix of L = sd U s.t. L_{sy} s_{k−1} = y_{k−1} (in general L_{sy} ≠ L_{B_k}) ⇒

L_{sy} = U diag( [U∗ y_{k−1}]_i / [U∗ s_{k−1}]_i ) U∗

SLIDE 25

AIM: keep L_{B_k} close to L_{sy} during the minimization procedure

SLIDE 26

→ L_{sy} positive definite like L_{B_k}: choose U = fast transform s.t. [U∗ y_{k−1}]_i / [U∗ s_{k−1}]_i > 0 ∀i

SLIDE 32

The adaptive LQN algorithm
Like the LQN algorithm, but
. . .
    x_{k+1} = x_k + λ_k d_k,  λ_k such that s_k^T y_k > 0
    B_{k+1} = ϕ(L_{B_k}, s_k, y_k)
    if L_{s_k y_k} is pd then {
        d_{k+1} = −B_{k+1}^{−1} g_{k+1}
    } else {
        d_{k+1} = −(L_{B_{k+1}})^{−1} g_{k+1}  ← temporary descent direction
        define a fast transform U s.t. L_{s_k y_k} is pd
        set L = sd U
    }

How to define such U?

SLIDE 33

Definition of U
L_{sy} = U diag( [U∗ y_k]_i / [U∗ s_k]_i ) U∗ is positive definite iff U is such that

[U∗ y_k]_i / [U∗ s_k]_i > 0  ∀i   (Crit)
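(Crit) is cheap to test: two transforms and n sign checks. A sketch, again with a random orthogonal stand-in for U and illustrative vectors:

```python
import numpy as np

def crit_holds(U, s, y):
    # (Crit): every ratio [U*y]_i / [U*s]_i must be positive, which is
    # equivalent to L_sy = U diag([U*y]_i/[U*s]_i) U* being pd.
    Us = U.conj().T @ s
    Uy = U.conj().T @ y
    return bool(np.all((Uy / Us).real > 0))

rng = np.random.default_rng(3)
n = 6
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal stand-in for U
s = rng.standard_normal(n)
assert crit_holds(U, s, 2.0 * s)       # y = 2s: every ratio is 2
assert not crit_holds(U, s, -s)        # y = -s: every ratio is -1
```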

SLIDE 34

Main results: under our hypothesis on λ_k (λ_k such that s_k^T y_k > 0), a matrix U satisfying (Crit) exists and can be obtained as the product of two Householder matrices:

U = H(u) H(p),  H(z) = I − (2/‖z‖²) z z∗  (u, p suitable vectors)

⇒
Space complexity: O(n) = memory allocations for U
Time complexity (per step): O(n) = cost of U · z (better than LQN)
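A Householder matrix can be applied to a vector in O(n) without ever forming it, which is where the O(n) per-step cost comes from. A sketch; u and p below are arbitrary stand-ins, not the suitable vectors of the theorem:

```python
import numpy as np

def apply_H(z, x):
    # H(z) x = x - (2 z^T x / ||z||^2) z : O(n), no n x n matrix formed
    return x - (2.0 * (z @ x) / (z @ z)) * z

rng = np.random.default_rng(4)
n = 7
u, p, x = rng.standard_normal((3, n))
Ux = apply_H(u, apply_H(p, x))   # U x with U = H(u) H(p)
# Householder matrices are orthogonal reflections, so U preserves the norm
assert np.isclose(np.linalg.norm(Ux), np.linalg.norm(x))
```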

SLIDE 35

Rate of convergence of adaptive LQN
Experiments: fast rate of convergence, competitive with LQN

SLIDE 36

  • The Ionosphere data set (n = 1408)

[Figure: error function vs. seconds, curves labeled LkQN and HQN]
Figure: LQN and adaptive LQN applied to a function of 1408 variables

SLIDE 37

  • The Iris plant data set (n = 315)

Number of iterations to obtain f(x_k) < 0.1

f             x1      x2      x3     x4
LQN           10930   13108   3854   7663
adaptive LQN  3430    1663    3647   1525

SLIDE 38

Number of iterations to obtain f(x_k) < 0.01

f             x1      x2      x3     x4
LQN           24085   42344   6184   33250
adaptive LQN  19961   2886    8306   3111

SLIDE 39

Two strategies
  • Secant equation: L_{sy} s_k = y_k
  • Best least squares approximation: ‖L_{B_k} − B_k‖_F = min_{X∈L} ‖X − B_k‖_F

How to apply both strategies? The adaptive LQN algorithm illustrated is a possible solution. Work in progress: look for other solutions.
