SLIDE 1

Introduction to Quasi-Newton Methods: Local Convergence Theory

NLO: June 13, 2013

Dr. Thomas M. Surowiec
Humboldt University of Berlin, Department of Mathematics

Summer 2013

BMS Course NLO, Summer 2013

SLIDE 2

Motivation

The only real difference between quasi-Newton (QN) methods and the classical Newton method is the use of second derivatives in the latter. As the calculation of the Hessian can be quite expensive computationally, one tries to avoid this.

Basic Idea:

1. Find Hk such that Hk dk = −∇f(xk).

2. Do a line search to obtain αk and set xk+1 = xk + αk dk.

3. Use xk, xk+1, Hk to obtain the update Hk+1.
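The three steps above can be sketched as a generic loop in Python. This is an illustrative sketch, not code from the lecture: the update rule is passed in as a function (here the BFGS formula from a later slide), and the Armijo backtracking line search and the test function are my own assumptions.

```python
import numpy as np

def quasi_newton(f, grad, x0, update, tol=1e-8, max_iter=100):
    """Generic quasi-Newton loop; the Hessian approximation H_k is
    maintained by the caller-supplied `update` rule."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                      # simple symmetric pos. def. start
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(H, -g)          # step 1: H_k d_k = -grad f(x_k)
        alpha, x_new = 1.0, x + d           # step 2: Armijo backtracking
        while f(x_new) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
            x_new = x + alpha * d
        H = update(H, x_new - x, grad(x_new) - g)  # step 3: H_{k+1}
        x = x_new
    return x

def bfgs(H, s, y):
    """BFGS rank-2 update, used here as the `update` rule."""
    return (H + np.outer(y, y) / (y @ s)
              - np.outer(H @ s, H @ s) / (s @ H @ s))

# Hypothetical test problem: minimize ||x||^2
x_star = quasi_newton(lambda x: (x**2).sum(), lambda x: 2 * x,
                      np.array([3.0, -2.0]), bfgs)
```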


SLIDE 3

Motivation

For the initial matrix, we choose a simple, symmetric pos. def. matrix, oftentimes I. Some of the benefits of this method include:

1. Only 1st derivatives are needed.

2. Hk will (be chosen to) always be pos. def., so dk is a descent direction.

3. Some variants only require O(n²) multiplications per iteration.

Not all QN methods can guarantee that Hk is pos. def. One speaks of variable metric methods when Hk is always pos. def.


SLIDE 4

Deriving Update Rules

Starting with a pos. def., symmetric matrix, one chooses a simple ansatz such that a condition called the QN condition holds. Let

sa = αa da ( = −αa (Ha)−1 ∇f(xa) ),   ya = ∇f(x+) − ∇f(xa),   x+ = xa + sa.

Using Taylor's theorem, we can motivate the following important condition, known as the quasi-Newton or secant condition: any updated Hessian approximation H+ should satisfy

H+ sa = ya.
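For a quadratic f(x) = ½ xT A x, the secant condition is exact: ya = A sa for any step, so the true Hessian A itself satisfies H+ sa = ya. A quick numerical check (illustrative; the matrix and step are arbitrary example data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
A = A @ A.T + 3 * np.eye(3)       # symmetric pos. def. Hessian of f(x) = 0.5 x^T A x
grad = lambda x: A @ x            # gradient of the quadratic

x_a = rng.standard_normal(3)
s_a = rng.standard_normal(3)      # an arbitrary step
x_plus = x_a + s_a
y_a = grad(x_plus) - grad(x_a)    # gradient difference

# Secant condition H+ s_a = y_a, satisfied exactly by H+ = A:
assert np.allclose(A @ s_a, y_a)
```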


SLIDE 5

Deriving Update Rules

Using α ∈ R, u ∈ Rn and the ansatz H+ = Ha + α u uT, we get the symmetric rank-1 update:

H+ = Ha + (ya − Ha sa)(ya − Ha sa)T / ((ya − Ha sa)T sa)

Drawbacks:

1. H+ is not necessarily pos. def.

2. If ya − Ha sa ≈ 0 or (ya − Ha sa)T sa ≈ 0, then numerical problems appear.

Using α ∈ R, u, v ∈ Rn and the ansatz H+ = Ha + α u vT, we get the non-symmetric rank-1 update:

H+ = Ha + (ya − Ha sa) saT / (saT sa)

This update is decidedly disadvantageous.
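The first drawback can be seen numerically. A small sketch with my own example data: the symmetric rank-1 update satisfies the secant condition, yet the result here is indefinite.

```python
import numpy as np

def sr1(H, s, y):
    """Symmetric rank-1 update; assumes (y - H s)^T s is not near zero."""
    r = y - H @ s
    return H + np.outer(r, r) / (r @ s)

H = np.eye(2)                     # pos. def. starting matrix
s = np.array([1.0, 0.0])
y = np.array([-1.0, 0.5])         # example data chosen to break definiteness

H_plus = sr1(H, s, y)
assert np.allclose(H_plus @ s, y)                # secant condition holds
assert np.linalg.eigvalsh(H_plus).min() < 0      # but H+ is indefinite
```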


SLIDE 6

Deriving Update Rules: BFGS & DFP

Using α, β ∈ R, u, v ∈ Rn and the ansatz H+ = Ha + α u uT + β v vT, we get the symmetric rank-2 update:

H+ = Ha + ya yaT / (yaT sa) − (Ha sa)(Ha sa)T / (saT Ha sa)

This is known as the "Broyden–Fletcher–Goldfarb–Shanno" (BFGS) update; it will be our main focus. One can also directly update the inverse:

B+ = Ba + sa saT / (saT ya) − (Ba ya)(Ba ya)T / (yaT Ba ya)

This symmetric rank-2 update of the inverse is known as the "Davidon–Fletcher–Powell" (DFP) update. It reduces the theoretical bound for the number of multiplications needed for the calculation. However, a large body of experimental evidence shows that BFGS outperforms this method.
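Both formulas can be checked directly: BFGS satisfies the secant condition H+ sa = ya, while the DFP inverse update satisfies the corresponding inverse condition B+ ya = sa. A sketch with arbitrary example data (the starting matrices and vectors are my own assumptions):

```python
import numpy as np

def bfgs_update(H, s, y):
    """BFGS rank-2 update of the Hessian approximation H."""
    return (H + np.outer(y, y) / (y @ s)
              - np.outer(H @ s, H @ s) / (s @ H @ s))

def dfp_inverse_update(B, s, y):
    """DFP rank-2 update of the *inverse* approximation B."""
    return (B + np.outer(s, s) / (s @ y)
              - np.outer(B @ y, B @ y) / (y @ B @ y))

rng = np.random.default_rng(1)
H, B = np.eye(3), np.eye(3)
s = rng.standard_normal(3)
y = s + 0.1 * rng.standard_normal(3)   # perturbation kept small so y^T s > 0

H_plus = bfgs_update(H, s, y)
B_plus = dfp_inverse_update(B, s, y)

assert np.allclose(H_plus @ s, y)      # secant condition for H+
assert np.allclose(B_plus @ y, s)      # inverse secant condition for B+
```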


SLIDE 7

Properties of BFGS

Lemma 1.1. Let Ha ∈ Sn be positive definite, yaT sa > 0, and let H+ be determined according to BFGS. Then H+ ∈ Sn is positive definite.

Proof. On the board.
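Since the proof is left for the board, here is a randomized sanity check of the lemma (illustrative; the dimension and test data are my own assumptions): whenever yaT sa > 0, the BFGS update of a pos. def. matrix remains pos. def.

```python
import numpy as np

def bfgs_update(H, s, y):
    """BFGS rank-2 update of the Hessian approximation H."""
    return (H + np.outer(y, y) / (y @ s)
              - np.outer(H @ s, H @ s) / (s @ H @ s))

rng = np.random.default_rng(2)
for _ in range(50):
    M = rng.standard_normal((4, 4))
    H = M @ M.T + np.eye(4)                # random symmetric pos. def. H_a
    s = rng.standard_normal(4)
    y = s + 0.1 * rng.standard_normal(4)   # small perturbation keeps y^T s > 0
    assert y @ s > 0                       # hypothesis of Lemma 1.1
    H_plus = bfgs_update(H, s, y)
    # Conclusion of the lemma: all eigenvalues of H+ are positive.
    assert np.all(np.linalg.eigvalsh(H_plus) > 0)
```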


SLIDE 8

Invariance under Affine Transformations

By showing that the BFGS and Newton methods are invariant under affine transformations, we greatly simplify the analysis in the coming days. In particular, we will be able to assume that ∇²f(x∗) = I.


SLIDE 9

Central Convergence Result

Theorem 2.1. Let (A) be satisfied. Then ∃δ > 0 such that for ||x0 − x∗|| ≤ δ and ||H0 − ∇²f(x∗)|| ≤ δ, the BFGS method is well-defined and converges q-superlinearly to x∗.

The proof requires a number of observations and auxiliary results.


SLIDE 10

Proving the Convergence Result

Lemma 2.1. Let (A) be satisfied. If Ha ∈ Sn is positive definite and x+ = xa − (Ha)−1 ∇f(xa), then there exists a δ0 > 0 such that for 0 < ||xa − x∗|| ≤ δ0 and ||Fa|| ≤ δ0 it holds that yaT sa > 0. Moreover, if H+ is the BFGS update of Ha, then it follows that

F+ = (H+)−1 − I = (I − wa waT) Fa (I − wa waT) + Da

with wa = sa/||sa||, Da ∈ Rn×n, and ||Da|| ≤ KD ||sa|| with KD > 0.

Here Fa = (Ha)−1 − I always denotes the error of the inverse of the current ("a" for aktuell) approximation with respect to the Hessian at the solution, which by the affine invariance above we may take to be I.


SLIDE 11

Proving the Convergence Result

Corollary 2.1. Under the same assumptions, one has

||F+|| ≤ ||Fa|| + KD ||sa|| ≤ ||Fa|| + KD (||xa − x∗|| + ||x+ − x∗||)

Proof. Since we are working in finite dimensions, we may take any norm on the space Rn×n; we let || · || be the Frobenius norm. Expanding the term (I − wa waT) Fa (I − wa waT) and estimating ||(I − wa waT) Fa (I − wa waT)||, we obtain ||F+|| ≤ ||Fa|| + KD ||sa|| for some KD > 0. The second inequality follows from the triangle inequality applied to sa = (x+ − x∗) − (xa − x∗).
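The key estimate here is that the projection I − wa waT cannot increase the Frobenius norm, since its spectral norm is 1. A quick numerical illustration (my own example data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
F = rng.standard_normal((n, n))        # an arbitrary error matrix F_a
w = rng.standard_normal(n)
w /= np.linalg.norm(w)                 # unit vector w_a = s_a / ||s_a||
P = np.eye(n) - np.outer(w, w)         # orthogonal projector I - w_a w_a^T

# Sandwiching by the projector never grows the Frobenius norm
# (np.linalg.norm defaults to the Frobenius norm for matrices):
assert np.linalg.norm(P @ F @ P) <= np.linalg.norm(F)
```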
