SLIDE 1

In case you missed it - Who am I ?

Name: Sébastien Gros
Nationality: Swiss
Residence: Göteborg, Sweden
Affiliation: Chalmers University of Technology
Department: Signals & Systems
Position: Assistant Professor
Email: grosse@chalmers.se
Tel.: +46 31 772 15 55

Recent research topics: distributed & parallelized methods for optimal control, estimation & system identification, NMPC & Economic NMPC, optimal control for complex mechanical systems, integrators for real-time optimal control, robust optimal control, aerospace applications, airborne wind energy, wind turbine control, smart grids, traffic control

SLIDE 2

Numerical Optimal Control with DAEs
Lecture 5: Newton method & SQP

Sébastien Gros

AWESCO PhD course
17th of February, 2016

SLIDES 3-5

Survival map of Direct Optimal Control

[Diagram: OCP → {Collocation, Single-Shooting, Multiple-Shooting} → NLP → {Interior-Point, SQP (Active-Set)} → QP solvers]

Newton - a general-purpose sledgehammer for algebraic equations...
... will be used to solve the KKT conditions !!

SLIDES 6-7

Outline

1. KKT conditions - Quick Reminder
2. The Newton method
3. Newton on the KKT conditions
4. Sequential Quadratic Programming
5. Hessian approximation
6. Maratos effect

SLIDES 8-16

KKT point

Consider the NLP problem:

    min_w Φ(w)   s.t.   g(w) = 0,   h(w) ≤ 0

A point {w∗, µ∗, λ∗} is called a KKT point if it satisfies:

  Dual Feasibility: ∇w L(w∗, µ∗, λ∗) = 0, µ∗ ≥ 0
  Primal Feasibility: g(w∗) = 0, h(w∗) ≤ 0
  Complementary Slackness: µ∗ᵢ hᵢ(w∗) = 0, ∀ i

where L = Φ(w) + λᵀ g(w) + µᵀ h(w)

Optimality conditions for NLP with equality and/or inequality constraints:

  1st-Order Necessary Conditions: a (local) optimum w⋆ of a (differentiable) NLP at which LICQ holds corresponds to a unique KKT point
  2nd-Order Sufficient Conditions: require positivity of the Hessian ∇²w L in all critical feasible directions at the solution

Most NLP solvers are in essence "KKT solvers".
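As a concrete illustration (a made-up toy problem, not from the lecture), the sketch below evaluates the three conditions numerically; the function names and tolerance are arbitrary:

```python
import numpy as np

# Toy NLP (illustration only):  min  w1² + w2²   s.t.   h(w) = 1 - w1 ≤ 0
Phi      = lambda w: w[0]**2 + w[1]**2
h        = lambda w: 1.0 - w[0]
grad_Phi = lambda w: np.array([2.0 * w[0], 2.0 * w[1]])
grad_h   = lambda w: np.array([-1.0, 0.0])

def is_kkt_point(w, mu, tol=1e-8):
    """Check dual feasibility, primal feasibility and complementary slackness."""
    grad_L = grad_Phi(w) + mu * grad_h(w)        # ∇w L(w, µ)
    dual   = np.linalg.norm(grad_L) < tol and mu >= -tol
    primal = h(w) <= tol
    compl  = abs(mu * h(w)) < tol
    return dual and primal and compl

print(is_kkt_point(np.array([1.0, 0.0]), mu=2.0))  # True: the constrained minimizer
print(is_kkt_point(np.array([0.0, 0.0]), mu=0.0))  # False: this point is infeasible
```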

SLIDE 17

Outline

1. KKT conditions - Quick Reminder
2. The Newton method
3. Newton on the KKT conditions
4. Sequential Quadratic Programming
5. Hessian approximation
6. Maratos effect

SLIDES 18-31

Core idea

Goal: solve r(w) = 0... how ?!?

[Figure: r(w) vs. w, with the tangent-line (linear) model at the current iterate w]

Key idea: guess w, iterate the linear model:

    r(w + ∆w) ≈ r(w) + ∇r(w)ᵀ ∆w = 0

This is a full-step Newton iteration. Reduced steps are often needed.

Algorithm: Newton method
Input: w, tol
while ‖r(w)‖∞ ≥ tol do
    Compute r(w) and ∇r(w)
    Compute the Newton direction:  ∇r(w)ᵀ ∆w = −r(w)
    Newton step, t ∈ ]0, 1]:  w ← w + t∆w
return w
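A minimal Python sketch of the algorithm above (the residual r used at the bottom is an arbitrary example):

```python
import numpy as np

def newton(r, jac_r, w, tol=1e-10, t=1.0, max_iter=50):
    """Newton iteration for r(w) = 0; t = 1 gives the full-step variant."""
    for _ in range(max_iter):
        rw = np.atleast_1d(r(w))
        if np.linalg.norm(rw, np.inf) < tol:
            break
        # Newton direction: solve ∇r(w)ᵀ ∆w = -r(w)
        dw = np.linalg.solve(np.atleast_2d(jac_r(w)), -rw)
        w = w + t * dw                        # Newton step
    return w

# Example: r(w) = w³ - 2, root 2^(1/3) ≈ 1.2599
print(newton(lambda w: w**3 - 2.0, lambda w: 3.0 * w**2, np.array([1.0])))
```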

SLIDES 32-43

Why reduced steps ?

Newton step with t ∈ ]0, 1]:

    ∇r(w)ᵀ ∆w = −r(w),   w ← w + t∆w

[Figure: Newton iterates on a nonlinear r(w), shown for t = 1 (unstable) and t = 0.8 (stable)]

The full-step Newton iteration can be unstable !!
While the reduced-step Newton iteration is stable...
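One standard way to pick t (a backtracking rule, given here as an illustration; the lecture does not prescribe a specific one) is to start from t = 1 and shrink it until the residual norm decreases:

```python
import numpy as np

def damped_newton_step(r, jac_r, w, beta=0.5, t_min=1e-8):
    """One Newton step with backtracking on t ∈ ]0, 1] until ‖r‖ decreases."""
    rw = np.atleast_1d(r(w))
    dw = np.linalg.solve(np.atleast_2d(jac_r(w)), -rw)
    t = 1.0
    while t > t_min:
        if np.linalg.norm(np.atleast_1d(r(w + t * dw))) < np.linalg.norm(rw):
            break
        t *= beta                             # reduce the step: t ← βt
    return w + t * dw, t

# On r(w) = arctan(w) from w = 2, the full step overshoots; backtracking halves t once.
w, t = damped_newton_step(np.arctan, lambda w: 1.0 / (1.0 + w**2), np.array([2.0]))
print(w, t)   # accepted step size t = 0.5
```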

SLIDES 44-51

Does Newton always work ?

Does the Newton step ∆w always provide a direction "improving" r(w) ? I.e. is there always a t > 0 s.t. ‖r(w + t∆w)‖ < ‖r(w)‖ ? Yes... but

Proof: ‖r(w + t∆w)‖ < ‖r(w)‖ holds for some t > 0 if

    d/dt ‖r(w + t∆w)‖² |_{t=0} < 0,   with ‖r(w)‖² differentiable,

i.e. if 2 r(w)ᵀ d/dt r(w + t∆w)|_{t=0} < 0. We have

    d/dt r(w + t∆w)|_{t=0} = ∇r(w)ᵀ ∆w = −∇r(w)ᵀ ∇r(w)⁻ᵀ r(w) = −r(w)

Then d/dt ‖r(w + t∆w)‖² |_{t=0} = −2 ‖r(w)‖² < 0.

How to select the step size t ∈ ]0, 1] ? Globalization...

  Line-search: reduce t until some criteria of progression on r are met
  Trust region: confine the step ∆w within a region where ∇r(w) provides a good model of r(w)
  Filter techniques: monitor progress on specific components of r(w) separately
  ...

... ensures that progress is made in one way or another. Note: most of these techniques are specific to optimization.

SLIDES 52-60

But still, Newton can fail...

Solve r(w) = 0

[Figure: Newton iterates on a strongly nonlinear r(w); the iteration stalls at a point where the tangent is flat]

Newton stops with r(w) ≠ 0 and ∇r(w) singular, i.e. the Newton direction ∆w given by ∇r(w)ᵀ ∆w = −r(w) is undefined...

This is a common failure mode for Newton-based solvers when tackling very nonlinear r and starting with a poor initial guess !!
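A two-line illustration of this failure mode (residual made up for the example): r(w) = w² − 1 started at w = 0 has ∇r(0) = 0 while r(0) = −1 ≠ 0, so the linear solve for the Newton direction fails outright:

```python
import numpy as np

r, jac_r = (lambda w: w**2 - 1.0), (lambda w: 2.0 * w)
w = np.array([0.0])        # poor initial guess: ∇r(0) = 0 while r(0) = -1 ≠ 0
try:
    dw = np.linalg.solve(np.atleast_2d(jac_r(w)), -np.atleast_1d(r(w)))
except np.linalg.LinAlgError:
    print("Newton direction undefined: Jacobian is singular at w =", w)
```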

SLIDES 61-69

Convergence of full-step Newton methods

Newton method:

    ∇r(w)ᵀ ∆w = −r(w),   w ← w + ∆w

Yields the iteration k = 0, 1, ...:  w_{k+1} ← w_k − ∇r(w_k)⁻ᵀ r(w_k)

Newton-type method (Jacobian approx.):

    M ∆w = −r(w),   w ← w + ∆w

Yields the iteration k = 0, 1, ...:  w_{k+1} ← w_k − M_k⁻¹ r(w_k)

Theorem: assume

  Nonlinearity of r:  ‖M_k⁻¹ (∇r(w)ᵀ − ∇r(w∗)ᵀ)‖ ≤ ω ‖w − w∗‖  for w ∈ [w_k, w⋆]
  Jacobian approximation error:  ‖M_k⁻¹ (∇r(w_k)ᵀ − M_k)‖ ≤ κ_k < 1
  Good initial guess:  ‖w_0 − w∗‖ ≤ (2/ω) (1 − max_k κ_k)

Then w_k → w∗ with the following linear-quadratic contraction in each iteration:

    ‖w_{k+1} − w∗‖ ≤ ( κ_k + (ω/2) ‖w_k − w∗‖ ) ‖w_k − w∗‖

What about reduced steps ? Slow convergence when t < 1 (damped phase). When full steps become feasible, fast convergence to the solution.
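The two contraction regimes are easy to see numerically. The sketch below (scalar residual chosen arbitrarily) runs the exact Newton iteration (κ_k = 0, quadratic contraction) against a Newton-type iteration with a fixed Jacobian approximation M (κ_k > 0, linear contraction):

```python
import numpy as np

r, jac_r = (lambda w: np.exp(w) - 2.0), (lambda w: np.exp(w))   # root: w* = ln 2
w_star = np.log(2.0)

for M in (None, jac_r(0.9)):          # None -> exact Jacobian; else fixed M
    w, errs = 0.0, []
    for _ in range(6):
        Mk = jac_r(w) if M is None else M
        w = w - r(w) / Mk             # w_{k+1} = w_k - M_k^{-1} r(w_k)
        errs.append(abs(w - w_star))
    print("exact Newton" if M is None else "fixed M     ", np.array(errs))
# exact Newton: the error roughly squares at each iteration (quadratic)
# fixed M:      the error shrinks by a roughly constant factor (linear)
```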

SLIDES 70-76

Newton methods - Short Survival Guide

Exact Newton method:  ∇r(w)ᵀ ∆w = −r(w),  w ← w + t∆w
Newton-type method:  M ∆w = −r(w),  w ← w + t∆w

  Exact Newton direction ∆w improves r for a sufficiently small step size t ∈ ]0, 1]
  Inexact Newton direction ∆w improves r for a sufficiently small step size t ∈ ]0, 1] if M > 0
  Exact full (t = 1) Newton steps converge quadratically if close enough to the solution
  Inexact full (t = 1) Newton steps converge linearly if close enough to the solution and if the Jacobian approximation is "sufficiently good"
  Newton iteration fails if ∇r becomes singular
  Newton methods with globalization converge in two phases: a damped (slow) phase when reduced steps (t < 1) are needed, then quadratic/linear convergence when full steps become possible.

SLIDE 77

Outline

1. KKT conditions - Quick Reminder
2. The Newton method
3. Newton on the KKT conditions
4. Sequential Quadratic Programming
5. Hessian approximation
6. Maratos effect

SLIDES 78-81

Core idea

A vast majority of solvers try to find a KKT point {w, µ, λ}, i.e.:

  Primal Feasibility: g(w) = 0, h(w) ≤ 0
  Dual Feasibility: ∇w L(w, µ, λ) = 0, µ ≥ 0
  Complementarity Slackness: µᵢ hᵢ(w) = 0, i = 1, ...

where L = Φ(w) + λᵀ g(w) + µᵀ h(w)

Let's consider for now equality-constrained problems, i.e. find w, λ s.t.:

    ∇w L(w, λ) = 0
    g(w) = 0

Idea: apply the Newton method to the KKT conditions, i.e. solve

    r(w, λ) = [∇w L(w, λ); g(w)] = 0

by iterating

    ∇r(w, λ)ᵀ [∆w; ∆λ] = −r(w, λ)

SLIDES 82-92

Newton method on the KKT conditions

KKT conditions:  r(w, λ) = [∇w L(w, λ); g(w)] = 0
Newton direction:  ∇r(w, λ)ᵀ [∆w; ∆λ] = −r(w, λ)

The direction is given by:

    ∇²w L(w, λ) ∆w + ∇w,λ L(w, λ) ∆λ = −∇w L(w, λ)
    ∇g(w)ᵀ ∆w = −g(w)

Using ∇w L(w, λ) = ∇Φ(w) + ∇g(w) λ (so that ∇w,λ L(w, λ) = ∇g(w)):

    ∇²w L(w, λ) ∆w + ∇g(w) ∆λ = −∇Φ(w) − ∇g(w) λ
    ∇g(w)ᵀ ∆w = −g(w)

and grouping λ + ∆λ on the left:

    ∇²w L(w, λ) ∆w + ∇g(w) (λ + ∆λ) = −∇Φ(w)
    ∇g(w)ᵀ ∆w = −g(w)

The Newton direction on the KKT conditions:

    [ H(w, λ)   ∇g(w) ] [ ∆w ]      [ ∇Φ(w) ]
    [ ∇g(w)ᵀ      0   ] [ λ⁺  ]  = − [  g(w) ]

    (KKT matrix, symmetric indefinite)

where H(w, λ) = ∇²w L(w, λ) is the Hessian of the problem. Note: the update of the dual variable is λ⁺ = λ + ∆λ.

  ∇w L(w, λ) is not needed for computing the Newton step
  The updated dual variables λ⁺ are readily provided !
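A sketch of one such step in Python (generic equality-constrained problem; H, grad_g, grad_Phi, g are user-supplied callbacks, named here for illustration):

```python
import numpy as np

def kkt_newton_step(w, lam, H, grad_g, grad_Phi, g):
    """One Newton step on the KKT conditions of  min_w Φ(w)  s.t.  g(w) = 0.

    Solves  [H  ∇g; ∇gᵀ  0] [∆w; λ⁺] = −[∇Φ; g]  and returns (w + ∆w, λ⁺).
    """
    Hk = H(w, lam)                                     # ∇²w L(w, λ)
    Gk = np.asarray(grad_g(w)).reshape(len(w), -1)     # ∇g(w) as an n x m matrix
    m = Gk.shape[1]
    KKT = np.block([[Hk, Gk], [Gk.T, np.zeros((m, m))]])   # symmetric indefinite
    rhs = -np.concatenate([grad_Phi(w), np.atleast_1d(g(w))])
    sol = np.linalg.solve(KKT, rhs)
    return w + sol[:len(w)], sol[len(w):]              # λ⁺ comes out of the solve

```

Note the design point from the slide: the system returns λ⁺ itself, so the dual update needs no separate computation of ∆λ.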

SLIDES 93-98

Newton Iteration for Optimization - Example

    min_w  ½ wᵀ [2 1; 1 4] w + wᵀ [1; 0]   s.t.   g(w) = wᵀw − 1 = 0

Iterate:

    [ H    ∇g ] [ ∆w ]      [ ∇Φ ]
    [ ∇gᵀ   0 ] [ λ⁺  ]  = − [  g ]

with:

    ∇g(w) = 2w = [2w₁; 2w₂]
    L(w, λ) = Φ(w) + λ g(w)
    ∇w L(w, λ) = [2 1; 1 4] w + [1; 0] + 2λw
    H(w, λ) = [2 + 2λ, 1; 1, 4 + 2λ]
    ∇Φ(w) = [2w₁ + w₂ + 1; w₁ + 4w₂]

[Figure: level curves of Φ(w) and the feasible circle wᵀw = 1 in the (w₁, w₂)-plane]

slide-98
SLIDE 98

Newton Iteration for Optimization - Example

Iterate:

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • with:

∇g (w) = 2w = 2w1 2w2

  • L (w, λ) = Φ (w) + λg (w)

∇wL (w, λ) = 2 1 1 4

  • w +

1

  • + 2λw

H (w, λ) = 2 + 2λ 1 1 4 + 2λ

  • ∇Φ (w) =
  • 2w1 + w2 + 1

w1 + 4w2

  • min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0
  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-99
SLIDE 99

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-100
SLIDE 100

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-101
SLIDE 101

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-102
SLIDE 102

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-103
SLIDE 103

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-104
SLIDE 104

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-105
SLIDE 105

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-106
SLIDE 106

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-107
SLIDE 107

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-108
SLIDE 108

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-109
SLIDE 109

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-110
SLIDE 110

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-111
SLIDE 111

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-112
SLIDE 112

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-113
SLIDE 113

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-114
SLIDE 114

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-115
SLIDE 115

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-116
SLIDE 116

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-117
SLIDE 117

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-118
SLIDE 118

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-119
SLIDE 119

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-120
SLIDE 120

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-121
SLIDE 121

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-122
SLIDE 122

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-123
SLIDE 123

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-124
SLIDE 124

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-125
SLIDE 125

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-126
SLIDE 126

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32

slide-127
SLIDE 127

Newton Iteration for Optimization - Example

Algorithm: Newton method

Input: guess w, λ while ∇L or g ≥ tol do Compute H (w, λ) , ∇g (w) , ∇Φ (w) , g (w) Compute Newton direction

  • H

∇g ∇gT ∆w λ+

  • = −

∇Φ g

  • ∆λ = λ+ − λ

Compute Newton step, t ∈ ]0, 1] w ← w + t∆w, λ ← λ + t∆λ return w, λ min

w

1 2wT 2 1 1 4

  • w + wT

1

  • s.t. g (w) = wTw − 1 = 0

Guess λ = 0, step t = 1

  • 2
  • 1

1 2

  • 2
  • 1.5
  • 1
  • 0.5

0.5 1 1.5 2

w1 w1 Your initial guess matters !!

  • S. Gros

Optimal Control with DAEs, lecture 5 17th of February, 2016 17 / 32
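The complete iteration for this example as a runnable sketch (full steps t = 1 and guess λ = 0 as on the slide; the initial w is an arbitrary choice):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 4.0]])        # quadratic term of Φ
b = np.array([1.0, 0.0])                      # linear term: Φ(w) = ½ wᵀAw + wᵀb

grad_Phi = lambda w: A @ w + b
g        = lambda w: w @ w - 1.0              # g(w) = wᵀw − 1
grad_g   = lambda w: 2.0 * w                  # ∇g(w) = 2w
H        = lambda w, lam: A + 2.0 * lam * np.eye(2)   # ∇²w L(w, λ)

w, lam, t = np.array([-1.0, 0.5]), 0.0, 1.0   # guess λ = 0, full steps t = 1
for k in range(20):
    if max(np.linalg.norm(grad_Phi(w) + lam * grad_g(w)), abs(g(w))) < 1e-10:
        break
    KKT = np.block([[H(w, lam), grad_g(w).reshape(2, 1)],
                    [grad_g(w).reshape(1, 2), np.zeros((1, 1))]])
    sol = np.linalg.solve(KKT, -np.concatenate([grad_Phi(w), [g(w)]]))
    w, lam = w + t * sol[:2], sol[2]          # λ⁺ is read off the solution directly
print(w, lam, g(w))                           # a KKT point on the circle wᵀw = 1

# Starting from other guesses, the iteration lands on other KKT points of the
# circle (or fails) - the slide's point: your initial guess matters.
```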


slide-134
SLIDE 134

Invertibility of the KKT matrix

The Newton direction of the KKT conditions:

[ H(w, λ)   ∇g(w) ] [ ∆w ]       [ ∇Φ(w) ]
[ ∇g(w)ᵀ      0   ] [ λ⁺ ]  =  − [  g(w) ]

The matrix on the left is the KKT matrix (symmetric indefinite).

The KKT matrix is invertible if:
  - ∇g(w) is full column rank (LICQ)
  - ∀d ≠ 0 such that ∇g(w)ᵀd = 0:  dᵀH(w, λ)d > 0 (SOSC)

If (w, λ) satisfies LICQ & SOSC, then the KKT matrix is invertible in a neighborhood of (w, λ). If LICQ & SOSC hold at the solution, then the Newton iteration is well defined in its neighborhood.

In practice, when the solution fails LICQ/SOSC, it is common to observe the solver struggling numerically, as the KKT matrix becomes increasingly ill-conditioned !!
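Both conditions are easy to test numerically. Below is a small sketch, assuming the convention used here that ∇g(w) stores the constraint gradients as columns; `kkt_matrix_checks` is a hypothetical helper name, not something from the lecture.

```python
import numpy as np
from scipy.linalg import null_space

def kkt_matrix_checks(H, J):
    """Test LICQ and SOSC for the KKT matrix [[H, J], [J.T, 0]].
    H: (n, n) Hessian of the Lagrangian; J: (n, m) Jacobian grad g(w),
    with the constraint gradients stored as columns."""
    licq = np.linalg.matrix_rank(J) == J.shape[1]      # full column rank
    Z = null_space(J.T)                                # basis of {d : J'd = 0}
    reduced = Z.T @ H @ Z                              # reduced Hessian
    sosc = Z.size == 0 or bool(np.all(np.linalg.eigvalsh(reduced) > 0))
    return licq, sosc

# Example of a well-posed point (both checks pass):
H = np.array([[2.0, 0.0], [0.0, 3.0]])
J = np.array([[1.0], [1.0]])
print(kkt_matrix_checks(H, J))   # (True, True)
```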


slide-139
SLIDE 139

Quadratic model interpretation

Problem:   min_w Φ(w)   s.t.   g(w) = 0

The Newton direction is given by:

[ H(w, λ)   ∇g(w) ] [ ∆w ]       [ ∇Φ(w) ]
[ ∇g(w)ᵀ      0   ] [ λ⁺ ]  =  − [  g(w) ]

The Newton direction is also given by the Quadratic Program (QP):

min_∆w   ½ ∆wᵀ H(w, λ) ∆w + ∇Φ(w)ᵀ ∆w
s.t.     g(w) + ∇g(w)ᵀ ∆w = 0

Dual variables λ⁺ are given by the dual variables of the QP, i.e. λ⁺ = λQP.

Proof: the KKT conditions of the QP are equivalent to the system providing the Newton direction.

The Newton direction is given by solving a quadratic model of the original problem !!
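The equivalence can be checked numerically: solve the KKT linear system directly, then solve the same QP with a generic constrained solver and compare the two directions. The numbers below are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data at the current iterate (made-up numbers)
H = np.array([[4.0, 1.0], [1.0, 3.0]])   # Hessian of the Lagrangian
J = np.array([[1.0], [2.0]])             # grad g(w), (n, m)
grad = np.array([1.0, -2.0])             # grad Phi(w)
gval = np.array([0.5])                   # g(w)

# 1) Newton direction from the KKT linear system
kkt = np.block([[H, J], [J.T, np.zeros((1, 1))]])
sol = np.linalg.solve(kkt, -np.concatenate([grad, gval]))
dw_newton = sol[:2]

# 2) Same direction from the QP, solved by a generic NLP solver
qp = minimize(lambda d: 0.5 * d @ H @ d + grad @ d,
              x0=np.zeros(2), method="SLSQP",
              constraints={"type": "eq",
                           "fun": lambda d: gval + J.T @ d})
print(dw_newton, qp.x)   # the two directions coincide (up to tolerance)
```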

slide-140
SLIDE 140

Outline

1 KKT conditions - Quick Reminder
2 The Newton method
3 Newton on the KKT conditions
4 Sequential Quadratic Programming
5 Hessian approximation
6 Maratos effect


slide-143
SLIDE 143

What about inequality constraints ?

Find the "primal-dual" variables w, µ, λ such that:

Primal Feasibility: g(w) = 0, h(w) ≤ 0
Dual Feasibility: ∇wL(w, µ, λ) = 0, µ ≥ 0
Complementarity Slackness: µᵢhᵢ(w) = 0, i = 1, ...

[Figure: solution manifold of µᵢhᵢ(w) = 0 in the (hᵢ(w), µᵢ)-plane, with the branch where hᵢ(w) is active (hᵢ = 0, µᵢ ≥ 0) and the branch where it is not active (hᵢ < 0, µᵢ = 0)]

The manifold generated by the Complementarity Slackness condition is not smooth, so Newton cannot be used directly !!


slide-146
SLIDE 146

Quadratic model interpretation

NLP:   min_w Φ(w)   s.t.   g(w) = 0

The Newton direction is given by:

[ H(w, λ)   ∇g(w) ] [ ∆w ]       [ ∇Φ(w) ]
[ ∇g(w)ᵀ      0   ] [ λ⁺ ]  =  − [  g(w) ]

with H(w, λ) = ∇²w L(w, λ)

The Newton direction is given by the Quadratic Program (QP):

min_∆w   ½ ∆wᵀ H(w, λ) ∆w + ∇Φ(w)ᵀ ∆w
s.t.     g(w) + ∇g(w)ᵀ ∆w = 0

Dual variables λ⁺ are given by the dual variables of the QP, i.e. λ⁺ = λQP.


slide-149
SLIDE 149

Quadratic interpretation for inequality constraints

Problem:   min_w Φ(w)   s.t.   g(w) = 0,   h(w) ≤ 0

The Newton direction is given by the Quadratic Program (QP):

min_∆w   ½ ∆wᵀ H(w, λ, µ) ∆w + ∇Φ(w)ᵀ ∆w
s.t.     g(w) + ∇g(w)ᵀ ∆w = 0
         h(w) + ∇h(w)ᵀ ∆w ≤ 0

with H(w, λ, µ) = ∇²w L(w, λ, µ)

Dual variables λ⁺ and µ⁺ are given by the dual variables of the QP, i.e. λ⁺ = λQP, µ⁺ = µQP.

slide-150
SLIDE 150

SQP Algorithm

Algorithm: SQP with line-search

Input: guess w, λ, µ
while ‖∇L‖∞ or ‖g‖∞ or max(0, hᵢ) ≥ tol do
    Compute g, h, ∇Φ(w), ∇g(w), ∇h(w), H(w, λ, µ)
    Compute the Newton direction by solving the QP
        min_∆w   ½ ∆wᵀ H(w, λ, µ) ∆w + ∇Φ(w)ᵀ ∆w
        s.t.     g(w) + ∇g(w)ᵀ ∆w = 0
                 h(w) + ∇h(w)ᵀ ∆w ≤ 0
    Select step size t to ensure progress (c.f. globalization / line-search)
    Take primal step: w ← w + t∆w
    Take dual step: λ ← (1 − t)λ + tλQP,   µ ← (1 − t)µ + tµQP
return w, λ, µ
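For experimenting, note that SciPy's SLSQP method implements an SQP scheme of this type (with a line search and a quasi-Newton Hessian). A minimal sketch on a made-up instance of the illustration problem below, with Q = I for simplicity; the two inequality constraints are an arbitrary choice, not the ones used on the slides.

```python
import numpy as np
from scipy.optimize import minimize

# Toy NLP (made-up data): min 0.5 * ||w - w0||^2  s.t.  h(w) <= 0
w0 = np.array([2.0, 1.5])

def phi(w):
    return 0.5 * (w - w0) @ (w - w0)

# Two smooth inequality constraints, h(w) <= 0.
# SciPy's convention is fun(w) >= 0, hence the sign flip below.
def h(w):
    return np.array([w @ w - 1.0,    # stay inside the unit circle
                     -w[0]])         # keep w[0] >= 0

res = minimize(phi, x0=np.array([0.5, 0.5]), method="SLSQP",
               constraints={"type": "ineq", "fun": lambda w: -h(w)})
print(res.x, res.fun)
```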

slide-151
SLIDE 151

SQP - Illustration

NLP:   min_w   ½ ‖w − w0‖²_Q   s.t.   h(w) ≤ 0

QP:    min_∆w   ½ ∆wᵀ H(w, µ) ∆w + ∇Φ(w)ᵀ ∆w   s.t.   h(w) + ∇h(w)ᵀ ∆w ≤ 0

Hessian:   H(w, µ) = ∇²w Φ(w) + ∇²w (µᵀh(w))

Each iteration: linearize the constraints, build the QP (contours of the QP cost), take the full step t = 1. The multipliers and KKT residual over the iterations:

Iter.   µ1        µ2        Res.
1       0         0         2.21
2       0.48789   0.4285    0.4025
3       0.60331   0.25659   0.12986
4       0.68358   0.16672   0.013072
5       0.68999   0.16001   3.0916e-05
6       0.69      0.16      6.0624e-11

[Figure: KKT residual vs. iteration (log scale) and step-size t vs. iteration over the 6 SQP iterations, with full steps t = 1 throughout]



slide-177
SLIDE 177

Some remarks

The Newton direction is given by the Quadratic Program (QP):

min_∆w   ½ ∆wᵀ H(w, λ, µ) ∆w + ∇Φ(w)ᵀ ∆w
s.t.     g(w) + ∇g(w)ᵀ ∆w = 0
         h(w) + ∇h(w)ᵀ ∆w ≤ 0

with H(w, λ, µ) = ∇²w L(w, λ, µ)

SQP inherits the convergence properties of the Newton method.

What happens if SOSC fails during the iterations ? I.e. for an iterate w, λ, µ:
dᵀH(w, λ, µ)d ≯ 0 for some d ≠ 0 being a critical feasible direction ?

QP unbounded !! Heuristics are used in SQP methods to modify H(w, λ, µ) and recover an adequate curvature in the QP cost (regularization).
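A minimal sketch of one such regularization heuristic, assuming eigenvalue clipping; other choices (e.g. adding a multiple of the identity until the curvature is adequate) are equally common, and the lecture does not commit to a specific one.

```python
import numpy as np

def regularize_hessian(H, delta=1e-6):
    """One common heuristic (assumed here, not the lecture's specific choice):
    clip the eigenvalues of the symmetric H so the QP cost has curvature >= delta."""
    lam, V = np.linalg.eigh(H)            # spectral decomposition of H
    lam_reg = np.maximum(lam, delta)      # clip negative / tiny eigenvalues
    return V @ np.diag(lam_reg) @ V.T

H = np.array([[1.0, 0.0], [0.0, -2.0]])  # indefinite Hessian (made-up)
H_reg = regularize_hessian(H)
print(np.linalg.eigvalsh(H_reg))         # all eigenvalues >= 1e-6 now
```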

slide-178
SLIDE 178

Outline

1 KKT conditions - Quick Reminder
2 The Newton method
3 Newton on the KKT conditions
4 Sequential Quadratic Programming
5 Hessian approximation
6 Maratos effect


slide-186
SLIDE 186

Newton-type Methods - Gauss-Newton Hessian approximation

Cost function of the type Φ(w) = ½‖R(w)‖², with R(w) ∈ ℝᵐ

Gauss-Newton Hessian approximation

Observe that:

∇²w Φ(w) = ∂/∂w ( ∇R(w) R(w) ) = ∇R(w)∇R(w)ᵀ + ∑ᵢ₌₁ᵐ ∇²Rᵢ(w) Rᵢ(w)

The Gauss-Newton method proposes to use:   Bk = ∇R(wk)∇R(wk)ᵀ + αk I

Bk is a good approximation if:
  - constraints are close to linear or Φ(w⋆) ≈ 0 (implies λ, µ ≈ 0), and
  - all ∇²Rᵢ(w) are small (R close to linear), or all Rᵢ(w) are small, i.e. Φ(w⋆) ≈ 0

Typical application to tracking & fitting problems: R(w) = y(w) − ȳ

Convergence: if Φ(wk) → 0 then κk → 0. Can get superlinear convergence...
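A minimal sketch of Gauss-Newton on a made-up exponential-fitting problem of the form R(w) = y(w) − ȳ; the model, the data, and αk = 1e-8 are assumptions for illustration.

```python
import numpy as np

# Gauss-Newton for a least-squares fit: Phi(w) = 0.5 * ||R(w)||^2
# Model y(t; w) = w0 * exp(w1 * t), fitted to noisy samples (made-up data).
t = np.linspace(0.0, 1.0, 20)
y_bar = 2.0 * np.exp(-1.5 * t) + 0.01 * np.random.default_rng(0).standard_normal(20)

def residual(w):                      # R(w) = y(t; w) - y_bar
    return w[0] * np.exp(w[1] * t) - y_bar

def jac(w):                           # (m, n) Jacobian dR/dw
    return np.column_stack([np.exp(w[1] * t),
                            w[0] * t * np.exp(w[1] * t)])

w = np.array([1.0, -1.0])             # reasonable initial guess
for _ in range(20):
    R, J = residual(w), jac(w)
    B = J.T @ J + 1e-8 * np.eye(2)    # Gauss-Newton Hessian, small alpha * I
    dw = np.linalg.solve(B, -J.T @ R) # Newton-type step on grad Phi = J'R
    w = w + dw
    if np.linalg.norm(dw) < 1e-10:
        break
print(w)   # close to (2.0, -1.5)
```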


slide-190
SLIDE 190

Newton-type Methods - BFGS

Build a numerical approximation of the Hessian H(w) in an efficient (iterative) way.

BFGS

Define:  sk = wk+1 − wk,   yk = ∇L(wk+1) − ∇L(wk)

Idea: update Bk → Bk+1 such that Bk+1 sk = yk (secant condition)

BFGS formula:

Bk+1 = Bk − (Bk s sᵀ Bk)/(sᵀ Bk s) + (y yᵀ)/(sᵀ y),   B0 = I

See "Powell's trick" to make sure that Bk+1 > 0.

Convergence: it can be shown that Bk → H(w), then κk → 0. Can get superlinear convergence...
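A minimal sketch of the BFGS update with Powell's damping (the standard textbook variant of "Powell's trick", with the usual 0.2 threshold); it keeps Bk+1 positive definite even when the curvature sᵀy is negative.

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS update with Powell's damping so that B stays positive definite.
    The 0.2 threshold follows the common textbook choice."""
    sBs = s @ B @ s
    sy = s @ y
    if sy < 0.2 * sBs:                   # curvature condition violated:
        theta = 0.8 * sBs / (sBs - sy)   # damp y towards B s
        y = theta * y + (1.0 - theta) * (B @ s)
        sy = s @ y                       # now sy >= 0.2 * sBs > 0
    Bs = B @ s
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / sy
```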

slide-191
SLIDE 191

Outline

1 KKT conditions - Quick Reminder
2 The Newton method
3 Newton on the KKT conditions
4 Sequential Quadratic Programming
5 Hessian approximation
6 Maratos effect


slide-193
SLIDE 193

Maratos effect - Some NLPs can yield "creeping" convergence

[Figure: residual vs. iteration (log scale, ~35 iterations) and step-size t vs. iteration; the residual creeps down slowly]

What is going on ?!? This is a case of the Maratos effect; it can happen with nonlinear constraints...


slide-196
SLIDE 196

Maratos effect

Define: w = (u, v)ᵀ. Consider the NLP:

min_{u,v}   Φ = 3v² − 2u   s.t.   g = u − v² = 0

Optimum: w∗ = (0, 0)ᵀ.

Consider the iterate wk = (a², a)ᵀ. The Newton step is:

∆wk = −(2a², a)ᵀ   for λ = 2...

The full Newton step wk+1 = wk + ∆wk is much closer to w∗ than wk.

[Figure: the constraint g = 0 in the (u, v)-plane, with wk, the step ∆wk, wk+1 and w∗; the full step lands close to w∗ but off the constraint]

But: Φ(wk+1) > Φ(wk) and |g(wk+1)| > |g(wk)|. No penalty function can accept ∆wk !!
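A quick numeric check of these claims (a = 0.2 is an arbitrary choice): the full step reduces the distance to w∗ while increasing both the cost and the constraint violation.

```python
import numpy as np

# Maratos example from the slide: min 3v^2 - 2u  s.t.  g = u - v^2 = 0
a = 0.2
wk = np.array([a**2, a])             # iterate on the constraint, g(wk) = 0
dwk = -np.array([2 * a**2, a])       # Newton step for lambda = 2
wk1 = wk + dwk                       # full step, wk1 = (-a^2, 0)

phi = lambda w: 3 * w[1]**2 - 2 * w[0]
g = lambda w: w[0] - w[1]**2

print(np.linalg.norm(wk), np.linalg.norm(wk1))  # wk1 much closer to w* = 0
print(phi(wk), phi(wk1))                        # a^2 -> 2a^2 : cost increases
print(abs(g(wk)), abs(g(wk1)))                  # 0 -> a^2 : infeasibility increases
```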