Constrained Nonlinear Optimization, Moritz Diehl & Sébastien Gros (PowerPoint presentation)

Constrained Nonlinear Optimization

Moritz Diehl & Sébastien Gros

  • S. Gros, M. Diehl

1 / 12

Outline

1 KKT conditions
2 Some intuitions on the KKT conditions
3 Second Order Sufficient Conditions (SOSC)

1 KKT conditions

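Algebraic Characterization of Unconstrained Local Optima

Consider the unconstrained problem: $\min_w \; \Phi(w)$

1st-Order Necessary Condition of Optimality (FONC)

$w^*$ local optimum $\Rightarrow$ $\nabla\Phi(w^*) = 0$, i.e. $w^*$ is a stationary point

2nd-Order Sufficient Conditions of Optimality (SOSC)

$\nabla\Phi(w^*) = 0$ and $\nabla^2\Phi(w^*) \succ 0$ $\Rightarrow$ $w^*$ is a strict local minimum
$\nabla\Phi(w^*) = 0$ and $\nabla^2\Phi(w^*) \prec 0$ $\Rightarrow$ $w^*$ is a strict local maximum
If $\nabla^2\Phi(w^*)$ is indefinite, the SOSC do not apply: $w^*$ is then a saddle point, neither a local minimum nor a local maximum. If $\nabla^2\Phi(w^*)$ is only semidefinite, no conclusion can be drawn from second-order information.

Note:
$\nabla\Phi(w^*) = 0$ $\Rightarrow$ there is no $d$ such that $\nabla\Phi(w^*)^\top d < 0$
$\nabla^2\Phi(w^*) \succ 0$ $\Rightarrow$ $d^\top \nabla^2\Phi(w^*)\, d > 0$ for all $d \neq 0$
Local optimum: "No direction $d$ can improve the cost (locally)"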
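To make the unconstrained FONC/SOSC test concrete, here is a minimal NumPy sketch that classifies a candidate point from its gradient and Hessian. The function name `classify_stationary_point` and the tolerance `tol` are illustrative assumptions, not part of the slides. Besides the two SOSC cases, it separates an indefinite Hessian (a saddle point) from a singular semidefinite one, where second-order information alone cannot decide.

```python
import numpy as np

def classify_stationary_point(grad, hess, tol=1e-8):
    """Apply the unconstrained FONC and SOSC at a candidate point w*.

    grad: gradient of Phi at w*, shape (n,)
    hess: Hessian of Phi at w*, shape (n, n), assumed symmetric
    """
    if np.linalg.norm(grad) > tol:
        return "not stationary"            # FONC violated: a descent direction exists
    eig = np.linalg.eigvalsh(hess)          # real eigenvalues of the symmetric Hessian
    if np.all(eig > tol):
        return "strict local minimum"       # Hessian positive definite
    if np.all(eig < -tol):
        return "strict local maximum"       # Hessian negative definite
    if eig.min() < -tol and eig.max() > tol:
        return "saddle point"               # indefinite Hessian
    return "inconclusive"                   # singular semidefinite: higher-order terms decide

print(classify_stationary_point(np.zeros(2), np.diag([2.0, -2.0])))  # saddle point
print(classify_stationary_point(np.zeros(2), np.diag([0.0, 2.0])))   # inconclusive
```

For instance, $\Phi(w) = w_1^2 - w_2^2$ at $w = 0$ has Hessian $\mathrm{diag}(2, -2)$ and is a saddle point, while $\Phi(w) = w_1^4 + w_2^2$ at $w = 0$ has the singular Hessian $\mathrm{diag}(0, 2)$: the test is inconclusive although the point is a strict minimum, which illustrates that the SOSC are sufficient but not necessary.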

FONC for equality constraints

Consider the NLP problem: $\min_w \; \Phi(w)$ s.t. $g(w) = 0$, with $w \in \mathbb{R}^n$ and $g(w) \in \mathbb{R}^m$

Definition: a point $w$ satisfies LICQ (Linear Independence Constraint Qualification) iff $\nabla g(w) \in \mathbb{R}^{n \times m}$ has full column rank, i.e. the $m$ constraint gradients are linearly independent

First-order Necessary Conditions

Let $\Phi, g \in C^1$. If $w^*$ is a (local) optimum and $w^*$ satisfies LICQ, then there is a unique vector $\lambda^*$ such that:
Dual feasibility: $\nabla\Phi(w^*) + \nabla g(w^*)\,\lambda^* = 0$
Primal feasibility: $g(w^*) = 0$

Square system: $n + m$ conditions in the $n + m$ variables $(w, \lambda)$
Lagrange multipliers: $\lambda_i \leftrightarrow g_i$ (one multiplier per equality constraint)
Dual feasibility ≡ Lagrangian stationarity: $\nabla_w L(w^*, \lambda^*) = 0$, where $L(w, \lambda) = \Phi(w) + \lambda^\top g(w)$ is the Lagrangian
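The square system of $n + m$ equations can be solved with Newton's method. The sketch below is one standard way to do this, not necessarily the method used in the course: it uses NumPy, takes user-supplied callbacks (hypothetical names `grad_phi`, `hess_lag`, `g`, `jac_g`), stores the constraint Jacobian row-wise so that the slides' $\nabla g(w)$ is `jac_g(w).T`, and takes full Newton steps without any line search, so it should only be expected to converge from a start close enough to a KKT point.

```python
import numpy as np

def newton_kkt(grad_phi, hess_lag, g, jac_g, w0, lam0, tol=1e-10, max_iter=50):
    """Full-step Newton method on the square KKT system of min Phi(w) s.t. g(w) = 0.

    jac_g(w) returns the m x n Jacobian, so nabla g(w) in the slides is jac_g(w).T.
    hess_lag(w, lam) returns the Hessian of L = Phi + lam^T g with respect to w.
    """
    w = np.asarray(w0, dtype=float)
    lam = np.asarray(lam0, dtype=float)
    n, m = w.size, lam.size
    for _ in range(max_iter):
        J = jac_g(w)
        residual = np.concatenate([grad_phi(w) + J.T @ lam,   # dual feasibility
                                   g(w)])                      # primal feasibility
        if np.linalg.norm(residual) < tol:
            break
        kkt_matrix = np.block([[hess_lag(w, lam), J.T],
                               [J, np.zeros((m, m))]])         # (n+m) x (n+m)
        step = np.linalg.solve(kkt_matrix, -residual)
        w, lam = w + step[:n], lam + step[n:]
    return w, lam

# Illustrative problem: min w1 + w2  s.t.  g(w) = w1^2 + w2^2 - 2 = 0
problem = dict(
    grad_phi=lambda w: np.array([1.0, 1.0]),
    hess_lag=lambda w, lam: 2.0 * lam[0] * np.eye(2),   # nabla^2 Phi = 0, nabla^2 g = 2 I
    g=lambda w: np.array([w @ w - 2.0]),
    jac_g=lambda w: 2.0 * w.reshape(1, -1),
)
w_star, lam_star = newton_kkt(**problem, w0=[-1.2, -0.8], lam0=[1.0])
print(w_star, lam_star)
```

From this start the iterates approach $w^* = (-1, -1)$ with $\lambda^* = 0.5$. The point $(1, 1)$ with $\lambda = -0.5$ solves the same square system (it is the constrained maximum), so the KKT point Newton finds depends on the initial guess.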

KKT point

Consider the NLP problem: $\min_w \; \Phi(w)$ s.t. $g(w) = 0$, $h(w) \le 0$

A point $(w^*, \mu^*, \lambda^*)$ is called a KKT point if it satisfies:
Dual feasibility: $\nabla_w L(w^*, \mu^*, \lambda^*) = 0$, $\mu^* \ge 0$
Primal feasibility: $g(w^*) = 0$, $h(w^*) \le 0$
Complementary slackness: $\mu_i^* h_i(w^*) = 0$ for all $i$
where $L(w, \mu, \lambda) = \Phi(w) + \lambda^\top g(w) + \mu^\top h(w)$
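Each group of KKT conditions translates directly into a residual that can be checked numerically at a candidate point. The following NumPy sketch assumes that the gradient, the Jacobians (one row per constraint, so $\nabla g$ = `jac_g.T`) and the constraint values are already evaluated at $w^*$; the function names, the tolerance and the example problem are illustrative.

```python
import numpy as np

def _max_or_zero(x):
    # Largest entry of x, or 0.0 when there are no constraints of that type
    x = np.asarray(x, dtype=float)
    return float(x.max()) if x.size else 0.0

def kkt_residuals(grad_phi, jac_g, jac_h, g_val, h_val, lam, mu):
    """Violation of each KKT condition at a candidate (w*, mu*, lam*)."""
    grad_lag = grad_phi + jac_g.T @ lam + jac_h.T @ mu           # nabla_w L
    return {
        "dual_feasibility": max(_max_or_zero(np.abs(grad_lag)),   # nabla_w L = 0
                                _max_or_zero(-mu), 0.0),           # mu >= 0
        "primal_feasibility": max(_max_or_zero(np.abs(g_val)),    # g = 0
                                  _max_or_zero(h_val), 0.0),      # h <= 0
        "complementary_slackness": _max_or_zero(np.abs(mu * h_val)),
    }

def is_kkt_point(grad_phi, jac_g, jac_h, g_val, h_val, lam, mu, tol=1e-8):
    res = kkt_residuals(grad_phi, jac_g, jac_h, g_val, h_val, lam, mu)
    return all(r <= tol for r in res.values())

# Illustrative problem: min (w1-2)^2 + (w2-2)^2  s.t.  w1 - w2 = 0,  w1 + w2 - 2 <= 0
w = np.array([1.0, 1.0])
args = dict(grad_phi=2.0 * (w - 2.0),          # gradient of Phi at w
            jac_g=np.array([[1.0, -1.0]]),
            jac_h=np.array([[1.0, 1.0]]),
            g_val=np.array([w[0] - w[1]]),
            h_val=np.array([w[0] + w[1] - 2.0]))
print(is_kkt_point(**args, lam=np.array([0.0]), mu=np.array([2.0])))  # True
print(is_kkt_point(**args, lam=np.array([0.0]), mu=np.array([1.0])))  # False
```

At $w^* = (1, 1)$ the inequality is active, and the multipliers $\lambda^* = 0$, $\mu^* = 2$ balance $\nabla\Phi(w^*) = (-2, -2)$; with $\mu = 1$ stationarity fails.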

First-Order Necessary Conditions (FONC)

$\min_w \; \Phi(w)$ s.t. $g(w) = 0$, $h(w) \le 0$

Definition: a point $w$ satisfies LICQ iff $[\nabla g(w),\ \nabla h_{\mathcal{A}^*}(w)]$ has full column rank, where $\mathcal{A}^*$ is the active set defined below

First-Order Necessary Conditions

Let $\Phi, g, h$ be $C^1$. If $w^*$ is a (local) optimum and satisfies LICQ, then there are unique vectors $\lambda^*$ and $\mu^*$ such that $(w^*, \mu^*, \lambda^*)$ is a KKT point.

Active constraints:
$h_i(w^*) < 0$ $\Rightarrow$ $\mu_i^* = 0$ (by complementary slackness), and $h_i$ is inactive
$\mu_i^* > 0$ and $h_i(w^*) = 0$ $\Rightarrow$ $h_i$ is strictly active
$\mu_i^* = 0$ and $h_i(w^*) = 0$ $\Rightarrow$ $h_i$ is weakly active

We define the active set $\mathcal{A}^*$ as the set of indices $i$ of the active constraints, i.e. those with $h_i(w^*) = 0$ (strictly or weakly active).
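The classification of inequality constraints and the LICQ test can be sketched as follows. This illustrative NumPy version assumes the multipliers are already known (e.g. returned by a solver), treats a constraint value as zero within a tolerance `tol`, and stores Jacobians row-wise, so the stacked rows are the transpose of $[\nabla g(w), \nabla h_{\mathcal{A}^*}(w)]$: full column rank of the latter is full row rank of the former. The function names are hypothetical.

```python
import numpy as np

def classify_inequalities(h_val, mu, tol=1e-9):
    """Label each h_i at a KKT point as inactive, strictly active or weakly active."""
    labels = []
    for h_i, mu_i in zip(h_val, mu):
        if h_i < -tol:
            labels.append("inactive")          # h_i(w*) < 0, hence mu_i* = 0
        elif mu_i > tol:
            labels.append("strictly active")   # h_i(w*) = 0 and mu_i* > 0
        else:
            labels.append("weakly active")     # h_i(w*) = 0 and mu_i* = 0
    return labels

def satisfies_licq(jac_g, jac_h, h_val, tol=1e-9):
    """LICQ: the gradients of all g_j and of the active h_i are linearly independent."""
    active = np.abs(np.asarray(h_val, dtype=float)) <= tol   # membership in the active set A*
    rows = np.vstack([jac_g, jac_h[active]])                 # transpose of [nabla g, nabla h_A]
    if rows.shape[0] == 0:
        return True                                          # no constraints to check
    return bool(np.linalg.matrix_rank(rows) == rows.shape[0])

print(classify_inequalities([0.0, 0.0, -1.0], [2.0, 0.0, 0.0]))
# ['strictly active', 'weakly active', 'inactive']
```

With `jac_g = [[1, -1]]` and `jac_h = [[1, 1], [-1, 1]]`, LICQ holds when only the first inequality is active, but fails when both are: three gradients in $\mathbb{R}^2$ cannot be linearly independent.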

2 Some intuitions on the KKT conditions

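Some intuitions on the KKT conditions

$\min_w \; \Phi(w)$ s.t. $h(w) \le 0$

Ball rolling down a valley blocked by a fence
$-\nabla\Phi$ is the gravity
$-\mu\nabla h$ is the force of the fence. The sign condition $\mu \ge 0$ means the fence can only "push" the ball.
Balance of the forces: $\nabla_w L = \nabla\Phi(w) + \mu\nabla h(w) = 0$
Weakly active constraint: $h(w) = 0$, $\mu = 0$: the ball touches the fence, but no force is needed.
Inactive constraint: $h(w) < 0$, $\mu = 0$
Complementary slackness $\mu h = 0$ describes a contact problem (a force exists only if the ball touches the fence)

[Figure (animation over several frames): the ball in the $(w_1, w_2)$ plane, the fence $h(w) \le 0$, gravity $-\nabla\Phi(w)$ and fence force $-\mu\nabla h(w)$. The frames show $\mu$ = 0.77376, 0.62894, 0.49307, 0.35875, 0.24579, 0.13582 and 0.063056 (strictly active), then $\mu = 0$ with $h(w) = 0$ and $\nabla\Phi(w) = 0$ (weakly active), then $\mu = 0$ with $h(w) < 0$ and $\nabla\Phi(w) = 0$ (inactive).]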
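The ball-and-fence picture can be reproduced on a toy problem where everything is available in closed form. The sketch assumes (this is not the example drawn on the slide) a quadratic valley $\Phi(w) = \tfrac{1}{2}\lVert w - c\rVert^2$ with bottom $c$ and a straight fence $h(w) = a^\top w - b \le 0$. The minimizer is the projection of $c$ onto the half-plane, and the force balance $(w^* - c) + \mu^* a = 0$ together with $h(w^*) = 0$ gives $\mu^* = \max\big(0, (a^\top c - b)/\lVert a\rVert^2\big)$. The function name `ball_against_fence` is hypothetical.

```python
import numpy as np

def ball_against_fence(c, a, b):
    """Minimize Phi(w) = 0.5*||w - c||^2 s.t. h(w) = a^T w - b <= 0 (toy example).

    Returns the rest position w*, the fence force multiplier mu*, and the
    status of the constraint at w*.
    """
    c, a = np.asarray(c, dtype=float), np.asarray(a, dtype=float)
    mu = max(0.0, float((a @ c - b) / (a @ a)))   # the fence can only push: mu >= 0
    w = c - mu * a                                 # gravity -(w - c) balanced by -mu * a
    h = a @ w - b
    if mu > 0:
        status = "strictly active"
    elif np.isclose(h, 0.0):
        status = "weakly active"
    else:
        status = "inactive"
    return w, mu, status

for c in ([2.0, 0.5], [1.0, 0.5], [0.0, 0.5]):
    print(ball_against_fence(c, a=[1.0, 0.0], b=1.0)[1:])
# (1.0, 'strictly active')
# (0.0, 'weakly active')
# (0.0, 'inactive')
```

Moving the valley bottom $c$ from outside the feasible half-plane onto the fence and then inside it produces the three cases in turn, and in every case $\mu^* h(w^*) = 0$.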

3 Second Order Sufficient Conditions (SOSC)

Second-Order Sufficient Conditions for a Local Minimum

A point $(w^*, \mu^*, \lambda^*)$ is called a KKT point if it satisfies:
Dual feasibility: $\nabla_w L(w^*, \mu^*, \lambda^*) = 0$, $\mu^* \ge 0$
Primal feasibility: $g(w^*) = 0$, $h(w^*) \le 0$
Complementary slackness: $\mu_i^* h_i(w^*) = 0$ for all $i$
where $L(w, \mu, \lambda) = \Phi(w) + \lambda^\top g(w) + \mu^\top h(w)$

Let $\Phi, g, h$ be $C^2$. Suppose that $w^*$ is regular (satisfies LICQ) and that there exist $\lambda^*, \mu^*$ such that $(w^*, \mu^*, \lambda^*)$ is a KKT point.
Set of feasible directions: $\mathcal{F} = \{\, d \mid \nabla g(w^*)^\top d = 0,\ \nabla h_i(w^*)^\top d \le 0 \ \ \forall i \in \mathcal{A}^* \,\}$
If, for all $d \in \mathcal{F} \setminus \{0\}$ with $\nabla h_i(w^*)^\top d = 0$ for every $i$ with $\mu_i^* > 0$, the inequality
$d^\top \nabla_w^2 L(w^*, \mu^*, \lambda^*)\, d > 0$
holds, then $w^*$ is a strict local minimum.
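Numerically, the SOSC are often checked through a reduced Hessian. The NumPy sketch below (function names are illustrative) assumes a KKT point has already been found. It builds an orthonormal basis $Z$ of the directions $d$ with $\nabla g(w^*)^\top d = 0$ and $\nabla h_i(w^*)^\top d = 0$ for the strictly active constraints, then tests $Z^\top \nabla_w^2 L\, Z \succ 0$. Without weakly active constraints this subspace is exactly the set of critical directions. With weakly active constraints, the critical directions form a cone inside it, so `True` still proves the SOSC, while `False` only means they could not be verified this way.

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Orthonormal basis (as columns) of {d : A d = 0}, computed via the SVD."""
    n = A.shape[1]
    if A.shape[0] == 0:
        return np.eye(n)
    _, s, vt = np.linalg.svd(A)          # full_matrices=True, so vt is n x n
    rank = int(np.sum(s > tol))
    return vt[rank:].T

def sosc_holds(hess_lag, jac_g, jac_h, h_val, mu, tol=1e-9):
    """Sufficient SOSC test via the reduced Hessian Z^T (nabla_w^2 L) Z."""
    h_val, mu = np.asarray(h_val, dtype=float), np.asarray(mu, dtype=float)
    strict = (np.abs(h_val) <= tol) & (mu > tol)     # strictly active inequalities
    A = np.vstack([jac_g, jac_h[strict]])            # gradients that d must be orthogonal to
    Z = null_space_basis(A)
    if Z.shape[1] == 0:
        return True                                  # only d = 0 remains: holds trivially
    return bool(np.linalg.eigvalsh(Z.T @ hess_lag @ Z).min() > tol)

# Phi(w) = w1^2 - w2^2 at w* = 0 with lambda* = 0: the Hessian of L is indefinite
H = np.diag([2.0, -2.0])
no_h, empty = np.zeros((0, 2)), np.zeros(0)
print(sosc_holds(H, np.array([[0.0, 1.0]]), no_h, empty, empty))  # g(w) = w2: True
print(sosc_holds(H, np.array([[1.0, 0.0]]), no_h, empty, empty))  # g(w) = w1: False
```

The example shows why the SOSC only look at feasible directions: the same indefinite Hessian gives a strict local minimum on the line $w_2 = 0$ and a maximum on the line $w_1 = 0$.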

Summary of Optimality Conditions

Optimality conditions for NLPs with equality and/or inequality constraints:
1st-Order Necessary Conditions: a regular local optimum of a (differentiable) NLP is a KKT point.
2nd-Order Sufficient Conditions: they require positivity of the Hessian of the Lagrangian, $d^\top \nabla_w^2 L\, d > 0$, in all critical feasible directions $d \neq 0$.
Non-convex problem ⇒ a local minimum is not necessarily global. Some non-convex problems nonetheless have a unique minimum.

Some important practical consequences:
A local (or global) optimum may not be a KKT point.
A KKT point may not be a local (or global) optimum.
This lack of equivalence results from a lack of regularity and/or of the SOSC.
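Two standard one-dimensional examples, chosen here for illustration and not taken from the slides, make the last two consequences concrete: in (a) LICQ fails at the global minimum, so it is not a KKT point; in (b) a KKT point violates the SOSC and is a local maximum.

```latex
\begin{aligned}
&\text{(a)}\ \min_w\ w \ \ \text{s.t.}\ \ h(w) = w^2 \le 0:
  && \text{the feasible set is } \{0\}, \text{ so } w^* = 0 \text{ is the global minimum;}\\
& && \nabla h(0) = 0 \text{ (LICQ fails), and } \nabla\Phi(0) + \mu\,\nabla h(0) = 1 \neq 0 \ \ \forall \mu \ge 0,
     \text{ so } w^* \text{ is not a KKT point.}\\[4pt]
&\text{(b)}\ \min_w\ -w^2 \ \ \text{s.t.}\ \ h(w) = w - 1 \le 0:
  && (w, \mu) = (0, 0) \text{ gives } \nabla\Phi(0) + \mu\,\nabla h(0) = 0,\ h(0) = -1 < 0,\ \mu\,h(0) = 0,\\
& && \text{so it is a KKT point, but } \nabla_w^2 L = -2 < 0 \text{ and } w = 0 \text{ is a local maximum.}
\end{aligned}
```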