SLIDE 1

Unconstrained Minimization (II)

Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj

SLIDE 2

Outline

 Gradient Descent Method
  • Convergence Analysis
  • Examples
  • General Convex Functions
 Steepest Descent Method
  • Euclidean and Quadratic Norms
  • ℓ₁-norm
  • Convergence Analysis
  • Discussion and Examples

SLIDE 3

Outline

 Gradient Descent Method
  • Convergence Analysis
  • Examples
  • General Convex Functions
 Steepest Descent Method
  • Euclidean and Quadratic Norms
  • ℓ₁-norm
  • Convergence Analysis
  • Discussion and Examples

SLIDE 4

General Descent Method

 The Algorithm

Given a starting point $x \in \operatorname{dom} f$
Repeat
  • 1. Determine a descent direction $\Delta x$.
  • 2. Line search: Choose a step size $t > 0$.
  • 3. Update: $x := x + t\Delta x$.
until stopping criterion is satisfied.

 Descent Direction
  • $\Delta x$ must satisfy $\nabla f(x)^\top \Delta x < 0$

SLIDE 5

Gradient Descent Method

 The Algorithm

Given a starting point $x \in \operatorname{dom} f$
Repeat
  • 1. $\Delta x := -\nabla f(x)$.
  • 2. Line search: Choose step size $t$ via exact or backtracking line search.
  • 3. Update: $x := x + t\Delta x$.
until stopping criterion is satisfied.

 Stopping Criterion
  • Usually of the form $\|\nabla f(x)\|_2 \le \epsilon$ for a small $\epsilon > 0$
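
As a concrete illustration, here is a minimal Python sketch of this loop. The objective f, its gradient grad_f, and the line_search helper are assumed to be supplied by the caller (a backtracking line_search is sketched after SLIDE 13); none of these names come from the slides themselves.

```python
import numpy as np

def gradient_descent(f, grad_f, x0, line_search, eps=1e-6, max_iter=1000):
    """Gradient descent: Delta x = -grad f(x); stop when ||grad f(x)||_2 <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:       # stopping criterion
            break
        dx = -g                            # descent direction
        t = line_search(f, grad_f, x, dx)  # exact or backtracking line search
        x = x + t * dx                     # update
    return x
```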

SLIDE 6

Outline

 Gradient Descent Method
  • Convergence Analysis
  • Examples
  • General Convex Functions
 Steepest Descent Method
  • Euclidean and Quadratic Norms
  • ℓ₁-norm
  • Convergence Analysis
  • Discussion and Examples

SLIDE 7

Preliminary

 $f$ is both strongly convex and smooth: $mI \preceq \nabla^2 f(x) \preceq MI$
 Define $x^+$ as $x^+ = x - t\nabla f(x)$
 A quadratic upper bound on $f(x^+)$:
  $f(x^+) \le f(x) - t\,\|\nabla f(x)\|_2^2 + \dfrac{Mt^2}{2}\,\|\nabla f(x)\|_2^2$

SLIDE 8

Analysis for Exact Line Search

  • 1. Minimize Both Sides of the Quadratic Upper Bound over $t$
 Left side: $f(x^+)$, where $t$ is the step length that minimizes $f$ along the ray (exact line search)
 Right side: $t = 1/M$ is the solution, which gives
  $f(x^+) \le f(x) - \dfrac{1}{2M}\,\|\nabla f(x)\|_2^2$
  • 2. Subtracting $p^*$ from Both Sides
  $f(x^+) - p^* \le f(x) - p^* - \dfrac{1}{2M}\,\|\nabla f(x)\|_2^2$

SLIDE 9

Analysis for Exact Line Search

  • 3. $f$ is strongly convex on $\operatorname{dom} f$, so
  $\|\nabla f(x)\|_2^2 \ge 2m\big(f(x) - p^*\big)$
  • 4. Combining
  $f(x^+) - p^* \le (1 - m/M)\big(f(x) - p^*\big)$
  • 5. Applying it Recursively
  $f(x^{(k)}) - p^* \le c^k\big(f(x^{(0)}) - p^*\big)$, where $c = 1 - m/M < 1$
  • $f(x^{(k)})$ converges to $p^*$ as $k \to \infty$
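
For completeness, the bound in step 3 follows from the strong convexity lower bound by minimizing both sides over $y$ (a standard one-line argument, filled in here):

```latex
f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \frac{m}{2}\|y - x\|_2^2
\;\;\Longrightarrow\;\;
p^* \ge f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2
\;\;\Longleftrightarrow\;\;
\|\nabla f(x)\|_2^2 \ge 2m\big(f(x) - p^*\big)
```

The right-hand side of the first inequality is minimized at $y = x - \frac{1}{m}\nabla f(x)$, which yields the middle inequality.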

SLIDE 10

Discussions

 Iteration Complexity
 $f(x^{(k)}) - p^* \le \epsilon$ after at most
  $\dfrac{\log\big((f(x^{(0)}) - p^*)/\epsilon\big)}{\log(1/c)}$ iterations
 The numerator indicates that initialization is important
 The denominator is a function of the condition number $M/m$:
  $\log(1/c) = -\log(1 - m/M) \approx m/M$ when $M/m$ is large


SLIDE 12

Discussions

 Iteration Complexity
 $f(x^{(k)}) - p^* \le \epsilon$ after at most
  $\dfrac{\log\big((f(x^{(0)}) - p^*)/\epsilon\big)}{\log(1/c)}$ iterations
 The numerator indicates that initialization is important
 The denominator is a function of the condition number $M/m$
 Linear Convergence
  • The error lies below a line on a log-linear plot of error versus iteration number

SLIDE 13

Analysis for Backtracking Line Search

 Backtracking Line Search

given a descent direction $\Delta x$ for $f$ at $x \in \operatorname{dom} f$, $\alpha \in (0, 0.5)$, $\beta \in (0, 1)$
$t := 1$
while $f(x + t\Delta x) > f(x) + \alpha t\,\nabla f(x)^\top \Delta x$:  $t := \beta t$

  • 1. For all $0 \le t \le 1/M$:
  $-t + \dfrac{Mt^2}{2} \le -\dfrac{t}{2}$, so
  $f(x^+) \le f(x) - t\,\|\nabla f(x)\|_2^2 + \dfrac{Mt^2}{2}\,\|\nabla f(x)\|_2^2 \le f(x) - \dfrac{t}{2}\,\|\nabla f(x)\|_2^2$
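
A minimal Python sketch of this backtracking rule, under the same assumptions as the earlier sketch (callables f and grad_f supplied by the caller); the parameters alpha and beta mirror $\alpha$ and $\beta$ above:

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, dx, alpha=0.3, beta=0.7):
    """Shrink t until f(x + t*dx) <= f(x) + alpha * t * grad_f(x)^T dx."""
    t = 1.0
    fx = f(x)
    slope = alpha * (grad_f(x) @ dx)   # negative for a descent direction
    while f(x + t * dx) > fx + t * slope:
        t *= beta
    return t
```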

SLIDE 14

Analysis for Backtracking Line Search

 Backtracking Line Search

given a descent direction $\Delta x$ for $f$ at $x \in \operatorname{dom} f$, $\alpha \in (0, 0.5)$, $\beta \in (0, 1)$
$t := 1$
while $f(x + t\Delta x) > f(x) + \alpha t\,\nabla f(x)^\top \Delta x$:  $t := \beta t$

  • Since $\alpha < 1/2$, for all $0 \le t \le 1/M$:
  $f(x + t\Delta x) \le f(x) - \dfrac{t}{2}\,\|\nabla f(x)\|_2^2 \le f(x) - \alpha t\,\|\nabla f(x)\|_2^2$
  • So the exit condition holds whenever $0 \le t \le 1/M$

SLIDE 15

Analysis for Backtracking Line Search

  • 2. Backtracking Line Search Terminates
 Either with $t = 1$
 Or with a value $t \ge \beta/M$
 So,
  $f(x^+) \le f(x) - \alpha\min\{1, \beta/M\}\,\|\nabla f(x)\|_2^2$
  • 3. Subtracting $p^*$ from Both Sides
  $f(x^+) - p^* \le f(x) - p^* - \alpha\min\{1, \beta/M\}\,\|\nabla f(x)\|_2^2$

SLIDE 16

Analysis for Backtracking Line Search

  • 4. Combining with Strong Convexity ($\|\nabla f(x)\|_2^2 \ge 2m(f(x) - p^*)$)
  $f(x^+) - p^* \le c\big(f(x) - p^*\big)$, where $c = 1 - \min\{2m\alpha,\, 2\beta\alpha m/M\} < 1$
  • 5. Applying it Recursively
  $f(x^{(k)}) - p^* \le c^k\big(f(x^{(0)}) - p^*\big)$
  • $f(x^{(k)})$ converges to $p^*$ with an exponent that depends on the condition number
 Linear Convergence

SLIDE 17

Outline

 Gradient Descent Method
  • Convergence Analysis
  • Examples
  • General Convex Functions
 Steepest Descent Method
  • Euclidean and Quadratic Norms
  • ℓ₁-norm
  • Convergence Analysis
  • Discussion and Examples

SLIDE 18

A Quadratic Problem in $\mathbb{R}^2$

 A Quadratic Objective Function
  $f(x) = \dfrac{1}{2}\big(x_1^2 + \gamma x_2^2\big)$, where $\gamma > 0$
 The optimal point is $x^* = (0, 0)$
 The optimal value is $p^* = 0$
 The Hessian of $f$ is constant and has eigenvalues $1$ and $\gamma$, so
  $m = \min\{1, \gamma\}$, $M = \max\{1, \gamma\}$
 Condition number: $\dfrac{\max\{1, \gamma\}}{\min\{1, \gamma\}}$

SLIDE 19

A Quadratic Problem in $\mathbb{R}^2$

 A Quadratic Objective Function
  $f(x) = \dfrac{1}{2}\big(x_1^2 + \gamma x_2^2\big)$
 Gradient Descent Method
 Exact line search starting at $x^{(0)} = (\gamma, 1)$ gives the closed form
  $x_1^{(k)} = \gamma\left(\dfrac{\gamma-1}{\gamma+1}\right)^{k}$, $\quad x_2^{(k)} = \left(-\dfrac{\gamma-1}{\gamma+1}\right)^{k}$
  $f(x^{(k)}) = \left(\dfrac{\gamma-1}{\gamma+1}\right)^{2k} f(x^{(0)})$
  • $f$ is reduced by the factor $\left(\dfrac{\gamma-1}{\gamma+1}\right)^{2}$ at each iteration

Convergence is exactly linear
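
The closed form is easy to check numerically. A small sketch, assuming the starting point $(\gamma, 1)$ above and using the exact line-search step for a quadratic, $t = g^\top g / (g^\top H g)$ (the value $\gamma = 10$ is an arbitrary choice):

```python
import numpy as np

gamma = 10.0
H = np.diag([1.0, gamma])              # Hessian of f(x) = (x1^2 + gamma*x2^2)/2
r = (gamma - 1) / (gamma + 1)

x = np.array([gamma, 1.0])             # x^(0) = (gamma, 1)
for k in range(1, 6):
    g = H @ x                          # gradient at the current iterate
    t = (g @ g) / (g @ H @ g)          # exact line-search step for a quadratic
    x = x - t * g
    x_closed = np.array([gamma * r**k, (-r)**k])
    print(k, np.allclose(x, x_closed))  # the closed form matches the iterates
```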

SLIDE 20

A Quadratic Problem in $\mathbb{R}^2$

 Comparisons (take $\gamma = M/m > 1$)
  • From our general analysis, the error is reduced by the factor
  $c = 1 - \dfrac{m}{M}$
  • From the closed-form solution, the error is reduced by the factor
  $\left(\dfrac{\gamma-1}{\gamma+1}\right)^{2} = \left(\dfrac{1 - m/M}{1 + m/M}\right)^{2} = \left(1 - \dfrac{2\,m/M}{1 + m/M}\right)^{2}$
  • When $M/m$ is large, the iteration complexity differs only by a factor of about $4$:
  $\log(1/c) \approx \dfrac{m}{M}$, while $2\log\dfrac{\gamma+1}{\gamma-1} \approx \dfrac{4m}{M}$

SLIDE 21

A Quadratic Problem in $\mathbb{R}^2$

 Experiments
 For $\gamma$ not far from one, convergence is rapid

SLIDE 22

A Non-Quadratic Problem in $\mathbb{R}^2$

 The Objective Function
  $f(x_1, x_2) = e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}$
 Gradient descent method with backtracking line search
  $\alpha = 0.1$, $\beta = 0.7$
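
Putting the pieces together on this example, a self-contained sketch (the starting point is an arbitrary assumption; $\alpha = 0.1$, $\beta = 0.7$ as stated above):

```python
import numpy as np

def f(x):
    return (np.exp(x[0] + 3*x[1] - 0.1) + np.exp(x[0] - 3*x[1] - 0.1)
            + np.exp(-x[0] - 0.1))

def grad_f(x):
    a = np.exp(x[0] + 3*x[1] - 0.1)
    b = np.exp(x[0] - 3*x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    return np.array([a + b - c, 3*a - 3*b])

x = np.array([-1.0, 1.0])                          # assumed starting point
for _ in range(50):
    g = grad_f(x)
    t = 1.0
    while f(x - t*g) > f(x) - 0.1 * t * (g @ g):   # alpha = 0.1
        t *= 0.7                                   # beta = 0.7
    x = x - t*g
print(x, f(x))                                     # x approaches the minimizer
```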

SLIDE 23

A Non-Quadratic Problem in $\mathbb{R}^2$

 The Objective Function
  $f(x_1, x_2) = e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}$
 Gradient descent method with exact line search

SLIDE 24

A Non-Quadratic Problem in $\mathbb{R}^2$

 Comparisons
 Both convergence curves are linear, and exact line search is faster

SLIDE 25

A Problem in $\mathbb{R}^{100}$

 A Larger Problem
  $f(x) = c^\top x - \displaystyle\sum_{i=1}^{500} \log\big(b_i - a_i^\top x\big)$
  with $x \in \mathbb{R}^{100}$ and $500$ log terms
 Gradient descent method with backtracking line search
  $\alpha = 0.1$, $\beta = 0.5$
 Gradient descent method with exact line search

SLIDE 26

A Problem in $\mathbb{R}^{100}$

 Comparisons
 Both convergence curves are linear, and exact line search is only a bit faster

SLIDE 27

Gradient Method and Condition Number

 A Larger Problem
  $f(x) = c^\top x - \displaystyle\sum_{i=1}^{500} \log\big(b_i - a_i^\top x\big)$
 Replace $x$ by $Tx$ with a diagonal scaling
  $T = \operatorname{diag}\!\big(1, \gamma^{1/n}, \gamma^{2/n}, \dots, \gamma^{(n-1)/n}\big)$
 A Family of Optimization Problems
 Indexed by $\gamma$, which controls the conditioning of the problem

SLIDE 28

Gradient Method and Condition Number

 Number of iterations required to obtain $f(x^{(k)}) - p^* < 10^{-5}$
 Backtracking line search with $\alpha = 0.3$ and $\beta = 0.7$

SLIDE 29

Gradient Method and Condition Number

 The condition number of the Hessian $\nabla^2 f(x^*)$ at the optimum
 The larger the condition number, the larger the number of iterations

SLIDE 30

Conclusions

  • 1. The gradient method often exhibits approximately linear convergence.
  • 2. The convergence rate depends greatly on the condition number of the Hessian, or of the sublevel sets.
  • 3. An exact line search sometimes improves the convergence of the gradient method, but the effect is not large.
  • 4. The choice of backtracking parameters has a noticeable but not dramatic effect on the convergence.

SLIDE 31

Outline

 Gradient Descent Method
  • Convergence Analysis
  • Examples
  • General Convex Functions
 Steepest Descent Method
  • Euclidean and Quadratic Norms
  • ℓ₁-norm
  • Convergence Analysis
  • Discussion and Examples

SLIDE 32

General Convex Functions

 $f$ is convex
 $f$ is Lipschitz continuous: $\|\nabla f(x)\|_2 \le G$
 Gradient Descent Method

Given a starting point $x_1 \in \operatorname{dom} f$
For $k = 1, 2, \dots, K$ do
  Update: $x_{k+1} = x_k - \eta\,\nabla f(x_k)$
End for
Return $\bar{x} = \dfrac{1}{K}\displaystyle\sum_{k=1}^{K} x_k$
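
A minimal sketch of this variant in Python: fixed step size $\eta$, returning the averaged iterate (only the gradient is needed inside the loop):

```python
import numpy as np

def gd_average(grad_f, x1, eta, K):
    """Fixed-step gradient descent; returns the average of the K iterates."""
    x = np.asarray(x1, dtype=float)
    total = np.zeros_like(x)
    for _ in range(K):
        total += x                  # accumulate x_k before the update
        x = x - eta * grad_f(x)     # x_{k+1} = x_k - eta * grad f(x_k)
    return total / K                # x_bar; by Jensen, f(x_bar) <= avg of f(x_k)
```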

SLIDE 33

Analysis

 Define $\bar{x} = \dfrac{1}{K}\displaystyle\sum_{k=1}^{K} x_k$
 Let $\|x_1 - x^*\|_2 \le D$

$f(x_k) - f(x^*) \le \nabla f(x_k)^\top (x_k - x^*)$
$= \dfrac{1}{\eta}\,(x_k - x_{k+1})^\top (x_k - x^*)$
$= \dfrac{1}{2\eta}\big(\|x_k - x^*\|_2^2 - \|x_{k+1} - x^*\|_2^2 + \|x_k - x_{k+1}\|_2^2\big)$

 Here the second step uses $x_{k+1} = x_k - \eta\,\nabla f(x_k)$

SLIDE 34

Analysis

 Define $\bar{x} = \dfrac{1}{K}\displaystyle\sum_{k=1}^{K} x_k$
 Let $\|x_1 - x^*\|_2 \le D$

$f(x_k) - f(x^*) \le \nabla f(x_k)^\top (x_k - x^*)$
$= \dfrac{1}{2\eta}\big(\|x_k - x^*\|_2^2 - \|x_{k+1} - x^*\|_2^2\big) + \dfrac{\eta}{2}\,\|\nabla f(x_k)\|_2^2$
$\le \dfrac{1}{2\eta}\big(\|x_k - x^*\|_2^2 - \|x_{k+1} - x^*\|_2^2\big) + \dfrac{\eta}{2}\,G^2$

 Here $\|x_k - x_{k+1}\|_2 = \eta\,\|\nabla f(x_k)\|_2$

SLIDE 35

Analysis

 So,
  $f(x_k) - f(x^*) \le \dfrac{1}{2\eta}\big(\|x_k - x^*\|_2^2 - \|x_{k+1} - x^*\|_2^2\big) + \dfrac{\eta}{2}\,G^2$
 Summing over $k = 1, \dots, K$ (the distance terms telescope):
  $\displaystyle\sum_{k=1}^{K} f(x_k) - K f(x^*) \le \dfrac{1}{2\eta}\,D^2 + \dfrac{\eta K}{2}\,G^2$
 Dividing both sides by $K$:
  $\dfrac{1}{K}\displaystyle\sum_{k=1}^{K}\big(f(x_k) - f(x^*)\big) \le \dfrac{1}{K}\left(\dfrac{D^2}{2\eta} + \dfrac{\eta K}{2}\,G^2\right) = \dfrac{D^2}{2\eta K} + \dfrac{\eta}{2}\,G^2$

SLIDE 36

Analysis

 By Jensen's Inequality
  $f(\bar{x}) - f(x^*) = f\!\left(\dfrac{1}{K}\displaystyle\sum_{k=1}^{K} x_k\right) - f(x^*) \le \dfrac{1}{K}\displaystyle\sum_{k=1}^{K}\big(f(x_k) - f(x^*)\big)$
  $\le \dfrac{D^2}{2\eta K} + \dfrac{\eta}{2}\,G^2 = \dfrac{GD}{\sqrt{K}}$, with the choice $\eta = \dfrac{D}{G\sqrt{K}}$

SLIDE 37

Discussions

 How to Ensure $\|x_1 - x^*\|_2 \le D$?
 Add a Domain Constraint
  $\min\; f(x) \quad \text{s.t.}\; x \in X$
 Can model any constrained convex optimization problem
 Gradient Descent with Projection
 Property of Euclidean Projection: for the projection $\Pi_X$ onto a convex set $X$ and $x^* \in X$,
  $\|\Pi_X(x) - x^*\|_2 = \|\Pi_X(x) - \Pi_X(x^*)\|_2 \le \|x - x^*\|_2$

SLIDE 38

Gradient Descent with Projection

 The Problem
  $\min\; f(x) \quad \text{s.t.}\; x \in X$
 The Algorithm

Given a starting point $x_1 \in X$
For $k = 1, 2, \dots, K$ do
  Update: $x'_{k+1} = x_k - \eta\,\nabla f(x_k)$
  Projection: $x_{k+1} = \Pi_X(x'_{k+1})$
End for
Return $\bar{x} = \dfrac{1}{K}\displaystyle\sum_{k=1}^{K} x_k$

 Assumptions
  $\|\nabla f(x)\|_2 \le G,\ \forall x \in X$
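
A sketch of projected gradient descent; the Euclidean-ball projection below is only an illustrative choice of $X$ (any convex set with a computable projection works):

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    n = np.linalg.norm(x)
    return x if n <= radius else (radius / n) * x

def projected_gd(grad_f, x1, eta, K, project=project_ball):
    """Gradient step followed by projection back onto X; returns the average."""
    x = np.asarray(x1, dtype=float)
    total = np.zeros_like(x)
    for _ in range(K):
        total += x
        x = project(x - eta * grad_f(x))   # update, then project onto X
    return total / K
```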

SLIDE 39

Analysis

 Define $\bar{x} = \dfrac{1}{K}\displaystyle\sum_{k=1}^{K} x_k$
 Let $\|x_1 - x^*\|_2 \le D$ (guaranteed when $X$ is bounded)

$f(x_k) - f(x^*) \le \nabla f(x_k)^\top (x_k - x^*)$
$= \dfrac{1}{\eta}\,(x_k - x'_{k+1})^\top (x_k - x^*)$
$= \dfrac{1}{2\eta}\big(\|x_k - x^*\|_2^2 - \|x'_{k+1} - x^*\|_2^2\big) + \dfrac{\eta}{2}\,\|\nabla f(x_k)\|_2^2$
$\le \dfrac{1}{2\eta}\big(\|x_k - x^*\|_2^2 - \|x_{k+1} - x^*\|_2^2\big) + \dfrac{\eta}{2}\,G^2$

 The last step uses the property of Euclidean projection:
  $\|x_{k+1} - x^*\|_2 = \|\Pi_X(x'_{k+1}) - x^*\|_2 \le \|x'_{k+1} - x^*\|_2$

SLIDE 40

Outline

 Gradient Descent Method
  • Convergence Analysis
  • Examples
  • General Convex Functions
 Steepest Descent Method
  • Euclidean and Quadratic Norms
  • ℓ₁-norm
  • Convergence Analysis
  • Discussion and Examples

SLIDE 41

Motivation

 The First-order Taylor Approximation
  $f(x + v) \approx f(x) + \nabla f(x)^\top v$
 $\nabla f(x)^\top v$ is the directional derivative of $f$ at $x$ in the direction $v$
 It gives the approximate change in $f$ for a small step $v$
 $v$ is a descent direction if $\nabla f(x)^\top v$ is negative
 A Good Search Direction
 Make $\nabla f(x)^\top v$ as negative as possible

SLIDE 42

Steepest Descent Method

 Normalized Steepest Descent Direction
  $\Delta x_{\mathrm{nsd}} = \operatorname{argmin}\{\nabla f(x)^\top v : \|v\| = 1\}$, with respect to the norm $\|\cdot\|$
 Equivalent to $\operatorname{argmin}\{\nabla f(x)^\top v : \|v\| \le 1\}$
 The direction in the unit ball of $\|\cdot\|$ that extends farthest in the direction $-\nabla f(x)$
 Unnormalized Steepest Descent Direction
  $\Delta x_{\mathrm{sd}} = \|\nabla f(x)\|_*\,\Delta x_{\mathrm{nsd}}$, where $\|\cdot\|_*$ is the dual norm
SLIDE 43

Steepest Descent Method

 The Algorithm

Given a starting point $x \in \operatorname{dom} f$
Repeat
  • 1. Compute steepest descent direction $\Delta x_{\mathrm{sd}}$.
  • 2. Line search: Choose $t$ via exact or backtracking line search.
  • 3. Update: $x := x + t\Delta x_{\mathrm{sd}}$.
until stopping criterion is satisfied.

 When exact line search is used, scale factors in the direction have no effect.

SLIDE 44

Outline

 Gradient Descent Method
  • Convergence Analysis
  • Examples
  • General Convex Functions
 Steepest Descent Method
  • Euclidean and Quadratic Norms
  • ℓ₁-norm
  • Convergence Analysis
  • Discussion and Examples

SLIDE 45

Steepest Descent Method

 Steepest Descent for the Euclidean Norm
  $\Delta x_{\mathrm{sd}} = -\nabla f(x)$
 The steepest descent method coincides with the gradient descent method

SLIDE 46

Steepest Descent Method

 Steepest Descent for Quadratic Norm
  • Quadratic norm: $\|z\|_P = (z^\top P z)^{1/2}$, where $P \in \mathbf{S}^n_{++}$
  • The dual norm: $\|z\|_* = \|P^{-1/2} z\|_2$
 Normalized Steepest Descent Direction
  $\Delta x_{\mathrm{nsd}} = -\big(\nabla f(x)^\top P^{-1}\nabla f(x)\big)^{-1/2}\, P^{-1}\nabla f(x)$
 Unnormalized Steepest Descent Direction
  $\Delta x_{\mathrm{sd}} = \|\nabla f(x)\|_*\,\Delta x_{\mathrm{nsd}} = -P^{-1}\nabla f(x)$
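
In code, the unnormalized direction is a single linear solve; a small sketch ($P$ must be symmetric positive definite, and the diagonal example is arbitrary):

```python
import numpy as np

def sd_dir_quadratic_norm(P, grad):
    """Unnormalized steepest descent direction for the P-norm: -P^{-1} grad."""
    return -np.linalg.solve(P, grad)   # solve P dx = -grad rather than invert P

P = np.diag([2.0, 8.0])                # an example P; reweights the gradient
print(sd_dir_quadratic_norm(P, np.array([1.0, 1.0])))
```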

SLIDE 47

Steepest Descent Method

 Steepest Descent for Quadratic Norm
 The ellipsoid $\{v : v^\top P v \le 1\}$ is the unit ball of the norm $\|\cdot\|_P$

$\Delta x_{\mathrm{nsd}}$ extends as far as possible in the direction $-\nabla f(x)$ while staying in the ellipsoid.

SLIDE 48

Steepest Descent Method

 Steepest Descent for Quadratic Norm
 Interpretation via Change of Coordinates
 Define $\bar{u} = P^{1/2}u$, so $\|u\|_P = \|\bar{u}\|_2$
 An Equivalent Problem
  $\min_{\bar{x}}\; \bar{f}(\bar{x}) = f\big(P^{-1/2}\bar{x}\big)$
 Apply the gradient descent method to $\bar{f}$; its steps correspond to the steepest descent direction for $\|\cdot\|_P$:
  $\Delta\bar{x} = -\nabla\bar{f}(\bar{x}) = -P^{-1/2}\nabla f(x) \;\Longrightarrow\; \Delta x = P^{-1/2}\Delta\bar{x} = -P^{-1}\nabla f(x)$
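
This equivalence is easy to verify numerically. A sketch, using a Cholesky factor $R$ with $R^\top R = P$ in place of the symmetric square root $P^{1/2}$ (it induces the same norm), and an arbitrary quadratic test gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
P = B @ B.T + 3 * np.eye(3)            # a positive definite P
R = np.linalg.cholesky(P).T            # R^T R = P, plays the role of P^{1/2}

def grad_f(x):                         # gradient of an arbitrary test quadratic
    return np.diag([2.0, 5.0, 1.0]) @ x

x = np.array([1.0, -2.0, 0.5])
# A gradient step in x_bar = R x coordinates, mapped back to x coordinates:
step_via_coords = np.linalg.solve(R, -np.linalg.solve(R.T, grad_f(x)))
# The steepest descent step for the P-norm, computed directly:
step_direct = -np.linalg.solve(P, grad_f(x))
print(np.allclose(step_via_coords, step_direct))   # True
```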

SLIDE 49

Outline

 Gradient Descent Method
  • Convergence Analysis
  • Examples
  • General Convex Functions
 Steepest Descent Method
  • Euclidean and Quadratic Norms
  • ℓ₁-norm
  • Convergence Analysis
  • Discussion and Examples

SLIDE 50

Steepest Descent Method

 Steepest Descent for ℓ₁-norm
 Normalized Steepest Descent Direction
  • Let $i$ be any index for which $|\nabla f(x)_i| = \|\nabla f(x)\|_\infty$, and let $e_i$ be the $i$-th standard basis vector; then
  $\Delta x_{\mathrm{nsd}} = -\operatorname{sign}\big(\nabla f(x)_i\big)\, e_i$
 Unnormalized Steepest Descent Direction
  $\Delta x_{\mathrm{sd}} = \|\nabla f(x)\|_\infty\,\Delta x_{\mathrm{nsd}} = -\dfrac{\partial f(x)}{\partial x_i}\, e_i$

SLIDE 51

Steepest Descent Method

 Steepest Descent for ℓ₁-norm
 The diamond is the unit ball of the ℓ₁-norm

$\Delta x_{\mathrm{nsd}}$ can always be chosen as a standard basis vector (or the negative of one).

SLIDE 52

Steepest Descent Method

 Steepest Descent for ℓ₁-norm = Coordinate-descent Algorithm
  • 1. Select the component of $\nabla f(x)$ with maximum absolute value
  • 2. Decrease or increase the corresponding component of $x$ accordingly
 This simplifies, or even trivializes, the line search (see the sketch below)
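
A sketch of the resulting coordinate-descent step: select the gradient component with the largest absolute value and move only along that coordinate (the fixed step size here stands in for the simplified line search):

```python
import numpy as np

def l1_steepest_descent_step(grad_f, x, t=0.1):
    """One l1-norm steepest descent step: move along a single coordinate."""
    g = grad_f(x)
    i = int(np.argmax(np.abs(g)))     # component with maximum absolute value
    dx = np.zeros_like(x)
    dx[i] = -g[i]                     # unnormalized: -(df/dx_i) e_i
    return x + t * dx
```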

SLIDE 53

Outline

 Gradient Descent Method
  • Convergence Analysis
  • Examples
  • General Convex Functions
 Steepest Descent Method
  • Euclidean and Quadratic Norms
  • ℓ₁-norm
  • Convergence Analysis
  • Discussion and Examples

SLIDE 54

Convergence Analysis

  • 1. Any norm can be bounded in terms of the Euclidean norm
 There exists a $\tilde{\gamma} \in (0, 1]$ such that $\|v\| \ge \tilde{\gamma}\,\|v\|_2$
  • 2. $f$ is smooth, i.e.,
  $f(x + t\Delta x) \le f(x) + t\,\nabla f(x)^\top \Delta x + \dfrac{M t^2}{2}\,\|\Delta x\|_2^2$

SLIDE 55

Convergence Analysis

  • 3. Exit Condition for the Backtracking Line Search
 For $0 \le t \le \tilde{\gamma}^2/M$:
  $-t + \dfrac{M t^2}{2\tilde{\gamma}^2} \le -\dfrac{t}{2}$
 so a step of this size satisfies
  $f(x + t\Delta x_{\mathrm{sd}}) \le f(x) - \dfrac{t}{2}\,\|\nabla f(x)\|_*^2$

SLIDE 56

Convergence Analysis

  • 3. Exit Condition for the Backtracking Line Search (continued)
 Backtracking line search terminates either with $t = 1$ or with a value $t \ge \beta\tilde{\gamma}^2/M$
 So
  $f(x^+) = f(x + t\Delta x_{\mathrm{sd}}) \le f(x) - \alpha\min\{1, \beta\tilde{\gamma}^2/M\}\,\|\nabla f(x)\|_*^2$
  $\le f(x) - \alpha\tilde{\gamma}^2\min\{1, \beta\tilde{\gamma}^2/M\}\,\|\nabla f(x)\|_2^2$

SLIDE 57

Convergence Analysis

  • 4. Subtracting $p^*$ from Both Sides
  $f(x^+) - p^* \le f(x) - p^* - \alpha\tilde{\gamma}^2\min\{1, \beta\tilde{\gamma}^2/M\}\,\|\nabla f(x)\|_2^2$
  • 5. Combining with Strong Convexity
  $f(x^+) - p^* \le c\,\big(f(x) - p^*\big)$, where $c = 1 - 2m\alpha\tilde{\gamma}^2\min\{1, \beta\tilde{\gamma}^2/M\} < 1$
  • 6. Applying it Recursively
 Linear convergence: $f(x^{(k)}) - p^* \le c^k\big(f(x^{(0)}) - p^*\big)$

This bound has the same form as for gradient descent, so it fails to illustrate the advantage of the choice of norm in steepest descent.

SLIDE 58

Outline

 Gradient Descent Method
  • Convergence Analysis
  • Examples
  • General Convex Functions
 Steepest Descent Method
  • Euclidean and Quadratic Norms
  • ℓ₁-norm
  • Convergence Analysis
  • Discussion and Examples

SLIDE 59

Choice of Norm for Steepest Descent

 Steepest Descent Method with Quadratic $P$-norm
 Equivalent to the gradient method after the change of coordinates $\bar{x} = P^{1/2}x$
 Gradient Method Works Well
 When the condition numbers of the sublevel sets (or the Hessian) are moderate
 Steepest Descent Method Will Work Well
 When the sublevel sets, after the change of coordinates $\bar{x} = P^{1/2}x$, are moderately conditioned

SLIDE 60

Choice of Norm for Steepest Descent

 Choose $P$ to make the sublevel sets of $\bar{f}(\bar{x}) = f(P^{-1/2}\bar{x})$ well conditioned
 If an approximation $\hat{H}$ of the Hessian at the optimal point, $\nabla^2 f(x^*)$, were known
  • A good choice of $P$ would be $P = \hat{H}$
  • The Hessian of $\bar{f}$ at the optimum is then approximately the identity
 Geometrically: choose $P$ so that the ellipsoid $\{x : x^\top P x \le 1\}$ approximates the sublevel set of $f$ near $x^*$
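
To see why $P = \hat{H} \approx \nabla^2 f(x^*)$ is a good choice: under the change of coordinates $\bar{x} = P^{1/2}x$ from SLIDE 48, the Hessian of $\bar{f}$ at the optimum is (a standard chain-rule computation, filling in the claim above)

```latex
\nabla^2 \bar{f}(\bar{x}^*)
= P^{-1/2}\,\nabla^2 f(x^*)\,P^{-1/2}
\approx \hat{H}^{-1/2}\,\hat{H}\,\hat{H}^{-1/2}
= I
```

so the transformed sublevel sets near the optimum are nearly spherical, which is exactly the regime in which the gradient method works well.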

SLIDE 61

Example

 The Objective Function
  $f(x_1, x_2) = e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}$
 Steepest descent method
 Using the two quadratic norms $\|\cdot\|_{P_1}$ and $\|\cdot\|_{P_2}$
 Backtracking line search
  $\alpha = 0.1$ and $\beta = 0.7$

SLIDE 62

Example

 The Objective Function
  $f(x_1, x_2) = e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}$
 [Figure]

SLIDE 63

Example

 The Objective Function
  $f(x_1, x_2) = e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}$
 [Figure]

SLIDE 64

Example

 The Objective Function
  $f(x_1, x_2) = e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1}$
 [Figure]

SLIDE 65

Example

 Why is $\|\cdot\|_{P_1}$ better than $\|\cdot\|_{P_2}$?
 Compare the problems after the changes of coordinates
 The change of variables associated with $P_1$ yields sublevel sets with modest condition number, so steepest descent with $\|\cdot\|_{P_1}$ converges quickly

SLIDE 66

Summary

 Gradient Descent Method
  • Convergence Analysis
  • General Convex Functions
 Steepest Descent Method
  • Euclidean and Quadratic Norms
  • ℓ₁-norm
  • Convergence Analysis