

SLIDE 1

CSCI 1951-G – Optimization Methods in Finance Part 09: Interior Point Methods

March 23, 2018

1 / 35

SLIDE 2

This material is covered in S. Boyd, L. Vandenberghe's book Convex Optimization, https://web.stanford.edu/~boyd/cvxbook/. Some of the materials and the figures are taken from it.

2 / 35

SLIDE 3

Context

  • Two weeks ago: unconstrained problems, solved with descent methods
  • Last week: linearly constrained problems, solved with Newton's method
  • This week: inequality constrained problems, solved with interior point methods

3 / 35

SLIDE 4

Inequality constrained minimization problems

min f0(x)
s.t. fi(x) ≤ 0, i = 1, …, m
     Ax = b

f0, …, fm: convex and twice continuously differentiable; A ∈ R^{p×n}, rank(A) = p < n.

Assume:

  • an optimal solution x∗ exists, with objective value p∗;
  • the problem is strictly feasible (i.e., the feasible region has interior points).

⇒ Slater's condition holds: there exist λ∗ and ν∗ that, together with x∗, satisfy the KKT conditions.

4 / 35

SLIDE 5

Hierarchy of algorithms

Transforming the constrained problem to an unconstrained one: always possible, but has drawbacks. Solving the constrained problem directly leverages the problem structure.

What's the constrained problem class that is the easiest to solve? Quadratic Problems with Linear equality Constraints (LCQP): they only require solving a system of linear equations.

How did we solve generic problems with linear equality constraints? With Newton's method, which solves a sequence of LCQPs!

We will solve inequality constrained problems with interior point methods, which solve a sequence of linearly constrained problems!

5 / 35

SLIDE 6

Problem Transformation

Goal: approximate the Inequality Constrained Problem (ICP) with an Equality Constrained Problem (ECP) solvable with Newton's method. We start by transforming the ICP into an equivalent ECP.

From:

min f0(x)
s.t. fi(x) ≤ 0, i = 1, …, m
     Ax = b

To:

min g(x)
s.t. Ax = b

for g(x) = f0(x) + Σ_{i=1}^m I_(fi(x)), where

I_(u) = 0 if u ≤ 0, ∞ if u > 0

So we just use Newton's method and we are done. The End. Nope.

6 / 35

SLIDE 7

Logarithmic barrier

min f0(x) + Σ_{i=1}^m I_(fi(x))
s.t. Ax = b

The objective function is in general not differentiable: we can't use Newton's method. We want to approximate I_(u) with a differentiable function:

Î_(u) = −(1/t) log(−u)

with domain −R++ (i.e., u < 0), where t > 0 is a parameter.

7 / 35
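The approximation above can be checked numerically. A minimal sketch (the function name `barrier` is my own, not from the slides):

```python
import math

def barrier(u: float, t: float) -> float:
    """Smooth approximation -(1/t)*log(-u) of the indicator I_(u); defined for u < 0."""
    assert u < 0, "the log barrier is only defined for u < 0"
    return -math.log(-u) / t

# For a strictly feasible u < 0 the value tends to I_(u) = 0 as t grows;
# as u -> 0- (the boundary), it blows up to +infinity for any fixed t.
for t in (0.5, 1.0, 2.0, 10.0):
    print(f"t = {t:4.1f}:  barrier(-0.5, t) = {barrier(-0.5, t):+.4f}")
```

This matches Figure 11.1 on the next slide: larger t hugs the indicator more closely away from the boundary.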

SLIDE 8

Logarithmic barrier

Î_(u) is convex and differentiable.

Figure 11.1: The dashed lines show the function I_(u), and the solid curves show Î_(u) = −(1/t) log(−u), for t = 0.5, 1, 2. The curve for t = 2 gives the best approximation.

8 / 35

SLIDE 9

Logarithmic barrier

min f0(x) − (1/t) Σ_{i=1}^m log(−fi(x))
s.t. Ax = b

The objective function is convex and differentiable: we can use Newton's method.

φ(x) = −Σ_{i=1}^m log(−fi(x)) is called the logarithmic barrier for the problem.

9 / 35

SLIDE 10

Example: Inequality form linear programming

min cᵀx
s.t. Ax ≤ b

The logarithmic barrier for this problem is

φ(x) = −Σ_{i=1}^m log(bi − aiᵀx)

where the ai are the rows of A.

10 / 35
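A small numeric sketch of this barrier and its gradient. The function names and the toy box constraints are my own, not from the slides:

```python
import numpy as np

def lp_barrier(x, A, b):
    """phi(x) = -sum_i log(b_i - a_i^T x), the log barrier for Ax <= b."""
    s = b - A @ x                    # slacks; strictly positive inside the region
    assert np.all(s > 0), "x must be strictly feasible"
    return -np.sum(np.log(s))

def lp_barrier_grad(x, A, b):
    """Gradient: sum_i a_i / (b_i - a_i^T x)."""
    s = b - A @ x
    return A.T @ (1.0 / s)

# Toy feasible region: the box -1 <= x_j <= 1, written as Ax <= b with m = 4 rows.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
x = np.zeros(2)                      # the center of the box
print(lp_barrier(x, A, b))           # all slacks are 1, so phi(x) = 0
print(lp_barrier_grad(x, A, b))      # opposite rows cancel: gradient is 0
```

At the center of the box the gradient vanishes; as x approaches a facet, the corresponding slack shrinks and both phi and its gradient blow up.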

SLIDE 11

How to choose t?

min f0(x) + (1/t) φ(x)
s.t. Ax = b

is an approximation of the original problem. How does the quality of the approximation change with t? As t grows, (1/t) φ(x) tends to Σ_{i=1}^m I_(fi(x)), so the approximation quality increases.

So let's just use a large t? Nope.

11 / 35

SLIDE 12

Why not using (immediately) a large t?

What's the intuition behind Newton's method? Replace the objective function with its 2nd-order Taylor approximation at x:

f(x + v) ≈ f(x) + ∇f(x)ᵀv + (1/2) vᵀ∇²f(x)v

When does this approximation (and Newton's method) work well? When the Hessian changes slowly. Is that the case for the barrier function?

12 / 35

SLIDE 13

Back to the example

min cᵀx
s.t. Ax ≤ b

φ(x) = −Σ_{i=1}^m log(bi − aiᵀx)

∇²φ(x) = Σ_{i=1}^m (1 / (bi − aiᵀx)²) ai aiᵀ

The Hessian changes fast as x gets close to the boundary of the feasible region.

13 / 35
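The blow-up of the Hessian near the boundary is easy to see numerically. A sketch, reusing the toy box region (my own example, not from the slides):

```python
import numpy as np

def lp_barrier_hess(x, A, b):
    """Hessian sum_i a_i a_i^T / (b_i - a_i^T x)^2 of the LP log barrier."""
    s = b - A @ x                    # slacks, strictly positive inside the region
    return A.T @ (A / s[:, None] ** 2)

# Toy box -1 <= x_j <= 1: the Hessian norm explodes as x approaches a facet.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
for x1 in (0.0, 0.9, 0.999):
    H = lp_barrier_hess(np.array([x1, 0.0]), A, b)
    print(f"x1 = {x1:5.3f}:  ||hess|| = {np.linalg.norm(H):.2e}")
```

At the center all slacks are 1 and the Hessian is the well-conditioned AᵀA; near the facet x1 = 1 one slack is tiny and the corresponding rank-one term dominates, which is exactly why a quadratic model (and hence Newton's method) degrades there.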

SLIDE 14

Why not using (immediately) a large t?

The Hessian of the function f0 + (1/t) φ varies rapidly near the boundary of the feasible set. This fact makes directly using a large t inefficient.

Instead, we will solve a sequence of problems of the form

min f0(x) + (1/t) φ(x)
s.t. Ax = b

for increasing values of t. We start each Newton minimization at the solution of the problem for the previous value of t.

14 / 35

SLIDE 15

The central path

Slight rewrite:

min t f0(x) + φ(x)
s.t. Ax = b

Assume it has a unique solution x∗(t) for each t > 0.

Central path: {x∗(t) : t > 0} (made of central points)

15 / 35

SLIDE 16

The central path

Necessary and sufficient conditions for x∗(t):

  • Strict feasibility:

    Ax∗(t) = b,  fi(x∗(t)) < 0, i = 1, …, m

  • Zero of the Lagrangian gradient (centrality condition): there exists ν̂ such that

    0 = t∇f0(x∗(t)) + ∇φ(x∗(t)) + Aᵀν̂
      = t∇f0(x∗(t)) + Σ_{i=1}^m (1 / (−fi(x∗(t)))) ∇fi(x∗(t)) + Aᵀν̂

16 / 35

SLIDE 17

Back to the example

min cᵀx
s.t. Ax ≤ b

φ(x) = −Σ_{i=1}^m log(bi − aiᵀx)

Centrality condition (there are no equality constraints here, so no Aᵀν̂ term):

0 = t∇f0(x∗(t)) + ∇φ(x∗(t)) = tc + Σ_{i=1}^m (1 / (bi − aiᵀx∗(t))) ai

17 / 35

SLIDE 18

Back to the example

0 = tc + Σ_{i=1}^m (1 / (bi − aiᵀx)) ai

Figure 11.2: Central path for an LP with n = 2 and m = 6. The dashed curves show three contour lines of the logarithmic barrier function φ. The central path converges to the optimal point x⋆ as t → ∞. Also shown is the point on the central path with t = 10. The optimality condition (11.9) at this point can be verified geometrically: the line cᵀx = cᵀx⋆(10) is tangent to the contour line of φ through x⋆(10).

18 / 35

SLIDE 19

Dual point from the central path

Every central point x∗(t) yields a dual feasible point (λ∗(t), ν∗(t)), and thus a lower bound on the optimal objective value p∗:

λ∗i(t) = −1 / (t fi(x∗(t))), i = 1, …, m
ν∗(t) = ν̂ / t

The proof gives us a lot of information.

19 / 35

SLIDE 20

Proof

  • λ∗i(t) > 0 because fi(x∗(t)) < 0
  • Rewrite the centrality condition, dividing by t:

    0 = t∇f0(x∗(t)) + Σ_{i=1}^m (1 / (−fi(x∗(t)))) ∇fi(x∗(t)) + Aᵀν̂
    0 = ∇f0(x∗(t)) + Σ_{i=1}^m λ∗i(t) ∇fi(x∗(t)) + Aᵀν∗(t)

  • The above equals

    ∂L/∂x (x∗(t), λ∗(t), ν∗(t)) = 0

    i.e., x∗(t) minimizes the Lagrangian at (λ∗(t), ν∗(t)).

20 / 35

SLIDE 21

Proof

Let's look at the dual function:

g(λ∗(t), ν∗(t)) = f0(x∗(t)) + Σ_{i=1}^m λ∗i(t) fi(x∗(t)) + ν∗(t)ᵀ(Ax∗(t) − b)

Since Ax∗(t) = b and each term λ∗i(t) fi(x∗(t)) equals −1/t, it holds that

g(λ∗(t), ν∗(t)) = f0(x∗(t)) − m/t

So f0(x∗(t)) − p∗ ≤ m/t, i.e., x∗(t) is no more than m/t-suboptimal! x∗(t) converges to x∗ as t → ∞.

21 / 35
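The m/t bound can be checked on a tiny LP whose central path is available in closed form. This toy example and all names are mine, not from the slides:

```python
import math

# Tiny LP: min x s.t. -1 <= x <= 1, i.e. f1(x) = x - 1 <= 0 and
# f2(x) = -x - 1 <= 0, so m = 2 and p* = -1. The centrality condition
# t + 1/(1-x) - 1/(1+x) = 0 reduces to t*x^2 - 2x - t = 0.

def central_point(t: float) -> float:
    """Analytic central point x*(t): the feasible root of t*x^2 - 2x - t = 0."""
    return (1.0 - math.sqrt(1.0 + t * t)) / t

def dual_point(t: float):
    """lambda*_i(t) = -1/(t * f_i(x*(t))) for the two constraints."""
    x = central_point(t)
    lam1 = 1.0 / (t * (1.0 - x))   # -1/(t*f1)
    lam2 = 1.0 / (t * (1.0 + x))   # -1/(t*f2)
    return x, lam1, lam2

for t in (1.0, 10.0, 100.0):
    x, lam1, lam2 = dual_point(t)
    g = -(lam1 + lam2)             # dual function value: -b^T lambda with b = (1, 1)
    print(f"t = {t:6.1f}: x*(t) = {x:+.6f}, gap f0 - g = {x - g:.6f}, m/t = {2.0/t:.6f}")
```

By centrality λ2 − λ1 = 1, so the pair is dual feasible, and the gap f0(x∗(t)) − g works out to exactly m/t = 2/t for every t.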

SLIDE 22

The barrier method

To get an ε-approximation we could just set t = m/ε and solve

min (m/ε) f0(x) + φ(x)
s.t. Ax = b

This method does not scale well with the size of the problem and with ε.

Barrier method: compute x∗(t) for an increasing sequence of values of t, until t ≥ m/ε.

22 / 35

SLIDE 23

The barrier method

input: strictly feasible x = x(0), t = t(0) > 0, µ > 1, ε > 0
repeat:
  1. Centering step: compute x∗(t) by minimizing t f0 + φ, subject to Ax = b, starting at x
  2. Update: x ← x∗(t)
  3. Stopping criterion: quit if m/t < ε
  4. Increase t: t ← µt

What can we ask about this algorithm?

23 / 35
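The algorithm above can be sketched end to end for the inequality-form LP of the running example (no equality constraints, so centering is plain Newton). This is a minimal illustration with my own names and a toy problem, not the slides' reference implementation:

```python
import numpy as np

def center(x, t, c, A, b, tol=1e-10, max_iter=100):
    """Centering step: Newton's method for min t*c^T x + phi(x), phi the LP log barrier."""
    def f(y):
        s = b - A @ y
        return np.inf if np.any(s <= 0) else t * (c @ y) - np.sum(np.log(s))
    for _ in range(max_iter):
        s = b - A @ x
        g = t * c + A.T @ (1.0 / s)              # gradient of t*f0 + phi
        H = A.T @ (A / s[:, None] ** 2)          # Hessian of phi
        dx = np.linalg.solve(H, -g)              # Newton step
        if -g @ dx / 2.0 <= tol:                 # Newton decrement stopping test
            break
        step = 1.0                               # backtracking line search;
        while f(x + step * dx) > f(x) + 0.25 * step * (g @ dx):
            step *= 0.5                          # infeasible trials give f = inf
        x = x + step * dx
    return x

def barrier_lp(c, A, b, x0, t0=1.0, mu=10.0, eps=1e-6):
    """Barrier method for min c^T x s.t. Ax <= b, from a strictly feasible x0."""
    m = len(b)
    x, t = np.asarray(x0, dtype=float), t0
    while True:
        x = center(x, t, c, A, b)                # steps 1-2: centering and update
        if m / t < eps:                          # step 3: duality gap m/t below eps
            return x
        t *= mu                                  # step 4: increase t

# Toy LP (mine): minimize x1 + x2 over the box -1 <= x_j <= 1; optimum is (-1, -1).
c = np.array([1.0, 1.0])
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
print(barrier_lp(c, A, b, x0=np.zeros(2)))       # close to [-1, -1]
```

Each centering call starts from the previous central point, which is exactly the warm start the slides advocate.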

SLIDE 24

The barrier method

What can we ask about this algorithm?
  1. How many iterations does it take to converge?
  2. Do we need to optimally solve the centering step?
  3. What is a good value for µ?
  4. How do we choose t(0)?

24 / 35

SLIDE 25

Convergence

  • The algorithm stops when m/t < ε
  • t starts at t(0)
  • t increases to µt at each iteration

How do we compute the number of iterations needed? We must find the smallest i such that m/ε < t(0) µ^i. It holds:

i = ⌈ log(m / (ε t(0))) / log µ ⌉

  • Is there anything important that this analysis does not tell us?

It does not tell us whether, as t grows, the centering step becomes more difficult. (It does not.)

25 / 35
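The closed form above can be sanity-checked by simulating the t-update loop directly (a sketch; names and the sample parameters are mine):

```python
import math

def outer_iterations(m: int, t0: float, mu: float, eps: float) -> int:
    """Count how many times t <- mu*t is applied before m/t < eps holds."""
    i, t = 0, t0
    while m / t >= eps:
        t *= mu
        i += 1
    return i

# The loop count matches ceil(log(m/(eps*t0)) / log(mu)), except in boundary
# cases where m/(eps*t0) is an exact power of mu (strict vs non-strict test).
m, t0, mu, eps = 4, 1.0, 10.0, 1e-6
print(outer_iterations(m, t0, mu, eps),
      math.ceil(math.log(m / (eps * t0)) / math.log(mu)))
```

Note the logarithmic dependence on both ε and t(0): halving ε costs only about log(2)/log(µ) extra outer iterations.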

SLIDE 26

Figure 11.8: Average number of Newton steps required to solve 100 randomly generated LPs of different dimensions, with n = 2m. Error bars show standard deviation, around the average value, for each value of m. The growth in the number of Newton steps required, as the problem dimensions range over a 100:1 ratio, is very small.

26 / 35

SLIDE 27

The barrier method

What can we ask about this algorithm?
  1. How many iterations does it take to converge?
  2. Do we need to optimally solve the centering step?
  3. What is a good value for µ?
  4. How do we choose t(0)?

27 / 35

SLIDE 28

Solving the centering step optimally?

Computing x∗(t) exactly is not necessary: the central path has no significance in itself, it "just" leads to a solution of the original problem.

Inexact centering will still lead to the solution, but the points (λ∗(t), ν∗(t)) may not be dual feasible. This issue can be corrected (homework).

Additionally, getting an extremely accurate minimizer of t f0 + φ only takes a few more Newton iterations than a good minimizer, so why not just go for it?

28 / 35

SLIDE 29

The barrier method

What can we ask about this algorithm?
  1. How many iterations does it take to converge?
  2. Do we need to optimally solve the centering step?
  3. What is a good value for µ?
  4. How do we choose t(0)?

29 / 35

SLIDE 30

Choosing µ

The choice of µ involves a trade-off between the number of outer iterations of the barrier method and the number of inner iterations of Newton's method.

For small µ, t grows slowly:

  • the initial point of Newton's method is very good: in a few inner iterations it converges to the next x∗(t);
  • successive x∗(t), x∗(µt) are close, so more outer iterations are needed.

For larger µ, the opposite holds. The two effects largely cancel out: the total number of inner iterations stays roughly constant for sufficiently large µ.

30 / 35

SLIDE 31

Figure 11.4: Progress of the barrier method for a small LP, showing duality gap versus cumulative number of Newton steps. Three plots are shown, corresponding to three values of the parameter µ: 2, 50, and 150. In each case, we have approximately linear convergence of the duality gap.

31 / 35

SLIDE 32

Figure 11.5: Trade-off in the choice of the parameter µ, for a small LP. The vertical axis shows the total number of Newton steps required to reduce the duality gap from 100 to 10⁻³, and the horizontal axis shows µ. The plot shows the barrier method works well for values of µ larger than around 3, but is otherwise not sensitive to the value of µ.

32 / 35

SLIDE 33

The barrier method

What can we ask about this algorithm?
  1. How many iterations does it take to converge?
  2. Do we need to optimally solve the centering step?
  3. What is a good value for µ?
  4. How do we choose t(0)?

33 / 35

SLIDE 34

How to choose t(0)

A very large initial t incurs more inner iterations at the first outer iteration. A very small initial t incurs more outer iterations.

m/t(0) is the first duality gap. We want to choose t(0) so that m/t(0) ≈ µ(f0(x(0)) − p∗).

If we have feasible dual points (λ, ν), with duality gap η = f0(x(0)) − g(λ, ν), then we can take t(0) = m/η. Thus in the first outer iteration we get the same duality gap as that of the initial primal and dual points.

34 / 35
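A one-line helper capturing this choice (a sketch; the function name and sample numbers are mine):

```python
def initial_t(m: int, f0_x0: float, g_dual: float) -> float:
    """t(0) = m/eta, with eta = f0(x(0)) - g(lambda, nu) the initial duality gap."""
    eta = f0_x0 - g_dual
    assert eta > 0, "the dual value must lie strictly below the primal value"
    return m / eta

# With this t(0), the first barrier bound m/t(0) equals the initial gap eta.
print(initial_t(4, 1.5, 0.5))   # eta = 1, so t(0) = m/eta = 4.0
```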

SLIDE 35

Recap

  • Inequality constrained problems
  • Up and down a hierarchy of algorithms
  • The central path
  • Getting the dual points and the optimality certificate
  • The barrier method
  • Convergence, parameters, and other details

35 / 35