Sections: Bundle Method · Proximal Term · Hessian · Heuristic · Implementation · Experiments

A Dynamic Approach to Scaling in Bundle Methods for Convex Optimization

Christoph Helmberg, joint work with Alois Pichler (TU Chemnitz)

  • The Bundle Method and the Aggregate
  • Dynamic Choice of the Proximal Term
  • Relation to the Hessian in the Smooth Case
  • A Cheaper Scaling Heuristic
  • Implementational Issues
  • Some Numerical Experiments

The Bundle Method for Nonsmooth Convex Optimization

min f(y) s.t. y ∈ R^M, with f : R^M → R convex (nonsmooth), M = {1, …, m} some index set.

f is specified by a first order oracle: given ȳ ∈ R^M it returns

  • f(ȳ) ∈ R, the function value,
  • g(ȳ) ∈ R^M, some subgradient (not necessarily unique),

satisfying the subgradient inequality f(y) ≥ f(ȳ) + ⟨g(ȳ), y − ȳ⟩ for all y ∈ R^M.

Each ω = (γ, g) with γ = f(ȳ) − ⟨g, ȳ⟩ generates a linear minorant of f:

  f_ω(y) := γ + ⟨g, y⟩ ≤ f(y) for all y ∈ R^M.

The collected minorants form the bundle; from this we select a model

  W ⊆ W̄ := conv{(γ, g) : g = g(ȳ^i), γ = f(ȳ^i) − ⟨g, ȳ^i⟩, i = 1, …, k}.

Any closed proper convex function is the supremum of its linear minorants, f(y) = sup_{(γ,g) ∈ W̄} (γ + ⟨g, y⟩); choose W compact, W ⊆ W̄.

Maximizing over all ω ∈ W gives a cutting model minorizing f:

  f_W(y) := max_{ω ∈ W} f_ω(y) ≤ f(y) for all y ∈ R^M.

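The oracle and cutting-model setup above can be sketched in a few lines of NumPy; the piecewise-linear test function and all names here are illustrative, not from the slides.

```python
import numpy as np

# Hypothetical piecewise-linear convex test function f(y) = max_i (a_i^T y + b_i).
A = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])
b = np.array([0.0, 1.0, -0.5])

def oracle(y):
    """First order oracle: returns f(y) and one subgradient g(y)."""
    vals = A @ y + b
    i = int(np.argmax(vals))
    return vals[i], A[i]          # an active row is a valid subgradient

# Collect minorants omega = (gamma, g) with gamma = f(ybar) - <g, ybar>.
bundle = []
for ybar in [np.array([0.0, 0.0]), np.array([1.0, -1.0]), np.array([-2.0, 0.5])]:
    fval, g = oracle(ybar)
    bundle.append((fval - g @ ybar, g))

def f_W(y):
    """Cutting model: pointwise maximum of the collected linear minorants."""
    return max(gamma + g @ y for gamma, g in bundle)

# The cutting model minorizes f everywhere.
y = np.array([0.3, 0.7])
assert f_W(y) <= oracle(y)[0] + 1e-12
```

At a point where a minorant was generated the model is tight, since that minorant touches f there while all others stay below.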
Proximal Bundle Method [Lemaréchal 1978, Kiwiel 1990]

Input: a convex function given by a first order oracle.

[Figure: a convex function; the cutting plane model with g ∈ ∂f(ŷ); solving the augmented model yields the candidate ȳ; the cutting model is then improved in ȳ.]

  • 1. Find a candidate by solving the quadratic model
       min_y max_{ω ∈ W} f_ω(y) + (u/2)‖y − ŷ‖².
  • 2. Evaluate the function and determine a subgradient (oracle).
  • 3. Decide on a null step or a descent step.
  • 4. Update the model to contain at least the aggregate and the new minorant,

and iterate.
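A minimal executable sketch of steps 1–4, under simplifying assumptions: the model keeps only the aggregate and the newest minorant, so the quadratic subproblem reduces to a one-dimensional concave dual over the convex-combination weight ξ. Function and parameter names are hypothetical, not the slides' implementation.

```python
import numpy as np

def oracle(y):
    # f(y) = ||y||_1; np.sign(y) is a valid subgradient (0 at a kink).
    return np.abs(y).sum(), np.sign(y)

def prox_bundle(y0, u=1.0, kappa=0.1, tol=1e-8, max_iter=100):
    """Two-minorant proximal bundle sketch: model = {aggregate, newest minorant}."""
    yhat = np.asarray(y0, dtype=float)
    fhat, g = oracle(yhat)
    gam_a, g_a = fhat - g @ yhat, g.copy()      # aggregate minorant (gamma, g)
    gam_n, g_n = gam_a, g_a.copy()              # newest minorant
    for _ in range(max_iter):
        # Dual of the quadratic subproblem over xi in [0, 1] (1-D concave QP).
        dg, dgam = g_n - g_a, gam_n - gam_a
        den = dg @ dg
        xi = 0.0 if den == 0 else float(np.clip(
            (u * (dgam + dg @ yhat) - g_a @ dg) / den, 0.0, 1.0))
        gam_a = (1 - xi) * gam_a + xi * gam_n   # new aggregate ...
        g_a = (1 - xi) * g_a + xi * g_n
        ybar = yhat - g_a / u                   # ... gives the candidate
        pred = fhat - (gam_a + g_a @ ybar)      # predicted decrease >= 0
        if pred <= tol:
            break
        fcand, gcand = oracle(ybar)
        gam_n, g_n = fcand - gcand @ ybar, gcand  # minorant from the candidate
        if fhat - fcand >= kappa * pred:        # descent step ...
            yhat, fhat = ybar, fcand
        # ... otherwise null step: center stays, the new minorant enters
    return yhat, fhat

yopt, fopt = prox_bundle(np.array([3.0, -2.0]))
assert abs(fopt) < 1e-6     # minimum of ||.||_1 is 0 at the origin
```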
The Aggregate and Convergence

Given weight u > 0, the quadratic subproblem is a saddle point problem; exchanging min and max,

  min_y max_{ω ∈ W} f_ω(y) + (u/2)‖y − ŷ‖²
    = max_{ξ_ω ≥ 0, Σ_ω ξ_ω = 1} min_y Σ_{(γ,g) ∈ W} ξ_ω (γ + g⊤y) + (u/2)‖y − ŷ‖².

Determining the saddle point (ȳ, ω̄) over R^n × conv W yields

  • ω̄ = (γ̄, ḡ), the aggregate (the "best" minorant in conv W),
  • ȳ = ŷ − (1/u) ḡ, the next candidate for evaluation.

The progress f(ŷ) − f(ȳ) is compared to the predicted decrease

  f(ŷ) − f_ω̄(ȳ) = f(ŷ) − γ̄ − ⟨ŷ, ḡ⟩ + (1/u)‖ḡ‖² ≥ 0.

This decides on a descent step (ŷ ← ȳ) or a null step (ŷ kept, new ω added).

Theorem (e.g. [BoGiLeSa2003])
Let ŷ^k denote the center of iteration k; then f(ŷ^k) → inf f. If, in addition, ŷ^{k₀} = ŷ^k for k ≥ k₀ (finitely many descent steps), then ŷ^{k₀} minimizes f and (f(ŷ^k) − f_{ω̄^k}(ȳ^k))_{k > k₀} ↓ 0. If f is bounded below, then ḡ^k → 0 along a subsequence K.

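The closed form of the predicted decrease follows by substituting ȳ = ŷ − (1/u)ḡ into f_ω̄(ȳ) = γ̄ + ⟨ḡ, ȳ⟩; a small NumPy check with made-up values confirms the identity term by term.

```python
import numpy as np

rng = np.random.default_rng(0)
n, u = 4, 2.0
yhat = rng.normal(size=n)

# A made-up aggregate minorant (gamma_bar, g_bar) and center value f(yhat).
g_bar = rng.normal(size=n)
gamma_bar, f_hat = -1.3, 3.7

# Candidate from the aggregate: ybar = yhat - g_bar / u.
ybar = yhat - g_bar / u

# Predicted decrease, once via the model value at ybar ...
pred_direct = f_hat - (gamma_bar + g_bar @ ybar)
# ... and once via the closed form f(yhat) - gamma_bar - <yhat, g_bar> + (1/u)||g_bar||^2.
pred_formula = f_hat - gamma_bar - yhat @ g_bar + (g_bar @ g_bar) / u
assert np.isclose(pred_direct, pred_formula)
```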

The bundle framework offers a lot of flexibility and can be extended in many directions:

  • add scaling/“second order” information via the proximal term
  • allow constraints on y
  • Lagrangian relaxation/decomposition or sums of convex functions
  • generate good primal approximations in Lagrangian relaxation
  • solve the dual to primal cutting plane approaches
  • use specialized cutting models (quadratic subproblem solvable?)
  • asynchronous parallel approaches

For me it offers the potential of "a general tool like the simplex method for LP" → ConicBundle, which contains much but not yet all of this…

Here: choose the proximal term + ½‖y − ŷ‖²_H dynamically (dynamic scaling, "second order" information).

Dynamic Choice of Proximal Term + ½‖y − ŷ‖²_H

Variable metric bundle methods and second order approaches:

  • reversal quasi-Newton for Moreau–Yosida regularization [Lemaréchal, Sagastizábal 1997]
  • aggregate Hessians of piecewise smooth parts [Lukšan, Vlček 1998]
  • adapt BFGS [Lukšan, Vlček 1999], limited memory [Haarala, Miettinen, Mäkelä 2007], with bounds [Karmitsa, Mäkelä 2010]
  • VU-approach [Mifflin, Sagastizábal 2005]

Here: construct H directly from the collected subgradients (no update).

  • include information of dropped subgradients in the model
  • when dealing with sums of convex functions, this allows one to
    – combine second order information of smooth and nonsmooth functions
    – rearrange subgroups of functions on the fly (e.g. parallel computing)

Note: Convergence is easily guaranteed by not decreasing the eigenvalues after null steps and by enforcing lower and upper bounds on the eigenvalues.

Dynamic Choice of Proximal Term + ½‖y − ŷ‖²_H (Idea)

How to get curvature information from "nonsmooth" oracles?

[Figure: a univariate example; the minorants collected at earlier iterates trace the curvature of f.]

→ Idea: use old, possibly inactive minorants for adapting H.

The model formed by the aggregate ω̄ = (γ̄, ḡ) plus quadratic term H ≻ 0 should not violate any old minorant ω = (γ, g) by more than ε > 0:

  f_ω̄(y) + ½ (y − ŷ)⊤H(y − ŷ) ≥ f_ω(y) − ε,

i.e.

  γ̄ + ⟨ḡ, y⟩ + ½ (y − ŷ)⊤H(y − ŷ) ≥ γ + ⟨g, y⟩ − ε

  ⇔  γ̄ − γ + ε + ⟨ḡ − g, ŷ⟩ [=: δ (> 0)] + ⟨ḡ − g, y − ŷ⟩ + ½ (y − ŷ)⊤H(y − ŷ) ≥ 0 for all y

  ⇔  ( H        ḡ − g )
      ( (ḡ − g)⊤   2δ )  ⪰ 0   (Schur complement).

Lemma
Given ŷ, ε > 0, ω̄ = (γ̄, ḡ) and ω = (γ, g) with γ̄ + ḡ⊤ŷ > γ + g⊤ŷ − ε, a matrix H̄ ⪰ 0 ensures f_ω̄(y) + ½ (y − ŷ)⊤H̄(y − ŷ) ≥ f_ω(y) − ε for all y ∈ R^n if and only if

  H̄ ⪰ (ḡ − g)(ḡ − g)⊤ / (2 (γ̄ − γ + ε + (ḡ − g)⊤ŷ)).

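A numerical sanity check of the Lemma (made-up data, not from the slides): with H̄ equal to the rank-one right-hand side, the perturbed model dominates the old minorant up to ε, and the equivalent Schur-complement block matrix is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
yhat = rng.normal(size=n)

# Arbitrary aggregate (gamma_bar, g_bar), old minorant (gamma, g), and eps > 0;
# gamma_bar is chosen so that delta = gamma_bar - gamma + eps + <g_bar - g, yhat> = 1.
g_bar, g = rng.normal(size=n), rng.normal(size=n)
gamma, eps = -2.0, 0.5
gamma_bar = gamma - eps + 1.0 - (g_bar - g) @ yhat

d = g_bar - g
delta = gamma_bar - gamma + eps + d @ yhat
assert delta > 0

# Minimal H from the Lemma: H = d d^T / (2 delta).
H = np.outer(d, d) / (2 * delta)

# f_aggregate(y) + 0.5 (y-yhat)^T H (y-yhat) >= f_omega(y) - eps on random points.
for _ in range(1000):
    y = yhat + rng.normal(scale=5.0, size=n)
    lhs = gamma_bar + g_bar @ y + 0.5 * (y - yhat) @ H @ (y - yhat)
    assert lhs >= gamma + g @ y - eps - 1e-9

# Equivalent Schur complement condition: [[H, d], [d^T, 2 delta]] is PSD.
M = np.block([[H, d[:, None]], [d[None, :], np.array([[2 * delta]])]])
assert np.linalg.eigvalsh(M).min() >= -1e-9
```

With z = y − ŷ the gap equals δ(1 + d⊤z/(2δ))², which is nonnegative for every y, so the assertions hold by construction.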

A "best" H̄ ⪰ 0 for all minorants ω_i by SDP

Theorem
Given ŷ, ε > 0, ω̄ = (γ̄, ḡ), and ω_i = (γ_i, g^i) with γ_i + (g^i)⊤ŷ < γ̄ + ḡ⊤ŷ + ε (i = 1, …, k), and a preference C ≻ 0, any optimal H̄ of

  minimize ⟨C, H⟩
  subject to H ⪰ (ḡ − g^i)(ḡ − g^i)⊤ / (2 (γ̄ + ε − γ_i + (ḡ − g^i)⊤ŷ)), i = 1, …, k,
             H ⪰ 0,

satisfies

  f_ω̄(y) + ½ (y − ŷ)⊤H̄(y − ŷ) ≥ max_{i=1,…,k} f_{ω_i}(y) − ε for all y ∈ R^n.

Furthermore, D := span{ḡ − g^i : i = 1, …, k} ⊆ R(H̄). In case C = I, equality holds, D = R(H̄).

Relation to the Hessian in the smooth case

Study f(x) = ½ x⊤Ax + b⊤x + ρ, convex (A ⪰ 0), with minorants ω_i = (γ_i, g^i) = (…, Ay^i + b), i = 1, …, k.

If A ≻ 0, the aggregate ω̄ is tangent at some ŷ, but shifted down,
→ we assume ω̄ = (γ̄, ḡ) = (… − ε, Aŷ + b),
→ ḡ − g^i = A(ŷ − y^i).

For ε = 0 the SDP constraints become

  H ⪰ A^{1/2} [A^{1/2}(ŷ − y^i)] [A^{1/2}(ŷ − y^i)]⊤ A^{1/2} / ‖A^{1/2}(ŷ − y^i)‖².

Lemma
Let f(x) be as above with A ⪰ 0. Given ŷ ∈ R^n and ε ≥ 0, suppose ω̄ = (γ̄, ḡ = ∇f(ŷ)) satisfies γ̄ + ḡ⊤ŷ ≤ f(ŷ) ≤ γ̄ + ḡ⊤ŷ + ε. Given y^i ∈ R^n and ω_i = (γ_i, g^i) as above, i = 1, …, k, let P ∈ R^{n×h}, P⊤P = I_h, have range space R(P) = span{A(ŷ − y^i) : i = 1, …, k}. Then the projected Hessian PP⊤APP⊤ is feasible for this SDP.

Under which conditions is the projected Hessian optimal?
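The Lemma's feasibility claim can be checked numerically for a random convex quadratic; the data and the construction of P below are illustrative (ε = 0, aggregate tangent at ŷ).

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 3

# Random convex quadratic f(x) = 0.5 x^T A x + b^T x (rho = 0), A positive definite.
B = rng.normal(size=(n, n))
A = B @ B.T + n * np.eye(n)
b = rng.normal(size=n)
f = lambda x: 0.5 * x @ A @ x + b @ x
grad = lambda x: A @ x + b

yhat = rng.normal(size=n)
ys = [rng.normal(size=n) for _ in range(k)]

# Aggregate tangent at yhat (epsilon = 0) and minorants generated at the y^i.
g_bar = grad(yhat)
gamma_bar = f(yhat) - g_bar @ yhat
minorants = [(f(y) - grad(y) @ y, grad(y)) for y in ys]

# Orthonormal basis P of span{A (yhat - y^i)} via reduced QR.
V = np.column_stack([A @ (yhat - y) for y in ys])
P, _ = np.linalg.qr(V)
H = P @ P.T @ A @ P @ P.T          # projected Hessian

# SDP feasibility: H - (g_bar - g)(g_bar - g)^T / (2 delta) is PSD for each minorant.
for gamma, g in minorants:
    d = g_bar - g
    delta = gamma_bar - gamma + d @ yhat
    assert delta > 0
    assert np.linalg.eigvalsh(H - np.outer(d, d) / (2 * delta)).min() >= -1e-8
```

The check succeeds because, with v = ŷ − y^i, the constraint reduces to the Cauchy–Schwarz inequality (v⊤Az)² ≤ (v⊤Av)(z⊤Az) in the A-inner product.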

Conditions for optimality of the Hessian

f(x) = ½ x⊤Ax + b⊤x + ρ with A ≻ 0, full dimensional case.

Theorem (Conjugate Directions)
Let A ≻ 0 and let v_i = (ŷ − y^i), i = 1, …, n, be a family of conjugate directions: v_i⊤Av_j = δ_ij for i, j = 1, …, n. For any C = Σ_{i=1}^n ζ_i v_i v_i⊤ with ζ_i > 0 and at least these constraints, H̄ = A is an optimal solution of the SDP.

Corollary (Eigenvector Directions)
Let A ≻ 0 and let v_i = (ŷ − y^i), i = 1, …, n, give rise to an eigenvalue decomposition A = Σ_{i=1}^n λ_i v_i v_i⊤. For C = I and at least these constraints, H̄ = A is an optimal solution of the SDP.

If the directions are close to this, H̄ should be close to the Hessian.

A Cheaper Nonsmooth Scaling Heuristic (Current Choice)

Choose a "suitable" orthonormal basis Q ∈ R^{n×h} of D = span{d_i = (ḡ − g^i)/√(2δ_i) : i = 1, …, k}, restrict H to QΛQ⊤, and replace

  H ⪰ d_i d_i⊤  "⇔"  Λ ⪰ Q⊤ d_i d_i⊤ Q

by

  λ_j = max{ diag(Q⊤ d_i d_i⊤ Q)_j = (Q⊤ d_i)_j² : i = 1, …, k },  j = 1, …, h.

For Q, determine the h most important directions in D by computing the singular value decomposition [d_1, …, d_k] = QΣP⊤ and use Q_{•,1:h}.

Can we justify the choice of this Q in the smooth case again?

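The heuristic is cheap to implement; a NumPy sketch (function name and the test data are hypothetical):

```python
import numpy as np

def scaling_heuristic(g_bar, gs, deltas, h):
    """Sketch of the heuristic: H = Q diag(lambda) Q^T from the h leading left
    singular vectors of the scaled differences d_i = (g_bar - g_i)/sqrt(2 delta_i)."""
    D = np.column_stack([(g_bar - g) / np.sqrt(2 * dl) for g, dl in zip(gs, deltas)])
    Q = np.linalg.svd(D, full_matrices=False)[0][:, :h]   # h most important directions
    lam = ((Q.T @ D) ** 2).max(axis=1)    # lambda_j = max_i (Q^T d_i)_j^2
    return Q @ np.diag(lam) @ Q.T

# Example with made-up data: the result is PSD with rank at most h.
rng = np.random.default_rng(0)
g_bar = rng.normal(size=6)
gs = [rng.normal(size=6) for _ in range(4)]
H = scaling_heuristic(g_bar, gs, deltas=[1.0, 2.0, 0.5, 1.5], h=3)
assert np.linalg.eigvalsh(H).min() >= -1e-10
assert np.linalg.matrix_rank(H) <= 3
```

Only one thin SVD of an n×k matrix is needed, instead of solving a semidefinite program.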
slide-64
SLIDE 64

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

For f(y) = ½ y⊤Ay + b⊤y + ρ with A ≻ 0 we get d_i ≈ A(ŷ − y_i) / ‖A^{1/2}(ŷ − y_i)‖.

Theorem
Given δ > 0 and A ≻ 0, let s_i ∈ R^n, i = 1, …, k, be chosen by a rotationally symmetric distribution, let V = [d_1, …, d_k] with d_i = A s_i / ‖A^{1/2} s_i‖, and let QΣ_V P⊤ = V be an SVD with Q⊤Q = I_n. Then, with probability going to one for k → ∞, there is an orthogonal matrix Q̄ = Q + O(δ) diagonalizing A = Q̄ Λ_A Q̄⊤.

In the smooth case, Q gets close to an eigenvector basis of the Hessian!
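A quick numerical illustration of the theorem (a sketch with made-up data, not from the talk): sample Gaussian s_i, which are rotationally symmetric, build d_i = A s_i / ‖A^{1/2} s_i‖, and check that the left singular vectors of V line up with the eigenvectors of A:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 20000
# A = QA diag(lam_true) QA^T with a known, well separated eigenbasis
lam_true = np.array([16.0, 8.0, 4.0, 2.0, 1.0])
QA, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = QA @ np.diag(lam_true) @ QA.T
A_half = QA @ np.diag(np.sqrt(lam_true)) @ QA.T

S = rng.standard_normal((n, k))                    # rotationally symmetric s_i
V = (A @ S) / np.linalg.norm(A_half @ S, axis=0)   # columns d_i = A s_i / ||A^{1/2} s_i||
Q, _, _ = np.linalg.svd(V, full_matrices=False)

# each left singular vector aligns (up to sign) with one eigenvector of A,
# so every row of |Q^T QA| has one entry close to 1
alignment = np.max(np.abs(Q.T @ QA), axis=1)
```

For growing k the alignment values approach 1, matching the Q̄ = Q + O(δ) statement.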

slide-66
SLIDE 66

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Proof
Write A = Q_A Λ_A Q_A⊤ with eigenvalues Λ_A = Diag(λ_1 ≥ ⋯ ≥ λ_n). Then

(1/k) V V⊤ = Q_A Λ_A^{1/2} [ (1/k) Σ_{i=1}^{k} (Λ_A^{1/2} Q_A⊤ s_i)(Λ_A^{1/2} Q_A⊤ s_i)⊤ / ‖Λ_A^{1/2} Q_A⊤ s_i‖² ] Λ_A^{1/2} Q_A⊤.

Since the s_i are rotationally symmetric, Q_A⊤ s_i has the same distribution as s_i, so the middle random matrix has entries

x_{gh} = [ Σ_{i=1}^{k} (1/k) Λ_A^{1/2} s_i (Λ_A^{1/2} s_i)⊤ / ‖Λ_A^{1/2} s_i‖² ]_{gh} = Σ_{i=1}^{k} (1/k) λ_g^{1/2} λ_h^{1/2} [s_i]_g [s_i]_h / ‖Λ_A^{1/2} s_i‖²,  1 ≤ g ≤ h ≤ n.

For g < h, each summand is symmetrically distributed in [−1, 1], hence E(x_{gh}) = 0 and Var(x_{gh}) ≤ 1/k, and

P( max_{1≤g<h≤n} |x_{gh}| > ε ) ≤ n(n−1) P(|x_{12}| > ε) ≤ n(n−1) Var(x_{12}) / ε² ≤ n(n−1) / (ε² k).

Order on the diagonal:

E(x_{gg}) = E( [s]_g² / ( [s]_g² + (1/λ_g) Σ_{j ∈ {1,…,n}∖{g}} λ_j [s]_j² ) ) ≥ E(x_{hh})  for g ≤ h.

Thus, with probability going to one for k → ∞,

Λ_A^{1/2} [ (1/k) Σ_{i=1}^{k} (Λ_A^{1/2} Q_A⊤ s_i)(Λ_A^{1/2} Q_A⊤ s_i)⊤ / ‖Λ_A^{1/2} Q_A⊤ s_i‖² ] Λ_A^{1/2}  →  D + Y

with diagonal D_{11} ≥ ⋯ ≥ D_{nn} > 0 and a perturbation Y ∈ R^{n×n} with ‖Y‖_F < δ. Now use Lemma 4.3 in [SunSun2003]. □
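The concentration step is easy to check empirically (a sketch with arbitrarily chosen eigenvalues): the middle matrix has trace one, off-diagonal entries of order 1/√k, and a decreasing diagonal:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 4, 50000
lam = np.array([8.0, 4.0, 2.0, 1.0])       # eigenvalues lambda_1 >= ... >= lambda_n
S = rng.standard_normal(( n, k))           # rotationally symmetric s_i
W = np.sqrt(lam)[:, None] * S              # columns Lambda^{1/2} s_i
W /= np.linalg.norm(W, axis=0)             # normalize by ||Lambda^{1/2} s_i||
X = (W @ W.T) / k                          # the middle random matrix (x_gh)

off = X - np.diag(np.diag(X))              # off-diagonal part, should be O(1/sqrt(k))
```

Since each normalized column has unit norm, trace(X) = 1 exactly; the off-diagonal entries shrink like 1/√k while the diagonal stays ordered, which is what feeds the SVD perturbation lemma.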

slide-72
SLIDE 72

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Implementational issues

At each descent step k, the heuristic is called for the p last subgradients; it returns a low-rank H̄_k = GG⊤, of which we only use the diagonal! The proximal term we use is H_k = u_k I + Diag(H̄_k), with weight u_k > 0 mimicking a trust-region behavior.

Several further things to make it work:

  • choice of ε: choose relative to the gap f(ŷ) − γ̄ − ḡ⊤ŷ
  • put an upper bound on ‖g − ḡ‖²/(2δ) (skip/hint for bundle?)
  • select few columns of Q by relative singular value sizes
  • Diag is bad if e.g. G = 𝟙; rescale to relative sizes
  • introduce some history: convex combination with previous H
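A minimal sketch of assembling this proximal term (the function name and the history weight are illustrative, not from the talk):

```python
import numpy as np

def proximal_diag(G, u, prev_diag=None, history=0.5):
    """Sketch: diagonal of the proximal term H_k = u_k I + Diag(Hbar_k).

    G : (n, h) low-rank factor returned by the scaling heuristic, Hbar_k = G G^T.
    u : trust-region-like weight u_k > 0.
    prev_diag : diagonal of the previous Hbar for the convex-combination history.
    history : weight on the previous diagonal, in [0, 1] (illustrative default).
    """
    d = np.sum(G**2, axis=1)          # diag(G G^T) without forming G G^T
    if prev_diag is not None:
        d = history * prev_diag + (1.0 - history) * d
    return u + d
```

The quadratic term in the bundle subproblem then uses this vector as the diagonal of H_k, so the subproblem stays as cheap as with the plain weight u_k I.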
slide-74
SLIDE 74

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Example: Truck Scheduling [H.Röhl2007]

  • eCom Logistics operates three warehouses within the same city
  • In each, an automatic storage system holds 50000–70000 pallets storing up to 40000 different products
  • trucks transport pallets between warehouses to balance demand

[Figure: warehouses A, B, C with truck routes between them]

Goal: schedule the trucks so that demand is satisfied on time

slide-75
SLIDE 75

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Modeling Idea:

  • For each article the flow of pallets between the warehouses is modeled by a network flow, discretized over time.
  • Capacity on transport arcs is opened by trucks that serve this arc.

[Figure: time-expanded network for one article over warehouses A, B, C at times t = 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5]

  • Use Lagrangian relaxation on the coupling constraints.

Transportation of 200–1100 articles, 800–1000 pallets per day; time steps of 10 min → up to 1,100,000 arcs and 4500 multipliers
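How such a Lagrangian relaxation feeds the bundle method's first-order oracle can be sketched as follows. This is a toy with explicitly enumerated candidate solutions per subproblem, standing in for the per-article min-cost-flow solves; all names and data are illustrative:

```python
import numpy as np

def dual_oracle(y, subproblems, c):
    """First-order oracle for the convex function
        f(y) = y^T c - sum_a min_j (cost_a[j] + y^T B_a[:, j]),
    i.e. the negated Lagrangian dual of subproblems coupled by sum_a B_a x_a <= c.

    subproblems : list of (costs, B) pairs; column j of B with costs[j]
                  enumerates candidate solution j of that subproblem (toy data).
    Returns (f(y), subgradient g(y)).
    """
    val = float(y @ c)
    g = c.astype(float).copy()
    for costs, B in subproblems:
        red = costs + y @ B          # Lagrangian cost of each candidate
        j = int(np.argmin(red))      # subproblem minimizer
        val -= red[j]
        g -= B[:, j]                 # subgradient contribution c - sum_a B_a x_a*
    return val, g
```

Each oracle call solves all subproblems independently, exactly the structure exploited in the truck scheduling model (one min-cost-flow per article), and the returned g satisfies the subgradient inequality by construction.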

slide-76
SLIDE 76

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Numerical Experiments

All comparisons are on truck scheduling instances, each with ∼ 800 min-cost-flow problems (mcf-problems) and three times as many piecewise linear convex real functions. We study the influence of the parameters

  • past subgradients: p ∈ {0, 10, 50}
  • bundle size: b ∈ {2, 50, 100}

We test on 32 original instances with known optimal value; the time limit per instance and variant is 30 min. We compare

  • the time until reaching relative precision 10−3
  • the number of oracle calls needed for this

The performance profiles give for each ρ ∈ [1, 5] the number of instances solved within a factor ρ of the best setting.
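The performance-profile counts described above can be computed directly from a matrix of times (or oracle-call counts); a small sketch, with np.inf marking runs that missed the 30 min limit:

```python
import numpy as np

def performance_profile(T, rhos):
    """For each rho, count the instances each variant solves within a factor
    rho of the best variant on that instance.

    T : (n_instances, n_variants) array of times; np.inf = not solved in limit.
        Assumes every instance is solved by at least one variant.
    Returns an array of shape (len(rhos), n_variants) of counts.
    """
    best = T.min(axis=1, keepdims=True)     # best time per instance
    ratio = T / best                        # performance ratio per variant
    return np.array([(ratio <= r).sum(axis=0) for r in rhos])
```

Plotting the rows of the result over ρ gives exactly the curves shown on the next slide.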

slide-77
SLIDE 77

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Performance Profiles (Time & Oracle Calls)

number of problems solved to relative precision 10−3 within a multiple of up to 5 of the best algorithm (≤ 30 min).

[Figure: performance profiles over ρ ∈ [1, 5] for time (left) and oracle calls (right), one curve per setting trd pX bY with p ∈ {0, 10, 50} and b ∈ {2, 50, 100}]

Clear winner: scaling with p = 50 gradients and bundle size b = 2. Surprise: it also wins regarding the number of oracle calls!

slide-78
SLIDE 78

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Thank you for your attention!