A Dynamic Approach to Scaling in Bundle Methods for Convex Optimization
Christoph Helmberg, joint work with Alois Pichler (TU Chemnitz)
  1. A Dynamic Approach to Scaling in Bundle Methods for Convex Optimization
     Christoph Helmberg, joint work with Alois Pichler, TU Chemnitz
     • The Bundle Method and the Aggregate
     • Dynamic Choice of the Proximal Term
     • Relation to the Hessian in the Smooth Case
     • A Cheaper Scaling Heuristic
     • Implementational Issues
     • Some Numerical Experiments

  2. The Bundle Method for Nonsmooth Convex Optimization
     min f(y)  s.t.  y ∈ R^M,
     with f: R^M → R convex (nonsmooth), M = {1, ..., m} some index set.

  3. The Bundle Method for Nonsmooth Convex Optimization (cont.)
     f is specified by a first order oracle: given ȳ ∈ R^M it returns
     • f(ȳ) ∈ R, the function value,
     • g(ȳ) ∈ R^M, some subgradient (not necessarily unique),
     satisfying f(y) ≥ f(ȳ) + ⟨g(ȳ), y − ȳ⟩ for all y ∈ R^M (subgradient inequality).

  4. The Bundle Method for Nonsmooth Convex Optimization (cont.)
     Each ω = (γ, g) with γ = f(ȳ) − ⟨g, ȳ⟩ generates a linear minorant of f:
     f_ω(y) := γ + ⟨g, y⟩ ≤ f(y) for all y ∈ R^M.
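As a concrete illustration, a first order oracle for a polyhedral function f(y) = max_i (⟨a_i, y⟩ + b_i) can be sketched as follows; the example function and all names are hypothetical and not part of the talk:

```python
# Minimal sketch of a first order oracle (illustrative, not the talk's code).
# For polyhedral convex f(y) = max_i (<a_i, y> + b_i), the slope a_i of any
# piece attaining the max is a valid subgradient (not unique at kinks).

def dot(a, y):
    return sum(ai * yi for ai, yi in zip(a, y))

def oracle(pieces, y_bar):
    """Return f(y_bar) and one subgradient g(y_bar).
    pieces: list of (a, b) pairs defining the affine pieces."""
    vals = [dot(a, y_bar) + b for a, b in pieces]
    i = max(range(len(vals)), key=vals.__getitem__)
    return vals[i], pieces[i][0]

# f(y) = max(y1 + y2, -y1, -y2) evaluated at y_bar = (0.5, -0.25)
pieces = [([1.0, 1.0], 0.0), ([-1.0, 0.0], 0.0), ([0.0, -1.0], 0.0)]
f_val, g = oracle(pieces, [0.5, -0.25])
# the pair (gamma, g) with gamma = f_val - <g, y_bar> is a linear minorant of f
gamma = f_val - dot(g, [0.5, -0.25])
```

The oracle returns only one subgradient even where several affine pieces are active, exactly as the non-uniqueness remark on the slide allows.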

  5. The Bundle Method for Nonsmooth Convex Optimization (cont.)
     The collected minorants form the bundle; from this we select a model
     W ⊆ conv{(γ, g): g = g(ȳ_i), γ = f(ȳ_i) − ⟨g, ȳ_i⟩, i = 1, ..., k}.

  6. The Bundle Method for Nonsmooth Convex Optimization (cont.)
     Any closed proper convex function is the sup over its linear minorants,
     f(y) = sup γ + ⟨g, y⟩ over all minorants (γ, g); choose the model W compact.

  7. The Bundle Method for Nonsmooth Convex Optimization (cont.)
     Maximizing over all ω ∈ W gives a cutting model minorizing f:
     f_W(y) := max_{ω ∈ W} f_ω(y) ≤ f(y) for all y ∈ R^M.
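The cutting model is simply the pointwise maximum of the stored minorants; a minimal sketch, assuming bundle entries (γ, g) as produced by oracle calls:

```python
# Cutting model f_W(y) = max over stored minorants (gamma, g) of gamma + <g, y>.
# Since every entry minorizes f, f_W(y) <= f(y) holds for all y by construction.

def f_W(bundle, y):
    return max(gamma + sum(gi * yi for gi, yi in zip(g, y))
               for gamma, g in bundle)

# two exact minorants of f(y) = |y| in one dimension
bundle = [(0.0, [1.0]), (0.0, [-1.0])]
model_at_07 = f_W(bundle, [0.7])     # the model is tight where a piece is active
model_at_m2 = f_W(bundle, [-2.0])
```

With these two cuts the model reproduces |y| exactly; for a general f the model only underestimates and is refined as the bundle grows.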

  8. Proximal Bundle Method [Lemaréchal 1978, Kiwiel 1990]
     Input: a convex function given by a first order oracle.
     [Figure: surface plot of a convex function]

  9. Proximal Bundle Method (cont.)
     [Figure: the convex function with the current point y marked]

  10. Proximal Bundle Method (cont.)
      [Figure: cutting plane model with g ∈ ∂f(ŷ)]

  11. Proximal Bundle Method (cont.)
      1. Find a candidate by solving min_y max_{ω ∈ W} f_ω(y).

  12. Proximal Bundle Method (cont.)
      [Figure: solve augmented model → ȳ]
      1. Find a candidate by solving the quadratic model
         min_y max_{ω ∈ W} f_ω(y) + (u/2) ‖y − ŷ‖².

  13. Proximal Bundle Method (cont.)
      2. Evaluate the function and determine a subgradient (oracle).

  14. Proximal Bundle Method (cont.)
      3. Decide on • a null step or • a descent step.

  15. Proximal Bundle Method (cont.)
      [Figure: improve cutting model in ȳ]
      4. Update the model to contain at least the aggregate and the new minorant, and iterate.
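Steps 1–4 can be sketched end to end in one dimension. The sketch below keeps only two minorants (the aggregate and the newest cut), so the dual of the quadratic subproblem is a concave quadratic in a single multiplier ξ ∈ [0, 1] and can be solved in closed form. The descent parameter κ and all names are illustrative assumptions, not the talk's implementation:

```python
def prox_bundle_1d(first_order_oracle, y0, u=1.0, kappa=0.1, tol=1e-8, max_iter=50):
    """Minimal 1D proximal bundle sketch: model = {aggregate, newest minorant}."""
    y_hat = y0
    f_hat, g = first_order_oracle(y_hat)
    model = [(f_hat - g * y_hat, g)] * 2          # minorants stored as (gamma, g)
    for _ in range(max_iter):
        (c1, g1), (c2, g2) = model
        # 1. candidate: maximize the dual q(xi) = gamma_bar + g_bar*y_hat - g_bar^2/(2u)
        if g1 == g2:
            xi = 1.0 if c1 >= c2 else 0.0
        else:
            xi = (u * ((c1 - c2) + (g1 - g2) * y_hat) - g2 * (g1 - g2)) / (g1 - g2) ** 2
            xi = min(1.0, max(0.0, xi))           # q is concave: clip stationary point
        g_bar = xi * g1 + (1.0 - xi) * g2         # aggregate subgradient
        c_bar = xi * c1 + (1.0 - xi) * c2
        y_cand = y_hat - g_bar / u                # minimizer of the augmented model
        delta = f_hat - (c_bar + g_bar * y_cand)  # predicted decrease >= 0
        if delta <= tol:
            break                                 # model nearly matches f(y_hat): stop
        # 2. oracle call at the candidate
        f_cand, g_new = first_order_oracle(y_cand)
        # 3. descent test
        if f_cand <= f_hat - kappa * delta:
            y_hat, f_hat = y_cand, f_cand         # descent step: move the center
        # (otherwise: null step, the center stays put)
        # 4. keep at least the aggregate and the new minorant
        model = [(c_bar, g_bar), (f_cand - g_new * y_cand, g_new)]
    return y_hat

# minimize f(y) = |y| starting from y = 2; the method stops at the kink y = 0
y_star = prox_bundle_1d(lambda y: (abs(y), 1.0 if y >= 0 else -1.0), 2.0)
```

Keeping just these two minorants is the smallest model for which convergence arguments go through; practical codes retain more cuts but the update in step 4 is the same.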

  16.–18. The Aggregate and Convergence
      Given weight u > 0, the quadratic subproblem is a saddle point problem:
      min_y max_{ω ∈ W} f_ω(y) + (u/2) ‖y − ŷ‖²
        = max_{ξ_ω ≥ 0, Σ ξ_ω = 1} min_y Σ_{(γ, g) ∈ W} ξ_ω (γ + g⊤y) + (u/2) ‖y − ŷ‖².

  19. The Aggregate and Convergence (cont.)
      Determining the saddle point (ȳ, ω̄) over R^M × conv W yields
      • ω̄ = (γ̄, ḡ), the aggregate (the "best" minorant in conv W),
      • ȳ = ŷ − (1/u) ḡ, the next candidate for evaluation.
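The claim that the primal candidate can be read off the dual solution, ȳ = ŷ − (1/u) ḡ, can be checked numerically on a tiny 1D bundle; all numbers are illustrative, and the dual is solved by grid search only for simplicity:

```python
# Augmented model phi(y) = max_(gamma,g) in W (gamma + g*y) + (u/2)*(y - y_hat)^2.
# Claim: y_bar = y_hat - g_bar/u for the dual-optimal aggregate g_bar minimizes phi.
W = [(0.0, 1.0), (0.0, -1.0)]          # two minorants of f(y) = |y|
y_hat, u = 0.4, 2.0

def phi(y):
    return max(c + g * y for c, g in W) + 0.5 * u * (y - y_hat) ** 2

# dual over xi in [0, 1]: here g_bar(xi) = 2*xi - 1 and gamma_bar(xi) = 0
def q(xi):
    g_bar = 2.0 * xi - 1.0
    return g_bar * y_hat - g_bar ** 2 / (2.0 * u)

best_xi = max((i / 10000.0 for i in range(10001)), key=q)
g_bar = 2.0 * best_xi - 1.0
y_bar = y_hat - g_bar / u              # candidate read off the dual solution
# y_bar minimizes phi: compare against a fine sample grid
grid = [i / 1000.0 - 1.0 for i in range(2001)]
assert all(phi(y_bar) <= phi(y) + 1e-6 for y in grid)
```

Here the augmented model's minimizer sits at the kink y = 0 of |y|, and the dual indeed lands there with the aggregate ḡ strictly inside the interval [−1, 1], a convex combination of the two bundle subgradients.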
