Sections: Bundle Method · Proximal Term · Hessian · Heuristic · Implementation · Experiments

A Dynamic Approach to Scaling in Bundle Methods for Convex Optimization

Christoph Helmberg, joint work with Alois Pichler (TU Chemnitz)

  • The Bundle Method and the Aggregate
  • Dynamic Choice of the Proximal Term
  • Relation to the Hessian in the Smooth Case
  • A Cheaper Scaling Heuristic
  • Implementational Issues
  • Some Numerical Experiments

The Bundle Method for Nonsmooth Convex Optimization

min f(y) s.t. y ∈ R^M, with f : R^M → R convex (nonsmooth), M = {1, …, m} some index set.

f is specified by a first order oracle: given ȳ ∈ R^M it returns

  • f(ȳ) ∈ R, the function value,
  • g(ȳ) ∈ R^M, some subgradient (not necessarily unique),

satisfying the subgradient inequality f(y) ≥ f(ȳ) + ⟨g(ȳ), y − ȳ⟩ for all y ∈ R^M.

Each ω = (γ, g) with γ = f(ȳ) − ⟨g, ȳ⟩ generates a linear minorant of f:

  f_ω(y) := γ + ⟨g, y⟩ ≤ f(y) for all y ∈ R^M.

The collected minorants form the bundle; from this we select a model

  W ⊆ W̄ := conv{(γ, g) : g = g(ȳ^i), γ = f(ȳ^i) − ⟨g, ȳ^i⟩, i = 1, …, k}.

Any closed proper convex function is the supremum of its linear minorants, f(y) = sup_{(γ,g) ∈ W̄} (γ + ⟨g, y⟩); choose W compact, W ⊆ W̄.

Maximizing over all ω ∈ W gives a cutting model minorizing f:

  f_W(y) := max_{ω ∈ W} f_ω(y) ≤ f(y) for all y ∈ R^M.

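The oracle and cutting-model setup above can be sketched in a few lines of NumPy; the piecewise-linear test function and all names here are illustrative, not from the slides.

```python
import numpy as np

# Hypothetical piecewise-linear convex test function f(y) = max_i (a_i^T y + b_i).
A = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])
b = np.array([0.0, 1.0, -0.5])

def oracle(y):
    """First order oracle: returns f(y) and one subgradient g(y)."""
    vals = A @ y + b
    i = int(np.argmax(vals))
    return vals[i], A[i]          # an active row is a valid subgradient

# Collect minorants omega = (gamma, g) with gamma = f(ybar) - <g, ybar>.
bundle = []
for ybar in [np.array([0.0, 0.0]), np.array([1.0, -1.0]), np.array([-2.0, 0.5])]:
    fval, g = oracle(ybar)
    bundle.append((fval - g @ ybar, g))

def f_W(y):
    """Cutting model: pointwise maximum of the collected linear minorants."""
    return max(gamma + g @ y for gamma, g in bundle)

# The cutting model minorizes f everywhere.
y = np.array([0.3, 0.7])
assert f_W(y) <= oracle(y)[0] + 1e-12
```

At a point where a minorant was generated the model is tight, since that minorant touches f there while all others stay below.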
Proximal Bundle Method [Lemaréchal 1978, Kiwiel 1990]

Input: a convex function given by a first order oracle.

[Figure: a convex function; the cutting plane model with g ∈ ∂f(ŷ); solving the augmented model yields the candidate ȳ; the cutting model is then improved in ȳ.]

  • 1. Find a candidate by solving the quadratic model
       min_y max_{ω ∈ W} f_ω(y) + (u/2)‖y − ŷ‖².
  • 2. Evaluate the function and determine a subgradient (oracle).
  • 3. Decide on a null step or a descent step.
  • 4. Update the model to contain at least the aggregate and the new minorant,

and iterate.
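A minimal executable sketch of steps 1–4, under simplifying assumptions: the model keeps only the aggregate and the newest minorant, so the quadratic subproblem reduces to a one-dimensional concave dual over the convex-combination weight ξ. Function and parameter names are hypothetical, not the slides' implementation.

```python
import numpy as np

def oracle(y):
    # f(y) = ||y||_1; np.sign(y) is a valid subgradient (0 at a kink).
    return np.abs(y).sum(), np.sign(y)

def prox_bundle(y0, u=1.0, kappa=0.1, tol=1e-8, max_iter=100):
    """Two-minorant proximal bundle sketch: model = {aggregate, newest minorant}."""
    yhat = np.asarray(y0, dtype=float)
    fhat, g = oracle(yhat)
    gam_a, g_a = fhat - g @ yhat, g.copy()      # aggregate minorant (gamma, g)
    gam_n, g_n = gam_a, g_a.copy()              # newest minorant
    for _ in range(max_iter):
        # Dual of the quadratic subproblem over xi in [0, 1] (1-D concave QP).
        dg, dgam = g_n - g_a, gam_n - gam_a
        den = dg @ dg
        xi = 0.0 if den == 0 else float(np.clip(
            (u * (dgam + dg @ yhat) - g_a @ dg) / den, 0.0, 1.0))
        gam_a = (1 - xi) * gam_a + xi * gam_n   # new aggregate ...
        g_a = (1 - xi) * g_a + xi * g_n
        ybar = yhat - g_a / u                   # ... gives the candidate
        pred = fhat - (gam_a + g_a @ ybar)      # predicted decrease >= 0
        if pred <= tol:
            break
        fcand, gcand = oracle(ybar)
        gam_n, g_n = fcand - gcand @ ybar, gcand  # minorant from the candidate
        if fhat - fcand >= kappa * pred:        # descent step ...
            yhat, fhat = ybar, fcand
        # ... otherwise null step: center stays, the new minorant enters
    return yhat, fhat

yopt, fopt = prox_bundle(np.array([3.0, -2.0]))
assert abs(fopt) < 1e-6     # minimum of ||.||_1 is 0 at the origin
```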
The Aggregate and Convergence

Given weight u > 0, the quadratic subproblem is a saddle point problem; exchanging min and max,

  min_y max_{ω ∈ W} f_ω(y) + (u/2)‖y − ŷ‖²
    = max_{ξ_ω ≥ 0, Σ_ω ξ_ω = 1} min_y Σ_{(γ,g) ∈ W} ξ_ω (γ + g⊤y) + (u/2)‖y − ŷ‖².

Determining the saddle point (ȳ, ω̄) over R^n × conv W yields

  • ω̄ = (γ̄, ḡ), the aggregate (the "best" minorant in conv W),
  • ȳ = ŷ − (1/u) ḡ, the next candidate for evaluation.

The progress f(ŷ) − f(ȳ) is compared to the predicted decrease

  f(ŷ) − f_ω̄(ȳ) = f(ŷ) − γ̄ − ⟨ŷ, ḡ⟩ + (1/u)‖ḡ‖² ≥ 0.

This decides on a descent step (ŷ ← ȳ) or a null step (ŷ kept, new ω added).

Theorem (e.g. [BoGiLeSa2003])
Let ŷ^k denote the center of iteration k; then f(ŷ^k) → inf f. If, in addition, ŷ^{k₀} = ŷ^k for k ≥ k₀ (finitely many descent steps), then ŷ^{k₀} minimizes f and (f(ŷ^k) − f_{ω̄^k}(ȳ^k))_{k > k₀} ↓ 0. If f is bounded below, then ḡ^k → 0 along a subsequence K.

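The closed form of the predicted decrease follows by substituting ȳ = ŷ − (1/u)ḡ into f_ω̄(ȳ) = γ̄ + ⟨ḡ, ȳ⟩; a small NumPy check with made-up values confirms the identity term by term.

```python
import numpy as np

rng = np.random.default_rng(0)
n, u = 4, 2.0
yhat = rng.normal(size=n)

# A made-up aggregate minorant (gamma_bar, g_bar) and center value f(yhat).
g_bar = rng.normal(size=n)
gamma_bar, f_hat = -1.3, 3.7

# Candidate from the aggregate: ybar = yhat - g_bar / u.
ybar = yhat - g_bar / u

# Predicted decrease, once via the model value at ybar ...
pred_direct = f_hat - (gamma_bar + g_bar @ ybar)
# ... and once via the closed form f(yhat) - gamma_bar - <yhat, g_bar> + (1/u)||g_bar||^2.
pred_formula = f_hat - gamma_bar - yhat @ g_bar + (g_bar @ g_bar) / u
assert np.isclose(pred_direct, pred_formula)
```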

The bundle framework offers a lot of flexibility and can be extended in many directions:

  • add scaling/“second order” information via the proximal term
  • allow constraints on y
  • Lagrangian relaxation/decomposition or sums of convex functions
  • generate good primal approximations in Lagrangian relaxation
  • solve the dual to primal cutting plane approaches
  • use specialized cutting models (quadratic subproblem solvable?)
  • asynchronous parallel approaches

For me it offers the potential of "a general tool like the simplex method for LP" → ConicBundle, which contains much but not yet all of this…

Here: choose the proximal term + ½‖y − ŷ‖²_H dynamically (dynamic scaling, "second order" information).

Dynamic Choice of Proximal Term + ½‖y − ŷ‖²_H

Variable metric bundle methods and second order approaches:

  • reversal quasi-Newton for Moreau–Yosida regularization [Lemaréchal, Sagastizábal 1997]
  • aggregate Hessians of piecewise smooth parts [Lukšan, Vlček 1998]
  • adapt BFGS [Lukšan, Vlček 1999], limited memory [Haarala, Miettinen, Mäkelä 2007], with bounds [Karmitsa, Mäkelä 2010]
  • VU-approach [Mifflin, Sagastizábal 2005]

Here: construct H directly from the collected subgradients (no update).

  • include information of dropped subgradients in the model
  • when dealing with sums of convex functions, this allows one to
    – combine second order information of smooth and nonsmooth functions
    – rearrange subgroups of functions on the fly (e.g. parallel computing)

Note: Convergence is easily guaranteed by not decreasing the eigenvalues after null steps and by enforcing lower and upper bounds on the eigenvalues.

Dynamic Choice of Proximal Term + ½‖y − ŷ‖²_H (Idea)

How to get curvature information from "nonsmooth" oracles?

[Figure: a univariate example; the minorants collected at earlier iterates trace the curvature of f.]

→ Idea: use old, possibly inactive minorants for adapting H.

The model formed by the aggregate ω̄ = (γ̄, ḡ) plus quadratic term H ≻ 0 should not violate any old minorant ω = (γ, g) by more than ε > 0:

  f_ω̄(y) + ½ (y − ŷ)⊤H(y − ŷ) ≥ f_ω(y) − ε,

i.e.

  γ̄ + ⟨ḡ, y⟩ + ½ (y − ŷ)⊤H(y − ŷ) ≥ γ + ⟨g, y⟩ − ε

  ⇔  γ̄ − γ + ε + ⟨ḡ − g, ŷ⟩ [=: δ (> 0)] + ⟨ḡ − g, y − ŷ⟩ + ½ (y − ŷ)⊤H(y − ŷ) ≥ 0 for all y

  ⇔  ( H        ḡ − g )
      ( (ḡ − g)⊤   2δ )  ⪰ 0   (Schur complement).

Lemma
Given ŷ, ε > 0, ω̄ = (γ̄, ḡ) and ω = (γ, g) with γ̄ + ḡ⊤ŷ > γ + g⊤ŷ − ε, a matrix H̄ ⪰ 0 ensures f_ω̄(y) + ½ (y − ŷ)⊤H̄(y − ŷ) ≥ f_ω(y) − ε for all y ∈ R^n if and only if

  H̄ ⪰ (ḡ − g)(ḡ − g)⊤ / (2 (γ̄ − γ + ε + (ḡ − g)⊤ŷ)).

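A numerical sanity check of the Lemma (made-up data, not from the slides): with H̄ equal to the rank-one right-hand side, the perturbed model dominates the old minorant up to ε, and the equivalent Schur-complement block matrix is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
yhat = rng.normal(size=n)

# Arbitrary aggregate (gamma_bar, g_bar), old minorant (gamma, g), and eps > 0;
# gamma_bar is chosen so that delta = gamma_bar - gamma + eps + <g_bar - g, yhat> = 1.
g_bar, g = rng.normal(size=n), rng.normal(size=n)
gamma, eps = -2.0, 0.5
gamma_bar = gamma - eps + 1.0 - (g_bar - g) @ yhat

d = g_bar - g
delta = gamma_bar - gamma + eps + d @ yhat
assert delta > 0

# Minimal H from the Lemma: H = d d^T / (2 delta).
H = np.outer(d, d) / (2 * delta)

# f_aggregate(y) + 0.5 (y-yhat)^T H (y-yhat) >= f_omega(y) - eps on random points.
for _ in range(1000):
    y = yhat + rng.normal(scale=5.0, size=n)
    lhs = gamma_bar + g_bar @ y + 0.5 * (y - yhat) @ H @ (y - yhat)
    assert lhs >= gamma + g @ y - eps - 1e-9

# Equivalent Schur complement condition: [[H, d], [d^T, 2 delta]] is PSD.
M = np.block([[H, d[:, None]], [d[None, :], np.array([[2 * delta]])]])
assert np.linalg.eigvalsh(M).min() >= -1e-9
```

With z = y − ŷ the gap equals δ(1 + d⊤z/(2δ))², which is nonnegative for every y, so the assertions hold by construction.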

A "best" H̄ ⪰ 0 for all minorants ω_i by SDP

Theorem
Given ŷ, ε > 0, ω̄ = (γ̄, ḡ), and ω_i = (γ_i, g^i) with γ_i + (g^i)⊤ŷ < γ̄ + ḡ⊤ŷ + ε (i = 1, …, k), and a preference C ≻ 0, any optimal H̄ of

  minimize ⟨C, H⟩
  subject to H ⪰ (ḡ − g^i)(ḡ − g^i)⊤ / (2 (γ̄ + ε − γ_i + (ḡ − g^i)⊤ŷ)), i = 1, …, k,
             H ⪰ 0,

satisfies

  f_ω̄(y) + ½ (y − ŷ)⊤H̄(y − ŷ) ≥ max_{i=1,…,k} f_{ω_i}(y) − ε for all y ∈ R^n.

Furthermore, D := span{ḡ − g^i : i = 1, …, k} ⊆ R(H̄). In case C = I, equality holds, D = R(H̄).

Relation to the Hessian in the smooth case

Study f(x) = ½ x⊤Ax + b⊤x + ρ, convex (A ⪰ 0), with minorants ω_i = (γ_i, g^i) = (…, Ay^i + b), i = 1, …, k.

If A ≻ 0, the aggregate ω̄ is tangent at some ŷ, but shifted down,
→ we assume ω̄ = (γ̄, ḡ) = (… − ε, Aŷ + b),
→ ḡ − g^i = A(ŷ − y^i).

For ε = 0 the SDP constraints become

  H ⪰ A^{1/2} [A^{1/2}(ŷ − y^i)] [A^{1/2}(ŷ − y^i)]⊤ A^{1/2} / ‖A^{1/2}(ŷ − y^i)‖².

Lemma
Let f(x) be as above with A ⪰ 0. Given ŷ ∈ R^n and ε ≥ 0, suppose ω̄ = (γ̄, ḡ = ∇f(ŷ)) satisfies γ̄ + ḡ⊤ŷ ≤ f(ŷ) ≤ γ̄ + ḡ⊤ŷ + ε. Given y^i ∈ R^n and ω_i = (γ_i, g^i) as above, i = 1, …, k, let P ∈ R^{n×h}, P⊤P = I_h, have range space R(P) = span{A(ŷ − y^i) : i = 1, …, k}. Then the projected Hessian PP⊤APP⊤ is feasible for this SDP.

Under which conditions is the projected Hessian optimal?
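The Lemma's feasibility claim can be checked numerically for a random convex quadratic; the data and the construction of P below are illustrative (ε = 0, aggregate tangent at ŷ).

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 3

# Random convex quadratic f(x) = 0.5 x^T A x + b^T x (rho = 0), A positive definite.
B = rng.normal(size=(n, n))
A = B @ B.T + n * np.eye(n)
b = rng.normal(size=n)
f = lambda x: 0.5 * x @ A @ x + b @ x
grad = lambda x: A @ x + b

yhat = rng.normal(size=n)
ys = [rng.normal(size=n) for _ in range(k)]

# Aggregate tangent at yhat (epsilon = 0) and minorants generated at the y^i.
g_bar = grad(yhat)
gamma_bar = f(yhat) - g_bar @ yhat
minorants = [(f(y) - grad(y) @ y, grad(y)) for y in ys]

# Orthonormal basis P of span{A (yhat - y^i)} via reduced QR.
V = np.column_stack([A @ (yhat - y) for y in ys])
P, _ = np.linalg.qr(V)
H = P @ P.T @ A @ P @ P.T          # projected Hessian

# SDP feasibility: H - (g_bar - g)(g_bar - g)^T / (2 delta) is PSD for each minorant.
for gamma, g in minorants:
    d = g_bar - g
    delta = gamma_bar - gamma + d @ yhat
    assert delta > 0
    assert np.linalg.eigvalsh(H - np.outer(d, d) / (2 * delta)).min() >= -1e-8
```

The check succeeds because, with v = ŷ − y^i, the constraint reduces to the Cauchy–Schwarz inequality (v⊤Az)² ≤ (v⊤Av)(z⊤Az) in the A-inner product.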

Conditions for optimality of the Hessian

f(x) = ½ x⊤Ax + b⊤x + ρ with A ≻ 0, full dimensional case.

Theorem (Conjugate Directions)
Let A ≻ 0 and let v_i = (ŷ − y^i), i = 1, …, n, be a family of conjugate directions: v_i⊤Av_j = δ_ij for i, j = 1, …, n. For any C = Σ_{i=1}^n ζ_i v_i v_i⊤ with ζ_i > 0 and at least these constraints, H̄ = A is an optimal solution of the SDP.

Corollary (Eigenvector Directions)
Let A ≻ 0 and let v_i = (ŷ − y^i), i = 1, …, n, give rise to an eigenvalue decomposition A = Σ_{i=1}^n λ_i v_i v_i⊤. For C = I and at least these constraints, H̄ = A is an optimal solution of the SDP.

If the directions are close to this, H̄ should be close to the Hessian.

A Cheaper Nonsmooth Scaling Heuristic (Current Choice)

Choose a "suitable" orthonormal basis Q ∈ R^{n×h} of D = span{d_i = (ḡ − g^i)/√(2δ_i) : i = 1, …, k}, restrict H to QΛQ⊤, and replace

  H ⪰ d_i d_i⊤  "⇔"  Λ ⪰ Q⊤ d_i d_i⊤ Q

by

  λ_j = max{ diag(Q⊤ d_i d_i⊤ Q)_j = (Q⊤ d_i)_j² : i = 1, …, k },  j = 1, …, h.

For Q, determine the h most important directions in D by computing the singular value decomposition [d_1, …, d_k] = QΣP⊤ and use Q_{•,1:h}.

Can we justify the choice of this Q in the smooth case again?

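The heuristic is cheap to implement; a NumPy sketch (function name and the test data are hypothetical):

```python
import numpy as np

def scaling_heuristic(g_bar, gs, deltas, h):
    """Sketch of the heuristic: H = Q diag(lambda) Q^T from the h leading left
    singular vectors of the scaled differences d_i = (g_bar - g_i)/sqrt(2 delta_i)."""
    D = np.column_stack([(g_bar - g) / np.sqrt(2 * dl) for g, dl in zip(gs, deltas)])
    Q = np.linalg.svd(D, full_matrices=False)[0][:, :h]   # h most important directions
    lam = ((Q.T @ D) ** 2).max(axis=1)    # lambda_j = max_i (Q^T d_i)_j^2
    return Q @ np.diag(lam) @ Q.T

# Example with made-up data: the result is PSD with rank at most h.
rng = np.random.default_rng(0)
g_bar = rng.normal(size=6)
gs = [rng.normal(size=6) for _ in range(4)]
H = scaling_heuristic(g_bar, gs, deltas=[1.0, 2.0, 0.5, 1.5], h=3)
assert np.linalg.eigvalsh(H).min() >= -1e-10
assert np.linalg.matrix_rank(H) <= 3
```

Only one thin SVD of an n×k matrix is needed, instead of solving a semidefinite program.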
slide-64
SLIDE 64

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

For f(y) = ½ y⊤Ay + b⊤y + ρ with A ≻ 0 we get d_i ≈ A(ŷ − y_i) / ‖A^{1/2}(ŷ − y_i)‖.

Theorem
Given δ > 0 and A ≻ 0, let s_i ∈ R^n, i = 1, …, k, be chosen by a rotationally symmetric distribution, let V = [d_1, …, d_k] with d_i = A s_i / ‖A^{1/2} s_i‖, and let QΣ_V P⊤ = V be an SVD with Q⊤Q = I_n. Then, with probability going to one for k → ∞, there is an orthogonal matrix Q̄ = Q + O(δ) diagonalizing A = Q̄ Λ_A Q̄⊤.

In the smooth case, Q gets close to an eigenvector basis of the Hessian!
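A quick numerical illustration of the theorem (a sketch with made-up data, not from the talk): sample Gaussian s_i, which are rotationally symmetric, build d_i = A s_i / ‖A^{1/2} s_i‖, and check that the left singular vectors of V line up with the eigenvectors of A:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 20000
# A = QA diag(lam_true) QA^T with a known, well separated eigenbasis
lam_true = np.array([16.0, 8.0, 4.0, 2.0, 1.0])
QA, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = QA @ np.diag(lam_true) @ QA.T
A_half = QA @ np.diag(np.sqrt(lam_true)) @ QA.T

S = rng.standard_normal((n, k))                    # rotationally symmetric s_i
V = (A @ S) / np.linalg.norm(A_half @ S, axis=0)   # columns d_i = A s_i / ||A^{1/2} s_i||
Q, _, _ = np.linalg.svd(V, full_matrices=False)

# each left singular vector aligns (up to sign) with one eigenvector of A,
# so every row of |Q^T QA| has one entry close to 1
alignment = np.max(np.abs(Q.T @ QA), axis=1)
```

For growing k the alignment values approach 1, matching the Q̄ = Q + O(δ) statement.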

slide-66
SLIDE 66

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Proof
Write A = Q_A Λ_A Q_A⊤ with eigenvalues Λ_A = Diag(λ_1 ≥ ⋯ ≥ λ_n). Then

(1/k) V V⊤ = Q_A Λ_A^{1/2} [ (1/k) Σ_{i=1}^{k} (Λ_A^{1/2} Q_A⊤ s_i)(Λ_A^{1/2} Q_A⊤ s_i)⊤ / ‖Λ_A^{1/2} Q_A⊤ s_i‖² ] Λ_A^{1/2} Q_A⊤.

Since the s_i are rotationally symmetric, Q_A⊤ s_i has the same distribution as s_i, so the middle random matrix has entries

x_{gh} = [ Σ_{i=1}^{k} (1/k) Λ_A^{1/2} s_i (Λ_A^{1/2} s_i)⊤ / ‖Λ_A^{1/2} s_i‖² ]_{gh} = Σ_{i=1}^{k} (1/k) λ_g^{1/2} λ_h^{1/2} [s_i]_g [s_i]_h / ‖Λ_A^{1/2} s_i‖²,  1 ≤ g ≤ h ≤ n.

For g < h, each summand is symmetrically distributed in [−1, 1], hence E(x_{gh}) = 0 and Var(x_{gh}) ≤ 1/k, and

P( max_{1≤g<h≤n} |x_{gh}| > ε ) ≤ n(n−1) P(|x_{12}| > ε) ≤ n(n−1) Var(x_{12}) / ε² ≤ n(n−1) / (ε² k).

Order on the diagonal:

E(x_{gg}) = E( [s]_g² / ( [s]_g² + (1/λ_g) Σ_{j ∈ {1,…,n}∖{g}} λ_j [s]_j² ) ) ≥ E(x_{hh})  for g ≤ h.

Thus, with probability going to one for k → ∞,

Λ_A^{1/2} [ (1/k) Σ_{i=1}^{k} (Λ_A^{1/2} Q_A⊤ s_i)(Λ_A^{1/2} Q_A⊤ s_i)⊤ / ‖Λ_A^{1/2} Q_A⊤ s_i‖² ] Λ_A^{1/2}  →  D + Y

with diagonal D_{11} ≥ ⋯ ≥ D_{nn} > 0 and a perturbation Y ∈ R^{n×n} with ‖Y‖_F < δ. Now use Lemma 4.3 in [SunSun2003]. □
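The concentration step is easy to check empirically (a sketch with arbitrarily chosen eigenvalues): the middle matrix has trace one, off-diagonal entries of order 1/√k, and a decreasing diagonal:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 4, 50000
lam = np.array([8.0, 4.0, 2.0, 1.0])       # eigenvalues lambda_1 >= ... >= lambda_n
S = rng.standard_normal(( n, k))           # rotationally symmetric s_i
W = np.sqrt(lam)[:, None] * S              # columns Lambda^{1/2} s_i
W /= np.linalg.norm(W, axis=0)             # normalize by ||Lambda^{1/2} s_i||
X = (W @ W.T) / k                          # the middle random matrix (x_gh)

off = X - np.diag(np.diag(X))              # off-diagonal part, should be O(1/sqrt(k))
```

Since each normalized column has unit norm, trace(X) = 1 exactly; the off-diagonal entries shrink like 1/√k while the diagonal stays ordered, which is what feeds the SVD perturbation lemma.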

slide-72
SLIDE 72

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Implementational issues

At each descent step k, the heuristic is called for the p last subgradients; it returns a low-rank H̄_k = GG⊤, of which we only use the diagonal! The proximal term we use is H_k = u_k I + Diag(H̄_k), with weight u_k > 0 mimicking a trust-region behavior.

Several further things to make it work:

  • choice of ε: choose relative to the gap f(ŷ) − γ̄ − ḡ⊤ŷ
  • put an upper bound on ‖g − ḡ‖²/(2δ) (skip/hint for bundle?)
  • select few columns of Q by relative singular value sizes
  • Diag is bad if e.g. G = 𝟙; rescale to relative sizes
  • introduce some history: convex combination with previous H
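A minimal sketch of assembling this proximal term (the function name and the history weight are illustrative, not from the talk):

```python
import numpy as np

def proximal_diag(G, u, prev_diag=None, history=0.5):
    """Sketch: diagonal of the proximal term H_k = u_k I + Diag(Hbar_k).

    G : (n, h) low-rank factor returned by the scaling heuristic, Hbar_k = G G^T.
    u : trust-region-like weight u_k > 0.
    prev_diag : diagonal of the previous Hbar for the convex-combination history.
    history : weight on the previous diagonal, in [0, 1] (illustrative default).
    """
    d = np.sum(G**2, axis=1)          # diag(G G^T) without forming G G^T
    if prev_diag is not None:
        d = history * prev_diag + (1.0 - history) * d
    return u + d
```

The quadratic term in the bundle subproblem then uses this vector as the diagonal of H_k, so the subproblem stays as cheap as with the plain weight u_k I.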
slide-74
SLIDE 74

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Example: Truck Scheduling [H.Röhl2007]

  • eCom Logistics operates three warehouses within the same city
  • In each, an automatic storage system holds 50000–70000 pallets storing up to 40000 different products
  • trucks transport pallets between warehouses to balance demand

[Figure: warehouses A, B, C with truck routes between them]

Goal: schedule the trucks so that demand is satisfied on time

slide-75
SLIDE 75

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Modeling Idea:

  • For each article the flow of pallets between the warehouses is modeled by a network flow, discretized over time.
  • Capacity on transport arcs is opened by trucks that serve this arc.

[Figure: time-expanded network for one article over warehouses A, B, C at times t = 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5]

  • Use Lagrangian relaxation on the coupling constraints.

Transportation of 200–1100 articles, 800–1000 pallets per day; time steps of 10 min → up to 1,100,000 arcs and 4500 multipliers
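How such a Lagrangian relaxation feeds the bundle method's first-order oracle can be sketched as follows. This is a toy with explicitly enumerated candidate solutions per subproblem, standing in for the per-article min-cost-flow solves; all names and data are illustrative:

```python
import numpy as np

def dual_oracle(y, subproblems, c):
    """First-order oracle for the convex function
        f(y) = y^T c - sum_a min_j (cost_a[j] + y^T B_a[:, j]),
    i.e. the negated Lagrangian dual of subproblems coupled by sum_a B_a x_a <= c.

    subproblems : list of (costs, B) pairs; column j of B with costs[j]
                  enumerates candidate solution j of that subproblem (toy data).
    Returns (f(y), subgradient g(y)).
    """
    val = float(y @ c)
    g = c.astype(float).copy()
    for costs, B in subproblems:
        red = costs + y @ B          # Lagrangian cost of each candidate
        j = int(np.argmin(red))      # subproblem minimizer
        val -= red[j]
        g -= B[:, j]                 # subgradient contribution c - sum_a B_a x_a*
    return val, g
```

Each oracle call solves all subproblems independently, exactly the structure exploited in the truck scheduling model (one min-cost-flow per article), and the returned g satisfies the subgradient inequality by construction.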

slide-76
SLIDE 76

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Numerical Experiments

All comparisons are on truck scheduling instances, each with ∼ 800 min-cost-flow problems (mcf-problems) and three times as many piecewise linear convex real functions. We study the influence of the parameters

  • past subgradients: p ∈ {0, 10, 50}
  • bundle size: b ∈ {2, 50, 100}

We test on 32 original instances with known optimal value; the time limit per instance and variant is 30 min. We compare

  • the time until reaching relative precision 10−3
  • the number of oracle calls needed for this

The performance profiles give for each ρ ∈ [1, 5] the number of instances solved within a factor ρ of the best setting.
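The performance-profile counts described above can be computed directly from a matrix of times (or oracle-call counts); a small sketch, with np.inf marking runs that missed the 30 min limit:

```python
import numpy as np

def performance_profile(T, rhos):
    """For each rho, count the instances each variant solves within a factor
    rho of the best variant on that instance.

    T : (n_instances, n_variants) array of times; np.inf = not solved in limit.
        Assumes every instance is solved by at least one variant.
    Returns an array of shape (len(rhos), n_variants) of counts.
    """
    best = T.min(axis=1, keepdims=True)     # best time per instance
    ratio = T / best                        # performance ratio per variant
    return np.array([(ratio <= r).sum(axis=0) for r in rhos])
```

Plotting the rows of the result over ρ gives exactly the curves shown on the next slide.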

slide-77
SLIDE 77

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Performance Profiles (Time & Oracle Calls)

number of problems solved to relative precision 10−3 within a multiple of up to 5 of the best algorithm (≤ 30 min).

[Figure: performance profiles over ρ ∈ [1, 5] for time (left) and oracle calls (right), one curve per setting trd pX bY with p ∈ {0, 10, 50} and b ∈ {2, 50, 100}]

Clear winner: scaling with p = 50 gradients and bundle size b = 2. Surprise: it also wins regarding the number of oracle calls!

slide-78
SLIDE 78

Bundle Method Proximal Term Hessian Heuristic Implementation Experiments

Thank you for your attention!