Computational Optimization: Advanced Topics, Nonsmooth Optimization (PowerPoint Presentation)


SLIDE 1

Computational Optimization

Advanced Topics: Nonsmooth Optimization
Reference: Ruszczyński, Nonlinear Optimization, 2006

SLIDE 2

Best Linear Separator: Supporting Plane Method

Maximize the distance between the two parallel supporting planes

$$x \cdot w = \delta \qquad x \cdot w = \beta$$

Distance = "Margin" = $\dfrac{\delta - \beta}{\|w\|}$
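For instance, a quick numeric check of the margin formula (the numbers are arbitrary):

```python
import numpy as np

w = np.array([3.0, 4.0])      # normal vector of both planes, ||w|| = 5
delta, beta = 7.0, 2.0        # plane offsets: x.w = delta and x.w = beta
margin = (delta - beta) / np.linalg.norm(w)
print(margin)                  # (7 - 2) / 5 = 1.0
```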

SLIDE 3

Linearly Inseparable Case: Soft Margin Method

$$\min_{w,b}\; C \sum_i \max\big(0,\; 1 - y_i\,(x_i \cdot w + b)\big) + \tfrac{1}{2}\|w\|_2^2$$

• The $\max(0,\; 1 - y_i(x_i \cdot w + b))$ term is the hinge loss.
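As a quick numeric check, here is a minimal NumPy sketch that evaluates this objective; the toy data and the name `soft_margin_objective` are illustrative assumptions, not from the slides:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """C * sum of hinge losses + (1/2)||w||_2^2, with labels y in {-1, +1}."""
    margins = y * (X @ w + b)                 # y_i (x_i . w + b)
    hinge = np.maximum(0.0, 1.0 - margins)    # hinge loss per data point
    return C * hinge.sum() + 0.5 * np.dot(w, w)

# toy data: two points per class
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
print(soft_margin_objective(np.array([0.5, 0.5]), 0.0, X, y, C=1.0))
```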

SLIDE 4

Nonsmooth Optimization

If the objective is not differentiable, or the constraints are not differentiable, then the problem is nonsmooth. For today's lecture, assume everything is convex but possibly nonsmooth.

SLIDE 5

Common nonsmooth problems

• Problems involving max functions
• Problems involving absolute values
• Exact penalty formulations
• Lagrangian dual problems

SLIDE 6

Strategy I

Smooth the nonsmooth problem by reformulating with added variables and constraints

$$\min_{w,b,z}\; C \sum_i z_i + \tfrac{1}{2}\|w\|_2^2 \quad \text{s.t.}\quad y_i\,(x_i \cdot w + b) + z_i \ge 1,\;\; z_i \ge 0,\;\; i = 1, \dots, m$$

  • But this increases the problem size
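A minimal sketch of this slack-variable reformulation in CVXPY; the toy data and the value of C are illustrative assumptions:

```python
import cvxpy as cp
import numpy as np

# toy data with labels y in {-1, +1}
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, n = X.shape
C = 1.0

w = cp.Variable(n)
b = cp.Variable()
z = cp.Variable(m)   # one added slack variable per data point

# smooth (quadratic) problem: the max() has become linear constraints
constraints = [cp.multiply(y, X @ w + b) + z >= 1, z >= 0]
objective = cp.Minimize(C * cp.sum(z) + 0.5 * cp.sum_squares(w))
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```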
SLIDE 7

Strategy II

Tackle the nonsmooth problem directly. The problems can still be quite nice: convex functions are always continuous. We need to generalize optimality conditions, and we need to generalize algorithms.

SLIDE 8

Subgradient

Generalization of the gradient.

Definition: Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function. A vector $g \in \mathbb{R}^n$ such that

$$f(y) \ge f(x) + g'(y - x) \quad \text{for all } y$$

is a subgradient of $f$ at $x$.

[Figure: hinge loss with a supporting line illustrating the subgradient inequality]

SLIDE 9

Subdifferential

The subgradient may not be unique. The set of all subgradients of $f$ at $x$ is called the subdifferential; we write $g \in \partial f(x)$. If $f$ is differentiable at $x$, the subdifferential consists of one point: the gradient of $f$ at $x$.

SLIDE 10

Subgradient

$f(x) = \max(0,\, 1 - x)$. Subdifferential of $f$:

$$\partial f(x) = \begin{cases} \{0\} & \text{if } x > 1 \\ [-1,\, 0] & \text{if } x = 1 \\ \{-1\} & \text{if } x < 1 \end{cases}$$

[Figure: hinge loss with supporting lines illustrating $f(y) \ge f(x) + g'(y - x)$]
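In code, any element of these sets is a valid subgradient; a minimal sketch (the function name is mine):

```python
def hinge_subgradient(x):
    """Return one subgradient of f(x) = max(0, 1 - x)."""
    if x > 1:
        return 0.0
    if x == 1:
        return 0.0   # at the kink, any value in [-1, 0] is a valid subgradient
    return -1.0
```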

SLIDE 11

Subgradient Method: Analogous to Steepest Descent

Basic algorithm:

$$x^{k+1} = x^k - \alpha_k g^k, \quad g^k \in \partial f(x^k)$$

where $\alpha_k$ is the stepsize, e.g. $\alpha_k = \dfrac{\gamma_k}{\max(\tau,\, \|g^k\|)}$ with $\gamma_k = C$ constant.
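A minimal sketch of this iteration on a toy nonsmooth convex function; the function, the constants gamma and tau, and the iteration count are illustrative assumptions:

```python
import numpy as np

def f(x):
    # toy nonsmooth convex function, minimized on the interval [-1, 3]
    return abs(x - 3) + abs(x + 1)

def subgrad(x):
    # np.sign(0) == 0 is a valid choice: 0 is in the subdifferential of |.| at 0
    return np.sign(x - 3) + np.sign(x + 1)

x, gamma, tau = 10.0, 0.5, 1e-8
best = f(x)
for k in range(200):
    g = subgrad(x)
    alpha = gamma / max(tau, abs(g))   # stepsize from the slide, gamma constant
    x = x - alpha * g
    best = min(best, f(x))             # track the best value: f may not decrease
print(x, best)
```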

SLIDE 12

Stepsize is harder

The subgradient is not necessarily a direction of descent.

[Figure: contour plot of $f$ showing the subdifferential $\partial f(x)$ and a step along $-g^k$ that is not a descent direction]

But fixed stepsize schemes can still work

SLIDE 13

Subgradient Descent Algorithms

Like gradient descent, but with a subgradient. Catch: the function value may not decrease! The stepsize is a bit tricky: usually use fixed step sizes that must be sufficiently small, or use trust region methods.

Converges despite all that.

SLIDE 14

Next hardest problem

Solve

$$\min f(x) \quad \text{s.t. } x \in X$$

assuming the projection of $x$ onto $X$ is easy, for example box constraints:

$$P(x) = \arg\min_c \|c - x\|^2 \quad \text{s.t. } L \le c \le U$$

SLIDE 15

Projected Subgradient Descent Method

Basic algorithm:

$$x^{k+1} = P\big(x^k - \alpha_k g^k\big), \quad g^k \in \partial f(x^k)$$

where $\alpha_k$ is the stepsize. Optimal if

$$x^k = P\big(x^k - \alpha_k g^k\big)$$

i.e., the iteration reaches a fixed point.
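A minimal sketch for the box-constrained case from the previous slide, where the projection is a componentwise clip; the toy objective and step size are illustrative assumptions:

```python
import numpy as np

def project_box(x, L, U):
    # P(x) = argmin ||c - x||^2 s.t. L <= c <= U: clip each component
    return np.clip(x, L, U)

# minimize f(x) = ||x - t||_1 over the box [0, 1]^2, with target t outside the box
t = np.array([2.0, -1.0])
L, U = np.zeros(2), np.ones(2)

x = np.array([0.5, 0.5])
for k in range(100):
    g = np.sign(x - t)                   # a subgradient of the l1 objective
    x = project_box(x - 0.1 * g, L, U)   # projected subgradient step
print(x)                                  # converges to the corner [1, 0]
```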

SLIDE 16

Cutting Plane Methods

Observe: the subgradient inequality holds for all $y$:

$$f(y) \ge f(x^k) + g^{k\prime}(y - x^k) \quad \text{for each } k$$

Taking the cuts at $x^1, x^2, \dots$ together:

$$f(y) \ge \max_k \big\{ f(x^k) + g^{k\prime}(y - x^k) \big\}$$

SLIDE 17

Cutting Plane Algorithm

To solve $\min f(x)$ with $f$ subdifferentiable:

Start with $x^1$. For $k = 1, 2, \dots$:

1. Compute $g^k \in \partial f(x^k)$.
2. Solve the master problem

$$x^{k+1} \in \arg\min_{y,\,z}\; z \quad \text{s.t. } z \ge f(x^i) + g^{i\prime}(y - x^i),\; i = 1, \dots, k$$

3. If $f(x^{k+1}) = f^k(x^{k+1})$ (the optimal $z$ of the master), then stop: $x^{k+1}$ is optimal.
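A minimal sketch of this loop on a 1-D piecewise-linear function, using scipy's linprog for the master problem; the toy function, the box bound on y, and the stopping tolerance are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

f = lambda x: abs(x - 1.0) + 0.5 * x     # convex piecewise linear, minimum at x = 1
g = lambda x: np.sign(x - 1.0) + 0.5     # a subgradient of f

lo, hi = -5.0, 5.0                        # bound y so the master has a solution
x, cuts = 4.0, []
for k in range(50):
    cuts.append((g(x), f(x) - g(x) * x))  # cut: z >= a*y + c
    # master over (y, z): min z s.t. a_i*y - z <= -c_i, lo <= y <= hi
    A_ub = [[a, -1.0] for a, c in cuts]
    b_ub = [-c for a, c in cuts]
    res = linprog(c=[0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(lo, hi), (None, None)])
    x, z_model = res.x                    # new iterate and the model value
    if f(x) - z_model <= 1e-9:            # f(x^{k+1}) equals the model: optimal
        break
print(x, f(x))
```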

SLIDE 18

Cutting Plane Method

• Converges for quite general cases.
• If $f$ is piecewise linear, requires a finite number of cuts.
• Easy to adapt to linearly constrained cases as well.
• Can converge slowly; the number of cuts is not bounded in general.

SLIDE 19

Dual Problem is Nonsmooth

Optimize a convex program:

$$\min f(x) \quad \text{s.t. } Ax = b,\; x \in X$$

Lagrangian dual function:

$$\theta(\lambda) = \min_{x \in X}\; f(x) + \lambda'(b - Ax)$$

Lagrangian dual problem:

$$\max_\lambda\; \theta(\lambda)$$

SLIDE 20

Dual function subgradient

A subgradient is found by solving

$$x^k \in \arg\min_{x \in X}\; f(x) + \lambda^{k\prime}(b - Ax)$$

then

$$g^k = b - Ax^k \in \partial\theta(\lambda^k)$$
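A minimal sketch for a case where the inner minimization is easy: f linear and X a box, so the minimizer is found componentwise; all problem data here are illustrative assumptions:

```python
import numpy as np

# min c'x s.t. Ax = b, x in X = {x : 0 <= x <= u}
c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([2.0])
u = 1.5 * np.ones(3)

def theta_and_subgradient(lam):
    # inner problem: min over the box of (c - A'lam)'x + lam'b, coordinatewise
    reduced = c - A.T @ lam
    x = np.where(reduced < 0, u, 0.0)   # pick u_j where the reduced cost is negative
    theta = reduced @ x + lam @ b        # dual function value theta(lam)
    g = b - A @ x                        # subgradient of theta at lam
    return theta, g, x

lam = np.zeros(1)
for k in range(100):
    theta, g, x = theta_and_subgradient(lam)
    lam = lam + 0.05 * g                 # subgradient *ascent*: the dual is maximized
theta, g, x = theta_and_subgradient(lam)
print(lam, theta, x)
```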

SLIDE 21

Cutting Plane Method for Dual Problem

Similar to the unconstrained case, except solve, for some large fixed $C$:

$$\lambda^{k+1} \in \arg\max_{y,\,z}\; z \quad \text{s.t. } z \le \theta(\lambda^i) + g^{i\prime}(y - \lambda^i),\; i = 1, \dots, k, \quad -C \le y \le C$$

The $C$ constraints ensure the master problem always has a solution.

SLIDE 22

Recover primal variables

At optimality we need to get back the primal solution $x^*$. Look at the KKT conditions of the master problem. One can show that, using the multipliers $u$ of the master,

$$x^* = \sum_{i=1}^{k} u_i\, x^i$$

SLIDE 23

Bundle Methods

A problem with cutting plane methods is that they may require too many cuts. Bundle methods get around this difficulty by using a regularized master problem:

$$\min_{y \in X,\, z}\; z + \frac{\rho}{2}\|y - w^k\|^2 \quad \text{s.t. } z \ge f(x^i) + g^{i\prime}(y - x^i),\; i \in J_k$$
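A minimal sketch of a single regularized master solve in CVXPY; the bundle of cuts, rho, and the center w^k are illustrative assumptions:

```python
import cvxpy as cp
import numpy as np

# bundle collected so far: (x^i, f(x^i), g^i) triples for a 2-D problem
xs = [np.array([2.0, 0.0]), np.array([0.0, 1.0])]
fs = [3.0, 1.5]
gs = [np.array([1.0, -1.0]), np.array([-0.5, 0.5])]

rho = 1.0
w_center = np.array([1.0, 0.5])   # current center w^k

y = cp.Variable(2)
z = cp.Variable()
cuts = [z >= fi + gi @ (y - xi) for xi, fi, gi in zip(xs, fs, gs)]
objective = cp.Minimize(z + (rho / 2) * cp.sum_squares(y - w_center))
cp.Problem(objective, cuts).solve()
print(y.value, z.value)
```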

SLIDE 24

Bundle Methods

$w^k$ is called the center. You don't want to change the center unless you have added enough constraints to get a good decrease. You can drop some or all of the constraints that have zero Lagrangian multipliers in the regularized master problem.

SLIDE 25

Bundle algorithm

0. Set $k = 1$, $J = \{\}$, and $v^1 = -\infty$.
1. Calculate $f(x^k)$ and $g^k$; if $f(x^k) < v^k$, add cut $k$ to the constraints in $J$.
2. If $k = 1$ or $f(x^k) \le (1 - a)\, f(w^{k-1}) + a\, f^{k-1}(x^k)$, then $w^k = x^k$; else $w^k = w^{k-1}$.
3. Solve the restricted master for $(x^{k+1}, v^{k+1})$.
4. If $f^k(x^{k+1}) = f(w^k)$, then stop: $x^{k+1}$ is optimal.
5. Update $J$ by removing cuts with zero multipliers from the restricted master solve.

SLIDE 26

Bundle Methods for Nonsmooth Optimization

• No step size needed.
• Nice check for optimality: if the function achieves its lower bound, it is optimal.
• Reduces to a series of nice convex quadratic subproblems.
• Can remove constraints while still adding new ones.
• Finite convergence for piecewise-linear convex functions with polyhedral constraints.
• Can be extended to nonconvex nonsmooth optimization, but things get a bit more tricky.
• Still only uses first-order information, so it can be slow.