Computational Optimization: Advanced Topics, Nonsmooth Optimization (PowerPoint Presentation)


SLIDE 1

Computational Optimization

Advanced Topics: Nonsmooth Optimization
Reference: Ruszczyński, Nonlinear Optimization, 2006

SLIDE 2

Best Linear Separator: Supporting Plane Method

Maximize the distance between the two parallel supporting planes

$$x \cdot w = \delta \qquad x \cdot w = \beta$$

Distance = "Margin" = $\dfrac{\delta - \beta}{\|w\|}$
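For instance, a quick numeric check of the margin formula (the numbers are arbitrary):

```python
import numpy as np

w = np.array([3.0, 4.0])      # normal vector of both planes, ||w|| = 5
delta, beta = 7.0, 2.0        # plane offsets: x.w = delta and x.w = beta
margin = (delta - beta) / np.linalg.norm(w)
print(margin)                  # (7 - 2) / 5 = 1.0
```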

SLIDE 3

Linearly Inseparable Case: Soft Margin Method

$$\min_{w,b}\; C \sum_i \max\big(0,\; 1 - y_i\,(x_i \cdot w + b)\big) + \tfrac{1}{2}\|w\|_2^2$$

• The $\max(0,\; 1 - y_i(x_i \cdot w + b))$ term is the hinge loss.
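As a quick numeric check, here is a minimal NumPy sketch that evaluates this objective; the toy data and the name `soft_margin_objective` are illustrative assumptions, not from the slides:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """C * sum of hinge losses + (1/2)||w||_2^2, with labels y in {-1, +1}."""
    margins = y * (X @ w + b)                 # y_i (x_i . w + b)
    hinge = np.maximum(0.0, 1.0 - margins)    # hinge loss per data point
    return C * hinge.sum() + 0.5 * np.dot(w, w)

# toy data: two points per class
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
print(soft_margin_objective(np.array([0.5, 0.5]), 0.0, X, y, C=1.0))
```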

SLIDE 4

Nonsmooth Optimization

If the objective is not differentiable, or the constraints are not differentiable, then the problem is nonsmooth. For today's lecture, assume everything is convex but possibly nonsmooth.

SLIDE 5

Common nonsmooth problems

• Problems involving max functions
• Problems involving absolute values
• Exact penalty formulations
• Lagrangian dual problems

SLIDE 6

Strategy I

Smooth the nonsmooth problem by reformulating with added variables and constraints

$$\min_{w,b,z}\; C \sum_i z_i + \tfrac{1}{2}\|w\|_2^2 \quad \text{s.t.}\quad y_i\,(x_i \cdot w + b) + z_i \ge 1,\;\; z_i \ge 0,\;\; i = 1, \dots, m$$

  • But this increases the problem size
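A minimal sketch of this slack-variable reformulation in CVXPY; the toy data and the value of C are illustrative assumptions:

```python
import cvxpy as cp
import numpy as np

# toy data with labels y in {-1, +1}
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, n = X.shape
C = 1.0

w = cp.Variable(n)
b = cp.Variable()
z = cp.Variable(m)   # one added slack variable per data point

# smooth (quadratic) problem: the max() has become linear constraints
constraints = [cp.multiply(y, X @ w + b) + z >= 1, z >= 0]
objective = cp.Minimize(C * cp.sum(z) + 0.5 * cp.sum_squares(w))
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```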
SLIDE 7

Strategy II

Tackle the nonsmooth problem directly. The problems can still be quite nice: convex functions are always continuous. We need to generalize optimality conditions, and we need to generalize algorithms.

SLIDE 8

Subgradient

Generalization of the gradient.

Definition: Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function. A vector $g \in \mathbb{R}^n$ such that

$$f(y) \ge f(x) + g'(y - x) \quad \text{for all } y$$

is a subgradient of $f$ at $x$.

[Figure: hinge loss with a supporting line illustrating the subgradient inequality]

SLIDE 9

Subdifferential

The subgradient may not be unique. The set of all subgradients of $f$ at $x$ is called the subdifferential; we write $g \in \partial f(x)$. If $f$ is differentiable at $x$, the subdifferential consists of one point: the gradient of $f$ at $x$.

SLIDE 10

Subgradient

$f(x) = \max(0,\, 1 - x)$. Subdifferential of $f$:

$$\partial f(x) = \begin{cases} \{0\} & \text{if } x > 1 \\ [-1,\, 0] & \text{if } x = 1 \\ \{-1\} & \text{if } x < 1 \end{cases}$$

[Figure: hinge loss with supporting lines illustrating $f(y) \ge f(x) + g'(y - x)$]
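In code, any element of these sets is a valid subgradient; a minimal sketch (the function name is mine):

```python
def hinge_subgradient(x):
    """Return one subgradient of f(x) = max(0, 1 - x)."""
    if x > 1:
        return 0.0
    if x == 1:
        return 0.0   # at the kink, any value in [-1, 0] is a valid subgradient
    return -1.0
```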

SLIDE 11

Subgradient Method: Analogous to Steepest Descent

Basic algorithm:

$$x^{k+1} = x^k - \alpha_k g^k, \quad g^k \in \partial f(x^k)$$

where $\alpha_k$ is the stepsize, e.g. $\alpha_k = \dfrac{\gamma_k}{\max(\tau,\, \|g^k\|)}$ with $\gamma_k = C$ constant.
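A minimal sketch of this iteration on a toy nonsmooth convex function; the function, the constants gamma and tau, and the iteration count are illustrative assumptions:

```python
import numpy as np

def f(x):
    # toy nonsmooth convex function, minimized on the interval [-1, 3]
    return abs(x - 3) + abs(x + 1)

def subgrad(x):
    # np.sign(0) == 0 is a valid choice: 0 is in the subdifferential of |.| at 0
    return np.sign(x - 3) + np.sign(x + 1)

x, gamma, tau = 10.0, 0.5, 1e-8
best = f(x)
for k in range(200):
    g = subgrad(x)
    alpha = gamma / max(tau, abs(g))   # stepsize from the slide, gamma constant
    x = x - alpha * g
    best = min(best, f(x))             # track the best value: f may not decrease
print(x, best)
```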

SLIDE 12

Stepsize is harder

The subgradient is not necessarily a direction of descent.

[Figure: contour plot of $f$ showing the subdifferential $\partial f(x)$ and a step along $-g^k$ that is not a descent direction]

But fixed stepsize schemes can still work

SLIDE 13

Subgradient Descent Algorithms

Like gradient descent, but with a subgradient. Catch: the function value may not decrease! The stepsize is a bit tricky: usually use fixed step sizes that must be sufficiently small, or use trust region methods.

Converges despite all that.

SLIDE 14

Next hardest problem

Solve

$$\min f(x) \quad \text{s.t. } x \in X$$

assuming the projection of $x$ onto $X$ is easy, for example box constraints:

$$P(x) = \arg\min_c \|c - x\|^2 \quad \text{s.t. } L \le c \le U$$

SLIDE 15

Projected Subgradient Descent Method

Basic algorithm:

$$x^{k+1} = P\big(x^k - \alpha_k g^k\big), \quad g^k \in \partial f(x^k)$$

where $\alpha_k$ is the stepsize. Optimal if

$$x^k = P\big(x^k - \alpha_k g^k\big)$$

i.e., the iteration reaches a fixed point.
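A minimal sketch for the box-constrained case from the previous slide, where the projection is a componentwise clip; the toy objective and step size are illustrative assumptions:

```python
import numpy as np

def project_box(x, L, U):
    # P(x) = argmin ||c - x||^2 s.t. L <= c <= U: clip each component
    return np.clip(x, L, U)

# minimize f(x) = ||x - t||_1 over the box [0, 1]^2, with target t outside the box
t = np.array([2.0, -1.0])
L, U = np.zeros(2), np.ones(2)

x = np.array([0.5, 0.5])
for k in range(100):
    g = np.sign(x - t)                   # a subgradient of the l1 objective
    x = project_box(x - 0.1 * g, L, U)   # projected subgradient step
print(x)                                  # converges to the corner [1, 0]
```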

SLIDE 16

Cutting Plane Methods

Observe: the subgradient inequality holds for all $y$:

$$f(y) \ge f(x^k) + g^{k\prime}(y - x^k) \quad \text{for each } k$$

Taking the cuts at $x^1, x^2, \dots$ together:

$$f(y) \ge \max_k \big\{ f(x^k) + g^{k\prime}(y - x^k) \big\}$$

SLIDE 17

Cutting Plane Algorithm

To solve $\min f(x)$ with $f$ subdifferentiable:

Start with $x^1$. For $k = 1, 2, \dots$:

1. Compute $g^k \in \partial f(x^k)$.
2. Solve the master problem

$$x^{k+1} \in \arg\min_{y,\,z}\; z \quad \text{s.t. } z \ge f(x^i) + g^{i\prime}(y - x^i),\; i = 1, \dots, k$$

3. If $f(x^{k+1}) = f^k(x^{k+1})$ (the optimal $z$ of the master), then stop: $x^{k+1}$ is optimal.
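A minimal sketch of this loop on a 1-D piecewise-linear function, using scipy's linprog for the master problem; the toy function, the box bound on y, and the stopping tolerance are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

f = lambda x: abs(x - 1.0) + 0.5 * x     # convex piecewise linear, minimum at x = 1
g = lambda x: np.sign(x - 1.0) + 0.5     # a subgradient of f

lo, hi = -5.0, 5.0                        # bound y so the master has a solution
x, cuts = 4.0, []
for k in range(50):
    cuts.append((g(x), f(x) - g(x) * x))  # cut: z >= a*y + c
    # master over (y, z): min z s.t. a_i*y - z <= -c_i, lo <= y <= hi
    A_ub = [[a, -1.0] for a, c in cuts]
    b_ub = [-c for a, c in cuts]
    res = linprog(c=[0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(lo, hi), (None, None)])
    x, z_model = res.x                    # new iterate and the model value
    if f(x) - z_model <= 1e-9:            # f(x^{k+1}) equals the model: optimal
        break
print(x, f(x))
```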

SLIDE 18

Cutting Plane Method

• Converges for quite general cases.
• If $f$ is piecewise linear, requires a finite number of cuts.
• Easy to adapt to linearly constrained cases as well.
• Can converge slowly; the number of cuts is not bounded in general.

SLIDE 19

Dual Problem is Nonsmooth

Optimize a convex program:

$$\min f(x) \quad \text{s.t. } Ax = b,\; x \in X$$

Lagrangian dual function:

$$\theta(\lambda) = \min_{x \in X}\; f(x) + \lambda'(b - Ax)$$

Lagrangian dual problem:

$$\max_\lambda\; \theta(\lambda)$$

SLIDE 20

Dual function subgradient

A subgradient is found by solving

$$x^k \in \arg\min_{x \in X}\; f(x) + \lambda^{k\prime}(b - Ax)$$

then

$$g^k = b - Ax^k \in \partial\theta(\lambda^k)$$
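A minimal sketch for a case where the inner minimization is easy: f linear and X a box, so the minimizer is found componentwise; all problem data here are illustrative assumptions:

```python
import numpy as np

# min c'x s.t. Ax = b, x in X = {x : 0 <= x <= u}
c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([2.0])
u = 1.5 * np.ones(3)

def theta_and_subgradient(lam):
    # inner problem: min over the box of (c - A'lam)'x + lam'b, coordinatewise
    reduced = c - A.T @ lam
    x = np.where(reduced < 0, u, 0.0)   # pick u_j where the reduced cost is negative
    theta = reduced @ x + lam @ b        # dual function value theta(lam)
    g = b - A @ x                        # subgradient of theta at lam
    return theta, g, x

lam = np.zeros(1)
for k in range(100):
    theta, g, x = theta_and_subgradient(lam)
    lam = lam + 0.05 * g                 # subgradient *ascent*: the dual is maximized
theta, g, x = theta_and_subgradient(lam)
print(lam, theta, x)
```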

SLIDE 21

Cutting Plane Method for Dual Problem

Similar to the unconstrained case, except solve, for some large fixed $C$:

$$\lambda^{k+1} \in \arg\max_{y,\,z}\; z \quad \text{s.t. } z \le \theta(\lambda^i) + g^{i\prime}(y - \lambda^i),\; i = 1, \dots, k, \quad -C \le y \le C$$

The $C$ constraints ensure the master problem always has a solution.

SLIDE 22

Recover primal variables

At optimality we need to get back the primal solution $x^*$. Look at the KKT conditions of the master problem. One can show that, using the multipliers $u$ of the master,

$$x^* = \sum_{i=1}^{k} u_i\, x^i$$

SLIDE 23

Bundle Methods

A problem with cutting plane methods is that they may require too many cuts. Bundle methods get around this difficulty by using a regularized master problem:

$$\min_{y \in X,\, z}\; z + \frac{\rho}{2}\|y - w^k\|^2 \quad \text{s.t. } z \ge f(x^i) + g^{i\prime}(y - x^i),\; i \in J_k$$
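A minimal sketch of a single regularized master solve in CVXPY; the bundle of cuts, rho, and the center w^k are illustrative assumptions:

```python
import cvxpy as cp
import numpy as np

# bundle collected so far: (x^i, f(x^i), g^i) triples for a 2-D problem
xs = [np.array([2.0, 0.0]), np.array([0.0, 1.0])]
fs = [3.0, 1.5]
gs = [np.array([1.0, -1.0]), np.array([-0.5, 0.5])]

rho = 1.0
w_center = np.array([1.0, 0.5])   # current center w^k

y = cp.Variable(2)
z = cp.Variable()
cuts = [z >= fi + gi @ (y - xi) for xi, fi, gi in zip(xs, fs, gs)]
objective = cp.Minimize(z + (rho / 2) * cp.sum_squares(y - w_center))
cp.Problem(objective, cuts).solve()
print(y.value, z.value)
```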

SLIDE 24

Bundle Methods

$w^k$ is called the center. You don't want to change the center unless you have added enough constraints to get a good decrease. You can drop some or all of the constraints that have zero Lagrangian multipliers in the regularized master problem.

SLIDE 25

Bundle algorithm

0. Set $k = 1$, $J = \{\}$, and $v^1 = -\infty$.
1. Calculate $f(x^k)$ and $g^k$; if $f(x^k) < v^k$, add cut $k$ to the constraints in $J$.
2. If $k = 1$ or $f(x^k) \le (1 - a)\, f(w^{k-1}) + a\, f^{k-1}(x^k)$, then $w^k = x^k$; else $w^k = w^{k-1}$.
3. Solve the restricted master for $(x^{k+1}, v^{k+1})$.
4. If $f^k(x^{k+1}) = f(w^k)$, then stop: $x^{k+1}$ is optimal.
5. Update $J$ by removing cuts with zero multipliers from the restricted master solve.

SLIDE 26

Bundle Methods for Nonsmooth Optimization

• No step size needed.
• Nice check for optimality: if the function achieves its lower bound, it is optimal.
• Reduces to a series of nice convex quadratic subproblems.
• Can remove constraints while still adding new ones.
• Finite convergence for piecewise-linear convex functions with polyhedral constraints.
• Can be extended to nonconvex nonsmooth optimization, but things get a bit more tricky.
• Still only uses first-order information, so it can be slow.