Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey (PowerPoint presentation)


SLIDE 1

Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey

Chapter 4 : Optimization for Machine Learning

SLIDE 2

Summary of Chapter 2

  • Chapter 2: Convex Optimization with Sparsity-Inducing Norms
  • That chapter covers convex optimization problems of the form

    min_x f(x) + λ Ω(x)

  • where f is a convex differentiable function and Ω is a sparsity-inducing non-smooth norm
  • Examples of Ω: the l1 norm, l1 + l1/lq, and hierarchical l1/lq norms
  • Algorithms covered: subgradient, block coordinate descent, reweighted-l2, and others
SLIDE 3

Summary of Chapter 3

  • This chapter is on cone linear and quadratic programming of the form

    min_x c'x  subject to  Ax ⪯_C b

  • where ⪯_C is the generalized inequality induced by a closed pointed convex cone C (u ⪯_C v iff v - u ∈ C)
  • Examples of cones: 1) the non-negative orthant 2) the second-order cone
  • The Python package CVXOPT can solve conic problems

SLIDE 4

Introduction

  • This chapter considers optimization problems with additive cost functions of the form

    min_x F(x) = f_1(x) + f_2(x) + … + f_m(x)

  • where m is very large
  • It is therefore attractive to use incremental methods that operate on a single component f_i at each iteration, rather than on the entire cost function
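As a minimal pure-Python sketch (with hypothetical data), an incremental method repeatedly cycles through the components of F(x) = (1/2) Σ_i (x - a_i)², updating x using the gradient of a single component at a time:

```python
# Hypothetical example: m = 4 quadratic components, minimizer = mean(a) = 2.5.
a = [1.0, 2.0, 3.0, 4.0]   # data defining the components f_i(x) = (1/2)(x - a_i)^2
x = 0.0                    # initial iterate
alpha = 0.1                # constant stepsize

for epoch in range(200):   # repeated cycles through the m components
    for a_i in a:          # each update touches only a single component
        x -= alpha * (x - a_i)   # gradient step on f_i alone
```

With a constant stepsize the iterates settle within O(alpha) of the minimizer 2.5 rather than converging to it exactly, matching the behavior described on the later slides.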
SLIDE 5

Least Squares and Related Inference Problems

  • Classical regression: min_x Σ_{i=1}^m (c_i'x - d_i)²
  • l1-regularized problem: min_x Σ_{i=1}^m (c_i'x - d_i)² + λ ||x||_1

Other possibilities include using non-quadratic convex loss functions

SLIDE 6

Dual Optimization in Separable Problems

  • Problems of separable form, min Σ_i f_i(y_i) subject to coupling constraints,
  • even over a non-convex set Y, have a dual function that is itself a sum of components, so incremental methods apply to the dual
SLIDE 7

Weber Problem in Location Theory

  • Find a point x whose weighted sum of distances from a given set of points y_1, y_2, …, y_m is minimized:

    min_x Σ_{i=1}^m w_i ||x - y_i||
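A classical method for the Weber problem (not shown on this slide) is the Weiszfeld iteration; a pure-Python sketch with hypothetical points and unit weights:

```python
import math

# Hypothetical data: four corner points with equal weights; by symmetry the
# weighted geometric median is the centroid (2, 2).
ys = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
ws = [1.0, 1.0, 1.0, 1.0]
x = (1.0, 1.0)   # initial guess; must not coincide with any data point

for _ in range(100):
    num, den = [0.0, 0.0], 0.0
    for (px, py), w in zip(ys, ws):
        d = math.hypot(x[0] - px, x[1] - py)   # distance ||x - y_i||
        num[0] += w * px / d
        num[1] += w * py / d
        den += w / d
    x = (num[0] / den, num[1] / den)           # Weiszfeld update
```

Each update is a weighted average of the points with weights w_i / ||x - y_i||, which is exactly the fixed-point condition for minimizing the weighted sum of distances.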

SLIDE 8

Incremental Gradient Methods

  • Differentiable problems
  • When the component functions f_i are differentiable, we may use incremental gradient methods of the form

    x_{k+1} = x_k - α_k ∇f_{i_k}(x_k)

  • where i_k is the index of the cost component iterated on at step k

Such methods make fast progress when far from the solution but progress slowly (or oscillate) close to it. Fixes: use a diminishing stepsize α_k → 0, or a constant stepsize reduced to a small positive value.
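A pure-Python sketch of the diminishing-stepsize fix on hypothetical quadratic components f_i(x) = (1/2)(x - a_i)²:

```python
# Hypothetical data: f_i(x) = (1/2)(x - a_i)^2, true minimizer mean(a) = 2.5.
a = [1.0, 2.0, 3.0, 4.0]
x = 0.0
k = 0
for epoch in range(2000):
    for a_i in a:
        k += 1
        x -= (1.0 / k) * (x - a_i)   # incremental step, diminishing stepsize 1/k
```

With α_k = 1/k this recursion is exactly a running average of the visited a_i, so x converges to the minimizer 2.5; with a constant stepsize it would only settle within a band around it.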

SLIDE 9

Variants of the incremental gradient method

  • Gradient method with momentum
  • Aggregate component gradient
  • Incremental gradient methods are also closely related to the stochastic gradient method

SLIDE 10

Incremental Sub-gradient Methods

  • For cases where the component functions are convex but non-differentiable
  • In place of the gradient, an arbitrary subgradient is used:

    x_{k+1} = P_X( x_k - α_k g_k ),  g_k ∈ ∂f_{i_k}(x_k)

  • Convexity of the f_i is essential
  • Even non-incremental subgradient methods converge at a sublinear rate, hence incremental methods are favored
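A pure-Python sketch on hypothetical data: f(x) = Σ_i |x - a_i| is convex but nondifferentiable at each a_i, and its minimizer is a median of the data. Each update uses the subgradient sign(x - a_i) of a single component; the initial stepsize 5.0 is hand-tuned for this data:

```python
# Hypothetical data: minimize f(x) = sum_i |x - a_i|; the median 5.0 is optimal.
a = [1.0, 2.0, 5.0, 9.0, 10.0]
x = 0.0
k = 0
for epoch in range(2000):
    for a_i in a:
        k += 1
        g = (x > a_i) - (x < a_i)    # a subgradient of |x - a_i| (0 at the kink)
        x -= (5.0 / k) * g           # diminishing stepsize alpha_k = 5/k
```

Any value in [-1, 1] is a valid subgradient at the kink x = a_i; choosing 0 there is one convenient option.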

SLIDE 11

Incremental Proximal Methods

  • These methods apply to problems of the same additive form, using the proximal iteration

    x_{k+1} = argmin_{x∈X} { f_{i_k}(x) + (1/(2α_k)) ||x - x_k||² }

This form is desirable because, for some components, the proximal iteration can be obtained in closed form. Proximal iterations are considered more stable than gradient or subgradient iterations.
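For a quadratic component the proximal iteration has a simple closed form. A pure-Python sketch on hypothetical data: for f_i(x) = (1/2)(x - a_i)², solving argmin_x { f_i(x) + (1/(2α))(x - x_k)² } gives x_{k+1} = (x_k + α a_i)/(1 + α):

```python
# Hypothetical data: components f_i(x) = (1/2)(x - a_i)^2, minimizer mean(a) = 2.5.
a = [1.0, 2.0, 3.0, 4.0]
x = 0.0
k = 0
for epoch in range(2000):
    for a_i in a:
        k += 1
        alpha = 1.0 / k
        x = (x + alpha * a_i) / (1.0 + alpha)   # closed-form proximal step
```

Unlike a gradient step, this proximal step is a weighted average of x and a_i, hence stable for every α > 0, illustrating why proximal iterations are considered more robust.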

SLIDE 12

Incremental Subgradient-Proximal methods

  • These methods are incremental algorithms that combine a proximal iteration with a subgradient iteration, for costs of the form min_{x∈X} Σ_i ( f_i(x) + h_i(x) )

SLIDE 13
  • Both z_k and x_k are constrained to X; the constraint can be relaxed for either the proximal or the subgradient iteration, which leads to easier computation
  • So the iterations on the previous slide can be rewritten in the two-step form

    z_k = argmin_{x∈X} { h_{i_k}(x) + (1/(2α_k)) ||x - x_k||² }
    x_{k+1} = P_X( z_k - α_k g_k ),  g_k ∈ ∂f_{i_k}(z_k)

  • or with the roles of the proximal and subgradient steps interchanged

Incremental proximal iterations are closely related to subgradient iterations, so the two steps above can also be combined into a single step.
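A pure-Python sketch of the two-step iteration on a hypothetical scalar problem: minimize Σ_i [ (1/2)(x - a_i)² + (λ/m)|x| ], handling each quadratic h_i via its closed-form proximal step and each |x| term f_i via a subgradient step. For this data the optimum is mean(a) - λ/m = 2.0:

```python
# Hypothetical data: h_i(x) = (1/2)(x - a_i)^2 (proximal step, closed form),
# f_i(x) = (lam/m)*|x| (subgradient step); optimum = mean(a) - lam/m = 2.0.
a = [1.0, 2.0, 3.0, 4.0]
lam = 2.0
m = len(a)
x = 0.0
k = 0
for epoch in range(2000):
    for a_i in a:
        k += 1
        alpha = 1.0 / k
        z = (x + alpha * a_i) / (1.0 + alpha)      # proximal step on h_i
        g = (lam / m) * ((z > 0) - (z < 0))        # a subgradient of f_i at z
        x = z - alpha * g                          # subgradient step on f_i
```

Here the constraint set X is all of R, so the projection P_X is the identity.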

SLIDE 14

Order of components

  • The incremental subgradient-proximal method’s effectiveness depends
  • on the order in which the pairs {f_i, h_i} are chosen:
  • 1) Cyclic: the pairs {f_i, h_i} are taken in a fixed deterministic order
  • 2) Randomized (uniform sampling): at each iteration a pair {f_i, h_i} is chosen uniformly at random
  • Both orders converge; however, the randomized order has superior convergence guarantees to the cyclic order
SLIDE 15

Applications: Regularized least squares

  • Let’s consider a problem of the form

    min_x (1/2) Σ_{i=1}^m (c_i'x - d_i)² + R(x)

  • where R(x) is an l1 norm, R(x) = λ ||x||_1
  • Then the proximal iteration with respect to R reduces to the componentwise soft-thresholding (shrinkage) operation
SLIDE 16

Applications: Regularized least squares

  • The iteration decomposes into a proximal (soft-thresholding) step on the l1 term, which can be done in closed form,
  • followed by a gradient iteration on the least-squares term

Incremental algorithms are well suited to such problems because the proximal updates are available in closed form.
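A pure-Python scalar sketch of this decomposition on hypothetical data: an incremental gradient step on each least-squares term followed by the closed-form soft-thresholding (proximal) step on the l1 term. For this data the optimum is mean(b) - λ/m = 4.0:

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.|: argmin_x { t*|x| + (1/2)*(x - v)**2 }."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

# Hypothetical data: minimize (1/2)*sum_i (x - b_i)^2 + lam*|x|; optimum 4.0.
b = [2.0, 4.0, 6.0, 8.0]
lam = 4.0
m = len(b)
x = 0.0
k = 0
for epoch in range(2000):
    for b_i in b:
        k += 1
        alpha = 1.0 / k
        z = x - alpha * (x - b_i)                 # gradient step on (1/2)(x - b_i)^2
        x = soft_threshold(z, alpha * lam / m)    # proximal step on (lam/m)*|x|
```

In the multidimensional case the same soft-thresholding is applied to each coordinate independently, which is what makes the l1 proximal step so cheap.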

SLIDE 17

Iterated Projection Algorithm for the Feasibility Problem

  • The feasibility problem has the form: find x ∈ X_1 ∩ X_2 ∩ … ∩ X_m, possibly while minimizing a cost f(x)
  • This can be rewritten, for Lipschitz continuous f and sufficiently large penalty γ, as

    min_x f(x) + γ Σ_{i=1}^m dist(x, X_i)

to which incremental algorithms apply, each iteration operating on the distance to a single set X_i.
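A pure-Python sketch of iterated (alternating) projections on two hypothetical sets, the disk ||x|| ≤ 2 and the halfspace x_1 ≥ 1: each iteration projects onto a single set, mirroring the incremental structure in which each dist(·, X_i) term is handled by one projection:

```python
import math

def project_disk(p, r=2.0):
    """Euclidean projection onto the disk of radius r centered at the origin."""
    n = math.hypot(p[0], p[1])
    return p if n <= r else (p[0] * r / n, p[1] * r / n)

def project_halfspace(p):
    """Euclidean projection onto the halfspace x1 >= 1."""
    return (max(p[0], 1.0), p[1])

x = (-3.0, 2.0)                              # infeasible starting point
for _ in range(100):
    x = project_halfspace(project_disk(x))   # one set per (incremental) step
```

After the loop x lies (to numerical tolerance) in the intersection of the two sets, i.e. it solves this small feasibility problem.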