(Sub)Gradients and Convexity


SLIDE 1

(Sub)Gradients and Convexity (contd)

A subdifferential is the closed convex set of all subgradients of the convex function f:

∂f(x) = {h ∈ ℜⁿ : h is a subgradient of f at x}

Note that this set is guaranteed to be nonempty unless f is not convex. Often an indicator function, I_C : ℜⁿ → ℜ, is employed to remove the constraints of an optimization problem (note that the convex set C ⊆ ℜⁿ):

min_{x∈C} f(x)  ⇐⇒  min_x f(x) + I_C(x),   where I_C(x) = I{x ∈ C} = 0 if x ∈ C, ∞ if x ∉ C

The subdifferential of the indicator function at x is the normal cone N_C(x), for all x ∈ C.
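To make the definition concrete, here is a minimal numerical sketch (not from the slides; f(x) = |x| is an assumed example) that checks the subgradient inequality f(y) ≥ f(x) + h(y − x) at x = 0, where ∂f(0) = [−1, 1]:

```python
import numpy as np

def is_subgradient(h, x, f, ys, tol=1e-12):
    """Check the subgradient inequality f(y) >= f(x) + h*(y - x) on the sample points ys."""
    return all(f(y) >= f(x) + h * (y - x) - tol for y in ys)

f = abs                                  # f(x) = |x|: convex, not differentiable at 0
ys = np.linspace(-2.0, 2.0, 401)         # probe points y

# Every h in [-1, 1] is a subgradient of |.| at x = 0; anything outside is not.
print(is_subgradient(0.5, 0.0, f, ys))   # True  (0.5 lies in the subdifferential [-1, 1])
print(is_subgradient(1.0, 0.0, f, ys))   # True  (boundary of the subdifferential)
print(is_subgradient(1.5, 0.0, f, ys))   # False (the inequality fails for y > 0)
```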

SLIDE 2

(Sub)Gradients and Convexity (contd)

A subdifferential is the closed convex set of all subgradients of the convex function f:

∂f(x) = {h ∈ ℜⁿ : h is a subgradient of f at x}

Note that this set is guaranteed to be nonempty unless f is not convex. Often an indicator function, I_C : ℜⁿ → ℜ, is employed to remove the constraints of an optimization problem (note that the convex set C ⊆ ℜⁿ):

min_{x∈C} f(x)  ⇐⇒  min_x f(x) + I_C(x),   where I_C(x) = I{x ∈ C} = 0 if x ∈ C, ∞ if x ∉ C

The subdifferential of the indicator function at x is known as the normal cone, N_C(x), of C:

N_C(x) = ∂I_C(x) = {h ∈ ℜⁿ : hᵀx ≥ hᵀy for any y ∈ C}
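As a quick sanity check of the normal cone formula (a sketch, not part of the slides; the box C = [0, 1]² and the sampled grid are assumptions for illustration), a candidate h is tested for membership in N_C(x) by checking hᵀx ≥ hᵀy over samples of C:

```python
import numpy as np

def in_normal_cone(h, x, C_samples, tol=1e-9):
    """h is in N_C(x) iff h^T x >= h^T y for every y in C (checked over samples of C)."""
    return all(h @ x >= h @ y - tol for y in C_samples)

# Sample the box C = [0, 1]^2 on a grid.
grid = np.linspace(0.0, 1.0, 21)
C_samples = [np.array([a, b]) for a in grid for b in grid]

corner = np.array([1.0, 1.0])            # a corner of C
interior = np.array([0.5, 0.5])          # an interior point of C

print(in_normal_cone(np.array([1.0, 2.0]), corner, C_samples))    # True : any nonnegative h is normal at the corner
print(in_normal_cone(np.array([1.0, -1.0]), corner, C_samples))   # False: this h points out of the cone
print(in_normal_cone(np.array([0.1, 0.0]), interior, C_samples))  # False: N_C(x) = {0} at interior points
```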

SLIDE 3

Normal Cones (Tangent Cone and Polar) for some Convex Sets

If C is a convex set and if...

  • x ∈ int(C), then N_C(x) = {0}. In general, if x ∈ int(domain(f)) then ∂f(x) is nonempty and bounded.
  • x ∈ C, then N_C(x) is a closed convex cone. In general, ∂f(x) is a (possibly empty) closed convex set, since it is the intersection of half spaces.

There is a relation between the intuitive tangent cone and normal cone at a point x ∈ ∂C... This relation is the polar relation. Let us construct the normal cone, N_C(x), for some points in a convex set C:

[Figure: tangent cone and normal cone at points of a convex set C]
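Below is a small sketch of the polar relation just mentioned (illustrative only; the set C = [0, 1]² and the hard-coded cones at the corner x = (1, 1) are assumptions): every h in the normal cone has a non-positive inner product with every direction in the tangent cone.

```python
import numpy as np

rng = np.random.default_rng(0)

# For C = [0, 1]^2 at the corner x = (1, 1):
#   tangent cone  T_C(x) = {d : d_1 <= 0, d_2 <= 0}   (directions that keep x + t d inside C for small t)
#   normal cone   N_C(x) = {h : h_1 >= 0, h_2 >= 0}
tangent_dirs = -rng.random((1000, 2))    # samples from the nonpositive orthant
normal_dirs = rng.random((1000, 2))      # samples from the nonnegative orthant

# Polar relation: h^T d <= 0 for every h in N_C(x) and every d in T_C(x).
print(np.all(normal_dirs @ tangent_dirs.T <= 1e-12))   # True
```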

SLIDE 4

Differentiable convex function has unique subgradient: Proof

Stated intuitively earlier. Now formally: Let f : ℜⁿ → ℜ be a convex function. If f is differentiable at x ∈ ℜⁿ then ∂f(x) = {∇f(x)}.

We know from (9) that for a differentiable f : D → ℜ and open convex set D, f is convex iff, for any x, y ∈ D,

f(y) ≥ f(x) + ∇ᵀf(x)(y − x)    (convexity in terms of the first order approximation)
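A minimal numerical sketch of this first-order characterization (assuming f(x) = eˣ as a stand-in smooth convex function):

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.exp                               # a smooth convex function on the reals
df = np.exp                              # its derivative

xs = rng.uniform(-3.0, 3.0, 10_000)
ys = rng.uniform(-3.0, 3.0, 10_000)

# First-order condition for convexity: f(y) >= f(x) + f'(x) (y - x) for all x, y.
print(np.all(f(ys) >= f(xs) + df(xs) * (ys - xs) - 1e-12))   # True
```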

SLIDE 5

Differentiable convex function has unique subgradient: Proof

Stated intuitively earlier. Now formally: Let f : ℜⁿ → ℜ be a convex function. If f is differentiable at x ∈ ℜⁿ then ∂f(x) = {∇f(x)}.

We know from (9) that for a differentiable f : D → ℜ and open convex set D, f is convex iff, for any x, y ∈ D,

f(y) ≥ f(x) + ∇ᵀf(x)(y − x)

Thus, ∇f(x) ∈ ∂f(x). Let h ∈ ∂f(x); then hᵀ(y − x) ≤ f(y) − f(x). Since f is differentiable at x, the directional derivative exists at x along any direction (including along y − x).

SLIDE 6

Differentiable convex function has unique subgradient: Proof

Stated intuitively earlier. Now formally: Let f : ℜⁿ → ℜ be a convex function. If f is differentiable at x ∈ ℜⁿ then ∂f(x) = {∇f(x)}.

We know from (9) that for a differentiable f : D → ℜ and open convex set D, f is convex iff, for any x, y ∈ D,

f(y) ≥ f(x) + ∇ᵀf(x)(y − x)

Thus, ∇f(x) ∈ ∂f(x). Let h ∈ ∂f(x); then hᵀ(y − x) ≤ f(y) − f(x). Since f is differentiable at x, we have that

lim_{y→x} [ f(y) − f(x) − ∇ᵀf(x)(y − x) ] / ∥y − x∥ = 0

Thus for any ϵ > 0 there exists a δ > 0 such that

[ f(y) − f(x) − ∇ᵀf(x)(y − x) ] / ∥y − x∥ < ϵ    whenever ∥y − x∥ < δ.

Multiplying both sides by ∥y − x∥ and adding ∇ᵀf(x)(y − x) to both sides, we get

f(y) − f(x) < ∇ᵀf(x)(y − x) + ϵ∥y − x∥    whenever ∥y − x∥ < δ
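The limit above can also be observed numerically; the sketch below (assuming f(x) = x_1² + 3x_2² as a smooth example, not from the slides) shows the first-order remainder divided by ∥y − x∥ shrinking as y → x:

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[1]**2       # a smooth (convex) example

def grad_f(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

x = np.array([1.0, -0.5])
d = np.array([0.6, 0.8])                 # an arbitrary unit direction

# The ratio (f(y) - f(x) - grad_f(x)^T (y - x)) / ||y - x|| tends to 0 as y -> x.
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    y = x + t * d
    ratio = (f(y) - f(x) - grad_f(x) @ (y - x)) / np.linalg.norm(y - x)
    print(t, ratio)                      # the ratio shrinks proportionally to t
```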

SLIDE 7

Differentiable convex function has unique subgradient: Proof

But then, given that h ∈ ∂f(x), we obtain

hᵀ(y − x) ≤ f(y) − f(x) < ∇ᵀf(x)(y − x) + ϵ∥y − x∥    whenever ∥y − x∥ < δ

Rearranging, we get (h − ∇f(x))ᵀ(y − x) < ϵ∥y − x∥ whenever ∥y − x∥ < δ. At this point, we can choose any ϵ and any y − x whose norm is less than δ.

SLIDE 8

Differentiable convex function has unique subgradient: Proof

But then, given that h ∈ ∂f(x), we obtain

hᵀ(y − x) ≤ f(y) − f(x) < ∇ᵀf(x)(y − x) + ϵ∥y − x∥    whenever ∥y − x∥ < δ

Rearranging, we get (h − ∇f(x))ᵀ(y − x) < ϵ∥y − x∥ whenever ∥y − x∥ < δ.

Consider y − x = (δ/2) · (h − ∇f(x)) / ∥h − ∇f(x)∥, i.e., the unit vector in the direction h − ∇f(x) scaled by δ/2, which has norm δ/2 < δ. Then, substituting in the previous step:

(h − ∇f(x))ᵀ ( δ (h − ∇f(x)) / (2∥h − ∇f(x)∥) ) < ϵ δ/2

Canceling out common terms and evaluating the dot product as the Euclidean norm, we get ∥h − ∇f(x)∥ < ϵ. Since this should be true for any ϵ > 0, it must be that ∥h − ∇f(x)∥ = 0. Thus, it must be that h = ∇f(x).
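The conclusion can also be seen numerically. In the sketch below (assuming f(x) = x², so ∇f(1) = 2; this example is not from the slides), a candidate h that differs from the gradient satisfies the subgradient inequality for far-away y but violates it once y is close enough to x, so it cannot be a subgradient:

```python
f = lambda x: x**2                       # a differentiable convex function
df = lambda x: 2.0 * x                   # its gradient

x = 1.0
h = df(x) + 0.3                          # a candidate subgradient that differs from the gradient

# The subgradient inequality f(y) >= f(x) + h (y - x) holds for distant y
# but fails once y is within 0.3 of x, so h cannot be a subgradient at x.
for t in [1.0, 0.1, 0.01]:
    y = x + t
    print(t, f(y) >= f(x) + h * (y - x))   # True, False, False
```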

SLIDE 9

The Why of (Sub)Gradient

SLIDE 10

Local and Global Minima, Gradients and Convexity

Recall that for functions of a single variable, at local extreme points the tangent to the curve is a line along which the function value is constant, and it is therefore parallel to the x-axis.

If the function is differentiable at the extreme point, then the derivative must vanish.

This idea can be extended to functions of multiple variables. The requirement in this case turns out to be that the tangent plane to the graph of the function at any extreme point must be parallel to the plane z = 0.

This can happen if and only if the gradient ∇F is parallel to the z-axis at the extreme point, or equivalently, the gradient of the function f must be the zero vector at every extreme point. Here F(x, z) = f(x) − z, whose zero level set is the graph of f.
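A tiny sketch of this observation (assuming f(x) = x_1² + x_2²; the example is not from the slides): the gradient of F(x, z) = f(x) − z is parallel to the z-axis exactly at the minimizer of f.

```python
import numpy as np

def grad_F(x1, x2, z):
    # F(x, z) = f(x) - z with f(x) = x1^2 + x2^2, so grad F = (2 x1, 2 x2, -1).
    return np.array([2.0 * x1, 2.0 * x2, -1.0])

print(grad_F(0.0, 0.0, 0.0))    # [ 0.  0. -1.] : parallel to the z-axis at the minimizer (0, 0) of f
print(grad_F(1.0, -2.0, 5.0))   # [ 2. -4. -1.] : not parallel to the z-axis away from the minimizer
```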

SLIDE 11

(Sub)Gradients and Optimality: Sufficient Condition

For a convex f:

  • hᵀ(y − x) ≥ 0 for all y ...... sufficient condition 1
  • 0 is a subgradient of f at x .. sufficient condition 2

SLIDE 12

(Sub)Gradients and Optimality: Sufficient Condition

For a convex f,

f(x*) = min_{x∈ℜⁿ} f(x)  ⇐  0 ∈ ∂f(x*)

The reason: h = 0 being a subgradient means that, for all y, f(y) ≥ f(x*).

SLIDE 13

(Sub)Gradients and Optimality: Sufficient Condition

For a convex f,

f(x*) = min_{x∈ℜⁿ} f(x)  ⇐  0 ∈ ∂f(x*)

The reason: h = 0 being a subgradient means that for all y,

f(y) ≥ f(x*) + 0ᵀ(y − x*) = f(x*)

The analogy to the differentiable case is ∂f(x) = {∇f(x)}. Thus, for a convex function f(x), if ∇f(x) = 0, then x must be a point of global minimum. Is there a necessary condition for a differentiable (possibly non-convex) function having a (local or global) minimum at x? (A little later.)
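A minimal sketch of the sufficient condition (assuming f(x) = |x|, for which ∂f(0) = [−1, 1] contains 0; this example is not from the slides): the subgradient inequality with h = 0 certifies x* = 0 as a global minimizer.

```python
import numpy as np

f = abs                                   # f(x) = |x|, with 0 in the subdifferential at x* = 0
x_star = 0.0
ys = np.linspace(-5.0, 5.0, 1001)

# h = 0 being a subgradient at x* gives f(y) >= f(x*) + 0 * (y - x*) = f(x*) for all y.
print(all(f(y) >= f(x_star) for y in ys))   # True: x* = 0 is a global minimum
```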

SLIDE 14

Local Extrema: Necessary Condition through Fermat’s Theorem

A theorem fundamental to determining the local extreme values of functions of multiple variables.

Claim

If f(x), defined on a domain D ⊆ ℜⁿ, has a local maximum or minimum at x* and the first-order partial derivatives exist at x*, then f_{x_i}(x*) = 0 for all 1 ≤ i ≤ n.

Proof:

SLIDE 15

Local Extrema: Fermat’s Theorem

1. To formally prove this result, consider the function g_i(x_i) = f(x*_1, x*_2, . . . , x*_{i−1}, x_i, x*_{i+1}, . . . , x*_n).

2. If f has a local minimum (maximum) at x*, then g_i also has a local minimum (maximum) at x_i = x*_i.

SLIDE 16

Local Extrema: Fermat’s Theorem

1. To formally prove this result, consider the function g_i(x_i) = f(x*_1, x*_2, . . . , x*_{i−1}, x_i, x*_{i+1}, . . . , x*_n).

2. If f has a local minimum (maximum) at x*, then there exists an open ball B_ϵ = {x | ∥x − x*∥ < ϵ} around x* such that for all x ∈ B_ϵ, f(x*) ≤ f(x) (respectively f(x*) ≥ f(x)).

3. Consider the norm to be the Euclidean norm ∥.∥_2. By the Cauchy-Schwarz inequality, for a unit norm vector e_i = [0 .. 1 .. 0] with a 1 only in the i-th index of the vector, |e_iᵀ(x − x*)| = |x_i − x*_i| ≤ ∥x − x*∥ ∥e_i∥ = ∥x − x*∥.

SLIDE 17

Local Extrema: Fermat’s Theorem

1. To formally prove this result, consider the function g_i(x_i) = f(x*_1, x*_2, . . . , x*_{i−1}, x_i, x*_{i+1}, . . . , x*_n).

2. If f has a local minimum (maximum) at x*, then there exists an open ball B_ϵ = {x | ∥x − x*∥ < ϵ} around x* such that for all x ∈ B_ϵ, f(x*) ≤ f(x) (respectively f(x*) ≥ f(x)).

3. Consider the norm to be the Euclidean norm ∥.∥_2. By the Cauchy-Schwarz inequality, for a unit norm vector e_i = [0 .. 1 .. 0] with a 1 only in the i-th index of the vector, |e_iᵀ(x − x*)| = |x_i − x*_i| ≤ ∥x − x*∥ ∥e_i∥ = ∥x − x*∥.

4. Thus, the existence of an open ball {x | ∥x − x*∥ < ϵ} around x* characterizing the minimum in ℜⁿ also guarantees the existence of an open ball around x*_i characterizing the minimum of g_i(.) in ℜ.

SLIDE 18

Local Extrema: Fermat’s Theorem

1. To formally prove this result, consider the function g_i(x_i) = f(x*_1, x*_2, . . . , x*_{i−1}, x_i, x*_{i+1}, . . . , x*_n).

2. If f has a local minimum (maximum) at x*, then there exists an open ball B_ϵ = {x | ∥x − x*∥ < ϵ} around x* such that for all x ∈ B_ϵ, f(x*) ≤ f(x) (respectively f(x*) ≥ f(x)).

3. Consider the norm to be the Euclidean norm ∥.∥_2. By the Cauchy-Schwarz inequality, for a unit norm vector e_i = [0 .. 1 .. 0] with a 1 only in the i-th index of the vector, |e_iᵀ(x − x*)| = |x_i − x*_i| ≤ ∥x − x*∥ ∥e_i∥ = ∥x − x*∥.

4. Thus, the existence of an open ball {x | ∥x − x*∥ < ϵ} around x* characterizing the minimum in ℜⁿ also guarantees the existence of an open ball (a projected ball corresponding to a projected norm) {x_i | ∥x_i − x*_i∥ < ϵ} around x*_i in ℜ.

5. Therefore each function g_i(x_i) must have a local extremum at x*_i, which, by an earlier result (derived for differentiable functions of a single argument), implies that each g_i'(x*_i) = 0. That is, the gradient of f must vanish at x*.
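As a numerical illustration of the claim (a sketch; the quadratic f and the plain gradient descent loop are assumptions, not part of the slides), the partial derivatives indeed vanish at the minimizer found:

```python
import numpy as np

def f(x):
    return (x[0] - 1.0)**2 + 2.0 * (x[1] + 3.0)**2

def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)])

# Plain gradient descent to locate the (here unique) local minimum.
x = np.array([5.0, 5.0])
for _ in range(200):
    x = x - 0.1 * grad_f(x)

print(x)          # approximately [ 1. -3.]
print(grad_f(x))  # approximately [ 0.  0.] : f_{x_i}(x*) = 0 for every i
```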

SLIDE 19

Saddle Point

Is the converse of the foregoing result true? That is, if you find an x* that satisfies f_{x_i}(x*) = 0 for all 1 ≤ i ≤ n, is it necessary that x* is an extreme point? The answer is no. In fact, points that violate the converse of this result are called saddle points.

Definition

[Saddle point]: A point x* is called a saddle point of a function f(x) defined on D ⊆ ℜⁿ if x* is a critical point of f but x* does not correspond to a local maximum or minimum of the function. The inflection point for a function of a single variable, discussed earlier, is the analogue of the saddle point for a function of multiple variables. Can you construct a saddle point of a function f : X × Y → ℜ ∪ {±∞} as a pair (x̂, ŷ) ∈ X × Y satisfying the following?

max_y f(x̂, y) ≤ f(x̂, ŷ) ≤ min_x f(x, ŷ)

SLIDE 20

Saddle Point

An example for n = 2 is the hyperbolic paraboloid f(x_1, x_2) = x_1² − x_2², the graph of which is shown in Figure 4. The hyperbolic paraboloid has a saddle point at (0, 0).

Figure 4: The hyperbolic paraboloid is shaped like a saddle and can have a critical point called the saddle point.
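A quick sketch (assumed code, not from the slides) checking that (0, 0) is a critical point of f(x_1, x_2) = x_1² − x_2² but neither a local minimum nor a local maximum:

```python
import numpy as np

def f(x1, x2):
    return x1**2 - x2**2                 # the hyperbolic paraboloid

def grad_f(x1, x2):
    return np.array([2.0 * x1, -2.0 * x2])

print(grad_f(0.0, 0.0))          # [0. 0.] : (0, 0) is a critical point

t = 1e-3
print(f(t, 0.0) > f(0.0, 0.0))   # True : f increases along the x_1-axis
print(f(0.0, t) < f(0.0, 0.0))   # True : f decreases along the x_2-axis, so (0, 0) is a saddle point
```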

SLIDE 21

Saddle Point

The hyperbolic paraboloid opens upward along the x_1-axis (Figure 5):

Figure 5:
