

SLIDE 1

Gradient and Epigraph (contd)

As an example, consider the paraboloid f(x₁, x₂) = x₁² + x₂² − 9, which attains its minimum at (0, 0). We see below its epigraph.

[Figure: epigraph of f with a supporting hyperplane (or lower bound) at (0, 0)]


SLIDE 2

Illustrations to understand Gradient

For the paraboloid f(x₁, x₂) = x₁² + x₂² − 9, consider the corresponding F(x₁, x₂, z) = x₁² + x₂² − 9 − z and the point x₀ = (1, 1, −7), which lies on the 0-level surface of F. The gradient ∇F(x₁, x₂, z) is [2x₁, 2x₂, −1], which when evaluated at x₀ = (1, 1, −7) is [2, 2, −1]. The equation of the tangent plane to f at x₀ is therefore given by 2(x₁ − 1) + 2(x₂ − 1) − 7 = z.

The paraboloid attains its minimum at (0, 0). Plot the tangent plane to the surface at (0, 0, f(0, 0)) as well as the gradient vector ∇F at (0, 0, f(0, 0)). What do you expect?


SLIDE 3

Illustrations to understand Gradient

For the paraboloid f(x₁, x₂) = x₁² + x₂² − 9, consider the corresponding F(x₁, x₂, z) = x₁² + x₂² − 9 − z and the point x₀ = (1, 1, −7), which lies on the 0-level surface of F. The gradient ∇F(x₁, x₂, z) is [2x₁, 2x₂, −1], which when evaluated at x₀ = (1, 1, −7) is [2, 2, −1]. The equation of the tangent plane to f at x₀ is therefore given by 2(x₁ − 1) + 2(x₂ − 1) − 7 = z.

The paraboloid attains its minimum at (0, 0). Plot the tangent plane to the surface at (0, 0, f(0, 0)) as well as the gradient vector ∇F at (0, 0, f(0, 0)). What do you expect? Ans: A horizontal tangent plane and a vertical gradient!
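A quick numerical sanity check of these two slides (a minimal NumPy sketch; the helper names are ours, not the course's):

```python
import numpy as np

def f(x1, x2):
    # The paraboloid from the slide: f(x1, x2) = x1^2 + x2^2 - 9
    return x1**2 + x2**2 - 9

def grad_F(x1, x2):
    # Gradient of F(x1, x2, z) = x1^2 + x2^2 - 9 - z, i.e. [2*x1, 2*x2, -1]
    return np.array([2.0 * x1, 2.0 * x2, -1.0])

print(grad_F(1, 1))                          # [ 2.  2. -1.] at x0 = (1, 1, -7)

# Tangent plane at (1, 1): z = 2(x1 - 1) + 2(x2 - 1) - 7
tangent = lambda x1, x2: 2 * (x1 - 1) + 2 * (x2 - 1) - 7
print(np.isclose(tangent(1, 1), f(1, 1)))    # True: the plane touches the surface

# At the minimum (0, 0): the gradient [0, 0, -1] is vertical and the tangent
# plane z = f(0, 0) = -9 is horizontal, matching the answer above.
print(grad_F(0, 0))                          # [ 0.  0. -1.]
```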


SLIDE 4

First-Order Convexity Conditions: The complete statement

Theorem

For differentiable f : D → ℜ and open convex set D:

1. f is convex iff, for any x, y ∈ D,
   f(y) ≥ f(x) + ∇ᵀf(x)(y − x)   (9)

2. f is strictly convex iff, for any x, y ∈ D with x ≠ y,
   f(y) > f(x) + ∇ᵀf(x)(y − x)   (10)   [strict lower bound]

3. f is strongly convex iff, for any x, y ∈ D, and for some constant c > 0,
   f(y) ≥ f(x) + ∇ᵀf(x)(y − x) + (c/2)∥y − x∥₂²   (11)
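A minimal numeric spot-check of inequality (9), reusing the earlier paraboloid as a convex f (our own test harness, not from the slides):

```python
import numpy as np

f = lambda x: x[0]**2 + x[1]**2 - 9          # convex, so (9) must hold everywhere
grad = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    # First-order lower bound: f(y) >= f(x) + grad(x)^T (y - x)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-12
print("inequality (9) held on all sampled pairs")
```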


SLIDE 5

First-Order Convexity Condition: Proof

Proof: Sufficiency: The proof of sufficiency is very similar for all three statements of the theorem, so we will prove only statement (9). Suppose (9) holds. Consider x₁, x₂ ∈ D and any θ ∈ (0, 1). Let x = θx₁ + (1 − θ)x₂. Then,

f(x₁) ≥ f(x) + ∇ᵀf(x)(x₁ − x)   [multiply by θ]
f(x₂) ≥ f(x) + ∇ᵀf(x)(x₂ − x)   [multiply by 1 − θ]   (12)

... and add.

SLIDE 6

First-Order Convexity Condition: Proof

Proof: Sufficiency: The proof of sufficiency is very similar for all three statements of the theorem, so we will prove only statement (9). Suppose (9) holds. Consider x₁, x₂ ∈ D and any θ ∈ (0, 1). Let x = θx₁ + (1 − θ)x₂. Then,

f(x₁) ≥ f(x) + ∇ᵀf(x)(x₁ − x)
f(x₂) ≥ f(x) + ∇ᵀf(x)(x₂ − x)   (12)

Adding (1 − θ) times the second inequality to θ times the first, we get

θf(x₁) + (1 − θ)f(x₂) ≥ f(x),

which proves that f is a convex function. In the case of strict convexity, strict inequality holds in (12) and the argument follows through. In the case of strong convexity, we need to additionally account for the quadratic terms:

θ(c/2)∥x₁ − x∥₂² + (1 − θ)(c/2)∥x₂ − x∥₂² = (c/2)θ(1 − θ)∥x₁ − x₂∥₂²,

which follows because x₁ − x = (1 − θ)(x₁ − x₂) and x₂ − x = −θ(x₁ − x₂).
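The identity closing this proof can also be verified symbolically; a small sympy sketch for the one-dimensional case (our own check, assuming sympy is installed):

```python
import sympy as sp

theta, c, x1, x2 = sp.symbols('theta c x1 x2', real=True)
x = theta * x1 + (1 - theta) * x2            # the convex combination
lhs = theta * (c / 2) * (x1 - x)**2 + (1 - theta) * (c / 2) * (x2 - x)**2
rhs = (c / 2) * theta * (1 - theta) * (x1 - x2)**2
print(sp.simplify(lhs - rhs))                # 0: the identity holds
```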


SLIDE 7

First-Order Convexity Conditions: Proofs

Necessity: Suppose f is convex. Then for all θ ∈ (0, 1) and x₁, x₂ ∈ D, we must have

f(θx₂ + (1 − θ)x₁) ≤ θf(x₂) + (1 − θ)f(x₁)

Thus, ∇ᵀf(x₁)(x₂ − x₁) = directional derivative of f at x₁ along x₂ − x₁.


SLIDE 8

First-Order Convexity Conditions: Proofs

Necessity: Suppose f is convex. Then for all θ ∈ (0, 1) and x₁, x₂ ∈ D, we must have

f(θx₂ + (1 − θ)x₁) ≤ θf(x₂) + (1 − θ)f(x₁)

Thus,

∇ᵀf(x₁)(x₂ − x₁) = lim_{θ→0⁺} [f(x₁ + θ(x₂ − x₁)) − f(x₁)] / θ ≤ f(x₂) − f(x₁),

where the inequality follows by using convexity to bound f(x₁ + θ(x₂ − x₁)) above by (1 − θ)f(x₁) + θf(x₂). This proves necessity for (9). The necessity proofs for (10) and (11) are very similar, except for a small difference in the case of strict convexity: the strict inequality is not preserved when we take limits. Suppose equality does hold in the case of strict convexity; that is, for a strictly convex function f, let

f(x₂) = f(x₁) + ∇ᵀf(x₁)(x₂ − x₁)   (13)

for some x₂ ≠ x₁.
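The limit above can be seen numerically: for convex f, the difference quotient is nondecreasing in θ, so it shrinks toward the directional derivative as θ → 0⁺ while staying below its value at θ = 1. A minimal sketch with the running paraboloid (our construction):

```python
import numpy as np

f = lambda x: x[0]**2 + x[1]**2 - 9
grad = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

x1, x2 = np.array([1.0, 1.0]), np.array([3.0, -2.0])
d = x2 - x1
for t in [1.0, 0.5, 0.1, 0.01, 0.001]:
    q = (f(x1 + t * d) - f(x1)) / t          # difference quotient at step t
    print(f"theta={t:6.3f}  quotient={q: .4f}")
print("directional derivative:", grad(x1) @ d)   # quotient decreases to this
print("f(x2) - f(x1):        ", f(x2) - f(x1))   # its value at theta = 1
```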


SLIDE 9

First-Order Convexity Conditions: Proofs

Necessity (contd. for the strict case): Because f is strictly convex, for any θ ∈ (0, 1) we can write

f((1 − θ)x₁ + θx₂) = f(x₁ + θ(x₂ − x₁)) < (1 − θ)f(x₁) + θf(x₂)   (14)

Since (9) is already proved for convex functions, we use it in conjunction with (13) and (14) to get


SLIDE 10

First-Order Convexity Conditions: Proofs

Necessity (contd. for the strict case): Because f is strictly convex, for any θ ∈ (0, 1) we can write

f((1 − θ)x₁ + θx₂) = f(x₁ + θ(x₂ − x₁)) < (1 − θ)f(x₁) + θf(x₂)   (14)

Since (9) is already proved for convex functions, we use it in conjunction with (13) and (14) to get

f(x₁) + θ∇ᵀf(x₁)(x₂ − x₁) ≤ f(x₁ + θ(x₂ − x₁)) < f(x₁) + θ∇ᵀf(x₁)(x₂ − x₁),

which is a contradiction. Thus, equality can never hold in (9) for any x₁ ≠ x₂. This proves the necessity of (10).


SLIDE 11

First-Order Convexity Conditions: The complete statement

The geometrical interpretation of this theorem is that at any point, the linear approximation based on a local derivative gives a lower estimate of the function, i.e., the convex function always lies above the supporting hyperplane at that point. This is pictorially depicted below:


SLIDE 12

(Tight) Lower-bound for any (non-differentiable) Convex Function?

For any convex function f (even if non-differentiable):
• The epigraph epi(f) will be convex
• The convex epigraph epi(f) will have a supporting hyperplane at any boundary point (x, f(x))


SLIDE 13

(Tight) Lower-bound for any (non-differentiable) Convex Function?

For any convex function f (even if non-differentiable):
• The epigraph epi(f) will be convex
• The convex epigraph epi(f) will have a supporting hyperplane at every boundary point x

[Figure: epi(f) with a supporting hyperplane at the boundary point x; normal vector [h, −1]]

• There exist multiple supporting hyperplanes
• Let a supporting hyperplane be characterized by a normal vector [h(x), −1]
• When f was differentiable, this vector was [∇f(x), −1]


SLIDE 14

(Tight) Lower-bound for any (non-differentiable) Convex Function?

For any convex function f (even if non-differentiable):
• The epigraph epi(f) will be convex
• The convex epigraph epi(f) will have a supporting hyperplane at every boundary point x
▶ The hyperplane is { [v, z] | ⟨[h(x), −1], [v, z]⟩ = ⟨[h(x), −1], [x, f(x)]⟩ } for all [v, z] on the hyperplane, and ⟨[h(x), −1], [y, z]⟩ ≤ ⟨[h(x), −1], [x, f(x)]⟩ for all [y, z] ∈ epi(f), which also includes [y, f(y)]


SLIDE 15

(Tight) Lower-bound for any (non-differentiable) Convex Function?

For any convex function f (even if non-differentiable):
• The epigraph epi(f) will be convex
• The convex epigraph epi(f) will have a supporting hyperplane at every boundary point x
▶ The hyperplane is { [v, z] | ⟨[h(x), −1], [v, z]⟩ = ⟨[h(x), −1], [x, f(x)]⟩ } for all [v, z] on the hyperplane, and ⟨[h(x), −1], [y, z]⟩ ≤ ⟨[h(x), −1], [x, f(x)]⟩ for all [y, z] ∈ epi(f), which also includes [y, f(y)]
• Thus: ⟨[h(x), −1], [y, f(y)]⟩ ≤ ⟨[h(x), −1], [x, f(x)]⟩ for all y in the domain of f; expanding the inner products gives f(y) ≥ f(x) + h(x)ᵀ(y − x)
• The normal to such a supporting hyperplane serves the same purpose as [∇f(x), −1] (a concrete check follows below)
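This inner-product form can be checked directly for a concrete case, e.g. f(x) = |x| at the kink x = 0 with h(0) = 0.5 ∈ [−1, 1] (our own example, not from the slides):

```python
import numpy as np

# Epigraph form of the supporting-hyperplane inequality:
#   <[h(x), -1], [y, f(y)]>  <=  <[h(x), -1], [x, f(x)]>
f = lambda x: abs(x)
x, h = 0.0, 0.5
normal = np.array([h, -1.0])                 # normal to the supporting hyperplane
rhs = normal @ np.array([x, f(x)])           # equals 0 here
ys = np.linspace(-5, 5, 1001)
print(all(normal @ np.array([y, f(y)]) <= rhs + 1e-12 for y in ys))
# True: [0.5, -1] supports epi(f) at (0, 0), even though f is not
# differentiable there
```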


SLIDE 16

The What, Why and How of (sub)gradients

What of (sub)gradient: Normal to the supporting hyperplane at the point (x, f(x)) of epi(f)
• Need not be unique
• The gradient is a subgradient when the function is differentiable


SLIDE 17

The What, Why and How of (sub)gradients

What of (sub)gradient: Normal to the tightly lower-bounding linear approximation to a convex function
Why of (sub)gradient:
• (Sub)gradients give necessary and sufficient conditions of optimality for convex functions
• Important for optimization algorithms
• Subgradients are important for non-differentiable functions and constrained optimization


SLIDE 18

The What, Why and How of (sub)gradients

What of (sub)gradient: Normal to the tightly lower-bounding linear approximation to a convex function
Why of (sub)gradient: Ability to deal with constraints, optimality conditions, optimization algorithms
How of (sub)gradient: How to compute the subgradient of complex non-differentiable convex functions; the calculus of convex functions and of subgradients (a small preview follows below)
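One standard rule from that calculus (well known, though not stated on this slide): for a pointwise maximum f = max(f₁, f₂), the gradient of any piece that attains the maximum at x is a subgradient of f at x. A minimal sketch:

```python
import numpy as np

# f(x) = max(x, x^2) on the reals: both pieces are convex, and f is not
# differentiable where they cross (x = 0 and x = 1).
f = lambda x: max(x, x**2)

def a_subgradient(x):
    # Derivative of an active piece (at ties, either choice is valid).
    return 1.0 if x >= x**2 else 2.0 * x

x0 = 0.5                     # here x > x^2, so the active piece is f1(x) = x
h = a_subgradient(x0)        # h = 1.0
ys = np.linspace(-3, 3, 601)
print(all(f(y) >= f(x0) + h * (y - x0) - 1e-12 for y in ys))   # True
```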


SLIDE 19

First-Order Convexity Conditions: Subgradients


The foregoing result motivates the definition of the subgradient for non-differentiable convex functions, which has properties very similar to those of the gradient vector.

Definition

[Subgradient]: Let f : D → ℜ be a convex function defined on a convex set D. A vector h ∈ ℜⁿ is said to be a subgradient of f at the point x ∈ D if

f(y) ≥ f(x) + hᵀ(y − x)

for all y ∈ D. The set of all such vectors is called the subdifferential of f at x.
• For a differentiable convex function, the gradient at point x is the only subgradient at that point.
• Most properties of differentiable convex functions that hold in terms of the gradient also hold in terms of the subgradient for non-differentiable convex functions.
• E.g.: What is the subgradient for f(x) = ∥x∥₁? Once we develop tools (the HOW part), we will see that the subdifferential contains infinitely many such h at some points x (a sketch follows below).
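A known answer for ∥x∥₁: h is a subgradient at x exactly when hᵢ = sign(xᵢ) for xᵢ ≠ 0 and hᵢ ∈ [−1, 1] for xᵢ = 0, so the subdifferential is infinite wherever some coordinate is zero. A minimal numeric check (our own sketch):

```python
import numpy as np

def subgrad_l1(x, free=0.3):
    # sign(x_i) away from the kink; any value in [-1, 1] (here `free`) at x_i == 0
    h = np.sign(x).astype(float)
    h[x == 0] = free
    return h

x = np.array([1.5, 0.0, -2.0])
h = subgrad_l1(x)
rng = np.random.default_rng(1)
for _ in range(1000):
    y = rng.normal(size=3)
    # Subgradient inequality: ||y||_1 >= ||x||_1 + h^T (y - x)
    assert np.abs(y).sum() >= np.abs(x).sum() + h @ (y - x) - 1e-12
print("held for all samples; any `free` in [-1, 1] works, so infinitely many h")
```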

SLIDE 20

(Sub)Gradients and Convexity (contd)

To say that a function f : ℜⁿ → ℜ is differentiable at x is to say that there is a (single, unique) linear tangent that underestimates the function: f(y) ≥ f(x) + ∇ᵀf(x)(y − x), ∀x, y.

Can you think of a non-convex function f which has a non-empty subdifferential (at least at some points x)? Could this be the case for the negative of the Gaussian?
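This question can be probed numerically. Since the negative Gaussian f(x) = −exp(−x²) is differentiable, the only candidate subgradient at x₀ is f′(x₀), so we can test whether that tangent line underestimates f everywhere (our experiment, not a claim from the slides):

```python
import numpy as np

f = lambda x: -np.exp(-x**2)                 # negative Gaussian: NOT convex
fprime = lambda x: 2 * x * np.exp(-x**2)

ys = np.linspace(-10, 10, 4001)
for x0 in [0.0, 0.5, 1.0, 2.0]:
    tangent = f(x0) + fprime(x0) * (ys - x0)
    print(x0, bool(np.all(f(ys) >= tangent - 1e-12)))
# Only x0 = 0.0 prints True: the flat tangent y = -1 at the global minimum is a
# global underestimator, so the subdifferential is non-empty there (and, the
# scan suggests, empty at the other sampled points).
```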


SLIDE 21

(Sub)Gradients and Convexity (contd)

In this figure we see that the function f at x has many possible linear tangents that fit appropriately. Recall that a subgradient is any h ∈ ℜⁿ (same dimension as x) such that:

f(y) ≥ f(x) + hᵀ(y − x), ∀y

Thus, intuitively, if a function is differentiable at a point x, then ∇f(x) is its only subgradient there.
