Constrained Optimization in ℜ: Recap
October 19, 2018 197 / 424

Global Extrema on Closed Intervals
Recall the extreme value theorem: a continuous function f on a closed interval [a, b] attains a maximum value f(c) and a minimum value f(d) at some c, d ∈ [a, b]. A consequence is that if either of c or d lies in (a, b), then it is a critical number of f; else each lies at an endpoint of [a, b].
The Closed Interval Method for a continuous f on [a, b]:
1. Compute the values of f at the critical points of f in (a, b).
2. Compute the values of f at the endpoints of the interval.
3. Select the least and greatest of the computed values.
Example: find the global extrema of f(x) = 4x³ − 8x² + 5x on [0, 1].
▶ We first compute f′(x) = 12x² − 16x + 5, which is 0 at x = 1/2 and x = 5/6.
▶ Values at the critical points are f(1/2) = 1 and f(5/6) = 25/27.
▶ The values at the end points are f(0) = 0 and f(1) = 1.
▶ Therefore, the minimum value is f(0) = 0 and the maximum value is f(1) = f(1/2) = 1.
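As a quick sketch, the method and the example above can be checked in code (the cubic f and its critical points are taken from the example; everything else is a minimal illustration):

```python
# A minimal sketch of the Closed Interval Method, applied to the example
# f(x) = 4x^3 - 8x^2 + 5x on [0, 1] with critical points found analytically.

def f(x):
    return 4 * x**3 - 8 * x**2 + 5 * x

def closed_interval_extrema(f, critical_points, a, b):
    """Evaluate f at the interior critical points and at the endpoints,
    then pick the least and greatest values."""
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    values = {x: f(x) for x in candidates}
    x_min = min(values, key=values.get)
    x_max = max(values, key=values.get)
    return (x_min, values[x_min]), (x_max, values[x_max])

# f'(x) = 12x^2 - 16x + 5 = 0 at x = 1/2 and x = 5/6
(minimizer, fmin), (maximizer, fmax) = closed_interval_extrema(f, [0.5, 5 / 6], 0.0, 1.0)
print(minimizer, fmin)  # 0.0 0.0  (minimum at the endpoint x = 0)
print(fmax)             # 1.0      (maximum value, attained at both x = 1/2 and x = 1)
```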
At the endpoints, only one-sided derivatives are available:
f′(a⁺) = lim_{h→0+} (f(a + h) − f(a))/h,  f′(b⁻) = lim_{h→0−} (f(b + h) − f(b))/h.
If f attains its maximum at the left endpoint a, then f′(a⁺) ≤ 0; if f attains its maximum at the right endpoint b, then f′(b⁻) ≥ 0.
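These one-sided limits can be estimated with forward and backward difference quotients; a minimal sketch, reusing the cubic from the earlier example (the step size h is an arbitrary choice):

```python
# Estimating one-sided derivatives at the endpoints of [a, b] with
# forward/backward difference quotients; f is the cubic from the example.

def f(x):
    return 4 * x**3 - 8 * x**2 + 5 * x

def right_derivative(f, a, h=1e-6):
    # lim_{h -> 0+} (f(a + h) - f(a)) / h
    return (f(a + h) - f(a)) / h

def left_derivative(f, b, h=1e-6):
    # lim_{h -> 0-} (f(b + h) - f(b)) / h, via a backward quotient
    return (f(b) - f(b - h)) / h

print(right_derivative(f, 0.0))  # ~5 > 0: f increases away from a = 0, consistent with the minimum at a
print(left_derivative(f, 1.0))   # ~1 > 0: f increases into b = 1, consistent with the maximum at b
```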
Interior critical points:
▶ If f′(c) = 0 and f′′(c) < 0, then f(c) is the local maximum value; if f′(c) = 0 and f′′(c) > 0, then f(c) is the local minimum value.
▶ Example: consider f(x) = 2 sin x + cos 2x on (−π, π). Then f′(x) = 2 cos x − 2 sin 2x = 2 cos x (1 − 2 sin x), which is 0 at x = ±π/2 and x = π/6. Further, f(π/6) = 3/2, f(π/2) = 1 and f(−π/2) = −3. Therefore, f attains the maximum value 3/2 at x = π/6.
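A numeric check of this test; the specific function f(x) = 2 sin x + cos 2x on (−π, π) is an assumption reconstructed from the fragmentary slide, but the mechanics of the test are general:

```python
import math

# Checking the interior-maximum test on the trigonometric example,
# assuming f(x) = 2 sin x + cos 2x on (-pi, pi).

def f(x):
    return 2 * math.sin(x) + math.cos(2 * x)

def fprime(x):
    # f'(x) = 2 cos x - 2 sin 2x = 2 cos x (1 - 2 sin x)
    return 2 * math.cos(x) - 2 * math.sin(2 * x)

def fsecond(x):
    # f''(x) = -2 sin x - 4 cos 2x
    return -2 * math.sin(x) - 4 * math.cos(2 * x)

c = math.pi / 6                  # interior critical point: sin x = 1/2
print(abs(fprime(c)) < 1e-9)     # True: f'(pi/6) = 0
print(f(c))                      # ~1.5, larger than f at the critical points +-pi/2
print(fsecond(c) < 0)            # True: f''(pi/6) = -3 < 0, so pi/6 is a local maximum
```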
Example: find the cone of maximum volume that can be inscribed in a sphere of radius R.
▶ If the cone has height h and base radius r, the right triangle with sides h − R and r and hypotenuse R gives R² = (h − R)² + r², i.e., r² = 2Rh − h².
▶ Maximize V(h) = (1/3)πr²h = (1/3)π(2Rh − h²)h over h ∈ (0, 2R).

Figure 10: A cone of height h and base radius r inscribed in a sphere of radius R.
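A numeric sketch of the cone example; the grid search below brackets the analytic answer h* = 4R/3 with maximum volume 32πR³/81:

```python
import math

# Inscribed-cone example solved by brute force: with r^2 = 2Rh - h^2,
# maximize V(h) = (1/3) pi r^2 h over h in (0, 2R).

R = 1.0

def volume(h):
    r2 = 2 * R * h - h * h      # from R^2 = (h - R)^2 + r^2
    return math.pi * r2 * h / 3

hs = [i * (2 * R) / 200000 for i in range(1, 200000)]
h_star = max(hs, key=volume)
print(h_star)           # ~1.33333 = 4R/3
print(volume(h_star))   # ~32*pi*R^3/81 ~ 1.2415
```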
Constrained Optimization: the general problem is to
minimize f(x) subject to gᵢ(x) ≤ 0, i = 1, …, m.
▶ Each constraint defines the set Cᵢ = {x : gᵢ(x) ≤ 0}, and the feasible region is their intersection C = ∩ᵢ Cᵢ.
▶ We have shown that this is convex if each gᵢ(x) is convex.
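The convexity claim can be sanity-checked numerically: for convex gᵢ, gᵢ(θx + (1 − θ)y) ≤ θgᵢ(x) + (1 − θ)gᵢ(y) ≤ 0 whenever x and y are feasible. The two constraints below are assumptions chosen for illustration:

```python
import random

# Sanity check that {x : g_i(x) <= 0 for all i} is convex when each g_i is.

g = [
    lambda x: x[0] ** 2 + x[1] ** 2 - 4,   # disc of radius 2 (convex)
    lambda x: x[0] + x[1] - 1,             # halfspace (affine, hence convex)
]

def feasible(x):
    return all(gi(x) <= 0 for gi in g)

random.seed(0)
for _ in range(1000):
    x = (random.uniform(-2, 2), random.uniform(-2, 2))
    y = (random.uniform(-2, 2), random.uniform(-2, 2))
    if feasible(x) and feasible(y):
        theta = random.random()
        mid = (theta * x[0] + (1 - theta) * y[0],
               theta * x[1] + (1 - theta) * y[1])
        # convex combinations of feasible points stay feasible
        # (tiny tolerance for floating-point roundoff)
        assert all(gi(mid) <= 1e-9 for gi in g)
print("ok")
```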
Rewrite the constrained problem as the unconstrained problem
minimize F(x) = f(x) + ∑ᵢ I_Cᵢ(x),
where I_Cᵢ is the indicator function of Cᵢ = {x : gᵢ(x) ≤ 0} (0 on Cᵢ and ∞ outside). A subgradient h_F of F combines a subgradient h_f of f with subgradients h_I_Cᵢ of the indicators:
▶ h_f(x) = ∇f(x) if f(x) is differentiable. Also, −∇f(x) at x^k optimizes the first order approximation for f(x) around x^k:
−∇f(x^k) = argmin_h f(x^k) + ∇f(x^k)ᵀh + (1/2)‖h‖².
Variations on the form of (1/2)‖h‖² (e.g., replacing it with an entropic regularizer) lead to Mirror Descent etc.
▶ A subgradient h_I_Cᵢ(x) is any d ∈ ℜⁿ s.t. dᵀx ≥ dᵀy, ∀y ∈ Cᵢ. Also, h_I_Cᵢ(x) = 0 if x is in the interior of Cᵢ; it has other solutions if x is on the boundary: analysis for convex gᵢ's leads to KKT conditions and Dual Ascent etc.
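One standard way to use h_I_C in practice is projection; the sketch below (projected gradient descent on an assumed toy problem, not necessarily this slide's algorithm) course-corrects each gradient step back onto the feasible halfspace:

```python
# Projected gradient descent sketch: minimize f(x) = (x1-2)^2 + (x2-2)^2
# subject to g(x) = x1 + x2 - 1 <= 0 (an assumed toy problem). The
# projection step plays the role of the indicator's subgradient correction.

def grad_f(x):
    return (2 * (x[0] - 2), 2 * (x[1] - 2))

def project_halfspace(x):
    # Euclidean projection onto {x : x1 + x2 <= 1}
    s = x[0] + x[1] - 1
    if s <= 0:
        return x
    return (x[0] - s / 2, x[1] - s / 2)

x = (0.0, 0.0)
t = 0.1
for _ in range(200):
    gx = grad_f(x)
    z = (x[0] - t * gx[0], x[1] - t * gx[1])   # unconstrained gradient step
    x = project_halfspace(z)                    # correction back onto C
print(x)   # ~(0.5, 0.5), the constrained minimizer
```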
Proximal (generalized) gradient descent: starting from x^k, take an unregulated gradient step for f alone, leaving the correction for c(x) to a second phase:
z^(k+1) = x^k − t_k ∇f(x^k)
x^(k+1) = prox_{t_k c}(z^(k+1)) = argmin_x (1/(2t_k))‖x − z^(k+1)‖²₂ + c(x)
(the point closest to the unregulated gradient descent update, with a later regulation using c(x)). This unregulated descent update will often be referred to as z.
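The two-phase update can be sketched generically; here c is taken to be the indicator of [0, 1] (an assumed illustrative choice), whose prox is simple clipping:

```python
# One proximal gradient step: (a) unregulated gradient update for f alone,
# (b) course correction by the prox of c. Here c = indicator of [0, 1].

def prox_grad_step(x, grad_f, prox, t):
    z = x - t * grad_f(x)        # phase (a): unregulated update z
    return prox(z, t)            # phase (b): course correction based on c

grad_f = lambda x: 2 * (x - 3)                  # f(x) = (x - 3)^2
prox_clip = lambda z, t: min(max(z, 0.0), 1.0)  # prox of the indicator of [0, 1]

x = 0.0
for _ in range(50):
    x = prox_grad_step(x, grad_f, prox_clip, t=0.25)
print(x)   # 1.0: the constrained minimizer of f over [0, 1]
```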
PROX gives you the point closest (with respect to c(x)) to the unregulated update, when we also bring in the effect of (minimizing) c(x). Basically, we have phased the subgradient descent update into two phases:
(a) an unregulated update (such as gradient descent) for f(x) alone;
(b) a course correction based on c(x).
▶ Set k = k + 1, until a stopping criterion (such as f(x^k) > f(x^(k−1))) is satisfied.ᵃ

ᵃBetter criteria can be found using Lagrange duality theory, etc.

Figure 11: The generalized gradient descent algorithm.
Computing prox_c(z) = argmin_x (1/(2t))‖x − z‖²₂ + c(x) for some simple convex⁹ functions c:

For x ∈ ℜ, c(x) =                                  For z ∈ ℜ & t = 1, prox_c(z) =
Simplified Lasso: λ|x|                              ??
μx if x ≥ 0; ∞ if x < 0                             ??
λx³ if x ≥ 0; ∞ if x < 0                            ??
−λ log x if x > 0; ∞ if x ≤ 0                       ??
δ_[0,η]∩ℜ (indicator of [0, η])                     ??

⁹f takes values in the extended real number line such that f(x) < +∞ for at least one x and f(x) > −∞ for every x: it is finite at at least one point and is nowhere −∞.
(The −λ log x row is related to the log barrier function.)

Prox calculus:

c(x) =                                                          For t = 1, prox_c(z) =
Constant: c                                                      ??
Affine: aᵀx + b                                                  ??
Convex quadratic: (1/2)xᵀAx + bᵀx + c (A ∈ Sⁿ₊, b ∈ ℜⁿ)          ??
Sum over components: c(x) = ∑ᵢ₌₁ⁿ cᵢ(xᵢ)                          ??
c(λx + a)                                                        ??
c(x) + aᵀx + (β/2)‖x‖²₂ + γ                                      ??
c(Ax + b)                                                        ??
c(‖x‖)                                                           ??
Solving the Lasso objective f(x) + λ‖x‖₁ by proximal gradient descent:
1. Choose an initial point x^(0).
2. Let x̂^(k+1) ≡ z^(k+1) be a next gradient descent iterate for f at x^(k).
3. Compute x^(k+1) = argmin_x (1/(2t))‖x − z^(k+1)‖²₂ + λ‖x‖₁ (the prox step) by setting a subgradient of this objective to 0 (see https://www.cse.iitb.ac.in/~cs709/notes/enotes/lassoElaboration.pdf).
▶ Set k = k + 1, until a stopping criterion is satisfied (such as no significant changes in x^(k) w.r.t. x^(k−1)).
The vector x^(k+1) is obtained by componentwise minimization.
▶ Let z^(k+1) be a next gradient descent iterate for f at x^(k).
▶ Compute prox(z^(k+1)) = x^(k+1) = argmin_x (1/(2t))‖x − z^(k+1)‖²₂ + λ‖x‖₁ componentwise, as follows:
   1. If zᵢ^(k+1) > λt, then xᵢ^(k+1) = −λt + zᵢ^(k+1).
   2. If zᵢ^(k+1) < −λt, then xᵢ^(k+1) = λt + zᵢ^(k+1).
   3. xᵢ^(k+1) = 0 otherwise.
   (If the unregulated zᵢ^(k+1) was greater than λt, reduce it by that amount, and symmetrically from below.)
▶ Set k = k + 1, until a stopping criterion is satisfied (such as no significant changes in x^(k) w.r.t. x^(k−1)).
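The componentwise rule above is soft thresholding; a minimal sketch on an assumed toy lasso instance (identity design matrix, hand-picked b, λ and t):

```python
# Proximal gradient descent (ISTA) for a tiny lasso problem
#   min_x (1/2)||x - b||^2 + lam * ||x||_1
# (identity design matrix assumed, so the gradient of the smooth part is x - b).

def soft_threshold(z, thresh):
    # x_i = z_i - thresh if z_i > thresh; z_i + thresh if z_i < -thresh; else 0
    if z > thresh:
        return z - thresh
    if z < -thresh:
        return z + thresh
    return 0.0

b = [3.0, 0.2]
lam = 0.5
t = 0.5          # step size

x = [0.0, 0.0]
for _ in range(100):
    grad = [x[i] - b[i] for i in range(2)]
    z = [x[i] - t * grad[i] for i in range(2)]        # unregulated step
    x = [soft_threshold(zi, lam * t) for zi in z]     # componentwise prox
print(x)   # ~[2.5, 0.0]: large coefficient shrunk by lam, small one zeroed
```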
Solutions:

For x ∈ ℜ, c(x) =                                  For z ∈ ℜ & t = 1, prox_c(z) =
Simplified Lasso: λ|x|                              [|z| − λ]₊ sign(z)
μx if x ≥ 0; ∞ if x < 0                             [z − μ]₊
λx³ if x ≥ 0; ∞ if x < 0                            (−1 + √(1 + 12λ[z]₊)) / (6λ)
−λ log x if x > 0; ∞ if x ≤ 0                       (z + √(z² + 4λ)) / 2
δ_[0,η]∩ℜ                                           min{max{z, 0}, η}
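These closed forms can be verified against brute-force minimization of (1/2)(u − z)² + c(u); the sample values of λ, μ, η and z below are arbitrary:

```python
import math

# Check the closed-form prox expressions against a grid minimization
# of (1/2)(u - z)^2 + c(u) with t = 1.

def prox_by_grid(c, z, lo=-5.0, hi=5.0, n=200001):
    step = (hi - lo) / (n - 1)
    grid = (lo + i * step for i in range(n))
    return min(grid, key=lambda u: 0.5 * (u - z) ** 2 + c(u))

lam, mu, eta, z = 0.7, 0.4, 1.2, 2.0

# Simplified Lasso, c(u) = lam|u|: prox = [|z| - lam]_+ sign(z)
assert abs(prox_by_grid(lambda u: lam * abs(u), z) - (z - lam)) < 1e-3

# c(u) = mu*u for u >= 0 (infinite otherwise): prox = [z - mu]_+
c_lin = lambda u: mu * u if u >= 0 else float("inf")
assert abs(prox_by_grid(c_lin, z) - (z - mu)) < 1e-3

# c(u) = -lam log u for u > 0: prox = (z + sqrt(z^2 + 4 lam)) / 2
c_log = lambda u: -lam * math.log(u) if u > 0 else float("inf")
assert abs(prox_by_grid(c_log, z) - (z + math.sqrt(z * z + 4 * lam)) / 2) < 1e-3

# indicator of [0, eta]: prox = min(max(z, 0), eta)
c_box = lambda u: 0.0 if 0 <= u <= eta else float("inf")
assert abs(prox_by_grid(c_box, z) - min(max(z, 0.0), eta)) < 1e-3

print("prox formulas match the grid minimizers")
```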