
More Subgradient Calculus: Function Convexity - PowerPoint PPT Presentation



  1. More Subgradient Calculus: Function Convexity first
  The following functions are again convex, but again, may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability?
  Nonnegative weighted sum: $f = \sum_{i=1}^{n} \alpha_i f_i$ is convex if each $f_i$ for $1 \le i \le n$ is convex and $\alpha_i \ge 0$ for $1 \le i \le n$.
  Composition with affine function: $f(Ax + b)$ is convex if $f$ is convex. For example:
  ▶ The log barrier for linear inequalities, $f(x) = -\sum_{i=1}^{m} \log(b_i - a_i^T x)$, is convex since $-\log(x)$ is convex.
  ▶ Any norm of an affine function, $f(x) = \|Ax + b\|$, is convex.
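
The affine-composition fact above can be sanity-checked numerically. Below is a minimal sketch (a spot check with invented data, not a proof) that evaluates the log barrier along a segment and verifies the convexity inequality $f(tx_1 + (1-t)x_2) \le t f(x_1) + (1-t) f(x_2)$:

```python
import numpy as np

# Sketch: numerically check convexity of the log barrier
# f(x) = -sum_i log(b_i - a_i^T x) along a segment.
# A and b are invented so that points near the origin are strictly feasible.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))               # rows are a_i^T
b = np.abs(rng.standard_normal(5)) + 1.0      # b_i >= 1, so x = 0 is feasible

def log_barrier(x):
    slack = b - A @ x
    assert np.all(slack > 0), "x must be strictly feasible"
    return -np.sum(np.log(slack))

x1 = np.zeros(3)
x2 = 0.05 * rng.standard_normal(3)
for t in np.linspace(0.0, 1.0, 11):
    lhs = log_barrier(t * x1 + (1 - t) * x2)
    rhs = t * log_barrier(x1) + (1 - t) * log_barrier(x2)
    assert lhs <= rhs + 1e-9                  # convexity inequality holds
```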

  2. More of Basic Subgradient Calculus
  Scaling: $\partial(af) = a \cdot \partial f$ provided $a > 0$. The condition $a > 0$ makes the function $af$ remain convex.
  Addition: $\partial(f_1 + f_2) = \partial f_1 + \partial f_2$.
  Affine composition: if $g(x) = f(Ax + b)$, then $\partial g(x) = A^T \partial f(Ax + b)$.
  Norms: important special case, $f(x) = \|x\|_p$.
  The derivations done in class can be used to show that if any other subgradient existed for $g$ outside the stated set above, it could be used to construct a subgradient for $f$ outside its stated set as well!

  3. More of Basic Subgradient Calculus
  Scaling: $\partial(af) = a \cdot \partial f$ provided $a > 0$. The condition $a > 0$ makes the function $af$ remain convex.
  Addition: $\partial(f_1 + f_2) = \partial f_1 + \partial f_2$.
  Affine composition: if $g(x) = f(Ax + b)$, then $\partial g(x) = A^T \partial f(Ax + b)$.
  Norms: important special case, $f(x) = \|x\|_p = \max_{\|z\|_q \le 1} z^T x$, where $q$ is such that $1/p + 1/q = 1$. (On the board we used $y$ instead of $z$.)

  4. More of Basic Subgradient Calculus
  Scaling: $\partial(af) = a \cdot \partial f$ provided $a > 0$. The condition $a > 0$ makes the function $af$ remain convex.
  Addition: $\partial(f_1 + f_2) = \partial f_1 + \partial f_2$.
  Affine composition: if $g(x) = f(Ax + b)$, then $\partial g(x) = A^T \partial f(Ax + b)$.
  Norms: important special case, $f(x) = \|x\|_p = \max_{\|z\|_q \le 1} z^T x$, where $q$ is such that $1/p + 1/q = 1$. Then
  $\partial f(x) = \left\{ y : \|y\|_q \le 1 \text{ and } y^T x = \max_{\|z\|_q \le 1} z^T x \right\}$
  Here $y$ corresponds to a $z$ at which the max is attained. This part is largely connected to the previous discussion on the max of convex functions.

  5. More of Basic Subgradient Calculus
  Scaling: $\partial(af) = a \cdot \partial f$ provided $a > 0$. The condition $a > 0$ makes the function $af$ remain convex.
  Addition: $\partial(f_1 + f_2) = \partial f_1 + \partial f_2$.
  Affine composition: if $g(x) = f(Ax + b)$, then $\partial g(x) = A^T \partial f(Ax + b)$.
  Norms: important special case, $f(x) = \|x\|_p = \max_{\|z\|_q \le 1} z^T x$, where $q$ is such that $1/p + 1/q = 1$ (this is derived in class). Then
  $\partial f(x) = \left\{ y : \|y\|_q \le 1 \text{ and } y^T x = \max_{\|z\|_q \le 1} z^T x \right\} = \left\{ y : \|y\|_q \le 1 \text{ and } y^T x = \|x\|_p \right\}$
  The requirement $\|y\|_q \le 1$ follows from Hölder's inequality ($z^T x \le \|z\|_q \|x\|_p$).
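
For a concrete instance of the norm rule, take $p = 1$, $q = \infty$. The sketch below (our own illustration, not from the slides) builds a subgradient of $\|x\|_1$ satisfying the two conditions above and spot-checks the subgradient inequality at random points:

```python
import numpy as np

# Sketch for p = 1, q = infinity: a subgradient y of f(x) = ||x||_1 must
# satisfy ||y||_inf <= 1 and y^T x = ||x||_1. Taking y_i = sign(x_i)
# (with y_i = 0 where x_i = 0, one valid choice in [-1, 1]) works.
x = np.array([1.5, 0.0, -2.0])
y = np.sign(x)

assert np.max(np.abs(y)) <= 1.0                  # ||y||_inf <= 1
assert np.isclose(y @ x, np.abs(x).sum())        # y^T x = ||x||_1

# Spot-check the subgradient inequality f(z) >= f(x) + y^T (z - x):
rng = np.random.default_rng(1)
for _ in range(100):
    z = rng.standard_normal(3)
    assert np.abs(z).sum() >= np.abs(x).sum() + y @ (z - x) - 1e-12
```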

  6. Subgradients for the 'Lasso' Problem in Machine Learning
  We use Lasso ($\min_x f(x)$) as an example to illustrate subgradients of affine composition:
  $f(x) = \frac{1}{2}\|y - x\|^2 + \lambda \|x\|_1$
  The subgradients of $f(x)$ are $x - y + \lambda s$, where $s \in \{+1, -1\}^n$ is such that $\|x\|_1 = s^T x$.

  7. Subgradients for the 'Lasso' Problem in Machine Learning
  We use Lasso ($\min_x f(x)$) as an example to illustrate subgradients of affine composition:
  $f(x) = \frac{1}{2}\|y - x\|^2 + \lambda \|x\|_1$
  The subgradients of $f(x)$ are $h = x - y + \lambda s$, where $s_i = \mathrm{sign}(x_i)$ if $x_i \ne 0$ and $s_i \in [-1, 1]$ if $x_i = 0$. The second component is a result of the convex hull.
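
As an illustration, the stationarity condition $0 \in \partial f(x)$ with the subgradient above yields the familiar soft-thresholding solution. The sketch below (with invented $y$ and $\lambda$) verifies the condition coordinate by coordinate:

```python
import numpy as np

# Sketch: optimality condition 0 in subdiff f(x) with h = x - y + lambda*s.
# If x_i != 0 it forces x_i - y_i + lambda*sign(x_i) = 0; if x_i = 0 it only
# requires |y_i| <= lambda (since s_i may be anything in [-1, 1]).
# The resulting minimizer is soft thresholding of y; the data is invented.
lam = 0.7
y = np.array([2.0, 0.3, -1.0])

x_star = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)   # soft threshold

for xi, yi in zip(x_star, y):
    if xi != 0.0:
        assert np.isclose(xi - yi + lam * np.sign(xi), 0.0)
    else:
        assert abs(yi) <= lam
```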

  8. More Subgradient Calculus: Composition
  The following functions, though convex, may not be differentiable everywhere. How does one compute their subgradients? (What holds for the subgradient also holds for the gradient.)
  Composition with functions: Let $p : \Re^k \to \Re$ with $p(x) = \infty\ \forall x \notin \mathrm{dom}\ p$, and $q : \Re^n \to \Re^k$. Define $f(x) = p(q(x))$. $f$ is convex if
  ▶ $q_i$ is convex and $p$ is convex and nondecreasing in each argument,
  ▶ or $q_i$ is concave and $p$ is convex and nonincreasing in each argument.
  We will consider only the first case.

  9. More Subgradient Calculus: Composition
  The following functions, though convex, may not be differentiable everywhere. How does one compute their subgradients? (What holds for the subgradient also holds for the gradient.)
  Composition with functions: Let $p : \Re^k \to \Re$ with $p(x) = \infty\ \forall x \notin \mathrm{dom}\ p$, and $q : \Re^n \to \Re^k$. Define $f(x) = p(q(x))$. $f$ is convex if
  ▶ $q_i$ is convex and $p$ is convex and nondecreasing in each argument,
  ▶ or $q_i$ is concave and $p$ is convex and nonincreasing in each argument.
  In both conditions, the composition will be concave if $p$ is concave.
  Some examples illustrating this property (see the sketch after this list):
  ▶ $\exp q(x)$ is convex if $q$ is convex ($\exp$ is a monotonic and convex $p$).
  ▶ $\sum_{i=1}^{m} \log q_i(x)$ is concave if the $q_i$ are concave and positive ($p$ is concave and hence the composition is concave).
  ▶ $\log \sum_{i=1}^{m} \exp q_i(x)$ is convex if the $q_i$ are convex.
  ▶ $1/q(x)$ is convex if $q$ is concave and positive.
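
A minimal numerical illustration (a sanity check with invented data, not a proof) of the third example above: $\log \sum_i \exp q_i(x)$ is convex when the $q_i$ are convex, here with $q_i(x) = (a_i^T x)^2$:

```python
import numpy as np

# Sketch: log sum_i exp q_i(x) is convex when the q_i are convex.
# Here q_i(x) = (a_i^T x)^2 are convex quadratics with invented data A.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

def f(x):
    q = (A @ x) ** 2                       # convex q_i
    return np.log(np.sum(np.exp(q)))       # p = log-sum-exp: convex, nondecreasing

x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
for t in np.linspace(0.0, 1.0, 11):
    lhs = f(t * x1 + (1 - t) * x2)
    rhs = t * f(x1) + (1 - t) * f(x2)
    assert lhs <= rhs + 1e-9               # convexity inequality holds
```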

  10. More Subgradient Calculus: Composition (contd)
  Composition with functions: Let $p : \Re^k \to \Re$ with $p(x) = \infty\ \forall x \notin \mathrm{dom}\ p$, and $q : \Re^n \to \Re^k$. Define $f(x) = p(q(x))$. $f$ is convex if
  ▶ $q_i$ is convex and $p$ is convex and nondecreasing in each argument,
  ▶ or $q_i$ is concave and $p$ is convex and nonincreasing in each argument.
  Subgradients for the first case (the second one is homework):

  11. More Subgradient Calculus: Composition (contd)
  Composition with functions: Let $p : \Re^k \to \Re$ with $p(x) = \infty\ \forall x \notin \mathrm{dom}\ p$, and $q : \Re^n \to \Re^k$. Define $f(x) = p(q(x))$. $f$ is convex if
  ▶ $q_i$ is convex and $p$ is convex and nondecreasing in each argument,
  ▶ or $q_i$ is concave and $p$ is convex and nonincreasing in each argument.
  Subgradients for the first case (the second one is homework):
  ▶ $f(y) = p(q_1(y), \ldots, q_k(y)) \ge p\left(q_1(x) + h_{q_1}^T(y - x), \ldots, q_k(x) + h_{q_k}^T(y - x)\right)$, where $h_{q_i} \in \partial q_i(x)$ for $i = 1..k$, since $p(\cdot)$ is nondecreasing in each argument.
  ▶ $p\left(q_1(x) + h_{q_1}^T(y - x), \ldots, q_k(x) + h_{q_k}^T(y - x)\right) \ge p(q_1(x), \ldots, q_k(x)) + h_p^T\left(h_{q_1}^T(y - x), \ldots, h_{q_k}^T(y - x)\right)$, where $h_p \in \partial p(q_1(x), \ldots, q_k(x))$.
  All we need to do next is club together $h_p$ and the $h_{q_i}$ and leave only $(y - x)$ in the second component.

  12. More Subgradient Calculus: Composition (contd)
  Composition with functions: Let $p : \Re^k \to \Re$ with $p(x) = \infty\ \forall x \notin \mathrm{dom}\ p$, and $q : \Re^n \to \Re^k$. Define $f(x) = p(q(x))$. $f$ is convex if
  ▶ $q_i$ is convex and $p$ is convex and nondecreasing in each argument,
  ▶ or $q_i$ is concave and $p$ is convex and nonincreasing in each argument.
  Subgradients for the first case (the second one is homework):
  ▶ $f(y) = p(q_1(y), \ldots, q_k(y)) \ge p\left(q_1(x) + h_{q_1}^T(y - x), \ldots, q_k(x) + h_{q_k}^T(y - x)\right)$, where $h_{q_i} \in \partial q_i(x)$ for $i = 1..k$, since $p(\cdot)$ is nondecreasing in each argument.
  ▶ $p\left(q_1(x) + h_{q_1}^T(y - x), \ldots, q_k(x) + h_{q_k}^T(y - x)\right) \ge p(q_1(x), \ldots, q_k(x)) + h_p^T\left(h_{q_1}^T(y - x), \ldots, h_{q_k}^T(y - x)\right)$, where $h_p \in \partial p(q_1(x), \ldots, q_k(x))$.
  ▶ $p(q_1(x), \ldots, q_k(x)) + h_p^T\left(h_{q_1}^T(y - x), \ldots, h_{q_k}^T(y - x)\right) = f(x) + \sum_{i=1}^{k} (h_p)_i\, h_{q_i}^T (y - x)$
  That is, $\sum_{i=1}^{k} (h_p)_i\, h_{q_i}$ is a subgradient of the composite function at $x$ (see the sketch below).
  H/W: Derive the subdifferentials of the example functions on the previous slide.
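
The rule just derived can be exercised directly. In this sketch (our own example, not from the slides) we take $p = \max$ (convex, nondecreasing in each argument) and $q_i(x) = |a_i^T x|$ (convex), build $h = \sum_i (h_p)_i h_{q_i}$, and spot-check the subgradient inequality:

```python
import numpy as np

# Sketch of the composition rule: a subgradient of
# f(x) = p(q_1(x), ..., q_k(x)) is sum_i (h_p)_i * h_{q_i}.
# Here p = max and q_i(x) = |a_i^T x|; the data A, x is invented.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

def f(v):
    return np.max(np.abs(A @ v))          # p(q(v)) with p = max, q_i = |a_i^T v|

q = np.abs(A @ x)
h_q = np.sign(A @ x)[:, None] * A         # row i is h_{q_i} = sign(a_i^T x) a_i
h_p = np.zeros(4)
h_p[np.argmax(q)] = 1.0                   # a subgradient of max at the point q

h = h_q.T @ h_p                           # sum_i (h_p)_i h_{q_i}

# Spot-check the subgradient inequality f(z) >= f(x) + h^T (z - x):
for _ in range(100):
    z = rng.standard_normal(3)
    assert f(z) >= f(x) + h @ (z - x) - 1e-10
```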

  13. More Subgradient Calculus: Proximal Operator
  The following functions are again convex, but again, may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability?
  Infimum: If $c(x, y)$ is convex in $(x, y)$ and $\mathcal{C}$ is a convex set, then $d(x) = \inf_{y \in \mathcal{C}} c(x, y)$ is convex. For example:
  ▶ Let $d(x, \mathcal{C})$ be the function that returns the distance of a point $x$ to a convex set $\mathcal{C}$. That is, $d(x, \mathcal{C}) = \inf_{y \in \mathcal{C}} \|x - y\| = \|x - P_{\mathcal{C}}(x)\|$, where $P_{\mathcal{C}}(x) = \mathrm{argmin}_{y \in \mathcal{C}} \|x - y\|$. Then $d(x, \mathcal{C})$ is a convex function and $\nabla d(x, \mathcal{C}) = \frac{x - P_{\mathcal{C}}(x)}{\|x - P_{\mathcal{C}}(x)\|}$.
  H/W: Prove that $d$ is convex if $c$ is a convex function and $\mathcal{C}$ is a convex set.
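
To make the gradient formula concrete, the sketch below uses the Euclidean unit ball as $\mathcal{C}$ (our choice, since its projection has a closed form) and compares the stated gradient against a finite-difference approximation:

```python
import numpy as np

# Sketch of the distance-function gradient, with C chosen by us to be
# the Euclidean unit ball so the projection P_C has a closed form.
def project_ball(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def dist(x):
    return np.linalg.norm(x - project_ball(x))

x = np.array([2.0, 1.0, -2.0])            # a point outside the ball
g = (x - project_ball(x)) / dist(x)       # the stated gradient formula

# Compare against a central finite-difference gradient:
eps = 1e-6
fd = np.array([(dist(x + eps * e) - dist(x - eps * e)) / (2 * eps)
               for e in np.eye(3)])
assert np.allclose(g, fd, atol=1e-5)
```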

  14. More Subgradient Calculus: Proximal Operator
  The following functions are again convex, but again, may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability?
  Infimum: If $c(x, y)$ is convex in $(x, y)$ and $\mathcal{C}$ is a convex set, then $d(x) = \inf_{y \in \mathcal{C}} c(x, y)$ is convex. For example:
  ▶ Let $d(x, \mathcal{C})$ be the function that returns the distance of a point $x$ to a convex set $\mathcal{C}$. That is, $d(x, \mathcal{C}) = \inf_{y \in \mathcal{C}} \|x - y\| = \|x - P_{\mathcal{C}}(x)\|$, where $P_{\mathcal{C}}(x) = \mathrm{argmin}_{y \in \mathcal{C}} \|x - y\|$. Then $d(x, \mathcal{C})$ is a convex function and $\nabla d(x, \mathcal{C}) = \frac{x - P_{\mathcal{C}}(x)}{\|x - P_{\mathcal{C}}(x)\|}$. One can find a point in the intersection of convex sets $\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_m$ by minimizing... (Subgradients and Alternating Projections).
  ▶ $P_{\mathcal{C}}(x) = \mathrm{argmin}_{y \in \mathcal{C}} \|x - y\|$ is a special case of the proximity operator $\mathrm{PROX}_c(x)$ of a convex function $c(x)$. Here, $\mathrm{PROX}_c(x) = \mathrm{argmin}_y\ c(y) + \frac{1}{2}\|x - y\|^2$. The special case is when $c(x)$ is the indicator function over $\mathcal{C}$.
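
Connecting back to the Lasso slide: for $c(y) = \lambda \|y\|_1$ the proximity operator has the closed-form soft-thresholding solution. The sketch below (with invented data) checks the candidate minimizer against random perturbations:

```python
import numpy as np

# Sketch linking PROX to the Lasso slide: for c(y) = lambda * ||y||_1,
# PROX_c(x) = argmin_y c(y) + 0.5 ||x - y||^2 is soft thresholding.
# (For c = indicator of C, PROX_c(x) reduces to the projection P_C(x).)
def prox_l1(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

lam = 0.7
x = np.array([2.0, 0.3, -1.0])            # invented data
p = prox_l1(x, lam)                       # expected: [1.3, 0.0, -0.3]

# Check minimality against random perturbations of the candidate p:
obj = lambda y: lam * np.abs(y).sum() + 0.5 * np.sum((x - y) ** 2)
rng = np.random.default_rng(4)
for _ in range(1000):
    y = p + 0.1 * rng.standard_normal(3)
    assert obj(p) <= obj(y) + 1e-12
```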
