SLIDE 1

More Subgradient Calculus: Proximal Operator

The following functions are again convex but, again, may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability?

Infimum: If c(x, y) is convex in (x, y) and C is a convex set, then d(x) = inf_{y∈C} c(x, y) is convex. For example:

▶ Let d(x, C) denote the distance of a point x from a convex set C. That is, d(x, C) = inf_{y∈C} ||x − y|| = ||x − P_C(x)||, where P_C(x) = argmin_{y∈C} ||x − y|| is the projection of x onto C. Then d(x, C) is a convex function and

∇d(x, C) = (x − P_C(x)) / ||x − P_C(x)||

▶ A point in the intersection of convex sets C₁, C₂, …, C_m can be found by minimizing such distances (subgradients and alternating projections).

▶ argmin_{y∈C} ||x − y||² is a special case of the proximity operator prox_c(x) = argmin_y PROX_c(x, y) of a convex function c(x). Here, PROX_c(x, y) = c(y) + ½||x − y||². The special case is when c(y) is the indicator function I_C(y) introduced earlier to eliminate the constraints of an optimization problem.

⋆ Recall that ∂I_C(y) = N_C(y) = {h ∈ ℜⁿ : hᵀy ≥ hᵀz for any z ∈ C}

⋆ The subdifferential ∂PROX_c(x, y) = ∂c(y) + y − x, which can now be obtained for the special case c(y) = I_C(y).

⋆ We will invoke this when we discuss the proximal gradient descent algorithm.
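To make these objects concrete, here is a minimal numerical sketch (illustrative, not from the slides; the helper names project_ball and dist_ball are hypothetical). For C the unit Euclidean ball, it computes P_C(x) and d(x, C), and checks the gradient formula ∇d(x, C) = (x − P_C(x))/||x − P_C(x)|| against a finite-difference estimate:

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Projection P_C(x) onto the Euclidean ball C = {y : ||y|| <= radius}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else (radius / nrm) * x

def dist_ball(x, radius=1.0):
    """d(x, C) = ||x - P_C(x)|| = inf over y in C of ||x - y||."""
    return np.linalg.norm(x - project_ball(x, radius))

rng = np.random.default_rng(0)
x = 3.0 * rng.normal(size=5)        # a point that lies outside the unit ball

# Closed-form gradient from the slide: (x - P_C(x)) / ||x - P_C(x)||
g_closed = (x - project_ball(x)) / dist_ball(x)

# Central finite differences for comparison
eps = 1e-6
g_fd = np.array([(dist_ball(x + eps * e) - dist_ball(x - eps * e)) / (2 * eps)
                 for e in np.eye(5)])
print(np.allclose(g_closed, g_fd, atol=1e-5))   # True
```

The two gradients agree wherever x is outside C, where d(·, C) is differentiable.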

SLIDE 2

More Subgradient Calculus: Perspective (Advanced)

The following functions are again convex but, again, may not be differentiable everywhere. How does one compute their subgradients at points of non-differentiability?

Perspective Function: The perspective of a function f : ℜⁿ → ℜ is the function g : ℜⁿ × ℜ → ℜ, g(x, t) = t f(x/t). The function g is convex on dom g = {(x, t) | x/t ∈ dom f, t > 0} if f is convex. For example,

▶ The perspective of f(x) = xᵀx is the quadratic-over-linear function g(x, t) = xᵀx/t, and it is convex.

▶ The perspective of the negative logarithm f(x) = − log x is the relative entropy function g(x, t) = t log t − t log x (of t relative to x), and it is convex.
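As a quick sanity check of the perspective construction (an illustrative sketch, not part of the slides), the following samples random points in dom g and verifies midpoint convexity for the quadratic-over-linear function g(x, t) = xᵀx/t:

```python
import numpy as np

def g(x, t):
    """Quadratic-over-linear: the perspective of f(x) = x^T x."""
    return x @ x / t

rng = np.random.default_rng(1)
ok = True
for _ in range(10_000):
    x1, x2 = rng.normal(size=3), rng.normal(size=3)
    t1, t2 = rng.uniform(0.1, 5.0, size=2)      # keep t > 0, i.e. inside dom g
    mid = g((x1 + x2) / 2, (t1 + t2) / 2)
    ok &= mid <= (g(x1, t1) + g(x2, t2)) / 2 + 1e-12
print(ok)   # True: midpoint convexity holds on all samples
```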

SLIDE 3

Illustrating the Why and How of (Sub)Gradients on Lasso

SLIDE 4

Recap: Subgradients for the ‘Lasso’ Problem in Machine Learning

Recall Lasso (min_x f(x)) as an example to illustrate subgradients of affine composition:

f(x) = ½||y − x||² + λ||x||₁

The subgradients of f(x) are …


Note: y is fixed; the subgradients are h = x − y + λs, with s_i = sign(x_i) if x_i ≠ 0, and s_i ∈ [−1, 1] otherwise.
SLIDE 5

Recap: Subgradients for the ‘Lasso’ Problem in Machine Learning

Recall Lasso (min_x f(x)) as an example to illustrate subgradients of affine composition:

f(x) = ½||y − x||² + λ||x||₁

The subgradients of f(x) are h = x − y + λs, where s_i = sign(x_i) if x_i ≠ 0 and s_i ∈ [−1, 1] if x_i = 0.


Note: this results from the convex hull of the union of subdifferentials. Here we only see "HOW" to compute the subdifferential.
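A short illustrative sketch (assumed helper names, not from the slides): it builds a subgradient h = x − y + λs at a point with zero coordinates and verifies the subgradient inequality f(z) ≥ f(x) + hᵀ(z − x) on random test points:

```python
import numpy as np

def f(x, y, lam):
    """Lasso objective f(x) = 0.5 * ||y - x||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((y - x) ** 2) + lam * np.sum(np.abs(x))

def subgradient(x, y, lam):
    """h = x - y + lam * s, with s_i = sign(x_i) if x_i != 0, else any s_i in [-1, 1]."""
    s = np.sign(x)            # sign() is 0 where x_i == 0, a valid choice in [-1, 1]
    return x - y + lam * s

rng = np.random.default_rng(2)
y, lam = rng.normal(size=6), 0.5
x = rng.normal(size=6); x[:2] = 0.0   # include non-differentiable coordinates

h = subgradient(x, y, lam)
# Subgradient inequality: f(z) >= f(x) + h^T (z - x) for all z
checks = [f(z, y, lam) >= f(x, y, lam) + h @ (z - x) - 1e-9
          for z in rng.normal(size=(1000, 6))]
print(all(checks))   # True
```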

SLIDE 6

Subgradients in a Lasso sub-problem: Sufficient Condition Test

We illustrate the sufficient condition again using a sub-problem in Lasso as an example. Consider the simplified Lasso problem (which is a sub-problem in Lasso):

f(x) = ½||y − x||² + λ||x||₁

Recall the subgradients of f(x): h = x − y + λs, where s_i = sign(x_i) if x_i ≠ 0 and s_i ∈ [−1, 1] if x_i = 0. A solution to this problem is …


Invoking "WHY" of subdiff.. min_x x_i = 0 if y_i is between -\lambda and \lambda and there exists an s_i between -1 and +1 for this case In fact this s_i = y_i / lambda

SLIDE 7

Subgradients in a Lasso sub-problem: Sufficient Condition Test

We illustrate the sufficient condition again using a sub-problem in Lasso as an example.

Consider the simplified Lasso problem (which is a sub-problem in Lasso):

f(x) = ½||y − x||² + λ||x||₁

Recall the subgradients of f(x): h = x − y + λs, where s_i = sign(x_i) if x_i ≠ 0 and s_i ∈ [−1, 1] if x_i = 0. A solution to this problem is x = S_λ(y), where S_λ(y) is the soft-thresholding operator, defined componentwise by

[S_λ(y)]_i = y_i − λ   if y_i > λ
[S_λ(y)]_i = 0         if −λ ≤ y_i ≤ λ
[S_λ(y)]_i = y_i + λ   if y_i < −λ

Now if x = S_λ(y), then there exists an h(x) = 0. Why? If y_i > λ, we have x_i − y_i + λs_i = −λ + λ · 1 = 0. The case of y_i < −λ is similar. If −λ ≤ y_i ≤ λ, we have x_i − y_i + λs_i = −y_i + λ(y_i/λ) = 0. Here, s_i = y_i/λ ∈ [−1, 1].

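The soft-thresholding operator is easy to implement; the sketch below (illustrative, not from the slides) also checks numerically that the subgradient choice above makes h vanish at x = S_λ(y):

```python
import numpy as np

def soft_threshold(y, lam):
    """S_lambda(y): componentwise shrink y toward 0 by lam, zeroing |y_i| <= lam."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

rng = np.random.default_rng(7)
y, lam = rng.normal(size=8), 0.6
x = soft_threshold(y, lam)

# Choose s_i = sign(x_i) where x_i != 0, and s_i = y_i / lam (in [-1, 1]) where x_i == 0
s = np.where(x != 0, np.sign(x), y / lam)
h = x - y + lam * s
print(np.allclose(h, 0.0))   # True: 0 is a subgradient of f at S_lambda(y)
```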

SLIDE 8

Proximal Operator and Sufficient Condition Test

Recap: d(x, C) returns the distance of a point x from a convex set C. That is, d(x, C) = inf_{y∈C} ||x − y||. Then d(x, C) is a convex function.

Recap: argmin_{y∈C} ||x − y||² is a special case of the proximal operator prox_c(x) = argmin_y PROX_c(x, y) of a convex function c(x). Here, PROX_c(x, y) = c(y) + ½||x − y||². The special case is when c(y) is the indicator function I_C(y) introduced earlier to eliminate the constraints of an optimization problem.

▶ Recall that ∂I_C(y) = N_C(y) = {h ∈ ℜⁿ : hᵀy ≥ hᵀz for any z ∈ C}

▶ For the special case c(y) = I_C(y), the subdifferential ∂PROX_c(x, y) = ∂I_C(y) + y − x = {h + y − x : h ∈ ℜⁿ, hᵀy ≥ hᵀz for any z ∈ C}

▶ As per the sufficient condition for a minimum, for this special case prox_c(x) is that y ∈ C that is closest to x, i.e., prox_c(x) = P_C(x).
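A small numerical illustration (hypothetical helper project_box, not from the slides) for C a box: the prox of the indicator I_C is exactly the projection P_C, since PROX_c(x, y) is finite only for y ∈ C:

```python
import numpy as np

def project_box(x, lo=0.0, hi=1.0):
    """P_C(x) for the box C = [lo, hi]^n; also prox of the indicator I_C."""
    return np.clip(x, lo, hi)

rng = np.random.default_rng(3)
x = 2.0 * rng.normal(size=4)
p = project_box(x)

# PROX_c(x, y) = I_C(y) + 0.5 * ||x - y||^2 is finite only for y in C, so
# prox_c(x) = argmin over y in C of 0.5 * ||x - y||^2 = P_C(x).
# Check: no random feasible y does better than the projection p.
ys = rng.uniform(0.0, 1.0, size=(100_000, 4))
best = (0.5 * np.sum((x - ys) ** 2, axis=1)).min()
print(0.5 * np.sum((x - p) ** 2) <= best + 1e-12)   # True
```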

SLIDE 9

Convexity by Restriction to Line, (Sub)Gradients and Monotonicity

SLIDE 10

Convexity by Restricting to Line

A useful technique for verifying the convexity of a function is to restrict the function to a line and check the convexity of the resulting function of a single variable.

Theorem: A function f : D → ℜ is (strictly) convex if and only if the function ϕ : D_ϕ → ℜ defined by ϕ(t) = f(x + th), with domain D_ϕ = {t | x + th ∈ D}, is (strictly) convex in t for every x ∈ ℜⁿ and every h ∈ ℜⁿ.

Thus, we have seen that:

If a function has a local optimum at x*, it has a local optimum along each component x_i of x*.

If a function is convex in x, it will be convex in each component x_i of x.

Note: earlier we saw the connection with ℜ: a differentiable function is convex iff its restriction is convex along every direction (via directional derivatives). Here we see the connection with directions (the direction vector h of a line) independently of differentiability. A numerical sketch follows below.
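Below is a heuristic sketch of this technique (illustrative; random sampling gives evidence of convexity, not a proof):

```python
import numpy as np

def is_convex_along_lines(f, dim, trials=2000, seed=4):
    """Heuristically test convexity of f by restricting to random lines
    phi(t) = f(x + t*h) and checking midpoint convexity in t."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, h = rng.normal(size=dim), rng.normal(size=dim)
        t1, t2 = rng.normal(size=2)
        phi = lambda t: f(x + t * h)
        if phi((t1 + t2) / 2) > (phi(t1) + phi(t2)) / 2 + 1e-9:
            return False        # found a violating line: f is not convex
    return True                 # no violation found (evidence, not proof)

print(is_convex_along_lines(lambda v: np.log(np.sum(np.exp(v))), dim=3))  # log-sum-exp: True
print(is_convex_along_lines(lambda v: np.sin(v).sum(), dim=3))            # not convex: False
```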

SLIDE 11

Convexity by Restricting to Line (contd.)

Proof: We will prove the necessity and sufficiency of the convexity of ϕ for a convex function f. The proof of necessity and sufficiency of the strict convexity of ϕ for a strictly convex f is very similar and is left as an exercise.

Proof of Necessity: Assume that f is convex. We need to prove that ϕ(t) = f(x + th) is also convex (for any direction h). Let t₁, t₂ ∈ D_ϕ and θ ∈ [0, 1]. Then,

ϕ(θt₁ + (1 − θ)t₂) = f(θ(x + t₁h) + (1 − θ)(x + t₂h)) ≤ θf(x + t₁h) + (1 − θ)f(x + t₂h)

SLIDE 12

Convexity by Restricting to Line (contd.)

Proof: We will prove the necessity and sufficiency of the convexity of ϕ for a convex function f. The proof of necessity and sufficiency of the strict convexity of ϕ for a strictly convex f is very similar and is left as an exercise.

Proof of Necessity: Assume that f is convex. We need to prove that ϕ(t) = f(x + th) is also convex. Let t₁, t₂ ∈ D_ϕ and θ ∈ [0, 1]. Then,

ϕ(θt₁ + (1 − θ)t₂) = f(θ(x + t₁h) + (1 − θ)(x + t₂h)) ≤ θf(x + t₁h) + (1 − θ)f(x + t₂h) = θϕ(t₁) + (1 − θ)ϕ(t₂)    (16)

Thus, ϕ is convex.

SLIDE 13

Convexity by Restricting to Line (contd.)

Proof of Sufficiency: Assume that for every h ∈ ℜⁿ and every x ∈ ℜⁿ, ϕ(t) = f(x + th) is convex. We will prove that f is convex. Let x₁, x₂ ∈ D. Take x = x₁ and h = x₂ − x₁. We know that ϕ(t) = f(x₁ + t(x₂ − x₁)) is convex, with ϕ(1) = f(x₂) and ϕ(0) = f(x₁). Therefore, for any θ ∈ [0, 1],

f(θx₂ + (1 − θ)x₁) = ϕ(θ)

Note: this is ≤ θϕ(1) + (1 − θ)ϕ(0) = θf(x₂) + (1 − θ)f(x₁).

SLIDE 14

Convexity by Restricting to Line (contd.)

Proof of Sufficiency: Assume that for every h ∈ ℜⁿ and every x ∈ ℜⁿ, ϕ(t) = f(x + th) is convex. We will prove that f is convex. Let x₁, x₂ ∈ D. Take x = x₁ and h = x₂ − x₁. We know that ϕ(t) = f(x₁ + t(x₂ − x₁)) is convex, with ϕ(1) = f(x₂) and ϕ(0) = f(x₁). Therefore, for any θ ∈ [0, 1],

f(θx₂ + (1 − θ)x₁) = ϕ(θ) ≤ θϕ(1) + (1 − θ)ϕ(0) = θf(x₂) + (1 − θ)f(x₁)    (17)

This implies that f is convex.

SLIDE 15

More on SubGradient kind of functions: Monotonicity

A differentiable function f : ℜ → ℜ is (strictly) convex if and only if f′(x) is (strictly) increasing. Is there an analogous characterization for f : ℜⁿ → ℜ?

Ans: Yes, but we need a notion of monotonicity of vectors (subgradients). A one-dimensional sanity check follows below.
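For the one-dimensional statement, a tiny illustrative check (not from the slides): the derivative of a convex function is non-decreasing, while that of a non-convex one need not be:

```python
import numpy as np

xs = np.linspace(-3, 3, 1001)
d_convex = 4 * xs ** 3           # f(x) = x^4 (convex): derivative is increasing
d_nonconvex = np.cos(xs)         # f(x) = sin(x) (not convex): derivative is not
print(np.all(np.diff(d_convex) >= 0), np.all(np.diff(d_nonconvex) >= 0))  # True False
```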

SLIDE 16

More on SubGradient kind of functions: Monotonicity

A differentiable function f : ℜ → ℜ is (strictly) convex if and only if f′(x) is (strictly) increasing. Is there an analogous characterization for f : ℜⁿ → ℜ? View the subgradient as an instance of a general function h : D → ℜⁿ with D ⊆ ℜⁿ. Then

h is monotone iff the dot product of h(x) − h(y) with x − y is non-negative for all x and y. The component-wise notion of monotonicity of a vector map h is a special case of this more general monotonicity.
SLIDE 17

More on SubGradient kind of functions: Monotonicity

A differentiable function f : ℜ → ℜ is (strictly) convex if and only if f′(x) is (strictly) increasing. Is there an analogous characterization for f : ℜⁿ → ℜ? View the subgradient as an instance of a general function h : D → ℜⁿ with D ⊆ ℜⁿ. Then

Definition

1. h is monotone on D if for any x₁, x₂ ∈ D,

(h(x₁) − h(x₂))ᵀ(x₁ − x₂) ≥ 0    (18)

The component-wise notion of monotonicity of a vector map h is a special case of this more general monotonicity.
SLIDE 18

More on SubGradient kind of functions: Monotonicity (contd)

Definition

2. h is strictly monotone on D if for any x₁, x₂ ∈ D with x₁ ≠ x₂,

(h(x₁) − h(x₂))ᵀ(x₁ − x₂) > 0    (19)

3. h is uniformly or strongly monotone on D if there is a constant c > 0 such that for any x₁, x₂ ∈ D,

(h(x₁) − h(x₂))ᵀ(x₁ − x₂) ≥ c||x₁ − x₂||²    (20)

Note: several other notions of uniform monotonicity can be motivated by replacing this quadratic (L2-norm-based) lower bound with other lower bounds, e.g., divergence functions between x₁ and x₂. A numerical check of monotonicity follows below.
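Here is a small numerical check of Definition 1 (illustrative, not from the slides): the gradient map of the convex log-sum-exp function is monotone:

```python
import numpy as np

def grad_logsumexp(x):
    """Gradient of the convex function f(x) = log(sum(exp(x))), i.e. the softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(5)
ok = True
for _ in range(10_000):
    x1, x2 = rng.normal(size=4), rng.normal(size=4)
    ok &= (grad_logsumexp(x1) - grad_logsumexp(x2)) @ (x1 - x2) >= -1e-12
print(ok)   # True: the gradient map of a convex function is monotone
```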

SLIDE 19

(Sub)Gradients and Convexity

Based on the definition of monotone functions, we next show the relationship between the convexity of a function and the monotonicity of its (sub)gradient:

Theorem: Let f : D → ℜ with D ⊆ ℜⁿ be differentiable on the convex set D. Then,

1. f is convex on D if and only if its gradient ∇f is monotone. That is, for all x, y ∈ D:
(∇f(x) − ∇f(y))ᵀ(x − y) ≥ 0

2. f is strictly convex on D if and only if its gradient ∇f is strictly monotone. That is, for all x, y ∈ D with x ≠ y:
(∇f(x) − ∇f(y))ᵀ(x − y) > 0

3. f is uniformly or strongly convex on D if and only if its gradient ∇f is uniformly monotone. That is, for all x, y ∈ D:
(∇f(x) − ∇f(y))ᵀ(x − y) ≥ c||x − y||² for some constant c > 0.

While these results also hold for subgradients h, we will show them only for gradients ∇f.

Note: to prove the equivalence, we invoke the ϕ defined previously, as well as the mean value theorem applied to ϕ.

SLIDE 20

(Sub)Gradients and Convexity (contd)

Proof:

Necessity: Suppose f is strongly convex on D. Then we know from an earlier result that for any x, y ∈ D,

f(y) ≥ f(x) + ∇ᵀf(x)(y − x) + ½c||y − x||²

f(x) ≥ f(y) + ∇ᵀf(y)(x − y) + ½c||x − y||²

Adding the two inequalities,

SLIDE 21

(Sub)Gradients and Convexity (contd)

Proof:

Necessity: Suppose f is strongly convex on D. Then we know from an earlier result that for any x, y ∈ D,

f(y) ≥ f(x) + ∇ᵀf(x)(y − x) + ½c||y − x||²

f(x) ≥ f(y) + ∇ᵀf(y)(x − y) + ½c||x − y||²

Adding the two inequalities, we get uniform/strong monotonicity as in definition (3). If f is convex, the inequalities hold with c = 0, yielding monotonicity as in definition (1). If f is strictly convex, the inequalities will be strict, yielding strict monotonicity as in definition (2).

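For completeness, the addition step can be written out as follows (a worked version of the argument, under the strong-convexity inequalities stated above):

```latex
% Adding the two strong-convexity inequalities and cancelling f(x) + f(y):
\begin{align*}
f(y) + f(x) \;&\ge\; f(x) + f(y)
  + \left(\nabla f(x) - \nabla f(y)\right)^{T}(y - x) + c\,\|x - y\|^{2} \\
\Longrightarrow \quad
\left(\nabla f(x) - \nabla f(y)\right)^{T}(x - y) \;&\ge\; c\,\|x - y\|^{2}
\end{align*}
```

which is exactly uniform/strong monotonicity (20) with h = ∇f; setting c = 0 recovers plain monotonicity (18).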

SLIDE 22

(Sub)Gradients and Convexity (contd)

Sufficiency: Suppose ∇f is monotone. For any fixed x, y ∈ D, consider the function ϕ(t) = f(x + t(y − x)). By the mean value theorem applied to ϕ(t), we should have, for some t ∈ (0, 1),

SLIDE 23

(Sub)Gradients and Convexity (contd)

Sufficiency: Suppose ∇f is monotone. For any fixed x, y ∈ D, consider the function ϕ(t) = f(x + t(y − x)). By the mean value theorem applied to ϕ(t), we should have, for some t ∈ (0, 1),

ϕ(1) − ϕ(0) = ϕ′(t)    (21)

Letting z = x + t(y − x), (21) translates to

f(y) − f(x) = ∇ᵀf(z)(y − x)    (22)

Also, by the definition of monotonicity of ∇f,

(∇f(z) − ∇f(x))ᵀ(y − x) = (1/t)(∇f(z) − ∇f(x))ᵀ(z − x) ≥ 0    (23)

Combining (22) and (23), f(y) − f(x) = ∇ᵀf(z)(y − x) ≥ ∇ᵀf(x)(y − x); since this first-order inequality holds for all x, y ∈ D, f is convex.

SLIDE 24

Descent Algorithms


Some insights on why descent algorithms (based on subgradients, for example) will behave well on convex functions:

1) Vanishing of the subgradient is a sufficient condition for minimization of a convex function; this is handy for constrained optimization as well.

2) If f is convex and differentiable, the subgradient is unique and equals the gradient. In general, the convergence rates using gradients are much better than those using subgradients.

3) For a convex function, any point of local minimum is a point of global minimum.

4) (Sub)gradients exhibit some monotonic behaviour when the function is convex.

SLIDE 25

Descent Algorithms for Optimizing Unconstrained Problems

Techniques relevant for most (convex) optimization problems that do not yield closed-form solutions. We will start with unconstrained minimization:

min_{x∈D} f(x)

For analysis:

Assume that f is convex and differentiable and that it attains a finite optimal value p*.

Minimization techniques produce a sequence of points x^(k) ∈ D, k = 0, 1, …, such that f(x^(k)) → p* as k → ∞ or ∇f(x^(k)) → 0 as k → ∞.

Iterative techniques for optimization further require a starting point x^(0) ∈ D and sometimes that epi(f) is closed. epi(f) can be inferred to be closed either if D = ℜⁿ

or if f(x) → ∞ as x → ∂D. For example, f(x) = 1/x with domain x > 0 satisfies the latter condition (it blows up at the boundary of its domain), and hence its epi(f) is closed.
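To illustrate how such a sequence x^(k) is produced, here is a minimal gradient descent sketch (illustrative; the fixed step size and the least-squares objective are assumptions, not from the slides):

```python
import numpy as np

def gradient_descent(grad_f, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Produce x^(k+1) = x^(k) - step * grad_f(x^(k)) until the gradient is small."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:      # stationarity: sufficient for a convex f
            break
        x = x - step * g
    return x

# Example: f(x) = 0.5 * ||A x - b||^2, a smooth convex function
rng = np.random.default_rng(6)
A, b = rng.normal(size=(8, 4)), rng.normal(size=8)
grad = lambda x: A.T @ (A @ x - b)

# Step 1/L with L = sigma_max(A)^2, the Lipschitz constant of the gradient
x_star = gradient_descent(grad, np.zeros(4), step=1.0 / np.linalg.norm(A, 2) ** 2)
print(np.allclose(A.T @ (A @ x_star - b), 0, atol=1e-6))   # True: gradient vanishes
```

With this step size the iterates satisfy f(x^(k)) → p*, matching the convergence requirement stated above.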