

SLIDE 1

Gradient and Epigraph (contd)

As an example, consider the paraboloid f(x₁, x₂) = x₁² + x₂² − 9, which attains its minimum at (0, 0). We see below its epigraph.

[Figure: epigraph of f with a supporting hyperplane (or lower bound) at (0, 0)]


SLIDE 2

Illustrations to understand Gradient

For the paraboloid f(x₁, x₂) = x₁² + x₂² − 9, consider the corresponding F(x₁, x₂, z) = x₁² + x₂² − 9 − z and the point x₀ = (1, 1, −7), which lies on the 0-level surface of F. The gradient ∇F(x₁, x₂, z) is [2x₁, 2x₂, −1], which when evaluated at x₀ = (1, 1, −7) is [2, 2, −1]. The equation of the tangent plane to f at x₀ is therefore given by 2(x₁ − 1) + 2(x₂ − 1) − 7 = z.

The paraboloid attains its minimum at (0, 0). Plot the tangent plane to the surface at (0, 0, f(0, 0)) as well as the gradient vector ∇F at (0, 0, f(0, 0)). What do you expect?


SLIDE 3

Illustrations to understand Gradient

For the paraboloid f(x₁, x₂) = x₁² + x₂² − 9, consider the corresponding F(x₁, x₂, z) = x₁² + x₂² − 9 − z and the point x₀ = (1, 1, −7), which lies on the 0-level surface of F. The gradient ∇F(x₁, x₂, z) is [2x₁, 2x₂, −1], which when evaluated at x₀ = (1, 1, −7) is [2, 2, −1]. The equation of the tangent plane to f at x₀ is therefore given by 2(x₁ − 1) + 2(x₂ − 1) − 7 = z.

The paraboloid attains its minimum at (0, 0). Plot the tangent plane to the surface at (0, 0, f(0, 0)) as well as the gradient vector ∇F at (0, 0, f(0, 0)). What do you expect? Ans: A horizontal tangent plane and a vertical gradient!
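A quick numerical sanity check of these two slides (a minimal NumPy sketch; the helper names are ours, not the course's):

```python
import numpy as np

def f(x1, x2):
    # The paraboloid from the slide: f(x1, x2) = x1^2 + x2^2 - 9
    return x1**2 + x2**2 - 9

def grad_F(x1, x2):
    # Gradient of F(x1, x2, z) = x1^2 + x2^2 - 9 - z, i.e. [2*x1, 2*x2, -1]
    return np.array([2.0 * x1, 2.0 * x2, -1.0])

print(grad_F(1, 1))                          # [ 2.  2. -1.] at x0 = (1, 1, -7)

# Tangent plane at (1, 1): z = 2(x1 - 1) + 2(x2 - 1) - 7
tangent = lambda x1, x2: 2 * (x1 - 1) + 2 * (x2 - 1) - 7
print(np.isclose(tangent(1, 1), f(1, 1)))    # True: the plane touches the surface

# At the minimum (0, 0): the gradient [0, 0, -1] is vertical and the tangent
# plane z = f(0, 0) = -9 is horizontal, matching the answer above.
print(grad_F(0, 0))                          # [ 0.  0. -1.]
```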


SLIDE 4

First-Order Convexity Conditions: The complete statement

Theorem

For differentiable f : D → ℜ and open convex set D:

1. f is convex iff, for any x, y ∈ D,
   f(y) ≥ f(x) + ∇ᵀf(x)(y − x)   (9)

2. f is strictly convex iff, for any x, y ∈ D with x ≠ y,
   f(y) > f(x) + ∇ᵀf(x)(y − x)   (10)   [strict lower bound]

3. f is strongly convex iff, for any x, y ∈ D, and for some constant c > 0,
   f(y) ≥ f(x) + ∇ᵀf(x)(y − x) + (c/2)∥y − x∥₂²   (11)
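A minimal numeric spot-check of inequality (9), reusing the earlier paraboloid as a convex f (our own test harness, not from the slides):

```python
import numpy as np

f = lambda x: x[0]**2 + x[1]**2 - 9          # convex, so (9) must hold everywhere
grad = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    # First-order lower bound: f(y) >= f(x) + grad(x)^T (y - x)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-12
print("inequality (9) held on all sampled pairs")
```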


SLIDE 5

First-Order Convexity Condition: Proof

Proof: Sufficiency: The proof of sufficiency is very similar for all three statements of the theorem, so we will prove only statement (9). Suppose (9) holds. Consider x₁, x₂ ∈ D and any θ ∈ (0, 1). Let x = θx₁ + (1 − θ)x₂. Then,

f(x₁) ≥ f(x) + ∇ᵀf(x)(x₁ − x)   [multiply by θ]
f(x₂) ≥ f(x) + ∇ᵀf(x)(x₂ − x)   [multiply by 1 − θ]   (12)

... and add.

SLIDE 6

First-Order Convexity Condition: Proof

Proof: Sufficiency: The proof of sufficiency is very similar for all three statements of the theorem, so we will prove only statement (9). Suppose (9) holds. Consider x₁, x₂ ∈ D and any θ ∈ (0, 1). Let x = θx₁ + (1 − θ)x₂. Then,

f(x₁) ≥ f(x) + ∇ᵀf(x)(x₁ − x)
f(x₂) ≥ f(x) + ∇ᵀf(x)(x₂ − x)   (12)

Adding (1 − θ) times the second inequality to θ times the first, we get

θf(x₁) + (1 − θ)f(x₂) ≥ f(x),

which proves that f is a convex function. In the case of strict convexity, strict inequality holds in (12) and the argument follows through. In the case of strong convexity, we need to additionally account for the quadratic terms:

θ(c/2)∥x₁ − x∥₂² + (1 − θ)(c/2)∥x₂ − x∥₂² = (c/2)θ(1 − θ)∥x₁ − x₂∥₂²,

which follows because x₁ − x = (1 − θ)(x₁ − x₂) and x₂ − x = −θ(x₁ − x₂).
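The identity closing this proof can also be verified symbolically; a small sympy sketch for the one-dimensional case (our own check, assuming sympy is installed):

```python
import sympy as sp

theta, c, x1, x2 = sp.symbols('theta c x1 x2', real=True)
x = theta * x1 + (1 - theta) * x2            # the convex combination
lhs = theta * (c / 2) * (x1 - x)**2 + (1 - theta) * (c / 2) * (x2 - x)**2
rhs = (c / 2) * theta * (1 - theta) * (x1 - x2)**2
print(sp.simplify(lhs - rhs))                # 0: the identity holds
```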


SLIDE 7

First-Order Convexity Conditions: Proofs

Necessity: Suppose f is convex. Then for all θ ∈ (0, 1) and x₁, x₂ ∈ D, we must have

f(θx₂ + (1 − θ)x₁) ≤ θf(x₂) + (1 − θ)f(x₁)

Thus, ∇ᵀf(x₁)(x₂ − x₁) = directional derivative of f at x₁ along x₂ − x₁.


SLIDE 8

First-Order Convexity Conditions: Proofs

Necessity: Suppose f is convex. Then for all θ ∈ (0, 1) and x₁, x₂ ∈ D, we must have

f(θx₂ + (1 − θ)x₁) ≤ θf(x₂) + (1 − θ)f(x₁)

Thus,

∇ᵀf(x₁)(x₂ − x₁) = lim_{θ→0⁺} [f(x₁ + θ(x₂ − x₁)) − f(x₁)] / θ ≤ f(x₂) − f(x₁),

where the inequality follows by using convexity to bound f(x₁ + θ(x₂ − x₁)) above by (1 − θ)f(x₁) + θf(x₂). This proves necessity for (9). The necessity proofs for (10) and (11) are very similar, except for a small difference in the case of strict convexity: the strict inequality is not preserved when we take limits. Suppose equality does hold in the case of strict convexity; that is, for a strictly convex function f, let

f(x₂) = f(x₁) + ∇ᵀf(x₁)(x₂ − x₁)   (13)

for some x₂ ≠ x₁.
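The limit above can be seen numerically: for convex f, the difference quotient is nondecreasing in θ, so it shrinks toward the directional derivative as θ → 0⁺ while staying below its value at θ = 1. A minimal sketch with the running paraboloid (our construction):

```python
import numpy as np

f = lambda x: x[0]**2 + x[1]**2 - 9
grad = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

x1, x2 = np.array([1.0, 1.0]), np.array([3.0, -2.0])
d = x2 - x1
for t in [1.0, 0.5, 0.1, 0.01, 0.001]:
    q = (f(x1 + t * d) - f(x1)) / t          # difference quotient at step t
    print(f"theta={t:6.3f}  quotient={q: .4f}")
print("directional derivative:", grad(x1) @ d)   # quotient decreases to this
print("f(x2) - f(x1):        ", f(x2) - f(x1))   # its value at theta = 1
```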


SLIDE 9

First-Order Convexity Conditions: Proofs

Necessity (contd. for the strict case): Because f is strictly convex, for any θ ∈ (0, 1) we can write

f((1 − θ)x₁ + θx₂) = f(x₁ + θ(x₂ − x₁)) < (1 − θ)f(x₁) + θf(x₂)   (14)

Since (9) is already proved for convex functions, we use it in conjunction with (13) and (14) to get


SLIDE 10

First-Order Convexity Conditions: Proofs

Necessity (contd. for the strict case): Because f is strictly convex, for any θ ∈ (0, 1) we can write

f((1 − θ)x₁ + θx₂) = f(x₁ + θ(x₂ − x₁)) < (1 − θ)f(x₁) + θf(x₂)   (14)

Since (9) is already proved for convex functions, we use it in conjunction with (13) and (14) to get

f(x₁) + θ∇ᵀf(x₁)(x₂ − x₁) ≤ f(x₁ + θ(x₂ − x₁)) < f(x₁) + θ∇ᵀf(x₁)(x₂ − x₁),

which is a contradiction. Thus, equality can never hold in (9) for any x₁ ≠ x₂. This proves the necessity of (10).


SLIDE 11

First-Order Convexity Conditions: The complete statement

The geometrical interpretation of this theorem is that at any point, the linear approximation based on a local derivative gives a lower estimate of the function, i.e., the convex function always lies above the supporting hyperplane at that point. This is pictorially depicted below:


SLIDE 12

(Tight) Lower-bound for any (non-differentiable) Convex Function?

For any convex function f (even if non-differentiable):
• The epigraph epi(f) will be convex
• The convex epigraph epi(f) will have a supporting hyperplane at any boundary point (x, f(x))


SLIDE 13

(Tight) Lower-bound for any (non-differentiable) Convex Function?

For any convex function f (even if non-differentiable):
• The epigraph epi(f) will be convex
• The convex epigraph epi(f) will have a supporting hyperplane at every boundary point x

[Figure: epi(f) with a supporting hyperplane at the boundary point x; normal vector [h, −1]]

• There exist multiple supporting hyperplanes
• Let a supporting hyperplane be characterized by a normal vector [h(x), −1]
• When f was differentiable, this vector was [∇f(x), −1]


SLIDE 14

(Tight) Lower-bound for any (non-differentiable) Convex Function?

For any convex function f (even if non-differentiable):
• The epigraph epi(f) will be convex
• The convex epigraph epi(f) will have a supporting hyperplane at every boundary point x
▶ The hyperplane is { [v, z] | ⟨[h(x), −1], [v, z]⟩ = ⟨[h(x), −1], [x, f(x)]⟩ } for all [v, z] on the hyperplane, and ⟨[h(x), −1], [y, z]⟩ ≤ ⟨[h(x), −1], [x, f(x)]⟩ for all [y, z] ∈ epi(f), which also includes [y, f(y)]


SLIDE 15

(Tight) Lower-bound for any (non-differentiable) Convex Function?

For any convex function f (even if non-differentiable):
• The epigraph epi(f) will be convex
• The convex epigraph epi(f) will have a supporting hyperplane at every boundary point x
▶ The hyperplane is { [v, z] | ⟨[h(x), −1], [v, z]⟩ = ⟨[h(x), −1], [x, f(x)]⟩ } for all [v, z] on the hyperplane, and ⟨[h(x), −1], [y, z]⟩ ≤ ⟨[h(x), −1], [x, f(x)]⟩ for all [y, z] ∈ epi(f), which also includes [y, f(y)]
• Thus: ⟨[h(x), −1], [y, f(y)]⟩ ≤ ⟨[h(x), −1], [x, f(x)]⟩ for all y in the domain of f; expanding the inner products gives f(y) ≥ f(x) + h(x)ᵀ(y − x)
• The normal to such a supporting hyperplane serves the same purpose as [∇f(x), −1] (a concrete check follows below)
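This inner-product form can be checked directly for a concrete case, e.g. f(x) = |x| at the kink x = 0 with h(0) = 0.5 ∈ [−1, 1] (our own example, not from the slides):

```python
import numpy as np

# Epigraph form of the supporting-hyperplane inequality:
#   <[h(x), -1], [y, f(y)]>  <=  <[h(x), -1], [x, f(x)]>
f = lambda x: abs(x)
x, h = 0.0, 0.5
normal = np.array([h, -1.0])                 # normal to the supporting hyperplane
rhs = normal @ np.array([x, f(x)])           # equals 0 here
ys = np.linspace(-5, 5, 1001)
print(all(normal @ np.array([y, f(y)]) <= rhs + 1e-12 for y in ys))
# True: [0.5, -1] supports epi(f) at (0, 0), even though f is not
# differentiable there
```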


SLIDE 16

The What, Why and How of (sub)gradients

What of (sub)gradient: Normal to the supporting hyperplane at the point (x, f(x)) of epi(f)
• Need not be unique
• The gradient is a subgradient when the function is differentiable


SLIDE 17

The What, Why and How of (sub)gradients

What of (sub)gradient: Normal to the tightly lower-bounding linear approximation to a convex function
Why of (sub)gradient:
• (Sub)gradients give necessary and sufficient conditions of optimality for convex functions
• Important for optimization algorithms
• Subgradients are important for non-differentiable functions and constrained optimization


SLIDE 18

The What, Why and How of (sub)gradients

What of (sub)gradient: Normal to the tightly lower-bounding linear approximation to a convex function
Why of (sub)gradient: Ability to deal with constraints, optimality conditions, optimization algorithms
How of (sub)gradient: How to compute the subgradient of complex non-differentiable convex functions; the calculus of convex functions and of subgradients (a small preview follows below)
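One standard rule from that calculus (well known, though not stated on this slide): for a pointwise maximum f = max(f₁, f₂), the gradient of any piece that attains the maximum at x is a subgradient of f at x. A minimal sketch:

```python
import numpy as np

# f(x) = max(x, x^2) on the reals: both pieces are convex, and f is not
# differentiable where they cross (x = 0 and x = 1).
f = lambda x: max(x, x**2)

def a_subgradient(x):
    # Derivative of an active piece (at ties, either choice is valid).
    return 1.0 if x >= x**2 else 2.0 * x

x0 = 0.5                     # here x > x^2, so the active piece is f1(x) = x
h = a_subgradient(x0)        # h = 1.0
ys = np.linspace(-3, 3, 601)
print(all(f(y) >= f(x0) + h * (y - x0) - 1e-12 for y in ys))   # True
```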


SLIDE 19

First-Order Convexity Conditions: Subgradients


The foregoing result motivates the definition of the subgradient for non-differentiable convex functions, which has properties very similar to those of the gradient vector.

Definition

[Subgradient]: Let f : D → ℜ be a convex function defined on a convex set D. A vector h ∈ ℜⁿ is said to be a subgradient of f at the point x ∈ D if

f(y) ≥ f(x) + hᵀ(y − x)

for all y ∈ D. The set of all such vectors is called the subdifferential of f at x.
• For a differentiable convex function, the gradient at point x is the only subgradient at that point.
• Most properties of differentiable convex functions that hold in terms of the gradient also hold in terms of the subgradient for non-differentiable convex functions.
• E.g.: What is the subgradient for f(x) = ∥x∥₁? Once we develop tools (the HOW part), we will see that the subdifferential contains infinitely many such h at some points x (a sketch follows below).
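A known answer for ∥x∥₁: h is a subgradient at x exactly when hᵢ = sign(xᵢ) for xᵢ ≠ 0 and hᵢ ∈ [−1, 1] for xᵢ = 0, so the subdifferential is infinite wherever some coordinate is zero. A minimal numeric check (our own sketch):

```python
import numpy as np

def subgrad_l1(x, free=0.3):
    # sign(x_i) away from the kink; any value in [-1, 1] (here `free`) at x_i == 0
    h = np.sign(x).astype(float)
    h[x == 0] = free
    return h

x = np.array([1.5, 0.0, -2.0])
h = subgrad_l1(x)
rng = np.random.default_rng(1)
for _ in range(1000):
    y = rng.normal(size=3)
    # Subgradient inequality: ||y||_1 >= ||x||_1 + h^T (y - x)
    assert np.abs(y).sum() >= np.abs(x).sum() + h @ (y - x) - 1e-12
print("held for all samples; any `free` in [-1, 1] works, so infinitely many h")
```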

SLIDE 20

(Sub)Gradients and Convexity (contd)

To say that a function f : ℜⁿ → ℜ is differentiable at x is to say that there is a (single, unique) linear tangent that underestimates the function: f(y) ≥ f(x) + ∇ᵀf(x)(y − x), ∀x, y.

Can you think of a non-convex function f which has a non-empty subdifferential (at least at some points x)? Could this be the case for the negative of the Gaussian?
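This question can be probed numerically. Since the negative Gaussian f(x) = −exp(−x²) is differentiable, the only candidate subgradient at x₀ is f′(x₀), so we can test whether that tangent line underestimates f everywhere (our experiment, not a claim from the slides):

```python
import numpy as np

f = lambda x: -np.exp(-x**2)                 # negative Gaussian: NOT convex
fprime = lambda x: 2 * x * np.exp(-x**2)

ys = np.linspace(-10, 10, 4001)
for x0 in [0.0, 0.5, 1.0, 2.0]:
    tangent = f(x0) + fprime(x0) * (ys - x0)
    print(x0, bool(np.all(f(ys) >= tangent - 1e-12)))
# Only x0 = 0.0 prints True: the flat tangent y = -1 at the global minimum is a
# global underestimator, so the subdifferential is non-empty there (and, the
# scan suggests, empty at the other sampled points).
```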


SLIDE 21

(Sub)Gradients and Convexity (contd)

In this figure we see that the function f at x has many possible linear tangents that fit appropriately. Recall that a subgradient is any h ∈ ℜⁿ (same dimension as x) such that:

f(y) ≥ f(x) + hᵀ(y − x), ∀y

Thus, intuitively, if a function is differentiable at a point x, then ∇f(x) is its only subgradient there.
