

  1. Introductory Course on Non-smooth Optimisation, Lecture 01: Gradient Methods. Jingwei Liang, Department of Applied Mathematics and Theoretical Physics.

  2. Table of contents 1 Unconstrained smooth optimisation 2 Descent methods 3 Gradient of convex functions 4 Gradient descent 5 Heavy-ball method 6 Nesterov's optimal schemes 7 Dynamical system

  3. Convexity. Convex set: a set S ⊂ R^n is convex if for any θ ∈ [0, 1] and any two points x, y ∈ S, θx + (1 − θ)y ∈ S. Convex function: a function F : R^n → R is convex if dom(F) is convex and for all x, y ∈ dom(F) and θ ∈ [0, 1], F(θx + (1 − θ)y) ≤ θF(x) + (1 − θ)F(y). Proper convex: F(x) < +∞ for at least one x and F(x) > −∞ for all x. 1st-order condition: if F is continuously differentiable, then F is convex iff F(y) ≥ F(x) + ⟨∇F(x), y − x⟩ for all x, y ∈ dom(F). 2nd-order condition: if F is twice differentiable, then F is convex iff ∇²F(x) ⪰ 0 for all x ∈ dom(F).
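A quick numerical sanity check of the 1st-order condition (a minimal sketch, not part of the slides), using F(x) = ||x||^2, whose gradient is ∇F(x) = 2x:

    import numpy as np

    rng = np.random.default_rng(0)
    F = lambda x: x @ x          # F(x) = ||x||^2, convex and smooth
    grad_F = lambda x: 2 * x     # its gradient

    for _ in range(1000):
        x, y = rng.standard_normal(5), rng.standard_normal(5)
        # 1st-order condition: F(y) >= F(x) + <grad F(x), y - x>
        assert F(y) >= F(x) + grad_F(x) @ (y - x) - 1e-12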

  4. Unconstrained smooth optimisation. Problem: min_{x∈R^n} F(x), where F : R^n → R is proper convex and smooth (differentiable). Optimality condition: if x⋆ is a minimiser of F, then 0 = ∇F(x⋆). [Figure: graph of F with gradients ∇F(x) and ∇F(x⋆) = 0.]

  5. Example: quadratic minimisation. General quadratic programming problem: min_{x∈R^n} (1/2)x^T Ax + b^T x + c, where A ∈ R^{n×n} is symmetric positive definite, b ∈ R^n and c ∈ R. Optimality condition: 0 = Ax⋆ + b. Special case: least squares, ||Ax − b||^2 = x^T(A^T A)x − 2(A^T b)^T x + b^T b, with optimality condition A^T Ax⋆ = A^T b.
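A minimal sketch (not from the slides; names illustrative) solving the least-squares optimality condition A^T Ax⋆ = A^T b numerically and checking that the gradient vanishes:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((20, 5))   # overdetermined system
    b = rng.standard_normal(20)

    # optimality condition: A^T A x* = A^T b (normal equations)
    x_star = np.linalg.solve(A.T @ A, A.T @ b)

    # gradient of ||Ax - b||^2 vanishes at x*
    grad = 2 * (A.T @ A @ x_star - A.T @ b)
    print(np.linalg.norm(grad))        # ~ 0 up to round-off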

  6. Example: geometric programming. min_{x∈R^n} log( Σ_{i=1}^m exp(a_i^T x + b_i) ). Optimality condition: 0 = (1 / Σ_{i=1}^m exp(a_i^T x⋆ + b_i)) Σ_{i=1}^m exp(a_i^T x⋆ + b_i) a_i.
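The optimality condition says that a softmax-weighted average of the a_i vanishes at x⋆. A hedged sketch of evaluating the objective and its gradient, with the usual max-shift for numerical stability (function name illustrative; rows of A are the a_i^T):

    import numpy as np

    def logsumexp_obj(x, A, b):
        """F(x) = log(sum_i exp(a_i^T x + b_i)) and its gradient."""
        z = A @ x + b                 # z_i = a_i^T x + b_i
        w = np.exp(z - z.max())       # shift for numerical stability
        F = z.max() + np.log(w.sum())
        grad = A.T @ (w / w.sum())    # softmax weights applied to the a_i
        return F, grad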

  7. Outline 1 Unconstrained smooth optimisation 2 Descent methods 3 Gradient of convex functions 4 Gradient descent 5 Heavy-ball method 6 Nesterov’s optimal schemes 7 Dynamical system

  8. Problem: unconstrained smooth optimisation. Consider min_{x∈R^n} F(x), where F : R^n → R is proper convex and smooth (differentiable). [Figure: level sets of F with minimiser x⋆.]

  9. Problem: unconstrained smooth optimisation. Consider min_{x∈R^n} F(x), where F : R^n → R is proper convex and smooth (differentiable). The set of minimisers, i.e. Argmin(F) = { x ∈ R^n : F(x) = min_{x∈R^n} F(x) }, is non-empty. However, given x⋆ ∈ Argmin(F), there is in general no closed-form expression. Iterative strategy to find one x⋆ ∈ Argmin(F): start from x_0 and generate a sequence {x_k}_{k∈N} such that lim_{k→∞} x_k = x⋆ ∈ Argmin(F).

  10. Problem: unconstrained smooth optimisation. Consider min_{x∈R^n} F(x), where F : R^n → R is proper convex and smooth (differentiable). [Figure: iterates x_{k−1}, x_k, x_{k+1}, x_{k+2} converging to x⋆.]

  11. Descent methods. Iterative scheme: for each k = 1, 2, ..., find γ_k > 0 and d_k ∈ R^n, then set x_{k+1} = x_k + γ_k d_k, where d_k is called the search/descent direction and γ_k the step-size. Descent methods: an algorithm is called a descent method if F(x_{k+1}) < F(x_k) holds. NB: if x_k ∈ Argmin(F), then x_{k+1} = x_k ...

  12. Conditions. From convexity of F, we have F(x_{k+1}) ≥ F(x_k) + ⟨∇F(x_k), x_{k+1} − x_k⟩, which gives ⟨∇F(x_k), x_{k+1} − x_k⟩ ≥ 0 ⟹ F(x_{k+1}) ≥ F(x_k), i.e. no descent. Since x_{k+1} − x_k = γ_k d_k, the direction d_k should therefore be such that ⟨∇F(x_k), d_k⟩ < 0. [Figure: ∇F(x_k) at x_k and the minimiser x⋆.]

  13. General descent method. initial: x_0 ∈ dom(F); repeat: 1. Find a descent direction d_k. 2. Choose a step-size γ_k: line search. 3. Update x_{k+1} = x_k + γ_k d_k. until: stopping criterion is satisfied. Stopping criterion (ε > 0 is the tolerance): function value, F(x_k) − F(x_{k+1}) ≤ ε (can be time consuming); sequence, ||x_{k+1} − x_k|| ≤ ε; optimality condition, ||∇F(x_k)|| ≤ ε.
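A minimal Python skeleton of the general descent method with the optimality-based stopping criterion (a sketch, not from the slides; all names are illustrative, and the direction and line-search rules are supplied by the caller):

    import numpy as np

    def descent_method(F, grad_F, x0, direction, line_search, eps=1e-6, max_iter=10_000):
        """Generic descent loop: x_{k+1} = x_k + gamma_k d_k."""
        x = x0
        for _ in range(max_iter):
            g = grad_F(x)
            if np.linalg.norm(g) <= eps:    # optimality-based stopping criterion
                break
            d = direction(x, g)             # e.g. d = -g for gradient descent
            gamma = line_search(F, grad_F, x, d)
            x = x + gamma * d
        return x

Gradient descent corresponds to the choice direction = lambda x, g: -g.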

  14. Exact line search. Suppose the direction d_k is given. Choose γ_k such that F is minimised along the ray x_k + γd_k, γ > 0: γ_k = argmin_{γ>0} F(x_k + γd_k). Useful when the minimisation problem for γ_k is simple; γ_k can be found analytically in special cases. [Figure: γ ↦ F(x_k + γd_k) with minimum at γ_k.]
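One such special case is the quadratic F(x) = (1/2)x^T Ax + b^T x from slide 5: setting d/dγ F(x_k + γd_k) = ⟨∇F(x_k), d_k⟩ + γ⟨d_k, Ad_k⟩ = 0 gives γ_k = −⟨∇F(x_k), d_k⟩ / ⟨d_k, Ad_k⟩. A hedged sketch (function name illustrative):

    def exact_step_quadratic(A, grad_x, d):
        """Exact line search for F(x) = 0.5 x^T A x + b^T x along direction d:
        gamma minimising F(x + gamma d), in closed form."""
        return -(grad_x @ d) / (d @ (A @ d))  # > 0 when d is a descent direction and A is positive definite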

  15. Backtracking/inexact line search. Suppose the direction d_k is given. Choose δ ∈ ]0, 0.5[ and β ∈ ]0, 1[, let γ = 1; while F(x_k + γd_k) > F(x_k) + δγ⟨∇F(x_k), d_k⟩: γ = βγ. The idea is to reduce F enough along the direction d_k. Since d_k is a descent direction, ⟨∇F(x_k), d_k⟩ < 0. Stopping criterion for backtracking: F(x_k + γd_k) ≤ F(x_k) + δγ⟨∇F(x_k), d_k⟩. When γ is small enough, F(x_k + γd_k) ≈ F(x_k) + γ⟨∇F(x_k), d_k⟩ < F(x_k) + δγ⟨∇F(x_k), d_k⟩, which means the backtracking eventually stops.

  16. Backtracking/inexact line search. Suppose the direction d_k is given. Choose δ ∈ ]0, 0.5[ and β ∈ ]0, 1[, let γ = 1; while F(x_k + γd_k) > F(x_k) + δγ⟨∇F(x_k), d_k⟩: γ = βγ. [Figure: γ ↦ F(x_k + γd_k) against the lines F(x_k) + γ⟨∇F(x_k), d_k⟩ and F(x_k) + δγ⟨∇F(x_k), d_k⟩.]
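A direct transcription of the backtracking loop into Python (a sketch; the default values of δ and β are illustrative), usable as the line_search in the descent skeleton above:

    def backtracking(F, grad_F, x, d, delta=0.25, beta=0.5):
        """Shrink gamma until the sufficient-decrease test of the slide holds."""
        g_dot_d = grad_F(x) @ d      # < 0 since d is a descent direction
        gamma = 1.0
        while F(x + gamma * d) > F(x) + delta * gamma * g_dot_d:
            gamma *= beta
        return gamma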

  17. Outline 1 Unconstrained smooth optimisation 2 Descent methods 3 Gradient of convex functions 4 Gradient descent 5 Heavy-ball method 6 Nesterov’s optimal schemes 7 Dynamical system

  18. Monotonicity. Monotonicity of gradient: let F : R^n → R be proper convex and smooth (differentiable); then ⟨∇F(x) − ∇F(y), x − y⟩ ≥ 0 for all x, y ∈ dom(F). C^1: proper convex and smooth differentiable functions on R^n. Proof: owing to convexity, given x, y ∈ dom(F), we have F(y) ≥ F(x) + ⟨∇F(x), y − x⟩ and F(x) ≥ F(y) + ⟨∇F(y), x − y⟩. Summing them up yields ⟨∇F(x) − ∇F(y), x − y⟩ ≥ 0. NB: let F ∈ C^1; F is convex if and only if ∇F is monotone.
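A quick numerical illustration of monotonicity (a sketch, not part of the slides), for the convex function F(x) = log Σ_i exp(x_i), whose gradient is the softmax:

    import numpy as np

    rng = np.random.default_rng(2)

    def grad_F(x):
        """Gradient of log-sum-exp: the softmax of x (max-shifted for stability)."""
        e = np.exp(x - x.max())
        return e / e.sum()

    for _ in range(1000):
        x, y = rng.standard_normal(4), rng.standard_normal(4)
        # monotonicity: <grad F(x) - grad F(y), x - y> >= 0
        assert (grad_F(x) - grad_F(y)) @ (x - y) >= -1e-12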

  19. Lipschitz continuous gradient. Lipschitz continuity: the gradient of F is L-Lipschitz continuous if there exists L > 0 such that ||∇F(x) − ∇F(y)|| ≤ L||x − y|| for all x, y ∈ dom(F). C^1_L: proper convex functions with L-Lipschitz continuous gradient on R^n. If F ∈ C^1_L, then H(x) := (L/2)||x||^2 − F(x) is convex. Hint: monotonicity of ∇H, i.e. ⟨∇H(x) − ∇H(y), x − y⟩ = L||x − y||^2 − ⟨∇F(x) − ∇F(y), x − y⟩ ≥ L||x − y||^2 − L||x − y||^2 = 0.
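For the quadratic example, ∇F(x) = Ax + b is L-Lipschitz with L = λ_max(A). A small sketch checking the bound (not from the slides; names illustrative):

    import numpy as np

    rng = np.random.default_rng(3)
    M = rng.standard_normal((5, 5))
    A = M.T @ M                                  # symmetric positive semi-definite
    L = np.linalg.eigvalsh(A)[-1]                # L = largest eigenvalue of A

    x, y = rng.standard_normal(5), rng.standard_normal(5)
    # ||grad F(x) - grad F(y)|| = ||A(x - y)|| <= L ||x - y||
    assert np.linalg.norm(A @ (x - y)) <= L * np.linalg.norm(x - y) + 1e-12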

  20. Descent lemma. Descent lemma (quadratic upper bound): let F ∈ C^1_L; then F(y) ≤ F(x) + ⟨∇F(x), y − x⟩ + (L/2)||y − x||^2 for all x, y ∈ dom(F). Proof: define H(t) = F(x + t(y − x)); then
  F(y) − F(x) = H(1) − H(0) = ∫_0^1 H′(t) dt = ∫_0^1 ⟨y − x, ∇F(x + t(y − x))⟩ dt
  = ∫_0^1 ⟨y − x, ∇F(x)⟩ dt + ∫_0^1 ⟨y − x, ∇F(x + t(y − x)) − ∇F(x)⟩ dt
  ≤ ⟨y − x, ∇F(x)⟩ + ∫_0^1 ||y − x|| ||∇F(x + t(y − x)) − ∇F(x)|| dt
  ≤ ⟨y − x, ∇F(x)⟩ + ∫_0^1 ||y − x|| · tL||y − x|| dt
  = ⟨y − x, ∇F(x)⟩ + (L/2)||y − x||^2.
  NB: the lemma is the first-order condition of convexity for H(x) := (L/2)||x||^2 − F(x).

  21. Descent lemma: consequences. Corollary: let F ∈ C^1_L and x⋆ ∈ Argmin(F); then (1/(2L))||∇F(x)||^2 ≤ F(x) − F(x⋆) ≤ (L/2)||x − x⋆||^2 for all x ∈ dom(F). Proof: right-hand inequality: since ∇F(x⋆) = 0, F(x) ≤ F(x⋆) + ⟨∇F(x⋆), x − x⋆⟩ + (L/2)||x − x⋆||^2 = F(x⋆) + (L/2)||x − x⋆||^2 for all x ∈ dom(F). Left-hand inequality: F(x⋆) ≤ min_{y∈dom(F)} { F(x) + ⟨∇F(x), y − x⟩ + (L/2)||y − x||^2 } = F(x) − (1/(2L))||∇F(x)||^2, the minimising y being y = x − (1/L)∇F(x).
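A sketch verifying both inequalities of the corollary on a random strongly convex quadratic, where x⋆ is available in closed form (not from the slides; names illustrative):

    import numpy as np

    rng = np.random.default_rng(4)
    M = rng.standard_normal((5, 5))
    A = M.T @ M + np.eye(5)                      # symmetric positive definite
    b = rng.standard_normal(5)
    L = np.linalg.eigvalsh(A)[-1]                # Lipschitz constant of the gradient

    F = lambda x: 0.5 * x @ A @ x + b @ x
    x_star = np.linalg.solve(A, -b)              # grad F(x*) = A x* + b = 0

    x = rng.standard_normal(5)
    g = A @ x + b                                # grad F(x)
    gap = F(x) - F(x_star)
    assert (g @ g) / (2 * L) <= gap + 1e-10
    assert gap <= (L / 2) * np.linalg.norm(x - x_star) ** 2 + 1e-10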
