SLIDE 1

Unconstrained minimization

Lectures for the PhD course on Numerical Optimization

Enrico Bertolazzi

DIMS – Università di Trento

November 21 – December 14, 2011

SLIDE 2

Outline

1. General iterative scheme

2. Backtracking Armijo line-search
   - Global convergence of backtracking Armijo line-search
   - Global convergence of steepest descent

3. Wolfe–Zoutendijk global convergence
   - The Wolfe conditions
   - The Armijo–Goldstein conditions

4. Algorithms for line-search
   - Armijo Parabolic-Cubic search
   - Wolfe linesearch

SLIDE 3

The problem (1/3)

Given f : ℝⁿ → ℝ, consider the problem

    min_{x ∈ ℝⁿ} f(x).

The following regularity of f(x) is assumed throughout:

Assumption (Regularity assumption)

We assume f ∈ C¹(ℝⁿ) with Lipschitz continuous gradient, i.e. there exists γ > 0 such that

    ‖∇f(x)ᵀ − ∇f(y)ᵀ‖ ≤ γ ‖x − y‖,  ∀x, y ∈ ℝⁿ.
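As a concrete illustration of this assumption (an addition to the slides, not part of them): for a quadratic f(x) = ½xᵀAx + bᵀx the gradient ∇f(x)ᵀ = Ax + b is Lipschitz with γ = ‖A‖₂. A minimal Python sketch that estimates γ by sampling:

```python
import numpy as np

# Quadratic test function f(x) = 1/2 x^T A x + b^T x, so grad f(x)^T = A x + b.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M.T @ M                      # symmetric, so ||A||_2 is its largest eigenvalue
b = rng.standard_normal(n)
grad = lambda x: A @ x + b

# Sample ||grad(x) - grad(y)|| / ||x - y||; the supremum over x, y is gamma.
ratios = [
    np.linalg.norm(grad(x) - grad(y)) / np.linalg.norm(x - y)
    for x, y in (rng.standard_normal((2, n)) for _ in range(1000))
]
print(max(ratios), np.linalg.norm(A, 2))  # sampled estimate vs exact gamma
```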

SLIDE 4

The problem (2/3)

Definition (Global minimum)

Given f : ℝⁿ → ℝ, a point x⋆ ∈ ℝⁿ is a global minimum if f(x⋆) ≤ f(x), ∀x ∈ ℝⁿ.

Definition (Local minimum)

Given f : ℝⁿ → ℝ, a point x⋆ ∈ ℝⁿ is a local minimum if there exists δ > 0 such that f(x⋆) ≤ f(x), ∀x ∈ B(x⋆; δ). Obviously a global minimum is also a local minimum. Finding a global minimum is in general not an easy task: the algorithms presented in the sequel approximate local minima.

SLIDE 5

The problem (3/3)

Definition (Strict global minimum)

Given f : ℝⁿ → ℝ, a point x⋆ ∈ ℝⁿ is a strict global minimum if f(x⋆) < f(x), ∀x ∈ ℝⁿ \ {x⋆}.

Definition (Strict local minimum)

Given f : ℝⁿ → ℝ, a point x⋆ ∈ ℝⁿ is a strict local minimum if there exists δ > 0 such that f(x⋆) < f(x), ∀x ∈ B(x⋆; δ) \ {x⋆}. Obviously a strict global minimum is also a strict local minimum.

SLIDE 6

First order necessary condition

Lemma (First order necessary condition for local minimum)

Let f : ℝⁿ → ℝ satisfy the regularity assumption. If a point x⋆ ∈ ℝⁿ is a local minimum, then ∇f(x⋆)ᵀ = 0.

Proof.

Consider a generic direction d. Since x⋆ is a local minimum, for δ small enough

    λ⁻¹ (f(x⋆ + λd) − f(x⋆)) ≥ 0,  0 < λ < δ,

so that

    lim_{λ→0} λ⁻¹ (f(x⋆ + λd) − f(x⋆)) = ∇f(x⋆)d ≥ 0.

Because d is a generic direction (the same inequality holds for −d), we have ∇f(x⋆)ᵀ = 0.

SLIDE 7

1. The first order necessary condition does not discriminate between maxima, minima, and saddle points.

2. To discriminate between maxima and minima we need more information, e.g. the second order derivatives of f(x).

3. With second order derivatives we can build necessary conditions and sufficient conditions for a minimum.

4. In general, using only the first and second order derivatives at the point x⋆, it is not possible to deduce a necessary and sufficient condition for a minimum.

SLIDE 8

Second order necessary condition

Lemma (Second order necessary condition for local minimum)

Given f ∈ C²(ℝⁿ), if a point x⋆ ∈ ℝⁿ is a local minimum then ∇f(x⋆)ᵀ = 0 and ∇²f(x⋆) is positive semi-definite, i.e.

    dᵀ ∇²f(x⋆) d ≥ 0,  ∀d ∈ ℝⁿ.

Example

This condition is only necessary; in fact, consider

    f(x) = x₁² − x₂³,
    ∇f(x) = (2x₁, −3x₂²),
    ∇²f(x) = [ 2    0
               0  −6x₂ ].

For the point x⋆ = 0 we have ∇f(0) = 0 and ∇²f(0) positive semi-definite, but 0 is a saddle point, not a minimum.
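A quick numerical check of this example (an illustrative addition, assuming NumPy):

```python
import numpy as np

f = lambda x: x[0]**2 - x[1]**3
grad = lambda x: np.array([2 * x[0], -3 * x[1]**2])
hess = lambda x: np.array([[2.0, 0.0], [0.0, -6.0 * x[1]]])

x0 = np.array([0.0, 0.0])
print(grad(x0))                       # [ 0. -0.]  -> stationary point
print(np.linalg.eigvalsh(hess(x0)))  # [0. 2.]    -> positive semi-definite
print(f(np.array([0.0, 1e-3])))      # -1e-09 < f(x0) = 0 -> not a minimum
```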

SLIDE 9

Proof.

The condition ∇f(x⋆)ᵀ = 0 comes from the first order necessary condition. Consider now a generic direction d and the finite difference

    (f(x⋆ + λd) − 2f(x⋆) + f(x⋆ − λd)) / λ² ≥ 0.

By using the Taylor expansion of f(x),

    f(x⋆ ± λd) = f(x⋆) ± λ∇f(x⋆)d + (λ²/2) dᵀ∇²f(x⋆)d + o(λ²),

the previous inequality becomes

    dᵀ∇²f(x⋆)d + 2 o(λ²)/λ² ≥ 0.

Taking the limit λ → 0, and from the arbitrariness of d, ∇²f(x⋆) must be positive semi-definite.

SLIDE 10

Second order sufficient condition

Lemma (Second order sufficient condition for local minimum)

Given f ∈ C²(ℝⁿ), if a point x⋆ ∈ ℝⁿ satisfies:

1. ∇f(x⋆)ᵀ = 0;
2. ∇²f(x⋆) is positive definite, i.e. dᵀ∇²f(x⋆)d > 0, ∀d ∈ ℝⁿ \ {0};

then x⋆ ∈ ℝⁿ is a strict local minimum.

Remark

Because ∇²f(x⋆) is symmetric we can write

    λmin dᵀd ≤ dᵀ∇²f(x⋆)d ≤ λmax dᵀd,

where λmin and λmax are the smallest and largest eigenvalues of ∇²f(x⋆). If ∇²f(x⋆) is positive definite we have λmin > 0.
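This remark suggests a simple numerical test of the two second-order conditions; a sketch (my addition, assuming NumPy) that classifies a stationary point from the Hessian eigenvalues:

```python
import numpy as np

def classify_stationary_point(H, tol=1e-12):
    """Classify a stationary point from its symmetric Hessian H."""
    lam = np.linalg.eigvalsh(H)        # eigenvalues in ascending order
    if lam[0] > tol:
        return "strict local minimum"  # positive definite: lambda_min > 0
    if lam[-1] < -tol:
        return "strict local maximum"  # negative definite
    if lam[0] < -tol and lam[-1] > tol:
        return "saddle point"          # indefinite
    return "inconclusive"              # semi-definite: higher order terms decide

print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, 3.0]])))   # minimum
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -6.0]])))  # saddle
```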

SLIDE 11

Proof.

Consider a generic direction d and the Taylor expansion of f(x):

    f(x⋆ + d) = f(x⋆) + ∇f(x⋆)d + ½ dᵀ∇²f(x⋆)d + o(‖d‖²)
              ≥ f(x⋆) + ½ λmin ‖d‖² + o(‖d‖²).

Choosing ‖d‖ small enough that |o(‖d‖²)| ≤ ¼ λmin ‖d‖², we can write

    f(x⋆ + d) ≥ f(x⋆) + ¼ λmin ‖d‖² > f(x⋆),  d ≠ 0,  ‖d‖ ≤ δ,

i.e. x⋆ is a strict minimum.

SLIDE 12

General iterative scheme

Outline

1. General iterative scheme

2. Backtracking Armijo line-search
   - Global convergence of backtracking Armijo line-search
   - Global convergence of steepest descent

3. Wolfe–Zoutendijk global convergence
   - The Wolfe conditions
   - The Armijo–Goldstein conditions

4. Algorithms for line-search
   - Armijo Parabolic-Cubic search
   - Wolfe linesearch

SLIDE 13

General iterative scheme

How to find a minimum

Given f : ℝⁿ → ℝ: min_{x ∈ ℝⁿ} f(x).

1. We can solve the problem by solving the necessary condition, i.e. by solving the nonlinear system ∇f(x)ᵀ = 0.

2. Using such an approach we lose the information carried by the values of f(x).

3. Moreover, such an approach can find solutions corresponding to maxima or saddle points.

4. A better approach is to use all the information and build a minimizing procedure, i.e. a procedure that, starting from a point x₀, builds a sequence {xk} such that f(xk+1) ≤ f(xk). In this way, at least, we avoid converging to a strict maximum.

SLIDE 14

General iterative scheme

Iterative Methods

In practice it is very rare to be able to provide an explicit minimizer. An iterative method, given a starting guess x₀, generates a sequence {xk}, k = 1, 2, . . .

AIM: ensure that the sequence (or a subsequence) has some favorable limiting properties:

- it satisfies the first-order necessary conditions;
- it satisfies the second-order necessary conditions.

SLIDE 15

General iterative scheme

Line-search Methods

A generic iterative minimization procedure can be sketched as follows:

- calculate a search direction pk from xk;
- ensure that this direction is a descent direction, i.e. ∇f(xk)pk < 0 whenever ∇f(xk)ᵀ ≠ 0, so that, at least for small steps along pk, the objective function f(x) will be reduced;
- use a line-search to calculate a suitable step length αk > 0 so that f(xk + αkpk) < f(xk);
- update the point: xk+1 = xk + αkpk.

SLIDE 16

General iterative scheme

Generic minimization algorithm

Written in pseudo-code, the minimization procedure is the following algorithm:

Generic minimization algorithm

Given an initial guess x₀, let k = 0;
while not converged do
    Find a descent direction pk at xk;
    Compute a step size αk using a line-search along pk;
    Set xk+1 = xk + αkpk and increase k by 1;
end while

The crucial points which differentiate the algorithms are:

1. the computation of the direction pk;
2. the computation of the step size αk.
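As an illustration, a minimal Python rendering of this generic scheme (an addition to the slides; the `direction` and `line_search` callbacks are placeholders to be supplied by the specific method):

```python
import numpy as np

def minimize(f, grad, x0, direction, line_search, tol=1e-8, max_iter=1000):
    """Generic descent scheme: x_{k+1} = x_k + alpha_k * p_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:       # converged to a stationary point
            break
        p = direction(x, g)                # must satisfy g @ p < 0 (descent)
        alpha = line_search(f, x, p, g)    # step length along p
        x = x + alpha * p
    return x
```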

SLIDE 17

General iterative scheme

Practical Line-search methods

The first minimization algorithms tried to solve

    αk = argmin_{α > 0} f(xk + αpk)

by an exact line-search, i.e. a univariate minimization; this is rather expensive and certainly not cost effective.

Modern methods implement an inexact line-search which:

- ensures steps are neither too long nor too short;
- tries to pick a useful initial step size for fast convergence.

The best methods are based on:

- backtracking Armijo search;
- Armijo–Goldstein search;
- Wolfe search.

SLIDE 18

General iterative scheme

Backtracking line-search

To obtain a monotone decreasing sequence we can use the following algorithm:

Backtracking line-search

Given αinit (e.g. αinit = 1);
Given τ ∈ (0, 1), typically τ = 0.5;
Let α(0) = αinit and ℓ = 0;
while not f(xk + α(ℓ)pk) < f(xk) do
    set α(ℓ+1) = τ α(ℓ);
    increase ℓ by 1;
end while
Set αk = α(ℓ).

To be effective the previous algorithm should terminate in a finite number of steps. The next lemma assures that if pk is a descent direction then the algorithm terminates.

SLIDE 19

Backtracking Armijo line-search

Outline

1. General iterative scheme

2. Backtracking Armijo line-search
   - Global convergence of backtracking Armijo line-search
   - Global convergence of steepest descent

3. Wolfe–Zoutendijk global convergence
   - The Wolfe conditions
   - The Armijo–Goldstein conditions

4. Algorithms for line-search
   - Armijo Parabolic-Cubic search
   - Wolfe linesearch

SLIDE 20

Backtracking Armijo line-search

Armijo condition

To prevent steps that are too long relative to the decrease of f(x), we require

    f(xk + αkpk) ≤ f(xk) + αk β ∇f(xk)pk

for some β ∈ (0, 1). Typical values of β range from 10⁻⁴ to 0.1.

[Figure: f(xk + αpk) compared with the lines f(xk) + α∇f(xk)pk and f(xk) + αβ∇f(xk)pk.]

SLIDE 21

Backtracking Armijo line-search

Backtracking Armijo line-search

Given αinit (e.g. αinit = 1);
Given τ ∈ (0, 1), typically τ = 0.5;
Let α(0) = αinit and ℓ = 0;
while not f(xk + α(ℓ)pk) ≤ f(xk) + α(ℓ) β ∇f(xk)pk do
    set α(ℓ+1) = τ α(ℓ);
    increase ℓ by 1;
end while
Set αk = α(ℓ).

Backtracking Armijo line-search prevents the step from getting too large. Now the question is: will the backtracking Armijo line-search terminate in a finite number of steps?
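A direct Python transcription of this loop (a sketch under the deck's notation; β, τ and the safeguard αmin are the typical values quoted above, and x, p, g are assumed to be NumPy vectors):

```python
def backtracking_armijo(f, x, p, g, beta=1e-4, tau=0.5,
                        alpha_init=1.0, alpha_min=1e-16):
    """Return alpha with f(x + alpha p) <= f(x) + alpha*beta*grad(x).p."""
    f0 = f(x)
    slope = g @ p                  # directional derivative, must be < 0
    alpha = alpha_init
    while f(x + alpha * p) > f0 + alpha * beta * slope:
        alpha *= tau               # backtrack: shrink the step by tau
        if alpha < alpha_min:      # safeguard: p is probably not a descent direction
            raise RuntimeError("Armijo backtracking failed")
    return alpha
```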

SLIDE 22

Backtracking Armijo line-search

Finite termination of Armijo line-search

Theorem (Finite termination of Armijo linesearch)

Suppose that f(x) satisfies the standard assumptions, that β ∈ (0, 1), and that pk is a descent direction at xk. Then the Armijo condition

    f(xk + αkpk) ≤ f(xk) + αk β ∇f(xk)pk

is satisfied for all αk ∈ [0, ωk], where

    ωk = 2(β − 1) ∇f(xk)pk / (γ ‖pk‖²).

Assumption (Regularity assumption)

We assume f ∈ C¹(ℝⁿ) with Lipschitz continuous gradient, i.e. there exists γ > 0 such that ‖∇f(x) − ∇f(y)‖ ≤ γ ‖x − y‖, ∀x, y ∈ ℝⁿ.

SLIDE 23

Backtracking Armijo line-search

Finite termination of Armijo line-search

To prove finite termination we need the following Taylor expansion, valid under the regularity assumption:

    f(x + αp) = f(x) + α∇f(x)p + E,  where |E| ≤ (γ/2) α² ‖p‖².

Proof.

If α ≤ ωk we have αγ‖pk‖² ≤ 2(β − 1)∇f(xk)pk, and by using the Taylor expansion

    f(xk + αpk) ≤ f(xk) + α∇f(xk)pk + (γ/2) α² ‖pk‖²
               ≤ f(xk) + α∇f(xk)pk + α(β − 1)∇f(xk)pk
               = f(xk) + αβ∇f(xk)pk.

SLIDE 24

Backtracking Armijo line-search

Finite termination of Armijo line-search

Corollary (Finite termination of Armijo linesearch)

Suppose that f(x) satisfies the standard assumptions, that β ∈ (0, 1), and that pk is a descent direction at xk. Then the backtracking-Armijo line-search terminates with

    αk ≥ min{αinit, τωk},  ωk = 2(β − 1)∇f(xk)pk / (γ‖pk‖²).

Proof.

The line-search will terminate as soon as α(ℓ) ≤ ωk:

1. It may be that αinit already satisfies the Armijo condition, in which case αk = αinit.
2. Otherwise, in the last line-search iteration we have α(ℓ−1) > ωk, so that αk = α(ℓ) = τα(ℓ−1) > τωk.

Combining these two cases gives the required result.
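Worked example (added for illustration): for f(x) = ½‖x‖² we have γ = 1, and with the steepest descent direction pk = −xk the Armijo condition reads ½(1 − α)²‖xk‖² ≤ ½‖xk‖² − αβ‖xk‖², which simplifies to α ≤ 2(1 − β); this coincides with ωk = 2(β − 1)(−‖xk‖²)/‖xk‖² = 2(1 − β), so the interval [0, ωk] of the theorem is exact in this case.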

SLIDE 25

Backtracking Armijo line-search

Backtracking-Armijo line-search

1. The previous analysis shows that the backtracking-Armijo line-search ends in a finite number of steps.

2. The line-search produces a step length that is not too long, due to the condition f(xk + αkpk) ≤ f(xk) + αkβ∇f(xk)pk.

3. The line-search produces a step length that is not too short, due to the finite termination theorem.

4. The Armijo line-search can be improved by adding further requirements on the step length acceptance criteria.

SLIDE 26

Backtracking Armijo line-search Global convergence of backtracking Armijo line-search

Global convergence

Theorem (Global convergence)

Suppose that f(x) satisfies the standard assumptions. Then, for the iterates generated by the generic minimization algorithm with backtracking Armijo line-search, either:

1. ∇f(xk)ᵀ = 0 for some k ≥ 0;
2. or lim_{k→∞} f(xk) = −∞;
3. or lim_{k→∞} |∇f(xk)pk| min{1, ‖pk‖⁻¹} = 0.

Remark

In the theorem, point 1 means that we find a stationary point in a finite number of steps. Point 2 means that the function f(x) is unbounded below, so that a minimum does not exist. Point 3 alone does not imply convergence, but if ∇f(xk) and pk do not become orthogonal and ‖pk‖ stays bounded away from 0, then ∇f(xk) → 0.

SLIDE 27

Backtracking Armijo line-search Global convergence of backtracking Armijo line-search

Proof.

(1/3).

Assume points 1 and 2 are not satisfied; we then prove point 3. Consider

    f(xk+1) ≤ f(xk) + αkβ∇f(xk)pk ≤ f(x0) + β Σ_{j=0}^{k} αj∇f(xj)pj.

By the fact that each pj is a descent direction, the series converges:

    Σ_{j=0}^{∞} αj |∇f(xj)pj| ≤ β⁻¹ lim_{k→∞} (f(x0) − f(xk+1)) < ∞,

and then

    lim_{j→∞} αj |∇f(xj)pj| = 0.

SLIDE 28

Backtracking Armijo line-search Global convergence of backtracking Armijo line-search

Proof.

(2/3).

Recall that

    αk ≥ min{αinit, τωk},  ωk = 2(β − 1)∇f(xk)pk / (γ‖pk‖²),

and consider the two index sets

    K1 = { k | αk = αinit },  K2 = { k | αk < αinit }.

Obviously ℕ = K1 ∪ K2, and from lim_{k→∞} αk|∇f(xk)pk| = 0 we have

    lim_{k∈K1, k→∞} αk|∇f(xk)pk| = 0,  (A)

    lim_{k∈K2, k→∞} αk|∇f(xk)pk| = 0.  (B)

SLIDE 29

Backtracking Armijo line-search Global convergence of backtracking Armijo line-search

Proof.

(3/3).

For k ∈ K1 we have αk = αinit, so αk|∇f(xk)pk| = αinit|∇f(xk)pk|, and from (A)

    lim_{k∈K1, k→∞} |∇f(xk)pk| = 0.  (⋆)

For k ∈ K2 we have αk ≥ τωk, so

    αk|∇f(xk)pk| ≥ τωk|∇f(xk)pk| = 2τ(1 − β)|∇f(xk)pk|² / (γ‖pk‖²),

and from (B)

    lim_{k∈K2, k→∞} |∇f(xk)pk| / ‖pk‖ = 0.  (⋆⋆)

Combining (⋆) and (⋆⋆) gives the required result.

SLIDE 30

Backtracking Armijo line-search Global convergence of steepest descent

Steepest descent algorithm

Steepest descent algorithm

Given an initial guess x₀, let k = 0;
while not converged do
    Compute a step size αk using a line-search along pk = −∇f(xk)ᵀ;
    Set xk+1 = xk − αk∇f(xk)ᵀ and increase k by 1;
end while

The steepest descent algorithm is simply the generic minimization algorithm with the opposite of the gradient at xk as search direction. The direction −∇f(xk)ᵀ is always a descent direction unless xk is a stationary point.

SLIDE 31

Backtracking Armijo line-search Global convergence of steepest descent

Global convergence of steepest descent

Corollary (Global convergence of steepest descent)

Suppose that f(x) satisfies the standard assumptions. Then, for the iterates generated by the steepest descent algorithm with backtracking Armijo line-search, either:

1. ∇f(xk)ᵀ = 0 for some k ≥ 0;
2. or lim_{k→∞} f(xk) = −∞;
3. or lim_{k→∞} ∇f(xk)ᵀ = 0.

(This follows from the previous theorem: with pk = −∇f(xk)ᵀ the quantity |∇f(xk)pk| min{1, ‖pk‖⁻¹} equals ‖∇f(xk)‖² min{1, ‖∇f(xk)‖⁻¹}, which tends to 0 only if ∇f(xk) → 0.)

SLIDE 32

Backtracking Armijo line-search Global convergence of steepest descent

The Rosenbrock example (1/3)

Although the steepest descent scheme is globally convergent it can be very slow! A classical example is the Rosenbrock function:

    f(x, y) = 100 (y − x²)² + (x − 1)².

[Figure: surface plot of the Rosenbrock function over x ∈ [−2, 2], y ∈ [−1, 3], with contour levels from 1 to 10000.]

SLIDE 33

Backtracking Armijo line-search Global convergence of steepest descent

The Rosenbrock example (2/3)

This function has a unique minimum at (1, 1)ᵀ inside a banana-shaped valley.

[Figure: contour plot of the Rosenbrock function over x ∈ [−2, 2], y ∈ [−1, 3], with level curves from 1.58 to 100 showing the banana-shaped valley.]

SLIDE 34

Backtracking Armijo line-search Global convergence of steepest descent

The Rosenbrock example (3/3)

After 100 iterations starting from (−1.2, 1)ᵀ, the approximate minimum is still far from the solution.

[Figure: steepest descent iterates on the Rosenbrock contours, from x₀ = (−1.2, 1)ᵀ to x₁₀₀, which is still far from (1, 1)ᵀ.]
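An experiment of this flavor can be reproduced with the sketches introduced earlier (`backtracking_armijo` from the Armijo slide; the line-search parameters are assumptions, so the exact iterate x₁₀₀ will depend on them):

```python
import numpy as np

rosenbrock = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (x[0] - 1.0)**2
rosenbrock_grad = lambda x: np.array([
    -400.0 * x[0] * (x[1] - x[0]**2) + 2.0 * (x[0] - 1.0),
    200.0 * (x[1] - x[0]**2),
])

x = np.array([-1.2, 1.0])
for _ in range(100):                 # 100 steepest descent iterations
    g = rosenbrock_grad(x)
    x = x + backtracking_armijo(rosenbrock, x, -g, g) * (-g)
print(x)                             # typically still far from the minimizer (1, 1)
```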

SLIDE 35

Backtracking Armijo line-search Global convergence of steepest descent

The steepest descent method can be slow not only on a difficult test case like the Rosenbrock example. Given the function

    f(x, y) = ½ x² + (9/2) y²,

starting from x₀ = (9, 1)ᵀ we observe a zig-zag pattern of the iterates toward (0, 0)ᵀ.

[Figure: zig-zag path of the steepest descent iterates from x₀ = (9, 1)ᵀ toward the origin on the contours of f.]

SLIDE 36

Wolfe–Zoutendijk global convergence

Outline

1. General iterative scheme

2. Backtracking Armijo line-search
   - Global convergence of backtracking Armijo line-search
   - Global convergence of steepest descent

3. Wolfe–Zoutendijk global convergence
   - The Wolfe conditions
   - The Armijo–Goldstein conditions

4. Algorithms for line-search
   - Armijo Parabolic-Cubic search
   - Wolfe linesearch

SLIDE 37

Wolfe–Zoutendijk global convergence

The Wolfe and Armijo–Goldstein conditions

1. The simple condition of a descent step is in general not enough for the convergence of an iterative minimization scheme.

2. The sufficient decrease condition of backtracking Armijo line-search may be insufficient in a general inexact line-search algorithm.

3. Adding to the sufficient decrease condition another condition that avoids too short step lengths, we obtain a globally convergent numerical procedure.

4. Depending on which additional condition is added, we obtain:
   1. the Wolfe conditions;
   2. the Armijo–Goldstein conditions.

SLIDE 38

Wolfe–Zoutendijk global convergence The Wolfe conditions

The Wolfe conditions

Let c1 and c2 be two constants such that 0 < c1 < c2 < 1. We say that the step length αk satisfies the Wolfe conditions if:

1. sufficient decrease: f(xk + αkpk) ≤ f(xk) + c1 αk ∇f(xk)pk;
2. curvature condition: ∇f(xk + αkpk)pk ≥ c2 ∇f(xk)pk.

[Figure: f(xk + αpk) against the sufficient decrease line f(xk) + αc1∇f(xk)pk.]

SLIDE 39

Wolfe–Zoutendijk global convergence The Wolfe conditions

The strong Wolfe conditions

Let c1 and c2 be two constants such that 0 < c1 < c2 < 1. We say that the step length αk satisfies the strong Wolfe conditions if:

1. sufficient decrease: f(xk + αkpk) ≤ f(xk) + c1 αk ∇f(xk)pk;
2. curvature condition: |∇f(xk + αkpk)pk| ≤ c2 |∇f(xk)pk|.

[Figure: f(xk + αpk) against the sufficient decrease line f(xk) + αc1∇f(xk)pk.]
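A small helper that tests both variants for a given α (an illustrative sketch, not from the slides; gradients are NumPy vectors and the defaults c1 = 10⁻⁴, c2 = 0.9 are common choices):

```python
def check_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9, strong=False):
    """True iff alpha satisfies the (strong) Wolfe conditions along p."""
    slope0 = grad(x) @ p                                  # must be < 0
    sufficient = f(x + alpha * p) <= f(x) + c1 * alpha * slope0
    slope_a = grad(x + alpha * p) @ p
    if strong:
        curvature = abs(slope_a) <= c2 * abs(slope0)      # strong Wolfe
    else:
        curvature = slope_a >= c2 * slope0                # plain Wolfe
    return sufficient and curvature
```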

SLIDE 40

Wolfe–Zoutendijk global convergence The Wolfe conditions

Existence of a "Wolfe" step length

The Wolfe conditions may seem quite restrictive. The next lemma answers the question of whether a step length satisfying the Wolfe conditions exists.

Lemma (strong Wolfe step length)

Let f : ℝⁿ → ℝ satisfy the regularity assumption. If the following conditions hold:

1. pk is a descent direction at the point xk, i.e. ∇f(xk)pk < 0;
2. f(xk + αpk) is bounded from below, i.e. lim_{α→∞} f(xk + αpk) > −∞;

then for any 0 < c1 < c2 < 1 there exists an interval [a, b] such that all αk ∈ [a, b] satisfy the strong Wolfe conditions.

SLIDE 41

Wolfe–Zoutendijk global convergence The Wolfe conditions

Proof.

Define ℓ(α) = f(xk) + αc1∇f(xk)pk and g(α) = f(xk + αpk). From lim_{α→∞} ℓ(α) = −∞, the boundedness of g from below, and condition 1, it follows that there exists α⋆ > 0 such that ℓ(α⋆) = g(α⋆) and ℓ(α) > g(α), ∀α ∈ (0, α⋆), so that every step length α ∈ (0, α⋆) satisfies strong Wolfe condition 1. Because ℓ(0) = g(0), by the mean value theorem there exists α⋆⋆ ∈ (0, α⋆) such that

    g′(α⋆⋆) = ℓ′(α⋆⋆)  ⇒  ∇f(xk + α⋆⋆pk)pk = c1∇f(xk)pk > c2∇f(xk)pk,

and since |c1∇f(xk)pk| < c2|∇f(xk)pk|, by continuity we find an interval around α⋆⋆ whose step lengths satisfy the strong Wolfe conditions.

SLIDE 42

Wolfe–Zoutendijk global convergence The Wolfe conditions

The Zoutendijk condition

Theorem (Zoutendijk)

Let f : ℝⁿ → ℝ satisfy the regularity assumption and be bounded from below, i.e.

    inf_{x∈ℝⁿ} f(x) > −∞.

Let {xk}, k = 0, 1, . . . be generated by a generic minimization algorithm whose line-search satisfies the Wolfe conditions. Then

    Σ_{k=1}^{∞} (cos θk)² ‖∇f(xk)‖² < +∞,  where cos θk = −∇f(xk)pk / (‖∇f(xk)‖ ‖pk‖).

SLIDE 43

Wolfe–Zoutendijk global convergence The Wolfe conditions

Proof.

(1/3).

Using the second Wolfe condition,

    ∇f(xk + αkpk)pk ≥ c2∇f(xk)pk
    (∇f(xk + αkpk) − ∇f(xk))pk ≥ (c2 − 1)∇f(xk)pk,

and by using the Lipschitz regularity,

    (∇f(xk + αkpk) − ∇f(xk))pk ≤ γ ‖xk+1 − xk‖ ‖pk‖ = αkγ‖pk‖².

Using both inequalities we obtain the estimate

    αk ≥ (c2 − 1)∇f(xk)pk / (γ‖pk‖²).

SLIDE 44

Wolfe–Zoutendijk global convergence The Wolfe conditions

Proof.

(2/3).

Using the first Wolfe condition and the estimate of αk,

    f(xk + αkpk) ≤ f(xk) + αkc1∇f(xk)pk ≤ f(xk) − (c1(1 − c2)/γ) (∇f(xk)pk)² / ‖pk‖².

Setting A = c1(1 − c2)/γ and using the definition of cos θk,

    f(xk+1) = f(xk + αkpk) ≤ f(xk) − A (cos θk)² ‖∇f(xk)‖²,

and by induction

    f(xk+1) ≤ f(x1) − A Σ_{j=1}^{k} (cos θj)² ‖∇f(xj)‖².

SLIDE 45

Wolfe–Zoutendijk global convergence The Wolfe conditions

Proof.

(3/3).

The function f(x) is bounded from below, i.e. inf_{x∈ℝⁿ} f(x) > −∞, so that

    A Σ_{j=1}^{k} (cos θj)² ‖∇f(xj)‖² ≤ f(x1) − f(xk+1)

and

    A Σ_{j=1}^{∞} (cos θj)² ‖∇f(xj)‖² ≤ f(x1) − lim_{k→∞} f(xk+1) < +∞.

SLIDE 46

Wolfe–Zoutendijk global convergence The Wolfe conditions

Corollary (Zoutendijk condition)

Let f : ℝⁿ → ℝ satisfy the regularity assumption and be bounded from below. Let {xk}, k = 0, 1, . . . be generated by a generic minimization algorithm whose line-search satisfies the Wolfe conditions. Then

    cos θk ‖∇f(xk)‖ → 0,  where cos θk = −∇f(xk)pk / (‖∇f(xk)‖ ‖pk‖).

Remark

If cos θk ≥ δ > 0 for all k, then from the Zoutendijk condition we have ‖∇f(xk)‖ → 0, i.e. the generic minimization algorithm with a line-search satisfying the Wolfe conditions converges to a stationary point.

SLIDE 47

Wolfe–Zoutendijk global convergence The Armijo-Goldstein conditions

The Armijo-Goldstein conditions

Let c1 and c2 be two constants such that 0 < c1 < c2 < 1. We say that the step length αk satisfies the Armijo–Goldstein conditions if:

1. f(xk + αkpk) ≤ f(xk) + c1 αk ∇f(xk)pk;
2. f(xk + αkpk) ≥ f(xk) + c2 αk ∇f(xk)pk.

[Figure: f(xk + αpk) between the lines f(xk) + αc1∇f(xk)pk and f(xk) + αc2∇f(xk)pk.]
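An analogous illustrative checker for these conditions (same conventions as the Wolfe sketch above; the defaults c1 = 0.25, c2 = 0.75 are an assumption):

```python
def check_armijo_goldstein(f, grad, x, p, alpha, c1=0.25, c2=0.75):
    """True iff alpha satisfies the Armijo-Goldstein conditions along p."""
    slope0 = grad(x) @ p                        # < 0 for a descent direction
    f_new = f(x + alpha * p)
    upper = f(x) + c1 * alpha * slope0          # condition 1: enough decrease
    lower = f(x) + c2 * alpha * slope0          # condition 2: step not too short
    return lower <= f_new <= upper
```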

SLIDE 48

Wolfe–Zoutendijk global convergence The Armijo-Goldstein conditions

The Armijo-Goldstein conditions

1. The Armijo–Goldstein conditions have theoretical properties very similar to those of the Wolfe conditions.

2. Global convergence theorems can be established.

3. The weakness of the Armijo–Goldstein conditions with respect to the Wolfe conditions is that the former can exclude local minimizers of α ↦ f(xk + αpk) from the admissible step lengths, as you can see in the figure below.

[Figure: f(xk + αpk) with the lines f(xk) + αc1∇f(xk)pk and f(xk) + αc2∇f(xk)pk; the admissible interval excludes the one-dimensional minimizer.]

SLIDE 49

Algorithms for line-search

Outline

1. General iterative scheme

2. Backtracking Armijo line-search
   - Global convergence of backtracking Armijo line-search
   - Global convergence of steepest descent

3. Wolfe–Zoutendijk global convergence
   - The Wolfe conditions
   - The Armijo–Goldstein conditions

4. Algorithms for line-search
   - Armijo Parabolic-Cubic search
   - Wolfe linesearch

SLIDE 50

Algorithms for line-search Armijo Parabolic-Cubic search

Armijo Parabolic-Cubic search

1. Backtracking-Armijo line-search can be slow if a large number of reductions of f must be performed to satisfy the Armijo condition.

2. Better performance is obtained if, instead of reducing the step by a fixed factor, we use polynomial interpolation to estimate the location of the minimum.

3. Since f(xk) and ∇f(xk)pk are known, at the first step we also know f(xk + λpk), where λ is the first trial step.

4. In this case a parabolic interpolation can be used to estimate the minimum (see the sketch after this list).

5. If we store the last trial step length, in the successive iterations we can use cubic interpolation to estimate the minimum.

6. The resulting algorithm is given in the following slides.
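The parabola q(λ) with q(0) = f(xk), q′(0) = ∇f(xk)pk and q(λ₁) = f(xk + λ₁pk) has its minimizer at λ = −q′(0)λ₁² / (2(q(λ₁) − q(0) − λ₁q′(0))); for the first trial λ₁ = 1 this reduces to ∇f0/(2(f0 + ∇f0 − fλ)), the formula appearing in the algorithm below. A one-line Python sketch (illustrative; safeguards omitted):

```python
def parabolic_step(f0, df0, f1):
    """Minimizer of the parabola with q(0) = f0, q'(0) = df0, q(1) = f1."""
    # q(t) = f0 + df0*t + (f1 - f0 - df0)*t^2, so q'(t) = 0 at the value below.
    return -df0 / (2.0 * (f1 - f0 - df0))
```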

SLIDE 51

Algorithms for line-search Armijo Parabolic-Cubic search

Algorithm (Armijo Parabolic-Cubic search (1/3))

armijo_linesearch(f, x, p, c1)
    f0 ← f(x); ∇f0 ← ∇f(x)p; λ ← 1;
    while λ ≥ λmin do
        fλ ← f(x + λp);
        if fλ ≤ f0 + λ c1 ∇f0 then
            return λ;  -- successful search
        else
            if λ = 1 then
                λtmp ← ∇f0 / (2(f0 + ∇f0 − fλ));
            else
                λtmp ← cubic(f0, ∇f0, fλ, λ, fp, λp);
            end if
            λp ← λ; fp ← fλ;
            λ ← range(λtmp, λ/10, λ/2);
        end if
    end while
    return λmin;  -- failed search

SLIDE 52

Algorithms for line-search Armijo Parabolic-Cubic search

Algorithm (Armijo Parabolic-Cubic search (2/3))

range(λ, a, b)
    if λ < a then return a;
    else if λ > b then return b;
    else return λ;
    end if

SLIDE 53

Algorithms for line-search Armijo Parabolic-Cubic search

Algorithm (Armijo Parabolic-Cubic search (3/3))

cubic(f0, ∇f0, fλ, λ, fp, λp)
    Evaluate:

        [ a ]           1         [  λp²  −λ² ] [ fλ − f0 − λ ∇f0  ]
        [ b ]  =  -------------   [ −λp³   λ³ ] [ fp − f0 − λp ∇f0 ]
                  λ²λp²(λ − λp)

    if a = 0 then
        return −∇f0 / (2b);  -- the cubic degenerates to a quadratic
    else
        d ← b² − 3a∇f0;  -- discriminant
        return (−b + √d) / (3a);  -- legitimate cubic
    end if
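A direct Python rendering of this interpolation routine (an illustrative sketch mirroring the pseudo-code; no extra safeguarding):

```python
import math

def cubic_step(f0, df0, f_lam, lam, f_prev, lam_prev):
    """Minimizer of the cubic fitted to f0, df0 at 0 and to the two trials."""
    r1 = f_lam - f0 - lam * df0
    r2 = f_prev - f0 - lam_prev * df0
    det = lam**2 * lam_prev**2 * (lam - lam_prev)
    a = (lam_prev**2 * r1 - lam**2 * r2) / det
    b = (-lam_prev**3 * r1 + lam**3 * r2) / det
    if a == 0.0:                       # the cubic degenerates to a quadratic
        return -df0 / (2.0 * b)
    d = b * b - 3.0 * a * df0          # discriminant of c'(lambda) = 0
    return (-b + math.sqrt(d)) / (3.0 * a)
```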

SLIDE 54

Algorithms for line-search Wolfe linesearch

Wolfe linesearch

1. The Wolfe line-search is identical to the Armijo Parabolic-Cubic search until a point satisfying the first condition is found.

2. At this point the Armijo algorithm stops, while the Wolfe search keeps refining the step until the second condition is also satisfied.

3. If the estimated step is too short, it is enlarged until the bracket contains a minimum.

4. If the estimated step is too long, it is reduced until the second condition is satisfied.
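For experimentation, SciPy ships a line search enforcing the strong Wolfe conditions; a minimal usage sketch (reusing the Rosenbrock helpers defined earlier):

```python
import numpy as np
from scipy.optimize import line_search

x = np.array([-1.2, 1.0])
g = rosenbrock_grad(x)
p = -g                                             # steepest descent direction
alpha = line_search(rosenbrock, rosenbrock_grad, x, p, c1=1e-4, c2=0.9)[0]
print(alpha)                                       # None if the search failed
```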

SLIDE 55

Algorithms for line-search Wolfe linesearch

Algorithm (Wolfe linesearch (1/3))

wolfe_linesearch(f, x, p, c1, c2)
    f0 ← f(x); ∇f0 ← ∇f(x)p; λ ← 1;
    while λ ≥ λmin do
        fλ ← f(x + λp);
        if fλ ≤ f0 + λ c1 ∇f0 then
            go to ZOOM;  -- found a λ satisfying condition 1
        else
            if λ = 1 then
                λtmp ← ∇f0 / (2(f0 + ∇f0 − fλ));
            else
                λtmp ← cubic(f0, ∇f0, fλ, λ, fp, λp);
            end if
            λp ← λ; fp ← fλ;
            λ ← range(λtmp, λ/10, λ/2);
        end if
    end while
    return λmin;  -- failed search

SLIDE 56

Algorithms for line-search Wolfe linesearch

Algorithm (Wolfe linesearch (2/3))

ZOOM:
    ∇fλ ← ∇f(x + λp)p;
    if ∇fλ ≥ c2∇f0 then return λ;  -- found a Wolfe point!
    if λ = 1 then  -- forward search for an interval bracketing a minimum
        while λ ≤ λmax do
            {λp, fp} ← {λ, fλ};  -- save values
            λ ← 2λ; fλ ← f(x + λp);
            if not fλ ≤ f0 + λ c1 ∇f0 then
                {λp, fp} ⇋ {λ, fλ};  -- swap values
                go to REFINE;
            end if
            ∇fλ ← ∇f(x + λp)p;
            if ∇fλ ≥ c2∇f0 then return λ;  -- found a Wolfe point!
        end while
        return λmax;  -- failed search
    end if

SLIDE 57

Algorithms for line-search Wolfe linesearch

Algorithm (Wolfe linesearch (3/3))

REFINE:
    {λlo, flo, ∇flo} ← {λ, fλ, ∇fλ};
    ∆ ← λp − λlo;
    while ∆ > ε do
        δλ ← ∆² ∇flo / (2(flo + ∇flo ∆ − fp));
        δλ ← range(δλ, 0.2∆, 0.8∆);
        λ ← λlo + δλ; fλ ← f(x + λp);
        if fλ ≤ f0 + λ c1 ∇f0 then
            ∇fλ ← ∇f(x + λp)p;
            if ∇fλ ≥ c2∇f0 then return λ;  -- found a Wolfe point!
            {λlo, flo, ∇flo} ← {λ, fλ, ∇fλ};
            ∆ ← ∆ − δλ;
        else
            {λp, fp} ← {λ, fλ};
            ∆ ← δλ;
        end if
    end while
    return λ;  -- failed search

SLIDE 58

References

- J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, Texts in Applied Mathematics 12, 2002.

- J. E. Dennis, Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Classics in Applied Mathematics 16, 1996.