
Spectral properties of steplength selections in gradient methods: from unconstrained to constrained optimization

L. Zanni
Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, Italy

Variational Methods and Optimization in Imaging, IHP - Paris, 4 - 8 February 2019

Joint work with:
S. Crisci, V. Ruggiero, University of Ferrara, Italy
F. Porta, University of Modena and Reggio Emilia, Italy

L. Zanni Spectral properties of steplength selections in gradient methods Paris, 4 - 8 February 2019

Outline

1 Gradient methods for unconstrained problems
  - Spectral properties of steplength selections
  - Design selection rules by exploiting spectral properties
  - From the quadratic case to general unconstrained problems
2 Gradient projection methods for box-constrained problems
  - Spectral properties of steplengths in the quadratic case
  - New steplength rules taking into account the constraints
3 Scaled gradient projection methods
  - Define the diagonal scaling
  - The steplengths in variable metric approaches
  - Practical behaviour in imaging
4 Conclusions

Motivation for the steplength analysis

Constrained optimization problems

$$\min_{x \in \Omega} f(x) \qquad (1)$$

- $f : \mathbb{R}^N \to \mathbb{R}$ continuously differentiable function
- $\Omega \subset \mathbb{R}^N$ nonempty closed convex set defined by simple constraints

Gradient Projection (GP) methods for $\min_{x \in \Omega} f(x)$:

$$x^{(k+1)} = x^{(k)} + \vartheta_k d^{(k)}, \qquad d^{(k)} = P_\Omega\big(x^{(k)} - \alpha_k \nabla f(x^{(k)})\big) - x^{(k)}$$

$$\alpha_k > 0, \qquad \vartheta_k \in (0, 1], \qquad P_\Omega(x) = \operatorname{argmin}_{z \in \Omega} \|z - x\|$$

Usually the updating rules for the steplength $\alpha_k$ are those exploited in the unconstrained case: is this a suitable choice?
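As a rough illustration of the GP iteration above, the following sketch performs a single step when $\Omega$ is a box, so that $P_\Omega$ reduces to componentwise clipping. The function names `project_box` and `gp_step`, the objective $f(x) = \frac{1}{2}\|x - c\|^2$ and all numerical values are assumptions made for this example; they do not come from the slides.

```python
import numpy as np

def project_box(x, lower, upper):
    """Euclidean projection onto the box {z : lower <= z <= upper}."""
    return np.clip(x, lower, upper)

def gp_step(x, grad, alpha, theta, lower, upper):
    """One GP step: x + theta * d, with d = P(x - alpha * grad) - x."""
    d = project_box(x - alpha * grad, lower, upper) - x
    return x + theta * d

# Hypothetical example: f(x) = 0.5 * ||x - c||^2 on the box [0, 1]^3,
# whose gradient is x - c.
c = np.array([2.0, -1.0, 0.5])
x = np.array([0.5, 0.5, 0.5])
lower, upper = np.zeros(3), np.ones(3)

x_new = gp_step(x, x - c, alpha=1.0, theta=1.0, lower=lower, upper=upper)
print(x_new)
```

With $\alpha_k = 1$ and $\vartheta_k = 1$ the step lands on the projection of $c$ onto the box, which is the constrained minimizer of this particular objective.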

Spectral analysis of steplength selections
➤ The unconstrained case
➤ The box-constrained case
➤ The Scaled Gradient Projection methods

Steplength selection: the unconstrained case

The recipe exploited by state-of-the-art selection rules:
- define steplengths by trying to capture, in an inexpensive way, some second order information
- design selection rules in the strictly convex quadratic case: $f(x) = \frac{1}{2} x^T A x - b^T x$, $A$ symmetric positive definite
  second order information ↔ spectral properties of $A$
- design selection rules that generalize, in an inexpensive way, to non-quadratic cases
  $\nabla^2 f(x^{(k)})$ depends on the iterations but $\nabla^2 f(x^{(k)}) \to \nabla^2 f(x^*)$

A popular example: the Barzilai-Borwein (BB) selection rules

Consider the gradient method for the problem $\min f(x)$:

$$x^{(k+1)} = x^{(k)} - \alpha_k \nabla f(x^{(k)}), \qquad k = 0, 1, \ldots$$

Suggestion [Barzilai-Borwein, IMA J. Num. Anal. 1988]: force the matrix $(\alpha_k I)^{-1}$ to approximate the Hessian $\nabla^2 f(x^{(k)})$ by imposing quasi-Newton properties

$$\alpha_k^{BB1} = \operatorname{argmin}_{\alpha \in \mathbb{R}} \big\| (\alpha I)^{-1} s^{(k-1)} - z^{(k-1)} \big\| = \frac{s^{(k-1)T} s^{(k-1)}}{s^{(k-1)T} z^{(k-1)}}$$

or

$$\alpha_k^{BB2} = \operatorname{argmin}_{\alpha \in \mathbb{R}} \big\| s^{(k-1)} - (\alpha I) z^{(k-1)} \big\| = \frac{s^{(k-1)T} z^{(k-1)}}{z^{(k-1)T} z^{(k-1)}}$$

where $s^{(k-1)} = x^{(k)} - x^{(k-1)}$ and $z^{(k-1)} = \nabla f(x^{(k)}) - \nabla f(x^{(k-1)})$.
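The two BB formulas above translate almost literally into code. The sketch below is only an illustration: the helper name `bb_steplengths`, the 3x3 diagonal matrix and the first steplength 0.05 are assumptions chosen for the example, and the formulas presuppose $s^{(k-1)T} z^{(k-1)} > 0$, which holds for strictly convex quadratics.

```python
import numpy as np

def bb_steplengths(x_prev, x_curr, g_prev, g_curr):
    """Return (BB1, BB2) from two consecutive iterates and their gradients."""
    s = x_curr - x_prev          # s^(k-1)
    z = g_curr - g_prev          # z^(k-1)
    sz = s @ z                   # assumed positive (strictly convex case)
    bb1 = (s @ s) / sz
    bb2 = sz / (z @ z)
    return bb1, bb2

# Hypothetical quadratic f(x) = 0.5 x^T A x - b^T x with A = diag(1, 4, 10).
A = np.diag([1.0, 4.0, 10.0])
b = np.array([1.0, -2.0, 3.0])

def grad(x):
    return A @ x - b

x0 = np.zeros(3)
x1 = x0 - 0.05 * grad(x0)        # one gradient step with an arbitrary steplength
bb1, bb2 = bb_steplengths(x0, x1, grad(x0), grad(x1))
print(bb1, bb2)
```

In this example both values lie between $1/10$ and $1$, the reciprocals of the extreme eigenvalues of $A$, and BB2 does not exceed BB1.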

Spectral properties of the BB steplength rules

Consider a gradient method for the quadratic unconstrained case:

$$\min f(x) \equiv \frac{1}{2} x^T A x - b^T x, \qquad A = \operatorname{diag}(\lambda_1, \ldots, \lambda_N), \qquad 0 < \lambda_1 < \cdots < \lambda_N$$

$$x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}, \qquad g^{(k)} = \nabla f(x^{(k)}), \qquad k = 0, 1, \ldots$$

➩ $g_i^{(k+1)} = (1 - \alpha_k \lambda_i)\, g_i^{(k)}, \quad i = 1, \ldots, N$

- $\alpha_k = \frac{1}{\lambda_i} \;\Rightarrow\; g_i^{(k+1)} = 0 \;\Rightarrow\; g_i^{(k+j)} = 0, \quad j = 2, 3, \ldots$
- $\alpha_{k+i-1} = \frac{1}{\lambda_i}, \; i = 1, \ldots, N \;\Rightarrow\; g^{(k+N)} = 0$ (Finite Termination)

$\alpha_k$ must aim at approximating the inverse of the eigenvalues of $A$
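The finite termination property above can be checked numerically. The sketch below uses a hypothetical 4x4 diagonal problem (eigenvalues and right-hand side chosen arbitrarily for illustration) and takes the steplengths $1/\lambda_1, \ldots, 1/\lambda_N$ in sequence.

```python
import numpy as np

# Hypothetical data: A = diag(lam), f(x) = 0.5 x^T A x - b^T x.
lam = np.array([1.0, 2.0, 5.0, 10.0])
A = np.diag(lam)
b = np.array([1.0, -1.0, 2.0, 0.5])

x = np.zeros(4)
g = A @ x - b
g0_norm = np.linalg.norm(g)

# Use each reciprocal eigenvalue exactly once as steplength.
for lam_i in lam:
    x = x - (1.0 / lam_i) * g    # zeroes the i-th gradient component
    g = A @ x - b

final_ratio = np.linalg.norm(g) / g0_norm
print(final_ratio)
```

Up to rounding errors, the gradient vanishes after $N = 4$ steps, even though the intermediate steps increase some components (for instance, the first step multiplies the last component by $1 - 10 = -9$).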

BB rules in the quadratic case

$$\frac{1}{\lambda_N} \;\le\; \alpha_k^{BB2} = \frac{g^{(k-1)T} A\, g^{(k-1)}}{g^{(k-1)T} A^2 g^{(k-1)}} \;\le\; \alpha_k^{BB1} = \frac{g^{(k-1)T} g^{(k-1)}}{g^{(k-1)T} A\, g^{(k-1)}} \;\le\; \frac{1}{\lambda_1}$$

Example
- $A = \operatorname{diag}(\lambda_1, \ldots, \lambda_{10})$, $\lambda_i = 111\, i - 110$
- $f(x) = \frac{1}{2} x^T A x - b^T x$
- $b$ random vector; $b_i \in [-10, 10]$
- stopping rule: $\|g^{(k)}\| \le 10^{-8} \|g^{(0)}\|$
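A minimal sketch of the toy example above, run with the BB1 and BB2 rules. Several details are assumptions not stated on the slide: the starting point $x^{(0)} = 0$, the Cauchy steplength used at the first iteration, the random seed, and the function name `run_bb`.

```python
import numpy as np

def run_bb(lam, b, rule="BB1", tol=1e-8, max_iter=10000):
    """Gradient method with BB steplengths for f(x) = 0.5 x^T A x - b^T x, A = diag(lam)."""
    x = np.zeros_like(b)                      # assumption: start from the origin
    g = lam * x - b
    g0_norm = np.linalg.norm(g)
    alpha = (g @ g) / (g @ (lam * g))         # assumption: Cauchy step at k = 0
    alphas = []
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol * g0_norm:
            return x, k, alphas
        Ag = lam * g
        # Steplength for the next iteration, built from the current gradient.
        if rule == "BB1":
            alpha_next = (g @ g) / (g @ Ag)
        else:
            alpha_next = (g @ Ag) / (Ag @ Ag)
        alphas.append(alpha)
        x = x - alpha * g
        g = lam * x - b
        alpha = alpha_next
    return x, max_iter, alphas

lam = 111.0 * np.arange(1, 11) - 110.0        # eigenvalues 1, 112, ..., 1000
rng = np.random.default_rng(0)                # hypothetical seed
b = rng.uniform(-10.0, 10.0, size=10)

x_bb1, it_bb1, alphas_bb1 = run_bb(lam, b, "BB1")
x_bb2, it_bb2, alphas_bb2 = run_bb(lam, b, "BB2")
print(it_bb1, it_bb2)
```

Every steplength produced here is a Rayleigh-quotient-type ratio, so it stays in $[1/\lambda_N, 1/\lambda_1] = [10^{-3}, 1]$, in agreement with the bounds above.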

Quadratic case: exploiting spectral properties

In the quadratic case ($A = \operatorname{diag}(\lambda_1, \ldots, \lambda_N)$, $0 < \lambda_1 < \cdots < \lambda_N$), we have

• $g_j^{(k+1)} = (1 - \alpha_k \lambda_j)\, g_j^{(k)}, \quad j = 1, \ldots, N$

• $\alpha_k \approx \frac{1}{\lambda_i} \;\Rightarrow$
  - $|g_i^{(k+1)}| \ll |g_i^{(k)}|$ (very useful)
  - $|g_j^{(k+1)}| < |g_j^{(k)}|$ if $j < i$ (useful)
  - $|g_j^{(k+1)}| > |g_j^{(k)}|$ if $j > i$, $\lambda_j > 2 \lambda_i$ (dangerous)

• $\alpha_k^{BB2} / \alpha_k^{BB1} = \cos^2\big(g^{(k-1)}, A\, g^{(k-1)}\big)$

Idea for improving the BB rules:
- force a sequence of small $\alpha_k^{BB2}$ to reduce $|g_i|$ for large $i$, leading to gradients in which these components are not dominant
- after a sequence of small $\alpha_k$, if $\alpha_k^{BB2} / \alpha_k^{BB1} \approx 1$, exploit $\alpha_k^{BB1} = \frac{g^T g}{g^T A g}$, aiming at obtaining $\alpha_k^{BB1} \approx 1/\lambda_i$ for small $i$
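The three regimes and the cosine identity above can be illustrated on a small hypothetical spectrum (the eigenvalues and the all-ones gradient below are arbitrary choices for the example).

```python
import numpy as np

lam = np.array([1.0, 2.0, 5.0, 20.0])   # hypothetical eigenvalues
g = np.ones(4)                           # hypothetical current gradient

# Take alpha = 1 / lam[1], i.e. target the eigenvalue 2 (0-based index 1).
alpha = 1.0 / lam[1]
factors = np.abs(1.0 - alpha * lam)      # per-component factors |1 - alpha * lam_j|
print(factors)

# Ratio BB2 / BB1 versus cos^2 of the angle between g and A g.
Ag = lam * g
bb1 = (g @ g) / (g @ Ag)
bb2 = (g @ Ag) / (Ag @ Ag)
cos2 = (g @ Ag) ** 2 / ((g @ g) * (Ag @ Ag))
print(bb2 / bb1, cos2)
```

The targeted component is annihilated, the component with the smaller eigenvalue is halved, and the two components with $\lambda_j > 2\lambda_i = 4$ are amplified (by factors 1.5 and 9).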

Practical implementations of this idea: ABB and ABBmin rules

Alternate Barzilai-Borwein selection rule [Zhou-Gao-Dai, COAP (2006)]

$$\alpha_k^{ABB} = \begin{cases} \alpha_k^{BB2} & \text{if } \alpha_k^{BB2} / \alpha_k^{BB1} < \tau, \quad \tau \in (0, 1) \\ \alpha_k^{BB1} & \text{otherwise} \end{cases}$$

ABBmin rule [Frassoldati-Zanghirati-Zanni, JIMO (2008)]

$$\alpha_k^{ABBmin} = \begin{cases} \min\big\{ \alpha_j^{BB2} \;|\; j = \max\{1, k - M_\alpha\}, \ldots, k \big\} & \text{if } \alpha_k^{BB2} / \alpha_k^{BB1} < \tau \\ \alpha_k^{BB1} & \text{otherwise} \end{cases}$$

where $M_\alpha > 0$ is a parameter.
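One possible way to code the ABBmin rule above for a diagonal quadratic is sketched below. The values $\tau = 0.5$ and $M_\alpha = 5$, the starting point, the initial Cauchy steplength, the seed and the function name `abbmin` are illustrative assumptions, not settings reported on the slides.

```python
import numpy as np

def abbmin(lam, b, tau=0.5, m_alpha=5, tol=1e-8, max_iter=10000):
    """Gradient method with the ABBmin steplength for f(x) = 0.5 x^T A x - b^T x, A = diag(lam)."""
    x = np.zeros_like(b)                      # assumption: start from the origin
    g = lam * x - b
    g0_norm = np.linalg.norm(g)
    alpha = (g @ g) / (g @ (lam * g))         # assumption: Cauchy step at k = 0
    bb2_history = []
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol * g0_norm:
            return x, k
        Ag = lam * g
        bb1 = (g @ g) / (g @ Ag)              # BB1 for the next iteration
        bb2 = (g @ Ag) / (Ag @ Ag)            # BB2 for the next iteration
        bb2_history.append(bb2)
        x = x - alpha * g
        g = lam * x - b
        if bb2 / bb1 < tau:
            # Smallest BB2 value over the last m_alpha + 1 iterations.
            alpha = min(bb2_history[-(m_alpha + 1):])
        else:
            alpha = bb1
    return x, max_iter

# Toy problem with eigenvalues lam_i = 111 i - 110, i = 1, ..., 10.
lam = 111.0 * np.arange(1, 11) - 110.0
rng = np.random.default_rng(0)                # hypothetical seed
b = rng.uniform(-10.0, 10.0, size=10)

x_abbmin, it_abbmin = abbmin(lam, b)
print(it_abbmin)
```

Replacing the `min(...)` branch with `alpha = bb2` gives the plain ABB rule, so the two variants can be compared with the same loop.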

ABB and ABBmin rules on the previous toy problem

[Figure: steplengths $\alpha_k$ versus iterations, logarithmic scale, for ABB (left panel) and ABBmin (right panel).]

[Figure: error $\|x_k - x^*\| / \|x^*\|$ versus iterations, logarithmic scale, for CSD, BB1, BB2, ABB and ABBmin.]

- Cauchy Steepest Descent (CSD): $\alpha_k = \operatorname{argmin}_{\alpha > 0} f(x^{(k)} - \alpha g^{(k)})$
- BB1 → $\alpha_k = \alpha_k^{BB1}$
- BB2 → $\alpha_k = \alpha_k^{BB2}$
- ABB → alternation
- ABBmin → modified alternation

Similar behaviour on randomly generated test problems

Quadratic test problems: $N = 1000$
- $\lambda_N = 10^4$, $\lambda_1 = 1$, $\lambda_i$, $i = 2, \ldots, N-1$, log-spaced
- $\overline{\lambda} = 10^3$, $\underline{\lambda} = 1$, $\lambda_i = \underline{\lambda} + (\overline{\lambda} - \underline{\lambda})\, s_i$, $i = 1, \ldots, N$, with $s_i \in (0, 0.2)$, $i = 1, \ldots, N/2$, and $s_i \in (0.8, 1)$, $i = N/2 + 1, \ldots, N$

[Di Serafino-Ruggiero-Toraldo-Z., AMC 2018]
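The two spectra described above could be generated as in the following sketch. The slide does not say how the $s_i$ are drawn or how the log spacing is realized, so the uniform sampling, the use of `np.logspace` and the seed are assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)     # hypothetical seed
N = 1000

# First family: lam_1 = 1, lam_N = 1e4, intermediate eigenvalues log-spaced
# (assumption: deterministic logarithmic grid).
lam_log = np.logspace(0.0, 4.0, N)

# Second family: two clusters near the ends of [1, 1e3]
# (assumption: s_i sampled uniformly in the prescribed intervals).
lam_lo, lam_hi = 1.0, 1.0e3
s = np.concatenate([
    rng.uniform(0.0, 0.2, N // 2),        # s_i in (0, 0.2), first half
    rng.uniform(0.8, 1.0, N - N // 2),    # s_i in (0.8, 1), second half
])
lam_clustered = lam_lo + (lam_hi - lam_lo) * s

print(lam_log[0], lam_log[-1], lam_clustered.min(), lam_clustered.max())
```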

Other efficient steplength rules based on spectral properties

[Pronzato-Zhigljavsky, Comput. Optim. Appl. 50 (2011)]
[Fletcher, Math. Program. Ser. A 135 (2012)]
[Pronzato-Zhigljavsky-Bukina, Acta Appl. Math. 127 (2013)]
[De Asmundis-Di Serafino-Riccio-Toraldo, IMA J. Numer. Anal. 33 (2013)]
[De Asmundis-Di Serafino-Hager-Toraldo-Zhan, Comput. Optim. Appl. 59 (2014)]
[Gonzaga-Schneider, Comput. Optim. Appl. 63 (2016)]
[Gonzaga, Math. Program. Ser. A 160 (2016)]

- Aimed at breaking the well-known cycling behaviour of the Steepest Descent method
- they share R-linear convergence rate in the quadratic case
- not all these rules easily generalize to general non-quadratic problems (BB-based rules have this crucial property)

General unconstrained problems: $\min_{x \in \mathbb{R}^N} f(x)$

Gradient methods with nonmonotone linesearch:

Init.: $x^{(0)} \in \mathbb{R}^N$, $0 < \alpha_{min} \le \alpha_{max}$, $\alpha_0 \in [\alpha_{min}, \alpha_{max}]$, $\delta, \sigma \in (0, 1)$, $M \in \mathbb{N}$;
for $k = 0, 1, \ldots$
    $f_{ref} = \max\{ f(x^{(k-j)}),\; 0 \le j \le \min(k, M) \}$;
    $\nu_k = \alpha_k$;
    while $f(x^{(k)} - \nu_k g^{(k)}) > f_{ref} - \sigma \nu_k\, g^{(k)T} g^{(k)}$    (line search)
        $\nu_k = \delta \nu_k$;
    end
    $x^{(k+1)} = x^{(k)} - \nu_k g^{(k)}$;
    define a tentative steplength $\alpha_{k+1} \in [\alpha_{min}, \alpha_{max}]$
end

➤ tentative steplength: exploit effective steplength selections designed for the quadratic case and generalizable in an inexpensive way.
➤ R-linear convergence of $\{ f(x^{(k)}) \}$ when $f$ is strongly convex with Lipschitz-continuous gradient ([Dai, JOTA 2002], [Dai-Liao, IMA J. Num. Anal. 2002])
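The scheme above might be coded as follows. This is a sketch under stated assumptions: BB1 is picked as the tentative steplength (any of the rules discussed earlier could be substituted), the fallback to $\alpha_{max}$ when $s^T z \le 0$ is a common safeguard rather than something stated on the slide, and the parameter defaults, the gradient-norm stopping test and the test function are hypothetical.

```python
import numpy as np

def nonmonotone_gradient(f, grad, x0, alpha0=1.0, alpha_min=1e-10, alpha_max=1e10,
                         delta=0.5, sigma=1e-4, M=10, tol=1e-6, max_iter=5000):
    """Gradient method with nonmonotone line search and BB1 tentative steplength."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    alpha = alpha0
    f_hist = [f(x)]
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:          # assumption: absolute gradient test
            break
        f_ref = max(f_hist[-(M + 1):])        # max of f over the last M + 1 iterates
        nu, gg = alpha, g @ g
        while f(x - nu * g) > f_ref - sigma * nu * gg:   # line search
            nu *= delta
        x_new = x - nu * g
        g_new = grad(x_new)
        s, z = x_new - x, g_new - g
        sz = s @ z
        # Tentative BB1 steplength, kept inside [alpha_min, alpha_max].
        alpha = float(np.clip((s @ s) / sz, alpha_min, alpha_max)) if sz > 0 else alpha_max
        x, g = x_new, g_new
        f_hist.append(f(x))
    return x, len(f_hist) - 1

# Hypothetical strongly convex test function with minimizer x* = 0.
c = np.array([1.0, 10.0, 100.0])

def f(x):
    return 0.5 * np.sum(c * x ** 2) + 0.25 * np.sum(x ** 4)

def grad_f(x):
    return c * x + x ** 3

x_star, iters = nonmonotone_gradient(f, grad_f, np.array([1.0, -2.0, 3.0]))
print(iters, np.linalg.norm(x_star))
```

Because the reference value is the maximum over the last $M + 1$ function values, the tentative BB steplength is accepted more often than it would be under a monotone Armijo test, which preserves much of its behaviour from the quadratic case.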
