New class of limited-memory variationally-derived variable metric methods¹

Jan Vlček, Ladislav Lukšan

Institute of Computer Science, Academy of Sciences of the Czech Republic; L. Lukšan is also from the Technical University of Liberec

¹ This work was supported by the Grant Agency of the Czech Academy of Sciences, project No. IAA1030405, the Grant Agency of the Czech Republic, and the Institutional research plan No. AV0Z10300504.

We present a new family of limited-memory variationally-derived variable metric (VM) line search methods with the quadratic termination property for unconstrained minimization. Starting with $x_0 \in R^N$, VM line search methods (see [6], [3]) generate iterates $x_{k+1} \in R^N$ by the process $x_{k+1} = x_k + s_k$, $s_k = t_k d_k$, where the direction vectors $d_k \in R^N$ are descent directions, i.e. $g_k^T d_k < 0$, $k \ge 0$, and the stepsizes $t_k > 0$ satisfy

$$f(x_{k+1}) - f(x_k) \le \varepsilon_1 t_k g_k^T d_k, \qquad g_{k+1}^T d_k \ge \varepsilon_2 g_k^T d_k, \tag{1}$$

$k \ge 0$, with $0 < \varepsilon_1 < 1/2$ and $\varepsilon_1 < \varepsilon_2 < 1$, where $f$ is the objective function and $g_k = \nabla f(x_k)$. We denote $y_k = g_{k+1} - g_k$, $k \ge 0$, and by $\|\cdot\|_F$ the Frobenius matrix norm. We describe the new family in Section 1 and, in Section 2, a correction formula that uses the previous vectors $s_{k-1}$, $y_{k-1}$. Numerical results are presented in Section 3.

1 A new family of limited-memory methods

Our methods are based on approximations $\bar H_k = U_k U_k^T$, $k > 0$, $\bar H_0 = 0$, of the inverse Hessian matrix, which are invariant under linear transformations (see [3] for the significance of the invariance property for ill-conditioned problems). Here the $N \times \min(k, m)$ matrices $U_k$, $1 \le m \ll N$, are obtained by limited-memory updates with scaling parameters $\gamma_k > 0$ (see [6]) that satisfy the quasi-Newton condition

$$\bar H_{k+1} y_k = \varrho_k s_k, \tag{2}$$

where $\varrho_k > 0$ is a nonquadratic correction parameter (see [6]). We frequently omit the index $k$, replace the index $k+1$ by the symbol $+$ and the index $k-1$ by the symbol $-$, and denote $V_r = I - r y^T / r^T y$ for $r \in R^N$, $r^T y \ne 0$ (projection matrix), and

$$B = H^{-1}, \quad b = s^T y > 0, \quad \bar a = y^T \bar H y, \quad \bar b = s^T B \bar H y, \quad \bar c = s^T B \bar H B s, \quad \bar\delta = \bar a \bar c - \bar b^2 \ge 0.$$

1.1 Variationally-derived invariant limited-memory method

Standard VM updates can be derived as updates with the minimum change of the VM matrix in the sense of some norm (see [6]). We extend this approach to limited-memory methods (see also [10], [12]), using the product form of the update and replacing the quasi-Newton condition $U_+ U_+^T y = \bar H_+ y = \varrho s$ equivalently by

$$U_+^T y = \sqrt{\gamma}\, z, \qquad U_+ (\sqrt{\gamma}\, z) = \varrho s, \qquad z^T z = (\varrho/\gamma)\, b. \tag{3}$$

Theorem 1.1. Let $T$ be a symmetric positive definite matrix, $\varrho > 0$, $\gamma > 0$, $z \in R^m$, $1 \le m \le N$, $p = Ty$ and $\mathcal{U}$ the set of $N \times m$ matrices. Then the unique solution to

$$\min \{ \varphi(U_+) : U_+ \in \mathcal{U} \} \ \text{ s.t. (3)}, \qquad \varphi(U_+) = y^T T y \, \| T^{-1/2} (U_+ - \sqrt{\gamma}\, U) \|_F^2,$$

is

$$\frac{1}{\sqrt{\gamma}}\, U_+ = \frac{s z^T}{b} + V_p U \left( I - \frac{z z^T}{z^T z} \right) = U - \frac{p}{p^T y}\, y^T U + \left[ s - \frac{\gamma}{\varrho} \left( U z - \frac{y^T U z}{p^T y}\, p \right) \right] \frac{z^T}{b}, \tag{4}$$

which yields the following projection form of the limited-memory update of $\bar H$:

$$\frac{1}{\gamma}\, \bar H_+ = \frac{\varrho}{\gamma} \frac{s s^T}{b} + V_p U \left( I - \frac{z z^T}{z^T z} \right) U^T V_p^T. \tag{5}$$

We can show that updates (4), (5) can be invariant under linear transformations, i.e. they can preserve the same transformation property of $\bar H = U U^T$ as the inverse Hessian.

Theorem 1.2. Consider a change of variables $\tilde x = R x + r$, where $R$ is an $N \times N$ nonsingular matrix and $r \in R^N$. Let the vector $p$ lie in the subspace generated by the vectors $s$, $\bar H y$ and $U z$, and suppose that $z$, $\gamma$ and the coefficients in the linear combination of the vectors $s$, $\bar H y$ and $U z$ forming $p$ are invariant under the transformation $x \to \tilde x$, i.e. they are not influenced by this transformation. Then for $\tilde U = R U$ the matrix $U_+$ given by (4) also transforms to $\tilde U_+ = R U_+$.

In the special case (this choice satisfies the assumptions of Theorem 1.2)

$$p = (\lambda / b)\, s + [(1 - \lambda)/\bar a]\, \bar H y \ \text{ if } \bar a \ne 0, \qquad p = (1/b)\, s, \ \lambda = 1 \ \text{ otherwise}, \tag{6}$$

we can easily compare (5) with the scaled Broyden class update of $\bar H$ with parameter $\eta = \lambda^2$, to obtain

$$(1/\gamma)\, \bar H_+ = (1/\gamma)\, \bar H_+^{BC} - (1 / z^T z)\, V_p U z\, (V_p U z)^T,$$

where (see [11])

$$\bar H_+^{BC} = (\varrho / b)\, s s^T + \gamma V_p \bar H V_p^T. \tag{7}$$

Update (7) is useful for the starting iterations. Setting $U_+ = [\sqrt{\varrho/b}\; s]$ in the first iteration, every update (7) modifies $U$ and adds one column $\sqrt{\varrho/b}\; s$ to $U_+$. Except for the starting iterations we will assume that the matrix $U$ has $m \ge 1$ columns.
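As a numerical illustration of Theorem 1.1, the following minimal NumPy sketch builds $U_+$ from the first form of (4) and checks conditions (3) and the quasi-Newton condition for $\bar H_+ = U_+ U_+^T$. The function name `update_U`, the random test data, the values of `rho` and `gamma`, and the choice $p = s/b$ are assumptions made here for the demonstration, not part of the paper.

```python
import numpy as np

def update_U(U, s, y, z, p, gamma=1.0):
    """Return U_plus from the first form of (4); dense projections for clarity."""
    b = s @ y                                          # b = s^T y
    Vp = np.eye(len(y)) - np.outer(p, y) / (p @ y)     # V_p = I - p y^T / p^T y
    Pz = np.eye(len(z)) - np.outer(z, z) / (z @ z)     # I - z z^T / z^T z
    return np.sqrt(gamma) * (np.outer(s, z) / b + Vp @ U @ Pz)

# Hypothetical test data (illustration only).
rng = np.random.default_rng(0)
N, m, rho, gamma = 8, 3, 1.3, 0.7
U = rng.standard_normal((N, m))
s = rng.standard_normal(N)
C = rng.standard_normal((N, N))
y = (C @ C.T / N + np.eye(N)) @ s          # y = G s with G positive definite, so b > 0
b = s @ y
z = rng.standard_normal(m)
z *= np.sqrt(rho / gamma * b / (z @ z))    # enforce z^T z = (rho/gamma) b
p = s / b                                  # assumed choice; any p with p^T y != 0 works

U_plus = update_U(U, s, y, z, p, gamma)
H_plus = U_plus @ U_plus.T

# Residuals of the three conditions; each should be at rounding level.
err_a = np.linalg.norm(U_plus.T @ y - np.sqrt(gamma) * z)
err_b = np.linalg.norm(U_plus @ (np.sqrt(gamma) * z) - rho * s)
err_qn = np.linalg.norm(H_plus @ y - rho * s)
print(err_a, err_b, err_qn)
```

The dense $N \times N$ projection is used only to mirror the formula; a limited-memory implementation would apply $V_p$ through rank-one products instead.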
To choose the parameter $z$, we use the analogy with standard VM methods: we set $H = S S^T$, replace $U$ by an $N \times N$ matrix $S$, and apply Theorem 1.1 to the standard scaled Broyden class update (see [6]) of the matrix $H = B^{-1}$, together with the following assertion.

Lemma 1.1. Every update (4) with $S$, $S_+$ instead of $U$, $U_+$, with $z = \alpha_1 S^T y + \alpha_2 S^T B s$ satisfying $z^T z = (\varrho/\gamma)\, b$ and with $p$ given by (6), belongs to the scaled Broyden class with

$$\eta = \lambda^2 - \frac{b \gamma}{\varrho}\, y^T H y \left( \frac{\alpha_1}{b}\, \lambda - \frac{\alpha_2}{y^T H y}\, (1 - \lambda) \right)^2. \tag{8}$$
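Lemma 1.1 can be checked numerically. The sketch below is an illustration under assumptions: it writes the scaled Broyden class in the common form $H_+ = (\varrho/b)\, s s^T + \gamma\,(H - H y y^T H / a + \eta\, a\, v v^T)$ with $a = y^T H y$ and $v = s/b - H y/a$, which is consistent with (7) for $\eta = \lambda^2$ but is not spelled out in the text above. All data and the values of `lam`, `alpha1`, `alpha2` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, rho, gamma, lam = 6, 1.2, 0.8, 0.4
alpha1, alpha2 = 0.3, 1.0                  # hypothetical coefficients, rescaled below

# Nonsingular factor S (lower triangular, positive diagonal), H = S S^T.
S = 0.3 * np.tril(rng.standard_normal((N, N)), -1) + np.diag(1.0 + rng.random(N))
H = S @ S.T
s = rng.standard_normal(N)
C = rng.standard_normal((N, N))
y = (C @ C.T / N + np.eye(N)) @ s          # guarantees b = s^T y > 0
b, a = s @ y, y @ H @ y

# z = alpha1 S^T y + alpha2 S^T B s, rescaled so that z^T z = (rho/gamma) b.
z = alpha1 * (S.T @ y) + alpha2 * (S.T @ np.linalg.solve(H, s))
scale = np.sqrt(rho / gamma * b / (z @ z))
alpha1, alpha2, z = scale * alpha1, scale * alpha2, scale * z

# p from (6) and the update (4) with S in place of U.
p = (lam / b) * s + ((1.0 - lam) / a) * (H @ y)
Vp = np.eye(N) - np.outer(p, y) / (p @ y)
Pz = np.eye(N) - np.outer(z, z) / (z @ z)
S_plus = np.sqrt(gamma) * (np.outer(s, z) / b + Vp @ S @ Pz)
H_plus = S_plus @ S_plus.T

# Parameter eta from (8) and the corresponding scaled Broyden class matrix.
eta = lam**2 - (b * gamma / rho) * a * (alpha1 * lam / b - alpha2 * (1.0 - lam) / a) ** 2
v = s / b - (H @ y) / a
H_broyden = (rho / b) * np.outer(s, s) + gamma * (
    H - np.outer(H @ y, H @ y) / a + eta * a * np.outer(v, v)
)
max_diff = np.max(np.abs(H_plus - H_broyden))
print(eta, max_diff)
```

Agreement of `H_plus` and `H_broyden` up to rounding shows, for this one random instance, that the product-form update lands in the Broyden class with the value of $\eta$ given by (8).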

Thus we concentrate here on the choice $z = \alpha_1 U^T y + \alpha_2 U^T B s$, $\alpha_2 \ne 0$, which by $z^T z = (\varrho/\gamma)\, b$ yields

$$z = \pm \sqrt{ \frac{(\varrho/\gamma)\, b}{\bar a \theta^2 + 2 \bar b \theta + \bar c} }\; \left( U^T B s + \theta\, U^T y \right), \tag{9}$$

where $\theta = \alpha_1 / \alpha_2$. The following lemma gives simple conditions for $z$ to be invariant under linear transformations. Note that the standard unit values of $\varrho$, $\gamma$, used in our numerical experiments, satisfy these conditions.

Lemma 1.2. Let the numbers $\varrho$, $\gamma$ and $\theta / t$ be invariant under the transformation $\tilde x = R x + r$, where $t$ is the stepsize, $R$ is an $N \times N$ nonsingular matrix and $r \in R^N$, and suppose that $\tilde U = R U$. Then the vector $z$ given by (9) is invariant under this transformation.

In our numerical experiments we use the choice $\theta = -\bar b / \bar a$ for $\bar a \ne 0$ (if $\bar a = 0$, we do not update), which gives good results. Then $\theta / t$ is invariant and (9) gives

$$z = \pm \sqrt{ \frac{(\varrho/\gamma)\, b}{\bar a\, \bar\delta} }\; \left( \bar a\, U^T B s - \bar b\, U^T y \right).$$

In this case we have $y^T U z = 0$ and $V_p U z = U z$.

1.2 Variationally-derived simple correction

To keep the matrices $\bar H_k$ invariant, we use updates for which $-\bar H_k g_k$ cannot be used as the direction vector $d_k$. Thus we replace $\bar H_k$ by $H_k$ to calculate $d_k = -H_k g_k$. We find the minimum correction (in the sense of the Frobenius matrix norm) of the matrix $\bar H_+ + \zeta I$, $\zeta > 0$, such that the resulting matrix $H_+$ satisfies the quasi-Newton condition $H_+ y = \varrho s$. First we give the projection variant of the well-known Greenstadt's theorem, see [4]. For $M = \bar H_+ + \zeta I$, the resulting correction (12) together with update (4) gives the new family of limited-memory VM methods.

Theorem 1.3. Let $M$, $W$ be symmetric matrices, $W$ positive definite, $\varrho > 0$, $q = W y$, and denote by $\mathcal{M}$ the set of $N \times N$ symmetric matrices. Then the unique solution to

$$\min \{ \| W^{-1/2} (M_+ - M) W^{-1/2} \|_F : M_+ \in \mathcal{M} \} \ \text{ s.t. } \ M_+ y = \varrho s \tag{10}$$

is determined by the relation $V_q (M_+ - M) V_q^T = 0$ and can be written in the form

$$M_+ = E + V_q (M - E) V_q^T, \tag{11}$$

where $E$ is any symmetric matrix satisfying $E y = \varrho s$, e.g. $E = (\varrho / b)\, s s^T$.
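The projection form (11) of Theorem 1.3 can be illustrated with a small NumPy sketch. It forms $M_+$ with $E = (\varrho/b)\, s s^T$, checks the quasi-Newton condition and the relation $V_q (M_+ - M) V_q^T = 0$, and compares the weighted objective in (10) with that of another feasible symmetric matrix. The helper names, the random data, and the competitor built with $q = y$ are assumptions made here for the demonstration.

```python
import numpy as np

def project_update(M, s, y, q, rho=1.0):
    """M_plus = E + V_q (M - E) V_q^T with E = (rho/b) s s^T, as in (11)."""
    E = rho * np.outer(s, s) / (s @ y)                 # E y = rho s
    Vq = np.eye(len(y)) - np.outer(q, y) / (q @ y)     # V_q = I - q y^T / q^T y
    return E + Vq @ (M - E) @ Vq.T, Vq

def weighted_norm_sq(D, W):
    """||W^{-1/2} D W^{-1/2}||_F^2 = trace((W^{-1} D)^2) for symmetric D."""
    X = np.linalg.solve(W, D)
    return np.trace(X @ X)

# Hypothetical test data (illustration only).
rng = np.random.default_rng(2)
N, rho = 6, 1.1
A = rng.standard_normal((N, N))
M = A + A.T                                            # symmetric, not necessarily definite
Cw = rng.standard_normal((N, N))
W = Cw @ Cw.T / N + np.eye(N)                          # symmetric positive definite weight
s = rng.standard_normal(N)
Cg = rng.standard_normal((N, N))
y = (Cg @ Cg.T / N + np.eye(N)) @ s                    # guarantees b = s^T y > 0

M_plus, Vq = project_update(M, s, y, W @ y, rho)       # q = W y as in Theorem 1.3
M_other, _ = project_update(M, s, y, y, rho)           # feasible competitor (q = y)

obj_plus = weighted_norm_sq(M_plus - M, W)
obj_other = weighted_norm_sq(M_other - M, W)
residual = np.linalg.norm(Vq @ (M_plus - M) @ Vq.T)
print(obj_plus, obj_other, residual)
```

Both matrices are symmetric and satisfy $M_+ y = \varrho s$; the one built with $q = W y$ should give the smaller weighted objective, in line with the theorem.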
Theorem 1.4. Let $W$ be a symmetric positive definite matrix, $\zeta > 0$, $\varrho > 0$, $q = W y$, and denote by $\mathcal{M}$ the set of $N \times N$ symmetric matrices. Suppose that the matrix $\bar H_+$ satisfies the quasi-Newton condition (2). Then the unique solution to

$$\min \{ \| W^{-1/2} (H_+ - \bar H_+ - \zeta I) W^{-1/2} \|_F : H_+ \in \mathcal{M} \} \ \text{ s.t. } \ H_+ y = \varrho s$$

is

$$H_+ = \bar H_+ + \zeta V_q V_q^T. \tag{12}$$
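The pieces of this section can be combined in a short matrix-free sketch: $z$ from (9) with $\theta = -\bar b / \bar a$, $U_+$ from (4), and the product $H_+ w$ from (12). Several ingredients are assumptions made for illustration and are not prescribed by the text above: the choice $q = s$, the value of `zeta`, the stand-in $U U^T + \zeta I$ for the previous matrix $H$, the extra guard for $\bar\delta = 0$, and all function names and data. The sketch uses $B s = -t g$, which follows from $s = t d$, $d = -H g$ and $B = H^{-1}$.

```python
import numpy as np

def choose_z(U, Bs, y, b, rho=1.0, gamma=1.0):
    """z from (9) with theta = -b_bar/a_bar (plus sign); None means 'do not update'."""
    Uy, UBs = U.T @ y, U.T @ Bs
    a_bar, b_bar, c_bar = Uy @ Uy, UBs @ Uy, UBs @ UBs
    delta_bar = a_bar * c_bar - b_bar**2
    if a_bar == 0.0 or delta_bar <= 0.0:   # the delta_bar guard is an added assumption
        return None
    return np.sqrt(rho / gamma * b / (a_bar * delta_bar)) * (a_bar * UBs - b_bar * Uy)

def update_U(U, s, y, z, p, gamma=1.0):
    """U_plus from (4), using rank-one products instead of N x N projections."""
    VpU = U - np.outer(p, y @ U) / (p @ y)               # V_p U
    VpU_Pz = VpU - np.outer(VpU @ z, z) / (z @ z)        # V_p U (I - z z^T / z^T z)
    return np.sqrt(gamma) * (np.outer(s, z) / (s @ y) + VpU_Pz)

def apply_H_plus(U_plus, q, y, zeta, w):
    """H_plus w = U_plus U_plus^T w + zeta V_q V_q^T w, as in (12)."""
    t1 = w - y * (q @ w) / (q @ y)                       # V_q^T w
    t2 = t1 - q * (y @ t1) / (q @ y)                     # V_q (V_q^T w)
    return U_plus @ (U_plus.T @ w) + zeta * t2

# Hypothetical data for one step (illustration only).
rng = np.random.default_rng(3)
N, m, rho, gamma, zeta, t = 8, 3, 1.0, 1.0, 0.1, 0.5
U = rng.standard_normal((N, m))
g = rng.standard_normal(N)
s = -t * (U @ (U.T @ g) + zeta * g)        # s = t d with d = -H g, H = U U^T + zeta I (stand-in)
Bs = -t * g                                # B s = -t g
C = rng.standard_normal((N, N))
y = (C @ C.T / N + np.eye(N)) @ s          # guarantees b = s^T y > 0
b = s @ y

z = choose_z(U, Bs, y, b, rho, gamma)
p = s / b                                  # lambda = 1 in (6)
q = s                                      # assumed choice of q = W y
U_plus = update_U(U, s, y, z, p, gamma)

err_bar = np.linalg.norm(U_plus @ (U_plus.T @ y) - rho * s)          # condition (2)
err_H = np.linalg.norm(apply_H_plus(U_plus, q, y, zeta, y) - rho * s)
orth = abs(y @ U @ z)                      # y^T U z, expected to vanish
g_new = g + y
d_new = -apply_H_plus(U_plus, q, y, zeta, g_new)   # next direction d_+ = -H_+ g_+
print(err_bar, err_H, orth, g_new @ d_new)
```

Only products with $N \times m$ matrices and vectors appear, which is the point of the limited-memory form; the correction term $\zeta V_q V_q^T$ makes $H_+$ usable for computing the direction while preserving $H_+ y = \varrho s$.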
