SLIDE 1 Conjugate Directions
- Powell’s method is based on a model quadratic objective function and conjugate
directions in R^n with respect to the Hessian of the quadratic objective function.
- What does it mean for two vectors u, v ∈ R^n to be conjugate?
Definition: given u, v ∈ R^n, u and v are said to be mutually orthogonal if
(u, v) = u^T v = 0 (where (u, v) is our notation for the scalar product).
Definition: given u, v ∈ R^n, u and v are said to be mutually conjugate with
respect to a symmetric positive definite matrix A if u and Av are mutually
orthogonal, i.e. (u, Av) = u^T A v = 0.
- Note that if two vectors are mutually conjugate with respect to the identity
matrix, that is, A = I, then they are mutually orthogonal.
Eigenvectors
- x_i is an eigenvector of the matrix A, with corresponding eigenvalue λ_i, if it
satisfies the equation A x_i = λ_i x_i, i = 1, …, n, and λ_i is a solution of the
characteristic equation det(A − λ_i I) = 0.
- If A ∈ R^(n×n) is a symmetric positive definite matrix, then there exist n
eigenvectors, x_1, …, x_n, which are mutually orthogonal (i.e. (x_i, x_j) = 0 for i ≠ j).
- Since (x_i, A x_j) = (x_i, λ_j x_j) = λ_j (x_i, x_j) = 0 for i ≠ j, the eigenvectors
x_i are mutually conjugate with respect to the matrix A.
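As a quick numerical illustration (the matrix A below is an arbitrary SPD example, not from the slides), the eigenvectors of a symmetric positive definite matrix are both mutually orthogonal and mutually A-conjugate:

```python
# Sketch: eigenvectors of a symmetric positive definite A are mutually
# orthogonal AND mutually A-conjugate, as claimed above.
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])           # an illustrative SPD matrix
eigvals, eigvecs = np.linalg.eigh(A)  # columns of eigvecs are the x_i

x1, x2 = eigvecs[:, 0], eigvecs[:, 1]
print(x1 @ x2)        # (x1, x2)  = 0  -> mutually orthogonal
print(x1 @ A @ x2)    # (x1, A x2) = 0 -> mutually A-conjugate
```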
SLIDE 2
We Can Expand Any Vector in Terms of a Set of Conjugate Vectors
Theorem: A set of n mutually conjugate vectors in R^n spans the R^n space and
therefore constitutes a basis for R^n.
Proof: let u_i, i = 1, …, n, be mutually conjugate with respect to a symmetric
positive definite matrix A ∈ R^(n×n). Consider a linear combination which is
equal to zero:
    Σ_{i=1}^n α_i u_i = 0.
We pre-multiply by the matrix A,
    A Σ_{i=1}^n α_i u_i = Σ_{i=1}^n α_i A u_i = 0,
and take the inner product with u_k:
    (u_k, Σ_{i=1}^n α_i A u_i) = Σ_{i=1}^n α_i (u_k, A u_i) = α_k (u_k, A u_k) = 0.
Now, since A is positive definite, we have (u_k, A u_k) > 0 for all u_k ≠ 0.
Therefore it must be that α_k = 0 for all k, which implies that the u_i,
i = 1, …, n, are linearly independent, and since there are n of them, they form
a basis for the R^n space.
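The proof above can be checked numerically; in this sketch the A-conjugate set is taken to be the eigenvectors of a randomly generated SPD matrix (an illustrative assumption, since any A-conjugate set would do):

```python
# Sketch: n mutually A-conjugate vectors are linearly independent and hence
# form a basis for R^n. The conjugate set here is the eigenvector set of an
# illustrative SPD matrix A.
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite by construction
_, U = np.linalg.eigh(A)             # columns are mutually A-conjugate

C = U.T @ A @ U                      # off-diagonal entries vanish -> conjugacy
print(np.allclose(C, np.diag(np.diag(C))))        # True
print(np.linalg.matrix_rank(U) == n)              # True: the set is a basis
```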
SLIDE 3
- What does it mean for a set of vectors to be linearly independent?
- Can you prove that a set of n linearly independent vectors in R^n forms a
basis for the R^n space?
Expansion of an Arbitrary Vector
Now consider an arbitrary vector x ∈ R^n. We can expand x in our mutually
conjugate basis as follows:
    x = Σ_{i=1}^n α_i u_i,
where the scalar values α_i are to be determined. We next take the inner product
of u_k with Ax:
    (u_k, Ax) = (u_k, A Σ_{i=1}^n α_i u_i) = (u_k, Σ_{i=1}^n α_i A u_i)
              = Σ_{i=1}^n α_i (u_k, A u_i) = α_k (u_k, A u_k),
from which we can solve for the scalar coefficients as
    α_k = (u_k, Ax) / (u_k, A u_k),
and we have that an arbitrary vector x ∈ R^n can be expanded in terms of n
mutually conjugate vectors u_i, i = 1, …, n, as
    x = Σ_{i=1}^n [(u_i, Ax) / (u_i, A u_i)] u_i.
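A short sketch of the expansion formula, using an illustrative diagonal SPD matrix whose eigenvectors serve as the A-conjugate basis (both the matrix and the vector x are arbitrary choices):

```python
# Sketch of the expansion formula: reconstruct an arbitrary x from its
# coefficients alpha_k = (u_k, A x)/(u_k, A u_k) in an A-conjugate basis.
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 4.0]])           # illustrative SPD matrix
_, U = np.linalg.eigh(A)             # columns u_1, u_2 are A-conjugate
x = np.array([3.0, -1.0])            # arbitrary vector to expand

alphas = [(u @ A @ x) / (u @ A @ u) for u in U.T]
x_rebuilt = sum(a * u for a, u in zip(alphas, U.T))
print(np.allclose(x_rebuilt, x))     # True
```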
SLIDE 4
Definition: If a minimization method always locates the minimum of a general
quadratic function in no more than a predetermined number of steps directly
related to the number of variables n, then the method is called quadratically
convergent.
Theorem: If a quadratic function Q(x) = (1/2) x^T A x + b^T x + c is minimized
sequentially, once along each direction of a set of n linearly independent,
A-conjugate directions, then the global minimum of Q will be located at or
before the nth step, regardless of the starting point.
Proof: We know that
    ∇Q(x*) = b + A x* = 0  (1)
and, given u_i, i = 1, …, n, to be A-conjugate vectors (or, in this case,
directions of minimization), we know from the previous theorem that they are
linearly independent. Let x_1 be the starting point of our search; then,
expanding the minimum x* as
    x* = x_1 + Σ_{i=1}^n α_i u_i,
we have
    0 = b + A x* = b + A(x_1 + Σ_{i=1}^n α_i u_i) = b + A x_1 + Σ_{i=1}^n α_i A u_i.  (2)
Taking the inner product with u_j (using the notation v^T u = (v, u)) we have
SLIDE 5
    u_j^T (b + A x_1) + u_j^T Σ_{i=1}^n α_i A u_i
        = u_j^T (b + A x_1) + Σ_{i=1}^n α_i u_j^T A u_i = 0,
which, since the u_i vectors are mutually conjugate with respect to the matrix
A, reduces to
    u_j^T (b + A x_1) + α_j u_j^T A u_j = 0,
which can be re-written as
    (b + A x_1)^T u_j + α_j u_j^T A u_j = 0.
Solving for the coefficients we have
    α_j = −(b + A x_1)^T u_j / (u_j^T A u_j).  (3)
Now, in an iterative scheme where we determine successive approximations along
the u_i directions by minimization, we have
    x_{i+1} = x_i + λ_i* u_i,  i = 1, …, N,  (4)
where the λ_i* are found by minimizing Q(x_i + λ_i u_i) with respect to the
variable λ_i, and N is possibly greater than n. Therefore, letting
y_i = x_{i+1} = x_i + λ_i u_i, we set the derivative of Q(y_i(λ_i)) = Q(x_i + λ_i u_i)
with respect to λ_i equal to zero using the chain rule of differentiation:
    dQ(x_{i+1})/dλ_i |_{λ_i = λ_i*} = Σ_{j=1}^n (∂Q/∂y_i^j)(∂y_i^j/∂λ_i)
        = u_i^T ∇Q(x_{i+1}) = 0.
SLIDE 6
But ∇Q(x_{i+1}) = b + A x_{i+1}, and therefore
    u_i^T (b + A(x_i + λ_i u_i)) = 0,
from which we get that the λ_i* are given by
    λ_i* = −(b + A x_i)^T u_i / (u_i^T A u_i)
         = −(b^T u_i + x_i^T A u_i) / (u_i^T A u_i).  (5)
From (4), we can write
    x_{i+1} = x_i + λ_i* u_i = x_1 + Σ_{j=1}^i λ_j* u_j,
    x_i = x_1 + Σ_{j=1}^{i−1} λ_j* u_j.
Forming the product x_i^T A u_i in (5) we get
    x_i^T A u_i = x_1^T A u_i + Σ_{j=1}^{i−1} λ_j* u_j^T A u_i = x_1^T A u_i,
because u_j^T A u_i = 0 for j ≠ i. Therefore, the λ_i* can be written as
    λ_i* = −(b + A x_1)^T u_i / (u_i^T A u_i),  (6)
but comparing this with (3) we see that λ_i* = α_i, and therefore
    x* = x_1 + Σ_{j=1}^n λ_j* u_j.  (7)
SLIDE 7
which says that, starting at x_1, we take n steps of “length” λ_j*, given by (6),
in the u_j directions and we arrive at the minimum. Therefore x* is reached in n
steps, or fewer if some λ_j* = 0.
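The theorem can be illustrated numerically: for a small quadratic (the A, b, and starting point below are arbitrary illustrative choices), n exact line minimizations along A-conjugate directions, each using formula (5), land exactly on x* = −A⁻¹b:

```python
# Sketch of the theorem: for Q(x) = (1/2) x^T A x + b^T x + c, one exact line
# minimization along each of n A-conjugate directions reaches the global
# minimum x* = -A^{-1} b from any starting point.
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 4.0]])           # illustrative SPD Hessian
b = np.array([1.0, -1.0])
U = np.eye(2)                        # e_1, e_2 are A-conjugate for diagonal A
x = np.array([7.0, -3.0])            # arbitrary starting point x_1

for u in U.T:                        # one exact line search per direction, eq. (5)
    lam = -(b + A @ x) @ u / (u @ A @ u)
    x = x + lam * u

print(np.allclose(x, -np.linalg.solve(A, b)))  # True: x* reached in n = 2 steps
```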
SLIDE 8
Example: consider the quadratic function of two variables given as
    f(x) = 1 + x_1 − x_2 + x_1^2 + 2 x_2^2.
Use the previous theorem to find the minimum, starting at the origin and
minimizing successively along the two directions given by the unit vectors
u_1^T = [1 0] and u_2^T = [0 1]. (First show that these vectors are mutually
conjugate with respect to the Hessian matrix of the function.)
Solution: first write the function in matrix form as
    f(x) = 1 + [1 −1] x + (1/2) x^T [2 0; 0 4] x = c + b^T x + (1/2) x^T A x,
where we can clearly see the Hessian matrix A. We can now check that the two
given directions are mutually conjugate with respect to A:
    u_1^T A u_2 = [1 0] [2 0; 0 4] [0; 1] = 0,
    u_1^T A u_1 = [1 0] [2 0; 0 4] [1; 0] = 2,
    u_2^T A u_2 = [0 1] [2 0; 0 4] [0; 1] = 4.
Starting from x_1 = [0 0]^T we find the two lengths, λ_1* and λ_2*, from (6) as
    λ_1* = −(b + A x_1)^T u_1 / (u_1^T A u_1) = −1/2,
    λ_2* = −(b + A x_1)^T u_2 / (u_2^T A u_2) = −(−1)/4 = 1/4,
and therefore, from (7), the minimum is found as
SLIDE 9
    x* = x_1 + Σ_{j=1}^2 λ_j* u_j = −(1/2) u_1 + (1/4) u_2 = [−1/2; 1/4].
This can be checked by applying the formula x* = −A^(−1) b.
Note that the lengths λ_j* calculated from (6) depend only on the mutually
conjugate directions themselves and the initial starting point, but not on the
intermediate successive search points x_i with i > 1. Thus, if we always start
from the origin, then the minimum of a quadratic function can be written as
    x* = −Σ_{i=1}^n [b^T u_i / (u_i^T A u_i)] u_i.  (8)
Of course, we still need a method of finding n A-conjugate vectors in n-space.
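As a quick numerical check of this worked example (an illustrative NumPy sketch), formula (6) gives λ_1* = −1/2, λ_2* = 1/4 and the minimum directly:

```python
# Check of the worked example: f(x) = 1 + x1 - x2 + x1^2 + 2 x2^2,
# starting at the origin, step lengths from formula (6).
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([1.0, -1.0])
x1 = np.zeros(2)
u1, u2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

lam1 = -(b + A @ x1) @ u1 / (u1 @ A @ u1)   # -1/2
lam2 = -(b + A @ x1) @ u2 / (u2 @ A @ u2)   #  1/4
x_star = x1 + lam1 * u1 + lam2 * u2
print(x_star)                                # equals -A^{-1} b
```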
- The following theorem, which we will not prove, gives us a powerful technique
for finding such minimization directions.
Theorem (Parallel Subspace Property): Given a direction v and a quadratic
function Q(x) = (1/2) x^T A x + b^T x + c, then starting from two different,
but arbitrary, points we can determine the minima in the v direction as x_1
and x_2. The new direction u = x_2 − x_1 is A-conjugate to v, i.e. (v, Au) = 0.
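The parallel subspace property can be checked numerically; the quadratic Q, the matrix A, vector b, direction v, and starting points below are illustrative choices, not from the slides:

```python
# Sketch of the parallel subspace property: minimizing Q along the same
# direction v from two different starting points p1, p2 gives points x1, x2
# with u = x2 - x1 A-conjugate to v.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])           # illustrative SPD Hessian of Q
b = np.array([1.0, -2.0])
v = np.array([1.0, 1.0])

def line_min(p):
    """Exact minimizer of Q(p + lam*v) along v: lam = -(b + A p).v / (v.A v)."""
    lam = -(b + A @ p) @ v / (v @ A @ v)
    return p + lam * v

x1 = line_min(np.array([0.0, 0.0]))
x2 = line_min(np.array([4.0, -1.0]))
u = x2 - x1
print(abs(v @ A @ u) < 1e-9)   # True: (v, A u) = 0
```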
SLIDE 10 Powell’s Conjugate Direction Method
- The idea behind Powell’s method is to use the parallel subspace property to
create a set of conjugate directions.
- It then uses line searches along these “conjugate” directions to find the
local minimum.
- Before we describe Powell’s method, it is instructive to consider the parallel
subspace property geometrically in two dimensions, as shown in the figure.
- The concentric ellipses are the contour lines of a quadratic function Q(x)
having a Hessian matrix A.
- Starting at the two arbitrary points shown, we minimize along the v direction
to arrive at points x_1 and x_2.
- The new direction u = x_2 − x_1 will be A-conjugate to v.
- If we were to perform a further minimization along u, it is clear that we
would arrive at the minimum.
[Figure: contours of Q(x) are concentric ellipses. From two arbitrary starting
points, searches of lengths λ_1* v and λ_2* v along the search direction v
reach x_1 and x_2; the new direction u = x_2 − x_1 satisfies (v, Au) = 0 and
leads to the minimum.]
Graphical depiction of the parallel subspace concept used in Powell’s method.
SLIDE 11 Powell’s Method in Words
- In words, Powell’s method to minimize a function f(x) in R^n can be described
as follows.
- First, initialize the n search directions s_i, i = 1, …, n, to the coordinate
unit vectors e_i, i = 1, …, n.
- Then, starting at an initial guess, x_0, perform an initial search in the s_n
direction, which gets you to the point X.
- Store X in Y and then update X by performing n successive minimizations
along the n search directions.
- Create a new search direction, s_{n+1} = X − Y, and minimize along this
direction as well.
- After this last search we check for convergence by comparing the relative change
in function value at the most recent X with respect to the function value at Y.
- If we have not converged, then we discard the first search direction s_1, let
s_i = s_{i+1}, i = 1, …, n, and repeat.
SLIDE 12
Algorithm: Powell’s Method
 1. input: f(x), x_0, ε, max_iteration
 2. set: s_i = e_i, i = 1, …, n
 3. find λ* which minimizes f(x_0 + λ* s_n)
 4. set: X = x_0 + λ* s_n, C = False, k = 0
 5. while C ≡ False repeat
 6.   set: Y = X, k = k + 1
 7.   for i = 1(1)n
 8.     find λ* which minimizes f(X + λ* s_i)
 9.     set: X = X + λ* s_i
10.   end
11.   set: s_{n+1} = X − Y
12.   find λ* which minimizes f(X + λ* s_{n+1})
13.   set: X = X + λ* s_{n+1}
14.   if k > max_iteration OR |f(X) − f(Y)| / max[|f(X)|, 10^(−10)] < ε
15.     C = True
16.   else
17.     for i = 1(1)n
18.       set: s_i = s_{i+1}
19.     end
20.   end
21. end
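A minimal Python sketch of the algorithm above. The golden-section line search with a fixed bracket [−10, 10] is an implementation choice, not part of the slide's pseudocode; a robust implementation would bracket each one-dimensional minimum adaptively.

```python
# Sketch of Powell's conjugate direction method, following the pseudocode.
import numpy as np

def golden_min(F, a=-10.0, b=10.0, tol=1e-10):
    """Golden-section search for the minimum of a unimodal F on [a, b]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while abs(b - a) > tol:
        if F(c) < F(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

def powell(f, x0, eps=1e-8, max_iteration=100):
    n = len(x0)
    S = [np.eye(n)[i] for i in range(n)]            # s_i = e_i
    lam = golden_min(lambda t: f(x0 + t * S[-1]))   # initial search along s_n
    X = x0 + lam * S[-1]
    for _ in range(max_iteration):
        Y = X.copy()
        for s in S:                                  # n successive line searches
            lam = golden_min(lambda t: f(X + t * s))
            X = X + lam * s
        s_new = X - Y                                # new "conjugate" direction
        if np.linalg.norm(s_new) > 0.0:              # guard: skip a zero direction
            lam = golden_min(lambda t: f(X + t * s_new))
            X = X + lam * s_new
        if abs(f(X) - f(Y)) / max(abs(f(X)), 1e-10) < eps:
            break
        S = S[1:] + [s_new]                          # discard s_1, append s_{n+1}
    return X

# Usage: the quadratic from the earlier example, minimum at (-1/2, 1/4).
f = lambda x: 1 + x[0] - x[1] + x[0]**2 + 2 * x[1]**2
print(powell(f, np.array([0.0, 0.0])))
```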
SLIDE 13
Example: Powell’s Conjugate Direction Method
Consider the following function of two variables:
    f(x) = 2 x_1^3 − x_1 x_2^3 + 10 x_1 x_2 + x_2^2.
Starting at x_0 = [5 2]^T, f(x_0) = 314, we perform one iteration of Powell’s
conjugate direction method.
Solution: First we choose the n search directions as the coordinate directions:
    s_1 = e_1 = [1; 0],  s_2 = e_2 = [0; 1],
and perform three successive searches, starting at Y = X = x_0 = [5 2]^T, along
s_2, s_1, and s_2:
    F(λ) = f(X + λ s_2) = f([5; 2 + λ]) = 250 − 5(2 + λ)^3 + 50(2 + λ) + (2 + λ)^2,
    dF/dλ |_{λ*} = −15(2 + λ*)^2 + 50 + 2(2 + λ*) = 0
    ⇒ 15(λ*)^2 + 61λ* + 12 = 0 ⇒ λ* = (−61 ± √3001)/30 = { −0.20728721, −3.8593795 },
    F(−0.20728721) = 314.28418, F(−3.8593795) = 192.38545 ⇒ λ* = −3.8593795,
    ⇒ X = [5; 2] + λ* [0; 1] = [5; −1.86].
SLIDE 14
    F(λ) = f(X + λ s_1) = f([5 + λ; −1.86]) = 2(5 + λ)^3 − 12.165377(5 + λ) + 3.457292,
    dF/dλ |_{λ*} = 6(5 + λ*)^2 − 12.165377 = 0 ⇒ λ* = { −3.5760748, −6.4239252 },
    F(−3.5760748) = 26.554075, F(−6.4239252) = −19.639491 ⇒ λ* = −6.4239252,
    ⇒ X = [5; −1.86] + λ* [1; 0] = [−1.42; −1.86].

    F(λ) = f(X + λ s_2) = f([−1.42; −1.86 + λ])
         = −5.726576 + 1.42(−1.86 + λ)^3 − 14.2(−1.86 + λ) + (−1.86 + λ)^2,
    dF/dλ |_{λ*} = 4.26(−1.86 + λ*)^2 − 14.2 + 2(−1.86 + λ*) = 0
    ⇒ 4.26(λ*)^2 − 17.8472λ* + 4.257896 = 0
    ⇒ λ* = (17.8472 ± 15.683367)/8.52 = { 0.25397101, 3.9355126 },
    F(0.25397101) = −20.0, F(3.9355126) = 15.357527 ⇒ λ* = 0.25397101,
    ⇒ X = [−1.42; −1.86] + λ* [0; 1] = [−1.42; −1.60].
We then form the new search direction
    s_3 = X − Y = [−1.42; −1.60] − [5; 2] = [−6.42; −3.60]
and perform one more search in this direction before checking for convergence.
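A hedged numerical check of the first line search above, by brute-force grid search over an assumed bracket [−5, 0] (the bracket and grid are implementation choices). The sign pattern f(x) = 2x_1^3 − x_1 x_2^3 + 10 x_1 x_2 + x_2^2 reproduces f([5 2]^T) = 314 as quoted; because f is cubic, F(λ) is only locally unimodal, and the figures found here may not reproduce the slide's rounded intermediate values exactly.

```python
# Locate the local minimizer of F(lam) = f(5, 2 + lam) on the bracket [-5, 0].
import numpy as np

def f(x):
    x1, x2 = x
    return 2 * x1**3 - x1 * x2**3 + 10 * x1 * x2 + x2**2

X, s2 = np.array([5.0, 2.0]), np.array([0.0, 1.0])
F = lambda lam: f(X + lam * s2)       # scalar line function along s2

lams = np.linspace(-5.0, 0.0, 200001)
vals = f((5.0 + 0.0 * lams, 2.0 + lams))   # vectorized evaluation on the grid
lam_star = float(lams[np.argmin(vals)])
print(lam_star, F(lam_star))          # local minimizer and its F value
```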
SLIDE 15
[Figure: search directions s_1, s_2, s_3 plotted in the (x_1, x_2) plane.]
Geometrical view of Powell’s method after 2 iterations in the main loop.
SLIDE 16