Low Rank Approximation
Lecture 9
Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, EPFL
daniel.kressner@epfl.ch

Manifold optimization

General setting: Aim at solving the optimization problem
$$\min_{X \in \mathcal{M}_r} f(X),$$
where $\mathcal{M}_r$ is a manifold of rank-$r$ matrices or tensors.

Goal: Modify classical optimization algorithms (line search, Newton, quasi-Newton, ...) to produce iterates that stay on $\mathcal{M}_r$.

Advantages over ALS:
◮ No need to solve subproblems, at least for first-order methods;
◮ Can draw on concepts from classical smooth optimization (line search strategies, convergence analysis, ...).

Two valuable resources:
◮ Absil/Mahony/Sepulchre’2008: Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008. Available from https://press.princeton.edu/absil
◮ Manopt, a Matlab toolbox for optimization on manifolds. Available from https://manopt.org/

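Manifolds

For open sets $U \subset \mathcal{M}$, $V \subset \mathbb{R}^d$, a chart is a bijective function $\varphi : U \to V$.

An atlas of $\mathcal{M}$ into $\mathbb{R}^d$ is a collection of charts $(U_\alpha, \varphi_\alpha)$ such that:
◮ $\bigcup_\alpha U_\alpha = \mathcal{M}$;
◮ for any $\alpha, \beta$ with $U_\alpha \cap U_\beta \neq \emptyset$, the change of coordinates
$$\varphi_\beta \circ \varphi_\alpha^{-1} : \mathbb{R}^d \to \mathbb{R}^d$$
is smooth ($C^\infty$) on its domain $\varphi_\alpha(U_\alpha \cap U_\beta)$.

[Illustration taken from Wikipedia.]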

Manifolds

In the following, we assume that the atlas is maximal. A proper definition of a smooth manifold $\mathcal{M}$ needs further properties (the topology induced by the maximal atlas is Hausdorff and second-countable). See [Lee’2003] and [Absil et al.’2008].

Properties of $\mathcal{M}$:
◮ finite-dimensional vector spaces are always manifolds;
◮ $d$ = dimension of $\mathcal{M}$;
◮ $\mathcal{M}$ does not need to be connected (in the context of smooth optimization, it makes sense to consider connected manifolds only);
◮ a function $f : \mathcal{M} \to \mathbb{R}$ is differentiable at a point $x \in \mathcal{M}$ if and only if $f \circ \varphi^{-1} : \varphi(U) \subset \mathbb{R}^d \to \mathbb{R}$ is differentiable at $\varphi(x)$ for some chart $(U, \varphi)$ with $x \in U$.

Manifolds: First examples

Lemma. Let $\mathcal{M}$ be a smooth manifold and $\mathcal{N} \subset \mathcal{M}$ an open subset. Then $\mathcal{N}$ is a smooth manifold (of equal dimension).
Proof: Given an atlas for $\mathcal{M}$, obtain an atlas for $\mathcal{N}$ by selecting the charts $(U, \varphi)$ with $U \subset \mathcal{N}$.

Example: $GL(n, \mathbb{R})$, the set of real invertible $n \times n$ matrices, is a smooth manifold.
EFY. Show that $\mathbb{R}^{m \times n}_*$, the set of real $m \times n$ matrices of full rank $\min\{m, n\}$, is a smooth manifold.
EFY. Show that the set of $n \times n$ symmetric positive definite matrices is a smooth manifold.

Two main classes of matrix manifolds:
◮ embedded submanifolds of $\mathbb{R}^{m \times n}$; Example: Stiefel manifold of orthonormal bases.
◮ quotient manifolds; Example: Grassmann manifold $\mathbb{R}^{m \times n}_* / GL(n, \mathbb{R})$.
Will focus on embedded submanifolds (much easier to work with).

Immersions and submersions

Let $\mathcal{M}_1, \mathcal{M}_2$ be smooth manifolds of dimensions $d_1, d_2$ and $F : \mathcal{M}_1 \to \mathcal{M}_2$. Let $x \in \mathcal{M}_1$ and $y = F(x) \in \mathcal{M}_2$. Choose charts $\varphi_1, \varphi_2$ around $x, y$. Then the coordinate representation of $F$ is given by
$$\hat F := \varphi_2 \circ F \circ \varphi_1^{-1} : \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}.$$
◮ $F$ is called smooth if $\hat F$ is smooth (that is, $C^\infty$).
◮ The rank of $F$ at $x \in \mathcal{M}_1$ is defined as the rank of $D\hat F(\varphi_1(x))$ (the Jacobian of $\hat F$ at $\varphi_1(x)$).
◮ $F$ is called an immersion if its rank equals $d_1$ at every $x \in \mathcal{M}_1$.
◮ $F$ is called a submersion if its rank equals $d_2$ at every $x \in \mathcal{M}_1$.

Embedded submanifolds

A subset $\mathcal{N} \subset \mathcal{M}$ is called an embedded submanifold of dimension $k$ in $\mathcal{M}$ if for each point $p \in \mathcal{N}$ there is a chart $(U, \varphi)$ in $\mathcal{M}$ such that all elements of $U \cap \mathcal{N}$ are obtained by varying the first $k$ coordinates only. (See Chapter 5 of [Lee’2003] for more details.)

Theorem. Let $\mathcal{M}, \mathcal{N}$ be smooth manifolds and let $F : \mathcal{M} \to \mathcal{N}$ be a smooth map with constant rank $\ell$. Then each level set
$$F^{-1}(y) := \{ x \in \mathcal{M} : F(x) = y \}$$
is a closed embedded submanifold of codimension $\ell$ in $\mathcal{M}$.

Corollaries:
◮ If $F : \mathcal{M} \to \mathcal{N}$ is a submersion then each level set is a closed embedded submanifold of codimension equal to the dimension of $\mathcal{N}$.
◮ In fact, by the open submanifold lemma, one only needs to check the full rank condition of a submersion at points in the level set (replace $\mathcal{M}$ by the open set on which $F$ has full rank).

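The Stiefel manifold

For $m \ge n$, consider the set of all $m \times n$ matrices with orthonormal columns:
$$\mathrm{St}(m, n) := \{ X \in \mathbb{R}^{m \times n} : X^T X = I_n \}.$$

Corollary. $\mathrm{St}(m, n)$ is an embedded submanifold of $\mathbb{R}^{m \times n}$.
Proof: Define $F : \mathbb{R}^{m \times n} \to \mathrm{symm}(n)$ as $F : X \mapsto X^T X$, where $\mathrm{symm}(n)$ denotes the set of $n \times n$ symmetric matrices, so that $\mathrm{St}(m, n) = F^{-1}(I_n)$. At $X \in \mathrm{St}(m, n)$, consider the Jacobian $DF(X) : H \mapsto X^T H + H^T X$. Given symmetric $Y \in \mathbb{R}^{n \times n}$, set $H = XY/2$. Then $DF(X)[H] = (X^T X Y + Y^T X^T X)/2 = Y$; thus $DF(X)$ is surjective, $F$ has full rank at every point of the level set, and the submersion corollary applies.

EFY. What is the dimension of the Stiefel manifold?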
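The surjectivity argument in the proof can be checked numerically. The following is a minimal sketch, assuming NumPy; the sizes, the random seed, and the helper names `F` and `DF` are illustrative choices, not part of the lecture.

```python
import numpy as np

def F(X):
    """Constraint map F(X) = X^T X; St(m, n) is the level set F^{-1}(I_n)."""
    return X.T @ X

def DF(X, H):
    """Directional derivative DF(X)[H] = X^T H + H^T X."""
    return X.T @ H + H.T @ X

rng = np.random.default_rng(0)
m, n = 7, 3  # illustrative sizes with m >= n

# A point on St(m, n): the Q factor of a random m x n matrix.
X, _ = np.linalg.qr(rng.standard_normal((m, n)))

# An arbitrary symmetric target Y and the preimage H = X Y / 2 from the proof.
B = rng.standard_normal((n, n))
Y = B + B.T
H = X @ Y / 2

# DF(X)[H] should reproduce Y.
err_surjective = np.linalg.norm(DF(X, H) - Y)

# Central finite difference as an independent check of the formula for DF.
t = 1e-6
fd = (F(X + t * H) - F(X - t * H)) / (2 * t)
err_fd = np.linalg.norm(fd - DF(X, H))

print(err_surjective, err_fd)
```

Both printed residuals should be at the level of rounding error, which is consistent with $DF(X)$ mapping onto the symmetric matrices at points of the Stiefel manifold.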

The manifold of rank-$k$ matrices

Locality of the definition of embedded submanifolds implies the following lemma (Lemma 5.5 in [Lee’2003]).

Lemma. Let $\mathcal{N}$ be a subset of a smooth manifold $\mathcal{M}$. Suppose every point $p \in \mathcal{N}$ has a neighborhood $U \subset \mathcal{M}$ such that $U \cap \mathcal{N}$ is an embedded submanifold of $U$. Then $\mathcal{N}$ is an embedded submanifold of $\mathcal{M}$.

Theorem. Given $m \ge n$, the set
$$\mathcal{M}_k = \{ A \in \mathbb{R}^{m \times n} : \mathrm{rank}(A) = k \}$$
is an embedded submanifold of $\mathbb{R}^{m \times n}$ for every $0 \le k \le n$.

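The manifold of rank-$k$ matrices

Choose arbitrary $A_0 \in \mathcal{M}_k$. After a suitable permutation, we may assume w.l.o.g. that
$$A_0 = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad A_{11} \in \mathbb{R}^{k \times k} \text{ is invertible}.$$
This property remains true in an open neighborhood $U \subset \mathbb{R}^{m \times n}$ of $A_0$. Factorize $A \in U$ as
$$A = \begin{pmatrix} I & 0 \\ A_{21} A_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} - A_{21} A_{11}^{-1} A_{12} \end{pmatrix} \begin{pmatrix} I & A_{11}^{-1} A_{12} \\ 0 & I \end{pmatrix}.$$
Define $F : U \to \mathbb{R}^{(m-k) \times (n-k)}$ as $F : A \mapsto A_{22} - A_{21} A_{11}^{-1} A_{12}$. Then $F^{-1}(0) = U \cap \mathcal{M}_k$.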
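The block factorization and the level-set characterization can be illustrated with a small numerical experiment. This is a sketch under stated assumptions: NumPy is used, the sizes and seed are arbitrary, the helper names `blocks` and `schur` are hypothetical, and the leading $k \times k$ block of the random rank-$k$ matrix is assumed to be invertible (which holds generically, so no permutation is applied).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 6, 5, 2  # illustrative sizes

# Random rank-k matrix; its leading k x k block is generically invertible.
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

def blocks(A, k):
    """Split A into the 2 x 2 block partition with a k x k leading block."""
    return A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]

def schur(A, k):
    """F(A) = A22 - A21 A11^{-1} A12."""
    A11, A12, A21, A22 = blocks(A, k)
    return A22 - A21 @ np.linalg.solve(A11, A12)

A11, A12, A21, A22 = blocks(A, k)
S = schur(A, k)

# The three factors of the block factorization.
L = np.block([[np.eye(k), np.zeros((k, m - k))],
              [A21 @ np.linalg.inv(A11), np.eye(m - k)]])
D = np.block([[A11, np.zeros((k, n - k))],
              [np.zeros((m - k, k)), S]])
R = np.block([[np.eye(k), np.linalg.solve(A11, A12)],
              [np.zeros((n - k, k)), np.eye(n - k)]])

err_factorization = np.linalg.norm(L @ D @ R - A)
schur_norm_rank_k = np.linalg.norm(S)  # F(A) = 0 for rank(A) = k

# A generic perturbation leaves M_k, and F becomes nonzero.
A_pert = A + 1e-3 * rng.standard_normal((m, n))
schur_norm_perturbed = np.linalg.norm(schur(A_pert, k))

print(err_factorization, schur_norm_rank_k, schur_norm_perturbed)
```

Since the outer factors are invertible, the rank of $A$ equals $k$ plus the rank of the Schur complement, which is why $F$ vanishes exactly on the rank-$k$ matrices in $U$.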

The manifold of rank-$k$ matrices

For arbitrary $Y \in \mathbb{R}^{(m-k) \times (n-k)}$, we obtain that
$$DF(A)\left[ \begin{pmatrix} 0 & 0 \\ 0 & Y \end{pmatrix} \right] = Y.$$
Thus, $F$ is a submersion. In turn, $U \cap \mathcal{M}_k$ is an embedded submanifold of $U$. By the lemma, $\mathcal{M}_k$ is an embedded submanifold of $\mathbb{R}^{m \times n}$.

EFY. What is the dimension of $\mathcal{M}_k$?
EFY. Is $\mathcal{M}_k$ connected?
EFY. Prove that the set of symmetric rank-$k$ matrices is an embedded submanifold of $\mathbb{R}^{n \times n}$. Is this manifold connected?
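The submersion property can be probed with a finite difference. The sketch below assumes NumPy and, as before, a generically invertible leading block; the helper `schur` and all sizes are illustrative.

```python
import numpy as np

def schur(A, k):
    """F(A) = A22 - A21 A11^{-1} A12 for the partition with k x k leading block."""
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    return A22 - A21 @ np.linalg.solve(A11, A12)

rng = np.random.default_rng(2)
m, n, k = 6, 5, 2
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank k

# Direction that is zero except for the (2,2) block, which is set to Y.
Y = rng.standard_normal((m - k, n - k))
E = np.zeros((m, n))
E[k:, k:] = Y

# Forward difference approximating DF(A)[E]; F is affine in A22, so this is
# exact up to rounding.
t = 1e-4
dF = (schur(A + t * E, k) - schur(A, k)) / t
err = np.linalg.norm(dF - Y)

print(err)
```

Because every $Y$ is attained this way, $DF(A)$ is surjective onto $\mathbb{R}^{(m-k) \times (n-k)}$.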

Tangent space

In the following, much of the discussion is restricted to submanifolds $\mathcal{M}$ embedded in a vector space $V$ with inner product $\langle \cdot, \cdot \rangle$ and induced norm $\| \cdot \|$.

Given a smooth curve $\gamma : \mathbb{R} \to \mathcal{M}$ with $x = \gamma(0)$, we call $\gamma'(0) \in V$ a tangent vector at $x$. The tangent space $T_x \mathcal{M} \subset V$ is the set of all tangent vectors at $x$.

Lemma. $T_x \mathcal{M}$ is a subspace of $V$.
Proof. If $v_1, v_2$ are tangent vectors then there are smooth curves $\gamma_1, \gamma_2$ such that $x = \gamma_1(0) = \gamma_2(0)$ and $\gamma_1'(0) = v_1$, $\gamma_2'(0) = v_2$. To show that $\alpha v_1 + \beta v_2$ for $\alpha, \beta \in \mathbb{R}$ is again a tangent vector, consider a chart $(U, \varphi)$ around $x$ such that $\varphi(x) = 0$. Define
$$\gamma(t) = \varphi^{-1}\big( \alpha \varphi(\gamma_1(t)) + \beta \varphi(\gamma_2(t)) \big)$$
for $t$ sufficiently close to 0. Then $\gamma(0) = x$ and $\gamma'(0) = \alpha v_1 + \beta v_2$.

EFY. Prove that the dimension of $T_x \mathcal{M}$ equals the dimension of $\mathcal{M}$ using a coordinate chart.

Tangent space

Application of the definition to the Stiefel manifold. Let $\gamma(t) = X + tY + O(t^2)$ be a smooth curve with $X \in \mathrm{St}(m, n)$. To ensure that $\gamma(t) \in \mathrm{St}(m, n)$, we require
$$I_n = \gamma(t)^T \gamma(t) = (X + tY)^T (X + tY) + O(t^2) = I_n + t (X^T Y + Y^T X) + O(t^2).$$
Thus, $X^T Y + Y^T X = 0$ characterizes the tangent space:
$$T_X \mathrm{St}(m, n) = \{ Y \in \mathbb{R}^{m \times n} : X^T Y = -Y^T X \} = \{ XW + X_\perp W_\perp : W \in \mathbb{R}^{n \times n},\ W = -W^T,\ W_\perp \in \mathbb{R}^{(m-n) \times n} \},$$
where the columns of $X_\perp$ form a basis of $\mathrm{span}(X)^\perp$.
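The two descriptions of the tangent space can be compared numerically. The following sketch assumes NumPy and builds `X_perp` from a full QR factorization, which is one convenient (orthonormal) choice of basis; sizes, seed, and the helper `constraint_violation` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 7, 3

# Full QR gives an orthonormal basis [X, X_perp] of R^m.
Q, _ = np.linalg.qr(rng.standard_normal((m, n)), mode="complete")
X, X_perp = Q[:, :n], Q[:, n:]

# Skew-symmetric W and arbitrary W_perp parametrize a tangent vector.
B = rng.standard_normal((n, n))
W = B - B.T
W_perp = rng.standard_normal((m - n, n))
Y = X @ W + X_perp @ W_perp

# First characterization: X^T Y + Y^T X = 0.
tangency_residual = np.linalg.norm(X.T @ Y + Y.T @ X)

def constraint_violation(t):
    """Distance of (X + tY)^T (X + tY) from the identity."""
    G = X + t * Y
    return np.linalg.norm(G.T @ G - np.eye(n))

# For a tangent direction the violation is O(t^2): halving t divides it by 4.
t = 1e-2
violation_ratio = constraint_violation(t) / constraint_violation(t / 2)

print(tangency_residual, violation_ratio)
```

A ratio close to 4 reflects that the first-order term $t (X^T Y + Y^T X)$ vanishes, leaving only the $t^2 Y^T Y$ contribution.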

Tangent space

When $\mathcal{M}$ is defined (at least locally) as a level set of a constant rank function $F : V \to \mathbb{R}^N$, we have
$$T_x \mathcal{M} = \ker(DF(x)).$$
Proof. Let $v \in T_x \mathcal{M}$, that is, there is a curve $\gamma : \mathbb{R} \to \mathcal{M}$ such that $\gamma(0) = x$ and $\gamma'(0) = v$. Then, by the chain rule,
$$DF(x)[v] = DF(x)[\gamma'(0)] = \frac{\partial}{\partial t} F(\gamma(t)) \Big|_{t=0} = 0,$$
because $F$ is constant on $\mathcal{M}$. Thus, $T_x \mathcal{M} \subset \ker(DF(x))$, which completes the proof by counting dimensions.
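As a worked instance of this statement, take the Stiefel manifold with $F(X) = X^T X$ as in the earlier proof; identifying $\mathrm{symm}(n)$ with some $\mathbb{R}^N$ is an assumption made here so that $F$ fits the form $F : V \to \mathbb{R}^N$. The kernel of $DF(X)$ then reproduces the tangent space obtained from curves, and the case $n = 1$ gives the familiar tangent space of the unit sphere:

```latex
\begin{align*}
F(X) &= X^T X, \qquad \mathrm{St}(m,n) = F^{-1}(I_n), \\
DF(X)[H] &= X^T H + H^T X, \\
T_X \mathrm{St}(m,n) = \ker(DF(X)) &= \{ H \in \mathbb{R}^{m \times n} : X^T H + H^T X = 0 \}, \\
n = 1: \qquad T_x \mathrm{St}(m,1) &= \{ h \in \mathbb{R}^{m} : x^T h = 0 \}.
\end{align*}
```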

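Tangent space of $\mathcal{M}_k$

Recall that $\mathcal{M}_k$ was obtained as a level set of the local submersion $F : A \mapsto A_{22} - A_{21} A_{11}^{-1} A_{12}$. Given $A \in \mathcal{M}_k$, consider the SVD
$$A = \begin{pmatrix} U & U_\perp \end{pmatrix} \begin{pmatrix} \Sigma & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V & V_\perp \end{pmatrix}^T.$$
We have
$$DF\left( \begin{pmatrix} \Sigma & 0 \\ 0 & 0 \end{pmatrix} \right)[H] = H_{22}.$$
Thus, $H$ is in the kernel if and only if $H_{22} = 0$. In terms of $A$ this implies
$$T_A \mathcal{M}_k = \ker(DF(A)) = \begin{pmatrix} U & U_\perp \end{pmatrix} \begin{pmatrix} \mathbb{R}^{k \times k} & \mathbb{R}^{k \times (n-k)} \\ \mathbb{R}^{(m-k) \times k} & 0 \end{pmatrix} \begin{pmatrix} V & V_\perp \end{pmatrix}^T$$
$$= \{ U M V^T + U_p V^T + U V_p^T : M \in \mathbb{R}^{k \times k},\ U_p \in \mathbb{R}^{m \times k},\ V_p \in \mathbb{R}^{n \times k},\ U_p^T U = V_p^T V = 0 \}.$$

EFY. Compute the tangent space for the embedded submanifold of rank-$k$ symmetric matrices.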
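The structure of this tangent space can be explored numerically. The sketch below assumes NumPy; sizes and seed are arbitrary, and the helper `sigma_k_plus_1` is a hypothetical name. It uses the $(k+1)$-st singular value as a measure of the distance to the set of matrices of rank at most $k$: along a tangent direction this distance should grow like $t^2$, along a direction with nonzero (2,2) block like $t$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k = 8, 6, 2

A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank k
Uf, s, Vt = np.linalg.svd(A)                                   # full SVD
U, U_perp = Uf[:, :k], Uf[:, k:]
V, V_perp = Vt[:k, :].T, Vt[k:, :].T

# Tangent vector xi = U M V^T + U_p V^T + U V_p^T with U_p^T U = 0, V_p^T V = 0.
M = rng.standard_normal((k, k))
U_p = U_perp @ rng.standard_normal((m - k, k))
V_p = V_perp @ rng.standard_normal((n - k, k))
xi = U @ M @ V.T + U_p @ V.T + U @ V_p.T

# In the coordinates given by the SVD, the (2,2) block of xi vanishes.
block22_norm = np.linalg.norm(U_perp.T @ xi @ V_perp)

def sigma_k_plus_1(B):
    """(k+1)-st singular value: distance to the matrices of rank at most k."""
    return np.linalg.svd(B, compute_uv=False)[k]

t = 1e-2
# Tangent direction: second-order departure, so halving t gives a factor near 4.
tangent_ratio = sigma_k_plus_1(A + t * xi) / sigma_k_plus_1(A + t / 2 * xi)

# Direction supported on the (2,2) block: first-order departure, factor 2.
N = U_perp @ rng.standard_normal((m - k, n - k)) @ V_perp.T
normal_ratio = sigma_k_plus_1(A + t * N) / sigma_k_plus_1(A + t / 2 * N)

print(block22_norm, tangent_ratio, normal_ratio)
```

The tangent ratio is only approximately 4 because higher-order terms in $t$ contribute; the normal ratio is 2 up to rounding as long as $t$ is small compared with the $k$-th singular value of $A$.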

Riemannian manifold and gradient

For a submanifold $\mathcal{M}$ embedded in a vector space $V$: The inner product $\langle \cdot, \cdot \rangle$ on $V$ induces an inner product on $T_x \mathcal{M}$. This turns $\mathcal{M}$ into a Riemannian manifold.¹

The (Riemannian) gradient of a smooth $f : \mathcal{M} \to \mathbb{R}$ at $x \in \mathcal{M}$ is defined as the unique element $\mathrm{grad} f(x) \in T_x \mathcal{M}$ that satisfies
$$\langle \mathrm{grad} f(x), \xi \rangle = Df(x)[\xi], \qquad \forall \xi \in T_x \mathcal{M}.$$

EFY. Prove that the Riemannian gradient satisfies the steepest ascent property
$$\frac{\mathrm{grad} f(x)}{\| \mathrm{grad} f(x) \|} = \arg\max_{\xi \in T_x \mathcal{M},\ \| \xi \| = 1} Df(x)[\xi].$$

¹ In general, for a Riemannian manifold one needs to have an inner product on $T_x \mathcal{M}$ that varies smoothly with respect to $x$.
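The defining property of the Riemannian gradient can be tested on the Stiefel manifold. The sketch below rests on several assumptions that are not stated on the slide: the inner product on $V = \mathbb{R}^{m \times n}$ is taken to be the Frobenius inner product, the cost $f(X) = \mathrm{trace}(X^T A X)$ is an arbitrary illustrative choice, and the Riemannian gradient is computed as the orthogonal projection of the Euclidean gradient onto the tangent space, using the standard projection $Z \mapsto Z - X \, \mathrm{sym}(X^T Z)$ for the Stiefel manifold. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 7, 3

B = rng.standard_normal((m, m))
A = B + B.T  # symmetric matrix defining the illustrative cost

def f(X):
    """Illustrative cost f(X) = trace(X^T A X)."""
    return np.trace(X.T @ A @ X)

def egrad(X):
    """Euclidean gradient of f in R^{m x n} (A symmetric)."""
    return 2 * A @ X

def project_tangent(X, Z):
    """Orthogonal projection of Z onto T_X St(m, n): Z - X sym(X^T Z)."""
    XtZ = X.T @ Z
    return Z - X @ (XtZ + XtZ.T) / 2

X, _ = np.linalg.qr(rng.standard_normal((m, n)))  # point on St(m, n)
rgrad = project_tangent(X, egrad(X))              # candidate grad f(X)

# Random tangent vector xi at X.
xi = project_tangent(X, rng.standard_normal((m, n)))

# Df(X)[xi] by a central difference (exact up to rounding, f is quadratic).
t = 1e-5
directional = (f(X + t * xi) - f(X - t * xi)) / (2 * t)
inner = np.sum(rgrad * xi)  # Frobenius inner product <grad f(X), xi>

tangency_residual = np.linalg.norm(X.T @ rgrad + rgrad.T @ X)
gradient_residual = abs(inner - directional)

print(tangency_residual, gradient_residual)
```

Small residuals indicate that the projected Euclidean gradient lies in the tangent space and satisfies $\langle \mathrm{grad} f(x), \xi \rangle = Df(x)[\xi]$ for the sampled tangent vector; this is a spot check on one direction, not a proof.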
