 
CS 498ABD: Algorithms for Big Data, Spring 2019
SVD and Low-rank Approximation
Lecture 23, April 18, 2019
Chandra (UIUC)

Singular Value Decomposition (SVD)

Let $A$ be an $m \times n$ real-valued matrix.
$a_i$ denotes the vector corresponding to row $i$.
$A$ has $m$ rows; think of each row as a data point in $\mathbb{R}^n$.
Data applications: $m \gg n$.
Other notation: $A$ is an $n \times d$ matrix.

SVD theorem: $A$ can be written as $A = U D V^T$ where
  $V$ is an $n \times n$ orthonormal matrix,
  $D$ is an $m \times n$ diagonal matrix with at most $\min\{m, n\}$ non-zero entries, called the singular values of $A$,
  $U$ is an $m \times m$ orthonormal matrix.
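
As a quick numerical check of the theorem, the sketch below uses NumPy's `numpy.linalg.svd` on a small random matrix. The library choice and the names `A`, `U`, `s`, `Vt`, `D` are assumptions for illustration; the lecture does not prescribe an implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3                      # data applications have m >> n; kept small here
A = rng.standard_normal((m, n))  # each row is a data point in R^n

# full_matrices=True returns U (m x m) and V^T (n x n), matching the theorem.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build the m x n diagonal matrix D from the vector of singular values.
D = np.zeros((m, n))
D[:len(s), :len(s)] = np.diag(s)

reconstruction_error = np.linalg.norm(A - U @ D @ Vt)
print(U.shape, D.shape, Vt.shape, reconstruction_error)
```

The reconstruction error should be at the level of floating-point round-off, and `s` is returned in non-increasing order.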

SVD

Let $d = \min\{m, n\}$.
$u_1, u_2, \ldots, u_m$: columns of $U$, the left singular vectors of $A$.
$v_1, v_2, \ldots, v_n$: columns of $V$ (rows of $V^T$), the right singular vectors of $A$.
$\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_d$ are the singular values, with $\sigma_i = D_{i,i}$.

$$A = \sum_{i=1}^{d} \sigma_i u_i v_i^T$$

We can in fact restrict attention to $r$, the rank of $A$:

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$$
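
The rank-restricted sum can be checked numerically. The sketch below builds a matrix of rank 2 (an assumed toy example) and rebuilds it from only its first $r$ rank-one terms; `A_sum` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(1)
# A 5 x 4 matrix of rank 2, built as a product of thin random factors.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

U, s, Vt = np.linalg.svd(A)
r = np.linalg.matrix_rank(A)

# Sum of the first r rank-one terms sigma_i * u_i * v_i^T.
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
print(r, np.allclose(A, A_sum))
```

The remaining singular values `s[r:]` are numerically zero, which is why dropping their terms leaves the sum unchanged.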

SVD

Interpreting $A$ as a linear operator $A : \mathbb{R}^n \to \mathbb{R}^m$:
The columns of $V$ form an orthonormal basis, and hence $V^T x$ for $x \in \mathbb{R}^n$ expresses $x$ in the $V$ basis. Note that $x \mapsto V^T x$ is a rigid transformation (it does not change the length of $x$). Let $y = V^T x$.
$D$ is a diagonal matrix which only stretches $y$ along the coordinate axes. It also adjusts the dimension from $n$ to $m$ with the right number of zeroes. Let $z = Dy$.
Then $Uz$ is a rigid transformation that expresses $z$ in the basis corresponding to the rows of $U$.
Thus any linear operator can be broken up into a sequence of three simpler, basic types of transformation.
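
The three steps can be traced on a single vector. This is a minimal sketch with NumPy and made-up sizes; the variable names `y`, `z`, `out` follow the text, everything else is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=True)
D = np.zeros((m, n))
D[:n, :n] = np.diag(s)

x = rng.standard_normal(n)
y = Vt @ x    # coordinates of x in the V basis; length is preserved
z = D @ y     # stretch along the axes and pad from n to m coordinates
out = U @ z   # rigid transformation in R^m; length is preserved

print(np.linalg.norm(x), np.linalg.norm(y))
print(np.linalg.norm(z), np.linalg.norm(out))
print(np.allclose(out, A @ x))
```

Only the middle step changes lengths; the two outer steps are rigid.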

Low rank approximation property of SVD

Question: Given $A \in \mathbb{R}^{m \times n}$ and an integer $k$, find a matrix $B$ of rank at most $k$ such that $\|A - B\|$ is minimized.

Fact: For the Frobenius norm, the optimum for all $k$ is captured by the SVD. That is, $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$ is the best rank-$k$ approximation to $A$:

$$\|A - A_k\|_F = \min_{B : \mathrm{rank}(B) \le k} \|A - B\|_F$$

Why this magic? The Frobenius norm and basic properties of vector projections.
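
A truncated SVD is straightforward to compute. The sketch below defines a hypothetical helper `best_rank_k` and compares its Frobenius error with that of a few random rank-$k$ matrices; the comparison illustrates the fact but does not prove it.

```python
import numpy as np

def best_rank_k(A, k):
    """Return A_k, the sum of the top-k terms sigma_i u_i v_i^T of the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first k columns of U by the singular values, then multiply.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5))
k = 2
A_k = best_rank_k(A, k)
err_svd = np.linalg.norm(A - A_k, "fro")

# Error of the best among 100 random rank-k matrices B = X Y (not a search).
err_random = min(
    np.linalg.norm(
        A - rng.standard_normal((8, k)) @ rng.standard_normal((k, 5)), "fro"
    )
    for _ in range(100)
)
print(err_svd, err_random)
```

The error of $A_k$ equals $\sqrt{\sum_{i > k} \sigma_i^2}$, since the dropped terms are exactly the remaining rank-one pieces of the SVD.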

Geometric meaning

Consider $k = 1$. What is the best rank-1 matrix $B$ that minimizes $\|A - B\|_F$?
Since $B$ has rank 1, $B = u v^T$ where $v \in \mathbb{R}^n$ and $u \in \mathbb{R}^m$. Wlog $v$ is a unit vector.

$$\|A - u v^T\|_F^2 = \sum_{i=1}^{m} \|a_i - u(i)\, v\|^2$$

If we know $v$ then the best $u$ to minimize the above is determined. Why? Each term of the sum depends only on its own $u(i)$, and it is smallest when $u(i)\, v$ is the projection of $a_i$ onto $v$.
For fixed $v$, $u(i) = \langle a_i, v \rangle$.
$\|a_i - \langle a_i, v \rangle v\|_2$ is the distance of $a_i$ from the line described by $v$.
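
The claim that $u$ is determined by $v$ can be checked numerically. In this sketch (NumPy, random data, illustrative names) the choice $u = Av$ is compared with a perturbed $u$ for the same fixed direction.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))

v = rng.standard_normal(3)
v /= np.linalg.norm(v)            # unit vector, as assumed wlog

u_best = A @ v                    # u(i) = <a_i, v> for every row a_i
err_best = np.linalg.norm(A - np.outer(u_best, v), "fro") ** 2

# A perturbed u for the same v should not give a smaller error.
u_other = u_best + 0.1 * rng.standard_normal(6)
err_other = np.linalg.norm(A - np.outer(u_other, v), "fro") ** 2

# Squared distance of each row from the line through the origin along v.
dist_sq = np.sum((A - np.outer(u_best, v)) ** 2, axis=1)
print(err_best, err_other, dist_sq.sum())
```

With $u = Av$ the squared Frobenius error is exactly the sum of squared distances of the rows from the line.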

Geometric meaning

What is the best rank-1 matrix $B$ that minimizes $\|A - B\|_F$?
It is to find a unit vector/direction $v$ to minimize

$$\sum_{i=1}^{m} \|a_i - \langle a_i, v \rangle v\|^2$$

which is the same as finding a unit vector $v$ to maximize

$$\sum_{i=1}^{m} \langle a_i, v \rangle^2$$

How to find the best $v$? Not obvious: we will come to it a bit later.
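
The equivalence of the two objectives follows from the Pythagorean theorem applied to each row. A short derivation, using only that $v$ is a unit vector:

```latex
\begin{align*}
\|a_i\|_2^2 &= \langle a_i, v\rangle^2 + \|a_i - \langle a_i, v\rangle v\|_2^2, \\
\sum_{i=1}^{m} \|a_i - \langle a_i, v\rangle v\|_2^2
  &= \|A\|_F^2 - \sum_{i=1}^{m} \langle a_i, v\rangle^2 .
\end{align*}
```

Since $\|A\|_F^2$ does not depend on $v$, minimizing the left-hand side is the same as maximizing $\sum_{i=1}^{m} \langle a_i, v \rangle^2$.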

Best rank two approximation

Consider $k = 2$. What is the best rank-2 matrix $B$ that minimizes $\|A - B\|_F$?
Since $B$ has rank 2 we can assume without loss of generality that $B = u_1 v_1^T + u_2 v_2^T$ where $v_1, v_2$ are orthogonal unit vectors (they span a space of dimension 2).
Minimizing $\|A - B\|_F^2$ is the same as finding orthogonal unit vectors $v_1, v_2$ to maximize

$$\sum_{i=1}^{m} \left( \langle a_i, v_1 \rangle^2 + \langle a_i, v_2 \rangle^2 \right)$$

in other words, the best-fit 2-dimensional space.

Greedy algorithm

Find $v_1$ as the best rank-1 approximation. That is,

$$v_1 = \arg\max_{v,\, \|v\|_2 = 1} \sum_{i=1}^{m} \langle a_i, v \rangle^2 .$$

For $v_2$ solve

$$\arg\max_{v \perp v_1,\, \|v\|_2 = 1} \sum_{i=1}^{m} \langle a_i, v \rangle^2 .$$

Alternatively: let $a'_i = a_i - \langle a_i, v_1 \rangle v_1$. Let

$$v_2 = \arg\max_{v,\, \|v\|_2 = 1} \sum_{i=1}^{m} \langle a'_i, v \rangle^2 .$$

Greedy algorithm works!
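
The greedy procedure, in its "alternative" form with residual rows $a'_i$, can be sketched as follows. The slides defer how to solve each arg max. As a stand-in, this sketch uses the fact that $\sum_i \langle a_i, v \rangle^2 = v^T A^T A v$, which is maximized over unit vectors by a top eigenvector of $A^T A$, and obtains that eigenvector from `numpy.linalg.eigh`. The helper names `top_direction` and `greedy_directions` are hypothetical.

```python
import numpy as np

def top_direction(A):
    """Unit vector v maximizing sum_i <a_i, v>^2 = v^T (A^T A) v."""
    # Assumption: take the top eigenvector of the symmetric matrix A^T A.
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def greedy_directions(A, k):
    """Pick k directions greedily, removing each chosen component from the rows."""
    residual = np.array(A, dtype=float)
    directions = []
    for _ in range(k):
        v = top_direction(residual)
        directions.append(v)
        # a'_i = a_i - <a_i, v> v for every row
        residual = residual - np.outer(residual @ v, v)
    return np.array(directions)

rng = np.random.default_rng(5)
A = rng.standard_normal((10, 4))
V_greedy = greedy_directions(A, 2)

# Compare with the first two right singular vectors (equal up to sign).
_, _, Vt = np.linalg.svd(A)
print(np.abs(V_greedy @ Vt[:2].T))
```

For a generic matrix with distinct singular values, the printed matrix should be close to the 2 x 2 identity, meaning the greedy directions agree with $v_1, v_2$ from the SVD up to sign.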

Greedy algorithm correctness

Proof that Greedy works for $k = 2$.
Suppose $w_1, w_2$ are orthogonal unit vectors that form the best-fit 2-d space. Let $H$ be the space spanned by $w_1, w_2$. It suffices to prove that

$$\sum_{i=1}^{m} \left( \langle a_i, v_1 \rangle^2 + \langle a_i, v_2 \rangle^2 \right) \ge \sum_{i=1}^{m} \left( \langle a_i, w_1 \rangle^2 + \langle a_i, w_2 \rangle^2 \right)$$

If $v_1 \in H$ then we are done, because we can assume wlog that $w_1 = v_1$, and $v_2$ is at least as good as $w_2$.

Greedy algorithm correctness

Suppose $v_1 \notin H$. Let $v'_1$ be the projection of $v_1$ onto $H$ and $v''_1 = v_1 - v'_1$ be the component of $v_1$ orthogonal to $H$. Note that $\|v'_1\|_2^2 + \|v''_1\|_2^2 = \|v_1\|_2^2 = 1$.
Wlog we can assume by rotation that $w_1 = \frac{1}{\|v'_1\|_2} v'_1$ and $w_2$ is orthogonal to $v'_1$. Hence $w_2$ is orthogonal to $v_1$ (it lies in $H$, so it is also orthogonal to $v''_1$).
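
The construction in this step can be checked on random data. The sketch below picks an arbitrary 2-d subspace $H$ and an arbitrary unit vector in place of $v_1$ (neither comes from an actual best-fit computation; both are assumptions for illustration) and verifies the two identities used in the proof.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4

# Orthonormal basis (columns of Q) for a random 2-d subspace H of R^n.
Q, _ = np.linalg.qr(rng.standard_normal((n, 2)))

v1 = rng.standard_normal(n)
v1 /= np.linalg.norm(v1)

v1_proj = Q @ (Q.T @ v1)   # v'_1: projection of v_1 onto H
v1_perp = v1 - v1_proj     # v''_1: component of v_1 orthogonal to H

w1 = v1_proj / np.linalg.norm(v1_proj)
# w2: unit vector in H orthogonal to w1, via one Gram-Schmidt step.
# Start from the basis column less aligned with w1 so the remainder is nonzero.
q = Q[:, 0] if abs(Q[:, 0] @ w1) <= abs(Q[:, 1] @ w1) else Q[:, 1]
w2 = q - (q @ w1) * w1
w2 /= np.linalg.norm(w2)

print(np.linalg.norm(v1_proj) ** 2 + np.linalg.norm(v1_perp) ** 2)
print(w2 @ v1_proj, w2 @ v1)
```

The first printed value should be 1 up to round-off, and both inner products on the second line should be numerically zero.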