
CS 498ABD: Algorithms for Big Data, Spring 2019

SVD and Low-rank Approximation

Lecture 23

April 18, 2019

Chandra (UIUC)


Singular Value Decomposition (SVD)

Let A be an m × n real-valued matrix; ai denotes the vector corresponding to row i.
m rows: think of each row as a data point in Rn. Data applications: m ≫ n.
Other notation: A is an n × d matrix.
SVD theorem: A can be written as U D V^T where
V is an n × n orthonormal matrix,
D is an m × n diagonal matrix with ≤ min{m, n} non-zeroes, called the singular values of A,
U is an m × m orthonormal matrix.

SVD

Let d = min{m, n}.
u1, u2, . . . , um: columns of U, the left singular vectors of A.
v1, v2, . . . , vn: columns of V (rows of V^T), the right singular vectors of A.
σ1 ≥ σ2 ≥ . . . ≥ σd are the singular values, where σi = Di,i.

A = Σ_{i=1}^{d} σi ui vi^T

We can in fact restrict attention to r, the rank of A:

A = Σ_{i=1}^{r} σi ui vi^T
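A quick numerical check of both identities (an illustrative NumPy sketch, not part of the original slides; the sizes are arbitrary):

import numpy as np

# a tall matrix: m data points in R^n with m >> n
m, n = 200, 10
A = np.random.randn(m, n)

# full SVD: U is m x m, Vt is n x n, s holds the singular values;
# rebuild the m x n diagonal matrix D from s
U, s, Vt = np.linalg.svd(A, full_matrices=True)
D = np.zeros((m, n))
D[:n, :n] = np.diag(s)
assert np.allclose(A, U @ D @ Vt)

# A also equals the sum of rank-1 terms sigma_i * u_i * v_i^T for i up to rank(A)
r = np.linalg.matrix_rank(A)
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
assert np.allclose(A, A_sum)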


SVD

Interpreting A as a linear operator A : Rn → Rm:
The columns of V form an orthonormal basis, so V^T x for x ∈ Rn expresses x in the V basis. Note that x ↦ V^T x is a rigid transformation (it does not change the length of x). Let y = V^T x.
D is a diagonal matrix that only stretches y along the coordinate axes; it also adjusts the dimension from n to m with the right number of zeroes. Let z = Dy.
Then Uz is a rigid transformation that expresses z in the basis given by the columns of U.
Thus any linear operator can be broken up into a sequence of three simpler/basic types of transformations.
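A small check of this three-step view (a sketch, not from the slides): applying V^T, then D, then U to a vector x should reproduce Ax, and the first step should preserve length.

import numpy as np

m, n = 5, 3
A = np.random.randn(m, n)
U, s, Vt = np.linalg.svd(A, full_matrices=True)
D = np.zeros((m, n))
D[:n, :n] = np.diag(s)

x = np.random.randn(n)
y = Vt @ x    # express x in the V basis (rigid: length preserved)
z = D @ y     # stretch along the coordinate axes and pad dimension from n to m
w = U @ z     # express z in the basis given by the columns of U
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))
assert np.allclose(w, A @ x)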

Low rank approximation property of SVD

Question: Given A ∈ Rm×n and an integer k, find a matrix B of rank at most k such that ‖A − B‖ is minimized.

Fact: For the Frobenius norm the optimum for every k is captured by the SVD. That is,

Ak = Σ_{i=1}^{k} σi ui vi^T

is the best rank-k approximation to A:

‖A − Ak‖_F = min_{B : rank(B) ≤ k} ‖A − B‖_F

Why this magic? The Frobenius norm and basic properties of vector projections.
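The fact is easy to see numerically (an illustrative sketch, not from the slides): truncating the SVD to the top k terms should beat any other rank-k candidate in Frobenius norm, and the optimal error is determined by the discarded singular values.

import numpy as np

np.random.seed(0)
m, n, k = 100, 20, 3
A = np.random.randn(m, n)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# A_k keeps only the top k singular triples
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
opt = np.linalg.norm(A - A_k, 'fro')

# a few random rank-k matrices should never do better
for _ in range(5):
    B = np.random.randn(m, k) @ np.random.randn(k, n)
    assert np.linalg.norm(A - B, 'fro') >= opt

# the optimal error is sqrt(sigma_{k+1}^2 + ... + sigma_d^2)
assert np.isclose(opt, np.sqrt(np.sum(s[k:] ** 2)))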


Geometric meaning

Consider k = 1. What is the best rank-1 matrix B that minimizes ‖A − B‖_F? Since B is rank 1, B = u v^T where v ∈ Rn and u ∈ Rm. Wlog v is a unit vector.

‖A − u v^T‖_F² = Σ_{i=1}^{m} ‖ai − u(i) v‖²

If we know v then the best u to minimize the above is determined. Why? For fixed v, u(i) = ⟨ai, v⟩, and ‖ai − ⟨ai, v⟩v‖² is the squared distance of ai from the line described by v.
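A quick sanity check of the "best u for a fixed v" claim (a sketch, not from the slides): taking u(i) = ⟨ai, v⟩ should do at least as well as any perturbation of u.

import numpy as np

np.random.seed(1)
m, n = 50, 8
A = np.random.randn(m, n)

v = np.random.randn(n)
v /= np.linalg.norm(v)          # fix an arbitrary unit direction v

u_best = A @ v                  # u(i) = <a_i, v>, the projection coefficients
err_best = np.linalg.norm(A - np.outer(u_best, v), 'fro')

for _ in range(5):              # any other u should be no better
    u_other = u_best + 0.1 * np.random.randn(m)
    assert np.linalg.norm(A - np.outer(u_other, v), 'fro') >= err_best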

Geometric meaning

What is the best rank-1 matrix B that minimizes ‖A − B‖_F? It is to find a unit vector/direction v to minimize

Σ_{i=1}^{m} ‖ai − ⟨ai, v⟩v‖²

which is the same as finding a unit vector v to maximize

Σ_{i=1}^{m} ⟨ai, v⟩²

How to find the best v? Not obvious: we will come to it a bit later.
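To preview the answer (an illustrative sketch, not from the slides): the maximizing direction turns out to be the top right singular vector v1, which should beat random unit vectors on the objective Σ_{i} ⟨ai, v⟩².

import numpy as np

np.random.seed(2)
m, n = 100, 10
A = np.random.randn(m, n)

def objective(v):
    # sum_i <a_i, v>^2, which equals ||A v||^2
    return np.sum((A @ v) ** 2)

_, _, Vt = np.linalg.svd(A)
v1 = Vt[0]                       # top right singular vector

for _ in range(10):
    v = np.random.randn(n)
    v /= np.linalg.norm(v)
    assert objective(v) <= objective(v1)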

Best rank two approximation

Consider k = 2. What is the best rank-2 matrix B that minimizes ‖A − B‖_F? Since B has rank 2 we can assume without loss of generality that B = u1 v1^T + u2 v2^T where v1, v2 are orthogonal unit vectors (they span a space of dimension 2).

Minimizing ‖A − B‖_F² is the same as finding orthogonal unit vectors v1, v2 to maximize

Σ_{i=1}^{m} (⟨ai, v1⟩² + ⟨ai, v2⟩²)

in other words, the best-fit 2-dimensional subspace.
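Numerically (a sketch, not from the slides), the top two right singular vectors v1, v2 give such a best-fit 2-dimensional subspace, beating random orthonormal pairs on this objective.

import numpy as np

np.random.seed(3)
m, n = 100, 10
A = np.random.randn(m, n)

def objective(v1, v2):
    # sum_i (<a_i, v1>^2 + <a_i, v2>^2)
    return np.sum((A @ v1) ** 2) + np.sum((A @ v2) ** 2)

_, _, Vt = np.linalg.svd(A)
best = objective(Vt[0], Vt[1])   # top two right singular vectors

for _ in range(10):
    # random orthonormal pair via QR of a random n x 2 matrix
    Q, _ = np.linalg.qr(np.random.randn(n, 2))
    assert objective(Q[:, 0], Q[:, 1]) <= best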

Greedy algorithm

Find v1 as the best rank-1 approximation. That is,

v1 = arg max_{v : ‖v‖2 = 1} Σ_{i=1}^{m} ⟨ai, v⟩²

For v2 solve

v2 = arg max_{v ⊥ v1, ‖v‖2 = 1} Σ_{i=1}^{m} ⟨ai, v⟩²

Alternatively: let a′i = ai − ⟨ai, v1⟩v1 and let

v2 = arg max_{v : ‖v‖2 = 1} Σ_{i=1}^{m} ⟨a′i, v⟩²

The greedy algorithm works!
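Both definitions of v2 should agree up to sign (a sketch, not from the slides): the residual ("deflation") route recovers the second right singular vector, which is automatically orthogonal to v1.

import numpy as np

np.random.seed(4)
m, n = 100, 10
A = np.random.randn(m, n)

_, _, Vt = np.linalg.svd(A)
v1, v2 = Vt[0], Vt[1]            # the greedy answers, via the SVD

# residual route: remove the component of each row along v1, then take
# the best direction for the residual matrix
A_res = A - np.outer(A @ v1, v1)
_, _, Vt_res = np.linalg.svd(A_res)
v2_alt = Vt_res[0]

assert abs(np.dot(v2_alt, v1)) < 1e-8            # orthogonal to v1
assert np.isclose(abs(np.dot(v2_alt, v2)), 1.0)  # same direction as v2, up to sign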

Greedy algorithm correctness

Proof that Greedy works for k = 2. Suppose w1, w2 are orthogonal unit vectors that form the best-fit 2-d space. Let H be the space spanned by w1, w2.

Suffices to prove that

Σ_{i=1}^{m} (⟨ai, v1⟩² + ⟨ai, v2⟩²) ≥ Σ_{i=1}^{m} (⟨ai, w1⟩² + ⟨ai, w2⟩²)

If v1 ∈ H then we are done, because we can assume wlog that w1 = v1, and then v2 is at least as good as w2.

Greedy algorithm correctness

Suppose v1 ∉ H. Let v′1 be the projection of v1 onto H and v′′1 = v1 − v′1 be the component of v1 orthogonal to H. Note that

‖v′1‖2² + ‖v′′1‖2² = ‖v1‖2² = 1.

Wlog we can assume, by rotation within H, that w1 = v′1 / ‖v′1‖2 and that w2 is orthogonal to v′1. Hence w2 is orthogonal to v1.

Therefore v2 is at least as good as w2 (w2 is a feasible choice in the optimization defining v2), and v1 is at least as good as w1 (v1 maximizes over all unit vectors), which implies the desired claim.


Greedy algorithm for general k

Find v1 as the best rank-1 approximation. That is,

v1 = arg max_{v : ‖v‖2 = 1} Σ_{i=1}^{m} ⟨ai, v⟩²

For vk solve

vk = arg max_{v ⊥ v1, v2, . . . , vk−1, ‖v‖2 = 1} Σ_{i=1}^{m} ⟨ai, v⟩²

which is the same as solving the k = 1 case with vectors a′1, a′2, . . . , a′m that are the residuals. That is,

a′i = ai − Σ_{j=1}^{k−1} ⟨ai, vj⟩vj

Proof of correctness is via induction and is a straightforward generalization of the proof for k = 2.
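A compact sketch of the greedy procedure via residuals (illustrative code, not from the course; the rank-1 subproblem inside the loop is solved with np.linalg.svd for brevity, and the power method discussed later would be a drop-in replacement):

import numpy as np

def greedy_directions(A, k):
    # Greedily pick k orthonormal directions: each round takes the best-fit
    # direction for the current residual rows, then subtracts the projection.
    R = A.astype(float).copy()
    vs = []
    for _ in range(k):
        # best direction for the residuals = top right singular vector of R
        _, _, Vt = np.linalg.svd(R)
        v = Vt[0]
        vs.append(v)
        R = R - np.outer(R @ v, v)   # deflate: a'_i = a_i - <a_i, v> v
    return np.array(vs)

np.random.seed(5)
A = np.random.randn(100, 10)
V_greedy = greedy_directions(A, 4)
_, _, Vt = np.linalg.svd(A)
for j in range(4):
    # each greedy direction matches the j-th right singular vector up to sign
    assert np.isclose(abs(np.dot(V_greedy[j], Vt[j])), 1.0)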


Summarizing

σj² = Σ_{i=1}^{m} ⟨ai, vj⟩²

By the greedy construction σ1 ≥ σ2 ≥ . . .
Let r be the (row) rank of A. Then v1, v2, . . . , vr span the row space of A and σj = 0 for j > r.
u1 is determined by v1, u2 is determined by v1, v2, and so on. Can show that they are orthogonal.

A = Σ_{i=1}^{r} σi ui vi^T
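A one-line numerical check of the first identity (a sketch, not from the slides): Σ_{i} ⟨ai, vj⟩² equals ‖A vj‖², which should equal σj².

import numpy as np

np.random.seed(6)
A = np.random.randn(50, 8)
_, s, Vt = np.linalg.svd(A)
for j in range(len(s)):
    # sum_i <a_i, v_j>^2 = ||A v_j||^2 = sigma_j^2
    assert np.isclose(np.sum((A @ Vt[j]) ** 2), s[j] ** 2)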


Power method

Thus SVD relies on being able to solve the k = 1 case: given m vectors a1, a2, . . . , am ∈ Rn, solve

max_{v ∈ Rn, ‖v‖2 = 1} Σ_{i=1}^{m} ⟨ai, v⟩²

How do we solve the above problem? Let B = A^T A. Then

B = (Σ_{i=1}^{r} σi vi ui^T)(Σ_{i=1}^{r} σi ui vi^T) = Σ_{i=1}^{r} σi² vi vi^T
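The identity is easy to confirm numerically (a sketch, not from the slides): B = A^T A should equal Σ_{i} σi² vi vi^T.

import numpy as np

np.random.seed(7)
A = np.random.randn(40, 6)
_, s, Vt = np.linalg.svd(A)

B = A.T @ A
B_spectral = sum(s[i] ** 2 * np.outer(Vt[i], Vt[i]) for i in range(len(s)))
assert np.allclose(B, B_spectral)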

Power method continued

Let B = A^T A. Then

B² = (Σ_{i=1}^{r} σi² vi vi^T)(Σ_{i=1}^{r} σi² vi vi^T) = Σ_{i=1}^{r} σi⁴ vi vi^T

More generally,

B^k = Σ_{i=1}^{r} σi^{2k} vi vi^T

If σ1 > σ2 then, up to the scaling σ1^{2k}, B^k converges to v1 v1^T, and we can identify v1 from B^k. But it is expensive to compute B^k.
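Similarly (a sketch, not from the slides), B^k should equal Σ_{i} σi^{2k} vi vi^T, and for a separated σ1 it is dominated by the first term.

import numpy as np

np.random.seed(8)
A = np.random.randn(40, 6)
_, s, Vt = np.linalg.svd(A)

B = A.T @ A
k = 10
Bk = np.linalg.matrix_power(B, k)
Bk_spectral = sum(s[i] ** (2 * k) * np.outer(Vt[i], Vt[i]) for i in range(len(s)))
assert np.allclose(Bk, Bk_spectral)

# relative weight of everything beyond the leading term sigma_1^{2k} v1 v1^T
lead = s[0] ** (2 * k) * np.outer(Vt[0], Vt[0])
print(np.linalg.norm(Bk - lead) / np.linalg.norm(Bk))   # shrinks as k grows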

Power method continued

Pick a random (unit) vector x ∈ Rn. Then x = Σ_{i=1}^{n} λi vi since v1, v2, . . . , vn is a basis for Rn.

B^k x = (Σ_{i=1}^{r} σi^{2k} vi vi^T)(Σ_{i=1}^{n} λi vi) → σ1^{2k} λ1 v1

Can obtain v1 by normalizing B^k x to a unit vector. Computing B^k x is easier, via a series of matrix-vector multiplications.

Why a random x? What if σ1 ≃ σ2? The power method still works. See references.
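A minimal power-method sketch along these lines (illustrative; the function and parameter names are my own, not from the course): repeatedly multiply by B = A^T A and renormalize, which avoids ever forming B^k.

import numpy as np

def power_method(A, iters=200, seed=0):
    # Estimate the top right singular vector v1 of A by power iteration on
    # B = A^T A, using only matrix-vector products.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = A.T @ (A @ x)          # one application of B, without forming B
        x /= np.linalg.norm(x)     # renormalize to avoid overflow
    return x

np.random.seed(9)
A = np.random.randn(200, 20)
v_est = power_method(A)
_, _, Vt = np.linalg.svd(A)
assert abs(np.dot(v_est, Vt[0])) > 0.999   # agrees with v1 up to sign

Each iteration costs two matrix-vector products with A, which is the point: B, let alone B^k, never has to be formed explicitly.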

Linear least square/Regression and SVD

Linear least squares: Given A ∈ Rm×n and b ∈ Rm, find x to minimize ‖Ax − b‖2. Interesting when m > n, the over-constrained case, when there is no solution to Ax = b and we want to find the best fit.
Geometrically, Ax is a linear combination of the columns of A. Hence we are asking: what is the vector z in the column space of A that is closest to the vector b in ℓ2 norm?
The closest vector to b is the projection of b onto the column space of A, so it is "obvious" geometrically. How do we find it? Find an orthonormal basis z1, z2, . . . , zr for the columns of A, compute the projection b′ as

b′ = Σ_{j=1}^{r} ⟨b, zj⟩ zj

and output the answer as ‖b − b′‖2.
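A sketch of this projection recipe (illustrative code, not from the course; the orthonormal basis here is taken from the thin SVD's left singular vectors):

import numpy as np

np.random.seed(10)
m, n = 50, 5
A = np.random.randn(m, n)          # full column rank w.h.p., so r = n
b = np.random.randn(m)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))         # numerical rank
Z = U[:, :r]                       # orthonormal basis z_1, ..., z_r for col(A)

b_proj = Z @ (Z.T @ b)             # b' = sum_j <b, z_j> z_j
residual = np.linalg.norm(b - b_proj)

# matches the least-squares residual computed directly
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.isclose(residual, np.linalg.norm(A @ x_ls - b))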

Linear least square/Regression and SVD

Linear least squares: Given A ∈ Rm×n and b ∈ Rm, find x to minimize ‖Ax − b‖2. The closest vector to b is the projection of b onto the column space of A, so it is "obvious" geometrically. Find an orthonormal basis z1, z2, . . . , zr for the columns of A, compute the projection b′ = Σ_{j=1}^{r} ⟨b, zj⟩ zj, and output the answer as ‖b − b′‖2.

Finding the basis is the expensive part. Recall the SVD gives v1, v2, . . . , vr, which form a basis for the row space of A, while u1, u2, . . . , ur form a basis for the column space of A. Hence the SVD gives us all the information needed to find b′. In fact we have

min_x ‖Ax − b‖2² = Σ_{i=r+1}^{m} ⟨ui, b⟩²
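The closed-form expression can be checked the same way (a sketch, not from the slides): the squared least-squares error should equal the sum of ⟨ui, b⟩² over the left singular vectors outside the column space.

import numpy as np

np.random.seed(11)
m, n = 50, 5
A = np.random.randn(m, n)
b = np.random.randn(m)

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U is m x m
r = int(np.sum(s > 1e-10))

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
lhs = np.linalg.norm(A @ x_ls - b) ** 2
rhs = np.sum((U[:, r:].T @ b) ** 2)               # sum_{i > r} <u_i, b>^2
assert np.isclose(lhs, rhs)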
