SLIDE 1

Topics in Algorithms and Data Science
Singular Value Decomposition (SVD)

Omid Etesami

SLIDE 2

The problem of best-fit subspace

SLIDE 3

Best-fit subspace

n points in d-dimensional Euclidean space are given. The best-fit subspace of dimension k minimizes the sum of squared distances from the points to the subspace.

SLIDE 4

Centering data

SLIDE 5

Centering data

For the best-fit affine subspace, translate the points so that their center of mass lies at the origin, then find the best-fit linear subspace.

SLIDE 6

Why does centering work?

  • Lemma. The best-fit affine subspace of dimension k for the points a1,…,an passes through their center of mass.

SLIDE 7

Proof of lemma

  • W.l.o.g. assume the center of mass is 0.
  • Let a = projection of 0 onto the affine subspace ℓ, and write ℓ = a + S where S is a linear subspace; then a ⊥ S.
  • Writing ai⊥ for the component of ai orthogonal to S, Σi dist(ai, ℓ)^2 = Σi |ai⊥ − a|^2 = Σi |ai⊥|^2 + n|a|^2, since Σi ai = 0 kills the cross term.
  • The sum is therefore minimized for a = 0, i.e. when ℓ passes through the center of mass.

SLIDE 8

The greedy approach to the best subspace yields the singular vectors

SLIDE 9

The greedy approach to finding the best k-dimensional subspace

S0 = {0}
for i = 1 to k do
    Si = best-fit i-dimensional subspace among those that contain Si-1

SLIDE 10

Best-fit line

Instead of minimizing the sum of squared distances, we may equivalently maximize the sum of squared lengths of the projections onto the line: by the Pythagorean theorem, |ai|^2 = dist(ai, line)^2 + proj(ai, line)^2, and Σi |ai|^2 is fixed.

SLIDE 11

1st singular vector and value

  • v1 = unit vector in the direction of the best-fit line
  • |⟨ai, v1⟩| = length of the projection of ai onto v1
  • rows of the n×d matrix A = data points

1st right singular vector: v1 = argmax_{|v|=1} |Av|
1st singular value: σ1 = |Av1|
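As a sanity check, here is a minimal numpy sketch (the matrix and sizes are made up for illustration) comparing the argmax definition above with the first right singular vector returned by a library SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))   # rows = 100 toy data points in R^5

# First right singular vector via library SVD.
_, s, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]

# Crude search for argmax_{|v|=1} |Av| over many random unit vectors.
V = rng.standard_normal((100_000, 5))
V /= np.linalg.norm(V, axis=1, keepdims=True)
best = V[np.argmax(np.linalg.norm(V @ A.T, axis=1))]

print(np.linalg.norm(A @ v1), s[0])   # |A v1| equals the 1st singular value
print(abs(best @ v1))                 # near 1: the random search finds ~±v1
```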

SLIDE 12

Other singular vectors

{v1,…,vi} is an orthonormal basis for Si, because the sum of squared lengths of projections onto Si equals |Avi|^2 plus the sum of squared lengths of projections onto Si-1. vi is the i'th right singular vector, and σi(A) = |Avi| is the i'th singular value.

SLIDE 13

Why does greedy work?

Proof by induction on k: Consider any k-dimensional subspace Tk. It contains a unit vector wk orthogonal to Sk-1 (one exists since dim Tk > dim Sk-1). Write Tk = Tk-1 ⊕ span(wk), with wk orthogonal to Tk-1.

  • The sum of squared lengths of projections onto Tk-1 is at most that for Sk-1, by the induction hypothesis.
  • |Awk| ≤ |Avk|, since vk maximizes |Av| among unit vectors orthogonal to Sk-1.

SLIDE 14

Singular values

We only consider non-zero singular values, i.e. the i'th singular value is defined only for 1 ≤ i ≤ r = rank(A).

Lemma.

SLIDE 15

Singular Value Decomposition

SLIDE 16

Singular Value Decomposition (SVD)

Left singular vectors: ui = A vi / σi

A = U D V^T, where
  • D is diagonal with diagonal entries σi
  • U has columns ui
  • V has columns vi
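A short numpy sketch of these relations on a toy matrix (the shapes are illustrative): the columns of U recovered as A vi / σi, and A reassembled as U D V^T.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))            # toy n x d matrix

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
D = np.diag(sigma)

# Left singular vectors: u_i = A v_i / sigma_i, all columns at once.
U_from_v = (A @ Vt.T) / sigma              # column i divided by sigma_i
print(np.allclose(U_from_v, U))            # True

print(np.allclose(U @ D @ Vt, A))          # A = U D V^T
```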

SLIDE 17
  • Thm. Left singular vectors are orthogonal.

Proof:

(Figure: pairwise orthogonal vectors u1, u2, u3.)

SLIDE 18

Uniqueness of singular values/vectors

  • The sequence of singular values forms a unique non-increasing sequence.
  • The singular vectors corresponding to a particular singular value σ are any orthonormal basis for a unique subspace associated with σ.

(Figure: σ1 = σ2, so v′1, v′2 form another valid orthonormal basis; v′3 = −v3.)

SLIDE 19

Best rank-k approximation: Frobenius and spectral norms

SLIDE 20

Rank-k approximation

Ak = Σ_{i=1}^{k} σi ui vi^T is the best rank-k approximation to A under the Frobenius norm.
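A minimal sketch (toy data) that builds Ak from the top k singular triples and checks that its squared Frobenius error equals the sum of the squared tail singular values:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 20))
k = 5

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]    # A_k = sum_{i <= k} sigma_i u_i v_i^T

err2 = np.linalg.norm(A - Ak, "fro") ** 2
print(err2, np.sum(s[k:] ** 2))            # equal up to rounding
```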

SLIDE 21

Representing documents as vectors

n×d term-document matrix (rows = documents, columns = terms), built from a toy vocabulary {I, like(s), football, John, basketball}:

Doc 1:  1 1 2
Doc 2:  2 1 1
Doc 3:  1 1 1

SLIDE 22

Answering queries

  • Each query is a d-dimensional vector x denoting the importance of each term.
  • Answer = similarity (dot product) to each document = Ax.
  • O(nd) time to process each query.
SLIDE 23

SVD as preprocessing

When there are many queries, we preprocess A to obtain u1,…,uk, v1,…,vk (i.e., the rank-k approximation Ak). We can then answer each query x as Ak x in O(kd + kn) time: first compute the k inner products ⟨vi, x⟩ (O(kd)), then form Σi σi ⟨vi, x⟩ ui (O(kn)).

Good when k << d, n.
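A hedged sketch of this two-phase scheme (the matrix sizes, k, and the helper names preprocess/answer are illustrative): preprocessing keeps only the top-k factors, and each query then costs O(kd + kn).

```python
import numpy as np

def preprocess(A, k):
    """One-time cost: compute the SVD and keep only the top-k factors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k]        # shapes (n, k) and (k, d)

def answer(Us, Vt, x):
    """O(kd + kn) per query: A_k x = (U_k D_k)(V_k^T x)."""
    return Us @ (Vt @ x)                   # k inner products, then a k-term combination

rng = np.random.default_rng(3)
A = rng.standard_normal((1000, 300))       # toy n x d term-document matrix
Us, Vt = preprocess(A, k=10)
x = rng.standard_normal(300)               # toy query vector
print(answer(Us, Vt, x).shape)             # similarity of the query to each document
```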

SLIDE 24

Spectral norm

SLIDE 25

Spectral norm of error of Ak

Spectral norm of M: ||M||_2 = σ1(M) = max_{|v|=1} |Mv|. In particular, ||A − Ak||_2 = σ_{k+1}(A).

SLIDE 26

Best rank-k approximation according to spectral norm is Ak

(Figure: rank-4 approximation of an adjacency matrix.)
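A quick numpy check (toy matrix) that the spectral error of Ak is exactly the (k+1)st singular value:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((40, 30))
k = 4

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

print(np.linalg.norm(A - Ak, 2), s[k])   # spectral norm of A - A_k = sigma_{k+1}
```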
SLIDE 27

Connection of SVD with eigenvalues

SLIDE 28

Singular values and eigenvalues

  • Let B = A^T A. Then B vi = A^T (A vi) = σi A^T ui = σi^2 vi.

Therefore the eigenvalues of B are the squares of the singular values of A, and the eigenvectors of B are the right singular vectors of A.

  • If A is symmetric, the absolute values of the eigenvalues of A are the singular values of A, and the eigenvectors of A are right singular vectors of A.
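A quick numpy check of the first statement on a toy matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
B = A.T @ A                                      # symmetric positive semidefinite

eig = np.sort(np.linalg.eigvalsh(B))[::-1]       # eigenvalues of B, descending
sv = np.linalg.svd(A, compute_uv=False)          # singular values of A, descending

print(np.allclose(eig, sv ** 2))                 # eigenvalues of A^T A = sigma_i^2
```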

SLIDE 29

Analogue of eigenvectors and eigenvalues

  • A vi = σi ui
  • A^T ui = σi vi
SLIDE 30

Computing SVD

SLIDE 31

Computing SVD by the Power Method

Let B = A^T A = Σi σi^2 vi vi^T, so that B^k = Σi σi^{2k} vi vi^T. If σ1 > σ2, then B^k tends to σ1^{2k} v1 v1^T as k grows.

Estimate of v1 = a normalized column of B^k.

SLIDE 32

Inefficiency of the previous method

  • Dense matrix-matrix multiplication is expensive.
  • We cannot use the potential sparsity of A.

E.g., A may be 10^8 × 10^8 but representable by its, say, 10^9 nonzero entries; B = A^T A may have 10^16 nonzero entries, too big even to write down.

SLIDE 33

Faster power method

Use matrix-vector multiplication instead of matrix-matrix multiplication.

Algorithm:

  • Choose a random vector x.
  • Compute B^k x = A^T A A^T A … A^T A x, right to left, as 2k matrix-vector products.
  • Output B^k x, normalized, as the estimate of v1.
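A minimal sketch of this algorithm (the iteration count and sizes are made up; normalizing at every step instead of once at the end is an equivalent, numerically safer variant):

```python
import numpy as np

def power_method(A, iters=200, seed=0):
    """Estimate the 1st right singular vector v1 using only matrix-vector
    products with A and A^T, so any sparsity of A can be exploited."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])    # random start: <x, v1> != 0 w.h.p.
    for _ in range(iters):
        x = A.T @ (A @ x)                  # one application of B = A^T A
        x /= np.linalg.norm(x)             # rescale; only the direction matters
    return x

rng = np.random.default_rng(5)
A = rng.standard_normal((200, 50))
v1_est = power_method(A)
Vt = np.linalg.svd(A, full_matrices=False)[2]
print(abs(v1_est @ Vt[0]))                 # near 1 when sigma_1 > sigma_2
```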
SLIDE 34

Component of random vector along 1st singular vector

  • Lemma. Pr[ |⟨x, v1⟩| ≤ 1/(20 d^{1/2}) ] ≤ 1/10 + exp(−Θ(d)).

Proof:

  • x = y / |y| where y is a spherical Gaussian with unit variance.
  • Pr[ |⟨y, v1⟩| ≤ 1/10 ] ≤ 1/10   (⟨y, v1⟩ is a standard normal)
  • Pr[ |y| ≥ 2 d^{1/2} ] ≤ exp(−Θ(d))   (Gaussian annulus theorem)
  • If neither bad event occurs, |⟨x, v1⟩| ≥ (1/10) / (2 d^{1/2}) = 1/(20 d^{1/2}).

SLIDE 35

Analysis of the power method

  • Let V = span of the right singular vectors with singular values ≥ (1 − ε)σ1.
  • Assume |⟨x, v1⟩| ≥ δ, for a unit vector x.
  • |B^k x| ≥ σ1^{2k} δ, from the v1 component alone.
  • The component of B^k x perpendicular to V has length at most [(1 − ε)σ1]^{2k}.

So the relative weight of the component outside V decays like (1 − ε)^{2k}/δ.
SLIDE 36

Traditional application of SVD: Principal Component Analysis

SLIDE 37

Movie recommendation

n customers, d movies; matrix A where aij = rating of user i for movie j.

SLIDE 38

Principal Component Analysis (PCA)

  • Assume there are k underlying factors, e.g. “amount of comedy”, “novelty of story”, …
  • each movie = k-dimensional vector
  • each user = k-dimensional vector representing the importance of each factor to the user
  • rating = dot product ⟨movie, user⟩
  • Ak = best rank-k approximation to A yields U, V (as sketched below)
  • A − UV treated as noise
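A hedged sketch of this factor model on synthetic ratings (the data, k, and the factor extraction are illustrative assumptions, not a full PCA pipeline):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.integers(1, 6, size=(100, 40)).astype(float)   # toy n x d rating matrix
k = 3                                                  # assumed number of factors

Uf, s, Vt = np.linalg.svd(A, full_matrices=False)
users = Uf[:, :k] * s[:k]              # user i  -> row i: k factor weights
movies = Vt[:k].T                      # movie j -> row j: k factor values

# rating ~ <user, movie>; the residual A - users @ movies.T is treated as noise
noise = A - users @ movies.T
print(np.linalg.norm(noise, "fro") / np.linalg.norm(A, "fro"))
```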
SLIDE 39

Collaborative filtering

  • A has missing entries: recommend a movie or target an ad based on previous purchases.
  • Assume A = low-rank matrix + noise.
  • One approach is to fill the missing values reasonably, e.g. by the average rating, then apply SVD to recover the missing entries (a sketch follows).
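A minimal sketch of this fill-then-SVD heuristic (the fill rule, the observation mask, and k are illustrative assumptions):

```python
import numpy as np

def fill_then_svd(A, observed, k):
    """observed[i, j] == True where a rating exists. Fill each movie's
    missing entries with its average observed rating, then project the
    filled matrix onto its top-k singular subspace."""
    filled = A.copy()
    for j in range(A.shape[1]):
        col = A[observed[:, j], j]
        filled[~observed[:, j], j] = col.mean() if col.size else 0.0
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # low-rank estimate of all ratings

rng = np.random.default_rng(7)
A = rng.integers(1, 6, size=(30, 10)).astype(float)    # toy ratings
observed = rng.random(A.shape) < 0.7                   # ~70% of entries observed
print(fill_then_svd(A, observed, k=2).shape)
```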

SLIDE 40

Application of SVD: clustering mixture of spherical Gaussians

SLIDE 41

Clustering

  • Partition d-dimensional points into k groups.
  • Finding the “best” solution is often NP-hard; thus, we assume stochastic models of data.

SLIDE 42

Mixture models

One class of stochastic models is mixture models, e.g. a mixture of spherical Gaussians: F = w1 p1 + … + wk pk.

SLIDE 43

Model fitting problem

Given n i.i.d. samples drawn according to F, fit a mixture of k Gaussians to them. Possible solution:

  • First, cluster the points into k clusters.
  • Then, fit a Gaussian to each cluster (by choosing the empirical mean and variance).

SLIDE 44

Inter-center distance

  • If two Gaussian centers are very close, clustering is unresolvable.
  • If every two Gaussian centers are at least, say, six standard deviations apart, clustering is unambiguous.

SLIDE 45

Distance based clustering

  • If x, y are independent samples from the same Gaussian, then |x − y|^2 = 2 (d^{1/2} ± O(1))^2 σ^2.
  • If x, y are independent samples from two Gaussians whose centers are at distance Δ, then |x − y|^2 = 2 (d^{1/2} ± O(1))^2 σ^2 + Δ^2.

Thus, to distinguish the two cases, Δ^2 must dominate the O(d^{1/2}) σ^2 fluctuation, i.e. we need inter-center distance Δ ≥ Ω(σ d^{1/4}).
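A small simulation (the dimension and Δ are made up) of the two squared-distance regimes, with σ = 1:

```python
import numpy as np

rng = np.random.default_rng(10)
d, trials, delta = 1000, 2000, 8.0

x = rng.standard_normal((trials, d))           # samples from N(0, I)
y = rng.standard_normal((trials, d))           # same Gaussian
z = rng.standard_normal((trials, d))
z[:, 0] += delta                               # center at distance delta

same = np.sum((x - y) ** 2, axis=1)            # ~ 2d
diff = np.sum((x - z) ** 2, axis=1)            # ~ 2d + delta^2
print(same.mean(), diff.mean() - delta ** 2)   # both concentrate near 2d
```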

SLIDE 46

Projection on the subspace spanned by the k centers

If we knew the subspace spanned by the k centers, we could project the points onto it. Inter-center distances would not change, and the samples would still be spherical Gaussians, but now in k dimensions, so a separation of Θ(σ k^{1/4}) is enough.

SLIDE 47

How to find the subspace spanned by the k centers?

  • Theorem. The best-fit subspace of dimension k for points sampled according to the mixture distribution passes through the k centers. Thus, for a large number n of sampled points, we can find the subspace by SVD (sketch below).

(Figure: green points = sample points.)
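A sketch of the theorem in action (all parameters made up): sample a two-Gaussian mixture and check that the top-k right singular subspace nearly contains both centers.

```python
import numpy as np

rng = np.random.default_rng(8)
d, n, k = 50, 5000, 2
centers = np.zeros((k, d))
centers[0, 0] = centers[1, 1] = 10.0       # two well-separated centers

labels = rng.integers(0, k, size=n)
X = centers[labels] + rng.standard_normal((n, d))   # spherical, unit variance

# Best-fit k-dim subspace = span of the top-k right singular vectors.
Vt = np.linalg.svd(X, full_matrices=False)[2][:k]
P = Vt.T @ Vt                              # orthogonal projector onto that span

for c in centers:
    print(np.linalg.norm(P @ c - c) / np.linalg.norm(c))   # small: center ~ in span
```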

SLIDE 48

Why does the best-fit subspace pass through the centers?

  • The best-fit line for a single Gaussian passes through its center.
  • Proof. For a unit vector v and sample point x, ⟨v, x⟩ is Gaussian, so E[⟨v, x⟩^2] = σ^2 + ⟨v, μ⟩^2, maximized when v points along the center μ.
  • The best-fit dim-k subspace for a single Gaussian is any subspace that passes through the center.
  • Proof. Apply the greedy characterization of the best-fit subspace; each step can keep the center in the subspace.
  • A subspace of dim k passing through the k centers is simultaneously best for all the Gaussians.

SLIDE 49

Application of SVD: ranking documents and webpages

SLIDE 50

Ranking documents

Given documents in a collection, how do we rank documents according to their relevance to the collection?

  • Solution. We can rank according to the length of the projection of each document vector onto the first right singular vector.

(Term-document matrix from Slide 21.)

SLIDE 51

Ranking webpages

  • The web as a directed graph: webpages as vertices, hyperlinks as edges.
  • Authorities: sources of information (pointed to by many hubs).
  • Hubs: pages that identify authorities (pointing to many authorities).

This looks like a “circular” definition.

SLIDE 52

HITS algorithm for ranking webpages

  • v = vector of authority weights
  • u = vector of hub weights

Begin with a random v. Iteratively set u := Av and v := A^T u. Rank authorities according to v. This is the same as computing the 1st right singular vector through the power method (sketch below).
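A minimal sketch of HITS as stated (the link matrix is a toy example):

```python
import numpy as np

def hits(A, iters=100, seed=0):
    """A[i, j] = 1 if page i links to page j. Returns authority weights;
    v converges to the 1st right singular vector of A (power method)."""
    rng = np.random.default_rng(seed)
    v = rng.random(A.shape[1])
    for _ in range(iters):
        u = A @ v                  # hub weight: sum of authority weights it links to
        v = A.T @ u                # authority weight: sum of hub weights linking to it
        v /= np.linalg.norm(v)     # normalize so the weights stay bounded
    return v

A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0]], float)        # toy web graph
print(hits(A))                             # rank authorities by these weights
```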

SLIDE 53

PageRank

Another ranking algorithm, but based on random walks.

SLIDE 54

Application of SVD: directed Max-Cut

SLIDE 55

Directed Max-Cut problem

Given a directed graph, partition the vertices into two sets S, T so that the number of edges from S to T is maximized. This is NP-hard. Equivalently, with A the adjacency matrix, we want to find a 0-1 vector x (the indicator of S) such that x^T A (1 − x) is maximized.

(Figure: dark edges are cut.)

SLIDE 56

Approximation algorithm

For constant k, maximize x^T Ak (1 − x) instead of x^T A (1 − x).

(Figure: rank-4 approximation of an adjacency matrix.)
SLIDE 57

Why doesn’t the objective function change much?

For any 0-1 vector x, |x^T (A − Ak) (1 − x)| ≤ ||A − Ak||_2 · |x| · |1 − x| ≤ σ_{k+1} · n, and σ_{k+1} ≤ ||A||_F / (k+1)^{1/2} = (m/(k+1))^{1/2} for a graph with m edges; for dense graphs and a large constant k this is a lower-order term compared to the maximum cut.
SLIDE 58

Optimizing the objective function for low-rank matrices

This is NP-hard even for k = 1 (reduction from the Set Partition problem). We approximate instead.

SLIDE 59

Rounding the singular vectors

SLIDE 60

Optimizing the objective function for rounded vectors