Versatility of Singular Value Decomposition (SVD)
January 7, 2015
Assumption: Data = Real Data + Noise

Each data point is a column of the n × d data matrix A.

A = B (Real Data) + C (Noise).

rank(B) ≤ k. ||C|| (= max over unit vectors u of |Cu|) ≤ Δ. k << n, d. Δ small.

Caution: ||C||_F (= (Σ_ij C_ij²)^(1/2)) need not be smaller than, for example, ||B||_F. In words, the overall noise can be larger than the overall real data.

Given any A, Singular Value Decomposition (SVD) finds the B of rank k (or less) for which ||A − B|| is minimum. The space spanned by the columns of B is the best-fit subspace for A, in the sense of the least sum, over all data points, of squared distances to the subspace.

A very powerful tool. Decades of theory, algorithms. Here: example applications.
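As a concrete illustration of the statement above, here is a minimal numpy sketch (not from the talk) of computing the best rank-k approximation B from the top k singular triplets of A; the toy sizes and noise level are arbitrary choices.

```python
import numpy as np

def best_rank_k_approximation(A, k):
    """Return the rank-k matrix B minimizing ||A - B|| (spectral and Frobenius),
    built from the top-k singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Toy data: rank-k "real data" plus small noise (sizes are illustrative).
rng = np.random.default_rng(0)
n, d, k = 100, 200, 5
B_true = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))  # rank-k signal
C = 0.1 * rng.standard_normal((n, d))                                # noise
A = B_true + C
B_hat = best_rank_k_approximation(A, k)
print(np.linalg.norm(A - B_hat, 2))  # spectral-norm residual, roughly ||C||
```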
Example I - Mixture of Spherical Gaussians

F(x) = w1 N(µ1, σ1²) + w2 N(µ2, σ2²) + ··· + wk N(µk, σk²), in d dimensions.

Learning Problem: Given i.i.d. samples from F(·), find the components (µi, σi, wi). Really a clustering problem.

In 1 dimension, we can solve the learning problem if the means of the component densities are Ω(1) standard deviations apart.

But in d dimensions, approximate k-means fails: a pair of samples from different clusters may be closer than a pair from the same cluster!
SVD to the Rescue

For a mixture of k spherical Gaussians (with different variances), the best-fit k-dimensional subspace (found by SVD) passes through all the k centers. Vempala, Wang.

Beautiful proof: For one spherical Gaussian with non-zero mean, the best-fit 1-dimensional subspace passes through the mean, and any k-dimensional subspace containing the mean is a best-fit k-dimensional subspace. So, if a k-dimensional subspace contains all the k means, it is individually the best fit for each component Gaussian!

Simple observation to finish: Given the k-dimensional space containing the means, we need only solve a k-dimensional problem. This can be done in time exponential only in k.
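A minimal sketch of how the projection step might look, assuming numpy and scikit-learn are available (not code from the talk): project the samples onto the span of the top-k right singular vectors, i.e. the best-fit k-dimensional subspace, and run k-means there. The mixture parameters below are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
d, k, per = 500, 3, 200                      # dimension, components, samples per component
means = 3.0 * rng.standard_normal((k, d))    # reasonably separated component means
X = np.vstack([m + rng.standard_normal((per, d)) for m in means])  # samples as rows

# Best-fit k-dimensional subspace of the sample set: top-k right singular vectors of X.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Y = X @ Vt[:k].T                             # coordinates of each sample in that subspace

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)
print(np.bincount(labels))                   # roughly `per` samples per recovered cluster
```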
Planted Clique Problem

Given G = G(n, 1/2) + S × S (S unknown, |S| = s), find S in polynomial time. Best known: s ≥ Ω(√n).

A = the ±1 adjacency-style matrix: all-1 entries on the S × S block (the planted clique), independent ±1 entries everywhere else.

||Planted Clique block|| = s. Random Matrix Theory: a random ±1 matrix has norm at most 2√n. So, SVD finds S when s ≥ Ω(√n). Alon, Boppana (1985).

Feldman, Grigorescu, Reyzin, Vempala, Xiao (2014): Cannot be beaten by Statistical Learning Algorithms.
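The spectral idea can be sketched in a few lines of numpy. This is an illustration only: the instance sizes are arbitrary, and reading off the s largest coordinates of the top singular vector stands in for the clean-up step that analyses of this approach typically include.

```python
import numpy as np

rng = np.random.default_rng(2)
n, s = 1000, 100                                 # s comfortably above sqrt(n) ~ 32
A = np.sign(rng.standard_normal((n, n)))
A = np.triu(A) + np.triu(A, 1).T                 # symmetric random +/-1 matrix
S = rng.choice(n, size=s, replace=False)
A[np.ix_(S, S)] = 1.0                            # plant the all-ones clique block

# The planted block contributes ~s to the norm, the random part only ~2*sqrt(n),
# so the top singular vector puts most of its weight on S.
_, _, Vt = np.linalg.svd(A)
guess = np.argsort(-np.abs(Vt[0]))[:s]
print(len(set(guess) & set(S)) / s)              # fraction of S recovered
```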
Planted Gaussians: Signal and Noise

A is an n × n matrix and S ⊆ [n], |S| = k. The A_ij are all independent random variables.

For i, j ∈ S: Pr(A_ij ≥ µ) ≥ 1/2 (e.g. N(µ, σ²)). Signal = µ.

For all other i, j: A_ij is N(0, σ²). Noise = σ.

Given A, µ, σ, find S. [Recall Planted Clique.]

[Figure: block matrix A; the S × S block has entries µ + N(0, σ²), all other entries are N(0, σ²).]
Exponential Advantage in SNR by Thresholding

Brave new step: Threshold the entries of A at µ → 0-1 matrix B.

E(B): at least 1/2 on the S × S block, exp(−µ²/2σ²) elsewhere.

Subtract exp(−µ²/2σ²) from every entry. The signal block then has norm ≥ k/4, while the remaining random part has norm ≤ √n · exp(−c µ²/σ²).

So, SVD finds S provided exp(c(µ/σ)²) > √n / k.

Cf: Ordinary SVD (without thresholding) succeeds if µ/σ > √n / k.
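A rough numpy sketch of the thresholding step on a planted-Gaussian instance (not from the talk). It subtracts the empirical mean of B as a stand-in for subtracting exp(−µ²/2σ²), and the parameters are chosen only so the effect is visible, not to exhibit the regime where thresholding beats plain SVD.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k_size, mu, sigma = 1500, 80, 1.5, 1.0
A = sigma * rng.standard_normal((n, n))
S = rng.choice(n, size=k_size, replace=False)
A[np.ix_(S, S)] += mu                        # planted mean shift on the S x S block

# "Brave new step": threshold at mu to get a 0-1 matrix, then center it before the SVD.
B = (A >= mu).astype(float)
B -= B.mean()                                # empirical stand-in for exp(-mu^2 / 2 sigma^2)
_, _, Vt = np.linalg.svd(B)
guess = np.argsort(-np.abs(Vt[0]))[:k_size]
print(len(set(guess) & set(S)) / k_size)     # fraction of S recovered
```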
Thresholding: Second Plus

Data points {A_1, A_2, ..., A_j, ...} in R^d, d features.

Data points are in 2 "SOFT" clusters: data point j belongs w_j to cluster 1 and 1 − w_j to cluster 2. (More generally, k clusters.)

Each cluster has some dominant features and each data point has a dominant cluster.

A_ij ≥ µ if feature i is a dominant feature of the dominant topic of data point j; A_ij ≤ σ otherwise.

If the variance above µ is larger than the gap between µ and σ, a 2-clustering criterion (like 2-means) may split the high-weight cluster instead of separating it from the others.

Two differences from mixtures: soft membership, and high variance in the dominant features.
Topic Modeling: The Problem

Joint work with T. Bansal and C. Bhattacharyya.

d features - the words in the dictionary. A document is a d-dimensional (column) vector.

k topics. Topic l is a d-vector (the probabilities of the words in that topic).

To generate doc j: generate a random convex combination of the topic vectors, then generate the words of doc j in i.i.d. trials, each drawn from the multinomial with probabilities = the convex combination. ***DRAW PICTURE ON BOARD WITH SPORTS, POLITICS, WEATHER***

The Topic Modeling Problem: Given only A, find an approximation to all topic vectors so that the l1 error in each topic vector is at most ε. The l1 error is crucial (l2 misses small words). Generally NP-hard.
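A minimal numpy sketch of the generative process just described; the Dirichlet priors used to draw the topic vectors and the topic weights are illustrative choices, not part of the model as stated in the talk.

```python
import numpy as np

rng = np.random.default_rng(4)
d, k, n_docs, doc_len = 1000, 3, 500, 300

# Topic vectors: columns of M, each a probability distribution over the d words.
M = rng.dirichlet(0.05 * np.ones(d), size=k).T            # d x k

# Each document: a random convex combination of topics, then i.i.d. word draws.
W = rng.dirichlet(0.1 * np.ones(k), size=n_docs).T        # k x n_docs topic weights
A = np.zeros((d, n_docs))
for j in range(n_docs):
    p = M @ W[:, j]                                        # word distribution of doc j
    p = p / p.sum()                                        # guard against float drift
    A[:, j] = rng.multinomial(doc_len, p) / doc_len        # empirical word frequencies
```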
Topic Modeling is Soft Clustering

Topic Vectors ≡ Cluster Centers.

Each data point (doc) belongs to a weighted combination of clusters, and is generated from a distribution (which happens to be multinomial) with expectation = that weighted combination.

Even if we manage to solve the clustering problem somehow, it is not true that the cluster centers are averages of documents. Big distinction from learning mixtures, which is hard clustering.
Geometry

Topic Modeling = Soft Clustering.

[Figure: a simplex with corners ν1, ν2, ν3. νm = the m-th topic vector; X = a weighted combination of the νm; the o's are the words of a doc - i.i.d. choices with mean X.]

Given the docs (the means of the o's), find the νm. It helps to find nearly pure docs (X near a corner).
Prior Results and Assumptions

Under the Pure Topics and Primary Words (1 − ε of the words are primary) assumptions, SVD solves it. Papadimitriou, Raghavan, Tamaki, Vempala.

Belief: SVD cannot handle the non-pure-topic case.

LDA: the most used model. Blei, Ng, Jordan. Multiple topics per doc.

Anandkumar, Foster, Hsu, Kakade, Liu do topic modeling under LDA, to l2 error, using clever tensor methods. Parameters.

Arora, Ge, Moitra assume Anchor Words + other parameters: each topic has one word that (a) occurs only in that topic and (b) has high frequency. First provable algorithm.

Our Goals: Intuitive, empirically verified assumptions. A natural, provable algorithm.
Our Assumptions

Intuitive to topic modeling, not numerical parameters like condition number.

Catchwords: Each topic has a set of words such that (a) each occurs more frequently in that topic than in the others and (b) together, they have high frequency.

Dominant Topics: Each document has a dominant topic with weight (in that doc) at least some α, whereas the non-dominant topics have weight at most some β.

Nearly Pure Documents: Each topic has a (small) fraction of documents which are 1 − δ pure for that topic.

No Local Min: For every word, the plot of the number of documents versus the number of occurrences of the word (conditioned on dominant topic) has no local minimum. [Zipf's law or unimodal.]
The Algorithm - Threshold SVD (TSVD)

s = number of docs. For this talk, the probability that each topic is dominant is 1/k.

Threshold: Compute a threshold for each word i at the first "gap": the largest ζ such that A_ij ≥ ζ for ≥ s/(2k) of the j's and A_ij = ζ for ≤ εs of the j's.

SVD: Use SVD on the thresholded matrix to get starting centers for the k-means algorithm.

k-means: Run k-means. Will show: this identifies the dominant topic of each doc.

Identify Catchwords: Find the set of high-frequency words in each cluster. Will show: this is the set of catchwords for the topic.

Identify Pure Docs: Find the set of documents with the highest total number of occurrences of the catchwords. Show: these are nearly pure docs, and their average ≈ topic vector. (A sketch of the whole pipeline follows below.)
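A highly simplified, hypothetical sketch of the TSVD pipeline in numpy/scikit-learn, not the authors' implementation: it replaces the "first gap" threshold with a per-word quantile and uses a crude frequency rule for catchwords, so it only illustrates the flow threshold → SVD → k-means → catchwords → average of nearly pure docs.

```python
import numpy as np
from sklearn.cluster import KMeans

def tsvd_sketch(A, k, top_words=50, pure_frac=0.05):
    """Simplified Threshold-SVD pipeline. A is d x s (words x docs) word frequencies."""
    d, s = A.shape
    # 1. Threshold each word (row); a per-word quantile stands in for the "first gap" rule.
    zeta = np.quantile(A, 1.0 - 1.0 / (2 * k), axis=1, keepdims=True)
    B = (A > zeta).astype(float)
    # 2-3. SVD of the thresholded matrix, then k-means on the projected docs.
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    proj = U[:, :k].T @ B                                       # k x s
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(proj.T)
    # 4. "Catchwords": here just the highest-frequency words within each cluster.
    # 5. Average the docs richest in those words as the topic estimate.
    topics = np.zeros((d, k))
    for l in range(k):
        cluster = A[:, labels == l]
        catch = np.argsort(-cluster.mean(axis=1))[:top_words]
        score = cluster[catch].sum(axis=0)
        n_pure = max(1, int(pure_frac * cluster.shape[1]))
        topics[:, l] = cluster[:, np.argsort(-score)[:n_pure]].mean(axis=1)
    return topics, labels
```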
The advantage of Thresholding

Diagonal blue blocks are the catchwords for each topic. Black: non-catchwords.

[Figure: the simplex picture with topic vectors ν1, ν2, ν3; after Thresholding + SVD + k-means, the documents are grouped by dominant topic.]
Properties of Thresholding

Using no local min, show: no threshold splits any dominant topic in the "middle". So the thresholded matrix is a "block" matrix for the catchwords. But non-catchwords can be high on several topics. PICTURE ON THE BOARD OF A BLOCK MATRIX.

Done? No. We need inter-cluster separation ≥ intra-cluster spread (the variance inside a cluster).

Catchwords provide sufficient inter-cluster separation.

The inside-cluster variance is bounded with machinery from Random Matrix Theory. Beware: only the columns are independent; the rows are not.

Appeal to a result on k-means (Kumar, K.): if the inter-cluster separation is ≥ the inside-cluster directional standard deviation, then SVD followed by k-means clusters correctly.
Getting Topic Vectors

PICTURE OF A SIMPLEX with the columns of M as extreme points and the cluster of docs with each dominant topic. Taking the average of the docs in T_l is no good.
Empirical Results: Datasets

NIPS: 1,500 NIPS full papers.
NYT: Random subset of 30,000 documents from the New York Times dataset.
Pubmed: Random subset of 30,000 documents from the Pubmed abstracts dataset.
20NG: 13,389 documents from the 20NewsGroup dataset.
Empirical Results: Assumptions

Corpus   Documents   K    α = 0.4   α = 0.8   α = 0.9
NIPS     1,500       50   56.6%     10.7%     4.8%
NYT      30,000      50   63.7%     20.9%     12.7%
Pubmed   30,000      50   62.2%     20.3%     10.7%
20NG     13,389      20   74.1%     54.4%     44.3%

Table: Fraction of documents satisfying the dominant topic assumption.

Corpus   K    Mean per-topic frequency of CW   % Topics with CW
NIPS     50   0.05                             95%
NYT      50   0.11                             100%
Pubmed   50   0.05                             90%
20NG     20   0.06                             100%

Table: Catchwords (CW) assumption with ρ = 1.1, ε = 0.25.
Empirical Results: Semi-synthetic Data

Generate semi-synthetic corpora from an LDA model trained by MCMC, to ensure that the synthetic corpora retain the characteristics of real data.

Gibbs sampling is run for 1,000 iterations on all four datasets, and the final word-topic distribution is used to generate a varying number (s) of synthetic documents, with document-topic distributions drawn from a symmetric Dirichlet with hyper-parameter 0.01.

Note that the synthetic data is not guaranteed to satisfy the dominant topic assumption for every document; on average, about 80% of the documents satisfy the assumption.
Empirical Results: L1 Reconstruction Error

And percent improvement over Recover-KL. Total average improvement over R-KL is 20%.

Corpus   Documents   Tensor   R-L2    R-KL    TSVD    % Improvement
NIPS     40,000      0.298    0.342   0.308   0.094   68.5%
NIPS     60,000      0.296    0.346   0.311   0.089   69.9%
NIPS     80,000      0.285    0.335   0.303   0.087   69.4%
NIPS     100,000     0.280    0.344   0.306   0.086   69.3%
NIPS     150,000     0.320    0.336   0.302   0.084   72.2%
NIPS     200,000     0.322    0.335   0.301   0.113   62.5%
Pubmed   40,000      0.379    0.388   0.332   0.326   1.8%
Pubmed   60,000      0.317    0.372   0.328   0.287   9.5%
Pubmed   80,000      0.321    0.358   0.320   0.276   13.8%
Pubmed   100,000     0.304    0.350   0.315   0.276   9.2%
Pubmed   150,000     0.355    0.344   0.313   0.239   23.6%
Pubmed   200,000     0.322    0.334   0.309   0.225   27.3%
20NG     40,000      0.174    0.126   0.120   0.124   -3.3%
20NG     60,000      0.207    0.114   0.110   0.106   3.6%
20NG     80,000      0.203    0.110   0.108   0.095   12.0%
20NG     100,000     0.151    0.103   0.102   0.087   14.7%
20NG     200,000     0.162    0.096   0.097   0.072   25.8%
NYT      40,000      0.316    0.214   0.208   0.174   16.3%
NYT      60,000      0.330    0.205   0.200   0.156   22.0%
NYT      80,000      0.330    0.198   0.196   0.168   14.3%
NYT      100,000     0.353    0.198   0.196   0.163   16.8%
Empirical Results: L1 Reconstruction Error

Histogram of L1 error across topics for 40k synthetic documents. On a majority of the topics (> 90%), the recovery error of TSVD is significantly smaller than that of Recover-KL.

[Figure: per-topic L1 reconstruction error for NIPS, NYT, Pubmed, and 20NG, comparing R-KL and TSVD.]
Empirical Results: Perplexity & Topic Coherence

[Figure: perplexity and topic coherence on 20NG, NIPS, NYT, and Pubmed, comparing TSVD, Recover-L2, and Recover-KL.]