High Dimensional Data PCA (cs542g-term1-2006)


  1. Slides 1-6

     High Dimensional Data
     - So far we've considered scalar data values $f_i$ (or interpolated/approximated each component of vector-valued data individually)
     - In many applications, the data is itself in a high-dimensional space
       • Or there's no real distinction between dependent (f) and independent (x) variables -- we just have data points
     - Assumption: the data is actually organized along a smaller-dimensional manifold
       • i.e. generated from a smaller set of parameters than the number of output variables
     - Huge topic: machine learning
     - Simplest approach: Principal Components Analysis (PCA)

     PCA
     - We have n data points from m dimensions: store them as the columns of an m×n matrix A
     - We're looking for linear correlations between dimensions
       • Roughly speaking, fitting lines or planes or hyperplanes through the origin to the data
       • May want to subtract off the mean value along each dimension for this to make sense

     Reduction to 1D
     - Assume the data points fit a line through the origin (a 1D subspace)
     - In this case, say the line is along a unit vector u (an m-dimensional vector)
     - Each data point should be a multiple of u (call the scalar multiples $w_i$): $A_{*i} = u w_i$
     - That is, A would be rank-1: $A = u w^T$
     - Problem in general: find the rank-1 matrix that best approximates A

     The rank-1 problem
     - Use the least-squares formulation again:
       $\min_{u \in \mathbb{R}^m,\ \|u\|=1,\ w \in \mathbb{R}^n} \|A - u w^T\|_F^2$
     - Clean it up: take $w = \sigma v$ with $\sigma \ge 0$ and $\|v\| = 1$:
       $\min_{u \in \mathbb{R}^m,\ \|u\|=1,\ v \in \mathbb{R}^n,\ \|v\|=1,\ \sigma \ge 0} \|A - \sigma u v^T\|_F^2$
     - u and v are the first principal components of A

     Solving the rank-1 problem
     - Remember the trace version of the Frobenius norm:
       $\|A - \sigma u v^T\|_F^2 = \mathrm{tr}\big((A - \sigma u v^T)^T (A - \sigma u v^T)\big) = \mathrm{tr}(A^T A) - \mathrm{tr}(A^T \sigma u v^T) - \mathrm{tr}(\sigma v u^T A) + \mathrm{tr}(\sigma v u^T u \sigma v^T) = \mathrm{tr}(A^T A) - 2\sigma\, u^T A v + \sigma^2$
     - Minimize with respect to $\sigma$ first:
       $\frac{\partial}{\partial \sigma} \|A - \sigma u v^T\|_F^2 = -2 u^T A v + 2\sigma = 0 \;\Rightarrow\; \sigma = u^T A v$
     - Then plug in to get a problem for u and v:
       $\min\, -(u^T A v)^2 \quad\Longleftrightarrow\quad \max\, (u^T A v)^2$

     Finding u
     - First look at u:
       $(u^T A v)^2 = u^T A v v^T A^T u \le u^T A A^T u$
     - $A A^T$ is symmetric, thus has a complete set of orthonormal eigenvectors $X_i$ with eigenvalues $\mu_i$
     - Write u in this basis: $u = \sum_{i=1}^m \hat{u}_i X_i$
     - Then the quantity to maximize is
       $u^T A A^T u = \sum_{i=1}^m \mu_i \hat{u}_i^2$
     - Obviously pick u to be the eigenvector with the largest eigenvalue
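The rank-1 derivation above translates almost directly into NumPy. Below is a minimal sketch (not from the lecture; the synthetic data and variable names are illustrative) that takes u as the top eigenvector of $A A^T$ and then recovers $\sigma$ and v from $w = A^T u$, exactly as in the $w = \sigma v$ substitution:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 100

# Synthetic data: points scattered along a line through the origin, stored as
# columns of an m x n matrix A (direction u_true is made up for illustration).
u_true = np.array([1.0, 2.0, 0.5, -1.0, 0.3])
u_true /= np.linalg.norm(u_true)
A = np.outer(u_true, rng.normal(size=n)) + 0.05 * rng.normal(size=(m, n))

# u: eigenvector of A A^T with the largest eigenvalue (eigh sorts ascending).
eigvals, eigvecs = np.linalg.eigh(A @ A.T)
u = eigvecs[:, -1]

# From the w = sigma * v substitution: w = A^T u, sigma = |w|, v = w / sigma.
w = A.T @ u
sigma = np.linalg.norm(w)
v = w / sigma

A1 = sigma * np.outer(u, v)   # best rank-1 approximation in the Frobenius norm
print("rank-1 residual:", np.linalg.norm(A - A1, "fro"))
```

In practice one would call np.linalg.svd directly; the point here is only to mirror the eigenvector argument from the slides.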

  2. Slides 7-12

     Finding v
     - Write the thing we're maximizing as:
       $(u^T A v)^2 = v^T A^T u u^T A v \le v^T A^T A v$
     - The same argument gives v as the eigenvector corresponding to the largest eigenvalue of $A^T A$
     - Note we also have
       $\sigma^2 = (u^T A v)^2 = \max \lambda(A A^T) = \max \lambda(A^T A) = \|A\|_2^2$

     Generalizing
     - In general, if we expect the problem to have subspace dimension k, we want the closest rank-k matrix to A
       • That is, express the data points as linear combinations of a set of k basis vectors (plus error)
       • We want the optimal set of basis vectors and the optimal linear combinations:
         $\min_{U \in \mathbb{R}^{m \times k},\ U^T U = I,\ W \in \mathbb{R}^{n \times k}} \|A - U W^T\|_F^2$

     Finding W
     - Take the same approach as before:
       $\|A - U W^T\|_F^2 = \mathrm{tr}\big((A - U W^T)^T (A - U W^T)\big) = \mathrm{tr}(A^T A) - 2\,\mathrm{tr}(W U^T A) + \mathrm{tr}(W U^T U W^T) = \|A\|_F^2 - 2\,\mathrm{tr}(W U^T A) + \|W\|_F^2$
     - Set the gradient with respect to W equal to zero:
       $-2 A^T U + 2 W = 0 \;\Rightarrow\; W = A^T U$

     Finding U
     - Plugging in $W = A^T U$ we get
       $\|A - U W^T\|_F^2 = \|A\|_F^2 - 2\,\mathrm{tr}(A^T U U^T A) + \mathrm{tr}(A^T U U^T A) = \|A\|_F^2 - \mathrm{tr}(U^T A A^T U)$
       so minimizing it is equivalent to $\max_U \mathrm{tr}(U^T A A^T U)$
     - $A A^T$ is symmetric, hence has a complete set of orthonormal eigenvectors, say the columns of X, and eigenvalues along the diagonal of M (sorted in decreasing order): $A A^T = X M X^T$

     Finding U cont'd
     - Our problem is now: $\max_U \mathrm{tr}(U^T X M X^T U)$
     - Note X and U are both orthogonal, so is $X^T U$, which we can call Z:
       $\max_{Z^T Z = I} \mathrm{tr}(Z^T M Z) = \max_{Z^T Z = I} \sum_{i=1}^k \sum_{j=1}^m \mu_j Z_{ji}^2$
     - Simplest solution: set $Z = (I\ 0)^T$, which means that U is the first k columns of X (the first k eigenvectors of $A A^T$)

     Back to W
     - We can write $W = V \Sigma^T$ for an orthogonal V and a square k×k matrix $\Sigma$
     - The same argument as for U gives that V should be the first k eigenvectors of $A^T A$
     - What is $\Sigma$?
     - From the earlier rank-1 case we know $\Sigma_{11} = \sigma = \|A\|_2 = \|A^T\|_2$
     - Since $U_{*1}$ and $V_{*1}$ are unit vectors that achieve the 2-norm of $A^T$ and A, we can derive that the first row and column of $\Sigma$ are zero except for the diagonal entry
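As a hedged illustration of the rank-k result (U = first k eigenvectors of $A A^T$, then $W = A^T U$), the sketch below builds the approximation and cross-checks it against a truncated SVD; the sizes and data are made up for the demo:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 8, 200, 3

# Synthetic data that is approximately rank k.
A = rng.normal(size=(m, k)) @ rng.normal(size=(k, n)) + 0.01 * rng.normal(size=(m, n))

# U = first k eigenvectors of A A^T (largest eigenvalues first).
eigvals, eigvecs = np.linalg.eigh(A @ A.T)    # eigenvalues in ascending order
U = eigvecs[:, ::-1][:, :k]                   # reorder descending, keep top k

# Setting the gradient with respect to W to zero gave W = A^T U.
W = A.T @ U
Ak = U @ W.T                                  # best rank-k approximation of A

# Cross-check: the truncated SVD yields the same approximation.
Usvd, s, Vt = np.linalg.svd(A, full_matrices=False)
Ak_svd = (Usvd[:, :k] * s[:k]) @ Vt[:k, :]
print("difference:", np.linalg.norm(Ak - Ak_svd, "fro"))   # ~0 up to round-off
```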

  3. Slides 13-18

     What is Σ?
     - Subtract the rank-1 matrix $U_{*1} \Sigma_{11} V_{*1}^T$ from A
       • this zeros the matching eigenvalue of $A^T A$ or $A A^T$
     - Then we can understand the next part of $\Sigma$
     - End up with $\Sigma$ a diagonal matrix containing the square roots of the first k eigenvalues of $A A^T$ or $A^T A$ (they're equal)
     - Gives a formula for A as a sum of rank-1 matrices:
       $A = \sum_i \sigma_i u_i v_i^T$

     The Singular Value Decomposition
     - Going all the way to k = m (or n) we get the Singular Value Decomposition (SVD) of A:
       $A = U \Sigma V^T$
     - The diagonal entries of $\Sigma$ are called the singular values
     - The columns of U (eigenvectors of $A A^T$) are the left singular vectors
     - The columns of V (eigenvectors of $A^T A$) are the right singular vectors

     Cool things about the SVD
     - 2-norm: $\|A\|_2 = \sigma_1$
     - Frobenius norm: $\|A\|_F^2 = \sigma_1^2 + \cdots + \sigma_n^2$
     - Rank(A) = number of nonzero singular values
       • Can make a sensible numerical estimate (treat tiny singular values as zero)
     - Null(A) is spanned by the columns of V corresponding to zero singular values
     - Range(A) is spanned by the columns of U corresponding to nonzero singular values
     - For invertible A:
       $A^{-1} = V \Sigma^{-1} U^T = \sum_{i=1}^n \frac{1}{\sigma_i} v_i u_i^T$

     Least Squares with SVD
     - Define the pseudo-inverse for a general A:
       $A^+ = V \Sigma^+ U^T = \sum_{\sigma_i > 0} \frac{1}{\sigma_i} v_i u_i^T$
     - Note if $A^T A$ is invertible, $A^+ = (A^T A)^{-1} A^T$
       • i.e. it solves the least squares problem
     - If $A^T A$ is singular, the pseudo-inverse is still defined: $A^+ b$ is the x that minimizes $\|b - A x\|_2$ and, of all x that do so, has the smallest $\|x\|_2$

     Solving Eigenproblems
     - Computing the SVD is another matter!
     - We can get U and V by solving the symmetric eigenproblem for $A A^T$ or $A^T A$, but more specialized methods are more accurate
     - The unsymmetric eigenproblem is another related computation, with complications:
       • May involve complex numbers even if A is real
       • If A is not normal ($A A^T \ne A^T A$), it doesn't have a full basis of eigenvectors
       • Eigenvectors may not be orthogonal… Schur decomposition
     - Also: finding the eigenvalues of an n×n matrix is equivalent to solving a degree-n polynomial
       • No "analytic" solution in general for n ≥ 5
       • Thus general algorithms are iterative
     - We'll examine the symmetric problem in more detail

     The Symmetric Eigenproblem
     - Assume A is symmetric and real
     - Find an orthogonal matrix V and a diagonal matrix D such that $A V = V D$
       • Diagonal entries of D are the eigenvalues; the corresponding columns of V are the eigenvectors
     - Also put: $A = V D V^T$ or $V^T A V = D$
     - There are a few strategies
       • More if you only care about a few eigenpairs, not the complete set…
     - Generalized problem: $A x = \lambda B x$
     - LAPACK provides routines for all of these
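The last two slides can be illustrated with the short sketch below, assuming a deliberately rank-deficient A so that $A^T A$ is singular. It forms the pseudo-inverse from the SVD (inverting only singular values above a numerical tolerance, a common heuristic rather than anything the slides prescribe), checks the minimum-norm least-squares property against np.linalg.lstsq, and solves a small symmetric eigenproblem with np.linalg.eigh, which calls LAPACK's symmetric drivers:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 20, 5

# Make A rank-deficient so that A^T A is singular (example data is made up).
A = rng.normal(size=(m, n))
A[:, 4] = A[:, 0] + A[:, 1]
b = rng.normal(size=m)

# Pseudo-inverse via the SVD: A+ = V Sigma+ U^T, inverting only "nonzero"
# singular values; the cutoff below is a common heuristic, not from the slides.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]
s_inv = np.where(s > tol, 1.0 / s, 0.0)
x = Vt.T @ (s_inv * (U.T @ b))       # minimizes ||b - Ax||_2, then ||x||_2

# The library's least-squares routine returns the same minimum-norm solution.
x_lib, *_ = np.linalg.lstsq(A, b, rcond=None)
print("matches np.linalg.lstsq:", np.allclose(x, x_lib))

# Symmetric eigenproblem A = V D V^T via np.linalg.eigh (LAPACK symmetric driver).
S = A.T @ A                          # symmetric (and singular here)
d, V = np.linalg.eigh(S)             # eigenvalues ascending, orthonormal eigenvectors
print("A = V D V^T holds:", np.allclose(S, V @ np.diag(d) @ V.T))
```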
