Principal component analysis

Ingo Blechschmidt, December 17th, 2014



  1. Principal component analysis. Ingo Blechschmidt, December 17th, 2014. Kleine Bayessche AG.


  2. Outline. 1 Theory: Singular value decomposition; Pseudoinverses; Low-rank approximation. 2 Applications: Image compression; Proper orthogonal decomposition; Principal component analysis; Eigenfaces; Digit recognition.

  3. Singular value decomposition. Let A ∈ R^{n×m}. Then there exist numbers σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_m ≥ 0, an orthonormal basis v_1, …, v_m of R^m, and an orthonormal basis w_1, …, w_n of R^n, such that A v_i = σ_i w_i, i = 1, …, m. In matrix language: A = W Σ V^t, where W = (w_1 | … | w_n) ∈ R^{n×n} is orthogonal, V = (v_1 | … | v_m) ∈ R^{m×m} is orthogonal, and Σ = diag(σ_1, …, σ_m) ∈ R^{n×m}.

  4. • The singular value decomposition (SVD) exists for any real matrix, even rectangular ones.
  • The singular values σ_i are unique.
  • The basis vectors are not unique.
  • If A is orthogonally diagonalizable with eigenvalues λ_i (for instance, if A is symmetric), then σ_i = |λ_i|.
  • ‖A‖_Frobenius = √(Σ_{ij} A_{ij}²) = √(tr(A^t A)) = √(Σ_i σ_i²).
  • There exists a generalization to complex matrices. In this case, the matrix A can be decomposed as W Σ V^⋆, where V^⋆ is the complex conjugate of V^t and W and V are unitary matrices.
  • The singular value decomposition can also be formulated in a basis-free manner as a result about linear maps between finite-dimensional Hilbert spaces.
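A quick numerical check of these statements, sketched here with NumPy (not part of the original slides; the test matrix and its size are arbitrary):

```python
import numpy as np

# Random rectangular test matrix A in R^{n x m}
rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((n, m))

# NumPy returns A = W @ Sigma @ Vt with singular values in decreasing order
W, sigma, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((n, m))
Sigma[:m, :m] = np.diag(sigma)
assert np.allclose(A, W @ Sigma @ Vt)

# A v_i = sigma_i w_i for i = 1, ..., m
V = Vt.T
for i in range(m):
    assert np.allclose(A @ V[:, i], sigma[i] * W[:, i])

# Frobenius norm identity: ||A||_F = sqrt(sigma_1^2 + ... + sigma_m^2)
assert np.isclose(np.linalg.norm(A, "fro"), np.sqrt(np.sum(sigma**2)))
```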

  5. Existence proof (sketch):
  1. Consider the eigenvalue decomposition of the symmetric and positive-semidefinite matrix A^t A: we have an orthonormal basis v_i of eigenvectors corresponding to eigenvalues λ_i.
  2. Set σ_i := √λ_i.
  3. Set w_i := (1/σ_i) A v_i (for those i with λ_i ≠ 0).
  4. Then A v_i = σ_i w_i holds trivially.
  5. The w_i are orthonormal: (w_i, w_j) = (1/(σ_i σ_j)) (A^t A v_i, v_j) = (λ_i/(σ_i σ_j)) δ_ij = δ_ij.
  6. If necessary, extend the w_i to an orthonormal basis.
  This proof gives rise to an algorithm for calculating the SVD (a small numerical sketch follows below), but unless A^t A is small, it has undesirable numerical properties. (But note that one can also use A A^t!) Since the 1960s, there has been a stable iterative algorithm by Golub and Kahan.
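A minimal sketch of this construction, assuming NumPy; as noted above, going through A^t A is fine for small examples but not numerically advisable in general:

```python
import numpy as np

def svd_via_gram(A):
    """Compute an SVD A = W Sigma V^t via the eigendecomposition of A^t A
    (the construction from the existence proof; for illustration only)."""
    n, m = A.shape
    lam, V = np.linalg.eigh(A.T @ A)          # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]             # sort descending
    lam, V = lam[order], V[:, order]
    sigma = np.sqrt(np.clip(lam, 0.0, None))  # sigma_i = sqrt(lambda_i)
    W = np.zeros((n, n))
    for i in range(m):
        if sigma[i] > 1e-12:                  # w_i = (1/sigma_i) A v_i
            W[:, i] = A @ V[:, i] / sigma[i]
    return W, sigma, V                        # W not extended to a full basis here

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
W, sigma, V = svd_via_gram(A)
# Singular values agree with the library routine; basis vectors may differ in sign
assert np.allclose(sigma, np.linalg.svd(A, compute_uv=False))
assert np.allclose(A, W[:, :4] @ np.diag(sigma) @ V.T)
```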

  6. The pseudoinverse of a matrix. Let A ∈ R^{n×m} and b ∈ R^n. Then the solutions to the optimization problem ‖A x − b‖² → min over x ∈ R^m are given by x = A^+ b + V (0, ⋆)^t, where the ⋆ block stands for arbitrary entries in the components belonging to vanishing singular values, A = W Σ V^t is the SVD, and A^+ = V Σ^+ W^t with Σ^+ = diag(σ_1^{-1}, …, σ_m^{-1}) ∈ R^{m×n}.

  7. • In the formula for Σ^+, set 0^{-1} := 0.
  • If A happens to be invertible, then A^+ = A^{-1}.
  • The pseudoinverse can be used for polynomial approximation: let data points (x_i, y_i) ∈ R², 1 ≤ i ≤ N, be given. We want to find a polynomial p(z) = Σ_{k=0}^n α_k z^k, n ≪ N, such that Σ_{i=1}^N |p(x_i) − y_i|² → min. In matrix language, this problem reads ‖A u − y‖² → min, where u = (α_0, …, α_n)^t ∈ R^{n+1},
  A = [ 1 x_1 x_1² ⋯ x_1^n ; 1 x_2 x_2² ⋯ x_2^n ; ⋮ ; 1 x_N x_N² ⋯ x_N^n ] ∈ R^{N×(n+1)},  y = (y_1, …, y_N)^t ∈ R^N.
  A code sketch of this fit follows below.
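For instance, the polynomial fit can be carried out directly with the pseudoinverse. A sketch in NumPy; the data and the degree are made up for illustration:

```python
import numpy as np

# Noisy samples of a cubic: N data points, fit a polynomial of degree n << N
rng = np.random.default_rng(2)
N, n = 50, 3
x = np.linspace(-1.0, 1.0, N)
y = 1.0 - 2.0 * x + 0.5 * x**3 + 0.05 * rng.standard_normal(N)

# Vandermonde-type matrix A in R^{N x (n+1)} with rows (1, x_i, x_i^2, ..., x_i^n)
A = np.vander(x, n + 1, increasing=True)

# Least-squares coefficients u = A^+ y (np.linalg.pinv is computed via the SVD)
u = np.linalg.pinv(A) @ y
print("coefficients alpha_0 ... alpha_n:", u)

# The dedicated least-squares routine gives the same result up to rounding
u_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(u, u_lstsq)
```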

  8. Low-rank approximation. Let A = W Σ V^t ∈ R^{n×m} and 1 ≤ r ≤ n, m. Then a solution to the optimization problem ‖A − M‖_Frobenius → min over all matrices M with rank M ≤ r is given by M = W Σ_r V^t, where Σ_r = diag(σ_1, …, σ_r, 0, …, 0). The approximation error is ‖A − W Σ_r V^t‖_F = √(σ_{r+1}² + ⋯ + σ_m²).

  9. • This is the Eckart–Young(–Mirsky) theorem.
  • Beware of false and incomplete proofs in the literature!
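A small numerical illustration of the theorem, assuming NumPy (not part of the original slides):

```python
import numpy as np

def best_rank_r(A, r):
    """Best rank-r approximation of A in the Frobenius norm (Eckart-Young):
    keep only the r largest singular values."""
    W, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    return W[:, :r] @ np.diag(sigma[:r]) @ Vt[:r, :], sigma

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 6))
r = 2
M, sigma = best_rank_r(A, r)

assert np.linalg.matrix_rank(M) <= r
# The approximation error equals sqrt(sigma_{r+1}^2 + ... + sigma_m^2)
err = np.linalg.norm(A - M, "fro")
assert np.isclose(err, np.sqrt(np.sum(sigma[r:]**2)))
```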

  10. Image compression. Think of images as matrices. Substitute a matrix W Σ V^t by W Σ_r V^t with r small. To reconstruct W Σ_r V^t, one only needs to know the r singular values σ_1, …, σ_r (r numbers), the first r columns of W (height · r numbers), and the top r rows of V^t (width · r numbers). Total amount: r · (1 + height + width) ≪ height · width.

  11. • See http://speicherleck.de/iblech/stuff/pca-images.pdf for sample compressions and http://pizzaseminar.speicherleck.de/skript4/08-principal-component-analysis/svd-image.py for the Python code producing these images (a minimal sketch of the same idea follows below).
  • Image compression by singular value decomposition is mostly of academic interest only.
  • This might be for the following reasons: other compression algorithms have more efficient implementations; other algorithms are tailored to the specific properties of human vision; and the basis vectors of other approaches (for instance, the DCT) are similar to the most important singular basis vectors of a sufficiently large corpus of images.
  • See http://dsp.stackexchange.com/questions/7859/relationship-between-dct-and-pca.
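The linked svd-image.py is not reproduced here; a minimal sketch of the same idea (truncated SVD of a grayscale image), assuming NumPy and an image already loaded as a 2D float array, could look like this:

```python
import numpy as np

def compress(image, r):
    """Keep only the r largest singular values of a grayscale image (2D array);
    return the rank-r reconstruction and the fraction of data to be stored."""
    W, sigma, Vt = np.linalg.svd(image, full_matrices=False)
    approx = W[:, :r] @ np.diag(sigma[:r]) @ Vt[:r, :]
    height, width = image.shape
    ratio = r * (1 + height + width) / (height * width)
    return approx, ratio

# Hypothetical example: a synthetic 256x256 "image" instead of a real photograph
rng = np.random.default_rng(4)
image = rng.random((256, 256))
approx, ratio = compress(image, r=20)
print(f"stored fraction of the original data: {ratio:.2%}")
```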

  12. Proper orthogonal decomposition. Given data points x_i ∈ R^N, we want to find a low-dimensional linear subspace which approximately contains the x_i. Minimize J(U) := Σ_i ‖x_i − P_U(x_i)‖² over all r-dimensional subspaces U ⊆ R^N, r ≪ N, where P_U : R^N → R^N is the orthogonal projection onto U.
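By the low-rank approximation theorem, an optimal U is spanned by the left singular vectors belonging to the r largest singular values of the data matrix X whose columns are the x_i. A sketch with NumPy (synthetic data, not from the slides):

```python
import numpy as np

# Synthetic data: points in R^N lying near a 2-dimensional subspace
rng = np.random.default_rng(5)
N, s, r = 20, 200, 2
basis = rng.standard_normal((N, r))
X = basis @ rng.standard_normal((r, s)) + 0.01 * rng.standard_normal((N, s))

# The top r left singular vectors of the data matrix span the optimal subspace U
W, sigma, Vt = np.linalg.svd(X, full_matrices=False)
U = W[:, :r]                      # orthonormal basis of U

# Orthogonal projection onto U and the value of J(U)
P = U @ U.T
J = np.sum(np.linalg.norm(X - P @ X, axis=0) ** 2)
# J(U) equals the sum of the remaining squared singular values
assert np.isclose(J, np.sum(sigma[r:] ** 2))
print("J(U) =", J)
```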
