
PCA (CS 446) - PowerPoint PPT Presentation

  1. PCA CS 446

  2-3. Supervised learning
  So far, we've done supervised learning: given ((x_i, y_i)), find f with f(x_i) ≈ y_i. Examples: k-nn, decision trees, ...
  Most methods used (regularized) ERM: minimize the empirical risk R̂(f) = (1/n) Σ_{i=1}^n ℓ(f(x_i), y_i), and hope the true risk R is small. Examples: least squares, logistic regression, deep networks, SVM, perceptron, ...
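
  To make the ERM objective concrete, here is a minimal sketch (my illustration, not part of the slides) that evaluates the empirical risk of a linear predictor under the squared loss; the data and predictor are made-up placeholders.

  ```python
  import numpy as np

  def empirical_risk(predict, X, y, loss):
      """Average loss of a predictor over the sample ((x_i, y_i))."""
      return np.mean([loss(predict(x), yi) for x, yi in zip(X, y)])

  # Toy data and a linear predictor w (placeholders, not from the lecture).
  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 5))
  w = rng.normal(size=5)
  y = X @ w + 0.1 * rng.normal(size=100)

  squared_loss = lambda yhat, yi: (yhat - yi) ** 2
  print(empirical_risk(lambda x: x @ w, X, y, squared_loss))  # small, since w generated y
  ```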

  4-6. Unsupervised learning
  Now we only receive (x_i)_{i=1}^n, and the goal is...?
  ◮ Encoding data in some compact representation (and decoding it).
  ◮ Data analysis; recovering "hidden structure" in data (e.g., recovering cliques or clusters).
  ◮ Features for supervised learning.
  ◮ ...?
  The task is less clear-cut. In 2019 we still have people trying to formalize it!

  7. Section 1: PCA (Principal Component Analysis)

  8-14. PCA motivation
  Let's formulate a simplistic linear unsupervised method.
  ◮ Encoding (and decoding) data in some compact representation: let's linearly map data in R^d to R^k and back.
  ◮ Data analysis; recovering "hidden structure" in data: let's find out whether the data mostly lies on a low-dimensional subspace.
  ◮ Features for supervised learning: let's feed the k-dimensional encoding (in R^k) to supervised methods.

  15-21. SVD reminder
  1. SV triples: (s, u, v) satisfies Mv = s u and M^T u = s v.
  2. Thin SVD: M = Σ_{i=1}^r s_i u_i v_i^T, where r is the rank of M.
  3. Full SVD: M = U S V^T.
  4. "Operational" view of the full SVD: for M ∈ R^{n×d}, U = [u_1 · · · u_r u_{r+1} · · · u_n] ∈ R^{n×n} and V = [v_1 · · · v_r v_{r+1} · · · v_d] ∈ R^{d×d} are orthogonal, and S ∈ R^{n×d} is zero except for its leading diagonal entries s_1, ..., s_r.
  The first r columns of U and V span the column space and row space of M (respectively); the remaining columns span the left and right nullspaces (respectively).
  New: let (U_k, S_k, V_k) denote the truncated SVD, where U_k ∈ R^{n×k} consists of the first k columns of U, S_k ∈ R^{k×k} is the top-left k × k block of S, and V_k ∈ R^{d×k} consists of the first k columns of V.
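
  As a quick sanity check of this notation, here is a small NumPy sketch (my own, not from the slides) that computes the full SVD of a random matrix and forms the truncated factors U_k, S_k, V_k.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  n, d, k = 6, 4, 2
  M = rng.normal(size=(n, d))

  # Full SVD: M = U @ S @ V.T with U (n x n), V (d x d), S (n x d).
  U, s, Vt = np.linalg.svd(M, full_matrices=True)
  S = np.zeros((n, d))
  S[:len(s), :len(s)] = np.diag(s)
  assert np.allclose(M, U @ S @ Vt)

  # Truncated SVD: keep the top k singular triples.
  U_k = U[:, :k]           # n x k
  S_k = np.diag(s[:k])     # k x k
  V_k = Vt[:k, :].T        # d x k
  M_k = U_k @ S_k @ V_k.T  # best rank-k approximation of M
  ```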

  22-24. PCA (Principal Component Analysis)
  Input: data as the rows of X ∈ R^{n×d} with SVD X = U S V^T, and an integer k.
  Output: encoder V_k, decoder V_k^T, encoded data X V_k = U_k S_k.
  The goal in unsupervised learning is unclear. We'll try to define it as "best encoding/decoding in the Frobenius sense":
    min_{E ∈ R^{d×k}, D ∈ R^{k×d}} ‖X − X E D‖_F^2 = ‖X − X V_k V_k^T‖_F^2.
  Note that V_k V_k^T performs an orthogonal projection onto the subspace spanned by the columns of V_k; thus we are finding the "best k-dimensional projection of the data".
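
  A minimal NumPy sketch of this encoder/decoder (my illustration, not the course's reference code): take the SVD of the data matrix, use V_k as the encoder and V_k^T as the decoder, and check that X V_k equals U_k S_k.

  ```python
  import numpy as np

  def pca(X, k):
      """PCA via SVD: return the encoder V_k (d x k) and the encoded data X V_k (n x k)."""
      U, s, Vt = np.linalg.svd(X, full_matrices=False)  # thin SVD
      V_k = Vt[:k, :].T                                 # encoder; V_k.T is the decoder
      return V_k, X @ V_k

  rng = np.random.default_rng(0)
  X = rng.normal(size=(50, 10))
  V_k, Z = pca(X, k=3)

  # Encoded data equals U_k S_k, and decoding gives the projection X V_k V_k^T.
  U, s, Vt = np.linalg.svd(X, full_matrices=False)
  assert np.allclose(Z, U[:, :3] * s[:3])
  X_hat = Z @ V_k.T
  ```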

  25-28. PCA properties
  Theorem. Let X ∈ R^{n×d} with SVD X = U S V^T and an integer k ≤ r be given. Then
    min_{E ∈ R^{d×k}, D ∈ R^{k×d}} ‖X − X E D‖_F^2
      = min_{D ∈ R^{d×k}, D^T D = I} ‖X − X D D^T‖_F^2
      = ‖X − X V_k V_k^T‖_F^2
      = Σ_{i=k+1}^r s_i^2.
  Additionally,
    min_{D ∈ R^{d×k}, D^T D = I} ‖X − X D D^T‖_F^2
      = ‖X‖_F^2 − max_{D ∈ R^{d×k}, D^T D = I} ‖X D‖_F^2
      = ‖X‖_F^2 − ‖X V_k‖_F^2
      = ‖X‖_F^2 − Σ_{i=1}^k s_i^2.
  Remark 1. The SVD is not unique, but Σ_{i=1}^r s_i^2 is identical across SVD choices.
  Remark 2. As written, this is not a convex optimization problem!
  Remark 3. The second form is interesting...
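
  The main identity is easy to verify numerically. Below is a brief sketch (my own check, assuming NumPy) comparing the rank-k reconstruction error to the tail sum of squared singular values, and the projected norm to the head sum.

  ```python
  import numpy as np

  rng = np.random.default_rng(1)
  X = rng.normal(size=(40, 8))
  k = 3

  U, s, Vt = np.linalg.svd(X, full_matrices=False)
  V_k = Vt[:k, :].T

  # ||X - X V_k V_k^T||_F^2 == sum of squared singular values beyond k.
  residual = np.linalg.norm(X - X @ V_k @ V_k.T, 'fro') ** 2
  assert np.isclose(residual, np.sum(s[k:] ** 2))

  # ||X V_k||_F^2 == sum of the top-k squared singular values.
  assert np.isclose(np.linalg.norm(X @ V_k, 'fro') ** 2, np.sum(s[:k] ** 2))
  ```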

  29. Centered PCA
  Some treatments replace X with X − 1μ^T, where 1 ∈ R^n is the all-ones vector and μ = (1/n) Σ_{i=1}^n x_i is the sample mean; that is, the mean is subtracted from every row.
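
  A short sketch (my illustration) of the centered variant: subtract the per-feature mean before the SVD, and add it back when decoding. Most library PCA routines center the data this way.

  ```python
  import numpy as np

  rng = np.random.default_rng(2)
  X = rng.normal(loc=5.0, size=(30, 6))  # data with a nonzero mean

  mu = X.mean(axis=0)                    # sample mean mu = (1/n) sum_i x_i
  Xc = X - mu                            # X - 1 mu^T (broadcast subtracts mu from each row)

  k = 2
  U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
  V_k = Vt[:k, :].T
  encoded = Xc @ V_k                     # centered scores
  decoded = encoded @ V_k.T + mu         # add the mean back when decoding
  ```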
