


PCA

CS 446

Supervised learning

So far, we've done supervised learning: Given $((x_i, y_i))_{i=1}^n$, find $f$ with $f(x_i) \approx y_i$: k-nn, decision trees, . . .

Most methods used (regularized) ERM: minimize
$$
\widehat{R}(f) = \frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i), y_i),
$$
and hope the true risk $R$ is small: least squares, logistic regression, deep networks, SVM, perceptron, . . .

1 / 18

Unsupervised learning

Now we only receive $(x_i)_{i=1}^n$, and the goal is. . . ?

◮ Encoding data in some compact representation (and decoding this).
◮ Data analysis; recovering "hidden structure" in data (e.g., recovering cliques or clusters).
◮ Features for supervised learning.
◮ . . . ?

The task is less clear-cut. In 2019 we still have people trying to formalize it!

2 / 18

1. PCA (Principal Component Analysis)

PCA motivation

Let's formulate a simplistic linear unsupervised method.

◮ Encoding (and decoding) data in some compact representation. Let's linearly map data in $\mathbb{R}^d$ to $\mathbb{R}^k$ and back.
◮ Data analysis; recovering "hidden structure" in data. Let's find if data mostly lies on a low-dimensional subspace.
◮ Features for supervised learning. Let's feed the $\mathbb{R}^k$ encoding to supervised methods.

3 / 18

SVD reminder

1. SV triples: $(s, u, v)$ satisfies $Mv = su$ and $M^T u = sv$.
2. Thin decomposition SVD: $M = \sum_{i=1}^{r} s_i u_i v_i^T$.
3. Full factorization SVD: $M = USV^T$.
4. "Operational" view of SVD: for $M \in \mathbb{R}^{n\times d}$,
$$
M =
\begin{pmatrix} u_1 & \cdots & u_r & u_{r+1} & \cdots & u_n \end{pmatrix}
\begin{pmatrix} s_1 & & & \\ & \ddots & & \\ & & s_r & \\ & & & 0 \end{pmatrix}
\begin{pmatrix} v_1 & \cdots & v_r & v_{r+1} & \cdots & v_d \end{pmatrix}^{T} .
$$

The first parts of $U$, $V$ span the column / row space (respectively), the second parts the left / right nullspaces (respectively).

New: let $(U_k, S_k, V_k)$ denote the truncated SVD with $U_k \in \mathbb{R}^{n\times k}$ (the first $k$ columns of $U$), and similarly for the others.

4 / 18
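
The following is a minimal numpy sketch (not from the slides) of these objects: the thin SVD and the truncated factors $U_k, S_k, V_k$; the data matrix and sizes are made up for illustration.

```python
import numpy as np

# A minimal sketch (not from the slides) of the SVD objects above; X is synthetic.
rng = np.random.default_rng(0)
n, d, k = 100, 20, 5
X = rng.normal(size=(n, d))

# Thin SVD: X = U @ diag(s) @ Vt, with s holding the singular values s_1 >= s_2 >= ...
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(X, U @ np.diag(s) @ Vt))            # True

# Truncated SVD: keep only the first k singular triples.
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k].T    # shapes (n, k), (k, k), (d, k)
```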

PCA (Principal component analysis)

Input: Data as rows of $\mathbb{R}^{n\times d} \ni X = USV^T$, integer $k$.
Output: Encoder $V_k$, decoder $V_k^T$, encoded data $XV_k = U_kS_k$.

The goal in unsupervised learning is unclear. We'll try to define this as "best encoding/decoding in Frobenius sense":
$$
\min_{\substack{D \in \mathbb{R}^{k\times d} \\ E \in \mathbb{R}^{d\times k}}} \|X - XED\|_F^2
= \|X - XV_kV_k^T\|_F^2 .
$$

Note $V_kV_k^T$ performs orthogonal projection onto the subspace spanned by $V_k$; thus we are finding the "best $k$-dimensional projection of the data".

5 / 18
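
Below is a small sketch of this encoder/decoder view using numpy; the data matrix `X` and the choice of `k` are placeholders, not part of the slides.

```python
import numpy as np

# A minimal sketch of the PCA encoder/decoder above; X and k are placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
k = 5

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T                                   # encoder, d x k

Z = X @ V_k                                      # encoded data
print(np.allclose(Z, U[:, :k] * s[:k]))          # equals U_k S_k, as on the slide

X_hat = Z @ V_k.T                                # decoded data X V_k V_k^T (rank-k projection)
print(np.linalg.norm(X - X_hat, "fro") ** 2)     # Frobenius encoding/decoding error
```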

PCA properties

Theorem. Let $X \in \mathbb{R}^{n\times d}$ with SVD $X = USV^T$ and integer $k \le r$ be given. Then
$$
\min_{\substack{D \in \mathbb{R}^{k\times d} \\ E \in \mathbb{R}^{d\times k}}} \|X - XED\|_F^2
= \min_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \|X - XDD^T\|_F^2
= \|X - XV_kV_k^T\|_F^2
= \sum_{i=k+1}^{r} s_i^2 .
$$
Additionally,
$$
\min_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \|X - XDD^T\|_F^2
= \|X\|_F^2 - \max_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \|XD\|_F^2
= \|X\|_F^2 - \|XV_k\|_F^2
= \|X\|_F^2 - \sum_{i=1}^{k} s_i^2 .
$$

Remark 1. The SVD is not unique, but $\sum_{i=1}^{r} s_i^2$ is identical across SVD choices.

Remark 2. As written, this is not a convex optimization problem!

Remark 3. The second form is interesting. . .

6 / 18
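
As a rough numerical sanity check of the theorem (not part of the slides), the following numpy snippet compares the rank-$k$ reconstruction error against both closed forms.

```python
import numpy as np

# A rough numerical check of the theorem (not part of the slides): the rank-k
# reconstruction error equals the trailing squared singular values, and also
# equals ||X||_F^2 - ||X V_k||_F^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

for k in range(1, len(s)):
    V_k = Vt[:k].T
    err = np.linalg.norm(X - X @ V_k @ V_k.T, "fro") ** 2
    assert np.allclose(err, np.sum(s[k:] ** 2))
    assert np.allclose(err, np.linalg.norm(X, "fro") ** 2 - np.linalg.norm(X @ V_k, "fro") ** 2)
print("theorem holds numerically for this X")
```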

Centered PCA

Some treatments replace $X$ with $X - \mathbf{1}\mu^T$, with mean $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$.

$\frac{1}{n}X^TX \in \mathbb{R}^{d\times d}$ is the data covariance; $\frac{1}{n}(XD)^T(XD)$ is the data covariance after projection; lastly,
$$
\frac{1}{n}\|XD\|_F^2
= \frac{1}{n}\operatorname{tr}\!\big( (XD)^T(XD) \big)
= \frac{1}{n}\sum_{i=1}^{k} (XDe_i)^T(XDe_i),
$$
therefore PCA is maximizing the resulting per-coordinate variances!

7 / 18
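
A minimal sketch of centered PCA on synthetic data, checking that the summed per-coordinate variances of the projected data equal $\|X_c V_k\|_F^2 / n$; the data and $k$ are illustrative.

```python
import numpy as np

# A minimal sketch of centered PCA on synthetic data: after removing the mean,
# the summed per-coordinate variances of the projection equal ||Xc V_k||_F^2 / n.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) + 5.0       # data with a nonzero mean
mu = X.mean(axis=0)
Xc = X - mu                                # X - 1 mu^T

k = 3
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
V_k = Vt[:k].T

Z = Xc @ V_k                               # projected (encoded) data
print(Z.var(axis=0).sum(), np.linalg.norm(Z, "fro") ** 2 / len(Xc))   # equal
```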

Concrete applications of PCA

◮ Image data; e.g., "eigenfaces". Weirdness: negative faces? This motivates non-negative matrix factorization.
◮ LSI (Latent Semantic Indexing): collect many documents into $X \in \mathbb{R}^{n\times d}$, where $x_i$ is a normalized bag-of-words vector (plus nonlinear mappings). Can interpret the new representation as a weighting over "topics".

8 / 18

Application: digit data

Data $(x_i)_{i=1}^n$ with $x_i \in \mathbb{R}^{784}$.

◮ Residual variance left by the rank-$k$ PCA projection:
$$
1 - \frac{\sum_{j=1}^{k} \text{variance in direction } v_j}{\text{total variance}}
= 1 - \frac{\|XV_k\|_F^2}{\|X\|_F^2} .
$$
◮ Residual variance left by the best $k$ coordinate projections:
$$
1 - \frac{\sum_{j=1}^{k} \text{variance in direction } e_j}{\text{total variance}}
= 1 - \frac{\sum_{j=1}^{k} (Xe_j)^T(Xe_j)}{\|X\|_F^2} .
$$

[Figure: fraction of residual variance vs. dimension of projection $k$, comparing coordinate projections with PCA projections.]

9 / 18
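
A sketch of how the two residual-variance curves could be computed; the matrix `X` here is a synthetic stand-in, since the digit data is not included.

```python
import numpy as np

# A sketch of the two residual-variance curves above; X is a synthetic stand-in
# for the digit data (which is not included here).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 784))

total = np.linalg.norm(X, "fro") ** 2
s = np.linalg.svd(X, compute_uv=False)                 # singular values, descending
coord = np.sort((X ** 2).sum(axis=0))[::-1]            # (X e_j)^T (X e_j), best coordinates first

for k in (1, 10, 50, 200):
    pca_res = 1 - np.sum(s[:k] ** 2) / total           # residual of rank-k PCA projection
    coord_res = 1 - np.sum(coord[:k]) / total          # residual of best k coordinate projections
    print(k, pca_res, coord_res)                       # PCA residual is never larger
```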

Application: digit data

$16 \times 16$ pixel images of handwritten 3s (as vectors in $\mathbb{R}^{256}$).

[Figure: the mean $\mu$ and the right singular vectors $v_1, v_2, v_3, v_4$, with $\lambda_1 = 3.4\cdot 10^5$, $\lambda_2 = 2.8\cdot 10^5$, $\lambda_3 = 2.4\cdot 10^5$, $\lambda_4 = 1.6\cdot 10^5$.]

[Figure: reconstructions of an image $x$ with $k = 1, 10, 50, 200$.]

Only have to store $k$ numbers per image, along with the mean $\mu$ and $k$ eigenvectors ($256(k + 1)$ numbers).

10 / 18
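
A small sketch of the storage scheme described above (mean, $k$ eigenvectors, $k$ coefficients per image), again on synthetic stand-in data.

```python
import numpy as np

# A sketch of the storage scheme above on synthetic stand-in "images":
# keep the mean, k right singular vectors, and k coefficients per image.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 256))            # stand-in for 16x16 images as rows
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)

k = 10
V_k = Vt[:k].T                              # 256 x k "eigenvectors"
codes = (X - mu) @ V_k                      # k numbers per image
X_rec = mu + codes @ V_k.T                  # reconstructions

print(mu.size + V_k.size)                   # 256 * (k + 1) shared numbers, as on the slide
```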

Solving for SVD?

Easiest solver: power method. Suppose $M = USV^T$ is positive semi-definite.

◮ Start from some random $x$ with $\|x\| = 1$.
◮ Let's rewrite it in the basis $V$: $x = VV^Tx = V\alpha$ with $\alpha = V^Tx$.
◮ Now $Mx = VS\alpha$, thus $V\alpha$ is replaced with $V(S\alpha)$.
◮ After $t$ iterations, we have $VS^t\alpha$.
◮ $V\alpha$ is amplified the most in direction $v_1$, less in others.
◮ Finally, $VS^t\alpha / \|VS^t\alpha\|$ is output. (Can also normalize with each iteration.)

11 / 18
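
A minimal sketch of the power method as described, normalizing at each iteration; the matrix `M` and the iteration count are illustrative.

```python
import numpy as np

# A minimal sketch of the power method described above for a PSD matrix M,
# normalizing at every iteration; M and the iteration count are illustrative.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 30))
M = A @ A.T                                  # positive semi-definite

x = rng.normal(size=30)
x /= np.linalg.norm(x)
for _ in range(200):                         # x converges to the top singular vector v_1
    x = M @ x
    x /= np.linalg.norm(x)

print(x @ M @ x, np.linalg.svd(M, compute_uv=False)[0])   # estimate vs. true s_1
```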

2. Proofs

Fact. Let $X \in \mathbb{R}^{n\times d}$ and $k \le r$ be given. Then
$$
\min_{\substack{M \in \mathbb{R}^{d\times d} \\ \operatorname{rank}(M) = k}} \|X - XM\|_F^2
= \min_{\substack{D \in \mathbb{R}^{k\times d} \\ E \in \mathbb{R}^{d\times k}}} \|X - XED\|_F^2
= \min_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \|X - XDD^T\|_F^2 .
$$

Proof. Since
$$
\{M \in \mathbb{R}^{d\times d} : \operatorname{rank}(M) = k\}
\supseteq \{ED : E \in \mathbb{R}^{d\times k},\ D \in \mathbb{R}^{k\times d}\}
\supseteq \{DD^T : D \in \mathbb{R}^{d\times k},\ D^TD = I\}
$$
(and minimizing over a larger set can only give a smaller or equal value, so the inclusions already yield the corresponding chain of inequalities), it suffices to show
$$
\min_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \|X - XDD^T\|_F^2
\le \min_{\substack{M \in \mathbb{R}^{d\times d} \\ \operatorname{rank}(M) = k}} \|X - XM\|_F^2 .
$$

12 / 18

Proof (continued). For any $M = USV^T \in \mathbb{R}^{d\times d}$ with $\operatorname{rank}(M) \le k$ (whereby $M = U_kS_kV_k^T$),
$$
\begin{aligned}
\|X - XM\|_F^2
&= \|X - XV_kV_k^T + XV_kV_k^T - XM\|_F^2 \\
&= \|X - XV_kV_k^T\|_F^2
+ 2\operatorname{tr}\!\big( (X - XV_kV_k^T)^T (XV_kV_k^T - XM) \big)
+ \|XV_kV_k^T - XM\|_F^2 .
\end{aligned}
$$
We'll show the middle (trace) term is 0, and therefore
$$
\|X - XM\|_F^2
= \|X - XV_kV_k^T\|_F^2 + \|XV_kV_k^T - XM\|_F^2
\ge \|X - XV_kV_k^T\|_F^2 .
$$

13 / 18

Proof (continued). Note
$$
\begin{aligned}
\operatorname{tr}\!\big( (X - XV_kV_k^T)^T (XV_kV_k^T - XM) \big)
&= \operatorname{tr}\!\big( (I - V_kV_k^T)^T X^T (X - XU_kS_kV_k^T)\, V_kV_k^T \big) \\
&= \operatorname{tr}\!\big( X^T (X - XU_kS_kV_k^T)\, V_kV_k^T (I - V_kV_k^T)^T \big),
\end{aligned}
$$
and
$$
V_kV_k^T (I - V_kV_k^T)^T
= \Big(\sum_{j=1}^{k} v_jv_j^T\Big)\Big(\sum_{i=1}^{d} v_iv_i^T - \sum_{i=1}^{k} v_iv_i^T\Big)
= \Big(\sum_{j=1}^{k} v_jv_j^T\Big)\Big(\sum_{i=k+1}^{d} v_iv_i^T\Big)
= 0 .
$$
Therefore $\operatorname{tr}\big( (X - XV_kV_k^T)^T (XV_kV_k^T - XM) \big) = 0$.

14 / 18

Fact. Let $X \in \mathbb{R}^{n\times d}$ be given along with $D \in \mathbb{R}^{d\times k}$ with $D^TD = I$. Then
$$
\|X - XDD^T\|_F^2 = \|X\|_F^2 - \|XD\|_F^2,
$$
and
$$
\min_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \|X - XDD^T\|_F^2
= \|X\|_F^2 - \max_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \|XD\|_F^2,
\qquad
\mathop{\arg\min}_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \|X - XDD^T\|_F^2
= \mathop{\arg\max}_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \|XD\|_F^2 .
$$

Proof. Since
$$
\|XDD^T\|_F^2
= \operatorname{tr}\!\big( (XDD^T)^T(XDD^T) \big)
= \operatorname{tr}\!\big( (XD)^T(XDD^TD) \big)
= \operatorname{tr}\!\big( (XD)^T(XD) \big)
= \|XD\|_F^2,
$$
and similarly $\operatorname{tr}\big( (XDD^T)^TX \big) = \operatorname{tr}\big( (XD)^T(XD) \big) = \|XD\|_F^2$, therefore
$$
\|X - XDD^T\|_F^2
= \|X\|_F^2 - 2\operatorname{tr}\!\big( (XDD^T)^TX \big) + \|XDD^T\|_F^2
= \|X\|_F^2 - \|XD\|_F^2 .
$$

15 / 18
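
A quick numerical spot-check of this fact (not from the slides), using an orthonormal `D` obtained from a QR factorization.

```python
import numpy as np

# A quick spot-check of the identity above (not from the slides), with an
# orthonormal D obtained from a QR factorization.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
D, _ = np.linalg.qr(rng.normal(size=(10, 3)))            # D^T D = I

lhs = np.linalg.norm(X - X @ D @ D.T, "fro") ** 2
rhs = np.linalg.norm(X, "fro") ** 2 - np.linalg.norm(X @ D, "fro") ** 2
print(np.allclose(lhs, rhs))                             # True
```
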
Fact. Let $X \in \mathbb{R}^{n\times d}$ be given with SVD $X = USV^T$. Then
$$
\max_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \|XD\|_F^2 = \|XV_k\|_F^2 = \sum_{i=1}^{k} s_i^2 .
$$

Proof. Define
$$
S_1 := \{D \in \mathbb{R}^{d\times k} : D^TD = I\},
\qquad
S_2 := \{VD : D \in S_1\}.
$$
Note $S_1 = S_2$:
◮ $S_1 \subseteq S_2$, since $D \in S_1$ implies $(V^TD)^TV^TD = I$, thus $D = V(V^TD) \in S_2$.
◮ $S_2 \subseteq S_1$, since $VD \in S_2$ implies $(VD)^T(VD) = I$, thus $VD \in S_1$.
Therefore
$$
\begin{aligned}
\max_{D \in S_1} \|XD\|_F^2
= \max_{M \in S_2} \|XM\|_F^2
= \max_{D \in S_1} \|XVD\|_F^2
&= \max_{D \in S_1} \|USV^TVD\|_F^2 \\
&= \max_{D \in S_1} \operatorname{tr}\!\big( (USD)^T(USD) \big)
= \max_{D \in S_1} \operatorname{tr}\!\big( DD^TS^TS \big)
= \max_{D \in S_1} \sum_{j=1}^{r} s_j^2 \sum_{i=1}^{k} D_{ji}^2 .
\end{aligned}
$$

16 / 18

Proof (continued). We've reduced the proof to showing
$$
\max_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \sum_{j=1}^{r} s_j^2 \sum_{i=1}^{k} D_{ji}^2 = \|XV_k\|_F^2,
$$
and note moreover $\|XV_k\|_F^2 = \operatorname{tr}\big( (U_kS_k)^T(U_kS_k) \big) = \sum_{i=1}^{k} s_i^2$. Lastly:

◮ Since $V_k \in \mathbb{R}^{d\times k}$ and $V_k^TV_k = I$,
$$
\max_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \sum_{j=1}^{r} s_j^2 \sum_{i=1}^{k} D_{ji}^2 \ge \|XV_k\|_F^2 .
$$
◮ For any feasible $D \in \mathbb{R}^{d\times k}$, extend it to an orthonormal $M \in \mathbb{R}^{d\times d}$; since $M^TM = I = MM^T$, $M^T$ is orthonormal as well, and $\sum_{i=1}^{k} D_{ji}^2 \le \sum_{i=1}^{d} M_{ji}^2 = 1$. Moreover, $\sum_{i,j} D_{ji}^2 \le k$, so
$$
\max_{\substack{D \in \mathbb{R}^{d\times k} \\ D^TD = I}} \sum_{j=1}^{r} s_j^2 \Big( \sum_{i=1}^{k} D_{ji}^2 \Big)
\le \max_{\substack{w \in [0,1]^d \\ \sum_i w_i \le k}} \sum_{j=1}^{r} s_j^2 w_j
\le \sum_{j=1}^{k} s_j^2 .
$$

17 / 18

3. PCA summary

PCA summary

◮ Unsupervised learning has no "labels"; the goal is unclear.
◮ Three perspectives are: encoding/decoding, finding structure, feature learning.
◮ PCA takes in $X = USV^T$, outputs encoder $V_k$ and encoding $XV_k = U_kS_k$. This is the "best rank-$k$ subspace" in a few concrete ways.

18 / 18